SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 805–827
© 2009 Society for Industrial and Applied Mathematics
HIGHER-ORDER FINITE ELEMENT METHODS AND POINTWISE ERROR ESTIMATES FOR ELLIPTIC PROBLEMS ON SURFACES∗

ALAN DEMLOW†

Abstract. We define higher-order analogues to the piecewise linear surface finite element method studied in [G. Dziuk, “Finite elements for the Beltrami operator on arbitrary surfaces,” in Partial Differential Equations and Calculus of Variations, Springer-Verlag, Berlin, 1988, pp. 142–155] and prove error estimates in both pointwise and $L_2$-based norms. Using the Laplace–Beltrami problem on an implicitly defined surface Γ as a model PDE, we define Lagrange finite element methods of arbitrary degree on polynomial approximations to Γ which likewise are of arbitrary degree. Then we prove a priori error estimates in the $L_2$, $H^1$, and corresponding pointwise norms that demonstrate the interaction between the “PDE error” that arises from employing a finite-dimensional finite element space and the “geometric error” that results from approximating Γ. We also consider parametric finite element approximations that are defined on Γ and thus induce no geometric error. Computational examples confirm the sharpness of our error estimates.

Key words. Laplace–Beltrami operator, surface finite element methods, a priori error estimates, boundary value problems on surfaces, pointwise and maximum norm error estimates

AMS subject classifications. 58J32, 65N15, 65N30

DOI. 10.1137/070708135
∗ Received by the editors November 14, 2007; accepted for publication (in revised form) October 7, 2008; published electronically February 6, 2009. This material is based upon work partially supported under National Science Foundation grants DMS-0303378 and DMS-0713770. http://www.siam.org/journals/sinum/47-2/70813.html
† Department of Mathematics, University of Kentucky, 715 Patterson Office Tower, Lexington, KY 40506-0027 ([email protected]).

1. Introduction. The numerical solution of partial differential equations (PDEs) defined on surfaces arises naturally in many applications (cf. [CDR03], [CDD+04], [BMN05], [He06], and [DE07a], among many others). We consider the following model problem in order to focus on basic issues arising in the definition and analysis of such numerical methods. Let Γ be a smooth n-dimensional surface (n = 2, 3) without boundary embedded in $\mathbb{R}^{n+1}$. Let f be given data satisfying $\int_\Gamma f\,d\sigma = 0$, where dσ is the surface measure, and let u solve $-\Delta_\Gamma u = f$ on Γ. Here $\Delta_\Gamma$ is the Laplace–Beltrami operator on Γ, and we require $\int_\Gamma u\,d\sigma = 0$ in order to guarantee uniqueness.

Several methods for defining suitable triangulations of Γ and corresponding finite element spaces have been proposed. For example, one may use the manifold structure of Γ (cf. [Ho01]) or a global parametric representation (cf. [AP05]) to triangulate Γ. In this work we focus on the method originally considered in [Dz88] in which Γ is represented as a level set of a smooth signed distance function d. In [Dz88], Γ is approximated by a polyhedral surface $\Gamma_h$ having triangular faces, and the equations for defining a piecewise linear finite element approximation to u are conveniently defined and solved on $\Gamma_h$. This method has several advantages when compared with approaches relying on global or local parametrizations of Γ. These include its flexibility in handling various surfaces and its direct extension to problems in which the surface under consideration evolves in an unknown fashion and a parametrization is
thus not available. The paradigm example of such an evolution problem is motion of a surface by mean curvature flow; cf. [Dz91], [DDE05]. In the present work we focus on two goals. The first is to define higher-order analogues to the surface finite element method defined in [Dz88]. Higher-order approximations are desirable in many situations because of their increased computational efficiency versus piecewise linear finite element methods. In order to obtain such approximations, it is generally necessary to approximate Γ to higher order in addition to employing higher-order finite element spaces. We thus construct parametric finite element spaces of arbitrary degree that are defined on arbitrary-degree polynomial approximations to Γ. In addition, we describe fully parametric finite element spaces defined directly on Γ via local transformations from the faces of Γh so that no error arises from approximating Γ. It should be noted that in both of these cases, we require explicit knowledge of the distance function d (either through an analytical formula or by a numerical approximation) in order to construct our algorithm. Our second main goal is to carry out a thorough error analysis for finite element methods for the Laplace–Beltrami operator on surfaces. The original work of Dziuk in [Dz88] contains proofs of optimal-order convergence of the piecewise linear surface finite element method in the L2 and energy norms. Here we prove optimal-order estimates for pointwise errors in function values and gradients and for local energy errors in addition to the L2 and energy errors. These estimates are valid for arbitrary degrees of finite element spaces and polynomial approximations to Γ. As in [Dz88], we split the overall error into a “geometric error” arising from the approximation of Γ and a standard finite element “almost-best-approximation” error which arises from approximating an infinite-dimensional function space by a finite-dimensional finite element space. 
Roughly speaking, when employing finite element spaces of degree r on polynomial surface approximations of degree k, we have

$\|\nabla_\Gamma(u - u_h^\ell)\|_{L_2(\Gamma)} \le Ch^r\|u\|_{H^{r+1}(\Gamma)} + Ch^{k+1}\|u\|_{H^1(\Gamma)},$
$\|u - u_h^\ell\|_{L_2(\Gamma)} \le Ch^{r+1}\|u\|_{H^{r+1}(\Gamma)} + Ch^{k+1}\|u\|_{H^1(\Gamma)},$

where $u_h$ is the finite element solution, $\nabla_\Gamma$ is the tangential gradient on Γ, and C depends on geometric properties of Γ. We also prove similar estimates in $L_\infty$ and $W_\infty^1$. As we verify via numerical experiments, one must thus choose $k + 1 \ge r$ to achieve optimal-order convergence in $W_p^1$ norms and $k \ge r$ to achieve optimal-order convergence in $L_p$ norms.

We finally note that approximating Γ via higher-degree polynomials has the added benefit that the curvatures of the approximating surface $\Gamma_h$ have a natural pointwise definition and converge to those of Γ. The availability of a simple curvature approximation is beneficial in applications where the weak form of the PDE under consideration, and thus also the finite element method, explicitly employs curvature information (as, for example, in the image processing application in [CDR03]). Curvature information also was used in the a posteriori error estimates given in [DD07]. However, pointwise curvatures are not naturally defined on the piecewise linear discrete surfaces employed in [Dz88], and ad hoc reconstruction methods must be used to define suitable curvatures if they are explicitly required in calculations (cf. [CDR03]).

An outline of the paper is as follows. Section 2 contains definitions and preliminaries. In section 3 we prove abstract error estimates in various norms. In section 4, we demonstrate how these abstract estimates may be applied to various finite element methods on surfaces and give computational results illustrating the basic error behavior of the methods. In section 5 we give a brief discussion of conditions under which
our error analysis may be extended to more general classes of PDEs on surfaces and manifolds.

2. Preliminaries. In this section we record a number of preliminaries concerning geometry, transformations of functions between the continuous and discrete surfaces Γ and $\Gamma_h$, analytical results, and finite element approximation theory.

2.1. Geometric and analytical preliminaries on Γ. We assume throughout that Γ is a compact, oriented, $C^\infty$, two- or three-dimensional surface without boundary which is embedded in $\mathbb{R}^3$ or $\mathbb{R}^4$, respectively. Our results may be extended to higher-dimensional surfaces of codimension one if appropriate results from finite element approximation theory can be proved; we restrict ourselves to lower-dimensional manifolds so that we may employ the Lagrange interpolant in our analysis.

Let d be the oriented distance function for Γ. For concreteness, let d < 0 on the interior of Γ and d > 0 on the exterior of Γ. Then $\nu = \nabla d$ is the outward-pointing unit normal, and $H = \nabla^2 d$ is the Weingarten map. Here we express these quantities in the coordinates of the embedding space $\mathbb{R}^{n+1}$ (n = 2, 3). For x ∈ Γ, the n eigenvalues $\kappa_1, \dots, \kappa_n$ of H corresponding to eigenvectors perpendicular to ν are the principal curvatures at x. Let $U \subset \mathbb{R}^{n+1}$ be a strip of width δ about Γ, where δ > 0 is sufficiently small to ensure that the projection $a(x) = x - d(x)\nu(x)$ onto Γ is unique. We also require that $\delta < \min_{i=1,\dots,n} 1/\|\kappa_i\|_{L_\infty(\Gamma)}$; cf. [GT98, section 14.6] and [DD07].

Let $P = I - \nu\otimes\nu$ be the projection onto the tangent plane at x, where ⊗ is the outer product defined by $(a\otimes b)c = a(b\cdot c)$. Then $\nabla_\Gamma = P\nabla$ is the tangential gradient, $\mathrm{div}_\Gamma = \nabla_\Gamma\cdot$ is the tangential divergence, and $\Delta_\Gamma = \mathrm{div}_\Gamma\nabla_\Gamma$ is the Laplace–Beltrami operator. We shall use standard notation ($H^1(\Gamma)$, $W_p^j(\Gamma)$, etc.) for Sobolev spaces and norms of functions possessing j tangential derivatives lying in $L_p$.

Next we state some analytical results. Let

(2.1) $L(u, v) = \int_\Gamma \nabla_\Gamma u\,\nabla_\Gamma v\,d\sigma,$

and let (·, ·) be the $L_2$ inner product over Γ.

Lemma 2.1. Let $f \in L_2(\Gamma)$ satisfy $\int_\Gamma f\,d\sigma = 0$. Then the problem $L(u, v) = (f, v)$ ∀ $v \in H^1(\Gamma)$ has a unique weak solution u satisfying $\int_\Gamma u\,d\sigma = 0$, and

(2.2) $\|u\|_{H_2^2(\Gamma)} \le C\|f\|_{L_2(\Gamma)}.$
Proof. See [Aub82, Chapter 4] for a proof of existence and uniqueness. Inequality (2.2) may be proved by local transformations to subsets of $\mathbb{R}^n$ and a covering argument.

The proofs of our pointwise error estimates also rely on properties of the Green’s function. We denote by α(x, y) the surface distance between x, y ∈ Γ.

Lemma 2.2. There exists a function G(x, y), unique up to a constant, such that for all functions $\phi \in C^2(\Gamma)$,

$\phi(x) = \frac{1}{|\Gamma|}\int_\Gamma \phi\,d\sigma + \int_\Gamma G(x, y)(-\Delta_\Gamma\phi(y))\,d\sigma.$

In addition, for x, y ∈ Γ with x ≠ y,

(2.3) $|G(x, y)| \le \begin{cases} C(1 + |\log\alpha(x, y)|), & n = 2,\\ C\alpha(x, y)^{2-n}, & n > 2.\end{cases}$

Also, let |γ + β| > 0, where γ and β are multi-indices. Then

(2.4) $|D_{\Gamma,y}^{\gamma}D_{\Gamma,x}^{\beta}G(x, y)| \le C\alpha(x, y)^{2-n-|\gamma+\beta|}.$
Proof. Existence of the Green’s function G, (2.3), and (2.4) for 1 ≤ |γ| ≤ 2 and |β| = 0 are contained in Theorem 4.13 of [Aub82]. Inequality (2.4) may be easily extended to arbitrary γ, β with |γ + β| > 0 by using the representation (17) on p. 109 of [Aub82].

Finally, let $\gamma_\Gamma > 0$ be the largest positive number such that all balls $B_{\gamma_\Gamma}(x_0) = \{x \in \Gamma : \alpha(x, x_0) < \gamma_\Gamma\}$ of radius $\gamma_\Gamma$ map smoothly to domains in $\mathbb{R}^n$. Such a number $\gamma_\Gamma$ exists since Γ is a smooth, compact surface.

2.2. The discrete surface $\Gamma_h$. Let $\Gamma_h \subset U$ be a polyhedron having triangular faces (n = 2) or a polytope having tetrahedral cells (n = 3) whose vertices lie on Γ and whose faces (cells) are shape-regular and quasi-uniform of diameter h. We shall denote by $\tilde{\mathcal{T}}_h$ the set of triangular faces of $\Gamma_h$ and by $\mathcal{T}_h$ the image under a of $\tilde{\mathcal{T}}_h$ (i.e., $\mathcal{T}_h$ consists of curved simplices lying on Γ). Let $\nu_h$ be the outward unit normal on $\Gamma_h$. We will analyze finite element methods defined on $\Gamma_h$, on Γ, and on higher-order polynomial approximations of Γ, but $\Gamma_h$ will play a central role in defining and analyzing all of them. From a programming standpoint in particular, $\Gamma_h$ is fundamental to our methods in that the faces $\tilde{\mathcal{T}}_h$ of $\Gamma_h$ always constitute the “base” triangulation of Γ, with parametric finite element spaces then being defined over $\tilde{\mathcal{T}}_h$.

2.3. Higher-order polynomial approximations to Γ. Next we describe a family $\Gamma_h^k$ (k ≥ 1) of polynomial approximations to Γ. The higher-order finite element spaces we use here are largely described in [He05] and also are similar to the surface element spaces described in [Ne76]. First let $\Gamma_h = \Gamma_h^1$ be a polyhedral approximation to Γ as in the preceding subsection. For k ≥ 2 and for a given element $\tilde T \in \tilde{\mathcal{T}}_h$, let $\phi_1^k, \dots, \phi_{n_k}^k$ be the Lagrange basis functions of degree k on $\tilde T$ corresponding to the nodal points $x_1, \dots, x_{n_k}$. For $x \in \tilde T$, we then define the discrete projection

$a_k(x) = \sum_{j=1}^{n_k} a(x_j)\phi_j^k(x).$
Employing the above definition on each element $\tilde T \in \tilde{\mathcal{T}}_h$ yields a continuous piecewise polynomial map on $\Gamma_h$. We then define the corresponding discrete surface $\Gamma_h^k = \{a_k(x) : x \in \Gamma_h\}$. Thus each component of $a_k$ is the Lagrange interpolant of the corresponding component of the projection a restricted to $\Gamma_h$. Let $\hat{\mathcal{T}}_h^k$ be the image under $a_k$ of $\tilde{\mathcal{T}}_h$, i.e., for $\hat T \in \hat{\mathcal{T}}_h^k$, $\hat T = a_k(\tilde T)$ for some $\tilde T \in \tilde{\mathcal{T}}_h$. Let also $\mathcal{T}_h^k$ be the image under a of $\hat{\mathcal{T}}_h^k$.

Next we discuss the computation of geometric quantities on $\Gamma_h^k$. Note first that $\Gamma_h^k$ is defined parametrically, not implicitly as is Γ. Thus practical computation of geometric quantities such as normals and curvatures on $\Gamma_h^k$ may involve somewhat different formulas than does computation of the corresponding quantities on Γ. Let $\nu_h^k$ be the (piecewise smooth) unit normal on $\Gamma_h^k$. In order to compute $\nu_h^k$ in a practical situation, we let K be a unit simplicial reference element lying in $\mathbb{R}^n$. Let $\hat T \in \hat{\mathcal{T}}_h^k$ with $\hat T = a_k(\tilde T)$ where $\tilde T \in \tilde{\mathcal{T}}_h$, and let $M : K \to \tilde T$ be an affine coordinate transformation with $M(K) = \tilde T$. A typical finite element code allows easy access to the quantities $\hat a_{k,x_1}, \dots, \hat a_{k,x_n}$, where $x_1, \dots, x_n$ are the standard Euclidean coordinates on K and $\hat a_k = a_k \circ M$. Then $\nu_h^k$ is the outward-pointing unit vector that is perpendicular to $\hat a_{k,x_1}, \dots, \hat a_{k,x_n}$. If n = 2, we thus have for x ∈ K

(2.5) $\nu_h^k(\hat a_k(x)) = \pm\frac{\hat a_{k,x_1}(x) \times \hat a_{k,x_2}(x)}{|\hat a_{k,x_1}(x) \times \hat a_{k,x_2}(x)|}.$
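Formula (2.5) can be sketched in a few lines of code. The following is a minimal illustration for n = 2, not the implementation used in the paper: the function names, the `outward_hint` argument used to resolve the ± sign, and the quadratic test patch are all assumptions introduced here.

```python
import math

def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit_normal(t1, t2, outward_hint):
    """Unit vector orthogonal to the tangent vectors t1 and t2, as in (2.5).

    outward_hint is a hypothetical helper argument: any vector with a
    positive component along the desired normal. It resolves the +/- sign.
    """
    n = cross(t1, t2)
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("tangent vectors are linearly dependent")
    n = tuple(c / length for c in n)
    if sum(a * b for a, b in zip(n, outward_hint)) < 0.0:
        n = tuple(-c for c in n)
    return n

def quadratic_patch_tangents(x1, x2):
    """Partial derivatives of the illustrative degree-2 map
    (x1, x2) -> (x1, x2, 1 - (x1^2 + x2^2) / 2)."""
    return (1.0, 0.0, -x1), (0.0, 1.0, -x2)
```

For the illustrative patch, `unit_normal(*quadratic_patch_tangents(0.3, 0.4), outward_hint=(0.0, 0.0, 1.0))` returns a unit vector proportional to (0.3, 0.4, 1). In a finite element code the two tangent vectors would instead come from the derivatives of the degree-k Lagrange map on the reference element.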
One advantage of employing higher-order approximations to Γ is that in contrast to piecewise linear approximations, such surfaces have naturally defined pointwise curvatures. This information is explicitly needed in the weak (and thus finite element) formulations of various equations. Fix a point $\hat a_k(x) \in \Gamma_h^k$, where x ∈ K with K and $\hat a_k$ as above. The second fundamental form with respect to the basis $\{\hat a_{k,x_1}, \dots, \hat a_{k,x_n}\}$ of the tangent space $T_{\hat a_k(x)}$ is given by $II = [\hat a_{k,x_ix_j}\cdot\nu_h^k]$, and the metric tensor is given by $G = [\hat a_{k,x_i}\cdot\hat a_{k,x_j}]$. The Weingarten map with respect to the basis $\{\hat a_{k,x_1}, \dots, \hat a_{k,x_n}\}$ is then $H_{\tan} = II\,G^{-1}$. It is often desirable to express the Weingarten map with respect to the coordinates of the embedding space $\mathbb{R}^{n+1}$ instead of with respect to the basis of the tangent space induced by $\hat a_k$. We thus compute

$H_h^k = [\hat a_{k,x_1}\ \cdots\ \hat a_{k,x_n}]\,H_{\tan}\,P_n\,[\hat a_{k,x_1}\ \cdots\ \hat a_{k,x_n}\ \ \nu_h^k]^{-1},$

where $P_n$ is defined by $(x_1, \dots, x_n, x_{n+1}) \to (x_1, \dots, x_n)$. The principal curvatures and corresponding eigenbasis of the tangent space may be computed from $H_h^k$. An alternative when n = 2 is to apply the formula $H_h^k = \nabla_{\Gamma_h^k}\nu_h^k$ to (2.5).

We now state results concerning the approximation of Γ by $\Gamma_h^k$.

Proposition 2.3. For h small enough, $\tilde T \in \tilde{\mathcal{T}}_h$, $\hat T \in \hat{\mathcal{T}}_h^k$, and 1 ≤ i ≤ k,

(2.6) $\|d\|_{L_\infty(\Gamma_h^k)} \le \|a - a_k\|_{L_\infty(\Gamma_h)} \le Ch^{k+1},$
(2.7) $\|a - a_k\|_{W_\infty^i(\tilde T)} \le Ch^{k+1-i},$
(2.8) $\|\nu - \nu_h^k\|_{L_\infty(\Gamma_h^k)} \le Ch^k,$
(2.9) $\|H \circ a - H_h^k\|_{L_\infty(\hat T)} \le Ch^{k-1}.$
The constants C above depend upon the distance function d and its derivatives.

Proof. Inequalities (2.6) and (2.7) follow directly from the definition of $a_k$ as the Lagrange interpolant of a and the definition of d (cf. [BS02] for standard results concerning finite element interpolation theory). To prove (2.8), consider a point $\hat x \in \Gamma_h^k$, where $\hat x = a_k(\tilde x)$ for $\tilde x \in \tilde T \subset \Gamma_h$. Employing (2.6) and the smoothness of Γ, we have

$|\nu(\hat x) - \nu_h^k(\hat x)| \le |\nu(a_k(\tilde x)) - \nu(a(\tilde x))| + |\nu(a(\tilde x)) - \nu_h^k(a_k(\tilde x))| \le C(\Gamma)h^{k+1} + |\nu(a(\tilde x)) - \nu_h^k(a_k(\tilde x))|.$

Assuming without loss of generality that $\tilde T$ lies in the $x_1, \dots, x_n$-hyperplane, we next note that $\nu(a(\tilde x))$ is the outward-facing unit vector orthogonal to $a_{x_1}, \dots, a_{x_n}$ and $\nu_h^k(a_k(\tilde x))$ is the outward-facing unit vector orthogonal to $a_{k,x_1}, \dots, a_{k,x_n}$. From (2.7) we have $|a_{x_i} - a_{k,x_i}| \le Ch^k$, and it is also not difficult to compute that $|a_{x_i}|$ is bounded from above and below independent of h for 1 ≤ i ≤ n. Using these facts, one may then compute in an elementary fashion that $|\nu(a(\tilde x)) - \nu_h^k(a_k(\tilde x))| \le Ch^k$, for example, by using the Gram–Schmidt orthonormalization algorithm. Inequality (2.9) may be proved in a similar fashion after noting that $\|a_{x_ix_j} - a_{k,x_ix_j}\|_{L_\infty(\hat T)} \le Ch^{k-1}$ for any element $\hat T \subset \Gamma_h^k$.
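The rate $h^{k+1}$ in (2.6) can be observed numerically in a simplified setting. The sketch below is an illustration under stated assumptions rather than part of the analysis above: it replaces the surface by the unit circle in $\mathbb{R}^2$ (a curve, so n = 1, which is outside the cases n = 2, 3 treated here), uses equally spaced Lagrange nodes on a single chord, and estimates $\|a - a_k\|_{L_\infty}$ by sampling. All function names are hypothetical.

```python
import math

def lagrange_basis(nodes, j, t):
    """Value at t of the j-th Lagrange basis polynomial for the given nodes."""
    value = 1.0
    for m, tm in enumerate(nodes):
        if m != j:
            value *= (t - tm) / (nodes[j] - tm)
    return value

def closest_point(p):
    """Projection a(x) = x - d(x) nu(x) for the unit circle, i.e. x / |x|."""
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)

def interpolation_error(theta, k, samples=400):
    """Return (h, max |a - a_k|) on one chord subtending the angle theta."""
    p0 = (1.0, 0.0)
    p1 = (math.cos(theta), math.sin(theta))

    def chord(t):
        return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))

    nodes = [i / k for i in range(k + 1)]          # equally spaced nodes
    lifted = [closest_point(chord(t)) for t in nodes]
    worst = 0.0
    for s in range(samples + 1):
        t = s / samples
        ax, ay = closest_point(chord(t))
        weights = [lagrange_basis(nodes, j, t) for j in range(k + 1)]
        akx = sum(w * q[0] for w, q in zip(weights, lifted))
        aky = sum(w * q[1] for w, q in zip(weights, lifted))
        worst = max(worst, math.hypot(ax - akx, ay - aky))
    h = math.hypot(p1[0] - p0[0], p1[1] - p0[1])   # chord length
    return h, worst

def observed_rate(k, theta=0.4):
    """Estimated convergence order of |a - a_k| when the chord is halved."""
    h1, e1 = interpolation_error(theta, k)
    h2, e2 = interpolation_error(theta / 2.0, k)
    return math.log(e1 / e2) / math.log(h1 / h2)
```

Calling `observed_rate(k)` for small k gives a rough check that the observed order is close to k + 1; the estimate is only asymptotic, so some deviation at moderate h is expected.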
Remark 2.4. Because $H_h^k$ involves the second derivatives of a $C^0$ interpolant, it is only defined elementwise. However, for k ≥ 2 a pointwise definition of $H_h^k$ on an element interface may be defined by taking the limit of $H_h^k$ as the interface is approached from any adjacent element. Stitching these elementwise approximations together yields a global, piecewise continuous curvature approximation with $O(h^{k-1})$ error. In particular, while $H_h^k$ viewed globally is a distribution with singular jump terms on element interfaces, it is not necessary to take these jump terms into account in order to obtain a convergent pointwise curvature approximation for higher-order discrete surfaces.

2.4. The correspondence between $\Gamma_h$, $\Gamma_h^k$, and Γ. Our analysis requires a number of relationships between functions defined on Γ and $\Gamma_h^k$, as in [Dz88] and [DD07]. In addition, proving approximation results for the parametric finite element spaces $S_{hk}^r$ will require establishing similar relationships between functions defined on $\Gamma_h^k$ and $\Gamma_h$.

We first establish relationships between functions defined on the continuous surface Γ and the discrete surfaces $\Gamma_h^k$. Let $v \in H^1(\Gamma)$ and define the extension $v^\ell(x) = v(a(x))$ for x ∈ U. For $v_h \in H^1(\Gamma_h^k)$ we define the lift $\tilde v_h \in H^1(\Gamma)$ by $\tilde v_h(a(\tilde x)) = v_h(\tilde x)$, $\tilde x \in \Gamma_h^k$. For $v_h \in H^1(\Gamma_h^k)$, we then define the extension $v_h^\ell(x) = \tilde v_h(a(x))$ for any x ∈ U. Also, for $\hat x \in \Gamma_h^k$ let $\mu_{hk}(\hat x)$ satisfy $\mu_{hk}(\hat x)\,d\sigma_{hk}(\hat x) = d\sigma(a(\hat x))$, where dσ and $d\sigma_{hk}$ are surface measures on Γ and $\Gamma_h^k$, respectively.

Proposition 2.5. Let $\hat x \in \Gamma_h^k$ and n = 2, 3. Then

(2.10) $\mu_{hk}(\hat x) = \nu(\hat x)\cdot\nu_h^k(\hat x)\prod_{i=1}^n(1 - d(\hat x)\kappa_i(\hat x)).$
Remark 2.6. For x ∈ U, $\kappa_i(x) = \frac{\kappa_i(a(x))}{1 + d(x)\kappa_i(a(x))}$; cf. [GT98], [DD07].

Proof. Equation (2.10) is proved in [DD07] for n = 2 using properties of the cross product, so we sketch a proof for n = 3. Let $\hat T \subset \mathbb{R}^n$ be a reference simplex. Let also $f = a_k \circ L : \hat T \to \tilde T \subset \Gamma_h^k$, where $\tilde T = a_k(T)$ for $T \in \tilde{\mathcal{T}}_h$ and $L : \hat T \to T$ is one of the obvious natural linear transformations. Let f have Jacobian $F \in \mathbb{R}^{(n+1)\times n}$ with singular values $\sigma_1, \dots, \sigma_n$ and singular value decomposition $F = U\Sigma V^T$. Here U has orthonormal columns $u_1, \dots, u_n, \nu_h^k$, $\Sigma \in \mathbb{R}^{(n+1)\times n}$, and $V \in \mathbb{R}^{n\times n}$ is orthogonal. Let dx be a Lebesgue measure on $\hat T$. First we compute $d\sigma_{hk} = |\prod_{i=1}^n\sigma_i|\,dx$ and

$d\sigma = |\det[(P - dH)F\ \ \nu]|\,dx = \Big[\prod_{i=1}^n(1 - d\kappa_i)\Big]\,|\det[PF\ \ \nu]|\,dx.$

But $|\det[PF\ \ \nu]| = \sqrt{\det F^TPPF}$. For n = 2, 3, a short computation involving the singular value decomposition yields $\sqrt{\det F^TPPF} = \nu\cdot\nu_h^k\,|\prod_{i=1}^n\sigma_i|$, which completes the proof.

Next we state identities regarding tangential gradients on Γ, $\Gamma_h$, and $\Gamma_h^k$ (cf. [Dz88], [DD07]). For $v_h \in H^1(\Gamma_h^k)$, $v \in H^1(\Gamma)$, and $\hat x \in \Gamma_h^k$,

(2.11) $\nabla_{\Gamma_h^k}v^\ell(\hat x) = [P_{h,k}(\hat x)][(I - dH)(\hat x)][P(\hat x)]\nabla_\Gamma v(a(\hat x)),$
(2.12) $\nabla_\Gamma v_h^\ell(a(\hat x)) = [(I - dH)(\hat x)]^{-1}\Big[I - \frac{\nu_h^k(\hat x)\otimes\nu(\hat x)}{\nu_h^k(\hat x)\cdot\nu(\hat x)}\Big]\nabla_{\Gamma_h^k}v_h(\hat x).$

Here $P_{h,k} = I - \nu_h^k\otimes\nu_h^k$ is the projection onto the tangent space of $\Gamma_h^k$. Letting

(2.13) $A_\Gamma(a(\hat x)) = \frac{1}{\mu_{hk}(\hat x)}P(\hat x)[I - d(\hat x)H(\hat x)]P_{h,k}(\hat x)[I - d(\hat x)H(\hat x)]P(\hat x)$

for $\hat x \in \Gamma_h^k$, (2.11) also yields the integral equality

(2.14) $\int_{\Gamma_h^k}\nabla_{\Gamma_h^k}u_h\,\nabla_{\Gamma_h^k}v_h\,d\sigma_{hk} = \int_\Gamma A_\Gamma\nabla_\Gamma u_h^\ell\,\nabla_\Gamma v_h^\ell\,d\sigma.$
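We also shall need to compare Sobolev norms of functions defined on Γ and $\Gamma_h^k$. Let $v \in W_p^j(\Gamma)$ with j ≥ 0 and 1 ≤ p ≤ ∞. Then there exist constants $C_j$ depending on j and Γ such that for h small enough,

(2.15) $\frac{1}{C_0}\|v\|_{L_p(\Gamma)} \le \|v^\ell\|_{L_p(\Gamma_h^k)} \le C_0\|v\|_{L_p(\Gamma)},$
(2.16) $\frac{1}{C_1}\|\nabla_\Gamma v\|_{L_p(\Gamma)} \le \|\nabla_{\Gamma_h^k}v^\ell\|_{L_p(\Gamma_h^k)} \le C_1\|\nabla_\Gamma v\|_{L_p(\Gamma)},$
(2.17) $\|D_{\Gamma_h^k}^j v^\ell\|_{L_p(\Gamma_h^k)} \le C_j\sum_{1\le m\le j}\|D_\Gamma^m v\|_{L_p(\Gamma)}.$

The first two inequalities follow from (2.11) and (2.12) along with the equivalence of dσ and $d\sigma_{hk}$ for h small enough. Inequality (2.17) follows from repeated application of (2.11), Proposition 2.3, and the equivalence of dσ and $d\sigma_{hk}$.

Next we establish analogues of (2.15), (2.16), and (2.17) for functions defined on $\Gamma_h^k$ and $\Gamma_h$. In particular, let $\tilde T$ be a triangular face of $\Gamma_h$, and let $\hat T = a_k(\tilde T) \subset \Gamma_h^k$. Let also v be defined and piecewise smooth on $\Gamma_h^k$, and for $\tilde x \in \tilde T$ let $\tilde v(\tilde x) = v(a_k(\tilde x))$. Then there exist positive constants $C_{i,j}$ such that for h small enough,

(2.18) $\frac{1}{C_{0,k}}\|v\|_{L_p(\hat T)} \le \|\tilde v\|_{L_p(\tilde T)} \le C_{0,k}\|v\|_{L_p(\hat T)},$
(2.19) $\frac{1}{C_{1,k}}\|\nabla_{\Gamma_h^k}v\|_{L_p(\hat T)} \le \|\nabla_{\Gamma_h}\tilde v\|_{L_p(\tilde T)} \le C_{1,k}\|\nabla_{\Gamma_h^k}v\|_{L_p(\hat T)},$
(2.20) $\|D_{\Gamma_h}^j\tilde v\|_{L_p(\tilde T)} \le C_{j,k}\sum_{1\le m\le j}\|D_{\Gamma_h^k}^m v\|_{L_p(\hat T)}.$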
We briefly discuss the proof of the above inequalities. Because the transformation $\tilde x \to a_k(\tilde x)$ is the Lagrange interpolant of $\tilde x \to a(\tilde x)$, $\|a_k\|_{W_\infty^m(\tilde T)} \le C\|a\|_{W_\infty^m(\tilde T)} \le C$ for m ≥ 0 and h small enough. Let $\tilde\mu_{hk}$ be defined by $\tilde\mu_{hk}(\tilde x)\,d\sigma_{h1} = d\sigma_{hk}(a_k(\tilde x))$, $\tilde x \in \Gamma_h$. Then $|\mu_{h1} - \tilde\mu_{hk}| \le Ch^k$, so that $\tilde\mu_{hk} \approx 1$ for h small enough. These two facts taken together immediately give (2.18), (2.20), and the second inequality in (2.19).

In order to establish the first inequality in (2.19), assume for simplicity that n = 2 and $\tilde T$ lies in the xy-plane. The general case follows by employing an appropriate coordinate transformation and making the obvious adjustments if n = 3. We have

(2.21) $\nabla_{\Gamma_h}\tilde v(\tilde x) = \nabla_{\Gamma_h}v(a_k(\tilde x)) = [a_{k,x}\ \ a_{k,y}\ \ 0]^T\nabla_{\Gamma_h^k}v(a_k(\tilde x)) = \big([a_{k,x}\ \ a_{k,y}\ \ 0]^T + \nu_h^k\otimes\nu_h^k\big)\nabla_{\Gamma_h^k}v(a_k(\tilde x)).$

Let $A = [a_{k,x}(\tilde x)\ \ a_{k,y}(\tilde x)\ \ 0]^T + \nu_h^k(\tilde x)\otimes\nu_h^k(\tilde x)$ and $B = (I - dH)(\tilde x) = \nabla a + \nu\otimes\nu$ for $\tilde x \in \Gamma_h$, and let $\|\cdot\|_2$ be the matrix 2-norm. We first use the fact that $\nabla a = P - dH$ to calculate that $|a_z| = |\nabla a\cdot\nu_{h1}| = |\nabla a\cdot(\nu_{h1} - \nu)| \le Ch$. In addition, $|a_{k,x} - a_x| + |a_{k,y} - a_y| \le Ch^k$. Next we note that since B is defined on $\Gamma_h$ and approaches the identity as $\mathrm{dist}(\Gamma_h, \Gamma) \to 0$, $\|B\|_2 + \|B^{-1}\|_2 \le C$ for h small enough. Thus employing (2.8), we have (again for h small enough) that

(2.22) $\|A^{-1}\|_2 \le \|A^{-1} - B^{-1}\|_2 + \|B^{-1}\|_2 \le \|A^{-1}\|_2\|B - A\|_2\|B^{-1}\|_2 + C \le Ch\|A^{-1}\|_2 + C \le C.$
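The steps compressed into (2.22) can be written out as follows. This is a sketch of the standard argument, with the explicit threshold $C_* h \le 1/2$ introduced here as an assumption on how small h must be; $C_*$ and $C_B$ are labels added for this illustration only.

```latex
% Step 1: resolvent-type identity and submultiplicativity of the 2-norm.
A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}
\quad\Longrightarrow\quad
\|A^{-1} - B^{-1}\|_2 \le \|A^{-1}\|_2 \, \|B - A\|_2 \, \|B^{-1}\|_2 .

% Step 2: with \|B - A\|_2 \|B^{-1}\|_2 \le C_* h and \|B^{-1}\|_2 \le C_B,
\|A^{-1}\|_2 \le C_* h \, \|A^{-1}\|_2 + C_B .

% Step 3: absorb the first term on the right when C_* h \le 1/2.
(1 - C_* h)\,\|A^{-1}\|_2 \le C_B
\quad\Longrightarrow\quad
\|A^{-1}\|_2 \le 2 C_B .
```

The absorption in Step 3 is the reason the bound is stated only for h small enough.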
Multiplying (2.21) through by $A^{-1}$, inserting (2.22) into (2.21), and employing the equivalence of $d\sigma_h$ and $d\sigma_{hk}$ yields the first inequality in (2.19).

2.5. Finite element spaces and approximation theory. We begin by defining a family of Lagrange finite element spaces on $\Gamma_h$. Let

$\tilde S_h^r = \{\tilde\chi \in C^0(\Gamma_h) : \tilde\chi|_{\tilde T} \in P_r\ \forall\,\tilde T \in \tilde{\mathcal{T}}_h\},$

where r ≥ 1 and $P_r$ is the set of polynomials in n variables of degree r or less. We next define the family $\hat S_{hk}^r$ on $\Gamma_h^k$ by

$\hat S_{hk}^r = \{\hat\chi \in C^0(\Gamma_h^k) : \hat\chi = \tilde\chi\circ a_k^{-1} \text{ for some } \tilde\chi \in \tilde S_h^r\}.$

$\hat S_{hk}^r$ is an isoparametric finite element space if k = r, subparametric if k < r, and superparametric if k > r. We finally define the corresponding lifted spaces on Γ,

$S_h^r = \{\chi \in C^0(\Gamma) : \chi = \tilde\chi^\ell \text{ for some } \tilde\chi \in \tilde S_h^r\}$

and

$S_{hk}^r = \{\chi \in C^0(\Gamma) : \chi = \hat\chi^\ell \text{ for some } \hat\chi \in \hat S_{hk}^r\}.$

Note that because $a\circ a_k \ne a$, $S_{hk}^r \ne S_h^r$.

Next we state results concerning finite element approximation theory. We only consider Lagrange-type interpolants as we only need to approximate functions which are sufficiently smooth ($H_2^2$) to guarantee the availability of point values for n ≤ 3. For $v \in H_2^2(\Gamma)$, we define the interpolant $I_{h1} = I_h : C^0(\Gamma) \to S_h^r$ by

$I_h v = (\tilde I_h v^\ell)^\ell,$

where $\tilde I_h : C^0(\Gamma_h) \to \tilde S_h^r$ is the standard Lagrange interpolant. We also define the interpolant $\hat I_{hk} : C^0(\Gamma_h^k) \to \hat S_{hk}^r$ by $\hat I_{hk}v(x) = \tilde I_h v(a_k^{-1}(x))$, and $I_{hk}v = (\hat I_{hk}v^\ell)^\ell$. Note that $I_h \ne I_{hk}$ since $a\circ a_k(x) \ne a(x)$ for $x \in \Gamma_h$. This is the case even though the nodal points lying on Γ (and thus nodal values) of the two interpolants are the same.

At several points in our presentation we will consider subdomains D ⊂ Γ. Let $D_h = \mathrm{int}(\cup_{T\in\mathcal{T}_h,\,T\cap D\ne\emptyset}T)$ and $D_h^k = \mathrm{int}(\cup_{T\in\mathcal{T}_h^k,\,T\cap D\ne\emptyset}T)$. Also, for a given parameter γ ≥ h, we let $D_\gamma = \{x \in \Gamma : \mathrm{dist}_\Gamma(x, D) < \gamma\}$. We shall need the following approximation and superapproximation results.

Proposition 2.7. Assume that $v \in W_p^{r+1}(\Gamma)$ for some 2 ≤ p ≤ ∞, let h be small enough, and let D ⊂ Γ. Assume that either $I = I_h$, $\tilde D_h = D_h$, and $S^r = S_h^r$ or $I = I_{hk}$, $\tilde D_h = D_h^k$, and $S^r = S_{hk}^r$. Then for i = 0, 1 and 2 ≤ m ≤ r + 1,

(2.23) $|v - Iv|_{W_p^i(D)} \le Ch^{m-i}\|v\|_{W_p^m(\tilde D_h)}.$
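Let also $\omega \in W_\infty^{r+1}(\Gamma)$. Then for $\chi \in S^r$,

(2.24) $\|\nabla_\Gamma(\omega\chi - I(\omega\chi))\|_{L_p(D)} \le C\Big(h^r\|\chi\|_{L_p(\tilde D_h)}\|\omega\|_{W_\infty^{r+1}(\Gamma)} + \|\nabla_\Gamma\chi\|_{L_p(\tilde D_h)}\sum_{i=1}^r h^i\|\omega\|_{W_\infty^i(\tilde D_h)}\Big).$

Finally, for any $\chi \in S^r$ and any mesh domain $\tilde D_h$,

(2.25) $\|\nabla_\Gamma\chi\|_{L_2(\tilde D_h)} \le Ch^{-1}\|\chi\|_{L_2(\tilde D_h)}.$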
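All constants above depend on sufficiently high derivatives of the distance function d.

Proof. The proof follows by combining (2.15) through (2.20) with standard estimates for the Lagrange interpolant on $\Gamma_h$ (cf. [BS02]). For example, if $I = I_{hk}$, we may prove (2.24) by letting $\tilde T$ be a face of $\Gamma_h$ and $(a\circ a_k)(\tilde T) = T \subset \Gamma$. Let $\tilde\chi(x) = \chi((a\circ a_k)(x))$ and $\tilde\omega(x) = \omega((a\circ a_k)(x))$ for $x \in \tilde T$. Inequalities (2.15) and (2.19), standard approximation and inverse results on $\tilde T$, and (2.17) and (2.20) then yield

$\|\nabla_\Gamma(\omega\chi - I_h(\omega\chi))\|_{L_p(T)} \le C_1C_{1,k}\|\nabla_{\Gamma_h}(\tilde\omega\tilde\chi - \tilde I_h(\tilde\omega\tilde\chi))\|_{L_p(\tilde T)}$
$\le Ch^r|\tilde\omega\tilde\chi|_{W_p^{r+1}(\tilde T)}$
$\le Ch^r\sum_{i=1}^{r+1}|\tilde\omega|_{W_\infty^i(\tilde T)}|\tilde\chi|_{W_p^{r+1-i}(\tilde T)}$
$\le C\Big(h^r\|\tilde\chi\|_{L_p(\tilde T)}|\tilde\omega|_{W_\infty^{r+1}(\tilde T)} + \|\nabla_{\Gamma_h}\tilde\chi\|_{L_p(\tilde T)}\sum_{i=1}^r h^i|\tilde\omega|_{W_\infty^i(\tilde T)}\Big)$
$\le CC_{r+1}C_{r+1,k}h^r\|\chi\|_{L_p(T)}\|\omega\|_{W_\infty^{r+1}(T)} + C_1C_{1,k}\|\nabla_\Gamma\chi\|_{L_p(T)}\sum_{i=1}^r h^i\|\omega\|_{W_\infty^i(T)}.$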
Summing over T ∩ D ≠ ∅ completes the proof of (2.24). The rest of Proposition 2.7 is proved in a similar fashion, with obvious slight simplifications when $I = I_h$.

The proofs of our pointwise estimates also employ a discrete δ-function.

Proposition 2.8. Let $S^r = S_h^r$ or $S^r = S_{hk}^r$, let x ∈ T ⊂ Γ with T a surface triangle in either $\mathcal{T}_h$ or $\mathcal{T}_h^k$, and let $\mathbf{n}$ be a unit vector lying in the tangent plane to Γ at x. Then there exist $\delta_x \in C_0^\infty(T)$ and $\tilde\delta_x \in [C_0^\infty(T)]^{n+1}$ such that

(2.26) $\|\delta_x\|_{W_p^j(T)} + \|\tilde\delta_x\|_{W_p^j(T)} \le Ch^{-j-n+\frac{n}{p}}$

for j = 0, 1 and 1 ≤ p ≤ ∞, and for any $\chi \in S^r$,

(2.27) $|\chi(x)| \le C\Big|\int_T\delta_x\chi\,d\sigma\Big|,$
(2.28) $|\nabla_\Gamma\chi(x)\cdot\mathbf{n}| \le C\Big|\int_T\chi\,\nabla_\Gamma\cdot\tilde\delta_x\,d\sigma\Big|.$
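Proof. We prove (2.28) when $S^r = S_h^r$; the other cases are similar. Assume $x = a(\tilde x)$ for $\tilde x \in \tilde T \in \tilde{\mathcal{T}}_h$, and $T = a(\tilde T)$. Then employing (2.12), we have

$|\nabla_\Gamma\chi(x)\cdot\mathbf{n}| = \Big|[(I - dH)(\tilde x)]^{-1}\Big[I - \frac{\nu_h(\tilde x)\otimes\nu(\tilde x)}{\nu_h(\tilde x)\cdot\nu(\tilde x)}\Big]\nabla_{\Gamma_h}\chi^\ell(\tilde x)\cdot\mathbf{n}\Big| \le C|\nabla_{\Gamma_h}\chi^\ell(\tilde x)\cdot\mathbf{n}|.$

Following [SW95], there exists a smooth function $\delta_{\tilde x}$ with support in $\tilde T$ and not dependent on χ such that $\|\delta_{\tilde x}\|_{W_p^k(\tilde T)} \le Ch^{-k-n+\frac{n}{p}}$ and $\nabla_{\Gamma_h}\chi^\ell(\tilde x)\cdot\mathbf{n} = \int_{\tilde T}\nabla_{\Gamma_h}\chi^\ell\cdot\mathbf{n}\,\delta_{\tilde x}\,d\sigma_h$. Employing (2.11) and integrating by parts yields

$\int_{\tilde T}\nabla_{\Gamma_h}\chi^\ell\cdot\mathbf{n}\,\delta_{\tilde x}\,d\sigma_h = -\int_T\chi\,\nabla_\Gamma\cdot\Big(\frac{1}{\mu_h}[I - dH][P_h]\mathbf{n}\,\delta_{\tilde x}\Big)\,d\sigma.$

Setting $\tilde\delta_x = \frac{1}{\mu_h}\delta_{\tilde x}[I - dH][P_h]\mathbf{n}$, we thus have (2.28). The proof of (2.26) is easily accomplished using (2.15) and (2.16).

2.6. Finite element methods. In this section we define two main types of finite element methods. The first type is defined on polynomial approximations of Γ using the spaces $\hat S_{hk}^r$. Dziuk’s original method in [Dz88] is a special case of this method. The second class of methods involves finite element solutions defined on Γ using the spaces $S_h^r$ and $S_{hk}^r$.

We first define $\tilde u_{hk} \in \hat S_{hk}^r$. Let $f_h \in L_2(\Gamma_h^k)$ be an approximation to f satisfying $\int_{\Gamma_h^k}f_h\,d\sigma_{hk} = 0$. Then $\tilde u_{hk} \in \hat S_{hk}^r$ uniquely satisfies $\int_{\Gamma_h^k}\tilde u_{hk}\,d\sigma_{hk} = 0$ and

(2.29) $\int_{\Gamma_h^k}\nabla_{\Gamma_h^k}\tilde u_{hk}\,\nabla_{\Gamma_h^k}v_h\,d\sigma_{hk} = \int_{\Gamma_h^k}f_hv_h\,d\sigma_{hk}\quad\forall\,v_h \in \hat S_{hk}^r.$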
Dziuk’s original method results if we take k = r = 1 and $f_h = f^\ell - \frac{1}{|\Gamma_h|}\int_{\Gamma_h}f^\ell\,d\sigma_{h1}$. Using (2.14) while recalling the definition (2.13) of $A_\Gamma$ and the definition (2.1) of L, we have the perturbed Galerkin orthogonality relationship

$L(u - \tilde u_{hk}^\ell, \chi^\ell) = \int_\Gamma(A_\Gamma - P)\nabla_\Gamma\tilde u_{hk}^\ell\,\nabla_\Gamma\chi^\ell\,d\sigma + \int_\Gamma\Big(f - \frac{f_h^\ell}{\mu_{hk}}\Big)\chi^\ell\,d\sigma,\quad\chi \in \hat S_{hk}^r.$

We next define two methods directly on Γ. The first of these methods employs the spaces $S_h^r$ that are defined by lifting polynomial spaces directly from $\Gamma_h$. In particular, let $u_{h,\Gamma} \in S_h^r$ satisfy $\int_\Gamma u_{h,\Gamma}\,d\sigma = 0$ and

(2.30) $\int_\Gamma\nabla_\Gamma u_{h,\Gamma}\,\nabla_\Gamma v_h\,d\sigma = \int_\Gamma fv_h\,d\sigma\quad\forall\,v_h \in S_h^r.$

$u_{h,\Gamma}$ satisfies the Galerkin orthogonality relationship $L(u - u_{h,\Gamma}, \chi) = 0$, $\chi \in S_h^r$. So long as one has ready access to the projection a, it is not difficult to program the method (2.30). Indeed, from (2.12) we see that (2.30) may be viewed as a finite element method over $\Gamma_h$ for an elliptic problem with nonconstant elliptic coefficient matrix. Equation (2.30) may thus be regarded as an alternative to our generalized version (2.29) of Dziuk’s method which does not involve any geometric error. We emphasize, however, that there are cases where one has access only to a polynomial approximation of Γ, and employing (2.30) is not possible in these cases.

In addition, we let $u_{hk} \in S_{hk}^r$ satisfy $\int_\Gamma u_{hk}\,d\sigma = 0$,

(2.31) $\int_\Gamma\nabla_\Gamma u_{hk}\,\nabla_\Gamma v_h\,d\sigma = \int_\Gamma fv_h\,d\sigma\quad\forall\,v_h \in S_{hk}^r.$

$u_{hk}$ satisfies the Galerkin orthogonality relationship $L(u - u_{hk}, \chi) = 0$, $\chi \in S_{hk}^r$.
We employ (2.31) only as a theoretical tool in duality arguments used to prove error bounds in non–energy norms and do not foresee any practical use for it.
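To make the structure of (2.29) concrete, the following sketch solves a one-dimensional analogue of the k = r = 1 method on the regular polygon inscribed in the unit circle. It is an illustration under several assumptions that are not part of the methods defined above: the surface is a closed curve (n = 1, outside the cases n = 2, 3 considered here), $f_h$ is taken to be the nodal interpolant of f = cos θ (for which $\sum_i\cos\theta_i = 0$, so the compatibility condition holds), the singular linear system is solved by a plain conjugate gradient iteration, and the error is measured only at the nodes. The function name and tolerance are hypothetical.

```python
import math

def solve_on_polygon(num_nodes, tol=1e-12):
    """Piecewise linear FEM on the inscribed regular polygon (1D analogue).

    Data f = cos(theta); the zero-mean solution of -u'' = f on the unit
    circle is u = cos(theta). Returns (h, max nodal error).
    """
    n = num_nodes
    h = 2.0 * math.sin(math.pi / n)            # edge length of the polygon
    theta = [2.0 * math.pi * i / n for i in range(n)]
    f_nodes = [math.cos(t) for t in theta]     # nodal values defining f_h

    def stiffness(v):   # periodic tridiagonal matrix (1/h) * [-1, 2, -1]
        return [(2.0 * v[i] - v[i - 1] - v[(i + 1) % n]) / h for i in range(n)]

    def mass(v):        # periodic tridiagonal matrix (h/6) * [1, 4, 1]
        return [h * (4.0 * v[i] + v[i - 1] + v[(i + 1) % n]) / 6.0
                for i in range(n)]

    rhs = mass(f_nodes)
    # Conjugate gradients from a zero start; the right-hand side has
    # (numerically) zero sum, so the singular system is consistent.
    u = [0.0] * n
    r = rhs[:]
    p = r[:]
    rr = sum(x * x for x in r)
    for _ in range(10 * n):
        if math.sqrt(rr) < tol:
            break
        ap = stiffness(p)
        alpha = rr / sum(a * b for a, b in zip(p, ap))
        u = [a + alpha * b for a, b in zip(u, p)]
        r = [a - alpha * b for a, b in zip(r, ap)]
        rr_new = sum(x * x for x in r)
        p = [a + (rr_new / rr) * b for a, b in zip(r, p)]
        rr = rr_new
    mean = sum(u) / n                          # enforce the zero-mean condition
    u = [x - mean for x in u]
    error = max(abs(a - math.cos(t)) for a, t in zip(u, theta))
    return h, error
```

Comparing `solve_on_polygon(16)` with `solve_on_polygon(32)` gives a rough indication of second-order decay of the nodal error in this toy setting. The nodal error is only a proxy; it is not one of the norms analyzed in section 3.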
3. Abstract error analysis. In this section we prove error estimates for surface finite element methods. Our analysis is carried out under the assumption that the approximation properties proved for the spaces $S_h^r$ and $S_{hk}^r$ in section 2.5 hold. We prove our results under general assumptions, as we wish our analysis to apply in other situations. In particular, these assumptions will hold if the approximating surfaces $\Gamma_h$ and $\Gamma_h^k$ have nodes that lie within $O(h^{k+1})$ of Γ instead of on Γ. It is reasonable to expect that this would be the case when using isoparametric spaces to compute evolving surfaces as in [Dz91], for example.

3.1. Assumptions on the finite element space and solution. We denote by $S^r$ a generic finite element space of degree r. Depending on the error estimate to be proven, we shall require some or all of the following approximation properties:

A1: Basic approximation. We assume that there exists a linear interpolation operator $I : H_2^2(\Gamma) \to S^r$ satisfying (2.23).
A2: Superapproximation. Inequality (2.24) holds for any $\chi \in S^r$.
A3: Inverse inequality. Inequality (2.25) holds for any $\chi \in S^r$.
A4: Discrete δ function. There exist discrete δ-functions satisfying the properties (2.26), (2.27), and (2.28).

Finally we assume that the finite element approximation $u_h \in S^r$ to u satisfies the perturbed Galerkin orthogonality relationship

(3.1) $\int_\Gamma\nabla_\Gamma(u - u_h)\,\nabla_\Gamma\chi\,d\sigma = F(\chi)\quad\forall\,\chi \in S^r,$

where F is assumed to be a continuous linear functional on $H^1(\Gamma)/\mathbb{R}$. Here we shall think of F as encoding a geometric error resulting from the discrete approximation of the surface Γ. Thus F ≡ 0 for the methods (2.30) and (2.31) defined directly on Γ, while for the method (2.29) defined on polynomial approximations to Γ we have $F(\chi) = \int_\Gamma(A_\Gamma - I)\nabla_\Gamma\tilde u_{hk}^\ell\,\nabla_\Gamma\chi\,d\sigma + \int_\Gamma(f - f_h^\ell/\mu_{hk})\chi\,d\sigma$. (The latter version of F is continuous on $H^1(\Gamma)/\mathbb{R}$ because $\int_\Gamma(f - f_h^\ell/\mu_{hk})\,d\sigma = 0$.) Such a linear functional F may also be employed to analyze other error sources such as the inexact evaluation of integrals due to numerical quadrature or nonlinearities (cf. the classical work [NS74] and the discussion in [De07]).

3.2. $H^1$ and $L_2$ estimates. Here we give local and global $H^1$ and $L_2$ estimates. Before doing so, we define the norms

$|||F|||_{H^{-j}} = \sup_{u \in H^j(\Gamma)/\mathbb{R},\ \|u\|_{H^j(\Gamma)/\mathbb{R}} = 1}F(u)$

and

$|||F|||_{H^{-1}(D)} = \sup_{u \in H_0^1(D),\ \|\nabla_\Gamma u\|_{L_2(D)} = 1}F(u),\quad D \subset \Gamma,$

on linear functionals $F : H^1(\Gamma)/\mathbb{R} \to \mathbb{R}$.

Theorem 3.1. Assume that $u \in H^1(\Gamma)$ and $u_h \in S^r$ satisfy $L(u - u_h, v_h) = F(v_h)$ ∀ $v_h \in S^r$, where F is a continuous linear functional on $H^1(\Gamma)/\mathbb{R}$. Then

(3.2) $\|\nabla_\Gamma u_h\|_{L_2(\Gamma)} \le \|\nabla_\Gamma u\|_{L_2(\Gamma)} + C|||F|||_{H^{-1}},$
(3.3) $\|\nabla_\Gamma(u - u_h)\|_{L_2(\Gamma)} \le \min_{\chi\in S^r}\|\nabla_\Gamma(u - \chi)\|_{L_2(\Gamma)} + C|||F|||_{H^{-1}}.$
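Let D ⊂ Γ be a subdomain, and let $Kh \le \gamma \le \gamma_\Gamma$ with K sufficiently large and $\gamma_\Gamma$ defined as in section 2.1. Then if A.1, A.2, and A.3 hold,

(3.4) $\|\nabla_\Gamma(u - u_h)\|_{L_2(D)} \le C\Big(\min_{\chi\in S^r}\Big(\|\nabla_\Gamma(u - \chi)\|_{L_2(D_\gamma)} + \frac{1}{\gamma}\|u - \chi\|_{L_2(D_\gamma)}\Big) + \frac{1}{\gamma}\|u - u_h\|_{L_2(D_\gamma)} + |||F|||_{H^{-1}(D_\gamma)}\Big).$

Finally, let $\overline{u - u_h} = \frac{1}{|\Gamma|}\int_\Gamma(u - u_h)\,d\sigma$. Then if A.1 is satisfied,

(3.5) $\|u - u_h - \overline{u - u_h}\|_{L_2(\Gamma)} \le C\Big(h\min_{\chi\in S^r}\|\nabla(u - \chi)\|_{H^1(\Gamma)} + h|||F|||_{H^{-1}} + |||F|||_{H^{-2}}\Big).$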
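Proof. In order to prove (3.2), we calculate that
\[ \|\nabla_\Gamma u_h\|^2_{L_2(\Gamma)} = \int_\Gamma \nabla_\Gamma u \nabla_\Gamma u_h \, d\sigma - F(u_h) \le \|\nabla_\Gamma u\|_{L_2(\Gamma)}\|\nabla_\Gamma u_h\|_{L_2(\Gamma)} + |||F|||_{H^{-1}}\|u_h\|_{H^1(\Gamma)/\mathbb{R}} \le (\|\nabla_\Gamma u\|_{L_2(\Gamma)} + C|||F|||_{H^{-1}})\|\nabla_\Gamma u_h\|_{L_2(\Gamma)}, \]
where $C$ arises from a Poincaré inequality. Dividing through by $\|\nabla_\Gamma u_h\|_{L_2(\Gamma)}$ completes the proof of (3.2). Inequality (3.3) may be proved by writing $u - u_h = (u - \chi) - (u_h - \chi)$.

We next prove (3.4). Let $\{D_i\}_{i=1}^N$ be a cover of $D$ consisting of balls of radius $\gamma/4$, and let $D_{i,\gamma/2} = \{x \in \Gamma : \mathrm{dist}_\Gamma(x, D_i) < \gamma/4\}$. We may choose the cover so that the balls $D_{i,\gamma/2}$ have finite overlap. Finally let $\omega_i \in C_0^\infty(D_{i,\gamma/2})$ with $\omega_i|_{D_i} \equiv 1$ and $\|\omega_i\|_{W_\infty^j(\Gamma)} \le C\gamma^{-j}$, $0 \le j \le r + 1$. Such a cutoff function $\omega$ exists for $\gamma \le \gamma_\Gamma$. Fixing $\chi \in S^r$, we set $\psi_i = \omega_i^2(\chi - u_h)$ and compute
\[ \begin{aligned} \|\nabla_\Gamma (u - u_h)\|^2_{L_2(D)} &\le \sum_{i=1}^N L(\omega_i(u - u_h), \omega_i(u - u_h)) \\ &= \sum_{i=1}^N \Big[ L(u - u_h, \omega_i^2(u - u_h)) + \int_{D_{i,\gamma/2}} |\nabla_\Gamma \omega_i|^2 (u - u_h)^2 \, d\sigma \Big] \\ &\le \sum_{i=1}^N \big[ L(u - u_h, \omega_i^2(u - \chi)) + L(u - u_h, \psi_i - I\psi_i) + F(I\psi_i) \big] + \frac{C}{\gamma^2}\|u - u_h\|^2_{L_2(D_\gamma)}. \end{aligned} \tag{3.6} \]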
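The equality in the second line of (3.6) is a product-rule computation that the text leaves implicit. The following sketch writes it out with the shorthand $e = u - u_h$, under the assumption (taken from the way $L$ is used in this section; its definition in section 2 is not reproduced here) that $L(v, w) = \int_\Gamma \nabla_\Gamma v \cdot \nabla_\Gamma w \, d\sigma$.

```latex
\begin{align*}
L(\omega_i e, \omega_i e)
  &= \int_\Gamma \big|\omega_i \nabla_\Gamma e + e \nabla_\Gamma \omega_i\big|^2 \, d\sigma
   = \int_\Gamma \Big( \omega_i^2 |\nabla_\Gamma e|^2
     + 2 \omega_i e \, \nabla_\Gamma e \cdot \nabla_\Gamma \omega_i
     + e^2 |\nabla_\Gamma \omega_i|^2 \Big) \, d\sigma, \\
L(e, \omega_i^2 e)
  &= \int_\Gamma \nabla_\Gamma e \cdot \big( \omega_i^2 \nabla_\Gamma e
     + 2 \omega_i e \, \nabla_\Gamma \omega_i \big) \, d\sigma, \\
L(\omega_i e, \omega_i e) - L(e, \omega_i^2 e)
  &= \int_\Gamma |\nabla_\Gamma \omega_i|^2 e^2 \, d\sigma .
\end{align*}
```

Since $\omega_i$ is supported in $D_{i,\gamma/2}$, the last integral may be restricted to that set, and $|\nabla_\Gamma \omega_i| \le C\gamma^{-1}$ together with the finite overlap of the sets $D_{i,\gamma/2}$ accounts for the final term $C\gamma^{-2}\|u - u_h\|^2_{L_2(D_\gamma)}$.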
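Next we bound the terms in the last sum in (3.6). For any $1 \ge \epsilon > 0$,
\[ \begin{aligned} L(u - u_h, \omega_i^2(u - \chi)) &= \int_\Gamma \nabla_\Gamma(\omega_i(u - u_h))\,[\omega_i \nabla_\Gamma(u - \chi) + 2(u - \chi)\nabla_\Gamma \omega_i] \, d\sigma \\ &\quad - \int_\Gamma \omega_i (u - u_h)\nabla_\Gamma \omega_i \nabla_\Gamma (u - \chi) \, d\sigma - 2\int_\Gamma |\nabla_\Gamma \omega_i|^2 (u - u_h)(u - \chi) \, d\sigma \\ &\le \epsilon\|\nabla_\Gamma(\omega_i(u - u_h))\|^2_{L_2(\Gamma)} + \frac{C}{\epsilon}\|\nabla_\Gamma(u - \chi)\|^2_{L_2(D_{i,\gamma/2})} \\ &\quad + \frac{C}{\epsilon\gamma^2}\big(\|u - u_h\|^2_{L_2(D_{i,\gamma/2})} + \|u - \chi\|^2_{L_2(D_{i,\gamma/2})}\big). \end{aligned} \tag{3.7} \]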
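Applying (2.24) and (2.25) while recalling that $h \le \gamma$ and $\|\omega_i\|_{W_\infty^j(\Gamma)} \le C\gamma^{-j}$ yields
\[ \begin{aligned} \|\nabla_\Gamma(\psi_i - I_h\psi_i)\|_{L_2(\Gamma)} &\le C\frac{h}{\gamma}\Big( \frac{1}{\gamma}\|\chi - u_h\|_{L_2((D_{i,\gamma/4})_h)} + \|\nabla_\Gamma(\chi - u_h)\|_{L_2((D_{i,\gamma/4})_h)} \Big) \\ &\le \frac{C}{\gamma}\big( \|u - \chi\|_{L_2(D_{i,\gamma/2})} + \|u - u_h\|_{L_2(D_{i,\gamma/2})} \big). \end{aligned} \tag{3.8} \]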
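Applying the first line of the previous inequality, we find
\[ L(u - u_h, \psi_i - I_h\psi_i) \le C\frac{h}{\gamma}\|\nabla_\Gamma(u - u_h)\|^2_{L_2(D_{i,\gamma/2})} + C\|\nabla_\Gamma(u - \chi)\|^2_{L_2(D_{i,\gamma/2})} + \frac{C}{\gamma^2}\big( \|u - u_h\|^2_{L_2(D_{i,\gamma/2})} + \|u - \chi\|^2_{L_2(D_{i,\gamma/2})} \big). \tag{3.9} \]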
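Applying the second line of (3.8) and noting that $\|\nabla_\Gamma \psi_i\|_{L_2(D_{i,\gamma/2})} \le \|\nabla_\Gamma(u - \chi)\|_{L_2(D_{i,\gamma/2})} + \|\nabla_\Gamma(\omega_i(u - u_h))\|_{L_2(D_{i,\gamma/2})} + \frac{1}{\gamma}\|u - u_h\|_{L_2(D_{i,\gamma/2})}$, we finally compute
\[ \begin{aligned} \sum_{i=1}^N F(I\psi_i) = F\Big( \sum_{i=1}^N I\psi_i \Big) &\le |||F|||_{H^{-1}(D_{\gamma/2})} \sum_{i=1}^N \|\nabla_\Gamma I\psi_i\|_{L_2(D_{i,\gamma/2})} \\ &\le |||F|||_{H^{-1}(D_{\gamma/2})} \sum_{i=1}^N \big( \|\nabla_\Gamma(I\psi_i - \psi_i)\|_{L_2(D_{i,\gamma/2})} + \|\nabla_\Gamma \psi_i\|_{L_2(D_{i,\gamma/2})} \big) \\ &\le \frac{C}{\epsilon}|||F|||^2_{H^{-1}(D_{\gamma/2})} + \frac{C}{\gamma^2}\big( \|u - \chi\|^2_{L_2(D_{\gamma/2})} + \|u - u_h\|^2_{L_2(D_{\gamma/2})} \big) \\ &\quad + C\|\nabla_\Gamma(u - \chi)\|^2_{L_2(D_{\gamma/2})} + \epsilon\sum_{i=1}^N \|\nabla_\Gamma(\omega_i(u - u_h))\|^2_{L_2(\Gamma)}. \end{aligned} \tag{3.10} \]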
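Combining (3.7), (3.9), and (3.10) into (3.6) yields
\[ \begin{aligned} \sum_{i=1}^N \|\nabla_\Gamma(\omega_i(u - u_h))\|^2_{L_2(D_{i,\gamma/2})} &\le C(\epsilon)\Big[ \frac{1}{\gamma^2}\big( \|u - \chi\|^2_{L_2(D_{\gamma/2})} + \|u - u_h\|^2_{L_2(D_{\gamma/2})} \big) + \|\nabla_\Gamma(u - \chi)\|^2_{L_2(D_{\gamma/2})} + |||F|||^2_{H^{-1}(D_{\gamma/2})} \Big] \\ &\quad + \frac{Ch}{\gamma}\|\nabla_\Gamma(u - u_h)\|^2_{L_2(D_{\gamma/2})} + 2\epsilon\sum_{i=1}^N \|\nabla_\Gamma(\omega_i(u - u_h))\|^2_{L_2(D_{i,\gamma/2})}. \end{aligned} \tag{3.11} \]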
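The last term in (3.11) may be kicked back by taking $\epsilon = \frac14$, yielding
\[ \|\nabla_\Gamma(u - u_h)\|^2_{L_2(D)} \le C\Big[ \frac{1}{\gamma^2}\big( \|u - \chi\|^2_{L_2(D_{\gamma/2})} + \|u - u_h\|^2_{L_2(D_{\gamma/2})} \big) + \|\nabla_\Gamma(u - \chi)\|^2_{L_2(D_{\gamma/2})} + |||F|||^2_{H^{-1}(D_{\gamma/2})} + \frac{h}{\gamma}\|\nabla_\Gamma(u - u_h)\|^2_{L_2(D_{\gamma/2})} \Big]. \tag{3.12} \]
The term $\frac{h}{\gamma}\|\nabla_\Gamma(u - u_h)\|^2_{L_2(D_{\gamma/2})}$ above may be eliminated by iterating (3.12) with $D_{\gamma/2}$ and $D_\gamma$ replacing $D$ and $D_{\gamma/2}$, respectively. This results in a term $\frac{h^2}{\gamma^2}\|\nabla_\Gamma(u - \chi) + \nabla_\Gamma(\chi - u_h)\|^2_{L_2(D_\gamma)}$ which may be eliminated by using the triangle inequality and an inverse inequality.

In order to prove (3.5), we first let $z \in H^1(\Gamma)$ solve $L(v, z) = (v, e - \bar e)_\Gamma$, $\int_\Gamma z \, d\sigma = 0$, where $e = u - u_h$ and $\bar e = \overline{u - u_h}$. Then using (2.23), (2.2), and (3.3) yields
\[ \begin{aligned} \|e - \bar e\|^2_{L_2(\Gamma)} &= (e - \bar e, -\Delta_\Gamma z) = L(e, z - I_h z) + F(I_h z - z) + F(z) \\ &\le C\|\nabla_\Gamma e\|_{L_2(\Gamma)}\|\nabla_\Gamma(z - I_h z)\|_{L_2(\Gamma)} + |||F|||_{H^{-1}}\|z - I_h z\|_{H^1(\Gamma)} + |||F|||_{H^{-2}}\|z\|_{H^2(\Gamma)} \\ &\le C\Big( h\min_{\chi \in S^r}\|\nabla_\Gamma(u - \chi)\|_{L_2(\Gamma)} + h|||F|||_{H^{-1}} + |||F|||_{H^{-2}} \Big)\|z\|_{H^2(\Gamma)} \\ &\le C\Big( h\min_{\chi \in S^r}\|\nabla_\Gamma(u - \chi)\|_{L_2(\Gamma)} + h|||F|||_{H^{-1}} + |||F|||_{H^{-2}} \Big)\|e - \bar e\|_{L_2(\Gamma)}. \end{aligned} \]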
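Dividing through by $\|e - \bar e\|_{L_2(\Gamma)}$ completes the proof.

3.3. Pointwise estimates: Statement of results. In this subsection we state pointwise stability and error estimates. Following [Sch98], let $\sigma_x(y) = \frac{h}{\alpha(x,y) + h}$, where we recall that $\alpha(x, y)$ is the surface distance on $\Gamma$. We then define the weighted norm
\[ \|u\|_{W_p^j, x, s} = \sum_{0 \le |\alpha| \le j} \|\sigma_x^s D^\alpha u\|_{L_p(\Gamma)}. \]
Letting $q$ be the conjugate exponent to $p$, we define the weighted norm
\[ |||F|||_{W_p^{-j}, x, s} = \sup_{\|v\|_{W_q^j, x, -s} = 1} F(v). \tag{3.13} \]
We shall drop the subscripts $x$ and $s$ in (3.13) when $s = 0$.

Theorem 3.2. Let $0 \le s \le r - 1$ and $0 \le t \le r$, and assume that A1, A2, A3, and A4 all hold. Then for any $x \in \Gamma$,
\[ |(u - u_h - \overline{u - u_h})(x)| \le C\ell_{h,s}\inf_{\chi \in S^r}\big( h\|\nabla_\Gamma(u - \chi)\|_{L_\infty, x, s} + \|u - \chi\|_{L_\infty, x, s} \big) + C\big( h\ell_{h,s}|||F|||_{W_\infty^{-1}, x, s} + \ell_h|||F|||_{W_\infty^{-2}} \big), \tag{3.14} \]
and
\[ |\nabla_\Gamma u_h(x)| \le C\big( \ell_{h,t}\|\nabla_\Gamma u\|_{L_\infty, x, t} + \ell_h|||F|||_{W_\infty^{-1}} \big), \tag{3.15} \]
\[ |\nabla_\Gamma(u - u_h)(x)| \le C\big( \ell_{h,t}\inf_{\chi \in S^r}\|\nabla_\Gamma(u - \chi)\|_{L_\infty, x, t} + \ell_h|||F|||_{W_\infty^{-1}} \big). \tag{3.16} \]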
Here $\ell_h = \ln\frac{1}{h}$, $\ell_{h,t} = \ell_h$ if $t = r$ and $\ell_{h,t} = 1$ otherwise, and $\ell_{h,s} = \ell_h$ if $s = r - 1$ and $\ell_{h,s} = 1$ otherwise. Taking $s = t = 0$ and taking a maximum of (3.14) and (3.16) over $\Gamma$ yields quasi-optimal $L_\infty$ and $W_\infty^1$ error estimates, modulo analysis of perturbation terms involving $F$. When $s > 0$ (3.14) shows that the pointwise gradient error at $x$ is localized to $x$ in that the weight $\sigma_x^s$ deemphasizes the approximation error $\nabla(u - \chi)(y)$ by a factor of $h^s$ when $\alpha(x, y) \approx 1$. No localization occurs in errors for function values in the piecewise linear case as $s = r - 1 = 0$ in this case (cf. [De04] for a counterexample). Note that (3.14) and (3.16) are very similar to the results in [Sch98] for domains in $\mathbb{R}^n$. Details peculiar to the fact that we are working on surfaces are hidden in the functional $F$.
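The localization effect of the weight can be seen numerically. The following is a minimal sketch, not part of the original analysis: it evaluates $\sigma_x(y)^s = (h/(\alpha(x,y)+h))^s$ at surface distance 0 and at surface distance 1, and prints the logarithmic factor $\ln(1/h)$. The function names `sigma` and `log_factor` and the sample values of $h$ and $s$ are illustrative choices.

```python
import math


def sigma(dist, h):
    """Weight sigma_x(y) = h / (alpha(x, y) + h), where dist stands for alpha(x, y)."""
    return h / (dist + h)


def log_factor(h):
    """Logarithmic factor ln(1/h) appearing in the pointwise estimates."""
    return math.log(1.0 / h)


if __name__ == "__main__":
    s = 2  # illustrative weight exponent
    for h in (1e-1, 1e-2, 1e-3):
        near = sigma(0.0, h) ** s  # weight at y = x: always 1
        far = sigma(1.0, h) ** s   # weight at surface distance 1: roughly h**s
        print(f"h={h:g}  near={near:.3g}  far={far:.3g}  h^s={h**s:.3g}  ln(1/h)={log_factor(h):.3g}")
```

At the point $x$ itself the weight equals 1 for every $h$, while at unit distance it behaves like $h^s$, which is the deemphasis described above.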
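3.4. Proof of Theorem 3.2. We shall prove (3.15) in full detail. The proof of (3.16) follows from (3.15) by writing $\nabla_\Gamma(u - u_h) = \nabla_\Gamma(u - \chi) - \nabla_\Gamma(u_h - \chi)$. The proof of (3.14) is similar but slightly simpler, and we only sketch its proof.

We proceed via a duality argument. Fix a point $x \in \Gamma$, and let $n$ be a unit vector lying in the tangent plane to $\Gamma$ at $x$. Let $\tilde\delta_x$ satisfy the properties (2.26) and (2.28), and let $g^x$ be a discrete Green's function satisfying $L(v, g^x) = (v, \nabla_\Gamma \cdot \tilde\delta_x)$ for all $v \in H^1(\Gamma)$ and $\int_\Gamma g^x \, d\sigma = 0$. (Note that $\int_\Gamma \nabla_\Gamma \cdot \tilde\delta_x \, d\sigma = 0$.) Let also $g_h^x \in S^r$ be its finite element approximation satisfying $L(v_h, g^x - g_h^x) = 0$ $\forall v_h \in S^r$ and $\int_\Gamma g_h^x \, d\sigma = 0$. Then
\[ \begin{aligned} |\nabla_\Gamma u_h(x) \cdot n| &\le C\Big| \int_\Gamma u_h \nabla_\Gamma \cdot \tilde\delta_x \, d\sigma \Big| = C|L(u_h, g_h^x)| = C|L(u, g_h^x) - F(g_h^x)| \\ &= C|L(u, g_h^x - g^x) + L(u, g^x) - F(g_h^x)| \\ &\le C\Big( \|\nabla_\Gamma u\|_{L_\infty, x, t}\|\nabla_\Gamma(g^x - g_h^x)\|_{L_1(\Gamma), x, -t} + \Big| \int_T u \nabla_\Gamma \cdot \tilde\delta_x \, d\sigma \Big| + |||F|||_{W_\infty^{-1}}\|g_h^x\|_{W_1^1(\Gamma)} \Big) \\ &\le C\|\nabla_\Gamma u\|_{L_\infty, x, t}\big( 1 + \|\nabla_\Gamma(g^x - g_h^x)\|_{L_1(\Gamma), x, -t} \big) + C|||F|||_{W_\infty^{-1}}\|\nabla_\Gamma g_h^x\|_{L_1(\Gamma)}, \end{aligned} \]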
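where we have used a Poincaré inequality in the last step.

Similarly, fix $x \in \Gamma$, and let $\hat g^x$ satisfy $\int_\Gamma \hat g^x \, d\sigma = 0$ and $L(v, \hat g^x) = (v, \delta_x - \overline{\delta_x})$ for $\delta_x$ satisfying (2.26) and (2.27). Also let $\hat g_h^x \in S^r$ satisfy $L(\hat g^x - \hat g_h^x, \chi) = 0$ $\forall \chi \in S^r$ and $\int_\Gamma \hat g_h^x \, d\sigma = 0$. Let also $x \in T$. Then for $\chi \in S^r$,
\[ \begin{aligned} |(u - u_h)(x) - \overline{u - u_h}| &\le |(u - \chi)(x)| + C\Big| \int_\Gamma (\chi - u_h - \overline{u - u_h})\delta_x \, d\sigma \Big| \\ &\le C\big( \|u - \chi\|_{L_\infty(T)} + |L(u - u_h, \hat g^x)| \big) \\ &\le \big( \|\nabla_\Gamma(u - \chi)\|_{L_\infty, x, s} + |||F|||_{W_\infty^{-1}, x, s} \big)\|\hat g^x - \hat g_h^x\|_{W_1^1, x, -s} + C\|u - \chi\|_{L_\infty(T)} + |||F|||_{W_\infty^{-2}}\|\hat g^x\|_{W_1^2(\Gamma)}. \end{aligned} \]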
The heart of our proof consists of the following lemma.

Lemma 3.3. Under the assumptions of section 2 and Theorem 3.2,
\[ \|\nabla_\Gamma(g^x - g_h^x)\|_{L_1, x, -t} \le C\ell_{h,t}, \tag{3.17} \]
\[ \|\hat g^x - \hat g_h^x\|_{W_1^1, x, -s} \le Ch\ell_{h,s}, \tag{3.18} \]
\[ \|\nabla_\Gamma g^x\|_{L_1(\Gamma)} + \|\hat g^x\|_{W_1^2(\Gamma)} \le C\ell_h. \tag{3.19} \]
The proof of (3.16) will be complete once we prove Lemma 3.3.

3.5. Proof of Lemma 3.3. The proof of Lemma 3.3 is similar to that given for domains in $\mathbb{R}^n$ in [Sch98] (though the fact that we consider here an indefinite bilinear form complicates matters slightly). Thus we omit some details from our proof. Note first that $g^x - g_h^x$ satisfies the error estimates of Theorem 3.1 with $F \equiv 0$. We then decompose $\Gamma$ into annular subdomains about the point $x$. For a parameter $M > 0$ which we shall later take to be large enough, we fix $\Gamma_0 = B_{Mh}(x)$ and define
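$\gamma_j = 2^j Mh$. Let $J$ be the largest integer such that $\gamma_J \le \frac{\gamma_\Gamma}{2}$, where $\gamma_\Gamma$ is defined in section 2.1. For $0 < j < J$, we define the annuli $\Gamma_j = \{y \in \Gamma : \gamma_{j-1} < \alpha(x, y) < \gamma_j\}$ and then finally define $\Gamma_J = \Gamma \setminus \cup_{0 \le j < J}\Gamma_j$. Thus $\Gamma = \cup_{0 \le j \le J}\Gamma_j$. Also, we let $\Gamma_j' = \mathrm{int}(\Gamma_{j-1} \cup \Gamma_j \cup \Gamma_{j+1})$, $\Gamma_j'' = \Gamma_{j-1}' \cup \Gamma_j' \cup \Gamma_{j+1}'$, and $\Gamma_j''' = \Gamma_{j-1}'' \cup \Gamma_j'' \cup \Gamma_{j+1}''$. We then use (3.4), Hölder's inequality, and (2.23) to find that
\[ \begin{aligned} \|\nabla_\Gamma(g^x - g_h^x)\|_{L_1, x, -t} &\le C(M)h^{n/2}\|\nabla_\Gamma(g^x - g_h^x)\|_{L_2(\Gamma_0)} + C\sum_{j=1}^J \Big( \frac{\gamma_j}{h} \Big)^t \gamma_j^{n/2}\|\nabla_\Gamma(g^x - g_h^x)\|_{L_2(\Gamma_j)} \\ &\le C(M)h^{n/2}\Big[ \|\nabla_\Gamma(g^x - g_h^x)\|_{L_2(\Gamma_0)} + h^{-1}\|g^x - g_h^x\|_{L_2(\Gamma_0)} \\ &\qquad + \min_{\chi \in S^r}\big( \|\nabla_\Gamma(g^x - \chi)\|_{L_2(\Gamma_0)} + h^{-1}\|g^x - \chi\|_{L_2(\Gamma_0)} \big) \Big] \\ &\quad + C\sum_{j=1}^J \Big[ \Big( \frac{\gamma_j}{h} \Big)^t \gamma_j^n h^r\|g^x\|_{W_\infty^{r+1}(\Gamma_{j,h})} + \Big( \frac{\gamma_j}{h} \Big)^t \gamma_j^{n/2-1}\|g^x - g_h^x\|_{L_2(\Gamma_j)} \Big]. \end{aligned} \tag{3.20} \]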
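Let $\omega_j \in C_0^\infty(\Gamma_j')$ be a cutoff function satisfying $0 \le \omega_j \le 1$ and $\omega_j \equiv 1$ on $\Gamma_j$. Let $C_j = \frac{1}{|\Gamma|}\int_\Gamma \omega_j^2(g^x - g_h^x) \, d\sigma$, and let $w \in H^2(\Gamma)$ with $\int_\Gamma w \, d\sigma = 0$ solve
\[ L(w, v) = (\omega_j^2(g^x - g_h^x) - C_j, v) \quad \forall\, v \in H^1(\Gamma). \]
Using (2.23) and recalling that $\int_\Gamma (g^x - g_h^x) \, d\sigma = 0$, we compute
\[ \begin{aligned} \|g^x - g_h^x\|^2_{L_2(\Gamma_j)} &\le \|\omega_j(g^x - g_h^x)\|^2_{L_2(\Gamma)} = (\omega_j^2(g^x - g_h^x) - C_j, g^x - g_h^x) = L(w, g^x - g_h^x) \\ &= L(w - I_h w, g^x - g_h^x) \\ &\le C\big( h\|w\|_{H^2(\Gamma_j')}\|\nabla_\Gamma(g^x - g_h^x)\|_{L_2(\Gamma_j')} + h^r\|w\|_{W_\infty^{r+1}(\Gamma \setminus \Gamma_j')}\|\nabla_\Gamma(g^x - g_h^x)\|_{L_1(\Gamma)} \big). \end{aligned} \tag{3.21} \]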
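Noting that $w(y) = \int_\Gamma G^y(z)\omega_j^2(g^x - g_h^x) \, d\sigma(z)$ since $\int_\Gamma G^y(z)C_j \, d\sigma(z) = 0$, we use (2.4) to calculate that for any multi-index $\beta$ with $|\beta| \le r + 1$ and any $y \in \Gamma \setminus \Gamma_j'$,
\[ \begin{aligned} D^\beta w(y) &= \int_\Gamma D_y^\beta G^y(z)[\omega_j^2(g^x - g_h^x)] \, d\sigma(z) \\ &\le |\Gamma_j|^{1/2}\|\omega_j^2(g^x - g_h^x)\|_{L_2(\Gamma_j)}\|D_y^\beta G^y\|_{L_\infty(\Gamma_j)} \\ &\le C\gamma_j^{n/2}\|\omega_j(g^x - g_h^x)\|_{L_2(\Gamma_j)}\gamma_j^{1-n-r}. \end{aligned} \tag{3.22} \]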
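Inserting (3.22) into (3.21) and using the regularity estimate (2.2) yields
\[ \|\omega_j(g^x - g_h^x)\|^2_{L_2(\Gamma)} \le C\Big[ h\|\nabla_\Gamma(g^x - g_h^x)\|_{L_2(\Gamma_j')} + \Big( \frac{h}{\gamma_j} \Big)^r \gamma_j^{-n/2+1}\|\nabla_\Gamma(g^x - g_h^x)\|_{L_1(\Gamma)} \Big]\|\omega_j(g^x - g_h^x)\|_{L_2(\Gamma)}, \]
so that
\[ \|g^x - g_h^x\|_{L_2(\Gamma_j)} \le Ch\|\nabla_\Gamma(g^x - g_h^x)\|_{L_2(\Gamma_j')} + C\Big( \frac{h}{\gamma_j} \Big)^r \gamma_j^{-n/2+1}\|\nabla_\Gamma(g^x - g_h^x)\|_{L_1(\Gamma)}. \tag{3.23} \]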
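Recalling (2.26), we next compute that for $y \in \Gamma_{j,h}$ and $\beta$ with $|\beta| = r + 1$,
\[ D^\beta g^x(y) = -\int_\Gamma \nabla_{\Gamma,z} D_y^\beta G^y(z)\tilde\delta_x(z) \, d\sigma(z) \le \|\nabla_\Gamma D_y^\beta G^y\|_{L_\infty(\mathrm{supp}(\tilde\delta_x))}\|\tilde\delta_x\|_{L_1(\Gamma)} \le C\gamma_j^{-n-r}. \tag{3.24} \]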
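Finally, employing (3.3), (3.5), (2.23), (2.2), and (2.26) yields
\[ C(M)h^{n/2}\Big[ \|\nabla_\Gamma(g^x - g_h^x)\|_{L_2(\Gamma_0)} + h^{-1}\|g^x - g_h^x\|_{L_2(\Gamma_0)} + \min_{\chi \in S_h^r}\big( \|\nabla_\Gamma(g^x - \chi)\|_{L_2(\Gamma_0)} + h^{-1}\|g^x - \chi\|_{L_2(\Gamma_0)} \big) \Big] \le Ch^{n/2+1}\|\nabla_\Gamma \cdot \tilde\delta_x\|_{L_2(\Gamma)} \le C. \tag{3.25} \]
Inserting (3.23), (3.24), and (3.25) into (3.20), rearranging terms, and finally employing (3.25) yields
\[ \begin{aligned} \|\nabla_\Gamma(g^x - g_h^x)\|_{L_1, x, -t} &\le C + C\sum_{j=1}^J \Big( \frac{\gamma_j}{h} \Big)^t \gamma_j^{n/2}\|\nabla_\Gamma(g^x - g_h^x)\|_{L_2(\Gamma_j)} \\ &\le C + C\sum_{j=1}^J \Big( \frac{\gamma_j}{h} \Big)^t \gamma_j^n h^r \gamma_j^{-r-n} + C\sum_{j=1}^J \Big( \frac{\gamma_j}{h} \Big)^t \gamma_j^{n/2}\frac{h}{\gamma_j}\|\nabla_\Gamma(g^x - g_h^x)\|_{L_2(\Gamma_j)} \\ &\quad + C\|\nabla_\Gamma(g^x - g_h^x)\|_{L_1}\sum_{j=1}^J \Big( \frac{\gamma_j}{h} \Big)^t \Big( \frac{h}{\gamma_j} \Big)^r \\ &\le C + C\big( 1 + \|\nabla_\Gamma(g^x - g_h^x)\|_{L_1(\Gamma)} \big)\sum_{j=1}^J \Big( \frac{h}{\gamma_j} \Big)^{r-t} + \frac{C}{M}\sum_{j=1}^J \Big( \frac{\gamma_j}{h} \Big)^t \gamma_j^{n/2}\|\nabla_\Gamma(g^x - g_h^x)\|_{L_2(\Gamma_j)}. \end{aligned} \]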
The last term above may be kicked back (to the last term in the first line) for $M$ large enough. In addition, we note that $\sum_{j=1}^J \big( \frac{h}{\gamma_j} \big)^{r-t} \le C\ell_{h,t}\frac{1}{M^{r-t}}$. Thus
\[ \|\nabla_\Gamma(g^x - g_h^x)\|_{L_1, x, -t} \le C + \frac{C}{M^{r-t}}\ell_{h,t}\|\nabla_\Gamma(g^x - g_h^x)\|_{L_1(\Gamma)}. \tag{3.26} \]
Applying (3.26) with $t = 0$ and taking $M$ large enough to kick back the last term yields
\[ \|\nabla_\Gamma(g^x - g_h^x)\|_{L_1} \le C. \tag{3.27} \]
Inserting (3.27) into (3.26) completes the proof of (3.17).

In order to prove the inequality $\|\nabla_\Gamma g^x\|_{L_1(\Gamma)} \le C\ell_h$ from (3.19), we first note the easily proven regularity estimate $\|\nabla_\Gamma g^x\|_{L_2(\Gamma)} \le C\|\tilde\delta_x\|_{L_2(\Gamma)} \le Ch^{-n/2}$. Computing as in (3.24) yields $D^\alpha g^x(y) \le C\alpha(x, y)^{-2}$ for $|\alpha| = 1$ and $\alpha(x, y) \ge 3h$. We thus find that
\[ \|\nabla_\Gamma g^x\|_{L_1(\Gamma)} \le Ch^{n/2}\|\nabla_\Gamma g^x\|_{L_2(\Gamma)} + \|\nabla_\Gamma g^x\|_{L_1(\Gamma \setminus B_{3h}(x))} \le C + C\int_{3h}^C y^{-1} \, dy \le C\ell_h. \]
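The behavior of the sum $\sum_{j}(h/\gamma_j)^{r-t}$ over the dyadic radii $\gamma_j = 2^j M h$ can be checked directly. The sketch below is illustrative only: the helper names, the cutoff `gamma_max` (standing in for $\gamma_\Gamma/2$), and the sample values are assumptions, not taken from the paper. For a positive exponent the sum is a geometric series bounded by a multiple of $M^{-(r-t)}$; for exponent zero it simply counts the annuli, and that count grows like $\ln(1/h)$.

```python
import math


def dyadic_radii(h, M, gamma_max):
    """Radii gamma_j = 2**j * M * h for j = 1..J, where J is the largest index with gamma_J <= gamma_max."""
    radii = []
    j = 1
    while (2 ** j) * M * h <= gamma_max:
        radii.append((2 ** j) * M * h)
        j += 1
    return radii


def weighted_sum(h, M, gamma_max, power):
    """Sum over j of (h / gamma_j)**power, where power plays the role of r - t."""
    return sum((h / g) ** power for g in dyadic_radii(h, M, gamma_max))


if __name__ == "__main__":
    M, gamma_max = 4, 0.5  # illustrative values
    for h in (1e-2, 1e-3, 1e-4):
        geometric = weighted_sum(h, M, gamma_max, 1)  # stays below 1/M for every h
        count = weighted_sum(h, M, gamma_max, 0)      # number of annuli, grows like ln(1/h)
        print(f"h={h:g}  power 1: {geometric:.4f}  power 0: {count:.0f}  ln(1/h)={math.log(1 / h):.2f}")
```

This matches the role of the logarithmic factor in (3.26): it is needed only in the borderline case where the exponent $r - t$ vanishes.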
The proofs of (3.18) and the inequality $\|\hat g^x\|_{W_1^2(\Gamma)} \le C\ell_h$ are very similar to the corresponding proofs for the appropriate norms of $g^x - g_h^x$ and $\hat g^x$ and also to the proofs given in [Sch98], so we only make a couple of notes. First, (3.18) requires us to bound a weighted $W_1^1$ norm of $\hat g^x - \hat g_h^x$, not just an $L_1$ norm of the gradient as in (3.17). However, if we carry out the computation in (3.20) with $\hat g^x - \hat g_h^x$ and $s$ in place of $g^x - g_h^x$ and $t$, respectively, then the last line of (3.20) can easily be shown to bound $\|\hat g^x - \hat g_h^x\|_{L_1, x, -s}$. Second, the right-hand side $\delta_x - \overline{\delta_x}$ is not locally supported, which requires a modification when performing computations similar to (3.22) and (3.24). In particular, we note that $\hat g^x(y) = \int_\Gamma G^y(z)(\delta_x - \overline{\delta_x}) \, d\sigma(z) = \int_\Gamma G^y(z)\delta_x \, d\sigma(z)$ and then proceed essentially as in (3.24).

4. Error analysis of specific methods and numerical results. In this section we apply the abstract error analysis in section 3 to the methods (2.29) and (2.30) in section 2.6. In the case of the method (2.29) defined on polynomial approximations to $\Gamma$, the resulting error bounds consist of a "PDE"- or "almost-best-approximation"-type term that arises in essentially every finite element approximation, plus a geometric error term arising from the approximation of $\Gamma$ by $\Gamma_h^k$. We also briefly describe numerical experiments that confirm the structure of our $H^1$ and $L_2$ estimates.

4.1. Error estimates for FEM on polynomial approximations to Γ. We first state a fundamental geometric error bound which is an extension of a bound found in [Dz88] to higher-order approximations of $\Gamma$.

Proposition 4.1.
\[ \|A_\Gamma - P\|_{L_\infty(\Gamma)} \le Ch^{k+1}. \tag{4.1} \]
Proof. Recalling that $\|d\|_{L_\infty(\Gamma_{h,k})} \le Ch^{k+1}$ and noting from (2.10) that $|1 - \frac{1}{\mu_{hk}}| \le Ch^{k+1} + C|1 - \nu \cdot \nu_{hk}| \le Ch^{k+1} + C|\nu - \nu_{hk}|^2 \le Ch^{k+1}$, we have $|A_\Gamma - P| \le |PP_{h,k}P - P| + Ch^{k+1}$. But $|PP_{h,k}P - P| = |(\nu_{hk} - \nu \cdot \nu_{hk}\,\nu) \otimes (\nu_{hk} - \nu \cdot \nu_{hk}\,\nu)| \le Ch^{2k}$, which completes the proof.

Next we give $H^1$ and $L_2$ estimates.

Corollary 4.2. Let $\tilde u_{hk}$ satisfy (2.29) with $f_h = \mu_{hk}f$. Then if $u \in H^{r+1}(\Gamma)$,
\[ \|\nabla_\Gamma(u - \tilde u_{hk})\|_{L_2(\Gamma)} \le C\big( h^r\|u\|_{H^{r+1}(\Gamma)} + h^{k+1}\|\nabla_\Gamma u\|_{L_2(\Gamma)} \big), \tag{4.2} \]
\[ \|u - \tilde u_{hk} - \overline{u - \tilde u_{hk}}\|_{L_2(\Gamma)} \le C\big( h^{r+1}\|u\|_{H^{r+1}(\Gamma)} + h^{k+1}\|\nabla_\Gamma u\|_{L_2(\Gamma)} \big), \tag{4.3} \]
where $C$ depends on $d$ and its derivatives.
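The scalings used in the proof of Proposition 4.1 can be illustrated on a one-dimensional toy case that is not taken from the paper: a single chord of the unit circle, which plays the role of a piecewise linear ($k = 1$) approximation. The sketch below (the function name `chord_geometry` is an illustrative choice) computes the chord length $h$, the largest distance from the chord to the circle, the gap between the circle normal and the chord normal at an endpoint, and $1 - \nu \cdot \nu_h$. One expects distance of order $h^2 = h^{k+1}$, normal gap of order $h = h^k$, and $1 - \nu \cdot \nu_h = |\nu - \nu_h|^2/2$ of order $h^{2k}$.

```python
import math


def chord_geometry(theta):
    """Geometric quantities for a chord of the unit circle subtending the angle theta.

    Returns (h, dist, normal_gap, one_minus_dot) where
      h             is the chord length (the mesh size),
      dist          is the largest distance from the chord to the circle,
      normal_gap    is |nu - nu_h| at a chord endpoint,
      one_minus_dot is 1 - nu . nu_h at the same point.
    """
    h = 2.0 * math.sin(theta / 2.0)
    dist = 1.0 - math.cos(theta / 2.0)
    # Circle normal at the endpoint with angle 0; the chord normal points along angle theta / 2.
    nu = (1.0, 0.0)
    nu_h = (math.cos(theta / 2.0), math.sin(theta / 2.0))
    normal_gap = math.hypot(nu[0] - nu_h[0], nu[1] - nu_h[1])
    one_minus_dot = 1.0 - (nu[0] * nu_h[0] + nu[1] * nu_h[1])
    return h, dist, normal_gap, one_minus_dot


if __name__ == "__main__":
    for theta in (0.2, 0.1, 0.05):
        h, dist, gap, omd = chord_geometry(theta)
        print(f"h={h:.4f}  dist/h^2={dist / h**2:.4f}  gap/h={gap / h:.4f}  (1 - nu.nu_h)/h^2={omd / h**2:.4f}")
```

The printed ratios settle near constants as $h$ decreases, which is the one-dimensional analogue of the statement that the normal error enters the geometric bound only through its square.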
Remark 4.3. The geometric error in the $L_2$ estimate (3.5) has the form $h|||F|||_{-1} + |||F|||_{-2}$. However, we cannot take advantage of the fact that the norm $|||\cdot|||_{-2}$ is weaker than the norm $|||\cdot|||_{-1}$ in order to achieve a higher order of convergence $h^{k+2}$ for the geometric error in our $L_2$ estimates. Computational experiments in section 4 confirm that the geometric error is indeed of order $h^{k+1}$ for both the $L_2$ and energy errors.

Remark 4.4. It is possible to show that $|\overline{u - \tilde u_{hk}}| = |\overline{\tilde u_{hk}}| \le Ch^{k+1}\|\nabla_\Gamma u\|_{L_2(\Gamma)}$ for $h$ small enough, so that in fact (4.3) holds with $\|u - \tilde u_{hk}\|_{L_2(\Gamma)}$ on the left-hand side. We state (4.3) as we do both to maintain consistency with [Dz88] and because we wish to emphasize that (4.3) is sharp with respect to the order of the geometric error.

Proof. Note first that if $f_h = \mu_{hk}f$, $\tilde u_{hk}$ satisfies (3.1) with $F(\chi) = \int_\Gamma (A_\Gamma - P)\nabla_\Gamma \tilde u_{hk}\nabla_\Gamma \chi \, d\sigma$. Combining (3.2) and (4.1) yields
\[ |||F|||_{H^{-1}} \le Ch^{k+1}\|\nabla_\Gamma \tilde u_{hk}\|_{L_2(\Gamma)} \le Ch^{k+1}\big( \|\nabla_\Gamma u\|_{L_2(\Gamma)} + C|||F|||_{H^{-1}} \big). \]
Taking $h$ small enough to kick back the last term above yields
\[ |||F|||_{H^{-1}} \le Ch^{k+1}\|\nabla_\Gamma u\|_{L_2(\Gamma)}, \tag{4.4} \]
which when combined with (3.3) and (2.23) completes the proof of (4.2). Noting that $|||F|||_{H^{-2}} \le |||F|||_{H^{-1}}$ and then inserting (4.4) into (3.5) while recalling (2.23) completes the proof of (4.3).

We now give pointwise error estimates.

Corollary 4.5. Let $\tilde u_{hk}$ satisfy (2.29) with $f_h = \mu_{hk}f$. Let also $0 \le s \le r - 1$ and $0 \le t \le r$. Then for any $x \in \Gamma$,
\[ |(u - \tilde u_{hk})(x) - \overline{u - \tilde u_{hk}}| \le C\ell_{h,s}\inf_{\chi \in S_{hk}^r}\big( h\|\nabla_\Gamma(u - \chi)\|_{L_\infty, x, s} + \|u - \chi\|_{L_\infty, x, s} \big) + Ch^{k+1}\ell_h\|\nabla_\Gamma u\|_{L_\infty(\Gamma)}, \tag{4.5} \]
\[ |\nabla_\Gamma(u - \tilde u_{hk})(x)| \le C\big( \ell_{h,t}\inf_{\chi \in S_{hk}^r}\|\nabla_\Gamma(u - \chi)\|_{L_\infty, x, t} + h^{k+1}\ell_h\|\nabla_\Gamma u\|_{L_\infty(\Gamma)} \big). \tag{4.6} \]
Here $C$ depends on $d$ and its derivatives, and $\ell_h$, $\ell_{h,t}$, and $\ell_{h,s}$ are defined as in Theorem 3.2.

Proof. We recall that $F(\chi) = \int_\Gamma (A_\Gamma - P)\nabla_\Gamma \tilde u_{hk}\nabla_\Gamma \chi \, d\sigma$ and then use (3.15) with $t = 0$ and (4.1) to find that for $h$ small enough,
\[ \|\nabla_\Gamma \tilde u_{hk}\|_{L_\infty(\Gamma)} \le C\big( \|\nabla_\Gamma u\|_{L_\infty(\Gamma)} + \ell_h\|A_\Gamma - P\|_{L_\infty(\Gamma)}\|\nabla_\Gamma \tilde u_{hk}\|_{L_\infty(\Gamma)} \big) \le C\big( \|\nabla_\Gamma u\|_{L_\infty(\Gamma)} + h^{k+1}\ell_h\|\nabla_\Gamma \tilde u_{hk}\|_{L_\infty(\Gamma)} \big) \le C\|\nabla_\Gamma u\|_{L_\infty(\Gamma)}. \]
Here we have kicked back the last term on the right-hand side by taking $h$ sufficiently small. Thus $|||F|||_{W_\infty^{-1}, x, s} + |||F|||_{W_\infty^{-2}} \le Ch^{k+1}\|\nabla_\Gamma u\|_{L_\infty(\Gamma)}$, which when inserted into (3.14) and (3.16) yields (4.5) and (4.6), respectively.

Taking the maximum of (4.5) and (4.6) with $t = s = 0$ leads to standard quasi-optimal pointwise error estimates. In addition, one can easily use (2.23) and elementary manipulations to prove asymptotic error expansion inequalities similar to those given in [Sch98] for domains in $\mathbb{R}^n$.
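Corollary 4.6. Under the conditions of Corollary 4.5,
\[ \|u - \tilde u_{hk} - \overline{u - \tilde u_{hk}}\|_{L_\infty(\Gamma)} \le C\big( \tilde\ell_h h^{r+1}\|u\|_{W_\infty^{r+1}(\Gamma)} + h^{k+1}\ell_h\|\nabla_\Gamma u\|_{L_\infty(\Gamma)} \big), \]
\[ \|\nabla_\Gamma(u - \tilde u_{hk})\|_{L_\infty(\Gamma)} \le C\big( h^r\|u\|_{W_\infty^{r+1}(\Gamma)} + h^{k+1}\ell_h\|\nabla_\Gamma u\|_{L_\infty(\Gamma)} \big), \]
where $\tilde\ell_h = \ell_h$ if $r = 1$ and $\tilde\ell_h = 1$ otherwise. In addition for $0 \le s \le r - 1$, $0 \le t \le r$, and $x \in \Gamma$,
\[ |(u - \tilde u_{hk})(x) - \overline{u - \tilde u_{hk}}| \le C\ell_{h,s}h^{r+1}\Big( \sum_{1 \le |\beta| \le r+1} |D_\Gamma^\beta u(x)| + \sum_{r+2 \le |\beta| \le r+s} h^{|\beta| - r - 1}|D_\Gamma^\beta u(x)| + h^s\|u\|_{W_\infty^{r+1+s}(\Gamma)} \Big), \]
\[ |\nabla_\Gamma(u - \tilde u_{hk})(x)| \le C\ell_{h,t}h^r\Big( \sum_{1 \le |\beta| \le r+1} |D_\Gamma^\beta u(x)| + \sum_{r+2 \le |\beta| \le r+t} h^{|\beta| - r - 1}|D_\Gamma^\beta u(x)| + h^t\|u\|_{W_\infty^{r+1+t}(\Gamma)} \Big). \]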
4.2. Error estimates for finite element methods defined on Γ. In order to obtain error estimates for the method (2.30), we simply apply Theorems 3.1 and 3.2 with $F \equiv 0$ while recalling (2.23).

Corollary 4.7. Let $u_{h,\Gamma}$ be defined by (2.30), and assume $u \in H^{r+1}(\Gamma)$. Then
\[ \|\nabla_\Gamma(u - u_{h,\Gamma})\|_{L_2(\Gamma)} \le Ch^r\|u\|_{H^{r+1}(\Gamma)}, \qquad \|u - u_{h,\Gamma}\|_{L_2(\Gamma)} \le Ch^{r+1}\|u\|_{H^{r+1}(\Gamma)}. \]
For $x \in \Gamma$, $0 \le s \le r - 1$, and $0 \le t \le r$,
\[ |(u - u_{h,\Gamma})(x)| \le C\ell_{h,s}\inf_{\chi \in S_h^r}\big( h\|\nabla_\Gamma(u - \chi)\|_{L_\infty, x, s} + \|u - \chi\|_{L_\infty, x, s} \big), \]
\[ |\nabla_\Gamma(u - u_{h,\Gamma})(x)| \le C\ell_{h,t}\inf_{\chi \in S_h^r}\|\nabla_\Gamma(u - \chi)\|_{L_\infty, x, t}. \]
Here $\ell_{h,s}$ and $\ell_{h,t}$ are as defined in Theorem 3.2.

4.3. Numerical experiments. In our numerical experiments we let $\Gamma = \{x \in \mathbb{R}^3 : x_1^2 + x_2^2 + \frac{x_3^2}{9} = 1\}$; that is, $\Gamma$ is an ellipsoid having principal axes of lengths 1, 1, and 3. Also, we let $u = x_1$. (Note that $\Delta_\Gamma u \not\equiv 0$ on $\Gamma$, even though $u(x) = x_1$ is a harmonic function on $\mathbb{R}^3$.) Computations were performed on a sequence of uniformly refined meshes in all cases, with high-order quadrature being employed. We refer to [DD07] for more implementation details, in particular the numerical approximation of $a$ when, as in the current case, $d$ is not explicitly available. All methods were implemented using the finite element toolbox ALBERTA [SS05].

In Figure 1 we display plots of $\|\nabla_\Gamma(u - u_h)\|_{L_2(\Gamma)}$ versus the number of degrees of freedom (DOF), where $u_h = \tilde u_{h1}$, $u_h = \tilde u_{h2}$, and $u_h = u_{h,\Gamma}$ are the finite element approximations defined on a polyhedral approximation to $\Gamma$ (via (2.29) with $k = 1$), a quadratic approximation to $\Gamma$ (via (2.29) with $k = 2$), and $\Gamma$ (via (2.30)), respectively.
[Figure 1: three log-log panels of error versus DOF. Upper left: curves for linears, quadratics, and cubics, with reference slopes −1/2 and −1. Upper right: curves for linears, quadratics, cubics, and quartics, with reference slopes −1/2, −1, and −3/2. Bottom: curves for linears, quadratics, cubics, and quartics, with reference slopes −1/2, −1, −3/2, and −2.]

Fig. 1. Plots of $\|\nabla_\Gamma(u - u_h)\|_{L_2(\Gamma)}$ vs. the number of degrees of freedom: Finite element method defined on $\Gamma_h$ (upper left), $\Gamma_h^2$ (upper right), and $\Gamma$ (bottom).
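The reference slopes in such plots can be related to the estimate (4.2) by a short calculation: on an $n$-dimensional surface with quasi-uniform meshes one has DOF $\sim h^{-n}$, so an error of order $h^{\min(r, k+1)}$ decays like DOF$^{-\min(r, k+1)/n}$. The sketch below is illustrative only; the function names are hypothetical, and the data in the usage lines are synthetic numbers made up to exercise the fit, not results from the paper.

```python
import math


def predicted_energy_slope(r, k, n=2):
    """Slope of log(error) against log(DOF) suggested by (4.2): error ~ h**min(r, k + 1), DOF ~ h**(-n)."""
    return -min(r, k + 1) / n


def fitted_slope(dofs, errors):
    """Least-squares slope of log(error) against log(DOF)."""
    xs = [math.log(d) for d in dofs]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Predicted slopes for a polyhedral surface approximation (k = 1) and degrees r = 1, 2, 3.
    for r in (1, 2, 3):
        print(f"k=1, r={r}: predicted slope {predicted_energy_slope(r, 1):+.2f}")
    # Synthetic data decaying exactly like DOF**(-1); the fit recovers the slope.
    dofs = [10 ** p for p in range(2, 7)]
    errors = [3.0 * d ** (-1.0) for d in dofs]
    print(f"fitted slope on synthetic data: {fitted_slope(dofs, errors):+.3f}")
```

Under these assumptions, cubics on a polyhedral surface ($r = 3$, $k = 1$) are predicted to show the same slope as quadratics, which is the kind of saturation by the geometric error that the plots are meant to reveal.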
Optimal-order decrease for $\|\nabla_\Gamma(u - u_h)\|_{L_2(\Gamma)}$ is $DOF^{-r/2}$, so we display logarithmic lines of various slopes for comparison with computed error trends. The effect of the geometric error is clearly seen. When $k = 1$ (upper left of Figure 1), we obtain optimal-order convergence when $r = 1$ and $r = 2$ so that $h^{k+1} \le h^r$. Suboptimal convergence is obtained when $r \ge 3$, as expected. When $k = 2$ (upper right) we obtain optimal convergence for $r \le 3$, but not for $r = 4$. Thus (4.2) is sharp with respect to the geometric error $h^{k+1}\|\nabla_\Gamma u\|_{L_2(\Gamma)}$. Finally, in the bottom plot of Figure 1 we observe optimal-order convergence for all polynomial degrees $r \le 4$ when defining the finite element method directly on $\Gamma$ via (2.30). We note, however, that our experiments use high-order quadrature, and the quadrature error is likely to be more pronounced when using (2.30) in practical situations, as this formulation essentially involves an elliptic problem with a nonconstant coefficient matrix.

Similar plots of the $L_2$ error on linear and quadratic surface approximations are displayed in Figure 2. These plots confirm the sharpness of the error estimate (4.3).

[Figure 2: two log-log panels of error versus DOF. Left: curves for linears, quadratics, and cubics, with reference slope −1. Right: curves for linears, quadratics, cubics, and quartics, with reference slopes −1 and −3/2.]

Fig. 2. Plots of $\|u - u_h - \overline{u - u_h}\|_{L_2(\Gamma)}$ vs. the number of degrees of freedom: Finite element method defined on $\Gamma_h$ (left) and $\Gamma_h^2$ (right).

5. Extensions. In this section we briefly discuss extensions of our methods and analysis to more general situations.

5.1. More general surface approximations. Our definitions in section 2 require that the nodes of the discrete surfaces $\Gamma_h$ and $\Gamma_h^k$ lie on $\Gamma$. This is a reasonable assumption for stationary problems, but not for geometric evolution problems
such as mean curvature flow where the goal is to approximate an unknown surface $\Gamma$ (cf. [Dz91]). Instead of assuming that the nodes of the discrete surfaces lie on $\Gamma$, it is reasonable to assume that they lie within $O(h^{k+1})$ of $\Gamma$; cf. the comments at the beginning of section 3.

5.2. Surfaces with boundary. Our development may be carried out for surfaces $\Gamma$ with boundary $\partial\Gamma$ modulo "variational crimes" that arise when $S^r \not\subset H^1(\Gamma)$, just as for domains in $\mathbb{R}^n$. Note that variational crimes do not arise if $\partial\Gamma$ is "curvi-polygonal" in the sense that $a(\partial\Gamma_h) = \partial\Gamma$ (cf. [DD07]). In a few situations, $\partial\Gamma$ may be both smooth and "curvi-polygonal" in this sense (e.g., if $\Gamma$ is a half-sphere).

5.3. General second-order elliptic PDE. Many applications involve general second-order linear elliptic problems of the form $-\mathrm{div}_\Gamma(D\nabla_\Gamma u) + \tilde b \cdot \nabla_\Gamma u + cu = f$. If we make the natural assumption that $D\tau \cdot \nu = \tilde b \cdot \nu = 0$ for $\tau \cdot \nu = 0$ (cf. [DE07b]), then the $H^1$ and $L_2$ error estimates of sections 3 and 4 hold for this problem if the associated bilinear form is coercive and the coefficients sufficiently smooth. In particular, one can show that the geometric error is still of order $h^{k+1}$ in the more general case. Our pointwise estimates hold if a Green's function satisfying the identities and inequalities in Lemma 2.2 exists (note that [Aub82] considers only the Laplace–Beltrami operator).

5.4. $C^2$ surfaces. In many situations of interest, $\Gamma$ is not infinitely differentiable. The essential assumption that the orthogonal projection $a$ exists generally requires that $\Gamma$ be $C^2$, and situations where $\Gamma$ is less regular cannot be considered without substantial modification to our methodology. If $\Gamma$ is merely $C^2$, the abstract energy and $L_2$ error estimates of Theorem 3.1 hold verbatim, but the order of the geometric error in Corollary 4.2 is naturally restricted by the smoothness of $\Gamma$. We also expect the abstract pointwise estimates of Theorem 3.2 to hold if $\Gamma$ is only $C^2$ so long as $s = 0$ and $t \le 1$.
Proving such a statement using our techniques requires the establishment of pointwise estimates for the Green’s function as in Lemma 2.2. This can likely be accomplished using an elementary mapping argument, though we have not checked the details. 5.5. Manifolds. The abstract error analysis of section 3 relies on two classes of assumptions: those concerning the finite element triangulation and space, and those concerning the underlying PDEs. The PDE assumptions employed in section 3 hold with slight modification if one considers smooth Riemannian manifolds without boundary instead of smooth surfaces without boundary. Thus if one can construct
finite element spaces on manifolds satisfying the assumptions A1 through A4, the results of section 3 should hold as well.

REFERENCES
[AP05] T. Apel and C. Pester, Clement-type interpolation on spherical domains—interpolation error estimates and application to a posteriori error estimation, IMA J. Numer. Anal., 25 (2005), pp. 310–336.
[Aub82] T. Aubin, Nonlinear Analysis on Manifolds. Monge-Ampère Equations, Grundlehren Math. Wiss. 252, Springer-Verlag, New York, 1982.
[BMN05] E. Bänsch, P. Morin, and R. H. Nochetto, A finite element method for surface diffusion: The parametric case, J. Comput. Phys., 203 (2005), pp. 321–343.
[BS02] S. C. Brenner and L. R. Scott, The Mathematical Theory of Finite Element Methods, 2nd ed., Texts Appl. Math. 15, Springer-Verlag, New York, 2002.
[CDD+04] U. Clarenz, U. Diewald, G. Dziuk, M. Rumpf, and R. Rusu, A finite element method for surface restoration with smooth boundary conditions, Comput. Aided Geom. Design, 21 (2004), pp. 427–445.
[CDR03] U. Clarenz, U. Diewald, and M. Rumpf, A multiscale fairing method for textured surfaces, in Visualization and Mathematics III, Math. Vis., Springer-Verlag, Berlin, 2003, pp. 245–260.
[DDE05] K. Deckelnick, G. Dziuk, and C. M. Elliott, Computation of geometric partial differential equations and mean curvature flow, Acta Numer., 14 (2005), pp. 139–232.
[De04] A. Demlow, Piecewise linear finite element methods are not localized, Math. Comp., 73 (2004), pp. 1195–1201.
[De07] A. Demlow, Sharply localized pointwise and $W_\infty^{-1}$ estimates for finite element methods for quasilinear problems, Math. Comp., 76 (2007), pp. 1725–1741.
[DD07] A. Demlow and G. Dziuk, An adaptive finite element method for the Laplace–Beltrami operator on implicitly defined surfaces, SIAM J. Numer. Anal., 45 (2007), pp. 421–442.
[Dz88] G. Dziuk, Finite elements for the Beltrami operator on arbitrary surfaces, in Partial Differential Equations and Calculus of Variations, Lecture Notes in Math. 1357, Springer-Verlag, Berlin, 1988, pp. 142–155.
[Dz91] G. Dziuk, An algorithm for evolutionary surfaces, Numer. Math., 58 (1991), pp. 603–611.
[DE07a] G. Dziuk and C. M. Elliott, Finite elements on evolving surfaces, IMA J. Numer. Anal., 27 (2007), pp. 262–292.
[DE07b] G. Dziuk and C. M. Elliott, Surface finite elements for parabolic equations, J. Comput. Math., 25 (2007), pp. 385–407.
[GT98] D. Gilbarg and N. S. Trudinger, Elliptic Partial Differential Equations of Second Order, 2nd ed., Springer-Verlag, Berlin, 1998.
[He05] C.-J. Heine, Isoparametric Finite Element Approximation of Curvature on Hypersurfaces, preprint, 2005.
[He06] C.-J. Heine, Computations of form and stability of rotating drops with finite elements, IMA J. Numer. Anal., 26 (2006), pp. 723–751.
[Ho01] M. Holst, Adaptive numerical treatment of elliptic systems on manifolds. A posteriori error estimation and adaptive computational methods, Adv. Comput. Math., 15 (2001), pp. 139–191.
[Ne76] J.-C. Nédélec, Curved finite element methods for the solution of singular integral equations on surfaces in $\mathbb{R}^3$, Comput. Methods Appl. Mech. Engrg., 8 (1976), pp. 61–80.
[NS74] J. A. Nitsche and A. H. Schatz, Interior estimates for Ritz-Galerkin methods, Math. Comp., 28 (1974), pp. 937–958.
[Sch98] A. H. Schatz, Pointwise error estimates and asymptotic error expansion inequalities for the finite element method on irregular grids. I. Global estimates, Math. Comp., 67 (1998), pp. 877–899.
[SW95] A. H. Schatz and L. B. Wahlbin, Interior maximum-norm estimates for finite element methods, Part II, Math. Comp., 64 (1995), pp. 907–928.
[SS05] A. Schmidt and K. G. Siebert, Design of Adaptive Finite Element Software, Lect. Notes Comput. Sci. Eng. 42, Springer-Verlag, Berlin, 2005.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 828–843
CONVERGENCE ANALYSIS OF PROJECTION METHODS FOR THE NUMERICAL SOLUTION OF LARGE LYAPUNOV EQUATIONS∗ V. SIMONCINI† AND V. DRUSKIN‡ Abstract. The numerical solution of large-scale continuous-time Lyapunov matrix equations is of great importance in many application areas. Assuming that the coefficient matrix is positive definite, but not necessarily symmetric, in this paper we analyze the convergence of projection-type methods for approximating the solution matrix. Under suitable hypotheses on the coefficient matrix, we provide new asymptotic estimates for the error matrix when a Galerkin method is used in a Krylov subspace. Numerical experiments confirm the good behavior of our upper bounds when linear convergence of the solver is observed. Key words. Lyapunov equation, Krylov subspace, matrix exponential, Faber polynomials AMS subject classifications. 65F10, 93B40 DOI. 10.1137/070699378
1. The problem. We are interested in the approximate solution of the following Lyapunov matrix equation:
\[ AX + XA^\top = BB^\top, \tag{1.1} \]
with $A$ a real matrix of large dimension and $B$ a real tall matrix. Here $A^\top$ indicates the transpose of $A$. We assume that the $n \times n$ matrix $A$ is either symmetric and positive definite or nonsymmetric with positive definite symmetric part, that is, $(A + A^\top)/2$ is positive definite. In the following we mostly deal with the case of $B$ having a single column, that is, $B = b$, and we assume that $b$ has unit Euclidean norm, that is, $\|b\| = 1$. Nonetheless, our results can be extended to the multiple vector case. This problem arises in a large variety of applications, such as signal processing and system and control theory. The symmetric solution $X$ carries important information on the stability and energy of an associated dynamical linear system and on the feasibility of order reduction techniques [2], [6], [8]. The analytic solution of (1.1) can be written as
\[ X = \int_0^\infty e^{-tA}BB^\top e^{-tA^\top} \, dt = \int_0^\infty xx^\top \, dt, \tag{1.2} \]
where we have set $x = e^{-tA}B$. Let $\alpha_{\min}$ be the smallest eigenvalue of the symmetric part of $A$, $\alpha_{\min} = \lambda_{\min}((A + A^\top)/2) > 0$. Then it can be shown that $\|x\| \le \exp(-t\alpha_{\min})\|B\|$; see, e.g., [8, Lemma 3.2.1]. Projection-type methods seek an approximate solution $X_m$ in a subspace of $\mathbb{R}^n$ by requiring, e.g., that the residual $BB^\top - (AX_m + X_mA^\top)$ be orthogonal to this subspace. A particularly effective choice as approximation space is given by (here for $B = b$) the Krylov subspace $K_m(A, b) = \mathrm{span}\{b, Ab, \ldots, A^{m-1}b\}$ of dimension $m \le n$
∗ Received by the editors August 8, 2007; accepted for publication (in revised form) October 14, 2008; published electronically February 6, 2009. http://www.siam.org/journals/sinum/47-2/69937.html
† Dipartimento di Matematica, Università di Bologna, Piazza di Porta S. Donato 5, I-40127 Bologna, Italy (
[email protected]). ‡ Schlumberger-Doll Research, 1 Hampshire St., Cambridge, MA 02139 (
[email protected]field. slb.com).
[22], [23], [31]; we also refer to a richer bibliographic account collected in [2], [9], while we point to [33] for recent algorithmic progress within the Krylov subspace context. Abundant experimental evidence over the years has shown that the use of the space Km (A, b) allows one to often obtain a satisfactorily accurate approximation Xm , in a space of much lower dimension than n. A particularly attractive feature is that Xm
may be written as a low rank matrix, $X_m = U_mU_m^\top$ with $U_m$ of low column rank, so that only the matrix $U_m$ needs to be stored. To the best of our knowledge, no asymptotic convergence analysis of this Galerkin method is available in the literature. The aim of this paper is to fill this gap. We also refer to [30] for a priori estimates on the residual norm when solving the Sylvester equation with projection-type methods; there, the role of $\alpha_{\min}$ is also emphasized, although the bound derived in [30, Proposition 4.1] for the residual norm is of greater value as a nonstagnation condition of the procedure, rather than as an estimate of the actual convergence behavior. To derive our error estimates, we shall use the integral representation (1.2) for both $X$ and $X_m$ and explicitly bound the norm of the error matrix $X - X_m$; we refer to [31] for early considerations in this direction. Our approach is highly inspired by, and fully relies on, the papers [13], [24], where general estimates for the error in approximating matrix operators by polynomial methods are derived. We provide explicit estimates when $A$ is symmetric, and when $A$ is nonsymmetric with its field of values (or spectrum) contained in certain not necessarily convex sets of $\mathbb{C}^+$. We also show that the convergence of the Galerkin method is closely related to that of Galerkin methods for solving the linear system $(A + \alpha_{\min}I)d = b$. Our estimates are asymptotic, and thus linear; that is, they do not capture the possibly superlinear convergence behavior of the method that is sometimes observed [29]. In the linear system setting, the superlinear behavior is due to the fact that Krylov-based methods tend to adapt to the (discrete) spectrum of $A$, accelerating convergence as spectral information is gained while enlarging the space.
Recent results for $A$ symmetric have been derived, which completely describe the behavior of Krylov subspace solvers in the presence of superlinear convergence [4], [5]; see also [34] for a discussion and more references. Throughout the paper we assume exact arithmetic.

2. Numerical solution and preliminary considerations. Given the Krylov subspace $K_m(A, b)$ and a matrix $V_m$ whose orthonormal columns span $K_m(A, b)$, with $b = V_m e_1$, we seek an approximation in the form $X_m = V_m Y_m V_m^\top$. Here and in the following, $e_i$ denotes the $i$th column of the identity matrix of given dimension. Imposing that the residual $R_m = bb^\top - (AX_m + X_m A^\top)$ be orthogonal to the given space, the so-called Galerkin condition, yields the equation
\[ V_m^\top R_m V_m = 0 \quad\Longleftrightarrow\quad T_m Y + Y T_m^\top = e_1 e_1^\top, \]
where $T_m = V_m^\top A V_m$; see, e.g., [2], [31]. The $m \times m$ matrix $Y_m$ can thus be computed by solving the resulting small-size Lyapunov equation. The matrix $X_m$ can be equivalently written in integral form. Indeed, let $x_m = x_m(t) = V_m e^{-tT_m} e_1$ be the so-called Krylov approximation to $x = x(t)$ in $K_m(A, b)$. Then $X_m$ can be written as
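\[ X_m = V_m \left( \int_0^\infty e^{-tT_m} e_1 e_1^\top e^{-tT_m^\top}\,dt \right) V_m^\top = \int_0^\infty V_m e^{-tT_m} e_1 e_1^\top e^{-tT_m^\top} V_m^\top\,dt = \int_0^\infty x_m x_m^\top\,dt. \]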
We are interested in finding a priori bounds for the 2-norm of the error matrix, that is, for $\|X - X_m\|$, where the 2-norm is the matrix norm induced by the vector Euclidean norm. We start by observing that $X - X_m = \int_0^\infty (xx^\top - x_m x_m^\top)\,dt$, and that
\[ \|xx^\top - x_m x_m^\top\| = \|x(x - x_m)^\top + (x - x_m)x_m^\top\| \le (\|x\| + \|x_m\|)\,\|x - x_m\|. \]
It holds that $\lambda_{\min}((T_m + T_m^\top)/2) \ge \alpha_{\min}$. Using $\|x_m\| \le \exp(-t\lambda_{\min}((T_m + T_m^\top)/2)) \le \exp(-t\alpha_{\min})$, we have
\[ \|X - X_m\| \le \int_0^\infty \|xx^\top - x_m x_m^\top\|\,dt \le \int_0^\infty (\|x\| + \|x_m\|)\,\|x - x_m\|\,dt \le 2\int_0^\infty e^{-t\alpha_{\min}}\|x - x_m\|\,dt. \tag{2.1} \]
We notice that $e^{-t\alpha_{\min}}\|x - x_m\| = \|\exp(-t(A + \alpha_{\min} I))b - V_m \exp(-t(T_m + \alpha_{\min} I))e_1\| =: \|\hat x - \hat x_m\|$, which is the error in the approximation of the exponential of the shifted matrix $A + \alpha_{\min} I$ with the Krylov subspace solution. Therefore,
\[ \|X - X_m\| \le 2\int_0^\infty \|\hat x - \hat x_m\|\,dt. \tag{2.2} \]
In the following we will bound $\|X - X_m\|$ by judiciously integrating an upper bound of the integrand function. In fact, estimates for the error norm $\|\hat x - \hat x_m\|$ are available in the literature, which show superlinear convergence of the Krylov approximation $x_m$ to the exponential vector $x$; see, e.g., [12], [39], [36], [21]. However, these bounds are not appropriate when used in the generalized integral above. The matrix $V_m = [v_1, \ldots, v_m]$ can be generated one vector at a time, by means of the following Arnoldi recursion:
\[ AV_m = V_m T_m + v_{m+1} t_{m+1,m} e_m^\top, \qquad v_1 = b/\|b\|, \tag{2.3} \]
where $V_{m+1} = [V_m, v_{m+1}]$ has orthonormal columns and spans $K_{m+1}(A, b)$. In general, $T_m$ is upper Hessenberg, and it is symmetric, and thus tridiagonal, when $A$ is itself symmetric. We conclude this section with a technical lemma, whose proof is included for completeness; see, e.g., [24] for a similar result in finite precision arithmetic.

Lemma 2.1. Let $P_k$ be a polynomial of degree at most $k$. Let $f(z) = \sum_{k=0}^\infty f_k P_k(z)$ be a convergent series expansion of the analytic function $f$ and assume that the expansions of $f(A)$ and of $f(T_m)$ are also well defined. Then
\[ \|f(A)b - V_m f(T_m)e_1\| \le \sum_{k=m}^\infty |f_k|\,\bigl(\|P_k(A)\| + \|P_k(T_m)\|\bigr). \]
Proof. We have
\[ f(A)b - V_m f(T_m)e_1 = \sum_{k=0}^{m-1} f_k\,\bigl(P_k(A)b - V_m P_k(T_m)e_1\bigr) + \sum_{k=m}^{\infty} f_k\,\bigl(P_k(A)b - V_m P_k(T_m)e_1\bigr). \]
Using the Arnoldi relation and the fact that $T_m$ is upper Hessenberg, $A^k V_m e_1 = V_m T_m^k e_1$ for $k = 1, \ldots, m-1$, and thus $P_k(A)b = P_k(A)V_m e_1 = V_m P_k(T_m)e_1$, $k = 1, \ldots, m-1$, so that
\[ f(A)b - V_m f(T_m)e_1 = \sum_{k=m}^{\infty} f_k\,\bigl(P_k(A)b - V_m P_k(T_m)e_1\bigr). \]
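The exactness property used in this step can be checked numerically. The following is a minimal sketch in plain Python, under assumptions that are not in the paper: a small diagonal test matrix with a hypothetical spectrum in [1, 10], and the illustrative helper names `arnoldi` and `exactness_gap`. It builds $V_m$ and $T_m$ with the recursion (2.3) and measures $\|A^k b - V_m T_m^k e_1\|$, which should be at round-off level for $k \le m-1$ and clearly nonzero for $k = m$.

```python
import math

def arnoldi(matvec, b, m):
    """Arnoldi recursion (2.3): V holds m orthonormal vectors, T = V^T A V."""
    nb = math.sqrt(sum(x * x for x in b))
    V = [[x / nb for x in b]]
    T = [[0.0] * m for _ in range(m)]
    for j in range(m):
        w = matvec(V[j])
        for i in range(j + 1):  # modified Gram-Schmidt against the current basis
            T[i][j] = sum(p * q for p, q in zip(V[i], w))
            w = [q - T[i][j] * p for p, q in zip(V[i], w)]
        if j + 1 < m:
            T[j + 1][j] = math.sqrt(sum(q * q for q in w))
            V.append([q / T[j + 1][j] for q in w])
    return V, T

def exactness_gap(lam, m, k):
    """Return ||A^k b - V_m T_m^k e_1|| for A = diag(lam), b = ones / sqrt(n)."""
    n = len(lam)
    b = [1.0 / math.sqrt(n)] * n
    V, T = arnoldi(lambda v: [l * x for l, x in zip(lam, v)], b, m)
    left = [bi * l ** k for bi, l in zip(b, lam)]   # A^k b for diagonal A
    y = [1.0] + [0.0] * (m - 1)                     # e_1
    for _ in range(k):                              # y = T_m^k e_1
        y = [sum(T[i][j] * y[j] for j in range(m)) for i in range(m)]
    right = [sum(y[p] * V[p][i] for p in range(m)) for i in range(n)]
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(left, right)))

lam = [1.0 + 9.0 * i / 39 for i in range(40)]       # hypothetical test spectrum
m = 6
gaps = [exactness_gap(lam, m, k) for k in range(m + 1)]
for k, g in enumerate(gaps):
    print(k, g)
```

Only the last entry of `gaps` (degree $k = m$) is expected to be large, which is why the sum in Lemma 2.1 starts at $k = m$.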
Taking norms, the result follows.

3. The symmetric case. In the symmetric case, we show that the asymptotic convergence rate of the Krylov subspace solver is the same as that of the conjugate gradient method applied to the shifted system $(A + \alpha_{\min} I)x = b$, where $\alpha_{\min} = \lambda_{\min}$, the smallest eigenvalue of the positive definite matrix $A$ [18]; see also section 5.

Proposition 3.1. Let $A$ be symmetric and positive definite, and let $\lambda_{\min}$ be the smallest eigenvalue of $A$. Let $\hat\lambda_{\min}$, $\hat\lambda_{\max}$ be the extreme eigenvalues of $A + \lambda_{\min} I$ and $\hat\kappa = \hat\lambda_{\max}/\hat\lambda_{\min}$. Then
\[ \|X - X_m\| \le 4\,\frac{\sqrt{\hat\kappa} + 1}{\hat\lambda_{\min}\sqrt{\hat\kappa}} \left( \frac{\sqrt{\hat\kappa} - 1}{\sqrt{\hat\kappa} + 1} \right)^{m}. \tag{3.1} \]

Proof. Using (2.1) we are left to estimate $\int_0^\infty e^{-t\alpha_{\min}}\|x - x_m\|\,dt$. Let $\lambda_{\max}$ be the largest eigenvalue of $A$. Formula (4.2) in [12] shows that both $x$ and $x_m$ may be written as Chebyshev series,$^1$ e.g., for $x$ we have
\[ x = 2\exp\!\left(-t\,\frac{\lambda_{\max} + \lambda_{\min}}{2}\right) {\sum_{k=0}^{\infty}}{}' \, I_k\!\left(t\,\frac{\lambda_{\max} - \lambda_{\min}}{2}\right) T_k(A')\,b, \]
where $I_k$ is the Bessel function of an imaginary argument, or modified Bessel function, $T_k$ is the Chebyshev polynomial of degree $k$, and $A' = (\lambda_{\max} + \lambda_{\min})/(\lambda_{\max} - \lambda_{\min})\,I - 2/(\lambda_{\max} - \lambda_{\min})\,A$ so that $\|T_k(A')\| \le 1$ holds; see also [1, formula (9.6.34)]. Since polynomials of degree up to $k - 1$ are exactly represented in the Krylov subspace of dimension $k$ (see [12] and also Lemma 2.1), it thus follows that
\[ \|x - x_m\| \le 4\exp\!\left(-t\,\frac{\lambda_{\max} + \lambda_{\min}}{2}\right) \sum_{k=m}^{\infty} I_k\!\left(t\,\frac{\lambda_{\max} - \lambda_{\min}}{2}\right). \]
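Therefore, setting $p = (3\lambda_{\min} + \lambda_{\max})/(\lambda_{\max} - \lambda_{\min}) = (\hat\kappa + 1)/(\hat\kappa - 1)$ and $\rho = p + \sqrt{p^2 - 1}$, we have
\begin{align*}
\|X - X_m\| &\le 2\int_0^\infty \|\hat x - \hat x_m\|\,dt \\
&\le 8 \sum_{k=m}^{\infty} \int_0^\infty \exp\!\left(-t\left(\tfrac{3}{2}\lambda_{\min} + \tfrac{1}{2}\lambda_{\max}\right)\right) I_k\!\left(t\,\frac{\lambda_{\max} - \lambda_{\min}}{2}\right) dt \\
&= \frac{8}{\sqrt{\left(\tfrac{3}{2}\lambda_{\min} + \tfrac{\lambda_{\max}}{2}\right)^2 - \tfrac{(\lambda_{\max} - \lambda_{\min})^2}{4}}} \sum_{k=m}^{\infty} \frac{1}{\left(p + \sqrt{p^2 - 1}\right)^k} \tag{3.2} \\
&= \frac{8(\hat\kappa + 1)}{\sqrt{\hat\kappa}\,(3\lambda_{\min} + \lambda_{\max})} \sum_{k=m}^{\infty} \frac{1}{\left(p + \sqrt{p^2 - 1}\right)^k} \\
&= \frac{4(\hat\kappa + 1)}{\sqrt{\hat\kappa}\,(3\lambda_{\min} + \lambda_{\max})}\,\frac{2\rho}{\rho - 1}\,\frac{1}{\rho^m}.
\end{align*}
$^1$ The prime in the series indicates that the first term is divided by two.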
[Figure 3.1. Vertical axis: absolute error norm (logarithmic scale). Horizontal axis: dimension of Krylov subspace. Curves: error norm ||X − X_m|| and estimate of Proposition 3.1.]
Fig. 3.1. Example of section 3. 400×400 diagonal matrix with uniformly distributed eigenvalues in [1, 10]. True error norm and its estimate of Proposition 3.1 for the Krylov subspace approximation of the Lyapunov solution.
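An experiment of the kind shown in Figure 3.1 can be sketched in a few lines. The code below is an illustration only, not the authors' implementation: it uses plain Python, a smaller diagonal matrix ($n = 40$ instead of 400) with equally spaced eigenvalues in [1, 10], a dense Kronecker-form solve for the projected equation, and a power iteration that gives a lower estimate of $\|X - X_m\|$. All function names are hypothetical.

```python
import math

def arnoldi(matvec, b, m):
    """Arnoldi recursion: V holds m orthonormal vectors, T = V^T A V."""
    nb = math.sqrt(sum(x * x for x in b))
    V = [[x / nb for x in b]]
    T = [[0.0] * m for _ in range(m)]
    for j in range(m):
        w = matvec(V[j])
        for i in range(j + 1):  # modified Gram-Schmidt
            T[i][j] = sum(p * q for p, q in zip(V[i], w))
            w = [q - T[i][j] * p for p, q in zip(V[i], w)]
        if j + 1 < m:
            T[j + 1][j] = math.sqrt(sum(q * q for q in w))
            V.append([q / T[j + 1][j] for q in w])
    return V, T

def gauss_solve(M, rhs):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def small_lyapunov(T):
    """Solve T Y + Y T^T = e1 e1^T through its Kronecker (vectorized) form."""
    m = len(T)
    M = [[0.0] * (m * m) for _ in range(m * m)]
    for i in range(m):
        for j in range(m):
            for k in range(m):
                M[i * m + j][k * m + j] += T[i][k]  # (T Y)_{ij}
                M[i * m + j][i * m + k] += T[j][k]  # (Y T^T)_{ij}
    y = gauss_solve(M, [1.0] + [0.0] * (m * m - 1))
    return [y[i * m:(i + 1) * m] for i in range(m)]

def galerkin_error(lam, m, sweeps=100):
    """Power-iteration (lower) estimate of ||X - X_m||_2 for A = diag(lam)."""
    n = len(lam)
    b = [1.0 / math.sqrt(n)] * n
    V, T = arnoldi(lambda v: [l * x for l, x in zip(lam, v)], b, m)
    Y = small_lyapunov(T)
    Z = [[sum(Y[p][q] * V[q][j] for q in range(m)) for j in range(n)] for p in range(m)]
    # For diagonal A the exact solution is X_ij = b_i b_j / (lam_i + lam_j).
    E = [[b[i] * b[j] / (lam[i] + lam[j]) - sum(V[p][i] * Z[p][j] for p in range(m))
          for j in range(n)] for i in range(n)]
    v, est = [1.0 + i / n for i in range(n)], 0.0
    for _ in range(sweeps):
        w = [sum(e * x for e, x in zip(row, v)) for row in E]
        nw = math.sqrt(sum(x * x for x in w))
        est = nw / math.sqrt(sum(x * x for x in v))
        v = [x / nw for x in w]
    return est

def bound_prop31(lmin, lmax, m):
    """Right-hand side of (3.1), with shifted eigenvalues 2*lmin and lmax + lmin."""
    lh_min, lh_max = 2.0 * lmin, lmax + lmin
    sk = math.sqrt(lh_max / lh_min)
    return 4.0 * (sk + 1.0) / (lh_min * sk) * ((sk - 1.0) / (sk + 1.0)) ** m

n = 40
lam = [1.0 + 9.0 * i / (n - 1) for i in range(n)]
errors = [galerkin_error(lam, m) for m in range(1, 7)]
bounds = [bound_prop31(lam[0], lam[-1], m) for m in range(1, 7)]
for m, (e, bd) in enumerate(zip(errors, bounds), start=1):
    print(m, e, bd)
```

The Kronecker solve costs $O(m^6)$ operations and is chosen here only for brevity; for larger $m$ a dedicated small Lyapunov solver would be the natural replacement.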
To get (3.2) we used the following integral formula for Bessel functions in [19, Formula (6.611.4)]:
\[ \int_0^\infty e^{-\alpha t} I_\nu(\beta t)\,dt = \frac{\beta^\nu}{\sqrt{\alpha^2 - \beta^2}\,\bigl(\alpha + \sqrt{\alpha^2 - \beta^2}\bigr)^\nu} \quad \text{for } \nu > -1 \text{ and } \alpha > |\beta|. \]
Standard algebraic manipulations give
\[ \rho = \frac{\sqrt{\hat\kappa} + 1}{\sqrt{\hat\kappa} - 1}, \qquad \frac{2\rho}{\rho - 1} = \sqrt{\hat\kappa} + 1. \]
In Figure 3.1 we report the behavior of the bound of Proposition 3.1 for a $400 \times 400$ diagonal matrix $A$ having uniformly distributed eigenvalues between 1 and 10. Here $\alpha_{\min} = \lambda_{\min} = 1$. The vector $b$ is the normalized vector of all ones. We explicitly observe that the linearity of the convergence rate is exactly reproduced by the upper bound of Proposition 3.1.

4. The nonsymmetric case. For $A$ nonsymmetric, the result of the previous section can be generalized whenever the field of values of $A$ is contained in a "well-behaved" set of the complex plane. We recall that the field of values of a real matrix $A$ in the Euclidean inner product is defined as $F(A) = \{x^* A x,\ x \in \mathbb{C}^n,\ \|x\| = 1\}$, where $x^*$ is the conjugate transpose of $x$. The location of the field of values plays a crucial role in the behavior and analysis of polynomial-type methods for the solution of linear systems; see, e.g., [15], [27]. The following results make use of the theory of Faber polynomials and of recently obtained results that have been used in the context of linear systems. To this end, we need some definitions on conformal mappings. Let $\overline{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$, and let $D(0, 1) = \{|\tau| \le 1\}$ be the closed unit disk centered at zero. Given a bounded set $\Omega$ such that its complement is simply connected, define the conformal mapping $\phi$ that maps the complement of $\Omega$ onto the exterior of the unit disk $D(0, 1)$, and such that $\phi(\infty) = \infty$ and $\phi'(\infty) > 0$; see, e.g., [35, section 1.2]. Let $\psi$ denote the inverse of $\phi$. The principal (polynomial) part of the Laurent series of $\phi^k$ is the Faber polynomial $\Phi_k$, of exact degree $k$. Under these hypotheses, it was recently shown by Beckermann
that for any $z$ in a convex and compact set of $\mathbb{C}$, it holds that $|\Phi_k(z)| \le 2$. Assume that $f(\lambda) = \exp(-\lambda t)$ is regular in $\Omega = \psi(D(0, r_2))$, and let
\[ f(\lambda) \equiv \exp(-\lambda t) = \sum_{k=0}^{\infty} f_k \Phi_k(\lambda) \]
be the expansion of $\exp(-\lambda t)$ in Faber series in $\Omega$ with $1 < r_2 < \infty$. For $1 < r < r_2$, the expansion coefficients are given as
\[ f_k = \frac{1}{2\pi i} \int_{|\tau| = r} \frac{\exp(-t\psi(\tau))}{\tau^{k+1}}\,d\tau, \qquad |f_k| \le \frac{1}{r^k} \sup_{|\tau| = r} |\exp(-t\psi(\tau))|; \tag{4.1} \]
see, e.g., [35, sect. 2.1.3], [37]. Note that $f_k = f_k(t)$.

4.1. Field of values contained in an ellipse. The case in which the field of values is contained in an ellipse is a particularly natural generalization of the symmetric case.

Proposition 4.1. Assume the field of values of the real matrix $A$ is contained in the ellipse $E \subset \mathbb{C}^+$ of center $(c, 0)$, foci $(c \pm d, 0)$, and semiaxes $a_1$ and $a_2$, so that $d = \sqrt{a_1^2 - a_2^2}$. Then
\[ \|X - X_m\| \le \frac{8}{\sqrt{(\alpha_{\min} + c)^2 - d^2}}\,\frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{m}, \]
where $r_2 = \dfrac{c + \alpha_{\min}}{2r} + \dfrac{1}{2r}\sqrt{(c + \alpha_{\min})^2 - d^2}$, and $r = \dfrac{a_1 + a_2}{2}$.

Proof. For $\lambda \in E$, and setting $\tilde r = 2r/d$, we have $\Phi_k(\lambda) = 2(\tilde r)^{-k} T_k\bigl(\tfrac{\lambda - c}{d}\bigr)$ (see, e.g., [38], [17]); therefore we can explicitly write the Faber series on $E$ via Chebyshev ones as
\[ e^{-\lambda t} = 2\exp(-tc) \sum_{k=0}^{\infty} I_k(td)\,T_k\!\left(\frac{\lambda - c}{d}\right) = \exp(-tc) \sum_{k=0}^{\infty} I_k(td)\,\tilde r^{\,k}\,\Phi_k(\lambda). \]
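Using Lemma 2.1, the bounds $\|\Phi_k(T_m)\| \le 2$, $\|\Phi_k(A)\| \le 2$ obtained in [3], and the same integral formula for Bessel functions as in the proof of Proposition 3.1, we obtain
\begin{align*}
\|X - X_m\| &\le 2\int_0^\infty \|\hat x - \hat x_m\|\,dt \le 8 \sum_{k=m}^{\infty} \int_0^\infty e^{(-\alpha_{\min} - c)t}\, I_k(td)\,\tilde r^{\,k}\,dt \\
&= \frac{8}{\sqrt{(\alpha_{\min} + c)^2 - d^2}} \sum_{k=m}^{\infty} \left( \frac{1}{r_2} \right)^{k} = \frac{8}{\sqrt{(\alpha_{\min} + c)^2 - d^2}}\,\frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{m}.
\end{align*}
We show the quality of the estimate with a few numerical examples.

Example 4.2. We consider a $400 \times 400$ (normal) diagonal matrix $A$ whose eigenvalues are $\lambda = c + a_1\cos\theta + \imath\, a_2\sin\theta$, $\theta$ uniformly distributed in $[0, 2\pi]$, and $c = 20$, semiaxes $a_1 = 10$ and $a_2 = 2$, so that the eigenvalues are on an elliptic curve with center $c$ and focal distance $d = \sqrt{a_1^2 - a_2^2} = \sqrt{96}$. Here $\alpha_{\min} \approx 10.001$, yielding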
[Figure 4.1. Vertical axis: absolute error norm (logarithmic scale). Horizontal axis: dimension of Krylov subspace. Curves: error norm ||X − X_m|| and estimate of Prop. 4.1.]
Fig. 4.1. Example 4.2. True error and its estimate of Proposition 4.1 for the Krylov subspace solver of the Lyapunov equation.
$1/r_2 \approx 0.2056$ for Proposition 4.1. The vector $b$ is the vector of all ones, normalized to have unit norm. In Figure 4.1 we report the error associated with the Krylov subspace approximation of the Lyapunov solution, and the estimate of Proposition 4.1. The agreement is impressive, as should be expected since the spectrum lies exactly on the elliptic curve and the matrix is normal, so that the field of values coincides with the associated convex hull.

Example 4.3. We next consider the $400 \times 400$ matrix $A$ stemming from the centered finite difference discretization of the operator $L(u) = -\Delta u + 40(x + y)u_x + 200u$ in the unit square, with Dirichlet boundary conditions. The spectrum of $A$, together with its field of values (computed with the MATLAB function fv.m in [20]) and a surrounding ellipse, is shown in the left plot of Figure 4.2. Here $\alpha_{\min} = 0.4533$. The ellipse has parameters $c = 4.4535$, $a_1 = c - \alpha_{\min}$, $a_2 = 3.7$, with $a_1$, $a_2$ being the lengths of the semiaxes, and focal distance $d = \sqrt{a_1^2 - a_2^2} \approx 1.52$, yielding $1/r_2 \approx 0.8044$. The right plot of Figure 4.2 shows the convergence history of the Krylov solver, together with the asymptotic factor $(1/r_2)^m$ in Proposition 4.1. The initial asymptotic convergence rate is reasonably well captured by the estimate.

Example 4.4. We consider the $400 \times 400$ bidiagonal matrix $A$ with uniformly distributed diagonal elements in the interval $[10, 110]$ and unit upper diagonal. In this case $\alpha_{\min} = 9.4692$. The vector $b$ is the normalized vector of all ones. Our numerical computation reported in the left plot of Figure 4.3 showed that the field of values of $A$ (computed once again with fv.m [20]) is contained in an ellipse with center $c = 60$, semiaxes $a_1 = 50.8$, $a_2 = 4.2$, and focal distance $d = \sqrt{a_1^2 - a_2^2} \approx 50.62$, yielding $1/r_2 = 0.4699$. The right plot of Figure 4.3 shows the convergence history of the Krylov solver, together with the asymptotic factor in Proposition 4.1. Once again, the asymptotic rate is a good estimate of the actual convergence rate.
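The asymptotic factors quoted in Examples 4.2 to 4.4 follow directly from the formula for $r_2$ in Proposition 4.1. The short sketch below recomputes them from the ellipse parameters given in the text; the function names `ellipse_rate` and `bound_prop41` are illustrative only.

```python
import math

def ellipse_rate(c, a1, a2, alpha_min):
    """Return (d, r2) as defined in Proposition 4.1; 1/r2 is the asymptotic factor."""
    d = math.sqrt(a1 ** 2 - a2 ** 2)   # focal distance
    r = (a1 + a2) / 2.0
    s = c + alpha_min
    r2 = (s + math.sqrt(s ** 2 - d ** 2)) / (2.0 * r)
    return d, r2

def bound_prop41(c, a1, a2, alpha_min, m):
    """Right-hand side of the estimate in Proposition 4.1 for subspace dimension m."""
    d, r2 = ellipse_rate(c, a1, a2, alpha_min)
    return 8.0 / math.sqrt((alpha_min + c) ** 2 - d ** 2) * r2 / (r2 - 1.0) * r2 ** (-m)

# (c, a1, a2, alpha_min) as reported in Examples 4.2, 4.3 and 4.4.
examples = {
    "4.2": (20.0, 10.0, 2.0, 10.001),
    "4.3": (4.4535, 4.4535 - 0.4533, 3.7, 0.4533),
    "4.4": (60.0, 50.8, 4.2, 9.4692),
}
rates = {name: 1.0 / ellipse_rate(*p)[1] for name, p in examples.items()}
for name in examples:
    print(name, round(rates[name], 4), bound_prop41(*examples[name], 10))
```

The printed factors should agree, to the four digits reported, with the values 0.2056, 0.8044, and 0.4699 given in the examples.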
Even more accurate bounds for this example might be obtained by using more appropriate conformal mappings than the ellipse. It may be possible to include the field of values into a rectangle, for which the mapping ψ could be numerically estimated [14], [16]; see also Example 4.9. 4.2. Field of values contained in a more general region. For a more general region, we employ the general expansion in Faber series. We will proceed as
[Figure 4.2. Left panel: ℑ(λ) versus ℜ(λ). Right panel: absolute error norm (logarithmic scale) versus dimension of Krylov subspace; curves: error norm ||X − X_m|| and asymp. estimate of Prop. 4.1.]
Fig. 4.2. Example 4.3. Left plot: Spectrum of A, field of values (thin solid curve), and smallest computed elliptic curve including the field of values (thick solid curve). Right plot: True error and its asymptotic factor in the estimate of Proposition 4.1 for the Krylov subspace solver of the Lyapunov equation.
[Figure 4.3. Left panel: ℑ(λ) versus ℜ(λ). Right panel: absolute error norm (logarithmic scale) versus dimension of Krylov subspace; curves: error norm ||X − X_m|| and asymp. estimate of Prop. 4.1.]
Fig. 4.3. Example 4.4. Left plot: Real spectrum, field of values (thin solid curve), and smallest computed elliptic curve including the field of values (thick solid curve). Right plot: True error and its estimate of Proposition 4.1 for the Krylov subspace solver of the Lyapunov equation.
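follows. Using Lemma 2.1, we write
\[ \|\hat x - \hat x_m\| \le \sum_{k=m}^{\infty} |f_k|\,\bigl(\|\Phi_k(A + \alpha_{\min} I)\| + \|\Phi_k(T_m + \alpha_{\min} I)\|\bigr). \]
If we consider a convex set containing the field of values of $A + \alpha_{\min} I$, the result in [3] allows us to write $\|\Phi_k(A + \alpha_{\min} I)\| \le 2$ and $\|\Phi_k(T_m + \alpha_{\min} I)\| \le 2$, so that
\[ \|\hat x - \hat x_m\| \le 4 \sum_{k=m}^{\infty} |f_k|, \]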
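and we can conclude by using (4.1), once appropriate estimates for the sup function and for $r_2$ are identified. More precisely, if $\mathcal{M} = \mathcal{M}(t) > 0$ is such that $|f_k| \le \mathcal{M}\,r_2^{-k}$ for all $k$, and $\int_0^\infty \mathcal{M}\,dt$ converges, then
\[ \|X - X_m\| \le 8 \left( \int_0^\infty \mathcal{M}\,dt \right) \frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{m}. \]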
In the next few corollaries we derive a result of the same type, with a choice of $r_2$ such that the generalized integral converges. In case we wish to work only with a set containing the spectrum, but not necessarily the field of values of $A + \alpha_{\min} I$, we can relax the convexity assumption and differently bound the norm of the Faber polynomials in $A$, at the price of keeping the condition number of the eigenvector matrix in the convergence estimate. This case will be analyzed at the end of this section, and one example will be given around Corollary 4.10. We start by considering once again the case when the field of values is contained in an ellipse, for which the result is qualitatively the same as that in Proposition 4.1. The reason for reproducing the result in the case of the ellipse is precisely to appreciate the limited loss of accuracy given by the bound, when the more general approach is used, and to explicitly show the calculations in the case of an easy-to-handle mapping.

Corollary 4.5. Assume the field of values of the real matrix $A$ is contained in an ellipse $E \subset \mathbb{C}^+$ of center $(c, 0)$ and semiaxes $a_1$ and $a_2$, $a_1 > a_2$. Let $\alpha_{\min} = \lambda_{\min}((A + A^\top)/2)$. Then for $\epsilon$ satisfying $0 < \epsilon \le 2\alpha_{\min}$,
\[ \|X - X_m\| \le \frac{8}{\epsilon}\,\frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{m}, \]
where
\[ r_2 = \frac{c + \alpha_{\min} - \epsilon}{2r} + \frac{1}{2r}\sqrt{(c + \alpha_{\min} - \epsilon)^2 - d^2}, \qquad r = \frac{a_1 + a_2}{2}, \qquad d = \sqrt{a_1^2 - a_2^2}. \]
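Proof. Let $\alpha = \alpha_{\min}$ and let $\hat E$ be the selected ellipse containing the field of values of $A + \alpha I$. We consider the mapping whose boundary image of the unit disk is $\partial\hat E$,
\[ \psi(\tau) = c + \alpha + r\tau + \frac{(d/2)^2}{r\tau}, \]
with $\tau = e^{i\theta} \in D(0, 1)$, so that $\psi(|\tau| = 1) = \partial\hat E$. For $\epsilon > 0$, we define $r_2 := |\psi^{-1}(\epsilon)|$, so that
\[ \exp(-t\epsilon) = \max_{|\tau| = r_2} |\exp(-t\psi(\tau))|, \]
and for $1 < \hat r < r_2$,
\[ \frac{1}{2\pi} \int_0^{2\pi} |f(\psi(\hat r e^{i\theta}))|\,d\theta \le \exp(-t\epsilon) =: \mathcal{M}(t). \tag{4.2} \]
Since $\hat E$ is convex, it follows that $\|\Phi_k(A + \alpha I)\| \le 2$ for $k = 0, 1, \ldots$; see [3]. The same holds for $\|\Phi_k(T_m + \alpha I)\|$, since the field of values of $T_m + \alpha I$ is included in that of $A + \alpha I$. Therefore, Lemma 2.1 ensures that
\[ \|\hat x - \hat x_m\| \le \sum_{k=m}^{\infty} |f_k|\,\bigl(\|\Phi_k(A + \alpha I)\| + \|\Phi_k(T_m + \alpha I)\|\bigr) \le 4\exp(-t\epsilon)\,\frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{m}. \]
Finally, using $\int_0^\infty \exp(-t\epsilon)\,dt = \epsilon^{-1}$ together with (2.2),
\[ \|X - X_m\| \le 2\int_0^\infty \|\hat x - \hat x_m\|\,dt \le \frac{8}{\epsilon}\,\frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{m}, \]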
which completes the proof.

The ideal result for $\|x - x_m\|$ would set $r_2$ to be equal to $r_2 = |\psi^{-1}(0)|$ and not to $r_2 = |\psi^{-1}(\epsilon)|$ in the proof. However, this would make $\mathcal{M}$ in (4.2) equal to one, and the generalized integral would not converge. The result above can be compared to the sharper one in Proposition 4.1. In practice, however, the asymptotic result is not affected by the use of $\epsilon$, since it is sufficient to take $\epsilon$ small compared to $\alpha_{\min}$, and the same asymptotic rate as in Proposition 4.1 is recovered; only the multiplicative factor increases. Therefore, setting $r_{2,0} = |\psi^{-1}(0)|$, the result above shows that
\[ \|X - X_m\| = O\!\left( \left( \frac{1}{r_{2,0}} \right)^{m} \right). \tag{4.3} \]
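The trade-off between the rate and the multiplicative factor can be made concrete with a small computation. The sketch below assumes the ellipse mapping $\psi(\tau) = c + \alpha_{\min} + r\tau + (d/2)^2/(r\tau)$ and the resulting closed form $r_2 = \bigl(c + \alpha_{\min} - \epsilon + \sqrt{(c + \alpha_{\min} - \epsilon)^2 - d^2}\bigr)/(2r)$, together with a bound of the form $(8/\epsilon)\,\tfrac{r_2}{r_2 - 1}\,r_2^{-m}$; the ellipse data are those of Example 4.2, and the function names are hypothetical.

```python
import math

def psi(tau, c, a1, a2, alpha_min):
    """Ellipse map psi(tau) = c + alpha_min + r*tau + (d/2)^2 / (r*tau), real tau."""
    r = (a1 + a2) / 2.0
    d2 = a1 ** 2 - a2 ** 2
    return c + alpha_min + r * tau + d2 / (4.0 * r * tau)

def r2_eps(c, a1, a2, alpha_min, eps):
    """Radius r2 with psi(-r2) = eps, i.e. the circle whose image touches Re = eps."""
    r = (a1 + a2) / 2.0
    d2 = a1 ** 2 - a2 ** 2
    s = c + alpha_min - eps
    return (s + math.sqrt(s ** 2 - d2)) / (2.0 * r)

def bound_eps(c, a1, a2, alpha_min, eps, m):
    """Assumed form of the epsilon-dependent bound: (8/eps) r2/(r2-1) r2^(-m)."""
    r2 = r2_eps(c, a1, a2, alpha_min, eps)
    return 8.0 / eps * r2 / (r2 - 1.0) * r2 ** (-m)

params = (20.0, 10.0, 2.0, 10.001)          # c, a1, a2, alpha_min of Example 4.2
rate0 = 1.0 / r2_eps(*params, 0.0)          # limiting factor 1 / r_{2,0}
table = [(eps, 1.0 / r2_eps(*params, eps), bound_eps(*params, eps, 20))
         for eps in (1.0, 0.1, 0.01)]
for eps, rate, bnd in table:
    print(eps, rate, bnd)
print("limit", rate0)
```

As $\epsilon$ decreases the factor $1/r_2$ approaches $1/r_{2,0}$, while the prefactor $8/\epsilon$ grows, which is the behavior described in the text.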
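The following mapping is a modified version of the external mapping used, for instance, in [21]:
\[ \psi(\tau) = \gamma_1 - \gamma_2 \left( 1 - \frac{1}{\tau} \right)^{2 - \theta} \tau, \qquad \tau = \sigma e^{i\omega}, \qquad |\tau| \ge 1, \tag{4.4} \]
for $0 < \theta < 1$ and $\gamma_1, \gamma_2 \in \mathbb{R}^+$. The function $\psi$ maps the exterior of the disc $D(0, 1)$ onto a wedge-shaped convex set $\Omega$ in $\mathbb{C}^+$. The following result holds.

Corollary 4.6. Let $\hat\Omega \subset \mathbb{C}^+$ be the wedge-shaped set which is the image through $\hat\psi$ of the disk $D(0, 1)$, where $\hat\psi$ is as in (4.4). Assume the field of values of the matrix $A + \alpha_{\min} I$, with $\alpha_{\min} = \lambda_{\min}((A + A^\top)/2)$, is contained in $\hat\Omega$. For $0 < \epsilon < 2\alpha_{\min}$, let $r_2 = |\hat\psi^{-1}(\epsilon)|$. Then
\[ \|X - X_m\| \le \frac{8}{\epsilon}\,\frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{m}. \]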
Proof. The proof follows the same steps as that of Corollary 4.5.

Example 4.7. We consider the $400 \times 400$ (normal) diagonal matrix $A$ whose eigenvalues are on the curve $\psi(\tau) = 2 - 2(1 - 1/\tau)^{2 - \omega}\tau$ for $\tau \in D(0, 1)$ with $\omega = 0.3$ (see the left plot of Figure 4.4). Here $\alpha_{\min} = 1.9627$. The image of the mapping $\hat\psi(\tau) = \alpha_{\min} + \psi(\tau)$, $\tau \in D(0, 1)$, thus contains the spectrum of $A + \alpha_{\min} I$. Numerical computation yields $r_{2,0} = |\hat\psi^{-1}(0)| \approx 3.5063$. The vector $b$ is the normalized vector of all ones. The right plot of Figure 4.4 shows the convergence history of the Krylov solver, together with the asymptotic factor $(1/r_{2,0})^m$ in the estimate of Corollary 4.6. The linear asymptotic convergence is fully captured by the estimate.

In our next examples we numerically determine a contour bounding the field of values of the coefficient matrix. Indeed, more general mappings than in the examples above can be obtained and numerically approximated within the class of Schwarz–Christoffel conformal mappings [10]. In all cases, the vector $b$ was taken to be the normalized vector of all ones.

Example 4.8. We consider the $200 \times 200$ Toeplitz matrix A = Toeplitz(−1, −1, 2, 0.1).
[Figure 4.4. Left panel: ℑ(λ) versus ℜ(λ). Right panel: absolute error norm (logarithmic scale) versus dimension of Krylov subspace; curves: error norm ||X − X_m|| and asympt. estimate of Corollary 4.6.]
Fig. 4.4. Example 4.7. Left plot: Spectrum of A. Right plot: True error and its asymptotic factor associated with the asymptotic estimate $(1/r_{2,0})^m$ related to Corollary 4.6 for the Krylov subspace solver of the Lyapunov equation.
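The value $r_{2,0} \approx 3.5063$ reported in Example 4.7 can be reproduced with a one-dimensional root search. The sketch below is an illustration under one assumption that the text does not state explicitly: the preimage of 0 under $\hat\psi$ is sought on the real axis, $\tau > 1$, where the power $(1 - 1/\tau)^{2 - \omega}$ is unambiguous. The names `psi_hat` and `bisect` are hypothetical.

```python
import math

def psi_hat(tau, alpha_min=1.9627, omega=0.3):
    """Shifted wedge map of Example 4.7, evaluated for real tau >= 1."""
    return alpha_min + 2.0 - 2.0 * (1.0 - 1.0 / tau) ** (2.0 - omega) * tau

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid   # root lies in the upper half
        else:
            hi = mid              # root lies in the lower half
    return 0.5 * (lo + hi)

# psi_hat(1) > 0 and psi_hat(10) < 0, so the root is bracketed by [1, 10].
r20 = bisect(psi_hat, 1.0, 10.0)
rate = 1.0 / r20
print(r20, rate)
```

The quantity `rate` is the asymptotic factor $1/r_{2,0}$ whose powers are plotted in the right panel of Figure 4.4.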
[Figure 4.5. Left panel: ℑ(λ) versus ℜ(λ). Right panel: absolute error norm (logarithmic scale) versus dimension of Krylov subspace; curves: error norm ||X − X_m|| and asympt. estimate.]
Fig. 4.5. Example 4.8. Left: Spectrum ("×") and approximated field of values (solid line). Right: True convergence rate and asymptotic estimate $(1/r_{2,0})^m$.
In this case, the asymptotic convergence rate was numerically determined. To this end, we used the Schwarz–Christoffel mapping Toolbox [11] in MATLAB to numerically compute a conformal mapping whose image was an approximation to the boundary of the field of values of A (cf. left plot of Figure 4.5). A polygon with few vertices approximating ∂F (A + αmin I) was obtained with fv.m, and this was then injected into the Schwarz–Christoffel inverse mapping function to construct the sought-after mapping and the value of r2,0 according to (4.3). The asymptotic rate was determined to be 1/r2,0 ≈ 0.8859. The right plot in Figure 4.5 shows the extremely good agreement between the true error and the asymptotic rate for this numerically determined mapping. Example 4.9. We consider once again the matrix in Example 4.4 and use the Schwarz–Christoffel mapping Toolbox to generate a sharper estimate of the polygon
[Figure 4.6. Left panel: ℑ(λ) versus ℜ(λ). Right panel: absolute error norm (logarithmic scale) versus dimension of Krylov subspace; curves: error norm ||X − X_m|| and asympt. estimate.]
Fig. 4.6. Example 4.9. Left: Spectrum ("×") and approximated field of values (solid line). Right: True convergence rate and asymptotic estimate $(1/r_{2,0})^m$.
including the field of values. This provides a refined numerical mapping and a more accurate convergence rate. The polygon approximating the field of values of $A + \alpha_{\min} I$ is shown in the left plot of Figure 4.6, while the history of the error norm and the estimate for the numerically computed value $1/r_{2,0} \approx 0.4445$ (cf. (4.3)) are reported in the right plot of Figure 4.6. The estimated convergence rate is clearly higher, that is, $1/r_{2,0}$ is smaller, than the value computed with the ellipse, which was $1/r_2 \approx 0.4699$.

The following mapping was analyzed in [26] and is associated with a nonconvex domain; the specialized case of an annular sector is discussed, for instance, in [7]. Given a set $\Omega$, assume that $\partial\Omega$ is an analytic Jordan curve. If $\Omega$ is of bounded (or finite) boundary rotation, then
\[ \max_{z \in \Omega} |\Phi_k(z)| \le \frac{V(\Omega)}{\pi}, \]
where $V(\Omega)$ is the boundary rotation of $\Omega$, defined as the total variation of the angle between the positive real axis and the tangent of $\partial\Omega$. In particular, this bound is scale-invariant, so that it also holds that $V(s\Omega) = V(\Omega)$ [26]. These important properties ensure that for a diagonalizable matrix $A$, $\|\Phi_k(A + \alpha_{\min} I)\|$ is bounded independently of $k$, on a nonconvex set with bounded boundary rotation. Indeed, letting $A = Q\Lambda Q^{-1}$ be the spectral decomposition of $A$, then $\|\Phi_k(A + \alpha_{\min} I)\| \le \kappa(Q)\,\|\Phi_k(\Lambda + \alpha_{\min} I)\|$, where $\kappa(Q) = \|Q\|\,\|Q^{-1}\|$, and the estimate above can be applied.

Corollary 4.10. Assume that $A$ is diagonalizable, and let $A = Q\Lambda Q^{-1}$ be its spectral decomposition. Assume the spectrum of $A + \alpha_{\min} I$ is contained in the set $s\Omega \subset \mathbb{C}^+$, with $s > 0$, whose boundary is the "bratwurst" image for $|\tau| = 1$ of
\[ \psi(\tau) = \frac{(\rho\tau - \lambda N)(\rho\tau - \lambda M)}{(N - M)\rho\tau + \lambda(NM - 1)} \in \mathbb{C}^+, \]
where $\tau \in D(0, r)$, $r \ge 1$, while $N$, $M$, $\rho$, and $\lambda$ are given and such that $\psi(D(0, 1)) \subset \mathbb{C}^+$. Then, for $0 < \epsilon < \min_{|\tau| = 1} \Re(\psi(\tau))$,
\[ \|X - X_m\| \le \frac{8\,V(\Omega)\,\kappa(Q)}{\pi\,\epsilon}\,\frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{m}, \]
where $r_2 \ge 1$ is the smallest radius such that $\epsilon = \Re(\psi(r_2 \exp(i\theta)))$ for some $\theta$.
[Figure 4.7. Left panel: ℑ(λ) versus ℜ(λ), with curves labeled ρ = 0.98 and r_{2,0}. Right panel: absolute error norm (logarithmic scale) versus dimension of Krylov subspace; curves: error norm ||X − X_m|| and asympt. estimate.]
Fig. 4.7. Example 4.11. Left plot: Spectrum and “bratwurst” curves associated with disks of different radius. Right plot: True error and the asymptotic factor of its estimate in Corollary 4.10 for the Krylov subspace solver of the Lyapunov equation.
Proof. Proceeding as in Corollary 4.5 we have
\[ \|\hat x - \hat x_m\| \le \sum_{k=m}^{\infty} |f_k|\,\bigl(\|\Phi_k(A + \alpha I)\| + \|\Phi_k(T_m + \alpha I)\|\bigr) \le 4\,\mathcal{M}(t)\,\frac{V(\Omega)\,\kappa(Q)}{\pi}\,\frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{m}. \]
Here $\mathcal{M}(t) = \exp(-t\epsilon)$. Finally,
\[ \|X - X_m\| \le 2\int_0^\infty \|\hat x - \hat x_m\|\,dt \le \frac{8\,V(\Omega)\,\kappa(Q)}{\pi} \left( \int_0^\infty \mathcal{M}(t)\,dt \right) \frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{m}, \]
from which the result follows.

Example 4.11. This example is taken from [25]; see also [26] for more details. In this case, $A$ is the $225 \times 225$ matrix pde225 of the Matrix Market repository [28] and it is such that $\alpha_{\min} \approx 0.08249$. The spectrum of $A + \alpha_{\min} I$ is included in the set $2\Omega$ whose boundary is the bratwurst image of $\psi$ as in Corollary 4.10, with $\lambda = -1$, $N = 1.0508$, $\rho = 0.98$, $M = 0.6626$ (exact to the first decimal digits; the other parameters defined in [25] were set at the different values $\theta = 5/4\pi$, $e = 1.40$). The left plot of Figure 4.7 shows the spectrum of $A + \alpha_{\min} I$ as "×"; the solid curve corresponds to the boundary of $\psi(D(0, 1))$, enclosing the whole spectrum. Let $r_{2,0} \ge 1$ be the smallest radius such that $\Re\,\psi(r_{2,0} e^{i\theta}) = 0$ for some $\theta$. Then the dashed curve is the boundary of $\psi(D(0, r_{2,0}))$. The right plot of Figure 4.7 shows the convergence curve of the Krylov subspace solver, together with the asymptotic quantity $(1/r_{2,0})^m$, $m = 1, 2, \ldots$, associated with Corollary 4.10. We observe that the initial convergence phase is well captured by the estimate. As expected, the estimate cannot reproduce the superlinear convergence of the solver at later stages.

5. Connections to linear system solvers and further considerations. The relation
\[ z^{-1} = \int_0^\infty e^{-tz}\,dt \]
can be used to show a close connection between our estimates and the solution of the linear system $(A + \alpha_{\min} I)d = b$ in the Krylov subspace. Let $V_m(T_m + \alpha_{\min} I)^{-1}e_1$ be the Galerkin approximation to the linear system solution $d$ in the Krylov subspace $K_m(A, b) = K_m(A + \alpha_{\min} I, b)$. Then the system error can be written as
\[ (A + \alpha_{\min} I)^{-1}b - V_m(T_m + \alpha_{\min} I)^{-1}e_1 = \int_0^\infty \bigl(\exp(-t(A + \alpha_{\min} I))b - V_m\exp(-t(T_m + \alpha_{\min} I))e_1\bigr)\,dt. \]
Comparing the last integral with the error bound in (2.2) shows that the error norm $\|(A + \alpha_{\min} I)^{-1}b - V_m(T_m + \alpha_{\min} I)^{-1}e_1\|$ may be bounded by exactly the same tools we have used for the Lyapunov error and that the two initial integral bounds differ only by a factor of two. Indeed, the estimates of Proposition 3.1 (symmetric case) and of Proposition 4.1 (spectrum contained in an ellipse) employ the same asymptotic factors that characterize the convergence rate of methods such as the conjugate gradients in the symmetric case, and FOM or GMRES in the nonsymmetric case, when applied to the system $(A + \alpha_{\min} I)d = b$; see, e.g., [32]. Therefore, we have shown that the convergence of a Galerkin procedure in the Krylov subspace for solving (1.1) has the same convergence factor as a corresponding Krylov subspace method for the shifted (single vector) linear system.

As a natural consequence of the discussion above, the previous results can be generalized to the case when $b$ is replaced by a matrix $B$, with more than one column. A Galerkin approximation may be obtained by first generating the "block" Krylov subspace $K_m(A, B) = \mathrm{span}\{B, AB, \ldots, A^{m-1}B\}$ and then proceeding as described in section 2; see, e.g., [2]. Let $B = [b_1, \ldots, b_s]$. Setting $Z = \exp(-tA)B$ and letting $Z_m \in K_m(A, B)$ be the associated Krylov approximation to the exponential, we can bound $\|ZZ^\top - Z_m Z_m^\top\|$, for instance, as
\[ \|ZZ^\top - Z_m Z_m^\top\| \le \sum_{k=1}^{s} \bigl\| z^{(k)}(z^{(k)})^\top - z_m^{(k)}(z_m^{(k)})^\top \bigr\|, \]
where $Z = [z^{(1)}, \ldots, z^{(s)}]$ and $Z_m = [z_m^{(1)}, \ldots, z_m^{(s)}]$. The results of the previous sections can be thus applied to each term in the sum. Refined bounds may possibly be obtained by using the theory of matrix polynomials, but this is beyond the scope of this work; see, e.g., [32]. We also observe that our convergence results can be generalized to the case of accelerated methods, such as that described in [33], by using the theoretical matrix function framework described in [13].

Acknowledgments. We are deeply indebted to Leonid Knizhnerman for several insightful comments which helped improve a previous version of this paper. We also thank the referee, whose criticism helped us improve this paper.

REFERENCES
[1] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions, Dover, New York, 1965.
[2] A. C. Antoulas, Approximation of Large-Scale Dynamical Systems, Adv. Des. Control 6, SIAM, Philadelphia, 2008.
[3] B. Beckermann, Image numérique, GMRES et polynômes de Faber, C. R. Acad. Sci. Paris Ser. I, 340 (2005), pp. 855–860.
[4] B. Beckermann and A. B. J. Kuijlaars, Superlinear convergence of conjugate gradients, SIAM J. Numer. Anal., 39 (2001), pp. 300–329.
[5] B. Beckermann and A. B. J. Kuijlaars, Superlinear CG convergence for special right-hand sides, Electron. Trans. Numer. Anal., 14 (2002), pp. 1–19.
[6] P. Benner, Control theory, in Handbook of Linear Algebra, Chapman & Hall/CRC, Boca Raton, FL, 2006, Chapter 57.
[7] J. P. Coleman and N. J. Myers, The Faber polynomials for annular sectors, Math. Comp., 64 (1995), pp. 181–203.
[8] M. J. Corless and A. E. Frazho, Linear Systems and Control: An Operator Perspective, Pure Appl. Math., Marcel Dekker, New York, Basel, 2003.
[9] B. N. Datta, Krylov subspace methods for large-scale matrix problems in control, Future Generation Computer Systems, 19 (2003), pp. 1253–1263.
[10] T. A. Driscoll and L. N. Trefethen, Schwarz-Christoffel Mapping, Cambridge Monogr. Appl. Comput. Math. 8, Cambridge University Press, Cambridge, UK, 2002.
[11] T. Driscoll, Algorithm 756: A MATLAB Toolbox for Schwarz-Christoffel mapping, ACM Trans. Math. Software, 22 (1996), pp. 168–186.
[12] V. Druskin and L. Knizhnerman, Two polynomial methods of calculating functions of symmetric matrices, U.S.S.R. Comput. Math. Math. Phys., 29 (1989), pp. 112–121.
[13] V. Druskin and L. Knizhnerman, Extended Krylov subspaces: Approximation of the matrix square root and related functions, SIAM J. Matrix Anal. Appl., 19 (1998), pp. 755–771.
[14] M. Eiermann, On semiiterative methods generated by Faber polynomials, Numer. Math., 56 (1989), pp. 139–156.
[15] M. Eiermann, Field of values and iterative methods, Linear Algebra Appl., 180 (1993), pp. 167–197.
[16] S. W. Ellacott, Computation of Faber series with application to numerical polynomial approximation in the complex plane, Math. Comp., 40 (1983), pp. 575–587.
[17] K. O. Geddes, Near-minimax polynomial approximation in an elliptical region, SIAM J. Numer. Anal., 15 (1978), pp. 1225–1233.
[18] G. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, Baltimore, MD, 1996.
[19] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products (corrected and enlarged edition), Academic Press, San Diego, CA, 1980.
[20] N. J. Higham, The Matrix Computation Toolbox, http://www.ma.man.ac.uk/~higham/mctoolbox.
[21] M. Hochbruck and C. Lubich, On Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal., 34 (1997), pp. 1911–1925.
[22] I. M. Jaimoukha and E. M. Kasenally, Krylov subspace methods for solving large Lyapunov equations, SIAM J. Numer. Anal., 31 (1994), pp. 227–251.
[23] K. Jbilou and A. J. Riquet, Projection methods for large Lyapunov matrix equations, Linear Algebra Appl., 415 (2006), pp. 344–358.
[24] L. Knizhnerman, Calculus of functions of unsymmetric matrices using Arnoldi's method, Comput. Math. Math. Phys., 31 (1991), pp. 1–9.
[25] T. Koch and J. Liesen, The conformal "bratwurst" maps and associated Faber polynomials, Numer. Math., 86 (2000), pp. 173–191.
[26] J. Liesen, Construction and Analysis of Polynomial Iterative Methods for Non-Hermitian Systems of Linear Equations, Ph.D. thesis, Fakultät für Mathematik, Universität Bielefeld, 1998.
[27] T. A. Manteuffel, The Tchebychev iteration for nonsymmetric linear systems, Numer. Math., 28 (1977), pp. 307–327.
[28] Matrix Market, A Visual Repository of Test Data for Use in Comparative Studies of Algorithms for Numerical Linear Algebra, Mathematical and Computational Sciences Division, National Institute of Standards and Technology; available online at http://math.nist.gov/MatrixMarket.
[29] O. Nevanlinna, Convergence of Iterations for Linear Equations, Birkhäuser, Basel, 1993.
[30] M. Robbé and M. Sadkane, A convergence analysis of GMRES and FOM for Sylvester equations, Numer. Algorithms, 30 (2002), pp. 71–89.
[31] Y. Saad, Numerical solution of large Lyapunov equations, in Signal Processing, Scattering, Operator Theory, and Numerical Methods, Proceedings of the International Symposium MTNS-89, Vol. III, M. A. Kaashoek, J. H. van Schuppen, and A. C. Ran, eds., Birkhäuser, Boston, 1990, pp. 503–511.
[32] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS, Boston, 1996.
[33] V. Simoncini, A new iterative method for solving large-scale Lyapunov matrix equations, SIAM J. Sci. Comput., 29 (2007), pp. 1268–1288. [34] V. Simoncini and D. B. Szyld, On the occurrence of superlinear convergence of exact and inexact Krylov subspace methods, SIAM Rev., 47 (2005), pp. 247–272. [35] V. I. Smirnov and N. A. Lebedev, Functions of a Complex Variable, Constructive Theory, MIT Press, Cambridge, MA, 1968. [36] D. E. Stewart and T. S. Leyk, Error estimates for Krylov subspace approximations of matrix exponentials, J. Comput. Appl. Math., 72 (1996), pp. 359–369. [37] P. K. Suetin, Fundamental properties of Faber polynomials, Russian Math. Surv., 19 (1964), pp. 121–149. [38] P. K. Suetin, Series of Faber Polynomials (Analytical Methods and Special Functions), Gordon and Breach Science Publishers, Amsterdam, 1998 (translated by E. V. Pankratiev). [39] H. Tal-Ezer, Spectral methods in time for parabolic problems, SIAM J. Numer. Anal., 26 (1989), pp. 1–11.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 844–860
CAN THE NONLOCAL CHARACTERIZATION OF SOBOLEV SPACES BY BOURGAIN ET AL. BE USEFUL FOR SOLVING VARIATIONAL PROBLEMS?∗
GILLES AUBERT† AND PIERRE KORNPROBST‡
Abstract. We question whether the recent characterization of Sobolev spaces by Bourgain, Brezis, and Mironescu (2001) could be useful to solve variational problems on $W^{1,p}(\Omega)$. To answer this, we introduce a sequence of functionals so that the seminorm is approximated by an integral operator involving a differential quotient and a radial mollifier. Then, for the approximated formulation, we prove existence, uniqueness, and convergence of the solution to the unique solution of the initial formulation. We show that these results can also be extended to the BV-case. Interestingly, this approximation leads to a unified implementation for Sobolev spaces (including high values of $p$) and for the BV space. Finally, we show how this theoretical study can indeed lead to a numerically tractable implementation, and we give some image diffusion results as an illustration.
Key words. calculus of variation, functional analysis, Sobolev spaces, BV, variational approach, integral approximations, nonlocal formulations
AMS subject classifications. 35J, 45E, 49J, 65N, 68W
DOI. 10.1137/070696751
1. Introduction. The goal of this work is to propose a new unifying method for solving variational problems defined on the Sobolev spaces $W^{1,p}(\Omega)$ or on the space of functions of bounded variation $BV(\Omega)$ of the form
\[
\inf_{u \in W^{1,p}(\Omega)} F(u), \quad \text{with} \quad F(u) = \int_\Omega |\nabla u(x)|^p \, dx + \int_\Omega h(x, u(x)) \, dx. \tag{1.1}
\]
To solve this problem numerically, particularly in the case when $p = 1$, several methods have been proposed; see, e.g., [8, 13, 14, 7, 18, 19]. These methods mainly rely on regularization or duality results. In this article we propose an alternative method based on a recent characterization of the Sobolev spaces by Bourgain, Brezis, and Mironescu [5], further extended by Ponce [16] to the BV-case. In [5] the authors showed that the Sobolev seminorm of a function $u$ can be approximated by a sequence of integral operators involving a differential quotient of $u$ and a suitable sequence of radial mollifiers:
\[
\lim_{n \to \infty} \int_\Omega \int_\Omega \frac{|u(x) - u(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy = K_{N,p} \int_\Omega |\nabla u|^p \, dx.
\]
∗ Received by the editors July 10, 2007; accepted for publication (in revised form) October 15, 2008; published electronically February 6, 2009. http://www.siam.org/journals/sinum/47-2/69675.html
† Laboratoire J. A. Dieudonné, UMR 6621 CNRS, Université de Nice-Sophia Antipolis, 06108 Nice Cedex 2, France ([email protected]).
‡ INRIA Sophia Antipolis, Projet Odyssée, 2004 Route des Lucioles, 06902 Sophia Antipolis, France ([email protected]).
VARIATIONAL PROBLEMS IN W 1,p (Ω) AND BV (Ω)
In this paper, our main contribution is to show how this characterization can be used to approximate the variational formulation (1.1) by defining the sequence of functionals
\[
F_n(u) = \int_\Omega \int_\Omega \frac{|u(x) - u(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy + \int_\Omega h(x, u(x)) \, dx.
\]
To do this, we prove that the sequence of minimizers of $F_n$ converges to the solution of the original variational formulation. We prove this result for any $p \ge 1$, so that the BV-case is also covered (thanks to results by Ponce [16]). Note that the approximation is not constrained by the data-fidelity term (see [7]). Numerically, we propose a unified subgradient approach for all $p \ge 1$, and we show how to discretize the nonlocal singular term with a finite element–type method.
Interestingly, the nonlocal term in $F_n$ has some similarities to recent contributions by Gilboa and Osher [12] and Gilboa et al. [11], who propose to minimize nonlocal functionals of the type
\[
\int_\Omega \int_\Omega \phi(|u(x) - u(y)|) \, w(|x, y|) \, dx \, dy,
\]
where $\phi$ is a convex positive function and $w$ is a weighting function. The authors propose a general formalism for nonlocal smoothing terms but define them heuristically for their applications in image processing (see also the link to neighborhood filters [6]). In our contribution, the nonlocal term that we propose comes from the approximation of a seminorm, so that we will show some regularity results on the solution. Notice that one related major difference is the weighting function, which is in our case singular.
This paper is organized as follows. In section 2, we recall the main results from [5] that we will use herein and define the sequence of the approximating functional $F_n$. In section 3, we present the most significant results of the paper, considering the case $p > 1$: we prove existence and uniqueness of a minimizer $u_n$ of $F_n$, characterize its regularity, derive the optimality condition, and finally show that $u_n$ converges to the unique solution of the initial formulation. In section 4, we describe how those results can be extended to the case $p = 1$, which corresponds to the BV-case. Finally, we show in section 5 how this theoretical study can indeed lead to a numerically tractable implementation, and we give some image diffusion results as an illustration.
2. The Bourgain–Brezis–Mironescu result. Let us first recall the result of Bourgain, Brezis, and Mironescu [5].
Proposition 2.1. Assume $1 \le p < \infty$ and $u \in W^{1,p}(\Omega)$, and let $\rho \in L^1(\mathbb{R})$, $\rho \ge 0$. Then
\[
\int_\Omega \int_\Omega \frac{|u(x) - u(y)|^p}{|x - y|^p} \, \rho(|x - y|) \, dx \, dy \le C \, \|u\|^p_{W^{1,p}} \, \|\rho\|_{L^1(\mathbb{R})}, \tag{2.1}
\]
where $\|u\|_{W^{1,p}}$ denotes the (semi)norm defined by $\|u\|^p_{W^{1,p}} = \int_\Omega |\nabla u|^p \, dx$ and $C$ depends only on $p$ and $\Omega$.
Now let us suppose that $(\rho_n)$ is a sequence of radial mollifiers, i.e.,
\[
\rho_n \ge 0, \qquad \int_{\mathbb{R}^N} \rho_n(|x|) \, dx = 1, \tag{2.2}
\]
and for every $\delta > 0$, we assume that
\[
\lim_{n \to \infty} \int_\delta^\infty \rho_n(r) \, r^{N-1} \, dr = 0. \tag{2.3}
\]
GILLES AUBERT AND PIERRE KORNPROBST
With conditions (2.2) and (2.3), which we will assume throughout this article, we have the following proposition.
Proposition 2.2. If $1 < p < \infty$ and $u \in W^{1,p}(\Omega)$, then
\[
\lim_{n \to \infty} \int_\Omega \int_\Omega \frac{|u(x) - u(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy = K_{N,p} \, \|u\|^p_{W^{1,p}}, \tag{2.4}
\]
where $K_{N,p}$ depends only on $p$ and $N$.
In this paper, we propose to apply Propositions 2.1 and 2.2 for solving general variational problems of the form
\[
\inf_{u \in W^{1,p}(\Omega)} F(u), \tag{2.5}
\]
with
\[
F(u) = \int_\Omega |\nabla u(x)|^p \, dx + \int_\Omega h(x, u(x)) \, dx, \qquad u \in W^{1,p}(\Omega). \tag{2.6}
\]
To do this, following [5], we introduce the nonlocal formulation
\[
\inf_{u \in L^p(\Omega)} F_n(u), \tag{2.7}
\]
with
\[
F_n(u) = \int_\Omega \int_\Omega \frac{|u(x) - u(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy + \int_\Omega h(x, u(x)) \, dx. \tag{2.8}
\]
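As a rough numerical illustration of Proposition 2.2, the following sketch evaluates the double integral in (2.4) on $\Omega = (0, 1)$ by a midpoint-type rule. It rests on assumptions that are not part of the paper: the dimension is $N = 1$, the mollifier is a Gaussian normalized as in (2.2), and the function name `nonlocal_energy` and the grid size `m` are hypothetical. In one dimension the constant $K_{1,p}$ equals 1 under this normalization (take $u(x) = x$), so the values should approach $\int_0^1 |u'(x)|^p \, dx$ as $n$ grows, up to a boundary-layer deficit of order $1/n$.

```python
import math


def nonlocal_energy(u, p, n, m=400):
    """Estimate the double integral in (2.4) on Omega = (0, 1).

    Assumptions (illustrative, not from the paper): N = 1 and a Gaussian
    mollifier rho_n(r) = n * exp(-(n r)^2 / 2) / sqrt(2 pi), whose integral
    over the real line equals 1, as required by (2.2).
    """
    h = 1.0 / m
    xs = [(i + 0.5) * h for i in range(m)]    # quadrature nodes in x
    ys = [(j + 0.25) * h for j in range(m)]   # shifted nodes in y, so x != y
    norm = n / math.sqrt(2.0 * math.pi)
    total = 0.0
    for x in xs:
        ux = u(x)
        for y in ys:
            r = abs(x - y)
            rho = norm * math.exp(-0.5 * (n * r) ** 2)
            # differential quotient raised to the power p, weighted by rho_n
            total += (abs(ux - u(y)) / r) ** p * rho
    return total * h * h
```

For $u(x) = x^2$ and $p = 2$ the limit value is $4/3$; calling `nonlocal_energy(lambda x: x * x, 2, n)` for increasing `n` should give values that move toward it.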
Our goal is to establish in which sense formulation (2.7)–(2.8) approximates the initial formulation (2.5)–(2.6).
3. Approximation of variational problems on $W^{1,p}(\Omega)$, $p > 1$. Thanks to Proposition 2.1, functional $F_n(u)$ is well-defined on $W^{1,p}(\Omega)$. However, one cannot prove directly that $F_n$ admits a unique minimizer on $W^{1,p}(\Omega)$, since minimizing sequences cannot be bounded in that space. Thus we need to consider the minimization over the larger space $L^p(\Omega)$, and problem (2.7) is in fact an unbounded problem in $L^p(\Omega)$. In this section, we prove the following results:
• For $n$ fixed, we show in section 3.1 that problem (2.7) admits a unique solution $u_n \in L^p(\Omega)$.
• Then we show in section 3.2 that $u_n$ is more regular and belongs to the Sobolev space $W^{s,p}(\Omega)$ with $1/2 < s < 1$. Moreover, we show that all minimizing sequences are bounded in $W^{s,p}(\Omega)$. The main consequence is that minimizing sequences $(u_n^l)_l$ indeed converge strongly to $u_n$. This additional regularity will also enable us to consider problems with Dirichlet boundary conditions, since one can give a meaning to the trace operator on that space.
• The previous regularity result will be fundamental in section 3.3 when we let $n$ tend to infinity. Applying some results by Ponce [16], we will show that $u_n$ converges to the unique solution $u$ of the original formulation (2.5).
• In section 3.4 we establish the expression of the Euler–Lagrange equation.
Remark. Note that throughout this section and in the proofs, we will denote by $C$ a universal constant that may be different from one line to the other. If the constant depends on $n$, for example, it will be denoted by $C(n)$.
3.1. Existence and uniqueness of a solution $u_n$ in $L^p(\Omega)$. Now, let us show that functional (2.8) admits a unique minimizer. It is clear by using again Proposition 2.1 and the fact that $\|\rho_n\|_{L^1(\mathbb{R})} = 1$ that we have for all $v$ in $W^{1,p}(\Omega)$
\[
\inf_{u \in L^p(\Omega)} F_n(u) \le \inf_{u \in W^{1,p}(\Omega)} F_n(u) \le F_n(v) \le C \, \|v\|^p_{W^{1,p}} + \int_\Omega h(x, v(x)) \, dx,
\]
from which we deduce that $\inf_{u \in L^p(\Omega)} F_n(u)$ is bounded by a finite constant (independent of $n$).
Proposition 3.1. Assume that $h \ge 0$, the function $x \mapsto h(x, u(x))$ is in $L^1(\Omega)$ for all $u$ in $L^p(\Omega)$, $h$ is convex with respect to its second argument, and, for each $n$, the function $t \mapsto \rho_n(t)$ is nonincreasing. Then functional (2.8) admits a unique minimizer in $L^p(\Omega)$.
Before proving this proposition, let us recall a technical lemma from Bourgain, Brezis, and Mironescu (Lemma 2 in [5]) that we will use in the proof of Proposition 3.1.
Lemma 3.2. Let $g, k : (0, \delta) \to \mathbb{R}^+$. Assume $g(t) \le g(t/2)$ for $t \in (0, \delta)$, and that $k$ is nonincreasing. Then for all $M > 0$, there exists a constant $C(M) > 0$ such that
\[
\int_0^\delta t^{M-1} g(t) k(t) \, dt \ge C(M) \, \delta^{-M} \int_0^\delta t^{M-1} g(t) \, dt \int_0^\delta t^{M-1} k(t) \, dt. \tag{3.1}
\]
Proof of Proposition 3.1. Let us consider a minimizing sequence $u_n^l$ of $F_n(u)$ with $n > 0$ fixed. Since $h \ge 0$ and $\inf_{u \in L^p(\Omega)} F_n(u)$ is bounded, there exists a constant $C$ such that
\[
\int_\Omega \int_\Omega \frac{|u_n^l(x) - u_n^l(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy \le C. \tag{3.2}
\]
We are going to apply techniques borrowed from Bourgain, Brezis, and Mironescu [5, Theorem 4]. Without loss of generality, we may assume that $\Omega = \mathbb{R}^N$ and that the support of $u_n^l$ is included in a ball $B$ of diameter 1. This can be achieved by extending each function $u_n^l$ by reflection across the boundary in a neighborhood of $\partial\Omega$. We may also assume the normalization condition $\int_\Omega u_n^l(x) \, dx = 0$ for all $n$ and $l$. Let us define for each $n$, $l$, $t > 0$
\[
E_n^l(t) = \int_{S^{N-1}} \int_{\mathbb{R}^N} |u_n^l(x + tw) - u_n^l(x)|^p \, dx \, dw, \tag{3.3}
\]
where $S^{N-1}$ denotes the unit sphere of $\mathbb{R}^N$. Straightforward changes of variables show that
\[
\int_\Omega \int_\Omega \frac{|u_n^l(x) - u_n^l(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy = \int_0^1 t^{N-1} \, \frac{E_n^l(t)}{t^p} \, \rho_n(t) \, dt,
\]
and thus (3.2) can be equivalently expressed as
\[
\int_0^1 t^{N-1} \, \frac{E_n^l(t)}{t^p} \, \rho_n(t) \, dt \le C. \tag{3.4}
\]
Now since we have supposed that $u_n^l$ is of zero mean, we can write
\[
u_n^l(x) = u_n^l(x) - \frac{1}{|B|} \int_B u_n^l(y) \, dy.
\]
Thus
\[
\int_{\mathbb{R}^N} |u_n^l(x)|^p \, dx = \int_{\mathbb{R}^N} \left| u_n^l(x) - \frac{1}{|B|} \int_B u_n^l(y) \, dy \right|^p dx = \frac{1}{|B|^p} \int_{\mathbb{R}^N} \left| \int_B \big( u_n^l(x) - u_n^l(y) \big) \, dy \right|^p dx,
\]
and, thanks to the Hölder inequality, there exists a constant $C$ such that
\[
\int_{\mathbb{R}^N} |u_n^l(x)|^p \, dx \le C \int_{|h| \le 1} \int_{\mathbb{R}^N} |u_n^l(x + h) - u_n^l(x)|^p \, dx \, dh = C \int_0^1 t^{N-1} E_n^l(t) \, dt. \tag{3.5}
\]
Now, an interesting property of $E_n^l$ is that
\[
E_n^l(2t) \le 2^p E_n^l(t). \tag{3.6}
\]
Inequality (3.6) follows from the inequality $|a + b|^p \le 2^{p-1}(|a|^p + |b|^p)$:
\[
\begin{aligned}
E_n^l(2t) &= \int_{S^{N-1}} \int_{\mathbb{R}^N} |u_n^l(x + 2tw) - u_n^l(x)|^p \, dx \, dw \\
&= \int_{S^{N-1}} \int_{\mathbb{R}^N} |u_n^l(x + 2tw) - u_n^l(x + tw) + u_n^l(x + tw) - u_n^l(x)|^p \, dx \, dw \\
&\le 2^{p-1} \left( \int_{S^{N-1}} \int_{\mathbb{R}^N} |u_n^l(x + 2tw) - u_n^l(x + tw)|^p \, dx \, dw + \int_{S^{N-1}} \int_{\mathbb{R}^N} |u_n^l(x + tw) - u_n^l(x)|^p \, dx \, dw \right) \qquad (3.7) \\
&\le 2^p E_n^l(t),
\end{aligned}
\]
since both integrals in (3.7) are equal (up to a change of variable). To conclude we apply Lemma 3.2 with $M = N$, $\delta = 1$, $k(t) = \rho_n(t)$, and $g(t) = E_n^l(t)/t^p$ (this choice is valid thanks to the hypotheses on $\rho_n$ and property (3.6)). We obtain
\[
\begin{aligned}
\int_0^1 t^{N-1} \rho_n(t) \, \frac{E_n^l(t)}{t^p} \, dt &\ge C \int_0^1 t^{N-1} \rho_n(t) \, dt \int_0^1 t^{N-1} \, \frac{E_n^l(t)}{t^p} \, dt \\
&\ge C \int_0^1 t^{N-1} \rho_n(t) \, dt \int_0^1 t^{N-1} E_n^l(t) \, dt, \qquad (3.8)
\end{aligned}
\]
where we have used in the last inequality the fact that $0 < t < 1$. Let us denote $d(n) = \int_0^1 t^{N-1} \rho_n(t) \, dt > 0$; we obtain, thanks to (3.4), (3.5), and (3.8), that there exists a constant $C(n) > 0$ (but which is independent of $l$) such that
\[
\|u_n^l\|_{L^p(\Omega)} \le C(n). \tag{3.9}
\]
From (3.9), we deduce that, up to a subsequence, $u_n^l$ tends weakly in $L^p(\Omega)$ to some $u_n \in L^p(\Omega)$ as $l \to +\infty$. Then we deduce that the sequence $w_n^l(x, y) = u_n^l(x) - u_n^l(y)$ tends weakly in $L^p(\Omega \times \Omega)$ to $w_n(x, y) = u_n(x) - u_n(y)$. Since the functional
\[
w \mapsto \int_\Omega \int_\Omega \frac{\rho_n(|x - y|)}{|x - y|^p} \, |w(x, y)|^p \, dx \, dy
\]
is nonnegative, convex, and lower semicontinuous from $L^p(\Omega \times \Omega)$ to $\overline{\mathbb{R}}$, we easily get
\[
F_n(u_n) \le \liminf_{l \to \infty} F_n(u_n^l) = \inf_{u \in L^p(\Omega)} F_n(u),
\]
where $\liminf$ denotes the lower limit. Therefore $u_n$ is a minimizer of $F_n$. Moreover it is unique since the function $t \mapsto |t|^p$ is strictly convex for $p > 1$.
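For completeness, the hypothesis $g(t) \le g(t/2)$ of Lemma 3.2, which the proof invokes for $g(t) = E_n^l(t)/t^p$, can be verified in one line. The short computation below is a sketch that uses only (3.6), applied at $t/2$ in place of $t$:

```latex
\[
g\!\left(\tfrac{t}{2}\right)
  = \frac{E_n^l(t/2)}{(t/2)^p}
  = \frac{2^p \, E_n^l(t/2)}{t^p}
  \;\ge\; \frac{E_n^l(t)}{t^p}
  = g(t),
  \qquad 0 < t < 1 .
\]
```

The remaining hypothesis of Lemma 3.2, that $k = \rho_n$ is nonincreasing, is exactly the monotonicity assumption made in Proposition 3.1.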
3.2. Regularity result for $u_n$. We have obtained the existence of a minimizer in $L^p(\Omega)$. Let us show that the solution is in fact more regular than just $L^p$. As for $W^{1,p}(\Omega)$, the space $W^{s,p}(\Omega)$ can be characterized by a differential quotient. For $0 < s < 1$ and $1 \le p < \infty$, we define
\[
W^{s,p}(\Omega) = \left\{ u \in L^p(\Omega); \ \frac{|u(x) - u(y)|}{|x - y|^{s + N/p}} \in L^p(\Omega \times \Omega) \right\},
\]
endowed with the norm
\[
\|u\|^p_{W^{s,p}(\Omega)} = \int_\Omega |u|^p \, dx + \int_\Omega \int_\Omega \frac{|u(x) - u(y)|^p}{|x - y|^{sp + N}} \, dx \, dy.
\]
Let us consider $n$ fixed and let us denote by $C(n)$ a universal positive constant depending on $n$ (i.e., $C(n)$ may be different from one line to the next). Let $(u_n^l)_l$ be a minimizing sequence of (2.7) so that
\[
\int_\Omega \int_\Omega \frac{|u_n^l(x) - u_n^l(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy \le C(n). \tag{3.10}
\]
Then we would like to prove that (3.10) implies
\[
\int_\Omega \int_\Omega \frac{|u_n^l(x) - u_n^l(y)|^p}{|x - y|^{sp + N}} \, dx \, dy \le C(n) \tag{3.11}
\]
for some $1/2 < s < 1$ and some other constant $C(n)$, thus showing that $u_n^l$ belongs to $W^{s,p}(\Omega)$.
Proposition 3.3. Let $q$ be a real number such that $p/2 < q < p$ and $(p - 1) \le q$, and let us assume that $\rho_n$ verifies (2.2)–(2.3) and also that the conditions of Proposition 3.1 are fulfilled. Moreover let us suppose that the functions $t \mapsto \rho_n(t)$ and $t \mapsto t^{q+2-p} \rho_n(t)$ are nonincreasing for $t \ge 0$. Then $u_n^l \in W^{q/p,p}(\Omega)$ for all $l$.
Proof. Without loss of generality, let us prove Proposition 3.3 for the case $N = 2$. Equivalently, thanks to the definition (3.3) of $E_n^l$, we can rewrite (3.10) and (3.11) so that one needs to prove that
\[
\int_0^1 t \, \frac{E_n^l(t)}{t^p} \, \rho_n(t) \, dt \le C(n) \tag{3.12}
\]
implies
\[
\int_0^1 t \, \frac{E_n^l(t)}{t^{sp + 2}} \, dt \le C(n).
\]
Let us apply Lemma 3.2 with $M = \delta = 1$, $g(t) = E_n^l(t)/t^{q+1}$, $k(t) = t^{q+2-p} \rho_n(t)$. Assuming the hypothesis on $g(t)$ is true, Lemma 3.2 gives
\[
\int_0^1 \frac{E_n^l(t) \rho_n(t)}{t^{p-1}} \, dt \ge C(M) \int_0^1 \frac{E_n^l(t)}{t^{q+1}} \, dt \int_0^1 t^{q+2-p} \rho_n(t) \, dt. \tag{3.13}
\]
Therefore
\[
\int_0^1 \frac{E_n^l(t)}{t^{q+1}} \, dt \le \frac{1}{C(M) \int_0^1 t^{q+2-p} \rho_n(t) \, dt} \int_0^1 \frac{E_n^l(t) \rho_n(t)}{t^{p-1}} \, dt,
\]
and according to (3.12), we get
\[
\int_0^1 \frac{E_n^l(t)}{t^{q+1}} \, dt \le \frac{C(n)/C(M)}{\int_0^1 t^{q+2-p} \rho_n(t) \, dt},
\]
where the right-hand term is bounded independently of $l$. Thus $u_n^l \in W^{s,p}(\Omega)$ with $s = q/p$, and since we have supposed $p/2 < q < p$ we have $1/2 < s < 1$.
So it remains to show that function $g(t)$ verifies the hypothesis of Lemma 3.2. We have to check $g(t) \le g(t/2)$. Since $g(t) = E_n^l(t)/t^{q+1}$, then
\[
g(t/2) = \frac{E_n^l(t/2)}{t^{q+1}} \, 2^{q+1} \ge \frac{E_n^l(t)}{t^{q+1}} \, 2^{q+1-p} = 2^{q+1-p} g(t)
\]
(thanks to (3.6)). Thus we get $g(t/2) \ge g(t)$ if $q + 1 - p \ge 0$, i.e., if $q \ge (p - 1)$.
Depending on $p$, one needs to find a function $\rho_n(t)$ so that $\rho_n(t)$ and $t^{q+2-p} \rho_n(t)$ are decreasing, and verify (2.2) and (2.3). Let us show that such a function $\rho_n$ exists. We define
\[
\rho_n(t) = C n^2 \rho(nt) \quad \text{with} \quad C = \frac{1}{\int_{\mathbb{R}^2} \rho(|x|) \, dx} \tag{3.14}
\]
and, depending on the values of $p$, we propose the following functions:
\[
\rho(t) =
\begin{cases}
\exp(-t)/t^{q+1} & \text{if } p = 1, \text{ with } 0.5 < q < 1, \\
\exp(-t)/t^{q} & \text{if } p = 2, \text{ with } 1 < q < 2, \\
\exp(-t)/t & \text{if } p > 2, \text{ with } q = p - 1.
\end{cases} \tag{3.15}
\]
As a consequence, we have the following proposition.
Proposition 3.4. Let $(u_n^l)_l$ be a minimizing sequence of (2.7). Let us suppose that $h$ verifies the conditions of Proposition 3.1 and the coercivity condition $h(x, u) \ge a|u|^p + b$, with $a > 0$. Then the sequence $(u_n^l)_l$ is bounded in $W^{q/p,p}(\Omega)$ uniformly with respect to $l$. Therefore, up to a subsequence, $u_n^l$ tends weakly to $u_n$ in $W^{q/p,p}(\Omega)$ (and strongly in $L^p(\Omega)$).
Another direct consequence of Proposition 3.3 is the following.
Lemma 3.5. We have $\inf_{u \in L^p(\Omega)} F_n(u) = \inf_{u \in W^{s,p}(\Omega)} F_n(u)$, and the solution of the problem posed on $L^p(\Omega)$ is also the solution of the problem posed in $W^{s,p}(\Omega)$.
Proof. Since $W^{s,p}(\Omega) \subset L^p(\Omega)$, then
\[
\inf_{u \in L^p(\Omega)} F_n(u) \le \inf_{u \in W^{s,p}(\Omega)} F_n(u).
\]
By definition, since $u_n$ is the minimizer of $F_n$ in $L^p(\Omega)$, we have
\[
F_n(u_n) = \inf_{u \in L^p(\Omega)} F_n(u) \le \inf_{u \in W^{s,p}(\Omega)} F_n(u),
\]
but as $u_n \in W^{s,p}(\Omega)$, we have finally
\[
\inf_{u \in W^{s,p}(\Omega)} F_n(u) \le F_n(u_n) = \inf_{u \in L^p(\Omega)} F_n(u) \le \inf_{u \in W^{s,p}(\Omega)} F_n(u),
\]
which concludes the proof.
Remark. Yet another consequence of Proposition 3.3 is that one can also consider problems with Dirichlet boundary conditions if necessary: If one needs to solve problem (2.5) with a Dirichlet boundary condition $u = \varphi$ on $\partial\Omega$, then one can require the minimizing sequence of (2.7) to verify $u_n^l = \varphi$ on $\partial\Omega$ (which has a meaning thanks to this regularity result), so that, by continuity of the trace operator, we have $u_n = \varphi$ on $\partial\Omega$. Thus $u_n$ is the unique minimizer in $W^{q/p,p}(\Omega)$ of problem (2.7), also verifying the Dirichlet boundary condition.
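The mollifier family (3.14)–(3.15) can be checked numerically against the requirements of Proposition 3.3. The sketch below is illustrative only: the function names (`rho`, `normalizing_constant`, `rho_n`, `tail_integral`) are hypothetical, the dimension is $N = 2$ as in the proof, and the constant $C$ is evaluated in polar coordinates through the Gamma function, which is an elementary computation rather than a formula stated in the paper.

```python
import math


def rho(t, p, q):
    """Profile rho(t) from (3.15); q is assumed to lie in the stated range."""
    if p == 1:
        return math.exp(-t) / t ** (q + 1)
    if p == 2:
        return math.exp(-t) / t ** q
    return math.exp(-t) / t  # p > 2, with q = p - 1


def normalizing_constant(p, q):
    """C = 1 / int_{R^2} rho(|x|) dx, computed in polar coordinates (N = 2)."""
    if p == 1:
        radial = math.gamma(1.0 - q)   # int_0^inf exp(-r) r^(-q) dr
    elif p == 2:
        radial = math.gamma(2.0 - q)   # int_0^inf exp(-r) r^(1-q) dr
    else:
        radial = 1.0                   # int_0^inf exp(-r) dr
    return 1.0 / (2.0 * math.pi * radial)


def rho_n(t, n, p, q):
    """Rescaled mollifier (3.14): rho_n(t) = C n^2 rho(n t)."""
    return normalizing_constant(p, q) * n * n * rho(n * t, p, q)


def tail_integral(delta, n, p, q, steps=20000, cutoff=60.0):
    """Midpoint estimate of int_delta^inf rho_n(r) r dr, i.e. (2.3) for N = 2.

    The integral is truncated at r = delta + cutoff / n, beyond which the
    factor exp(-n r) makes the remainder negligible.
    """
    h = (cutoff / n) / steps
    total = 0.0
    for i in range(steps):
        r = delta + (i + 0.5) * h
        total += rho_n(r, n, p, q) * r
    return total * h
```

With these choices, $t^{q+2-p} \rho(t)$ reduces to $\exp(-t)$ in all three cases, which makes the monotonicity requirement transparent, and `tail_integral(delta, n, p, q)` should decrease toward 0 as `n` grows, in line with (2.3).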
3.3. Study of $\lim_{n \to \infty} u_n$. In sections 3.1 and 3.2 we proved the existence of a unique solution $u_n$ for problem (2.7), with $n$ fixed, which is in fact in $W^{s,p}(\Omega)$. Now, we are going to examine the asymptotic behavior of (2.7) as $n \to \infty$. Throughout this section we will suppose that the hypotheses stated in Propositions 3.3 and 3.4 hold. By definition of a minimizer, we have, for all $v \in W^{q/p,p}(\Omega)$,
\[
F_n(u_n) \le F_n(v) = \int_\Omega \int_\Omega \frac{|v(x) - v(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy + \int_\Omega h(x, v(x)) \, dx. \tag{3.16}
\]
Thus by using (2.1) and the fact that $\|\rho_n\|_{L^1} = 1$ we deduce from (3.16) that $F_n(u_n)$ is bounded uniformly with respect to $n$. In particular, we get for some constant $C > 0$
\[
\int_\Omega \int_\Omega \frac{|u_n(x) - u_n(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy \le C.
\]
By using the same technique as in Proposition 3.3, we still have that $(u_n)$ is bounded in $W^{q/p,p}(\Omega)$. Therefore there exists $u$ such that (up to a subsequence) $u_n \to u$ strongly in $L^p(\Omega)$. Moreover, by applying Theorem 4 from [5], we obtain that $u \in W^{1,p}(\Omega)$. We claim that $u$ is the unique solution of problem (2.5), i.e., for all $v \in W^{1,p}(\Omega)$,
\[
\int_\Omega |\nabla u(x)|^p \, dx + \int_\Omega h(x, u(x)) \, dx \le \int_\Omega |\nabla v(x)|^p \, dx + \int_\Omega h(x, v(x)) \, dx. \tag{3.17}
\]
To prove (3.17) we refer the reader to the paper by Ponce [16]. In that paper the author studies, in the same spirit as [5], new characterizations of Sobolev spaces and also of the space $BV(\Omega)$ of functions of bounded variation (see also section 4). The author considers more general differential quotients than the ones in [5], namely, functionals of the form
\[
E_n(u) = \int_\Omega \int_\Omega w\!\left( \frac{|u(x) - u(y)|}{|x - y|} \right) \rho_n(|x - y|) \, dx \, dy.
\]
By studying the asymptotic behavior, Ponce [16] obtained new characterizations of $W^{1,p}(\Omega)$ but also of $BV(\Omega)$. In particular, for $w(t) = |t|^p$ the author proved that $E_n(u)$ Γ-converges (up to a multiplicative constant) to $E(u) = \int_\Omega |\nabla u|^p \, dx$. We have the following proposition.
Proposition 3.6. (i) The sequence of functionals
\[
F_n(u) = E_n(u) + \int_\Omega h(x, u(x)) \, dx
\]
Γ-converges (up to a multiplicative constant) to
\[
F(u) = E(u) + \int_\Omega h(x, u(x)) \, dx.
\]
(ii) The sequence $u_n$ of minimizers of $F_n(u)$, which is precompact in $L^p(\Omega)$, converges to the unique minimizer of $F(u)$.
Proof. Item (i) is the Γ-convergence result shown by Ponce [16]. Item (ii) is a direct consequence of general Γ-convergence properties, since we proved that the sequence $(u_n)$ is bounded in $W^{s,p}(\Omega)$, and thus converges strongly in $L^p(\Omega)$ to $u$ (up to a subsequence).
3.4. Euler–Lagrange equation. Since $u_n$ is a global minimizer of $F_n(u)$ it necessarily verifies $F_n'(u_n) = 0$, i.e., an Euler–Lagrange equation. The Euler–Lagrange equation is given in the following proposition.
Proposition 3.7. If function $h$ is differentiable, verifies the conditions of Propositions 3.1 and 3.4, and verifies for all $u$ and a.e. $x$ an inequality of the form $\left| \frac{\partial h}{\partial u}(x, u) \right| \le l(x) + b|u|^{p-1}$ for some function $l(x) \in L^1(\Omega)$, $l(x) > 0$ and some $b > 0$, then the unique minimizer $u_n$ of $F_n(u)$ verifies for a.e. $x$
\[
2p \int_\Omega \frac{|u_n(x) - u_n(y)|^{p-2}}{|x - y|^p} \, (u_n(x) - u_n(y)) \, \rho_n(|x - y|) \, dy + \frac{\partial h}{\partial u}(x, u_n(x)) = 0. \tag{3.18}
\]
Proof. Let us focus on the smoothing term and denote
\[
E_n(u_n) = \int_\Omega \int_\Omega \frac{|u_n(x) - u_n(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy,
\]
and let us consider for all $v$ in $W^{1,p}(\Omega)$ the differential quotient
\[
D_v(t) = \frac{E_n(u_n + tv) - E_n(u_n)}{t}.
\]
We have
\[
D_v(t) = \frac{1}{t} \int_\Omega \int_\Omega \frac{|u_n(x) - u_n(y) + t(v(x) - v(y))|^p - |u_n(x) - u_n(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy.
\]
Thanks to Taylor's formula, there exists $c(t, x, y)$ with $|c(t, x, y) - (u_n(x) - u_n(y))| < t|v(x) - v(y)|$ such that
\[
D_v(t) = p \int_\Omega \int_\Omega \frac{(v(x) - v(y)) \, c(t, x, y) \, |c(t, x, y)|^{p-2}}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy.
\]
Moreover, we have, as $t \to 0$,
\[
\frac{(v(x) - v(y)) \, c(t, x, y) \, |c(t, x, y)|^{p-2}}{|x - y|^p} \, \rho_n(|x - y|) \to \frac{(v(x) - v(y)) (u_n(x) - u_n(y)) |u_n(x) - u_n(y)|^{p-2}}{|x - y|^p} \, \rho_n(|x - y|).
\]
On the other hand $|c(t, x, y)|^{p-1} \le 2^p \left( |u_n(x) - u_n(y)|^{p-1} + |v(x) - v(y)|^{p-1} \right)$. Thus
\[
\left| \frac{(v(x) - v(y)) \, c(t, x, y) \, |c(t, x, y)|^{p-2}}{|x - y|^p} \, \rho_n(|x - y|) \right| \le 2^p \left( \frac{|v(x) - v(y)| \, |u_n(x) - u_n(y)|^{p-1}}{|x - y|^p} \, \rho_n(|x - y|) + \frac{|v(x) - v(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \right). \tag{3.19}
\]
Let us discuss the integrability of the right-hand side terms denoted, respectively, by $A$ and $B$. The second term $B$ is bounded by an integrable function because $v \in W^{1,p}(\Omega)$ and thanks to Proposition 2.1. The first term $A$ gives
\[
A = \frac{|v(x) - v(y)|}{|x - y|} \, \rho_n^{1/p}(x - y) \; \left| \frac{u_n(x) - u_n(y)}{|x - y|} \right|^{p-1} \rho_n^{(p-1)/p}(x - y),
\]
where
\[
\frac{|v(x) - v(y)|}{|x - y|} \, \rho_n^{1/p}(x - y)
\]
is in $L^p(\Omega \times \Omega)$ since $v \in W^{1,p}(\Omega)$ and thanks to Proposition 2.1, and
\[
\left| \frac{u_n(x) - u_n(y)}{|x - y|} \right|^{p-1} \rho_n^{(p-1)/p}(x - y)
\]
is in $L^{p/(p-1)}(\Omega \times \Omega)$ since $u_n$ is a minimizer of $F_n$. So $A$ is also bounded by an integrable function. Therefore we can apply Lebesgue's dominated convergence theorem ($n$ is fixed) and get
\[
\langle E_n'(u_n), v \rangle = p \int_\Omega \int_\Omega \frac{|u_n(x) - u_n(y)|^{p-2}}{|x - y|^p} \, (v(x) - v(y)) (u_n(x) - u_n(y)) \, \rho_n(|x - y|) \, dx \, dy.
\]
The computation of the derivative of $\int_\Omega h(x, u(x)) \, dx$ is classical. Thus the desired result (3.18) follows by remarking that the function
\[
(x, y) \mapsto \frac{|u_n(x) - u_n(y)|^{p-2} (u_n(x) - u_n(y))}{|x - y|^p}
\]
is antisymmetric with respect to $(x, y)$.
4. Extension of previous results to the $BV(\Omega)$-case ($p = 1$). A similar result to that of Proposition 2.2 holds if $p = 1$; see [16]. In this case we need to search for a solution for problem (2.5) in $BV(\Omega)$, the space of functions of bounded variation [1, 10]. In fact most results are still valid in this case with some adaptations. We do not reproduce here details of their proofs, which rely upon the work by Ponce [16], who has, as said before, generalized to $BV(\Omega)$ the results of [5] stated in the $W^{1,p}(\Omega)$ case. Let us recall the main steps and show how the results can be extended.
• The first point is that the proof of Proposition 3.1 does not apply in the case $p = 1$ since we cannot extract from a sequence bounded in $L^1(\Omega)$ a weakly converging subsequence. Thus we have to show that a minimizing sequence $u_n^l$ of $F_n(u)$ is bounded in the Sobolev space $W^{q,1}(\Omega)$, with $0.5 < q < 1$. To do that, we use the same proof as in Proposition 3.3. Then, thanks to the two-dimensional Rellich–Kondrachov theorem, $W^{q,1}(\Omega) \subset L^r(\Omega)$ with compact injection for $1 \le r < \frac{2}{2-q}$ (note that if $0.5 < q < 1$, then $4/3 < \frac{2}{2-q} < 2$). Therefore, up to a subsequence, $u_n^l(x)$ tends, a.e., to some function $u_n(x)$. Then by using Fatou's lemma we get $F_n(u_n) \le \liminf_{l \to \infty} F_n(u_n^l)$; i.e., $u_n$ is a minimizer of $F_n$.
• The result when $n$ tends to infinity is again obtained thanks to the Γ-convergence result by Ponce and the compactness of the sequence $u_n$ in $L^r(\Omega)$. As a result, $u_n$ converges strongly in $L^1(\Omega)$ to $u \in BV(\Omega)$.
• Finally, the Euler–Lagrange equation (3.18) is no longer true in the case $p = 1$ since the function $t \mapsto |t|$ is not differentiable. However, it is subdifferentiable. Therefore (3.18) changes into an inclusion
\[
0 \in \partial E_n(u_n) + \frac{\partial h}{\partial u}(x, u_n), \tag{4.1}
\]
where $E_n(u) = \int_\Omega \int_\Omega \frac{|u(x) - u(y)|}{|x - y|} \, \rho_n(|x - y|) \, dx \, dy$. In (4.1), we can choose any element of the subdifferential, and, for example,
\[
2 \int_\Omega \frac{1}{|x - y|} \, \operatorname{sign}(u_n(x) - u_n(y)) \, \rho_n(|x - y|) \, dy, \tag{4.2}
\]
where
\[
\operatorname{sign}(s) =
\begin{cases}
-1 & \text{if } s < 0, \\
0 & \text{if } s = 0, \\
1 & \text{if } s > 0.
\end{cases} \tag{4.3}
\]
5. Implementation details and results.
5.1. A unified discrete implementation. In this section, we give the implementation details to solve the general variational problem (2.7) in a unified way (for $n$ fixed) for both Sobolev and BV spaces. The goal is to solve the differential inclusion $0 \in \partial F_n(u_n)$, with a standard subgradient descent approach [17, 4]:
\[
\begin{cases}
u^{k+1}(x) = u^k(x) - \alpha_k g^k(x), \\
u^0(x) = u_0(x) \quad \forall x \in \Omega,
\end{cases} \tag{5.1}
\]
where $\alpha_k$ is the $k$th step size and $g^k$ is any subgradient in $\partial F_n(u^k)$. Taking into account the expression of the gradient or subgradient, we have here
\[
u^{k+1}(x) = u^k(x) + \alpha_k \left( -\frac{\partial h}{\partial u}(x, u^k(x)) - 2p \, I_{u^k}(x) \right), \tag{5.2}
\]
with
\[
I_{u^k}(x) = \int_\Omega \frac{|u^k(x) - u^k(y)|^{p-1}}{|x - y|^p} \, \operatorname{sign}(u^k(x) - u^k(y)) \, \rho_n(|x - y|) \, dy \quad \forall p. \tag{5.3}
\]
Note that (5.3) is a unified expression which corresponds to the gradient when $p > 1$ (see the Euler–Lagrange equation in section 3.4), or a given element of the subdifferential in the BV-case (see section 4). We remind the reader that the definition of $\rho_n$ also depends on $p$ (see (3.15)).
Now the problem is to discretize in space the integral $I_{u^k}(x)$, which has a singular kernel, not defined when $x = y$. Let us introduce the function $J_{u^k}$ such that
\[
I_{u^k}(x) = \int_\Omega \frac{J_{u^k}(x, y)}{|x - y|} \, dy, \tag{5.4}
\]
with
\[
J_{u^k}(x, y) = \frac{|u^k(x) - u^k(y)|^{p-1}}{|x - y|^{p-1}} \, \operatorname{sign}(u^k(x) - u^k(y)) \, \rho_n(|x - y|).
\]
Because of the singularity, simple schemes using finite differences and integral approximations, for example, will fail. Here we propose to do the following:
• Discretize the space using a triangulation. We denote by $\mathcal{T}$ the family of triangles covering $\Omega$ (see Figure 1).
• Interpolate linearly the function $J_{u^k}(x, y)$ on each triangle ($x$ fixed).
• Find explicit expressions for the integral of $J_{u^k}(x, y)/|x - y|$ on each triangle.
Note that this kind of estimation also appears, for instance, in electromagnetism problems such as MEG-EEG (see, e.g., [9]), where one needs to estimate such singular integrals on meshed domains (three-dimensional domains here).
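To make the structure of iteration (5.1)–(5.3) concrete, here is a minimal one-dimensional sketch. It is not the discretization proposed in this section: the integral is replaced by a plain sum over grid points $y \ne x$ with hypothetical weights `w[i][j]`, the mollifier profile in `weights` is an illustrative choice, the data term is $h(x, u) = (u - u_0)^2$ with a weight `lam` on the nonlocal term, and the normalized diminishing step size is one common option for subgradient methods rather than a rule taken from the paper. All function names are assumptions.

```python
import math


def sign(s):
    """Sign function (4.3)."""
    return (s > 0) - (s < 0)


def weights(m, n, p):
    """Hypothetical off-diagonal weights w_ij = h * rho_n(r_ij) / r_ij^p on a
    uniform grid of (0, 1), with the illustrative profile
    rho_n(r) = (n / 2) * exp(-n r)."""
    h = 1.0 / m
    w = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                r = abs(i - j) * h
                w[i][j] = h * 0.5 * n * math.exp(-n * r) / r ** p
    return w


def energy(u, f, w, p, lam):
    """Discrete energy: sum_i (f_i - u_i)^2 + lam * sum_{i != j} w_ij |u_i - u_j|^p."""
    m = len(u)
    fidelity = sum((f[i] - u[i]) ** 2 for i in range(m))
    nonlocal_term = sum(w[i][j] * abs(u[i] - u[j]) ** p
                        for i in range(m) for j in range(m) if i != j)
    return fidelity + lam * nonlocal_term


def subgradient(u, f, w, p, lam):
    """One element of the subdifferential, with the unified integrand of (5.3)."""
    m = len(u)
    g = []
    for i in range(m):
        I = sum(w[i][j] * abs(u[i] - u[j]) ** (p - 1) * sign(u[i] - u[j])
                for j in range(m) if j != i)
        g.append(2.0 * (u[i] - f[i]) + 2.0 * p * lam * I)
    return g


def subgradient_descent(f, w, p, lam, alpha0=0.01, iters=300):
    """Iteration (5.1) started from the data f; returns the best iterate seen."""
    u = list(f)
    best_u, best_e = list(u), energy(u, f, w, p, lam)
    for k in range(iters):
        g = subgradient(u, f, w, p, lam)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            break  # zero subgradient: u is a minimizer of the discrete energy
        step = alpha0 / (norm * math.sqrt(k + 1.0))
        u = [ui - step * gi for ui, gi in zip(u, g)]
        e = energy(u, f, w, p, lam)
        if e < best_e:
            best_u, best_e = list(u), e
    return best_u, best_e
```

Since a subgradient step need not decrease the energy when $p = 1$, the sketch keeps track of the best iterate. The same code path handles $p = 1$ and $p > 1$, which is the point of the unified expression (5.3).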
Fig. 1. (a) Mesh definition. Pixels are represented by the dashed squares. The circles correspond to the centers of the pixels defining the nodes of the mesh. Four nodes define two triangles. (b) In the special case when x is a node (x = y1 in the figure), one needs an interpolation to define Juk (x, y). In that situation, another point z close to the node is introduced and a linear interpolation is estimated. (c) Different cases depending on the situation of x with respect to Ti . Triangle T1 has no edge aligned with x; for triangle T2 , x is one node; for T3 , x is aligned with one edge.
Let us now detail each step. First, integral (5.4) becomes
\[
I_{u^k}(x) = \sum_{T_i \in \mathcal{T}} \int_{T_i} \frac{J_{u^k}(x, y)}{|x - y|} \, dy. \tag{5.5}
\]
Then let us approximate $J_{u^k}(x, y)$ on each triangle by a linear interpolation. We assume that $x$ is given and fixed. Given one triangle $T \in \mathcal{T}$, let us denote the three nodes of $T$ by $\{ y_i = (y_i^1, y_i^2)^T \}_{i=1..3}$, where the superscript indicates the component. Then we define $\{A_i\}_{i=1..3}$ to be the three-dimensional points
\[
A_i = (y_i^1, y_i^2, J_{u^k}(x, y_i))^T.
\]
Note that as soon as $x \ne y_i$, $J_{u^k}(x, y_i)$ is well-defined. Otherwise, if $x$ is in fact a node of $T$, for example, $y_1$ (see Figure 1(b)), then we use a linear interpolation algorithm: We introduce one point $z \in T$ close to $y_1$, estimate the value of $J_{u^k}(z, y_1)$ at this point, and deduce the value of $J_{u^k}(x, y_1)$ by interpolation. So, given $\{A_i\}_{i=1..3}$, we can in fact choose any node $y_j$ and write
\[
J_{u^k}(x, y) = J_{u^k}(x, y_j) - \frac{1}{n^3} \begin{pmatrix} n^1 \\ n^2 \end{pmatrix} \cdot (y - y_j), \tag{5.6}
\]
where $n = (n^1, n^2, n^3)^T$ is the normal to the triangle $A_1 A_2 A_3$ (see Figure 1(b)). With (5.6) we obtain
\[
\begin{aligned}
\int_T \frac{J_{u^k}(x, y)}{|x - y|} \, dy &= J_{u^k}(x, y_j) \int_T \frac{1}{|x - y|} \, dy - \frac{1}{n^3} \begin{pmatrix} n^1 \\ n^2 \end{pmatrix} \cdot \int_T \frac{(y - y_j)}{|x - y|} \, dy \\
&= J_{u^k}(x, y_j) \int_T \frac{dy}{|x - y|} - \frac{1}{n^3} \begin{pmatrix} n^1 \\ n^2 \end{pmatrix} \cdot \left( \int_T \frac{(y - x)}{|x - y|} \, dy + (x - y_j) \int_T \frac{1}{|x - y|} \, dy \right). \qquad (5.7)
\end{aligned}
\]
So, in order to estimate the integral over triangle $T$, one need only estimate
\[
\int_T \frac{1}{|x - y|} \, dy \quad \text{and} \quad \int_T \frac{(y - x)}{|x - y|} \, dy. \tag{5.8}
\]
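The linear interpolation (5.6) can be sketched in a few lines. The helper below is illustrative: the name `plane_interpolant` and its argument layout are assumptions, and the nodal values $J_{u^k}(x, y_i)$ are taken as given inputs. It builds the normal $n$ as the cross product of two edges of the triangle $A_1 A_2 A_3$ and then evaluates the plane through the chosen base node $y_j$.

```python
def plane_interpolant(nodes, values):
    """Return the linear interpolant of (5.6) on one triangle.

    nodes:  three 2D points y_1, y_2, y_3 (tuples)
    values: the nodal values J(x, y_i), with x fixed
    The returned function J(y, j) uses node j (0-based) as the base point y_j.
    """
    (a1, a2), (b1, b2), (c1, c2) = nodes
    va, vb, vc = values
    # Edges of the lifted triangle A_1 A_2 A_3, with A_i = (y_i^1, y_i^2, J_i).
    e1 = (b1 - a1, b2 - a2, vb - va)
    e2 = (c1 - a1, c2 - a2, vc - va)
    # Normal n = e1 x e2 to the plane through A_1, A_2, A_3.
    n1 = e1[1] * e2[2] - e1[2] * e2[1]
    n2 = e1[2] * e2[0] - e1[0] * e2[2]
    n3 = e1[0] * e2[1] - e1[1] * e2[0]
    if n3 == 0.0:
        raise ValueError("degenerate triangle: the three nodes are collinear")

    def J(y, j=0):
        yj = nodes[j]
        return values[j] - (n1 * (y[0] - yj[0]) + n2 * (y[1] - yj[1])) / n3

    return J
```

Any of the three nodes can serve as base point, as stated after (5.6); the interpolant reproduces the nodal values in each case.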
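If we introduce the distance function
\[
\mathrm{Dist}(x, y) = |x - y| = \sqrt{(x^1 - y^1)^2 + (x^2 - y^2)^2},
\]
so that
\[
\nabla_y \mathrm{Dist}(x, y) = \frac{y - x}{|x - y|}, \qquad \Delta_y \mathrm{Dist}(x, y) = \frac{1}{\mathrm{Dist}(x, y)},
\]
then we have the following relations:
\[
\int_T \frac{1}{|x - y|} \, dy = \int_T \Delta_y \mathrm{Dist}(x, y) \, dy = \int_{\partial T} \sum_{i=1,2} \frac{\partial \mathrm{Dist}}{\partial y^i}(x, y) \, N^i \, ds, \tag{5.9}
\]
\[
\int_T \frac{(y - x)}{|x - y|} \, dy = \int_T \nabla_y \mathrm{Dist}(x, y) \, dy = \int_{\partial T} \mathrm{Dist}(x, y) \, N \, ds, \tag{5.10}
\]
where $N$ is the normal to the edges of the triangle $T$. So we need to estimate the two kinds of integrals defined on the boundaries of the triangles. This can be done explicitly, as follows.
Lemma 5.1. Let us consider a segment $S = (\alpha, \beta)$ of extremities $\alpha = (\alpha^1, \alpha^2)$, $\beta = (\beta^1, \beta^2)$, $N$ the normal to this segment, and $x$ a fixed given point. Let us define
\[
a = |\alpha\beta|, \quad b = |x\alpha|, \quad c = x\alpha \cdot \alpha\beta, \quad d = x\alpha \cdot N, \quad \delta = a^2 b^2 - c^2, \quad l_1 = c/\sqrt{\delta}, \quad l_2 = (a^2 + c)/\sqrt{\delta}.
\]
Then we have
\[
\int_S \sum_{i=1,2} \frac{\partial \mathrm{Dist}}{\partial y^i}(x, y) \, N^i \, ds =
\begin{cases}
0 & \text{if } x \text{ is aligned with } S, \\
d \, (\operatorname{asinh}(l_2) - \operatorname{asinh}(l_1)) & \text{otherwise,}
\end{cases} \tag{5.11}
\]
and
\[
\int_S \mathrm{Dist}(x, y) \, N \, ds =
\begin{cases}
a^2/2 & \text{if } x = \alpha \text{ or } x = \beta, \\
a^2/2 + c & \text{if } c = ab \ (x \text{ aligned with } \alpha\beta) \text{ and } c > 0, \\
-a^2/2 - c & \text{if } c = -ab \ (x \text{ aligned with } \alpha\beta) \text{ and } c < 0, \\
\dfrac{\delta}{2a^2} \left( l_2 \sqrt{1 + l_2^2} + \operatorname{asinh}(l_2) - l_1 \sqrt{1 + l_1^2} - \operatorname{asinh}(l_1) \right) & \text{otherwise.}
\end{cases} \tag{5.12}
\]
Proof. Let us show how to obtain (5.11) when $x$, $\alpha$, and $\beta$ are not aligned. To do this, let us parametrize the segment $S = [\alpha, \beta]$ so that
\[
S = \left\{ y(t) = t \begin{pmatrix} \beta^1 \\ \beta^2 \end{pmatrix} + (1 - t) \begin{pmatrix} \alpha^1 \\ \alpha^2 \end{pmatrix} ; \ t \in (0, 1) \right\}.
\]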
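Before completing the computation, the closed forms of Lemma 5.1 can be sanity-checked against direct quadrature along the parametrized segment $y(t)$. The sketch below treats only the generic case where $x$, $\alpha$, and $\beta$ are not aligned; the function names are hypothetical. For (5.12) it returns the scalar $\int_S \mathrm{Dist}(x, y) \, ds$ that multiplies $N$, and it uses the prefactor $\delta/(2a^2)$, which is what the change of variable $z = (a^2 t + c)/\sqrt{\delta}$ yields and which matches the value $a^2/2$ obtained when $x$ is an endpoint.

```python
import math


def segment_integrals(x, alpha, beta):
    """Closed forms of Lemma 5.1 in the generic (non-aligned) case.

    Returns (I, L), where I is the integral in (5.11) and L is the integral
    of Dist(x, y) over the segment, i.e. the scalar factor in (5.12).
    """
    ab = (beta[0] - alpha[0], beta[1] - alpha[1])
    xa = (alpha[0] - x[0], alpha[1] - x[1])
    a = math.hypot(ab[0], ab[1])
    b = math.hypot(xa[0], xa[1])
    c = xa[0] * ab[0] + xa[1] * ab[1]
    normal = (-ab[1] / a, ab[0] / a)           # unit normal to the segment
    d = xa[0] * normal[0] + xa[1] * normal[1]
    delta = a * a * b * b - c * c
    if delta <= 0.0:
        raise ValueError("x is aligned with the segment: use the special cases")
    l1 = c / math.sqrt(delta)
    l2 = (a * a + c) / math.sqrt(delta)

    def primitive(l):
        return l * math.sqrt(1.0 + l * l) + math.asinh(l)

    I = d * (math.asinh(l2) - math.asinh(l1))
    L = delta / (2.0 * a * a) * (primitive(l2) - primitive(l1))
    return I, L


def segment_quadrature(x, alpha, beta, steps=20000):
    """Midpoint rule along y(t) = t * beta + (1 - t) * alpha, t in (0, 1)."""
    ab = (beta[0] - alpha[0], beta[1] - alpha[1])
    a = math.hypot(ab[0], ab[1])
    normal = (-ab[1] / a, ab[0] / a)
    I = 0.0
    L = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        y = (alpha[0] + t * ab[0], alpha[1] + t * ab[1])
        r = math.hypot(y[0] - x[0], y[1] - x[1])
        I += ((y[0] - x[0]) * normal[0] + (y[1] - x[1]) * normal[1]) / r
        L += r
    return I * a / steps, L * a / steps        # ds = a dt
```

A simple usage example is $x = (0, 1)$, $\alpha = (0, 0)$, $\beta = (1, 0)$, for which the two integrals reduce to $-\operatorname{asinh}(1)$ and $\tfrac{1}{2}(\sqrt{2} + \operatorname{asinh}(1))$.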
The unit normal vector of the segment $S$ is given by
\[
N = \frac{1}{\sqrt{(\beta^1 - \alpha^1)^2 + (\beta^2 - \alpha^2)^2}} \begin{pmatrix} -(\beta^2 - \alpha^2) \\ \beta^1 - \alpha^1 \end{pmatrix}.
\]
So we have
\[
I = \int_S \sum_{i=1,2} \frac{\partial \mathrm{Dist}}{\partial y^i}(x, y) \, N^i \, ds = \int_0^1 \sum_{i=1,2} \frac{y^i(t) - x^i}{|x - y(t)|} \, N^i \, |\alpha\beta| \, dt.
\]
After some algebraic computations, we get
\[
I = \alpha\beta \cdot x\alpha^\perp \int_0^1 \frac{dt}{\sqrt{t^2 |\alpha\beta|^2 + |x\alpha|^2 + 2t \, x\alpha \cdot \alpha\beta}},
\]
with $x\alpha^\perp = \begin{pmatrix} -(\alpha^2 - x^2) \\ \alpha^1 - x^1 \end{pmatrix}$. Using the notation defined in Lemma 5.1, and since $\delta > 0$ ($x$, $\alpha$, and $\beta$ are not aligned), we have
\[
I = \alpha\beta \cdot x\alpha^\perp \, \frac{a}{\sqrt{\delta}} \int_0^1 \frac{dt}{\sqrt{\frac{a^4}{\delta} \left( t + \frac{c}{a^2} \right)^2 + 1}}.
\]
We can explicitly compute the integral with the change of variable
\[
z = \frac{a^2}{\sqrt{\delta}} \left( t + \frac{c}{a^2} \right),
\]
so that we obtain
\[
I = \frac{\alpha\beta \cdot x\alpha^\perp}{|\alpha\beta|} \, (\operatorname{asinh}(l_2) - \operatorname{asinh}(l_1)),
\]
which concludes the proof. Other cases follow from similar arguments.
With Lemma 5.1, one can estimate (5.9) and (5.10) and thus (5.7). By summing over all the triangles and for a given $x$, we obtain the estimation of the integral $I_{u^k}(x)$ (5.5), and then we can iterate (5.2).
5.2. Experiments on image restoration. Let $u : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ be an original image describing a real scene, and let $u_0$ be the observed image of the same scene (i.e., a degradation of $u$). We assume that
\[
u_0 = R u + \eta, \tag{5.13}
\]
where $\eta$ stands for a white additive Gaussian noise and where $R$ is a linear operator representing the blur (usually a convolution). Given $u_0$, the problem is then to reconstruct $u$ knowing (5.13). Supposing that $\eta$ is a white Gaussian noise, and according to the maximum likelihood principle, we can find an approximation of $u$ by solving the least-squares problem
\[
\inf_u \int_\Omega |u_0 - Ru|^2 \, dx,
\]
where $\Omega$ is the domain of the image. However, this is well known to yield an ill-posed problem [15, 3].
Fig. 2. Example of image restoration. Panels: original, noisy, restored (p = 1).
A classical way to overcome ill-posed minimization problems is to add a regularization term to the energy so that the problem is to minimize
\[
F(u) = \int_\Omega |u_0 - Ru|^2 \, dx + \lambda \int_\Omega |\nabla u|^p \, dx. \tag{5.14}
\]
The first term in $F(u)$ measures the fidelity to the data. The second is a smoothing term. In other words, we search for a $u$ that best fits the data so that its gradient is low (so that noise will be removed). The parameter $\lambda$ is a positive weighting constant. For $p = 1$ we have in fact a BV-norm which leads to discontinuous solutions (see [2] for a review). Note that (5.14) is of the form (2.5), with $h(x, u(x)) = |u_0(x) - Ru(x)|^2$. Without loss of generality, we will assume that the operator $R$ is the identity operator. So, in this section, we show some numerical results considering the minimization of the nonlocal functional
\[
F_n(u) = \int_\Omega |u_0 - u|^2 \, dx + \lambda \int_\Omega \int_\Omega \frac{|u(x) - u(y)|^p}{|x - y|^p} \, \rho_n(|x - y|) \, dx \, dy \tag{5.15}
\]
for a given $n$.
The first result, shown in Figure 2, illustrates image restoration on a real noisy image for $p = 1$. As expected, the result is very close to classical TV results. We recall that this approximation of the BV regularization problem is indeed independent of the data-fidelity term.
The second result, shown in Figure 3, is another image restoration result on a simple synthetic step image, which illustrates the effect of the parameter $p$ on the edges. For example, we recover the classical observation for $p = 1$ or $p = 2$. More importantly, we show that our approximation can be successfully used to handle variational problems posed on $W^{1,p}(\Omega)$ with high values of $p$ which, to our knowledge, generally lead to numerically unstable schemes.
6. Conclusion. Our main contribution was to show that the characterization result due to Bourgain, Brezis, and Mironescu [5] for the Sobolev seminorm can indeed be successfully applied to solve variational problems. It was not a priori straightforward that this characterization of $W^{1,p}$ could be useful in the theoretical and numerical analysis of problems of calculus of variations.
Fig. 3. Example of evolutions with various values of p applied to a synthetic noisy image. Panels: evolution for p = 1, p = 2, p = 20, and p = 40.
Going a step further, we proved that our results can also be extended to the BV-case, thanks to Ponce's results [16]. Note that the BV-case is not a simple extension of the $W^{1,p}$-case, and it requires some adaptations. Interestingly, we show that this approach allows us to treat problems posed in $W^{1,p}$ with high values of p, which is a challenging problem as far as we know. Finally, our contribution does not target a particular field of application, and image restoration was proposed here as an illustration: We wanted also to show that this alternative formulation, which leads to nonlocal terms with singular kernels, can be implemented.

REFERENCES
[1] L. Ambrosio, N. Fusco, and D. Pallara, Functions of Bounded Variation and Free Discontinuity Problems, Oxford Math. Monogr., Oxford University Press, New York, 2000.
[2] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, 2nd ed., Appl. Math. Sci. 147, Springer-Verlag, New York, 2006.
[3] M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging, Institute of Physics Publishing, Bristol, 1998.
[4] D. P. Bertsekas, Nonlinear Programming, 2nd ed., Athena Scientific, Nashua, NH, 1999.
[5] J. Bourgain, H. Brezis, and P. Mironescu, Another look at Sobolev spaces, in Optimal Control and Partial Differential Equations, J. L. Menaldi, E. Rofman, and A. Sulem, eds., IOS Press, 2001, pp. 439–455.
[6] A. Buades, B. Coll, and J. M. Morel, Neighborhood filters and PDE's, Numer. Math., 105 (2006), pp. 1–34.
[7] A. Chambolle, An algorithm for total variation minimization and applications, J. Math. Imaging Vision, 20 (2004), pp. 89–97.
[8] T. F. Chan, G. H. Golub, and P. Mulet, A nonlinear primal-dual method for total variation-based image restoration, SIAM J. Sci. Comput., 20 (1999), pp. 1964–1977.
[9] E. Darve, Méthodes multipôles rapides: Résolution des équations de Maxwell par formulations intégrales, Ph.D. thesis, Université de Paris 6, 1999.
[10] L. C. Evans and R. F. Gariepy, Measure Theory and Fine Properties of Functions, CRC Press, Boca Raton, FL, 1992.
[11] G. Gilboa, J. Darbon, S. Osher, and T. F. Chan, Nonlocal Convex Functionals for Image Regularization, Technical Report 06-57, UCLA CAM Report, UCLA, Los Angeles, CA, 2006.
[12] G. Gilboa and S. Osher, Nonlocal linear image regularization and supervised segmentation, Multiscale Model. Simul., 6 (2007), pp. 595–630.
[13] M. Hintermüller and K. Kunisch, Total bounded variation regularization as a bilaterally constrained optimization problem, SIAM J. Appl. Math., 64 (2004), pp. 1311–1333.
[14] M. Hintermüller and G. Stadler, An infeasible primal-dual algorithm for total bounded variation–based inf-convolution-type image restoration, SIAM J. Sci. Comput., 28 (2006), pp. 1–23.
[15] A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, Appl. Math. Sci. 120, Springer-Verlag, New York, 1996.
[16] A. Ponce, A new approach to Sobolev spaces and connections to Γ-convergence, Calc. Var. Partial Differential Equations, 19 (2004), pp. 229–255.
[17] N. Z. Shor, Minimization Methods for Nondifferentiable Functions, Springer Ser. Comput. Math. 3, Springer-Verlag, Berlin, 1985.
[18] C. R. Vogel and M. E. Oman, Fast, robust total variation-based reconstruction of noisy, blurred images, IEEE Trans. Image Process., 7 (1998), pp. 813–824.
[19] P. Weiss, L. Blanc-Féraud, and G. Aubert, Efficient schemes for total variation minimization under constraints in image processing, SIAM J. Sci. Comput., to appear.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 861–886

A GOAL-ORIENTED ADAPTIVE FINITE ELEMENT METHOD WITH CONVERGENCE RATES∗

MARIO S. MOMMER† AND ROB STEVENSON‡

Abstract. An adaptive finite element method is analyzed for approximating functionals of the solution of symmetric elliptic second order boundary value problems. We show that the method converges and derive a favorable upper bound for its convergence rate and computational complexity. We illustrate our theoretical findings with numerical results.

Key words. adaptive finite element method, convergence rates, computational complexity, quantity of interest, a posteriori error estimators

AMS subject classifications. 65N30, 65N50, 65N15, 65Y20, 41A25

DOI. 10.1137/060675666
1. Introduction. Adaptive finite element methods (AFEMs) have become a standard tool for the numerical solution of partial differential equations. Although used successfully for more than 25 years, in more than one space dimension, even for the most simple case of symmetric elliptic equations of second order $a(u, v) = f(v)$ $(\forall v)$, their convergence was not demonstrated before the works of Dörfler [Dör96] and Morin, Nochetto, and Siebert [MNS00]. Convergence alone, however, does not show that the use of an AFEM for a solution that has singularities improves upon, or even competes with, that of a nonadaptive FEM. Recently, after the derivation of such a result by Binev, Dahmen, and DeVore [BDD04] for an AFEM extended with a so-called coarsening routine, in [Ste07] it was shown that standard AFEMs converge with the best possible rate in linear complexity.

The aforementioned works all deal with AFEMs in which the error is measured in the energy norm $\|\cdot\|_E := a(\cdot, \cdot)^{1/2}$. In many applications, however, one is not so much interested in the solution u as a whole, but rather in a (linear) functional g(u) of the solution, often being referred to as a quantity of interest. With $u_\tau$ denoting the finite element approximation of u with respect to a partition τ, from $|g(u) - g(u_\tau)| \le \|g\|_{E'} \|u - u_\tau\|_E$, obviously it follows that convergence of $u_\tau$ towards u with respect to $\|\cdot\|_E$ implies that of $g(u_\tau)$ towards g(u) with at least the same rate. It is, however, generally observed that with adaptive methods especially designed for the approximation of this quantity of interest, known as goal-oriented adaptive methods, convergence of $g(u_\tau)$ towards g(u) takes place at a higher rate. Examples of such methods can be found in the monographs [AO00, BR03, BS01], and in references cited therein. So far these goal-oriented adaptive methods are usually not proven to converge.
An exception is the method from [DKV06], in which, however, adaptivity is purely driven by energy norm minimization of the error in the dual problem $a(v, z) = g(v)$ $(\forall v)$. Another exception is the goal-oriented method from [MvSST06], which is

∗ Received by the editors November 22, 2006; accepted for publication (in revised form) October 16, 2008; published electronically February 6, 2009. This work was supported by the Netherlands Organization for Scientific Research and by the European Community's Human Potential Programme under contract HPRN-CT-2002-00286.
http://www.siam.org/journals/sinum/47-2/67566.html
† Interdisciplinary Center for Scientific Computing (IWR), Universität Heidelberg, Im Neuenheimer Feld 368, 69120 Heidelberg, Germany ([email protected]).
‡ Korteweg–de Vries Institute for Mathematics, University of Amsterdam, Plantage Muidergracht 24, 1018 TV Amsterdam, The Netherlands ([email protected]).
MARIO S. MOMMER AND ROB STEVENSON
proven to converge with a rate equal to what we will demonstrate (for piecewise linears), although in [MvSST06] the strong assumption $u, z \in C^3(\Omega)$ was made. The starting point of our method is the well-known upper bound
\[ |g(u) - g(u_\tau)| = |a(u - u_\tau, z - z_\tau)| \le \|u - u_\tau\|_E \|z - z_\tau\|_E, \tag{1.1} \]
where $z_\tau$ is the finite element approximation with respect to τ of z. Having available an AFEM that is convergent with respect to the energy norm, in view of (1.1) an obvious approach would be to use it for finding partitions $\tau_p$ and $\tau_d$ such that the corresponding finite element approximations $u_{\tau_p}$ and $z_{\tau_d}$ have, say, both energy norm errors less than $\sqrt{\varepsilon}$. Indeed, then the product of the errors in primal and dual finite element approximations with respect to the smallest common refinement of $\tau_p$ and $\tau_d$, and thus the error in the approximation of the quantity of interest, is less than ε. This approach, however, would not benefit from the situation in which, quantitatively or qualitatively, either primal or dual solution is easier to approximate by finite element functions.

The alternative method we propose here works, in essence, as follows. On the kth iteration, we start from a partition $\tau_k$ and compute on it the solutions of the primal and dual problems. To advance the iteration, this partition is refined in such a way that the product $\|u - u_\tau\|_E \|z - z_\tau\|_E$ is reduced by a constant factor. To achieve this, we consider the effort needed to reduce each of $\|u - u_\tau\|_E$ and $\|z - z_\tau\|_E$ by the same constant factor, which we do by separately computing suitable refinement sets. The smaller of these sets is then applied to $\tau_k$ to obtain $\tau_{k+1}$.

We can show that this method is convergent. In particular, we prove that if, for any s, t > 0, the solutions of the primal and dual problems can be approximated in energy norm to any accuracy δ > 0 from partitions of cardinality $O(\delta^{-1/s})$ or $O(\delta^{-1/t})$, respectively, then given ε > 0, our method constructs a partition of cardinality $O(\varepsilon^{-1/(s+t)})$ such that $|g(u) - g(u_\tau)| \le \|u - u_\tau\|_E \|z - z_\tau\|_E \le \varepsilon$. In view of the assumptions, this order of cardinality realizing $\|u - u_\tau\|_E \|z - z_\tau\|_E \le \varepsilon$ is optimal. Moreover, by solving the arising linear systems only inexactly, we show that the overall cost of the algorithm is of order $O(\varepsilon^{-1/(s+t)})$.
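To get a feel for the cardinality bound stated above, the following minimal sketch compares two orders of magnitude with all constants hidden in the O-notation set to one (an assumption made purely for illustration). It evaluates $\varepsilon^{-1/(s+t)}$ for the goal-oriented method, and $\varepsilon^{-1/(2s)} + \varepsilon^{-1/(2t)}$ for the approach in which primal and dual solutions are each approximated to tolerance $\sqrt{\varepsilon}$. The function names are hypothetical and not part of the paper.

```python
def goal_oriented_cardinality(eps, s, t):
    """Order of magnitude eps**(-1/(s+t)) of the bound in the text (constants dropped)."""
    return eps ** (-1.0 / (s + t))


def separate_tolerance_cardinality(eps, s, t):
    """Primal and dual problems each solved to accuracy sqrt(eps).

    The two cardinalities delta**(-1/s) and delta**(-1/t) are simply added,
    which is an illustrative upper estimate with constants dropped.
    """
    delta = eps ** 0.5
    return delta ** (-1.0 / s) + delta ** (-1.0 / t)


if __name__ == "__main__":
    s, t = 0.5, 1.5  # hypothetical approximation rates for primal and dual solutions
    for eps in (1e-2, 1e-4, 1e-6):
        print(eps,
              goal_oriented_cardinality(eps, s, t),
              separate_tolerance_cardinality(eps, s, t))
```

When s and t differ, the second quantity is dominated by the problem that is harder to approximate, which is the effect the text describes.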
The convergence rate s + t of our goal-oriented method is thus the sum of the rates s and t of the best approximations in energy norm for primal and dual problems. With the approach of approximating both primal and dual problem within tolerance $\sqrt{\varepsilon}$, the rate would be 2 min(s, t). Another alternative approach, namely, to solve each of the problems to an accuracy of $\varepsilon^{s/(s+t)}$ and $\varepsilon^{t/(s+t)}$, respectively, would also result in the rate s + t. This approach, however, is not feasible, since the values s and t are generally unknown. Our method converges at the rate s + t without previous knowledge about the regularity of the solutions. Concerning the value of s (and similarly t), when applying finite elements of order p, for s up to p/n, a rate s is guaranteed when the solution has "ns orders of smoothness" in $L_\tau(\Omega)$ for some $\tau > (\frac{1}{2} + s)^{-1}$ (instead of in $L_2(\Omega)$ required for nonadaptive approximation) (cf. [BDDP02]).

Our method is based on minimizing an upper bound for the error in the functional, which under certain circumstances can be crude. Actually, in all available goal-oriented adaptive methods the decision of which elements have to be refined is based on some upper bound for the error. Unlike the error in energy norm, there exists no computable two-sided bound for the error in a functional of the solution.
A GOAL-ORIENTED ADAPTIVE FINITE ELEMENT METHOD
This leaves open the possibility that some bounds are "usually" sharper than others. An argument against the upper bound (1.1) brought up in [BR03] is that it is based on the application of a global Cauchy–Schwarz inequality, whereas the dual weighted residual method advocated there would better respect the local information. The contribution of the current paper is that we prove a rate that is generally observed with goal-oriented methods. When applying finite element spaces of equal order at primal and dual sides, we neither expect (see Remark 5.1 for details) nor observe in our experiments that on average our bound gets increasingly more pessimistic when the iteration proceeds.

This paper is organized as follows: In section 2, we describe the model boundary value problem that we will consider. The finite element spaces and the refinement rules based on bisections of n-simplices are discussed in section 3. In section 4, we give results on residual-based a posteriori energy error estimators. In section 5, we present our goal-oriented AFEM under the simplifying assumption that the right-hand sides of both primal and dual problems are piecewise polynomial with respect to the initial finite element partition. We derive the aforementioned bound on the cardinality of the output partition. In section 6, the method is extended to general right-hand sides. By replacing the exact solutions of the arising linear systems by inexact ones, it is further shown that the required number of arithmetic operations and storage locations satisfies the same favorable bound as the cardinality of the output partition. Finally, in section 7, we present numerical results obtained with the method. To apply our approach also to unbounded functionals, here we recall the use of extraction functionals, an approach introduced in [BS01].

In this paper, by $C \lesssim D$ we will mean that C can be bounded by a multiple of D, independently of parameters upon which C and D may depend.
Similarly, $C \gtrsim D$ is defined as $D \lesssim C$, and $C \eqsim D$ as $C \lesssim D$ and $C \gtrsim D$.

2. The model problem. Let $\Omega \subset R^n$ be a polygonal domain. We consider the following model boundary value problem in variational form: Given $f \in H^{-1}(\Omega)$, find $u \in H_0^1(\Omega)$ such that
\[ a(u, v) := \int_\Omega A \nabla u \cdot \nabla v = f(v) \quad (v \in H_0^1(\Omega)), \tag{2.1} \]
where $A \in L_\infty(\Omega)$ is a symmetric n × n matrix with $\operatorname{ess\,inf}_{x \in \Omega} \lambda_{\min}(A(x)) > 0$. We assume that A is piecewise constant with respect to an initial finite element partition $\tau_0$ of Ω specified below. To keep the exposition simple, we do not attempt to derive results that hold uniformly in the size of jumps of ρ(A) over element interfaces, although, under some conditions, this is likely possible; cf. [Ste05]. For $f \in L_2(\Omega)$, we interpret f(v) as $\int_\Omega f v$. Given some $g \in H^{-1}(\Omega)$, we will be interested in g(u). With $z \in H_0^1(\Omega)$ we will denote the solution of the dual problem
\[ a(v, z) = g(v) \quad (v \in H_0^1(\Omega)). \tag{2.2} \]
We set the energy norm on $H_0^1(\Omega)$ and dual norm on $H^{-1}(\Omega)$ by
\[ \|v\|_E = a(v, v)^{1/2} \quad \text{and} \quad \|h\|_{E'} = \sup_{0 \neq v \in H_0^1(\Omega)} \frac{|h(v)|}{\|v\|_E}, \]
respectively.
3. Finite element spaces. Given an essentially disjoint subdivision τ of $\bar{\Omega}$ into (closed) n-simplices, called a partition, we will search approximations for u and z from the finite element space
\[ V_\tau := H_0^1(\Omega) \cap \prod_{T \in \tau} P_p(T), \]
where $0 < p \in N$ is some fixed constant. For approximating the functionals f and g, we will make use of spaces
\[ V^*_\tau := \prod_{T \in \tau} P_{p-1}(T). \]
Although it is not a finite element space in the usual sense, we also use
\[ W^*_\tau := \prod_{T \in \tau} \{h \in H(\mathrm{div}; T) : [[h \cdot n]]_{\partial T} \in L_2(\partial T)\}, \tag{3.1} \]
with n being a unit vector normal to ∂T, and $[[\,\cdot\,]]_{\partial T}$ denoting the jump of its argument over ∂T in the direction of n, defined to be zero on ∂Ω. Obviously, $[V^*_\tau]^n \subset W^*_\tau$.

Below, we specify the type of (nested) partitions we will consider, and we recall some results from [Ste08], generalizing upon known results for newest vertex bisection in two dimensions. For 0 ≤ k ≤ n − 1, a (closed) simplex spanned by k + 1 vertices of an n-simplex T is called a hyperface of T. For k = n − 1, it will be called a true hyperface. A partition τ is called conforming when the intersection of any two different $T, T' \in \tau$ is either empty or a hyperface of both simplices. Different simplices T, T′ that share a true hyperface will be called neighbors. (Actually, when $\Omega \neq \mathrm{int}(\bar{\Omega})$, the above definition of a conforming partition can be unnecessarily restrictive. We refer to [Ste08] for a discussion of this matter.)

Simplices will be refined by means of bisection. In order to guarantee uniform shape regularity of all descendants, a proper cyclic choice of the refinement edges should be made. To that end, given $\{x_0, \ldots, x_n\} \subset R^n$, not on a joint (n − 1)-dimensional hyperplane, we distinguish between n(n + 1)! tagged simplices given by all possible ordered sequences $(x_0, x_1, \ldots, x_n)_\gamma$ and types $\gamma \in \{0, \ldots, n - 1\}$. Given a tagged simplex $T = (x_0, x_1, \ldots, x_n)_\gamma$, its children are the tagged simplices
\[ \left(x_0, \tfrac{x_0 + x_n}{2}, x_1, \ldots, x_\gamma, x_{\gamma+1}, \ldots, x_{n-1}\right)_{(\gamma+1) \bmod n} \]
and
\[ \left(x_n, \tfrac{x_0 + x_n}{2}, x_1, \ldots, x_\gamma, x_{n-1}, \ldots, x_{\gamma+1}\right)_{(\gamma+1) \bmod n}, \]
where the sequences $(x_{\gamma+1}, \ldots, x_{n-1})$ and $(x_1, \ldots, x_\gamma)$ should be read as being void for γ = n − 1 and γ = 0, respectively. So these children are defined by bisecting the edge $\overline{x_0 x_n}$ of T (i.e., by connecting its midpoint with the other vertices $x_1, \ldots, x_{n-1}$), by an appropriate ordering of their vertices, and by having type (γ + 1) mod n. See Figure 3.1 for an illustration. This bisection process was introduced in [Tra97] and, using different notation, in [Mau95]. The edge $\overline{x_0 x_n}$ is called the refinement edge of T. In the n = 2 case, the vertex opposite this edge is known as the newest vertex. Corresponding to a tagged simplex $T = (x_0, \ldots, x_n)_\gamma$, we set
\[ T_R = (x_n, x_1, \ldots, x_\gamma, x_{n-1}, \ldots, x_{\gamma+1}, x_0)_\gamma, \]
Fig. 3.1. Bisection of a tagged tetrahedron of type 0 with the next two-level cuts indicated.

Fig. 3.2. Matching neighbors for n = 2, and their level 1 and 2 descendants. The neighbors in the rightmost picture are not reflected neighbors, but the pair of their neighboring children are.
which is the tagged simplex that has the same set of children as T, and in this sense is equal to T. So actually we distinguish between $\frac{1}{2} n(n + 1)!$ tagged simplices.

Given a fixed conforming initial partition $\tau_0$ of tagged simplices of some fixed type γ, we will exclusively consider partitions that can be created from $\tau_0$ by recurrent bisections of tagged simplices, in short, descendants of $\tau_0$. Simplices that can be created in this way are uniformly shape regular, dependent only on $\tau_0$ and n. For the case that Ω might have slits, we assume that ∂Ω is the union of true hyperfaces of $T \in \tau_0$.

We will assume that the simplices from $\tau_0$ are tagged in such a way that any two neighbors $T = (x_0, \ldots, x_n)_\gamma$, $T' = (x'_0, \ldots, x'_n)_\gamma$ from $\tau_0$ match in the sense that if $\overline{x_0 x_n}$ or $\overline{x'_0 x'_n}$ is on $T \cap T'$, then either T and T′ are reflected neighbors, meaning that the ordered sequence of vertices of either T or $T_R$ coincides with that of T′ on all but one position, or the pair of neighboring children of T and T′ are reflected neighbors. See Figure 3.2 for an illustration. It is known (see [BDD04] and the references therein) that for any conforming partition into triangles there exists a local numbering of the vertices so that the matching condition is satisfied. We do not know whether the corresponding statement holds in more space dimensions. Yet we showed that any conforming partition of n-simplices can be refined, inflating the number of simplices by not more than an absolute constant factor, into a conforming partition $\tau_0$ that allows a local numbering of the vertices so that the matching condition is satisfied.
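The bisection rule for tagged simplices described above can be sketched in a few lines. This is a minimal illustration, assuming a tagged simplex is stored as a tuple of n + 1 vertex coordinate tuples together with its type γ; the function names `bisect` and `reflect` are hypothetical. The sketch also lets one check that T and its reflected counterpart $T_R$ produce the same set of children.

```python
def bisect(vertices, gamma):
    """Return the two children of the tagged simplex (x_0, ..., x_n)_gamma.

    Each child is a pair (vertex tuple, type); the new type is (gamma + 1) mod n.
    """
    n = len(vertices) - 1
    x0, xn = vertices[0], vertices[n]
    # Midpoint of the refinement edge x_0 x_n.
    mid = tuple((a + b) / 2 for a, b in zip(x0, xn))
    head = list(vertices[1:gamma + 1])   # x_1, ..., x_gamma (void for gamma = 0)
    tail = list(vertices[gamma + 1:n])   # x_{gamma+1}, ..., x_{n-1} (void for gamma = n - 1)
    new_type = (gamma + 1) % n
    child_a = (tuple([x0, mid] + head + tail), new_type)
    child_b = (tuple([xn, mid] + head + tail[::-1]), new_type)
    return child_a, child_b


def reflect(vertices, gamma):
    """Return the vertex sequence of T_R = (x_n, x_1..x_gamma, x_{n-1}..x_{gamma+1}, x_0)."""
    n = len(vertices) - 1
    head = list(vertices[1:gamma + 1])
    tail = list(vertices[gamma + 1:n])
    return tuple([vertices[n]] + head + tail[::-1] + [vertices[0]])


if __name__ == "__main__":
    tetra = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
    for gamma in range(3):
        same = set(bisect(tetra, gamma)) == set(bisect(reflect(tetra, gamma), gamma))
        print(gamma, same)
```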
For applying a posteriori error estimators, we will require that the partitions τ underlying the approximation spaces be conforming. So in the following τ, τ′, $\hat{\tau}$, etc., will always denote conforming partitions. Bisecting one or more simplices in a conforming partition τ generally results in a nonconforming partition $\varrho$. Conformity has to be restored by (recursively) bisecting any simplex $T \in \varrho$ that contains a vertex v of a $T' \in \varrho$ that does not coincide with any vertex of T (such a v is called a hanging vertex). This process, called completion, results in the smallest conforming refinement of $\varrho$. Our adaptive method will be of the following form:

    for j := 1 to M do
        create some, possibly nonconforming refinement $\varrho_j$ of $\tau_{j-1}$
        complete $\varrho_j$ to its smallest conforming refinement $\tau_j$
    endfor

As we will see, we will be able to bound $\sum_{j=1}^{M} (\#\varrho_j - \#\tau_{j-1})$. Because of the additional bisections made in the completion steps, however, generally $\#\tau_M - \#\tau_0$ will be larger. The following crucial result, which relies on the matching condition in the initial partition, shows that these additional bisections inflate the total number of simplices by at most an absolute constant factor.

Theorem 3.1 (generalizes upon [BDD04, Theorem 2.4] for n = 2).
\[ \#\tau_M - \#\tau_0 \lesssim \sum_{j=1}^{M} (\#\varrho_j - \#\tau_{j-1}), \]
dependent only on $\tau_0$ and n, and in particular thus independently of M.

Remark 3.2. Note that this result in particular implies that any descendant $\varrho$ of $\tau_0$ has a conforming refinement τ with $\#\tau \lesssim \#\varrho$, dependent only on $\tau_0$ and n.

We end this section by introducing two more notations. For partitions τ′, τ, we write $\tau' \supseteq \tau$ ($\tau' \supset \tau$) to denote that τ′ is a (proper) refinement of τ. The smallest common refinement of τ and τ′ will be denoted as $\tau \cup \tau'$.

4. A posteriori estimators for the energy error. Given a partition τ, and with $u_\tau$ denoting the solution in $V_\tau$ of
\[ a(u_\tau, v_\tau) = f(v_\tau) \quad (v_\tau \in V_\tau), \tag{4.1} \]
in this section we discuss properties of the common residual-based a posteriori error estimator for $\|u - u_\tau\|_E$. Since $a(\cdot, \cdot)$ is symmetric, an analogous result will apply to $\|z - z_\tau\|_E$, with $z_\tau$ denoting the solution in $V_\tau$ of
\[ a(v_\tau, z_\tau) = g(v_\tau) \quad (v_\tau \in V_\tau). \tag{4.2} \]
By formally viewing $H_0^1(\Omega)$ as $V_\tau$ corresponding to the infinitely uniformly refined partition τ = ∞, at some places we interpret results derived for $u_\tau$ to hold for the solution u of (2.1) by substituting τ = ∞.

For developing an AFEM that reduces the error in each iteration, it will be necessary to approximate the right-hand side by discrete functions. Loosely speaking, in [MNS00] the error in this approximation is called data oscillation. Being on a partition τ, it will be allowed to use functions from $V^*_\tau + \mathrm{div}[V^*_\tau]^n$, where $\mathrm{div} := (-\nabla)' : L_2(\Omega)^n \to H^{-1}(\Omega)$. Depending on the right-hand side at hand, it might be more convenient to approximate it by functions from $V^*_\tau$ or from $\mathrm{div}[V^*_\tau]^n$, or by a combination of these. In view of this, we will write
\[ f = f^1 + \mathrm{div} f^2, \tag{4.3} \]
where $f^1 \in H^{-1}(\Omega)$ and $f^2 \in L_2(\Omega)^n$ are going to be approximated by functions from $V^*_\tau$ or from $\mathrm{div}[V^*_\tau]^n$, respectively. Similarly, we write $g = g^1 + \mathrm{div} g^2$.

Remark 4.1. Obviously, any $f \in H^{-1}(\Omega)$ can be written in the above form with vanishing $f^2$. On the other hand, by taking $f^2 = -\nabla w$ with $w \in H_0^1(\Omega)$ being the solution of $\int_\Omega \nabla w \cdot \nabla v = f(v)$ $(v \in H_0^1(\Omega))$, we see that we can equally well consider a vanishing $f^1$.

For $\bar{u}_\tau \in V_\tau$, $\bar{f}^1 \in L_2(\Omega)$, and $\bar{f}^2 \in W^*_\tau$ (see (3.1)), where we have in mind approximations to $u_\tau$, $f^1$, and $f^2$, respectively, and $T \in \tau$, we set the local error indicator
\[ \eta_T(\bar{f}^1, \bar{f}^2, \bar{u}_\tau) := \mathrm{diam}(T)^2 \|\bar{f}^1 + \nabla \cdot [A \nabla \bar{u}_\tau + \bar{f}^2]\|^2_{L_2(T)} + \mathrm{diam}(T) \|[[[A \nabla \bar{u}_\tau + \bar{f}^2] \cdot n]]_{\partial T}\|^2_{L_2(\partial T)}. \]
Note that the first term is the weighted local residual of the equation in strong form. We set the energy error estimator
\[ E(\tau, \bar{f}^1, \bar{f}^2, \bar{u}_\tau) := \Big( \sum_{T \in \tau} \eta_T(\bar{f}^1, \bar{f}^2, \bar{u}_\tau) \Big)^{1/2}. \]
The following Proposition 4.2 is a generalization of [Ste07, Theorem 4.1] valid for A = Id, $f^2 = 0$, and polynomial degree p = 1. This result in turn was a generalization of [BMN02, Lemma 5.1, eq. (5.4)] (see also [Ver96]) in the sense that instead of $\|u - u_\tau\|_E$, the difference $\|u_{\tau'} - u_\tau\|_E$ for any $\tau' \supset \tau$ is estimated. Proposition 4.2 tells us that this difference can be bounded from above by the square root of the sum of the local error indicators corresponding to those simplices from τ that either are not in τ′ since they were refined or have nonempty intersection with such simplices. By taking τ′ = ∞, this result yields the known bound for $\|u - u_\tau\|_E$.

Proposition 4.2. Let $\tau' \supset \tau$ be partitions, and let $f^1 \in L_2(\Omega)$, $f^2 \in W^*_\tau$, and
\[ G = G(\tau, \tau') := \{T \in \tau : T \cap \tilde{T} \neq \emptyset \text{ for some } \tilde{T} \in \tau,\ \tilde{T} \notin \tau'\}. \]
Then we have
\[ \|u_{\tau'} - u_\tau\|_E \le C_1 \Big( \sum_{T \in G} \eta_T(f^1, f^2, u_\tau) \Big)^{1/2} \]
for some absolute constant $C_1 > 0$. Note that $\#G \lesssim \#\tau' - \#\tau$. In particular, by taking τ′ = ∞, we have
\[ \|u - u_\tau\|_E \le C_1 E(\tau, f^1, f^2, u_\tau). \tag{4.4} \]
Proof. We have
\[ \|u_{\tau'} - u_\tau\|_E = \sup_{0 \neq v_{\tau'} \in V_{\tau'}} \frac{|a(u_{\tau'} - u_\tau, v_{\tau'})|}{\|v_{\tau'}\|_E}. \]
For any $v_{\tau'} \in V_{\tau'}$, $v_\tau \in V_\tau$, we have
\[ a(u_{\tau'} - u_\tau, v_{\tau'}) = a(u_{\tau'} - u_\tau, v_{\tau'} - v_\tau) = \sum_{T \in \tau} \int_T \Big\{ f^1 (v_{\tau'} - v_\tau) - f^2 \cdot \nabla (v_{\tau'} - v_\tau) - A \nabla u_\tau \cdot \nabla (v_{\tau'} - v_\tau) \Big\} \]
\[ = \sum_{T \in \tau} \Big\{ \int_T (f^1 + \nabla \cdot [A \nabla u_\tau + f^2]) (v_{\tau'} - v_\tau) - \int_{\partial T} [A \nabla u_\tau + f^2] \cdot n \, (v_{\tau'} - v_\tau) \Big\}, \]
where the last line follows by integration by parts. By taking $v_\tau$ to be a suitable local quasi-interpolant of $v_{\tau'}$ as in [Ste07] (for p > 1, one may consult [KS08]) or, alternatively, a Clément-type interpolator, and applying a Cauchy–Schwarz inequality, one completes the proof.

Remark 4.3. For the lowest order elements, i.e., p = 1, a statement similar to Proposition 4.2 is valid with error indicators consisting of the jump terms over the interfaces only. As a consequence, along the lines that we will follow for elements of general degree p, for p = 1 a cheaper goal-oriented AFEM can be developed that has similar properties. Details can be found in Appendix A of the extended preprint version [MS08] of this work.

Next we study whether the error estimator also provides a lower bound for $\|u - u_\tau\|_E$ and, when τ′ is a sufficient refinement of τ, for $\|u_{\tau'} - u_\tau\|_E$. In order to derive such estimates, for the moment we further restrict the type of right-hand sides. The proof of the following proposition will be derived along the lines of the proof of [BMN02, Lemma 5.3], where the Stokes problem is considered (see also [MNS00, Lemma 4.2] for the case p = 1 and $f^2 = 0$). For convenience of the reader we include it here.

Proposition 4.4. Let $\tau \subset \tau'$ be partitions, and let $f^1 \in V^*_\tau$, $f^2 \in [V^*_\tau]^n$, and $\bar{u}_\tau \in V_\tau$.
(a) If $T \in \tau$ contains a vertex of τ′ in its interior, then
\[ \mathrm{diam}(T)^2 \|f^1 + \nabla \cdot [A \nabla \bar{u}_\tau + f^2]\|^2_{L_2(T)} \lesssim |u_{\tau'} - \bar{u}_\tau|^2_{H^1(T)}. \]
(b) If a joint true hyperface e of $T_1, T_2 \in \tau$ contains a vertex of τ′ in its interior, then
\[ \mathrm{diam}(e) \|[[[A \nabla \bar{u}_\tau + f^2] \cdot n]]_e\|^2_{L_2(e)} \lesssim |u_{\tau'} - \bar{u}_\tau|^2_{H^1(T_1 \cup T_2)} + \sum_{i=1}^{2} \mathrm{diam}(T_i)^2 \|f^1 + \nabla \cdot [A \nabla \bar{u}_\tau + f^2]\|^2_{L_2(T_i)}. \]
Proof. Let $\phi_T \in H_0^1(\Omega) \cap \prod_{T' \in \tau'} P_1(T')$ be the canonical nodal basis function associated to a vertex of τ′ inside T. Writing $R_T = (f^1 + \nabla \cdot [A \nabla \bar{u}_\tau + f^2])|_T \in P_{d-1}(T)$, and $v_{\tau'} = R_T \phi_T \in V_{\tau'}$, using the fact that $\mathrm{supp}\, v_{\tau'} \subset T$, by integration by parts we get
\[ \int_T R_T^2 \lesssim \int_T R_T^2 \phi_T = \int_T R_T v_{\tau'} = (f^1 + \mathrm{div} f^2)(v_{\tau'}) - \int_T A \nabla \bar{u}_\tau \cdot \nabla v_{\tau'} = \int_T A \nabla (u_{\tau'} - \bar{u}_\tau) \cdot \nabla v_{\tau'}, \]
and so by $|v_{\tau'}|_{H^1(T)} \lesssim \mathrm{diam}(T)^{-1} \|v_{\tau'}\|_{L_2(T)} \lesssim \mathrm{diam}(T)^{-1} \|R_T\|_{L_2(T)}$, we infer (a).

Let $\phi_e \in H_0^1(\Omega) \cap \prod_{T' \in \tau'} P_1(T')$ be the canonical nodal basis function associated to a vertex interior to e. Writing $J_e = [[[A \nabla \bar{u}_\tau + f^2] \cdot n]]_e \in P_{d-1}(e)$, let $\bar{J}_e \in P_{d-1}(T_1 \cup T_2)$ denote its extension constant in the direction normal to e, and let $v_{\tau'} = \bar{J}_e \phi_e \in V_{\tau'}$. Using the fact that $\mathrm{supp}\, v_{\tau'} \subset T_1 \cup T_2$, by integration by parts we get
\[ \int_e J_e^2 \lesssim \int_e J_e^2 \phi_e = \int_e J_e v_{\tau'} = \int_{T_1 \cup T_2} (A \nabla \bar{u}_\tau + f^2) \cdot \nabla v_{\tau'} + \int_{T_1 \cup T_2} \nabla \cdot (A \nabla \bar{u}_\tau + f^2) v_{\tau'}. \]
From
\[ \int_{T_1 \cup T_2} f^2 \cdot \nabla v_{\tau'} = -\mathrm{div} f^2 (v_{\tau'}) = -a(u_{\tau'}, v_{\tau'}) + \int_{T_1 \cup T_2} f^1 v_{\tau'}, \]
we infer
\[ \int_e J_e^2 \lesssim a(\bar{u}_\tau - u_{\tau'}, v_{\tau'}) + \int_{T_1 \cup T_2} (f^1 + \nabla \cdot (A \nabla \bar{u}_\tau + f^2)) v_{\tau'} \lesssim \Big( |\bar{u}_\tau - u_{\tau'}|_{H^1(T_1 \cup T_2)} \mathrm{diam}(e)^{-1} + \sum_{i=1}^{2} \|R_{T_i}\|_{L_2(T_i)} \Big) \|v_{\tau'}\|_{L_2(T_1 \cup T_2)}. \]
Using the fact that $\|v_{\tau'}\|_{L_2(T_1 \cup T_2)} \lesssim \|\bar{J}_e\|_{L_2(T_1 \cup T_2)} \eqsim \mathrm{diam}(e)^{1/2} \|J_e\|_{L_2(e)}$, we infer item (b) of the proposition.

In view of this last result, we will call a (possibly nonconforming) $\varrho \supset \tau$ a full refinement with respect to $T \in \tau$ when
T, and its neighbors in τ, as well as all true hyperfaces of T, all contain a vertex of $\varrho$ in their interiors. As a direct consequence of Proposition 4.4 we have the following.

Corollary 4.5. Let τ be a partition, let $f^1 \in V^*_\tau$, $f^2 \in [V^*_\tau]^n$, and $\bar{u}_\tau \in V_\tau$, and let $\tau' \supset \tau$ be a full refinement of τ with respect to all T from some $F \subset \tau$. Then
\[ c_2 \Big( \sum_{T \in F} \eta_T(f^1, f^2, \bar{u}_\tau) \Big)^{1/2} \le \|u_{\tau'} - \bar{u}_\tau\|_E \tag{4.5} \]
for some absolute constant $c_2 > 0$. In particular, we have
\[ c_2 E(\tau, f^1, f^2, \bar{u}_\tau) \le \|u - \bar{u}_\tau\|_E. \tag{4.6} \]
Next, we investigate the stability of the energy error estimator.

Proposition 4.6. Let τ be a partition, and let $f^1 \in L_2(\Omega)$, $f^2 \in W^*_\tau$, and $v_\tau, w_\tau \in V_\tau$. Then
\[ c_2 |E(\tau, f^1, f^2, v_\tau) - E(\tau, f^1, f^2, w_\tau)| \le \|v_\tau - w_\tau\|_E. \]
Proof. For $\tilde{f}^1 \in L_2(\Omega)$, $\tilde{f}^2 \in W^*_\tau$, and $v_\tau, w_\tau \in V_\tau$, by two applications of the triangle inequality in the form $|\,\|\cdot\| - \|\cdot\|\,| \le \|\cdot - \cdot\|$, first for vectors and then for functions, we have
\[ |E(\tau, f^1, f^2, v_\tau) - E(\tau, \tilde{f}^1, \tilde{f}^2, w_\tau)| \le E(\tau, f^1 - \tilde{f}^1, f^2 - \tilde{f}^2, v_\tau - w_\tau). \]
By substituting $\tilde{f}^1 = f^1$ and $\tilde{f}^2 = f^2$, and by applying (4.6) the proof is complete.

5. An idealized goal-oriented AFEM. From (2.2) and $u - u_\tau \perp_{a(\cdot,\cdot)} V_\tau \ni z_\tau$, we have
\[ |g(u) - g(u_\tau)| = |a(u - u_\tau, z)| = |a(u - u_\tau, z - z_\tau)| \le \|u - u_\tau\|_E \|z - z_\tau\|_E. \tag{5.1} \]
We will develop an adaptive method for minimizing the right-hand side of this expression.
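The identity and bound in (5.1) can be checked on a small finite-dimensional analogue. The following sketch is only an illustration under stated assumptions: the bilinear form is replaced by a symmetric positive definite 3 × 3 matrix A (so that a(x, y) = xᵀAy), the "finite element space" is the span of a single vector v, and the data f, g and all function names are hypothetical choices, not taken from the paper.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def a_form(A, x, y):
    """Bilinear form a(x, y) = x^T A y."""
    return dot(x, [dot(row, y) for row in A])


def galerkin_1d(A, rhs, v):
    """Galerkin approximation in span{v}: the multiple of v with a(., v) = rhs(v)."""
    coeff = dot(rhs, v) / a_form(A, v, v)
    return [coeff * vi for vi in v]


def check_identity():
    A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    f = [1.0, 0.0, 0.0]   # primal right-hand side
    g = [0.0, 0.0, 1.0]   # functional defining the quantity of interest
    v = [1.0, 1.0, 1.0]   # basis of the one-dimensional trial space
    u, z = solve(A, f), solve(A, g)            # A is symmetric, so A z = g is the dual problem
    u_h, z_h = galerkin_1d(A, f, v), galerkin_1d(A, g, v)
    e_u = [a - b for a, b in zip(u, u_h)]
    e_z = [a - b for a, b in zip(z, z_h)]
    goal_error = dot(g, u) - dot(g, u_h)       # g(u) - g(u_h)
    bilinear = a_form(A, e_u, e_z)             # a(u - u_h, z - z_h)
    bound = a_form(A, e_u, e_u) ** 0.5 * a_form(A, e_z, e_z) ** 0.5
    return goal_error, bilinear, bound


if __name__ == "__main__":
    print(check_identity())
```

In this toy case the error in the functional equals the bilinear form of the two errors, and its absolute value does not exceed the product of the energy norms, in line with (5.1).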
Remark 5.1. A question that naturally arises is whether there is something to be gained from using finite elements of different orders for the dual and the primal problems. Note that the derivation of (5.1) remains valid if the dual solution is computed in a lower order space, or for that matter in any space that is a subspace of $V_\tau$. But this will result in a larger $\|z - z_\tau\|_E$, worsening our error estimate without changing the actual error $|g(u) - g(u_\tau)|$. And how about using a higher order space for the dual problem? In this case, (5.1) no longer holds. As g(u) = f(z), we can approximate it by $f(z_\tau)$ with
\[ |f(z) - f(z_\tau)| = |a(u, z - z_\tau)| = |a(u - u_\tau, z - z_\tau)| \le \|u - u_\tau\|_E \|z - z_\tau\|_E. \tag{5.2} \]
Thus, as before, we obtain a worse error estimate than if we had used the same higher order space for the primal problem as well. We conclude that with our approach there is no gain from using different orders and, accordingly, will consider here only spaces of equal order.

Up to and including Lemma 5.3, we start with discussing a method for reducing $\|u - u_\tau\|_E$ or similarly $\|z - z_\tau\|_E$ separately. For some fixed
\[ \theta \in \Big(0, \frac{c_2}{C_1}\Big), \]
we will make use of the following routine to mark simplices for refinement:

MARK[τ, $\bar{f}^1$, $\bar{f}^2$, $\bar{u}_\tau$] → F
% $\bar{f}^1 \in L_2(\Omega)$, $\bar{f}^2 \in W^*_\tau$, $\bar{u}_\tau \in V_\tau$.
Select, in O(#τ) operations, a set $F \subset \tau$ with, up to some absolute factor, minimal cardinality such that
\[ \sum_{T \in F} \eta_T(\bar{f}^1, \bar{f}^2, \bar{u}_\tau) \ge \theta^2 E(\tau, \bar{f}^1, \bar{f}^2, \bar{u}_\tau)^2. \tag{5.3} \]
Remark 5.2. Selecting F that satisfies (5.3) with truly minimal cardinality would require the sorting of all $\eta_T = \eta_T(\bar{f}^1, \bar{f}^2, \bar{u}_\tau)$, which takes O(#τ log(#τ)) operations. The log-factor can be avoided by performing an approximate sorting based on binning that we recall here: With N := #τ, we may discard all $\eta_T \le (1 - \theta^2) E(\tau, \bar{f}^1, \bar{f}^2, \bar{u}_\tau)^2 / N$. With $M := \max_{T \in \tau} \eta_T$, and q the smallest integer with $2^{-q-1} M \le (1 - \theta^2) E(\tau, \bar{f}^1, \bar{f}^2, \bar{u}_\tau)^2 / N$, we store the others in q + 1 bins depending on whether $\eta_T$ is in $(\frac{1}{2} M, M]$, $(\frac{1}{4} M, \frac{1}{2} M]$, . . . , or $(2^{-q-1} M, 2^{-q} M]$. Then we build F by extracting $\eta_T$ from the bins, starting with the first bin, moving to the second bin when the first is empty, and so on until (5.3) is satisfied. Let the resulting F now contain $\eta_T$ from the ℓth bin, but not from further bins. Then a minimal set $\tilde{F}$ that satisfies (5.3) contains all $\eta_T$ from the bins up to the (ℓ − 1)th one. Since any two $\eta_T$ in the ℓth bin differ at most by a factor of 2, we infer that the cardinality of the contribution from the ℓth bin to F is at most twice as large as that to $\tilde{F}$, so that $\#F \le 2 \#\tilde{F}$. Assuming that each evaluation of $\eta_T$ takes O(1) operations, the number of operations and storage locations required by this procedure is O(q + #τ), with $q < \log_2(M N / [(1 - \theta^2) E(\tau, \bar{f}^1, \bar{f}^2, \bar{u}_\tau)^2]) \le \log_2(N / (1 - \theta^2)) \lesssim \log_2(\#\tau) < \#\tau$. The assumption on the cost of evaluating $\eta_T$ is satisfied when $\bar{f}^1 \in V^*_\tau$ and $\bar{f}^2 \in [V^*_\tau]^n$, as will be the case in our applications.

Having a set of marked elements F, the next step is to apply the following:
REFINE[$\tau, F$] $\to \tau'$
% Determines the smallest $\tau' \supseteq \tau$ which is a full refinement
% with respect to all $T \in F$.
The cost of the call is $O(\#\tau')$ operations. Using the results on the a posteriori error estimator derived in the previous section, we have the following result.
Lemma 5.3. Let $f^1 \in V^*_\tau$, $f^2 \in [V^*_\tau]^n$. Then for $F = \mathrm{MARK}[\tau, f^1, f^2, u_\tau]$ and $\tau' \supseteq \mathrm{REFINE}[\tau, F]$, we have
\[ \|u - u_{\tau'}\|_E \le \Bigl[1 - \frac{c_2^2 \theta^2}{C_1^2}\Bigr]^{1/2} \|u - u_\tau\|_E. \tag{5.4} \]
Furthermore, $\#F \lesssim \#\hat\tau - \#\tau_0$ for any partition $\hat\tau$ for which
\[ \|u - u_{\hat\tau}\|_E \le \Bigl[1 - \frac{C_1^2 \theta^2}{c_2^2}\Bigr]^{1/2} \|u - u_\tau\|_E. \]
Proof. Since this is a key result, for the convenience of the reader we recall the arguments from [Ste07]. From $\|u - u_\tau\|_E^2 = \|u - u_{\tau'}\|_E^2 + \|u_{\tau'} - u_\tau\|_E^2$ and, by (4.5), (5.3), and (4.4),
\[ \|u_{\tau'} - u_\tau\|_E \ge c_2 \theta E(\tau, f^1, f^2, u_\tau) \ge \frac{c_2 \theta}{C_1} \|u - u_\tau\|_E, \]
we conclude (5.4). With $\hat\tau$ being a partition as in the statement of the lemma, let $\breve\tau = \tau \cup \hat\tau$. Then, like $\tau$ and $\hat\tau$, the partition $\breve\tau$ is a conforming descendant of $\tau_0$, $\|u - u_{\breve\tau}\|_E \le \|u - u_{\hat\tau}\|_E$, and $\#\breve\tau - \#\tau \le \#\hat\tau - \#\tau_0$. To see the last statement, note that each simplex in $\breve\tau$ that is not in $\tau$ is in $\hat\tau$. Therefore, since $\tau \supset \tau_0$, the number of bisections needed to create $\breve\tau$ from $\tau$, which is equal to $\#\breve\tau - \#\tau$, is not larger than the number of bisections needed to create $\hat\tau$ from $\tau_0$, which is equal to $\#\hat\tau - \#\tau_0$. With $G = G(\tau, \breve\tau)$ from Proposition 4.2, we have
\[ C_1^2 \sum_{T \in G} \eta_T(f^1, f^2, u_\tau) \ge \|u_{\breve\tau} - u_\tau\|_E^2 = \|u - u_\tau\|_E^2 - \|u - u_{\breve\tau}\|_E^2 \ge \frac{C_1^2 \theta^2}{c_2^2} \|u - u_\tau\|_E^2 \ge C_1^2 \theta^2 E(\tau, f^1, f^2, u_\tau)^2 \]
by (4.6). By construction of $F$, we conclude that $\#F \lesssim \#G \lesssim \#\breve\tau - \#\tau \le \#\hat\tau - \#\tau_0$, which completes the proof.
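The binning strategy of Remark 5.2 behind MARK can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `mark_by_binning` and the flat list `eta` of local indicators are hypothetical, and it is assumed that the squared global estimator $E(\tau, \cdot)^2$ equals the sum of the $\eta_T$.

```python
import math


def mark_by_binning(eta, theta):
    """Return indices F with sum(eta[i] for i in F) >= theta^2 * sum(eta).

    Hypothetical helper: eta[i] plays the role of eta_T, and sum(eta) is
    ASSUMED to equal the squared global estimator E(tau, ...)^2.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")
    n = len(eta)
    total = sum(eta)
    if n == 0 or total <= 0.0:
        return []
    target = theta ** 2 * total
    # Indicators at or below this cutoff can be discarded: together they
    # contribute at most (1 - theta^2) * total.
    cutoff = (1.0 - theta ** 2) * total / n
    m = max(eta)
    # Smallest q with 2^(-q-1) * m <= cutoff.
    q = 0
    while 2.0 ** (-q - 1) * m > cutoff:
        q += 1
    # Bin j collects indicators in (2^-(j+1) * m, 2^-j * m].
    bins = [[] for _ in range(q + 1)]
    for i, e in enumerate(eta):
        if e > cutoff:
            j = min(q, int(math.log2(m / e)))
            bins[j].append(i)
    # Empty the bins in order of decreasing magnitude until the bulk
    # criterion (5.3) is met.
    marked, acc = [], 0.0
    for b in bins:
        for i in b:
            if acc >= target:
                return marked
            marked.append(i)
            acc += eta[i]
    return marked
```

For example, `mark_by_binning([8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.125], 0.5)` marks only the first entry, since it alone already carries more than $\theta^2 = 1/4$ of the total. No sorting is performed; the only non-linear work is the short loop that determines the number of bins.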
The idea of the goal-oriented AFEM will be to mark sets of simplices for refinement corresponding to both primal and dual problems, and then to perform the actual refinement corresponding to that set of marked simplices that has the smallest cardinality. In order to assess the quality of the method, we first introduce the approximation classes $\mathcal{A}^s$. For $s > 0$, we define
\[ \mathcal{A}^s = \Bigl\{ u \in H_0^1(\Omega) : |u|_{\mathcal{A}^s} := \sup_{\varepsilon > 0} \varepsilon \inf_{\{\tau : \|u - u_\tau\|_E \le \varepsilon\}} [\#\tau - \#\tau_0]^s < \infty \Bigr\} \]
and equip it with the norm $\|u\|_{\mathcal{A}^s} := \|u\|_E + |u|_{\mathcal{A}^s}$. So $\mathcal{A}^s$ is the class of functions that can be approximated within any given tolerance $\varepsilon > 0$ in $\|\cdot\|_E$ by a continuous piecewise polynomial of degree $p$ on a partition $\tau$ with $\#\tau - \#\tau_0 \le \varepsilon^{-1/s} |u|_{\mathcal{A}^s}^{1/s}$.
Remark 5.4. Although in the definition of $\mathcal{A}^s$ we consider only conforming descendants $\tau$ of $\tau_0$, in view of Remark 3.2, we note that these approximation classes would remain the same if we replaced $\tau$ by any descendant of $\tau_0$, conforming or not. While the $\mathcal{A}^s$ contain $V_\tau$ for any $s$, and thus are never empty, only the range $s \le p/n$ is of interest, as even $C^\infty$ functions are only guaranteed to belong to $\mathcal{A}^s$ for this range. Classical estimates show that for $s \le p/n$, $H^{1+p}(\Omega) \cap H_0^1(\Omega) \subset \mathcal{A}^s$, where it is sufficient to consider uniform refinements. The class $\mathcal{A}^s$ is much larger than $H^{1+p}(\Omega) \cap H_0^1(\Omega)$, which is the reason to consider adaptive methods in the first place. A (near) characterization of $\mathcal{A}^s$ for $s \le p/n$ in terms of Besov spaces can be found in [BDDP02] (although there the case $n = 2$ and $p = 1$ is considered, the results generalize easily).
We now consider the following adaptive algorithm:
GOAFEM[$f^1, f^2, g^1, g^2, \varepsilon$] $\to [\tau_n, u_{\tau_n}, z_{\tau_n}]$
% For this preliminary version of the goal-oriented AFEM,
% it is assumed that $f^1, g^1 \in V^*_{\tau_0}$ and $f^2, g^2 \in [V^*_{\tau_0}]^n$.
k := 0
while $C_1 E(\tau_k, f^1, f^2, u_{\tau_k}) \cdot C_1 E(\tau_k, g^1, g^2, z_{\tau_k}) > \varepsilon$ do
    $F_p$ := MARK[$\tau_k, f^1, f^2, u_{\tau_k}$]
    $F_d$ := MARK[$\tau_k, g^1, g^2, z_{\tau_k}$]
    With $F$ being the smallest of $F_p$ and $F_d$, $\tau_{k+1}$ := REFINE[$\tau_k, F$]
    k := k + 1
end do
n := k
Theorem 5.5. Let $f^1, g^1 \in V^*_{\tau_0}$ and $f^2, g^2 \in [V^*_{\tau_0}]^n$. Then $[\tau_n, u_{\tau_n}, z_{\tau_n}] = \mathrm{GOAFEM}[f^1, f^2, g^1, g^2, \varepsilon]$ terminates, and $\|u - u_{\tau_n}\|_E \|z - z_{\tau_n}\|_E \le \varepsilon$. If $u \in \mathcal{A}^s$ and $z \in \mathcal{A}^t$, then
\[ \#\tau_n - \#\tau_0 \lesssim \varepsilon^{-1/(s+t)} (|u|_{\mathcal{A}^s} |z|_{\mathcal{A}^t})^{1/(s+t)}, \]
where the constant depends only on $\tau_0$, and on $s$ or $t$ when they tend to 0 or $\infty$.
Remark 5.6. Assuming only that $u \in \mathcal{A}^s$ and $z \in \mathcal{A}^t$, given a partition $\tau$, the generally smallest upper bound for the product of the errors in energy norm in primal and dual solutions that can be expected is $[\#\tau - \#\tau_0]^{-s} |u|_{\mathcal{A}^s} [\#\tau - \#\tau_0]^{-t} |z|_{\mathcal{A}^t}$. Setting this expression equal to $\varepsilon$, one finds $\#\tau - \#\tau_0 = \varepsilon^{-1/(s+t)} (|u|_{\mathcal{A}^s} |z|_{\mathcal{A}^t})^{1/(s+t)}$.
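We conclude that the partition produced by GOAFEM is at most a constant factor larger than the generally smallest partition $\tau$ for which $\|u - u_\tau\|_E \|z - z_\tau\|_E$ is less than the prescribed tolerance.
Proof. Let $E_k := \|u - u_{\tau_k}\|_E \|z - z_{\tau_k}\|_E$. Then $E_{k+1} \le [1 - \frac{c_2^2 \theta^2}{C_1^2}]^{1/2} E_k$ by (5.4), and $c_2 E(\tau_k, f^1, f^2, u_{\tau_k}) \, c_2 E(\tau_k, g^1, g^2, z_{\tau_k}) \le E_k$ by (4.6). So GOAFEM[$f^1, f^2, g^1, g^2, \varepsilon$] terminates, with $E_n \le C_1 E(\tau_n, f^1, f^2, u_{\tau_n}) \, C_1 E(\tau_n, g^1, g^2, z_{\tau_n}) \le \varepsilon$ by (4.4). With $F_k$ being the set of marked cells inside the $k$th call of REFINE, Lemma 5.3 and the assumptions $u \in \mathcal{A}^s$, $z \in \mathcal{A}^t$ show that
\begin{align*}
\#F_k &\lesssim \min\Bigl\{ \Bigl[1 - \frac{C_1^2 \theta^2}{c_2^2}\Bigr]^{-\frac{1}{2s}} \|u - u_{\tau_{k-1}}\|_E^{-1/s} |u|_{\mathcal{A}^s}^{1/s},\ \Bigl[1 - \frac{C_1^2 \theta^2}{c_2^2}\Bigr]^{-\frac{1}{2t}} \|z - z_{\tau_{k-1}}\|_E^{-1/t} |z|_{\mathcal{A}^t}^{1/t} \Bigr\} \\
&\lesssim \min\bigl\{ \|u - u_{\tau_{k-1}}\|_E^{-1/s} |u|_{\mathcal{A}^s}^{1/s},\ \|z - z_{\tau_{k-1}}\|_E^{-1/t} |z|_{\mathcal{A}^t}^{1/t} \bigr\} \\
&\le \max_{\delta \eta \ge E_{k-1}} \min\bigl\{ \delta^{-1/s} |u|_{\mathcal{A}^s}^{1/s},\ \eta^{-1/t} |z|_{\mathcal{A}^t}^{1/t} \bigr\} = E_{k-1}^{-1/(s+t)} (|u|_{\mathcal{A}^s} |z|_{\mathcal{A}^t})^{1/(s+t)}.
\end{align*}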
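The final equality in the preceding display can be checked by a short balancing argument. The following is a sketch of that computation, with $X$ an auxiliary symbol introduced here only for illustration; both terms inside the minimum decrease in their argument, so the maximum is attained on the curve $\delta \eta = E_{k-1}$, and there the minimum is largest where the two terms coincide.

```latex
\begin{align*}
X := \delta^{-1/s} |u|_{\mathcal{A}^s}^{1/s} = \eta^{-1/t} |z|_{\mathcal{A}^t}^{1/t}
  &\iff \delta = |u|_{\mathcal{A}^s} X^{-s}, \quad \eta = |z|_{\mathcal{A}^t} X^{-t}, \\
\delta \eta = E_{k-1}
  &\iff |u|_{\mathcal{A}^s} |z|_{\mathcal{A}^t} \, X^{-(s+t)} = E_{k-1} \\
  &\iff X = E_{k-1}^{-1/(s+t)} \bigl( |u|_{\mathcal{A}^s} |z|_{\mathcal{A}^t} \bigr)^{1/(s+t)} .
\end{align*}
```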
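The partition $\tau_k$ is the smallest conforming refinement of the generally nonconforming $\varrho_k$, defined as the smallest refinement of $\tau_{k-1}$ which is a full refinement with respect to all $T \in F_k$. From Theorem 3.1, $\#\varrho_k - \#\tau_{k-1} \lesssim \#F_k$, the majorized linear convergence of $k \mapsto E_{k-1}$, and $E_{n-1} > \frac{c_2^2}{C_1^2} \varepsilon$, we conclude that
\[ \#\tau_n - \#\tau_0 \lesssim \sum_{k=1}^{n} \#F_k \lesssim E_{n-1}^{-1/(s+t)} (|u|_{\mathcal{A}^s} |z|_{\mathcal{A}^t})^{1/(s+t)} \lesssim \varepsilon^{-1/(s+t)} (|u|_{\mathcal{A}^s} |z|_{\mathcal{A}^t})^{1/(s+t)}. \]
6. A practical goal-oriented AFEM. So far, we assumed that $f = f^1 + \operatorname{div} f^2$, $g = g^1 + \operatorname{div} g^2$, with $f^1, g^1 \in V^*_\tau$, $f^2, g^2 \in [V^*_\tau]^n$ for any partition $\tau$ that we encountered; i.e., we assumed that $f^1, g^1 \in V^*_{\tau_0}$, $f^2, g^2 \in [V^*_{\tau_0}]^n$. From now on, given a partition $\tau$, we will approximate $f, g \in H^{-1}(\Omega)$ by $f^1_{\tau'} + \operatorname{div} f^2_{\tau'}$, $g^1_{\tau'} + \operatorname{div} g^2_{\tau'}$, respectively, where $f^1_{\tau'}, g^1_{\tau'} \in V^*_{\tau'}$, $f^2_{\tau'}, g^2_{\tau'} \in [V^*_{\tau'}]^n$ and either $\tau' = \tau$ or, when it is needed to have a smaller approximation error, $\tau' \supset \tau$. We will set
\[ f_{\tau'} := f^1_{\tau'} + \operatorname{div} f^2_{\tau'}, \qquad g_{\tau'} := g^1_{\tau'} + \operatorname{div} g^2_{\tau'}. \]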
To be able to distinguish between primal or dual solutions corresponding to different right-hand sides, we introduce operators $L : H_0^1(\Omega) \to H^{-1}(\Omega)$ by $(Lv)(w) = a(v, w)$ ($v, w \in H_0^1(\Omega)$), and $L_\tau : V_\tau \to V_\tau'$ by $(L_\tau v_\tau)(w_\tau) = a(v_\tau, w_\tau)$ ($v_\tau, w_\tau \in V_\tau$). The solutions $u$, $z$, $u_\tau$, $z_\tau$ of (2.1), (2.2), (4.1), (4.2) can now be written as $L^{-1} f$, $(L')^{-1} g$, $L_\tau^{-1} f$, $(L_\tau')^{-1} g$, respectively. Since in our case $L' = L$ and $L_\tau' = L_\tau$, for notational convenience we will drop the prime. Note that $\|L \cdot\|_{E'} = \|\cdot\|_E$, $\|L_\tau^{-1}\|_{E' \to E} \le 1$, and $\|L^{-1} - L_\tau^{-1}\|_{E' \to E} \le 1$. Furthermore, in view of controlling the cost of our adaptive solver, from now on we will solve the arising Galerkin systems only approximately. The following lemma generalizes Lemma 5.3, relaxing both the condition that the right-hand side is in $V^*_\tau + \operatorname{div} [V^*_\tau]^n$ and the assumption that we have the exact Galerkin solution available, assuming that the deviations from that ideal situation are sufficiently small in a relative sense.
Lemma 6.1 (see [Ste07, Lemmas 6.1 and 6.2]). There exist positive constants $\omega = \omega(\theta, C_1, c_2)$ and $\lambda = \lambda(\omega, C_1, c_2)$ such that for any $f \in H^{-1}(\Omega)$, partition $\tau$,
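$f^1_\tau \in V^*_\tau$, $f^2_\tau \in [V^*_\tau]^n$, $\bar u_\tau \in V_\tau$ with
\[ \|f - f_\tau\|_{E'} + \|L_\tau^{-1} f_\tau - \bar u_\tau\|_E \le \omega E(\tau, f^1_\tau, f^2_\tau, \bar u_\tau), \tag{6.1} \]
$F := \mathrm{MARK}[\tau, f^1_\tau, f^2_\tau, \bar u_\tau]$ satisfies $\#F \lesssim \#\hat\tau - \#\tau_0$ for any partition $\hat\tau$ for which $\|u - u_{\hat\tau}\|_E \le \lambda \|u - \bar u_\tau\|_E$. Furthermore, given a
\[ \mu \in \Bigl( \Bigl[1 - \frac{c_2^2 \theta^2}{C_1^2}\Bigr]^{1/2}, 1 \Bigr), \]
there exists an $\omega = \omega(\mu, \theta, C_1, c_2) > 0$, such that if (6.1) is valid for this $\omega$, and for $\tau' \supseteq \mathrm{REFINE}[\tau, F]$, $f_{\tau'} \in H^{-1}(\Omega)$ and $\bar u_{\tau'} \in V_{\tau'}$,
\[ \|f - f_{\tau'}\|_{E'} + \|L_{\tau'}^{-1} f_{\tau'} - \bar u_{\tau'}\|_E \le \omega E(\tau, f^1_\tau, f^2_\tau, \bar u_\tau), \]
then $\|u - \bar u_{\tau'}\|_E \le \mu \|u - \bar u_\tau\|_E$.
For solving the Galerkin systems approximately, we assume that we have an iterative solver of optimal type available:
GALSOLVE[$\tau, f_\tau, u_\tau^{(0)}, \delta$] $\to \bar u_\tau$
% $f_\tau \in (V_\tau)'$ and $u_\tau^{(0)} \in V_\tau$, the latter being an initial approximation for an
% iterative solver. The output $\bar u_\tau \in V_\tau$ satisfies $\|L_\tau^{-1} f_\tau - \bar u_\tau\|_E \le \delta$.
% The call requires $\lesssim \max\{1, \log(\delta^{-1} \|L_\tau^{-1} f_\tau - u_\tau^{(0)}\|_E)\} \#\tau$
% arithmetic operations.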
Multigrid methods with local smoothing, or their additive variants (Bramble–Pasciak–Xu) as preconditioners in conjugate gradients, are known to be of this type. A routine called RHS$_f$, and analogously RHS$_g$, will be needed to find a sufficiently accurate approximation to the right-hand side $f$ of the form $f^1_{\tau'} + \operatorname{div} f^2_{\tau'}$ with $f^1_{\tau'} \in V^*_{\tau'}$, $f^2_{\tau'} \in [V^*_{\tau'}]^n$. Since this might not be possible with respect to the current partition, a call of RHS$_f$ may result in further refinement.
RHS$_f$[$\tau, \delta$] $\to [\tau', f^1_{\tau'}, f^2_{\tau'}]$
% $\delta > 0$. The output consists of $f^1_{\tau'} \in V^*_{\tau'}$ and $f^2_{\tau'} \in [V^*_{\tau'}]^n$, where $\tau' = \tau$ or,
% if necessary, $\tau' \supset \tau$, such that $\|f - f_{\tau'}\|_{E'} \le \delta$.
Assuming that $u \in \mathcal{A}^s$ for some $s > 0$, the cost of approximating the right-hand side $f$ using RHS$_f$ will generally not dominate the other costs of our adaptive
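method only if there is some constant $c_f$ such that for any $\delta > 0$ and any partition $\tau$, for $[\tau', \cdot, \cdot] := \mathrm{RHS}_f[\tau, \delta]$, it holds that
\[ \#\tau' - \#\tau \le c_f^{1/s} \delta^{-1/s}, \]
and the number of arithmetic operations required by the call is $\lesssim \#\tau'$. We will refer to such an RHS$_f$ as s-optimal with constant $c_f$. Obviously, given $s$, such a routine can exist only when $f \in \bar{\mathcal{A}}^s$, defined by
\[ \bar{\mathcal{A}}^s = \Bigl\{ f \in H^{-1}(\Omega) : \sup_{\varepsilon > 0} \varepsilon \inf_{\{\tau \,:\, \inf_{f^1_\tau \in V^*_\tau,\, f^2_\tau \in [V^*_\tau]^n} \|f - f_\tau\|_{E'} \le \varepsilon\}} [\#\tau - \#\tau_0]^s < \infty \Bigr\}. \]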
On the one hand, $u \in \mathcal{A}^s$ implies that $f \in \bar{\mathcal{A}}^s$. Indeed, for any partition $\tau$, let $f^2_\tau := -A \nabla u_\tau$. Then $f^2_\tau \in [V^*_\tau]^n$ and $\|f - \operatorname{div} f^2_\tau\|_{E'} = \|u - u_\tau\|_E$. On the other hand, knowing that $f \in \bar{\mathcal{A}}^s$ is a different thing than knowing how to construct suitable approximations. If $s \in [\frac{1}{n}, \frac{p+1}{n}]$ and $f \in H^{sn-1}(\Omega)$, then the best approximations $f^1_\tau$ to $f$ from $V^*_\tau$ with respect to $L_2(\Omega)$ using uniform refinements $\tau$ of $\tau_0$ are known to converge with the required rate. For general $f \in \bar{\mathcal{A}}^s$, however, a realization of a suitable routine RHS$_f$ has to depend on the functional $f$ at hand.
Remark 6.2. When $u$ and $f$ are smooth, then $u \in \mathcal{A}^{p/n}$ and $f \in \bar{\mathcal{A}}^{(p+1)/n}$. Indeed, $u$ is approximated by piecewise polynomials of degree $p$, and $f$ by those of degree $p - 1$ (apart from possible approximations from $\operatorname{div} [V^*_\tau]^n$), whereas the errors are measured in $H_0^1(\Omega)$ or $H^{-1}(\Omega)$, respectively. Also for less smooth $u$ and $f$, one can expect that usually $u \in \mathcal{A}^s$ and $f \in \bar{\mathcal{A}}^{\bar s}$ for some $\bar s > s$. In our adaptive method, given some partition $\tau$, for both computing the error estimator and setting up the Galerkin system, we will replace $f$ by an approximation from $V^*_{\tau'} + \operatorname{div} [V^*_{\tau'}]^n$ where $\tau' \supseteq \tau$ (and similarly for $g$). This has the advantages that we can consider $f \notin L_2(\Omega) + \operatorname{div} W^*_\tau$, for which thus the error estimator is not defined, and that we don't have to worry about quadrature errors in various places in the algorithm. Assuming $f \in L_2(\Omega) + \operatorname{div} W^n_\tau$ for any $\tau$, another option, followed in [MNS00], is not to replace $f$ by an approximation, but to check whether, on the current partition, the error in the best approximation for $f$ from $V^*_\tau$ $(+ \operatorname{div} [V^*_\tau]^n)$, called data oscillation, is sufficiently small relative to the error in the current approximation to $u$, and, if not, to refine $\tau$ to achieve this. Convergence of this approach was shown, and it can be expected that by applying suitable quadrature and inexact Galerkin solves, optimal computational complexity can be shown as well. The observations at the beginning of this remark indicate that "usually," at least asymptotically, there will be no refinements needed to reduce the data oscillation. This explains why common adaptive methods that ignore data oscillation usually converge with optimal rates.
In addition to being s-optimal, we will have to assume that RHS$_f$ is linearly convergent, by which we mean that for any $d \in (0, 1)$, there exists a $D > 0$ such that for any $\delta > 0$, partitions $\tau$ and $\tau' \supseteq \hat\tau$ where $[\hat\tau, \cdot, \cdot] := \mathrm{RHS}_f[\tau, \delta]$, the output $[\tau'', \cdot, \cdot] := \mathrm{RHS}_f[\tau', d\delta]$ satisfies $\#\tau'' \le D \#\tau'$.
Remark 6.3. Usually, a realization of $[\hat\tau, \cdot, \cdot] := \mathrm{RHS}_f[\tau, \delta]$ will be based on the selection of $\hat\tau$ such that an upper bound for the error is less than the prescribed tolerance. Since this upper bound will be an algebraically decreasing function of $\#\hat\tau - \#\tau_0$, linear convergence is obtained.
We now have the ingredients in hand to define our practical adaptive goal-oriented finite element routine GOAFEM. Compared to the idealized version from the previous section, we will have to deal with the fact that when solving the Galerkin systems
only inexactly, and applying inexact right-hand sides, $C_1$ times the a posteriori error estimator $E(\cdot)$ is not necessarily an upper bound for the energy norm of the error. We have to add correction terms to obtain an upper bound. Furthermore, after applying REFINE on either the primal or dual side, we have to specify a tolerance for the error in the new approximation of the right-hand side and in that of the new approximate Galerkin solution. In order to know that a subsequent REFINE results in an error reduction, in view of Lemma 6.1 we would like to choose this tolerance smaller than $\omega$ times the new error estimator, which, however, is not known yet. Although we can expect that usually the new estimator is only some moderate factor less than the existing one, it cannot be excluded that the new estimator is arbitrarily small, e.g., when we happen to have reached a partition on which the solution can be exactly represented. In this case, an error reduction is immediate, and so we don't have to rely on REFINE to achieve it.
GOAFEM[$f, g, \delta_p, \delta_d, \varepsilon$] $\to [\tau, \bar u_\tau, \bar z_\tau]$
% Let $\omega \in (0, c_2)$ be a constant not larger than the constants $\omega(\theta, C_1, c_2)$ and
% $\omega(\mu, \theta, C_1, c_2)$ for some $\mu \in ([1 - \frac{c_2^2 \theta^2}{C_1^2}]^{1/2}, 1)$ mentioned in Lemma 6.1.
% Let $0 < \beta < \bigl[\bigl(\frac{2 + 3 C_1 c_2^{-1}}{2 + C_1 c_2^{-1}} + C_1 c_2^{-1}\bigr)\bigl(2 + C_1 (c_2^{-1} + 2 \omega^{-1})\bigr)\bigr]^{-1}$ be a constant.
$\tau := \tau_0$, $\bar u_{\tau_p} := \bar z_{\tau_d} := 0$
$[\tau_p, f^1_{\tau_p}, f^2_{\tau_p}] := \mathrm{RHS}_f[\tau, \delta_p]$, $[\tau_d, g^1_{\tau_d}, g^2_{\tau_d}] := \mathrm{RHS}_g[\tau, \delta_d]$
do
    $\bar u_{\tau_p}$ := GALSOLVE[$\tau_p, f_{\tau_p}, \bar u_{\tau_p}, \delta_p$]
    $\bar z_{\tau_d}$ := GALSOLVE[$\tau_d, g_{\tau_d}, \bar z_{\tau_d}, \delta_d$]
    $\sigma_p := (2 + C_1 c_2^{-1}) \delta_p + C_1 E(\tau_p, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p})$
    $\sigma_d := (2 + C_1 c_2^{-1}) \delta_d + C_1 E(\tau_d, g^1_{\tau_d}, g^2_{\tau_d}, \bar z_{\tau_d})$
    if $\sigma_p \sigma_d \le \varepsilon$ then $\tau := \tau_p \cup \tau_d$, $\bar u_\tau := \bar u_{\tau_p}$, $\bar z_\tau := \bar z_{\tau_d}$ stop endif
    if $2 \delta_p \le \omega E(\tau_p, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p})$ then $F_p$ := MARK[$\tau, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p}$]
    else $F_p := \emptyset$ endif
    if $2 \delta_d \le \omega E(\tau_d, g^1_{\tau_d}, g^2_{\tau_d}, \bar z_{\tau_d})$ then $F_d$ := MARK[$\tau, g^1_{\tau_d}, g^2_{\tau_d}, \bar z_{\tau_d}$]
    else $F_d := \emptyset$ endif
    if $\#\tau_p - \#\tau + \#F_p \le \#\tau_d - \#\tau + \#F_d$
    then $\tau$ := REFINE[$\tau_p, F_p$], $\delta_p := \min(\delta_p, \beta \sigma_p)$
         $[\tau_p, f^1_{\tau_p}, f^2_{\tau_p}] := \mathrm{RHS}_f[\tau, \delta_p]$, $\tau_d := \tau \cup \tau_d$
    else $\tau$ := REFINE[$\tau_d, F_d$], $\delta_d := \min(\delta_d, \beta \sigma_d)$
         $\tau_p := \tau \cup \tau_p$, $[\tau_d, g^1_{\tau_d}, g^2_{\tau_d}] := \mathrm{RHS}_g[\tau, \delta_d]$
    endif
enddo
Theorem 6.4. $[\tau, \bar u_\tau, \bar z_\tau] = \mathrm{GOAFEM}[f, g, \delta_p, \delta_d, \varepsilon]$ terminates, and $\|u - \bar u_\tau\|_E \|z - \bar z_\tau\|_E \le \varepsilon$. If $u \in \mathcal{A}^s$, $z \in \mathcal{A}^t$, RHS$_f$ (RHS$_g$) is s-optimal (t-optimal) with constant $c_f$ ($c_g$), and the input tolerances satisfy $\delta_p > c_f$ and $\delta_d > c_g$, then
\[ \#\tau \lesssim \#\tau_0 + \varepsilon^{-1/(s+t)} \bigl[ (|u|_{\mathcal{A}^s}^{1/s} + c_f^{1/s})^s (|z|_{\mathcal{A}^t}^{1/t} + c_g^{1/t})^t \bigr]^{1/(s+t)}. \]
If, additionally, the input tolerances satisfy $\|f\|_{E'} \lesssim \delta_p$, $\|g\|_{E'} \lesssim \delta_d$, and $\delta_p \delta_d \lesssim \|u - u_{\tau_0}\|_E \|z - z_{\tau_0}\|_E + \varepsilon$, then the number of arithmetic operations and storage locations required by the call
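are bounded by some absolute multiple of the same expression. The constant factors involved in these bounds may depend only on $\tau_0$, and on $s$ or $t$ when they tend to 0 or $\infty$, and concerning the cost, on the constants involved in the additional assumptions.
Remark 6.5. The condition that the input tolerance satisfies $\delta_p > c_f$ implies that for a call $[\tau', \cdot, \cdot] = \mathrm{RHS}_f[\tau, \delta_p]$, we have $\tau' = \tau$.
Proof. We start with collecting a few useful estimates. At evaluation of $\sigma_p$, by (4.4) and Proposition 4.6, we have
\begin{align*}
\|u - \bar u_{\tau_p}\|_E &\le \|u - L^{-1} f_{\tau_p}\|_E + \|(L^{-1} - L_{\tau_p}^{-1}) f_{\tau_p}\|_E + \|L_{\tau_p}^{-1} f_{\tau_p} - \bar u_{\tau_p}\|_E \\
&\le \delta_p + C_1 E(\tau_p, f^1_{\tau_p}, f^2_{\tau_p}, L_{\tau_p}^{-1} f_{\tau_p}) + \|L_{\tau_p}^{-1} f_{\tau_p} - \bar u_{\tau_p}\|_E \\
&\le \delta_p + C_1 E(\tau_p, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p}) + (C_1 c_2^{-1} + 1) \|L_{\tau_p}^{-1} f_{\tau_p} - \bar u_{\tau_p}\|_E \\
&\le (2 + C_1 c_2^{-1}) \delta_p + C_1 E(\tau_p, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p}) =: \sigma_p \tag{6.2}
\end{align*}
and, by Corollary 4.5,
\begin{align*}
E(\tau_p, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p}) &\le c_2^{-1} \|L^{-1} f_{\tau_p} - \bar u_{\tau_p}\|_E \\
&\le c_2^{-1} \bigl[ \|u - u_{\tau_p}\|_E + \|(L^{-1} - L_{\tau_p}^{-1})(f_{\tau_p} - f)\|_E + \|L_{\tau_p}^{-1} f_{\tau_p} - \bar u_{\tau_p}\|_E \bigr] \\
&\le c_2^{-1} \|u - u_{\tau_p}\|_E + c_2^{-1} 2 \delta_p. \tag{6.3}
\end{align*}
So if $2 \delta_p \le \omega E(\tau_p, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p})$, then $E(\tau_p, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p}) \le [c_2 - \omega]^{-1} \|u - u_{\tau_p}\|_E$, and so
\[ \sigma_p \le D \|u - u_{\tau_p}\|_E, \tag{6.4} \]
where $D := \dfrac{(1 + \frac12 C_1 c_2^{-1}) \omega + C_1}{c_2 - \omega}$.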
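The value of $D$ in (6.4) follows from (6.2) and (6.3) in two short steps. The following sketch spells them out under the stated assumptions $2 \delta_p \le \omega \mathcal{E}$ and $\omega < c_2$, where $\mathcal{E}$ is shorthand, introduced here only for brevity, for $E(\tau_p, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p})$.

```latex
\begin{align*}
\mathcal{E} &\le c_2^{-1} \|u - u_{\tau_p}\|_E + c_2^{-1} \, 2 \delta_p
  \le c_2^{-1} \|u - u_{\tau_p}\|_E + c_2^{-1} \omega \, \mathcal{E}
  \quad\Longrightarrow\quad
  \mathcal{E} \le (c_2 - \omega)^{-1} \|u - u_{\tau_p}\|_E, \\
\sigma_p &= (2 + C_1 c_2^{-1}) \delta_p + C_1 \mathcal{E}
  \le \bigl[ (1 + \tfrac12 C_1 c_2^{-1}) \omega + C_1 \bigr] \mathcal{E}
  \le \frac{(1 + \tfrac12 C_1 c_2^{-1}) \omega + C_1}{c_2 - \omega} \, \|u - u_{\tau_p}\|_E .
\end{align*}
```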
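Now we are ready to show majorized linear convergence of $\sigma_p \sigma_d$. Consider any two instances $\sigma_p^{(A)}$ and $\sigma_p^{(B)}$ of $\sigma_p$, where $\sigma_p^{(A)}$ has been computed preceding $\sigma_p^{(B)}$. With $\delta_p^{(A)}$, $\delta_p^{(B)}$ and $\tau_p^{(A)}$, $\tau_p^{(B)}$ being the corresponding tolerances and partitions, we have $\delta_p^{(B)} \le \delta_p^{(A)}$ and $\tau_p^{(B)} \supseteq \tau_p^{(A)}$, and so $\|u - u_{\tau_p^{(B)}}\|_E \le \|u - \bar u_{\tau_p^{(A)}}\|_E \le \sigma_p^{(A)}$ by (6.2). From (6.3) we then obtain
\begin{align*}
\sigma_p^{(B)} &= (2 + C_1 c_2^{-1}) \delta_p^{(B)} + C_1 E(\tau_p^{(B)}, f^1_{\tau_p^{(B)}}, f^2_{\tau_p^{(B)}}, \bar u_{\tau_p^{(B)}}) \\
&\le (2 + 3 C_1 c_2^{-1}) \delta_p^{(A)} + C_1 c_2^{-1} \sigma_p^{(A)} \\
&\le K \sigma_p^{(A)}, \tag{6.5}
\end{align*}
where $K := \dfrac{2 + 3 C_1 c_2^{-1}}{2 + C_1 c_2^{-1}} + C_1 c_2^{-1}$.
Let us denote by $\tau_p^{(i)}$, $\delta_p^{(i)}$, $f^1_{\tau_p^{(i)}}$, $f^2_{\tau_p^{(i)}}$, $\bar u_{\tau_p^{(i)}}$, $\sigma_p^{(i)}$ the instances of $\tau_p$, $\delta_p$, $f^1_{\tau_p}$, $f^2_{\tau_p}$, $\bar u_{\tau_p}$, $\sigma_p$ at the moment of the $i$th call of REFINE[$\tau_p, F_p$]. If $2 \delta_p^{(i)} > \omega E(\tau_p^{(i)}, f^1_{\tau_p^{(i)}}, f^2_{\tau_p^{(i)}}, \bar u_{\tau_p^{(i)}})$, then for any $k < i$,
\[ \sigma_p^{(i)} < (2 + C_1 (c_2^{-1} + 2 \omega^{-1})) \delta_p^{(i)} \le (2 + C_1 (c_2^{-1} + 2 \omega^{-1})) \beta \sigma_p^{(k)}. \]
If, for some $k \in \mathbb{N}_0$, $2 \delta_p^{(j)} \le \omega E(\tau_p^{(j)}, f^1_{\tau_p^{(j)}}, f^2_{\tau_p^{(j)}}, \bar u_{\tau_p^{(j)}})$ for $j = i, \ldots, i - k$, then by (6.4), Lemma 6.1, where we use that $\delta_p^{(j)} \le \delta_p^{(j-1)}$, and (6.2),
\[ \sigma_p^{(i)} \le D \|u - \bar u_{\tau_p^{(i)}}\|_E \le D \mu^k \|u - \bar u_{\tau_p^{(i-k)}}\|_E \le D \mu^k \sigma_p^{(i-k)}. \]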
Since $(2 + C_1 (c_2^{-1} + 2 \omega^{-1})) \beta < 1/K$ by definition of $\beta$, from (6.5) we conclude that for any $\alpha \in (0, 1)$ there exists an $M$ such that $\sigma_p^{(i+M)} \le \alpha \sigma_p^{(i)}$. Since all results derived so far are equally valid on the dual side, by taking $\alpha < 1/K$ we infer that by $2M$ iterations of the loop inside GOAFEM, the product $\sigma_p \sigma_d$ is reduced by a factor $\alpha K < 1$. Indeed, either $\sigma_p$ or $\sigma_d$ is reduced by a factor $\alpha$, whereas the other cannot increase by a factor larger than $K$.
Next, we bound the cardinality of the output partition. If GOAFEM terminates as a result of the first evaluation of the test $\sigma_p \sigma_d \le \varepsilon$, then by the assumptions that the input tolerances satisfy $\delta_p > c_f$ and $\delta_d > c_g$, the output partition $\tau_p \cup \tau_d = \tau_0$. In the following, we consider the case that initially $\sigma_p \sigma_d > \varepsilon$. At evaluation of the test $\#\tau_p - \#\tau + \#F_p \le \#\tau_d - \#\tau + \#F_d$, we have
\[ \#\tau_p - \#\tau \le (\beta K^{-1} \sigma_p)^{-1/s} c_f^{1/s}. \tag{6.6} \]
Indeed, the current $\#\tau_p - \#\tau$ is not larger than this difference at the moment of the most recent call of RHS$_f$[$\tau, \delta_p$]. By the assumption of RHS$_f$ being s-optimal, the latter difference was zero when at that time $\delta_p > c_f$. Otherwise, since the input tolerance exceeds $c_f$ by assumption, this $\delta_p$ was equal to $\beta$ times the minimum of all values attained by $\sigma_p$ up to that moment. Using (6.5) and the fact that RHS$_f$ is s-optimal with constant $c_f$, we end up with (6.6).
If, at evaluation of the test $\#\tau_p - \#\tau + \#F_p \le \#\tau_d - \#\tau + \#F_d$, $F_p \ne \emptyset$, i.e., if in the preceding lines $2 \delta_p \le \omega E(\tau_p, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p})$ and $F_p := \mathrm{MARK}[\tau, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p}]$, an application of Lemma 6.1 and the assumption that $u \in \mathcal{A}^s$ show that then
\[ \#F_p \lesssim \|u - \bar u_{\tau_p}\|_E^{-1/s} |u|_{\mathcal{A}^s}^{1/s} \lesssim \sigma_p^{-1/s} |u|_{\mathcal{A}^s}^{1/s} \tag{6.7} \]
by (6.4). Clearly, results analogous to (6.6) and (6.7) are valid on the dual side. Now with $\sigma_{p,j}$, $\sigma_{d,j}$ being the instances of $\sigma_p$, $\sigma_d$ at the $j$th evaluation of the test $\#\tau_p - \#\tau + \#F_p \le \#\tau_d - \#\tau + \#F_d$, with $n$ being the last one, an application of Theorem 3.1 shows that for $\tau$ being the output of the call of REFINE following this last test, being thus the last call of REFINE, we have
\begin{align*}
\#\tau - \#\tau_0 &\lesssim \sum_{j=1}^{n} \min\bigl\{ \sigma_{p,j}^{-1/s} (|u|_{\mathcal{A}^s}^{1/s} + c_f^{1/s}),\ \sigma_{d,j}^{-1/t} (|z|_{\mathcal{A}^t}^{1/t} + c_g^{1/t}) \bigr\} \\
&\le \sum_{j=1}^{n} (\sigma_{p,j} \sigma_{d,j})^{-1/(s+t)} \bigl[ (|u|_{\mathcal{A}^s}^{1/s} + c_f^{1/s})^s (|z|_{\mathcal{A}^t}^{1/t} + c_g^{1/t})^t \bigr]^{1/(s+t)} \\
&\lesssim \varepsilon^{-1/(s+t)} \bigl[ (|u|_{\mathcal{A}^s}^{1/s} + c_f^{1/s})^s (|z|_{\mathcal{A}^t}^{1/t} + c_g^{1/t})^t \bigr]^{1/(s+t)} \tag{6.8}
\end{align*}
by the majorized linear convergence of $(\sigma_{p,j} \sigma_{d,j})_j$ and $\sigma_{p,n} \sigma_{d,n} > \varepsilon$.
Suppose that this last call of REFINE took place on the primal side. Then the output partition of GOAFEM is $\tau_p \cup \tau_d$, where $[\tau_p, \cdot, \cdot] := \mathrm{RHS}_f[\tau, \delta_p]$ and $\tau_d := \tau \cup \tau_d$. As we have seen, if $\delta_p \le c_f$, i.e., if possibly $\tau_p \supsetneq \tau$, then $\delta_p$ is larger than $\beta K^{-1}$ times the current $\sigma_p$, which, by its definition, is larger than $2 + C_1 c_2^{-1}$ times the previous value of $\delta_p$, denoted as $\delta_p^{(\mathrm{prev})}$. A call of RHS$_f$[$\cdot, \delta_p^{(\mathrm{prev})}$] has been made inside GOAFEM, and so $\tau \supseteq \tau'$ with $[\tau', \cdot, \cdot] := \mathrm{RHS}_f[\cdot, \delta_p^{(\mathrm{prev})}]$. The assumption of RHS$_f$ being linearly convergent shows that $\#\tau_p \lesssim \#\tau$.
The current $\#\tau_d - \#\tau$ is not larger than this difference at the moment of the last call of RHS$_g$, and so analogously we find that $\#\tau_d \lesssim \#\tau$. We conclude that
\[ \#(\tau_p \cup \tau_d) \lesssim \#\tau \lesssim \#\tau_0 + \varepsilon^{-1/(s+t)} \bigl[ (|u|_{\mathcal{A}^s}^{1/s} + c_f^{1/s})^s (|z|_{\mathcal{A}^t}^{1/t} + c_g^{1/t})^t \bigr]^{1/(s+t)}. \tag{6.9} \]
Finally, we have to bound the cost of the algorithm. At the moment of the first call of GALSOLVE[$\tau_p, f_{\tau_p}, \bar u_{\tau_p}, \delta_p$], we have
\[ \|L_{\tau_p}^{-1} f_{\tau_p} - \bar u_{\tau_p}\|_E \le \|f_{\tau_p} - f\|_{E'} + \|f\|_{E'} \le \delta_p + \|f\|_{E'} \lesssim \delta_p \]
by assumption. We now consider any further calls. From (6.3), the fact that $\|u - u_{\tau_0}\|_E \le \|f\|_{E'}$ is bounded by a multiple of the input tolerance by assumption, and (6.5), we have that the current $\delta_p$ and $\sigma_p$ at the moment of such a call satisfy $\sigma_p \lesssim \delta_p$. As a consequence, we have
\begin{align*}
\|L_{\tau_p}^{-1} f_{\tau_p} - \bar u_{\tau_p}\|_E &\le \|(L^{-1} - L_{\tau_p}^{-1}) f_{\tau_p}\|_E + \|L^{-1} f_{\tau_p} - \bar u_{\tau_p}\|_E \le 2 \|L^{-1} f_{\tau_p} - \bar u_{\tau_p}\|_E \\
&\le 2 [\|f - f_{\tau_p}\|_{E'} + \|u - \bar u_{\tau_p}\|_E] \le 2 \delta_p + 2 \sigma_p \lesssim \delta_p.
\end{align*}
By the assumption of GALSOLVE being an optimal iterative solver, we conclude that the cost of these calls is $O(\#\tau_p)$.
The number of arithmetic operations needed for the calls MARK[$\tau, f^1_{\tau_p}, f^2_{\tau_p}, \bar u_{\tau_p}$], $\tau$ := REFINE[$\tau_p, F_p$], and $[\tau_p, \cdot, \cdot] := \mathrm{RHS}_f[\tau, \delta_p]$ are $O(\#\tau)$, $O(\#\tau)$, and $O(\#\tau_p)$, respectively. Moreover, we know that $\#\tau_p \lesssim \#\tau$, and that $\#\tau - \#\tau_0$ as a function of the iteration count is majorized by a linearly increasing sequence with upper bound (6.8). From the assumption that the input tolerances satisfy $\delta_p \delta_d \lesssim \|u - u_{\tau_0}\|_E \|z - z_{\tau_0}\|_E + \varepsilon$, the first $\sigma_p \sigma_d \lesssim \|u - u_{\tau_0}\|_E \|z - z_{\tau_0}\|_E + \varepsilon$, meaning that after some absolute constant number of iterations, either the current $\tau$ is unequal to $\tau_0$ or the algorithm has terminated. Together, the above observations show that the total cost is bounded by some absolute multiple of the right-hand side of (6.9).
Remark 6.6. The functions $\bar u_\tau$, $\bar z_\tau$ produced by GOAFEM are not the exact Galerkin approximations, and so $\|u - \bar u_\tau\|_E \|z - \bar z_\tau\|_E$ is not necessarily an upper bound for $|g(u) - g(\bar u_\tau)|$. Writing
\[ g(u) - g(\bar u_\tau) = a(u - \bar u_\tau, z) = a(u - \bar u_\tau, z - z_\tau) = a(u - \bar u_\tau, z - \bar z_\tau) - a(u - \bar u_\tau, z_\tau - \bar z_\tau), \]
and using the fact that $\|u - \bar u_\tau\|_E \le \sigma_p$, $\|z - \bar z_\tau\|_E \le \sigma_d$, $\|z_\tau - \bar z_\tau\|_E \le \delta_d \le (2 + C_1 c_2^{-1})^{-1} \sigma_d$, and $\sigma_p \sigma_d \le \varepsilon$, we end up with $|g(u) - g(\bar u_\tau)| \le [1 + (2 + C_1 c_2^{-1})^{-1}] \varepsilon$.
7. Numerical experiments. In this section we will consider the performance of the GOAFEM routine in practice. As many real-world problems require the evaluation of functionals that are unbounded on $H_0^1(\Omega)$, we will also consider such a problem. As GOAFEM can handle only bounded functionals, we need to do some additional work. Following [BS01], we will apply a so-called extraction functional, a technique that we recall below. An alternative approach would be to apply a regularized functional as suggested in [OR76, BR96]. This approach can be applied more generally since no Green's function is needed.
On the other hand, it introduces an additional error that can only be controlled in terms of higher order derivatives of the solution beyond those that are needed for the functional to be well defined. 7.1. Extraction functionals. Let g˜ be some functional defined on the solution u of (2.1), but that is unbounded on H01 (Ω). With f being the right-hand side of (2.1), we write g˜(u) = g(u) + M (f ), where g ∈ H −1 (Ω) and M is a functional on
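$f$. Since $u$ and $f$ are related via an invertible operator, this is always possible, even for any $g \in H^{-1}(\Omega)$. Yet, we would like to do this under the additional constraint that $M(f)$ can be computed within any given tolerance at low cost. Basically, this additional condition requires that a Green's function for the differential operator is available. We consider $A = \mathrm{Id}$, i.e., the Poisson problem, on a two-dimensional domain $\Omega$, and, for some $\bar x \in \Omega$, $\tilde g = \tilde g_{\bar x}$ given by
\[ \tilde g_{\bar x}(u) = \frac{\partial u}{\partial x_1}(\bar x), \]
assuming that $u$ is sufficiently smooth. With $(r, \theta)$ denoting polar coordinates centered at $\bar x$, we have $\Delta \frac{\log r}{2\pi} = \delta_{\bar x}$, and so $-\Delta \frac{\cos\theta}{2\pi r} = \tilde g_{\bar x}$ in the sense that for any smooth test function $\phi \in \mathcal{D}(\mathbb{R}^2)$, $-\int_{\mathbb{R}^2} \frac{\cos\theta}{2\pi r} \Delta\phi = \tilde g_{\bar x}(\phi)$. Generally, this formula cannot be applied with $\phi$ replaced by the solution $u$ of (2.1). Indeed, in the general case this function has a nonvanishing normal derivative at the boundary of $\Omega$, and therefore its zero extension is not sufficiently smooth. Therefore, with $w_0^{\bar x} := \frac{\cos\theta}{2\pi r}$, $w_1^{\bar x}$ being a sufficiently smooth function equal to $w_0^{\bar x}$ outside some open $\Sigma \Subset \Omega$ that contains $\bar x$, and $w^{\bar x} := w_0^{\bar x} - w_1^{\bar x}$, for any $\phi \in \mathcal{D}(\mathbb{R}^2)$, we write
\begin{align*}
\tilde g_{\bar x}(\phi) &= -\int_{\mathbb{R}^2} w_1^{\bar x} \Delta\phi - \int_{\mathbb{R}^2} w^{\bar x} \Delta\phi = \int_{\mathbb{R}^2} (-\Delta w_1^{\bar x}) \phi + \int_\Omega w^{\bar x} (-\Delta\phi) \\
&=: g_{\bar x}(\phi) + M_{\bar x}(-\Delta\phi).
\end{align*}
Clearly, $g_{\bar x}$ extends to a bounded functional on $L_1(\mathbb{R}^2)$, with $g_{\bar x}(v) = \int_\Omega (-\Delta w_1^{\bar x}) v$ when $\operatorname{supp} v \subset \Omega$. In particular, $g_{\bar x}$ is bounded on $H_0^1(\Omega)$, which enables us to use GOAFEM to evaluate it. Moreover, since $\operatorname{supp} w^{\bar x} \Subset \Omega$, under some mild conditions the above reformulation can be shown to be applicable to $u$. The details are as follows.
Proposition 7.1. If (a) $f \in L_2(\Omega)$, (b) $u$ is continuously differentiable at $\bar x$, and (c) in a neighborhood of $\bar x$, $f$ is in $L_p$ for some $p > 2$, then $\tilde g_{\bar x}(u) = g_{\bar x}(u) + M_{\bar x}(f)$.
Proof. Let $B(\bar x; \varepsilon)$ be the ball centered at $\bar x$ with radius $\varepsilon$, small enough such that $B(\bar x; \varepsilon) \Subset \Omega$. Since $u, w^{\bar x} \in H^1(\Omega \setminus B(\bar x; \varepsilon))$, $\Delta u \in L_2(\Omega \setminus B(\bar x; \varepsilon))$ by (a), $\Delta w^{\bar x} \in L_2(\Omega \setminus B(\bar x; \varepsilon))$, and $\operatorname{supp} w^{\bar x} \Subset \Omega$, integration by parts shows that
\[ \int_{\partial B(\bar x; \varepsilon)} w^{\bar x} \frac{\partial u}{\partial n} - u \frac{\partial w^{\bar x}}{\partial n} = \int_{\Omega \setminus B(\bar x; \varepsilon)} u \Delta w^{\bar x} - w^{\bar x} \Delta u, \tag{7.1} \]
where $n$ is the outward pointing normal of $\partial B(\bar x; \varepsilon)$. We have $\lim_{\varepsilon \downarrow 0} \int_{\Omega \setminus B(\bar x; \varepsilon)} u \Delta w^{\bar x} = -\lim_{\varepsilon \downarrow 0} \int_{\Omega \setminus B(\bar x; \varepsilon)} u \Delta w_1^{\bar x} = g_{\bar x}(u)$. Since $|\int_{B(\bar x; \varepsilon)} w_0^{\bar x} f| \le \|f\|_{L_p(B(\bar x; \varepsilon))} \|w_0^{\bar x}\|_{L_q(B(\bar x; \varepsilon))}$ ($\frac1p + \frac1q = 1$), and furthermore $\|w_0^{\bar x}\|_{L_q(B(\bar x; \varepsilon))} = [\int_0^\varepsilon \int_0^{2\pi} |\frac{\cos\theta}{2\pi r}|^q \, r \, d\theta \, dr]^{1/q} \to 0$ when $\varepsilon \downarrow 0$ and $q < 2$, from (c) we conclude that $-\lim_{\varepsilon \downarrow 0} \int_{\Omega \setminus B(\bar x; \varepsilon)} w^{\bar x} \Delta u = \int_\Omega w^{\bar x} f = M_{\bar x}(f)$.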
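The contributions of $w_1^{\bar x}$ to the left-hand side of (7.1) vanish when $\varepsilon \downarrow 0$. From
\[ \int_{\partial B(\bar x; \varepsilon)} w_0^{\bar x} \frac{\partial u}{\partial n} = \int_0^{2\pi} \Bigl( \cos\theta \frac{\partial u}{\partial x_1} + \sin\theta \frac{\partial u}{\partial x_2} \Bigr) \frac{\cos\theta}{2\pi\varepsilon} \, \varepsilon \, d\theta \]
and (b), we infer that $\lim_{\varepsilon \downarrow 0} \int_{\partial B(\bar x; \varepsilon)} w_0^{\bar x} \frac{\partial u}{\partial n} = \frac12 \frac{\partial u}{\partial x_1}(\bar x)$. From
\begin{align*}
\int_{\partial B(\bar x; \varepsilon)} u \frac{\partial w_0^{\bar x}}{\partial n} &= \frac{-1}{2\pi\varepsilon} \int_0^{2\pi} \cos\theta \, u \, d\theta = \frac{1}{2\pi\varepsilon} \int_0^{2\pi} \sin\theta \frac{\partial u}{\partial \theta} \, d\theta \\
&= \frac{1}{2\pi} \int_0^{2\pi} \sin\theta \Bigl( -\sin\theta \frac{\partial u}{\partial x_1} + \cos\theta \frac{\partial u}{\partial x_2} \Bigr) d\theta
\end{align*}
and (b), we infer that $-\lim_{\varepsilon \downarrow 0} \int_{\partial B(\bar x; \varepsilon)} u \frac{\partial w_0^{\bar x}}{\partial n} = \frac12 \frac{\partial u}{\partial x_1}(\bar x)$. Together, the above observations give the proof.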
7.2. Implementation. The implementation of the GOAFEM routine is essentially as described above, with the sole difference that we did not approximate the right-hand sides for setting up the Galerkin systems and computing the a posteriori error estimators, but instead used quadrature directly. This was possible, and in view of Remark 6.2 reasonable, because in our experiments either the right-hand sides are very smooth or they are already in $V^*_{\tau_0} + \operatorname{div} [V^*_{\tau_0}]^n$. For all experiments, we used $p = 2$, i.e., quadratic Lagrange elements. The GALSOLVE routine we use solves the linear systems with the conjugate gradient method using the well-known Bramble–Pasciak–Xu preconditioner. All routines were implemented in Common Lisp and run using the SBCL compiler and run-time environment. This allowed for a short development time and well-instrumented code. With regard to efficiency, the only effort made in that direction consisted in making sure that the asymptotics were correct. While an efficient implementation would be possible with moderate effort (see [Neu03]), for our purposes convenience and correctness were the most important considerations. For the experiment in which we use the extraction functional for the partial derivative at a point introduced above, we also have to solve a quadrature problem. For this we used the adaptive cubature routine Cuhre [BEG91] as implemented in the Cuba cubature package [Hah05].
7.3. Experiments. To test GOAFEM, we chose two distinct situations. For the first example, we want to compute a partial derivative at a point of a function given as the solution of a Poisson problem, thus illustrating the applicability of our method to this situation. In our second example, we consider a problem in which the singularities of the solutions to the primal and dual problems are spatially separated.
Example 7.2. Let $\Omega = (0, 1)^2$. We consider problem (2.1), choosing the right-hand side $f = 1$ (i.e., $f(v) = \int_\Omega v \, dx$). We will test the performance of GOAFEM on the task of computing
\[ \frac{\partial u}{\partial x_1}(\bar x), \]
with $\bar x = (\frac{\pi}{7}, \frac{49}{100})$. The initial partition is as indicated in Figure 7.1, with $(\frac12, \frac12)$ being the newest vertex of all 4 triangles. Following the discussion from subsection 7.1, we take $w_1^{\bar x} = \psi w_0^{\bar x}$, and thus $w^{\bar x} = (1 - \psi) w_0^{\bar x}$, with $\psi$ being a sufficiently smooth function, 1 outside some neighborhood
Fig. 7.1. Initial partition τ0 corresponding to Example 7.2.
Fig. 7.2. Right-hand side of the dual problem corresponding to Example 7.2.
of $\bar x$ inside $\Omega$, and 0 on some smaller neighborhood of $\bar x$. Proposition 7.1 shows that $\frac{\partial u}{\partial x_1}(\bar x) = \int_\Omega u (-\Delta(\psi w_0^{\bar x})) + \int_\Omega (1 - \psi) w_0^{\bar x} f$. Writing $(\theta, r)$ for the polar coordinates around $\bar x$, we chose
\[ \psi(\theta, r) := \int_0^r \psi^*(s) \, ds \Big/ \int_0^\infty \psi^*(s) \, ds, \tag{7.2} \]
with $\psi^*$ a spline function of order 6, with support $[0.1, 0.45]$. We evaluated $\int_\Omega (1 - \psi) w_0^{\bar x} f$ using the adaptive quadrature routine Cuhre. To obtain a precision of $10^{-12}$ it needed 216515 integrand evaluations. On current off-the-shelf hardware, it takes only a few seconds. To approximate $\int_\Omega u (-\Delta(\psi w_0^{\bar x}))$ we used GOAFEM. Since the right-hand sides $1$ and $(-\Delta(\psi w_0^{\bar x}))$ of primal and dual problems are smooth, their solutions are in $\mathcal{A}^{p/n} = \mathcal{A}^1$, so that the error in the functional is $O([\#\tau - \#\tau_0]^{-2})$. We compared the results with those obtained with the corresponding non-goal-oriented adaptive finite element routine AFEM for minimizing the error in energy norm, which is obtained by always refining according to the markings on the primal side. The solutions of the primal and dual problems are in $H^{3-\varepsilon}(\Omega)$ for any $\varepsilon > 0$, but, because the right-hand sides do not vanish at the corners, they are not in $H^3(\Omega)$. Recalling that we use quadratic elements, as a consequence (fully) optimal convergence rates with respect to $\|\cdot\|_E$ are not obtained using uniform refinements. On the other hand, since the (weak) singularities in the primal and dual solutions are solely caused by the shape of the domain, the same local refinements near the corners are appropriate for both primal and dual problem. Therefore, in view of (1.1), we may expect that also with AFEM the error in the functional is $O([\#\tau - \#\tau_0]^{-2})$. On the other hand, since quantitatively the right-hand side, and so the solution of the dual problem, are not that smooth (see Figure 7.2), we may hope that the application of GOAFEM yields quantitatively better results. In Figure 7.3, we show errors in $\int_\Omega u (-\Delta(\psi w_0^{\bar x}))$ as a function of $\#\tau - \#\tau_0$. The results confirm that for both GOAFEM and AFEM, these errors are $O([\#\tau - \#\tau_0]^{-2})$, where on average for GOAFEM the errors are smaller. In Figure 7.4, we show partitions produced by GOAFEM and AFEM. With AFEM local refinements are made only towards the corners, whereas with GOAFEM additional local refinements are made in areas where quantitatively the dual solution is nonsmooth due to oscillations in its right-hand side.
Fig. 7.3. Error in the functional vs. $\#\tau - \#\tau_0$ using GOAFEM (solid) and AFEM (dashed) corresponding to Example 7.2, and a curve $C[\#\tau - \#\tau_0]^{-2}$.
Fig. 7.4. Partitions produced by AFEM and GOAFEM with nearly equal number of triangles for Example 7.2.
Example 7.3. As in Example 7.2, we consider Poisson's problem on the unit square. We now take as initial partition the one that is obtained from the partition from Figure 7.1 by 2 uniform refinements. We define the right-hand sides $f$ and $g$ of primal and dual problems by
\[ f(v) = -\int_{T_f} \frac{\partial v}{\partial x_1}, \qquad g(v) = -\int_{T_g} \frac{\partial v}{\partial x_1}, \tag{7.3} \]
where $T_f$ and $T_g$ are the simplices $\{(0,0), (\tfrac12,0), (0,\tfrac12)\}$ and $\{(1,1), (\tfrac12,1), (1,\tfrac12)\}$, respectively; see Figure 7.5. That is, with $\chi_f$ being the characteristic function of $T_f$, $f = \operatorname{div}[\chi_f \;\, 0]^T$. So in view of (4.3), here we write f as $f_1 + \operatorname{div} f_2$ with vanishing $f_1$, and benefit from the fact that $f_2 \in [V^*_{\tau_0}]^2$. Similarly for g. The primal solution has a singularity along the line connecting the points $(\tfrac12, 0)$ and $(0, \tfrac12)$ (see Figure 7.6), and similarly the dual solution has one along the line connecting $(1, \tfrac12)$ and $(\tfrac12, 1)$. Since the non-goal-oriented adaptive finite element routine AFEM does not see the latter singularity, it behaves much worse than GOAFEM, as seen in Figure 7.7. For GOAFEM we observe an error $O([\#\tau - \#\tau_0]^{-2})$, which, since p/n = 1, is equal to the best possible rate predicted by Theorem 6.4. In Figure 7.8, we show partitions produced by AFEM and GOAFEM, respectively.

Fig. 7.5. Initial partition $\tau_0$ corresponding to Example 7.3, and $T_f$ (left bottom), $T_g$ (right top).

Fig. 7.6. Primal solution corresponding to Example 7.3.

Fig. 7.7. Error in the functional vs. $\#\tau - \#\tau_0$ using GOAFEM (solid) and AFEM (dashed) corresponding to Example 7.3, and a curve $C[\#\tau - \#\tau_0]^{-2}$.

Fig. 7.8. Partitions produced by AFEM and GOAFEM with nearly equal number of triangles for Example 7.3.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 887–910

PRACTICAL VARIANCE REDUCTION VIA REGRESSION FOR SIMULATING DIFFUSIONS∗

G. N. MILSTEIN† AND M. V. TRETYAKOV‡

Abstract. The well-known variance reduction methods (the method of importance sampling and the method of control variates) can be exploited if an approximation of the required solution is known. Here we employ conditional probabilistic representations of solutions together with the regression method to obtain sufficiently inexpensive (although rather rough) estimates of the solution and its derivatives by using the single auxiliary set of approximate trajectories starting from the initial position. These estimates can effectively be used for significant reduction of variance and further accurate evaluation of the required solution. The developed approach is supported by numerical experiments.

Key words. probabilistic representations of solutions of partial differential equations, numerical integration of stochastic differential equations, Monte Carlo technique, variance reduction methods, regression

AMS subject classifications. Primary, 65C05; Secondary, 65C30, 60H10

DOI. 10.1137/060674661
∗ Received by the editors November 10, 2006; accepted for publication (in revised form) October 17, 2008; published electronically February 6, 2009. This work was partially supported by the Royal Society International Joint Project-2004/R2-FS grant and UK EPSRC research grant EP/D049792/1. http://www.siam.org/journals/sinum/47-2/67466.html
† Department of Mathematics, Ural State University, Lenin Str. 51, 620083 Ekaterinburg, Russia ([email protected]).
‡ Department of Mathematics, University of Leicester, Leicester LE1 7RH, UK (M.Tretyakov@le.ac.uk). This author's research was partially supported by a Leverhulme Research Fellowship. Part of this work was done while the author was on study leave granted by the University of Leicester.

1. Introduction. The stochastic approach to solving problems of mathematical physics is based on probabilistic representations of their solutions by making use of the weak-sense numerical integration of stochastic differential equations (SDEs) and the Monte Carlo (MC) technique. In this approach we have two main errors: the error of SDE numerical integration and the MC error. The first error essentially depends on the choice of a method of numerical integration, and the second one depends on the choice of the probabilistic representation (it is understood that the first error for a chosen method can be reduced by decreasing the step of discretization, and the MC error for a selected probabilistic representation can be reduced by increasing the number of independent trajectories). While the error of numerical integration is well studied in the systematic theory of numerical integration of SDEs, which allows us to propose suitable effective methods for many typical problems (see, e.g., [16]), in connection with the MC error there is a lack of constructive variance reduction methods. The well-known variance reduction methods (see [12, 16, 21] and the references therein) of importance sampling and of control variates can be exploited only in the case when an approximation of the required solution $u(t, x)$ is known. However, in general even rough approximations of the desired solution $u(t, x)$ and its derivatives $\partial u/\partial x^i(t, x)$, $i = 1, \ldots, d$, are unknown beforehand. At first sight, it seems that approximating them roughly is not difficult since they can be found by the MC technique using a comparatively small number of independent trajectories. But this presupposes evaluating them at many points $(t_k, x_k)$. Computing $u(t_k, x_k)$ and $\partial u/\partial x^i(t_k, x_k)$ by the MC technique requires different auxiliary sets of approximate trajectories because of the different starting points $(t_k, x_k)$. This is too expensive; i.e., as a rule, such a procedure is more expensive than a simple increase of the number of trajectories starting from the initial position $(t_0, x_0)$, at which we aim to find the value of the solution u. So, a suitable method of constructing $u(t_k, x_k)$ and $\partial u/\partial x^i(t_k, x_k)$ should be comparatively inexpensive. Therefore we cannot require a considerable accuracy of the estimates for $u(t_k, x_k)$ and $\partial u/\partial x^i(t_k, x_k)$ because there is a trade-off between accuracy and computational expenses.

Our proposition is to exploit conditional probabilistic representations. Their employment together with the regression method allows us to evaluate $u(t_k, x)$ and $\partial u/\partial x^i(t_k, x)$ using the single auxiliary set of approximate trajectories starting from the initial position $(t_0, x_0)$ only. This plays a crucial role in obtaining sufficiently inexpensive (but at the same time useful for variance reduction) estimates $\hat u(t_k, x)$ and $\widehat{\partial u/\partial x^i}(t_k, x)$. The construction of $\hat u$ and $\widehat{\partial u/\partial x^i}$ is accompanied by a number of errors of a different nature. Although it is impossible to evaluate these errors satisfactorily, the suitability of $\hat u(t_k, x)$ and $\widehat{\partial u/\partial x^i}(t_k, x)$ for variance reduction can be directly verified during computations since the MC error can always be estimated. We emphasize that the obtained (even rather rough) estimates can effectively be used for accurately evaluating the function u not only at the position $(t_0, x_0)$ but at many other positions as well. This paper is most closely connected with [6, 12, 13, 14] (see also [16]) and with the works [21, 20] by N. Newton. The method of importance sampling from [6, 12] is exploited in [25] for some specific physical applications.
Various other aspects of variance reduction related to simulating diffusions are considered, e.g., in [2, 4, 9, 10, 24] (see also the references therein). An extended list of works devoted to variance reduction of MC simulations can be found in [7]. In section 2 we recall some known facts concerning the MC technique for linear parabolic equations and the general scheme of the regression method for estimating conditional expectations. Section 3 is devoted to conditional probabilistic representations of solutions of parabolic equations and their derivatives. These representations together with the regression approach play a decisive role in the economical estimation of u and $\partial u/\partial x^i$ at all points $(t, x)$, given only one set of trajectories starting from the initial point $(t_0, x_0)$. In section 3.2 we obtain the estimate $\hat u(s, x)$ and propose to estimate the derivatives $\partial u/\partial x^i(s, x)$ by $\partial \hat u/\partial x^i(s, x)$. This estimation of the derivatives is inexpensive from the computational point of view, but the resulting estimates are rather rough. Section 3.3 is devoted to the more accurate way of estimating derivatives using a linear regression method directly to find $\widehat{\partial u/\partial x^i}(t_k, x)$. In section 3.4, we obtain $\widehat{\partial u/\partial x^i}(t_k, x)$ in the case of nonsmooth initial data exploiting probabilistic representations for $\partial u/\partial x^i(s, x)$ which rest on the Malliavin integration by parts. To this aim, we derive a conditional version of the Malliavin integration-by-parts formula adapted to our context. It should be noted that if the dimension d is large, the procedures of sections 3.3 and 3.4 are computationally very demanding since they require integration of the $d^2$-dimensional system of first-order variation equations whose solution is present in the probabilistic representations for $\partial u/\partial x^i(s, x)$. Therefore, in practice, the inexpensive procedure of section 3.2 is preferable if d is large.
In section 4 we give a simple, analytically tractable example to illustrate the benefits of the proposed variance reduction procedure, and we also test it on a one-dimensional array of stochastic oscillators and on the Black–Scholes pricing model for a binary asset-or-nothing call option. Section 5 gives a summary of the proposed approach to variance reduction.
2. Preliminaries. In this section we recall some known facts concerning probabilistic representations of the solutions of parabolic partial differential equations and the regression method of estimating conditional expectations in the form suitable for our purposes.

2.1. Probabilistic representations. Let us consider the Cauchy problem for the linear parabolic equation

\[ \frac{\partial u}{\partial t} + \frac12 \sum_{i,j=1}^{d} a^{ij}(t,x) \frac{\partial^2 u}{\partial x^i \partial x^j} + \sum_{i=1}^{d} b^i(t,x) \frac{\partial u}{\partial x^i} + c(t,x)u + g(t,x) = 0, \quad t_0 \le t < T, \ x \in \mathbb{R}^d, \tag{2.1} \]

with the initial condition

\[ u(T,x) = f(x), \quad x \in \mathbb{R}^d. \tag{2.2} \]

The matrix $a(t,x) = \{a^{ij}(t,x)\}$ in (2.1) is symmetric and at least positive semidefinite. Let $\sigma(t,x)$ be a matrix obtained from the equation $a(t,x) = \sigma(t,x)\sigma^\top(t,x)$. Let $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$, $t_0 \le t \le T$, be a filtered probability space. The solution to the problem (2.1)–(2.2) has the following probabilistic representation (the well-known Feynman–Kac formula):

\[ u(s,x) = E\bigl[ f(X_{s,x}(T))\, Y_{s,x,1}(T) + Z_{s,x,1,0}(T) \bigr], \tag{2.3} \]

where $X_{s,x}(t)$, $Y_{s,x,y}(t)$, $Z_{s,x,y,z}(t)$, $t \ge s$, is the solution of the Cauchy problem for the system of SDEs

\[ \begin{aligned} dX &= b(t,X)\,dt + \sigma(t,X)\,dw(t), & X(s) &= x, \\ dY &= c(t,X)\,Y\,dt, & Y(s) &= y, \\ dZ &= g(t,X)\,Y\,dt, & Z(s) &= z. \end{aligned} \tag{2.4} \]

Here $w(t) = (w^1(t), \ldots, w^d(t))^\top$ is a d-dimensional $\{\mathcal{F}_t\}_{t \ge t_0}$-adapted standard Wiener process, and Y and Z are scalars. If y = 1, z = 0, we shall use the notation $Y_{s,x}(t) := Y_{s,x,1}(t)$, $Z_{s,x}(t) := Z_{s,x,1,0}(t)$ (analogous notation will be used later for some other variables). So,

\[ u(s,x) = E\bigl[ f(X_{s,x}(T))\, Y_{s,x}(T) + Z_{s,x}(T) \bigr]. \tag{2.5} \]

There are various sets of sufficient conditions ensuring the connection between the solutions of the Cauchy problem (2.1)–(2.2) and their probabilistic representations (2.5), (2.4). For definiteness, we shall keep the following assumptions. We assume that the coefficients b, σ, c, and g have bounded derivatives up to some order, and additionally c and g are bounded on $[t_0, T] \times \mathbb{R}^d$. Further, we assume that the matrix $a(t,x)$ is positive definite and, moreover, the uniform ellipticity condition holds: there exists $\sigma_0 > 0$ such that

\[ \| a^{-1}(t,x) \| = \| (\sigma(t,x)\sigma^\top(t,x))^{-1} \| \le \sigma_0^{-1}, \quad t_0 \le t \le T, \ x \in \mathbb{R}^d. \]
As for the function $f(x)$, it is assumed to grow at infinity not faster than a polynomial function. It can be both smooth and nonsmooth. We note that the results of this paper can be used under other sets of conditions. For instance, one can consider situations with nonglobally Lipschitz coefficients [18] or with a matrix $a(t,x)$ which is positive semidefinite. For example, in section 4.2 we consider a numerical example with nonglobally Lipschitz coefficients and a positive semidefinite matrix $a(t,x)$, and the example from section 4.3 has a discontinuous $f(x)$.

The value $u(s,x)$ from (2.5) can be evaluated using the weak-sense numerical integration of the system (2.4) together with the MC technique. More specifically, we have

\[ u(s,x) \approx E\bigl[ f(\bar X_{s,x}(T))\, \bar Y_{s,x}(T) + \bar Z_{s,x}(T) \bigr] \approx \frac{1}{M} \sum_{m=1}^{M} \bigl[ f({}_m\bar X_{s,x}(T))\, {}_m\bar Y_{s,x}(T) + {}_m\bar Z_{s,x}(T) \bigr], \tag{2.6} \]

where the first approximate equality involves an error due to replacing X, Y, Z by $\bar X$, $\bar Y$, $\bar Z$ (the error is related to the approximate integration of (2.4)) and the error in the second approximate equality comes from the MC technique; ${}_m\bar X_{s,x}(T)$, ${}_m\bar Y_{s,x}(T)$, ${}_m\bar Z_{s,x}(T)$, $m = 1, \ldots, M$, are independent realizations of $\bar X_{s,x}(T)$, $\bar Y_{s,x}(T)$, $\bar Z_{s,x}(T)$. While the weak-sense integration of SDEs is developed sufficiently well and many different effective weak-sense numerical methods have been constructed (see, e.g., [16]), the methods of reducing the second error in (2.6) are more intricate. The error of the MC method is evaluated by

\[ \bar\rho = c\, \frac{\bigl( \operatorname{var}[ f(\bar X_{s,x}(T))\, \bar Y_{s,x}(T) + \bar Z_{s,x}(T) ] \bigr)^{1/2}}{M^{1/2}}, \]

where, e.g., the values c = 1, 2, 3 correspond to the fiducial probabilities 0.68, 0.95, 0.997, respectively. Introduce

\[ \Gamma = \Gamma_{s,x} := f(X_{s,x}(T))\, Y_{s,x}(T) + Z_{s,x}(T), \tag{2.7} \]

\[ \bar\Gamma = \bar\Gamma_{s,x} := f(\bar X_{s,x}(T))\, \bar Y_{s,x}(T) + \bar Z_{s,x}(T). \tag{2.8} \]

Since $\operatorname{var}\Gamma_{s,x}$ is close to $\operatorname{var}\bar\Gamma_{s,x}$, we can assume that the error of the MC method is estimated by

\[ \rho = c\, \frac{(\operatorname{var}\Gamma_{s,x})^{1/2}}{M^{1/2}}. \tag{2.9} \]
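As an illustration of (2.6) and of the error estimate $\bar\rho$, the following minimal Python sketch applies the weak Euler scheme to the system (2.4) in dimension d = 1. It is not taken from the paper: the function name `mc_feynman_kac`, its parameters, and the toy problem (b = 0, σ = 1, c = 0, g = 0, f(x) = x², for which u(s, x) = x² + (T − s)) are assumptions chosen only so that the result can be checked by hand.

```python
import numpy as np

def mc_feynman_kac(f, b, sigma, c, g, s, x, T, n_steps, n_paths, rng, conf=2.0):
    """Hypothetical helper: Euler scheme for (2.4) plus the MC average (2.6).

    Returns the MC estimate of u(s, x) and the error estimate
    rho_bar = conf * (sample variance)^(1/2) / M^(1/2).
    """
    h = (T - s) / n_steps
    X = np.full(n_paths, float(x))
    Y = np.ones(n_paths)    # Y(s) = 1
    Z = np.zeros(n_paths)   # Z(s) = 0
    t = s
    for _ in range(n_steps):
        dw = np.sqrt(h) * rng.standard_normal(n_paths)
        # All increments are computed from the values at the start of the step.
        dX = b(t, X) * h + sigma(t, X) * dw
        dY = c(t, X) * Y * h
        dZ = g(t, X) * Y * h
        X, Y, Z = X + dX, Y + dY, Z + dZ
        t += h
    gamma = f(X) * Y + Z    # realizations of Gamma_bar from (2.8)
    estimate = gamma.mean()
    rho = conf * gamma.std(ddof=1) / np.sqrt(n_paths)
    return estimate, rho

# Toy problem (an assumption): dX = dw, f(x) = x^2, so u(0, 1) = 1 + T = 2.
zero = lambda t, X: np.zeros_like(X)
one = lambda t, X: np.ones_like(X)
estimate, rho = mc_feynman_kac(lambda X: X**2, zero, one, zero, zero,
                               s=0.0, x=1.0, T=1.0, n_steps=50,
                               n_paths=200_000, rng=np.random.default_rng(0))
print(estimate, rho)
```

With conf = 2 the returned `rho` corresponds to the fiducial probability 0.95 mentioned above; halving it requires four times as many trajectories, which is the motivation for the variance reduction discussed next.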
2.2. Variance reduction. If $\operatorname{var}\Gamma_{s,x}$ is large, then to achieve a satisfactory accuracy we have to simulate a very large number of independent trajectories. Clearly, variance reduction is of crucial importance for the effectiveness of any MC procedure. To reduce the MC error, one usually exploits some other probabilistic representations of solutions to the considered problems. To obtain various probabilistic representations of the solution to the problem (2.1)–(2.2), we introduce the system (see [13, 14, 16])

\[ \begin{aligned} dX &= b(t,X)\,dt - \sigma(t,X)\mu(t,X)\,dt + \sigma(t,X)\,dw(t), & X(s) &= x, \\ dY &= c(t,X)\,Y\,dt + \mu^\top(t,X)\,Y\,dw(t), & Y(s) &= 1, \\ dZ &= g(t,X)\,Y\,dt + F^\top(t,X)\,Y\,dw(t), & Z(s) &= 0, \end{aligned} \tag{2.10} \]

where μ and F are column-vector functions of dimension d satisfying some regularity conditions (e.g., they have bounded derivatives with respect to $x^i$ up to some order). We should note that X, Y, Z in (2.10) differ from X, Y, Z in (2.4); however, this does not lead to any ambiguity. The formula (2.5), i.e.,

\[ u(s,x) = E\,\Gamma_{s,x}, \tag{2.11} \]

remains valid under the new X, Y, Z. While the mean $E\Gamma$ does not depend on the choice of μ and F, the variance $\operatorname{var}\Gamma = E\Gamma^2 - (E\Gamma)^2$ does. Thus, μ and F can be used to decrease the variance $\operatorname{var}\Gamma$ and, consequently, the MC error can be reduced. The following theorem is proved in [14] (see also [13, 16]).

Theorem 2.1. Let μ and F be such that for any $x \in \mathbb{R}^d$ there exists a solution to the system (2.10) on the interval $[s, T]$. Then the variance $\operatorname{var}\Gamma$ is equal to

\[ \operatorname{var}\Gamma = E \int_s^T Y_{s,x}^2(t) \sum_{j=1}^{d} \Bigl( \sum_{i=1}^{d} \sigma^{ij} \frac{\partial u}{\partial x^i} + u\mu^j + F^j \Bigr)^2 dt, \tag{2.12} \]

provided that the expectation in (2.12) exists. In (2.12) all the functions $\sigma^{ij}$, $\mu^j$, $F^j$, u, $\partial u/\partial x^i$ have $(t, X_{s,x}(t))$ as their argument. In particular, if μ and F are such that

\[ \sum_{i=1}^{d} \sigma^{ij} \frac{\partial u}{\partial x^i} + u\mu^j + F^j = 0, \quad j = 1, \ldots, d, \tag{2.13} \]
then $\operatorname{var}\Gamma = 0$, i.e., Γ is deterministic. We recall that if we put here F = 0, then we obtain the method of importance sampling (first considered in [6, 12, 24]), and if we put μ = 0, then we obtain the method of control variates (first considered in [21]). Theorem 2.1 establishes the combining method of variance reduction proved in [13]; see also [16]. Obviously, μ and F satisfying (2.13) cannot be constructed without knowing $u(t,x)$, $s \le t \le T$, $x \in \mathbb{R}^d$. Nevertheless, the theorem claims a general possibility of variance reduction by a proper choice of the functions $\mu^j$ and $F^j$, $j = 1, \ldots, d$. Theorem 2.1 can be used, for example, if we know a function $\hat u(t,x)$ connected with an approximating problem and which is close to $u(t,x)$. In this case we take any $\hat\mu^j$, $\hat F^j$, $j = 1, \ldots, d$, satisfying

\[ \sum_{i=1}^{d} \sigma^{ij} \frac{\partial \hat u}{\partial x^i} + \hat u \hat\mu^j + \hat F^j = 0, \tag{2.14} \]

and then the variance $\operatorname{var}\Gamma$, though not zero, is small. Let us emphasize that (2.13) serves only as a guidance for getting suitable μ and F (recall that the mean $E\Gamma$ does not depend on the choice of μ and F). In particular, the derivative estimate $\widehat{\partial u/\partial x^i}$ can differ from $\partial \hat u/\partial x^i$. In such cases, instead of (2.14) we use

\[ \sum_{i=1}^{d} \sigma^{ij}\, \widehat{\frac{\partial u}{\partial x^i}} + \hat u \hat\mu^j + \hat F^j = 0. \tag{2.15} \]
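The effect of choosing F according to (2.14) with μ = 0 (the control variates case) can be seen on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' experiment: we take d = 1, dX = dw, c = g = 0, f(x) = x², so that u(t, x) = x² + (T − t) is known in closed form and (2.14) gives F(t, x) = −2x. The function name `simulate_gamma` is hypothetical. In practice u is unknown and F would be built from an estimate $\hat u$.

```python
import numpy as np

def simulate_gamma(F, x0, T, n_steps, n_paths, rng):
    """Hypothetical helper: Euler scheme for (2.10) with mu = 0, b = c = g = 0,
    sigma = 1, and f(x) = x^2; returns realizations of Gamma = f(X(T)) + Z(T)."""
    h = T / n_steps
    X = np.full(n_paths, float(x0))
    Z = np.zeros(n_paths)
    t = 0.0
    for _ in range(n_steps):
        dw = np.sqrt(h) * rng.standard_normal(n_paths)
        Z = Z + F(t, X) * dw   # dZ = F(t, X) Y dw with Y = 1 (uses X before the update)
        X = X + dw             # dX = dw
        t += h
    return X**2 + Z

x0, T, n_steps, n_paths = 1.0, 1.0, 50, 100_000
# F = 0: the plain representation (2.4).
plain = simulate_gamma(lambda t, X: np.zeros_like(X), x0, T, n_steps, n_paths,
                       np.random.default_rng(1))
# F = -sigma * du/dx = -2x: control variate from (2.14) with mu = 0.
cv = simulate_gamma(lambda t, X: -2.0 * X, x0, T, n_steps, n_paths,
                    np.random.default_rng(2))
print("plain:   mean %.4f  var %.4f" % (plain.mean(), plain.var()))
print("control: mean %.4f  var %.4f" % (cv.mean(), cv.var()))
```

Both estimators have the same mean u(0, 1) = 2, in line with the fact that $E\Gamma$ does not depend on F. The variance of the second one is not exactly zero only because of the time discretization; for this toy problem it is of order h.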
It might seem that the problem of at least rough approximation of the functions $u(t,x)$ and $\partial u/\partial x^i(t,x)$ is not difficult since they can be found approximately due to the Feynman–Kac formula, numerical integration of SDEs, and the MC technique. But then numerical integration of the system (2.10) presupposes evaluating $u(t_k, \bar X_k)$ and $\partial u/\partial x^i(t_k, \bar X_k)$ at many points $(t_k, \bar X_k)$. Their evaluation by the MC method requires different sets of auxiliary approximate trajectories because of the different starting points $(t_k, \bar X_k)$. This is too expensive; i.e., as a rule, such a procedure is more expensive than a simple increase of M in (2.6). Our aim is to propose a systematic method of approximating the functions u and $\partial u/\partial x^i$, $i = 1, \ldots, d$, relatively cheaply, and hence obtain systematic methods of variance reduction. To this end, we exploit the regression method of evaluating $u(t_k, x)$ and $\partial u/\partial x^i(t_k, x)$, which allows us to use only one set of approximate trajectories starting from the initial position $(t_0, x_0)$.

2.3. Pathwise approach for derivatives $\partial u/\partial x^i(s,x)$. The probabilistic representation for the derivatives

\[ \partial^i(s,x) := \frac{\partial u(s,x)}{\partial x^i}, \quad i = 1, \ldots, d, \]

can be obtained by the straightforward differentiation of (2.11) (see, e.g., [7, 13]):

\[ \partial^i(s,x) = E \Bigl( \sum_{j=1}^{d} \frac{\partial f(X_{s,x}(T))}{\partial x^j}\, \delta^i_{s,x} X^j(T)\, Y_{s,x}(T) + f(X_{s,x}(T))\, \delta^i_{s,x} Y(T) + \delta^i_{s,x} Z(T) \Bigr), \tag{2.16} \]

where

\[ \delta^i X^j(t) := \delta^i_{s,x} X^j(t) := \frac{\partial X^j_{s,x}(t)}{\partial x^i}, \quad \delta^i Y(t) := \delta^i_{s,x} Y(t) := \frac{\partial Y_{s,x}(t)}{\partial x^i}, \quad \delta^i Z(t) := \delta^i_{s,x} Z(t) := \frac{\partial Z_{s,x}(t)}{\partial x^i}, \]

$s \le t \le T$, $i, j = 1, \ldots, d$,
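satisfy the system of variational equations associated with (2.10):

\[ d\delta^i X = \sum_{j=1}^{d} \frac{\partial (b(t,X) - \sigma(t,X)\mu(t,X))}{\partial x^j}\, \delta^i X^j\, dt + \sum_{j=1}^{d} \frac{\partial \sigma(t,X)}{\partial x^j}\, \delta^i X^j\, dw(t), \tag{2.17} \]

\[ \delta^i X^j(s) = 0 \ \text{if} \ j \ne i, \ \text{and} \ \delta^i X^i(s) = 1, \]

\[ d\delta^i Y = \sum_{j=1}^{d} Y \frac{\partial c(t,X)}{\partial x^j}\, \delta^i X^j\, dt + c(t,X)\, \delta^i Y\, dt + \sum_{j=1}^{d} Y \frac{\partial \mu^\top(t,X)}{\partial x^j}\, \delta^i X^j\, dw(t) + \mu^\top(t,X)\, \delta^i Y\, dw(t), \quad \delta^i Y(s) = 0, \tag{2.18} \]

\[ d\delta^i Z = \sum_{j=1}^{d} Y \frac{\partial g(t,X)}{\partial x^j}\, \delta^i X^j\, dt + g(t,X)\, \delta^i Y\, dt + \sum_{j=1}^{d} Y \frac{\partial F^\top(t,X)}{\partial x^j}\, \delta^i X^j\, dw(t) + F^\top(t,X)\, \delta^i Y\, dw(t), \quad \delta^i Z(s) = 0. \tag{2.19} \]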
Introduce a partition of the time interval $[t_0, T]$, for simplicity the equidistant one: $t_0 < t_1 < \cdots < t_N = T$ with step size $h = (T - t_0)/N$. Let us apply a weak scheme (see, e.g., [16]) to the systems of SDEs (2.10), (2.17)–(2.19) to obtain independent approximate trajectories $(t_k, {}_m\bar X(t_k))$, $m = 1, \ldots, M$, all starting from the point $(t_0, x)$, and ${}_m\bar Y(t_k)$, ${}_m\bar Z(t_k)$, ${}_m\bar\delta^i X(t_k)$, ${}_m\bar\delta^i Y(t_k)$, ${}_m\bar\delta^i Z(t_k)$ with ${}_m\bar Y(t_0) = 1$, ${}_m\bar Z(t_0) = 0$, ${}_m\bar\delta^i X^j(t_0) = 0$ if $j \ne i$, and ${}_m\bar\delta^i X^i(t_0) = 1$, ${}_m\bar\delta^i Y(t_0) = 0$, ${}_m\bar\delta^i Z(t_0) = 0$. Then we obtain the following MC estimates of the derivatives $\partial u/\partial x^i(t_0, x)$ from (2.16) with $(s,x) = (t_0, x)$:

\[ \hat\partial^i(t_0, x) = \frac{1}{M} \sum_{m=1}^{M} \Bigl[ \sum_{j=1}^{d} \frac{\partial f({}_m\bar X(T))}{\partial x^j}\, {}_m\bar\delta^i X^j(T)\, {}_m\bar Y(T) + f({}_m\bar X(T))\, {}_m\bar\delta^i Y(T) + {}_m\bar\delta^i Z(T) \Bigr]. \tag{2.20} \]
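A minimal Python sketch of the pathwise estimate (2.20) is given below for a one-dimensional toy problem that is our own assumption, not an example from the paper: dX = vol · X dw with μ = 0, F = 0, c = 0, g = 0, and f(x) = x². Then Y ≡ 1, Z ≡ 0, the variational equation (2.17) reads dδX = vol · δX dw with δX(t_0) = 1, and the exact derivative is ∂u/∂x(0, x) = 2x exp(vol² T). The function name `pathwise_derivative` is hypothetical.

```python
import numpy as np

def pathwise_derivative(x0, vol, T, n_steps, n_paths, rng):
    """Hypothetical helper: Euler scheme for dX = vol*X dw together with its
    first variation, followed by the MC average (2.20) for f(x) = x^2."""
    h = T / n_steps
    X = np.full(n_paths, float(x0))
    dX = np.ones(n_paths)            # delta X(t_0) = 1 (d = 1, so i = j = 1)
    for _ in range(n_steps):
        dw = np.sqrt(h) * rng.standard_normal(n_paths)
        # Both updates use the same Wiener increment and the old values.
        X, dX = X + vol * X * dw, dX + vol * dX * dw
    # Y = 1 and delta Y = delta Z = 0 here, so only the first term of (2.20) remains.
    return np.mean(2.0 * X * dX)

estimate = pathwise_derivative(x0=1.0, vol=0.2, T=1.0, n_steps=50,
                               n_paths=100_000, rng=np.random.default_rng(5))
exact = 2.0 * np.exp(0.2**2 * 1.0)
print(estimate, exact)
```

For d > 1 the same loop has to carry the d × d matrix of first variations for every trajectory, which is the source of the computational cost of the pathwise approach in high dimension.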
Clearly, the estimates $\hat\partial^i(t_k, x)$ for the derivatives $\partial u/\partial x^i(t_k, x)$ can be obtained analogously. Theorem 2.1 asserts that the variance in evaluating u by (2.11) can reach zero for some μ and F. In [13] it is proved that for the same μ and F the variance in evaluating $\partial^i$ by (2.16) is equal to zero as well (note that not only μ and F but also their derivatives are present in (2.18) and (2.19)).

2.4. Regression method of estimating conditional expectation. Let us recall the general scheme of the linear regression method (see, e.g., [8]). Consider a sample $({}_m X, {}_m V)$, $m = 1, \ldots, M_r$, from a generic member $(X, V)$ of the sample, where X is a d-dimensional and V is a one-dimensional random variable. Note that we denote by $M_r$ the size of the sample used in the regression, while M is the number of realizations used for computing the required quantity $u(t_0, x_0)$ (see (2.6)). Let the values of X belong to a domain $D \subset \mathbb{R}^d$. It is of interest to estimate the regression function

\[ c(x) = E(V \mid X = x). \tag{2.21} \]

Let $\{\varphi_l(x)\}_{l=1}^{L}$ be a set of basis functions each mapping D to $\mathbb{R}$. As an estimate $\hat c(x)$ of $c(x)$, we choose the function of the form $\sum_{l=1}^{L} \alpha_l \varphi_l(x)$ that minimizes the empirical risk:

\[ \hat\alpha = \arg\min_{\alpha \in \mathbb{R}^L} \frac{1}{M_r} \sum_{m=1}^{M_r} \Bigl( {}_m V - \sum_{l=1}^{L} \alpha_l \varphi_l({}_m X) \Bigr)^2. \tag{2.22} \]

So

\[ \hat c(x) = \sum_{l=1}^{L} \hat\alpha_l \varphi_l(x), \tag{2.23} \]

where $\hat\alpha_l$ satisfy the system of linear algebraic equations

\[ \begin{aligned} a_{11}\alpha_1 + a_{12}\alpha_2 + \cdots + a_{1L}\alpha_L &= b_1, \\ &\ \ \vdots \\ a_{L1}\alpha_1 + a_{L2}\alpha_2 + \cdots + a_{LL}\alpha_L &= b_L \end{aligned} \tag{2.24} \]

with

\[ a_{ln} = \frac{1}{M_r} \sum_{m=1}^{M_r} \varphi_l({}_m X)\, \varphi_n({}_m X), \quad b_l = \frac{1}{M_r} \sum_{m=1}^{M_r} \varphi_l({}_m X)\, {}_m V, \quad l, n = 1, \ldots, L. \tag{2.25} \]
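The normal equations (2.24)–(2.25) translate directly into a few lines of linear algebra. The following Python sketch is only an illustration: the helper names `fit_regression` and `evaluate`, the quadratic basis, and the synthetic sample (d = 1, V = 1 + 2X − 3X² plus noise) are assumptions made so that the recovered coefficients can be compared with known values.

```python
import numpy as np

def fit_regression(X, V, basis):
    """Hypothetical helper: assemble a_ln and b_l from (2.25), solve (2.24)."""
    Phi = np.column_stack([phi(X) for phi in basis])   # Phi[m, l] = phi_l(mX)
    Mr = len(V)
    A = Phi.T @ Phi / Mr                               # matrix {a_ln}
    b = Phi.T @ V / Mr                                 # vector {b_l}
    return np.linalg.solve(A, b)                       # alpha_hat from (2.22)

def evaluate(alpha, basis, x):
    """c_hat(x) = sum_l alpha_l * phi_l(x), cf. (2.23)."""
    return sum(a * phi(x) for a, phi in zip(alpha, basis))

# Synthetic sample (an assumption): c(x) = 1 + 2x - 3x^2 observed with noise.
rng = np.random.default_rng(7)
Mr = 5000
X = rng.uniform(-1.0, 1.0, Mr)
V = 1.0 + 2.0 * X - 3.0 * X**2 + 0.1 * rng.standard_normal(Mr)

basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]
alpha_hat = fit_regression(X, V, basis)
c_hat_at_half = evaluate(alpha_hat, basis, 0.5)
print(alpha_hat, c_hat_at_half)
```

When the matrix $\{a_{ln}\}$ is ill-conditioned, solving the least-squares problem (2.22) directly (for instance with `np.linalg.lstsq` applied to `Phi`) is usually numerically safer than forming the normal equations; this is a standard numerical remark, not a statement from the paper.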
Thus, the usual base material in the field of regression is a sample $({}_m X, {}_m V)$, $m = 1, \ldots, M_r$, from a generic member $(X, V)$ of the sample.

Remark 2.2. Although in this paper we use linear regression, in principle other regression methods (see, e.g., [3, 8]) can be exploited as well.

3. Conditional probabilistic representations and methods of evaluating $u(s,x)$ and $\partial u/\partial x^i(s,x)$ by regression. The routine (unconditional) probabilistic representations are ideal for the MC evaluation of $u(t_0, x_0)$ by using a set of trajectories starting from the point $(t_0, x_0)$. To find $u(s,x)$ by this approach, we need to construct another set of trajectories which starts from $(s,x)$. However, we can use the previous set starting from $(t_0, x_0)$ to compute $u(s,x)$, $s > t_0$, if we make use of conditional probabilistic representations. In this section we introduce the conditional probabilistic representations for solutions of parabolic equations and for derivatives of the solutions.

3.1. Conditional probabilistic representations for $u(s,x)$ and $\partial u/\partial x^i(s,x)$. Along with the unconditional probabilistic representation (2.11), (2.7), (2.10) for $u(s,x)$, we have the following conditional one:

\[ u(s,x) = E\bigl( f(X_{s,x}(T))\, Y_{s,x}(T) + Z_{s,x}(T) \bigr) = E\bigl( f(X_{s,X}(T))\, Y_{s,X}(T) + Z_{s,X}(T) \,\big|\, X_{t_0,x_0}(s) = x \bigr) \quad \text{with} \ X := X_{t_0,x_0}(s). \tag{3.1} \]

This formula can be considered as the conditional version of the Feynman–Kac formula. Analogously to (3.1), we get for $\partial^i(s,x) = \partial u/\partial x^i(s,x)$ (see (2.16))

\[ \begin{aligned} \partial^i(s,x) &= E \Bigl( \sum_{j=1}^{d} \frac{\partial f(X_{s,x}(T))}{\partial x^j}\, \delta^i_{s,x} X^j(T)\, Y_{s,x}(T) + f(X_{s,x}(T))\, \delta^i_{s,x} Y(T) + \delta^i_{s,x} Z(T) \Bigr) \\ &= E \Bigl( \sum_{j=1}^{d} \frac{\partial f(X_{s,X}(T))}{\partial x^j}\, \delta^i_{s,X} X^j(T)\, Y_{s,X}(T) + f(X_{s,X}(T))\, \delta^i_{s,X} Y(T) + \delta^i_{s,X} Z(T) \,\Big|\, X := X_{t_0,x_0}(s) = x \Bigr). \end{aligned} \tag{3.2} \]
So, we have two different probabilistic representations both for $u(s,x)$ and $\partial^i(s,x)$: the first one is in the form of an unconditional expectation (see section 2), and the second one (i.e., (3.1) and (3.2)) is in the form of a conditional expectation. The first form can be realized naturally by the MC approach and the second one by a regression method. As we discussed before, it is too expensive to run sets of trajectories starting from various initial points $(s,x)$, and we do have the set of trajectories $(t, {}_m X_{t_0,x_0}(t))$. Taking this into account, the second way (which relies on the conditional probabilistic representations and regression) is preferable, although it is less accurate. A proof of (3.1) and (3.2) relies on the following assertion: if ζ is $\tilde{\mathcal{F}}$-measurable, $f(x,\omega)$ is independent of $\tilde{\mathcal{F}}$, and $E f(x,\omega) = \phi(x)$, then $E(f(\zeta,\omega) \mid \tilde{\mathcal{F}}) = \phi(\zeta)$ (see, e.g., [11]). From this assertion, for any measurable g it holds (with $\zeta = X_{t_0,x_0}(s)$, $\tilde{\mathcal{F}} = \sigma\{X_{t_0,x_0}(s)\}$, $f(x,\omega) = g(X_{s,x}(T))$) that $E(g(X_{s,X}(T)) \mid X_{t_0,x_0}(s) = x) = E g(X_{s,x}(T))$ with $X := X_{t_0,x_0}(s)$, hence (3.1) and (3.2).

3.2. Evaluating $u(s,x)$. In evaluating $u(s,x)$ by regression, the pairs $(X, V)$ and $({}_m X, {}_m V)$ have the form

\[ \begin{aligned} (X, V) &\sim \bigl( X_{t_0,x_0}(s),\ f(X_{s,X}(T))\, Y_{s,X}(T) + Z_{s,X}(T) \bigr), \\ ({}_m X, {}_m V) &\sim \bigl( {}_m X_{t_0,x_0}(s),\ f({}_m X_{s,{}_m X}(T))\, {}_m Y_{s,{}_m X}(T) + {}_m Z_{s,{}_m X}(T) \bigr). \end{aligned} \tag{3.3} \]
To realize a regression algorithm, we construct the set of trajectories $(t, {}_m X_{t_0,x_0}(t))$. Of course, we construct them approximately at the time moments $s = t_k$ and store the obtained values. So, in reality we have $(t_k, {}_m\bar X_{t_0,x_0}(t_k))$. The time s in (3.3) is equal to one of the $t_k$. We note that

\[ X_{s,X}(t) = X_{s, X_{t_0,x_0}(s)}(t) = X_{t_0,x_0}(t), \quad t \ge s; \tag{3.4} \]

i.e., $X_{s,X}(t)$ is a continuation of the base solution starting at the moment $t_0$, and $X_{s,X}(T)$ in (3.3) is equal to $X_{t_0,x_0}(T)$. It is not so for Y: $Y_{s,X}(T) \ne Y_{t_0,x_0}(T)$. Let us recall that $Y_{s,X}(t)$ is the solution of the equation (see (2.10))

\[ dY_{s,X} = c(t, X_{s,X}(t))\, Y_{s,X}\, dt + \mu^\top(t, X_{s,X}(t))\, Y_{s,X}\, dw(t), \quad Y(s) = 1. \tag{3.5} \]

Clearly,

\[ Y_{s,X}(t) = \frac{Y_{t_0,x_0}(t)}{Y_{t_0,x_0}(s)}, \quad s \le t \le T, \tag{3.6} \]

hence storing $Y_{t_0,x_0}(t)$, we can get $Y_{s,X}(T)$ in (3.3). Analogously, $Z_{s,X}(T) \ne Z_{t_0,x_0}(T)$. It is not difficult to find that

\[ Z_{s,X}(t) = \frac{1}{Y_{t_0,x_0}(s)} \bigl( Z_{t_0,x_0}(t) - Z_{t_0,x_0}(s) \bigr), \quad Z_{s,X}(T) = \frac{1}{Y_{t_0,x_0}(s)} \bigl( Z_{t_0,x_0}(T) - Z_{t_0,x_0}(s) \bigr). \tag{3.7} \]

Therefore

\[ u(s,x) = E \Bigl( f(X_{t_0,x_0}(T))\, \frac{Y_{t_0,x_0}(T)}{Y_{t_0,x_0}(s)} + \frac{1}{Y_{t_0,x_0}(s)} \bigl( Z_{t_0,x_0}(T) - Z_{t_0,x_0}(s) \bigr) \,\Big|\, X_{t_0,x_0}(s) = x \Bigr). \]

Thus, storing ${}_m X_{t_0,x_0}(t)$, ${}_m Y_{t_0,x_0}(t)$, ${}_m Z_{t_0,x_0}(t)$, $t_0 \le t \le T$ (in fact, storing ${}_m\bar X$, ${}_m\bar Y$, ${}_m\bar Z$ at $t_k$), we get the pairs $({}_m X, {}_m V)$ from

\[ (X, V) \sim \Bigl( X_{t_0,x_0}(s),\ f(X_{t_0,x_0}(T))\, \frac{Y_{t_0,x_0}(T)}{Y_{t_0,x_0}(s)} + \frac{1}{Y_{t_0,x_0}(s)} \bigl( Z_{t_0,x_0}(T) - Z_{t_0,x_0}(s) \bigr) \Bigr). \]

Having this sample, one can obtain $\hat u(s,x)$ by the linear regression method (see section 2.4):

\[ \hat u(s,x) = \sum_{l=1}^{L} \hat\alpha_l \varphi_l(x). \tag{3.8} \]
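The following Python sketch illustrates how a single stored set of trajectories starting from $(t_0, x_0)$ yields $\hat u(t_k, x)$ at an intermediate time. It is a toy illustration under our own assumptions, not the authors' implementation: d = 1, dX = dw, μ = F = 0, c = g = 0 (so Y ≡ 1, Z ≡ 0 and the response in (3.3) reduces to $f(X_{t_0,x_0}(T))$), f(x) = x², and the basis {1, x, x²}. For this problem u(s, x) = x² + (T − s), so the fitted coefficients should be close to (T − s, 0, 1). The least-squares problem (2.22) is solved with `np.linalg.lstsq` rather than through the normal equations (2.24), purely for numerical convenience.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_steps, Mr = 1.0, 20, 20_000
h = T / n_steps

# One auxiliary set of trajectories from (t_0, x_0) = (0, 0); paths[m, k] = mX(t_k).
dw = np.sqrt(h) * rng.standard_normal((Mr, n_steps))
paths = np.concatenate([np.zeros((Mr, 1)), np.cumsum(dw, axis=1)], axis=1)

def u_hat_coeffs(k):
    """Hypothetical helper: regress V = f(X(T)) on X(t_k), cf. (3.3) and (3.8)."""
    Xk = paths[:, k]                      # mX_{t0,x0}(t_k)
    V = paths[:, -1] ** 2                 # f(mX_{t0,x0}(T)); Y = 1, Z = 0 here
    Phi = np.column_stack([np.ones_like(Xk), Xk, Xk**2])
    alpha, *_ = np.linalg.lstsq(Phi, V, rcond=None)
    return alpha

k = 10                                    # t_k = 0.5
alpha = u_hat_coeffs(k)
u_hat = lambda x: alpha[0] + alpha[1] * x + alpha[2] * x**2
du_hat = lambda x: alpha[1] + 2.0 * alpha[2] * x   # derivative of the fitted u_hat
print(alpha, u_hat(0.3), du_hat(0.3))
```

The same array `paths` serves every $t_k$, so no new trajectories are needed for other times. With nonzero c, μ, g, or F one would also store the Y and Z components and form the response from the ratios in (3.6) and (3.7).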
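From (3.8) it is straightforward to obtain a very simple estimate $\hat\partial^i(s,x)$ for $\partial^i(s,x) = \partial u/\partial x^i(s,x)$:

\[ \hat\partial^i(s,x) = \frac{\partial \hat u(s,x)}{\partial x^i} = \sum_{l=1}^{L} \hat\alpha_l \frac{\partial \varphi_l(x)}{\partial x^i}. \tag{3.9} \]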
Then from (2.14) we find some $\hat\mu(s,x)$, $\hat F(s,x)$ for any $t_0 < s < T$ (in reality for any $t_k$) and construct the variate $\hat\Gamma(t_0, x_0)$ (see (2.5) and (2.7)) for $u(t_0, x_0)$ due to the system (2.10) with $\mu = \hat\mu$ and $F = \hat F$. We repeat that the variate $\hat\Gamma(t_0, x_0)$ is unbiased for any $\hat\mu$ and $\hat F$. We note that it is sufficient to have rather rough (in comparison with the required accuracy in evaluating $u(t_0, x_0)$) approximations $\hat\mu(s,x)$ and $\hat F(s,x)$ of some optimal μ and F from (2.13). Therefore, it is natural to use a coarser discretization and fewer MC runs in the regression part of evaluating $\hat u(s,x)$ due to (3.8), i.e., to take $M_r$ in (2.22) smaller than M and to construct samples ${}_m X$ in (2.25) with a comparatively rough discretization. Then in computing $u(t_0, x_0)$ with a finer discretization, the necessary values of $\hat\mu$ and $\hat F$ at the intermediate points can be obtained after, e.g., linear interpolation of $\hat u$ with respect to time.

The success of any regression-based approach clearly depends on the choice of basis functions. This is known to be a rather complicated problem, both in practice and theory. In fact, it is necessary to use a special basis tailored to each particular problem. Fortunately, the variance can easily be evaluated during simulation. Therefore, it is not very expensive from the computational point of view to check the quality of a given basis if we take coarse discretizations both in the regression part and in the main part of evaluating $u(t_0, x_0)$ and if we take not too large numbers $M_r$ and M of MC runs. This can help in choosing a proper basis.

Remark 3.1. Clearly, $\hat\alpha_l$ depend on s (on $t_k$). Let us note that the number L and the set $\{\varphi_l(x)\}_{l=1}^{L}$ may depend on $t_k$ as well.

Remark 3.2. It is obvious that in practice we use (2.10) with different μ and F in the implementation of the regression and in computing the required quantity $u(t_0, x_0)$. Indeed, in the regression part of the procedure we can take arbitrary μ and F (e.g., both zero), while in computing $u(t_0, x_0)$ we choose μ and F according to (2.14) with $\hat u$ obtained via the regression or according to (2.15) with $\hat u$ and $\widehat{\partial u/\partial x^i}$ obtained via the regression.

Remark 3.3. At $s = t_0$ the system (2.24) degenerates into the single equation (we suppose that not all of $\varphi_l(x_0)$ are equal to zero)

\[ \varphi_1(x_0)\alpha_1 + \cdots + \varphi_L(x_0)\alpha_L = \frac{1}{M_r} \sum_{m=1}^{M_r} \bigl[ f({}_m\bar X_{t_0,x_0}(T))\, {}_m\bar Y_{t_0,x_0}(T) + {}_m\bar Z_{t_0,x_0}(T) \bigr]. \tag{3.10} \]
Therefore, the coefficients α1 (t0 ), . . . , αL (t0 ) cannot be found from (3.10) uniquely. At the same time, the linear combination α1 (t0 )ϕ1 (x0 ) + · · · + αL (t0 )ϕL (x0 ), i.e., the estimate u ˆ(t0 , x0 ) =
Mr 1 ¯ t0 ,x0 (T )) [f (m X Mr m=1
¯
m Yt0 ,x0 (T )
+ m Z¯t0 ,x0 (T )],
is defined uniquely. Clearly, when tk is close to t0 (for instance, at t1 ), the system (2.24), though not degenerate, is ill-conditioned. Nevertheless, for such tk and for x
close to $x_0$, the estimate $\hat u(t_k,x)=\alpha_1(t_k)\varphi_1(x)+\cdots+\alpha_L(t_k)\varphi_L(x)$ can be found with sufficient accuracy. However, since it is not possible to satisfactorily determine the coefficients $\alpha_1(t_k),\ldots,\alpha_L(t_k)$, we cannot get the derivatives $\partial\hat u(t_k,x)/\partial x^i$ by direct differentiation as $\alpha_1(t_k)\,\partial\varphi_1(x)/\partial x^i+\cdots+\alpha_L(t_k)\,\partial\varphi_L(x)/\partial x^i$. In addition, let us emphasize that such difficulties are not essential for the whole procedure of variance reduction because the variance is equal to the integral (2.12), and unsatisfactory knowledge of $u$ and $\partial u/\partial x^i$ on short parts of the interval $[t_0,T]$ does not significantly affect the value of the integral.
3.3. Evaluating $\partial u/\partial x^i(s,x)$. The problem of evaluating $\partial u/\partial x^i(s,x)$ is of independent importance due to its connection with numerical computation of Greeks in finance. Many articles are devoted to pathwise methods of estimating Greeks (see [7] and the references therein; see also [13]). In [17] the finite-difference-based method is developed, and [5, 4] suggest using Malliavin calculus for computing Greeks. Several pathwise and finite-difference-based methods for calculating sensitivities of Bermudan options using regression methods and MC simulations are considered in [1] (see also the references therein). In this section we propose a conditional version of the pathwise method, and in section 3.4 we present a conditional version of the approach based on the Malliavin integration by parts for evaluating $\partial u/\partial x^i(s,x)$.
As mentioned previously, differentiating the equality (3.8) gives an estimate for $\partial^i(s,x)=\partial u/\partial x^i(s,x)$ (see (3.9)); however, in general, it is rather rough. A more accurate way is to use the linear regression method directly. In evaluating $\partial^i(s,x)$ by regression, the pair $(X,V^i)$ has the form (see (3.2))
\[
X=X_{t_0,x_0}(s),\qquad V^i=\sum_{j=1}^{d}\frac{\partial f(X_{s,X}(T))}{\partial x^j}\,\delta^i_{s,X}X^j(T)\,Y_{s,X}(T)+f(X_{s,X}(T))\,\delta^i_{s,X}Y(T)+\delta^i_{s,X}Z(T). \tag{3.11}
\]
We already have expressions for $X_{s,X}(T)$, $Y_{s,X}(T)$, $Z_{s,X}(T)$ via $X_{t_0,x_0}(t)$, $Y_{t_0,x_0}(t)$, $Z_{t_0,x_0}(t)$, with $t$ being equal to $s$ and $T$ (see the formulas (3.4), (3.6), (3.7)). Our nearest aim is to express $\delta^i_{s,X}X^j(T)$, $\delta^i_{s,X}Y(T)$, $\delta^i_{s,X}Z(T)$ via $X_{t_0,x_0}(t)$, $Y_{t_0,x_0}(t)$, $Z_{t_0,x_0}(t)$, $\delta^i_{t_0,x_0}X^j(t)$, $\delta^i_{t_0,x_0}Y(t)$, $\delta^i_{t_0,x_0}Z(t)$.
We begin with $\delta^i_{s,X}X^j(t)$. The column-vector $\delta^i_{s,X}X(t)$ is the solution of the linear homogeneous stochastic system (2.17) whose coefficients depend on $X_{s,X}(t)=X_{t_0,x_0}(t)$. Let the matrix
\[
\Phi_{s,X}(t):=\{\delta^i_{s,X}X^j(t)\}
\]
be the fundamental matrix of solutions of (2.17) normalized at time $s$, i.e., $\Phi_{s,X}(s)=I$, where $I$ is the identity matrix. Its element on the $j$th row and $i$th column is equal to $\delta^i_{s,X}X^j(t)$. Clearly,
\[
\Phi_{s,X}(t)=\Phi_{t_0,x_0}(t)\,\Phi^{-1}_{t_0,x_0}(s). \tag{3.12}
\]
Now let us turn to the column-vector $\delta_{s,X}Y(t)$, consisting of components $\delta^i_{s,X}Y(t)$. We have (see (2.18))
\[
\begin{aligned}
d\delta_{s,X}Y={}&Y_{s,X}(t)\,\Phi^{\top}_{s,X}(t)\,\nabla c(t,X_{s,X}(t))\,dt+c(t,X_{s,X}(t))\,\delta_{s,X}Y\,dt\\
&+Y_{s,X}(t)\,\Phi^{\top}_{s,X}(t)\,\nabla[\mu^{\top}(t,X_{s,X}(t))\,dw(t)]+\delta_{s,X}Y\,\mu^{\top}(t,X_{s,X}(t))\,dw(t),\qquad \delta_{s,X}Y(s)=0.
\end{aligned} \tag{3.13}
\]
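Due to the equality $X_{s,X}(t)=X_{t_0,x_0}(t)$ and (3.6) and (3.12), we get from (3.13)
\[
\begin{aligned}
d\delta_{s,X}Y={}&\frac{Y_{t_0,x_0}(t)}{Y_{t_0,x_0}(s)}\,[\Phi^{-1}_{t_0,x_0}(s)]^{\top}\Phi^{\top}_{t_0,x_0}(t)\,\nabla c(t,X_{t_0,x_0}(t))\,dt+c(t,X_{t_0,x_0}(t))\,\delta_{s,X}Y\,dt\\
&+\frac{Y_{t_0,x_0}(t)}{Y_{t_0,x_0}(s)}\,[\Phi^{-1}_{t_0,x_0}(s)]^{\top}\Phi^{\top}_{t_0,x_0}(t)\,\nabla[\mu^{\top}(t,X_{t_0,x_0}(t))\,dw(t)]\\
&+\delta_{s,X}Y\,\mu^{\top}(t,X_{t_0,x_0}(t))\,dw(t),\qquad \delta_{s,X}Y(s)=0.
\end{aligned} \tag{3.14}
\]
Taking into account the equality
\[
\begin{aligned}
d\delta_{t_0,x_0}Y(t)={}&Y_{t_0,x_0}(t)\,\Phi^{\top}_{t_0,x_0}(t)\,\nabla c(t,X_{t_0,x_0}(t))\,dt+c(t,X_{t_0,x_0}(t))\,\delta_{t_0,x_0}Y(t)\,dt\\
&+Y_{t_0,x_0}(t)\,\Phi^{\top}_{t_0,x_0}(t)\,\nabla[\mu^{\top}(t,X_{t_0,x_0}(t))\,dw(t)]+\delta_{t_0,x_0}Y(t)\,\mu^{\top}(t,X_{t_0,x_0}(t))\,dw(t),
\end{aligned}
\]
it is not difficult to verify that
\[
\delta_{s,X}Y(t)=\frac{1}{Y_{t_0,x_0}(s)}\,[\Phi^{-1}_{t_0,x_0}(s)]^{\top}\Big(\delta_{t_0,x_0}Y(t)-\frac{Y_{t_0,x_0}(t)}{Y_{t_0,x_0}(s)}\,\delta_{t_0,x_0}Y(s)\Big). \tag{3.15}
\]
In a similar way we obtain
\[
\begin{aligned}
\delta_{s,X}Z(t)={}&\frac{1}{Y_{t_0,x_0}(s)}\,[\Phi^{-1}_{t_0,x_0}(s)]^{\top}\big(\delta_{t_0,x_0}Z(t)-\delta_{t_0,x_0}Z(s)\big)\\
&-\frac{1}{Y^{2}_{t_0,x_0}(s)}\,[\Phi^{-1}_{t_0,x_0}(s)]^{\top}\delta_{t_0,x_0}Y(s)\,\big(Z_{t_0,x_0}(t)-Z_{t_0,x_0}(s)\big).
\end{aligned} \tag{3.16}
\]
Hence the column-vector $\partial(s,x)$ with the components $\partial^i(s,x)$ is equal to
\[
\partial(s,x)=E\Big[\frac{Y_{t_0,x_0}(T)}{Y_{t_0,x_0}(s)}\,[\Phi^{-1}_{t_0,x_0}(s)]^{\top}\Phi^{\top}_{t_0,x_0}(T)\,\nabla f(X_{t_0,x_0}(T))+f(X_{t_0,x_0}(T))\,\delta_{s,X}Y(T)+\delta_{s,X}Z(T)\,\Big|\,X_{t_0,x_0}(s)=x\Big], \tag{3.17}
\]
where $\delta_{s,X}Y(T)$ and $\delta_{s,X}Z(T)$ are from (3.15) and (3.16). Thus, storing ${}_mX_{t_0,x_0}(t)$, ${}_mY_{t_0,x_0}(t)$, ${}_mZ_{t_0,x_0}(t)$, ${}_m\Phi_{t_0,x_0}(t)$, ${}_m\delta_{t_0,x_0}Y(t)$, ${}_m\delta_{t_0,x_0}Z(t)$, $t_0\le t\le T$, we get the corresponding samples
\[
({}_mX,\,{}_mV^i)=\Big({}_mX_{t_0,x_0}(s),\ \Big[\frac{{}_mY_{t_0,x_0}(T)}{{}_mY_{t_0,x_0}(s)}\,[{}_m\Phi^{-1}_{t_0,x_0}(s)]^{\top}\,{}_m\Phi^{\top}_{t_0,x_0}(T)\,\nabla f({}_mX_{t_0,x_0}(T))+f({}_mX_{t_0,x_0}(T))\,{}_m\delta_{s,{}_mX}Y(T)+{}_m\delta_{s,{}_mX}Z(T)\Big]^{i}\Big), \tag{3.18}
\]
where ${}_m\Phi_{t_0,x_0}(s)$ is a realization of the fundamental matrix $\Phi_{t_0,x_0}(s)$ which corresponds to the same elementary event $\omega\in\Omega$ as the realization ${}_mX_{t_0,x_0}(t)$. We use $({}_mX,{}_mV^i)$ for evaluating $\partial^i(s,x)$, $i=1,\ldots,d$, by the linear regression method:
\[
\hat\partial^i(s,x)=\sum_{l=1}^{L}\hat\beta^i_l\,\psi_l(x). \tag{3.19}
\]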
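The regression step behind (3.19) can be illustrated by an ordinary least-squares fit. The following is a minimal sketch, not the authors' implementation: it assumes a scalar state (d = 1), uses NumPy's `lstsq` rather than forming the normal equations (2.24) explicitly, and the names `fit_coefficients`, `evaluate`, and the toy data are hypothetical.

```python
import numpy as np

def fit_coefficients(x, v, basis):
    """Least-squares coefficients beta minimizing sum_m (v_m - sum_l beta_l psi_l(x_m))^2."""
    design = np.column_stack([psi(x) for psi in basis])  # shape (M_r, L)
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return beta

def evaluate(beta, basis, x):
    """Evaluate the fitted linear combination sum_l beta_l psi_l(x)."""
    return sum(b * psi(x) for b, psi in zip(beta, basis))

# Toy data (an assumption of this sketch): targets lie exactly in the span of the basis.
basis = [lambda x: np.ones_like(x), lambda x: 2.0 * x]
rng = np.random.default_rng(1)
x_samples = rng.normal(size=500)
v_samples = 0.5 + 3.0 * (2.0 * x_samples)
beta = fit_coefficients(x_samples, v_samples, basis)
```

In the setting of the text, `x_samples` would hold the stored values of the state at time s and `v_samples` the corresponding values of the i-th component in (3.18).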
Remark 3.4. This paper is most closely connected with [6, 12, 13, 14] (see also [16]) and with the works [21, 20] by N. Newton. In [21, 20], both the method of control variates and the method of importance sampling for calculating solutions $u(t,x)$ of parabolic partial differential equations by the MC method are considered. In both cases, a perfect variate (i.e., one which is unbiased and has zero variance) is constructed based on the Funke–Shevlyakov–Haussmann formula (see the corresponding reference and details in [21]; such a formula is usually called the Clark–Ocone–Haussmann formula). Then some approximation methods of simulating the variates are proposed in [21, 20] to yield unbiased estimators for the desired solution $u(t,x)$ with reduced variances. If the dimension $d$ is large, the most labor-consuming calculations are connected with integration of the $d^2$-dimensional system of first-order variation equations. This is required to construct the estimators. In this paper, we use variates in the form (2.11), (2.10) with $\mu$ and $F$ satisfying (2.13). Due to Theorem 2.1, these variates are perfect if $u$ and $\partial u/\partial x^i$ are exact. We evaluate $u$ and $\partial u/\partial x^i$ based on conditional probabilistic representations and construct unbiased estimators for $u(t,x)$ using (2.15) or (2.14). We note that (2.14) allows us to avoid estimating $\partial u/\partial x^i$ (see (3.8)–(3.9)) and hence to avoid integration of the equations of first-order variation. In addition, the estimator obtained by (2.14) remains unbiased. In spite of the fact that our approach and that of N. Newton clearly differ, they undoubtedly have profound connections. For example, the Clark–Ocone–Haussmann formula, being the basis for Newton's approach, can fairly easily be derived using the conditional probabilistic representations (3.1), (3.2).
3.4. Evaluating $\partial u/\partial x^i(s,x)$ using the Malliavin integration by parts.
If $f(x)$ is an irregular function, one can use the procedure recommended in section 3.2, where we do not need direct calculations of derivatives $\partial u/\partial x^i$. Another way consists in approximating $f$ by a smooth function with the subsequent use of the procedure from section 3.3. Because we do not pursue a high accuracy in estimating $u$ and $\partial u/\partial x^i$, such approximation of $f$ can be quite satisfactory. For direct calculation of derivatives $\partial u/\partial x^i$ without smoothing $f$, we can use the conditional version of the integration-by-parts (Bismut–Elworthy–Li) formula. This formula is successfully applied for evaluating deltas in the case of an irregular $f$ (see, e.g., [5, 4, 22]). For calculating $\partial u/\partial x^i$ in the case of $u$ given by $u(s,x)=E\Gamma_{s,x}=E[f(X_{s,x}(T))Y_{s,x}(T)+Z_{s,x}(T)]$, where $X_{s,x}(T)$, $Y_{s,x}(T)$, $Z_{s,x}(T)$ satisfy system (2.10), the following variant of the integration-by-parts formula can be derived:
\[
\begin{aligned}
\partial^i(s,x)={}&\frac{1}{T-s}\,E\Big[\Gamma_{s,x}\int_s^T\Big(\sigma^{-1}\frac{\partial X_{s,x}(s')}{\partial x^i}\Big)^{\top}dw(s')\Big]\\
&-\frac{1}{T-s}\,E\Big[\Gamma_{s,x}\int_s^T\mu^{\top}\sigma^{-1}\frac{\partial X_{s,x}(s')}{\partial x^i}\,ds'\Big]+\frac{1}{T-s}\,E\int_s^T Z_{s,x}(s')\,\mu^{\top}\sigma^{-1}\frac{\partial X_{s,x}(s')}{\partial x^i}\,ds'\\
&+\frac{1}{T-s}\,E\Big[\Gamma_{s,x}\int_s^T\frac{1}{Y_{s,x}(s')}\frac{\partial Y_{s,x}(s')}{\partial x^i}\,ds'\Big]-\frac{1}{T-s}\,E\int_s^T\frac{Z_{s,x}(s')}{Y_{s,x}(s')}\frac{\partial Y_{s,x}(s')}{\partial x^i}\,ds'\\
&-\frac{1}{T-s}\,E\int_s^T Y_{s,x}(s')\,F^{\top}\sigma^{-1}\frac{\partial X_{s,x}(s')}{\partial x^i}\,ds'+\frac{1}{T-s}\,E\int_s^T\frac{\partial Z_{s,x}(s')}{\partial x^i}\,ds':=D^i(s,x),
\end{aligned} \tag{3.20}
\]
where $\mu^{\top}$, $\sigma^{-1}$, and $F^{\top}$ have $(s',X_{s,x}(s'))$ as their arguments. In particular, if $c=0$, $g=0$, $\mu=0$, $F=0$, we get the well-known integration-by-parts formula (see,
e.g., [22]):
\[
\partial^i(s,x)=\frac{1}{T-s}\,E\Big[f(X_{s,x}(T))\int_s^T\Big(\sigma^{-1}(s',X_{s,x}(s'))\,\frac{\partial X_{s,x}(s')}{\partial x^i}\Big)^{\top}dw(s')\Big]. \tag{3.21}
\]
As in section 3.1, together with the unconditional probabilistic representation (3.20) for $\partial^i(s,x)$, we have the following conditional one:
\[
\partial^i(s,x)=E\big(D^i(s,X)\,\big|\,X:=X_{t_0,x_0}(s)=x\big). \tag{3.22}
\]
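As an illustration of the unconditional formula (3.21), consider the simplest scalar case dX = σ dw with constant σ, for which ∂X/∂x = 1 and the stochastic integral reduces to (w(T) − w(s))/σ. The sketch below is only a toy check under these assumptions; the function name `delta_by_parts` and the test payoff f(y) = y², for which ∂u/∂x = 2x, are hypothetical choices, not part of the paper.

```python
import numpy as np

def delta_by_parts(f, x, sigma, T, n_paths, seed=0):
    """MC estimate of du/dx at (s, x) = (0, x) via (3.21) for dX = sigma dw."""
    rng = np.random.default_rng(seed)
    w_T = rng.normal(0.0, np.sqrt(T), size=n_paths)  # w(T) - w(0)
    X_T = x + sigma * w_T                            # exact solution at time T
    weight = w_T / (sigma * T)                       # (1/T) * int_0^T sigma^{-1} dw
    samples = f(X_T) * weight
    return samples.mean(), samples.std(ddof=1) / np.sqrt(n_paths)

# For f(y) = y^2 the exact derivative is 2x, so the estimate should be close to 2.
estimate, std_error = delta_by_parts(lambda y: y ** 2, x=1.0, sigma=1.0, T=1.0,
                                     n_paths=200_000)
```

No derivative of f is evaluated, which is the reason the formula remains usable for an irregular f.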
Again, the formula (3.20) is natural for the MC approach and (3.22) for a regression method. An implementation of the regression method is based upon the corresponding approximation $({}_mX,{}_mV^i)$ of the pair $(X,V^i)=(X_{t_0,x_0}(s),D^i(s,X_{t_0,x_0}(s)))$ following the ideas of section 3.3.
3.5. Two-run procedure. The straightforward implementation of evaluating $u(s,x)$ and $\partial u/\partial x^i(s,x)$ by regression as described in sections 3.2 and 3.3 requires storing
\[
{}_m\Lambda(t_k):=\big({}_mX_{t_0,x_0}(t_k),\ {}_mY_{t_0,x_0}(t_k),\ {}_mZ_{t_0,x_0}(t_k),\ {}_m\Phi_{t_0,x_0}(t_k),\ {}_m\delta_{t_0,x_0}Y(t_k),\ {}_m\delta_{t_0,x_0}Z(t_k)\big)
\]
(or, more precisely, their approximations ${}_m\bar\Lambda(t_k)$) at all $t_k$, $k=1,\ldots,N$, in the main computer memory (RAM) until the end of the simulation. This puts a requirement on the RAM size that is too demanding and limits the practicality of the proposed approach since in almost any practical problem a relatively large number of time steps is needed. However, this difficulty can be overcome and we can avoid storing ${}_m\bar\Lambda(t_k)$ at all $t_k$ by implementing the two-run procedure described below. First, we recall that, as a rule, pseudorandom number generators used for MC simulations have the property that the sequence of random numbers obtained by them is easily reproducible (see, e.g., [16] and the references therein). Let us fix a sequence of pseudorandom numbers. The two-run procedure can schematically be presented as follows.
First run:
• simulate $M_r$ independent trajectories ${}_m\bar\Lambda(t_k)$, $k=1,\ldots,N$, with an arbitrary choice of $\mu$ and $F$ (e.g., $\mu=0$ and $F=0$);
• compute and store the values ${}_m\bar\Gamma$ to form the component $V$ needed for the regression in the second run and compute and store the values
\[
{}_m\bar Y(T)\,{}_m\bar\Phi^{\top}_{t_0,x_0}(T)\,\nabla f({}_m\bar X(T))+f({}_m\bar X(T))\,{}_m\delta Y(T)+{}_m\delta Z(T)
\]
and ${}_m\bar Y(T)$ to form the components $V^i$ in the second run.
Second run:
• reinitialize the random number generator so that it produces the same sequence as for the first run;
• for $k=1,\ldots,N$
– simulate the same ${}_m\bar\Lambda(t_k)$, $m=1,\ldots,M_r$, as in the first run (i.e., they correspond to the same sequence of pseudorandom numbers as in the first run), keeping only the current ${}_m\bar\Lambda(t_k)$ in RAM;
– use the values stored in RAM during the first run and ${}_m\bar\Lambda(t_k)$ from this run to find $\bar u(t_k,x)$ and $\overline{\partial u/\partial x^i}(t_k,x)$ by regression (${}_m\bar\Lambda(t_k)$ and ${}_m\bar\Lambda(T)$ form the pairs $({}_mX,{}_mV)$ and $({}_mX,{}_mV^i)$ needed for the regression);
– use the found $\bar u(t_k,x)$ and $\overline{\partial u/\partial x^i}(t_k,x)$ to obtain $\bar\mu(t_k,x)$ and $\bar F(t_k,x)$ required for variance reduction (see section 2.2);
– simulate (2.10) with $\mu=\bar\mu$ and $F=\bar F$ on this step and thus obtain $M$ independent triples
\[
\big({}_m\tilde X_{t_0,x_0}(t_k),\ {}_m\tilde Y_{t_0,x_0}(t_k),\ {}_m\tilde Z_{t_0,x_0}(t_k)\big)=\big({}_m\tilde X_{t_{k-1},\,{}_m\tilde X(t_{k-1})}(t_k),\ {}_m\tilde Y_{t_{k-1},\,{}_m\tilde X(t_{k-1}),\,{}_m\tilde Y(t_{k-1})}(t_k),\ {}_m\tilde Z_{t_{k-1},\,{}_m\tilde X(t_{k-1}),\,{}_m\tilde Y(t_{k-1}),\,{}_m\tilde Z(t_{k-1})}(t_k)\big),
\]
which we keep in RAM until the next step;
• use the obtained $({}_m\tilde X_{t_0,x_0}(T),\ {}_m\tilde Y_{t_0,x_0}(T),\ {}_m\tilde Z_{t_0,x_0}(T))$ to get the required $u(t_0,x_0)$ (see (2.6)).
We emphasize that in the two-run procedure at each time moment $s=t_k$ we need to keep in memory only the precomputed values stored at the end of the first run and the values ${}_m\bar\Lambda(t_k)$ and $({}_m\tilde X_{t_0,x_0}(t_k),\ {}_m\tilde Y_{t_0,x_0}(t_k),\ {}_m\tilde Z_{t_0,x_0}(t_k))$ (only at the current time step $k$), which is well within RAM limits of a PC. We note that the two-run realization of the procedure from section 3.2 based on using regression for estimating $u$ only is less computationally demanding (both on processor time and RAM and especially for problems of large dimension $d$) than the procedures of sections 3.3 and 3.4 which estimate the derivatives of $u$ via regression. The two-run procedure was used in the numerical experiments of sections 4.2 and 4.3.
4. Examples. The first example is partly illustrative and partly theoretical. The second and third examples are numerical.
4.1. Heat equation. Consider the Cauchy problem (4.1)
\[
\frac{\partial u}{\partial t}+\frac{\sigma^2}{2}\frac{\partial^2u}{\partial x^2}=0,\quad t_0\le t<T,\ x\in\mathbb{R},\qquad u(T,x)=x^2.
\]
Its solution is
\[
u(t,x)=\sigma^2(T-t)+x^2. \tag{4.2}
\]
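As a quick check, added here for the reader and not part of the original argument, the stated solution can be substituted into the heat equation and the terminal condition directly:

```latex
\[
\frac{\partial u}{\partial t}=-\sigma^2,\qquad
\frac{\partial^2 u}{\partial x^2}=2
\quad\Longrightarrow\quad
\frac{\partial u}{\partial t}+\frac{\sigma^2}{2}\frac{\partial^2 u}{\partial x^2}
=-\sigma^2+\sigma^2=0,
\qquad u(T,x)=\sigma^2(T-T)+x^2=x^2 .
\]
```

In particular $u(0,0)=\sigma^2T$, which equals 10 for the parameters $\sigma=1$, $T=10$ used in the experiments of this section.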
The probabilistic representation (2.10), (2.11) with $\mu=0$ takes the form
\[
u(s,x)=E\big[X^2_{s,x}(T)+Z_{s,x}(T)\big]=E\Gamma_{s,x}, \tag{4.3}
\]
\[
dX=\sigma\,dw(t),\quad X(s)=x, \tag{4.4}
\]
\[
dZ=F(t,X)\,dw(t),\quad Z(s)=0. \tag{4.5}
\]
Due to Theorem 2.1, we have $\operatorname{var}\Gamma_{s,x}=\operatorname{var}\big[X^2_{s,x}(T)+Z_{s,x}(T)\big]=0$ for the optimal choice of the function $F(t,x)=-\sigma\,\partial u/\partial x=-2\sigma x$. We note that in this example $\partial u/\partial x$ and the optimal $F$ do not depend on time $t$. For the purpose of this illustrative example, we evaluate $u(0,0)=E\Gamma_{0,0}$. Let us simulate (4.4) exactly (i.e., we have no error of numerical integration):
\[
X_0=x,\quad X_{k+1}=X_k+\sigma\Delta_kw,\quad k=0,\ldots,N-1,\quad \Delta_kw:=w(t_{k+1})-w(t_k). \tag{4.6}
\]
For $F\equiv0$, we have $u(0,0)=E\Gamma_{0,0}\approx\hat u(0,0)=\frac{1}{M}\sum_{m=1}^{M}{}_mX_N^2$, where ${}_mX_N$ are independent realizations of $X_N$ obtained by (4.6). Further, $\operatorname{var}\Gamma_{0,0}=2\sigma^4T^2$, and
hence the MC error is equal to (see (2.9))
\[
\rho=c\,\frac{\sqrt{2}\,\sigma^2T}{\sqrt{M}}. \tag{4.7}
\]
For instance, to achieve the accuracy $\rho=0.0001$ for $c=3$ (recall that there is no error of numerical integration here) in the case of $\sigma=1$ and $T=10$, one needs to perform $M=18\times10^{10}$ MC runs. To reduce the MC error, we estimate $\partial u/\partial x$ by regression to get $\hat F(t_k,x)$ close to the optimal $F=-2\sigma x$. As the basis functions for the regression, we take the first two Hermite polynomials:
\[
\psi_1(x)=1,\quad \psi_2(x)=2x. \tag{4.8}
\]
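The run count quoted after (4.7) follows from inverting that formula for M. A small sketch of the arithmetic (the helper name `runs_without_reduction` is hypothetical):

```python
import math

def runs_without_reduction(rho, c, sigma, T):
    """Number of MC runs M solving rho = c * sqrt(2) * sigma^2 * T / sqrt(M), cf. (4.7)."""
    return (c * math.sqrt(2.0) * sigma ** 2 * T / rho) ** 2

# Parameters from the text: rho = 0.0001, c = 3, sigma = 1, T = 10.
M_needed = runs_without_reduction(rho=1e-4, c=3.0, sigma=1.0, T=10.0)
```

With these parameters the result is 1.8 × 10^11, i.e., the 18 × 10^10 runs stated in the text.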
We note that in this example the required derivative $\partial u/\partial x$ can be expanded in the basis (4.8); i.e., here we do not have any error due to the cut-off of a set of basis functions. In the construction of the estimate for $\partial u/\partial x$, we put $F=0$ in (4.5). The variational equation associated with (4.4) has the form (see (2.17)) $d\delta X=0$, $\delta X(s)=1$, and hence $\delta X(t)=1$, $t\ge s$. Thus, the sample from (3.18) takes the form $({}_mX,{}_mV)=({}_mX_{t_0,x_0}(s),\,2\,{}_mX_{t_0,x_0}(T))$ and the estimator $\hat\partial(t_k,x)$ for $\partial u/\partial x(t_k,x)$ is constructed as
\[
\hat\partial(t_k,x)=\hat\alpha_1(t_k)+2\hat\alpha_2(t_k)x,\quad k=1,\ldots,N, \tag{4.9}
\]
where $\hat\alpha_1(t_k)$ and $\hat\alpha_2(t_k)$ satisfy the system of linear algebraic equations (see (2.24)–(2.25))
\[
a_{11}\alpha_1+a_{12}\alpha_2=b_1,\qquad a_{21}\alpha_1+a_{22}\alpha_2=b_2, \tag{4.10}
\]
\[
\begin{aligned}
&a_{11}=1,\qquad a_{12}=a_{21}:=a_{12}(t_k)=\frac{1}{M_r}\sum_{m=1}^{M_r}2\,{}_mX(t_k),\qquad a_{22}:=a_{22}(t_k)=\frac{1}{M_r}\sum_{m=1}^{M_r}4\,({}_mX(t_k))^2,\\
&b_1:=b_1(t_k)=\frac{1}{M_r}\sum_{m=1}^{M_r}2\,{}_mX(T),\qquad b_2:=b_2(t_k)=\frac{1}{M_r}\sum_{m=1}^{M_r}4\,{}_mX(t_k)\,{}_mX(T).
\end{aligned} \tag{4.11}
\]
Here ${}_mX(t_k)$, $m=1,\ldots,M_r$, $k=1,\ldots,N$, are independent realizations of $X(t_k)$ obtained by (4.6). Hence
\[
\hat\alpha_1(t_k)=\frac{b_1a_{22}-b_2a_{12}}{a_{22}-(a_{12})^2},\qquad \hat\alpha_2(t_k)=\frac{b_2-b_1a_{12}}{a_{22}-(a_{12})^2}. \tag{4.12}
\]
We define
\[
\hat F(0,x)=-\frac{\sigma}{M_r}\sum_{m=1}^{M_r}2\,{}_mX(T),\qquad \hat F(t,x)=-\sigma\big(\hat\alpha_1(t_k)+2\hat\alpha_2(t_k)x\big)\ \text{ for } t\in(t_{k-1},t_k],\ k=1,\ldots,N. \tag{4.13}
\]
We simulate (4.5) with $F=\hat F(t,x)$ exactly (i.e., again we have no error of numerical integration):
\[
Z_0=0,\qquad Z_{k+1}=Z_k-\sigma\hat\alpha_1(t_{k+1})\Delta_kw-2\sigma^2\hat\alpha_2(t_{k+1})\,w(t_k)\Delta_kw-\sigma^2\hat\alpha_2(t_{k+1})\big[(\Delta_kw)^2-h\big]. \tag{4.14}
\]
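For the reader's convenience, here is a short derivation of the three terms in the recursion for $Z_{k+1}$. It is a sketch under the assumptions already in force: the path starts at $x=0$, so that $X(t)=\sigma w(t)$, and $\hat F(t,x)=-\sigma(\hat\alpha_1(t_{k+1})+2\hat\alpha_2(t_{k+1})x)$ on $(t_k,t_{k+1}]$.

```latex
\[
\int_{t_k}^{t_{k+1}}\hat F(t,X(t))\,dw(t)
=-\sigma\hat\alpha_1(t_{k+1})\,\Delta_k w
-2\sigma^2\hat\alpha_2(t_{k+1})\int_{t_k}^{t_{k+1}}w(t)\,dw(t),
\]
\[
\int_{t_k}^{t_{k+1}}w(t)\,dw(t)
=w(t_k)\,\Delta_k w+\tfrac12\big[(\Delta_k w)^2-h\big],
\qquad h=t_{k+1}-t_k .
\]
```

Substituting the second identity (the Ito formula applied to $w^2$) into the first gives exactly the increment $Z_{k+1}-Z_k$, which is why no discretization error arises in $Z$.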
The increments $\Delta_kw$ are the same both in (4.6) and in (4.14) and are independent of the ones used to estimate $\hat\alpha_1$ and $\hat\alpha_2$. We simulate
\[
u(0,0)=E\Gamma_{0,0}=E\big[X_N^2+Z_N\big]\approx\hat u(0,0)=\frac{1}{M}\sum_{m=1}^{M}\big[{}_mX_N^2+{}_mZ_N\big], \tag{4.15}
\]
where ${}_mX_N$ and ${}_mZ_N$ are independent realizations of $X_N$ and $Z_N$ obtained according to (4.6) and (4.14). We note that the approximation (4.15) does not have the numerical integration error or the error due to the cut-off of the basis; it has the MC error only. Using Theorem 2.1, one can evaluate $\operatorname{var}\Gamma_{0,0}$ in the case of $F=\hat F$ defined in (4.13) and obtain $\operatorname{var}\Gamma_{0,0}\approx4\sigma^4T^2/M_r$. Then the MC error $\rho$ in this case is equal to (compare with (4.7))
\[
\rho\approx c\,\frac{2\sigma^2T}{\sqrt{M\,M_r}}. \tag{4.16}
\]
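The whole construction (4.6), (4.10)–(4.15) fits in a few lines of NumPy. The following is a minimal sketch rather than the authors' code: the function name `heat_example`, the sample sizes, and the seed are assumptions, the start point is x = 0, and the regression paths are drawn independently of the main paths as in the analysis above.

```python
import numpy as np

def heat_example(sigma=1.0, T=10.0, N=100, M=2000, M_r=2000, seed=0):
    """Estimate u(0,0) = sigma^2 * T with F = 0 and with F-hat from (4.13)."""
    rng = np.random.default_rng(seed)
    h = T / N

    # Regression runs: exact paths X(t_k) = sigma * w(t_k), k = 1..N, started at 0.
    X_reg = sigma * np.cumsum(rng.normal(0.0, np.sqrt(h), size=(M_r, N)), axis=1)
    X_T = X_reg[:, -1]

    # Coefficients (4.11) and the closed-form solution (4.12), one pair per t_k.
    a12 = 2.0 * X_reg.mean(axis=0)
    a22 = 4.0 * (X_reg ** 2).mean(axis=0)
    b1 = 2.0 * X_T.mean()
    b2 = 4.0 * (X_reg * X_T[:, None]).mean(axis=0)
    det = a22 - a12 ** 2
    alpha1 = (b1 * a22 - b2 * a12) / det
    alpha2 = (b2 - b1 * a12) / det

    # Main runs, independent of the regression runs.
    dw = rng.normal(0.0, np.sqrt(h), size=(M, N))
    w = np.cumsum(dw, axis=1)
    w_left = np.hstack([np.zeros((M, 1)), w[:, :-1]])  # w(t_k) at the start of each step
    X_N = sigma * w[:, -1]

    # Recursion (4.14); column k uses alpha(t_{k+1}), i.e., F-hat on (t_k, t_{k+1}].
    dZ = (-sigma * alpha1 * dw
          - 2.0 * sigma ** 2 * alpha2 * w_left * dw
          - sigma ** 2 * alpha2 * (dw ** 2 - h))
    Z_N = dZ.sum(axis=1)

    plain = X_N ** 2            # estimator with F = 0
    reduced = X_N ** 2 + Z_N    # estimator (4.15) with F = F-hat

    def std_error(samples):
        return samples.std(ddof=1) / np.sqrt(len(samples))

    return plain.mean(), std_error(plain), reduced.mean(), std_error(reduced)

mean_plain, se_plain, mean_vr, se_vr = heat_example()
```

With M = M_r = 2000 the predicted standard deviation of a single reduced sample is about $2\sigma^2T/\sqrt{M_r}\approx0.45$, against $\sqrt2\,\sigma^2T\approx14$ without reduction, so `se_vr` should come out far smaller than `se_plain`.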
This example illustrates that in the absence of the error due to the cut-off of a set of basis functions used in regression and of the numerical integration error, the MC error is reduced by a factor of order $1/\sqrt{M_r}$ by the proposed variance reduction technique. This is, of course, a significant improvement. Indeed, let us return to the example discussed after (4.7). The estimate (4.16) implies that to achieve the accuracy $\rho=0.0001$ for $c=3$ in the case of $\sigma=1$ and $T=10$, one can take, e.g., $M=M_r=6\times10^5$; i.e., one can run about $10^5$ times fewer trajectories than when the variance reduction was not used (see the discussion after (4.7)). The gain of computational efficiency is significant in spite of the fact that there is an overhead cost of solving the linear system (4.10) in the "regression's runs."
Remark 4.1. In the above analysis we assumed that "regression's runs" and the MC runs for computing the desired value $u(0,0)$ are independent. In practice, this assumption can be dropped, and we can use the same paths $X(t)$ for both the "regression's runs" and the MC runs. Then, as a rule, we choose $M_r\le M$.
Remark 4.2. We expect (see also experiments in section 4.2) that in the general case the MC error after application of this variance reduction technique has the form
\[
\rho=O\Big(\frac{1}{\sqrt{M\,M_r}}+\frac{h^{p/2}}{\sqrt{M}}+\frac{\mathrm{err}_B}{\sqrt{M}}\Big), \tag{4.17}
\]
where the first term has the same nature as in this illustrative example (see (4.16)); the second term is due to the error of numerical integration (it is assumed that a method of weak order $p$ is used); and the third one arises as a result of the use of a finite set of functions as the basis in the regression, while the solution $u(t,x)$ is usually expandable in a basis consisting of an infinite number of functions (i.e., this
error is due to the cut-off of the basis). We note that finding an appropriate basis for regression in applying this variance reduction approach to a particular problem can be a difficult task and requires some knowledge of the solution $u(t,x)$ of the considered problem. Roughly speaking, in the proposed implementation of the variance reduction methods (the method of importance sampling, the method of control variates, or the combining method) we replace the task of finding an approximate solution to the problem of interest with the task of finding an appropriate basis for the regression.
For complicated systems of SDEs, it is preferable to use regression to approximate the solution $u(t,x)$ and then differentiate this approximation to approximate the derivatives $\partial u/\partial x^i$. In the case of this illustrative example we take the first three Hermite polynomials,
\[
\psi_1(x)=1,\quad \psi_2(x)=2x,\quad \psi_3(x)=4x^2-2, \tag{4.18}
\]
as the basis functions for the regression. In this example the required function $u(t,x)$ can be expanded in the basis (4.18). We construct the estimator $\hat u(t_k,x)$ for $u(t_k,x)$:
\[
\hat u(t_k,x)=\hat\alpha_1(t_k)+2\hat\alpha_2(t_k)x+\hat\alpha_3(t_k)\,(4x^2-2),\quad k=1,\ldots,N, \tag{4.19}
\]
where $\hat\alpha_1(t_k)$, $\hat\alpha_2(t_k)$, $\hat\alpha_3(t_k)$ satisfy the system of linear algebraic equations (2.24) with the corresponding coefficients. Further, we approximate the derivative $\partial u/\partial x(t_k,x)$,
\[
\frac{\partial u}{\partial x}(t_k,x)\approx2\hat\alpha_2(t_k)+8\hat\alpha_3(t_k)x, \tag{4.20}
\]
with $\hat\alpha_2(t_k)$ and $\hat\alpha_3(t_k)$ from (4.19), and we define
\[
\hat F(t,x):=-\sigma\big(2\hat\alpha_2(t_k)+8\hat\alpha_3(t_k)x\big)\ \text{ for } t\in[t_{k-1},t_k),\ k=1,\ldots,N, \tag{4.21}
\]
which we use for variance reduction by putting $F=\hat F$ in (4.5). In the experiments we simulate (4.5) with $F=\hat F(t,x)$ exactly (see (4.14)). The new estimator for $u(0,0)$ has the form (4.15) again but with the new $Z_N$ corresponding to the choice of $\hat F(t,x)$ from (4.21).
Table 1. Heat equation. Simulation of $u(0,0)$ for $\sigma=1$ and $T=10$ by (4.15) with the corresponding choice of the function $F$ and for various $M$. The time step $h=0.1$ and $M_r=M$. The exact value is $u(0,0)=10$. The value after "±" equals two standard deviations of the corresponding estimator and gives the confidence interval for the corresponding value with probability 0.95 (i.e., $c=2$).
M        F = 0            F = F̂ from (4.13)     F = F̂ from (4.21)
10^3     9.67 ± 0.85      9.993 ± 0.045          9.999 ± 0.101
10^4     9.92 ± 0.28      9.9970 ± 0.0058        9.999 ± 0.012
10^5     9.970 ± 0.089    10.0000 ± 0.0003       10.0014 ± 0.0014
Table 1 gives some results of simulating $u(0,0)$ by (4.15) with $F=0$, $F=\hat F$ from (4.13), and $F=\hat F$ from (4.21). We see that for $F=0$ the MC error is consistent with (4.7); i.e., it decreases $\sim1/\sqrt{M}$. When the variance reduction is used, the results in Table 1 confirm the MC error estimate (4.16). It is quite obvious that $\hat F$ from (4.13) is a more accurate estimator for the exact $F=-2\sigma x$ than $\hat F$ from (4.21), and then the MC error in the first case should usually be less than in the second case, which is observed in the experiments as well. We also did similar experiments in the case of the terminal condition $u(T,x)=x^4$ in (4.1). To estimate $\partial u/\partial x$ by regression, we took the basis consisting of the first four Hermite polynomials. The results were analogous to those given above for the case $x^2$.
4.2. Ergodic limit for one-dimensional array of stochastic oscillators. Consider the one-dimensional array of oscillators [23, 19]:
\[
\begin{aligned}
dP^i&=-V'(Q^i)\,dt-\lambda\,(2Q^i-Q^{i+1}-Q^{i-1})\,dt-\nu P^i\,dt+\sigma\,dw_i(t),\quad P^i(0)=p^i,\\
dQ^i&=P^i\,dt,\quad Q^i(0)=q^i,\quad i=1,\ldots,n,
\end{aligned} \tag{4.22}
\]
where periodic boundary conditions are assumed, i.e., $Q^0:=Q^n$ and $Q^{n+1}:=Q^1$; $w_i(t)$, $i=1,\ldots,n$, are independent standard Wiener processes; $\nu>0$ is a dissipation parameter; $\lambda\ge0$ is a coupling constant; $\sigma$ is the noise intensity; and $V(z)$, $z\in\mathbb{R}$, is a potential. The SDEs (4.22) are ergodic with the Gibbs invariant measure $\mu$. We are interested in computing the average of the potential energy with respect to the invariant measure associated with (4.22):
\[
E_\mu U(Q)=E_\mu\sum_{i=1}^{n}\Big[V(Q^i)+\frac{\lambda}{2}\,(Q^i-Q^{i+1})^2\Big].
\]
To this end (see further details in [19]), we simulate the system (4.22) on a long time interval and approximate the ergodic limit $E_\mu U(Q)$ by $EU(Q(T))$ for a large $T$. To illustrate variance reduction via regression, we simulate
\[
u(0,p,q)=EU(Q_{p,q}(T))=E\big[U(Q_{p,q}(T))+Z_{p,q}(T)\big], \tag{4.23}
\]
where $Z(t)$, $0\le t\le T$, satisfies
\[
dZ=F^{\top}(t,P,Q)\,dw(t),\quad Z(0)=0. \tag{4.24}
\]
We choose the $n$-dimensional vector function $F(t,p,q)$ to be equal to (see (2.14))
\[
F^i(t,p,q)=-\sigma\,\frac{\partial\hat u}{\partial p^i},\quad i=1,\ldots,n, \tag{4.25}
\]
where $\hat u=\hat u(t,p,q)$ is an approximation of the function $u(t,p,q):=EU(Q_{t,p,q}(T))$. We simulate (4.22) using the second-order weak quasi-symplectic integrator from [15, 16]:
\[
\begin{aligned}
&P_0=p,\quad Q_0=q,\\
&P^i_{1,k}=e^{-\nu h/2}P^i_k,\qquad Q^i_{1,k}=Q^i_k+\frac{h}{2}\,P^i_{1,k},\\
&P^i_{2,k}=P^i_{1,k}+h\big[-V'(Q^i_{1,k})-\lambda\,(2Q^i_{1,k}-Q^{i+1}_{1,k}-Q^{i-1}_{1,k})\big]+h^{1/2}\sigma\,\xi_{ik},\\
&P^i_{k+1}=e^{-\nu h/2}P^i_{2,k},\qquad Q^i_{k+1}=Q^i_{1,k}+\frac{h}{2}\,P^i_{2,k},\quad i=1,\ldots,n,\quad k=0,\ldots,N-1,
\end{aligned} \tag{4.26}
\]
where $\xi_{ik}$ are independent and identically distributed random variables with the law
\[
P(\xi=0)=2/3,\qquad P(\xi=\pm\sqrt{3})=1/6. \tag{4.27}
\]
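One step of the integrator above, together with the three-point law for ξ, can be sketched in NumPy as follows. This is an illustrative sketch without variance reduction (F = 0): the function names, the initial data p = q = 0, the number of paths, and the seed are assumptions, and only the harmonic potential V(z) = z²/2 is coded.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)

def step(P, Q, h, nu, lam, sigma, grad_V, rng):
    """One step of the quasi-symplectic scheme for arrays P, Q of shape (M, n)."""
    # Three-point random variables: P(xi = 0) = 2/3, P(xi = +-sqrt(3)) = 1/6.
    xi = rng.choice([-SQRT3, 0.0, SQRT3], size=P.shape, p=[1 / 6, 2 / 3, 1 / 6])
    damp = np.exp(-0.5 * nu * h)
    P1 = damp * P
    Q1 = Q + 0.5 * h * P1
    # Periodic neighbours: np.roll(Q1, -1) holds Q^{i+1}, np.roll(Q1, 1) holds Q^{i-1}.
    coupling = 2.0 * Q1 - np.roll(Q1, -1, axis=1) - np.roll(Q1, 1, axis=1)
    P2 = P1 + h * (-grad_V(Q1) - lam * coupling) + np.sqrt(h) * sigma * xi
    return damp * P2, Q1 + 0.5 * h * P2

def potential_energy(Q, lam):
    """U(Q) for the harmonic potential V(z) = z^2 / 2 with periodic coupling."""
    return np.sum(0.5 * Q ** 2 + 0.5 * lam * (Q - np.roll(Q, -1, axis=1)) ** 2, axis=1)

def ergodic_average(n=3, lam=1.0, nu=1.0, sigma=1.0, T=10.0, h=0.02, M=4000, seed=0):
    rng = np.random.default_rng(seed)
    P = np.zeros((M, n))  # initial data p = q = 0 is an assumption of this sketch
    Q = np.zeros((M, n))
    for _ in range(int(round(T / h))):
        P, Q = step(P, Q, h, nu, lam, sigma, lambda q: q, rng)
    return potential_energy(Q, lam).mean()

estimate = ergodic_average()
```

For the harmonic potential the Gibbs average is $n\sigma^2/(4\nu)=0.75$ by equipartition, so for the default parameters the estimate should land near 0.75, up to an MC error of roughly 0.01.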
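And we approximate (4.24) by the standard second-order weak method (see [16, p. 103]):
\[
\begin{aligned}
Z_0&=0,\\
Z_{k+1}&=Z_k+h^{1/2}\sum_{i=1}^{n}F^i(t_k,P_k,Q_k)\,\xi_{ik}+\sigma h\sum_{r=1}^{n}\sum_{i=1}^{n}\frac{\partial F^r}{\partial p^i}(t_k,P_k,Q_k)\,\xi_{irk}+\frac12h^{3/2}\sum_{i=1}^{n}LF^i(t_k,P_k,Q_k)\,\xi_{ik},
\end{aligned} \tag{4.28}
\]
\[
\xi_{irk}=\frac12\,\xi_{ik}\xi_{rk}-\frac12\,\gamma_{ir}\,\zeta_{ik}\zeta_{rk},\qquad
\gamma_{ir}=\begin{cases}-1,& i<r,\\ \phantom{-}1,& i\ge r,\end{cases}
\]
\[
L:=\frac{\partial}{\partial t}+\frac12\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^2}{\partial p^i\partial p^j}+\sum_{i=1}^{n}\big[-V'(q^i)-\lambda\,(2q^i-q^{i+1}-q^{i-1})-\nu p^i\big]\frac{\partial}{\partial p^i}+\sum_{i=1}^{n}p^i\,\frac{\partial}{\partial q^i},
\]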
where $\xi_{ik}$ and $\zeta_{jk}$ are mutually independent random variables, $\xi_{ik}$ are distributed by the law (4.27), and the $\zeta_{ik}$ are distributed by the law $P(\zeta=\pm1)=1/2$. We consider two potentials: the harmonic potential
\[
V(z)=\frac12z^2,\quad z\in\mathbb{R}, \tag{4.29}
\]
and the hard anharmonic potential
\[
V(z)=\frac12z^2+\frac12z^4,\quad z\in\mathbb{R}. \tag{4.30}
\]
We define the approximation $\hat u(t,p,q)$ used in (4.25) at $t=t_k$, $k=0,\ldots,N-1$, as follows. First, it is reasonable to put $\partial\hat u/\partial p^i(t,p,q)=0$ for $0\le t\le T_0$ with some relatively small $T_0$ since for large $T$ the function $u(t,p,q)$, $0\le t\le T_0$, is almost constant due to the ergodicity (the expectation in (4.23) is almost independent of the initial condition). Further, let $T_0$, $T$, $h$, $N$, and a nonnegative integer $\kappa$ be such that $T_0=N_0h$, $T=Nh$, $N-N_0=\kappa N'$, where $N_0$ and $N'$ are integers. Introduce $\theta_{k'}=t_{N_0+k'\kappa}$, $k'=1,\ldots,N'$. In the case of harmonic potential the required function $u(t,p,q)$ can be expanded in the basis consisting of the finite number of functions
\[
\varphi_l\in\{1,\ p^i,\ q^i,\ p^ip^j,\ q^iq^j,\ p^iq^j,\quad i,j=1,\ldots,n\}. \tag{4.31}
\]
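The quadratic basis above is easy to enumerate programmatically. The sketch below (the function name `quadratic_basis` and the random test points are hypothetical) evaluates all basis functions at M sample points, counting each symmetric product $p^ip^j$ and $q^iq^j$ once.

```python
from itertools import combinations_with_replacement

import numpy as np

def quadratic_basis(P, Q):
    """Evaluate 1, p^i, q^i, p^i p^j, q^i q^j, p^i q^j at M points; P, Q have shape (M, n)."""
    M, n = P.shape
    pairs = list(combinations_with_replacement(range(n), 2))  # i <= j, no duplicates
    cols = [np.ones(M)]
    cols += [P[:, i] for i in range(n)]
    cols += [Q[:, i] for i in range(n)]
    cols += [P[:, i] * P[:, j] for i, j in pairs]
    cols += [Q[:, i] * Q[:, j] for i, j in pairs]
    cols += [P[:, i] * Q[:, j] for i in range(n) for j in range(n)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
design = quadratic_basis(rng.normal(size=(5, 3)), rng.normal(size=(5, 3)))
```

For n = 3 this gives 1 + 3 + 3 + 6 + 6 + 9 = 28 columns. The resulting matrix can serve as the design matrix of the regression.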
In our experiments we deal with three oscillators ($n=3$); the basis (4.31) in this case has 28 functions. We use the set of functions (4.31) as a set of basis functions for regression in both cases of harmonic and hard anharmonic potentials. Namely, using regression as described in section 3.2, we construct the estimator $\hat u(\theta_{k'},p,q)$ for $u(\theta_{k'},p,q)$ as
\[
\hat u(\theta_{k'},p,q)=\sum_{l=1}^{L}\hat\alpha_l(\theta_{k'})\,\varphi_l(p,q), \tag{4.32}
\]
where $\varphi_l$ are defined in (4.31) and $\hat\alpha_l(\theta_{k'})$ satisfy the system of linear algebraic equations (2.24). The matrix of this system is positive definite, and we solve the system of linear algebraic equations by Cholesky decomposition. To find the estimator $\hat u$, we use $M_r$ independent trajectories. Then for $T_0<t_k<T$ we put $\hat u(t_k,p,q)=\hat u(\theta_{k'},p,q)$ with $\theta_{k'}\le t_k<\theta_{k'+1}$. Recalculating the estimator $\hat u$ only once every $\kappa$ steps reduces the cost of the procedure. We note that for the basis (4.31) the corresponding function $F$ from (4.25) is such that some terms in the scheme (4.28) are canceled; in particular, it is not required to simulate the $\zeta_{ik}$ in this case. We compute $u(0,p,q)$ in the usual way,
\[
u(0,p,q)=E\big[U(Q_{p,q}(T))+Z_{p,q}(T)\big]\approx E\big[U(Q_N)+Z_N\big]\approx\frac{1}{M}\sum_{m=1}^{M}\big[U({}_mQ_N)+{}_mZ_N\big], \tag{4.33}
\]
by simulating $M$ independent realizations of $Q_N$, $Z_N$ from (4.26), (4.28). In these experiments the two-run procedure described in section 3.5 was used.
Suppose we would like to compute $u(0,p,q)$ for the particular set of parameters $n=3$, $\lambda=1$, $\nu=1$, $\sigma=1$, $T=10$ and the potentials (4.29) and (4.30) with accuracy of order $10^{-3}$. Since we are using the scheme of order two, we can take $h=0.02$. Let us first consider the case of harmonic potential (4.29). Without variance reduction (i.e., for $F=0$), we obtain $0.7500\pm0.0010$ with the fiducial probability 95% by simulating $M=1.4\times10^6$ trajectories, taking ∼541 sec on a PC. When we use the variance reduction technique as described above, it is sufficient to take $T_0=2$, $\kappa=2$, $M_r=2\times10^4$, $M=3\times10^4$ to get $0.7496\pm0.0010$ in ∼64 sec. In this example the procedure with variance reduction requires an eighth of the computational time. All the expenses are taken into account, including the time required for the first run of the two-run procedure, which is less than 10% of the total time. We recall that in this case the required function $u(t,p,q)$ can be expanded in the finite basis (4.31), unlike the case of hard anharmonic potential when such a basis is infinite.
Now consider the case of hard anharmonic potential (4.30). Without variance reduction (i.e., for $F=0$), we obtain $0.6491\pm0.0011$ with the fiducial probability 95% by simulating $M=10^6$ trajectories, taking ∼403 sec on a PC. With variance reduction, we reach the same level of accuracy $0.6491\pm0.0011$ in ∼98 sec by choosing, e.g., $T_0=2$, $\kappa=2$, $M_r=2.5\times10^4$, $M=5.5\times10^4$. Thus, the procedure with variance reduction requires a quarter of the computational time.
Some other results of our numerical experiments are presented in Tables 2 and 3. They show dependence of the MC error on $M$ and $M_r$. The numerical integration error is relatively small here and does not essentially affect the results. The case $M_r=0$ means that the simulation was done without variance reduction. We observe that in both tables for a fixed $M_r$ the MC error decreases $\sim1/\sqrt{M}$. Further, we see from Table 2 that the MC error is $\sim1/\sqrt{M_r}$ for fixed $M$ (for $M_r>0$, of course), and, consequently, it is $\sim1/\sqrt{M\,M_r}$ when the variance reduction is used (we recall that the time step is relatively small here). As noted before, the basis used in the variance reduction is such that the function $u(t,x)$ can be expanded in it in the case of harmonic potential; i.e., $\mathrm{err}_B$ in (4.17) is equal to 0. These observations are consistent with the MC error estimate (4.17). For the anharmonic potential, $\mathrm{err}_B$ is not equal to zero, and we see in Table 3 that the increase of $M_r$ has less impact on the MC error in this case.
Table 2. Harmonic potential. Two standard deviations of the estimator (4.33) in the case of potential (4.29) for different $M$ and $M_r$. $M_r=0$ means that variance reduction was not used. The other parameters are $n=3$, $\lambda=1$, $\nu=1$, $\sigma=1$, $T=10$ and $h=0.01$, $T_0=2$, $\kappa=1$.
             M_r = 0        M_r = 10^3     M_r = 10^4     M_r = 10^5
M = 10^3     4.0 × 10^-2    2.6 × 10^-2    --             --
M = 10^4     1.2 × 10^-2    7.8 × 10^-3    2.3 × 10^-3    --
M = 10^5     3.9 × 10^-3    2.3 × 10^-3    7.9 × 10^-4    2.5 × 10^-4
M = 10^6     1.2 × 10^-3    8.2 × 10^-4    2.4 × 10^-4    7 × 10^-5
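The scaling of the MC error like $1/\sqrt{M\,M_r}$ in the harmonic case can be checked directly from the Table 2 entries with $M_r>0$: multiplying each entry by $\sqrt{M\,M_r}$ should give a roughly constant number. A small sketch (the dictionary layout and variable names are only a convenient transcription of the table):

```python
import math

# Table 2 entries with M_r > 0, keyed by (M, M_r).
table2 = {
    (10**3, 10**3): 2.6e-2, (10**4, 10**3): 7.8e-3,
    (10**5, 10**3): 2.3e-3, (10**6, 10**3): 8.2e-4,
    (10**4, 10**4): 2.3e-3, (10**5, 10**4): 7.9e-4, (10**6, 10**4): 2.4e-4,
    (10**5, 10**5): 2.5e-4, (10**6, 10**5): 7e-5,
}

# If the error behaves like C / sqrt(M * M_r), these products estimate the constant C.
scaled = {key: err * math.sqrt(key[0] * key[1]) for key, err in table2.items()}
low, high = min(scaled.values()), max(scaled.values())
```

All nine products fall between about 22 and 26, which supports the stated scaling.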
Table 3. Hard anharmonic potential. Two standard deviations of the estimator (4.33) in the case of potential (4.30) for different $M$ and $M_r$. The other parameters are the same as in Table 2.
             M_r = 0        M_r = 10^3     M_r = 10^4     M_r = 10^5
M = 10^3     3.3 × 10^-2    2.3 × 10^-2    --             --
M = 10^4     1.1 × 10^-2    7.4 × 10^-3    3.0 × 10^-3    --
M = 10^5     3.5 × 10^-3    2.4 × 10^-3    9.5 × 10^-4    6.7 × 10^-4
M = 10^6     1.1 × 10^-3    7.4 × 10^-4    2.9 × 10^-4    2.2 × 10^-4
4.3. Pricing a binary asset-or-nothing call option. Consider the Black–Scholes equation for pricing a binary asset-or-nothing call option:
\[
\frac{\partial u}{\partial t}+\frac{\nu^2}{2}x^2\frac{\partial^2u}{\partial x^2}+rx\frac{\partial u}{\partial x}-ru=0,\quad 0\le t<T,\ x\in\mathbb{R},\qquad
u(T,x)=f(x)=\begin{cases}0&\text{if }x<K,\\ x&\text{if }x\ge K.\end{cases} \tag{4.34}
\]
The solution of this problem for $x>0$ and $K>0$ is
\[
u(t,x)=x\,\Phi(y_*), \tag{4.35}
\]
where
\[
y_*=\frac{1}{\nu\sqrt{T-t}}\Big[\ln\frac{x}{K}+\Big(r+\frac{\nu^2}{2}\Big)(T-t)\Big]\quad\text{and}\quad\Phi(y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y}e^{-z^2/2}\,dz.
\]
The probabilistic representation (with $\mu=0$) of the solution to (4.34) takes the form
\[
u(s,x)=E\big[f(X_{s,x}(T))\,e^{-r(T-s)}+Z_{s,x}(T)\big], \tag{4.36}
\]
\[
dX=rX\,dt+\nu X\,dw(t),\quad X(s)=x, \tag{4.37}
\]
\[
dZ=F(t,X)\,e^{-r(t-s)}\,dw(t),\quad Z(s)=0. \tag{4.38}
\]
The purpose of this example is to illustrate that the approach to evaluating $u(s,x)$ introduced in section 3.2 works, in principle, in the case of discontinuous initial conditions $f(x)$. We use, as a set of basis functions for regression, the set consisting of three functions:
\[
\varphi_1(x)=\frac{K}{\pi}\big(\arctan(\alpha(x-K))+\arctan(\alpha K)\big), \tag{4.39}
\]
\[
\varphi_2(x)=\frac{x}{2}+\frac{x(x-2K)}{4\big(\sqrt{(x-K)^2/4+\beta}+\sqrt{K^2/4+\beta}\big)},\qquad
\varphi_3(x)=\frac{x}{\gamma+x^2},
\]
where $\alpha>0$, $\beta>0$, and $\gamma>0$ are parameters, which can change from one time layer to another. We note that the functions are chosen so that $\varphi_l(0)=0$, $l=1,2,3$, and the payoff $f(x)$ is well approximated by $\varphi_1(x)+\varphi_2(x)$ with large $\alpha$ and small $\beta$. In the experiments, we take the volatility $\nu=0.2$, the interest rate $r=0.02$, and the maturity time $T=3$ and approximate the option price $u(0,1)$, whose exact value due to (4.35) is $u(0,1)\approx0.63548$. We define the time-dependent $\alpha=\alpha(t)$ and $\beta=\beta(t)$ via linear interpolation:
\[
\alpha(t)=\frac{10\,t}{T}+\frac{0.01\,(T-t)}{T},\qquad \beta(t)=\frac{0.0001\,t}{T}+\frac{0.005\,(T-t)}{T},
\]
and we choose $\gamma=8$. We simulate (4.37)–(4.38) using the weak Euler scheme with time step $h=T/N=0.001$. In the first run (see section 3.5 for the description of the algorithm), we put $F=0$ and store the values $f({}_m\bar X(T))\,e^{-rT}$, which are needed for the regression in the second run. In the second run, using regression with the set of basis functions (4.39), we construct the estimator $\hat u(\theta_{k'},x)$ for $u(\theta_{k'},x)$, where $\theta_{k'}=\kappa k'h$, $k'=1,\ldots,N'$; $\kappa$ and $N'$ are nonnegative integers such that $\kappa N'h=T$. We use here $\kappa=5$; i.e., we recalculate the estimator $\hat u$ only once per five time layers to reduce the computational cost. Further, $\hat u(t_k,x)$ is set equal to zero for $0\le t_k<0.01$. In the second run we put $F(t,x)=-\nu\,\partial\hat u/\partial x$. In both runs we simulate $M=4\cdot10^4$ independent trajectories. As a result, we get $u(0,1)\approx\bar u(0,1)=0.6358\pm0.0018$ with the fiducial probability 95%. To achieve a similar result without variance reduction, namely, $\bar u(0,1)=0.6342\pm0.0019$, one has to simulate $M=5\cdot10^5$ independent trajectories, which requires at least three times more computational time than the procedure with variance reduction. This experiment demonstrates that the simple and cheap estimation of $\partial u/\partial x$ by $\partial\hat u/\partial x$ works even in the case of discontinuous initial conditions.
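The quoted exact value and the behaviour of the basis functions can be checked with a few lines of standard-library Python. This sketch rests on two assumptions that are not stated explicitly in the text: the strike is taken as K = 1 (the value for which the closed-form price reproduces 0.63548), and $\varphi_2$ is taken in the form $x/2+x(x-2K)/\big(4(\sqrt{(x-K)^2/4+\beta}+\sqrt{K^2/4+\beta})\big)$. All function names are hypothetical.

```python
import math

def asset_or_nothing_call(t, x, K, r, nu, T):
    """Closed-form price u(t, x) = x * Phi(y_*), with Phi the standard normal cdf."""
    y = (math.log(x / K) + (r + 0.5 * nu ** 2) * (T - t)) / (nu * math.sqrt(T - t))
    return x * 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def phi1(x, K, alpha):
    return K / math.pi * (math.atan(alpha * (x - K)) + math.atan(alpha * K))

def phi2(x, K, beta):
    root_sum = math.sqrt((x - K) ** 2 / 4.0 + beta) + math.sqrt(K ** 2 / 4.0 + beta)
    return x / 2.0 + x * (x - 2.0 * K) / (4.0 * root_sum)

def payoff(x, K):
    return x if x >= K else 0.0

# Parameters from the text; K = 1 is an assumption of this sketch.
price = asset_or_nothing_call(t=0.0, x=1.0, K=1.0, r=0.02, nu=0.2, T=3.0)

# With large alpha and small beta, phi1 + phi2 should be close to the payoff away from x = K.
gap = max(abs(phi1(x, 1.0, 1e4) + phi2(x, 1.0, 1e-8) - payoff(x, 1.0))
          for x in (0.0, 0.5, 2.0, 5.0))
```

Under these assumptions `price` agrees with 0.63548 to the quoted digits, and `gap` is of order $10^{-4}$.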
5. Conclusions. Starting an MC simulation, first of all we have to estimate the number of trajectories required to reach a prescribed accuracy. Fortunately, we can easily do this because a reliable estimate of the variance can be obtained by a preliminary numerical experiment using a relatively small set of trajectories. If the required number of trajectories is too large, we run inevitably into the problem of variance reduction. The known variance reduction methods (the method of importance sampling, the method of control variates, and the combining method) are based on the assumption that approximations of the solution u(t, x) of the considered problem and its spatial derivatives ∂u(t, x)/∂xi are known. In this paper we proposed to construct such approximations as a part of the MC simulation using conditional probabilistic representations together with the regression method and thus make the variance reduction methods practical. The basis used in the regression method can be chosen using some a priori knowledge of the considered problems, as illustrated in the examples. As is known (see, e.g., [16]), the variance reduction methods are applicable in the case of boundary value problems for parabolic and elliptic equations as well. Although here we illustrated the proposed implementation of these variance reduction methods for the Cauchy problems for parabolic equations, the approach is straightforwardly applicable to boundary value problems. We also note that the proposed technique of conditional probabilistic representations together with regression can be used for evaluating different Greeks for American- and Bermudan-type options (see [1]).
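The preliminary experiment mentioned at the start of the conclusions can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `required_trajectories`, the callable `sample` (one realization of the simulated functional), and the default normal quantile `z = 1.96` for a 95% interval are all assumptions made here.

```python
import math
from statistics import variance
from typing import Callable


def required_trajectories(sample: Callable[[], float], pilot_size: int,
                          half_width: float, z: float = 1.96) -> int:
    """Estimate the number of trajectories M needed for a given accuracy.

    A small pilot run gives a sample variance var_hat; the Monte Carlo
    error z * sqrt(var_hat / M) is then required to be <= half_width.
    `z` is an assumed normal quantile (1.96 for roughly 95%).
    """
    pilot = [sample() for _ in range(pilot_size)]  # preliminary experiment
    var_hat = variance(pilot)                      # unbiased sample variance
    return math.ceil(z * z * var_hat / half_width ** 2)
```

If the returned number is too large to be affordable, that is the situation in which the variance reduction methods discussed above become relevant.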
REFERENCES
[1] D. Belomestny, G. N. Milstein, and J. G. M. Schoenmakers, Sensitivities for Bermudan Options by Regression Methods, WIAS preprint 1247, WIAS, Berlin, 2007.
[2] B. Bouchard, I. Ekeland, and N. Touzi, On the Malliavin approach to Monte Carlo approximation of conditional expectations, Finance Stoch., 8 (2004), pp. 45–71.
[3] J. Fan and I. Gijbels, Local Polynomial Modelling and Its Applications, Chapman & Hall, London, 1996.
[4] E. Fournié, J.-M. Lasry, J. Lebuchoux, and P.-L. Lions, Application of Malliavin calculus to Monte Carlo methods in finance II, Finance Stoch., 5 (2001), pp. 201–236.
[5] E. Fournié, J.-M. Lasry, J. Lebuchoux, P.-L. Lions, and N. Touzi, Application of Malliavin calculus to Monte Carlo methods in finance, Finance Stoch., 3 (1999), pp. 391–412.
[6] S. A. Gladyshev and G. N. Milstein, The Runge–Kutta method for calculation of Wiener integrals of functionals of exponential type, Zh. Vychisl. Mat. i Mat. Fiz., 24 (1984), pp. 1136–1149.
[7] P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer-Verlag, New York, 2004.
[8] L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer-Verlag, New York, 2002.
[9] A. Kebaier, Statistical Romberg extrapolation: A new variance reduction method and applications to option pricing, Ann. Appl. Probab., 15 (2005), pp. 2681–2705.
[10] A. Kohatsu-Higa and R. Pettersson, Variance reduction methods for simulation of densities on Wiener space, SIAM J. Numer. Anal., 40 (2002), pp. 431–450.
[11] N. V. Krylov, Controllable Processes of Diffusion Type, Nauka, Moscow, 1977.
[12] G. N. Milstein, Numerical Integration of Stochastic Differential Equations, Ural State University, Sverdlovsk, 1988 (in Russian); English translation: Kluwer Academic, Dordrecht, The Netherlands, 1995.
[13] G. N. Milstein and J. G. M. Schoenmakers, Monte Carlo construction of hedging strategies against multi-asset European claims, Stoch. Stoch. Rep., 73 (2002), pp. 125–157.
[14] G. N. Milstein, J. G. M. Schoenmakers, and V. Spokoiny, Transition density estimation for stochastic differential equations via forward-reverse representations, Bernoulli, 10 (2004), pp. 281–312.
[15] G. N. Milstein and M. V. Tretyakov, Quasi-symplectic methods for Langevin-type equations, IMA J. Numer. Anal., 23 (2003), pp. 593–626.
[16] G. N. Milstein and M. V. Tretyakov, Stochastic Numerics for Mathematical Physics, Springer-Verlag, Berlin, 2004.
[17] G. N. Milstein and M. V. Tretyakov, Numerical analysis of Monte Carlo evaluation of Greeks by finite differences, J. Comput. Finance, 8 (2005), pp. 1–33.
[18] G. N. Milstein and M. V. Tretyakov, Numerical integration of stochastic differential equations with nonglobally Lipschitz coefficients, SIAM J. Numer. Anal., 43 (2005), pp. 1139–1154.
[19] G. N. Milstein and M. V. Tretyakov, Computing ergodic limits for Langevin equations, Phys. D, 229 (2007), pp. 81–95.
[20] N. Newton, Continuous-time Monte Carlo methods and variance reduction, in Numerical Methods in Finance, L. C. G. Rogers and D. Talay, eds., Cambridge University Press, Cambridge, UK, 1997, pp. 22–42.
[21] N. J. Newton, Variance reduction for simulated diffusions, SIAM J. Appl. Math., 54 (1994), pp. 1780–1805.
[22] D. Nualart, The Malliavin Calculus and Related Topics, Springer-Verlag, Berlin, 2006.
[23] R. Reigada, A. H. Romero, A. Sarmiento, and K. Lindenberg, One-dimensional arrays of oscillators: Energy localization in thermal equilibrium, J. Chem. Phys., 111 (1999), pp. 1373–1384.
[24] W. Wagner, Monte Carlo evaluation of functionals of solutions of stochastic differential equations. Variance reduction and numerical examples, Stoch. Anal. Appl., 6 (1988), pp. 447–468.
[25] G. Zou and R. D. Skeel, Robust variance reduction for random walk methods, SIAM J. Sci. Comput., 25 (2004), pp. 1964–1981.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 911–928
A DOMAIN DECOMPOSITION METHOD FOR COMPUTING BIVARIATE SPLINE FITS OF SCATTERED DATA∗
MING-JUN LAI† AND LARRY L. SCHUMAKER‡
Abstract. A domain decomposition method for solving large bivariate scattered data fitting problems with bivariate minimal energy, discrete least-squares, and penalized least-squares splines is described. The method is based on splitting the domain into smaller domains, solving the associated smaller fitting problems, and combining the coefficients to get a global fit. Explicit error bounds are established for how well our locally constructed spline fits approximate the global fits. Some numerical examples are given to illustrate the effectiveness of the method.
Key words. computation of bivariate splines, scattered data fitting
AMS subject classifications. 41A63, 41A15, 65D07
DOI. 10.1137/070710056
∗ Received by the editors December 4, 2007; accepted for publication (in revised form) July 10, 2008; published electronically February 13, 2009. http://www.siam.org/journals/sinum/47-2/71005.html
† Department of Mathematics, University of Georgia, Athens, GA 30602 ([email protected]). This author’s research was partially supported by the National Science Foundation under grant 0713807.
‡ Department of Mathematics, Vanderbilt University, Nashville, TN 37240 (larry.schumaker@vanderbilt.edu).

1. Introduction. Suppose $f$ is a smooth function defined on a domain $\Omega$ in $\mathbb{R}^2$ with polygonal boundary. Given the values $\{f_i := f(x_i, y_i)\}_{i=1}^{n_d}$ of $f$ at some set of scattered points in $\Omega$, we consider the problem of computing a function $s$ that interpolates the data, or in the case of noisy data or large sets of data, approximates rather than interpolates $f$. There are many methods for solving this problem, but here we will focus on three methods based on bivariate splines, namely,
• the minimal energy (ME) method,
• the discrete least-squares (DLS) method,
• the penalized least-squares (PLS) method.
These three variational methods have been extensively studied in the literature; see [1, 6, 7, 8, 12] and the references therein. It is well known that all three do a good job of fitting smooth functions. But they are global methods, which means that the coefficients of a fitting spline are computed from a single linear system of equations, which can be very large if the dimension of the spline space is large. This would appear to limit the applicability of variational spline methods to moderately sized problems. However, as we shall show in this paper, it is possible to efficiently compute ME-, DLS-, and PLS-splines, even with spline spaces of very large dimension.

Suppose that $\triangle$ is a triangulation of $\Omega$, and that $S(\triangle)$ is a spline space defined on $\triangle$. Throughout this paper we assume that $S(\triangle)$ has a stable local minimal determining set $\mathcal{M}$; see section 4 or the book [10]. This means that each spline $s \in S(\triangle)$ is uniquely determined by a set of coefficients $\{c_\xi\}_{\xi\in\mathcal{M}}$, where each $c_\xi$ is associated with a unique (domain) point $\xi$ of $\triangle$. The idea of our method is simple. Instead of finding all of the coefficients $\{c_\xi\}_{\xi\in\mathcal{M}}$ at once, this algorithm reduces the problem to a collection of smaller problems. To state our algorithm formally, we need some additional notation. If $\omega$ is a subset of $\Omega$, we set $\operatorname{star}^0(\omega) = \bar{\omega}$, and for all $\ell \ge 1$, recursively define
$$\operatorname{star}^\ell(\omega) := \bigcup \{T \in \triangle : T \cap \operatorname{star}^{\ell-1}(\omega) \ne \emptyset\}.$$
Algorithm 1.1 (domain decomposition method).
1) Choose a decomposition of $\Omega$ into disjoint connected sets $\{\Omega_i\}_{i=1}^m$.
2) Choose $k > 0$. For each $i = 1, \dots, m$, let $s_i^k \in S(\triangle)|_{\Omega_i^k}$ be the spline fit based on data in $\Omega_i^k := \operatorname{star}^k(\Omega_i)$. Let $\{c_{i,\xi}^k\}$ be the set of all coefficients of $s_i^k$.
3) For each $i = 1, \dots, m$, set $c_\xi = c_{i,\xi}^k$ for all $\xi \in \mathcal{M} \cap \Omega_i$.
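The steps of Algorithm 1.1 can be sketched in a few lines. This is only an illustration under simplifying assumptions: each subdomain is taken to be a union of triangles (given as a set of triangle indices), and `local_fit` and `owner` are hypothetical placeholders for a local spline solver and for the map sending each point of the minimal determining set to the subdomain that contains it. None of these names come from the paper.

```python
from typing import Callable, Dict, Hashable, Sequence, Set, Tuple

Triangle = Tuple[int, int, int]  # a triangle given by three vertex indices


def star(triangles: Sequence[Triangle], region: Set[int], k: int) -> Set[int]:
    """Indices of the triangles in star^k(region).

    `region` is a set of triangle indices. Each pass adds every triangle
    sharing at least one vertex with the current set, which is the condition
    "T intersects star^(l-1)" for a conforming triangulation.
    """
    current = set(region)
    for _ in range(k):
        touched = {v for t in current for v in triangles[t]}
        current = {i for i, tri in enumerate(triangles) if touched.intersection(tri)}
    return current


def ddc_coefficients(
    triangles: Sequence[Triangle],
    subdomains: Sequence[Set[int]],
    k: int,
    local_fit: Callable[[Set[int]], Dict[Hashable, float]],
    owner: Dict[Hashable, int],
) -> Dict[Hashable, float]:
    """Steps 2) and 3): fit on each star^k(Omega_i), keep the coefficients owned by Omega_i."""
    coeffs: Dict[Hashable, float] = {}
    for i, omega_i in enumerate(subdomains):
        local = local_fit(star(triangles, omega_i, k))  # hypothetical local solver
        for xi, value in local.items():
            if owner.get(xi) == i:  # keep only points lying in Omega_i
                coeffs[xi] = value
    return coeffs
```

Because the loop body for one subdomain does not depend on the results for any other, the local fits could be distributed across processes without changing the assembled coefficients.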
We call a spline $s$ produced by this algorithm a domain decomposition (DDC) spline. We emphasize that this domain decomposition method is very different from domain decomposition methods used in classical numerical algorithms for partial differential equations and in the application of radial basis functions to scattered data fitting and meshless methods for PDEs; see Remark 1. As we shall see, our method
• is easy to implement,
• allows the solution of very large data fitting problems,
• with appropriately chosen $m$ and $k$, produces a spline which is very close to the globally defined spline,
• is amenable to parallel processing,
• produces a spline $s$ in the space $S(\triangle)$, i.e., with the same smoothness as the global fit,
• does not make use of blending functions.
The paper is organized as follows. In section 2 we review the basics of minimal energy, discrete least-squares, and penalized least-squares spline fitting. Then in section 3 we present some numerical experiments to illustrate the performance of our domain decomposition method. There we also explore the following questions:
• How does the time required to compute a domain decomposition spline $s$ compare with that required for finding a global spline fit $s_g$ from $S(\triangle)$?
• How does $\|s - s_g\|$ behave as we choose different decompositions and different values for the parameter $k$?
• How well does the shape of $s$ match that of $s_g$?
In section 4 we review some Bernstein–Bézier tools needed to analyze our method and present two lemmas needed later. In section 5 we show that for the variational spline methods described in the following section, $\|s - s_g\| = O(\sigma^k)$ for some $0 < \sigma < 1$. We conclude the paper with remarks and references.

2. Three variational spline fitting methods. Given $d > r \ge 1$ and a triangulation $\triangle$ of a domain $\Omega \subset \mathbb{R}^2$, let
$$S_d^r(\triangle) := \{s \in C^r(\Omega) : s|_T \in \mathcal{P}_d, \text{ all } T \in \triangle\}$$
be the associated space of bivariate splines of smoothness $r$ and degree $d$. Here $\mathcal{P}_d$ is the $\binom{d+2}{2}$-dimensional space of bivariate polynomials of degree $d$. Such spaces, along with various subspaces of so-called supersplines, have been intensely studied in the literature; see the book [10] and references therein. There are many spline-based methods for interpolation and approximation. Here we are interested in three particular methods.
2.1. Minimal energy interpolating splines. Suppose we are given values $\{f_i\}_{i=1}^{n_d}$ associated with a set of $n_d \ge 3$ abscissae $A := \{(x_i, y_i)\}_{i=1}^{n_d}$ in the plane. The problem is to construct a smooth function $s$ that interpolates this data in the sense that
$$s(x_i, y_i) = f_i, \qquad i = 1, \dots, n_d.$$
To solve this problem, suppose $\triangle$ is a triangulation with vertices at the points of $A$. Let $S(\triangle)$ be a spline space defined on $\triangle$ with dimension $n \ge n_d$, and let $\Lambda(f) = \{s \in S(\triangle) : s(x_i, y_i) = f_i,\ i = 1, \dots, n_d\}$. Let
$$E(s) = \int_\Omega \left[(s_{xx})^2 + 2(s_{xy})^2 + (s_{yy})^2\right] dx\,dy \tag{2.1}$$
be the well-known thin-plate energy of $s$. Then the minimal energy (ME) interpolating spline is the function $s_E$ in $\Lambda(f)$ such that
$$E(s_E) = \min_{s \in \Lambda(f)} E(s). \tag{2.2}$$
Assuming $\Lambda(f)$ is nonempty, it is well known (see, e.g., [1, 6, 12]) that there exists a unique ME-spline characterized by the property
$$\langle s_E, g\rangle_E = 0, \qquad \text{all } g \in \Lambda(0), \tag{2.3}$$
where
$$\langle \phi, \psi\rangle_E := \int_\Omega \left[\phi_{xx}\psi_{xx} + 2\phi_{xy}\psi_{xy} + \phi_{yy}\psi_{yy}\right] dx\,dy. \tag{2.4}$$
Moreover, its Bernstein–Bézier coefficients can be computed by solving an appropriate linear system of equations. For details on two different approaches to this computation, see [1] and [12]. Assuming the data come from a smooth function, i.e.,
$$f_i = f(x_i, y_i), \qquad i = 1, \dots, n_d, \tag{2.5}$$
then it is possible to give an error bound for how well the corresponding minimal energy interpolating spline $s_E$ approximates $f$. To state the result, suppose the triangulation $\triangle$ is $\beta$-uniform, i.e.,
$$\frac{|\triangle|}{\rho_\triangle} \le \beta < \infty, \tag{2.6}$$
where $|\triangle|$ is the length of the longest edge in $\triangle$, and $\rho_\triangle$ is the minimum of the inradii of the triangles of $\triangle$. Let $\theta_\triangle$ be the smallest angle in $\triangle$. Then it was shown in Theorem 6.2 of [6] that for all $f \in W_\infty^2(\Omega)$,
$$\|f - s_E\|_\Omega \le C|\triangle|^2 |f|_{2,\Omega}, \tag{2.7}$$
where $\|\cdot\|_\Omega$ is the supremum norm on $\Omega$, and $|\cdot|_{2,\Omega}$ is the corresponding Sobolev semi-norm. $C$ is a constant depending only on $d$, $\ell$, $\beta$, and $\theta_\triangle$ if $\Omega$ is convex. If $\Omega$ is nonconvex, the constant $C$ may also depend on the Lipschitz constant of the boundary of $\Omega$. Now suppose $s_E^k$ is a DDC ME spline computed using Algorithm 1.1 with parameter $k \ge \ell$. Then since the analog of (2.7) holds for each subdomain $\Omega_i$ of $\Omega$, we have
$$\|s_E - s_E^k\|_\Omega \le C|\triangle|^2 |f|_{2,\Omega}. \tag{2.8}$$
This shows that the DDC ME spline $s_E^k$ interpolating a given function $f$ is close to the global minimal energy spline $s_E$ whenever $f$ is smooth and $|\triangle|$ is small. The estimate (2.8) does not depend on $k$, and so gives no information on how the difference behaves with increasing $k$. In section 5.1 we show that $\|s_E - s_E^k\|_\Omega = O(\sigma^k)$ with $0 < \sigma < 1$.

2.2. Discrete least-squares splines. When the set of data is very large or the measurements $\{f_i\}_{i=1}^{n_d}$ are noisy, it is often better to construct an approximation from a spline space $S(\triangle)$ of dimension $n < n_d$. Some or all of the vertices of $\triangle$ may be at points in $A := \{(x_i, y_i)\}_{i=1}^{n_d}$, but they may also be completely different. The solution of the variational problem of minimizing
$$\|s - f\|_A^2 := \sum_{j=1}^{n_d} [s(x_j, y_j) - f_j]^2$$
over all $s$ in $S(\triangle)$ is called the discrete least-squares (DLS) spline $s_L$. It is well known (see, e.g., [1, 12]) that if $S(\triangle)$ satisfies the property
$$s(x_i, y_i) = 0, \quad i = 1, \dots, n_d, \quad \text{implies } s \equiv 0, \tag{2.9}$$
then there is a unique DLS spline $s_L$ fitting the data. It is characterized by the property
$$\langle s_L - f, g\rangle_A = 0, \qquad \text{all } g \in S(\triangle), \tag{2.10}$$
where
$$\langle \phi, \psi\rangle_A := \sum_{i=1}^{n_d} \phi(x_i, y_i)\psi(x_i, y_i). \tag{2.11}$$
The Bernstein–Bézier coefficients of $s_L$ can be computed by solving an appropriate linear system of equations. For details on two different approaches to this computation, see [1] and [12]. Assuming the data come from a smooth function, it is possible to give an error bound for how well the least-squares spline $s_L$ approximates $f$. To state the result, suppose as before that the triangulation $\triangle$ is $\beta$-uniform. In addition, suppose that the data is sufficiently dense that for some constant $K_1 > 0$,
$$K_1 \|s\|_T \le \Bigl(\sum_{(x_j, y_j) \in T} s(x_j, y_j)^2\Bigr)^{1/2} \qquad \text{for all } s \in S(\triangle) \text{ and all } T \in \triangle. \tag{2.12}$$
Let $K_2 := \max_{T \in \triangle} \#(A \cap T)$. Then for all $f \in W_\infty^{m+1}(\Omega)$ with $0 \le m \le d$,
$$\|f - s_L\|_\Omega \le C|\triangle|^{m+1} |f|_{m+1,\Omega}; \tag{2.13}$$
see the remark following Theorem 8.1 in [7]. If $\Omega$ is convex, the constant $C$ depends only on $d$, $\ell$, $\beta$, $K_2/K_1$, and $\theta_\triangle$. If $\Omega$ is nonconvex, $C$ may also depend on the Lipschitz constant of the boundary of $\Omega$. Now suppose $s_L^k$ is a DDC least-squares spline computed using Algorithm 1.1 with parameter $k \ge \ell$. Then the same error bound holds for each subdomain $\Omega_i$ of $\Omega$, and combining with (2.13) gives
$$\|s_L - s_L^k\|_\Omega \le C|\triangle|^{m+1} |f|_{m+1,\Omega}. \tag{2.14}$$
This shows that the DDC least-squares spline $s_L^k$ fitting measurements of a given function $f$ is close to the global least-squares spline $s_L$ whenever $f$ is smooth and $|\triangle|$ is small. The estimate (2.14) does not depend on $k$, and so gives no information on how the difference behaves with increasing $k$. In section 5.2 we show that it is $O(\sigma^k)$ with $0 < \sigma < 1$.

2.3. Penalized least-squares splines. Suppose $A := \{(x_i, y_i)\}_{i=1}^{n_d}$ and $S(\triangle)$ are as in the previous subsections. Fix $\lambda \ge 0$. Then given data values $\{f_i\}_{i=1}^{n_d}$, the corresponding penalized least-squares (PLS) spline is defined to be the spline $s_\lambda$ in $S(\triangle)$ that minimizes
$$E_\lambda(s) := \|s - f\|_A^2 + \lambda E(s),$$
where $E(s)$ is defined in (2.1). It is well known (cf. [1, 12]) that if $S(\triangle)$ is a spline space such that (2.9) holds, then there exists a unique PLS spline $s_\lambda$ minimizing $E_\lambda(s)$ over $s \in S(\triangle)$. Moreover, $s_\lambda$ is characterized by
$$\langle s_\lambda - f, s\rangle_A + \lambda \langle s_\lambda, s\rangle_E = 0, \qquad \text{all } s \in S(\triangle). \tag{2.15}$$
As with the other two methods, the Bernstein–Bézier coefficients of $s_\lambda$ can be computed by solving an appropriate linear system of equations. For details on two different approaches to this computation, see [1] and [12]. It is known [8] that for all $f \in W_\infty^{m+1}(\Omega)$ with $0 \le m \le d$,
$$\|f - s_\lambda\|_\Omega \le C\left(|\triangle|^{m+1} |f|_{m+1,\Omega} + \lambda |f|_{2,\Omega}\right) \tag{2.16}$$
for $\lambda$ sufficiently small compared to $|\triangle|$. The constant $C$ depends only on $d$, $\ell$, $\beta$, $\theta_\triangle$, $K_2/K_1$, and the area of $\Omega$. If $\Omega$ is nonconvex, $C$ may also depend on the Lipschitz constant of the boundary of $\Omega$. Now suppose $s_\lambda^k$ is a DDC PLS spline computed using Algorithm 1.1 with parameter $k \ge \ell$. Then since the analog of (2.16) holds for each subdomain $\Omega_i$ of $\Omega$, we have
$$\|s_\lambda - s_\lambda^k\|_\Omega \le C\left(|\triangle|^{m+1} |f|_{m+1,\Omega} + \lambda |f|_{2,\Omega}\right). \tag{2.17}$$
This shows that the DDC PLS spline $s_\lambda^k$ fitting a given function $f$ is close to the global PLS spline $s_\lambda$ whenever $f$ is smooth and $|\triangle|$ is small. The estimate (2.17) does not depend on $k$, and so gives no information on how the difference behaves with increasing $k$. In section 5.3 we show that it is $O(\sigma^k)$ with $0 < \sigma < 1$.
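The characterization (2.15) is the first-order optimality condition for $E_\lambda$. The short derivation below is a sketch that takes the data term to be the squared seminorm $\|s - f\|_A^2$ from section 2.2; the basis $\{B_j\}$ of $S(\triangle)$ and the matrices $G$, $K$ are notation introduced only for this illustration.

```latex
% Fix s in S(\triangle) and t in R. Bilinearity of <.,.>_A and <.,.>_E,
% together with E(s) = <s,s>_E, gives
E_\lambda(s_\lambda + t s)
  = E_\lambda(s_\lambda)
  + 2t\bigl[\langle s_\lambda - f, s\rangle_A + \lambda\langle s_\lambda, s\rangle_E\bigr]
  + t^2\bigl[\|s\|_A^2 + \lambda E(s)\bigr].
% Since s_\lambda minimizes E_\lambda, the derivative at t = 0 vanishes:
\langle s_\lambda - f, s\rangle_A + \lambda\langle s_\lambda, s\rangle_E = 0 .
% Writing s_\lambda = \sum_j c_j B_j and testing with s = B_i yields
(G + \lambda K)\,c = b, \qquad
G_{ij} = \langle B_i, B_j\rangle_A, \quad
K_{ij} = \langle B_i, B_j\rangle_E, \quad
b_i = \sum_{l=1}^{n_d} f_l\, B_i(x_l, y_l).
```

Setting $\lambda = 0$ in the last line gives the normal equations behind the DLS characterization (2.10).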
Fig. 1. A minimal determining set for $S_5^{1,2}(\triangle)$.
3. Numerical examples. In this section we illustrate the domain decomposition method by applying it to compute minimal energy and discrete least-squares fits of scattered data. All of our examples are based on the superspline space
$$S_5^{1,2}(\triangle) := \{s \in S_5^1(\triangle) : s \in C^2(v) \text{ for all vertices } v \in \triangle\}.$$
Here $s \in C^2(v)$ means that all polynomial pieces of $s$ on triangles sharing the vertex $v$ have common derivatives up to order 2 at $v$. It is well known that the dimension of this space is $6n_V + n_E$, where $n_V$, $n_E$ are the number of vertices and edges of $\triangle$, respectively. The computations in this section are based on the algorithms in [12] which make use of a stable local minimal determining set $\mathcal{M}$ for $S_5^{1,2}(\triangle)$ and the associated stable local $\mathcal{M}$-bases defined in [10]. Figure 1 shows a minimal determining set for $S_5^{1,2}(\triangle)$, where points in the set are marked with black dots and triangles.

3.1. Example 1. Let $H$ be the unit square, and let
$$F(x, y) = 0.75\exp\bigl(-0.25(9x - 2)^2 - 0.25(9y - 2)^2\bigr) + 0.75\exp\bigl(-(9x + 1)^2/49 - (9y + 1)/10\bigr) + 0.5\exp\bigl(-0.25(9x - 7)^2 - 0.25(9y - 3)^2\bigr) - 0.2\exp\bigl(-(9x - 4)^2 - (9y - 7)^2\bigr) \tag{3.1}$$
be the well-known Franke function defined on $H$; see Figure 2. Let $\triangle_{1087}$ be the triangulation shown in Figure 3. This triangulation has 1087 vertices, 3130 edges, and 2044 triangles. The dimension of the space $S_5^{1,2}(\triangle_{1087})$ is 9652, and the total number of Bernstein–Bézier coefficients of a spline in this space is 25,871. First we compute the minimal energy spline fit $s_E$ of $f$ from $S_5^{1,2}(\triangle_{1087})$. This requires solving a linear system of 8565 equations with 322,989 nonzero entries. Although the largest element in the corresponding matrix is $O(10^7)$, its condition number is of order $O(10^4)$. For comparison purposes we computed the maximum error $e_\infty$ over a $160 \times 160$ grid, along with the RMS error $e_2$ over the same grid. These errors are shown in the first line of Table 1, along with the computational time in seconds. To explore the performance of our DDC technique, we computed approximations of $s_E$ by decomposing $\Omega$ into squares $\{\Omega_i\}_{i=1}^{m^2}$ of width $1/m$. In Table 1 we list the results where $k$ is the parameter controlling the size of the sets $\Omega_i^k$ in Algorithm 1.1. In addition to the errors $e_\infty$ and $e_2$ measuring how well $s_E$ fits $f$, we also tabulate the maximum difference $e_\infty^c$ between the coefficients of our DDC spline and the coefficients of the global ME spline $s_E$. We also compute the RMS difference $e_2^c$ for the coefficients, and list the computational time in seconds.

Fig. 2. The Franke function.

Fig. 3. A triangulation of 1087 vertices.

Table 1
DDC ME fits to Franke’s function from $S_5^{1,2}(\triangle_{1087})$.

  m   k   $e_\infty$   $e_2$     $e_\infty^c$   $e_2^c$    time
  1   0   9.1(−4)      7.7(−5)   -              -          25
  4   1   3.0(−3)      2.1(−4)   8.5(−3)        9.1(−5)    9
  4   2   9.3(−4)      8.6(−5)   3.4(−3)        1.9(−5)    15
  4   3   9.1(−4)      7.8(−5)   3.4(−4)        3.0(−6)    21
  4   4   9.1(−4)      7.7(−5)   5.4(−5)        4.4(−7)    30
  8   1   3.1(−3)      2.7(−4)   8.6(−3)        1.6(−4)    7
  8   2   9.2(−4)      9.4(−5)   1.9(−3)        3.5(−5)    16
  8   3   9.1(−4)      7.8(−5)   3.4(−4)        7.0(−6)    29
  8   4   9.1(−4)      7.7(−5)   8.9(−5)        1.3(−7)    50

Fig. 4. $\operatorname{star}^k(\Omega_{64})$ for $k = 1, 2, 3$.

We now comment on these results.
• Accuracy of fit: The table shows that in this experiment, the DDC splines with $k = 1$ do not fit $f$ as well as the ME spline, but as soon as $k \ge 2$, the errors are virtually identical. From the standpoint of accuracy of fit, there is no need to use values of $k$ larger than 2 or 3.
• Accuracy of coefficients: The table shows that the DDC fits also provide very good approximations of the coefficients of the global minimal energy spline $s_E$. Both $e_\infty^c$ and $e_2^c$ decrease as $k$ increases, as predicted by the theoretical results in section 5.1.
• Time: The main use of the DDC algorithm is to make it possible to solve large variational spline problems which could not be solved at all without using the method. For small problems, it often takes more time to solve for a DDC ME spline than for the global ME spline itself. For this moderately sized problem, we see that some of the DDC splines took less time to compute than the global fit, even for the same accuracy. For example, the DDC spline with $m = 8$ and $k = 2$ delivers virtually the same accuracy as the global ME spline, but in only about one half the computing time. For larger problems, the time required to compute DDC ME splines is substantially less than for the global splines; see Example 2.
• Condition numbers: Since the entries in the matrix of the linear systems depend on integrals of squares of second derivatives over triangles, when the triangles are of size $O(h)$, the entries are of size $O(h^{-4})$ and even larger if some triangles are very thin. In this example the largest entries are of the order $O(10^7)$. For very regular triangulations (for example type-I triangulations), the condition numbers of the matrices are of size $O(10^3)$, independent of how many triangles there are. For less regular triangulations, they can be much larger.
However, for the matrices associated with the triangulations in Figure 4, they are of order $O(10^4)$.
• Shape of $\operatorname{star}^k$: Figure 4 shows $\operatorname{star}^k(\Omega_{64})$ for $k = 1, 2, 3$, where $\Omega_{64} := [.875, 1] \times [.875, 1]$, shown in dark grey in the figure. The white triangles are the triangles added to form the stars.
• Shape of the surface: We have compared 3D plots of the global minimal energy fit of $f$ with the DDC ME fits for the parameters in Table 1. For $k = 1$ we noticed slight deviations in shape, but for all higher values of $k$ we got excellent shapes.

3.2. Example 2. We repeat Example 1 with a type-I triangulation $\triangle_{4225}$ of the unit square with 4225 vertices. This triangulation includes 12,416 edges and 8192 triangles. The dimension of the space $S_5^{1,2}(\triangle_{4225})$ is 37,766, and the total number of Bernstein–Bézier coefficients of a spline in this space is 103,041. We again fit the Franke function. First we compute the minimal energy spline fit $s_E$ of $f$ from $S_5^{1,2}(\triangle_{4225})$. This requires solving a linear system of 33,541 equations with 1,282,073 nonzero entries. Although the largest element in this matrix is $O(10^7)$, its condition number is $O(10^4)$. Our program took 326 seconds to compute $s$. For comparison purposes, we computed the maximum error $e_\infty$ over a $160 \times 160$ grid, along with the RMS error $e_2$ over this grid. These errors are shown in the first line of Table 2, along with the computational time (in seconds). We computed approximations of $s_E$ using the same decompositions of $\Omega$ as in Example 1 based on $m^2$ squares of width $1/m$. In Table 2 we list the results. Here we see that using the DDC method results in substantial time savings. We also see that the errors $e_\infty^c$ and $e_2^c$ behave like $O(\sigma^k)$ with $\sigma \approx 1/4$, confirming the theoretical results in section 5.2.

Table 2
DDC ME fits to Franke’s function from $S_5^{1,2}(\triangle_{4225})$.

  m    k   $e_\infty$   $e_2$     $e_\infty^c$   $e_2^c$    time
  1    0   1.2(−4)      7.6(−6)   -              -          326
  8    1   9.9(−4)      4.7(−5)   2.2(−3)        2.3(−5)    37
  8    2   2.9(−4)      1.5(−5)   6.8(−4)        5.7(−6)    65
  8    3   1.8(−4)      9.9(−6)   1.7(−4)        1.4(−6)    97
  16   1   9.8(−4)      6.9(−5)   2.3(−3)        4.4(−5)    29
  16   2   2.9(−4)      1.9(−5)   7.6(−4)        1.0(−5)    66
  16   3   1.8(−4)      1.0(−5)   1.6(−4)        2.5(−6)    128

3.3. Example 3. In this example we work with elevation heights measured at 15,585 points in the Black Forest of Germany. The corresponding Delaunay triangulation $\triangle_{BF}$ is shown in Figure 5, although the triangulation is so fine in many areas that it is impossible to see the individual triangles without zooming in. This triangulation has 47,333 edges and 31,449 triangles. The dimension of the space $S_5^{1,2}(\triangle_{BF})$ is 142,643, and the total number of Bernstein–Bézier coefficients of a spline in this space is 393,911. The computation of the minimal energy spline fit $s_E$ would require solving a linear system of 126,758 equations, which is beyond the capability of our software. So instead we computed a DDC approximation of the ME spline using the decomposition of Example 1 based on 100 squares. The computation took 288 seconds, and Figure 6 shows the resulting surface.

3.4. Example 4. In this example we again work on the unit square $H$.
This time we approximate Franke’s function by least squares based on measured data at 62,500 grid points in $H$. We approximate from the space $S_5^{1,2}(\triangle_{1087})$, where $\triangle_{1087}$ is the same triangulation as in Example 1; see Figure 3. We choose this triangulation since it is big enough to illustrate how the DDC method works, but small enough so that we can compute the global least-squares spline for comparison purposes. This function can of course be fit very well with much smaller spline spaces and much less data. For example, with a type-I triangulation with 81 vertices and 1089 grid data, the errors for the least-squares spline fit are $e_\infty = 5.2(−4)$ and $e_2 = 5.0(−5)$. The results of our experiments are shown in Table 3. Note that the times of computation for least-squares splines are significantly greater than for the ME splines reported in Table 1. This is due to the fact that a large part of the computation is taken up with finding the triangles containing the various data points. These times can be reduced by incorporating standard techniques for reducing the time required for these search operations.

Fig. 5. Triangulation of 15,585 points in the Black Forest.

Fig. 6. The minimal energy interpolant of the Black Forest data.

Table 3
DDC least-squares fits to Franke’s function from $S_5^{1,2}(\triangle_{BF})$.

  m    k   $e_\infty$   $e_2$     $e_\infty^c$   $e_2^c$    time
  1    0   4.5(−7)      2.3(−8)   -              -          42
  4    1   4.7(−6)      7.1(−8)   1.9(−5)        2.1(−8)    44
  4    2   3.8(−6)      5.3(−8)   5.6(−6)        1.0(−8)    62
  4    3   9.9(−7)      3.2(−8)   1.7(−6)        5.5(−9)    82
  8    1   5.5(−6)      1.1(−7)   2.0(−5)        4.3(−8)    48
  8    2   3.8(−6)      8.0(−8)   1.1(−5)        2.2(−8)    93
  8    3   1.7(−6)      6.8(−8)   3.9(−6)        1.7(−8)    151
  10   2   2.5(−6)      9.8(−8)   5.3(−6)        2.8(−8)    113

• Accuracy of fit: Table 3 shows that in this experiment the DDC least-squares splines with $k = 1$ do not fit $f$ quite as well as the global least-squares spline, but with increasing $k$ they come very close. As with the minimal energy case, it appears that a good choice might be $k = 2$.
• Accuracy of coefficients: The table shows that the DDC fits also provide very good approximations of the coefficients of the global least-squares spline. Both $e_\infty^c$ and $e_2^c$ decrease as $k$ increases. Indeed, for $m = 4$, the error $e_\infty^c$ behaves like $O(\sigma^k)$ with $\sigma \approx 1/4$, while for $m = 8$, $\sigma \approx 1/2$. There is a similar effect for $e_2$, confirming the theoretical results in section 5.2.
• Time: The main use of the DDC algorithm is to make it possible to solve large variational spline problems which could not be solved at all without using the method. For small problems, it can take more time to solve for a DDC least-squares spline than for the global least-squares spline itself. However, even for this moderately sized problem, we see that most of the DDC splines took less time to compute for nearly the same accuracy.
• Condition numbers: The condition number of the Gram matrix arising in DLS fitting with splines depends on a number of things. The size of $\beta$ (which reflects whether there are skinny triangles in $\triangle$) plays a role, but not as large a role as in the ME case (since here we are not working with second derivatives). What seems more critical in the least-squares case is the distribution of data over the triangles: if there are triangles with barely enough data to ensure a nonsingular system, the condition number tends to be high. For this particular example, the condition numbers of the matrices arising in the subproblems lie in the range of $10^5$ to $10^6$.
• Shape of the surface: We have compared 3D plots of the global least-squares fit of $f$ with the DDC least-squares fits for the parameters in Table 3. For $k = 1$ we noticed slight deviations in shape, but for all higher values of $k$ we got excellent shapes.

4. Analytical tools. In this section we set the stage for the proofs in section 5 of our main results.

4.1. Bernstein–Bézier techniques. We make use of the Bernstein–Bézier representation of splines. Given $d$ and $\triangle$, let $\mathcal{D}_{d,\triangle} := \bigcup_{T \in \triangle} \mathcal{D}_{d,T}$ be the corresponding set of domain points, where for each $T := \langle v_1, v_2, v_3\rangle$,
$$\mathcal{D}_{d,T} := \Bigl\{\xi_{ijk}^T := \frac{i v_1 + j v_2 + k v_3}{d}\Bigr\}_{i+j+k=d}.$$
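The definition of the domain points can be made concrete with a small sketch. The function names below are illustrative choices, not notation from the paper, and exact rational arithmetic is used (an implementation choice made here) so that points shared by neighboring triangles compare equal.

```python
from fractions import Fraction
from math import comb
from typing import List, Set, Tuple

Point = Tuple[Fraction, Fraction]


def domain_points(d: int, v1, v2, v3) -> List[Point]:
    """Points (i*v1 + j*v2 + k*v3)/d for all integers i, j, k >= 0 with i + j + k = d."""
    (x1, y1), (x2, y2), (x3, y3) = [(Fraction(x), Fraction(y)) for x, y in (v1, v2, v3)]
    points = []
    for i in range(d, -1, -1):
        for j in range(d - i, -1, -1):
            k = d - i - j
            points.append(((i * x1 + j * x2 + k * x3) / d,
                           (i * y1 + j * y2 + k * y3) / d))
    return points


def domain_points_of_triangulation(d: int, vertices, triangles) -> Set[Point]:
    """Union of the domain points of all triangles; shared points are counted once."""
    result: Set[Point] = set()
    for a, b, c in triangles:
        result.update(domain_points(d, vertices[a], vertices[b], vertices[c]))
    return result


if __name__ == "__main__":
    d = 5
    # One triangle carries binom(d+2, 2) domain points, the dimension of P_d.
    print(len(domain_points(d, (0, 0), (1, 0), (0, 1))), comb(d + 2, 2))
```

For a single triangle the count equals $\binom{d+2}{2}$, the dimension of the polynomial space $\mathcal{P}_d$ noted in section 2.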
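Then every spline $s \in S_d^0(\triangle)$ is uniquely determined by its set of coefficients $\{c_\xi\}_{\xi \in \mathcal{D}_{d,\triangle}}$, and
$$s|_T := \sum_{\xi \in \mathcal{D}_{d,T}} c_\xi B_\xi^T,$$
where $\{B_\xi^T\}$ are the Bernstein basis polynomials associated with the triangle $T$. Suppose now that $S(\triangle)$ is a subspace of $S_d^0(\triangle)$. Then a set $\mathcal{M} \subseteq \mathcal{D}_{d,\triangle}$ of domain points is called a minimal determining set (MDS) for $S(\triangle)$ provided it is the smallest set of domain points such that the corresponding coefficients $\{c_\xi\}_{\xi \in \mathcal{M}}$ can be set independently, and all other coefficients of $s$ can be consistently determined from smoothness conditions, i.e., in such a way that all smoothness conditions are satisfied (see p. 136 of [10]). The dimension of $S(\triangle)$ is then equal to the cardinality of $\mathcal{M}$. Clearly, $\mathcal{M} = \mathcal{D}_{d,\triangle}$ is a minimal determining set for $S_d^0(\triangle)$, and thus the dimension of $S_d^0(\triangle)$ is $n_V + (d-1)n_E + \binom{d-1}{2} n_T$, where $n_V$, $n_E$, $n_T$ are the number of vertices, edges, and triangles of $\triangle$. For each $\eta \in \mathcal{D}_{d,\triangle} \setminus \mathcal{M}$, let $\Gamma_\eta$ be the smallest subset of $\mathcal{M}$ such that $c_\eta$ can be computed from the coefficients $\{c_\xi\}_{\xi \in \Gamma_\eta}$ by smoothness conditions. Then $\mathcal{M}$ is called $\ell$-local provided there exists an integer $\ell$ not depending on $\triangle$ such that
$$\Gamma_\eta \subseteq \operatorname{star}^\ell(T_\eta), \qquad \text{all } \eta \in \mathcal{D}_{d,\triangle} \setminus \mathcal{M}, \tag{4.1}$$
where $T_\eta$ is a triangle containing $\eta$. $\mathcal{M}$ is said to be stable provided there exists a constant $K_3$ depending only on $\ell$ and the smallest angle in the triangulation such that
$$|c_\eta| \le K_3 \max_{\xi \in \Gamma_\eta} |c_\xi|, \qquad \text{all } \eta \in \mathcal{D}_{d,\triangle} \setminus \mathcal{M}. \tag{4.2}$$
Suppose $\mathcal{M}$ is a stable local MDS for $S(\triangle)$. For each $\xi \in \mathcal{M}$, let $\psi_\xi$ be the spline in $S(\triangle)$ such that $c_\xi = 1$ while $c_\eta = 0$ for all other $\eta \in \mathcal{M}$. Then the splines $\{\psi_\xi\}_{\xi \in \mathcal{M}}$ are clearly linearly independent and form a basis for $S(\triangle)$. This basis is called the $\mathcal{M}$-basis for $S(\triangle)$; see section 5.8 of [10]. It is stable and $\ell$-local in the sense that for all $\xi \in \mathcal{M}$,
$$\|\psi_\xi\|_\Omega \le K_4 \tag{4.3}$$
and
$$\operatorname{supp} \psi_\xi \subseteq \operatorname{star}^\ell(T_\xi), \tag{4.4}$$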
where Tξ is a triangle containing ξ. Here is the integer constant in (4.1), and the constant K4 depends only on and the smallest angle in . There are many spaces with stable local bases. For example, the spaces Sd0 () have stable local bases with = 1. The same is true for the superspline spaces r,2r S4r+1 () for all r ≥ 1. There are also several families of macroelement spaces defined for all r ≥ 1 with the same property; see [10]. 4.2. Two lemmas. For convenience we recall a lemma from [3]. Lemma 4.1. Suppose a0 , a1 , . . . , are nonnegative numbers such that (4.5) γ aj ≤ aν for all ν = 0, 1, 2, . . . , j≥ν
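for some $0 < \gamma < 1$. Then $a_\nu \le \frac{1}{\gamma}\sigma^\nu a_0$, where $\sigma := 1 - \gamma$.
We now establish a key lemma whose proof is modelled on the proof of Theorem 3.1 in [7]. Let $W$ be a space of spline functions defined on a triangulation $\triangle$ of $\Omega$ with inner product $\langle f, g\rangle_W$ and norm $\|f\|_W^2 := \langle f, f\rangle_W$. Suppose that $\{B_\xi\}_{\xi\in\mathcal{M}}$ is a 1-local basis for $W$ such that for some constants $C_1$, $C_2$,
(4.6) $C_1 \sum_{\xi\in\mathcal{M}} |c_\xi|^2 \le \big\| \sum_{\xi\in\mathcal{M}} c_\xi B_\xi \big\|_W^2 \le C_2 \sum_{\xi\in\mathcal{M}} |c_\xi|^2$
for all coefficient vectors $c := \{c_\xi\}_{\xi\in\mathcal{M}}$.
Lemma 4.2. Let $\omega$ be a cluster of triangles in $\triangle$, and let $T \in \omega$. Then there exist constants $0 < \sigma < 1$ and $C$ depending only on the ratio $C_2/C_1$ such that if $g$ is a function in $W$ with
(4.7) $\langle g, w\rangle_W = 0$ for all $w \in W$ with $\mathrm{supp}(w) \subseteq \mathrm{star}^k(\omega)$,
for some $k \ge 1$, then
(4.8) $\|g\cdot\chi_T\|_W \le C\sigma^k \|g\|_W$.
Proof. For each $\nu \ge 0$, let
$\mathcal{M}^\omega_\nu := \{\xi \in \mathcal{M} : \mathrm{supp}(B_\xi) \subseteq \mathrm{star}^\nu(\mathbb{R}^2 \setminus \mathrm{star}^k(\omega))\}$.
Define $\mathcal{N}^\omega_0 := \mathcal{M}^\omega_0$, and let $\mathcal{N}^\omega_\nu := \mathcal{M}^\omega_\nu \setminus \mathcal{M}^\omega_{\nu-1}$ for $\nu \ge 1$. Given $g := \sum_{\xi\in\mathcal{M}} c_\xi B_\xi$, let
$g_\nu := \sum_{\xi\in\mathcal{M}^\omega_\nu} c_\xi B_\xi$, $u_\nu := g - g_\nu$, $a_\nu := \sum_{\xi\in\mathcal{N}^\omega_\nu} c_\xi^2$.
By (4.6),
(4.9) $\sum_{j\ge\nu+1} a_j = \sum_{\xi\notin\mathcal{M}^\omega_\nu} c_\xi^2 \le \|u_\nu\|_W^2 / C_1$,
while (4.7) implies $\langle g, u_\nu\rangle_W = 0$. Since $\mathrm{supp}(u_\nu) \cap \bigcup_{\xi\in\mathcal{M}^\omega_{\nu-1}} \mathrm{supp}(B_\xi) = \emptyset$ for $\nu \ge 1$, it follows that
(4.10) $\|u_\nu\|_W^2 = \langle g - g_\nu, u_\nu\rangle_W = -\langle g_\nu, u_\nu\rangle_W = -\big\langle \sum_{\xi\in\mathcal{N}^\omega_\nu} c_\xi B_\xi, u_\nu \big\rangle_W \le \big\| \sum_{\xi\in\mathcal{N}^\omega_\nu} c_\xi B_\xi \big\|_W \|u_\nu\|_W$.
Dividing by $\|u_\nu\|_W$ and squaring, then using (4.6), we get
$\|u_\nu\|_W^2 \le \big\| \sum_{\xi\in\mathcal{N}^\omega_\nu} c_\xi B_\xi \big\|_W^2 \le C_2 a_\nu$.
Combining (4.9) and (4.10) gives
(4.11) $\sum_{j\ge\nu} a_j \le \frac{C_1 + C_2}{C_1} a_\nu$, $\nu \ge 1$.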
Then applying Lemma 4.1 gives
$a_\nu \le \frac{C_1 + C_2}{C_1} \sigma^{\nu-1} a_1$,
with $\sigma := C_2/(C_1 + C_2)$. On the other hand,
$a_1 \le \sum_{j\ge 0} a_j = \sum_{\xi\in\mathcal{M}} c_\xi^2 \le \frac{1}{C_1} \|g\|_W^2$.
Now let $q$ be the smallest integer such that there is a basis function $B_\xi$ in $\mathcal{M}^\omega_q$ with $T \subseteq \mathrm{supp}(B_\xi)$. Then by (4.6),
$\|g\cdot\chi_T\|_W^2 = \big\| \sum_{B_\xi|_T \ne 0} c_\xi B_\xi \big\|_W^2 \le C_2 \sum_{\xi\notin\mathcal{M}^\omega_{q-1}} c_\xi^2 = C_2 \sum_{j\ge q} a_j \le \frac{C_2}{C_1} \Big( \frac{C_1 + C_2}{C_1} \Big)^2 \sigma^{q-1} \|g\|_W^2$.
Since $q \ge k + 1$, we have (4.8).
5. Dependence of the errors on the parameter k. In this section we examine the difference between global splines and their DDC approximations as a function of the parameter $k$. We give separate results for ME, DLS, and PLS splines. Throughout the section we assume that $\triangle$ is a $\beta$-uniform triangulation, and that $S(\triangle)$ is an associated spline space with a stable local M-basis.
5.1. Minimal energy interpolating splines. Given a set of measurements $\{f_i\}_{i=1}^{n_d}$ of a function $f$ at the vertices of a triangulation $\triangle$, let $s_E$ be the corresponding minimal energy interpolating spline. Let $s_E^k$ be the DDC ME spline computed using Algorithm 1.1 with parameter $k$. In (2.8) we showed that if $f \in W^2_\infty(\Omega)$, then $\|s_E - s_E^k\|_\Omega = O(|\triangle|^2)$. In this section we discuss the dependence of this difference on $k$.
Theorem 5.1. There exists $\sigma \in (0, 1)$ such that for all $f \in W^2_\infty(\Omega)$
(5.1) $\|D_x^\alpha D_y^\beta (s_E - s_E^k)\|_\Omega \le C\sigma^k |\triangle|^{1-\alpha-\beta} |f|_{2,\Omega}$
for all $0 \le \alpha + \beta \le 1$. When $\Omega$ is convex, $C$ is a constant depending only on $d$, $\ell$, $\beta$, $\theta_\triangle$, and the area of $\Omega$. When $\Omega$ is nonconvex, $C$ also depends on the Lipschitz constant of the boundary of $\Omega$.
Proof. Let $\Omega_i$ be one of the subdomains in Algorithm 1.1. In view of the way in which $s_E$ is defined, it suffices to estimate $\|s_E - s_E^k\|_{\Omega_i}$. Let $\triangle_i^k$ be the subtriangulation obtained by restricting $\triangle$ to $\Omega_i^k := \mathrm{star}^k(\Omega_i)$. Fix $k \ge 1$. We make use of Lemma 4.2 applied to
$W = \{s \in S(\triangle)|_{\Omega_i^k} : s(v) = 0$ for all vertices $v$ of $\triangle_i^k\}$,
with the inner product
(5.2) $\langle \varphi, \psi\rangle_{E,\Omega_i^k} := \int_{\Omega_i^k} [\varphi_{xx}\psi_{xx} + 2\varphi_{xy}\psi_{xy} + \varphi_{yy}\psi_{yy}]\, dx\, dy$.
Let $s_{E,\Omega_i^k} := s_E|_{\Omega_i^k}$ be the global ME interpolant of $f$ restricted to $\Omega_i^k$, and let $s^k_{E,i}$ be the ME interpolant of $f$ in the space $S(\triangle)|_{\Omega_i^k}$. Let $\{B_\xi\}_{\xi\in\mathcal{M}_i^k}$ be a stable 1-local basis for $S(\triangle)|_{\Omega_i^k}$. It was shown in Corollary 5.3 of [6] that
(5.3) $C_1 |\triangle|^{-2} \sum_{\xi\in\mathcal{M}_i^k} |c_\xi|^2 \le \big\| \sum_{\xi\in\mathcal{M}_i^k} c_\xi B_\xi \big\|^2_{E,\Omega_i^k} \le C_2 |\triangle|^{-2} \sum_{\xi\in\mathcal{M}_i^k} |c_\xi|^2$,
where $C_1$ and $C_2$ depend only on $d$, $\ell$, and $\beta$. Writing $g := s_{E,\Omega_i^k} - s^k_{E,i} \in W$, and using the characterization of ME splines, we have
(5.4) $\langle g, B_\xi\rangle_{E,\Omega_i^k} = 0$, all $B_\xi$ with $\mathrm{supp}(B_\xi) \subseteq \Omega_i^k$.
Now suppose $T$ is a triangle in $\Omega_i$ where $|g|$ takes its maximum. Since $g$ is a polynomial on $T$, we can use Lemma 6.1 of [6] and Theorem 1.1 of [10] to get
(5.5) $\|g\|_{\Omega_i} = \|g\|_T \le 12|T|^2 |g|_{2,\infty,T} \le C_3 |\triangle|\, |g|_{2,2,T} \le C_3 |\triangle|\, \|g\cdot\chi_T\|_{E,\Omega_i^k}$,
where $C_3$ depends only on $d$. In view of (5.3) and (5.4), we can apply Lemma 4.2 to get
(5.6) $\|g\cdot\chi_T\|_{E,\Omega_i^k} \le C_4 \sigma^k \|g\|_{E,\Omega_i^k} \le C_4 A^{1/2} \sigma^k |g|_{2,\infty,\Omega_i^k}$,
where $A$ is the area of $\Omega_i^k$. Note that $C_4$ does not depend on $|\triangle|$ since the constant in Lemma 4.2 depends on the ratio $C_2|\triangle|^{-2} / C_1|\triangle|^{-2} = C_2/C_1$. Now let $\tau$ be a triangle where $|g|_{2,\infty,\Omega_i}$ takes its maximum. Then using the Markov inequality, we have
(5.7) $|g|_{2,\infty,\Omega_i} = |g|_{2,\infty,\tau} \le \frac{C_5}{|\tau|^2} \|g\|_\tau \le \frac{C_5}{|\tau|^2} \big( \|f - s_E\|_\tau + \|f - s^k_{E,i}\|_\tau \big)$.
Combining the inequalities (5.5)–(5.7) with the error bound (2.7), we get (5.1) for $\alpha = \beta = 0$. To get the result for derivatives, we apply the Markov inequality on a triangle where $\|D_x^\alpha D_y^\beta g\|_\Omega$ takes its maximum value.
5.2. DLS splines. Given a set of measurements $\{f_i\}_{i=1}^{n_d}$ of a function $f$ and a triangulation $\triangle$, let $s_L$ be the DLS spline fit of $f$ from $S(\triangle)$. Let $s_L^k$ be the DDC least-squares spline produced by Algorithm 1.1 with parameter $k$. In (2.14) we showed that if $f \in W^{m+1}_\infty(\Omega)$, then $\|s_L - s_L^k\|_\Omega = O(|\triangle|^{m+1})$. In this section we discuss the dependence of this difference on $k$. The following theorem also bounds the derivatives of the difference. As is customary in spline theory, the norm here is to be interpreted as the maximum of the supremum norms over the triangles in $\triangle$, since the splines $s_L$ and $s_L^k$ may not have derivatives at every point in $\Omega$.
Theorem 5.2. There exists $\sigma \in (0, 1)$ such that if $f \in W^{m+1}_\infty(\Omega)$ with $0 \le m \le d$, then
(5.8) $\|D_x^\alpha D_y^\beta (s_L - s_L^k)\|_\Omega \le C\sigma^k |\triangle|^{m-\alpha-\beta} |f|_{m+1,\Omega}$
for all $0 \le \alpha + \beta \le m$. When $\Omega$ is convex, $C$ is a constant depending only on $d$, $\ell$, $\beta$, $K_1$, $K_2$, and $\theta_\triangle$. When $\Omega$ is nonconvex, $C$ also depends on the Lipschitz constant of the boundary of $\Omega$.
Proof. Let $\Omega_i$ be one of the subdomains in Algorithm 1.1. In view of the way in which $s_L$ is defined, it suffices to estimate the norm of $s_L - s_L^k$ on $\Omega_i$. Let $\triangle_i^k$ be the
subtriangulation obtained by restricting $\triangle$ to $\Omega_i^k := \mathrm{star}^k(\Omega_i)$. Fix $k \ge 1$. We make use of Lemma 4.2 applied to $W = S(\triangle)|_{\Omega_i^k}$ with the inner product
(5.9) $\langle \varphi, \psi\rangle_{A_i^k} := \sum_{(x_i,y_i)\in\Omega_i^k} \varphi(x_i, y_i)\psi(x_i, y_i)$.
Let $s_{L,\Omega_i^k} := s_L|_{\Omega_i^k}$ be the restriction to $\Omega_i^k$ of the global least-squares spline fit $s_L$ of $f$ from $S(\triangle)$, and let $s^k_{L,i}$ be the least-squares spline fit of $f$ from the space $S(\triangle)|_{\Omega_i^k}$. Let $\{B_\xi\}_{\xi\in\mathcal{M}_i^k}$ be a stable 1-local basis for $S(\triangle)|_{\Omega_i^k}$. It was shown in Lemma 5.1 of [7] that
(5.10) $C_1 \sum_{\xi\in\mathcal{M}_i^k} |c_\xi|^2 \le \big\| \sum_{\xi\in\mathcal{M}_i^k} c_\xi B_\xi \big\|^2_{A_i^k} \le C_2 \sum_{\xi\in\mathcal{M}_i^k} |c_\xi|^2$.
Writing $g := s_{L,\Omega_i^k} - s^k_{L,i} \in W$, and using the characterization of least-squares splines, we have
(5.11) $\langle g, B_\xi\rangle_{A_i^k} = 0$, all $B_\xi$ with $\mathrm{supp}(B_\xi) \subseteq \Omega_i^k$.
Now suppose $T$ is a triangle in $\Omega_i$ where $|g|$ takes its maximum. Then using (2.12) and Lemma 4.2 we get
(5.12) $\|g\|_{\Omega_i} = \|g\|_T \le \frac{1}{K_1} \|g\cdot\chi_T\|_{A_i^k} \le \frac{C_3}{K_1} \sigma^k \|g\|_{A_i^k} \le \frac{C_3 \sqrt{N} K_2}{K_1} \sigma^k \|g\|_{\Omega_i^k}$,
where $N$ is the number of triangles in $\Omega_i^k$. Note that $\sqrt{N} \le C_4/|\triangle|$, where $C_4$ depends on the area of $\Omega_i^k$ and the constant $\beta$. On the other hand,
(5.13) $\|g\|_{\Omega_i^k} \le \|f - s_L\|_{\Omega_i^k} + \|f - s^k_{L,i}\|_{\Omega_i^k}$.
Combining the last two inequalities with the error bound (2.13), we get (5.8) for $\alpha = \beta = 0$. To get the result for the derivative $D_x^\alpha D_y^\beta$, we apply the Markov inequality to a triangle where $\|D_x^\alpha D_y^\beta g\|_\Omega$ takes its maximum.
5.3. PLS splines. Given a set of measurements $\{f_i\}_{i=1}^{n_d}$ of a function $f$ and a triangulation $\triangle$, let $s_\lambda$ be the PLS spline fit of $f$ from $S(\triangle)$ with smoothing parameter $\lambda > 0$. Let $s_\lambda^k$ be the DDC PLS spline produced by Algorithm 1.1 with parameter $k$. In (2.17) we showed that if $f \in W^{m+1}_\infty(\Omega)$, then $\|s_\lambda - s_\lambda^k\|_\Omega = O(|\triangle|^{m+1}) + O(\lambda)$. In this section we discuss the dependence of this difference on $k$.
Theorem 5.3. There exists $\sigma \in (0, 1)$ such that if $f \in W^{m+1}_\infty(\Omega)$ with $1 \le m \le d$, then
(5.14) $\|s_\lambda - s_\lambda^k\|_\Omega \le C\sigma^k \Big( 1 + \frac{\sqrt{\lambda}}{|\triangle|} \Big) \Big( |\triangle|^m |f|_{m+1,\Omega} + \frac{\lambda}{|\triangle|} |f|_{2,\Omega} \Big)$
if $\lambda$ is sufficiently small compared to $|\triangle|$. When $\Omega$ is convex, $C$ is a constant depending only on $d$, $\ell$, $\beta$, $K_1$, $K_2$, $\theta_\triangle$, and the area of $\Omega$. When $\Omega$ is nonconvex, $C$ also depends on the Lipschitz constant of the boundary of $\Omega$.
Proof. Let $\Omega_i$ be one of the subdomains in Algorithm 1.1. In view of the way in which $s_\lambda$ is defined, it suffices to estimate the norm of $s_\lambda - s_\lambda^k$ on $\Omega_i$. Let $\triangle_i^k$ be the
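subtriangulation obtained by restricting $\triangle$ to $\Omega_i^k := \mathrm{star}^k(\Omega_i)$. Fix $k \ge 1$. We make use of Lemma 4.2 applied to $W := S(\triangle)|_{\Omega_i^k}$ with the inner product
(5.15) $\langle \varphi, \psi\rangle_\lambda := \langle \varphi, \psi\rangle_{A_i^k} + \lambda \langle \varphi, \psi\rangle_{E,\Omega_i^k}$,
where the inner products in this definition are as in (5.2) and (5.9). Let $s_{\lambda,\Omega_i^k} := s_\lambda|_{\Omega_i^k}$ be the restriction to $\Omega_i^k$ of the global PLS spline fit $s_\lambda$ of $f$ from $S(\triangle)$, and let $s^k_{\lambda,i}$ be the PLS spline fit of $f$ from the space $S(\triangle)|_{\Omega_i^k}$ using data in $\Omega_i^k$. Let $\{B_\xi\}_{\xi\in\mathcal{M}_i^k}$ be a stable 1-local basis for $S(\triangle)|_{\Omega_i^k}$ as in the proof of Theorem 5.2. Combining (5.3) and (5.10), we see that
(5.16) $C_1 \Big( 1 + \frac{\lambda}{|\triangle|^2} \Big) \sum_{\xi\in\mathcal{M}_i^k} |c_\xi|^2 \le \big\| \sum_{\xi\in\mathcal{M}_i^k} c_\xi B_\xi \big\|^2_\lambda \le C_2 \Big( 1 + \frac{\lambda}{|\triangle|^2} \Big) \sum_{\xi\in\mathcal{M}_i^k} |c_\xi|^2$.
Writing $g := s_{\lambda,\Omega_i^k} - s^k_{\lambda,i} \in W$, and using the characterization of PLS splines, we have
(5.17) $\langle g, B_\xi\rangle_\lambda = 0$, all $B_\xi$ with $\mathrm{supp}(B_\xi) \subseteq \Omega_i^k$.
Now suppose $T$ is a triangle in $\Omega_i$ where $|g|$ takes its maximum. Then by (2.12),
$\|g\|_T \le \frac{1}{K_1} \|g\cdot\chi_T\|_{A_i^k} \le \frac{1}{K_1} \big( \|g\cdot\chi_T\|^2_{A_i^k} + \lambda \|g\cdot\chi_T\|^2_{E,\Omega_i^k} \big)^{1/2} = \frac{1}{K_1} \|g\cdot\chi_T\|_\lambda$.
Using Lemma 4.2, we get
$\|g\|_T \le \frac{C_3}{K_1} \sigma^k \|g\|_\lambda \le \frac{C_3}{K_1} \sigma^k \big( \|g\|^2_{A_i^k} + \lambda \|g\|^2_{E,\Omega_i^k} \big)^{1/2} \le \frac{C_3}{K_1} \sigma^k \big( \|g\|_{A_i^k} + \sqrt{\lambda}\, \|g\|_{E,\Omega_i^k} \big)$,
where $C_3$ depends only on the ratio $C_2/C_1$. Following the proofs of Theorems 5.1 and 5.2, we see that
$\|g\|_{E,\Omega_i^k} \le \frac{C_4}{|\triangle|^2} \|g\|_{\Omega_i^k}$, $\|g\|_{A_i^k} \le \frac{C_5}{|\triangle|} \|g\|_{\Omega_i^k}$,
which gives
$\|g\|_T \le C_6 \sigma^k \Big( \frac{1}{|\triangle|} + \frac{\sqrt{\lambda}}{|\triangle|^2} \Big) \|g\|_{\Omega_i^k}$.
Now $\|g\|_{\Omega_i^k} \le \|f - s_\lambda\|_{\Omega_i^k} + \|f - s^k_{\lambda,i}\|_{\Omega_i^k}$, and using (2.16) we get (5.14).
6. Remarks.
Remark 1. Domain decomposition methods have been studied for more than 100 years in the literature on the numerical solution of boundary value problems, going back at least to Schwarz's alternating method; see, e.g., [11]. For a comprehensive treatment and an extensive list of references, see [13]. The idea of domain decomposition has recently been adapted to the problem of fitting scattered data with radial basis functions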
(see [2]) as well as to meshless methods (based on radial basis functions) for solving boundary-value problems; see [4] and the book [5].
Remark 2. Many authors have tried to solve global fitting problems by dividing the domain into subdomains, computing fits on each subdomain, and then blending the resulting surface patches together with some kind of blending functions. In most of these methods the use of blending functions changes the form of the final approximant and produces a fit which may not be close to the global fit. Our DDC method is not based on blending functions, and our theorems ensure that the DDC-spline is close to the global fit.
Remark 3. As observed in [12], in computation with M-bases it is important to exercise some care in choosing the MDS $\mathcal{M}$. Thus, for example in Figure 1, for each vertex $v$, the six black dots should be chosen in the triangle with largest angle at $v$. This means that the minimal determining sets for the subspaces $S(\triangle)|_{\Omega_i^k}$ may not be subsets of the MDS for the full space.
Remark 4. For convenience, the results of section 5 assume that we are working with a spline space with a 1-local stable basis. However, the same analysis can be carried out with spline spaces with $\ell$-local stable bases under the assumption that $k \ge \ell$.
Remark 5. The computations reported here were done on a Macintosh G5 computer using Fortran. The codes have not been optimized for storage or computational speed. We report computational times to give a feeling for how quickly DDC-spline fits can be computed, and to provide a basis for comparing various algorithms. Since the local fits in the DDC method can be computed independently, the actual run times can be greatly reduced by working on a multiprocessor machine (or on a cluster).
REFERENCES
[1] G. Awanou, M.-J. Lai, and P. Wenston, The multivariate spline method for scattered data fitting and numerical solution of partial differential equations, in Wavelets and Splines (Athens, 2005), G. Chen and M.-J. Lai, eds., Nashboro Press, Brentwood, TN, 2006, pp. 24–74.
[2] R. K. Beatson, W. A. Light, and S. Billings, Fast solution of the radial basis function interpolation equations: Domain decomposition methods, SIAM J. Sci. Comput., 22 (2000), pp. 1717–1740.
[3] C. de Boor, A bound on the $L_\infty$-norm of $L_2$-approximation by splines in terms of a global mesh ratio, Math. Comp., 30 (1976), pp. 765–771.
[4] Y. Duan, Meshless Galerkin method using radial basis functions based on domain decomposition, Appl. Math. Comput., 179 (2006), pp. 750–762.
[5] G. Fasshauer, Meshfree Approximation Methods with MATLAB, World Scientific, Singapore, 2007.
[6] M. von Golitschek, M.-J. Lai, and L. L. Schumaker, Error bounds for minimal energy bivariate polynomial splines, Numer. Math., 93 (2002), pp. 315–331.
[7] M. von Golitschek and L. L. Schumaker, Bounds on projections onto bivariate polynomial spline spaces with stable bases, Constr. Approx., 18 (2002), pp. 241–254.
[8] M.-J. Lai, Multivariate splines for data fitting and approximation, in Approximation Theory XII (San Antonio, 2007), M. Neamtu and L. L. Schumaker, eds., Nashboro Press, Brentwood, TN, 2008, pp. 210–228.
[9] M.-J. Lai and L. L. Schumaker, On the approximation power of bivariate splines, Adv. Comput. Math., 9 (1998), pp. 251–279.
[10] M.-J. Lai and L. L. Schumaker, Spline Functions on Triangulations, Cambridge University Press, Cambridge, UK, 2007.
[11] M.-J. Lai and P. Wenston, On Schwarz's domain decomposition methods for elliptic boundary value problems, Numer. Math., 84 (2000), pp. 475–495.
[12] L. L. Schumaker, Computing bivariate splines in scattered data fitting and the finite-element method, Numer. Algorithms, 48 (2008), pp. 237–260.
[13] A. Toselli and O. Widlund, Domain Decomposition Methods - Algorithms and Theory, Springer-Verlag, Berlin, 2005.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 929–952
COUPLED GENERALIZED NONLINEAR STOKES FLOW WITH FLOW THROUGH A POROUS MEDIUM∗ V. J. ERVIN† , E. W. JENKINS† , AND S. SUN† Abstract. In this article, we analyze the flow of a fluid through a coupled Stokes–Darcy domain. The fluid in each domain is non-Newtonian, modeled by the generalized nonlinear Stokes equation in the free flow region and the generalized nonlinear Darcy equation in the porous medium. A flow rate is specified along the inflow portion of the free flow boundary. We show existence and uniqueness of a variational solution to the problem. We propose and analyze an approximation algorithm and establish a priori error estimates for the approximation. Key words. generalized nonlinear Stokes flow, coupled Stokes and Darcy flow, defective boundary condition AMS subject classification. 65N30 DOI. 10.1137/070708354
1. Introduction. The coupling of Stokes and Darcy flow problems has received significant attention over the past several years due to its importance in modeling problems such as surface fluid flow coupled with flow in a porous medium (see, for instance, [4, 9, 12, 14, 16, 20, 21]). As in [12], the investigation in this paper is motivated by industrial filtering applications where a non-Newtonian fluid passes through a filter to remove unwanted particulates. The lifetime of the filter is dictated by the increase in pressure drop across the porous medium. This pressure drop increase occurs as debris, transported into the filter by the free-flowing fluid, deposits into the filter. Models of the coupled system are necessary to develop simulators that can aid in the design of filters with extended lifetimes and minimize release of debris into the downstream flow. In these applications, flow rates are typically specified at the inflow of the filtering apparatus. Our first step in modeling the filtration problem is to consider the case of the coupled nonlinear Stokes–Darcy flow problem with defective boundary conditions. Namely, we assume that only flow rates are specified along the inflow boundary. In [12], the authors use the Darcy equation as a boundary condition for the Stokes problem in the free-flow region. We couple the flows across the internal boundary by using conservation of mass and balance of forces across the interface, as in [9, 14, 20, 21]. For Newtonian fluids the extra stress tensor, $\tau$, is proportional to the deformation tensor, $d(u)$, with the constant of proportionality being the value of the dynamic viscosity, $\nu$. Our model problem uses generalized power law fluids, which are an extension of Newtonian fluids. Generalized power law fluids have a nonconstant viscosity that is a function of the magnitude of the deformation tensor. Models for such viscosity functions include the following [3, 17]:
∗ Received by the editors November 16, 2007; accepted for publication (in revised form) August 4, 2008; published electronically February 13, 2009. http://www.siam.org/journals/sinum/47-2/70835.html
† Department of Mathematical Sciences, Clemson University, Clemson, SC 29634-0975 (vjervin@clemson.edu, [email protected], [email protected]). The research of the first two authors was partially supported by the National Science Foundation under grant DMS-0410792.
Carreau model.
(1.1) $\nu(d(u)) = \nu_\infty + (\nu_0 - \nu_\infty)/(1 + K|d(u)|^2)^{(2-r)/2}$,
where $r > 1$, $\nu_0$, $\nu_\infty$, and $K > 0$ are constants.
Cross model.
(1.2) $\nu(d(u)) = \nu_\infty + (\nu_0 - \nu_\infty)/(1 + K|d(u)|^{2-r})$,
where $r > 1$, $\nu_0$, $\nu_\infty$, and $K > 0$ are constants.
Power law model.
(1.3) $\nu(d(u)) = K|d(u)|^{r-2}$,
where $r > 1$ and $K > 0$ are constants.
Many generalized Newtonian fluids exhibit a shear thinning property; that is, the viscosity decreases as the magnitude of $d(u)$ increases. For the above models this corresponds to a value for $r$ between 1 and 2. Generalized power law viscosity models have been used in modeling the viscosity of biological fluids, lubricants, paints, and polymeric fluids. In the analysis below we assume a general function for $\nu(d(u))$ that satisfies particular continuity and monotonicity properties. (See (2.16), (2.17).)
For non-Newtonian fluid flow in a porous medium, various models for the effective viscosity $\nu_{eff}$ have been proposed in the literature. (See, for example, [15, 18] and the references cited therein.) Based upon dimensional analysis most models assume that $\nu_{eff}$ is a function of $|u_p|/(\sqrt{\kappa}\, m_c)$, where $\kappa$ denotes the permeability of the porous medium, $u_p$ the Darcy velocity, and $m_c$ a constant related to the internal structure of the porous medium. Models for $\nu_{eff}$ include the following [15, 18]:
Cross model.
(1.4) $\nu_{eff}(u_p) = \nu_\infty + (\nu_0 - \nu_\infty)/(1 + K|u_p|^{2-r})$,
where $r > 1$, $\nu_0$, $\nu_\infty$, and $K > 0$ are constants.
Power law model.
(1.5) $\nu_{eff}(u_p) = K \big( |u_p|/(\sqrt{\kappa}\, m_c) \big)^{r-2}$,
where $r > 1$ and $K > 0$ are constants.
Again, in the analysis below we assume a general function for $\nu_{eff}(u_p)$ that satisfies particular continuity and monotonicity properties. (See (2.16), (2.17).)
Remark. In this work we ignore the influence of pressure on viscosity.
The variational formulation presented below for the coupled nonlinear flow problem (ignoring the defective boundary conditions) is analogous to that for the linear coupled problem studied in [9, 14, 20, 21]. However, as the function setting for the linear problem is in Hilbert spaces $(H^1(\Omega), L^2(\Omega))$ compared to Banach spaces $(W^{1,r}(\Omega), L^{r'}(\Omega))$ for the nonlinear problem, the analysis used herein is considerably different from that in [9, 14, 20, 21].
2. Modeling equations. Let $\Omega \subset \mathbb{R}^n$, $n = 2$ or 3, denote the flow domain of interest. Additionally, let $\Omega_f$ and $\Omega_p$ denote bounded Lipschitz domains for the nonlinear generalized Stokes flow and nonlinear generalized Darcy flow, respectively. The interface boundary between the domains we denote by $\Gamma := \partial\Omega_f \cap \partial\Omega_p$. Note that $\Omega := \Omega_f \cup \Omega_p \cup \Gamma$. The outward-pointing unit normal vectors to $\Omega_f$ and $\Omega_p$ are
denoted nf and np , respectively. The tangent vectors on Γ are denoted by t1 (for n = 2), or tl , l = 1, 2 (for n = 3). We assume that there is an inflow boundary Γin , a subset of ∂Ωf \Γ, which is separated from Γ, and an outflow boundary Γout , a subset of ∂Ωp \Γ, which is also separated from Γ. See Figure 2.1 for an illustration of the domain of the problem.
[Figure 2.1 labels: $\Gamma_{in}$, $\Omega_f$, $\Gamma$, porous medium $\Omega_p$, $\Gamma_{out}$.]
Fig. 2.1. Illustration of flow domain.
Define $\Gamma_f := \partial\Omega_f \setminus (\Gamma \cup \Gamma_{in})$ and $\Gamma_p := \partial\Omega_p \setminus (\Gamma \cup \Gamma_{out})$. Velocities are denoted by $u_j : \Omega_j \to \mathbb{R}^n$, $j = f, p$, and pressures are denoted by $p_j : \Omega_j \to \mathbb{R}$, $j = f, p$. In $\Omega_f$, we assume that the flow is governed by the nonlinear generalized Stokes flow, subject to a specified flow rate, $-fr$, across $\Gamma_{in}$ and a no-slip condition on $\Gamma_f$:
(2.1) $-\nabla\cdot(\sigma - p_f I) = f_f$ in $\Omega_f$,
(2.2) $\nabla\cdot u_f = 0$ in $\Omega_f$,
(2.3) $\sigma = g_f(d(u_f))\, d(u_f)$ in $\Omega_f$,
(2.4) $u_f = 0$ on $\Gamma_f$,
(2.5) $\int_{\Gamma_{in}} u_f\cdot n_f\, ds = -fr$,
where $\sigma$ denotes the fluid's extra stress tensor and $d(v) := \frac{1}{2}(\nabla v + \nabla^T v)$ is the deformation tensor. The particular form for the nonlinear viscosity function $g_f(\cdot)$ is discussed in section 2.2. For simplicity we consider here the case of a single inflow boundary $\Gamma_{in}$. Multiple inflow boundary segments with separately specified flow rates can also be modeled [6, 7, 11].
We assume that the flow in the porous domain $\Omega_p$ is governed by a generalized Darcy's equation subject to a specified flow rate, $fr$, across $\Gamma_{out}$ and a nonpenetration
condition on $\Gamma_p$:
(2.6) $u_p = -\frac{\kappa}{\nu_{eff}} \nabla p_p$ in $\Omega_p$,
(2.7) $\nabla\cdot u_p = 0$ in $\Omega_p$,
(2.8) $u_p\cdot n_p = 0$ on $\Gamma_p$,
(2.9) $\int_{\Gamma_{out}} u_p\cdot n_p\, ds = fr$.
In general $\kappa$ denotes a symmetric, positive definite tensor. For simplicity, we will assume $\kappa$ is a positive (scalar) constant.
2.1. Interface conditions. The flows in $\Omega_f$ and $\Omega_p$ are coupled across the interface $\Gamma$. Conditions describing the coupling of the flows are discussed below.
Conservation of mass across $\Gamma$: The conservation of mass across $\Gamma$ imposes the constraint
(2.10) $u_f\cdot n_f + u_p\cdot n_p = 0$ on $\Gamma$.
Balance of the normal forces across $\Gamma$: The balance of the normal forces across $\Gamma$ imposes the constraint
(2.11) $p_f - (\sigma n_f)\cdot n_f = p_p$ on $\Gamma$.
Balance of the forces on $\Gamma$: For the tangential forces on $\Gamma$ we use the Beavers–Joseph–Saffman condition [1, 13, 22]
(2.12) $u_f\cdot t_l = -c_{srl}\, (\sigma n_f)\cdot t_l$ on $\Gamma$, $l = 1, \ldots, n-1$,
where $c_{srl}$, $l = 1, \ldots, n-1$, denote frictional constants that can be determined experimentally.
2.2. Variational formulations. Given $r \in \mathbb{R}$, $r > 1$, we denote its unitary conjugate by $r'$, satisfying $r^{-1} + (r')^{-1} = 1$. For $\Omega_f$, define
$X_f := \{ v : v \in (W^{1,r}(\Omega_f))^n,\ v|_{\Gamma_f} = 0 \}$ and $M_f := L^{r'}(\Omega_f)$.
For $v \in X_f$, $q \in M_f$, define $\|v\|_{X_f} := \|v\|_{(W^{1,r}(\Omega_f))^n}$ and $\|q\|_{M_f} := \|q\|_{L^{r'}(\Omega_f)}$. For $\Omega_p$, define
$L^r(\mathrm{div}, \Omega_p) := \{ v : v \in (L^r(\Omega_p))^n$ and $\nabla\cdot v \in L^r(\Omega_p) \}$,
$X_p := \{ v : v \in L^r(\mathrm{div}, \Omega_p),\ v\cdot n|_{\Gamma_p} = 0 \}$, and $M_p := L^{r'}(\Omega_p)$.
Similarly, for $v \in X_p$, $q \in M_p$, define $\|v\|_{X_p} := \|v\|_{(L^r(\Omega_p))^n} + \|\nabla\cdot v\|_{L^r(\Omega_p)}$ and $\|q\|_{M_p} := \|q\|_{L^{r'}(\Omega_p)}$. We also use the spaces $X$ and $M$ defined on $\Omega$ by
$X := X_f \times X_p$ and $M := \{ q \in M_f \times M_p \mid \int_\Omega q\, dA = 0 \}$
and denote the dual space of $X$ by $X^*$.
For $v = (v_f, v_p) \in X$ and $q = (q_f, q_p) \in M$,
$\|v\|_X := \|v_f\|_{X_f} + \|v_p\|_{X_p}$ and $\|q\|_M := \big( \|q_f\|^{r'}_{L^{r'}(\Omega_f)} + \|q_p\|^{r'}_{L^{r'}(\Omega_p)} \big)^{1/r'}$.
Also, for $f, k : \Omega \to \mathbb{R}^m$, $(f, k) := \int_\Omega f\cdot k\, dA$. Let $g(x) : \mathbb{R}^N \to \mathbb{R}^+ \cup \{0\}$ and $G(x) : \mathbb{R}^N \to \mathbb{R}^N$ be given by $G(x) := g(x)x$. Further, for $x, h \in \mathbb{R}^N$, let $G(\cdot)$ satisfy (for constants $C_1, C_2, C_3 > 0$, and $c \ge 0$)
(2.13) A1: $|G(x+h) - G(x)|\, |h| \le C_1 (G(x+h) - G(x))\cdot h$,
(2.14) A2: $\frac{|h|^2}{c + |x|^{2-r} + |x+h|^{2-r}} \le C_2 (G(x+h) - G(x))\cdot h$,
(2.15) A3: $|G(x+h) - G(x)| \le C_3 \frac{|h|}{c + |x|^{2-r} + |x+h|^{2-r}}$,
with the convention that $G(x) = 0$ if $x = 0$, and $|h|/(c + |x| + |h|) = 0$ if $c = 0$ and $x = h = 0$. From A1, A2, and A3 it follows (see [23]) that there exist constants $C_4, C_5 > 0$ such that for $s, t, w \in (L^r(\Omega))^N$
(2.16) $\int_\Omega (G(s) - G(t))\cdot(s - t)\, dA \ge C_4 \Big( \int_\Omega |G(s) - G(t)|\, |s - t|\, dA + \frac{\|s - t\|^2_{L^r(\Omega)}}{c + \|s\|^{2-r}_{L^r(\Omega)} + \|t\|^{2-r}_{L^r(\Omega)}} \Big)$,
(2.17) $\int_\Omega (G(s) - G(t))\cdot w\, dA \le C_5 \Big( \int_\Omega |G(s) - G(t)|\, |s - t|\, dA \Big)^{1/r'} \Big\| \frac{|s - t|}{c + |s| + |t|} \Big\|_\infty^{(2-r)/r} \|w\|_{L^r(\Omega)}$.
In $\Omega_p$, with $x$, $h$ in (2.13)–(2.15) denoting vectors in $\mathbb{R}^n$ and $\cdot$ the usual vector dot product, we assume that $g_p(u_p) := \nu_{eff}/\kappa$, and let $G_p(v) = g_p(v)v$. In $\Omega_f$ we assume that $\sigma = g_f(d(u_f))d(u_f)$, and let $G_f(\tau) := g_f(\tau)\tau$, where we interpret $x$, $h$ in (2.13)–(2.15) as tensors in $\mathbb{R}^{n\times n}$ and $\cdot$ as the usual tensor scalar product.
Remark. For $\nu_\infty = 0$, conditions (2.13)–(2.15) are satisfied for $G_f(\tau)$ and $G_p(v)$, with $g_f(d(u)) = 2\nu(d(u))$ described in (1.1)–(1.3) and $g_p(u_p) = \nu_{eff}(u_p)$ described in (1.4) and (1.5) (see [23]). Function spaces different from the setting studied herein are required for $\nu_\infty > 0$.
Multiplying (2.1) through by $v_1 \in X_f$, integrating over $\Omega_f$, and using (2.3) and the fact that $\{n_f, t_l,\ l = 1, \ldots, n-1\}$ form an orthonormal basis along $\Gamma$, we have
(2.18) $\int_{\Omega_f} f_f\cdot v_1\, dA = \int_{\Omega_f} \sigma : d(v_1)\, dA - \int_{\Omega_f} p_f \nabla\cdot v_1\, dA - \int_{\Gamma\cup\Gamma_{in}} ((-p_f I + \sigma)n_f)\cdot v_1\, ds$
$= \int_{\Omega_f} g_f(d(u_f))d(u_f) : d(v_1)\, dA - \int_{\Omega_f} p_f \nabla\cdot v_1\, dA + \sum_{l=1}^{n-1} \int_\Gamma -n_f^T \sigma t_l\, v_1\cdot t_l\, ds + \int_\Gamma (p_f - n_f^T \sigma n_f)\, v_1\cdot n_f\, ds - \int_{\Gamma_{in}} ((-p_f I + \sigma)n_f)\cdot v_1\, ds$.
Also, multiplying (2.6) through by $v_2 \in X_p$ and integrating over $\Omega_p$, we obtain
(2.19) $0 = \int_{\Omega_p} g_p(u_p)u_p\cdot v_2\, dA - \int_{\Omega_p} p_p \nabla\cdot v_2\, dA + \int_{\Gamma_{out}} p_p\, v_2\cdot n_p\, ds + \int_\Gamma p_p\, v_2\cdot n_p\, ds$.
The coupling of the Stokes and Darcy flows occurs through the interface conditions (2.10) and (2.11). Following [14], we introduce a new variable $\lambda$ representing
(2.20) $\lambda := p_f - (\sigma n_f)\cdot n_f = p_p$
and incorporate (2.11) into (2.18) and (2.19). Equation (2.10) is imposed weakly in a separate equation. (See (2.32) below.) Note that using the Beavers–Joseph–Saffman condition (2.12),
$\sum_{l=1}^{n-1} \int_\Gamma -n_f^T \sigma t_l\, v_1\cdot t_l\, ds = \sum_{l=1}^{n-1} \int_\Gamma c_{srl}^{-1} (u_f\cdot t_l)(v_1\cdot t_l)\, ds$.
To incorporate the specified flow rate conditions into the mathematical formulation, we use a Lagrange multiplier approach. In (2.18) and (2.19)
(2.21) $\int_{\Gamma_{in}} ((-p_f I + \sigma)n_f)\cdot v_1\, ds$ is replaced by $\beta_{in} \int_{\Gamma_{in}} v_1\cdot n_f\, ds$,
(2.22) $\int_{\Gamma_{out}} p_p\, v_2\cdot n_p\, ds$ is replaced by $\beta_{out} \int_{\Gamma_{out}} v_2\cdot n_p\, ds$,
where $\beta_{in}, \beta_{out} \in \mathbb{R}$ are undetermined constants. We comment below on the implicit assumptions induced by using the Lagrange multiplier approach.
For $v \in W^{0,r}(\mathrm{div}, \Omega_p)$, we have that $v\cdot n_p \in W^{-1/r,r}(\partial\Omega_p)$ (see [8, p. 47]). For $v \in X_p$ and $\lambda \in W^{1/r,r'}(\Gamma)$ we define the operator $v\cdot n_p \in W^{-1/r,r}(\Gamma)$ as
(2.23) $\langle v\cdot n_p, \lambda\rangle_\Gamma := \langle v\cdot n_p, E_\Gamma^{r'}\lambda\rangle_{\partial\Omega_p}$,
with $E_\Gamma^{r'}\lambda$ defined as in Lemma A.1 in Appendix A (with the association $p = r'$, $\Omega = \Omega_p$, $\Gamma = \Gamma$, $\Gamma_b = \Gamma_p$, $\Gamma_d = \Gamma_{out}$). Note that for $v \in X_p$ sufficiently smooth,
$\langle v\cdot n_p, \lambda\rangle_\Gamma = \langle v\cdot n_p, E_\Gamma^{r'}\lambda\rangle_{\partial\Omega_p} = \int_\Gamma v\cdot n_p\, \lambda\, ds$.
For $v \in (W^{1,r}(\Omega_f))^n$ we have that $v\cdot n_f \in W^{1/r',r}(\partial\Omega_f)$; hence $\int_\Gamma v\cdot n_f\, \lambda\, ds$ is well defined.
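In order to compactly write the mathematical formulation, we introduce the following bilinear forms:
(2.24) $a_f(u, v) := \int_{\Omega_f} g_f(d(u))d(u) : d(v)\, dA + \sum_{l=1}^{n-1} \int_\Gamma c_{srl}^{-1} (u\cdot t_l)(v\cdot t_l)\, ds$,
(2.25) $a_p(u, v) := \int_{\Omega_p} g_p(u)\, u\cdot v\, dA$,
(2.26) $b_f(v, q, \beta) := \int_{\Omega_f} q\, \nabla\cdot v\, dA + \beta \int_{\Gamma_{in}} v\cdot n_f\, ds$,
(2.27) $b_p(v, q, \beta) := \int_{\Omega_p} q\, \nabla\cdot v\, dA + \beta \int_{\Gamma_{out}} v\cdot n_p\, ds$.
With the above notation, the modeling equations in $\Omega_f$ may be written as
(2.28) $a_f(u_f, v_1) - b_f(v_1, p_f, \beta_{in}) + \int_\Gamma v_1\cdot n_f\, \lambda\, ds = (f_f, v_1)_{\Omega_f}$ $\forall v_1 \in X_f$,
(2.29) $b_f(u_f, q_1, \beta_1) = -\beta_1\, fr$ $\forall (q_1 \times \beta_1) \in M_f \times \mathbb{R}$,
and in $\Omega_p$ as
(2.30) $a_p(u_p, v_2) - b_p(v_2, p_p, \beta_{out}) + \langle \lambda, v_2\cdot n_p\rangle_\Gamma = 0$ $\forall v_2 \in X_p$,
(2.31) $b_p(u_p, q_2, \beta_2) = \beta_2\, fr$ $\forall (q_2 \times \beta_2) \in M_p \times \mathbb{R}$.
Together with (2.28)–(2.31) we have the interface condition (2.10). We impose this constraint weakly using
(2.32) $\int_\Gamma u_f\cdot n_f\, \zeta\, ds + \langle u_p\cdot n_p, \zeta\rangle_\Gamma = 0$ $\forall \zeta \in W^{1/r,r'}(\Gamma)$.
Introduce $f := (f_f, 0)$, $b_I(\cdot, \cdot) : X \times W^{1/r,r'}(\Gamma) \to \mathbb{R}$ as
(2.33) $b_I(v, \zeta) := \int_\Gamma v_f\cdot n_f\, \zeta\, ds + \langle v_p\cdot n_p, \zeta\rangle_\Gamma$,
and $a(\cdot, \cdot) : X \times X \to \mathbb{R}$, $b(\cdot, \cdot, \cdot) : X \times M \times \mathbb{R}^2 \to \mathbb{R}$ as
(2.34) $a(u, v) := a_f(u_f, v_f) + a_p(u_p, v_p)$ and $b(v, q, \gamma) := b_f(v_f, q_f, \gamma_1) + b_p(v_p, q_p, \gamma_2)$.
We then state the coupled fluid flow problem as follows: Given $f \in X^*$, $fr \in \mathbb{R}$, determine $(u, p, \lambda, \beta) \in X \times M \times W^{1/r,r'}(\Gamma) \times \mathbb{R}^2$ such that
(2.35) $a(u, v) - b(v, p, \beta) + b_I(v, \lambda) = (f, v)$ $\forall v \in X$,
(2.36) $b(u, q, \gamma) - b_I(u, \zeta) = \gamma\cdot (-1, 1)^T fr$ $\forall (q, \zeta, \gamma) \in M \times W^{1/r,r'}(\Gamma) \times \mathbb{R}^2$.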
The unique solvability of (2.35)–(2.36) hinges upon showing two inf-sup conditions: one for b(·, ·, ·) and the other for bI (·, ·).
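As a rough illustration of what an inf-sup condition measures, the following sketch computes the inf-sup constant of a bilinear form in a finite-dimensional Hilbert-space setting. This is only an analogue: the analysis in this paper works in the Banach spaces $W^{1,r}$ and $L^{r'}$, where no such singular value characterization is available. The function name `discrete_inf_sup` and the matrices `B`, `M_V`, `M_Q` are hypothetical and are not part of the paper; the form is assumed to be $b(v, q) = q^T B v$, with norms $\|v\|_V^2 = v^T M_V v$ and $\|q\|_Q^2 = q^T M_Q q$ given by symmetric positive definite matrices.

```python
import numpy as np


def discrete_inf_sup(B, M_V, M_Q):
    """Inf-sup constant of b(v, q) = q^T B v in a Hilbert-space setting.

    B has shape (dim Q, dim V); M_V and M_Q are symmetric positive
    definite Gram matrices defining the norms on V and Q.
    (All names here are illustrative assumptions, not the paper's notation.)
    """
    m, n = B.shape
    if m > n:
        # More multipliers than velocities: some q != 0 is orthogonal
        # to every v, so the inf-sup constant is zero.
        return 0.0
    L_V = np.linalg.cholesky(M_V)  # M_V = L_V L_V^T
    L_Q = np.linalg.cholesky(M_Q)  # M_Q = L_Q L_Q^T
    A = np.linalg.solve(L_Q, B)          # L_Q^{-1} B
    A = np.linalg.solve(L_V, A.T).T      # L_Q^{-1} B L_V^{-T}
    # The inf-sup constant is the smallest of the m singular values.
    return float(np.linalg.svd(A, compute_uv=False)[-1])


if __name__ == "__main__":
    B = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
    print(discrete_inf_sup(B, np.eye(3), np.eye(2)))
```

A strictly positive return value means that the discrete analogue of the inf-sup condition holds for the given pair of spaces; a value of zero means that some nonzero multiplier cannot be detected by any test function.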
Equivalence of the differential equations and variational formulations. As demonstrated above, the variational formulation (2.35)–(2.36) was obtained by multiplying the differential equations by sufficiently smooth functions, integrating over the domain, and, where appropriate, applying Green's theorem. We also used (2.21)–(2.22) to impose the specified flow rate boundary conditions. For a smooth solution, the steps used in deriving the variational equations can be reversed to show that equations (2.1)–(2.5), (2.6)–(2.9) are satisfied. In addition we have that a smooth solution of (2.35)–(2.36) satisfies the following additional boundary conditions (see [7]). For $n_f$, the outward normal on $\Gamma_{in}$, express the extra stress vector on $\Gamma_{in}$, $\sigma n_f$, as $\sigma n_f = s_n n_f + s_T$, where $s_n = (\sigma n_f)\cdot n_f$ and $s_T = \sigma n_f - s_n n_f$. The scalar $s_n$ represents the magnitude of the extra stress in the outward normal direction to $\Gamma_{in}$, and $s_T$ the component of the extra stress vector which lies in the plane of $\Gamma_{in}$.
Lemma 2.1. Any smooth solution of (2.35), (2.36) satisfies the following boundary conditions:
(2.37) on $\Gamma_{in}$, $-p_f + s_n = -\beta_{in}$ and $s_T = 0$;
(2.38) on $\Gamma_{out}$, $p_p = -\beta_{out}$.
Proof. The proof follows as in [7].
Remark. Equations (2.1)–(2.5), (2.6)–(2.9), (2.10)–(2.12) do not uniquely define a solution, but rather a set of solutions. The variational formulation (2.35)–(2.36) chooses a solution from the solution set. Specifically, (2.35)–(2.36) chooses the solution which satisfies (2.37)–(2.38). A different variational formulation may result in the selection of a different solution from the solution set. (See, for example, [7].)
3. Existence and uniqueness of the variational formulation. In order to show the existence and uniqueness of the variational formulation, we introduce the following subspaces of $X$:
(3.1) $V := \{ v \in X : b_I(v, \zeta) = 0\ \forall \zeta \in W^{1/r,r'}(\Gamma) \}$,
(3.2) $Z := \{ v \in V : b(v, q, \gamma) = 0\ \forall (q, \gamma) \in M \times \mathbb{R}^2 \}$.
Consider $b(\cdot, \cdot, \cdot) : X \times M \times \mathbb{R}^2 \to \mathbb{R}$ defined in (2.34). Using Hölder's inequality together with the definition (2.23), we have that $b(\cdot, \cdot, \cdot)$ is continuous. In addition, $b(\cdot, \cdot, \cdot)$ satisfies the following inf-sup condition.
Lemma 3.1. There exists $C_{MRV} > 0$ such that
(3.3) $\inf_{(0,0)\ne(q,\gamma)\in M\times\mathbb{R}^2}\ \sup_{u\in V} \frac{b(u, q, \gamma)}{\|u\|_X\, \|(q, \gamma)\|_{M\times\mathbb{R}^2}} \ge C_{MRV}$,
where $\|(q, \gamma)\|_{M\times\mathbb{R}^2} := \|q\|_M + \|\gamma\|_{\mathbb{R}^2}$.
Proof. Fix $(q, \gamma) \in M \times \mathbb{R}^2$ and let
(3.4) $\hat{q} := \frac{|q|^{r'/r-1} q}{\|q\|_M^{r'-1}}$, $\hat{\gamma} := \frac{\gamma}{\|\gamma\|_{\mathbb{R}^2}}$.
Note that $\int_\Omega q\, \hat{q}\, d\Omega = \|q\|_M$, $\|\hat{q}\|_{L^r(\Omega)} = 1$, $\gamma\cdot\hat{\gamma} = \|\gamma\|_{\mathbb{R}^2}$, and $\|\hat{\gamma}\|_{\mathbb{R}^2} = 1$.
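Let $\Gamma_i^m \subset \Gamma_i$ be such that $\mathrm{meas}(\Gamma_i^m) > 0$ and $\mathrm{dist}(\Gamma_i^m, \partial\Omega \setminus \Gamma_i) > 0$ for $i = in, out$. Let $h \in C(\partial\Omega) \subset W^{1/r',r}(\partial\Omega)$ be given by
$$ h|_{\Gamma_i^m} := \hat{\gamma}_i / \mathrm{meas}(\Gamma_i^m), \quad i = in, out, \qquad h|_{\partial\Omega \setminus (\Gamma_{in} \cup \Gamma_{out})} := 0, $$
and on $\Gamma_i \setminus \Gamma_i^m$ h is either a strictly increasing or strictly decreasing function. Also, let $\delta \in \mathbb{R}$ be given by
$$ \delta := \Big( \int_{\partial\Omega} h \, ds - \int_\Omega \hat{q} \, dA \Big) / \mathrm{meas}(\Omega). $$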
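From [8, p. 127], given $f \in L^r(\Omega)$, $a \in W^{1-1/r,r}(\partial\Omega)$, $1 < r < \infty$, satisfying
(3.5) $$ \int_\Omega f \, dA = \int_{\partial\Omega} a \cdot n \, ds, $$
there exists $v \in W^{1,r}(\Omega)$ such that
(3.6) $$ \nabla \cdot v = f \quad \text{in } \Omega, $$
(3.7) $$ v = a \quad \text{on } \partial\Omega, $$
with
(3.8) $$ \|v\|_{W^{1,r}(\Omega)} \le C \big( \|f\|_{L^r(\Omega)} + \|a\|_{W^{1-1/r,r}(\partial\Omega)} \big). $$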
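Let $f = \hat{q} + \delta$, and for $\{n, t_i, i = 1, \ldots, n-1\}$ denoting an orthonormal system on $\partial\Omega$, let a be defined by $a \cdot n = h$, $a \cdot t_i = 0$, $i = 1, \ldots, n-1$.
Remark. The choice of the constant δ guarantees that the compatibility condition $\int_\Omega f \, d\Omega = \int_{\partial\Omega} a \cdot n \, ds$ is satisfied.
Note that $\|a\|_{W^{1/r',r}(\partial\Omega)} \le C_1$ and $\|\hat{\gamma}\|_{\mathbb{R}^m} = C_1$. Also,
(3.9) $$ \int_\Omega \hat{q} \, dA \le \|\hat{q}\|_{L^r(\Omega)} \|1\|_{L^{r'}(\Omega)} = C_2, $$
(3.10) $$ \int_{\partial\Omega} h \, ds \le \|\hat{\gamma}\|_{\mathbb{R}^2} \|1\|_{\mathbb{R}^2} = C_3, $$
and thus $\|\delta\|_{L^r(\Omega)} \le C_4$. Let $v_f = v|_{\Omega_f}$, $v_p = v|_{\Omega_p}$, where v denotes the solution of (3.6)–(3.7). From (3.8) we have
(3.11) $$ \|v\|_X \le C (1 + C_4 + C_1) \le C_5. $$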
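Also, note that $v_f \in W^{1/r',r}(\partial\Omega_f)$, $v_p \in W^{1/r',r}(\partial\Omega_p)$, and $v_f = v_p$ on Γ. Thus, for $\lambda \in W^{1/r,r'}(\Gamma)$,
$$ \int_\Gamma v_f \cdot n_f \, \lambda \, ds + \langle v_p \cdot n_p, \lambda \rangle_\Gamma = \int_\Gamma v_f \cdot n_f \, \lambda \, ds + \int_\Gamma v_p \cdot n_p \, \lambda \, ds = 0, $$
i.e., $v \in V$.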
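Now,
$$ b(v, q, \gamma) = \int_\Omega q \, \nabla \cdot v \, dA + \gamma_1 \int_{\Gamma_{in}} v \cdot n_f \, ds + \gamma_2 \int_{\Gamma_{out}} v \cdot n_p \, ds $$
$$ \ge \int_\Omega q (\hat{q} + \delta) \, dA + \gamma \cdot \hat{\gamma} $$
$$ = \|q\|_M + \|\gamma\|_{\mathbb{R}^2} = \|(q,\gamma)\|_{M \times \mathbb{R}^2}, $$
as $\int_\Omega q \, \delta \, dA = 0$ for $q \in M$. Thus,
$$ \sup_{u \in V} \frac{b(u, q, \gamma)}{\|(q,\gamma)\|_{M \times \mathbb{R}^2} \|u\|_X} \ge \frac{b(v, q, \gamma)}{\|(q,\gamma)\|_{M \times \mathbb{R}^2} \|v\|_X} \ge \frac{1}{C_5}, $$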
from which (3.3) directly follows.
The required inf-sup condition for $b_I(\cdot,\cdot)$ may be stated as follows.
Lemma 3.2. The bilinear form $b_I(\cdot,\cdot) : X \times W^{1/r,r'}(\Gamma) \to \mathbb{R}$ is continuous. Moreover, there exists $C_{X\Gamma} > 0$ such that
(3.12) $$ \inf_{0 \ne \lambda \in W^{1/r,r'}(\Gamma)} \ \sup_{u \in X} \frac{b_I(u, \lambda)}{\|u\|_X \, \|\lambda\|_{W^{1/r,r'}(\Gamma)}} \ge C_{X\Gamma}. $$
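Proof. The continuity of $b_I(\cdot,\cdot)$ follows from the continuity of the trace operator and definition (2.23). The proof of this inf-sup condition requires that a suitable extension of a functional from $W^{-1/r,r}(\Gamma)$ to $W^{-1/r,r}(\partial\Omega_p)$ be defined. Some of the notation used in this proof is defined in the appendix, where suitable extension operators from Γ to $\partial\Omega_p$ are discussed.
To show (3.12), let $\lambda \in W^{1/r,r'}(\Gamma)$. Then, from the definition of the norm, there exists $f_\Gamma \in W^{-1/r,r}(\Gamma)$, $\|f_\Gamma\|_{W^{-1/r,r}(\Gamma)} = 1$, such that
(3.13) $$ \langle f_\Gamma, \lambda \rangle_\Gamma \ge \frac{1}{2} \|\lambda\|_{W^{1/r,r'}(\Gamma)}. $$
Given $f_\Gamma \in W^{-1/r,r}(\Gamma)$ we can extend it to a functional f in $W^{-1/r,r}(\partial\Omega_p)$ by
(3.14) $$ \langle f, \xi \rangle_{\partial\Omega_p} := \langle f_\Gamma, \xi|_\Gamma \rangle_\Gamma \quad \text{for } \xi \in W^{1/r,r'}(\partial\Omega_p). $$
Note that for $\eta \in W_{00}^{1/r,r'}(\partial\Omega_p \setminus \Gamma)$
$$ \langle f, E^{r'}_{00,\partial\Omega_p \setminus \Gamma} \eta \rangle_{\partial\Omega_p} = \langle f_\Gamma, E^{r'}_{00,\partial\Omega_p \setminus \Gamma} \eta|_\Gamma \rangle_\Gamma = \langle f_\Gamma, 0 \rangle_\Gamma = 0. $$
Thus, from Definition A.3 (see Appendix A), $f|_{\partial\Omega_p \setminus \Gamma} = 0$. Also,
(3.15) $$ \|f\|_{W^{-1/r,r}(\partial\Omega_p)} = \sup_{\xi \in W^{1/r,r'}(\partial\Omega_p)} \frac{\langle f, \xi \rangle_{\partial\Omega_p}}{\|\xi\|_{W^{1/r,r'}(\partial\Omega_p)}} = \sup_{\xi \in W^{1/r,r'}(\partial\Omega_p)} \frac{\langle f_\Gamma, \xi_\Gamma \rangle_\Gamma}{\|\xi\|_{W^{1/r,r'}(\partial\Omega_p)}} \le \sup_{\xi \in W^{1/r,r'}(\partial\Omega_p)} \frac{\|f_\Gamma\|_{W^{-1/r,r}(\Gamma)} \|\xi_\Gamma\|_{W^{1/r,r'}(\Gamma)}}{\|\xi\|_{W^{1/r,r'}(\partial\Omega_p)}} \le \|f_\Gamma\|_{W^{-1/r,r}(\Gamma)} = 1. $$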
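Let $\phi \in W^{1,r'}(\Omega_p)$ be given by the weak solution of
(3.16) $$ -\nabla \cdot \big( |\nabla\phi|^{r'-2} \nabla\phi \big) + |\phi|^{r'-2} \phi = 0 \quad \text{in } \Omega_p, $$
(3.17) $$ |\nabla\phi|^{r'-2} \nabla\phi \cdot n_p = f \quad \text{on } \partial\Omega_p, $$
i.e., φ satisfies
(3.18) $$ (T(\phi), w) := \int_{\Omega_p} \big( |\nabla\phi|^{r'-2} \nabla\phi \cdot \nabla w + |\phi|^{r'-2} \phi \, w \big) \, dA = \int_{\partial\Omega_p} f \, w \, ds \quad \forall w \in W^{1,r'}(\Omega_p). $$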
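Existence and uniqueness of φ follow from the strong monotonicity of $T : W^{1,r'}(\Omega_p) \to \big( W^{1,r'}(\Omega_p) \big)^*$. Note that
$$ (T(\phi), \phi) = \|\phi\|^{r'}_{W^{1,r'}(\Omega_p)} \le \|f\|_{W^{-1/r,r}(\partial\Omega_p)} \|\phi\|_{W^{1/r,r'}(\partial\Omega_p)} \le C_1 \|f\|_{W^{-1/r,r}(\partial\Omega_p)} \|\phi\|_{W^{1,r'}(\Omega_p)} $$
(3.19) $$ \Longrightarrow \ \|\phi\|^{r'}_{W^{1,r'}(\Omega_p)} \le C_* \|f\|^{r}_{W^{-1/r,r}(\partial\Omega_p)} \le C_*, $$
as $\|f\|_{W^{-1/r,r}(\partial\Omega_p)} \le 1$. Now, let $v := |\nabla\phi|^{r'-2} \nabla\phi$. Note from (3.16) that $\nabla \cdot v = |\phi|^{r'-2} \phi$, and
(3.20) $$ \|v\|^{r}_{W^{0,r}(\mathrm{div},\Omega_p)} = \|\phi\|^{r'}_{W^{1,r'}(\Omega_p)} \le C_*, $$
i.e., $v \in W^{0,r}(\mathrm{div}, \Omega_p)$ and $v \cdot n_p \in W^{-1/r,r}(\partial\Omega_p)$. Finally, let $w = (0, v) \in X$. Then, in view of (2.23),
$$ \sup_{u \in X} \frac{b_I(u, \lambda)}{\|u\|_X} \ge \frac{b_I(w, \lambda)}{\|w\|_X} = \frac{0 + \langle v \cdot n_p, \lambda \rangle_\Gamma}{\|v\|_{W^{0,r}(\mathrm{div},\Omega_p)}} $$
$$ \ge \frac{1}{C_*^{1/r}} \langle v \cdot n_p, E_\Gamma^{r'} \lambda \rangle_{\partial\Omega_p} $$
$$ = \frac{1}{C_*^{1/r}} \langle f, E_\Gamma^{r'} \lambda \rangle_{\partial\Omega_p} $$
$$ = \frac{1}{C_*^{1/r}} \langle f_\Gamma, \lambda \rangle_\Gamma \quad \text{as } f|_{\partial\Omega_p \setminus \Gamma} = 0 \ \text{(see (A.7))} $$
$$ \ge \frac{1}{2 C_*^{1/r}} \|\lambda\|_{W^{1/r,r'}(\Gamma)} \quad \text{from (3.13).} $$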
We are now in a position to prove the existence and uniqueness of the solution.
Theorem 3.3. There exists a unique solution $(u, p, \lambda, \beta) \in X \times M \times W^{1/r,r'}(\Gamma) \times \mathbb{R}^2$ satisfying (2.35)–(2.36). In addition, there exists a constant C > 0 such that
(3.21) $$ \|u\|_X \le C \big( \|f_f\|_{X_f^*} + |fr| \big). $$
Proof. For $v = (v_1, v_2) \in Z$, note that $\nabla \cdot v_1 = 0$ a.e. in $\Omega_f$ and $\nabla \cdot v_2 = 0$ a.e. in $\Omega_p$. Hence, for $v \in Z$, $\|v_2\|_{X_p} = \|v_2\|_{L^r(\Omega_p)}$ and $\|v\|_X = \|v_1\|_{X_f} + \|v_2\|_{L^r(\Omega_p)}$. From the continuity and inf-sup condition for $b(\cdot,\cdot,\cdot)$ [10, Remark 4.2, p. 61] there exists $u_0 \in V$ such that
(3.22) $$ b(u_0, q, \gamma) = \gamma \cdot \begin{pmatrix} -1 \\ 1 \end{pmatrix} fr \quad \forall (q, \gamma) \in M \times \mathbb{R}^2, \qquad \text{with} \quad \|u_0\|_X \le C |fr|. $$
Together with the continuity and inf-sup condition of $b_I(\cdot,\cdot)$, the existence and uniqueness of the solution to (2.35)–(2.36) can be equivalently stated as follows: Given $f \in X^*$, determine $\tilde{u} \in Z$, $u = \tilde{u} + u_0$, such that
(3.23) $$ a(\tilde{u} + u_0, v) = (f, v) \quad \forall v \in Z. $$
The existence and uniqueness of the solution to (3.23) follows from the continuity and strict monotonicity of $a(\cdot,\cdot)$ on $Z \times Z$, which follows from assumptions (2.16)–(2.17) and the restriction that for $\Omega \subset \mathbb{R}^2$, $4/3 < r \le 2$, and for $\Omega \subset \mathbb{R}^3$, $3/2 < r \le 2$. This restriction arises in applying the Sobolev embedding theorem to verify the continuity of $a(\cdot,\cdot)$. Specifically,
$$ \sum_{l=1}^{n-1} \int_\Gamma c_{srl}^{-1} \, ((u_f - w_f) \cdot t_l) \, (v_f \cdot t_l) \, ds \le C \|u_f - w_f\|_{L^2(\Gamma)} \|v_f\|_{L^2(\Gamma)} \le C \|u_f - w_f\|_{W^{1-1/r,r}(\partial\Omega_f)} \|v_f\|_{W^{1-1/r,r}(\partial\Omega_f)} \le C \|u - w\|_X \|v\|_X. $$
Also, it follows from (2.16), (2.17), and (3.22) that
$$ \|\tilde{u}\|_X \le C \big( \|f\|_{X^*} + |fr| \big) = C \big( \|f_f\|_{X_f^*} + |fr| \big), $$
and therefore we obtain the estimate
$$ \|u\|_X \le C \big( \|f_f\|_{X_f^*} + |fr| \big). $$
4. Finite element approximation. In this section we discuss the finite element approximation to the coupled generalized nonlinear Stokes–Darcy system (2.35), (2.36). We focus our attention on the conforming approximating spaces
$$ X_{f,h} \subset X_f, \quad M_{f,h} \subset M_f, \quad X_{p,h} \subset X_p, \quad M_{p,h} \subset M_p, \quad L_h \subset W^{1/r,r'}(\Gamma), $$
where $X_{f,h}$, $M_{f,h}$ denote velocity and pressure spaces typically used for fluid flow approximations, and $X_{p,h}$, $M_{p,h}$ denote velocity and pressure spaces typically used for (mixed formulation) Darcy flow approximations.
We begin by describing the finite element approximation framework used in the analysis. Let $\Omega_j \subset \mathbb{R}^n$ (n = 2, 3), j = f, p, be a polygonal domain and let $T_{j,h}$ be a triangulation of $\Omega_j$ made of triangles (in $\mathbb{R}^2$) or tetrahedra (in $\mathbb{R}^3$). Thus, the computational domain is defined by $\Omega = \cup K$; $K \in T_{f,h} \cup T_{p,h}$.
We assume that there exist constants $c_1$, $c_2$ such that $c_1 h \le h_K \le c_2 \rho_K$, where $h_K$ is the diameter of triangle (tetrahedron) K, $\rho_K$ is the diameter of the greatest ball (sphere) included in K, and $h = \max_{K \in T_{f,h} \cup T_{p,h}} h_K$. For simplicity, we assume that the triangulations on $\Omega_f$ and $\Omega_p$ induce the same partition on Γ, which we denote $T_{\Gamma,h}$.
Let $P_k(A)$ denote the space of polynomials on A of degree no greater than k. Also, for $x = [x_1, \ldots, x_n]^T \in \mathbb{R}^n$, let $RT_k(A) := (P_k(A))^n + x P_k(A)$ denote the kth order Raviart–Thomas elements. Then we define the finite element spaces as follows:
(4.1) $$ X_{f,h} := \{ v \in X_f \cap C(\Omega_f)^2 : v|_K \in P_m(K) \ \forall K \in T_{f,h} \}, $$
(4.2) $$ M_{f,h} := \{ q \in M_f \cap C(\Omega_f) : q|_K \in P_{m-1}(K) \ \forall K \in T_{f,h} \}, $$
(4.3) $$ X_{p,h} := \{ v \in X_p : v|_K \in RT_k(K) \ \forall K \in T_{p,h} \}, $$
(4.4) $$ M_{p,h} := \{ q \in M_p : q|_K \in P_k(K) \ \forall K \in T_{p,h} \}, $$
(4.5) $$ L_h := \{ \zeta \in W^{1/r,r'}(\Gamma) \cap C(\Gamma) : \zeta|_K \in P_l(K) \ \forall K \in T_{\Gamma,h} \}. $$
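The mesh regularity assumption $c_1 h \le h_K \le c_2 \rho_K$ stated before (4.1) can be evaluated directly for a given triangulation. The following is a minimal sketch for triangles in the plane, assuming each element is given by its three vertex coordinates; the helper names and the two sample triangles are illustrative only and are not part of the paper.

```python
import math


def triangle_shape(p0, p1, p2):
    """Return (h_K, rho_K): the diameter of the triangle (its longest edge)
    and the diameter of the largest inscribed circle."""
    a = math.dist(p1, p2)
    b = math.dist(p0, p2)
    c = math.dist(p0, p1)
    semi_perimeter = 0.5 * (a + b + c)
    area = 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                     - (p2[0] - p0[0]) * (p1[1] - p0[1]))
    h_k = max(a, b, c)
    rho_k = 2.0 * area / semi_perimeter  # inradius = area / semi-perimeter
    return h_k, rho_k


def mesh_constants(triangles):
    """Return (h, c1, c2) with h = max h_K, c1 = min h_K / h, c2 = max h_K / rho_K,
    i.e. the sharpest constants for which c1*h <= h_K <= c2*rho_K on this mesh."""
    shapes = [triangle_shape(*tri) for tri in triangles]
    h = max(h_k for h_k, _ in shapes)
    c1 = min(h_k for h_k, _ in shapes) / h
    c2 = max(h_k / rho_k for h_k, rho_k in shapes)
    return h, c1, c2


# Two hypothetical elements: a 3-4-5 right triangle and a unit right triangle.
mesh = [((0.0, 0.0), (4.0, 0.0), (0.0, 3.0)),
        ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))]
h, c1, c2 = mesh_constants(mesh)
print(h, c1, c2)
```

A family of meshes satisfies the assumption when `c1` stays bounded away from zero and `c2` stays bounded as the mesh is refined.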
Note that as we are assuming 1 < r < 2, then 1/r > 1/2, which implies that, for $\Omega \subset \mathbb{R}^2$, $\lambda \in W^{1/r,r'}(\Gamma)$ is continuous. For m = 2, $X_{f,h}$ and $M_{f,h}$ denote the Taylor–Hood spaces. Below we assume that $m \ge 2$, $k \ge 1$, and $l \le k$. Let
$$ X^0_{f,h} := \{ v \in X_{f,h} : v|_{\partial\Omega_f \setminus \Gamma_{in}} = 0 \} \quad \text{and} \quad X^0_{p,h} := \{ v \in X_{p,h} : v \cdot n_p|_{\partial\Omega_p \setminus \Gamma_{out}} = 0 \}. $$
Lemma 4.1. There exist constants $C_{f,h}, C_{p,h} > 0$, such that
(4.6) $$ \inf_{0 \ne q_h \in M_{f,h}} \ \sup_{v_h \in X^0_{f,h}} \frac{\int_{\Omega_f} q_h \nabla \cdot v_h \, dA}{\|q_h\|_{M_f} \|v_h\|_{X_f}} \ge C_{f,h}, $$
(4.7) $$ \inf_{0 \ne q_h \in M_{p,h}} \ \sup_{v_h \in X^0_{p,h}} \frac{\int_{\Omega_p} q_h \nabla \cdot v_h \, dA}{\|q_h\|_{M_p} \|v_h\|_{X_p}} \ge C_{p,h}. $$
Proof. For the case of the pressure spaces having mean value equal to zero, the inf-sup conditions (4.6) and (4.7) are well established. As mentioned in [14], one can extend the inf-sup conditions to the above pressure spaces via a local projector operator argument. (See [2, section VI.4].)
Remark. There are several other suitable choices of approximation spaces. (See discussions in [14, 9].)
Discrete approximation problem. Given $f \in X^*$, $fr \in \mathbb{R}$, determine $(u_h, p_h, \lambda_h, \beta_h) \in X_h \times M_h \times L_h \times \mathbb{R}^2$ such that
(4.8) $$ a(u_h, v_h) - b(v_h, p_h, \beta_h) + b_I(v_h, \lambda_h) = (f, v_h) \quad \forall v_h \in X_h, $$
(4.9) $$ b(u_h, q_h, \gamma_h) - b_I(u_h, \zeta_h) = \gamma_h \cdot \begin{pmatrix} -1 \\ 1 \end{pmatrix} fr \quad \forall (q_h, \gamma_h, \zeta_h) \in M_h \times \mathbb{R}^2 \times L_h. $$
A more general inf-sup condition than that given by (4.6), (4.7) is needed for the analysis. This is established using the following two lemmas. (See also [24].) Corresponding to V and Z as defined in (3.1) and (3.2), we have the discrete counterparts
(4.10) $$ V_h := \{ v_h \in X_h \ | \ b_I(v_h, \zeta) = 0 \ \forall \zeta \in L_h \}, $$
(4.11) $$ Z_h := \{ v \in V_h \ | \ b(v, q, \gamma) = 0 \ \forall (q, \gamma) \in M_h \times \mathbb{R}^2 \}. $$
Lemma 4.2. There exists $C_{RXh} > 0$ such that for h sufficiently small
(4.12) $$ \inf_{0 \ne \beta \in \mathbb{R}^2} \ \sup_{w_h \in V_h} \frac{\int_{\Gamma_{in}} \beta_1 w_{f,h} \cdot n_f \, ds + \int_{\Gamma_{out}} \beta_2 w_{p,h} \cdot n_p \, ds}{\|w_h\|_X \, \|\beta\|_{\mathbb{R}^2}} \ge C_{RXh}. $$
Proof. We use (3.5)–(3.8) to construct a suitable function v. Then using a linear interpolant for v we obtain the stated result. Assume $\beta = [\beta_1, \beta_2]^T \in \mathbb{R}^2$ is given. For $i \in \{in, out\}$, let $s_i(x)$ denote an arc length parameter on $\Gamma_i$, and define $\phi_i : \partial\Omega \to \mathbb{R}$ by
$$ \phi_i(x) = \begin{cases} \frac{2}{|\Gamma_i|} s_i(x), & x \in \Gamma_i, \ 0 \le s_i(x) \le \frac{|\Gamma_i|}{2}, \\ \frac{2}{|\Gamma_i|} (|\Gamma_i| - s_i(x)), & x \in \Gamma_i, \ \frac{|\Gamma_i|}{2} < s_i(x) \le |\Gamma_i|, \\ 0 & \text{otherwise.} \end{cases} $$
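The function $\phi_i$ is a hat function in the arc length parameter: it rises linearly to 1 at the midpoint of $\Gamma_i$ and vanishes at both ends, so its integral over $\Gamma_i$ equals $|\Gamma_i|/2$. The sketch below checks this numerically, assuming a straight boundary piece parametrized by $s \in [0, |\Gamma_i|]$; the function names and the sample length are illustrative only.

```python
def phi_hat(s, length):
    """Hat function in the arc length parameter s on a boundary piece of given length."""
    half = 0.5 * length
    if 0.0 <= s <= half:
        return 2.0 * s / length
    if half < s <= length:
        return 2.0 * (length - s) / length
    return 0.0


def trapezoid(func, a, b, n):
    """Composite trapezoid rule with n subintervals on [a, b]."""
    step = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * step)
    return total * step


length = 3.0  # hypothetical value of |Gamma_i|
integral = trapezoid(lambda s: phi_hat(s, length), 0.0, length, 100)
peak = phi_hat(0.5 * length, length)
print(integral, peak)
```

With an even number of subintervals the kink at the midpoint falls on a quadrature node, so the trapezoid rule reproduces the value $|\Gamma_i|/2$ up to rounding.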
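Further, let $a \in W^{1-1/r,r}(\partial\Omega)$ and $f \in L^r(\Omega)$ be given by
(4.13) $$ a(x) = (\beta_1 \phi_{in}(x) + \beta_2 \phi_{out}(x)) \, n, \qquad f(x) = \frac{1}{|\Omega|^{1/r}} \int_{\partial\Omega} a \cdot n \, ds, $$
where n denotes the outward-pointing unit normal to Ω. Note that
$$ \|a\|_{W^{1-1/r,r}(\partial\Omega)} \le |\beta_1| \, \|\phi_{in} n\|_{W^{1-1/r,r}(\partial\Omega)} + |\beta_2| \, \|\phi_{out} n\|_{W^{1-1/r,r}(\partial\Omega)} \le C \|\beta\|_{\mathbb{R}^2} $$
and
$$ \|f\|_{L^r(\Omega)} \le (|\beta_1| \, |\Gamma_{in}| + |\beta_2| \, |\Gamma_{out}|)/2 \le C \|\beta\|_{\mathbb{R}^2}. $$
With a and f given by (4.13), let v be given by (3.6), (3.7), and $v_{f,h} = I_h(v)|_{\Omega_f}$, $v_{p,h} = I_h(v)|_{\Omega_p}$, where $I_h(v)$ denotes a continuous linear interpolant of v with respect to $T_{f,h} \cup T_{p,h}$. Note that $v_h = (v_{f,h}, v_{p,h}) \in V_h$ and
$$ \|v - v_h\|_{W^{s,r}(\Omega)} \le C h^{1-s} \|v\|_{W^{1,r}(\Omega)}, \quad s = 0, 1, \qquad \|v - v_h\|_{W^{0,r}(\partial\Omega)} \le C h^{r} \|v\|_{W^{1,r}(\Omega)}. $$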
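Then, for h sufficiently small,
$$ \sup_{w_h \in X_h} \frac{\int_{\Gamma_{in}} \beta_1 w_{f,h} \cdot n_f \, ds + \int_{\Gamma_{out}} \beta_2 w_{p,h} \cdot n_p \, ds}{\|w_h\|_X} \ge \frac{\int_{\Gamma_{in}} \beta_1 v_{f,h} \cdot n_f \, ds + \int_{\Gamma_{out}} \beta_2 v_{p,h} \cdot n_p \, ds}{\|v_h\|_X} $$
$$ \ge C \, \frac{\int_{\Gamma_{in}} \beta_1 v_f \cdot n_f \, ds + \int_{\Gamma_{out}} \beta_2 v_p \cdot n_p \, ds + \int_{\Gamma_{in}} \beta_1 (v_{f,h} - v_f) \cdot n_f \, ds + \int_{\Gamma_{out}} \beta_2 (v_{p,h} - v_p) \cdot n_p \, ds}{\|v\|_X} $$
$$ \ge C_1 \|\beta\|_{\mathbb{R}^2} - C_2 h^{r} \|\beta\|_{\mathbb{R}^2}, $$
from which (4.12) follows.
Lemma 4.3. For h sufficiently small, there exists $C_{bh} > 0$ such that
(4.14) $$ \inf_{(0,0) \ne (q_h, \beta) \in M_h \times \mathbb{R}^2} \ \sup_{v_h \in V_h} \frac{b(v_h, (q_h, \beta))}{\|v_h\|_X \, \|(q_h, \beta)\|_{M \times \mathbb{R}^2}} \ge C_{bh}. $$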
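Proof. Let $(p_h, \beta) \in M_h \times \mathbb{R}^2$. From Lemma 4.2, there exists $\hat{u}_h \in X_h$ such that
(4.15) $$ \|\hat{u}_h\|_X = \|\beta\|_{\mathbb{R}^2} \quad \text{and} \quad \frac{\int_{\Gamma_{in}} \beta_1 \hat{u}_{f,h} \cdot n_f \, ds + \int_{\Gamma_{out}} \beta_2 \hat{u}_{p,h} \cdot n_p \, ds}{\|\hat{u}_h\|_X} \ge C_{RXh} \|\beta\|_{\mathbb{R}^2}. $$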
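Consider the following two problems.
Problem 1. Discrete power law problem in $\Omega_f$. Determine $\tilde{u}_{f,h} \in X^0_{f,h}$, $\tilde{p}_{f,h} \in M_{f,h}$ such that
(4.16) $$ (|d(\tilde{u}_{f,h})|^{r-2} d(\tilde{u}_{f,h}), d(v)) - (\tilde{p}_{f,h}, \nabla \cdot v) = 0 \quad \forall v \in X^0_{f,h}, $$
(4.17) $$ (q, \nabla \cdot \tilde{u}_{f,h}) = \big( q, \ \|p_{f,h}\|^{1-r'/r}_{M_f} |p_{f,h}|^{r'/r-1} p_{f,h} - \nabla \cdot \hat{u}_{f,h} \big) \quad \forall q \in M_{f,h}. $$
Problem 2. Modified Darcy problem in $\Omega_p$. Determine $\tilde{u}_{p,h} \in X^0_{p,h}$, $\tilde{p}_{p,h} \in M_{p,h}$ such that
(4.18) $$ (|\tilde{u}_{p,h}|^{r-2} \tilde{u}_{p,h}, v) - (\tilde{p}_{p,h}, \nabla \cdot v) = 0 \quad \forall v \in X^0_{p,h}, $$
(4.19) $$ (q, \nabla \cdot \tilde{u}_{p,h}) = \big( q, \ \|p_{p,h}\|^{1-r'/r}_{M_p} |p_{p,h}|^{r'/r-1} p_{p,h} - \nabla \cdot \hat{u}_{p,h} \big) \quad \forall q \in M_{p,h}. $$
Note that
$$ \|p_{j,h}\|^{1-r'/r}_{M_j} |p_{j,h}|^{r'/r-1} p_{j,h} - \nabla \cdot \hat{u}_{j,h} \in L^r(\Omega_j), \quad j = f, p. $$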
Existence and uniqueness of $\tilde{u}_{f,h} \in X^0_{f,h}$, $\tilde{p}_{f,h} \in M_{f,h}$ and $\tilde{u}_{p,h} \in X^0_{p,h}$, $\tilde{p}_{p,h} \in M_{p,h}$ satisfying (4.16), (4.17) and (4.18), (4.19), respectively, follow from the inf-sup conditions (4.6), (4.7) and the strong monotonicity of $T : X \to X^*$, $(T(\phi), \psi) := \int |\phi|^{r-2} \phi \cdot \psi \, dA$.
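From (4.16) and (4.17), choosing $v = \tilde{u}_{f,h}$ and $q = \tilde{p}_{f,h}$,
$$ \|\tilde{u}_{f,h}\|^r_{X_f} = (|d(\tilde{u}_{f,h})|^{r-2} d(\tilde{u}_{f,h}), d(\tilde{u}_{f,h})) = (\tilde{p}_{f,h}, \nabla \cdot \tilde{u}_{f,h}) $$
$$ = \big( \tilde{p}_{f,h}, \ \|p_{f,h}\|^{1-r'/r}_{M_f} |p_{f,h}|^{r'/r-1} p_{f,h} - \nabla \cdot \hat{u}_{f,h} \big) $$
$$ \le \|\tilde{p}_{f,h}\|_{M_f} \big( \|p_{f,h}\|^{1-r'/r}_{M_f} \, \| |p_{f,h}|^{r'/r-1} p_{f,h} \|_{L^r} + \|\nabla \cdot \hat{u}_{f,h}\|_{L^r} \big) $$
$$ \le \|\tilde{p}_{f,h}\|_{M_f} \big( \|p_{f,h}\|_{M_f} + C \|\hat{u}_{f,h}\|_{X_f} \big) $$
(4.20) $$ \le C \|\tilde{p}_{f,h}\|_{M_f} \big( \|p_{f,h}\|_{M_f} + \|\beta\|_{\mathbb{R}^2} \big). $$
Also, from the inf-sup condition for spaces $X^0_{f,h}$ and $M_{f,h}$ we have
$$ c \|\tilde{p}_{f,h}\|_{M_f} \le \sup_{v \in X^0_{f,h}} \frac{(\tilde{p}_{f,h}, \nabla \cdot v)}{\|v\|_{X_f}} = \sup_{v \in X^0_{f,h}} \frac{(|d(\tilde{u}_{f,h})|^{r-2} d(\tilde{u}_{f,h}), d(v))}{\|v\|_{X_f}} $$
$$ \le \sup_{v \in X^0_{f,h}} \frac{\| |d(\tilde{u}_{f,h})|^{r-2} d(\tilde{u}_{f,h}) \|_{L^{r'}} \|d(v)\|_{L^r}}{\|v\|_{X_f}} = \| |d(\tilde{u}_{f,h})|^{r-2} d(\tilde{u}_{f,h}) \|_{L^{r'}} $$
(4.21) $$ = \|\tilde{u}_{f,h}\|^{r/r'}_{X_f}. $$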
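Combining (4.20) and (4.21) we have the estimate
(4.22) $$ \|\tilde{u}_{f,h}\|_{X_f} \le C \big( \|p_{f,h}\|_{M_f} + \|\beta\|_{\mathbb{R}^2} \big). $$
Proceeding in a similar fashion for $\tilde{u}_{p,h}$ satisfying Problem 2 leads to the estimate
(4.23) $$ \|\tilde{u}_{p,h}\|_{X_p} \le C \big( \|p_{p,h}\|_{M_p} + \|\beta\|_{\mathbb{R}^2} \big). $$
Let $u_{j,h} = \tilde{u}_{j,h} + \hat{u}_{j,h}$, j = f, p. Note that as $u_{f,h} = 0$ on Γ and $u_{p,h} \cdot n_p = 0$ on Γ, $u_h \in V_h$. Then, using (4.17), (4.19), and (4.12),
$$ b(u_h, (p_h, \beta)) = \int_{\Omega_f} p_{f,h} \nabla \cdot u_{f,h} \, dA + \int_{\Omega_p} p_{p,h} \nabla \cdot u_{p,h} \, dA + \beta_1 \int_{\Gamma_{in}} u_{f,h} \cdot n_f \, ds + \beta_2 \int_{\Gamma_{out}} u_{p,h} \cdot n_p \, ds $$
$$ = \int_{\Omega_f} p_{f,h} \, \|p_{f,h}\|^{1-r'/r}_{M_f} |p_{f,h}|^{r'/r-1} p_{f,h} \, dA + \int_{\Omega_p} p_{p,h} \, \|p_{p,h}\|^{1-r'/r}_{M_p} |p_{p,h}|^{r'/r-1} p_{p,h} \, dA + \beta_1 \int_{\Gamma_{in}} \hat{u}_{f,h} \cdot n_f \, ds + \beta_2 \int_{\Gamma_{out}} \hat{u}_{p,h} \cdot n_p \, ds $$
(4.24) $$ \ge c \big( \|p_h\|^2_M + \|\beta\|^2_{\mathbb{R}^2} \big). $$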
Thus, using (4.24), (4.22), and (4.23), we have
$$ \sup_{v_h \in X_h} \frac{b(v_h, (p_h, \beta))}{\|v_h\|_X} \ge \frac{b(u_h, (p_h, \beta))}{\|u_h\|_X} \ge C \big( \|p_h\|_M + \|\beta\|_{\mathbb{R}^2} \big), $$
from which (4.14) immediately follows.
The discrete inf-sup condition for $b_I(\cdot,\cdot)$ follows from the continuous inf-sup condition and the existence of a bounded interpolation operator $I_{p,h} : X_p \to X_{p,h}$ satisfying, for some α > 0,
(4.25) $$ \|(w - I_{p,h}(w)) \cdot n_p\|_{W^{-1/r,r}(\partial\Omega_p)} \le C_{ap} h^\alpha \|w\|_{X_p} \quad \text{and} \quad \|I_{p,h}(w)\|_{X_p} \le C_{ip} \|w\|_{X_p}. $$
Lemma 4.4. There exists $C_{X\Gamma h} > 0$ such that for h sufficiently small
(4.26) $$ \inf_{0 \ne \lambda_h \in L_h} \ \sup_{u_h \in X_h} \frac{b_I(u_h, \lambda_h)}{\|u_h\|_X \, \|\lambda_h\|_{W^{1/r,r'}(\Gamma)}} \ge C_{X\Gamma h}. $$
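Proof. With $\lambda = \lambda_h$, let $v_p \in W^{0,r}(\mathrm{div}, \Omega_p)$ be as defined by (3.16)–(3.20), and let $v_{p,h} = I_{R\text{-}T}(v_p) \in X_{p,h}$ denote the Raviart–Thomas interpolant of $v_p$. Further, let $v_h = (0, v_{p,h}) \in X_h$. Then
$$ \sup_{u_h \in X_h} \frac{b_I(u_h, \lambda_h)}{\|u_h\|_X} \ge \frac{b_I(v_h, \lambda_h)}{\|v_h\|_X} = \frac{0 + \langle v_{p,h} \cdot n_p, \lambda_h \rangle_\Gamma}{\|v_{p,h}\|_{W^{0,r}(\mathrm{div},\Omega_p)}} $$
$$ = \frac{\langle v_p \cdot n_p, \lambda_h \rangle_\Gamma}{\|v_{p,h}\|_{W^{0,r}(\mathrm{div},\Omega_p)}} + \frac{\langle (v_{p,h} - v_p) \cdot n_p, \lambda_h \rangle_\Gamma}{\|v_{p,h}\|_{W^{0,r}(\mathrm{div},\Omega_p)}} $$
$$ \ge \frac{\langle v_p \cdot n_p, \lambda_h \rangle_\Gamma}{C \|v_p\|_{W^{0,r}(\mathrm{div},\Omega_p)}} + \frac{\langle (v_{p,h} - v_p) \cdot n_p, E_\Gamma^{r'} \lambda_h \rangle_{\partial\Omega_p}}{\|v_{p,h}\|_{W^{0,r}(\mathrm{div},\Omega_p)}} $$
$$ \ge \frac{1}{2C} \|\lambda\|_{W^{1/r,r'}(\Gamma)} + \frac{\langle (v_{p,h} - v_p) \cdot n_p, E_\Gamma^{r'} \lambda_h \rangle_{\partial\Omega_p}}{\|v_{p,h}\|_{W^{0,r}(\mathrm{div},\Omega_p)}}. $$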
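With $\lambda = \lambda_h$ let $\varphi$ be given by (A.1)–(A.3), and let $\varphi_h = I(\varphi)$ denote a continuous linear interpolant of $\varphi$ with respect to $T_{p,h}$. Note that $\lambda_h = \varphi_h$ on Γ and $\Gamma_{out}$. Now,
$$ \langle (v_{p,h} - v_p) \cdot n_p, E_\Gamma^{r'} \lambda_h \rangle_{\partial\Omega_p} = \langle (v_{p,h} - v_p) \cdot n_p, \varphi_h \rangle_{\partial\Omega_p} + \langle (v_{p,h} - v_p) \cdot n_p, (E_\Gamma^{r'} \lambda_h - \varphi_h) \rangle_{\partial\Omega_p} $$
$$ = 0 + \langle v_{p,h} \cdot n_p, (E_\Gamma^{r'} \lambda_h - \varphi_h) \rangle_{\partial\Omega_p} - \langle v_p \cdot n_p, (E_\Gamma^{r'} \lambda_h - \varphi_h) \rangle_{\partial\Omega_p}. $$
As $E_\Gamma^{r'} \lambda_h - \varphi_h = 0$ on $\partial\Omega_p \setminus \Gamma_p$ and $v_p \cdot n_p|_{\Gamma_p} = 0$, then $\langle v_p \cdot n_p, (E_\Gamma^{r'} \lambda_h - \varphi_h) \rangle_{\partial\Omega_p} = 0$. Further, as $v_{p,h} \cdot n_p = 0$ on $\Gamma_p$, $\langle v_{p,h} \cdot n_p, (E_\Gamma^{r'} \lambda_h - \varphi_h) \rangle_{\partial\Omega_p} = 0$, from which (4.26) then follows.
We now state and prove the existence and uniqueness of solutions to (4.8)–(4.9).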
Theorem 4.5. There exists a unique solution $(u_h, p_h, \lambda_h, \beta_h) \in X_h \times M_h \times L_h \times \mathbb{R}^2$ satisfying (4.8)–(4.9). In addition, there exists a constant C > 0 such that
(4.27) $$ \|u_h\|_X \le C \big( \|f_f\|_{X_f^*} + |fr| \big). $$
Proof. With the inf-sup conditions given in (4.14) and (4.26), the existence and uniqueness follows exactly as for the continuous problem in Theorem 3.3. The norm estimate for $u_h$ follows in a similar manner to that for u and uses the property that $\nabla \cdot X_{p,h} \subset M_{p,h}$.
4.1. A priori error estimate. Next we investigate the error between the solution of the continuous variational formulation and its discrete counterpart.
Theorem 4.6. Let
$$ E(u, u_h) = \left\| \frac{|d(u_f) - d(u_{f,h})|}{c + |d(u_f)| + |d(u_{f,h})|} \right\|^{\frac{2-r}{r}}_{L^\infty(\Omega_f)} + \left\| \frac{|u_p - u_{p,h}|}{c + |u_p| + |u_{p,h}|} \right\|^{\frac{2-r}{r}}_{L^\infty(\Omega_p)} $$
and
$$ G(u, u_h) = \int_{\Omega_f} |g_f(d(u_f)) d(u_f) - g_f(d(u_{f,h})) d(u_{f,h})| \, |d(u_f) - d(u_{f,h})| \, dA + \int_{\Omega_p} |g_p(u_p) u_p - g_p(u_{p,h}) u_{p,h}| \, |u_p - u_{p,h}| \, dA. $$
Then for $(u, p, \lambda, \beta)$ satisfying (2.35)–(2.36) and $(u_h, p_h, \lambda_h, \beta_h)$ satisfying (4.8)–(4.9), and h sufficiently small, there exists a constant C > 0 such that
(4.28) $$ \|u - u_h\|^2_X + G(u, u_h) \le C \Big( \inf_{v_h \in X_h} \big( \|u - v_h\|^2_X + E(u, u_h)^r \|u - v_h\|^r_X \big) + \inf_{q_h \in M_h} \|p - q_h\|^2_M + \inf_{\zeta_h \in L_h} \|\lambda - \zeta_h\|_{W^{1/r,r'}(\Gamma)} \Big), $$
(4.29) $$ \|p - p_h\|_M + \|\beta - \beta_h\|_{\mathbb{R}^2} + \|\lambda - \lambda_h\|_{W^{1/r,r'}(\Gamma)} \le C \Big( E(u, u_h) \, G(u, u_h)^{1/r'} + \inf_{q_h \in M_h} \|p - q_h\|_M + \inf_{\zeta_h \in L_h} \|\lambda - \zeta_h\|_{W^{1/r,r'}(\Gamma)} \Big). $$
Note that the constant C in Theorem 4.6 may depend upon $\|u\|_X$. The following combined inf-sup condition is used in the proof of Theorem 4.6.
Lemma 4.7. There exists a constant $C_c > 0$ such that
(4.30) $$ \inf_{(0,0,0) \ne (q_h, \zeta_h, \gamma_h) \in M_h \times L_h \times \mathbb{R}^2} \ \sup_{v_h \in X_h} \frac{b(v_h, q_h, \gamma_h) - b_I(v_h, \zeta_h)}{\big( \|q_h\|_M + \|\zeta_h\|_{W^{1/r,r'}(\Gamma)} + \|\gamma_h\|_{\mathbb{R}^2} \big) \|v_h\|_X} \ge C_c. $$
Proof. As $b(\cdot,\cdot,\cdot)$ and $b_I(\cdot,\cdot)$ are continuous and satisfy inf-sup conditions (4.14) and (4.26), the inf-sup condition (4.30) follows immediately. (See Theorem B.1 in Appendix B.)
Proof of Theorem 4.6. Introduce the affine subspace $\tilde{Z}_h$ defined by
$$ \tilde{Z}_h := \{ (q_h, \zeta_h, \gamma_h) \in M_h \times L_h \times \mathbb{R}^2 : -b(v_h, q_h, \gamma_h) + b_I(v_h, \zeta_h) = (f, v_h) - a(u_h, v_h) \ \forall v_h \in X_h \}. $$
Note that $(p_h, \lambda_h, \beta_h) \in \tilde{Z}_h$.
For $u_{f,h}$, from (2.16)
$$ \frac{\|d(u_f) - d(u_{f,h})\|^2_{L^r(\Omega_f)}}{c + \|d(u_f)\|^{2-r}_{L^r(\Omega_f)} + \|d(u_{f,h})\|^{2-r}_{L^r(\Omega_f)}} + \int_{\Omega_f} |g_f(d(u_f)) d(u_f) - g_f(d(u_{f,h})) d(u_{f,h})| \, |d(u_f) - d(u_{f,h})| \, dA $$
$$ \le C \int_{\Omega_f} (g_f(d(u_f)) d(u_f) - g_f(d(u_{f,h})) d(u_{f,h})) : (d(u_f) - d(u_{f,h})) \, dA $$
$$ = C \int_{\Omega_f} (g_f(d(u_f)) d(u_f) - g_f(d(u_{f,h})) d(u_{f,h})) : (d(u_f) - d(v_{f,h})) \, dA + C \int_{\Omega_f} (g_f(d(u_f)) d(u_f) - g_f(d(u_{f,h})) d(u_{f,h})) : (d(v_{f,h}) - d(u_{f,h})) \, dA $$
$$ = I_1 + I_2. $$
To estimate $I_1$ we use (2.17).
$$ \int_{\Omega_f} (g_f(d(u_f)) d(u_f) - g_f(d(u_{f,h})) d(u_{f,h})) : (d(u_f) - d(v_{f,h})) \, dA $$
$$ \le C \Big( \int_{\Omega_f} |g_f(d(u_f)) d(u_f) - g_f(d(u_{f,h})) d(u_{f,h})| \, |d(u_f) - d(u_{f,h})| \, dA \Big)^{1/r'} \cdot \left\| \frac{|d(u_f) - d(u_{f,h})|}{c + |d(u_f)| + |d(u_{f,h})|} \right\|^{\frac{2-r}{r}}_\infty \|d(u_f) - d(v_{f,h})\|_{L^r(\Omega_f)} $$
$$ \le \epsilon_1 \int_{\Omega_f} |g_f(d(u_f)) d(u_f) - g_f(d(u_{f,h})) d(u_{f,h})| \, |d(u_f) - d(u_{f,h})| \, dA + C \left\| \frac{|d(u_f) - d(u_{f,h})|}{c + |d(u_f)| + |d(u_{f,h})|} \right\|^{\frac{2-r}{r} r}_\infty \|d(u_f) - d(v_{f,h})\|^r_{L^r(\Omega_f)}. $$
Thus we have that
$$ \frac{\|d(u_f) - d(u_{f,h})\|^2_{L^r(\Omega_f)}}{c + \|d(u_f)\|^{2-r}_{L^r(\Omega_f)} + \|d(u_{f,h})\|^{2-r}_{L^r(\Omega_f)}} + \int_{\Omega_f} |g_f(d(u_f)) d(u_f) - g_f(d(u_{f,h})) d(u_{f,h})| \, |d(u_f) - d(u_{f,h})| \, dA $$
(4.31) $$ \le C \left\| \frac{|d(u_f) - d(u_{f,h})|}{c + |d(u_f)| + |d(u_{f,h})|} \right\|^{2-r}_\infty \|d(u_f) - d(v_{f,h})\|^r_{L^r(\Omega_f)} + I_2. $$
Similarly, we obtain that for $v_{p,h} \in X_{p,h}$
$$ \frac{\|u_p - u_{p,h}\|^2_{L^r(\Omega_p)}}{c + \|u_p\|^{2-r}_{L^r(\Omega_p)} + \|u_{p,h}\|^{2-r}_{L^r(\Omega_p)}} + \int_{\Omega_p} |g_p(u_p) u_p - g_p(u_{p,h}) u_{p,h}| \, |u_p - u_{p,h}| \, dA $$
(4.32) $$ \le C \left\| \frac{|u_p - u_{p,h}|}{c + |u_p| + |u_{p,h}|} \right\|^{2-r}_\infty \|u_p - v_{p,h}\|^r_{L^r(\Omega_p)} + I_4, $$
where $I_4$ is given by
$$ I_4 := C \int_{\Omega_p} (g_p(u_p) u_p - g_p(u_{p,h}) u_{p,h}) : (v_{p,h} - u_{p,h}) \, dA. $$
Note that with $v_h = (v_{f,h}, v_{p,h})$, $I_2 + I_4 = a(u, v_h - u_h) - a(u_h, v_h - u_h)$, and for $(q_h, \zeta_h, \gamma_h) \in \tilde{Z}_h$,
$$ a(u, v_h - u_h) - a(u_h, v_h - u_h) = b(v_h - u_h, p, \beta) - b_I(v_h - u_h, \lambda) - b(v_h - u_h, p_h, \beta_h) + b_I(v_h - u_h, \lambda_h) $$
$$ = b(v_h - u_h, p, \beta) - b_I(v_h - u_h, \lambda) \quad (\text{as } (p_h, \lambda_h, \beta_h) \in \tilde{Z}_h) $$
$$ = b(v_h - u_h, p - q_h, \beta - \gamma_h) - b_I(v_h - u_h, \lambda - \zeta_h) $$
$$ = b(u - u_h, p - q_h, \beta - \gamma_h) - b(u - v_h, p - q_h, \beta - \gamma_h) - b_I(u - u_h, \lambda - \zeta_h) + b_I(u - v_h, \lambda - \zeta_h) $$
(4.33) $$ \le \|u - u_h\|^2_X + C \big( \|u - v_h\|^2_X + \|p - q_h\|^2_M + \|\lambda - \zeta_h\|^2_{W^{1/r,r'}(\Gamma)} \big). $$
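In the last step of (4.33) we use the continuity of the operators $b(\cdot,\cdot,\cdot)$ and $b_I(\cdot,\cdot)$. Combining (4.31)–(4.33) and the fact that $\nabla \cdot X_{p,h} \subset M_{p,h}$, we obtain the estimate (4.28) for $(q_h, \zeta_h, \gamma_h) \in \tilde{Z}_h$. The inf-sup condition (4.30) then enables $(q_h, \zeta_h, \gamma_h)$ to be lifted from $\tilde{Z}_h$ to $M_h \times L_h \times \mathbb{R}^2$. (See [5] for details.)
To establish (4.29) we begin with the inf-sup condition (4.30).
$$ \|p_h - q_h\|_M + \|\beta_h - \gamma_h\|_{\mathbb{R}^2} + \|\lambda_h - \zeta_h\|_{W^{1/r,r'}(\Gamma)} \le C \, \frac{b(v_h, (p_h - q_h), (\beta_h - \gamma_h)) - b_I(v_h, \lambda_h - \zeta_h)}{\|v_h\|_X} $$
$$ \le C \left( \frac{b(v_h, (p - q_h), (\beta - \gamma_h)) - b_I(v_h, \lambda - \zeta_h)}{\|v_h\|_X} - \frac{b(v_h, (p - p_h), (\beta - \beta_h)) - b_I(v_h, \lambda - \lambda_h)}{\|v_h\|_X} \right) $$
$$ \le C \left( \|p - q_h\|_M + \|\beta - \gamma_h\|_{\mathbb{R}^2} + \|\lambda - \zeta_h\|_{W^{1/r,r'}(\Gamma)} - \frac{a(u, v_h) - a(u_h, v_h)}{\|v_h\|_X} \right) $$
(4.34) $$ \le C \big( \|p - q_h\|_M + \|\beta - \gamma_h\|_{\mathbb{R}^2} + \|\lambda - \zeta_h\|_{W^{1/r,r'}(\Gamma)} + E(u, u_h) \, G(u, u_h)^{1/r'} \big). $$
Combining (4.34) with the triangle inequality, we obtain (4.29).
Appendix A. Extension operator from Γ to ∂Ω. Let Ω be a bounded Lipschitz domain in $\mathbb{R}^n$ (n = 2 or 3), and let $\partial\Omega = \bar{\Gamma} \cup \bar{\Gamma}_b \cup \bar{\Gamma}_d$, where Γ, $\Gamma_b$, and $\Gamma_d$ are pairwise disjoint and $\mathrm{dist}(\Gamma, \Gamma_d) > 0$. Additionally, let $\Gamma^c = \partial\Omega \setminus \Gamma$.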
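We use standard notation to denote the function spaces used, for example, $W^{s,p}(\Omega)$, $W^{l,p}(\partial\Omega)$, etc., with $W_{00}^{-l,q}(\partial\Omega)$ denoting the dual space of $W_{00}^{l,p}(\partial\Omega)$, where q is the unitary conjugate of p, i.e., 1/q := 1 − 1/p. The expression $A \lesssim B$ is used to denote the inequality A ≤ (constant) · B.
Next we investigate a suitable extension of a function λ defined on Γ to a function defined on ∂Ω. Assume that p ≥ 2.
Lemma A.1. Given $\lambda \in W^{1/q,p}(\Gamma)$ define $E_\Gamma^p \lambda := \gamma_0 \varphi$, where $\gamma_0$ is the trace operator from $W^{1,p}(\Omega)$ to $W^{1/q,p}(\partial\Omega)$, and $\varphi \in W^{1,p}(\Omega)$ is the weak solution to
(A.1) $$ -\nabla \cdot \big( |\nabla\varphi|^{p-2} \nabla\varphi \big) = 0 \quad \text{in } \Omega, $$
(A.2) $$ \varphi = \begin{cases} \lambda & \text{on } \Gamma, \\ 0 & \text{on } \Gamma_d, \end{cases} $$
(A.3) $$ |\nabla\varphi|^{p-2} \partial_n \varphi = 0 \quad \text{on } \Gamma_b. $$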
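Then $E_\Gamma^p \lambda \in W^{1/q,p}(\partial\Omega)$, and $\|E_\Gamma^p \lambda\|_{W^{1/q,p}(\partial\Omega)} \lesssim \|\lambda\|_{W^{1/q,p}(\Gamma)}$.
Proof. The proof follows from the strong monotonicity [19] of the operator $L : X \to X^*$, $L(u) := -\nabla \cdot ( |\nabla u|^{p-2} \nabla u )$, where $X = \{ f \in W^{1,p}(\Omega) : f|_{\Gamma \cup \Gamma_d} = 0 \}$ [23].
For $\lambda \in W^{1/q,p}(\Gamma)$, let $E^p_{00,\Gamma} \lambda$ denote the extension of λ by zero on $\Gamma^c$.
Remark. Note that $E^p_{00,\Gamma} \lambda \in W^{1/q,p}(\partial\Omega)$ if and only if $\lambda \in W_{00}^{1/q,p}(\Gamma)$.
Lemma A.2 (see [9]). For $\zeta \in W^{1/q,p}(\partial\Omega)$ there exist $\zeta_\Gamma \in W^{1/q,p}(\Gamma)$ and $\zeta_{\Gamma^c} \in W_{00}^{1/q,p}(\Gamma^c)$ such that $\zeta = E_\Gamma^p \zeta_\Gamma + E^p_{00,\Gamma^c} \zeta_{\Gamma^c}$. Moreover, this decomposition is unique.
Proof. Let $\zeta \in W^{1/q,p}(\partial\Omega)$. Define $\zeta_\Gamma := \zeta|_\Gamma$ and $\zeta_{\Gamma^c} := \xi|_{\Gamma^c}$, where $\xi := \zeta - E_\Gamma^p \zeta_\Gamma$. Note that $\zeta|_\Gamma \in W^{1/q,p}(\Gamma)$ and
$$ \|E_\Gamma^p \zeta_\Gamma\|_{W^{1/q,p}(\partial\Omega)} \lesssim \|\zeta_\Gamma\|_{W^{1/q,p}(\Gamma)} \le \|\zeta\|_{W^{1/q,p}(\partial\Omega)}, $$
and hence $\xi \in W^{1/q,p}(\partial\Omega)$. Also, $E^p_{00,\Gamma^c} \zeta_{\Gamma^c} = \xi$ as ζ and $E_\Gamma^p \zeta_\Gamma$ agree on Γ. Thus, from the remark above, $\zeta_{\Gamma^c} \in W_{00}^{1/q,p}(\Gamma^c)$.
To show uniqueness of the decomposition, observe that if $0 = E_\Gamma^p \zeta_\Gamma + E^p_{00,\Gamma^c} \zeta_{\Gamma^c}$, then $\zeta_\Gamma$ is the trace of the weak solution of (A.1)–(A.3) for λ = 0. Hence $\zeta_\Gamma = 0$.
Next we introduce the concept of the restriction of an operator in $W^{-1/q,q}(\partial\Omega)$ to be equal to zero.
Definition A.3 (see [9]). If $f \in W^{-1/q,q}(\partial\Omega)$, then $f|_{\Gamma^c} = 0$ means by definition that
(A.4) $$ \langle f, E^p_{00,\Gamma^c} \xi \rangle_{\partial\Omega} = 0 \quad \forall \xi \in W_{00}^{1/q,p}(\Gamma^c). $$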
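The following lemma describes how an operator in $W^{-1/q,q}(\partial\Omega)$ can be decomposed into an operator in $W^{-1/q,q}(\Gamma)$ and an operator in $W^{-1/q,q}(\Gamma^c)$.
Lemma A.4 (see [9]). For $f \in W^{-1/q,q}(\partial\Omega)$ there exist $f_\Gamma \in W^{-1/q,q}(\Gamma)$ and $f_{\Gamma^c} \in W_{00}^{-1/q,q}(\Gamma^c)$ such that for $\zeta \in W^{1/q,p}(\partial\Omega)$, with $\zeta = E_\Gamma^p \zeta_\Gamma + E^p_{00,\Gamma^c} \zeta_{\Gamma^c}$, as defined in Lemma A.2, we have
(A.5) $$ \langle f, \zeta \rangle_{\partial\Omega} = \langle f_\Gamma, \zeta_\Gamma \rangle_\Gamma + \langle f_{\Gamma^c}, \zeta_{\Gamma^c} \rangle_{\Gamma^c}. $$
Proof. For $\zeta_\Gamma \in W^{1/q,p}(\Gamma)$ and $\zeta_{\Gamma^c} \in W_{00}^{1/q,p}(\Gamma^c)$, define
(A.6) $$ \langle f_\Gamma, \zeta_\Gamma \rangle_\Gamma := \langle f, E_\Gamma^p \zeta_\Gamma \rangle_{\partial\Omega} \quad \text{and} \quad \langle f_{\Gamma^c}, \zeta_{\Gamma^c} \rangle_{\Gamma^c} := \langle f, E^p_{00,\Gamma^c} \zeta_{\Gamma^c} \rangle_{\partial\Omega}. $$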
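Then
$$ \langle f_\Gamma, \zeta_\Gamma \rangle_\Gamma \le \|f\|_{W^{-1/q,q}(\partial\Omega)} \|E_\Gamma^p \zeta_\Gamma\|_{W^{1/q,p}(\partial\Omega)} \lesssim \|f\|_{W^{-1/q,q}(\partial\Omega)} \|\zeta_\Gamma\|_{W^{1/q,p}(\Gamma)}, $$
and thus $f_\Gamma \in W^{-1/q,q}(\Gamma)$. Analogously, $f_{\Gamma^c} \in W_{00}^{-1/q,q}(\Gamma^c)$. Additionally,
$$ \langle f_\Gamma, \zeta_\Gamma \rangle_\Gamma + \langle f_{\Gamma^c}, \zeta_{\Gamma^c} \rangle_{\Gamma^c} = \langle f, E_\Gamma^p \zeta_\Gamma \rangle_{\partial\Omega} + \langle f, E^p_{00,\Gamma^c} \zeta_{\Gamma^c} \rangle_{\partial\Omega} = \langle f, \zeta \rangle_{\partial\Omega}. $$
Note that for $f \in W^{-1/q,q}(\partial\Omega)$ with $f|_{\Gamma^c} = 0$ (see Definition A.3), from (A.6),
(A.7) $$ \langle f, \zeta \rangle_{\partial\Omega} = \langle f_\Gamma, \zeta_\Gamma \rangle_\Gamma \quad \forall \zeta \in W^{1/q,p}(\partial\Omega). $$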
Thus functionals in $W^{-1/q,q}(\partial\Omega)$ which are zero when restricted to ∂Ω\Γ can be identified with functionals in $W^{-1/q,q}(\Gamma)$.
Appendix B. Combined inf-sup conditions. In deriving a priori error estimates for mixed methods, whose analysis relies on several inf-sup conditions, combined inf-sup conditions are needed. In this section we show that the required inf-sup conditions follow readily from the continuity of the bilinear forms and the individual inf-sup conditions.
Theorem B.1. Let $V, Q_1, Q_2$ be Banach spaces, and let $b_1(\cdot,\cdot) : V \times Q_1 \to \mathbb{R}$, $b_2(\cdot,\cdot) : V \times Q_2 \to \mathbb{R}$, and $Z_1 := \{ v \in V \ | \ b_1(v, q) = 0 \ \forall q \in Q_1 \}$. Assume that $b_2(\cdot,\cdot)$ is continuous and there exist $\beta_1, \beta_2 > 0$ such that
$$ \sup_{v \in V, \, \|v\|_V = 1} b_1(v, q_1) \ge \beta_1 \|q_1\|_{Q_1} \quad \forall q_1 \in Q_1, $$
$$ \sup_{v \in Z_1, \, \|v\|_V = 1} b_2(v, q_2) \ge \beta_2 \|q_2\|_{Q_2} \quad \forall q_2 \in Q_2. $$
Then there exists β > 0 such that
$$ \sup_{v \in V, \, \|v\|_V = 1} \big( b_1(v, q_1) + b_2(v, q_2) \big) \ge \beta \big( \|q_1\|_{Q_1} + \|q_2\|_{Q_2} \big) \quad \forall (q_1, q_2) \in Q_1 \times Q_2. $$
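As a sanity check of Theorem B.1 in the simplest finite-dimensional setting, the sketch below takes $V = \mathbb{R}^3$ with the Euclidean norm, $Q_1 = Q_2 = \mathbb{R}$, and the forms $b_1(v, q_1) = q_1 \, (a \cdot v)$, $b_2(v, q_2) = q_2 \, (c \cdot v)$ for two hypothetical vectors a and c. These choices, and the helper names, are assumptions made purely for illustration. In this setting the supremum over unit vectors v of $b_1 + b_2$ equals $\|q_1 a + q_2 c\|$, and the constant tested is the one obtained in the proof, $\beta = \min\{\beta_1, \beta_2\} / (4(1 + C_2/\beta_2))$.

```python
import math


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def infsup_constants(a, c):
    """Constants of Theorem B.1 for b1(v,q1) = q1*(a.v), b2(v,q2) = q2*(c.v) on R^n."""
    beta1 = norm(a)                      # sup over unit v of a.v
    coef = dot(c, a) / dot(a, a)
    c_proj = [ci - coef * ai for ci, ai in zip(c, a)]
    beta2 = norm(c_proj)                 # sup of c.v over unit v in Z1 = {v : a.v = 0}
    cont2 = norm(c)                      # continuity constant C2 of b2
    beta = min(beta1, beta2) / (4.0 * (1.0 + cont2 / beta2))
    return beta1, beta2, cont2, beta


def worst_combined_ratio(a, c, samples=3600):
    """Smallest sampled value of sup_v (b1 + b2) / (|q1| + |q2|) over directions (q1, q2)."""
    worst = float("inf")
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        q1, q2 = math.cos(theta), math.sin(theta)
        combined = [q1 * ai + q2 * ci for ai, ci in zip(a, c)]
        worst = min(worst, norm(combined) / (abs(q1) + abs(q2)))
    return worst


a = [2.0, 0.0, 0.0]   # hypothetical data defining b1
c = [1.0, 1.0, 0.0]   # hypothetical data defining b2
beta1, beta2, cont2, beta = infsup_constants(a, c)
worst = worst_combined_ratio(a, c)
print(beta1, beta2, cont2, beta, worst)
```

The sampled ratio stays above β, as the theorem predicts. The bound is not sharp in this example, which is consistent with the theorem asserting only the existence of some positive β.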
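Proof. By the continuity of $b_2(\cdot,\cdot)$, there exists $C_2 > 0$ such that
$$ b_2(v, q_2) \le C_2 \|v\|_V \|q_2\|_{Q_2} \quad \forall (v, q_2) \in V \times Q_2. $$
Let $(q_1, q_2) \in Q_1 \times Q_2$ be given, and choose $v_1 \in V$ with $\|v_1\|_V = 1$ and $v_2 \in Z_1$ with $\|v_2\|_V = 1$, satisfying
$$ b_1(v_1, q_1) \ge \frac{\beta_1}{2} \|q_1\|_{Q_1}, \qquad b_2(v_2, q_2) \ge \frac{\beta_2}{2} \|q_2\|_{Q_2}. $$
Then for $u = v_1 + (1 + 2C_2/\beta_2) v_2$ we have
$$ b_1(u, q_1) = b_1(v_1, q_1) \ge \frac{\beta_1}{2} \|q_1\|_{Q_1}, $$
$$ b_2(u, q_2) = b_2(v_1, q_2) + \Big( 1 + \frac{2C_2}{\beta_2} \Big) b_2(v_2, q_2) \ge \frac{\beta_2}{2} \|q_2\|_{Q_2}. $$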
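Finally, as $\|u\|_V \le 2(1 + C_2/\beta_2)$, with $u_0 = u/\|u\|_V$
$$ b_1(u_0, q_1) + b_2(u_0, q_2) \ge \beta \big( \|q_1\|_{Q_1} + \|q_2\|_{Q_2} \big), $$
where $\beta = \min\{\beta_1, \beta_2\} / (4(1 + C_2/\beta_2))$.
Corollary B.2. Let $Z_0$, $Q_i$, i = 1, . . . , n, be Banach spaces, and let $b_i(\cdot,\cdot) : Z_0 \times Q_i \to \mathbb{R}$, i = 1, . . . , n, and $Z_i := \{ v \in Z_{i-1} \ | \ b_i(v, q) = 0 \ \forall q \in Q_i \}$, i = 1, . . . , n − 1. Assume that $b_i(\cdot,\cdot)$ is continuous and there exist $\beta_i$ such that
$$ \sup_{v \in Z_{i-1}, \, \|v\|_{Z_0} = 1} b_i(v, q) \ge \beta_i \|q\|_{Q_i} \quad \forall q \in Q_i, \ i = 1, \ldots, n. $$
Then there exists β > 0 such that
(B.1) $$ \sup_{v \in Z_0, \, \|v\|_{Z_0} = 1} \ \sum_{i=1}^{n} b_i(v, q_i) \ge \beta \big( \|q_1\|_{Q_1} + \cdots + \|q_n\|_{Q_n} \big) \quad \forall (q_1, \ldots, q_n) \in Q_1 \times \cdots \times Q_n. $$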
Proof. The proof of (B.1) follows from Theorem B.1 and by induction.
Acknowledgment. The authors would like to thank the referees for their helpful suggestions.
REFERENCES
[1] G. Beavers and D. Joseph, Boundary conditions at a naturally permeable wall, J. Fluid Mech., 30 (1967), pp. 197–207.
[2] F. Brezzi and M. Fortin, Mixed and Hybrid Finite Element Methods, Springer-Verlag, New York, 1991.
[3] S.-S. Chow and G. Carey, Numerical approximation of generalized Newtonian fluids using Powell–Sabin–Heindl elements: I. Theoretical elements, Internat. J. Numer. Methods Fluids, 41 (2003), pp. 1085–1118.
[4] M. Discacciati, E. Miglio, and A. Quarteroni, Mathematical and numerical models for coupling surface and groundwater flows, Appl. Numer. Math., 43 (2002), pp. 57–74.
[5] V. J. Ervin and H. Lee, Numerical approximation of a quasi-Newtonian Stokes flow problem with defective boundary conditions, SIAM J. Numer. Anal., 45 (2007), pp. 2120–2140.
[6] V. Ervin and T. Phillips, Residual a posteriori error estimator for a three-field model of a non-linear generalized Stokes problem, Comput. Methods Appl. Mech. Engrg., 195 (2006), pp. 2599–2610.
[7] L. Formaggia, J.-F. Gerbeau, F. Nobile, and A. Quarteroni, Numerical treatment of defective boundary conditions for the Navier–Stokes equations, SIAM J. Numer. Anal., 40 (2002), pp. 376–401.
[8] G. Galdi, An Introduction to the Mathematical Theory of the Navier-Stokes Equations, Vol. 1, Springer-Verlag, New York, 1994.
[9] J. Galvis and M. Sarkis, Non-matching mortar discretization analysis for the coupling Stokes–Darcy equations, Electron. Trans. Numer. Anal., 26 (2007), pp. 350–384.
[10] V. Girault and P. Raviart, Finite Element Methods for Navier-Stokes Equations, Springer-Verlag, Berlin, 1986.
[11] M. D. Gunzburger and S. L. Hou, Treating inhomogeneous essential boundary conditions in finite element methods and the calculation of the boundary stresses, SIAM J. Numer. Anal., 29 (1992), pp. 390–424.
[12] N. Hanspal, A. Waghode, V. Nassehi, and R. Wakeman, Numerical analysis of coupled Stokes/Darcy flow in industrial filtrations, Transp. Porous Media, 64 (2006), pp. 73–101.
[13] W. Jäger and A. Mikelić, On the interface boundary condition of Beavers, Joseph, and Saffman, SIAM J. Appl. Math., 60 (2000), pp. 1111–1127.
[14] W. J. Layton, F. Schieweck, and I. Yotov, Coupling fluid flow with porous media flow, SIAM J. Numer. Anal., 40 (2003), pp. 2195–2218.
[15] X. Lopez, P. Valvatne, and M. Blunt, Predictive network modeling of single-phase non-Newtonian flow in a porous media, J. Colloid Interface Sci., 264 (2003), pp. 256–265.
[16] M. Mu and J. Xu, A two-grid method of a mixed Stokes–Darcy model for coupling fluid flow with porous media flow, SIAM J. Numer. Anal., 45 (2007), pp. 1801–1813.
[17] R. Owens and T. Phillips, Computational Rheology, Imperial College Press, London, 2002.
[18] J. Pearson and P. Tardy, Models for flow of non-Newtonian and complex fluids through porous media, J. Non-Newtonian Fluid Mech., 102 (2002), pp. 447–473.
[19] M. Renardy and R. Rogers, An Introduction to Partial Differential Equations, Springer-Verlag, New York, 1993.
[20] B. Rivière, Analysis of a discontinuous finite element method for the coupled Stokes and Darcy problems, J. Sci. Comput., 22/23 (2005), pp. 479–500.
[21] B. Rivière and I. Yotov, Locally conservative coupling of Stokes and Darcy flows, SIAM J. Numer. Anal., 42 (2005), pp. 1959–1977.
[22] P. Saffman, On the boundary condition at the surface of a porous medium, Stud. Appl. Math., 50 (1971), pp. 93–101.
[23] D. Sandri, On the numerical approximation of quasi-Newtonian flows whose viscosity obeys a power law or the Carreau law, RAIRO Modél. Math. Anal. Numér., 27 (1993), pp. 131–155.
[24] R. Verfürth, Finite element approximation of incompressible Navier-Stokes equations with slip boundary condition, Numer. Math., 50 (1987), pp. 697–721.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 953–971
ON OPTIMAL CONVERGENCE RATE OF THE RATIONAL KRYLOV SUBSPACE REDUCTION FOR ELECTROMAGNETIC PROBLEMS IN UNBOUNDED DOMAINS∗ LEONID KNIZHNERMAN† , VLADIMIR DRUSKIN‡ , AND MIKHAIL ZASLAVSKY‡ Abstract. We solve an electromagnetic frequency domain induction problem in R3 for a frequency interval using rational Krylov subspace (RKS) approximation. The RKS is constructed by spanning on the solutions for a certain a priori chosen set of frequencies. We reduce the problem of the optimal choice of these frequencies to the third Zolotaryov problem in the complex plane, having an approximate closed form solution, and determine the best Cauchy–Hadamard convergence rate. The theory is illustrated with numerical examples for Maxwell’s equations arising in 3D magnetotelluric geophysical exploration. Key words. frequency domain problems, Galerkin method, third Zolotaryov problem in complex plane AMS subject classifications. 30C85, 30E10, 41A05, 41A20, 65M60, 86-08 DOI. 10.1137/080715159
1. Introduction. Many boundary value problems can be reduced to computation of u = f (A)ϕ, where A is an operator in a Hilbert space, and u and ϕ are elements of the same space. In practice A can be a large ill-conditioned matrix obtained after discretization of a PDE operator, which is why it is convenient to consider A as an unbounded operator. The resolvent f (λ) =
$1/(\lambda + s)$
is one of the most commonly used functions appearing in the solution of linear nonstationary equations in the frequency domain. As an important practical application, we consider the direct problem of electromagnetic frequency sounding arising in geophysical prospecting. It can be reduced to the magnetic field formulation of the frequency-domain Maxwell equations in R3 in the low frequency regime (displacement currents are assumed to be negligible) (1.1)
$$ \nabla \times (\mu\sigma)^{-1} \nabla \times H + i\omega H = \nabla \times \sigma^{-1} J $$
with zero boundary conditions at infinity. Here H is the vector magnetic field induced by an external current J, ω is a frequency, μ is the magnetic permeability (which is assumed to be constant throughout the whole domain), and $c_1 \le \sigma \le c_2$ is the variable electrical conductivity distribution, where $c_1$ and $c_2$ are positive constants.
∗ Received by the editors February 6, 2008; accepted for publication (in revised form) August 7, 2008; published electronically February 13, 2009. http://www.siam.org/journals/sinum/47-2/71515.html
† Central Geophysical Expedition, House 38, Building 3, Narodnogo opolcheniya St., Moscow, 123298 Russia.
‡ Schlumberger Doll Research, 1 Hampshire St., Cambridge, MA 02139.
We solve the resolvent problem with $s = i\omega$, $A = A^* = \nabla \times (\mu\sigma)^{-1} \nabla \times$ and $\varphi = \nabla \times \sigma^{-1} J$. Maxwell's operator $\nabla \times (\mu\sigma)^{-1} \nabla \times$ in unbounded domains has a continuum (without holes) spectrum supported on the entire $\mathbb{R}_+ = [0, +\infty]$ [33, section 9]. Usually, the electromagnetic field is measured for $\omega \in [\omega_{min}, \omega_{max}]$; i.e., the resolvent must be computed for multiple values of s corresponding to this interval. Two of the authors solved these problems using the so-called spectral Lanczos decomposition method (SLDM), which is a Galerkin method on a Krylov subspace $K_m(A, \varphi)$ [6]. Similar approaches (with different names) were used in, e.g., [27, 26, 35, 9, 18]; however, the basic idea first appeared in the classical work of Hestenes and Stiefel [17]. The SLDM allows one to compute the resolvent for many frequencies with the cost of a single frequency problem using unpreconditioned conjugate gradients, and the time domain solution converges even asymptotically faster than the frequency domain solution [6]. However, the SLDM convergence was strongly affected by the condition number of the discrete problem and frequency range. Spectral adaptation of Krylov methods and efficiency of rational approximation can be combined in the so-called rational Krylov subspaces (RKS) [30]. The approximate solution is projected onto an RKS, which is a span of different rational functions of A applied to ϕ. Let us consider a subdiagonal RKS in the generic form:
(1.2)
$$U_n = \operatorname{span}\{b, Ab, \dots, A^{n-1}b\}, \qquad b = \prod_{j=1}^{n} (A + s_j I)^{-1}\varphi.$$

Obviously, $(A + s_j I)^{-1}\varphi \in U_n$; i.e., the solution of the resolvent problem with $s = s_j$ is exactly approximated on $U_n$, so the shifts $s_j$ are also called interpolating points. We assume that the RKS is computed using iterative methods for which there are no computational advantages to solving multiple linear systems with the same shifts (because of extensive memory requirements for the discretization of large scale electromagnetic problems in geophysics); i.e., we assume that the $s_j$ do not coincide. The RKS is widely used in model reduction, in particular for computation of transfer functions of linear problems; see reviews [3, 8] for details. The question is, what is the optimal convergence rate with such an approach, and how do we choose $s_j$ to achieve it? For unbounded frequency intervals the interpolating frequencies can be obtained using the $H_2$-optimality conditions [23] by computing a sequence of Krylov subspaces [15]. In this work we consider bounded intervals, for which we compute optimal rates and corresponding interpolating points using the $L_\infty$-optimality condition. The key to our approach is presenting the Galerkin solution as a particular case of the so-called skeleton approximation $f_{\mathrm{skel}}(A, s)\varphi$, where $f_{\mathrm{skel}}(\lambda, s)$ is a rational function of λ and s introduced in [34, 28]. The optimization of the error of the skeleton approximation can be reduced to the famous third Zolotaryov problem with asymptotically optimal $s_j$ computed in terms of elliptic integrals. Given a bounded positive frequency interval, the computed interpolation points provide convergence with the optimal Cauchy–Hadamard rate for the class of operators with continuum spectrum supported on the entire $\mathbb{R}_+$ and with a regular enough spectral measure.

2. Formulation of the problem. RKS Galerkin method. We compute the action of the resolvent operator
(2.1) $$u = (A + sI)^{-1}\varphi, \qquad A \ge 0,$$
where A is a self-adjoint nonnegative definite operator acting in a Hilbert space H equipped with an inner product $\langle \cdot, \cdot \rangle$, and ϕ is a normalized vector from this space.
ON OPTIMAL CONVERGENCE RATE OF RKS REDUCTION
We assume that A has a continuum (without holes) spectrum supported on the entire $\mathbb{R}_+$. We assume that $s \in S$, where S is a compact subset of the complex plane not intersecting the negative real semiaxis. Should we have a solution $u_s$ for a complex parameter s, we automatically also have the solution for the conjugate parameter $\bar s$ as $\bar u_s$, so without loss of generality we can assume that S is symmetric with respect to the real axis. Choose noncoinciding parameters $s_j \in S$, symmetric with respect to the real axis, $1 \le j \le n$, and construct the RKS (1.2). Due to the continuity of the spectrum of A, the corresponding spectral measure has an infinite number of points of increase, so $\dim U_n = n$. To approximately solve (2.1), we will use the Galerkin approximation on $U_n$. The Galerkin solution $\tilde u \in U_n$ satisfies the equalities
(2.2) $$\langle (A + sI)\tilde u, v \rangle = \langle \varphi, v \rangle \qquad \forall v \in U_n.$$
We construct a well-conditioned basis $G_n = \{g_1, \dots, g_n\}$ of $U_n$ with the help of a recursive algorithm. There are many ways to construct $G_n$. They are known generically by the name rational Arnoldi method (see, e.g., [30, 14]). In our numerical experiments we implement the following well-known simple variant of rational Arnoldi. Set
$$g_1 = \frac{(A + s_1 I)^{-1}\varphi}{\|(A + s_1 I)^{-1}\varphi\|}.$$
Let $2 \le l \le n$ and suppose that $g_1, \dots, g_{l-1}$ have been calculated. Then the vector $g_l$ is obtained by the Gram–Schmidt orthogonalization of $(A + s_l I)^{-1} g_{l-1}$ to $g_j$, $j = 1, \dots, l - 1$. Usually, the most computationally expensive part of rational Arnoldi is the solution of shifted linear systems.

3. RKS Galerkin method and the third Zolotaryov problem in the complex plane.

3.1. RKS Galerkin method and skeleton approximants. Let μ(λ) be the spectral measure associated with the couple (A, ϕ). Using Parseval's identity, we obtain $\langle f(A)\varphi, g(A)\varphi \rangle = \langle f, g \rangle_\mu$, where
$$\langle f, g \rangle_\mu = \int_0^{+\infty} \overline{g(\lambda)}\, f(\lambda)\, d\mu(\lambda).$$
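The rational Arnoldi variant and the Galerkin condition (2.2) of section 2 can be illustrated by a small dense sketch. It assumes that A is available as a NumPy array and that the shifted systems are solved directly; the function names `rational_arnoldi_basis` and `galerkin_resolvent` are hypothetical and are not taken from the paper, whose experiments use a preconditioned iterative solver instead.

```python
import numpy as np


def rational_arnoldi_basis(A, phi, shifts):
    """Orthonormal basis g_1, ..., g_n built from one shifted solve per shift.

    Illustrative dense version: step l solves (A + s_l I) w = g_{l-1}
    (with g_0 = phi) and orthogonalizes w against the earlier vectors.
    """
    eye = np.eye(A.shape[0])
    basis = []
    vec = np.asarray(phi, dtype=complex)
    for s in shifts:
        w = np.linalg.solve(A + s * eye, vec)
        # Two Gram-Schmidt passes (reorthogonalization) for numerical stability.
        for _ in range(2):
            for g in basis:
                w = w - np.vdot(g, w) * g
        w = w / np.linalg.norm(w)
        basis.append(w)
        vec = w
    return np.column_stack(basis)


def galerkin_resolvent(A, phi, G, s):
    """Galerkin approximation of (A + s I)^{-1} phi on span of the columns of G."""
    Gh = G.conj().T
    # G^H (A + s I) G = G^H A G + s I because the columns of G are orthonormal.
    projected = Gh @ (A @ G) + s * np.eye(G.shape[1])
    coeffs = np.linalg.solve(projected, Gh @ phi)
    return G @ coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag(np.linspace(0.0, 10.0, 60))      # toy nonnegative definite operator
    phi = rng.standard_normal(60)
    phi /= np.linalg.norm(phi)
    shifts = [1j, -1j, 3j, -3j]                  # conjugate pairs of shifts
    G = rational_arnoldi_basis(A, phi, shifts)
    exact = np.linalg.solve(A + 3j * np.eye(60), phi)
    print(np.linalg.norm(galerkin_resolvent(A, phi, G, 3j) - exact))
```

In this toy setting the Galerkin solution at a shift $s = s_j$ agrees with the exact resolvent up to rounding, which is the interpolation property of the shifts mentioned after (1.2).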
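Scalarizing the problem, i.e., considering it in the spectral coordinates, we will seek the Galerkin approximant $\tilde v \in V_n$ to the function
$$\frac{1}{\lambda + s}, \qquad \lambda \in \mathbb{R},\ \lambda \ge 0,\ s \in S,$$
where $V_n$ is the spectral counterpart of $U_n$ from (1.2) defined as
$$V_n = \operatorname{span}\left\{ \frac{1}{q_n}, \frac{\lambda}{q_n}, \dots, \frac{\lambda^{n-1}}{q_n} \right\}, \qquad q_n(\lambda) = \prod_{l=1}^{n} (\lambda + s_l).$$
The Galerkin solution $\tilde v \in V_n$ satisfies the equation
(3.1) $$\langle v, (\lambda + s)\tilde v - 1 \rangle_\mu = 0 \qquad \forall v \in V_n.$$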
Problem (3.1) has a unique solution. Obviously, $(\lambda + s_l)^{-1} \in V_n$, so these functions are the solutions of (3.1) for $s = s_l$, the points $s_l$ being the interpolation points of $\tilde v$ as a function of s. Let $\theta_j$ and $Z_j \in V_n$, $j = 1, \dots, n$, be, respectively, the Ritz values and (normalized) Ritz "vectors" (which are actually functions of λ) satisfying
(3.2) $$\langle v, (\lambda - \theta_j) Z_j \rangle_\mu = 0 \qquad \forall v \in V_n.$$
This problem (for the operator of multiplication by λ in $L_{2,\mu}$ and the trial subspace $V_n$) is Hermitian, so the $\theta_j$ are positive and the $Z_j$ are orthonormal. The Galerkin solution can be presented via spectral decomposition as
(3.3) $$\tilde v = \sum_{j=1}^{n} (\theta_j + s)^{-1} \langle Z_j, 1 \rangle_\mu Z_j.$$
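As a hedged illustration of the spectral decomposition (3.3) in matrix terms, the sketch below computes Ritz pairs of a projected operator and assembles the Galerkin solution from them. It assumes a dense Hermitian nonnegative definite A and an orthonormal basis G of the rational Krylov subspace (here obtained by a QR factorization of the resolvent vectors, which is a shortcut for illustration and not the paper's rational Arnoldi variant); the function name `ritz_galerkin` is hypothetical.

```python
import numpy as np


def ritz_galerkin(A, phi, G, s):
    """Galerkin solution of (A + s I) u = phi on span(G) via Ritz pairs.

    Returns the Ritz values theta_j and the vector
    sum_j (theta_j + s)^{-1} <phi, z_j> z_j, where z_j = G y_j are Ritz vectors.
    """
    H = G.conj().T @ (A @ G)
    H = 0.5 * (H + H.conj().T)          # symmetrize against rounding errors
    theta, Y = np.linalg.eigh(H)        # Hermitian eigenproblem: real theta_j
    Z = G @ Y                           # orthonormal Ritz vectors
    coeffs = (Z.conj().T @ phi) / (theta + s)
    return theta, Z @ coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 40
    A = np.diag(np.linspace(0.0, 10.0, m))
    phi = rng.standard_normal(m)
    phi /= np.linalg.norm(phi)
    shifts = [1j, -1j, 4j, -4j]
    V = np.column_stack([np.linalg.solve(A + sj * np.eye(m), phi) for sj in shifts])
    G, _ = np.linalg.qr(V)
    theta, u_ritz = ritz_galerkin(A, phi, G, 2j)
    print(theta)
```

The Ritz values depend only on the subspace, so once they are available the approximation can be evaluated cheaply for every parameter s of interest.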
By construction the $s_l$ are either real positive or have a complex conjugate counterpart in S, and thus $q_n(\lambda) > 0$ for $\lambda \in \mathbb{R}_+$, i.e., on the spectrum of A. So (3.1), (3.2), (3.3) can be equivalently considered as the polynomial problem with respect to $q_n \tilde v$ instead of $\tilde v$ on the subspace $K_n = \operatorname{span}\{1, \lambda, \dots, \lambda^{n-1}\}$ instead of $V_n$ and spectral measure ρ instead of μ, where $d\rho(\lambda) = q_n(\lambda)^{-2}\, d\mu(\lambda)$. This allows us to apply to our rational approximant the known results from the theory of orthogonal polynomials (see [5]). First, we note that the $\theta_j$ are the nodes of a Gaussian quadrature, and as such they do not coincide. Also, (3.3) can be viewed as the Lagrange polynomial interpolating $\frac{q_n}{\lambda + s}$ at $\theta_j$ (with respect to λ). So, we can summarize the interpolation properties of $\tilde v$ as a function of λ and s in the following lemma.

Lemma 3.1. We have
$$\left. \left( \tilde v - \frac{1}{\lambda + s} \right) \right|_{s = s_l} = 0, \qquad \lambda \ge 0,\ l = 1, \dots, n,$$
and
$$\left. \left( \tilde v - \frac{1}{\lambda + s} \right) \right|_{\lambda = \theta_l} = 0, \qquad s \in S,\ l = 1, \dots, n.$$
The so-called skeleton approximation of functions of two variables was introduced in [34] and then used in [12, 16]. This approximation for the function 1/(x + y) was investigated in [28]. It is defined as
(3.4) $$f_{\mathrm{skel}}(\lambda, s) = \left( \frac{1}{\lambda + s_1}, \dots, \frac{1}{\lambda + s_n} \right) M^{-1} \begin{pmatrix} \frac{1}{s + \lambda_1} \\ \vdots \\ \frac{1}{s + \lambda_n} \end{pmatrix},$$
where $M = (M_{kl})$ is the n × n matrix with the entries $M_{kl} = 1/(\lambda_k + s_l)$. Theorem 3 from [28] for our case can be written as
(3.5) $$\delta = \left. \left( \frac{1}{\lambda + s} - f_{\mathrm{skel}}(\lambda, s) \right) \middle/ \frac{1}{\lambda + s} \right. = \prod_{j=1}^{n} \frac{\lambda - \lambda_j}{\lambda + s_j} \cdot \prod_{j=1}^{n} \frac{s - s_j}{s + \lambda_j};$$
i.e., $\lambda_j$ and $s_j$ are interpolating points. Both $\tilde v$ and $f_{\mathrm{skel}}$ are (n−1)/n rational functions of λ and of s, so from Lemma 3.1 and (3.5) we obtain the following proposition.
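Proposition 3.2. If $\theta_j = \lambda_j$, $j = 1, \dots, n$, then $\tilde v \equiv f_{\mathrm{skel}}$.

The relative interpolation error, i.e., the left-hand side of (3.5), can be written as
$$\delta = \frac{r(\lambda)}{r(-s)}, \qquad r(z) = \prod_{j=1}^{n} \frac{z - \lambda_j}{z + s_j}.$$
Introduce the quantity
(3.6) $$\sigma_n(\mathbb{R}_+, -S) \equiv \min_{\lambda_1, \dots, \lambda_n, s_1, \dots, s_n} \frac{\max_{\lambda \ge 0} |r(\lambda)|}{\min_{z \in -S} |r(z)|}.$$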
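The factorization of the relative error of the skeleton approximation as r(λ)/r(−s) can be checked numerically for small n. The sketch below is only an illustration under the assumption that the nodes are given as plain lists; the names `f_skel`, `r`, and `relative_error` are chosen here for readability and do not come from the paper.

```python
import numpy as np


def f_skel(lam, s, lam_nodes, s_nodes):
    """Skeleton approximation of 1/(lam + s) built from nodes lam_k and s_l."""
    lam_nodes = np.asarray(lam_nodes, dtype=complex)
    s_nodes = np.asarray(s_nodes, dtype=complex)
    M = 1.0 / (lam_nodes[:, None] + s_nodes[None, :])   # M[k, l] = 1/(lam_k + s_l)
    row = 1.0 / (lam + s_nodes)                         # entries 1/(lam + s_l)
    col = 1.0 / (s + lam_nodes)                         # entries 1/(s + lam_k)
    return row @ np.linalg.solve(M, col)


def r(z, lam_nodes, s_nodes):
    """Rational function r(z) = prod_j (z - lam_j) / (z + s_j)."""
    lam_nodes = np.asarray(lam_nodes, dtype=complex)
    s_nodes = np.asarray(s_nodes, dtype=complex)
    return np.prod((z - lam_nodes) / (z + s_nodes))


def relative_error(lam, s, lam_nodes, s_nodes):
    """delta = 1 - (lam + s) * f_skel(lam, s)."""
    return 1.0 - (lam + s) * f_skel(lam, s, lam_nodes, s_nodes)


if __name__ == "__main__":
    lam_nodes = [0.3, 1.0, 2.5]
    s_nodes = [1j, -1j, 0.7]
    lam, s = 0.8, 2j
    print(relative_error(lam, s, lam_nodes, s_nodes))
    print(r(lam, lam_nodes, s_nodes) / r(-s, lam_nodes, s_nodes))
```

The matrix M is a Cauchy matrix, which becomes ill conditioned quickly as n grows, so a direct solve of this kind is meant for small illustrative cases only.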
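As will be discussed in detail later, minimization problem (3.6) is a particular case of the third Zolotaryov problem in the complex plane, and it has an asymptotically (in the Cauchy–Hadamard sense) best solution with $\lambda_j \in \mathbb{R}_+$ and $s_j \in S$, such that $\lambda_l \ne \lambda_j$ and $s_l \ne s_j$ if $l \ne j$. We will use the $s_j$ obtained from (3.6) to construct the Galerkin subspace $U_n$. Optimal $\lambda_j$ may differ from the Ritz values $\theta_j$, but the Galerkin error can still be estimated via $\sigma_n(\mathbb{R}_+, -S)$.

Proposition 3.3. We have the estimate
(3.7) $$\left\| \frac{1}{\lambda + s} - \tilde v \right\|_\mu \le 2\, \frac{\sigma_n(\mathbb{R}_+, -S)}{\operatorname{dist}(\mathbb{R}_+, S)}.$$

Proof. For any $\lambda_j$ and $s_j$ (j = 1, . . . , n) obtained from the solution of Zolotaryov problem (3.6), $f_{\mathrm{skel}}(\lambda, s) \in V_n$ and $(\lambda + s) f_{\mathrm{skel}}(\lambda, s) = 1 - \delta(\lambda, s)$, so $f_{\mathrm{skel}}(\lambda, s)$ is the solution of the modified Galerkin problem
$$\langle v, (\lambda + s) f_{\mathrm{skel}}(\lambda, s) - 1 + \delta(\lambda, s) \rangle_\mu = 0 \qquad \forall v \in V_n.$$
Obviously, $f_{\mathrm{skel}}(\lambda, s) = (\lambda + s)^{-1} [1 - \delta(\lambda, s)]$, so
$$\left\| \frac{1}{\lambda + s} - f_{\mathrm{skel}} \right\|_\mu = \left\| (\lambda + s)^{-1} \delta(\lambda, s) \right\|_\mu \le \left\| (\lambda + s)^{-1} \right\|_\mu \left\| \delta(\lambda, s) \right\|_\mu.$$
From the identities $\|\varphi\| = \|1\|_\mu = \int_0^\infty d\mu = 1$ we get
$$\|\delta(\lambda, s)\|_\mu \le \max_{\lambda \in \mathbb{R}_+} |\delta(\lambda, s)|.$$
For the optimal δ obtained with the help of (3.6) we obtain
(3.8) $$\|\delta(\lambda, s)\|_\mu \le \sigma_n(\mathbb{R}_+, -S)$$
and
(3.9) $$\left\| \frac{1}{\lambda + s} - f_{\mathrm{skel}} \right\|_\mu \le \frac{\sigma_n(\mathbb{R}_+, -S)}{\operatorname{dist}(\mathbb{R}_+, S)}.$$
Again, for any $\lambda_j$ and $s_j$ the spectral decomposition gives
$$\| f_{\mathrm{skel}} - \tilde v \|_\mu = \left\| \sum_{j=1}^{n} (\theta_j + s)^{-1} \langle Z_j, \delta \rangle_\mu Z_j \right\|_\mu = \sqrt{ \sum_{j=1}^{n} |\theta_j + s|^{-2}\, |\langle Z_j, \delta \rangle_\mu|^2 }.$$
So, with the optimal δ obtained with the help of (3.6), using (3.8) and real positivity of $\theta_j$, we infer
$$\| f_{\mathrm{skel}} - \tilde v \|_\mu \le \frac{\sigma_n(\mathbb{R}_+, -S)}{\operatorname{dist}(\mathbb{R}_+, S)}.$$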
Using this estimate, (3.9), and the triangle inequality, we obtain (3.7).

Obviously, the error of the Galerkin approximant cannot be smaller than the optimal error measured in the same norm, so we have a lower bound for the relative error of the Galerkin approximant in spectral coordinates as
$$\| (\lambda + s)\tilde v - 1 \|_{L_\infty(\mathbb{R}_+)} \ge \sigma_n(\mathbb{R}_+, -S).$$
Thus, we have both an upper $L_2$ and a lower $L_\infty$ error bound of order $\sigma_n(\mathbb{R}_+, -S)$. So it is natural to expect that Proposition 3.3 gives a sharp bound in the Cauchy–Hadamard sense and that $\omega_j$ are close to optimal in the same sense. It follows from Parseval's identity that the Galerkin error in the $L_2$ norm can be computed as
$$\| u - \tilde u \| = \left\| \frac{1}{\lambda + s} - \tilde v \right\|_\mu = \sqrt{ \int_0^\infty |\tilde v - (\lambda + s)^{-1}|^2\, d\mu(\lambda) }.$$
The Galerkin method can improve the convergence speed due to adaptation to the nonuniformity of μ. However, for the class of operators with regular enough spectral measures, supported on the entire $\mathbb{R}_+$, the spectral adaptation cannot improve the Cauchy–Hadamard convergence rate.

3.2. The third Zolotaryov problem in the complex plane. Minimization problem (3.6) is a particular case of the third Zolotaryov problem in the complex plane (see [10] or [36, section 8.7]). This problem in relation to the alternating direction implicit (ADI) method was investigated in [22, 7, 20, 32]. Generally this problem can be solved numerically with the use of the Remez algorithm. In particular, we are interested in cases when $S = -S = D = i[\omega_{\min}, \omega_{\max}] \cup (-i)[\omega_{\min}, \omega_{\max}]$. Such a problem arises in geophysical prospecting with low frequency electromagnetic sources (see the numerical examples). For these cases we shall calculate the asymptotic convergence factor and give a closed form approximate solution. Let $\frac{\omega_{\min}}{\omega_{\max}} = 1 - \kappa^2$, $0 < \kappa < 1$. Introduce the complete elliptic integral of modulus κ,
$$K(\kappa) = \int_0^1 \frac{dt}{\sqrt{(1 - t^2)(1 - \kappa^2 t^2)}}.$$
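Theorem 3.4. With the number
(3.10) $$\rho = \exp\left( -\frac{\pi K(\sqrt{1 - \kappa^2})}{2 K(\kappa)} \right)$$
the following assertions are valid:
(3.11) $$\sigma_n(\mathbb{R}_+, D) \ge \rho^n, \qquad n \in \mathbb{N},$$
(3.12) $$\lim_{n \to \infty} \sqrt[n]{\sigma_n(\mathbb{R}_+, D)} = \rho.$$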
We shall give a proof of Theorem 3.4 in the appendix. Later on we assume that the number of frequencies n is even. In practice, we work with functional spaces over C, the operator A, and the right-hand-side vector ϕ being real. In such a situation, should we obtain the solution u for a frequency ω, the solution for the frequency −ω is just u¯. Thus we can reckon that frequencies ω and −ω belong to the compact D simultaneously. In this case D is symmetric with respect to R. The proof of [36, section 8.7, Theorem 9] in conjunction with the maxim from [10, section 5, paragraph 1] says how parameters ωj and λj should be asymptotically distributed on D and R+ , respectively, for approximation (3.4) to be optimal in the Cauchy–Hadamard sense. Since the measure β (see (A.12)) is equilibrium on D to Ω, we have taken 2
2j − 1 n ωj ,κ , = 1 − (1 − κ2 ) sn ω n2 +j = −ωj , j = 1, . . . , , (3.13) ωmax n 2 so on each connected component of D the parameters ωj are asymptotically distributed as interpolation nodes of corresponding Zolotaryov approximants. Remark 1. Optimal (in the Cauchy–Hadamard sense) parameters λj /ωmax can be found as the roots U of the equations > > 1 dv 2U 2U 1 − 1 + arctan +1 arctan 2K(κ) 1−κ2 v v (v − 1 + κ2 )v(1 − v) (3.14)
=
(j − 0.5)π , n
j = 1, . . . , n.
But these parameters are not exploited in our reduced order models since we use Galerkin formulation (2.2) and its Ritz values may differ from optimal λj . Conjecture 1. Given (3.13) and (3.14), one can explicitly (in the Zolotaryov style) present the quantities maxz∈D |r(z)−1 | and maxλ≥0 |r(λ)| in terms of elliptic functions and obtain the upper bound σn (R+ , D) = O (ρn ) , where ρ is defined by formula (3.10). For the case when κ → 1 − 0 it is possible to obtain an asymptotical formula for ρ containing only elementary functions. In fact, in this case κ < 1 tends to 1 and the formulae K(κ) =
$\frac{1}{2} \log \frac{16}{1 - \kappa^2} + o(1)$
(see [1, (17.3.26)]) and
$$K(\sqrt{1 - \kappa^2}) = \frac{\pi}{2} + o(1)$$
enable us to transform (3.10) into the expression
(3.15) $$\rho = \exp\left( -\frac{\frac{\pi^2}{2} + o(1)}{\log \frac{\omega_{\max}}{\omega_{\min}} + \log 16} \right).$$
In Figures 1 and 2 we show the plots of the error $\frac{\max_{\lambda \ge 0} |r(\lambda)|}{|r(i\omega)|}$ for Zolotaryov approximants as functions of ω for n = 40 and 60, respectively. The error graphs show almost equal ripples on the prescribed spectral interval, which, by analogy with the Chebyshev real approximation theory, enables us to conjecture that our approximants are almost the best.

Fig. 1. 0.001 ≤ ω ≤ 1, n = 40, the error $\max_{\lambda \ge 0} |r(\lambda)| / |r(i\omega)|$ (horizontal axis: $\log_{10} \omega$).
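The convergence factor (3.10) and its elementary approximation (3.15) are easy to compare numerically. The following sketch evaluates the complete elliptic integrals through the arithmetic-geometric mean, which is a standard identity but an implementation choice made here and not something prescribed by the paper; the function names are hypothetical, and the relation $\omega_{\min}/\omega_{\max} = 1 - \kappa^2$ is taken from the text.

```python
import math


def agm(a, b, iterations=40):
    """Arithmetic-geometric mean of two positive numbers (quadratically convergent)."""
    for _ in range(iterations):
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a


def ellip_k(kappa):
    """Complete elliptic integral K(kappa) of modulus kappa, 0 <= kappa < 1."""
    return math.pi / (2.0 * agm(1.0, math.sqrt(1.0 - kappa * kappa)))


def rho_exact(omega_min, omega_max):
    """Convergence factor (3.10) with omega_min / omega_max = 1 - kappa^2."""
    kp2 = omega_min / omega_max                      # 1 - kappa^2
    kappa = math.sqrt(1.0 - kp2)
    # K(kappa) uses the complementary modulus sqrt(1 - kappa^2) inside the AGM
    # directly, which avoids cancellation when kappa is close to 1.
    k_kappa = math.pi / (2.0 * agm(1.0, math.sqrt(kp2)))
    k_comp = math.pi / (2.0 * agm(1.0, kappa))       # K(sqrt(1 - kappa^2))
    return math.exp(-math.pi * k_comp / (2.0 * k_kappa))


def rho_asymptotic(omega_min, omega_max):
    """Elementary approximation (3.15) with the o(1) term dropped."""
    denom = math.log(omega_max / omega_min) + math.log(16.0)
    return math.exp(-(math.pi ** 2 / 2.0) / denom)


if __name__ == "__main__":
    print(rho_exact(0.01, 15.0), rho_asymptotic(0.01, 15.0))
```

For the frequency range of the numerical experiments (0.01 Hz to 15 Hz) the two values should be close, since the ratio $\omega_{\min}/\omega_{\max}$ is small and κ is therefore close to 1.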
4. Numerical experiments. We consider the direct problem of magnetotelluric geophysical exploration. The electromagnetic field excited by the Sun propagates into the Earth. Using the Fourier transform (transfer function) of the measured field, geophysicists determine the underground distribution of conductivity σ, and the direct problem consists in the solution of (1.1) for a given frequency interval. In geophysical exploration the problem is considered in the conductive inhomogeneous half-space with a horizontal plane source at +∞. We deal with the plane electric wave polarized along a horizontal (x) direction for the frequency interval from 0.01 Hz to 15 Hz. The measurements are the ratios of the x-component of the electric field and the y-component of the magnetic field (impedances) taken at the plane z = 300 m. In our experiments we estimated the relative $L_2$ norm of the error on the plane. As was already mentioned, the most computationally expensive part of rational Arnoldi is the solution of shifted linear systems. We used for this purpose a preconditioned Krylov subspace (QMR) solver [37].

Fig. 2. 0.001 ≤ ω ≤ 1, n = 60, the error $\max_{\lambda \ge 0} |r(\lambda)| / |r(i\omega)|$ (horizontal axis: $\log_{10} \omega$).
Fig. 3. Medium for test 1: A homogeneous conductive half-space.

In the first test we consider the homogeneous half-space shown in Figure 3. Figure 4 shows the comparison of frequency distribution of the errors for geometric and Zolotaryov grids for test 1 with n = 16. The geometric grid is the most common ad hoc grid used in applications. Indeed, Zolotaryov's grids are superior. However, for large $\omega_{\max}/\omega_{\min} \gg n$ the zeros of a Zolotaryov approximant's error are visually close to a geometric progression, and the convergence rate of the approximant, based on the
geometric progression grids, approaches that of the optimal (Zolotaryov's) one [19]. However, as we see from the graphs, the error distribution for the Zolotaryov grid is more uniform than the one for the geometric grid on $[\omega_{\min}, \omega_{\max}]$, which results in slightly better accuracy in the $L_\infty[\omega_{\min}, \omega_{\max}]$ norm.

Fig. 4. Test 1: Error distribution for the geometric and Zolotaryov grids, $\omega_{\min} = 0.01$, $\omega_{\max} = 15$ (error of the Galerkin approximant with n = 16 versus frequency).
Fig. 5. Medium for test 2.

In test 2 we consider a more complicated medium consisting of a resistive target (oil reservoir) embedded under the sea bottom of variable depth (see Figure 5). The spectral distribution for this problem varies more than for the previous one (though still without holes in the spectral measure's support), so both Zolotaryov and geometric progression exhibit more nonuniform error distribution, but the Zolotaryov error remains more uniform and smaller in the $L_\infty[\omega_{\min}, \omega_{\max}]$ norm (see Figure 6).

Fig. 6. Test 2: Error distribution for the geometric and Zolotaryov grids, $\omega_{\min} = 0.01$, $\omega_{\max} = 15$ (error of the Galerkin approximant with n = 24 versus frequency).
Fig. 7. Convergence for Zolotaryov and geometric grids (test 1) and comparison with theoretical results (error versus n; curves: Zolotaryov grid, geometric grid, asymptotical estimate).

In Figures 7 and 8 we show the errors (for both the grids) in the $L_\infty[\omega_{\min}, \omega_{\max}]$ norm as functions of n for tests 1 and 2, respectively. For both tests the Zolotaryov grid slightly outperforms the geometric one, and the average slopes of the Zolotaryov error curves are in good agreement with the asymptotic estimate determined by (3.15).
Fig. 8. Convergence for Zolotaryov and geometric grids (test 2) and comparison with theoretical results (error versus n; curves: Zolotaryov grid, geometric grid, asymptotical estimate).
The asymptotic estimate is computed as
$$c \exp\left( -n\, \frac{\pi^2 / 2}{\log \frac{\omega_{\max}}{\omega_{\min}} + \log 16} \right),$$
with a constant c chosen to fit the actual Zolotaryov error. For n = 32 it took 35 minutes of computer time on a PC with a Pentium IV 2 GHz processor to solve the problem from test 2 (our preconditioner allows us to obtain the exact solution after just one QMR iteration for test 1) with 6 digits of accuracy. For comparison, the same task took 32450 steps and 252 minutes of computer time for the SLDM. So the RKS reduction significantly outperforms the SLDM, but not without drawbacks. The RKS reduction requires additional memory to store $G_n$ and a priori knowledge of the Krylov subspace dimension n.

5. Conclusive remarks.
• The problem of optimization of rational Krylov subspaces (RKS) for computation of the resolvent of self-adjoint operators can be reduced to the third Zolotaryov problem in the complex plane.
• This problem can be asymptotically solved in a closed form for a bounded positive frequency interval.
• The numerical experiments confirm the theoretical results for the models from geophysical applications.
• We are looking into possibilities of extension of the developed approach to non-Hermitian operators and the computation of exponentials and other functions of operators.
• A drawback of the developed approach is that the dimension of the rational Krylov subspace should be known a priori. We are planning to address this issue in our future research.
Appendix. Proof of Theorem 3.4 and auxiliary assertions. In subsection A.1 we shall establish properties of the Green function for the domain $\mathbb{C} \setminus \mathbb{R}_-$; the relation between values on $\mathbb{R}_+$ and on $i\mathbb{R}$ is the key point. In subsection A.2 we shall compare the corresponding potentials of two measures supported, respectively, on $\mathbb{R}$ and $i\mathbb{R}$. This will enable us to express the asymptotic convergence factor of our (complex) third Zolotaryov problem through that of the classical (real) problem studied by Zolotaryov himself.

A.1. Green's function.

Remark 2. Due to technical reasons, we prefer to handle the condenser $(\mathbb{R}_-, D)$ instead of $(\mathbb{R}_+, D)$. Of course, $\sigma_n(\mathbb{R}_-, D) = \sigma_n(\mathbb{R}_+, D)$ because of the symmetry.

Removing from the complex plane the support $\mathbb{R}_-$ of the measure generating the Markov function
(A.1) $$z^{-1/2} = \int_{-\infty}^{0} \frac{1}{\pi \sqrt{-x}}\, (z - x)^{-1}\, dx, \qquad z \notin \mathbb{R}_-$$
(see [4, part 1, section 2.2, p. 47]), we obtain the domain $\Omega = \mathbb{C} \setminus \mathbb{R}_-$. According to a definition from [25, Chapter 5, section 5] or [31, section A.V], Green's function (of two variables) for Ω,
$$g_\Omega(z, x), \qquad z, x \in \Omega,$$
is the one satisfying the following conditions: (1) the function $g_\Omega(z, x)$ as a function of z is harmonic in the domain $\Omega \setminus \{x\}$; (2) the function
$$g_\Omega(z, x) - \log \frac{1}{|z - x|}$$
is bounded in some vicinity of the point x; (3) the limit value of $g_\Omega(z, x)$ as z tends to a point from $\mathbb{R}_-$ is zero.

Lemma A.1. Green's function (of two variables) for the domain Ω is expressed by the formula
(A.2) $$g_\Omega(z, x) = \log \left| \frac{\sqrt{z} + \sqrt{\bar x}}{\sqrt{z} - \sqrt{x}} \right|, \qquad z, x \in \Omega.$$

Proof. It is known [25, Chapter 5, section 5] that
(A.3) $$g_\Omega(z, x) = \log |\phi(z, x)|, \qquad z, x \in \Omega,$$
where with a fixed argument x the slice $z \mapsto \phi(z, x)$ conformally maps $\Omega \cup \{\infty\}$ onto the exterior of the unit circle in $\mathbb{C}$ in such a way that $\phi(x, x) = \infty$. We shall build φ as a composition of the following conformal mappings:
(A.4) $$z \mapsto \frac{\sqrt{z} - 1}{\sqrt{z} + 1}$$
transforms [21, p. 428] Ω into the open unit circle;
(A.5) $$z \mapsto \frac{z - a}{1 - \bar a z}, \qquad |a| < 1,$$
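transforms [24, p. 104] the open unit circle into itself; the inversion
(A.6) $$z \mapsto \frac{1}{z}$$
transforms the open unit circle into the exterior of the open unit circle. We shall choose the parameter value
$$a = \frac{\sqrt{x} - 1}{\sqrt{x} + 1},$$
so that
(A.7) $$\frac{1 + a}{1 - a} = \frac{\sqrt{x} + 1 + \sqrt{x} - 1}{\sqrt{x} + 1 - \sqrt{x} + 1} = \sqrt{x}.$$
Composing the mappings (A.4)–(A.6) and taking (A.7) into account, we obtain
$$\phi(z, x) = \frac{1 - \bar a\, \frac{\sqrt{z} - 1}{\sqrt{z} + 1}}{\frac{\sqrt{z} - 1}{\sqrt{z} + 1} - a} = \frac{\sqrt{z} + 1 - \bar a (\sqrt{z} - 1)}{\sqrt{z} - 1 - a (\sqrt{z} + 1)} = \frac{(1 - \bar a)\sqrt{z} + (1 + \bar a)}{(1 - a)\sqrt{z} - (1 + a)} = \frac{1 - \bar a}{1 - a} \cdot \frac{\sqrt{z} + \sqrt{\bar x}}{\sqrt{z} - \sqrt{x}},$$
which in conjunction with (A.3) yields (A.2).

Remark 2. Notwithstanding that representation (A.2) is unsymmetric, it is easy to see that the symmetry property
$$g_\Omega(z, x) = g_\Omega(x, z), \qquad z, x \in \Omega,$$
holds.

Lemma A.2. If $u, v \in \mathbb{R}$, $u, v > 0$, then
(A.8) $$g_\Omega(ui, vi) + g_\Omega(ui, -vi) = g_\Omega(u, v).$$

Proof. Indeed, we derive from (A.2)
$$g_\Omega(ui, vi) + g_\Omega(ui, -vi) = \log \left| \frac{\sqrt{ui} + \sqrt{-vi}}{\sqrt{ui} - \sqrt{vi}} \right| + \log \left| \frac{\sqrt{ui} + \sqrt{vi}}{\sqrt{ui} - \sqrt{-vi}} \right| = \log \left| \frac{\sqrt{u} + \sqrt{v}}{\sqrt{u} - \sqrt{v}} \right| + \log \left| \frac{\sqrt{u} + i\sqrt{v}}{\sqrt{u} - i\sqrt{v}} \right| = \log \left| \frac{\sqrt{u} + \sqrt{v}}{\sqrt{u} - \sqrt{v}} \right| = g_\Omega(u, v).$$

Lemma A.3. The following differential relations hold:
(A.9) $$\left. \frac{\partial g_\Omega(-u + i\varepsilon, v)}{\partial \varepsilon} \right|_{\varepsilon = +0} = \frac{\sqrt{v}}{\sqrt{u}\,(u + v)},$$
(A.10) $$\left. \frac{\partial g_\Omega(-u + i\varepsilon, vi)}{\partial \varepsilon} \right|_{\varepsilon = +0} = \frac{\sqrt{v}}{\sqrt{2u} \left[ \frac{v}{2} + \left( \sqrt{u} - \sqrt{\frac{v}{2}} \right)^2 \right]},$$
(A.11) $$\left. \frac{\partial g_\Omega(-u + i\varepsilon, -vi)}{\partial \varepsilon} \right|_{\varepsilon = +0} = \frac{\sqrt{v}}{\sqrt{2u} \left[ \frac{v}{2} + \left( \sqrt{u} + \sqrt{\frac{v}{2}} \right)^2 \right]},$$
$$u, v \in \mathbb{R}, \qquad u, v > 0.$$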
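The identity (A.8) and the one-sided derivatives (A.9)–(A.11) can be checked numerically from formula (A.2). The sketch below assumes principal branches of the square roots, which is the natural choice on the slit plane, and approximates the derivative by a one-sided difference quotient; the function names `green` and `normal_derivative` and the step size are assumptions made for this illustration.

```python
import cmath
import math


def green(z, x):
    """Green's function (A.2) of the plane slit along the negative real axis."""
    z, x = complex(z), complex(x)
    num = cmath.sqrt(z) + cmath.sqrt(x.conjugate())
    den = cmath.sqrt(z) - cmath.sqrt(x)
    return math.log(abs(num / den))


def normal_derivative(u, x, eps=1e-6):
    """Difference quotient of g(-u + i*eps, x) with respect to eps at eps = +0.

    The boundary value g(-u + i0, x) is zero, so g / eps approximates the derivative.
    """
    return green(complex(-u, eps), x) / eps


if __name__ == "__main__":
    u, v = 0.7, 0.3
    # Left- and right-hand sides of (A.8).
    print(green(1j * u, 1j * v) + green(1j * u, -1j * v), green(u, v))
    # Difference quotient versus the closed form (A.9).
    print(normal_derivative(u, v), math.sqrt(v) / (math.sqrt(u) * (u + v)))
```

The same helper can be applied with the second argument equal to `1j * v` or `-1j * v` to compare against the right-hand sides of (A.10) and (A.11).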
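Proof. The symbol $\doteq$ will denote an equality up to an o(ε) addend. The limit values of $g_\Omega(z, x)$ are zero when z or x tends to a point from $\mathbb{R}_-$. First, we have
$$\log \left| \frac{\sqrt{-u + i\varepsilon} + \sqrt{v}}{\sqrt{-u + i\varepsilon} - \sqrt{v}} \right| = \log \left| \frac{\sqrt{-1 + i\frac{\varepsilon}{u}} + \sqrt{\frac{v}{u}}}{\sqrt{-1 + i\frac{\varepsilon}{u}} - \sqrt{\frac{v}{u}}} \right| \doteq \log \left| \frac{i + \frac{\varepsilon}{2u} + \sqrt{\frac{v}{u}}}{i + \frac{\varepsilon}{2u} - \sqrt{\frac{v}{u}}} \right|$$
$$\doteq \frac{1}{2} \log \frac{1 + \frac{v}{u} + \frac{\varepsilon}{u}\sqrt{\frac{v}{u}}}{1 + \frac{v}{u} - \frac{\varepsilon}{u}\sqrt{\frac{v}{u}}} \doteq \frac{1}{2} \log \left( 1 + 2\, \frac{\frac{\varepsilon}{u}\sqrt{\frac{v}{u}}}{1 + \frac{v}{u}} \right) \doteq \frac{\frac{\varepsilon}{u}\sqrt{\frac{v}{u}}}{1 + \frac{v}{u}},$$
which gives (A.9). Second, we obtain
$$\log \left| \frac{\sqrt{-u + i\varepsilon} + \sqrt{-vi}}{\sqrt{-u + i\varepsilon} - \sqrt{vi}} \right| \doteq \log \left| \frac{\sqrt{u}\, i \left( 1 - \frac{i\varepsilon}{2u} \right) + \sqrt{v}\, \frac{1 - i}{\sqrt{2}}}{\sqrt{u}\, i \left( 1 - \frac{i\varepsilon}{2u} \right) - \sqrt{v}\, \frac{1 + i}{\sqrt{2}}} \right| = \log \left| \frac{\frac{\varepsilon}{2\sqrt{u}} + \sqrt{\frac{v}{2}} + \left( \sqrt{u} - \sqrt{\frac{v}{2}} \right) i}{\frac{\varepsilon}{2\sqrt{u}} - \sqrt{\frac{v}{2}} + \left( \sqrt{u} - \sqrt{\frac{v}{2}} \right) i} \right|$$
$$= \frac{1}{2} \log \frac{\left( \frac{\varepsilon}{2\sqrt{u}} - \sqrt{\frac{v}{2}} \right)^2 + 4\, \frac{\varepsilon}{2\sqrt{u}} \sqrt{\frac{v}{2}} + \left( \sqrt{u} - \sqrt{\frac{v}{2}} \right)^2}{\left( \frac{\varepsilon}{2\sqrt{u}} - \sqrt{\frac{v}{2}} \right)^2 + \left( \sqrt{u} - \sqrt{\frac{v}{2}} \right)^2} \doteq \frac{1}{2} \log \left( 1 + 2\, \frac{\frac{\varepsilon}{\sqrt{u}} \sqrt{\frac{v}{2}}}{\frac{v}{2} + \left( \sqrt{u} - \sqrt{\frac{v}{2}} \right)^2} \right) \doteq \frac{\varepsilon \sqrt{\frac{v}{2u}}}{\frac{v}{2} + \left( \sqrt{u} - \sqrt{\frac{v}{2}} \right)^2};$$
this leads to (A.10). Third, we analogously derive
$$\log \left| \frac{\sqrt{-u + i\varepsilon} + \sqrt{vi}}{\sqrt{-u + i\varepsilon} - \sqrt{-vi}} \right| \doteq \log \left| \frac{\sqrt{u}\, i \left( 1 - \frac{i\varepsilon}{2u} \right) + \sqrt{v}\, \frac{1 + i}{\sqrt{2}}}{\sqrt{u}\, i \left( 1 - \frac{i\varepsilon}{2u} \right) - \sqrt{v}\, \frac{1 - i}{\sqrt{2}}} \right| = \log \left| \frac{\frac{\varepsilon}{2\sqrt{u}} + \sqrt{\frac{v}{2}} + \left( \sqrt{u} + \sqrt{\frac{v}{2}} \right) i}{\frac{\varepsilon}{2\sqrt{u}} - \sqrt{\frac{v}{2}} + \left( \sqrt{u} + \sqrt{\frac{v}{2}} \right) i} \right|$$
$$= \frac{1}{2} \log \frac{\left( \frac{\varepsilon}{2\sqrt{u}} - \sqrt{\frac{v}{2}} \right)^2 + 4\, \frac{\varepsilon}{2\sqrt{u}} \sqrt{\frac{v}{2}} + \left( \sqrt{u} + \sqrt{\frac{v}{2}} \right)^2}{\left( \frac{\varepsilon}{2\sqrt{u}} - \sqrt{\frac{v}{2}} \right)^2 + \left( \sqrt{u} + \sqrt{\frac{v}{2}} \right)^2} \doteq \frac{1}{2} \log \left( 1 + 2\, \frac{\frac{\varepsilon}{\sqrt{u}} \sqrt{\frac{v}{2}}}{\frac{v}{2} + \left( \sqrt{u} + \sqrt{\frac{v}{2}} \right)^2} \right) \doteq \frac{\varepsilon \sqrt{\frac{v}{2u}}}{\frac{v}{2} + \left( \sqrt{u} + \sqrt{\frac{v}{2}} \right)^2};$$
this justifies (A.11).

A.2. Two measures and their potentials. It follows from the explicit formulae [2, section 39] for the extremal error points of diagonal Zolotaryov approximants to the function $z^{-1/2}$ on the segment $[1 - \kappa^2, 1]$ that, as the approximant's degree tends to infinity, the interpolation points are, in the limit, distributed according to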
the probability measure α on $[1 - \kappa^2, 1]$, defined by the equality
$$\alpha'(x) = \frac{1}{2K(\kappa) \sqrt{(x + \kappa^2 - 1)\, x\, (1 - x)}}.$$
Since Zolotaryov approximants are optimal (though with a weight), the measure α is equilibrium with respect to Ω. Without loss of generality we can assume that $\omega_{\max} = 1$. Now introduce the following probability measure β on the compact D:
(A.12) $$\beta(iX) = \beta(-iX) = \frac{\alpha(X)}{2}, \qquad X \text{ a measurable subset of } [1 - \kappa^2, 1].$$
Define the two potentials
(A.13) $$g(\alpha, \Omega; z) = \int_{1 - \kappa^2}^{1} g_\Omega(z, x)\, d\alpha(x), \qquad g(\beta, \Omega; z) = \int_D g_\Omega(z, x)\, d\beta(x).$$

Proposition A.4. The measure β is the equilibrium one for the compact D with respect to the domain Ω. The (common) value of $g(\beta, \Omega; z)$ on D equals half the (common) value of $g(\alpha, \Omega; z)$ on $[1 - \kappa^2, 1]$.

Proof. The two potentials on the corresponding supports owing to (A.8), (A.12), and (A.13) are related by
$$g(\beta, \Omega; ui) = \frac{1}{2} \int_{1 - \kappa^2}^{1} g_\Omega(ui, vi)\, d\alpha(v) + \frac{1}{2} \int_{1 - \kappa^2}^{1} g_\Omega(ui, -vi)\, d\alpha(v) = \frac{1}{2} \int_{1 - \kappa^2}^{1} g_\Omega(u, v)\, d\alpha(v) = \frac{1}{2}\, g(\alpha, \Omega; u),$$
$$u \in \mathbb{R}, \qquad 1 - \kappa^2 \le u \le 1.$$
It remains to recall that the potential $g(\alpha, \Omega; u)$ is constant on $[1 - \kappa^2, 1]$.

Lemma A.5. The two potentials satisfy the equality
(A.14) $$\int_0^{+\infty} \frac{\partial g(\alpha, \Omega; -u)}{\partial \nu_u}\, du = \int_0^{+\infty} \frac{\partial g(\beta, \Omega; -u)}{\partial \nu_u}\, du \quad (= \pi),$$
where ν is the upward (or, which is the same due to the symmetry, downward) normal.

Proof. On the one hand, in view of (A.1) and (A.9)
(A.15) $$\int_0^{+\infty} \frac{\partial g(\alpha, \Omega; -u)}{\partial \nu_u}\, du = \int_0^{+\infty} \int_{1 - \kappa^2}^{1} \frac{\partial g_\Omega(-u, v)}{\partial \nu_u}\, d\alpha(v)\, du = \int_{1 - \kappa^2}^{1} \left( \int_0^{+\infty} \frac{du}{\sqrt{u}\,(u + v)} \right) \sqrt{v}\, d\alpha(v) = \pi \int_{1 - \kappa^2}^{1} d\alpha(v) = \pi.$$
On the other hand, making at a suitable moment the change of variables $u = vt^2$ and
exploiting formulae (A.10), (A.11) and [13, item 2.172], derive
$$\int_0^{+\infty} \frac{\partial g(\beta, \Omega; -u)}{\partial \nu_u}\, du = \int_{1 - \kappa^2}^{1} \frac{1}{2} \int_0^{+\infty} \left( \frac{\partial g_\Omega(-u, vi)}{\partial \nu_u} + \frac{\partial g_\Omega(-u, -vi)}{\partial \nu_u} \right) du\, d\alpha(v)$$
$$= \frac{1}{2} \int_{1 - \kappa^2}^{1} \int_0^{+\infty} \left( \frac{1}{\sqrt{2u}\,(u + v - \sqrt{2uv})} + \frac{1}{\sqrt{2u}\,(u + v + \sqrt{2uv})} \right) du\, \sqrt{v}\, d\alpha(v)$$
$$= \frac{1}{2} \int_{1 - \kappa^2}^{1} \left( \int_0^{+\infty} \frac{2vt\, dt}{\sqrt{2vt^2}\,(vt^2 + v - \sqrt{2v^2t^2})} + \int_0^{+\infty} \frac{2vt\, dt}{\sqrt{2vt^2}\,(vt^2 + v + \sqrt{2v^2t^2})} \right) \sqrt{v}\, d\alpha(v)$$
$$= \frac{1}{\sqrt{2}} \int_{1 - \kappa^2}^{1} \left( \int_0^{+\infty} \frac{dt}{t^2 - \sqrt{2}\,t + 1} + \int_0^{+\infty} \frac{dt}{t^2 + \sqrt{2}\,t + 1} \right) d\alpha(v)$$
$$= \int_{1 - \kappa^2}^{1} \left( \left. \arctan \frac{2t - \sqrt{2}}{\sqrt{2}} \right|_{t = 0}^{t = +\infty} + \left. \arctan \frac{2t + \sqrt{2}}{\sqrt{2}} \right|_{t = 0}^{t = +\infty} \right) d\alpha(v)$$
(A.16) $$= \frac{\pi}{2} + \frac{\pi}{4} + \frac{\pi}{2} - \frac{\pi}{4} = \pi.$$
Comparing (A.15) and (A.16), we get (A.14).

A.3. Proof of Theorem 3.4.

Proof. It follows from [11, section 1] that the Riemann modulus of the condenser $(\mathbb{R}_-, [1 - \kappa^2, 1])$ equals $\rho^2$. This implies (see [10, section 3]) that
$$\lim_{n \to \infty} \sqrt[n]{\sigma_n(\mathbb{R}_-, [1 - \kappa^2, 1])} = \rho^2.$$
Take into account that potentials (A.13), divided by their values on the compacts $[1 - \kappa^2, 1]$ and D, respectively, solve the Dirichlet problems with the zero boundary condition on $\mathbb{R}_-$ and unity boundary condition on $[1 - \kappa^2, 1]$ or D (these harmonic functions are called harmonic measures; see [29, section 4.3]). Formula (27) from [36, section 8.7, Theorem 9] and the definition of the quantity τ from that theorem's proof show how the quantities
$$\lim_{n \to \infty} \sqrt[n]{\sigma_n(\mathbb{R}_-, D)} \quad \text{and} \quad \lim_{n \to \infty} \sqrt[n]{\sigma_n(\mathbb{R}_-, [1 - \kappa^2, 1])}$$
are expressed in terms of the harmonic measures: the asymptotic convergence factors' logarithms are inversely proportional to the integral over $\mathbb{R}_-$ of the normal derivative of harmonic measures (it is sufficient to know the integrals over one of the two edges of the slit $\mathbb{R}_-$). Assertion (3.12) is a consequence of Lemma A.5 and Proposition A.4. Assertion (3.11) then follows from [10, Theorem 1].

Remark 3. The proof of the mentioned Theorem 9 from [36, section 8.7] shows that solutions that are optimal in the Cauchy–Hadamard sense can be taken with $\omega_j \in D$ and $\lambda_j \in \mathbb{R}_-$.

Acknowledgments. The authors are thankful to A. B. Bogatyryov, M. Botchev, V. I. Lebedev, S. P. Suetin, and E. E. Tyrtyshnikov for bibliographical support. The authors are grateful to B. Beckermann for pointing out a stable and simple variant of the rational Arnoldi method.
970
L. KNIZHNERMAN, V. DRUSKIN, AND M. ZASLAVSKY REFERENCES
[1] M. Abramowitz and J. Stegan, eds., Handbook of Mathematical Functions, Appl. Math. 55, National Bureau of Standards, Washington, D.C., 1964. [2] N. I. Akhiezer, Theory of Approximation, Dover, New York, 1992. [3] Z. Bai, Krylov subspace techniques for reduced-order modeling of large-scale dynamical systems, Appl. Numer. Math., 43 (2002), pp. 9–44. [4] G. A. Baker and P. Graves-Morris, Pad´ e Approximants, Addison–Wesley, London, 1996. [5] A. Bultheel, P. Gonzalez-Vera, E. Hendriksen, and O. Njastad, Orthogonal Rational Functions, Cambridge University Press, Cambridge, UK, 1999. [6] V. L. Druskin and L. A. Knizhnerman, A spectral semi-discrete method for the numerical solution of three-dimensional non-stationary electrical prospecting problems, Izv. Akad. Nauk SSSR Ser. Fiz. Zemli, 8 (1988), pp. 63–74 (in Russian; translated into English). [7] N. S. Ellner and E. L. Wachspress, Alternating direction implicit iteration for systems with complex spectra, SIAM J. Numer. Anal., 28 (1991), pp. 859–870. [8] R. W. Freund, Model reduction methods based on Krylov subspaces, Acta Numer., 12 (2003), pp. 267–319. [9] E. Gallopoulos and Y. Saad, Efficient solution of parabolic equations by Krylov approximation method, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 1236–1264. [10] A. A. Gonchar, Zolotarev problems connected with rational functions, Mat. Sb., 7 (1969), pp. 623–635 (in Russian; translated into English). [11] A. A. Gonchar, On the speed of rational approximation of some analytic functions, Mat. Sb., 34 (1978), pp. 131–145 (in Russian; translated into English). [12] S. A. Goreinov, Mosaic-skeleton approximations of matrices generated by asymptotically smooth and oscillative kernels, in Matrix Methods and Computations, Inst. Numer. Math. RAS, Moscow, 1999, pp. 42–76 (in Russian). [13] I. S. Gradshteyn and I. M. Ryzhik, Tables of Integrals, Series, and Products, Academic Press, New York, 2000. [14] E. J. Grimme, Krylov Projection Methods for Model Reduction, Ph.D. 
thesis, The University of Illinois at Urbana-Champaign, 1997. [15] S. Gugercin, A. Antoulas, and C. Beattie, A rational Krylov iteration for optimal H2 model reduction, in Proceedings of the 17th International Symposium on Mathematical Theory of Networks and Systems, Kyoto, Japan, 2006, pp. 1665–1667. [16] W. Hackbusch, B. N. Khoromskii, and E. E. Tyrtyshnikov, Hierarchical Kronecker tensorproduct approximations, J. Numer. Math., 13 (2005), pp. 119–156. [17] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand., 49 (1952), pp. 409–436. [18] M. Hochbruck and C. Lubich, On Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal., 34 (1997), pp. 1911–1925. [19] D. Ingerman, V. Druskin, and L. Knizhnerman, Optimal finite difference grids and rational approximations of the square root. I. Elliptic functions, Commun. Pure Appl. Math., 53 (2000), pp. 1039–1066. [20] M.-P. Istace and J.-P. Thiran, On the third and fourth Zolotarev problems in the complex plane, SIAM J. Numer. Anal., 32 (1995), pp. 249–259. [21] P. K. Kythe, Computational Conformal Mapping, Birkh¨ auser, Boston, 1998. [22] V. I. Lebedev, On Zolotarev problems in the alternating direction method. II, in Trudy Semin. S. L. Sobolev 1, Novosibirsk, Nauka, 1976, pp. 51–59 (in Russian). [23] L. Meier and D. Luenberger, Approximation of linear constant systems, IEEE Trans. Automat. Control, 12 (1967), pp. 585–588. [24] Z. Nehari, Conformal Mapping, Dover, New York, 1975. [25] E. M. Nikishin and V. N. Sorokin, Rational Approximations and Orthogonality, Nauka, Moscow, 1988 (in Russian); English translation in Transl. Math. Monogr., AMS, Providence, RI, 1991. [26] B. Nour-Omid, Lanczos method for heat conduction analysis, Internat. J. Numer. Methods Engrg., 24 (1987), pp. 251–262. [27] B. Nour-Omid and R. W. Clough, Dynamic analysis of structure using Lanczos co-ordinates, Earthquake Eng. and Struct. Dynamics, 12 (1984), pp. 
565–577. [28] I. V. Oseledets, Lower bounds for separable approximations of the Hilbert kernel, Mat. Sb., 198 (2007), pp. 425–432 (in Russian; translated into English). [29] T. Ransford, Potential Theory in the Complex Plane, London Math. Soc. Stud. Texts 28, Cambridge University Press, Cambridge, UK, 1995. [30] A. Ruhe, The rational Krylov algorithm for nonsymmetric eigenvalue problems. III: Complex shifts for real matrices, BIT, 34 (1994), pp. 165–176.
[31] H. Stahl and V. Totik, General Orthogonal Polynomials, Encyclopedia Math. Appl. 43, Cambridge University Press, Cambridge, UK, 1992. [32] G. Starke, Optimal alternating direction implicit parameters for nonsymmetric systems of linear equations, SIAM J. Numer. Anal., 28 (1991), pp. 1431–1445. [33] M. E. Taylor, Partial Differential Equations II: Qualitative Studies of Linear Equations, Springer, New York, 1991. [34] E. E. Tyrtyshnikov, Mosaic-skeleton approximations, Calcolo, 33 (1996), pp. 47–57. [35] H. A. Van der Vorst, An iterative solution method for solving f (A)x = b using Krylov subspace information obtained for the symmetric positive definite matrix, J. Comput. Appl. Math., 18 (1987), pp. 249–263. [36] J. L. Walsh, Interpolation and Approximation by Rational Functions in the Complex Domain, AMS, Providence, RI, 1960. [37] M. Zaslavsky, S. Davydycheva, V. Druskin, A. Abubakar, T. Habashy, and L. Knizhnerman, Finite-difference solution of the 3D electromagnetic problem using divergence-free preconditioners, in Proceedings of SEG Annual Meeting, New Orleans, 2006, pp. 775–778.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 972–996
HARDY SPACE INFINITE ELEMENTS FOR SCATTERING AND RESONANCE PROBLEMS∗
THORSTEN HOHAGE† AND LOTHAR NANNEN†
Abstract. This paper introduces a new type of infinite element for scattering and resonance problems that is derived from a variant of the pole condition as radiation condition. This condition states that a certain transform of the exterior solution belongs to the Hardy space of $L^2$ boundary values of holomorphic functions on the unit disc if and only if the solution is outgoing. We obtain a symmetric variational formulation of the problem in this Hardy space. Our infinite elements correspond to a Galerkin discretization with respect to the standard monomial orthogonal basis of this Hardy space and lead to simple element matrices. Hardy space infinite elements are particularly well suited for solving resonance problems since they preserve the eigenvalue structure of the problem. We prove superalgebraic convergence for a separated problem. Numerical experiments exhibit fast convergence over a wide range of wave numbers.
Key words. transparent boundary conditions, radiation conditions, pole condition, infinite elements, Hardy spaces, Helmholtz equation
AMS subject classifications. 65N30, 65N12, 35B34, 35J20, 44A10
DOI. 10.1137/070708044
1. Introduction. For solving a time-harmonic wave equation on an unbounded domain by finite element methods, appropriate boundary conditions have to be imposed on the artificial boundary of the necessarily finite computational domain. These boundary conditions should be chosen in such a way that the solution of the boundary value problem on the computational domain is a good approximation to the restriction of the solution of the wave equation posed on the unbounded domain. Such conditions are called transparent boundary conditions and replace the radiation condition at infinity. The method proposed in this paper works well for scattering problems, but a particular advantage over numerous competing transparent boundary conditions is the ability to easily treat resonance problems. Such problems appear in molecular physics, acoustics, lasers, and numerous other areas of engineering, natural sciences, and mathematics (cf. [22, 14, 13, 7, 25]). A typical resonance problem for the Neumann–Laplacian in the complement of a smooth, compact domain $K \subset \mathbb{R}^d$ such that $\mathbb{R}^d \setminus K$ is connected consists in finding a nontrivial eigenpair $(u, \lambda) \in H^2_{\mathrm{loc}}(\mathbb{R}^d \setminus K) \times \mathbb{C}$ such that
(1.1a) $-\Delta u = \lambda u$ in $\mathbb{R}^d \setminus K$,
(1.1b) $\dfrac{\partial u}{\partial \nu} = 0$ on $\partial K$,
(1.1c) $u$ satisfies a radiation condition.
Here $\partial u/\partial \nu$ denotes the outward normal derivative. For other equivalent definitions of resonances we refer to [23, 25].
∗ Received by the editors November 20, 2007; accepted for publication (in revised form) August 8, 2008; published electronically February 13, 2009. This work was supported by the Deutsche Forschungsgemeinschaft (DFG), grant Ho 2551/2-1. http://www.siam.org/journals/sinum/47-2/70804.html
† Institute of Numerical and Applied Mathematics, University of Göttingen, D-37083 Göttingen, Germany.
In the scattering problem corresponding to (1.1), the number $\lambda \in (0, \infty)$ is given and the homogeneous boundary condition (1.1b) is replaced by an inhomogeneous boundary condition. In the following let $\lambda = \kappa^2$ with $\Re(\kappa) > 0$ and assume that $K$ is contained in the ball $B_a := \{x : |x| < a\}$ of radius $a > 0$. One of several equivalent formulations of the radiation condition (1.1c) is that $u$ has an expansion in terms of Hankel functions $H^{(1)}_n$ of the first kind,
(1.2) $u(x) = \sum_{l=0}^{\infty} \sum_{m=0}^{M_l} \alpha_{l,m}\, (\kappa|x|)^{1-d/2}\, H^{(1)}_{l-1+d/2}(\kappa|x|)\, Y_{l,m}\!\left(\frac{x}{|x|}\right), \qquad |x| > a,$
where $\{Y_{l,0}, \dots, Y_{l,M_l}\}$ is an orthonormal basis of the $l$-th eigenspace of the Laplace–Beltrami operator on $S^{d-1}$. ($Y_{l,m}$ are spherical harmonics for $d = 3$ and trigonometric monomials for $d = 2$.) A solution $u$ to (1.1a) satisfying (1.2) is called outgoing, whereas a solution with a corresponding expansion in terms of Hankel functions of the second kind is called incoming. It can be shown that all resonances $\kappa = \sqrt{\lambda}$, $\Re(\kappa) > 0$, of (1.1) satisfy $\Im(\kappa) < 0$ (cf. [23]). For such values of $\kappa$, it follows from the asymptotic behavior of Hankel functions,
(1.3) $|H^{(1)}_l(z)| = \sqrt{\dfrac{2}{\pi |z|}}\, |e^{iz}| \left(1 + O\!\left(\dfrac{1}{|z|}\right)\right), \qquad |H^{(2)}_l(z)| = \sqrt{\dfrac{2}{\pi |z|}}\, |e^{-iz}| \left(1 + O\!\left(\dfrac{1}{|z|}\right)\right), \qquad |z| \to \infty,$
that outgoing solutions are exponentially increasing at infinity, and incoming solutions are exponentially decreasing. This implies in particular that incoming, but not outgoing, solutions satisfy the Sommerfeld radiation condition
(1.4) $r^{(d-1)/2} \left( \dfrac{\partial u}{\partial r} - i\kappa u \right) \to 0$ as $r = |x| \to \infty$
for $\Im(\kappa) < 0$ since condition (1.4) (as well as the conjugate condition with $-i$ replaced by $i$) selects exponentially decaying solutions. Hence the Sommerfeld condition does not characterize outgoing waves for $\Im(\kappa) < 0$. The fact that (1.4) is not valid for $\Im(\kappa) < 0$ rules out the simple transparent boundary condition $\partial u/\partial r = i\kappa u$ on $\partial B_a$ for resonance problems as well as higher order local conditions [11, 6].
Standard infinite elements are based on the series expansion (1.2) or the Wilcox expansion [3, 4]. Since $\kappa$ appears in (1.2) in a very nonlinear way inside the argument of the Hankel functions, standard infinite elements destroy the eigenvalue structure of problem (1.1). The same holds true for boundary element methods. On the other hand, the perfectly matched layer (PML) method preserves the eigenvalue structure, and has been used under the name complex scaling for the theoretical study and the numerical computation of resonances in molecular physics since the 1970s [14, 22]. Despite the name, Hardy space infinite elements are actually closer to PML than to classical infinite elements (cf. [10]).
In this paper we will use the pole condition as radiation condition (cf. [18, 9, 10]). The formulation used in this paper states that a function $u$ is outgoing if and only if a certain transform of $u$ in a radial direction belongs to the Hardy space $H^+(S^1)$ on the complex unit circle $S^1$. Analogously $u$ is incoming if and only if the same transform of $u$ belongs to the orthogonal complement of $H^+(S^1)$ in $L^2(S^1)$. Therefore, we apply the above transform to the variational formulation of the exterior Helmholtz equation and incorporate the radiation condition by restricting $L^2(S^1)$ to the correct Hardy space. Hardy space infinite elements correspond to the Galerkin method applied to
this variational problem using the standard monomial orthogonal basis of the Hardy space $H^+(S^1)$. For one-dimensional time-dependent problems a similar approach has been studied in [16].
The rest of this paper is organized as follows: We first present a complete treatment of Hardy space infinite elements for one-dimensional problems in section 2. In section 3 we derive analogous Hardy space infinite elements in arbitrary space dimensions. Then the convergence of this method is analyzed using separation arguments in section 4. Numerical results are described in section 5 before we end this paper with some conclusions, including a discussion of pros and cons of the proposed method.
2. One-dimensional Helmholtz equation. In this section we will consider the one-dimensional time-harmonic wave equation
(2.1a) $-u''(r) - \kappa^2 p(r) u(r) = 0, \quad r \ge 0,$
(2.1b) $u'(0) = f_0,$
(2.1c) $u$ is outgoing,
with a given complex wave number $\kappa \in \mathbb{C}$ with positive real part, a boundary value $f_0 \in \mathbb{C}$, and a positive potential $p \in L^\infty((0, \infty))$ satisfying $p(r) = 1$ for $r \ge a$. We will split $u$ into an interior part $u_{\mathrm{int}} := u|_{[0,a]}$ and an exterior part $u_{\mathrm{ext}}(r) := u(r + a)$, $r > 0$. Actually, in one space dimension the Sommerfeld-type transparent boundary condition $u'(a) = i\kappa u(a)$ is exact even for $\Im(\kappa) < 0$, and (2.1) reduces to the simple boundary value problem
(2.2) $-u''_{\mathrm{int}} - p \kappa^2 u_{\mathrm{int}} = 0, \qquad u'_{\mathrm{int}}(0) = f_0, \qquad u'_{\mathrm{int}}(a) = i\kappa u_{\mathrm{int}}(a).$
To explain the basic ideas, we will apply Hardy space infinite elements to problem (2.1) even though this is more complicated than solving (2.2) and requires more degrees of freedom. Note, however, that for the corresponding resonance problem, (2.2) leads to a quadratic eigenvalue problem, whereas Hardy space infinite elements will lead to a linear eigenvalue problem.
2.1. Pole condition and Hardy spaces. Since we assumed $p \equiv 1$ on $[a, \infty)$, the exterior part of all solutions to (2.1a) is of the form
(2.3) $u_{\mathrm{ext}}(r) = C_1 e^{i\kappa r} + C_2 e^{-i\kappa r}, \quad r \ge 0.$
The term $C_1 e^{i\kappa r}$ corresponds to an outgoing wave, and $C_2 e^{-i\kappa r}$ to an incoming wave. The pole condition distinguishes these two solutions with the help of the Laplace transform $(\mathcal{L}f)(s) := \int_0^\infty e^{-sr} f(r)\, dr$. Due to the explicit form (2.3), $\hat{u} := \mathcal{L}u_{\mathrm{ext}}$ is given by
(2.4) $\hat{u}(s) = \dfrac{C_1}{s - i\kappa} + \dfrac{C_2}{s + i\kappa}, \quad \Re(s) > |\Im(\kappa)|.$
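As a sanity check of (2.4), the Laplace transform of (2.3) can be approximated by quadrature and compared with the closed form. The following is only a minimal sketch: the function names, the sample values of $\kappa$, $C_1$, $C_2$, $s$, and the truncation of the integral at a finite radius are assumptions made for illustration, not part of the method.

```python
import cmath

def laplace_numeric(f, s, r_max=40.0, n=40000):
    """Approximate (Lf)(s) = int_0^inf exp(-s r) f(r) dr by the composite
    trapezoidal rule on [0, r_max]; adequate when exp(-s r) f(r) decays fast."""
    h = r_max / n
    total = 0.5 * (f(0.0) + cmath.exp(-s * r_max) * f(r_max))
    for k in range(1, n):
        r = k * h
        total += cmath.exp(-s * r) * f(r)
    return total * h

def laplace_exact(s, kappa, c1, c2):
    """Closed form (2.4): pole at i*kappa (outgoing part) and at -i*kappa (incoming part)."""
    return c1 / (s - 1j * kappa) + c2 / (s + 1j * kappa)

kappa = 2.0 - 0.3j      # sample wave number with Re(kappa) > 0 (assumed value)
c1, c2 = 1.0, 0.5       # sample amplitudes (assumed values)
s = 1.5 + 0.7j          # Re(s) > |Im(kappa)|, so the integral converges

def u_ext(r):
    return c1 * cmath.exp(1j * kappa * r) + c2 * cmath.exp(-1j * kappa * r)

numeric = laplace_numeric(u_ext, s)
exact = laplace_exact(s, kappa, c1, c2)
error = abs(numeric - exact)
outgoing_pole = 1j * kappa   # lies in the upper half-plane because Re(kappa) > 0
```

With these sample values the two results should agree up to the quadrature error, and the pole of the outgoing part has positive imaginary part.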
This function has a holomorphic extension to $\mathbb{C} \setminus \{i\kappa, -i\kappa\}$. $u$ is outgoing if and only if $\hat{u}$ has no pole in the lower complex half-plane and incoming if and only if $\hat{u}$ has no pole in the upper complex half-plane. This motivates the use of the following Hardy spaces.
Definition 2.1 ($H^-(\mathbb{R})$ and $H^+(\mathbb{R})$). The Hardy space $H^\pm(\mathbb{R})$ is the set of all functions $f \in L^2(\mathbb{R})$ that are $L^2$ boundary values of a function $v$, which is holomorphic in $\mathbb{C}^\pm := \{s \in \mathbb{C} : \Im(\pm s) > 0\}$ and for which the integrals $\int_{\mathbb{R}} |v(x \pm i\epsilon)|^2\, dx$ are uniformly bounded for $\epsilon > 0$.
Thus $u$ is outgoing if and only if $\hat{u}|_{\mathbb{R}} \in H^-(\mathbb{R})$ and incoming if and only if $\hat{u}|_{\mathbb{R}} \in H^+(\mathbb{R})$. Equipped with the standard $L^2$ inner product, $H^\pm(\mathbb{R})$ are Hilbert spaces (cf. [5]). Moreover, by the Paley–Wiener theorem these spaces are characterized by
(2.5) $H^\pm(\mathbb{R}) = \{\hat{u} \in L^2(\mathbb{R}) : (\mathcal{F}^{-1}\hat{u})(\pm t) = 0 \text{ for almost all } t > 0\}$
in terms of the inverse Fourier transform $(\mathcal{F}^{-1} f)(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{ist} f(s)\, ds$. This yields the orthogonal decomposition $L^2(\mathbb{R}) = H^+(\mathbb{R}) \oplus H^-(\mathbb{R})$. The function $v$ in Definition 2.1 is uniquely determined by $f$ and can be recovered by the Cauchy integral
(2.6) $v(s) = \dfrac{1}{2\pi i} \int_{\mathbb{R}} \dfrac{f(\tilde{s})}{\tilde{s} - s}\, d\tilde{s}, \quad s \in \mathbb{C}^\pm.$
Since we are interested in outgoing solutions, we will mainly deal with the space $H^-(\mathbb{R})$. Because of the lack of a convenient orthonormal basis of $H^-(\mathbb{R})$ we will apply a further transform to another closely related Hardy space.
Definition 2.2 ($H^+(S^1)$). The Hardy space $H^+(S^1)$ is the set of all functions $F \in L^2(S^1)$ that are $L^2$ boundary values of a function $V$, which is holomorphic in the unit disk $D := \{z \in \mathbb{C} : |z| < 1\}$ and for which the integrals $\int_0^{2\pi} |V(re^{i\theta})|^2\, d\theta$ are uniformly bounded for $r \in [0, 1)$.
Equipped with the $L^2$ scalar product, $H^+(S^1)$ is a Hilbert space, and a simple complete orthogonal system of $H^+(S^1)$ is given by the monomials $z^k$, $k = 0, 1, \dots$.
A family of unitary operators identifying the Hilbert spaces $H^-(\mathbb{R})$ and $H^+(S^1)$ can be defined with the help of the Möbius transformations $\varphi_{\kappa_0}(z) := i\kappa_0 \frac{z+1}{z-1}$, $\kappa_0 > 0$, which map the unit disc $D$ to the half-space $\mathbb{C}^-$. The parameter $\kappa_0$ will act as a tuning parameter in the algorithms to be discussed below. Since $\int_{-\infty}^{\infty} |f(t)|^2\, dt = \int_0^{2\pi} |(f \circ \varphi_{\kappa_0})(e^{i\theta})|^2\, |\varphi'_{\kappa_0}(e^{i\theta})|\, d\theta$ and $\varphi'_{\kappa_0}(z) = -\frac{2i\kappa_0}{(z-1)^2}$, the mappings
(2.7) $(\mathcal{M}_{\kappa_0} f)(z) := (f \circ \varphi_{\kappa_0})(z)\, \dfrac{1}{z-1}$
are isometric from $L^2(\mathbb{R})$ to $L^2(S^1)$ up to the factor $\sqrt{-2i\kappa_0}$, and it can be shown that $\mathcal{M}_{\kappa_0}(H^-(\mathbb{R})) = H^+(S^1)$ (see [5]). Hence, $\sqrt{-2i\kappa_0}\, \mathcal{M}_{\kappa_0} : H^-(\mathbb{R}) \to H^+(S^1)$ is unitary.
Many of the operators on $H^+(S^1)$ which will appear in our analysis are of the following form.
Definition 2.3 (Toeplitz operator). Let $f \in L^\infty(S^1)$ be a complex-valued function and let $P : L^2(S^1) \to H^+(S^1)$ denote the orthogonal projection. Then the Toeplitz operator $T_f : H^+(S^1) \to H^+(S^1)$ with symbol $f$ is defined by $T_f U := P(fU)$.
We will need the following classical results on Toeplitz operators: If $f : S^1 \to \mathbb{C}$ is continuous and has no zeros, then $T_f$ is a Fredholm operator, and $\operatorname{ind}(T_f) = -\operatorname{wn}(f)$ where $\operatorname{wn}(f)$ denotes the winding number of $f$ around 0 [1, Theorem 2.42]. Moreover, if $\operatorname{ind}(T_f) = 0$, then $T_f$ is injective and hence boundedly invertible [1, Corollary 2.40].
Let us consider the explicit form of the transform $\hat{U} := \mathcal{M}_{\kappa_0} \hat{u}$ of the outgoing solution $u$ of (2.1). With $u_0 := u(a)$ we have
(2.8) $u_{\mathrm{ext}}(r) = u_0 e^{i\kappa r} \;\xrightarrow{\;\mathcal{L}|_{\mathbb{R}}\;}\; \hat{u}(s) = \dfrac{u_0}{s - i\kappa} \;\xrightarrow{\;\mathcal{M}_{\kappa_0}\;}\; \hat{U}(z) = \dfrac{u_0}{i\kappa_0 (z+1) - i\kappa (z-1)}.$
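The transform chain in (2.8) is easy to check numerically. The sketch below evaluates $\varphi_{\kappa_0}$ and $\mathcal{M}_{\kappa_0}$ from (2.7) at a few points of the unit disc and compares with the closed form on the right of (2.8). All function names and sample values are assumptions chosen for illustration. The last line computes the zero of the denominator in (2.8), which lies outside the closed unit disc whenever $\Re(\kappa/\kappa_0) > 0$.

```python
import cmath

def phi(z, kappa0):
    """Moebius map phi_{kappa0}(z) = i*kappa0*(z+1)/(z-1); sends the unit disc to Im(s) < 0."""
    return 1j * kappa0 * (z + 1) / (z - 1)

def u_hat(s, u0, kappa):
    """Laplace transform of the outgoing wave u0*exp(i*kappa*r)."""
    return u0 / (s - 1j * kappa)

def moebius_transform(f_hat, z, kappa0):
    """(M_{kappa0} f_hat)(z) = f_hat(phi(z)) / (z - 1), cf. (2.7)."""
    return f_hat(phi(z, kappa0)) / (z - 1)

def U_hat_closed_form(z, u0, kappa, kappa0):
    """Right-hand side of (2.8)."""
    return u0 / (1j * kappa0 * (z + 1) - 1j * kappa * (z - 1))

u0, kappa, kappa0 = 1.0 + 0.5j, 3.0 - 0.2j, 2.0          # sample values (assumptions)
samples = [0.3 + 0.4j, -0.5j, 0.9 * cmath.exp(2.0j)]      # points inside the unit disc

max_imag = max(phi(z, kappa0).imag for z in samples)
mismatch = max(
    abs(moebius_transform(lambda s: u_hat(s, u0, kappa), z, kappa0)
        - U_hat_closed_form(z, u0, kappa, kappa0))
    for z in samples
)
value_at_one = U_hat_closed_form(1.0, u0, kappa, kappa0)  # equals u0 / (2 i kappa0)
pole = (kappa + kappa0) / (kappa - kappa0)                # zero of the denominator in (2.8)
```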
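Note that $\hat{U}(1) = u_0/(2i\kappa_0)$. This will be convenient for coupling the transformed exterior to the interior problem. To take advantage of this fact we decompose
(2.9) $\hat{U}(z) = \dfrac{1}{2i\kappa_0} \left( u_0 + (z-1) U(z) \right)$ with $U(z) := \dfrac{2i\kappa_0 \hat{U}(z) - u_0}{z-1}.$
Since the only singularities of the holomorphic extensions of $\hat{U}$ and $U$ are simple poles at $\frac{\kappa + \kappa_0}{\kappa - \kappa_0}$ and since $\frac{\kappa + \kappa_0}{\kappa - \kappa_0} \notin \overline{D}$ for $\Re(\kappa/\kappa_0) > 0$, both $\hat{U}$ and $U$ are analytic on $S^1$ and belong to $H^+(S^1)$.
2.2. Variational formulation. The formal variational formulation of the differential equation (2.1a) is
(2.10) $\int_0^a (u'_{\mathrm{int}} v'_{\mathrm{int}} - \kappa^2 p\, u_{\mathrm{int}} v_{\mathrm{int}})\, dr + \int_0^\infty (u'_{\mathrm{ext}} v'_{\mathrm{ext}} - \kappa^2 u_{\mathrm{ext}} v_{\mathrm{ext}})\, dr = -f_0 v_{\mathrm{int}}(0).$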
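The basic identities for transforming the exterior variational problem to the Hardy space are
(2.11) $\int_0^\infty f(r) g(r)\, dr = -\dfrac{i}{2\pi} \int_{-\infty}^{\infty} \hat{f}(-s)\, \hat{g}(s)\, ds = \dfrac{-i\kappa_0}{\pi} \int_{S^1} F(\bar{z})\, G(z)\, |dz|,$
with $\hat{f} = (\mathcal{L}f)|_{\mathbb{R}}$, $\hat{g} = (\mathcal{L}g)|_{\mathbb{R}}$, $F = \mathcal{M}_{\kappa_0} \hat{f}$, and $G = \mathcal{M}_{\kappa_0} \hat{g}$. They will be derived in Lemma A.1 for the more general case $\kappa_0 \in \mathbb{C}$ (cf. Remark 2.8 below). Introducing the bilinear form
(2.12) $A(F, G) := \int_{S^1} G(z)\, F(\bar{z})\, |dz|, \quad F, G \in H^+(S^1),$
we have in particular that $\int_0^\infty f g\, dr = \frac{-i\kappa_0}{\pi} A(F, G)$.
Theorem 2.4. Let $\kappa_0, \Re(\kappa) > 0$ and $X := H^1([0, a]) \oplus H^+(S^1)$. If $u \in H^2_{\mathrm{loc}}([0, \infty))$ is a solution to (2.1), then $(u_{\mathrm{int}}, U)$ with $U$ defined in (2.9) belongs to $X$ and satisfies the variational equation
(2.13) $B\left( \begin{pmatrix} u_{\mathrm{int}} \\ U \end{pmatrix}, \begin{pmatrix} v_{\mathrm{int}} \\ V \end{pmatrix} \right) = -f_0 v_{\mathrm{int}}(0),$
with
$B\left( \begin{pmatrix} u_{\mathrm{int}} \\ U \end{pmatrix}, \begin{pmatrix} v_{\mathrm{int}} \\ V \end{pmatrix} \right) := \int_0^a (u'_{\mathrm{int}} v'_{\mathrm{int}} - \kappa^2 p\, u_{\mathrm{int}} v_{\mathrm{int}})\, dr - \dfrac{i\kappa_0}{4\pi} A\big(u_0 + (z+1)U,\; v_0 + (z+1)V\big) - \dfrac{i\kappa^2}{4\pi\kappa_0} A\big(u_0 + (z-1)U,\; v_0 + (z-1)V\big)$
for all $(v_{\mathrm{int}}, V) \in X$ and $v_0 := v_{\mathrm{int}}(a)$. Conversely, if $(u_{\mathrm{int}}, U) \in X$ is a solution of (2.13), then $u_{\mathrm{int}}$ belongs to $H^2([0, a])$ and is the restriction of a solution $u$ to (2.1).
Proof. Assume first that $u$ is a solution to (2.1). It suffices to show that (2.13) holds for all $(v_{\mathrm{int}}, V)$ in a dense subset of $X$. Hence, we start with a test function $v \in C([0, \infty)) \cap H^1([0, a])$ for which $v_{\mathrm{ext}}$ has the form
$v_{\mathrm{ext}}(r) = v_0 e^{ikr}, \quad \Im(k) > -\Im(\kappa), \quad \Re(k) > 0.$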
For such test functions, the product $u \cdot v$ and products of derivatives decay exponentially, and (2.10) can be derived by partial integration. Moreover, for these test functions the identity (2.11) holds both for $f = u_{\mathrm{ext}}$, $g = v_{\mathrm{ext}}$ and for $f = u'_{\mathrm{ext}}$, $g = v'_{\mathrm{ext}}$. In the second case we apply the identities
(2.14) $(\mathcal{L}f')(s) = s(\mathcal{L}f)(s) - f_0, \qquad (\mathcal{M}_{\kappa_0} \mathcal{L}|_{\mathbb{R}} f')(z) = i\kappa_0\, \dfrac{z+1}{z-1}\, \dfrac{f_0 + (z-1)F(z)}{2i\kappa_0} - \dfrac{f_0}{z-1} = \dfrac{1}{2} \big( f_0 + (z+1)F(z) \big),$
where $f_0$ and $F$ are defined in analogy to $u_0$ and $U$, to finally arrive at (2.13) with
(2.15) $V(z) = \dfrac{2i\kappa_0 (\mathcal{M}_{\kappa_0} \mathcal{L}|_{\mathbb{R}} v_{\mathrm{ext}})(z) - v_0}{z-1} = v_0\, \dfrac{k - \kappa_0}{(\kappa_0 - k)z + (\kappa_0 + k)}.$
Since by virtue of Lemma A.2 the span of such functions is dense in $H^+(S^1)$ and $B$ is continuous on $X \times X$, (2.13) holds for all $(v_{\mathrm{int}}, V) \in X$.
Conversely, let $(u_{\mathrm{int}}, U) \in X$ be a solution to (2.13). For $v_{\mathrm{int}} = 0$ it follows after multiplication by $-4\pi i \kappa_0$ that
(2.16) $\int_{S^1} V(\bar{z}) \left\{ -\kappa_0^2 (\bar{z}+1) \left[ u_0 + (z+1)U(z) \right] - \kappa^2 (\bar{z}-1) \left[ u_0 + (z-1)U(z) \right] \right\} |dz| = 0$
for all $V \in H^+(S^1)$. Due to (2.20) below, the orthogonal projection $P : L^2(S^1) \to H^+(S^1)$ applied to the expression in braces vanishes. Since $P\bar{z} = 0$, we obtain
(2.17) $P\{mU\} = P\left\{ (\kappa_0^2 - \kappa^2) + (\kappa_0^2 + \kappa^2)\bar{z} \right\} u_0 = (\kappa_0^2 - \kappa^2) u_0,$
with $m(z) := -\kappa_0^2 |z+1|^2 - \kappa^2 |z-1|^2$. The left-hand side of (2.17) is the Toeplitz operator $T_m$ with symbol $m$ applied to $U$. Since $m(z) = -2(\kappa^2 + \kappa_0^2) + 2(\kappa^2 - \kappa_0^2)\Re(z)$, the graph of $m$ is the straight line connecting $-4\kappa^2$ and $-4\kappa_0^2$. Therefore, $T_m$ is boundedly invertible by the results quoted after Definition 2.3. Hence, (2.17) has a unique solution. By the derivation of (2.13), this solution is given by (2.8) and (2.9), or explicitly $U(z) = u_0 \frac{\kappa - \kappa_0}{(\kappa_0 - \kappa)z + (\kappa_0 + \kappa)}$. Plugging this into (2.13) and using (2.16), we obtain the variational formulation of the boundary value problem (2.2):
(2.18) $\int_0^a (v'_{\mathrm{int}} u'_{\mathrm{int}} - \kappa^2 p\, v_{\mathrm{int}} u_{\mathrm{int}})\, dr = i\kappa u_0 v_0 - v_{\mathrm{int}}(0) f_0.$
By elliptic regularity results $u_{\mathrm{int}}$ belongs to $H^2([0, a])$ and solves (2.2). Hence, it is also part of a solution to (2.1).
2.3. Gårding-type inequality. It is obvious that the bilinear form $B$ in Theorem 2.4 is bounded and symmetric. Moreover, the interior part $B_{\mathrm{int}}(u_{\mathrm{int}}, v_{\mathrm{int}}) := \int_0^a (u'_{\mathrm{int}} v'_{\mathrm{int}} - \kappa^2 p\, u_{\mathrm{int}} v_{\mathrm{int}})\, dr$ satisfies the standard Gårding inequality
(2.19) $\Re\{B_{\mathrm{int}}(u_{\mathrm{int}}, \overline{u_{\mathrm{int}}})\} + \beta \|u_{\mathrm{int}}\|^2_{L^2} \ge \|u_{\mathrm{int}}\|^2_{H^1},$
with $\beta := (|\kappa|^2 + 1)\|p\|_{L^\infty} \ge 0$. We want to derive a similar inequality for the whole bilinear form $B$. Note that we cannot simply choose $V = \overline{U}$ since $\overline{U} \notin H^+(S^1)$ for
$U \in H^+(S^1)$ in general. However, a useful conjugation on the Hilbert space $H^+(S^1)$ is given by the mapping $C : H^+(S^1) \to H^+(S^1)$ defined by $(CF)(z) := \overline{F(\bar{z})}$. It is easy to check that $C$ is well-defined, antilinear, and isometric, and that $C^2 = I$; i.e., $C$ is indeed a conjugation. Moreover, it has the useful property that
(2.20) $A(F, CG) = \langle F, G \rangle_{L^2(S^1)}.$
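Theorem 2.5. Let $\Re(\kappa^2), \kappa_0 > 0$. Then there exist constants $\alpha, \beta, \gamma > 0$, such that
$\Re\left\{ (i + \gamma) B\left( \begin{pmatrix} u_{\mathrm{int}} \\ U \end{pmatrix}, \begin{pmatrix} \overline{u_{\mathrm{int}}} \\ CU \end{pmatrix} \right) \right\} + \beta \|u_{\mathrm{int}}\|^2_{L^2} \ge \alpha \left\| \begin{pmatrix} u_{\mathrm{int}} \\ U \end{pmatrix} \right\|^2_X.$
Proof. For the exterior part of the bilinear form $B_{\mathrm{ext}} := B - B_{\mathrm{int}}$ we obtain from the identity (2.20) that
$(i + \gamma) B_{\mathrm{ext}}\left( \begin{pmatrix} u_{\mathrm{int}} \\ U \end{pmatrix}, \begin{pmatrix} \overline{u_{\mathrm{int}}} \\ CU \end{pmatrix} \right) = \dfrac{\kappa_0 (1 - \gamma i)}{4\pi} \|u_0 + (z+1)U\|^2_{L^2(S^1)} + \dfrac{\kappa^2 (1 - \gamma i)}{4\pi\kappa_0} \|u_0 + (z-1)U\|^2_{L^2(S^1)}$
for any $\gamma \in \mathbb{R}$. Due to the assumption $\Re(\kappa^2) > 0$, we may choose a $\gamma > 0$ such that $\Re(\kappa^2 (1 - \gamma i)) > 0$. Using the inequality $\|x\|^2 + \|y\|^2 \ge \frac{1}{2}\|x - y\|^2$ with $x := u_0 + (z+1)U$ and $y := u_0 + (z-1)U$ we obtain
(2.21) $\Re\left\{ (i + \gamma) B_{\mathrm{ext}}\left( \begin{pmatrix} u_{\mathrm{int}} \\ U \end{pmatrix}, \begin{pmatrix} \overline{u_{\mathrm{int}}} \\ CU \end{pmatrix} \right) \right\} \ge \tilde{\alpha} \|U\|^2_{L^2},$
with $\tilde{\alpha} := \min\left\{ \frac{\Re(\kappa_0 (1 - \gamma i))}{2\pi}, \frac{\Re(\kappa^2 (1 - \gamma i))}{2\pi\kappa_0} \right\}$. This together with (2.19) yields the assertion with $\beta := \gamma (|\kappa|^2 + 1)\|p\|_{L^\infty} > 0$ and $\alpha := \min(\tilde{\alpha}, \gamma)$.
Using standard arguments, we obtain the following corollary.
Corollary 2.6. If the variational equation (2.13) has only the trivial solution for $f_0 = 0$, then it has a unique solution for all $f_0 \in \mathbb{R}$, and the solution depends continuously on $f_0$.
By virtue of Theorem 2.4, the variational equation (2.13) is uniquely solvable if and only if $\kappa$ is not a resonance.
2.4. Galerkin approximation. In the following we will consider the Galerkin approximations to (2.13) using a finite element subspace $V_h$ of $H^1([0, a])$ and the subspace $\Pi_N := \operatorname{span}\{1, z, \dots, z^N\}$ of $H^+(S^1)$. This leads to the discrete variational problems
(2.22) $B\left( \begin{pmatrix} u_h \\ U_N \end{pmatrix}, \begin{pmatrix} v_h \\ V_N \end{pmatrix} \right) = -f_0 v_h(0), \qquad \begin{pmatrix} v_h \\ V_N \end{pmatrix} \in X_{h,N} := V_h \oplus \Pi_N.$
Using Theorem 2.5 and the compactness of the embedding $H^1([0, a]) \hookrightarrow L^2([0, a])$, we obtain the following convergence result (cf. [12, Theorem 13.7]).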
Theorem 2.7. Let $\Re(\kappa^2), \kappa_0 > 0$, and assume that $\kappa$ is not a resonance. Let $(u_{\mathrm{int}}, U) \in X$ denote the unique solution to (2.13). Then there exist constants $C, N_0, h_0 > 0$ such that the variational problems (2.22) have a unique solution $(u_h, U_N) \in X_{h,N}$ for $N \ge N_0$ and $h \le h_0$, and
$\|u_{\mathrm{int}} - u_h\|^2_{H^1} + \|U - U_N\|^2_{L^2(S^1)} \le C \inf_{(v_h, V_N) \in X_{h,N}} \left( \|u_{\mathrm{int}} - v_h\|^2_{H^1} + \|U - V_N\|^2_{L^2(S^1)} \right).$
Since $U$ is analytic, we have exponential convergence in $N$, i.e., for some constants $c, \tilde{C} > 0$
$\inf_{V_N \in \Pi_N} \|U - V_N\|_{L^2(S^1)} \le \tilde{C} e^{-cN}.$
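For the explicit outgoing solution $U(z) = u_0 \frac{\kappa - \kappa_0}{(\kappa_0 - \kappa)z + (\kappa_0 + \kappa)}$ from the proof of Theorem 2.4, the exponential rate can be made concrete. Writing $q := (\kappa - \kappa_0)/(\kappa + \kappa_0)$ gives $U(z) = u_0 \sum_{j \ge 0} q^{j+1} z^j$, and since the monomials are orthogonal in $L^2(S^1)$, the best approximation in $\Pi_N$ is the truncated Taylor series, with error decaying like $|q|^N$. The sketch below illustrates this; the function names and sample values are assumptions.

```python
import math

def taylor_coefficients(u0, kappa, kappa0, n_terms):
    """Coefficients of U(z) = u0*(kappa-kappa0)/((kappa0-kappa)*z + (kappa0+kappa)),
    i.e. u0*q**(j+1) for the monomial z**j, with q = (kappa-kappa0)/(kappa+kappa0)."""
    q = (kappa - kappa0) / (kappa + kappa0)
    return [u0 * q ** (j + 1) for j in range(n_terms)]

def truncation_error(coeffs, N):
    """L2(S^1) distance between U and its degree-N Taylor polynomial, using
    ||sum_j a_j z^j||^2 = 2*pi*sum_j |a_j|^2 (orthogonality of the monomials)."""
    return math.sqrt(2 * math.pi * sum(abs(a) ** 2 for a in coeffs[N + 1:]))

u0, kappa, kappa0 = 1.0, 3.0 + 0.5j, 2.0     # sample values (assumptions)
q = (kappa - kappa0) / (kappa + kappa0)
coeffs = taylor_coefficients(u0, kappa, kappa0, 200)   # 200 terms stand in for the full series
errors = [truncation_error(coeffs, N) for N in range(12)]
ratios = [errors[N + 1] / errors[N] for N in range(11)]
rate_c = -math.log(abs(q))                   # the constant c in the estimate above
```

Each increment of $N$ reduces the error by the factor $|q|$, so in this example $c = -\log|q|$. The closer $\kappa_0$ is to $\kappa$, the smaller $|q|$ and the faster the convergence.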
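Although the derivation of the exterior part of (2.13) is nonstandard, its implementation is rather simple: For $F(z) = \sum_{j=0}^{\infty} \alpha_j z^j$ and $G(z) = \sum_{j=0}^{\infty} \beta_j z^j$, we have $A(F, G) = 2\pi \sum_{j=0}^{\infty} \alpha_j \beta_j$. With respect to the monomial basis of $\Pi_N$ the operators
(2.23) $T_\pm : \mathbb{C} \oplus H^+(S^1) \to H^+(S^1), \qquad \begin{pmatrix} f_0 \\ F \end{pmatrix} \mapsto \dfrac{1}{2} \big( f_0 + (\bullet \pm 1) F \big),$
occurring in (2.13) are approximately represented by the bidiagonal matrices
(2.24) $T_{N,\pm} := \dfrac{1}{2} \begin{pmatrix} 1 & \pm 1 & & & \\ & 1 & \pm 1 & & \\ & & \ddots & \ddots & \\ & & & 1 & \pm 1 \end{pmatrix} \in \mathbb{R}^{(N+1) \times (N+2)}.$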
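To illustrate how these matrices are used, the sketch below assembles the exterior part of $B$ from (2.13) in the basis $(u_0, 1, z, \dots, z^N)$. With $A(F, G) = 2\pi \sum_j \alpha_j \beta_j$ and (2.24) this part becomes $-2i\kappa_0 \big( T_{N,+}^\top T_{N,+} + (\kappa/\kappa_0)^2 T_{N,-}^\top T_{N,-} \big)$. As a sanity check, the Hardy-space unknowns are then eliminated. According to (2.18), the remaining scalar acting on $u_0$ should approximate $-i\kappa$, the contribution of the exact transparent boundary condition $u'(a) = i\kappa u(a)$. All function names and sample values are assumptions; the dense list-of-lists storage and unpivoted elimination are chosen only for brevity.

```python
def t_matrix(N, sign):
    """Bidiagonal (N+1) x (N+2) matrix T_{N,+} (sign=+1) or T_{N,-} (sign=-1) from (2.24)."""
    T = [[0.0] * (N + 2) for _ in range(N + 1)]
    for j in range(N + 1):
        T[j][j] = 0.5
        T[j][j + 1] = 0.5 * sign
    return T

def gram(T):
    """Return T^T T as a dense (N+2) x (N+2) list of lists."""
    rows, cols = len(T), len(T[0])
    return [[sum(T[i][j] * T[i][k] for i in range(rows)) for k in range(cols)]
            for j in range(cols)]

def element_matrix(N, kappa, kappa0):
    """Exterior part of B in (2.13) in the basis (u0, 1, z, ..., z^N):
    -2i*kappa0*(T+^T T+ + (kappa/kappa0)^2 * T-^T T-)."""
    gp, gm = gram(t_matrix(N, +1)), gram(t_matrix(N, -1))
    rho = (kappa / kappa0) ** 2
    return [[-2j * kappa0 * (gp[j][k] + rho * gm[j][k]) for k in range(N + 2)]
            for j in range(N + 2)]

def condense_to_u0(E):
    """Eliminate the Hardy-space unknowns (indices 1..n-1) by Gaussian elimination
    from the bottom up; return the remaining scalar acting on u0 (Schur complement)."""
    A = [row[:] for row in E]
    n = len(A)
    for p in range(n - 1, 0, -1):
        for i in range(p):
            if A[i][p] != 0:
                factor = A[i][p] / A[p][p]
                for k in range(p + 1):
                    A[i][k] -= factor * A[p][k]
    return A[0][0]

kappa, kappa0 = 1.5, 1.0                                   # sample values (assumptions)
dtn = condense_to_u0(element_matrix(8, kappa, kappa0))     # should be close to -i*kappa
dtn_matched = condense_to_u0(element_matrix(3, 2.0, 2.0))  # kappa0 = kappa: exact for any N
```

For $\kappa_0 = \kappa$ the off-diagonal entries cancel and the condensed value equals $-i\kappa$ exactly. For $\kappa_0 \ne \kappa$ it converges geometrically in $N$.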
The Galerkin approximation (2.22) corresponds to the introduction of an “infinite element” with $N + 2$ degrees of freedom, which couples to the interior domain via the unknown $u_0$. The local element matrix of this infinite element is given by
$-2i\kappa_0 \left( T_{N,+}^\top T_{N,+} + (\kappa/\kappa_0)^2\, T_{N,-}^\top T_{N,-} \right).$
In the space domain the monomial basis functions correspond to the functions $u_j := (\mathcal{L}|_{\mathbb{R}})^{-1} \mathcal{M}_{\kappa_0}^{-1} \left\{ T_-(u_0, z^j)^\top \right\}$, which are given by
(2.25) $u_j(r) = e^{i\kappa_0 r} \left( u_0 + \sum_{n=0}^{j} \binom{j}{n} \dfrac{(2i\kappa_0 r)^{n+1}}{(n+1)!} \right).$
From this formula it is clear that if the sum over the $u_j$ converges at some points in the exterior domain, the convergence will be slow, in particular far away from the coupling boundary. If the exterior solution is of interest, it can be computed from $u_0$ by Green’s formula, which for one space dimension reduces to $u(r) = u_0 \exp(i\kappa(r - a))$. For inhomogeneous exterior domains without explicitly known Green’s function other numerical realizations of the pole condition can be used to compute the exterior solution (see [19]).
Remark 2.8 (choice of $\kappa_0$). It follows from (2.8) and (2.9) or from (2.25) that for scattering problems the optimal choice of $\kappa_0$ is $\kappa_0 = \kappa$ since in this case $U \equiv 0$, and we obtain the exact transparent boundary condition even with no degrees of freedom in $H^+(S^1)$. For resonance problems, $\kappa_0$ should be chosen in the region of the complex plane where resonances are of interest. In this case it is advantageous to choose $\kappa_0$
as a complex number with $\Im(\kappa_0) < 0$ and $\Re(\kappa_0) > 0$. All results of this section can be generalized to this case: $u$ is outgoing if and only if $\mathcal{L}u|_{\kappa_0 \mathbb{R}}$ belongs to the space $H^-(\kappa_0 \mathbb{R}) := \{f(\kappa_0^{-1} \bullet) : f \in H^-(\mathbb{R})\}$. $\mathcal{M}_{\kappa_0}$ maps $H^-(\kappa_0 \mathbb{R})$ bijectively to $H^+(S^1)$. In Theorems 2.4, 2.5, 2.7 and Corollary 2.6 we have to replace the conditions on $\kappa$ and $\kappa_0$ by $\Re(\kappa/\kappa_0) > 0$ and $\Re(\kappa^2/\kappa_0) > 0$. These are reasonable assumptions, since $\kappa_0$ should be chosen close to the resonances $\kappa$ of interest anyway.
3. Helmholtz equation in higher dimensions. In this section we will treat the Helmholtz equation in higher dimensions in a manner similar to that in the previous section for one dimension. Besides the resonance problem (1.1) we will also study the scattering problem
(3.1a) $-\Delta u - \kappa^2 u = 0$ in $\mathbb{R}^d \setminus K$,
(3.1b) $\dfrac{\partial u}{\partial \nu} = f$ on $\partial K$,
(3.1c) $u$ satisfies a radiation condition
for given $\kappa \in \mathbb{C}$ with $\Re(\kappa) > 0$ and $f \in H^{-1/2}(\partial K)$. This will be done by considering the Laplace transform of the scaled exterior solution
(3.2) $u_{\mathrm{ext}}(r, \hat{x}) := (r + 1)^{(d-1)/2}\, u((r + 1)\hat{x}), \quad r > 0, \; \hat{x} \in \Gamma := \partial B_a,$
with respect to the radial variable $r$, i.e.,
(3.3) $(\mathcal{L}u_{\mathrm{ext}})(s, \hat{x}) := \int_0^\infty e^{-sr} u_{\mathrm{ext}}(r, \hat{x})\, dr, \quad \Re(s) > |\Im(\kappa)|, \; \hat{x} \in \Gamma.$
The radial variable is scaled such that $u_{\mathrm{ext}}(r, \hat{x}) \sim \exp(i\kappa a r)\, u_\infty(\hat{x})$ as $r \to \infty$. This scaling is not essential, but simplifies the computations. In particular, we will be able to use part of the analysis of the previous section.
3.1. Pole condition in terms of Hardy spaces. Recall that for Riemannian manifolds $A$, $B$ the spaces $L^2(A; L^2(B)) \sim L^2(A \times B) \sim L^2(A) \otimes L^2(B)$ are isometrically isomorphic. Consequently, $H^-(\mathbb{R}) \otimes L^2(\Gamma)$ can be considered as a closed subspace of $L^2(\mathbb{R} \times \Gamma)$. It consists of all functions $f \in L^2(\mathbb{R} \times \Gamma)$ for which there exists a measurable function $v : \mathbb{C}^- \times \Gamma \to \mathbb{C}$, which is holomorphic in the first variable such that $\sup_{\epsilon > 0} \int_{\mathbb{R}} \int_\Gamma |v(s - i\epsilon, \hat{x})|^2\, d\hat{x}\, ds < \infty$ and
$\int_{\mathbb{R}} \int_\Gamma |f(s, \hat{x}) - v(s - i\epsilon, \hat{x})|^2\, d\hat{x}\, ds \to 0 \quad \text{as } \epsilon \to 0.$
If $v = \mathcal{L}u_{\mathrm{ext}}$, we will shorten this to $\mathcal{L}|_{\mathbb{R}} u_{\mathrm{ext}} := f$. Again, $v$ can be recovered from $f$ by a Cauchy integral as in (2.6).
Definition 3.1. Let $u$ be a complex-valued function on $\mathbb{R}^d \setminus K$, and assume that the Laplace transform $(\mathcal{L}u_{\mathrm{ext}})(s, \bullet)$ is well defined by (3.2) and (3.3) for all $s$ in some open region $D \subset \mathbb{C}$ and belongs to $L^2(\Gamma)$. We say that $u$ satisfies the pole condition if the function $D \to L^2(\Gamma)$, $s \mapsto (\mathcal{L}u_{\mathrm{ext}})(s, \bullet)$ has a holomorphic extension to $\mathbb{C}^-$, and $\mathcal{L}|_{\mathbb{R}} u_{\mathrm{ext}}$ belongs to $H^-(\mathbb{R}) \otimes L^2(\Gamma)$.
Remark 3.2. It is easy to see that Definition 3.1 without the condition $\mathcal{L}|_{\mathbb{R}} u_{\mathrm{ext}} \in H^-(\mathbb{R}) \otimes L^2(\Gamma)$ is equivalent to the formulation in [9, Definition 2.1]. Moreover, it was
shown in [9, section 9] that the pole condition is equivalent to Sommerfeld’s radiation condition for solutions to the Helmholtz equation with $\kappa > 0$. From the results in that section in [9], in particular (9.14) and (9.9b), it can be seen that the condition $\mathcal{L}|_{\mathbb{R}} u_{\mathrm{ext}} \in H^-(\mathbb{R}) \otimes L^2(\Gamma)$ is also satisfied at least for sufficiently large $a$.
Remark 3.3. In [9] only the case $\kappa > 0$ was considered. However, the pole condition is also a valid radiation condition for $\Im(\kappa) \ne 0$. The singularity of the Laplace transform $\mathcal{L}u_{\mathrm{ext}}$ of an outgoing wave is still a singularity with a branch cut located at $i\kappa a$, and hence in the upper half-plane. As mentioned in the introduction, Sommerfeld’s radiation condition is not valid for $\Im(\kappa) < 0$, and hence no equivalence result holds true in this case. However, it is actually much simpler to prove equivalence of the pole condition and the radiation condition (1.2) since the Hankel function can be recovered from the pole condition approach (see [9, section 7]). Note that the pole condition is independent of the differential equation. Solutions to the Helmholtz equation will belong to spaces of higher regularity with respect to the second variable.
In analogy to the previous section we consider the Möbius transform $\mathcal{M}_{\kappa_0} \otimes I_{L^2(\Gamma)}$ from $H^-(\kappa_0 \mathbb{R}) \otimes L^2(\Gamma)$ to $H^+(S^1) \otimes L^2(\Gamma)$ and write $\hat{U} := (\mathcal{M}_{\kappa_0} \otimes I_{L^2(\Gamma)})\, \mathcal{L}|_{\kappa_0 \mathbb{R}} u_{\mathrm{ext}}$. Moreover, we define $u_0 := u|_\Gamma$ and
(3.4) $U(z, \hat{x}) := \dfrac{2i\kappa_0 \hat{U}(z, \hat{x}) - u_0(\hat{x})}{z - 1}, \quad z \in S^1, \; \hat{x} \in \Gamma,$
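in analogy to (2.9).
3.2. Variational formulation. Assume that $u$ is a solution to the scattering problem (3.1) and define $u_{\mathrm{int}} := u|_{\Omega_{\mathrm{int}}}$ with $\Omega_{\mathrm{int}} := B_a \setminus K$ and $u_{\mathrm{ext}}$ by (3.2). Then for smooth, rapidly decaying test functions $v$ a straightforward computation yields
$\int_{\Omega_{\mathrm{int}}} \left( \nabla u_{\mathrm{int}} \cdot \nabla v_{\mathrm{int}} - \kappa^2 u_{\mathrm{int}} v_{\mathrm{int}} \right) dx + \dfrac{d-1}{2a} \int_\Gamma u_0 v_0\, d\hat{x} + \dfrac{1}{a} \int_\Gamma \int_0^\infty \partial_r u_{\mathrm{ext}}\, \partial_r v_{\mathrm{ext}}\, dr\, d\hat{x}$
$\quad + a \int_\Gamma \int_0^\infty \left( \dfrac{\nabla_{\hat{x}} u_{\mathrm{ext}} \cdot \nabla_{\hat{x}} v_{\mathrm{ext}}}{(r+1)^2} - \kappa^2 u_{\mathrm{ext}} v_{\mathrm{ext}} - \dfrac{C_d}{a^2} \dfrac{u_{\mathrm{ext}} v_{\mathrm{ext}}}{(r+1)^2} \right) dr\, d\hat{x} = -\int_{\partial K} f v_{\mathrm{int}}\, ds,$
with $C_d := \frac{(d-1)(3-d)}{4}$ and the surface gradient $\nabla_{\hat{x}}$ on $\Gamma$.
We first derive the transformation to the Hardy space formally. Due to (2.9), (2.14), and (2.23) we have
$i\kappa_0 (\mathcal{M}_{\kappa_0} \otimes I)\, \mathcal{L}|_{\kappa_0 \mathbb{R}} u_{\mathrm{ext}} = (T_- \otimes I) \begin{pmatrix} u_0 \\ U \end{pmatrix}, \qquad (\mathcal{M}_{\kappa_0} \otimes I)\, \mathcal{L}|_{\kappa_0 \mathbb{R}} \partial_r u_{\mathrm{ext}} = (T_+ \otimes I) \begin{pmatrix} u_0 \\ U \end{pmatrix}.$
By [9, Theorem 9.3] $(I \otimes \nabla_{\hat{x}})\, \mathcal{L}|_{\kappa_0 \mathbb{R}} u_{\mathrm{ext}}$ is also analytic with respect to the first variable $s$ in $\mathbb{C}^-$ and decays like $|s|^{-1}$ as $|s| \to \infty$. In addition we need to recall the identity
(3.5) $\mathcal{L}\left\{ \dfrac{f}{\bullet + 1} \right\}(s) = (\hat{J} \mathcal{L}f)(s)$ with $(\hat{J} \hat{f})(s) := \int_s^\infty e^{-(\sigma - s)} \hat{f}(\sigma)\, d\sigma.$
The inverse operator $\hat{D} := \hat{J}^{-1}$ arises from a multiplication with a factor $r + 1$, i.e., $(\hat{D} \mathcal{L}f)(s) = \mathcal{L}\{(\bullet + 1) f\}(s) = (-\partial_s + 1) \mathcal{L}f(s)$. The Möbius transformed operators are defined by $D := \mathcal{M}_{\kappa_0} \hat{D} \mathcal{M}_{\kappa_0}^{-1}$ and $J := \mathcal{M}_{\kappa_0} \hat{J} \mathcal{M}_{\kappa_0}^{-1}$. As
$\int_\Gamma \int_0^\infty f_1 f_2\, dr\, d\hat{x} = \dfrac{-i\kappa_0}{\pi} A^\#(F_1, F_2),$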
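with $A^\#(F_1, F_2) := \int_\Gamma \int_{S^1} F_1(\bar{z}, \hat{x})\, F_2(z, \hat{x})\, d|z|\, d\hat{x}$ for $F_j = (\mathcal{M}_{\kappa_0} \otimes I)\, \mathcal{L}|_{\kappa_0 \mathbb{R}} f_j$, we obtain
(3.6)
$\int_{\Omega_{\mathrm{int}}} \left( \nabla u_{\mathrm{int}} \cdot \nabla v_{\mathrm{int}} - \kappa^2 u_{\mathrm{int}} v_{\mathrm{int}} \right) dx + \dfrac{d-1}{2a} \int_\Gamma u_0 v_0\, d\hat{x}$
$\quad - \dfrac{i\kappa_0}{a\pi} A^\#\left( (T_+ \otimes I) \begin{pmatrix} u_0 \\ U \end{pmatrix}, (T_+ \otimes I) \begin{pmatrix} v_0 \\ V \end{pmatrix} \right)$
$\quad - \dfrac{a i \kappa^2}{\pi \kappa_0} A^\#\left( (T_- \otimes I) \begin{pmatrix} u_0 \\ U \end{pmatrix}, (T_- \otimes I) \begin{pmatrix} v_0 \\ V \end{pmatrix} \right)$
$\quad + \dfrac{a i}{\pi \kappa_0} A^\#_{\tan}\left( (J T_- \otimes \nabla_{\hat{x}}) \begin{pmatrix} u_0 \\ U \end{pmatrix}, (J T_- \otimes \nabla_{\hat{x}}) \begin{pmatrix} v_0 \\ V \end{pmatrix} \right)$
$\quad - \dfrac{i C_d}{\pi \kappa_0 a} A^\#\left( (J T_- \otimes I) \begin{pmatrix} u_0 \\ U \end{pmatrix}, (J T_- \otimes I) \begin{pmatrix} v_0 \\ V \end{pmatrix} \right) = -\int_{\partial K} f\, v_{\mathrm{int}}|_{\partial K}\, ds.$
If $L^2_{\tan}(\Gamma)$ denotes the space of square integrable tangential vector fields on $\Gamma$, we define $A^\#_{\tan}(F_1, F_2) := \int_\Gamma \int_{S^1} F_1(\bar{z}, \hat{x}) \cdot F_2(z, \hat{x})\, d|z|\, d\hat{x}$. This bilinear form suggests introducing the space
(3.7a) $X^\# := \left\{ \begin{pmatrix} u_{\mathrm{int}} \\ U \end{pmatrix} \in H^1(\Omega_{\mathrm{int}}) \oplus H^+(S^1) \otimes L^2(\Gamma) : (J T_- \otimes \nabla_{\hat{x}}) \begin{pmatrix} u_0 \\ U \end{pmatrix} \in H^+(S^1) \otimes L^2_{\tan}(\Gamma) \right\},$
with the inner product
(3.7b) $\left\langle \begin{pmatrix} u_{\mathrm{int}} \\ U \end{pmatrix}, \begin{pmatrix} v_{\mathrm{int}} \\ V \end{pmatrix} \right\rangle_{X^\#} := \langle u_{\mathrm{int}}, v_{\mathrm{int}} \rangle_{H^1(\Omega_{\mathrm{int}})} + \langle U, V \rangle_{H^+(S^1) \otimes L^2(\Gamma)} + \left\langle (J T_- \otimes \nabla_{\hat{x}}) \begin{pmatrix} u_0 \\ U \end{pmatrix}, (J T_- \otimes \nabla_{\hat{x}}) \begin{pmatrix} v_0 \\ V \end{pmatrix} \right\rangle_{H^+(S^1) \otimes L^2_{\tan}(\Gamma)}.$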
It is easy to see that the bilinear form in (3.6) is bounded with respect to the norm of $X^\#$. It is shown in Lemma A.3 that $X^\#$ with this inner product is a Hilbert space, and for each $v_{\mathrm{int}} \in H^1(\Omega)$ there exists a vector in $X^\#$ containing $v_{\mathrm{int}} \in H^1(\Omega)$ as first component (note that the surface gradient $\nabla_{\hat{x}}$ is not applied to the $H^{1/2}(\Gamma)$-function $v_0$, but to a sum with other functions). Moreover, it is shown in Lemma A.3 that there exists a dense subset of test functions $(v_{\mathrm{int}}, V) \in X^\#$ for which the transforms above are justified. Therefore, we obtain the following result.
Theorem 3.4. If $u$ is a solution to the scattering problem (3.1), then $(u_{\mathrm{int}}, U)$ belongs to the space $X^\#$ and satisfies the symmetric variational equation (3.6).
The converse result will be shown later in Corollary 4.3 using a separation argument.
3.3. Galerkin discretization. Let $V_h \subset H^1(\Omega_{\mathrm{int}})$ be a finite element subspace on the computational domain $\Omega_{\mathrm{int}}$, and let $V_h|_\Gamma$ denote the set of traces of functions in $V_h$ on the artificial boundary $\Gamma$. Moreover, we use the polynomial subspace $\Pi_N \subset H^+(S^1)$ as in section 2. We will use a Galerkin method where the space $X^\#$ in Theorem 3.4 is approximated by the finite-dimensional subspace
(3.8) $X^\#_{h,N} := V_h \oplus \Pi_N \otimes V_h|_\Gamma.$
Fig. 3.1. Hardy space infinite element corresponding to quadratic Lagrange elements.
For a given finite element basis of $V_h$ let $\{w_j : j = 0, \dots, N_\Gamma\}$ denote the corresponding set of nonvanishing traces on $\Gamma$. Then we choose the functions $(z, \hat{x}) \mapsto z^n w_j(\hat{x})$ ($j = 0, \dots, N_\Gamma$, $n = 0, \dots, N$) as the basis of $\Pi_N \otimes V_h|_\Gamma$. The system matrix with respect to this basis can be assembled elementwise in a finite element fashion as illustrated in Figure 3.1. Each infinite element couples with the interior finite elements via common degrees of freedom for the Dirichlet values on $\Gamma$. Moreover, there is a coupling between neighboring infinite elements. Due to the structure of the bilinear form (3.6), the local element matrices are sums of Kronecker products of matrices. Let $M^{\mathrm{el}}_\Gamma$ and $S^{\mathrm{el}}_\Gamma$ denote the element mass and stiffness matrices on $\Gamma$ corresponding to the bilinear forms $\int_\Gamma u_0 v_0\, d\hat{x}$ and $\int_\Gamma \nabla_{\hat{x}} u_0 \cdot \nabla_{\hat{x}} v_0\, d\hat{x}$, respectively. The discrete representation of the operators $T_\pm$ has already been described in section 2; see (2.24). It remains to discuss the discretization of the operator $J$. Recall that $J$ is the inverse of a differential operator $D$, which is given explicitly by
(3.9) $(DF)(z) = \dfrac{(z-1)^2}{2i\kappa_0} F'(z) + \left( \dfrac{z-1}{2i\kappa_0} + 1 \right) F(z), \quad F \in H^+(S^1).$
To avoid numerical integrations, we use the inverse of the discretization of $D$
(3.10) $D_N := \mathrm{id}_{(N+1) \times (N+1)} + \dfrac{1}{2i\kappa_0} \begin{pmatrix} -1 & 1 & & & \\ 1 & -3 & 2 & & \\ & 2 & -5 & 3 & \\ & & \ddots & \ddots & \ddots \\ & & & N & -2N-1 \end{pmatrix}$
as the discretization of $J$. Hence, the element matrix of a Hardy space infinite element is given by
(3.11) $L_1 \otimes M^{\mathrm{el}}_\Gamma + L_2 \otimes S^{\mathrm{el}}_\Gamma - \kappa^2 L_3 \otimes M^{\mathrm{el}}_\Gamma,$
with
$L_1 = \dfrac{d-1}{2a} \begin{pmatrix} 1 & \\ & 0 \end{pmatrix} - \dfrac{2i\kappa_0}{a} T_{N,+}^\top T_{N,+} - \dfrac{2 C_d i}{\kappa_0 a} T_{N,-}^\top D_N^{-2} T_{N,-}, \qquad L_2 = \dfrac{2ai}{\kappa_0} T_{N,-}^\top D_N^{-2} T_{N,-},$
and
$L_3 = \dfrac{2ai}{\kappa_0} T_{N,-}^\top T_{N,-}.$
Note that the eigenvalue structure with respect to κ2 is preserved for the discretization with Hardy space infinite elements.
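The tridiagonal structure of $D_N$ in (3.10) follows from applying (3.9) to monomials: $(z-1)^2 (z^j)' + (z-1) z^j = (j+1) z^{j+1} - (2j+1) z^j + j z^{j-1}$. The sketch below builds $D_N$ from this rule and compares its action on a low-degree polynomial with a direct evaluation of (3.9). The function names and sample values are assumptions, and the comparison is exact only because the chosen polynomial has degree less than $N$, so no coefficient is truncated.

```python
def d_matrix(N, kappa0):
    """Tridiagonal (N+1) x (N+1) matrix D_N from (3.10), acting on monomial coefficients."""
    c = 1.0 / (2j * kappa0)
    D = [[0j] * (N + 1) for _ in range(N + 1)]
    for j in range(N + 1):
        D[j][j] = 1.0 - (2 * j + 1) * c
        if j < N:
            D[j][j + 1] = (j + 1) * c   # contribution of z**(j+1) to the coefficient of z**j
            D[j + 1][j] = (j + 1) * c   # contribution of z**j to the coefficient of z**(j+1)
    return D

def apply_matrix(D, coeffs):
    """Matrix-vector product for list-of-lists storage."""
    return [sum(D[i][j] * coeffs[j] for j in range(len(coeffs))) for i in range(len(D))]

def poly_eval(coeffs, z):
    """Evaluate sum_j coeffs[j] * z**j."""
    return sum(a * z ** j for j, a in enumerate(coeffs))

def apply_D_pointwise(coeffs, z, kappa0):
    """Evaluate (DF)(z) directly from (3.9) for F(z) = sum_j coeffs[j] * z**j."""
    c = 1.0 / (2j * kappa0)
    F = poly_eval(coeffs, z)
    dF = sum(j * a * z ** (j - 1) for j, a in enumerate(coeffs) if j > 0)
    return (z - 1) ** 2 * c * dF + ((z - 1) * c + 1.0) * F

N, kappa0 = 4, 1.5 - 0.2j               # sample values (assumptions)
coeffs = [1.0, 2.0, 3.0, 0.0, 0.0]      # F(z) = 1 + 2z + 3z^2, degree < N
z0 = 0.3 + 0.4j
D = d_matrix(N, kappa0)
via_matrix = poly_eval(apply_matrix(D, coeffs), z0)
direct = apply_D_pointwise(coeffs, z0, kappa0)
```

Since $D_N$ is complex symmetric, $(D_N^{-1} T_{N,-})^\top (D_N^{-1} T_{N,-}) = T_{N,-}^\top D_N^{-2} T_{N,-}$, which is the form appearing in $L_1$ and $L_2$.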
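Remark 3.5. The Hardy space infinite element method is not restricted to the case of spherical artificial boundaries $\Gamma = \partial B_a$. We have applied the method also to boundaries $\Gamma = \partial P$ with convex polyhedrons $P$ using the segmentation of the exterior domain $\Omega_{\mathrm{ext}} := \mathbb{R}^d \setminus P$ presented in [17, 24]. Although the variational formulation becomes more complicated, the method still seems to converge superalgebraically (see [15]).
4. Convergence analysis for the separated problems. In this section we analyze the convergence of Hardy space infinite elements in the exterior domain (i.e., for the special case $K = B_a$) after a Fourier separation. Implications for the full problem are discussed in section 4.4.
4.1. The separated equations. To this end, we choose an orthonormal basis of eigenfunctions $\Phi_n \in L^2(\Gamma)$, $n \in \mathbb{N}_0$, such that $-\Delta_{\hat{x}} \Phi_n = \lambda_n \Phi_n$ for the Laplace–Beltrami operator $\Delta_{\hat{x}}$ on $\Gamma$. The functions $u_0$ and $U$ have expansions with respect to this basis of the form $u_0(\hat{x}) = \sum_{n=0}^\infty u_{0,n} \Phi_n(\hat{x})$, $U(z, \hat{x}) = \sum_{n=0}^\infty U_n(z) \Phi_n(\hat{x})$, and similarly for $v_0$ and $V$. Moreover, the Neumann data on $\partial K = \Gamma$, which will be denoted by $g$ instead of $f$ in this section, can be decomposed into the Fourier series $g(\hat{x}) = \sum_{n=0}^\infty g_n \Phi_n(\hat{x})$. Then the variational problem (3.6) decouples into a series of variational problems in $\tilde{X} := \mathbb{C} \oplus H^+(S^1)$:
(4.1) $B_1\left( \begin{pmatrix} u_{0,n} \\ U_n \end{pmatrix}, \begin{pmatrix} v_0 \\ V \end{pmatrix} \right) + \dfrac{C_d - a^2 \lambda_n}{\kappa_0^2 a} B_2\left( \begin{pmatrix} u_{0,n} \\ U_n \end{pmatrix}, \begin{pmatrix} v_0 \\ V \end{pmatrix} \right) = -g_n v_0, \qquad \begin{pmatrix} v_0 \\ V \end{pmatrix} \in \tilde{X},$
for the Fourier coefficients, where the bilinear forms $B_1$, $B_2$ on $\tilde{X}$ are given by
$B_1\left( \begin{pmatrix} u_0 \\ U \end{pmatrix}, \begin{pmatrix} v_0 \\ V \end{pmatrix} \right) := \dfrac{d-1}{2a} u_0 v_0 - \dfrac{i\kappa_0}{a\pi} A\left( T_+ \begin{pmatrix} u_0 \\ U \end{pmatrix}, T_+ \begin{pmatrix} v_0 \\ V \end{pmatrix} \right) - \dfrac{a i \kappa^2}{\pi \kappa_0} A\left( T_- \begin{pmatrix} u_0 \\ U \end{pmatrix}, T_- \begin{pmatrix} v_0 \\ V \end{pmatrix} \right),$
$B_2\left( \begin{pmatrix} u_0 \\ U \end{pmatrix}, \begin{pmatrix} v_0 \\ V \end{pmatrix} \right) := -\dfrac{i\kappa_0}{\pi} A\left( J T_- \begin{pmatrix} u_0 \\ U \end{pmatrix}, J T_- \begin{pmatrix} v_0 \\ V \end{pmatrix} \right).$
We use the canonical inner product on $\tilde{X}$ given by the sum of the inner products on $\mathbb{C}$ and $H^+(S^1)$. Defining the operators $K_j : \tilde{X} \to \tilde{X}$ ($j = 1, 2$) implicitly by
$\left\langle K_j \begin{pmatrix} u_0 \\ U \end{pmatrix}, \begin{pmatrix} v_0 \\ V \end{pmatrix} \right\rangle_{\tilde{X}} = B_j\left( \begin{pmatrix} u_0 \\ U \end{pmatrix}, \begin{pmatrix} \overline{v_0} \\ CV \end{pmatrix} \right), \qquad \begin{pmatrix} u_0 \\ U \end{pmatrix}, \begin{pmatrix} v_0 \\ V \end{pmatrix} \in \tilde{X},$
the variational equations (4.1) can be reformulated as operator equations
(4.2) $K_1 \begin{pmatrix} u_{0,n} \\ U_n \end{pmatrix} + \dfrac{C_d - a^2 \lambda_n}{a \kappa_0^2} K_2 \begin{pmatrix} u_{0,n} \\ U_n \end{pmatrix} = \begin{pmatrix} -g_n \\ 0 \end{pmatrix}.$
4.2. Uniqueness and smoothness of solutions. Motivated by the Paley–Wiener theorem (2.5) we introduce a transform $Q : \tilde{X} \to L^2(\mathbb{R}_+)$ by
(4.3) $\left( Q \begin{pmatrix} f_0 \\ F \end{pmatrix} \right)(t) := \dfrac{-1}{2\pi} \int_{-\infty}^{\infty} e^{ist} \left( \mathcal{M}_{\kappa_0}^{-1} T_- \begin{pmatrix} f_0 \\ F \end{pmatrix} \right)(\kappa_0 s)\, ds, \quad t \ge 0.$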
HARDY SPACE INFINITE ELEMENTS
The following result will be used to show uniqueness, but may also be of independent interest.

Lemma 4.1. $Q$ is a norm isomorphism from $\tilde X$ to the Sobolev space $H^1(\mathbb{R}_+)$, and $f := Q\binom{f_0}{F}$ satisfies $f(0) = f_0$ and

\[ f'(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ist}\left(\mathcal{M}_{\kappa_0}^{-1}T_+\binom{f_0}{F}\right)(\kappa_0 s)\,ds, \qquad t \ge 0. \tag{4.4} \]
Proof. Let us first show that the range of $Q$ is contained in $H^1(\mathbb{R}_+)$. Due to (2.5) we have $f(t) = 0$ for $t < 0$ if we use definition (4.3) also for $t < 0$. Therefore we get $f \in H^1(\mathbb{R}_+)$ if we can show that $w(t) := f(t) + f(-t)$, $t \in \mathbb{R}$, belongs to $H^1(\mathbb{R})$. Introducing $\hat f := (i\kappa_0)^{-1}\mathcal{M}_{\kappa_0}^{-1}T_-\binom{f_0}{F}$ we have $\hat f(\kappa_0 s) = (-i\kappa_0)^{-1}(\mathcal{F}f)(s)$ and $-i\kappa_0(\mathcal{F}w)(s) = \hat f(\kappa_0 s) + \hat f(-\kappa_0 s)$. Due to (2.14) and the definition (2.23) of $T_+$, the function $\bullet\hat f - f_0 = \mathcal{M}_{\kappa_0}^{-1}T_+\binom{f_0}{F}$ belongs to $H^-(\kappa_0\mathbb{R})$. Hence, the function

\[ -i\kappa_0^2 s\,(\mathcal{F}w)(s) = \bigl(\kappa_0 s\,\hat f(\kappa_0 s) - f_0\bigr) - \bigl(-\kappa_0 s\,\hat f(-\kappa_0 s) - f_0\bigr), \qquad s \in \mathbb{R}, \]

is square integrable, and therefore $\int_{-\infty}^{\infty}(1 + s^2)\,|(\mathcal{F}w)(s)|^2\,ds < \infty$. This implies that $w \in H^1(\mathbb{R})$. To prove the second assertion first note that

\[ \int_0^{\infty} e^{-ist} f'(t)\,dt = -f(0) + is\int_0^{\infty} e^{-ist} f(t)\,dt, \qquad s \in \mathbb{R}. \tag{4.5} \]

Since we have already shown that $f' \in L^2(\mathbb{R}_+)$, the right-hand side is a square integrable function of $s$ by Plancherel's theorem. As $\bullet\hat f - f_0 = \mathcal{M}_{\kappa_0}^{-1}T_+\binom{f_0}{F}$ also belongs to $L^2(\kappa_0\mathbb{R})$, the constant function $f(0) - f_0$ is square integrable and hence 0. Therefore, $f(0) = f_0$, and applying the inverse Fourier transform to (4.5) yields (4.4).

$Q$ is injective as a composition of injective operators. To prove that $Q$ is onto, choose an arbitrary $v \in H^1(\mathbb{R}_+)$ and extend it by zero on the negative real axis. Then (2.5) implies that $\mathcal{F}v \in H^-(\mathbb{R})$, and hence $\tilde V := (-i\kappa_0)^{-1}\mathcal{M}_{\kappa_0}\bigl((\mathcal{F}v)(\kappa_0^{-1}\bullet)\bigr)$ belongs to $H^+(S^1)$. Moreover, $\bigl(\mathcal{M}_{\kappa_0}(\mathcal{F}v')(\kappa_0^{-1}\bullet)\bigr)(z) = i\kappa_0\tilde V(z) + 2i\kappa_0\frac{\tilde V(z) - v_0}{z - 1}$ with $v_0 := v(0)$ is an element of $H^+(S^1)$. Hence, the function $V(z) := 2i\kappa_0\frac{\tilde V(z) - v_0}{z - 1}$ (cf. (2.9)) belongs to $H^+(S^1)$, and we have $\bigl(\mathcal{M}_{\kappa_0}^{-1}T_-\binom{v_0}{V}\bigr)(\kappa_0 s) = -(\mathcal{F}v)(s)$, so $Q\binom{v_0}{V} = v$. The boundedness of $Q^{-1}$ follows either directly from the construction above or from the open mapping theorem.

Note that the separation index $n$ is the index of an enumeration of the double indices $(l, m) = (l(n), m(n))$ in (1.2). Hence, solutions to (4.2) are given by modified (due to the scaling in (3.2)) and Laplace and Möbius transformed Hankel functions

\[ \mathcal{H}_n^{(1/2)}(r) := r^{1-d/2}\,H^{(1/2)}_{l(n)-1+d/2}(r). \]

Proposition 4.2. Let $\Re(\kappa/\kappa_0) > 0$. If $\mathcal{H}_n^{(1)\prime}(\kappa a) \neq 0$, then (4.1) has a unique solution $(u_{0,n}, U_n)^{\top} \in \tilde X$ and $u_{0,n} = \frac{\mathcal{H}_n^{(1)}(\kappa a)}{\kappa\,\mathcal{H}_n^{(1)\prime}(\kappa a)}\,g_n$. If $\mathcal{H}_n^{(1)\prime}(\kappa a) = 0$, then (4.1) has a solution if and only if $g_n = 0$.

Proof. Using Lemmas 4.1 and A.1 and the Fourier convolution theorem, it can be shown that (4.1) is equivalent to the variational problem to find $u_n \in H^1(\mathbb{R}_+)$ such
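that

\[ \frac{i}{a\kappa_0}\int_0^{\infty}\left(-\kappa_0^2\,u_n'(t)v'(t) - (\kappa a)^2 u_n(t)v(t) + \frac{a^2\lambda_n - C_d}{\bigl(\frac{it}{\kappa_0} + 1\bigr)^2}\,u_n(t)v(t)\right)dt + \frac{d-1}{2a}\,u_n(0)v(0) = -g_n v(0) \]

for all $v \in H^1(\mathbb{R}_+)$. This is the variational formulation of the exterior boundary value problem

\[ \kappa_0^2\,u_n''(t) - \left((\kappa a)^2 + \frac{C_d - a^2\lambda_n}{\bigl(\frac{it}{\kappa_0} + 1\bigr)^2}\right)u_n(t) = 0, \qquad t \ge 0,\ \hat x \in \Gamma, \tag{4.6a} \]

\[ u_n'(0) = \frac{i}{\kappa_0}\left(a g_n + \frac{d-1}{2}\,u_n(0)\right), \tag{4.6b} \]

\[ u_n \in L^2(\mathbb{R}_+). \tag{4.6c} \]

The general solution of the differential equation (4.6a) is given by

\[ u_n(t) = \left(\frac{it}{\kappa_0} + 1\right)^{(d-1)/2}\left[c_n^{(1)}\,\mathcal{H}_n^{(1)}\!\left(\kappa a\left(\frac{it}{\kappa_0} + 1\right)\right) + c_n^{(2)}\,\mathcal{H}_n^{(2)}\!\left(\kappa a\left(\frac{it}{\kappa_0} + 1\right)\right)\right]. \]

Due to the asymptotic behavior (1.3) of the Hankel functions and the assumption $\Re(\kappa/\kappa_0) > 0$, (4.6c) implies that $c_n^{(2)} = 0$. If $\mathcal{H}_n^{(1)\prime}(\kappa a) \neq 0$, then the boundary condition (4.6b) implies $u_{0,n} = \mathcal{H}_n^{(1)}(\kappa a)/\bigl(\kappa\,\mathcal{H}_n^{(1)\prime}(\kappa a)\bigr)\,g_n$. Otherwise (4.6b) is satisfied if and only if $g_n = 0$.

As a corollary we obtain the converse of Theorem 3.4.

Corollary 4.3. If $(u_{\mathrm{int}}, U)^{\top} \in X^{\#}$ is a solution to the variational problem (3.6) and $\mathcal{H}_n^{(1)\prime}(\kappa a) \neq 0$, then $u_{\mathrm{int}}$ is the restriction of a solution to (3.1).

Proof. Let $(u_{\mathrm{int}}, U)^{\top}$ be a solution to (3.6) and let $\partial_\nu u \in H^{-1/2}(\Gamma)$ denote the Neumann trace. We rearrange the terms in (3.6) such that only the integrals over $\Omega_{\mathrm{int}}$ and $\partial K$ are on the left-hand side to obtain

\[ \int_{\partial K} f\,v_{\mathrm{int}}|_{\partial K}\,ds + B_{\mathrm{int}}(u_{\mathrm{int}}, v_{\mathrm{int}}) = B_{\mathrm{ext}}\left(\binom{u_0}{U}, \binom{v_0}{V}\right). \]

It follows that $B_{\mathrm{ext}}\bigl((u_0, U)^{\top}, (v_0, V)^{\top}\bigr) = \int_\Gamma \partial_\nu u\,v_0\,ds$ for all $(v_0, V)^{\top}$. Now we can apply a Fourier separation on $\Gamma$ and use Proposition 4.2 to obtain the relation $\kappa\,\mathcal{H}_n^{(1)\prime}(\kappa a)\,u_{0,n} = \mathcal{H}_n^{(1)}(\kappa a)\,(\partial_\nu u)_n$ for the Fourier coefficients $(\partial_\nu u)_n := \int_\Gamma \partial_\nu u\,\Phi_n\,ds$. Therefore, we can define an outgoing exterior solution by (1.2) with the constants $\alpha_{l(n),m(n)} = \frac{\mathcal{H}_n^{(1)}(\kappa a)}{\kappa\,\mathcal{H}_n^{(1)\prime}(\kappa a)}\,(\partial_\nu u)_n$, which has the same Cauchy data on $\Gamma$ as $u_{\mathrm{int}}$.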
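The boundary-condition step in the proof of Proposition 4.2 is stated without computation. A short worked version might read as follows, under two assumptions: the boundary condition (4.6b) is read as $u_n'(0) = \frac{i}{\kappa_0}\bigl(a g_n + \frac{d-1}{2}u_n(0)\bigr)$, and $\rho(t) := it/\kappa_0 + 1$ is a shorthand introduced here for convenience (it does not appear in the original). It also uses $u_{0,n} = u_n(0)$, which corresponds to $f(0) = f_0$ in Lemma 4.1.

```latex
% General solution with c_n^{(2)} = 0, and its value and derivative at t = 0:
\begin{align*}
u_n(t) &= c_n^{(1)}\,\rho(t)^{(d-1)/2}\,\mathcal{H}_n^{(1)}\bigl(\kappa a\,\rho(t)\bigr),
  \qquad \rho(t) := \tfrac{it}{\kappa_0} + 1, \quad \rho'(t) = \tfrac{i}{\kappa_0},\\
u_n(0) &= c_n^{(1)}\,\mathcal{H}_n^{(1)}(\kappa a),\\
u_n'(0) &= \frac{i}{\kappa_0}\Bigl(\frac{d-1}{2}\,u_n(0)
  + c_n^{(1)}\,\kappa a\,\mathcal{H}_n^{(1)\prime}(\kappa a)\Bigr).
\end{align*}
% Comparing with (4.6b), the terms (d-1)/2 * u_n(0) cancel:
\begin{align*}
c_n^{(1)}\,\kappa a\,\mathcal{H}_n^{(1)\prime}(\kappa a) = a\,g_n
\quad\Longrightarrow\quad
u_{0,n} = u_n(0)
  = \frac{\mathcal{H}_n^{(1)}(\kappa a)}{\kappa\,\mathcal{H}_n^{(1)\prime}(\kappa a)}\,g_n .
\end{align*}
```

If $\mathcal{H}_n^{(1)\prime}(\kappa a) = 0$, the same identity forces $g_n = 0$, which matches the solvability condition in the proposition.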
Lemma 4.4. We have $U_n \in H^+(S^1) \cap C^\infty(S^1)$.

Proof. It follows from [9, Proposition 6.6 and Lemma 6.3] that the Fourier coefficients of the Laplace transform, $\hat u_n(s) := \langle \mathcal{L}u_{\mathrm{ext}}(x, \cdot), \Phi_n\rangle_{L^2(\Gamma)}$, have an integral representation of the form

\[ \hat u_n(s) = -\frac{c_n}{i\kappa a - s} - \int_0^{\infty}\frac{c_n\,\psi_n(t)}{i\kappa a - t - s}\,dt, \qquad s \in \mathbb{C} \setminus \{i\kappa a - t : t \ge 0\}, \]
with a constant $c_n \in \mathbb{C}$ and a function $\psi_n(t)$ decaying exponentially as $t \to \infty$. This implies that $\hat u_n|_{\mathbb{R}} \in H^-(\mathbb{R}) \cap C^\infty(\mathbb{R})$. Hence, $\tilde U_n := \mathcal{M}_{\kappa_0}(\hat u_n)$ belongs to $H^+(S^1) \cap C^\infty(S^1 \setminus \{1\})$. It remains to study the asymptotic behavior of $\hat u_n$ at infinity, or equivalently the behavior of $\tilde U_n$ at 1. Expanding the integral kernel in powers of $1/(s - i\kappa_0)$ and using the exponential decay of $\psi_n$, it can be shown that $\hat u_n$ has an asymptotic expansion

\[ \hat u_n(s) = \sum_{j=1}^{J}\frac{\alpha_j^{(n)}}{(s - i\kappa_0)^j} + o\bigl(|s - i\kappa_0|^{-J}\bigr), \qquad |s| \to \infty, \]

for any $J \in \mathbb{N}$. By well-known asymptotic formulas for the Laplace transform we have $u_{0,n} = \alpha_1^{(n)}$. Since $\bigl(\mathcal{M}_{\kappa_0}((\bullet - i\kappa_0)^{-j})\bigr)(z) = (z - 1)^{j-1}/(2i\kappa_0)^j$, it follows that $\tilde U_n$ satisfies

\[ \tilde U_n(z) = \sum_{j=1}^{J}\frac{\alpha_j^{(n)}}{(2i\kappa_0)^j}\,(z - 1)^{j-1} + o\bigl(|z - 1|^{J-1}\bigr), \qquad \text{as } |z - 1| \to 0. \]

Therefore,

\[ U_n(z) = \frac{2i\kappa_0\,\tilde U_n(z) - \alpha_1^{(n)}}{z - 1} = \sum_{j=2}^{J}\frac{\alpha_j^{(n)}}{(2i\kappa_0)^{j-1}}\,(z - 1)^{j-2} + o\bigl(|z - 1|^{J-2}\bigr), \qquad \text{as } |z - 1| \to 0. \]

This implies that $U_n$ is $J - 2$ times differentiable at 1. Since $J$ was arbitrary, this together with the properties of $\tilde U_n$ shows that $U_n \in H^+(S^1) \cap C^\infty(S^1)$.

4.3. Convergence. The bilinear form $aB_1$ essentially coincides with the exterior part $B_{\mathrm{ext}}$ of the bilinear form from the one-dimensional case. As in (2.21) we have

(4.7)
\[ \Re\left\{(i + \gamma)\,B_1\left(\binom{u_0}{U}, \binom{\overline{u_0}}{\mathcal{C}U}\right)\right\} \ge \alpha\left\|\binom{u_0}{U}\right\|_{\tilde X}^2 \]

for some $\alpha, \gamma > 0$ if $\Re(\kappa_0), \Re(\kappa^2/\kappa_0) > 0$. Therefore, $K_1$ is boundedly invertible.

Lemma 4.5. The operator $K_2$ is compact.

Proof. $K_2$ is a rank-1 perturbation of the operator $K_3 : H^+(S^1) \to H^+(S^1)$ given implicitly by

\[ (K_3 U, V)_{H^+(S^1)} = -\frac{i\kappa_0}{\pi}\int_{S^1}\overline{(z - 1)}\,\bigl(\mathcal{J}^2((\bullet - 1)U)\bigr)(z)\,\overline{V(z)}\,|dz|. \tag{4.8} \]

Here we have used the boundedness of $\mathcal{J} : H^+(S^1) \to H^+(S^1)$ (see (4.9a)) and the symmetry property $A(U, \mathcal{J}V) = A(\mathcal{J}U, V)$, which follows from the representation of $\mathcal{D} = \mathcal{J}^{-1}$ with respect to the monomial basis. Since the orthogonal projection $P : L^2(S^1) \to H^+(S^1)$ and the operator $H^+(S^1) \to H^+(S^1)$, $U \mapsto \mathcal{J}((\bullet - 1)U)$ are bounded, it suffices to show the compactness of $\tilde K_4 : H^+(S^1) \to L^2(S^1)$, $(\tilde K_4 U)(z) = (z - 1)(\mathcal{J}U)(z)$, or equivalently the compactness of $K_4 : H^-(\mathbb{R}) \to L^2(\mathbb{R})$, $(K_4 f)(s) := \frac{2i\kappa_0}{s + i\kappa_0}(\hat{\mathcal{J}}f)(s)$. The following inequalities hold for some constant $C > 0$, $f \in H^-(\mathbb{R})$, and $s, s_1, s_2 \in \mathbb{R}$:

\[ \|\hat{\mathcal{J}}f\|_2 \le C\|f\|_2, \tag{4.9a} \]

\[ |(K_4 f)(s)| \le \frac{C}{|s + i\kappa_0|}\,\|f\|_2, \tag{4.9b} \]

\[ |(K_4 f)(s_1) - (K_4 f)(s_2)| \le C\sqrt{|s_1 - s_2|}\,\|f\|_2. \tag{4.9c} \]
The first inequality is a consequence of Plancherel's theorem, since $\hat{\mathcal{J}}f = g * f$ with $g(t) := e^{-t}$ for $t \ge 0$ and $g(t) \equiv 0$ for $t < 0$:

\[ \|\hat{\mathcal{J}}f\|_2 = \|g * f\|_2 = \sqrt{2\pi}\,\|\mathcal{F}(g * f)\|_2 = \sqrt{2\pi}\,\|\mathcal{F}g\,\mathcal{F}f\|_2 \le 2\pi\,\|\mathcal{F}g\|_\infty\|\mathcal{F}f\|_2 \le C\|f\|_2. \]

For the third inequality we assume without loss of generality that $s_2 > s_1$ and write

\[ \frac{(\hat{\mathcal{J}}f)(s_1)}{s_1 + i\kappa_0} - \frac{(\hat{\mathcal{J}}f)(s_2)}{s_2 + i\kappa_0} = \int_{s_1}^{s_2}\frac{e^{(s_1 - \sigma)}}{s_1 + i\kappa_0}\,f(\sigma)\,d\sigma + \int_{s_2}^{\infty}\left(\frac{e^{(s_1 - \sigma)}}{s_1 + i\kappa_0} - \frac{e^{(s_2 - \sigma)}}{s_2 + i\kappa_0}\right)f(\sigma)\,d\sigma =: I_1 + I_2. \]

The first integral can be estimated with the Cauchy–Schwarz inequality by

\[ |I_1| \le \left(|s_2 - s_1|\sup_{\sigma \in [s_1, s_2]}\left|\frac{e^{(s_1 - \sigma)}}{s_1 + i\kappa_0}\right|^2\int_{s_1}^{s_2}|f(\sigma)|^2\,d\sigma\right)^{1/2} \le C\sqrt{|s_1 - s_2|}\,\|f\|_2. \]

For $I_2$ the mean value theorem and the Cauchy–Schwarz inequality yield

\[ |I_2| \le C|s_2 - s_1|\sup_{t \in [0,1]} e^{(t-1)(s_2 - s_1)}\left(\int_{s_2}^{\infty}\bigl|e^{(s_2 - \sigma)}\bigr|^2\,d\sigma\right)^{1/2}\|f\|_2, \]

and we have shown (4.9c). Inequality (4.9b) can be proven in an analogous manner.

In order to show the compactness of $K_4$ we use the Arzelà–Ascoli theorem. Thus let $(w_n)_{n \in \mathbb{N}}$ be a sequence in $H^-(\mathbb{R})$ with $\|w_n\|_2 \le 1$ for all $n \in \mathbb{N}$ and $v_n := K_4 w_n$. Due to the Arzelà–Ascoli theorem, there exists a subsequence of $(v_n)$ which converges in the supremum norm of a compact subset $I$ of $\mathbb{R}$, since $(v_n)$ is equicontinuous and bounded in $I$ by (4.9b) and (4.9c). Let $I_j := [-j, j] \subset \mathbb{R}$, $v_{n_0(l)} := v_l$. Moreover, for every $j \in \mathbb{N}$ let $v_{n_j(l)}$ be a subsequence of $v_{n_{j-1}(l)}$ converging in the supremum norm of $I_j$. Thus the diagonal subsequence $v_{n(l)} := v_{n_l(l)}$ converges pointwise in $\mathbb{R}$ and for each $I_j$ in the supremum norm of $I_j$ to a function $v$. For given $\varepsilon > 0$ it remains to show that there exists an $l_0(\varepsilon) \in \mathbb{N}$ such that $\|v_{n(l)} - v\|_2 < \varepsilon$ for all $l \ge l_0$. This can be seen with (4.9b) since there exists a $j_0(\varepsilon) \in \mathbb{N}$ such that

\[ \int_{\mathbb{R} \setminus I_{j_0}}\bigl|v_{n(l)}(s) - v(s)\bigr|^2\,ds \le 2C\int_{\mathbb{R} \setminus I_{j_0}}\frac{1}{|s + i\kappa_0|^2}\,ds \le \frac{\varepsilon}{2}. \]

Because of the uniform convergence of $v_{n(l)}$ in $I_{j_0}$, the subsequence $v_{n(l)}$ of the image sequence $v_n = K_4 w_n$ converges in $L^2(\mathbb{R})$ and the proof is done.

With these preparations we easily obtain the following superalgebraic convergence result.

Theorem 4.6. Assume that $\kappa_0$, $\kappa/\kappa_0$, and $\kappa^2/\kappa_0$ have positive real part and that $\mathcal{H}_n^{(1)\prime}(\kappa a) \neq 0$; i.e., $\kappa$ is not a resonance of (4.1). Then there exist constants $N_0, C_l > 0$ such that for $N \ge N_0$ there exists a unique solution $(u_{0,n}^{(N)}, U_n^{(N)})^{\top}$ in the space $X_N := \mathbb{C} \oplus \Pi_N$ to the variational equation

\[ B_1\left(\binom{u_{0,n}^{(N)}}{U_n^{(N)}}, \binom{v_0^{(N)}}{V^{(N)}}\right) + \frac{C_d - a^2\lambda_n}{a\kappa_0^2}\,B_2\left(\binom{u_{0,n}^{(N)}}{U_n^{(N)}}, \binom{v_0^{(N)}}{V^{(N)}}\right) = -g_n v_0^{(N)} \tag{4.10} \]

for $(v_0^{(N)}, V^{(N)})^{\top} \in X_N$. Moreover, for any $l \in \mathbb{N}$ the error estimate

\[ \left\|\binom{u_{0,n}^{(N)}}{U_n^{(N)}} - \binom{u_{0,n}}{U_n}\right\|_{\tilde X} \le \frac{C_l}{N^l} \tag{4.11} \]

holds for some constant $C_l$ depending on $l$, $n$, and $\kappa$.
Proof. Due to the coercivity estimate (4.7) the method converges for the bilinear form $B_1$. Using [12, Theorem 13.7], Proposition 4.2, and Lemma 4.5, it follows that the whole method (4.10) is stable and convergent. From the approximation properties of trigonometric polynomials and Lemma 4.4, it follows that the speed of convergence is superalgebraic.

Since the operators on the left-hand side of (4.2) are compact perturbations of Toeplitz operators, we could have appealed to more general convergence results for the finite section method (cf. [1, Chapter 7]) for an alternative proof of Theorem 4.6.

4.4. Discussion. For a fixed finite element subspace of $H^{1/2}(\Gamma)$, a separation argument in this subspace and Theorem 4.6 yield superalgebraic convergence to a transformed outgoing solution as $N \to \infty$. However, our results do not exclude the possibility that the constants in the convergence estimate blow up as the mesh size tends to 0. To our knowledge this is also the state of the art for usual infinite elements in the space domain (cf. [3, 4]). Numerical evidence presented in Figure 5.1 suggests that the discrete bilinear forms are bounded from above and their inf-sup constants are bounded from below, both uniformly in the Hardy dimension $N$ and the separation index $n$. We have not been able to prove this for the inf-sup constants so far. With such uniform estimates one would obtain convergence of the Neumann-to-Dirichlet (or equivalently the Dirichlet-to-Neumann) operators in the natural operator norms, which easily yields a convergence result for the scattering problem (3.1) (cf. [11, 10]).

5. Numerical results. We first study the separated equations and decompose the norm $\|\bullet\|_{X^{\#}} := \sqrt{\langle\bullet, \bullet\rangle_{X^{\#}}}$ into the norms

\[ \left\|\binom{u_{0,n}}{U_n}\right\|_{X_n}^2 := \sqrt{1 + \lambda_n}\,|u_{0,n}|^2 + \|U_n\|_{H^+(S^1)}^2 + \lambda_n\left\|\mathcal{J}T_-\binom{u_{0,n}}{U_n}\right\|_{H^+(S^1)}^2 \tag{5.1} \]

such that $\left\|\binom{u_0}{U}\right\|_{X^{\#}}^2 = \sum_n\left\|\binom{u_{0,n}}{U_n}\right\|_{X_n}^2$. If, for each Fourier coefficient $(u_{0,n}, U_n)$, $U_n^{(N)} \in \mathbb{C}^N$ denotes the vector of the first $N$ Fourier coefficients of $U_n$, the discrete counterpart on $X_{N,n} := \mathbb{C}^{N+1}$ is the norm

\[ \left\|\binom{u_{0,n}}{U_n^{(N)}}\right\|_{X_{N,n}}^2 := \binom{u_{0,n}}{U_n^{(N)}}^{*}\left(\begin{pmatrix}\sqrt{1 + \lambda_n} & \\ & 1\end{pmatrix} + \lambda_n\,T_{N,-}^{*}D_N^{-*}D_N^{-1}T_{N,-}\right)\binom{u_{0,n}}{U_n^{(N)}}. \tag{5.2} \]

Figure 5.1 shows the norms and inf-sup constants with respect to the norm in (5.2) of the bilinear form in (4.10), which is represented by the matrix $T_n^{(N)} := L_1 + \lambda_n L_2 - \kappa^2 L_3$ (see (3.11)). They were computed using a Cholesky factorization $G = L^{*}L$ of the Gramian matrix $G$ in (5.2) as $\|(L^{*})^{-1}T_n^{(N)}L^{-1}\|_2$ and $\|[(L^{*})^{-1}T_n^{(N)}L^{-1}]^{-1}\|_2^{-1}$, respectively. Here $\|A\|_2$ denotes the spectral norm, i.e., the largest singular value of a matrix $A$. The results suggest that the norms are bounded from above and the inf-sup constants are bounded from below, both uniformly in $N$ and $n$.

Figure 5.2 shows the convergence of the relative errors of the numerical approximations to the Neumann-to-Dirichlet numbers $\mathrm{NtD}(n, \kappa, a) := \mathcal{H}_n^{(1)}(\kappa a)/\bigl(\kappa\,\mathcal{H}_n^{(1)\prime}(\kappa a)\bigr)$. These numerical approximations are computed by solving (4.10) with $g_n = 1$; they are given by the negative upper left entry of the matrices $[T_n^{(N)}]^{-1}$ defined above. The results exhibit a fast, almost exponential convergence as $N \to \infty$ for each Fourier
[Figure 5.1: two panels, "discrete norm bounds" and "discrete inf-sup constant", each plotted against the separation index n (0 to 200) for N = 10, 20, 50, 100.]

Fig. 5.1. Norms and inf-sup constants of the separated bilinear forms in (4.10) with respect to the norms defined in (5.2) for κ = κ0 = a = 1 and d = 2.

[Figure 5.2: three panels, κ = 1, κ = 5, and κ = 25, each showing the relative NtD error (logarithmic scale) against N (20 to 100) for n = 0, 2, 5, 10, 20, 40.]

Fig. 5.2. Relative error of the Neumann-to-Dirichlet numbers for different Fourier modes n, different wave numbers κ, a = 1, κ0 = κ, and d = 2.
mode n. The constants deteriorate as n grows, but improve as κ grows. Given the stability shown in Figure 5.1, this must be due to the approximation properties of polynomial subspaces for the transformed Hankel functions. The error for the full unseparated problem is mainly determined by the convergence behavior of the first Fourier modes, since the size |u0,n| of the Fourier coefficients decays exponentially with n because u0 is analytic. Figure 5.3 shows results for the scattering of plane incident waves with different wave numbers κ by a kite-shaped domain. As a reference solution we computed a pair of Cauchy data on Γ by a Nyström integral equation method (cf. [2, section 3.5]). We used the reference Neumann data on spheres of radius 2 and 3.5 as initial data for the Hardy space method (HSM) and compared the Dirichlet data computed by the HSM to the reference Dirichlet data. As basis functions on Γ we used so-called hierarchic shape functions of high polynomial degrees (see [21, section 3.1.4]) such that the finite element error could be neglected. The error plot in Figure 5.3 clearly exhibits fast convergence with respect to N for both wave numbers κ = 5 and κ = 25. As for other methods (e.g., PML or standard infinite elements), the error for a fixed number of degrees of freedom in the exterior domain decreases as the distance of the coupling boundary to the scatterer increases. Since a crucial advantage of the HSM is the applicability of the method to resonance problems, we computed as a second example the resonances of a square with
[Figure 5.3: two panels, k = 5 and k = 25, each showing the H^{1/2} error of the Dirichlet data (logarithmic scale) against the Hardy space dimension (0 to 40) for radius = 3.5 and radius = 2.]

Fig. 5.3. $H^{1/2}(\Gamma)$-error in the Dirichlet data for different wave numbers and radii as a function of the number N of degrees of freedom in the Hardy space $H^+(S^1)$.
Fig. 5.4. Eigenfunctions of an open square.
a small opening. This was done using the finite element solver ngsolve, which is an add-on of the mesh generator netgen [20]. In Figure 5.4 three different eigenfunctions are plotted. Two of them correspond to the real-valued eigenvalues of the Laplace operator in a closed square and the third to an exterior surface resonance, the location of which depends mainly on the circumference of the obstacle (cf. [25] and the references therein). In Figure 5.5 the exterior resonances of the sphere were computed as roots of the Hankel functions of the first kind. Additionally we used
Fig. 5.5. Resonances of an open square (•: HSM for open square; ♦: PML for open square; : eigenvalues for closed square; ◦: exterior resonances of a sphere with the same circumference as the square).
PML (♦) as reference solution. The HSM resonances in the third quadrant and the PML resonances in the lower part of the plot are computational artifacts.

6. Conclusions. We have presented a new type of infinite elements based on the pole condition which are derived by transforming the exterior variational formulation of the Helmholtz equation to a Hardy space. They can be coupled with finite elements of arbitrary order in the interior domain and have simple, symmetric element matrices with a tensor product structure. The convergence with respect to the number of degrees of freedom in the transformed radial direction is superalgebraic. Moreover, they are particularly well suited for resonance problems since they preserve the eigenvalue structure. As opposed to other numerical realizations of the pole condition (cf. [8, 19]) it is not possible to recover the exterior solution directly by the HSM.

Let us compare Hardy space infinite elements with PML from a practical perspective: The PML method has the advantage of being easy to implement in standard software packages, whereas the HSM requires the implementation of a new (in)finite element. The HSM has the advantage that it is a high order method which can easily be combined with low order codes. Moreover, the only tuning parameter in the HSM is κ0, and the rule κ0 ≈ κ yields good results, whereas for PML at least the slope of the path in the complex plane, the width of the layer, and the polynomial degree have to be chosen. Our preliminary numerical experiments suggest that the HSM performs at least as well as PML, but for a definite conclusion more thorough numerical studies optimizing the various PML parameters will be necessary. The HSM is not restricted to the situation studied in this paper, but can be extended to other differential equations and other coupling boundaries, which may be the subject of future research.

Appendix. In this appendix we prove the lemmas needed for the transformation to the Hardy space.
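Lemma A.1. Let $M \ge 0$ and $\kappa_0 \in \mathbb{C}$ be given constants with $\Re(\kappa_0) > 0$, and let $f, g : \mathbb{R}_+ \to \mathbb{C}$ be two measurable functions such that $f\exp(-M\bullet)$ and $g\exp(M\bullet)$ belong to $L^1([0, \infty)) \cap L^2([0, \infty))$. Moreover, assume that the Laplace transformed functions $\hat f := \mathcal{L}f$ and $\hat g := \mathcal{L}g$ have holomorphic extensions to the regions sketched in Figure A.1 and that $|\hat f(s)s|$, $|\hat g(s)s|$ are uniformly bounded in these regions. Then

\[ \int_0^{\infty} f(r)g(r)\,dr = -\frac{i}{2\pi}\int_{\kappa_0\mathbb{R}}\hat f(s)\,\hat g(-s)\,ds = \frac{-i\kappa_0}{\pi}\int_{S^1}F(z)\,G(\bar z)\,|dz|, \tag{A.1} \]

with $F := \mathcal{M}_{\kappa_0}(\hat f|_{\kappa_0\mathbb{R}})$ and $G := \mathcal{M}_{\kappa_0}(\hat g|_{\kappa_0\mathbb{R}})$. (The orientation of the contour $\kappa_0\mathbb{R}$ is from left to right.)

[Figure A.1: three panels sketching regions in the complex plane: (a) $\hat f(s)$, (b) $\hat g(s)$, (c) $\hat f(s)\hat g(-s)$.]

Fig. A.1. Regions to which the functions in Lemma A.1 have holomorphic extensions.

Proof. We extend $f, g$ by zero to $f^{*}, g^{*} : \mathbb{R} \to \mathbb{C}$ and write the integral as a Fourier transform $(\mathcal{F}\varphi)(s) := \int_{-\infty}^{\infty} e^{-ist}\varphi(t)\,dt$ evaluated at $s = 0$:

\[ \int_0^{\infty} f(r)g(r)\,dr = \mathcal{F}\{f^{*}g^{*}\}(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\mathcal{F}\bigl\{f^{*}e^{-M\bullet}\bigr\}(t)\,\mathcal{F}\bigl\{g^{*}e^{M\bullet}\bigr\}(-t)\,dt. \]

Here $\mathcal{F}\{f^{*}e^{-M\bullet}\}(t) = \hat f(it + M)$ and $\mathcal{F}\{g^{*}e^{M\bullet}\}(-t) = \hat g(-(it + M))$ exist due to our assumptions. The first equation in (A.1) follows by Cauchy's integral theorem for the closed contour $\gamma_1 + \gamma_2 - \gamma_3 + \gamma_4$ shown in Figure A.1(c), using the fact that the integrals over $\gamma_2$ and $\gamma_4$ vanish as $R \to \infty$ due to the assumed decay of $\hat f$ and $\hat g$:

\[ \int_0^{\infty} f(r)g(r)\,dr = -\frac{i}{2\pi}\lim_{R \to \infty}\int_{\gamma_1}\hat f(s)\,\hat g(-s)\,ds = -\frac{i}{2\pi}\lim_{R \to \infty}\int_{\gamma_3}\hat f(s)\,\hat g(-s)\,ds. \]

To prove the second equation we use the substitution of variables $s = \varphi_{\kappa_0}(z)$ and the identities $\varphi_{\kappa_0}'(z) = \frac{-2i\kappa_0}{(z-1)^2}$ and $-\varphi_{\kappa_0}(z) = \varphi_{\kappa_0}(\bar z)$ for $z \in S^1$ to obtain

\[ -\frac{i}{2\pi}\lim_{R \to \infty}\int_{\gamma_3}\hat f(s)\,\hat g(-s)\,ds = \frac{-\kappa_0}{\pi}\int_{S^1, \circlearrowright}\frac{\hat f(\varphi_{\kappa_0}(z))}{z - 1}\,\frac{\hat g(-\varphi_{\kappa_0}(z))}{z - 1}\,dz = \frac{-\kappa_0}{\pi}\int_{S^1, \circlearrowright}F(z)\,G(\bar z)\,\frac{\bar z - 1}{z - 1}\,dz. \]

The symbol $\circlearrowright$ indicates clockwise orientation of the contour $S^1$. Since $\frac{\bar z - 1}{z - 1} = \frac{1/z - 1}{z - 1} = -\frac{1}{z}$ for $z \in S^1$, and $dz = -iz\,|dz|$, we obtain the second equation in (A.1).

Lemma A.2. Let $\kappa_0 \in \mathbb{C} \setminus \{0\}$, let $E$ be an open subset of $\{k \in \mathbb{C} : \Re(k/\kappa_0) > 0\}$, and define $V_k(z) := \frac{k - \kappa_0}{(\kappa_0 - k)z + (\kappa_0 + k)}$ for $k \in E$. Then $\mathrm{span}\{V_k : k \in E\}$ is dense in $H^+(S^1)$.

Proof. A straightforward computation shows that $(\mathcal{M}_1^{-1}V_k)(s) = \frac{i(k - \kappa_0)}{\kappa_0}\,\frac{1}{s - ik/\kappa_0}$, with the transform $\mathcal{M}_1$ defined in (2.7) (with $\kappa_0 = 1$, not the $\kappa_0$ given in the lemma).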
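The "straightforward computation" above, and the monomial identity $\mathcal{M}_{\kappa_0}((\bullet - i\kappa_0)^{-j})(z) = (z-1)^{j-1}/(2i\kappa_0)^j$ used in the proof of Lemma 4.4, can be spot-checked numerically. The sketch below assumes that the Möbius transform acts as $(\mathcal{M}_{\kappa_0}f)(z) = f(\varphi_{\kappa_0}(z))/(z-1)$ with $\varphi_{\kappa_0}(z) = i\kappa_0(z+1)/(z-1)$. Definition (2.7) is not reproduced in this part of the paper, so this form is an assumption; it is at least consistent with the identity $\varphi_{\kappa_0}'(z) = -2i\kappa_0/(z-1)^2$ quoted in the proof of Lemma A.1. All function names are hypothetical.

```python
# Sketch: numerical check of two Moebius-transform identities quoted in the text.
# ASSUMPTION: (M_kappa0 f)(z) = f(phi(z)) / (z - 1) with
# phi(z) = i * kappa0 * (z + 1) / (z - 1); definition (2.7) is not shown here.
import cmath

def phi(z, kappa0):
    """Assumed Moebius map from the unit circle to the line kappa0 * R."""
    return 1j * kappa0 * (z + 1) / (z - 1)

def moebius(f, kappa0):
    """Return the function z -> f(phi(z)) / (z - 1)."""
    return lambda z: f(phi(z, kappa0)) / (z - 1)

def check_power_identity(kappa0, power, z):
    """|M((s - i kappa0)^(-power))(z) - (z - 1)^(power-1) / (2 i kappa0)^power|."""
    lhs = moebius(lambda s: (s - 1j * kappa0) ** (-power), kappa0)(z)
    rhs = (z - 1) ** (power - 1) / (2j * kappa0) ** power
    return abs(lhs - rhs)

def check_Vk_identity(kappa0, k, z):
    """|M_1 applied to i(k - kappa0)/kappa0 / (s - i k/kappa0), minus V_k(z)|."""
    f = lambda s: 1j * (k - kappa0) / kappa0 / (s - 1j * k / kappa0)
    lhs = moebius(f, 1.0)(z)          # M_1: the transform with parameter 1
    rhs = (k - kappa0) / ((kappa0 - k) * z + (kappa0 + k))
    return abs(lhs - rhs)

def sample_points(m=7):
    """Points on the unit circle that avoid the singular point z = 1."""
    return [cmath.exp(2j * cmath.pi * (p + 0.5) / m) for p in range(m)]
```

Evaluating both check functions on `sample_points()` for some $\kappa_0$ and $k$ with $\Re(k/\kappa_0) > 0$ should give differences at round-off level if the assumed form of the transform matches (2.7).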
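Since $\mathcal{M}_1 : H^-(\mathbb{R}) \to H^+(S^1)$ is unitary, the statement is equivalent to the density of $Y := \mathrm{span}\{1/(\bullet - ik/\kappa_0) : k \in E\}$ in $H^-(\mathbb{R})$. Assume that $f \in Y^{\perp}$, i.e., $\int_{\mathbb{R}} f(\tilde s)/(\tilde s - \overline{ik/\kappa_0})\,d\tilde s = 0$ for all $k \in E$. Then the holomorphic function

\[ w(z) := \frac{1}{2\pi i}\int_{\mathbb{R}}\frac{f(\tilde s)}{\tilde s - z}\,d\tilde s, \qquad z \in \mathbb{C}^-, \]

vanishes on $\{\overline{ik/\kappa_0} : k \in E\}$, which is an open subset of $\mathbb{C}^-$. Therefore, $w$ vanishes identically in $\mathbb{C}^-$. Due to Definition 2.1 and (2.6), the values of $f$ are the boundary values of $w$ on $\mathbb{R}$, and hence $f = 0$. This shows that $Y^{\perp} = \{0\}$, i.e., $Y$ is dense in $H^-(\mathbb{R})$.

Lemma A.3. Consider the set $X^{\#}$ and the inner product defined in (3.7), and let $\Re(\kappa), \Re(\kappa_0) > 0$.
(1) $X^{\#}$ is a Hilbert space.
(2) For each $v_{\mathrm{int}} \in H^1(\Omega_{\mathrm{int}})$ there exists $V \in H^+(S^1) \otimes L^2(\Gamma)$, such that $(v_{\mathrm{int}}, V)^{\top} \in X^{\#}$.
(3) There exists a dense subset $\hat X^{\#} \subset X^{\#}$, such that for all $(v_{\mathrm{int}}, V)^{\top} \in \hat X^{\#}$ we have $v_{\mathrm{int}} \in C^\infty(\Omega_{\mathrm{int}})$ and there exists a function $v_{\mathrm{ext}} \in C^\infty([0, \infty) \times \Gamma)$ such that $(i\kappa_0)^{-1}(\mathcal{M}_{\kappa_0}^{-1}T_- \otimes I)(v_0, V)^{\top} = \mathcal{L}|_{\kappa_0\mathbb{R}}\,v_{\mathrm{ext}}$ and the assumptions of Lemma A.1 are fulfilled with $f(r) := \exp(i\kappa r)$ and $g(r) := v_{\mathrm{ext}}(r, \hat x)$ for all $\hat x \in \Gamma$ as well as with the first derivatives of $v_{\mathrm{ext}}$.

Proof. (1) A straightforward argument using the closedness of the surface gradient $\nabla_{\hat x}$ shows that $X^{\#}$ is complete.

(2) Let $v_{\mathrm{int}} \in H^1(\Omega_{\mathrm{int}})$ and define $v_0 := v_{\mathrm{int}}|_\Gamma$. Since $v_0 \in H^{1/2}(\Gamma)$, the Fourier coefficients of $v_0$ satisfy $\sum_{n=0}^{\infty}(1 + \lambda_n)^{1/2}|v_{0,n}|^2 < \infty$. Here and in the following we use the notation of section 4. Define $V(z, \hat x) := \sum_{n=0}^{\infty} v_{0,n}V_{k_n}(z)\Phi_n(\hat x)$ with a sequence $(k_n)$ to be specified later. Since the functions $V_k$ in Lemma A.2 satisfy $V_k(z) = \left(\frac{k/\kappa_0 + 1}{k/\kappa_0 - 1} - z\right)^{-1}$, it follows by radial symmetry that $\|V_k\|_{L^2(S^1)}^2 = \Xi\left(\frac{|k/\kappa_0 + 1|}{|k/\kappa_0 - 1|} - 1\right)$, with $\Xi(t) := \int_{S^1}|1 + t - z|^{-2}\,|dz|$ for $t > 0$. Setting $c := \int_{\pi/6}^{11\pi/6}|1 - \exp(i\theta)|^{-2}\,d\theta$, we obtain

\[ \Xi(t) - c \le \int_{-\pi/6}^{\pi/6}\frac{d\theta}{|1 + t - \exp(i\theta)|^2} \le \int_{-\pi/6}^{\pi/6}\frac{d\theta}{t^2 + \theta^2/4} = \frac{4\,\mathrm{atan}(\pi/(12t))}{t} \le \frac{2\pi}{t}, \]

so $\Xi(t) = O(t^{-1})$ as $t \searrow 0$. From the identity $T_-(1, V_k)^{\top} = \frac{\kappa_0}{\kappa_0 - k}V_k$ it follows that

\[ \frac{|k - \kappa_0|^2}{|\kappa_0|^2}\left\|T_-\binom{1}{V_k}\right\|_{L^2(S^1)}^2 = \|V_k\|_{L^2(S^1)}^2 = \Xi\left(\frac{|k/\kappa_0 + 1|}{|k/\kappa_0 - 1|} - 1\right) = O(k) \]

as $\Re(k) \to \infty$. Now choose $k_0$ such that $\Re(k_0/\kappa_0) > 0$ and $k_n := k_0 + \sqrt{\lambda_n}$ for $n = 1, 2, \dots$. Then

\[ \left\|\binom{v_{\mathrm{int}}}{V}\right\|_{X^{\#}}^2 - \|v_{\mathrm{int}}\|_{H^1}^2 = \sum_{n=0}^{\infty}|v_{0,n}|^2\left(\|V_{k_n}\|_{L^2(S^1)}^2 + \lambda_n\left\|\mathcal{J}T_-\binom{1}{V_{k_n}}\right\|_{L^2(S^1)}^2\right) \le C\sum_{n=0}^{\infty}|v_{0,n}|^2\,|k_n|\left(1 + \|\mathcal{J}\|^2\frac{\lambda_n|\kappa_0|^2}{|k_n - \kappa_0|^2}\right) \le C\sum_{n=0}^{\infty}|v_{0,n}|^2\,|k_n| \le C\sum_{n=0}^{\infty}|v_{0,n}|^2(1 + \lambda_n)^{1/2} < \infty, \]

with a generic constant $C$. Hence, $(v_{\mathrm{int}}, V)^{\top} \in X^{\#}$.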
(3) With $V$ as constructed above we have $v_{\mathrm{ext}}(r, \hat x) = \sum_{n=0}^{\infty} v_{0,n}\exp(ik_n r)\Phi_n(\hat x)$ (cf. (2.8), (2.9), (2.15)). If $v_{\mathrm{int}} \in C^\infty(\Omega_{\mathrm{int}})$, then the Fourier coefficients $v_{0,n}$ decay superalgebraically, and the series together with its term-by-term derivatives converges uniformly on compact subsets. Moreover, $r \mapsto e^{i\kappa r}v_{\mathrm{ext}}(r, \hat x)$ decays exponentially if $\Im(k_n + \kappa) = \Im(k_0 + \kappa) > 0$. This can be arranged by an appropriate choice of $k_0$. Hence, Lemma A.1 can be applied to $v_{\mathrm{ext}}(r, \hat x)$ and also to its first derivatives. Since everything above remains valid if $k_n$ is chosen in a small ball around $k_0 + \sqrt{\lambda_n}$, the density property follows from Lemma A.2 and the density of $C^\infty(\Omega_{\mathrm{int}})$ in $H^1(\Omega_{\mathrm{int}})$.

Acknowledgments. The idea to use a transform to the Hardy space $H^+(S^1)$ arose from discussions with Frank Schmidt and his group at Zuse Institut in Berlin within this project.

REFERENCES

[1] A. Böttcher and B. Silbermann, Analysis of Toeplitz Operators, 2nd ed., Springer Monogr. Math., Springer-Verlag, Berlin, 2006.
[2] D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory, 2nd ed., Appl. Math. Sci. 93, Springer-Verlag, Berlin, 1998.
[3] L. Demkowicz and K. Gerdes, Convergence of the infinite element methods for the Helmholtz equation in separable domains, Numer. Math., 79 (1998), pp. 11–42.
[4] L. Demkowicz and F. Ihlenburg, Analysis of a coupled finite-infinite element method for exterior Helmholtz problems, Numer. Math., 88 (2001), pp. 43–73.
[5] P. L. Duren, Theory of H^p Spaces, Pure Appl. Math. 38, Academic Press, New York, 1970.
[6] D. Givoli, High-order nonreflecting boundary conditions without high-order derivatives, J. Comput. Phys., 170 (2001), pp. 849–870.
[7] S. Hein, T. Hohage, W. Koch, and J. Schöberl, Acoustic resonances in high lift configuration, J. Fluid Mech., 582 (2007), pp. 179–202.
[8] T. Hohage, F. Schmidt, and L. Zschiedrich, A new method for the solution of scattering problems, in Proceedings of the JEE'02 Symposium, B. Michielsen and F. Decavèle, eds., ONERA, Toulouse, France, 2002, pp. 251–256.
[9] T. Hohage, F. Schmidt, and L. Zschiedrich, Solving time-harmonic scattering problems based on the pole condition I: Theory, SIAM J. Math. Anal., 35 (2003), pp. 183–210.
[10] T. Hohage, F. Schmidt, and L. Zschiedrich, Solving time-harmonic scattering problems based on the pole condition II: Convergence of the PML method, SIAM J. Math. Anal., 35 (2003), pp. 547–560.
[11] F. Ihlenburg, Finite Element Analysis of Acoustic Scattering, Appl. Math. Sci. 132, Springer-Verlag, New York, 1998.
[12] R. Kress, Linear Integral Equations, 2nd ed., Appl. Math. Sci. 82, Springer-Verlag, New York, 1999.
[13] M. Lenoir, M. Vullierme-Ledard, and C. Hazard, Variational formulations for the determination of resonant states in scattering problems, SIAM J. Math. Anal., 23 (1992), pp. 579–608.
[14] N. Moiseyev, Quantum theory of resonances: Calculating energies, width and cross-sections by complex scaling, Phys. Rep., 302 (1998), pp. 211–293.
[15] L. Nannen, Hardy-Raum Methoden zur numerischen Lösung von Streu- und Resonanzproblemen auf unbeschränkten Gebieten, Ph.D. thesis, University of Göttingen, Tönning, 2008.
[16] D. Ruprecht, A. Schädle, F. Schmidt, and L. Zschiedrich, Transparent boundary conditions for time-dependent problems, SIAM J. Sci. Comput., 30 (2008), pp. 2358–2385.
[17] F. Schmidt, A New Approach to Coupled Interior-Exterior Helmholtz-Type Problems: Theory and Algorithms, habilitation, Freie Universität Berlin, 2002.
[18] F. Schmidt and P. Deuflhard, Discrete transparent boundary conditions for the numerical solution of Fresnel's equation, Comput. Math. Appl., 29 (1995), pp. 53–76.
[19] F. Schmidt, T. Hohage, R. Klose, A. Schädle, and L. Zschiedrich, Pole condition: A numerical method for Helmholtz-type scattering problems with inhomogeneous exterior domain, J. Comput. Appl. Math., 218 (2008), pp. 61–69.
[20] J. Schöberl, Netgen—an advancing front 2d/3d-mesh generator based on abstract rules, Comput. Visual. Sci., 1 (1997), pp. 41–52.
[21] C. Schwab, p- and hp-Finite Element Methods: Theory and Applications in Solid and Fluid Mechanics, Numer. Math. Sci. Comput., The Clarendon Press, Oxford University Press, New York, 1998.
[22] B. Simon, The theory of resonances for dilation analytic potentials and the foundations of time dependent perturbation theory, Ann. Math., 97 (1973), pp. 247–274.
[23] M. Taylor, Partial Differential Equations: Qualitative Studies of Linear Equations, Vol. 2, Springer-Verlag, New York, 1996.
[24] L. Zschiedrich, R. Klose, A. Schädle, and F. Schmidt, A new finite element realization of the perfectly matched layer method for Helmholtz scattering problems on polygonal domains in two dimensions, J. Comput. Appl. Math., 188 (2006), pp. 12–32.
[25] M. Zworski, Resonances in physics and geometry, Notices Amer. Math. Soc., 46 (1999), pp. 319–328.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 997–1018
ACCELERATED LINE-SEARCH AND TRUST-REGION METHODS∗ P.-A. ABSIL† AND K. A. GALLIVAN‡ Abstract. In numerical optimization, line-search and trust-region methods are two important classes of descent schemes, with well-understood global convergence properties. We say that these methods are “accelerated” when the conventional iterate is replaced by any point that produces at least as much of a decrease in the cost function as a fixed fraction of the decrease produced by the conventional iterate. A detailed convergence analysis reveals that global convergence properties of line-search and trust-region methods still hold when the methods are accelerated. The analysis is performed in the general context of optimization on manifolds, of which optimization in $\mathbb{R}^n$ is a particular case. This general convergence analysis sheds new light on the behavior of several existing algorithms. Key words. line search, trust region, subspace acceleration, sequential subspace method, Riemannian manifold, optimization on manifolds, Riemannian optimization, Arnoldi, Jacobi–Davidson, locally optimal block preconditioned conjugate gradient (LOBPCG) AMS subject classifications. 65B99, 65K05, 65J05, 65F15, 90C30 DOI. 10.1137/08072019X
1. Introduction. Let f be a real-valued function defined on a domain M , and let {xk } be a sequence of iterates generated as follows: for every k, some xk+1/2 ∈ M is generated (possibly implicitly) using a descent method that has global convergence to stationary points of f ; then xk+1 is chosen arbitrarily in the sublevel set {x ∈ M : f (x) ≤ f (xk+1/2 )}. We term “acceleration” the fact of choosing xk+1 rather than xk+1/2 as the new iterate. The question addressed in this paper is whether the inclusion of the acceleration step preserves global convergence, i.e., whether {xk } converges to stationary points. We prove that the answer is positive for a wide class of methods. The initial motivation for engaging in this general convergence analysis was to obtain a unifying convergence theory for several well-known eigenvalue algorithms. For example, the Jacobi–Davidson approach [38] is a popular technique for computing an eigenpair (eigenvalue and eigenvector) of a matrix A. It is an iterative method where the computation of the next iterate xk+1 from the current iterate xk can be decomposed into two steps. The Jacobi step consists of solving (usually, approximately) a Newton-like equation to obtain an update vector ηk . Whereas in a classical Newton method the new iterate xk+1 is defined as xk + ηk , the Davidson step uses the update vector ηk to expand a low-dimensional subspace and selects xk+1 as the “best” approximation (in some sense) of the sought eigenvector of A within the subspace. A key to the success of this approach is that the problem of computing xk+1 within the ∗ Received by the editors April 3, 2008; accepted for publication (in revised form) September 16, 2008; published electronically February 13, 2009. This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office. 
The scientific responsibility rests with its authors. This work was supported in part by the US National Science Foundation under grant OCI0324944 and by the School of Computational Science of Florida State University. http://www.siam.org/journals/sinum/47-2/72019.html † Département d'ingénierie mathématique, Université catholique de Louvain, 1348 Louvain-la-Neuve, Belgium ([email protected], http://www.inma.ucl.ac.be/~absil). ‡ Department of Mathematics, Florida State University, Tallahassee, FL 32306-4510 (kgallivan@fsu.edu, http://www.math.fsu.edu/~gallivan).
subspace can be viewed as a reduced-dimensional eigenvalue problem, which can be solved efficiently when the dimension of the subspace is small. In certain situations, notably when xk+1 is chosen as the Ritz vector associated with an extreme Ritz value, the Davidson step can be interpreted as an acceleration step in the sense given above. The reader primarily interested in eigenvalue algorithms can thus think of the purpose of this paper as formulating and analyzing this Jacobi–Davidson concept in the broad context of smooth optimization, i.e., the minimization of a smooth realvalued cost function over a smooth domain. The “Jacobi” step, instead of being restricted to (inexact) Newton methods, is expanded to cover general line-search and trust-region techniques. The “Davidson” step, or acceleration step, is also made more general: any iterate xk+1 is accepted provided that it produces a decrease in the cost function that is at least equal to a prescribed fraction of the decrease produced by the Jacobi update; minimizing the cost function over a subspace that contains the Jacobi update is just one way of achieving this goal. This new analysis, while requiring only rather straightforward modifications of classical proofs found in the optimization literature, is very general and powerful. In particular, our global convergence analysis yields novel global convergence results for some well-known eigenvalue methods. Moreover, the proof technique is less ad hoc than the proofs and derivations usually found in the numerical linear algebra literature, since it simply relies on showing that the methods fit in the broad optimization framework. What we mean by a smooth domain is a (smooth) manifold. Since the work of Gabay [17], there has been a growing interest for the optimization of smooth cost functions defined on manifolds. Major references include [22, 40, 34, 14, 3]. 
These differential-geometric techniques have found applications in various areas, such as signal processing, neural networks, computer vision, and econometrics (see, e.g., [6]). The concept of a manifold generalizes the notion of a smooth surface in a Euclidean space. It can thus be thought of as a natural setting for smooth optimization. Roughly speaking, a manifold is a set that is locally smoothly identified with open subsets of Rd , where d is the dimension of the manifold. When the manifold is given to us as a subset of Rn described by equality constraints, the differential-geometric approach can be viewed as an “informed way” of doing constrained optimization. The resulting algorithms have the property of being feasible (i.e., the iterates satisfy the constraints). In several important cases, however, the manifold is not available as a subset of Rn but rather as a quotient space. Usually, the fundamental reason why the quotient structure appears is in order to take into account an inherent invariance in the problem. Smooth real-valued functions on quotient manifolds lend themselves as well to differentialgeometric optimization techniques. We refer the reader to [6] for a recent overview of this area of research. The reader solely interested in unconstrained optimization in Rn should bear in mind that this situation is merely a particular case of the differential-geometric optimization framework considered here. We frequently mention in the text how unconstrained optimization in Rn is subsumed. Line-search and trust-region methods are two major techniques for unconstrained optimization in Rn (see, e.g., [30]). Line-search techniques were proposed and analyzed on manifolds by several authors; see, e.g., [33, 34, 22, 40, 41, 6]. A trust-region framework, based on a systematic use of the concept of retraction, for optimizing functions defined on abstract Riemannian manifolds was proposed more recently [2, 6, 9]. 
Under reasonable conditions, which hold in particular for smooth cost functions on compact Riemannian manifolds, the trust-region method was shown to converge
to stationary points of the cost function (this is an extension of a well-known result for trust-region methods in Rn ). Furthermore, if the trust-region subproblems are (approximately) solved using a truncated conjugate gradient (CG) method with a well-chosen stopping criterion, then the method converges locally superlinearly to the nondegenerate local minima of the cost function. However, these favorable global and local convergence properties do not yield any information on the number of iterates needed, from a given initial point, to reach the local superlinear regime; and, indeed, problems can be crafted where this number of iterates is prohibitively high. The same can be said about the retraction-based line-search approach considered here. Acceleration techniques can be viewed as a way of improving the speed of convergence of those methods. The acceleration idea is closely related to the subspace expansion concept in Davidson’s method for the eigenvalue problem [12] (see also the more recent results in [38, 16, 15]), but the constraints we impose on the acceleration step are weaker than in Davidson-type algorithms. Our approach is also reminiscent of the sequential subspace method (SSM) of Hager [20, 25]. Whereas the latter uses subspace acceleration for the purpose of approximately solving trust-region subproblems, we use it as an outermost iteration wrapped around line-search and trust-region methods. The sequential subspace optimization algorithm of Narkiss and Zibulevsky [31] fits in the same framework. The paper is organized as follows. In section 2, we define the concept of acceleration. The background in optimization on manifolds is recalled in section 3, with a particular emphasis on the case where the manifold is simply Rn . We show global convergence properties for accelerated line-search (section 4) and trust-region (section 5) methods on Riemannian manifolds (of which the classical Rn is a particular case). 
Section 6 gives a local convergence result. In section 7, these results are exploited to show global convergence properties of subspace acceleration methods. In particular, a conceptually simple accelerated conjugate gradient method, inspired by the work of Knyazev [26] for the symmetric eigenvalue problem, is proposed, and its global convergence is analyzed. Applications are mentioned in section 8, and conclusions are drawn in section 9. A preliminary version of this paper appeared in the technical report [4], where the retraction-based line-search scheme and the acceleration concept were introduced.

2. Accelerated optimization methods. In this section, we define the concept of acceleration and briefly discuss acceleration strategies. An important acceleration technique, which consists of minimizing the cost function over an adequately chosen subspace, will be further discussed in section 7.

2.1. Definition. Let $f$ be a cost function defined on an optimization domain $M$. Given a current iterate $x_k \in M$, line-search and trust-region methods generate a new iterate in $M$; call it $x_{k+1/2}$. Accelerating the method consists of picking a new iterate $x_{k+1} \in M$ that produces at least as much of a decrease in the cost function as a fixed fraction of the decrease produced by $x_{k+1/2}$. In other words, $x_{k+1}$ must satisfy

(1) $f(x_k) - f(x_{k+1}) \ge c\,\bigl(f(x_k) - f(x_{k+1/2})\bigr)$

for some constant $c > 0$ independent of $k$.

2.2. Acceleration strategies. This relaxation on the choice of the new iterate introduces leeway for exploiting information that may improve the behavior of the method. For example, $x_{k+1}$ can be determined by minimizing $f$ over some well-chosen subset of the domain $M$, built using information gained over the iterations. This idea is developed in section 7. Moreover, a wide variety of “hybrid” optimization methods fit in the framework of (1). For example, let A be a line-search or trust-region algorithm, and let B be any descent method. If, for all $k$, $x_{k+1/2}$ is obtained from $x_k$ by A and $x_{k+1}$ is obtained from $x_{k+1/2}$ by B, then the sequence $\{x_k\}$ is generated by an accelerated line-search or trust-region algorithm. Likewise, for all $k$, let $x_{k+1/2}$ be obtained from $x_k$ by A, let $\tilde x_{k+1/2}$ be obtained from $x_k$ by B, and let $x_{k+1} = x_{k+1/2}$ if $f(x_{k+1/2}) \le f(\tilde x_{k+1/2})$ and $x_{k+1} = \tilde x_{k+1/2}$ otherwise; then the sequence $\{x_k\}$ is again generated by an accelerated line-search or trust-region method. Note that, until we reach section 7 on subspace acceleration, we make no assumption other than (1) on how $x_{k+1}$ is chosen from $x_{k+1/2}$. We also point out that values of $c$ in the open interval $(0, 1)$ do not correspond to acceleration in the intuitive sense of the term since $f(x_{k+1})$ is possibly greater than $f(x_{k+1/2})$. Actually, all practical accelerated methods considered in section 8 satisfy (1) with $c = 1$. However, we consider the general case $c > 0$ because it may be useful in some situations and the global convergence analysis for $c > 0$ is not significantly more complicated than for $c = 1$.

3. Preliminaries on Euclidean and Riemannian optimization. In this paper, we assume that the optimization domain $M$ is a (finite-dimensional) Riemannian manifold. The particularization to unconstrained optimization in $\mathbb{R}^n$ is made explicit whenever we feel that it improves readability. Loosely speaking, a manifold is a topological set covered by mutually compatible local parameterizations. We refer, e.g., to [13, 6] for details. An important class of manifolds consists of those subsets of $\mathbb{R}^n$ with a tangent space of constant dimension defined at each point (simple examples are spheres and $\mathbb{R}^n$ itself).
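Returning to section 2, the acceleration condition (1) and the second hybrid strategy of section 2.2 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `satisfies_acceleration` and `hybrid_step`, the toy cost, and the two step rules standing in for methods A and B are hypothetical and do not come from the paper.

```python
import numpy as np

def satisfies_acceleration(f, x_k, x_half, x_next, c=1.0):
    """Check condition (1): f(x_k) - f(x_next) >= c * (f(x_k) - f(x_half))."""
    return f(x_k) - f(x_next) >= c * (f(x_k) - f(x_half))

def hybrid_step(f, x_k, step_a, step_b):
    """Hybrid strategy: run A and B from x_k and keep the candidate with lower cost."""
    x_half = step_a(x_k)      # conventional iterate x_{k+1/2} from method A
    x_tilde = step_b(x_k)     # competing iterate from descent method B
    return x_half if f(x_half) <= f(x_tilde) else x_tilde

# Toy data (assumed for illustration): f(x) = 0.5 * ||x||^2 on R^2.
f = lambda x: 0.5 * float(x @ x)
step_a = lambda x: x - 0.1 * x    # short gradient step, plays the role of A
step_b = lambda x: x - 1.0 * x    # full gradient step, plays the role of B
x_k = np.array([1.0, -2.0])
x_next = hybrid_step(f, x_k, step_a, step_b)
print(satisfies_acceleration(f, x_k, step_a(x_k), x_next, c=1.0))
```

Because the hybrid rule never returns a point with higher cost than $x_{k+1/2}$, it satisfies (1) with $c = 1$ by construction.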
If the tangent spaces $T_xM$ are equipped with an inner product $\langle\cdot,\cdot\rangle_x$ that varies smoothly with $x$, then the manifold is called Riemannian. In this paper, we consider the problem of minimizing a real function $f$ (the cost function) defined on a Riemannian manifold $M$. Classical unconstrained optimization in $\mathbb{R}^n$ corresponds to the case $M = \mathbb{R}^n$. The tangent space to $\mathbb{R}^n$ at any point $x \in \mathbb{R}^n$ is canonically identified with $\mathbb{R}^n$ itself: $T_x\mathbb{R}^n \simeq \mathbb{R}^n$. The canonical Riemannian structure on $\mathbb{R}^n$ is its usual Euclidean vector space structure, where the inner product at $x \in \mathbb{R}^n$ is defined by $\langle\xi,\zeta\rangle := \xi^T\zeta$ for all $\xi, \zeta \in T_x\mathbb{R}^n \simeq \mathbb{R}^n$. The major problem to overcome is that manifolds are in general not flat, so that the sum of two elements of $M$ or their multiplication by scalars is not defined. A remedy advocated in [2] is to locally “flatten” the manifold onto the tangent space $T_{x_k}M$ at the current iterate $x_k$. This is done by means of a retraction, a concept proposed by Shub [32, 3].

Definition 3.1 (retraction). A retraction on a manifold $M$ is a mapping $R$ from the tangent bundle $TM$ into $M$ with the following properties (let $R_x$ denote the restriction of $R$ to $T_xM$):
1. $R$ is continuously differentiable.
2. $R_x(\xi) = x$ if and only if $\xi = 0_x$, the zero element of $T_xM$.
3. $DR_x(0_x) = \mathrm{id}_{T_xM}$, where $DR_x(0_x)$ denotes the differential of $R_x(\cdot)$ at $0_x$ and $\mathrm{id}_{T_xM}$ denotes the identity mapping on $T_xM$, with the canonical identification $T_{0_x}(T_xM) \simeq T_xM$.
Instead of the third condition, it is equivalent to require that $\frac{d}{dt} R_x(t\xi_x)\big|_{t=0} = \xi_x$ for all $\xi_x \in T_xM$.
We do not necessarily assume that $R$ is defined on the whole tangent bundle $TM$, but we make the blanket assumption that its evaluation never fails in the algorithms. Note that the third condition implies that $R_x$ is defined on a neighborhood of the origin of $T_xM$ for all $x \in M$; this guarantees that, given $\eta_x \in T_xM$, $R_x(t\eta_x)$ is well-defined at least on some nonempty interval $-\epsilon < t < \epsilon$. On a Riemannian manifold, it is always possible to choose the retraction $R$ as the exponential mapping (which is defined everywhere when the manifold is complete). Using the exponential, however, may not be computationally sensible. The concept of retraction gives the possibility of choosing more efficient substitutes (see [3, 6]). Given a cost function $f$ on a manifold $M$ equipped with a retraction $R$, we define the lifted cost function at $x \in M$ as

(2) $\hat f_x : T_xM \to \mathbb{R} : \xi \mapsto f(R_x(\xi))$.
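To make the retraction and the lifted cost function (2) concrete, here is a minimal sketch on the unit sphere in $\mathbb{R}^3$. The choice of manifold, the normalization retraction $R_x(\xi) = (x+\xi)/\|x+\xi\|$, and the names `project_to_tangent`, `retract_sphere`, and `lifted_cost` are assumptions made for illustration; the text above does not prescribe them.

```python
import numpy as np

def project_to_tangent(x, v):
    """Orthogonal projection of v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def retract_sphere(x, xi):
    """Normalization retraction on the unit sphere: R_x(xi) = (x + xi) / ||x + xi||."""
    y = x + xi
    return y / np.linalg.norm(y)

def lifted_cost(f, x):
    """Lifted cost (2): the map xi -> f(R_x(xi)) defined on the tangent space at x."""
    return lambda xi: f(retract_sphere(x, xi))

# Illustrative data: a quadratic form restricted to the sphere.
A = np.diag([1.0, 2.0, 3.0])
f = lambda y: float(y @ A @ y)
x = np.array([0.0, 0.0, 1.0])
xi = project_to_tangent(x, np.array([0.3, -0.2, 0.5]))

# Property 2 of Definition 3.1: the zero tangent vector is mapped to x itself.
print(np.allclose(retract_sphere(x, np.zeros(3)), x))
# Property 3, checked by a finite difference: d/dt R_x(t xi) at t = 0 equals xi.
t = 1e-6
print(np.allclose((retract_sphere(x, t * xi) - x) / t, xi, atol=1e-5))
```

Any line-search or trust-region routine written for functions on a vector space can then be applied to `lifted_cost(f, x)`, which is the role the lifted cost plays in the rest of the paper.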
When $M = \mathbb{R}^n$, the natural retraction is given by

(3) $R_x(\xi) := x + \xi$,

and $\hat f_x$ satisfies $\hat f_x(\xi) = f(x + \xi)$ for all $x \in \mathbb{R}^n$ and all $\xi \in T_x\mathbb{R}^n \simeq \mathbb{R}^n$. Given a current iterate $x_k$ on $M$, any line-search or trust-region method applied to $\hat f_{x_k}$ produces a vector $\eta_k$ in $T_{x_k}M$. In a line-search method, $\eta_k$ is used as a search direction: a point is sought on the curve $t \mapsto R_{x_k}(t\eta_k)$ that satisfies some conditions on the cost function (e.g., a line minimizer or the Armijo condition). In a trust-region method [2], $\eta_k$ defines a proposed new iterate $R_{x_k}(\eta_k)$. In both cases, the optimization method yields a proposed new iterate $x_{k+1/2}$ in $M$. Below we study the convergence properties of such schemes when they are accelerated in the sense of (1).

4. Accelerated line-search methods. Line-search methods (without acceleration) on a manifold $M$ endowed with a retraction $R$ are based on the update formula $x_{k+1} = R_{x_k}(t_k\eta_k)$, where $\eta_k$ is in $T_{x_k}M$ and $t_k$ is a scalar. The two issues are to select the search direction $\eta_k$ and then the step length $t_k$. To obtain global convergence results, some restrictions have to be imposed on $\eta_k$ and $t_k$. The following definition concerning $\eta_k$ is adapted from [10].

Definition 4.1 (gradient-related). A sequence $\{\eta_k\}$, $\eta_k \in T_{x_k}M$, is gradient-related if, for any subsequence $\{x_k\}_{k\in K}$ in $M$ that converges to a nonstationary point, the corresponding subsequence $\{\eta_k\}_{k\in K}$ is bounded and satisfies

$\limsup_{k\to\infty,\ k\in K} \langle \operatorname{grad} f(x_k), \eta_k\rangle_{x_k} < 0$.

When $M = \mathbb{R}^n$ with its canonical Euclidean structure, we have $\operatorname{grad} f(x) = \bigl(\partial_1 f(x)\ \cdots\ \partial_n f(x)\bigr)^T$ and $\langle \operatorname{grad} f(x), \eta\rangle = \eta^T \operatorname{grad} f(x)$, where we used the canonical identification $T_x\mathbb{R}^n \simeq \mathbb{R}^n$. (One must bear in mind that when we use the identification $T_x\mathbb{R}^n \simeq \mathbb{R}^n$, we lose the information on the foot $x$ of the tangent vector. In order to specify the foot, we say that $\{\eta_k\} \subseteq \mathbb{R}^n$ is gradient-related to $\{x_k\}$.) There is a relation between the gradient relatedness of $\{\eta_k\}$ and the angle between $\eta_k$ and the steepest-descent direction. Let

$\angle(-\operatorname{grad} f(x_k), \eta_k) = \arccos \dfrac{\langle -\operatorname{grad} f(x_k), \eta_k\rangle_{x_k}}{\|\operatorname{grad} f(x_k)\|_{x_k}\, \|\eta_k\|_{x_k}}$

denote the angle between $\eta_k$ and the steepest-descent direction $-\operatorname{grad} f(x_k)$. Let $\{\eta_k\}$ be such that $c_1 \le \|\eta_k\|_{x_k} \le c_2$ for some $0 < c_1 < c_2 < \infty$ and all $k$. Then the condition $\angle(-\operatorname{grad} f(x_k), \eta_k) \le \theta$ for some fixed $\theta < \frac{\pi}{2}$ and all $k$ is sufficient for the
sequence $\{\eta_k\}$ to be gradient-related to $\{x_k\}$. In particular, assume that $\eta_k$ is obtained by solving a linear system $A_k\eta_k = -\operatorname{grad} f(x_k)$, where $A_k$ is a linear symmetric positive-definite transformation of $T_{x_k}M$. Then $\cos\angle(-\operatorname{grad} f(x_k), \eta_k) \ge \kappa^{-1}(A_k)$, where $\kappa(A_k)$ denotes the condition number of $A_k$. Hence if the smallest eigenvalue of $A_k$ is bounded away from zero and the largest eigenvalue of $A_k$ is bounded, then $\{\eta_k\}$ is bounded away from zero and infinity and the condition number of $A_k$ is bounded, and thus $\{\eta_k\}$ is gradient-related. (Note that the condition that the linear operator $A : T_xM \to T_xM$ is symmetric positive-definite means that $\langle u, Av\rangle_x = \langle Au, v\rangle_x$ for all $u, v \in T_xM$, and $\langle u, Au\rangle_x > 0$ for all nonzero $u \in T_xM$. In the case of $\mathbb{R}^n$ endowed with its canonical inner product, this corresponds to the classical definitions of symmetry and positive definiteness for the matrix representing the operator $A$.) The next definition, related to the choice of the step length $t_k$, relies on Armijo’s backtracking procedure [7] (or see [10]) to find a point at which there is sufficient decrease of the cost function.

Definition 4.2 (Armijo point). Given a differentiable cost function $f$ on a Riemannian manifold $M$ with retraction $R$, a point $x \in M$, a nonzero descent vector $\eta \in T_xM$ (i.e., $\langle \operatorname{grad} f(x), \eta\rangle_x < 0$), a scalar $\alpha > 0$ such that the segment $[0, \alpha]\eta \subseteq T_xM$ is included in the domain of $R$, and scalars $\beta \in (0, 1)$ and $\sigma \in (0, 1)$, the Armijo vector is defined as $\eta^A = \beta^m\alpha\eta$, where $m$ is the first nonnegative integer such that

(4) $f(x) - f(R_x(\beta^m\alpha\eta)) \ge -\sigma \langle \operatorname{grad} f(x), \beta^m\alpha\eta\rangle_x$.

The Armijo point is $R_x(\beta^m\alpha\eta) \in M$. It can be shown, using the classical Armijo theory for the lifted cost function $\hat f_x$, that there is always an $m$ such that (4) holds, and hence the definition is legitimate. A similar definition was proposed in [41] for the particular case where the retraction is the exponential mapping. When $M = \mathbb{R}^n$ with its canonical Euclidean structure, the definition reduces to the classical situation described, e.g., in [10]. We propose the following accelerated Riemannian line-search algorithm.

Algorithm 1. Accelerated Line Search (ALS)
Require: Riemannian manifold $M$; continuously differentiable scalar field $f$ on $M$; retraction $R$ from $TM$ to $M$ as in Definition 3.1; scalars $\alpha > 0$, $c, \beta, \sigma \in (0, 1)$.
Input: Initial iterate $x_0 \in M$.
Output: Sequence of iterates $\{x_k\} \subseteq M$ and search directions $\{\eta_k\} \subseteq TM$.
1: for $k = 0, 1, 2, \ldots$ do
2:   Pick a descent vector $\eta_k$ in $T_{x_k}M$ such that $t\eta_k$ is in the domain of $R$ for all $t \in [0, \alpha]$.
3:   Select $x_{k+1} \in M$ such that
       (5) $f(x_k) - f(x_{k+1}) \ge c\,\bigl(f(x_k) - f(R_{x_k}(\eta^A))\bigr)$,
     where $\eta^A$ is the Armijo vector (Definition 4.2 with $x := x_k$ and $\eta := \eta_k$).
4: end for

Observe that Algorithm 1, as well as most other algorithms in this paper, describes a class of numerical algorithms; one could call it an algorithm model. The purpose of this analysis paper is to give (strong) convergence results for (broad) classes of algorithms. For Algorithm 1, we have the following convergence result, whose proof closely follows [10, Proposition 1.2.1]. The result is, however, more general in three
aspects. (1) Even when the optimization domain is $\mathbb{R}^n$, the line search is not necessarily done on a straight line, because the choice of the retraction is not restricted to the natural retraction (3) in $\mathbb{R}^n$. (2) Even in the case of $\mathbb{R}^n$, points other than the Armijo point can be selected, as long as they satisfy the acceleration condition (5). (3) Finally, the optimization domain can be any Riemannian manifold.

Theorem 4.3. Let $\{x_k\}$ be an infinite sequence of iterates generated by Algorithm 1 (ALS), and assume that the generated sequence $\{\eta_k\}$ of search directions is gradient-related (Definition 4.1). Then every limit point of $\{x_k\}$ is a stationary point of $f$.

Proof. The proof is by contradiction. Suppose that there is a subsequence $\{x_k\}_{k\in K}$ converging to some $x_*$ with $\operatorname{grad} f(x_*) \ne 0$. Since $\{f(x_k)\}$ is nonincreasing, it follows that $\{f(x_k)\}$ converges to $f(x_*)$. Hence $f(x_k) - f(x_{k+1})$ goes to zero. By the construction of the algorithm, $f(x_k) - f(x_{k+1}) \ge -c\sigma\alpha_k \langle \operatorname{grad} f(x_k), \eta_k\rangle_{x_k}$, where $\alpha_k\eta_k$ is the Armijo vector. Since $\{\eta_k\}$ is gradient-related, it follows that $\{\alpha_k\}_{k\in K} \to 0$. It follows that for all $k \in K$ greater than some $\bar k$, $\alpha_k < \alpha$, which means that $\alpha_k = \beta^m\alpha$ for some $m \ge 1$, which implies that the previously tried step size $\beta^{m-1}\alpha = \alpha_k/\beta$ did not satisfy the Armijo condition. In other words,

$f(x_k) - f\bigl(R_{x_k}((\alpha_k/\beta)\eta_k)\bigr) < -\sigma(\alpha_k/\beta) \langle \operatorname{grad} f(x_k), \eta_k\rangle_{x_k} \quad \forall k \in K,\ k \ge \bar k$.

Denoting

(6) $\tilde\eta_k = \dfrac{\eta_k}{\|\eta_k\|}$ and $\tilde\alpha_k = \dfrac{\alpha_k\|\eta_k\|}{\beta}$,

the inequality above reads

$\dfrac{\hat f_{x_k}(0) - \hat f_{x_k}(\tilde\alpha_k\tilde\eta_k)}{\tilde\alpha_k} < -\sigma \langle \operatorname{grad} f(x_k), \tilde\eta_k\rangle_{x_k} \quad \forall k \in K,\ k \ge \bar k$,

where $\hat f$ is defined as in (2). The mean value theorem yields

(7) $-\langle \operatorname{grad} \hat f_{x_k}(t\tilde\eta_k), \tilde\eta_k\rangle_{x_k} < -\sigma \langle \operatorname{grad} f(x_k), \tilde\eta_k\rangle_{x_k} \quad \forall k \in K,\ k \ge \bar k$,

where $t$ is in the interval $[0, \tilde\alpha_k]$. Since $\{\alpha_k\}_{k\in K} \to 0$ and since $\eta_k$ is gradient-related, hence bounded, it follows that $\{\tilde\alpha_k\}_{k\in K} \to 0$. Moreover, since $\tilde\eta_k$ has unit norm and its foot $x_k$ converges on the index set $K$, it follows that $\{\tilde\eta_k\}_{k\in K}$ is included in some compact subset of the tangent bundle $TM$, and therefore there exists an index set $\tilde K \subseteq K$ such that $\{\tilde\eta_k\}_{k\in\tilde K} \to \tilde\eta_*$ for some $\tilde\eta_* \in T_{x_*}M$ with $\|\tilde\eta_*\| = 1$. We now take the limit in (7) over $\tilde K$. Since the Riemannian metric is continuous (by definition), $f \in C^1$, and $\operatorname{grad} \hat f_{x_k}(0) = \operatorname{grad} f(x_k)$ (because of point 3 in Definition 3.1, see [6, equation (4.4)]), we obtain $-\langle \operatorname{grad} f(x_*), \tilde\eta_*\rangle_{x_*} \le -\sigma \langle \operatorname{grad} f(x_*), \tilde\eta_*\rangle_{x_*}$. Since $0 < \sigma < 1$, it follows that $\langle \operatorname{grad} f(x_*), \tilde\eta_*\rangle_{x_*} \ge 0$. On the other hand, from the fact that $\{\eta_k\}$ is gradient-related, one obtains that $\langle \operatorname{grad} f(x_*), \tilde\eta_*\rangle_{x_*} < 0$, a contradiction.
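Algorithm 1 (ALS) together with the Armijo vector of Definition 4.2 can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that points and tangent vectors are NumPy arrays, that the inner product is the Euclidean one (as for $\mathbb{R}^n$ or a submanifold with the induced metric), and that the search direction is the steepest-descent direction. The names `armijo_vector`, `accelerated_line_search`, and `try_longer_step` are hypothetical.

```python
import numpy as np

def armijo_vector(f, retract, x, grad, eta, alpha=1.0, beta=0.5, sigma=0.5,
                  max_backtracks=60):
    """Return beta^m * alpha * eta for the first m >= 0 satisfying condition (4)."""
    slope = float(np.dot(grad, eta))        # <grad f(x), eta>, negative for descent
    if slope >= 0.0:
        raise ValueError("eta is not a descent vector")
    fx, step = f(x), alpha
    for _ in range(max_backtracks):
        if fx - f(retract(x, step * eta)) >= -sigma * step * slope:
            return step * eta
        step *= beta
    raise RuntimeError("no Armijo step found within max_backtracks")

def accelerated_line_search(f, grad_f, retract, x0, accelerate=None, c=1.0,
                            iters=1000, tol=1e-6, **armijo_kw):
    """Sketch of Algorithm 1 with steepest-descent directions (an assumption)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:
            break
        x_half = retract(x, armijo_vector(f, retract, x, g, -g, **armijo_kw))
        x_next = accelerate(x, x_half) if accelerate is not None else x_half
        # Enforce the acceleration condition (5); fall back to the Armijo point.
        if f(x) - f(x_next) < c * (f(x) - f(x_half)):
            x_next = x_half
        x = x_next
    return x

# Example in R^2 with the natural retraction (3) and a quadratic cost.
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * float(x @ A @ x) - float(b @ x)
grad_f = lambda x: A @ x - b
retract = lambda x, xi: x + xi
try_longer_step = lambda x, x_half: x + 2.0 * (x_half - x)  # hypothetical candidate

x_star = accelerated_line_search(f, grad_f, retract, np.zeros(2),
                                 accelerate=try_longer_step)
print(x_star)
```

The `accelerate` hook may return any point. The safeguard accepts it only when (5) holds, which is the only property the convergence analysis of Theorem 4.3 uses.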
More can be said under compactness assumptions, using a standard topological argument. (The purpose of the compactness assumption is to ensure that every subsequence of $\{x_k\}$ has at least one limit point.)

Corollary 4.4. Let $\{x_k\}$ be an infinite sequence of iterates generated by Algorithm 1 (ALS), and assume that the generated sequence $\{\eta_k\}$ of search directions is gradient-related (Definition 4.1). Assume that there is a compact set $C$ such that $\{x_k\} \subseteq C$. (This assumption holds in particular when the sublevel set $L = \{x \in M : f(x) \le f(x_0)\}$ is compact: the iterates all belong to the sublevel set since $\{f(x_k)\}$ is nonincreasing. It also holds when $M$ itself is compact.) Then $\lim_{k\to\infty} \|\operatorname{grad} f(x_k)\| = 0$.

Proof. The proof is by contradiction. Assume the contrary; i.e., there is a subsequence $\{x_k\}_{k\in K}$ and $\epsilon > 0$ such that $\|\operatorname{grad} f(x_k)\| > \epsilon$ for all $k \in K$. Since $\{x_k\} \subseteq C$, with $C$ compact, it follows that $\{x_k\}_{k\in K}$ has a limit point $x_*$ in $C$ (Bolzano–Weierstrass theorem). By continuity of $\operatorname{grad} f$, one has $\|\operatorname{grad} f(x_*)\| \ge \epsilon$, i.e., $x_*$ is not stationary, a contradiction with Theorem 4.3.

5. Accelerated trust-region algorithm. We first briefly recall the basics of the Riemannian trust-region scheme (RTR) proposed in [2]. Let $M$ be a Riemannian manifold with retraction $R$. Given a cost function $f : M \to \mathbb{R}$ and a current iterate $x_k \in M$, we use $R_{x_k}$ to locally map the minimization problem for $f$ on $M$ into a minimization problem for the cost function $\hat f_{x_k}$ defined as in (2). The Riemannian metric $g$ turns $T_{x_k}M$ into a Euclidean space endowed with the inner product $g_{x_k}(\cdot,\cdot)$, which makes it possible to consider the following trust-region subproblem in the Euclidean space $T_{x_k}M$:

(8a) $\min_{\eta \in T_{x_k}M} m_{x_k}(\eta)$ subject to $\langle\eta,\eta\rangle_{x_k} \le \Delta_k^2$,

where

(8b) $m_{x_k}(\eta) \equiv f(x_k) + \langle \operatorname{grad} f(x_k), \eta\rangle_{x_k} + \frac{1}{2}\langle H_{x_k}\eta, \eta\rangle_{x_k}$,

$\Delta_k$ is the trust-region radius, and $H_{x_k} : T_{x_k}M \to T_{x_k}M$ is some symmetric linear operator, i.e., $\langle H_{x_k}\xi, \chi\rangle_{x_k} = \langle \xi, H_{x_k}\chi\rangle_{x_k}$, $\xi, \chi \in T_{x_k}M$. Note that $m_{x_k}$ need not be the exact quadratic Taylor expansion of $\hat f_{x_k}$ about zero, since $H_k$ is freely chosen. Next, an approximate solution $\eta_k$ to the trust-region subproblem (8) is produced. For the purpose of obtaining global convergence results, $\eta_k$ need not be the exact solution provided it produces a sufficient decrease of the model, as specified later. The decision to accept or not the candidate $R_{x_k}(\eta_k)$ and to update the trust-region radius is based on the quotient

(9) $\rho_k = \dfrac{\hat f_{x_k}(0_{x_k}) - \hat f_{x_k}(\eta_k)}{m_{x_k}(0_{x_k}) - m_{x_k}(\eta_k)} = \dfrac{f(x_k) - f(R_{x_k}(\eta_k))}{m_{x_k}(0_{x_k}) - m_{x_k}(\eta_k)}$

measuring the agreement between the model decrease and the function decrease at the proposed iterate. The following algorithm differs from the RTR algorithm of [2] only below the line “if $\rho_k > \rho'$.” (The specific rules for accepting the proposed new iterate and updating the trust-region radius come from [30]; they form a particular instance of the rules given in [11].)

Algorithm 2. Accelerated Trust Region (ATR)
Require: Riemannian manifold $M$; scalar field $f$ on $M$; retraction $R$ from $TM$ to $M$ as in Definition 3.1.
Parameters: $\bar\Delta > 0$, $\Delta_0 \in (0, \bar\Delta)$, and $\rho' \in [0, \frac{1}{4})$, $c \in (0, 1)$, $c_1 > 0$.
Input: Initial iterate $x_0 \in M$.
Output: Sequence of iterates $\{x_k\}$.
1: for $k = 0, 1, 2, \ldots$ do
2:   Obtain $\eta_k$ by (approximately) solving (8).
3:   Evaluate $\rho_k$ from (9);
4:   if $\rho_k < \frac{1}{4}$ then
5:     $\Delta_{k+1} = \frac{1}{4}\Delta_k$
6:   else if $\rho_k > \frac{3}{4}$ and $\|\eta_k\| = \Delta_k$ then
7:     $\Delta_{k+1} = \min(2\Delta_k, \bar\Delta)$
8:   else
9:     $\Delta_{k+1} = \Delta_k$;
10:  end if
11:  if $\rho_k > \rho'$ then
12:    Select $x_{k+1} \in M$ such that
         (10) $f(x_k) - f(x_{k+1}) \ge c\,\bigl(f(x_k) - f(R_{x_k}(\eta_k))\bigr)$;
13:  else
14:    Select $x_{k+1} \in M$ such that
         (11) $f(x_k) - f(x_{k+1}) \ge 0$;
15:  end if
16: end for

Next, we study the global convergence of Algorithm 2. We show that, under some assumptions on the cost function, the model, and the quality of $\eta_k$, it holds that the gradient of the cost function goes to zero at least on a subsequence of $\{x_k\}$. This is done by slightly modifying the corresponding development given in [2] to take acceleration into account. We need the following definition.

Definition 5.1 (radially L-$C^1$ function). Let $\hat f : TM \to \mathbb{R}$ be as in (2). We say that $\hat f$ is radially Lipschitz continuously differentiable if there exist reals $\beta_{RL} > 0$ and $\delta_{RL} > 0$ such that, for all $x \in M$, for all $\xi \in T_xM$ with $\|\xi\| = 1$, and for all $t < \delta_{RL}$, it holds that

(12) $\left| \dfrac{d}{d\tau}\hat f_x(\tau\xi)\Big|_{\tau=t} - \dfrac{d}{d\tau}\hat f_x(\tau\xi)\Big|_{\tau=0} \right| \le \beta_{RL}\, t$.

For the purpose of Algorithm 2, which is a descent algorithm, this condition needs only to be imposed in the level set

(13) $\{x \in M : f(x) \le f(x_0)\}$.
We also require the approximate solution ηk of the trust-region subproblem (8) to produce a sufficient decrease in the model. More precisely, ηk must produce at least as much of a decrease in the model function as a fixed fraction of the so-called Cauchy decrease; see [30, section 4.3]. Since the trust-region subproblem (8) is expressed on
a Euclidean space, the definition of the Cauchy point is adapted from $\mathbb{R}^n$ without difficulty, and the bound

(14) $m_k(0) - m_k(\eta_k) \ge c_1 \|\operatorname{grad} f(x_k)\| \min\!\left(\Delta_k, \dfrac{\|\operatorname{grad} f(x_k)\|}{\|H_k\|}\right)$,

for some constant $c_1 > 0$, is readily obtained from the $\mathbb{R}^n$ case, where $\|H_k\|$ is defined as

(15) $\|H_k\| := \sup\{\|H_k\zeta\| : \zeta \in T_{x_k}M,\ \|\zeta\| = 1\}$.

In particular, the Steihaug–Toint truncated CG method (see, e.g., [37, 30, 11]) satisfies this bound (with $c_1 = \frac{1}{2}$, see [30, Lemma 4.5]) since it first computes the Cauchy point and then attempts to improve the model decrease. With these things in place, we can state and prove the following global convergence result.

Theorem 5.2. Let $\{x_k\}$ be a sequence of iterates generated by Algorithm 2 (ATR) with $\rho' \in [0, \frac{1}{4})$. Suppose that $f$ is $C^1$ and bounded below on the level set (13), that $\hat f$ is radially L-$C^1$ (Definition 5.1), and that $\|H_k\| \le \beta$ for some constant $\beta$. Further suppose that all approximate solutions $\eta_k$ of (8) satisfy the Cauchy decrease inequality (14) for some positive constant $c_1$. We then have

$\liminf_{k\to\infty} \|\operatorname{grad} f(x_k)\| = 0$.
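Proof. Here is a brief outline of the proof for the reader’s convenience. We will assume for contradiction that the norm of the gradient is bounded away from zero. Then a key to reaching a contradiction is that the trust-region radius does not shrink to zero (21). This is ensured by showing that $\rho_k$ is at least $\frac{1}{2}$ whenever $\Delta_k$ is smaller than a global value (20). This result itself is obtained by imposing that the discrepancy between the model and the cost function is uniformly quadratic (17) and that the denominator of $\rho_k$ is bounded below by a ramp function of $\Delta_k$ (14). We now turn to the detailed proof. First, we perform some manipulation of $\rho_k$ from (9):

$|\rho_k - 1| = \left| \dfrac{\bigl(f(x_k) - \hat f_{x_k}(\eta_k)\bigr) - \bigl(m_k(0) - m_k(\eta_k)\bigr)}{m_k(0) - m_k(\eta_k)} \right|$

(16) $\phantom{|\rho_k - 1|} = \left| \dfrac{m_k(\eta_k) - \hat f_{x_k}(\eta_k)}{m_k(0) - m_k(\eta_k)} \right|$.

Direct manipulations on the function $t \mapsto \hat f_{x_k}\bigl(t\,\tfrac{\eta_k}{\|\eta_k\|}\bigr)$ yield

$\hat f_{x_k}(\eta_k) = \hat f_{x_k}(0_{x_k}) + \|\eta_k\|\, \dfrac{d}{d\tau}\hat f_{x_k}\!\left(\tau\tfrac{\eta_k}{\|\eta_k\|}\right)\Big|_{\tau=0} + \displaystyle\int_0^{\|\eta_k\|} \left( \dfrac{d}{d\tau}\hat f_{x_k}\!\left(\tau\tfrac{\eta_k}{\|\eta_k\|}\right)\Big|_{\tau=t} - \dfrac{d}{d\tau}\hat f_{x_k}\!\left(\tau\tfrac{\eta_k}{\|\eta_k\|}\right)\Big|_{\tau=0} \right) dt$

$\phantom{\hat f_{x_k}(\eta_k)} = f(x_k) + \langle \operatorname{grad} f(x_k), \eta_k\rangle_{x_k} + \epsilon'$,

where $|\epsilon'| \le \int_0^{\|\eta_k\|} \beta_{RL}\, t\, dt = \frac{1}{2}\beta_{RL}\|\eta_k\|^2$ whenever $\|\eta_k\| < \delta_{RL}$, and $\beta_{RL}$ and $\delta_{RL}$ are the constants in the radially L-$C^1$ property (12). Therefore, it follows from the definition (8b) of $m_k$ that

$\bigl| m_k(\eta_k) - \hat f_{x_k}(\eta_k) \bigr| = \left| \tfrac{1}{2}\langle H_{x_k}\eta_k, \eta_k\rangle_{x_k} - \epsilon' \right|$

(17) $\phantom{\bigl| m_k(\eta_k) - \hat f_{x_k}(\eta_k) \bigr|} \le \tfrac{1}{2}\beta\|\eta_k\|^2 + \tfrac{1}{2}\beta_{RL}\|\eta_k\|^2 \le \beta'\|\eta_k\|^2$

whenever $\|\eta_k\| < \delta_{RL}$, where $\beta' = \max(\beta, \beta_{RL})$. Assume for the purpose of contradiction that $\liminf_{k\to\infty} \|\operatorname{grad} f(x_k)\| \ne 0$; that is, assume that there exist $\epsilon > 0$ and a positive index $K$ such that

(18) $\|\operatorname{grad} f(x_k)\| \ge \epsilon \quad \forall k \ge K$.

From (14) for $k \ge K$, we have

(19) $m_k(0) - m_k(\eta_k) \ge c_1 \|\operatorname{grad} f(x_k)\| \min\!\left(\Delta_k, \dfrac{\|\operatorname{grad} f(x_k)\|}{\|H_k\|}\right) \ge c_1\epsilon \min\!\left(\Delta_k, \dfrac{\epsilon}{\beta'}\right)$.

Substituting (17) and (19) into (16), we have that

(20) $|\rho_k - 1| \le \dfrac{\beta'\|\eta_k\|^2}{c_1\epsilon \min\!\left(\Delta_k, \frac{\epsilon}{\beta'}\right)} \le \dfrac{\beta'\Delta_k^2}{c_1\epsilon \min\!\left(\Delta_k, \frac{\epsilon}{\beta'}\right)}$

whenever $\|\eta_k\| < \delta_{RL}$. We can choose a value of $\hat\Delta$ that allows us to bound the right-hand side of the inequality (20) when $\Delta_k \le \hat\Delta$. Choose $\hat\Delta$ as follows:

$\hat\Delta \le \min\!\left( \dfrac{c_1\epsilon}{2\beta'},\ \dfrac{\epsilon}{\beta'},\ \delta_{RL} \right)$.

This gives us $\min(\Delta_k, \frac{\epsilon}{\beta'}) = \Delta_k$. We can now write (20) as follows:

$|\rho_k - 1| \le \dfrac{\beta'\hat\Delta\,\Delta_k}{c_1\epsilon \min\!\left(\Delta_k, \frac{\epsilon}{\beta'}\right)} \le \dfrac{\Delta_k}{2\min\!\left(\Delta_k, \frac{\epsilon}{\beta'}\right)} = \dfrac{1}{2}$.

Therefore, $\rho_k \ge \frac{1}{2} > \frac{1}{4}$ whenever $\Delta_k \le \hat\Delta$ so that, by the workings of Algorithm 2, it follows that $\Delta_{k+1} \ge \Delta_k$ whenever $\Delta_k \le \hat\Delta$. It follows that a reduction of $\Delta_k$ (by a factor of $\frac{1}{4}$) can occur in Algorithm 2 only when $\Delta_k > \hat\Delta$. Therefore, we conclude that

(21) $\Delta_k \ge \min\bigl(\Delta_K, \hat\Delta/4\bigr) \quad \forall k \ge K$.

Consequently, $\rho_k \ge \frac{1}{4}$ must hold infinitely many times (otherwise $\{\Delta_k\}$ would go to zero by the workings of the algorithm). So there exists an infinite subsequence $\mathcal{K}$ such that $\rho_k \ge \frac{1}{4} > \rho'$ for $k \in \mathcal{K}$. If $k \in \mathcal{K}$ and $k \ge K$, it follows from (19) and (10) that

$f(x_k) - f(x_{k+1}) \ge c\,\bigl(f(x_k) - \hat f_{x_k}(\eta_k)\bigr)$
$\phantom{f(x_k) - f(x_{k+1})} \ge c\,\tfrac{1}{4}\bigl(m_k(0) - m_k(\eta_k)\bigr)$
$\phantom{f(x_k) - f(x_{k+1})} \ge c\,\tfrac{1}{4}\,c_1\epsilon \min\!\left(\Delta_k, \dfrac{\epsilon}{\beta'}\right)$
$\phantom{f(x_k) - f(x_{k+1})} \ge c\,\tfrac{1}{4}\,c_1\epsilon \min\!\left(\Delta_K, \dfrac{\hat\Delta}{4}, \dfrac{\epsilon}{\beta'}\right)$.

Since, moreover, $f(x_k) - f(x_{k+1}) \ge 0$ for all $k \notin \mathcal{K}$, it follows that $f(x_k) \to -\infty$, a contradiction since $f$ is bounded below on the level set containing $\{x_k\}$.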
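The following sketch instantiates Algorithm 2 (ATR) in $\mathbb{R}^n$ with the natural retraction, using the Cauchy point as the approximate solution of (8). It is a simplified illustration under assumptions, not the authors' code: `hess_f` supplies the operator $H_k$ (here the exact Hessian), the function names and default parameter values are hypothetical, and the optional `accelerate` hook stands for any rule that proposes $x_{k+1}$.

```python
import numpy as np

def cauchy_point(g, H, delta):
    """Minimizer of the quadratic model along -g within the ball of radius delta."""
    gnorm = np.linalg.norm(g)
    gHg = float(g @ H @ g)
    tau = 1.0 if gHg <= 0.0 else min(1.0, gnorm**3 / (delta * gHg))
    return -(tau * delta / gnorm) * g

def accelerated_trust_region(f, grad_f, hess_f, x0, accelerate=None,
                             delta_bar=10.0, delta0=1.0, rho_prime=0.1,
                             c=1.0, iters=200, tol=1e-6):
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(iters):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:
            break
        H = hess_f(x)
        eta = cauchy_point(g, H, delta)                    # step 2
        model_decrease = -(float(g @ eta) + 0.5 * float(eta @ H @ eta))
        x_prop = x + eta                                   # R_x(eta) in R^n
        rho = (f(x) - f(x_prop)) / model_decrease          # quotient (9)
        if rho < 0.25:                                     # radius update, steps 4-10
            delta_next = 0.25 * delta
        elif rho > 0.75 and np.isclose(np.linalg.norm(eta), delta):
            delta_next = min(2.0 * delta, delta_bar)
        else:
            delta_next = delta
        if rho > rho_prime:                                # steps 11-12
            x_next = accelerate(x, x_prop) if accelerate is not None else x_prop
            if f(x) - f(x_next) < c * (f(x) - f(x_prop)):  # enforce (10)
                x_next = x_prop
        else:
            x_next = x                                     # satisfies (11) trivially
        x, delta = x_next, delta_next
    return x

# Quadratic test problem (assumed for illustration).
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * float(x @ A @ x) - float(b @ x)
x_star = accelerated_trust_region(f, lambda x: A @ x - b, lambda x: A, np.zeros(2))
print(x_star)
```

The Cauchy point is the simplest choice that yields a bound of the form (14); a truncated CG solver could replace `cauchy_point` without changing the outer loop.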
The convergence result of Theorem 5.2 is essentially identical to the corresponding result for the non-accelerated Riemannian trust-region method (see [2] or [6]), which itself is a natural generalization of a convergence result of the classical (non-accelerated) trust-region method in $\mathbb{R}^n$. In the classical convergence theory of trust-region methods in $\mathbb{R}^n$ (see, e.g., [30, 11]), this result is followed by another theorem stating that, under further assumptions, $\lim_{k\to\infty} \|\operatorname{grad} f(x_k)\| = 0$; i.e., the gradient of the cost function goes to zero on the whole sequence of iterates. This result also has a natural generalization for the non-accelerated Riemannian trust-region method (see [2, Theorem 4.4] or [6, Theorem 7.4.4]). It is an open question whether this result extends verbatim to the accelerated case. At least we can say that the proof cannot be adapted in a simple way: the condition that there exist $\mu > 0$ and $\delta_\mu > 0$ such that

(22) $\|\xi\| \ge \mu\, \mathrm{dist}(x, R_x(\xi))$ for all $x \in M$, for all $\xi \in T_xM$, $\|\xi\| \le \delta_\mu$,

no longer implies that $\|\eta_k\| \ge \mu\, \mathrm{dist}(x_k, x_{k+1})$ when acceleration comes into play. A simple fix is to require that there exists $\mu > 0$ such that the iterates satisfy

(23) $\|\eta_k\| \ge \mu\, \mathrm{dist}(x_k, x_{k+1})$ for all $k$.

We then obtain the following result. (We refer to [2, 6] for the concept of Lipschitz continuous differentiability of $f$ on the Riemannian manifold $M$; the definition reduces to the classical one when the manifold is $\mathbb{R}^n$. The extension of the proof of [6, Theorem 7.4.4] to a proof of Theorem 5.3 is left to the reader.)

Theorem 5.3. Let $\{x_k\}$ be a sequence of iterates generated by Algorithm 2 (ATR). Suppose that all of the assumptions of Theorem 5.2 are satisfied. Further suppose that $\rho' \in (0, \frac{1}{4})$, that $f$ is Lipschitz continuously differentiable, and that (23) is satisfied for some $\mu > 0$. It then follows that

$\lim_{k\to\infty} \|\operatorname{grad} f(x_k)\| = 0$.
6. Local convergence. We now briefly comment on how accelerating an optimization method may affect its order of convergence. Consider an algorithm that converges locally with order q to a local minimum v of the cost function f; that is, $\mathrm{dist}(x_+, v) \le c_0\,(\mathrm{dist}(x, v))^q$ for some c0 > 0 and all x in some neighborhood of v, where x+ stands for the next iterate computed from the current iterate x. If the algorithm is accelerated in the sense of (1), then local convergence to v is no longer guaranteed without further hypotheses; i.e., the algorithm may converge to stationary points other than v. However, for sequences of iterates of the accelerated algorithm that converge to v, we have the following result.

Proposition 6.1. Let v be a nondegenerate minimizer of $f \in C^3(M)$, where M is a Riemannian manifold. Consider a descent algorithm that converges locally with order q > 1 to v. If {xk} is a sequence of iterates of an accelerated version of the descent algorithm, in the sense of (1) with c = 1, and {xk} converges to v, then it does so with order q.

Proof. We work in a coordinate system around v. Abusing notation, we use the same symbols for points of M and their coordinate representations. There is a neighborhood U of v such that, for all x ∈ U, we have
$$\tfrac{1}{2}\lambda_m \|x - v\|^2 \le f(x) - f(v) \le 2\lambda_M \|x - v\|^2,$$
where λM ≥ λm > 0 denote the largest and smallest eigenvalues, respectively, of the Hessian of f at v (they are positive since v is a nondegenerate minimizer). Since c = 1, it follows from (1) that f(xk+1) ≤ f(xk+1/2). Moreover, by the equivalence of norms, there is a neighborhood U1 of v and constants c1 and c2 such that, for all x ∈ U1,
$$\frac{1}{c_1}\,\mathrm{dist}(x, v) \le \|x - v\| \le c_2\,\mathrm{dist}(x, v).$$
Since the original descent algorithm converges locally with order q to v, there exists a nonempty open ball $B_\epsilon(v)$ such that, whenever $x_k \in B_\epsilon(v)$, it holds that $x_{k+1/2} \in B_\epsilon(v)$ with $\mathrm{dist}(x_{k+1/2}, v) \le c_0\,(\mathrm{dist}(x_k, v))^q$. Moreover, $\epsilon$ can be chosen such that $B_\epsilon(v) \subseteq U \cap U_1$. Since {xk} converges to v, there is K such that, for all k > K, xk belongs to $B_\epsilon(v)$. We have, for all k > K,
$$\begin{aligned}
(\mathrm{dist}(x_{k+1}, v))^2 &\le c_1^2 \|x_{k+1} - v\|^2 \le c_1^2 \frac{2}{\lambda_m}\bigl(f(x_{k+1}) - f(v)\bigr) \le c_1^2 \frac{2}{\lambda_m}\bigl(f(x_{k+1/2}) - f(v)\bigr) \\
&\le c_1^2 \frac{4}{\lambda_m}\lambda_M \|x_{k+1/2} - v\|^2 \le c_1^2 \frac{4}{\lambda_m}\lambda_M c_2^2\,(\mathrm{dist}(x_{k+1/2}, v))^2 \\
&\le c_1^2 \frac{4}{\lambda_m}\lambda_M c_0^2 c_2^2\,(\mathrm{dist}(x_k, v))^{2q}.
\end{aligned}$$
Taking square roots gives $\mathrm{dist}(x_{k+1}, v) \le 2 c_0 c_1 c_2 \sqrt{\lambda_M/\lambda_m}\,(\mathrm{dist}(x_k, v))^q$, which is convergence with order q.

7. Sequential subspace optimization methods. We consider sequential subspace optimization methods in the form given in Algorithm 3 below. It generalizes the sequential subspace optimization (SESOP) algorithm of [31] to Riemannian manifolds.

Algorithm 3. SESOP
Require: Riemannian manifold M; continuously differentiable scalar field f on M; retraction R from TM to M as in Definition 3.1.
Input: Initial iterate x0 ∈ M.
Output: Sequence of iterates {xk} ⊆ M.
1: for k = 0, 1, 2, . . . do
2: Select a subspace Sk ⊆ Txk M.
3: Find $\xi_k = \arg\min_{\xi \in S_k} f(R_{x_k}(\xi))$.
4: Set $x_{k+1} = R_{x_k}(\xi_k)$.
5: end for

If Sk is chosen in step 2 such that Sk contains ηk, where ηk is as in Algorithm 1 (ALS) (resp., Algorithm 2 (ATR)), then Algorithm SESOP becomes an instance of Algorithm 1 (resp., Algorithm 2), with c = 1. The SESOP framework thus provides a strategy for accelerating line-search and trust-region methods.
When M = Rn with its natural retraction, Algorithm 3 becomes Algorithm 4 below, which can be found in [31] in an almost identical formulation.

Algorithm 4. Rn-SESOP
Require: Continuously differentiable scalar field f on Rn.
Input: Initial iterate x0 ∈ Rn.
Output: Sequence of iterates {xk} ⊆ Rn.
1: for k = 0, 1, 2, . . . do
2: Select a real matrix Wk with n rows.
3: Find $y^* = \arg\min_y f(x_k + W_k y)$.
4: Set $x_{k+1} = x_k + W_k y^*$.
5: end for

Observe that if xk ∈ col(Wk), where col(W) denotes the subspace spanned by the columns of W, then xk+1 admits the expression
(24) $x_{k+1} = \arg\min_{x \in \mathrm{col}(W_k)} f(x)$.
Definition 7.1 (gradient-related sequence of subspaces). A sequence {Sk} of subspaces of Txk M is gradient-related if there exists a gradient-related sequence {ηk} such that ηk ∈ Sk for all k; equivalently, for any subsequence {xk}k∈K that converges to a nonstationary point, we have
$$\limsup_{k\to\infty,\ k\in K}\ \inf_{\eta \in S_k,\ \|\eta\| = 1} \langle \operatorname{grad} f(x_k), \eta \rangle < 0.$$
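As a hedged aside, the infimum in Definition 7.1 can be evaluated in closed form. Here $P_{S_k}$ denotes the orthogonal projector of $T_{x_k}M$ onto $S_k$ with respect to the Riemannian metric; this symbol is introduced only for illustration and is not used elsewhere in the paper. Assuming $S_k \neq \{0\}$, the Cauchy–Schwarz inequality gives

```latex
\[
  \inf_{\eta \in S_k,\ \|\eta\| = 1} \langle \operatorname{grad} f(x_k), \eta \rangle
  \;=\; \inf_{\eta \in S_k,\ \|\eta\| = 1} \langle P_{S_k} \operatorname{grad} f(x_k), \eta \rangle
  \;=\; -\,\bigl\| P_{S_k} \operatorname{grad} f(x_k) \bigr\| ,
\]
% so the condition of Definition 7.1 can be restated as
\[
  \liminf_{k \to \infty,\ k \in K} \bigl\| P_{S_k} \operatorname{grad} f(x_k) \bigr\| \;>\; 0 .
\]
```

In words, under this reading the subspaces are gradient-related when the component of the gradient lying in Sk does not vanish along subsequences converging to nonstationary points.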
When M = Rn, the condition that Sk be a subspace of Txk M reduces to Sk being a subspace of Rn (in view of the canonical identification $T_x\mathbb{R}^n \simeq \mathbb{R}^n$).

Proposition 7.2. Let {xk} be an infinite sequence of iterates generated by Algorithm 3 (SESOP). Assume that the sequence {Sk} produced by Algorithm 3 is gradient-related (Definition 7.1). Then every limit point of {xk} is a stationary point of f. Assume further that {xk} is included in some compact set C. Then $\lim_{k\to\infty} \|\operatorname{grad} f(x_k)\| = 0$.

Proof. The proof is a direct consequence of the convergence analysis of Algorithm 1 (ALS).

We now discuss a detailed procedure for selecting Sk in Algorithm 3 (SESOP). It generalizes an idea in [26], which can be traced back to [39]. We denote by $P_\gamma^{t \leftarrow t_0}\zeta$ the vector of $T_{\gamma(t)}M$ obtained by parallel transporting a vector $\zeta \in T_{\gamma(t_0)}M$ along a curve γ. We refer, e.g., to [13, 6] for details on parallel translation. In Rn, the natural parallel translation is simply given by $P_\gamma^{t \leftarrow t_0}\zeta = \zeta$ (where the ζ on the left-hand side is viewed as an element of Tγ(0) M and the ζ on the right-hand side is viewed as an element of Tγ(t) M).

The name conjugate gradient is justified by the following property. Let M be the Euclidean space Rn with retraction Rx(ξ) := x + ξ. Let f be given by $f(x) = \tfrac{1}{2} x^T A x$, where A is a symmetric positive-definite matrix. Then Algorithm 5 reduces to the classical linear CG method. This result is a consequence of the minimizing properties of the CG method.

Again in the Euclidean case, but for general cost functions, Algorithm 5 can be viewed as a “locally optimal” nonlinear CG method: instead of computing a search direction ξk as a correction of −grad f(xk) along ξk−1 (as is done in classical CG methods), the vector ξk is computed as a minimizer over the space spanned by {−grad f(xk), ξk−1}.
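The Euclidean quadratic case described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `matvec`, `dot`, and `locally_optimal_cg` are hypothetical, the retraction is Rx(ξ) = x + ξ, and the two-dimensional subspace minimization is done exactly by solving a 2×2 linear system, which is possible only because f is quadratic.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    """Euclidean inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(x, y))


def locally_optimal_cg(A, x0, num_iters):
    """Minimize f(x) = 0.5 * x^T A x (A symmetric positive-definite) by exact
    minimization over span{grad f(x_k), xi_{k-1}} at every step.
    Hypothetical helper illustrating the Euclidean case of Algorithm 5."""
    x = list(x0)
    xi_prev = None  # plays the role of xi_0 = 0
    for _ in range(num_iters):
        g = matvec(A, x)  # grad f(x) = A x for this quadratic
        if dot(g, g) < 1e-30:
            break  # already (numerically) stationary
        Ag = matvec(A, g)
        if xi_prev is None:
            # One-dimensional subspace: exact line search along -g.
            alpha = dot(g, g) / dot(g, Ag)
            xi = [-alpha * gi for gi in g]
        else:
            # Two-dimensional subspace W = [g, xi_prev]; the minimizer y of
            # f(x + W y) solves the 2x2 system (W^T A W) y = -W^T g.
            Ap = matvec(A, xi_prev)
            m11, m12, m22 = dot(g, Ag), dot(g, Ap), dot(xi_prev, Ap)
            r1, r2 = -dot(g, g), -dot(xi_prev, g)
            det = m11 * m22 - m12 * m12
            y1 = (r1 * m22 - m12 * r2) / det  # Cramer's rule
            y2 = (m11 * r2 - m12 * r1) / det
            xi = [y1 * gi + y2 * pi for gi, pi in zip(g, xi_prev)]
        x = [a + b for a, b in zip(x, xi)]
        xi_prev = xi
    return x
```

For a 3×3 symmetric positive-definite A, three iterations should reach the minimizer x = 0 up to rounding, which is the finite-termination behavior expected of linear CG.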
For the general Riemannian case, assuming that the retraction is chosen as the Riemannian exponential, Algorithm 5 can be thought of as a locally optimal version of the Riemannian CG algorithms proposed by Smith [34] (see also [14]). By construction, the sequence {Sk} in Algorithm 5 is gradient-related. The following result thus follows from Proposition 7.2.

Proposition 7.3. Let {xk} be an infinite sequence of iterates generated by Algorithm 5. Then every limit point of {xk} is a stationary point of f. Assume further that {xk} ⊆ C for some compact set C. Then $\lim_{k\to\infty} \|\operatorname{grad} f(x_k)\| = 0$.

This result still holds if the parallel transport in Algorithm 5 is replaced by any vector transport as defined in [6]; indeed, the sequence {Sk} is still gradient-related by construction. Moreover, we point out that since Algorithm 5 is based on CG, it tends to display fast local convergence.

Algorithm 5. Accelerated Conjugate Gradient (ACG)
Require: Riemannian manifold M; continuously differentiable scalar field f on M; retraction R from TM to M as in Definition 3.1.
Input: Initial iterate x0 ∈ M.
Output: Sequence of iterates {xk}.
1: ξ0 := 0; x1 := x0;
2: for k = 1, 2, . . . do
3: Compute ξk as a minimizer of $\hat f_{x_k}$ over $S_k := \mathrm{span}\{P_\gamma^{1 \leftarrow 0}\xi_{k-1},\ \operatorname{grad} f(x_k)\}$, where $\gamma(t) := R_{x_{k-1}}(t\,\xi_{k-1})$;
4: Compute $x_{k+1} = R_{x_k}(\xi_k)$;
5: end for

8. Applications. Several occurrences of Algorithms 1 (ALS), 2 (ATR), and 3 (SESOP) appear in the literature, e.g., in [20], [31], and in several eigenvalue algorithms. Indeed, it is well-known that subspace acceleration can remarkably improve the efficiency of eigensolvers; see, for example, the numerical comparison in [6, Figure 4.3] between a steepest descent algorithm and an accelerated version thereof, equivalent to locally optimal block preconditioned conjugate gradient (LOBPCG). Since, moreover, subspace acceleration is easy to perform for the eigenvalue problem, there are few methods that do not exploit it.

In the context of this analysis paper, we will focus on showing that the theory developed in the previous sections leads to convergence results for certain well-known algorithms. Some of these convergence results are new, to the best of our knowledge. In other cases, we recover results that have already been established, but the acceleration-based proof technique is novel and arguably more streamlined.

8.1. Lanczos algorithm. In a Ritz-restarted Lanczos algorithm for computing the leftmost eigenpair of a symmetric matrix A, the next iterate xk+1 is chosen as a minimizer of the Rayleigh quotient over the subspace $K_m(x_k) := \mathrm{span}\{x_k, Ax_k, A^2x_k, \ldots, A^m x_k\}$, m ≥ 1. Recall that the Rayleigh quotient of A is the function
$$f : \mathbb{R}^n_0 \to \mathbb{R} : x \mapsto \frac{x^T A x}{x^T x}.$$
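The Rayleigh quotient and one Ritz-restarted step with m = 1, i.e., minimization over K1(x) = span{x, Ax}, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the matrix is stored as a list of rows, and the two-dimensional Rayleigh–Ritz problem is solved in closed form using the orthonormal basis formed by x and the normalized residual Ax − f(x)x.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def rayleigh_quotient(A, x):
    """f(x) = x^T A x / x^T x for a nonzero vector x."""
    return dot(x, matvec(A, x)) / dot(x, x)


def ritz_restart_step(A, x):
    """Return a unit-norm minimizer of the Rayleigh quotient over span{x, A x}."""
    norm_x = math.sqrt(dot(x, x))
    q1 = [xi / norm_x for xi in x]
    Aq1 = matvec(A, q1)
    theta = dot(q1, Aq1)
    # Residual A q1 - theta q1 is orthogonal to q1 and spans the rest of K_1.
    r = [a - theta * q for a, q in zip(Aq1, q1)]
    norm_r = math.sqrt(dot(r, r))
    if norm_r < 1e-14:
        return q1  # q1 is (numerically) an eigenvector already
    q2 = [ri / norm_r for ri in r]
    # Projected matrix [[a, b], [b, c]] in the basis (q1, q2); b = q1^T A q2 = norm_r.
    a, b, c = theta, norm_r, dot(q2, matvec(A, q2))
    lam = 0.5 * (a + c) - math.sqrt(0.25 * (a - c) ** 2 + b * b)  # smaller Ritz value
    y1, y2 = b, lam - a  # eigenvector of the 2x2 matrix for lam
    x_new = [y1 * u + y2 * v for u, v in zip(q1, q2)]
    norm_new = math.sqrt(dot(x_new, x_new))
    return [v / norm_new for v in x_new]


def ritz_restarted_iteration(A, x0, num_iters):
    """Repeat the m = 1 Ritz step and record the Rayleigh quotient values."""
    x = list(x0)
    values = [rayleigh_quotient(A, x)]
    for _ in range(num_iters):
        x = ritz_restart_step(A, x)
        values.append(rayleigh_quotient(A, x))
    return x, values
```

Since x itself belongs to the subspace, the recorded Rayleigh quotient values cannot increase (up to rounding) from one iterate to the next, which is the descent property used in the convergence argument.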
Its stationary points are the eigenvectors of A, and at those points it takes the value of the corresponding eigenvalue. (Note, however, that f(x) = λi, where λi is an eigenvalue of A, does not imply that x is an eigenvector of A, unless λi is an extreme eigenvalue of A.) Since xk belongs to Km(xk), we are in the situation (24), and thus the Ritz-restarted Lanczos algorithm is an instance of Algorithm 3 (SESOP) (specifically, of Algorithm 4 (Rn-SESOP)). The gradient of the Rayleigh quotient at xk is collinear with Axk − f(xk)xk, which belongs to Km(xk), and hence {Km(xk)} is gradient-related to {xk}. It follows from Proposition 7.2 that every limit point of {xk} is an eigenvector of A, regardless of x0. Taking into account the properties of the Rayleigh quotient f along with the fact that {xk} is a descent sequence for f, it follows that {xk} converges to the eigenspace associated to an eigenvalue of A. The same conclusion holds for the Ritz-restarted Krylov method proposed by Golub and Ye [19] for the symmetric definite generalized eigenvalue problem. In other words, we recovered [19, Theorem 3.2].

8.2. LOBPCG. Knyazev’s LOBPCG method [26], in combination with a symmetric positive-definite preconditioner, is a popular algorithm for computing approximations to the smallest eigenvalues and eigenvectors of the eigenproblem Au = Buλ, where A and B are real symmetric positive-definite matrices of order n. Here we consider LOBPCG as formulated in [21, Algorithm 1] (with some changes in the notation), and we show, using Theorem 4.3, that the limit points of {col(Xk)} are invariant subspaces of the pencil (A, B). Moreover, invariant subspaces that do not correspond to the smallest eigenvalues are “unstable,” in the sense explained below.

The LOBPCG algorithm is described in Algorithm 6. In the algorithm, (Y, Θ) = RR(S, p) performs a Rayleigh–Ritz analysis where the pencil $(S^T A S, S^T B S)$ has eigenvectors Y and eigenvalues Θ, i.e.,
$$S^T A S\, Y = S^T B S\, Y \Theta \quad \text{and} \quad Y^T S^T B S\, Y = I_{b \times b},$$
where Ib×b is the identity matrix of size b × b. The first p pairs with smallest Ritz values are returned in Y and in the diagonal matrix Θ in a nondecreasing order. Note that we consider the formulation [21, Algorithm 1] because it is simple to state and comprehend. However, it should be kept in mind that the matrix [Xk, Hk, Pk] may become singular or ill-conditioned [21]. Therefore, in practical implementations, it is recommended to rely on the robust representation given in [21, Algorithm 2]. The convergence results obtained below also hold in this case.

Algorithm 6. LOBPCG [26, 21] without soft-locking
Require: Symmetric positive-definite matrices A and B of order n; symmetric positive-definite preconditioner N; block-size p.
1: Select an initial guess $\tilde X \in \mathbb{R}^{n \times p}$.
2: $X_0 = \tilde X Y$, where $(Y, \Theta_0) = \mathrm{RR}(\tilde X, p)$.
3: $R_0 = A X_0 - B X_0 \Theta_0$.
4: $P_0 = [\,]$.
5: for k = 0, 1, 2, . . . do
6: Solve the preconditioned linear system $N H_k = R_k$.
7: Let $S = [X_k, H_k, P_k]$ and compute $(Y_k, \Theta_{k+1}) = \mathrm{RR}(S, p)$.
8: $X_{k+1} = [X_k, H_k, P_k] Y_k$.
9: $R_{k+1} = A X_{k+1} - B X_{k+1} \Theta_{k+1}$.
10: $P_{k+1} = [0, H_k, P_k] Y_k$.
11: end for

In the case p = 1, it takes routine manipulations to check, using Proposition 7.2 with the Rayleigh quotient as the cost function, that all of the limit points of {Xk} are eigenvectors of the pencil (A, B). We now consider the general case p ≥ 1 in detail.

Let $\mathbb{R}^{n \times p}_*$ denote the set of all full-rank n × p real matrices. Observe that $\mathbb{R}^{n \times p}_*$ is an open subset of $\mathbb{R}^{n \times p}$ (it is thus an open submanifold of the linear manifold $\mathbb{R}^{n \times p}$, see [6]) and that $T_X \mathbb{R}^{n \times p}_* \simeq \mathbb{R}^{n \times p}$ for all $X \in \mathbb{R}^{n \times p}_*$. In $\mathbb{R}^{n \times p}_*$, consider the inner product defined by
(25) $\langle Z_1, Z_2 \rangle_X = 2\,\mathrm{trace}\bigl((X^T B X)^{-1} Z_1^T Z_2\bigr), \quad X \in \mathbb{R}^{n \times p}_*, \quad Z_1, Z_2 \in T_X \mathbb{R}^{n \times p}_*.$
(The factor of 2 is included here to prevent factors of 2 from appearing in the formula of the gradient below. This is still a valid inner product, and it turns $\mathbb{R}^{n \times p}_*$ into a Riemannian manifold.) Consider the cost function
(26) $f : \mathbb{R}^{n \times p}_* \to \mathbb{R} : X \mapsto \mathrm{trace}\bigl((X^T B X)^{-1} X^T A X\bigr).$
This generalized Rayleigh quotient was studied, e.g., in [6] (when B = I, it reduces to the extended Rayleigh quotient of [22]). It satisfies the property f(XW) = f(X) for all $X \in \mathbb{R}^{n \times p}_*$ and all W invertible of size p × p. A matrix $X \in \mathbb{R}^{n \times p}_*$ is a stationary point of f if and only if its column space is an invariant subspace of the pencil (A, B). The value of f at an invariant subspace is the sum of the corresponding eigenvalues. The stationary points whose column space is the rightmost invariant subspace of (A, B) (i.e., the one corresponding to the largest eigenvalues) are global maximizers of f. The stationary points whose column space is the leftmost invariant subspace of (A, B) (i.e., the one corresponding to the smallest eigenvalues) are global minimizers of f. All of the other stationary points are saddle points.

The fact that $\mathbb{R}^{n \times p}_*$ is $\mathbb{R}^{n \times p}$ with infinitely many elements removed makes it difficult to view LOBPCG as an instance of Algorithm 3 (SESOP). Instead, we view it as an instance of Algorithm 1 (ALS). The gradient of f with respect to the Riemannian metric (25) is
$$\operatorname{grad} f(X) = AX - BX (X^T B X)^{-1} X^T A X;$$
see, e.g., [6, equation (6.37)]. Referring to Algorithm 6, we have $H_k = N^{-1} \operatorname{grad} f(X_k)$ and, using (25) together with $X_k^T B X_k = I$,
$$\langle \operatorname{grad} f(X_k), -H_k \rangle_{X_k} = -2\,\bigl\| N^{-1/2} \operatorname{grad} f(X_k) \bigr\|_F^2,$$
from which it follows that {−Hk} is gradient-related to {Xk} (Definition 4.1). We consider the retraction given by
$$R_X(Z) = X + Z, \quad X \in \mathbb{R}^{n \times p}_*, \quad Z \in T_X \mathbb{R}^{n \times p}_* \simeq \mathbb{R}^{n \times p}.$$
The Armijo point along −Hk takes the form Xk+1/2 = Xk − αk Hk for some αk > 0. Hence Xk+1/2 = [Xk, Hk, Pk]Y for some Y.

Without preconditioning (N = I), Xk+1/2 is full-rank (i.e., it belongs to $\mathbb{R}^{n \times p}_*$) for any αk. Indeed, we have that $X_k^T X_{k+1/2} = X_k^T (I - \alpha_k A) X_k + \alpha_k X_k^T A X_k = X_k^T X_k$ is full-rank. (Observe that all iterates are B-orthogonal, hence of full rank.) With the preconditioner, however, this property is no longer guaranteed. Nevertheless, given A, B and N symmetric positive-definite matrices of order n, it is possible to find $\bar\alpha$ such that $X - \alpha N^{-1} \operatorname{grad} f(X)$ has full rank for all B-orthonormal X and all $\alpha \in [0, \bar\alpha]$. (This is because $\{X \in \mathbb{R}^{n \times p} : X^T B X = I\}$ is a compact subset of $\mathbb{R}^{n \times p}$ and $\mathbb{R}^{n \times p} \setminus \mathbb{R}^{n \times p}_*$ is a closed subset of $\mathbb{R}^{n \times p}$ that do not intersect, and hence their distance does not vanish.) With this $\bar\alpha$, LOBPCG becomes an instance of Algorithm 1 (ALS), provided we show that the acceleration bound (5) holds for some c > 0. It does hold for c = 1, as a consequence of the following result.

Lemma 8.1. In the context of Algorithm 6, we have
$$\begin{aligned}
f(X_{k+1}) &= \min\bigl\{ f([X_k, H_k, P_k] Y) : Y \in \mathbb{R}^{3p \times p},\ Y^T [X_k, H_k, P_k]^T B [X_k, H_k, P_k] Y = I \bigr\} \\
&= \min\bigl\{ f([X_k, H_k, P_k] Y) : Y \in \mathbb{R}^{3p \times p},\ [X_k, H_k, P_k] Y \text{ full rank} \bigr\},
\end{aligned}$$
where f denotes the Rayleigh quotient (26).
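The generalized Rayleigh quotient (26) and the gradient formula with respect to the metric (25) can be checked numerically on a tiny example. The sketch below is illustrative only and restricted to block size p = 2 so that the p × p inverse can be written explicitly; all function names are hypothetical, and matrices are stored as lists of rows.

```python
def matmul(P, Q):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def inv2(M):
    """Inverse of a 2x2 matrix (this sketch assumes block size p = 2)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def generalized_rayleigh_quotient(A, B, X):
    """f(X) = trace((X^T B X)^{-1} X^T A X) for a full-rank n x 2 matrix X."""
    Xt = transpose(X)
    G = inv2(matmul(Xt, matmul(B, X)))
    GH = matmul(G, matmul(Xt, matmul(A, X)))
    return GH[0][0] + GH[1][1]


def riemannian_gradient(A, B, X):
    """grad f(X) = A X - B X (X^T B X)^{-1} X^T A X, as in the text."""
    Xt = transpose(X)
    G = inv2(matmul(Xt, matmul(B, X)))
    AX = matmul(A, X)
    correction = matmul(matmul(B, X), matmul(G, matmul(Xt, AX)))
    return [[AX[i][j] - correction[i][j] for j in range(2)]
            for i in range(len(X))]
```

With A = diag(1, 2, 3), B = diag(2, 1, 1), and X spanning the first two coordinate axes, the pencil eigenvalues on that invariant subspace are 0.5 and 2, so f(X) should equal 2.5 and the gradient should vanish; replacing X by XW for an invertible W should leave f unchanged.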
Proof. The three expressions are equal to the sum of the p leftmost eigenvalues of the pencil $(U^T A U, U^T B U)$, where U is a full-rank matrix with col(U) = col([Xk, Hk, Pk]).

This yields the following result.

Proposition 8.2. Let {Xk} be a sequence of iterates generated by Algorithm 6 (LOBPCG). Then the following holds.
(a) Every limit point X∗ of {Xk} is a stationary point of f; i.e., col(X∗) is an invariant subspace of (A, B);
(b) $\lim_{k\to\infty} \|A X_k - B X_k \Theta_k\| = 0$, where Θk is as in Algorithm 6 (LOBPCG);
(c) The limit points of {col(Xk)} are p-dimensional invariant subspaces of (A, B);
(d) $\lim_{k\to\infty} f(X_k)$ exists (where f is the generalized Rayleigh quotient (26)), and thus f takes the same value at all limit points of {Xk}.
(e) Let V be a limit point of {col(Xk)} that is not a leftmost invariant subspace of (A, B) (“leftmost” means related to the smallest eigenvalues). Then V is unstable in the following sense: there is $\epsilon > 0$ such that for all δ > 0 there exists K > 0 and $Z \in \mathbb{R}^{n \times p}$, with $\|Z\| < \delta$, such that if XK is perturbed to XK + Z and the algorithm is pursued from this new iterate, then the new sequence satisfies $\angle(\mathrm{col}(X_k), V) > \epsilon$ for all but finitely many iterates.

Proof. Point (a) follows from Proposition 4.3 as explained above. Point (b) follows from Corollary 4.4 since all iterates belong to the compact set $\{X \in \mathbb{R}^{n \times p} : X^T B X = I\}$. Note that grad f(Xk) = AXk − BXk Θk. Point (c) involves the topology of the quotient manifold. The result follows from the fact that the col mapping is continuous from $\mathbb{R}^{n \times p}_*$ to the Grassmann manifold of p-planes in Rn. (The topology of the Grassmann manifolds is precisely the one that makes the col mapping continuous; see, e.g., [6] for details.) Point (d) holds because LOBPCG is a descent method for f. Point (e) can be deduced from the fact that the non-leftmost invariant subspaces of (A, B) are saddle points or maxima for f and from the fact that LOBPCG is a descent method for f.

8.3. Jacobi–Davidson methods.
The Jacobi–Davidson algorithm for computing the smallest eigenvalue and eigenvector of an n × n symmetric matrix A, as described in [38, Algorithm 1], clearly fits within Algorithm 3 (SESOP). However, without further assumptions, it is not guaranteed that {Sk} be gradient-related: it all depends on how the Jacobi correction equation is “approximately” solved. If the approximate solution can be guaranteed to be gradient-related, then it follows from Proposition 7.2 that all limit points are stationary points of the Rayleigh quotient; i.e., they are eigenvectors.

For example, consider, as in [28], the Jacobi equation in the form
(27) $(I - x_k x_k^T)(A - \tau I)(I - x_k x_k^T)\,\eta_k = -(I - x_k x_k^T) A x_k, \quad x_k^T \eta_k = 0,$
where τ is some target less than the smallest eigenvalue λ1 of A, and assume that the approximate solution ηk is obtained with mk steps of the CG iteration (1 ≤ mk < n for all k). We show that the sequence {ηk} is gradient-related to {xk}, and thus {Sk} is gradient-related to {xk} when Sk contains ηk for all k. By the workings of CG (with zero initial condition), ηk is equal to $V_{m_k} y_k$, where $V_{m_k}$ is an orthonormal basis of the Krylov subspace $K_{m_k}$ generated from $-(I - x_k x_k^T) A x_k$ using the operator $(I - x_k x_k^T)(A - \tau I)(I - x_k x_k^T)$ and where yk solves
(28) $V_{m_k}^T (A - \tau I) V_{m_k}\, y_k = -V_{m_k}^T A x_k.$
Notice that the Krylov subspace is orthogonal to xk and contains the gradient $(I - x_k x_k^T) A x_k$, and hence we have the identities $(I - x_k x_k^T) V_{m_k} = V_{m_k}$ and $V_{m_k} V_{m_k}^T A x_k = V_{m_k} V_{m_k}^T (I - x_k x_k^T) A x_k = (I - x_k x_k^T) A x_k$. Since A − τI is positive-definite, it follows that the condition number of the projected matrix $V_{m_k}^T (A - \tau I) V_{m_k}$ is bounded, and hence in view of (28) the angle between yk and $-V_{m_k}^T A x_k$ is bounded away from $\pi/2$, and so is the angle between $V_{m_k} y_k = \eta_k$ and $-V_{m_k} V_{m_k}^T A x_k = -(I - x_k x_k^T) A x_k$ because $V_{m_k}$ is an orthonormal basis. Moreover, {yk} is bounded away from zero and infinity, and so is {ηk}. We have thus shown that the sequence {ηk} is gradient-related to {xk} (see the discussion that follows Definition 4.1). Thus Proposition 8.2 holds, mutatis mutandis, for the Jacobi–Davidson method [38, Algorithm 1] when the Jacobi equation (27) is defined and solved approximately with CG as in [28].

The result still holds when the CG iteration for (approximately) solving (27) is preconditioned with a positive-definite preconditioner Nk. Indeed, the preconditioned CG for solving a linear system Bη = −g amounts to applying the “regular” CG method to the transformed system $\tilde B \tilde\eta = -\tilde g$, where $\tilde B = N^{-1} B N^{-1}$, $\tilde\eta = N\eta$, and $\tilde g = N^{-1} g$ (see, e.g., [18, section 10.3]). If $\tilde\eta_j$ is an iterate of the regular CG applied to $\tilde B \tilde\eta = -\tilde g$ and thus $\eta_j = N^{-1}\tilde\eta_j$ is the iterate of the preconditioned CG, then we have $\langle \tilde\eta_j, \tilde g \rangle = \langle N\eta_j, N^{-1} g \rangle = \langle \eta_j, g \rangle$. Thus the sequence {ηk}, where ηk is the approximate solution of (27) returned by the preconditioned CG, is gradient-related.

Note that the choice of τ to make (A − τI) positive-definite in (27) is crucial in the development above.
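The argument above can be illustrated with a small, unpreconditioned CG solve of the correction equation (27). The sketch below is a hedged illustration rather than the implementation of [28] or [38]: the names `project_out` and `jacobi_correction_cg` are hypothetical, the target `tau` is assumed to lie below the smallest eigenvalue of A, and CG is started from the zero vector as in the text.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def project_out(x, v):
    """Apply I - x x^T to v, assuming x has unit norm."""
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]


def jacobi_correction_cg(A, x, tau, num_steps):
    """Run num_steps CG steps (zero initial guess) on equation (27).
    Returns the normalized x, the projected gradient g, and the correction eta."""
    norm_x = math.sqrt(dot(x, x))
    x = [xi / norm_x for xi in x]
    g = project_out(x, matvec(A, x))  # (I - x x^T) A x

    def operator(v):
        # (I - x x^T)(A - tau I)(I - x x^T) v
        w = project_out(x, v)
        Aw = matvec(A, w)
        return project_out(x, [a - tau * wi for a, wi in zip(Aw, w)])

    eta = [0.0] * len(x)
    r = [-gi for gi in g]  # residual of (27) at eta = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(num_steps):
        if rs < 1e-28:
            break  # residual is numerically zero
        Mp = operator(p)
        alpha = rs / dot(p, Mp)
        eta = [e + alpha * pi for e, pi in zip(eta, p)]
        r = [ri - alpha * mi for ri, mi in zip(r, Mp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x, g, eta
```

For any number of CG steps, the returned correction should be orthogonal to x and satisfy ⟨g, η⟩ < 0, i.e., it is a descent direction for the Rayleigh quotient, which is the property underlying gradient-relatedness.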
In the frequently encountered case where τ is selected as the Rayleigh quotient θk at xk, it seems difficult to provide a theoretical guarantee that the approximate solution ηk of (27) is gradient-related, unless we assume that the iteration starts close enough to the minor eigenvector so that $(I - x_k x_k^T)(A - \theta_k I)(I - x_k x_k^T)$ is positive definite as a linear transformation of the orthogonal complement of xk. (An example of the requirement that the iteration start sufficiently close to the minor eigenvector is the condition $\theta_k < \frac{\lambda_1 + \lambda_2}{2}$ in [29, Theorem 4.3].) However, in practice, it is quite clear that a solver producing a sequence {ηk} that is not gradient-related would have to be particularly odd. It is thus not surprising that the global convergence properties stated in Proposition 8.2 have been empirically observed in general for eigenvalue algorithms that fit in the Jacobi–Davidson framework.

Another example (which does not fit, strictly speaking, in the Jacobi–Davidson framework, but is closely related) is when, as in [1], the Jacobi equation is solved approximately using a truncated CG algorithm and the approximate solution is accepted or rejected using a trust-region mechanism. The method becomes an instance of Algorithm 2 applied to the Rayleigh quotient cost function, and Proposition 8.2 holds, mutatis mutandis.

8.4. Sequential subspace method. All of the algorithms thus far in this section are concerned with the eigenvalue problem; however, the area of application of the convergence theory developed in this paper is not restricted to eigenvalue solvers. An example is the SSM of Hager [20] for minimizing an arbitrary quadratic function over a sphere. This algorithm is an instance of Algorithm 3 (SESOP). In [20], {Sk} is required to contain grad f(xk); therefore, all limit points are stationary by Proposition 7.2. This was proven in [25], where stronger global convergence results are obtained by making additional assumptions on {Sk}.

9. Concluding remarks. If we accelerate, in the sense of (1), an optimization algorithm that converges globally to stationary points of the cost function, do we preserve the global convergence result? We have answered this question positively for a wide class of line-search and trust-region methods. The global convergence of several eigenvalue algorithms follows from this result, under mild conditions, as shown in section 8. We suspect that several other existing methods satisfy the conditions of the global convergence theorems proven in this paper.

An important practical issue in the design of accelerated algorithms is to strike a good balance of the workload between the “Jacobi-like” step (i.e., the computation of an update vector ηk) and the “Davidson-like” step (i.e., the improvement on the Jacobi update, for example, via a minimization within a subspace containing ηk). For example, at one extreme, the simplified Jacobi–Davidson in [28] simply turns off the Davidson step. Note that the algorithm in [8], where the “Jacobi” step consists of solving approximately a certain trust-region-like problem, shows promising numerical results even without using a “Davidson” step. At the other extreme, the workings of the Jacobi–Davidson approach [38] can be exploited to let the Davidson step compensate for a crude approximation of the Jacobi update. In LOBPCG, the balance of the workload between the Jacobi-like step (computation of Hk) and the Davidson-like step (computation of Xk+1 from [Xk, Hk, Pk] by a Ritz process) depends much on the complexity of the chosen preconditioner; we refer, e.g., to [5, 27] for more information on preconditioners in LOBPCG.

Note that in an eigenvalue method for a matrix A, the structure of A and the nature of the preconditioner will affect the computational burden on the Jacobi-like step, whereas the Davidson-like step, if implemented efficiently, should require only some orthogonalization routines and be largely independent of the cost of the operators. Hence, when the operators are inexpensive, it becomes more affordable to require a higher accuracy in the Jacobi-like step. We refer to [35, 24, 23, 36] for further work along these lines.
Finally, we point out that there is not necessarily a unique way of separating the instructions of an iterative loop into a Jacobi-like step and a Davidson-like step that satisfy the conditions for the global convergence analysis. For example, the application of a preconditioner can be considered as part of the Jacobi-like step or as part of the acceleration step if the preconditioning leads to an acceleration bound (1).

Acknowledgments. This work benefited in particular from discussions with Chris Baker, Bill Hager, Ekkehard Sachs, and Gerard Sleijpen. Special thanks to Chris Baker for his helpful comments on the manuscript.

REFERENCES
[1] P.-A. Absil, C. G. Baker, and K. A. Gallivan, A truncated-CG style method for symmetric generalized eigenvalue problems, J. Comput. Appl. Math., 189 (2006), pp. 274–285.
[2] P.-A. Absil, C. G. Baker, and K. A. Gallivan, Trust-region methods on Riemannian manifolds, Found. Comput. Math., 7 (2007), pp. 303–330.
[3] R. L. Adler, J.-P. Dedieu, J. Y. Margulies, M. Martens, and M. Shub, Newton’s method on Riemannian manifolds and a geometric model for the human spine, IMA J. Numer. Anal., 22 (2002), pp. 359–390.
[4] P.-A. Absil and K. A. Gallivan, Accelerated Line-search and Trust-region Methods, Technical report FSU-SCS-2005-095, School of Computational Science, Florida State University, Tallahassee, FL, 2005.
[5] P. Arbenz, U. L. Hetmaniuk, R. B. Lehoucq, and R. S. Tuminaro, A comparison of eigensolvers for large-scale 3D modal analysis using AMG-preconditioned iterative methods, Internat. J. Numer. Methods Engrg., 64 (2005), pp. 204–236.
[6] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, NJ, 2008.
[7] L. Armijo, Minimization of functions having Lipschitz continuous first partial derivatives, Pacific J. Math., 16 (1966), pp. 1–3.
[8] C. G. Baker, P.-A. Absil, and K. A. Gallivan, An implicit trust-region method on Riemannian manifolds, IMA J. Numer. Anal., to appear.
[9] C. G. Baker, Riemannian Manifold Trust-region Methods with Applications to Eigenproblems, Ph.D. thesis, School of Computational Science, Florida State University, Tallahassee, FL, 2008.
[10] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, 1995.
[11] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-Region Methods, MPS/SIAM Ser. Optim. 1, SIAM, Philadelphia, 2000.
[12] E. R. Davidson, The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices, J. Comput. Phys., 17 (1975), pp. 87–94.
[13] M. P. do Carmo, Riemannian geometry, Math. Theory Appl., Birkhäuser Boston, Boston, MA, 1992. Translated from the second Portuguese edition by Francis Flaherty.
[14] A. Edelman, T. A. Arias, and S. T. Smith, The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal. Appl., 20 (1998), pp. 303–353.
[15] D. R. Fokkema, G. L. G. Sleijpen, and H. A. Van der Vorst, Accelerated inexact Newton schemes for large systems of nonlinear equations, SIAM J. Sci. Comput., 19 (1998), pp. 657–674.
[16] D. R. Fokkema, G. L. G. Sleijpen, and H. A. van der Vorst, Jacobi–Davidson style QR and QZ algorithms for the reduction of matrix pencils, SIAM J. Sci. Comput., 20 (1998), pp. 94–125.
[17] D. Gabay, Minimizing a differentiable function over a differential manifold, J. Optim. Theory Appl., 37 (1982), pp. 177–219.
[18] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins S. Math. Sci., Johns Hopkins University Press, Baltimore, MD, 1996.
[19] G. H. Golub and Q. Ye, An inverse free preconditioned Krylov subspace method for symmetric generalized eigenvalue problems, SIAM J. Sci. Comput., 24 (2002), pp. 312–334.
[20] W. W. Hager, Minimizing a quadratic over a sphere, SIAM J. Optim., 12 (2001), pp. 188–208.
[21] U. Hetmaniuk and R. Lehoucq, Basis selection in LOBPCG, J. Comput. Phys., 218 (2006), pp. 324–332.
[22] U. Helmke and J. B. Moore, Optimization and Dynamical Systems, Comm. Control Engrg. Ser., Springer-Verlag, London, 1994.
[23] M. E. Hochstenbach and Y. Notay, Controlling Inner Iterations in the Jacobi-Davidson Method, SIAM J. Matrix Anal. Appl., to appear.
[24] M. E. Hochstenbach and Y. Notay, The Jacobi-Davidson method, GAMM Mitt. Ges. Angew. Math. Mech., 29 (2006), pp. 368–382.
[25] W. W. Hager and S. Park, Global convergence of SSM for minimizing a quadratic over a sphere, Math. Comp., 74 (2005), pp. 1413–1423.
[26] A. V. Knyazev, Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method, SIAM J. Sci. Comput., 23 (2001), pp. 517–541.
[27] I. Lashuk, M. Argenti, E. Ovtchinnikov, and A. Knyazev, Preconditioned eigensolver LOBPCG in Hypre and PETSc, in Domain Decomposition Methods in Science and Engineering XVI, Lect. Notes Comput. Sci. Eng. 55, Springer-Verlag, Berlin, 2007.
[28] Y. Notay, Combination of Jacobi-Davidson and conjugate gradients for the partial symmetric eigenproblem, Numer. Linear Algebra Appl., 9 (2004), pp. 21–44.
[29] Y. Notay, Is Jacobi–Davidson faster than Davidson?, SIAM J. Matrix Anal. Appl., 26 (2004), pp. 522–543.
[30] J. Nocedal and S. J. Wright, Numerical Optimization, Springer Ser. Oper. Res., Springer-Verlag, New York, 1999.
[31] G. Narkiss and M. Zibulevsky, Sequential Subspace Optimization Method for Large-Scale Unconstrained Problems, Technical report CCIT 559, EE Dept., Technion, Haifa, Israel, 2005.
[32] M. Shub, Some remarks on dynamical systems and numerical analysis, in Dynamical Systems and Partial Differential Equations, Proceedings of the VII ELAM, L. Lara-Carrero and J. Lewowicz, eds., Equinoccio, Universidad Simón Bolívar, Caracas, 1986, pp. 69–91.
[33] S. T. Smith, Geometric Optimization Methods for Adaptive Filtering, Ph.D. thesis, Division of Applied Sciences, Harvard University, Cambridge, MA, 1993.
[34] S. T. Smith, Optimization techniques on Riemannian manifolds, in Hamiltonian and Gradient Flows, Algorithms and Control, Fields Inst. Commun. 3, American Mathematical Society, Providence, RI, 1994, pp. 113–136.
[35] A. Stathopoulos and Y. Saad, Restarting techniques for the (Jacobi-)Davidson symmetric eigenvalue methods, Electron. Trans. Numer. Anal., 7 (1998), pp. 163–181.
[36] A. Stathopoulos, Nearly optimal preconditioned methods for Hermitian eigenproblems under limited memory. Part I: Seeking one eigenvalue, SIAM J. Sci. Comput., 29 (2007), pp. 481–514.
[37] T. Steihaug, The conjugate gradient method and trust regions in large scale optimization, SIAM J. Numer. Anal., 20 (1983), pp. 626–637.
[38] G. L. G. Sleijpen and H. A. Van der Vorst, A Jacobi–Davidson iteration method for linear eigenvalue problems, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 401–425.
[39] I. Takahashi, A note on the conjugate gradient method, Inform. Process. Japan, 5 (1965), pp. 45–49.
[40] C. Udrişte, Convex Functions and Optimization Methods on Riemannian Manifolds, Math. Appl. 297, Kluwer Academic, Dordrecht, the Netherlands, 1994.
[41] Y. Yang, Globally convergent optimization algorithms on Riemannian manifolds: Uniform framework for unconstrained and constrained optimization, J. Optim. Theory Appl., 132 (2007), pp. 245–265.
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1019–1037
© 2009 Society for Industrial and Applied Mathematics
ON PRECONDITIONED ITERATIVE METHODS FOR CERTAIN TIME-DEPENDENT PARTIAL DIFFERENTIAL EQUATIONS∗
ZHONG-ZHI BAI†, YU-MEI HUANG‡, AND MICHAEL K. NG§

Abstract. When the Newton method or the fixed-point method is employed to solve the systems of nonlinear equations arising in the sinc-Galerkin discretization of certain time-dependent partial differential equations, in each iteration step we need to solve a structured subsystem of linear equations iteratively by, for example, a Krylov subspace method such as the preconditioned GMRES. In this paper, based on the tensor and the Toeplitz structures of the linear subsystems we construct structured preconditioners for their coefficient matrices and estimate the eigenvalue bounds of the preconditioned matrices under certain assumptions. Numerical examples are given to illustrate the effectiveness of the proposed preconditioning methods. It has been shown that a combination of the Newton/fixed-point iteration with the preconditioned GMRES method is efficient and robust for solving the systems of nonlinear equations arising from the sinc-Galerkin discretization of the time-dependent partial differential equations.

Key words. time-dependent partial differential equation, sinc-Galerkin discretization, Toeplitz-like matrix, preconditioning, eigenvalue bound, GMRES method

AMS subject classifications. 65F10, 65F15, 65T10; CR: G1.3

DOI. 10.1137/080718176
1. Introduction. We consider the numerical solution of time-dependent partial differential equations of the form
\[
\begin{cases}
p_t(t)\,\dfrac{\partial u}{\partial t}(x,t) + p_x(x)\,u(x,t)\,\dfrac{\partial u}{\partial x}(x,t) - \varepsilon\,\dfrac{\partial^2 u}{\partial x^2}(x,t) = f(x,t), & a < x < b,\ t \ge 0,\\[4pt]
u(a,t) = \gamma(t) \ \text{and}\ u(b,t) = \delta(t), & t \ge 0,\\[2pt]
u(x,0) = g(x), & a \le x \le b,
\end{cases}
\tag{1.1}
\]
where $p_z(z)$, $z \in \{x,t\}$, are given continuously differentiable functions, $f(x,t)$, $\gamma(t)$, $\delta(t)$, and $g(x)$ are given bounded functions, and $\varepsilon$ is a prescribed small positive parameter. Note that when $p_z(z) \equiv 1$, $z \in \{x,t\}$, the partial differential equation (1.1) reduces to the Burgers equation; see [16] for more details.

When the time-dependent partial differential equation (1.1) is discretized by the sinc-Galerkin method, in an analogous approach to [5] we can obtain systems of nonlinear equations of the form
\[
F(u) := Bu + C\Psi(u) - b = 0,
\tag{1.2}
\]
∗ Received by the editors March 11, 2008; accepted for publication (in revised form) October 13, 2008; published electronically February 13, 2009. http://www.siam.org/journals/sinum/47-2/71817.html
† State Key Laboratory of Scientific/Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, P.O. Box 2719, Beijing 100080, People’s Republic of China (bzz@lsec.cc.ac.cn). This author’s research was supported by The National Basic Research Program (2005CB321702) and The National Outstanding Young Scientist Foundation (10525102), People’s Republic of China.
‡ School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, People’s Republic of China.
§ Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong ([email protected]). This author’s research was supported in part by RGC grants 7046/03P, 7035/04P, and 7035/05P and FRG/04-05/II-51.
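where $B$ and $C$ are known $n$-by-$n$ matrices, $b$ is a given $n$-vector, and $\Psi : \mathbb{R}^n \to \mathbb{R}^n$, with
\[
\Psi(u) = (\psi_1(u_1), \psi_2(u_2), \ldots, \psi_n(u_n))^T \quad\text{and}\quad u = (u_1, u_2, \ldots, u_n)^T,
\]
is a continuous diagonal mapping defined on the open ball $U_\delta := \{u \in \mathbb{R}^n \mid \|u\| < \delta\}$. Here, $\delta$ is a positive constant. The matrices $B$ and $C$ are given by
\[
B = \varepsilon\left(T_x^{(2)} + D_x^{(1)} T_x^{(1)} + T_x^{(1)} D_x^{(1)} + D_x^{(2)}\right) \otimes Q_t + Q_x \otimes \left(D_t^{(3)} T_t^{(1)} + T_t^{(1)} D_t^{(3)} + D_t^{(4)}\right)
\tag{1.3}
\]
and
\[
C = \left(D_x^{(3)} T_x^{(1)} + T_x^{(1)} D_x^{(3)} + D_x^{(4)}\right) \otimes Q_t,
\tag{1.4}
\]
and the mapping $\Psi$ is given by
\[
\Psi(u) = \left(u_1^2, u_2^2, \ldots, u_n^2\right)^T,
\tag{1.5}
\]
where $T_z^{(i)}$ ($i = 1, 2$ and $z \in \{x,t\}$) are $(m_z+n_z+1)$-by-$(m_z+n_z+1)$ Toeplitz matrices, with
\[
T_z^{(1)} =
\begin{pmatrix}
0 & -1 & \frac{1}{2} & \cdots & \frac{(-1)^{m_z+n_z}}{m_z+n_z}\\
1 & 0 & -1 & \ddots & \vdots\\
-\frac{1}{2} & 1 & \ddots & \ddots & \frac{1}{2}\\
\vdots & \ddots & \ddots & 0 & -1\\
-\frac{(-1)^{m_z+n_z}}{m_z+n_z} & \cdots & -\frac{1}{2} & 1 & 0
\end{pmatrix},
\tag{1.6}
\]
\[
T_z^{(2)} =
\begin{pmatrix}
\frac{\pi^2}{3} & -2 & \frac{2}{2^2} & \cdots & \frac{(-1)^{m_z+n_z}\,2}{(m_z+n_z)^2}\\
-2 & \frac{\pi^2}{3} & -2 & \ddots & \vdots\\
\frac{2}{2^2} & -2 & \ddots & \ddots & \frac{2}{2^2}\\
\vdots & \ddots & \ddots & \frac{\pi^2}{3} & -2\\
\frac{(-1)^{m_z+n_z}\,2}{(m_z+n_z)^2} & \cdots & \frac{2}{2^2} & -2 & \frac{\pi^2}{3}
\end{pmatrix},
\tag{1.7}
\]
and $D_z^{(i)}$ and $Q_z$ ($i = 1, 2, 3, 4$ and $z \in \{x,t\}$) are $(m_z+n_z+1)$-by-$(m_z+n_z+1)$ diagonal matrices, with
\[
D_z^{(1)} = \frac{h_z}{2} \cdot \mathrm{diag}\left\{ -\frac{\varphi_z''(z)}{(\varphi_z'(z))^2} - \frac{2\omega_z'(z)}{\varphi_z'(z)\,\omega_z(z)} \right\}_{z=-m_z}^{n_z},
\tag{1.8}
\]
\[
D_z^{(2)} = \frac{h_z^2}{2} \cdot \mathrm{diag}\left\{ -\frac{\omega_z''(z)}{(\varphi_z'(z))^2\,\omega_z(z)} \right\}_{z=-m_z}^{n_z},
\tag{1.9}
\]
\[
D_z^{(3)} = \frac{h_z}{2} \cdot \mathrm{diag}\left\{ -p_z(z)\,\omega_z(z) \right\}_{z=-m_z}^{n_z},
\tag{1.10}
\]
\[
D_z^{(4)} = \frac{h_z^2}{2} \cdot \mathrm{diag}\left\{ -\frac{(p_z(z)\,\omega_z(z))'}{\varphi_z'(z)} \right\}_{z=-m_z}^{n_z},
\tag{1.11}
\]
and
\[
Q_z = \mathrm{diag}\left\{ \frac{\omega_z(z)}{\varphi_z'(z)} \right\}_{z=-m_z}^{n_z}.
\tag{1.12}
\]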
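As a small illustration, the Toeplitz matrices in (1.6) and (1.7) can be assembled directly from their entries. The following NumPy sketch is not part of the original paper; the function name `sinc_toeplitz` and the size argument `N` (standing for $m_z + n_z + 1$) are assumptions made here for illustration. It also checks numerically the symmetry properties and the eigenvalue ranges that the paper attributes to these matrices.

```python
import numpy as np

def sinc_toeplitz(N):
    """Return (T1, T2): N-by-N Toeplitz matrices with the entries of (1.6) and (1.7).

    N plays the role of m_z + n_z + 1 (an illustrative name, not from the paper).
    """
    idx = np.arange(N)
    k = idx[None, :] - idx[:, None]           # k[r, c] = c - r (column minus row)
    sign = np.where(k % 2 == 0, 1.0, -1.0)    # (-1)**k, also valid for negative k
    k_safe = np.where(k == 0, 1, k)           # avoid division by zero on the diagonal
    T1 = np.where(k == 0, 0.0, sign / k_safe)                     # 0 on the diagonal
    T2 = np.where(k == 0, np.pi**2 / 3, 2.0 * sign / k_safe**2)   # pi^2/3 on the diagonal
    return T1, T2

T1, T2 = sinc_toeplitz(9)
print("T1 skew-symmetric:", np.allclose(T1, -T1.T))
print("T2 symmetric:     ", np.allclose(T2, T2.T))
print("eig(T2) range:    ", np.linalg.eigvalsh(T2)[[0, -1]])
print("|eig(T1)| max:    ", np.abs(np.linalg.eigvalsh(1j * T1)).max())
```

Since `1j * T1` is Hermitian whenever `T1` is real skew-symmetric, `eigvalsh` can be used for both matrices.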
Here, $m_x, n_x$ and $m_t, n_t$ are positive integers representing the numbers of the bases used in the spatial and the temporal spaces, respectively, $\varphi_x(x)$ and $\varphi_t(t)$ are the restrictions of the conformal mapping $\varphi_z(z)$ onto the real intervals $(a,b)$ and $(0,+\infty)$, respectively, with $\varphi_z(z)$ a mapping from a simply connected domain $D$ onto $D_d := \{z \mid z = x + \imath y,\ |y| < d,\ d > 0\}$, with $\imath$ the imaginary unit; and $\omega_x(x)$ and $\omega_t(t)$ are two weighting functions with respect to the spatial and the temporal variables, respectively. See [16, 5] for a detailed description about the sinc-Galerkin discretization. We remark that the first and the second derivatives of $\varphi_z(z)$ and $\omega_z(z)$ with respect to the variable $z$ will be denoted as $\varphi_z'(z)$, $\omega_z'(z)$ and $\varphi_z''(z)$, $\omega_z''(z)$, respectively, and the matrices $T_z^{(1)}$, $z \in \{x,t\}$, defined in (1.6) are skew-symmetric, while the matrices $T_z^{(2)}$, $z \in \{x,t\}$, defined in (1.7) are symmetric positive definite; see Lemmas 2.1 and 2.2.

The system of nonlinear equations (1.2) is usually termed a mildly nonlinear system in the literature; see [19, 21] for general backgrounds and applications, [2, 5] for the basic existence and uniqueness theory about the solution, and [1, 2, 7, 8, 21, 22] for several splitting iteration methods in the sequential and parallel computing senses.

When the system of mildly nonlinear equations (1.2) is solved by the Newton or the fixed-point iteration method, at each step we need to solve a subsystem of linear equations of the form
\[
(B + CD)z = r,
\tag{1.13}
\]
where D is a diagonal matrix approximating the Jacobian matrix of the mapping Ψ : Rn → Rn and r is the current residual vector. Unfortunately, direct methods such as the Gaussian elimination or the fast Toeplitz algorithms [15, 14] are not applicable to effectively solve this class of diagonally scaled Toeplitz-plus-diagonal linear systems due to the considerably high computational complexity; see [9, 10, 11, 12, 13]. However, noticing that the matrix-vector product (B+CD)q can be computed in O(n log n) operations for any vector q ∈ Rn , we can employ Krylov subspace iteration methods such as GMRES [20] to iteratively solve the linear subsystem (1.13) in an economical cost. Usually, in order to accelerate the convergence speeds of the Krylov subspace iteration methods, we need to precondition the linear subsystem (1.13) by a good approximating matrix with respect to the coefficient matrix A := B + CD. Therefore, in order to solve the original linear subsystem, we turn to solving the corresponding preconditioned linear subsystem instead; see [6, 5] and the references therein. In this paper, we construct a structured preconditioner M for the matrix A by making use of the tensor-product structure of the original matrix A and the diagonally
scaled Toeplitz-plus-diagonal structure of the matrix blocks involved. The positive definiteness of both matrices $A$ and $M$ is discussed in detail, and the eigenvalue bounds about the preconditioned matrix $M^{-1}A$ are estimated precisely by utilizing the generalized Bendixson theorem [6]. Theoretical analysis shows that the eigenvalues of the matrix $M^{-1}A$ are tightly and uniformly bounded in a rectangle on the complex plane independent of the size of the matrix. Numerical implementations show that the Newton-GMRES and the fixed-point-GMRES iteration methods, when incorporated with the structured preconditioner $M$, are effective and robust nonlinear solvers for the systems of mildly nonlinear equations arising from the sinc-Galerkin discretization of the referred time-dependent partial differential equations.

The organization of the paper is as follows. In section 2, we construct a structured preconditioner for the coefficient matrix of the linear subsystem (1.13) and analyze basic properties of the original and the preconditioning matrices. In section 3, we demonstrate several preliminary results associated with the spectral analysis of the preconditioned matrix. The eigenvalue bounds of the preconditioned matrix are estimated in section 4, and numerical examples are given in section 5 to show the effectiveness of the proposed preconditioning and the corresponding preconditioned iteration methods. Finally, in section 6, we end this paper with some concluding remarks.

2. The structured preconditioners. Consider the system of mildly nonlinear equations (1.2), with the function $\Psi(u)$ being given in (1.5) and the matrices $B$ and $C$ being given in (1.3) and (1.4), respectively, where $T_z^{(i)}$ ($i = 1, 2$, $z \in \{x,t\}$), $D_z^{(i)}$ ($i = 1, 2, 3, 4$ and $z \in \{x,t\}$), and $Q_z$ ($z \in \{x,t\}$) are defined in (1.6)–(1.12). Denote by $I$ the identity matrix. Let $\Omega$ be a positive definite diagonal matrix such that $D := I \otimes \Omega$ is an approximation to the Jacobian matrix of $\Psi(u)$.
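Then the target matrix under consideration is
\[
\begin{aligned}
A = B + CD ={}& \varepsilon\left(T_x^{(2)} + D_x^{(1)} T_x^{(1)} + T_x^{(1)} D_x^{(1)} + D_x^{(2)}\right) \otimes Q_t + Q_x \otimes \left(D_t^{(3)} T_t^{(1)} + T_t^{(1)} D_t^{(3)} + D_t^{(4)}\right)\\
&+ \left(D_x^{(3)} T_x^{(1)} + T_x^{(1)} D_x^{(3)} + D_x^{(4)}\right) \otimes (Q_t \Omega).
\end{aligned}
\tag{2.1}
\]
By utilizing the special structure of the matrix $A$, we can construct its preconditioner $M$ as
\[
\begin{aligned}
M = \widetilde{B} + \widetilde{C} D ={}& \varepsilon\left(B_x^{(2)} + D_x^{(1)} B_x^{(1)} + B_x^{(1)} D_x^{(1)} + D_x^{(2)}\right) \otimes Q_t + Q_x \otimes \left(D_t^{(3)} B_t^{(1)} + B_t^{(1)} D_t^{(3)} + D_t^{(4)}\right)\\
&+ \left(D_x^{(3)} B_x^{(1)} + B_x^{(1)} D_x^{(3)} + D_x^{(4)}\right) \otimes (Q_t \Omega),
\end{aligned}
\tag{2.2}
\]
where
\[
\widetilde{B} = \varepsilon\left(B_x^{(2)} + D_x^{(1)} B_x^{(1)} + B_x^{(1)} D_x^{(1)} + D_x^{(2)}\right) \otimes Q_t + Q_x \otimes \left(D_t^{(3)} B_t^{(1)} + B_t^{(1)} D_t^{(3)} + D_t^{(4)}\right)
\]
and
\[
\widetilde{C} = \left(D_x^{(3)} B_x^{(1)} + B_x^{(1)} D_x^{(3)} + D_x^{(4)}\right) \otimes Q_t,
\]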
and, for $z \in \{x,t\}$,
\[
B_z^{(1)} = \mathrm{tridiag}[1, 0, -1] \quad\text{and}\quad B_z^{(2)} = \mathrm{tridiag}[-1, 2, -1]
\tag{2.3}
\]
are tridiagonal approximations to $T_z^{(1)}$ and $T_z^{(2)}$, respectively. Note that the preconditioning matrix $M$ is obtained by replacing only $T_z^{(i)}$ ($i = 1, 2$, $z \in \{x,t\}$) in the matrix $A$ by $B_z^{(i)}$ ($i = 1, 2$, $z \in \{x,t\}$), correspondingly.

We remark that the preconditioner $M$ is a block tridiagonal matrix and is usually of mild size because, compared with the finite-difference system, the sinc-Galerkin system need not be very large in order to achieve the same discretization accuracy [17, 18, 5]. Therefore, for any given vector $r$, the generalized residual equation $Mw = r$ involved in the preconditioned GMRES iteration method can be solved in $O(N_x N_t^2)$ or $O(N_x^2 N_t)$ operations by using a variety of linear solvers such as the sparse direct methods, where $N_z = m_z + n_z + 1$, with $z \in \{x,t\}$.

It was proved in [16] that the Toeplitz matrix $T_x^{(2)}$ is symmetric positive definite and its eigenvalues are located in a positive interval. This result, together with some eigenproperties of the Toeplitz matrices $T_z^{(1)}$ ($z \in \{x,t\}$), is precisely described in the following lemma.

Lemma 2.1 (see [16, Theorems 4.18 and 4.19]). Let the matrices $T_z^{(1)}$ ($z \in \{x,t\}$) and $T_x^{(2)}$ be defined as in (1.6) and (1.7), respectively. Then
(i) for $z \in \{x,t\}$, $T_z^{(1)}$ is a skew-symmetric matrix and its eigenvalues $\{\imath\lambda_j^{(1)}\}_{j=-m_z}^{n_z}$ satisfy $\lambda_j^{(1)} \in [-\pi, \pi]$, $-m_z \le j \le n_z$;
(ii) $T_x^{(2)}$ is a symmetric positive definite matrix and its eigenvalues $\{\lambda_j^{(2)}\}_{j=-m_x}^{n_x}$ satisfy $\lambda_j^{(2)} \in \left[4\sin^2\left(\frac{\pi}{2(N_x+1)}\right), \pi^2\right]$, where $N_x = m_x + n_x + 1$.

Analogously, the structural properties and the eigenvalue locations about the matrices $B_z^{(1)}$ ($z \in \{x,t\}$) and $B_x^{(2)}$ are precisely described in the following lemma; see [4].

Lemma 2.2 (see [4, Lemma A.1]). Let the matrices $B_z^{(1)}$ ($z \in \{x,t\}$) and $B_x^{(2)}$ be defined as in (2.3). Then
(i) for $z \in \{x,t\}$, $B_z^{(1)}$ is a skew-symmetric matrix and its eigenvalues $\{\imath\lambda_j^{(1)}\}_{j=-m_z}^{n_z}$ satisfy $\lambda_j^{(1)} \in \left[-\cos\left(\frac{\pi}{N_z+1}\right), \cos\left(\frac{\pi}{N_z+1}\right)\right]$, $-m_z \le j \le n_z$, where $N_z = m_z + n_z + 1$;
(ii) $B_x^{(2)}$ is a symmetric positive definite matrix and its eigenvalues $\{\lambda_j^{(2)}\}_{j=-m_x}^{n_x}$ satisfy $\lambda_j^{(2)} \in \left[4\sin^2\left(\frac{\pi}{2(N_x+1)}\right), 4\cos^2\left(\frac{\pi}{2(N_x+1)}\right)\right]$, where $N_x = m_x + n_x + 1$.

Based on these two lemmas, we now demonstrate the positive definiteness of the matrix $A$ defined in (2.1) and its preconditioning matrix $M$ defined in (2.2). To this end, in what follows we use $(\cdot)^*$ to denote the conjugate transpose of either a vector or a square matrix. For a given square matrix $X$, we use $H(X)$ and $S(X)$ to denote, respectively, its Hermitian and skew-Hermitian parts [4] and $\lambda(X)$ its spectral set.

Theorem 2.1. Assume that $D_x^{(2)}$, $D_x^{(4)}$, and $D_t^{(4)}$ are positive semidefinite diagonal matrices and $Q_z$ ($z \in \{x,t\}$) and $\Omega$ are positive definite diagonal matrices. Then both $H(A)$ and $H(M)$ are symmetric positive definite matrices. Hence, $A$ and $M$ are positive definite$^1$ and, thus, are nonsingular.

$^1$ A matrix is positive definite if its Hermitian part is positive definite. Note that a positive definite matrix is not necessarily Hermitian; see [4, 3].
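Proof. The Hermitian and the skew-Hermitian parts of $A$ and $M$ are
\[
\begin{aligned}
H(A) &= \tfrac{1}{2}(A + A^*) = \varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t + Q_x \otimes D_t^{(4)} + D_x^{(4)} \otimes (Q_t\Omega),\\
S(A) &= \tfrac{1}{2}(A - A^*) = \varepsilon\left(D_x^{(1)} T_x^{(1)} + T_x^{(1)} D_x^{(1)}\right) \otimes Q_t + Q_x \otimes \left(D_t^{(3)} T_t^{(1)} + T_t^{(1)} D_t^{(3)}\right) + \left(D_x^{(3)} T_x^{(1)} + T_x^{(1)} D_x^{(3)}\right) \otimes (Q_t\Omega)
\end{aligned}
\]
and
\[
\begin{aligned}
H(M) &= \tfrac{1}{2}(M + M^*) = \varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t + Q_x \otimes D_t^{(4)} + D_x^{(4)} \otimes (Q_t\Omega),\\
S(M) &= \tfrac{1}{2}(M - M^*) = \varepsilon\left(D_x^{(1)} B_x^{(1)} + B_x^{(1)} D_x^{(1)}\right) \otimes Q_t + Q_x \otimes \left(D_t^{(3)} B_t^{(1)} + B_t^{(1)} D_t^{(3)}\right) + \left(D_x^{(3)} B_x^{(1)} + B_x^{(1)} D_x^{(3)}\right) \otimes (Q_t\Omega).
\end{aligned}
\]
Because the diagonal matrices $D_x^{(2)}$, $D_x^{(4)}$, and $D_t^{(4)}$ are positive semidefinite, the diagonal matrices $Q_z$ ($z \in \{x,t\}$) and $\Omega$ are positive definite, and from Lemma 2.1 the Toeplitz matrix $T_x^{(2)}$ is symmetric positive definite, we know that $H(A)$ is symmetric positive definite. Therefore, $A$ is a positive definite matrix and, thus, is nonsingular.

From Lemma 2.2 the matrix $B_x^{(2)}$ is symmetric positive definite. By applying the same arguments to the preconditioning matrix $M$, we can immediately show that $M$ is positive definite and nonsingular, too.

3. Several preliminary lemmas. In this section, we demonstrate several lemmas that are indispensable for estimating the eigenvalue bounds of the preconditioned matrix $M^{-1}A$.

Lemma 3.1. Let $\Delta = \mathrm{diag}(\delta_1, \delta_2, \ldots, \delta_n)$ be an $n$-by-$n$ positive diagonal matrix and $H \in \mathbb{C}^{n\times n}$ be a Hermitian positive definite matrix. Then it holds that
\[
\frac{v^*(\Delta \otimes H)v}{v^*(H \otimes \Delta)v} \le \kappa(\Delta)\,\kappa(H) \quad \forall v \in \mathbb{C}^n \setminus \{0\},
\]
where $\kappa(\cdot)$ denotes the Euclidean condition number of the corresponding matrix.

Proof. Because $H \in \mathbb{C}^{n\times n}$ is a Hermitian positive definite matrix, there exist a unitary matrix $U \in \mathbb{C}^{n\times n}$ and a positive diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) \in \mathbb{R}^{n\times n}$ such that $H = U^*\Lambda U$. Therefore, for all $v \in \mathbb{C}^n \setminus \{0\}$ we have
\[
\begin{aligned}
\frac{v^*(\Delta \otimes H)v}{v^*(H \otimes \Delta)v}
&= \frac{v^*[\Delta \otimes (U^*\Lambda U)]v}{v^*[(U^*\Lambda U) \otimes \Delta]v}
= \frac{v^*[(I \otimes U)^*(\Delta \otimes \Lambda)(I \otimes U)]v}{v^*[(U \otimes I)^*(\Lambda \otimes \Delta)(U \otimes I)]v}\\
&\le \frac{\max_{1\le \ell, j\le n}\{\delta_\ell \lambda_j\}}{\min_{1\le \ell, j\le n}\{\delta_j \lambda_\ell\}}
= \frac{\max_{1\le \ell\le n}\delta_\ell}{\min_{1\le \ell\le n}\delta_\ell} \cdot \frac{\max_{1\le \ell\le n}\lambda_\ell}{\min_{1\le \ell\le n}\lambda_\ell}
= \kappa(\Delta)\,\kappa(H).
\end{aligned}
\]

While Lemma 3.1 gives an upper bound about the generalized Rayleigh quotient with respect to the Hermitian positive definite matrix $H$, the following lemma presents an estimate about the generalized Rayleigh quotient with respect to the Hermitian and the skew-Hermitian matrices $H$ and $S$.

Lemma 3.2. Let $\Gamma = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_n)$ and $\Delta = \mathrm{diag}(\delta_1, \delta_2, \ldots, \delta_n)$ be $n$-by-$n$ positive diagonal matrices, $H \in \mathbb{C}^{n\times n}$ be a Hermitian positive definite matrix, and $S \in \mathbb{C}^{n\times n}$ be a skew-Hermitian matrix. Then it holds that
\[
\frac{|v^*(S \otimes \Gamma)v|}{v^*(H \otimes \Delta)v} \le \tau\,\frac{|v^*(S \otimes \Gamma)v|}{v^*(H \otimes \Gamma)v} \quad \forall v \in \mathbb{C}^n \setminus \{0\},
\]
where $\tau = \max_{1\le \ell\le n}\left\{\frac{\gamma_\ell}{\delta_\ell}\right\}$.

Proof. Because $H \in \mathbb{C}^{n\times n}$ is Hermitian positive definite, there exist a unitary matrix $U \in \mathbb{C}^{n\times n}$ and a positive diagonal matrix $\Lambda \in \mathbb{R}^{n\times n}$ such that $H = U^*\Lambda U$. Since $\delta_\ell \ge \gamma_\ell/\tau$ for every $\ell$, for all $v \in \mathbb{C}^n \setminus \{0\}$ we have
\[
\begin{aligned}
v^*(H \otimes \Delta)v &= v^*(U^*\Lambda U \otimes \Delta)v = v^*\big((U^* \otimes I)(\Lambda \otimes \Delta)(U \otimes I)\big)v\\
&\ge \frac{1}{\tau}\Big(v^*\big((U^* \otimes I)(\Lambda \otimes \Gamma)(U \otimes I)\big)v\Big) = \frac{1}{\tau}\big(v^*(H \otimes \Gamma)v\big).
\end{aligned}
\]
It then follows that
\[
\frac{|v^*(S \otimes \Gamma)v|}{v^*(H \otimes \Delta)v} \le \tau\,\frac{|v^*(S \otimes \Gamma)v|}{v^*(H \otimes \Gamma)v}.
\]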
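The two Kronecker-product estimates above can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the matrices are random test data (not the sinc-Galerkin matrices), the helper name `quad` is introduced here, and the vectors are taken of length $n^2$ so that they conform with the $n^2$-by-$n^2$ Kronecker products. It checks the bound of Lemma 3.1 and the inequality of Lemma 3.2 in the form derived in its proof, with the denominator $v^*(H \otimes \Delta)v$ on the left-hand side.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random test data (illustrative only).
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = X @ X.conj().T + n * np.eye(n)           # Hermitian positive definite
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = Y - Y.conj().T                           # skew-Hermitian
delta = rng.uniform(0.5, 2.0, n)             # diagonal of Delta
gamma = rng.uniform(0.5, 2.0, n)             # diagonal of Gamma
Delta, Gamma = np.diag(delta), np.diag(gamma)

def quad(M, v):
    """Quadratic form v^* M v."""
    return np.vdot(v, M @ v)

bound_31 = np.linalg.cond(Delta, 2) * np.linalg.cond(H, 2)   # kappa(Delta) * kappa(H)
tau = np.max(gamma / delta)

ok_31, ok_32 = True, True
for _ in range(200):
    v = rng.standard_normal(n * n) + 1j * rng.standard_normal(n * n)
    # Lemma 3.1: v*(Delta x H)v / v*(H x Delta)v <= kappa(Delta) kappa(H)
    q31 = quad(np.kron(Delta, H), v).real / quad(np.kron(H, Delta), v).real
    ok_31 = ok_31 and bool(q31 <= bound_31 * (1 + 1e-12))
    # Lemma 3.2: |v*(S x Gamma)v| / v*(H x Delta)v <= tau |v*(S x Gamma)v| / v*(H x Gamma)v
    num = abs(quad(np.kron(S, Gamma), v))
    lhs = num / quad(np.kron(H, Delta), v).real
    rhs = tau * num / quad(np.kron(H, Gamma), v).real
    ok_32 = ok_32 and bool(lhs <= rhs * (1 + 1e-12))

print("Lemma 3.1 bound holds on all samples:", ok_31)
print("Lemma 3.2 bound holds on all samples:", ok_32)
```

A random sample cannot prove the lemmas, of course; it only guards against sign or ordering mistakes when the bounds are used in code.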
The following generalized Bendixson theorem, established in [6], is essential for us to derive a rectangular domain for bounding the eigenvalues of the preconditioned matrix $M^{-1}A$.

Theorem 3.1 (see [6, Theorem 2.4]). Let $A, M \in \mathbb{C}^{n\times n}$ be $n$-by-$n$ complex matrices, and, for all $v \in \mathbb{C}^n \setminus \{0\}$, let it hold that $v^*H(A)v \ne 0$ and $v^*H(M)v \ne 0$. Let the functions $h(v)$, $f_A(v)$, and $f_M(v)$ be defined as
\[
h(v) = \frac{v^*H(A)v}{v^*H(M)v}, \qquad f_A(v) = \frac{1}{\imath} \cdot \frac{v^*S(A)v}{v^*H(A)v}, \qquad\text{and}\qquad f_M(v) = \frac{1}{\imath} \cdot \frac{v^*S(M)v}{v^*H(M)v},
\]
respectively. Assume that there exist positive constants $\gamma_1$ and $\gamma_2$ such that
\[
\gamma_1 \le h(v) \le \gamma_2 \quad \forall v \in \mathbb{C}^n \setminus \{0\}
\]
and nonnegative constants $\eta$ and $\mu$ such that
\[
-\mu \le f_A(v) \le \mu \quad\text{and}\quad -\eta \le f_M(v) \le \eta \quad \forall v \in \mathbb{C}^n \setminus \{0\}.
\]
Then, when $\eta\mu \le 1$, we have
\[
\begin{cases}
\dfrac{(1 - \eta\mu)\gamma_1}{1 + \eta^2} \le \mathrm{Re}\,\lambda\left(M^{-1}A\right) \le (1 + \eta\mu)\gamma_2,\\[8pt]
-(\eta + \mu)\gamma_2 \le \mathrm{Im}\,\lambda\left(M^{-1}A\right) \le (\eta + \mu)\gamma_2.
\end{cases}
\]
Here, $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ represent the real and the imaginary parts of the corresponding complex number, respectively.

In order to derive the bounded domain about the eigenvalues of the matrix $M^{-1}A$ by making use of the generalized Bendixson theorem, we essentially need the bounds of several generalized Rayleigh quotients with respect to certain parts of the matrices $A$ and $M$ defined in (2.1) and (2.2). These bounds are precisely stated in the following two lemmas.

Lemma 3.3 (see [6, Lemma 4.2]). Assume that $D_x^{(2)}$ defined in (1.9) is a positive semidefinite diagonal matrix. Let $T_x^{(2)}$ be the Toeplitz matrix defined in (1.7) and $B_x^{(2)}$ the tridiagonal matrix defined in (2.3), respectively. Then it holds that
\[
1 \le \frac{v^*\left(T_x^{(2)} + D_x^{(2)}\right)v}{v^*\left(B_x^{(2)} + D_x^{(2)}\right)v} \le \frac{\pi^2}{4} \quad \forall v \in \mathbb{C}^n \setminus \{0\}.
\]
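The bound of Lemma 3.3 can be illustrated numerically: the extreme values of the generalized Rayleigh quotient are the extreme generalized eigenvalues of the pair $(T_x^{(2)} + D_x^{(2)},\ B_x^{(2)} + D_x^{(2)})$. The sketch below is a rough check only. The size `N` and the random nonnegative diagonal `D2` are assumptions standing in for $N_x$ and $D_x^{(2)}$; the actual $D_x^{(2)}$ of (1.9) depends on the chosen conformal map and weight.

```python
import numpy as np

N = 12                                        # stands for N_x (illustrative size)
idx = np.arange(N)
k = idx[None, :] - idx[:, None]
sign = np.where(k % 2 == 0, 1.0, -1.0)        # (-1)**k
k_safe = np.where(k == 0, 1, k)               # avoid division by zero on the diagonal
T2 = np.where(k == 0, np.pi**2 / 3, 2.0 * sign / k_safe**2)      # entries of (1.7)
B2 = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)          # tridiag[-1, 2, -1]

rng = np.random.default_rng(1)
D2 = np.diag(rng.uniform(0.0, 1.0, N))        # placeholder positive semidefinite diagonal

# Generalized eigenvalues of (T2 + D2, B2 + D2) through a Cholesky factor of B2 + D2.
L = np.linalg.cholesky(B2 + D2)
Linv = np.linalg.inv(L)
G = Linv @ (T2 + D2) @ Linv.T
lam = np.linalg.eigvalsh(0.5 * (G + G.T))     # symmetrize against rounding errors

print("smallest quotient:", lam.min())
print("largest quotient: ", lam.max())
print("pi^2 / 4:         ", np.pi**2 / 4)
```

The explicit inverse of the Cholesky factor is acceptable at this tiny size; for larger problems a generalized symmetric eigensolver would be the more usual choice.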
Lemma 3.4. Assume that Dx defined in (1.9) is a positive semidefinite diagonal (j) matrix, Qt defined in (1.12) is a positive definite diagonal matrix, and Dz (j = 1, 3, (1) z ∈ {x, t}) are the diagonal matrices defined in (1.8) and (1.10). Let Tz (z ∈ {x, t}) (2) (1) and Tx be the Toeplitz matrices defined in (1.6) and (1.7) and Bz (z ∈ {x, t}) (2) (2) and Bx be the tridiagonal matrices defined in (2.3), respectively. Denote cx = 2 π 4 sin ( 2(Nx +1) ). For z ∈ {x, t}, let Nz = mz + nz + 1 and assume N := Nx = Nt . Define - . - . d¯(j) Dz(j) (j = 1, 3, z ∈ {x, t}), d(2) Dx(2) z = max x = min 1≤≤N
1≤≤N
and (j)
2π d¯z μ(j) z = > , (2) (2) (2) 2 cx + dx π + dx
(j) (2) (2) dx + 4 − dx d¯z ηz(j) = , (2) (2) cx + dx
j = 1, 3,
z ∈ {x, t}.
Then, for j = 1, 3, z ∈ {x, t}, and all v ∈ Cn \ {0}, it holds that ⎧ (j) (1) ⎨ v ∗ Dz Tz + Tz(1) Dz(j) ⊗ Qt v , max (2) (2) ⎩ ∗ T x + Dx ⊗ Qt v v ⎫ v ∗ Q ⊗ D(j) T (1) + T (1) D(j) v ⎬ z z z z t ≤ μ(j) z ⎭ (2) (2) v v ∗ Qt ⊗ T x + Dx
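and
\[
\max\left\{
\frac{\left|v^*\left[\left(D_z^{(j)} B_z^{(1)} + B_z^{(1)} D_z^{(j)}\right) \otimes Q_t\right]v\right|}{v^*\left[\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v},\
\frac{\left|v^*\left[Q_t \otimes \left(D_z^{(j)} B_z^{(1)} + B_z^{(1)} D_z^{(j)}\right)\right]v\right|}{v^*\left[Q_t \otimes \left(B_x^{(2)} + D_x^{(2)}\right)\right]v}
\right\} \le \eta_z^{(j)}.
\]

Proof. By making use of Lemma 2.1, following the same arguments as in the proof of [6, Lemma 4.3] we can obtain these estimates.

4. The spectral analysis. In this section, we will derive precise bounds for the eigenvalues of the preconditioned matrix $M^{-1}A$, where the matrices $A$ and $M$ are defined in (2.1) and (2.2), respectively. To this end, we first estimate the bounds of the function $h(v)$ defined in Theorem 3.1.

Lemma 4.1. Assume that $D_x^{(2)}$ and $D_z^{(4)}$ ($z \in \{x,t\}$) defined in (1.9) and (1.11) are positive semidefinite diagonal matrices and $Q_z$ ($z \in \{x,t\}$) defined in (1.12) and $\Omega$ are positive definite diagonal matrices. Let $T_x^{(2)}$ be the Toeplitz matrix defined in (1.7) and $B_x^{(2)}$ be the tridiagonal matrix defined in (2.3). Then
\[
1 \le \frac{v^*H(A)v}{v^*H(M)v} \le \frac{\pi^2}{4} \quad \forall v \in \mathbb{C}^n \setminus \{0\}.
\tag{4.1}
\]

Proof. For notational simplicity we denote
\[
D_\delta = Q_x \otimes D_t^{(4)} + D_x^{(4)} \otimes (Q_t\Omega) + \delta I,
\]
where $\delta > 0$ is arbitrary. Evidently, $D_\delta$ is a positive definite diagonal matrix. Therefore, for any $v \in \mathbb{C}^n \setminus \{0\}$, according to the proof of Theorem 2.1 we have
\[
\begin{aligned}
\frac{v^*[H(A) + \delta I]v}{v^*[H(M) + \delta I]v}
&= \frac{v^*\left[\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t + D_\delta\right]v}{v^*\left[\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t + D_\delta\right]v}\\
&\le \max\left\{ \frac{v^*\left[\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}{v^*\left[\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v},\ \frac{v^*D_\delta v}{v^*D_\delta v} \right\}\\
&= \max\left\{ \frac{v^*\left[\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}{v^*\left[\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v},\ 1 \right\}.
\end{aligned}
\]
The above inequality follows from the basic inequality
\[
\frac{\beta_1 + \beta_2}{\alpha_1 + \alpha_2} \le \max\left\{\frac{\beta_1}{\alpha_1}, \frac{\beta_2}{\alpha_2}\right\} \quad \forall \alpha_j, \beta_j > 0,\ j = 1, 2.
\]
Based on Lemma 3.3, we can demonstrate the validity of the estimate
\[
\frac{v^*[H(A) + \delta I]v}{v^*[H(M) + \delta I]v} \le \frac{\pi^2}{4}
\]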
in an analogous fashion to [6, Lemma 4.2]. Moreover, as δ > 0 is arbitrary, it then follows that π2 v ∗ H(A)v ≤ . ∗ v H(M )v 4 Similarly, the left-hand side of the inequality (4.1) can be verified. For the bounds of the functions fA (v) and fM (v) defined in Theorem 3.1, we can give the following estimates. (2) (4) Lemma 4.2. Assume that Dx and Dz (z ∈ {x, t}) defined in (1.9) and (1.11) are positive semidefinite diagonal matrices, Qz (z ∈ {x, t}) defined in (1.12) and Ω (j) are positive definite diagonal matrices, and Dz (j = 1, 3, z ∈ {x, t}) are the diagonal (1) (2) matrices defined in (1.8) and (1.10). Let Tz (z ∈ {x, t}) and Tx be the Toeplitz (1) (2) matrices defined in (1.6) and (1.7) and Bz (z ∈ {x, t}) and Bx be the tridiagonal (2) 2 π matrices defined in (2.3), respectively. Denote cx = 4 sin ( 2(Nx +1) ). For z ∈ {x, t}, let Nz = mz + nz + 1 and assume N := Nx = Nt . Define - . - . Dz(j) (j = 1, 2, 3), d(2) Dx(2) d¯(j) z = max x = min 1≤≤N
1≤≤N
and (j)
2π d¯z μ(j) z = > , (2) (2) (2) 2 cx + dx π + dx
(j) (2) (2) ¯ dx + 4 − dx dz ηz(j) = , (2) (2) cx + dx
j = 1, 3,
z ∈ {x, t}.
Let ⎧ 4 −1 5 2 ¯(2) ⎪ κ(Q Qt Qx ε π + d ) max x t ⎪ ⎪ 1≤≤N (3) ⎪ (1) ⎪ μt + max {[Ω] }μ(3) ⎪ x , ⎨μ = μx + (2) (2) 1≤≤N cx + dx 4 −1 5 (2) (2) ⎪ ⎪ Qt Qx ε 4 − cx + d¯x κ(Qt ) max ⎪ ⎪ 1≤≤N ⎪ (3) ⎪ ηt + max {[Ω] }ηx(3) . ⎩η = ηx(1) + (2) (2) 1≤≤N cx + dx Then it holds that ∗ v S(A)v v ∗ H(A)v ≤ μ
and
∗ v S(M )v v ∗ H(M )v ≤ η
∀v ∈ Cn \ {0}.
Proof. For notational simplicity we denote
\[
D^{(4)} = Q_x \otimes D_t^{(4)} + D_x^{(4)} \otimes (Q_t\Omega).
\]
Because $D_z^{(4)}$ ($z \in \{x,t\}$) are positive semidefinite diagonal matrices and $Q_z$ ($z \in \{x,t\}$) and $\Omega$ are positive definite diagonal matrices, we see that $D^{(4)}$ is a positive semidefinite diagonal matrix.
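For any $v \in \mathbb{C}^n \setminus \{0\}$, according to the proof of Theorem 2.1 we have
\[
\begin{aligned}
\frac{|v^*S(A)v|}{v^*H(A)v}
\le{}& \frac{\left|v^*\left[\varepsilon\left(D_x^{(1)} T_x^{(1)} + T_x^{(1)} D_x^{(1)}\right) \otimes Q_t\right]v\right|}{v^*\left[\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t + D^{(4)}\right]v}
+ \frac{\left|v^*\left[Q_x \otimes \left(D_t^{(3)} T_t^{(1)} + T_t^{(1)} D_t^{(3)}\right)\right]v\right|}{v^*\left[\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t + D^{(4)}\right]v}\\
&+ \frac{\left|v^*\left[\left(D_x^{(3)} T_x^{(1)} + T_x^{(1)} D_x^{(3)}\right) \otimes (Q_t\Omega)\right]v\right|}{v^*\left[\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t + D^{(4)}\right]v}\\
\le{}& \frac{\left|v^*\left[\left(D_x^{(1)} T_x^{(1)} + T_x^{(1)} D_x^{(1)}\right) \otimes Q_t\right]v\right|}{v^*\left[\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}
+ \frac{\left|v^*\left[Q_x \otimes \left(D_t^{(3)} T_t^{(1)} + T_t^{(1)} D_t^{(3)}\right)\right]v\right|}{v^*\left[\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}\\
&+ \frac{\left|v^*\left[\left(D_x^{(3)} T_x^{(1)} + T_x^{(1)} D_x^{(3)}\right) \otimes (Q_t\Omega)\right]v\right|}{v^*\left[\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}
\end{aligned}
\tag{4.2}
\]
and
\[
\begin{aligned}
\frac{|v^*S(M)v|}{v^*H(M)v}
\le{}& \frac{\left|v^*\left[\varepsilon\left(D_x^{(1)} B_x^{(1)} + B_x^{(1)} D_x^{(1)}\right) \otimes Q_t\right]v\right|}{v^*\left[\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t + D^{(4)}\right]v}
+ \frac{\left|v^*\left[Q_x \otimes \left(D_t^{(3)} B_t^{(1)} + B_t^{(1)} D_t^{(3)}\right)\right]v\right|}{v^*\left[\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t + D^{(4)}\right]v}\\
&+ \frac{\left|v^*\left[\left(D_x^{(3)} B_x^{(1)} + B_x^{(1)} D_x^{(3)}\right) \otimes (Q_t\Omega)\right]v\right|}{v^*\left[\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t + D^{(4)}\right]v}\\
\le{}& \frac{\left|v^*\left[\left(D_x^{(1)} B_x^{(1)} + B_x^{(1)} D_x^{(1)}\right) \otimes Q_t\right]v\right|}{v^*\left[\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}
+ \frac{\left|v^*\left[Q_x \otimes \left(D_t^{(3)} B_t^{(1)} + B_t^{(1)} D_t^{(3)}\right)\right]v\right|}{v^*\left[\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}\\
&+ \frac{\left|v^*\left[\left(D_x^{(3)} B_x^{(1)} + B_x^{(1)} D_x^{(3)}\right) \otimes (Q_t\Omega)\right]v\right|}{v^*\left[\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}.
\end{aligned}
\tag{4.3}
\]
Here in both estimates we have split the numerators into three parts and then used the triangle inequality to obtain the first inequalities. The second inequalities are directly obtained by using the positive semidefiniteness of the diagonal matrix $D^{(4)}$. In addition, we have used the facts that $D_x^{(2)}$ is a positive semidefinite diagonal matrix and that $T_x^{(2)}$ and $B_x^{(2)}$ are both symmetric positive definite; see Lemmas 2.1 and 2.2.

From Lemma 3.4 we easily see that
\[
\frac{\left|v^*\left[\left(D_x^{(1)} T_x^{(1)} + T_x^{(1)} D_x^{(1)}\right) \otimes Q_t\right]v\right|}{v^*\left[\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v} \le \mu_x^{(1)}
\tag{4.4}
\]
and
\[
\frac{\left|v^*\left[\left(D_x^{(1)} B_x^{(1)} + B_x^{(1)} D_x^{(1)}\right) \otimes Q_t\right]v\right|}{v^*\left[\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v} \le \eta_x^{(1)}
\tag{4.5}
\]
hold true. It follows from Lemmas 2.1 and 2.2 that
\[
\kappa\left(T_x^{(2)} + D_x^{(2)}\right) \le \frac{\pi^2 + \bar{d}_x^{(2)}}{c_x^{(2)} + d_x^{(2)}}
\quad\text{and}\quad
\kappa\left(B_x^{(2)} + D_x^{(2)}\right) \le \frac{4 - c_x^{(2)} + \bar{d}_x^{(2)}}{c_x^{(2)} + d_x^{(2)}}.
\tag{4.6}
\]
By making use of Lemma 3.1 and (4.6), we have
\[
\begin{aligned}
v^*\left[\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v
&\ge \frac{v^*\left[Q_t \otimes \varepsilon\left(T_x^{(2)} + D_x^{(2)}\right)\right]v}{\kappa\left(\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right)\right)\kappa(Q_t)}\\
&\ge \frac{c_x^{(2)} + d_x^{(2)}}{\varepsilon\left(\pi^2 + \bar{d}_x^{(2)}\right)\kappa(Q_t)} \cdot v^*\left[Q_t \otimes \varepsilon\left(T_x^{(2)} + D_x^{(2)}\right)\right]v\\
&= \frac{1}{\sigma_T} \cdot v^*\left[Q_t \otimes \varepsilon\left(T_x^{(2)} + D_x^{(2)}\right)\right]v
\end{aligned}
\tag{4.7}
\]
and
\[
\begin{aligned}
v^*\left[\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v
&\ge \frac{v^*\left[Q_t \otimes \varepsilon\left(B_x^{(2)} + D_x^{(2)}\right)\right]v}{\kappa\left(\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right)\right)\kappa(Q_t)}\\
&\ge \frac{c_x^{(2)} + d_x^{(2)}}{\varepsilon\left(4 - c_x^{(2)} + \bar{d}_x^{(2)}\right)\kappa(Q_t)} \cdot v^*\left[Q_t \otimes \varepsilon\left(B_x^{(2)} + D_x^{(2)}\right)\right]v\\
&= \frac{1}{\sigma_B} \cdot v^*\left[Q_t \otimes \varepsilon\left(B_x^{(2)} + D_x^{(2)}\right)\right]v,
\end{aligned}
\tag{4.8}
\]
where
\[
\sigma_T = \frac{\varepsilon\left(\pi^2 + \bar{d}_x^{(2)}\right)\kappa(Q_t)}{c_x^{(2)} + d_x^{(2)}}
\quad\text{and}\quad
\sigma_B = \frac{\varepsilon\left(4 - c_x^{(2)} + \bar{d}_x^{(2)}\right)\kappa(Q_t)}{c_x^{(2)} + d_x^{(2)}}.
\]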
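Therefore, according to Lemmas 3.2 and 3.4, as well as (4.7)–(4.8), it holds that
\[
\begin{aligned}
\frac{\left|v^*\left[Q_x \otimes \left(D_t^{(3)} T_t^{(1)} + T_t^{(1)} D_t^{(3)}\right)\right]v\right|}{v^*\left[\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}
&\le \sigma_T\,\frac{\left|v^*\left[Q_x \otimes \left(D_t^{(3)} T_t^{(1)} + T_t^{(1)} D_t^{(3)}\right)\right]v\right|}{v^*\left[Q_t \otimes \varepsilon\left(T_x^{(2)} + D_x^{(2)}\right)\right]v}\\
&\le \sigma_T \tau_Q\,\frac{\left|v^*\left[Q_x \otimes \left(D_t^{(3)} T_t^{(1)} + T_t^{(1)} D_t^{(3)}\right)\right]v\right|}{v^*\left[Q_x \otimes \varepsilon\left(T_x^{(2)} + D_x^{(2)}\right)\right]v}\\
&\le \sigma_T \tau_Q\,\mu_t^{(3)}
\end{aligned}
\tag{4.9}
\]
and
\[
\begin{aligned}
\frac{\left|v^*\left[Q_x \otimes \left(D_t^{(3)} B_t^{(1)} + B_t^{(1)} D_t^{(3)}\right)\right]v\right|}{v^*\left[\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}
&\le \sigma_B\,\frac{\left|v^*\left[Q_x \otimes \left(D_t^{(3)} B_t^{(1)} + B_t^{(1)} D_t^{(3)}\right)\right]v\right|}{v^*\left[Q_t \otimes \varepsilon\left(B_x^{(2)} + D_x^{(2)}\right)\right]v}\\
&\le \sigma_B \tau_Q\,\frac{\left|v^*\left[Q_x \otimes \left(D_t^{(3)} B_t^{(1)} + B_t^{(1)} D_t^{(3)}\right)\right]v\right|}{v^*\left[Q_x \otimes \varepsilon\left(B_x^{(2)} + D_x^{(2)}\right)\right]v}\\
&\le \sigma_B \tau_Q\,\eta_t^{(3)},
\end{aligned}
\tag{4.10}
\]
where $\tau_Q = \max_{1\le \ell\le N}\{[Q_t^{-1}Q_x]_{\ell\ell}\}$. In addition, according to Lemmas 3.2 and 3.4 it holds that
\[
\frac{\left|v^*\left[\left(D_x^{(3)} T_x^{(1)} + T_x^{(1)} D_x^{(3)}\right) \otimes (Q_t\Omega)\right]v\right|}{v^*\left[\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}
\le \tau_\Omega\,\frac{\left|v^*\left[\left(D_x^{(3)} T_x^{(1)} + T_x^{(1)} D_x^{(3)}\right) \otimes (Q_t\Omega)\right]v\right|}{v^*\left[\varepsilon\left(T_x^{(2)} + D_x^{(2)}\right) \otimes (Q_t\Omega)\right]v}
\le \tau_\Omega\,\mu_x^{(3)}
\tag{4.11}
\]
and
\[
\frac{\left|v^*\left[\left(D_x^{(3)} B_x^{(1)} + B_x^{(1)} D_x^{(3)}\right) \otimes (Q_t\Omega)\right]v\right|}{v^*\left[\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes Q_t\right]v}
\le \tau_\Omega\,\frac{\left|v^*\left[\left(D_x^{(3)} B_x^{(1)} + B_x^{(1)} D_x^{(3)}\right) \otimes (Q_t\Omega)\right]v\right|}{v^*\left[\varepsilon\left(B_x^{(2)} + D_x^{(2)}\right) \otimes (Q_t\Omega)\right]v}
\le \tau_\Omega\,\eta_x^{(3)},
\tag{4.12}
\]
where $\tau_\Omega = \max_{1\le \ell\le N}\{[\Omega]_{\ell\ell}\}$. Now, by substituting the inequalities (4.4), (4.5), (4.9), (4.10), (4.11), and (4.12) into (4.2) and (4.3), we immediately obtain the estimates that we are deriving.

By using Theorem 3.1 and Lemmas 4.1 and 4.2, we can straightforwardly obtain the main theorem of this paper.

Theorem 4.1. Let the conditions of Lemma 4.2 be satisfied. Without loss of generality, we make use of scaling on the original system of linear equations such that $\mu\eta < 1$. Then it holds that
\[
\frac{1 - \mu\eta}{1 + \eta^2} \le \mathrm{Re}\,\lambda\left(M^{-1}A\right) \le \frac{\pi^2(1 + \mu\eta)}{4}
\]
and
\[
-\frac{\pi^2(\mu + \eta)}{4} \le \mathrm{Im}\,\lambda\left(M^{-1}A\right) \le \frac{\pi^2(\mu + \eta)}{4}.
\]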
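For given values of $\mu$ and $\eta$, the rectangle of Theorem 4.1 is straightforward to evaluate. The helper below is a hypothetical convenience function, not something from the paper; it simply substitutes $\gamma_1 = 1$ and $\gamma_2 = \pi^2/4$ from Lemma 4.1 into the bounds of Theorem 3.1.

```python
import math

def spectral_rectangle(mu, eta):
    """Return (re_lo, re_hi, im_max) bounding the eigenvalues of M^{-1}A per Theorem 4.1.

    Hypothetical helper: mu and eta are the constants of Lemma 4.2 and must satisfy
    mu >= 0, eta >= 0 and mu * eta < 1.
    """
    if mu < 0 or eta < 0 or mu * eta >= 1:
        raise ValueError("need mu >= 0, eta >= 0 and mu * eta < 1")
    re_lo = (1.0 - mu * eta) / (1.0 + eta**2)       # lower bound on the real part
    re_hi = math.pi**2 * (1.0 + mu * eta) / 4.0     # upper bound on the real part
    im_max = math.pi**2 * (mu + eta) / 4.0          # bound on |imaginary part|
    return re_lo, re_hi, im_max

re_lo, re_hi, im_max = spectral_rectangle(0.5, 0.5)
print(f"Re in [{re_lo:.4f}, {re_hi:.4f}], |Im| <= {im_max:.4f}")
```

Smaller values of $\mu$ and $\eta$ shrink the rectangle toward the real interval $[1, \pi^2/4]$, which is the Hermitian-part bound of Lemma 4.1.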
Based on Theorem 4.1, we can immediately obtain a theoretical estimate about the asymptotic convergence rate of the preconditioned GMRES method with the preconditioner $M$ in (2.2) for solving the system of linear equations (1.13). Here, we should suitably scale the partial differential equation (1.1) and appropriately choose the weighting functions $\omega_x(x)$ and $\omega_t(t)$ and the conformal mappings $\varphi_x(x)$ and $\varphi_t(t)$ such that $\mu\eta < 1$. For details, we refer to [20, 6].

We remark that, when Theorem 4.1 is specialized to the matrices $A$ and $M$ arising from the sinc-Galerkin discretization of the Burgers equation, much sharper bounds than those given in [5] about the eigenvalues of the preconditioned matrix $M^{-1}A$ can be straightforwardly obtained under weaker restrictions. This is one of the theoretical advantages of our new result.

5. Numerical experiments. In this section, we use two examples of the time-dependent partial differential equation (1.1) to demonstrate the effectiveness of the preconditioning and the corresponding preconditioned GMRES iteration method. Here, both Newton and fixed-point methods are applied to solve the discretized system of nonlinear equations (1.2). In our computations, the initial guess is set to be the zero vector and the outer nonlinear iteration is stopped once the current residual satisfies the criterion
\[
\frac{\|r^{(k)}\|_2}{\|r^{(0)}\|_2} \le 10^{-6}.
\]
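The outer iteration with this stopping rule can be sketched as follows. This is a minimal illustration under explicit assumptions: the function name `solve_mildly_nonlinear` is invented here, the matrices in the demo are small random stand-ins rather than sinc-Galerkin matrices, and the inner linear systems are solved with a dense direct solver instead of the preconditioned GMRES method used in the paper. It only shows how the Newton and fixed-point linearizations of $F(u) = Bu + C\Psi(u) - b$ with $\Psi(u) = (u_1^2, \ldots, u_n^2)^T$ differ.

```python
import numpy as np

def solve_mildly_nonlinear(B, C, b, method="newton", tol=1e-6, max_outer=50):
    """Solve B u + C u**2 - b = 0 starting from the zero vector.

    Illustrative sketch: dense solves replace the preconditioned GMRES inner iteration.
    Returns the approximate solution and the number of outer steps taken.
    """
    u = np.zeros_like(b, dtype=float)                # zero initial guess
    r0 = np.linalg.norm(b - B @ u - C @ u**2)        # initial residual norm
    for k in range(max_outer):
        r = b - B @ u - C @ u**2                     # current residual -F(u)
        if np.linalg.norm(r) <= tol * r0:            # relative residual test
            return u, k
        if method == "newton":
            # Jacobian of F is B + C diag(2u); solve for the Newton correction.
            J = B + C @ np.diag(2.0 * u)
            u = u + np.linalg.solve(J, r)
        else:
            # Fixed point: C u**2 = C diag(u) u, so freeze diag(u) and re-solve.
            u = np.linalg.solve(B + C @ np.diag(u), b)
    raise RuntimeError("outer iteration did not converge")

# Small synthetic test problem with a known solution (illustrative data only).
rng = np.random.default_rng(0)
n = 6
B = 4.0 * np.eye(n) + 0.1 * rng.random((n, n))
C = 0.1 * np.eye(n)
u_true = 0.5 * rng.random(n)
b = B @ u_true + C @ u_true**2

u_newton, steps_newton = solve_mildly_nonlinear(B, C, b, method="newton")
u_fixed, steps_fixed = solve_mildly_nonlinear(B, C, b, method="fixed-point")
print("Newton steps:", steps_newton, " error:", np.linalg.norm(u_newton - u_true))
print("Fixed-point steps:", steps_fixed, " error:", np.linalg.norm(u_fixed - u_true))
```

In both branches the coefficient matrix has the form $B + CD$ with a diagonal $D$, which is the structure of (1.13); the two methods differ only in the factor 2 in $D$.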
In each outer iteration step, a preconditioned linear system
\[
M^{-1}Az = M^{-1}r, \quad\text{with}\quad A = B + CD \quad\text{and}\quad M = \widetilde{B} + \widetilde{C}D,
\tag{5.1}
\]
is solved, which forms the inner iteration process for solving the linear subsystems involved in each step of the Newton or the fixed-point method; see (1.13) and (2.2). Here, the stopping criterion for the inner iteration, i.e., the preconditioned GMRES method, is that the relative reduction on the norm of the residual is less than $10^{-6}$. Besides, all codes are written in MATLAB 7.01 and all experiments are implemented on a personal computer with 2.66GHz central processing unit and 0.99G memory.

For the positive diagonal matrix $\Omega = \mathrm{diag}([\Omega]_{11}, [\Omega]_{22}, \ldots, [\Omega]_{N_t N_t})$, we can construct it according to a certain approximating rule. With respect to the Newton iteration method, we may minimize $\|I \otimes \Omega - \Psi'(u^{(c)})\|_2$ to obtain the $\Omega$, where $u^{(c)} = (u_1^{(c)}, u_2^{(c)}, \ldots, u_n^{(c)})^T$ is the current Newton iterate. As now $\Psi'(u) = 2 \cdot \mathrm{diag}(u_1, u_2, \ldots, u_n)$, with $u = (u_1, u_2, \ldots, u_n)^T$ and $n = N_x N_t$, by direct computations we can obtain the formulas for the diagonal elements of $\Omega$ as follows:
\[
[\Omega]_{jj} = \frac{2}{N_x} \sum_{k=0}^{N_x - 1} u^{(c)}_{kN_t + j}, \quad j = 1, 2, \ldots, N_t.
\]
Analogously, with respect to the fixed-point iteration method, we can choose
\[
[\Omega]_{jj} = \frac{1}{N_x} \sum_{k=0}^{N_x - 1} u^{(c)}_{kN_t + j}, \quad j = 1, 2, \ldots, N_t,
\]
where $u^{(c)} = (u_1^{(c)}, u_2^{(c)}, \ldots, u_n^{(c)})^T$ denotes the current fixed-point iterate. Note that the difference between these two $\Omega$'s is just a factor of 2.
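Both formulas amount to averaging the current iterate over the spatial index for each temporal index. A minimal NumPy sketch follows; the function name `omega_diagonal` and the `newton` flag are assumptions introduced here, and the indexing is zero-based, so that entry `j` of the result corresponds to $[\Omega]_{j+1,j+1}$.

```python
import numpy as np

def omega_diagonal(u, Nx, Nt, newton=True):
    """Diagonal of Omega from the current iterate u of length Nx * Nt.

    Hypothetical helper: row k of the reshaped array holds u[k*Nt : (k+1)*Nt],
    so averaging over rows gives (1/Nx) * sum_k u[k*Nt + j] for each j.
    """
    U = np.asarray(u, dtype=float).reshape(Nx, Nt)
    factor = 2.0 if newton else 1.0        # Newton uses Psi'(u) = 2 diag(u)
    return factor * U.mean(axis=0)

u = np.arange(6.0)                         # toy iterate with Nx = 2, Nt = 3
omega_newton = omega_diagonal(u, Nx=2, Nt=3, newton=True)
omega_fixed = omega_diagonal(u, Nx=2, Nt=3, newton=False)
print(omega_newton, omega_fixed)
```

The column mean is the least-squares choice: among all diagonals of the form $I \otimes \Omega$, it minimizes the sum of squared differences to the diagonal of $\Psi'(u^{(c)})$.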
The following two equations in the form of (1.1) are used to examine the numerical performance of the new preconditioner $M$ defined in (2.2) and to show the accuracy of the computed solution.

Example 5.1. The time-dependent partial differential equation
\[
\begin{cases}
-\dfrac{\partial u}{\partial t}(x,t) + \dfrac{u(x,t)}{x}\,\dfrac{\partial u}{\partial x}(x,t) - \varepsilon\,\dfrac{\partial^2 u}{\partial x^2}(x,t)\\[6pt]
\qquad = e^{-\pi^2 t}\sin(\pi x) \cdot \left(\pi^2 t - 1 + \dfrac{\pi t e^{-\pi^2 t}\cos(\pi x)}{x} + \varepsilon\pi^2 t\right), & 0 < x < 1 \ \text{and}\ t \ge 0,\\[6pt]
u(0,t) = 0 \ \text{and}\ u(1,t) = 0, & t \ge 0,\\
u(x,0) = 0, & 0 \le x \le 1,
\end{cases}
\]
with the exact solution being $u(x,t) = t e^{-\pi^2 t}\sin(\pi x)$.

Example 5.2. The time-dependent partial differential equation
\[
\begin{cases}
-\dfrac{\partial u}{\partial t}(x,t) + \dfrac{u(x,t)}{x}\,\dfrac{\partial u}{\partial x}(x,t) - \varepsilon\,\dfrac{\partial^2 u}{\partial x^2}(x,t)\\[6pt]
\qquad = -x e^{-t}(1 - x)(1 - t) + t^2 e^{-2t}(1 - x)(1 - 2x) - 2\varepsilon t e^{-t}, & 0 < x < 1 \ \text{and}\ t \ge 0,\\[6pt]
u(0,t) = 0 \ \text{and}\ u(1,t) = 0, & t \ge 0,\\
u(x,0) = 0, & 0 \le x \le 1,
\end{cases}
\]
with the exact solution being $u(x,t) = x(1 - x)t e^{-t}$.

The conformal mappings are chosen as $\varphi(z) = \ln\left(\frac{z}{1-z}\right)$ and $\psi(z) = \ln(\sinh(z))$ so that their restrictions onto the real intervals $(0,1)$ and $(0,+\infty)$ are $\varphi_x(x) := \varphi(x) = \ln\left(\frac{x}{1-x}\right)$ and $\varphi_t(t) := \psi(t) = \ln(\sinh(t))$, which are used for the discretizations of the $x$ and $t$ variables, respectively. The weighting functions are chosen to be $\omega_x(x) = 1/\varphi_x'(x)$ and $\omega_t(t) = 1/\varphi_t'(t)$.

In the numerical tables, the symbol $I$ means that no preconditioner is used when solving the linear subsystems involved in the nonlinear iterations, while $M$ represents that the preconditioner $M$ defined in (2.2) is used. We use NIT to denote the number of the Newton iteration steps, FIT that of the fixed-point iteration steps, GIT the average number of GMRES iteration steps in each Newton or fixed-point iteration, CPU the total computing timings, and $Se$ the maximum absolute discretization error at the sinc grid points and $Ue$ that on the corresponding uniform grid points, while we use “average $Se$” and “average $Ue$” to represent the average absolute errors at all of the sinc grid points and at all of the uniform grid points, respectively. In addition, the symbol ∗ is used to denote that the iteration does not satisfy the terminating criterion within 50 steps of the Newton or the fixed-point iteration, while + denotes that the inner iteration does not satisfy the GMRES terminating criterion within 1000 iteration steps.

We solve Example 5.1 when $\varepsilon = 10^{-3}$ and $\varepsilon = 10^{-4}$. Tables 5.1–5.2 list the numbers of iteration steps and the CPU timings required for the convergence of the Newton iteration, and Tables 5.3–5.4 list those required for the convergence of the fixed-point iteration, respectively, when they are applied to solve the system of nonlinear equations (1.2) resulting from the sinc-Galerkin discretization of Example 5.1.
Tables 5.5 and 5.6 list iteration numbers and CPU timings when the Newton and the fixed-point methods are applied, respectively, to Example 5.2, with ε = 10−3 . In all tables, some errors for reflecting the accuracy of the computed solutions are also shown.
ZHONG-ZHI BAI, YU-MEI HUANG, AND MICHAEL K. NG Table 5.1 Results for Example 5.1. ε = 10−3 , and the Newton method is applied.
I n NIT GIT CPU NIT GIT 81 4 80 0.33 4 32 289 4 282 3.00 4 58 977 62.36 4 111 1089 4 4225 * + — 4 246
Se 2.22 × 10−3 1.21 × 10−3 1.55 × 10−3 1.69 × 10−3
M average Se 7.73 × 10−4 1.72 × 10−4 1.60 × 10−4 1.86 × 10−4
Ue 2.14 × 10−3 1.08 × 10−3 1.30 × 10−3 1.48 × 10−3
average U e CPU 6.54 × 10−4 0.33 8.70 × 10−5 0.98 1.41 × 10−5 6.48 1.05 × 10−5 78.59

Table 5.2
Results for Example 5.1. $\varepsilon = 10^{-4}$, and the Newton method is applied.

            I                   M
n      NIT  GIT  CPU      NIT  GIT  Se               average Se       Ue               average Ue       CPU
81     4    80   0.33     4    35   2.36 × 10^{-3}   7.31 × 10^{-4}   2.23 × 10^{-3}   6.22 × 10^{-4}   0.25
289    4    283  3.02     4    68   5.11 × 10^{-4}   8.08 × 10^{-5}   2.61 × 10^{-4}   7.05 × 10^{-5}   1.13
1089   4    963  61.25    4    148  3.70 × 10^{-4}   3.03 × 10^{-5}   1.71 × 10^{-4}   4.31 × 10^{-6}   8.72
4225   *    +    -        5    359  4.62 × 10^{-4}   2.91 × 10^{-5}   1.85 × 10^{-4}   1.43 × 10^{-6}   170.20

Table 5.3
Results for Example 5.1. $\varepsilon = 10^{-3}$, and the fixed-point method is applied.

            I                   M
n      FIT  GIT  CPU      FIT  GIT  Se               average Se       Ue               average Ue       CPU
81     5    65   0.33     5    25   2.22 × 10^{-3}   7.73 × 10^{-4}   2.14 × 10^{-3}   6.54 × 10^{-4}   0.23
289    4    210  2.25     4    40   1.21 × 10^{-3}   1.72 × 10^{-4}   1.08 × 10^{-3}   8.70 × 10^{-5}   0.64
1089   6    824  83.67    6    76   1.54 × 10^{-3}   1.62 × 10^{-4}   1.30 × 10^{-3}   1.41 × 10^{-5}   6.32
4225   *    +    -        12   140  1.69 × 10^{-3}   1.90 × 10^{-4}   1.47 × 10^{-3}   1.05 × 10^{-5}   111.63

Table 5.4
Results for Example 5.1. $\varepsilon = 10^{-4}$, and the fixed-point method is applied.

            I                   M
n      FIT  GIT  CPU      FIT  GIT  Se               average Se       Ue               average Ue       CPU
81     6    68   0.34     6    27   2.36 × 10^{-3}   7.31 × 10^{-4}   2.23 × 10^{-3}   6.22 × 10^{-4}   0.30
289    4    211  2.27     4    46   5.11 × 10^{-4}   8.08 × 10^{-5}   2.60 × 10^{-4}   7.06 × 10^{-5}   0.75
1089   4    711  47.06    3    72   2.04 × 10^{-4}   2.49 × 10^{-5}   1.74 × 10^{-4}   4.30 × 10^{-6}   3.23
4225   *    +    -        3    123  2.44 × 10^{-4}   2.77 × 10^{-5}   1.99 × 10^{-4}   1.47 × 10^{-6}   25.67

Table 5.5
Results for Example 5.2. $\varepsilon = 10^{-3}$, and the Newton method is applied.

            I                   M
n      NIT  GIT  CPU      NIT  GIT  Se               average Se       Ue               average Ue       CPU
289    9    285  7.08     9    87   4.23 × 10^{-3}   1.30 × 10^{-3}   4.27 × 10^{-3}   1.41 × 10^{-3}   3.31
1089   9    996  149.11   9    179  1.82 × 10^{-3}   6.59 × 10^{-4}   1.80 × 10^{-3}   4.87 × 10^{-4}   25.27
4225   *    +    -        10   508  2.35 × 10^{-3}   5.77 × 10^{-4}   2.17 × 10^{-3}   3.17 × 10^{-4}   653.31

Table 5.6
Results for Example 5.2. $\varepsilon = 10^{-3}$, and the fixed-point method is applied.

            I                   M
n      FIT  GIT  CPU      FIT  GIT  Se               average Se       Ue               average Ue       CPU
289    12   246  7.64     11   53   4.23 × 10^{-3}   1.30 × 10^{-3}   4.27 × 10^{-3}   1.41 × 10^{-3}   2.38
1089   16   808  204.30   14   93   1.82 × 10^{-3}   6.60 × 10^{-4}   1.80 × 10^{-3}   4.87 × 10^{-4}   17.98
4225   *    +    -        33   168  2.35 × 10^{-3}   5.77 × 10^{-4}   2.17 × 10^{-3}   3.17 × 10^{-4}   377.16
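
As a reading aid for the tables, the effect of the preconditioner can be summarized by ratios of the unpreconditioned to the preconditioned figures. The sketch below uses only the GIT and CPU values reported in Table 5.1; the dictionary layout and the function name `ratios` are illustrative choices, and the row n = 4225 is left out because the unpreconditioned run did not converge there.

```python
# Values copied from Table 5.1 (Example 5.1, eps = 1e-3, Newton method).
# Each entry: n -> (GIT without preconditioner, CPU without,
#                   GIT with preconditioner M, CPU with M).
table_5_1 = {
    81: (80, 0.33, 32, 0.33),
    289: (282, 3.00, 58, 0.98),
    1089: (977, 62.36, 111, 6.48),
}


def ratios(row):
    """Return (GIT ratio, CPU ratio) of unpreconditioned over preconditioned."""
    git_i, cpu_i, git_m, cpu_m = row
    return git_i / git_m, cpu_i / cpu_m


if __name__ == "__main__":
    for n, row in sorted(table_5_1.items()):
        git_ratio, cpu_ratio = ratios(row)
        print(f"n = {n:5d}: GIT ratio = {git_ratio:.2f}, CPU ratio = {cpu_ratio:.2f}")
```

Both ratios grow with n in this table, which matches the observation that the benefit of preconditioning increases with the problem size.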
PRECONDITIONING METHODS FOR TIME-DEPENDENT PDEs
[Figure 5.1: two eigenvalue plots in the complex plane (left and right panels); horizontal axis "real", vertical axis "imaginary".]
Fig. 5.1. Spectral distribution of Example 5.1. $\varepsilon = 10^{-3}$ and n = 1089; without preconditioning (left), with the preconditioner M (right); and the Newton method is applied.
[Figure 5.2: two eigenvalue plots in the complex plane (left and right panels); horizontal axis "real", vertical axis "imaginary".]
Fig. 5.2. Spectral distribution of Example 5.2. $\varepsilon = 10^{-3}$ and n = 1089; without preconditioning (left), with the preconditioner M (right); and the fixed-point method is applied.
From these tables, we see that the new preconditioner considerably improves the convergence properties of both the Newton and the fixed-point iteration methods and greatly reduces the running times. Moreover, as the problem size n increases, the number of Newton or fixed-point iteration steps stays almost the same or increases slowly if the inner iteration solver, i.e., GMRES, is preconditioned by the new preconditioner. In contrast, if GMRES is employed as the inner iteration solver without a preconditioner, then for the largest problem size (n = 4225) it cannot achieve the prescribed tolerance within 1000 iteration steps and, therefore, the Newton or the fixed-point iteration cannot achieve the prescribed tolerance within 50 iteration steps. Therefore, the new preconditioning method can substantially improve the convergence behavior of both Newton and fixed-point iterations and, consequently, lead to fast convergent nonlinear solvers for the systems of nonlinear equations (1.2) arising in the sinc-Galerkin discretization of the time-dependent partial differential equation (1.1). Figures 5.1 and 5.2 depict the spectral distributions of the original coefficient matrix $A$ and the preconditioned matrix $M^{-1}A$ when the Newton method is applied to Example 5.1 and the fixed-point method is applied to Example 5.2, respectively. The figures clearly show that the matrices without preconditioning are very ill-conditioned and, therefore, the corresponding GMRES method may converge very slowly or even fail to converge, while the matrices with preconditioning are well-conditioned as they
[Figure 5.3: two surface plots (left and right panels).]
Fig. 5.3. Solutions of Example 5.1. $\varepsilon = 10^{-3}$ and n = 1089; exact solution (left), computed solution (right); and the Newton method is applied.
[Figure 5.4: two surface plots (left and right panels).]
Fig. 5.4. Solutions of Example 5.2. $\varepsilon = 10^{-3}$ and n = 1089; exact solution (left), computed solution (right); and the fixed-point method is applied.
have tightly clustered eigenvalues and, thus, the corresponding preconditioned GMRES method may converge very quickly to the exact solutions of the subsystems of linear equations. As a result, the preconditioned GMRES method used as the inner linear solver may lead to a fast convergent Newton or fixed-point method for solving the sinc-Galerkin nonlinear systems of the form (1.2). In Figures 5.3 and 5.4, we plot the exact and the computed solutions of Examples 5.1 and 5.2 corresponding to the cases shown in Figures 5.1 and 5.2, respectively, where the computed solution is obtained by using either the Newton or the fixed-point method. It is clear from Figures 5.3 and 5.4 that the new preconditioned iteration methods can compute reasonably accurate results.
6. Concluding remarks. We have constructed a structured preconditioner that efficiently improves the convergence of the GMRES iteration employed to inexactly solve the subsystem of linear equations involved in each Newton or fixed-point iteration for solving the system of nonlinear equations resulting from the sinc-Galerkin discretization of the time-dependent partial differential equation (1.1). The bounds of the eigenvalues of the preconditioned matrix were estimated by making use of the generalized Bendixson theorem, which, in particular, can lead to sharper eigenvalue bounds than those derived in [5] for the preconditioned matrix arising from the sinc-Galerkin discretization of the Burgers equation. Numerical experiments have shown the effectiveness of this new preconditioning method.
REFERENCES
[1] Z.-Z. Bai, Parallel multisplitting AOR method for solving a class of system of nonlinear algebraic equations, Appl. Math. Mech., 16 (1995), pp. 675–682.
[2] Z.-Z. Bai, Parallel nonlinear AOR method and its convergence, Comput. Math. Appl., 31 (1996), pp. 21–31.
[3] Z.-Z. Bai, G.H. Golub, L.-Z. Lu, and J.-F. Yin, Block triangular and skew-Hermitian splitting methods for positive-definite linear systems, SIAM J. Sci. Comput., 26 (2005), pp. 844–863.
[4] Z.-Z. Bai, G.H. Golub, and M.K. Ng, Hermitian and skew-Hermitian splitting methods for non-Hermitian positive definite linear systems, SIAM J. Matrix Anal. Appl., 24 (2003), pp. 603–626.
[5] Z.-Z. Bai, Y.-M. Huang, and M.K. Ng, On preconditioned iterative methods for Burgers equations, SIAM J. Sci. Comput., 29 (2007), pp. 415–439.
[6] Z.-Z. Bai and M.K. Ng, Preconditioners for nonsymmetric block Toeplitz-like-plus-diagonal linear systems, Numer. Math., 96 (2003), pp. 197–220.
[7] Z.-Z. Bai and D.-R. Wang, Asynchronous multisplitting nonlinear Gauss-Seidel type method, Appl. Math. J. Chinese Univ. Ser. B, 9 (1994), pp. 189–194.
[8] Z.-Z. Bai and D.-R. Wang, Asynchronous parallel multisplitting nonlinear Gauss-Seidel iteration, Appl. Math. Chinese Univ. Ser. B, 12 (1997), pp. 179–194.
[9] R.H. Chan and X.-Q. Jin, A family of block preconditioners for block systems, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 1218–1235.
[10] R.H. Chan, W.-F. Ng, and H.-W. Sun, Fast construction of optimal circulant preconditioners for matrices from the fast dense matrix method, BIT, 40 (2000), pp. 24–40.
[11] X.-Q. Jin, A note on preconditioned block Toeplitz matrices, SIAM J. Sci. Comput., 16 (1995), pp. 951–955.
[12] X.-Q. Jin, Band Toeplitz preconditioners for block Toeplitz systems, J. Comput. Appl. Math., 70 (1996), pp. 225–230.
[13] X.-Q. Jin, Developments and Applications of Block Toeplitz Iterative Solvers, Kluwer Academic Publishers, Dordrecht Science Press, Beijing, 2002.
[14] T. Kailath and A.H. Sayed, Displacement structure: Theory and applications, SIAM Rev., 37 (1995), pp. 297–386.
[15] N. Levinson, The Wiener RMS (root mean square) error criterion in filter design and prediction, J. Math. Phys. Mass. Inst. Tech., 25 (1947), pp. 261–278.
[16] J. Lund and K.L. Bowers, Sinc Methods for Quadrature and Differential Equations, SIAM, Philadelphia, 1992.
[17] M.K. Ng, Fast iterative methods for symmetric sinc-Galerkin systems, IMA J. Numer. Anal., 19 (1999), pp. 357–373.
[18] M.K. Ng and D. Potts, Fast iterative methods for sinc systems, SIAM J. Matrix Anal. Appl., 24 (2002), pp. 581–598.
[19] J.M. Ortega and W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, London, 1970.
[20] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS, Boston, 1996.
[21] D.-R. Wang, Z.-Z. Bai, and D.J. Evans, Asynchronous multisplitting relaxed iterations for weakly nonlinear systems, Int. J. Comput. Math., 54 (1994), pp. 57–76.
[22] D.-R. Wang, Z.-Z. Bai, and D.J. Evans, On the monotone convergence of multisplitting method for a class of system of weakly nonlinear equations, Int. J. Comput. Math., 60 (1996), pp. 229–242.
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1038–1066
© 2009 Society for Industrial and Applied Mathematics
CONVERGENCE ANALYSIS OF A DISCONTINUOUS GALERKIN METHOD WITH PLANE WAVES AND LAGRANGE MULTIPLIERS FOR THE SOLUTION OF HELMHOLTZ PROBLEMS∗
MOHAMED AMARA†, RABIA DJELLOULI‡, AND CHARBEL FARHAT§
Abstract. We analyze the convergence of a discontinuous Galerkin method (DGM) with plane waves and Lagrange multipliers that was recently proposed by Farhat, Harari, and Hetmaniuk [Comput. Methods Appl. Mech. Engrg., 192 (2003), pp. 1389–1419] for solving two-dimensional Helmholtz problems at relatively high wavenumbers. We prove that the underlying hybrid variational formulation is well-posed. We also present various a priori error estimates that establish the convergence and order of accuracy of the simplest element associated with this method. We prove that, for $k(kh)^{2/3}$ sufficiently small, the relative error in the $L^2$-norm (resp. in the $H^1$ seminorm) is of order $k(kh)^{4/3}$ (resp. of order $(kh)^{2/3}$) for a solution being in $H^{5/3}(\Omega)$. In addition, we establish an a posteriori error estimate that can be used as a practical error indicator when refining the partition of the computational domain.
Key words. acoustic scattering, discontinuous Galerkin, Helmholtz problems, hybrid finite element, inf-sup condition, plane waves
AMS subject classifications. 65N12, 65N15, 35J05, 65N30, 74J20, 35Q60, 39A12, 78A45
DOI. 10.1137/060673230
Introduction. The discontinuous enrichment method (DEM) was developed in [1, 2] for the solution of multiscale boundary value problems (BVPs) with sharp gradients and rapid oscillations. These are problems for which the standard finite element method (FEM) can become prohibitively expensive. DEM can be described as a discontinuous Galerkin method (DGM) with Lagrange multiplier degrees of freedom (DOFs), in which the standard finite element polynomial field is enriched within each element by free-space solutions of the homogeneous partial differential equation to be solved. Usually, these are easily obtained in analytical form and are discontinuous across the element interfaces. The Lagrange multiplier DOFs are introduced at these interfaces to enforce a weak continuity of the solution. For the Helmholtz equation, the enrichment field can be constructed with plane waves, as these are free-space solutions of this equation. In [3], it was shown that for a large class of Helmholtz problems, the polynomial field is not necessary for efficiently capturing the solution. Hence, for these applications, the polynomial field was dropped, and the DEM was transformed into a
∗ Received by the editors November 13, 2006; accepted for publication (in revised form) October 16, 2008; published electronically February 13, 2009. http://www.siam.org/journals/sinum/47-2/67323.html
† Laboratoire de Mathématiques Appliquées, Université de Pau et des Pays de l'Adour and CNRS-UMR5142, BP 1155, 64013 Pau cedex, France ([email protected]).
‡ Corresponding author. Department of Mathematics, California State University Northridge, Northridge, CA 91330-8313 ([email protected]). This author's research was partially supported by the National Science Foundation (NSF) under grant DMS-0406617 and by the Office of Naval Research (ONR) under grant N-00014-01-1-0356. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the NSF or the ONR.
§ Department of Mechanical Engineering and Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305 ([email protected]). This author's research was partially supported by the National Science Foundation (NSF) under grant DMS-0406617 and by the Office of Naval Research (ONR) under grant N-00014-01-1-0356. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the NSF or the ONR.
CONVERGENCE ANALYSIS OF A DG METHOD
DGM with plane wave basis functions. Similar exponential functions were previously introduced in the weak element method (WEM) [4], the partition of unity method (PUM) [5], the ultra weak variational method [6], and the least-squares method (LSM) presented in [7] for the solution of the Helmholtz equation. However, unlike WEM, the DGM proposed in [3] is based on a variational framework, and unlike PUM, it is discontinuous. Furthermore, in contrast to LSM, the continuity of the solution at the interelement boundaries is enforced in DEM by Lagrange multipliers rather than penalty parameters, which increases the robustness and accuracy of the underlying framework of approximation. In [3], two lower-order rectangular DGM elements with four and eight plane waves, respectively, were constructed and applied to the solution of two-dimensional waveguide problems with 10 ≤ kl ≤ 100, where k denotes the wavenumber and l is a characteristic length of the waveguide. The discretization by these elements of such Helmholtz problems was found to require five to seven times fewer DOFs than their discretization by the standard Q2 element, depending on the desired level of accuracy. In [8], this DGM was extended to exterior Helmholtz problems and was coupled with a second-order absorbing boundary condition. A lower-order quadrilateral element with eight Lagrange multiplier DOFs was designed and highlighted with the solution on unstructured meshes of sample acoustic scattering problems with 20 ≤ kl ≤ 40, where l denotes a characteristic length of the scatterer. This element was shown to deliver significant improvement over the performance of the standard and comparable Q2 element. In [9], two higher-order quadrilateral DGM elements with 16 and 32 plane waves, respectively, were presented. 
The DGM element with 16 plane waves has a computational complexity that is comparable to that of the standard Q4 element and was shown numerically to have the same convergence rate with respect to the mesh size. However, this DGM element was also shown numerically in [8] to deliver the same level of accuracy as Q4 using six times fewer DOFs. All of these performance results highlight the potential of the DGM introduced in [3] and expanded in [8] and [9]. However, no mathematical analysis of this method has been performed yet. The objective of this paper is to fill this gap in the specific context of the two-dimensional low-order element with four plane waves in order to set this DGM method on a firm theoretical basis. The proposed study assumes that the computational domain Ω is a polygonal-shaped domain that can be partitioned into rectangular elements. Note that the computational domain Ω may have reentrant corners, and therefore, the considered acoustic scattered field is in $H^{5/3}(\Omega)$ only. We partition the computational domain into rectangular-shaped elements and consider the case of the so-called R-4-1 element, that is, we approximate locally the primal variable by four plane waves and the dual variable by constants on the edges of interior elements. We must point out that this study cannot be extended, at this time, to higher-order elements because it assumes that the normal derivative of the primal variable is constant along the interior edges. This crucial property is valid only in the case of the R-4-1 element. We prove that for $k(kh)^{2/3}$ small enough, the relative error in the $L^2$-norm (resp. in the $H^1$ seminorm) is of order $k(kh)^{4/3}$ (resp. $(kh)^{2/3}$). We recall that in the case of the standard FEM using the P1 element (see [10, 11]), it has been established that for $k^2 h$ small enough, the relative error in the $L^2$-norm (resp. in the $H^1$ seminorm) is of order $k^3 h^2$ (resp. $kh$).
Moreover, if we assume that $kh$ is small enough, it has been established in [11] that the relative errors in both the $L^2$-norm and the $H^1$ seminorm are bounded by $k(kh)^2$. However, all these error estimates have been established assuming that the scattered field is in $H^2(\Omega)$, which is not a realistic assumption for most applications. We must also point out that, to the best of our knowledge, no
MOHAMED AMARA, RABIA DJELLOULI, AND CHARBEL FARHAT
error estimates have been derived yet in the particular case of the Q4 finite element when applied to Helmholtz problems. We also derive an a posteriori error estimate that can be used as a practical error indicator when refining the partition of the computational domain. This error estimate reveals that the relative error in the $L^2$-norm depends on the errors in the approximation of the interior and exterior boundary conditions as well as on the jump across the elements of the partition. The remainder of this paper is organized as follows. In section 1, we specify the notations and assumptions used in this paper, state the formulation of a two-dimensional acoustic scattering problem in a bounded domain, and prove that the hybrid problem obtained by applying the DGM introduced above to the solution of the focus Helmholtz problem is well-posed in the sense of Hadamard [12]. More specifically, we introduce Theorem 1 to address the issues of existence, uniqueness, and stability of the DGM formulation. Next, we devote section 3 to the analysis of the discrete solution obtained with a DGM element with four plane waves. More specifically, we recall in section 3.2 the discrete DGM formulation and announce the main results of this paper. These are existence and uniqueness results, a priori error estimates that are stated in Theorem 2, and an a posteriori estimate that is stated in Theorem 3. The proofs of these three sets of fundamental results are detailed in sections 3.3 and 3.4. Finally, section 4 concludes this paper.
1. Preliminaries. We consider throughout this paper the acoustic scattering problem by a sound-hard scatterer [13] formulated in a bounded domain as follows:
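$$
\text{(BVP)}\quad
\begin{cases}
\text{Find } u \in H^1(\Omega) \text{ such that} & \\
\Delta u + k^2 u = 0 & \text{in } \Omega, \\
\partial_n u = -\partial_n e^{ikx\cdot d} & \text{on } \Gamma, \\
\partial_n u = iku & \text{on } \Sigma,
\end{cases}
\tag{1.1}
$$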
where u is the scattered field and Ω is the computational domain. Ω is a bounded polygonal-shaped domain that can be partitioned into rectangular elements. Γ is its interior boundary, and Σ is the exterior boundary. n is the unitary outward normal vector to the boundaries Γ and Σ, and $\partial_n$ is the normal derivative. k is a positive number representing the wavenumber. d is a unit vector representing the direction of the incident plane wave. The equation on Γ is the Neumann boundary condition that characterizes the sound-hard property of the scatterer. We must point out that the interior Neumann boundary condition on Γ and the exterior condition on Σ are used only for simplicity. The results presented herein apply to all types of admissible boundary conditions. In addition, as is well known, one should use higher-order local absorbing boundary conditions for solving practical problems.
2. The continuous hybrid variational formulation.
2.1. Nomenclature and properties. We use throughout this paper the following notations and properties.
• K is a rectangular-shaped element of Ω and ∂K is its boundary. $\partial K = \bigcup_{j=1}^{4} T_K^j$, where $T_K^j$ is the jth edge of K with vertices $(s_j^K, s_{j+1}^K)$ and $n_j^K$ its outward unitary normal vector.
• $h_j^K$ is the length of the edge $T_K^j$, and $h_K = \max_{1\le j\le 4} h_j^K$.
• $(\mathcal{T}_h)_h$ is a regular triangulation of the computational domain Ω into elements K, i.e.,
$$\exists\, \hat{c} > 0 \;/\; \forall h,\ \forall K \in \mathcal{T}_h;\quad h_K^2 \le \hat{c}\,|K|,$$
where $|K|$ denotes the area of the element K [14]. Note that $(\mathcal{T}_h)_h$ is a quasiuniform triangulation, since its elements K are rectangles.
• $h = \max_{K\in\mathcal{T}_h} h_K$. We also assume that $kh \le \pi$. This condition means that there are at least two elements per wavelength.
• X is the space of the primal variable. X is given by
$$X = \Big\{ v \in L^2(\Omega);\ \forall K \in \mathcal{T}_h,\ v^K = v|_K \in H^1(K) \Big\} \approx \prod_{K\in\mathcal{T}_h} H^1(K)$$
and is equipped with the following norm:
$$\|v\|_X = \Big( \sum_{K\in\mathcal{T}_h} \|v^K\|_{X(K)}^2 \Big)^{1/2} \quad \forall v \in X,$$
where
$$\|v^K\|_{X(K)} = \Big( |v^K|_{1,K}^2 + \frac{1}{|K|}\,\|v^K\|_{0,K}^2 \Big)^{1/2}.$$
$\|\cdot\|_{0,K}$ (resp. $|\cdot|_{1,K}$) is the $L^2$-norm (resp. seminorm) on the element K.
• $|\cdot|_{1,\mathcal{T}_h}$ is the seminorm in the space X defined by
$$|v|_{1,\mathcal{T}_h} = \Big( \sum_{K\in\mathcal{T}_h} |v^K|_{1,K}^2 \Big)^{1/2} \quad \forall v \in X.$$
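• $H^{1/2}(\partial K)$ is the space of the traces of elements of $H^1(K)$, and $H^{-1/2}(\partial K)$ is the dual space of $H^{1/2}(\partial K)$. $H^{1/2}(\partial K)$ is equipped with the following norm:
$$\|\lambda\|_{\frac12,\partial K} = \inf_{w \in W(\lambda)} \|w\|_{X(K)} = \|\Lambda\|_{X(K)}, \tag{2.1}$$
where $W(\lambda) = \{ w \in H^1(K);\ w|_{\partial K} = \lambda \}$ and Λ is the unique element in $W(\lambda)$ satisfying
$$-\Delta\Lambda + \frac{1}{|K|}\,\Lambda = 0 \quad \text{a.e. in } K.$$
It follows from the definition of the norm $\|\cdot\|_X$ and (2.1) that
$$\|v\|_{\frac12,\partial K} \le \|v\|_{X(K)} \quad \forall v \in H^1(K). \tag{2.2}$$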
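• M is the space of the dual variable defined by
$$M = \Big\{ \mu \in \prod_{K\in\mathcal{T}_h} H^{-1/2}(\partial K);\ \forall \lambda \in T,\ \sum_{K\in\mathcal{T}_h} \langle \mu^K, \lambda^K \rangle_{-\frac12\times\frac12,\partial K} = 0 \Big\},$$
where $\mu^K = \mu|_{\partial K}$ and the space T is given by
$$T = \Big\{ \lambda \in \prod_{K\in\mathcal{T}_h} H^{1/2}(\partial K);\ \forall K \ne K' \in \mathcal{T}_h,\ \lambda^K = \lambda^{K'} \text{ on } \partial K \cap \partial K' \Big\}.$$
The space M is equipped with the following norm:
$$\|\mu\|_M = \Big( \sum_{K\in\mathcal{T}_h} \|\mu^K\|_{-\frac12,\partial K}^2 \Big)^{1/2} \quad \forall \mu \in M,$$
where
$$\|\mu^K\|_{-\frac12,\partial K} = \sup_{\lambda \in H^{1/2}(\partial K)} \frac{\langle \mu^K, \lambda \rangle_{-\frac12\times\frac12,\partial K}}{\|\lambda\|_{\frac12,\partial K}} = \sup_{v \in H^1(K)} \frac{\langle \mu^K, v \rangle_{-\frac12\times\frac12,\partial K}}{\|v\|_{X(K)}} \tag{2.3}$$
and $\langle \cdot, \cdot \rangle_{-\frac12\times\frac12,\partial K}$ is the duality product between $H^{-1/2}(\partial K)$ and $H^{1/2}(\partial K)$ [15].
• $\mathcal{M}$ is a subspace of M defined by
$$\mathcal{M} = \Big\{ \mu \in \prod_{K\in\mathcal{T}_h} L^2(\partial K);\ \mu = 0 \text{ on } \partial\Omega \text{ and } \forall K \ne K' \in \mathcal{T}_h,\ \mu^K + \mu^{K'} = 0 \text{ on } \partial K \cap \partial K' \Big\}.$$
Therefore, we have
$$\mathcal{M} = M \cap \prod_{K\in\mathcal{T}_h} L^2(\partial K).$$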
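2.2. Formulation and mathematical results. We adopt the following hybrid-type variational formulation (VP) for solving the BVP. Note that the VP is equivalent to BVP as indicated in Remark 1.
$$
\text{(VP)}\quad
\begin{cases}
\text{Find } (u, \lambda) \in X \times M \text{ such that} & \\
a(u, v) + b(v, \lambda) = F(v) & \forall v \in X, \\
b(u, \mu) = 0 & \forall \mu \in M,
\end{cases}
\tag{2.4}
$$
where the bilinear forms a(· , ·) and b(· , ·) and the function F are given by
$$a(u, v) = \sum_{K\in\mathcal{T}_h} \Big( \int_K \nabla u \cdot \nabla v \, dx - k^2 \int_K u v \, dx - ik \int_{\partial K \cap \Sigma} u v \, dt \Big) \quad \forall u, v \in X,$$
$$b(v, \mu) = \sum_{K\in\mathcal{T}_h} \langle \mu^K, v \rangle_{-\frac12\times\frac12,\partial K} \quad \forall (v, \mu) \in X \times M,$$
$$F(v) = - \sum_{K\in\mathcal{T}_h} \int_{\partial K \cap \Gamma} v \, \partial_n e^{ikx\cdot d} \, dt \quad \forall v \in X.$$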
Note that the bilinear form b(· , ·) also satisfies
$$b(v, \mu) = \sum_{K\in\mathcal{T}_h} \int_{\partial K} \mu^K v \, dt \quad \forall (v, \mu) \in X \times \mathcal{M}.$$
In addition, the bilinear forms a(· , ·) and b(· , ·) satisfy the following important properties.
Property 1. The bilinear forms a(· , ·) and b(· , ·) are continuous on X × X and X × M, respectively. Furthermore, we have the following:
(i) a(· , ·) satisfies the Gårding inequality in $H^1(\Omega)$
$$\Re\, a(v, v) + k^2 \|v\|_{0,\Omega}^2 = |v|_{1,\mathcal{T}_h}^2 \quad \forall v \in X, \tag{2.5}$$
where $\Re$ designates the real part.
(ii) The null space N corresponding to the bilinear form b(· , ·) is given by
$$N = \{ v \in X;\ b(v, \mu) = 0 \quad \forall \mu \in M \} = H^1(\Omega). \tag{2.6}$$
(iii) The bilinear form b(· , ·) satisfies the so-called inf-sup condition [21]:
$$\forall \mu \in M,\ \exists \phi \in X:\quad \sup_{v \in X} \frac{|b(v, \mu)|}{\|v\|_X} = \frac{|b(\phi, \mu)|}{\|\phi\|_X} = \|\mu\|_M. \tag{2.7}$$
Proof of Property 1. We prove only the third point, since the proof of (2.5) and (2.6) is straightforward. From the continuity of the bilinear form b(· , ·), we deduce that
$$\sup_{v \in X} \frac{|b(v, \mu)|}{\|v\|_X} \le \|\mu\|_M \quad \forall \mu \in M. \tag{2.8}$$
Next, for a fixed μ ∈ M, we consider the function φ ∈ X such that, for every $K \in \mathcal{T}_h$, $\phi|_K = \phi^K$ is the unique solution of the following variational problem:
$$\int_K \nabla\phi^K \cdot \nabla v \, dx + \frac{1}{|K|} \int_K \phi^K v \, dx = \langle \mu^K, v \rangle_{-\frac12\times\frac12,\partial K} \quad \forall v \in H^1(K). \tag{2.9}$$
Hence, using (2.2) and (2.9), we have
$$\|\phi^K\|_{X(K)}^2 = \langle \mu^K, \phi^K \rangle_{-\frac12\times\frac12,\partial K} \le \|\mu^K\|_{-\frac12,\partial K}\, \|\phi^K\|_{\frac12,\partial K} \le \|\mu^K\|_{-\frac12,\partial K}\, \|\phi^K\|_{X(K)}.$$
Thus, we deduce that $\|\phi^K\|_{X(K)} \le \|\mu^K\|_{-\frac12,\partial K}$ and then $\|\phi\|_X \le \|\mu\|_M$. Moreover, from (2.3) and (2.9), we have $\|\mu^K\|_{-\frac12,\partial K} \le \|\phi^K\|_{X(K)}$. Therefore, it follows that $\|\phi\|_X = \|\mu\|_M$. On the other hand, from (2.9) and the definition of the bilinear form b(· , ·), we also have
$$b(\phi, \mu) = \sum_{K\in\mathcal{T}_h} \|\phi^K\|_{X(K)}^2 = \|\phi\|_X^2 = \|\phi\|_X \|\mu\|_M,$$
which concludes the proof of the inf-sup condition given by (2.7).
Remark 1. The problems BVP and VP are equivalent in the following sense:
(i) If the pair (u, λ) is a solution of VP, then it follows from the second equation of VP that u is in $H^1(\Omega)$. Moreover, using the first equation of VP with test functions $v \in \mathcal{D}(\Omega)$, we deduce that u is the solution of the first equation of BVP. Last, the use of test functions $v \in H^1(\Omega)$ allows us to verify that u satisfies the boundary conditions on Γ and Σ.
(ii) If u is the solution of BVP, then from the standard regularity results for Laplace's operator [22] and due to the possible reentrant corners (with a measure angle of $\frac{3\pi}{2}$), it follows that $u \in H^{5/3}(\Omega)$. Thus, $\partial_n u^K \in L^2(\partial K)$ $\forall K \in \mathcal{T}_h$ ($\partial_n u^K$ is even in $H^{1/6}(\partial K)$). Then we set
$$\lambda^K = \begin{cases} -\partial_n u & \text{on } \partial K \setminus \partial\Omega, \\ 0 & \text{on } \partial K \cap \partial\Omega. \end{cases} \tag{2.10}$$
Therefore, the dual variable λ satisfies (2.10) in the $L^2(\partial K)$ sense, which is the classical sense. Having that in mind, one can multiply BVP by test functions v ∈ X and deduce that the pair (u, λ) satisfies VP.
Next, we prove that the variational problem (VP) is well-posed in the sense of Hadamard [12]. This is the main result of this section. It is stated in the following theorem.
Theorem 1. The variational problem (VP) admits a unique solution (u, λ) ∈ X × M. In addition, u belongs to $H^{5/3}(\Omega)$ and for all $\theta \in [0, \frac53]$, there is a positive constant C (C depends on Ω and θ only) such that
$$|u|_{\theta,\Omega} \le C\,(1 + k)^{\theta}.$$
The proof of this theorem is based on the following intermediate stability result.
Lemma 1. Let f be in $L^2(\Omega)$. Then, the following BVP
$$
\begin{cases}
\Delta U + k^2 U = f & \text{in } \Omega, \\
\partial_n U = 0 & \text{on } \Gamma, \\
\partial_n U = ik\,U & \text{on } \Sigma
\end{cases}
\tag{2.11}
$$
has one and only one solution U in $H^{5/3}(\Omega)$. Moreover, for all $\theta \in [0, \frac53]$, there is a positive constant C (C depends on Ω and θ only) such that
$$|U|_{\theta,\Omega} \le C\,(1 + k)^{\theta - 1}\, \|f\|_{0,\Omega}. \tag{2.12}$$
Proof of Lemma 1. First, observe that the variational formulation corresponding to the BVP (2.11) is given by
$$
\begin{cases}
\text{Find } U \in H^1(\Omega) \text{ such that} \\
a(U, v) = -\displaystyle\int_\Omega f v \, dx \quad \forall v \in H^1(\Omega).
\end{cases}
\tag{2.13}
$$
From (2.5), it follows that the bilinear form a(· , ·) satisfies the Fredholm alternative on $H^1(\Omega)$. Hence, the uniqueness ensures the existence of the solution U in $H^1(\Omega)$. Therefore, we need only to prove the uniqueness of the solution of the BVP (2.11). Let w be the solution of the corresponding homogeneous BVP. The function w satisfies
$$a(w, w) = 0, \quad \text{then } w = 0 \text{ on } \Sigma,$$
and we deduce that
$$\partial_n w = 0 \text{ on } \Gamma \quad \text{and} \quad w = \partial_n w = 0 \text{ on } \Sigma.$$
Therefore, using the continuation theorem [16, 17], we obtain that w = 0 in Ω. From the standard regularity results for second-order elliptic BVPs [22] and due to the possible reentrant corners (with a measure angle of $\frac{3\pi}{2}$), it follows that the solution of problem (2.11) satisfies $U \in H^{5/3}(\Omega)$, and there is a positive constant C (C depends on Ω only) such that
$$\|U\|_{\frac53,\Omega} \le C \Big( \|\Delta U\|_{-\frac13,\Omega} + \|\partial_n U\|_{\frac16,\partial\Omega} \Big). \tag{2.14}$$
Moreover, using the results established in [18] and [19], we deduce the existence of a positive constant C (C depends on Ω only) such that
$$\|U\|_{0,\Omega} \le \frac{C}{1 + k}\, \|f\|_{0,\Omega} \quad \text{and} \quad |U|_{1,\Omega} \le C\, \|f\|_{0,\Omega}. \tag{2.15}$$
Next, we establish the estimate (2.12). To do this, we will use the space interpolation results in [20]. First, using boundary conditions in BVP (2.11), we deduce that there is a positive constant C (C depends on Ω only) such that
$$\|\partial_n U\|_{\frac16,\partial\Omega} = \|\partial_n U\|_{\frac16,\Sigma} = k\, \|U\|_{\frac16,\Sigma} \le C\, k\, \|U\|_{\frac23,\Omega}.$$
Therefore, it follows from the space interpolation results in [20] that there is a positive constant C (C depends on Ω only) such that
$$\|\partial_n U\|_{\frac16,\partial\Omega} \le C\, k\, \|U\|_{0,\Omega}^{\frac13}\, |U|_{1,\Omega}^{\frac23}.$$
Finally, it follows from (2.15) that there exists a positive constant C (C depends on Ω only) such that
$$\|\partial_n U\|_{\frac16,\partial\Omega} \le C\,(1 + k)^{\frac23}\, \|f\|_{0,\Omega}. \tag{2.16}$$
Furthermore, from the first equation of BVP (2.11), we deduce that
$$\|\Delta U\|_{0,\Omega} \le k^2 \|U\|_{0,\Omega} + \|f\|_{0,\Omega}.$$
Hence, it follows from (2.15) that there is a positive constant C (C depends on Ω only) such that
$$\|\Delta U\|_{0,\Omega} \le C\,(1 + k)\, \|f\|_{0,\Omega}.$$
In addition, from the norm properties and (2.15), there is a positive constant C (C depends on Ω only) such that
$$\|\Delta U\|_{-1,\Omega} \le |U|_{1,\Omega} \le \|U\|_{1,\Omega} \le C\, \|f\|_{0,\Omega}.$$
Consequently, it follows from these equations and the interpolation space results (see [20]) that there is a positive constant C (C depends on the domain Ω only) such that
$$\|\Delta U\|_{-\frac13,\Omega} \le C\,(1 + k)^{\frac23}\, \|f\|_{0,\Omega}. \tag{2.17}$$
Estimate (2.12) is then a direct consequence of (2.14), (2.16), and (2.17).
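The exponent bookkeeping behind (2.16) and (2.17) can be written out explicitly. The following is a sketch that assumes the interpolation inequalities of [20] are used in their standard multiplicative form, with C a generic constant that may change from line to line; the intermediate steps are filled in here only as an illustration and are not quoted from the proof.

```latex
% (2.16): H^{2/3} sits between L^2 and H^1 with weights 1/3 and 2/3; then apply (2.15).
\|\partial_n U\|_{\frac16,\partial\Omega}
  \le C\,k\,\|U\|_{0,\Omega}^{\frac13}\,|U|_{1,\Omega}^{\frac23}
  \le C\,k\,\Big(\frac{\|f\|_{0,\Omega}}{1+k}\Big)^{\frac13}\,\|f\|_{0,\Omega}^{\frac23}
  =   C\,k\,(1+k)^{-\frac13}\,\|f\|_{0,\Omega}
  \le C\,(1+k)^{\frac23}\,\|f\|_{0,\Omega}.

% (2.17): H^{-1/3} sits between H^{-1} and L^2 with weights 1/3 and 2/3.
\|\Delta U\|_{-\frac13,\Omega}
  \le C\,\|\Delta U\|_{-1,\Omega}^{\frac13}\,\|\Delta U\|_{0,\Omega}^{\frac23}
  \le C\,\|f\|_{0,\Omega}^{\frac13}\,\big((1+k)\,\|f\|_{0,\Omega}\big)^{\frac23}
  =   C\,(1+k)^{\frac23}\,\|f\|_{0,\Omega}.

% Inserting both bounds into (2.14) gives the case \theta = 5/3 of (2.12):
\|U\|_{\frac53,\Omega} \le C\,(1+k)^{\frac23}\,\|f\|_{0,\Omega}
  = C\,(1+k)^{\frac53 - 1}\,\|f\|_{0,\Omega}.
```

The cases $\theta = 0$ and $\theta = 1$ of (2.12) are exactly the two bounds in (2.15), so the three values $\theta \in \{0, 1, \frac53\}$ all follow the same pattern $(1+k)^{\theta-1}$.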
Proof of Theorem 1. Since $H^1(\Omega)$ is the null space of the bilinear form b(· , ·) (see (2.6)), the VP is reduced to the variational problem
$$a(u, v) = F(v) \quad \forall v \in H^1(\Omega).$$
From (2.5), it follows that the bilinear form a(· , ·) satisfies the Fredholm alternative on $H^1(\Omega)$. Hence, the uniqueness ensures the existence of the solution u in $H^1(\Omega)$. On the other hand, the uniqueness results readily from the solution of BVP (2.11). Therefore, the solution u of the reduced variational problem in the null space $H^1(\Omega)$ of the bilinear form b(· , ·) exists and is unique. Therefore, both existence and uniqueness of the solution of the complete variational problem VP are standard consequences (see, for example, [21]) of the inf-sup condition given by (2.7). To prove the stability estimates, we first observe that the pair (u, λ) solution of the variational formulation (VP) satisfies the following mixed BVP:
$$
\begin{cases}
\Delta u + k^2 u = 0 & \text{in } \Omega, \\
\partial_n u = -\partial_n e^{ikx\cdot d} & \text{on } \Gamma, \\
\partial_n u = iku & \text{on } \Sigma,
\end{cases}
$$
and $\forall K \in \mathcal{T}_h$, we have
$$\lambda^K = \begin{cases} -\partial_n u & \text{on } \partial K \setminus \partial\Omega, \\ 0 & \text{on } \partial K \cap \partial\Omega. \end{cases}$$
Consequently, if we set
$$U = u + e^{ikx\cdot d}\,\phi, \tag{2.18}$$
where $\phi \in \mathcal{D}(\Omega)$ satisfies
$$\phi = 1 \text{ on } \Gamma, \quad \partial_n \phi = 0 \text{ on } \Gamma, \quad \phi = \partial_n \phi = 0 \text{ on } \Sigma,$$
then it is easy to verify that U is the unique solution of BVP (2.11) with the right-hand side f given by
$$f = (2ik\, d \cdot \nabla\phi + \Delta\phi)\, e^{ikx\cdot d},$$
and there is a positive constant C (C depends on Ω only) such that
$$\|f\|_{0,\Omega} \le C\,(1 + k).$$
Therefore, the proof of Theorem 1's estimate is an immediate consequence of estimate (2.12) in Lemma 1, which concludes the proof of Theorem 1.
3. The discrete formulation.
3.1. Assumptions, notations, and properties. We adopt, throughout this section, the following notations and properties.
• $\forall K \in \mathcal{T}_h$, $\phi_j^K = e^{ik\, n_j^K \cdot (x - s_j^K)}$; $1 \le j \le 4$.
• $X_h$ is the discrete space for the primal variable. $X_h$ is given by
$$X_h = \big\{ v_h \in X;\ \forall K \in \mathcal{T}_h,\ v_h|_K \in X_h(K) \big\},$$
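where
$$X_h(K) = \Big\{ v_h^K \in H^1(K);\ v_h^K = \sum_{j=1}^{4} \alpha_j^K \phi_j^K, \text{ where } \alpha_j^K \in \mathbb{C} \Big\}.$$
Note that $X_h \subseteq X$, and therefore, $X_h$ is also equipped with the norm $\|\cdot\|_X$.
• $M_h$ is the discrete space of the dual variable. $M_h$ is defined as follows:
$$M_h = \Big\{ \mu_h \in \mathcal{M};\ \forall K \in \mathcal{T}_h \text{ and } \forall T_K^j \subset \partial K:\ \mu_j^K = \mu_h|_{T_K^j} \in \mathbb{C},\ 1 \le j \le 4 \Big\}.$$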
• For every $K \in \mathcal{T}_h$, the matrix $B^K = (B_{lj}^K)_{1\le l,j\le 4}$ represents the elementary matrix corresponding to the bilinear form b(· , ·). Hence, the entries of the matrix $B^K$ are given by
$$B_{lj}^K = \frac{1}{h_l^K} \int_{T_K^l} \phi_j^K \, dt, \quad 1 \le l, j \le 4. \tag{3.1}$$
• $\hat{C}$ designates a generic positive constant. $\hat{C}$ is independent of k, Ω, and the triangulation $\mathcal{T}_h$.
• For a given $K \in \mathcal{T}_h$ and $\forall v^K \in H^1(K)$, we have the following two classical inequalities [14]:
$$\|v^K\|_{0,\partial K} \le \hat{C} \Big( \frac{1}{h_K}\, \|v^K\|_{0,K}^2 + h_K\, |v^K|_{1,K}^2 \Big)^{1/2}, \tag{3.2}$$
$$\Big\| v^K - \frac{1}{|K|} \int_K v^K \, dx \Big\|_{0,K} \le \hat{C}\, h_K\, |v^K|_{1,K}. \tag{3.3}$$
In addition, it follows from combining (3.2) (when applied to $v^K - \frac{1}{|K|} \int_K v^K \, dx$) and (3.3) that
$$\Big\| v^K - \frac{1}{|K|} \int_K v^K \, dx \Big\|_{0,\partial K} \le \hat{C}\, h_K^{1/2}\, |v^K|_{1,K}. \tag{3.4}$$
3.2. Discrete formulation and announcement of the main results. The discrete variational problem (DVP) corresponding to the variational formulation (VP) can be formulated as follows:
\[
\text{(DVP)}\quad
\begin{cases}
\text{Find } (u_h,\lambda_h)\in X_h\times M_h \text{ such that}\\
a(u_h,v_h)+b(v_h,\lambda_h)=F(v_h) & \forall\, v_h\in X_h,\\
b(u_h,\mu_h)=0 & \forall\, \mu_h\in M_h.
\end{cases}
\tag{3.5}
\]
The next two theorems summarize the main results of this section.

Theorem 2. The DVP admits a unique solution $(u_h,\lambda_h)\in X_h\times M_h$. Moreover, for $h_0>0$ such that $k(1+k)^{2/3}h_0^{2/3}$ is “sufficiently small” and $kh_0\le\pi$, there is a positive constant $C$ ($C$ depends on $\Omega$ only) such that for all $h\le h_0$, we have
\[
\|u-u_h\|_{0,\Omega} \le C(1+k)^{7/3}h^{4/3}, \qquad |u-u_h|_{1,\mathcal{T}_h} + \|\lambda-\lambda_h\|_M \le C(1+k)^{5/3}h^{2/3}, \tag{3.6}
\]
where $(u,\lambda)$ is the solution of the continuous variational problem VP (2.4).
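To read the rates in (3.6) concretely, the following minimal sketch evaluates the two right-hand sides with the unknown constant $C$ omitted. The function name `a_priori_factors` and the sample values of $k$ and $h$ are illustrative assumptions and are not taken from the analysis.

```python
def a_priori_factors(k, h):
    """Return the k- and h-dependent factors of (3.6); the constant C is omitted."""
    # Factor bounding ||u - u_h||_{0,Omega}
    l2_factor = (1.0 + k) ** (7.0 / 3.0) * h ** (4.0 / 3.0)
    # Factor bounding |u - u_h|_{1,T_h} + ||lambda - lambda_h||_M
    energy_factor = (1.0 + k) ** (5.0 / 3.0) * h ** (2.0 / 3.0)
    return l2_factor, energy_factor


if __name__ == "__main__":
    k = 20.0  # hypothetical wavenumber
    for h in (0.1, 0.05, 0.025):  # every value satisfies k * h <= pi
        l2_factor, energy_factor = a_priori_factors(k, h)
        print(f"h={h:<6} L2 factor={l2_factor:.4e}  energy factor={energy_factor:.4e}")
```

Under these assumptions, halving $h$ at fixed $k$ multiplies the first factor by $2^{-4/3}$ and the second by $2^{-2/3}$, which is the algebraic content of the exponents in (3.6).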
Theorem 3. Let $u$ be the solution of the continuous variational problem VP (2.4) and $u_h$ be the solution of the DVP. Assume that $kh \le \pi$. Then there exists a constant $C > 0$ ($C$ depends on $\Omega$ only) such that
\[
\|u-u_h\|_{0,\Omega} \le C\Bigg[\Big(\sum_{e\subset\Sigma} h_e\,\|\partial_n u_h - iku_h\|_{0,e}^2\Big)^{1/2} + \Big(\sum_{e\subset\Gamma} h_e\,\|\partial_n u_h + \partial_n e^{ikx\cdot d}\|_{0,e}^2\Big)^{1/2} + \Big(\sum_{e\ \text{interior}} h_e^{-1}\,\|[u_h]\|_{0,e}^2\Big)^{1/2}\Bigg], \tag{3.7}
\]
where $e$ is an edge of $\mathcal{T}_h$, $[u_h]$ is the jump of $u_h$ across the edge $e$, and $h_e$ is the length of $e$.

Remark 2. We must point out that it has been reported in [10, 11] that, for a high-frequency regime, the use of P1 FEM leads to the following estimates: $|u-u_h|_{1,\Omega} \le C k^2 h$ and $\|u-u_h\|_{0,\Omega} \le C k^3 h^2$ when $k^2h$ is small enough. These estimates were derived assuming that $u\in H^2(\Omega)$, which is not, however, valid for most problems.

The a posteriori estimate given by (3.7) is a practical tool for a mesh adaptive strategy. This estimate reveals that the $L^2$ error depends on how well the jump of the primal variable as well as the interior and exterior boundary conditions are approximated at the element level.

In order to prove Theorems 2 and 3, we first need to establish intermediate interpolation results. This is accomplished in section 3.3. Then, we prove in section 3.4.1 the existence and the uniqueness of the solution of the DVP. This result is established as a direct consequence of Proposition 1 and Proposition 2. Section 3.4.2 is devoted to the proof of (3.6) and (3.7). The error estimate given by (3.6) is established in four steps, each of which is formulated as a lemma (see Lemma 7 to Lemma 10). The a posteriori error estimate given by (3.7) is established at the end of section 3.4.2.

The next result, which can be easily established, shows why the existence and the uniqueness of the solution of (DVP) is not a direct consequence of the existence and the uniqueness of the solution of (VP).

Lemma 2. The null space $N_h$ corresponding to the bilinear form $b(\cdot,\cdot)$ defined by
\[
N_h = \{ v_h \in X_h :\ b(v_h,\mu_h) = 0 \ \ \forall\, \mu_h \in M_h \}
\]
satisfies
\[
N_h = \Big\{ v_h \in X_h;\ \int_{\partial K\cap\partial K'} v_h^K \, dt = \int_{\partial K\cap\partial K'} v_h^{K'} \, dt, \quad \forall\, K \ne K' \in \mathcal{T}_h \Big\}. \tag{3.8}
\]
Remark 3. Lemma 2 states that Nh is not a subspace of N = H 1 (Ω), which is the null space of the bilinear form b(. , .). Indeed, the trace of an element of Nh on an edge of an element K is weakly continuous in the sense given by (3.8), while the trace of an element of N on an edge of an element K is “continuous” almost everywhere. Therefore, the inf-sup condition given by (2.7) and then Theorem 1 are no longer valid if we simply replace X and M by Xh and Mh , respectively. 3.3. Mathematical analysis of the interpolation operators. We establish in this section intermediate interpolation results that summarize the main properties of the projection operator Πh from X onto Xh and the projection operator Ph from
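$M$ onto $M_h$. These results are obtained in the case of a rectangular-shaped partition of the computational domain $\Omega$.

3.3.1. Interpolation operator in $X_h$.

Lemma 3. For a fixed $K \in \mathcal{T}_h$, we have the following two properties:
(i) The normal derivative $\partial_n \phi_j^K$ is constant on every edge $T_l^K$ ($1 \le l, j \le 4$).
(ii) If $kh_K \le \pi$, then the matrix $B^K$ is invertible and there is a positive constant $\hat C$ such that
\[
\big\| (B^K)^{-1} \big\|_2 \le \frac{\hat C}{k^2 h_K^2}. \tag{3.9}
\]

Proof of Lemma 3. It follows from the definition of $\phi_j^K$ (see section 3.1) that
\[
\partial_n \phi_j^K = ik\, n_j^K \cdot n_l^K\, \phi_j^K \quad \text{on } T_l^K \qquad (1 \le l, j \le 4).
\]
Therefore, since $K$ is a rectangular-shaped element, a simple calculation shows that
\[
\partial_n \phi_j^K = ik \ \text{on } T_j^K, \qquad \partial_n \phi_j^K = -ik \ \text{on } T_{j+2}^K, \qquad \text{and} \qquad \partial_n \phi_j^K = 0 \ \text{on } T_{j+1}^K \cup T_{j+3}^K.
\]
In addition, it follows from the definition of the elementary matrix $B^K$ (see (3.1)) that
\[
B^K =
\begin{pmatrix}
1 & b_1 & a_2 & b_1\\
b_2 & 1 & b_2 & a_1\\
a_2 & b_1 & 1 & b_1\\
b_2 & a_1 & b_2 & 1
\end{pmatrix},
\]
where
\[
a_j = e^{-ikh_j^K} \quad \text{and} \quad b_j = \frac{1 - e^{-ikh_j^K}}{ikh_j^K}, \qquad 1 \le j \le 4.
\]
We set $\Delta = (1+a_1)(1+a_2) - 4b_1b_2$. Then, it is easy to verify that $\Delta \ne 0$ for $kh_K \le \pi$ (which is, in fact, a sufficient but not necessary condition). This ensures that the matrix $B^K$ is invertible, and we have
\[
(B^K)^{-1} = \frac{1}{2}
\begin{pmatrix}
\frac{1+a_1}{\Delta} + \frac{1}{1-a_2} & -\frac{2b_1}{\Delta} & \frac{1+a_1}{\Delta} - \frac{1}{1-a_2} & -\frac{2b_1}{\Delta}\\[4pt]
-\frac{2b_2}{\Delta} & \frac{1+a_2}{\Delta} + \frac{1}{1-a_1} & -\frac{2b_2}{\Delta} & \frac{1+a_2}{\Delta} - \frac{1}{1-a_1}\\[4pt]
\frac{1+a_1}{\Delta} - \frac{1}{1-a_2} & -\frac{2b_1}{\Delta} & \frac{1+a_1}{\Delta} + \frac{1}{1-a_2} & -\frac{2b_1}{\Delta}\\[4pt]
-\frac{2b_2}{\Delta} & \frac{1+a_2}{\Delta} - \frac{1}{1-a_1} & -\frac{2b_2}{\Delta} & \frac{1+a_2}{\Delta} + \frac{1}{1-a_1}
\end{pmatrix}.
\]
Finally, one can verify that there is a positive constant $\hat C$, independent of $k$, such that
\[
\big\| (B^K)^{-1} \big\|_2 \le \frac{\hat C}{k^2 h_K^2}.
\]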
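The closed-form inverse in the proof of Lemma 3 can be checked numerically. The sketch below assembles $B^K$ from $a_1, a_2, b_1, b_2$, builds the stated inverse, and measures how far their product is from the identity. The function names and the sample values of $k$, $h_1^K$, $h_2^K$ are assumptions made for illustration; only the Python standard library is used.

```python
import cmath


def edge_matrix(k, h1, h2):
    """Coefficients a_j, b_j and the 4x4 matrix B^K for edge lengths h1 = h_1^K, h2 = h_2^K."""
    a1, a2 = (cmath.exp(-1j * k * h) for h in (h1, h2))
    b1, b2 = ((1 - cmath.exp(-1j * k * h)) / (1j * k * h) for h in (h1, h2))
    B = [[1, b1, a2, b1],
         [b2, 1, b2, a1],
         [a2, b1, 1, b1],
         [b2, a1, b2, 1]]
    return B, (a1, a2, b1, b2)


def closed_form_inverse(a1, a2, b1, b2):
    """Closed-form inverse of B^K and the quantity Delta from the proof of Lemma 3."""
    delta = (1 + a1) * (1 + a2) - 4 * b1 * b2
    p1, p2 = (1 + a1) / delta, (1 + a2) / delta
    q1, q2 = 1 / (1 - a2), 1 / (1 - a1)
    r1, r2 = -2 * b1 / delta, -2 * b2 / delta
    rows = [[p1 + q1, r1, p1 - q1, r1],
            [r2, p2 + q2, r2, p2 - q2],
            [p1 - q1, r1, p1 + q1, r1],
            [r2, p2 - q2, r2, p2 + q2]]
    return [[0.5 * x for x in row] for row in rows], delta


def max_identity_defect(B, Binv):
    """Largest entry (in modulus) of B @ Binv - I."""
    n = len(B)
    return max(abs(sum(B[i][m] * Binv[m][j] for m in range(n)) - (1 if i == j else 0))
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    k, h1, h2 = 5.0, 0.3, 0.2  # hypothetical values with k * max(h1, h2) <= pi
    B, coeffs = edge_matrix(k, h1, h2)
    Binv, delta = closed_form_inverse(*coeffs)
    print("|Delta| =", abs(delta))
    print("max |B Binv - I| =", max_identity_defect(B, Binv))
```

The check relies on the block structure of $B^K$: writing $B^K$ as a 2-by-2 block matrix with diagonal block $P$ and off-diagonal block $Q$, the determinant of $P+Q$ is exactly $\Delta$, and $P-Q$ is the diagonal matrix with entries $1-a_2$ and $1-a_1$.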
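Next, we introduce the sequence of linear operators $(\pi_K)_{K\in\mathcal{T}_h}$ defined as follows:
\[
\pi_K : H^1(K) \longrightarrow \mathbb{C}^4, \qquad v^K \longmapsto \pi_K v^K,
\]
where
\[
\big(\pi_K v^K\big)_j = \frac{1}{h_j^K}\int_{T_j^K} v^K\,dt, \qquad 1 \le j \le 4. \tag{3.10}
\]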
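Then, it follows from (3.2) that, for any $h_K$-independent vectorial norm $|||\cdot|||$ in $\mathbb{C}^4$, there is a positive constant $\hat C$ such that
\[
|||\pi_K v^K||| \le \hat C\, \|v^K\|_{X(K)} \qquad \forall\, v^K \in H^1(K). \tag{3.11}
\]
In addition, we have
\[
\forall\, v_h^K \in X_h(K), \qquad v_h^K = \sum_{j=1}^{4} \alpha_j^K \phi_j^K, \quad \text{where } \alpha_j^K = \big( (B^K)^{-1} \pi_K v_h^K \big)_j, \quad 1 \le j \le 4. \tag{3.12}
\]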
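The next result states that, for a given $K \in \mathcal{T}_h$, the set of DOFs associated to the 4 planar waves $(\phi_j^K)_{j=1}^{4}$ is unisolvent.

Lemma 4. For a given $K \in \mathcal{T}_h$ and for any $v_h^K \in X_h(K)$, we have the following equivalence:
\[
\int_{T_l^K} v_h^K \, dt = 0, \quad 1 \le l \le 4 \qquad \Longleftrightarrow \qquad v_h^K = 0 \ \text{on } K.
\]

Proof of Lemma 4. Using (3.10) and (3.12), it follows that for a given $K \in \mathcal{T}_h$, we have
\[
\int_{T_l^K} v_h^K \, dt = 0, \quad 1 \le l \le 4 \quad \Longleftrightarrow \quad \pi_K v_h^K = 0 \quad \Longleftrightarrow \quad v_h^K = 0,
\]
which proves Lemma 4.

Consequently, one can construct a sequence of local linear operators $(\Pi_K)_{K\in\mathcal{T}_h}$ as follows:
\[
\Pi_K : H^1(K) \longrightarrow X_h(K), \qquad v^K \longmapsto \Pi_K v^K,
\]
with
\[
\int_{T_j^K} v^K \, dt = \int_{T_j^K} \Pi_K v^K \, dt, \qquad 1 \le j \le 4. \tag{3.13}
\]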
Next, we state three properties of the operator $\Pi_K$. These properties are immediate consequences of the definition of $\Pi_K$, the inequalities (3.2)–(3.3), property (3.13) of the operator $\Pi_K$, and the characterization of elements of $X_h(K)$ with the elementary matrix $B^K$ (see (3.12)). Note that the second identity of (3.14) is obtained by Green's formula using the rectangular shape of $K$.

Property 2. The operator $\Pi_K$ satisfies the following three properties:
(i) $\forall\, K \in \mathcal{T}_h$ and $\forall\, v^K \in H^1(K)$, we have
\[
\int_{\partial K} \big( v^K - \Pi_K v^K \big)\, dt = 0, \qquad \int_K \nabla \big( v^K - \Pi_K v^K \big)\, dx = 0. \tag{3.14}
\]
(ii) There is a positive constant $\hat C$ such that
\[
\forall\, K \in \mathcal{T}_h, \qquad \|v^K - \Pi_K v^K\|_{0,\partial K} \le \hat C h_K^{1/2}\, |v^K - \Pi_K v^K|_{1,K} \qquad \forall\, v^K \in H^1(K). \tag{3.15}
\]
(iii) For a given $v^K \in H^1(K)$, we have
\[
\pi_K v^K = \pi_K \circ \Pi_K v^K \quad \text{and} \quad \Pi_K v^K = \sum_{j=1}^{4} \alpha_j^K \phi_j^K, \quad \text{with } \alpha_j^K = \big( (B^K)^{-1} \pi_K v^K \big)_j. \tag{3.16}
\]
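Property (iii) gives a direct recipe for $\Pi_K$: compute the four edge means $\pi_K v^K$, solve the 4-by-4 system with $B^K$, and combine the plane waves. The sketch below follows that recipe on a rectangle $K = [0, l_x] \times [0, l_y]$. Several choices are assumptions made for illustration and are not fixed by the text: the edge ordering, the points $s_j^K$ taken as edge midpoints, the composite Simpson rule used for edge integrals, the helper names, and the sample function $v$.

```python
import cmath


def edge_average(f, p0, p1, n=200):
    """Composite Simpson approximation of the mean of f over the segment p0 -> p1 (n even)."""
    total = 0j
    for i in range(n + 1):
        t = i / n
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * f(p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))
    return total / (3 * n)


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense complex system."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0j] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def local_interpolant(v, k, lx, ly):
    """Return Pi_K v on K = [0, lx] x [0, ly], the edges of K, and the edge means pi_K v."""
    # Edges T_1..T_4 with outward unit normals n_1..n_4 (assumed ordering).
    edges = [((lx, 0.0), (lx, ly)), ((lx, ly), (0.0, ly)),
             ((0.0, ly), (0.0, 0.0)), ((0.0, 0.0), (lx, 0.0))]
    normals = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    mids = [((p[0] + q[0]) / 2, (p[1] + q[1]) / 2) for p, q in edges]  # assumed s_j^K

    def phi(j):
        (nx, ny), (sx, sy) = normals[j], mids[j]
        return lambda x, y: cmath.exp(1j * k * (nx * (x - sx) + ny * (y - sy)))

    # B_{lj} is the mean of phi_j over T_l, as in (3.1).
    B = [[edge_average(phi(j), *edges[l]) for j in range(4)] for l in range(4)]
    pi_v = [edge_average(v, *edges[l]) for l in range(4)]
    alpha = solve(B, pi_v)  # alpha = (B^K)^{-1} pi_K v, as in (3.16)

    def interpolant(x, y):
        return sum(alpha[j] * phi(j)(x, y) for j in range(4))

    return interpolant, edges, pi_v


if __name__ == "__main__":
    v = lambda x, y: x * x + y  # hypothetical smooth function
    Pi_v, edges, pi_v = local_interpolant(v, k=4.0, lx=0.3, ly=0.2)
    for (p0, p1), mean in zip(edges, pi_v):
        print("edge-mean defect:", abs(edge_average(Pi_v, p0, p1) - mean))
```

Because the same quadrature rule is used for $B^K$ and for the edge means, the printed defects measure only round-off in the linear solve. They confirm the algebraic identity (3.13) and say nothing about the approximation quality of $\Pi_K$.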
Proof of Property 2. We prove only the second property, since the two others are immediate. Using (3.14) and the definition of the norm $\|\cdot\|_{0,\partial K}$, we have
\[
\begin{aligned}
\|v^K - \Pi_K v^K\|_{0,\partial K} &= \Big\| v^K - \Pi_K v^K - \frac{1}{|\partial K|}\int_{\partial K} \big(v^K - \Pi_K v^K\big)\,dt \Big\|_{0,\partial K}\\
&\le \inf_{\beta\in\mathbb{C}} \|v^K - \Pi_K v^K - \beta\|_{0,\partial K}\\
&\le \Big\| v^K - \Pi_K v^K - \frac{1}{|K|}\int_K \big(v^K - \Pi_K v^K\big)\,dx \Big\|_{0,\partial K}.
\end{aligned}
\]
We then conclude using (3.4).

In the next two lemmas, we establish a priori estimates on the operator $\Pi_K$.

Lemma 5. Assume $kh \le \pi$. Then, there is a positive constant $\hat C$ such that $\forall\, K \in \mathcal{T}_h$ and $\forall\, v^K \in H^1(K)$, we have
\[
\|v^K - \Pi_K v^K\|_{0,K} \le \hat C\, h_K\, |v^K - \Pi_K v^K|_{1,K}, \tag{3.17}
\]
\[
k\,\|\Pi_K v^K\|_{0,K} + \|\Pi_K v^K\|_{X(K)} \le \hat C\, \|v^K\|_{X(K)}. \tag{3.18}
\]

Proof of Lemma 5. We establish the estimate given by (3.17) using the Aubin–Nitsche argument [23, 24, 25]. More specifically, consider the following auxiliary BVP: Find $\varphi \in H_0^1(K)$ such that
\[
-\Delta\varphi = v^K - \Pi_K v^K \quad \text{on } K.
\]
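Since $K$ is a rectangular-shaped element, $\varphi$ is, in fact, in $H^2(K) \cap H_0^1(K)$, and we have
\[
|\varphi|_{2,K} = \|\Delta\varphi\|_{0,K} = \|v^K - \Pi_K v^K\|_{0,K}.
\]
It follows that
\[
\|v^K - \Pi_K v^K\|_{0,K}^2 = \int_K \nabla\big(v^K - \Pi_K v^K\big)\cdot\nabla\varphi\,dx - \int_{\partial K} \big(v^K - \Pi_K v^K\big)\,\partial_n\varphi\,dt.
\]
Using (3.14), we deduce that
\[
\int_K \nabla\big(v^K - \Pi_K v^K\big)\cdot\nabla\varphi\,dx = \int_K \nabla\big(v^K - \Pi_K v^K\big)\cdot\Big(\nabla\varphi - \frac{1}{|K|}\int_K \nabla\varphi\,dx\Big)\,dx.
\]
Then,
\[
\Big|\int_K \nabla\big(v^K - \Pi_K v^K\big)\cdot\nabla\varphi\,dx\Big| \le |v^K - \Pi_K v^K|_{1,K}\, \Big\|\nabla\varphi - \frac{1}{|K|}\int_K \nabla\varphi\,dx\Big\|_{0,K}.
\]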
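It follows from (3.3) that there is a positive constant $\hat C$ such that
\[
\Big|\int_K \nabla\big(v^K - \Pi_K v^K\big)\cdot\nabla\varphi\,dx\Big| \le \hat C h_K\, |v^K - \Pi_K v^K|_{1,K}\, |\varphi|_{2,K}.
\]
Moreover, using (3.13) we obtain that
\[
\int_{\partial K} \big(v^K - \Pi_K v^K\big)\,\partial_n\varphi\,dt = \int_{\partial K} \big(v^K - \Pi_K v^K\big)\Big(\nabla\varphi - \frac{1}{|K|}\int_K \nabla\varphi\,dx\Big)\cdot n\,dt.
\]
Hence, we have
\[
\Big|\int_{\partial K} \big(v^K - \Pi_K v^K\big)\,\partial_n\varphi\,dt\Big| \le \|v^K - \Pi_K v^K\|_{0,\partial K}\, \Big\|\nabla\varphi - \frac{1}{|K|}\int_K \nabla\varphi\,dx\Big\|_{0,\partial K}.
\]
Finally, using inequality (3.4) and (3.15), it follows that there is a positive constant $\hat C$ such that
\[
\Big|\int_{\partial K} \big(v^K - \Pi_K v^K\big)\,\partial_n\varphi\,dt\Big| \le \hat C h_K\, |v^K - \Pi_K v^K|_{1,K}\, |\varphi|_{2,K}.
\]
Therefore, (3.17) results from
\[
\|v^K - \Pi_K v^K\|_{0,K}^2 \le \hat C h_K\, |v^K - \Pi_K v^K|_{1,K}\, |\varphi|_{2,K} = \hat C h_K\, |v^K - \Pi_K v^K|_{1,K}\, \|v^K - \Pi_K v^K\|_{0,K}.
\]
Next, we establish the estimate given by (3.18). To do this, we first note that it follows from (3.16) that
\[
\forall\, v^K \in H^1(K), \qquad |||\Pi_K v^K||| \le \sum_{j=1}^{4} |\alpha_j^K|\, |||\phi_j^K|||,
\]
where $|||\cdot|||$ is any norm in $X_h(K)$. Hence, using (3.12), (3.11), and (3.9), there is a positive constant $\hat C$ such that
\[
\forall\, v^K \in H^1(K), \qquad |||\Pi_K v^K||| \le \frac{\hat C}{k^2 h_K^2}\, \|v^K\|_{X(K)}\, \max_{1\le j\le 4} |||\phi_j^K|||.
\]
On the other hand, it is easy to verify that
\[
\|\phi_j^K\|_{0,K} \le h_K \quad \text{and} \quad |\phi_j^K|_{1,K} \le k h_K.
\]
Consequently, there is a positive constant $\hat C$ such that
\[
\|\Pi_K v^K\|_{0,K} \le \frac{\hat C}{k^2 h_K}\, \|v^K\|_{X(K)} \quad \text{and} \quad |\Pi_K v^K|_{1,K} \le \frac{\hat C}{k h_K}\, \|v^K\|_{X(K)}.
\]
Furthermore, using (3.17), we deduce that
\[
\|v^K - \Pi_K v^K\|_{0,K} \le \hat C \big( h_K |v^K|_{1,K} + h_K |\Pi_K v^K|_{1,K} \big) \le \hat C \Big( h_K |v^K|_{1,K} + \frac{\hat C}{k}\, \|v^K\|_{X(K)} \Big).
\]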
Thus,
\[
k\,\|v^K - \Pi_K v^K\|_{0,K} \le \hat C \big( k h_K |v^K|_{1,K} + \|v^K\|_{X(K)} \big),
\]
and therefore, using the definition of the norm $\|\cdot\|_{X(K)}$, it follows that
\[
k\,\|\Pi_K v^K\|_{0,K} \le \hat C\, \|v^K\|_{X(K)},
\]
which concludes the proof of the first part of (3.18). Finally, we establish the second part of the estimate given by (3.18). To do this, we observe that $\forall\, v^K \in H^1(K)$, we have
\[
\begin{aligned}
|v^K - \Pi_K v^K|_{1,K}^2 &= \int_K \nabla\big(v^K - \Pi_K v^K\big)\cdot\nabla v^K\,dx + \int_K \big(v^K - \Pi_K v^K\big)\,\Delta\Pi_K v^K\,dx\\
&= \int_K \nabla\big(v^K - \Pi_K v^K\big)\cdot\nabla v^K\,dx - k^2 \int_K \big(v^K - \Pi_K v^K\big)\,\Pi_K v^K\,dx\\
&\le |v^K - \Pi_K v^K|_{1,K}\, |v^K|_{1,K} + k^2\, \|v^K - \Pi_K v^K\|_{0,K}\, \|\Pi_K v^K\|_{0,K}.
\end{aligned}
\]
Note that there are no boundary terms in the previous equalities because of Lemma 3 and (3.13). Using again (3.17), we deduce the existence of a positive constant $\hat C$ such that
\[
|v^K - \Pi_K v^K|_{1,K} \le |v^K|_{1,K} + \hat C k^2 h_K\, \|\Pi_K v^K\|_{0,K}.
\]
Therefore, using the first part of (3.18), we deduce that
\[
|v^K - \Pi_K v^K|_{1,K} \le |v^K|_{1,K} + \hat C k h_K\, \|v^K\|_{X(K)}.
\]
Consequently, there is a positive constant $\hat c$ such that
\[
|\Pi_K v^K|_{1,K} \le 2\,|v^K|_{1,K} + \hat C k h_K\, \|v^K\|_{X(K)} \le \hat c\, \|v^K\|_{X(K)}.
\]
Moreover, using (3.17), we deduce that there is a positive constant $\hat C$ such that
\[
\|\Pi_K v^K\|_{0,K} \le \|v^K\|_{0,K} + \hat C h_K\, |v^K - \Pi_K v^K|_{1,K},
\]
and thus,
\[
\|\Pi_K v^K\|_{0,K} \le \hat C h_K\, \|v^K\|_{X(K)},
\]
which concludes the proof of (3.18).

Lemma 6. Assume $kh \le \pi$. Then for every $s \in [0,1]$, there is a positive constant $\hat C$ such that for all $K \in \mathcal{T}_h$, we have
\[
|v_K - \Pi_K v_K|_{1,K} \le \hat C \big( h_K^s\, |v_K|_{1+s,K} + k^2 h_K\, \|v_K\|_{0,K} + k^2 h_K^2\, |v_K|_{1,K} \big) \qquad \forall\, v_K \in H^{1+s}(K). \tag{3.19}
\]

Proof of Lemma 6. First, let $\varphi$ be in $P_1(K)$, where $P_1(K)$ is the space of the affine polynomial functions. Then, using first (3.14) and the fact that $\nabla\varphi$ is constant in
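each element, next that functions in $X_h$ satisfy the homogeneous Helmholtz equation in each element, we can write
\[
\begin{aligned}
|\varphi - \Pi_K\varphi|_{1,K}^2 &= \int_K \nabla(\varphi - \Pi_K\varphi)\cdot\nabla(\varphi - \Pi_K\varphi)\,dx = -\int_K \nabla(\varphi - \Pi_K\varphi)\cdot\nabla\Pi_K\varphi\,dx\\
&= \int_K (\varphi - \Pi_K\varphi)\,\Delta\Pi_K\varphi\,dx - \int_{\partial K} (\varphi - \Pi_K\varphi)\,\partial_n\Pi_K\varphi\,dt\\
&= \int_K (\varphi - \Pi_K\varphi)\,\Delta\Pi_K\varphi\,dx = -k^2 \int_K (\varphi - \Pi_K\varphi)\,\Pi_K\varphi\,dx\\
&\le k^2\, \|\varphi - \Pi_K\varphi\|_{0,K}\, \|\Pi_K\varphi\|_{0,K}.
\end{aligned}
\]
From relation (3.17), we obtain
\[
\|\varphi - \Pi_K\varphi\|_{0,K} \le \hat C h_K\, |\varphi - \Pi_K\varphi|_{1,K}.
\]
Moreover, (3.18) gives
\[
\|\Pi_K\varphi\|_{0,K} \le \hat C \big( \|\varphi\|_{0,K} + h_K |\varphi|_{1,K} \big).
\]
Hence,
\[
|\varphi - \Pi_K\varphi|_{1,K} \le \hat C k^2 h_K \big( \|\varphi\|_{0,K} + h_K |\varphi|_{1,K} \big).
\]
On the other hand, it follows from (3.18) that for $v_K \in H^1(K)$ and $\varphi \in P_1(K)$, we have
\[
|\Pi_K(\varphi - v_K)|_{1,K} \le \hat C \Big( \frac{1}{h_K}\, \|v_K - \varphi\|_{0,K} + |v_K - \varphi|_{1,K} \Big)
\]
and then
\[
\begin{aligned}
|v_K - \Pi_K v_K|_{1,K} &\le |v_K - \varphi|_{1,K} + |\varphi - \Pi_K\varphi|_{1,K} + |\Pi_K(\varphi - v_K)|_{1,K}\\
&\le \hat C \Big( \frac{1}{h_K}\, \|v_K - \varphi\|_{0,K} + |v_K - \varphi|_{1,K} + k^2 h_K \big( \|\varphi\|_{0,K} + h_K |\varphi|_{1,K} \big) \Big).
\end{aligned}
\]
Furthermore, since $kh_K \le \pi$, we deduce that
\[
|v_K - \Pi_K v_K|_{1,K} \le \hat C \Big( \frac{1}{h_K}\, \|v_K - \varphi\|_{0,K} + |v_K - \varphi|_{1,K} + k^2 h_K\, \|v_K\|_{0,K} + k^2 h_K^2\, |v_K|_{1,K} \Big).
\]
Since $v_K \in H^{1+s}(K)$ with $s \in [0,1]$, we choose $\varphi$ to be the $P_1$-polynomial approximation (the Lagrange polynomial interpolation) of $v$ on $K$ if $s \ne 0$ and $\varphi = \frac{1}{|K|}\int_K v\,dx$ if $s = 0$. Therefore, it follows from the standard $P_1$ interpolation results on $K$ (see [14]) that
\[
|v_K - \Pi_K v_K|_{1,K} \le \hat C \big( h_K^s\, |v_K|_{1+s,K} + k^2 h_K\, \|v_K\|_{0,K} + k^2 h_K^2\, |v_K|_{1,K} \big).
\]
Next, we introduce the global interpolation linear operator $\Pi_h$ as follows:
\[
\Pi_h : X \longrightarrow X_h, \qquad v \longmapsto \Pi_h v,
\]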
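with $(\Pi_h v)|_K = \Pi_K(v|_K) \in X_h(K)$ $\forall\, K \in \mathcal{T}_h$.

Property 3. The global interpolation operator $\Pi_h : X \longrightarrow X_h$ satisfies the following four properties:
(i) $\forall\, v \in H^{1+s}(\Omega)$ with $s \in [0,1]$, we have
\[
\|v - \Pi_h v\|_{0,\Omega} \le \hat C \big( h^{1+s}\, |v|_{1+s,\Omega} + k^2 h^3\, |v|_{1,\Omega} + k^2 h^2\, \|v\|_{0,\Omega} \big), \tag{3.20}
\]
\[
|v - \Pi_h v|_{1,\mathcal{T}_h} \le \hat C \big( h^{s}\, |v|_{1+s,\Omega} + k^2 h^2\, |v|_{1,\Omega} + k^2 h\, \|v\|_{0,\Omega} \big). \tag{3.21}
\]
(ii) $\forall\, v \in H^1(\Omega)$, $\Pi_h v \in N_h$, where $N_h$ is the null space of $b(\cdot,\cdot)$.
(iii) $\forall\, v \in X$ and $\forall\, v_h \in X_h$, we have
\[
\begin{aligned}
a(v - \Pi_h v, v_h) &= -ik \sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Sigma} (v - \Pi_h v)\, \overline{v_h}\,dt,\\
a(v_h, v - \Pi_h v) &= -ik \sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Sigma} v_h\, \overline{(v - \Pi_h v)}\,dt.
\end{aligned}
\tag{3.22}
\]
(iv) $\forall\, v \in X$ and $\forall\, \mu_h \in M_h$, we have
\[
b(v, \mu_h) = b(\Pi_h v, \mu_h). \tag{3.23}
\]
Note that (3.20)–(3.21) are immediate consequences of Lemma 6, while the two equalities given by (3.22) are obtained by Green's formula and using the fact that the plane waves are solutions of the Helmholtz equation.

3.3.2. Interpolation operator in $M_h$. We introduce here the projection operator $P_h$ for the dual variable $\lambda$. $P_h$ is defined as follows:
\[
P_h : M \longrightarrow M_h, \qquad \mu \longmapsto P_h\mu,
\]
where
\[
\forall\, K \in \mathcal{T}_h, \qquad P_h\mu|_{T_j^K} = \frac{1}{h_j^K}\int_{T_j^K} \mu\,dt, \qquad 1 \le j \le 4.
\]
Then, the operator $P_h$ satisfies
\[
\forall\, K \in \mathcal{T}_h,\ \forall\, \mu \in M, \qquad \int_{\partial K} \mu\,dt = \int_{\partial K} P_h\mu\,dt. \tag{3.24}
\]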
3.4. Proof of Theorem 2. We first prove that the DVP admits a unique solution (uh , λh ) in Xh × Mh and then we establish the error estimate given by (3.6). 3.4.1. Existence and uniqueness. First, we prove that the bilinear form b(· , ·) satisfies the inf-sup condition [21]. This result is stated in Proposition 1. Then, we prove in Proposition 2 the uniqueness of the solution of the homogeneous problem corresponding to the variational problem (DVP). The existence and uniqueness of the DVP is then a direct consequence of Proposition 1 and Proposition 2.
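Proposition 1. Assume $kh \le \pi$. Then, there is a positive constant $\gamma$ independent of $k$ and $h$ such that
\[
\gamma\, \|\mu_h\|_M \le \sup_{v_h\in X_h} \frac{|b(v_h,\mu_h)|}{\|v_h\|_X} \le \|\mu_h\|_M \qquad \forall\, \mu_h \in M_h.
\]

Proof of Proposition 1. From (2.8), we deduce that
\[
\forall\, \mu_h \in M_h, \qquad \sup_{v_h\in X_h} \frac{|b(v_h,\mu_h)|}{\|v_h\|_X} \le \|\mu_h\|_M.
\]
In addition, it follows from (2.7) that
\[
\forall\, \mu_h \in M_h,\ \exists\, \phi \in X, \qquad \sup_{v\in X} \frac{|b(v,\mu_h)|}{\|v\|_X} = \frac{|b(\phi,\mu_h)|}{\|\phi\|_X} = \|\mu_h\|_M.
\]
Therefore, it follows from (3.23) that
\[
\|\mu_h\|_M = \frac{|b(\Pi_h\phi,\mu_h)|}{\|\Pi_h\phi\|_X}\, \frac{\|\Pi_h\phi\|_X}{\|\phi\|_X}.
\]
Since $kh \le \pi$, it follows from (3.18) that there is a positive constant $\hat C$ such that
\[
\|\mu_h\|_M \le \hat C \sup_{v_h\in X_h} \frac{|b(v_h,\mu_h)|}{\|v_h\|_X},
\]
which concludes the proof of Proposition 1.

Proposition 2. Assume $kh \le \pi$. Then, the only solution of the following homogeneous DVP
\[
\text{Find } u_h \in N_h \text{ such that } a(u_h, v_h) = 0 \qquad \forall\, v_h \in N_h,
\]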
is the trivial one.

Proof of Proposition 2. Let $u_h \in N_h$ such that $a(u_h, v_h) = 0$ $\forall\, v_h \in N_h$, then $a(u_h, u_h) = 0$, which implies
\[
u_h = 0 \ \text{on } \Sigma \quad \text{and} \quad k\,\|u_h\|_{0,\Omega} = |u_h|_{1,\mathcal{T}_h}.
\]
In addition, since $u_h \in X_h$, then $\Delta u_h + k^2 u_h = 0$ in every $K \in \mathcal{T}_h$. Therefore, using integration by parts, it follows that
\[
a(u_h, v_h) = \sum_{K\in\mathcal{T}_h} \int_{\partial K} \overline{v_h}\, \partial_n u_h\,dt = 0 \qquad \forall\, v_h \in N_h.
\]
Then, we also have
\[
\partial_n u_h = 0 \ \text{on } \Gamma \cup \Sigma \quad \text{and} \quad [\partial_n u_h] = 0 \ \text{on } \partial K \cap \partial K' \quad \forall\, K \ne K' \in \mathcal{T}_h,
\]
where $[\partial_n u_h] = \partial_n u_h^K + \partial_n u_h^{K'}$ is the jump of the normal derivative of $u_h$ across $\partial K \cap \partial K'$. To conclude the proof of this proposition, we use a discrete continuation result. We consider first the following property (P).

Let $K \in \mathcal{T}_h$ and $T_l^K$ and $T_m^K$ be two adjacent edges of $K$ such that
\[
\partial_n u_h|_{T_l^K} = \partial_n u_h|_{T_m^K} = \int_{T_l^K} u_h\,dt = \int_{T_m^K} u_h\,dt = 0, \quad \text{then } u_h = 0 \ \text{in } K.
\]
Note that property (P) is easy to establish since $u_h \in X_h$ (a sum of four plane waves), and therefore, $u_h$ satisfies the Helmholtz equation at the element level $K$. Now, since there is at least one element $K \in \mathcal{T}_h$ with two adjacent edges belonging to the boundary $\Sigma$, then using property (P) leads to $u_h = 0$ in $K$. Then, we obtain sequentially that $u_h = 0$ in all the quadrilaterals belonging to the first layer adjacent to the boundary $\Sigma$. We repeat this process on the second layer of the quadrilaterals and so on, until the boundary $\Gamma$ is reached, which proves the uniqueness of the solution $u_h$.

3.4.2. A priori error estimates. In the next lemmas, we establish a priori estimates in order to prove the error estimate (3.6) given in Theorem 2 between the exact solution $(u,\lambda)$ and the discrete solution $(u_h,\lambda_h)$. We consider the following notations:
\[
\kappa_h = h(1+k) \quad \text{and} \quad z_h = u_h - \Pi_h u. \tag{3.25}
\]
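Lemma 7. There is a positive constant $\hat C$ independent of $k$ and $h$ such that the solution $\lambda$ of the variational problem VP (2.4) satisfies
\[
\|\lambda - P_h\lambda\|_M \le \hat C\, \kappa_h^{2/3}\, (1+k).
\]

Proof of Lemma 7. First, recall that
\[
\lambda^K =
\begin{cases}
-\partial_n u & \text{on } \partial K \setminus \partial\Omega,\\
0 & \text{on } \partial K \cap \partial\Omega.
\end{cases}
\]
Therefore, using the definition of the operator $P_h$ along with the fact that the normal unit vector $n^K$ is constant on each edge $e$ of $K$, we deduce that $\forall\, K \in \mathcal{T}_h$, we have
\[
\begin{aligned}
\|\lambda - P_h\lambda\|_{0,\partial K}^2 &= \sum_{e\subset K,\ e\ \text{interior}} \Big\| \nabla u\cdot n^K - \frac{1}{|e|}\int_e \nabla u\cdot n^K\,dt \Big\|_{0,e}^2\\
&\le \sum_{e\subset K,\ e\ \text{interior}} \Big\| \nabla u - \frac{1}{|e|}\int_e \nabla u\,dt \Big\|_{0,e}^2 = \sum_{e\subset K,\ e\ \text{interior}}\ \inf_{\beta\in\mathbb{C}^2} \|\nabla u - \beta\|_{0,e}^2\\
&\le \sum_{e\subset K,\ e\ \text{interior}} \Big\| \nabla u - \frac{1}{|K|}\int_K \nabla u\,dx \Big\|_{0,e}^2 \le \Big\| \nabla u - \frac{1}{|K|}\int_K \nabla u\,dx \Big\|_{0,\partial K}^2.
\end{aligned}
\]
Finally, using classical interpolation results [14], there is a positive constant $\hat C$ such that
\[
\forall\, K \in \mathcal{T}_h, \qquad \|\lambda - P_h\lambda\|_{0,\partial K} \le \hat C\, h_K^{1/6}\, |u|_{\frac53,K}. \tag{3.26}
\]
In addition, we have from (2.3) that
\[
\|\lambda - P_h\lambda\|_{H^{-1/2}(\partial K)} = \sup_{v\in H^1(K)} \frac{\big|\int_{\partial K} (\lambda - P_h\lambda)\,v\,dt\big|}{\|v\|_{X(K)}}.
\]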
On the other hand, from (3.24), we deduce that
\[
\int_{\partial K} (\lambda - P_h\lambda)\,v\,dt = \int_{\partial K} (\lambda - P_h\lambda)\Big( v - \frac{1}{|K|}\int_K v\,dx \Big)\,dt \qquad \forall\, v \in H^1(K).
\]
Hence,
\[
\Big|\int_{\partial K} (\lambda - P_h\lambda)\,v\,dt\Big| \le \|\lambda - P_h\lambda\|_{0,\partial K}\, \Big\| v - \frac{1}{|K|}\int_K v\,dx \Big\|_{0,\partial K} \qquad \forall\, v \in H^1(K).
\]
Using the following classical interpolation results [14], it follows that there is a positive constant $\hat C$ such that
\[
\Big\| v - \frac{1}{|K|}\int_K v\,dx \Big\|_{0,\partial K} \le \hat C h_K^{1/2}\, |v|_{1,K} \le \hat C h_K^{1/2}\, \|v\|_{X(K)}.
\]
We then deduce the existence of a positive constant $\hat C$ such that
\[
\forall\, K \in \mathcal{T}_h, \qquad \|\lambda - P_h\lambda\|_{H^{-1/2}(\partial K)} \le \hat C h_K^{1/2}\, \|\lambda - P_h\lambda\|_{0,\partial K}. \tag{3.27}
\]
Lemma 7 is the consequence of (3.26)–(3.27) and Theorem 1.

The next lemma can be viewed as a consistency result.

Lemma 8. Assume $kh \le \pi$. Then, there is a positive constant $\hat C$ independent of $k$ and $h$ such that $\forall\, v_h \in X_h$ and $\forall\, v \in H^1(\Omega)$,
\[
|a(z_h, v_h) + b(v_h, \lambda_h - P_h\lambda)| \le \hat C\, (1+k)\, \kappa_h^{2/3} \big[ \kappa_h\, |v_h|_{1,\mathcal{T}_h} + |v - v_h|_{1,\mathcal{T}_h} \big].
\]

Proof of Lemma 8. We have
\[
a(z_h, v_h) = a(u_h - \Pi_h u, v_h) = a(u - \Pi_h u, v_h) - a(u - u_h, v_h).
\]
Moreover, since $u$ satisfies VP, we have $a(u, v_h) + b(v_h, \lambda) = F(v_h)$, and since $u_h$ satisfies DVP, we have $a(u_h, v_h) + b(v_h, \lambda_h) = F(v_h)$. Consequently, we obtain
\[
a(u - u_h, v_h) = -b(v_h, \lambda - \lambda_h),
\]
which leads to
\[
a(z_h, v_h) + b(v_h, \lambda_h - P_h\lambda) = a(u - \Pi_h u, v_h) + b(v_h, \lambda - P_h\lambda).
\]
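Hence, it follows from (3.22) that
\[
a(u_h - \Pi_h u, v_h) + b(v_h, \lambda_h - P_h\lambda) = -ik \int_\Sigma (u - \Pi_h u)\, \overline{v_h}\,dt + b(v_h, \lambda - P_h\lambda) \qquad \forall\, v_h \in X_h. \tag{3.28}
\]
Next, using (3.13) and following the same proof of (3.26) in Lemma 7, we obtain
\[
\begin{aligned}
\Big|\int_\Sigma (u - \Pi_h u)\, \overline{v_h}\,dt\Big| &\le \sum_{e\subset\Sigma} \int_e |u - \Pi_h u|\, \Big| \overline{v_h} - \frac{1}{|e|}\int_e \overline{v_h}\,dt \Big|\,dt\\
&\le \sum_{\partial K\subset\Sigma} \|u - \Pi_h u\|_{0,\partial K}\, \Big\| v_h - \frac{1}{|K|}\int_K v_h\,dx \Big\|_{0,\partial K}.
\end{aligned}
\]
Hence, using (3.4), it follows that there is a positive constant $\hat C$ such that
\[
\Big|\int_\Sigma (u - \Pi_h u)\, \overline{v_h}\,dt\Big| \le \hat C \sum_{K\in\mathcal{T}_h} h_K\, |u - \Pi_h u|_{1,K}\, |v_h|_{1,K}.
\]
Then, it follows from using Theorem 1 and Lemma 6 that there is a positive constant $\hat C$ such that
\[
\Big|\int_\Sigma (u - \Pi_h u)\, \overline{v_h}\,dt\Big| \le \hat C \big( \kappa_h^{5/3} + \kappa_h^2 + \kappa_h^3 \big)\, |v_h|_{1,\mathcal{T}_h},
\]
which implies (assuming $kh \le \pi$) that
\[
\Big|\int_\Sigma (u - \Pi_h u)\, \overline{v_h}\,dt\Big| \le \hat C\, \kappa_h^{5/3}\, |v_h|_{1,\mathcal{T}_h}. \tag{3.29}
\]
On the other hand, we have $\forall\, v \in H^1(\Omega)$,
\[
\begin{aligned}
|b(v_h, \lambda - P_h\lambda)| &= \Big| \sum_{e\ \text{interior}} \int_e [\overline{v_h}]\,(\lambda - P_h\lambda)\,dt \Big| = \Big| \sum_{e\ \text{interior}} \int_e [\overline{v - v_h}]\,(\lambda - P_h\lambda)\,dt \Big|\\
&\le \sum_{K} \sum_{e\ \text{interior}} \Big| \int_e (\lambda - P_h\lambda) \Big( \overline{(v - v_h)} - \frac{1}{|e|}\int_e \overline{(v - v_h)}\,dt \Big)\,dt \Big|\\
&\le \sum_{K} \|\lambda - P_h\lambda\|_{0,\partial K}\, \Big\| (v - v_h) - \frac{1}{|K|}\int_K (v - v_h)\,dx \Big\|_{0,\partial K}.
\end{aligned}
\]
Therefore, it follows from using (3.4) that there is a positive constant $\hat C$ such that
\[
|b(v_h, \lambda - P_h\lambda)| \le \hat C \sum_{K\in\mathcal{T}_h} h_K^{1/2}\, |v - v_h|_{1,K}\, \|\lambda - P_h\lambda\|_{0,\partial K}.
\]
Hence, from (3.26) and Theorem 1, we obtain that there is a positive constant $\hat C$ such that
\[
|b(v_h, \lambda - P_h\lambda)| \le \hat C\, \kappa_h^{2/3}\, (1+k)\, |v - v_h|_{1,\mathcal{T}_h}. \tag{3.30}
\]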
We conclude the proof of Lemma 8 by substituting (3.29) and (3.30) into (3.28).

Remark 4. We deduce from Lemma 8 that, when $kh \le \pi$, there is a positive constant $\hat C$ such that $\forall\, v_h \in N_h$ and $\forall\, v \in H^1(\Omega)$,
\[
|a(z_h, v_h)| \le \hat C\, (1+k)\, \kappa_h^{2/3} \big[ \kappa_h\, |v_h|_{1,\mathcal{T}_h} + |v - v_h|_{1,\mathcal{T}_h} \big]. \tag{3.31}
\]

Lemma 9. Assume $kh \le \pi$. Then, there is a positive constant $C$ ($C$ depends on $\Omega$ only) such that
\[
\|z_h\|_{0,\Omega} \le C\, \kappa_h^{2/3} \big( (1+k)\, \kappa_h^{2/3} + |z_h|_{1,\mathcal{T}_h} \big). \tag{3.32}
\]

Proof of Lemma 9. First, observe that $z_h$ belongs to $N_h$ and let $\phi$ be the solution of the following BVP (see Lemma 1):
\[
-\Delta\phi - k^2\phi = \overline{z_h} \ \text{in } \Omega, \quad \text{and} \quad \partial_n\phi = 0 \ \text{on } \Gamma, \qquad \partial_n\phi = ik\phi \ \text{on } \Sigma.
\]
Hence, it follows from Lemma 1 that $\phi \in H^{5/3}(\Omega)$ and (see (2.12)) there is a constant $C > 0$ ($C$ depends on $\Omega$ only) such that, for every $s \in [0, \frac53]$, we have
\[
|\phi|_{s,\Omega} \le C\, (1+k)^{s-1}\, \|z_h\|_{0,\Omega}. \tag{3.33}
\]
In addition, we have
\[
\|z_h\|_{0,\Omega}^2 = a(z_h, \phi) - \sum_{e\ \text{interior}} \int_e [z_h]\, \partial_n\phi\,dt. \tag{3.34}
\]
Equation (3.34) results from multiplying the BVP introduced in Lemma 9, integrating by parts on $\Omega$, and using the definition of the bilinear form $a$. The second term of this equality is due to the discontinuity of $z_h$ along the interior edges. Recall that the jump $[\phi]$ along $e \in \partial K \cap \partial K'$ is given by $[\phi] = \phi^K - \phi^{K'}$. On the other hand, we have
\[
|a(z_h, \phi)| \le |a(z_h, \Pi_h\phi)| + |a(z_h, \phi - \Pi_h\phi)|.
\]
It follows from (3.22) that
\[
|a(z_h, \phi)| \le |a(z_h, \Pi_h\phi)| + k\, \Big| \int_\Sigma z_h\, \overline{(\phi - \Pi_h\phi)}\,dt \Big|. \tag{3.35}
\]
Since $\Pi_h\phi \in N_h$ (see property (ii) in Property 3), then it follows from Remark 4 that there is a positive constant $\hat C$ such that
\[
|a(z_h, \Pi_h\phi)| \le \hat C\, (1+k)\, \kappa_h^{2/3} \big[ \kappa_h\, |\Pi_h\phi|_{1,\mathcal{T}_h} + |\phi - \Pi_h\phi|_{1,\mathcal{T}_h} \big].
\]
Moreover, it follows from Lemma 6 that there is a positive constant $\hat C$ such that
\[
|\phi - \Pi_h\phi|_{1,\mathcal{T}_h} \le \hat C \big( h^{2/3}\, |\phi|_{\frac53,\Omega} + k^2 h\, \|\phi\|_{0,\Omega} + k^2 h^2\, |\phi|_{1,\Omega} \big).
\]
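Then, using relation (3.33) and the assumption $kh \le \pi$, we obtain
\[
|\phi - \Pi_h\phi|_{1,\mathcal{T}_h} \le \hat C\, \kappa_h^{2/3}\, \|z_h\|_{0,\Omega} \quad \text{and} \quad |\Pi_h\phi|_{1,\mathcal{T}_h} \le \hat C\, \|z_h\|_{0,\Omega}.
\]
We then obtain
\[
|a(z_h, \Pi_h\phi)| \le \hat C\, (1+k)\, \kappa_h^{4/3}\, \|z_h\|_{0,\Omega}.
\]
For the second part of (3.35), we have
\[
\Big| \int_\Sigma z_h\, \overline{(\phi - \Pi_h\phi)}\,dt \Big| \le \hat C\, h\, |\phi - \Pi_h\phi|_{1,\mathcal{T}_h}\, |z_h|_{1,\mathcal{T}_h} \le \hat C\, h\, \kappa_h^{2/3}\, |z_h|_{1,\mathcal{T}_h}\, \|z_h\|_{0,\Omega}.
\]
Note that the previous inequality was obtained using the same methodology used to prove Lemma 5. Hence, first we use (3.13) when we add the constant $\big(-\frac{1}{|K|}\int_K z_h\,dx\big)$ to $z_h$. Then, we apply Cauchy–Schwarz along with inequalities (3.2) and (3.4). Finally, it follows that there is a positive constant $C$ ($C$ depends on $\Omega$ only) such that
\[
|a(z_h, \phi)| \le C \big( (1+k)\, \kappa_h^{4/3} + \kappa_h^{5/3}\, |z_h|_{1,\mathcal{T}_h} \big)\, \|z_h\|_{0,\Omega}. \tag{3.36}
\]
Next, we estimate the term $\big| \sum_{e\ \text{interior}} \int_e [z_h]\, \partial_n\phi\,dt \big|$ in (3.34). First, observe that
\[
\int_e z_h^K\,dt = \int_e z_h^{K'}\,dt \qquad \forall\, e \in \partial K \cap \partial K' \ \text{and } K \ne K' \in \mathcal{T}_h
\]
and
\[
\begin{aligned}
\int_{e\in\partial K\cap\partial K'} \big( z_h^K - z_h^{K'} \big)\, \partial_n\phi\,dt &= \int_e \Big( z_h^K - \frac{1}{|e|}\int_e z_h^K\,dt \Big) \Big( \nabla\phi - \frac{1}{|K|}\int_K \nabla\phi\,dx \Big)\cdot n^K\,dt\\
&\quad + \int_e \Big( z_h^{K'} - \frac{1}{|e|}\int_e z_h^{K'}\,dt \Big) \Big( \nabla\phi - \frac{1}{|K'|}\int_{K'} \nabla\phi\,dx \Big)\cdot n^{K'}\,dt.
\end{aligned}
\]
Therefore,
\[
\Big| \sum_{e\ \text{interior}} \int_e [z_h]\, \partial_n\phi\,dt \Big| \le \sum_{K\in\mathcal{T}_h} \sum_{e\subset K} \int_e \Big| z_h - \frac{1}{|e|}\int_e z_h\,dt \Big|\, \Big| \nabla\phi - \frac{1}{|K|}\int_K \nabla\phi\,dx \Big|\,dt.
\]
Hence, it follows that
\[
\Big| \sum_{e\ \text{interior}} \int_e [z_h]\, \partial_n\phi\,dt \Big| \le \hat C\, h^{2/3}\, |z_h|_{1,\mathcal{T}_h}\, |\phi|_{\frac53,\Omega} \le C\, \kappa_h^{2/3}\, |z_h|_{1,\mathcal{T}_h}\, \|z_h\|_{0,\Omega}. \tag{3.37}
\]
We conclude the proof of Lemma 9 by substituting (3.36) and (3.37) into equation (3.34).

Lemma 10. Let $h_0$ be a positive number such that $k\, h_0^{2/3} (1+k)^{2/3}$ is “sufficiently small.” Then, there is a positive constant $C$ ($C$ depends on $\Omega$ only) such that, for all $h \le h_0$, we have
\[
\|u_h - \Pi_h u\|_{0,\Omega} \le \hat C\, (1+k)\, \kappa_h^{4/3} \quad \text{and} \quad |u_h - \Pi_h u|_{1,\mathcal{T}_h} \le \hat C\, (1+k)\, \kappa_h^{2/3}.
\]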
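Proof of Lemma 10. It follows from the definition of the bilinear form $a(\cdot,\cdot)$ that
\[
|a(z_h, z_h)|^2 = \big( |z_h|_{1,\mathcal{T}_h}^2 - k^2\, \|z_h\|_{0,\Omega}^2 \big)^2 + k^2\, \|z_h\|_{0,\Sigma}^4.
\]
Moreover, using Remark 4 with $v_h = z_h$ and $v = 0$ along with the fact that $kh \le \pi$, we obtain
\[
|a(z_h, z_h)| \le \hat C\, (1+k)\, \kappa_h^{2/3}\, |z_h|_{1,\mathcal{T}_h}.
\]
Therefore, we deduce that
\[
|z_h|_{1,\mathcal{T}_h}^2 \le k^2\, \|z_h\|_{0,\Omega}^2 + \hat C\, (1+k)\, \kappa_h^{2/3}\, |z_h|_{1,\mathcal{T}_h}.
\]
Then, using (3.32) along with Young's inequality, we obtain
\[
|z_h|_{1,\mathcal{T}_h}^2 \le C \big( k^2 (1+k)^2\, \kappa_h^{8/3} + k^2\, \kappa_h^{4/3}\, |z_h|_{1,\mathcal{T}_h}^2 + (1+k)\, \kappa_h^{2/3}\, |z_h|_{1,\mathcal{T}_h} \big).
\]
Consequently, we have
\[
|z_h|_{1,\mathcal{T}_h}^2 \le C \big( k^2 (1+k)^2\, \kappa_h^{8/3} + k^2\, \kappa_h^{4/3}\, |z_h|_{1,\mathcal{T}_h}^2 + (1+k)^2\, \kappa_h^{4/3} \big).
\]
Let us consider $h_0$ such that $C k^2 (1+k)^{4/3} h_0^{4/3} \le \frac12$; then for every $h \le h_0$, we have
\[
C k^2\, \kappa_h^{4/3} \le \frac12.
\]
We deduce that
\[
|z_h|_{1,\mathcal{T}_h}^2 \le C \big( k^2 (1+k)^2\, \kappa_h^{8/3} + (1+k)^2\, \kappa_h^{4/3} \big), \quad \text{then} \quad |z_h|_{1,\mathcal{T}_h} \le \hat C\, (1+k)\, \kappa_h^{2/3}.
\]
In addition, we obtain, from using (3.32), that
\[
\|z_h\|_{0,\Omega} \le \hat C\, (1+k)\, \kappa_h^{4/3},
\]
which concludes the proof of Lemma 10.

Proof of the a priori error estimate of Theorem 2. We are now ready to prove the estimate given by (3.6).
• From Lemmas 6 and 10, it follows that there is a positive constant $C$ ($C$ depends on $\Omega$ only) such that
\[
\|u - u_h\|_{0,\Omega} \le \|u - \Pi_h u\|_{0,\Omega} + \|u_h - \Pi_h u\|_{0,\Omega} \le C \big( \kappa_h^{4/3} + (1+k)\, \kappa_h^{4/3} \big)
\]
and
\[
|u - u_h|_{1,\mathcal{T}_h} \le |u - \Pi_h u|_{1,\mathcal{T}_h} + |u_h - \Pi_h u|_{1,\mathcal{T}_h} \le C \big( \kappa_h^{2/3} + k\,\kappa_h + (1+k)\, \kappa_h^{2/3} \big).
\]
Hence, we deduce that
\[
\|u - u_h\|_{0,\Omega} \le C\, (1+k)\, \kappa_h^{4/3} \quad \text{and} \quad |u - u_h|_{1,\mathcal{T}_h} \le C\, (1+k)\, \kappa_h^{2/3}.
\]
• Moreover, we deduce from Lemma 8 that there is a positive constant $\hat C$ such that
\[
|b(v_h, \lambda_h - P_h\lambda)| \le \hat C\, (1+k)\, \kappa_h^{2/3}\, |v_h|_{1,\mathcal{T}_h} + |a(z_h, v_h)| \qquad \forall\, v_h \in X_h.
\]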
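On the other hand, it follows from the definition of the bilinear form $a(\cdot,\cdot)$ that
\[
|a(z_h, v_h)| \le |z_h|_{1,\mathcal{T}_h}\, |v_h|_{1,\mathcal{T}_h} + k^2\, \Big| \int_\Omega z_h\, \overline{v_h}\,dx \Big| + k\, \|z_h\|_{0,\Sigma}\, \|v_h\|_{0,\Sigma} \qquad \forall\, v_h \in X_h.
\]
Therefore, using the definition of the norm $\|\cdot\|_X$ and inverse inequality results, we deduce that there is a positive constant $\hat C$ such that
\[
|a(z_h, v_h)| \le \big( |z_h|_{1,\mathcal{T}_h}^2 + k^2 h^2\, \|z_h\|_{0,\Omega}^2 \big)^{1/2}\, \|v_h\|_X + \hat C\, k\, \|z_h\|_{0,\Sigma}\, h^{1/2}\, \|v_h\|_X \qquad \forall\, v_h \in X_h.
\]
In addition, it follows from the definition of the bilinear form $a(\cdot,\cdot)$ and from using (3.31) with $v_h = z_h$ and $v = 0$ (see Remark 4) that there is a positive constant $\hat C$ such that
\[
k\, \|z_h\|_{0,\Sigma}^2 \le |a(z_h, z_h)| \le \hat C\, (1+k)\, \kappa_h^{2/3}\, |z_h|_{1,\mathcal{T}_h}.
\]
Therefore, using Lemma 10, we deduce that there is a positive constant $C$ ($C$ depends on $\Omega$ only) such that
\[
k^{1/2}\, \|z_h\|_{0,\Sigma} \le C\, (1+k)\, \kappa_h^{2/3}.
\]
Hence, we deduce that there is a positive constant $C$ ($C$ depends on $\Omega$ only) such that
\[
|a(z_h, v_h)| \le C\, (1+k)\, \kappa_h^{2/3}\, \|v_h\|_X \qquad \forall\, v_h \in X_h.
\]
Consequently, it follows from Proposition 1 that there is a positive constant $C$ ($C$ depends on $\Omega$ only) such that
\[
\|\lambda_h - P_h\lambda\|_M \le C\, (1+k)\, \kappa_h^{2/3}.
\]
Finally, we deduce from Lemma 7 that there is a positive constant $C$ ($C$ depends on $\Omega$ only) such that
\[
\|\lambda - \lambda_h\|_M \le C\, (1+k)\, \kappa_h^{2/3},
\]
which concludes the proof of the error estimate of Theorem 2.

Proof of the a posteriori error estimate (3.7) in Theorem 3. Let $\phi$ be the solution of the BVP (2.11) (see Lemma 1) with $f = u - u_h$. Then, this solution $\phi$ belongs to $H^{5/3}(\Omega)$ and for every $s \in [0, \frac53]$, there exists a constant $C > 0$ depending only on $s$ and $\Omega$ such that
\[
|\phi|_{s,\Omega} \le C\, (1+k)^{s-1}\, \|u - u_h\|_{0,\Omega}.
\]
Using integration by parts, one can easily verify that
\[
\begin{aligned}
\|u - u_h\|_{0,\Omega}^2 &= \sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Sigma} \phi\, (\partial_n u_h - ik\,u_h)\,dt + \sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Gamma} \phi\, \big( \partial_n u_h + \partial_n e^{ikx\cdot d} \big)\,dt\\
&\quad + \sum_{e\ \text{interior}} \int_e [\partial_n u_h]\,\phi\,dt - \sum_{e\ \text{interior}} \int_e [u_h]\,\partial_n\phi\,dt.
\end{aligned}
\]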
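On the other hand, we also have
\[
a(u_h, \Pi_h\phi) = -\sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Gamma} \partial_n e^{ikx\cdot d}\, \Pi_h\phi\,dt.
\]
Therefore, using integration by parts along with the fact that $u_h$ satisfies the Helmholtz equation at the element level, we have
\[
\sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Gamma} \partial_n u_h\, \Pi_h\phi\,dt + \sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Sigma} (\partial_n u_h - ik\,u_h)\, \Pi_h\phi\,dt + \sum_{e\ \text{interior}} \int_e [\partial_n u_h]\, \Pi_h\phi\,dt = -\sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Gamma} \partial_n e^{ikx\cdot d}\, \Pi_h\phi\,dt.
\]
Consequently, using the fact that for every interior edge $e$, we have $\int_e [\partial_n u_h]\,\phi\,dt = \int_e [\partial_n u_h]\, \Pi_h\phi\,dt$, we deduce that
\[
\begin{aligned}
\|u - u_h\|_{0,\Omega}^2 &= \sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Sigma} (\phi - \Pi_h\phi)\, (\partial_n u_h - ik\,u_h)\,dt\\
&\quad + \sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Gamma} (\phi - \Pi_h\phi)\, \big( \partial_n u_h + \partial_n e^{ikx\cdot d} \big)\,dt - \sum_{e\ \text{interior}} \int_e [u_h]\,\partial_n\phi\,dt.
\end{aligned}
\tag{3.38}
\]
Next, we estimate each integral in the right-hand side of (3.38) to deduce the a posteriori estimate given by (3.7) in Theorem 3.
• First, we estimate
\[
I_1 = \Big| \sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Sigma} (\phi - \Pi_h\phi)\, (\partial_n u_h - ik\,u_h)\,dt \Big|.
\]
We have
\[
\begin{aligned}
I_1 &\le \Big( \sum_{e\subset\Sigma} h_e\, \|\partial_n u_h - ik\,u_h\|_{0,e}^2 \Big)^{1/2} \Big( \sum_{e\subset\Sigma} h_e^{-1}\, \|\phi - \Pi_h\phi\|_{0,e}^2 \Big)^{1/2}\\
&\le \hat C\, \Big( \sum_{e\subset\Sigma} h_e\, \|\partial_n u_h - ik\,u_h\|_{0,e}^2 \Big)^{1/2}\, |\phi - \Pi_h\phi|_{1,\mathcal{T}_h}.
\end{aligned}
\]
Therefore, assuming that $kh \le \pi$, it follows from the properties of the operator $\Pi_h$ (see (3.21) in Property 3) that there is a positive constant $\hat C_1$ such that
\[
I_1 \le \hat C_1\, \Big( \sum_{e\subset\Sigma} h_e\, \|\partial_n u_h - ik\,u_h\|_{0,e}^2 \Big)^{1/2} \big( h^{2/3}\, |\phi|_{\frac53,\Omega} + |\phi|_{1,\Omega} + k\, \|\phi\|_{0,\Omega} \big).
\]
We deduce from the a priori estimate on $|\phi|_{s,\Omega}$ that there is a positive constant $\hat C_1$ such that
\[
I_1 \le \hat C_1\, \Big( \sum_{e\subset\Sigma} h_e\, \|\partial_n u_h - ik\,u_h\|_{0,e}^2 \Big)^{1/2}\, \|u - u_h\|_{0,\Omega}.
\]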
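• Similarly, there is also a positive constant $\hat C_2$ such that
\[
I_2 = \Big| \sum_{K\in\mathcal{T}_h} \int_{\partial K\cap\Gamma} (\phi - \Pi_h\phi)\, \big( \partial_n u_h + \partial_n e^{ikx\cdot d} \big)\,dt \Big| \le \hat C_2\, \Big( \sum_{e\subset\Gamma} h_e\, \|\partial_n u_h + \partial_n e^{ikx\cdot d}\|_{0,e}^2 \Big)^{1/2}\, |\phi - \Pi_h\phi|_{1,\mathcal{T}_h}.
\]
Then, there is a positive constant denoted again by $\hat C_2$ such that
\[
I_2 \le \hat C_2\, \Big( \sum_{e\subset\Gamma} h_e\, \|\partial_n u_h + \partial_n e^{ikx\cdot d}\|_{0,e}^2 \Big)^{1/2}\, \|u - u_h\|_{0,\Omega}.
\]
• Last, we estimate $I_3 = \big| \sum_{e\ \text{interior}} \int_e [u_h]\,\partial_n\phi\,dt \big|$. Consider an interior edge $e = \partial K(e) \cap \partial K'(e)$, then
\[
\int_e [u_h]\,\partial_n\phi\,dt = \int_e [u_h]\,\nabla\phi\cdot n\,dt = \int_e [u_h]\,(\nabla\phi - \beta)\cdot n\,dt \qquad \forall\, \beta \in \mathbb{C}^2.
\]
We then obtain
\[
\Big| \int_e [u_h]\,\partial_n\phi\,dt \Big| \le \|[u_h]\|_{0,e}\, \inf_{\beta\in\mathbb{C}^2} \|\nabla\phi - \beta\|_{0,e}.
\]
On the other hand, since there is a positive constant $\hat C$ such that
\[
\inf_{\beta\in\mathbb{C}^2} \|\nabla\phi - \beta\|_{0,e} \le \hat C\, h_e^{1/6}\, |\phi|_{\frac53,K(e)},
\]
it follows that
\[
I_3 \le \hat C \sum_{e\ \text{interior}} h_e^{1/6}\, \|[u_h]\|_{0,e}\, |\phi|_{\frac53,K(e)} \le \hat C\, \Big( \sum_{e\ \text{interior}} h_e^{-1}\, \|[u_h]\|_{0,e}^2 \Big)^{1/2} h^{2/3}\, |\phi|_{\frac53,\Omega}.
\]
Then, there is a positive constant $\hat C_3$ such that
\[
I_3 \le \hat C_3\, \Big( \sum_{e\ \text{interior}} h_e^{-1}\, \|[u_h]\|_{0,e}^2 \Big)^{1/2}\, \|u - u_h\|_{0,\Omega}.
\]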
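The three bounds on $I_1$, $I_2$, and $I_3$ combine into the computable right-hand side of (3.7). The sketch below shows how such an indicator could be accumulated from edge data, with the unknown constant omitted. The function name, the input layout (lists of pairs of edge length and squared $L^2$ edge norm), and the sample numbers are illustrative assumptions; the code does not compute the residuals or jumps themselves.

```python
import math


def a_posteriori_indicator(sigma_edges, gamma_edges, interior_edges):
    """Edge terms on the right-hand side of (3.7), with the constant omitted.

    sigma_edges:    pairs (h_e, ||d_n u_h - i k u_h||_{0,e}^2) for edges on Sigma
    gamma_edges:    pairs (h_e, ||d_n u_h + d_n e^{ikx.d}||_{0,e}^2) for edges on Gamma
    interior_edges: pairs (h_e, ||[u_h]||_{0,e}^2) for interior edges
    """
    eta_sigma = math.sqrt(sum(h * r2 for h, r2 in sigma_edges))
    eta_gamma = math.sqrt(sum(h * r2 for h, r2 in gamma_edges))
    eta_jump = math.sqrt(sum(j2 / h for h, j2 in interior_edges))  # weight h_e^{-1}
    return eta_sigma, eta_gamma, eta_jump, eta_sigma + eta_gamma + eta_jump


if __name__ == "__main__":
    # Hypothetical edge data, for illustration only.
    sigma = [(0.5, 2.0)]
    gamma = [(0.25, 4.0)]
    interior = [(0.5, 0.5)]
    print(a_posteriori_indicator(sigma, gamma, interior))
```

Note the different weights: boundary residuals are scaled by $h_e$, whereas jumps of $u_h$ are scaled by $h_e^{-1}$, so small edges with a fixed jump contribute more to the indicator.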
4. Conclusion. A DGM with plane waves and Lagrange multipliers was recently proposed by Farhat, Harari, and Hetmaniuk [3] for solving two-dimensional Helmholtz problems at relatively high wavenumbers. In many previous papers, this method was shown numerically to offer a significant potential for wave propagation problems including acoustic scattering. However, it lacked a formal convergence theory. This paper is a first step toward filling this gap. Indeed, it is proved that the hybrid variational formulation underlying this DGM is well-posed in the sense of Hadamard. In addition, a priori error estimates proved for the so-called R-4-1 element, that is, the simplest two-dimensional element associated with this discretization method, establish the convergence of this element and reveal its formal order of accuracy. Furthermore, an a posteriori error estimate was derived that can be used as a practical error indicator when refining the partition of the computational domain. Higher-order elements will be analyzed in future research.
Acknowledgment. The authors are grateful to the referees for their constructive suggestions and remarks.

REFERENCES
[1] C. Farhat, I. Harari, and L. P. Franca, The discontinuous enrichment method, Comput. Methods Appl. Mech. Engrg., 190 (2001), pp. 6455–6479.
[2] C. Farhat, I. Harari, and U. Hetmaniuk, The discontinuous enrichment method for multiscale analysis, Comput. Methods Appl. Mech. Engrg., 192 (2003), pp. 3195–3210.
[3] C. Farhat, I. Harari, and U. Hetmaniuk, A discontinuous Galerkin method with Lagrange multipliers for the solution of Helmholtz problems in the mid-frequency regime, Comput. Methods Appl. Mech. Engrg., 192 (2003), pp. 1389–1419.
[4] M. E. Rose, Weak element approximations to elliptic differential equations, Numer. Math., 24 (1975), pp. 185–204.
[5] I. Babuška and J. M. Melenk, The partition of unity method, Internat. J. Numer. Methods Engrg., 40 (1997), pp. 727–758.
[6] O. Cessenat and B. Despres, Application of an ultra weak variational formulation of elliptic PDEs to the two-dimensional Helmholtz problem, SIAM J. Numer. Anal., 35 (1998), pp. 255–299.
[7] P. Monk and D. Q. Wang, A least-squares method for the Helmholtz equation, Comput. Methods Appl. Mech. Engrg., 175 (1999), pp. 121–136.
[8] C. Farhat, P. Wiedemann-Goiran, and R. Tezaur, A discontinuous Galerkin method with plane waves and Lagrange multipliers for the solution of short wave exterior Helmholtz problems on unstructured meshes, Wave Motion, 39 (2004), pp. 307–317.
[9] C. Farhat, R. Tezaur, and P. Wiedemann-Goiran, Higher-order extensions of a discontinuous Galerkin method for mid-frequency Helmholtz problems, Internat. J. Numer. Methods Engrg., 61 (2004), pp. 1938–1956.
[10] A. Bayliss, C. I. Goldstein, and E. Turkel, On accuracy conditions for the numerical computations of waves, J. Comput. Phys., 59 (1985), pp. 396–404.
[11] F. Ihlenburg, Finite Element Analysis of Acoustic Scattering, Appl. Math. Sci. 132, Springer-Verlag, New York, 1998.
[12] J. Hadamard, Lectures on Cauchy's Problem in Linear Partial Differential Equations, Yale University Press, New Haven, 1923.
[13] D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory, Appl. Math. Sci. 93, Springer-Verlag, New York, 1992.
[14] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam, 1978.
[15] R. A. Adams, Sobolev Spaces, Academic Press, New York, 1975.
[16] L. Hörmander, The Analysis of Linear Partial Differential Operators, Springer-Verlag, New York, 1985.
[17] M. E. Taylor, Partial Differential Equations I: Basic Theory, Springer-Verlag, New York, 1997.
[18] M. Melenk, On Generalized Finite Element Methods, Ph.D. thesis, University of Maryland, College Park, MD, 1995.
[19] U. Hetmaniuk, Stability estimates for a class of Helmholtz problems, Commun. Math. Sci., 5 (2007), pp. 665–678.
[20] J. L. Lions and E. Magenes, Non-homogeneous Boundary Value Problems and Applications, Volume I, Springer-Verlag, New York, 1972.
[21] F. Brezzi and M. Fortin, Mixed and Hybrid Finite Element Methods, Springer-Verlag, New York, 1991.
[22] P. Grisvard, Elliptic Problems in Non Smooth Domains, Pitman, Boston, 1985.
[23] J. P. Aubin, Analyse Fonctionnelle Appliquée, Presse Universitaire de France, Paris, 1987.
[24] J. Nitsche, Ein Kriterium für die Quasi-Optimalität des Ritzschen Verfahrens, Numer. Math., 11 (1968), pp. 346–348.
[25] J. Cea, Approximation variationnelle des problèmes aux limites, Ann. Inst. Fourier, 14 (1964), pp. 345–444.
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1067–1091
© 2009 Society for Industrial and Applied Mathematics

A CONVERGENT ADAPTIVE METHOD FOR ELLIPTIC EIGENVALUE PROBLEMS∗
S. GIANI† AND I. G. GRAHAM‡

Abstract. We prove the convergence of an adaptive linear finite element method for computing eigenvalues and eigenfunctions of second-order symmetric elliptic partial differential operators. The weak form is assumed to yield a bilinear form which is bounded and coercive in $H^1$. Each step of the adaptive procedure refines elements in which a standard a posteriori error estimator is large and also refines elements in which the computed eigenfunction has high oscillation. The error analysis extends the theory of convergence of adaptive methods for linear elliptic source problems to elliptic eigenvalue problems, and in particular deals with various complications which arise essentially from the nonlinearity of the eigenvalue problem. Because of this nonlinearity, the convergence result holds under the assumption that the initial finite element mesh is sufficiently fine.

Key words. second-order elliptic problems, eigenvalues, adaptive finite element methods, convergence

AMS subject classifications. 65N12, 65N25, 65N30, 65N50

DOI. 10.1137/070697264
1. Introduction. In the last decades, mesh adaptivity has been widely used to improve the accuracy of numerical solutions to many scientific problems. The basic idea is to refine the mesh only where the error is high, with the aim of achieving an accurate solution using an optimal number of degrees of freedom. There is a large amount of numerical analysis literature on adaptivity, in particular on reliable and efficient a posteriori error estimates (e.g., [1]). Recently, the question of convergence of adaptive methods has received intensive interest and a number of convergence results for the adaptive solution of boundary value problems have appeared (e.g., [8, 18, 19, 7, 6, 23]).

We prove here the convergence of an adaptive linear finite element algorithm for computing eigenvalues and eigenvectors of scalar symmetric elliptic partial differential operators in bounded polygonal or polyhedral domains, subject to Dirichlet boundary data. Such problems arise in many applications, e.g., resonance problems, nuclear reactor criticality, and the modelling of photonic band gap materials, to name but three. Our refinement procedure is based on two locally defined quantities: firstly, a standard a posteriori error estimator, and secondly, a measure of the variability (or “oscillation”) of the computed eigenfunction. (Measures of “data oscillation” appear in the theory of adaptivity for boundary value problems, e.g., [18]. In the eigenvalue problem the computed eigenvalue and eigenfunction on the present mesh play the role of “data” for the next iteration of the adaptive procedure.) Our algorithm performs local refinement on all elements on which the maximum of these two local quantities is sufficiently large. We prove that the adaptive method converges provided the initial mesh is sufficiently fine.
∗ Received by the editors July 16, 2007; accepted for publication (in revised form) October 20, 2008; published electronically February 13, 2009. http://www.siam.org/journals/sinum/47-2/69726.html
† School of Mathematical Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK ([email protected]).
‡ Department of Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, UK ([email protected]).

The latter condition, while absent for adaptive methods for
linear symmetric elliptic boundary value problems, commonly appears for nonlinear problems and can be thought of as a manifestation of the nonlinearity of the eigenvalue problem. We believe that the present paper is the first contribution to the topic of convergence of adaptive methods for eigenvalue problems. Since writing this paper, substantial improvements in the theory have been made in [5], where the need to adapt on the oscillations of the computed eigenfunction is removed and, in addition, the general convergence of the adaptive scheme to a nonspurious eigenvalue of the continuous problem is established.

The outline of the paper is as follows. In section 2 we briefly describe the model elliptic eigenvalue problem and the numerical method and in section 3 we describe a priori estimates, most of which are classical. Section 4 describes the a posteriori estimates and the adaptive algorithm. Section 5 proves that proceeding from one mesh to another ensures error reduction (up to oscillation of the computed eigenfunction) while the convergence result is presented in section 6. Numerical experiments illustrating the theory are presented in section 7.

2. Eigenvalue problem and numerical method. Throughout, $\Omega$ will denote a bounded domain in $\mathbb{R}^d$ ($d = 2$ or $3$). In fact, $\Omega$ will be assumed to be a polygon ($d = 2$) or polyhedron ($d = 3$). We will be concerned with the problem of finding an eigenvalue $\lambda \in \mathbb{R}$ and eigenfunction $0 \ne u \in H_0^1(\Omega)$ satisfying
(2.1)  $a(u, v) = \lambda\, b(u, v)$ for all $v \in H_0^1(\Omega)$,
where, for real valued functions $u$ and $v$,
(2.2)  $a(u, v) = \int_\Omega \nabla u(x)^T A(x) \nabla v(x)\, dx$ and $b(u, v) = \int_\Omega B(x)\, u(x)\, v(x)\, dx$.
Here, the matrix-valued function $A$ is required to be uniformly positive definite, i.e.,
(2.3)  $0 < \underline{a} \le \xi^T A(x)\, \xi \le \overline{a}$ for all $\xi \in \mathbb{R}^d$ with $|\xi| = 1$ and all $x \in \Omega$.
The scalar function $B$ is required to be bounded above and below by positive constants for all $x \in \Omega$, i.e.,
(2.4)  $0 < \underline{b} \le B(x) \le \overline{b}$ for all $x \in \Omega$.
We will assume that $A$ and $B$ are both piecewise constant on $\Omega$ and that any jumps in $A$ and $B$ are aligned with the meshes $\mathcal{T}_n$ (introduced below), for all $n$. Throughout the paper, for any polygonal (polyhedral) subdomain $D \subset \Omega$ and any $s \in [0, 1]$, $\|\cdot\|_{s,D}$ and $|\cdot|_{s,D}$ will denote the standard norm and seminorm in the Sobolev space $H^s(D)$. Also $(\cdot, \cdot)_{0,D}$ denotes the $L^2(D)$ inner product. We also define the energy norm induced by the bilinear form $a$:
$|||u|||_\Omega^2 := a(u, u)$ for all $u \in H_0^1(\Omega)$,
which, by (2.3), is equivalent to the $H^1(\Omega)$ seminorm. (The equivalence constant depends on the contrast $\overline{a}/\underline{a}$, but we are not concerned with this dependence in the present paper.) We also introduce the weighted $L^2$ norm:
$\|u\|_{0,B,\Omega}^2 = b(u, u) = \int_\Omega B(x)\, |u(x)|^2\, dx$,
and note the norm equivalence
(2.5)  $\sqrt{\underline{b}}\, \|v\|_{0,\Omega} \le \|v\|_{0,B,\Omega} \le \sqrt{\overline{b}}\, \|v\|_{0,\Omega}$.
Rewriting the eigenvalue problem (2.1) in standard normalized form, we seek $(\lambda, u) \in \mathbb{R} \times H_0^1(\Omega)$ such that
(2.6)  $a(u, v) = \lambda\, b(u, v)$ for all $v \in H_0^1(\Omega)$, and $\|u\|_{0,B,\Omega} = 1$.
By the continuity of $a$ and $b$ and the coercivity of $a$ on $H_0^1(\Omega)$ it is a standard result that (2.6) has a countable sequence of nondecreasing positive eigenvalues $\lambda_j$, $j = 1, 2, \ldots$, with corresponding eigenfunctions $u_j \in H_0^1(\Omega)$ [3, 12, 24]. In this paper we will need some additional regularity for the eigenfunctions $u_j$, which will be achieved by making the following regularity assumption for the elliptic problem induced by $a$.

Assumption 2.1. We assume that there exists a constant $C_{ell} > 0$ and $s \in [0, 1]$ with the following property. For $f \in L^2(\Omega)$, if $v \in H_0^1(\Omega)$ solves the problem $a(v, w) = (f, w)_{0,\Omega}$ for all $w \in H_0^1(\Omega)$, then $\|v\|_{1+s,\Omega} \le C_{ell}\, \|f\|_{0,\Omega}$.

Assumption 2.1 is satisfied with $s = 1$ when $A$ is constant (or smooth) and $\Omega$ has a smooth boundary or is a convex polygon. In a range of other practical cases $s \in (0, 1)$, for example, $\Omega$ nonconvex (see [4]), or $A$ having a discontinuity across an interior interface (see [2]). Under Assumption 2.1 it follows that the eigenfunctions $u_j$ of the problem (2.6) satisfy $\|u_j\|_{1+s,\Omega} \le C_{ell}\, \lambda_j \sqrt{\overline{b}}$.

To approximate problem (2.6) we use the piecewise linear finite element method. Accordingly, let $\mathcal{T}_n$, $n = 1, 2, \ldots$, denote a family of conforming triangular ($d = 2$) or tetrahedral ($d = 3$) meshes on $\Omega$. Each mesh consists of elements denoted $\tau \in \mathcal{T}_n$. We assume that for each $n$, $\mathcal{T}_{n+1}$ is a refinement of $\mathcal{T}_n$. For a typical element $\tau$ of any mesh, its diameter is denoted $H_\tau$ and the diameter of its largest inscribed ball is denoted $\rho_\tau$. For each $n$, let $H_n$ denote the piecewise constant mesh function on $\Omega$ whose value on each element $\tau \in \mathcal{T}_n$ is $H_\tau$, and let $H_n^{\max} = \max_{\tau \in \mathcal{T}_n} H_\tau$. Throughout we will assume that the family of meshes $\mathcal{T}_n$ is shape regular; i.e., there exists a constant $C_{reg}$ such that
(2.7)  $H_\tau \le C_{reg}\, \rho_\tau$ for all $\tau \in \mathcal{T}_n$ and all $n = 1, 2, \ldots$.
In the later sections of the paper, the $\mathcal{T}_n$ will be produced by an adaptive process which ensures shape regularity. We let $V_n$ denote the usual finite dimensional subspace of $H_0^1(\Omega)$, consisting of all continuous piecewise linear functions with respect to the mesh $\mathcal{T}_n$. Then the discrete formulation of problem (2.6) is to seek the eigenpairs $(\lambda_n, u_n) \in \mathbb{R} \times V_n$ such that
(2.8)  $a(u_n, v_n) = \lambda_n\, b(u_n, v_n)$ for all $v_n \in V_n$, and $\|u_n\|_{0,B,\Omega} = 1$.
The problem (2.8) has $N = \dim V_n$ positive eigenvalues (counted according to multiplicity) which we denote in nondecreasing order as $\lambda_{n,1} \le \lambda_{n,2} \le \cdots \le \lambda_{n,N}$. It is well-known (see [24, section 6.3]) that for any $j$, $\lambda_{n,j} \to \lambda_j$ as $H_n^{\max} \to 0$ and (by the minimax principle; see, e.g., [24, section 6.1]) the convergence of the $\lambda_{n,j}$ is monotone decreasing, i.e.,
(2.9)  $\lambda_{n,j} \ge \lambda_{m,j} \ge \lambda_j$ for all $j = 1, \ldots, N$ and all $m \ge n$.
Thus, it is clear that there exists a separation constant $\rho > 0$ (depending on the spectrum of (2.6)) with the following property: If $\lambda_j = \lambda_{j+1} = \cdots = \lambda_{j+R-1}$ is any eigenvalue of (2.6) of multiplicity $R \ge 1$, then
(2.10)  $\dfrac{\lambda_j}{|\lambda_{n,\ell} - \lambda_j|} \le \rho$, $\ell \ne j, j+1, \ldots, j+R-1$,
provided $H_n^{\max}$ is sufficiently small. (Note that for $\ell = j, j+1, \ldots, j+R-1$, $\lambda_{n,\ell} \to \lambda_\ell = \lambda_j$.)

The a priori error analysis for our eigenvalue problem is classical (see, e.g., [3], [12], and [24]). In the next section, we briefly recall some of the main known results and also prove a nonclassical result (Theorem 3.2) which is essential to the proof of convergence of our adaptive scheme.

3. A priori analysis. In this section we shall assume that $\lambda_j$ is an eigenvalue of (2.6) and $\lambda_{n,j}$ is its approximation as described above. Let $u_j$ and $u_{n,j}$ be any corresponding normalized eigenvectors as defined in (2.6) and (2.8). From these we obtain the important basic identity:
(3.1)
$a(u_j - u_{n,j}, u_j - u_{n,j}) = a(u_j, u_j) + a(u_{n,j}, u_{n,j}) - 2a(u_j, u_{n,j})$
$\qquad = \lambda_j + \lambda_{n,j} - 2\lambda_j\, b(u_j, u_{n,j})$
$\qquad = \lambda_{n,j} - \lambda_j + \lambda_j\, (2 - 2b(u_j, u_{n,j}))$
$\qquad = \lambda_{n,j} - \lambda_j + \lambda_j\, b(u_j - u_{n,j}, u_j - u_{n,j})$.
Using this and (2.9), we obtain
(3.2)  $|||u_j - u_{n,j}|||_\Omega^2 = |\lambda_j - \lambda_{n,j}| + \lambda_j\, \|u_j - u_{n,j}\|_{0,B,\Omega}^2$.
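The monotone convergence (2.9) can be observed on a toy problem. The sketch below is only an illustration under assumptions that are not part of the paper: the one-dimensional model $-u'' = \lambda u$ on $(0,1)$ with $A = B = 1$, uniform meshes, and inverse iteration for the smallest discrete eigenvalue. The function names are hypothetical.

```python
import math


def tridiag_matvec(off, diag, x):
    """Multiply a symmetric tridiagonal Toeplitz matrix (diag, off) by the vector x."""
    n = len(x)
    y = [diag * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += off * x[i + 1]
        y[i + 1] += off * x[i]
    return y


def tridiag_solve(off, diag, rhs):
    """Thomas algorithm for the same matrix class (the stiffness matrix is diagonally dominant)."""
    n = len(rhs)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = d[:]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def smallest_p1_eigenvalue(num_elements, iterations=200):
    """Smallest eigenvalue of K u = lambda M u for P1 elements on a uniform mesh of (0, 1)."""
    h = 1.0 / num_elements
    n = num_elements - 1                      # number of interior nodes
    k_diag, k_off = 2.0 / h, -1.0 / h         # stiffness matrix K
    m_diag, m_off = 4.0 * h / 6.0, h / 6.0    # mass matrix M
    x = [1.0] * n
    for _ in range(iterations):
        # Inverse iteration step: solve K y = M x, then normalize y in the M-norm.
        y = tridiag_solve(k_off, k_diag, tridiag_matvec(m_off, m_diag, x))
        my = tridiag_matvec(m_off, m_diag, y)
        norm = math.sqrt(sum(a * b for a, b in zip(y, my)))
        x = [v / norm for v in y]
    kx = tridiag_matvec(k_off, k_diag, x)
    return sum(a * b for a, b in zip(x, kx))  # Rayleigh quotient, since x^T M x = 1


if __name__ == "__main__":
    exact = math.pi ** 2  # smallest eigenvalue of the continuous toy problem
    for num_elements in (4, 8, 16, 32):
        lam = smallest_p1_eigenvalue(num_elements)
        print(num_elements, lam, lam - exact)
```

In this toy setting the computed values stay above $\pi^2$ and decrease under refinement, as (2.9) predicts, and the eigenvalue error shrinks by roughly a factor of four per uniform refinement.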
The following theorem investigates the convergence of discrete eigenpairs. Although parts of it are very well-known, we do not know a suitable reference for all the results given below, so a brief proof is given for completeness. In the proof we make use of the orthogonal projection $Q_n$ of $H_0^1(\Omega)$ onto $V_n$ with respect to the inner product induced by $a(\cdot, \cdot)$, which has the property:
(3.3)  $a(Q_n u, v_n) = a(u, v_n)$ for all $v_n \in V_n$.
In the main result of this paper we prove convergence for adaptive approximations to eigenvalues and eigenvectors assuming for simplicity a simple eigenvalue. The following preliminary theorem is stated for a simple eigenvalue. However, this result is known for multiple eigenvalues (see, e.g., [24]). More details are given in [10].

Theorem 3.1. Let $\lambda_j$ be a simple eigenvalue of (2.6), let $\lambda_{n,j}$ be its associated approximation from solving (2.8), and let $u_j$ and $u_{n,j}$ be any corresponding normalized eigenvectors. Then for all $1 \le j \le N$,
(i)
(3.4)  $|\lambda_j - \lambda_{n,j}| \le |||u_j - u_{n,j}|||_\Omega^2$;
(ii) there is a constant $C_1 > 0$ and scalars $\alpha_{n,j} \in \{\pm 1\}$ such that
(3.5)  $\|u_j - \alpha_{n,j} u_{n,j}\|_{0,B,\Omega} \le C_1 (H_n^{\max})^s\, |||u_j - Q_n u_j|||_\Omega \le C_1 (H_n^{\max})^s\, |||u_j - \alpha_{n,j} u_{n,j}|||_\Omega$,
where $s$ is as in Assumption 2.1;
(iii) for sufficiently small $H_n^{\max}$ there is a constant $C_2$ such that
(3.6)  $|||u_j - \alpha_{n,j} u_{n,j}|||_\Omega \le C_2 (H_n^{\max})^s$.
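The constants $C_1$, $C_2$ depend on the spectral information $\lambda_\ell$, $u_\ell$, $\ell = 1, \ldots, j$, the separation constant $\rho$, the constants $C_{ell}$, $C_{reg}$ in Assumption 2.1 and in (2.7), and on the bounds $\underline{a}$, $\overline{a}$, $\underline{b}$, $\overline{b}$ in (2.3), (2.4).

Proof. The estimate (3.4) follows directly from (3.2). Note that (3.4) holds even if $u_{n,j}$ is not close to $u_j$, which may occur due to the nonuniqueness of the eigenvectors. The proof of (3.5) is obtained by a reworking of the results in [24]. By the symmetry of $a$ and $b$ there exists a basis $\{u_{n,\ell} : \ell = 1, \ldots, N\}$ of $V_n$ (containing $u_{n,j}$) which is orthonormal with respect to the inner product $b$, and each $u_{n,\ell}$ is an eigenvector of (2.8) corresponding to the eigenvalue $\lambda_{n,\ell}$. Then with $\beta_{n,j} := b(Q_n u_j, u_{n,j})$, Parseval’s equality yields
(3.7)  $\|Q_n u_j - \beta_{n,j} u_{n,j}\|_{0,B,\Omega}^2 = \sum_{\ell = 1,\ \ell \ne j}^{N} b(Q_n u_j, u_{n,\ell})^2$.
Then, since
$\lambda_{n,\ell}\, b(Q_n u_j, u_{n,\ell}) = a(Q_n u_j, u_{n,\ell}) = a(u_j, u_{n,\ell}) = \lambda_j\, b(u_j, u_{n,\ell})$,
we have $(\lambda_{n,\ell} - \lambda_j)\, b(Q_n u_j, u_{n,\ell}) = \lambda_j\, b(u_j - Q_n u_j, u_{n,\ell})$, and so
$\|Q_n u_j - \beta_{n,j} u_{n,j}\|_{0,B,\Omega}^2 = \sum_{\ell = 1,\ \ell \ne j}^{N} \left( \dfrac{\lambda_j}{\lambda_{n,\ell} - \lambda_j} \right)^2 b(u_j - Q_n u_j, u_{n,\ell})^2$
$\qquad \le \rho^2 \sum_{\ell = 1,\ \ell \ne j}^{N} b(u_j - Q_n u_j, u_{n,\ell})^2 \le \rho^2\, \|u_j - Q_n u_j\|_{0,B,\Omega}^2$,
with the last step again by Parseval’s equality. Hence,
(3.8)  $\|u_j - \beta_{n,j} u_{n,j}\|_{0,B,\Omega} \le (1 + \rho)\, \|u_j - Q_n u_j\|_{0,B,\Omega}$.
Moreover,
$\|u_j\|_{0,B,\Omega} - \|u_j - \beta_{n,j} u_{n,j}\|_{0,B,\Omega} \le \|\beta_{n,j} u_{n,j}\|_{0,B,\Omega} \le \|u_j\|_{0,B,\Omega} + \|u_j - \beta_{n,j} u_{n,j}\|_{0,B,\Omega}$.
Since the $u_j$ and the $u_{n,j}$ are normalized, this implies
$1 - \|u_j - \beta_{n,j} u_{n,j}\|_{0,B,\Omega} \le |\beta_{n,j}| \le 1 + \|u_j - \beta_{n,j} u_{n,j}\|_{0,B,\Omega}$
and, combining these with (3.8), we have
$\big|\, |\beta_{n,j}| - 1 \big| \le (1 + \rho)\, \|u_j - Q_n u_j\|_{0,B,\Omega}$.
Thus, with $\alpha_{n,j} := \mathrm{sign}(\beta_{n,j})$, we have $|\beta_{n,j} - \alpha_{n,j}| \le (1 + \rho)\, \|u_j - Q_n u_j\|_{0,B,\Omega}$, and
$\|u_j - \alpha_{n,j} u_{n,j}\|_{0,B,\Omega} \le 2(1 + \rho)\, \|u_j - Q_n u_j\|_{0,B,\Omega}$.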
The first inequality in (3.5) now follows from an application of the standard Aubin–Nitsche duality argument, while the second is just the best approximation property of $Q_n$ in the energy norm. The proof of (3.6) is a slight modification of that given in [24, Theorem 6.2]. The argument consists of obtaining an $O((H_n^{\max})^{2s})$ estimate for the eigenvalue error $|\lambda_j - \lambda_{n,j}|$ and then combining this with (3.2) and (3.5).

The next theorem is a generalization to eigenvalue problems of the standard monotone convergence property for linear symmetric elliptic PDEs, namely, that if one enriches the finite dimensional space, then the error is bound to decrease. This result fails to hold for eigenvalue problems (even for symmetric elliptic partial differential operators), because of the nonlinearity of such problems. The best that we can do is to show that if the finite dimensional space is enriched, then the error will not increase very much. This is the subject of Theorem 3.2.

Theorem 3.2. For any $1 \le j \le N$, there exists a constant $q > 1$ such that, for $m \ge n$, the corresponding computed eigenpair $(\lambda_{m,j}, u_{m,j})$ satisfies
(3.9)  $|||u_j - \alpha_{m,j} u_{m,j}|||_\Omega \le q\, |||u_j - \alpha_{n,j} u_{n,j}|||_\Omega$.

Proof. From Theorem 3.1(ii), we obtain
(3.10)  $\|u_j - \alpha_{m,j} u_{m,j}\|_{0,B,\Omega} \le C_1 (H_m^{\max})^s\, |||u_j - Q_m u_j|||_\Omega$.
Since $\mathcal{T}_m$ is a refinement of $\mathcal{T}_n$, it follows that $V_n \subset V_m$ and so the best approximation property of $Q_m$ ensures that $|||u_j - Q_m u_j|||_\Omega \le |||u_j - Q_n u_j|||_\Omega$. Hence, from (3.10) and using the fact that $H_m^{\max} \le H_n^{\max}$, we have
(3.11)  $\|u_j - \alpha_{m,j} u_{m,j}\|_{0,B,\Omega} \le C_1 (H_n^{\max})^s\, |||u_j - Q_n u_j|||_\Omega$.
Recalling that (3.2) holds for all eigenfunctions, and using (3.11) and then (2.9), we obtain
(3.12)
$|||u_j - \alpha_{m,j} u_{m,j}|||_\Omega^2 \le |\lambda_j - \lambda_{m,j}| + \lambda_j\, \|u_j - \alpha_{m,j} u_{m,j}\|_{0,B,\Omega}^2$
$\qquad \le |\lambda_j - \lambda_{m,j}| + \lambda_j C_1^2 (H_n^{\max})^{2s}\, |||u_j - Q_n u_j|||_\Omega^2$
$\qquad \le |\lambda_j - \lambda_{n,j}| + \lambda_j C_1^2 (H_n^{\max})^{2s}\, |||u_j - Q_n u_j|||_\Omega^2$.
Hence, from (3.4) we obtain
(3.13)  $|||u_j - \alpha_{m,j} u_{m,j}|||_\Omega^2 \le |||u_j - \alpha_{n,j} u_{n,j}|||_\Omega^2 + \lambda_j C_1^2 (H_n^{\max})^{2s}\, |||u_j - Q_n u_j|||_\Omega^2$.
But, since $Q_n$ yields the best approximation from $V_n$ in the energy norm, we have
(3.14)  $|||u_j - \alpha_{m,j} u_{m,j}|||_\Omega^2 \le \left(1 + \lambda_j C_1^2 (H_0^{\max})^{2s}\right) |||u_j - \alpha_{n,j} u_{n,j}|||_\Omega^2$,
which is in the required form, with $q := \left(1 + \lambda_j C_1^2 (H_0^{\max})^{2s}\right)^{1/2}$.

Remark 3.3. From now on we will be concerned with a true eigenpair $(\lambda_j, u_j)$ and its computed approximation $(\lambda_{n,j}, u_{n,j})$ on the mesh $\mathcal{T}_n$. Theorem 3.1 tells us that a priori $\lambda_{n,j}$ is “close” to $\lambda_j$ and that the spaces spanned by $u_j$ and $u_{n,j}$ are close. From now on we drop the subscript $j$ and simply write $(\lambda, u)$ for the eigenpair of (2.6) and $(\lambda_n, u_n)$ for a corresponding eigenpair of (2.8), and the scalar $\alpha_{n,j}$ is abbreviated $\alpha_n$.
4. A posteriori analysis. This section contains our a posteriori error estimator and the definition of the adaptive algorithm for which convergence will be proved in the following sections. Recalling the mesh sequence $\mathcal{T}_n$ defined above, we let $\mathcal{S}_n$ denote the set of all the interior edges (or the set of interior faces in 3D) of the elements of the mesh $\mathcal{T}_n$. For each $S \in \mathcal{S}_n$, we denote by $\tau_1(S)$ and $\tau_2(S)$ the elements sharing $S$ (i.e., $\tau_1(S) \cap \tau_2(S) = S$) and we write $\Omega(S) = \tau_1(S) \cup \tau_2(S)$. We let $n_S$ denote the unit normal vector to $S$, orientated from $\tau_1(S)$ to $\tau_2(S)$. All elements, faces, and edges are considered to be closed sets. Furthermore, we denote the diameter of $S$ by $H_S$. Note that, by mesh regularity, $\mathrm{diam}(\Omega(S)) \sim H_{\tau_i(S)}$, $i = 1, 2$.

Notation 4.1. We write $A \lesssim B$ when $A/B$ is bounded by a constant which may depend on the functions $A$ and $B$ in (2.2), on $\underline{a}$, $\overline{a}$, $\underline{b}$, and $\overline{b}$, on $C_{ell}$ in Assumption 2.1, and on $C_{reg}$ in (2.7). The notation $A \cong B$ means $A \lesssim B$ and $A \gtrsim B$. All the constants depending on the spectrum, namely, $\rho$ in (2.10), $q$ in (3.9), and $C_1$ and $C_2$ in (3.5) and (3.6), are handled explicitly. Similarly all mesh size dependencies are explicit. Note that all eigenvalues of (2.8) satisfy $\lambda_n \gtrsim 1$, since
$\lambda_n \ge \lambda_1 = a(u_1, u_1) \gtrsim |u_1|_{1,\Omega}^2 \gtrsim \|u_1\|_{0,\Omega}^2 \gtrsim \|u_1\|_{0,B,\Omega}^2 = 1$.

Our error estimator is obtained by adapting standard estimates for source problems to the eigenvalue problem. Analogous eigenvalue estimates can be found in [9] (for the Laplace problem) and [25] (for linear elasticity) and related results are in [14]. For a function $g$, which is piecewise continuous on the mesh $\mathcal{T}_n$, we introduce its jump across an edge (face) $S \in \mathcal{S}_n$ by:
$[g]_S(x) := \lim_{\tilde{x} \in \tau_1(S),\ \tilde{x} \to x} g(\tilde{x}) - \lim_{\tilde{x} \in \tau_2(S),\ \tilde{x} \to x} g(\tilde{x})$, for $x \in \mathrm{int}(S)$.
Then for any function $v$ with piecewise continuous gradient on $\mathcal{T}_n$ we define, for $S \in \mathcal{S}_n$,
$J_S(v)(x) := [n_S \cdot A \nabla v]_S(x)$, for $x \in \mathrm{int}(S)$.
The error estimator $\eta_n$ on the mesh $\mathcal{T}_n$ is defined as
(4.1)  $\eta_n^2 := \sum_{S \in \mathcal{S}_n} \eta_{S,n}^2$,
where, for each $S \in \mathcal{S}_n$,
(4.2)  $\eta_{S,n}^2 := \|H_n \lambda_n u_n\|_{0,B,\Omega(S)}^2 + \|H_S^{1/2} J_S(u_n)\|_{0,S}^2$.
The following lemma is proved, in a standard way, by adapting the usual arguments for linear source problems. Note again that $\lambda$ is an eigenvalue of (2.6), $\lambda_n$ is a nearby eigenvalue of (2.8), and $u$, $u_n$ are any corresponding normalized eigenfunctions which are only “near” in the sense of Theorem 3.1.

Lemma 4.2 (reliability).
(4.3)  $|||u - u_n|||_\Omega \lesssim \eta_n + G_n$,
where
(4.4)  $G_n := \dfrac{1}{2} (\lambda + \lambda_n)\, \dfrac{\|u - u_n\|_{0,B,\Omega}^2}{|||u - u_n|||_\Omega}$.
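As a rough indication of why $G_n$ can be regarded as a higher order term, the following worked estimate combines (4.4) with Theorem 3.1(ii). It is only a sketch: it assumes that the sign of $u_n$ is fixed as $\alpha_n u_n$ in the sense of Remark 3.3 and that $H_n^{\max}$ is small enough for (3.5) to apply.

```latex
% Assumption: u_n is replaced by alpha_n u_n (Remark 3.3), so that (3.5) is available.
\begin{align*}
G_n &= \tfrac12 (\lambda + \lambda_n)\,
       \frac{\|u - \alpha_n u_n\|_{0,B,\Omega}^2}{|||u - \alpha_n u_n|||_\Omega} \\
    &\le \tfrac12 (\lambda + \lambda_n)\, C_1^2\, (H_n^{\max})^{2s}\,
       |||u - \alpha_n u_n|||_\Omega .
\end{align*}
```

Under these assumptions $G_n$ is bounded by the energy error itself multiplied by a factor that tends to zero with $(H_n^{\max})^{2s}$.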
Remark 4.3. Recalling Remark 3.3, $u_n$ in Lemma 4.2 is any normalized eigenvector of (2.8) corresponding to the simple eigenvalue $\lambda$; i.e., its sign is not unique. However, the error estimators $\eta_{S,n}$ are independent of the sign of $u_n$. This is not a contradiction: we shall see that only one choice of eigenfunction will guarantee that the second term on the right-hand side of (4.3) is small, and only in this case is the left-hand side also guaranteed to be small. A similar result to Lemma 4.2 was proved in [25, Proposition 5].

Proof. To ease readability we set $e_n = u - u_n$ in the proof. Note first that, since $(\lambda, u)$ and $(\lambda_n, u_n)$, respectively, solve the eigenvalue problems (2.1) and (2.8), we have, for all $w_n \in V_n$,
(4.5)
$|||e_n|||_\Omega^2 = a(e_n, e_n) = a(e_n, e_n - w_n) + a(e_n, w_n)$
$\qquad = a(e_n, e_n - w_n) + a(u, w_n) - a(u_n, w_n)$
$\qquad = a(e_n, e_n - w_n) + b(\lambda u - \lambda_n u_n, w_n)$
$\qquad = a(e_n, e_n - w_n) - b(\lambda u - \lambda_n u_n, e_n - w_n) + b(\lambda u - \lambda_n u_n, e_n)$.
To estimate the first two terms on the right-hand side of (4.5), first note that, for all $v \in H_0^1(\Omega)$,
$a(e_n, v) - b(\lambda u - \lambda_n u_n, v) = -a(u_n, v) + \lambda_n b(u_n, v)$.
Hence, using elementwise integration by parts (and the fact that $A \nabla u_n$ is constant on each element and $v$ vanishes on $\partial\Omega$), we obtain
(4.6)
$a(e_n, v) - b(\lambda u - \lambda_n u_n, v) = -\sum_{\tau \in \mathcal{T}_n} \int_\tau (A \nabla u_n) \cdot \nabla v + \lambda_n b(u_n, v)$
$\qquad = -\sum_{S \in \mathcal{S}_n} \int_S J_S(u_n)\, v + \lambda_n b(u_n, v)$,
and hence, for all $w_n \in V_n$,
(4.7)  $a(e_n, e_n - w_n) - b(\lambda u - \lambda_n u_n, e_n - w_n) = -\sum_{S \in \mathcal{S}_n} \int_S J_S(u_n)(e_n - w_n) + \lambda_n b(u_n, e_n - w_n)$.
Now recall the Scott–Zhang quasi-interpolation operator ([22]) which has the property that, for all $v \in H_0^1(\Omega)$, $I_n v \in V_n$ and
(4.8)  $\|v - I_n v\|_{0,\tau} \lesssim H_\tau\, |v|_{1,\omega(\tau)}$, $\quad \|v - I_n v\|_{0,S} \lesssim H_S^{1/2}\, |v|_{1,\omega(S)}$,
where $\omega(\tau)$ is the union of all elements sharing at least a point with $\tau$, and $\omega(S)$ is the union of all elements sharing at least a point with $S$. (Note $\Omega(S) \subseteq \omega(S)$.) Substituting $w_n = I_n e_n$ in (4.7) and using the Cauchy–Schwarz inequality, together with estimates (4.8), we obtain
(4.9)  $a(e_n, e_n - w_n) - b(\lambda u - \lambda_n u_n, e_n - w_n) \lesssim \eta_n\, |||e_n|||_\Omega$.
To estimate the third term on the right-hand side of (4.5), we simply observe that due to the normalization in each of the eigenvalue problems (2.1) and (2.8) we have
(4.10)  $b(\lambda u - \lambda_n u_n, e_n) = (\lambda + \lambda_n)(1 - b(u, u_n)) = \dfrac{1}{2} (\lambda + \lambda_n)\, \|e_n\|_{0,B,\Omega}^2$.
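The identity (4.10) can be checked by expanding the bilinear form and using the normalizations $b(u, u) = b(u_n, u_n) = 1$ from (2.6) and (2.8). The lines below only spell out this step:

```latex
\begin{align*}
b(\lambda u - \lambda_n u_n,\, u - u_n)
  &= \lambda\, b(u,u) - \lambda\, b(u,u_n) - \lambda_n\, b(u_n,u) + \lambda_n\, b(u_n,u_n) \\
  &= (\lambda + \lambda_n)\bigl(1 - b(u,u_n)\bigr), \\
\|u - u_n\|_{0,B,\Omega}^2
  &= b(u,u) - 2\, b(u,u_n) + b(u_n,u_n) = 2\bigl(1 - b(u,u_n)\bigr).
\end{align*}
```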
Now, combine (4.9) and (4.10) with (4.5) and divide by $|||e_n|||_\Omega$ to obtain the result.

Remark 4.4. We shall see below that $G_n$ defined above constitutes a “higher order term”.

For mesh refinement based on the local contributions to $\eta_n$, we use the same marking strategy as in [8] and [18]. The idea is to refine a subset of the elements of $\mathcal{T}_n$ whose side residuals sum up to a fixed proportion of the total residual $\eta_n$.

Definition 4.5 (marking strategy 1). Given a parameter $0 < \theta < 1$, the procedure is: mark the sides in a minimal subset $\hat{\mathcal{S}}_n$ of $\mathcal{S}_n$ such that
(4.11)  $\left( \sum_{S \in \hat{\mathcal{S}}_n} \eta_{S,n}^2 \right)^{1/2} \ge \theta\, \eta_n$.
To compute $\hat{\mathcal{S}}_n$, we compute all the “local residuals” $\eta_{S,n}$, then insert edges (faces) into $\hat{\mathcal{S}}_n$ in order of nonincreasing magnitude of $\eta_{S,n}$, until (4.11) is satisfied. A minimal subset $\hat{\mathcal{S}}_n$ may not be unique. After this is done, we construct another set $\hat{\mathcal{T}}_n$, containing all the elements of $\mathcal{T}_n$ which contain at least one edge (face) belonging to $\hat{\mathcal{S}}_n$.

In order to prove our convergence theory, we require an additional marking strategy based on oscillations (Definition 4.7 below). This also appears in some theories of adaptivity for source problems (e.g., [8], [18], [16], [7], and [6]), but to our knowledge has not yet been used in connection with eigenvalue problems. The concept of “oscillation” is just a measure of how well a function may be approximated by piecewise constants on a particular mesh. For any function $v \in L^2(\Omega)$, and any mesh $\mathcal{T}_n$, we introduce its orthogonal projection $P_n v$ onto piecewise constants defined by
(4.12)  $(P_n v)|_\tau = \dfrac{1}{|\tau|} \int_\tau v$, for all $\tau \in \mathcal{T}_n$.
Then we make the definition:

Definition 4.6 (oscillations). On a mesh $\mathcal{T}_n$, we define
(4.13)  $\mathrm{osc}(v, \mathcal{T}_n) := \|H_n (v - P_n v)\|_{0,B,\Omega}$.
Note that
$\mathrm{osc}(v, \mathcal{T}_n) = \left( \sum_{\tau \in \mathcal{T}_n} H_\tau^2\, \|v - P_n v\|_{0,B,\tau}^2 \right)^{1/2}$,
and that (by standard approximation theory and the ellipticity of $a(\cdot, \cdot)$),
(4.14)  $\mathrm{osc}(v, \mathcal{T}_n) \lesssim (H_n^{\max})^2\, |||v|||_\Omega$, for all $v \in H_0^1(\Omega)$.
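To make Definition 4.6 concrete, the following sketch evaluates the oscillation of a continuous piecewise linear function in one space dimension. The one-dimensional setting, the choice $B = 1$, and the function names are assumptions made for illustration only. On an element of length $h$ with nodal values $v_0$, $v_1$, the difference $v - P_n v$ is linear with zero mean, so its squared $L^2$ norm is $h (v_1 - v_0)^2 / 12$.

```python
import math


def local_oscillations_sq(nodes, values):
    """Squared local contributions H_tau^2 * ||v - P v||_{0,tau}^2 of a P1 function (B = 1)."""
    contributions = []
    for x0, x1, v0, v1 in zip(nodes, nodes[1:], values, values[1:]):
        h = x1 - x0
        # v - P v is linear with zero mean on the element: squared L2 norm h * (v1 - v0)^2 / 12.
        contributions.append(h ** 2 * h * (v1 - v0) ** 2 / 12.0)
    return contributions


def oscillation(nodes, values):
    """osc(v, T) as in (4.13), assembled from the local contributions."""
    return math.sqrt(sum(local_oscillations_sq(nodes, values)))


if __name__ == "__main__":
    for n in (8, 16, 32):
        nodes = [i / n for i in range(n + 1)]
        values = [math.sin(math.pi * x) for x in nodes]  # P1 interpolant of sin(pi x)
        print(n, oscillation(nodes, values))
```

On uniform meshes the printed values shrink by roughly a factor of four per refinement, in line with the $(H_n^{\max})^2$ factor in (4.14).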
The second marking strategy (introduced below) aims to reduce the oscillations corresponding to a particular approximate eigenfunction $u_n$.

Definition 4.7 (marking strategy 2). Given a parameter $0 < \tilde{\theta} < 1$: mark the elements in a minimal subset $\tilde{\mathcal{T}}_n$ of $\mathcal{T}_n$ such that
(4.15)  $\mathrm{osc}(u_n, \tilde{\mathcal{T}}_n) \ge \tilde{\theta}\, \mathrm{osc}(u_n, \mathcal{T}_n)$.
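Both marking strategies amount to a greedy selection: sort the squared local quantities in nonincreasing order and accumulate them until the prescribed fraction of the total is reached. The sketch below is an illustration under stated assumptions, not the authors' implementation: the list-based data layout, the function names, and the `side_to_elements` mapping are hypothetical.

```python
def mark_minimal_subset(local_sq, theta):
    """Greedy marking: indices of a minimal set M with
    sqrt(sum_{i in M} local_sq[i]) >= theta * sqrt(sum_i local_sq[i])."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    target = theta ** 2 * sum(local_sq)
    # Visit indices in order of nonincreasing local quantity.
    order = sorted(range(len(local_sq)), key=lambda i: local_sq[i], reverse=True)
    marked, accumulated = [], 0.0
    for i in order:
        if accumulated >= target:
            break
        marked.append(i)
        accumulated += local_sq[i]
    return marked


def elements_touching(marked_sides, side_to_elements):
    """Elements containing at least one marked side (hypothetical side -> elements mapping)."""
    return sorted({elem for side in marked_sides for elem in side_to_elements[side]})


if __name__ == "__main__":
    eta_sq = [1.0, 9.0, 4.0, 16.0]  # squared local indicators, made-up numbers
    sides = mark_minimal_subset(eta_sq, 0.9)
    print(sides)
    print(elements_touching(sides, {0: (0, 1), 1: (1, 2), 2: (2, 3), 3: (3, 4)}))
```

For strategy 1 the input would be the values $\eta_{S,n}^2$ over the sides, followed by the side-to-element step; for strategy 2 it would be the element contributions $H_\tau^2 \|u_n - P_n u_n\|_{0,B,\tau}^2$. Taking the largest contributions first gives a set of minimal cardinality, which need not be unique when contributions are equal.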
Fig. 4.1. The refinement procedure applied to an element of the mesh. In (a) the element before the refinement, in (b) after the three sides have been refined, and in (c) after the bisection of one of the three new segments.
Analogously to (4.11), we compute $\tilde{\mathcal{T}}_n$ by inserting elements $\tau$ into $\tilde{\mathcal{T}}_n$ according to nonincreasing order of their local contributions $H_\tau^2\, \|u_n - P_n u_n\|_{0,B,\tau}^2$ until (4.15) is satisfied. Our adaptive algorithm can then be stated:

Algorithm 1 Converging algorithm
Require: $0 < \theta < 1$
Require: $0 < \tilde{\theta} < 1$
loop
    Solve the Problem (2.8) for $(\lambda_n, u_n)$
    Mark the elements using the first marking strategy (Definition 4.5)
    Mark any additional unmarked elements using the second marking strategy (Definition 4.7)
    Refine the mesh $\mathcal{T}_n$ and construct $\mathcal{T}_{n+1}$
end loop

In 2D at the $n$th iteration in Algorithm 1 each element in the set $\hat{\mathcal{T}}_n \cup \tilde{\mathcal{T}}_n$ is refined using the algorithm illustrated in Figure 4.1. This consists of three recursive applications of the newest node algorithm [17] to each marked triangle, first creating two sons, then four grandsons, and finally bisecting two of the grandsons. This well-known algorithm is stated without name in [18, section 5.1], is called “bisection5” in [7], and is called “full refinement” in [23]. This technique creates a new node in the middle of each marked side in $\hat{\mathcal{S}}_n$ and also a new node in the interior of each marked element. It follows from [17] that this algorithm yields shape regular conforming meshes in 2D. In the 3D case we use a suitable refinement that creates a new node on each marked face in $\hat{\mathcal{S}}_n$ and a node in the interior of each marked element.

In [18] and [16] it has been shown for linear source problems that the reduction of the error, as the mesh is refined, is triggered by the decay of oscillations of the source on the sequence of constructed meshes. For the eigenvalue problem (2.1) the quantity $\lambda u$ plays the role of data and in principle we have to ensure that oscillations of this quantity (or, more precisely, of its finite element approximation $\lambda_n u_n$) are sufficiently small. However, $\lambda_n u_n$ may change if the mesh changes and so the proof of error reduction for eigenvalue problems is not as simple as it is for linear source problems.
This is the essence of the theoretical difficulty dealt with in this paper.

5. Error reduction. In this section we give the proof of error reduction for Algorithm 1. The proof has been inspired by the corresponding theory for source problems in [18]. However, the nonlinearity of the eigenvalue problem introduces new complications, and there are several lemmas before the main theorem (Theorem 5.6). For the rest of the section let $(\lambda_n, u_n)$ be an approximate eigenpair on a mesh $\mathcal{T}_n$, let $\mathcal{T}_{n+1}$ be the mesh obtained by one iteration of Algorithm 1, and let $(\lambda_{n+1}, u_{n+1})$ be the corresponding eigenpair in the sense made precise in Remark 3.3.

Fig. 5.1. Two cases of refined couples of elements.

The first lemma uses ideas from [18, Lemma 4.2] for the 2D case. The extension of this lemma to the 3D case is treated in Remark 5.2.

Lemma 5.1. Consider the 2D case. Let $\hat{\mathcal{S}}_n$ be as defined in Definition 4.5 and let $P_n$ be as defined in (4.12). For any $S \in \hat{\mathcal{S}}_n$, there exists a function $\Phi_S \in V_{n+1}$ such that $\mathrm{supp}(\Phi_S) = \Omega(S)$ and also
(5.1)  $\lambda_n \int_{\Omega(S)} B\, (P_n u_n)\, \Phi_S - \int_S J_S(u_n)\, \Phi_S = \|H_n \lambda_n P_n u_n\|_{0,B,\Omega(S)}^2 + \|H_S^{1/2} J_S(u_n)\|_{0,S}^2$,
and
(5.2)  $|||\Phi_S|||_{\Omega(S)}^2 \lesssim \|H_n \lambda_n P_n u_n\|_{0,B,\Omega(S)}^2 + \|H_S^{1/2} J_S(u_n)\|_{0,S}^2$,
where $|||v|||_{\Omega(S)}^2 := \int_{\Omega(S)} \nabla v^T A \nabla v$.

Proof. Figure 5.1 illustrates two possible configurations of the domain $\Omega(S)$. We then define
(5.3)  $\Phi_S := \alpha_S \varphi_S + \beta_1 \varphi_1 + \beta_2 \varphi_2$,
where $\varphi_S$ and $\varphi_i$ are the nodal basis functions associated with the points $x_S$ and $x_i$ on $\mathcal{T}_{n+1}$, and $\alpha_S$, $\beta_i$ are defined by
(5.4)  $\alpha_S = \begin{cases} -\dfrac{\|H_S^{1/2} J_S(u_n)\|_{0,S}^2}{\int_S J_S(u_n)\, \varphi_S} & \text{if } J_S(u_n) \ne 0, \\ 0 & \text{otherwise,} \end{cases}$
and
(5.5)  $\beta_i = \begin{cases} \dfrac{\|H_n \lambda_n P_n u_n\|_{0,B,\tau_i(S)}^2 - \alpha_S \int_{\tau_i(S)} B \lambda_n (P_n u_n)\, \varphi_S}{\int_{\tau_i(S)} B \lambda_n (P_n u_n)\, \varphi_i} & \text{if } P_n u_n|_{\tau_i(S)} \ne 0, \\ 0 & \text{otherwise,} \end{cases}$
for $i = 1, 2$.
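Note that $J_S(u_n)$ and $P_n u_n$ are constant on each element $\tau$. Using the fact that $\mathrm{supp}(\varphi_i) = \tau_i(S)$, for $i = 1, 2$, we can easily see that the above formulae imply
(5.6)  $\alpha_S \int_S J_S(u_n)\, \varphi_S = -\|H_S^{1/2} J_S(u_n)\|_{0,S}^2$,
(5.7)  $\int_{\Omega(S)} B \lambda_n (P_n u_n)(\alpha_S \varphi_S + \beta_1 \varphi_1 + \beta_2 \varphi_2) = \|H_n \lambda_n P_n u_n\|_{0,B,\Omega(S)}^2$
(and that these formulae remain true even if $J_S(u_n)$ or $P_n u_n|_{\tau_i(S)}$ vanish). Hence,
$\lambda_n \int_{\Omega(S)} B\, (P_n u_n)\, \Phi_S - \int_S J_S(u_n)\, \Phi_S = \lambda_n \int_{\Omega(S)} B\, (P_n u_n)(\alpha_S \varphi_S + \beta_1 \varphi_1 + \beta_2 \varphi_2) - \alpha_S \int_S J_S(u_n)\, \varphi_S$
and (5.1) follows immediately on using (5.6) and (5.7).

To proceed from here note that by the shape-regularity of the mesh and the standard inverse estimate, $|||\varphi_S|||_{\Omega(S)} \lesssim H_S^{-1}\, \|\varphi_S\|_{0,\Omega(S)}$. Also, for all elements $\tau \in \mathcal{T}_{n+1}$ with $\tau \subset \mathrm{supp}\, \varphi_S$, there exists an affine map $\chi : \hat{\tau} \to \tau$, where $\hat{\tau}$ is the unit simplex in $\mathbb{R}^2$ and $\hat{\varphi}_S := \varphi_S \circ \chi$ is a nodal basis function on $\hat{\tau}$. The Jacobian $J_\chi$ of $\chi$ is constant and is proportional to the area of $\tau$. Hence,
$\|\varphi_S\|_{0,\tau}^2 = \int_\tau |\varphi_S|^2 = \int_{\hat{\tau}} |\hat{\varphi}_S|^2 J_\chi \sim H_S^2$,
which ensures that $|||\varphi_S|||_{\Omega(S)} \lesssim 1$ and, similarly, $|||\varphi_i|||_{\Omega(S)} \lesssim 1$. Combining these with (5.3), we obtain
(5.8)  $|||\Phi_S|||_{\Omega(S)}^2 \lesssim |\alpha_S|^2 + |\beta_1|^2 + |\beta_2|^2$.
Now, note that by a simple change of variable, $\int_S \varphi_S$ is the integral over $[-H_S/2, H_S/2]$ of the one-dimensional hat function centered on 0 and so $\int_S \varphi_S \sim H_S$. Since $J_S(u_n)$ is constant on $S$, we have
(5.9)  $|\alpha_S| \lesssim \dfrac{|J_S(u_n)|\, \|H_S^{1/2}\|_{0,S}^2}{H_S} \sim |J_S(u_n)|\, H_S \sim \|H_S^{1/2} J_S(u_n)\|_{0,S}$.
Also, since $P_n u_n$ is constant on each $\tau_i(S)$ and since $\int_{\tau_i(S)} B \varphi_i \sim H_{\tau_i(S)}^2$, we have
$|\beta_i| \lesssim \dfrac{\lambda_n |(P_n u_n)|_{\tau_i(S)}|\, \|H_n\|_{0,B,\tau_i(S)}^2 + |\alpha_S|\, H_{\tau_i(S)}^2}{H_{\tau_i(S)}^2} \lesssim \lambda_n |(P_n u_n)|_{\tau_i(S)}|\, H_{\tau_i(S)}^2 + |\alpha_S| \sim \|\lambda_n H_n P_n u_n\|_{0,B,\tau_i(S)} + |\alpha_S|$.
This implies
(5.10)  $|\beta_i|^2 \lesssim \|\lambda_n H_n P_n u_n\|_{0,B,\tau_i(S)}^2 + |\alpha_S|^2 \lesssim \|\lambda_n H_n P_n u_n\|_{0,B,\tau_i(S)}^2 + \|H_S^{1/2} J_S(u_n)\|_{0,S}^2$,
and the proof is completed by combining (5.8) with (5.9) and (5.10).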
Remark 5.2. To extend the results in Lemma 5.1 to the 3D case we need to use a refinement procedure for tetrahedra that creates a new node on each marked face in $\hat{\mathcal{S}}_n$ and a node in the interior of each marked element. The proof in the 3D case is similar to the proof in the 2D case: for each couple of refined elements we define $\Phi_S := \alpha_S \varphi_S + \beta_1 \varphi_1 + \beta_2 \varphi_2$, where $\varphi_S$ is the nodal basis function associated to the new node on the shared face and $\varphi_i$ are the nodal basis functions associated to the new nodes in the interior of the elements. The coefficients $\alpha_S$, $\beta_1$, and $\beta_2$ can be chosen in the same way as in Lemma 5.1, and the rest of the proof proceeds similarly.

In the next lemma, we bound the local error estimator above by the local difference of two discrete solutions coming from consecutive meshes, plus higher order terms. This kind of result is called “discrete local efficiency” by many authors. Recall that $\mathcal{T}_{n+1}$ is the refinement of $\mathcal{T}_n$ obtained by applying Algorithm 1.

Lemma 5.3. For any $S \in \hat{\mathcal{S}}_n$, we have
(5.11)  $\eta_{S,n}^2 \lesssim |||u_{n+1} - u_n|||_{\Omega(S)}^2 + \|H_n (\lambda_{n+1} u_{n+1} - \lambda_n P_n u_n)\|_{0,B,\Omega(S)}^2 + \|H_n \lambda_n (u_n - P_n u_n)\|_{0,B,\Omega(S)}^2$.

Proof. Since the function $\Phi_S$ defined in Lemma 5.1 is in $V_{n+1}$ and $\mathrm{supp}(\Phi_S) = \Omega(S)$, we have
(5.12)  $a(u_{n+1} - u_n, \Phi_S) = a(u_{n+1}, \Phi_S) - a(u_n, \Phi_S) = \lambda_{n+1} \int_{\Omega(S)} B u_{n+1} \Phi_S - a(u_n, \Phi_S)$.
Now applying integration by parts to the last term on the right-hand side of (5.12), we obtain
(5.13)  $a(u_{n+1} - u_n, \Phi_S) = \lambda_{n+1} \int_{\Omega(S)} B u_{n+1} \Phi_S - \int_S J_S(u_n)\, \Phi_S$.
Rewriting (5.13) and combining with (5.1), we obtain
(5.14)
$a(u_{n+1} - u_n, \Phi_S) - \int_{\Omega(S)} B (\lambda_{n+1} u_{n+1} - \lambda_n P_n u_n)\, \Phi_S = \lambda_n \int_{\Omega(S)} B\, (P_n u_n)\, \Phi_S - \int_S J_S(u_n)\, \Phi_S$
$\qquad = \|H_n \lambda_n P_n u_n\|_{0,B,\Omega(S)}^2 + \|H_S^{1/2} J_S(u_n)\|_{0,S}^2$.
Rearranging this, and then applying the triangle and Cauchy–Schwarz inequalities, we obtain
(5.15)
$\|H_n \lambda_n P_n u_n\|_{0,B,\Omega(S)}^2 + \|H_S^{1/2} J_S(u_n)\|_{0,S}^2$
$\qquad \le |a(u_{n+1} - u_n, \Phi_S)| + \left| \int_{\Omega(S)} B (\lambda_{n+1} u_{n+1} - \lambda_n P_n u_n)\, \Phi_S \right|$
$\qquad \le |||u_{n+1} - u_n|||_{\Omega(S)}\, |||\Phi_S|||_{\Omega(S)} + \|\lambda_{n+1} u_{n+1} - \lambda_n P_n u_n\|_{0,B,\Omega(S)}\, \|\Phi_S\|_{0,B,\Omega(S)}$
$\qquad \lesssim \left( |||u_{n+1} - u_n|||_{\Omega(S)} + \|H_n (\lambda_{n+1} u_{n+1} - \lambda_n P_n u_n)\|_{0,B,\Omega(S)} \right) |||\Phi_S|||_{\Omega(S)}$.
S. GIANI AND I. G. GRAHAM
In the final step of (5.15) we made use of the Poincaré inequality $\|\Phi_S\|_{0,B,\Omega(S)} \lesssim H_S\,|||\Phi_S|||_{\Omega(S)}$ and also the shape-regularity of the meshes. In view of (5.2), we have

(5.16) $\|H_n\lambda_n P_n u_n\|^2_{0,B,\Omega(S)} + \|H_S^{1/2}J_S(u_n)\|^2_{0,S} \lesssim \left(|||u_{n+1} - u_n|||_{\Omega(S)} + \|H_n(\lambda_{n+1}u_{n+1} - \lambda_n P_n u_n)\|_{0,B,\Omega(S)}\right)^2 \lesssim |||u_{n+1} - u_n|||^2_{\Omega(S)} + \|H_n(\lambda_{n+1}u_{n+1} - \lambda_n P_n u_n)\|^2_{0,B,\Omega(S)}.$

Now, from the definition of $\eta_{S,n}$ in (4.2), and the triangle inequality, we have

(5.17) $\eta^2_{S,n} \lesssim \|H_n\lambda_n P_n u_n\|^2_{0,B,\Omega(S)} + \|H_S^{1/2}J_S(u_n)\|^2_{0,S} + \|H_n\lambda_n(u_n - P_n u_n)\|^2_{0,B,\Omega(S)}.$

The required inequality (5.11) now follows from (5.16) and (5.17).

In the main result of this section, Theorem 5.6 below, we will be interested in achieving an error reduction result of the form $|||u - \alpha_{n+1}u_{n+1}|||_\Omega \le \rho\,|||u - \alpha_n u_n|||_\Omega$ for some $\rho < 1$. Note that we need to introduce the scalar $\alpha_n$ here to ensure nearness of the approximate eigenfunction to the true one. To prove error reduction we exploit the identity

(5.18) $|||u - \alpha_n u_n|||^2_\Omega = |||u - \alpha_{n+1}u_{n+1} + \alpha_{n+1}u_{n+1} - \alpha_n u_n|||^2_\Omega = |||u - \alpha_{n+1}u_{n+1}|||^2_\Omega + |||\alpha_{n+1}u_{n+1} - \alpha_n u_n|||^2_\Omega + 2a(u - \alpha_{n+1}u_{n+1},\, \alpha_{n+1}u_{n+1} - \alpha_n u_n).$

In the case of source problems (e.g., [18, 19]), the $\alpha_n$ is not needed and the last term on the right-hand side vanishes due to Galerkin orthogonality. However, this approach is not available to us in the eigenvalue problem. Therefore, a more technical approach is needed to bound the last two terms on the right-hand side of (5.18) from below. The main technical result is in the following lemma. Recall the convention in Notation 4.1.

Lemma 5.4. With $u$, $u_n$, $\alpha_n$ as in Remark 3.3,

(5.19) $|||\alpha_{n+1}u_{n+1} - \alpha_n u_n|||^2_\Omega \gtrsim \theta^2\,|||u - \alpha_n u_n|||^2_\Omega - \mathrm{osc}(\lambda_n u_n, T_n)^2 - L_n^2,$

where $\theta$ is defined in the marking strategy in Definition 4.5 and $L_n$ satisfies the estimate

(5.20) $L_n \le \hat{C}\,(H_n^{\max})^s\,|||u - \alpha_n u_n|||_\Omega,$

where $\hat{C}$ depends on $\theta$, $\lambda$, $C_1$, $C_2$, and $q$.

Remark 5.5. Note that the oscillation term in (5.19) is unaffected if we replace $\alpha_n u_n$ by $u_n$.

Proof. By Definition 4.5 and Lemma 5.3, we have

$\theta^2\eta_n^2 \le \sum_{S \in \hat{S}_n} \eta^2_{S,n} \lesssim |||\alpha_{n+1}u_{n+1} - \alpha_n u_n|||^2_\Omega + \|H_n(\lambda_{n+1}\alpha_{n+1}u_{n+1} - \lambda_n P_n\alpha_n u_n)\|^2_{0,B,\Omega} + \mathrm{osc}(\lambda_n u_n, T_n)^2.$
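The first inequality in the display above, $\theta^2\eta_n^2 \le \sum_{S \in \hat{S}_n}\eta^2_{S,n}$, is the defining property of a bulk (Dörfler-type) marking. Definition 4.5 is not reproduced in this section, so the sketch below is only an assumption about how such a marked set could be selected: it greedily picks the largest local indicators until the bulk criterion is met. The name `bulk_mark` and the sample data are hypothetical.

```python
def bulk_mark(eta_sq, theta):
    """Return indices of a smallest-cardinality set M such that
    sum(eta_sq[i] for i in M) >= theta**2 * sum(eta_sq).

    eta_sq holds the squared local indicators (one per edge or element);
    theta in (0, 1) is the marking parameter.
    """
    target = theta ** 2 * sum(eta_sq)
    # Visit indicators from largest to smallest; this greedy order
    # yields a marked set of minimal cardinality.
    order = sorted(range(len(eta_sq)), key=lambda i: eta_sq[i], reverse=True)
    marked, accumulated = [], 0.0
    for i in order:
        if accumulated >= target:
            break
        marked.append(i)
        accumulated += eta_sq[i]
    return marked


# Hypothetical squared indicators on five edges.
eta_sq = [0.5, 0.1, 0.3, 0.05, 0.05]
print(bulk_mark(eta_sq, 0.8))  # a larger theta marks more edges
print(bulk_mark(eta_sq, 0.2))  # a smaller theta marks fewer edges
```

A larger $\theta$ forces a larger marked set, which is consistent with the faster growth of DOFs reported for $\theta = 0.8$ in the numerical experiments.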
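Hence, rearranging and making use of Lemma 4.2 and Remark 4.3, we have

(5.21) $|||\alpha_{n+1}u_{n+1} - \alpha_n u_n|||^2_\Omega \gtrsim \theta^2\eta_n^2 - \|H_n(\lambda_{n+1}\alpha_{n+1}u_{n+1} - \lambda_n P_n\alpha_n u_n)\|^2_{0,B,\Omega} - \mathrm{osc}(\lambda_n u_n, T_n)^2$
$\gtrsim \theta^2\,|||u - \alpha_n u_n|||^2_\Omega - \mathrm{osc}(\lambda_n u_n, T_n)^2 - \theta^2\tilde{G}_n^2 - \|H_n(\lambda_{n+1}\alpha_{n+1}u_{n+1} - \lambda_n P_n\alpha_n u_n)\|^2_{0,B,\Omega},$

where $\tilde{G}_n$ is the same as $G_n$ in Lemma 4.2, but with $u_n$ replaced by $\alpha_n u_n$. Note that (5.21) is of the required form (5.19) with

$L_n := \left(\theta^2\tilde{G}_n^2 + \|H_n(\lambda_{n+1}\alpha_{n+1}u_{n+1} - \lambda_n P_n\alpha_n u_n)\|^2_{0,B,\Omega}\right)^{1/2}.$

We now estimate the last two terms in (5.21) to obtain (5.20). To estimate $\tilde{G}_n$, we use Theorem 3.1(ii) to obtain

(5.22) $\tilde{G}_n \lesssim \dfrac{1}{2}(\lambda + \lambda_n)\,C_1^2\,(H_n^{\max})^{2s}\,\dfrac{|||u - Q_n u|||^2_\Omega}{|||u - \alpha_n u_n|||_\Omega} \le \dfrac{1}{2}(\lambda + \lambda_n)\,C_1^2\,(H_n^{\max})^{2s}\,|||u - \alpha_n u_n|||_\Omega.$

To estimate the last term in (5.21), we first use the triangle inequality to obtain

(5.23) $\|H_n(\lambda_{n+1}\alpha_{n+1}u_{n+1} - \lambda_n P_n\alpha_n u_n)\|_{0,B,\Omega} \le \|H_n(\lambda_{n+1}\alpha_{n+1}u_{n+1} - \lambda_n\alpha_n u_n)\|_{0,B,\Omega} + \mathrm{osc}(\lambda_n u_n, T_n).$

For the first term on the right-hand side of (5.23), we have

(5.24) $\|H_n(\lambda_{n+1}\alpha_{n+1}u_{n+1} - \lambda_n\alpha_n u_n)\|_{0,B,\Omega} \le H_n^{\max}\left(\|\lambda u - \lambda_{n+1}\alpha_{n+1}u_{n+1}\|_{0,B,\Omega} + \|\lambda u - \lambda_n\alpha_n u_n\|_{0,B,\Omega}\right).$

Then, recalling (2.6) and Theorem 3.1, we obtain

(5.25) $\|\lambda u - \lambda_{n+1}\alpha_{n+1}u_{n+1}\|_{0,B,\Omega} \le |\lambda - \lambda_{n+1}|\,\|u\|_{0,B,\Omega} + \lambda_{n+1}\|u - \alpha_{n+1}u_{n+1}\|_{0,B,\Omega} \le |||u - \alpha_{n+1}u_{n+1}|||^2_\Omega + \lambda_{n+1}C_1(H_n^{\max})^s\,|||u - \alpha_{n+1}u_{n+1}|||_\Omega.$

Using Theorem 3.1(iii) and then Theorem 3.2, this implies

(5.26) $\|\lambda u - \lambda_{n+1}\alpha_{n+1}u_{n+1}\|_{0,B,\Omega} \lesssim (C_2 + \lambda_{n+1}C_1)(H_n^{\max})^s\,|||u - \alpha_{n+1}u_{n+1}|||_\Omega \le q\,(C_2 + \lambda_{n+1}C_1)(H_n^{\max})^s\,|||u - \alpha_n u_n|||_\Omega.$

An identical argument shows

(5.27) $\|\lambda u - \lambda_n\alpha_n u_n\|_{0,B,\Omega} \lesssim (C_2 + \lambda_n C_1)(H_n^{\max})^s\,|||u - \alpha_n u_n|||_\Omega.$

Combining (5.26) and (5.27) with (5.24), and using (2.9), we obtain

(5.28) $\|H_n(\lambda_{n+1}\alpha_{n+1}u_{n+1} - \lambda_n\alpha_n u_n)\|_{0,B,\Omega} \lesssim (1 + q)(C_2 + \lambda_n C_1)(H_n^{\max})^{s+1}\,|||u - \alpha_n u_n|||_\Omega.$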
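Now combining (5.28) with (5.21), (5.22), and (5.23) we obtain the result.

The next theorem contains the main result of this section. It shows that, provided we start with a "fine enough" mesh $T_n$, the mesh adaptivity algorithm will reduce the error in the energy norm.

Theorem 5.6 (error reduction). For each $\theta \in (0, 1)$, there exists a sufficiently fine mesh threshold $H_n^{\max}$ and constants $\mu > 0$ and $\rho \in (0, 1)$ (all of which may depend on $\theta$ and on the eigenvalue $\lambda$), with the following property. For any $\varepsilon > 0$ the inequality

(5.29) $\mathrm{osc}(\lambda_n u_n, T_n) \le \mu\varepsilon$

implies either $|||u - \alpha_n u_n|||_\Omega \le \varepsilon$ or $|||u - \alpha_{n+1}u_{n+1}|||_\Omega \le \rho\,|||u - \alpha_n u_n|||_\Omega$.

Proof. In view of (5.18) and remembering that $\alpha_{n+1}u_{n+1} - \alpha_n u_n \in V_{n+1}$ we have

(5.30) $|||u - \alpha_n u_n|||^2_\Omega - |||u - \alpha_{n+1}u_{n+1}|||^2_\Omega = |||\alpha_{n+1}u_{n+1} - \alpha_n u_n|||^2_\Omega + 2a(u - \alpha_{n+1}u_{n+1},\, \alpha_{n+1}u_{n+1} - \alpha_n u_n)$
$= |||\alpha_{n+1}u_{n+1} - \alpha_n u_n|||^2_\Omega + 2b(\lambda u - \lambda_{n+1}\alpha_{n+1}u_{n+1},\, \alpha_{n+1}u_{n+1} - \alpha_n u_n).$

Before proceeding further, recall that by the assumptions (2.3) and (2.4), and the Poincaré inequality, there exists a constant $C_P$ (depending on $A$, $B$, and $\Omega$) such that

$\|v\|_{0,B,\Omega} \le C_P\,|||v|||_\Omega$ for all $v \in H_0^1(\Omega)$.

Now using Cauchy–Schwarz and then the Young inequality $2ab \le \frac{1}{4C_P^2}a^2 + 4C_P^2 b^2$ on the second term on the right-hand side of (5.30), we get

(5.31) $|||u - \alpha_n u_n|||^2_\Omega - |||u - \alpha_{n+1}u_{n+1}|||^2_\Omega$
$\ge |||\alpha_{n+1}u_{n+1} - \alpha_n u_n|||^2_\Omega - 2\,\|\lambda u - \lambda_{n+1}\alpha_{n+1}u_{n+1}\|_{0,B,\Omega}\,\|\alpha_{n+1}u_{n+1} - \alpha_n u_n\|_{0,B,\Omega}$
$\ge |||\alpha_{n+1}u_{n+1} - \alpha_n u_n|||^2_\Omega - \dfrac{1}{4C_P^2}\|\alpha_{n+1}u_{n+1} - \alpha_n u_n\|^2_{0,B,\Omega} - 4C_P^2\|\lambda u - \lambda_{n+1}\alpha_{n+1}u_{n+1}\|^2_{0,B,\Omega}$
$\ge \dfrac{3}{4}|||\alpha_{n+1}u_{n+1} - \alpha_n u_n|||^2_\Omega - 4C_P^2\|\lambda u - \lambda_{n+1}\alpha_{n+1}u_{n+1}\|^2_{0,B,\Omega}.$

Hence

$|||u - \alpha_{n+1}u_{n+1}|||^2_\Omega \le |||u - \alpha_n u_n|||^2_\Omega - \dfrac{3}{4}|||\alpha_{n+1}u_{n+1} - \alpha_n u_n|||^2_\Omega + 4C_P^2\|\lambda u - \lambda_{n+1}\alpha_{n+1}u_{n+1}\|^2_{0,B,\Omega}.$

Applying Lemma 5.4, we see that there exist constants $C$, $\hat{C}$ such that

$|||u - \alpha_{n+1}u_{n+1}|||^2_\Omega \le \left(1 - \dfrac{3}{4}C\theta^2 + \dfrac{3}{4}C\hat{C}^2(H_n^{\max})^{2s}\right)|||u - \alpha_n u_n|||^2_\Omega + 4C_P^2\|\lambda u - \lambda_{n+1}\alpha_{n+1}u_{n+1}\|^2_{0,B,\Omega} + \dfrac{3}{4}C\,\mathrm{osc}(\lambda_n u_n, T_n)^2.$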
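The factor $3/4$ in (5.31) comes from pairing the Young inequality with the Poincaré-type bound $\|v\|_{0,B,\Omega} \le C_P|||v|||_\Omega$. The step can be spelled out as follows, where $w := \alpha_{n+1}u_{n+1} - \alpha_n u_n$ and $r := \lambda u - \lambda_{n+1}\alpha_{n+1}u_{n+1}$ are shorthand introduced here only for readability and are not notation from the paper.

```latex
\begin{align*}
2\,\|r\|_{0,B,\Omega}\,\|w\|_{0,B,\Omega}
  &\le \frac{1}{4C_P^2}\,\|w\|_{0,B,\Omega}^2 + 4C_P^2\,\|r\|_{0,B,\Omega}^2
     && \text{(Young, with } a = \|w\|_{0,B,\Omega},\ b = \|r\|_{0,B,\Omega}\text{)} \\
  &\le \frac{1}{4C_P^2}\,C_P^2\,|||w|||_{\Omega}^2 + 4C_P^2\,\|r\|_{0,B,\Omega}^2
     && \text{(Poincar\'e-type bound)} \\
  &= \frac{1}{4}\,|||w|||_{\Omega}^2 + 4C_P^2\,\|r\|_{0,B,\Omega}^2 ,
\end{align*}
so that
\[
|||w|||_{\Omega}^2 - 2\,\|r\|_{0,B,\Omega}\,\|w\|_{0,B,\Omega}
  \;\ge\; \frac{3}{4}\,|||w|||_{\Omega}^2 - 4C_P^2\,\|r\|_{0,B,\Omega}^2 .
\]
```

The weight $1/(4C_P^2)$ in the Young inequality is chosen precisely so that the Poincaré constant cancels and a fixed fraction of $|||w|||^2_\Omega$ survives.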
Then, making use of (5.26) we have

(5.32) $|||u - \alpha_{n+1}u_{n+1}|||^2_\Omega \le \gamma_n\,|||u - \alpha_n u_n|||^2_\Omega + \dfrac{3}{4}C\,\mathrm{osc}(\lambda_n u_n, T_n)^2$

with

(5.33) $\gamma_n := 1 - \dfrac{3}{4}C\theta^2 + C'(H_n^{\max})^{2s},$

where $C'$ is another constant independent of $n$. Note that $H_n^{\max}$ can be chosen sufficiently small so that $\gamma_m \le \gamma$ for some $\gamma \in (0, 1)$ and all $m \ge n$. Consider now the consequences of the inequality (5.29). If $|||u - \alpha_n u_n|||_\Omega > \varepsilon$, then (5.32) implies

$|||u - \alpha_{n+1}u_{n+1}|||^2_\Omega \le \left(\gamma + \dfrac{3}{4}C\mu^2\right)|||u - \alpha_n u_n|||^2_\Omega.$

Now choose $\mu$ small enough so that

(5.34) $\rho := \left(\gamma + \dfrac{3}{4}C\mu^2\right)^{1/2} < 1$

to complete the proof.

6. Proof of convergence. The main result of this paper is Theorem 6.2 below, which proves convergence of the adaptive method and also demonstrates the decay of oscillations of the sequence of approximate eigenfunctions. Before proving this result we need a final lemma.

Lemma 6.1. There exists a constant $\tilde{\rho} \in (0, 1)$ such that

(6.1) $\mathrm{osc}(u_{n+1}, T_{n+1}) \le \tilde{\rho}\,\mathrm{osc}(u_n, T_n) + (1 + q)(H_n^{\max})^2\,|||u - \alpha_n u_n|||_\Omega.$

Proof. First, recall that one of the key results in [18], namely, [18, Lemma 3.8], is the proof that the oscillations of any fixed function $v \in H_0^1(\Omega)$ are reduced by applying one refinement based on Marking Strategy 2 (Definition 4.7). Thus, we have (in view of Algorithm 1)

(6.2) $\mathrm{osc}(u_n, T_{n+1}) \le \tilde{\rho}\,\mathrm{osc}(u_n, T_n),$

where $0 < \tilde{\rho} < 1$ is independent of $u_n$. Thus, a simple application of the triangle inequality combined with (6.2) yields

(6.3) $\mathrm{osc}(u_{n+1}, T_{n+1}) \le \mathrm{osc}(u_n, T_{n+1}) + \mathrm{osc}(\alpha_{n+1}u_{n+1} - \alpha_n u_n, T_{n+1}) \le \tilde{\rho}\,\mathrm{osc}(u_n, T_n) + \mathrm{osc}(\alpha_{n+1}u_{n+1} - \alpha_n u_n, T_{n+1}).$

(Recall, again, that $\mathrm{osc}(u_n, T_n) = \mathrm{osc}(\alpha_n u_n, T_n)$.) A further application of the triangle inequality and then (4.14) yields

(6.4) $\mathrm{osc}(\alpha_{n+1}u_{n+1} - \alpha_n u_n, T_{n+1}) \le \mathrm{osc}(u - \alpha_{n+1}u_{n+1}, T_{n+1}) + \mathrm{osc}(u - \alpha_n u_n, T_{n+1}) \lesssim (H_n^{\max})^2\left(|||u - \alpha_{n+1}u_{n+1}|||_\Omega + |||u - \alpha_n u_n|||_\Omega\right),$
and then combining (6.3) and (6.4) and applying Theorem 3.2 completes the proof.

Theorem 6.2. Provided the initial mesh $T_0$ is chosen so that $H_0^{\max}$ is small enough, there exists a constant $p \in (0, 1)$ such that the recursive application of Algorithm 1 yields a convergent sequence of approximate eigenvalues and eigenvectors, with the property

(6.5) $|||u - \alpha_n u_n|||_\Omega \le B_0\,q\,p^n$

and

(6.6) $\lambda_n\,\mathrm{osc}(u_n, T_n) \le B_1\,p^n,$

where $B_0$ and $B_1$ are constants and $q$ is the constant defined in Theorem 3.2.

Remark 6.3. The initial mesh convergence threshold and the constants $B_0$ and $B_1$ may depend on $\theta$, $\tilde{\theta}$, and $\lambda$.

Proof. The proof of this theorem is by induction and the induction step contains an application of Theorem 5.6. In order to ensure the reduction of the error, we have to assume that the starting mesh $T_0$ is fine enough and $\mu$ in Theorem 5.6 is small enough such that, for the chosen value of $\theta$, the quantity $\rho$ in (5.34) satisfies $\rho < 1$. Then with $\tilde{\rho}$ as in Lemma 6.1, choose $p$ in the range $\max\{\rho, \tilde{\rho}\} < p < 1$. We also set

$B_1 = \mathrm{osc}(\lambda_0 u_0, T_0)$ and $B_0 = \max\left\{\mu^{-1}p^{-1}B_1,\ |||u - \alpha_0 u_0|||_\Omega\right\}.$

To perform the inductive proof, first note that by the definition of $B_0$ and Theorem 3.2, $|||u - \alpha_0 u_0|||_\Omega \le B_0 \le B_0 q$, since $q > 1$. Combined with the definition of $B_1$ we have shown the result for $n = 0$. Now, suppose that, for some $n > 0$, the inequalities (6.5) and (6.6) hold. Let us consider the outcomes, depending on whether the inequality

(6.7) $|||u - \alpha_n u_n|||_\Omega \le B_0\,p^{n+1}$

holds or not. If (6.7) holds, then we can apply Theorem 3.2 to conclude that $|||u - \alpha_{n+1}u_{n+1}|||_\Omega \le q\,|||u - \alpha_n u_n|||_\Omega \le qB_0p^{n+1}$, which proves (6.5) for $n + 1$. On the other hand, if (6.7) does not hold, then, by definition of $B_0$,

(6.8) $|||u - \alpha_n u_n|||_\Omega > B_0\,p^{n+1} \ge \mu^{-1}B_1\,p^n.$
Also, since we have assumed (6.6) for $n$, we have

(6.9) $\lambda_n\,\mathrm{osc}(u_n, T_n) \le \mu\varepsilon$ with $\varepsilon := \mu^{-1}B_1\,p^n.$

Then (6.8) and (6.9) combined with Theorem 5.6 yield $|||u - \alpha_{n+1}u_{n+1}|||_\Omega \le \rho\,|||u - \alpha_n u_n|||_\Omega$, and so, using the inductive hypothesis (6.5) combined with the definition of $p$, we have $|||u - \alpha_{n+1}u_{n+1}|||_\Omega \le \rho B_0 q p^n \le qB_0p^{n+1}$, which, again, proves (6.5) for $n + 1$. To conclude the proof, we have to show that (6.6) also holds for $n + 1$. Using Lemma 6.1, (2.9), and the inductive hypothesis, we have

(6.10) $\lambda_{n+1}\,\mathrm{osc}(u_{n+1}, T_{n+1}) \le \tilde{\rho}B_1p^n + (1 + q)(H_n^{\max})^2\lambda_n B_0 q p^n \le \left(\tilde{\rho}B_1 + (1 + q)(H_0^{\max})^2\lambda_0 B_0 q\right)p^n.$

Now (recalling that $\tilde{\rho} < p$), in addition to the condition already imposed on $H_0^{\max}$, we can further require that $\tilde{\rho}B_1 + (1 + q)(H_0^{\max})^2\lambda_0 B_0 q \le pB_1$. This ensures that $\lambda_{n+1}\,\mathrm{osc}(u_{n+1}, T_{n+1}) \le B_1p^{n+1}$, thus concluding the proof.

7. Numerical experiments. We present numerical experiments to illustrate the convergence theory. Algorithm 1 has been implemented in FORTRAN95. The mesh refinement has been done using the toolbox ALBERTA [20]. We used the package ARPACK [15] to compute eigenpairs and the sparse direct linear solver ME27 from the HSL [21, 13] to carry out the shift-invert solves required by ARPACK. Additional numerical experiments on photonic crystal problems and on 3D problems are given in [10] and [11].

7.1. Example: Laplace operator. In the first set of simulations, we have solved the Laplace eigenvalue problem (i.e., $A = I$ and $B = 1$ in (2.2)) on a unit square with Dirichlet boundary conditions. The exact eigenvalues are known explicitly. We compare different runs of Algorithm 1 using different values for $\theta$ and $\tilde{\theta}$ in Table 7.1. Since the problem is smooth, from Theorem 3.1 it follows that using uniform refinement the rate of convergence for eigenvalues should be $O((H_n^{\max})^2)$, or, equivalently, the rate of convergence in the number of degrees of freedom (DOFs) $N$ should be $O(N^{-1})$. We measure the rate of convergence by conjecturing that $|\lambda - \lambda_n| = CN^{-\beta}$ and estimating $\beta$ for each pair of computations from the formula

$\beta = -\log(|\lambda - \lambda_n| / |\lambda - \lambda_{n-1}|) / \log(\mathrm{DOFs}_n / \mathrm{DOFs}_{n-1}).$

Similarly, Table 7.2 contains the same kind of information relative to the fourth smallest eigenvalue of the problem. Our results show a convergence rate close to $O(N^{-1})$ for $\theta$, $\tilde{\theta}$ sufficiently large. However, the rate of convergence is sensitive to the values of $\theta$ and $\tilde{\theta}$.
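The rate formula above is easy to reproduce. The following sketch evaluates it on the $\theta = \tilde{\theta} = 0.8$ column of Table 7.1; the function name `rate` is illustrative. Because the tabulated errors are rounded to four decimal places, the recomputed values agree with the printed $\beta$ only approximately, and the agreement degrades for the last iterations where the errors have a single significant digit.

```python
import math


def rate(err_prev, err, dofs_prev, dofs):
    """Estimated exponent beta in |lambda - lambda_n| = C * N**(-beta),
    computed from two consecutive adaptive iterations."""
    return -math.log(err / err_prev) / math.log(dofs / dofs_prev)


# Data copied from Table 7.1, column theta = theta~ = 0.8.
errors = [0.1350, 0.0529, 0.0176, 0.0073, 0.0024, 0.0009, 0.0003]
dofs = [400, 1989, 5205, 15980, 48434, 122699, 312591]

betas = [
    rate(errors[k - 1], errors[k], dofs[k - 1], dofs[k])
    for k in range(1, len(errors))
]
for iteration, beta in enumerate(betas, start=2):
    print(iteration, round(beta, 4))
```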
Table 7.1. Comparison of the reduction of the error and DOFs of the adaptive method for the smallest eigenvalue for the Laplace problem on the unit square.

             $\theta = \tilde{\theta} = 0.2$          $\theta = \tilde{\theta} = 0.5$          $\theta = \tilde{\theta} = 0.8$
Iteration    $|\lambda-\lambda_n|$   DOFs   $\beta$     $|\lambda-\lambda_n|$   DOFs   $\beta$     $|\lambda-\lambda_n|$   DOFs     $\beta$
1            0.1350    400    -         0.1350    400    -         0.1350    400      -
2            0.1327    498    0.0802    0.1177    954    0.1581    0.0529    1989     0.5839
3            0.1293    613    0.1228    0.0779    1564   0.8349    0.0176    5205     1.1407
4            0.1256    731    0.1645    0.0501    1977   1.8788    0.0073    15980    0.7877
5            0.1215    854    0.2138    0.0351    2634   1.2383    0.0024    48434    0.9836
6            0.1165    970    0.3340    0.0176    4004   0.7885    0.0009    122699   1.0673
7            0.1069    1097   0.6962    0.0121    6588   0.7217    0.0003    312591   1.0083
Table 7.2. Comparison of the reduction of the error and DOFs of the adaptive method for the fourth smallest eigenvalue for the Laplace problem on the unit square.

             $\theta = \tilde{\theta} = 0.2$          $\theta = \tilde{\theta} = 0.5$          $\theta = \tilde{\theta} = 0.8$
Iteration    $|\lambda-\lambda_n|$   DOFs   $\beta$     $|\lambda-\lambda_n|$   DOFs   $\beta$     $|\lambda-\lambda_n|$   DOFs     $\beta$
1            2.1439    400    -         2.1439    400    -         2.1439    400      -
2            2.0997    505    0.0895    1.8280    1016   0.1658    0.7603    2039     0.6365
3            2.0549    626    0.1004    1.0850    1636   1.1662    0.2439    6793     0.9447
4            1.9945    759    0.1548    0.7792    2254   1.0331    0.0917    18717    0.9652
5            1.9164    883    0.2638    0.4936    3067   1.4826    0.0331    54113    0.9583
6            1.7717    1017   0.5557    0.3484    4681   0.8240    0.0120    146056   1.0181
7            1.6463    1131   0.6911    0.2578    7321   0.6730    0.0046    382024   0.9970
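For the Dirichlet Laplacian on the unit square the exact eigenvalues are $\pi^2(m^2 + k^2)$ with integers $m, k \ge 1$, which gives reference values against which the errors in Tables 7.1 and 7.2 can be read. The sketch below lists the smallest few eigenvalues (with multiplicity) and the ratio between the fourth and the first; the helper name `dirichlet_eigs` is hypothetical.

```python
import math


def dirichlet_eigs(count, max_index=10):
    """Smallest `count` Dirichlet eigenvalues of -Laplace on the unit square,
    listed with multiplicity: pi^2 * (m^2 + k^2) for m, k = 1, 2, ..."""
    values = sorted(
        math.pi ** 2 * (m * m + k * k)
        for m in range(1, max_index + 1)
        for k in range(1, max_index + 1)
    )
    return values[:count]


eigs = dirichlet_eigs(4)
lam1, lam4 = eigs[0], eigs[3]
print(eigs)
# Ratio of the fourth to the first eigenvalue, and its square.
print(lam4 / lam1, (lam4 / lam1) ** 2)
# Ratio of the initial-mesh errors in Table 7.2 and Table 7.1.
print(2.1439 / 0.1350)
```

The second and third entries coincide, which reflects the double eigenvalue $5\pi^2$, and the squared ratio of the fourth to the first eigenvalue is close to the ratio of the tabulated initial errors.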
In the theory presented in [24], it is shown that the error for eigenvalues for smooth problems is bounded in terms of the square of the considered eigenvalue, i.e.,

(7.1) $|\lambda - \lambda_n| \le C\,\lambda^2\,(H_n^{\max})^2.$

Also, we know that the first and the fourth eigenvalues are 19.7392089 and 78.9568352, and so $\lambda_4 = 4\lambda_1$. Comparing errors in Table 7.2 with those in Table 7.1, we see that the errors are roughly multiplied by a factor of 16, as predicted by (7.1).

Often h-adaptivity uses only a marking strategy based on an estimation of the error, as in Marking Strategy 1, and avoids refining based on oscillations as in Marking Strategy 2. (Convergence of an adaptive scheme for eigenvalue problems which does not use Marking Strategy 2 was recently proved in [5].) To investigate the effects of refinement based on oscillations, in Table 7.3 we have computed the smallest eigenvalue for the Laplace problem keeping $\theta$ fixed and varying $\tilde{\theta}$ only. Reducing $\tilde{\theta}$ towards 0 has the effect of turning off the refinement arising from Marking Strategy 2. The results in Table 7.3 seem to suggest that the rate of convergence slightly increases as $\tilde{\theta}$ increases. We investigate this further in Table 7.4, where we take iterations 5, 6, and 7 from Table 7.3, and we present the quantity $C^* := N \times |\lambda - \lambda_n|$, where $N$ denotes the number of DOFs. Then $C^*$ gives an indication of the size of the unknown constant in the optimal error estimate $|\lambda - \lambda_n| = O(N^{-1})$. The results suggest that $C^*$ stays fairly constant independent of $\tilde{\theta}$.

In Table 7.5, we have set $\tilde{\theta} = 0$. Although the convergence result given in this paper does not hold any more, the method is still clearly convergent. Comparing Table 7.1, Table 7.3, and Table 7.5, we see that with the second marking strategy the
Table 7.3. Comparison of the reduction of the error and DOFs of the adaptive method for the smallest eigenvalue for the Laplace problem on the unit square for a fixed value of $\theta$ and varying $\tilde{\theta}$.

             $\theta = 0.8$, $\tilde{\theta} = 0.1$       $\theta = 0.8$, $\tilde{\theta} = 0.3$       $\theta = 0.8$, $\tilde{\theta} = 0.5$
Iteration    $|\lambda-\lambda_n|$   DOFs     $\beta$     $|\lambda-\lambda_n|$   DOFs     $\beta$     $|\lambda-\lambda_n|$   DOFs     $\beta$
1            0.1350    400      -         0.1350    400      -         0.1350    400      -
2            0.0704    1269     0.5646    0.0698    1372     0.5353    0.0673    1555     0.5131
3            0.0307    2660     1.1215    0.0300    2821     1.1700    0.0285    3229     1.1757
4            0.0137    7492     0.7770    0.0133    7846     0.7980    0.0115    9140     0.8731
5            0.0056    18853    0.9699    0.0052    20189    0.9918    0.0046    22793    0.9913
6            0.0021    52247    0.9587    0.0020    55640    0.9382    0.0018    61582    0.9310
7            0.0008    140049   0.9834    0.0008    145773   1.0011    0.0007    161928   1.0238
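Table 7.4. Values of $C^*$ computed from Table 7.3.

Iteration    $\theta = 0.8$, $\tilde{\theta} = 0.1$    $\theta = 0.8$, $\tilde{\theta} = 0.3$    $\theta = 0.8$, $\tilde{\theta} = 0.5$
5            $1.06 \times 10^2$                $1.05 \times 10^2$                $1.05 \times 10^2$
6            $1.10 \times 10^2$                $1.11 \times 10^2$                $1.11 \times 10^2$
7            $1.12 \times 10^2$                $1.12 \times 10^2$                $1.13 \times 10^2$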
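The quantity $C^* := N \times |\lambda - \lambda_n|$ can be recomputed directly from iterations 5, 6, and 7 of Table 7.3. The sketch below does this; the dictionary layout is an illustrative choice. Since the errors in Table 7.3 are rounded to four decimals, the recomputed values match Table 7.4 only up to that rounding.

```python
# Iterations 5, 6, 7 of Table 7.3, keyed by the value of theta~ (theta = 0.8).
dofs = {
    0.1: [18853, 52247, 140049],
    0.3: [20189, 55640, 145773],
    0.5: [22793, 61582, 161928],
}
errors = {
    0.1: [0.0056, 0.0021, 0.0008],
    0.3: [0.0052, 0.0020, 0.0008],
    0.5: [0.0046, 0.0018, 0.0007],
}

# C* = N * |lambda - lambda_n| for each run and iteration.
c_star = {
    key: [n * e for n, e in zip(dofs[key], errors[key])]
    for key in dofs
}
for key in sorted(c_star):
    print(key, [round(value, 1) for value in c_star[key]])
```

All recomputed values lie between roughly $1.0 \times 10^2$ and $1.2 \times 10^2$, in line with the observation that $C^*$ depends only weakly on $\tilde{\theta}$.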
Table 7.5. Comparison of the reduction of the error and DOFs of the adaptive method for the smallest eigenvalue for the Laplace problem on the unit square using Marking Strategy 1 only.

             $\theta = 0.2$                     $\theta = 0.5$                     $\theta = 0.8$
Iteration    $|\lambda-\lambda_n|$   DOFs   $\beta$     $|\lambda-\lambda_n|$   DOFs   $\beta$     $|\lambda-\lambda_n|$   DOFs     $\beta$
1            0.1350    400    -         0.1350    400    -         0.1350    400      -
2            0.1328    447    0.1525    0.1209    648    0.2289    0.0704    1253     0.5704
3            0.1299    503    0.1824    0.0859    1036   0.7283    0.0307    2646     1.1125
4            0.1271    565    0.1958    0.0627    1455   0.9301    0.0138    7490     0.7697
5            0.1238    637    0.2157    0.0458    1965   1.0429    0.0056    18847    0.9734
6            0.1189    712    0.3650    0.0323    3031   0.8066    0.0021    52239    0.9585
7            0.1113    795    0.6014    0.0228    4372   0.9531    0.0008    140194   0.9828
Table 7.6. Comparison between the number of marked elements by strategy 1 (i.e., $\#\hat{T}_n$) and the number of marked elements by strategy 2 only (i.e., $\#(\tilde{T}_n \setminus \hat{T}_n)$) for different values of $\theta$ and $\tilde{\theta}$ for the smallest eigenvalue of the Laplace problem on the unit square.

             $\theta = \tilde{\theta} = 0.2$              $\theta = \tilde{\theta} = 0.5$              $\theta = \tilde{\theta} = 0.8$
Iteration    $\#\hat{T}_n$   $\#(\tilde{T}_n \setminus \hat{T}_n)$     $\#\hat{T}_n$   $\#(\tilde{T}_n \setminus \hat{T}_n)$     $\#\hat{T}_n$   $\#(\tilde{T}_n \setminus \hat{T}_n)$
1            12      15                85      99                299      285
2            13      15                102     85                953      19
3            14      15                100     25                3069     198
4            14      14                173     7                 7965     2053
5            15      13                310     48                22426    1486
6            15      12                552     184               58075    3005
number of degrees of freedom grows faster than without it. To illustrate this effect better, Table 7.6 tabulates the number of elements $\#\hat{T}_n$ (marked by Marking Strategy 1) together with the extra number of elements $\#(\tilde{T}_n \setminus \hat{T}_n)$ (marked by Marking Strategy 2 alone). Note that the new DOFs created by mesh refinement come not only from the refinement of
Fig. 7.1. Loglog plots of convergence of adaptive and uniform refinement for first eigenvalue of the Laplacian (left) and fourth eigenvalue of the Laplacian (right).
Table 7.7. Comparison of the reduction of the error and DOFs of the adaptive method for the second smallest eigenvalue for the Laplace problem on the unit square.

       $\theta = \tilde{\theta} = 0.2$         $\theta = \tilde{\theta} = 0.5$         $\theta = \tilde{\theta} = 0.8$
$n$    $|\lambda-\lambda_n|$   $N$    $\beta$      $|\lambda-\lambda_n|$   $N$     $\beta$      $|\lambda-\lambda_n|$   $N$       $\beta$
1      0.5802    400    -         0.5802    400     -         0.5802    400      -
2      0.5678    478    0.1212    0.4935    811     0.2291    0.2447    1533     0.6427
3      0.5514    562    0.1816    0.3201    1275    0.9564    0.0959    3640     1.0826
4      0.5329    646    0.2449    0.2295    1728    1.0953    0.0368    11747    0.8169
5      0.5111    735    0.3237    0.1521    2374    1.2950    0.0136    32881    0.9651
6      0.4758    829    0.5942    0.1078    3498    0.8875    0.0050    82968    1.0778
7      0.4392    918    0.7856    0.0782    5555    0.6938    0.0020    221521   0.9574
the marked elements, but also from the closures used to keep the meshes conforming. It is clear that the number of elements marked as a result of the oscillations continues to rise as refinement proceeds, although much more slowly than the number marked by the residual-based criterion (Marking Strategy 1).

In Figure 7.1 we compare the performance of the adaptive algorithm with uniform bisection refinement (see Figure 4.1) for the first and fourth eigenvalues of the Laplace operator. We note that in this case both methods converge with a similar rate, as is expected since in this case the regularity of eigenfunctions is $H^2$.

To complete this section, we give in Table 7.7 an example of the performance of the adaptive method for computing nonsimple eigenvalues. In this case, we considered the second smallest eigenvalue of the Laplace operator on the unit square, which has multiplicity 2. We see that, although the theory given above does not strictly hold, the method performs similarly to the case of simple eigenvalues.

7.2. Example: Elliptic operator with discontinuous coefficients. In this example, we investigate how our method copes with discontinuous coefficients. In order to do that, we modified the smooth problem from Example 7.1. We inserted a square subdomain of side 0.5 in the center of the unit square domain. In the bilinear form (2.2), we also chose the function $A$ to be the scalar piecewise constant function which assumes the value 100 inside the inner subdomain and the value 1 outside it. As before, $B$ in (2.2) is chosen as $B = 1$. The jump in the value of $A$ generally
Table 7.8. Comparison of the reduction of the error and DOFs of the adaptive method for the smallest eigenvalue for the 2D problem with discontinuous coefficient.

             $\theta = \tilde{\theta} = 0.2$          $\theta = \tilde{\theta} = 0.5$          $\theta = \tilde{\theta} = 0.8$
Iteration    $|\lambda-\lambda_n|$   DOFs   $\beta$     $|\lambda-\lambda_n|$   DOFs   $\beta$     $|\lambda-\lambda_n|$   DOFs    $\beta$
1            1.1071    81     -         1.1071    81     -         1.1071    81      -
2            1.0200    103    0.3410    0.8738    199    0.2632    0.4834    356     0.5597
3            1.0105    129    0.0416    0.5848    314    0.8805    0.2244    799     0.9494
4            1.0039    147    0.0498    0.3983    491    0.8591    0.0990    2235    0.7957
5            0.8968    167    0.8843    0.2766    673    1.1564    0.0401    4764    1.1932
6            0.8076    194    0.6996    0.1933    975    0.9665    0.0180    12375   0.8372
7            0.8008    217    0.0747    0.1346    1476   0.8722    0.0065    29148   1.1888
8            0.7502    237    0.7401    0.0948    2080   1.0237    0.0020    65387   1.4482
Fig. 7.2. A refined mesh from the adaptive method corresponding to the first eigenvalue of the 2D problem with discontinuous coefficient, and the corresponding eigenfunction.
produces a jump in the gradient of the eigenfunctions all along the boundary of the subdomain, and at the corners of the subdomain (from both inside and outside) the eigenfunction has infinite gradient, arising from the usual corner singularities. We choose our initial mesh to be aligned with the discontinuity in $A$ and so only the corner singularities are active here. We still have Assumption 2.1, but now $s < 1$ and, from Theorem 3.1, using uniform refinement, the rate of convergence for eigenvalues should be $O((H_n^{\max})^{2s})$ or, equivalently, $O(N^{-s})$, where $N$ is the number of DOFs. The adaptive method yields the optimal order of $O(N^{-1})$ (which holds for uniform meshes and smooth problems) for large enough $\theta$ and $\tilde{\theta}$. (See Table 7.8.) Here we compute the "exact" $\lambda$ using a mesh with about half a million DOFs.

In Figure 7.2, we depict the mesh coming from the fourth iteration of Algorithm 1 with $\theta = \tilde{\theta} = 0.8$ for the smallest eigenvalue of this problem. This mesh is the result of multiple refinements using both Marking Strategies 1 and 2 each time. As can be seen, the corners of the subdomain are much more refined than the rest of the mesh. This is clearly the effect of the first marking strategy, since the edge residuals have detected the singularity in the gradient of the eigenfunction at these points. In Figure 7.2, we also depict the corresponding eigenfunction.

In Figure 7.3, analogously to Figure 7.1, we compare the convergence of the adaptive method with uniform refinement for this example. Now, because of the lack of regularity, the superiority of the adaptive method is clearly visible.
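As a rough check of the claim that the adaptive method attains a rate close to $O(N^{-1})$ for this nonsmooth problem, one can fit a single exponent to the whole $\theta = \tilde{\theta} = 0.8$ column of Table 7.8 by least squares in log-log coordinates. This is a sketch of one possible post-processing step, not the procedure used in the paper (which reports pairwise rates); the function name `fitted_rate` is hypothetical.

```python
import math


def fitted_rate(dofs, errors):
    """Least-squares slope of log(error) against log(DOFs), returned with
    the sign flipped so that error ~ C * N**(-beta)."""
    xs = [math.log(n) for n in dofs]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return -sxy / sxx


# Data copied from Table 7.8, column theta = theta~ = 0.8.
dofs = [81, 356, 799, 2235, 4764, 12375, 29148, 65387]
errors = [1.1071, 0.4834, 0.2244, 0.0990, 0.0401, 0.0180, 0.0065, 0.0020]

beta = fitted_rate(dofs, errors)
print(round(beta, 3))
```

The fitted exponent comes out slightly below 1, which is compatible with the pairwise rates listed in Table 7.8 once the pre-asymptotic first iterations are taken into account.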
Fig. 7.3. Loglog plot of convergence of adaptive and uniform refinement for first eigenvalue of the problem with discontinuous coefficient.
Acknowledgment. We would like to thank Carsten Carstensen for his kind support and very useful discussions.

REFERENCES

[1] M. Ainsworth and J.T. Oden, A Posteriori Error Estimation in Finite Element Analysis, Wiley, New York, 2000.
[2] I. Babuška, The finite element method for elliptic equations with discontinuous coefficients, Computing, 5 (1970), pp. 207–213.
[3] I. Babuška and J. Osborn, Eigenvalue problems, in Handbook of Numerical Analysis Vol. II, P.G. Ciarlet and J.L. Lions, eds., North Holland, 1991, pp. 641–787.
[4] M. Bourlard, M. Dauge, M.-S. Lubuma, and S. Nicaise, Coefficients of the singularities for elliptic boundary value problems on domains with conical points. III: Finite element methods on polygonal domains, SIAM J. Numer. Anal., 29 (1992), pp. 136–155.
[5] C. Carstensen and J. Gedicke, An oscillation-free adaptive FEM for symmetric eigenvalue problems, preprint, 2008.
[6] C. Carstensen and R.H.W. Hoppe, Convergence analysis of an adaptive nonconforming finite element method, Numer. Math., 103 (2006), pp. 251–266.
[7] C. Carstensen and R.H.W. Hoppe, Error reduction and convergence for an adaptive mixed finite element method, Math. Comput., 75 (2006), pp. 1033–1042.
[8] W. Dörfler, A convergent adaptive algorithm for Poisson's equation, SIAM J. Numer. Anal., 33 (1996), pp. 1106–1124.
[9] R.G. Durán, C. Padra, and R. Rodríguez, A posteriori estimates for the finite element approximation of eigenvalue problems, Math. Models Methods Appl. Sci., 13 (2003), pp. 1219–1229.
[10] S. Giani, Convergence of adaptive finite element methods for elliptic eigenvalue problems with application to photonic crystals, Ph.D. Thesis, University of Bath, Bath, UK, 2008.
[11] S. Giani and I.G. Graham, A convergent adaptive method for elliptic eigenvalue problems and numerical experiments, Research Report 14/08, Bath Institute for Complex Systems, 2008. http://www.bath.ac.uk/math-sci/BICS/
[12] W. Hackbusch, Elliptic Differential Equations, Springer, New York, 1992.
[13] HSL archive, http://hsl.rl.ac.uk/archive/hslarchive.html
[14] M.G. Larson, A posteriori and a priori analysis for finite element approximations of self-adjoint elliptic eigenvalue problems, SIAM J. Numer. Anal., 38 (2000), pp. 608–625.
[15] R.B. Lehoucq, D.C. Sorensen, and C. Yang, ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, SIAM, 1998.
[16] K. Mekchay and R.H. Nochetto, Convergence of adaptive finite element methods for general second order linear elliptic PDEs, SIAM J. Numer. Anal., 43 (2005), pp. 1803–1827.
[17] W. Mitchell, Optimal multilevel iterative methods for adaptive grids, SIAM J. Sci. Stat. Comput., 13 (1992), pp. 146–167.
[18] P. Morin, R.H. Nochetto, and K.G. Siebert, Data oscillation and convergence of adaptive FEM, SIAM J. Numer. Anal., 38 (2000), pp. 466–488.
[19] P. Morin, R.H. Nochetto, and K.G. Siebert, Convergence of adaptive finite element methods, SIAM Rev., 44 (2002), pp. 631–658.
[20] A. Schmidt and K.G. Siebert, ALBERT: An adaptive hierarchical finite element toolbox, Manual, p. 244, preprint 06/2000 Freiburg.
[21] J.A. Scott, Sparse direct methods: An introduction, Lecture Notes in Physics, 535, 401, 2000.
[22] L.R. Scott and S. Zhang, Finite element interpolation of nonsmooth functions satisfying boundary conditions, Math. Comput., 54 (1990), pp. 483–493.
[23] R. Stevenson, Optimality of a standard adaptive finite element method, Found. Comput. Math., 7 (2007), pp. 245–269.
[24] G. Strang and G.J. Fix, An Analysis of the Finite Element Method, Prentice-Hall, 1973.
[25] T.F. Walsh, G.M. Reese, and U.L. Hetmaniuk, Explicit a posteriori error estimates for eigenvalue analysis of heterogeneous elastic structures, Comput. Methods Appl. Mech. Engrg., 196 (2007), pp. 3614–3623.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1092–1125
THE DERIVATION OF HYBRIDIZABLE DISCONTINUOUS GALERKIN METHODS FOR STOKES FLOW∗

BERNARDO COCKBURN† AND JAYADEEP GOPALAKRISHNAN‡

Abstract. In this paper, we introduce a new class of discontinuous Galerkin methods for the Stokes equations. The main feature of these methods is that they can be implemented in an efficient way through a hybridization procedure which reduces the globally coupled unknowns to certain approximations on the element boundaries. We present four ways of hybridizing the methods, which differ by the choice of the globally coupled unknowns. Classical methods for the Stokes equations can be thought of as limiting cases of these new methods.

Key words. Stokes equations, mixed methods, discontinuous Galerkin methods, hybridized methods, Lagrange multipliers

AMS subject classifications. 65N30, 65M60, 35L65

DOI. 10.1137/080726653
1. Introduction. This paper is devoted to the derivation of a new class of discontinuous Galerkin (DG) methods for the three-dimensional Stokes problem

$-\Delta u + \mathrm{grad}\, p = f$ in $\Omega$,
$\mathrm{div}\, u = 0$ in $\Omega$,
$u = g$ on $\partial\Omega$.

As usual, we assume that $f$ is in $L^2(\Omega)^3$, that $g \in H^{1/2}(\partial\Omega)^3$, and that $g$ satisfies the compatibility condition

(1.1) $\int_{\partial\Omega} g \cdot n = 0,$
where $n$ is the outward unit normal on $\partial\Omega$. We assume that $\Omega$ is a bounded simply connected domain with connected Lipschitz polyhedral boundary $\partial\Omega$.

The novelty in the class of DG methods derived here lies in the fact that they can be hybridized. Hybridized methods are primarily attractive due to the reduction in the number of globally coupled unknowns, especially in the high order case. Hybridization for conforming methods was traditionally thought of as a reformulation that moves the interelement continuity constraints of approximations from the finite element spaces to the system of equations. Such reformulations are now well known to possess various advantages [9] (in addition to the reduction in the number of unknowns). In adapting the hybridization idea to DG methods, we face the difficulty that DG methods have no interelement continuity constraints to begin with. Nonetheless, some DG methods realize interelement coupling through constraints on numerical traces, which can be used to perform hybridization. This idea was exploited in the context of the Poisson-like equations in [10]. It will feature again in this paper, manifesting in a more complicated form suited to the Stokes system.

∗ Received by the editors June 10, 2008; accepted for publication (in revised form) October 13, 2008; published electronically February 19, 2009. http://www.siam.org/journals/sinum/47-2/72665.html
† School of Mathematics, University of Minnesota, Minneapolis, MN 55455 ([email protected]. edu). This author's research was supported in part by the National Science Foundation (grant DMS-0712955) and by the University of Minnesota Supercomputing Institute.
‡ Department of Mathematics, University of Florida, Gainesville, FL 32611–8105 ([email protected]fl. edu). This author's research was supported in part by the National Science Foundation under grants DMS-0713833 and SCREMS-0619080.

Let us put this contribution in perspective. This paper can be considered part of a series of papers in which we study hybridization of finite element methods. The hybridization of classical mixed methods for second-order elliptic problems was considered in [5, 6]. Hybridization of a DG method for the two-dimensional Stokes system was carried out in [3], while hybridization of a mixed method for the three-dimensional Stokes system was developed in [7, 8]. A short review of the work done up to 2005 is provided in [9].

Recently in [10] it was shown how mixed, discontinuous, continuous, and even nonconforming Galerkin methods can be hybridized in a single, unifying framework. This was done for second-order elliptic problems. In this paper, we extend this approach to Galerkin methods for the Stokes problem. However, although the hybridization techniques we propose here provide a similar unifying framework, we prefer to sacrifice generality for the sake of clarity and concentrate our efforts on a particular, new class of methods we call the hybridizable discontinuous Galerkin (HDG) methods. Then, just as was done for second-order elliptic problems in [10], we show that this procedure also applies to mixed and other classic methods which can be obtained as particular or limiting cases of these HDG methods.

Our results are also an extension of previous work on hybridization of a DG [3] and a classical mixed method [7, 8] for the Stokes equations. For these two methods, hybridization was used to circumvent the difficult task of constructing a local basis for divergence-free spaces for velocity.
Moreover, in [7, 8], it was shown that hybridization results in a new formulation of the method which only involves the tangential velocity and the pressure on the faces of the elements. In this paper, we show that such a formulation can also be obtained for the HDG methods. We also show that these methods can be hybridized in three additional ways differing in the choice of variables which are globally coupled. The organization of the paper is as follows. In section 2, we present the HDG methods and show that their approximate solution is well defined. In section 3, we present the four hybridizations of the HDG methods in full detail. Proofs of the theorems therein are given in section 4. Finally, in section 5, we end with some concluding remarks.

2. The HDG methods.

2.1. Definition of the methods. Let us describe the HDG methods under consideration. We begin by introducing our notation. We denote by $\Omega_h = \{K\}$ a subdivision of the domain $\Omega$ into shape-regular tetrahedra $K$ satisfying the usual assumptions of finite element meshes and set $\partial\Omega_h := \{\partial K : K \in \Omega_h\}$. We associate to this mesh the set of interior faces $\mathcal{E}_h^o$ and the set of boundary faces $\mathcal{E}_h^\partial$. We say that $e \in \mathcal{E}_h^o$ if there are two tetrahedra $K^+$ and $K^-$ in $\Omega_h$ such that $e = \partial K^+ \cap \partial K^-$, and we say that $e \in \mathcal{E}_h^\partial$ if there is a tetrahedron $K$ in $\Omega_h$ such that $e = \partial K \cap \partial\Omega$. We set $\mathcal{E}_h := \mathcal{E}_h^o \cup \mathcal{E}_h^\partial$. The HDG methods provide an approximate solution $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ in some finite-dimensional space $\boldsymbol{W}_h \times \boldsymbol{V}_h \times P_h$ of the form
\begin{align*}
\boldsymbol{W}_h &= \{\boldsymbol{\tau} \in \boldsymbol{L}^2(\Omega) : \boldsymbol{\tau}|_K \in \boldsymbol{W}(K)\ \ \forall\, K \in \Omega_h\},\\
\boldsymbol{V}_h &= \{\boldsymbol{v} \in \boldsymbol{L}^2(\Omega) : \boldsymbol{v}|_K \in \boldsymbol{V}(K)\ \ \forall\, K \in \Omega_h\},\\
P_h &= \{q \in L^2(\Omega) : q|_K \in P(K)\ \ \forall\, K \in \Omega_h\},
\end{align*}
BERNARDO COCKBURN AND JAYADEEP GOPALAKRISHNAN
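where the local spaces $\boldsymbol{W}(K)$, $\boldsymbol{V}(K)$, and $P(K)$ are finite-dimensional polynomial spaces that we shall specify later. To define the approximate solution, we use the following formulation of the Stokes equations:
\begin{align*}
\boldsymbol{\omega} - \operatorname{curl} \boldsymbol{u} &= 0 && \text{in } \Omega, \tag{2.1a}\\
\operatorname{curl} \boldsymbol{\omega} + \operatorname{grad} p &= \boldsymbol{f} && \text{in } \Omega, \tag{2.1b}\\
\operatorname{div} \boldsymbol{u} &= 0 && \text{in } \Omega, \tag{2.1c}\\
\boldsymbol{u} &= \boldsymbol{g} && \text{on } \partial\Omega. \tag{2.1d}
\end{align*}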
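Multiplying the first three equations by test functions and integrating by parts, we arrive at the following formulation for determining an approximate solution $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ in $\boldsymbol{W}_h \times \boldsymbol{V}_h \times P_h$:
\begin{align*}
(\boldsymbol{\omega}_h, \boldsymbol{\tau})_{\Omega_h} - (\boldsymbol{u}_h, \operatorname{curl} \boldsymbol{\tau})_{\Omega_h} + \langle \widehat{\boldsymbol{u}}_h, \boldsymbol{n} \times \boldsymbol{\tau} \rangle_{\partial\Omega_h} &= 0, \tag{2.2a}\\
(\boldsymbol{\omega}_h, \operatorname{curl} \boldsymbol{v})_{\Omega_h} + \langle \widehat{\boldsymbol{\omega}}_h, \boldsymbol{v} \times \boldsymbol{n} \rangle_{\partial\Omega_h} - (p_h, \operatorname{div} \boldsymbol{v})_{\Omega_h} + \langle \widehat{p}_h, \boldsymbol{v} \cdot \boldsymbol{n} \rangle_{\partial\Omega_h} &= (\boldsymbol{f}, \boldsymbol{v})_{\Omega_h}, \tag{2.2b}\\
-(\boldsymbol{u}_h, \operatorname{grad} q)_{\Omega_h} + \langle \widehat{\boldsymbol{u}}_h \cdot \boldsymbol{n}, q \rangle_{\partial\Omega_h} &= 0 \tag{2.2c}
\end{align*}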
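for all $(\boldsymbol{\tau}, \boldsymbol{v}, q) \in \boldsymbol{W}_h \times \boldsymbol{V}_h \times P_h$. The notation for volume inner products above is defined by
\[
(\zeta, \omega)_{\Omega_h} := \sum_{K \in \Omega_h} \int_K \zeta(x)\, \omega(x)\, dx
\quad\text{and}\quad
(\boldsymbol{\sigma}, \boldsymbol{v})_{\Omega_h} := \sum_{i=1}^{3} (\sigma_i, v_i)_{\Omega_h}
\]
for all $\zeta, \omega$ in $L^2(\Omega_h) := \{v : v|_K \in L^2(K) \text{ for all } K \text{ in } \Omega_h\}$, and all $\boldsymbol{\sigma}, \boldsymbol{v} \in \boldsymbol{L}^2(\Omega_h) := [L^2(\Omega_h)]^3$. More generally, our notation is such that if $S$ represents the notation for any given space (e.g., $S$ can be $L^2$, $H^1$, etc.), the boldface notation $\boldsymbol{S}(\Omega_h)$ denotes $[S(\Omega_h)]^3$, and
\begin{align*}
S(\Omega_h) &:= \{\omega : \Omega_h \to \mathbb{R},\ \omega|_K \in S(K)\ \ \forall\, K \in \Omega_h\},\\
S(\partial\Omega_h) &:= \{\omega : \partial\Omega_h \to \mathbb{R},\ \omega|_{\partial K} \in S(\partial K)\ \ \forall\, K \in \Omega_h\}.
\end{align*}
The boundary inner products in (2.2) are defined by
\[
\langle \boldsymbol{v} \odot \boldsymbol{n}, \mu \rangle_{\partial\Omega_h} := \sum_{K \in \Omega_h} \int_{\partial K} \big(\boldsymbol{v}(\gamma) \odot \boldsymbol{n}\big)\, \mu(\gamma)\, d\gamma,
\]
where $\odot$ is either $\cdot$ (the dot product) or $\times$ (the cross product) and $\boldsymbol{n}$ denotes the unit outward normal vector on $\partial K$. Similarly, for any $\mathcal{F}_h \subseteq \mathcal{E}_h$, the notation $\langle \cdot, \cdot \rangle_{\mathcal{F}_h}$ indicates a sum of integrals over the faces in $\mathcal{F}_h$. To complete the definition of the HDG methods, we need to specify the numerical traces, for which we need the following notation. For any vector-valued function $\boldsymbol{v}$ we set
\begin{align*}
\boldsymbol{v}_t &:= \boldsymbol{n} \times (\boldsymbol{v} \times \boldsymbol{n}), \tag{2.3a}\\
\boldsymbol{v}_n &:= \boldsymbol{n}\, (\boldsymbol{v} \cdot \boldsymbol{n}). \tag{2.3b}
\end{align*}
Note that $\boldsymbol{v} = \boldsymbol{v}_n + \boldsymbol{v}_t$. In this paper we will often use double-valued functions on $\mathcal{E}_h^o$. One example is $\boldsymbol{n}$. Indeed, on each interior mesh face $e = \partial K^+ \cap \partial K^-$,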
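the unit normal $\boldsymbol{n}$ is double valued with two branches, one from $K^+$, which we denote by $\boldsymbol{n}^+$, and another from $K^-$, which we denote by $\boldsymbol{n}^-$. Similarly, if $\boldsymbol{v}$ is in $\boldsymbol{H}^1(\Omega_h)$, its full trace, as well as the tangential and normal traces in (2.3), are generally double valued on $\mathcal{E}_h^o$. We use $\boldsymbol{v}^+$ and $\boldsymbol{v}^-$ to denote the full trace on $e$ of $\boldsymbol{v}$ from $K^+$ and $K^-$, respectively. On each $e = \partial K^+ \cap \partial K^-$, the jumps of double-valued functions $\boldsymbol{v}$ in $\boldsymbol{H}^1(\Omega_h)$ and $q$ in $H^1(\Omega_h)$ are defined by
\begin{align*}
[\![ q\, \boldsymbol{n} ]\!] &:= q^+ \boldsymbol{n}^+ + q^- \boldsymbol{n}^-, \tag{2.4a}\\
[\![ \boldsymbol{v} \odot \boldsymbol{n} ]\!] &:= \boldsymbol{v}^+ \odot \boldsymbol{n}^+ + \boldsymbol{v}^- \odot \boldsymbol{n}^-, \tag{2.4b}
\end{align*}
where $\odot$ is either $\cdot$ or $\times$. With these preparations we can now specify our definition of the numerical traces appearing in (2.2). On the interior faces $\mathcal{E}_h^o$, we set
\begin{align*}
(\widehat{\boldsymbol{\omega}}_h)_t &= \frac{\tau_t^- (\boldsymbol{\omega}_h^+)_t + \tau_t^+ (\boldsymbol{\omega}_h^-)_t}{\tau_t^- + \tau_t^+} + \frac{\tau_t^+ \tau_t^-}{\tau_t^- + \tau_t^+}\, [\![ \boldsymbol{u}_h \times \boldsymbol{n} ]\!], \tag{2.5a}\\
(\widehat{\boldsymbol{u}}_h)_t &= \frac{\tau_t^+ (\boldsymbol{u}_h^+)_t + \tau_t^- (\boldsymbol{u}_h^-)_t}{\tau_t^- + \tau_t^+} + \frac{1}{\tau_t^- + \tau_t^+}\, [\![ \boldsymbol{n} \times \boldsymbol{\omega}_h ]\!], \tag{2.5b}\\
(\widehat{\boldsymbol{u}}_h)_n &= \frac{\tau_n^+ (\boldsymbol{u}_h^+)_n + \tau_n^- (\boldsymbol{u}_h^-)_n}{\tau_n^- + \tau_n^+} + \frac{1}{\tau_n^- + \tau_n^+}\, [\![ p_h \boldsymbol{n} ]\!], \tag{2.5c}\\
\widehat{p}_h &= \frac{\tau_n^- p_h^+ + \tau_n^+ p_h^-}{\tau_n^- + \tau_n^+} + \frac{\tau_n^+ \tau_n^-}{\tau_n^- + \tau_n^+}\, [\![ \boldsymbol{u}_h \cdot \boldsymbol{n} ]\!], \tag{2.5d}
\end{align*}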
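where the so-called penalization or stabilization parameters $\tau_t$ and $\tau_n$ are functions on $\mathcal{E}_h$ that are constant on each $e$ in $\mathcal{E}_h$ and double valued on $\mathcal{E}_h^o$; indeed, if $e = \partial K^+ \cap \partial K^-$, then $\tau_t^\pm$ and $\tau_n^\pm$ are the values on $e \cap \partial K^\pm$ of the stabilization parameters. Finally, on the boundary faces of $\mathcal{E}_h^\partial$, we set
\begin{align*}
(\widehat{\boldsymbol{u}}_h)_t &= \boldsymbol{g}_t, \tag{2.6a}\\
(\widehat{\boldsymbol{u}}_h)_n &= \boldsymbol{g}_n, \tag{2.6b}\\
(\widehat{\boldsymbol{\omega}}_h)_t &= (\boldsymbol{\omega}_h)_t + \tau_t\, (\boldsymbol{u}_h - \widehat{\boldsymbol{u}}_h) \times \boldsymbol{n}, \tag{2.6c}\\
\widehat{p}_h &= p_h + \tau_n\, (\boldsymbol{u}_h - \widehat{\boldsymbol{u}}_h) \cdot \boldsymbol{n}. \tag{2.6d}
\end{align*}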
This completes the definition of the HDG method in (2.2), save the specification of the spaces on each element. Let us briefly motivate the choice of the above numerical traces. First, we want them to be linear combinations of the traces of the approximate solution $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$. We also want them to be consistent and conservative; these are very important properties of the numerical traces as was shown in [1] in the context of second-order elliptic problems. They are consistent because when the approximate solution is continuous across interelement boundaries, or at the boundary of $\Omega$, we have that
\[
\big( (\widehat{\boldsymbol{\omega}}_h)_t,\ (\widehat{\boldsymbol{u}}_h)_t,\ (\widehat{\boldsymbol{u}}_h)_n,\ \widehat{p}_h \big) = \big( (\boldsymbol{\omega}_h)_t,\ (\boldsymbol{u}_h)_t,\ (\boldsymbol{u}_h)_n,\ p_h \big).
\]
They are conservative because they are single valued. The above general considerations, however, are not enough to justify the specific expression of the numerical traces in terms of the parameters $\tau_t$ and $\tau_n$. We take this particular
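expression because it allows the hybridization of the methods. Although this will become evident when we develop each of its four hybridizations, we can briefly argue why this is so. Suppose that we want the numerical trace of the velocity, $\widehat{\boldsymbol{u}}_h = (\widehat{\boldsymbol{u}}_h)_t + (\widehat{\boldsymbol{u}}_h)_n$, to be the globally coupled unknown. This means that, on each element $K \in \Omega_h$, we should be able to express all the remaining unknowns in terms of $\widehat{\boldsymbol{u}}_h$. If in the weak formulation defining the method, (2.2), we take test functions with support in the element $K$, we see that we can achieve this if we can write
\[
(\widehat{\boldsymbol{\omega}}_h)_t = (\boldsymbol{\omega}_h)_t + \tau_t\, (\boldsymbol{u}_h - \widehat{\boldsymbol{u}}_h) \times \boldsymbol{n}
\quad\text{and}\quad
\widehat{p}_h = p_h + \tau_n\, (\boldsymbol{u}_h - \widehat{\boldsymbol{u}}_h) \cdot \boldsymbol{n},
\]
where $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ is the approximation on the element $K$, $\boldsymbol{n}$ is the outward unit normal to $K$, and $\tau_t$ and $\tau_n$ take the values associated with $K$. Note that this is consistent with the choice of the corresponding numerical traces on the border of $\Omega$, equations (2.6c) and (2.6d). Since the element $K$ was arbitrary, we should then have
\begin{align*}
(\widehat{\boldsymbol{\omega}}_h)_t &= (\boldsymbol{\omega}_h)_t^+ + \tau_t^+\, (\boldsymbol{u}_h^+ - \widehat{\boldsymbol{u}}_h) \times \boldsymbol{n}^+ = (\boldsymbol{\omega}_h)_t^- + \tau_t^-\, (\boldsymbol{u}_h^- - \widehat{\boldsymbol{u}}_h) \times \boldsymbol{n}^-,\\
\widehat{p}_h &= p_h^+ + \tau_n^+\, (\boldsymbol{u}_h^+ - \widehat{\boldsymbol{u}}_h) \cdot \boldsymbol{n}^+ = p_h^- + \tau_n^-\, (\boldsymbol{u}_h^- - \widehat{\boldsymbol{u}}_h) \cdot \boldsymbol{n}^-
\end{align*}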
on all interior faces. A simple algebraic manipulation shows that this is possible only if the numerical traces therein are taken as in (2.5). Let us end this subsection by remarking that the choice of the penalization parameters $\tau_t$ and $\tau_n$ can be crucial since it can have an important effect on both the stability and the accuracy of the method. This constitutes ongoing work; see the last paragraph of section 5. In subsection 3.5, we show how, by taking special choices of these parameters, several already known methods for the Stokes system are recovered.

2.2. Other boundary conditions. The vorticity-velocity variational formulation admits imposition of boundary conditions other than (2.1d); see a short discussion in subsection 4.3 in [16]. In this paper, we consider the following types of boundary conditions:
\begin{align*}
\boldsymbol{u}_t &= \boldsymbol{g}_t, & p &= r && \text{(Type I boundary conditions)}, \tag{2.7a}\\
\boldsymbol{u}_t &= \boldsymbol{g}_t, & \boldsymbol{u}_n &= \boldsymbol{g}_n && \text{(Type II boundary conditions)}, \tag{2.7b}\\
\boldsymbol{\omega}_t &= \boldsymbol{\gamma}_t, & \boldsymbol{u}_n &= \boldsymbol{g}_n && \text{(Type III boundary conditions)}, \tag{2.7c}\\
\boldsymbol{\omega}_t &= \boldsymbol{\gamma}_t, & p &= r && \text{(Type IV boundary conditions)}. \tag{2.7d}
\end{align*}
We have already defined the HDG method in the case of the Type II boundary conditions in the previous subsection. Neither the equations of the HDG method (2.2) nor the equations of the interior numerical traces (2.5a)–(2.5d) change when the other boundary conditions are considered. But the equations for the boundary numerical
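traces, namely (2.6a)–(2.6d), must be changed as follows. For Type I,
\begin{equation*}
\left.
\begin{aligned}
(\widehat{\boldsymbol{u}}_h)_t &= \boldsymbol{g}_t,\\
(\widehat{\boldsymbol{u}}_h)_n &= (\boldsymbol{u}_h)_n + \frac{1}{\tau_n}\, (p_h - \widehat{p}_h)\, \boldsymbol{n},\\
(\widehat{\boldsymbol{\omega}}_h)_t &= (\boldsymbol{\omega}_h)_t + \tau_t\, (\boldsymbol{u}_h - \widehat{\boldsymbol{u}}_h) \times \boldsymbol{n},\\
\widehat{p}_h &= r;
\end{aligned}
\right\} \tag{2.8a}
\end{equation*}
for Type III,
\begin{equation*}
\left.
\begin{aligned}
(\widehat{\boldsymbol{u}}_h)_t &= (\boldsymbol{u}_h)_t + \frac{1}{\tau_t}\, \boldsymbol{n} \times (\boldsymbol{\omega}_h - \widehat{\boldsymbol{\omega}}_h),\\
(\widehat{\boldsymbol{u}}_h)_n &= \boldsymbol{g}_n,\\
(\widehat{\boldsymbol{\omega}}_h)_t &= \boldsymbol{\gamma}_t,\\
\widehat{p}_h &= p_h + \tau_n\, (\boldsymbol{u}_h - \widehat{\boldsymbol{u}}_h) \cdot \boldsymbol{n};
\end{aligned}
\right\} \tag{2.8b}
\end{equation*}
and for Type IV,
\begin{equation*}
\left.
\begin{aligned}
(\widehat{\boldsymbol{u}}_h)_t &= (\boldsymbol{u}_h)_t + \frac{1}{\tau_t}\, \boldsymbol{n} \times (\boldsymbol{\omega}_h - \widehat{\boldsymbol{\omega}}_h),\\
(\widehat{\boldsymbol{u}}_h)_n &= (\boldsymbol{u}_h)_n + \frac{1}{\tau_n}\, (p_h - \widehat{p}_h)\, \boldsymbol{n},\\
(\widehat{\boldsymbol{\omega}}_h)_t &= \boldsymbol{\gamma}_t,\\
\widehat{p}_h &= r.
\end{aligned}
\right\} \tag{2.8c}
\end{equation*}
When we do not have boundary conditions on pressure, the pressure variable in Stokes flow is determined only up to a constant. Therefore, for Type II and Type III boundary conditions, in order to obtain unique solvability we must change the pressure space from $P_h$ to $P_h^0 = P_h \cap L_0^2(\Omega)$, where $L_0^2(\Omega)$ is the set of functions in $L^2(\Omega)$ whose mean on $\Omega$ is zero. In the case of Type I and Type IV boundary conditions, the pressure space is simply $P_h$. Finally, let us point out that the Type IV boundary conditions are not particularly useful since they have to be complemented by additional conditions on the velocity. For this reason, we do not consider them as possible boundary conditions for the Stokes equations. However, we discuss them here because, as we are going to see, there is a one-to-one correspondence between the four types of boundary conditions just considered and the four hybridizations of the HDG method.

2.3. Existence and uniqueness of the HDG solution. With (strictly) positive penalty parameters, the HDG method is well defined, as we next show. When we say that a multivalued function $\tau$ is positive on $\partial\Omega_h$, we mean that both branches of $\tau$ are positive on all faces of $\mathcal{E}_h^o$ and furthermore that the branch from within $\Omega$ is positive on the faces of $\partial\Omega$. Of course, the branch from outside $\Omega$ is zero. To simplify our notation, we will use a symbol for averages of double-valued functions.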
On any interior face e = ∂K + ∩ ∂K − , let {{v}}α = v + α+ + v − α− for any double-valued function α. The notation {{v}} (without a subscript) denotes {{v}}α with α+ = α− = 1/2. As a final note on our notation, we do not distinguish between functions and their extensions by zero. Accordingly, we use the previously defined notations like [[·]] and {{·}} even for boundary faces in Eh∂ with the understanding that one of the branches involved is zero (which is the case when the function is
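extended by zero); e.g., on a boundary face $e$, the penalty function $\tau_n$ has only one nonzero branch, say $\tau_n^-$, so $\{\!\{\tau_n\}\!\}$ on $e$ equals $\tau_n^-/2$. With this notation it is easy to verify that the identities
\begin{align*}
\langle \boldsymbol{\sigma}, \boldsymbol{v} \times \boldsymbol{n} \rangle_{\partial\Omega_h} &= \langle \{\!\{\boldsymbol{\sigma}\}\!\}_\alpha, [\![ \boldsymbol{v} \times \boldsymbol{n} ]\!] \rangle_{\mathcal{E}_h} - \langle [\![ \boldsymbol{\sigma} \times \boldsymbol{n} ]\!], \{\!\{\boldsymbol{v}\}\!\}_{1-\alpha} \rangle_{\mathcal{E}_h}, \tag{2.9a}\\
\langle q, \boldsymbol{v} \cdot \boldsymbol{n} \rangle_{\partial\Omega_h} &= \langle \{\!\{q\}\!\}_\alpha, [\![ \boldsymbol{v} \cdot \boldsymbol{n} ]\!] \rangle_{\mathcal{E}_h} + \langle [\![ q\, \boldsymbol{n} ]\!], \{\!\{\boldsymbol{v}\}\!\}_{1-\alpha} \rangle_{\mathcal{E}_h} \tag{2.9b}
\end{align*}
hold for any $\alpha$ whose branches sum to one, i.e., $\alpha^+ + \alpha^- = 1$ on every face $e$ in $\mathcal{E}_h$.

Proposition 2.1. Assume that $\tau_t$ and $\tau_n$ are positive on $\partial\Omega_h$. Assume also that
\[
\operatorname{curl} \boldsymbol{V}(K) \subset \boldsymbol{W}(K), \qquad \operatorname{grad} P(K) \subset \boldsymbol{V}(K), \qquad \operatorname{div} \boldsymbol{V}(K) \subset P(K)
\]
for every element $K \in \Omega_h$. Then we have the following:
1. For the Type I boundary conditions, there is one and only one $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ in the space $\boldsymbol{W}_h \times \boldsymbol{V}_h \times P_h$ satisfying (2.2), (2.5), and (2.8a).
2. For the Type II boundary conditions, there is a solution $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ in the space $\boldsymbol{W}_h \times \boldsymbol{V}_h \times P_h$ satisfying (2.2), (2.5), and (2.6) if and only if $\boldsymbol{g}$ satisfies (1.1). When a solution $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ exists, all solutions are of the form $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h + \kappa)$ for some constant function $\kappa$. There is a unique solution if $P_h$ is replaced by $P_h^0$.
3. For Type III, the statements of the Type II case hold verbatim after replacing (2.6) with (2.8b).

Proof. The proof proceeds by setting all data to zero and finding the null space in each of the three cases. Taking $(\boldsymbol{\tau}, \boldsymbol{v}, q) := (\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ in (2.2) and adding the resulting equations, we obtain
\[
(\boldsymbol{\omega}_h, \boldsymbol{\omega}_h)_{\Omega_h} + \Theta_h = 0, \tag{2.10}
\]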
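where
\begin{align*}
\Theta_h := {}& -\langle \boldsymbol{u}_h, \boldsymbol{n} \times \boldsymbol{\omega}_h \rangle_{\partial\Omega_h} + \langle \widehat{\boldsymbol{u}}_h, \boldsymbol{n} \times \boldsymbol{\omega}_h \rangle_{\partial\Omega_h} - \langle \widehat{\boldsymbol{\omega}}_h, \boldsymbol{n} \times \boldsymbol{u}_h \rangle_{\partial\Omega_h}\\
& - \langle p_h, \boldsymbol{u}_h \cdot \boldsymbol{n} \rangle_{\partial\Omega_h} + \langle \widehat{p}_h, \boldsymbol{u}_h \cdot \boldsymbol{n} \rangle_{\partial\Omega_h} + \langle p_h, \widehat{\boldsymbol{u}}_h \cdot \boldsymbol{n} \rangle_{\partial\Omega_h}.
\end{align*}
Rewriting $\Theta_h$ using (2.9), we obtain
\begin{align*}
\Theta_h = {}& -\langle \widehat{\boldsymbol{\omega}}_h - \{\!\{\boldsymbol{\omega}_h\}\!\}_{1-\alpha}, [\![ \boldsymbol{n} \times \boldsymbol{u}_h ]\!] \rangle_{\mathcal{E}_h} + \langle \widehat{\boldsymbol{u}}_h - \{\!\{\boldsymbol{u}_h\}\!\}_{\alpha}, [\![ \boldsymbol{n} \times \boldsymbol{\omega}_h ]\!] \rangle_{\mathcal{E}_h}\\
& + \langle \widehat{p}_h - \{\!\{p_h\}\!\}_{1-\beta}, [\![ \boldsymbol{u}_h \cdot \boldsymbol{n} ]\!] \rangle_{\mathcal{E}_h} + \langle \widehat{\boldsymbol{u}}_h - \{\!\{\boldsymbol{u}_h\}\!\}_{\beta}, [\![ p_h \boldsymbol{n} ]\!] \rangle_{\mathcal{E}_h}
\end{align*}
for any $\alpha$ and $\beta$ whose branches sum to one on every face of $\mathcal{E}_h$. We set $\alpha = \tau_t / (2 \{\!\{\tau_t\}\!\})$ and $\beta = \tau_n / (2 \{\!\{\tau_n\}\!\})$ on all the interior faces of $\mathcal{E}_h^o$. On the remaining boundary faces, we set $\alpha$ and $\beta$ case by case as follows, letting $\alpha_{\partial\Omega}^-, \beta_{\partial\Omega}^-$ and $\alpha_{\partial\Omega}^+, \beta_{\partial\Omega}^+$ denote the branches of $\alpha, \beta$ from outside and inside $\Omega$, respectively.

For the Type I case, we set $\alpha_{\partial\Omega}^+ = 0$, $\alpha_{\partial\Omega}^- = 1$, $\beta_{\partial\Omega}^+ = 1$, $\beta_{\partial\Omega}^- = 0$. Then, inserting the expressions for the interior and boundary numerical traces given by (2.5) and (2.8a), we obtain
\[
\Theta_h = \Theta_h^o + \langle \tau_t, |\boldsymbol{u}_h \times \boldsymbol{n}|^2 \rangle_{\partial\Omega} + \Big\langle \frac{1}{\tau_n}, |p_h \boldsymbol{n}|^2 \Big\rangle_{\partial\Omega},
\]
where
\begin{align*}
\Theta_h^o = {}& \Big\langle \frac{1}{2 \{\!\{\tau_t\}\!\}}, \big| [\![ \boldsymbol{n} \times \boldsymbol{\omega}_h ]\!] \big|^2 \Big\rangle_{\mathcal{E}_h^o} + \Big\langle \frac{1}{2 \{\!\{1/\tau_t\}\!\}}, \big| [\![ \boldsymbol{u}_h \times \boldsymbol{n} ]\!] \big|^2 \Big\rangle_{\mathcal{E}_h^o}\\
& + \Big\langle \frac{1}{2 \{\!\{1/\tau_n\}\!\}}, [\![ \boldsymbol{u}_h \cdot \boldsymbol{n} ]\!]^2 \Big\rangle_{\mathcal{E}_h^o} + \Big\langle \frac{1}{2 \{\!\{\tau_n\}\!\}}, \big| [\![ p_h \boldsymbol{n} ]\!] \big|^2 \Big\rangle_{\mathcal{E}_h^o}.
\end{align*}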
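Hence (2.10) implies that $\boldsymbol{\omega}_h$ vanishes, $\boldsymbol{u}_h$ and $p_h$ are continuous on $\Omega$, and $(\boldsymbol{u}_h)_t$ and $p_h$ vanish on $\partial\Omega$. With this in mind, we integrate by parts the equations defining the method, namely (2.2), to obtain $(\operatorname{curl} \boldsymbol{u}_h, \boldsymbol{\tau})_{\Omega_h} = 0$, $(\operatorname{grad} p_h, \boldsymbol{v})_{\Omega_h} = 0$, $(\operatorname{div} \boldsymbol{u}_h, q)_{\Omega_h} = 0$ for all $(\boldsymbol{\tau}, \boldsymbol{v}, q) \in \boldsymbol{W}_h \times \boldsymbol{V}_h \times P_h$. By our assumptions on the local spaces, this implies that the following (global) distributional derivatives on $\Omega$ vanish:
\[
\operatorname{grad} p_h = 0, \qquad \operatorname{div} \boldsymbol{u}_h = 0, \qquad\text{and}\qquad \operatorname{curl} \boldsymbol{u}_h = 0. \tag{2.11}
\]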
The first equality implies that $p_h$ vanishes since we already found $p_h$ to vanish on $\partial\Omega$. Moreover, since $(\boldsymbol{u}_h)_t$ vanishes on the boundary $\partial\Omega$, and since we have assumed that $\partial\Omega$ consists of just one connected component, the last two equalities imply that $\boldsymbol{u}_h = 0$. Thus, the null space is trivial.

For the Type II case, we set $\alpha_{\partial\Omega}^+ = 0$, $\alpha_{\partial\Omega}^- = 1$, $\beta_{\partial\Omega}^+ = 0$, and $\beta_{\partial\Omega}^- = 1$ and simplify $\Theta_h$ using the interior and boundary numerical traces given by (2.5) and (2.6) to find that
\[
\Theta_h = \Theta_h^o + \langle \tau_t, |\boldsymbol{u}_h \times \boldsymbol{n}|^2 \rangle_{\partial\Omega} + \langle \tau_n, |\boldsymbol{u}_h \cdot \boldsymbol{n}|^2 \rangle_{\partial\Omega}.
\]
Hence (2.10) implies that $\boldsymbol{\omega}_h$ vanishes, $\boldsymbol{u}_h$ is continuous on $\Omega$ and zero on $\partial\Omega$, and $p_h$ is continuous on $\Omega$. Proceeding as in the Type I case, we find that (2.11) holds, so $\boldsymbol{u}_h$ vanishes. But unlike the Type I case, we can now conclude only that $p_h$ is constant. Thus the null space consists of $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h) = (0, 0, \kappa)$ for constant functions $\kappa$. Hence, all statements of the proposition on the Type II case follow. The Type III case is proved similarly.

It is interesting to note that the proof of the Type II case required only minimal topological assumptions on $\Omega$, namely, that $\Omega$ is connected. However, the proof of the other two cases used the further assumptions we placed on $\Omega$. The mixed method presented in [8] without such topological assumptions dealt only with the Type II boundary conditions.

We can now give some possible choices for polynomial spaces that can be set within each element. Clearly, Proposition 2.1 gives the conditions that we must satisfy. Let $P_d$ denote the space of polynomials of degree at most $d$, and let $\boldsymbol{P}_d$ denote the space of vector functions whose components are polynomials in $P_d$. Let $d_P \ge 1$, $d_V \ge 0$, $d_W \ge 0$ be some integers satisfying
\[
d_P - 1 \le d_V \le \min(d_P + 1,\ d_W + 1). \tag{2.12}
\]
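The link between (2.12) and the three inclusions of Proposition 2.1 can be checked mechanically. The following is a minimal sketch, assuming full polynomial spaces on each element so that a first-order derivative lowers the degree by exactly one; the function names are illustrative and are not part of the method.

```python
def inclusions_hold(d_w, d_v, d_p):
    """Check curl V(K) in W(K), grad P(K) in V(K), div V(K) in P(K)
    for full polynomial spaces of degrees d_w, d_v, d_p.
    Assumption: differentiation lowers the degree by one."""
    curl_ok = d_v - 1 <= d_w   # curl of degree-d_v vectors has degree d_v - 1
    grad_ok = d_p - 1 <= d_v   # gradient of degree-d_p scalars has degree d_p - 1
    div_ok = d_v - 1 <= d_p    # divergence of degree-d_v vectors has degree d_v - 1
    return curl_ok and grad_ok and div_ok


def satisfies_2_12(d_w, d_v, d_p):
    """Degree condition (2.12): d_p - 1 <= d_v <= min(d_p + 1, d_w + 1)."""
    return d_p - 1 <= d_v <= min(d_p + 1, d_w + 1)


def admissible_triples(k, max_d_w):
    """All (d_w, d_v, d_p) with d_p = k >= 1 and 0 <= d_w <= max_d_w
    that satisfy (2.12)."""
    return [(d_w, d_v, k)
            for d_v in range(max(k - 1, 0), k + 2)
            for d_w in range(max_d_w + 1)
            if satisfies_2_12(d_w, d_v, k)]
```

For instance, `admissible_triples(2, 3)` lists the triples with pressure degree 2 and vorticity degree at most 3; under the stated assumption the two predicates agree for all nonnegative degrees.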
Then if we set
\[
\boldsymbol{W}(K) = \boldsymbol{P}_{d_W}, \qquad \boldsymbol{V}(K) = \boldsymbol{P}_{d_V}, \qquad P(K) = P_{d_P},
\]
the conditions of Proposition 2.1 are satisfied. Some examples are
\[
(d_W, d_V, d_P) =
\begin{cases}
(k-1,\ k-1,\ k), & (k,\ k-1,\ k), \quad (k+1,\ k-1,\ k),\\
(k-1,\ k,\ k), & (k,\ k,\ k), \quad (k+1,\ k,\ k),\\
(k,\ k+1,\ k), & (k+1,\ k+1,\ k)
\end{cases}
\]
for some integer $k \ge 1$. Clearly there is greater flexibility in the choice of spaces than, for instance, in the choice of spaces for mixed methods for the Stokes problem; e.g., from (2.12) it is clear that we can choose $d_W$ to be as large as we wish and the method continues to be well defined. Having established that the HDG methods are well defined, we show in the next section that they can be hybridized in different ways according to the choice of variables that are globally coupled.

3. Hybridizations of the HDG methods. In this section, we will restrict ourselves to considering the Stokes problem with the Type II boundary conditions. We hybridize the HDG method for this case. As we shall see, while hybridizing we can choose to set HDG methods with the other types of boundary conditions within mesh elements. For constructing hybridized methods based on the vorticity-velocity formulation, let us recall the following four transmission conditions for the Stokes solution components:
\[
[\![ \boldsymbol{\omega} \times \boldsymbol{n} ]\!] \big|_{\mathcal{E}_h^o} = 0, \qquad
[\![ \boldsymbol{u} \times \boldsymbol{n} ]\!] \big|_{\mathcal{E}_h^o} = 0, \qquad
[\![ \boldsymbol{u} \cdot \boldsymbol{n} ]\!] \big|_{\mathcal{E}_h^o} = 0, \qquad
[\![ p\, \boldsymbol{n} ]\!] \big|_{\mathcal{E}_h^o} = 0. \tag{3.1}
\]
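Corresponding to these four transmission conditions, there are four variables on which boundary conditions of the following form can be prescribed:
\[
\boldsymbol{\omega}_t = \boldsymbol{\gamma}_t, \qquad \boldsymbol{u}_t = \boldsymbol{\lambda}_t, \qquad \boldsymbol{u}_n = \boldsymbol{\lambda}_n, \qquad p = \rho. \tag{3.2}
\]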
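The counting argument used in the next paragraph to discard two of the six possible pairs of unknowns can be tabulated. The sketch below assumes three space dimensions, so that each tangential trace carries two scalar components per face while the normal velocity and the pressure carry one each; the dictionary keys and function names are illustrative only.

```python
from itertools import combinations

# Scalar components per face in three dimensions (assumption of this sketch):
# tangential traces have two components, normal velocity and pressure have one.
COMPONENTS = {"omega_t": 2, "u_t": 2, "u_n": 1, "p": 1}


def classify(unknowns):
    """Compare the number of scalar unknowns with the number of scalar
    transmission conditions left for the variables that were not chosen."""
    n_unknowns = sum(COMPONENTS[v] for v in unknowns)
    n_equations = sum(c for v, c in COMPONENTS.items() if v not in unknowns)
    if n_unknowns > n_equations:
        return "underdetermined"
    if n_unknowns < n_equations:
        return "overdetermined"
    return "balanced"


def classify_all():
    """Classify each of the six possible pairs of interface unknowns."""
    return {pair: classify(pair) for pair in combinations(COMPONENTS, 2)}
```

The four balanced pairs returned by `classify_all()` correspond to the four types of boundary conditions in (2.7).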
With this correspondence in view, we can describe our approach for constructing hybridization techniques as follows. We pick any two of the variables in (3.2) as unknown boundary values on the boundary of each mesh element. (Once these values are known, the solution inside the element can be computed locally.) Then, we formulate a global system of equations for the chosen unknown variables, using the transmission conditions on the other two variables in (3.1). Of course, we must identify the proper discrete versions of these transmission conditions for this purpose. According to this strategy, there appear to be six possible cases. But two of the six cases yield underdetermined or overdetermined systems. For instance, if we pick $\boldsymbol{\gamma}_t$ and $\boldsymbol{\lambda}_t$ as unknowns, counting their components, we would have a total of four scalar unknown functions. However, the transmission conditions (the last two in (3.1)) form only two scalar equations and so will yield an underdetermined system. Similarly, if we pick $\boldsymbol{\lambda}_n$ and $\rho$ as the unknowns, we get an overdetermined system. We discard these two possibilities. In the remainder, we work out the specifics for the remaining four cases.

3.1. Hybridization of Type I. A formulation with tangential velocity and pressure. Here, we choose the second and the last of the variables in (3.2), namely $(\boldsymbol{u})_t$ and $p$, as the unknowns on the mesh interfaces. Their discrete approximations will be denoted by $\boldsymbol{\lambda}_t$ and $\rho$, respectively. We shall then use the transmission conditions on the other two variables, namely,
\[
[\![ \boldsymbol{\omega} \times \boldsymbol{n} ]\!] \big|_{\mathcal{E}_h^o} = 0 \qquad\text{and}\qquad [\![ \boldsymbol{u} \cdot \boldsymbol{n} ]\!] \big|_{\mathcal{E}_h^o} = 0, \tag{3.3}
\]
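to derive a hybridized formulation that will help us solve for the approximations $\boldsymbol{\lambda}_t$ and $\rho$. The success of this approach relies on us being able to compute approximate solutions within each element locally, once the discrete approximations $\boldsymbol{\lambda}_t \approx \boldsymbol{u}_t$ and $\rho \approx p$ are found. In other words, we need a discretization of the following Stokes problem on one element:
\begin{align*}
\boldsymbol{\omega}_K - \operatorname{curl} \boldsymbol{u}_K &= 0 && \text{in } K,\\
\operatorname{curl} \boldsymbol{\omega}_K + \operatorname{grad} p_K &= \boldsymbol{f} && \text{in } K,\\
\operatorname{div} \boldsymbol{u}_K &= 0 && \text{in } K,\\
(\boldsymbol{u}_K)_t &= \boldsymbol{\lambda}_t && \text{on } \partial K,\\
p_K &= \rho && \text{on } \partial K.
\end{align*}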
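We use the HDG method (with Type I boundary conditions) applied to a single element as our discretization. Specifically, given $(\boldsymbol{\lambda}_t, \rho, \boldsymbol{f})$ in $\boldsymbol{L}^2(\partial\Omega_h) \times L^2(\partial\Omega_h) \times \boldsymbol{L}^2(\Omega)$, we define $(\mathsf{W}, \mathsf{U}, \mathsf{P})$ in $\boldsymbol{W}_h \times \boldsymbol{V}_h \times P_h$ on the element $K \in \Omega_h$ as the function in $\boldsymbol{W}(K) \times \boldsymbol{V}(K) \times P(K)$ satisfying
\begin{align*}
(\mathsf{W}, \boldsymbol{\tau})_K - (\mathsf{U}, \operatorname{curl} \boldsymbol{\tau})_K &= -\langle \boldsymbol{\lambda}_t, \boldsymbol{n} \times \boldsymbol{\tau} \rangle_{\partial K}, \tag{3.4a}\\
(\mathsf{W}, \operatorname{curl} \boldsymbol{v})_K + \langle \widehat{\mathsf{W}}, \boldsymbol{v} \times \boldsymbol{n} \rangle_{\partial K} - (\mathsf{P}, \operatorname{div} \boldsymbol{v})_K &= (\boldsymbol{f}, \boldsymbol{v})_K - \langle \rho, \boldsymbol{v} \cdot \boldsymbol{n} \rangle_{\partial K}, \tag{3.4b}\\
-(\mathsf{U}, \operatorname{grad} q)_K + \langle \widehat{\mathsf{U}} \cdot \boldsymbol{n}, q \rangle_{\partial K} &= 0 \tag{3.4c}
\end{align*}
for all $(\boldsymbol{\tau}, \boldsymbol{v}, q) \in \boldsymbol{W}(K) \times \boldsymbol{V}(K) \times P(K)$, where
\begin{align*}
(\widehat{\mathsf{U}})_n &= (\mathsf{U})_n + \frac{1}{\tau_n}\, (\mathsf{P} - \rho)\, \boldsymbol{n}, \tag{3.4d}\\
\widehat{\mathsf{W}} &= \mathsf{W} + \tau_t\, (\mathsf{U} - \boldsymbol{\lambda}_t) \times \boldsymbol{n}. \tag{3.4e}
\end{align*}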
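Note that the above system (3.4) is obtained from the HDG system (2.2) with $\Omega$ set to $K$ and the numerical traces set by (2.8a) (and there are no interior faces). The above system of equations thus defines a linear map (the “local solver”)
\[
(\boldsymbol{\lambda}_t, \rho, \boldsymbol{f}) \ \overset{\mathsf{L}_{\mathrm{I}}}{\longmapsto}\ (\mathsf{W}, \mathsf{U}, \mathsf{P}) \tag{3.4f}
\]
due to the unique solvability of the HDG method on one element, as given by Proposition 2.1(1). Next, we identify conditions on $\boldsymbol{\lambda}_t$ and $\rho$ that make $(\mathsf{W}, \mathsf{U}, \mathsf{P})$ identical to the approximation $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$. We begin by restricting the function $(\boldsymbol{\lambda}_t, \rho)$ to the space $(\boldsymbol{M}_h)_t \times \Psi_h$, where
\begin{align*}
(\boldsymbol{M}_h)_t &:= \{\boldsymbol{\mu}_t \in \boldsymbol{L}^2(\mathcal{E}_h) : \boldsymbol{\mu}_t|_e \in \boldsymbol{M}(e)\ \ \forall\, e \in \mathcal{E}_h^o\}, \tag{3.5a}\\
\Psi_h &:= \{\psi \in L^2(\mathcal{E}_h) : \psi|_e \in \Psi(e)\ \ \forall\, e \in \mathcal{E}_h\}, \tag{3.5b}
\end{align*}
where, on each face $e \in \mathcal{E}_h$, the finite-dimensional spaces $\boldsymbol{M}(e)$ and $\Psi(e)$ are such that
\begin{align*}
\boldsymbol{M}(e) &\supseteq \{(\boldsymbol{v}_t + \boldsymbol{n} \times \boldsymbol{\tau})|_e : (\boldsymbol{\tau}, \boldsymbol{v}) \in \boldsymbol{W}(K) \times \boldsymbol{V}(K)\ \ \forall\, K : e \subset \partial K\}, \tag{3.5c}\\
\Psi(e) &\supseteq \{(q + \boldsymbol{v} \cdot \boldsymbol{n})|_e : (\boldsymbol{v}, q) \in \boldsymbol{V}(K) \times P(K)\ \ \forall\, K : e \subset \partial K\}. \tag{3.5d}
\end{align*}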
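The next theorem identifies discrete analogues of the transmission conditions (3.3) as the requirements for recovering the discrete solution.

Theorem 3.1 (conditions for Type I hybridization). Suppose $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ is the solution of the HDG method defined by (2.2), (2.5), and (2.6). Assume that $(\boldsymbol{\lambda}_t, \rho) \in (\boldsymbol{M}_h)_t \times \Psi_h$ is such that
\begin{align*}
\boldsymbol{\lambda}_t &= \boldsymbol{g}_t && \text{on } \partial\Omega, \tag{3.6a}\\
\langle [\![ \boldsymbol{n} \times \widehat{\mathsf{W}} ]\!], \boldsymbol{\mu}_t \rangle_{\mathcal{E}_h^o} &= 0 && \forall\, \boldsymbol{\mu}_t \in \boldsymbol{M}_h, \tag{3.6b}\\
\langle [\![ \widehat{\mathsf{U}} \cdot \boldsymbol{n} ]\!], \psi \rangle_{\mathcal{E}_h} &= \langle \boldsymbol{g} \cdot \boldsymbol{n}, \psi \rangle_{\partial\Omega} && \forall\, \psi \in \Psi_h, \tag{3.6c}\\
(\mathsf{P}, 1)_\Omega &= 0. \tag{3.6d}
\end{align*}
Then $(\mathsf{W}, \mathsf{U}, \mathsf{P}) = (\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$, $\boldsymbol{\lambda}_t = (\widehat{\boldsymbol{u}}_h)_t$, and $\rho = \widehat{p}_h$.

Proof. We begin by noting that $(\mathsf{W}, \mathsf{U}, \mathsf{P})$ is in the space $\boldsymbol{W}_h \times \boldsymbol{V}_h \times P_h$, by the definition of the local solvers. Moreover, by adding the equations defining the local solver, namely (3.4a)–(3.4c), we find that $(\mathsf{W}, \mathsf{U}, \mathsf{P})$ satisfies the equations of (2.2), with $(\widehat{\mathsf{W}})_t$ in place of $(\widehat{\boldsymbol{\omega}}_h)_t$, $\boldsymbol{\lambda}_t$ in place of $(\widehat{\boldsymbol{u}}_h)_t$, $(\widehat{\mathsf{U}})_n$ in place of $(\widehat{\boldsymbol{u}}_h)_n$, and $\rho$ in place of $\widehat{p}_h$. Hence, if we show that $(\widehat{\mathsf{W}})_t$, $\boldsymbol{\lambda}_t$, $(\widehat{\mathsf{U}})_n$, and $\rho$ can be related to $(\mathsf{W}, \mathsf{U}, \mathsf{P})$, as in the expressions for the numerical traces (2.5a)–(2.5d), then the proof will be complete because of the uniqueness result of Proposition 2.1(2) (which applies due to condition (3.6d)). Therefore, let us first derive such expressions for $\boldsymbol{\lambda}_t$ and $\rho$. By the choice of the space $\boldsymbol{M}_h \times \Psi_h$, the jump conditions (3.6b) and (3.6c) imply that
\[
[\![ \boldsymbol{n} \times \widehat{\mathsf{W}} ]\!] = 0 \qquad\text{and}\qquad [\![ \widehat{\mathsf{U}} \cdot \boldsymbol{n} ]\!] = 0 \qquad\text{on } \mathcal{E}_h^o.
\]
Inserting the definition of the numerical traces (3.4d) and (3.4e), we readily obtain that, on $\mathcal{E}_h^o$,
\begin{align*}
[\![ \boldsymbol{n} \times \mathsf{W} ]\!] + \tau_t^+ (\mathsf{U}^+)_t + \tau_t^- (\mathsf{U}^-)_t - (\tau_t^+ + \tau_t^-)\, \boldsymbol{\lambda}_t &= 0,\\
[\![ \mathsf{U} \cdot \boldsymbol{n} ]\!] + \frac{1}{\tau_n^+}\, \mathsf{P}^+ + \frac{1}{\tau_n^-}\, \mathsf{P}^- - \Big( \frac{1}{\tau_n^+} + \frac{1}{\tau_n^-} \Big)\, \rho &= 0,
\end{align*}
or, equivalently,
\begin{align*}
\boldsymbol{\lambda}_t &= \frac{\tau_t^+ (\mathsf{U}^+)_t + \tau_t^- (\mathsf{U}^-)_t}{\tau_t^- + \tau_t^+} + \frac{1}{\tau_t^- + \tau_t^+}\, [\![ \boldsymbol{n} \times \mathsf{W} ]\!],\\
\rho &= \frac{\tau_n^- \mathsf{P}^+ + \tau_n^+ \mathsf{P}^-}{\tau_n^- + \tau_n^+} + \frac{\tau_n^+ \tau_n^-}{\tau_n^- + \tau_n^+}\, [\![ \mathsf{U} \cdot \boldsymbol{n} ]\!].
\end{align*}
Substituting these expressions into (3.4d) and (3.4e), we obtain
\begin{align*}
(\widehat{\mathsf{W}})_t &= \frac{\tau_t^- (\mathsf{W}^+)_t + \tau_t^+ (\mathsf{W}^-)_t}{\tau_t^- + \tau_t^+} + \frac{\tau_t^+ \tau_t^-}{\tau_t^- + \tau_t^+}\, [\![ \mathsf{U} \times \boldsymbol{n} ]\!],\\
(\widehat{\mathsf{U}})_n &= \frac{\tau_n^+ (\mathsf{U}^+)_n + \tau_n^- (\mathsf{U}^-)_n}{\tau_n^- + \tau_n^+} + \frac{1}{\tau_n^- + \tau_n^+}\, [\![ \mathsf{P}\, \boldsymbol{n} ]\!].
\end{align*}
In other words, the numerical traces satisfy (2.5). The fact that they satisfy (2.6a) and (2.6b) follows from conditions (3.6a) and (3.6c), respectively. Finally, (2.6c) and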
(2.6d) follow directly from the definition of the numerical traces of the local solvers (3.4e) and (3.4d), respectively. Thus, by the uniqueness result of Proposition 2.1(2), we now conclude that $(\mathsf{W}, \mathsf{U}, \mathsf{P})$ coincides with $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$, and consequently, $\boldsymbol{\lambda}_t = (\widehat{\boldsymbol{u}}_h)_t$ and $\rho = \widehat{p}_h$. This completes the proof.

At this point, we can comment more on our strategy for construction of hybridized DG methods. Roughly speaking, the derivation of our hybridized methods proceeds by imposing discrete versions of all four transmission conditions in (3.1) through the four numerical traces of the HDG method. The two numerical traces we picked as unknowns in this case, namely $\boldsymbol{\lambda}_t$ and $\rho$, being single valued on $\mathcal{E}_h^o$, already satisfy a zero-jump transmission condition, so we have in some sense already imposed the second and the fourth of the conditions in (3.1). The discrete analogues of the remaining two (the first and the third) transmission conditions are (3.6b) and (3.6c), which require the remaining two numerical traces to be single valued. Theorem 3.1 shows that once these conditions are imposed, the HDG solution is recovered.

Next, we give a characterization of the unknown traces $\boldsymbol{\lambda}_t$ and $\rho$ and the discrete HDG solution $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ in terms of the local solvers. In particular, we show that the jump conditions (3.6b) and (3.6c) define a mixed method for the tangential velocity and the pressure. To state the result, we need to introduce some notation. Letting $\boldsymbol{\lambda}_t^o = \boldsymbol{\lambda}_t|_{\mathcal{E}_h^o}$, and remembering our identification of functions with their zero extension, we can write $\boldsymbol{\lambda}_t = \boldsymbol{\lambda}_t^o + \boldsymbol{g}_t$. We denote by $(\boldsymbol{M}_h^o)_t$ the functions of $(\boldsymbol{M}_h)_t$ which are zero on $\partial\Omega$ (so $\boldsymbol{\lambda}_t^o$ is in $(\boldsymbol{M}_h^o)_t$). Finally, we use the following notation for certain specific local solutions:
\begin{align*}
(\mathsf{W}_{\boldsymbol{\lambda}_t}, \mathsf{U}_{\boldsymbol{\lambda}_t}, \mathsf{P}_{\boldsymbol{\lambda}_t}) &:= \mathsf{L}_{\mathrm{I}}(\boldsymbol{\lambda}_t, 0, 0), \tag{3.7a}\\
(\mathsf{W}_{\rho}, \mathsf{U}_{\rho}, \mathsf{P}_{\rho}) &:= \mathsf{L}_{\mathrm{I}}(0, \rho, 0), \tag{3.7b}\\
(\mathsf{W}_{\boldsymbol{f}}, \mathsf{U}_{\boldsymbol{f}}, \mathsf{P}_{\boldsymbol{f}}) &:= \mathsf{L}_{\mathrm{I}}(0, 0, \boldsymbol{f}), \tag{3.7c}
\end{align*}
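where $\mathsf{L}_{\mathrm{I}}$ is as in (3.4f). We are now ready to state our main result for this case.

Theorem 3.2 (characterization of the approximate solution). We have that
\begin{align*}
\boldsymbol{\omega}_h &= \mathsf{W}_{\boldsymbol{\lambda}_t^o} + \mathsf{W}_{\rho} + \mathsf{W}_{\boldsymbol{f}} + \mathsf{W}_{\boldsymbol{g}_t},\\
\boldsymbol{u}_h &= \mathsf{U}_{\boldsymbol{\lambda}_t^o} + \mathsf{U}_{\rho} + \mathsf{U}_{\boldsymbol{f}} + \mathsf{U}_{\boldsymbol{g}_t},\\
p_h &= \mathsf{P}_{\boldsymbol{\lambda}_t^o} + \mathsf{P}_{\rho} + \mathsf{P}_{\boldsymbol{f}} + \mathsf{P}_{\boldsymbol{g}_t},
\end{align*}
where $(\boldsymbol{\lambda}_t^o, \rho)$ is the only element of $(\boldsymbol{M}_h^o)_t \times \Psi_h$ such that
\begin{align*}
a_h(\boldsymbol{\lambda}_t^o, \boldsymbol{\mu}_t) + b_h(\rho, \boldsymbol{\mu}_t) &= \ell_1(\boldsymbol{\mu}_t), \tag{3.8a}\\
-b_h(\psi, \boldsymbol{\lambda}_t^o) + c_h(\rho, \psi) &= \ell_2(\psi) \tag{3.8b}
\end{align*}
for all $(\boldsymbol{\mu}_t, \psi) \in (\boldsymbol{M}_h^o)_t \times \Psi_h$, and
\[
(\mathsf{P}_{\boldsymbol{\lambda}_t^o} + \mathsf{P}_{\rho} + \mathsf{P}_{\boldsymbol{f}} + \mathsf{P}_{\boldsymbol{g}_t}, 1)_\Omega = 0. \tag{3.8c}
\]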
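Here
\begin{align*}
a_h(\boldsymbol{\lambda}_t, \boldsymbol{\mu}_t) &:= (\mathsf{W}_{\boldsymbol{\lambda}_t}, \mathsf{W}_{\boldsymbol{\mu}_t})_{\Omega_h} + \langle \tau_t\, (\boldsymbol{\lambda}_t - \mathsf{U}_{\boldsymbol{\lambda}_t})_t, (\boldsymbol{\mu}_t - \mathsf{U}_{\boldsymbol{\mu}_t})_t \rangle_{\partial\Omega_h} + \Big\langle \frac{1}{\tau_n}\, \mathsf{P}_{\boldsymbol{\lambda}_t}, \mathsf{P}_{\boldsymbol{\mu}_t} \Big\rangle_{\partial\Omega_h},\\
b_h(\rho, \boldsymbol{\mu}_t) &:= \Big\langle \rho,\ \boldsymbol{n} \cdot \mathsf{U}_{\boldsymbol{\mu}_t} + \frac{1}{\tau_n}\, \mathsf{P}_{\boldsymbol{\mu}_t} \Big\rangle_{\partial\Omega_h},\\
c_h(\rho, \psi) &:= (\mathsf{W}_{\rho}, \mathsf{W}_{\psi})_{\Omega_h} + \langle \tau_t\, (\mathsf{U}_{\rho})_t, (\mathsf{U}_{\psi})_t \rangle_{\partial\Omega_h} + \Big\langle \frac{1}{\tau_n}\, (\rho - \mathsf{P}_{\rho}), (\psi - \mathsf{P}_{\psi}) \Big\rangle_{\partial\Omega_h},
\end{align*}
and
\begin{align*}
\ell_1(\boldsymbol{\mu}_t) &:= (\boldsymbol{f}, \mathsf{U}_{\boldsymbol{\mu}_t})_{\Omega_h} - a_h(\boldsymbol{g}, \boldsymbol{\mu}_t),\\
\ell_2(\psi) &:= -(\boldsymbol{f}, \mathsf{U}_{\psi})_{\Omega_h} - \langle \boldsymbol{g} \cdot \boldsymbol{n}, \psi \rangle_{\partial\Omega} + b_h(\psi, \boldsymbol{g}_t).
\end{align*}
The proof of this theorem is in section 4. In view of this theorem, we can obtain the HDG solution by first solving a symmetric global system that is smaller than (2.2) and then locally recovering all solution components (by applying $\mathsf{L}_{\mathrm{I}}$). This is the main advantage brought about by hybridization. It makes this HDG method competitive in comparison with other existing DG methods for Stokes flow. It is interesting to note that the spaces in which the trace variables lie, namely $(\boldsymbol{M}_h)_t$ and $\Psi_h$, can be arbitrarily large. While it is in the interest of efficiency to choose as small a space as possible (for a given accuracy), in mixed methods one also often requires spaces to be not too large for stability reasons. In the HDG method, stability is guaranteed through the penalty parameters $\tau_n$ and $\tau_t$. A consequence of this is that (3.8) is uniquely solvable, no matter how large $(\boldsymbol{M}_h)_t$ and $\Psi_h$ are. For the analogous hybridized mixed method of [8], we needed the trace spaces corresponding to $(\boldsymbol{M}_h)_t$ and $\Psi_h$ to be exactly equal to certain spaces of jumps, which created additional implementation issues such as construction of local basis functions for the spaces.

3.2. Hybridization of Type II. A formulation with velocity and means of pressure. Recalling our scheme for construction of hybridized methods described in the beginning of this section, we now consider the case when $\boldsymbol{u}_t$ and $\boldsymbol{u}_n$ (i.e., all components of $\boldsymbol{u}$) are chosen as the unknowns in the mesh interfaces. Correspondingly, we should use the transmission conditions on the other two variables, namely,
\[
[\![ \boldsymbol{\omega} \times \boldsymbol{n} ]\!] \big|_{\mathcal{E}_h^o} = 0 \qquad\text{and}\qquad [\![ p\, \boldsymbol{n} ]\!] \big|_{\mathcal{E}_h^o} = 0, \tag{3.9}
\]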
to derive a hybridized formulation. However, the success of this strategy relies on us being able to compute approximate Stokes solutions within each element locally, once a discrete approximation to $\boldsymbol{u}$, say $\boldsymbol{\lambda}$, is obtained on the boundary of every mesh element. Here we find a difficulty not encountered in the previous case, namely, that the HDG discretization (2.2) on one element with $\boldsymbol{\lambda}$ as boundary data (of Type II) is not solvable in general, unless
\[
\int_{\partial K} \boldsymbol{\lambda}_n \cdot \boldsymbol{n} = 0, \tag{3.10}
\]
as seen from Proposition 2.1(2). Thus we are led to modify our local solvers, which in turn necessitates the introduction of a new variable ($\rho$) approximating the means of pressures on the element boundaries, as we shall see now.
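The compatibility condition (3.10) can be illustrated on a single tetrahedron. The sketch below is not part of the method: it assumes boundary data given by the trace of an affine vector field, for which the face-centroid quadrature used here is exact, and the function names are hypothetical. A constant field satisfies (3.10), whereas the field $\boldsymbol{\lambda}(x) = x$ does not, since its boundary flux equals three times the volume of $K$.

```python
import numpy as np


def face_data(vertices):
    """Return areas, outward unit normals, and centroids of the four
    faces of the tetrahedron with the given 4 x 3 array of vertices."""
    verts = np.asarray(vertices, dtype=float)
    center = verts.mean(axis=0)
    areas, normals, centroids = [], [], []
    for i in range(4):
        face = np.delete(verts, i, axis=0)          # face opposite vertex i
        cross = np.cross(face[1] - face[0], face[2] - face[0])
        length = np.linalg.norm(cross)
        n = cross / length
        c = face.mean(axis=0)
        if np.dot(n, c - center) < 0:               # orient the normal outward
            n = -n
        areas.append(0.5 * length)
        normals.append(n)
        centroids.append(c)
    return np.array(areas), np.array(normals), np.array(centroids)


def boundary_flux(vertices, field):
    """Integral of field . n over the boundary of the tetrahedron,
    using the centroid rule on each face (exact for affine fields)."""
    areas, normals, centroids = face_data(vertices)
    return sum(a * np.dot(field(c), n)
               for a, n, c in zip(areas, normals, centroids))
```

With the reference tetrahedron `[(0,0,0), (1,0,0), (0,1,0), (0,0,1)]`, the flux of a constant field vanishes, while `boundary_flux(verts, lambda x: x)` equals 3 times the volume 1/6.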
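The new local solver, denoted by $\mathsf{L}_{\mathrm{II}}$, maps a given function $(\boldsymbol{\lambda}, \rho, \boldsymbol{f})$ in $\boldsymbol{L}^2(\partial\Omega_h) \times \ell^2(\partial\Omega_h) \times \boldsymbol{L}^2(\Omega)$ to a triple $(\mathsf{W}, \mathsf{U}, \mathsf{P}) \in \boldsymbol{W}_h \times \boldsymbol{V}_h \times P_h$ defined below. Here, $\ell^2(\partial\Omega_h)$ denotes the set of functions in $L^2(\partial\Omega_h)$ that are constant on each $\partial K$ for all mesh elements $K$. On any element $K \in \Omega_h$, the function $(\mathsf{W}, \mathsf{U}, \mathsf{P})$ restricted to $K$ is in $\boldsymbol{W}(K) \times \boldsymbol{V}(K) \times P(K)$ and satisfies
\begin{align*}
(\mathsf{W}, \boldsymbol{\tau})_K - (\mathsf{U}, \operatorname{curl} \boldsymbol{\tau})_K &= -\langle \boldsymbol{\lambda}, \boldsymbol{n} \times \boldsymbol{\tau} \rangle_{\partial K}, \tag{3.11a}\\
(\mathsf{W}, \operatorname{curl} \boldsymbol{v})_K + \langle \widehat{\mathsf{W}}, \boldsymbol{v} \times \boldsymbol{n} \rangle_{\partial K} - (\mathsf{P}, \operatorname{div} \boldsymbol{v})_K + \langle \widehat{\mathsf{P}}, \boldsymbol{v} \cdot \boldsymbol{n} \rangle_{\partial K} &= (\boldsymbol{f}, \boldsymbol{v})_K, \tag{3.11b}\\
-(\mathsf{U}, \operatorname{grad} q)_K &= -\langle \boldsymbol{\lambda} \cdot \boldsymbol{n}, q - \overline{q} \rangle_{\partial K}, \tag{3.11c}\\
\overline{\mathsf{P}} &= \rho, \tag{3.11d}
\end{align*}
where
\begin{align*}
\widehat{\mathsf{W}} &= \mathsf{W} + \tau_t\, (\mathsf{U} - \boldsymbol{\lambda}) \times \boldsymbol{n}, \tag{3.11e}\\
\widehat{\mathsf{P}} &= \mathsf{P} + \tau_n\, (\mathsf{U} - \boldsymbol{\lambda}) \cdot \boldsymbol{n}. \tag{3.11f}
\end{align*}
Here, we use the convention that for a given function $q$ (that may not be in $\ell^2(\partial\Omega_h)$), we understand $\overline{q}$ to mean the function in $\ell^2(\partial\Omega_h)$ satisfying
\[
\overline{q}\,\big|_{\partial K} = \frac{1}{|\partial K|} \int_{\partial K} q \, d\gamma. \tag{3.12}
\]
Obviously, for functions $\rho$ in $\ell^2(\partial\Omega_h)$, we have $\overline{\rho} = \rho$. Let $\boldsymbol{\lambda}_n^0$ be the function on $\partial\Omega_h$ defined by $\boldsymbol{\lambda}_n^0|_{\partial K} = \boldsymbol{\lambda}_n|_{\partial K} - \overline{\boldsymbol{\lambda} \cdot \boldsymbol{n}}\,|_{\partial K}\, \boldsymbol{n}$ for all mesh elements $K$. Then, we can rewrite the right-hand side of (3.11c) as $-\langle \boldsymbol{\lambda}_n^0, q\, \boldsymbol{n} \rangle_{\partial K}$. Hence, the system (3.11) minus (3.11d) is the same as the HDG system (2.2) applied to one element with the data $\boldsymbol{g}_t = \boldsymbol{\lambda}_t$ and $\boldsymbol{g}_n = \boldsymbol{\lambda}_n^0$. Consequently, by Proposition 2.1(2), the system has a solution, and moreover, the solution is unique once (3.11d) is added to the system. Thus, the map $\mathsf{L}_{\mathrm{II}}$ is well defined. Note that (3.11) is the HDG discretization of the exact Stokes problem
\begin{align*}
\boldsymbol{\omega}_K - \operatorname{curl} \boldsymbol{u}_K &= 0 && \text{in } K,\\
\operatorname{curl} \boldsymbol{\omega}_K + \operatorname{grad} p_K &= \boldsymbol{f} && \text{in } K,\\
\operatorname{div} \boldsymbol{u}_K &= 0 && \text{in } K,\\
\boldsymbol{u}_K &= \boldsymbol{\lambda}_t + \boldsymbol{\lambda}_n^0 && \text{on } \partial K,\\
\overline{p_K} &= \rho
\end{align*}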
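on a single element $K$. Next, we find conditions on $(\boldsymbol{\lambda}, \rho, \boldsymbol{f})$ that make $(\mathsf{W}, \mathsf{U}, \mathsf{P}) \equiv \mathsf{L}_{\mathrm{II}}(\boldsymbol{\lambda}, \rho, \boldsymbol{f})$ equal to $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$. First, we restrict $\boldsymbol{\lambda}$ to the space $\boldsymbol{M}_h$ defined by
\begin{align*}
\boldsymbol{M}_h &:= \{\boldsymbol{\mu} \in \boldsymbol{L}^2(\mathcal{E}_h) : \boldsymbol{\mu}|_e \in \boldsymbol{M}(e)\ \ \forall\, e \in \mathcal{E}_h^o\}, \tag{3.13a}\\
\overline{\Psi}_h &:= \ell^2(\partial\Omega_h), \tag{3.13b}
\end{align*}
where $\boldsymbol{M}(e)$ is a finite-dimensional space on the face $e \in \mathcal{E}_h$ such that
\[
\boldsymbol{M}(e) \supseteq \{(\boldsymbol{v} + \boldsymbol{n} \times \boldsymbol{\tau} + \boldsymbol{n}\, q)|_e : (\boldsymbol{\tau}, \boldsymbol{v}, q) \in \boldsymbol{W}(K) \times \boldsymbol{V}(K) \times P(K)\ \ \forall\, K : e \subset \partial K\}. \tag{3.13c}
\]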
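Then we have the following theorem, which identifies certain discrete analogues of (3.9) as sufficient conditions for the coincidence of the locally recovered solution with the HDG solution.

Theorem 3.3 (conditions for Type II hybridization). Suppose $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ is the solution of the HDG method defined by (2.2), (2.5), and (2.6). Assume that $(\boldsymbol{\lambda}, \rho) \in \boldsymbol{M}_h \times \overline{\Psi}_h$ is such that
\begin{align*}
\boldsymbol{\lambda} &= \boldsymbol{g} && \text{on } \partial\Omega, \tag{3.14a}\\
\langle [\![ \boldsymbol{n} \times \widehat{\mathsf{W}} ]\!], \boldsymbol{\mu}_t \rangle_{\mathcal{E}_h^o} &= 0 && \forall\, \boldsymbol{\mu} \in \boldsymbol{M}_h, \tag{3.14b}\\
\langle [\![ \widehat{\mathsf{P}}\, \boldsymbol{n} ]\!], \boldsymbol{\mu}_n \rangle_{\mathcal{E}_h^o} &= 0 && \forall\, \boldsymbol{\mu} \in \boldsymbol{M}_h, \tag{3.14c}\\
\langle \boldsymbol{\lambda} \cdot \boldsymbol{n}, \overline{q} \rangle_{\partial\Omega_h} &= 0 && \forall\, \overline{q} \in \overline{\Psi}_h, \tag{3.14d}\\
(\mathsf{P}, 1)_\Omega &= 0. \tag{3.14e}
\end{align*}
Then $(\mathsf{W}, \mathsf{U}, \mathsf{P}) = (\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$, $\boldsymbol{\lambda}_t = (\widehat{\boldsymbol{u}}_h)_t$, and $\boldsymbol{\lambda}_n = (\widehat{\boldsymbol{u}}_h)_n$.

Proof. We will show that $(\mathsf{W}, \mathsf{U}, \mathsf{P})$ and $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ satisfy the same set of equations. To do this, just as in the proof of Theorem 3.1, it suffices to show that the numerical traces $(\widehat{\mathsf{W}})_t$, $\boldsymbol{\lambda}_t$, $\boldsymbol{\lambda}_n$, and $\widehat{\mathsf{P}}$ can be related to $(\mathsf{W}, \mathsf{U}, \mathsf{P})$ through the expressions in (2.5).

We therefore derive expressions for $(\widehat{\mathsf{W}})_t$, $\boldsymbol{\lambda}_t$, $\boldsymbol{\lambda}_n$, and $\widehat{\mathsf{P}}$. By the choice of the space $\boldsymbol{M}_h$, the jump conditions (3.14b) and (3.14c) imply that
\[
[\![ \boldsymbol{n} \times \widehat{\mathsf{W}} ]\!] = 0 \qquad\text{and}\qquad [\![ \widehat{\mathsf{P}}\, \boldsymbol{n} ]\!] = 0 \qquad\text{on } \mathcal{E}_h^o.
\]
Inserting the definition of the numerical traces (3.11e) and (3.11f), we readily obtain that, on $\mathcal{E}_h^o$,
\begin{align*}
[\![ \boldsymbol{n} \times \mathsf{W} ]\!] + \tau_t^+ (\mathsf{U}^+)_t + \tau_t^- (\mathsf{U}^-)_t - (\tau_t^+ + \tau_t^-)\, \boldsymbol{\lambda}_t &= 0,\\
[\![ \mathsf{P}\, \boldsymbol{n} ]\!] + \tau_n^+ (\mathsf{U}^+)_n + \tau_n^- (\mathsf{U}^-)_n - (\tau_n^+ + \tau_n^-)\, \boldsymbol{\lambda}_n &= 0,
\end{align*}
or equivalently,
\begin{align*}
\boldsymbol{\lambda}_t &= \frac{\tau_t^+ (\mathsf{U}^+)_t + \tau_t^- (\mathsf{U}^-)_t}{\tau_t^- + \tau_t^+} + \frac{1}{\tau_t^- + \tau_t^+}\, [\![ \boldsymbol{n} \times \mathsf{W} ]\!],\\
\boldsymbol{\lambda}_n &= \frac{\tau_n^+ (\mathsf{U}^+)_n + \tau_n^- (\mathsf{U}^-)_n}{\tau_n^- + \tau_n^+} + \frac{1}{\tau_n^- + \tau_n^+}\, [\![ \mathsf{P}\, \boldsymbol{n} ]\!].
\end{align*}
Hence,
\begin{align*}
(\widehat{\mathsf{W}})_t &= \frac{\tau_t^- (\mathsf{W}^+)_t + \tau_t^+ (\mathsf{W}^-)_t}{\tau_t^- + \tau_t^+} + \frac{\tau_t^+ \tau_t^-}{\tau_t^- + \tau_t^+}\, [\![ \mathsf{U} \times \boldsymbol{n} ]\!],\\
\widehat{\mathsf{P}} &= \frac{\tau_n^- \mathsf{P}^+ + \tau_n^+ \mathsf{P}^-}{\tau_n^- + \tau_n^+} + \frac{\tau_n^+ \tau_n^-}{\tau_n^- + \tau_n^+}\, [\![ \mathsf{U} \cdot \boldsymbol{n} ]\!].
\end{align*}
In other words, the numerical traces satisfy (2.5a), (2.5b), (2.5c), and (2.5d). The fact that they also satisfy (2.6) follows from conditions (3.14a) and (3.14c) and the definition of the local solvers. Consequently, by Proposition 2.1(2), we conclude that the difference between $(\mathsf{W}, \mathsf{U}, \mathsf{P})$ and $(\boldsymbol{\omega}_h, \boldsymbol{u}_h, p_h)$ is $(0, 0, \kappa)$ for some constant function $\kappa$. Equation (3.14e) then completes the proof.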
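The algebra in this proof can be spot-checked numerically on a single interior face. The following sketch uses random face values and illustrative names; it assumes pointwise (not polynomial) data on a face with normals n⁺ = −n⁻. It builds λ from the two formulas for λ_t and λ_n derived above, forms the traces (3.11e) and (3.11f) from both sides, and compares them with (2.5a) and (2.5d).

```python
import numpy as np


def tangential(v, n):
    """Tangential part v_t = n x (v x n), as in (2.3a)."""
    return np.cross(n, np.cross(v, n))


def normal(v, n):
    """Normal part v_n = n (v . n), as in (2.3b)."""
    return n * np.dot(v, n)


def check_type2_traces(seed=0):
    """Return booleans reporting whether the traces are single valued
    and whether they agree with (2.5a) and (2.5d) for random data."""
    rng = np.random.default_rng(seed)
    n_p = rng.normal(size=3)
    n_p /= np.linalg.norm(n_p)
    n_m = -n_p                                        # opposite normal on K^-
    W_p, W_m, U_p, U_m = rng.normal(size=(4, 3))      # face values from K^+ and K^-
    P_p, P_m = rng.normal(size=2)
    tt_p, tt_m, tn_p, tn_m = rng.uniform(0.5, 2.0, size=4)  # positive tau_t, tau_n

    jump_nxW = np.cross(n_p, W_p) + np.cross(n_m, W_m)
    jump_Pn = P_p * n_p + P_m * n_m
    lam_t = (tt_p * tangential(U_p, n_p) + tt_m * tangential(U_m, n_m)
             + jump_nxW) / (tt_p + tt_m)
    lam_n = (tn_p * normal(U_p, n_p) + tn_m * normal(U_m, n_m)
             + jump_Pn) / (tn_p + tn_m)
    lam = lam_t + lam_n

    # Numerical traces (3.11e) and (3.11f), evaluated from each side.
    What_p = W_p + tt_p * np.cross(U_p - lam, n_p)
    What_m = W_m + tt_m * np.cross(U_m - lam, n_m)
    Phat_p = P_p + tn_p * np.dot(U_p - lam, n_p)
    Phat_m = P_m + tn_m * np.dot(U_m - lam, n_m)

    # Right-hand sides of (2.5a) and (2.5d).
    jump_Uxn = np.cross(U_p, n_p) + np.cross(U_m, n_m)
    jump_Un = np.dot(U_p, n_p) + np.dot(U_m, n_m)
    What_t_ref = ((tt_m * tangential(W_p, n_p) + tt_p * tangential(W_m, n_m))
                  + tt_p * tt_m * jump_Uxn) / (tt_p + tt_m)
    Phat_ref = (tn_m * P_p + tn_p * P_m + tn_p * tn_m * jump_Un) / (tn_p + tn_m)

    return {
        "W_hat_single_valued": bool(np.allclose(
            np.cross(n_p, What_p) + np.cross(n_m, What_m), 0.0)),
        "P_hat_single_valued": bool(np.isclose(Phat_p, Phat_m)),
        "matches_2_5a": bool(np.allclose(tangential(What_p, n_p), What_t_ref)),
        "matches_2_5d": bool(np.isclose(Phat_p, Phat_ref)),
    }
```

Calling `check_type2_traces()` with different seeds gives a quick sanity check of the signs and weights in the formulas; it is not a substitute for the proof.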
HYBRIDIZABLE DG METHODS FOR STOKES FLOW
Next, we show that the jump conditions (3.14b) and (3.14c) define a mixed method for the velocity traces and pressure averages on element boundaries. We denote by $M_h^o$ the set of functions in $M_h$ that vanish on $\partial\Omega$ and split $\lambda = \lambda^o + g$ with $\lambda^o$ in $M_h^o$. In analogy with (3.7) of the Type I hybridization, we now define the specific local solutions for this case by

\begin{align}
(W_\lambda, U_\lambda, P_\lambda) &:= L^{\mathrm{II}}(\lambda, 0, 0), \tag{3.15a}\\
(W_\rho, U_\rho, P_\rho) &:= L^{\mathrm{II}}(0, \rho, 0), \tag{3.15b}\\
(W_f, U_f, P_f) &:= L^{\mathrm{II}}(0, 0, f), \tag{3.15c}
\end{align}

but note that by Proposition 2.1(2),

\[ (W_\rho, U_\rho, P_\rho) = (0, 0, \rho). \tag{3.16} \]
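Our main result for the Type II hybridization is the following theorem.

Theorem 3.4 (characterization of the approximate solution). We have that

\begin{align*}
\omega_h &= W_{\lambda^o} + W_f + W_g,\\
u_h &= U_{\lambda^o} + U_f + U_g,\\
p_h &= P_{\lambda^o} + P_f + P_g + P_\rho,
\end{align*}

where $(\lambda^o, \rho)$ is the only element of $M_h^o \times \Psi_h$ such that

\begin{align*}
a_h(\lambda^o, \mu) + b_h(\rho, \mu) &= \ell(\mu),\\
-b_h(\psi, \lambda^o) &= 0
\end{align*}

for all $(\mu, \psi) \in M_h^o \times \Psi_h$, and $(P_{\lambda^o} + P_\rho + P_f + P_g, 1)_\Omega = 0$. Here

\begin{align*}
a_h(\lambda, \mu) &= (W_\lambda, W_\mu)_{\Omega_h} + \langle \tau_t (\lambda - U_\lambda)_t, (\mu - U_\mu)_t \rangle_{\partial\Omega_h} + \langle \tau_n (\lambda - U_\lambda)_n, (\mu - U_\mu)_n \rangle_{\partial\Omega_h},\\
b_h(\rho, \mu) &= -\langle \rho, \mu \cdot n \rangle_{\partial\Omega_h},\\
\ell(\mu) &= (f, U_\mu)_{\Omega_h} - a_h(g, \mu).
\end{align*}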
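To make the structure of the Type II trace system concrete, the following sketch writes it in matrix form. It assumes hypothetical bases $\{\mu_j\}$ of $M_h^o$ and $\{\psi_k\}$ of $\Psi_h$ and coefficient vectors $x$, $y$; these names, and the matrices $A$, $B$, $F$, are illustrative and are not introduced in the paper.

```latex
% Assumed (illustrative) expansions:
%   \lambda^o = \sum_j x_j \mu_j,   \rho = \sum_k y_k \psi_k,
%   A_{ij} := a_h(\mu_j, \mu_i),  B_{ik} := b_h(\psi_k, \mu_i),  F_i := \ell(\mu_i).
\[
\begin{pmatrix} A & B \\ -B^{\mathsf T} & 0 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} F \\ 0 \end{pmatrix},
\qquad
(P_{\lambda^o} + P_\rho + P_f + P_g,\, 1)_\Omega = 0 .
\]
```

The second block row uses $b_h(\psi_k, \lambda^o) = \sum_j B_{jk} x_j = (B^{\mathsf T} x)_k$. The expression for $a_h$ is symmetric in its two arguments, so $A$ is symmetric. The mean-value condition enters as one additional scalar constraint.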
A proof can be found in section 4.

For appropriate choice of polynomial spaces, as in the previous case, to satisfy the conditions of Proposition 2.1, we choose the degrees $d_P$, $d_V$, and $d_W$ to be integers obeying (2.12). Then $M_h$ is fixed once we pick any $M(e)$ satisfying (3.13c), e.g., $M(e) = P_{\max(d_V, d_W, d_P)}(e)$.

3.3. Hybridization of Type III. A formulation with tangential vorticity, normal velocity, and pressure means. Next we hybridize the HDG methods by making another choice of two variables in (3.2), namely $\omega_t$ and $u_n$, as the unknowns on the mesh interfaces. Their discrete approximations will be denoted by $\gamma_t$ and $\lambda_n$, respectively. When we try to formulate a system for these unknowns using the transmission conditions on the other two variables, namely,

\[ [\![ u \times n ]\!]\big|_{\mathcal{E}_h^o} = 0 \quad\text{and}\quad [\![ p\, n ]\!]\big|_{\mathcal{E}_h^o} = 0, \tag{3.17} \]
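we again face the same difficulty we faced in the Type II case. Consequently, as we shall see, we must introduce a new variable $\rho$ approximating the averages of pressure on element boundaries, just as in the Type II case.

To hybridize the HDG method, we begin as in the previous cases by introducing discrete local solutions. These will be obtained using the HDG discretization of the Stokes problem

\begin{align*}
\omega_K - \operatorname{curl} u_K &= 0 && \text{in } K,\\
\operatorname{curl} \omega_K + \operatorname{grad} p_K &= f && \text{in } K,\\
\operatorname{div} u_K &= 0 && \text{in } K,\\
(\omega_K)_t &= \gamma_t && \text{on } \partial K,\\
(u_K)_n &= \lambda_n && \text{on } \partial K,\\
\overline{p_K} &= \rho
\end{align*}

on a single element $K$. Given the function $(\gamma_t, \lambda_n, \rho, f)$ in $L^2(\partial\Omega_h) \times L^2(\partial\Omega_h) \times \ell^2(\partial\Omega_h) \times L^2(\Omega)$, we define $(W, U, P)$ in $W_h \times V_h \times P_h$ on the element $K \in \Omega_h$ as the function in $W(K) \times V(K) \times P(K)$ such that

\begin{align}
(W, \tau)_K - (U, \operatorname{curl} \tau)_K + \langle \widehat{U}, n \times \tau \rangle_{\partial K} &= 0, \tag{3.18a}\\
(W, \operatorname{curl} v)_K - (P, \operatorname{div} v)_K + \langle \widehat{P}, v \cdot n \rangle_{\partial K} &= (f, v)_K - \langle \gamma_t, v \times n \rangle_{\partial K}, \tag{3.18b}\\
-(U, \operatorname{grad} q)_K &= -\langle \lambda_n \cdot n, q - \overline{q} \rangle_{\partial K}, \tag{3.18c}\\
\overline{P} &= \rho, \tag{3.18d}
\end{align}

where

\begin{align}
(\widehat{U})_t &= (U)_t + \frac{1}{\tau_t}\, n \times (W - \gamma_t), \tag{3.18e}\\
\widehat{P} &= P + \tau_n (U - \lambda_n) \cdot n. \tag{3.18f}
\end{align}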
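By Proposition 2.1(3), there is a unique solution to (3.18) on each mesh element $K$. In other words, the local solver $L^{\mathrm{III}}(\gamma_t, \lambda_n, \rho, f) := (W, U, P)$ is well defined. As in the previous cases, we now proceed to identify the discrete analogues of (3.17) that make $L^{\mathrm{III}}(\gamma_t, \lambda_n, \rho, f)$ identical to $(\omega_h, u_h, p_h)$. This will yield a mixed method for $(\gamma_t, \lambda_n, \rho, f)$. To do this, we begin by restricting the function $(\gamma_t, \lambda_n, \rho)$ to the space $(G_h)_t \times (M_h)_n \times \Psi_h$, where

\begin{align}
(G_h)_t &:= \{ \delta_t \in L^2(\mathcal{E}_h) : \ \delta_t|_e \in G(e) \quad \forall\, e \in \mathcal{E}_h \}, \tag{3.19a}\\
(M_h)_n &:= \{ \mu_n \in L^2(\mathcal{E}_h) : \ \mu_n|_e \in M(e) \quad \forall\, e \in \mathcal{E}_h^o \}, \tag{3.19b}\\
\Psi_h &:= \{ \psi \in L^2(\partial\Omega_h) : \ \psi|_{\partial K} \in \mathbb{R} \quad \forall\, K \in \Omega_h \} \equiv \ell^2(\partial\Omega_h), \tag{3.19c}
\end{align}

where $G(e)$ and $M(e)$ for each face $e \in \mathcal{E}_h$ are finite-dimensional spaces satisfying

\begin{align}
G(e) &\supseteq \{ (v_t + n \times \tau)|_e : (\tau, v) \in W(K) \times U(K) \ \ \forall\, K : e \subset \partial K \}, \tag{3.19d}\\
M(e) &\supseteq \{ (v_n + n\, q)|_e : (v, q) \in U(K) \times P(K) \ \ \forall\, K : e \subset \partial K \}. \tag{3.19e}
\end{align}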
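Theorem 3.5 (conditions for Type III hybridization). Suppose $(\omega_h, u_h, p_h)$ is the solution of the HDG method defined by (2.2), (2.5), and (2.6). Assume that $(\gamma_t, \lambda_n, \rho) \in (G_h)_t \times (M_h)_n \times \Psi_h$ is such that

\begin{align}
\lambda_n &= g_n && \text{on } \partial\Omega, \tag{3.20a}\\
\langle [\![ \widehat{U} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} &= \langle g \times n, \delta_t \rangle_{\partial\Omega} && \forall\, \delta_t \in (G_h)_t, \tag{3.20b}\\
\langle [\![ \widehat{P}\, n ]\!], \mu_n \rangle_{\mathcal{E}_h^o} &= 0 && \forall\, \mu_n \in (M_h)_n, \tag{3.20c}\\
\langle \lambda_n \cdot n, q \rangle_{\partial\Omega_h} &= 0 && \forall\, q \in \Psi_h, \tag{3.20d}\\
(P, 1)_\Omega &= 0. \tag{3.20e}
\end{align}

Then $(W, U, P) = (\omega_h, u_h, p_h)$, $\lambda_n = (\widehat{u}_h)_n$, and $\gamma_t = (\widehat{\omega}_h)_t$.

Proof. We begin by noting that $(W, U, P)$ is in the space $W_h \times V_h \times P_h$. Moreover, $(W, U, P)$ satisfies the weak formulation (2.2) by the definition of the local solver (3.18). Next, we note that, by the choice of the space $(G_h)_t \times (M_h)_n$, the jump conditions (3.20b) and (3.20c) imply that

\[ [\![ \widehat{U} \times n ]\!] = 0 \quad\text{and}\quad [\![ \widehat{P}\, n ]\!] = 0 \quad\text{on } \mathcal{E}_h^o. \]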
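Inserting the definition of the numerical traces (3.18e) and (3.18f), we readily obtain that, on $\mathcal{E}_h^o$,

\begin{align*}
[\![ U \times n ]\!] + \frac{1}{\tau_t^{+}} (W^{+})_t + \frac{1}{\tau_t^{-}} (W^{-})_t - \Bigl( \frac{1}{\tau_t^{+}} + \frac{1}{\tau_t^{-}} \Bigr) \gamma_t &= 0,\\
[\![ P\, n ]\!] + \tau_n^{+} (U^{+})_n + \tau_n^{-} (U^{-})_n - (\tau_n^{+} + \tau_n^{-})\, \lambda_n &= 0,
\end{align*}

or, equivalently,

\begin{align*}
\gamma_t &= \frac{\tau_t^{-} (W^{+})_t + \tau_t^{+} (W^{-})_t}{\tau_t^{-} + \tau_t^{+}} + \frac{\tau_t^{+} \tau_t^{-}}{\tau_t^{-} + \tau_t^{+}}\, [\![ U \times n ]\!],\\
\lambda_n &= \frac{\tau_n^{+} (U^{+})_n + \tau_n^{-} (U^{-})_n}{\tau_n^{-} + \tau_n^{+}} + \frac{1}{\tau_n^{-} + \tau_n^{+}}\, [\![ P\, n ]\!].
\end{align*}

Hence,

\begin{align*}
(\widehat{U})_t &= \frac{\tau_t^{+} (U^{+})_t + \tau_t^{-} (U^{-})_t}{\tau_t^{-} + \tau_t^{+}} + \frac{1}{\tau_t^{-} + \tau_t^{+}}\, [\![ n \times W ]\!],\\
\widehat{P} &= \frac{\tau_n^{-} P^{+} + \tau_n^{+} P^{-}}{\tau_n^{-} + \tau_n^{+}} + \frac{\tau_n^{+} \tau_n^{-}}{\tau_n^{-} + \tau_n^{+}}\, [\![ U \cdot n ]\!].
\end{align*}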
In other words, the numerical traces satisfy (2.5a), (2.5b), (2.5c), and (2.5d). The fact that they also satisfy (2.6) follows from conditions (3.20a) and (3.20c). They also satisfy (2.6c) and (2.6d) by definition of the local solvers. By the uniqueness result of Proposition 2.1(2), we can now conclude that the approximation $(W, U, P)$ coincides with $(\omega_h, u_h, p_h)$. Moreover, we also have $\gamma_t = (\widehat{\omega}_h)_t$ and $\lambda_n = (\widehat{u}_h)_n$. This completes the proof.

We now proceed to formulate a mixed method for the numerical traces. Define specific local solutions by

\begin{align*}
(W_{\gamma_t}, U_{\gamma_t}, P_{\gamma_t}) &:= L^{\mathrm{III}}(\gamma_t, 0, 0, 0), & (W_{\lambda_n}, U_{\lambda_n}, P_{\lambda_n}) &:= L^{\mathrm{III}}(0, \lambda_n, 0, 0),\\
(W_\rho, U_\rho, P_\rho) &:= L^{\mathrm{III}}(0, 0, \rho, 0), & (W_f, U_f, P_f) &:= L^{\mathrm{III}}(0, 0, 0, f),
\end{align*}
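and observe that by Proposition 2.1(2), $(W_\rho, U_\rho, P_\rho) = (0, 0, \rho)$. We additionally denote by $(M_h^o)_n$ the functions of $(M_h)_n$ which are zero on $\partial\Omega$, and we write $\lambda_n$ as the sum of $\lambda_n^o$ and $g_n$, where $\lambda_n^o$ is in $(M_h^o)_n$. We are now ready to state our main result.

Theorem 3.6 (characterization of the approximate solution). We have that

\begin{align*}
\omega_h &= W_{\gamma_t} + W_{\lambda_n^o} + W_f + W_{g_n},\\
u_h &= U_{\gamma_t} + U_{\lambda_n^o} + U_f + U_{g_n},\\
p_h &= P_{\gamma_t} + P_{\lambda_n^o} + P_f + P_{g_n} + P_\rho,
\end{align*}

where $(\gamma_t, \lambda_n^o, \rho)$ is the only element of $(G_h)_t \times (M_h^o)_n \times \Psi_h$ such that

\begin{align*}
a_h(\gamma_t, \delta_t) + b_h(\lambda_n^o, \delta_t) &= \ell_1(\delta_t),\\
-b_h(\mu_n, \gamma_t) + c_h(\lambda_n^o, \mu_n) + d_h(\rho, \mu_n) &= \ell_2(\mu_n),\\
-d_h(q, \lambda_n^o) &= 0
\end{align*}

for all $(\delta_t, \mu_n, q) \in (G_h)_t \times (M_h^o)_n \times \Psi_h$, and $(P_{\gamma_t} + P_{\lambda_n^o} + P_\rho + P_f + P_{g_n}, 1)_\Omega = 0$. Here

\begin{align*}
a_h(\gamma_t, \delta_t) &:= (W_{\gamma_t}, W_{\delta_t})_{\Omega_h} + \Bigl\langle \frac{1}{\tau_t}\, n \times (\gamma_t - W_{\gamma_t}), n \times (\delta_t - W_{\delta_t}) \Bigr\rangle_{\partial\Omega_h} + \langle \tau_n (U_{\gamma_t})_n, (U_{\delta_t})_n \rangle_{\partial\Omega_h},\\
b_h(\lambda_n, \delta_t) &:= \langle \lambda_n, P_{\delta_t} + \tau_n (U_{\delta_t})_n \rangle_{\partial\Omega_h},\\
c_h(\lambda_n, \mu_n) &:= (W_{\lambda_n}, W_{\mu_n})_{\Omega_h} + \Bigl\langle \frac{1}{\tau_t}\, n \times W_{\mu_n}, n \times W_{\lambda_n} \Bigr\rangle_{\partial\Omega_h} + \langle \tau_n (\mu_n - U_{\mu_n})_n, (\lambda_n - U_{\lambda_n})_n \rangle_{\partial\Omega_h},\\
d_h(\rho, \mu_n) &:= -\langle \rho, \mu_n \cdot n \rangle_{\partial\Omega_h},
\end{align*}

and

\begin{align*}
\ell_1(\delta_t) &:= -(f, U_{\delta_t})_{\Omega_h} - b_h(g_n, \delta_t) - \langle g_t \times n, \delta_t \rangle_{\partial\Omega},\\
\ell_2(\mu_n) &:= (f, U_{\mu_n})_{\Omega_h} - c_h(g_n, \mu_n).
\end{align*}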
3.4. Hybridization of Type IV. A formulation with tangential vorticity, pressure, and harmonic velocity potentials. There is now only one remaining choice of two variables from (3.2), namely $\omega_t$ and $p$, that we have not yet investigated. This is the Type IV case. This case presents additional complications not found in the previous three cases. The complications are rooted in the same reason for which we did not consider "Type IV boundary conditions" in section 2. To explain the difficulty, suppose we are given an approximation $(\gamma_t, \rho)$ to $(\omega_t, p)$ on $\partial\Omega_h$. To obtain an approximate solution inside the mesh elements, let us try to define a local solution $(W, U, P)$ generated by data $(\gamma_t, \rho, f)$ in $L^2(\partial\Omega_h) \times L^2(\partial\Omega_h) \times L^2(\Omega)$. For this, we would like to use the HDG method applied to one element $K$, with
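boundary conditions on tangential vorticity and pressure (which would be discrete versions of boundary conditions $\omega_t = \gamma_t$ and $p = \rho$ on $\partial K$). Thus we are led to take $(W, U, P)$ as the function in $W(K) \times V(K) \times P(K)$ such that

\begin{align*}
(W, \tau)_K - (U, \operatorname{curl} \tau)_K + \langle \widehat{U}, n \times \tau \rangle_{\partial K} &= 0,\\
(W, \operatorname{curl} v)_K - (P, \operatorname{div} v)_K &= (f, v)_K - \langle n \times \gamma_t + \rho\, n, v \rangle_{\partial K},\\
-(U, \operatorname{grad} q)_K + \langle \widehat{U} \cdot n, q \rangle_{\partial K} &= 0,
\end{align*}

with $(\widehat{U})_t = (U)_t + \tau_t^{-1}\, n \times (W - \gamma_t)$ and $(\widehat{U})_n = (U)_n + \tau_n^{-1} (P - \rho)\, n$. Unfortunately this problem is not solvable in general, which is the same reason we omitted this type of boundary condition in Proposition 2.1. Nonetheless, upon reviewing the proof of Proposition 2.1 in the case of one element, we find that the null space of the above system is of the form $(W, U, P) = (0, \operatorname{grad} \phi, 0)$, where $\phi$ is in the following local space of harmonic velocity potentials:

\[ \Phi(K) = \{ \xi : \operatorname{grad} \xi \in V(K), \ \Delta \xi = 0, \ \text{and} \ (\xi, 1)_K = 0 \}. \]

Hence we can recover unique solvability if the velocity is kept orthogonal to $\Phi(K)$. Keeping this in mind, we are motivated to reformulate the local problems to give a consistent system of equations as follows. Denote the $L^2$-projection of $v \in V(K)$ into $\operatorname{grad} \Phi(K)$ by $\operatorname{grad} \phi_v$. Given the function $(\gamma_t, \rho, \phi, f)$ in $L^2(\partial\Omega_h) \times L^2(\partial\Omega_h) \times H^1(\Omega_h) \times L^2(\Omega)$, we define $(W, U, P)$ in $W_h \times V_h \times P_h$ on the element $K \in \Omega_h$ as the function in $W(K) \times V(K) \times P(K)$ such that

\begin{align}
(W, \tau)_K - (U, \operatorname{curl} \tau)_K + \langle \widehat{U}, n \times \tau \rangle_{\partial K} &= 0, \tag{3.21a}\\
(W, \operatorname{curl} v)_K - (P, \operatorname{div} v)_K &= (f, v - \operatorname{grad} \phi_v)_K - \langle n \times \gamma_t + \rho\, n, v - \operatorname{grad} \phi_v \rangle_{\partial K}, \tag{3.21b}\\
-(U, \operatorname{grad} q)_K + \langle \widehat{U} \cdot n, q \rangle_{\partial K} &= 0, \tag{3.21c}\\
(U, \operatorname{grad} \xi)_K &= (\operatorname{grad} \phi, \operatorname{grad} \xi)_K, \tag{3.21d}
\end{align}

where

\begin{align}
(\widehat{U})_t &= (U)_t + \frac{1}{\tau_t}\, n \times (W - \gamma_t), \tag{3.21e}\\
(\widehat{U})_n &= (U)_n + \frac{1}{\tau_n} (P - \rho)\, n. \tag{3.21f}
\end{align}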
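A minor modification of the arguments in Proposition 2.1 shows unique solvability of (3.21); hence we can define a fourth local solver $L^{\mathrm{IV}} : L^2(\partial\Omega_h) \times L^2(\partial\Omega_h) \times H^1(\Omega_h) \times L^2(\Omega) \to W(K) \times V(K) \times P(K)$ that takes $(\gamma_t, \rho, \phi, f)$ to $(W, U, P)$. Note that (3.21) is a discretization of the exact Stokes problem

\begin{align*}
\omega_K - \operatorname{curl} u_K &= 0 && \text{in } K,\\
\operatorname{curl} \omega_K + \operatorname{grad} p_K &= f && \text{in } K,\\
\operatorname{div} u_K &= 0 && \text{in } K,\\
(\omega_K)_t &= \gamma_t && \text{on } \partial K,\\
p_K &= \rho && \text{on } \partial K,
\end{align*}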
with the additional condition that the velocity field $u_K$ is $L^2$-orthogonal to all gradients of harmonic functions, which is necessary for well-posedness. Although we could have considered a global "Type IV boundary conditions" case in Proposition 2.1 through the addition of an equation like (3.21d), it does not appear to be very useful, because we do not know the data needed for the right-hand side. However, we can use Type IV boundary conditions locally to hybridize a global problem with Type II boundary conditions because we already have global solvability for the Type II boundary conditions case. We need only ensure that the local problems are solvable, and the reformulation of the local solvers with (3.21d) guarantees it.

Now, we proceed as in the previous cases to identify conditions on $\gamma_t$, $\rho$, and $\phi$ in such a way that $(W, U, P)$ is identical to $(\omega_h, u_h, p_h)$. We begin by restricting the function $(\gamma_t, \rho, \phi)$ to the space $(G_h)_t \times \Psi_h \times \Phi_h$, where

\begin{align}
(G_h)_t &:= \{ \delta_t \in L^2(\mathcal{E}_h) : \ \delta_t|_e \in G(e) \quad \forall\, e \in \mathcal{E}_h^o \}, \tag{3.22a}\\
\Psi_h &:= \{ \psi \in L^2(\mathcal{E}_h) : \ \psi|_e \in \Psi(e) \quad \forall\, e \in \mathcal{E}_h \}, \tag{3.22b}\\
\Phi_h &:= \{ \xi \in H^1(\Omega_h) : \ \xi|_K \in \Phi(K) \quad \forall\, K \in \Omega_h \}, \tag{3.22c}
\end{align}

where, on each face $e \in \mathcal{E}_h$, we have finite-dimensional spaces $G(e)$ and $\Psi(e)$ satisfying

\begin{align}
G(e) &\supseteq \{ (v_t + n \times \tau)|_e : (\tau, v) \in W(K) \times U(K) \ \ \forall\, K : e \subset \partial K \}, \tag{3.22d}\\
\Psi(e) &\supseteq \{ (q + v \cdot n)|_e : (v, q) \in U(K) \times P(K) \ \ \forall\, K : e \subset \partial K \}. \tag{3.22e}
\end{align}
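The next theorem identifies the discrete analogues of the transmission conditions

\[ [\![ u \times n ]\!]\big|_{\mathcal{E}_h^o} = 0, \qquad [\![ u \cdot n ]\!]\big|_{\mathcal{E}_h^o} = 0 \]

that recover the original solution. An additional condition also appears because of our reformulation of the local solvers.

Theorem 3.7 (conditions for Type IV hybridization). Suppose $(\omega_h, u_h, p_h)$ is the solution of the HDG method defined by (2.2), (2.5), and (2.6). Assume that $(\gamma_t, \rho, \phi) \in (G_h)_t \times \Psi_h \times \Phi_h$ is such that

\begin{align}
\langle [\![ \widehat{U} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} &= \langle g \times n, \delta_t \rangle_{\partial\Omega} && \forall\, \delta_t \in (G_h)_t, \tag{3.23a}\\
\langle [\![ \widehat{U} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} &= \langle g \cdot n, \psi \rangle_{\partial\Omega} && \forall\, \psi \in \Psi_h, \tag{3.23b}\\
\langle n \times \gamma_t + \rho\, n, \operatorname{grad} \xi \rangle_{\partial\Omega_h} &= (f, \operatorname{grad} \xi)_{\Omega_h} && \forall\, \xi \in \Phi_h, \tag{3.23c}\\
(P, 1)_\Omega &= 0. \tag{3.23d}
\end{align}

Then $(W, U, P) = (\omega_h, u_h, p_h)$, $\gamma_t = (\widehat{\omega}_h)_t$, and $\rho = \widehat{p}_h$.

Proof. The proof is similar to the analogous proofs in the previous three cases and begins with the observation that $(W, U, P)$ satisfies the weak formulation (2.2) by the definition of the local solver (3.21) and condition (3.23c). Next, the jump conditions (3.23a) and (3.23b) imply that

\[ [\![ \widehat{U} \times n ]\!] = 0 \quad\text{and}\quad [\![ \widehat{U} \cdot n ]\!] = 0 \quad\text{on } \mathcal{E}_h^o. \]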
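Inserting the definition of the numerical traces (3.21e) and (3.21f), we readily obtain that, on $\mathcal{E}_h^o$,

\begin{align*}
[\![ U \times n ]\!] + \frac{1}{\tau_t^{+}} (W^{+})_t + \frac{1}{\tau_t^{-}} (W^{-})_t - \Bigl( \frac{1}{\tau_t^{+}} + \frac{1}{\tau_t^{-}} \Bigr) \gamma_t &= 0,\\
[\![ U \cdot n ]\!] + \frac{1}{\tau_n^{+}} P^{+} + \frac{1}{\tau_n^{-}} P^{-} - \Bigl( \frac{1}{\tau_n^{+}} + \frac{1}{\tau_n^{-}} \Bigr) \rho &= 0,
\end{align*}

or, equivalently,

\begin{align*}
\gamma_t &= \frac{\tau_t^{-} (W^{+})_t + \tau_t^{+} (W^{-})_t}{\tau_t^{-} + \tau_t^{+}} + \frac{\tau_t^{+} \tau_t^{-}}{\tau_t^{-} + \tau_t^{+}}\, [\![ U \times n ]\!],\\
\rho &= \frac{\tau_n^{-} P^{+} + \tau_n^{+} P^{-}}{\tau_n^{-} + \tau_n^{+}} + \frac{\tau_n^{+} \tau_n^{-}}{\tau_n^{-} + \tau_n^{+}}\, [\![ U \cdot n ]\!].
\end{align*}

Hence,

\begin{align*}
(\widehat{U})_t &= \frac{\tau_t^{+} (U^{+})_t + \tau_t^{-} (U^{-})_t}{\tau_t^{-} + \tau_t^{+}} + \frac{1}{\tau_t^{-} + \tau_t^{+}}\, [\![ n \times W ]\!],\\
(\widehat{U})_n &= \frac{\tau_n^{+} (U^{+})_n + \tau_n^{-} (U^{-})_n}{\tau_n^{-} + \tau_n^{+}} + \frac{1}{\tau_n^{-} + \tau_n^{+}}\, [\![ P\, n ]\!].
\end{align*}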
In other words, $(W, U, P)$ satisfies (2.2), (2.5), and (2.6). By the uniqueness result of Proposition 2.1(2), we can now conclude that the approximation $(W, U, P)$ coincides with $(\omega_h, u_h, p_h)$ and consequently $\gamma_t = (\widehat{\omega}_h)_t$ and $\rho = \widehat{p}_h$.

Next, we give a characterization of the approximate solution in terms of the local solutions

\begin{align*}
(W_{\gamma_t}, U_{\gamma_t}, P_{\gamma_t}) &:= L^{\mathrm{IV}}(\gamma_t, 0, 0, 0), & (W_\rho, U_\rho, P_\rho) &:= L^{\mathrm{IV}}(0, \rho, 0, 0),\\
(W_\phi, U_\phi, P_\phi) &:= L^{\mathrm{IV}}(0, 0, \phi, 0), & (W_f, U_f, P_f) &:= L^{\mathrm{IV}}(0, 0, 0, f).
\end{align*}

Note that

\[ (W_\phi, U_\phi, P_\phi) = (0, \operatorname{grad} \phi, 0) \tag{3.24} \]
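by direct verification in (3.21). The next theorem gives a mixed problem for the numerical traces $\gamma_t$, $\rho$ together with the volumetric unknown $\phi$. The presence of the variable $\phi$ defined within the elements (and not element boundaries, as in the previous cases) may appear to annul the potential advantages of dimensional reduction brought about by hybridization. However, this is not the case because $\phi$ is completely determined by its values on element boundaries.

Theorem 3.8 (characterization of the approximate solution). We have that

\begin{align*}
\omega_h &= W_{\gamma_t} + W_\rho + W_f,\\
u_h &= U_{\gamma_t} + U_\rho + U_f + \operatorname{grad} \phi,\\
p_h &= P_{\gamma_t} + P_\rho + P_f,
\end{align*}

where $(\gamma_t, \rho, \phi)$ is the only element of $(G_h)_t \times \Psi_h \times \Phi_h$ such that

\begin{align*}
a_h(\gamma_t, \delta_t) + b_h(\rho, \delta_t) + c_h(\phi, \delta_t) &= \ell_1(\delta_t),\\
b_h(\psi, \gamma_t) + d_h(\rho, \psi) + e_h(\phi, \psi) &= \ell_2(\psi),\\
-c_h(\xi, \gamma_t) - e_h(\xi, \rho) &= \ell_3(\xi)
\end{align*}

for all $(\delta_t, \psi, \xi) \in (G_h)_t \times \Psi_h \times \Phi_h$, and $(P_{\gamma_t} + P_\rho + P_f, 1)_\Omega = 0$. Here

\begin{align*}
a_h(\gamma_t, \delta_t) &:= (W_{\gamma_t}, W_{\delta_t})_{\Omega_h} + \Bigl\langle \frac{1}{\tau_n} P_{\gamma_t}, P_{\delta_t} \Bigr\rangle_{\partial\Omega_h} + \Bigl\langle \frac{1}{\tau_t}\, n \times (\gamma_t - W_{\gamma_t}), n \times (\delta_t - W_{\delta_t}) \Bigr\rangle_{\partial\Omega_h},\\
b_h(\rho, \delta_t) &:= -\Bigl\langle U_{\delta_t} \cdot n + \frac{1}{\tau_n} P_{\delta_t}, \rho \Bigr\rangle_{\partial\Omega_h},\\
c_h(\phi, \delta_t) &:= \langle n \times \operatorname{grad} \phi, \delta_t \rangle_{\partial\Omega_h},\\
d_h(\rho, \psi) &:= (W_\rho, W_\psi)_{\Omega_h} + \Bigl\langle \frac{1}{\tau_n} (\rho - P_\rho), (\psi - P_\psi) \Bigr\rangle_{\partial\Omega_h} + \Bigl\langle \frac{1}{\tau_t}\, n \times W_\rho, n \times W_\psi \Bigr\rangle_{\partial\Omega_h},\\
e_h(\phi, \psi) &:= -\langle \operatorname{grad} \phi \cdot n, \psi \rangle_{\partial\Omega_h},
\end{align*}

and

\begin{align*}
\ell_1(\delta_t) &:= -(f, U_{\delta_t})_{\Omega_h} - \langle g \times n, \delta_t \rangle_{\partial\Omega},\\
\ell_2(\psi) &:= -(f, U_\psi)_{\Omega_h} - \langle g \cdot n, \psi \rangle_{\partial\Omega},\\
\ell_3(\xi) &:= +(f, \operatorname{grad} \xi)_{\Omega_h}.
\end{align*}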
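The "direct verification" behind (3.24) can be sketched as follows, for $\phi$ with $\phi|_K \in \Phi(K)$ and data $(\gamma_t, \rho, f) = (0, 0, 0)$. The sketch assumes the standard integration by parts identity $(u, \operatorname{curl} \tau)_K = (\operatorname{curl} u, \tau)_K + \langle u, n \times \tau \rangle_{\partial K}$; it is an illustration of the check, not a reproduction of the authors' argument.

```latex
% Candidate: (W, U, P) = (0, grad phi, 0); by (3.21e)-(3.21f) with W = 0, P = 0,
% gamma_t = 0, rho = 0, the trace reduces to \widehat{U} = grad phi on \partial K.
\begin{align*}
\text{(3.21a):}\quad
 & -(\operatorname{grad}\phi, \operatorname{curl}\tau)_K
   + \langle \operatorname{grad}\phi, n\times\tau\rangle_{\partial K}
   = -(\operatorname{curl}\operatorname{grad}\phi, \tau)_K = 0, \\
\text{(3.21b):}\quad
 & (0, \operatorname{curl} v)_K - (0, \operatorname{div} v)_K = 0
   \quad\text{(zero data on the right-hand side)}, \\
\text{(3.21c):}\quad
 & -(\operatorname{grad}\phi, \operatorname{grad} q)_K
   + \langle \operatorname{grad}\phi\cdot n, q\rangle_{\partial K}
   = (\Delta\phi, q)_K = 0, \\
\text{(3.21d):}\quad
 & (\operatorname{grad}\phi, \operatorname{grad}\xi)_K
   = (\operatorname{grad}\phi, \operatorname{grad}\xi)_K .
\end{align*}
```

Since (3.21) is uniquely solvable, $(0, \operatorname{grad} \phi, 0)$ is then the local solution $L^{\mathrm{IV}}(0, 0, \phi, 0)$.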
3.5. Summary. We have shown how to hybridize the HDG methods in four different ways according to the choice of globally coupled variables. These variables are described in Table 3.1 for each of the hybridizations we considered. They are referred to as unknowns therein since all the other variables can be eliminated from the original equations. The corresponding discrete transmission conditions appear alongside under the heading jump conditions. The primary motivation for all these hybridizations is the reduction in the number of global degrees of freedom achieved by the elimination of volumetric unknowns $\omega_h$, $u_h$, and $p_h$. The variational equations on the mesh faces that we derived in each type result in significantly smaller systems, especially in the high order case.

Table 3.1. The unknowns and jump conditions for the hybridizations.

Type   Unknowns                                                            Jump conditions
I      $(\widehat{u}_h)_t$, $\widehat{p}_h$                                $[\![ n \times (\widehat{\omega}_h)_t ]\!] = 0$,  $[\![ (\widehat{u}_h)_n \cdot n ]\!] = 0$
II     $(\widehat{u}_h)_t$, $(\widehat{u}_h)_n$, $\overline{p}_h$          $[\![ n \times (\widehat{\omega}_h)_t ]\!] = 0$,  $[\![ \widehat{p}_h\, n ]\!] = 0$
III    $(\widehat{\omega}_h)_t$, $(\widehat{u}_h)_n$, $\overline{p}_h$     $[\![ (\widehat{u}_h)_t \times n ]\!] = 0$,  $[\![ \widehat{p}_h\, n ]\!] = 0$
IV     $(\widehat{\omega}_h)_t$, $\widehat{p}_h$, $\phi_h$                 $[\![ (\widehat{u}_h)_t \times n ]\!] = 0$,  $[\![ (\widehat{u}_h)_n \cdot n ]\!] = 0$
For DG methods, the possibility of deriving a hybridized formulation is strongly dependent on the structure of the numerical traces. Although we gave expressions for the numerical traces in the traditional DG format as in (2.5), we should note that the numerical traces on which the jump conditions are imposed can be expressed element by element. Indeed, on the boundary of each mesh element $K$, the numerical traces on which the jump conditions are imposed have the following expressions using the values of variables from just that element:

\begin{align}
\text{Type I:}\quad & (\widehat{\omega}_h)_t = (\omega_h)_t + \tau_t (u_h - (\widehat{u}_h)_t) \times n, \qquad (\widehat{u}_h)_n = (u_h)_n + \frac{1}{\tau_n} (p_h - \widehat{p}_h)\, n \qquad \text{on } \partial K, \tag{3.25}\\
\text{Type II:}\quad & (\widehat{\omega}_h)_t = (\omega_h)_t + \tau_t (u_h - (\widehat{u}_h)_t) \times n, \qquad \widehat{p}_h = p_h + \tau_n (u_h - (\widehat{u}_h)_n) \cdot n \qquad \text{on } \partial K, \tag{3.26}\\
\text{Type III:}\quad & (\widehat{u}_h)_t = (u_h)_t + \frac{1}{\tau_t}\, n \times (\omega_h - (\widehat{\omega}_h)_t), \qquad \widehat{p}_h = p_h + \tau_n (u_h - (\widehat{u}_h)_n) \cdot n \qquad \text{on } \partial K, \tag{3.27}\\
\text{Type IV:}\quad & (\widehat{u}_h)_t = (u_h)_t + \frac{1}{\tau_t}\, n \times (\omega_h - (\widehat{\omega}_h)_t), \qquad (\widehat{u}_h)_n = (u_h)_n + \frac{1}{\tau_n} (p_h - \widehat{p}_h)\, n \qquad \text{on } \partial K. \tag{3.28}
\end{align}

Finally, let us note that in the rewritten expressions of the numerical traces above, it is easy to formally set the parameters $\tau_t$, $\tau_n$ to either zero or infinity, which gives rise to numerical methods we can think of as being limiting cases of the HDG methods. In Table 3.2, for each of these limiting cases, we give the associated continuity properties of some of the components of the approximate solution as well as the corresponding natural hybridizations.

Table 3.2. The continuity properties induced by the formal limits.

Formal limit      Continuity property                          Hybridization type
$\tau_t = 0$      $\omega_h \in H(\mathrm{curl}, \Omega)$      I, II
$1/\tau_t = 0$    $u_h \in H(\mathrm{curl}, \Omega)$           III, IV
$\tau_n = 0$      $p_h \in C^0(\Omega)$                        II, III
$1/\tau_n = 0$    $u_h \in H(\mathrm{div}, \Omega)$            I, IV
In particular, if we use the hybridizations of Type I or IV and formally set $\tau_n = \infty$ in (3.25) or (3.28), we immediately obtain that $u_h \in H(\mathrm{div}, \Omega)$ by the jump condition (3.6c) (respectively, jump condition (3.23b)) for the Type I (respectively, Type IV) boundary conditions. We also immediately see that the discrete incompressibility condition (2.2c) becomes

\[ (\operatorname{div} u_h, q)_{\Omega_h} = 0 \qquad \forall\, q \in P_h, \]

and if we assume, as in Proposition 2.1, that

\[ \operatorname{div} V(K) \subset P(K) \qquad \forall\, K \in \Omega_h, \]

we can conclude that our approximate velocity $u_h$ is strongly incompressible. That is, the distributional divergence of the numerical velocity approximation satisfies

\[ \operatorname{div} u_h = 0 \quad \text{in all } \Omega. \]

It is interesting that even though the space $V_h$ is a space of completely discontinuous functions, we are able to recover such a velocity approximation.

The first DG methods producing strongly incompressible velocities were introduced, in the framework of the Navier–Stokes equations, in [12] and were later more explicitly developed in [13]; see also [21], where this idea is applied to square and cube elements. Another DG method able to provide strongly incompressible velocities is the method introduced in [3]. It uses a velocity space $V_h$ of exactly divergence-free velocities and uses a hybridization technique to avoid the almost impossible task of constructing its bases.

Unfortunately, the above-mentioned methods do not fit into our setting. The methods in [12, 13] do not use the vorticity as an unknown; instead, they use the gradient of the velocity. The method in [3] almost fits into our setting except for the fact that the numerical traces for the tangential vorticity and the tangential velocity do not coincide for any finite values of $\tau_t^{\pm}$. If, on the other hand, we formally set $\tau_t^{-} = \infty$ and then take $\tau_t^{+} = 0$, we do recover the general form of the numerical traces considered in [3]. However, in that case, the numerical trace for the tangential vorticity becomes independent of the tangential velocity. This is certainly not the case for the scheme treated in [3].

In Table 3.3, we describe four special limiting cases. Most finite element methods for the Stokes problem use approximate velocities $u_h$ in $H^1(\Omega)$ (see [2]); they thus correspond to the case $1/\tau_t = 1/\tau_n = 0$. The method introduced by Nédélec in [17] corresponds to the case $\tau_t = 1/\tau_n = 0$; its hybridization was carried out in [7, 8].

Table 3.3. Four special formal limits of HDG methods.

                  $\tau_t = 0$                                   $1/\tau_t = 0$
$\tau_n = 0$      $\omega_h \in H(\mathrm{curl}, \Omega)$        $u_h \in H(\mathrm{curl}, \Omega)$
                  $p_h \in C^0(\Omega)$                          $p_h \in C^0(\Omega)$
                  Type II hybridization                          Type III hybridization
$1/\tau_n = 0$    $\omega_h \in H(\mathrm{curl}, \Omega)$        $u_h \in H(\mathrm{curl}, \Omega)$
                  $u_h \in H(\mathrm{div}, \Omega)$              $u_h \in H(\mathrm{div}, \Omega)$
                  Type I hybridization                           Type IV hybridization
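The strong incompressibility claim of section 3.5 (formal limit $1/\tau_n = 0$) rests on a short argument that can be sketched as below. The sketch assumes that $P_h$ is the fully discontinuous product of the local spaces $P(K)$, so that the test function $q$ may be chosen element by element; this assumption is stated here for illustration.

```latex
% Assumption (illustrative): P_h = \prod_K P(K), so q|_K := div(u_h|_K) is admissible
% because div V(K) \subset P(K).
\begin{align*}
(\operatorname{div} u_h, q)_{\Omega_h} = 0 \quad \forall\, q \in P_h
 \ &\Longrightarrow\
 \sum_{K \in \Omega_h} \|\operatorname{div} u_h\|_{L^2(K)}^2 = 0
 \ \Longrightarrow\ \operatorname{div}(u_h|_K) = 0 \quad \forall\, K \in \Omega_h, \\
u_h \in H(\operatorname{div}, \Omega)
 \ &\Longrightarrow\
 \text{the distributional divergence of } u_h \text{ equals its elementwise divergence}, \\
 &\Longrightarrow\ \operatorname{div} u_h = 0 \ \text{ in } \Omega .
\end{align*}
```

The second step is where the single-valued normal trace, obtained from the jump condition, is needed: without it the distributional divergence would also contain the normal jumps of $u_h$ across interior faces.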
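4. Proofs of the characterization theorems.

4.1. Preliminaries. We begin by proving an auxiliary identity that we will use in all our proofs. It is stated in terms of functions $(w_h, u_h, p_h)$ in $W_h \times V_h \times P_h$ that are assumed to satisfy the equations

\begin{align}
(w_h, \tau)_{\Omega_h} - (u_h, \operatorname{curl} \tau)_{\Omega_h} &= -\langle \widehat{u}_h, n \times \tau \rangle_{\partial\Omega_h}, \tag{4.1a}\\
(w_h, \operatorname{curl} v)_{\Omega_h} - (p_h, \operatorname{div} v)_{\Omega_h} &= (f, v - \boldsymbol{\mathsf{P}} v)_{\Omega_h} - \langle \widehat{p}_h, (v - \boldsymbol{\mathsf{P}} v) \cdot n \rangle_{\partial\Omega_h} - \langle \widehat{w}_h, (v - \boldsymbol{\mathsf{P}} v) \times n \rangle_{\partial\Omega_h}, \tag{4.1b}\\
-(u_h, \operatorname{grad} q)_{\Omega_h} &= -\langle \widehat{u}_h \cdot n, q - \mathsf{P} q \rangle_{\partial\Omega_h} \tag{4.1c}
\end{align}

for all $(\tau, v, q) \in W_h \times V_h \times P_h$. Here $\mathsf{P}$ is a projection from $P_h$, and $\boldsymbol{\mathsf{P}}$ is a projection from $V_h$. Their ranges are denoted by $\psi_h$ and $H_h$, respectively. The symbols $\widehat{w}_h$, $\widehat{u}_h$, and $\widehat{p}_h$, while evocative of numerical traces, are not assumed to be related to the variables $(w_h, u_h, p_h)$ as in (2.5), nor are they assumed to be single valued on $\mathcal{E}_h$. They simply denote some given functions on $\partial\Omega_h$.

Lemma 4.1. Let $(w_h, u_h, p_h)$ be a function satisfying (4.1a) and (4.1c), and let $(w_h', u_h', p_h')$ be a function satisfying (4.1b) with $f$, $\widehat{w}_h$, and $\widehat{p}_h$ replaced by $f'$, $\widehat{w}_h'$, and $\widehat{p}_h'$, respectively. Then

\begin{align*}
-\langle \widehat{u}_h, n \times \widehat{w}_h' + n\, \widehat{p}_h' \rangle_{\partial\Omega_h} = {} & (w_h, w_h')_{\Omega_h} - \langle \widehat{u}_h - u_h, n \times (\widehat{w}_h' - w_h') + n\, (\widehat{p}_h' - p_h') \rangle_{\partial\Omega_h}\\
& - (u_h, f')_{\Omega_h}
\end{align*}

whenever $(\boldsymbol{\mathsf{P}} u_h, \mathsf{P} p_h') = (0, 0)$.

Proof. By (4.1a) with $\tau := w_h'$, we have that

\[ (w_h, w_h')_{\Omega_h} = (u_h, \operatorname{curl} w_h')_{\Omega_h} - \langle \widehat{u}_h, n \times w_h' \rangle_{\partial\Omega_h}, \]

and so, after integration by parts,

\[ (w_h, w_h')_{\Omega_h} = (\operatorname{curl} u_h, w_h')_{\Omega_h} + \langle u_h - \widehat{u}_h, n \times w_h' \rangle_{\partial\Omega_h}. \]

By (4.1b) with $v := u_h$, and with $w_h$, $u_h$, $p_h$, and $f$ replaced by $w_h'$, $u_h'$, $p_h'$, and $f'$, respectively, we get

\begin{align*}
(w_h, w_h')_{\Omega_h} = {} & -\langle \widehat{w}_h', (u_h - \boldsymbol{\mathsf{P}} u_h) \times n \rangle_{\partial\Omega_h} + \langle u_h - \widehat{u}_h, n \times w_h' \rangle_{\partial\Omega_h}\\
& + (p_h', \operatorname{div} u_h)_{\Omega_h} - \langle \widehat{p}_h', (u_h - \boldsymbol{\mathsf{P}} u_h) \cdot n \rangle_{\partial\Omega_h} + (f', u_h - \boldsymbol{\mathsf{P}} u_h)_{\Omega_h}\\
= {} & -\langle u_h, n \times \widehat{w}_h' \rangle_{\partial\Omega_h} + \langle u_h - \widehat{u}_h, n \times w_h' \rangle_{\partial\Omega_h}\\
& + (\operatorname{div} u_h, p_h')_{\Omega_h} - \langle u_h \cdot n, \widehat{p}_h' \rangle_{\partial\Omega_h} + (u_h, f')_{\Omega_h}
\end{align*}

since $\boldsymbol{\mathsf{P}} u_h = 0$. If we now integrate by parts, we get

\begin{align*}
(w_h, w_h')_{\Omega_h} = {} & -\langle u_h, n \times \widehat{w}_h' \rangle_{\partial\Omega_h} + \langle u_h - \widehat{u}_h, n \times w_h' \rangle_{\partial\Omega_h}\\
& - (u_h, \operatorname{grad} p_h')_{\Omega_h} - \langle u_h \cdot n, \widehat{p}_h' - p_h' \rangle_{\partial\Omega_h} + (u_h, f')_{\Omega_h},
\end{align*}

and by (4.1c) with $q := p_h'$,

\begin{align*}
(w_h, w_h')_{\Omega_h} = {} & -\langle u_h, n \times \widehat{w}_h' \rangle_{\partial\Omega_h} + \langle u_h - \widehat{u}_h, n \times w_h' \rangle_{\partial\Omega_h}\\
& - \langle u_h \cdot n, \widehat{p}_h' - p_h' \rangle_{\partial\Omega_h} + (u_h, f')_{\Omega_h} - \langle \widehat{u}_h \cdot n, p_h' - \mathsf{P} p_h' \rangle_{\partial\Omega_h}\\
= {} & -\langle \widehat{u}_h, n \times \widehat{w}_h' \rangle_{\partial\Omega_h} + \langle u_h - \widehat{u}_h, n \times (w_h' - \widehat{w}_h') \rangle_{\partial\Omega_h}\\
& - \langle \widehat{u}_h \cdot n, \widehat{p}_h' \rangle_{\partial\Omega_h} + \langle (\widehat{u}_h - u_h) \cdot n, \widehat{p}_h' - p_h' \rangle_{\partial\Omega_h} + (u_h, f')_{\Omega_h}
\end{align*}

since $\mathsf{P} p_h' = 0$. The result now follows after a simple rearrangement of terms. This completes the proof.

The following immediate consequence of this result will also be useful.

Corollary 4.2. Let $(w_h, u_h, p_h)$ be a function satisfying (4.1), and let $(w_h', u_h', p_h')$ be a function satisfying (4.1) with $f$, $\widehat{w}_h$, $\widehat{u}_h$, and $\widehat{p}_h$ replaced by $f'$, $\widehat{w}_h'$, $\widehat{u}_h'$, and $\widehat{p}_h'$, respectively. Then we have

\[ -\langle \widehat{u}_h, n \times \widehat{w}_h' + n\, \widehat{p}_h' \rangle_{\partial\Omega_h} + (u_h, f')_{\Omega_h} = -\langle \widehat{u}_h', n \times \widehat{w}_h + n\, \widehat{p}_h \rangle_{\partial\Omega_h} + (u_h', f)_{\Omega_h}, \]

provided $(\boldsymbol{\mathsf{P}} u_h, \mathsf{P} p_h) = (\boldsymbol{\mathsf{P}} u_h', \mathsf{P} p_h') = (0, 0)$ and

\[ -\langle \widehat{u}_h - u_h, n \times (\widehat{w}_h' - w_h') + n\, (\widehat{p}_h' - p_h') \rangle_{\partial\Omega_h} = -\langle \widehat{u}_h' - u_h', n \times (\widehat{w}_h - w_h) + n\, (\widehat{p}_h - p_h) \rangle_{\partial\Omega_h}. \]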
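The two "integration by parts" steps in the proof of Lemma 4.1 use the following elementwise identities, recorded here as a sketch for smooth enough $u$, $w$, and $p$ on an element $K$ with outward unit normal $n$. They are standard vector calculus facts rather than results specific to this paper.

```latex
% From div(w x u) = u . curl w - w . curl u and (w x u) . n = u . (n x w):
\begin{align*}
(u, \operatorname{curl} w)_K &= (\operatorname{curl} u, w)_K + \langle u, n \times w \rangle_{\partial K}, \\
% From div(p u) = p div u + grad p . u:
(\operatorname{div} u, p)_K &= -(u, \operatorname{grad} p)_K + \langle u \cdot n, p \rangle_{\partial K}, \\
% Pointwise triple-product identity used to move the cross product:
\widehat{w} \cdot (u \times n) &= u \cdot (n \times \widehat{w}).
\end{align*}
```

Summing the first two identities over all $K \in \Omega_h$ gives the forms used in the proof, with $(\cdot,\cdot)_{\Omega_h}$ and $\langle\cdot,\cdot\rangle_{\partial\Omega_h}$ in place of the single-element pairings.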
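4.2. Proof of the characterization of Theorem 3.2. To prove the characterization of Theorem 3.2, we are going to use several key identities gathered in the following result. Recall the definitions of specific local solutions in (3.7) (such as $W_{\lambda_t}$, $U_{\lambda_t}$, etc.). We denote by $\widehat{W}_\star$ and $\widehat{U}_\star$ the corresponding numerical traces, for all choices of the subscript "$\star$" that make sense in the discussion of this hybridization case:

\begin{align}
\widehat{W}_{\lambda_t} &= W_{\lambda_t} + \tau_t (U_{\lambda_t} - \lambda_t) \times n, & \widehat{U}_{\lambda_t} &= U_{\lambda_t} + \frac{1}{\tau_n} P_{\lambda_t}\, n, \tag{4.2a}\\
\widehat{W}_\rho &= W_\rho + \tau_t (U_\rho \times n), & \widehat{U}_\rho &= U_\rho + \frac{1}{\tau_n} (P_\rho - \rho)\, n, \tag{4.2b}\\
\widehat{W}_f &= W_f + \tau_t (U_f \times n), & \widehat{U}_f &= U_f + \frac{1}{\tau_n} P_f\, n. \tag{4.2c}
\end{align}

Clearly these equations are inherited from the definitions (3.4d) and (3.4e).

Lemma 4.3 (elementary identities). For any $\lambda_t, \mu_t \in L^2(\mathcal{E}_h)$, any $\rho, \psi \in L^2(\mathcal{E}_h)$, and any $f \in L^2(\Omega)$, we have

\begin{align*}
-\langle [\![ n \times \widehat{W}_{\lambda_t} ]\!], \mu_t \rangle_{\mathcal{E}_h} &= (W_{\lambda_t}, W_{\mu_t})_{\Omega_h} + \langle \tau_t (\lambda_t - U_{\lambda_t})_t, (\mu_t - U_{\mu_t})_t \rangle_{\partial\Omega_h} + \Bigl\langle \frac{1}{\tau_n} P_{\lambda_t}, P_{\mu_t} \Bigr\rangle_{\partial\Omega_h},\\
-\langle [\![ n \times \widehat{W}_\rho ]\!], \mu_t \rangle_{\mathcal{E}_h} &= \langle [\![ \widehat{U}_{\mu_t} \cdot n ]\!], \rho \rangle_{\mathcal{E}_h},\\
-\langle [\![ n \times \widehat{W}_f ]\!], \mu_t \rangle_{\mathcal{E}_h} &= -(f, U_{\mu_t})_{\Omega_h},
\end{align*}

and

\begin{align*}
-\langle [\![ \widehat{U}_{\lambda_t} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} &= \langle [\![ n \times \widehat{W}_\psi ]\!], \lambda_t \rangle_{\mathcal{E}_h},\\
-\langle [\![ \widehat{U}_\rho \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} &= (W_\rho, W_\psi)_{\Omega_h} + \langle \tau_t (U_\rho)_t, (U_\psi)_t \rangle_{\partial\Omega_h} + \Bigl\langle \frac{1}{\tau_n} (P_\rho - \rho), (P_\psi - \psi) \Bigr\rangle_{\partial\Omega_h},\\
-\langle [\![ \widehat{U}_f \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} &= +(f, U_\psi)_{\Omega_h}.
\end{align*}

Proof. In all the applications of Lemma 4.1 and Corollary 4.2 in this proof, we take $(\boldsymbol{\mathsf{P}}, \mathsf{P}) = (0, 0)$. Observe that (4.1) is satisfied by $(w_h, u_h, p_h) = (W_{\mu_t}, U_{\mu_t}, P_{\mu_t})$ if we set

\[ (\widehat{w}_h, (\widehat{u}_h)_t, (\widehat{u}_h)_n, \widehat{p}_h, f) = (\widehat{W}_{\mu_t}, \mu_t, (\widehat{U}_{\mu_t})_n, 0, 0). \]

The system (4.1) is also satisfied by $(w_h', u_h', p_h') = (W_{\lambda_t}, U_{\lambda_t}, P_{\lambda_t})$ if we set

\[ (\widehat{w}_h', (\widehat{u}_h')_t, (\widehat{u}_h')_n, \widehat{p}_h', f') = (\widehat{W}_{\lambda_t}, \lambda_t, (\widehat{U}_{\lambda_t})_n, 0, 0). \]

Hence, by Lemma 4.1,

\begin{align*}
-\langle [\![ n \times \widehat{W}_{\lambda_t} ]\!], \mu_t \rangle_{\mathcal{E}_h} = {} & (W_{\lambda_t}, W_{\mu_t})_{\Omega_h} - \langle \mu_t - U_{\mu_t}, n \times (\widehat{W}_{\lambda_t} - W_{\lambda_t}) \rangle_{\partial\Omega_h}\\
& - \langle \widehat{U}_{\mu_t} - U_{\mu_t}, n\, (0 - P_{\lambda_t}) \rangle_{\partial\Omega_h}.
\end{align*}

The first identity of the lemma follows from this and the identities defining the numerical traces such as (4.2).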
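The second identity of the lemma follows just as the fourth; see below. The third identity follows from Corollary 4.2. It is easy to check that the conditions of the corollary are satisfied with

\begin{align*}
(w_h, u_h, p_h) &= (W_{\mu_t}, U_{\mu_t}, P_{\mu_t}), & (\widehat{w}_h, (\widehat{u}_h)_t, (\widehat{u}_h)_n, \widehat{p}_h, f) &= (\widehat{W}_{\mu_t}, \mu_t, (\widehat{U}_{\mu_t})_n, 0, 0),\\
(w_h', u_h', p_h') &= (W_f, U_f, P_f), & (\widehat{w}_h', (\widehat{u}_h')_t, (\widehat{u}_h')_n, \widehat{p}_h', f') &= (\widehat{W}_f, 0, (\widehat{U}_f)_n, 0, f).
\end{align*}

Hence the corollary implies that

\begin{align*}
-\langle \mu_t, n \times \widehat{W}_f \rangle_{\partial\Omega_h} + (U_{\mu_t}, f)_{\Omega_h} &= -\langle \widehat{u}_h, n \times \widehat{w}_h' \rangle_{\partial\Omega_h} + (u_h, f')_{\Omega_h}\\
&= -\langle \widehat{u}_h', n \times \widehat{w}_h \rangle_{\partial\Omega_h} + (u_h', f)_{\Omega_h}\\
&= -\langle 0, n \times \widehat{W}_{\mu_t} \rangle_{\partial\Omega_h} = 0,
\end{align*}

and the required identity follows. The fourth identity also follows from Corollary 4.2 after verifying its conditions with

\begin{align*}
(w_h, u_h, p_h) &= (W_{\lambda_t}, U_{\lambda_t}, P_{\lambda_t}), & (\widehat{w}_h, (\widehat{u}_h)_t, (\widehat{u}_h)_n, \widehat{p}_h, f) &= (\widehat{W}_{\lambda_t}, \lambda_t, (\widehat{U}_{\lambda_t})_n, 0, 0),\\
(w_h', u_h', p_h') &= (W_\psi, U_\psi, P_\psi), & (\widehat{w}_h', (\widehat{u}_h')_t, (\widehat{u}_h')_n, \widehat{p}_h', f') &= (\widehat{W}_\psi, 0, (\widehat{U}_\psi)_n, \psi, 0).
\end{align*}

The fifth identity follows from Lemma 4.1 with

\begin{align*}
(w_h, u_h, p_h) &= (W_\rho, U_\rho, P_\rho), & (\widehat{w}_h, (\widehat{u}_h)_t, (\widehat{u}_h)_n, \widehat{p}_h, f) &= (\widehat{W}_\rho, 0, (\widehat{U}_\rho)_n, \rho, 0),\\
(w_h', u_h', p_h') &= (W_\psi, U_\psi, P_\psi), & (\widehat{w}_h', (\widehat{u}_h')_t, (\widehat{u}_h')_n, \widehat{p}_h', f') &= (\widehat{W}_\psi, 0, (\widehat{U}_\psi)_n, \psi, 0).
\end{align*}

The sixth identity follows from Corollary 4.2 after verifying its conditions with

\begin{align*}
(w_h, u_h, p_h) &= (W_f, U_f, P_f), & (\widehat{w}_h, (\widehat{u}_h)_t, (\widehat{u}_h)_n, \widehat{p}_h, f) &= (\widehat{W}_f, 0, (\widehat{U}_f)_n, 0, f),\\
(w_h', u_h', p_h') &= (W_\psi, U_\psi, P_\psi), & (\widehat{w}_h', (\widehat{u}_h')_t, (\widehat{u}_h')_n, \widehat{p}_h', f') &= (\widehat{W}_\psi, 0, (\widehat{U}_\psi)_n, \psi, 0).
\end{align*}

This completes the proof of the lemma.

Proof of Theorem 3.2. By the jump conditions (3.6b) and (3.6c),

\begin{align*}
-\langle [\![ n \times (\widehat{W}_{\lambda_t^o} + \widehat{W}_\rho) ]\!], \mu_t \rangle_{\mathcal{E}_h} &= \langle [\![ n \times (\widehat{W}_f + \widehat{W}_g) ]\!], \mu_t \rangle_{\mathcal{E}_h},\\
-\langle [\![ (\widehat{U}_{\lambda_t^o} + \widehat{U}_\rho) \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} &= \langle [\![ (\widehat{U}_f + \widehat{U}_g) \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} - \langle g \cdot n, \psi \rangle_{\partial\Omega}.
\end{align*}

By Lemma 4.3, we have that

\begin{align*}
-\langle [\![ n \times \widehat{W}_{\lambda_t^o} ]\!], \mu_t \rangle_{\mathcal{E}_h} &= a_h(\lambda_t^o, \mu_t), & -\langle [\![ \widehat{U}_{\lambda_t^o} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} &= -b_h(\psi, \lambda_t^o),\\
-\langle [\![ n \times \widehat{W}_\rho ]\!], \mu_t \rangle_{\mathcal{E}_h} &= b_h(\rho, \mu_t), & -\langle [\![ \widehat{U}_\rho \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} &= c_h(\rho, \psi).
\end{align*}

In order to prove (3.8a) and (3.8b), we now have only to show that $\ell_1 = \widetilde{\ell}_1$ and $\ell_2 = \widetilde{\ell}_2$, where

\begin{align*}
\widetilde{\ell}_1(\mu_t) &:= \langle [\![ n \times \widehat{W}_f ]\!], \mu_t \rangle_{\mathcal{E}_h} + \langle [\![ n \times \widehat{W}_g ]\!], \mu_t \rangle_{\mathcal{E}_h},\\
\widetilde{\ell}_2(\psi) &:= \langle [\![ \widehat{U}_f \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} + \langle [\![ \widehat{U}_g \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} - \langle g \cdot n, \psi \rangle_{\partial\Omega}.
\end{align*}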
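But, again by Lemma 4.3, we have

\[ \widetilde{\ell}_1(\mu_t) = (f, U_{\mu_t})_{\Omega_h} - a_h(g, \mu_t) = \ell_1(\mu_t). \]

Similarly, applying Lemma 4.3 one more time,

\begin{align*}
\widetilde{\ell}_2(\psi) &= -(f, U_\psi)_{\Omega_h} - \langle g, [\![ n \times \widehat{W}_\psi ]\!] \rangle_{\mathcal{E}_h} - \langle g \cdot n, \psi \rangle_{\partial\Omega}\\
&= -(f, U_\psi)_{\Omega_h} + b_h(\psi, g_t) - \langle g \cdot n, \psi \rangle_{\partial\Omega} = \ell_2(\psi).
\end{align*}

It now only remains to prove that $(\lambda_t^o, \rho)$ is the only solution of (3.8a)–(3.8c). First observe that the above arguments in fact show that the jump conditions (3.6b) and (3.6c) hold if and only if (3.8a) and (3.8b) hold, respectively. Hence if $(\widetilde{\lambda}_t^o, \widetilde{\rho})$ is another solution of (3.8a)–(3.8c), then the numerical traces generated by $L^{\mathrm{I}}(\widetilde{\lambda}_t^o, \widetilde{\rho}, f)$ will also satisfy (3.6b) and (3.6c). But then, since (3.8c) implies (3.6d), we find that all the conditions of Theorem 3.1 are verified, so we conclude that $\widetilde{\lambda}_t^o + g_t = (\widehat{u}_h)_t$ and $\widetilde{\rho} = \widehat{p}_h$. Since we also have $(\lambda_t^o + g_t, \rho) = ((\widehat{u}_h)_t, \widehat{p}_h)$, we conclude that $(\widetilde{\lambda}_t^o, \widetilde{\rho}) = (\lambda_t^o, \rho)$. This completes the proof of Theorem 3.2.

4.3. Proof of the characterization of Theorem 3.4. To prove Theorem 3.4, we proceed as in the previous case and gather several key identities in the following result. Recall the definitions of specific local solutions in (3.15) (such as $W_\lambda$, $U_\lambda$, etc.). The numerical traces $\widehat{W}_\star$ and $\widehat{P}_\star$ are given by (3.11) for the choices of subscript $\star$ that make sense here, such as when $\star$ is $\lambda$, $\rho$, or $f$, e.g.,

\[ \widehat{W}_\rho = W_\rho + \tau_t\, U_\rho \times n, \qquad \widehat{P}_\lambda = P_\lambda + \tau_n (U_\lambda - \lambda) \cdot n, \]

just as in the previous case.

Lemma 4.4 (elementary identities). For any $\lambda, \mu \in L^2(\mathcal{E}_h)$, any $\rho \in \ell^2(\partial\Omega_h)$, and any $f \in L^2(\Omega)$, we have

\begin{align*}
-\langle [\![ n \times \widehat{W}_\lambda + \widehat{P}_\lambda\, n ]\!], \mu \rangle_{\mathcal{E}_h} &= (W_\lambda, W_\mu)_{\Omega_h} + \langle \tau_t (\lambda - U_\lambda)_t, (\mu - U_\mu)_t \rangle_{\partial\Omega_h} + \langle \tau_n (\lambda - U_\lambda)_n, (\mu - U_\mu)_n \rangle_{\partial\Omega_h},\\
-\langle [\![ n \times \widehat{W}_\rho + \widehat{P}_\rho\, n ]\!], \mu \rangle_{\mathcal{E}_h} &= -\langle \rho, \mu \cdot n \rangle_{\partial\Omega_h},\\
-\langle [\![ n \times \widehat{W}_f + \widehat{P}_f\, n ]\!], \mu \rangle_{\mathcal{E}_h} &= -(f, U_\mu)_{\Omega_h}.
\end{align*}

Proof. The second identity immediately follows because by (3.16),

\[ n \times \widehat{W}_\rho + n\, \widehat{P}_\rho = +\rho\, n. \]

To prove the remaining identities, we set $\boldsymbol{\mathsf{P}} = 0$ and $\mathsf{P} \psi = \overline{\psi}$ (where $\overline{\psi}$ is as defined in (3.12)) and apply Lemma 4.1 and Corollary 4.2 appropriately. Indeed, to prove the first identity, first observe that (4.1) is satisfied by

\[ (w_h', u_h', p_h') = (W_\lambda, U_\lambda, P_\lambda) \quad\text{with}\quad (\widehat{w}_h', \widehat{u}_h', \widehat{p}_h', f') = (\widehat{W}_\lambda, \lambda, \widehat{P}_\lambda, 0), \]

and

\[ (w_h, u_h, p_h) = (W_\mu, U_\mu, P_\mu) \quad\text{with}\quad (\widehat{w}_h, \widehat{u}_h, \widehat{p}_h, f) = (\widehat{W}_\mu, \mu, \widehat{P}_\mu, 0). \]

Furthermore, $\mathsf{P} P_\lambda = 0$. Hence the first identity follows by applying Lemma 4.1.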
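Similarly, the last identity follows from Corollary 4.2, setting

\[ (w_h', u_h', p_h') = (W_f, U_f, P_f), \qquad (\widehat{w}_h', \widehat{u}_h', \widehat{p}_h', f') = (\widehat{W}_f, 0, \widehat{P}_f, f) \]

and

\[ (w_h, u_h, p_h) = (W_\mu, U_\mu, P_\mu), \qquad (\widehat{w}_h, \widehat{u}_h, \widehat{p}_h, f) = (\widehat{W}_\mu, \mu, \widehat{P}_\mu, 0). \]

This completes the proof of the identities.

Proof of Theorem 3.4. By the jump conditions (3.14b) and (3.14c),

\begin{align*}
& -\langle [\![ n \times \widehat{W}_{\lambda^o} + \widehat{P}_{\lambda^o}\, n + n \times \widehat{W}_\rho + \widehat{P}_\rho\, n ]\!], \mu \rangle_{\mathcal{E}_h}\\
& \qquad = \langle [\![ n \times \widehat{W}_f + \widehat{P}_f\, n + n \times \widehat{W}_g + \widehat{P}_g\, n ]\!], \mu \rangle_{\mathcal{E}_h}.
\end{align*}

By Lemma 4.4, we have that

\begin{align*}
-\langle [\![ n \times \widehat{W}_{\lambda^o} + \widehat{P}_{\lambda^o}\, n ]\!], \mu \rangle_{\mathcal{E}_h} &= a_h(\lambda^o, \mu),\\
-\langle [\![ n \times \widehat{W}_\rho + \widehat{P}_\rho\, n ]\!], \mu \rangle_{\mathcal{E}_h} &= b_h(\rho, \mu).
\end{align*}

It remains to show that the form $\ell(\cdot)$ of the theorem coincides with $\widetilde{\ell}$ defined by

\[ \widetilde{\ell}(\mu) := \langle [\![ n \times \widehat{W}_f + \widehat{P}_f\, n ]\!], \mu \rangle_{\mathcal{E}_h} + \langle [\![ n \times \widehat{W}_g + \widehat{P}_g\, n ]\!], \mu \rangle_{\mathcal{E}_h}. \]

But, again by Lemma 4.4, we have

\[ \widetilde{\ell}(\mu) = (f, U_\mu)_{\Omega_h} - a_h(g, \mu) = \ell(\mu). \]

The proof of uniqueness of the trace solution $(\lambda^o, \rho)$ proceeds as in the Type I case, so we omit it.

4.4. Proof of the characterization of Theorem 3.6. We now prove Theorem 3.6, using the identities gathered in the next lemma. The notation for the numerical traces of the form $\widehat{U}_\star$ and $\widehat{P}_\star$ have meanings inherited from (3.18e) and (3.18f) as in the previous cases.

Lemma 4.5 (elementary identities). For any $\gamma_t, \delta_t \in L^2(\mathcal{E}_h)$, any $\lambda_n, \mu_n \in L^2(\mathcal{E}_h)$, any $\rho, \psi \in L^2(\mathcal{E}_h)$, and any $f \in L^2(\Omega)$, we have

\begin{align*}
-\langle [\![ \widehat{U}_{\gamma_t} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} &= (W_{\gamma_t}, W_{\delta_t})_{\Omega_h} + \langle \tau_n (U_{\gamma_t})_n, (U_{\delta_t})_n \rangle_{\partial\Omega_h} + \Bigl\langle \frac{1}{\tau_t}\, n \times (\gamma_t - W_{\gamma_t}), n \times (\delta_t - W_{\delta_t}) \Bigr\rangle_{\partial\Omega_h},\\
-\langle [\![ \widehat{U}_{\lambda_n} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} &= \langle [\![ \widehat{P}_{\delta_t}\, n ]\!], \lambda_n \rangle_{\mathcal{E}_h},\\
-\langle [\![ \widehat{U}_\rho \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} &= 0,\\
-\langle [\![ \widehat{U}_f \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} &= (f, U_{\delta_t})_{\Omega_h}
\end{align*}

and

\begin{align*}
-\langle [\![ \widehat{P}_{\gamma_t}\, n ]\!], \mu_n \rangle_{\mathcal{E}_h} &= \langle [\![ \widehat{U}_{\mu_n} \times n ]\!], \gamma_t \rangle_{\mathcal{E}_h},\\
-\langle [\![ \widehat{P}_{\lambda_n}\, n ]\!], \mu_n \rangle_{\mathcal{E}_h} &= (W_{\lambda_n}, W_{\mu_n})_{\Omega_h} + \Bigl\langle \frac{1}{\tau_t}\, n \times W_{\lambda_n}, n \times W_{\mu_n} \Bigr\rangle_{\partial\Omega_h} + \langle \tau_n (\lambda_n - U_{\lambda_n})_n, (\mu_n - U_{\mu_n})_n \rangle_{\partial\Omega_h},\\
-\langle [\![ \widehat{P}_\rho\, n ]\!], \mu_n \rangle_{\mathcal{E}_h} &= -\langle \rho, \mu_n \cdot n \rangle_{\partial\Omega_h},\\
-\langle [\![ \widehat{P}_f\, n ]\!], \mu_n \rangle_{\mathcal{E}_h} &= -(f, U_{\mu_n})_{\Omega_h}.
\end{align*}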
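Proof. The third and seventh identities immediately follow because $\widehat{U}_\rho = 0$ and $\widehat{P}_\rho = \rho$. For proving the remaining identities, we apply Lemma 4.1 and Corollary 4.2 with P = 0 and Pψ = ψ̄.

To prove the first identity, observe that (4.1) is satisfied by
\[ (\mathsf{w}_h, \mathsf{u}_h, \mathsf{p}_h) = (W_{\delta_t}, U_{\delta_t}, P_{\delta_t}) \quad\text{with}\quad (\widehat{\mathsf{w}}_h, (\widehat{\mathsf{u}}_h)_t, (\widehat{\mathsf{u}}_h)_n, \widehat{\mathsf{p}}_h, \mathsf{f}) = (\delta_t, (\widehat{U}_{\delta_t})_t, 0, \widehat{P}_{\delta_t}, 0). \]
Equation (4.1) is also satisfied by
\[ (w_h, u_h, p_h) = (W_{\gamma_t}, U_{\gamma_t}, P_{\gamma_t}) \quad\text{with}\quad (\widehat{w}_h, (\widehat{u}_h)_t, (\widehat{u}_h)_n, \widehat{p}_h, f) = (\gamma_t, (\widehat{U}_{\gamma_t})_t, 0, \widehat{P}_{\gamma_t}, 0). \]
Since we also have $P P_{\delta_t} = 0$ because of (3.18d), all the conditions for applying Lemma 4.1 are satisfied. Thus the first identity follows from Lemma 4.1. The second identity follows like the fifth; see below.

The fourth identity follows from Corollary 4.2 with
\[ (w_h, u_h, p_h) = (W_f, U_f, P_f), \qquad (\widehat{w}_h, (\widehat{u}_h)_t, (\widehat{u}_h)_n, \widehat{p}_h, f) = (0, (\widehat{U}_f)_t, 0, \widehat{P}_f, f), \]
\[ (\mathsf{w}_h, \mathsf{u}_h, \mathsf{p}_h) = (W_{\delta_t}, U_{\delta_t}, P_{\delta_t}), \qquad (\widehat{\mathsf{w}}_h, (\widehat{\mathsf{u}}_h)_t, (\widehat{\mathsf{u}}_h)_n, \widehat{\mathsf{p}}_h, \mathsf{f}) = (\delta_t, (\widehat{U}_{\delta_t})_t, 0, \widehat{P}_{\delta_t}, 0). \]
The fifth identity follows from Corollary 4.2 with
\[ (w_h, u_h, p_h) = (W_{\mu_n}, U_{\mu_n}, P_{\mu_n}), \qquad (\widehat{w}_h, (\widehat{u}_h)_t, (\widehat{u}_h)_n, \widehat{p}_h, f) = (0, (\widehat{U}_{\mu_n})_t, \mu_n, \widehat{P}_{\mu_n}, 0), \]
\[ (\mathsf{w}_h, \mathsf{u}_h, \mathsf{p}_h) = (W_{\gamma_t}, U_{\gamma_t}, P_{\gamma_t}), \qquad (\widehat{\mathsf{w}}_h, (\widehat{\mathsf{u}}_h)_t, (\widehat{\mathsf{u}}_h)_n, \widehat{\mathsf{p}}_h, \mathsf{f}) = (\gamma_t, (\widehat{U}_{\gamma_t})_t, 0, \widehat{P}_{\gamma_t}, 0). \]
The sixth identity follows from Lemma 4.1 with
\[ (w_h, u_h, p_h) = (W_{\mu_n}, U_{\mu_n}, P_{\mu_n}), \qquad (\widehat{w}_h, (\widehat{u}_h)_t, (\widehat{u}_h)_n, \widehat{p}_h, f) = (0, (\widehat{U}_{\mu_n})_t, \mu_n, \widehat{P}_{\mu_n}, 0), \]
\[ (\mathsf{w}_h, \mathsf{u}_h, \mathsf{p}_h) = (W_{\lambda_n}, U_{\lambda_n}, P_{\lambda_n}), \qquad (\widehat{\mathsf{w}}_h, (\widehat{\mathsf{u}}_h)_t, (\widehat{\mathsf{u}}_h)_n, \widehat{\mathsf{p}}_h, \mathsf{f}) = (0, (\widehat{U}_{\lambda_n})_t, \lambda_n, \widehat{P}_{\lambda_n}, 0). \]
The eighth identity follows from Corollary 4.2 with
\[ (w_h, u_h, p_h) = (W_{\mu_n}, U_{\mu_n}, P_{\mu_n}), \qquad (\widehat{w}_h, (\widehat{u}_h)_t, (\widehat{u}_h)_n, \widehat{p}_h, f) = (0, (\widehat{U}_{\mu_n})_t, \mu_n, \widehat{P}_{\mu_n}, 0), \]
\[ (\mathsf{w}_h, \mathsf{u}_h, \mathsf{p}_h) = (W_f, U_f, P_f), \qquad (\widehat{\mathsf{w}}_h, (\widehat{\mathsf{u}}_h)_t, (\widehat{\mathsf{u}}_h)_n, \widehat{\mathsf{p}}_h, \mathsf{f}) = (0, (\widehat{U}_f)_t, 0, \widehat{P}_f, f). \]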
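This completes the proof.

Proof of Theorem 3.6. By the jump conditions (3.20b) and (3.20c),
\[ -\langle [\![ (\widehat{U}_{\gamma_t} + \widehat{U}_{\lambda_n^o} + \widehat{U}_{\rho}) \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = \langle [\![ (\widehat{U}_{f} + \widehat{U}_{g_n}) \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} - \langle g_t \times n, \delta_t \rangle_{\partial\Omega}, \]
\[ -\langle [\![ (\widehat{P}_{\gamma_t} + \widehat{P}_{\lambda_n^o} + \widehat{P}_{\rho})\, n ]\!], \mu_n \rangle_{\mathcal{E}_h} = \langle [\![ (\widehat{P}_{f} + \widehat{P}_{g_n})\, n ]\!], \mu_n \rangle_{\mathcal{E}_h}. \]
By Lemma 4.5, we have that
\[ -\langle [\![ \widehat{U}_{\gamma_t} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = a_h(\gamma_t, \delta_t), \qquad -\langle [\![ \widehat{P}_{\gamma_t} n ]\!], \mu_n \rangle_{\mathcal{E}_h} = -b_h(\mu_n, \gamma_t), \]
\[ -\langle [\![ \widehat{U}_{\lambda_n^o} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = b_h(\lambda_n^o, \delta_t), \qquad -\langle [\![ \widehat{P}_{\lambda_n^o} n ]\!], \mu_n \rangle_{\mathcal{E}_h} = c_h(\lambda_n^o, \mu_n), \]
\[ -\langle [\![ \widehat{U}_{\rho} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = 0, \qquad -\langle [\![ \widehat{P}_{\rho} n ]\!], \mu_n \rangle_{\mathcal{E}_h} = d(\rho, \mu_n). \]
It remains to show that $\ell_1 = \tilde{\ell}_1$ and $\ell_2 = \tilde{\ell}_2$, where
\[ \tilde{\ell}_1(\delta_t) := -\langle [\![ (\widehat{U}_{f} + \widehat{U}_{g_n}) \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} - \langle g_t \times n, \delta_t \rangle_{\partial\Omega}, \qquad \tilde{\ell}_2(\mu_n) := \langle [\![ (\widehat{P}_{f} + \widehat{P}_{g_n})\, n ]\!], \mu_n \rangle_{\mathcal{E}_h}. \]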
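But, again by Lemma 4.5, we have
\[ \tilde{\ell}_1(\delta_t) = -(f, U_{\delta_t})_{\Omega_h} - b_h(g_n, \delta_t) - \langle g_t \times n, \delta_t \rangle_{\partial\Omega} = \ell_1(\delta_t), \]
and, similarly, by Lemma 4.5,
\[ \tilde{\ell}_2(\mu_n) = (f, U_{\mu_n})_{\Omega_h} - c_h(g_n, \mu_n) = \ell_2(\mu_n). \]
The proof of Theorem 3.6 is completed by also establishing the uniqueness as in the previous cases.

4.5. Proof of the characterization of Theorem 3.8. To prove Theorem 3.8, we use the identities below. The numerical traces of the form $\widehat{U}$ appearing in these identities are defined using (3.21e) and (3.21f) as in the previous cases for all possible choices of the subscripts that make sense for this case.

Lemma 4.6 (elementary identities). For any $\gamma_t, \delta_t \in L^2(\mathcal{E}_h)$, any $\rho, \psi \in L^2(\mathcal{E}_h)$, any $\varphi \in H^1(\Omega_h)$, and any $f \in L^2(\Omega)$, we have
\[ -\langle [\![ \widehat{U}_{\gamma_t} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = (W_{\gamma_t}, W_{\delta_t})_{\Omega_h} + \Big\langle \frac{1}{\tau_n} P_{\gamma_t}, P_{\delta_t} \Big\rangle_{\partial\Omega_h} + \Big\langle \frac{1}{\tau_t}\, n \times (\gamma_t - W_{\gamma_t}),\, n \times (\delta_t - W_{\delta_t}) \Big\rangle_{\partial\Omega_h}, \]
\[ -\langle [\![ \widehat{U}_{\rho} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = -\langle [\![ \widehat{U}_{\delta_t} \cdot n ]\!], \rho \rangle_{\mathcal{E}_h}, \]
\[ -\langle [\![ \widehat{U}_{\varphi} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = \langle n \times \operatorname{grad} \varphi, \delta_t \rangle_{\partial\Omega_h}, \]
\[ -\langle [\![ \widehat{U}_{f} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = +(f, U_{\delta_t})_{\Omega_h} \]
and
\[ -\langle [\![ \widehat{U}_{\gamma_t} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} = -\langle [\![ \widehat{U}_{\psi} \times n ]\!], \gamma_t \rangle_{\mathcal{E}_h}, \]
\[ -\langle [\![ \widehat{U}_{\rho} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} = (W_{\rho}, W_{\psi})_{\Omega_h} + \Big\langle \frac{1}{\tau_n} (\rho - P_{\rho}), (\psi - P_{\psi}) \Big\rangle_{\partial\Omega_h} + \Big\langle \frac{1}{\tau_t}\, n \times W_{\rho},\, n \times W_{\psi} \Big\rangle_{\partial\Omega_h}, \]
\[ -\langle [\![ \widehat{U}_{\varphi} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} = -\langle \operatorname{grad} \varphi \cdot n, \psi \rangle_{\partial\Omega_h}, \]
\[ -\langle [\![ \widehat{U}_{f} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} = +(f, U_{\psi})_{\Omega_h}. \]
Proof. The third and seventh identities are immediate because (3.24) implies that $\widehat{U}_\varphi = \operatorname{grad} \varphi$. In the remainder of the proof, whenever we apply Lemma 4.1 or Corollary 4.2 we take Pv = grad φ_v and P = 0. To prove the first identity, we proceed as in the previous cases and apply Lemma 4.1 (now additionally noting that $P U_{\gamma_t} = 0$) with
\[ (w_h, u_h, p_h) = (W_{\gamma_t}, U_{\gamma_t}, P_{\gamma_t}), \qquad (\widehat{w}_h, \widehat{u}_h, \widehat{p}_h, f) = (\gamma_t, \widehat{U}_{\gamma_t}, 0, 0), \]
\[ (\mathsf{w}_h, \mathsf{u}_h, \mathsf{p}_h) = (W_{\delta_t}, U_{\delta_t}, P_{\delta_t}), \qquad (\widehat{\mathsf{w}}_h, \widehat{\mathsf{u}}_h, \widehat{\mathsf{p}}_h, \mathsf{f}) = (\delta_t, \widehat{U}_{\delta_t}, 0, 0). \]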
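The second identity is proved just like the fifth; see below. The fourth identity follows from Corollary 4.2 with
\[ (w_h, u_h, p_h) = (W_f, U_f, P_f), \qquad (\widehat{w}_h, \widehat{u}_h, \widehat{p}_h, f) = (0, \widehat{U}_f, 0, f), \]
\[ (\mathsf{w}_h, \mathsf{u}_h, \mathsf{p}_h) = (W_{\delta_t}, U_{\delta_t}, P_{\delta_t}), \qquad (\widehat{\mathsf{w}}_h, \widehat{\mathsf{u}}_h, \widehat{\mathsf{p}}_h, \mathsf{f}) = (\delta_t, \widehat{U}_{\delta_t}, 0, 0). \]
The fifth identity follows from Corollary 4.2 with
\[ (w_h, u_h, p_h) = (W_{\gamma_t}, U_{\gamma_t}, P_{\gamma_t}), \qquad (\widehat{w}_h, \widehat{u}_h, \widehat{p}_h, f) = (\gamma_t, \widehat{U}_{\gamma_t}, 0, 0), \]
\[ (\mathsf{w}_h, \mathsf{u}_h, \mathsf{p}_h) = (W_{\psi}, U_{\psi}, P_{\psi}), \qquad (\widehat{\mathsf{w}}_h, \widehat{\mathsf{u}}_h, \widehat{\mathsf{p}}_h, \mathsf{f}) = (0, \widehat{U}_{\psi}, \psi, 0). \]
The sixth identity follows from Lemma 4.1 with
\[ (w_h, u_h, p_h) = (W_{\rho}, U_{\rho}, P_{\rho}), \qquad (\widehat{w}_h, \widehat{u}_h, \widehat{p}_h, f) = (0, \widehat{U}_{\rho}, \rho, 0), \]
\[ (\mathsf{w}_h, \mathsf{u}_h, \mathsf{p}_h) = (W_{\psi}, U_{\psi}, P_{\psi}), \qquad (\widehat{\mathsf{w}}_h, \widehat{\mathsf{u}}_h, \widehat{\mathsf{p}}_h, \mathsf{f}) = (0, \widehat{U}_{\psi}, \psi, 0). \]
The eighth identity follows from Corollary 4.2 with
\[ (w_h, u_h, p_h) = (W_f, U_f, P_f), \qquad (\widehat{w}_h, \widehat{u}_h, \widehat{p}_h, f) = (0, \widehat{U}_f, 0, f), \]
\[ (\mathsf{w}_h, \mathsf{u}_h, \mathsf{p}_h) = (W_{\psi}, U_{\psi}, P_{\psi}), \qquad (\widehat{\mathsf{w}}_h, \widehat{\mathsf{u}}_h, \widehat{\mathsf{p}}_h, \mathsf{f}) = (0, \widehat{U}_{\psi}, \psi, 0). \]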
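Proof of Theorem 3.8. By the jump conditions (3.23b) and (3.23c),
\[ -\langle [\![ (\widehat{U}_{\gamma_t} + \widehat{U}_{\rho} + \widehat{U}_{\varphi}) \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = \langle [\![ \widehat{U}_{f} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} - \langle g \times n, \delta_t \rangle_{\partial\Omega}, \]
\[ -\langle [\![ (\widehat{U}_{\gamma_t} + \widehat{U}_{\rho} + \widehat{U}_{\varphi}) \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} = \langle [\![ \widehat{U}_{f} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} - \langle g \cdot n, \psi \rangle_{\partial\Omega}. \]
By Lemma 4.6, we have that
\[ -\langle [\![ \widehat{U}_{\gamma_t} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = a_h(\gamma_t, \delta_t), \qquad -\langle [\![ \widehat{U}_{\gamma_t} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} = b_h(\psi, \gamma_t), \]
\[ -\langle [\![ \widehat{U}_{\rho} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = b_h(\rho, \delta_t), \qquad -\langle [\![ \widehat{U}_{\rho} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} = d_h(\rho, \psi), \]
\[ -\langle [\![ \widehat{U}_{\varphi} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} = c_h(\varphi, \delta_t), \qquad -\langle [\![ \widehat{U}_{\varphi} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} = e_h(\varphi, \psi), \]
and that
\[ \langle [\![ \widehat{U}_{f} \times n ]\!], \delta_t \rangle_{\mathcal{E}_h} - \langle g \times n, \delta_t \rangle_{\partial\Omega} = \ell_1(\delta_t), \qquad \langle [\![ \widehat{U}_{f} \cdot n ]\!], \psi \rangle_{\mathcal{E}_h} - \langle g \cdot n, \psi \rangle_{\partial\Omega} = \ell_2(\psi). \]
The proof of Theorem 3.8 is now completed by a uniqueness argument as in the previous cases.

5. Concluding remarks. In this paper, we introduced a new HDG method for the Stokes system and showed four different ways of hybridizing it. In order for these methods to be competitive with previously known ones [14, 20, 18, 19, 12, 15, 3, 13], they need to be not only efficiently implemented, but also efficiently solved. We would like to emphasize that our characterization theorems are a first step towards such a goal since they shed light on the structure of the corresponding equations. However, we feel that a meaningful study of those equations deserves a separate paper. The design of efficient solvers for these methods constitutes work in progress. Another subject of ongoing work is the analysis of the accuracy of the methods. A careful a priori error analysis of the HDG methods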
should reveal the effect of the choice of the stabilization parameters τn and τt on their accuracy. Let us recall that, in the context of second-order elliptic problems, the HDG methods [10] were shown to be more accurate than all previously known DG methods when their stabilization parameters are suitably chosen. In particular, when using polynomial approximations of the same degree for both the solution and its gradient, both approximations were shown to converge with optimal order; see [4, 11]. It is thus reasonable to expect that by a proper choice of the parameters τn and τt, the HDG method using polynomial approximations of the same degree for the vorticity, velocity, and pressure will also converge optimally in all three variables. This is work in progress.

REFERENCES
[1] D. N. Arnold, F. Brezzi, B. Cockburn, and L. D. Marini, Unified analysis of discontinuous Galerkin methods for elliptic problems, SIAM J. Numer. Anal., 39 (2002), pp. 1749–1779.
[2] F. Brezzi and M. Fortin, Mixed and Hybrid Finite Element Methods, Springer-Verlag, New York, 1991.
[3] J. Carrero, B. Cockburn, and D. Schötzau, Hybridized, globally divergence-free LDG methods. Part I: The Stokes problem, Math. Comp., 75 (2006), pp. 533–563.
[4] B. Cockburn, B. Dong, and J. Guzmán, A superconvergent LDG-hybridizable Galerkin method for second-order elliptic problems, Math. Comp., 77 (2008), pp. 1887–1916.
[5] B. Cockburn and J. Gopalakrishnan, A characterization of hybridized mixed methods for second order elliptic problems, SIAM J. Numer. Anal., 42 (2004), pp. 283–301.
[6] B. Cockburn and J. Gopalakrishnan, Error analysis of variable degree mixed methods for elliptic problems via hybridization, Math. Comp., 74 (2005), pp. 1653–1677.
[7] B. Cockburn and J. Gopalakrishnan, Incompressible finite elements via hybridization. Part I: The Stokes system in two space dimensions, SIAM J. Numer. Anal., 43 (2005), pp. 1627–1650.
[8] B. Cockburn and J. Gopalakrishnan, Incompressible finite elements via hybridization. Part II: The Stokes system in three space dimensions, SIAM J. Numer. Anal., 43 (2005), pp. 1651–1672.
[9] B. Cockburn and J. Gopalakrishnan, New hybridization techniques, GAMM-Mitt., 2 (2005), pp. 154–183.
[10] B. Cockburn, J. Gopalakrishnan, and R. Lazarov, Unified hybridization of discontinuous Galerkin, mixed, and continuous Galerkin methods for second-order elliptic problems, SIAM J. Numer. Anal., to appear.
[11] B. Cockburn, J. Guzmán, and H. Wang, Superconvergent discontinuous Galerkin methods for second-order elliptic problems, Math. Comp., 78 (2009), pp. 1–24.
[12] B. Cockburn, G. Kanschat, and D. Schötzau, A locally conservative LDG method for the incompressible Navier-Stokes equations, Math. Comp., 74 (2005), pp. 1067–1095.
[13] B. Cockburn, G. Kanschat, and D. Schötzau, A note on discontinuous Galerkin divergence-free solutions of the Navier-Stokes equations, J. Sci. Comput., 31 (2007), pp. 61–73.
[14] B. Cockburn, G. Kanschat, D. Schötzau, and C. Schwab, Local discontinuous Galerkin methods for the Stokes system, SIAM J. Numer. Anal., 40 (2002), pp. 319–343.
[15] V. Girault, B. Rivière, and M. F. Wheeler, A discontinuous Galerkin method with nonoverlapping domain decomposition for the Stokes and Navier-Stokes problems, Math. Comp., 74 (2005), pp. 53–84.
[16] M. D. Gunzburger, Finite Element Methods for Viscous Incompressible Flows: A Guide to Theory, Practice and Algorithms, Academic Press, New York, 1989.
[17] J.-C. Nédélec, Éléments finis mixtes incompressibles pour l’équation de Stokes dans R3, Numer. Math., 39 (1982), pp. 97–112.
[18] D. Schötzau, C. Schwab, and A. Toselli, Mixed hp-DGFEM for incompressible flows, SIAM J. Numer. Anal., 40 (2003), pp. 2171–2194.
[19] D. Schötzau, C. Schwab, and A. Toselli, Stabilized hp-DGFEM for incompressible flow, Math. Models Methods Appl. Sci., 13 (2003), pp. 1413–1436.
[20] A. Toselli, hp-discontinuous Galerkin approximations for the Stokes problem, Math. Models Methods Appl. Sci., 12 (2002), pp. 1565–1616.
[21] J. Wang and X. Ye, New finite element methods in computational fluid dynamics by H(div) elements, SIAM J. Numer. Anal., 45 (2007), pp. 1269–1286.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1126–1148
NUMERICAL ANALYSIS OF A FINITE ELEMENT/VOLUME PENALTY METHOD∗ BERTRAND MAURY† Abstract. We present here some contributions to the numerical analysis of the penalty method in the finite element context. We are especially interested in the ability provided by this approach to use Cartesian, non boundary-fitted meshes to solve elliptic problems in complicated domain. In the spirit of fictitious domains, the initial problem is replaced by a penalized one, posed over a simply shaped domain which covers the original one. This method relies on two parameters, namely h (space-discretization parameter) and ε (penalty parameter). We propose here a general strategy to estimate the error in both parameters, and we present how it can be applied to various situations. We pay special attention to a scalar version of the rigid motion constraint for fluid-particle flows. Key words. finite element method, penalty, Poisson’s problem, error estimate AMS subject classifications. 65N30, 65N12, 49M30 DOI. 10.1137/080712799
1. Introduction. Because of its conceptual simplicity and the fact that it is straightforward to implement, the penalty method has been widely used to incorporate constraints in numerical optimization. The general principle can be seen as a relaxed version of the following fact: given a proper functional J over a set X, and K a subset of X, minimizing J over K is equivalent to minimizing J_K = J + I_K over X, where I_K is the indicator function of K:
\[ I_K(x) = \begin{cases} 0 & \text{if } x \in K, \\ +\infty & \text{if } x \notin K. \end{cases} \]
Assume now that K is defined as K = {x ∈ X , Ψ(x) = 0}, where Ψ is a nonnegative function; the penalty method consists in considering relaxed functionals J_ε defined as
\[ J_\varepsilon = J + \frac{1}{\varepsilon} \Psi, \qquad \varepsilon > 0. \]
By definition of K, the function Ψ/ε approaches I_K pointwise:
\[ \frac{1}{\varepsilon} \Psi(x) \longrightarrow I_K(x) \quad \text{as } \varepsilon \text{ goes to } 0 \qquad \forall x \in X. \]
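To make the relaxation concrete, here is a minimal sketch on a one-dimensional toy problem that is not taken from the paper: X is the real line, J(x) = (x - 1)^2 / 2, K = {0}, and Ψ(x) = x^2. The function names are illustrative. Setting the derivative of J_ε to zero gives the penalized minimizer in closed form, and it approaches the constrained minimizer x = 0 as ε goes to 0.

```python
def penalized_minimizer(eps: float) -> float:
    """Minimizer of J_eps(x) = 0.5 * (x - 1)**2 + x**2 / eps over the real line.

    The optimality condition (x - 1) + 2 * x / eps = 0 gives x = eps / (eps + 2).
    """
    return eps / (eps + 2.0)


def constrained_minimizer() -> float:
    """Minimizer of J over K = {0}: the only admissible point."""
    return 0.0


if __name__ == "__main__":
    for eps in (1.0, 1e-1, 1e-2, 1e-3):
        x_eps = penalized_minimizer(eps)
        gap = abs(x_eps - constrained_minimizer())
        print(f"eps = {eps:8.1e}   x_eps = {x_eps:.6f}   gap = {gap:.2e}")
```

In this toy case the gap decreases proportionally to ε, which is the most favorable behavior one can hope for.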
If Jε admits a minimum uε , for any ε, one can expect uε to approach a (or the) minimizer of J over K, if it exists. In the finite element context, some uεh is computed as the solution to a finite dimensional problem, where h is a space-discretization parameter. The work we present here is motivated by the fact that, even if the penalty method for the continuous problem is convergent and the discretization procedure is sound, the rate of convergence of uεh toward the exact solution is not straightforward to obtain.
∗ Received by the editors January 9, 2008; accepted for publication (in revised form) November 6, 2008; published electronically February 19, 2009. http://www.siam.org/journals/sinum/47-2/71279.html
† Laboratoire de Mathématiques, Université Paris-Sud, 91405 Orsay Cedex, France (Bertrand.[email protected]).
A huge literature is
dedicated to the situation where the constraint is distributed over the domain, like the divergence-free constraint for incompressible Stokes flows (see [BF91, GR79]). In this context, the penalty approach makes it possible to use mixed finite element methods which do not fulfill the so-called Babuska–Brezzi–Ladyzhenskaya (or inf-sup) condition. The penalty approach is also commonly used to prescribe (possibly nonhomogeneous) Dirichlet boundary conditions on a boundary. The pioneering papers [Nit71] and [Bab73] already addressed in the early 1970s the problem of error estimation with respect to both parameters h and ε. Those works have been widely used since then, and this area has recently experienced renewed interest, triggered by problems arising in domain decomposition (see, e.g., [BHS03]), discontinuous Galerkin methods [BE07], or handling of discontinuities for elliptic problems with discontinuous coefficients [HH02]. We will focus here on another type of constraint, namely geometrical ones: we are interested in solving an elliptic problem on a domain Ω \ O, where Ω is a simply shaped domain (e.g., a rectangle) and O a set of holes, and we aim at replacing it by a new problem posed over the global domain Ω. The simplest situation one may consider consists in solving a Poisson problem in a perforated, rectangular domain Ω, with homogeneous Dirichlet boundary conditions on the holes and over the external boundary. In order to use a Cartesian mesh which covers the whole domain (which can be of great interest if the holes are intended to move), it is natural to consider the penalized version of the problem, which consists in minimizing (O denotes the subdomain covered by the holes)
\[ \frac{1}{2} \int_\Omega |\nabla v|^2 - \int_\Omega f v + \frac{1}{2\varepsilon} \int_O \left( v^2 + |\nabla v|^2 \right) \]
over H01 (Ω). Another situation where the penalty approach has already proved to be quite efficient is the modeling of fluid-particle flows (see [RPVC05] or [JLM05]). The scalar version of this problem, which we shall address in detail in the following pages, consists in minimizing the standard functional
\[ J(v) = \frac{1}{2} \int_\Omega |\nabla v|^2 - \int_\Omega f v \]
over all those functions which are constant on each connected component of the set of holes O. Again, the constraint is easily relaxed by adding to J a term which penalizes the H 1 seminorm of v over O. Two points argue for the use of this approach:
1. The use of a Cartesian mesh makes this approach quite easy to implement: both cases reduce to a few lines of instructions within user-friendly finite element solvers like Freefem++ [FFp] for two-dimensional problems, or Freefem3D [FFp] for three-dimensional ones. Note that the penalty terms do not preserve the spectrum of the discrete Laplacian matrix, which prevents us from using standard fast solvers like the fast Fourier transform (unlike Lagrange multiplier based fictitious domain methods [PG02, GG95], which do preserve the structure of the matrix, at the price of an iterative algorithm on the Lagrange multipliers). A harmful effect upon the condition number of the solution matrix is furthermore to be expected. Yet, as the penalty parameter does not need to be taken too small, the method remains quite competitive for reasonably sized problems.
2. This method provides, with no extra computational cost, an approximation of the Lagrange multiplier associated with the constraint, which is of great significance from the modeling standpoint in many situations. For example, in the first situation we considered, which can be seen as the stationary heat equation, it is quite straightforward that, if we denote by $u^\varepsilon$ the solution to the penalized problem, $\xi^\varepsilon \in H^{-1}$ defined as
\[ \langle \xi^\varepsilon, v \rangle = \frac{1}{\varepsilon} \int_O \left( u^\varepsilon v + \nabla u^\varepsilon \cdot \nabla v \right) \]
approximates the heat source which is necessary to fulfill the constraint. We shall establish that this natural outcome of the method is still provided by the discretized/penalized version. Note that this property has already been used to handle numerically the motion of a three-dimensional turbine in a Navier–Stokes fluid (see [DPM07]).

As for the theoretical analysis of the method, the error due to the fact that the mesh is not boundary fitted is analyzed in [AR08, RAB07]. See also [SMSTT05] for similar estimates used to establish the convergence of a method to handle the motion of a rigid body in the limit ε = 0. Yet, to the best of our knowledge, a full error estimate (simultaneous convergence of h and ε toward 0) has not yet been provided for the type of volume penalty approach we propose here. We aim here at showing that the global error can be controlled, as expected, by the sum of the penalty error and the space-discretization error, under quite general assumptions.

This paper is organized as follows: in section 2, we recall some standard properties of the penalty method in the framework of constrained quadratic minimization, including some general facts about the space discretization of those problems. Section 3 is devoted to the main result: an abstract estimate for the primal and the dual parts of the discretized/penalized problem. The next section is concerned with a model problem, in the spirit of fluid-particle flows, for which we present in detail how the abstract estimate can be applied.
Finally, we present in section 5 some other typical situations where the abstract estimate can be used.

2. Preliminaries, abstract framework.

2.1. Continuous problem. We recall here some standard properties concerning the penalty method applied to infinite dimensional problems. Most of those properties are established in [BF91], with a slightly different formalism. We consider the following set of assumptions:
\[ \left. \begin{array}{l} V \text{ is a Hilbert space}, \quad \varphi \in V', \\ a(\cdot,\cdot) \text{ bilinear, symmetric, continuous, elliptic } (a(v,v) \ge \alpha |v|^2), \\ b(\cdot,\cdot) \text{ bilinear, symmetric, continuous, nonnegative}, \\ K = \{ u \in V,\ b(u,u) = 0 \} = \ker b, \\ J(v) = \frac{1}{2} a(v,v) - \langle \varphi, v \rangle, \quad u = \arg\min_K J, \\ J_\varepsilon(v) = \frac{1}{2} a(v,v) + \frac{1}{2\varepsilon} b(v,v) - \langle \varphi, v \rangle, \quad u^\varepsilon = \arg\min_V J_\varepsilon. \end{array} \right\} \tag{2.1} \]
Proposition 2.1. Under assumptions (2.1), the solution uε to the penalized problem converges to u.
Proof. As the family $(J_\varepsilon)$ is uniformly elliptic, $|u^\varepsilon|$ is bounded. We extract a subsequence, still denoted by $(u^\varepsilon)$, which converges weakly to some $z \in V$. As $J_\varepsilon \ge J$ and $b(u,u) = 0$, we have
\[ J(u^\varepsilon) \le J_\varepsilon(u^\varepsilon) \le J_\varepsilon(u) = J(u) \qquad \forall \varepsilon > 0, \tag{2.2} \]
so that ($J$ is convex and continuous) $J(z) \le \liminf J(u^\varepsilon) \le J(u)$. As
\[ J(u^\varepsilon) + \frac{1}{2\varepsilon} b(u^\varepsilon, u^\varepsilon) \le J(u), \]
$b(u^\varepsilon, u^\varepsilon)/\varepsilon$ is bounded, so that $b(u^\varepsilon, u^\varepsilon)$ goes to 0 with ε. Consequently, it holds that $0 \le b(z,z) \le \liminf b(u^\varepsilon, u^\varepsilon) = 0$, which implies $z \in K$, so that $z = u$. To establish the strong character of the convergence, we show that $u^\varepsilon$ converges toward $u$ for the norm associated with $a(\cdot,\cdot)$, which is equivalent to the original norm. As $u^\varepsilon$ converges weakly to $u$ for this scalar product ($a(u^\varepsilon, v) \to a(u,v)$ for any $v \in V$), it is sufficient to establish the convergence of $|u^\varepsilon|_a = a(u^\varepsilon, u^\varepsilon)^{1/2}$ toward $|u|_a$. First, $|u|_a \le \liminf |u^\varepsilon|_a$, and the other inequality comes from (2.2):
\[ \frac{1}{2} a(u^\varepsilon, u^\varepsilon) - \langle \varphi, u^\varepsilon \rangle \le \frac{1}{2} a(u,u) - \langle \varphi, u \rangle, \]
so that $\limsup |u^\varepsilon|_a \le |u|_a$.

The proposition does not say anything about the rate of convergence, and it can be very poor, as the following example illustrates.

Example 2.1. Consider $I = ]0,1[$, $V = H^1(I)$, and the problem which consists in minimizing the functional
\[ J(v) = \frac{1}{2} \int_I |v'|^2 \]
over $K = \{ v \in V,\ v(x) = 0 \text{ a.e. in } O = ]0,1/2[ \}$. The solution to that problem is obviously $u = \max(0, 2(x - 1/2))$. Now let us denote by $u^\varepsilon$ the minimum of the penalized functional
\[ J_\varepsilon(v) = \frac{1}{2} \int_I |v'|^2 + \frac{1}{2\varepsilon} \int_O |v|^2. \]
The solution to the penalized problem can be computed exactly:
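\[ u^\varepsilon(x) = k_\varepsilon \operatorname{sh}\left( \frac{x}{\sqrt{\varepsilon}} \right) \ \text{in } ]0,1/2[ \quad \text{with} \quad k_\varepsilon = \left( \operatorname{sh} \frac{1}{2\sqrt{\varepsilon}} + \frac{1}{2\sqrt{\varepsilon}} \operatorname{ch} \frac{1}{2\sqrt{\varepsilon}} \right)^{-1}, \]
and $u^\varepsilon$ affine in $]1/2,1[$, continuous at 1/2. This makes it possible to estimate $|u^\varepsilon - u|$, which turns out to behave like $\varepsilon^{1/4}$. Yet, in many situations, convergence can be shown to be of order 1, provided some assumptions are fulfilled. Let us introduce $\xi \in V'$ as the unique linear functional such that
\[ a(u,v) + \langle \xi, v \rangle = \langle \varphi, v \rangle \qquad \forall v \in V. \tag{2.3} \]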
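The ε^{1/4} rate quoted in Example 2.1 can be checked directly from the closed form. The sketch below is an illustration, not part of the original analysis: it assumes the boundary conditions u(0) = 0 and u(1) = 1, which are not stated in the example but are the ones consistent with u = max(0, 2(x - 1/2)) and with a penalized solution proportional to sh(x/√ε) on ]0, 1/2[. The error is measured in the H^1 seminorm, and the function names are hypothetical.

```python
import math


def h1_error(eps: float) -> float:
    """H^1-seminorm of u_eps - u in Example 2.1, assuming u(0) = 0 and u(1) = 1.

    On ]0, 1/2[ : u = 0 and u_eps = k * sinh(x / sqrt(eps)).
    On ]1/2, 1[ : both functions are affine and equal 1 at x = 1.
    Intended for eps >= 1e-6 so that sinh does not overflow.
    """
    s = 1.0 / (2.0 * math.sqrt(eps))      # value of x / sqrt(eps) at x = 1/2
    t = math.tanh(s)
    a = t / (t + s)                       # a = u_eps(1/2) = k * sinh(s)
    # Integral of (u_eps')^2 over ]0, 1/2[, written in terms of a to avoid overflow.
    left = a * a * ((s / math.sinh(s)) ** 2 + s / t)
    # On ]1/2, 1[ the slopes are 2 and 2 * (1 - a), so the difference is -2 * a.
    right = 2.0 * a * a
    return math.sqrt(left + right)


def observed_rate(eps_coarse: float, eps_fine: float) -> float:
    """Exponent p such that the error behaves like eps**p between two values of eps."""
    ratio = h1_error(eps_coarse) / h1_error(eps_fine)
    return math.log(ratio) / math.log(eps_coarse / eps_fine)


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4, 1e-5):
        print(f"eps = {eps:.0e}   error = {h1_error(eps):.5f}")
    print("observed rate:", observed_rate(1e-4, 1e-5))
```

Under these assumptions the observed exponent settles near 1/4 as ε decreases, in agreement with the behavior stated in the example.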
Before stating the first order convergence result, we show here that the penalty method provides an approximation of ξ.
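Proposition 2.2. Let $\xi^\varepsilon \in V'$ be defined by
\[ v \in V \longmapsto \langle \xi^\varepsilon, v \rangle = \frac{1}{\varepsilon} b(u^\varepsilon, v). \]
Then $\xi^\varepsilon$ converges (strongly) to $\xi$ in $V'$, at least as fast as $u^\varepsilon$ converges to $u$.

Proof. The variational formulation of the penalized problem reads
\[ a(u^\varepsilon, v) + \frac{1}{\varepsilon} b(u^\varepsilon, v) = \langle \varphi, v \rangle \qquad \forall v \in V. \tag{2.4} \]
The result is then a direct consequence of the identity which we obtain by subtracting (2.4) from (2.3):
\[ \langle \xi, v \rangle - \frac{1}{\varepsilon} b(u^\varepsilon, v) = a(u^\varepsilon - u, v) \qquad \forall v \in V, \]
which yields $\| \xi - \xi^\varepsilon \|_{V'} \le C |u - u^\varepsilon|$.

Let us now establish the first order convergence, provided an extra compatibility condition between $b(\cdot,\cdot)$ and $\xi$ is met.

Proposition 2.3. Under assumptions (2.1), we assume in addition that there exists $\tilde{\xi} \in V$ such that $b(\tilde{\xi}, v) = \langle \xi, v \rangle$ for all $v \in V$. Then $|u^\varepsilon - u| = O(\varepsilon)$.

Proof. First of all, notice that it is possible to pick $\tilde{\xi}$ in $K^\perp$ (if not, we project it onto $K^\perp$). Now following the idea which is proposed in [Bab73] in a slightly different context (see the proof of Thm. 3.2 therein), we introduce
\[ R_\varepsilon(v) = \frac{1}{2} a(u - v, u - v) + \frac{1}{2\varepsilon} b(\varepsilon\tilde{\xi} - v, \varepsilon\tilde{\xi} - v), \]
which can be written
\[ R_\varepsilon(v) = \frac{1}{2} a(u,u) + \frac{\varepsilon}{2} b(\tilde{\xi}, \tilde{\xi}) + \frac{1}{2} a(v,v) + \frac{1}{2\varepsilon} b(v,v) - a(u,v) - b(\tilde{\xi}, v). \]
As $b(\tilde{\xi}, v) = \langle \xi, v \rangle$ and $-a(u,v) - \langle \xi, v \rangle = -\langle \varphi, v \rangle$, the functional $R_\varepsilon$ is equal to $J_\varepsilon$ up to a constant. Therefore minimizing $R_\varepsilon$ amounts to minimizing $J_\varepsilon$. Let us now introduce $w = \varepsilon\tilde{\xi} + u$. We have
\[ R_\varepsilon(w) = \frac{\varepsilon^2}{2} a(\tilde{\xi}, \tilde{\xi}) + 0 \quad \text{because } u \in K = \ker b, \]
so that $|R_\varepsilon(w)| \le C\varepsilon^2$. As $u^\varepsilon$ minimizes $R_\varepsilon$,
\[ 0 \le R_\varepsilon(u^\varepsilon) = \frac{1}{2} a(u - u^\varepsilon, u - u^\varepsilon) + \frac{1}{2\varepsilon} b(\varepsilon\tilde{\xi} - u^\varepsilon, \varepsilon\tilde{\xi} - u^\varepsilon) \le C\varepsilon^2, \]
from which we deduce, as $a(\cdot,\cdot)$ is elliptic, $|u - u^\varepsilon| = O(\varepsilon)$.

Corollary 2.4. Under assumptions (2.1), we assume in addition that $b(\cdot,\cdot)$ can be written $b(u,v) = (Bu, Bv)$, where $B$ is a linear continuous operator onto a Hilbert space Λ, with closed range. Then $|u^\varepsilon - u| = O(\varepsilon)$.

Proof. Let us show that the assumption of Proposition 2.3 is met. It is sufficient to prove that any $\xi \in V'$ which vanishes over $K$ identifies through $b(\cdot,\cdot)$ with some $\tilde{\xi} \in V$; i.e., there exists $\tilde{\xi} \in V$ such that
\[ \langle \xi, v \rangle = b(\tilde{\xi}, v) \qquad \forall v \in V. \]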
Note that, as $\xi$ vanishes over $K$, it can be seen as a linear functional defined on $K^\perp$, so that it is equivalent to establish that $T : V \longrightarrow (K^\perp)'$ defined by
\[ \tilde{\xi} \longmapsto \xi : \quad \langle \xi, v \rangle = b(\tilde{\xi}, v) \qquad \forall v \in K^\perp \]
is surjective. We denote by $T^\star \in \mathcal{L}(K^\perp, V')$ the adjoint of $T$. For all $w \in K^\perp$,
\[ |T^\star w| = \sup_{v \ne 0} \frac{(T^\star w, v)}{|v|} = \sup_{v \ne 0} \frac{b(w,v)}{|v|} = \sup_{v \ne 0} \frac{(Bw, Bv)}{|v|} \ge \frac{|Bw|^2}{|w|}. \]
As $B$ has closed range, $|Bw| \ge C|w|$ for all $w$ in $(\ker B)^\perp = K^\perp$, so that
\[ |T^\star w| \ge C^2 |w| \qquad \forall w \in K^\perp, \]
from which we conclude that $T$ is surjective.

Remark 2.1. Note that Proposition 2.3 is strictly stronger than its corollary. Indeed, consider the handling of homogeneous Dirichlet boundary conditions by penalty: $V = H^1(\Omega)$, where Ω is a smooth, bounded domain, $a(u,v) = \int_\Omega \nabla u \cdot \nabla v$, and $\langle \varphi, v \rangle = \int_\Omega f v$, where $f$ is in $L^2(\Omega)$, and $b(v,v) = \int_{\partial\Omega} v^2$. In this situation the corollary cannot be used, because the trace operator from $H^1(\Omega)$ onto $L^2(\partial\Omega)$ does not have a closed range. On the other hand one can establish that
\[ \langle \xi, v \rangle = \int_{\partial\Omega} \frac{\partial u}{\partial n} v, \]
and, as the solution $u$ is regular ($u \in H^2(\Omega)$), its normal derivative (in $H^{1/2}(\partial\Omega)$) can be built as the trace of a function $\tilde{\xi}$ in $H^1(\Omega)$, so that Proposition 2.3 holds true.

We conclude this section with some considerations concerning the saddle-point formulation of the constrained problem, which will be useful in the following. We consider again the closed-range situation.

Proposition 2.5. Under the assumptions of Corollary 2.4, there exists $\lambda \in \Lambda$ such that
\[ a(u,v) + (\lambda, Bv) = \langle \varphi, v \rangle \qquad \forall v \in V. \tag{2.5} \]
The solution is unique in $B(V)$ (which identifies with $\Lambda / \ker B^\star$).

Proof. The proof of this standard property can be found in [BF91]. In fact, it has just been established in the proof of Corollary 2.4: $\lambda$ is simply $B\tilde{\xi}$. Uniqueness is straightforward.

Proposition 2.6. Under the assumptions of Proposition 2.5 (assumptions (2.1) and $B(V)$ is closed), we introduce
\[ \lambda^\varepsilon = \frac{1}{\varepsilon} B u^\varepsilon. \]
Then $|\lambda^\varepsilon - \lambda| = O(\varepsilon)$, where $\lambda$ is the unique solution of (2.5) in $B(V)$.

Proof. Subtracting the variational formulations for $u$ and $u^\varepsilon$, we get
\[ (\lambda^\varepsilon - \lambda, Bv) = a(u - u^\varepsilon, v) \qquad \forall v \in V. \]
Now, as the range of $B$ is closed, and $\lambda^\varepsilon - \lambda \in B(V) = (\ker B^\star)^\perp$, we have the inf-sup condition (see, e.g., [BF91])
\[ \sup_{v \in V} \frac{(\lambda^\varepsilon - \lambda, Bv)}{|v|} \ge \beta |\lambda^\varepsilon - \lambda|, \]
so that
\[ \beta |\lambda^\varepsilon - \lambda| \le \sup_{v \in V} \frac{(\lambda^\varepsilon - \lambda, Bv)}{|v|} = \sup_{v \in V} \frac{a(u - u^\varepsilon, v)}{|v|} \le \|a\| \, |u^\varepsilon - u|, \]
which ensures the first order convergence thanks to Corollary 2.4.

Corollary 2.7. For any $z \in V$ such that $Bz = \lambda$, there exists a sequence $(v^\varepsilon)$ in $\ker B$ such that
\[ \left| \frac{u^\varepsilon}{\varepsilon} - v^\varepsilon - z \right| = O(\varepsilon). \]
Proof. This is a direct consequence of the fact that, $B(V)$ being closed, the restriction of $B$ to $(\ker B)^\perp$ is a bicontinuous bijection between $(\ker B)^\perp$ and $B(V)$. The convergence is therefore obtained by taking $v^\varepsilon = P_{\ker B}(u^\varepsilon/\varepsilon - z)$.

2.2. Discretized problem. We consider now a family $(V_h)_h$ of inner approximation spaces ($V_h \subset V$) and the associated penalized/discretized problems
\[ \left\{ \begin{array}{l} \text{Find } u_h^\varepsilon \in V_h \text{ such that } J_\varepsilon(u_h^\varepsilon) = \inf_{v_h \in V_h} J_\varepsilon(v_h), \\ J_\varepsilon(v_h) = \frac{1}{2} a(v_h, v_h) + \frac{1}{2\varepsilon} b(v_h, v_h) - \langle \varphi, v_h \rangle. \end{array} \right. \tag{2.6} \]
As far as we know, there does not exist any general theory which would give an upper bound for the error $|u - u_h^\varepsilon|$ as the sum of a discretization error (typically $h$ or $h^{1/2}$ for volume penalty, depending on whether the mesh is boundary-fitted or not), and a penalty error (typically ε for closed-range penalty terms, possibly poorer in general situations, as in Example 2.1). We propose here two general properties which are direct consequences of standard arguments. They are suboptimal in the sense that neither of them is optimal from both standpoints (discretization and penalty), but, at least, they make it possible to recover the behavior in extreme situations (when ε goes to 0 much quicker than $h$, and the opposite). The first proposition uses the following lemma.

Lemma 2.8. Under assumptions (2.1), there exists $C > 0$ such that
\[ b(u^\varepsilon, u^\varepsilon) \le C\varepsilon |u - u^\varepsilon|. \]
Proof. By definition of $u^\varepsilon$,
\[ J_\varepsilon(u^\varepsilon) = \frac{1}{2} a(u^\varepsilon, u^\varepsilon) - \langle \varphi, u^\varepsilon \rangle + \frac{1}{2\varepsilon} b(u^\varepsilon, u^\varepsilon) \le J_\varepsilon(u) = \frac{1}{2} a(u,u) - \langle \varphi, u \rangle, \]
so that
\[ 0 \le \frac{1}{2\varepsilon} b(u^\varepsilon, u^\varepsilon) \le \frac{1}{2} a(u,u) - \frac{1}{2} a(u^\varepsilon, u^\varepsilon) + \langle \varphi, u^\varepsilon - u \rangle \le \frac{1}{2} a(u + u^\varepsilon, u - u^\varepsilon) + \langle \varphi, u^\varepsilon - u \rangle, \]
which yields the estimate by continuity of $a(\cdot,\cdot)$ and $\varphi$.

Proposition 2.9. Under assumptions (2.1), we denote by $u_h^\varepsilon$ the solution to problem (2.6). Then
\[ |u_h^\varepsilon - u| \le C \left( \min_{v_h \in V_h \cap K} |v_h - u| + \sqrt{|u^\varepsilon - u|} \right). \]
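Proof. As $u_h^\varepsilon$ minimizes $a(v - u^\varepsilon, v - u^\varepsilon) + b(v - u^\varepsilon, v - u^\varepsilon)/\varepsilon$ over $V_h$,
\[ \begin{aligned} \alpha |u_h^\varepsilon - u^\varepsilon|^2 &\le a(u_h^\varepsilon - u^\varepsilon, u_h^\varepsilon - u^\varepsilon) \\ &\le a(u_h^\varepsilon - u^\varepsilon, u_h^\varepsilon - u^\varepsilon) + \frac{1}{\varepsilon} b(u_h^\varepsilon - u^\varepsilon, u_h^\varepsilon - u^\varepsilon) \\ &\le \min_{v_h \in V_h} \left( a(v_h - u^\varepsilon, v_h - u^\varepsilon) + \frac{1}{\varepsilon} b(v_h - u^\varepsilon, v_h - u^\varepsilon) \right) \\ &\le \min_{v_h \in V_h \cap K} \left( a(v_h - u^\varepsilon, v_h - u^\varepsilon) + \frac{1}{\varepsilon} b(v_h - u^\varepsilon, v_h - u^\varepsilon) \right). \end{aligned} \]
As $v_h$ is in $K$, the second term is $b(u^\varepsilon, u^\varepsilon)/\varepsilon$, which is bounded by $C|u^\varepsilon - u|$ (by Lemma 2.8). Finally, we get
\[ |u_h^\varepsilon - u^\varepsilon| \le C \left( \min_{v_h \in V_h \cap K} |v_h - u^\varepsilon| + \sqrt{|u^\varepsilon - u|} \right), \]
from which we conclude.

Proposition 2.10. Under assumptions (2.1), $V_h \subset V$, and $u_h^\varepsilon$ being the solution to (2.6), it holds that
\[ |u_h^\varepsilon - u| \le \frac{C}{\sqrt{\varepsilon}} \inf_{v_h \in V_h} |u^\varepsilon - v_h| + |u^\varepsilon - u|. \]
Proof. One has $|u_h^\varepsilon - u| \le |u_h^\varepsilon - u^\varepsilon| + |u^\varepsilon - u|$, and we control the first term by Céa's lemma applied to the symmetric bilinear form $a + b/\varepsilon$, whose continuity constant behaves like $1/\varepsilon$ while its ellipticity constant stays bounded from below.

The following example illustrates how those estimates can be used in practice.

Example 2.2. The simplest example of penalty formulation one may think about is the following: the constraint to vanish on the boundary of a subdomain $O \subset\subset \Omega$ is handled by minimizing the functional
\[ J_\varepsilon(v) = \frac{1}{2} \int_\Omega |\nabla v|^2 - \int_\Omega f v + \frac{1}{2\varepsilon} \int_O v^2. \tag{2.7} \]
Now considering the $L^2$ penalty method in $O$, if we admit the $\varepsilon^{1/4}$ convergence of $|u^\varepsilon - u|$, Proposition 2.9 provides an estimate in $h^{1/2} + \varepsilon^{1/8}$. This estimate is optimal in $h$: the natural space discretization order is obtained if ε is small enough ($\varepsilon = h^4$ in the present case). Symmetrically, the natural order in ε can be recovered if $h$ is small enough: Indeed, if we admit that $u^\varepsilon$ can be approximated at the same order as $u$ over Ω, which is 1/2, then the choice $\varepsilon = h^{4/3}$ in Proposition 2.10 gives
\[ |u_h^\varepsilon - u| \le \frac{C}{\varepsilon^{1/2}} \varepsilon^{3/4} + \varepsilon^{1/4} = O(\varepsilon^{1/4}). \]
Note that if we replace $v^2$ by $v^2 + |\nabla v|^2$ in the integral over $O$ in (2.7), assumptions of Corollary 2.4 are fulfilled, so that convergence holds at the first order in ε. As a consequence, $|u - u_h^\varepsilon|$ is bounded by $C(h^{1/2} + \varepsilon^{1/2})$ (by Proposition 2.9), which suggests the choice $\varepsilon = h$.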
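The first-order behavior obtained for closed-range penalty terms (Corollary 2.4 for u^ε, Proposition 2.6 for λ^ε = Bu^ε/ε) can be observed on a small finite-dimensional analogue, where the range of B is automatically closed. The sketch below is only an illustration under assumed data: V is the plane, and the matrix A, the row vector B, the right-hand side PHI, and all function names are hypothetical choices, not taken from the paper.

```python
import math


def solve2(m, rhs):
    """Solve the 2x2 linear system m x = rhs by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det)


A = ((2.0, 1.0), (1.0, 3.0))   # a(u, v) = v^T A u, symmetric positive definite
B = (1.0, 1.0)                 # B u = u_1 + u_2, so K = ker B = {u_1 + u_2 = 0}
PHI = (1.0, 2.0)               # <phi, v> = PHI . v


def constrained_solution():
    """Return (u, lam) such that A u + B^T lam = PHI and B u = 0 (saddle point)."""
    a_inv_phi = solve2(A, PHI)
    a_inv_bt = solve2(A, B)
    schur = B[0] * a_inv_bt[0] + B[1] * a_inv_bt[1]           # B A^{-1} B^T
    lam = (B[0] * a_inv_phi[0] + B[1] * a_inv_phi[1]) / schur
    u = (a_inv_phi[0] - lam * a_inv_bt[0], a_inv_phi[1] - lam * a_inv_bt[1])
    return u, lam


def penalized_solution(eps):
    """Return (u_eps, lam_eps): u_eps solves (A + B^T B / eps) u = PHI, lam_eps = B u_eps / eps."""
    m = tuple(tuple(A[i][j] + B[i] * B[j] / eps for j in range(2)) for i in range(2))
    u = solve2(m, PHI)
    return u, (B[0] * u[0] + B[1] * u[1]) / eps


def errors(eps):
    """Return (|u_eps - u|, |lam_eps - lam|) for a given penalty parameter."""
    (u, lam), (u_eps, lam_eps) = constrained_solution(), penalized_solution(eps)
    return math.hypot(u_eps[0] - u[0], u_eps[1] - u[1]), abs(lam_eps - lam)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3, 1e-4):
        err_u, err_lam = errors(eps)
        print(f"eps = {eps:.0e}   |u_eps - u| = {err_u:.3e}   |lam_eps - lam| = {err_lam:.3e}")
```

Dividing ε by 10 divides both errors by roughly 10 in this setting, which is the O(ε) behavior of the primal and dual unknowns; it says nothing about the discretization error, which has no counterpart in this finite-dimensional sketch.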
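3. Full error estimate. As shall be made clear below, a full and optimal error estimate calls for a uniform discrete inf-sup condition. In the case of a nonconforming mesh, it appears immediately that the penalty term has to be modified. To anticipate this difficulty, we introduce a modified version of $B$, namely $B_h$, in this abstract approach. No assumption is made a priori on $B_h$ in terms of approximation properties, but the estimate we establish below will not express any convergence property unless $B_h$ approaches $B$ in some sense. Besides (2.1), we consider the following set of additional assumptions and notation:
\[ \left. \begin{array}{l} b(v,v) = (Bv, Bv), \text{ where } B \in \mathcal{L}(V, \Lambda) \text{ has a closed range}, \\ (V_h)_h \text{ family of approximation spaces}, \quad V_h \subset V, \\ B_h \in \mathcal{L}(V, \Lambda), \quad \ker B \subset \ker B_h, \quad B_h \text{ bounded}, \quad \Lambda_h = B_h(V_h), \\ J_h^\varepsilon(v_h) = J(v_h) + \frac{1}{2\varepsilon} (B_h v_h, B_h v_h), \\ u_h^\varepsilon = \arg\min_{V_h} J_h^\varepsilon, \quad \lambda_h^\varepsilon = \frac{1}{\varepsilon} B_h u_h^\varepsilon \in \Lambda_h, \\ \sup_{v_h \in V_h} \dfrac{(B_h v_h, \lambda_h)}{|v_h|} \ge \beta |\lambda_h| \quad \forall \lambda_h \in \Lambda_h. \end{array} \right\} \tag{3.1} \]
Theorem 3.1 (primal/dual error estimate). Under assumptions (2.1) and (3.1), we have the following error estimate:
\[ |u - u_h^\varepsilon| + |\lambda - \lambda_h^\varepsilon| \le C \left( \varepsilon + \inf_{\tilde{u}_h \in V_h} |\tilde{u}_h - u| + \inf_{\tilde{\lambda}_h \in \Lambda_h} |\tilde{\lambda}_h - \lambda| + |(B_h^\star - B^\star)\lambda| + |(B_h - B)z| \right), \tag{3.2} \]
where $z$ is such that $\lambda = Bz$.

Proof. The proof relies on some general properties of the continuous penalty method which we established in section 2, and an abstract stability estimate for saddle-point-like problems with stabilization (see Proposition 3.2 below). First of all, note that, as the range of $B$ is closed, the convergence of $u^\varepsilon$ toward $u$ holds at the first order (by Corollary 2.4). As another consequence, $\lambda^\varepsilon = Bu^\varepsilon/\varepsilon$ is such that $|\lambda - \lambda^\varepsilon| = O(\varepsilon)$ (by Proposition 2.6). We write the continuous penalized problem
\[ \begin{array}{rcll} a(u^\varepsilon, v) + (\lambda^\varepsilon, Bv) &=& \langle \varphi, v \rangle & \forall v \in V, \\ (Bu^\varepsilon, \mu) - \varepsilon(\lambda^\varepsilon, \mu) &=& 0 & \forall \mu \in \Lambda \end{array} \]
and the discrete penalized problem in a saddle-point form
\[ \begin{array}{rcll} a(u_h^\varepsilon, v_h) + (\lambda_h^\varepsilon, B_h v_h) &=& \langle \varphi, v_h \rangle & \forall v_h \in V_h, \\ (B_h u_h^\varepsilon, \mu_h) - \varepsilon(\lambda_h^\varepsilon, \mu_h) &=& 0 & \forall \mu_h \in \Lambda_h. \end{array} \]
As $\Lambda_h$ is exactly $B_h(V_h)$, this problem admits a unique solution $(u_h^\varepsilon, \lambda_h^\varepsilon)$ (see Proposition 2.5). For any $(\tilde{u}_h, \tilde{\lambda}_h) \in V_h \times \Lambda_h$, $v_h \in V_h$, $\mu_h \in \Lambda_h$,
\[ \left\{ \begin{array}{rcl} a(\tilde{u}_h - u_h^\varepsilon, v_h) + (\tilde{\lambda}_h - \lambda_h^\varepsilon, B_h v_h) &=& a(\tilde{u}_h - u^\varepsilon, v_h) + (\tilde{\lambda}_h - \lambda^\varepsilon, B_h v_h) + \langle (B_h^\star - B^\star)\lambda^\varepsilon, v_h \rangle, \\ (B_h(\tilde{u}_h - u_h^\varepsilon), \mu_h) - \varepsilon(\tilde{\lambda}_h - \lambda_h^\varepsilon, \mu_h) &=& (B_h(\tilde{u}_h - u^\varepsilon), \mu_h) - \varepsilon(\tilde{\lambda}_h - \lambda^\varepsilon, \mu_h) + ((B_h - B)u^\varepsilon, \mu_h). \end{array} \right. \]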
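Our purpose is to use Proposition 3.2 ($V_h$ and $\Lambda_h$ play the role of $V$ and $\Lambda$ in the proposition, respectively) with
\[
\langle\varphi,v_h\rangle = a(\tilde u_h-u^\varepsilon,v_h) + (\tilde\lambda_h-\lambda^\varepsilon,B_h v_h) + \langle(B_h^\star-B^\star)\lambda^\varepsilon,v_h\rangle,
\tag{3.3}
\]
\[
\langle\Psi,\mu_h\rangle = (B_h(\tilde u_h-u^\varepsilon),\mu_h) - \varepsilon(\tilde\lambda_h-\lambda^\varepsilon,\mu_h) + ((B_h-B)u^\varepsilon,\mu_h).
\tag{3.4}
\]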
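The last term of (3.3) is transformed as follows:
\[
(B_h^\star-B^\star)\lambda^\varepsilon = (B_h^\star-B^\star)\lambda + (B_h^\star-B^\star)(\lambda^\varepsilon-\lambda),
\]
where $\lambda\in B(V)$ is the exact Lagrange multiplier defined by Proposition 2.5. So, defining
\[
c(\mu,\mu') = \varepsilon(\mu,\mu'),\qquad w = \tilde u_h-u^\varepsilon,\qquad \gamma = -(\tilde\lambda_h-\lambda^\varepsilon) + (B_h-B)\frac{u^\varepsilon}{\varepsilon}
\]
(see (3.7) for the meaning of $w$ and $\gamma$), Proposition 3.2 ensures existence of a constant $C>0$ (which does not depend on $h$) such that $|\tilde u_h-u_h^\varepsilon| + |\tilde\lambda_h-\lambda_h^\varepsilon|$ is less than
\[
C\big(|\tilde u_h-u^\varepsilon| + |\tilde\lambda_h-\lambda^\varepsilon| + |(B_h^\star-B^\star)\lambda| + |\gamma|\big).
\]
The second contribution to $\gamma$ can be written, thanks to Corollary 2.7 and the fact that $\ker B\subset\ker B_h$,
\[
(B_h-B)\frac{u^\varepsilon}{\varepsilon} = (B_h-B)\Big(\frac{u^\varepsilon}{\varepsilon} - v^\varepsilon - z\Big) + (B_h-B)z,
\]
where $v^\varepsilon\in\ker B$, and $z$ is such that $Bz=\lambda$, which yields
\[
|\gamma| \le |\tilde\lambda_h-\lambda^\varepsilon| + O(\varepsilon) + |(B_h-B)z|.
\]
We finally obtain that $|u^\varepsilon-u_h^\varepsilon| + |\lambda^\varepsilon-\lambda_h^\varepsilon|$ is less than
\[
C\Big(\inf_{\tilde u_h\in V_h}|\tilde u_h-u^\varepsilon| + \inf_{\tilde\lambda_h\in\Lambda_h}|\tilde\lambda_h-\lambda^\varepsilon| + |(B_h^\star-B^\star)\lambda| + \varepsilon + |(B_h-B)z|\Big),
\]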
so that, by eliminating $u^\varepsilon$ in the left-hand side, and again using $|u^\varepsilon-u| = O(\varepsilon)$ and $|\lambda^\varepsilon-\lambda| = O(\varepsilon)$ (see Corollary 2.4 and Proposition 2.6), we obtain the error estimate.

Proposition 3.2 (abstract stability estimate). Let $V$ and $\Lambda$ be two Hilbert spaces, $B\in\mathcal{L}(V,\Lambda)$, $a(\cdot,\cdot)$ and $c(\cdot,\cdot)$ continuous bilinear forms, which we suppose elliptic. Then the problem
\[
\begin{cases}
a(u,v) + (\lambda,Bv) = \langle\varphi,v\rangle & \forall v\in V,\\
(Bu,\mu) - c(\lambda,\mu) = \langle\Psi,\mu\rangle & \forall\mu\in\Lambda
\end{cases}
\tag{3.5}
\]
admits a unique solution $(u,\lambda)\in V\times\Lambda$. We assume furthermore that there exists a constant $\beta>0$ such that¹
\[
\beta\,|P_{(\ker B)^\perp}v| \le |Bv|,\qquad \sup_{v\in V}\frac{(\mu,Bv)}{|v|} \ge \beta\,\|\mu\|_{\Lambda/\ker B^\star},
\tag{3.6}
\]
¹ As the second inequality of (3.6) is a direct consequence of the first one, it could be suppressed. We keep both assumptions for the sake of clarity.

that $\Psi$ can be written
\[
\langle\Psi,\mu\rangle = (Bw,\mu) + c(\gamma,\mu),
\tag{3.7}
\]
and finally that $c(\cdot,\cdot)$ verifies
\[
\mu_1\perp\mu_2 \;\Longrightarrow\; c(\mu_1,\mu_2) = 0.
\tag{3.8}
\]
Then we have the following estimate:
\[
|u| + |\lambda| \le C\big(\|\varphi\| + |w| + |\gamma|\big),
\tag{3.9}
\]
where $C$ is a locally bounded expression of $\|a\|$, $1/\alpha$, $1/\beta$, $\|B\|$, $\|c\|$ ($\alpha$ is the coercivity constant of $a(\cdot,\cdot)$). Note that $C$ does not depend upon the coercivity constant of $c(\cdot,\cdot)$.

Proof. The first part of the proposition is trivial. With obvious notation, problem (3.5) can be written
\[
\begin{cases}
Au + B^\star\lambda = \varphi,\\
Bu - M\lambda = \Psi,
\end{cases}
\tag{3.10}
\]
so that $(u,\lambda)$ is uniquely determined as
\[
u = (A + B^\star M^{-1}B)^{-1}\big(\varphi + B^\star M^{-1}\Psi\big),\qquad \lambda = M^{-1}(Bu-\Psi).
\]
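The elimination behind these two formulas can be spelled out as follows. This is only a sketch of the algebra, assuming (as the notation of (3.10) suggests) that $M$ is the invertible operator associated with the elliptic form $c(\cdot,\cdot)$ and that $B^\star$ denotes the adjoint of $B$.

```latex
\begin{align*}
Bu - M\lambda = \Psi
  &\;\Longrightarrow\; \lambda = M^{-1}(Bu - \Psi), \\
Au + B^\star\lambda = \varphi
  &\;\Longrightarrow\; Au + B^\star M^{-1}(Bu - \Psi) = \varphi \\
  &\;\Longrightarrow\; (A + B^\star M^{-1}B)\,u = \varphi + B^\star M^{-1}\Psi .
\end{align*}
```

Since $A$ is elliptic and $B^\star M^{-1}B$ is nonnegative, the operator $A + B^\star M^{-1}B$ is itself elliptic, hence invertible. The bound obtained this way involves $M^{-1}$, so it degenerates when the coercivity constant of $c(\cdot,\cdot)$ goes to zero, which is why the remainder of the proof proceeds differently.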
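In order to get an upper bound of $|u|$ which does not degenerate with $c(\cdot,\cdot)$, we introduce, following [BF91],
\[
u = \underbrace{u_0}_{\in\,\ker B} + \underbrace{u^\perp}_{\in\,(\ker B)^\perp},\qquad
\lambda = \underbrace{\lambda_0}_{\in\,\ker B^\star} + \underbrace{\lambda^\perp}_{\in\,(\ker B^\star)^\perp}.
\tag{3.11}
\]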
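From (3.6) and the first line of (3.5), we have
\[
\beta\,|\lambda^\perp| = \beta\,\|\lambda\|_{\Lambda/\ker B^\star} \le \sup_{v\in V}\frac{(\lambda,Bv)}{|v|} \le \|a\|\,|u| + \|\varphi\|.
\tag{3.12}
\]
From (3.6) again and the second line of (3.5), we get
\[
\beta\,|u^\perp| = \beta\,|P_{(\ker B)^\perp}u| \le |Bu| = \sup_{\mu\in\Lambda}\frac{(Bu,\mu)}{|\mu|} \le \|\Psi\| + \|c\|^{1/2}c(\lambda,\lambda)^{1/2}.
\tag{3.13}
\]
From the ellipticity of $a(\cdot,\cdot)$ and the first line of (3.5),
\[
\alpha\,|u_0| \le a\Big(u_0,\frac{u_0}{|u_0|}\Big) \le \sup_{v_0\in\ker B}\frac{a(u_0,v_0)}{|v_0|} = \sup_{v_0\in\ker B}\frac{a(u,v_0)-a(u^\perp,v_0)}{|v_0|} \le \|\varphi\| + \|a\|\,|u^\perp|.
\tag{3.14}
\]
From (3.13) and (3.14), we have
\[
\begin{aligned}
|u| \le |u^\perp| + |u_0|
 &\le \frac{1}{\beta}\Big(\|\Psi\| + \|c\|^{1/2}c(\lambda,\lambda)^{1/2}\Big) + \frac{1}{\alpha}\Big(\|\varphi\| + \|a\|\,|u^\perp|\Big)\\
 &\le \frac{1}{\beta}\Big(1+\frac{\|a\|}{\alpha}\Big)\Big(\|\Psi\| + \|c\|^{1/2}c(\lambda,\lambda)^{1/2}\Big) + \frac{\|\varphi\|}{\alpha}.
\end{aligned}
\tag{3.15}
\]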
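Now subtracting the two lines of (3.5) with $v=u$ and $\mu=\lambda$, we obtain
\[
a(u,u) + c(\lambda,\lambda) = \langle\varphi,u\rangle - \langle\Psi,\lambda\rangle = \langle\varphi,u\rangle - (Bw,\lambda) - c(\gamma,\lambda)
\le \|\varphi\|\,|u| + \|B\|\,|w|\,|\lambda^\perp| + c(\gamma,\gamma)^{1/2}c(\lambda,\lambda)^{1/2},
\]
so that, from (3.15) and (3.12),
\[
\begin{aligned}
a(u,u) + c(\lambda,\lambda) \le{}& \Big(\|\varphi\| + \frac{\|B\|}{\beta}\,|w|\,\|a\|\Big)\Big(\frac{\|\Psi\|}{\beta}\Big(1+\frac{\|a\|}{\alpha}\Big) + \frac{\|\varphi\|}{\alpha}\Big) + \frac{\|B\|}{\beta}\,|w|\,\|\varphi\|\\
&+ c(\lambda,\lambda)^{1/2}\Big(c(\gamma,\gamma)^{1/2} + \|c\|^{1/2}\,\frac{1}{\beta}\Big(1+\frac{\|a\|}{\alpha}\Big)\Big(\|\varphi\| + \frac{\|a\|\,\|B\|}{\beta}\,|w|\Big)\Big),
\end{aligned}
\tag{3.16}
\]
which can be written
\[
a(u,u) + c(\lambda,\lambda) \le P_0(\|\varphi\|,\|\Psi\|,|w|,|\gamma|_c) + c(\lambda,\lambda)^{1/2}\,P_1(\|\varphi\|,\|\Psi\|,|w|,|\gamma|_c),
\]
where $|\gamma|_c = c(\gamma,\gamma)^{1/2}$ and $P_0$ (resp., $P_1$) is a homogeneous polynomial of degree 2 (resp., 1) in its four variables. The coefficients of those polynomials are polynomial in $\|B\|$, $\|a\|$, $1/\beta$, $1/\alpha$, $\|c\|^{1/2}$ with positive coefficients. We write $X = c(\lambda,\lambda)^{1/2}$, so that $X^2\le P_1X+P_0$, which implies $|X|\le P_1+\sqrt{P_0}$, and finally
\[
c(\lambda,\lambda) = X^2 \le 2P_1^2 + 2P_0 = P_2(\|\varphi\|,\|\Psi\|,|w|,|\gamma|_c),
\]
where $P_2$ is a homogeneous polynomial of degree 2. It is dominated by the square of the sum of the moduli of its variables, so that
\[
c(\lambda,\lambda)^{1/2} \le C\big(\|\varphi\| + \|\Psi\| + |w| + |\gamma|_c\big).
\]
Again using (3.16) (we keep $C$ to denote a generic constant, or more precisely a polynomial in $\|B\|$, $\|a\|$, $1/\beta$, $1/\alpha$, $\|c\|^{1/2}$), we obtain immediately
\[
|u| \le C\big(\|\varphi\| + \|\Psi\| + |w| + |\gamma|_c\big).
\]
Finally, we write the second line of (3.5) with $\mu\in\ker B^\star$. As $c(\cdot,\cdot)$ verifies (3.8), it yields $\lambda_0 = P_{\ker B^\star}\gamma$, so that $|\lambda_0|\le|\gamma|$. As $|\gamma|_c\le\|c\|^{1/2}|\gamma|$, and $\|\Psi\|\le|w|+|\gamma|$, estimate (3.9) is obtained.

4. Application. This section is dedicated to the application of Theorem 3.1 to a particular problem, namely a scalar version of the rigidity constraint for fluid-particle flows.

4.1. Model problem. In order to present explicit constructions when needed, we consider a particular situation. We introduce $\Omega = \,]-2,2[^2$, and $O = B(0,1)\subset\subset\Omega$ (see Figure 4.1). The case of more general situations is addressed in Remark 4.2, at the end of this paper. We consider the following problem:
\[
\begin{cases}
-\Delta u = f & \text{in } \Omega\setminus O,\\
u = 0 & \text{on } \partial\Omega,\\
u = U & \text{on } \partial O,\\
\displaystyle\int_{\partial O}\frac{\partial u}{\partial n} = 0,
\end{cases}
\tag{4.1}
\]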
where $U$ is an unknown constant, and $f\in L^2(\Omega\setminus O)$. The scalar field $u$ can be seen as a temperature and $O$ as a zone with infinite conductivity.

Definition 4.1. We say that $u$ is a weak solution to (4.1) if $u\in V = H_0^1(\Omega)$, there exists $U\in\mathbb{R}$ such that $u=U$ a.e. in $O$, and
\[
\int_\Omega\nabla u\cdot\nabla v = \int_\Omega fv\quad\forall v\in\mathcal{D}_O(\Omega),
\]
where $\mathcal{D}_O(\Omega)$ is the set of all those functions which are compactly supported, $C^\infty$ on $\Omega$, and which are constant over $O$.

Proposition 4.2. Problem (4.1) admits a unique weak solution $u\in V = H_0^1(\Omega)$, which is characterized as the solution to the minimization problem
\[
\begin{cases}
\text{Find } u\in K \text{ such that}\\
J(u) = \inf_{v\in K}J(v),\ \text{with } J(v) = \dfrac{1}{2}\displaystyle\int_\Omega|\nabla v|^2 - \int_\Omega fv,\\
K = \big\{v\in H_0^1(\Omega),\ \nabla v = 0 \text{ a.e. in } O\big\},
\end{cases}
\tag{4.2}
\]
where $f$ has been extended by 0 inside $O$. Furthermore the restriction of $u$ to the domain $\Omega\setminus O$ is in $H^2(\Omega\setminus O)$.

Proof. Existence and uniqueness are direct consequences of the Lax–Milgram theorem applied in $K = \{v\in V,\ \nabla v = 0 \text{ a.e. in } O\}$, which gives in addition the characterization of $u$ as the solution to (4.2). Now $u|_{\Omega\setminus O}$ satisfies $-\Delta u = f$, with regular Dirichlet boundary conditions on the boundary of $\Omega\setminus O$, which decomposes as $\partial O\cup\partial\Omega$. As $\Omega$ is a convex polygon and $\partial O$ is smooth, standard theory ensures that $u|_{\Omega\setminus O}\in H^2(\Omega\setminus O)$.

Proposition 4.3 (saddle-point formulation). Let $u$ be the weak solution to (4.1). There exists a unique $\lambda\in\Lambda = L^2(O)^2$ such that $\lambda$ is a gradient, and
\[
\int_\Omega\nabla u\cdot\nabla v + \int_O\lambda\cdot\nabla v = \int_\Omega fv\quad\forall v\in V.
\]
In addition $\lambda$ is in $H^1(O)^2$.

Proof. The first part is a consequence of Proposition 2.5, where $B$ is defined by
\[
B : v\in H_0^1(\Omega)\longmapsto\nabla v\in L^2(O)^2.
\]
Let us prove that $B$ has a closed range. Considering $\mu\in\Lambda$ with $\mu=\nabla v$, we define $w\in H^1(O)$ as $w = v-m(v)$, where $m(v)$ is the mean value of $v$ over $O$. By the Poincaré–Wirtinger inequality, one has $\|w\|_{H^1(O)}\le C\|\mu\|_{L^2(O)^2}$. Now, as $O\subset\subset\Omega$, there exists a continuous extension operator from $H^1(O)$ to $H_0^1(\Omega)$, so that we can extend $w$ to obtain $\tilde w\in H_0^1(\Omega)$ with a norm controlled by $\|\mu\|_{L^2(O)^2}$, which proves the closed character of $B(V)$, and consequently the existence of $\lambda\in\Lambda$, and its uniqueness in $B(V)$. Let us now describe $\lambda$. We have
\[
\int_\Omega\nabla u\cdot\nabla v + \int_O\lambda\cdot\nabla v = \int_\Omega fv,
\]
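so that, by taking test functions in $\mathcal{D}(O)$, we get $\lambda\in H_{\mathrm{div}}(O)$ with $\nabla\cdot\lambda = 0$. Taking now test functions which do not vanish on the boundary of $O$, we identify the normal trace of $\lambda$ with $\partial u/\partial n\in H^{1/2}(\partial O)$. Therefore $\lambda$ is defined as the unique divergence-free vector field in $O$, with normal trace equal to $\partial u/\partial n$ on $\partial O$, which, in addition, is a gradient. In other words $\lambda = \nabla\Phi$, with
\[
\begin{cases}
\Delta\Phi = 0 & \text{in } O,\\[2pt]
\dfrac{\partial\Phi}{\partial n} = \dfrac{\partial u}{\partial n} & \text{on } \partial O.
\end{cases}
\]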
As $O$ is smooth, $\Phi\in H^2(O)$, so that $\lambda = \nabla\Phi\in H^1(O)^2$.

We introduce the penalized version of problem (4.2)
\[
\begin{cases}
\text{Find } u^\varepsilon\in V \text{ such that } J^\varepsilon(u^\varepsilon) = \inf_{v\in V}J^\varepsilon(v),\\[2pt]
J^\varepsilon(v) = \dfrac{1}{2}\displaystyle\int_\Omega|\nabla v|^2 + \dfrac{1}{2\varepsilon}\int_O|\nabla v|^2 - \int_\Omega fv.
\end{cases}
\tag{4.3}
\]
Now we consider the family of Cartesian triangulations $(\mathcal{T}_h)$ of the square $\Omega$ (see Figure 4.1), and we denote by $V_h$ the standard finite element space of continuous, piecewise affine functions with respect to $\mathcal{T}_h$:
\[
V_h = \big\{v_h\in V,\ v_h|_T \text{ is affine } \forall T\in\mathcal{T}_h\big\}.
\]
It is tempting to define the fully discretized problem as the problem which consists in minimizing $J^\varepsilon$ over $V_h$. But this straightforward approach (which does not correspond to what is done in actual computations; see Remark 4.1) raises some problems in relation to the discrete inf-sup condition which we need to establish the error estimate (see Proposition 4.7). It is related to the fact that we cannot control the size of intersections of triangles with $O$ (relative to the size of the whole triangle, which is $h^2/2$). To overcome this problem, many strategies can be adopted, all of them leading to replacing $B$ by a new discrete operator $B_h$. We propose here a radical method, which simply consists in removing in the penalty integral all squares (two-triangle sets) which intersect the boundary of $O$. It will be made clear that the convergence result is not sensitive to what is actually done in the neighborhood of $\partial O$. The proof simply requires that the reduced obstacle is included in the exact one, and that the difference set $O\setminus O_h$ lies in a narrow band whose width goes to 0 like $h$.

[Fig. 4.1. Domains $\Omega$, $O$, $O_h$, and the mesh $\mathcal{T}_h$.]

Definition 4.4. The reduced obstacle $O_h\subset O$ is defined as the union of the triangles which belong to an elementary square which is contained in the disk $O$ (see Figure 4.1).

Definition 4.5. We recall that $V = H_0^1(\Omega)$, $\Lambda$ is $L^2(O)^2$, and $B\in\mathcal{L}(V,\Lambda)$ is the gradient operator (see Proposition 4.3). We define $B_h\in\mathcal{L}(V,\Lambda)$ as
\[
v\in V\longmapsto\mu = B_hv = \mathbf{1}_{O_h}\nabla v,
\]
where $\mathbf{1}_{O_h}$ is the characteristic function of $O_h$ (see Definition 4.4). Finally, the discretization space $\Lambda_h\subset\Lambda = L^2(O)^2$ is the set of all those vector fields $\mu_h$ such that their restriction to $O_h$ is the gradient of a scalar field $v_h\in V_h$, and which vanish a.e. in $O\setminus O_h$, which we can express as
\[
\Lambda_h = \{\mu_h\in\Lambda,\ \exists v_h\in V_h,\ \mu_h = B_hv_h\} = B_h(V_h).
\]
The fully discretized problem reads
\[
\begin{cases}
\text{Find } u_h^\varepsilon\in V_h \text{ such that } J_h^\varepsilon(u_h^\varepsilon) = \inf_{v_h\in V_h}J_h^\varepsilon(v_h),\\[2pt]
J_h^\varepsilon(v_h) = \dfrac{1}{2}\displaystyle\int_\Omega|\nabla v_h|^2 + \dfrac{1}{2\varepsilon}\int_{O_h}|\nabla v_h|^2 - \int_\Omega fv_h.
\end{cases}
\tag{4.4}
\]

4.2. Error estimate for the model problem.

Proposition 4.6 (primal/dual error estimate for (4.1), nonconforming case). Let $u$ be the weak solution to (4.1), $u_h^\varepsilon$ the solution to (4.4), and $\lambda$ the Lagrange multiplier (see Proposition 4.3), and let $\lambda_h^\varepsilon = B_hu_h^\varepsilon/\varepsilon$ (see Definition 4.5). We have the following error estimate:
\[
|u-u_h^\varepsilon| + |\lambda-\lambda_h^\varepsilon| \le C(h^{1/2}+\varepsilon).
\tag{4.5}
\]
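To make the interplay between the two terms of (4.5) concrete, the following sketch evaluates the right-hand side for the balanced choice $\varepsilon = \sqrt{h}$, together with the condition-number scaling $1/(\varepsilon h^2)$ quoted in Remark 5.1. It is an illustration only: the constant $C$ is set to 1, multiplicative constants in the condition number are dropped, and the function names are hypothetical.

```python
import math


def error_bound(h, eps, C=1.0):
    """Right-hand side of (4.5), C * (h**0.5 + eps); C = 1 is illustrative."""
    return C * (math.sqrt(h) + eps)


def balanced_eps(h):
    """Penalty parameter for which both terms of (4.5) are equal."""
    return math.sqrt(h)


def condition_number_scale(h, eps):
    """Scaling 1 / (eps * h**2) of Remark 5.1, constants dropped."""
    return 1.0 / (eps * h * h)


if __name__ == "__main__":
    for N in (8, 16, 32, 64):
        h = 1.0 / N
        eps = balanced_eps(h)
        print(f"h = 1/{N:<3d} eps = {eps:.4f}  "
              f"bound = {error_bound(h, eps):.4f}  "
              f"cond ~ {condition_number_scale(h, eps):.1f}")
```

With $\varepsilon=\sqrt{h}$ the bound is $2C\sqrt{h}$, and taking $\varepsilon$ smaller than $\sqrt{h}$ cannot improve the order of the bound while it makes the condition-number scaling worse.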
Proof. The proof is based on the abstract estimate in Theorem 3.1. All technical ingredients are put off until the end of the section. We shall simply refer here to the corresponding properties. The crucial requirement is the discrete inf-sup condition, which can be established for this choice of $B_h$ (see Proposition 4.7). The terms
\[
\inf_{\tilde u_h\in V_h}|\tilde u_h-u|\quad\text{and}\quad\inf_{\tilde\lambda_h\in\Lambda_h}|\tilde\lambda_h-\lambda|
\]
can be shown to behave like $h^{1/2}$ (see Propositions 4.8 and 4.9, respectively). The last two terms can be handled the same way as $|\tilde\lambda_h-\lambda|$. Indeed,
\[
|(B_h^\star-B^\star)\lambda| \le |\lambda|_{0,O\setminus O_h},
\]
which is $O(h^{1/2})$ (it is the $L^2$ norm of a function with $H^1$ regularity, on an $h$-neighborhood of $\partial O$). The very same argument holds for $|(B_h-B)z|$ (in our case, both quantities are the same).

Proposition 4.7 (discrete inf-sup condition). Let $\Omega$ and $O$ be defined as in the beginning of section 4. We introduce $h = 1/N$, $N\in\mathbb{N}$, and $\mathcal{T}_h$ is the regular triangulation with step $h$, so that the center of $O$ is a vertex of $\mathcal{T}_h$. According to
Definitions 4.4 and 4.5, $O_h$ is the reduced obstacle, and $\Lambda_h\subset L^2(O)^2 = \Lambda$ is the set of all those vector fields which are the gradient of a piecewise affine function in $O_h$, and which vanish in $O\setminus O_h$. There exists $\beta>0$ such that, for all $h$ ($=1/N$),
\[
\beta\,|P_{(\ker B_h)^\perp}v_h| \le |B_hv_h|\quad\forall v_h\in V_h,\qquad
\sup_{v_h\in V_h}\frac{(B_hv_h,\lambda_h)}{|v_h|} \ge \beta\,\|\lambda_h\|_{\Lambda_h}.
\tag{4.6}
\]
Proof. Let $v_h\in V_h$ be given. If we are able to build $w_h\in V_h$ such that $B_hw_h = B_hv_h$, with $|w_h|\le C|B_hv_h|$, we obtain
\[
|P_{(\ker B_h)^\perp}v_h| = \inf_{\tilde v_h\in\ker B_h}|v_h-\tilde v_h| \le |v_h-(v_h-w_h)| = |w_h| \le C\,|B_hv_h|,
\]
and the first inequality is proven. Let us describe how this $w_h\in V_h$ can be built in five steps. First, we introduce $w_h^1 = v_h-\bar v_h$, where $\bar v_h$ is the mean value of $v_h$ over $O_h$. Note that $w_h^1$ is not in $V_h$ (it does not vanish on $\partial\Omega$), but we consider only its restriction to $O_h$. We have $B_hw_h^1 = B_hv_h$, and the norm of $w_h^1$ is controlled:
\[
\|w_h^1\|_{H^1(O_h)} \le C_1\|B_hv_h\|_{L^2(O_h)^2}
\]
by the Poincaré–Wirtinger inequality (with a constant which does not depend on $h$, as can be checked easily). We shall now describe how we plan to extend $w_h^1$ in the first quadrant, the three others being done the same way. This construction is illustrated by Figure 4.2. The first step consists in extending $w_h^1$ in the polygonal domain $CA_3A'_2A'_1$ on each horizontal segment by symmetry (see Figure 4.2). A similar construction extends $w_h^1$ in $CB_1B_2B_3$. Now the function is simply extended in the upper right zone by symmetry around $C$.

[Fig. 4.2. Construction of $w_h^2$.]

To show that the $H^1$ seminorm of the newly defined function $w_h^2$ is under control, we first remark that the shift between two consecutive lines does not exceed one cell. Now consider the detail in Figure 4.3. On the left we represented a detail of the triangulated domain in $O$ where $w_h^2$ is already defined; the $u_i$'s and $v_i$'s represent the values of $w_h^2$ at some vertices. Now by applying the "symmetry" described previously, we obtain the stretched function which we represent on a single element. To control the effect of this stretching, we use Lemma 4.10 in the following way: The square of the $H^1$ seminorm of the new function is a quadratic nonnegative form $q_1$ in the six variables, and the square of the $H^1$ seminorm corresponding to the left-hand situation itself is a scale invariant quadratic, nonnegative form $q_2$ in the same variables, so that Lemma 4.10 ensures the existence of a universal constant $C$ such that $q_1\le Cq_2$. As a consequence, the $H^1$ seminorm of the stretched function (in $CA_3A'_2A'_1$) is controlled by the $H^1$ seminorm of the initial function (in $CA_1A_2A_3$). As the new function in $CA'_1B_1$ is obtained by standard symmetry, the $H^1$ seminorm identifies with the one of the initial function in $CA_1B_1$.

[Fig. 4.3. Stretching of $w_h^2$ (detail).]

This leads to a new function $w_h^2$ defined on $O_h^2$, subtriangulation of $\mathcal{T}_h$, with $|w_h^2|_{1,O_h^2}\le C_2\|B_hv_h\|_{L^2(\Omega)^2}$. As $w_h^2$ has zero mean value in $B(0,1/2)$, one has
\[
\|w_h^2\|_{H^1(O_h^2)} \le C'_2\|B_hv_h\|_{L^2(\Omega)^2}.
\]
Finally, $O_h^2$ contains a ball strictly larger than $O$, say $B(0,1+\sqrt{2}/4)$. Considering now a smooth function $\rho$ which is equal to 1 in $B(0,(1+r)/2)$, and 0 outside $B(0,r)$, we define $w_h^3$ as $I_h(\rho w_h^2)$ on $O_h^2$, and 0 in $\Omega\setminus O_h^2$, where $I_h$ is the standard interpolation operator. This function is in $V_h\cap H_0^1(\Omega)$, and it verifies
\[
B_hw_h^3 = \lambda_h\ (= B_hv_h),\qquad \|w_h^3\|_{H^1(\Omega)} \le C_3\|B_hv_h\|_{L^2(\Omega)^2},
\]
so that the first inequality of (4.6) holds, with $\beta = 1/C_3$. The second one is a direct consequence of the first one: given $\lambda_h = B_hv_h$, one considers $w_h = P_{(\ker B_h)^\perp}v_h$, so that
\[
\sup_{v_h\in V_h}\frac{(B_hv_h,\lambda_h)}{|v_h|} \ge \frac{(B_hw_h,\lambda_h)}{|w_h|} = \frac{|B_hw_h|^2}{|w_h|} \ge \beta\,|B_hw_h| = \beta\,\|\lambda_h\|_{\Lambda_h},
\]
which ends the proof.

Proposition 4.8 (approximation of $u$). We make the same assumptions as in Proposition 4.7, and we consider $u\in H_0^1(\Omega)$ such that $u = U\in\mathbb{R}$ a.e. in $O$, $u|_{\Omega\setminus O}\in H^2(\Omega\setminus O)$. There exists $C>0$ such that
\[
\inf_{\tilde u_h\in V_h}\|u-\tilde u_h\|_{H^1(\Omega)} \le Ch^{1/2}.
\]
Proof. We recall that $I_h$ is the standard interpolation operator from $C(\Omega)$ onto $V_h$. Let us assume here that the constant value $U$ on $O$ is 0 (which can be achieved by subtracting a smooth extension of this constant outside $O$). Now we define $\tilde O_h$ as the union of all those triangles of $\mathcal{T}_h$ which have a nonempty intersection with $O$. We define $\tilde u_h$ as the function in $V_h$ which is 0 in $\tilde O_h$ and which identifies with $I_hu$ at all other vertices. We introduce a narrow band around $O$:
\[
\omega_h = \big\{x\in\Omega,\ x\notin O,\ d(x,O) < 2\sqrt{2}\,h\big\}.
\tag{4.7}
\]
As $u|_{\Omega\setminus O}\in H^2(\Omega\setminus O)$, standard finite element estimates give
\[
|u-\tilde u_h|_{0,\Omega\setminus(O\cup\omega_h)} \le Ch^2|u|_{H^2(\Omega\setminus O)},
\tag{4.8}
\]
\[
|u-\tilde u_h|_{1,\Omega\setminus(O\cup\omega_h)} \le Ch\,|u|_{H^2(\Omega\setminus O)}.
\tag{4.9}
\]
By construction, both $L^2$ and $H^1$ errors in $O$ are zero. There remains to estimate the error in the band $\omega_h$. The principle is the following: $\tilde u_h$ is a poor approximation of $u$ in $\omega_h$, but it is not very harmful because $\omega_h$ is small. Note that similar estimates are proposed in [SMSTT05] or [AR08]. For the sake of completeness, and because it is essential to understand why a better order than 1/2 cannot be expected, we shall detail here the proof. First of all, we write
\[
\|u-\tilde u_h\|_{H^1(\omega_h)} \le |u|_{0,\omega_h} + |u|_{1,\omega_h} + |\tilde u_h|_{0,\omega_h} + |\tilde u_h|_{1,\omega_h} = A+B+C+D.
\tag{4.10}
\]
Lemma 4.13 ensures $B\le Ch^{1/2}$, and $A\le Ch^{3/2}$. As for $\tilde u_h$ (terms $C$ and $D$ in (4.10)), the proof is less trivial. It relies on the technical lemmas (Lemmas 4.11, 4.12, and 4.14; see section 4.3), which can be used as follows. The problematic triangles are those on which $\tilde u_h$ identifies neither with 0, nor with $I_hu$. On such triangles, $\tilde u_h$ sticks to $I_hu$ at 1 or 2 vertices, and vanishes at 2 or 1 vertices. As a consequence, the $L^\infty$ norm of $\tilde u_h$ is less than the $L^\infty$ norm of $I_hu$. Let $T$ be such a triangle. We write (using Lemma 4.11, the latter remark, the fact that $I_h$ is a contraction from $L^\infty$ onto $L^\infty$, Lemma 4.11 again, and Lemma 4.14)
\[
\|\tilde u_h\|^2_{L^2(T)} \le C''|T|\,\|\tilde u_h\|^2_{L^\infty(T)} \le C''|T|\,\|I_hu\|^2_{L^\infty(T)} \le \frac{C''}{C'}\|I_hu\|^2_{L^2(T)} \le C\big(\|u\|^2_{L^2(T)} + h^4|u|^2_{2,T}\big).
\]
By summing up all these contributions over all triangles which intersect $\omega_h$, and using the fact that the $L^2$ norm of $u$ on $\omega_h$ behaves like $h^{3/2}|u|_{2,\Omega\setminus O}$, we obtain
\[
\|\tilde u_h\|^2_{L^2(\omega_h)} \le \sum_{T\cap\omega_h\neq\emptyset}\|\tilde u_h\|^2_{L^2(T)} \le Ch^3|u|^2_{2,\Omega\setminus O},
\]
which gives the expected $h^{3/2}$ estimate for $C$. The last term of (4.10) is directly obtained by the previous estimate combined with the inverse inequality expressed by Lemma 4.12.

Proposition 4.9 (approximation of $\lambda$). Let $\lambda\in H^1(O)^2$ be given, with $\lambda = \nabla w$, $w\in H^2(O)$. There exists a constant $C>0$ such that
\[
\inf_{\tilde\lambda_h\in\Lambda_h}\|\lambda-\tilde\lambda_h\|_{L^2(O)} \le Ch^{1/2}|\lambda|_{1,O},
\]
where $\Lambda_h$ is defined in section 3 (see Definition 4.5).

Proof. First of all, we extend $w$ on $\Omega\setminus O$, to obtain a function (still denoted by $w$) in $H_0^1(\Omega)\cap H^2(\Omega)$. Let us define $w_h$ as the standard interpolate of $w$ over $\mathcal{T}_h$. One has $|w-w_h|_{1,O}\le Ch$. We define $\tilde\lambda_h\in\Lambda_h$ as the piecewise constant function which identifies with $\nabla w_h$ on $O_h$ (see Definition 4.4), and which vanishes in $O\setminus O_h$. One has
\[
\|\nabla w_h-\tilde\lambda_h\|_{L^2(O)} = \|\nabla w_h-\tilde\lambda_h\|_{L^2(O\setminus O_h)} = \|\nabla w_h\|_{L^2(O\setminus O_h)} \le C\|\nabla w\|_{L^2(O\setminus O_h)},
\]
which is the $H^1$ seminorm of a function in $H^2$, in a narrow domain. Therefore it behaves like $h^{1/2}$ times the $H^2$ seminorm of $w$ (see Lemma 4.13 and Remark 4.3), which is the $H^1$ seminorm of $\lambda$. Finally, one gets
\[
\|\lambda-\tilde\lambda_h\|_{L^2(O)} \le |w-w_h|_{1,O} + \|\nabla w_h-\tilde\lambda_h\|_{L^2(O)} \le C(h+h^{1/2})\,|\lambda|_{1,O},
\]
which ends the proof.

Remark 4.1 (boundary fitted meshes). Although it is somewhat in contradiction with its original purpose, the penalty method can be used together with a discretization based on a boundary fitted mesh. In that case, the approximation error behaves no longer like $h^{1/2}$ but like $h$.

Remark 4.2 (technical assumptions). Some assumptions we made are only technical and can surely be relaxed without changing the convergence results. For example the inclusion, which we supposed circular, could be a collection of smooth domains. Note that a convex polygon is not acceptable: seen from the outside domain $\Omega\setminus O$, its corners are reentrant, so that $u$ may no longer be in $H^2$, which rules out some of the approximation properties we used. Concerning the mesh, we are confident that the result generalizes to any kind of unstructured mesh, but the proof of Proposition 4.7 in the general case can no longer be based on an explicit construction.

4.3. Technical lemmas. We gather here some elementary properties which are used in the proofs of Propositions 4.6, 4.7, 4.8, and 4.9.

Lemma 4.10. Let $E$ be a finite dimensional real vector space, with $q_1$ and $q_2$ two nonnegative quadratic forms with $\ker q_2\subset\ker q_1$. There exists $C>0$ such that $q_1\le Cq_2$.

Proof. As $q_2$ is nonnegative, $\tilde v\mapsto|\tilde v|_{q_2} = q_2(v)^{1/2}$ is a norm for $E/\ker q_2$. Now we define
\[
\tilde q_1 : \tilde v\in E/\ker q_2\longmapsto\tilde q_1(\tilde v) = q_1(v)\in\mathbb{R}.
\]
As $\ker q_1$ contains $\ker q_2$, this functional is well defined. As it is quadratic over a finite dimensional space, it is continuous for the norm $\sqrt{q_2}$, so that
\[
q_1(v) = \tilde q_1(\tilde v) \le C\,|\tilde v|^2_{q_2} = C\,q_2(v),
\]
which ends the proof.

Lemma 4.11. There exist constants $C'$ and $C''$ such that, for any nondegenerate triangle $T$, for any function $w_h$ affine in $T$,
\[
C'|T|\,\|w_h\|^2_{L^\infty(T)} \le \|w_h\|^2_{L^2(T)} \le C''|T|\,\|w_h\|^2_{L^\infty(T)}.
\tag{4.11}
\]
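For affine functions the two constants in (4.11) can be made explicit. The sketch below is an illustration, not part of the original proof. It assumes the classical identity $\int_T w_h^2 = \frac{|T|}{6}(a^2+b^2+c^2+ab+bc+ca)$ for an affine function with vertex values $a$, $b$, $c$ (it follows from $\int_T\lambda_i^2 = |T|/6$ and $\int_T\lambda_i\lambda_j = |T|/12$ for the barycentric coordinates), and uses the fact that the $L^\infty$ norm of an affine function is attained at a vertex. The function names are hypothetical.

```python
import random


def l2_sq_over_area(a, b, c):
    """(1/|T|) * integral over T of w**2, for w affine with vertex values a, b, c."""
    return (a * a + b * b + c * c + a * b + b * c + c * a) / 6.0


def linf(a, b, c):
    """L-infinity norm of an affine function on a triangle (attained at a vertex)."""
    return max(abs(a), abs(b), abs(c))


# Sample random vertex values and record ||w||_{L2}^2 / (|T| * ||w||_{Linf}^2).
# The ratio does not depend on the shape of the triangle.
random.seed(0)
ratios = []
for _ in range(10000):
    a, b, c = (random.uniform(-1.0, 1.0) for _ in range(3))
    m = linf(a, b, c)
    if m > 0.0:
        ratios.append(l2_sq_over_area(a, b, c) / (m * m))

print("smallest sampled ratio:", min(ratios))
print("largest sampled ratio: ", max(ratios))
```

Under the stated identity, the sampled ratios stay between $1/9$ (attained for vertex values $(1,-1/3,-1/3)$) and $1$ (attained for constants), so that $C' = 1/9$ and $C'' = 1$ are admissible in (4.11), independently of the shape of $T$.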
Proof. It is a consequence of the fact that, when deforming the supporting triangle $T$, the $L^\infty$ norm is unchanged whereas the $L^2$ norm scales like $|T|^{1/2}$.

Lemma 4.12. There exists a constant $C$ such that, for any nondegenerate triangle $T$, for any function $w_h$ affine in $T$,
\[
|w_h|^2_{1,T} \le C\,\frac{|T|}{\rho_K^2}\,\|w_h\|^2_{L^\infty(T)},
\]
where $\rho_K$ is the diameter of the circle inscribed in $T$.

Proof. Again, it is a straightforward consequence of the fact that, when deforming the supporting triangle $T$, the $L^\infty$ norm is unchanged whereas the gradient (which is constant over the triangle) scales like $1/\rho_K$, so that the $H^1$ seminorm scales like $|T|^{1/2}/\rho_K$.

The next lemma establishes some Poincaré-like inequalities in narrow domains.

Lemma 4.13. Let $O\subset\mathbb{R}^2$ be the unit disk, strongly included in a domain $\Omega$, and let $\omega_\eta$ be the narrow band (note that this definition differs slightly from (4.7), which is of no consequence):
\[
\omega_\eta = \big\{x\in\Omega,\ x\notin O,\ d(x,O)<\eta\big\},
\]
with $\eta>0$. Denoting by $|\cdot|_{p,\omega}$ the $H^p$ seminorm over $\omega$, we have the following estimates:
\[
\begin{aligned}
|\varphi|_{0,\omega_\eta} &\le C\eta^{1/2}|\varphi|_{1,\Omega\setminus O} &&\forall\varphi\in H^1(\Omega\setminus O),\ \varphi|_{\partial\Omega}=0,\\
|\varphi|_{1,\omega_\eta} &\le C\eta^{1/2}|\varphi|_{2,\Omega\setminus O} &&\forall\varphi\in H^2(\Omega\setminus O),\ \varphi|_{\partial\Omega}=0,\\
|\varphi|_{0,\omega_\eta} &\le C\eta^{3/2}|\varphi|_{2,\Omega\setminus O} &&\forall\varphi\in H^2(\Omega\setminus O),\ \varphi|_{\partial\Omega}=0,\ \varphi|_{\partial O}=0.
\end{aligned}
\]
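The $\eta^{3/2}$ scaling of the third estimate can be observed on a simple example. The sketch below is an illustration under stated assumptions, not part of the original argument: it takes the radial function $\varphi(r,\theta) = r-1$, which vanishes on $\partial O$, and computes its $L^2$ norm over the band $1<r<1+\eta$ by a midpoint rule. The condition $\varphi|_{\partial\Omega}=0$ is ignored here, since a smooth cutoff away from the band would not change the band integral. The function names are hypothetical.

```python
import math


def band_l2_norm(eta, n=2000):
    """L2 norm of phi(r, theta) = r - 1 over the band 1 < r < 1 + eta.

    phi is radial, so the theta-integral contributes a factor 2*pi; the
    r-integral is approximated by the midpoint rule with n cells.
    """
    dr = eta / n
    total = 0.0
    for k in range(n):
        r = 1.0 + (k + 0.5) * dr
        total += (r - 1.0) ** 2 * r * dr
    return math.sqrt(2.0 * math.pi * total)


def band_l2_norm_exact(eta):
    """Closed form: square root of 2*pi * (eta**3 / 3 + eta**4 / 4)."""
    return math.sqrt(2.0 * math.pi * (eta ** 3 / 3.0 + eta ** 4 / 4.0))


etas = [0.1 / 2 ** k for k in range(4)]
ratios = [band_l2_norm(eta) / eta ** 1.5 for eta in etas]
for eta, ratio in zip(etas, ratios):
    print(f"eta = {eta:.4f}   |phi|_0 / eta^(3/2) = {ratio:.4f}")
```

The printed ratios remain bounded as $\eta$ decreases and approach $\sqrt{2\pi/3}$, in agreement with a bound of the form $|\varphi|_{0,\omega_\eta}\le C\eta^{3/2}$ for functions vanishing on $\partial O$.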
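Proof. We assume here that $\varphi$ is $C^1$ in $\Omega\setminus O$ (the general case is obtained immediately by density). Using polar coordinates, we write $\varphi(r,\theta) = \varphi(1,\theta) + \int_1^r\partial_r\varphi\,ds$, so that
\[
\begin{aligned}
|\varphi|^2_{0,\omega_\eta} &\le 2\int_0^{2\pi}\!\!\int_1^{1+\eta}|\varphi(1,\theta)|^2\,r\,dr\,d\theta + 2\int_0^{2\pi}\!\!\int_1^{1+\eta}\Big|\int_1^r\partial_r\varphi\,ds\Big|^2 r\,dr\,d\theta\\
&\le C\big(\eta\,|\varphi|^2_{0,\partial O} + \eta^2|\varphi|^2_{1,\omega_\eta}\big) \le C\eta\,|\varphi|^2_{1,\Omega\setminus O},
\end{aligned}
\]
from which we deduce the first estimate. This same approach can be applied to $\partial_i\varphi$ for $\varphi\in H^2$. As $\varphi$ is supposed to vanish over $\partial\Omega$, one has
\[
|\partial_i\varphi|_{0,\partial O} \le C\|\nabla\varphi\|_{H^1(\Omega\setminus O)} \le C\,|\varphi|_{2,\Omega\setminus O},
\]
which leads to the second estimate. As for the third one, simply notice that the boundary term ($L^2$ norm over $\partial O$) vanishes in the equation above:
\[
|\varphi|_{0,\omega_\eta} \le C\eta\,|\varphi|_{1,\omega_\eta} \le C\eta^{3/2}|\varphi|_{2,\Omega\setminus O},
\]
which ends the proof.

Remark 4.3. The previous lemma extends straightforwardly to the case of any smooth inclusion ($C^2$ regularity of the boundary is sufficient) strongly included in a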
domain $\Omega$ (for a detailed proof of a similar property, see [GLM06]) or to the case where the function is defined within the subdomain (in that case, $\omega_\eta$ is defined as an inner narrow band).

The last lemma quantifies how one can control the $L^2$ norm of the interpolate of a regular function on a triangle, by means of the $L^2$ norm and the $H^2$ seminorm of the function.

Lemma 4.14. There exists a constant $C$ such that, for any regular triangle $T$ (see below), for any $u\in H^2(T)$,
\[
\|I_hu\|^2_{L^2(T)} \le C\big(\|u\|^2_{L^2(T)} + h^4|u|^2_{2,T}\big).
\]
By regular we mean that $T$ runs over a set of triangles such that the flatness $\mathrm{diam}(T)/\rho_K$ is bounded.

Proof. The interpolation operator $I_h : H^2(T)\to L^2(T)$ is continuous, and $|u|_{2,T}$ scales like $h/\rho_K^2\approx 1/h$ whereas the $L^2$ norms scale like $h$.

5. Additional examples, concluding remarks. The approach can be checked to be applicable to some standard situations, like the constraint to vanish in an inclusion $O\subset\subset\Omega$ (see Example 2.2), as soon as $H^1$-penalty is used. The functional to minimize is then
\[
J_\varepsilon(v) = \frac{1}{2}\int_\Omega|\nabla v|^2 - \int_\Omega fv + \frac{1}{2\varepsilon}\int_O\big(v^2+|\nabla v|^2\big),
\]
so that $B$ identifies with the restriction operator from $H_0^1(\Omega)$ to $H^1(O)$. The discrete inf-sup condition, as well as the approximation properties, are essentially the same as in the case of an inclusion with infinite conductivity.

Another straightforward application of the abstract framework presented in section 3 is the numerical modeling of a rigid inclusion in a material which obeys Lamé's equations of linear elasticity. The penalized functional is then
\[
J_\varepsilon(v) = \frac{1}{2}\int_\Omega\mu\,|e(v)|^2 + \frac{1}{2}\int_\Omega\lambda\,|\nabla\cdot v|^2 - \int_\Omega f\cdot v + \frac{1}{2\varepsilon}\int_O|e(v)|^2,
\]
where $e(v) = (\nabla v+(\nabla v)^T)/2$ is the strain tensor.

We conclude this section by some remarks on the proof itself and on possible extensions of this approach.

Remark 5.1 (conditioning issues). The fact that there is no need to choose $\varepsilon$ too small (both errors balance for $\varepsilon$ of the order of $\sqrt{h}$) is of particular importance in terms of conditioning. Indeed, considering the matrix $A_h^\varepsilon$ resulting from the two-dimensional discrete minimization problem (4.4), it can be checked easily that its smallest eigenvalue scales like $h^2$, whereas its largest eigenvalue behaves like $1/\varepsilon$, leading to a condition number of the order of $1/(\varepsilon h^2)$. Following the $\varepsilon$-$h$ balance suggested by the error estimates, the condition number finally scales like $1/h^{5/2}$, which compares reasonably to the standard $1/h^2$. Note also that some special fixed point algorithms, recently proposed in [BFM08], can be used to circumvent the problem of ill-conditioning.

Remark 5.2 (convergence in space). The poor rate of convergence in $h$ is optimal for a uniform mesh, at least if we consider the $H^1$ error over all $\Omega$. Indeed, as the solution is constant inside $O$, nonconstant outside with a jump in the normal derivative, the error within each element intersecting $\partial O$ is an $O(1)$ in this $L^\infty$ norm. By summing
up over all those triangles, which cover a zone whose measure scales like $h$, we end up with this $h^{1/2}$ error. Note that a better convergence could be expected, in theory, if one considers only the error in the domain of interest $\Omega\setminus O$, the question now being whether the bad convergence in the neighborhood of $\partial O$ pollutes the overall approximation. Our feeling is that this pollution actually occurs, because nothing is done in the present approach to distinguish both sides of $\partial O$, so that the method tends to balance the errors on both sides. An interesting way to give priority to the side of interest is proposed in [DP02] for a boundary penalty method; it consists in having the diffusion coefficient vanish within $O$. Note that other methods have been proposed to reach the optimal convergence rate on non-boundary-fitted meshes (see [Mau01]), but they are less straightforward to implement. The simplest way to improve the actual order of convergence is to carry out a local refinement strategy in the neighborhood of $\partial O$, as proposed in [RAB07].

Remark 5.3 (nonregular domains). The method can be implemented straightforwardly for nonregular domains (e.g., with corners or cusps), but the numerical analysis presented here is no longer valid. In particular, the inf-sup condition established in Proposition 4.7 and approximation properties for $u$ (see Proposition 4.8) may no longer hold. Notice that Propositions 2.9 and 2.10 do not require any regularity assumption, so that convergence can be established for some sequences $(h,\varepsilon)$ tending to $(0,0)$, but the optimal order of convergence is lost. Practical tests suggest a reasonably good behavior of the method in such situations, like in the case where $O$ consists of two tangent discs (this situation is of special interest for practical applications in the context of fluid-particle flows, when two particles are in contact; see, for example, [Lef07]).

Remark 5.4. Note that having $\varepsilon$ go to 0 for any $h>0$ leads to an estimate for a fictitious domain method (à la Glowinski, i.e., based on the use of Lagrange multipliers). In [GG95], an error estimate is obtained for such a method; it relies on two independent meshes for the primal and dual components of the solution (conditionally to some compatibility conditions between the sizes of the two meshes). We recover this estimate in the situation where the local mesh is simply the restriction of the covering mesh to the obstacle (to the reduced obstacle $O_h$, to be more precise).

REFERENCES

[AR08] P. Angot and I. Ramière, Convergence analysis of the Q1-finite element method for elliptic problems with non boundary-fitted meshes, Internat. J. Numer. Methods Engrg., 75 (2008), pp. 1007–1052.
[Bab73] I. Babuška, The finite element method with penalty, Math. Comp., 27 (1973), pp. 221–228.
[BE07] E. Burman and A. Ern, A continuous finite element method with face penalty to approximate Friedrichs' systems, M2AN Math. Model. Numer. Anal., 41 (2007), pp. 55–76.
[BF91] F. Brezzi and M. Fortin, Mixed and Hybrid Finite Element Methods, Springer Ser. Comput. Math. 15, Springer-Verlag, New York, 1991.
[BFM08] T. T. C. Bui, P. Frey, and B. Maury, Méthode du second membre modifié pour la gestion de rapports de viscosité importants dans le problème de Stokes bifluide, C. R. Mécanique, 336 (2008), pp. 524–529.
[BHS03] R. Becker, P. Hansbo, and R. Stenberg, A finite element method for domain decomposition with non-matching grids, M2AN Math. Model. Numer. Anal., 37 (2003), pp. 209–225.
[DP02] S. Del Pino, Une méthode d'éléments finis pour la résolution d'EDP dans des domaines décrits par géométrie constructive, Ph.D. thesis, Université Pierre et Marie Curie, Paris, France, 2002.
[DPM07] S. Del Pino and B. Maury, 2d/3d turbine simulations with freefem, in Numerical Analysis and Scientific Computing for PDEs and Their Challenging Applications, J. Haataja, R. Stenberg, J. Periaux, P. Raback, and P. Neittaanmaki, eds., CIMNE, Barcelona, Spain, 2008.
[FFp] freeFEM++; http://www.freefem.org/.
[GG95] V. Girault and R. Glowinski, Error analysis of a fictitious domain method applied to a Dirichlet problem, Japan J. Indust. Appl. Math., 12 (1995), pp. 487–514.
[GLM06] V. Girault, H. López, and B. Maury, One time-step finite element discretization of the equation of motion of two-fluid flows, Numer. Methods Partial Differential Equations, 22 (2006), pp. 680–707.
[GR79] V. Girault and P.-A. Raviart, Finite Element Approximation of the Navier-Stokes Equations, Lecture Notes in Math. 749, Springer-Verlag, Berlin, 1979.
[HH02] A. Hansbo and P. Hansbo, An unfitted finite element method, based on Nitsche's method, for elliptic interface problems, Comput. Methods Appl. Mech. Engrg., 191 (2002), pp. 5537–5552.
[JLM05] J. Janela, A. Lefebvre, and B. Maury, A penalty method for the simulation of fluid-rigid body interaction, in CEMRACS 2004 – Mathematics and Applications to Biology and Medicine, ESAIM Proc. 14, EDP Sciences, Les Ulis, France, 2005, pp. 115–123.
[Lef07] A. Lefebvre, Fluid-particle simulations with FreeFem++, in Paris-Sud Working Group on Modelling and Scientific Computing 2006–2007, ESAIM Proc. 18, EDP Sciences, Les Ulis, France, 2007, pp. 120–132.
[Mau01] B. Maury, A fat boundary method for the Poisson problem in a domain with holes, J. Sci. Comput., 16 (2001), pp. 319–339.
[Nit71] J. Nitsche, Über ein Variationsprinzip zur Lösung von Dirichlet-Problemen bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind, Abh. Math. Sem. Univ. Hamburg, 36 (1971), pp. 9–15.
[PG02] T.-W. Pan and R. Glowinski, Direct simulation of the motion of neutrally buoyant circular cylinders in plane Poiseuille flow, J. Comput. Phys., 181 (2002), pp. 260–279.
[RAB07] I. Ramière, P. Angot, and M. Belliard, A fictitious domain approach with spread interface for elliptic problems with general boundary conditions, Comput. Methods Appl. Mech. Engrg., 196 (2007), pp. 766–781.
[RPVC05] T. N. Randrianarivelo, G. Pianet, S. Vincent, and J. P. Caltagirone, Numerical modelling of solid particle motion using a new penalty method, Internat. J. Numer. Methods Fluids, 47 (2005), pp. 1245–1251.
[SMSTT05] J. San Martín, J.-F. Scheid, T. Takahashi, and M. Tucsnak, Convergence of the Lagrange–Galerkin method for the equations modelling the motion of a fluid-rigid system, SIAM J. Numer. Anal., 43 (2005), pp. 1536–1571.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1149–1167
MODIFIED COMBINED FIELD INTEGRAL EQUATIONS FOR ELECTROMAGNETIC SCATTERING∗

O. STEINBACH† AND M. WINDISCH†

Abstract. The boundary integral formulation of exterior boundary value problems for the Maxwell system may not be equivalent to the original uniquely solvable problem if the wave number corresponds to an eigenvalue of an associated interior eigenvalue problem. To avoid these spurious modes one may use a combined boundary integral approach. To analyze the resulting boundary integral equations in the energy function spaces suitable regularizations have to be introduced. Here we formulate and analyze a modified boundary integral equation which is based on the use of standard boundary integral operators only. A first numerical example shows the applicability of the proposed approach.

Key words. combined field integral equations, electromagnetic scattering, Maxwell system

AMS subject classifications. 65N38, 78A45

DOI. 10.1137/070698063
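1. Introduction. The modeling of electromagnetic scattering at a perfect conductor in the exterior of a bounded domain $\Omega \subset \mathbb{R}^3$ leads to the Dirichlet boundary value problem [12, 18, 22, 23]
\[
\mathbf{curl}\,\mathbf{curl}\,\mathbf{U}(x) - \kappa^2 \mathbf{U}(x) = \mathbf{0} \quad \text{for } x \in \Omega^c = \mathbb{R}^3 \setminus \overline{\Omega}, \tag{1.1}
\]
\[
\mathbf{n}_x \times (\mathbf{U}(x) \times \mathbf{n}_x) = \mathbf{g}(x) \quad \text{for } x \in \Gamma = \partial\Omega, \tag{1.2}
\]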
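where $\kappa \in \mathbb{R}_+$ is the wave number, and $\mathbf{n}_x$ is the exterior unit normal vector for almost all $x \in \Gamma$. In addition to the exterior boundary value problem (1.1) we need to formulate the radiation condition of electromagnetic scattering, i.e., the Silver–Müller radiation condition
\[
\lim_{r \to \infty} \int_{\partial B_r} \bigl| \mathbf{curl}\,\mathbf{U}(x) \times \mathbf{n}_x - i\kappa\,(\mathbf{n}_x \times \mathbf{U}(x)) \times \mathbf{n}_x \bigr|^2 \, ds_x = 0, \tag{1.3}
\]
where $B_r$ is a ball around zero with radius $r$. Note that the exterior Dirichlet boundary value problem (1.1)–(1.3) admits a unique solution. According to the partial differential operator in (1.1) we can formulate Green's first formula, which is valid for sufficiently smooth functions, as
\[
\int_\Omega \mathbf{curl}\,\mathbf{curl}\,\mathbf{U}(x) \cdot \mathbf{V}(x)\, dx = \int_\Omega \mathbf{curl}\,\mathbf{U}(x) \cdot \mathbf{curl}\,\mathbf{V}(x)\, dx - \int_\Gamma (\mathbf{curl}\,\mathbf{U}(x)|_\Gamma \times \mathbf{n}_x) \cdot (\mathbf{n}_x \times (\mathbf{V}(x)|_\Gamma \times \mathbf{n}_x))\, ds_x. \tag{1.4}
\]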
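As a brief sketch of where (1.4) comes from, assuming $\mathbf{U}$ and $\mathbf{V}$ are smooth enough for the divergence theorem to apply (the intermediate names $\mathbf{A}$, $\mathbf{B}$ are introduced here only for illustration), one can combine a standard vector identity with the triple product rule:

```latex
% Product rule for the divergence of a cross product:
%   div(A x B) = B . curl A - A . curl B,   with A = curl U, B = V.
\begin{align*}
\operatorname{div}(\mathbf{curl}\,\mathbf{U} \times \mathbf{V})
  &= \mathbf{V} \cdot \mathbf{curl}\,\mathbf{curl}\,\mathbf{U}
   - \mathbf{curl}\,\mathbf{U} \cdot \mathbf{curl}\,\mathbf{V}, \\
\int_\Omega \mathbf{curl}\,\mathbf{curl}\,\mathbf{U} \cdot \mathbf{V}\, dx
  &= \int_\Omega \mathbf{curl}\,\mathbf{U} \cdot \mathbf{curl}\,\mathbf{V}\, dx
   + \int_\Gamma (\mathbf{curl}\,\mathbf{U} \times \mathbf{V}) \cdot \mathbf{n}_x \, ds_x .
\end{align*}
% Triple product: (a x b) . n = -(a x n) . b, and since a x n is tangential,
% only the tangential part n x (b x n) = b - (b . n) n of b contributes:
\begin{align*}
(\mathbf{curl}\,\mathbf{U} \times \mathbf{V}) \cdot \mathbf{n}_x
  = -(\mathbf{curl}\,\mathbf{U} \times \mathbf{n}_x) \cdot
     \bigl(\mathbf{n}_x \times (\mathbf{V} \times \mathbf{n}_x)\bigr).
\end{align*}
```

Inserting the last identity into the boundary integral reproduces the sign and the tangential traces appearing in (1.4).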
∗ Received by the editors July 24, 2007; accepted for publication (in revised form) November 12, 2008; published electronically February 19, 2009. This work was supported by the Austrian Science Fund (FWF) within the project “Data sparse boundary and finite element domain decomposition methods in electromagnetics” under grant P19255. http://www.siam.org/journals/sinum/47-2/69806.html † Institute of Computational Mathematics, Graz University of Technology, Steyrergasse 30, 8010 Graz, Austria (
[email protected],
[email protected]).
Based on (1.4) related Sobolev spaces and corresponding trace operators can be introduced [4, 5, 6, 7, 8]; these results will be summarized in section 2. Then, the well-known Stratton–Chu representation formula will be discussed which implies the definition of appropriate potential and boundary integral operators [6, 8, 11, 13, 16, 17, 20, 21, 23]. The corresponding boundary integral equations can be used for a numerical treatment of the problem by means of boundary element methods [3, 6, 8, 11, 12, 13, 19, 23]. But although the exterior boundary value problem (1.1)– (1.3) is uniquely solvable, the standard boundary integral equations are not uniquely solvable if the wave number κ corresponds to an eigenvalue of an associated interior eigenvalue problem. To avoid these spurious modes Brakhage and Werner [1] introduced a combined boundary integral approach for the acoustic problem in 1965. In the same year Panich discussed this approach for the electromagnetic case [24]. But the analysis of the approach of Brakhage and Werner is applicable for smooth boundaries only. Hence modified boundary integral equations were discussed in [10] for the acoustic case and in [9] for the electromagnetic case. In [14] an alternative approach was introduced for the acoustic case. Here we want to generalize this idea to obtain modified combined boundary integral equations for the electromagnetic case. The paper is structured as follows: In section 2 we first summarize the definitions of Sobolev spaces to handle the variational formulation of the Maxwell system, and introduce potential operators and related boundary integral operators as needed later. We also discuss standard boundary integral approaches to solve the exterior Dirichlet boundary value problem, and comment on combined and already existing stabilized boundary integral formulations. An alternative modified boundary integral equation is formulated and analyzed in section 3. 
In particular, we present a new boundary integral formulation which is based on the use of standard, and therefore already available, boundary integral operators, and which is stable for all wave numbers. In section 4 we describe a first numerical example to show the applicability of the proposed approach. We finally end up with some conclusions and an outlook on ongoing work.

2. Function spaces and boundary integral equations. The formulation of boundary integral equations for the Maxwell system requires the use of the correct function spaces. Here we will recall only the definitions and the properties of Sobolev spaces for the Maxwell system; for a more detailed description see, e.g., [4, 5]. Let $\Omega \subset \mathbb{R}^3$ be a Lipschitz polyhedron [4] with a Lipschitz boundary $\Gamma = \partial\Omega$ which is the union of plane faces $\Gamma_i$, i.e., $\Gamma = \bigcup_i \overline{\Gamma}_i$, where $\mathbf{n}_i$ is the exterior normal vector on $\Gamma_i$. The partial differential equation in (1.1) and Green's first formula (1.4) motivate the definition of the energy space
\[
\mathbf{H}(\mathbf{curl}, \Omega) := \{ \mathbf{V} \in \mathbf{L}_2(\Omega) : \mathbf{curl}\,\mathbf{V} \in \mathbf{L}_2(\Omega) \}
\]
as well as the space of the natural solutions
\[
\mathbf{H}(\mathbf{curl}^2, \Omega) := \{ \mathbf{V} \in \mathbf{H}(\mathbf{curl}, \Omega) : \mathbf{curl}\,\mathbf{curl}\,\mathbf{V} \in \mathbf{L}_2(\Omega) \}.
\]
In addition we need to introduce appropriate Sobolev spaces on the boundary. For $|s| \le 1$ and for scalar functions on the boundary the usual Sobolev spaces are denoted by $H^s(\Gamma)$. Let us define the Dirichlet traces
\[
\gamma_D \mathbf{U} := \mathbf{n} \times (\mathbf{U}|_\Gamma \times \mathbf{n}) = \mathbf{n} \times \gamma_\times \mathbf{U}, \qquad \gamma_\times \mathbf{U} := \mathbf{U}|_\Gamma \times \mathbf{n}
\]
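and the Neumann trace
\[
\gamma_N \mathbf{U} := \mathbf{curl}\,\mathbf{U}|_\Gamma \times \mathbf{n},
\]
which all are mappings into tangential spaces. Hence we introduce the space
\[
\mathbf{L}_{2,t}(\Gamma) := \{ \mathbf{u} \in \mathbf{L}_2(\Gamma) : \mathbf{u} \cdot \mathbf{n} = 0 \}
\]
of tangential $\mathbf{L}_2(\Gamma)$ integrable functions. For higher order Sobolev spaces we use the piecewise definition
\[
\mathbf{H}^s_{pw,t}(\Gamma) := \{ \mathbf{u} \in \mathbf{L}_{2,t}(\Gamma) : \mathbf{u} \in \mathbf{H}^s(\Gamma_k),\ k = 1, \ldots, N_\Gamma \}.
\]
The trace spaces $\gamma_D \mathbf{H}^1(\Omega)$ and $\gamma_\times \mathbf{H}^1(\Omega)$ are denoted by $\mathbf{H}_\parallel^{1/2}(\Gamma)$ and $\mathbf{H}_\perp^{1/2}(\Gamma)$, respectively; for an alternative definition see [4]. The dual spaces with respect to $\mathbf{L}_{2,t}(\Gamma)$ are denoted by $\mathbf{H}_\parallel^{-1/2}(\Gamma)$ and $\mathbf{H}_\perp^{-1/2}(\Gamma)$. Before introducing the trace spaces of $\mathbf{H}(\mathbf{curl}, \Omega)$ we need to define some boundary differential operators. Here we just give definitions for smooth boundaries; for Lipschitz polyhedra see [4, 5]. For a scalar function $u$ defined on $\Gamma$ we denote by $\tilde{u}$ an arbitrary bounded extension into a three-dimensional neighborhood of $\Gamma$. Then we can define the boundary differential operators
\[
\nabla_\Gamma u := [\mathbf{n} \times (\nabla \tilde{u} \times \mathbf{n})]|_\Gamma, \qquad \mathbf{curl}_\Gamma u := [\mathbf{curl}\,(\tilde{u}\,\mathbf{n})]|_\Gamma,
\]
where
\[
\nabla_\Gamma : H^1(\Gamma) \to \mathbf{L}_{2,t}(\Gamma), \qquad \mathbf{curl}_\Gamma : H^1(\Gamma) \to \mathbf{L}_{2,t}(\Gamma).
\]
In addition, we introduce the adjoint operators of $-\nabla_\Gamma$ and of $\mathbf{curl}_\Gamma$, i.e.,
\[
\operatorname{div}_\Gamma : \mathbf{L}_{2,t}(\Gamma) \to H_*^{-1}(\Gamma), \qquad \operatorname{curl}_\Gamma : \mathbf{L}_{2,t}(\Gamma) \to H_*^{-1}(\Gamma),
\]
where
\[
H_*^{-1}(\Gamma) = \{ v \in H^{-1}(\Gamma) : \langle v, 1 \rangle_\Gamma = 0 \}.
\]
With the help of these operators we can finally define the Hilbert spaces
\[
\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma) := \{ \mathbf{u} \in \mathbf{H}_\perp^{-1/2}(\Gamma) : \operatorname{curl}_\Gamma \mathbf{u} \in H^{-1/2}(\Gamma) \},
\]
\[
\mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma) := \{ \mathbf{u} \in \mathbf{H}_\parallel^{-1/2}(\Gamma) : \operatorname{div}_\Gamma \mathbf{u} \in H^{-1/2}(\Gamma) \}.
\]
These spaces are dual to each other with respect to $\mathbf{L}_{2,t}(\Gamma)$ and represent the trace spaces $\gamma_D \mathbf{H}(\mathbf{curl}, \Omega)$ and $\gamma_\times \mathbf{H}(\mathbf{curl}, \Omega)$, respectively. Furthermore, there holds the following theorem [4, Theorems 2.7 and 2.8] and [5, Theorem 4.5].

Theorem 2.1. The operators
\[
\gamma_D : \mathbf{H}(\mathbf{curl}, \Omega) \to \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma), \qquad
\gamma_N : \mathbf{H}(\mathbf{curl}^2, \Omega) \to \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)
\]
are linear, continuous, and surjective.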
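Now we are able to introduce some potential and boundary integral operators which are relevant for electromagnetic scattering [11]. The solution of the exterior Dirichlet boundary value problem (1.1)–(1.3) can be described by using the Stratton–Chu representation formula [13, 17, 23]
\[
\mathbf{U}(x) = -\Psi^\kappa_M(\gamma_D^c \mathbf{U})(x) - \Psi^\kappa_S(\gamma_N^c \mathbf{U})(x) \quad \text{for } x \in \Omega^c, \tag{2.1}
\]
where the Maxwell single layer potential is given by
\[
\Psi^\kappa_S(\boldsymbol{\mu}) := \Psi^\kappa_A(\boldsymbol{\mu}) + \frac{1}{\kappa^2}\, \mathbf{grad}\, \Psi^\kappa_V(\operatorname{div}_\Gamma(\boldsymbol{\mu})),
\]
and the Maxwell double layer potential is defined by
\[
\Psi^\kappa_M(\boldsymbol{\lambda})(x) := \mathbf{curl}\, \Psi^\kappa_A(\boldsymbol{\lambda} \times \mathbf{n})(x).
\]
The operators $\Psi^\kappa_A$ and $\Psi^\kappa_V$ are the vectorial and the scalar single layer potentials which are given by
\[
\Psi^\kappa_A(\boldsymbol{\lambda})(x) := \int_\Gamma g_\kappa(x, y)\, \boldsymbol{\lambda}(y)\, ds_y, \qquad
\Psi^\kappa_V(\lambda)(x) := \int_\Gamma g_\kappa(x, y)\, \lambda(y)\, ds_y,
\]
where $g_\kappa(x, y)$ is the fundamental solution of the Helmholtz equation,
\[
g_\kappa(x, y) = \frac{1}{4\pi} \frac{e^{i\kappa|x - y|}}{|x - y|}.
\]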
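The following is a minimal numerical sketch, not part of the original analysis, that evaluates the kernel $g_\kappa$ and checks by central finite differences that $\Delta_x g_\kappa + \kappa^2 g_\kappa \approx 0$ away from the singularity $x = y$. The function names, the sample points, and the step size `h` are illustrative assumptions.

```python
import cmath
import math


def helmholtz_kernel(x, y, kappa):
    """Evaluate g_kappa(x, y) = exp(i*kappa*|x-y|) / (4*pi*|x-y|) for x != y."""
    r = math.dist(x, y)
    return cmath.exp(1j * kappa * r) / (4.0 * math.pi * r)


def helmholtz_residual(x, y, kappa, h=1e-3):
    """Central-difference approximation of (Laplace_x + kappa^2) g_kappa at x."""
    g0 = helmholtz_kernel(x, y, kappa)
    laplacian = 0.0
    for i in range(3):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        laplacian += (
            helmholtz_kernel(x_plus, y, kappa)
            - 2.0 * g0
            + helmholtz_kernel(x_minus, y, kappa)
        ) / h**2
    return laplacian + kappa**2 * g0


# Hypothetical sample data: field point x, source point y, wave number kappa.
x, y, kappa = (1.0, 0.5, 0.3), (0.0, 0.0, 0.0), 2.0
residual = abs(helmholtz_residual(x, y, kappa))
print(f"|(Laplace + kappa^2) g_kappa| at x: {residual:.2e}")
```

The residual is limited by the second-order accuracy of the difference stencil, so it should be small but not at machine precision.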
To use an indirect approach to represent the solution of (1.1)–(1.3) the following result is essential; see, e.g., [11, Theorem 3.8] or [13, section 6].

Theorem 2.2. The Maxwell single and double layer potentials are solutions of the partial differential equation in (1.1) and fulfill the Silver–Müller radiation condition (1.3). Moreover, the following mapping properties are valid:
\[
\Psi^\kappa_S : \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma) \to \mathbf{H}_{loc}(\mathbf{curl}^2, \Omega \cup \Omega^c), \qquad
\Psi^\kappa_M : \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma) \to \mathbf{H}_{loc}(\mathbf{curl}^2, \Omega \cup \Omega^c).
\]
Hence we can represent the solution of the exterior Dirichlet boundary value problem (1.1)–(1.3) either by the single layer potential
\[
\mathbf{U}(x) = \Psi^\kappa_S(\boldsymbol{\mu})(x) \quad \text{for } x \in \Omega^c \tag{2.2}
\]
or by using the double layer potential
\[
\mathbf{U}(x) = \Psi^\kappa_M(\boldsymbol{\lambda})(x) \quad \text{for } x \in \Omega^c. \tag{2.3}
\]
To find the unknown density functions $\boldsymbol{\mu} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$ and $\boldsymbol{\lambda} \in \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$ we have to formulate appropriate boundary integral equations which can be derived from the Dirichlet boundary condition (1.2). For this we first use the trace operators $\gamma_D$ and $\gamma_N$ as given in Theorem 2.1 to define related boundary integral operators; in
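particular for the interior trace we obtain
\[
\gamma_D \Psi^\kappa_S \boldsymbol{\mu}(x) =: S_\kappa \boldsymbol{\mu}(x), \qquad
\gamma_D \Psi^\kappa_M \boldsymbol{\lambda}(x) =: \Bigl(\tfrac12 I + C_\kappa\Bigr) \boldsymbol{\lambda}(x),
\]
\[
\gamma_N \Psi^\kappa_S \boldsymbol{\mu}(x) =: \Bigl(\tfrac12 I + B_\kappa\Bigr) \boldsymbol{\mu}(x), \qquad
\gamma_N \Psi^\kappa_M \boldsymbol{\lambda}(x) =: N_\kappa \boldsymbol{\lambda}(x),
\]
while for the exterior trace we get
\[
\gamma_D^c \Psi^\kappa_S \boldsymbol{\mu}(x) =: S_\kappa \boldsymbol{\mu}(x), \qquad
\gamma_D^c \Psi^\kappa_M \boldsymbol{\lambda}(x) =: \Bigl(-\tfrac12 I + C_\kappa\Bigr) \boldsymbol{\lambda}(x),
\]
\[
\gamma_N^c \Psi^\kappa_S \boldsymbol{\mu}(x) =: \Bigl(-\tfrac12 I + B_\kappa\Bigr) \boldsymbol{\mu}(x), \qquad
\gamma_N^c \Psi^\kappa_M \boldsymbol{\lambda}(x) =: N_\kappa \boldsymbol{\lambda}(x).
\]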
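Note that
\[
S_\kappa : \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma) \to \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)
\]
and
\[
N_\kappa : \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma) \to \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma).
\]
Moreover, with respect to the complex duality pairing
\[
\langle \boldsymbol{\lambda}, \boldsymbol{\mu} \rangle = \int_\Gamma \boldsymbol{\lambda}(x) \cdot \overline{\boldsymbol{\mu}(x)}\, ds_x,
\]
we have for $\kappa \in \mathbb{R} \setminus \{0\}$
\[
\langle S_\kappa \boldsymbol{\mu}, \mathbf{w} \rangle = \langle \boldsymbol{\mu}, S_{-\kappa} \mathbf{w} \rangle \quad \text{for all } \boldsymbol{\mu}, \mathbf{w} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma),
\]
\[
\langle N_\kappa \boldsymbol{\lambda}, \mathbf{v} \rangle = \langle \boldsymbol{\lambda}, N_{-\kappa} \mathbf{v} \rangle \quad \text{for all } \boldsymbol{\lambda}, \mathbf{v} \in \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma),
\]
while the double layer potentials $C_\kappa$ and $B_\kappa$ are related to each other as follows.

Lemma 2.3. For all $\boldsymbol{\mu} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$ and $\boldsymbol{\lambda} \in \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$ there holds
\[
\langle B_\kappa \boldsymbol{\mu}, \boldsymbol{\lambda} \rangle = -\langle \boldsymbol{\mu}, C_{-\kappa} \boldsymbol{\lambda} \rangle \quad \text{for all } \kappa \in \mathbb{R} \setminus \{0\}.
\]
Proof. Since $\mathbf{U} = \Psi^\kappa_S \boldsymbol{\mu}$ and $\mathbf{V} = \Psi^{-\kappa}_M \boldsymbol{\lambda}$ are solutions of the homogeneous Maxwell equations, we can write Green's first formula (1.4) for the bounded domain $\Omega$ as
\[
\int_\Omega \mathbf{curl}\,\mathbf{U} \cdot \mathbf{curl}\,\overline{\mathbf{V}}\, dx
= \int_\Omega \mathbf{curl}\,\mathbf{curl}\,\mathbf{U} \cdot \overline{\mathbf{V}}\, dx + \langle \gamma_N \mathbf{U}, \gamma_D \mathbf{V} \rangle
= \int_\Omega \kappa^2\, \mathbf{U} \cdot \overline{\mathbf{V}}\, dx + \langle \gamma_N \mathbf{U}, \gamma_D \mathbf{V} \rangle
\]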
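and
\[
\int_\Omega \mathbf{curl}\,\mathbf{V} \cdot \mathbf{curl}\,\overline{\mathbf{U}}\, dx = \int_\Omega \kappa^2\, \mathbf{V} \cdot \overline{\mathbf{U}}\, dx + \langle \gamma_N \mathbf{V}, \gamma_D \mathbf{U} \rangle.
\]
Hence we first conclude
\[
\langle \gamma_N \mathbf{U}, \gamma_D \mathbf{V} \rangle = \overline{\langle \gamma_N \mathbf{V}, \gamma_D \mathbf{U} \rangle}.
\]
On the other hand, for a bounded domain $B_r \setminus \overline{\Omega}$ we have
\[
\int_{B_r \setminus \overline{\Omega}} \mathbf{curl}\,\mathbf{U} \cdot \mathbf{curl}\,\overline{\mathbf{V}}\, dx
= \int_{B_r \setminus \overline{\Omega}} \kappa^2\, \mathbf{U} \cdot \overline{\mathbf{V}}\, dx
+ \int_{\partial B_r} \gamma_N \mathbf{U} \cdot \gamma_D \overline{\mathbf{V}}\, ds_x
- \langle \gamma_N^c \mathbf{U}, \gamma_D^c \mathbf{V} \rangle
\]
and
\[
\int_{B_r \setminus \overline{\Omega}} \mathbf{curl}\,\mathbf{V} \cdot \mathbf{curl}\,\overline{\mathbf{U}}\, dx
= \int_{B_r \setminus \overline{\Omega}} \kappa^2\, \mathbf{V} \cdot \overline{\mathbf{U}}\, dx
+ \int_{\partial B_r} \gamma_N \mathbf{V} \cdot \gamma_D \overline{\mathbf{U}}\, ds_x
- \langle \gamma_N^c \mathbf{V}, \gamma_D^c \mathbf{U} \rangle.
\]
Hence we also conclude
\[
\langle \gamma_N^c \mathbf{U}, \gamma_D^c \mathbf{V} \rangle
= \int_{\partial B_r} \gamma_N \mathbf{U} \cdot \gamma_D \overline{\mathbf{V}}\, ds_x
- \int_{\partial B_r} \gamma_N \overline{\mathbf{V}} \cdot \gamma_D \mathbf{U}\, ds_x
+ \overline{\langle \gamma_N^c \mathbf{V}, \gamma_D^c \mathbf{U} \rangle}
\]
and therefore, for $r \to \infty$,
\[
\langle \gamma_N^c \mathbf{U}, \gamma_D^c \mathbf{V} \rangle
= \overline{\langle \gamma_N^c \mathbf{V}, \gamma_D^c \mathbf{U} \rangle}
= \overline{\langle \gamma_N \mathbf{V}, \gamma_D \mathbf{U} \rangle}
= \langle \gamma_N \mathbf{U}, \gamma_D \mathbf{V} \rangle.
\]
Note that $\mathbf{U} = \Psi^\kappa_S \boldsymbol{\mu}$ and $\overline{\mathbf{V}} = \overline{\Psi^{-\kappa}_M \boldsymbol{\lambda}} = \Psi^\kappa_M \overline{\boldsymbol{\lambda}}$ are both solutions of the homogeneous Maxwell equations (1.1) satisfying the radiation condition (1.3); see also [11, Lemma 3.10]. The middle equality uses that the Dirichlet trace of the single layer potential and the Neumann trace of the double layer potential are continuous across $\Gamma$. With the interior and exterior Neumann traces,
\[
\gamma_N \mathbf{U} = \gamma_N \Psi^\kappa_S \boldsymbol{\mu} = \Bigl(\tfrac12 I + B_\kappa\Bigr) \boldsymbol{\mu}, \qquad
\gamma_N^c \mathbf{U} = \gamma_N^c \Psi^\kappa_S \boldsymbol{\mu} = \Bigl(-\tfrac12 I + B_\kappa\Bigr) \boldsymbol{\mu},
\]
we further obtain
\[
\gamma_N \mathbf{U} + \gamma_N^c \mathbf{U} = 2 B_\kappa \boldsymbol{\mu}, \qquad
\gamma_N \mathbf{U} - \gamma_N^c \mathbf{U} = \boldsymbol{\mu}.
\]
On the other hand, when considering the interior and exterior Dirichlet traces this gives
\[
\gamma_D \mathbf{V} = \gamma_D \Psi^{-\kappa}_M \boldsymbol{\lambda} = \Bigl(\tfrac12 I + C_{-\kappa}\Bigr) \boldsymbol{\lambda}, \qquad
\gamma_D^c \mathbf{V} = \Bigl(-\tfrac12 I + C_{-\kappa}\Bigr) \boldsymbol{\lambda},
\]
and therefore
\[
\gamma_D \mathbf{V} + \gamma_D^c \mathbf{V} = 2 C_{-\kappa} \boldsymbol{\lambda}, \qquad
\gamma_D \mathbf{V} - \gamma_D^c \mathbf{V} = \boldsymbol{\lambda}.
\]
Hence we finally obtain
\begin{align*}
2 \langle B_\kappa \boldsymbol{\mu}, \boldsymbol{\lambda} \rangle
&= \langle \gamma_N \mathbf{U} + \gamma_N^c \mathbf{U},\ \gamma_D \mathbf{V} - \gamma_D^c \mathbf{V} \rangle \\
&= \langle \gamma_N \mathbf{U}, \gamma_D \mathbf{V} \rangle + \langle \gamma_N^c \mathbf{U}, \gamma_D \mathbf{V} \rangle - \langle \gamma_N \mathbf{U}, \gamma_D^c \mathbf{V} \rangle - \langle \gamma_N^c \mathbf{U}, \gamma_D^c \mathbf{V} \rangle \\
&= \langle \gamma_N^c \mathbf{U}, \gamma_D^c \mathbf{V} \rangle + \langle \gamma_N^c \mathbf{U}, \gamma_D \mathbf{V} \rangle - \langle \gamma_N \mathbf{U}, \gamma_D^c \mathbf{V} \rangle - \langle \gamma_N \mathbf{U}, \gamma_D \mathbf{V} \rangle \\
&= \langle \gamma_N^c \mathbf{U} - \gamma_N \mathbf{U},\ \gamma_D^c \mathbf{V} + \gamma_D \mathbf{V} \rangle \\
&= -2 \langle \boldsymbol{\mu}, C_{-\kappa} \boldsymbol{\lambda} \rangle.
\end{align*}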
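When using the single layer potential (2.2) we have to find $\boldsymbol{\mu} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$ by solving the boundary integral equation
\[
S_\kappa \boldsymbol{\mu}(x) = \mathbf{g}(x) \quad \text{for } x \in \Gamma, \tag{2.4}
\]
while for the double layer potential (2.3) $\boldsymbol{\lambda} \in \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$ is the solution of the boundary integral equation
\[
-\tfrac12 \boldsymbol{\lambda}(x) + C_\kappa \boldsymbol{\lambda}(x) = \mathbf{g}(x) \quad \text{for } x \in \Gamma. \tag{2.5}
\]
When applying the exterior Dirichlet and the exterior Neumann traces to the Stratton–Chu representation formula (2.1) we obtain a system of boundary integral equations,
\begin{align*}
\gamma_D^c \mathbf{U} &= -S_\kappa \gamma_N^c \mathbf{U} + \bigl(\tfrac12 I - C_\kappa\bigr) \gamma_D^c \mathbf{U}, \\
\gamma_N^c \mathbf{U} &= \bigl(\tfrac12 I - B_\kappa\bigr) \gamma_N^c \mathbf{U} - N_\kappa \gamma_D^c \mathbf{U}. \tag{2.6}
\end{align*}
In particular, to describe the solution of the exterior Dirichlet boundary value problem (1.1)–(1.3) we may use the first boundary integral equation in (2.6) to find $\gamma_N^c \mathbf{U} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$ such that
\[
S_\kappa \gamma_N^c \mathbf{U}(x) = -\tfrac12 \mathbf{g}(x) - C_\kappa \mathbf{g}(x) \quad \text{for } x \in \Gamma. \tag{2.7}
\]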
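Proposition 2.4 (see [12]). Let $\lambda = \kappa^2$ be an eigenvalue of the interior Maxwell eigenvalue problem
\[
\mathbf{curl}\,\mathbf{curl}\,\mathbf{U}_\lambda(x) = \lambda\, \mathbf{U}_\lambda(x) \quad \text{for } x \in \Omega.
\]
Then, in the case of the interior Dirichlet eigenvalue problem
\[
\mathbf{curl}\,\mathbf{curl}\,\mathbf{U}_\lambda(x) = \lambda\, \mathbf{U}_\lambda(x) \quad \text{for } x \in \Omega, \qquad
\gamma_D \mathbf{U}_\lambda(x) = \mathbf{0} \quad \text{for } x \in \Gamma, \tag{2.8}
\]
$\gamma_N \mathbf{U}_\lambda(x)$ is in the kernel of $S_\kappa$ and $(-\tfrac12 I + B_\kappa)$, i.e.,
\[
S_\kappa \gamma_N \mathbf{U}_\lambda = \mathbf{0}, \qquad \bigl(\tfrac12 I - B_\kappa\bigr) \gamma_N \mathbf{U}_\lambda = \mathbf{0}.
\]
On the other hand, if $\kappa^2$ is not an eigenvalue of the interior Dirichlet eigenvalue problem (2.8), then $S_\kappa \mathbf{w} = \mathbf{0}$ implies $\mathbf{w} = \mathbf{0}$. Moreover, in the case of the interior Neumann eigenvalue problem
\[
\mathbf{curl}\,\mathbf{curl}\,\mathbf{V}_\lambda(x) = \lambda\, \mathbf{V}_\lambda(x) \quad \text{for } x \in \Omega, \qquad
\gamma_N \mathbf{V}_\lambda(x) = \mathbf{0} \quad \text{for } x \in \Gamma, \tag{2.9}
\]
$\gamma_D \mathbf{V}_\lambda(x)$ is in the kernel of $N_\kappa$ and $(\tfrac12 I - C_\kappa)$, i.e.,
\[
N_\kappa \gamma_D \mathbf{V}_\lambda = \mathbf{0}, \qquad \bigl(\tfrac12 I - C_\kappa\bigr) \gamma_D \mathbf{V}_\lambda = \mathbf{0}.
\]
Hence, if $\lambda = \kappa^2$ is an eigenvalue of the interior Maxwell eigenvalue problem, we conclude that the single layer potential operator $S_\kappa$ is not invertible, and therefore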
the boundary integral equations (2.4) and (2.7) are in general not solvable. However, due to
\[
\Bigl\langle -\tfrac12 \mathbf{g} - C_\kappa \mathbf{g},\ \gamma_N \mathbf{U}_\lambda \Bigr\rangle
= \Bigl\langle \mathbf{g},\ \bigl(-\tfrac12 I + B_{-\kappa}\bigr) \gamma_N \mathbf{U}_\lambda \Bigr\rangle = 0
\]
we conclude that the right-hand side of the boundary integral equation (2.7) is in the image of the single layer potential $S_\kappa$; i.e., the boundary integral equation (2.7) of the direct approach is solvable, but the solution is not unique. Moreover, the boundary integral operator $\tfrac12 I - C_\kappa$ is also not invertible, and therefore the boundary integral equation (2.5) of the indirect approach is in general not solvable.

To overcome the problem of nonsolvability of boundary integral equations due to interior eigenfrequencies one may use a combined approach such as the formulation of Brakhage and Werner, who introduced a combined field integral equation for the acoustic scattering problem [1]. The same idea was used by Panich in [24] for the electromagnetic case. In general, the idea is to consider complex linear combinations of the single and double layer potential, i.e.,
\[
\mathbf{U}(x) = -i\eta\, \Psi^\kappa_S \mathbf{w}(x) - \Psi^\kappa_M \mathbf{w}(x) \quad \text{for } x \in \Omega^c,
\]
where $\eta \in \mathbb{R}_+$ is some parameter to be chosen. The unknown density $\mathbf{w} \in \mathbf{L}_2(\Gamma)$ can then be determined from the resulting boundary integral equation
\[
\gamma_D^c \mathbf{U}(x) = -i\eta\, S_\kappa \mathbf{w}(x) + \Bigl(\tfrac12 I - C_\kappa\Bigr) \mathbf{w}(x) = \mathbf{g}(x) \quad \text{for } x \in \Gamma, \tag{2.10}
\]
which can be proved to be uniquely solvable if the boundary $\Gamma = \partial\Omega$ is sufficiently smooth. But this proof is essentially based on the compactness of the double layer potential operator $C_\kappa$, which is not satisfied if $\Omega$ is a Lipschitz polyhedron. Hence one may introduce a regularization operator $\mathcal{B} : \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma) \to \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$ such that the stabilized boundary integral equation
\[
\gamma_D^c \mathbf{U}(x) = -i\eta\, S_\kappa \mathbf{w}(x) + \Bigl(\tfrac12 I - C_\kappa\Bigr) \mathcal{B} \mathbf{w}(x) = \mathbf{g}(x) \quad \text{for } x \in \Gamma \tag{2.11}
\]
admits a unique solution $\mathbf{w} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$. A suitable compact operator $\mathcal{B}$ was introduced by Buffa and Hiptmair in [9]. The unique solvability of the stabilized boundary integral equation (2.11) is then based on a generalized Gårding inequality for the single layer potential $S_\kappa$ and on the injectivity of the composed boundary integral operator in (2.11). In the next section we will describe an alternative approach which generalizes modified boundary integral equations for the Helmholtz case [14].

To analyze the proposed modified boundary integral formulation we will need some auxiliary results as given in the following. Due to the boundary integral equations (2.6) we define, for general $\sigma \in \mathbb{C}$, the Calderon projector
\[
\mathcal{C} = \begin{pmatrix} \tfrac12 I - C_\sigma & -S_\sigma \\ -N_\sigma & \tfrac12 I - B_\sigma \end{pmatrix},
\]
which satisfies the projection property
\[
\mathcal{C}^2 \begin{pmatrix} \boldsymbol{\lambda} \\ \boldsymbol{\mu} \end{pmatrix} = \mathcal{C} \begin{pmatrix} \boldsymbol{\lambda} \\ \boldsymbol{\mu} \end{pmatrix} \tag{2.12}
\]
for all $\boldsymbol{\lambda} \in \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$ and $\boldsymbol{\mu} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$. As a corollary of the projection property (2.12) we then conclude the relations
\begin{align*}
S_\sigma N_\sigma &= \tfrac14 I - C_\sigma^2, \tag{2.13} \\
N_\sigma S_\sigma &= \tfrac14 I - B_\sigma^2, \tag{2.14} \\
-N_\sigma C_\sigma &= B_\sigma N_\sigma, \tag{2.15} \\
-C_\sigma S_\sigma &= S_\sigma B_\sigma. \tag{2.16}
\end{align*}
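For readers who want to see how (2.13)–(2.16) follow from (2.12), the block product can be written out explicitly. The following is only a sketch of that routine computation, comparing the four blocks of $\mathcal{C}^2$ with those of $\mathcal{C}$:

```latex
\begin{align*}
\mathcal{C}^2 &=
\begin{pmatrix}
(\tfrac12 I - C_\sigma)^2 + S_\sigma N_\sigma
  & -(\tfrac12 I - C_\sigma) S_\sigma - S_\sigma (\tfrac12 I - B_\sigma) \\
-N_\sigma (\tfrac12 I - C_\sigma) - (\tfrac12 I - B_\sigma) N_\sigma
  & N_\sigma S_\sigma + (\tfrac12 I - B_\sigma)^2
\end{pmatrix}. \\[4pt]
% Block (1,1): (1/4) I - C + C^2 + S N = (1/2) I - C
\text{(1,1):}\quad & S_\sigma N_\sigma = \tfrac14 I - C_\sigma^2, \\
% Block (2,2): N S + (1/4) I - B + B^2 = (1/2) I - B
\text{(2,2):}\quad & N_\sigma S_\sigma = \tfrac14 I - B_\sigma^2, \\
% Block (2,1): -N + N C + B N = -N
\text{(2,1):}\quad & N_\sigma C_\sigma + B_\sigma N_\sigma = 0, \\
% Block (1,2): -S + C S + S B = -S
\text{(1,2):}\quad & C_\sigma S_\sigma + S_\sigma B_\sigma = 0.
\end{align*}
```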
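Note that the case $\sigma = \kappa \in \mathbb{R}$ corresponds to the Maxwell equation (1.1), while the purely imaginary case $\sigma = i\kappa$, $\kappa \in \mathbb{R}$, corresponds to the Yukawa-type equation
\[
\mathbf{curl}\,\mathbf{curl}\,\mathbf{U}(x) + \kappa^2 \mathbf{U}(x) = \mathbf{0} \quad \text{for } x \in \Omega^c,
\]
and the associated fundamental solution is given by
\[
g_{i\kappa}(x, y) = \frac{1}{4\pi} \frac{e^{-\kappa|x - y|}}{|x - y|}.
\]
In this case, i.e., for $\sigma = i\kappa$, $\kappa \in \mathbb{R}$, the single layer boundary integral operator $S_\sigma$ and the hypersingular integral operator $N_\sigma$ are self-adjoint with respect to the complex duality pairing, while for the related double layer potentials we have
\[
\langle B_\sigma \boldsymbol{\mu}, \boldsymbol{\lambda} \rangle = -\langle \boldsymbol{\mu}, C_\sigma \boldsymbol{\lambda} \rangle.
\]
If the single layer potential operator $S_\sigma$ is invertible, we can define the Steklov–Poincaré operator
\[
T_\sigma := S_\sigma^{-1} \Bigl(\tfrac12 I - C_\sigma\Bigr) : \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma) \to \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma), \tag{2.17}
\]
which allows an alternative symmetric representation
\[
T_\sigma := N_\sigma + \Bigl(\tfrac12 I + B_\sigma\Bigr) S_\sigma^{-1} \Bigl(\tfrac12 I - C_\sigma\Bigr). \tag{2.18}
\]
Theorem 2.5. The operators
\[
A_0 = \gamma_D^c \Psi^0_A : \mathbf{H}_\parallel^{-1/2}(\Gamma) \to \mathbf{H}_\parallel^{1/2}(\Gamma)
\]
and
\[
V_0 = \gamma_D^c \Psi^0_V : H^{-1/2}(\Gamma) \to H^{1/2}(\Gamma)
\]
are self-adjoint as well as $\mathbf{H}_\parallel^{-1/2}(\Gamma)$- and $H^{-1/2}(\Gamma)$-elliptic, respectively. Moreover, for $\sigma = i\kappa$, $\kappa \in \mathbb{R}_+$, the single layer potential
\[
S_\sigma : \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma) \to \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)
\]
is $\mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$-elliptic and self-adjoint.

Proof. For the mapping properties of the boundary integral operators $A_0$ and $V_0$ see [6, Theorem 4]. The ellipticity of $S_\sigma$ follows as in the case of the Laplace operator; see, e.g., [27].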
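The equivalence of the two representations (2.17) and (2.18) of the Steklov–Poincaré operator can be checked directly from the relations (2.13) and (2.16). The following short computation is a sketch under the assumption that $S_\sigma$ is invertible:

```latex
\begin{align*}
% (2.16): -C S = S B, hence B S^{-1} = -S^{-1} C and
\bigl(\tfrac12 I + B_\sigma\bigr) S_\sigma^{-1}
  &= S_\sigma^{-1} \bigl(\tfrac12 I - C_\sigma\bigr), \\
% (2.13): S N = (1/4) I - C^2, hence
N_\sigma &= S_\sigma^{-1} \bigl(\tfrac14 I - C_\sigma^2\bigr), \\
N_\sigma + \bigl(\tfrac12 I + B_\sigma\bigr) S_\sigma^{-1} \bigl(\tfrac12 I - C_\sigma\bigr)
  &= S_\sigma^{-1} \Bigl[ \tfrac14 I - C_\sigma^2 + \bigl(\tfrac12 I - C_\sigma\bigr)^2 \Bigr] \\
  &= S_\sigma^{-1} \bigl(\tfrac12 I - C_\sigma\bigr) = T_\sigma .
\end{align*}
```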
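3. Modified boundary integral equations. In this section we propose an alternative approach of a modified boundary integral equation to solve the exterior Dirichlet boundary value problem (1.1)–(1.3). Because of symmetry reasons we choose
\[
\mathcal{B} = S_0^{*\,-1} \Bigl(\tfrac12 I + B_{-\kappa}\Bigr) : \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma) \to \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma),
\]
where $S_0^* : \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma) \to \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$ is given by
\[
S_0^* \mathbf{u} := \mathbf{n} \times A_0 (\mathbf{u} \times \mathbf{n}) + \mathbf{curl}_\Gamma V_0 \operatorname{curl}_\Gamma \mathbf{u}.
\]
By using Theorem 2.5 one can prove that $S_0^*$ is $\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$-elliptic and self-adjoint. Now we can describe the solution of the exterior Dirichlet boundary value problem (1.1)–(1.3) by
\[
\mathbf{U}(x) = \Psi^\kappa_S \mathbf{w}(x) - i\eta\, \Psi^\kappa_M \mathcal{B} \mathbf{w}(x) \quad \text{for } x \in \Omega^c.
\]
When applying the exterior Dirichlet trace we can find the unknown density $\mathbf{w} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$ from the modified boundary integral equation
\[
Z_\kappa \mathbf{w}(x) = S_\kappa \mathbf{w}(x) + i\eta \Bigl(\tfrac12 I - C_\kappa\Bigr) S_0^{*\,-1} \Bigl(\tfrac12 I + B_{-\kappa}\Bigr) \mathbf{w}(x) = \mathbf{g}(x) \quad \text{for } x \in \Gamma. \tag{3.1}
\]
To establish the unique solvability of the modified boundary integral equation (3.1) we first prove that $Z_\kappa$ is coercive. In contrast to the approach in [14] we show the coercivity in the second part, because the single layer potential $S_\kappa$ does not fulfill a Gårding inequality.

To prove the coercivity of the operator $Z_\kappa$ we first define an appropriate equivalent norm in $\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$, see Theorem 2.5; i.e., for $\sigma = i\kappa$, $\kappa \in \mathbb{R}_+$,
\[
\|\mathbf{u}\|_{S_\sigma^{-1}} := \sqrt{\langle S_\sigma^{-1} \mathbf{u}, \mathbf{u} \rangle}, \qquad \mathbf{u} \in \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma).
\]
As in the case of a formally elliptic partial differential operator [28] we can prove a contraction property of the associated double layer potential $\tfrac12 I - C_\sigma$, $\sigma = i\kappa$, $\kappa \in \mathbb{R}_+$.

Theorem 3.1. For all $\mathbf{u} \in \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$ and for $\sigma = i\kappa$, $\kappa \in \mathbb{R}_+$, there holds
\[
(1 - c_K) \|\mathbf{u}\|_{S_\sigma^{-1}} \le \Bigl\| \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u} \Bigr\|_{S_\sigma^{-1}} \le c_K \|\mathbf{u}\|_{S_\sigma^{-1}},
\]
where
\[
c_K = \frac12 + \sqrt{\frac14 - c_1^S c_1^N} < 1,
\]
and $c_1^S$, $c_1^N$ are the ellipticity constants of the single layer potential $S_\sigma$ and of the hypersingular operator $N_\sigma$.

Proof. The proof follows as in the case of a formally elliptic partial differential operator; see [28, Theorem 3.1].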
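For $\mathbf{u} \in \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$ with $\|\mathbf{u}\|_{\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)} > 0$ we first have
\[
\Bigl\| \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u} \Bigr\|^2_{S_\sigma^{-1}}
= \Bigl\langle S_\sigma^{-1} \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u},\ \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u} \Bigr\rangle
= \langle T_\sigma \mathbf{u}, \mathbf{u} \rangle - \langle N_\sigma \mathbf{u}, \mathbf{u} \rangle,
\]
where the Steklov–Poincaré operator $T_\sigma$ is defined as in (2.18). Let
\[
J : \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma) \to \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)
\]
be the Riesz operator; then
\[
A := J S_\sigma^{-1} : \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma) \to \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)
\]
is self-adjoint and $\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$-elliptic. Hence we can consider the splitting $A = A^{1/2} A^{1/2}$ to obtain
\begin{align*}
\langle T_\sigma \mathbf{u}, \mathbf{u} \rangle
&= \Bigl\langle S_\sigma^{-1} \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u}, \mathbf{u} \Bigr\rangle \\
&= \Bigl\langle J S_\sigma^{-1} \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u}, \mathbf{u} \Bigr\rangle_{\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)} \\
&= \Bigl\langle A^{1/2} \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u},\ A^{1/2} \mathbf{u} \Bigr\rangle_{\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)} \\
&\le \Bigl\| A^{1/2} \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u} \Bigr\|_{\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)} \bigl\| A^{1/2} \mathbf{u} \bigr\|_{\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)}.
\end{align*}
With
\[
\| A^{1/2} \mathbf{v} \|^2_{\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)}
= \langle A^{1/2} \mathbf{v}, A^{1/2} \mathbf{v} \rangle_{\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)}
= \langle J S_\sigma^{-1} \mathbf{v}, \mathbf{v} \rangle_{\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)}
= \langle S_\sigma^{-1} \mathbf{v}, \mathbf{v} \rangle = \|\mathbf{v}\|^2_{S_\sigma^{-1}}
\]
we then obtain
\[
\langle T_\sigma \mathbf{u}, \mathbf{u} \rangle \le \Bigl\| \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u} \Bigr\|_{S_\sigma^{-1}} \|\mathbf{u}\|_{S_\sigma^{-1}}.
\]
On the other hand, for the hypersingular boundary integral operator we have
\[
\langle N_\sigma \mathbf{u}, \mathbf{u} \rangle \ge c_1^N \|\mathbf{u}\|^2_{\mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)}
\ge c_1^N c_1^S \langle S_\sigma^{-1} \mathbf{u}, \mathbf{u} \rangle = c_1^N c_1^S \|\mathbf{u}\|^2_{S_\sigma^{-1}}.
\]
Altogether, this gives
\[
\Bigl\| \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u} \Bigr\|^2_{S_\sigma^{-1}}
= \langle T_\sigma \mathbf{u}, \mathbf{u} \rangle - \langle N_\sigma \mathbf{u}, \mathbf{u} \rangle
\le \Bigl\| \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u} \Bigr\|_{S_\sigma^{-1}} \|\mathbf{u}\|_{S_\sigma^{-1}} - c_1^N c_1^S \|\mathbf{u}\|^2_{S_\sigma^{-1}},
\]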
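which is equivalent to
\[
\Bigl(\frac{a}{b}\Bigr)^2 - \frac{a}{b} + c_1^N c_1^S \le 0,
\]
where
\[
a := \Bigl\| \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{u} \Bigr\|_{S_\sigma^{-1}} \ge 0, \qquad b := \|\mathbf{u}\|_{S_\sigma^{-1}} > 0.
\]
Hence we finally conclude
\[
\frac12 - \sqrt{\frac14 - c_1^N c_1^S} \le \frac{a}{b} \le \frac12 + \sqrt{\frac14 - c_1^N c_1^S},
\]
which gives the assertion. A similar estimate can also be shown for the operator $\tfrac12 I + C_\sigma$.

Theorem 3.2. For $\mathbf{v} \in \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$, $\sigma = i\kappa$, $\kappa \in \mathbb{R}_+$, there holds
\[
(1 - c_K) \|\mathbf{v}\|_{S_\sigma^{-1}} \le \Bigl\| \bigl(\tfrac12 I + C_\sigma\bigr) \mathbf{v} \Bigr\|_{S_\sigma^{-1}} \le c_K \|\mathbf{v}\|_{S_\sigma^{-1}}.
\]
Proof. The proof follows as in the case of a formally elliptic partial differential operator; see [28, Theorem 3.2]. With the contraction property of $\tfrac12 I - C_\sigma$ we obtain
\begin{align*}
\|\mathbf{v}\|_{S_\sigma^{-1}}
&= \Bigl\| \bigl(\tfrac12 I + C_\sigma\bigr) \mathbf{v} + \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{v} \Bigr\|_{S_\sigma^{-1}} \\
&\le \Bigl\| \bigl(\tfrac12 I + C_\sigma\bigr) \mathbf{v} \Bigr\|_{S_\sigma^{-1}} + \Bigl\| \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{v} \Bigr\|_{S_\sigma^{-1}} \\
&\le \Bigl\| \bigl(\tfrac12 I + C_\sigma\bigr) \mathbf{v} \Bigr\|_{S_\sigma^{-1}} + c_K \|\mathbf{v}\|_{S_\sigma^{-1}}
\end{align*}
and therefore the first inequality. On the other hand, by using the representations (2.17) and (2.18) we get
\begin{align*}
\Bigl\| \bigl(\tfrac12 I + C_\sigma\bigr) \mathbf{v} \Bigr\|^2_{S_\sigma^{-1}}
&= \Bigl\| \Bigl[ I - \bigl(\tfrac12 I - C_\sigma\bigr) \Bigr] \mathbf{v} \Bigr\|^2_{S_\sigma^{-1}} \\
&= \|\mathbf{v}\|^2_{S_\sigma^{-1}} - 2 \Bigl\langle S_\sigma^{-1} \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{v}, \mathbf{v} \Bigr\rangle + \Bigl\| \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{v} \Bigr\|^2_{S_\sigma^{-1}} \\
&= \|\mathbf{v}\|^2_{S_\sigma^{-1}} + \Bigl\| \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{v} \Bigr\|^2_{S_\sigma^{-1}} - 2 \langle T_\sigma \mathbf{v}, \mathbf{v} \rangle \\
&= \|\mathbf{v}\|^2_{S_\sigma^{-1}} - \Bigl\| \bigl(\tfrac12 I - C_\sigma\bigr) \mathbf{v} \Bigr\|^2_{S_\sigma^{-1}} - 2 \langle N_\sigma \mathbf{v}, \mathbf{v} \rangle \\
&\le \bigl[ 1 - (1 - c_K)^2 - 2 c_1^S c_1^N \bigr] \|\mathbf{v}\|^2_{S_\sigma^{-1}} = c_K^2 \|\mathbf{v}\|^2_{S_\sigma^{-1}}
\end{align*}
and therefore the upper estimate.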
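The last step of this proof uses the algebraic identity $1 - (1 - c_K)^2 - 2 c_1^S c_1^N = c_K^2$, and Theorem 3.1 uses that $1 - c_K$ and $c_K$ are the roots of $t^2 - t + c_1^S c_1^N$. A small numerical sketch can make both facts concrete. The ellipticity constants below are purely hypothetical sample values (the paper does not give numbers), and the function names are illustrative.

```python
import math


def contraction_constant(c_s, c_n):
    """Return c_K = 1/2 + sqrt(1/4 - c_s*c_n) for ellipticity constants c_s, c_n."""
    product = c_s * c_n
    if not 0.0 < product <= 0.25:
        raise ValueError("need 0 < c_s * c_n <= 1/4 for a real c_K < 1")
    return 0.5 + math.sqrt(0.25 - product)


def identity_gap(c_s, c_n):
    """Difference between 1 - (1 - c_K)^2 - 2*c_s*c_n and c_K^2 (should vanish)."""
    c_k = contraction_constant(c_s, c_n)
    return 1.0 - (1.0 - c_k) ** 2 - 2.0 * c_s * c_n - c_k**2


def quadratic(t, c_s, c_n):
    """Evaluate t^2 - t + c_s*c_n, whose roots bound the ratio a/b."""
    return t * t - t + c_s * c_n


# Hypothetical sample constants, chosen only so that c_s * c_n <= 1/4.
c_s, c_n = 0.4, 0.5
c_k = contraction_constant(c_s, c_n)
print("c_K =", c_k)
print("identity gap =", identity_gap(c_s, c_n))
print("quadratic at c_K and 1 - c_K:", quadratic(c_k, c_s, c_n), quadratic(1.0 - c_k, c_s, c_n))
```

Both printed residuals should vanish up to rounding, which is the reason the same constant $c_K$ appears in Theorems 3.1 and 3.2.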
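As for the operators $\tfrac12 I \pm C_\sigma$ we can prove related estimates for the operators $\tfrac12 I \pm B_\sigma$ when considering an equivalent norm in $\mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$ which is induced by the single layer potential $S_\sigma$; i.e., for $\sigma = i\kappa$, $\kappa \in \mathbb{R}_+$, there holds
\[
(1 - c_K) \|\mathbf{w}\|_{S_\sigma} \le \Bigl\| \bigl(\tfrac12 I \pm B_\sigma\bigr) \mathbf{w} \Bigr\|_{S_\sigma} \le c_K \|\mathbf{w}\|_{S_\sigma} \tag{3.2}
\]
for all $\mathbf{w} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$.

For $\mathbf{u} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$ and $\kappa \in \mathbb{R}_+$ we finally define the operator
\[
S_{\kappa,0} \mathbf{u} := A_0 \mathbf{u} - \frac{1}{\kappa^2} \nabla_\Gamma V_0 \operatorname{div}_\Gamma \mathbf{u}.
\]
Now we are able to prove the coercivity of the operator $Z_\kappa$.

Theorem 3.3. Let $\kappa \in \mathbb{R}_+$. The operator
\[
Z_\kappa = S_\kappa + i\eta \Bigl(\tfrac12 I - C_\kappa\Bigr) S_0^{*\,-1} \Bigl(\tfrac12 I + B_{-\kappa}\Bigr) : \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma) \to \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)
\]
satisfies a Gårding inequality; i.e., there holds
\[
\Im \bigl[ \langle Z_\kappa \boldsymbol{\mu}, \boldsymbol{\mu} \rangle + c_1(\boldsymbol{\mu}, \boldsymbol{\mu}) \bigr] \ge c_Z \|\boldsymbol{\mu}\|^2_{\mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)}
\]
for all $\boldsymbol{\mu} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$ with a positive constant $c_Z$, where $c_1(\boldsymbol{\mu}, \boldsymbol{\mu})$ is a compact bilinear form.

Proof. Since $\langle S_{\kappa,0} \mathbf{w}, \mathbf{w} \rangle$ is real, the same holds true for the duality product
\[
\Bigl\langle S_0^{*\,-1} \bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w},\ \bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w} \Bigr\rangle \in \mathbb{R}.
\]
Because of the contraction property (3.2) we get, for $\sigma = i\kappa$,
\[
\Bigl\| \bigl(\tfrac12 I + B_\sigma\bigr) \mathbf{w} \Bigr\|_{\mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)} \ge c\, \|\mathbf{w}\|_{\mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)}
\]
for all $\mathbf{w} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$. Since the operator $S_0^{*\,-1}$ is $\mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$-elliptic, we have
\[
\Bigl\langle S_0^{*\,-1} \bigl(\tfrac12 I + B_\sigma\bigr) \mathbf{w},\ \bigl(\tfrac12 I + B_\sigma\bigr) \mathbf{w} \Bigr\rangle \ge c\, \|\mathbf{w}\|^2_{\mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)}
\]
for all $\mathbf{w} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$. The operator $Z_\kappa$ can now be written in the following form:
\begin{align*}
Z_\kappa = S_{\kappa,0} + \underbrace{(S_\kappa - S_{\kappa,0})}_{\text{compact}}
+ i\eta \Bigl[ & \bigl(\tfrac12 I - C_\sigma\bigr) S_0^{*\,-1} \bigl(\tfrac12 I + B_\sigma\bigr) \\
& + \underbrace{(C_\sigma - C_\kappa) S_0^{*\,-1} \bigl(\tfrac12 I + B_{-\kappa}\bigr) + \bigl(\tfrac12 I - C_\sigma\bigr) S_0^{*\,-1} (B_{-\kappa} - B_\sigma)}_{\text{compact}} \Bigr],
\end{align*}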
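which implies
\begin{align*}
\Im \bigl[ \langle Z_\kappa \mathbf{w}, \mathbf{w} \rangle + c_1(\mathbf{w}, \mathbf{w}) \bigr]
&= \Im \Bigl[ \langle S_{\kappa,0} \mathbf{w}, \mathbf{w} \rangle + i\eta \Bigl\langle S_0^{*\,-1} \bigl(\tfrac12 I + B_\sigma\bigr) \mathbf{w},\ \bigl(\tfrac12 I + B_\sigma\bigr) \mathbf{w} \Bigr\rangle \Bigr] \\
&= \eta \Bigl\langle S_0^{*\,-1} \bigl(\tfrac12 I + B_\sigma\bigr) \mathbf{w},\ \bigl(\tfrac12 I + B_\sigma\bigr) \mathbf{w} \Bigr\rangle \\
&\ge c\, \|\mathbf{w}\|^2_{\mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)}.
\end{align*}
Note that the compactness of $S_\kappa - S_{\kappa,0}$, $C_\sigma - C_\kappa$, and $B_{-\kappa} - B_\sigma$ follows as for the Helmholtz case; see, e.g., [26, 27, 29].

Hence, to use Fredholm's alternative to establish the unique solvability of the modified boundary integral equation (3.1) it remains to prove the injectivity of the operator $Z_\kappa$. This can be done as for the Helmholtz equation; see [14].

Theorem 3.4. For a positive wave number $\kappa \in \mathbb{R}_+$ there holds
\[
\Im [\langle S_\kappa \mathbf{w}, \mathbf{w} \rangle] \ge 0
\]
for all $\mathbf{w} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$.

Proof. Let $\mathbf{U}(x) = \Psi^\kappa_S \mathbf{w}(x)$, $x \in \Omega$, be a solution of the partial differential equation (1.1). From Green's first formula (1.4) we then have
\[
\int_\Omega \bigl[ \mathbf{curl}\,\mathbf{U}(x) \cdot \mathbf{curl}\,\mathbf{V}(x) - \kappa^2\, \mathbf{U}(x) \cdot \mathbf{V}(x) \bigr]\, dx = \int_\Gamma \gamma_N \mathbf{U}(x) \cdot \gamma_D \mathbf{V}(x)\, ds_x.
\]
For $\mathbf{V} = \overline{\mathbf{U}}$ it follows that
\[
\int_\Omega \bigl[ |\mathbf{curl}\,\mathbf{U}(x)|^2 - \kappa^2 |\mathbf{U}(x)|^2 \bigr]\, dx = \int_\Gamma \gamma_N \mathbf{U}(x) \cdot \gamma_D \overline{\mathbf{U}(x)}\, ds_x.
\]
With
\[
\gamma_N \Psi^\kappa_S \mathbf{w}(x) = \tfrac12 \mathbf{w}(x) + B_\kappa \mathbf{w}(x), \qquad \gamma_D \Psi^\kappa_S \mathbf{w}(x) = S_\kappa \mathbf{w}(x),
\]
we then obtain
\[
\int_\Omega \bigl[ |\mathbf{curl}\,\mathbf{U}(x)|^2 - \kappa^2 |\mathbf{U}(x)|^2 \bigr]\, dx = \langle \gamma_N \mathbf{U}, \gamma_D \mathbf{U} \rangle = \Bigl\langle \tfrac12 \mathbf{w} + B_\kappa \mathbf{w},\ S_\kappa \mathbf{w} \Bigr\rangle.
\]
To handle the exterior domain $\Omega^c$ we first consider the bounded domain $B_r \setminus \overline{\Omega}$,
\[
\int_{B_r \setminus \overline{\Omega}} \bigl[ |\mathbf{curl}\,\mathbf{U}(x)|^2 - \kappa^2 |\mathbf{U}(x)|^2 \bigr]\, dx
= \int_{\partial B_r} \gamma_N \mathbf{U}(x) \cdot \gamma_D \overline{\mathbf{U}(x)}\, ds_x - \int_\Gamma \gamma_N^c \mathbf{U}(x) \cdot \gamma_D^c \overline{\mathbf{U}(x)}\, ds_x.
\]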
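For the exterior traces of $\mathbf{U}(x) = \Psi^\kappa_S \mathbf{w}(x)$, $x \in \Omega^c$, we have for $x \in \Gamma$
\[
\gamma_N^c \Psi^\kappa_S \mathbf{w}(x) = -\tfrac12 \mathbf{w}(x) + B_\kappa \mathbf{w}(x), \qquad \gamma_D^c \Psi^\kappa_S \mathbf{w}(x) = S_\kappa \mathbf{w}(x),
\]
and therefore
\[
\int_{B_r \setminus \overline{\Omega}} \bigl[ |\mathbf{curl}\,\mathbf{U}(x)|^2 - \kappa^2 |\mathbf{U}(x)|^2 \bigr]\, dx
= \int_{\partial B_r} \gamma_N \mathbf{U}(x) \cdot \gamma_D \overline{\mathbf{U}(x)}\, ds_x + \Bigl\langle \tfrac12 \mathbf{w} - B_\kappa \mathbf{w},\ S_\kappa \mathbf{w} \Bigr\rangle.
\]
Hence we find by summing up the above expressions
\[
\int_{B_r} \bigl[ |\mathbf{curl}\,\mathbf{U}(x)|^2 - \kappa^2 |\mathbf{U}(x)|^2 \bigr]\, dx = \langle \mathbf{w}, S_\kappa \mathbf{w} \rangle + \int_{\partial B_r} \gamma_N \mathbf{U}(x) \cdot \gamma_D \overline{\mathbf{U}(x)}\, ds_x,
\]
and therefore
\[
\Im [\langle \mathbf{w}, S_\kappa \mathbf{w} \rangle] = -\Im \int_{\partial B_r} \gamma_N \mathbf{U}(x) \cdot \gamma_D \overline{\mathbf{U}(x)}\, ds_x.
\]
From the Silver–Müller radiation condition, i.e.,
\[
\lim_{r \to \infty} \int_{\partial B_r} |\mathbf{curl}\,\mathbf{U}(x) \times \mathbf{n} - i\kappa\,(\mathbf{n} \times \mathbf{U}(x)) \times \mathbf{n}|^2\, ds_x = 0,
\]
we further conclude
\begin{align*}
\int_{\partial B_r} |\gamma_N \mathbf{U}(x) - i\kappa\, \gamma_D \mathbf{U}(x)|^2\, ds_x
&= \int_{\partial B_r} \Bigl[ |\gamma_N \mathbf{U}(x)|^2 + |\kappa\, \gamma_D \mathbf{U}(x)|^2 - 2 \Re \bigl[ \gamma_N \mathbf{U}(x) \cdot \overline{i\kappa\, \gamma_D \mathbf{U}(x)} \bigr] \Bigr]\, ds_x \\
&= \int_{\partial B_r} \Bigl[ |\gamma_N \mathbf{U}(x)|^2 + |\kappa\, \gamma_D \mathbf{U}(x)|^2 - 2\kappa\, \Im \bigl[ \gamma_N \mathbf{U}(x) \cdot \gamma_D \overline{\mathbf{U}(x)} \bigr] \Bigr]\, ds_x \\
&= \int_{\partial B_r} \Bigl[ |\gamma_N \mathbf{U}(x)|^2 + |\kappa\, \gamma_D \mathbf{U}(x)|^2 \Bigr]\, ds_x + 2\kappa\, \Im [\langle \mathbf{w}, S_\kappa \mathbf{w} \rangle] \to 0
\end{align*}
as $r \to \infty$, which implies $2\kappa\, \Im [\langle \mathbf{w}, S_\kappa \mathbf{w} \rangle] \le 0$ and thus $2\kappa\, \Im [\langle S_\kappa \mathbf{w}, \mathbf{w} \rangle] \ge 0$.

Now we are in a position to prove the injectivity of $Z_\kappa$.

Theorem 3.5. For $\kappa \in \mathbb{R}_+$ and $\eta \in \mathbb{R}_+$ the modified boundary integral operator
\[
Z_\kappa = S_\kappa + i\eta \Bigl(\tfrac12 I - C_\kappa\Bigr) S_0^{*\,-1} \Bigl(\tfrac12 I + B_{-\kappa}\Bigr) : \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma) \to \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)
\]
is injective.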
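Proof. Let $\mathbf{w} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$ be a solution of the homogeneous equation
\[
Z_\kappa \mathbf{w}(x) = \mathbf{0} \quad \text{for } x \in \Gamma.
\]
Then it follows that
\[
0 = \langle Z_\kappa \mathbf{w}, \mathbf{w} \rangle = \langle S_\kappa \mathbf{w}, \mathbf{w} \rangle + i\eta \Bigl\langle S_0^{*\,-1} \bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w},\ \bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w} \Bigr\rangle
\]
and therefore
\[
\Im \Bigl[ \langle S_\kappa \mathbf{w}, \mathbf{w} \rangle + i\eta \Bigl\langle S_0^{*\,-1} \bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w},\ \bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w} \Bigr\rangle \Bigr] = 0.
\]
By using Theorem 3.4 we then get
\[
\eta \Bigl\langle S_0^{*\,-1} \bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w},\ \bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w} \Bigr\rangle = -\Im [\langle S_\kappa \mathbf{w}, \mathbf{w} \rangle] \le 0,
\]
and hence we conclude
\[
\bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w} = \mathbf{0}.
\]
But then we also have $S_\kappa \mathbf{w}(x) = \mathbf{0}$ for $x \in \Gamma$, which admits a nontrivial solution $\mathbf{w} = \gamma_N \mathbf{U}_\lambda$ only if $\kappa^2 = \lambda$ is an eigenvalue of the interior Dirichlet eigenvalue problem (2.8), implying
\[
\bigl(\tfrac12 I - B_{\pm\kappa}\bigr) \mathbf{w} = \mathbf{0},
\]
i.e.,
\[
\bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w} = \mathbf{0}, \qquad \bigl(\tfrac12 I - B_{-\kappa}\bigr) \mathbf{w} = \mathbf{0}.
\]
Adding both equations gives $\mathbf{w} = \mathbf{0}$; hence we conclude $\mathbf{w} = \mathbf{0}$ for all frequencies $\kappa > 0$.

When combining the coercivity (Theorem 3.3) and the injectivity (Theorem 3.5) of the operator $Z_\kappa$ we therefore conclude the unique solvability of the modified boundary integral equation (3.1). The related variational formulation is to find $\mathbf{w} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$ such that
\[
\langle S_\kappa \mathbf{w}, \boldsymbol{\tau} \rangle + i\eta \Bigl\langle \bigl(\tfrac12 I - C_\kappa\bigr) S_0^{*\,-1} \bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w},\ \boldsymbol{\tau} \Bigr\rangle = \langle \mathbf{g}, \boldsymbol{\tau} \rangle \tag{3.3}
\]
is satisfied for all test functions $\boldsymbol{\tau} \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma)$. Note that the variational problem (3.3) has a similar structure as the symmetric boundary integral representation of the Steklov–Poincaré operator. Due to the composite structure a direct Galerkin discretization of (3.3) will not be possible. Hence we introduce
\[
\mathbf{z} = S_0^{*\,-1} \bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w} \in \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma),
\]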
which is the unique solution of the variational problem such that
\[
\langle S_0^* \mathbf{z}, \mathbf{v} \rangle = \Bigl\langle \bigl(\tfrac12 I + B_{-\kappa}\bigr) \mathbf{w},\ \mathbf{v} \Bigr\rangle
\]
is satisfied for all $\mathbf{v} \in \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$. Finally we obtain a saddle point formulation to find $(\mathbf{w}, \mathbf{z}) \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma) \times \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$ such that
\begin{align*}
\langle S_\kappa \mathbf{w}, \boldsymbol{\tau} \rangle + i\eta \bigl\langle (\tfrac12 I - C_\kappa) \mathbf{z}, \boldsymbol{\tau} \bigr\rangle &= \langle \mathbf{g}, \boldsymbol{\tau} \rangle, \\
-\bigl\langle (\tfrac12 I + B_{-\kappa}) \mathbf{w}, \mathbf{v} \bigr\rangle + \langle S_0^* \mathbf{z}, \mathbf{v} \rangle &= 0 \tag{3.4}
\end{align*}
is satisfied for all $(\boldsymbol{\tau}, \mathbf{v}) \in \mathbf{H}_\parallel^{-1/2}(\operatorname{div}_\Gamma, \Gamma) \times \mathbf{H}_\perp^{-1/2}(\operatorname{curl}_\Gamma, \Gamma)$. Since the modified boundary integral equation (3.1) is the Schur complement system of the mixed formulation (3.4) the unique solvability of (3.4) follows immediately.

Remark 3.6. In this paper we just presented a modified boundary integral formulation for the exterior Dirichlet boundary value problem (1.1)–(1.3). For an exterior Neumann boundary value problem a similar modified formulation can be derived and analyzed as well [29].

4. Numerical example. As a numerical example to show the applicability of the proposed approach we consider the exterior Dirichlet boundary value problem (1.1)–(1.3) where $\Omega = (0, 1)^3$ is the unit cube whose boundary $\Gamma = \partial\Omega$ is decomposed into $N$ triangular plane elements. For this domain we can easily deduce the eigenvalues and eigenfrequencies of the interior Dirichlet eigenvalue problem. In particular we will consider the smallest eigenvalue which corresponds to the wave number $k = \sqrt{2}\,\pi \approx 4.44288$. As exact solution of the exterior Dirichlet boundary value problem (1.1)–(1.3) we consider [2]
\[
\mathbf{U}(x) = \left[ \frac{\kappa^2 r^2 + \kappa r + 1}{r^3} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - \frac{\kappa^2 r^2 + 3\kappa r + 3}{r^5}\, (x_1 - \hat{x}_1) \begin{pmatrix} x_1 - \hat{x}_1 \\ x_2 - \hat{x}_2 \\ x_3 - \hat{x}_3 \end{pmatrix} \right] e^{\kappa r}
\]
for $x \in \Omega^c$, where the source point is $\hat{x} = (\tfrac12, \tfrac12, \tfrac12) \in \Omega$, and $r = |x - \hat{x}|$. For a comparison of different approaches we consider the indirect single layer potential ansatz leading to the boundary integral equation (2.4), the proposed modified formulation ($\eta = 1$) where we have to solve (3.1), and a direct approach which results in the boundary integral equation (2.7). In all cases the Galerkin discretization is done by using linear Raviart–Thomas elements; see, e.g., [2, 25] for details. The resulting linear systems are solved by a GMRES method with a relative error reduction of $\varepsilon = 10^{-8}$. Then we compute approximate solutions $\mathbf{U}_h$ and the related pointwise error in the evaluation point $\bar{x} = (1.4, 1.8, 2.0) \in \Omega^c$. All results are documented in Table 1.
It is obvious that the indirect single layer potential approach fails since the wave number k corresponds to an eigenvalue of the interior Dirichlet eigenvalue problem. The results of the modified formulation (3.1) and of the direct approach (2.7) are comparable in this example. However, for the latter one has to ensure a solvability condition also in the discrete case which requires in general the knowledge of the related eigenfrequency. Here we considered only a direct Galerkin discretization of (2.7) which may fail in more general situations.
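The critical wave number quoted above can be reproduced with a few lines of code. The sketch below assumes the classical cavity-resonator formula for the unit cube with vanishing tangential trace, $\lambda = \pi^2 (l^2 + m^2 + n^2)$ with nonnegative integers $l, m, n$ of which at most one is zero; this formula is not stated in the paper and is used here only as an illustration. The function name and the truncation parameter `max_index` are hypothetical.

```python
import math


def cube_dirichlet_wave_numbers(max_index=3):
    """Wave numbers kappa = sqrt(lambda) for the unit cube, assuming
    lambda = pi^2 (l^2 + m^2 + n^2) with at most one index equal to zero.

    Only the smallest entries are complete, since indices are cut at max_index.
    """
    squares = set()
    for l in range(max_index + 1):
        for m in range(max_index + 1):
            for n in range(max_index + 1):
                zero_count = (l == 0) + (m == 0) + (n == 0)
                if zero_count <= 1:
                    squares.add(l * l + m * m + n * n)
    return [math.pi * math.sqrt(s) for s in sorted(squares)]


wave_numbers = cube_dirichlet_wave_numbers()
print("smallest critical wave number:", round(wave_numbers[0], 5))
print("next few:", [round(k, 5) for k in wave_numbers[1:4]])
```

The smallest entry corresponds to the index triples with two ones and one zero, i.e., $\kappa = \sqrt{2}\,\pi$, which is the wave number used in this experiment.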
Table 1
Number of GMRES iterations and pointwise error.

            Indirect, (2.4)           Modified, (3.1)           Direct, (2.7)
    N       Iter   |U(x̄) − Uh(x̄)|     Iter   |U(x̄) − Uh(x̄)|     Iter   |U(x̄) − Uh(x̄)|
    72        53     7.64             110     1.27632             53     0.64908
    288      107    10.85             197     0.19541            107     0.19153
    1152     238    15.52             280     0.04874            209     0.04677
    4608     554    43.20             403     0.01308            469     0.01222
    18432                             665     0.00730            834     0.00529
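To relate the errors in Table 1 to a convergence order, one can compute observed rates between consecutive refinement levels. The sketch below assumes uniform refinement, so that the mesh size behaves like $h \sim N^{-1/2}$ and halves when $N$ is multiplied by four; the function and variable names are illustrative and not taken from the paper.

```python
import math

# Data copied from Table 1 (modified and direct formulations only).
n_elements = [72, 288, 1152, 4608, 18432]
errors = {
    "modified (3.1)": [1.27632, 0.19541, 0.04874, 0.01308, 0.00730],
    "direct (2.7)": [0.64908, 0.19153, 0.04677, 0.01222, 0.00529],
}


def observed_orders(ns, errs):
    """Observed order p in error ~ h^p between consecutive levels,
    assuming the mesh size scales like h ~ N^(-1/2)."""
    orders = []
    for i in range(len(errs) - 1):
        h_ratio = math.sqrt(ns[i + 1] / ns[i])  # equals h_i / h_{i+1}
        orders.append(math.log(errs[i] / errs[i + 1]) / math.log(h_ratio))
    return orders


for name, errs in errors.items():
    rates = ", ".join(f"{p:.2f}" for p in observed_orders(n_elements, errs))
    print(f"{name}: {rates}")
```

On the intermediate levels the observed rates are close to two for both formulations, while the step to the finest mesh shows a lower rate in both columns.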
Related to the numerical results there are several points to be discussed, first of all the numerical analysis to establish the quadratic order of pointwise convergence. Moreover, we have to investigate a suitable choice of the scaling parameter η ∈ R+ and the construction of efficient preconditioned iterative solution methods. It is obvious that these questions are strongly related to the case of exterior boundary value problems for the Helmholtz equation [15]. Note that the formulation corresponds to the symmetric formulation of boundary integral equations as used in domain decomposition methods, or to solve boundary value problems with boundary conditions of mixed Dirichlet and Neumann type [27].

5. Conclusions. In this paper we have described and analyzed a modified boundary integral equation to solve an exterior Dirichlet boundary value problem for the Maxwell system which is stable for all wave numbers. Note that a similar formulation can be given in the case of an exterior Neumann boundary value problem as well. The proposed regularization operator relies on boundary integral operators which are already available when considering standard boundary integral equations for the Maxwell system. The modified boundary integral equation is finally reformulated as a saddle point formulation which allows a direct Galerkin discretization. A first numerical example shows the applicability of the proposed approach. In a forthcoming paper we will present the numerical analysis of the related boundary element method to solve the saddle point formulation (3.4). This may also include the use of fast boundary element methods, and the design of preconditioned iterative solution strategies to solve the resulting linear systems of algebraic equations.

Acknowledgment. The authors would like to express their thanks to the anonymous referees for many helpful hints and advice.

REFERENCES
[1] H. Brakhage and P. Werner, Über das Dirichletsche Aussenraumproblem für die Helmholtzsche Schwingungsgleichung, Arch. Math., 16 (1965), pp. 325–329.
[2] J. Breuer, Schnelle Randelementmethoden zur Simulation von elektrischen Wirbelstromfeldern sowie ihrer Wärmeproduktion und Kühlung, Dissertation, Universität Stuttgart, Stuttgart, Germany, 2005.
[3] A. Buffa, Remarks on the discretization of some noncoercive operator with applications to heterogeneous Maxwell equations, SIAM J. Numer. Anal., 43 (2005), pp. 1–18.
[4] A. Buffa and P. Ciarlet, On traces for functional spaces related to Maxwell’s equations. I. An integration by parts formula in Lipschitz polyhedra, Math. Methods Appl. Sci., 24 (2001), pp. 9–30.
[5] A. Buffa and P. Ciarlet, On traces for functional spaces related to Maxwell’s equations. II. Hodge decompositions on the boundary of Lipschitz polyhedra and applications, Math. Methods Appl. Sci., 24 (2001), pp. 31–48.
[6] A. Buffa, M. Costabel, and C. Schwab, Boundary element methods for Maxwell’s equations on non-smooth domains, Numer. Math., 92 (2002), pp. 679–710.
[7] A. Buffa, M. Costabel, and D. Sheen, On traces for H(curl, Ω) in Lipschitz domains, J. Math. Anal. Appl., 276 (2002), pp. 845–867.
[8] A. Buffa and R. Hiptmair, Galerkin boundary element methods for electromagnetic scattering, in Topics in Computational Wave Propagation, Lect. Notes Comput. Sci. Eng. 31, Springer, Berlin, 2003, pp. 83–124.
[9] A. Buffa and R. Hiptmair, A coercive combined field integral equation for electromagnetic scattering, SIAM J. Numer. Anal., 42 (2004), pp. 621–640.
[10] A. Buffa and R. Hiptmair, Regularized combined field integral equations, Numer. Math., 100 (2005), pp. 1–19.
[11] A. Buffa, R. Hiptmair, T. von Petersdorff, and C. Schwab, Boundary element methods for Maxwell transmission problems in Lipschitz domains, Numer. Math., 95 (2003), pp. 459–485.
[12] D. Colton and R. Kress, Integral Equation Methods in Scattering Theory, John Wiley and Sons, New York, 1983.
[13] D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory, Appl. Math. Sci. 93, Springer, Berlin, 1998.
[14] S. Engleder and O. Steinbach, Modified boundary integral formulations for the Helmholtz equation, J. Math. Anal. Appl., 331 (2007), pp. 396–407.
[15] S. Engleder and O. Steinbach, Stabilized boundary element methods for exterior Helmholtz problems, Numer. Math., 110 (2008), pp. 145–160.
[16] R. Hiptmair, Symmetric coupling for eddy current problems, SIAM J. Numer. Anal., 40 (2002), pp. 41–65.
[17] R. Hiptmair, Boundary element methods for eddy current computation, in Computational Electromagnetics (Kiel, 2001), Lect. Notes Comput. Sci. Eng. 28, Springer, Berlin, 2003, pp. 103–126.
[18] R. Hiptmair, Coupling of finite elements and boundary elements in electromagnetic scattering, SIAM J. Numer. Anal., 41 (2003), pp. 919–944.
[19] R. Hiptmair and C. Schwab, Natural boundary element methods for the electric field integral equation on polyhedra, SIAM J. Numer. Anal., 40 (2002), pp. 66–86.
[20] G. C. Hsiao, Mathematical foundations for the boundary field equation methods in acoustic and electromagnetic scattering, in Analysis and Computational Methods in Scattering and Applied Mathematics. A Volume in the Memory of Ralph Ellis Kleinman, Chapman & Hall/CRC Res. Notes Math. 417, F. Santosa and I. Stakgold, eds., Chapman & Hall/CRC, Boca Raton, FL, 2000, pp. 149–163.
[21] G. C. Hsiao and R. E. Kleinman, Mathematical foundations for error estimation in numerical solutions of integral equations in electromagnetics, IEEE Trans. Antennas and Propagation, 45 (1997), pp. 316–328.
[22] P. Monk, Finite Element Methods for Maxwell’s Equations, Numer. Math. Sci. Comput., Oxford University Press, New York, 2003.
[23] J.-C. Nédélec, Acoustic and Electromagnetic Equations. Integral Representations for Harmonic Problems, Appl. Math. Sci. 144, Springer, New York, 2001.
[24] O. I. Panich, On the question of the solvability of the exterior boundary value problems for the wave equation and Maxwell’s equations, Uspekhi Mat. Nauk., 20 (1965), pp. 221–226 (in Russian).
[25] P.-A. Raviart and J. M. Thomas, A mixed finite element method for 2nd order elliptic problems, in Mathematical Aspects of Finite Element Methods (Rome, 1975), Lecture Notes in Math. 606, Springer, Berlin, 1977, pp. 292–315.
[26] S. A. Sauter and C. Schwab, Randelementmethoden. Analyse, Numerik und Implementierung schneller Algorithmen, B. G. Teubner, Stuttgart, Leipzig, Wiesbaden, 2004.
[27] O. Steinbach, Numerical Approximation Methods for Elliptic Boundary Value Problems. Finite and Boundary Elements, Springer, New York, 2008.
[28] O. Steinbach and W. L. Wendland, On C. Neumann’s method for second order elliptic systems in domains with non-smooth boundaries, J. Math. Anal. Appl., 262 (2001), pp. 733–748.
[29] M. Windisch, Modifizierte Randintegralgleichungen für elektromagnetische Streuprobleme, Diplomarbeit, Institut für Numerische Mathematik, TU Graz, Graz, Austria, 2007.
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1168–1194
© 2009 Society for Industrial and Applied Mathematics

A FAST METHOD FOR LINEAR WAVES BASED ON GEOMETRICAL OPTICS∗

CHRISTIAAN C. STOLK†

Abstract. We develop a fast method for solving the one-dimensional wave equation based on geometrical optics. From geometrical optics (e.g., Fourier integral operator theory or WKB approximation) it is known that high-frequency waves split into forward and backward propagating parts, each propagating with the wave speed, with amplitude that is slowly changing depending on the medium coefficients, under the assumption that the medium coefficients vary slowly compared to the wavelength. Based on this we construct a method of optimal, $O(N)$ complexity, with basically the following steps: 1. decouple the wavefield into an approximately forward and an approximately backward propagating part; 2. propagate each component explicitly along the characteristics over a time step that is small compared to the medium scale but can be large compared to the wavelength; 3. apply a correction to account for the errors in the explicit propagation; repeat steps 2 and 3 over the necessary amount of time steps; and 4. reconstruct the full field by adding forward and backward propagating components again. Due to step 3 the method accurately computes the full wavefield. A variant of the method was implemented and outperformed a standard order (4,4) finite difference method by a substantial factor. The general principle is applicable also in higher dimensions, but requires efficient implementations of Fourier integral operators which are still the subject of current research.

Key words. wave equation, numerical method, multiscale method, geometrical optics, integrating factor

AMS subject classifications. 65M25, 76Q05

DOI. 10.1137/070698919
∗ Received by the editors August 1, 2007; accepted for publication (in revised form) November 17, 2008; published electronically February 19, 2009. This research was supported by the Netherlands Organisation for Scientific Research through VIDI grant 639.032.509. This work was done while the author was employed at the University of Twente. http://www.siam.org/journals/sinum/47-2/69891.html
† Korteweg-de Vries Institute for Mathematics, University of Amsterdam, 1018 TV Amsterdam, The Netherlands ([email protected]).

1. Introduction. Consider waves propagating in an inhomogeneous medium which varies slowly on the scale of the wavelength. In this case, waves behave much like in the constant coefficient case. For example, in one dimension an initial pulse approximately splits into a forward propagating pulse and a backward propagating pulse, each propagating with the wave speed, and with slowly varying amplitude. Indeed for small times, the wave “sees” only a small, approximately constant part of the medium. This can be made precise using WKB, or geometrical optics theory, or the more general and advanced theory of Fourier integral operators. One finds that the above picture is true in the limit for high-frequency waves; these have the just described relatively simple interaction with the medium. For the low-frequency part the interaction with the medium is of course more complicated; e.g., reflections occur.

Simulating high-frequency waves using finite differences or finite elements is notoriously expensive, especially in three dimensions. One reason for this is the large number of time steps that is generally needed, since in conventional methods the time step is bounded by the space discretization length. In one dimension this leads to cost at least $O(N^2)$ if $N$ is the number of space discretization points. This on the one hand is quite understandable: The wavefield is computed over a finite part of the $(x, t)$-plane with resolution $1/N$ in both the $x$ and the $t$ direction. On the other hand,
if we are interested only in the map from initial to final values, one can argue that there is room for improvement: The high frequencies are well described by translation and scaling over quantities that follow from the smoothly varying medium. The low frequencies still need to be computed by some discretization, but with a coarse grid. In this paper we will show that in fact we can devise a scheme that follows this pattern and is of complexity $O(N)$, i.e., optimal.

The observation about the high cost of simulating high-frequency waves is not new, and various authors have sought to deal with this, e.g., [12] in one dimension, [2, 9] in higher dimensions. The paper [12] uses the observation that the matrix that describes the propagator $P(t)$ (the operator exponent $e^{tM}$ in the notation below, that maps initial values at time 0 to values at a later time $t$, assuming time-independent coefficients) can be compressed by wavelet compression. High-frequency signals in the propagator are concentrated around the characteristics. Low-frequency signals are not. Due to the separation in space and scale that is obtained using wavelets, this leads to many small entries that, if omitted, give only a small error to the matrix. The matrix is compressed in this way, and it becomes possible to store it. The operator exponent is then first computed for a small time $\tau$, and subsequently for longer times by repeated squaring $P(2\tau) = P(\tau)^2$, $P(4\tau) = P(2\tau)^2$, etc. Unlike our method this idea is restricted to time-independent coefficients. Curvelet frames [15, 4] have been proposed to extend this idea to multiple dimensions.

In this paper we introduce a new, different concept to reduce computational cost. We explicitly separate forward and backward propagating parts of the waves, as made possible by high-frequency asymptotic theory, and propagate these explicitly. No matrix compression is used.
Roughly speaking the method involves the following steps, which are repeated over a number of time steps to obtain the final result:
1. Decouple the wavefield into a forward and a backward propagating part, as for the constant coefficient medium, where we can find two functions $F$ and $B$ such that the solution is given by $U_1(x, t) = B(x + ct) + F(x - ct)$.
2. Propagate each component explicitly over a time step that is small compared to the medium scale but large compared to the wavelength.
3. Apply a correction to account for the errors in the explicit propagation.
4. Reconstruct the full field by adding forward and backward propagating components again.

For higher dimensions one could perhaps devise a similar scheme; however, at this point in time it is not clear how to efficiently compute the Fourier integral operators needed in step 2.

Two methods according to this outline will be described. First we derive a relatively straightforward method, which is implemented numerically and tested. The goal of this is to get a first impression of what kind of numerical results can be obtained. Compared with an order (4,4) finite difference method we find improvements in speed of factors up to 20, depending on the smoothness of the medium. A second method is derived using several more innovations, in particular a new multiscale time-stepping method; see section 6 and thereafter. For this method we study error estimates and the complexity, and we show that it has optimal $O(N)$ complexity. The $O(N)$ complexity is better than that in [12], but we also have another improvement compared to the repeated squaring method, namely that our method is also applicable in media with time-dependent coefficients.

Let us discuss in more mathematical terms the ideas behind the method. We consider the one-dimensional acoustic wave equation
\[ \bigl(\partial_t \circ a(x,t)\,\partial_t - \partial_x \circ b(x,t)\,\partial_x\bigr)\, U_1(x,t) = 0, \tag{1.1} \]
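with domain given by a circle $\Omega$ of integer length $L$. It will be convenient to write this as a first-order system; let
\[ U_2 = a\,\partial_t U_1, \qquad U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}, \qquad M = \begin{pmatrix} 0 & a^{-1} \\ \partial_x \circ b\,\partial_x & 0 \end{pmatrix}. \tag{1.2} \]
Then (1.1) becomes
\[ \frac{d}{dt} U = M U. \tag{1.3} \]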
We view this as an ODE with values in a function space, which explains the notation $\frac{d}{dt}$ in this equation. We are interested in the initial value problem where $U(t_0) = U_0$ is given and $U(t_1)$ is to be determined. The natural space to consider the equation is $U(t) \in H^1 \times L^2$, where $H^s = H^s(\Omega)$ denotes the Sobolev space of order $s$. With coefficients that are $C^{k,1}$ in space, and with time derivative that is also $C^{k,1}$ in space, there is existence, uniqueness, and stable dependence on initial values for $U_0 \in H^{s+1} \times H^s$, with $U(t) \in C([t_0, t_1], H^{s+1} \times H^s)$, for $-k-1 \le s \le k$ [14, 16].

Let us consider now where there is room for improvement in standard finite difference or finite element methods. Suppose $U_1$, $U_2$ are discretized on $\Omega$ by finite differences, using a regular grid with grid distance $h$ and $N = L/h$ grid points. Then the operator $M$ is discretized, and the time evolution is computed with some time-stepping procedure. The operator $M$ behaves like a first-order operator, mapping $H^{s+1} \times H^s$ to $H^s \times H^{s-1}$. Its norm is proportional to $h^{-1}$. Accuracy and stability of a discrete approximation now require that the time step is of order $h$, $\Delta t \lesssim h/c(x,t)$, with $c = \sqrt{b/a}$ the velocity (the Courant–Friedrichs–Lewy condition). The cost for given $N$ is therefore at least $O((\text{\# of time steps}) \cdot N) = O(N^2)$.

To have lower cost, we will attack the number of time steps, by using larger time steps. An idea that has been used for this purpose is operator splitting with an integrating factor method. Suppose $M$ is of the form
\[ M = A + B. \tag{1.4} \]
Operator splitting is the idea that the matrix exponential $e^{\Delta t(A+B)}$ is approximated by products of factors $e^{\Delta t_j A}$ and $e^{\Delta t_k B}$. One way to derive an operator splitting method is the integrating factor method. Let $E(t, t_0)$ be a solution operator for $U' = AU$, i.e., an operator that maps $U(t_0)$ to the solution $U(t)$ of $U' = AU$. For the time-independent case $E(t, t_0) = e^{(t-t_0)A}$. Then we can define
\[ V = E(t,t_0)^{-1} U. \tag{1.5} \]
The term $E(t, t_0)^{-1}$ is then an integrating factor. Differentiating the equivalent equation $E(t, t_0)V = U$ gives that
\[ (A + B)U = \frac{dU}{dt} = A E(t,t_0) V + E(t,t_0) \frac{dV}{dt}. \]
Therefore, solving for $\frac{dV}{dt}$,
\[ \frac{dV}{dt} = E(t,t_0)^{-1} B E(t,t_0) V. \tag{1.6} \]
To apply this usefully, the operator on the right-hand side must have smaller norm than the original operator $M$, so that time-stepping can be performed with larger time steps. This is applied in some nonlinear equations with a diffusive part, for which the time evolution can be computed efficiently in the Fourier domain [20]. Because of this use of an integrating factor, we call our method a geometrical optics integrating factor method.

A similar idea is used in the Egorov theorem of microlocal analysis. In this theory, a Fourier integral operator (FIO) $E(t, t_0)$ is constructed [11, 10, 19, 21], such that the field $V(t) = E(t, t_0)^{-1} U(t)$ satisfies
\[ \frac{\partial}{\partial t} V(t) = R(t,t_0) V(t), \tag{1.7} \]
where the operator $R$ is smoothing, in the sense that it maps $H^{s+1} \times H^s \to H^{s+1+K} \times H^{s+K}$ for any $K$ desired (the order $K$ depends on the number of terms in the asymptotic series for the amplitude in the FIO $E(t, t_0)$). The fact that $R$ is bounded means that a properly discretized version can be bounded independent of $h$. By the above reasoning the stepsize requirement would become independent of $h$ (of course an estimate of the time discretization error is needed to establish this). For small $h$, as the number of time steps would become large due to the CFL condition, one might expect to have a gain in computation speed for the transformed differential equation (1.7). Continuing this line of reasoning, the time step could become independent of the number of space discretization points $N$, assuming the desired accuracy stays fixed. For example, having initial conditions double in frequency, with the same medium and accuracy, one can conjecture that the time step could stay the same.

While Fourier integral operator theory has been developed for any space dimension, for dimension 2 or higher it is not clear how to efficiently obtain numerical approximations of Fourier integral operators (for work in this direction see, e.g., the recent paper [3]). Here we therefore treat the one-dimensional case. In this case, it is convenient not to work with the field $U$, or with $V$ in (1.7) directly, but instead work with forward and backward propagating components. These will be denoted by $u_1$ and $u_2$. An operator $Q$ and its inverse will be constructed such that $u = (u_1, u_2)^T = Q^{-1} U$ (this gives steps 1 and 4). We will show that in terms of these variables the differential equation (1.3) becomes
\[ \frac{d}{dt} u = (T + R)u \tag{1.8} \]
with
\[ T = \begin{pmatrix} \sqrt{b/a}\,\partial_x + f_1 & 0 \\ 0 & -\sqrt{b/a}\,\partial_x + f_2 \end{pmatrix}, \]
$f_1$, $f_2$ functions given below, and $R$ a remainder operator that is explicitly derived and is continuous $H^{s+1} \times H^{s+1} \to H^{s+2} \times H^{s+2}$ (for time-independent coefficients $f_1 = f_2 = 0$). Versions of $R$ with off-diagonal terms that are even more smoothing can also be constructed; see further on in the paper.

Equation (1.8) will be used for operator splitting. The equation $u' = Tu$ corresponds to two transport equations (step 2 in the outline above). These are solved
using the method of characteristics. This yields a geometrical optics approximation of the propagator. The term R then yields the correction mentioned in step 3 of the four points above. Computing with the characteristics is cheaper than computing directly on the wavefield, e.g., using a discretization of the transport equation. The explanation for this is that the time steps in an ODE solver needed for solving for the characteristics depend on the medium smoothness, and not on the smoothness of the wavefield, and can therefore be longer than the time steps in a discretization of the transport equation. Similarly, it is not necessary to compute a characteristic for each grid point because interpolation can be used. After computing the characteristics, applying the flow along the characteristics becomes a standard interpolation problem. The computation of flow along characteristics is related to the use of moving grids in scalar conservation laws. Originally the reason to have the grid moving with the singularities of solutions was that an adapted (locally refined) grid would stay adapted to the singularities. But it was also observed that this could lead to larger time steps [13]. As mentioned we have both numerical and theoretical results. First we derive a relatively simple method following the above ideas. This method has been implemented and compared with a standard order (4,4) finite-difference method described in [6]. Factors of order 10 to 20 of improvement in the computation speed were obtained in examples. In the second part of the paper we study error estimates and complexity. It turns out that the method described in sections 2 to 4 does not yet have the best possible complexity. With several enhancements we construct a method (or a class of methods) with optimal complexity O(N ) to solve the initial value problem. 
These additional features are the use of higher-order decoupling, and of a multiscale decomposition where each scale has its own time step (multiscale time-stepping). They will be further introduced in section 6.

The remainder of the paper will be organized as follows. In section 2 we describe the separation of the forward and backward propagating parts of the wavefield (decoupling). The differential equation is then transformed into one to which operator splitting and the integrating factor method can be applied. This is discussed in section 3. We then describe a simple space discretization and the resulting algorithm in section 4. Section 5 contains some numerical results. Section 6 introduces the main additional ideas behind the method for which we establish $O(N)$ complexity. These are further worked out and proved in sections 7, 8, and 9. We end with a short discussion of the results.

2. Decoupling the equation. The splitting in (1.4)–(1.6) is not directly applied to $M$; first the equation is transformed to new variables as announced in (1.8). We define new variables by $U(t) = Q(t)u(t)$, with $Q$ an invertible matrix operator. The operator $Q$ is independent of $t$ if $M$ is independent of $t$, and may otherwise depend on $t$. The equation for $u$ is then (with $'$ denoting time differentiation)
\[ u' = (Q^{-1} M Q - Q^{-1} Q')u. \tag{2.1} \]
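The step from (1.3) to (2.1) is a product-rule computation. It is spelled out here as a short worked equation for convenience; it uses only $U = Qu$ and $U' = MU$.

```latex
\begin{align*}
MQu \;=\; MU \;=\; \frac{dU}{dt} \;=\; \frac{d}{dt}(Qu) \;=\; Q'u + Qu'
\quad\Longrightarrow\quad
u' \;=\; Q^{-1}MQ\,u - Q^{-1}Q'\,u .
\end{align*}
```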
The purpose of this section is to find a suitable operator $Q$, such that the resulting differential equation is of the form
\[ \frac{d}{dt}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} \sqrt{b/a}\,\partial_x + f_1 & 0 \\ 0 & -\sqrt{b/a}\,\partial_x + f_2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} + R \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \tag{2.2} \]
with $Q$, $f_1$, and $f_2$ and the remainder operator $R$ to be determined. In fact, we will derive an explicit expression for
\[ QR = \underbrace{MQ}_{A} - \underbrace{Q \begin{pmatrix} \sqrt{b/a}\,\partial_x + f_1 & 0 \\ 0 & -\sqrt{b/a}\,\partial_x + f_2 \end{pmatrix}}_{B} - \underbrace{Q'}_{C}. \tag{2.3} \]
The notations $A$, $B$, $C$ will be used below in evaluating the product. Note that $R$ is not given directly, but has to be computed as the product of $Q^{-1}$ and $QR$, which are given; the reason for this is that we want to minimize the use of inverse differential operators, and here the only place where those occur is in $Q^{-1}$. We will find that the operator $R$ belongs to a class of pseudodifferential operators of order $-1$.

In the remainder of the section the actual computation is done. We treat separately the case where $a$, $b$ are time-independent and the general case with time-dependent $a$, $b$. For convenience we collect the results in the following lemma.

Lemma 2.1. For the time-independent case, with $Q$ given by (2.5), and $f_1 = f_2 = 0$, $QR$ is given by (2.6) and (2.7). For the time-dependent case, with $Q$, $f_1$, $f_2$ given by (2.8), (2.9), and (2.12), $QR$ is given by (2.10), (2.11), (2.13), and (2.14).

Computation for the time-independent case. In this case we will take $Q$ independent of $t$, so that $C = 0$, and such that $f_1$ and $f_2$ vanish. Consider first the following choice for $Q$:
\[ Q^{(0)} = \begin{pmatrix} 1 & 1 \\ \sqrt{ab}\,\partial_x & -\sqrt{ab}\,\partial_x \end{pmatrix}. \]
A quick computation shows that
\[ Q^{(0)} R^{(0)} = M Q^{(0)} - Q^{(0)} \begin{pmatrix} \sqrt{b/a}\,\partial_x & 0 \\ 0 & -\sqrt{b/a}\,\partial_x \end{pmatrix} = \begin{pmatrix} \mathrm{order}(0) & \mathrm{order}(0) \\ \mathrm{order}(1) & \mathrm{order}(1) \end{pmatrix}, \tag{2.4} \]
so to highest order this is a good choice. Next we modify $Q$ so that (1) it is invertible, and (2) the components of $QR$ vanish to one order lower. The operator $Q$ becomes invertible when the derivative is replaced by a regularized derivative, which will be denoted by $\tilde\partial_x$, defined in the Fourier domain by multiplication with $ik + \frac{\alpha}{\beta k^2 + 1}$, with $\alpha$, $\beta$ suitable positive, real constants that remain to be chosen. To eliminate the order 0 and order 1 terms in (2.4), the columns of $Q$ will be normalized by a weight function; we will try
\[ Q = \begin{pmatrix} f(x) & f(x) \\ f(x)\sqrt{ab}\,\tilde\partial_x & -f(x)\sqrt{ab}\,\tilde\partial_x \end{pmatrix}, \qquad Q^{-1} = \frac{1}{2} \begin{pmatrix} f^{-1} & \tilde\partial_x^{-1} f^{-1} \frac{1}{\sqrt{ab}} \\ f^{-1} & -\tilde\partial_x^{-1} f^{-1} \frac{1}{\sqrt{ab}} \end{pmatrix}, \tag{2.5} \]
with $f$ given by $f = a^{-1/4} b^{-1/4}$.
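As a sanity check on the pair $Q$, $Q^{-1}$ in (2.5), the following minimal sketch treats the special case of constant coefficients $a$, $b$, in which $Q$ acts on each Fourier mode $e^{ikx}$ as a 2×2 matrix. This restriction, the multiplier $ik + \alpha/(\beta k^2+1)$ for the regularized derivative, the parameter values, and the function names (`reg_symbol`, `q_pair`, `matmul2`, `inverse_residual`) are assumptions made for illustration only; none of them is taken from the paper's implementation.

```python
import math


def reg_symbol(k, alpha=1.0, beta=1.0):
    """Assumed Fourier multiplier of the regularized derivative: i*k + alpha/(beta*k^2 + 1)."""
    return 1j * k + alpha / (beta * k * k + 1.0)


def q_pair(k, a=2.0, b=3.0, alpha=1.0, beta=1.0):
    """Return (Q, Q_inv) acting on Fourier mode k, for constant a and b (illustrative values)."""
    f = (a * b) ** -0.25            # weight f = a^{-1/4} b^{-1/4}
    g = f * math.sqrt(a * b)        # coefficient f*sqrt(ab) of the derivative row
    s = reg_symbol(k, alpha, beta)  # nonzero for every k: its real part is positive
    q = [[f, f], [g * s, -g * s]]
    q_inv = [[0.5 / f, 0.5 / (s * g)], [0.5 / f, -0.5 / (s * g)]]
    return q, q_inv


def matmul2(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][m] * y[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def inverse_residual(k):
    """Largest entry of |Q_inv Q - I| for mode k."""
    q, q_inv = q_pair(k)
    p = matmul2(q_inv, q)
    return max(abs(p[i][j] - (1.0 if i == j else 0.0))
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    for k in (0, 1, 10, 1000):
        print(k, inverse_residual(k))
```

The mode $k = 0$ is the one where the unregularized multiplier $ik$ vanishes, so it is the case in which the regularization matters for invertibility.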
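For contribution $A$ we then find
\[ A_{11} = -A_{12} = f(x)\sqrt{b/a}\,\partial_x + f(x)\sqrt{b/a}\,(\tilde\partial_x - \partial_x) = a^{-3/4} b^{1/4} \partial_x + a^{-3/4} b^{1/4} (\tilde\partial_x - \partial_x) \]
and
\[ A_{21} = A_{22} = \partial_x b\, \partial_x f = a^{1/4} b^{1/4} \partial_x a^{-1/2} b^{1/2} \partial_x + R_1 \]
with
\[ R_1 = a^{-1/4} b^{3/4} \Bigl( \bigl(\tfrac14 \partial_x \log a - \tfrac34 \partial_x \log b\bigr)\bigl(\tfrac14 \partial_x \log a + \tfrac14 \partial_x \log b\bigr) - \bigl(\tfrac14 \partial_x^2 \log a + \tfrac14 \partial_x^2 \log b\bigr) \Bigr). \]
Contribution $B$ is given by
\[ B_{11} = -B_{12} = a^{-3/4} b^{1/4} \partial_x \]
and
\[ B_{21} = B_{22} = a^{1/4} b^{1/4} \partial_x a^{-1/2} b^{1/2} \partial_x + a^{1/4} b^{1/4} (\tilde\partial_x - \partial_x) a^{-1/2} b^{1/2} \partial_x. \]
We thus find the following for $QR$:
\[ (QR)_{11} = -(QR)_{12} = a^{-3/4} b^{1/4} (\tilde\partial_x - \partial_x) \tag{2.6} \]
and
\[ (QR)_{21} = (QR)_{22} = R_1 - a^{1/4} b^{1/4} (\tilde\partial_x - \partial_x) a^{-1/2} b^{1/2} \partial_x. \tag{2.7} \]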
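The zeroth-order term $R_1$ in $A_{21}$ can be checked by applying both operators to a test function $u$. The worked computation below is a sketch under the definitions above, $f = (ab)^{-1/4}$; the shorthand $p = \log a$, $q = \log b$ and the primes for $x$-derivatives are introduced here only for this computation.

```latex
\begin{align*}
f' &= -\tfrac14 (p' + q')\, f, \qquad bf = a^{-1/4} b^{3/4}, \qquad (bf)' = \bigl(\tfrac34 q' - \tfrac14 p'\bigr)\, bf, \\
\partial_x\bigl(b\,\partial_x (f u)\bigr)
  &= bf \Bigl[ u'' + \tfrac12 (q' - p')\, u'
     + \Bigl( \bigl(\tfrac14 p' - \tfrac34 q'\bigr)\bigl(\tfrac14 p' + \tfrac14 q'\bigr)
     - \tfrac14 p'' - \tfrac14 q'' \Bigr) u \Bigr], \\
a^{1/4} b^{1/4}\, \partial_x \bigl( a^{-1/2} b^{1/2}\, \partial_x u \bigr)
  &= bf \Bigl[ u'' + \tfrac12 (q' - p')\, u' \Bigr].
\end{align*}
```

Subtracting the last line from the one before leaves a pure multiplication operator, which is the expression for $R_1$ with prefactor $a^{-1/4} b^{3/4}$.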
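The time-dependent case. In this case we try
\[ Q = \begin{pmatrix} f(x) & f(x) \\ f(x)\sqrt{ab}\,\tilde\partial_x + c_1 & -f(x)\sqrt{ab}\,\tilde\partial_x + c_2 \end{pmatrix}, \tag{2.8} \]
with $f$ as above, and $f_1$, $f_2$, $c_1$, $c_2$ to be determined. The inverse of $Q$ will be discussed below. We find
\begin{align*}
A_{11} &= a^{-3/4} b^{1/4} \partial_x + a^{-3/4} b^{1/4} (\tilde\partial_x - \partial_x) + a^{-1} c_1, \\
A_{12} &= -a^{-3/4} b^{1/4} \partial_x - a^{-3/4} b^{1/4} (\tilde\partial_x - \partial_x) + a^{-1} c_2;
\end{align*}
$A_{21}$ and $A_{22}$ remain unchanged. For the coefficients of the matrix operator $B$ we find
\begin{align*}
B_{11} &= a^{-3/4} b^{1/4} \partial_x + (ab)^{-1/4} f_1, \\
B_{12} &= -a^{-3/4} b^{1/4} \partial_x + (ab)^{-1/4} f_2, \\
B_{21} &= a^{1/4} b^{1/4} \partial_x a^{-1/2} b^{1/2} \partial_x + a^{1/4} b^{1/4} (\tilde\partial_x - \partial_x) a^{-1/2} b^{1/2} \partial_x + c_1 \sqrt{b/a}\,\partial_x + (ab)^{1/4} \tilde\partial_x \circ f_1 + c_1 f_1, \\
B_{22} &= a^{1/4} b^{1/4} \partial_x a^{-1/2} b^{1/2} \partial_x + a^{1/4} b^{1/4} (\tilde\partial_x - \partial_x) a^{-1/2} b^{1/2} \partial_x - c_2 \sqrt{b/a}\,\partial_x - (ab)^{1/4} \tilde\partial_x \circ f_2 + c_2 f_2.
\end{align*}
For $C$ we have
\begin{align*}
C_{11} &= \partial_t (ab)^{-1/4}, & C_{12} &= \partial_t (ab)^{-1/4}, \\
C_{21} &= \partial_t (ab)^{1/4}\, \tilde\partial_x + \partial_t c_1, & C_{22} &= -\partial_t (ab)^{1/4}\, \tilde\partial_x + \partial_t c_2.
\end{align*}
Adding all the contributions we find that
\[ (QR)_{11} = a^{-3/4} b^{1/4} (\tilde\partial_x - \partial_x) + a^{-1} c_1 - (ab)^{-1/4} f_1 - \partial_t (ab)^{-1/4} \]
and
\[ (QR)_{21} = R_1 - a^{1/4} b^{1/4} (\tilde\partial_x - \partial_x) a^{-1/2} b^{1/2} \partial_x - c_1 \sqrt{b/a}\,\partial_x - (ab)^{1/4} \tilde\partial_x f_1 - c_1 f_1 - \partial_t (ab)^{1/4}\, \tilde\partial_x - \partial_t c_1. \]
The lower-order terms vanish if
\[ c_1 = -a^{3/4} b^{-1/4} \bigl( (ab)^{-1/4} \partial_t (ab)^{1/4} \bigr), \qquad f_1 = 0. \tag{2.9} \]
What results is
\[ (QR)_{11} = a^{-3/4} b^{1/4} (\tilde\partial_x - \partial_x) \tag{2.10} \]
and
\[ (QR)_{21} = R_1 - a^{1/4} b^{1/4} (\tilde\partial_x - \partial_x) a^{-1/2} b^{1/2} \partial_x - \partial_t (ab)^{1/4} (\tilde\partial_x - \partial_x) + \partial_t \bigl( \sqrt{a/b}\,\partial_t (ab)^{1/4} \bigr). \tag{2.11} \]
Similarly we have for the 12 and 22 components
\begin{align*}
(QR)_{12} &= -a^{-3/4} b^{1/4} (\tilde\partial_x - \partial_x) + a^{-1} c_2 - (ab)^{-1/4} f_2 - \partial_t (ab)^{-1/4}, \\
(QR)_{22} &= R_1 - a^{1/4} b^{1/4} (\tilde\partial_x - \partial_x) a^{-1/2} b^{1/2} \partial_x + c_2 \sqrt{b/a}\,\partial_x + (ab)^{1/4} \tilde\partial_x f_2 - c_2 f_2 + \partial_t (ab)^{1/4}\, \tilde\partial_x - \partial_t c_2,
\end{align*}
with lower-order terms vanishing if
\[ c_2 = -a^{3/4} b^{-1/4} \bigl( (ab)^{-1/4} \partial_t (ab)^{1/4} \bigr), \qquad f_2 = 0. \tag{2.12} \]
The results for $(QR)_{12}$ and $(QR)_{22}$ are
\[ (QR)_{12} = -a^{-3/4} b^{1/4} (\tilde\partial_x - \partial_x), \tag{2.13} \]
\[ (QR)_{22} = R_1 - a^{1/4} b^{1/4} (\tilde\partial_x - \partial_x) a^{-1/2} b^{1/2} \partial_x + \partial_t (ab)^{1/4} (\tilde\partial_x - \partial_x) + \partial_t \bigl( \sqrt{a/b}\,\partial_t (ab)^{1/4} \bigr). \tag{2.14} \]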
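The claim that the choice of $c_1$ in (2.9) removes the lower-order terms can be verified in two lines. The following worked check is a sketch using only the expressions above; the abbreviation $w = (ab)^{1/4}$ is introduced here for brevity, and the computation for $c_2$ is the same with the signs of the $\tilde\partial_x$ terms reversed.

```latex
\begin{align*}
c_1 &= -a^{3/4} b^{-1/4}\, w^{-1} \partial_t w = -\sqrt{a/b}\;\partial_t w,
\qquad \partial_t (ab)^{-1/4} = \partial_t\, w^{-1} = -(ab)^{-1/2}\, \partial_t w, \\
a^{-1} c_1 - \partial_t (ab)^{-1/4}
  &= -(ab)^{-1/2}\, \partial_t w + (ab)^{-1/2}\, \partial_t w = 0, \\
-\,c_1 \sqrt{b/a}\;\partial_x - \partial_t w\; \tilde\partial_x
  &= \partial_t w\; \partial_x - \partial_t w\; \tilde\partial_x
   = -\,\partial_t w\; (\tilde\partial_x - \partial_x).
\end{align*}
```

The second line gives (2.10) from the general expression for $(QR)_{11}$ with $f_1 = 0$, and the third line, together with $-\partial_t c_1 = \partial_t(\sqrt{a/b}\,\partial_t w)$, gives (2.11).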
This completes the time-dependent case, except for the inverse of Q.
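For the inversion, rewrite $Q$ as
\[ Q = \begin{pmatrix} f(x) & f(x) \\ f(x)\sqrt{ab}\,(\tilde\partial_x + \bar c_1) & -f(x)\sqrt{ab}\,(\tilde\partial_x - \bar c_2) \end{pmatrix}, \]
with $\bar c_j = \frac{c_j}{f\sqrt{ab}}$. It turns out that $Q$ can be inverted, according to the following explicit formula:
\[ Q^{-1} = \frac{1}{2} \begin{pmatrix} \tilde\partial_x^{-1} (\tilde\partial_x - \bar c_2) f^{-1} & \tilde\partial_x^{-1} \frac{1}{\sqrt{ab}} f^{-1} \\ \tilde\partial_x^{-1} (\tilde\partial_x + \bar c_1) f^{-1} & -\tilde\partial_x^{-1} \frac{1}{\sqrt{ab}} f^{-1} \end{pmatrix}. \tag{2.15} \]
This is basically due to the fact that $\bar c_1 = \bar c_2$.

3. Operator splitting and time-stepping. The equation for the decoupled wavefields $u$ is now
\[ \frac{d}{dt} u = (T + R)u \tag{3.1} \]
with $R$ as derived in the previous section and $T$ given by
\[ T = \begin{pmatrix} \sqrt{b/a}\,\partial_x & 0 \\ 0 & -\sqrt{b/a}\,\partial_x \end{pmatrix}. \]
The integrating factor will be $E(t, t_0)^{-1}$, where $E(t, t_0)$ solves $\frac{d}{dt} E(t, t_0) = T E(t, t_0)$, $E(t_0, t_0) = \mathrm{Id}$, and we will define a field $v$ by
\[ v(t, t_0) = E(t, t_0)^{-1} u(t), \]
which satisfies the differential equation
\[ \frac{dv}{dt} = E(t, t_0)^{-1} R E(t, t_0) v. \tag{3.2} \]
Applying Euler forward time-stepping for this equation gives
\[ v(t + \Delta t, t) \approx \bigl(1 + \Delta t\, E(t + \Delta t, t)^{-1} R E(t + \Delta t, t)\bigr) u(t), \]
using that $v(t, t) = u(t)$. Hence $u(t + \Delta t) \approx (1 + \Delta t\, R) E(t + \Delta t, t) u(t)$. A symmetric form of splitting (cf. Strang splitting [17]) leads to the following time-stepping, expressed in time-stepping for $u$:
\[ u(t + \Delta t) \approx \bigl(1 + \tfrac12 \Delta t\, R\bigr) E(t + \Delta t, t) \bigl(1 + \tfrac12 \Delta t\, R\bigr) u(t). \tag{3.3} \]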
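To illustrate why the step (3.3) can use a time step that does not shrink with the wavelength, the following minimal sketch applies it to a scalar model for a single Fourier mode: $u' = (ick + \rho)u$, where the transport part $T = ick$ is propagated exactly and the bounded part $R = \rho$ enters only through the two half-step factors. The model itself, the constant $\rho$ standing in for the remainder operator, the constant speed $c$, and the function names are assumptions made for illustration; this is not the paper's implementation.

```python
import cmath


def split_step_factor(k, dt, c=1.0, rho=0.5):
    """Amplification factor of one symmetric split step for u' = (i*c*k + rho) u.

    The transport part is applied exactly as E = exp(i*c*k*dt); the bounded
    part rho is applied as two half-step factors (1 + dt*rho/2).
    """
    half = 1.0 + 0.5 * dt * rho
    return half * cmath.exp(1j * c * k * dt) * half


def split_error(k, dt, n_steps, c=1.0, rho=0.5):
    """Absolute error after n_steps split steps, starting from u(0) = 1."""
    u = split_step_factor(k, dt, c, rho) ** n_steps
    exact = cmath.exp((1j * c * k + rho) * dt * n_steps)
    return abs(u - exact)


def euler_growth(k, dt, c=1.0, rho=0.5):
    """Modulus of the forward Euler factor 1 + dt*(i*c*k + rho) for the unsplit equation."""
    return abs(1.0 + dt * (1j * c * k + rho))


if __name__ == "__main__":
    for k in (1, 1000):
        print(k, split_error(k, 0.1, 10), euler_growth(k, 0.1))
```

In this model the error of the split step depends on $\rho$ and $\Delta t$ but not on the wavenumber $k$, whereas forward Euler applied to the unsplit equation amplifies the mode by roughly $ck\,\Delta t$ per step once $ck\,\Delta t$ is large. Halving $\Delta t$ reduces the split error, since the half-step factors only approximate $e^{\rho \Delta t}$.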
Let us now explain in more detail the computation of $E(t, t_0)$. This is a diagonal 2 × 2 matrix operator. We take the forward propagating component (the $E_{2,2}$ component, which acts on the $u_2$ field); the backward propagating component is done similarly. The characteristic equation is
\[ \frac{dx}{dt} = c(x, t). \tag{3.4} \]
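The characteristic equation (3.4) is an ordinary differential equation for a single curve $x(t)$, so it can be integrated with any standard ODE solver. The following minimal sketch uses the classical fourth-order Runge–Kutta scheme; the function name `trace_characteristic`, its signature, and the choice of this particular scheme are assumptions for illustration, not details taken from the paper.

```python
def trace_characteristic(c, x0, t0, t1, n_steps=10):
    """Integrate dx/dt = c(x, t) from (x0, t0) to time t1 with classical RK4.

    Passing t1 < t0 traces the characteristic backward in time.
    """
    h = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        k1 = c(x, t)
        k2 = c(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = c(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = c(x + h * k3, t + h)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return x


if __name__ == "__main__":
    # Constant speed 2: the characteristic is the straight line x0 + 2 (t - t0).
    print(trace_characteristic(lambda x, t: 2.0, 1.0, 0.0, 0.5))
```

The step size here is governed by the smoothness of the speed $c$, not by the wavefield, so a small number of steps can suffice when the medium varies slowly.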
For the time-independent case, we can solve this ODE for $x(t)$ with initial value $x(t_0) = x_0$ by separating the variables, which yields the equation $\int_{x_0}^{x} c(\xi)^{-1}\, d\xi = t - t_0$, so the computation can be done from a primitive $\int c(x)^{-1}\, dx$. For the time-dependent case (3.4) is solved directly. Let $X(x_0, t, t_0)$ denote the solution $x(t)$ with initial values $x(t_0) = x_0$. Then we have
\[ E_{2,2}(t, t_0) u_2 = u_2(t_0, X(x, t_0, t)) \tag{3.5} \]
(the characteristic is computed backward). If $\Phi_2(t, t_0)$ denotes the characteristic flow mapping $x_0$ to $X(x_0, t, t_0)$, this equals the pull back $E_{2,2}(t, t_0) u_2(t_0) = \Phi_2(t_0, t)^* u_2(t_0)$.

4. Numerical implementation. For a numerical implementation, it remains to perform the space discretization. We chose to work with finite differences, which are easy to implement. The following operators were discretized:
1. $\partial_x$. This operator was discretized using central differences.
2. $\tilde\partial_x$, $\tilde\partial_x^{-1}$, $\tilde\partial_x - \partial_x$. These are applied in the Fourier domain, with a regularized version of central differences. Their computation involves an FFT and an inverse FFT, which, due to the $O(N \log N)$ cost of this operation, will form the bulk of the computations.
3. Multiplications with coefficients and derivatives of coefficients. Derivatives of coefficients are computed again using central differences.
4. The translation operator $E(t, t_0)$ is computed for the time-independent case using the primitive $\int c(x)^{-1}\, dx$ mentioned above, and using a Runge–Kutta ODE solver otherwise. Then third-order Lagrange interpolation is applied. For the time-independent case a sparse matrix is precomputed that performs the translation over a given time step $\Delta t$.

In this way a simple numerical implementation of the method given by (3.3) was made.

5. Numerical results. In the numerical results we concentrate on the method for the time-independent case. For this case comparisons of computation time were made. For the time-dependent case it was observed that solutions are well approximated. But we feel the results for the time-independent case give sufficient indication of the effectiveness of the method. For this method, with the assumption of medium smoothness it is of course an important question just how smooth the medium coefficients need to be in order that the method demonstrates an improvement compared to more conventional methods.
Therefore numerical results were computed for media with increasing smoothness. The media were chosen parameterized by B-splines of order 3; the coefficients a of the media were randomly chosen, uniformly distributed between 0.4 and 1.6. The increasing smoothness was obtained by increasing the node distance, for which we took the values 1, 2, 4, and 8. The b coefficient was chosen equal to 1. The initial value for U1 was a pulse of approximately unit width; the initial value for U2 was chosen equal to zero. In Figure 5.1 one such medium is displayed. In Figure 5.2 the initial value for U1 is plotted. The propagation was over approximately 100 wavelengths. The results were compared with the result of an order (4,4) finite difference method; see [6]. Both methods were implemented in MATLAB. For our method the main cost was in the Fourier transform used for computing ∂˜x and its inverse. In the standard finite difference methods, for each time step a sparse matrix was applied, and this constituted almost 100% of the cost. The first check was that the method actually approximates the solutions well. This was indeed the case. In Table 5.1 some numerical results are given, where
1178
CHRISTIAAN C. STOLK medium for dx_knots=4 and initial values 2.5 coeff a coeff b init val U1(t0)
2
1.5
1
0.5
0
−0.5
0
50
100
150 x
200
250
300
Fig. 5.1. Medium coefficients with random B-splines with knot distance 4.
initial value U1 init val U1(t0)
2
1.5
1
0.5
0
126
126.5
127
127.5
128 x
128.5
129
129.5
130
Fig. 5.2. Initial value for U1 used in the numerical tests.
hFD
Table 5.1 Comparison of the cost of the method of section 4 with an order (4,4) finite difference method. is the space stepsize taken in the finite difference scheme for which the comparison is made.
Medium scale
hFD
1
0.05 0.025 0.05 0.025 0.05 0.025 0.05 0.025
2 4 8
Cost FD Cost GOIF 0.37 0.30 3.3 5.4 9.0 15.7 17.4 25.3
computation time is compared. For the new method we required the error to be smaller in both the supremum and the L2 sense, or at most 10% larger in one of the two, but better when both are taken into account. As can be seen, knot distance 1 is not sufficient to obtain any gain, but from knot distance 2 considerable gain is obtained, up to a factor of about 20 for very smooth media. As this is only a first implementation we feel this is strong encouragement to further analyze geometrical optics based methods.
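The time-independent translation step of section 4 (item 4) can be sketched as follows. This is a minimal illustration, not the MATLAB implementation used for Table 5.1: the function name `flow_map`, the non-periodic grid, the trapezoidal rule, and the use of linear interpolation (in place of the third-order Lagrange interpolation mentioned in section 4) are all assumptions made to keep the example short. It solves $\int_x^X c(\xi)^{-1}\,d\xi = \Delta t$ for $X$ by tabulating the primitive of $1/c$ and inverting it.

```python
import numpy as np

def flow_map(x, grid, c_vals, dt):
    """Return X with int_x^X c(xi)^{-1} dxi = dt (hypothetical helper).

    grid   : increasing 1-D array of sample points (non-periodic, an assumption)
    c_vals : positive wave speed sampled on grid
    Points whose image leaves the grid are clamped by np.interp.
    """
    inv_c = 1.0 / c_vals
    # Trapezoidal approximation of the primitive P(x) = int_{grid[0]}^x c^{-1}.
    steps = 0.5 * (inv_c[1:] + inv_c[:-1]) * np.diff(grid)
    primitive = np.concatenate(([0.0], np.cumsum(steps)))
    # P is strictly increasing because c > 0, so it can be inverted by
    # swapping the roles of abscissa and ordinate in the interpolation.
    p_at_x = np.interp(x, grid, primitive)
    return np.interp(p_at_x + dt, primitive, grid)

grid = np.linspace(0.0, 10.0, 1001)
x = np.array([0.5, 1.0, 2.0])
X_const = flow_map(x, grid, np.full_like(grid, 2.0), 0.5)   # c = 2
X_var = flow_map(x, grid, 1.0 + grid, 0.3)                  # c = 1 + x
```

For constant speed the map is the shift $X = x + c\,\Delta t$; for $c = 1 + x$ the exact answer is $X = (1 + x)e^{\Delta t} - 1$, which the sketch reproduces up to the quadrature and interpolation error.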
6. An optimal complexity method: Overview. For the method introduced above there were no rigorous error estimates given. The complexity is, however, at least O(N log N ) since the regularized derivative ∂˜x and its inverse were computed in the Fourier domain and needed to be computed for each time step. In this section we present a more elaborate algorithm, for which we establish that the complexity is O(N ), where N denotes the number of grid points in the space discretization. So the task in the remaining sections is on the one hand to control the error in a numerical method and on the other hand control the cost. The discretization will be done for the differential equation (6.1)
$$\frac{dv}{dt} = E(t, t_0)^{-1} R\, E(t, t_0)\, v,$$
that resulted from (3.1) after applying the integrating factor. It follows from the results in section 8 below that the transformation from the original equation (1.3) to this form and back can be done at cost $O(N)$ and with error satisfying bounds that are sufficient. We will provide precise error estimates of classical type; i.e., we assume the input has a certain amount of additional regularity, we consider the discretization error in the result given that the input has to be approximated in an $N$-dimensional space of (spline) functions, and we then show that the total error in the output is of the same order in $N$ as the discretization error. Evolution according to (6.1) maps initial values $v(t_0) = v_0$ in $H^1 \times H^1$ to final values $v(t_1)$ that are also in $H^1 \times H^1$. We will assume that $v_0$ is in $H^{1+\alpha} \times H^{1+\alpha}$, i.e., has $\alpha$ additional orders of regularity. The discretization error that results from putting $v_0$ in an $N$-dimensional spline space can then be estimated by $CN^{-\alpha}$. We will show that, for a method with cost that can be bounded by $CN$, the final result satisfies an estimate of the type $\|v_{\mathrm{approx}}(t_1) - v(t_1)\|_{H^1 \times H^1} \le CN^{-\alpha}$ (the letter $C$ may mean a different constant in different equations).
A naive approach would be to simply take the differential equation (6.1), first apply a discretization in space, and then subsequently apply discretization in time. The time discretization should preferably be of higher order. There are two main problems with this approach, which will lead to additional special features of our method. These new features are the following:
1. Higher-order decoupling. Control of the time discretization error in higher-order time-stepping, say of order $K$, requires bounds on the time derivatives of the operator $E(t, t_0)^{-1} R E(t, t_0)$ occurring on the right-hand side of (6.1).
The first time derivative contains a commutator [R, T ] (which is of order 0 and hence bounded), but higher time derivatives contain higher-order commutators, that are of positive order, and hence do not satisfy the required bounds. To address this issue we will introduce higher-order decoupling. In section 7 we will construct a new operator R, with off-diagonal terms that are smoothing operators of order K, and show that its time derivatives of order 0, . . . , K are bounded on a sufficiently large range of Sobolev spaces. The higher-order decoupling is obtained by adapting an argument of Taylor [19, Chapter 9] or [18]. 2. Multiscale time-stepping. The second problem that needs to be addressed is that in our complexity estimates, with increasing N , the error must decrease. This in turn means that the time step must decrease, which would lead to
superlinear complexity. To address this issue we introduce multiscale time-stepping. The idea is that the coarse scales are propagated with a small time step. The coarse scales are parameterized with relatively few coefficients but contain most of the energy. It is therefore affordable to use a smaller time step, and at the same time this leads to a big improvement in the error. For the fine scales, which contain relatively little energy, larger time steps are used. Incidentally this is very much in agreement with the philosophy of asymptotic methods, where the high frequencies are well approximated. Each time step amounts to a correction to the purely asymptotic approximation, so few are needed for the high frequencies. The idea of multiscale time-stepping is new to our knowledge.
Because of the multiscale time-stepping, we assume the use of a wavelet based multiscale discretization in space. We will use [5] as our main reference for wavelet discretization; see also [7]. In the next three sections we will work out the above issues in detail and prove the $O(N)$ complexity result. Section 7 concerns the higher-order decoupling. Discretization and operator approximation will be discussed in section 8. Section 9 will contain the ideas on multiscale time-stepping and the final parts of the proof that combine all the intermediate results.
7. Higher-order decoupling. By the transformation $u = Q^{-1} U$ in section 2, the original system (1.3) was transformed to $\partial_t u = (T + R)u$, where $T + R = Q^{-1} M Q - Q^{-1}\,\partial_t Q$. We had
$$T = \begin{pmatrix} \sqrt{b/a}\,\partial_x & 0 \\ 0 & -\sqrt{b/a}\,\partial_x \end{pmatrix}.$$
The operator $R$ is a matrix pseudodifferential operator, with components that are of order
$$R = \begin{pmatrix} \mathrm{order}(-1) & \mathrm{order}(-1) \\ \mathrm{order}(-1) & \mathrm{order}(-1) \end{pmatrix}. \tag{7.1}$$
Here by order(−1) we mean that it is bounded $H^s \to H^{s+1}$ for a suitable range of $s$. In this section we explain how to construct $Q$ such that $R$ has the property that
$$\frac{d^j}{dt^j}\left(E(t, t_0)^{-1} R E(t, t_0)\right) \text{ is bounded on } H^1 \times H^1 \text{ for } j = 0, 1, \dots, K, \tag{7.2}$$
with K a positive integer indicating, as mentioned, the order of the time-stepping that is going to be used. We first argue that property (7.1) is not sufficient if K > 1. Take for example the first time derivative of E(t, t0 )−1 RE(t, t0 ):
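$$\frac{d}{dt}\left(E(t, t_0)^{-1} R E(t, t_0)\right) = E(t, t_0)^{-1}\left([R, T] + \frac{dR}{dt}\right) E(t, t_0). \tag{7.3}$$
Consider the commutator $[R, T]$ occurring inside the brackets:
$$[R, T] = \begin{pmatrix} [R_{1,1}, T_{1,1}] & R_{1,2} T_{2,2} - T_{1,1} R_{1,2} \\ R_{2,1} T_{1,1} - T_{2,2} R_{2,1} & [R_{2,2}, T_{2,2}] \end{pmatrix}. \tag{7.4}$$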
To get the idea assume that the coefficients $a$ and $b$ are $C^\infty$, so that $R$ and $T$ have smooth symbols. What we see from this expression is the following:
• The diagonal terms $[R, T]_{1,1}$ and $[R, T]_{2,2}$ are commutators of scalar pseudodifferential operators, and their order equals the order of $R_{1,1}$, resp., $R_{2,2}$.
• For the off-diagonal terms $[R, T]_{1,2}$ and $[R, T]_{2,1}$ this is not true; their order is increased by 1 compared to $R_{1,2}$, resp., $R_{2,1}$.
This has nothing to do with the specific form of $R$; if $R$ is replaced by a different matrix pseudodifferential operator, these two statements remain true. So consider the second-order time derivative of $E(t, t_0)^{-1} R E(t, t_0)$. This contains the higher-order commutator $[[R, T], T]$. Assuming (7.1) and using (7.4) twice, it follows that the off-diagonal terms $[[R, T], T]_{1,2}$ and $[[R, T], T]_{2,1}$ are (a priori) of order 1, implying that (7.2) is violated. To address this problem we will construct a modified operator $Q$, such that
$$R = \begin{pmatrix} \mathrm{order}(-1) & \mathrm{order}(-K) \\ \mathrm{order}(-K) & \mathrm{order}(-1) \end{pmatrix}. \tag{7.5}$$
The old operators $Q$ and $R$ will be referred to as $Q^{(-1)}$ and $R^{(-1)}$, because of (7.1). The new operators will be referred to as $Q^{(-K)}$ and $R^{(-K)}$. This way, we can handle $K$ time derivatives, each of which can increase the order of the off-diagonal term by 1.
We write $\tilde\partial_x = \partial_x + \Psi$, where from now on we assume that $\Psi$ is smoothing in the sense that it is continuous $H^s \to H^{s+K}$, $1 - K \le s \le 1$. The reason is that then any term that is a product of $\Psi$ and other operators, none of which is of positive order, automatically is of order(−K) and is hence "safe" (see (7.5)). For $\Psi$, we could use for example
$$\Psi = \frac{\alpha}{\beta(-\partial_x^2)^{K/2} + 1}$$
with symbol $\frac{\alpha}{\beta k^{2K/2} + 1}$. This is a modification with respect to the original definition of $\tilde\partial_x$ in section 2. However, it does not affect equations like (2.6), (2.7), (2.10), (2.11), (2.13), and (2.14), because the specific form of $\tilde\partial_x - \partial_x$ is not used in their derivation. The main result of this section is captured in the following theorem, a short explanation of which is given after its formulation.
Theorem 7.1. Assume $a, b$ are at least $C^{2K+1,1}$. There exists an operator $Q^{(-K)}$ of the form
$$Q^{(-K)} = Q^{(-1)} \begin{pmatrix} 1 & E \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ F & 1 \end{pmatrix}$$
such that the operator $R^{(-K)}$ satisfies (7.2). The operators $E$, $F$ can be chosen of the form
$$E = \sum_{j=2}^{K} c_{E^{(-j)}}(x, t)\,\tilde\partial_x^{-j}, \qquad F = \sum_{j=2}^{K} c_{F^{(-j)}}(x, t)\,\tilde\partial_x^{-j},$$
where the $c_{E^{(-j)}}(x, t)$, $c_{F^{(-j)}}(x, t)$ are $(x, t)$ dependent coefficients that depend on $a(x, t)$, $b(x, t)$ and derivatives of order up to $j$ of $a$, $b$. The operators that form the matrix elements of $R^{(-K)}$ are sums of products of the following basic operators: operator $\Psi$, operators $\tilde\partial_x^{-k}$ for $k \ge 0$, and multiplication by coefficients that are functions
of $a$, $b$ and derivatives of order at most $K + 1$ of $a$ and $b$. This can be done such that all the terms for the off-diagonal elements of $R^{(-K)}$ are explicitly of order $-K$ in the sense that they contain a factor of $\Psi$ or at least $K$ powers of $\tilde\partial_x^{-1}$.
The description as a sum of products of basic operators is such that the operators involved can be numerically approximated with the techniques described in section 8. We note in particular that there are no cancellations between terms of $R^{(-K)}$ of order $> -K$. This is important, to avoid the situation where $R^{(-K)}$ consists of several contributions whose highest-order parts cancel analytically but not numerically due to the errors made in the numerical approximation. In the proof we will also describe a calculational scheme to compute the $c_{E^{(-j)}}(x, t)$, $c_{F^{(-j)}}(x, t)$. (We have not calculated any case $K > 1$ explicitly.)
Proof. We write temporarily
$$T + R^{(-1)} = \begin{pmatrix} A & B \\ C & D \end{pmatrix}.$$
We will first assume that $a, b$ are $C^\infty$, so that all pseudodifferential operators involved have smooth symbols; later we will investigate how much smoothness for the coefficients is needed. Using a transformation with a matrix pseudodifferential operator of the form $\begin{pmatrix} 1 & E \\ 0 & 1 \end{pmatrix}$ the operator $B$ will be removed to the highest $K - 1$ orders. Replacing $Q$ by $Q \begin{pmatrix} 1 & E \\ 0 & 1 \end{pmatrix}$ yields the following for the new operator $R$; see (2.1):
$$\begin{pmatrix} 1 & E \\ 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} 1 & E \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & E \\ 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 0 & \partial_t E \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} A - EC & B + AE - ED - \partial_t E \\ C & D + CE \end{pmatrix}, \tag{7.6}$$
where we used the explicit inverse $\begin{pmatrix} 1 & E \\ 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & -E \\ 0 & 1 \end{pmatrix}$. The first problem is to find $E$ such that $B + AE - ED - \partial_t E$ is of the desired lower order. Next we do a transformation with a matrix $\begin{pmatrix} 1 & 0 \\ F & 1 \end{pmatrix}$ of the matrix in (7.6). After this second transformation, the new operator $R$ becomes
$$\begin{pmatrix} 1 & 0 \\ F & 1 \end{pmatrix}^{-1} \begin{pmatrix} A - EC & B + AE - ED - \partial_t E \\ C & D + CE \end{pmatrix} \begin{pmatrix} 1 & 0 \\ F & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 \\ F & 1 \end{pmatrix}^{-1} \begin{pmatrix} 0 & 0 \\ \partial_t F & 0 \end{pmatrix}$$
$$= \begin{pmatrix} A - EC + (B + AE - ED - \partial_t E)F & B + AE - ED - \partial_t E \\ C + (D + CE)F - F(A - EC) - \partial_t F & D + CE - F(B + AE - ED - \partial_t E) \end{pmatrix}.$$
Just like $E$ we must then choose $F$ such that $C + (D + CE)F - F(A - EC) - \partial_t F$ is of the desired lower order. The new $Q$ is then $Q \begin{pmatrix} 1 & E \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ F & 1 \end{pmatrix}$ (using the factor $\begin{pmatrix} 1 & E \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ F & 1 \end{pmatrix}$ is convenient compared to $\begin{pmatrix} 1 & E \\ F & 1 \end{pmatrix}$ because it has an explicit inverse that is easy to apply numerically).
Let us consider the construction of $E$. This follows a standard pattern in pseudodifferential operator theory, choosing $E$ order by order. We let
$$E = E^{(-2)} + E^{(-3)} + \cdots + E^{(-K)},$$
and set $B^{(-2)} = B^{(-1)} + A E^{(-2)} - E^{(-2)} D - \partial_t E^{(-2)}$, $B^{(-3)} = B^{(-2)} + A E^{(-3)} - E^{(-3)} D - \partial_t E^{(-3)}$, etc., until $B^{(-K)} = B + AE - ED - \partial_t E$. The principal symbol of $B^{(-k)}$ is of the form $c_{B^{(-k)}}(x)\,(i\xi)^{-k}$, while those of $A$ and $-D$ are both equal to $\sqrt{b/a}\,(i\xi)$. Hence if we choose the principal symbol of $E^{(-k-1)}$ equal to $-\frac{c_{B^{(-k)}}(x)}{2\sqrt{b/a}}\,(i\xi)^{-k-1}$, then the principal symbol of $B^{(-k-1)}$ vanishes, with as a result that $B^{(-k-1)}$ becomes an operator of order $-k-1$ as desired. So we set
$$c_{E^{(-k-1)}} = -\frac{c_{B^{(-k)}}}{2\sqrt{b/a}} \qquad \text{and} \qquad E^{(-k-1)} = c_{E^{(-k-1)}}\,\tilde\partial_x^{-k-1}.$$
The operators $E^{(-k)}$ follow from this scheme. The coefficients $c_{B^{(-k)}}$ and $c_{E^{(-k)}}$ are determined inductively. This can be done on the symbol level using pseudodifferential operator calculus, or directly, as we will demonstrate now.
We further investigate this construction of the $c_{B^{(-k)}}$ and $c_{E^{(-k)}}$ and of the remainders $R^{(-k)}$. It is convenient to just take the matrix $R^{(-1)}$, which is the starting point of the induction, and apply a few steps of the recipe. Doing this, the key properties that allow the successful construction will become clear, without becoming overly formal. The matrix $R^{(-1)}$ follows in the time-independent case from (2.6), (2.7), and (2.5). Omitting anything involving $\tilde\partial_x - \partial_x$ (which is smoothing by definition), we have the following terms relevant for the higher-order decoupling:
$$R^{(-1)} = \begin{pmatrix} \tilde\partial_x^{-1} a^{-1/4} b^{-1/4} R_1 & \tilde\partial_x^{-1} a^{-1/4} b^{-1/4} R_1 \\ -\tilde\partial_x^{-1} a^{-1/4} b^{-1/4} R_1 & -\tilde\partial_x^{-1} a^{-1/4} b^{-1/4} R_1 \end{pmatrix} + \mathrm{order}(-K).$$
So we set, following the above scheme,
$$E^{(-2)} = -\frac{a^{-1/4} b^{-1/4} R_1}{2\sqrt{b/a}}\,\tilde\partial_x^{-2}.$$
We then find
$$B^{(-2)} = B^{(-1)} + \left(\sqrt{b/a}\,\partial_x + \tilde\partial_x^{-1} a^{-1/4} b^{-1/4} R_1\right) E^{(-2)} - E^{(-2)} \left(-\sqrt{b/a}\,\partial_x - \tilde\partial_x^{-1} a^{-1/4} b^{-1/4} R_1\right) - \partial_t E^{(-2)}$$
$$= \tilde\partial_x^{-1} a^{-1/4} b^{-1/4} R_1 - \sqrt{b/a}\,\partial_x\,\frac{a^{-1/4} b^{-1/4} R_1}{2\sqrt{b/a}}\,\tilde\partial_x^{-2} - \frac{a^{-1/4} b^{-1/4} R_1}{2\sqrt{b/a}}\,\tilde\partial_x^{-2}\,\sqrt{b/a}\,\partial_x + \mathrm{order}(-3). \tag{7.7}$$
In the first term we need to commute ∂˜x−1 to the right, in the second term we need to commute ∂x to the right, and in the third term we need to commute ∂˜x−2 to the right.
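The commutations above involve the regularized derivative $\tilde\partial_x = \partial_x + \Psi$ and its inverse. On a periodic grid both act as Fourier multipliers, which the following sketch illustrates. It assumes the example choice of $\Psi$ given earlier in this section, with symbol $\alpha/(\beta|k|^K + 1)$; the function names, the grid, and the parameter values $\alpha = \beta = 1$, $K = 2$ are illustrative assumptions. The script checks that the inverse multiplier undoes $\tilde\partial_x$ and evaluates both sides of the operator identity $[A^{-1}, g] = -A^{-1}[A, g]A^{-1}$, which holds for any invertible $A$, with $A = \tilde\partial_x$ and a smooth multiplier $g$.

```python
import numpy as np

def dtilde_symbol(n, length, alpha=1.0, beta=1.0, K=2):
    """Symbol i*k + alpha / (beta*|k|^K + 1) of the regularized derivative."""
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    return 1j * k + alpha / (beta * np.abs(k) ** K + 1.0)

def apply_multiplier(symbol, f):
    """Apply a Fourier multiplier to samples f of a periodic function."""
    return np.fft.ifft(symbol * np.fft.fft(f))

n, length = 64, 2.0 * np.pi
x = np.arange(n) * length / n
sym = dtilde_symbol(n, length)          # never zero: its real part is positive
f = np.sin(3.0 * x) + 0.5
g = 1.0 + 0.3 * np.cos(x)

A = lambda u: apply_multiplier(sym, u)
A_inv = lambda u: apply_multiplier(1.0 / sym, u)

roundtrip = A_inv(A(f))
lhs = A_inv(g * f) - g * A_inv(f)                        # [A^{-1}, g] f
rhs = -A_inv(A(g * A_inv(f)) - g * A(A_inv(f)))          # -A^{-1} [A, g] A^{-1} f
```

Because the real part of the symbol is strictly positive, the inverse exists for every frequency, including $k = 0$ where $\partial_x$ alone is not invertible.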
To continue, an understanding of the commutator of $\tilde\partial_x^{-1}$ with (multiplication by) some function $g(x)$ is needed. Such a commutator yields the following:
$$[\tilde\partial_x^{-1}, g] = -\tilde\partial_x^{-1} [\partial_x + \Psi, g]\, \tilde\partial_x^{-1} = -\tilde\partial_x^{-1} (\partial_x g)\, \tilde\partial_x^{-1} - \tilde\partial_x^{-1} (\Psi g - g \Psi)\, \tilde\partial_x^{-1}.$$
The first term in this expression for the commutator is of order $-2$ and contains a coefficient with one more derivative. The second term is of order less than $-K$ and is hence to be disregarded. After the commutations the highest-order terms in $B^{(-2)}$ cancel, and what remains are commutator terms and other lower-order terms.
Several more remarks are in order. First the general form, involving as basic operations the $\tilde\partial_x^{j}$, the operator $\Psi = \tilde\partial_x - \partial_x$, and multiplications with coefficients and derivatives and powers of coefficients, remains conserved in each step. Concerning the order of derivatives of the coefficients that occur, in $B^{(-1)}$ and $E^{(-2)}$ we have at most second-order derivatives, in $B^{(-2)}$ and $E^{(-3)}$ at most third order, and inductively we find that in $B^{(-j)}$ and $E^{(-j-1)}$ we have derivatives of order at most $j + 1$. One of the assumptions is that the coefficients are $C^{2K+1,1}$, which implies that in $R^{(-K)}$ the coefficients are still $C^{K,1}$. Does this also hold for the time derivatives; i.e., do we have (7.2)? We must then carefully study (7.3) and (7.4). It turns out that each time derivative leads to a loss of at most one derivative in the regularity of the coefficients of a coefficient multiplication operator. With $K$ time derivatives, we need $C^{0,1}$ smoothness to have a bounded map on $H^1 \times H^1$ ($L^\infty$ would be enough if the operator was considered on $L^2 \times L^2$). Therefore $C^{K,1}$ in the coefficients occurring in $R^{(-K)}$ is sufficient and (7.2) follows. The operator $F$ can be determined in a similar fashion. This completes the proof of Theorem 7.1.
8. Discretization and operator approximation. The multiscale discretization will be done using wavelets.
We follow the book of Cohen [5], which gives an excellent description of one-dimensional wavelet discretization theory; see also [7]. In a wavelet discretization functions in $L^2(\Omega)$ and $H^s(\Omega)$ are approximated by elements of increasingly large finite-dimensional subspaces of $L^2(\Omega)$ given by a multiresolution analysis $V_j$, $j = 0, 1, 2, \dots$. The spaces $V_j$ are spanned by translates and scalings of the scaling function $\phi$:
$$\phi_{j,k} = 2^{j/2} \phi(2^j \cdot - k), \qquad k \in \mathbb{Z}/(2^j L \mathbb{Z}).$$
The $V_j$ are assumed to form an increasing sequence $V_j \subset V_{j+1}$, $\overline{\bigcup_{j=0}^{\infty} V_j} = L^2(\Omega)$. In our case, where the domain is a circle of integer length $L$, the space $V_j$ has $L 2^j$ elements. We denote by $J$ the final level of discretization, so that $N = L 2^J$. Typically we will denote by $f_j$ an approximation of a function $f$ in $V_j$, and by $A_j$ the approximation of an operator $A$ on $V_j$.
The multiscale decomposition is obtained from the wavelet spaces. The wavelet space $W_j$ is such that $V_{j+1} = V_j \oplus W_j$. It is spanned by the translates and scalings $\psi_{j,k}$ of a mother wavelet $\psi$. This leads to the multiscale decomposition $V_j = V_0 \oplus W_0 \oplus \cdots \oplus W_{j-1}$. The scaling function can be chosen with compact support, and with any order $C^k$ smoothness. Together with the $V_j$, a dual multiresolution analysis $\tilde V_j$ can be
constructed, spanned by translates and scalings of a dual scaling function $\tilde\phi$, such that the basis functions satisfy the biorthogonality property $\langle \phi_{j,k}, \tilde\phi_{j,k'} \rangle = \delta_{k,k'}$. One of $\phi$, $\tilde\phi$ can be chosen as a compactly supported spline; we assume $\phi$ is a spline, and $V$ is a spline space of a certain order. The space $V_j$ can be made to satisfy $V_j \subset H^s$ for any $s$ by choosing wavelets of sufficiently high order of smoothness. Throughout the analysis we will assume sufficient smoothness of the wavelets, without specifying this precisely.
The error estimates and assumptions on the smoothness of initial values are formulated in terms of regularity in $L^2$ based Sobolev spaces. That is natural and convenient for wave equations (where physical energy conservation holds). It is also easy to handle in wavelet discretizations, because of norm equivalences. The Sobolev norms $\|\cdot\|_{H^s}$ are equivalent to weighted norms of the wavelet coefficients. If
$$f = \sum_{k=0}^{L-1} c_{-1,k}\,\phi_{0,k} + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j L - 1} c_{j,k}\,\psi_{j,k},$$
and the wavelets are sufficiently smooth, then there is the norm equivalence
$$\|f\|^2_{H^\alpha(\Omega)} \sim \sum_{j=-1}^{\infty} \sum_{k} |2^{\alpha j} c_{j,k}|^2.$$
From these norm equivalences one can easily derive an important approximation result. Assume that $f$ is in $H^\alpha$; then the projections $\Pi_{V_j} f$ of $f$ to the $V_j$ satisfy $\|f - \Pi_{V_j} f\|_{L^2(\Omega)} \le C 2^{-\alpha j} \|f\|_{H^\alpha(\Omega)}$.
In our application we typically deal with products of operators that are applied after each other, in discrete form, to a discretized function. We first derive a criterion, which we call the order $k$ approximation property, for each of the operators to satisfy, such that such products converge. After this we will argue that the operators in our application can be approximated such that the approximation indeed satisfies the approximation property. Suppose $A$ is some operator $H^{s_1} \to H^{s_2}$, and $A_j$ is a discrete approximation to $A$. As pointed out, convergence estimates are done using additional regularity, say $k$ additional orders of regularity. For our operator $A$ from $H^{s_1} \to H^{s_2}$ we therefore assume its argument, say $f$, is in $H^{s_1 + k}$. The result $Af$ may be the argument of another operator, so we will require $Af \in H^{s_2 + k}$; in other words we will assume $A$ is continuous $H^{s_1 + s} \to H^{s_2 + s}$ for $0 \le s \le k$. Next we discuss a property that ensures that $A_j f_j$ approximates $Af$ if $f_j$ approximates $f$.
Definition. Let $A$ be as just described; then we say $A$ and $A_j$ satisfy the order $k$ approximation property if $\|A - A_j\|_{H^{s_1 + k} \to H^{s_2}} \le C 2^{-jk}$.
This also implies that $A_j$ is continuous $H^{s_1 + s} \to H^{s_2 + s}$ for $0 \le s \le k$. It follows that if a function $f \in H^{s_1 + k}$ is approximated in $H^{s_1}$ by functions $f_j$, with
the convergence as expected from the additional regularity, i.e., $\|f - f_j\|_{H^{s_1}} \le C 2^{-kj} \|f\|_{H^{s_1 + k}}$, then $A_j f_j$ approximates $Af$ in the same way, since
$$\|Af - A_j f_j\|_{H^{s_2}} \le \|A_j (f - f_j)\|_{H^{s_2}} + \|(A - A_j) f\|_{H^{s_2}} \le C 2^{-jk} \|f\|_{H^{s_1 + k}}.$$
We will assume that $k$ is an integer, although this does not seem essential, and that $k \ge 1$.
The basic operators needed here are partial differential operators, the operator $(-\partial_x^2 + 1)^{-1}$ or inverses of higher-order elliptic operators for the approximation of the operator $\Psi$ of section 7, and the pull back along the characteristic flow (which is a smooth coordinate transformation). Here we discuss partial differential operators and constant coefficient inverse partial differential operators; the pull back will be discussed in the last part of this section. We state the result on the approximation of $R^{(-K)}$, $Q^{(-K)}$ as a lemma.
Lemma 8.1. Assume the coefficients $a, b$ are $C^{k+K+1,1}$. Then numerical approximations to the operators $R^{(-K)}$ on $H^1 \times H^1$, $Q^{(-K)}$ from $H^1 \times H^1 \to H^1 \times L^2$ and $(Q^{(-K)})^{-1}$ from $H^1 \times L^2 \to H^1 \times H^1$ can be constructed that satisfy the order $k$ approximation property.
Proof. Multiplication by polynomials and differentiation operators can be discretized using results of [8]; see that reference or section 2.5 of [5]. They can be discretized at cost $O(N)$, in such a way that the above order $k$ approximation property is satisfied. For multiplication operators with functions other than polynomials, the coefficient is locally approximated by polynomials. As for the regularity requirement on the coefficients, for an approximate multiplication operator on $H^{s_1}$ to have the order $k$ approximation property, it is sufficient to have $C^{k + s_1 - 1, 1}$ coefficients, since a $C^{k-1,1}$ function can be approximated to error $2^{-jk}$ by polynomials on regions of size order $2^{-j}$. In the case of the approximation of $R^{(-K)}$ on $H^1 \times H^1$, the coefficients in the remainder term need to be $C^{k,1}$. It follows that the coefficients $a$ and $b$ must be in $C^{k+K+1,1}$.
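The polynomial approximation rate used in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the smooth test function sin on $[0, 1]$, uses equispaced interpolation nodes on each cell of width $2^{-j}$, and picks $k = 3$; none of these choices come from the text, and the helper name `local_interp_error` is hypothetical. With polynomials of degree $k - 1$ the maximal error should shrink by a factor of about $2^{k}$ each time $j$ increases by one.

```python
import numpy as np

def local_interp_error(func, j, k, samples=50):
    """Max error of degree-(k-1) interpolation on cells of width 2**-j in [0, 1]."""
    h = 2.0 ** (-j)
    worst = 0.0
    for left in np.arange(0.0, 1.0, h):
        nodes = np.linspace(0.0, h, k)              # local coordinates in the cell
        coeffs = np.polyfit(nodes, func(left + nodes), k - 1)
        s = np.linspace(0.0, h, samples)
        err = np.max(np.abs(np.polyval(coeffs, s) - func(left + s)))
        worst = max(worst, err)
    return worst

k = 3
errors = [local_interp_error(np.sin, j, k) for j in (4, 5, 6)]
ratios = [errors[i] / errors[i + 1] for i in range(2)]
```

The entries of `ratios` come out close to $2^3 = 8$, in line with an error of size $2^{-jk}$ on regions of size $2^{-j}$.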
The operator $(-\partial_x^2 + 1)^{-1}$ can be computed at $O(N)$ cost using a multigrid algorithm [1]; a wavelet variant of this algorithm was given in [5]. To show that the approximation property holds, a slight change in the argument about multilevel preconditioning in example 4 in section 3.11 of [5] is needed; namely, $n_j$ is chosen such that $\rho^{n_j} \le 2^{-t' j}$, with $t' > t$. Similar arguments work for the higher-order inverse elliptic operator $\Psi$. This concludes the proof.
Next we will show a similar result for $E(t, t_0)$. This operator was diagonal with $E_{2,2}$ given by (see (3.5)) (8.1)
$$E_{2,2}(t, t_0)\, u_2(x) = u_2(t_0, X(x, t_0, t)).$$
The 1,1 component of E(t, t0 ) is given by a similar formula. We will first discuss the approximation of X(x, t, t0 ); then the next lemma will contain the result on E(t, t0 ). Let Xj (x, t, t0 ) denote a numerical approximation used at level j. This must be computed for a set of points x. We require increasing accuracy as j increases, with error bounded by C2−j(k+1) . It is allowed that, as j increases, the computational cost increases as 2j . We find that for the time-independent case C k+1 smoothness of
the coefficients is sufficient, while for the time-dependent case $C^{2k+2}$ smoothness is sufficient for this computation, as we will now show. For the time-independent case, the evaluation of (8.1) can be done by solving $X = X(x, t, t_0)$ from
$$\int_x^X c(\xi)^{-1}\, d\xi = t - t_0. \tag{8.2}$$
First the primitive $\int_0^x c(\xi)^{-1}\, d\xi$ is computed for all $x$ in the periodic grid with grid distance $2^{-j}$. Assuming that $c$ is $C^{k+1}$, this can be done at cost $O(2^j)$, with error $\le C 2^{-j(k+1)}$. Next the solution of (8.2) can be done for a set of $2^j$ points $x$ using interpolation, which conserves the order of error, i.e., with error still bounded by $C 2^{-j(k+1)}$.
For the time-dependent case we solve for the characteristics using a Runge–Kutta method of order $2k + 2$. We require $C^{2k+2}$ smoothness of $c$; then we can take order $2^{j/2}$ points with distance between them of $2^{-j/2}$ and solve with time steps of order $2^{-j/2}$. The total error is then bounded by $C 2^{-j(k+1)}$.
Next we discuss how (8.1) can be computed numerically such that the order $k$ approximation property is satisfied.
Lemma 8.2. Assume the coefficients $a, b$ are $C^{k+1}$ for the time-independent case or $C^{2k+2}$ in the time-dependent case, and the wavelets are order $k + 1$ splines. Then a numerical approximation to the operator $E(t, t_0)$ on $H^1 \times H^1$ can be constructed that satisfies the order $k$ approximation property.
Proof. We consider the approximation at level $J$ of $E_{2,2}(t, t_0) f$, with $f$ an element of $V_J$. We have that $E_{2,2}(t, t_0) f(x) = f(X(x, t_0, t))$. For brevity we will write $X(x)$ instead of $X(x, t_0, t)$. We will write $h(x) = f(X(x))$. We want to compute $c_{J,\tilde k} = \langle \tilde\phi_{J,\tilde k}, h \rangle$. The computation of matrix elements of polynomials, i.e., $\langle \tilde\phi_{J,\tilde k}, p \rangle$, when $p$ is a polynomial, is basically exact; see the method of section 2.5 of [5]. To compute matrix elements of other smooth functions, it is common to approximate these locally by polynomials, and we will also use this in this argument. So to compute the approximate coefficient of the scaling function $\phi_{J,\tilde k}$, the function $h$ is approximated around the support of $\phi_{J,\tilde k}$ by a polynomial $p$. The approximate value of the coefficient is then $\tilde c_{J,\tilde k} = \langle \tilde\phi_{J,\tilde k}, p \rangle$ and is obtained according to the mentioned section of [5]. Thus we must define how to approximate $h$ locally by a polynomial.
This can simply be done by polynomial interpolation with an order $k$ polynomial. A function $h$ in $C^{k,1}$ can be approximated by interpolation on a grid of size $2^{-J}$ up to an error bounded by
$$\sup_{x \in S_{J,\tilde k}} |h(x) - p(x)| \le C 2^{-(k+1)J} \|h\|_{C^{k,1}(S_{J,\tilde k})}.$$
We will apply this to a wavelet, $f = \psi_{j,\hat k}$. We assume that the wavelet $\psi$ is $C^{k,1}$ and use that $X$ is also $C^{k,1}$. The function $h(x) = \psi_{j,\hat k}(X(x))$ satisfies
$$\|\psi_{j,\hat k}(X(\cdot))\|_{C^{k,1}(S_{J,\tilde k})} \le C 2^{j(k+3/2)}.$$
Thus the error with $p$ an exact interpolating polynomial is given by
$$|c_{J,\tilde k} - \tilde c_{J,\tilde k}| \le \|\tilde\phi_{J,\tilde k}\|_{L^1} \sup_{x \in S_{J,\tilde k}} |h(x) - p(x)| \le C 2^{J(-k-3/2) + j(k+3/2)} \le C 2^{(k+1)(j-J)}.$$
Here we used that $\|\tilde\phi_{J,\tilde k}\|_{L^1}$ can be bounded by $C 2^{-J/2}$ (which has to do with the normalization; the $L^2$ norm of $\tilde\phi_{J,\tilde k}$ is normalized to unity). Thus we find that the
map from $f$ to the error $\sum_{\tilde k} (c_{J,\tilde k} - \tilde c_{J,\tilde k}) \phi_{J,\tilde k}$ is bounded by $C 2^{-(k+1)J}$ from $H^{k+1}$ to $L^2$, and hence by $C 2^{-kJ}$ from $H^{k+1}$ to $H^1$.
A second source of error is that $X_J(x)$ is used instead of the exact value $X(x)$. For these errors we have
$$\psi_{j,\hat k}(X_J(x)) - \psi_{j,\hat k}(X(x)) = \int_{X(x)}^{X_J(x)} \frac{d\psi_{j,\hat k}}{dx}(s)\, ds.$$
Since $\frac{d\psi_{j,\hat k}}{dx}$ is bounded by $C 2^{3j/2}$, and $|X_J(x) - X(x)| < C 2^{-J(k+1)}$, these errors satisfy
$$|\psi_{j,\hat k}(X_J(x)) - \psi_{j,\hat k}(X(x))| \le C 2^{3j/2 - J(k+1)}.$$
From this a bound $C 2^{-J(k+1)}$ follows for the map from input to this error, considered in spaces $H^{3/2} \to L^2$, and a bound $C 2^{-Jk}$ from $H^{3/2} \to H^1$, which is better than or equal to the bound for the interpolation error, since $k > 1/2$.
9. Multiscale time-stepping and proof of the theorem. In this section multiscale time-stepping is introduced to finally obtain an $O(N)$ algorithm. The results of section 7 enable the use of higher-order time-stepping methods and lead to estimates for the time discretization errors. The results of section 8 allow us to estimate the errors due to space discretization. Here we will combine space and time discretization, choose parameters, like the order of space and time discretization, and establish the complexity of the algorithm by estimating error and cost of the algorithm. We solve the equivalent of differential equation (3.2) with higher-order decoupling, after the application of the integrating factor; i.e., we solve (9.1)
$$\frac{dv}{dt}(t) = S(t, t_0)\, v(t),$$
with $S(t, t_0) = E(t, t_0)^{-1} R^{(-K)} E(t, t_0)$, where $R^{(-K)}$ is as constructed in section 7. We will approximate the solution $v(t_1)$ starting from $t_0$. The approximation is done in $H^1 \times H^1$. The initial values $v_0 = u_0$ also must be in $H^1 \times H^1$. We assume they have $\alpha$ additional orders of regularity; i.e., they are in fact in $H^{1+\alpha} \times H^{1+\alpha}$. It follows from the results of sections 7 and 8 that we can transform the values $U(t)$ of the original system (1.3) to those of the transformed system (9.1) and back with complexity $O(N)$.
Operators will be approximated with the order $k$ approximation property, with $k > \alpha$. A minimum value for $k$ is derived below. Regularity assumptions follow from these assumptions according to the previous sections. Note that this is different from the previous section, where the order $k$ corresponded to the order of additional regularity of functions that operators acted on, while here $k > \alpha$. By $S_j$ we denote an approximation of $S$ in $V_j \times V_j$ with the order $k$ approximation property, according to the methods of section 8. (Note that $S_j \ne \Pi_{V_j} S \Pi_{V_j}$.)
In general in an integrating factor method it is common to frequently reset $t_0$, so that $E(t, t_0)$ propagates only over small time intervals. We will refrain from doing so, as this is not needed in this context, and the frequent application of $E(t, t_0)$ to the full
signal (i.e., not only the addition made during a small time interval by a Runge–Kutta time step) may cause additional errors.
As motivated in section 6, we will make a multiscale decomposition of the signal and do time-stepping separately for each scale. The initial values are decomposed as follows:
$$u_0 = \sum_{j=0}^{J} w_{0,j},$$
with $w_{0,0} = \Pi_{V_0} u_0$, and $w_{0,j} = \Pi_{W_{j-1}} u_0$, for $j = 1, \dots, J$. Here $\Pi_{V_j}$, $\Pi_{W_j}$ denote the projection on $V_j \times V_j$ and $W_j \times W_j$, respectively. The field $v(t)$ will also be decomposed. The $j$th component, corresponding to initial values in $W_{j-1} \times W_{j-1}$, will not be approximated in $V_j \times V_j$, however (nor in $W_{j-1} \times W_{j-1}$), but in a space $V_{l(j)} \times V_{l(j)}$, $j \le l(j) \le J$. To indicate this we write the components of the sum as $v_{j,l(j)}$. We will show that $v(t)$ can be approximated like
$$v(t) \approx \sum_{j=0}^{J} v_{j,l(j)}.$$
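The decomposition of the initial values into scale components can be illustrated with the simplest possible multiresolution analysis. The sketch below uses piecewise-constant (Haar) spaces on a single scalar field, which is an assumption made purely for brevity; the analysis here assumes smooth spline wavelets, and the names `project_V` and `scale_components` are hypothetical. It forms $w_{0,0} = \Pi_{V_0} u_0$ and $w_{0,j} = \Pi_{V_j} u_0 - \Pi_{V_{j-1}} u_0$, which is the projection onto $W_{j-1}$ in the Haar case, and the components add up to $u_0$.

```python
import numpy as np

def project_V(u, J, j):
    """Orthogonal projection of u (length L*2**J) onto piecewise constants at level j."""
    block = 2 ** (J - j)
    means = u.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)

def scale_components(u, J):
    """Return [w_0, ..., w_J] with w_0 in V_0 and w_j in W_{j-1} for j >= 1."""
    comps = [project_V(u, J, 0)]
    for j in range(1, J + 1):
        comps.append(project_V(u, J, j) - project_V(u, J, j - 1))
    return comps

J, L = 5, 4                              # illustrative sizes, N = L * 2**J = 128
rng = np.random.default_rng(0)
u0 = rng.standard_normal(L * 2 ** J)
w = scale_components(u0, J)
```

Each `w[j]` with $j \ge 1$ has zero mean on every cell of level $j - 1$, so it carries only the detail added when passing from $V_{j-1}$ to $V_j$.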
The motivation for doing this is simple: Large errors would result in the time propagation in $V_j \times V_j$ of the $w_{0,j}$, while large cost would result if we would work in the full space $V_J \times V_J$. By working in an intermediate space both cost and errors can be controlled.
The final numerical approximation will be a sum of components $w_{j,l(j),\Delta t_j}$. The terms describe the discrete time propagation with time step $\Delta t_j$, using the space discretized operators $S_{l(j)}(t)$, applied to the initial values $w_{0,j}$. For purposes of error estimation we consider two sets of fields in addition to $w_{j,l(j),\Delta t_j}$. We assume the fields $v_{j,l(j)}$ introduced above describe the continuous time propagation of the operator $\Pi_{V_{l(j)}} S \Pi_{V_{l(j)}}$, and the field $v_{j,l(j),\Delta t_j}$ will describe the discrete time propagation of $\Pi_{V_{l(j)}} S \Pi_{V_{l(j)}}$.
We first establish that $v_J(t_1)$ can be approximated like
$$v_J(t_1) \approx \sum_{j=0}^{J} v_{j,l(j)}(t_1).$$
Lemma 9.1. Suppose $l(j)$ is such that
$$k(l(j) - j) = \alpha(J - j). \tag{9.2}$$
Then
$$\Big\| \sum_{j=0}^{J} v_{j,l(j)}(t_1) - v(t_1) \Big\|_{H^1 \times H^1} \le C 2^{-\alpha J} \|u_0\|_{H^{1+\alpha} \times H^{1+\alpha}}. \tag{9.3}$$
Proof. Let $v_{j,\infty}$ denote the solution of the exact differential equation with initial value $w_{0,j}$. It satisfies $\frac{dv_{j,\infty}}{dt} = S v_{j,\infty}$. As $S$ is bounded on $H^{1+s} \times H^{1+s}$, $0 \le s \le k$, it follows that $v_{j,\infty}(t)$ satisfies the bound $\|v_{j,\infty}(t)\|_{H^{1+s} \times H^{1+s}} \le C \|w_{0,j}\|_{H^{1+s} \times H^{1+s}}$ for $0 \le s \le k$, $t_0 \le t \le t_1$.
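We have $\frac{dv_{j,l(j)}}{dt} = \Pi_{V_{l(j)}} S \Pi_{V_{l(j)}} v_{j,l(j)}$, so the difference $v_{j,l(j)} - v_{j,\infty}$ satisfies
$$\frac{d(v_{j,l(j)} - v_{j,\infty})}{dt} = \Pi_{V_{l(j)}} S \Pi_{V_{l(j)}} (v_{j,l(j)} - v_{j,\infty}) + (\Pi_{V_{l(j)}} S \Pi_{V_{l(j)}} - S) v_{j,\infty}. \tag{9.4}$$
By standard estimates for ODEs we have that
$$\|v_{j,l(j)}(t) - v_{j,\infty}(t)\|_{H^{1+s} \times H^{1+s}} \le C_1 \|v_{j,l(j)}(t_0) - v_{j,\infty}(t_0)\|_{H^{1+s} \times H^{1+s}} + C_2 \int_{t_0}^{t} \|(\Pi_{V_{l(j)}} S \Pi_{V_{l(j)}} - S) v_{j,\infty}(\tau)\|_{H^{1+s} \times H^{1+s}}\, d\tau.$$
The first term on the right-hand side is zero. For the second term we use that by the regularity assumptions we have
$$\|\Pi_{V_{l(j)}} S \Pi_{V_{l(j)}} - S\|_{H^{1+k} \times H^{1+k} \to H^1 \times H^1} \le C 2^{-k l(j)}. \tag{9.5}$$
The components of the initial values $w_{0,j}$ are bounded according to
$$\|w_{0,j}\|_{H^{1+k} \times H^{1+k}} \le C 2^{j(k - \alpha)} \|w_{0,j}\|_{H^{1+\alpha} \times H^{1+\alpha}}, \tag{9.6}$$
and the same is true for $v_{j,\infty}(t)$ for $t_0 < t < t_1$. The inhomogeneous term in (9.4) can therefore be bounded by
$$\|(\Pi_{V_{l(j)}} S \Pi_{V_{l(j)}} - S) v_{j,\infty}(t)\|_{H^1 \times H^1} \le C 2^{-k l(j) + j(k - \alpha)} \|w_{0,j}\|_{H^{1+\alpha} \times H^{1+\alpha}} = C 2^{-\alpha J} \|w_{0,j}\|_{H^{1+\alpha} \times H^{1+\alpha}}.$$
The error $v_{j,l(j)}(t_1) - v_{j,\infty}(t_1)$ therefore satisfies the bound
$$\|v_{j,\infty}(t_1) - v_{j,l(j)}(t_1)\|_{H^1 \times H^1} \le C 2^{-\alpha J} \|w_{0,j}\|_{H^{1+\alpha} \times H^{1+\alpha}}. \tag{9.7}$$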
Adding the estimates for each $j$ results in (9.3).
The second step in the estimation of the error is to estimate the time discretization error for the field $v_{j,l(j)}$. We will argue that the fields $v_{j,l(j)}$ can be sufficiently accurately approximated using Runge–Kutta time discretization. By $v_{j,l(j),\Delta t_j}$ we denote the time-discretized fields. We assume the use of an order $K$ Runge–Kutta method for the time-stepping.
Lemma 9.2. Suppose that the time step $\Delta t_j$ satisfies the inequality
$$\Delta t_j \le C 2^{-\alpha(J - j)/K}, \tag{9.8}$$
and that the coefficients $a, b$ are at least $C^{2K+1,1}$; then we have
$$\Big\| \sum_{j=0}^{J} v_{j,l(j),\Delta t_j}(t_1) - \sum_{j=0}^{J} v_{j,l(j)}(t_1) \Big\|_{H^1 \times H^1} \le C 2^{-\alpha J} \|u_0\|_{H^{1+\alpha} \times H^{1+\alpha}}. \tag{9.9}$$
Proof. The error per time step in $\|v_{j,l(j)} - v_{j,l(j),\Delta t_j}\|_{H^1 \times H^1}$ is bounded by
$(\Delta t_j)^{K+1} \sup_{\tau \in [t, t+\Delta t_j]} \Big\| \frac{d^{K+1} v_{j,l(j)}(\tau)}{dt^{K+1}} \Big\|_{H^1 \times H^1}.$
Using the differential equation, the higher-order time derivative $\frac{d^{K+1} v_{j,l(j)}(\tau)}{dt^{K+1}}$ can be expanded as a sum of terms that are each given by a product of factors $\frac{d^{\gamma}}{dt^{\gamma}} \Pi_{V_{l(j)}} S \Pi_{V_{l(j)}}$ (total sum of the $\gamma$'s is $\le K$) acting on $v_{j,l(j)}(\tau)$. In section 7 it was shown that with the given smoothness assumption on a, b, the time derivatives $\frac{d^j S}{dt^j}$ were bounded operators on $H^1 \times H^1$ for $j = 0, \ldots, K$. The same is true for $\frac{d^j}{dt^j} \Pi_{V_{l(j)}} S \Pi_{V_{l(j)}}$. It follows that the error per time step is bounded by
$C (\Delta t_j)^{K+1} \sup_{\tau \in [t, t+\Delta t_j]} \|v_{j,l(j)}(\tau)\|_{H^1 \times H^1}.$
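The relation between the per-step error of order $(\Delta t)^{K+1}$ and the accumulated error of order $(\Delta t)^K$ can be observed on a scalar model problem. The following is a minimal sketch only, assuming the classical four-stage Runge–Kutta scheme (K = 4) applied to $y' = \lambda y$; the function names are hypothetical and this is not the discretization of the operator $\Pi_{V_{l(j)}} S \Pi_{V_{l(j)}}$ itself.

```python
import math

def rk4_step(lam, y, dt):
    """One classical Runge-Kutta step (order K = 4) for the test equation y' = lam * y."""
    k1 = lam * y
    k2 = lam * (y + 0.5 * dt * k1)
    k3 = lam * (y + 0.5 * dt * k2)
    k4 = lam * (y + dt * k3)
    return y + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def local_error(lam, dt):
    """Error after a single step starting from y = 1 (expected to scale like dt^(K+1))."""
    return abs(rk4_step(lam, 1.0, dt) - math.exp(lam * dt))

def global_error(lam, t_end, n_steps):
    """Error at time t_end after n_steps equal steps (expected to scale like dt^K)."""
    y = 1.0
    dt = t_end / n_steps
    for _ in range(n_steps):
        y = rk4_step(lam, y, dt)
    return abs(y - math.exp(lam * t_end))

if __name__ == "__main__":
    # Halving the step should divide the local error by about 2^5 and the global error by about 2^4.
    print(local_error(-1.0, 0.1) / local_error(-1.0, 0.05))
    print(global_error(-1.0, 1.0, 10) / global_error(-1.0, 1.0, 20))
```

Halving the step size is a simple way to read off both orders; the loss of one power of $\Delta t$ comes from summing roughly $1/\Delta t$ per-step errors.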
Using standard arguments to go from local to global error, we find that the error at time $t_1$ can be estimated by
$\|v_{j,l(j)}(t_1) - v_{j,l(j),\Delta t_j}(t_1)\|_{H^1 \times H^1} \le C (\Delta t_j)^K \|w_{0,j}\|_{H^1 \times H^1}.$
We have that
$\sum_{j=0}^{J} 2^{\alpha j} \|w_{0,j}\|_{H^1 \times H^1}$
is bounded. We therefore require that
(9.10) $(\Delta t_j)^K \le C 2^{\alpha j} 2^{-\alpha J};$
then (9.9) follows. The conditions (9.8) and (9.10) are of course equivalent.

For the estimate of the time discretization error it turned out to be convenient to work with $\Pi_{V_{l(j)}} S \Pi_{V_{l(j)}}$, an exact discretization that is not practical to compute, instead of $S_{l(j)}$, the approximate discretization discussed in section 8. The reason is that the errors made in $S_{l(j)}$ are not differentiable. So the next step is to take into account the difference between $S_{l(j)}$ and $\Pi_{V_{l(j)}} S \Pi_{V_{l(j)}}$.

Lemma 9.3. Assume still (9.2). We have the estimate
(9.11) $\Big\| \sum_{j=0}^{J} w_{j,l(j),\Delta t_j}(t_1) - \sum_{j=0}^{J} v_{j,l(j),\Delta t_j}(t_1) \Big\|_{H^1 \times H^1} \le C 2^{-\alpha J} \|u_0\|_{H^{1+\alpha} \times H^{1+\alpha}}.$
Proof. The difference $S_{l(j)} - \Pi_{V_{l(j)}} S \Pi_{V_{l(j)}}$ satisfies an estimate similar to that for the difference $\Pi_{V_{l(j)}} S \Pi_{V_{l(j)}} - S$, which was considered in the proof of Lemma 9.1. The proof of (9.11) therefore proceeds similarly to the proof of Lemma 9.1, except that difference equations are used instead of differential equations. The difference $w_{j,l(j),\Delta t_j} - v_{j,l(j),\Delta t_j}$ satisfies the linear inhomogeneous difference equation
$w_{j,l(j),\Delta t_j}(t + \Delta t) - v_{j,l(j),\Delta t_j}(t + \Delta t) = \Delta t \, \mathrm{RKStep}(t, \Delta t, S_{l(j)}) (w_{j,l(j),\Delta t_j}(t) - v_{j,l(j),\Delta t_j}(t)) + \Delta t \, (\mathrm{RKStep}(t, \Delta t, S_{l(j)}) - \mathrm{RKStep}(t, \Delta t, \Pi_{V_{l(j)}} S \Pi_{V_{l(j)}})) v_{j,l(j),\Delta t_j}(t),$
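where $\Delta t \, \mathrm{RKStep}(t, \Delta t, A) y$ denotes the Runge–Kutta step for the equation $y' = Ay$, which is a linear map on y. It follows that
$\|w_{j,l(j),\Delta t_j}(\hat t) - v_{j,l(j),\Delta t_j}(\hat t)\|_{H^1 \times H^1} \le C \Delta t_j \sum_{t\text{-values} < \hat t} \|(\mathrm{RKStep}(t, \Delta t, S_{l(j)}) - \mathrm{RKStep}(t, \Delta t, \Pi_{V_{l(j)}} S \Pi_{V_{l(j)}})) v_{j,l(j),\Delta t_j}(t)\|_{H^1 \times H^1}.$
The difference $\mathrm{RKStep}(t, \Delta t, S_{l(j)}) - \mathrm{RKStep}(t, \Delta t, \Pi_{V_{l(j)}} S \Pi_{V_{l(j)}})$ can be worked out. It is a product of $S_{l(j)} - \Pi_{V_{l(j)}} S \Pi_{V_{l(j)}}$ and of operators that are bounded on $H^{1+s}$, $0 \le s \le k$. It follows that we have the estimate
$\|\mathrm{RKStep}(t, \Delta t, S_{l(j)}) - \mathrm{RKStep}(t, \Delta t, \Pi_{V_{l(j)}} S \Pi_{V_{l(j)}})\|_{H^{1+k} \times H^{1+k} \to H^1 \times H^1} \le C 2^{-k l(j)}.$
Furthermore
$\|v_{j,l(j),\Delta t_j}(t)\|_{H^{1+k} \times H^{1+k}} \le 2^{j(k-\alpha)} \|w_{0,j}\|_{H^{1+\alpha} \times H^{1+\alpha}}.$
It follows that we can estimate
$\|w_{j,l(j),\Delta t_j}(t_1) - v_{j,l(j),\Delta t_j}(t_1)\|_{H^1 \times H^1} \le C 2^{-k l(j) + j(k-\alpha)} \|w_{0,j}\|_{H^{1+\alpha} \times H^{1+\alpha}} = C 2^{-\alpha J} \|w_{0,j}\|_{H^{1+\alpha} \times H^{1+\alpha}}.$
The estimate (9.11) follows directly from this.

This ends our estimation of the error. The cost of this time-stepping is
$C \sum_{j=0}^{J} (\Delta t_j)^{-1} 2^{l(j)} = C \sum_{j=0}^{J} 2^{\alpha(J-j)/K + \alpha(J-j)/k + j} = C 2^{J} \sum_{j=0}^{J} 2^{(-1 + \alpha/K + \alpha/k)(J-j)}.$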
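The behavior of this cost sum can be checked numerically. The sketch below is an illustration under simplifying assumptions: all constants C are set to 1, the level $l(j) = j + \alpha(J-j)/k$ from (9.2) is not rounded to an integer, and the function name `time_stepping_cost` is hypothetical.

```python
def time_stepping_cost(J, alpha, k, K):
    """Evaluate sum_j (1 / dt_j) * 2^{l(j)} with dt_j = 2^{-alpha (J - j) / K}
    and l(j) = j + alpha (J - j) / k, i.e. the cost sum with all constants set to 1."""
    total = 0.0
    for j in range(J + 1):
        level = j + alpha * (J - j) / k            # from (9.2), not rounded
        n_steps = 2.0 ** (alpha * (J - j) / K)     # proportional to 1 / dt_j
        total += n_steps * 2.0 ** level
    return total

if __name__ == "__main__":
    # Strict inequality 1/K + 1/k < 1/alpha: cost / 2^J stays bounded as J grows.
    for J in (5, 10, 20):
        print(J, time_stepping_cost(J, 1.0, 4, 4) / 2.0 ** J)
    # Equality 1/K + 1/k = 1/alpha: cost / 2^J grows like J + 1.
    for J in (5, 10, 20):
        print(J, time_stepping_cost(J, 2.0, 4, 4) / 2.0 ** J)
```

In the first case the ratio is a partial sum of a convergent geometric series, while in the second case every term of the sum equals $2^J$, which is the source of the logarithmic factor.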
The requirement is that the cost is bounded by CN, and hence that $-1 + \alpha/K + \alpha/k < 0$. If we allow logarithmic cost $O(N \log N)$, equality is also allowed. We hence have our final result.

Theorem 9.4. If a Kth-order Runge–Kutta scheme is used, if the operators $S_j$ are approximated using the order k approximation property, with, in particular, order k + 1 spline wavelets, if the initial data $u_0$ are in $H^{1+\alpha} \times H^{1+\alpha}$, if coefficient functions are at least $C^{K+1+\max(k,K),1}$, and if
(9.12) $1/K + 1/k < 1/\alpha,$
then the algorithm above with $N = L 2^J$ degrees of freedom computes an approximation with error bound
(9.13) $\Big\| \sum_{j=0}^{J} w_{j,l(j),\Delta t_j}(t_1) - v(t_1) \Big\|_{H^1 \times H^1} \le C N^{-\alpha} \|u_0\|_{H^{1+\alpha} \times H^{1+\alpha}}$
at a cost O(N). If
(9.14) $1/K + 1/k = 1/\alpha,$
it satisfies the same error bound at cost $O(N \log N)$.

The requirement that $u_0$ is in $H^{1+\alpha} \times H^{1+\alpha}$ means that the initial values $U_0$ for the original system (1.3) must be in $H^{1+\alpha} \times H^{\alpha}$. In (9.13) it may look like we are summing J functions of N sample points, with cost $O(JN) = O(N \log N)$. However, this is not the case. The terms $w_{j,l(j),\Delta t_j}(t_1)$ have $C 2^{l(j)}$ sample points (being in $V_{l(j)}$). Using the wavelet spaces and the fast wavelet transform (which is O(N) for N sample points), the summation can be done at cost $C \sum_{j=0}^{J} 2^{l(j)} \le C 2^J = O(N)$.

10. Discussion. A numerical method for wave propagation in smooth media was developed. The numerical results in section 5 show that the method has potential in applications with relatively smooth media. Further refinements might increase computation speed or weaken the requirements on medium smoothness. One step that could possibly give an improvement is a coordinate change that makes the wave speed equal to unity. We refrained from doing this since it has no equivalent in higher dimensions, but it could reduce the error in the application of the operator T. The material of sections 6 to 9 not only leads to the O(N) complexity result but also suggests ways to possibly improve the method.

The main question for future research is, in our view, the generalization to higher-dimensional cases. For the multidimensional case, curvelets form a redundant basis (frame) with respect to which the solution operator can be made sparse [4]. Potentially it could be used for computations. However, one needs to be able to implement operators that give the approximate effect of wave propagation, such as translation, rotation, and deformation, efficiently in a curvelet basis. Perhaps other fast implementations of Fourier integral operators could be used (cf. [3]) to compute the approximate wave propagation.
In dimension 2 and higher the remainder operator R becomes, at least in the continuous setting, a pseudodifferential operator, which is more challenging to implement. But a priori there is no reason why the principle of combining an approximate solution operator with lower-order, exact “corrections” could not be extended to higher dimensions.

REFERENCES
[1] R. E. Bank and T. Dupont, An optimal order process for solving finite element equations, Math. Comp., 36 (1981), pp. 35–51.
[2] G. Beylkin and K. Sandberg, Wave propagation using bases for bandlimited functions, Wave Motion, 41 (2005), pp. 263–291.
[3] E. Candès, L. Demanet, and L. Ying, Fast computation of Fourier integral operators, SIAM J. Sci. Comput., 29 (2007), pp. 2464–2493.
[4] E. J. Candès and L. Demanet, The curvelet representation of wave propagators is optimally sparse, Comm. Pure Appl. Math., 58 (2005), pp. 1472–1528.
[5] A. Cohen, Numerical Analysis of Wavelet Methods, Stud. Math. Appl. 32, North–Holland, Amsterdam, 2003.
[6] G. C. Cohen, Higher-Order Numerical Methods for Transient Wave Equations, Sci. Comput., Springer-Verlag, Berlin, 2002.
[7] W. Dahmen, Wavelet and multiscale methods for operator equations, in Acta Numerica, 1997, Cambridge University Press, Cambridge, UK, 1997, pp. 55–228.
[8] W. Dahmen and C. A. Micchelli, Using the refinement equation for evaluating integrals of wavelets, SIAM J. Numer. Anal., 30 (1993), pp. 507–537.
[9] L. Demanet and L. Ying, Wave atoms and time upscaling of wave equations, Numer. Math., to appear.
[10] J. J. Duistermaat, Fourier Integral Operators, Birkhäuser, Boston, 1996.
[11] J. J. Duistermaat and L. Hörmander, Fourier integral operators II, Acta. Math., 128 (1972), pp. 183–269.
[12] B. Engquist, S. Osher, and S. Zhong, Fast wavelet based algorithms for linear evolution equations, SIAM J. Sci. Comput., 15 (1994), pp. 755–775.
[13] R. J. LeVeque, Convergence of a large time step generalization of Godunov’s method for conservation laws, Comm. Pure Appl. Math., 37 (1984), pp. 463–477.
[14] J. L. Lions and E. Magenes, Non-homogeneous Boundary Value Problems and Applications, Vol. 1, Springer-Verlag, Berlin, 1972.
[15] H. F. Smith, A Hardy space for Fourier integral operators, J. Geom. Anal., 8 (1998), pp. 629–653.
[16] C. C. Stolk, On the Modeling and Inversion of Seismic Data, Ph.D. thesis, Utrecht University, Utrecht, The Netherlands, 2000.
[17] G. Strang, On the construction and comparison of difference schemes, SIAM J. Numer. Anal., 5 (1968), pp. 506–517.
[18] M. E. Taylor, Reflection of singularities of solutions to systems of differential equations, Comm. Pure Appl. Math., 28 (1975), pp. 457–478.
[19] M. E. Taylor, Pseudodifferential Operators, Princeton University Press, Princeton, NJ, 1981.
[20] L. N. Trefethen, Spectral Methods in MATLAB, Software Environ. Tools 10, SIAM, Philadelphia, 2000.
[21] F. Treves, Introduction to Pseudodifferential and Fourier Integral Operators, Vol. 2, Plenum Press, New York, 1980.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1195–1225
STABLE AND COMPATIBLE POLYNOMIAL EXTENSIONS IN THREE DIMENSIONS AND APPLICATIONS TO THE p AND h-p FINITE ELEMENT METHOD∗
BENQI GUO† AND JIANMING ZHANG‡

Abstract. Polynomial extensions play a vital role in the analysis of the p and h-p finite element method (FEM) and the spectral element method. We explicitly construct polynomial extensions on standard elements, namely cubes and triangular prisms, which together with the extension on tetrahedrons are used by the p and h-p FEM in three dimensions. These extensions are proved to be stable and compatible with FEM subspaces on tetrahedrons, cubes, and prisms and realize a continuous mapping $H^{1/2}_{00}(T)$ (or $H^{1/2}_{00}(S)$) $\to H^1(\Omega_{st})$, where $\Omega_{st}$ denotes one of these standard elements and T and S are their triangular and square faces. Applications of these polynomial extensions to the p and h-p FEM are illustrated.

Key words. the p and h-p version, finite element method, polynomial extension, tetrahedron, hexahedron, prism, pyramid, cube, Sobolev spaces, Jacobi polynomials

AMS subject classifications. 65N30, 65N25, 35D10

DOI. 10.1137/070688006
1. Introduction. In the analysis of high-order finite element methods (FEM), such as the p and h-p versions of FEM and the spectral element method, we need to construct a globally continuous piecewise polynomial which has the optimal estimate for its approximation error and satisfies homogeneous or nonhomogeneous Dirichlet boundary conditions. The construction of such a polynomial starts with local polynomial projections on each element for the best rate of convergence. Unfortunately, a union of local polynomial projections is not globally continuous and does not satisfy the homogeneous or nonhomogeneous Dirichlet boundary conditions. In the context of the continuous Galerkin method in two and three dimensions, we have to adjust these local polynomial projections by a special technique called polynomial extension or lifting. Hence, it is essential for us to build a polynomial extension compatible with FEM subspaces, by which the union of local polynomial projections can be modified to a globally continuous polynomial without degrading the best order of approximation error. Compatible polynomial extensions together with local projections led to the best estimate of the approximation error for the p and h-p FEM [1, 2, 5, 6, 16, 21].

Babuška and Suri [5] proposed an extension F on a triangle T with I = (0, 1) as one of its sides, which realizes a continuous mapping $H^{1/2}(I) \to H^1(T)$ such that $Ff \in P_p^1(T)$ for $f \in P_p(I)$. The extension is the convolution of f and a characteristic function. Using this extension they proved the existence of the continuous extension

∗ Received by the editors April 12, 2007; accepted for publication (in revised form) September 22, 2008; published electronically February 25, 2009. http://www.siam.org/journals/sinum/47-2/68800.html
† Department of Mathematics, Shanghai Normal University, Shanghai, China and Department of Mathematics, University of Manitoba, Winnipeg, MB R3T 2N2, Canada (
[email protected]). The work of this author was partially supported by NSERC of Canada under grant OGP0046726 and partially supported by the Computational Science E-Institute of Shanghai Universities under project E03004. ‡ Department of Mathematics, University of Manitoba, Winnipeg, MB R3T 2N2, Canada (
[email protected]). The work of this author was partially supported by the University of Manitoba and by NSERC of Canada under grant OGP0046726.
$R: H^{1/2}_{00}(\Gamma) \to H^1(T)$ [3, 5] such that $Rf \in P_p^1(T)$ for $f \in P_p^0(I)$. They generalized the extension to a square $S = (-1, 1)^2$, which realizes a continuous mapping $H^{1/2}_{00}(\Gamma) \to H^1(S)$ with $Rf \in P_p^2(S)$ for $f \in P_p^0(I)$. Hereafter, $P_p(I)$ denotes the set of polynomials of degree $\le p$ and $P_p^0(I)$ is its subset of polynomials vanishing at the endpoints of I; $P_p^1(\Omega)$ and $P_p^2(\Omega)$ denote the sets of polynomials of total and separate degree $\le p$ on a domain $\Omega$ in $R^n$, n = 2, 3, respectively, and $P_p^{m,0}(\Omega)$ is the subset of polynomials in $P_p^m(\Omega)$ vanishing on the boundary of $\Omega$. These polynomial extensions are compatible with FEM subspaces and have been successfully applied to the p and h-p versions of FEM in two dimensions, which leads to the optimal estimate for the approximation error in the finite element solution of the p and h-p versions on quasi-uniform meshes with triangular and quadrilateral elements [1, 2, 5, 6, 16]. It was shown [20] that the extension on a triangle or a square defined in [5] is stable in Sobolev spaces. The polynomial extensions in weighted Sobolev spaces on a square were studied in [9] to improve the error estimate of the spectral collocation method for an approximation of the Stokes equation. The polynomial extensions in high-order Sobolev spaces were studied in [8].

The extension of convolution-type has been generalized to tetrahedrons [21] and cubes [7] in three dimensions. Muñoz-Sola developed the polynomial extension of convolution-type on a tetrahedron K from a triangular face T by introducing the extension operator $R_K$ (see (2.2)) and gave an explicit proof of continuity of the mapping $H^{1/2}_{00}(T) \to H^1(K)$ such that $R_K f \in P_p^1(K)$ if $f \in P_p^{1,0}(T)$, which is compatible with the FEM subspaces on tetrahedral elements. The polynomial extension $R_K$ together with local projections leads to an error estimate for the h-p FEM on tetrahedral meshes [21].
Unfortunately, the polynomial extension of convolution-type on a cube D is not compatible with FEM subspaces on hexahedral elements. Namely, if $f \in P_p^{2,0}(S)$ where S is a square face of D, the polynomial extended by the convolution will not be in $P_p^2(D)$ but instead in $P_p^2(S) \times P_{2p}(I)$. Also, if $f \in P_p^{1,0}(S)$, the extended polynomial is in $P_p^2(D)$. Obviously, $P_p^1(S)$ is not a trace space of $P_p^2(D)$ and $P_p^2(S) \times P_{2p}(I) \not\subset P_p^2(D)$. It seems that the extension of convolution-type works only for polynomial spaces of total degree $\le p$ on elements in three dimensions, e.g., $P_p^1(K)$, but does not work for polynomial spaces of separate degree $\le p$, e.g., $P_p^2(D)$. Therefore, we need to develop a new type of extension operator $R_D$ without using convolution. In this paper we design a polynomial extension on cubes by using spectral solutions of the eigenvalue problem of the Poisson equation on a square face S and of a two-point value problem on an interval I. A polynomial extension using eigen-polynomials which form an $L^2$ and $H^1$ orthogonal basis of $P_p^{2,0}(S)$ and spectral solutions of two-point value problems associated with the eigenvalues realizes a continuous mapping $R_D: H^{1/2}_{00}(S) \to H^1(D)$ with $R_D f \in P_p^2(D)$ for $f \in P_p^{2,0}(S)$.

Besides tetrahedrons (simplices) and hexahedrons (cubes), triangular prisms are commonly used for FEM in three dimensions. A triangular prism has two different types of faces: triangles and squares. Therefore, we need to construct a polynomial extension from a triangular face and a polynomial extension from a square face. The former is based on the convolution-type extension on a tetrahedron, and the latter is based on a new extension on a triangle from a side. Both are compatible with FEM subspaces and realize continuous mappings $H^{1/2}_{00}(T) \to H^1(G)$ and $H^2(S) \cap H_0^1(S) \to H^1(G)$, respectively.

The rest of the paper is organized as follows.
In section 2, after quoting the results on the polynomial extension $R_K$ on tetrahedrons K from [21], a polynomial extension $R_G^T$ from a triangular face T to a triangular prism G is introduced, which is based on the extension on a truncated tetrahedron $K_H$ combined with a trilinear mapping of G onto $K_H$. The continuity of the mapping is proved, and the compatibility with the FEM subspace is verified. Another polynomial extension $R_G^S$ from a square face S to a triangular prism G is constructed, which is as important as $R_G^T$ in the error analysis of FEM on prism elements. In section 3, we construct an extension on a cube D without using convolution, instead using spectral solutions of an eigenvalue problem on a square and a two-point value problem on an interval. It is shown that this polynomial extension realizes a continuous mapping $H^{1/2}_{00}(S) \to H^1(D)$ and is compatible with FEM subspaces on cubic elements. Applications of the polynomial extensions to error estimates for the p-version of FEM in three dimensions are illustrated in the last section.
2. Polynomial extension on a triangular prism.

2.1. Polynomial extension on a tetrahedron. For the construction of polynomial extensions on a triangular prism, we need to quote results on the extension on a tetrahedron from [21]. We denote by K the standard tetrahedron $\{(x_1, x_2, x_3) \mid x_1 \ge 0, x_2 \ge 0, x_3 \ge 0, x_1 + x_2 + x_3 \le 1\}$ in $R^3$ shown in Figure 2.1, and $\partial K$ denotes the boundary of K. Let $T = \{(x_1, x_2) \mid x_1 \ge 0, x_2 \ge 0, x_1 + x_2 \le 1\}$ be the standard triangle in $R^2$, and let $\Gamma_i$, $1 \le i \le 3$, be the faces of K contained in the planes $x_i = 0$ and $\Gamma_4$ be the oblique face.

Fig. 2.1. The tetrahedron K.

Muñoz-Sola introduced the following operators [21]:
(2.1) $F_K f(x_1, x_2, x_3) = \frac{2}{x_3^2} \int_{x_1}^{x_1 + x_3} d\xi_1 \int_{x_2}^{x_1 + x_2 + x_3 - \xi_1} f(\xi_1, \xi_2) \, d\xi_2$
and
(2.2) $R_K f(x_1, x_2, x_3) = (1 - x_1 - x_2 - x_3) x_1 x_2 F_K \tilde f(x_1, x_2, x_3),$
with
$\tilde f(x_1, x_2) = \frac{f(x_1, x_2)}{x_1 x_2 (1 - x_1 - x_2)}.$
The operator $R_K$ has the following decomposition:
(2.3) $R_K f(x_1, x_2, x_3) = (1 - x_1 - x_2 - x_3) R_{12} f(x_1, x_2, x_3) + x_2 R_{13} f(x_1, x_2, x_3) + x_1 R_{23} f(x_1, x_2, x_3),$
where
(2.4) $R_{12} f(x_1, x_2, x_3) = x_1 x_2 F_K \tilde f_{12}(x_1, x_2, x_3), \quad \tilde f_{12}(x_1, x_2) = \frac{f(x_1, x_2)}{x_1 x_2},$
(2.5) $R_{i3} f(x_1, x_2, x_3) = (1 - x_1 - x_2 - x_3) x_i F_K \tilde f_{i3}(x_1, x_2, x_3),$
with
$\tilde f_{i3}(x_1, x_2) = \frac{f(x_1, x_2)}{x_i (1 - x_1 - x_2)}, \quad i = 1, 2.$
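To make the action of $F_K$ and $R_K$ concrete, the following sketch approximates (2.1) with a midpoint rule. It is illustrative only: the quadrature, the resolution `n`, and the function names `F_K`, `R_K`, and `bubble` are assumptions of this example and do not come from [21]. Since $F_K$ averages its argument over a triangle of area $x_3^2/2$, it reproduces constants, and for the cubic bubble $f = x_1 x_2 (1 - x_1 - x_2)$ one expects $R_K f = (1 - x_1 - x_2 - x_3) x_1 x_2$.

```python
def F_K(f, x1, x2, x3, n=200):
    """Midpoint-rule approximation of (2.1):
    (2 / x3^2) * int_{x1}^{x1+x3} dxi1 int_{x2}^{x1+x2+x3-xi1} f(xi1, xi2) dxi2."""
    h1 = x3 / n
    total = 0.0
    for a in range(n):
        xi1 = x1 + (a + 0.5) * h1
        upper = x1 + x2 + x3 - xi1          # upper limit of the inner integral
        h2 = (upper - x2) / n
        inner = 0.0
        for b in range(n):
            xi2 = x2 + (b + 0.5) * h2
            inner += f(xi1, xi2)
        total += inner * h2 * h1
    return 2.0 * total / x3 ** 2

def R_K(f, x1, x2, x3, n=200):
    """Approximation of (2.2) at an interior point of K (x1, x2, x3 > 0, x1 + x2 + x3 < 1)."""
    f_tilde = lambda s, t: f(s, t) / (s * t * (1.0 - s - t))
    return (1.0 - x1 - x2 - x3) * x1 * x2 * F_K(f_tilde, x1, x2, x3, n)

def bubble(s, t):
    """Cubic polynomial vanishing on the boundary of the triangle T."""
    return s * t * (1.0 - s - t)

if __name__ == "__main__":
    print(F_K(lambda s, t: 1.0, 0.2, 0.3, 0.1))   # averaging a constant
    print(F_K(lambda s, t: s, 0.2, 0.3, 0.3))     # value of a linear function at the centroid
    print(R_K(bubble, 0.2, 0.3, 0.1))             # compare with (1 - 0.6) * 0.2 * 0.3
```

The quadrature points lie strictly inside T, so the division in $\tilde f$ is well defined for interior evaluation points; evaluation on the faces of K would need the limits to be taken analytically.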
The following theorems were proved in [21].

Theorem 2.1. Let $R_K$ be the operator defined by (2.2). Then $R_K f(x) \in P_p^1(K)$ for $f \in P_p^{1,0}(\Gamma_3)$, and
(2.6) $\|R_K f\|_{H^1(K)} \le C \|f\|_{H^{1/2}_{00}(\Gamma_3)},$
(2.7) $R_K f|_{\Gamma_3} = f, \quad R_K f|_{\Gamma_i} = 0, \ i = 1, 2, 4,$
where C is a constant independent of f and p.

Theorem 2.2. For $f \in P_p^1(\partial K) = \{f \in C^0(\partial K) \mid f|_{\Gamma_i} \in P_p^1(\Gamma_i), 1 \le i \le 4\}$, there exists a polynomial $E_K f \in P_p^1(K)$ such that $E_K f|_{\partial K} = f$ and
(2.8) $\|E_K f\|_{H^1(K)} \le C \|f\|_{H^{1/2}(\partial K)},$
where C is a constant independent of f and p.

2.2. Polynomial extension on prisms from a triangular face. Let $G = T \times I$ be a triangular prism with faces $\Gamma_i$, $1 \le i \le 5$, shown in Figure 2.2, where $T = \{(\tilde x_1, \tilde x_2) \mid \tilde x_1 \ge 0, \tilde x_2 \ge 0, \tilde x_1 + \tilde x_2 \le 1\}$ and I = [0, 1]. The faces $\Gamma_i$, $1 \le i \le 3$, are on the planes $\tilde x_i = 0$, $\Gamma_5$ is the face of G contained in the plane $\tilde x_3 = 1$, and $\Gamma_4$ is the face of G contained in the plane $\tilde x_1 + \tilde x_2 = 1$. Then $\Gamma_3 = T$ and $\Gamma_2 = S = I \times I$. By $P_p^1(T) \times P_p(I)$, we denote the set of polynomials with subtotal degree $\le p$ in $\tilde x_1$ and $\tilde x_2$ and with degree $\le p$ in $\tilde x_3$. Obviously $P_p^1(G) \subset P_p^1(T) \times P_p(I) \subset P_p^2(G)$; this space is denoted by $P_p^{1.5}(G)$. We shall establish polynomial extensions from the triangle T to the prism G. The mapping M:
(2.9) $x_1 = \tilde x_1 (1 - H \tilde x_3), \quad x_2 = \tilde x_2 (1 - H \tilde x_3), \quad x_3 = H \tilde x_3$
Fig. 2.2. The prism G and truncated tetrahedron $K_H$.
maps the prism G onto a truncated tetrahedron $K_H = \{(x_1, x_2, x_3) \mid x_1 \ge 0, x_2 \ge 0, H \ge x_3 \ge 0, x_1 + x_2 + x_3 \le 1\}$, with $H \in (0, 1)$, shown in Figure 2.2. The faces of $K_H$ are $\tilde\Gamma_i$, i = 1, 2, 3, 4, 5; $\tilde\Gamma_3$ and $\tilde\Gamma_5$ are contained in the planes $x_3 = 0$ and $x_3 = H$, respectively, and $\tilde\Gamma_i$, i = 1, 2, 4, are portions of the faces of the tetrahedron K. Hence, we need to construct a polynomial extension operator $R_H: P_p^{1,0}(T) \to P_p^1(K_H) \oplus P_p^1(T) \times P_1(I_H)$ with desired properties, where $I_H = (0, H)$, which can lead to a polynomial extension from a triangular face to a whole prism. We now introduce the polynomial lifting operator $R_H$ on $K_H$ defined by
(2.10) $R_H f(x_1, x_2, x_3) = R_K f(x_1, x_2, x_3) - \frac{x_3}{H} R_K f(x_1, x_2, H),$
where $R_K$ is the lifting operator on K given in (2.2).

Theorem 2.3. Let $R_H$ be the operator given in (2.10). Then $R_H f(x) \in P_p^1(K_H) \oplus P_p^1(T) \times P_1(I_H)$ for $f \in P_p^{1,0}(T)$ such that $R_H f(x)|_{\tilde\Gamma_3} = f$, $R_H f|_{\tilde\Gamma_i} = 0$, i = 1, 2, 4, 5, and
(2.11) $\|R_H f\|_{H^1(K_H)} \le C \|f\|_{H^{1/2}_{00}(\tilde\Gamma_3)},$
where $I_H = (0, H)$ and $T_H = \{(x_1, x_2) \mid x_1 \ge 0, x_2 \ge 0, x_1 + x_2 \le 1 - H\}$ and C is a constant independent of f and p.

Combining the operator $R_H$ and the mapping M, we construct an extension $R_G^T$ by
$R_G^T f(\tilde x_1, \tilde x_2, \tilde x_3) = R_H f \circ M = U(\tilde x_1, \tilde x_2, \tilde x_3) - \tilde x_3 U(\tilde x_1, \tilde x_2, 1),$
where $U(\tilde x_1, \tilde x_2, \tilde x_3) = R_K f \circ M$. Suppose that $R_K f(x_1, x_2, x_3) = \sum_{i+j+k \le p} a_{ijk} x_1^i x_2^j x_3^k$; then
(2.12) $R_K f \circ M(\tilde x_1, \tilde x_2, \tilde x_3) = U(\tilde x_1, \tilde x_2, \tilde x_3) = \sum_{i+j+k \le p} a_{ijk} H^k \tilde x_1^i \tilde x_2^j \tilde x_3^k (1 - H \tilde x_3)^{i+j} \in P_p^1(T) \times P_p(I)$
and
$\frac{x_3}{H} R_K f(x_1, x_2, H) \circ M = \tilde x_3 U(\tilde x_1, \tilde x_2, 1) \in P_p^1(T) \times P_1(I).$
Therefore, $R_G^T f(\tilde x_1, \tilde x_2, \tilde x_3) = R_H f \circ M \in P_p^{1,0}(T) \times P_p(I)$ if $f \in P_p^1(T)$. We are able to establish the polynomial extension from a triangular face to a prism.

Theorem 2.4. Let $R_G^T$ be the extension defined in (2.12). Then $R_G^T f \in P_p^1(T) \times P_p(I)$ for $f \in P_p^{1,0}(T)$, $R_G^T f|_{\Gamma_3} = f$ and vanishes on $\partial G \setminus \Gamma_3$, and
(2.13) $\|R_G^T f\|_{H^1(G)} \le C \|f\|_{H^{1/2}_{00}(\Gamma_3)},$
where C is a constant independent of f and p.

Proof. Obviously, $R_G^T: P_p^{1,0}(T) \to P_p^{1,0}(T) \times P_p(I)$, and $R_G^T f|_{\Gamma_3} = f$ for $f \in P_p^{1,0}(T)$, $R_G^T f|_{\Gamma_i} = 0$, i = 1, 2, 4, 5. Since the mapping M is trilinear,
$\|R_G^T f\|_{H^1(G)} \le C \|R_H f\|_{H^1(K_H)}.$
Then, (2.13) follows from (2.11) easily.
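The geometry of the mapping M in (2.9) can be checked with a few lines of code. The sketch below is only an illustration with hypothetical helper names (`M`, `in_K_H`, `prism_grid_is_mapped_into_K_H`); it samples the prism G on a grid and tests that the images lie in the truncated tetrahedron $K_H$, and that the bottom face $\tilde x_3 = 0$ is left unchanged.

```python
def M(xt1, xt2, xt3, H):
    """Mapping (2.9) from the prism G = T x [0, 1] to the truncated tetrahedron K_H."""
    s = 1.0 - H * xt3
    return xt1 * s, xt2 * s, H * xt3

def in_K_H(x1, x2, x3, H, tol=1e-12):
    """Membership test for K_H = {x1, x2 >= 0, 0 <= x3 <= H, x1 + x2 + x3 <= 1}."""
    return (x1 >= -tol and x2 >= -tol and -tol <= x3 <= H + tol
            and x1 + x2 + x3 <= 1.0 + tol)

def prism_grid_is_mapped_into_K_H(H, n=10):
    """Check M(G) within K_H on a uniform grid of the prism."""
    for i in range(n + 1):
        for j in range(n + 1 - i):          # (i/n, j/n) runs over the triangle T
            for k in range(n + 1):
                if not in_K_H(*M(i / n, j / n, k / n, H), H):
                    return False
    return True

if __name__ == "__main__":
    print(prism_grid_is_mapped_into_K_H(0.5))
    print(M(1.0, 0.0, 1.0, 0.5))   # a top vertex of G goes to a vertex of the face x3 = H
```

The check reflects the identity $x_1 + x_2 + x_3 = (\tilde x_1 + \tilde x_2)(1 - H \tilde x_3) + H \tilde x_3 \le 1$ for points of G.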
Fig. 2.3. Case 1 ($0 < h < a/2$) and Case 2 ($a/2 < h < a$).
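It remains to prove Theorem 2.3. To this end, we need the following lemmas.

Lemma 2.5. For $0 < h < a$ and any function $g \in L^2(0, a)$, it holds that
(2.14) $\int_0^{a-h} \Big| \frac{1}{h} \int_x^{x+h} g(\xi) \, d\xi \Big|^2 dx \le \int_0^a |g(x)|^2 \, dx.$
Also, there hold
(2.15) $\int_0^{a-h} \Big| \frac{1}{h} \int_x^{x+h} g(\xi) \, d\xi \Big|^2 dx \le \frac{1}{h} \int_0^a x |g(x)|^2 \, dx$
and
(2.16) $\int_0^{a-h} \Big| \frac{1}{h} \int_x^{x+h} g(\xi) \, d\xi \Big|^2 dx \le \frac{1}{h} \int_0^a (a - x) |g(x)|^2 \, dx.$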
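The three inequalities of Lemma 2.5 can be sanity-checked numerically for a concrete g. The sketch below uses a composite midpoint rule; the helper names `midpoint` and `lemma_2_5_sides`, the sample function, and the resolution are assumptions made for this illustration, and the check is of course no substitute for the proof.

```python
def midpoint(func, lo, hi, n):
    """Composite midpoint rule for the integral of func over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(func(lo + (i + 0.5) * step) for i in range(n))

def lemma_2_5_sides(g, a, h, n=300):
    """Return the common left-hand side of (2.14)-(2.16) and the three right-hand sides."""
    average = lambda x: midpoint(g, x, x + h, n) / h       # (1/h) * int_x^{x+h} g
    lhs = midpoint(lambda x: average(x) ** 2, 0.0, a - h, n)
    rhs_14 = midpoint(lambda x: g(x) ** 2, 0.0, a, n)
    rhs_15 = midpoint(lambda x: x * g(x) ** 2, 0.0, a, n) / h
    rhs_16 = midpoint(lambda x: (a - x) * g(x) ** 2, 0.0, a, n) / h
    return lhs, rhs_14, rhs_15, rhs_16

if __name__ == "__main__":
    g = lambda x: 1.0 + x * x
    for h in (0.3, 0.7):            # one value for each of Case 1 and Case 2, with a = 1
        print(h, lemma_2_5_sides(g, 1.0, h))
```

The two values of h correspond to the two cases distinguished in the proof; in both, the left-hand side should not exceed any of the three right-hand sides.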
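Proof. By the Schwarz inequality, we have
$\int_0^{a-h} \Big| \frac{1}{h} \int_x^{x+h} g(\xi) \, d\xi \Big|^2 dx \le \int_0^{a-h} \Big( \frac{1}{h} \int_x^{x+h} |g(\xi)| \, d\xi \Big)^2 dx \le \int_0^{a-h} dx \int_x^{x+h} \frac{|g(\xi)|^2}{h} \, d\xi.$

Case 1: $0 < h \le a/2$ (see Figure 2.3). There holds
$\int_0^{a-h} \Big| \frac{1}{h} \int_x^{x+h} g(\xi) \, d\xi \Big|^2 dx \le \int_0^{a-h} dx \int_x^{x+h} \frac{|g(\xi)|^2}{h} \, d\xi$
$= \int_0^{h} d\xi \int_0^{\xi} \frac{|g(\xi)|^2}{h} \, dx + \int_h^{a-h} d\xi \int_{\xi-h}^{\xi} \frac{|g(\xi)|^2}{h} \, dx + \int_{a-h}^{a} d\xi \int_{\xi-h}^{a-h} \frac{|g(\xi)|^2}{h} \, dx$
$= \int_0^{h} \frac{\xi |g(\xi)|^2}{h} \, d\xi + \int_h^{a-h} \frac{h |g(\xi)|^2}{h} \, d\xi + \int_{a-h}^{a} \frac{(a - \xi) |g(\xi)|^2}{h} \, d\xi.$
Hence, we have
$\int_0^{a-h} \Big| \frac{1}{h} \int_x^{x+h} g(\xi) \, d\xi \Big|^2 dx \le \int_0^a |g(\xi)|^2 \, d\xi$
and
$\int_0^{a-h} \Big| \frac{1}{h} \int_x^{x+h} g(\xi) \, d\xi \Big|^2 dx \le \frac{1}{h} \int_0^a \xi |g(\xi)|^2 \, d\xi.$

Case 2: $a/2 < h < a$ (see Figure 2.3). Similarly, there holds
$\int_0^{a-h} \Big| \frac{1}{h} \int_x^{x+h} g(\xi) \, d\xi \Big|^2 dx \le \int_0^{a-h} dx \int_x^{x+h} \frac{|g(\xi)|^2}{h} \, d\xi$
$= \int_0^{a-h} d\xi \int_0^{\xi} \frac{|g(\xi)|^2}{h} \, dx + \int_{a-h}^{h} d\xi \int_0^{a-h} \frac{|g(\xi)|^2}{h} \, dx + \int_h^{a} d\xi \int_{\xi-h}^{a-h} \frac{|g(\xi)|^2}{h} \, dx$
$= \int_0^{a-h} \frac{\xi |g(\xi)|^2}{h} \, d\xi + \int_{a-h}^{h} \frac{(a - h) |g(\xi)|^2}{h} \, d\xi + \int_h^{a} \frac{(a - \xi) |g(\xi)|^2}{h} \, d\xi,$
which implies
$\int_0^{a-h} \Big| \frac{1}{h} \int_x^{x+h} g(\xi) \, d\xi \Big|^2 dx \le \int_0^a |g(\xi)|^2 \, d\xi$
and
$\int_0^{a-h} \Big| \frac{1}{h} \int_x^{x+h} g(\xi) \, d\xi \Big|^2 dx \le \frac{1}{h} \int_0^a \xi |g(\xi)|^2 \, d\xi.$
Therefore, we always have (2.14) and (2.15) for $0 < h \le a/2$ or $a/2 < h < a$. Letting $\eta = a - \xi$ and $\hat x = a - h - x$ and using (2.15), we obtain
$\int_0^{a-h} \Big| \frac{1}{h} \int_x^{x+h} g(\xi) \, d\xi \Big|^2 dx = \int_0^{a-h} \Big| \frac{1}{h} \int_{\hat x}^{\hat x + h} g(a - \eta) \, d\eta \Big|^2 d\hat x \le \frac{1}{h} \int_0^a \hat x |g(a - \hat x)|^2 \, d\hat x = \frac{1}{h} \int_0^a (a - z) |g(z)|^2 \, dz,$
which yields (2.16).

Lemma 2.6. Let $R_{12} f(x_1, x_2, H)$ and $R_{i3} f(x_1, x_2, H)$ be the operators given in (2.4) and (2.5), with $x_3 = H$. Then
(2.17) $\|R_{12} f(x_1, x_2, H)\|_{L^2(K_H)} \le C \|(x_1 x_2)^{1/2} f(x_1, x_2)\|_{L^2(T)}$
and for i = 1, 2,
(2.18) $\|R_{i3} f(x_1, x_2, H)\|_{L^2(K_H)} \le C \|x_i^{1/2} (1 - x_1 - x_2)^{1/2} f(x_1, x_2)\|_{L^2(T)},$
where C is a constant independent of f.

Proof. Note that $\|R_{12} f(x_1, x_2, H)\|^2_{L^2(K_H)}$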
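$\le \frac{4}{H^2} \int_0^H dx_3 \int_0^{1-x_3} dx_2 \int_0^{1-x_2-x_3} \Big( \frac{1}{H} \int_{x_1}^{x_1+H} g_1(\xi_1) \, d\xi_1 \Big)^2 dx_1,$
with $g_1(\xi_1) = \int_{x_2}^{x_2+H} |\tilde f(\xi_1, \xi_2)| \, d\xi_2$. Hereafter, $\tilde f$ denotes the extension of f by zero outside T. We apply here Lemma 2.5 to $g_1(\xi_1)$ with $a = 1 - x_2 - x_3$, $h = H$, $x = x_1$, $\xi = \xi_1$. Then we get
$\int_0^{1-x_2-x_3} \Big( \frac{1}{H} \int_{x_1}^{x_1+H} g_1(\xi_1) \, d\xi_1 \Big)^2 dx_1 \le \frac{1}{H} \int_0^{1-x_2-x_3+H} x_1 |g_1(x_1)|^2 \, dx_1,$
which implies
(2.19) $\int_0^{1-x_3} dx_2 \int_0^{1-x_2-x_3} \Big( \frac{1}{H} \int_{x_1}^{x_1+H} g_1(\xi_1) \, d\xi_1 \Big)^2 dx_1 \le \frac{1}{H} \int_0^{1-x_3} dx_2 \int_0^{1-x_2-x_3+H} x_1 \Big( \int_{x_2}^{x_2+H} |\tilde f(x_1, \xi_2)| \, d\xi_2 \Big)^2 dx_1$
$= H \Big\{ \int_0^{H} x_1 \, dx_1 \int_0^{1-x_3} \Big( \frac{1}{H} \int_{x_2}^{x_2+H} |\tilde f(x_1, \xi_2)| \, d\xi_2 \Big)^2 dx_2 + \int_H^{1-x_3+H} x_1 \, dx_1 \int_0^{1-x_1-x_3+H} \Big( \frac{1}{H} \int_{x_2}^{x_2+H} |\tilde f(x_1, \xi_2)| \, d\xi_2 \Big)^2 dx_2 \Big\}.$
Applying Lemma 2.5 again, we have
$\int_0^{1-x_3} \Big( \frac{1}{H} \int_{x_2}^{x_2+H} |\tilde f(x_1, \xi_2)| \, d\xi_2 \Big)^2 dx_2 \le \frac{1}{H} \int_0^{1-x_3+H} x_2 |\tilde f(x_1, x_2)|^2 \, dx_2$
and
$\int_0^{1-x_1-x_3+H} \Big( \frac{1}{H} \int_{x_2}^{x_2+H} |\tilde f(x_1, \xi_2)| \, d\xi_2 \Big)^2 dx_2 \le \frac{1}{H} \int_0^{1-x_1-x_3+2H} x_2 |\tilde f(x_1, x_2)|^2 \, dx_2,$
which together with (2.19) yields
$\int_0^{1-x_3} dx_2 \int_0^{1-x_2-x_3} \Big( \frac{1}{H} \int_{x_1}^{x_1+H} d\xi_1 \int_{x_2}^{x_2+H} |\tilde f(\xi_1, \xi_2)| \, d\xi_2 \Big)^2 dx_1$
$\le \Big( \int_0^{H} dx_1 \int_0^{1-x_3+H} + \int_H^{1+H} dx_1 \int_0^{1-x_1+2H} \Big) x_2 x_1 |\tilde f(x_1, x_2)|^2 \, dx_2$
$\le \Big( \int_0^{H} dx_1 \int_0^{1+H} + \int_H^{1+H} dx_1 \int_0^{1-x_1+2H} \Big) x_2 x_1 |\tilde f(x_1, x_2)|^2 \, dx_2 \le 2 \|(x_1 x_2)^{1/2} f\|^2_{L^2(T)}.$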
Therefore, (2.17) follows immediately. Let $Q_1$ be the mapping
(2.20) $x_1 = \hat x_2, \quad x_2 = 1 - \hat x_1 - \hat x_2 - \hat x_3, \quad x_3 = \hat x_3,$
which maps $K_H$ onto itself, and let $W_1$ be the mapping
(2.21) $\xi_1 = \hat\xi_2, \quad \xi_2 = 1 - \hat\xi_1 - \hat\xi_2,$
which maps T onto itself. Then $\hat f(\hat\xi_1, \hat\xi_2) = f(\xi_1, \xi_2) \circ W_1 = f(\hat\xi_2, 1 - \hat\xi_1 - \hat\xi_2)$ and $R_{12} \hat f(\hat x_1, \hat x_2, H) = R_{13} f(x_1, x_2, x_3) \circ Q_1 |_{x_3 = H}$. Therefore,
$\|R_{13} f(x_1, x_2, H)\|_{L^2(K_H)} \le \|R_{12} \hat f(\hat x_1, \hat x_2, H)\|_{L^2(K_H)} \le C \|(\hat\xi_1 \hat\xi_2)^{1/2} \hat f\|_{L^2(T)} \le C \|\xi_1^{1/2} (1 - \xi_1 - \xi_2)^{1/2} f\|_{L^2(T)}.$
For $R_{23} f$, we introduce the mappings $Q_2$ and $W_2$:
(2.22) $Q_2: \quad x_1 = 1 - \hat x_1 - \hat x_2 - \hat x_3, \quad x_2 = \hat x_1, \quad x_3 = \hat x_3,$
which maps $K_H$ onto itself, and
(2.23) $W_2: \quad \xi_1 = 1 - \hat\xi_1 - \hat\xi_2, \quad \xi_2 = \hat\xi_1,$
which maps T onto itself. Similarly, there holds
$\|R_{23} f(x_1, x_2, H)\|_{L^2(K_H)} \le \|R_{12} \hat f(\hat x_1, \hat x_2, H)\|_{L^2(K_H)} \le C \|(\hat\xi_1 \hat\xi_2)^{1/2} \hat f\|_{L^2(T)} \le C \|\xi_2^{1/2} (1 - \xi_1 - \xi_2)^{1/2} f\|_{L^2(T)}.$
Lemma 2.7. Let $R_{12} f(x_1, x_2, H)$ and $R_{i3} f(x_1, x_2, H)$ be the operators given in (2.4) and (2.5), with $x_3 = H$. Then for i = 1, 2,
(2.24) $\Big\| \frac{\partial R_{12} f(x_1, x_2, H)}{\partial x_i} \Big\|_{L^2(K_H)} \le C \|x_i^{-1/2} f\|_{L^2(T)},$
and for t = 1, 2,
(2.25) $\Big\| \frac{\partial R_{i3} f(x_1, x_2, H)}{\partial x_t} \Big\|_{L^2(K_H)} \le C \Big( \|x_t^{-1/2} f\|_{L^2(T)} + \|(1 - x_1 - x_2)^{-1/2} f\|_{L^2(T)} \Big),$
where C is a constant independent of f.

Proof. Note that
$\frac{\partial R_{12} f(x_1, x_2, H)}{\partial x_1} = \frac{2 x_2}{H^2} \int_{x_1}^{x_1+H} d\xi_1 \int_{x_2}^{x_1+x_2+H-\xi_1} \frac{f(\xi_1, \xi_2)}{\xi_1 \xi_2} \, d\xi_2 - \frac{2 x_2}{H^2} \int_{x_2}^{x_2+H} \frac{f(x_1, \xi_2)}{\xi_2} \, d\xi_2 + \frac{2 x_1 x_2}{H^2} \int_{x_1}^{x_1+H} \frac{f(\xi_1, x_1 + x_2 + H - \xi_1)}{\xi_1 (x_1 + x_2 + H - \xi_1)} \, d\xi_1$
and
(2.26) $\Big| \frac{\partial R_{12} f(x_1, x_2, H)}{\partial x_1} \Big| \le I_1 + I_2 + I_3,$
where
$I_1 = \frac{2}{H^2} \int_{x_1}^{x_1+H} d\xi_1 \int_{x_2}^{x_2+H} \frac{|f(\xi_1, \xi_2)|}{\xi_1} \, d\xi_2, \quad I_2 = \frac{2}{H^2} \int_{x_2}^{x_2+H} |f(x_1, \xi_2)| \, d\xi_2, \quad I_3 = \frac{2}{H^2} \int_{x_1}^{x_1+H} |f(\xi_1, x_1 + x_2 + H - \xi_1)| \, d\xi_1.$
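Note that
$\|I_1\|^2_{L^2(K_H)} = \frac{4}{H^2} \int_0^H dx_3 \int_0^{1-x_3} dx_2 \int_0^{1-x_2-x_3} \Big( \frac{1}{H} \int_{x_1}^{x_1+H} g_1(\xi_1) \, d\xi_1 \Big)^2 dx_1,$
with $g_1(\xi_1) = \int_{x_2}^{x_2+H} \frac{|\tilde f(\xi_1, \xi_2)|}{\xi_1} \, d\xi_2$. Applying Lemma 2.5 to $g_1(\xi_1)$ with $a = 1 - x_2 - x_3$, $h = H$, $x = x_1$, $\xi = \xi_1$, we have
$\int_0^{1-x_2-x_3} \Big( \frac{1}{H} \int_{x_1}^{x_1+H} g_1(\xi_1) \, d\xi_1 \Big)^2 dx_1 \le \frac{1}{H} \int_0^{1-x_2-x_3+H} x_1 \Big( \int_{x_2}^{x_2+H} \frac{|\tilde f(x_1, \xi_2)|}{x_1} \, d\xi_2 \Big)^2 dx_1,$
which implies
$\int_0^{1-x_3} dx_2 \int_0^{1-x_2-x_3} \Big( \frac{1}{H} \int_{x_1}^{x_1+H} g_1(\xi_1) \, d\xi_1 \Big)^2 dx_1 \le \frac{1}{H} \int_0^{1-x_3} dx_2 \int_0^{1-x_2-x_3+H} \frac{1}{x_1} \Big( \int_{x_2}^{x_2+H} |\tilde f(x_1, \xi_2)| \, d\xi_2 \Big)^2 dx_1$
$\le H \Big\{ \int_0^{H} \frac{1}{x_1} \, dx_1 \int_0^{1-x_3} \Big( \frac{1}{H} \int_{x_2}^{x_2+H} |\tilde f(x_1, \xi_2)| \, d\xi_2 \Big)^2 dx_2 + \int_H^{1-x_3+H} \frac{1}{x_1} \, dx_1 \int_0^{1-x_1-x_3+H} \Big( \frac{1}{H} \int_{x_2}^{x_2+H} |\tilde f(x_1, \xi_2)| \, d\xi_2 \Big)^2 dx_2 \Big\}.$
Applying Lemma 2.5 again to the function $g_2(\xi_2) = \tilde f(x_1, \xi_2)$, we have
(2.27) $\|I_1\|^2_{L^2(K_H)} \le \frac{4}{H^2} \int_0^H dx_3 \int_0^{H} \frac{1}{x_1} \, dx_1 \int_0^{1-x_3+H} |\tilde f(x_1, x_2)|^2 \, dx_2 + \frac{4}{H^2} \int_0^H dx_3 \int_H^{1-x_3+H} \frac{1}{x_1} \, dx_1 \int_0^{1-x_1-x_3+2H} |\tilde f(x_1, x_2)|^2 \, dx_2$
$\le \frac{4}{H} \Big( \int_0^{H} dx_1 \int_0^{1+H} + \int_H^{1+H} dx_1 \int_0^{1-x_1+2H} \Big) \frac{|\tilde f(x_1, x_2)|^2}{x_1} \, dx_2 \le \frac{8}{H} \|x_1^{-1/2} f\|^2_{L^2(T)}.$
Similarly, we have by Lemma 2.5,
(2.28) $\|I_2\|^2_{L^2(K_H)} = \frac{4}{H^2} \int_0^H dx_3 \int_0^{1-x_3} dx_1 \int_0^{1-x_1-x_3} \Big( \frac{1}{H} \int_{x_2}^{x_2+H} |f(x_1, \xi_2)| \, d\xi_2 \Big)^2 dx_2$
$\le \frac{4}{H^3} \int_0^H dx_3 \int_0^{1-x_3} dx_1 \int_0^{1-x_1-x_3+H} x_2 |f(x_1, x_2)|^2 \, dx_2 \le \frac{4}{H^3} \int_0^H dx_3 \int_0^{1} dx_1 \int_0^{1-x_1+H} x_2 |\tilde f(x_1, x_2)|^2 \, dx_2 = \frac{4}{H^2} \|x_2^{1/2} f\|^2_{L^2(T)}$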
and
$\|I_3\|^2_{L^2(K_H)} = \frac{4}{H^2} \int_0^H dx_3 \int_0^{1-x_3} dx_2 \int_0^{1-x_2-x_3} \Big( \frac{1}{H} \int_{x_1}^{x_1+H} |\tilde f(\xi_1, x_1 + x_2 + H - \xi_1)| \, d\xi_1 \Big)^2 dx_1$
$\le \frac{4}{H^3} \int_0^H dx_3 \int_0^{1-x_3} dx_2 \int_0^{1-x_2-x_3+H} x_1 |\tilde f(x_1, x_2 + H)|^2 \, dx_1 \le \frac{4}{H^3} \int_0^H dx_3 \int_0^{1} dx_2 \int_0^{1-x_2+H} x_1 |\tilde f(x_1, x_2 + H)|^2 \, dx_1$
$= \frac{4}{H^2} \int_0^{1} dx_2 \int_0^{1-x_2+H} x_1 |\tilde f(x_1, x_2 + H)|^2 \, dx_1.$
Letting $z = x_2 + H$, we have
$\frac{4}{H^2} \int_0^{1} dx_2 \int_0^{1-x_2+H} x_1 |\tilde f(x_1, x_2 + H)|^2 \, dx_1 = \frac{4}{H^2} \int_H^{1+H} dz \int_0^{1-z+2H} x_1 |\tilde f(x_1, z)|^2 \, dx_1 = \frac{4}{H^2} \int_H^{1} dz \int_0^{1-z} x_1 |\tilde f(x_1, z)|^2 \, dx_1 \le \frac{4}{H^2} \|x_1^{1/2} f\|^2_{L^2(T)},$
which implies
(2.29) $\|I_3\|^2_{L^2(K_H)} \le \frac{4}{H^2} \|x_1^{1/2} f\|^2_{L^2(T)}.$
Combining (2.26)–(2.29), we have
$\Big\| \frac{\partial R_{12} f(x_1, x_2, H)}{\partial x_1} \Big\|_{L^2(K_H)} \le C \|x_1^{-1/2} f\|_{L^2(T)}.$
Similarly, we can prove
$\Big\| \frac{\partial R_{12} f(x_1, x_2, H)}{\partial x_2} \Big\|_{L^2(K_H)} \le C \|x_2^{-1/2} f\|_{L^2(T)}.$
Let $Q_i$ and $W_i$ (i = 1, 2) be the mappings defined in (2.20)–(2.23). Then, for t = 1, 2,
$\Big\| \frac{\partial R_{13} f(x_1, x_2, H)}{\partial x_t} \Big\|_{L^2(K_H)} \le \sum_{i=1,2} \Big\| \frac{\partial R_{12} \hat f(\hat x_1, \hat x_2, H)}{\partial \hat x_i} \Big\|_{L^2(K_H)} \le C \sum_{i=1,2} \|\hat\xi_i^{-1/2} \hat f\|_{L^2(T)} \le C \Big( \|\xi_1^{-1/2} f\|_{L^2(T)} + \|(1 - \xi_1 - \xi_2)^{-1/2} f\|_{L^2(T)} \Big).$
Similarly, we have for t = 1, 2,
$\Big\| \frac{\partial R_{23} f(x_1, x_2, H)}{\partial x_t} \Big\|_{L^2(K_H)} \le \sum_{i=1,2} \Big\| \frac{\partial R_{12} \hat f(\hat x_1, \hat x_2, H)}{\partial \hat x_i} \Big\|_{L^2(K_H)} \le C \sum_{i=1,2} \|\hat\xi_i^{-1/2} \hat f\|_{L^2(T)} \le C \Big( \|\xi_2^{-1/2} f\|_{L^2(T)} + \|(1 - \xi_1 - \xi_2)^{-1/2} f\|_{L^2(T)} \Big).$
Proof of Theorem 2.3. Obviously, $R_H f(x) \in P_p^1(K_H) \oplus P_p^{1,0}(T) \times P_1(I_H)$ for $f \in P_p^{1,0}(T)$. Due to (2.10), we have
(2.30) $\|R_H f(x_1, x_2, x_3)\|_{H^1(K_H)} \le \|R_K f(x_1, x_2, x_3)\|_{H^1(K_H)} + \Big\| \frac{x_3}{H} R_K f(x_1, x_2, H) \Big\|_{H^1(K_H)}.$
By Theorem 2.1, there holds
(2.31) $\|R_K f(x_1, x_2, x_3)\|_{H^1(K_H)} \le \|R_K f(x_1, x_2, x_3)\|_{H^1(K)} \le C \|f(x_1, x_2)\|_{H^{1/2}_{00}(T)},$
and by (2.3) and Lemmas 2.6 and 2.7, it holds that
$\Big\| \frac{x_3}{H} R_K f(x_1, x_2, H) \Big\|_{H^1(K_H)} \le C \Big( \|R_{12} f(x_1, x_2, H)\|_{H^1(K_H)} + \sum_{i=1,2} \|R_{i3} f(x_1, x_2, H)\|_{H^1(K_H)} \Big)$
$\le C \Big( \|f\|_{H^{1/2}(T)} + \sum_{i=1,2} \|x_i^{-1/2} f\|_{L^2(T)} + \|(1 - x_1 - x_2)^{-1/2} f\|_{L^2(T)} \Big) \le C \|f\|_{H^{1/2}_{00}(T)},$
which together with (2.30)–(2.31) leads to (2.11) immediately.

2.3. Polynomial extension on prisms from a square face. We shall construct a polynomial extension on prisms from a square face $S = \{x = (x_1,x_2,x_3) \mid 0 \le x_1, x_3 \le 1\}$, which is as important as the extension from a triangular face for error analysis and preconditioning of high-order FEM in three dimensions [15, 18].

Lemma 2.8. Let $T = \{(x_1,x_2) \mid 0 < x_2 < 1 - x_1,\ 0 \le x_1 < 1\}$ be the standard triangle and $I = (0,1)$. Then there is a polynomial extension operator $R_T^* : H_0^1(I) \to H^1(T)$ such that $R_T^* f \in P_p^1(T)$ if $f(x_1) \in P_p^0(I)$, and
\[
R_T^* f|_I = f(x_1), \qquad R_T^* f|_{\partial T \setminus I} = 0, \tag{2.32}
\]
\[
\| R_T^* f \|_{H^t(T)} \le C \left( p^{t-3/2} \| f \|_{H^1(I)} + p^{t-1/2} \| f \|_{L^2(I)} \right), \qquad t = 0, 1, \tag{2.33}
\]
with $C$ independent of $f$ and $p$.

Proof. Let $\psi(x_2) = (1 - x_2)^p$. Then for $t \ge 0$,
\[
\| \psi \|_{H^t(I)} \le C p^{t - 1/2} . \tag{2.34}
\]
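As a quick sanity check of (2.34) for $t = 0, 1$, the norms of $\psi(x_2) = (1-x_2)^p$ on $(0,1)$ can be evaluated in closed form. The sketch below is only illustrative and is not part of the original argument; the function names are hypothetical, and the closed forms $\int_0^1 (1-x)^{2p}\,dx = 1/(2p+1)$ and $\int_0^1 p^2 (1-x)^{2p-2}\,dx = p^2/(2p-1)$ are elementary.

```python
from fractions import Fraction
from math import sqrt


def psi_norms_squared(p):
    """Exact squared L2 norm and H1 seminorm of psi(x) = (1 - x)**p on (0, 1), p >= 1."""
    l2_sq = Fraction(1, 2 * p + 1)        # integral of (1 - x)^(2p)
    h1_sq = Fraction(p * p, 2 * p - 1)    # integral of p^2 (1 - x)^(2p - 2)
    return l2_sq, h1_sq


def scaled_norms(p):
    """Return (||psi||_{L2} * p^{1/2}, ||psi||_{H1} * p^{-1/2}); (2.34) says both stay bounded."""
    l2_sq, h1_sq = psi_norms_squared(p)
    return sqrt(l2_sq) * sqrt(p), sqrt(l2_sq + h1_sq) / sqrt(p)


if __name__ == "__main__":
    for p in (1, 2, 5, 50, 500):
        print(p, scaled_norms(p))
```

Both scaled quantities approach $1/\sqrt{2}$ as $p$ grows, which is consistent with the rates $p^{-1/2}$ and $p^{1/2}$ in (2.34).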
We introduce a function $\Psi \in P_{2p+1}^1(T)$ by
\[
\Psi(x_1,x_2) = \psi(x_2) \bigl( (1 - x_1 - x_2) f(x_1) + x_1 f(x_1 + x_2) \bigr).
\]
Then $\Psi(x_1,0) = f(x_1)$, $\Psi(1,x_2) = \Psi(x_1, 1-x_1) = 0$, and
\[
\| \Psi \|_{L^2(T)} \le C p^{-1/2} \| f \|_{L^2(I)} , \tag{2.35}
\]
\[
\| \Psi \|_{H^1(T)} \le C \left( p^{-1/2} \| f \|_{H^1(I)} + p^{1/2} \| f \|_{L^2(I)} \right). \tag{2.36}
\]
By the lifting theorem on the triangle $T$ [17], there exists a lifting operator $R_T : H^{1/2}_{00}(I) \to H^1(T)$,
\[
R_T f = \frac{x_1 (1 - x_1 - x_2)}{x_2} \int_{x_1}^{x_1 + x_2} \frac{f(\xi)}{\xi (1 - \xi)} \, d\xi ,
\]
such that $R_T f \in P_p^1(T)$, $R_T f|_I = f$, $R_T f|_{\partial T \setminus I} = 0$, and
\[
\| R_T f \|_{H^1(T)} \le C \| f \|_{H^{1/2}_{00}(I)} ,
\]
which implies that $R_T$ satisfies (2.33) with $t = 1$. Unfortunately, the extension does not give precise information on $\| R_T f \|_{L^2(T)}$, and the desired estimate (2.33) with $t = 0$ may not be true for $R_T$. Therefore, we have to construct a new extension operator $R_T^*$. Note that $\Psi - R_T f = 0$ on $\partial T$. By $\Pi_T$ we denote the orthogonal projection operator $H_0^1(T) \to P_p^{1,0}(T)$, and let $w_p = R_T f + \Pi_T (\Psi - R_T f)$. Then $w_p(x_1,0) = f(x_1)$, $w_p(1,x_2) = w_p(x_1, 1-x_1) = 0$, and
\[
\Psi - w_p = (I - \Pi_T)(\Psi - R_T f). \tag{2.37}
\]
Due to the continuity of the operator $R_T$ and a trace theorem, we obtain
\[
\begin{aligned}
\| w_p \|_{H^1(T)} &\le \| \Psi \|_{H^1(T)} + \| \Psi - w_p \|_{H^1(T)} \le \| \Psi \|_{H^1(T)} + \| \Psi - R_T f \|_{H^1(T)} \\
&\le 2 \| \Psi \|_{H^1(T)} + \| R_T f \|_{H^1(T)} \le C \left( \| \Psi \|_{H^1(T)} + \| f \|_{H^{1/2}_{00}(I)} \right) \\
&\le C \left( \| \Psi \|_{H^1(T)} + \| \Psi \|_{H^{1/2}(\partial T)} \right) \le C \| \Psi \|_{H^1(T)} .
\end{aligned} \tag{2.38}
\]
Let $R_T^* f = w_p$. Then (2.36) and (2.38) lead to (2.32) and (2.33) with $t = 1$. Note that $\Pi_T (\Psi - R_T f)$ is the finite element solution in $P_p^{1,0}(T)$ for the boundary value problem
\[
-\Delta u + u = \tilde f \ \text{ in } T, \qquad u|_{\partial T} = 0,
\]
with $\tilde f = -\Delta(\Psi - R_T f) + \Psi - R_T f$. By Nitsche's trick, we have
\[
\| (I - \Pi_T)(\Psi - R_T f) \|_{L^2(T)} \le C p^{-1} \| (I - \Pi_T)(\Psi - R_T f) \|_{H^1(T)} \le C p^{-1} \| \Psi \|_{H^1(T)} ,
\]
which implies
\[
\| \Psi - w_p \|_{L^2(T)} = \| (I - \Pi_T)(\Psi - R_T f) \|_{L^2(T)} \le C p^{-1} \| \Psi \|_{H^1(T)} . \tag{2.39}
\]
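For the reader's convenience, the $L^2$ bound for $w_p$ can be spelled out from the triangle inequality together with (2.35), (2.36), and (2.39). The chain below is a sketch of that bookkeeping only; it introduces no new assumptions beyond those estimates.

```latex
\begin{aligned}
\|w_p\|_{L^2(T)}
 &\le \|\Psi\|_{L^2(T)} + \|\Psi - w_p\|_{L^2(T)} \\
 &\le C p^{-1/2}\,\|f\|_{L^2(I)} + C p^{-1}\,\|\Psi\|_{H^1(T)} \\
 &\le C p^{-1/2}\,\|f\|_{L^2(I)}
    + C p^{-1}\bigl(p^{-1/2}\,\|f\|_{H^1(I)} + p^{1/2}\,\|f\|_{L^2(I)}\bigr) \\
 &\le C\bigl(p^{-3/2}\,\|f\|_{H^1(I)} + p^{-1/2}\,\|f\|_{L^2(I)}\bigr).
\end{aligned}
```

The last line is exactly the right-hand side of (2.33) with $t = 0$.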
Combining (2.39) and (2.36) we have (2.33) for $t = 0$.

We construct a polynomial extension from a square face to the prism $G$ with the help of the extension $R_T^*$ on the triangle $T$:
\[
R_G^S f(x_1,x_2,x_3) = R_T^* f(\cdot, x_3). \tag{2.40}
\]

Theorem 2.9. Let $\Gamma_2 = S$ be a square face of the prism $G$ as shown in Figure 2.2, and let $R_G^S$ be the extension operator defined as in (2.40). Then $R_G^S f \in P_p^1(T) \times P_p(I)$ for $f \in P_p^{2,0}(\Gamma_2)$, and
\[
R_G^S f = f \ \text{ on } \Gamma_2, \qquad R_G^S f = 0 \ \text{ on } \partial G \setminus \Gamma_2, \tag{2.41}
\]
\[
\| R_G^S f \|_{H^1(G)} \le C \left( p^{-1/2} |f|_{H^1(\Gamma_2)} + p^{1/2} |f|_{L^2(\Gamma_2)} + p^{-3/2} \| f_{x_3} \|_{H^1(\Gamma_2)} \right), \tag{2.42}
\]
\[
\| R_G^S f \|_{L^2(G)} \le C \left( p^{-3/2} \| f \|_{H^1(\Gamma_2)} + p^{-1/2} \| f \|_{L^2(\Gamma_2)} \right). \tag{2.43}
\]
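Proof. Obviously, $R_G^S f \in P_p^1(T) \times P_p(I)$ and (2.41) holds. Due to (2.40),
\[
\begin{aligned}
\| R_G^S f \|_{L^2(G)}^2 &= \int_0^1 \!\! \int_T |R_G^S f|^2 \, dx_1 \, dx_2 \, dx_3 \le \int_0^1 \| R_T^* f \|_{L^2(T)}^2 \, dx_3 \\
&\le C \int_0^1 \left( p^{-3} \| f(\cdot,x_3) \|_{H^1(I)}^2 + p^{-1} \| f(\cdot,x_3) \|_{L^2(I)}^2 \right) dx_3
\le C \left( p^{-3} \| f \|_{H^1(S)}^2 + p^{-1} \| f \|_{L^2(S)}^2 \right),
\end{aligned}
\]
which leads to (2.43). Applying (2.40) to $f(x_1,x_3)$ and $f_{x_3}(x_1,x_3)$, respectively, we have
\[
\| R_G^S f \|_{H^1(G)}^2 \le \int_0^1 \left( |R_T^* f|_{H^1(T)}^2 + |R_T^* f_{x_3}|_{L^2(T)}^2 \right) dx_3
\le C \int_0^1 \left( p^{-1} \| f \|_{H^1(I)}^2 + p \, \| f \|_{L^2(I)}^2 + p^{-3} \| f_{x_3} \|_{L^2(I)}^2 \right) dx_3 ,
\]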
which implies (2.42).

Remark 2.1. It is an open problem whether there exists a polynomial extension operator $R_G^S$ such that
\[
\| R_G^S f \|_{H^1(G)} \le C \| f \|_{H^{1/2}_{00}(S)} . \tag{2.44}
\]
Although (2.42) is not as strong as the desired stability (2.44), it gives the dependence of $\| R_G^S f \|_{H^1(G)}$ on $\| f \|_{H^t(S)}$, $t = 1, 0$, and $\| f_{x_3} \|_{H^1(S)}$, furnished precisely with the weights $p^{-1/2}$, $p^{1/2}$, and $p^{-3/2}$, respectively. This estimate is sufficient when we apply the extension to a pair of elements sharing a common square face in order to construct a continuous piecewise polynomial in $P_p^{1.5}(G)$ without degrading the best order of approximation error. Hence, the extension $R_G^S$ defined as in (2.40) is weakly stable, and Theorem 2.9 plays an important role in error analysis for the p and h-p versions of the FEM in three dimensions on meshes containing triangular prism elements. For the details of the application of this extension to the construction of a continuous piecewise polynomial, we refer to [15, 18].

3. Polynomial extension on a cube. Let $D$ be a cube and $\Gamma_i$, $i = 1, 2, \ldots, 6$, be the faces of $D$ shown in Figure 3.1, and let $\gamma_{ij} = \Gamma_i \cap \Gamma_j$, $i, j = 1, 2, \ldots, 6$. As usual, $I = [-1, 1]$ and $S = [-1, 1]^2$.

Fig. 3.1. A cube D.
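3.1. Polynomial extension from a face. Let $J_j^{\alpha,\beta}(x)$ be the Jacobi polynomial of degree $j$:
\[
J_j^{\alpha,\beta}(x) = \frac{(-1)^j (1-x)^{-\alpha} (1+x)^{-\beta}}{2^j \, j!} \, \frac{d^j}{dx^j} \left[ (1-x)^{j+\alpha} (1+x)^{j+\beta} \right], \qquad j \ge 0, \tag{3.1}
\]
with weights $\alpha, \beta > -1$, and let
\[
\varphi_i(x) = \frac{1 - x^2}{\sqrt{\gamma_{i-1}^{2,2}}} \, J_{i-1}^{2,2}(x), \qquad i = 1, 2, 3, \ldots, \tag{3.2}
\]
where $\gamma_{i-1}^{2,2} = \dfrac{2^5 \, i (i+1)}{(2i+3)(i+2)(i+3)}$.

Proposition 3.1. $\varphi_i(x)$, $i = 1, 2, \ldots, p-1$, form an orthonormal basis of $P_p^0(I)$ in $L^2(I)$:
\[
\langle \varphi_i(x), \varphi_j(x) \rangle_{L^2(I)} = \delta_{ij}, \qquad 1 \le i, j \le p - 1. \tag{3.3}
\]
Proof. Due to the orthogonality of the Jacobi polynomials and the normalization by $\gamma_{i-1}^{2,2}$,
\[
\langle \varphi_i(x), \varphi_j(x) \rangle_{L^2(I)} = \int_I \frac{(1 - x^2)^2}{\sqrt{\gamma_{i-1}^{2,2} \, \gamma_{j-1}^{2,2}}} \, J_{i-1}^{2,2}(x) \, J_{j-1}^{2,2}(x) \, dx = \delta_{ij} .
\]

We introduce
\[
\varphi_n(x_1,x_2) = \varphi_i(x_1) \varphi_j(x_2) = \frac{(1 - x_1^2)(1 - x_2^2)}{\sqrt{\gamma_{i-1}^{2,2} \, \gamma_{j-1}^{2,2}}} \, J_{i-1}^{2,2}(x_1) \, J_{j-1}^{2,2}(x_2), \qquad 1 \le i, j \le p - 1, \tag{3.4}
\]
with $n = (p-1)(i-1) + j$.

Proposition 3.2. $\{\varphi_n(x_1,x_2),\ n = 1, 2, \ldots, (p-1)^2\}$ forms an orthonormal basis of $P_p^{2,0}(S)$ in $L^2(S)$, i.e.,
\[
\langle \varphi_n, \varphi_m \rangle_{L^2(S)} = \delta_{nm}, \qquad 1 \le n, m \le N_p = (p-1)^2 . \tag{3.5}
\]
Proof. Let $n = (p-1)(i-1) + j$ and $m = (p-1)(i'-1) + j'$. Then
\[
\langle \varphi_n, \varphi_m \rangle_{L^2(S)}
= \int_I \frac{(1 - x_1^2)^2}{\sqrt{\gamma_{i-1}^{2,2} \, \gamma_{i'-1}^{2,2}}} \, J_{i-1}^{2,2}(x_1) J_{i'-1}^{2,2}(x_1) \, dx_1
\int_I \frac{(1 - x_2^2)^2}{\sqrt{\gamma_{j-1}^{2,2} \, \gamma_{j'-1}^{2,2}}} \, J_{j-1}^{2,2}(x_2) J_{j'-1}^{2,2}(x_2) \, dx_2
= \delta_{i,i'} \, \delta_{j,j'} = \delta_{nm} .
\]

We consider an eigenvalue problem
\[
-\Delta u = \lambda u \ \text{ in } S = (-1,1)^2, \qquad u|_{\Gamma} = 0, \tag{3.6}
\]
and its spectral solution $(\lambda_p, \psi_p)$, with $\psi_p \in P_p^{2,0}(S)$, which satisfies
\[
\int_S \nabla \psi_p \cdot \nabla q \, dx_1 \, dx_2 = \lambda_p \int_S \psi_p \, q \, dx_1 \, dx_2 \qquad \forall q \in P_p^{2,0}(S). \tag{3.7}
\]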
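The normalization constant in (3.2) can be checked independently of the text. The sketch below builds $J_j^{2,2}$ from the Rodrigues formula (3.1) with exact rational arithmetic and verifies that $\int_I (1-x^2)^2 J_{i-1}^{2,2} J_{j-1}^{2,2}\,dx$ equals $\gamma_{i-1}^{2,2}$ for $i = j$ and $0$ otherwise, which is equivalent to (3.3). All helper names are hypothetical, and polynomials are stored as coefficient lists with the lowest degree first.

```python
from fractions import Fraction
from math import factorial

ONE_MINUS_X2 = [Fraction(1), Fraction(0), Fraction(-1)]  # 1 - x^2, lowest degree first


def pmul(a, b):
    """Product of two coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ppow(a, n):
    """n-th power of a polynomial."""
    out = [Fraction(1)]
    for _ in range(n):
        out = pmul(out, a)
    return out


def pderiv(a, times):
    """Derivative of order `times`."""
    for _ in range(times):
        a = [k * a[k] for k in range(1, len(a))]
    return a


def pdiv_exact(a, b):
    """Quotient a / b by long division; asserts that the remainder vanishes."""
    rem = list(a)
    quot = [Fraction(0)] * (len(a) - len(b) + 1)
    for k in range(len(quot) - 1, -1, -1):
        quot[k] = rem[k + len(b) - 1] / b[-1]
        for j, bj in enumerate(b):
            rem[k + j] -= quot[k] * bj
    assert all(c == 0 for c in rem), "division was not exact"
    return quot


def jacobi22(j):
    """J_j^{2,2} from the Rodrigues formula (3.1) with alpha = beta = 2."""
    numerator = pderiv(ppow(ONE_MINUS_X2, j + 2), j)
    quotient = pdiv_exact(numerator, ppow(ONE_MINUS_X2, 2))
    scale = Fraction((-1) ** j, 2 ** j * factorial(j))
    return [scale * c for c in quotient]


def gamma22(i):
    """The constant gamma_{i-1}^{2,2} stated after (3.2)."""
    return Fraction(32 * i * (i + 1), (2 * i + 3) * (i + 2) * (i + 3))


def integrate(a):
    """Integral over [-1, 1]; odd powers integrate to zero."""
    return sum(2 * a[k] / (k + 1) for k in range(0, len(a), 2))


def weighted_product(i, j):
    """Integral of (1 - x^2)^2 J_{i-1}^{2,2} J_{j-1}^{2,2} over [-1, 1]."""
    weight = ppow(ONE_MINUS_X2, 2)
    return integrate(pmul(weight, pmul(jacobi22(i - 1), jacobi22(j - 1))))
```

For example, `weighted_product(1, 1)` returns `Fraction(16, 15)`, which agrees with `gamma22(1)`, and `weighted_product(1, 3)` returns zero.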
Selecting the basis $\{\varphi_n(x_1,x_2),\ n = 1, 2, \ldots, N_p\}$ as in (3.4), with $N_p = (p-1)^2$, and letting $\psi_p(x_1,x_2) = \sum_{i=1}^{N_p} c_i \varphi_i(x_1,x_2)$, we have the corresponding system of linear algebraic equations:
\[
K \vec{C} = \lambda M \vec{C} = \lambda \vec{C},
\]
where $\vec{C} = (c_1, c_2, \ldots, c_{N_p})^T$ and $K = (k_{ij})_{i,j=1}^{N_p}$, with $k_{ij} = \int_S \nabla \varphi_i \cdot \nabla \varphi_j \, dx_1 \, dx_2$. Here, we used the orthonormality of $\varphi_n(x_1,x_2)$ in $L^2(S)$, which implies that the matrix $M = I$. Therefore, the spectral solution of the eigenvalue problem (3.7) is equivalent to the eigenvalue problem of the matrix $K$. Since $K$ is symmetric and positive definite, the eigenvalues satisfy $\lambda_{p,k} > 0$, $k = 1, 2, \ldots, N_p$, and the corresponding eigenvectors $\vec{C}^{(k)}$ are orthonormal, i.e.,
\[
\left\langle \vec{C}^{(k)}, \vec{C}^{(l)} \right\rangle = \sum_{i=1}^{N_p} c_i^{(k)} c_i^{(l)} = \delta_{k,l}, \qquad 1 \le k, l \le N_p .
\]
The corresponding eigen-polynomial is $\psi_{p,k} = \sum_{n=1}^{N_p} c_n^{(k)} \varphi_n(x_1,x_2)$. Then, due to the properties of the eigenvalues and eigenvectors of $K$, we have the following theorem.

Theorem 3.3. The problem (3.7) has $N_p$ real eigenvalues, and the corresponding eigen-polynomials $\{\psi_{p,k}(x_1,x_2),\ 1 \le k \le N_p\}$ are orthogonal in $L^2(S)$ and $H^1(S)$, which form an $L^2$-orthonormal basis of $P_p^{2,0}(S)$.

Proof. The problem (3.7) has $N_p$ real eigenvalues because the corresponding stiffness matrix $K$ is positive definite, and there hold, for $1 \le k, k' \le N_p$,
\[
\langle \psi_{p,k}, \psi_{p,k'} \rangle_{L^2(S)} = \sum_{j=1}^{N_p} \sum_{i=1}^{N_p} c_i^{(k)} c_j^{(k')} \langle \varphi_i, \varphi_j \rangle_{L^2(S)} = \left\langle \vec{C}^{(k)}, \vec{C}^{(k')} \right\rangle = \delta_{k,k'}
\]
and
\[
\int_S \nabla \psi_{p,k} \cdot \nabla \psi_{p,k'} \, dx_1 \, dx_2 = \lambda_k \int_S \psi_{p,k} \, \psi_{p,k'} \, dx_1 \, dx_2 = \lambda_k \delta_{k,k'} .
\]
Therefore, $\{\psi_{p,k},\ k = 1, 2, \ldots, N_p\}$ is orthogonal in $L^2(S)$ and $H^1(S)$ and forms an orthonormal basis in $L^2(S)$.

We next consider a two-point boundary value problem
\[
-v_{p,k}''(x_3) + \lambda_{p,k} v_{p,k}(x_3) = 0, \quad x_3 \in I = (-1,1), \qquad v_{p,k}(-1) = 1, \quad v_{p,k}(1) = 0, \tag{3.8}
\]
and its spectral solution $\phi_{p,k} \in P_p(I)$ such that $\phi_{p,k}(-1) = 1$, $\phi_{p,k}(1) = 0$, and
\[
\int_I \left( \phi_{p,k}' \, q' + \lambda_{p,k} \, \phi_{p,k} \, q \right) dx_3 = 0 \qquad \text{for all } q \in P_p^0(I), \tag{3.9}
\]
which is equivalent to finding $\phi_{p,k} = \tilde\phi_{p,k} + \frac{1 - x_3}{2}$, with $\tilde\phi_{p,k} \in P_p^0(I)$ satisfying
\[
\int_I \left( \tilde\phi_{p,k}'(x_3) \, q'(x_3) + \lambda_{p,k} \, \tilde\phi_{p,k}(x_3) \, q(x_3) \right) dx_3 = \frac{1}{2} \int_I \left( q'(x_3) - \lambda_{p,k} (1 - x_3) \, q(x_3) \right) dx_3 . \tag{3.10}
\]
Since the corresponding bilinear form is coercive and continuous on $H_0^1(I) \times H_0^1(I)$, the solution $\tilde\phi_{p,k}(x_3)$ uniquely exists in $P_p^0(I)$ for each $\lambda_{p,k}$.

Lemma 3.4 (Inverse inequality).
\[
\int_S |\nabla \psi_{p,k}|^2 \, dx_1 \, dx_2 \le C p^4 \int_S |\psi_{p,k}|^2 \, dx_1 \, dx_2 , \tag{3.11}
\]
where $C$ is a constant independent of $p$ and $k$.

Proof. It is a typical inverse inequality in two dimensions; for the proof, we refer to, e.g., [11].

Lemma 3.5. Let $\lambda_{p,k}$ be an eigenvalue of the problem (3.7), and let $\phi_{p,k}(x_3)$ be the corresponding solution of the two-point boundary value problem (3.8). Then
\[
\int_{-1}^{1} \left( |\phi_{p,k}'|^2 + \lambda_{p,k} |\phi_{p,k}|^2 \right) dx_3 \le C \sqrt{\lambda_{p,k}}, \qquad k = 1, 2, \ldots, N_p . \tag{3.12}
\]
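To see why $\sqrt{\lambda_{p,k}}$ is the natural scale in (3.12), it may help to look at the continuous problem (3.8), whose exact solution is $v(x_3) = \sinh(\sqrt{\lambda}(1 - x_3)) / \sinh(2\sqrt{\lambda})$ with energy $\sqrt{\lambda} \coth(2\sqrt{\lambda})$. The sketch below compares that closed form with a composite Simpson quadrature. It is an illustration of the continuous analogue only, not of the polynomial spectral solution $\phi_{p,k}$ treated in the lemma, and the function names are hypothetical.

```python
from math import cosh, sinh, sqrt, tanh


def continuous_energy_closed_form(lam):
    """Energy of the exact solution of (3.8): sqrt(lam) * coth(2 * sqrt(lam))."""
    s = sqrt(lam)
    return s / tanh(2.0 * s)


def continuous_energy_simpson(lam, n=4000):
    """Composite Simpson approximation of the energy integral over [-1, 1] (n must be even)."""
    s = sqrt(lam)

    def integrand(x):
        v = sinh(s * (1.0 - x)) / sinh(2.0 * s)        # exact solution v(x)
        dv = -s * cosh(s * (1.0 - x)) / sinh(2.0 * s)  # its derivative v'(x)
        return dv * dv + lam * v * v

    h = 2.0 / n
    total = integrand(-1.0) + integrand(1.0)
    for m in range(1, n):
        total += (4.0 if m % 2 else 2.0) * integrand(-1.0 + m * h)
    return total * h / 3.0


if __name__ == "__main__":
    for lam in (5.0, 25.0, 100.0):
        exact = continuous_energy_closed_form(lam)
        print(lam, exact, continuous_energy_simpson(lam), exact / sqrt(lam))
```

For eigenvalues bounded away from zero, the ratio of the energy to $\sqrt{\lambda}$ stays close to one, so Lemma 3.5 states that the polynomial solution reproduces the growth rate of the continuous one.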
Proof. Since $\lambda_{p,k}$ is an eigenvalue of the problem (3.7),
\[
\lambda_{p,k} = \int_S |\nabla \psi_{p,k}|^2 \, dx_1 \, dx_2 .
\]
By Lemma 3.4, there exists a constant $\eta > 0$ independent of $p$ and $k$ such that $0 < \lambda_{p,k} \le \eta p^4$. Then for each $k$, we can always find a unique integer $1 \le M_k \le p$ satisfying
\[
\eta (M_k - 1)^4 \le \lambda_{p,k} \le \eta M_k^4 . \tag{3.13}
\]
For each $k$, correspondingly we introduce the knots and the weights $\xi_i, \omega_i$ ($i = 0, 1, \ldots, M_k$) of the Gauss–Legendre–Lobatto quadrature formula of order $M_k$ on the interval $[-1, 1]$. We assume that the knots are ordered in such a way that $\xi_0 = -1$. Let $\chi_k$ be the Lagrange interpolation polynomial of degree $M_k$ such that
\[
\chi_k(\xi_i) = \begin{cases} 1 & \text{if } i = 0, \\ 0 & \text{otherwise.} \end{cases}
\]
By the equivalence of discrete and continuous $L^2$ norms over $P_{M_k}(-1,1)$ (see [11]), there exists a constant $c_1 > 0$ independent of $M_k$ such that
\[
\int_{-1}^{1} |\chi_k(x_1)|^2 \, dx_1 \le c_1 \sum_{i=0}^{M_k} \chi_k^2(\xi_i) \, \omega_i = c_1 \omega_0 .
\]
Since $\omega_0 = \dfrac{2}{M_k (M_k + 1)}$ (see [13]), we obtain
\[
\int_{-1}^{1} |\chi_k(x_1)|^2 \, dx_1 \le \frac{c_2}{M_k^2} ,
\]
and, by the inverse inequality, we have
\[
\int_{-1}^{1} |\chi_k'(x_1)|^2 \, dx_1 \le c_2 \, \eta \, M_k^2 .
\]
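Setting $q = \phi_{p,k} - \chi_k$ in (3.10) and using the Cauchy–Schwarz inequality, we obtain
\[
\int_{-1}^{1} \left( (\phi_{p,k}')^2 + \lambda_{p,k} (\phi_{p,k})^2 \right) dx_3 \le C M_k^2 .
\]
Lemma 3.5 follows immediately from this inequality and (3.13).

Since $f(x_1,x_2) \in P_p^{2,0}(S)$ and $\{\psi_{p,k}(x_1,x_2),\ 1 \le k \le N_p\}$ is an orthonormal basis of $P_p^{2,0}(S)$,
\[
f(x_1,x_2) = \sum_{k=1}^{N_p} \beta_k \psi_{p,k}(x_1,x_2),
\]
with $\beta_k = \int_S f(x_1,x_2) \, \psi_{p,k}(x_1,x_2) \, dx_1 \, dx_2$. Let
\[
R_D f = \sum_{k=1}^{N_p} \beta_k \, \psi_{p,k}(x_1,x_2) \, \phi_{p,k}(x_3). \tag{3.14}
\]
Obviously,
\[
R_D f|_{\Gamma_1} = \sum_{k=1}^{N_p} \beta_k \, \psi_{p,k}(x_1,x_2) = f(x_1,x_2),
\]
where $\Gamma_1 = \{(x_1,x_2,-1) \mid -1 < x_1, x_2 < 1\}$.

Theorem 3.6. Let $D = (-1,1)^3$ and $\Gamma_1 = \{(x_1,x_2,-1) \mid -1 < x_1, x_2 < 1\}$; then for $f \in P_p^{2,0}(\Gamma_1)$, there exists $R_D f \in P_p^2(D)$ such that $R_D f|_{\Gamma_1} = f$, $R_D f|_{\partial D \setminus \Gamma_1} = 0$, and
\[
\| R_D f \|_{H^1(D)} \le C \| f \|_{H^{1/2}_{00}(\Gamma_1)} , \tag{3.15}
\]
where $C$ is a constant independent of $p$ and $f$.

Proof. Let $\psi_{p,k}$ and $\phi_{p,k}$ be defined as in (3.7) and (3.10), and let $R_D f$ be given in (3.14); then
\[
R_D f|_{\Gamma_1} = f, \qquad R_D f|_{\partial D \setminus \Gamma_1} = 0 .
\]
Due to the orthogonality of the $\psi_{p,k}$ in $L^2(S)$ and $H^1(S)$ and by using (3.7) and Lemma 3.5, we have
\[
\| R_D f \|_{L^2(D)}^2 = \sum_{k=1}^{N_p} \beta_k^2 \int_I |\phi_{p,k}|^2 \, dx_3 \le C \sum_{k=1}^{N_p} \beta_k^2 \, \frac{1}{\sqrt{\lambda_{p,k}}}
\]
and
\[
\begin{aligned}
| R_D f |_{H^1(D)}^2 &= \int_D \left( \left| \frac{\partial R_D f}{\partial x_1} \right|^2 + \left| \frac{\partial R_D f}{\partial x_2} \right|^2 + \left| \frac{\partial R_D f}{\partial x_3} \right|^2 \right) dx_1 \, dx_2 \, dx_3 \\
&= \sum_{k=1}^{N_p} \beta_k^2 \left( \int_S |\psi_{p,k}|^2 \, dx_1 \, dx_2 \int_I |\phi_{p,k}'|^2 \, dx_3 + \int_S |\nabla \psi_{p,k}|^2 \, dx_1 \, dx_2 \int_I |\phi_{p,k}|^2 \, dx_3 \right) \\
&= \sum_{k=1}^{N_p} \beta_k^2 \int_I \left( |\phi_{p,k}'|^2 + \lambda_{p,k} |\phi_{p,k}|^2 \right) dx_3 \le C \sum_{k=1}^{N_p} \beta_k^2 \sqrt{\lambda_{p,k}} .
\end{aligned}
\]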
Therefore,
\[
\| R_D f \|_{H^1(D)}^2 \le C \sum_{k=1}^{N_p} \beta_k^2 \left( 1 + \sqrt{\lambda_{p,k}} \right). \tag{3.16}
\]
Note that
\[
\| f \|_{L^2(\Gamma_1)}^2 = \sum_{k=1}^{N_p} \beta_k^2, \qquad \| f \|_{H_0^1(\Gamma_1)}^2 = \sum_{k=1}^{N_p} \beta_k^2 (1 + \lambda_{p,k}).
\]
By interpolation space theory [8, 10, 19],
\[
\| f \|_{H^{1/2}_{00}(\Gamma_1)}^2 \approx \sum_{k=1}^{N_p} \beta_k^2 (1 + \lambda_{p,k})^{1/2} \approx \sum_{k=1}^{N_p} \beta_k^2 \left( 1 + \sqrt{\lambda_{p,k}} \right),
\]
where the second equivalence follows from $(1 + \lambda)^{1/2} \le 1 + \sqrt{\lambda} \le \sqrt{2} \, (1 + \lambda)^{1/2}$ for $\lambda \ge 0$. Together with (3.16), this implies (3.15).

Analogously, we consider spectral solutions in either $P_p^2(\Gamma_1)$ or ${}^0P_p^2(\Gamma_1) = \{\varphi \in P_p^2(\Gamma_1) \mid \varphi(\pm 1, x_2) = 0\}$ for the corresponding eigenvalue problems. Obviously,
\[
\left\{ \frac{\sqrt{(2i+1)(2j+1)}}{2} \, L_i(x_1) L_j(x_2),\ 0 \le i, j \le p \right\}
\quad \text{and} \quad
\left\{ \frac{\sqrt{2j+1} \, (1 - x_1^2)}{\sqrt{2 \gamma_{i-1}^{2,2}}} \, J_{i-1}^{2,2}(x_1) L_j(x_2),\ 1 \le i \le p-1,\ 0 \le j \le p \right\}
\]
are orthonormal bases of $P_p^2(\Gamma_1)$ and ${}^0P_p^2(\Gamma_1)$, respectively, where $L_i(x_1)$ and $J_{i-1}^{2,2}(x_1)$ denote the Legendre and the Jacobi polynomials. The arguments for Theorem 3.6 can be carried out with $P_p^{2,0}(\Gamma_1)$ replaced by $P_p^2(\Gamma_1)$ or ${}^0P_p^2(\Gamma_1)$. Therefore, we have the following two theorems, which are parallel to Theorem 3.6.

Theorem 3.7. Let $D = [0,1]^3$ and $\Gamma_1 = \{(x_1,x_2,0) \mid 0 < x_1, x_2 < 1\}$; then for $f \in P_p^2(\Gamma_1)$, there exists $U \in P_p^2(D)$ such that $U|_{\Gamma_1} = f$, $U|_{\Gamma_4} = 0$, and
\[
\| U \|_{H^1(D)} \le C \| f \|_{H^{1/2}(\Gamma_1)} , \tag{3.17}
\]
where $C$ is a constant independent of $p$ and $f$.

Theorem 3.8. Let $D = [0,1]^3$ and $\Gamma_1 = \{(x_1,x_2,0) \mid 0 < x_1, x_2 < 1\}$; then for $f \in P_p^2(\Gamma_1)$ with $f|_{\gamma_{12}} = 0$, $f|_{\gamma_{15}} = 0$, there exists $U \in P_p^2(D)$ such that $U|_{\Gamma_1} = f$, $U|_{\Gamma_4} = 0$, $U|_{\Gamma_2} = 0$, $U|_{\Gamma_5} = 0$, and
\[
\| U \|_{H^1(D)} \le C \| f \|_{H^{1/2}_{00}(\Gamma_1, \gamma_{12} \cup \gamma_{15})} , \tag{3.18}
\]
where $C$ is a constant independent of $p$ and $f$, and
\[
\| u \|_{H^{1/2}_{00}(\Gamma_i, \gamma_{il} \cup \gamma_{im})}^2 = \| u \|_{H^{1/2}(\Gamma_i)}^2 + \int_{\Gamma_i} \frac{|u|^2}{\operatorname{dist}(x, \gamma_{il})} \, dS_x + \int_{\Gamma_i} \frac{|u|^2}{\operatorname{dist}(x, \gamma_{im})} \, dS_x . \tag{3.19}
\]

Remark 3.1. Theorem 3.6 can be proved on a cube $(0,1)^3$ by a simple mapping. Hereafter, $D = (0,1)^3$ shall be the standard cube for convenience in the following sections.

Remark 3.2. The polynomial extension without using convolution was first proposed by Canuto and Funaro for the extension in a square [10]. Since the polynomial extension of convolution type is sufficient on triangle and square elements, the generalization of this approach to a cube is much more significant because it is the only polynomial extension compatible with the FEM subspace on a cube.
Fig. 3.2. A cube and a truncated pyramid $\Lambda_H$.
Remark 3.3. In [7] a similar extension was proposed by using spectral solutions of two eigenvalue problems in one dimension and one boundary value problem on an interval, without rigorous proof. Recently, the same approach was developed with a proof in [12]. A genuine generalization of Canuto and Funaro's approach from a square to a cube should be based on the spectral solution of an eigenvalue problem on a square, which is much better than the spectral solutions of two eigenvalue problems on an interval. More significantly, this approach can be used for a prism with nonsquare bases on which the eigenvalue problem cannot be decomposed into two one-dimensional problems, e.g., a prism with a triangular base. The polynomial extension from a triangular base to a prism given in Theorem 2.4 can be proved by using this approach, but we will not elaborate on the details here.

Remark 3.4. As an analogue to the extension on a square via a convolution-type extension on a triangle and a mapping of a square onto a truncated triangle [5, 18], we are able to construct an extension via a convolution-type extension on a tetrahedron and a mapping of a cube onto a truncated tetrahedron. It was shown that there is a convolution-type extension $R_\Lambda$ from a square base $S$ to a pyramid $\Lambda$ such that $R_\Lambda$ realizes a continuous mapping $H^{1/2}_{00}(S) \to H^1(\Lambda)$ and $R_\Lambda f|_S = f$, $R_\Lambda f|_{\partial\Lambda \setminus S} = 0$ [22]. Then a convolution-type extension $\tilde R_D$ on a cube $D$ is defined as
\[
\tilde R_D f = \tilde R_{\Lambda_H} f \circ M, \qquad \tilde R_{\Lambda_H} f(x_1,x_2,x_3) = R_\Lambda f(x_1,x_2,x_3) - \frac{x_3}{H} R_\Lambda f(x_1,x_2,H),
\]
where the mapping $M$:
\[
x_i = \frac{\xi_i + 1}{2} \left( 1 - \frac{H(\xi_3 + 1)}{2} \right), \quad i = 1, 2, \qquad x_3 = \frac{H(\xi_3 + 1)}{2},
\]
maps the cube $D$ onto a truncated pyramid $\Lambda_H$ as shown in Figure 3.2. It is easy to see that $\tilde R_D f \in P_p^1(D)$, $\tilde R_D f|_S = f$, $\tilde R_D f|_{\partial D \setminus S} = 0$ if $f \in P_p^{1,0}(S)$. Note that $\tilde R_D f \notin P_p^2(D)$; instead, $\tilde R_D f \in P_p^{1,0}(S) \times P_p^1(I)$ if $f \in P_p^{2,0}(S)$. Hence, the convolution-type extension $\tilde R_D$ is not compatible with the finite element space on the cube $D$ and is not applicable to the analysis of the p and h-p finite element solutions on meshes containing hexahedral elements.

3.2. Polynomial extension from whole boundary. We shall construct, in three steps, a polynomial extension $E$ which lifts a polynomial on the whole boundary of a cube $D$ and which is proved to be a continuous operator $H^{1/2}(\partial D) \to H^1(D)$.
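Theorem 3.9. Let $D = [0,1]^3$ be the cube and $f \in P_p^2(\partial D) = \{f \in C^0(\partial D),\ f|_{\Gamma_i} = f_i \in P_p^2(\Gamma_i),\ i = 1, \ldots, 6\}$, where the $\Gamma_i$ are the faces of the cube $D$. Then there exists $E_D f \in P_p^2(D)$ such that $E_D f|_{\partial D} = f$ and
\[
\| E_D f \|_{H^1(D)} \le C \| f \|_{H^{1/2}(\partial D)} , \tag{3.20}
\]
where $C$ is a constant independent of $p$ and $f$, and $\partial D$ is the boundary of $D$.

Proof. By Theorem 3.7, there exist $U_1, U_4 \in P_p^2(D)$ such that $U_1|_{\Gamma_1} = f_1$, $U_1|_{\Gamma_4} = 0$; $U_4|_{\Gamma_4} = f_4$, $U_4|_{\Gamma_1} = 0$, and
\[
\| U_1 \|_{H^1(D)} \le C \| f_1 \|_{H^{1/2}(\Gamma_1)} , \qquad \| U_4 \|_{H^1(D)} \le C \| f_4 \|_{H^{1/2}(\Gamma_4)} . \tag{3.21}
\]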
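Let $g_2 = f_2 - U_1|_{\Gamma_2} - U_4|_{\Gamma_2}$ and $g_5 = f_5 - U_1|_{\Gamma_5} - U_4|_{\Gamma_5}$; then $g_2$ vanishes at the sides $\gamma_{12}$ and $\gamma_{24}$ of $\Gamma_2$, and $g_5$ vanishes at the sides $\gamma_{15}$ and $\gamma_{45}$ of $\Gamma_5$. By Theorem 3.8, there exist $U_2, U_5 \in P_p^2(D)$ such that $U_2|_{\Gamma_2} = g_2$, $U_2|_{\Gamma_i} = 0$, $i = 1, 4, 5$, $U_5|_{\Gamma_5} = g_5$, $U_5|_{\Gamma_j} = 0$, $j = 1, 2, 4$, and
\[
\| U_2 \|_{H^1(D)} \le C \| g_2 \|_{H^{1/2}_{00}(\Gamma_2, \gamma_{12} \cup \gamma_{24})} , \qquad \| U_5 \|_{H^1(D)} \le C \| g_5 \|_{H^{1/2}_{00}(\Gamma_5, \gamma_{15} \cup \gamma_{45})} . \tag{3.22}
\]
Let
\[
g_3 = f_3 - \sum_{i=1,2,4,5} U_i|_{\Gamma_3}, \qquad g_6 = f_6 - \sum_{i=1,2,4,5} U_i|_{\Gamma_6};
\]
then $g_3|_{\gamma_{13}} = -U_2|_{\gamma_{13}} - U_5|_{\gamma_{13}}$, $g_3|_{\gamma_{23}} = 0$, $g_3|_{\gamma_{34}} = -U_2|_{\gamma_{34}} - U_5|_{\gamma_{34}}$, $g_3|_{\gamma_{35}} = 0$, $g_6|_{\gamma_{16}} = -U_2|_{\gamma_{16}} - U_5|_{\gamma_{16}}$, $g_6|_{\gamma_{26}} = 0$, $g_6|_{\gamma_{46}} = -U_2|_{\gamma_{46}} - U_5|_{\gamma_{46}}$, $g_6|_{\gamma_{56}} = 0$. By Theorem 3.8, there exist $U_3, U_6 \in P_p^2(D)$ such that $U_3|_{\Gamma_3} = g_3$, $U_3|_{\Gamma_i} = 0$, $i = 2, 5, 6$, and $U_6|_{\Gamma_6} = g_6$, $U_6|_{\Gamma_j} = 0$, $j = 2, 3, 5$, and
\[
\| U_3 \|_{H^1(D)} \le C \| g_3 \|_{H^{1/2}_{00}(\Gamma_3, \gamma_{23} \cup \gamma_{35})} , \qquad \| U_6 \|_{H^1(D)} \le C \| g_6 \|_{H^{1/2}_{00}(\Gamma_6, \gamma_{26} \cup \gamma_{56})} . \tag{3.23}
\]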
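Let $U = U_1 + U_2 + U_3 + U_4 + U_5 + U_6$. Then it is easy to see that $U|_{\Gamma_i} = f_i$, $i = 2, 3, 5, 6$. Let $g_1 = f_1 - U|_{\Gamma_1}$, $g_4 = f_4 - U|_{\Gamma_4}$. Since $\gamma_{12} = \bar\Gamma_1 \cap \bar\Gamma_2$ and $U_1|_{\Gamma_1} = f_1$, $U_2|_{\Gamma_1} = U_4|_{\Gamma_1} = U_5|_{\Gamma_1} = U_3|_{\Gamma_2} = U_6|_{\Gamma_2} = 0$, there holds
\[
\begin{aligned}
g_1|_{\gamma_{12}} &= (f_1 - U|_{\Gamma_1})|_{\gamma_{12}} = f_1|_{\gamma_{12}} - \bigl( (U_1 + U_2 + U_3 + U_4 + U_5 + U_6)|_{\Gamma_1} \bigr)|_{\gamma_{12}} \\
&= f_1|_{\gamma_{12}} - (f_1 + U_2|_{\Gamma_1} + U_3|_{\Gamma_2} + U_4|_{\Gamma_1} + U_5|_{\Gamma_1} + U_6|_{\Gamma_2})|_{\gamma_{12}} = 0,
\end{aligned}
\]
and since $U_3|_{\gamma_{13}} = g_3|_{\gamma_{13}} = (f_3 - (U_1 + U_2 + U_4 + U_5))|_{\gamma_{13}}$ and $U_6|_{\Gamma_3} = 0$, it holds that
\[
g_1|_{\gamma_{13}} = (f_1 - U|_{\Gamma_1})|_{\gamma_{13}} = f_1|_{\gamma_{13}} - (U|_{\Gamma_3})|_{\gamma_{13}} = f_1|_{\gamma_{13}} - f_3|_{\gamma_{13}} = 0 .
\]
Similarly, it can be shown that $g_1|_{\gamma_{15}} = g_1|_{\gamma_{16}} = 0$. Hence, $g_1|_{\partial\Gamma_1} = 0$. Due to the symmetry, it holds that $g_4|_{\partial\Gamma_4} = 0$. By Theorem 3.6, there exist $V_1 \in P_p^{2,0}(\Gamma_1)$ and $V_4 \in P_p^{2,0}(\Gamma_4)$ such that
\[
V_1|_{\Gamma_1} = g_1, \quad V_1|_{\Gamma_i} = 0, \ i = 2, 3, 4, 5, 6, \qquad V_4|_{\Gamma_4} = g_4, \quad V_4|_{\Gamma_i} = 0, \ i = 1, 2, 3, 5, 6,
\]
and
\[
\| V_1 \|_{H^1(D)} \le C \| g_1 \|_{H^{1/2}_{00}(\Gamma_1)} , \qquad \| V_4 \|_{H^1(D)} \le C \| g_4 \|_{H^{1/2}_{00}(\Gamma_4)} .
\]
Let $E_D f = U + V_1 + V_4$; then we have $E_D f|_{\Gamma_i} = f_i$, $i = 1, 2, 3, 4, 5, 6$, and
\[
\begin{aligned}
\| E_D f \|_{H^1(D)} &\le \| U \|_{H^1(D)} + \| V_1 \|_{H^1(D)} + \| V_4 \|_{H^1(D)} \\
&\le C \Bigl( \| f_1 \|_{H^{1/2}(\Gamma_1)} + \| f_4 \|_{H^{1/2}(\Gamma_4)} + \| g_2 \|_{H^{1/2}_{00}(\Gamma_2, \gamma_{12} \cup \gamma_{24})} + \| g_5 \|_{H^{1/2}_{00}(\Gamma_5, \gamma_{15} \cup \gamma_{45})} \\
&\qquad + \| g_3 \|_{H^{1/2}_{00}(\Gamma_3, \gamma_{23} \cup \gamma_{35})} + \| g_6 \|_{H^{1/2}_{00}(\Gamma_6, \gamma_{26} \cup \gamma_{56})} + \| g_1 \|_{H^{1/2}_{00}(\Gamma_1)} + \| g_4 \|_{H^{1/2}_{00}(\Gamma_4)} \Bigr).
\end{aligned} \tag{3.24}
\]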
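First, we prove that
\[
\| g_2 \|_{H^{1/2}_{00}(\Gamma_2, \gamma_{12} \cup \gamma_{24})} \le C \| f \|_{H^{1/2}(\Gamma_1 \cup \Gamma_2 \cup \Gamma_4)} . \tag{3.25}
\]
Due to (3.21), there holds
\[
\begin{aligned}
\| g_2 \|_{H^{1/2}(\Gamma_2)} &\le \| f_2 \|_{H^{1/2}(\Gamma_2)} + \| U_1 \|_{H^{1/2}(\Gamma_2)} + \| U_4 \|_{H^{1/2}(\Gamma_2)} \\
&\le \| f_2 \|_{H^{1/2}(\Gamma_2)} + C \| U_1 \|_{H^1(D)} + C \| U_4 \|_{H^1(D)} \\
&\le C \left( \| f_2 \|_{H^{1/2}(\Gamma_2)} + \| f_1 \|_{H^{1/2}(\Gamma_1)} + \| f_4 \|_{H^{1/2}(\Gamma_4)} \right).
\end{aligned} \tag{3.26}
\]
For (3.25), by the definition (3.19) of $H^{1/2}_{00}(\Gamma_2, \gamma_{12} \cup \gamma_{24})$, we need to show that
\[
\int_S \frac{|g_2|^2}{x_3} \, dx_1 \, dx_3 \le C \| f \|_{H^{1/2}(\Gamma_1 \cup \Gamma_2 \cup \Gamma_4)}^2 , \qquad
\int_S \frac{|g_2|^2}{1 - x_3} \, dx_1 \, dx_3 \le C \| f \|_{H^{1/2}(\Gamma_1 \cup \Gamma_2 \cup \Gamma_4)}^2 . \tag{3.27}
\]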
Since $U_1(x_1,x_3,0) = f_1(x_1,x_3)$ and $U_4(x_1,x_3,0) = 0$, there holds
\[
\begin{aligned}
g_2(x_1,x_3) &= f_2(x_1,x_3) - \sum_{i=1,4} U_i(x_1,x_2,x_3)|_{\Gamma_2} = f_2(x_1,x_3) - \sum_{i=1,4} U_i(x_1,0,x_3) \\
&= \bigl( f_2(x_1,x_3) - f_1(x_1,x_3) \bigr) + \bigl( U_1(x_1,x_3,0) - U_1(x_1,0,x_3) \bigr) + \bigl( U_4(x_1,x_3,0) - U_4(x_1,0,x_3) \bigr).
\end{aligned}
\]
Due to the following equivalent norms for the space $H^{1/2}(\Gamma_2 \cup \Gamma_1)$ [3, 14],
\[
\| f \|_{H^{1/2}(\Gamma_2 \cup \Gamma_1)} \approx \left( \| f_2 \|_{H^{1/2}(\Gamma_2)}^2 + \| f_1 \|_{H^{1/2}(\Gamma_1)}^2 + D(f_2, f_1) \right)^{1/2} , \tag{3.28}
\]
where
\[
D(f_2, f_1) = \int_S \frac{|f_2(t_1,0,t_2) - f_1(t_1,t_2,0)|^2}{t_2} \, dt_1 \, dt_2 ,
\]
we have
\[
\int_S \frac{|f_2(x_1,x_3) - f_1(x_1,x_3)|^2}{x_3} \, dx_1 \, dx_3 \le \| f \|_{H^{1/2}(\Gamma_1 \cup \Gamma_2)}^2 ,
\]
\[
\int_S \frac{|U_1(x_1,x_3,0) - U_1(x_1,0,x_3)|^2}{x_3} \, dx_1 \, dx_3 = D(U_1|_{\Gamma_1}, U_1|_{\Gamma_2}) \le C \| U_1 \|_{H^{1/2}(\Gamma_1 \cup \Gamma_2)}^2 \le C \| U_1 \|_{H^1(D)}^2 \le C \| f_1 \|_{H^{1/2}(\Gamma_1)}^2 ,
\]
and
\[
\int_S \frac{|U_4(x_1,x_3,0) - U_4(x_1,0,x_3)|^2}{x_3} \, dx_1 \, dx_3 = D(U_4|_{\Gamma_1}, U_4|_{\Gamma_2}) \le C \| U_4 \|_{H^{1/2}(\Gamma_1 \cup \Gamma_2)}^2 \le C \| U_4 \|_{H^1(D)}^2 \le C \| f_4 \|_{H^{1/2}(\Gamma_4)}^2 .
\]
Therefore, we obtain the first inequality of (3.27). For the second inequality of (3.27), we shall decompose $g_2(x_1,x_3)$ differently. Since $U_4(x_1,x_3,1) = f_4(x_1,x_3)$ and $U_1(x_1,x_3,1) = 0$, there holds
\[
\begin{aligned}
g_2(x_1,x_3) &= f_2(x_1,x_3) - \sum_{i=1,4} U_i(x_1,x_2,x_3)|_{\Gamma_2} = f_2(x_1,x_3) - \sum_{i=1,4} U_i(x_1,0,x_3) \\
&= \bigl( f_2(x_1,x_3) - f_4(x_1,x_3) \bigr) + \bigl( U_4(x_1,x_3,1) - U_4(x_1,0,x_3) \bigr) + \bigl( U_1(x_1,x_3,1) - U_1(x_1,0,x_3) \bigr).
\end{aligned}
\]
Arguing as previously, we have the second inequality of (3.27). Then (3.25) follows immediately from (3.26)–(3.27). Due to the symmetry, we have analogously
\[
\| g_5 \|_{H^{1/2}_{00}(\Gamma_5, \gamma_{15} \cup \gamma_{45})} \le C \| f \|_{H^{1/2}(\Gamma_1 \cup \Gamma_4 \cup \Gamma_5)} . \tag{3.29}
\]
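We shall next prove that
\[
\| g_3 \|_{H^{1/2}_{00}(\Gamma_3, \gamma_{23} \cup \gamma_{35})} \le C \| f \|_{H^{1/2}(\partial D \setminus \Gamma_6)} , \qquad
\| g_6 \|_{H^{1/2}_{00}(\Gamma_6, \gamma_{26} \cup \gamma_{56})} \le C \| f \|_{H^{1/2}(\partial D \setminus \Gamma_3)} . \tag{3.30}
\]
By (3.22), (3.25), and (3.29) we have
\[
\begin{aligned}
\| g_3 \|_{H^{1/2}(\Gamma_3)} &= \Bigl\| f_3 - \sum_{i=1,2,4,5} U_i|_{\Gamma_3} \Bigr\|_{H^{1/2}(\Gamma_3)}
\le \| f_3 \|_{H^{1/2}(\Gamma_3)} + C \sum_{i=1,2,4,5} \| U_i \|_{H^1(D)} \\
&\le C \Bigl( \| f_3 \|_{H^{1/2}(\Gamma_3)} + \sum_{i=1,4} \| f_i \|_{H^{1/2}(\Gamma_i)} + \sum_{i=2,5} \| g_i \|_{H^{1/2}_{00}(\Gamma_i, \gamma_{1i} \cup \gamma_{i4})} \Bigr) \\
&\le C \Bigl( \| f_3 \|_{H^{1/2}(\Gamma_3)} + \sum_{i=1,4} \| f_i \|_{H^{1/2}(\Gamma_i)} + \sum_{i=2,5} \| f \|_{H^{1/2}(\Gamma_1 \cup \Gamma_i \cup \Gamma_4)} \Bigr)
\le C \| f \|_{H^{1/2}(\partial D \setminus \Gamma_6)} .
\end{aligned} \tag{3.31}
\]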
For the first inequality of (3.30), due to the definition (3.19) of $H^{1/2}_{00}(\Gamma_3, \gamma_{23} \cup \gamma_{35})$, it remains to show that
\[
\int_S \frac{|g_3|^2}{x_2} \, dx_2 \, dx_3 \le C \| f \|_{H^{1/2}(\partial D \setminus \Gamma_6)}^2 , \qquad
\int_S \frac{|g_3|^2}{1 - x_2} \, dx_2 \, dx_3 \le C \| f \|_{H^{1/2}(\partial D \setminus \Gamma_6)}^2 . \tag{3.32}
\]
Since $U_2(x_2,0,x_3) = g_2(x_2,x_3)$ and $U_5(x_2,0,x_3) = 0$, we have
\[
\begin{aligned}
g_3(x_2,x_3) &= f_3(x_2,x_3) - g_2(x_2,x_3) + U_2(x_2,0,x_3) - \sum_{i=1,2,4,5} U_i(0,x_2,x_3) \\
&= f_3(x_2,x_3) - \bigl( f_2(x_2,x_3) - U_1(x_2,0,x_3) - U_4(x_2,0,x_3) \bigr) + U_2(x_2,0,x_3) \\
&\quad - U_1(0,x_2,x_3) - U_4(0,x_2,x_3) - U_2(0,x_2,x_3) - U_5(0,x_2,x_3) \\
&= \bigl( f_3(x_2,x_3) - f_2(x_2,x_3) \bigr) + \bigl( U_1(x_2,0,x_3) - U_1(0,x_2,x_3) \bigr) + \bigl( U_4(x_2,0,x_3) - U_4(0,x_2,x_3) \bigr) \\
&\quad + \bigl( U_2(x_2,0,x_3) - U_2(0,x_2,x_3) \bigr) + \bigl( U_5(x_2,0,x_3) - U_5(0,x_2,x_3) \bigr).
\end{aligned}
\]
By the equivalent norm of $H^{1/2}(\Gamma_2 \cup \Gamma_3)$ described in (3.28), we have
\[
\int_S \frac{|f_3(x_2,x_3) - f_2(x_2,x_3)|^2}{x_2} \, dx_2 \, dx_3 \le \| f \|_{H^{1/2}(\Gamma_3 \cup \Gamma_2)}^2 ,
\]
\[
\int_S \frac{|U_1(x_2,0,x_3) - U_1(0,x_2,x_3)|^2}{x_2} \, dx_2 \, dx_3 = D(U_1|_{\Gamma_2}, U_1|_{\Gamma_3}) \le C \| U_1 \|_{H^{1/2}(\Gamma_2 \cup \Gamma_3)}^2 \le C \| U_1 \|_{H^1(D)}^2 \le C \| f_1 \|_{H^{1/2}(\Gamma_1)}^2 ,
\]
\[
\int_S \frac{|U_4(x_2,0,x_3) - U_4(0,x_2,x_3)|^2}{x_2} \, dx_2 \, dx_3 = D(U_4|_{\Gamma_2}, U_4|_{\Gamma_3}) \le C \| U_4 \|_{H^{1/2}(\Gamma_2 \cup \Gamma_3)}^2 \le C \| U_4 \|_{H^1(D)}^2 \le C \| f_4 \|_{H^{1/2}(\Gamma_4)}^2 ,
\]
\[
\int_S \frac{|U_2(x_2,0,x_3) - U_2(0,x_2,x_3)|^2}{x_2} \, dx_2 \, dx_3 = D(U_2|_{\Gamma_2}, U_2|_{\Gamma_3}) \le C \| U_2 \|_{H^{1/2}(\Gamma_2 \cup \Gamma_3)}^2 \le C \| U_2 \|_{H^1(D)}^2 \le C \| g_2 \|_{H^{1/2}_{00}(\Gamma_2, \gamma_{12} \cup \gamma_{24})}^2 \le C \| f \|_{H^{1/2}(\Gamma_1 \cup \Gamma_2 \cup \Gamma_4)}^2 ,
\]
\[
\int_S \frac{|U_5(x_2,0,x_3) - U_5(0,x_2,x_3)|^2}{x_2} \, dx_2 \, dx_3 = D(U_5|_{\Gamma_2}, U_5|_{\Gamma_3}) \le C \| U_5 \|_{H^{1/2}(\Gamma_2 \cup \Gamma_3)}^2 \le C \| U_5 \|_{H^1(D)}^2 \le C \| g_5 \|_{H^{1/2}_{00}(\Gamma_5, \gamma_{15} \cup \gamma_{45})}^2 \le C \| f \|_{H^{1/2}(\Gamma_1 \cup \Gamma_4 \cup \Gamma_5)}^2 .
\]
Then the first inequality of (3.32) follows easily. For the second inequality of (3.32), we shall decompose $g_3(x_2,x_3)$ in a different way, i.e.,
\[
\begin{aligned}
g_3(x_2,x_3) &= \bigl( f_3(x_2,x_3) - f_5(x_2,x_3) \bigr) + \bigl( U_1(x_2,1,x_3) - U_1(0,x_2,x_3) \bigr) + \bigl( U_4(x_2,1,x_3) - U_4(0,x_2,x_3) \bigr) \\
&\quad + \bigl( U_5(x_2,1,x_3) - U_5(0,x_2,x_3) \bigr) + \bigl( U_2(x_2,1,x_3) - U_2(0,x_2,x_3) \bigr).
\end{aligned}
\]
Arguing as previously, we obtain the second estimate of (3.32). A combination of (3.31) and (3.32) leads to the first inequality of (3.30). By the symmetry, we have the second one of (3.30).

Finally, we prove that
\[
\| g_i \|_{H^{1/2}_{00}(\Gamma_i)} \le C \| f \|_{H^{1/2}(\partial D)} , \qquad i = 1, 4. \tag{3.33}
\]
By (3.21)–(3.23) and (3.25), (3.29)–(3.30), there holds
\[
\| g_1 \|_{H^{1/2}(\Gamma_1)} = \| f_1 - U|_{\Gamma_1} \|_{H^{1/2}(\Gamma_1)} \le \| f_1 \|_{H^{1/2}(\Gamma_1)} + \sum_{i=1}^{6} \| U_i \|_{H^{1/2}(\Gamma_1)} \le C \| f \|_{H^{1/2}(\partial D)} . \tag{3.34}
\]
For (3.33) with $i = 1$, we need to show that, for $j = 1, 2$,
\[
\int_S \frac{|g_1(x_1,x_2)|^2}{x_j} \, dx_1 \, dx_2 \le C \| f \|_{H^{1/2}(\partial D)}^2 , \qquad
\int_S \frac{|g_1(x_1,x_2)|^2}{1 - x_j} \, dx_1 \, dx_2 \le C \| f \|_{H^{1/2}(\partial D)}^2 . \tag{3.35}
\]
Since $U_2|_{\Gamma_2} = g_2$, $U_5|_{\Gamma_1} = 0$, $U_3|_{\Gamma_2} = 0$, and $U_6|_{\Gamma_2} = 0$, we have
\[
\begin{aligned}
g_1(x_1,x_2) &= f_1(x_1,x_2) - g_2(x_1,x_2) + U_2(x_1,0,x_2) - U(x_1,x_2,x_3)|_{\Gamma_1} \\
&= f_1(x_1,x_2) - \bigl( f_2(x_1,x_2) - U_1(x_1,0,x_2) - U_4(x_1,0,x_2) \bigr) + U_2(x_1,0,x_2) - \sum_{1 \le i \le 6} U_i(x_1,x_2,0) \\
&= \bigl( f_1(x_1,x_2) - f_2(x_1,x_2) \bigr) + \bigl( U_1(x_1,0,x_2) - U_1(x_1,x_2,0) \bigr) + \bigl( U_4(x_1,0,x_2) - U_4(x_1,x_2,0) \bigr) \\
&\quad + \bigl( U_2(x_1,0,x_2) - U_2(x_1,x_2,0) \bigr) + \bigl( U_3(x_1,0,x_2) - U_3(x_1,x_2,0) \bigr) + \bigl( U_6(x_1,0,x_2) - U_6(x_1,x_2,0) \bigr).
\end{aligned}
\]
By the equivalent norm of $H^{1/2}(\Gamma_2 \cup \Gamma_1)$ described in (3.28), there hold
\[
\int_S \frac{|f_1(x_1,x_2) - f_2(x_1,x_2)|^2}{x_2} \, dx_1 \, dx_2 \le \| f \|_{H^{1/2}(\Gamma_1 \cup \Gamma_2)}^2 ,
\]
\[
\int_S \frac{|U_1(x_1,0,x_2) - U_1(x_1,x_2,0)|^2}{x_2} \, dx_1 \, dx_2 = D(U_1|_{\Gamma_2}, U_1|_{\Gamma_1}) \le C \| U_1 \|_{H^{1/2}(\Gamma_2 \cup \Gamma_1)}^2 \le C \| U_1 \|_{H^1(D)}^2 \le C \| f_1 \|_{H^{1/2}(\Gamma_1)}^2 ,
\]
\[
\int_S \frac{|U_4(x_1,0,x_2) - U_4(x_1,x_2,0)|^2}{x_2} \, dx_1 \, dx_2 = D(U_4|_{\Gamma_2}, U_4|_{\Gamma_1}) \le C \| U_4 \|_{H^{1/2}(\Gamma_2 \cup \Gamma_1)}^2 \le C \| U_4 \|_{H^1(D)}^2 \le C \| f_4 \|_{H^{1/2}(\Gamma_4)}^2 ,
\]
\[
\int_S \frac{|U_2(x_1,0,x_2) - U_2(x_1,x_2,0)|^2}{x_2} \, dx_1 \, dx_2 = D(U_2|_{\Gamma_2}, U_2|_{\Gamma_1}) \le C \| U_2 \|_{H^{1/2}(\Gamma_2 \cup \Gamma_1)}^2 \le C \| U_2 \|_{H^1(D)}^2 \le C \| g_2 \|_{H^{1/2}_{00}(\Gamma_2, \gamma_{12} \cup \gamma_{24})}^2 \le C \| f \|_{H^{1/2}(\Gamma_1 \cup \Gamma_2 \cup \Gamma_4)}^2 ,
\]
\[
\int_S \frac{|U_3(x_1,0,x_2) - U_3(x_1,x_2,0)|^2}{x_2} \, dx_1 \, dx_2 = D(U_3|_{\Gamma_2}, U_3|_{\Gamma_1}) \le C \| U_3 \|_{H^{1/2}(\Gamma_2 \cup \Gamma_1)}^2 \le C \| U_3 \|_{H^1(D)}^2 \le C \| g_3 \|_{H^{1/2}_{00}(\Gamma_3, \gamma_{23} \cup \gamma_{35})}^2 \le C \| f \|_{H^{1/2}(\partial D \setminus \Gamma_6)}^2 ,
\]
and
\[
\int_S \frac{|U_6(x_1,0,x_2) - U_6(x_1,x_2,0)|^2}{x_2} \, dx_1 \, dx_2 = D(U_6|_{\Gamma_2}, U_6|_{\Gamma_1}) \le C \| U_6 \|_{H^{1/2}(\Gamma_2 \cup \Gamma_1)}^2 \le C \| U_6 \|_{H^1(D)}^2 \le C \| g_6 \|_{H^{1/2}_{00}(\Gamma_6, \gamma_{26} \cup \gamma_{56})}^2 \le C \| f \|_{H^{1/2}(\partial D \setminus \Gamma_3)}^2 .
\]
The above inequalities lead to the first estimate of (3.35) for $j = 2$. For the second inequality of (3.35) with $j = 2$, we shall decompose $g_1$ differently. Since $U_5|_{\Gamma_5} = g_5$, $U_2|_{\Gamma_5} = 0$, $U_3|_{\Gamma_5} = 0$, and $U_6|_{\Gamma_5} = 0$, there holds
\[
\begin{aligned}
g_1(x_1,x_2) &= \bigl( f_1(x_1,x_2) - f_5(x_1,x_2) \bigr) + \bigl( U_1(x_1,1,x_2) - U_1(x_1,x_2,0) \bigr) + \bigl( U_4(x_1,1,x_2) - U_4(x_1,x_2,0) \bigr) \\
&\quad + \bigl( U_5(x_1,1,x_2) - U_5(x_1,x_2,0) \bigr) + \bigl( U_2(x_1,1,x_2) - U_2(x_1,x_2,0) \bigr) \\
&\quad + \bigl( U_3(x_1,1,x_2) - U_3(x_1,x_2,0) \bigr) + \bigl( U_6(x_1,1,x_2) - U_6(x_1,x_2,0) \bigr).
\end{aligned}
\]
Arguing as above, we can get the second inequality of (3.35) for $j = 2$. For the first and second inequalities of (3.35) for $j = 1$, we decompose $g_1$ in two other ways. Since $U_3|_{\Gamma_3} = g_3$, $U_6|_{\Gamma_3} = 0$, $U_6|_{\Gamma_6} = g_6$, and $U_3|_{\Gamma_6} = 0$, we have
\[
\begin{aligned}
g_1(x_1,x_2) &= \bigl( f_1(x_1,x_2) - f_3(x_1,x_2) \bigr) + \bigl( U_1(0,x_1,x_2) - U_1(x_1,x_2,0) \bigr) + \bigl( U_4(0,x_1,x_2) - U_4(x_1,x_2,0) \bigr) \\
&\quad + \bigl( U_2(0,x_1,x_2) - U_2(x_1,x_2,0) \bigr) + \bigl( U_5(0,x_1,x_2) - U_5(x_1,x_2,0) \bigr) \\
&\quad + \bigl( U_3(0,x_1,x_2) - U_3(x_1,x_2,0) \bigr) + \bigl( U_6(0,x_1,x_2) - U_6(x_1,x_2,0) \bigr)
\end{aligned}
\]
and
\[
\begin{aligned}
g_1(x_1,x_2) &= \bigl( f_1(x_1,x_2) - f_6(x_1,x_2) \bigr) + \bigl( U_1(1,x_1,x_2) - U_1(x_1,x_2,0) \bigr) + \bigl( U_4(1,x_1,x_2) - U_4(x_1,x_2,0) \bigr) \\
&\quad + \bigl( U_2(1,x_1,x_2) - U_2(x_1,x_2,0) \bigr) + \bigl( U_5(1,x_1,x_2) - U_5(x_1,x_2,0) \bigr) \\
&\quad + \bigl( U_3(1,x_1,x_2) - U_3(x_1,x_2,0) \bigr) + \bigl( U_6(1,x_1,x_2) - U_6(x_1,x_2,0) \bigr),
\end{aligned}
\]
respectively, which implies (3.35) for $j = 1$. Combining (3.34) and (3.35), we obtain (3.33) for $i = 1$. Analogously, we have (3.33) for $i = 4$ due to the symmetry, which together with (3.24)–(3.25) and (3.29)–(3.30) leads to (3.20). Thus, we complete the proof.

4. Applications to the error analysis of the p-version of FEM. Tetrahedrons (simplices), triangular prisms (wedges), and hexahedrons (cubes) are three commonly used elements for the FEM in three dimensions. We have established polynomial extensions $R_G$, $R_\Lambda$, and $R_D$ on a triangular prism, a pyramid, and a cube, which, with the polynomial extension $R_K$ on a tetrahedron [21], are sufficient for the
construction of a globally continuous and piecewise polynomial on a mesh containing tetrahedral elements, triangular prism elements, and hexahedral elements. Therefore, approximation errors in solutions of the p and h-p version can be proved to be as good as in local projections without comprising the optimal rate of the convergence. We will illustrate how to incorporate the local projection with polynomial extensions in the error analysis for the p-version of the FEM; the details of the proof are given in a coming paper [15]. Let Ω be a Lipschitz domain in R3 , and let Δ = {Ωj , 1 ≤ j ≤ J} be a partition of Ω. Ω j s are shape-regular and surfaced tetrahedral, hexahedral, and triangular-prism elements. By Mj , we denote a mapping of standard element Ωst onto Ωj , where Ωst is the standard tetrahedral K, or the standard triangular-prism G, or the standard hexahedron D which we defined in previous sections. Let Ppj (Ωj ) denote a set of pull-back polynomials ϕ on Ωj such that ϕ ◦ Mj ∈ Ppκj (Ωst ), with κ = 1 if Ωst is the tetrahedron K, κ = 2 if Ωst is the hexahedron D, and Pp1.5 (Ωst ) = Pp1 (T ) × Pp (I) if Ωst is the triangular-prism G. By P , we denote the distribution of the element degrees. As usual, the finite element subspaces of piecewise pull-back and continuous polynomials are defined as P,1 P,1 P 1 (4.1) SD (Ω; Δ) = SD (Ω; Δ) ∩ HD (Ω), SD (Ω; Δ) = {ϕϕ|Ωj ∈ Pp (Ωj ), 1 ≤ j ≤ J}, 1 where HD (Ω) denotes the set of u ∈ H 1 (Ω), with u = 0 on ΓD . Incorporating the polynomial extensions with the approximation in the framework of Jacobi-weighted Sobolev spaces, we have the following theorem, which leads to the error estimates for the p-version of the FEM with a quasi-uniform degree distribution in three dimensions. P,1 (Ω; Δ) be the finite element Theorem 4.1. Let u ∈ H k (Ω), k ≥ 1, and let SD subspace defined with a uniform degree p as in (4.1). Then there exists a polynomial P,1 (Ω; Δ) such that ϕ ∈ SD
\[
\| u - \varphi \|_{H^1(\Omega)} \le C (p+1)^{-(k-1)} \| u \|_{H^k(\Omega)} , \tag{4.2}
\]
with a constant $C$ independent of $p$ and $u$.

We shall outline the proof and emphasize the essential role which the polynomial extensions play, and we refer readers to [15] for the details. To this end, we introduce three important propositions.

Proposition 4.2. Let $u \in H^k(\Omega_j)$, $k > \frac{3}{2}$, where $\Omega_j$ is a tetrahedron, or a prism, or a cube with planar or nonplanar surfaces. Then there exists a polynomial $\phi \in P_p^\kappa(\Omega_j)$, with $p \ge 1$ and $\kappa = 1, 1.5, 2$, respectively, such that for $0 \le \ell \le k$,
\[
\| u - \phi \|_{H^\ell(\Omega_j)} \le C p^{-(k - \ell)} \| u \|_{H^k(\Omega_j)} , \tag{4.3}
\]
and $u = \phi$ at the vertices $V_\ell$ of $\Omega_j$, $1 \le \ell \le L$, $L = 4$ or $6$ or $8$, respectively.

Proposition 4.3. Let $\gamma = (-\frac{1}{2}, \frac{1}{2})$ and $u \in H^s(\gamma)$, $s > 1/2$. Then there exists an operator $\pi_\gamma : H^s(\gamma) \to P_p(\gamma)$ such that $u(\pm\frac{1}{2}) = \pi_\gamma u(\pm\frac{1}{2})$ and, for $0 \le l \le s$,
\[
\| u - \pi_\gamma u \|_{H^l(\gamma)} \le C (p+1)^{-(s-l)} \| u \|_{H^s(\gamma)} , \tag{4.4}
\]
with a constant $C$ independent of $p$ and $u$.

Proposition 4.4. Let $\Omega_{st}$ be a standard tetrahedron, or triangular prism, or hexahedron, and let $u \in H^s(\Omega_{st})$, $s \ge 2$. Then there exists a polynomial $\varphi_p \in P_p(\Omega_{st})$ such that $u(V_l) = \varphi_p(V_l)$ at the vertices $V_l$ of $\Omega_{st}$, $\varphi_p|_\gamma = \pi_\gamma u$ on each edge $\gamma$ of $\Omega_{st}$,
\[
\| u - \varphi_p \|_{H^l(\Omega_{st})} \le C (p+1)^{-(s-l)} \| u \|_{H^s(\Omega_{st})} , \tag{4.5}
\]
and on each face $F_i$ of $\Omega_{st}$
\[
\| u - \varphi_p \|_{H^t(F_i)} \le C p^{-(s - t - \frac{1}{2})} \| u \|_{H^s(\Omega_{st})} , \qquad t = 0, 1, \tag{4.6}
\]
with a constant $C$ independent of $p$ and $u$. If $\Omega_{st}$ is a standard triangular prism and $u \in H^s(\Omega_{st})$, $s \ge 3$, it holds that
\[
\left\| \frac{\partial (u - \varphi_p)}{\partial x_3} \right\|_{H^1(F_i)} \le C p^{-(s - \frac{5}{2})} \| u \|_{H^s(\Omega_{st})} . \tag{4.7}
\]
The construction of the operator $\pi_\gamma$ and the polynomial $\varphi_p$ starts with the Jacobi projection with $\beta = -1/2$ (Chebyshev projection) and is followed by a modification at the vertices and on the edges.

Proof of Theorem 4.1. We first assume that $k \ge 2$. Due to Proposition 4.4, we have a polynomial $\varphi_j \in P_p(\Omega_j)$ in each element $\Omega_j$ such that $u = \varphi_j$ at each vertex $V$ of $\Omega_j$ and $\varphi_j = \pi_\gamma u$ on each edge $\gamma$ of $\Omega_j$, where $\pi_\gamma$ is the projection-like operator defined as in Proposition 4.3, and, for $0 \le l \le k$,
\[
\| u - \varphi_j \|_{H^l(\Omega_j)} \le C (p+1)^{-(k-l)} \| u \|_{H^k(\Omega_j)} . \tag{4.8}
\]
Suppose that $F = \bar\Omega_j \cap \bar\Omega_i$ is a common face of two neighboring elements $\Omega_j$ and $\Omega_i$. We may assume without loss of generality that $\Omega_i$ and $\Omega_j$ are standard-size elements. If $F$ is a standard triangle $T$, there are three possible cases: (T1) both are tetrahedrons; (T2) both are triangular prisms; (T3) $\Omega_j$ is a tetrahedron and $\Omega_i$ is a triangular prism. If $F$ is a standard square face $S$, similarly, there are three possible cases: (S1) both $\Omega_j$ and $\Omega_i$ are hexahedrons; (S2) both $\Omega_j$ and $\Omega_i$ are triangular prisms; (S3) $\Omega_j$ is a hexahedron and $\Omega_i$ is a triangular prism. We shall modify $\varphi_i$ and $\varphi_j$ in the cases (T1) and (S2); the treatment of the other cases is similar to what follows.

In the case (T1), $\Omega_i$ and $\Omega_j$ are tetrahedrons, and $\psi = (\varphi_i - \varphi_j)|_F \in P_p^{1,0}(F)$. By Theorem 2.1, there is a polynomial $\Psi \in P_p^1(\Omega_j)$ such that $\Psi|_F = \psi$ and $\Psi|_{\partial\Omega_j \setminus F} = 0$, and
\[
\| \Psi \|_{H^1(\Omega_j)} \le C \| \psi \|_{H^{1/2}_{00}(F)} = C \| \varphi_i - \varphi_j \|_{H^{1/2}_{00}(F)} . \tag{4.9}
\]
Note that $(\varphi_i - \varphi_j) \in H^{1/2}_{00}(F) = (H^0(F), H_0^1(F))_{\frac{1}{2},2}$ and that for $t = 0, 1$,
\[
\| \varphi_i - \varphi_j \|_{H^t(F)} \le C \left( \| \varphi_i - u \|_{H^t(F)} + \| \varphi_j - u \|_{H^t(F)} \right) \le C (p+1)^{-(k - t - 1/2)} \left( \| u \|_{H^k(\Omega_j)} + \| u \|_{H^k(\Omega_i)} \right),
\]
which implies
\[
\| \Psi \|_{H^1(\Omega_j)} \le C (p+1)^{-(k-1)} \left( \| u \|_{H^k(\Omega_j)} + \| u \|_{H^k(\Omega_i)} \right). \tag{4.10}
\]
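In the case (S2), by Proposition 4.4, there are $\varphi_i \in P_p^{1.5}(\Omega_i)$ and $\varphi_j \in P_p^{1.5}(\Omega_j)$ satisfying (4.5)–(4.7). Suppose that $F = \{x = (x_1, 0, x_3) \mid 0 \le x_1, x_3 \le 1\}$. Then $\psi(x_1, x_3) = (\varphi_i - \varphi_j)|_F \in P_p^{2,0}(F)$, and there exists a polynomial extension $\Psi$ on $\Omega_j$ [18] such that $\Psi \in P_p^{1.5}(\Omega_j)$, $\Psi|_F = \psi$ and $\Psi|_{\partial\Omega_j \setminus F} = 0$, and
\[ \|\Psi\|_{H^1(\Omega_j)} \le C\Bigl( (p+1)^{-\frac32}\,\|\psi_{x_3}\|_{H^1(F)} + (p+1)^{-\frac12}\,\|\psi\|_{H^1(F)} + (p+1)^{\frac12}\,\|\psi\|_{L^2(F)} \Bigr). \]
Due to (4.5) and (4.7), there hold for $t = 0, 1$,
\[ \|\psi\|_{H^t(F)} \le \|u - \varphi_j\|_{H^t(F)} + \|u - \varphi_i\|_{H^t(F)} \le C(p+1)^{-(k-t-\frac12)}\bigl(\|u\|_{H^k(\Omega_j)} + \|u\|_{H^k(\Omega_i)}\bigr) \]
and
\[ \|\psi_{x_3}\|_{H^1(F)} \le \left\|\frac{\partial(u - \varphi_j)}{\partial x_3}\right\|_{H^1(F)} + \left\|\frac{\partial(u - \varphi_i)}{\partial x_3}\right\|_{H^1(F)} \le C(p+1)^{-(k-\frac52)}\bigl(\|u\|_{H^k(\Omega_j)} + \|u\|_{H^k(\Omega_i)}\bigr), \]
which implies (4.10). Let $\tilde\varphi_j = \varphi_j + \Psi$ and $\tilde\varphi_i = \varphi_i$. Then $\tilde\varphi_j = \tilde\varphi_i$ on $F$, and by (4.9) and (4.10),
\[ \|u - \tilde\varphi_j\|_{H^1(\Omega_j)} \le \|u - \varphi_j\|_{H^1(\Omega_j)} + \|\Psi\|_{H^1(\Omega_j)} \le C(p+1)^{-(k-1)}\bigl(\|u\|_{H^k(\Omega_j)} + \|u\|_{H^k(\Omega_i)}\bigr) \tag{4.11} \]
and
\[ \|u - \tilde\varphi_i\|_{H^1(\Omega_i)} = \|u - \varphi_i\|_{H^1(\Omega_i)} \le C(p+1)^{-(k-1)}\,\|u\|_{H^k(\Omega_i)}. \tag{4.12} \]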
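Adjusting $\varphi_j$ on each face of $\Omega_j$ by the polynomial extension $\Psi$, we achieve continuity across the interfaces of elements. For the homogeneous Dirichlet boundary condition, we can adjust $\varphi_j$ in a similar way such that $\tilde\varphi_j \in P_p^{\kappa}(\Omega_j)$ and vanishes on $\Gamma_D \cap \partial\Omega_j$. Let $\varphi = \tilde\varphi_j$ in $\Omega_j$, $1 \le j \le J$. Then $\varphi \in S_D^{P,1}(\Omega; \Delta)$ and satisfies (4.2).

We next prove (4.2) for $1 < k < 3$. It was shown in [4] that
\[ H^k(\Omega) \cap H_D^1(\Omega) = \bigl(H_D^1(\Omega),\, H^3(\Omega) \cap H_D^1(\Omega)\bigr)_{\theta,2} \subset \bigl(H^1(\Omega), H^3(\Omega)\bigr)_{\theta,2} \cap H_D^1(\Omega), \]
with $\theta = \frac{k-1}{2} \in (0, 1)$ for $1 < k < 3$. Since $(H^1(\Omega), H^3(\Omega))_{\theta,2} \subset (H^1(\Omega), H^3(\Omega))_{\theta,\infty} = B^k(\Omega)$, we have $H^k(\Omega) \cap H_D^1(\Omega) \subset B^k(\Omega) \cap H_D^1(\Omega)$. Suppose that $v \in H_D^1(\Omega)$ and $w \in H^3(\Omega) \cap H_D^1(\Omega)$ form a decomposition of $u \in B^k(\Omega) \cap H_D^1(\Omega)$. Applying (4.2) for $k = 3$, we have a polynomial $\varphi = \varphi_p \in S_D^{P,1}(\Omega; \Delta)$, with $p \ge 1$, such that
\[ \|w - \varphi_p\|_{H^1(\Omega)} \le C\,\frac{1}{(p+1)^2}\,\|w\|_{H^3(\Omega)}. \]
Therefore, we have for any decomposition $v$ and $w$ of $u$,
\[ \|u - \varphi\|_{H^1(\Omega)} \le \|v\|_{H^1(\Omega)} + \|w - \varphi_p\|_{H^1(\Omega)} \le C\Bigl(\|v\|_{H^1(\Omega)} + \frac{1}{(p+1)^2}\,\|w\|_{H^3(\Omega)}\Bigr) = C\bigl(\|v\|_{H^1(\Omega)} + t_1\,\|w\|_{H^3(\Omega)}\bigr), \]
with $t_1 = \frac{1}{(p+1)^2}$ and $C$ independent of $v$ and $w$. Due to the definition of the Besov space $B^k(\Omega)$, we have
\[ \|u - \varphi\|_{H^1(\Omega)} \le C K(u, t_1) \le C t_1^{\theta} \sup_{t>0} t^{-\theta} K(u, t) \le C t_1^{\theta}\,\|u\|_{B^k(\Omega)} \le C(p+1)^{-(k-1)}\,\|u\|_{H^k(\Omega)}. \]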
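The power of $p+1$ in the last bound can be read off directly. The short computation below only restates the choices $t_1 = (p+1)^{-2}$ and $\theta = (k-1)/2$ made in the argument above; it adds no new assumption.

```latex
t_1^{\theta}
  = \bigl((p+1)^{-2}\bigr)^{\frac{k-1}{2}}
  = (p+1)^{-(k-1)},
  \qquad 1 < k < 3 .
```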
For $p = 0$ or $k = 1$, (4.2) is trivial by selecting $\varphi = 0$. Thus, the proof of the theorem is completed.

Remark 4.1. For elliptic problems, the finite element solution $u_p \in S_D^{P,1}(\Omega; \Delta)$ satisfies
\[ \|u - u_p\|_{H^1(\Omega)} \le C \inf_{w \in S_D^{P,1}(\Omega;\Delta)} \|u - w\|_{H^1(\Omega)} \le C(p+1)^{-(k-1)}\,\|u\|_{H^k(\Omega)}, \]
which together with (4.2) leads to the convergence of the p-version of FEM.

Remark 4.2. For the sake of simplicity, we prove the theorem only for the p-version with uniform degree for problems with homogeneous Dirichlet boundary conditions, but the result of the theorem and the techniques in the proof can be generalized to the p-version with quasi-uniform degree distributions for problems with homogeneous and nonhomogeneous Dirichlet boundary conditions [15] and the h-p version [18] with quasi-uniform meshes and quasi-uniform degree distribution.

REFERENCES
[1] I. Babuška and B. Guo, Direct and inverse approximation theorems of the p-version of the finite element method in the framework of weighted Besov spaces. Part 1: Approximability of functions in the weighted Besov spaces, SIAM J. Numer. Anal., 39 (2002), pp. 1512–1538.
[2] I. Babuška and B. Guo, Direct and inverse approximation theorems of the p-version of the finite element method in the framework of weighted Besov spaces. Part 2: Optimal convergence of the p-version of the finite element method, Math. Models Methods Appl. Sci., 12 (2002), pp. 689–719.
[3] I. Babuška, A. Craig, J. Mandel, and J. Pitkäranta, Efficient preconditioning for the p-version finite element method in two dimensions, SIAM J. Numer. Anal., 28 (1991), pp. 624–661.
[4] I. Babuška, R. Kellogg, and J. Pitkäranta, Direct and inverse error estimates for finite elements with mesh refinements, Numer. Math., 33 (1979), pp. 447–471.
[5] I. Babuška and M. Suri, The h-p version of the finite element method with quasiuniform meshes, RAIRO Modél. Math. Anal. Numér., 21 (1987), pp. 199–238.
[6] I. Babuška and M. Suri, The optimal convergence rate of the p-version of the finite element method, SIAM J. Numer. Anal., 24 (1987), pp. 750–776.
[7] F. B. Belgacem, Polynomial extensions of compatible polynomial traces in three dimensions, Comput. Methods Appl. Mech. Engrg., 116 (1994), pp. 235–241.
[8] C. Bernardi, M. Dauge, and Y. Maday, Polynomials in the Sobolev World, Version 2, 2007, Institut de Recherche Mathématique de Rennes (IRMAR), Université Rennes I and Laboratoire Jacques-Louis Lions (LJLL), Paris VI, France, preprint.
[9] C. Bernardi and Y. Maday, Relèvement polynomial de traces et applications, Math. Anal. Numér., 24 (1990), pp. 557–611.
[10] C. Canuto and D. Funaro, The Schwarz algorithm for spectral methods, SIAM J. Numer. Anal., 25 (1988), pp. 24–40.
[11] C. Canuto and A. Quarteroni, Approximation results for orthogonal polynomials in Sobolev spaces, Math. Comp., 38 (1982), pp. 67–86.
[12] M. Costabel, M. Dauge, and L. Demkowicz, Polynomial Extension Operators in H^1, H(curl) and H(div)-Spaces in a Cube, Math. Comp., 77 (2008), pp. 1967–1999.
[13] P. Davis and P. Rabinowitz, Methods of Numerical Integration, Academic Press, New York, 1975.
[14] P. Grisvard, Elliptic Problems in Nonsmooth Domains, Pitman Publishing, Boston, London, Melbourne, 1985.
[15] B. Guo, Approximation theory of the p-version of the finite element method in three dimensions, part 2: Convergence of the p-version, SIAM J. Numer. Anal., to appear.
[16] B. Guo and W. Sun, The optimal convergence of the h-p version of the finite element method with quasi-uniform meshes, SIAM J. Numer. Anal., 45 (2007), pp. 698–730.
[17] B. Guo and J. Zhang, Constructive Proof of Polynomial Extensions in Two Dimensions, preprint, 2006.
[18] B. Guo and J. Zhang, The h-p version of the finite element method in three dimensions with quasi uniform meshes, in preparation.
[19] J. Lions and E. Magenes, Non-Homogeneous Boundary Value Problems and Applications, Springer, New York, 1972.
[20] Y. Maday, Relèvements de traces polynomiales et interpolations Hilbertiennes entre espaces de polynômes, C. R. Acad. Sci. Paris Sér. I Math., 309 (1989), pp. 463–468.
[21] R. Muñoz-Sola, Polynomial liftings on a tetrahedron and applications to the h-p version of the finite element method in three dimensions, SIAM J. Numer. Anal., 34 (1997), pp. 282–314.
[22] J. Zhang, The h-p Version of the Finite Element Method in Three Dimensions, Ph.D. thesis, Department of Mathematics, University of Manitoba, Winnipeg, 2008.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1226–1250
MIXED FINITE ELEMENT METHODS FOR THE FULLY NONLINEAR MONGE–AMPÈRE EQUATION BASED ON THE VANISHING MOMENT METHOD∗

XIAOBING FENG† AND MICHAEL NEILAN†

Abstract. This paper studies mixed finite element approximations of the viscosity solution to the Dirichlet problem for the fully nonlinear Monge–Ampère equation $\det(D^2 u^0) = f$ $(> 0)$ based on the vanishing moment method, which was proposed recently by the authors in [X. Feng and M. Neilan, J. Scient. Comp., DOI 10.1007/s10915-008-9221-9, 2008]. In this approach, the second-order fully nonlinear Monge–Ampère equation is approximated by the fourth-order quasilinear equation $-\varepsilon\Delta^2 u^\varepsilon + \det D^2 u^\varepsilon = f$. It was proved in [X. Feng, Trans. AMS, submitted] that the solution $u^\varepsilon$ converges to the unique convex viscosity solution $u^0$ of the Dirichlet problem for the Monge–Ampère equation. This result opens a door for constructing convergent finite element methods for fully nonlinear second-order equations, a task which has been impracticable before. The goal of this paper is threefold. First, we develop a family of Hermann–Miyoshi-type mixed finite element methods for approximating the solution $u^\varepsilon$ of the regularized fourth-order problem, which compute simultaneously $u^\varepsilon$ and the moment tensor $\sigma^\varepsilon := D^2 u^\varepsilon$. Second, we derive error estimates, which track explicitly the dependence of the error constants on the parameter $\varepsilon$, for the errors $u^\varepsilon - u_h^\varepsilon$ and $\sigma^\varepsilon - \sigma_h^\varepsilon$. Finally, we present a detailed numerical study on the rates of convergence in terms of powers of $\varepsilon$ for the errors $u^0 - u_h^\varepsilon$ and $\sigma^0 - \sigma_h^\varepsilon$, and numerically examine what is the "best" mesh size $h$ in relation to $\varepsilon$ in order to achieve these rates. Due to the strong nonlinearity of the underlying equation, the standard perturbation argument for error analysis of finite element approximations of nonlinear problems does not work for this problem. To overcome the difficulty, we employ a fixed point technique which strongly relies on the stability of the linearized problem and its mixed finite element approximations.

Key words. fully nonlinear PDEs, Monge–Ampère type equations, moment solutions, vanishing moment method, viscosity solutions, mixed finite element methods, Hermann–Miyoshi element

AMS subject classifications. 65N30, 65M60, 35J60, 53C45

DOI. 10.1137/070710378
∗ Received by the editors December 10, 2007; accepted for publication (in revised form) October 7, 2008; published electronically February 25, 2009. This work was partially supported by NSF grants DMS-0410266 and DMS-0710831. http://www.siam.org/journals/sinum/47-2/71037.html
† Department of Mathematics, The University of Tennessee, Knoxville, TN 37996 (xfeng@math.utk.edu, [email protected]).

1. Introduction. This paper is the second in a sequence (cf. [19]) which concerns finite element approximations of viscosity solutions of the following Dirichlet problem for the fully nonlinear Monge–Ampère equation (cf. [22]):
\[ \det(D^2 u^0) = f \quad \text{in } \Omega \subset \mathbf{R}^n, \tag{1.1} \]
\[ u^0 = g \quad \text{on } \partial\Omega, \tag{1.2} \]
where $\Omega$ is a convex domain with smooth boundary $\partial\Omega$. $D^2 u^0(x)$ and $\det(D^2 u^0(x))$ denote the Hessian of $u^0$ at $x \in \Omega$ and the determinant of $D^2 u^0(x)$. The Monge–Ampère equation is a prototype of fully nonlinear second-order PDEs, which have the general form
\[ F(D^2 u^0, Du^0, u^0, x) = 0 \tag{1.3} \]
with $F(D^2 u^0, Du^0, u^0, x) = \det(D^2 u^0) - f$. The Monge–Ampère equation arises naturally from differential geometry and from applications such as mass transportation,
meteorology, and geostrophic fluid dynamics [4, 8]. It is well known that, for a non-strictly convex domain $\Omega$, the above problem does not have classical solutions in general even when $f$, $g$, and $\partial\Omega$ are smooth (see [21]). A classical result of Aleksandrov states that the Dirichlet problem with $f > 0$ has a unique generalized solution in the class of convex functions (cf. [1, 9]). Major progress on the analysis of problem (1.1)–(1.2) was made later, after the introduction and establishment of the viscosity solution theory (cf. [7, 12, 22]). We recall that the notion of viscosity solutions was first introduced by Crandall and Lions [11] in 1983 for first-order fully nonlinear Hamilton–Jacobi equations. It was quickly extended to second-order fully nonlinear PDEs, with dramatic consequences in the wake of the breakthrough of Jensen's maximum principle [24] and Ishii's discovery [23] that the classical Perron method could be used to infer existence of viscosity solutions. To continue our discussion, we need to recall the definition of viscosity solutions for the Dirichlet Monge–Ampère problem (1.1)–(1.2) (cf. [22]).

Definition 1.1. A convex function $u^0 \in C^0(\Omega)$ satisfying $u^0 = g$ on $\partial\Omega$ is called a viscosity subsolution (resp., viscosity supersolution) of (1.1) if for any $\varphi \in C^2$ there holds $\det(D^2\varphi(x_0)) \ge f(x_0)$ (resp., $\det(D^2\varphi(x_0)) \le f(x_0)$) provided that $u^0 - \varphi$ has a local maximum (resp., a local minimum) at $x_0 \in \Omega$. $u^0 \in C^0(\Omega)$ is called a viscosity solution if it is both a viscosity subsolution and a viscosity supersolution.

It is clear that the notion of viscosity solutions is not variational. It is based on a "differentiation by parts" approach, instead of the more familiar integration by parts approach. As a result, it is not possible to directly approximate viscosity solutions using Galerkin type numerical methods such as finite element, spectral, and discontinuous Galerkin methods, all of which are based on variational formulations of PDEs.
The situation also presents a big challenge and paradox for the numerical PDE community, since, on one hand, the "differentiation by parts" approach has worked remarkably well for establishing the viscosity solution theory for fully nonlinear second-order PDEs in the past two decades; on the other hand, it is extremely difficult (if at all possible) to mimic this approach at the discrete level. It should be noted that, unlike in the case of fully nonlinear first-order PDEs, the terminology "viscosity solution" loses its original meaning in the case of fully nonlinear second-order PDEs.

Motivated by this difficulty and by the goal of developing convergent Galerkin type numerical methods for fully nonlinear second-order PDEs, very recently we proposed in [18] a new notion of weak solutions, called moment solutions, which is defined using a constructive method, called the vanishing moment method. The main idea of the vanishing moment method is to approximate a fully nonlinear second-order PDE by a quasilinear higher order PDE. The notion of moment solutions and the vanishing moment method are natural generalizations of the original definition of viscosity solutions and the vanishing viscosity method introduced for the Hamilton–Jacobi equations in [11]. We now briefly recall the definitions of moment solutions and the vanishing moment method, and refer the reader to [16, 18] for a detailed exposition.

The first step of the vanishing moment method is to approximate the fully nonlinear PDE (1.3) by the following quasilinear fourth-order PDE:
\[ -\varepsilon\Delta^2 u^\varepsilon + F(D^2 u^\varepsilon, Du^\varepsilon, u^\varepsilon, x) = 0 \qquad (\varepsilon > 0), \tag{1.4} \]
which holds in the domain $\Omega$. If the Dirichlet boundary condition $u^0 = g$ is prescribed on the boundary $\partial\Omega$, then it is natural to impose the same boundary condition on $u^\varepsilon$, that is,
\[ u^\varepsilon = g \quad \text{on } \partial\Omega. \tag{1.5} \]
However, boundary condition (1.5) alone is not sufficient to ensure uniqueness for fourth-order PDEs. An additional boundary condition must be imposed. In [16] the authors proposed to use one of the following (extra) boundary conditions:
\[ \Delta u^\varepsilon = \varepsilon \quad \text{or} \quad D^2 u^\varepsilon \nu \cdot \nu = \varepsilon \quad \text{on } \partial\Omega, \tag{1.6} \]
where $\nu$ stands for the unit outward normal to $\partial\Omega$. Although both boundary conditions work well numerically, the first boundary condition $\Delta u^\varepsilon = \varepsilon$ is more convenient for standard finite element, spectral, and discontinuous Galerkin methods (cf. [19]), while the second boundary condition $D^2 u^\varepsilon \nu \cdot \nu = \varepsilon$ fits better for mixed finite element methods, and hence, it will be used in this paper. In summary, the vanishing moment method involves approximating the second-order boundary value problem (1.2)–(1.3) by the fourth-order boundary value problem (1.4), (1.5), and (1.6). In the case of the Monge–Ampère equation, this means that we approximate boundary value problem (1.1)–(1.2) by the following problem:
\[ -\varepsilon\Delta^2 u^\varepsilon + \det(D^2 u^\varepsilon) = f \quad \text{in } \Omega, \tag{1.7} \]
\[ u^\varepsilon = g \quad \text{on } \partial\Omega, \tag{1.8} \]
\[ D^2 u^\varepsilon \nu \cdot \nu = \varepsilon \quad \text{on } \partial\Omega. \tag{1.9} \]
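To make the structure of (1.7)–(1.9) concrete, the following minimal sketch considers a one-dimensional analogue on $(0,1)$; this analogue is an illustration introduced here and is not taken from the paper. In one dimension $\det(D^2 u) = u''$, so the regularized problem becomes the linear equation $-\varepsilon u'''' + u'' = f$ with $u = g$ and $u'' = \varepsilon$ at both endpoints. Introducing the auxiliary variable $\sigma = u''$ splits it into two second-order problems, each solved below by central finite differences. The data $f = 2$, $g(0) = 0$, $g(1) = 1$ (so that the limit solution is $x^2$), the grid size, and all function names are assumptions made for the example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system.

    All four lists have length n; lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def vanishing_moment_1d(eps, f, g0, g1, n=200):
    """Solve -eps*u'''' + u'' = f on (0,1), u = g and u'' = eps at x = 0, 1.

    Hypothetical 1-D model: with sigma = u'' the problem splits into
      (a) -eps*sigma'' + sigma = f,  sigma(0) = sigma(1) = eps,
      (b) u'' = sigma,               u(0) = g0, u(1) = g1.
    Returns the interior grid points and the values of u there.
    """
    h = 1.0 / n
    m = n - 1                      # number of interior nodes
    xs = [i * h for i in range(1, n)]

    # (a) discrete version of -eps*sigma'' + sigma = f
    k = eps / (h * h)
    rhs = [f(x) for x in xs]
    rhs[0] += k * eps              # boundary value sigma(0) = eps
    rhs[-1] += k * eps             # boundary value sigma(1) = eps
    sigma = solve_tridiagonal([-k] * m, [2.0 * k + 1.0] * m, [-k] * m, rhs)

    # (b) discrete version of u'' = sigma, written as -u'' = -sigma
    rhs = [-h * h * s for s in sigma]
    rhs[0] += g0
    rhs[-1] += g1
    u = solve_tridiagonal([-1.0] * m, [2.0] * m, [-1.0] * m, rhs)
    return xs, u


def max_error(eps, n=200):
    """Maximum nodal distance between u^eps and the limit u0(x) = x^2 (f = 2)."""
    xs, u = vanishing_moment_1d(eps, lambda x: 2.0, 0.0, 1.0, n)
    return max(abs(ui - x * x) for x, ui in zip(xs, u))


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, max_error(eps))
```

In this linear toy problem the distance to the limit solution shrinks as $\varepsilon$ decreases, which mirrors the convergence $u^\varepsilon \to u^0$ described in the text; it says nothing about rates for the genuinely nonlinear problem in two or three dimensions.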
It was proved in [16] that, if $f > 0$ in $\Omega$, then problem (1.7)–(1.9) has a unique solution $u^\varepsilon$ which is a strictly convex function over $\Omega$. Moreover, $u^\varepsilon$ converges uniformly as $\varepsilon \to 0$ to the unique viscosity solution of (1.1)–(1.2). As a result, this shows that (1.1)–(1.2) possesses a unique moment solution that coincides with the unique viscosity solution. Furthermore, it was proved that the following a priori bounds hold, which will be used frequently later in this paper:
\[ \|u^\varepsilon\|_{H^j} = O\bigl(\varepsilon^{-\frac{j-1}{2}}\bigr), \qquad \|u^\varepsilon\|_{W^{2,\infty}} = O\bigl(\varepsilon^{-1}\bigr), \tag{1.10} \]
\[ \|D^2 u^\varepsilon\|_{L^2} = O\bigl(\varepsilon^{-\frac12}\bigr), \qquad \|\mathrm{cof}(D^2 u^\varepsilon)\|_{L^\infty} = O\bigl(\varepsilon^{-1}\bigr) \tag{1.11} \]
for $j = 2, 3$, where $\mathrm{cof}(D^2 u^\varepsilon)$ denotes the cofactor matrix of the Hessian $D^2 u^\varepsilon$.

With the help of the vanishing moment methodology, the original difficult task of computing the unique convex viscosity solution of the fully nonlinear Monge–Ampère problem (1.1)–(1.2), which has multiple solutions (i.e., there are nonconvex solutions), is now reduced to the feasible task of computing the unique regular solution of the quasilinear fourth-order problem (1.7)–(1.9). This opens a door to using and/or adapting the wealth of existing numerical methods, in particular, finite element Galerkin methods, to solve the original problem (1.1)–(1.2) via the problem (1.7)–(1.9).

The goal of this paper is to construct and analyze a class of Hermann–Miyoshi-type mixed finite element methods for approximating the solution of (1.7)–(1.9). In particular, we are interested in deriving error bounds that exhibit explicit dependence on $\varepsilon$. We would like to point out that one of our motivations for developing mixed finite element methods for (1.7)–(1.9) is that our experience in [19] tells us that Galerkin methods are numerically expensive for solving the singularly perturbed problem (1.7)–(1.9) (see [18] for a detailed numerical study). Finite element approximations of fourth-order PDEs, in particular, the biharmonic equation, were carried out extensively in the 1970s in the two-dimensional case (see [10] and the references therein), and have attracted renewed interest lately for generalizing the well known 2-D finite elements to the 3-D case (cf. [33, 34, 32]) and for developing discontinuous Galerkin methods in all dimensions (cf. [17, 26]). Clearly, all these methods can be readily adapted to discretize problem (1.7)–(1.9), although their convergence analysis does not come easily due to the strong nonlinearity of the PDE (1.7). We refer the reader to [19, 27] for further discussions in this direction.

A few attempts and results on numerical approximations of the Monge–Ampère as well as related equations have recently been reported in the literature. Oliker and Prussner [29] constructed a finite difference scheme for computing the Aleksandrov measure induced by $D^2 u$ in 2-D and obtained the solution $u$ of problem (1.7)–(1.9) as a by-product. Baginski and Whitaker [2] proposed a finite difference scheme for the Gauss curvature equation (cf. [18] and the references therein) in 2-D by mimicking the unique continuation method (used to prove existence of the PDE) at the discrete level. In a series of papers (cf. [13] and the references therein) Dean and Glowinski proposed an augmented Lagrange multiplier method and a least squares method for problem (1.7)–(1.9) and Pucci's equation (cf. [7, 21]) in 2-D by treating the Monge–Ampère equation and Pucci's equation as a constraint and using a variational criterion to select a particular solution. Very recently, Oberman [28] constructed some wide stencil finite difference schemes which fulfill the convergence criterion established by Barles and Souganidis in [3] for finite difference approximations of fully nonlinear second-order PDEs. Consequently, the convergence of the proposed wide stencil finite difference scheme immediately follows from the general convergence framework of [3].
Results of numerical experiments were reported in [29, 28, 2, 13]; however, convergence analysis was not addressed except in [28].

The remainder of this paper is organized as follows. In section 2, we first derive the Hermann–Miyoshi mixed weak formulation for problem (1.7)–(1.9) and then present our mixed finite element methods based on this weak formulation. Section 3 is devoted to studying the linearization of problem (1.7)–(1.9) and its mixed finite element approximations. The results of this section, which are of independent interest, will play a crucial role in our error analysis for the mixed finite element method introduced in section 2. In section 4, we establish error estimates in the $H^1 \times L^2$-norm for the mixed finite element solution $(u_h^\varepsilon, \sigma_h^\varepsilon)$. Our main ideas are to use a fixed point technique and to make strong use of the stability property of the linearized problem and its finite element approximations, all of which are established in section 3. In addition, we derive the optimal order error estimate in the $H^1$-norm for $u^\varepsilon - u_h^\varepsilon$ using a duality argument. Finally, in section 5, we first run some numerical tests to validate our theoretical error estimates, and we then present a detailed computational study for determining the "best" choice of mesh size $h$ in terms of $\varepsilon$ in order to achieve the optimal rates of convergence, and for estimating the rates of convergence for both $u^0 - u_h^\varepsilon$ and $u^0 - u^\varepsilon$ in terms of powers of $\varepsilon$.

We conclude this section by remarking that standard space notations are adopted in this paper; we refer to [5, 21, 10] for their exact definitions. In addition, $\Omega$ denotes a bounded domain in $\mathbf{R}^n$ for $n = 2, 3$. $(\cdot,\cdot)$ and $\langle\cdot,\cdot\rangle$ denote the $L^2$-inner products on $\Omega$ and on $\partial\Omega$, respectively. For a Banach space $B$, its dual space is denoted by $B^*$. $C$ is used to denote a generic $\varepsilon$-independent positive constant.

2. Formulation of mixed finite element methods. There are several popular mixed formulations for fourth-order problems (cf. [6, 10, 15]). However, since the Hessian matrix $D^2 u^\varepsilon$ appears in (1.7) in a nonlinear fashion, we cannot use $\Delta u^\varepsilon$ alone as our additional variable, but rather we are forced to use $\sigma^\varepsilon := D^2 u^\varepsilon$ as a new variable. Because of this, we rule out the family of Ciarlet–Raviart mixed finite elements (cf. [10]). On the other hand, this observation suggests trying Hermann–Miyoshi or Hermann–Johnson mixed elements (cf. [6, 15, 30, 31]), both of which seek $\sigma^\varepsilon$ as an additional unknown. In this paper, we shall only focus on developing Hermann–Miyoshi-type mixed methods. We begin with a few more space notations:
\[ V := H^1(\Omega), \qquad W := \bigl\{\mu \in [H^1(\Omega)]^{n\times n};\ \mu_{ij} = \mu_{ji}\bigr\}, \]
\[ V_0 := H_0^1(\Omega), \qquad W_\varepsilon := \{\mu \in W;\ \mu\nu\cdot\nu|_{\partial\Omega} = \varepsilon\}, \]
\[ V_g := \{v \in V;\ v|_{\partial\Omega} = g\}, \qquad W_0 := \{\mu \in W;\ \mu\nu\cdot\nu|_{\partial\Omega} = 0\}. \]
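To define the Hermann–Miyoshi mixed formulation for problem (1.7)–(1.9), we rewrite the PDE into the following system of second-order equations:
\[ \sigma^\varepsilon - D^2 u^\varepsilon = 0, \tag{2.1} \]
\[ -\varepsilon\Delta\,\mathrm{tr}(\sigma^\varepsilon) + \det(\sigma^\varepsilon) = f. \tag{2.2} \]
Testing (2.2) with $v \in V_0$ yields
\[ \varepsilon\int_\Omega \mathrm{div}(\sigma^\varepsilon)\cdot Dv\,dx + \int_\Omega \det(\sigma^\varepsilon)\,v\,dx = \int_\Omega f v\,dx. \tag{2.3} \]
Multiplying (2.1) by $\mu \in W_0$ and integrating over $\Omega$ we get
\[ \int_\Omega \sigma^\varepsilon : \mu\,dx + \int_\Omega Du^\varepsilon\cdot\mathrm{div}(\mu)\,dx = \sum_{k=1}^{n-1}\int_{\partial\Omega} \mu\nu\cdot\tau_k\,\frac{\partial g}{\partial\tau_k}\,ds, \tag{2.4} \]
where $\sigma^\varepsilon : \mu$ denotes the matrix inner product and $\{\tau_1(x), \tau_2(x), \ldots, \tau_{n-1}(x)\}$ denotes the standard basis for the tangent space to $\partial\Omega$ at $x$. From (2.3) and (2.4), we define the variational formulation for (2.1)–(2.2) as follows: Find $(u^\varepsilon, \sigma^\varepsilon) \in V_g \times W_\varepsilon$ such that
\[ (\sigma^\varepsilon, \mu) + (\mathrm{div}(\mu), Du^\varepsilon) = \langle\tilde g, \mu\rangle \qquad \forall \mu \in W_0, \tag{2.5} \]
\[ (\mathrm{div}(\sigma^\varepsilon), Dv) + \frac{1}{\varepsilon}(\det\sigma^\varepsilon, v) = (f^\varepsilon, v) \qquad \forall v \in V_0, \tag{2.6} \]
where
\[ \langle\tilde g, \mu\rangle = \sum_{i=1}^{n-1}\Bigl\langle \frac{\partial g}{\partial\tau_i},\ \mu\nu\cdot\tau_i \Bigr\rangle \quad \text{and} \quad f^\varepsilon = \frac{1}{\varepsilon} f. \]
To discretize (2.5)–(2.6), let $T_h$ be a quasiuniform triangular or rectangular partition of $\Omega$ if $n = 2$ and a quasiuniform tetrahedral or 3-D rectangular mesh if $n = 3$. Let $V^h \subset H^1(\Omega)$ be the Lagrange finite element space consisting of continuous piecewise polynomials of degree $k$ $(\ge 2)$ associated with the mesh $T_h$. Let
\[ V_g^h := V^h \cap V_g, \qquad V_0^h := V^h \cap V_0, \]
\[ W_\varepsilon^h := [V^h]^{n\times n} \cap W_\varepsilon, \qquad W_0^h := [V^h]^{n\times n} \cap W_0. \]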
In the 2-D case, the above choices of $V_0^h$ and $W_0^h$ are known as the Hermann–Miyoshi mixed finite element for the biharmonic equation (cf. [6, 15]). They form a stable pair which satisfies the inf-sup condition. We note that it is easy to check that the Hermann–Miyoshi mixed finite element also satisfies the inf-sup condition in 3-D; see section 3.2 for the details. Based on the weak formulation (2.5)–(2.6) and using the above finite element spaces, we now define our Hermann–Miyoshi-type mixed finite element method for (1.7)–(1.9) as follows: Find $(u_h^\varepsilon, \sigma_h^\varepsilon) \in V_g^h \times W_\varepsilon^h$ such that
\[ (\sigma_h^\varepsilon, \mu_h) + (\mathrm{div}(\mu_h), Du_h^\varepsilon) = \langle\tilde g, \mu_h\rangle \qquad \forall \mu_h \in W_0^h, \tag{2.7} \]
\[ (\mathrm{div}(\sigma_h^\varepsilon), Dv_h) + \frac{1}{\varepsilon}(\det(\sigma_h^\varepsilon), v_h) = (f^\varepsilon, v_h) \qquad \forall v_h \in V_0^h. \tag{2.8} \]
Let $(\sigma^\varepsilon, u^\varepsilon)$ be the solution to (2.5)–(2.6) and let $(\sigma_h^\varepsilon, u_h^\varepsilon)$ solve (2.7)–(2.8). As mentioned in section 1, the primary goal of this paper is to derive error estimates for $u^\varepsilon - u_h^\varepsilon$ and $\sigma^\varepsilon - \sigma_h^\varepsilon$. To this end, we first need to prove existence and uniqueness of $(\sigma_h^\varepsilon, u_h^\varepsilon)$. It turns out that neither task is easy to accomplish due to the strong nonlinearity in (2.8). Unlike in the continuous PDE case, where $u^\varepsilon$ is proved to be convex for all $\varepsilon$ (cf. [16]), it is far from clear whether $u_h^\varepsilon$ preserves convexity even for small $\varepsilon$ and $h$. Without a guarantee of convexity for $u_h^\varepsilon$, we could not establish any stability result for $u_h^\varepsilon$. This, in turn, makes proving existence and uniqueness a difficult and delicate task. In addition, again due to the strong nonlinearity, the standard perturbation technique for deriving error estimates for numerical approximations of mildly nonlinear problems does not work here. To overcome the difficulty, our idea is to adopt a combined fixed point and linearization technique which was used by the authors in [20], where a nonlinear singular second-order problem known as the inverse mean curvature flow was studied. We note that this combined fixed point and linearization technique kills three birds with one stone, that is, it simultaneously proves existence and uniqueness for $u_h^\varepsilon$ and also yields the desired error estimates. In the next two sections, we shall give a detailed account of the technique and realize it for problem (2.7)–(2.8).

3. Linearized problem and its finite element approximations. To build the necessary technical tools, in this section we shall derive and present a detailed study of the linearization of (2.5)–(2.6) and its mixed finite element approximations. First, we recall the following divergence-free row property for the cofactor matrices, which will be frequently used in later sections. We refer to [14, p. 440] for a short proof of the lemma.

Lemma 3.1. Given a vector-valued function $v = (v_1, v_2, \ldots, v_n) : \Omega \to \mathbf{R}^n$.
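Assume $v \in [C^2(\Omega)]^n$. Then the cofactor matrix $\mathrm{cof}(Dv)$ of the gradient matrix $Dv$ of $v$ satisfies the following row divergence-free property:
\[ \mathrm{div}(\mathrm{cof}(Dv))_i = \sum_{j=1}^{n} \partial_{x_j}(\mathrm{cof}(Dv))_{ij} = 0 \qquad \text{for } i = 1, 2, \ldots, n, \tag{3.1} \]
where $(\mathrm{cof}(Dv))_i$ and $(\mathrm{cof}(Dv))_{ij}$ denote, respectively, the $i$th row and the $(i, j)$-entry of $\mathrm{cof}(Dv)$.

3.1. Derivation of linearized problem. We note that for a given function $w$ there holds
\[ \det(D^2(u^\varepsilon + tw)) = \det(D^2 u^\varepsilon) + t\,\mathrm{tr}(\Phi^\varepsilon D^2 w) + \cdots + t^n \det(D^2 w), \]
where $\Phi^\varepsilon := \mathrm{cof}(D^2 u^\varepsilon)$. Thus, setting $t = 0$ after differentiating with respect to $t$ we find the linearization of $M^\varepsilon(u^\varepsilon) := -\varepsilon\Delta^2 u^\varepsilon + \det(D^2 u^\varepsilon)$ at the solution $u^\varepsilon$ to be
\[ L_{u^\varepsilon}(w) := -\varepsilon\Delta^2 w + \mathrm{tr}(\Phi^\varepsilon D^2 w) = -\varepsilon\Delta^2 w + \Phi^\varepsilon : D^2 w = -\varepsilon\Delta^2 w + \mathrm{div}(\Phi^\varepsilon Dw), \]
where we have used (3.1) with $v = Du^\varepsilon$. We now consider the following linear problem:
\[ L_{u^\varepsilon}(w) = q \quad \text{in } \Omega, \tag{3.2} \]
\[ w = 0 \quad \text{on } \partial\Omega, \tag{3.3} \]
\[ D^2 w\,\nu\cdot\nu = 0 \quad \text{on } \partial\Omega. \tag{3.4} \]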
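The determinant expansion used in section 3.1 to derive the linearization can be checked on concrete symmetric matrices. The sketch below is an illustration added here, with hypothetical helper names and sample matrices; it uses exact integer arithmetic in pure Python. With $A$ standing in for $D^2 u^\varepsilon$ and $B$ for $D^2 w$, it verifies $\det(A + tB) = \det A + t\,\mathrm{cof}(A) : B + t^2 \det B$ for $n = 2$ and $\det(A + tB) = \det A + t\,\mathrm{cof}(A) : B + t^2\,\mathrm{cof}(B) : A + t^3 \det B$ for $n = 3$, so the coefficient of $t$ is $\Phi : D^2 w$ in both cases.

```python
def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    n = len(m)
    return [[m[r][c] for c in range(n) if c != j] for r in range(n) if r != i]


def det(m):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def cof(m):
    """Cofactor matrix: entry (i, j) is (-1)^(i+j) times the (i, j) minor."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
            for i in range(n)]


def frob(a, b):
    """Matrix inner product a : b = sum_ij a_ij * b_ij."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))


def det_shift(a, b, t):
    """det(a + t*b), computed directly."""
    n = len(a)
    return det([[a[i][j] + t * b[i][j] for j in range(n)] for i in range(n)])


def det_expansion(a, b, t):
    """Polynomial expansion of det(a + t*b) in t for n = 2 or n = 3."""
    n = len(a)
    if n == 2:
        return det(a) + t * frob(cof(a), b) + t ** 2 * det(b)
    if n == 3:
        return (det(a) + t * frob(cof(a), b)
                + t ** 2 * frob(cof(b), a) + t ** 3 * det(b))
    raise ValueError("only n = 2 or n = 3 is handled in this sketch")


if __name__ == "__main__":
    A2, B2 = [[2, 1], [1, 3]], [[1, -1], [-1, 4]]
    A3 = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
    B3 = [[1, 0, 2], [0, -1, 1], [2, 1, 3]]
    for t in (-2, 1, 3):
        print(t, det_shift(A2, B2, t) == det_expansion(A2, B2, t),
              det_shift(A3, B3, t) == det_expansion(A3, B3, t))
```

The check covers only the algebraic expansion; the further step $\Phi^\varepsilon : D^2 w = \mathrm{div}(\Phi^\varepsilon Dw)$ relies on the row divergence-free property (3.1) and is not tested here.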
To introduce a mixed formulation for (3.2)–(3.4), we rewrite the PDE as
\[ \chi - D^2 w = 0, \tag{3.5} \]
\[ -\varepsilon\Delta\,\mathrm{tr}(\chi) + \mathrm{div}(\Phi^\varepsilon Dw) = q. \tag{3.6} \]
Its variational formulation is then defined as: Given $q \in V_0^*$, find $(\chi, w) \in W_0 \times V_0$ such that
\[ (\chi, \mu) + (\mathrm{div}(\mu), Dw) = 0 \qquad \forall \mu \in W_0, \tag{3.7} \]
\[ (\mathrm{div}(\chi), Dv) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dw, Dv) = \frac{1}{\varepsilon}\langle q, v\rangle \qquad \forall v \in V_0. \tag{3.8} \]
It is not hard to show that if $(\chi, w)$ solves (3.7)–(3.8), then $w \in H^2(\Omega) \cap H_0^1(\Omega)$ is a weak solution to problem (3.2)–(3.4). On the other hand, by the elliptic theory for linear PDEs (cf. [25]), we know that if $q \in V_0^*$, then the solution to problem (3.2)–(3.4) satisfies $w \in H^3(\Omega)$, so that $\chi = D^2 w \in H^1(\Omega)$. It is easy to verify that $(\chi, w)$ is a solution to (3.7)–(3.8).

3.2. Mixed finite element approximations of the linearized problem. Our finite element method for (3.7)–(3.8) is defined by seeking $(\chi_h, w_h) \in W_0^h \times V_0^h$ such that
\[ (\chi_h, \mu_h) + (\mathrm{div}(\mu_h), Dw_h) = 0 \qquad \forall \mu_h \in W_0^h, \tag{3.9} \]
\[ (\mathrm{div}(\chi_h), Dv_h) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dw_h, Dv_h) = \frac{1}{\varepsilon}\langle q, v_h\rangle \qquad \forall v_h \in V_0^h. \tag{3.10} \]
The objectives of this subsection are to first prove existence and uniqueness for problem (3.9)–(3.10) and then derive error estimates in various norms. First, we prove the following inf-sup condition for the mixed finite element pair $(W_0^h, V_0^h)$.

Lemma 3.2. There exists a constant $\beta_0 > 0$, independent of $h$, such that for every $v_h \in V_0^h$,
\[ \sup_{\mu_h \in W_0^h} \frac{(\mathrm{div}(\mu_h), Dv_h)}{\|\mu_h\|_{H^1}} \ge \beta_0\,\|v_h\|_{H^1}. \tag{3.11} \]
Proof. Given $v_h \in V_0^h$, set $\mu_h = I_{n\times n} v_h$. Then
\[ (\mathrm{div}(\mu_h), Dv_h) = \|Dv_h\|_{L^2}^2 \ge \beta_0\,\|v_h\|_{H^1}^2 = \beta_0\,\|v_h\|_{H^1}\,\|\mu_h\|_{H^1}. \]
Here we have used the Poincaré inequality.

Remark 3.1. By [15, Proposition 1], (3.11) implies that there exists a linear operator $\Pi_h : W \to W^h$ such that
\[ (\mathrm{div}(\mu - \Pi_h\mu), Dv_h) = 0 \qquad \forall v_h \in V_0^h, \tag{3.12} \]
and for $\mu \in W \cap [H^r(\Omega)]^{n\times n}$, $r \ge 1$, there holds
\[ \|\mu - \Pi_h\mu\|_{H^j} \le C h^{l-j}\,\|\mu\|_{H^l}, \qquad j = 0, 1, \quad 1 \le l \le \min\{k+1, r\}. \tag{3.13} \]
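We note that the above results were proved in the 2-D case in [15]; however, they also hold in the 3-D case since (3.11) holds in 3-D.

Theorem 3.1. For any $q \in V_0^*$, there exists a unique solution $(\chi_h, w_h) \in W_0^h \times V_0^h$ to problem (3.9)–(3.10).

Proof. Since we are in the finite dimensional case and the problem is linear, it suffices to show uniqueness. Thus, suppose $(\chi_h, w_h) \in W_0^h \times V_0^h$ solves
\[ (\chi_h, \mu_h) + (\mathrm{div}(\mu_h), Dw_h) = 0 \qquad \forall \mu_h \in W_0^h, \]
\[ (\mathrm{div}(\chi_h), Dv_h) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dw_h, Dv_h) = 0 \qquad \forall v_h \in V_0^h. \]
Let $\mu_h = \chi_h$, $v_h = w_h$, and subtract the two equations to obtain
\[ (\chi_h, \chi_h) + \frac{1}{\varepsilon}(\Phi^\varepsilon Dw_h, Dw_h) = 0. \]
Since $u^\varepsilon$ is strictly convex, $\Phi^\varepsilon$ is positive definite. Thus, there exists $\theta > 0$ such that
\[ \|\chi_h\|_{L^2}^2 + \frac{\theta}{\varepsilon}\,\|Dw_h\|_{L^2}^2 \le 0. \]
Hence, $\chi_h = 0$, $w_h = 0$, and the desired result follows.

Theorem 3.2. Let $(\chi, w) \in ([H^r(\Omega)]^{n\times n} \cap W_0) \times (H^r(\Omega) \cap V_0)$ $(r \ge 2)$ be the solution to (3.7)–(3.8) and let $(\chi_h, w_h) \in W_0^h \times V_0^h$ solve (3.9)–(3.10). Then there hold
\[ \|\chi - \chi_h\|_{L^2} \le C\varepsilon^{-\frac32} h^{l-2}\,\bigl[\|\chi\|_{H^l} + \|w\|_{H^l}\bigr], \tag{3.14} \]
\[ \|\chi - \chi_h\|_{H^1} \le C\varepsilon^{-\frac32} h^{l-3}\,\bigl[\|\chi\|_{H^l} + \|w\|_{H^l}\bigr], \tag{3.15} \]
\[ \|w - w_h\|_{H^1} \le C\varepsilon^{-3} h^{l-1}\,\bigl[\|\chi\|_{H^l} + \|w\|_{H^l}\bigr], \tag{3.16} \]
where $l := \min\{k+1, r\}$. Moreover, for $k \ge 3$ there also holds
\[ \|w - w_h\|_{L^2} \le C\varepsilon^{-5} h^{l}\,\bigl[\|\chi\|_{H^l} + \|w\|_{H^l}\bigr]. \tag{3.17} \]
Proof. Let $I_h w$ denote the standard finite element interpolant of $w$ in $V_0^h$. Then
\[ (\Pi_h\chi - \chi_h, \mu_h) + (\mathrm{div}(\mu_h), D(I_h w - w_h)) = (\Pi_h\chi - \chi, \mu_h) + (\mathrm{div}(\mu_h), D(I_h w - w)), \tag{3.18} \]
\[ (\mathrm{div}(\Pi_h\chi - \chi_h), Dv_h) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(I_h w - w_h), Dv_h) = -\frac{1}{\varepsilon}(\Phi^\varepsilon D(I_h w - w), Dv_h). \tag{3.19} \]
Let $\mu_h = \Pi_h\chi - \chi_h$ and $v_h = I_h w - w_h$ and subtract (3.19) from (3.18) to get
\[ (\Pi_h\chi - \chi_h, \Pi_h\chi - \chi_h) + \frac{1}{\varepsilon}(\Phi^\varepsilon D(I_h w - w_h), D(I_h w - w_h)) = (\Pi_h\chi - \chi, \Pi_h\chi - \chi_h) + (\mathrm{div}(\Pi_h\chi - \chi_h), D(I_h w - w)) + \frac{1}{\varepsilon}(\Phi^\varepsilon D(I_h w - w), D(I_h w - w_h)). \]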
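Thus,
\[ \|\Pi_h\chi - \chi_h\|_{L^2}^2 + \frac{\theta}{\varepsilon}\,\|D(I_h w - w_h)\|_{L^2}^2 \le \|\Pi_h\chi - \chi\|_{L^2}\,\|\Pi_h\chi - \chi_h\|_{L^2} + \|\Pi_h\chi - \chi_h\|_{H^1}\,\|D(I_h w - w)\|_{L^2} + \frac{C}{\varepsilon^2}\,\|D(I_h w - w)\|_{L^2}\,\|D(I_h w - w_h)\|_{L^2} \]
\[ \le \|\Pi_h\chi - \chi\|_{L^2}\,\|\Pi_h\chi - \chi_h\|_{L^2} + C h^{-1}\,\|\Pi_h\chi - \chi_h\|_{L^2}\,\|D(I_h w - w)\|_{L^2} + \frac{C}{\varepsilon^2}\,\|D(I_h w - w)\|_{L^2}\,\|D(I_h w - w_h)\|_{L^2}, \]
where we have used the inverse inequality. Using the Schwarz inequality and rearranging terms yields
\[ \|\Pi_h\chi - \chi_h\|_{L^2}^2 + \frac{1}{\varepsilon}\,\|D(I_h w - w_h)\|_{L^2}^2 \le C\bigl( \|\Pi_h\chi - \chi\|_{L^2}^2 + h^{-2}\,\|I_h w - w\|_{H^1}^2 + \varepsilon^{-3}\,\|I_h w - w\|_{H^1}^2 \bigr). \tag{3.20} \]
Hence, by the standard interpolation results [5, 10] we have
\[ \|\Pi_h\chi - \chi_h\|_{L^2} \le C\bigl( \|\Pi_h\chi - \chi\|_{L^2} + h^{-1}\,\|I_h w - w\|_{H^1} + \varepsilon^{-\frac32}\,\|I_h w - w\|_{H^1} \bigr) \le C\varepsilon^{-\frac32} h^{l-2}\,\bigl(\|\chi\|_{H^l} + \|w\|_{H^l}\bigr), \]
which, by the triangle inequality, yields
\[ \|\chi - \chi_h\|_{L^2} \le C\varepsilon^{-\frac32} h^{l-2}\,\bigl(\|\chi\|_{H^l} + \|w\|_{H^l}\bigr). \]
The above estimate and the inverse inequality yield
\[ \|\chi - \chi_h\|_{H^1} \le \|\chi - \Pi_h\chi\|_{H^1} + \|\Pi_h\chi - \chi_h\|_{H^1} \le \|\chi - \Pi_h\chi\|_{H^1} + h^{-1}\,\|\Pi_h\chi - \chi_h\|_{L^2} \le C\varepsilon^{-\frac32} h^{l-3}\,\bigl(\|\chi\|_{H^l} + \|w\|_{H^l}\bigr). \]
Next, from (3.20) we have
\[ \|D(I_h w - w_h)\|_{L^2} \le \sqrt{\varepsilon}\,C\bigl( \|\Pi_h\chi - \chi\|_{L^2} + h^{-1}\,\|D(I_h w - w)\|_{L^2} + \varepsilon^{-\frac32}\,\|I_h w - w\|_{H^1} \bigr) \le C\varepsilon^{-1} h^{l-2}\,\bigl(\|\chi\|_{H^l} + \|w\|_{H^l}\bigr). \tag{3.21} \]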
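To derive (3.16), we appeal to a version of the Aubin–Nitsche duality argument (cf. [5, 10]). We consider the following auxiliary problem: Find $z \in H^2(\Omega) \cap H_0^1(\Omega)$ such that
\[ -\varepsilon\Delta^2 z + \mathrm{div}(\Phi^\varepsilon Dz) = -\Delta(w - w_h) \quad \text{in } \Omega, \]
\[ D^2 z\,\nu\cdot\nu = 0 \quad \text{on } \partial\Omega. \]
By the elliptic theory for linear PDEs (cf. [25]), we know that the above problem has a unique solution $z \in H_0^1(\Omega) \cap H^3(\Omega)$ and
\[ \|z\|_{H^3} \le C_b(\varepsilon)\,\|D(w - w_h)\|_{L^2}, \quad \text{where } C_b(\varepsilon) = O(\varepsilon^{-1}). \tag{3.22} \]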
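Setting $\kappa = D^2 z$, it is easy to verify that $(\kappa, z) \in W_0 \times V_0$ and
\begin{align*}
(\kappa, \mu) + (\mathrm{div}(\mu), Dz) &= 0 && \forall \mu \in W_0,\\
(\mathrm{div}(\kappa), Dv) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, Dv) &= \frac{1}{\varepsilon}(D(w - w_h), Dv) && \forall v \in V_0.
\end{align*}
It is easy to check that (3.9)–(3.10) produce the following error equations:
\begin{align}
(\chi - \chi_h, \mu_h) + (\mathrm{div}(\mu_h), D(w - w_h)) &= 0 && \forall \mu_h \in W_0^h, \tag{3.23}\\
(\mathrm{div}(\chi - \chi_h), Dv_h) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(w - w_h), Dv_h) &= 0 && \forall v_h \in V_0^h. \tag{3.24}
\end{align}
Thus,
\begin{align*}
\frac{1}{\varepsilon}\|D(w - w_h)\|_{L^2}^2
&= (\mathrm{div}(\kappa), D(w - w_h)) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, D(w - w_h))\\
&= (\mathrm{div}(\kappa - \Pi_h\kappa), D(w - w_h)) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, D(w - w_h)) + (\mathrm{div}(\Pi_h\kappa), D(w - w_h))\\
&= (\mathrm{div}(\kappa - \Pi_h\kappa), D(w - I_h w)) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, D(w - w_h)) + (\chi_h - \chi, \Pi_h\kappa)\\
&= (\mathrm{div}(\kappa - \Pi_h\kappa), D(w - I_h w)) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, D(w - w_h)) + (\chi_h - \chi, \Pi_h\kappa - \kappa) + (\chi_h - \chi, \kappa)\\
&= (\mathrm{div}(\kappa - \Pi_h\kappa), D(w - I_h w)) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, D(w - w_h)) + (\chi_h - \chi, \Pi_h\kappa - \kappa) + (\mathrm{div}(\chi - \chi_h), Dz)\\
&= (\mathrm{div}(\kappa - \Pi_h\kappa), D(w - I_h w)) + (\chi_h - \chi, \Pi_h\kappa - \kappa)\\
&\quad + (\mathrm{div}(\chi - \chi_h), D(z - I_h z)) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(w - w_h), D(z - I_h z))\\
&\le \|\mathrm{div}(\kappa - \Pi_h\kappa)\|_{L^2}\|D(w - I_h w)\|_{L^2} + \|\chi_h - \chi\|_{L^2}\|\Pi_h\kappa - \kappa\|_{L^2}\\
&\quad + \|\mathrm{div}(\chi - \chi_h)\|_{L^2}\|D(z - I_h z)\|_{L^2} + \frac{C}{\varepsilon^2}\|D(z - I_h z)\|_{L^2}\|D(w - w_h)\|_{L^2}\\
&\le C\left[\|D(w - I_h w)\|_{L^2} + h\|\chi_h - \chi\|_{L^2} + h^2\|\mathrm{div}(\chi - \chi_h)\|_{L^2} + \frac{h^2}{\varepsilon^2}\|D(w - w_h)\|_{L^2}\right]\|z\|_{H^3}.
\end{align*}
Then, by (3.14), (3.15), (3.21), and (3.22), we have
\[
\|D(w - w_h)\|_{L^2} \le C_b(\varepsilon)\varepsilon^{-2} h^{l-1}\left[\|\chi\|_{H^l} + \|w\|_{H^l}\right].
\]
Substituting $C_b(\varepsilon) = O(\varepsilon^{-1})$ we get (3.16).

To derive the $L^2$-norm estimate for $w - w_h$, we consider the following auxiliary problem: Find $(\kappa, z) \in W_0 \times V_0$ such that
\begin{align*}
(\kappa, \mu) + (\mathrm{div}(\mu), Dz) &= 0 && \forall \mu \in W_0,\\
(\mathrm{div}(\kappa), Dv) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, Dv) &= \frac{1}{\varepsilon}(w - w_h, v) && \forall v \in V_0.
\end{align*}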
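Assume the above problem is $H^4$ regular, that is, $z \in H^4(\Omega)$ and
\[
\|z\|_{H^4} \le C_b(\varepsilon)\|w - w_h\|_{L^2} \quad \text{with } C_b(\varepsilon) = O(\varepsilon^{-1}). \tag{3.25}
\]
We then have
\begin{align*}
\frac{1}{\varepsilon}\|w - w_h\|_{L^2}^2
&= (\mathrm{div}(\kappa), D(w - w_h)) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(w - w_h), Dz)\\
&= (\mathrm{div}(\Pi_h\kappa), D(w - w_h)) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(w - w_h), Dz) + (\mathrm{div}(\kappa - \Pi_h\kappa), D(w - w_h))\\
&= (\chi_h - \chi, \Pi_h\kappa) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, D(w - w_h)) + (\mathrm{div}(\kappa - \Pi_h\kappa), D(w - I_h w))\\
&= (\chi_h - \chi, \kappa) + (\chi_h - \chi, \Pi_h\kappa - \kappa) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, D(w - w_h)) + (\mathrm{div}(\kappa - \Pi_h\kappa), D(w - I_h w))\\
&= (\mathrm{div}(\chi - \chi_h), Dz) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(w - w_h), Dz) + (\chi_h - \chi, \Pi_h\kappa - \kappa) + (\mathrm{div}(\kappa - \Pi_h\kappa), D(w - I_h w))\\
&= (\mathrm{div}(\chi - \chi_h), D(z - I_h z)) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(w - w_h), D(z - I_h z))\\
&\quad + (\chi_h - \chi, \Pi_h\kappa - \kappa) + (\mathrm{div}(\kappa - \Pi_h\kappa), D(w - I_h w))\\
&\le \left[\|\mathrm{div}(\chi - \chi_h)\|_{L^2} + \frac{C}{\varepsilon^2}\|D(w - w_h)\|_{L^2}\right]\|D(z - I_h z)\|_{L^2}\\
&\quad + \|\chi_h - \chi\|_{L^2}\|\Pi_h\kappa - \kappa\|_{L^2} + \|\mathrm{div}(\kappa - \Pi_h\kappa)\|_{L^2}\|D(w - I_h w)\|_{L^2}\\
&\le Ch^3\left[\|\chi - \chi_h\|_{H^1} + \frac{1}{\varepsilon^2}\|w - w_h\|_{H^1}\right]\|z\|_{H^4} + Ch^2\|\chi_h - \chi\|_{L^2}\|\kappa\|_{H^2} + Ch\|w - I_h w\|_{H^1}\|\kappa\|_{H^2}\\
&\le C\varepsilon^{-5} h^{l}\left(\|\chi\|_{H^l} + \|w\|_{H^l}\right)\|z\|_{H^4}\\
&\le C C_b(\varepsilon)\varepsilon^{-5} h^{l}\left(\|\chi\|_{H^l} + \|w\|_{H^l}\right)\|w - w_h\|_{L^2},
\end{align*}
where we have used (3.14), (3.15), (3.16), (3.25), and the assumption $k \ge 3$. Dividing the above inequality by $\|w - w_h\|_{L^2}$ and substituting $C_b(\varepsilon) = O(\varepsilon^{-1})$ we get (3.17). The proof is complete.

4. Error analysis for finite element method (2.7)–(2.8). The goal of this section is to derive error estimates for the finite element method (2.7)–(2.8). Our main idea is to use a combined fixed point and linearization technique (cf. [20]).

Definition 4.1. Let $T : W_\varepsilon^h \times V_g^h \to W_\varepsilon^h \times V_g^h$ be a linear mapping such that for any $(\mu_h, v_h) \in W_\varepsilon^h \times V_g^h$, $T(\mu_h, v_h) = (T^{(1)}(\mu_h, v_h), T^{(2)}(\mu_h, v_h))$ satisfies
\begin{align}
&\left(\mu_h - T^{(1)}(\mu_h, v_h), \kappa_h\right) + \left(\mathrm{div}(\kappa_h), D\left(v_h - T^{(2)}(\mu_h, v_h)\right)\right) \notag\\
&\qquad = (\mu_h, \kappa_h) + (\mathrm{div}(\kappa_h), Dv_h) - \langle \tilde g, \kappa_h\rangle \qquad \forall \kappa_h \in W_0^h, \tag{4.1}\\
&\left(\mathrm{div}\left(\mu_h - T^{(1)}(\mu_h, v_h)\right), Dz_h\right) - \frac{1}{\varepsilon}\left(\Phi^\varepsilon D\left(v_h - T^{(2)}(\mu_h, v_h)\right), Dz_h\right) \notag\\
&\qquad = (\mathrm{div}(\mu_h), Dz_h) + \frac{1}{\varepsilon}(\det(\mu_h), z_h) - (f^\varepsilon, z_h) \qquad \forall z_h \in V_0. \tag{4.2}
\end{align}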
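By Theorem 3.1, we conclude that $T(\mu_h, v_h)$ is well defined. Clearly, any fixed point $(\chi_h, w_h)$ of the mapping $T$ (i.e., $T(\chi_h, w_h) = (\chi_h, w_h)$) is a solution to problem (2.7)–(2.8), and vice versa. The rest of this section shows that, indeed, the mapping $T$ has a unique fixed point in a small neighborhood of $(I_h\sigma^\varepsilon, I_h u^\varepsilon)$. To this end, we define
\begin{align*}
\tilde B_h(\rho) &:= \left\{(\mu_h, v_h) \in W_\varepsilon^h \times V_g^h;\ \|\mu_h - I_h\sigma^\varepsilon\|_{L^2} + \frac{1}{\sqrt{\varepsilon}}\|v_h - I_h u^\varepsilon\|_{H^1} \le \rho\right\},\\
\tilde Z_h &:= \left\{(\mu_h, v_h) \in W_\varepsilon^h \times V_g^h;\ (\mu_h, \kappa_h) + (\mathrm{div}(\kappa_h), Dv_h) = \langle \tilde g, \kappa_h\rangle \ \ \forall \kappa_h \in W_0^h\right\},\\
B_h(\rho) &:= \tilde B_h(\rho) \cap \tilde Z_h.
\end{align*}
We also assume $\sigma^\varepsilon \in H^r(\Omega)$ and set $l = \min\{k + 1, r\}$. The next lemma measures the distance between the center of $B_h(\rho)$ and its image under the mapping $T$.

Lemma 4.1. The mapping $T$ satisfies the following estimates:
\begin{align}
\left\|I_h\sigma^\varepsilon - T^{(1)}(I_h\sigma^\varepsilon, I_h u^\varepsilon)\right\|_{H^1} &\le C_1(\varepsilon) h^{l-3}\left[\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right], \tag{4.3}\\
\left\|I_h\sigma^\varepsilon - T^{(1)}(I_h\sigma^\varepsilon, I_h u^\varepsilon)\right\|_{L^2} &\le C_2(\varepsilon) h^{l-2}\left[\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right], \tag{4.4}\\
\left\|I_h u^\varepsilon - T^{(2)}(I_h\sigma^\varepsilon, I_h u^\varepsilon)\right\|_{H^1} &\le C_3(\varepsilon) h^{l-1}\left[\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right], \tag{4.5}
\end{align}
where $C_1(\varepsilon) = O(\varepsilon^{-1})$, $C_2(\varepsilon) = O(\varepsilon^{-1})$, $C_3(\varepsilon) = O(\varepsilon^{-4})$ when $n = 2$, and $C_1(\varepsilon) = O(\varepsilon^{-\frac52})$, $C_2(\varepsilon) = O(\varepsilon^{-\frac52})$, $C_3(\varepsilon) = O(\varepsilon^{-\frac{11}{2}})$ when $n = 3$.

Proof. We divide the proof into four steps.

Step 1: To ease notation we set
\[
\omega_h = I_h\sigma^\varepsilon - T^{(1)}(I_h\sigma^\varepsilon, I_h u^\varepsilon), \qquad s_h = I_h u^\varepsilon - T^{(2)}(I_h\sigma^\varepsilon, I_h u^\varepsilon).
\]
By the definition of $T$, we have for any $(\mu_h, v_h) \in W_0^h \times V_0^h$
\begin{align*}
(\omega_h, \mu_h) + (\mathrm{div}(\mu_h), Ds_h) &= (I_h\sigma^\varepsilon, \mu_h) + (\mathrm{div}(\mu_h), D(I_h u^\varepsilon)) - \langle \tilde g, \mu_h\rangle,\\
(\mathrm{div}(\omega_h), Dv_h) - \frac{1}{\varepsilon}(\Phi^\varepsilon Ds_h, Dv_h) &= (\mathrm{div}(I_h\sigma^\varepsilon), Dv_h) + \frac{1}{\varepsilon}(\det(I_h\sigma^\varepsilon), v_h) - (f^\varepsilon, v_h).
\end{align*}
It follows from (2.5)–(2.6) that, for any $(\mu_h, v_h) \in W_0^h \times V_0^h$,
\begin{align}
(\omega_h, \mu_h) + (\mathrm{div}(\mu_h), Ds_h) &= (I_h\sigma^\varepsilon - \sigma^\varepsilon, \mu_h) + (\mathrm{div}(\mu_h), D(I_h u^\varepsilon - u^\varepsilon)), \tag{4.6}\\
(\mathrm{div}(\omega_h), Dv_h) - \frac{1}{\varepsilon}(\Phi^\varepsilon Ds_h, Dv_h) &= (\mathrm{div}(I_h\sigma^\varepsilon - \sigma^\varepsilon), Dv_h) + \frac{1}{\varepsilon}(\det(I_h\sigma^\varepsilon) - \det(\sigma^\varepsilon), v_h). \tag{4.7}
\end{align}
Letting $v_h = s_h$, $\mu_h = \omega_h$ in (4.6)–(4.7), subtracting the two equations and using the mean value theorem we get
\begin{align*}
(\omega_h, \omega_h) + \frac{1}{\varepsilon}(\Phi^\varepsilon Ds_h, Ds_h)
&= (I_h\sigma^\varepsilon - \sigma^\varepsilon, \omega_h) + (\mathrm{div}(\omega_h), D(I_h u^\varepsilon - u^\varepsilon))\\
&\quad + (\mathrm{div}(\sigma^\varepsilon - I_h\sigma^\varepsilon), Ds_h) + \frac{1}{\varepsilon}(\det(\sigma^\varepsilon) - \det(I_h\sigma^\varepsilon), s_h)\\
&= (I_h\sigma^\varepsilon - \sigma^\varepsilon, \omega_h) + (\mathrm{div}(\omega_h), D(I_h u^\varepsilon - u^\varepsilon))\\
&\quad + (\mathrm{div}(\sigma^\varepsilon - I_h\sigma^\varepsilon), Ds_h) + \frac{1}{\varepsilon}(\Psi^\varepsilon : (\sigma^\varepsilon - I_h\sigma^\varepsilon), s_h),
\end{align*}
where $\Psi^\varepsilon = \mathrm{cof}(\tau I_h\sigma^\varepsilon + [1 - \tau]\sigma^\varepsilon)$ for $\tau \in [0, 1]$.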
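The mean value step in Step 1 and the norm identity used for the case n = 2 can be checked numerically on constant 2 × 2 matrices. The sketch below is only an illustration under stated assumptions: the helper names (`cof2`, `det2`, `mean_value_gap`, and so on) are hypothetical and do not come from the paper, and the matrices are plain numbers rather than finite element functions. It relies on the fact that the determinant of a 2 × 2 matrix is a quadratic form, so the mean value point can be taken as τ = 1/2, and that the cofactor matrix of a 2 × 2 matrix has the same Frobenius norm as the matrix itself.

```python
import random


def cof2(m):
    """Cofactor matrix of a 2x2 matrix m = [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return [[d, -c], [-b, a]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c


def frob_inner(p, q):
    """Frobenius inner product p : q."""
    return sum(p[i][j] * q[i][j] for i in range(2) for j in range(2))


def frob_norm(p):
    """Frobenius norm of a 2x2 matrix."""
    return frob_inner(p, p) ** 0.5


def combine(tau, a, b):
    """Entrywise tau * a + (1 - tau) * b."""
    return [[tau * a[i][j] + (1 - tau) * b[i][j] for j in range(2)]
            for i in range(2)]


def subtract(a, b):
    """Entrywise a - b."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def mean_value_gap(sigma, sigma_h, tau=0.5):
    """det(sigma) - det(sigma_h) - cof(tau*sigma_h + (1-tau)*sigma) : (sigma - sigma_h).

    For 2x2 matrices and tau = 1/2 this gap vanishes up to rounding.
    """
    psi = cof2(combine(tau, sigma_h, sigma))
    return det2(sigma) - det2(sigma_h) - frob_inner(psi, subtract(sigma, sigma_h))


def random_matrix(rng):
    """Random 2x2 matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)]


if __name__ == "__main__":
    rng = random.Random(0)
    sigma, sigma_h = random_matrix(rng), random_matrix(rng)
    print("mean value gap:", mean_value_gap(sigma, sigma_h))
    print("norm difference:", frob_norm(cof2(sigma)) - frob_norm(sigma))
```

For n = 3 the determinant is cubic, so the mean value point is no longer 1/2 in general; this is why Step 3 handles that case through the 2 × 2 minors instead.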
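Step 2: The case $n = 2$. Since $\Psi^\varepsilon$ is a $2 \times 2$ matrix whose entries are the same as those of $\tau I_h\sigma^\varepsilon + [1 - \tau]\sigma^\varepsilon$, then by (1.11) we have
\begin{align*}
\|\Psi^\varepsilon\|_{L^2} &= \|\mathrm{cof}(\tau I_h\sigma^\varepsilon + [1 - \tau]\sigma^\varepsilon)\|_{L^2} = \|\tau I_h\sigma^\varepsilon + [1 - \tau]\sigma^\varepsilon\|_{L^2}\\
&\le \|I_h\sigma^\varepsilon\|_{L^2} + \|\sigma^\varepsilon\|_{L^2} \le C\|\sigma^\varepsilon\|_{L^2} = O\left(\varepsilon^{-\frac12}\right).
\end{align*}

Step 3: The case $n = 3$. Note that
\[
(\Psi^\varepsilon)_{ij} = (\mathrm{cof}(\tau I_h\sigma^\varepsilon + [1 - \tau]\sigma^\varepsilon))_{ij} = \det(\tau I_h\sigma^\varepsilon|_{ij} + [1 - \tau]\sigma^\varepsilon|_{ij}),
\]
where $\sigma^\varepsilon|_{ij}$ denotes the $2 \times 2$ matrix after deleting the $i$th row and $j$th column of $\sigma^\varepsilon$. We can, thus, conclude that
\begin{align*}
|(\Psi^\varepsilon)_{ij}| &\le 2\max_{s \ne i,\, t \ne j}\left(|\tau (I_h\sigma^\varepsilon)_{st} + [1 - \tau](\sigma^\varepsilon)_{st}|\right)^2\\
&\le C\max_{s \ne i,\, t \ne j}|(\sigma^\varepsilon)_{st}|^2 \le C\|\sigma^\varepsilon\|_{L^\infty}^2.
\end{align*}
Thus, (1.11) implies that
\[
\|\Psi^\varepsilon\|_{L^2} \le C\|\sigma^\varepsilon\|_{L^\infty}^2 = O\left(\varepsilon^{-2}\right).
\]

Step 4: Using the estimates of $\|\Psi^\varepsilon\|_{L^2}$ we have
\begin{align*}
\|\omega_h\|_{L^2}^2 + \frac{\theta}{\varepsilon}\|Ds_h\|_{L^2}^2
&\le \|I_h\sigma^\varepsilon - \sigma^\varepsilon\|_{L^2}\|\omega_h\|_{L^2} + \|\omega_h\|_{H^1}\|D(I_h u^\varepsilon - u^\varepsilon)\|_{L^2}\\
&\quad + \|I_h\sigma^\varepsilon - \sigma^\varepsilon\|_{H^1}\|Ds_h\|_{L^2} + C\varepsilon^{\frac32(1-n)}\|\sigma^\varepsilon - I_h\sigma^\varepsilon\|_{H^1}\|s_h\|_{H^1},
\end{align*}
where we have used the Sobolev inequality. It follows from the Poincaré inequality, the Schwarz inequality, and the inverse inequality that
\begin{align}
\|\omega_h\|_{L^2}^2 + \frac{\theta}{\varepsilon}\|s_h\|_{H^1}^2
&\le C\varepsilon^{4-3n}\|I_h\sigma^\varepsilon - \sigma^\varepsilon\|_{H^1}^2 + C\|\omega_h\|_{H^1}\|I_h u^\varepsilon - u^\varepsilon\|_{H^1} \notag\\
&\le C\varepsilon^{4-3n} h^{2l-2}\|\sigma^\varepsilon\|_{H^l}^2 + Ch^{-1}\|\omega_h\|_{L^2}\|I_h u^\varepsilon - u^\varepsilon\|_{H^1}. \tag{4.8}
\end{align}
Hence,
\[
\|\omega_h\|_{L^2}^2 + \frac{1}{\varepsilon}\|s_h\|_{H^1}^2 \le C\varepsilon^{4-3n} h^{2l-2}\|\sigma^\varepsilon\|_{H^l}^2 + Ch^{2l-4}\|u^\varepsilon\|_{H^l}^2.
\]
Therefore,
\[
\|\omega_h\|_{L^2} \le C_2(\varepsilon) h^{l-2}\left[\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right],
\]
which, together with the inverse inequality, yields
\[
\|\omega_h\|_{H^1} \le C_1(\varepsilon) h^{l-3}\left[\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right].
\]
Next, from (4.6) we have
\begin{align*}
(\mathrm{div}(\mu_h), Ds_h) &\le \|\omega_h\|_{L^2}\|\mu_h\|_{L^2} + \|I_h\sigma^\varepsilon - \sigma^\varepsilon\|_{L^2}\|\mu_h\|_{L^2} + \|\mathrm{div}(\mu_h)\|_{L^2}\|D(I_h u^\varepsilon - u^\varepsilon)\|_{L^2}\\
&\le C_2(\varepsilon) h^{l-2}\left[\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right]\|\mu_h\|_{H^1}.
\end{align*}
It follows from (3.11) that
\[
\|Ds_h\|_{L^2} \le C(\varepsilon) h^{l-2}\left[\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right]. \tag{4.9}
\]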
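To prove (4.5), let $(\kappa, z)$ be the solution to
\begin{align*}
(\kappa, \mu) + (\mathrm{div}(\mu), Dz) &= 0 && \forall \mu \in W_0,\\
(\mathrm{div}(\kappa), Dv) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, Dv) &= \frac{1}{\varepsilon}(Ds_h, Dv) && \forall v \in V_0,
\end{align*}
and satisfy
\[
\|z\|_{H^3} \le C_b(\varepsilon)\|Ds_h\|_{L^2}.
\]
Then,
\begin{align*}
\frac{1}{\varepsilon}\|Ds_h\|_{L^2}^2
&= (\mathrm{div}(\kappa), Ds_h) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, Ds_h)\\
&= (\mathrm{div}(\Pi_h\kappa), Ds_h) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, Ds_h)\\
&= -(\omega_h, \Pi_h\kappa) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, Ds_h) + (I_h\sigma^\varepsilon - \sigma^\varepsilon, \Pi_h\kappa) + (\mathrm{div}(\Pi_h\kappa), D(I_h u^\varepsilon - u^\varepsilon))\\
&= -(\omega_h, \kappa) + (\omega_h, \kappa - \Pi_h\kappa) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, Ds_h) + (I_h\sigma^\varepsilon - \sigma^\varepsilon, \Pi_h\kappa) + (\mathrm{div}(\Pi_h\kappa), D(I_h u^\varepsilon - u^\varepsilon))\\
&= (\mathrm{div}(\omega_h), Dz) - \frac{1}{\varepsilon}(\Phi^\varepsilon Ds_h, Dz) + (\omega_h, \kappa - \Pi_h\kappa) + (I_h\sigma^\varepsilon - \sigma^\varepsilon, \Pi_h\kappa) + (\mathrm{div}(\Pi_h\kappa), D(I_h u^\varepsilon - u^\varepsilon))\\
&= (\mathrm{div}(\omega_h), D(z - I_h z)) - \frac{1}{\varepsilon}(\Phi^\varepsilon Ds_h, D(z - I_h z)) + (\omega_h, \kappa - \Pi_h\kappa)\\
&\quad + (I_h\sigma^\varepsilon - \sigma^\varepsilon, \Pi_h\kappa) + (\mathrm{div}(\Pi_h\kappa), D(I_h u^\varepsilon - u^\varepsilon))\\
&\quad + (\mathrm{div}(\sigma^\varepsilon - I_h\sigma^\varepsilon), I_h z) + \frac{1}{\varepsilon}(\det(\sigma^\varepsilon) - \det(I_h\sigma^\varepsilon), I_h z)\\
&\le \|\mathrm{div}(\omega_h)\|_{L^2}\|D(z - I_h z)\|_{L^2} + \frac{1}{\varepsilon}\|\Phi^\varepsilon\|_{L^\infty}\|Ds_h\|_{L^2}\|D(z - I_h z)\|_{L^2}\\
&\quad + \|\omega_h\|_{L^2}\|\kappa - \Pi_h\kappa\|_{L^2} + \|I_h\sigma^\varepsilon - \sigma^\varepsilon\|_{L^2}\|\Pi_h\kappa\|_{L^2} + \|\mathrm{div}(\Pi_h\kappa)\|_{L^2}\|D(I_h u^\varepsilon - u^\varepsilon)\|_{L^2}\\
&\quad + \|\mathrm{div}(\sigma^\varepsilon - I_h\sigma^\varepsilon)\|_{L^2}\|I_h z\|_{L^2} + \frac{C}{\varepsilon}\|\Psi^\varepsilon\|_{L^2}\|\sigma^\varepsilon - I_h\sigma^\varepsilon\|_{H^1}\|I_h z\|_{H^1}\\
&\le Ch^2\left[\|\omega_h\|_{H^1} + \frac{1}{\varepsilon^2}\|Ds_h\|_{L^2}\right]\|z\|_{H^3} + C(\varepsilon) h^{l-1}\left(\|I_h z\|_{L^2} + \|I_h z\|_{H^1}\right)\|\sigma^\varepsilon\|_{H^l}\\
&\quad + Ch\|\omega_h\|_{L^2}\|\kappa\|_{H^1} + Ch^{l}\|\sigma^\varepsilon\|_{H^l}\|\Pi_h\kappa\|_{L^2} + Ch^{l-1}\|\Pi_h\kappa\|_{H^1}\|u^\varepsilon\|_{H^l}\\
&\le C_2(\varepsilon)\varepsilon^{-2} h^{l-1}\left[\|u^\varepsilon\|_{H^l} + \|\sigma^\varepsilon\|_{H^l}\right]\|z\|_{H^3}\\
&\le C_2(\varepsilon)\varepsilon^{-2} C_b(\varepsilon) h^{l-1}\left[\|u^\varepsilon\|_{H^l} + \|\sigma^\varepsilon\|_{H^l}\right]\|Ds_h\|_{L^2}.
\end{align*}
Dividing by $\|Ds_h\|_{L^2}$, we get (4.5). The proof is complete.

The next lemma shows the contractiveness of the mapping $T$.

Lemma 4.2. There exist an $h_0 = o(\varepsilon^{\frac{19}{12}})$ and $\rho_0 = o(\varepsilon^{\frac{19}{12}} |\log h|^{n-3} h^{\frac{n}{2}-1})$ such that for $h \le h_0$, $T$ is a contracting mapping in the ball $B_h(\rho_0)$ with a contraction factor $\frac12$. That is, for any $(\mu_h, v_h), (\chi_h, w_h) \in B_h(\rho_0)$, there holds
\begin{align}
&\left\|T^{(1)}(\mu_h, v_h) - T^{(1)}(\chi_h, w_h)\right\|_{L^2} + \frac{1}{\sqrt{\varepsilon}}\left\|T^{(2)}(\mu_h, v_h) - T^{(2)}(\chi_h, w_h)\right\|_{H^1} \notag\\
&\qquad \le \frac12\left(\|\mu_h - \chi_h\|_{L^2} + \frac{1}{\sqrt{\varepsilon}}\|v_h - w_h\|_{H^1}\right). \tag{4.10}
\end{align}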
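Proof. We divide the proof into five steps.

Step 1: To ease notation, let
\[
\mathcal{T}^{(1)} = T^{(1)}(\mu_h, v_h) - T^{(1)}(\chi_h, w_h), \qquad \mathcal{T}^{(2)} = T^{(2)}(\mu_h, v_h) - T^{(2)}(\chi_h, w_h).
\]
By the definition of $T^{(i)}$, we get
\begin{align}
\left(\mathcal{T}^{(1)}, \kappa_h\right) + \left(\mathrm{div}(\kappa_h), D\mathcal{T}^{(2)}\right) &= 0 \qquad \forall \kappa_h \in W_0^h, \tag{4.11}\\
\left(\mathrm{div}\,\mathcal{T}^{(1)}, Dz_h\right) - \frac{1}{\varepsilon}\left(\Phi^\varepsilon D\mathcal{T}^{(2)}, Dz_h\right) &= \frac{1}{\varepsilon}\left[(\Phi^\varepsilon D(w_h - v_h), Dz_h) + (\det(\chi_h) - \det(\mu_h), z_h)\right] \qquad \forall z_h \in V_0^h. \tag{4.12}
\end{align}
Letting $z_h = \mathcal{T}^{(2)}$ and $\kappa_h = \mathcal{T}^{(1)}$, subtracting (4.12) from (4.11), and using the mean value theorem we have
\begin{align*}
&\left(\mathcal{T}^{(1)}, \mathcal{T}^{(1)}\right) + \frac{1}{\varepsilon}\left(\Phi^\varepsilon D\mathcal{T}^{(2)}, D\mathcal{T}^{(2)}\right)\\
&\quad = \frac{1}{\varepsilon}\left[\left(\Phi^\varepsilon D(v_h - w_h), D\mathcal{T}^{(2)}\right) + \left(\det(\mu_h) - \det(\chi_h), \mathcal{T}^{(2)}\right)\right]\\
&\quad = \frac{1}{\varepsilon}\left[\left(\Phi^\varepsilon D(v_h - w_h), D\mathcal{T}^{(2)}\right) + \left(\Lambda_h : (\mu_h - \chi_h), \mathcal{T}^{(2)}\right)\right]\\
&\quad = \frac{1}{\varepsilon}\left[\left(\Phi^\varepsilon D(v_h - w_h), D\mathcal{T}^{(2)}\right) + \left(\Phi^\varepsilon : (\mu_h - \chi_h), \mathcal{T}^{(2)}\right) + \left((\Lambda_h - \Phi^\varepsilon) : (\mu_h - \chi_h), \mathcal{T}^{(2)}\right)\right]\\
&\quad = \frac{1}{\varepsilon}\left[\left(\mathrm{div}\left(\Phi^\varepsilon \mathcal{T}^{(2)}\right), D(v_h - w_h)\right) + \left(\mu_h - \chi_h, \Phi^\varepsilon \mathcal{T}^{(2)}\right) + \left((\Lambda_h - \Phi^\varepsilon) : (\mu_h - \chi_h), \mathcal{T}^{(2)}\right)\right]\\
&\quad = \frac{1}{\varepsilon}\left[\left(\mathrm{div}\left(\Pi_h\left(\Phi^\varepsilon \mathcal{T}^{(2)}\right)\right), D(v_h - w_h)\right) + \left(\mu_h - \chi_h, \Phi^\varepsilon \mathcal{T}^{(2)}\right) + \left((\Lambda_h - \Phi^\varepsilon) : (\mu_h - \chi_h), \mathcal{T}^{(2)}\right)\right]\\
&\quad = \frac{1}{\varepsilon}\left[\left(\Phi^\varepsilon \mathcal{T}^{(2)} - \Pi_h\left(\Phi^\varepsilon \mathcal{T}^{(2)}\right), \mu_h - \chi_h\right) + \left((\Lambda_h - \Phi^\varepsilon) : (\mu_h - \chi_h), \mathcal{T}^{(2)}\right)\right]\\
&\quad \le \frac{1}{\varepsilon}\left[\left\|\Phi^\varepsilon \mathcal{T}^{(2)} - \Pi_h\left(\Phi^\varepsilon \mathcal{T}^{(2)}\right)\right\|_{L^2}\|\mu_h - \chi_h\|_{L^2} + C\|\Lambda_h - \Phi^\varepsilon\|_{L^2}\|\mu_h - \chi_h\|_{L^2}\left\|\mathcal{T}^{(2)}\right\|_{L^\infty}\right]\\
&\quad \le \frac{1}{\varepsilon}\left[\left\|\Phi^\varepsilon \mathcal{T}^{(2)} - \Pi_h\left(\Phi^\varepsilon \mathcal{T}^{(2)}\right)\right\|_{L^2}\|\mu_h - \chi_h\|_{L^2} + |\log h|^{\frac{3-n}{2}} h^{1-\frac{n}{2}}\|\Lambda_h - \Phi^\varepsilon\|_{L^2}\|\mu_h - \chi_h\|_{L^2}\left\|\mathcal{T}^{(2)}\right\|_{H^1}\right],
\end{align*}
where $\Lambda_h = \mathrm{cof}(\mu_h + \tau(\chi_h - \mu_h))$, $\tau \in [0, 1]$, $n = 2, 3$. We have used the inverse inequality to get the last inequality above.

Step 2: The case of $n = 2$. We bound $\|\Phi^\varepsilon - \Lambda_h\|_{L^2}$ as follows:
\begin{align*}
\|\Phi^\varepsilon - \Lambda_h\|_{L^2} &= \|\mathrm{cof}(\sigma^\varepsilon) - \mathrm{cof}(\mu_h + \tau(\chi_h - \mu_h))\|_{L^2}\\
&= \|\sigma^\varepsilon - \mu_h - \tau(\chi_h - \mu_h)\|_{L^2}\\
&\le \|\sigma^\varepsilon - I_h\sigma^\varepsilon\|_{L^2} + \|I_h\sigma^\varepsilon - \mu_h\|_{L^2} + \|\chi_h - \mu_h\|_{L^2}\\
&\le Ch^{l}\|\sigma^\varepsilon\|_{H^l} + 3\rho_0.
\end{align*}

Step 3: The case of $n = 3$. To bound $\|\Phi^\varepsilon - \Lambda_h\|_{L^2}$ in this case, we first write
\begin{align*}
\|(\Phi^\varepsilon - \Lambda_h)_{ij}\|_{L^2} &= \|(\mathrm{cof}(\sigma^\varepsilon))_{ij} - (\mathrm{cof}(\mu_h + \tau(\chi_h - \mu_h)))_{ij}\|_{L^2}\\
&= \|\det(\sigma^\varepsilon|_{ij}) - \det(\mu_h|_{ij} + \tau(\chi_h|_{ij} - \mu_h|_{ij}))\|_{L^2},
\end{align*}
where $\sigma|_{ij}$ denotes the $2 \times 2$ matrix after deleting the $i$th row and $j$th column. Then, we use the mean value theorem to get
\begin{align*}
\|(\Phi^\varepsilon - \Lambda_h)_{ij}\|_{L^2} &= \|\det(\sigma^\varepsilon|_{ij}) - \det(\mu_h|_{ij} + \tau(\chi_h|_{ij} - \mu_h|_{ij}))\|_{L^2}\\
&= \|\Lambda^{ij} : (\sigma^\varepsilon|_{ij} - \mu_h|_{ij} - \tau(\chi_h|_{ij} - \mu_h|_{ij}))\|_{L^2}\\
&\le \|\Lambda^{ij}\|_{L^\infty}\|\sigma^\varepsilon|_{ij} - \mu_h|_{ij} - \tau(\chi_h|_{ij} - \mu_h|_{ij})\|_{L^2},
\end{align*}
where $\Lambda^{ij} = \mathrm{cof}(\sigma^\varepsilon|_{ij} + \lambda(\mu_h|_{ij} + \tau(\chi_h|_{ij} - \mu_h|_{ij}) - \sigma^\varepsilon|_{ij}))$, $\lambda \in [0, 1]$. On noting that $\Lambda^{ij} \in \mathbb{R}^{2\times 2}$, we have
\begin{align*}
\|\Lambda^{ij}\|_{L^\infty} &= \|\mathrm{cof}(\sigma^\varepsilon|_{ij} + \lambda(\mu_h|_{ij} + \tau(\chi_h|_{ij} - \mu_h|_{ij}) - \sigma^\varepsilon|_{ij}))\|_{L^\infty}\\
&= \|\sigma^\varepsilon|_{ij} + \lambda(\mu_h|_{ij} + \tau(\chi_h|_{ij} - \mu_h|_{ij}) - \sigma^\varepsilon|_{ij})\|_{L^\infty}\\
&\le C\|\sigma^\varepsilon\|_{L^\infty} \le \frac{C}{\varepsilon}.
\end{align*}
Combining the above estimates gives
\begin{align*}
\|(\Phi^\varepsilon - \Lambda_h)_{ij}\|_{L^2} &\le \frac{C}{\varepsilon}\|\sigma^\varepsilon|_{ij} - \mu_h|_{ij} - \tau(\chi_h|_{ij} - \mu_h|_{ij})\|_{L^2}\\
&\le \frac{C}{\varepsilon}\left(h^{l}\|\sigma^\varepsilon\|_{H^l} + \rho_0\right).
\end{align*}

Step 4: With $\mathcal{T}^{(2)} = T^{(2)}(\mu_h, v_h) - T^{(2)}(\chi_h, w_h)$ as in Step 1, we now bound $\|\Phi^\varepsilon \mathcal{T}^{(2)} - \Pi_h(\Phi^\varepsilon \mathcal{T}^{(2)})\|_{L^2}$ as follows:
\begin{align*}
\left\|\Phi^\varepsilon \mathcal{T}^{(2)} - \Pi_h\left(\Phi^\varepsilon \mathcal{T}^{(2)}\right)\right\|_{L^2}^2
&\le Ch^2\left\|\Phi^\varepsilon \mathcal{T}^{(2)}\right\|_{H^1}^2\\
&= Ch^2\left(\left\|\Phi^\varepsilon \mathcal{T}^{(2)}\right\|_{L^2}^2 + \left\|D\left(\Phi^\varepsilon \mathcal{T}^{(2)}\right)\right\|_{L^2}^2\right)\\
&\le Ch^2\left(\left\|\Phi^\varepsilon \mathcal{T}^{(2)}\right\|_{L^2}^2 + \left\|\Phi^\varepsilon D\mathcal{T}^{(2)}\right\|_{L^2}^2 + \left\|D\Phi^\varepsilon\, \mathcal{T}^{(2)}\right\|_{L^2}^2\right)\\
&\le Ch^2\left(\|\Phi^\varepsilon\|_{L^4}^2\left\|\mathcal{T}^{(2)}\right\|_{L^4}^2 + \|\Phi^\varepsilon\|_{L^\infty}^2\left\|D\mathcal{T}^{(2)}\right\|_{L^2}^2 + \|D\Phi^\varepsilon\|_{L^3}^2\left\|\mathcal{T}^{(2)}\right\|_{L^6}^2\right)\\
&\le Ch^2\left(\|\Phi^\varepsilon\|_{L^4}^2\left\|\mathcal{T}^{(2)}\right\|_{H^1}^2 + \|\Phi^\varepsilon\|_{L^\infty}^2\left\|D\mathcal{T}^{(2)}\right\|_{L^2}^2 + \|D\Phi^\varepsilon\|_{L^3}^2\left\|\mathcal{T}^{(2)}\right\|_{H^1}^2\right)\\
&\le Ch^2\left(\|\Phi^\varepsilon\|_{L^\infty}^2 + \|D\Phi^\varepsilon\|_{L^3}^2\right)\left\|D\mathcal{T}^{(2)}\right\|_{L^2}^2\\
&\le \frac{Ch^2}{\varepsilon^{\frac{13}{6}}}\left\|D\mathcal{T}^{(2)}\right\|_{L^2}^2,
\end{align*}
where we have used Sobolev's inequality followed by Poincaré's inequality. Thus,
\[
\left\|\Phi^\varepsilon \mathcal{T}^{(2)} - \Pi_h\left(\Phi^\varepsilon \mathcal{T}^{(2)}\right)\right\|_{L^2} \le \frac{Ch}{\varepsilon^{\frac{13}{12}}}\left\|D\mathcal{T}^{(2)}\right\|_{L^2}.
\]

Step 5: Finishing up. Substituting all estimates from Steps 2–4 into Step 1, and using the fact that $\Phi^\varepsilon$ is positive definite, we obtain for $n = 2, 3$
\[
\left\|\mathcal{T}^{(1)}\right\|_{L^2}^2 + \frac{\theta}{\varepsilon}\left\|D\mathcal{T}^{(2)}\right\|_{L^2}^2 \le C\varepsilon^{-\frac{25}{12}}\left(h + |\log h|^{\frac{3-n}{2}} h^{1-\frac{n}{2}}\rho_0\right)\|\mu_h - \chi_h\|_{L^2}\left\|D\mathcal{T}^{(2)}\right\|_{L^2}.
\]
Using Schwarz's inequality, we get
\[
\left\|\mathcal{T}^{(1)}\right\|_{L^2} + \frac{1}{\sqrt{\varepsilon}}\left\|\mathcal{T}^{(2)}\right\|_{H^1} \le C\varepsilon^{-\frac{19}{12}}\left(h + |\log h|^{\frac{3-n}{2}} h^{1-\frac{n}{2}}\rho_0\right)\|\mu_h - \chi_h\|_{L^2}.
\]
Choosing $h_0 = o(\varepsilon^{\frac{19}{12}})$, for $h \le h_0$ and $\rho_0 = o(\varepsilon^{\frac{19}{12}} |\log h|^{\frac{n-3}{2}} h^{\frac{n}{2}-1})$, there holds
\begin{align*}
\left\|\mathcal{T}^{(1)}\right\|_{L^2} + \frac{1}{\sqrt{\varepsilon}}\left\|\mathcal{T}^{(2)}\right\|_{H^1} &\le \frac12\|\mu_h - \chi_h\|_{L^2}\\
&\le \frac12\left(\|\mu_h - \chi_h\|_{L^2} + \frac{1}{\sqrt{\varepsilon}}\|v_h - w_h\|_{H^1}\right).
\end{align*}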
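The proof is complete.

We are now ready to state and prove the main theorem of this paper.

Theorem 4.1. Let $\rho_1 = 2\left[C_2(\varepsilon) h^{l-2} + \frac{C_3(\varepsilon)}{\sqrt{\varepsilon}} h^{l-1}\right]\left(\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right)$. Then there exists an $h_1 > 0$ such that for $h \le \min\{h_0, h_1\}$, there exists a unique solution $(\sigma_h^\varepsilon, u_h^\varepsilon)$ to (2.7)–(2.8) in the ball $B_h(\rho_1)$. Moreover,
\begin{align}
\|\sigma^\varepsilon - \sigma_h^\varepsilon\|_{L^2} + \frac{1}{\sqrt{\varepsilon}}\|u^\varepsilon - u_h^\varepsilon\|_{H^1} &\le C_4(\varepsilon) h^{l-2}\left(\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right), \tag{4.13}\\
\|\sigma^\varepsilon - \sigma_h^\varepsilon\|_{H^1} &\le C_5(\varepsilon) h^{l-3}\left(\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right), \tag{4.14}
\end{align}
where $C_4(\varepsilon) = C_5(\varepsilon) = O(\varepsilon^{-\frac92})$ when $n = 2$, and $C_4(\varepsilon) = C_5(\varepsilon) = O(\varepsilon^{-6})$ when $n = 3$.

Proof. Let $(\mu_h, v_h) \in B_h(\rho_1)$ and choose $h_1 > 0$ such that
\[
h_1 |\log h_1|^{\frac{3-n}{2l-n}} \le C\left(\frac{\varepsilon^{\frac{25}{12}}}{C_3(\varepsilon)\left(\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right)}\right)^{\frac{2}{2l-n}}
\]
and
\[
h_1 |\log h_1|^{\frac{3-n}{2l-n-2}} \le C\left(\frac{\varepsilon^{\frac{19}{12}}}{C_2(\varepsilon)\left(\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right)}\right)^{\frac{2}{2l-n-2}}.
\]
Then $h \le \min\{h_0, h_1\}$ implies $\rho_1 \le \rho_0$. Thus, using the triangle inequality and Lemmas 4.1 and 4.2, we get
\begin{align*}
&\left\|I_h\sigma^\varepsilon - T^{(1)}(\mu_h, v_h)\right\|_{L^2} + \frac{1}{\sqrt{\varepsilon}}\left\|I_h u^\varepsilon - T^{(2)}(\mu_h, v_h)\right\|_{H^1}\\
&\quad \le \left\|I_h\sigma^\varepsilon - T^{(1)}(I_h\sigma^\varepsilon, I_h u^\varepsilon)\right\|_{L^2} + \left\|T^{(1)}(I_h\sigma^\varepsilon, I_h u^\varepsilon) - T^{(1)}(\mu_h, v_h)\right\|_{L^2}\\
&\qquad + \frac{1}{\sqrt{\varepsilon}}\left\|I_h u^\varepsilon - T^{(2)}(I_h\sigma^\varepsilon, I_h u^\varepsilon)\right\|_{H^1} + \frac{1}{\sqrt{\varepsilon}}\left\|T^{(2)}(I_h\sigma^\varepsilon, I_h u^\varepsilon) - T^{(2)}(\mu_h, v_h)\right\|_{H^1}\\
&\quad \le \left[C_2(\varepsilon) h^{l-2} + \frac{C_3(\varepsilon)}{\sqrt{\varepsilon}} h^{l-1}\right]\left(\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right) + \frac12\left(\|I_h\sigma^\varepsilon - \mu_h\|_{L^2} + \frac{1}{\sqrt{\varepsilon}}\|I_h u^\varepsilon - v_h\|_{H^1}\right)\\
&\quad \le \frac{\rho_1}{2} + \frac{\rho_1}{2} = \rho_1 < 1.
\end{align*}
So, $T(\mu_h, v_h) \in B_h(\rho_1)$. Clearly, $T$ is a continuous mapping. Thus, $T$ has a unique fixed point $(\sigma_h^\varepsilon, u_h^\varepsilon) \in B_h(\rho_1)$, which is the unique solution to (2.7)–(2.8).

Next, we use the triangle inequality to get
\begin{align*}
\|\sigma^\varepsilon - \sigma_h^\varepsilon\|_{L^2} + \frac{1}{\sqrt{\varepsilon}}\|u^\varepsilon - u_h^\varepsilon\|_{H^1}
&\le \|\sigma^\varepsilon - I_h\sigma^\varepsilon\|_{L^2} + \|I_h\sigma^\varepsilon - \sigma_h^\varepsilon\|_{L^2} + \frac{1}{\sqrt{\varepsilon}}\left(\|u^\varepsilon - I_h u^\varepsilon\|_{H^1} + \|I_h u^\varepsilon - u_h^\varepsilon\|_{H^1}\right)\\
&\le \rho_1 + Ch^{l-1}\left(\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right)\\
&\le C_4(\varepsilon) h^{l-2}\left(\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right).
\end{align*}
Finally, using the inverse inequality, we have
\begin{align*}
\|\sigma^\varepsilon - \sigma_h^\varepsilon\|_{H^1} &\le \|\sigma^\varepsilon - I_h\sigma^\varepsilon\|_{H^1} + \|I_h\sigma^\varepsilon - \sigma_h^\varepsilon\|_{H^1}\\
&\le \|\sigma^\varepsilon - I_h\sigma^\varepsilon\|_{H^1} + Ch^{-1}\|I_h\sigma^\varepsilon - \sigma_h^\varepsilon\|_{L^2}\\
&\le Ch^{l-1}\|\sigma^\varepsilon\|_{H^l} + Ch^{-1}\rho_1\\
&\le C_5(\varepsilon) h^{l-3}\left[\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right].
\end{align*}
The proof is complete.

Comparing with the error estimates for the linearized problem in Theorem 3.2, we see that the above $H^1$ error for the scalar variable is not optimal. Next, we shall employ a duality argument similar to the one used in the proof of Theorem 3.2 to show that the estimate can be improved to optimal order.

Theorem 4.2. Under the same hypothesis of Theorem 4.1 there holds
\[
\|u^\varepsilon - u_h^\varepsilon\|_{H^1} \le \left[C_4(\varepsilon)\varepsilon^{-2} h^{l-1} + C_5(\varepsilon) h^{2(l-2)}\right]\left(\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right). \tag{4.15}
\]
Proof. The regularity assumption implies that there exists $(\kappa, z) \in W_0 \times V_0 \cap H^3(\Omega)$ such that
\begin{align}
(\kappa, \mu) + (\mathrm{div}(\mu), Dz) &= 0 && \forall \mu \in W_0, \tag{4.16}\\
(\mathrm{div}(\kappa), Dv) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, Dv) &= \frac{1}{\varepsilon}(D(u^\varepsilon - u_h^\varepsilon), Dv) && \forall v \in V_0, \tag{4.17}
\end{align}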
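with
\[
\|z\|_{H^3} \le C_b(\varepsilon)\|D(u^\varepsilon - u_h^\varepsilon)\|_{L^2}. \tag{4.18}
\]
It is easy to check that $\sigma^\varepsilon - \sigma_h^\varepsilon$ and $u^\varepsilon - u_h^\varepsilon$ satisfy the following error equations:
\begin{align}
(\sigma^\varepsilon - \sigma_h^\varepsilon, \mu_h) + (\mathrm{div}(\mu_h), D(u^\varepsilon - u_h^\varepsilon)) &= 0 && \forall \mu_h \in W_0^h, \tag{4.19}\\
(\mathrm{div}(\sigma^\varepsilon - \sigma_h^\varepsilon), Dv_h) + \frac{1}{\varepsilon}(\det(\sigma^\varepsilon) - \det(\sigma_h^\varepsilon), v_h) &= 0 && \forall v_h \in V_0^h. \tag{4.20}
\end{align}
By (4.16)–(4.20) and the mean value theorem, we get
\begin{align*}
\frac{1}{\varepsilon}\|D(u^\varepsilon - u_h^\varepsilon)\|_{L^2}^2
&= (\mathrm{div}(\kappa), D(u^\varepsilon - u_h^\varepsilon)) - \frac{1}{\varepsilon}(\Phi^\varepsilon Dz, D(u^\varepsilon - u_h^\varepsilon))\\
&= (\mathrm{div}(\Pi_h\kappa), D(u^\varepsilon - u_h^\varepsilon)) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(u^\varepsilon - u_h^\varepsilon), Dz) + (\mathrm{div}(\kappa - \Pi_h\kappa), D(u^\varepsilon - u_h^\varepsilon))\\
&= (\sigma_h^\varepsilon - \sigma^\varepsilon, \Pi_h\kappa) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(u^\varepsilon - u_h^\varepsilon), Dz) + (\mathrm{div}(\kappa - \Pi_h\kappa), D(u^\varepsilon - u_h^\varepsilon))\\
&= (\sigma_h^\varepsilon - \sigma^\varepsilon, \kappa) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(u^\varepsilon - u_h^\varepsilon), Dz)\\
&\quad + (\mathrm{div}(\kappa - \Pi_h\kappa), D(u^\varepsilon - I_h u^\varepsilon)) + (\sigma_h^\varepsilon - \sigma^\varepsilon, \Pi_h\kappa - \kappa)\\
&= (\mathrm{div}(\sigma^\varepsilon - \sigma_h^\varepsilon), Dz) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(u^\varepsilon - u_h^\varepsilon), Dz)\\
&\quad + (\mathrm{div}(\kappa - \Pi_h\kappa), D(u^\varepsilon - I_h u^\varepsilon)) + (\sigma_h^\varepsilon - \sigma^\varepsilon, \Pi_h\kappa - \kappa)\\
&= (\mathrm{div}(\sigma^\varepsilon - \sigma_h^\varepsilon), D(z - I_h z)) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(u^\varepsilon - u_h^\varepsilon), D(z - I_h z))\\
&\quad + (\mathrm{div}(\kappa - \Pi_h\kappa), D(u^\varepsilon - I_h u^\varepsilon)) + (\sigma_h^\varepsilon - \sigma^\varepsilon, \Pi_h\kappa - \kappa)\\
&\quad - \frac{1}{\varepsilon}(\det(\sigma^\varepsilon) - \det(\sigma_h^\varepsilon), I_h z) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(u^\varepsilon - u_h^\varepsilon), D(I_h z))\\
&= (\mathrm{div}(\sigma^\varepsilon - \sigma_h^\varepsilon), D(z - I_h z)) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(u^\varepsilon - u_h^\varepsilon), D(z - I_h z))\\
&\quad + (\mathrm{div}(\kappa - \Pi_h\kappa), D(u^\varepsilon - I_h u^\varepsilon)) + (\sigma_h^\varepsilon - \sigma^\varepsilon, \Pi_h\kappa - \kappa)\\
&\quad - \frac{1}{\varepsilon}(\Psi^\varepsilon : (\sigma^\varepsilon - \sigma_h^\varepsilon), I_h z) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(u^\varepsilon - u_h^\varepsilon), D(I_h z)),
\end{align*}
where $\Psi^\varepsilon = \mathrm{cof}(\sigma^\varepsilon + \tau[\sigma_h^\varepsilon - \sigma^\varepsilon])$ for $\tau \in [0, 1]$. Next, we note that
\begin{align*}
&(\Psi^\varepsilon : (\sigma^\varepsilon - \sigma_h^\varepsilon), I_h z) + (\Phi^\varepsilon D(u^\varepsilon - u_h^\varepsilon), D(I_h z))\\
&\quad = (\Phi^\varepsilon : (\sigma^\varepsilon - \sigma_h^\varepsilon), I_h z) + (\mathrm{div}(\Phi^\varepsilon I_h z), D(u^\varepsilon - u_h^\varepsilon)) + ((\Psi^\varepsilon - \Phi^\varepsilon) : (\sigma^\varepsilon - \sigma_h^\varepsilon), I_h z)\\
&\quad = (\sigma^\varepsilon - \sigma_h^\varepsilon, \Phi^\varepsilon I_h z) + (\mathrm{div}(\Pi_h(\Phi^\varepsilon I_h z)), D(u^\varepsilon - u_h^\varepsilon)) + ((\Psi^\varepsilon - \Phi^\varepsilon) : (\sigma^\varepsilon - \sigma_h^\varepsilon), I_h z)\\
&\qquad + (\mathrm{div}(\Phi^\varepsilon I_h z - \Pi_h(\Phi^\varepsilon I_h z)), D(u^\varepsilon - I_h u^\varepsilon))\\
&\quad = (\sigma^\varepsilon - \sigma_h^\varepsilon, \Phi^\varepsilon I_h z - \Pi_h(\Phi^\varepsilon I_h z)) + ((\Psi^\varepsilon - \Phi^\varepsilon) : (\sigma^\varepsilon - \sigma_h^\varepsilon), I_h z)\\
&\qquad + (\mathrm{div}(\Phi^\varepsilon I_h z - \Pi_h(\Phi^\varepsilon I_h z)), D(u^\varepsilon - I_h u^\varepsilon)).
\end{align*}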
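Using this and the same technique used in Step 4 of Lemma 4.2, we have
\begin{align*}
\frac{1}{\varepsilon}\|D(u^\varepsilon - u_h^\varepsilon)\|_{L^2}^2
&= (\mathrm{div}(\sigma^\varepsilon - \sigma_h^\varepsilon), D(z - I_h z)) - \frac{1}{\varepsilon}(\Phi^\varepsilon D(u^\varepsilon - u_h^\varepsilon), D(z - I_h z))\\
&\quad + \frac{1}{\varepsilon}\Big[((\Phi^\varepsilon - \Psi^\varepsilon) : (\sigma^\varepsilon - \sigma_h^\varepsilon), I_h z) + (\sigma^\varepsilon - \sigma_h^\varepsilon, \Pi_h(\Phi^\varepsilon I_h z) - \Phi^\varepsilon I_h z)\\
&\qquad\quad + (\mathrm{div}(\Pi_h(\Phi^\varepsilon I_h z) - \Phi^\varepsilon I_h z), D(u^\varepsilon - I_h u^\varepsilon))\Big]\\
&\quad + (\sigma_h^\varepsilon - \sigma^\varepsilon, \Pi_h\kappa - \kappa) + (\mathrm{div}(\kappa - \Pi_h\kappa), D(u^\varepsilon - I_h u^\varepsilon))\\
&\le \left[\|\mathrm{div}(\sigma^\varepsilon - \sigma_h^\varepsilon)\|_{L^2} + \frac{C}{\varepsilon^2}\|D(u^\varepsilon - u_h^\varepsilon)\|_{L^2}\right]\|D(z - I_h z)\|_{L^2}\\
&\quad + \frac{C}{\varepsilon}\Big[\|\Phi^\varepsilon - \Psi^\varepsilon\|_{L^2}\|\sigma^\varepsilon - \sigma_h^\varepsilon\|_{L^2}\|I_h z\|_{L^\infty} + \|\sigma^\varepsilon - \sigma_h^\varepsilon\|_{L^2}\|\Pi_h(\Phi^\varepsilon I_h z) - \Phi^\varepsilon I_h z\|_{L^2}\\
&\qquad\quad + \|\mathrm{div}(\Pi_h(\Phi^\varepsilon I_h z) - \Phi^\varepsilon I_h z)\|_{L^2}\|D(u^\varepsilon - I_h u^\varepsilon)\|_{L^2}\Big]\\
&\quad + \|\kappa - \Pi_h\kappa\|_{L^2}\|\sigma^\varepsilon - \sigma_h^\varepsilon\|_{L^2} + \|\mathrm{div}(\kappa - \Pi_h\kappa)\|_{L^2}\|D(u^\varepsilon - I_h u^\varepsilon)\|_{L^2}\\
&\le Ch^2\left[\|\sigma^\varepsilon - \sigma_h^\varepsilon\|_{H^1} + \frac{1}{\varepsilon^2}\|u^\varepsilon - u_h^\varepsilon\|_{H^1}\right]\|z\|_{H^3}\\
&\quad + \frac{C}{\varepsilon^2}\left(\|\Phi^\varepsilon - \Psi^\varepsilon\|_{L^2}\|\sigma^\varepsilon - \sigma_h^\varepsilon\|_{L^2} + h\|\sigma^\varepsilon - \sigma_h^\varepsilon\|_{L^2} + \|u^\varepsilon - I_h u^\varepsilon\|_{H^1}\right)\|z\|_{H^3}\\
&\quad + Ch\|\sigma^\varepsilon - \sigma_h^\varepsilon\|_{L^2}\|\kappa\|_{H^1} + C\|u^\varepsilon - I_h u^\varepsilon\|_{H^1}\|\kappa\|_{H^1}\\
&\le \left[\frac{(C_4(\varepsilon) + C_5(\varepsilon)) h^{l-1}}{\varepsilon^{\frac32}} + \frac{C_4(\varepsilon) h^{l-2}}{\varepsilon^2}\|\Phi^\varepsilon - \Psi^\varepsilon\|_{L^2}\right]\left[\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right]\|z\|_{H^3}\\
&\le C_b(\varepsilon)\left[\frac{(C_4(\varepsilon) + C_5(\varepsilon)) h^{l-1}}{\varepsilon^{\frac32}} + \frac{C_4(\varepsilon) h^{l-2}}{\varepsilon^2}\|\Phi^\varepsilon - \Psi^\varepsilon\|_{L^2}\right]\left[\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right]\|D(u^\varepsilon - u_h^\varepsilon)\|_{L^2}.
\end{align*}
We now bound $\|\Phi^\varepsilon - \Psi^\varepsilon\|_{L^2}$ separately for the cases $n = 2$ and $n = 3$. First, when $n = 2$ we have
\begin{align*}
\|\Phi^\varepsilon - \Psi^\varepsilon\|_{L^2} &= \|\mathrm{cof}(\sigma^\varepsilon) - \mathrm{cof}(\sigma_h^\varepsilon + \tau[\sigma^\varepsilon - \sigma_h^\varepsilon])\|_{L^2}\\
&= \|\sigma^\varepsilon - (\sigma_h^\varepsilon + \tau[\sigma^\varepsilon - \sigma_h^\varepsilon])\|_{L^2}\\
&\le C_4(\varepsilon) h^{l-2}\left[\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right].
\end{align*}
Second, when $n = 3$, on noting that
\begin{align*}
|(\Phi^\varepsilon - \Psi^\varepsilon)_{ij}| &= |(\mathrm{cof}(\sigma^\varepsilon))_{ij} - (\mathrm{cof}(\sigma_h^\varepsilon + \tau[\sigma^\varepsilon - \sigma_h^\varepsilon]))_{ij}|\\
&= |\det(\sigma^\varepsilon|_{ij}) - \det(\sigma^\varepsilon|_{ij} + \tau[\sigma^\varepsilon|_{ij} - \sigma_h^\varepsilon|_{ij}])|,
\end{align*}
and using the mean value theorem and the Sobolev inequality, we get
\begin{align*}
\|(\Psi^\varepsilon)_{ij} - (\Phi^\varepsilon)_{ij}\|_{L^2} &= (1 - \tau)\|(\Lambda^\varepsilon)_{ij} : (\sigma^\varepsilon|_{ij} - \sigma_h^\varepsilon|_{ij})\|_{L^2}\\
&\le \|(\Lambda^\varepsilon)_{ij}\|_{H^1}\|\sigma^\varepsilon|_{ij} - \sigma_h^\varepsilon|_{ij}\|_{H^1},
\end{align*}
where $(\Lambda^\varepsilon)_{ij} = \mathrm{cof}(\sigma^\varepsilon|_{ij} + \lambda[\sigma_h^\varepsilon|_{ij} - \sigma^\varepsilon|_{ij}])$ for $\lambda \in [0, 1]$. Since $(\Lambda^\varepsilon)_{ij} \in \mathbb{R}^{2\times 2}$, then
\[
\|(\Lambda^\varepsilon)_{ij}\|_{H^1} = \|\sigma^\varepsilon|_{ij} + \lambda(\sigma_h^\varepsilon|_{ij} - \sigma^\varepsilon|_{ij})\|_{H^1} \le C\|\sigma^\varepsilon\|_{H^1} = O\left(\varepsilon^{-1}\right).
\]
Thus,
\[
\|\Phi^\varepsilon - \Psi^\varepsilon\|_{L^2} \le C_4(\varepsilon)\varepsilon^{-1} h^{l-2}\left(\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right).
\]
Finally, combining the above estimates we obtain
\[
\|D(u^\varepsilon - u_h^\varepsilon)\|_{L^2} \le \left[C_4(\varepsilon)\varepsilon^{-2} h^{l-1} + C_4(\varepsilon) h^{2(l-2)}\right]\left(\|\sigma^\varepsilon\|_{H^l} + \|u^\varepsilon\|_{H^l}\right).
\]
We note that $2(l - 2) \ge l - 1$ for $k \ge 2$. The proof is complete.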
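The structure of the fixed point argument in Theorem 4.1 (a contraction with factor 1/2, and a ball whose radius is twice the distance between its center and the image of the center) can be illustrated on a scalar toy problem. The following is a minimal sketch under stated assumptions: the map `T` below is a hypothetical stand-in chosen only because it is a contraction with factor 1/2 on the real line; it is not the mapping T of Definition 4.1, and the function names are illustrative.

```python
import math


def T(x):
    """Toy contraction on the real line: |T(x) - T(y)| <= 0.5 * |x - y| because |sin| <= 1."""
    return 0.5 * math.cos(x)


def invariant_ball_radius(center, contraction=0.5):
    """Radius rho = d / (1 - q) with d = |T(center) - center|.

    For q = 1/2 this is rho = 2 * d, mirroring the factor 2 in rho_1.
    Any x with |x - center| <= rho then satisfies
    |T(x) - center| <= q * rho + d = rho, so the ball is mapped into itself.
    """
    return abs(T(center) - center) / (1.0 - contraction)


def fixed_point(x0, tol=1e-14, max_iter=200):
    """Picard iteration x_{k+1} = T(x_k), stopped when successive iterates agree to tol."""
    x = x0
    for _ in range(max_iter):
        x_new = T(x)
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed point iteration did not converge")


if __name__ == "__main__":
    center = 0.0
    rho = invariant_ball_radius(center)
    x_star = fixed_point(center)
    print("radius:", rho)
    print("fixed point:", x_star, "residual:", abs(x_star - T(x_star)))
```

In the paper the role of `center` is played by the interpolant pair, the distance d is bounded by Lemma 4.1, and the contraction property comes from Lemma 4.2; the toy problem only mirrors that bookkeeping.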
5. Numerical experiments and rates of convergence. In this section, we provide several 2-D numerical experiments to gauge the efficiency of the mixed finite element method developed in the previous sections. We numerically determine the “best” choice of the mesh size $h$ in terms of $\varepsilon$, and rates of convergence for both $u^0 - u^\varepsilon$ and $u^\varepsilon - u_h^\varepsilon$. All tests given below are done on the domain $\Omega = [0, 1]^2$. We refer the reader to [18, 27] for more extensive 2-D and 3-D numerical simulations. Newton's method is employed as the (nonlinear) solver in all our numerical tests. We remark that the mixed finite element methods we tested are often 10–20 times faster than the Argyris finite element Galerkin method studied in [19]. We refer the reader to [18] for more discussions and comparisons of the Galerkin and mixed methods.

Test 1. For this test, we calculate $\|u^0 - u_h^\varepsilon\|$ for fixed $h = 0.015$, while varying $\varepsilon$ in order to estimate $\|u^\varepsilon - u^0\|$. We use the quadratic Lagrange element for both variables and solve problem (2.5)–(2.6) with the following test functions:

(a) $u^0 = e^{\frac{x^2+y^2}{2}}$, $f = (1 + x^2 + y^2)\, e^{x^2+y^2}$, $g = e^{\frac{x^2+y^2}{2}}$,

(b) $u^0 = x^4 + y^2$, $f = 24x^2$, $g = x^4 + y^2$.

After having computed the error, we divide it by various powers of $\varepsilon$ to estimate the rate at which each norm converges. The left column of Figure 5.1, which shows log-log plots of the errors in various norms vs. $\varepsilon$, clearly shows that $\|\sigma^0 - \sigma_h^\varepsilon\|_{L^2} = O(\varepsilon^{\frac14})$. Since $h$ is very small, we then have $\|u^0 - u^\varepsilon\|_{H^2} \approx \|\sigma^0 - \sigma_h^\varepsilon\|_{L^2} = O(\varepsilon^{\frac14})$. Based on this heuristic argument, we predict that $\|u^0 - u^\varepsilon\|_{H^2} = O(\varepsilon^{\frac14})$. Similarly, from the left column of Figure 5.1, we see that $\|u^0 - u^\varepsilon\|_{L^2} \approx O(\varepsilon)$ and $\|u^0 - u^\varepsilon\|_{H^1} \approx O(\varepsilon^{\frac34})$.

Test 2. The purpose of this test is to calculate the rate of convergence of $\|u^\varepsilon - u_h^\varepsilon\|$ for fixed $\varepsilon$ in various norms. We use the quadratic Lagrange element for both variables and solve problem (2.5)–(2.6) with the boundary condition $D^2 u^\varepsilon \nu\cdot\nu = \varepsilon$ on $\partial\Omega$ replaced by $D^2 u^\varepsilon \nu\cdot\nu = h^\varepsilon$ on $\partial\Omega$, using the following test functions:

(a) $u^\varepsilon = 20x^6 + y^6$, $f^\varepsilon = 18000x^4y^4 - \varepsilon(7200x^2 + 360y^2)$, $g^\varepsilon = 20x^6 + y^6$, $h^\varepsilon = 600x^4\nu_x^2 + 30y^4\nu_y^2$,

(b) $u^\varepsilon = x\sin(x) + y\sin(y)$, $f^\varepsilon = (2\cos(x) - x\sin(x))(2\cos(y) - y\sin(y)) - \varepsilon(x\sin(x) - 4\cos(x) + y\sin(y) - 4\cos(y))$, $g^\varepsilon = x\sin(x) + y\sin(y)$, $h^\varepsilon = (2\cos(x) - x\sin(x))\nu_x^2 + (2\cos(y) - y\sin(y))\nu_y^2$.
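The data for Test 2(a) can be checked by exact polynomial arithmetic. The sketch below assumes that the regularized equation has the form $-\varepsilon\Delta^2 u^\varepsilon + \det(D^2 u^\varepsilon) = f^\varepsilon$, which is consistent with the listed $f^\varepsilon$ but is not restated in this section. Polynomials in $x$ and $y$ are stored as dictionaries mapping exponent pairs to integer coefficients; all helper names are hypothetical and introduced only for this illustration.

```python
def diff(p, var):
    """Partial derivative of a polynomial {(i, j): c} meaning sum of c * x**i * y**j.

    var = 0 differentiates in x, var = 1 differentiates in y.
    """
    out = {}
    for (i, j), c in p.items():
        e = i if var == 0 else j
        if e == 0:
            continue
        key = (i - 1, j) if var == 0 else (i, j - 1)
        out[key] = out.get(key, 0) + c * e
    return out


def add(p, q, scale=1):
    """Return p + scale * q with zero coefficients dropped."""
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) + scale * c
    return {key: c for key, c in out.items() if c != 0}


def mul(p, q):
    """Product of two polynomials with zero coefficients dropped."""
    out = {}
    for (i, j), c in p.items():
        for (k, m), d in q.items():
            key = (i + k, j + m)
            out[key] = out.get(key, 0) + c * d
    return {key: c for key, c in out.items() if c != 0}


def hessian_det(u):
    """det(D^2 u) = u_xx * u_yy - u_xy**2."""
    uxx = diff(diff(u, 0), 0)
    uyy = diff(diff(u, 1), 1)
    uxy = diff(diff(u, 0), 1)
    return add(mul(uxx, uyy), mul(uxy, uxy), scale=-1)


def bilaplacian(u):
    """Delta^2 u = u_xxxx + 2 * u_xxyy + u_yyyy."""
    uxx = diff(diff(u, 0), 0)
    uyy = diff(diff(u, 1), 1)
    uxxxx = diff(diff(uxx, 0), 0)
    uyyyy = diff(diff(uyy, 1), 1)
    uxxyy = diff(diff(uxx, 1), 1)
    return add(add(uxxxx, uyyyy), uxxyy, scale=2)


# Test 2(a): u = 20 x^6 + y^6.
U_TEST_2A = {(6, 0): 20, (0, 6): 1}

if __name__ == "__main__":
    print("det(D^2 u):", hessian_det(U_TEST_2A))
    print("Delta^2 u :", bilaplacian(U_TEST_2A))
```

Under the stated assumption, $f^\varepsilon$ is the first polynomial minus $\varepsilon$ times the second; the second derivatives $u_{xx} = 600x^4$ and $u_{yy} = 30y^4$ are also the coefficients appearing in $h^\varepsilon$.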
After having computed the error in different norms, we divided each value by a power of $h$ expected to be the convergence rate by the analysis in the previous section. As seen from the right column of Figure 5.1, which shows log-log plots of the errors in various norms vs. $h$, the error converges exactly as expected in the $H^1$ norm, but $\sigma_h^\varepsilon$ appears to converge one order of $h$ better than the analysis shows. In addition, the error seems to converge optimally in the $L^2$ norm, although such a result has not yet been proved.

Fig. 5.1. Log-log plots of change of $\|u - u_h^\varepsilon\|$ w.r.t. $\varepsilon$ for Test 1 (left column) and log-log plots of change of $\|u - u_h^\varepsilon\|$ w.r.t. $h$ for Test 2 (right column).

Test 3. In this test, we fix a relation between $\varepsilon$ and $h$, and then determine the “best” choice for $h$ in terms of $\varepsilon$ such that the global error $u^0 - u_h^\varepsilon$ has the same convergence rate as that of $u^0 - u^\varepsilon$. We solve problem (2.5)–(2.6) with the following test functions:

(a) $u^0 = x^4 + y^2$, $f = 24x^2$, $g = x^4 + y^2$.

To see which relation gives the sought-after convergence rate, we compare the data with a function $y = \beta x^\alpha$, where $\alpha = 1$ in the $L^2$ case, $\alpha = \frac34$ in the $H^1$ case, and $\alpha = \frac14$ in the $H^2$ case. The constant $\beta$ is determined using a least squares fitting algorithm based on the data.

As seen in the figures below, the best $h$–$\varepsilon$ relation depends on which norm one considers. Figures 5.2 and 5.3 indicate that when $h = \varepsilon^{\frac12}$, $\|u^0 - u_h^\varepsilon\|_{L^2} \approx O(\varepsilon)$ and $\|\sigma^0 - \sigma_h^\varepsilon\|_{L^2} \approx O(\varepsilon^{\frac14})$. It can also be seen from Figure 5.4 that when $h = \varepsilon$, $\|u^0 - u_h^\varepsilon\|_{H^1} = O(\varepsilon^{\frac34})$.
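The comparison with $y = \beta x^\alpha$ described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say which least squares variant was used, so the code assumes an ordinary linear least squares fit of $\beta$ with $\alpha$ held fixed, and it adds a log-log slope estimate as a cross-check. The function names and the synthetic data are hypothetical and are not results from the paper.

```python
import math


def fit_beta(xs, ys, alpha):
    """Least squares constant beta in y ~ beta * x**alpha with alpha fixed.

    Minimizes sum_i (y_i - beta * x_i**alpha)**2, which gives
    beta = sum(b_i * y_i) / sum(b_i**2) with b_i = x_i**alpha.
    """
    basis = [x ** alpha for x in xs]
    return sum(b * y for b, y in zip(basis, ys)) / sum(b * b for b in basis)


def observed_rate(xs, ys):
    """Slope of the least squares line through the points (log x_i, log y_i)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(lx) / n
    my = sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


if __name__ == "__main__":
    # Synthetic errors that decay exactly like 3 * eps**0.75 (illustrative only).
    eps_values = [2.0 ** (-k) for k in range(2, 8)]
    errors = [3.0 * e ** 0.75 for e in eps_values]
    print("beta for alpha = 3/4:", fit_beta(eps_values, errors, 0.75))
    print("observed rate:", observed_rate(eps_values, errors))
```

If the observed slope is close to the prescribed $\alpha$, the fitted curve and the data are nearly parallel in a log-log plot, which is the visual criterion used in Figures 5.2–5.4.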
Fig. 5.2. Test 3a. $L^2$-error of $u_h^\varepsilon$.

Fig. 5.3. Test 3a. $L^2$-error of $\sigma_h^\varepsilon$.
Fig. 5.4. Test 3a. $H^1$-error of $u_h^\varepsilon$.
REFERENCES

[1] A. D. Aleksandrov, Certain estimates for the Dirichlet problem, Soviet Math. Dokl., 1 (1961), pp. 1151–1154.
[2] F. E. Baginski and N. Whitaker, Numerical solutions of boundary value problems for K-surfaces in $\mathbb{R}^3$, Numer. Methods for Partial Differential Equations, 12 (1996), pp. 525–546.
[3] G. Barles and P. E. Souganidis, Convergence of approximation schemes for fully nonlinear second order equations, Asymptot. Anal., 4 (1991), pp. 271–283.
[4] J.-D. Benamou and Y. Brenier, A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem, Numer. Math., 84 (2000), pp. 375–393.
[5] S. C. Brenner and L. R. Scott, The Mathematical Theory of Finite Element Methods, 3rd edition, Springer, New York, 2008.
[6] F. Brezzi and M. Fortin, Mixed and Hybrid Finite Element Methods, 1st edition, Springer-Verlag, Berlin, 1991.
[7] L. A. Caffarelli and X. Cabré, Fully nonlinear elliptic equations, American Mathematical Society Colloquium Publications 43, AMS, Providence, RI, 1995.
[8] L. A. Caffarelli and M. Milman, Monge Ampère equation: Applications to geometry and optimization, Contemporary Mathematics, AMS, Providence, RI, 1999.
[9] S. Y. Cheng and S. T. Yau, On the regularity of the Monge-Ampère equation $\det(\partial^2 u/\partial x_i \partial x_j) = F(x, u)$, Comm. Pure Appl. Math., 30 (1977), pp. 41–68.
[10] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam, 1978.
[11] M. G. Crandall and P.-L. Lions, Viscosity solutions of Hamilton-Jacobi equations, Trans. Amer. Math. Soc., 277 (1983), pp. 1–42.
[12] M. G. Crandall, H. Ishii, and P.-L. Lions, User's guide to viscosity solutions of second order partial differential equations, Bull. Amer. Math. Soc. (N.S.), 27 (1992), pp. 1–67.
[13] E. J. Dean and R. Glowinski, Numerical methods for fully nonlinear elliptic equations of the Monge-Ampère type, Comput. Methods Appl. Mech. Engrg., 195 (2006), pp. 1344–1386.
[14] L. C. Evans, Partial differential equations, Graduate Studies in Mathematics 19, AMS, Providence, RI, 1998.
[15] R. S. Falk and J. E. Osborn, Error estimates for mixed methods, R.A.I.R.O. Anal. Numér., 14 (1980), pp. 249–277.
[16] X. Feng, Convergence of the vanishing moment method for the Monge-Ampère equation, Trans. AMS, submitted.
[17] X. Feng and O. A. Karakashian, Fully discrete dynamic mesh discontinuous Galerkin methods for the Cahn-Hilliard equation of phase transition, Math. Comp., 76 (2007), pp. 1093–1117.
[18] X. Feng and M. Neilan, Vanishing moment method and moment solutions for second order fully nonlinear partial differential equations, J. Scient. Comp., DOI 10.1007/s10915-008-9221-9, 2008.
[19] X. Feng and M. Neilan, Analysis of Galerkin methods for the fully nonlinear Monge-Ampère equation, Math. Comp., to appear.
[20] X. Feng, M. Neilan, and A. Prohl, Error analysis of finite element approximations of the inverse mean curvature flow arising from the general relativity, Numer. Math., 108 (2007), pp. 93–119.
[21] D. Gilbarg and N. S. Trudinger, Elliptic Partial Differential Equations of Second Order, Classics in Mathematics, Springer-Verlag, Berlin, 2001. Reprint of the 1998 edition.
[22] C. E. Gutierrez, The Monge-Ampère Equation, volume 44 of Progress in Nonlinear Differential Equations and Their Applications, Birkhäuser, Boston, MA, 2001.
[23] H. Ishii, On uniqueness and existence of viscosity solutions of fully nonlinear second order PDE's, Comm. Pure Appl. Math., 42 (1989), pp. 14–45.
[24] R. Jensen, The maximum principle for viscosity solutions of fully nonlinear second order partial differential equations, Arch. Ration. Mech. Anal., 101 (1988), pp. 1–27.
[25] O. A. Ladyzhenskaya and N. N. Ural'tseva, Linear and Quasilinear Elliptic Equations, Academic Press, New York, 1968.
[26] I. Mozolevski and E. Süli, A priori error analysis for the hp-version of the discontinuous Galerkin finite element method for the biharmonic equation, Comput. Methods Appl. Math., 3 (2003), pp. 596–607.
[27] M. Neilan, Numerical methods for fully nonlinear second order partial differential equations, Ph.D. Dissertation, The University of Tennessee, in preparation.
[28] A. M. Oberman, Wide stencil finite difference schemes for elliptic Monge-Ampère equation and functions of the eigenvalues of the Hessian, Discrete Contin. Dyn. Syst. B, 10 (2008), pp. 221–238.
[29] V. I. Oliker and L. D. Prussner, On the numerical solution of the equation $(\partial^2 z/\partial x^2)(\partial^2 z/\partial y^2) - ((\partial^2 z/\partial x \partial y))^2 = f$ and its discretizations. I., Numer. Math., 54 (1988), pp. 271–293.
[30] A. Oukit and R. Pierre, Mixed finite element for the linear plate problem: The Hermann-Miyoshi model revisited, Numer. Math., 74 (1996), pp. 453–477.
[31] J. E. Roberts and J. M. Thomas, Mixed and hybrid methods, Handbook of Numerical Analysis, Vol. II, Finite Element Methods, North-Holland, Amsterdam, 1989.
[32] T. Nilssen, X.-C. Tai, and R. Wagner, A robust nonconforming $H^2$ element, Math. Comp., 70 (2000), pp. 489–505.
[33] M. Wang, Z. Shi, and J. Xu, A new class of Zienkiewicz-type nonconforming elements in any dimensions, Numer. Math., 106 (2007), pp. 335–347.
[34] M. Wang and J. Xu, Some tetrahedron nonconforming elements for fourth order elliptic equations, Math. Comp., 76 (2007), pp. 1–18.
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1251–1273
© 2009 Society for Industrial and Applied Mathematics
NONSMOOTH NEWTON METHODS FOR SET-VALUED SADDLE POINT PROBLEMS∗
CARSTEN GRÄSER† AND RALF KORNHUBER†
Abstract. We present a new class of iterative schemes for large scale set-valued saddle point problems as arising, e.g., from optimization problems in the presence of linear and inequality constraints. Our algorithms can be regarded either as nonsmooth Newton-type methods for the nonlinear Schur complement or as Uzawa-type iterations with active set preconditioners. Numerical experiments with a control constrained optimal control problem and a discretized Cahn–Hilliard equation with obstacle potential illustrate the reliability and efficiency of the new approach. Key words. set-valued saddle point problems, nonsmooth Newton methods, Uzawa algorithms, active set preconditioners AMS subject classifications. 49M29, 65H20, 65N22, 90C46 DOI. 10.1137/060671012
1. Introduction. We consider the iterative solution of large scale saddle point problems of the form
\[
u^* \in \mathbb{R}^n,\ w^* \in \mathbb{R}^m: \qquad \begin{pmatrix} F & B^T\\ B & -C \end{pmatrix}\begin{pmatrix} u^*\\ w^* \end{pmatrix} \ni \begin{pmatrix} f\\ g \end{pmatrix}, \tag{1.1}
\]
where $B$ and $C$ are suitable matrices and the set-valued operator $F = \partial\varphi$ stands for the subdifferential of a strictly convex functional $\varphi$. Problems of this kind typically arise from the discretization of optimization or optimal control problems governed by partial differential equations with inequality constraints (cf., e.g., [32, 45]). In the case of a quadratic objective functional, we get
\[
F = A + \partial I_K, \tag{1.2}
\]
where $I_K$ denotes the indicator functional of the admissible set $K$, $A$ is a self-adjoint positive definite, sometimes even diagonal matrix, and $C = 0$. Another rich and still growing class of problems of the form (1.1) consists of discretized phase field models, such as Cahn–Hilliard equations [5, 6, 8, 18, 19], Penrose–Fife equations [10], or Stefan-type problems [48]. For example, discretization of Cahn–Hilliard equations with logarithmic potential leads to the single-valued but singularly perturbed nonlinearity $F(u) = Au + T\log((1 + u)/(1 - u))$, where the logarithmic term is understood componentwise. Nonlinearities of the form (1.2) occur as the singular limit for vanishing temperature $T$. The matrices $A$ and $C$ are essentially stiffness matrices of the Laplacian with $A$ augmented by a nonlocal term reflecting mass conservation. Other possible applications include discretized plasticity problems [21, 43].

∗ Received by the editors October 2, 2006; accepted for publication (in revised form) October 8, 2008; published electronically February 25, 2009. This work was funded in part by the Deutsche Forschungsgemeinschaft (DFG) under contract Ko 1806/3-1 and by the DFG Research Center Matheon. http://www.siam.org/journals/sinum/47-2/67101.html
† Institut für Mathematik II, Freie Universität Berlin, Arnimallee 6, D - 14195 Berlin, Germany ([email protected], [email protected]).

Saddle point problems of the form (1.1) with single-valued, Lipschitz continuous nonlinearities $F$ have been considered in [12, 27]. Interior point methods (cf., e.g., [50, 51]) are based on suitable regularizations of set-valued nonlinearities (1.2). It is not immediately clear how this strategy should be generalized to single-valued but singularly perturbed nonlinearities. Existing primal-dual active set methods [26, 46] are based on the elimination of the state variables $u_s$ and an active set approach to the resulting constrained minimization problem for the controls $u_c$. These methods are applicable to (1.1) with $u = (u_s, u_c)$, provided that the corresponding partitioning of $B = (B_s, B_c)$ generates an invertible matrix $B_s$, that the set-valued nonlinearity (1.2) constrains only $u_c$, and finally that $C = 0$. For example, discretized Cahn–Hilliard equations have none of these properties.

The novel approach presented in this paper relies on convexity rather than smoothness. It is motivated by the fact that a variety of practically relevant nonlinearities $F$ can be either inverted in closed form or efficiently inverted by multigrid methods. This includes, e.g., the nonlinearities mentioned above [4, 3, 24, 30, 31, 29]. The basic idea is to reformulate (1.1) as an unconstrained convex minimization problem for the dual unknown $w$. The gradient of the objective functional $h$ is just the nonlinear Schur complement $H$ of (1.1) and, thus, involves $F^{-1}$. Minimization of $h$ is carried out by well-known gradient-related descent methods (cf., e.g., [36, 37, 38]). Global convergence is enforced by standard Armijo damping [2] for simplicity. We particularly concentrate on nonsmooth Newton or Newton-like methods for nonlinearities of the form (1.2), taking into account that the nonlinear Schur complement $H$ is Lipschitz but not differentiable in the classical sense. We prove global convergence and local exactness. Inexact versions are shown to be globally convergent. In the special case of discretized optimal control problems with control constraints and diagonal matrix $A$, our algorithms reduce to well-known primal-dual active set methods [25].
Hence, the algorithms presented in this paper can be regarded as a new variational approach to primal-dual active set strategies, thus, providing a natural globalization and generalization of these methods. Extensions to single-valued but singularly perturbed nonlinearities F will be presented in a forthcoming paper [23]. Our approach also sheds new light on well-established algorithms in computational plasticity [49]. From a computational point of view, our algorithms can be reinterpreted as nonlinear Uzawa iterations with active set preconditioners [22]. For nonlinearities of the form (1.2), each iteration step requires the detection of the actual active set of uν = F −1 (f − B T wν ) (not of uν itself!) and the sufficiently accurate evaluation of a corresponding linear saddle point problem (the actual preconditioner). We found in our numerical experiments with a discretized Cahn–Hilliard equation that, for bad initial iterates, the overall computational work was dominated by Armijo damping, because each Armijo test involves the exact evaluation of F −1 , i.e., the solution of a discrete elliptic obstacle problem. For reasonable initial iterates as obtained, e.g., from the preceding time step, almost no damping was necessary. In this case the (inexact) evaluation of the linear saddle point problem clearly dominated the overall computational cost. The paper is organized as follows. After some notation and a precise formulation of the assumptions, we derive the equivalent unconstrained minimization problem which is fundamental for the rest of this paper. In section 3, we recall some general convergence results for gradient-related descent methods for unconstrained minimization, including damping strategies and inexact variants. Then we concentrate on the selection of suitable descent directions for the special case of nonlinearities of the form (1.2). 
More precisely, we investigate the B-subdifferential of F and later of H, giving rise to various nonsmooth Newton-type methods. The main convergence results are
SET-VALUED SADDLE POINT PROBLEMS
collected in Theorems 4.1–4.3. Section 5 provides a more tangible reformulation of these abstract schemes in terms of quadratic obstacle problems and linear saddle point problems. Inexact evaluation of both of these subproblems and a heuristic damping strategy are also discussed. In our numerical computations, we consider a control-constrained optimal control problem and a discretized Cahn–Hilliard equation. We found superlinear convergence and finite termination, supporting our theoretical findings.

2. Set-valued saddle point problems.

2.1. General assumptions and notation. Let $\langle\cdot,\cdot\rangle$ denote the Euclidean inner product on $\mathbb{R}^m$. We equip $\mathbb{R}^m$ with the norm $\|\cdot\|_M$,
\[ \|x\|_M^2 = \langle Mx, x\rangle, \qquad x \in \mathbb{R}^m, \]
induced by a fixed symmetric, positive definite (s.p.d.) matrix $M \in \mathbb{R}^{m,m}$. Linear mappings will be identified with their matrix representations with respect to the canonical basis vectors $e_i$ with the coefficients $(e_i)_j = \delta_{i,j}$ (Kronecker-$\delta$). Elements $x'$ of the dual space $(\mathbb{R}^m)'$ will be represented as $x' = \langle x, \cdot\rangle$ with suitable $x \in \mathbb{R}^m$. Hence, using
\[ |x'(y)| = |\langle x, y\rangle| \le \bigl\|M^{-1/2}x\bigr\|\,\bigl\|M^{1/2}y\bigr\| = \|x\|_{M^{-1}}\,\|y\|_M, \]
the dual space $(\mathbb{R}^m, \|\cdot\|_M)'$ is identified with $(\mathbb{R}^m, \|\cdot\|_{M^{-1}})$. We impose the following conditions on the saddle point problem (1.1).

(A1) $F = \partial\varphi$ is the subdifferential of a proper, lower semicontinuous, strictly convex functional $\varphi : \mathbb{R}^n \to \overline{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$. The inverse $F^{-1} : \mathbb{R}^n \to \mathbb{R}^n$ is single-valued and Lipschitz continuous.
(A2) $C \in \mathbb{R}^{m,m}$ is symmetric, positive semidefinite.
(A3) $B \in \mathbb{R}^{m,n}$.
(A4) The saddle point problem (1.1) has a unique solution.

Nonlinearities $F$ satisfying condition (A1) occur, e.g., in discretized Cahn–Hilliard equations with logarithmic potential [5]. Later on, we will concentrate on the special case $F = A + \partial I_K$, where $A \in \mathbb{R}^{n,n}$ is s.p.d. and $I_K$ denotes the indicator functional of a closed convex set $K$. In this case, (A1) holds with
\[ \varphi(x) = \tfrac{1}{2}\langle Ax, x\rangle + I_K(x), \]
and $x = F^{-1}(y)$ is the unique solution of the variational inequality
\[ x \in K : \qquad \langle Ax - y, v - x\rangle \ge 0 \qquad \forall v \in K. \tag{2.1} \]
It is well known that the corresponding mapping $F^{-1} : (\mathbb{R}^n, \|\cdot\|_{A^{-1}}) \to (\mathbb{R}^n, \|\cdot\|_A)$ is Lipschitz continuous with constant $L_{F^{-1}} \le 1$ (cf., e.g., [28, p. 24]).

2.2. Nonlinear Schur complement and unconstrained minimization. Our aim is to reformulate the given saddle point problem as an unconstrained minimization problem. In the first step, the inclusion (1.1) is transformed into a single-valued equation.
Proposition 2.1. The saddle point problem (1.1) is equivalent to
\[ w^* \in \mathbb{R}^m : \qquad H(w^*) = 0 \tag{2.2} \]
with the Lipschitz continuous mapping
\[ H(w) = -B F^{-1}\bigl(f - B^T w\bigr) + Cw + g, \qquad w \in \mathbb{R}^m. \tag{2.3} \]
Proof. Using (A1), the equivalence is easily obtained by straightforward block elimination. Lipschitz continuity is clear since $H$ consists of a sum and a composition of the Lipschitz continuous function $F^{-1}$ with linear and constant functions.

The operator $H$ can be regarded as a nonlinear version of the well-known Schur complement. In contrast to the linear case, the right-hand side $f$ cannot be separated from the part depending on $w$. Note that $H$ is single-valued, because $F^{-1} = (\partial\varphi)^{-1}$ is single-valued or, equivalently, the minimization of $\varphi$ on $\mathbb{R}^n$ admits a unique solution.

Theorem 2.1. There is a Fréchet-differentiable, convex functional $h : \mathbb{R}^m \to \mathbb{R}$ with the property $\nabla h = H$ and the representation
\[ h(w) = -\mathcal{L}\bigl(F^{-1}(f - B^T w), w\bigr), \qquad w \in \mathbb{R}^m, \tag{2.4} \]
where
\[ \mathcal{L}(u, w) = \varphi(u) - \langle f, u\rangle + \langle Bu - g, w\rangle - \tfrac{1}{2}\langle Cw, w\rangle \]
denotes the Lagrange functional associated with (1.1).

Proof. The polar (or conjugate) functional $\varphi^*$ of $\varphi$ is convex and, by Corollary 5.2 in [17, p. 22], has the property $\partial\varphi^* = (\partial\varphi)^{-1} = F^{-1}$. Since $F^{-1}$ is single-valued, $\varphi^*$ is Gâteaux-differentiable. The continuity of $F^{-1}$ implies that $\varphi^*$ is even Fréchet-differentiable with $\nabla\varphi^* = F^{-1}$. Setting
\[ h(w) = \varphi^*\bigl(f - B^T w\bigr) + \tfrac{1}{2}\langle Cw, w\rangle + \langle g, w\rangle, \tag{2.5} \]
we immediately get $\nabla h = H$ using the chain rule. By the definition of $\varphi^*$ we have
\[ \varphi^*(y) = \sup_{x \in \mathbb{R}^n}\bigl(\langle y, x\rangle - \varphi(x)\bigr) = -\inf_{x \in \mathbb{R}^n}\bigl(\varphi(x) - \langle y, x\rangle\bigr) = -\Bigl(\varphi\bigl(F^{-1}(y)\bigr) - \bigl\langle y, F^{-1}(y)\bigr\rangle\Bigr), \qquad y \in \mathbb{R}^n. \]
Inserting this representation with $y = f - B^T w$ into (2.5), we get (2.4). The convexity of $\varphi$ implies the monotonicity of $F^{-1}$. In combination with the nonnegativity of $C$ we get
\[ \begin{aligned} \langle w_1 - w_2, H(w_1) - H(w_2)\rangle ={}& \bigl\langle (f - B^T w_1) - (f - B^T w_2),\, F^{-1}(f - B^T w_1) - F^{-1}(f - B^T w_2)\bigr\rangle \\ &+ \langle C(w_1 - w_2), w_1 - w_2\rangle \ge 0 \end{aligned} \tag{2.6} \]
so that $H$ is monotone. Therefore, $h$ is convex.

Assuming, in addition to (A2), that $C$ is positive definite, it is not difficult to show that $h$ is strongly convex; i.e., there is a constant $\mu > 0$ such that
\[ h(\lambda x + (1 - \lambda)y) \le \lambda h(x) + (1 - \lambda)h(y) - \lambda(1 - \lambda)\frac{\mu}{2}\|x - y\|_M^2 \qquad \forall\lambda \in [0, 1] \tag{2.7} \]
holds for all $x, y \in \mathbb{R}^m$. In general, however, $h$ is not even strictly convex so that we had to require uniqueness separately. Combining Proposition 2.1 with Theorem 2.1, we are ready to state the main result of this section.

Corollary 2.1. The set-valued saddle point problem (1.1) is equivalent to the unconstrained convex minimization problem
\[ w^* \in \mathbb{R}^m : \qquad h(w^*) \le h(w) \qquad \forall w \in \mathbb{R}^m. \tag{2.8} \]
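To make the objects in Proposition 2.1 and Theorem 2.1 concrete, the following minimal sketch evaluates $F^{-1}$, $H$, and $h$ for a toy instance and compares $H$ with a finite-difference gradient of $h$. It assumes the special case $F = A + \partial I_K$ with a diagonal matrix $A$ and a box $K$, so that $F^{-1}$ reduces to a componentwise clip. All data values and function names are illustrative assumptions and are not taken from the paper.

```python
# Toy data (illustrative assumptions): n = 3 primal unknowns, m = 2 dual unknowns.
A_DIAG = [2.0, 1.0, 4.0]            # diagonal of the s.p.d. matrix A
LOWER = [-1.0, -1.0, -1.0]          # box K = {x : LOWER <= x <= UPPER}
UPPER = [1.0, 1.0, 1.0]
B = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 2.0]]
C = [[1.0, 0.0],
     [0.0, 0.5]]                    # symmetric positive semidefinite
F_RHS = [3.0, 0.2, -1.0]
G_RHS = [0.1, -0.3]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(mat, x):
    return [dot(row, x) for row in mat]


def f_inverse(y):
    """F^{-1}(y) for diagonal A and box K: clip y_i / A_ii to [LOWER_i, UPPER_i]."""
    return [min(max(yi / ai, lo), up)
            for yi, ai, lo, up in zip(y, A_DIAG, LOWER, UPPER)]


def primal(w):
    """u = F^{-1}(f - B^T w)."""
    bt_w = [sum(B[i][j] * w[i] for i in range(len(w))) for j in range(len(F_RHS))]
    return f_inverse([fj - tj for fj, tj in zip(F_RHS, bt_w)])


def schur_H(w):
    """Nonlinear Schur complement H(w) = -B u + C w + g, cf. (2.3)."""
    bu = matvec(B, primal(w))
    cw = matvec(C, w)
    return [-bui + cwi + gi for bui, cwi, gi in zip(bu, cw, G_RHS)]


def dual_h(w):
    """h(w) = -L(u, w) with u = F^{-1}(f - B^T w), cf. (2.4)."""
    u = primal(w)
    phi = 0.5 * sum(ai * ui * ui for ai, ui in zip(A_DIAG, u))  # I_K(u) = 0 because u lies in K
    bu_minus_g = [bui - gi for bui, gi in zip(matvec(B, u), G_RHS)]
    lagrangian = phi - dot(F_RHS, u) + dot(bu_minus_g, w) - 0.5 * dot(matvec(C, w), w)
    return -lagrangian


def central_difference_gradient(func, w, eps=1e-6):
    """Finite-difference approximation of the gradient, used only as a check."""
    grad = []
    for i in range(len(w)):
        plus, minus = list(w), list(w)
        plus[i] += eps
        minus[i] -= eps
        grad.append((func(plus) - func(minus)) / (2.0 * eps))
    return grad
```

Calling `schur_H(w)` and `central_difference_gradient(dual_h, w)` at the same point should give matching vectors up to discretization error, which is the statement $\nabla h = H$ in this toy setting.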
Recall that the functional $h$ is differentiable with Lipschitz continuous gradient $H = \nabla h$. However, the actual evaluation of $h(w)$ and $\nabla h(w)$ might be expensive, because it involves the solution of $F(u) \ni f - B^T w$.

3. Gradient-related methods. Exploiting Corollary 2.1, existing algorithms for the unconstrained minimization of convex, differentiable functionals now can be utilized to solve the constrained saddle point problem (1.1). In this section, we consider the fairly general class of gradient-related descent methods (see, for example, [37]). In agreement with section 2.2, we assume that $h : \mathbb{R}^m \to \mathbb{R}$ denotes a convex functional with Lipschitz continuous Fréchet derivative $\nabla h$ and the unique minimizer $w^* \in \mathbb{R}^m$.

3.1. Global convergence results. We consider the iteration
\[ w^{\nu+1} = w^\nu + \rho_\nu d^\nu, \qquad \nu = 0, 1, \dots, \tag{3.1} \]
with given initial guess $w^0 \in \mathbb{R}^m$. In each step, first a search direction $d^\nu$ is chosen according to the actual iterate $w^\nu$ and then a step size $\rho_\nu$ is fixed according to $w^\nu$ and $d^\nu$, i.e.,
\[ d^\nu = d(\nu, w^\nu), \qquad \rho_\nu = \rho(\nu, w^\nu, d^\nu), \qquad \nu = 0, 1, \dots, \tag{3.2} \]
with suitable mappings $d$, $\rho$. The search directions $d^\nu$ should allow for a sufficient descent of $h$.

Definition 3.1. The search directions $d^\nu = d(\nu, w^\nu)$, $\nu \in \mathbb{N}$, are called gradient-related descent directions if for any sequence $(w^\nu) \subset \mathbb{R}^m$ the conditions
\[ \nabla h(w^\nu) = 0 \iff d^\nu = 0 \qquad \forall\nu \in \mathbb{N} \tag{3.3} \]
and
\[ -\langle\nabla h(w^\nu), d^\nu\rangle \ge c_D\,\|\nabla h(w^\nu)\|_{M^{-1}}\,\|d^\nu\|_M \qquad \forall\nu \in \mathbb{N} \tag{3.4} \]
hold with a constant $c_D > 0$ independent of $\nu$.

Note that the preconditioned gradients $d^\nu = -M^{-1}\nabla h(w^\nu)$ satisfy (3.4) with equality and $c_D = 1$. Obviously, (3.4) implies
\[ -\langle\nabla h(w^\nu), d^\nu\rangle > 0 \tag{3.5} \]
if $\nabla h(w^\nu) \ne 0$. Search directions $d^\nu = d(\nu, w^\nu)$, $\nu \in \mathbb{N}$, satisfying (3.3) and, instead of (3.4), the weaker condition (3.5) for arbitrary $(w^\nu) \subset \mathbb{R}^m$ are called descent directions. The step sizes $\rho_\nu$ should realize a sufficient portion of the possible descent.

Definition 3.2. Let $d^\nu = d(\nu, w^\nu)$, $\nu \in \mathbb{N}$, be descent directions. Then the step sizes $\rho_\nu = \rho(\nu, w^\nu, d^\nu)$, $\nu \in \mathbb{N}$, are called efficient if for any sequence $(w^\nu) \subset \mathbb{R}^m$ the estimate
\[ h(w^\nu + \rho_\nu d^\nu) \le h(w^\nu) - c_S\left(\frac{\langle\nabla h(w^\nu), d^\nu\rangle}{\|d^\nu\|_M}\right)^2 \tag{3.6} \]
holds for all $\nu \in \mathbb{N}$ such that $\nabla h(w^\nu) \ne 0$ with a constant $c_S > 0$ independent of $\nu$.
We are now ready to prove convergence.

Theorem 3.1. Assume that (3.2) provides gradient-related descent directions $d^\nu$ and efficient step sizes $\rho_\nu$. Then, for arbitrary initial iterate $w^0 \in \mathbb{R}^m$, the iterates $w^\nu$, $\nu \in \mathbb{N}$, obtained from (3.1) converge to the minimizer $w^*$ of $h$.

Proof. Combining the properties of $d^\nu = d(\nu, w^\nu)$ and $\rho_\nu = \rho(\nu, w^\nu, d^\nu)$ we get
\[ h(w^\nu) - h(w^{\nu+1}) \ge c_S c_D^2\,\|\nabla h(w^\nu)\|_{M^{-1}}^2 \qquad \forall\nu \in \mathbb{N}. \tag{3.7} \]
Since $h$ has a global minimizer, the sequence $(h(w^\nu))$ is bounded from below and, by (3.7), is monotonically decreasing. Hence, $h(w^\nu)$ converges to some $h^* \in \mathbb{R}$. Using again (3.7), we get
\[ 0 \le c_S c_D^2\,\|\nabla h(w^\nu)\|_{M^{-1}}^2 \le h(w^\nu) - h(w^{\nu+1}) \to 0 \tag{3.8} \]
so that $\nabla h(w^\nu)$ must tend to zero. The section $S = \{w \in \mathbb{R}^m \mid h(w) \le h(w^0)\}$ is bounded. Otherwise, there would be a sequence $(w_k) \subset S$ with the property $\lambda_k^{-1} := \|w_k - w^*\| \ge k$. Then, by compactness of the unit sphere with center $w^*$, the sequence $w_k' = w^* + (w_k - w^*)/\|w_k - w^*\|$ has a convergent subsequence $w_{k_j}' \to w^{**} \ne w^*$. By continuity and convexity of $h$ this leads to
\[ h(w^{**}) = \lim_{j\to\infty} h(w_{k_j}') \le \lim_{j\to\infty}\bigl(\lambda_{k_j} h(w_{k_j}) + (1 - \lambda_{k_j})h(w^*)\bigr) = h(w^*), \]
contradicting the uniqueness of $w^*$. The section $S$ is also closed and, therefore, compact. As a consequence, $(w^\nu)$ has a convergent subsequence $(w^{\nu_i}) \to w^{**}$. The continuity of $\nabla h$ provides $\nabla h(w^{**}) = 0$, and uniqueness implies $w^{**} = w^*$. Hence, each convergent subsequence must tend to $w^*$. This proves the assertion.

In the proof, we have made extensive use of the Heine–Borel theorem, which is restricted to finite dimensions. However, using weak compactness and the weak lower semicontinuity of $h$, weak convergence of the iterates $w^\nu$ can be shown by similar arguments in the infinite-dimensional case. Strong linear convergence can be shown in any dimension under the additional assumption that $h$ is strongly convex. The proof is based on the following lemma summarizing well-known results (cf., e.g., [37]).

Lemma 3.1. Let $h$ be strongly convex with constant $\mu > 0$. Then $h$ satisfies the estimates
\[ \frac{\mu}{2}\|w - w^*\|_M^2 \le h(w) - h(w^*) \le \frac{1}{2\mu}\|\nabla h(w)\|_{M^{-1}}^2 \qquad \forall w \in \mathbb{R}^m \tag{3.9} \]
with the minimizer $w^*$ of $h$.

Theorem 3.2. Assume that the conditions of Theorem 3.1 are satisfied and, in addition, $h$ is strongly convex with constant $\mu > 0$. Then the iterates $w^\nu$, $\nu \in \mathbb{N}$, produced by (3.1) satisfy the error estimate
\[ \|w^\nu - w^*\|_M^2 \le q^\nu\,\frac{2}{\mu}\bigl(h(w^0) - h(w^*)\bigr), \tag{3.10} \]
where $0 \le q = (1 - 2c_S c_D^2\mu) < 1$ if $w^0 \ne w^*$. The proof is straightforward using Lemma 3.1.
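For orientation, the argument behind Theorem 3.2 can be spelled out as follows. This is a sketch of the standard reasoning and uses only the descent estimate (3.7) and the two inequalities of Lemma 3.1, both read as estimates for squared norms.

```latex
% Step 1: (3.7) combined with the right inequality in (3.9) at w = w^\nu
h(w^\nu) - h(w^{\nu+1})
  \ge c_S c_D^2 \,\|\nabla h(w^\nu)\|_{M^{-1}}^2
  \ge 2\mu\, c_S c_D^2 \,\bigl(h(w^\nu) - h(w^*)\bigr).

% Step 2: rearranging gives a contraction of the energy error with q = 1 - 2 c_S c_D^2 \mu
h(w^{\nu+1}) - h(w^*) \le q\,\bigl(h(w^\nu) - h(w^*)\bigr)
\quad\Longrightarrow\quad
h(w^\nu) - h(w^*) \le q^\nu \bigl(h(w^0) - h(w^*)\bigr).

% Step 3: the left inequality in (3.9) converts the energy error into an error in the M-norm
\frac{\mu}{2}\,\|w^\nu - w^*\|_M^2 \le h(w^\nu) - h(w^*) \le q^\nu \bigl(h(w^0) - h(w^*)\bigr).
```

Since the left-hand side of Step 1 is bounded by $h(w^\nu) - h(w^*)$, the factor $2\mu c_S c_D^2$ cannot exceed one whenever $w^\nu \ne w^*$, which gives $q \ge 0$.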
3.2. Damping strategies. A variety of algorithms for efficient step size control are available from surveys and textbooks like [16, 36, 37, 38]. For simplicity, we consider the standard Armijo strategy [2], [16, p. 121], and [37, p. 491] based on the actual decrease of the functional $h$. More precisely, for a fixed parameter $\delta \in (0, 1)$ and each $\nu \in \mathbb{N}$ a step size $\rho \ge 0$ is called admissible if
\[ h(w^\nu + \rho d^\nu) \le h(w^\nu) + \rho\,\delta\,\langle\nabla h(w^\nu), d^\nu\rangle \tag{3.11} \]
is satisfied.

Proposition 3.1. Let $(w^\nu) \subset \mathbb{R}^m$, and let $d^\nu = d(\nu, w^\nu)$, $\nu \in \mathbb{N}$, be descent directions. For suitably selected, fixed parameters $\alpha > 0$ and $\delta, \beta \in (0, 1)$ determine the step sizes $\rho_\nu = \rho(\nu, w^\nu, d^\nu) \ge 0$ by
\[ \rho_\nu = \max_{j \in \mathbb{N}\cup\{0\}}\bigl\{\rho = \alpha_\nu\beta^j \bigm| \rho \text{ admissible}\bigr\}, \qquad \alpha_\nu \ge -\alpha\,\frac{\langle\nabla h(w^\nu), d^\nu\rangle}{\|d^\nu\|_M^2}, \tag{3.12} \]
if $d^\nu \ne 0$ and set $\rho_\nu = 0$ otherwise. Then the efficiency condition (3.6) holds with
\[ c_S = \delta\min\Bigl\{\alpha,\ \beta\,\frac{1 - \delta}{L}\Bigr\}. \tag{3.13} \]
Here $L$ stands for the Lipschitz constant of $\nabla h$, i.e.,
\[ \|\nabla h(v) - \nabla h(w)\|_{M^{-1}} \le L\,\|v - w\|_M \qquad \forall v, w \in \mathbb{R}^m. \tag{3.14} \]
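The backtracking rule of Proposition 3.1 can be sketched in a few lines. The function below is a minimal illustration, assuming that the caller supplies the functional `h` as a Python callable, the directional derivative $\langle\nabla h(w^\nu), d^\nu\rangle$ as a number, and an initial trial step `alpha_nu` that satisfies the lower bound in (3.12). The names, the default parameters, and the cap `max_tests` on the number of tests are assumptions for illustration only.

```python
def armijo_step(h, grad_dot_d, w, d, alpha_nu=1.0, beta=0.5, delta=0.1, max_tests=60):
    """Return the largest rho = alpha_nu * beta**j, j = 0, 1, ..., that is admissible in (3.11).

    h          -- callable evaluating the functional at a list of floats
    grad_dot_d -- the inner product <grad h(w), d>, negative for a descent direction
    """
    if all(di == 0.0 for di in d):
        return 0.0                      # rho = 0 if d = 0
    h_w = h(w)
    rho = alpha_nu
    for _ in range(max_tests):
        trial = [wi + rho * di for wi, di in zip(w, d)]
        # Armijo test (3.11): sufficient decrease relative to the linear prediction
        if h(trial) <= h_w + rho * delta * grad_dot_d:
            return rho
        rho *= beta                     # next candidate alpha_nu * beta**(j + 1)
    raise RuntimeError("no admissible step size found; is d a descent direction?")


def quadratic(w):
    """Illustrative test functional h(w) = (w_0^2 + 10 w_1^2) / 2."""
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)
```

For example, with `w = [1.0, 1.0]`, the gradient of `quadratic` is `[1.0, 10.0]`, so the steepest descent direction is `d = [-1.0, -10.0]` with `grad_dot_d = -101.0`. Each loop pass costs one evaluation of `h`, which in the setting of this paper means one evaluation of $F^{-1}$.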
The proof of Proposition 3.1 adopts standard arguments, e.g., from [37]. Starting with $j = 0$, efficient step sizes can be computed from (3.12) by a finite number of tests. Observe that each of these tests might be expensive, because it requires the evaluation of $h$ and, therefore, the evaluation of $F^{-1}$ (cf. Theorem 2.1).

3.3. Inexact versions. We consider inexact search directions $\tilde d^\nu$. This means that for given $\nu$ and $w^\nu$ the exact evaluation $d^\nu = d(\nu, w^\nu)$ is replaced by some approximation
\[ \tilde d^\nu = \tilde d(\nu, w^\nu) \tag{3.15} \]
based on some approximation $\tilde d$ of the exact mapping $d$.

Proposition 3.2. Let $d^\nu = d(\nu, w^\nu)$ be gradient-related descent directions with the constant $c_D$. Assume that the approximations $\tilde d^\nu = \tilde d(\nu, w^\nu)$ satisfy (3.3) and the accuracy condition
\[ \bigl\|d^\nu - \tilde d^\nu\bigr\|_M \le c\,\bigl\|\tilde d^\nu\bigr\|_M \qquad \forall\nu \in \mathbb{N}, \qquad c < \frac{c_D}{2}, \tag{3.16} \]
for any sequence $(w^\nu)$. Then the approximations $\tilde d^\nu = \tilde d(\nu, w^\nu)$ are also gradient-related descent directions.

Proof. Let $(w^\nu) \subset \mathbb{R}^m$. Then the vectors $d^\nu = d(\nu, w^\nu)$, $\nu \in \mathbb{N}$, satisfy (3.4) and we have to prove a similar estimate for the approximations $\tilde d^\nu$. This is trivial for $\tilde d^\nu = 0$. Note that (3.16) implies $d^\nu = 0$ in this case. In light of (3.3) there is only the remaining case $d^\nu, \tilde d^\nu \ne 0$. Some elementary calculations involving the Cauchy–Schwarz inequality and the triangle inequality yield
\[ \left\langle\frac{\nabla h(w^\nu)}{\|\nabla h(w^\nu)\|_{M^{-1}}},\ \frac{d^\nu}{\|d^\nu\|_M} - \frac{\tilde d^\nu}{\|\tilde d^\nu\|_M}\right\rangle \le 2\,\frac{\|d^\nu - \tilde d^\nu\|_M}{\|\tilde d^\nu\|_M}. \]
As $\|d^\nu - \tilde d^\nu\|_M / \|\tilde d^\nu\|_M \le c < c_D/2$, it is clear that
\[ -\bigl\langle\nabla h(w^\nu), \tilde d^\nu\bigr\rangle \ge \tilde c_D\,\|\nabla h(w^\nu)\|_{M^{-1}}\,\bigl\|\tilde d^\nu\bigr\|_M \]
with $\tilde c_D = c_D - 2c > 0$.

Usually, the constant $c_D$ occurring in the accuracy condition (3.16) is not known. Replacing (3.16) by the asymptotic criterion
\[ \lim_{\nu\to\infty}\frac{\|d^\nu - \tilde d^\nu\|_M}{\|\tilde d^\nu\|_M} = 0, \tag{3.17} \]
the approximate directions $\tilde d^\nu$ have the desired property (3.4) for sufficiently large $\nu$.

4. Nonsmooth Newton methods and related algorithms. We now consider the question of how to choose the descent directions $d^\nu = d(w^\nu)$. We will concentrate on preconditioned gradients of $h$ or, more precisely, on directions of the form
\[ d^\nu = -S_\nu^{-1}H(w^\nu), \qquad H = \nabla h, \tag{4.1} \]
with suitable s.p.d. matrices $S_\nu = S(\nu, w^\nu)$. If $H$ were sufficiently smooth, the derivative $S_\nu = H'(w^\nu) : \mathbb{R}^m \to \mathbb{R}^m$ would provide the classical Newton iteration. From our assumptions (A1)–(A4) and the definition (2.3), we cannot expect $H'$ to exist. Hence, related concepts from nonsmooth analysis will be applied. To this end, (A1) is from now on replaced by the stronger condition (A1'):

(A1') $F = A + \partial I_K$, where $A \in \mathbb{R}^{n,n}$ is s.p.d. and $I_K$ denotes the indicator functional of the closed convex set
\[ K = \{x \in \mathbb{R}^n \mid a \le x \le b\}, \qquad a, b \in (\mathbb{R}\cup\{-\infty, \infty\})^n, \qquad a < 0 < b. \tag{4.2} \]
Recall that $F$ is the subdifferential of $\varphi(x) = \tfrac{1}{2}\langle Ax, x\rangle + I_K(x)$ and that $F^{-1}$ is Lipschitz continuous with constant $L_{F^{-1}} \le 1$ in this case. Nonlinearities $F$ satisfying (A1') occur, e.g., in discretized optimal control problems with inequality constraints [32, 45] or discretized phase field models with obstacle potentials [6, 8]. The condition $a < 0 < b$ causes no loss of generality and will be notationally convenient in what follows.

4.1. The B-subdifferential of $F^{-1}$. Let $c \in K$ with $K \subset \mathbb{R}^n$ defined in (4.2). We introduce the subset of all active indices
\[ \mathcal{N}_c^\bullet := \{i \in \mathcal{N} \mid a_i = c_i \text{ or } c_i = b_i\} \]
of the index set $\mathcal{N} = \{1, \dots, n\}$. The mapping $T_c : \mathbb{R}^n \to \mathbb{R}^n$, defined by
\[ T_c x := \sum_{i \in \mathcal{N}\setminus\mathcal{N}_c^\bullet} x_i e_i, \qquad x \in \mathbb{R}^n, \]
truncates all coefficients with active indices. Note that $T_c$ is an orthogonal projection with respect to the Euclidean scalar product $\langle\cdot,\cdot\rangle$. The finite set
\[ \mathcal{C} := \{c \in K \mid (I - T_c)c = c\} \]
represents all possible configurations of active coefficients, i.e., of coefficients with active indices. The active coefficients of $x \in K$ are given by
\[ T_{\mathcal{C}}\,x := (I - T_x)x \in \mathcal{C}. \tag{4.3} \]
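The index bookkeeping above translates directly into code. The following minimal sketch represents vectors as Python lists, uses 0-based indices instead of the 1-based index set of the text, and models infinite bounds with `math.inf`. The function names are hypothetical and chosen for illustration.

```python
import math


def active_indices(c, a, b):
    """Indices i with c_i = a_i or c_i = b_i (0-based)."""
    return [i for i, (ci, ai, bi) in enumerate(zip(c, a, b)) if ci == ai or ci == bi]


def truncate(x, c, a, b):
    """T_c x: set all coefficients of x with indices active for c to zero."""
    active = set(active_indices(c, a, b))
    return [0.0 if i in active else xi for i, xi in enumerate(x)]


def active_coefficients(x, a, b):
    """T_C x = (I - T_x) x: keep the active coefficients of x, zero elsewhere, cf. (4.3)."""
    t_x = truncate(x, x, a, b)
    return [xi - ti for xi, ti in zip(x, t_x)]


# Illustrative bounds with a < 0 < b, including infinite entries.
a = [-1.0, -math.inf, -2.0]
b = [1.0, 3.0, math.inf]
x = [1.0, 0.5, -2.0]           # first coefficient at its upper, third at its lower bound
c = active_coefficients(x, a, b)
```

Because of $a < 0 < b$, a zero entry of `c` is never mistaken for an active coefficient, so `c` satisfies the defining property $(I - T_c)c = c$ of the configuration set.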
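As $F : K \to \mathbb{R}^n$ is invertible, $K$ and $\mathbb{R}^n$ can be decomposed according to
\[ K = \bigcup_{c \in \mathcal{C}} I_c, \qquad \mathbb{R}^n = \bigcup_{c \in \mathcal{C}} F(I_c), \qquad I_c := \{x \in K \mid T_{\mathcal{C}}\,x = c\}, \tag{4.4} \]
based on the subsets $I_c$ of vectors with the same active coefficients. Note that
\[ (I - T_c)x = c \qquad \forall x \in I_c, \quad c \in \mathcal{C}. \]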
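We now investigate the restriction of $F$ to $I_c$. To this end, it is convenient to introduce the mapping
\[ \hat A_c := T_c A T_c + I - T_c : \mathbb{R}^n \to \mathbb{R}^n. \tag{4.5} \]
Observe that $\hat A_c : \operatorname{ran} T_c \to \operatorname{ran} T_c$ and $\hat A_c$ reduces to the identity on the orthogonal complement $\operatorname{ran}(I - T_c)$. Hence,
\[ \hat A_c T_c = T_c A T_c = T_c \hat A_c, \qquad \hat A_c(I - T_c) = I - T_c. \tag{4.6} \]
Using
\[ \bigl\langle\hat A_c x, y\bigr\rangle = \langle A T_c x, T_c y\rangle + \langle(I - T_c)x, (I - T_c)y\rangle \]
it is easy to show that $\hat A_c$ is s.p.d. Multiplying (4.6) by $\hat A_c^{-1}$ we obtain
\[ \hat A_c^{-1} T_c = T_c \hat A_c^{-1}, \qquad \hat A_c^{-1}(I - T_c) = I - T_c. \tag{4.7} \]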
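As a small worked instance of these definitions, consider $n = 2$ with a finite upper bound $b_1$ and the configuration $c = (b_1, 0)$, so that only the first index is active. The entries $a_{ij}$ of $A$ are generic, and the hat notation for the modified matrix follows the convention used for (4.5).

```latex
T_c = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\hat A_c = T_c A T_c + I - T_c
         = \begin{pmatrix} 0 & 0 \\ 0 & a_{22} \end{pmatrix}
         + \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
         = \begin{pmatrix} 1 & 0 \\ 0 & a_{22} \end{pmatrix}, \qquad
\hat A_c^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & a_{22}^{-1} \end{pmatrix}.

% Both identities in (4.7) can be read off directly:
\hat A_c^{-1} T_c = \begin{pmatrix} 0 & 0 \\ 0 & a_{22}^{-1} \end{pmatrix} = T_c \hat A_c^{-1}, \qquad
\hat A_c^{-1}(I - T_c) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = I - T_c.
```

In words, the row and column of the active index are replaced by those of the identity, and $a_{22} > 0$ because $A$ is s.p.d.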
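Lemma 4.1. Let $c \in \mathcal{C}$. Then the restriction of $F$ to $I_c$ takes the form
\[ F(x) = Ax + \sum_{i \in \mathcal{N}_c^\bullet} [0, \infty)\,s_i(c)\,e_i, \qquad x \in I_c, \tag{4.8} \]
denoting
\[ s_i(c) = \begin{cases} +1 & \text{if } c_i = b_i, \\ -1 & \text{if } c_i = a_i, \end{cases} \qquad i \in \mathcal{N}_c^\bullet. \]
Conversely, the restriction of $F^{-1}$ to $F(I_c)$ takes the form
\[ F^{-1}(y) = T_c \hat A_c^{-1} T_c\,y + \bigl(I - T_c \hat A_c^{-1} T_c A\bigr)c, \qquad y \in F(I_c). \tag{4.9} \]

Proof. Let $x \in I_c$. Using the representation
\[ I_K(x) = \sum_{i \in \mathcal{N}} I_{[a_i, b_i]}(x_i), \qquad x = \sum_{i \in \mathcal{N}} x_i e_i, \]
of the characteristic functional $I_K$, we immediately get (cf. [17, p. 26])
\[ \partial I_K(x) = \sum_{i \in \mathcal{N}} \partial I_{[a_i, b_i]}(x_i)\,e_i = \sum_{i \in \mathcal{N}_c^\bullet} [0, \infty)\,s_i(c)\,e_i. \]
This proves (4.8).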
Let $x \in I_c$ and $y \in F(x)$. We apply $T_c$ to the representation (4.8), insert the splitting $x = T_c x + (I - T_c)x$, and use the identity $(I - T_c)x = c$ to obtain
\[ T_c y = T_c A x = T_c A T_c x + T_c A c = \hat A_c x - (I - T_c A)c. \]
Multiplying by $\hat A_c^{-1}$ and reordering terms, we get
\[ x = \hat A_c^{-1} T_c\,y + \hat A_c^{-1}(I - T_c A)c. \tag{4.10} \]
The left identity in (4.7) yields
\[ \hat A_c^{-1} T_c = \hat A_c^{-1} T_c T_c = T_c \hat A_c^{-1} T_c. \]
Using $c = (I - T_c)c$ and the right identity in (4.7), we obtain
\[ \hat A_c^{-1} c = \hat A_c^{-1}(I - T_c)c = (I - T_c)c = c. \]
Inserting these representations into (4.10) the assertion (4.9) follows.

As a consequence of (4.4) and (4.9), $F^{-1}$ is piecewise affine linear on $\mathbb{R}^n$ with the linear part $T_c \hat A_c^{-1} T_c$ on each subset $F(I_c)$, $c \in \mathcal{C}$. In the extreme case, $\mathcal{N}_c^\bullet = \mathcal{N}$, $F^{-1}$ is even constant on $F(I_c)$. As $F^{-1}$ is Lipschitz continuous, $F^{-1}$ must be differentiable almost everywhere (cf. Rademacher's theorem [35]). Let $D_{F^{-1}}$ denote the set where $F^{-1}$ is differentiable. Then the B-subdifferential $\partial_B(F^{-1})$ (cf. [40, 46]) is defined by
\[ \partial_B(F^{-1})(y) = \Bigl\{\lim_{\substack{y_n \to y \\ y_n \in D_{F^{-1}}}} D(F^{-1})(y_n)\Bigr\}. \]
Note that $\partial_B F^{-1}(y) \subset \operatorname{co}\partial_B F^{-1}(y) = \partial F^{-1}(y)$ with $\partial(F^{-1})$ denoting Clarke's generalized derivative [13, Chapter 2].

Proposition 4.1. Let $y \in \mathbb{R}^n$ and $c = T_{\mathcal{C}}(F^{-1}(y)) \in \mathcal{C}$. Then
\[ T_c \hat A_c^{-1} T_c \in \partial_B F^{-1}(y). \tag{4.11} \]

Proof. Note that $F^{-1}(y) \in I_c$ by definition (4.4) of $I_c$. Inserting the decomposition $x = T_c x + c$ of some arbitrary $x \in I_c$ into (4.8), it turns out that $F(I_c)$ is the parallelepiped translated from the origin by $Ac$ and spanned by the nonzero column vectors of $AT_c$ and of $I - T_c$ with coefficients $z_i \in (a_i, b_i)$, $i \in \mathcal{N}\setminus\mathcal{N}_c^\bullet$, and $z_i \in [0, \infty)\,s_i(c)$, $i \in \mathcal{N}_c^\bullet$, respectively. Utilizing the identities $AT_c + I - T_c = \hat A_c + (I - T_c)AT_c$, (4.7), and the orthogonality $T_c(I - T_c) = 0$, it is easily checked that
\[ \bigl(\hat A_c^{-1} - (I - T_c)AT_c\hat A_c^{-1}\bigr)(AT_c + I - T_c) = I. \]
Hence, the interior of $F(I_c)$ cannot be empty so that the convexity of $F(I_c)$ yields
\[ F(I_c) \subset \overline{\operatorname{int} F(I_c)}. \tag{4.12} \]
If $y \in \operatorname{int} F(I_c)$, then the representation (4.9) implies $D F^{-1}(y) = T_c \hat A_c^{-1} T_c$.
[Figure 4.1: regions labeled $F(I_{(b_1,a_2)})$, $F(I_{(a_1,0)})$, $F(I_{(0,0)})$, $F(I_{(b_1,0)})$, $F(I_{(b_1,b_2)})$.]
Fig. 4.1. Decomposition of $\mathbb{R}^2$ into parallelepipeds $F(I_c)$, $c \in \mathcal{C}$.
If $y \in F(I_c) \setminus \operatorname{int} F(I_c)$, then (4.12) implies that there is a sequence $(y_k) \subset \operatorname{int} F(I_c)$ with $y_k \to y$. Obviously,
\[ \lim_{k\to\infty} DF^{-1}(y_k) = T_c \hat A_c^{-1} T_c \]
which proves the assertion.

Figure 4.1 illustrates the decomposition of $\mathbb{R}^n$ into the nondegenerate parallelepipeds $F(I_c)$, $c \in \mathcal{C}$, for $n = 2$. The only bounded parallelepiped $F(I_{(0,0)})$ is spanned by the column vectors of $A$.

4.2. Algorithms and convergence results. Proposition 4.1 suggests using B-subdifferentials $T_c \hat A_c^{-1} T_c$, $c \in \mathcal{C}$, for the linearization of the Schur complement $H(w) = -BF^{-1}(f - B^T w) + Cw + g$, $w \in \mathbb{R}^m$, as introduced in (2.3).

Proposition 4.2. Assume that $\operatorname{rank} B = n$. Then
\[ S(c) = BT_c\,\hat A_c^{-1}(BT_c)^T + C \in \partial_B H(w), \qquad w \in \mathbb{R}^m, \tag{4.13} \]
where
\[ c = c(w) = T_{\mathcal{C}}\,F^{-1}\bigl(f - B^T w\bigr). \tag{4.14} \]

Proof. Let $G : \mathbb{R}^m \to \mathbb{R}^n$ be defined by $G(w) = F^{-1}(f - B^T w)$, $w \in \mathbb{R}^m$. We consider some fixed $w \in \mathbb{R}^m$ and $c = T_{\mathcal{C}}\,G(w)$. As $\operatorname{rank} B^T = n$, the mapping $B^T : \mathbb{R}^m \to \mathbb{R}^n$ is surjective. Hence, the preimage $G^{-1}(I_c)$ of $I_c$ is still a nondegenerate parallelepiped. Therefore, we can use the same arguments as in the proof of Proposition 4.1 to show
\[ -T_c \hat A_c^{-1} T_c B^T \in \partial_B G(w). \]
As $H$ is an affine transformation of $G$, the assertion follows.

Simple counterexamples show that (4.13) might not hold for $\operatorname{rank} B^T < n$. Let us check whether $S(c)$ is invertible. We immediately get
\[ \langle S(c)x, y\rangle = \bigl\langle\hat A_c^{-1}(BT_c)^T x, (BT_c)^T y\bigr\rangle + \langle Cx, y\rangle, \qquad x, y \in \mathbb{R}^m. \]
Hence, $S(c)$ is symmetric and positive semidefinite. It is a sufficient (but not necessary) condition for the regularity of $S(c)$ that $C$ is s.p.d.

Lemma 4.2. Assume that $S(c)$ is s.p.d. for all $c \in \mathcal{C}$. Then $h$ is strongly convex.

Proof. Consider $G(w) = F^{-1}(f - B^T w)$ as already introduced in the proof of Proposition 4.2. Let $c \in \mathcal{C}$. Then for all $w \in G^{-1}I_c$ the representation $\nabla h(w) = H(w) = S(c)w + \tilde g(c)$ holds with suitable $\tilde g(c) \in \mathbb{R}^m$ independent of $w$ (cf. Lemma 4.1). As $S(c)$ is s.p.d., we have
\[ \langle S(c)w, w\rangle \ge \gamma_c\,\|w\|_M^2 \qquad \forall w \in G^{-1}I_c \tag{4.15} \]
with some constant $\gamma_c > 0$. This means that $h$ is quadratic and strongly convex on each preimage $G^{-1}I_c$. We now show strong convexity on the whole $\mathbb{R}^m = \bigcup_{c \in \mathcal{C}} G^{-1}I_c$ with the constant $\mu = \min_{c \in \mathcal{C}}\gamma_c > 0$. To this end, we define the scalar functions
\[ \psi_1(\lambda) = \|x - y\|_M^{-2}\,h(\lambda x + (1 - \lambda)y), \qquad \psi_2(\lambda) = \|x - y\|_M^{-2}\bigl(\lambda h(x) + (1 - \lambda)h(y)\bigr) - \frac{\mu}{2}\lambda(1 - \lambda), \qquad \lambda \in [0, 1], \]
with some fixed $x \ne y \in \mathbb{R}^m$. It is sufficient to show $\psi_1 \le \psi_2$. Obviously, $\psi_1$ is piecewise quadratic, $\psi_2$ is quadratic, and $\psi_1(\lambda) = \psi_2(\lambda)$ at the boundary $\lambda = 0, 1$. By definition,
\[ \psi_1''(\lambda) \ge \min_{c \in \mathcal{C}}\gamma_c = \psi_2''(\lambda) \]
holds for almost all $\lambda \in [0, 1]$. Now $\psi_1 \le \psi_2$ follows either from elementary arguments or from a weak maximum principle (cf. [20, Theorem 9.1]) as applied to $\psi_1 - \psi_2$.

We are ready to state the basic convergence result of this section.

Theorem 4.1. Assume that $S(c)$ is s.p.d. for all $c \in \mathcal{C}$. Then, for arbitrary initial iterate $w^0 \in \mathbb{R}^m$, the damped nonsmooth Newton-type method, as obtained by inserting the search directions
\[ d^\nu = -S_\nu^{-1}H(w^\nu), \qquad H(w^\nu) = \nabla h(w^\nu), \tag{4.16} \]
with
\[ S_\nu = S(c^\nu), \qquad c^\nu = T_{\mathcal{C}}\,F^{-1}\bigl(f - B^T w^\nu\bigr), \]
and step sizes $\rho_\nu$ selected according to Proposition 3.1 into the basic algorithm (3.1), converges linearly to the solution $w^*$ of (2.8). If (2.8) is nondegenerate in the sense that
\[ F^{-1}\bigl(f - B^T w^*\bigr) \in \operatorname{int} I_{c^*}, \qquad c^* = T_{\mathcal{C}}\,F^{-1}\bigl(f - B^T w^*\bigr), \tag{4.17} \]
then the algorithm terminates after a finite number of steps.

Proof. To prove convergence by Theorem 3.1, we have only to show that the directions $d^\nu$ as defined in (4.16) are gradient-related. Let $c \in \mathcal{C}$. Denoting the norm of the linear mapping $S(c) : (\mathbb{R}^m, \|\cdot\|_M) \to (\mathbb{R}^m, \|\cdot\|_{M^{-1}})$ by $\Gamma_c$ and using the coercivity (4.15), we get
\[ \bigl\langle\nabla h(w), S(c)^{-1}\nabla h(w)\bigr\rangle \ge \gamma_c\,\bigl\|S(c)^{-1}\nabla h(w)\bigr\|_M^2 \ge \frac{\gamma_c}{\Gamma_c}\,\bigl\|S(c)^{-1}\nabla h(w)\bigr\|_M\,\|\nabla h(w)\|_{M^{-1}} \]
for all $w \in \mathbb{R}^m$. Since $\mathcal{C}$ is finite, (3.4) now holds with
\[ c_D := \min_{c \in \mathcal{C}}\frac{\gamma_c}{\Gamma_c} > 0. \]
Utilizing Lemma 4.2, linear convergence immediately follows from Theorem 3.2. If (2.8) is nondegenerate, then $F^{-1}(f - B^T w^{\nu_0}) \in I_{c^*}$ holds for sufficiently large $\nu_0$. This implies $w^{\nu_0+1} = w^*$, because $H$ is affine on all $w$ with $F^{-1}(f - B^T w) \in I_{c^*}$.

Under the additional assumption $\operatorname{rank} B = n$, we obtain (cf. Proposition 4.2)
\[ S_\nu = S(c^\nu) \in \partial_B H(w^\nu) \qquad \forall\nu \in \mathbb{N} \]
and, therefore, a nonsmooth Newton method. In order to allow for local superlinear or even quadratic convergence (cf. [39, 40]), it is essential that $\rho_\nu \to 1$ for $\nu \to \infty$ which, in general, does not hold for the standard Armijo strategy. Hence, nonsmooth analogues of well-known affine-invariant damping strategies [16, section 3.4] will be the subject of future research.

If $h$ is not strongly convex, then $S(c)$ is not invertible for certain $c$. Therefore, we now modify $S(c)$ to ensure invertibility. By symmetry we have $\ker S(c) = (\operatorname{ran} S(c))^\perp$. We introduce the mapping $I(c) : \mathbb{R}^m \to \mathbb{R}^m$ by
\[ I(c)|_{\ker S(c)} = I|_{\ker S(c)}, \qquad I(c)|_{\operatorname{ran} S(c)} = 0, \tag{4.18} \]
to define
\[ \hat S(c) = S(c) + I(c), \qquad c \in \mathcal{C}. \tag{4.19} \]
Observe that the orthogonal subspaces $\ker S(c)$ and $\operatorname{ran} S(c)$ are invariant with respect to $\hat S(c)$. Decomposing $x, y$ into their components from $\ker S(c)$ and $\operatorname{ran} S(c)$, respectively, we get
\[ \bigl\langle\hat S(c)x, y\bigr\rangle = \langle S(c)x_{\mathrm{ran}}, y_{\mathrm{ran}}\rangle + \langle x_{\ker}, y_{\ker}\rangle \]
so that $\hat S(c)$ is s.p.d. Note that $\hat S(c)$ can be rewritten as
\[ \hat S(c) = S(c) + \sum_{i=1}^{l}\frac{k_i k_i^T}{\|k_i\|^2} \]
with $k_1, \dots, k_l$ denoting an orthogonal basis of $\ker S(c)$. If $S(c)$ is replaced by $\hat S(c)$, then nonsmooth Newton steps are carried out on $\operatorname{ran} S_\nu$, i.e., if possible, while simple gradient steps are performed on $\ker S_\nu$.

Theorem 4.2. For arbitrary initial iterate $w^0 \in \mathbb{R}^m$, the nonsmooth Newton-like method, as obtained by inserting the search directions
\[ d^\nu = -S_\nu^{-1}H(w^\nu), \qquad H(w^\nu) = \nabla h(w^\nu), \tag{4.20} \]
with
\[ S_\nu = \hat S(c^\nu), \qquad c^\nu = T_{\mathcal{C}}\,F^{-1}\bigl(f - B^T w^\nu\bigr), \]
and step sizes $\rho_\nu$ selected according to Proposition 3.1 into the basic algorithm (3.1), converges to the solution $w^*$ of (2.8). If the problem (2.8) is nondegenerate in the sense of (4.17) and $S(c^*)$, $c^* = T_{\mathcal{C}}\,F^{-1}(f - B^T w^*)$, is positive definite, then the algorithm terminates after a finite number of steps.
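The structure of the Newton-like iteration can be seen in a deliberately tiny setting. The sketch below assumes a single dual unknown ($m = 1$), a diagonal matrix $A$, and a box $K$, so that $F^{-1}$ is a componentwise clip and $S(c)$ is the scalar $\sum_{i\ \text{inactive}} b_i^2/a_{ii} + C$. It also fixes the step size to $\rho_\nu = 1$ instead of using the Armijo rule of Proposition 3.1, so it is an undamped illustration without the convergence guarantee of the theorem. All names and data are assumptions for illustration.

```python
def newton_like_scalar(a_diag, lower, upper, b_row, c_scalar, f, g,
                       w=0.0, tol=1e-12, max_steps=50):
    """Undamped Newton-like iteration for one dual unknown w (m = 1).

    Returns (w, u, steps), where steps counts the Newton-like updates performed.
    """
    for steps in range(max_steps):
        # u = F^{-1}(f - B^T w): clip (f_i - b_i w) / a_ii to the box
        u = [min(max((fi - bi * w) / ai, lo), up)
             for fi, bi, ai, lo, up in zip(f, b_row, a_diag, lower, upper)]
        # residual = B u - C w - g = -H(w)
        residual = sum(bi * ui for bi, ui in zip(b_row, u)) - c_scalar * w - g
        if abs(residual) <= tol:
            return w, u, steps
        # S(c) only involves the inactive coefficients of u
        inactive = [i for i, (ui, lo, up) in enumerate(zip(u, lower, upper)) if lo < ui < up]
        s = sum(b_row[i] ** 2 / a_diag[i] for i in inactive) + c_scalar
        if s == 0.0:
            s = 1.0   # S(c) = 0: its kernel is all of R, so the modification adds I(c) = 1
        w += residual / s   # step size fixed to 1 (assumption: no damping)
    raise RuntimeError("no convergence within max_steps")


# Illustrative data: n = 3, C = 0, box [-1, 1]^3.
w_star, u_star, n_steps = newton_like_scalar(
    a_diag=[2.0, 1.0, 4.0], lower=[-1.0] * 3, upper=[1.0] * 3,
    b_row=[1.0, 1.0, 1.0], c_scalar=0.0, f=[3.0, 0.2, -1.0], g=0.5, w=5.0)
```

With the starting value `w=5.0` all coefficients of the first iterate are active, so the first update is a plain gradient step with the modified scalar. Once the active set of the solution has been identified, a single further update is exact, which mirrors the finite termination statement in the nondegenerate case.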
Proof. Using the same arguments as in the proof of Theorem 4.1 it can be shown that the modified search directions $d^\nu$ defined in (4.20) are gradient-related. Hence, convergence is a consequence of Theorem 3.1. Finite termination also follows by the same reasoning as in the proof of Theorem 4.1.

Remark. In general, one would expect local superlinear convergence of a Newton-like method. However, straightforward application of this concept makes no sense in the present, piecewise affine case, because, in a sufficiently small neighborhood, the algorithms terminate with the exact solution after one step. Further insight could be obtained by showing that the domain of superlinear convergence is larger than the domain of one step termination and, in particular, does not depend on the dimension $m$.

In order to determine $d^\nu = -S_\nu^{-1}H(w^\nu)$, a linear saddle point problem associated with the Schur complement matrix $S_\nu = \hat S(c^\nu)$ has to be solved (see section 5 below). A sufficiently accurate iterative solution preserves convergence.

Theorem 4.3. For arbitrary initial iterate $w^0 \in \mathbb{R}^m$, the inexact nonsmooth Newton-like method, as obtained by inserting search directions $\tilde d^\nu$ which satisfy (3.3) and the accuracy condition (3.16) with $d^\nu = -S_\nu^{-1}H(w^\nu)$ and step sizes $\rho_\nu$ selected according to Proposition 3.1 into the basic algorithm (3.1), converges to the solution $w^*$ of (2.8). The iterates converge linearly if $h$ is strongly convex, e.g., for positive definite $C$.

Proof. As the directions $d^\nu$ are gradient-related (see the proof of Theorem 4.2 above) the convergence is an immediate consequence of Proposition 3.2. If $C$ is positive definite, then $h$ is strongly convex. In this case linear convergence follows from Theorem 3.2.

5. Computational aspects.

5.1. Preconditioned Uzawa methods.
Denoting $u^\nu := F^{-1}(f - B^T w^\nu)$ the Newton-like method as introduced in Theorem 4.2 can be interpreted as the preconditioned Uzawa iteration
\[ u^\nu = F^{-1}\bigl(f - B^T w^\nu\bigr), \tag{5.1a} \]
\[ w^{\nu+1} = w^\nu + \rho_\nu S_\nu^{-1}\bigl(Bu^\nu - Cw^\nu - g\bigr) \tag{5.1b} \]
for the saddle point problem (1.1). The first substep (5.1a) amounts to the solution of the quadratic obstacle problem
\[ u^\nu = \operatorname*{arg\,min}_{v \in K}\Bigl(\tfrac{1}{2}\langle Av, v\rangle - \bigl\langle f - B^T w^\nu, v\bigr\rangle\Bigr), \tag{5.2} \]
which has been extensively treated in the literature (cf., e.g., [14, 21, 30, 34, 44, 3]). Inserting the definitions (4.19) and (4.13) of $S_\nu$ and $S(c^\nu)$, the evaluation of the preconditioned residual $d^\nu = S_\nu^{-1}(Bu^\nu - Cw^\nu - g)$ in the second substep (5.1b) can be rewritten as the solution of the linear saddle point problem
\[ \begin{pmatrix} \hat A_{c^\nu} & (BT_{c^\nu})^T \\ BT_{c^\nu} & -(C + I(c^\nu)) \end{pmatrix}\begin{pmatrix} \tilde u^\nu \\ d^\nu \end{pmatrix} = \begin{pmatrix} 0 \\ g + Cw^\nu - Bu^\nu \end{pmatrix}, \tag{5.3} \]
where, according to (4.3), $c^\nu = T_{\mathcal{C}}\,u^\nu$ identifies the active coefficients of $u^\nu$. Recall that $\hat A_{c^\nu}$ is obtained from $A$ by replacing the $i$th row and the $i$th column by the unit
vector $e_i$ if $i$ is active, i.e., $c_i \in \{a_i, b_i\}$. $BT_{c^\nu}$ is obtained from $B$ by annihilating the $i$th column if $i$ is active. Finally, $I(c^\nu)$ has been defined in (4.18). Thus, the preconditioner $S_\nu$ is approximating the original set-valued operator by essentially eliminating the actual active coefficients [22]. A sufficiently accurate, iterative solution of (5.3) preserves convergence of the overall iteration (5.1) (cf. Theorem 4.3). In particular, multigrid methods have been investigated in [9, 42, 47, 52, 53].

5.2. Inexact evaluation of $F^{-1}$. The exact solution $u^\nu = F^{-1}(f - B^T w^\nu)$ appears on the right-hand side of the linear saddle point problem (5.3). However, it turns out that the preconditioned residual can be computed from $w^\nu$ and the active coefficients $c^\nu$ of $u^\nu$ alone.

Proposition 5.1. For given $w^\nu \in \mathbb{R}^m$ and $c^\nu = T_{\mathcal{C}}\,u^\nu$ let $(\tilde u^\nu, \tilde w^\nu)$ be the solution of
\[ \begin{pmatrix} \hat A_{c^\nu} & (BT_{c^\nu})^T \\ BT_{c^\nu} & -(C + I(c^\nu)) \end{pmatrix}\begin{pmatrix} \tilde u^\nu \\ \tilde w^\nu \end{pmatrix} = \begin{pmatrix} T_{c^\nu}f - T_{c^\nu}Ac^\nu \\ g - Bc^\nu - I(c^\nu)w^\nu \end{pmatrix}. \tag{5.4} \]
Then
\[ S_\nu^{-1}\bigl(Bu^\nu - Cw^\nu - g\bigr) = \tilde w^\nu - w^\nu. \]

Proof. Let $d^\nu = S_\nu^{-1}(Bu^\nu - Cw^\nu - g) = -S_\nu^{-1}H(w^\nu)$. Utilizing the definition (2.3) of $H$, the representation (4.9) of $F^{-1}$, and the definitions (4.19) and (4.13) of $S_\nu$ and $S(c^\nu)$, respectively, we get
\[ \begin{aligned} S_\nu(w^\nu + d^\nu) &= S_\nu w^\nu - H(w^\nu) \\ &= S_\nu w^\nu + BT_{c^\nu}\hat A_{c^\nu}^{-1}T_{c^\nu}\bigl(f - B^T w^\nu - Ac^\nu\bigr) + Bc^\nu - Cw^\nu - g \\ &= (BT_{c^\nu})\hat A_{c^\nu}^{-1}\bigl(T_{c^\nu}f - T_{c^\nu}Ac^\nu\bigr) - \bigl(g - Bc^\nu - I(c^\nu)w^\nu\bigr). \end{aligned} \]
Hence, $\tilde w^\nu = w^\nu + d^\nu$ is the second component of the solution of (5.4). This completes the proof.

Usually, the active coefficients $c^\nu$ of $u^\nu$ can be computed much faster than $u^\nu$ itself: For nondegenerate problems monotone multigrid methods [30] or even simple projected Gauß–Seidel relaxations [21, Chapter V] provide $c^\nu$ in a finite number of steps. Using the a priori estimate (cf., e.g., [28, p. 24])
\[ \|u^* - u^\nu\|_A \le \bigl\|B^T(w^* - w^\nu)\bigr\|_{A^{-1}} \tag{5.5} \]
the accuracy of $u^\nu$ can be estimated without actual computation of $u^\nu$.

In order to determine efficient step sizes $\rho_\nu$ by Armijo's strategy (cf. Proposition 3.1), we have to evaluate $F^{-1}$ for each test $j = 0, \dots$ in (3.12). Though it is possible to develop straightforward inexact variants of existing damping strategies, e.g., of the Curry–Altmann principle [37, p. 483], an even cheaper heuristic strategy will be applied in the numerical computations to be reported below: We set $\rho_\nu = 1$ if the condition
\[ \|d^\nu\|_M \le \sigma\,\bigl\|d^{\nu-1}\bigr\|_M \tag{5.6} \]
holds with some fixed parameter $\sigma \in (0, 1)$ and compute $\rho_\nu$ according to Armijo's strategy otherwise. Note that it is not hard to show convergence if (5.6) holds for $d^\nu = -S_\nu^{-1}H(w^\nu)$ and all $\nu \in \mathbb{N}$.
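The heuristic switch between the full step and Armijo damping amounts to one comparison per iteration. The sketch below is a minimal illustration with hypothetical names: `armijo` stands for any callable that performs the backtracking of Proposition 3.1 and returns a step size, and `norm_d_prev` is `None` in the first iteration, where no previous direction exists (falling back to Armijo in that case is an assumption of this sketch).

```python
def choose_step_size(norm_d, norm_d_prev, sigma, armijo):
    """Return 1 if the direction norms contract by the factor sigma, cf. (5.6).

    norm_d, norm_d_prev -- M-norms of the current and previous search directions
    sigma               -- fixed parameter in (0, 1)
    armijo              -- zero-argument callable returning an Armijo step size
    """
    if norm_d_prev is not None and norm_d <= sigma * norm_d_prev:
        return 1.0          # cheap case: no evaluation of h, hence none of F^{-1}
    return armijo()         # expensive fallback: each test evaluates F^{-1}
```

Passing the Armijo rule as a callable keeps the expensive evaluations lazy, so they are only triggered when the contraction test fails.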
6. Numerical results. In the following examples, $\Omega = (0,1) \times (0,1)$ denotes the unit square, and the triangulation $\mathcal{T}_J$ of $\Omega$ results from $J$ uniform refinement steps applied to the initial partition $\mathcal{T}_0$ consisting of four congruent subtriangles. The uniform refinement $\mathcal{T}_{j+1}$ of $\mathcal{T}_j$ is obtained by connecting the edge midpoints of all triangles $T \in \mathcal{T}_j$. Hence, the mesh size of $\mathcal{T}_J$ is $h_J = 2^{-J}$. The sequence $\mathcal{T}_0 \subset \mathcal{T}_1 \subset \cdots \subset \mathcal{T}_J$ of triangulations gives rise to a nested sequence $S_0 \subset S_1 \subset \cdots \subset S_J$ of finite element spaces
$S_j = \{ v \in C(\overline{\Omega}) \mid v|_T \text{ is linear } \forall T \in \mathcal{T}_j \} \subset H^1(\Omega), \quad j = 0, \dots, J.$
The standard nodal basis of $S_J$ is denoted by $\lambda_p$, $p \in N_J$, where $N_J$ stands for the set of vertices of $\mathcal{T}_J$. Homogeneous Dirichlet conditions give rise to the subspace
$S_{J,0} = \operatorname{span}\{\lambda_p \mid p \in N_{J,0}\} \subset H_0^1(\Omega), \quad N_{J,0} = N_J \cap \Omega.$
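To get a feeling for the problem sizes behind a given refinement level, the following sketch counts triangles, vertices, and interior vertices after $J$ uniform refinements. It assumes that the four congruent subtriangles of the initial partition are obtained by cutting the unit square along both diagonals; the function name and the returned dictionary are illustrative only.

```python
def mesh_counts(J):
    """Counts after J uniform refinements of the 4-triangle partition of the unit square.

    Assumption: T_0 consists of the four triangles cut out by the two diagonals,
    i.e. 4 triangles, 5 vertices (4 corners + centre), and 8 edges.
    """
    triangles, vertices, edges = 4, 5, 8
    for _ in range(J):
        # Every edge midpoint becomes a new vertex, every edge is halved,
        # every triangle gets three new interior edges and splits into four.
        vertices, edges, triangles = (
            vertices + edges,
            2 * edges + 3 * triangles,
            4 * triangles,
        )
    boundary_vertices = 4 * 2**J  # boundary edges double in each refinement step
    return {
        "h": 2.0 ** (-J),
        "triangles": triangles,
        "vertices": vertices,
        "edges": edges,
        "interior_vertices": vertices - boundary_vertices,
    }
```

For example, `mesh_counts(1)` reports 16 triangles and 13 vertices, 5 of which lie in the interior; the interior vertices correspond to the index set $N_{J,0}$.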
The scalar product in $L^2(\Omega)$ and its lumped version in $S_J$ are denoted by $(\cdot,\cdot)$ and $\langle\cdot,\cdot\rangle$, respectively. The linear space of piecewise constant functions
$P_J = \{ v \in L^2(\Omega) \mid v|_T \text{ is constant } \forall T \in \mathcal{T}_J \} \subset L^2(\Omega)$
is spanned by the canonical basis $\mu_T$, $T \in \mathcal{T}_J$, as defined by $\mu_T(x) = 1$ for $x \in \operatorname{int} T$ and $\mu_T(x) = 0$ otherwise.
6.1. An optimal control problem with control constraints. For given $y_0 \in L^4(\Omega)$ and $\varepsilon > 0$, we consider the following optimal control problem [45]. Find $y \in H_0^1(\Omega)$ and $u \in L^\infty(\Omega)$ such that
(6.1) $\mathcal{J}(y,u) = \frac{1}{2}\|y - y_0\|_{L^2(\Omega)}^2 + \frac{\varepsilon}{2}\|u\|_{L^2(\Omega)}^2$
is minimal over all functions in $H_0^1(\Omega)$ and $L^\infty(\Omega)$ subject to the state equation
(6.2) $(\nabla y, \nabla v) = (u, v) \quad \forall v \in H_0^1(\Omega)$
and the control constraint
(6.3) $u \in K = \{ v \in L^\infty(\Omega) \mid |v(x)| \le 1 \text{ a.e. in } \Omega \}.$
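For a control represented by finitely many coefficients, the box constraint in (6.3) acts componentwise, so projecting onto the admissible set amounts to clipping. The sketch below illustrates this and, as a closely related case, the minimization of a diagonal quadratic energy over the box. Both function names are hypothetical, and the diagonal energy is an assumed model problem, not a formula quoted from the paper.

```python
def project_box(v, lower=-1.0, upper=1.0):
    """Euclidean projection of a coefficient vector onto the box [lower, upper]^n."""
    return [min(max(vi, lower), upper) for vi in v]


def solve_diagonal_box(diag, rhs, lower=-1.0, upper=1.0):
    """Minimise sum_i (0.5 * diag[i] * u_i**2 - rhs[i] * u_i) s.t. lower <= u_i <= upper.

    Because the energy is separable and strictly convex for diag[i] > 0, the
    constrained minimiser is the clipped unconstrained minimiser rhs[i] / diag[i].
    """
    if any(d <= 0.0 for d in diag):
        raise ValueError("all diagonal entries must be positive")
    return project_box([b / d for d, b in zip(diag, rhs)], lower, upper)
```

As a usage example, `solve_diagonal_box([2.0, 1.0, 4.0], [1.0, 5.0, -8.0])` clips the unconstrained minimizers 0.5, 5, and -2 to the admissible values 0.5, 1, and -1.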
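Approximating $H_0^1(\Omega)$ by $S_{J,0}$ and $K$ by $K_J = \{ v \in P_J \mid |v|_T| \le 1 \ \forall T \in \mathcal{T}_J \} \subset K$, we obtain a discrete analogue of the continuous problem. For existence and error estimates, we refer to [1]. We restrict our considerations to this discretization only. However, the algorithm behaves similarly for other discretizations, e.g., with linear finite elements for the control. After incorporating (6.2) by a Lagrange multiplier $w$, the Kuhn–Tucker conditions of the discretized problem can be rewritten in the form (1.1) with $n = |N_{J,0}| + |\mathcal{T}_J|$, $m = |\mathcal{T}_J|$, $F = A + \partial K_J$,
$A = \begin{pmatrix} D_S & 0 \\ 0 & \varepsilon D_P \end{pmatrix}, \quad D_S = \big(\langle\lambda_p, \lambda_q\rangle\big)_{p,q \in N_{J,0}}, \quad D_P = \big((\mu_T, \mu_{T'})\big)_{T,T' \in \mathcal{T}_J},$
$B = \big(A_S, \ -D_{SP}\big), \quad A_S = \big((\nabla\lambda_p, \nabla\lambda_q)\big)_{p,q \in N_{J,0}}, \quad D_{SP} = \big((\lambda_p, \mu_T)\big)_{p \in N_{J,0},\, T \in \mathcal{T}_J},$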
Fig. 6.1. Iteration history for $\varepsilon = 10^{-4}$ (left) and $\varepsilon = 10^{-8}$ (right): error over iteration steps for Newton. The filled dots indicate $\rho_\nu = 1$.
C = 0, and suitable right-hand sides f and g. It is easily checked that the assumptions (A1’), (A2), and (A3) are fulfilled. Moreover, it turns out that S(c) is s.p.d. ∀c ∈ C. As a consequence, h must be strongly convex (cf. Lemma 4.2) providing uniqueness (A4) and linear convergence of the Newton-type iteration to be called Newton as well as its inexact version (cf. Theorems 4.1 and 4.3). In general, we have rank B = m < n so that it is not clear from our present analysis that Sν = S(cν ) ∈ ∂B (H(wν )) (cf. Proposition 4.2). As A is diagonal, the quadratic obstacle problems (5.2) arising in each iteration step can be easily solved by nodal projection. The linear saddle point problems (5.3) are evaluated by the direct solver UMFPACK [15]. Following [41, Chapter 5], we select the desired state ⎧ 4 ⎪ ⎪ ⎪ ⎨−10 y0 (x) = 0.001 ⎪ −2 ⎪ ⎪ ⎩ 50
if if if if
x ∈ [0, 0.75] × [0, 0.5], x ∈ [0, 0.75] × [0.5, 1], x ∈ [0.75, 1] × [0, 0.5], x ∈ [0.75, 1] × [0.5, 1]
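in our numerical computations. The mesh size $h_J = 2^{-J}$ results from $J = 7$ refinement steps. Finally, we choose the parameters
(6.4) $\alpha = 10^{-2}, \quad \alpha_\nu = \max\Big\{1, \ -\alpha \, \frac{\langle \nabla h(w^\nu), d^\nu \rangle}{\|d^\nu\|_M^2}\Big\}, \quad \beta = 0.5, \quad \delta = 0.5$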
in the associated Armijo strategy (cf. Proposition 3.1). Figure 6.1 shows the algebraic error $\|w^* - w^\nu\|_M$ over the number of iteration steps for the two problem parameters $\varepsilon = 10^{-4}$ and $\varepsilon = 10^{-8}$, respectively. The algebraic error is measured in the energy norm induced by the Schur complement $M = BA^{-1}B^T$, providing $\|w^* - w^\nu\|_M = \|B^T(w^* - w^\nu)\|_{A^{-1}} \ge \|u^* - u^\nu\|_A$ according to (5.5). The “exact” solution $w^*$ is precomputed up to round-off errors. In both cases, we observe superlinear convergence and finite termination, even exceeding the findings of Theorem 4.1. The condition number of (6.1) is increasing for decreasing regularization parameter $\varepsilon$. This is reflected by the large number of iteration steps for the small value $\varepsilon = 10^{-8}$. As the solution of the (diagonal!) obstacle problems (5.2) is almost for free and, in addition, no more than two tests are necessary in Armijo
Fig. 6.2. Mesh dependence for $\varepsilon = 10^{-4}$ (left) and $\varepsilon = 10^{-8}$ (right): iteration steps of Newton over the refinement level.
damping, almost 100% of cpu time is consumed by the solution of the linear saddle point problems. For the given initial iterates the well-known (undamped) primal-dual algorithm converges only for $\varepsilon = 10^{-4}$ but not for $\varepsilon = 10^{-8}$, as indicated by Figure 6.1. On the other hand, in both cases the damping parameter $\rho_\nu = 1$ is accepted before the correct active set is detected in the last iteration step. We now investigate the mesh dependence of Newton. The two pictures in Figure 6.2 show the number of iteration steps required for the solution up to round-off errors over the refinement levels. For both values $\varepsilon = 10^{-4}$ and $\varepsilon = 10^{-8}$, the convergence speed seems to saturate with increasing refinement. It is interesting that coarser problems seem to become even harder for small $\varepsilon$. Note that the maximal number of Armijo tests is also increasing from two to ten on the coarsest mesh.
6.2. A Cahn–Hilliard problem. For given $\varepsilon > 0$, final time $T > 0$, and initial condition $u_0 \in K = \{ v \in H^1(\Omega) \mid |v| \le 1 \}$, we consider the following initial value problem for the Cahn–Hilliard equation with an obstacle potential [7, 11, 18]. Find $u \in H^1(0,T;(H^1(\Omega))') \cap L^\infty(0,T;H^1(\Omega))$ and $w \in L^2(0,T;H^1(\Omega))$ with $u(0) = u_0$ such that $u(t) \in K$ and
(6.5a) $\big\langle \tfrac{du}{dt}, v \big\rangle_{H^1(\Omega)} + (\nabla w, \nabla v) = 0 \quad \forall v \in H^1(\Omega),$
(6.5b) $\varepsilon (\nabla u, \nabla v - \nabla u) - (u, v - u) \ge (w, v - u) \quad \forall v \in K$
hold a.e. for $t \in (0,T)$. Here $\langle\cdot,\cdot\rangle_{H^1(\Omega)}$ denotes the duality pairing of $H^1(\Omega)$ and $(H^1(\Omega))'$. The unknown functions $u$ and $w$ are called order parameter and chemical potential, respectively. For existence and uniqueness results we refer to [7]. Semi-implicit Euler discretization in time and finite elements in space [6, 8] lead to the following discretized problem. Find $u_J^k \in K_J$ and $w_J^k \in S_J$ such that
(6.6a) $\langle u_J^k, v \rangle + \tau (\nabla w_J^k, \nabla v) = \langle u_J^{k-1}, v \rangle \quad \forall v \in S_J,$
(6.6b) $\varepsilon (\nabla u_J^k, \nabla (v - u_J^k)) - \langle w_J^k, v - u_J^k \rangle \ge \langle u_J^{k-1}, v - u_J^k \rangle \quad \forall v \in K_J$
hold for each $k = 1, \dots, N$.
We have chosen a uniform time step size $\tau = T/N$, and $K_J = K \cap S_J$ is the nodal approximation of $K$. The initial condition $u_J^0 \in K_J$ is obtained by discrete $L^2$ projection $\langle u_J^0, v \rangle = (u_0, v)$ $\forall v \in S_J$. Existence, uniqueness, and error estimates have been established in [8]. More precisely, there exists a discrete solution $(u_J^k, w_J^k)$ with
uniquely determined $u_J^k$, $k = 1, \dots, N$. Moreover, $w_J^k$ is also unique, provided that the condition
(6.7) $\exists p \in N_J : \ |u_J^k(p)| < 1$
is fulfilled. Hence, (A4) is satisfied in this case. If (6.7) is violated, then either the triangulation $\mathcal{T}_J$ is too coarse to resolve the diffuse interface or only one phase is present; i.e., $u_J$ is constant. For the iterative solution of each spatial problem (6.6) a projected block Gauß–Seidel scheme [6] and an ADI-type iteration [33] are widely used. Both algorithms suffer from rapidly deteriorating convergence rates for increasing refinement. Exploiting discrete mass conservation $\langle u_J^k, 1 \rangle = (u_0, 1)$, each spatial problem (6.6) takes the form (1.1) with $n = m = |N_J|$, $F = A + \partial I_{K_J}$,
$A = \varepsilon \big( \langle\lambda_p, 1\rangle \langle\lambda_q, 1\rangle + (\nabla\lambda_p, \nabla\lambda_q) \big)_{p,q \in N_J}, \quad C = \tau \big( (\nabla\lambda_p, \nabla\lambda_q) \big)_{p,q \in N_J}, \quad B = -\big( \langle\lambda_p, \lambda_q\rangle \big)_{p,q \in N_J},$
and suitable right-hand sides $f$ and $g$. Assuming (6.7), it is easily checked that the assumptions (A1’), (A2), and (A3) are satisfied. Observe that $A$ is the sum of a sparse stiffness matrix and a rank one matrix. We clearly have $\operatorname{rank} B = n$ so that $S(c) \in \partial_B H(w)$ is a B-subdifferential of $H$ (cf. Proposition 4.2). However, as $C$ is only positive semidefinite, the kernel $\ker S(c)$ is trivial only if $N_c^\bullet \ne N$. In the singular case $N_c^\bullet = N$, $\ker S(c)$ is spanned by the constant vector $\mathbf{1} = (1, \dots, 1)^T$. For our numerical computations, we select $\varepsilon = 10^{-4}$ and the time step $\tau = \varepsilon$, and the mesh size $h_J = 2^{-J}$ results from $J = 9$ refinement steps. The initial condition $u_0$ takes the values $u_0(x) = \max\{\min\{2 \sin(4\pi x_1) \sin(4\pi x_2), 1\}, -1\}$. We compare the nonsmooth Newton-like method (cf. Theorem 4.2) called Newton-like, the inexact variant (cf. Theorem 4.3) called Inexact, and the projected block Gauß–Seidel relaxation [6] called Gauß–Seidel. The actual active coefficients are computed from the obstacle problem (5.2) by a monotone multigrid method [30].
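The initial condition chosen above is easy to evaluate directly. The following sketch implements $u_0$ as stated in the text and samples it on a uniform tensor grid; the grid is only an assumed stand-in for the vertex set of the actual triangulation, and `sample_u0` is a hypothetical helper.

```python
import math


def u0(x1, x2):
    """Initial order parameter: 2 sin(4 pi x1) sin(4 pi x2), clipped to [-1, 1]."""
    raw = 2.0 * math.sin(4.0 * math.pi * x1) * math.sin(4.0 * math.pi * x2)
    return max(min(raw, 1.0), -1.0)


def sample_u0(J):
    """Sample u0 on a uniform (2^J + 1) x (2^J + 1) grid of the unit square.

    Assumption: this tensor grid is used purely for illustration; the paper's
    finite element vertices come from its own triangulation.
    """
    n = 2**J
    h = 1.0 / n
    return [[u0(i * h, j * h) for j in range(n + 1)] for i in range(n + 1)]
```

Because of the clipping, every sampled value lies in $[-1, 1]$, so the samples respect the obstacle constraint $|v| \le 1$, and the pure phases $\pm 1$ appear around the extrema of the sine product, for instance `u0(0.125, 0.125)` evaluates to 1.0.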
The linear saddle point problems (5.4) are solved iteratively by a linear multigrid method with block Gauß–Seidel smoother and canonical restriction and prolongation. In the exact version Newton-like the solution $w^\nu$ is computed to machine accuracy, and we use Armijo damping (cf. Proposition 3.1) with $\delta = 10^{-3}$ and the other parameters given in (6.4). In the $\nu$th outer iteration of Inexact we apply $3\nu$ steps of the linear multigrid method with $V(3,3)$ cycle to match the asymptotic accuracy condition (3.17), and we use heuristic damping (5.6) with $\sigma = 0.5$. Figure 6.3 illustrates the algebraic error $\|w^* - w^\nu\|_M$ over the computational work for the first two spatial problems. We choose the discrete $H^1$-norm induced by $M = D + C$ with $D = \tau \big( \langle\lambda_p, \lambda_q\rangle \big)_{p,q \in N_J}$. Hence, $\|u^* - u^\nu\|_A \le c \|w^* - w^\nu\|_M$ with a constant $c$ independent of $J$ (cf. (5.5) and Poincaré’s inequality). The “exact” solution $w^*$ is precomputed up to round-off errors. For a fair comparison, the computational work is now measured in work units (not in iteration steps). One work unit is the cpu time required by one linear multigrid $V(3,3)$ cycle as applied to the linear saddle point problem (5.4). The left and the right picture in Figure 6.3 show the iteration histories for the spatial problems arising from the first and the second time step, respectively. Each marker refers to one iteration step of Newton-like and Inexact, respectively. As no initial data are available for the chemical potential $w$, we start with the bad initial iterate $w^0 = 0$ in the first problem, while the final approximation from the previous time step provides a reasonable initial iterate for the second
Fig. 6.3. Iteration histories for good initial iterates (left) and bad initial iterates (right): error over work units for Newton-like, Inexact, and Gauß–Seidel. The filled dots indicate $\rho_\nu = 1$.

Table 6.1. Distribution of cpu time over the subtasks in each Uzawa step.

Inexact step      1      2      3      4      5      6      7      8      9     10     11
# tests           7      3      5      3      3      1      3      1      0      0      0
% Armijo       88.7   85.9   88.1   76.1   74.2   49.2   69.3   44.4    0.1    0.1    0.1
% obstacle      7.2    0.0   −0.0   −0.0    0.0    0.0    0.0   −0.0    0.0   27.2   24.0
% linear        4.1   14.0   11.9   23.8   25.7   50.7   30.7   55.5   99.7   72.6   75.7
work units    106.1   50.1   78.5   49.0   56.4   24.5   40.5   21.8   11.0   13.4   10.9
one. This makes quite a difference. For the bad initial iterate, it takes about 400 work units (about 6 iteration steps) until Newton-like and Inexact finally display superlinear convergence. Gauß–Seidel is even more efficient in the beginning of the iteration, but not comparable later. For reasonable initial iterates, superlinear convergence starts immediately (observe the different scaling of the x-axis). In both cases, Inexact turns out to be more efficient than Newton-like. Table 6.1 gives more detailed insight into the performance of the different building blocks of Inexact as applied to the first problem. The number of tests involved in Armijo damping is given in the first line. Due to the bad initial iterate, a considerable number of tests are required in the beginning, which later goes down to zero. The following three lines show the actual percentage of cpu time required by damping and the approximate solution of the obstacle problem and of the linear saddle point problem, respectively. These numbers do not sum to 100 because minor computations are neglected. Observe that the computational work is first dominated by Armijo damping and later by the increasing number of multigrid sweeps for the linear saddle point problem. Apart from the initial step, the detection of the active set takes no more than 5 monotone multigrid sweeps, each of which is cheaper than a multigrid sweep for the linear saddle point problem. As shown in the last line, the absolute amount of computational work strongly depends on the number of Armijo tests, which in turn strongly depends on the (problem dependent!) choice of the parameters. Hence, the performance of Inexact could probably be improved by more careful tuning of the damping parameters. Observe that, for bad initial iterates, neither the exact nor the inexact method converges without damping. On the other hand, for both versions the damping parameter $\rho_\nu = 1$ is accepted before the correct active set is detected (cf. Figure 6.3). More efficient affine-invariant damping strategies for nonsmooth Newton-type algorithms will be the subject of future research.
Fig. 6.4. Mesh dependence for good initial iterates (left) and bad initial iterates (right): iteration steps of Newton-like and Inexact over the refinement level.
We now investigate the mesh dependence of Newton-like and Inexact. Figure 6.4 shows the number of iteration steps required for the solution up to round-off errors over the refinement levels. For the first spatial problem (left), we always start with $w^0 = 0$, while, for the second spatial problem (right), we always start from the previous time level. In both cases, the overall convergence speed seems to be scarcely affected by decreasing mesh size. It is astonishing that Inexact sometimes even needs fewer iteration steps. Note that the averaged error reduction per work unit of Inexact is about $\rho = 0.6$. We observed $\rho \approx 0.16$ for the linear multigrid solver as applied to the linear saddle point problems. Hence, for reasonable initial iterates, the solution of the discrete Cahn–Hilliard problem by straightforward inexact versions required about three to four times the cpu time for the solution of related linear saddle point problems by standard multigrid methods (indeed, $\ln 0.16 / \ln 0.6 \approx 3.6$).

Acknowledgments. The authors would like to thank the unknown referees for their most valuable comments and suggestions.

REFERENCES
[1] N. Arada, E. Casas, and F. Tröltzsch, Error estimates for the numerical approximation of a semilinear elliptic control problem, Comput. Optim. Appl., 23 (2002), pp. 201–229.
[2] L. Armijo, Minimization of functions having Lipschitz–continuous first partial derivatives, Pacific J. Math., 204 (1966), pp. 126–136.
[3] L. Badea, X.-C. Tai, and J. Wang, Convergence rate analysis of a multiplicative Schwarz method for variational inequalities, SIAM J. Numer. Anal., 41 (2003), pp. 1052–1073.
[4] L. Badea, Convergence rate of a Schwarz multilevel method for the constrained minimization of nonquadratic functionals, SIAM J. Numer. Anal., 44 (2006), pp. 449–477.
[5] J. W. Barrett and J. Blowey, An error bound for the finite element approximation of the Cahn-Hilliard equation with logarithmic free energy, Numer. Math., 72 (1995), pp. 1–20.
[6] J. W. Barrett, R. Nürnberg, and V. Styles, Finite element approximation of a phase field model for void electromigration, SIAM J. Numer. Anal., 42 (2004), pp. 738–772.
[7] J. Blowey and C. Elliott, The Cahn-Hilliard gradient theory for phase separation with nonsmooth free energy, Part I: Mathematical analysis, European J. Appl. Math., 2 (1991), pp. 233–280.
[8] J. Blowey and C. Elliott, The Cahn-Hilliard gradient theory for phase separation with non-smooth free energy, Part II: Numerical analysis, European J. Appl. Math., 3 (1992), pp. 147–179.
[9] D. Braess and R. Sarazin, An efficient smoother for the Stokes problem, Appl. Numer. Math., 23 (1997), pp. 3–19.
[10] M. Brokate and J. Sprekels, Hysteresis and Phase Transition, Appl. Math. Sci. 121, Springer, Berlin, Heidelberg, New York, 1996.
[11] J. Cahn and J. Hilliard, Free energy of a nonuniform system I. Interfacial energy, J. Chem. Phys., 28 (1958), pp. 258–267.
[12] X. Chen, On preconditioned Uzawa methods and SOR methods for saddle-point problems, J. Comput. Appl. Math., 100 (1998), pp. 207–224.
[13] F. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
[14] R. Cottle, J. Pang, and R. Stone, The Linear Complementarity Problem, Academic Press, Boston, 1992.
[15] T. A. Davis, Algorithm 832: Umfpack v4.3 – an unsymmetric-pattern multifrontal method, ACM Trans. Math. Software, 30 (2004), pp. 196–199.
[16] P. Deuflhard, Newton Methods for Nonlinear Problems, Springer, Berlin, Heidelberg, 2004.
[17] I. Ekeland and R. Temam, Convex Analysis, North–Holland, Amsterdam, 1976.
[18] C. Elliott, The Cahn-Hilliard model for the kinetics of phase separation, in Mathematical Models for Phase Change Problems, J. Rodrigues, ed., Birkhäuser, Basel, Switzerland, 1989, pp. 35–73.
[19] H. Garcke and B. Stinner, Second order phase field asymptotics for multi-component systems, Interfaces Free Bound., 8 (2006), pp. 131–157.
[20] D. Gilbarg and N. Trudinger, Elliptic Partial Differential Equations of Second Order, 2nd ed., Springer, Berlin, 1988.
[21] R. Glowinski, Numerical Methods for Nonlinear Variational Problems, Springer, New York, 1984.
[22] C. Gräser and R. Kornhuber, On preconditioned Uzawa-type iterations for a saddle point problem with inequality constraints, in Domain Decomposition Methods in Science and Engineering XVI, Lect. Notes Comput. Sci. Eng., O. Widlund and D. Keyes, eds., Springer, Heidelberg, 2006, pp. 91–102.
[23] C. Gräser and R. Kornhuber, Adaptive multigrid methods for the Cahn-Hilliard equation with logarithmic potential, in preparation.
[24] C. Gräser and R. Kornhuber, Multigrid methods for obstacle problems, J. Comput. Math., to appear.
[25] C. Gräser, Globalization of nonsmooth Newton methods for optimal control problems, in Numerical Mathematics and Advanced Applications, K. Kunisch, G. Of, and O. Steinbach, eds., Springer, Berlin, 2007, pp. 605–612.
[26] M. Hintermüller, K. Ito, and K. Kunisch, The primal-dual active set strategy as a semismooth Newton method, SIAM J. Optim., 13 (2002), pp. 865–888.
[27] Q. Hu and J. Zou, Nonlinear inexact Uzawa algorithms for linear and nonlinear saddle-point problems, SIAM J. Optim., 16 (2006), pp. 798–825.
[28] D. Kinderlehrer and G. Stampacchia, An Introduction to Variational Inequalities and Their Applications, Academic Press, New York, 1980.
[29] R. Kornhuber and R. Krause, Robust multigrid methods for vector-valued Allen-Cahn equations with logarithmic free energy, Comput. Vis. Sci., 9 (2006), pp. 103–116.
[30] R. Kornhuber, Monotone multigrid methods for elliptic variational inequalities I, Numer. Math., 69 (1994), pp. 167–184.
[31] R. Kornhuber, On constrained Newton linearization and multigrid for variational inequalities, Numer. Math., 91 (2002), pp. 699–721.
[32] J. Lions, Optimal Control of Systems Governed by Partial Differential Equations, Springer, Berlin, Heidelberg, New York, 1971.
[33] P. Lions and B. Mercier, Splitting algorithms for the sum of two nonlinear operators, SIAM J. Numer. Anal., 16 (1979), pp. 964–979.
[34] J. Mandel, A multilevel iterative method for symmetric, positive definite linear complementarity problems, Appl. Math. Optim., 11 (1984), pp. 77–95.
[35] A. Nekvinda and L. Zajíček, A simple proof of the Rademacher theorem, Časopis Pěst. Mat., 113 (1988), pp. 337–341.
[36] J. Nocedal, Theory of algorithms for unconstrained optimization, Acta Numer., 1 (1992), pp. 199–242.
[37] J. Ortega and W. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[38] M. Powell, Direct search algorithms for optimization calculations, Acta Numer., 7 (1998), pp. 287–336.
[39] L. Qi and J. Sun, A nonsmooth version of Newton’s method, Math. Program., 58 (1993), pp. 353–367.
[40] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Math. Oper. Res., 18 (1993), pp. 227–244.
[41] A. Schiela and M. Weiser, Superlinear convergence of the control reduced interior point method for pde constrained optimization, Comput. Optim. Appl., 39 (2008), pp. 369–393.
[42] J. Schöberl and W. Zulehner, On Schwarz-type smoothers for saddle point problems, Numer. Math., 95 (2003), pp. 377–399.
[43] J. Simo and T. Hughes, Computational Inelasticity, Springer, Berlin, 1998.
[44] X.-C. Tai, Rate of convergence for some constraint decomposition methods for nonlinear variational inequalities, Numer. Math., 93 (2003), pp. 755–786.
[45] F. Tröltzsch, Optimale Steuerung partieller Differentialgleichungen. Theorie, Verfahren und Anwendungen, Vieweg, Wiesbaden, 2005.
[46] M. Ulbrich, Nonsmooth Newton-like Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces, Habilitationsschrift, TU München, Munich, 2002.
[47] S. Vanka, Block-implicit multigrid solution of Navier-Stokes equations in primitive variables, J. Comput. Phys., 65 (1986), pp. 138–158.
[48] A. Visintin, Models of Phase Transitions, Birkhäuser, Boston, 1996.
[49] C. Wieners, Nonlinear solution methods for infinitesimal perfect plasticity, ZAMM Z. Angew. Math. Mech., 87 (2007), pp. 643–660.
[50] S. J. Wright, Primal-Dual Interior-Point Methods, SIAM, Philadelphia, 1997.
[51] Y. Ye, Interior Point Algorithms, Wiley, Chichester, 1997.
[52] W. Zulehner, A class of smoothers for saddle point problems, Computing, 65 (2000), pp. 227–246.
[53] W. Zulehner, Analysis of iterative methods for saddle point problems: A unified approach, Math. Comp., 71 (2002), pp. 479–505.
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1274–1303
© 2009 Society for Industrial and Applied Mathematics
THE LOCAL $L^2$ PROJECTED $C^0$ FINITE ELEMENT METHOD FOR MAXWELL PROBLEM∗
HUO-YUAN DUAN†, FENG JIA†, PING LIN†, AND ROGER C. E. TAN†
Abstract. An element-local $L^2$-projected $C^0$ finite element method is presented to approximate the nonsmooth solution, which is not in $H^1$, of the Maxwell problem on a nonconvex Lipschitz polyhedron with reentrant corners and edges. The key idea is that element-local $L^2$ projectors are applied to both the curl and div operators. The $C^0$ linear finite element (enriched with certain higher degree bubble functions) is employed to approximate the nonsmooth solution. The coercivity in the $L^2$ norm is established uniformly in the mesh size, and the condition number $O(h^{-2})$ of the resulting linear system is proven. For the solution and its curl in $H^r$ with $r < 1$ we obtain an error bound $O(h^r)$ in an energy norm. Numerical experiments confirm the theoretical error bound.
Key words. Maxwell problem, nonsmooth solution, $C^0$ finite element method, $L^2$ projection
AMS subject classification. 65N30
DOI. 10.1137/070707749
1. Introduction. In this paper we shall study the $C^0$ finite element method for Maxwell equations with a nonsmooth solution (i.e., the solution is not in $H^1$). Consider a simply connected nonconvex polyhedral domain $\Omega \subset \mathbb{R}^3$ with a connected Lipschitz continuous boundary $\Gamma$, and let $u$ denote an unknown field and $f$ a given function. The problem we shall consider is to find $u$ such that
(1.1) $\operatorname{curl} \operatorname{curl} u = f$ in $\Omega$, $\quad u \times n = 0$ on $\Gamma$.
∗ Received by the editors November 9, 2007; accepted for publication (in revised form) October 29, 2008; published electronically February 25, 2009. This work was supported by NUS academic research grant R-146-000-064-112. http://www.siam.org/journals/sinum/47-2/70774.html
† Department of Mathematics, National University of Singapore, 2 Science Drive 2, Singapore 117543, Singapore.

The curl curl operator in (1.1) represents the principal part of a large number of forms and models of Maxwell equations [15, 20], and problem (1.1) plays a central role in most mathematical issues associated with Maxwell equations, such as regularity and singularities (see [26, 23, 13, 27, 29]), solvability and uniqueness (see [12, 39, 24, 2, 34, 14, 35]), and numerical methods (see [13, 25, 41, 19, 7, 49, 43, 50, 51, 48, 5, 44, 37, 4, 18, 8, 42] and references therein). We are interested in using $C^0$ finite elements of piecewise polynomials for the numerical solution of (1.1) because of the availability of numerous software packages. Also, $C^0$ elements are highly preferred in practice for all unknown variables of problems coupled with Maxwell equations. For example, in magnetohydrodynamics, which couples the Navier–Stokes equations with the Maxwell equations, velocity and pressure in the Navier–Stokes part are approximated by $C^0$ elements, so it is undesirable from the implementation point of view to use non-$C^0$ elements for the magnetic field in the Maxwell part. Although (1.1) looks quite simple, its discretization by the $C^0$ finite element method is not straightforward. This is associated with some main difficulties displayed in computational electromagnetics: (a) The infinite dimensional null-space (i.e., gradient fields) of the curl operator badly pollutes the finite element solutions (cf. [41, 43]); (b) in the case where the solution is not in $H^1$, the finite element solution would not converge to the true solution but to some other solution in $H^1$; (c) the indefiniteness of the resulting linear system would increase difficulty in implementation. To avoid the problems of gradient fields and indefiniteness, a plain regularization (PR) method is widely used in practice (see [39, 13, 26, 43]), with a divergence constraint imposed on $u$ for a given $g$:
(1.2) $\operatorname{div} u = g$ in $\Omega$.
Setting
(1.3) $U = \{ v \in (L^2(\Omega))^3 ; \ \operatorname{curl} v \in (L^2(\Omega))^3, \ \operatorname{div} v \in L^2(\Omega), \ v \times n|_\Gamma = 0 \}$
and letting (·, ·) denote the L2 -inner product, the variational form of the PR method consists of finding u ∈ U such that (1.4)
(curl u, curl v) + s (div u, div v) = (f , v) + s (g, div v) ∀v ∈ U,
where the real number $s > 0$ is referred to as penalty or regularization parameter and can be taken as any positive constant [26]. The PR formulation (1.4) is well suited for $C^0$ finite element discretizations depicted in [21], since (1.4) is a second-order elliptic problem with its bilinear form coercive on $U$ (cf. [39, 13, 4, 24, 37, 34, 26]). Consequently, a globally $C^0$ finite element solution may be produced, and the resulting linear system can be solved by any of the numerous well-developed direct and iterative solvers (e.g., conjugate gradient method) [38, 47] for symmetric, positive definite linear systems. Nevertheless, the $C^0$ finite element discretization of (1.4) does not give a correct approximation when the solution is not in $H^1$. What is worse, even refining the meshes with more elements cannot improve this situation. Readers are referred to [27, 39, 13, 28, 41] for more details. The low regularity of the solution would occur near reentrant corners and edges of nonsmooth domains, even if the right-hand sides are smooth; see [26, 29]. Here we shall try to explain the incorrect convergence based on our intuitive observation. Such an observation, together with the well-known interpolation error estimate (1.6) below, essentially motivates the method developed in this paper. Take $s = 1$, and let $u_h$ denote the $C^0$ finite element solution of (1.4), with $h$ being the mesh size of the finite element triangulation of $\Omega$. As $h$ tends to zero, the PR formulation (1.4) would force $u_h$ to converge to an element in $H^1$, but not to the solution $u$ that does not belong to $H^1$, due to the following fact (see [24, 27]):
(1.5) $(\operatorname{curl} v, \operatorname{curl} z) + (\operatorname{div} v, \operatorname{div} z) = (\nabla v, \nabla z)$ for all $v, z \in U \cap (H^1(\Omega))^3$.
On the other hand, any function $u$ in $L^2$ (even in $L^1$) can be well approximated by $C^0$ finite elements:
(1.6) $\|u - \tilde{u}\|_0 \le C h^r \|u\|_r$ if $u \in H^r$, $r \ge 0$,
where $\tilde{u}$ is a $C^0$ interpolation of $u$, and $\|\cdot\|_0$, $\|\cdot\|_r$ stand for the $L^2$- and $H^r$-norm, respectively; cf. [10, 11, 52, 22, 21, 53, 17]. So, when the solution is not in $H^1$, there should be no problem in using $C^0$ elements to obtain a correct and good $C^0$ approximation, but we have to modify the PR formulation. In this respect, there is an existing method: the weighted regularization (WR) method [25]. The WR method is theoretically and numerically proven to be good in obtaining correct $C^0$ approximations. It adds a suitable weight function in front of the div operator in (1.4), i.e.,
(1.7) $(\operatorname{curl} u, \operatorname{curl} v) + s (\omega \operatorname{div} u, \operatorname{div} v) = (f, v) + s (\omega g, \operatorname{div} v) \quad \forall v \in U^\omega,$
where $\omega(x)$ is a weight function and $U^\omega$ is an $\omega$-weighted Hilbert space. The weight function $\omega$ is determined according to the geometric singularities of the domain boundary. To approximate the solution the WR method employs a $C^0$ finite element that is required to contain the gradient of a $C^1$ finite element space. Several $C^1$ elements [21] exist in two dimensions (2D), but, to our knowledge, in the three-dimensional (3D) case, either few $C^1$ elements are known or $C^1$ elements involve too many degrees of freedom and stringent conditions on the finite element triangulation of the domain [45, 1, 54, 33, 55]. Thus, either it is not easy to find a $C^0$ approximate space containing the gradient of a 3D $C^1$ element, or such a $C^0$ approximate space is of relatively little interest. It is also worth mentioning the singular function (SF) method [13, 39, 5, 6]. The SF method is successful for reduced 2D problems [40, 5]. Roughly speaking, the SF method uses the PR formulation (1.4) but augments the $C^0$ approximate space by the singular functions associated with reentrant corners and edges, which would span a space with an infinite dimension and should be precisely calculated in advance. For the above reasons, it is rather inconvenient to apply these methods to 3D problems, especially when the geometric singularities of the domain boundary are not explicitly known. It is also worth mentioning the weighted least-squares method of a first-order system of (1.1) in [46] with additional independent variables, where linear elements are used with fewer degrees of freedom. In this paper, we develop a new $C^0$ finite element method for solving problem (1.1)–(1.2), based on the spirit of the $L^2$ projection technique involved in the least-squares minimization of the $L^2$ projected residual of the Stokes first-order system [32].
In our case here, the PR formulation (1.4) is not a least-squares minimization of the residual of the curl curl-div second-order system (1.1)–(1.2), so we directly modify (1.4) by applying element-local $L^2$ projectors in front of both the curl and div operators, with suitable mesh-dependent (element-local) bilinear and linear forms added. With the $C^0$ linear element (enriched by suitable face- and element-bubbles), we obtain an approximation, behaving like (1.6), of the solution of problem (1.1)–(1.2) that is not in $H^1$. Specifically, let $\breve{R}_h$ and $R_h$ denote two local $L^2$ projectors for the div and curl operators, respectively, which are defined element-by-element onto the discontinuous piecewise constant finite element space and the discontinuous piecewise linear finite element space, respectively. Let $S_h(\cdot,\cdot)$ denote a mesh-dependent (element-local) bilinear form, which is called the stabilization term and corresponds to a right-hand side mesh-dependent linear form $Z_h(\cdot)$, and let $U_h \subset U \cap (H^1(\Omega))^3$ denote the approximate space. Then the $L^2$ projection method for solving problem (1.1)–(1.2) is to find $u_h \in U_h$ such that
(1.8) $L_h(u_h, v_h) := (R_h(\operatorname{curl} u_h), R_h(\operatorname{curl} v_h)) + s \big(\breve{R}_h(\operatorname{div} u_h), \breve{R}_h(\operatorname{div} v_h)\big) + \alpha S_h(u_h, v_h) = (f, v_h) + s \big(g, \breve{R}_h(\operatorname{div} v_h)\big) + \alpha Z_h(v_h) \quad \forall v_h \in U_h,$
where the real number $\alpha > 0$ is referred to as a stabilization parameter. As the approximate space, $U_h$ is chosen to be the $C^0$ linear element (enriched with certain higher degree face- and element-bubble functions; see (3.10)). We show that the following coercivity holds:
(1.9) $L_h(v, v) \ge C \|v\|_0^2 \quad \forall v \in U_h,$
and obtain the condition number $O(h^{-2})$ of the resulting linear system. With the help of the $L^2$ projectors and the face- and element-bubbles in $U_h$, we construct an appropriate $C^0$ interpolant $\tilde u \in U_h$ such that the exact solution $u$, which need not be in $H^1$, and the finite element solution $u_h \in U_h$ satisfy
\[ \|u-u_h\|_0 \lesssim \|u-\tilde u\|_0. \tag{1.10} \]
Inequalities (1.10) and (1.6) indicate that even if $u$ is not in $H^1$, a correct and good $C^0$ approximation of $u$ should be expected. In fact, when $u$ and $\mathrm{curl}\,u$ are in $H^r$, with a smooth $f$, we obtain the following desirable error estimate in an energy norm:
\[ \|u-u_h\|_0 + \|R_h(\mathrm{curl}\,(u-u_h))\|_0 + \|\breve R_h(\mathrm{div}\,(u-u_h))\|_0 \le C\,h^r. \tag{1.11} \]
Before closing this section, we make several remarks. Firstly, the implementation of the $L^2$ projection method is almost the same as that of the PR method (1.4), since in the former both the additional $L^2$ projections and the mesh-dependent terms are evaluated element-locally. Secondly, in comparison with the WR method (1.7), the $L^2$ projection method (1.8) does not involve the geometric singularities of the domain boundary, and the approximate space $U_h$ is not required to contain the gradient of a $C^1$ element. As a matter of fact, $U_h$ here does not contain the gradient of any known $C^1$ element. Thirdly, if the approximate space is chosen to contain the gradient of some $C^1$ element, then we can drop the $L^2$ projector $R_h$ before the curl operator and use the following bilinear form:
\[ L_h^*(u,v) := (\mathrm{curl}\,u,\mathrm{curl}\,v) + s\,(\breve R_h(\mathrm{div}\,u),\breve R_h(\mathrm{div}\,v)) + \alpha\,S_h^*(u,v), \tag{1.12} \]
where $S_h^*$ is a part of the mesh-dependent bilinear form $S_h$. We note that both (1.7) and (1.12) may employ the same approximate space containing the gradient of some $C^1$ element, but (1.12) involves only one element-local $L^2$ projector $\breve R_h$ for the div operator and an element-local stabilization term. No geometric singularities are explicitly involved in (1.12).
The outline of this paper is as follows. In section 2, we review the Maxwell equations. In section 3, we describe the local $L^2$ projected $C^0$ finite element method. Section 4 is devoted to the establishment of coercivity and the condition number. In section 5 we obtain error bounds in an energy norm. In section 6, numerical tests are performed to demonstrate the theoretical error bounds, and we draw some conclusions in the last section.
2. Preliminaries. Let $\Omega \subset \mathbb{R}^3$ be a simply connected polyhedron with a connected Lipschitz continuous boundary $\Gamma$. Let $n$ denote the outward unit normal vector to $\Gamma$. In addition to the usual Hilbert spaces: $H^1(\Omega)$ with norm $\|\cdot\|_1$; $H_0^1(\Omega)$ and $H^1(\Omega)/\mathbb{R}$ with norm $|\cdot|_1$; $H^r(\Omega)$ with norm $\|\cdot\|_r$ for $r \in \mathbb{R}$, we introduce some
H.-Y. DUAN, F. JIA, P. LIN, AND R. C. E. TAN
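of the div and curl Hilbert spaces as follows:
\[ H(\mathrm{div};\Omega) = \{v \in (L^2(\Omega))^3,\ \mathrm{div}\,v \in L^2(\Omega)\}, \qquad H_0(\mathrm{div};\Omega) = \{v \in H(\mathrm{div};\Omega);\ v\cdot n|_\Gamma = 0\}, \]
\[ H(\mathrm{div}^0;\Omega) = \{v \in H(\mathrm{div};\Omega);\ \mathrm{div}\,v = 0\}, \qquad H_0(\mathrm{div}^0;\Omega) = H_0(\mathrm{div};\Omega) \cap H(\mathrm{div}^0;\Omega), \]
\[ H(\mathrm{curl};\Omega) = \{v \in (L^2(\Omega))^3,\ \mathrm{curl}\,v \in (L^2(\Omega))^3\}, \qquad H_0(\mathrm{curl};\Omega) = \{v \in H(\mathrm{curl};\Omega),\ v\times n|_\Gamma = 0\}, \]
\[ H(\mathrm{curl}^0;\Omega) = \{v \in H(\mathrm{curl};\Omega);\ \mathrm{curl}\,v = 0\}, \qquad H_0(\mathrm{curl}^0;\Omega) = H_0(\mathrm{curl};\Omega) \cap H(\mathrm{curl}^0;\Omega), \]
where these div and curl spaces are, respectively, equipped with the norms $\|\cdot\|_{0;\mathrm{div}}$ and $\|\cdot\|_{0;\mathrm{curl}}$:
\[ \|v\|_{0;\mathrm{div}}^2 = \|v\|_0^2 + \|\mathrm{div}\,v\|_0^2, \qquad \|v\|_{0;\mathrm{curl}}^2 = \|v\|_0^2 + \|\mathrm{curl}\,v\|_0^2, \]
where $\|\cdot\|_0$ stands for the $L^2$-norm. We have for $U$ defined as in (1.3)
\[ U = H(\mathrm{div};\Omega) \cap H_0(\mathrm{curl};\Omega). \]
Assume that the right-hand sides satisfy $f \in H(\mathrm{div}^0;\Omega)$ and $g \in L^2(\Omega)$.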
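The 3D Maxwell problem we shall consider reads as follows: Find $u \in U$ such that
\[ \mathrm{curl}\,\mathrm{curl}\,u = f, \quad \mathrm{div}\,u = g \quad \text{in } \Omega, \tag{2.1} \]
\[ u \times n|_\Gamma = 0. \tag{2.2} \]
Remark 2.1. Setting
\[ z := \mathrm{curl}\,u, \tag{2.3} \]
we see that $z$ satisfies
\[ \mathrm{curl}\,\mathrm{curl}\,z = \mathrm{curl}\,f, \quad \mathrm{div}\,z = 0 \quad \text{in } \Omega, \tag{2.4} \]
\[ z\cdot n|_\Gamma = 0, \quad \mathrm{curl}\,z \times n|_\Gamma = f \times n|_\Gamma, \tag{2.5} \]
if additionally $f \in H(\mathrm{curl};\Omega)$.
Remark 2.2. The time-harmonic Maxwell equations in 3D,
\[ \mathrm{curl}\,E - i\omega\mu H = 0 \quad \text{and} \quad \mathrm{curl}\,H + (i\varepsilon\omega - \sigma)E = J \quad \text{in } \Omega, \]
\[ E \times n|_\Gamma = 0 \quad \text{and} \quad (\mu H)\cdot n|_\Gamma = 0, \]
are often considered in practice, where $E$ is the electric field; $H$ is the magnetic field; $\omega > 0$ is the frequency of the vibrations; $\varepsilon$, $\mu$, $\sigma$ are, respectively, the permittivity, the permeability, and the conductivity of the materials occupying $\Omega$; and $J \in H(\mathrm{div};\Omega)$ is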
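the current density. Set $f := i\omega J$, $\kappa^2 := \omega^2(\varepsilon + i\sigma/\omega)$, and $g := J/(i\omega)$. Eliminating $H$ we see that $E$ satisfies
\[ \mathrm{curl}\,\mu^{-1}\mathrm{curl}\,E - \kappa^2 E = f, \quad \mathrm{div}\,((\varepsilon + i\sigma/\omega)E) = g \quad \text{in } \Omega. \]
Similarly, setting $f := \mathrm{curl}\,((\varepsilon + i\sigma/\omega)^{-1}J)$ and $\kappa^2 := \omega^2\mu$, and eliminating $E$ we see that $H$ satisfies
\[ \mathrm{curl}\,(\varepsilon + i\sigma/\omega)^{-1}\mathrm{curl}\,H - \kappa^2 H = f, \quad \mathrm{div}\,(\mu H) = 0 \quad \text{in } \Omega. \]
In the case of $\mu = \varepsilon = 1$ and $\sigma = 0$, we have the following models of Maxwell equations:
\[ \mathrm{curl}\,\mathrm{curl}\,u - \omega^2 u = f, \quad \mathrm{div}\,u = g \quad \text{in } \Omega, \tag{2.6} \]
\[ u \times n|_\Gamma = 0, \tag{2.7} \]
where $u$ stands for the electric field, with $f = i\omega J$; or
\[ \mathrm{curl}\,\mathrm{curl}\,u - \omega^2 u = \mathrm{curl}\,f, \quad \mathrm{div}\,u = 0 \quad \text{in } \Omega, \tag{2.8} \]
\[ u\cdot n|_\Gamma = 0, \quad \mathrm{curl}\,u \times n|_\Gamma = f \times n|_\Gamma, \tag{2.9} \]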
where u stands for the magnetic field, with f = J . Since the corner and edge singularities of problem (2.1)–(2.2) (resp., (2.4)–(2.5)) have the same principal parts as those of problem (2.6)–(2.7) (resp., (2.8)–(2.9)), and since the main difficulty in the C 0 finite element discretization of (2.6)–(2.7) (resp., (2.8)–(2.9)) is due to the low regularity of the solution (not due to the presence of ω 2 ), it suffices for us to develop C 0 finite element methods for problem (2.1)–(2.2) (resp., (2.4)–(2.5)), which is in [26] called a Maxwell problem. In other words, the finite element method for problem (2.1)–(2.2) can be applied to problem (2.6)–(2.7) straightforwardly, as well as to the Maxwell eigenproblem (see Remark 2.3 below). Remark 2.3. The 3D Maxwell eigenproblem relating to the source problem (2.6)– (2.7) is to find u and ω 2 such that (2.10)
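\[ \mathrm{curl}\,\mathrm{curl}\,u = \omega^2 u, \quad \mathrm{div}\,u = 0 \quad \text{in } \Omega, \qquad u \times n|_\Gamma = 0. \tag{2.10} \]
The PR variational formulation of (2.10) is to find $u \in U$ and $\omega^2$ such that (cf. [27])
\[ (\mathrm{curl}\,u,\mathrm{curl}\,v) + s\,(\mathrm{div}\,u,\mathrm{div}\,v) = \omega^2 (u,v) \quad \forall v \in U. \tag{2.11} \]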
Note that, if the eigenfunction is not in $H^1$, then (2.11) suffers the same difficulty as the source problem when discretized by the $C^0$ finite element method. Now let us recall Green's formulas of integration by parts on a Lipschitz domain $D$:
\[ (\mathrm{div}\,v,\phi)_{0,D} + (v,\nabla\phi)_{0,D} = \int_{\partial D} v\cdot n\,\phi \quad \forall v \in H(\mathrm{div};D),\ \forall \phi \in H^1(D), \]
\[ (\mathrm{curl}\,v,\phi)_{0,D} - (v,\mathrm{curl}\,\phi)_{0,D} = \int_{\partial D} v\times n\cdot\phi \quad \forall v \in H(\mathrm{curl};D),\ \forall \phi \in (H^1(D))^3, \]
where $v\cdot w = \sum_{i=1}^3 v_i w_i$. Note that the last formula also holds for $\phi \in H(\mathrm{curl};D)$ (in a suitable weak sense) on Lipschitz polyhedra, cf. [3], with $\int_{\partial D} v\times n\cdot\phi$ being
written as $\int_{\partial D} v\times n\cdot(n\times\phi\times n)$. Here and in the sequel, $(\cdot,\cdot)_{0,D}$ denotes the $L^2$ inner product on $D$, and $(\cdot,\cdot)$ stands solely for the $L^2$ inner product on $\Omega$. Before closing this section, we define a notation in 3D. For any vector-valued function $v = (v_1,v_2,v_3)$ and a scalar function $q$, we define $(v,q)_{0,D} \in \mathbb{R}^3$ by
\[ (v,q)_{0,D} := ((v_1,q)_{0,D},\ (v_2,q)_{0,D},\ (v_3,q)_{0,D}) \in \mathbb{R}^3. \tag{2.12} \]
For any $v \in H(\mathrm{curl};D)$ and $\phi \in H_0^1(D)$, we have from the above Green's formula
\[ (\mathrm{curl}\,v,\phi)_{0,D} = \big(((v_2,v_3),\mathrm{curl}_{23}\phi)_{0,D},\ ((v_3,v_1),\mathrm{curl}_{31}\phi)_{0,D},\ ((v_1,v_2),\mathrm{curl}_{12}\phi)_{0,D}\big) \in \mathbb{R}^3, \tag{2.13} \]
where $\mathrm{curl}_{ij}\phi = (\partial_j\phi, -\partial_i\phi)$ is the curl of the scalar function $\phi$ with respect to the coordinate components $(x_i,x_j)$, and we also have for $u, v \in H(\mathrm{curl};D)$ and $\phi \in H_0^1(D)$
\[ (\mathrm{curl}\,u,\phi)_{0,D}\cdot(\mathrm{curl}\,v,\phi)_{0,D} = ((u_2,u_3),\mathrm{curl}_{23}\phi)_{0,D}\,((v_2,v_3),\mathrm{curl}_{23}\phi)_{0,D} + ((u_3,u_1),\mathrm{curl}_{31}\phi)_{0,D}\,((v_3,v_1),\mathrm{curl}_{31}\phi)_{0,D} + ((u_1,u_2),\mathrm{curl}_{12}\phi)_{0,D}\,((v_1,v_2),\mathrm{curl}_{12}\phi)_{0,D}. \tag{2.14} \]
3. The $L^2$ projected $C^0$ finite element method. Let $\mathcal{C}_h$ denote the shape-regular triangulation (see [21, 16, 37]) of $\bar\Omega$ into tetrahedra, with diameters $h_K$ for $K \in \mathcal{C}_h$ bounded by $h$. Let $P_k$ be the space of polynomials of degree not greater than $k$, with $k$ being a nonnegative integer. Set
\[ P_h := \{q \in L^2(\Omega);\ q|_K \in P_1(K),\ \forall K \in \mathcal{C}_h\}, \tag{3.1} \]
\[ Q_h := \{q \in L^2(\Omega);\ q|_K \in P_0(K),\ \forall K \in \mathcal{C}_h\}. \tag{3.2} \]
Let $K \in \mathcal{C}_h$ be a tetrahedron with vertices $a_i$, $1 \le i \le 4$, and let $F_i$ be the face opposite $a_i$. Denote by $\lambda_i$ the barycentric coordinate of $a_i$. In fact, $P_1(K) = \mathrm{span}\{\lambda_i, 1 \le i \le 4\}$ and $\lambda_i$ is also called the shape function of $P_1(K)$; cf. [21]. Introduce the element bubble
\[ b_K := \lambda_1\lambda_2\lambda_3\lambda_4 \in H_0^1(K), \tag{3.3} \]
and the face bubbles
\[ b_{F_1} = \lambda_2\lambda_3\lambda_4, \quad b_{F_2} = \lambda_1\lambda_3\lambda_4, \quad b_{F_3} = \lambda_1\lambda_2\lambda_4, \quad b_{F_4} = \lambda_1\lambda_2\lambda_3. \tag{3.4} \]
We see that these face bubbles satisfy
\[ b_{F_i}|_{F_i} \in H_0^1(F_i), \qquad b_{F_i}|_{F_j} = 0 \ \text{ for all } j \ne i. \tag{3.5} \]
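The vanishing properties of the bubbles in (3.3)–(3.5) can be checked pointwise. The following is a minimal sketch, assuming a point of $K$ is given directly by its four barycentric coordinates; the function names `element_bubble` and `face_bubble` are hypothetical and are not taken from the paper.

```python
from fractions import Fraction

def element_bubble(lam):
    """Element bubble b_K = lambda_1 * lambda_2 * lambda_3 * lambda_4, cf. (3.3)."""
    l1, l2, l3, l4 = lam
    return l1 * l2 * l3 * l4

def face_bubble(i, lam):
    """Face bubble b_{F_i}, cf. (3.4): the product of the three barycentric
    coordinates other than lambda_i (i runs from 1 to 4)."""
    prod = Fraction(1)
    for j, value in enumerate(lam, start=1):
        if j != i:
            prod *= value
    return prod

# A point in the interior of the face F_2 (where lambda_2 = 0); coordinates sum to 1.
on_F2 = (Fraction(1, 2), Fraction(0), Fraction(1, 4), Fraction(1, 4))
# The centroid of the tetrahedron.
centroid = (Fraction(1, 4),) * 4

# b_{F_2} is nonzero inside F_2, while the other face bubbles and b_K vanish there.
values_on_F2 = {i: face_bubble(i, on_F2) for i in range(1, 5)}
```

At an interior point of $F_2$ only `face_bubble(2, ...)` is nonzero, which is the pointwise content of the second relation in (3.5).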
Let $\phi_{F_i,j} = p_{F_i,j}\,b_{F_i} \in H^1(K)$, $1 \le j \le 3$, be the shape (basis) functions of $P_4(K)$ on $F_i$, $1 \le i \le 4$, where
\[ P_1(F_i) = \mathrm{span}\{p_{F_i,j}|_{F_i},\ 1 \le j \le 3\}. \tag{3.6} \]
Let
\[ P_{F_i} := \mathrm{span}\{q_{F_i,l},\ 1 \le l \le 9\} = (\mathrm{span}\{p_{F_i,j},\ 1 \le j \le 3\})^3. \tag{3.7} \]
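Clearly, we have $P_{F_i}|_{F_i} = (P_1(F_i))^3$. Introduce
\[ \Phi_h := \{v \in (H^1(\Omega))^3;\ v|_K \in (\mathrm{span}\{\phi_{F_i,j},\ 1 \le j \le 3,\ 1 \le i \le 4\})^3,\ \forall K \in \mathcal{C}_h\} = \{v \in (H^1(\Omega))^3;\ v|_K \in \mathrm{span}\{q_{F_i,l}\,b_{F_i},\ 1 \le l \le 9,\ 1 \le i \le 4\},\ \forall K \in \mathcal{C}_h\}, \tag{3.8} \]
\[ B_h := \{v \in (H_0^1(\Omega))^3;\ v|_K \in (\mathrm{span}\{b_K\})^3,\ \forall K \in \mathcal{C}_h\} = \{v \in (H_0^1(\Omega))^3;\ v|_K \in (P_0(K))^3\,b_K,\ \forall K \in \mathcal{C}_h\}. \tag{3.9} \]
Define the $C^0$ approximate space $U_h \subset (H^1(\Omega))^3 \cap H_0(\mathrm{curl};\Omega) \subset U$ as follows:
\[ U_h = (P_h \cap H^1(\Omega))^3 \cap H_0(\mathrm{curl};\Omega) + \Phi_h \cap H_0(\mathrm{curl};\Omega) + B_h. \tag{3.10} \]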
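Let $\theta_{K,l}$, $1 \le l \le m = 20$, denote the shape functions of $P_3(K)$. Introduce a local set of functions
\[ \Upsilon_K = \{\theta_{K,l},\ 1 \le l \le m = 20\}, \tag{3.11} \]
and define mesh-dependent (elementwise) bilinear and linear forms as follows:
\[ S_{h,\mathrm{div}}(u,v) := \sum_{K\in\mathcal{C}_h} \frac{\sum_{l=1}^m (u,\nabla(\theta_{K,l}b_K))_{0,K}\,(v,\nabla(\theta_{K,l}b_K))_{0,K}}{\sum_{l=1}^m \|\nabla(\theta_{K,l}b_K)\|_{0,K}^2}, \tag{3.12} \]
\[ Z_{h,\mathrm{div}}(g;v) := -\sum_{K\in\mathcal{C}_h} \frac{\sum_{l=1}^m (g,\theta_{K,l}b_K)_{0,K}\,(v,\nabla(\theta_{K,l}b_K))_{0,K}}{\sum_{l=1}^m \|\nabla(\theta_{K,l}b_K)\|_{0,K}^2}, \tag{3.13} \]
\[ S_{h,\mathrm{curl}}(u,v) := \sum_{K\in\mathcal{C}_h} \frac{\sum_{l=1}^m (\mathrm{curl}\,u,\theta_{K,l}b_K)_{0,K}\cdot(\mathrm{curl}\,v,\theta_{K,l}b_K)_{0,K}}{\sum_{l=1}^m \|\nabla(\theta_{K,l}b_K)\|_{0,K}^2}, \tag{3.14} \]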
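where the notation in (2.12) was used in (3.14). We finally define $\breve R_h(\mathrm{div}\,v) \in Q_h$ for a given $v \in H(\mathrm{div};\Omega) \cap H(\mathrm{curl};\Omega)$ by
\[ \breve R_h(\mathrm{div}\,v)|_K := \frac{1}{|K|}\int_K \mathrm{div}\,v \quad \forall K \in \mathcal{C}_h, \tag{3.15} \]
where $|K|$ denotes the volume of $K$, and define $R_h(\mathrm{curl}\,v) \in (P_h)^3$ by
\[ (R_h(\mathrm{curl}\,v),q)_{0,K} := (\mathrm{curl}\,v,q)_{0,K} \quad \forall q \in (P_1(K))^3,\ \forall K \in \mathcal{C}_h. \tag{3.16} \]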
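The projector in (3.15) is simply the cell average. The following is a minimal sketch of that idea in a one-dimensional analogue, assuming the function to be projected is a polynomial given by its coefficient list and the mesh is a list of nodes; the names `integrate_poly`, `project_p0`, and `project_piecewise_p0` are hypothetical, and the reduction to 1D is an illustration only, not the tetrahedral setting of the paper.

```python
from fractions import Fraction

def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of sum_k coeffs[k] * x**k."""
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    for k, c in enumerate(coeffs):
        total += Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
    return total

def project_p0(coeffs, a, b):
    """Local L2 projection onto constants on the cell [a, b]:
    the cell mean, the 1D analogue of (3.15)."""
    return integrate_poly(coeffs, a, b) / (Fraction(b) - Fraction(a))

def project_piecewise_p0(coeffs, nodes):
    """Apply the cell-mean projection on every cell of a 1D mesh."""
    return [project_p0(coeffs, a, b) for a, b in zip(nodes[:-1], nodes[1:])]

# Project f(x) = x onto piecewise constants on the mesh {0, 1, 2}.
means = project_piecewise_p0([0, 1], [0, 1, 2])
# The residual f - mean integrates to zero on each cell (orthogonality to constants).
residual_first_cell = integrate_poly([-means[0], 1], 0, 1)
```

The zero residual on each cell is the defining orthogonality of the local projection; (3.16) imposes the same kind of condition against $(P_1(K))^3$ instead of constants.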
Setting
\[ S_h(u,v) := S_{h,\mathrm{div}}(u,v) + S_{h,\mathrm{curl}}(u,v), \qquad Z_h(v) := Z_{h,\mathrm{div}}(g;v), \tag{3.17} \]
and letting $s$, $\alpha$ be two positive constants, we define the bilinear form on $U_h \times U_h$ as follows:
\[ L_h(u,v) := (R_h(\mathrm{curl}\,u),R_h(\mathrm{curl}\,v)) + s\,(\breve R_h(\mathrm{div}\,u),\breve R_h(\mathrm{div}\,v)) + \alpha\,S_h(u,v), \tag{3.18} \]
and define the linear form on $U_h$ as follows:
\[ F_h(v) := (f,v) + s\,(g,\breve R_h(\mathrm{div}\,v)) + \alpha\,Z_h(v). \tag{3.19} \]
The $L^2$ projected $C^0$ finite element method to numerically solve problem (2.1)–(2.2) reads as follows: Find $u_h \in U_h$ such that
\[ L_h(u_h,v) = F_h(v) \quad \forall v \in U_h. \tag{3.20} \]
Remark 3.1. The method (3.20) is not consistent in the usual sense [21], i.e., with $u$ the exact solution and $u_h$ the finite element solution (see Lemma 5.1 for more details),
\[ L_h(u-u_h,v_h) \ne 0 \quad \forall v_h \in U_h, \tag{3.21} \]
because the term $S_{h,\mathrm{curl}}(u,v_h)$ does not correspond to any right-hand side term and
\[ (R_h(\mathrm{curl}\,u),R_h(\mathrm{curl}\,v_h)) = (\mathrm{curl}\,u,R_h(\mathrm{curl}\,v_h)) \ne (f,v_h) \quad \forall v_h \in U_h, \tag{3.22} \]
where $u$ satisfies
\[ (\mathrm{curl}\,u,\mathrm{curl}\,v) = (f,v) \quad \forall v \in U. \tag{3.23} \]
As we shall see, the estimate of the inconsistency error in (3.22) relies on the profound result on the regular-singular decomposition stated in Proposition 5.2 in the regularity theory for the Maxwell equations.
Remark 3.2. The role of the face- and element-bubbles in $U_h$ is to eliminate the effects of both curl and div partial derivatives on the solution $u$ with the help of the local $L^2$ projectors $R_h$ and $\breve R_h$ (see (5.14) in Lemma 5.5). The local set $\Upsilon_K$, defined as (3.11) and used in (3.12)–(3.14), ensures that the following element-local inclusion properties hold:
\[ \mathrm{div}\,(v|_K) \in P_3(K), \quad \mathrm{curl}\,(v|_K) \in (P_3(K))^3 \quad \text{on } K \quad \forall v \in U_h,\ \forall K \in \mathcal{C}_h, \tag{3.24} \]
where $v|_K$ is the restriction of $v$ to $K \in \mathcal{C}_h$. From (3.24) we have certain coercivity properties for both $S_{h,\mathrm{div}}(u,v)$ and $S_{h,\mathrm{curl}}(u,v)$ (see Lemma 4.3). The stabilization term $S_h$ in (3.17) is to 'remedy' the loss in the coercivity, where the loss is caused by the introduction of the $L^2$ projectors in front of both curl and div operators (cf. the coercive PR form (1.4) without $L^2$ projectors); see (4.27) in proving the coercivity property stated in Theorem 4.1.
Remark 3.3. In 2D, we just take the approximate space as the $P_3$ element:
\[ U_h := \{v \in (H^1(\Omega))^2 \cap H_0(\mathrm{curl};\Omega);\ v|_K \in (P_3(K))^2,\ \forall K \in \mathcal{C}_h\}, \tag{3.25} \]
where $H_0(\mathrm{curl};\Omega) = \{v \in (L^2(\Omega))^2;\ \mathrm{curl}\,v \in L^2(\Omega),\ v\cdot\tau|_{\partial\Omega} = 0\}$, with $\mathrm{curl}\,v = \partial_1 v_2 - \partial_2 v_1$ and $\tau$ being the tangential unit vector to $\partial\Omega$, and the local set of functions
\[ \Upsilon_K := \{\theta_{K,l},\ 1 \le l \le m = 6\}, \tag{3.26} \]
where $\theta_{K,l}$, $1 \le l \le m = 6$, is chosen as the shape function of $P_2(K)$, and other definitions can be easily adjusted.
Remark 3.4. If the approximate space contains the gradient of some $C^1$ element (i.e., continuity is also imposed on the first-order partial derivatives across adjacent finite elements), we can drop both the $L^2$ projector $R_h$ of the curl operator and the mesh-dependent bilinear form $S_{h,\mathrm{curl}}(\cdot,\cdot)$. Below, for the 2D problem, we propose two finite element methods for which the approximate space contains, respectively, the gradient of the Argyris $C^1$ triangle element and the gradient of the Hsieh–Clough–Tocher (HCT) $C^1$ macro triangle element (see [21]). As for 3D, the approximate space containing the gradient of a $C^1$ element is of relatively little interest, as pointed out in section 1. The Argyris $C^1$ element consists of polynomials of degree not greater than 5. The HCT $C^1$ macro-element consists of piecewise $P_3$ polynomials, i.e., if $T_i$, $1 \le i \le 3$, denote the subtriangles obtained by connecting the barycentric point of the triangle $K \in \mathcal{C}_h$ to the three vertices of $K$, then the HCT functions are $P_3$ on each $T_i$. Set
\[ \mathcal{T}_{h/2} := \bigcup_{K\in\mathcal{C}_h} \bigcup_{i=1}^3 T_i. \tag{3.27} \]
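The subtriangulation in (3.27) can be made concrete as follows. This is a minimal sketch, assuming a triangle is given as three vertex pairs with integer or rational coordinates; the helper names `signed_area` and `hct_split` are hypothetical and not part of the paper.

```python
from fractions import Fraction

def signed_area(p, q, r):
    """Signed area of the triangle (p, q, r) in the plane."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])
    return Fraction(cross) / 2

def hct_split(tri):
    """Split a triangle into the three subtriangles T_1, T_2, T_3 obtained by
    connecting its barycenter to the three vertices, as in (3.27)."""
    a, b, c = tri
    g = (Fraction(a[0] + b[0] + c[0], 3), Fraction(a[1] + b[1] + c[1], 3))
    # Replacing one vertex at a time by the barycenter keeps the orientation.
    return [(g, b, c), (a, g, c), (a, b, g)]

K = ((0, 0), (3, 0), (0, 3))
subtriangles = hct_split(K)
sub_areas = [signed_area(*t) for t in subtriangles]
```

Each subtriangle carries exactly one third of the area of $K$, so $\mathcal{T}_{h/2}$ inherits shape regularity from $\mathcal{C}_h$ up to a fixed constant.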
Define two approximate spaces as follows:
\[ U_h^* := \{v \in (H^1(\Omega))^2 \cap H_0(\mathrm{curl};\Omega);\ v|_K \in (P_4(K))^2,\ \forall K \in \mathcal{C}_h\}, \tag{3.28} \]
\[ U_h^{**} := \{v \in (H^1(\Omega))^2 \cap H_0(\mathrm{curl};\Omega);\ v|_T \in (P_2(T))^2,\ \forall T \in \mathcal{T}_{h/2}\}, \tag{3.29} \]
where $U_h^*$ contains the gradient of the Argyris $C^1$ element, and $U_h^{**}$ contains the gradient of the HCT $C^1$ macro element. Corresponding to $U_h^*$, we introduce the local set of functions
\[ \Upsilon_K^* := \{\theta_{K,l},\ 1 \le l \le m = 10\}, \tag{3.30} \]
where $\theta_{K,l}$, $1 \le l \le m = 10$, is chosen as the shape function of $P_3(K)$, and we define
\[ L_h^*(u,v) := (\mathrm{curl}\,u,\mathrm{curl}\,v) + s\,(\breve R_h(\mathrm{div}\,u),\breve R_h(\mathrm{div}\,v)) + \alpha\,S_{h,\mathrm{div}}(u,v), \tag{3.31} \]
\[ F_h^*(v) := (f,v) + s\,(g,\breve R_h(\mathrm{div}\,v)) + \alpha\,Z_{h,\mathrm{div}}(g;v), \tag{3.32} \]
where $S_{h,\mathrm{div}}(u,v)$ and $Z_{h,\mathrm{div}}(g;v)$ are, respectively, defined by (3.12) and (3.13), but with the functions $\theta_{K,l}$, $1 \le l \le m$, taken from $\Upsilon_K^*$ given by (3.30), and $\breve R_h$ is defined by (3.15). The finite element method is, thus, stated as follows: Find $u_h^* \in U_h^*$ such that
\[ L_h^*(u_h^*,v) = F_h^*(v) \quad \forall v \in U_h^*. \tag{3.33} \]
Corresponding to $U_h^{**}$, we introduce the local set of functions for $T \in \mathcal{T}_{h/2}$
\[ \Upsilon_T^{**} := \{\theta_{T,l};\ 1 \le l \le m = 3\}, \tag{3.34} \]
where $\theta_{T,l}$, $1 \le l \le m = 3$, is chosen as the shape function of $P_1(T)$, and we define
\[ L_h^{**}(u,v) := (\mathrm{curl}\,u,\mathrm{curl}\,v) + s\,(\breve R_h(\mathrm{div}\,u),\breve R_h(\mathrm{div}\,v)) + \alpha\,S_{h/2,\mathrm{div}}(u,v), \tag{3.35} \]
\[ F_h^{**}(v) := (f,v) + s\,(g,\breve R_h(\mathrm{div}\,v)) + \alpha\,Z_{h/2,\mathrm{div}}(g;v), \tag{3.36} \]
where, with respect to the subtriangulation $\mathcal{T}_{h/2}$ given by (3.27), $S_{h/2,\mathrm{div}}(u,v)$ and $Z_{h/2,\mathrm{div}}(g;v)$ are defined similarly to those in (3.12) and (3.13) with the choice $\Upsilon_T^{**}$ given by (3.34), and $\breve R_h$ is still defined in (3.15) with respect to the triangulation $\mathcal{C}_h$. The finite element method reads as follows: Find $u_h^{**} \in U_h^{**}$ such that
\[ L_h^{**}(u_h^{**},v) = F_h^{**}(v) \quad \forall v \in U_h^{**}. \tag{3.37} \]
It can be easily seen that both the methods (3.33) and (3.37) are consistent in the usual sense, i.e.,
\[ L_h^*(u-u_h^*,v_h) = 0 \quad \forall v_h \in U_h^*, \tag{3.38} \]
for example, where $u$ and $u_h^*$ are the exact solution and the finite element solution, respectively. As we shall see, the advantage of consistency is that it allows the right-hand side $f$ to be less regular; see (5.50) and Remark 5.3.
4. Coercivity and condition number. We first investigate properties of the mesh-dependent bilinear forms.
Lemma 4.1. Under the shape-regular condition, there exist constants $C_1$, $C_2$ and $C_3$, $C_4$, independent of $h$ and $K$, such that
\[ C_1 h_K^3 \le \sum_{l=1}^m \|\theta_{K,l}b_K\|_{0,K}^2 \le C_2 h_K^3, \tag{4.1} \]
\[ C_3 h_K \le \sum_{l=1}^m \|\nabla(\theta_{K,l}b_K)\|_{0,K}^2 \le C_4 h_K, \tag{4.2} \]
where $\theta_{K,l} \in \Upsilon_K$, $1 \le l \le m = 20$, with $\Upsilon_K$ given as in (3.11), and $b_K$ is defined by (3.3).
Proof. Both (4.1) and (4.2) can be easily shown by the scaling argument [37, 21, 17], or by a direct approach as follows. Since $b_K = \lambda_1\lambda_2\lambda_3\lambda_4$, and $\theta_{K,l}$ is either $\frac12\lambda_i(3\lambda_i-1)(3\lambda_i-2)$ (at vertices), or $\frac92\lambda_i\lambda_j(3\lambda_i-1)$, $\frac92\lambda_i\lambda_j(3\lambda_j-1)$ (at two edge Gaussian nodes), or $27\lambda_i\lambda_j\lambda_k$ (at face barycentric nodes), using the following formula on the tetrahedron $K$ (for nonnegative integers $n_j$),
\[ \int_K \lambda_1^{n_1}\lambda_2^{n_2}\lambda_3^{n_3}\lambda_4^{n_4} = 3!\,|K|\,\frac{n_1!\,n_2!\,n_3!\,n_4!}{(n_1+n_2+n_3+n_4+3)!}, \]
under the shape-regular condition [37, 16], it is not difficult to show that (4.1) and (4.2) hold.
Lemma 4.2. We have
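The direct approach in the proof of Lemma 4.1 rests on exact integrals of monomials in the barycentric coordinates. The following is a minimal sketch of that integration rule, assuming the standard normalization with the factor $3!$, so that the constant function integrates to $|K|$; the function name `barycentric_monomial_integral` is hypothetical.

```python
from fractions import Fraction
from math import factorial

def barycentric_monomial_integral(n, volume=Fraction(1)):
    """Exact integral over a tetrahedron K of
    lambda_1**n1 * lambda_2**n2 * lambda_3**n3 * lambda_4**n4,
    using 3! |K| n1! n2! n3! n4! / (n1 + n2 + n3 + n4 + 3)!.
    The factor 3! is the normalization assumed in this sketch."""
    n1, n2, n3, n4 = n
    numerator = factorial(3) * factorial(n1) * factorial(n2) * factorial(n3) * factorial(n4)
    denominator = factorial(n1 + n2 + n3 + n4 + 3)
    return Fraction(volume) * Fraction(numerator, denominator)

# Integral of the element bubble b_K and of its square on a tetrahedron of unit volume.
bubble_integral = barycentric_monomial_integral((1, 1, 1, 1))
bubble_square_integral = barycentric_monomial_integral((2, 2, 2, 2))
```

Because every $\theta_{K,l}b_K$ is a polynomial in the $\lambda_i$, the sums in (4.1) reduce to finitely many such integrals, each proportional to $|K|$, which under shape regularity is comparable to $h_K^3$.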
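\[ |S_{h,\mathrm{div}}(u,v)| \le \|u\|_0\,\|v\|_0, \tag{4.3} \]
\[ |S_{h,\mathrm{curl}}(u,v)| \le 3\,\|u\|_0\,\|v\|_0, \tag{4.4} \]
\[ 0 \le S_{h,\mathrm{div}}(v,v) \le C \sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{div}\,v\|_{0,K}^2, \tag{4.5} \]
\[ 0 \le S_{h,\mathrm{curl}}(v,v) \le C \sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{curl}\,v\|_{0,K}^2. \tag{4.6} \]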
Proof. The left-hand sides of (4.5)–(4.6) are obvious. We only prove (4.3) and the right-hand side of (4.5) as examples, while (4.4) and the right-hand side of (4.6) can
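be estimated in the same way, only noting that (2.14) will be used in proving (4.4). We first prove (4.3). From the Cauchy–Schwarz inequality we have
\[ \Big|\sum_{l=1}^m (u,\nabla(\theta_{K,l}b_K))_{0,K}\,(v,\nabla(\theta_{K,l}b_K))_{0,K}\Big| \le \|u\|_{0,K}\,\|v\|_{0,K} \sum_{l=1}^m \|\nabla(\theta_{K,l}b_K)\|_{0,K}^2, \]
and
\[ |S_{h,\mathrm{div}}(u,v)| = \Big|\sum_{K\in\mathcal{C}_h} \frac{\sum_{l=1}^m (u,\nabla(\theta_{K,l}b_K))_{0,K}\,(v,\nabla(\theta_{K,l}b_K))_{0,K}}{\sum_{l=1}^m \|\nabla(\theta_{K,l}b_K)\|_{0,K}^2}\Big| \le \sum_{K\in\mathcal{C}_h} \|u\|_{0,K}\,\|v\|_{0,K} \le \Big(\sum_{K\in\mathcal{C}_h} \|u\|_{0,K}^2\Big)^{1/2} \Big(\sum_{K\in\mathcal{C}_h} \|v\|_{0,K}^2\Big)^{1/2} = \|u\|_0\,\|v\|_0. \tag{4.7} \]
We next prove the right-hand side of (4.5). Since $\theta_{K,l}b_K \in H_0^1(K)$, we have from Green's formula of integration by parts
\[ (v,\nabla(\theta_{K,l}b_K))_{0,K} = -(\mathrm{div}\,v,\theta_{K,l}b_K)_{0,K}, \tag{4.8} \]
and then
\[ S_{h,\mathrm{div}}(v,v) = \sum_{K\in\mathcal{C}_h} \frac{\sum_{l=1}^m ((v,\nabla(\theta_{K,l}b_K))_{0,K})^2}{\sum_{l=1}^m \|\nabla(\theta_{K,l}b_K)\|_{0,K}^2} = \sum_{K\in\mathcal{C}_h} \frac{\sum_{l=1}^m ((\mathrm{div}\,v,\theta_{K,l}b_K)_{0,K})^2}{\sum_{l=1}^m \|\nabla(\theta_{K,l}b_K)\|_{0,K}^2}, \tag{4.9} \]
where, from the Cauchy–Schwarz inequality and the right-hand side of (4.1),
\[ \sum_{l=1}^m ((\mathrm{div}\,v,\theta_{K,l}b_K)_{0,K})^2 \le \|\mathrm{div}\,v\|_{0,K}^2 \sum_{l=1}^m \|\theta_{K,l}b_K\|_{0,K}^2 \le C\,h_K^3\,\|\mathrm{div}\,v\|_{0,K}^2. \tag{4.10} \]
Combining (4.9)–(4.10) and the left-hand side of (4.2) yields the right-hand side of (4.5).
Now we introduce a mesh-dependent norm on $U_h$:
\[ \|v\|_h^2 := \sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{div}\,v\|_{0,K}^2 + \sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{curl}\,v\|_{0,K}^2. \tag{4.11} \]
Lemma 4.3. For all $v \in U_h$ we have
\[ S_{h,\mathrm{div}}(v,v) \ge C \sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{div}\,v\|_{0,K}^2, \tag{4.12} \]
\[ S_{h,\mathrm{curl}}(v,v) \ge C \sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{curl}\,v\|_{0,K}^2. \tag{4.13} \]
As a consequence, for $S_h(u,v)$, defined as in (3.17), there holds
\[ S_h(v,v) \ge C\,\|v\|_h^2. \tag{4.14} \]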
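Proof. We only prove (4.12), while (4.13) can be proven in the same way. Let $v \in U_h$ be given. From the element-local inclusion properties in (3.24) we can write on $K$
\[ \mathrm{div}\,v = \sum_{l=1}^m c_l\,\theta_{K,l}, \tag{4.15} \]
where $c_l \in \mathbb{R}$ are coefficients, and $\theta_{K,l} \in \Upsilon_K$ defined by (3.11). We have
\[ \sum_{l=1}^m ((v,\nabla(\theta_{K,l}b_K))_{0,K})^2 = \sum_{l=1}^m ((\mathrm{div}\,v,\theta_{K,l}b_K)_{0,K})^2 = \sum_{l=1}^m (c^T d_l)^2 = c^T A_K^2\,c, \tag{4.16} \]
where $c = (c_1,\ldots,c_m)^T \in \mathbb{R}^m$, $d_l = (d_{1,l},\ldots,d_{m,l})^T \in \mathbb{R}^m$, $1 \le l \le m$, with $d_{i,l} = (\theta_{K,i},\theta_{K,l}b_K)_{0,K}$, $1 \le i,l \le m$, and $A_K$ is the 'mass' matrix with $A_K = [d_1,\ldots,d_m] \in \mathbb{R}^{m\times m}$. Clearly, $A_K$ is symmetric and positive definite. Let $T \in \mathbb{R}^{m\times m}$ be the orthogonal matrix such that $A_K = T^T\,\mathrm{diag}(\lambda_1,\ldots,\lambda_m)\,T$, where $0 < \lambda_1 \le \cdots \le \lambda_m$ are the eigenvalues of $A_K$. Using the scaling argument, we can easily show
\[ \lambda_1 \ge C\,h_K^3. \tag{4.17} \]
Let $\bar c = T c = (\bar c_1,\ldots,\bar c_m)^T \in \mathbb{R}^m$; we have from (4.16) that
\[ \sum_{l=1}^m ((v,\nabla(\theta_{K,l}b_K))_{0,K})^2 = \sum_{l=1}^m (\bar c_l\,\lambda_l)^2. \tag{4.18} \]
On the other hand, by a similar argument we have from (4.15) that
\[ (\mathrm{div}\,v,\mathrm{div}\,v\;b_K)_{0,K} = \sum_{l=1}^m (\bar c_l)^2\,\lambda_l. \tag{4.19} \]
We then obtain
\[ \sum_{l=1}^m ((v,\nabla(\theta_{K,l}b_K))_{0,K})^2 = \sum_{l=1}^m (\bar c_l\,\lambda_l)^2 \ge \lambda_1 \sum_{l=1}^m (\bar c_l)^2\,\lambda_l = \lambda_1\,(\mathrm{div}\,v,\mathrm{div}\,v\;b_K)_{0,K}. \tag{4.20} \]
But, using the scaling argument, we have
\[ (\mathrm{div}\,v,\mathrm{div}\,v\;b_K)_{0,K} \ge C\,(\mathrm{div}\,v,\mathrm{div}\,v)_{0,K}. \tag{4.21} \]
Hence, from (4.20), (4.21), and (4.17),
\[ \sum_{l=1}^m ((v,\nabla(\theta_{K,l}b_K))_{0,K})^2 \ge C\,h_K^3\,\|\mathrm{div}\,v\|_{0,K}^2. \tag{4.22} \]
Then we have from (4.22) and the right-hand side of (4.2)
\[ S_{h,\mathrm{div}}(v,v) = \sum_{K\in\mathcal{C}_h} \frac{\sum_{l=1}^m ((v,\nabla(\theta_{K,l}b_K))_{0,K})^2}{\sum_{l=1}^m \|\nabla(\theta_{K,l}b_K)\|_{0,K}^2} \ge C \sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{div}\,v\|_{0,K}^2. \tag{4.23} \]
This completes the proof.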
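The step from (4.18)–(4.19) to (4.20) uses only that $\lambda_l \ge \lambda_1 > 0$ for every $l$, so that $(\bar c_l\lambda_l)^2 = \lambda_l\,(\bar c_l^2\lambda_l) \ge \lambda_1\,\bar c_l^2\lambda_l$. The following is a minimal numerical sketch of this inequality, assuming the eigenvalues and transformed coefficients are supplied as plain lists; the function name `spectral_bound_sides` is hypothetical.

```python
def spectral_bound_sides(eigenvalues, coefficients):
    """Return both sides of the inequality in (4.20):
    lhs = sum_l (c_l * lambda_l)**2,
    rhs = lambda_min * sum_l c_l**2 * lambda_l,
    for positive eigenvalues lambda_l and real coefficients c_l."""
    lhs = sum((c * lam) ** 2 for c, lam in zip(coefficients, eigenvalues))
    rhs = min(eigenvalues) * sum(c * c * lam for c, lam in zip(coefficients, eigenvalues))
    return lhs, rhs

# Distinct eigenvalues: the inequality is strict.
lhs, rhs = spectral_bound_sides([1, 2, 5], [3, -1, 2])
# Equal eigenvalues: the inequality becomes an equality.
lhs_eq, rhs_eq = spectral_bound_sides([2, 2], [1, 3])
```

Equality holds exactly when all eigenvalues carrying a nonzero coefficient coincide with $\lambda_1$, so the lower bound (4.17) on $\lambda_1$ is what drives the factor $h_K^3$ in (4.22).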
Remark 4.1. With $S_h(u,v)$ defined in (3.17), Lemmas 4.2 and 4.3 lead to
\[ C\,\|v\|_h^2 \le S_h(v,v) \le C\,\|v\|_h^2 \quad \forall v \in U_h. \]
One might, thus, think that, instead of using $S_h(u,v)$, it would be more convenient to use the following stabilization term $\tilde S_h(u,v)$:
\[ \tilde S_h(u,v) := \sum_{K\in\mathcal{C}_h} h_K^2\,(\mathrm{div}\,u,\mathrm{div}\,v)_{0,K} + \sum_{K\in\mathcal{C}_h} h_K^2\,(\mathrm{curl}\,u,\mathrm{curl}\,v)_{0,K}. \tag{4.24} \]
In that case, however, a correct convergent finite element solution may not be obtained when the exact solution is not in $H^1$. This was confirmed by our numerical experiments (which are not reported in this paper). Such an incorrect convergence may be explained as in section 1. In fact, taking $h_K = h$ for all $K$, we have
\[ \tilde S_h(u,v) = h^2\,(\mathrm{div}\,u,\mathrm{div}\,v) + h^2\,(\mathrm{curl}\,u,\mathrm{curl}\,v) = h^2\,(\nabla u,\nabla v) \quad \forall u,v \in U_h, \]
which may enforce a convergence of the finite element solution $u_h$ to an element in $H^1$. On the other hand, the $S_h(u,v)$ defined as in (3.17) is suitable for the nonsmooth solution that does not belong to $H^1$, since no partial derivatives are applied to either $u$ or $v$ (where, to see this point for $S_{h,\mathrm{curl}}(u,v)$, (2.14) was used).
For the analysis of coercivity, below we recall the $L^2$-orthogonal decomposition and the regular-singular decomposition of vector fields on Lipschitz polyhedra. The first two of the following propositions are due to [34]; see also [4, 14].
Proposition 4.1. We have the following $L^2$-orthogonal decomposition of vector fields with respect to the $L^2$ inner product $(\cdot,\cdot)$:
\[ (L^2(\Omega))^3 = \nabla H_0^1(\Omega) \oplus \mathrm{curl}\,(H(\mathrm{curl};\Omega) \cap H_0(\mathrm{div}^0;\Omega)). \]
Proposition 4.2. For any $v \in H(\mathrm{curl};\Omega) \cap H_0(\mathrm{div}^0;\Omega)$, or for any $v \in H_0(\mathrm{curl};\Omega) \cap H(\mathrm{div}^0;\Omega)$, we have
\[ \|v\|_0 \le C\,\|\mathrm{curl}\,v\|_0. \]
Proposition 4.3 ([12, 13]). Any $\psi \in H(\mathrm{curl};\Omega) \cap H_0(\mathrm{div};\Omega)$ can be written as the following regular-singular decomposition:
\[ \psi = \psi^0 + \nabla q, \]
where $\psi^0 \in H_0(\mathrm{div};\Omega) \cap (H^1(\Omega))^3$ is called the "regular part" and $q \in H^1(\Omega)/\mathbb{R}$ the "singular part," satisfying
\[ \|\psi^0\|_1 \le C\,\{\|\psi\|_0 + \|\mathrm{curl}\,\psi\|_0 + \|\mathrm{div}\,\psi\|_0\}. \]
Theorem 4.1. Let the stabilization parameter $\alpha \ge \alpha_0 > 0$, with $\alpha_0$ being determined according to (4.27) below, i.e., $\alpha \ge \alpha_0 = C_6$ as given in (4.28). We have
\[ L_h(v,v) \ge C\,\|v\|_0^2 \quad \forall v \in U_h. \tag{4.25} \]
As a consequence of the Lax–Milgram lemma, problem (3.20) has a unique solution.
Proof. Since
\[ L_h(v,v) = \|R_h(\mathrm{curl}\,v)\|_0^2 + s\,\|\breve R_h(\mathrm{div}\,v)\|_0^2 + \alpha\,S_h(v,v), \tag{4.26} \]
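we need only prove that there exist positive constants $C_5$ and $C_6$ such that
\[ \|R_h(\mathrm{curl}\,v)\|_0^2 + s\,\|\breve R_h(\mathrm{div}\,v)\|_0^2 \ge C_5\,\|v\|_0^2 - C_6\,S_h(v,v) \quad \forall v \in U_h. \tag{4.27} \]
Then the theorem follows by choosing
\[ \alpha \ge \alpha_0 := C_6. \tag{4.28} \]
Note that $s$ may be chosen in advance as any given positive constant, say $s = 1$. From Proposition 4.1 we write $v$ as the following $L^2$-orthogonal decomposition with respect to the $L^2$ inner product:
\[ v = \nabla p + \mathrm{curl}\,\psi, \tag{4.29} \]
with $p \in H_0^1(\Omega)$ and $\psi \in H(\mathrm{curl};\Omega) \cap H_0(\mathrm{div}^0;\Omega)$, satisfying
\[ \|v\|_0^2 = \|\nabla p\|_0^2 + \|\mathrm{curl}\,\psi\|_0^2. \tag{4.30} \]
We also have from Proposition 4.2
\[ \|\psi\|_0 \le C\,\|\mathrm{curl}\,\psi\|_0. \tag{4.31} \]
From Proposition 4.3 we further write $\psi$ as
\[ \psi = \psi^0 + \nabla q, \tag{4.32} \]
where $\psi^0 \in H_0(\mathrm{div};\Omega) \cap (H^1(\Omega))^3$, $\nabla q \in H(\mathrm{curl}^0;\Omega)$ with $q \in H^1(\Omega)/\mathbb{R}$, and we have from Proposition 4.3 and (4.31)
\[ \|\psi^0\|_1 \le C\,\|\mathrm{curl}\,\psi\|_0. \tag{4.33} \]
According to the two components $(p,\psi)$ in (4.29), we divide the proof of (4.27) into two steps.
Step 1. We consider $p$. We take $\tilde p \in Q_h$ as the local $L^2$ projection of $p$ such that [30, 36]
\[ \tilde p|_K = \frac{1}{|K|}\int_K p \quad \forall K \in \mathcal{C}_h, \tag{4.34} \]
\[ \Big(\sum_{K\in\mathcal{C}_h} h_K^{-2}\,\|p-\tilde p\|_{0,K}^2\Big)^{1/2} + \|\tilde p\|_0 \le C\,\|p\|_1. \tag{4.35} \]
Let $\delta > 0$ be a constant to be determined. We have
\[ \|\breve R_h(\mathrm{div}\,v)\|_0^2 = \|\breve R_h(\mathrm{div}\,v) + \delta\,\tilde p\|_0^2 - \delta^2\,\|\tilde p\|_0^2 - 2\delta\,(\breve R_h(\mathrm{div}\,v),\tilde p), \tag{4.36} \]
where
\[ -\delta^2\,\|\tilde p\|_0^2 \ge -\delta^2\,\|p\|_0^2 \ge -\delta^2\,C\,\|\nabla p\|_0^2, \tag{4.37} \]
\[ -2\delta\,(\breve R_h(\mathrm{div}\,v),\tilde p) = -2\delta \sum_{K\in\mathcal{C}_h} (\mathrm{div}\,v,\tilde p)_{0,K} = 2\delta \sum_{K\in\mathcal{C}_h} (\mathrm{div}\,v,p-\tilde p)_{0,K} - 2\delta \sum_{K\in\mathcal{C}_h} (\mathrm{div}\,v,p)_{0,K}, \tag{4.38} \]
\[ -2\delta \sum_{K\in\mathcal{C}_h} (\mathrm{div}\,v,p)_{0,K} = 2\delta\,(v,\nabla p) = 2\delta\,\|\nabla p\|_0^2, \tag{4.39} \]
\[ 2\delta \sum_{K\in\mathcal{C}_h} (\mathrm{div}\,v,p-\tilde p)_{0,K} \ge -2\delta \Big(\sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{div}\,v\|_{0,K}^2\Big)^{1/2} \Big(\sum_{K\in\mathcal{C}_h} h_K^{-2}\,\|p-\tilde p\|_{0,K}^2\Big)^{1/2} \ge -2\delta\,C \Big(\sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{div}\,v\|_{0,K}^2\Big)^{1/2} \|p\|_1 \ge -\sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{div}\,v\|_{0,K}^2 - C\,\delta^2\,\|\nabla p\|_0^2. \tag{4.40} \]
Summarizing (4.36)–(4.40) and choosing
\[ 0 < \delta < 1/C, \tag{4.41} \]
we have
\[ \|\breve R_h(\mathrm{div}\,v)\|_0^2 \ge \delta\,(2 - 2C\delta)\,\|\nabla p\|_0^2 - \sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{div}\,v\|_{0,K}^2. \tag{4.42} \]
Step 2. We consider $\psi$ (the argument is similar to that in Step 1, but we still give the details). We take $\tilde\psi^0 \in (P_h)^3$ as the local $L^2$ projection of $\psi^0$ such that
\[ \int_K \tilde\psi^0\cdot q = \int_K \psi^0\cdot q \quad \forall q \in (P_1(K))^3,\ \forall K \in \mathcal{C}_h, \tag{4.43} \]
\[ \Big(\sum_{K\in\mathcal{C}_h} h_K^{-2}\,\|\psi^0-\tilde\psi^0\|_{0,K}^2\Big)^{1/2} + \|\tilde\psi^0\|_0 \le C\,\|\psi^0\|_1. \tag{4.44} \]
Let $\delta > 0$ be a constant to be determined. We have
\[ \|R_h(\mathrm{curl}\,v)\|_0^2 = \|R_h(\mathrm{curl}\,v) - \delta\,\tilde\psi^0\|_0^2 - \delta^2\,\|\tilde\psi^0\|_0^2 + 2\delta\,(R_h(\mathrm{curl}\,v),\tilde\psi^0), \tag{4.45} \]
where
\[ -\delta^2\,\|\tilde\psi^0\|_0^2 \ge -\delta^2\,C\,\|\psi^0\|_1^2 \ge -\delta^2\,C\,\|\mathrm{curl}\,\psi\|_0^2 \quad \text{(by (4.44) and (4.33))}, \tag{4.46} \]
\[ 2\delta\,(R_h(\mathrm{curl}\,v),\tilde\psi^0) = 2\delta \sum_{K\in\mathcal{C}_h} (\mathrm{curl}\,v,\tilde\psi^0)_{0,K} = 2\delta \sum_{K\in\mathcal{C}_h} (\mathrm{curl}\,v,\tilde\psi^0-\psi^0)_{0,K} + 2\delta \sum_{K\in\mathcal{C}_h} (\mathrm{curl}\,v,\psi^0)_{0,K}, \tag{4.47} \]
\[ 2\delta \sum_{K\in\mathcal{C}_h} (\mathrm{curl}\,v,\psi^0)_{0,K} = 2\delta\,(v,\mathrm{curl}\,\psi^0) = 2\delta\,(v,\mathrm{curl}\,\psi) = 2\delta\,\|\mathrm{curl}\,\psi\|_0^2, \tag{4.48} \]
\[ 2\delta \sum_{K\in\mathcal{C}_h} (\mathrm{curl}\,v,\tilde\psi^0-\psi^0)_{0,K} \ge -2\delta \Big(\sum_{K\in\mathcal{C}_h} h_K^{-2}\,\|\tilde\psi^0-\psi^0\|_{0,K}^2\Big)^{1/2} \Big(\sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{curl}\,v\|_{0,K}^2\Big)^{1/2} \ge -2\delta\,C\,\|\psi^0\|_1 \Big(\sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{curl}\,v\|_{0,K}^2\Big)^{1/2} \ge -2\delta\,C\,\|\mathrm{curl}\,\psi\|_0 \Big(\sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{curl}\,v\|_{0,K}^2\Big)^{1/2} \ge -\sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{curl}\,v\|_{0,K}^2 - C\,\delta^2\,\|\mathrm{curl}\,\psi\|_0^2. \tag{4.49} \]
Summarizing (4.45)–(4.49) and choosing $0 < \delta < 1/C$, we have
\[ \|R_h(\mathrm{curl}\,v)\|_0^2 \ge \delta\,(2 - 2C\delta)\,\|\mathrm{curl}\,\psi\|_0^2 - \sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{curl}\,v\|_{0,K}^2. \tag{4.50} \]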
Finally, from (4.42), (4.50), (4.30), (4.14), and (4.11), we obtain
\[ \|R_h(\mathrm{curl}\,v)\|_0^2 + s\,\|\breve R_h(\mathrm{div}\,v)\|_0^2 \ge C_5\,(\|\nabla p\|_0^2 + \|\mathrm{curl}\,\psi\|_0^2) - \|v\|_h^2 \ge C_5\,\|v\|_0^2 - C_6\,S_h(v,v), \tag{4.51} \]
where $C_5$ and $C_6$ are two positive constants independent of $h$ and $K$. The proof is finished.
Remark 4.2. In fact, the regularization parameter $s$ and the stabilization parameter $\alpha$ can both be taken as any given positive constants, since $L_h(\cdot,\cdot)$ is nonnegative no matter what $\alpha \ge 0$ and $s \ge 0$ are, i.e., for all $\alpha, s \in [0,+\infty)$,
\[ L_h(v,v) \ge 0 \quad \forall v \in U_h. \]
For example, denoting by $L_h^{1,1}$ the bilinear form in (3.20) for the choice $\alpha = s = 1$ and by $L_h^{\alpha,s}$ the one for the choice (4.28) and any $s > 0$, we still have the coercivity as stated in (4.25) for $L_h^{1,1}$, since we have from the above nonnegativeness property that
\[ L_h^{1,1}(v,v) = \|R_h(\mathrm{curl}\,v)\|_0^2 + \|\breve R_h(\mathrm{div}\,v)\|_0^2 + S_h(v,v) \ge (\max(1,\alpha,s))^{-1}\,L_h^{\alpha,s}(v,v). \]
On the other hand, a suitably large $\alpha$ does yield smaller errors, although the value of $\alpha$ does not affect the convergence rate; see the numerical experiments in section 6.
Remark 4.3. Regarding $L_h^*$ in (3.31), we can obtain the same coercivity as in (4.25) by a similar argument, but replacing Step 2 by the following: since $v \in H_0(\mathrm{curl};\Omega)$, we have from (4.29)
\[ \mathrm{curl}\,\psi \in H_0(\mathrm{curl};\Omega) \cap H(\mathrm{div}^0;\Omega), \tag{4.52} \]
and applying Proposition 4.2 with $\mathrm{curl}\,\psi \in H_0(\mathrm{curl};\Omega) \cap H(\mathrm{div}^0;\Omega)$ we obtain
\[ \|\mathrm{curl}\,v\|_0^2 = \|\mathrm{curl}\,\mathrm{curl}\,\psi\|_0^2 \ge C\,\|\mathrm{curl}\,\psi\|_0^2, \tag{4.53} \]
and both (4.53) and (4.42) yield an estimate similar to (4.27), i.e.,
\[ \|\mathrm{curl}\,v\|_0^2 + s\,\|\breve R_h(\mathrm{div}\,v)\|_0^2 \ge C_7\,\|v\|_0^2 - C_8\,S_{h,\mathrm{div}}(v,v), \tag{4.54} \]
from which we have the following coercivity for $L_h^*$ with the stabilization parameter $\alpha > C_8$:
\[ L_h^*(v,v) \ge C \Big(\|v\|_{0;\mathrm{curl}}^2 + \sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{div}\,v\|_{0,K}^2\Big) \quad \forall v \in U_h^*. \tag{4.55} \]
The above argument also applies to $L_h^{**}$ in (3.35) in the same way, only noting that
\[ S_{h/2,\mathrm{div}}(v,v) \ge C \sum_{T\in\mathcal{T}_{h/2}} h_T^2\,\|\mathrm{div}\,v\|_{0,T}^2 \ge C \sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{div}\,v\|_{0,K}^2 \tag{4.56} \]
holds for all $v \in U_h^{**}$, where (4.56) can be shown by an argument similar to that used for Lemma 4.3.
Before closing this section, we give the condition number of the resulting linear system.
Theorem 4.2. Assume that the meshes are uniform as usual. Then, the condition number of the resulting linear system of problem (3.20) is of $O(h^{-2})$.
Proof. Since both $R_h$ and $\breve R_h$ are local $L^2$ projectors, we have from the inverse estimates [21] that for all $v \in U_h$
\[ \|\breve R_h(\mathrm{div}\,v)\|_0 + \|R_h(\mathrm{curl}\,v)\|_0 \le \|\mathrm{div}\,v\|_0 + \|\mathrm{curl}\,v\|_0 \le C\,h^{-1}\,\|v\|_0. \tag{4.57} \]
On the other hand, from Lemma 4.2 we have for all $v \in U_h$
\[ S_h(v,v) = S_{h,\mathrm{div}}(v,v) + S_{h,\mathrm{curl}}(v,v) \le C\,\|v\|_0^2. \]
Hence, we have
\[ L_h(v,v) = \|R_h(\mathrm{curl}\,v)\|_0^2 + s\,\|\breve R_h(\mathrm{div}\,v)\|_0^2 + \alpha\,S_h(v,v) \le C\,h^{-2}\,\|v\|_0^2 \quad \forall v \in U_h, \tag{4.58} \]
which, together with the $L^2$ coercivity property in Theorem 4.1 and the symmetry property of $L_h$, leads to the result.
5. Error estimates. In this section, we establish in an energy norm the error bound between the exact solution and the finite element solution. This consists mainly of estimating the inconsistency errors caused by the $L^2$ projector $R_h$ and of constructing an appropriate interpolant of the exact solution to eliminate the effects of the first-order derivatives from both div and curl operators on a solution that is not in $H^1$, i.e., "eliminating" the div and curl operators in the sense of (5.14) later on. The former depends on a profound result on the regular-singular decomposition of the curl of the solution, and the latter resorts to the two $L^2$ projectors. We first give estimates of the inconsistency errors from the curl operator.
Lemma 5.1. Let $u$ and $u_h$ be the exact solution to problem (2.1)–(2.2) and the finite element solution to problem (3.20), respectively. We have for all $v_h \in U_h$
\[ L_h(u-u_h,v_h) = (\mathrm{curl}\,u,R_h(\mathrm{curl}\,v_h)) - (\mathrm{curl}\,u,\mathrm{curl}\,v_h) + \alpha\,S_{h,\mathrm{curl}}(u,v_h). \tag{5.1} \]
Proof. From (3.12), (3.13), and the second equation in (2.1) we clearly have on $U_h$
\[ S_{h,\mathrm{div}}(u,v_h) = Z_{h,\mathrm{div}}(g;v_h). \tag{5.2} \]
On the other hand, we have from (3.15), (3.16), (2.1), (2.2), and (3.23) on $U_h$
\[ (\breve R_h(\mathrm{div}\,u),\breve R_h(\mathrm{div}\,v_h)) = (\mathrm{div}\,u,\breve R_h(\mathrm{div}\,v_h)) = (g,\breve R_h(\mathrm{div}\,v_h)), \tag{5.3} \]
\[ (R_h(\mathrm{curl}\,u),R_h(\mathrm{curl}\,v_h)) = (\mathrm{curl}\,u,R_h(\mathrm{curl}\,v_h)) = (\mathrm{curl}\,u,R_h(\mathrm{curl}\,v_h)) - (\mathrm{curl}\,u,\mathrm{curl}\,v_h) + (f,v_h), \tag{5.4} \]
and we obtain (5.1).
Remark 5.1. Regarding (3.33) or (3.37), as pointed out in Remark 3.4, there are no inconsistency errors; see (3.38).
Lemma 5.2. Let $u$ be the solution of problem (2.1)–(2.2). We have for all $v_h \in U_h$
\[ |S_{h,\mathrm{curl}}(u,v_h)| \le C\,h\,\|\mathrm{curl}\,u\|_0 \Big(\sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{curl}\,v_h\|_{0,K}^2\Big)^{1/2}. \tag{5.5} \]
Proof. Equation (5.5) is derived from the same argument as in proving Lemma 4.2.
Proposition 5.1. For any $v \in H_0(\mathrm{curl};\Omega) \cap H(\mathrm{div};\Omega)$ or for any $v \in H(\mathrm{curl};\Omega) \cap H_0(\mathrm{div};\Omega)$ we have $v \in (H^r(\Omega))^3$ for some real number $r > 1/2$, satisfying
\[ \|v\|_r \le C\,(\|\mathrm{div}\,v\|_0 + \|\mathrm{curl}\,v\|_0). \]
Lemma 5.3. Let $u \in U$ be the solution of problem (2.1)–(2.2). Then, we have $u, \mathrm{curl}\,u \in (H^r(\Omega))^3$ for some real number $r > 1/2$, satisfying
\[ \|u\|_r \le C\,(\|f\|_0 + \|g\|_0), \qquad \|\mathrm{curl}\,u\|_r \le C\,\|f\|_0. \]
Proof. Since $u \in U = H(\mathrm{div};\Omega) \cap H_0(\mathrm{curl};\Omega)$ is the solution of problem (2.1)–(2.2), we have for all $v \in U$
\[ (\mathrm{curl}\,u,\mathrm{curl}\,v) + (\mathrm{div}\,u,\mathrm{div}\,v) = (f,v) + (g,\mathrm{div}\,v), \]
which, together with Proposition 5.1, leads to the stated result for $u$. Moreover, since $z = \mathrm{curl}\,u$ satisfies
\[ \mathrm{curl}\,z = f, \quad \mathrm{div}\,z = 0 \quad \text{in } \Omega, \qquad z\cdot n|_\Gamma = 0, \]
we have from Proposition 5.1 again
\[ \|\mathrm{curl}\,u\|_r = \|z\|_r \le C\,\|\mathrm{curl}\,z\|_0 = C\,\|f\|_0. \]
Proposition 5.2 ([51, 29, 27, 26, 31]). Additionally, assume that $f \in H(\mathrm{curl};\Omega) \cap (H^r(\Omega))^3$ for some real number $r > 1/2$. Let $z$ be given as in (2.3), satisfying (2.4)–(2.5). Then, $z$ can be written as the following regular-singular decomposition
\[ z = z_H + \nabla\varphi \quad \text{in } \Omega, \]
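where $z_H \in H(\mathrm{curl};\Omega) \cap (H^{1+r}(\Omega))^3$ and $\varphi \in H^1(\Omega) \cap H^{1+r}(\Omega)$ satisfy
\[ \|z_H\|_{1+r} + \|\varphi\|_{1+r} \le C\,(\|f\|_r + \|\mathrm{curl}\,f\|_0). \]
Lemma 5.4. Let $u$ be the solution to problem (2.1)–(2.2), with the additional assumption that $f \in H(\mathrm{curl};\Omega) \cap (H^r(\Omega))^3$ for some real number $r > 1/2$. We have for all $v_h \in U_h$
\[ (\mathrm{curl}\,u,R_h(\mathrm{curl}\,v_h)) - (\mathrm{curl}\,u,\mathrm{curl}\,v_h) \le C\,h^r\,(\|f\|_r + \|\mathrm{curl}\,f\|_0) \Big(\|R_h(\mathrm{curl}\,v_h)\|_0 + \Big(\sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{curl}\,v_h\|_{0,K}^2\Big)^{1/2}\Big). \tag{5.6} \]
Proof. According to the regular-singular decomposition $z = \mathrm{curl}\,u = z_H + \nabla\varphi$ in Proposition 5.2, we define $\widetilde{\mathrm{curl}\,u} \in (P_h)^3$ as the interpolation to $\mathrm{curl}\,u$ by
\[ \widetilde{\mathrm{curl}\,u} := \tilde z_H + \nabla\tilde\varphi, \tag{5.7} \]
where $\tilde z_H \in (P_h)^3$ is the local $L^2$ projection of $z_H$, and $\tilde\varphi \in P_h \cap H^1(\Omega)$ is the usual interpolant of $\varphi$. We have
\[ \Big(\sum_{K\in\mathcal{C}_h} h_K^{-2}\,\|z_H-\tilde z_H\|_{0,K}^2\Big)^{1/2} \le C\,h^r\,\|z_H\|_{1+r}, \qquad \|\varphi-\tilde\varphi\|_1 \le C\,h^r\,\|\varphi\|_{1+r}, \tag{5.8} \]
\[ \|z_H-\tilde z_H\|_0 \le C\,h^r\,\|z_H\|_r. \tag{5.9} \]
We, thus, have
\[ \|\mathrm{curl}\,u - \widetilde{\mathrm{curl}\,u}\|_0 \le \|z_H-\tilde z_H\|_0 + \|\nabla(\varphi-\tilde\varphi)\|_0 \le C\,h^r\,(\|z_H\|_r + \|\varphi\|_{1+r}), \tag{5.10} \]
\[ (\nabla(\tilde\varphi-\varphi),\mathrm{curl}\,v_h) = 0 \quad \forall v_h \in U_h. \tag{5.11} \]
Since we have from (3.16)
\[ (\widetilde{\mathrm{curl}\,u},R_h(\mathrm{curl}\,v_h)) = (\widetilde{\mathrm{curl}\,u},\mathrm{curl}\,v_h), \tag{5.12} \]
we then have from (5.10)–(5.12)
\[ (\mathrm{curl}\,u,R_h(\mathrm{curl}\,v_h)) - (\mathrm{curl}\,u,\mathrm{curl}\,v_h) = (\mathrm{curl}\,u - \widetilde{\mathrm{curl}\,u},R_h(\mathrm{curl}\,v_h)) + (\widetilde{\mathrm{curl}\,u} - \mathrm{curl}\,u,\mathrm{curl}\,v_h) = (\mathrm{curl}\,u - \widetilde{\mathrm{curl}\,u},R_h(\mathrm{curl}\,v_h)) + (\tilde z_H - z_H,\mathrm{curl}\,v_h) \le C\,h^r\,(\|z_H\|_r + \|\varphi\|_{1+r})\,\|R_h(\mathrm{curl}\,v_h)\|_0 + C\,h^r\,\|z_H\|_{1+r} \Big(\sum_{K\in\mathcal{C}_h} h_K^2\,\|\mathrm{curl}\,v_h\|_{0,K}^2\Big)^{1/2}, \tag{5.13} \]
which, together with Proposition 5.2, leads to (5.6).
In what follows, we construct an interpolant $\tilde u \in U_h$ of the solution $u$.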
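Lemma 5.5. Let $u \in U = H(\operatorname{div};\Omega) \cap H_0(\operatorname{curl};\Omega)$ be the solution to problem (2.1)–(2.2). Then, there exists a $\tilde u \in U_h$ defined as in (3.10) such that
\[ \|\breve R_h(\operatorname{div}(u - \tilde u))\|_0^2 = \|R_h(\operatorname{curl}(u - \tilde u))\|_0^2 = 0, \tag{5.14} \]
\[ \|u - \tilde u\|_0 \le C\,h^r\,\|u\|_r. \tag{5.15} \]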
Proof. From Lemma 5.3 we know that $u \in (H^r(\Omega))^3$ for some real number $r > 1/2$. We first let $u^0 \in (P_h \cap H^1(\Omega))^3 \cap H_0(\operatorname{curl};\Omega)$ be such that [10, 11, 22, 52, 53]
\[ \|u - u^0\|_0 + \Big( \sum_{K \in \mathcal{C}_h} \sum_{F \subset \partial K} h_F\, \|u - u^0\|_{0,F}^2 \Big)^{1/2} \le C\,h^r\,\|u\|_r, \qquad r > \tfrac12. \tag{5.16} \]
We then define $\tilde u \in U_h$ by the following (5.17)–(5.19):
\[ \tilde u(a) = u^0(a) \quad \text{for all vertices } a, \tag{5.17} \]
\[ \int_{F_i} (\tilde u - u) \cdot q_{F_i,l} = 0 \quad \forall q_{F_i,l} \in P_{F_i},\ \forall F_i \in \partial K,\ \forall K \in \mathcal{C}_h, \tag{5.18} \]
where $P_{F_i}$ is given by (3.7) and $\partial K = \{F_i,\ 1 \le i \le 4\}$,
\[ \int_K (\tilde u - u) = 0. \tag{5.19} \]
According to (3.10), on $K$ with boundary $\partial K = \{F_i,\ 1 \le i \le 4\}$, we write $\tilde u \in U_h$ in the following form:
\[ \tilde u = u^0 + \sum_{i=1}^{4} \sum_{l=1}^{9} c_{i,l}\, q_{F_i,l}\, b_{F_i} + c_K\, b_K =: \hat u + c_K\, b_K, \tag{5.20} \]
where $c_{i,l} \in \mathbb{R}$ and $c_K \in \mathbb{R}^3$ are all coefficients to be determined. Since the face bubble and the element bubble take zero at all vertices, (5.17) determines the linear part of $\tilde u$; (5.18) determines the face bubble part, because the element bubble takes zero along all faces; and (5.19) determines the element bubble part. From (5.18) the coefficients $c_{i,l}$, $1 \le l \le 9$, are determined uniquely by
\[ \sum_{l=1}^{9} c_{i,l} \int_{F_i} q_{F_i,l} \cdot q_{F_i,k}\, b_{F_i} = \int_{F_i} (u - u^0) \cdot q_{F_i,k}, \qquad 1 \le k \le 9, \tag{5.21} \]
and from (5.19) the coefficient $c_K$ is given by
\[ c_K = \frac{\int_K (u - \hat u)}{\int_K b_K}. \tag{5.22} \]
Using the scaling argument, we can easily obtain
\[ \|u - \hat u\|_{0,K} \le C\,\|u - u^0\|_{0,K} + C \sum_{F \subset \partial K} h_F^{1/2}\, \|u - u^0\|_{0,F}, \tag{5.23} \]
and
\[ \|u - \tilde u\|_{0,K} \le C\,\|u - \hat u\|_{0,K}. \tag{5.24} \]
From (5.24), (5.23), and (5.16) it follows that (5.15) holds.
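Equation (5.14) holds from the construction of $\tilde u$: we have from (3.15) and (5.18) that
\[ \|\breve R_h(\operatorname{div}(u - \tilde u))\|_0^2 = \sum_{K \in \mathcal{C}_h} \big( \operatorname{div}(u - \tilde u), \breve R_h(\operatorname{div}(u - \tilde u)) \big)_{0,K} = \sum_{K \in \mathcal{C}_h} \sum_{F \subset \partial K} \int_F (u - \tilde u) \cdot n\, \breve R_h(\operatorname{div}(u - \tilde u)) = 0, \tag{5.25} \]
since $n\, \breve R_h(\operatorname{div}(u - \tilde u))|_F \in P_F|_F$. Similarly, we have from (3.16), (5.19), and (5.18) that
\[ \begin{aligned} \|R_h(\operatorname{curl}(u - \tilde u))\|_0^2 &= \sum_{K \in \mathcal{C}_h} \big( \operatorname{curl}(u - \tilde u), R_h(\operatorname{curl}(u - \tilde u)) \big)_{0,K} \\ &= \sum_{K \in \mathcal{C}_h} \big( u - \tilde u, \operatorname{curl} R_h(\operatorname{curl}(u - \tilde u)) \big)_{0,K} - \sum_{K \in \mathcal{C}_h} \sum_{F \subset \partial K} \int_F (u - \tilde u) \cdot \big( n \times R_h(\operatorname{curl}(u - \tilde u)) \big) = 0, \end{aligned} \tag{5.26} \]
since $\operatorname{curl} R_h(\operatorname{curl}(u - \tilde u))|_K \in (P_0(K))^3$, and $n \times R_h(\operatorname{curl}(u - \tilde u))|_F \in P_F|_F$.

Lemma 5.6. We have on $H(\operatorname{curl};\Omega) \cap H(\operatorname{div};\Omega)$
\[ L_h(u, v) \le (L_h(u, u))^{1/2}\, (L_h(v, v))^{1/2}. \]
Proof. Both the symmetry and the coercivity properties of $L_h$ lead to the above generalized Cauchy–Schwarz inequality.

Setting
\[ |||v|||_{L_h}^2 := L_h(v, v), \tag{5.27} \]
we introduce an energy norm as follows:
\[ |||v|||_{0;L_h}^2 := \|v\|_0^2 + |||v|||_{L_h}^2 = \|v\|_0^2 + \|R_h(\operatorname{curl} v)\|_0^2 + s\, \|\breve R_h(\operatorname{div} v)\|_0^2 + \alpha\, S_h(v, v). \tag{5.28} \]

Theorem 5.1. Let $u \in U$ be the solution to problem (2.1)–(2.2) with the right-hand sides $f \in H(\operatorname{div}^0;\Omega) \cap H(\operatorname{curl};\Omega) \cap (H^r(\Omega))^3$ for some $r > 1/2$ and $g \in L^2(\Omega)$, and let $u_h \in U_h$ be the solution to the finite element problem (3.20). Then
\[ |||u - u_h|||_{0;L_h} \le C\,h^r\,(\|f\|_{0;\operatorname{curl}} + \|f\|_r + \|g\|_0). \tag{5.29} \]
Proof. Let $\tilde u \in U_h$ be constructed as in Lemma 5.5. We have from Lemmas 5.1, 5.2, 5.4, and 5.6 that
\[ \begin{aligned} |||u_h - \tilde u|||_{L_h}^2 &= L_h(u_h - \tilde u, u_h - \tilde u) \\ &= L_h(u - \tilde u, u_h - \tilde u) + L_h(u_h - u, u_h - \tilde u) \\ &\le |||u - \tilde u|||_{L_h}\, |||u_h - \tilde u|||_{L_h} + C\,h^r\,(\|f\|_r + \|\operatorname{curl} f\|_0)\, |||u_h - \tilde u|||_{L_h} \\ &\le C\,\big( |||u - \tilde u|||_{L_h} + h^r\,(\|f\|_r + \|\operatorname{curl} f\|_0) \big)\, |||u_h - \tilde u|||_{L_h}, \end{aligned} \]
that is,
\[ |||u_h - \tilde u|||_{L_h} \le C\,\big( |||u - \tilde u|||_{L_h} + h^r\,(\|f\|_r + \|\operatorname{curl} f\|_0) \big), \tag{5.30} \]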
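where, from Lemma 5.5 and Lemma 4.2,
\[ \begin{aligned} |||u - \tilde u|||_{L_h}^2 &= L_h(u - \tilde u, u - \tilde u) \\ &= \|R_h(\operatorname{curl}(u - \tilde u))\|_0^2 + s\, \|\breve R_h(\operatorname{div}(u - \tilde u))\|_0^2 + \alpha\, S_{h,\operatorname{div}}(u - \tilde u, u - \tilde u) + \alpha\, S_{h,\operatorname{curl}}(u - \tilde u, u - \tilde u) \\ &= \alpha\, S_{h,\operatorname{div}}(u - \tilde u, u - \tilde u) + \alpha\, S_{h,\operatorname{curl}}(u - \tilde u, u - \tilde u) \\ &\le C\,\|u - \tilde u\|_0^2, \end{aligned} \]
that is,
\[ |||u - \tilde u|||_{L_h} \le C\,\|u - \tilde u\|_0. \tag{5.31} \]
Therefore, we have from the $L^2$ coercivity in Theorem 4.1, (5.30), (5.31), and (5.15)
\[ \begin{aligned} \|u - u_h\|_0 &\le \|u - \tilde u\|_0 + \|u_h - \tilde u\|_0 \le \|u - \tilde u\|_0 + C\,|||u_h - \tilde u|||_{L_h} \\ &\le \|u - \tilde u\|_0 + C\,\big( |||u - \tilde u|||_{L_h} + h^r\,(\|f\|_r + \|\operatorname{curl} f\|_0) \big) \\ &\le C\,\|u - \tilde u\|_0 + C\,h^r\,(\|f\|_r + \|\operatorname{curl} f\|_0) \\ &\le C\,h^r\,(\|u\|_r + \|f\|_r + \|\operatorname{curl} f\|_0), \end{aligned} \tag{5.32} \]
\[ \begin{aligned} |||u - u_h|||_{L_h} &\le |||u - \tilde u|||_{L_h} + |||\tilde u - u_h|||_{L_h} \\ &\le C\,\|u - \tilde u\|_0 + C\,h^r\,(\|f\|_r + \|\operatorname{curl} f\|_0) \\ &\le C\,h^r\,(\|u\|_r + \|f\|_r + \|\operatorname{curl} f\|_0), \end{aligned} \tag{5.33} \]
but from Lemma 5.3
\[ \|u\|_r \le C\,(\|f\|_0 + \|g\|_0), \tag{5.34} \]
and we therefore add (5.32) and (5.33) to obtain (5.29).

Remark 5.2. For the finite element method (3.33), since we have no inconsistent errors, let $\tilde u^* \in U_h^*$ be the interpolant to the solution $u$ of problem (2.1)–(2.2); we have
\[ |||u_h^* - \tilde u^*|||_{L_h^*} \le C\,|||u - \tilde u^*|||_{L_h^*}, \tag{5.35} \]
following a similar argument as in proving Theorem 5.1, where
\[ |||u - \tilde u^*|||_{L_h^*}^2 = \|\operatorname{curl}(u - \tilde u^*)\|_0^2 + s\, \|\breve R_h(\operatorname{div}(u - \tilde u^*))\|_0^2 + \alpha\, S_{h,\operatorname{div}}(u - \tilde u^*, u - \tilde u^*). \tag{5.36} \]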
We construct the interpolant $\tilde u^* \in U_h^*$ to the solution $u$ in a slightly different way from Lemma 5.5, but in a way similar to (5.7). So, we recall the regular-singular decomposition for the solution $u$ itself.

Proposition 5.3 ([51, 29, 27, 26, 31]). Let $u \in U$ be the solution to problem (2.1)–(2.2), with the right-hand sides $f \in H(\operatorname{div}^0;\Omega)$ and $g \in L^2(\Omega)$. Then, $u$ can be written as the sum of a regular part and a singular part:
\[ u = u_H + \nabla\psi, \tag{5.37} \]
where
\[ u_H \in (H^{1+r}(\Omega))^3 \cap H_0(\operatorname{curl};\Omega), \qquad \psi \in H_0^1(\Omega) \cap H^{1+r}(\Omega) \tag{5.38} \]
for some $r > 1/2$, and
\[ \|u_H\|_{1+r} + \|\psi\|_{1+r} \le C\,(\|f\|_0 + \|g\|_0). \tag{5.39} \]
We define the interpolant $\tilde u^* \in U_h^*$ to the solution $u$ as follows:
\[ \tilde u^* := \tilde u_H + \nabla\tilde\psi, \tag{5.40} \]
where $\tilde u_H \in U_h^*$ is the interpolant to $u_H \in (H^{1+r}(\Omega))^3$ with $r > 1/2$ and is constructed in a similar way as in Lemma 5.5 such that
\[ \|\breve R_h(\operatorname{div}(u_H - \tilde u_H))\|_0 = 0, \tag{5.41} \]
\[ \|u_H - \tilde u_H\|_0 + h\,|u_H - \tilde u_H|_1 \le C\,h^{1+r}\,\|u_H\|_{1+r}, \tag{5.42} \]
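while $\tilde\psi$ is the interpolant to $\psi \in H_0^1(\Omega) \cap H^{1+r}(\Omega)$ with $r > 1/2$ and is constructed in the Argyris $C^1$ triangle element [21] such that
\[ \int_F \partial_n \tilde\psi = \int_F \partial_n \psi \quad \text{for all } F \in \partial K, \text{ for all } K \in \mathcal{C}_h, \tag{5.43} \]
\[ \|\psi - \tilde\psi\|_1 \le C\,h^r\,\|\psi\|_{1+r}. \tag{5.44} \]
From (5.43) we have
\[ \int_K \operatorname{div} \nabla(\psi - \tilde\psi) = 0, \tag{5.45} \]
that is to say, we have
\[ \|\breve R_h(\operatorname{div} \nabla(\psi - \tilde\psi))\|_0 = 0. \tag{5.46} \]
The combination of (5.46) and (5.41) results in
\[ \|\breve R_h(\operatorname{div}(u - \tilde u^*))\|_0 = 0. \tag{5.47} \]
We, therefore, have from the triangle inequality, (5.35), (5.36), (5.42), (5.44), (5.47), and Lemma 4.2 that
\[ \begin{aligned} |||u - u_h^*|||_{L_h^*} &\le |||u - \tilde u^*|||_{L_h^*} + |||u_h^* - \tilde u^*|||_{L_h^*} \\ &\le C\,|||u - \tilde u^*|||_{L_h^*} \\ &\le C\,(\|\operatorname{curl}(u_H - \tilde u_H)\|_0 + \|u - \tilde u^*\|_0) \\ &\le C\,h^r\,(\|u_H\|_{1+r} + \|\psi\|_{1+r}), \end{aligned} \tag{5.48} \]
and from Remark 4.3, (5.35), (5.42), (5.44), and (5.48) that
\[ \begin{aligned} \|u - u_h^*\|_0 &\le \|u - \tilde u^*\|_0 + \|\tilde u^* - u_h^*\|_0 \\ &\le C\,(\|u - \tilde u^*\|_0 + |||\tilde u^* - u_h^*|||_{L_h^*}) \\ &\le C\,(\|u - \tilde u^*\|_0 + |||u - \tilde u^*|||_{L_h^*}) \\ &\le C\,h^r\,(\|u_H\|_{1+r} + \|\psi\|_{1+r}). \end{aligned} \tag{5.49} \]
Finally, from (5.48), (5.49), and (5.39) we have the following error estimate in the energy norm
\[ |||u - u_h^*|||_{0;L_h^*} \le C\,h^r\,(\|f\|_0 + \|g\|_0). \tag{5.50} \]
The above argument applies equally to the finite element method (3.37).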
Remark 5.3. We see that (5.50) involves only the $L^2$ norm $\|f\|_0$ of the right-hand side $f$. So, when the approximate space contains the gradient of some $C^1$ element, the right-hand side $f$ can be less regular. In general, $f$ is required to be a little more regular (see (5.29)), since the regular-singular decomposition of the curl of the solution is used (see Proposition 5.2) in estimating the inconsistent error caused by the $L^2$ projected curl term.

6. Numerical experiments. In this section we report some numerical results which confirm the theoretical error bound, by considering a 3D source problem and a 2D eigenproblem.

A 3D source problem. Take the thick L-domain $\Omega = ([-1, 1]^2 \setminus ([0, 1] \times [-1, 0])) \times [0, 1] \subset \mathbb{R}^3$, and consider the Maxwell source problem: Find $u$ such that
\[ \operatorname{curl} \operatorname{curl} u = f, \qquad \operatorname{div} u = g \quad \text{in } \Omega, \qquad u \times n = 0 \quad \text{on } \Gamma = \partial\Omega, \]
where $n$ is the unit outer normal vector to $\Gamma$. We take the exact solution
\[ u = \eta(x, y, z)\, \nabla\Big( \rho^{2/3} \sin\frac{2\theta}{3} \Big) = (u_1, u_2, u_3 = 0), \]
where $x = \rho\cos(\theta)$, $y = \rho\sin(\theta)$, and $z = z$, with $\rho$ being the distance to the reentrant edge along the $z$-axis starting from the origin $(0, 0, 0)$ of opening angle $3\pi/2$, and $\eta(x, y, z) = (1 - x^2)(1 - y^2)\,z\,(1 - z)$ is a cut-off function so that $u \times n = 0$ on $\Gamma$. The right-hand sides $f$ and $g$ are obtained by evaluating the equations on the given exact solution. We partition $\Omega$ into tetrahedra with uniform meshes. We employ the conjugate gradient method to solve the resulting symmetric and positive definite linear system, with the stopping tolerance $10^{-10}$ and with the null vector as an initial guess.

In this numerical test we have two specific goals: (i) To verify the theoretical convergence rate, by computing the relative errors in the $L^2$ norm using the exact solution $u = (u_1, u_2, u_3)$ and the finite element solution $u_h = (u_{1,h}, u_{2,h}, u_{3,h})$; (ii) To examine the effect of the stabilization parameter $\alpha$, by considering the following values: $\alpha = 0.1,\ 1,\ 1000,\ 10000$. In addition, we set the penalty/regularization parameter $s = 1$.

Since the regularity for $u$ and its $\operatorname{curl} u$ is $H^{2/3 - \epsilon}$ for any $\epsilon \in (0, 1)$ ($f$ is also in $H^{2/3 - \epsilon}$), from the theoretical convergence rate stated in Theorem 5.1 we expect that a mesh reduction of a factor of two (i.e., the mesh size decreases like $h = \frac14, \frac18, \frac1{16}, \cdots$) should result in an error reduction of $2^{2/3} \approx 1.587$. This is clearly confirmed by the computed results listed in Tables 1–4. On the other hand, we observe that the stabilization parameter $\alpha$ does not affect the error reduction ratio (i.e., the ratios in Tables 1–4 are almost the same), although it affects the sizes of errors in the way that larger values of $\alpha$ yield smaller values of errors. This may be due to the fact that a suitably larger $\alpha$ would enhance the stability (cf. (4.27)–(4.28)) and, thus, make the constant in front of the error bound (5.29) smaller. We also observe that the values of errors are the same for both $u_1$ and $u_2$. This is because $u_1$ and $u_2$ are symmetric with respect to the $O$-$xyz$ coordinate system.

1 In Tables 1–4 the 3rd row is the $L^2$-norm values of $u_{3,h}$ for different mesh sizes, since $u_3 = 0$.
Table 1
Relative errors in L2 norm with α = 0.1.

                                                                   h = 1/4     h = 1/8     h = 1/16
$\|u_1 - u_{1,h}\|_0/\|u_1\|_0 = \|u_2 - u_{2,h}\|_0/\|u_2\|_0$    5092.13     3177.736    1975.3958
$\|u_3 - u_{3,h}\|_0 = \|u_{3,h}\|_0$                              471.240     303.274     190.983
Table 2
Relative errors in L2 norm with α = 1.0.

                                                                   h = 1/4     h = 1/8     h = 1/16
$\|u_1 - u_{1,h}\|_0/\|u_1\|_0 = \|u_2 - u_{2,h}\|_0/\|u_2\|_0$    509.238     317.792     197.552
$\|u_3 - u_{3,h}\|_0 = \|u_{3,h}\|_0$                              47.1187     30.3241     19.0963
Table 3
Relative errors in L2 norm with α = 1000.0.

                                                                   h = 1/4     h = 1/8     h = 1/16
$\|u_1 - u_{1,h}\|_0/\|u_1\|_0 = \|u_2 - u_{2,h}\|_0/\|u_2\|_0$    0.622315    0.400576    0.254468
$\|u_3 - u_{3,h}\|_0 = \|u_{3,h}\|_0$                              0.050283    0.033089    0.021106
Table 4
Relative errors in L2 norm with α = 10000.0.

                                                                   h = 1/4     h = 1/8     h = 1/16
$\|u_1 - u_{1,h}\|_0/\|u_1\|_0 = \|u_2 - u_{2,h}\|_0/\|u_2\|_0$    0.292675    0.202238    0.153258
$\|u_3 - u_{3,h}\|_0 = \|u_{3,h}\|_0$                              0.016049    0.0139586   0.010769
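The error reduction ratios referred to in the discussion of the source problem can be recomputed from the tabulated errors. The following is a minimal sketch and not part of the original experiments: the names `ERRORS`, `reduction_ratios`, and `observed_orders` are illustrative, and the inputs are the first-row relative errors copied from Tables 1–4.

```python
import math

# First-row relative errors from Tables 1-4 for h = 1/4, 1/8, 1/16,
# keyed by the stabilization parameter alpha (values copied from the tables).
ERRORS = {
    0.1: [5092.13, 3177.736, 1975.3958],
    1.0: [509.238, 317.792, 197.552],
    1000.0: [0.622315, 0.400576, 0.254468],
    10000.0: [0.292675, 0.202238, 0.153258],
}


def reduction_ratios(errors):
    """Ratio of each error to the error on the next finer mesh."""
    return [coarse / fine for coarse, fine in zip(errors, errors[1:])]


def observed_orders(errors):
    """Observed convergence order, assuming h is halved between meshes."""
    return [math.log2(ratio) for ratio in reduction_ratios(errors)]


if __name__ == "__main__":
    print(f"predicted ratio 2^(2/3) = {2 ** (2 / 3):.4f}")
    for alpha, errs in ERRORS.items():
        ratios = ", ".join(f"{r:.4f}" for r in reduction_ratios(errs))
        orders = ", ".join(f"{o:.3f}" for o in observed_orders(errs))
        print(f"alpha = {alpha:g}: ratios = [{ratios}], orders = [{orders}]")
```

Running the script prints the ratio between consecutive mesh levels for each value of α, which can be compared against the predicted value $2^{2/3}$.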
A 2D eigenproblem. As an illustration of the application of the $L^2$ projection method to the Maxwell eigenproblem, we perform a numerical test for a 2D eigenproblem in the L-domain $\Omega = [-1, 1]^2 \setminus ([0, 1] \times [-1, 0]) \subset \mathbb{R}^2$: Find eigenvalues $\omega^2$ and eigenfunctions $u$ such that
\[ \operatorname{curl} \operatorname{curl} u = \omega^2 u, \qquad \operatorname{div} u = 0 \quad \text{in } \Omega, \qquad u \cdot \tau = 0 \quad \text{on } \Gamma = \partial\Omega, \]
where $\tau$ is the unit tangential vector along $\Gamma$. We partition $\Omega$ into triangles with uniform meshes. As mentioned in Remark 3.3, the approximate space is of $P_3$ element. We can set the penalty/regularization parameter $s$ as any positive constant, say $s = 2$. Following the computational results in Table 4 for the source problem, we take the stabilization parameter $\alpha$ as $\alpha = 10000$. We consider the benchmark example for the L-domain from the website at http://www.maths.univ-rennes1.fr/ dauge/benchmax.html, and take the first two computed eigenvalues therein as true solutions, i.e.,
\[ \omega_1^2 = 1.47562182408, \qquad \omega_2^2 = 3.53403136678. \]

Table 5
Relative errors and error reduction ratios of the first eigenvalue.

                                                   h = 1/4      h = 1/8      h = 1/16     h = 1/32     h = 1/64     h = 1/128
$|\omega_1^2 - \omega_{1,h}^2| / |\omega_1^2|$     0.79882e0    0.48321e0    0.23809e0    0.10345e0    0.42512e-1   0.17092e-1
Ratio                                              -            1.65315      2.02953      2.30150      2.43343      2.48725

Table 6
Relative errors and error reduction ratios of the second eigenvalue.

                                                   h = 1/4      h = 1/8      h = 1/16     h = 1/32     h = 1/64     h = 1/128
$|\omega_2^2 - \omega_{2,h}^2| / |\omega_2^2|$     0.39675e-1   0.94427e-2   0.21858e-2   0.51238e-3   0.12298e-3   0.30034e-4
Ratio                                              -            4.20166      4.32002      4.26597      4.16637      4.09469

Note that the first eigenfunction has a strong singularity and is in $H^{2/3 - \epsilon}$, and the second eigenfunction is smooth and belongs to $H^{4/3 - \epsilon}$ for all $\epsilon > 0$ (see [28]). We would like to verify the error estimates in the case of the eigenproblem: with the application of the result of [9], we can conclude from Theorem 5.1 that the following theoretical convergence rate
\[ |\omega_1^2 - \omega_{1,h}^2| \le C\,h^{2r} \quad \text{with } r = \tfrac23 - \epsilon \]
holds for the first eigenvalue corresponding to an eigenfunction in $H^r$. Thus, the error reduction ratio of the first eigenvalue should be about $2^{4/3} \approx 2.519$, with a mesh reduction of factor two. Regarding the second eigenvalue corresponding to a smooth eigenfunction in $H^{4/3 - \epsilon}$, for the approximation of the $P_3$ element, an error reduction ratio would be about $2^{8/3} \approx 6.349$ with a mesh reduction of factor two. But, due to the inconsistent errors caused by both the $L^2$ projected curl term and the mesh-dependent term $S_{h,\operatorname{curl}}$, the error reduction ratio is 4 only; i.e., the theoretical convergence rate from Theorem 5.1 for the second eigenvalue is
\[ |\omega_2^2 - \omega_{2,h}^2| \le C\,h^2. \]
From the computed error reduction ratios of eigenvalues listed in Tables 5 and 6 we see that the computational ratios are very close to the ones predicted above.

7. Conclusions. We have proposed the element-local $L^2$ projected $C^0$ finite element method for solving the Maxwell problem with the nonsmooth solution being not in $H^1$. The key feature is that some element-local $L^2$ projectors are applied to both the curl and div operators in the well-known plain regularization variational formulation. The Maxwell problem under consideration is posed in a simply connected polyhedron with a connected Lipschitz continuous boundary and has a solution that may be in $H^r$ with $r < 1$. We have established the coercivity and the condition number $O(h^{-2})$ of the resulting linear system. We have also obtained the desired error bounds $O(h^r)$ in an energy norm for the $C^0$ linear element (enriched by certain higher degree face- and element-bubble functions), when the solution and its curl are in $H^r$ ($1/2 < r < 1$) with a smooth right-hand side.
Performed for a 3D source problem and a 2D eigenproblem, both of which are posed on nonsmooth domains with reentrant corners and/or edges and have nonsmooth solutions being not in H 1 , the numerical experiments have produced good and correct C 0 approximations of nonsmooth solutions and confirmed the theoretical convergence rate obtained. For this L2 projection method, we do not require that the C 0 approximate space contain the gradient of some C 1 element and we do not impose the information of the geometric singularities of the domain boundary in the finite element variational
formulation. These make the L2 projection method particularly attractive for Maxwell equations posed on more complex 3D domains. In addition, for the 2D Maxwell problem we proposed two more L2 projection methods (only the divergence part involves the element-local L2 projector), where the C0 approximate space contains the gradient of the Argyris C1 triangle element and the Hsieh–Clough–Tocher C1 macro-triangle element, respectively. Coercivity is established and error estimates for the nonsmooth solution being not in H1 are obtained. These last two methods are consistent and allow less regular right-hand sides. For the 3D Maxwell problem similar methods can be developed in the same way. A generalization of the L2 projection method to Maxwell interface problems with discontinuous inhomogeneous anisotropic materials in a multiply connected nonsmooth domain (with reentrant corners and edges) and with mixed boundary conditions is currently being studied and will be reported elsewhere.

Acknowledgments. The authors would like to thank the anonymous referees for their valuable comments and suggestions on the presentation of this paper.

REFERENCES

[1] P. Alfeld, A trivariate Clough-Tocher scheme for tetrahedral data, Comput. Aided Geom. Design, 1 (1984), pp. 169–181.
[2] A. Alonso and A. Valli, Some remarks on the characterization of the space of tangential traces of H(rot; Ω) and the construction of an extension operator, Manuscripta Math., 89 (1996), pp. 159–178.
[3] A. Alonso Rodríguez, P. Fernandes, and A. Valli, Weak and strong formulations for the time-harmonic eddy-current problem in general multi-connected domains, European J. Appl. Math., 14 (2003), pp. 387–406.
[4] C. Amrouche, C. Bernardi, M. Dauge, and V. Girault, Vector potentials in three-dimensional non-smooth domains, Math. Methods Appl. Sci., 21 (1998), pp. 823–864.
[5] F. Assous, P. Ciarlet, Jr., and E. Sonnendrücker, Resolution of the Maxwell equations in a domain with reentrant corners, M2AN Math. Model. Numer. Anal., 32 (1998), pp. 359–389.
[6] F. Assous, P. Ciarlet, Jr., P.-A. Raviart, and E. Sonnendrücker, Characterization of the singular part of the solution of Maxwell's equations in a polyhedral domain, Math. Methods Appl. Sci., 22 (1999), pp. 485–499.
[7] F. Ben Belgacem and C. Bernardi, Spectral element discretization of the Maxwell equations, Math. Comp., 68 (1999), pp. 1497–1520.
[8] A. Bermúdez, R. Rodríguez, and P. Salgado, A finite element method with Lagrangian multiplier for low-frequency harmonic Maxwell equations, SIAM J. Numer. Anal., 40 (2002), pp. 1823–1849.
[9] I. Babuška and J. E. Osborn, Finite element-Galerkin approximation of the eigenvalues and eigenvectors of selfadjoint problems, Math. Comp., 52 (1989), pp. 275–297.
[10] C. Bernardi, Optimal finite element interpolation on curved domains, SIAM J. Numer. Anal., 26 (1989), pp. 1212–1240.
[11] C. Bernardi and V. Girault, A local regularization operator for triangular and quadrilateral finite elements, SIAM J. Numer. Anal., 35 (1998), pp. 1893–1916.
[12] M. Birman and M. Solomyak, L2-theory of the Maxwell operator in arbitrary domains, Russian Math. Surveys, 42 (1987), pp. 75–96.
[13] A.-S. Bonnet-Ben Dhia, C. Hazard, and S. Lohrengel, A singular field method for the solution of Maxwell's equations in polyhedral domains, SIAM J. Appl. Math., 59 (1999), pp. 2028–2044.
[14] A. Bossavit, Magnetostatic problems in multiply connected regions: Some properties of the curl operator, IEEE Proc., 135 (1988), pp. 179–187.
[15] A. Bossavit, Computational Electromagnetism: Variational Formulations, Complementarity, Edge Elements, Academic Press, New York, 1998.
[16] J. Brandts, S. Korotov, and M. Křížek, On the equivalence of regularity criteria for triangular and tetrahedral partitions, Comput. Math. Appl., 55 (2008), pp. 2227–2233.
[17] S. C. Brenner and L. R. Scott, The Mathematical Theory of Finite Element Methods, Springer-Verlag, Berlin, 1996.
[18] S. Caorsi, P. Fernandes, and M. Raffetto, Spurious-free approximations of electromagnetic eigenproblems by means of Nédélec-type elements, M2AN Math. Model. Numer. Anal., 35 (2001), pp. 331–354.
[19] C. Carstensen, S. Funken, W. Hackbusch, R. H. W. Hoppe, and P. Monk, Computational Electromagnetics, Proceedings of the GAMM Workshop on Computational Electromagnetics, Springer-Verlag, Berlin, 2003.
[20] M. Cessenat, Mathematical Methods in Electromagnetism: Linear Theory and Applications, World Scientific, 1996.
[21] P. G. Ciarlet, Basic Error Estimates for Elliptic Problems, in: Handbook of Numerical Analysis, Vol. II, Finite Element Methods (part 1), P. G. Ciarlet and J.-L. Lions, eds., North-Holland, Amsterdam, 1991.
[22] P. Clément, Approximation by finite element functions using local regularization, RAIRO Numer. Anal., 9 (1975), pp. 77–84.
[23] M. Costabel, A remark on the regularity of solutions of Maxwell's equations on Lipschitz domains, M3AS Math. Methods Appl. Sci., 12 (1990), pp. 365–368.
[24] M. Costabel, A coercive bilinear form for Maxwell's equations, J. Math. Anal. Appl., 157 (1991), pp. 527–541.
[25] M. Costabel and M. Dauge, Weighted regularization of Maxwell equations in polyhedral domains, Numer. Math., 93 (2002), pp. 239–277.
[26] M. Costabel and M. Dauge, Singularities of electromagnetic fields in polyhedral domains, Arch. Rational Mech. Anal., 151 (2000), pp. 221–276.
[27] M. Costabel and M. Dauge, Maxwell and Lamé eigenvalues on polyhedra, Math. Methods Appl. Sci., 22 (1999), pp. 243–258.
[28] M. Costabel and M. Dauge, Computation of resonance frequencies for Maxwell equations in non smooth domains, in Lecture Notes in Comput. Sci. Eng. 31, M. Ainsworth, P. Davies, D. Duncan, P. Martin, and B. Rynne, eds., 2003, pp. 125–162.
[29] M. Costabel, M. Dauge, and S. Nicaise, Singularities of Maxwell interface problem, M2AN Math. Model. Numer. Anal., 33 (1999), pp. 627–649.
[30] M. Crouzeix and P.-A. Raviart, Conforming and nonconforming finite element methods for solving the stationary Stokes equations, RAIRO Numer. Anal., 7 (1973), pp. 33–75.
[31] M. Dauge, Private communication, 2005.
[32] H.-Y. Duan, P. Lin, P. Saikrishnan, and R. C. E. Tan, L2-projected least-squares finite element methods for the Stokes equations, SIAM J. Numer. Anal., 44 (2006), pp. 732–752.
[33] G. Farin, Triangular Bernstein-Bézier patches, Comput. Aided Geom. Design, 3 (1986), pp. 83–127.
[34] P. Fernandes and G. Gilardi, Magnetostatic and Electrostatic problems in inhomogeneous anisotropic media with irregular boundary and mixed boundary conditions, Math. Models Methods Appl. Sci., 7 (1997), pp. 957–991.
[35] P. Fernandes and I. Perugia, Vector potential formulation for magnetostatics and modelling of permanent magnets, IMA J. Appl. Math., 66 (2001), pp. 293–318.
[36] V. Girault, A local projection operator for quadrilateral finite elements, Math. Comp., 64 (1995), pp. 1421–1431.
[37] V. Girault and P. A. Raviart, Finite Element Methods for Navier-Stokes Equations, Theory and Algorithms, Springer-Verlag, Berlin, 1986.
[38] G. H. Golub and C. F. Van Loan, Matrix Computation, 3rd edition, Johns Hopkins University Press, Baltimore, MD, 1996.
[39] C. Hazard and M. Lenoir, On the solution of time-harmonic scattering problems for Maxwell's equations, SIAM J. Math. Anal., 27 (1996), pp. 1597–1630.
[40] C. Hazard and S. Lohrengel, A singular field method for Maxwell's equations: Numerical aspects for 2D magnetostatics, SIAM J. Numer. Anal., 40 (2003), pp. 1021–1040.
[41] R. Hiptmair, Finite elements in computational electromagnetism, Acta Numer., 2002, pp. 237–339.
[42] P. Houston, I. Perugia, A. Schneebeli, and D. Schötzau, Interior penalty method for indefinite time-harmonic Maxwell equations, Numer. Math., 100 (2005), pp. 485–518.
[43] J. M. Jin, The Finite Element Method in Electromagnetics (2nd Edition), John Wiley & Sons, New York, 2002.
[44] F. Kikuchi, Mixed and penalty formulations for finite element analysis of an eigenvalue problem in electromagnetism, Comput. Methods Appl. Mech. Engrg., 64 (1987), pp. 509–521.
[45] M.-J. Lai and A. LeMéhauté, A new kind of trivariate C1 macro-element, Adv. Comput. Math., 21 (2004), pp. 273–292.
[46] E. J. Lee and T. A. Manteuffel, FOSLL∗ method for the eddy current problem with three-dimensional edge singularities, SIAM J. Numer. Anal., 45 (2007), pp. 787–809.
[47] G. Meurant, Computer Solution of Large Linear Systems, Elsevier, Singapore, 1999.
[48] P. Monk, A finite element method for approximating the time-harmonic Maxwell's equations, Numer. Math., 63 (1992), pp. 243–261.
[49] P. Monk, Analysis of a finite element method for Maxwell's equations, SIAM J. Numer. Anal., 29 (1992), pp. 714–729.
[50] P. Monk, Finite Element Methods for Maxwell Equations, Clarendon Press, Oxford, 2003.
[51] S. Nicaise, Edge elements on anisotropic meshes and approximation of the Maxwell equations, SIAM J. Numer. Anal., 39 (2001), pp. 784–816.
[52] L. R. Scott and S. Zhang, Finite element interpolation of nonsmooth functions satisfying boundary conditions, Math. Comput., 54 (1990), pp. 483–493.
[53] O. Steinbach, On the stability of the L2 projection in fractional Sobolev spaces, Numer. Math., 88 (2000), pp. 367–379.
[54] A. J. Worsey and G. Farin, An n-dimensional Clough-Tocher interpolant, Constr. Approx., 3 (1987), pp. 99–110.
[55] A. J. Worsey and B. Piper, A trivariate Powell-Sabin interpolant, Comput. Aided Geom. Design, 5 (1988), pp. 177–186.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1304–1318

ON THE EXISTENCE OF EXPLICIT hp-FINITE ELEMENT METHODS USING GAUSS–LOBATTO INTEGRATION ON THE TRIANGLE∗

B. T. HELENBROOK†

Abstract. Spectral-element simulations on quadrilaterals and hexahedra rely on the Gauss–Lobatto (GL) integration rule to enable explicit simulations with optimal spatial convergence rates. In this work, it is proved that a similar integration rule does not exist on triangles. The following properties of the rule are sought: a $(p+1)(p+2)/2$ point integration rule capable of exactly integrating the space given by $T(2p-1) \equiv \{x^m y^n \mid 0 \le m, n;\ m + n \le 2p - 1\}$, where $p$ is an integer; integration points located at each of the triangle vertices; $p - 1$ integration points located on each side; and $(p-1)(p-2)/2$ integration points located in the interior of the element. The proof hinges on the fact that the existence of such a rule implies the existence of a nodal basis with an approximate diagonal mass matrix that can be inverted to obtain exact Galerkin projections of functions in $T(p-1)$. The proof shows that vertex functions of a basis having this property exist and are unique, but on a triangle these functions are not nodal, and therefore the GL rule does not exist. In spite of this, the existence of the vertex functions indicates that there may be a nonnodal basis that has the above property. This basis would enable explicit hp-finite element simulations on the triangle with optimal spatial accuracy. The methodology developed in the paper gives insight into a possible way to find such a basis.

Key words. triangles, quadrature, integration, Gauss, Lobatto, mass-lumping

AMS subject classifications. 65D32, 74S05

DOI. 10.1137/070685439
1. Introduction. Gauss–Lobatto (GL) integration [1, p. 888] provides the foundation for spectral element simulations [20]. Not only does it provide a numerical integration method, but the integration points also define a nodal basis that allows easy enforcement of continuity constraints at element boundaries and gives an approximately diagonal mass matrix. This last point enables unsteady simulations that do not require inversion of a globally coupled mass matrix and yet still obtain optimal spatial convergence rates [20]. These properties are the main reason that spectral element simulations can efficiently achieve a high order of accuracy.

Although GL integration rules can be defined for segments, quadrilaterals, and hexahedra [16, p. 143], an equivalent integration rule has not been found for triangles. This is not due to a lack of effort in searching. Much effort has been made to find optimal interpolation points on the triangle [18, 2, 3, 27, 15] and also to find a quadrature formula [28, 29, 14, 4, 5, 25, 17]. Cools and coworkers provide an excellent summary of the current status of quadrature rules on triangles as well as other geometries [9, 8, 10, 7, 6, 19, 11]. Because no completely satisfactory integration rule has been found, researchers are still experimenting with different techniques for performing high-order continuous finite element simulations on triangles [23, 24, 12, 30, 21].

In this work, it is proved that there is no GL integration rule for a triangle that has properties similar to those for segments, quadrilaterals, and hexahedra.

∗ Received by the editors March 16, 2007; accepted for publication (in revised form) October 31, 2008; published electronically February 25, 2009. This material is based upon work supported by the National Science Foundation under grant 0513380. http://www.siam.org/journals/sinum/47-2/68543.html
† Mechanical & Aeronautical Engineering Department, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5725.
On quadrilaterals, the tensor-product GL integration rule for the space $Q(p) \equiv \{x^m y^n \mid 0 \le m, n;\ m \le p;\ n \le p\}$ has the following properties:
• a $\dim(Q(p)) = (p + 1)^2$ point integration rule capable of exactly integrating the polynomial space $Q(2p - 1)$;
• integration points located at each of the quadrilateral vertices;
• $p - 1$ integration points located on each quadrilateral side;
• $(p - 1)^2$ integration points located in the interior of the element.

On triangles, the function space typically used is $T(p) \equiv \{x^m y^n \mid 0 \le m, n;\ m + n \le p\}$ [27]. For this space an integration rule is sought with the following properties:
• a $\dim(T(p)) = (p + 1)(p + 2)/2$ point integration rule capable of exactly integrating the polynomial space $T(2p - 1)$;
• integration points located at each of the triangle vertices;
• $p - 1$ integration points located on each side;
• $(p - 1)(p - 2)/2$ integration points located in the interior of the element.

Theoretical results for polynomial integration formulas on a triangle give a lower bound for the number of points required to exactly integrate the space $T(2p - 1)$ of $p(p + 1)/2 + \lfloor p/2 \rfloor$, where the floor symbols $\lfloor\ \rfloor$ denote truncation [8]. The rule sought has more points than the lower bound for all $p$, but with special constraints on the positions. Note that in both the quadrilateral case and the triangle case the problem is overdetermined. On quadrilaterals, there are $4 + 2 \times 4(p - 1) + 3 \times (p - 1)^2 = 3p^2 + 2p - 1$ degrees of freedom for the positions and weights, and there are $4p^2$ accuracy constraints. However, this solution exists. On triangles, there are $3 + 2 \times 3(p - 1) + 3 \times (p - 1)(p - 2)/2 = 3(p^2 + p)/2$ degrees of freedom for the positions and weights and $2p^2 + p$ accuracy constraints. The basic steps of the proof are given in one dimension as a demonstration and then subsequently applied to triangles.
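The degree-of-freedom and constraint counts quoted above are easy to tabulate. The sketch below is illustrative only (the function names `quad_counts`, `tri_counts`, and `tri_lower_bound` are not from the paper); it evaluates the counts exactly as stated in the text, with one weight per point, one position unknown per side point, and two per interior point.

```python
def quad_counts(p):
    """(degrees of freedom, accuracy constraints) for the quadrilateral rule."""
    dof = 4 + 2 * 4 * (p - 1) + 3 * (p - 1) ** 2
    constraints = (2 * p) ** 2  # dim Q(2p - 1)
    return dof, constraints


def tri_counts(p):
    """(degrees of freedom, accuracy constraints) for the sought triangle rule."""
    interior_points = (p - 1) * (p - 2) // 2
    dof = 3 + 2 * 3 * (p - 1) + 3 * interior_points
    constraints = (2 * p) * (2 * p + 1) // 2  # dim T(2p - 1)
    return dof, constraints


def tri_lower_bound(p):
    """Lower bound on the number of points needed to integrate T(2p - 1)."""
    return p * (p + 1) // 2 + p // 2


if __name__ == "__main__":
    for p in range(1, 7):
        n_points = (p + 1) * (p + 2) // 2
        print(p, quad_counts(p), tri_counts(p), n_points, tri_lower_bound(p))
```

For each $p$ the printed tuples show how the number of constraints compares with the number of unknowns in the two geometries, and how the number of points in the sought rule compares with the lower bound.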
A positive result of the proof is that vertex modes are found that allow “diagonal projection.” This is defined to mean that a diagonal mass matrix can be inverted to obtain exact Galerkin projections of functions in T (p − 1). A full basis that allows diagonal projection will enable explicit-unsteady, continuous finite element simulations on the triangle with optimal spatial accuracy. 2. One-dimensional integration. The first part of the proof is to establish some basic features of the GL integration rule on the domain x ∈ [−1, 1]. It is of course well known that the GL integration rule exists on this domain, but nonetheless it is instructive to go through the process in one dimension before applying it to triangles. The GL integration rule in one dimension is defined by
1
(2.1) −1
f (x)dx ≈
n
wi f (xi ),
i=1
where f (x) is the function to be integrated, n is the number of points in the GL rule, wi is the integration weight associated with each integration point, and xi is the location of the integration point. The first and last integration points are constrained to be at the edge of the domain, x1 = −1 and xn = 1. The GL integration rule has the following properties: • an n-point formula integrates polynomials of order 2n − 3; • the locations of the integration points are the roots of the derivative of the (x); (n − 1)st Legendre polynomials, Pn−1
1306
B. T. HELENBROOK
• the weights are given by
$$w_i = \frac{2}{n(n-1)\,[P_{n-1}(x_i)]^2}. \tag{2.2}$$
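The properties listed above are enough to construct the one-dimensional rule numerically. The following is a minimal sketch using NumPy's Legendre utilities; the function name `gauss_lobatto` is ours, and the rule is assumed to have at least two points.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def gauss_lobatto(n):
    """Nodes and weights of the n-point Gauss-Lobatto rule on [-1, 1]."""
    if n < 2:
        raise ValueError("a Gauss-Lobatto rule needs at least the two end points")
    c = np.zeros(n)
    c[-1] = 1.0  # Legendre-series coefficients of P_{n-1}
    # Interior nodes are the roots of P'_{n-1}; the end points are fixed.
    interior = np.sort(np.real(leg.legroots(leg.legder(c))))
    x = np.concatenate(([-1.0], interior, [1.0]))
    # Weights from (2.2): w_i = 2 / (n (n - 1) P_{n-1}(x_i)^2).
    w = 2.0 / (n * (n - 1) * leg.legval(x, c) ** 2)
    return x, w


if __name__ == "__main__":
    n = 5
    x, w = gauss_lobatto(n)
    for k in range(2 * n):
        exact = 2.0 / (k + 1) if k % 2 == 0 else 0.0
        print(k, abs(w @ x**k - exact))
```

The printed errors should be at rounding level for monomials up to order $2n - 3$ and visibly nonzero at order $2n - 2$.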
A $(p+1)$-point GL integration rule can be used to generate an order $p$ nodal polynomial basis. This basis is defined by
$$\phi_i(x) = \prod_{j=1,\, j \ne i}^{p+1} \frac{x - x_j}{x_i - x_j}, \qquad i \in [1, p+1],$$
where $\phi_i$ is the $i$th function of the basis vector. $\phi_i(x)$ is zero at all of the GL integration points except the $i$th point, where it has the value 1, i.e., $\phi_i(x_j) = \delta_{i,j}$, where $\delta_{i,j}$ is the Kronecker delta function. The basis $\boldsymbol{\phi}$ is referred to as the Gauss–Lobatto–Lagrange (GLL) basis. It spans $P(p)$, which is the space of polynomials of degree $p$. The standard method of projecting a function onto this basis is defined as
$$\int_\Omega \boldsymbol{\phi}\boldsymbol{\phi}^T \mathbf{u}\, d\Omega = \int_\Omega \boldsymbol{\phi} f(x)\, d\Omega, \tag{2.3}$$
where $\Omega$ is the domain $[-1, 1]$. This equation determines the coefficient vector, $\mathbf{u}$, such that $\boldsymbol{\phi}^T \mathbf{u}$ approximates $f(x)$. The matrix $\int_\Omega \boldsymbol{\phi}\boldsymbol{\phi}^T d\Omega$ is typically called the mass matrix, $M$. $M$ is diagonal if the basis functions are orthogonal (Legendre polynomials). The above equation gives an exact representation of $f(x)$ if $f(x)$ is contained in the space spanned by $\boldsymbol{\phi}$. The combination of the $(p+1)$-point GL integration rule and the order $p$ nodal basis leads to an approximate orthogonality property. If
$$\int_\Omega \phi_j \phi_k\, d\Omega$$
is approximated as
$$\sum_{i=1}^{p+1} w_i \phi_j(x_i)\phi_k(x_i),$$
this becomes
$$\sum_{i=1}^{p+1} w_i \delta_{j,i}\delta_{k,i} = \delta_{j,k} w_j.$$
This shows that the basis is orthogonal when integrated with the GL integration rule. Because the GL integration is accurate only for polynomials of order $2p - 1$, and the integrand is of order $2p$, this is not equivalent to showing that the basis itself is orthogonal.
Theorem 2.1. The approximate orthogonality property of the GLL basis guarantees the existence of a diagonal projection operation that gives an exact representation of functions in $P(p-1)$. The diagonal projection operation is defined as
$$D\mathbf{u} = \int_\Omega \boldsymbol{\phi} f(x)\, d\Omega, \tag{2.4}$$
where $D$ is a diagonal matrix.
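The statement of Theorem 2.1 can be checked numerically before turning to the proof. The sketch below is illustrative only: it builds the GLL basis in monomial form, evaluates the right-hand side of (2.4) exactly, and divides by the GL weights, taking $D = \mathrm{diag}(w_j)$. All function names are ours.

```python
import numpy as np
from numpy.polynomial import legendre as leg
from numpy.polynomial import polynomial as poly


def gauss_lobatto(n):
    """Nodes and weights of the n-point Gauss-Lobatto rule on [-1, 1] (n >= 2)."""
    c = np.zeros(n)
    c[-1] = 1.0  # Legendre-series coefficients of P_{n-1}
    interior = np.sort(np.real(leg.legroots(leg.legder(c))))
    x = np.concatenate(([-1.0], interior, [1.0]))
    return x, 2.0 / (n * (n - 1) * leg.legval(x, c) ** 2)


def integral(c):
    """Exact integral over [-1, 1] of the polynomial with monomial coefficients c."""
    antiderivative = poly.polyint(c)
    return poly.polyval(1.0, antiderivative) - poly.polyval(-1.0, antiderivative)


def gll_basis(x):
    """Monomial coefficients of the Lagrange polynomials on the nodes x."""
    basis = []
    for i, xi in enumerate(x):
        c = poly.polyfromroots(np.delete(x, i))
        basis.append(c / poly.polyval(xi, c))  # scale so that phi_i(x_i) = 1
    return basis


def diagonal_projection(f, p):
    """Solve D u = int(phi f) with D = diag(w); f holds monomial coefficients."""
    x, w = gauss_lobatto(p + 1)
    rhs = np.array([integral(poly.polymul(phi, f)) for phi in gll_basis(x)])
    return x, rhs / w


def exact_mass_matrix(p):
    """Exactly integrated mass matrix of the GLL basis (not diagonal in general)."""
    x, _ = gauss_lobatto(p + 1)
    basis = gll_basis(x)
    return np.array([[integral(poly.polymul(a, b)) for b in basis] for a in basis])
```

For a polynomial of degree $p - 1$ the returned coefficients should equal its nodal values, whereas for degree $p$ they generally should not, and `exact_mass_matrix` has nonzero off-diagonal entries.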
GAUSS–LOBATTO INTEGRATION ON THE TRIANGLE
1307
Proof. Let an entry of the matrix $D$ be defined by
$$d_{j,k} = \sum_{i=1}^{p+1} w_i \phi_j(x_i)\phi_k(x_i) = \delta_{j,k} w_j. \tag{2.5}$$
Because of the approximate orthogonality property, $D$ is diagonal. Furthermore, all of the weights of the GL integration rule are nonzero, so $D$ is invertible. Thus there is a unique solution to (2.4). It remains to show that the exact solution satisfies (2.4) when $f(x)$ is a polynomial of order $p - 1$. If $f(x)$ is a polynomial of order $p - 1$ and the inversion is exact, then $\boldsymbol{\phi}^T \mathbf{u}$ is also a polynomial of order $p - 1$. Furthermore, (2.4), with $d_{j,k}$ defined as in (2.5), is an approximation to (2.3). When $\boldsymbol{\phi}^T \mathbf{u}$ is of order $p - 1$, the integrand on the left-hand side of (2.3) is of order $2p - 1$. Because the GL integration rule is exact for polynomials of order $2p - 1$, (2.4) and (2.5) are exact approximations to (2.3). Since the exact solution satisfies (2.3), it must also satisfy (2.4).
The above shows that a GL integration rule guarantees the existence of a nodal basis that allows exact “diagonal projection” for functions of degree $p - 1$. Next, it is shown that this basis can be derived based on accuracy considerations. First, the nodal basis is divided into interior modes and vertex modes. The interior modes are zero at element boundaries and can be constructed from the space
$$I(p) \equiv \frac{1 - x^2}{4} P(p-2).$$
This is the space of all polynomials of degree $\le p$ that are zero at both $-1$ and $1$. Because the GL integration rule must have an integration point at $-1$ and $1$, the nodal basis will always have a left and right vertex mode. The left vertex mode can be defined as a polynomial that is 1 at $x = -1$ and 0 at $x = 1$. Polynomials of degree $p$ that satisfy these constraints can be constructed as
$$\frac{1-x}{2} + i(x) \quad \text{with } i(x) \in I(p).$$
The function $\frac{1-x}{2}$ and all of the interior modes have a root at $x = 1$, and therefore the left vertex mode will always have a root at $x = 1$. Similar results hold for the right vertex mode.
Theorem 2.2. There is one and only one left vertex mode, $\phi_1$, that allows exact diagonal projection of polynomials of order $p - 1$.
Proof. Let the function to be projected, $f(x)$, be described as
$$f(x) = a_1 \frac{1-x}{2} + \frac{1+x}{2} \sum_{i=2}^{p} a_i x^{i-2}, \tag{2.6}$$
and let the left vertex function of the basis vector, $\phi_1$, be described as
$$\phi_1(x) = \frac{1-x}{2} + \frac{1-x^2}{4} \sum_{i=1}^{p-1} b_i x^{i-1}. \tag{2.7}$$
Let the projection be represented as $\boldsymbol{\phi}^T(x)\mathbf{u}$. The first component of (2.4) is given by
$$d_{1,1} u_1 = \int_{-1}^{1} \phi_1(x) f(x)\,dx,$$
which is equivalent to
$$d_{1,1} u_1 = \int_{-1}^{1} \left[\frac{1-x}{2} + \frac{1-x^2}{4} \sum_{i=1}^{p-1} b_i x^{i-1}\right] \left[a_1 \frac{1-x}{2} + \frac{1+x}{2} \sum_{i=2}^{p} a_i x^{i-2}\right] dx.$$
This equation must be true for all $\mathbf{a}$. Equating $\boldsymbol{\phi}^T(-1)\mathbf{u}$ to $f(-1)$ and using the fact that the left vertex function is the only nonzero basis function at $x = -1$ gives $u_1 = a_1$. The coefficient of $a_1$ then gives
$$d_{1,1} = \int_{-1}^{1} \left[\frac{1-x}{2} + \frac{1-x^2}{4} \sum_{i=1}^{p-1} b_i x^{i-1}\right] \frac{1-x}{2}\,dx,$$
which determines $d_{1,1}$. Each of the remaining $a$'s gives a row of the equations
$$\int_{-1}^{1} \frac{1+x}{2}\,\frac{1-x^2}{4} \begin{bmatrix} 1 \\ x \\ \vdots \\ x^{p-2} \end{bmatrix} \begin{bmatrix} 1, x, \ldots, x^{p-2} \end{bmatrix} \mathbf{b}\,dx = -\int_{-1}^{1} \frac{1-x^2}{4} \begin{bmatrix} 1 \\ x \\ \vdots \\ x^{p-2} \end{bmatrix} dx. \tag{2.8}$$
The above equations are a system of $p - 1$ equations in the $p - 1$ unknowns of $\mathbf{b}$. It has a unique solution if the matrix on the left-hand side has a nonzero determinant. This matrix is symmetric because any entry can be represented as
$$c_{i,j} = \int_{-1}^{1} \frac{1+x}{2}\,\frac{1-x^2}{4}\, x^{i-1} x^{j-1}\,dx.$$
It is also positive definite because
$$\mathbf{b}^T \left( \int_{-1}^{1} \frac{1+x}{2}\,\frac{1-x^2}{4} \begin{bmatrix} 1 \\ x \\ \vdots \\ x^{p-2} \end{bmatrix} \begin{bmatrix} 1, x, \ldots, x^{p-2} \end{bmatrix} dx \right) \mathbf{b} = \int_{-1}^{1} \frac{1+x}{2}\,\frac{1-x^2}{4}\, b(x)^2\,dx,$$
where $b(x) = [1, x, \ldots, x^{p-2}]\,\mathbf{b}$. The integrand is nonnegative over the domain $[-1, 1]$ and vanishes identically only when $\mathbf{b} = 0$. Because the matrix is symmetric and positive definite, it is invertible, which proves that the left vertex mode is unique. Similar results hold for the right vertex mode.
The next theorem is similar to a more general theorem given by Mysovskikh [22] for a multidimensional Gauss integration rule that states that “a necessary condition for the existence of a quadrature formula of degree $2k + 1$ with $N = \dim P^d_k$ points is that the basic orthogonal polynomials of degree $k + 1$ have $N$ common zeros” where $P^d_k$ is the space of polynomials in dimension $d$ with total degree less than $k$. (See [8, Theorem 2].) The following theorem is more useful for analyzing the existence of GL integration rules.
Theorem 2.3. If the left and right vertex modes satisfying the diagonal projection property do not have $p - 1$ roots at coincident locations in $(-1, 1)$, then a GL integration rule does not exist.
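The linear system (2.8) from the proof of Theorem 2.2 is small enough to solve directly. The following sketch, with hypothetical helper names and exact polynomial integration through NumPy, assembles the matrix with entries $c_{i,j}$, solves for $\mathbf{b}$, and returns the left vertex mode in the form (2.7) as monomial coefficients.

```python
import numpy as np
from numpy.polynomial import polynomial as poly


def integral(c):
    """Exact integral over [-1, 1] of the polynomial with monomial coefficients c."""
    antiderivative = poly.polyint(c)
    return poly.polyval(1.0, antiderivative) - poly.polyval(-1.0, antiderivative)


def monomial(k):
    """Monomial coefficients of x**k."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return c


def left_vertex_mode(p):
    """Monomial coefficients of the left vertex mode (2.7), with b solved from (2.8)."""
    half_minus = np.array([0.5, -0.5])      # (1 - x)/2
    half_plus = np.array([0.5, 0.5])        # (1 + x)/2
    bubble = np.array([0.25, 0.0, -0.25])   # (1 - x^2)/4
    weight = poly.polymul(half_plus, bubble)
    m = p - 1
    C = np.empty((m, m))
    rhs = np.empty(m)
    for i in range(m):
        rhs[i] = -integral(poly.polymul(bubble, monomial(i)))
        for j in range(m):
            C[i, j] = integral(poly.polymul(weight, monomial(i + j)))
    b = np.linalg.solve(C, rhs)
    return poly.polyadd(half_minus, poly.polymul(bubble, b))


if __name__ == "__main__":
    phi1 = left_vertex_mode(4)
    print(np.sort(np.real(poly.polyroots(phi1))))
```

For $p = 4$ the printed roots should be the interior points of the five-point GL rule together with $x = 1$, in line with the uniqueness argument.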
Proof. Assume
1. a GL integration rule exists, and
2. a left and right vertex function exists satisfying the diagonal projection property but with roots at different locations in $(-1, 1)$.
By assumption 1 and Theorem 2.1, there exists a left and right vertex function satisfying the diagonal projection property. Furthermore, these functions are from a nodal basis and thus share the same roots in $(-1, 1)$. Because the left and right vertex functions are unique by Theorem 2.2, this contradicts item 2 above. Thus assumption 2 excludes the existence of the GL integration rule.
To verify whether a GL integration rule can exist or not, the location of the roots of the left (and right) vertex mode must be found using (2.8). By relaxing the form specified for the left vertex mode, one can obtain an explicit expression, which then makes it easy to determine the location of the roots. Instead of assuming the form given by (2.7), the following form is used:
$$\phi_1(x) = \frac{1-x}{2}\,\tilde{\phi}_1(x), \tag{2.9}$$
where $\tilde{\phi}_1(x) \in P(p-1)$. This enforces the constraint that the left vertex mode have a root at $x = 1$, but does not constrain the value at $-1$. Following the same procedure as used to prove Theorem 2.2, $\boldsymbol{\phi}^T(-1)\mathbf{u}$ is equated to $f(-1)$, giving $u_1 \tilde{\phi}_1(-1) = a_1$. Plugging (2.6) and (2.9) into (2.4) gives equations that must be true for all $\mathbf{a}$. As before, the equation from $a_1$ determines $d_{1,1}$. The remaining equations can be written as
$$\int_{-1}^{1} \frac{1-x^2}{4} \begin{bmatrix} 1 \\ x \\ \vdots \\ x^{p-2} \end{bmatrix} \tilde{\phi}_1\,dx = 0. \tag{2.10}$$
This shows that the function $\tilde{\phi}_1$ must be orthogonal to $P(p-2)$ with respect to the weighting $\frac{1-x^2}{4}$. The Jacobi polynomials satisfy
$$\int_{-1}^{1} P_m^{(\alpha,\beta)} P_n^{(\alpha,\beta)} (1-x)^{\alpha}(1+x)^{\beta}\,dx = \delta_{m,n}.$$
Because the space $P(p-2)$ can be represented using the Jacobi polynomials $P_n^{(1,1)}(x)$ for $n \in [0, p-2]$, the polynomial $P_{p-1}^{(1,1)}(x)$ will satisfy (2.10). The left vertex function can therefore be represented as $\frac{1-x}{2} P_{p-1}^{(1,1)}(x)$. Following the same procedure for the right vertex function shows that it can be represented as $\frac{1+x}{2} P_{p-1}^{(1,1)}(x)$. The roots of both polynomials in $(-1, 1)$ are determined by $P_{p-1}^{(1,1)}(x)$ and thus have the same locations. Not surprisingly, this shows that a GL integration rule may exist in one dimension. Based on the already known expression for the locations of the GL points, it also shows that $P_p'(x)$ is proportional to $P_{p-1}^{(1,1)}(x)$.
3. Triangles. In this section, the same basic steps are used to show that a GL integration rule does not exist on triangles. First, it is shown that the existence of a GL integration rule with the properties defined in the introduction implies the existence of a nodal basis for the space $T(p)$ that has a diagonal projection operation that is
exact for functions in the space $T(p-1)$. It is then shown that the basis satisfying this property is unique and not nodal, proving that the GL integration rule does not exist. Before beginning, a standard triangle on which to perform the operations is defined by $\{r, s \mid -1 \le r \le 1,\ -1 \le s \le -r\}$, as shown in Figure 1. Following Dubiner [13], we introduce coordinates $\xi = -1 + 2(1+r)/(1-s)$ and $\eta = s$, which are shown on the figure as well. In this coordinate system, the standard triangle is defined by $-1 \le \xi \le 1$, $-1 \le \eta \le 1$. Integration over the standard element is given by
$$\int_{-1}^{1}\int_{-1}^{-r} f(r,s)\,ds\,dr = \int_{-1}^{1}\int_{-1}^{1} f(\xi,\eta)\,\frac{1-\eta}{2}\,d\eta\,d\xi$$
in these coordinate systems.
Fig. 1. Standard triangle and coordinate systems (axes $r$, $s$ and $\xi$, $\eta$).
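The change of variables to $(\xi, \eta)$ turns integration over the triangle into a tensor-product integral on a square, which can be evaluated with one-dimensional Gauss–Legendre rules. The sketch below assumes the standard triangle has vertices $(-1,-1)$, $(1,-1)$, and $(-1,1)$; the function name and the default number of points are our choices.

```python
import numpy as np


def integrate_triangle(f, n=8):
    """Integrate f(r, s) over the standard triangle via collapsed coordinates."""
    x, w = np.polynomial.legendre.leggauss(n)
    xi, eta = np.meshgrid(x, x, indexing="ij")
    weights = np.outer(w, w)
    # Inverse of xi = -1 + 2 (1 + r) / (1 - s), eta = s.
    r = 0.5 * (1.0 + xi) * (1.0 - eta) - 1.0
    s = eta
    jacobian = 0.5 * (1.0 - eta)
    return float(np.sum(weights * f(r, s) * jacobian))


if __name__ == "__main__":
    print(integrate_triangle(lambda r, s: np.ones_like(r)))  # area of the triangle
    print(integrate_triangle(lambda r, s: r))
```

Because the transformed integrand is a polynomial in $\xi$ and $\eta$ whenever $f$ is a polynomial in $r$ and $s$, a sufficiently large `n` integrates such functions exactly up to rounding.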
As in one dimension, it is assumed that the GL integration rule has the form
$$\int_{-1}^{1}\int_{-1}^{-r} f(r,s)\,ds\,dr \approx \sum_{i=1}^{N(p)} w_i f(r_i, s_i),$$
where $f(r,s)$ is the function to be integrated, $N(p) \equiv \dim(T(p)) = (p+1)(p+2)/2$ is the number of points in the GL rule, and $w_i$ is the weight associated with the point located at $(r_i, s_i)$. Three of the points are required to be at the triangle vertices, $(r, s) = (-1,-1)$, $(-1,1)$, and $(1,-1)$, and $p - 1$ points are required to be along each side of the element, $r = -1$, $s = -1$, and $r = -s$. The remaining $N(p-3) = (p-1)(p-2)/2$ points are assumed to be in the interior of the element. A formula is sought that can integrate polynomials in the space $T(2p-1)$ exactly.
Some basic observations about the space $T(p)$ are first given. This space can be decomposed into interior, side, and vertex modes. Interior modes are zero on all sides of the triangle and can be constructed from the space $I(p) \equiv (r+1)(s+1)(r+s)\,T(p-3)$. This is a general space for the interior modes, and it contains all polynomials in $T(p)$ that have three component curves defined by $r = -1$, $s = -1$, and $r = -s$. (See [26, section 1.8] for a definition of component curves.) In some cases, it will be convenient to have an explicit representation of the interior space. In this case, the interior modes of the modified Dubiner basis [13] will be used. These are described in $\xi$, $\eta$ coordinates as
$$\phi_{\mathrm{int},m,n} = \frac{1+\xi}{2}\,\frac{1-\xi}{2}\,P_m^{2,2}(\xi) \left(\frac{1-\eta}{2}\right)^{m+2} \frac{1+\eta}{2}\,P_n^{2m+5,2}(\eta),$$
where 0 ≤ m < p − 2, 0 ≤ n < p − 2 − m. In some cases, a one-dimensional numbering of the interior modes will be needed, in which case φint,m,n will be replaced by φint,j , where j = N (m + n − 1) + n + 1. There are three distinct sets of side modes. The sides are numbered as shown in Figure 2, with side 1 being opposite to vertex 1. General spaces for constructing the side modes are S1 (p) ≡ (r + 1)(r + s)T (p − 2), S2 (p) ≡ (r + 1)(s + 1)T (p − 2), S3 (p) ≡ (s + 1)(r + s)T (p − 2).
Fig. 2. Numbering of the vertices (v1, v2, v3) and sides (s1, s2, s3) of the triangle.
The side modes can be constructed from $p - 1$ modes that are nonzero along the side and any linear combination of interior modes. Thus, each of these spaces includes the interior space as a subset. For each side, the form of the $p - 1$ side modes in the modified Dubiner basis is given by
$$\phi_{s1,m} = \frac{1+\xi}{2}\,\frac{1-\xi}{2}\,P_m^{2,2}(\xi) \left(\frac{1-\eta}{2}\right)^{m+2},$$
$$\phi_{s2,m} = \frac{1+\xi}{2}\,\frac{1-\eta}{2}\,\frac{1+\eta}{2}\,P_m^{2,2}(\eta),$$
$$\phi_{s3,m} = (-1)^m\,\frac{1-\xi}{2}\,\frac{1-\eta}{2}\,\frac{1+\eta}{2}\,P_m^{2,2}(\eta),$$
where $0 \le m < p - 1$. Vertex modes are constrained to be one at one vertex and zero along the opposing side. General spaces for obtaining vertex modes are given by $V_1 = (1+s)\,T(p-1)$, $V_2 = (r+s)\,T(p-1)$, $V_3 = (1+r)\,T(p-1)$. Vertex modes can be constructed using a vertex function and any combination of modes from the two adjacent sides as well as interior modes. Thus, the vertex 1 space, for example, contains the $S_2$, $S_3$, and $I$ spaces as a subset. In the modified Dubiner basis, the three vertex modes are linear functions that are one at one vertex and zero
along the opposing side:
$$\phi_{v1} = \frac{1+\eta}{2}, \qquad \phi_{v2} = \frac{1-\xi}{2}\,\frac{1-\eta}{2}, \qquad \phi_{v3} = \frac{1+\xi}{2}\,\frac{1-\eta}{2}.$$
The vertex, side, and interior modes of the modified Dubiner basis are assembled into a single basis vector, $\boldsymbol{\phi}$, by listing first the three vertex modes, then the side 1 modes, the side 2 modes, the side 3 modes, and lastly the interior modes. To distinguish different basis orders, the notation $\boldsymbol{\phi}_p$ is used.
As in one dimension, the first step is to show that the existence of a GL integration rule guarantees the existence of a nodal basis that allows exact diagonal projection for functions in $T(p-1)$. The following theorem is slightly more difficult to prove in two dimensions.
Theorem 3.1. The existence of a GL integration rule on the triangle guarantees the existence of a nodal basis on the triangle.
Proof. If a function in $T(p)$, say $\boldsymbol{\phi}^T \mathbf{a}$, is to exactly reproduce the values of a function $u(r,s)$ at the GL points, the following must be true:
$$\sum_{j=1}^{N(p)} a_j \phi_j(r_k, s_k) = u(r_k, s_k) \quad \forall k \in [1, N(p)]. \tag{3.1}$$
This can be written more compactly as Pa = u, where P is an N (p) × N (p) square matrix with entries given by pj,k = φj (rk , sk ), and u is a column vector containing the values of u(r, s) at each GL point. To find the ith mode of the nodal basis, ψi , u(rk , sk ) is set to δi,k . If P is invertible, then the nodal basis is uniquely determined. This is in agreement with Theorem 3.7-3 in [26] which proves a similar result and then goes on to investigate the properties of these functions. Now assume P is singular. In this case, there are either an infinite number of solutions to (3.1) or no solutions. If u is chosen to be evaluated using a function in T (p), then there is certainly a function in T (p) that can reproduce these values in this particular case. This shows that there is at least one solution. To prove that this solution is unique, assume that there are two distinct functions, u1 and u2 , in T (p) that produce the same values on the GL points. Let these functions be represented using the modified Dubiner basis. Because there is a GL point located at each vertex, the coefficients of the vertex modes for both functions must be identical. Furthermore, because there are p − 1 GL points on each side, the coefficients of the side modes are also uniquely determined. u1 and u2 can therefore differ only in the coefficients of the interior modes. However, the GL integration rule integrates all polynomials in
$T(2p-1)$ exactly, and both functions are assumed to have the same values on the Gauss points. Therefore,
$$\int_{-1}^{1}\int_{-1}^{-r} \boldsymbol{\phi}_{p-3}\,u_1\,ds\,dr = \int_{-1}^{1}\int_{-1}^{-r} \boldsymbol{\phi}_{p-3}\,u_2\,ds\,dr.$$
This actually holds for $\boldsymbol{\phi}_{p-1}$, but the additional constraints are not necessary for the proof. $u_1$ and $u_2$ have the same side and vertex modes, so they can be eliminated from both sides of the equation. The interior space of functions can be represented as
$$I(p) = \mathrm{span}\left\{(1+s)(1+r)(r+s)\,\boldsymbol{\phi}_{p-3}\right\}.$$
In the same way that showed that (2.8) is symmetric positive definite, it can be shown that the above equation results in a symmetric positive definite matrix. $u_1$ and $u_2$ must therefore be identical. Since there is a unique solution, then $P$ is not singular and the nodal basis is uniquely determined.
Given the nodal basis and the GL integration rule, Theorem 2.1 can be extended to apply to triangles with no modification. This shows that if a GL integration rule exists, there is a nodal basis, and there is an exact diagonal projection operation for functions in $T(p-1)$. Following the one-dimensional logic, the next step is to prove the following theorem.
Theorem 3.2. The three triangle vertex modes that allow exact diagonal projection of functions from $T(p-1)$ are unique.
Proof. Let the function to be projected, $f(r,s)$, be contained in $T(p-1)$ and described as
$$f(r,s) = \boldsymbol{\phi}_{p-1}^T \mathbf{a},$$
and let the projected function be represented by
$$u(r,s) = \boldsymbol{\psi}_p^T \mathbf{u},$$
where $\boldsymbol{\psi}$ is the basis allowing diagonal projection. Let the first vertex mode, $\psi_1$, be described using the modified Dubiner basis as $(1+s)\,\boldsymbol{\phi}_{p-1}^T \mathbf{b}$. Since $\psi_1$ is assumed to be a vertex mode, $b_1$ is not zero. The mode can be scaled by an arbitrary constant, so $b_1$ can be constrained to be 1. Because $u(r,s)$ must equal $f(r,s)$ at the vertex point, $u_1$ is then equal to $a_1$. Diagonal projection requires that
$$d_{1,1} u_1 = d_{1,1} a_1 = \int_\Omega (1+s)\,\boldsymbol{\phi}_{p-1}^T \mathbf{b}\;\boldsymbol{\phi}_{p-1}^T \mathbf{a}\,dr\,ds$$
hold for all $\mathbf{a}$. This again results in a set of symmetric positive definite matrices for the coefficients from $b_2$ to $b_{N(p-1)}$. To see this, the first component from the vectors $\mathbf{b}$ and $\mathbf{a}$ is explicitly extracted and then the remaining part of the vectors is represented as $\mathbf{b}_{/1} = [b_2, \ldots, b_{N(p-1)}]$. In the following, all subscripts of $/1$ indicate the vector without the first component. The constraint corresponding to $a_1$ determines the diagonal projection constant $d_{1,1}$. The remaining constraints are given by
$$\int_\Omega (1+s)\,\boldsymbol{\phi}_{p-1,/1}^T \mathbf{b}_{/1}\;\boldsymbol{\phi}_{p-1,/1}^T \mathbf{a}_{/1}\,dr\,ds = -\int_\Omega \frac{(1+s)^2}{2}\,b_1\,\boldsymbol{\phi}_{p-1,/1}^T \mathbf{a}_{/1}\,dr\,ds. \tag{3.2}$$
These are $N(p-1) - 1$ equations in $N(p-1) - 1$ unknowns ($b_1$ is set to one). That the matrix is positive definite can be seen by first letting $\mathbf{b}_{/1} = \mathbf{a}_{/1}$ and then defining $g(r,s)$ as $\boldsymbol{\phi}_{p-1,/1}^T \mathbf{a}_{/1}$. This results in
$$\int_\Omega (1+s)\,(g(r,s))^2\,dr\,ds,$$
which is positive over the triangle. Thus the matrix is positive definite, and the vertex mode that allows diagonal projection is unique.
Theorem 3.3. If the zero curves of the three vertex modes do not coincide at $p - 1$ locations along each side of the triangle, then a GL integration rule does not exist.
Proof. Assume
1. a GL integration rule exists, and
2. vertex functions exist satisfying the diagonal projection property, but the zero curves of these functions do not intersect at $p - 1$ locations along any side of the triangle.
By assumption 1 and Theorem 3.1, there exist vertex functions satisfying the diagonal projection property. Furthermore, these functions are from a nodal basis, and there are $p - 1$ nodes along each side. This implies that all three functions are zero at $p - 1$ locations on each triangle side. Because the vertex functions are unique by Theorem 3.2, this contradicts item 2 above. Thus assumption 2 excludes the existence of the GL integration rule.
The final step is to determine analytic expressions for the vertex functions. The easiest way to find the vertex functions is to simply invert (3.2) numerically. This result was used as a guide to determine an analytic description of the vertex functions. The analytic expression can be found most easily using $\xi$, $\eta$ coordinates on the triangle. Treating $b_1$ as an unknown and letting $\sigma = \boldsymbol{\phi}_{p-1}^T \mathbf{b} \in T(p-1)$, (3.2) can be written as
$$\int_{-1}^{1}\int_{-1}^{1} (1+\eta)\,\sigma\,\boldsymbol{\phi}_{p-1,/1}^T \mathbf{a}_{/1}\,\frac{1-\eta}{2}\,d\xi\,d\eta = 0. \tag{3.3}$$
This shows that the function $\sigma$ should be orthogonal (with respect to a weighting function) to the space $T(p-1)$ excluding the vertex 1 mode. This space is formed by the union of the two other vertex spaces, $V_2 \cup V_3$. The numerical results indicate that $\sigma$ is only a function of $\eta$. Therefore only the $\eta$ components of this equation can be considered. The basis for the space $V_2 \cup V_3$ consists of the two vertex modes, $\phi_{v2}$ and $\phi_{v3}$, the side modes $\phi_{s1,m}$, $\phi_{s2,m}$, and $\phi_{s3,m}$ with $0 \le m < p - 2$, and $\phi_{\mathrm{int},m,n}$ with $0 \le m < p - 3$, $0 \le n < p - 3 - m$. All of these modes include the factor $\frac{1-\eta}{2}$ and reach a maximum degree in $\eta$ of $p - 1$, and thus the $\eta$ component of any function of the space can be constructed from $(1-\eta)P(p-2)$. The $\eta$ component of the orthogonality constraint is then
$$\int_{-1}^{1} \sigma\,P(p-2)\,(1-\eta)^2(1+\eta)\,d\eta = 0. \tag{3.4}$$
If $\sigma$ is only a function of $\eta$, then $\sigma \in P(p-1)$. To satisfy this orthogonality requirement, $\sigma$ must be the Jacobi polynomial $P_{p-1}^{(2,1)}(\eta)$. Because this polynomial is orthogonal to the functions $P_m^{(2,1)}(\eta)$ for $m \in [0, p-2]$ and these functions span $P(p-2)$, this choice
satisfies (3.4), and thus (3.3) as well. The vertex 1 function that allows diagonal projection on the triangle is thus
$$\psi_{v1} = \frac{1+s}{2}\,\frac{P_{p-1}^{(2,1)}(s)}{P_{p-1}^{(2,1)}(1)},$$
where it has been normalized such that the value of the function at the vertex is 1. The other two vertex functions can be found by using the rotational symmetry of the triangle. For example, to find $\psi_{v2}$, one can substitute $-1 - r - s$ for $s$ to obtain
$$\psi_{v2} = \frac{-(r+s)}{2}\,\frac{P_{p-1}^{(2,1)}(-1-r-s)}{P_{p-1}^{(2,1)}(1)}.$$
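The orthogonality property behind this vertex function can be checked numerically. The sketch below is an illustration under stated assumptions: the Jacobi polynomial is built, up to a constant factor, from the Rodrigues-type formula using NumPy polynomial arithmetic, the triangle is taken with vertices $(-1,-1)$, $(1,-1)$, $(-1,1)$, and the spaces $(r+s)\,T(p-2)$ and $(1+r)\,T(p-2)$ are spanned with monomials. All function names are ours.

```python
import numpy as np
from numpy.polynomial import polynomial as poly


def jacobi_21(n):
    """Monomial coefficients of a constant multiple of the Jacobi polynomial P_n^{(2,1)}."""
    one_minus = np.array([1.0, -1.0])
    one_plus = np.array([1.0, 1.0])
    q = poly.polymul(poly.polypow(one_minus, n + 2), poly.polypow(one_plus, n + 1))
    weight = poly.polymul(poly.polypow(one_minus, 2), one_plus)
    quotient, _ = poly.polydiv(poly.polyder(q, n), weight)
    return quotient


def integrate_triangle(f, n=12):
    """Integrate f(r, s) over the standard triangle via collapsed coordinates."""
    x, w = np.polynomial.legendre.leggauss(n)
    xi, eta = np.meshgrid(x, x, indexing="ij")
    r = 0.5 * (1.0 + xi) * (1.0 - eta) - 1.0
    return float(np.sum(np.outer(w, w) * f(r, eta) * 0.5 * (1.0 - eta)))


def vertex1_mode(p):
    """Vertex 1 function (1 + s)/2 * P_{p-1}^{(2,1)}(s) / P_{p-1}^{(2,1)}(1)."""
    c = jacobi_21(p - 1)
    scale = poly.polyval(1.0, c)
    return lambda r, s: 0.5 * (1.0 + s) * poly.polyval(s, c) / scale


def max_orthogonality_defect(p):
    """Largest |integral of psi_v1 * q| over a spanning set of V2 and V3 at order p - 1."""
    psi = vertex1_mode(p)
    worst = 0.0
    for i in range(p - 1):
        for j in range(p - 1 - i):
            for factor in (lambda r, s: r + s, lambda r, s: 1.0 + r):
                value = integrate_triangle(
                    lambda r, s: psi(r, s) * factor(r, s) * r**i * s**j
                )
                worst = max(worst, abs(value))
    return worst
```

A value of `max_orthogonality_defect(p)` at rounding level is consistent with (3.3); the default of twelve quadrature points per direction is only adequate for moderate $p$.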
For a GL rule to exist, these two functions should have the same roots along the adjacent side, $r = -1$. If $p - 1$ is even, this implies that the function should be an even function of $s$, and if $p - 1$ is odd, the function should be an odd function of $s$. Based on the fact that the Jacobi polynomials, $P_n^{(2,1)}(x)$, are orthogonal with respect to a nonsymmetric weighting function $(1-x)^2(1+x)$, it is fairly obvious that they are not symmetric. To be sure, the polynomial form given, up to a normalization constant, by
$$P_n^{(\alpha,\beta)}(x) = (1-x)^{-\alpha}(1+x)^{-\beta}\,\frac{d^n}{dx^n}\left[(1-x)^{\alpha+n}(1+x)^{\beta+n}\right]$$
is examined. Letting $\alpha = 2$ and $\beta = 1$, after some manipulation this can be rewritten as
$$P_n^{(2,1)}(x) = \frac{1}{1-x^2}\left(\frac{d}{dx} - \frac{n}{1-x}\right)\frac{d^{n-1}}{dx^{n-1}}(1-x^2)^{n+1}.$$
If $n$ is even, this function must be even for a GL rule to exist, and if $n$ is odd, it should be odd. For the case of $n$ even, the function $\frac{d^{n-1}}{dx^{n-1}}(1-x^2)^{n+1}$ is odd. Denote it as $g(x)$. The above then becomes
$$P_n^{(2,1)}(x) = \frac{1}{1-x^2}\left(\frac{dg(x)}{dx} - \frac{n}{1-x}\,g(x)\right)$$
and
$$P_n^{(2,1)}(-x) = \frac{1}{1-x^2}\left(\frac{dg(x)}{dx} - \frac{n}{1+x}\,(-g(x))\right).$$
For $P_n^{(2,1)}(x)$ to be even, $P_n^{(2,1)}(x) - P_n^{(2,1)}(-x)$ should equal 0. The above gives
$$P_n^{(2,1)}(x) - P_n^{(2,1)}(-x) = \frac{1}{1-x^2}\left(\frac{-2n}{1-x^2}\,g(x)\right).$$
$g(x)$ is not zero, so for any even $n$ greater than zero the function is not even. A similar argument can be made for the case of odd $n$. The only case where the function is symmetric is $n = 0$. Based on Theorem 3.3, because the roots of the diagonal projection vertex modes do not coincide along the side, a GL integration rule does not exist on the triangle for
$p > 1$. For $p = 1$, locating three Gauss points on the vertices does allow integration of the space $\{1, r, s\}$ exactly. Although no GL integration rule exists on the triangle, the fact that a diagonal projection vertex function exists gives hope that a finite element method similar to the spectral element method on quadrilaterals can still be developed. The diagonal projection vertex mode is shown in Figure 3. The grayscale shows the values for vertex mode 1, which has the value 1 at the top of the triangle. The solid black contour lines are the zero contours for this function. The dashed contour lines are the zero contours for vertex modes 2 and 3, which are rotations of vertex mode 1. These lines are shown to further demonstrate that the zero intersection points do not coincide.
Fig. 3. Contours of the vertex 1 mode that allows diagonal projection. Dashed lines are the zero lines of the vertex 2 and vertex 3 modes.
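The mismatch of the zero locations can also be seen by computing the roots directly. In the sketch below, which uses hypothetical function names and the same Rodrigues-type construction (up to a constant factor), the interior zeros of the vertex 1 function on the side $r = -1$ are the roots of $P_{p-1}^{(2,1)}(s)$, while those of the vertex 2 function are the roots of $P_{p-1}^{(2,1)}(-s)$.

```python
import numpy as np
from numpy.polynomial import polynomial as poly


def jacobi_21(n):
    """Monomial coefficients of a constant multiple of the Jacobi polynomial P_n^{(2,1)}."""
    one_minus = np.array([1.0, -1.0])
    one_plus = np.array([1.0, 1.0])
    q = poly.polymul(poly.polypow(one_minus, n + 2), poly.polypow(one_plus, n + 1))
    weight = poly.polymul(poly.polypow(one_minus, 2), one_plus)
    quotient, _ = poly.polydiv(poly.polyder(q, n), weight)
    return quotient


def side_roots(p):
    """Interior zeros in s, on the side r = -1, of the vertex 1 and vertex 2 functions."""
    c = jacobi_21(p - 1)
    roots_v1 = np.sort(np.real(poly.polyroots(c)))  # zeros of P_{p-1}^{(2,1)}(s)
    roots_v2 = np.sort(-roots_v1)                   # zeros of P_{p-1}^{(2,1)}(-s)
    return roots_v1, roots_v2


if __name__ == "__main__":
    for p in range(2, 6):
        v1, v2 = side_roots(p)
        print(p, v1, v2)
```

The two root sets would have to agree for a nodal basis to exist; for $p = 2$, for instance, they are single points on opposite sides of the midpoint of the side.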
The interesting thing about the function shown in Figure 3 is that it is localized near the vertex and close to 0 elsewhere. This is similar to the GLL vertex functions used in quadrilateral and hexahedral spectral element methods. The most important point is that such a function allows a diagonal approximation to the mass matrix that is accurate to order p − 1. On quadrilaterals this property allows optimal spatial convergence rates to be obtained by unsteady explicit simulations [20]. Thus on triangles, optimal explicit simulations using continuous high-order polynomial approximations may still be possible even though a GL rule does not exist. Our continuing work is to determine whether there exist side and interior modes which also have the diagonal projection property. 4. Conclusions. It has been proven that a Gauss–Lobatto (GL) integration rule for triangles that has characteristics similar to GL integration on line segments, quadrilaterals, and hexahedra does not exist. Specifically, there is no integration rule having a point at each triangle vertex, p−1 points on each triangle side, and (p−1)(p− 2)/2 points in the interior that is capable of exactly integrating the space T (2p − 1). This also implies that there is no equivalent to the spectral element GLL nodal basis on the triangle. However, the analysis also shows that there is a vertex mode that
allows a diagonal approximation to the mass matrix accurate to order p − 1. This function may be a key to developing explicit simulations using continuous high-order polynomial approximations on triangles.
REFERENCES
[1] M. Abramowitz and I. A. Stegun, eds., Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables, Dover Publications, New York, 1965.
[2] M. G. Blyth and C. Pozrikidis, A Lobatto interpolation grid over the triangle, IMA J. Appl. Math., 71 (2006), pp. 153–169.
[3] Q. Chen and I. Babuška, Approximate optimal points for polynomial interpolation of real functions in an interval and in a triangle, Comput. Methods Appl. Mech. Engrg., 128 (1995), pp. 405–417.
[4] M. J. S. Chin-Joe-Kong, W. A. Mulder, and M. V. Veldhuizen, Higher-order triangular and tetrahedral finite elements with mass lumping for solving the wave equation, J. Engrg. Math., 35 (1999), pp. 405–426.
[5] G. Cohen, P. Joly, J. E. Roberts, and N. Tordjman, Higher order triangular finite elements with mass lumping for the wave equation, SIAM J. Numer. Anal., 38 (2001), pp. 2047–2078.
[6] R. Cools, Constructing cubature formulae: The science behind the art, Acta Numer., 6 (1997), pp. 1–54.
[7] R. Cools, Monomial cubature rules since “Stroud”: A compilation. II. Numerical Evaluation of Integrals, J. Comput. Appl. Math., 112 (1999), pp. 21–27.
[8] R. Cools, Advances in multidimensional integration, J. Comput. Appl. Math., 149 (2002), pp. 1–12.
[9] R. Cools, An encyclopaedia of cubature formulas, J. Complexity, 19 (2003), pp. 445–453.
[10] R. Cools, I. Mysovskikh, and H. Schmid, Cubature formulae and orthogonal polynomials, J. Comput. Appl. Math., 127 (2001), pp. 121–152.
[11] R. Cools and P. Rabinowitz, Monomial cubature rules since “Stroud”: A compilation, J. Comput. Appl. Math., 48 (1993), pp. 309–326.
[12] S. Dey, J. E. Flaherty, T. K. Ohsumi, and M. S. Shephard, Integration by table look-up for p-version finite elements on curved tetrahedra, Comput. Methods Appl. Mech. Engrg., 195 (2006), pp. 4532–4543.
[13] M. Dubiner, Spectral methods on triangles and other domains, J. Sci. Comput., 6 (1991), pp. 345–390.
[14] D. A. Dunavant, High degree efficient symmetrical gaussian quadrature rules for the triangle, Internat. J. Numer. Methods Engrg., 21 (1985), pp. 1129–1148.
[15] J. S. Hesthaven, From electrostatics to almost optimal nodal sets for polynomial interpolation in a simplex, SIAM J. Numer. Anal., 35 (1998), pp. 655–676.
[16] T. J. R. Hughes, The Finite Element Method: Linear Static and Dynamic Finite Element Analysis, Prentice–Hall, Englewood Cliffs, NJ, 1987.
[17] Y. Liu and M. Vinokur, Exact integrations of polynomials and symmetric quadrature formulas over arbitrary polyhedral grids, J. Comput. Phys., 140 (1998), pp. 122–147.
[18] H. Luo and C. Pozrikidis, A Lobatto interpolation grid in the tetrahedron, IMA J. Appl. Math., 71 (2006), pp. 298–313.
[19] J. N. Lyness and R. Cools, A survey of numerical cubature over triangles, in Mathematics of Computation 1943–1993: A Half-Century of Computational Mathematics (Vancouver, BC, 1993), Proc. Sympos. Appl. Math. 48, AMS, Providence, RI, 1994, pp. 127–150.
[20] Y. Maday and A. T. Patera, Spectral element methods for the incompressible Navier-Stokes equations, in State-of-the-Art Surveys on Computational Mechanics, A. K. Noor and J. T. Oden, eds., The American Society of Mechanical Engineers, New York, 1989, pp. 71–143.
[21] C. Mavriplis and J. van Rosendale, Triangular spectral elements for incompressible fluid flow, in Proceeding of the 11th AIAA Computational Fluid Dynamics Conference, Orlando, FL, 1993, paper AIAA-1993-3346.
[22] I. P. Mysovskikh, Interpolyatsionnye kubaturnye formuly, “Nauka,” Moscow, 1981.
[23] R. Pasquetti and F. Rapetti, Spectral element methods on triangles and quadrilaterals: Comparisons and applications, J. Comput. Phys., 198 (2004), pp. 349–362.
[24] R. Pasquetti and F. Rapetti, Spectral element methods on unstructured meshes: Comparisons and recent advances, J. Sci. Comput., 27 (2006), pp. 377–387.
[25] H. T. Rathod and M. Shajedul Karim, An explicit integration scheme based on recursion for the curved triangular finite elements, Comput. & Structures, 80 (2002), pp. 43–76.
[26] A. H. Stroud, Approximate calculation of multiple integrals, Prentice–Hall, Englewood Cliffs, NJ, 1971.
[27] M. A. Taylor and B. A. Wingate, A generalized diagonal mass matrix spectral element method for non-quadrilateral elements, Appl. Numer. Math., 33 (2000), pp. 259–265.
[28] M. A. Taylor, B. A. Wingate, and L. P. Bos, A cardinal function algorithm for computing multivariate quadrature points, SIAM J. Numer. Anal., 45 (2007), pp. 193–205.
[29] S. Wandzura and H. Xiao, Symmetric quadrature rules on a triangle, Comput. Math. Appl., 45 (2003), pp. 1829–1840.
[30] T. Warburton, L. F. Pavarino, and J. S. Hesthaven, A pseudo-spectral scheme for the incompressible Navier-Stokes equations using unstructured nodal elements, J. Comput. Phys., 164 (2000), pp. 1–21.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1319–1365
UNIFIED HYBRIDIZATION OF DISCONTINUOUS GALERKIN, MIXED, AND CONTINUOUS GALERKIN METHODS FOR SECOND ORDER ELLIPTIC PROBLEMS∗
BERNARDO COCKBURN†, JAYADEEP GOPALAKRISHNAN‡, AND RAYTCHO LAZAROV§
Abstract. We introduce a unifying framework for hybridization of finite element methods for second order elliptic problems. The methods fitting in the framework are a general class of mixed-dual finite element methods including hybridized mixed, continuous Galerkin, nonconforming, and a new, wide class of hybridizable discontinuous Galerkin methods. The distinctive feature of the methods in this framework is that the only globally coupled degrees of freedom are those of an approximation of the solution defined only on the boundaries of the elements. Since the associated matrix is sparse, symmetric, and positive definite, these methods can be efficiently implemented. Moreover, the framework allows, in a single implementation, the use of different methods in different elements or subdomains of the computational domain, which are then automatically coupled. Finally, the framework brings about a new point of view, thanks to which it is possible to see how to devise novel methods displaying very localized and simple mortaring techniques, as well as methods permitting an even further reduction of the number of globally coupled degrees of freedom.
Key words. discontinuous Galerkin methods, mixed methods, continuous methods, hybrid methods, elliptic problems
AMS subject classifications. 65N30, 65M60
DOI. 10.1137/070706616
1. Introduction. We introduce a new unifying framework for hybridization of finite element methods for second order elliptic problems. This framework is unifying in the sense that it includes as particular cases hybridized versions of mixed methods [4, 11, 26], the continuous Galerkin (CG) method [31], and a new, wide class of hybridizable discontinuous Galerkin (DG) methods. The unifying framework allows us to (i) significantly reduce the number of the globally coupled degrees of freedom of DG methods, (ii) use different methods in different parts of the computational domain and automatically couple them, and (iii) devise novel methods employing new mortaring techniques. We develop the unifying framework on the following model elliptic boundary value problem of second order written in mixed form:
$$\begin{aligned} \mathbf{q} + a\,\operatorname{grad} u &= 0 && \text{on } \Omega, && \text{(1.1a)} \\ \operatorname{div} \mathbf{q} + d\,u &= f && \text{on } \Omega, && \text{(1.1b)} \\ u &= g && \text{on } \partial\Omega. && \text{(1.1c)} \end{aligned}$$
∗ Received by the editors October 29, 2007; accepted for publication (in revised form) November 7, 2008; published electronically February 25, 2009. http://www.siam.org/journals/sinum/47-2/70661.html
† School of Mathematics, University of Minnesota, Minneapolis, MN 55455. This author's research was supported in part by the National Science Foundation (grant DMS-0411254) and by the University of Minnesota Supercomputing Institute.
‡ Department of Mathematics, University of Florida, Gainesville, FL 32611–8105. This author's research was supported in part by the National Science Foundation (grants DMS-0410030, DMS-0713833, and SCREMS-0619080).
§ Department of Mathematics, Texas A&M University, College Station, TX 77843–3368 (lazarov@math.tamu.edu). This author's research was supported in part by the National Science Foundation (grants NSF-DMS-0713829 and NSF-CNS-ITR-0540136).
Here $\Omega \subset \mathbb{R}^n$ is a polyhedral domain ($n \ge 2$), $d(x)$ is a scalar nonnegative function, and $a(x)$ is a matrix-valued function that is symmetric and uniformly positive definite on $\Omega$. In addition, we assume that the function $g$ is the restriction of a smooth scalar function on $\partial\Omega$ and that the functions $f$, $d$, and $a$ are smooth on $\Omega$. These assumptions can be vastly generalized, but we take them for the sake of a transparent presentation of the design of our unifying framework.

1.1. The structure of the methods of the unifying framework. Let us begin the description of our results by arguing that what makes possible the construction of the unified framework is that all the numerical methods fitting in it are constructed by using a discrete version of a single property of the exact solution of problem (1.1). This property is a characterization of the values of the exact solution $u$ on the interior boundaries of each of the elements $K$ of any triangulation $\mathcal{T}_h$ of the domain $\Omega$. Let us describe it. If on the border $\partial K$ of the element $K$ we set $u = \lambda + g$, where
\[
\lambda = \begin{cases} u & \text{on } \partial K \setminus \partial\Omega,\\ 0 & \text{on } \partial K \cap \partial\Omega, \end{cases}
\qquad\text{and}\qquad
g = \begin{cases} 0 & \text{on } \partial K \setminus \partial\Omega,\\ g & \text{on } \partial K \cap \partial\Omega, \end{cases}
\tag{1.2}
\]
then, by the linearity of the problem, we have that
\[
(q, u) = (Q\lambda + Qg + Qf,\; U\lambda + Ug + Uf) \quad \text{in } \Omega, \tag{1.3}
\]
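where the so-called local solvers $(Q(\cdot), U(\cdot))$ are defined on the element $K \in \mathcal{T}_h$ as follows. For any single-valued functions $m$ in $L^2(\partial K)$ and $f$ in $L^2(K)$, the functions $(Qm, Um)$ and $(Qf, Uf)$ are the solutions of
\begin{gather}
c\,Qm + \operatorname{grad} Um = 0, \qquad \operatorname{div} Qm + d\,Um = 0 \quad \text{on } K, \qquad Um = m \quad \text{on } \partial K, \tag{1.4a}\\
c\,Qf + \operatorname{grad} Uf = 0, \qquad \operatorname{div} Qf + d\,Uf = f \quad \text{on } K, \qquad Uf = 0 \quad \text{on } \partial K, \tag{1.4b}
\end{gather}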
where $c = a^{-1}$ for each element $K \in \mathcal{T}_h$. Conversely, the above property holds if and only if (see, for example, [46]) the normal component of $Q\lambda + Qg + Qf$ across interelement boundaries is continuous. We thus see that this transmission condition, which we formally express as
\[
[\![ Q\lambda + Qg + Qf ]\!] = 0, \tag{1.5}
\]
completely characterizes the function $\lambda$. Here $[\![\cdot]\!]$ denotes the jump of the normal component of a vector across $\partial K$. The finite element methods of the unified framework are those that can be expressed as a discrete version of the above property. In this way, the only globally coupled degrees of freedom are bound to be those describing the approximation to $\lambda$. Thus, each of those methods provides an approximate solution of the form
\[
(q_h, u_h) = (Q\lambda_h + Qg_h + Qf,\; U\lambda_h + Ug_h + Uf), \tag{1.6}
\]
where λh , respectively, gh , is an approximation in some finite-dimensional space Mh , respectively, Mh , of the values of u on the faces of the elements lying in the interior, respectively, in the border of Ω, and (Qm, Um) and (Qf, Uf ) are discrete versions of the exact local solvers (1.4)—we keep the same notation for the sake of simplicity. Moreover, the methods are such that λh can be determined by a discrete version of transmission condition (1.5), which we write as follows: (1.7)
ah (λh , μ) = bh (μ)
for all μ ∈ Mh .
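To make the characterization (1.2)–(1.5) concrete, the following minimal sketch (ours, not taken from the paper) works it out in one space dimension, assuming $a = 1$, $d = 0$, $f = 1$, and $g = 0$ on $\Omega = (0, 1)$ with a uniform mesh, and using the exact local solvers rather than discrete ones. Under these assumptions $Um$ is the linear interpolant of the two endpoint values of $m$, $Qf = x - x_{\mathrm{mid}}$ on each element, and the transmission condition (1.5) at each interior node reduces to one row of a tridiagonal system for the nodal values of $\lambda$. The function name `solve_trace_1d` is hypothetical.

```python
import numpy as np


def solve_trace_1d(n_elements):
    """Trace values lambda for -u'' = 1 on (0, 1), u(0) = u(1) = 0 (assumed model).

    With exact local solvers on an element of width h:
      Q m = -(m_right - m_left) / h      (flux of the linear lifting U m)
      Q f = x - x_mid                    (flux of U f when f = 1)
    The transmission condition [[Q lambda + Q f]] = 0 at interior node i reads
      (lambda[i-1] - 2 lambda[i] + lambda[i+1]) / h + h = 0.
    """
    h = 1.0 / n_elements
    n_int = n_elements - 1
    # Tridiagonal matrix coupling only the trace unknowns at interior nodes.
    A = (2.0 * np.eye(n_int) - np.eye(n_int, k=1) - np.eye(n_int, k=-1)) / h
    b = h * np.ones(n_int)
    lam = np.linalg.solve(A, b)
    nodes = np.linspace(0.0, 1.0, n_elements + 1)[1:-1]
    return nodes, lam


nodes, lam = solve_trace_1d(8)
exact = 0.5 * nodes * (1.0 - nodes)   # exact solution u(x) = x (1 - x) / 2
print(np.max(np.abs(lam - exact)))    # difference is at rounding level
```

In this toy setting only the nodal values of $\lambda$ are globally coupled; the fields $(q, u)$ inside each element can afterwards be recovered element by element from (1.3).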
In [26], where the hybridization of mixed methods was considered, the equation determining λh was called the jump condition. In our setting, it is called the conservativity condition to reflect the incorporation into the framework of DG and CG methods. Note that all the methods in the unified framework provide approximations for (q, u) in the interior of the elements K ∈ Th , (q h , uh ), as well as an approximation of u on the interior border of the elements λh ; this is why they are called hybrid. This is in agreement with the definition of hybrid methods proposed in [22, p. 421]: “we may define more generally as a hybrid method any finite element method based on a formulation where one unknown is a function, or some of its derivatives, on the set Ω, and the other unknown is the trace of some of its derivatives of the same function, or the trace of the function itself, along the boundaries of the set K.” Here K denotes a typical element of the triangulation. A long list of hybrid methods can be found in [22, 12, 51]. Of course, not every finite element method displays the above roughly described structure; in particular, it might not even be a hybrid method. However, many such methods can be rewritten as hybrid methods; this process is what can be called the hybridization of a finite element method. We say that we can hybridize a given finite element method if we can find a hybrid method (part) of whose solution coincides with the solution of the given method. The original finite element method is called hybridizable, and the hybrid method is then said to be a hybridization of the original method; for short, we call it a hybridized method. Next, we give a brief overview of the hybridization techniques of relevance for our purposes. 1.2. Hybridization of finite element methods. The first hybridization of a finite element method was proposed in 1965 [39] for a numerical method for solving the equations of linear elasticity. 
Perhaps because it was then intended as an implementation technique, the distinction between hybridization and static condensation, a widely known algebraic manipulation for size reduction of already assembled matrices, is seldom made in the engineering literature. However, in 1985 [4], hybridization was shown to be more than an implementation trick as it was proven that the new unknown λh , also interpreted to be the Lagrange multiplier associated with a continuity condition on the approximate flux, contains extra information about the exact solution. This was used to enhance the accuracy of the approximation by means of a local postprocessing [4, 11, 35]; see also [10]. After yet another two decades, a new perspective on hybridization emerged [26], and the characterization of the approximate trace λh as the solution of weak formulation (1.7) was introduced; this was done in the setting of the hybridization of the Raviart–Thomas (RT) and Brezzi–Douglas–Marini (BDM) mixed methods of arbitrary degree. The special case of the lowest order RT method had been previously considered in [21] within the framework of a study of the equivalence of mixed and nonconforming methods. In [26], it was shown that formulation (1.7) not only simplifies the task of assembling the stiffness matrix for the multiplier but can be used to establish unsuspected links between apparently unrelated mixed methods. It was also shown that it allows the devising and analysis of new, variable degree versions of those methods [27]. This new hybridization approach was later extended to finite element methods for the stationary Stokes equations using spaces of exactly divergence-free velocities; it was intended as an effective technique to bypass the extremely difficult construction of such spaces. It was successfully applied to a DG method [15] and to a mixed method for Stokes flow [28, 29]. For a review of these results, see [30]. Recently [31],
this hybridization approach was applied to the CG method to pave the way for the computation of an H(div)-conforming approximation of the flux from the CG solution.

1.3. Hybridization of DG methods. In this paper, we continue this effort and show how to hybridize a large class of DG methods. Thus, we show that their approximate solution $(q_h, u_h)$ can be expressed as in (1.6) and that the approximate trace $\lambda_h$, which is nothing but the so-called numerical trace $\widehat{u}_h$ on the interelement boundaries (see [5]), satisfies weak formulation (1.7). In other words, we identify a class of DG methods whose globally coupled degrees of freedom are those of the numerical trace $\widehat{u}_h$ only; this results in an efficient implementation of these methods, as we argue below. In this way, the main disadvantages of DG methods for elliptic problems compared to other methods, namely, a higher number of globally coupled degrees of freedom for the same mesh and a lower sparsity of the corresponding stiffness matrices, are eliminated to a significant extent. The simplest examples of such methods are obtained by using a DG method to define the local solvers and by taking what could be called the corresponding natural choice for the space $M_h$ for the approximate trace $\lambda_h$. For example, we can use the local discontinuous Galerkin (LDG) method to define the local solvers and construct a hybridizable DG method. Surprisingly, it turns out that the resulting DG method is not an LDG method but one of the DG methods considered in [17]; see Corollary 3.2. A similar result holds for the hybridizable DG methods whose local solvers are the interior penalty (IP) method, that is, the resulting method is not the original IP method but the IP-like method considered in [38]; see Corollary 3.4. This is in sharp contrast with the RT, BDM, and CG methods, each of which can be hybridized by using as local solvers the RT, BDM, and CG methods, respectively.
It is interesting to note that the only known DG methods that turn out to be hybridizable by our technique are the following: a subset of the methods considered in [17], the minimal dissipation DG methods considered in [20], the minimal dissipation LDG method analyzed in [24], and the DG method considered in [38] and then rewritten as an IP method in [37]. With the exception of some LDG methods, none of the DG methods considered in the unified analysis of DG methods carried out in [5] is a hybridizable DG method. The reason is, roughly speaking, as follows. For all methods considered in [5], the variable $q_h$ is easily eliminated from the equations due to the fact that the numerical trace $\widehat{u}_h$ is independent of $q_h$ or $\operatorname{grad} u_h$; a primal formulation can then be found solely in terms of $u_h$. In contrast, in our approach, we eliminate both $q_h$ and $u_h$ from the equations and obtain a formulation in terms of $\widehat{u}_h$ only, namely, (1.7). For this, it turns out that we need $\widehat{u}_h$ to be dependent on $q_h$ or $\operatorname{grad} u_h$, except for a few special LDG methods.

1.4. Properties of the algebraic system of hybridizable DG methods. As pointed out above, since the degrees of freedom of the functions $\mu$ in the finite element space $M_h$ are associated with the borders of the elements only, the stiffness matrix associated with weak formulation (1.7) of the numerical trace $\widehat{u}_h = \lambda_h$ is significantly smaller than the one associated with the original variables $(q_h, u_h)$. Moreover, the actual computation of the approximate solution of DG methods becomes competitive with that of hybridized mixed methods. For example, as we show below, on triangulations made of simplexes, the stiffness matrix associated with weak formulation (1.7) of any hybridizable DG method has the same size, block structure, and sparsity as the corresponding hybridized BDM [11] and RT [49] mixed methods; see [26] for details.
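The size reduction described above can be illustrated by a simple count. The sketch below is ours and rests on stated assumptions that do not come from the paper: a uniform triangulation of the unit square obtained by splitting an $n \times n$ grid of squares into $2n^2$ triangles, polynomials of degree $k$ on each triangle for $u_h$ and for each of the two components of $q_h$, and trace functions of degree $k$ on each interior edge. The function name `dof_counts` is hypothetical.

```python
def dof_counts(n, k):
    """Unknown counts on an n-by-n square grid split into 2*n*n triangles (assumed mesh)."""
    n_triangles = 2 * n * n
    # 3n^2 + 2n edges in total, of which 4n lie on the boundary of the square.
    n_interior_edges = 3 * n * n - 2 * n
    per_triangle = (k + 1) * (k + 2) // 2     # dimension of degree-k polynomials in 2D
    u_dofs = n_triangles * per_triangle       # unknowns for the scalar variable u_h
    qu_dofs = 3 * u_dofs                      # two flux components plus u_h
    trace_dofs = n_interior_edges * (k + 1)   # degree-k trace on each interior edge
    return u_dofs, qu_dofs, trace_dofs


for k in (1, 2, 4):
    print(k, dof_counts(10, k))
```

Under these assumptions the number of trace unknowns grows linearly in $k$ per edge, whereas the element unknowns grow quadratically in $k$ per triangle, so the gap widens as the polynomial degree increases.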
Even more, it was recently proved (see [25, Property (iii) of Theorem 2.4]) that the stiffness matrices of the hybridized BDM and RT methods and the so-called single face hybridizable DG method are, in fact, identical provided $d = 0$.

1.5. New automatic coupling of different methods and mortaring techniques. One of the main features of the unified framework is that it allows for a single implementation of a vast class of finite element methods including DG, mixed, nonconforming, and CG methods and for their automatic coupling. Since it can be done even in the presence of nonmatching meshes, the unified framework provides a novel coupling and mortaring technique. This induces a paradigm shift in the way we view different finite element methods fitting in the framework, especially when considering adaptive algorithms. Indeed, since all these methods can be implemented within a single framework, the issue is now to investigate which method to use in what part of the domain in order to fully exploit its individual advantages. Let us briefly compare our new mortaring technique with the already established ones. Mortaring techniques (see the pioneering work [9]) were introduced to accommodate methods that can be defined in separate subdomains that could have been independently meshed. This technique introduces an auxiliary space for a Lagrange multiplier associated with a continuity constraint on the approximate solution. The resulting system could be written either as a saddle point problem, symmetric but indefinite [8], or as a nonconforming finite element approximation, which leads to a symmetric positive definite system; see, for example, [9, 42]. This classical mortaring is a powerful technique to achieve flexibility in the meshing and the choice of the finite element approximation. The work in this direction also includes coupling of mixed and CG [53], mixed and mixed finite element methods [2, 45], and DG and mixed methods [40].
However, this mortaring approach is very different from ours, since instead of enforcing the continuity of the approximation to u, we enforce a continuity condition on the approximation to the flux q. The way of coupling and mortaring provided by the unified framework represents a simpler alternative to the above-mentioned mortaring techniques, as well as to earlier works on the coupling of CG and DG methods implicitly contained in [5] and explicitly emphasized in [48], as well as to the coupling of DG and mixed methods introduced in [23] and in [50]. 1.6. Devising new methods. The unified framework provides a new point of view for constructing new methods. We provide three main examples of such methods. The first one is a family of methods well suited for hp-adaptivity and for dealing with nonmatching meshes. On each element K ∈ Th , it uses local solvers obtained from the RT, BDM, LDG, or CG methods by means of a suitable modification of the definition of the numerical trace of the flux of some faces of K only. For example, by modifying the numerical trace of the CG-H method on the element faces lying on the nonmatching interface, we allow the method to handle nonmatching grids. This method represents an alternative to the coupling of DG and CG methods proposed in [48]. The second example is a variable-degree RT method that can be used on some classes of nonconforming meshes. The third example is called the embedded DG (EDG) method; it was introduced in the setting of shell problems in [43]. An EDG method is obtained from an already existing hybridizable method by simply modifying the space Mh . This capability can be used as a new mortaring technique for dealing with nonmatching meshes, as we are going to see. Moreover, some EDG methods give rise to a stiffness matrix whose size and sparsity is exactly equal to that of the statically condensed stiffness matrix of the CG method, while retaining the stabilization mechanisms typical of DG methods; see [43]. 
As a consequence, EDG methods can immediately be incorporated into existing commercial codes. Related to EDG methods are the so-called multiscale DG methods [44, 14], which were introduced with a similar intention but a different approach.

1.7. Possibilities and recent developments. The unified framework could be used to establish a single a priori and a single a posteriori error analysis of all the methods fitting in it. It could be used to compare different methods or to establish new relations between them just as the unsuspected relation between the RT and the BDM methods in [26] was recently uncovered by comparing their hybridized versions. The framework could also be used to further explore the relation between mixed and nonconforming methods like the relation between the RT method of lowest order and a nonconforming method established in [4] and exploited in [47]. This work was later generalized in [1], where links between a variety of mixed and nonconforming methods were established; see also the references therein. Finally, the unifying framework can be used to devise new preconditioners based on, for example, substructuring techniques. However, in this paper, none of the above-mentioned issues will be investigated. On the other hand, several discoveries induced by the unifying framework have already taken place. In particular, new DG methods which are more accurate and efficient than any other known DG method have been uncovered. Indeed, by exploiting the structure of the unified framework, a new DG method called the single face, hybridizable (SFH) DG method was constructed, which lies in between the RT and BDM methods; see [25]. It is the first known DG method, using polynomials of degree $k$ for both $q_h$ and $u_h$, proven to converge with order $k + 1$ in both variables; all other DG methods converge with order $k$ in the flux only. Moreover, the SFH method shares with the RT and BDM methods their remarkable superconvergence properties; this allows for the element-by-element computation of a new approximation $u_h^*$ converging with order $k + 2$.
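Convergence orders such as those quoted above are commonly checked in practice by comparing errors on successively refined meshes. The following sketch is ours; it uses synthetic errors of the form $C h^{k+1}$ (not results from any of the cited works) and estimates the observed order from consecutive mesh sizes via $\log(e_i / e_{i+1}) / \log(h_i / h_{i+1})$. The function name `observed_orders` is hypothetical.

```python
import math


def observed_orders(mesh_sizes, errors):
    """Estimate convergence orders from errors on a sequence of meshes."""
    orders = []
    for i in range(len(mesh_sizes) - 1):
        ratio_e = errors[i] / errors[i + 1]
        ratio_h = mesh_sizes[i] / mesh_sizes[i + 1]
        orders.append(math.log(ratio_e) / math.log(ratio_h))
    return orders


k = 2
mesh_sizes = [0.5 ** j for j in range(1, 5)]
# Synthetic data only: errors behaving exactly like C * h^(k+1) with C = 3.
errors = [3.0 * h ** (k + 1) for h in mesh_sizes]
print(observed_orders(mesh_sizes, errors))
```

For data that follow $C h^{k+1}$ exactly, every estimated order equals $k + 1$ up to rounding; with real computations the estimates only approach the theoretical order as the mesh is refined.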
These results were then extended to other hybridizable DG methods in [33]. Therein, it was shown that in order to achieve the above-mentioned convergence properties, the interelement jumps of both unknowns have to be penalized essentially in the same way. This goes against the established belief that the interelement jumps of $u_h$ need to be strongly penalized, while the interelement jumps of $q_h$ need not be. Also recently, a study of EDG methods obtained from hybridizable DG methods by forcing the numerical trace to be continuous has been carried out in [32]. It was proven that these EDG methods lose the above-mentioned convergence properties because the numerical trace $\widehat{q}_h$ is not single valued. Moreover, numerical evidence was provided indicating that this loss of accuracy of the EDG method is not compensated by the computational advantage of having a reduced amount of globally coupled degrees of freedom. Hybridizable DG methods, with properly chosen penalization parameters, are thus more efficient than their EDG counterparts.

1.8. Organization of the paper. The paper is organized as follows. In section 2, we describe the general structure of the hybridized finite element methods and prove that the approximate trace $\lambda_h$ is characterized as the solution of a weak formulation of the form (1.7); see Theorem 2.1. We then provide sufficient conditions for the existence and uniqueness of the solution $\lambda_h$; see Theorem 2.4. Further in this section we give some implementation details and compare the memory requirements of hybridizable methods with those of some classical DG methods. In section 3, we give several examples of hybridizable finite element methods. These include mixed methods using RT and BDM finite element spaces, a large variety of DG, CG, and some nonconforming finite element methods. In section 4, we build on the results of
the previous section and construct the above-mentioned novel hybridizable methods. Finally, in section 5, we conclude the paper with a few extensions and some final remarks. 2. The general framework of hybridization. In this section, we display the structure of hybridized finite element methods for second order elliptic problem (1.1). We begin by presenting the exact definition of the linear forms appearing in the weak formulation of the form (1.7), determining the approximate trace λh . We then provide sufficient conditions for the existence and uniqueness of λh and show that the assembly of the corresponding matrix equation can be done in a typical finite element fashion. We end by describing the sparsity structure of the stiffness matrix and comparing it with that of the stiffness matrices of the hybridized RT, IP, and LDG methods. 2.1. Notation. We use the notation used in [5]; let us recall it. Let Th be a collection of disjoint elements that partition Ω. The shape of the elements is not important in this general framework. Moreover, triangulation Th need not be conforming (we say that a triangulation Th is conforming if whenever the intersection of the boundaries of any two elements has nonzero (n − 1)-Lebesgue measure, the intersection is a face of each of the elements). So, Th can be a collection of simplices, quadrilaterals, cubes, or a mixture of them which are not required to align across element interfaces. An interior “face” of Th is any planar set e of positive (n − 1)dimensional measure of the form e = ∂K + ∩ ∂K − for some two elements K + and K − of the collection Th . (We use the word “face” even when n = 2.) We say that e is a boundary face if there is an element K of Th such that e = ∂K ∩ ∂Ω and the (n − 1)-Lebesgue measure of e is not zero. Let E◦h and E∂h denote the set of interior and boundary faces of Th , respectively. We denote by Eh the union of all the faces in E◦h and E∂h . 
In all our examples, elements of $\mathcal{E}_h^\circ$ and $\mathcal{E}_h^\partial$ are affine sets, although that is not required for the considerations in this section. Finite element methods based on the mesh $\mathcal{T}_h$ typically use some finite-dimensional polynomial approximation spaces on each element of $\mathcal{T}_h$. On an element $K$, we denote by $V(K)$ the polynomial space in which the flux $q$ is approximated and by $W(K)$ the space in which the scalar solution $u$ is approximated. The corresponding global finite element spaces are defined by
\[
V_h = \{ v : v|_K \in V(K) \} \qquad\text{and}\qquad W_h = \{ w : w|_K \in W(K) \}. \tag{2.1}
\]
On an interior face $e = \partial K^+ \cap \partial K^-$, we consider scalar and vector functions that are, in general, double valued. For any discontinuous (scalar or vector) function $q$ in $W_h$ or $V_h$, the trace $q|_e$ is a double-valued function, whose two branches are denoted by $(q|_e)_{K^+}$ and $(q|_e)_{K^-}$. To simplify the notation, we often shorten these to $q_{K^+}$ and $q_{K^-}$, respectively. These branches are defined by $q_{K^\pm}(x) = \lim_{\epsilon \downarrow 0} q(x - \epsilon\, n_{K^\pm})$ for all $x$ in $e$. Here and elsewhere, $n$ denotes the double-valued function of unit normals on $\mathcal{E}_h$, so on any face $e \subseteq \partial K$, $n_K$ denotes the unit outward normal of $K$. The same notations are used for vector functions. For any double-valued vector function $r$ on an interior face $e$, we define the jump of its normal component across the face $e$ by
\[
[\![ r ]\!]_e := r_{K^+} \cdot n_{K^+} + r_{K^-} \cdot n_{K^-}.
\]
On any face $e$ of $K$ lying on the boundary, we set $[\![ r ]\!]_e := r_K \cdot n_K$.
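To simplify the exposition, we use $[\![ r ]\!]$ to denote the single-valued function on the entire set $\mathcal{E}_h$, which is equal to $[\![ r ]\!]_e$ on every face $e \in \mathcal{E}_h$. Similarly, for any $e \in \mathcal{E}_h^\circ$, we define
\[
\{\!\{ \xi \}\!\}_e = \tfrac{1}{2}\,(\xi_{K^+} + \xi_{K^-}), \qquad
\{\!\{ q \}\!\}_e = \tfrac{1}{2}\,(q_{K^+} + q_{K^-}), \qquad
[\![ \xi ]\!]_e = \xi_{K^+}\, n_{K^+} + \xi_{K^-}\, n_{K^-}.
\]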
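The jump and average operators just defined act on the two branches of a double-valued function at a point of an interior face. The following sketch is ours, with hypothetical function names (`normal_jump`, `scalar_jump`, `average`); it simply evaluates the three formulas for given branch values and outward unit normals and is meant only to fix the sign conventions.

```python
import numpy as np


def normal_jump(r_plus, n_plus, r_minus, n_minus):
    """[[r]]_e = r_{K+} . n_{K+} + r_{K-} . n_{K-} for a vector function r."""
    return float(np.dot(r_plus, n_plus) + np.dot(r_minus, n_minus))


def scalar_jump(xi_plus, n_plus, xi_minus, n_minus):
    """[[xi]]_e = xi_{K+} n_{K+} + xi_{K-} n_{K-} for a scalar function xi."""
    return (xi_plus * np.asarray(n_plus, dtype=float)
            + xi_minus * np.asarray(n_minus, dtype=float))


def average(v_plus, v_minus):
    """{{v}}_e = (v_{K+} + v_{K-}) / 2 for scalar or vector branches."""
    return 0.5 * (np.asarray(v_plus, dtype=float) + np.asarray(v_minus, dtype=float))


# A vertical face shared by K+ (on the left) and K- (on the right).
n_plus = np.array([1.0, 0.0])
n_minus = -n_plus

r_single_valued = np.array([2.0, -1.0])
print(normal_jump(r_single_valued, n_plus, r_single_valued, n_minus))
print(normal_jump([3.0, 0.0], n_plus, [1.0, 0.0], n_minus))
print(average(1.0, 3.0), scalar_jump(1.0, n_plus, 3.0, n_minus))
```

Because the two outward normals are opposite on an interior face, the normal jump of a single-valued vector function vanishes, which is the property exploited by the transmission condition.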
For a boundary face $e$ in $\mathcal{E}_h^\partial$, the operator $\{\!\{\cdot\}\!\}_e$ is also considered to be the identity, so that we can put together local operators $\{\!\{\cdot\}\!\}_e$ to form a global operator $\{\!\{\cdot\}\!\}$ on $\mathcal{E}_h$, just as we did for $[\![\cdot]\!]$.

Our notation for inner products is standard: For functions $u$ and $v$ in $L^2(D)$, we write $(u, v)_D = \int_D uv \, dx$ if $D$ is a domain of $\mathbb{R}^n$ and $\langle u, v \rangle_D = \int_D uv \, dx$ if $D$ is a domain of $\mathbb{R}^{n-1}$. To emphasize the mesh-dependent nature of certain integrals, we introduce the notation
\[
(v, w)_{\mathcal{T}_h} = \sum_{K \in \mathcal{T}_h} (v, w)_K \qquad\text{and}\qquad \langle \mu, \lambda \rangle_{\mathcal{E}} = \sum_{e \in \mathcal{E}} \langle \mu, \lambda \rangle_e
\]
for functions $v$, $w$ and $\mu$, $\lambda$ defined on $\Omega$ and $\mathcal{E}_h$, respectively. Here $\mathcal{E}$ is any subset of $\mathcal{E}_h$.

2.2. The general structure of the methods. To describe the structure of the methods fitting in the unified framework, we mimic the characterization of the exact solution given in the Introduction. Thus, we begin by choosing the space $\mathsf{M}_h$ of approximate traces, by taking the approximation $\lambda_h$ to $\lambda$ in
\[
M_h := \{ \mu \in \mathsf{M}_h : \mu = 0 \text{ on } \partial\Omega \}, \tag{2.2}
\]
and by setting $g_h = I_h g$, where $I_h$ is a suitably defined interpolation operator with image in $\mathsf{M}_h$. Recall that $g$ is the extension by zero of the Dirichlet data on $\partial\Omega$ to $\mathcal{E}_h^\circ$; see (1.2). Next, we introduce a discrete version of local solvers (1.4a) and (1.4b). The first local solver maps each function $m$ in $\mathsf{M}_h$ to the function $(Qm, Um)$ on $\Omega$, whose restriction to any mesh element $K$ is in $V(K) \times W(K)$ and satisfies the following discretization of (1.4a):
\begin{align}
(c\,Qm, v)_K - (Um, \operatorname{div} v)_K &= -\langle m, v \cdot n \rangle_{\partial K} && \text{for all } v \in V(K), \tag{2.3a}\\
-(\operatorname{grad} w, Qm)_K + \langle w, \widehat{Q}m \cdot n \rangle_{\partial K} + (d\,Um, w)_K &= 0 && \text{for all } w \in W(K). \tag{2.3b}
\end{align}
Here $\widehat{Q}m$ represents the numerical trace of the flux, which is, in general, a double-valued function on $\mathcal{E}_h^\circ$. In inner products involving $\widehat{Q}m$ over a single simplex boundary $\partial K$, the integrand is assumed to be the branch $(\widehat{Q}m)_K$ from that simplex. In all examples we consider in this paper, numerical flux $\widehat{Q}m$ is either expressed explicitly in terms of $(Qm, Um)$ or is an unknown function. In the examples where the latter case arises, we introduce the space in which the unknown $\widehat{Q}m$ lies and add new equations to render the resulting formulation uniquely solvable. At this point, however, the precise definition of $\widehat{Q}m$ is not essential, as we are solely interested in displaying the structure of the method for any $\widehat{Q}m$. Below, we formally require $m \mapsto (Qm, \widehat{Q}m, Um)$ to be a well-defined linear map; see Assumption 2.1.

The second local solver is a discretization of the second boundary value problem in (1.4b). It associates to any $f \in L^2(\Omega)$ the pair $(Qf, Uf)$, whose restriction to each
element $K$ is defined as the function in $V(K) \times W(K)$ satisfying
\begin{align}
(c\,Qf, v)_K - (Uf, \operatorname{div} v)_K &= 0 && \text{for all } v \in V(K), \tag{2.4a}\\
-(\operatorname{grad} w, Qf)_K + \langle w, \widehat{Q}f \cdot n \rangle_{\partial K} + (d\,Uf, w)_K &= (f, w)_K && \text{for all } w \in W(K). \tag{2.4b}
\end{align}
Just as for the first local solver, we leave undefined the numerical trace $\widehat{Q}f$. Obviously, while the functions $(Qf, Uf)|_K$ and $(Qm, Um)|_K$ are in $V(K) \times W(K)$, the space in which $\widehat{Q}f$ and $\widehat{Q}m$ lie will vary from example to example. Now we make our assumption about the local solvers.

Assumption 2.1 (existence and uniqueness of the local solvers). For every $m$ in $M_h$, there is a unique set of functions $(Qm, \widehat{Q}m, Um)$ depending linearly on $m$ and satisfying (2.3). Furthermore, for every $f$ in $L^2(\Omega)$, there is a unique set of functions $(Qf, \widehat{Q}f, Uf)$ depending linearly on $f$ and satisfying (2.4).

Each of the methods under consideration defines an approximation to $(q, u)$,
\[
(q_h, u_h) = (Q\lambda_h + Qg_h + Qf,\; U\lambda_h + Ug_h + Uf) \in V_h \times W_h, \tag{2.5}
\]
where $\lambda_h$ is assumed to be determined by the following discrete version of transmission condition (1.5):
\[
\langle \mu, [\![ \widehat{Q}\lambda_h + \widehat{Q}g_h + \widehat{Q}f ]\!] \rangle_{\mathcal{E}_h} = 0 \qquad \text{for all } \mu \in M_h. \tag{2.6}
\]
If we define the numerical flux by
\[
\widehat{q}_h := \widehat{Q}\lambda_h + \widehat{Q}g_h + \widehat{Q}f, \tag{2.7}
\]
and if the (extension by zero to $\mathcal{E}_h$ of the) function $[\![ \widehat{q}_h ]\!]|_{\mathcal{E}_h^\circ}$ belongs to the space $M_h$, then condition (2.6) is simply stating that $[\![ \widehat{q}_h ]\!]|_{\mathcal{E}_h^\circ} = 0$ pointwise, that is, the normal component of the numerical trace $\widehat{q}_h$ is single valued, or, adopting the terminology of [5], the function $\widehat{q}_h$ is a conservative numerical flux. It is for this reason we call (2.6) the conservativity condition. If the function $[\![ \widehat{q}_h ]\!]|_{\mathcal{E}_h^\circ}$ does not belong to the space $M_h$, the conservativity condition imposes only the weak continuity of the normal component of the numerical trace $\widehat{q}_h$, which, as a consequence, is not single valued.

It is worth noting that the method just described can be viewed as seeking the approximation $(q_h, u_h, \lambda_h)$ in $V_h \times W_h \times M_h$ satisfying
\begin{align}
(c\,q_h, r)_{\mathcal{T}_h} - (u_h, \operatorname{div} r)_{\mathcal{T}_h} + \sum_{K \in \mathcal{T}_h} \langle \lambda_h, r \cdot n \rangle_{\partial K \setminus \partial\Omega} &= -\langle g_h, r \cdot n \rangle_{\partial\Omega} && \text{for all } r \in V_h, \tag{2.8a}\\
-(q_h, \operatorname{grad} w)_{\mathcal{T}_h} + \sum_{K \in \mathcal{T}_h} \langle \widehat{q}_h \cdot n, w \rangle_{\partial K} + (d\,u_h, w)_{\mathcal{T}_h} &= (f, w)_{\mathcal{T}_h} && \text{for all } w \in W_h, \tag{2.8b}\\
\sum_{K \in \mathcal{T}_h} \langle \mu, \widehat{q}_h \cdot n \rangle_{\partial K} &= 0 && \text{for all } \mu \in M_h. \tag{2.8c}
\end{align}
Note that the first two equations are used to define local solvers (2.3) and (2.4), while the last is nothing but conservativity condition (2.6). This type of method is sometimes called a hybrid dual-mixed method. As pointed out in the Introduction, it is
called mixed because we seek approximations for the flux $q_h$, as well as the potential $u_h$, on $\Omega$. It is called hybrid dual because the approximate trace $\lambda_h$ associated with the conservativity condition is an approximation for the trace of the potential $u$ on the boundaries of the elements. Many hybridized finite element methods admit this structure. For example, some classic hybridized mixed methods [4, 26] are obtained by an appropriate choice of the local spaces and by choosing $\widehat{Q}(\cdot)$ in such a way that we have $\widehat{q}_h = q_h$. Many DG methods also fall into this form, although not all of them are hybridizable. Indeed, the schemes considered in the unified analysis of DG methods in [5] can be written in our notation as
\begin{align*}
(c\,q_h, v)_{\mathcal{T}_h} - (u_h, \operatorname{div} v)_{\mathcal{T}_h} + \sum_{K \in \mathcal{T}_h} \langle \widehat{u}_h, v \cdot n \rangle_{\partial K \setminus \partial\Omega} &= -\langle g_h, v \cdot n \rangle_{\partial\Omega},\\
-(\operatorname{grad} w, q_h)_{\mathcal{T}_h} + \sum_{K \in \mathcal{T}_h} \langle w, \widehat{q}_h \cdot n \rangle_{\partial K} + (d\,u_h, w)_{\mathcal{T}_h} &= (f, w)_{\mathcal{T}_h},
\end{align*}
where $\widehat{u}_h$ and $\widehat{q}_h$ are the so-called numerical traces of the DG method. Comparing these equations with (2.8) of our general framework, we immediately realize that $\widehat{u}_h = \lambda_h$ on $\mathcal{E}_h^\circ$. We thus see that, for a finite element method to be hybridizable, its numerical trace $\widehat{u}_h$ must be single valued. This implies, in particular, that the DG methods in [5] that are not adjoint consistent cannot be hybridized by using our technique. In contrast, the (normal component of the) numerical trace $\widehat{q}_h$ is not required to be single valued, since conservativity condition (2.6) does not always ensure a single-valued numerical trace. Thanks to this flexibility, the CG method and the EDG methods turn out to be hybridizable. This concludes the description of the general structure of the methods. Methods with this structure include a wide class of DG and hybridized mixed and CG methods, as we show in sections 3, 4, and 5.

2.3. The characterization of the variable $\lambda_h$. As we see next, the relevance of the methods fitting the previously described general structure resides in the fact that $\lambda_h$ can be characterized in terms of a simple weak formulation in which none of the other variables appear.

Theorem 2.1. Suppose Assumption 2.1 on the existence and uniqueness of the local solvers holds. Then $\lambda_h \in M_h$ satisfies conservativity condition (2.6) if and only if it satisfies
\[
a_h(\lambda_h, \mu) = b_h(\mu) \qquad \text{for all } \mu \in M_h, \tag{2.9}
\]
where
\begin{align*}
a_h(\eta, \mu) ={}& (c\,Q\eta, Q\mu)_{\mathcal{T}_h} + (d\,U\eta, U\mu)_{\mathcal{T}_h} + \langle 1, [\![ (U\mu - \mu)(\widehat{Q}\eta - Q\eta) ]\!] \rangle_{\mathcal{E}_h},\\
b_h(\mu) ={}& \langle g_h, [\![ \widehat{Q}\mu ]\!] \rangle_{\mathcal{E}_h} + (f, U\mu)_{\mathcal{T}_h}\\
& - \langle 1, [\![ (U\mu - \mu)(\widehat{Q}f - Qf) ]\!] \rangle_{\mathcal{E}_h} + \langle 1, [\![ Uf\,(\widehat{Q}\mu - Q\mu) ]\!] \rangle_{\mathcal{E}_h}\\
& - \langle 1, [\![ (U\mu - \mu)(\widehat{Q}g_h - Qg_h) ]\!] \rangle_{\mathcal{E}_h} + \langle 1, [\![ (Ug_h - g_h)(\widehat{Q}\mu - Q\mu) ]\!] \rangle_{\mathcal{E}_h},
\end{align*}
for all $\eta$ and $\mu \in M_h$.
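Note that, since $\lambda_h$ is an approximation of the function $u$ on $\mathcal{E}_h^\circ$, it is natural to expect bilinear form $a_h(\cdot, \cdot)$ to be symmetric. This motivates the following observation. Bilinear form $a_h(\cdot, \cdot)$ is symmetric if and only if numerical trace $\widehat{Q}(\cdot)$ is such that
\[
\langle 1, [\![ (U\eta - \eta)(\widehat{Q}\mu - Q\mu) ]\!] \rangle_{\mathcal{E}_h} = \langle 1, [\![ (U\mu - \mu)(\widehat{Q}\eta - Q\eta) ]\!] \rangle_{\mathcal{E}_h} \tag{2.10a}
\]
for all $\eta, \mu \in M_h$. If we also have
\[
\langle 1, [\![ (U\mu - \mu)(\widehat{Q}f - Qf) ]\!] \rangle_{\mathcal{E}_h} = \langle 1, [\![ Uf\,(\widehat{Q}\mu - Q\mu) ]\!] \rangle_{\mathcal{E}_h}, \tag{2.10b}
\]
then
\[
b_h(\mu) = \langle g_h, [\![ \widehat{Q}\mu ]\!] \rangle_{\mathcal{E}_h} + (f, U\mu)_{\Omega}.
\]
All the examples in this paper satisfy the above symmetry conditions. Now we prove Theorem 2.1. Set
\begin{align}
a_h(\lambda_h, \mu) &= -\langle \mu, [\![ \widehat{Q}\lambda_h ]\!] \rangle_{\mathcal{E}_h}, \tag{2.11a}\\
b_h(\mu) &= \langle \mu, [\![ \widehat{Q}g_h + \widehat{Q}f ]\!] \rangle_{\mathcal{E}_h}, \tag{2.11b}
\end{align}
so that conservativity condition (2.6) takes the form (2.9). Theorem 2.1 then follows from the following result.

Lemma 2.2 (elementary identities). We have, for any $m, \mu \in M_h$ and $f \in L^2(\Omega)$,
\begin{align*}
\text{(i)}\quad -\langle \mu, [\![ \widehat{Q}m ]\!] \rangle_{\mathcal{E}_h} ={}& (c\,Qm, Q\mu)_{\Omega} + (d\,Um, U\mu)_{\Omega} + \langle 1, [\![ (U\mu - \mu)(\widehat{Q}m - Qm) ]\!] \rangle_{\mathcal{E}_h},\\
\text{(ii)}\quad -\langle \mu, [\![ \widehat{Q}g_h ]\!] \rangle_{\mathcal{E}_h} ={}& -\langle g_h, [\![ \widehat{Q}\mu ]\!] \rangle_{\mathcal{E}_h} + \langle 1, [\![ (U\mu - \mu)(\widehat{Q}g_h - Qg_h) ]\!] \rangle_{\mathcal{E}_h}\\
& - \langle 1, [\![ (Ug_h - g_h)(\widehat{Q}\mu - Q\mu) ]\!] \rangle_{\mathcal{E}_h},\\
\text{(iii)}\quad -\langle \mu, [\![ \widehat{Q}f ]\!] \rangle_{\mathcal{E}_h} ={}& -(f, U\mu)_{\mathcal{T}_h} + \langle 1, [\![ (U\mu - \mu)(\widehat{Q}f - Qf) ]\!] \rangle_{\mathcal{E}_h}\\
& - \langle 1, [\![ Uf\,(\widehat{Q}\mu - Q\mu) ]\!] \rangle_{\mathcal{E}_h}.
\end{align*}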
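To prove Lemma 2.2, we need some identities which follow from the equations defining the local solvers by integration by parts.

Lemma 2.3 (relation between jumps and local residuals). For any m, μ ∈ Mh, f ∈ L2(Ω), v ∈ Vh, and w ∈ Wh, the following identities hold:
\[
(c\,Qm + \operatorname{grad} Um, v)_{\mathcal{T}_h} = +\langle 1, [\![(Um - m)\,v]\!]\rangle_{\mathcal{E}_h}, \tag{2.12a}
\]
\[
(\operatorname{div} Qm + d\,Um, w)_{\mathcal{T}_h} = -\langle 1, [\![w\,(\widehat{Q}m - Qm)]\!]\rangle_{\mathcal{E}_h}, \tag{2.12b}
\]
\[
(c\,Qf + \operatorname{grad} Uf, v)_{\mathcal{T}_h} = +\langle 1, [\![Uf\,v]\!]\rangle_{\mathcal{E}_h}, \tag{2.12c}
\]
\[
(\operatorname{div} Qf + d\,Uf - f, w)_{\mathcal{T}_h} = -\langle 1, [\![w\,(\widehat{Q}f - Qf)]\!]\rangle_{\mathcal{E}_h}. \tag{2.12d}
\]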
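Using these identities, we now prove Lemma 2.2.

Proof. Let us prove identity (i) of Lemma 2.2. We have
\[
\begin{aligned}
-\langle \mu, [\![\widehat{Q}m]\!]\rangle_{\mathcal{E}_h}
&= -\langle \mu, [\![Qm]\!]\rangle_{\mathcal{E}_h} - \langle \mu, [\![\widehat{Q}m - Qm]\!]\rangle_{\mathcal{E}_h} \\
&= (c\,Q\mu, Qm)_{\mathcal{T}_h} - (U\mu, \operatorname{div} Qm)_{\mathcal{T}_h} - \langle \mu, [\![\widehat{Q}m - Qm]\!]\rangle_{\mathcal{E}_h} && \text{by (2.3a)}, \\
&= (c\,Q\mu, Qm)_{\mathcal{T}_h} + (d\,Um, U\mu)_{\mathcal{T}_h} + \langle 1, [\![U\mu\,(\widehat{Q}m - Qm)]\!]\rangle_{\mathcal{E}_h} - \langle \mu, [\![\widehat{Q}m - Qm]\!]\rangle_{\mathcal{E}_h} && \text{by (2.12b)}.
\end{aligned}
\]
This proves identity (i) of Lemma 2.2.

Now we prove identity (ii) of Lemma 2.2. To do that, note that, by identity (i) of Lemma 2.2, the bilinear form
\[
B(m, \mu) = \langle \mu, [\![\widehat{Q}m]\!]\rangle_{\mathcal{E}_h} + \langle 1, [\![(U\mu - \mu)(\widehat{Q}m - Qm)]\!]\rangle_{\mathcal{E}_h}
\]
is symmetric. As a consequence, identity (ii) of Lemma 2.2 follows from equality B(μ, gh) = B(gh, μ).

Finally, we prove identity (iii) of Lemma 2.2. We have
\[
\begin{aligned}
-\langle \mu, [\![\widehat{Q}f]\!]\rangle_{\mathcal{E}_h}
&= -\langle \mu, [\![Qf]\!]\rangle_{\mathcal{E}_h} - \langle \mu, [\![\widehat{Q}f - Qf]\!]\rangle_{\mathcal{E}_h} \\
&= (c\,Q\mu, Qf)_{\mathcal{T}_h} - (U\mu, \operatorname{div} Qf)_{\mathcal{T}_h} - \langle \mu, [\![\widehat{Q}f - Qf]\!]\rangle_{\mathcal{E}_h} && \text{by (2.3a)}, \\
&= -(f, U\mu)_{\mathcal{T}_h} + (c\,Q\mu, Qf)_{\mathcal{T}_h} + (d\,U\mu, Uf)_{\mathcal{T}_h} + \langle 1, [\![(U\mu - \mu)(\widehat{Q}f - Qf)]\!]\rangle_{\mathcal{E}_h} && \text{by (2.12d)}, \\
&= -(f, U\mu)_{\mathcal{T}_h} + (\operatorname{div} Q\mu, Uf)_{\mathcal{T}_h} + (d\,U\mu, Uf)_{\mathcal{T}_h} + \langle 1, [\![(U\mu - \mu)(\widehat{Q}f - Qf)]\!]\rangle_{\mathcal{E}_h} && \text{by (2.4a)}, \\
&= -(f, U\mu)_{\mathcal{T}_h} - \langle 1, [\![Uf\,(\widehat{Q}\mu - Q\mu)]\!]\rangle_{\mathcal{E}_h} + \langle 1, [\![(U\mu - \mu)(\widehat{Q}f - Qf)]\!]\rangle_{\mathcal{E}_h} && \text{by (2.12b)}.
\end{aligned}
\]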
This completes the proof of Lemma 2.2.

2.4. Sufficient conditions for the existence and uniqueness of λh. Next, we provide two conditions which are sufficient for the existence and uniqueness of λh. The first is a condition on the local solvers, and the second is a condition on the relation between the local solvers on each element K of triangulation Th and the global space Mh of approximate traces. It is worth emphasizing that, by guaranteeing the existence and uniqueness of λh, these simple conditions ensure the automatic coupling of the different local solvers even across nonmatching meshes. Note that these conditions involve no explicit requirement on triangulation Th.
Assumption 2.2 (on the positive semidefiniteness of the local solvers). The local solvers and the numerical flux traces in (2.3) and (2.4) are such that, for every K ∈ Th, the following holds:
\[
-\langle \mu, \widehat{Q}\mu \cdot n\rangle_{\partial K} \ge 0 \quad \text{for all } \mu \in M_h. \tag{2.13a}
\]
Moreover, there exists a space M(∂K) containing the set {ν : ν|e ∈ P0(e) on each face e ∈ $\mathcal{E}_h^\circ$ lying on ∂K} such that
\[
\text{if } \langle \mu, \widehat{Q}\mu \cdot n\rangle_{\partial K} = 0 \text{ for some } \mu \in M_h, \text{ then } P_{\partial K}\,\mu = C_K \tag{2.13b}
\]
for some constant CK, where P∂K is the L2(∂K)-orthogonal projection onto M(∂K).

Note that auxiliary space M(∂K) is not necessarily finite-dimensional. Its use is only theoretical; it is not used in practice in any way.

Let us argue that (2.13) is a reasonable condition on the positive semidefiniteness of the bilinear forms corresponding to the local solvers. Indeed, taking v := Qμ in (2.3a), m := μ and w := Um in (2.3b), and adding the equations, we get
\[
-\langle m, \widehat{Q}\mu \cdot n\rangle_{\partial K} = (c\,Qm, Q\mu)_K + (d\,Um, U\mu)_K + \langle (\widehat{Q}m - Qm)\cdot n,\; U\mu - \mu\rangle_{\partial K} =: a_{h,K}(m, \mu). \tag{2.14}
\]
Thus, (2.13a) ensures that bilinear form ah,K(·, ·), which coincides with form ah(·, ·) when Ω is single element K, is positive semidefinite. Further, condition (2.13b) states that those functions m ∈ Mh for which ah,K(m, m) = 0 yield constants under an appropriate projection. This is a reasonable assumption, since it is a discrete version of a similar property of the exact solution. Indeed, for the exact solution, such a condition readily implies that Qm = 0 and, by (1.4a), that m = Um = constant on ∂K.

This argument suggests that it is reasonable to expect projection P∂K to be strongly related to the identity, at least in parts of ∂K. The following assumption captures this property. It will allow us to establish a link between the different local solvers and, in so doing, to ensure the uniqueness of the solution of (1.7).

Assumption 2.3 (the “gluing condition”). If μ ∈ Mh, then on every interior face e = ∂K+ ∩ ∂K−, either μ = P∂K+ μ or μ = P∂K− μ.

We are now ready to state our result.

Theorem 2.4 (existence and uniqueness of λh). If Assumption 2.1 on the existence and the uniqueness of the local solvers, Assumption 2.2 on the positive semidefiniteness of the local solvers, and Assumption 2.3, the gluing condition, hold, then there is a unique solution λh of weak formulation (2.9).

Proof. By Theorem 2.1, Assumption 2.1 guarantees the existence and the uniqueness of $\widehat{Q}\lambda_h$. Therefore, system (2.9) is well defined. Since it is a square system, to prove the existence and the uniqueness of its solution, it is enough to show that if ah(μ, μ) = 0 for some μ ∈ Mh, we have that μ = 0. By Lemma 2.2,
\[
a_h(\mu, \mu) = -\langle \mu, [\![\widehat{Q}\mu]\!]\rangle_{\mathcal{E}_h} = -\sum_{K \in \mathcal{T}_h} \langle \mu, \widehat{Q}\mu \cdot n\rangle_{\partial K}.
\]
Now, since ah(μ, μ) = 0, by (2.13a) of Assumption 2.2 on the positive semidefiniteness of the local solvers, each of the summands on the right-hand side must vanish. Thus,
\[
\langle \mu, \widehat{Q}\mu \cdot n\rangle_{\partial K} = 0 \quad \text{for all } K \in \mathcal{T}_h.
\]
By condition (2.13b), on any interior face e = ∂K+ ∩ ∂K−, this implies
\[
C_{K^+} = P_{\partial K^+}\mu = \frac{1}{|e|}\langle \mu, 1\rangle_e = P_{\partial K^-}\mu = C_{K^-},
\]
and by Assumption 2.3 (the gluing condition), we conclude that CK+ = μ = CK− on the face e. This means that μ is a constant on Eh. Since μ = 0 on ∂Ω, we see that μ is identically equal to zero on Eh. This completes the proof.

2.5. The sparsity structure of the stiffness matrix for λh. Next, we comment on the sparsity structure of the stiffness matrix associated with weak formulation (1.7). For any given basis of the space of approximate traces Mh, we denote by [μ] the vector of coefficients of the representation of μ in that basis. Then, weak formulation (2.9) can be written as A [λh] = b, where
\[
[\mu]^t A\,[\lambda_h] = a_h(\lambda_h, \mu) \quad \text{and} \quad [\mu]^t b = b_h(\mu).
\]
Now, since, by (2.11),
\[
a_h(\eta, \mu) = -\sum_{K \in \mathcal{T}_h} \langle \mu, \widehat{Q}\eta \cdot n\rangle_{\partial K} \quad \text{and} \quad b_h(\mu) = \sum_{K \in \mathcal{T}_h} \langle \mu, (\widehat{Q}f + \widehat{Q}g_h) \cdot n\rangle_{\partial K},
\]
we have that
\[
A = \sum_{K \in \mathcal{T}_h} A_K \quad \text{and} \quad b = \sum_{K \in \mathcal{T}_h} b_K,
\]
where AK and bK are defined by
\[
[\mu]^t A_K\,[\eta] = -\langle \mu, \widehat{Q}\eta \cdot n\rangle_{\partial K} \quad \text{and} \quad [\mu]^t b_K = \langle \mu, (\widehat{Q}f + \widehat{Q}g_h) \cdot n\rangle_{\partial K}.
\]
Thus, the matrix equations for the multiplier can be obtained in a typical finite element manner. Moreover, the sparsity of the matrices AK and bK can be deduced from the following result. Proposition 2.1. Suppose Assumption 2.1 on the existence and the uniqueness of the local solvers holds. Then (i) if the support of μ ∈ Mh does not intersect ∂K, we have that [μ]t bK = 0; (ii) if the support of μ ∈ Mh or the support of η ∈ Mh does not intersect ∂K, we have that [μ]t AK [η] = 0.
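The element-by-element structure A = Σ AK and the sparsity claim of Proposition 2.1 can be illustrated with a small script. This is only a sketch under stated assumptions: there is one unknown per interior face, the function names `assemble` and `placeholder_local_matrix` are hypothetical, and the local matrix is a stand-in with arbitrary nonzero entries rather than the actual form built from the local solvers. Only the resulting sparsity pattern is meaningful.

```python
from collections import defaultdict


def assemble(elements, local_matrix):
    """Sum element matrices A_K into a sparse global matrix A.

    elements: one tuple per element K listing the global indices of the
              interior faces on its boundary (one unknown per face here).
    local_matrix: callable returning a dense A_K for such a tuple.
    """
    A = defaultdict(float)
    for faces in elements:
        A_K = local_matrix(faces)
        for i, fi in enumerate(faces):
            for j, fj in enumerate(faces):
                A[(fi, fj)] += A_K[i][j]
    return dict(A)


def placeholder_local_matrix(faces):
    # Hypothetical stand-in for the element form; only the pattern matters.
    m = len(faces)
    return [[2.0 if i == j else -1.0 for j in range(m)] for i in range(m)]


# Two triangles sharing face 0. Faces 1..4 are their remaining faces,
# each shared with one further triangle whose other faces lie on the boundary.
elements = [(0, 1, 2), (0, 3, 4), (1,), (2,), (3,), (4,)]
A = assemble(elements, placeholder_local_matrix)

# Column indices of the stored entries in each row.
row_pattern = defaultdict(set)
for (i, j) in A:
    row_pattern[i].add(j)

n = 2  # space dimension of this toy mesh
for face in sorted(row_pattern):
    print(face, sorted(row_pattern[face]))
```

In this toy mesh, faces 1 and 3 do not lie on a common element boundary, so no entry couples them, and the row of face 0 reaches the 2n + 1 = 5 blocks discussed later in this subsection.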
Fig. 2.1. Interior edge e = PQ and the support of local solver (Qm, Um) for any m supported on e. Numerical trace $(\widehat{Q}m)_K$ is generally nontrivial on the boundary of the two shadowed triangles K, but it vanishes on the boundary of other triangles.
Proof. That [μ]t bK = 0 and [μ]t AK [η] = 0 if the support of μ does not intersect ∂K follows immediately from the definition of bK and AK. Let us show that [μ]t AK [η] = 0 if the support of η does not intersect ∂K. Since we are assuming that the local solvers are well defined, if the support of η does not intersect ∂K, we have, by Assumption 2.1, that $(\widehat{Q}\eta)_K = 0$ on ∂K, and the result follows. This completes the proof.

We emphasize that this result, illustrated in Figure 2.1, is possible due to the fact that numerical trace $\widehat{Q}\,\cdot$ is double valued on all interior faces e ∈ $\mathcal{E}_h^\circ$. Indeed, take η as in the above proof and further assume that its support intersects ∂K′, where the intersection of ∂K and ∂K′ is a face e in $\mathcal{E}_h^\circ$. Then $(\widehat{Q}\eta)_{K'}$ can be nontrivial on e, in general. However, this does not contradict the fact that $(\widehat{Q}\eta)_K = 0$ on e because the function $\widehat{Q}\eta$ is double valued on e.

In the remainder of this subsection, we compare the number of globally coupled degrees of freedom and the number of nonzero entries of the stiffness matrix, restricting our attention to the case of a conforming triangulation Th (no hanging nodes).

First, consider the case in which $M_h := M^c_{h,k}$, where
\[
M^c_{h,k} := \{\mu \in C(\mathcal{E}_h) : \mu|_e \in P_k(e) \text{ for all faces } e \in \mathcal{E}_h\}.
\]
Here, C(Eh) denotes the space of continuous functions on Eh and Pk(D) the set of polynomials of degree at most k on a domain D. Then the sparsity structure of the matrix A is exactly that of the statically condensed stiffness matrix of a CG method using approximations whose restriction to each simplex K is in Pk(K).

If, instead, we take $M_h := M_{h,k}$, where
\[
M_{h,k} = \{\mu \in L^2(\mathcal{E}_h) : \mu|_e \in P_k(e) \text{ for all faces } e \in \mathcal{E}_h^\circ\}, \tag{2.15}
\]
then by choosing basis functions whose support is always contained in a single face, we obtain matrix A, which has a block structure with square blocks of order equal to the dimension of Pk(e).
The number of block rows and block columns is equal to the number of interior faces of the triangulation, $N_{\mathrm{i.f.}}$, and, on each block row, there are at most (2n + 1) blocks that are not equal to zero. In other words, the size and sparsity structure of matrix A is precisely that of the stiffness matrix for the hybridized RT method using Mh as space of approximate traces; see [26]. This means that the order of matrix A, which is equal to the number of degrees of freedom of λh, is given by
\[
N_{\mathrm{d.o.f.}} = N_{\mathrm{i.f.}} \dim P_k(e)
\]
and that the number of possibly nonvanishing entries of A is bounded by
\[
N_{\mathrm{sparsity}} = N_{\mathrm{i.f.}}\,(2n + 1)\,(\dim P_k(e))^2.
\]

Table 2.1. Comparison between hybridizable DG methods and two typical DG methods on simplicial meshes.

  n   k   R_d.o.f.   R_sparsity (IP)   R_sparsity (LDG)
  2   1     1.00          1.20               3.00
  2   2     1.33          2.13               5.33
  2   3     1.67          3.33               8.33
  2   4     2.00          4.80              12.00
  3   1     0.67          0.63               2.16
  3   2     0.83          0.99               3.37
  3   3     1.00          1.42               4.86
  3   4     1.17          1.94               6.61

Let us now compare the size and sparsity structure of this stiffness matrix with those of the IP and the (Schur-complement matrix of the) LDG methods that use polynomials of degree k. The number of globally coupled degrees of freedom for both methods is
\[
N^{\mathrm{IP}}_{\mathrm{d.o.f.}} = N^{\mathrm{LDG}}_{\mathrm{d.o.f.}} = N_s \dim P_k(K),
\]
where Ns denotes the number of simplexes of the triangulation. Moreover, the stiffness matrices in question have a block structure with square blocks of order equal to the dimension of Pk(K). On each block-row, the number of blocks that are not equal to zero is at most (n + 2) for the IP method and ((n + 1)^2 + 1) for the LDG method; recall that, for the LDG method, the degrees of freedom of the neighbors of the neighbors are also involved. This means that the numbers of nonzero entries of the corresponding stiffness matrices are (bounded by)
\[
N^{\mathrm{IP}}_{\mathrm{sparsity}} = N_s\,(n + 2)\,(\dim P_k(K))^2, \qquad
N^{\mathrm{LDG}}_{\mathrm{sparsity}} = N_s\,\big((n + 1)^2 + 1\big)\,(\dim P_k(K))^2.
\]
To compare with the hybridized methods, we consider the ratio of the number of globally coupled degrees of freedom $R_{\mathrm{d.o.f.}} := N^{\mathrm{DG}}_{\mathrm{d.o.f.}}/N_{\mathrm{d.o.f.}}$ and the ratios of the number of entries different from zero $R^{\mathrm{IP}}_{\mathrm{sparsity}} := N^{\mathrm{IP}}_{\mathrm{sparsity}}/N_{\mathrm{sparsity}}$ and $R^{\mathrm{LDG}}_{\mathrm{sparsity}} := N^{\mathrm{LDG}}_{\mathrm{sparsity}}/N_{\mathrm{sparsity}}$. Since $N_s/N_{\mathrm{i.f.}} \approx 2/(n + 1)$ (up to a lower order term related to the faces on the boundary), then
\[
R^{\mathrm{IP}}_{\mathrm{sparsity}} = \frac{2\,(n + 2)}{(n + 1)(2n + 1)} \Big(\frac{k}{n} + 1\Big)^2, \qquad
R^{\mathrm{LDG}}_{\mathrm{sparsity}} = \frac{2\,\big((n + 1)^2 + 1\big)}{(n + 1)(2n + 1)} \Big(\frac{k}{n} + 1\Big)^2.
\]
In Table 2.1, we see that in two- or three-space dimensions, the hybridizable methods always have fewer degrees of freedom and have a stiffness matrix that is sparser than the corresponding LDG methods. The same is valid for the IP method in two-space dimensions and in three-space dimensions for k ≥ 3. In three-space dimensions, the IP method with k = 1 is more advantageous than the corresponding hybridizable DG method; for k = 2, its advantages are, however, marginal.

It is interesting to extend the comparison with the IP method for which static condensation of the interior degrees of freedom has been carried out; of course, this can be done only if k ≥ n + 1. In this case, the number of globally coupled degrees of freedom is
\[
N^{\mathrm{sc\text{-}IP}}_{\mathrm{d.o.f.}} = N_s\,\big(\dim P_k(K) - \dim P_{k-n-1}(K)\big).
\]

Table 2.2. Comparison between hybridizable and the statically-condensed IP methods on simplicial meshes.

  n   k   R_d.o.f.   R_sparsity
  2   3     1.50        2.70
  2   4     1.60        3.07
  2   5     1.67        3.33
  3   4     1.13        1.86
  3   5     1.23        2.19
  3   6     1.32        2.49
The stiffness matrix in question has again a block structure with square blocks of order equal to (dim Pk(K) − dim Pk−n−1(K)). On each block-row, the number of blocks that are not equal to zero is n + 2. Indeed, it can be shown that the interior degrees of freedom on a given simplex can be expressed in terms of the condensed degrees of freedom of the simplex and those of its neighbors, and that the condensed degrees of freedom can be expressed in terms of the interior degrees of freedom of the simplex and those of its neighbors. We then have
\[
N^{\mathrm{sc\text{-}IP}}_{\mathrm{sparsity}} = N_s\,(n + 2)\,\big(\dim P_k(K) - \dim P_{k-n-1}(K)\big)^2.
\]
This implies that the corresponding ratios are
\[
R^{\mathrm{sc\text{-}IP}}_{\mathrm{d.o.f.}} = \frac{2}{n + 1} \Big(\frac{k}{n} + 1\Big) \Big(1 - \prod_{j=1}^{n} \frac{k - j}{k + j}\Big),
\]
and
\[
R^{\mathrm{sc\text{-}IP}}_{\mathrm{sparsity}} = \frac{2\,(n + 2)}{(n + 1)(2n + 1)} \Big(\frac{k}{n} + 1\Big)^2 \Big(1 - \prod_{j=1}^{n} \frac{k - j}{k + j}\Big)^2.
\]
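The ratios in this subsection follow directly from the counts of degrees of freedom and nonzero blocks, so they are easy to evaluate. The script below is a sketch: the function names `dim_P`, `ratios_dg`, and `ratios_sc_ip` are hypothetical, the approximation Ns/Ni.f. ≈ 2/(n + 1) is used as in the text, and dim Pk on a d-dimensional simplex is taken to be the binomial coefficient C(k + d, d).

```python
from math import comb


def dim_P(k, d):
    """Dimension of the polynomials of degree at most k in d variables."""
    return comb(k + d, d) if k >= 0 else 0


def ratios_dg(n, k):
    """Return (R_dof, R_sparsity^IP, R_sparsity^LDG) for IP/LDG vs. hybridized."""
    s = 2.0 / (n + 1)                      # approximation of N_s / N_i.f.
    block = dim_P(k, n) / dim_P(k, n - 1)  # dim P_k(K) / dim P_k(e) = k/n + 1
    r_dof = s * block
    r_ip = s * (n + 2) / (2 * n + 1) * block ** 2
    r_ldg = s * ((n + 1) ** 2 + 1) / (2 * n + 1) * block ** 2
    return r_dof, r_ip, r_ldg


def ratios_sc_ip(n, k):
    """Return (R_dof, R_sparsity) for the statically condensed IP method."""
    if k < n + 1:
        raise ValueError("static condensation requires k >= n + 1")
    s = 2.0 / (n + 1)
    condensed = dim_P(k, n) - dim_P(k - n - 1, n)  # coupled dofs per simplex
    block = condensed / dim_P(k, n - 1)
    return s * block, s * (n + 2) / (2 * n + 1) * block ** 2


for n in (2, 3):
    for k in (1, 2, 3, 4):
        print(n, k, ["%.2f" % r for r in ratios_dg(n, k)])
for n, ks in ((2, (3, 4, 5)), (3, (4, 5, 6))):
    for k in ks:
        print(n, k, ["%.2f" % r for r in ratios_sc_ip(n, k)])
```

The printed values can be compared with the entries of Tables 2.1 and 2.2.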
We show some results in Table 2.2. We see that the hybridized methods produce smaller and sparser matrices than the statically-condensed IP method.

The same argument could be made for DG methods on n-dimensional rectangular finite elements. In this case, the DG approximations could be based on polynomials of degree k (instead of polynomials of degree k in each variable in the case of continuous elements). Then the ratio between the degrees of freedom (and the sparsity) will be lower, since instead of the factor Ns/Ni.f. ≈ 2/(n + 1), we have the factor Nr/Ni.f. ≈ 2/(2n).

A complete comparison of methods would require factoring in the costs of solving the algebraic problem. While greater sparsity or a smaller number of degrees of freedom often yields faster solution methods, definitive conclusions can be made only after numerical experiments with specific direct or iterative methods; see [16] for such studies on older methods.

3. Examples of hybridizable methods. In this section, we give several examples of methods fitting the general structure described in the previous section. We restrict ourselves to methods that use the same local solver in all the elements K of triangulation Th. Throughout this section, we assume that Th is a conforming simplicial triangulation.
To define each of the methods, we have only to specify (1) the numerical trace of the flux $\widehat{Q}\,\cdot$, (2) the local spaces V(K), W(K), and (3) the space of approximate traces Mh. We then verify that the local solvers are well posed and discuss the conservativity condition by using Theorem 2.1. We use Theorem 2.4 to verify the existence and the uniqueness of the approximate trace λh and end by relating these results to relevant, earlier material.

Our examples are summarized in Tables 3.1 and 3.2; some of them are schematically related in Figure 3.1. The first column of the tables consists of method names. We adopt the following convention: Suppose that we define the local solver on each element by using a numerical method previously known as the “N” method. Then we call the resulting hybridized formulation an “N-hybridizable method” or, in short, an “N-H” method. For example, if we use the well-known IP method to define the local solvers, then any hybridized formulation with such local solvers is denoted as IP-H. We also say that a finite element method is an N-H method if there is a hybridization of the method that is an N-H method.

In columns 2–4 of Table 3.1, we give the spaces of the local solvers and the approximate trace. In the fifth column, we indicate whether the method gives a single-valued flux trace $\widehat{q}_h$, so the conservativity condition is satisfied in a strong form, or $\widehat{q}_h$ is double-valued, so the method leads to a weak conservativity condition. In the last two columns of Table 3.1, we define the numerical traces of the fluxes $\widehat{Q}m$ and $\widehat{Q}f$. The weak formulations for the approximate traces obtained via Theorem 2.1 for each type of method are listed in Table 3.2.

3.1. The RT-H method. This method is obtained by using the RT method to define the local solvers. The three ingredients of the RT-H method are as follows:
1. For each K ∈ Th, we take $\widehat{Q}m = Qm$, $\widehat{Q}f = Qf$ on ∂K;
2. The finite element space V(K) × W(K) is defined as the Raviart–Thomas space of degree k: $V(K) = P_k(K)^n + x\,P_k(K)$, $W(K) = P_k(K)$, k ≥ 0, where $P_k(K)^n$ denotes the set of vector functions whose components are in Pk(K);
3. We define the space of approximate traces as $M_h = M_{h,k}$.

The fact that the local solvers are well defined can be established by realizing that they are defined by using exactly the RT mixed finite element method. Indeed, if we insert the expression of numerical traces $\widehat{Q}m$ and $\widehat{Q}f$ into the equations defining the local solvers, we see that they are nothing but the RT discretizations of exact local problems (1.4), as claimed. Since the RT method is well defined (see [49, 12]), local solvers (Qm, Um) and (Qf, Uf) are also well defined.

Note that conservativity condition (2.6) forces numerical trace $\widehat{q}_h$ to be single valued. Indeed, because (extension by zero from $\mathcal{E}_h^\circ$ to $\mathcal{E}_h$ of) $[\![\widehat{Q}\lambda_h + \widehat{Q}g_h + \widehat{Q}f]\!]$ and test functions μ belong to the same space, conservativity condition (2.6) forces equality
\[
[\![\widehat{q}_h]\!] = [\![\widehat{Q}\lambda_h + \widehat{Q}g_h + \widehat{Q}f]\!] = 0 \quad \text{on } \mathcal{E}_h^\circ,
\]
so the normal component of numerical trace $\widehat{q}_h$ is single-valued, and $\widehat{q}_h \in H(\operatorname{div}, \Omega)$. Moreover, Theorem 2.1 asserts that the conservativity condition is equivalent to (2.9) with
\[
a_h(\eta, \mu) = (c\,Q\eta, Q\mu)_{\mathcal{T}_h} + (d\,U\eta, U\mu)_{\mathcal{T}_h}, \qquad
b_h(\mu) = \langle g_h, Q\mu \cdot n\rangle_{\partial\Omega} + (f, U\mu)_{\mathcal{T}_h},
\]
provided $g_h|_{\mathcal{E}_h^\circ} = 0$. This is, of course, a reasonable choice, since $g|_{\mathcal{E}_h^\circ} = 0$ and Mh is a space of discontinuous functions.

Table 3.1. Summary of the examples.

  Method   V(K)                      W(K)          M_h           Conservativity   $\widehat{Q}m$                                         $\widehat{Q}f$
  RT-H     $P_k(K)^n + x\,P_k(K)$    $P_k(K)$      $M_{h,k}$     strong           $Qm$                                                   $Qf$
  BDM-H    $P_k(K)^n$                $P_{k-1}(K)$  $M_{h,k}$     strong           $Qm$                                                   $Qf$
  LDG-H    $P_k(K)^n$                $P_{k-1}(K)$  $M_{h,k}$     strong           $Qm + \tau\,(Um - m)\,n$                               $Qf + \tau\,(Uf)\,n$
  LDG-H    $P_k(K)^n$                $P_k(K)$      $M_{h,k}$     strong           $Qm + \tau\,(Um - m)\,n$                               $Qf + \tau\,(Uf)\,n$
  LDG-H    $P_{k-1}(K)^n$            $P_k(K)$      $M_{h,k}$     strong           $Qm + \tau\,(Um - m)\,n$                               $Qf + \tau\,(Uf)\,n$
  IP-H     $P_k(K)^n$                $P_k(K)$      $M_{h,k}$     strong           $-a\operatorname{grad} Um + \tau\,(Um - m)\,n$         $-a\operatorname{grad} Uf + \tau\,(Uf)\,n$
  NC-H     $P_{k-1}(K)^2$, k odd     $P_k(K)$      $M_{h,k-1}$   strong           a new unknown variable                                 a new unknown variable
  CG-H     $P_{k-1}(K)^n$            $P_k(K)$      $M^c_{h,k}$   weak             a new unknown variable                                 a new unknown variable

Table 3.2. Weak formulations for the approximate trace.

  Method   $a_h(\eta, \mu)$                                                                                  $b_h(\mu)$
  RT-H     $(c\,Q\eta, Q\mu)_{\mathcal{T}_h} + (d\,U\eta, U\mu)_{\mathcal{T}_h}$                             $(f, U\mu)_{\mathcal{T}_h} + \langle g_h, Q\mu \cdot n\rangle_{\partial\Omega}$
  BDM-H    $(c\,Q\eta, Q\mu)_{\mathcal{T}_h} + (d\,U\eta, U\mu)_{\mathcal{T}_h}$                             $(f, U\mu)_{\mathcal{T}_h} + \langle g_h, Q\mu \cdot n\rangle_{\partial\Omega}$
  LDG-H    $(c\,Q\eta, Q\mu)_{\mathcal{T}_h} + (d\,U\eta, U\mu)_{\mathcal{T}_h}$                             $(f, U\mu)_{\mathcal{T}_h} + \langle g_h, Q\mu \cdot n + \tau\,U\mu\rangle_{\partial\Omega}$
             $+ \langle 1, [\![(U\mu - \mu)(\tau\,(U\eta - \eta)\,n)]\!]\rangle_{\mathcal{E}_h}$
  IP-H†    $(a\operatorname{grad} U\mu, \operatorname{grad} U\eta)_{\mathcal{T}_h} + (d\,U\eta, U\mu)_{\mathcal{T}_h}$   $(f, U\mu)_{\mathcal{T}_h} + \langle g_h, -a\operatorname{grad} U\mu \cdot n + \tau\,U\mu\rangle_{\partial\Omega}$
             $+ \langle 1, [\![(\eta - U\eta)\,a\operatorname{grad} U\mu + (\mu - U\mu)\,a\operatorname{grad} U\eta]\!]\rangle_{\mathcal{E}_h}$
             $+ \langle 1, [\![(U\mu - \mu)(\tau\,(U\eta - \eta)\,n)]\!]\rangle_{\mathcal{E}_h}$
  NC-H†    $(a\operatorname{grad} U\eta, \operatorname{grad} U\mu)_{\mathcal{T}_h} + (d\,U\eta, U\mu)_{\mathcal{T}_h}$   $(f, U\mu)_{\mathcal{T}_h} + \langle g_h, \widehat{Q}\mu \cdot n\rangle_{\partial\Omega}$
  CG-H†    $(a\operatorname{grad} U\eta, \operatorname{grad} U\mu)_{\mathcal{T}_h} + (d\,U\eta, U\mu)_{\mathcal{T}_h}$   $(f, U\mu)_{\mathcal{T}_h} + \langle g_h, [\![\widehat{Q}\mu]\!]\rangle_{\mathcal{E}_h}$

  † We assume that a(x) is a constant on each element.

Fig. 3.1. Relations between some hybridizable methods in terms of stabilization parameter τ. (Diagram not reproduced. It is drawn against 1/τ, with marks at 0 and 1/τ0, for the traces $\widehat{Q}m = Qm + \tau\,(Um - m)$, $\widehat{Q}f = Qf + \tau\,Uf$ and $\widehat{Q}m = -a\operatorname{grad} Um + \tau\,(Um - m)$, $\widehat{Q}f = -a\operatorname{grad} Uf + \tau\,Uf$; its labels are: mixed methods, LDG-H, IP-H, CG-H, ill-defined methods.)

These results appeared earlier in [26, Theorem 2.1], where the hybridized RT method of arbitrary order was considered; the case of the lowest order RT method was previously considered in [21]. We can thus conclude that the original RT method is an RT-H method. In [41], bilinear form ah(·, ·) was shown to be positive definite; this implies that λh is uniquely determined.

Next, we apply our general approach to this method and verify Assumption 2.2 on the positive semidefiniteness of the local solvers and Assumption 2.3, the gluing condition. By Theorem 2.4, this ensures the existence and the uniqueness of λh and hence that of approximation (qh, uh).

Proposition 3.1. Assumption 2.1 on the existence and the uniqueness of the local solvers, and Assumption 2.2 on the positive semidefiniteness of the local solvers hold for the RT-H method. Assumption 2.3, the gluing condition, also holds with M(∂K) = {μ : μ|e ∈ Pk(e) for all faces e of ∂K}.

Proof. Assumption 2.1 obviously holds. Let us prove Assumption 2.2. To do that, we first show that condition (2.13a) holds. By identity (2.14) with μ := m, we have that
\[
-\langle m, \widehat{Q}m \cdot n\rangle_{\partial K} = (c\,Qm, Qm)_K + (d\,Um, Um)_K,
\]
by the definition of $\widehat{Q}m$. We thus see that condition (2.13a) is satisfied.

Now we verify condition (2.13b) with the given choice of M(∂K). If $\langle m, \widehat{Q}m \cdot n\rangle_{\partial K} = 0$, we immediately obtain $Qm|_K = 0$. This implies that (2.3a) can be rewritten as
\[
(\operatorname{grad} Um, v)_K - \langle Um - m, v \cdot n\rangle_{\partial K} = 0 \quad \text{for all } v \in V(K). \tag{3.1}
\]
It is well known (see, for example, [12]) that for a given grad Um and Um − m, there is a function v ∈ V(K) such that
\[
(v, p_{k-1})_K = (\operatorname{grad} Um, p_{k-1})_K \quad \text{for all } p_{k-1} \in P_{k-1}(K)^n, \tag{3.2}
\]
\[
\langle v \cdot n, p_k\rangle_e = -\langle Um - m, p_k\rangle_e \quad \text{for all } p_k \in P_k(e) \tag{3.3}
\]
for all faces e of K. Using this v in (3.1), we find that
\[
(\operatorname{grad} Um, \operatorname{grad} Um)_K + \langle Um - m, Um - m\rangle_{\partial K} = 0.
\]
This implies that Um is a constant on K, so m is constant on ∂K. This proves that condition (2.13b) is satisfied with M(∂K) as described.

It remains to verify Assumption 2.3. Since we are assuming that triangulation Th is conforming, each interior face e = ∂K+ ∩ ∂K− is also a face of both K+ and K−. Hence, since μ|e ∈ Pk(e), we have that P∂K+ μ = μ = P∂K− μ on e. This completes the proof.

3.2. The BDM-H method. To obtain the BDM-H method, we use the BDM method to define the main three ingredients of the hybridization method:
1. For each K ∈ Th, we take $\widehat{Q}m = Qm$, $\widehat{Q}f = Qf$ on ∂K;
2. The finite element spaces are defined as $V(K) = P_k(K)^n$, $W(K) = P_{k-1}(K)$, k ≥ 1;
3. The space of approximate traces is defined as $M_h = M_{h,k}$.

This defines the BDM-H method. Everything said about the RT-H method in the previous subsection applies to the BDM-H method. In particular, we have that the original BDM method is a BDM-H method; see [41].

3.3. The LDG-H methods. The LDG-H methods are obtained by using the LDG method to define the local solvers. The following specifications completely define the class of LDG-H methods:
1. The numerical traces
\[
\widehat{Q}m = Qm + \tau_K\,(Um - m)\,n, \qquad \widehat{Q}f = Qf + \tau_K\,(Uf)\,n \quad \text{on } \partial K, \tag{3.4}
\]
where τK is a function that can vary on ∂K.
2. The space V(K) × W(K) as one of the following choices:
\[
P_k(K)^n \times P_{k-1}(K), \quad k \ge 1 \text{ and } \tau_K \ge 0 \text{ on } \partial K; \tag{3.5a}
\]
\[
P_k(K)^n \times P_k(K), \quad k \ge 0 \text{ and } \tau_K > 0 \text{ on at least one face of the simplex } K; \tag{3.5b}
\]
\[
P_{k-1}(K)^n \times P_k(K), \quad k \ge 1 \text{ and } \tau_K > 0 \text{ on } \partial K. \tag{3.5c}
\]
3. The space of approximate traces is
\[
M_h = M_{h,k}. \tag{3.6}
\]

Typically, the stabilization parameter τ of the LDG methods is a nonnegative constant on each face in Eh. Here, we allow τ to be double valued on $\mathcal{E}_h^\circ$, with two branches τ− = τK− and τ+ = τK+ defined on the edge e shared by the finite elements K− and K+.

Now the functions (Qm, Um) and (Qf, Uf) are the approximations given by the LDG method to exact solutions of (1.4) on each element, as claimed. As is well known (see [34, 17, 5]), the LDG method is uniquely solvable for τK > 0. However, the above specifications define a wider class of LDG-H methods. We show that the existence and the uniqueness of the solution of the method can be guaranteed for each of choices (3.5).

Proposition 3.2. Assumption 2.1 on the existence and the uniqueness of the local solvers holds for the numerical traces given by (3.4) and with any of choices (3.5) for V(K) × W(K).

To prove this result for all the above-mentioned cases, we use the following auxiliary lemma.

Lemma 3.1. Let τK ≥ 0. With the choice of numerical traces in (3.4), local problems (2.3) and (2.4) are uniquely solvable if V(K) × W(K) defined by (3.5) is such that whenever w ∈ W(K) satisfies
(i) τK w = 0 on ∂K, and
(ii) (w, div v)K = 0 for all v ∈ V(K),
we have that w = 0.

Proof. Let us prove the result for first local solver (Qm, Um) defined by (2.3). The result for the other local mapping (2.4) is similar. It suffices to prove uniqueness, since this implies existence. To prove uniqueness, we must show that, when m = 0, the only solution of (2.3) is the trivial one. Taking v = Qm and w = Um in (2.3) and adding the resulting equations, we get
\[
(c\,Qm, Qm)_K + \langle Um, (\widehat{Q}m - Qm) \cdot n\rangle_{\partial K} + (d\,Um, Um)_K = 0.
\]
Inserting the definition of the numerical trace $\widehat{Q}m$, we get
\[
(c\,Qm, Qm)_K + \langle Um, \tau_K\,Um\rangle_{\partial K} + (d\,Um, Um)_K = 0,
\]
and since c is positive definite and symmetric, d ≥ 0, and τK ≥ 0, we have that Qm = 0. It remains to show that Um = 0. To do so, we note that the above equation implies that τK Um = 0 on ∂K. By (2.3a), we also have
\[
(Um, \operatorname{div} v)_K = 0 \quad \text{for all } v \in V(K).
\]
By hypothesis (ii) of Lemma 3.1, this implies that Um = 0. This completes the proof.

We are now ready to prove Proposition 3.2.

Proof. By Lemma 3.1, we have only to show that, for each of three choices (3.5), if w ∈ W(K) satisfies τK w = 0 on ∂K and (w, div v)K = 0 for all v in V(K), then w = 0 on K.

Let us show that this is true for the spaces given by (3.5a). Since div : V(K) → W(K) is surjective, we know there is a v in V(K) such that div v = w. This implies that (w, w)K = 0 and hence that w = 0 on K.
Next, let us consider choice (3.5b). Since w must vanish on the face F where τK > 0, we immediately have that w = 0 if k = 0. If k ≥ 1, it can be factored as $w = \ell_F\,p_{k-1}$, with $p_{k-1} \in P_{k-1}(K)$ and $\ell_F$ equal to the barycentric coordinate function of K that vanishes on F. Then, choosing v in $V(K) = P_k(K)^n$ such that div v = $p_{k-1}$, equation
\[
0 = (\operatorname{div} v, w)_K = (\operatorname{div} v, \ell_F\,p_{k-1})_K = (p_{k-1}, \ell_F\,p_{k-1})_K
\]
implies that $p_{k-1}$ vanishes on K, so w = 0 on K.

Finally, let us consider choice (3.5c). Since τK > 0 on ∂K, we have that w = 0 on ∂K, and a simple integration by parts gives that
\[
(\operatorname{grad} w, v)_K = 0 \quad \text{for all } v \in V(K) = P_{k-1}(K)^n.
\]
Taking v = grad w allows us to conclude that w is a constant on K and hence identically zero on K. This completes the proof.

Note that choices (3.4) of the numerical traces, (3.5) of the finite element spaces V(K) × W(K), and (3.6) for approximate trace space Mh clearly imply that, for all these LDG-H methods, conservativity condition (2.6) is satisfied strongly. Moreover, by Theorem 2.1, the conservativity condition is equivalent to ah(λh, μ) = bh(μ) for all μ ∈ Mh, where
\[
a_h(\eta, \mu) = (c\,Q\eta, Q\mu)_{\mathcal{T}_h} + (d\,U\eta, U\mu)_{\mathcal{T}_h} + \langle 1, [\![(U\mu - \mu)(\tau\,(U\eta - \eta)\,n)]\!]\rangle_{\mathcal{E}_h},
\]
\[
b_h(\mu) = \langle g_h, Q\mu \cdot n + \tau\,U\mu\rangle_{\partial\Omega} + (f, U\mu)_{\mathcal{T}_h},
\]
provided $g_h|_{\mathcal{E}_h^\circ} = 0$. Form ah(·, ·) is obviously symmetric. That it is also positive definite follows once Assumption 2.2 on the positive semidefiniteness of the local solvers is verified. Set
\[
M(\partial K) = \{\mu : \mu|_e \in P_k(e) \text{ for all faces } e \text{ where } \tau_K = 0, \text{ and } \mu|_e \in L^2(e) \text{ for all faces } e \text{ where } \tau_K > 0\}. \tag{3.7}
\]
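Returning to the solvability condition of Lemma 3.1, the lowest order case of choice (3.5b) shows why τK must be positive on at least one face. The following is a sketch, under the additional assumption d = 0 on K (not required in the text), of what happens for k = 0 when τK vanishes on all of ∂K.

```latex
% Case k = 0 of (3.5b): V(K) = P_0(K)^n, W(K) = P_0(K).
\begin{align*}
&\operatorname{div} v = 0 \ \text{for all } v \in P_0(K)^n
  \;\Longrightarrow\; (w, \operatorname{div} v)_K = 0 \ \text{for every } w \in P_0(K),
  \ \text{so hypothesis (ii) gives no information.} \\
&\text{If } \tau_K > 0 \text{ on a face } F:\quad \tau_K\, w = 0 \text{ on } \partial K
  \;\Longrightarrow\; w|_F = 0 \;\Longrightarrow\; w = 0 \ \text{(since $w$ is constant).} \\
&\text{If } \tau_K \equiv 0 \text{ on } \partial K \text{ and } d = 0:\quad
  (Qm, Um) = (0, C) \ \text{solves the local problem with } m = 0 \ \text{for every constant } C, \\
&\qquad \text{because } \widehat{Q}m = Qm + \tau_K\,(Um - m)\,n = 0, \quad
  (c\,Qm, v)_K - (Um, \operatorname{div} v)_K = 0, \quad
  \langle w, \widehat{Q}m \cdot n\rangle_{\partial K} = 0.
\end{align*}
```

So with k = 0 and no stabilization on any face, the local solver is not unique, which is consistent with the requirement attached to (3.5b).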
Proposition 3.3. Let the numerical traces be set by (3.4), the local spaces be as in any of choices (3.5), and the space of approximate traces be set by (3.6). Then, Assumption 2.2 on the positive semidefiniteness of the local solvers and Assumption 2.3, the gluing condition, are satisfied with M(∂K) defined by (3.7).

Proof. We begin by showing that condition (2.13a) holds. By identity (2.14) with μ := m and the definition of $\widehat{Q}m$, we have that
\[
-\langle m, \widehat{Q}m \cdot n\rangle_{\partial K} = (c\,Qm, Qm)_K + (d\,Um, Um)_K + \langle \tau_K\,(Um - m), Um - m\rangle_{\partial K}.
\]
Since τK ≥ 0 in all three cases (3.5), we see that condition (2.13a) is satisfied.

Now, let us verify condition (2.13b). If we assume that $\langle m, \widehat{Q}m \cdot n\rangle_{\partial K} = 0$, we immediately obtain that $Qm|_K = 0$ and $\tau\,(Um - m)|_{\partial K} = 0$. This implies that the first equation defining first local solver (2.3a) can be rewritten as
\[
(\operatorname{grad} Um, v)_K - \langle Um - m, v \cdot n\rangle_{\partial K} = 0 \quad \text{for all } v \in V(K). \tag{3.8}
\]
We use this equation to show that in all three cases (3.5), condition (2.13b) is satisfied with P∂K defined, on the face e of K, as the L2-projection into Pk(e) if τ|e = 0 and as the identity if τ|e > 0:
(i) In case (3.5a), the result follows exactly as in the proof of Proposition 3.1.
(ii) In case (3.5b), we know (see [24]) that there is a function $v \in P_k(K)^n$ such that
\[
(v, p_{k-1})_K = (\operatorname{grad} Um, p_{k-1})_K \quad \text{for all } p_{k-1} \in P_{k-1}(K)^n, \tag{3.9}
\]
\[
\langle v \cdot n, p_k\rangle_e = -\langle Um - m, p_k\rangle_e \quad \text{for all } p_k \in P_k(e) \tag{3.10}
\]
for all the faces e of K except one, say, the face e on which τ > 0. Setting this v in (3.8) and using the fact that on e we have that m = Um, we obtain that Um is a constant on K and that m = Um on the remaining faces of ∂K. Thus, m is constant on ∂K and condition (2.13b) is verified. Assumption 2.3, the gluing condition, is trivially satisfied by virtue of the definition of M(∂K) in (3.7).
(iii) In case (3.5c), we immediately see that m = Um on ∂K. Now we take v = grad Um in (3.8) to get that Um is a constant. This verifies Assumption 2.2 as in the previous case. Assumption 2.3 obviously holds from the definition of M(∂K) in (3.7).

Our next result sheds light on the nature of numerical traces $\widehat{q}_h$ and $\widehat{u}_h$ of the LDG-H schemes.

Proposition 3.4 (characterization of LDG-H methods). Let the numerical traces be set by (3.4), the local spaces be as in any of choices (3.5), the space of approximate traces be set by (3.6), and (qh, uh) be as defined in (2.5). Then conservativity condition (2.6) holds on $\mathcal{E}_h^\circ$ if and only if
\[
\lambda_h = \widehat{u}_h = \frac{\tau^+}{\tau^- + \tau^+}\,u_h^+ + \frac{\tau^-}{\tau^- + \tau^+}\,u_h^- + \frac{1}{\tau^+ + \tau^-}\,[\![q_h]\!], \tag{3.11a}
\]
\[
\widehat{q}_h = \frac{\tau^-}{\tau^- + \tau^+}\,q_h^+ + \frac{\tau^+}{\tau^- + \tau^+}\,q_h^- + \frac{\tau^+ \tau^-}{\tau^- + \tau^+}\,[\![u_h]\!]. \tag{3.11b}
\]

Proof. Suppose the conservativity condition holds. We need to prove (3.11a) and (3.11b). By the definition of $\widehat{q}_h$ (see (2.7)) we have
\[
\begin{aligned}
\widehat{q}_h &= \widehat{Q}\lambda_h + \widehat{Q}g_h + \widehat{Q}f \\
&= (Q\lambda_h + Qg_h + Qf) + \tau\,(U\lambda_h + Ug_h + Uf - \lambda_h - g_h)\,n \\
&= q_h + \tau\,(u_h - \lambda_h - g_h)\,n.
\end{aligned}
\]
Inserting this expression into the conservativity condition and taking gh equal to zero on $\mathcal{E}_h^\circ$, we obtain that, for any μ ∈ Mh,
\[
\langle \mu, [\![\widehat{q}_h]\!]\rangle_{\mathcal{E}_h^\circ} = \langle \mu, [\![q_h + \tau\,(u_h - \lambda_h)\,n]\!]\rangle_{\mathcal{E}_h^\circ} = 0,
\]
which implies, by our choice of spaces, that $[\![\widehat{q}_h]\!] = 0$ on $\mathcal{E}_h^\circ$ or equivalently that
\[
[\![q_h]\!] + \tau^+ u_h^+ + \tau^- u_h^- - (\tau^+ + \tau^-)\,\lambda_h = 0 \quad \text{on } \mathcal{E}_h^\circ.
\]
Solving for λh, we obtain (3.11a). To prove (3.11b), we simply insert the expression for λh into the identity
\[
\widehat{q}_h^{\,+} \cdot n^+ = q_h^+ \cdot n^+ + \tau^+\,(u_h^+ - \lambda_h)
\]
and perform a few algebraic manipulations.
The converse asserted by the proposition is trivial: If identities (3.11) hold, then h is single valued on E◦h and the conservativity condition the normal component of q is satisfied. This completes the proof. Corollary 3.2. The LDG method is not an LDG-H method for any finite τ . Proof. On any interior face e ∈ E◦h , the LDG method has a numerical trace u h independent of q h ; see [34, 17, 5]. On the other hand, by Proposition 3.4, the LDGH methods have numerical traces u h that depend on [[q h ]]. Since this dependence cannot be removed for any finite value of τ , we see that no LDG method is an LDG-H method. This completes the proof. As known from [34, p. 2445] and [17, p. 1681], the independence of numerical trace u h of the LDG methods of q h on interior faces E◦h allows us to eliminate the unknown q h from the equations and to obtain a primal formulation involving only uh . In contrast, in the LDG-H methods, u h must depend on q h as well. Both approaches recover q h locally but using different mechanisms. Since the LDG-H methods lead to a formulation involving only numerical trace λh , they have fewer globally coupled unknowns than the LDG method for high order polynomials. The LDG-H methods considered in this subsection were studied in [17] where it was proven, in particular, that the method is well defined for τ > 0 on Eh . Methods with τ = 0 do not fit in the framework proposed in [5]; they have been recently studied in [24]. 3.4. A limiting case of LDG-H methods. Here we consider hybridizable Galerkin methods that can be obtained formally considering limiting values of the penalty parameter in LDG-H methods. The motivation for doing this arises from the previous corollary (Corollary 3.2), whereby we know that the only chance for showing that an LDG method can be hybridized lies in cases where τ is allowed to be not finite. We first examine how numerical traces of the previous LDG-H method change as we formally pass to a limit in τ . 
By letting $\tau^+$ go to infinity on the interior face $e = \partial K^+ \cap \partial K^-$ while maintaining a fixed finite $\tau^-$, we find that the expressions for the numerical traces obtained in Proposition 3.4 become
\[ \widehat{u}_h = u_h^+ \quad\text{and}\quad \widehat{q}_h = q_h^- + \tau^- [\![u_h]\!]. \tag{3.12} \]
Note that the above expression for the primal numerical trace $\widehat{u}_h$ is independent of the fluxes, or, in other words, such traces will result in an LDG method. Indeed, the LDG methods defined by these numerical traces have been thoroughly studied in the case $\tau^- > 0$; see [34, 17, 5]. In the special case $\tau^- = 0$, we get
\[ \widehat{u}_h = u_h^+ \quad\text{and}\quad \widehat{q}_h = q_h^-, \]
which also defines a previously studied LDG method. For this scheme, the discontinuities of the approximate solution across interior interelement boundaries do not introduce any dissipation. The dissipative effect of the discontinuities is concentrated on the boundary of the domain and hence reduced to a “minimum,” which is the reason for its name, the minimal dissipation LDG method. Since this scheme does not fit the unified analysis in [5], it was studied in [20] and [24] for problems in one and several space dimensions, respectively.

The formal passage to the limit solely in the expressions for numerical traces does not clarify if the limiting methods are hybridizable. In particular, we must explain
precisely what we mean by setting $\tau_K = \infty$ in the context of local solvers. To do so, let $F_K$ be the union of one or more faces of the element $K$ where we want to set the branch $\tau_K$ to $\infty$. Since
\[ \widehat{Q}m = Qm + \tau_K (Um - m)\, n, \]
we expect that in the formal limit of $\tau_K = \infty$, we should have $Um - m = 0$. Then the value of $\widehat{Q}m$ on $F_K$ becomes an unknown because the last term above is an unknown formal product of $0$ with $\infty$. Motivated by this, we now define the local solvers with $\widehat{Q}m$ and $\widehat{Q}f$ as new unknowns. More precisely, setting
\[ W(K) = P_k(K), \qquad V(K) = P_k(K)^n, \qquad T_K(F_K) = \{ n_K\, w|_{F_K} : w \in W(K) \}, \]
we define the local solution $(Qm, Um, (\widehat{Q}m)_{F_K}) \in V(K) \times W(K) \times T_K(F_K)$ for any $m \in M_h$ by
\[ (c\, Qm, v)_K - (Um, \operatorname{div} v)_K = -\langle m, v \cdot n \rangle_{\partial K} \quad \text{for all } v \in V(K), \tag{3.13a} \]
\[ -(\operatorname{grad} w, Qm)_K + \langle w, \widehat{Q}m \cdot n \rangle_{\partial K} + (d\, Um, w)_K = 0 \quad \text{for all } w \in W(K), \tag{3.13b} \]
\[ Um = m \quad \text{on } F_K. \tag{3.13c} \]
Here, just as for the LDG-H methods, we set
\[ \widehat{Q}m = Qm + \tau_K (Um - m)\, n \quad \text{on } \partial K \setminus F_K. \]
Similarly, we define $(Qf, Uf, (\widehat{Q}f)_{F_K})$ as the element of $V(K) \times W(K) \times T_K(F_K)$ such that
\[ (c\, Qf, v)_K - (Uf, \operatorname{div} v)_K = 0 \quad \text{for all } v \in V(K), \tag{3.14a} \]
\[ -(\operatorname{grad} w, Qf)_K + \langle w, \widehat{Q}f \cdot n \rangle_{\partial K} + (d\, Uf, w)_K = (f, w) \quad \text{for all } w \in W(K), \tag{3.14b} \]
\[ Uf = 0 \quad \text{on } F_K, \tag{3.14c} \]
where
\[ \widehat{Q}f = Qf + \tau_K (Uf)\, n \quad \text{on } \partial K \setminus F_K. \]
We set the space of approximate traces by
\[ M_h = \{ \mu \in M_{h,k} : \mu|_{F_K} \text{ is continuous on } F_K \text{ for all } K \in \mathcal{T}_h \}. \tag{3.15} \]
Note that the continuity condition in the above definition reflects the fact that the local solvers satisfy strong Dirichlet boundary conditions on $F_K$ for all $K \in \mathcal{T}_h$; see (3.13c) and (3.14c). This completes the definition of the limiting case of the LDG-H method when $\tau_K = \infty$ on $F_K$. From now on, the above modification of the LDG local solvers is tacitly understood whenever we say that a branch of $\tau$ is infinity on a face.

It is easy to check, by arguments similar to those in Proposition 3.2, that local problems (3.13) and (3.14) are uniquely solvable for every $m$ in $M_h$ and every $f \in L^2(\Omega)$ provided, for each element $K \in \mathcal{T}_h$, $\tau_K$ is not identically equal to zero on $\partial K$ whenever $F_K$ is the empty set. Note that, although the local solvers have been modified, Theorem 2.1 continues to apply because its proof only relies on the form of the first two equations in the
local problems. Indeed, (3.13a) and (3.13b) are identical in form to (2.3a) and (2.3b), respectively; a similar remark applies to the equations of the second local solver. Therefore, Theorem 2.1 also holds in this case. In particular, we have that
\[ a_h(\eta, \mu) = (c\, Q\eta, Q\mu)_{\mathcal{T}_h} + (d\, U\eta, U\mu)_{\mathcal{T}_h} + \sum_{K \in \mathcal{T}_h} \langle \tau (U\eta - \eta), (U\mu - \mu) \rangle_{\partial K \setminus F_K}. \]
Finally, it is not difficult to see that Proposition 3.3 also holds. By Theorem 2.4, bilinear form $a_h(\cdot, \cdot)$ is positive definite, and we can immediately see that $\lambda_h$ is uniquely determined.

Note that, unlike all previous examples, conservativity condition (2.6) for these methods is only imposed weakly. This is because while the jumps of $\widehat{q}_h$ lie in $M_{h,k}$, the approximate traces $\mu$ are in the space $M_h$, which is a strict subspace of $M_{h,k}$. Since all LDG methods have single-valued numerical traces, this seems to suggest that no LDG method can be a limiting case of the LDG-H method. However, this is not the case, as we see next.

We consider the one-sided limiting case of the LDG-H method. This is the same as the above-defined limiting case of the LDG-H method but with the following additional assumption: For every interior face $e$ in $\mathcal{E}_h^\circ$, one branch of $\tau$ is infinity, and the other branch is finite-valued.

Corollary 3.3. The one-sided limiting case of the LDG-H method coincides with the LDG method whose numerical traces on the interior faces are given by (3.12).

Proof. Let $\lambda_h^\infty$ denote the solution of the one-sided limiting case of the LDG-H method, and let
\[ q_h^\infty = Q\lambda_h^\infty + Qg_h + Qf, \qquad u_h^\infty = U\lambda_h^\infty + Ug_h + Uf. \]
We will prove that $q_h^\infty$ and $u_h^\infty$ coincide with the corresponding solution components $q_h^{LDG}$ and $u_h^{LDG}$, respectively, of the LDG method with numerical traces set as in (3.12). By the definition of the LDG method, $q_h^{LDG}$ and $u_h^{LDG}$ satisfy (2.8a)–(2.8b) with the $\lambda_h$ and $\widehat{q}_h$ therein set, respectively, to $\widehat{u}_h$ and $\widehat{q}_h$ of (3.12), which, for clarity, we will rewrite as $\widehat{u}_h^{LDG}$ and $\widehat{q}_h^{LDG}$.

It suffices to show that $q_h^\infty$ and $u_h^\infty$ satisfy the same equations as $q_h^{LDG}$ and $u_h^{LDG}$. Adding local solver equations (3.13a) and (3.14a) over all elements, we find that $q_h^\infty$ and $u_h^\infty$ satisfy the first equation of the LDG method with $\lambda_h^\infty$ in place of $\widehat{u}_h^{LDG}$. But, since every interior edge has an infinite penalty branch and since
\[ \lambda_h^\infty|_{F_K} = (u_h^\infty)_{F_K} \quad \text{for all elements } K, \tag{3.16} \]
we find that $\lambda_h^\infty$ is in the same form as the LDG numerical trace $\widehat{u}_h^{LDG}$.

Also, summing local solver equations (3.13b) and (3.14b) over all elements, we find that $q_h^\infty$ and $u_h^\infty$ satisfy the second equation of the LDG methods, with $\widehat{q}_h^\infty \equiv \widehat{Q}\lambda_h^\infty + \widehat{Q}g_h + \widehat{Q}f$ in place of $\widehat{q}_h^{LDG}$. We will now show that the second equation, in fact, holds with the LDG flux. For this, we use the fact that
\[ \langle [\![ \widehat{q}_h^\infty ]\!], \mu \rangle_{\mathcal{E}_h} = 0 \tag{3.17} \]
for all $\mu$ in the subspace of functions in $M_h$ (defined by (3.15)) with $\mu|_{\partial\Omega} = 0$. Now, if $w$ is any function in $W(K)$, then $w|_{F_K}$, extended by zero to $\mathcal{E}_h$, is in $M_h$. Therefore, (3.17) implies
\[ \langle \widehat{q}_h^\infty \cdot n, w \rangle_{F_K} = -\langle (\widehat{q}_h^\infty)_{K^c} \cdot (n)_{K^c}, w \rangle_{F_K} = -\langle (q_h^\infty)_{K^c} + (\tau)_{K^c} \big( (u_h^\infty)_{K^c} - \lambda_h^\infty \big) (n)_{K^c},\ (n)_{K^c}\, w \rangle_{F_K}. \]
Here, for notational convenience, we have denoted the branch of a multivalued function $f$ from outside $K$ by $(f)_{K^c}$. By (3.16), we can rewrite the right-hand side as
\[ \langle \widehat{q}_h^\infty \cdot n, w \rangle_{F_K} = -\langle (q_h^\infty)_{K^c} + (\tau)_{K^c} [\![ u_h^\infty ]\!],\ (n)_{K^c}\, w \rangle_{F_K} \]
and conclude that
\[ \sum_K \langle \widehat{q}_h^\infty \cdot n, w \rangle_{\partial K} = \sum_K \langle \widehat{q}_h^{LDG} \cdot n, w \rangle_{\partial K}. \tag{3.18} \]
Thus, $q_h^\infty$ and $u_h^\infty$ satisfy the same equations as the LDG method with the same expressions for numerical traces as in the LDG case.

Note that in the above proof, $\widehat{q}_h^\infty$ and $\widehat{q}_h^{LDG}$ are not identical, in general, although (3.18) holds. This explains why the normal component of the limiting LDG-H numerical trace may not be single valued, although the numerical trace of its equivalent LDG method is single valued.
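The one-sided limit behind (3.12) can be illustrated numerically. The following is a minimal sketch, not part of the original analysis, under these assumptions: one space dimension, $K^-$ to the left of the interface and $K^+$ to the right (so $n^- = +1$, $n^+ = -1$), the LDG-H flux $q + \tau (u - \lambda) n$ on each side, and the usual jump $[\![q]\!] = q^+ \cdot n^+ + q^- \cdot n^-$. The function name `ldgh_traces_1d` and the sample values are hypothetical.

```python
def ldgh_traces_1d(u_plus, u_minus, q_plus, q_minus, tau_plus, tau_minus):
    """Return (lam, flux_minus, flux_plus) at a 1D interface.

    Assumption for this sketch: K^- lies to the left of the interface and
    K^+ to the right, so the outward unit normals are n^- = +1, n^+ = -1.
    """
    n_minus, n_plus = 1.0, -1.0
    jump_q = q_plus * n_plus + q_minus * n_minus  # [[q_h]]
    # Conservativity of the fluxes q + tau (u - lam) n, solved for lam.
    lam = (tau_plus * u_plus + tau_minus * u_minus + jump_q) / (tau_plus + tau_minus)
    flux_minus = q_minus + tau_minus * (u_minus - lam) * n_minus
    flux_plus = q_plus + tau_plus * (u_plus - lam) * n_plus
    return lam, flux_minus, flux_plus


u_plus, u_minus, q_plus, q_minus, tau_minus = 2.0, 1.0, 0.5, -0.3, 4.0
jump_u = u_minus - u_plus  # [[u_h]] = u^+ n^+ + u^- n^- in this orientation
for tau_plus in (1.0, 1e2, 1e4, 1e8):
    lam, flux_minus, _ = ldgh_traces_1d(u_plus, u_minus, q_plus, q_minus,
                                        tau_plus, tau_minus)
    # Both differences shrink as tau_plus grows, in line with (3.12).
    print(tau_plus, lam - u_plus, flux_minus - (q_minus + tau_minus * jump_u))
```

For finite $\tau^+$ the two one-sided fluxes agree (the trace is conservative), while for growing $\tau^+$ the trace $\lambda$ approaches $u_h^+$ and the flux approaches $q_h^- + \tau^- [\![u_h]\!]$.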
3.5. The CG-H method. The CG-H methods are obtained by using the CG method to define the local solvers. We are also going to see that they can be obtained from LDG-H methods by letting $\tau$ go to infinity everywhere. Again, we need to specify the main ingredients of the local solvers. Similarly to the limiting case of LDG-H methods, we need to give a new meaning to the local solvers since $\tau = \infty$. Since the numerical flux $\widehat{Q}\cdot$ will be unknown, we need an appropriate space for its approximation.

1. For any $k \ge 1$ and any $K \in \mathcal{T}_h$, we define the finite element spaces by
\[ V(K) = P_{k-1}(K)^n, \quad W(K) = P_k(K), \quad\text{and}\quad T(\partial K) := \{ n_K\, w|_{\partial K} : w \in W(K) \}. \tag{3.19} \]

2. The numerical traces of fluxes $\widehat{Q}\cdot$ are unknown and will be determined by the modified local solvers as follows: $(Qm, Um, \widehat{Q}m) \in V(K) \times W(K) \times T(\partial K)$ is a solution to the problem
\[ (c\, Qm, v)_K - (Um, \operatorname{div} v)_K = -\langle m, v \cdot n \rangle_{\partial K}, \tag{3.20a} \]
\[ -(\operatorname{grad} w, Qm)_K + \langle w, \widehat{Q}m \cdot n \rangle_{\partial K} + (d\, Um, w)_K = 0, \tag{3.20b} \]
\[ Um = m \quad \text{on } \partial K, \tag{3.20c} \]
for all $v \in V(K)$ and $w \in W(K)$. Similarly, $(Qf, Uf, \widehat{Q}f) \in V(K) \times W(K) \times T(\partial K)$ is defined by
\[ (c\, Qf, v)_K - (Uf, \operatorname{div} v)_K = 0, \tag{3.21a} \]
\[ -(\operatorname{grad} w, Qf)_K + \langle w, \widehat{Q}f \cdot n \rangle_{\partial K} + (d\, Uf, w)_K = (f, w)_K, \tag{3.21b} \]
\[ Uf = 0 \quad \text{on } \partial K, \tag{3.21c} \]
for all $v \in V(K)$ and $w \in W(K)$.

3. For the space of approximate traces, we take
\[ M_h := M^c_{h,k}. \tag{3.22} \]
We begin our discussion regarding the above CG-H method by verifying the assumptions required by Theorem 2.4.

Proposition 3.5. Assumption 2.1 on the existence and the uniqueness of the local solvers holds for the CG-H local solver. Assumption 2.2 on the positive semidefiniteness of the local solvers and Assumption 2.3, the gluing condition, hold with $M(\partial K) = L^2(\partial K)$.

Proof. We prove the result for local solver $(Qm, Um, \widehat{Q}m)$ defined by (3.20). The result for the local mapping defined by (3.21) is similar. Since the resulting system is square, we prove only uniqueness since this implies existence. Thus, we need to show that if $m = 0$, then the only solution is the trivial one. Taking $v = Qm$ in (3.20a) and $w = Um$ in (3.20b) and adding the resulting equations, we get
\[ (c\, Qm, Qm)_K + \langle Um, (\widehat{Q}m - Qm) \cdot n \rangle_{\partial K} + (d\, Um, Um)_K = 0. \]
Since, by (3.20c), $Um = 0$ on $\partial K$, we immediately obtain that $Qm = 0$. This implies that (3.20a) can be rewritten as follows:
\[ (\operatorname{grad} Um, v)_K = 0 \quad \text{for all } v \in V(K), \]
which implies that $Um = 0$.

It remains to show that $\widehat{Q}m = 0$. To do that, we use (3.20b) rewritten as
\[ \langle w, \widehat{Q}m \cdot n \rangle_{\partial K} = 0 \quad \text{for all } w \in W(K). \]
By the definition of space $T(\partial K)$, we can find a function $w \in W(K)$ such that $\widehat{Q}m = w\, n$. This readily implies that $\widehat{Q}m = 0$. This completes the verification of Assumption 2.1.

Inequality (2.13a) of Assumption 2.2 can easily be seen to hold. The second part of Assumption 2.2 also holds, since $M(\partial K) = L^2(\partial K)$. Finally, Assumption 2.3 trivially holds.

Next, we discuss the conservativity condition. Flux approximation $q_h$ of the CG-H method is, in general, not in $H(\operatorname{div}, \Omega)$. Nonetheless, it is interesting to observe that even the CG-H method has a weak conservativity property. This property holds for numerical flux trace $\widehat{q}_h = \widehat{Q}\lambda_h + \widehat{Q}g_h + \widehat{Q}f$, a quantity that is not present in the standard formulations of the CG methods but essential in our approach. Indeed, Theorem 2.1 asserts that $\widehat{q}_h$ satisfies
\[ \langle \mu, [\![ \widehat{q}_h ]\!] \rangle_{\mathcal{E}_h^\circ} = 0 \quad \text{for all } \mu \in M_h, \]
which is a weak conservativity condition.

Observe that if $a$ is a constant matrix on each element, by the definition of local solvers (3.20) and (3.21), we have that
\[ Qm = -a \operatorname{grad} Um \quad\text{and}\quad Qf = -a \operatorname{grad} Uf. \tag{3.23} \]
Hence, $q_h$ in (2.8a), being the sum of the local flux solutions, equals $-a \operatorname{grad} u_h$ on each element. Substituting this in (2.8b) and using the conservativity condition, we immediately see that $u_h$ satisfies the standard CG equations. In addition, the boundary conditions defining local solvers (3.20c) and (3.21c) imply that $u_h$ is continuous.
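Identity (3.23) can be checked by hand in the simplest setting. The sketch below is an illustration under assumptions not made in the text: one space dimension, $k = 1$, an element $K = [x_0, x_1]$, so that $V(K)$ contains constants and $W(K)$ linear functions. Testing (3.20a) with $v \equiv 1$ then gives $Qm \int_K c = -(m(x_1) - m(x_0))$, while (3.20c) makes $Um$ the linear interpolant of $m$. The function name `cgh_local_solver_1d` and the quadrature order are hypothetical choices.

```python
import numpy as np


def cgh_local_solver_1d(x0, x1, m0, m1, c, n_quad=8):
    """1D, k = 1 sketch of the CG-H local solver (3.20) on K = [x0, x1].

    Returns (Qm, dUm): the constant flux Qm and the slope of the linear Um.
    `c` is a vectorized callable giving c(x) = 1 / a(x).
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    h = x1 - x0
    x = 0.5 * h * nodes + 0.5 * (x0 + x1)       # quadrature points in K
    int_c = 0.5 * h * np.sum(weights * c(x))    # integral of c over K
    q = -(m1 - m0) / int_c                      # (3.20a) tested with v = 1
    du = (m1 - m0) / h                          # Um interpolates m, by (3.20c)
    return q, du


# Constant a = 3: the flux equals -a * dUm, as in (3.23).
q, du = cgh_local_solver_1d(0.0, 0.5, 1.0, 2.0, lambda x: np.full_like(x, 1.0 / 3.0))
print(q, -3.0 * du)

# Non-constant a(x) = 1 + x on [0, 1]: Qm = -abar * dUm with abar = h / int_K c.
q, du = cgh_local_solver_1d(0.0, 1.0, 0.0, 1.0, lambda x: 1.0 / (1.0 + x))
print(-q / du, 1.0 / np.log(2.0))
```

In this one-dimensional setting, a non-constant coefficient is effectively replaced by the reciprocal of the element average of $c$.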
Thus, we conclude that this CG-H formulation coincides with the CG method whenever $a$ is constant. In other words, the original CG method is a CG-H method when the matrix-valued function $a$ is a constant on each element. In this case, we can also simplify the forms in (2.9) using (3.23) to
\[ a_h(\eta, \mu) = (a \operatorname{grad} U\eta, \operatorname{grad} U\mu)_{\mathcal{T}_h} + (d\, U\eta, U\mu)_{\mathcal{T}_h}, \qquad b_h(\mu) = \langle g_h, \widehat{Q}\mu \rangle_{\mathcal{E}_h} + (f, U\mu)_{\mathcal{T}_h}. \]
Note that in our case, we do not necessarily have that $g_h|_{\mathcal{E}_h^\circ} = 0$. Hence, the corresponding integral cannot be performed only on $\partial\Omega$ as in the previous cases. Formulation (2.9) is nothing but the weak formulation for the CG method with static condensation of its interior degrees of freedom. This hybridization approach for the CG methods of degree $k$ is explored in [31], where, in particular, a postprocessing technique providing locally conservative flux approximations competitive with that given by the RT methods of degree $k - 1$ is introduced.

When the matrix-valued function $a$ is not constant on each element, we cannot write (3.23) anymore. Instead, “$a$” has to be replaced by a function “$\bar{a}$,” which is, roughly speaking, the inverse of some local average of $c$, the inverse of $a$. In practice, however, we do not compute the matrix-valued function $\bar{a}$; instead, we compute directly the functions $Qm$ and $Qf$ by using the definition of the local solvers.

3.6. IP-H methods. The IP-H methods are obtained by using the numerical traces and the local solvers of the IP method. Thus,

1. the numerical traces are given by
\[ \widehat{Q}m = -a \operatorname{grad} Um + \tau_K (Um - m)\, n, \qquad \widehat{Q}f = -a \operatorname{grad} Uf + \tau_K (Uf)\, n \quad \text{on } \partial K; \tag{3.24} \]

2. the finite element space $V(K) \times W(K)$ is defined for $k \ge 1$ as
\[ V(K) = P_k(K)^n, \qquad W(K) = P_k(K); \tag{3.25} \]

3. the space of approximate traces is chosen as
\[ M_h := M_{h,k}. \tag{3.26} \]
As before, τ is a double-valued function on E◦h , with two branches τ − = τK − and τ + = τK + defined on the edge e shared by the finite elements K − and K + . Note that IP methods can be defined by using a flux formulation, as the one employed here to define the local solvers or by means of a primal formulation; see [5]. These two IP methods, however, do coincide whenever the function a is a constant on each element K ∈ Th . For this reason, we are going to assume here that this is the case. All the results for this case, however, can be easily extended to the case in which a is not necessarily piecewise constant. Next, we provide sufficient conditions for the IP-H method to be well defined. For simplicity, we assume that mesh Th is shape regular, that is, that there is a constant γ > 0 such that hK /ρK ≤ γ for all simplexes K ∈ Th , where hK is the diameter of K and ρK the diameter of the largest ball contained in K. Proposition 3.6. Let the numerical traces be given by (3.24) and the local spaces by (3.25). Suppose a(x) is a constant matrix on each element K. Then Assumption (2.1) on the existence and the uniqueness of the local solvers holds provided τK > c0 /hK for some constant c0 > 0 depending on γ and a(x).
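The shape-regularity ratio $h_K / \rho_K$ that enters the constant $\gamma$ is straightforward to evaluate for a triangle. The following is a small sketch, assuming $n = 2$ and using the elementary facts that the diameter of a triangle is its longest edge and that the inscribed circle has diameter $4\,|K| / |\partial K|$; the function name `shape_regularity_ratio` is hypothetical.

```python
import math


def shape_regularity_ratio(p0, p1, p2):
    """Return h_K / rho_K for the triangle with vertices p0, p1, p2."""
    edges = [math.dist(p0, p1), math.dist(p1, p2), math.dist(p2, p0)]
    h_K = max(edges)  # diameter of a triangle = longest edge
    area = 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                     - (p2[0] - p0[0]) * (p1[1] - p0[1]))
    rho_K = 4.0 * area / sum(edges)  # diameter of the inscribed circle
    return h_K / rho_K


# A well-shaped element versus a thin one: the ratio grows as K degenerates.
print(shape_regularity_ratio((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))
print(shape_regularity_ratio((0.0, 0.0), (1.0, 0.0), (0.5, 0.01)))
```

A mesh family is shape regular when this ratio stays bounded by a single $\gamma$ over all elements; the lower bound $c_0 / h_K$ on $\tau_K$ in Proposition 3.6 depends on that bound.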
For a proof, see [6, 3]. Having established that the local solvers are well defined, we can apply Theorem 2.1. We find that the conservativity condition implies that $\lambda_h$ solves (2.9), with
\[ a_h(\eta, \mu) = (c\, Q\eta, Q\mu)_{\mathcal{T}_h} + (d\, U\eta, U\mu)_{\mathcal{T}_h} + \langle 1, [\![ (\mu - U\mu)(a \operatorname{grad} U\eta + Q\eta) ]\!] \rangle_{\mathcal{E}_h} + \langle 1, [\![ (U\mu - \mu)(\tau (U\eta - \eta) n) ]\!] \rangle_{\mathcal{E}_h}, \]
\[ b_h(\mu) = \langle g_h, -a \operatorname{grad} U\mu \cdot n + \tau\, U\mu \rangle_{\partial\Omega} + (f, U\mu)_{\mathcal{T}_h}, \]
provided $g_h|_{\mathcal{E}_h^\circ} = 0$. Using (2.12a) of Lemma 2.3 and the fact that $a(x)$ is constant on each $K$, we can simplify this expression as follows:
\[ a_h(\eta, \mu) = (c\, Q\eta, Q\mu)_{\mathcal{T}_h} - (c\, Q\eta + \operatorname{grad} U\eta,\ Q\mu + a \operatorname{grad} U\mu)_{\mathcal{T}_h} + (d\, U\eta, U\mu)_{\mathcal{T}_h} + \langle 1, [\![ (U\mu - \mu)(\tau (U\eta - \eta) n) ]\!] \rangle_{\mathcal{E}_h} \]
\[ = (a \operatorname{grad} U\eta, \operatorname{grad} U\mu)_{\mathcal{T}_h} + (d\, U\eta, U\mu)_{\mathcal{T}_h} + \langle 1, [\![ (U\eta - \eta)\, a \operatorname{grad} U\mu + (U\mu - \mu)\, a \operatorname{grad} U\eta ]\!] \rangle_{\mathcal{E}_h} + \langle 1, [\![ (U\mu - \mu)(\tau (U\eta - \eta) n) ]\!] \rangle_{\mathcal{E}_h}. \]
The positive definiteness of the form $a_h(\cdot, \cdot)$ can be proven as in the case of LDG-H methods. Indeed, this fact is an immediate consequence of Theorem 2.4 and the following result.

Proposition 3.7. Let the numerical traces of the fluxes be set by (3.24), the local spaces be defined by (3.25), and the space of approximate traces be set by (3.26). Suppose $a(x)$ is a constant matrix on each element $K$. Then Assumption 2.2 on the positive semidefiniteness of the local solvers and Assumption 2.3, the gluing condition, are satisfied with $M(\partial K) = \{ \mu : \mu|_e \in P_k(e) \text{ for all faces } e \in \partial K \}$ whenever $\tau_K > c_0 / h_K$ for some constant $c_0 > 0$ depending on $\gamma$ and $a(x)$.

The proof of this result is similar to that of Proposition 3.3. Just as for LDG-H methods, we can give a characterization of the IP-H methods. It is given in the proposition below, which is an analog of Proposition 3.4 for the LDG-H methods. Since the proof is similar, we omit it.

Proposition 3.8 (characterization of IP-H methods). Let the numerical traces be set by (3.24), the spaces be as in (3.25), and $(q_h, u_h)$ be as defined in (2.5). Then conservativity condition (2.6) holds if and only if on $\mathcal{E}_h^\circ$
\[ \lambda_h = \widehat{u}_h = \frac{\tau^+}{\tau^- + \tau^+}\, u_h^+ + \frac{\tau^-}{\tau^- + \tau^+}\, u_h^- - \frac{1}{\tau^+ + \tau^-}\, [\![ a \operatorname{grad} u_h ]\!], \tag{3.27a} \]
\[ \widehat{q}_h = -\frac{\tau^-}{\tau^- + \tau^+}\, a \operatorname{grad} u_h^+ - \frac{\tau^+}{\tau^- + \tau^+}\, a \operatorname{grad} u_h^- + \frac{\tau^+ \tau^-}{\tau^- + \tau^+}\, [\![ u_h ]\!]. \tag{3.27b} \]
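The characterization in Proposition 3.8 can be reproduced in one space dimension by solving the conservativity condition for the trace directly. The sketch below is illustrative only and assumes a scalar coefficient $a$, $K^-$ to the left of the interface and $K^+$ to the right ($n^- = +1$, $n^+ = -1$), and the IP-H fluxes of (3.24); the function name `iph_traces_1d` and the sample values are hypothetical.

```python
def iph_traces_1d(u_plus, u_minus, du_plus, du_minus, a, tau_plus, tau_minus):
    """Return (lam, flux_minus, flux_plus) for IP-H fluxes at a 1D interface.

    Assumption for this sketch: K^- is on the left and K^+ on the right,
    so n^- = +1 and n^+ = -1; du_* are the one-sided derivatives of u_h.
    """
    n_minus, n_plus = 1.0, -1.0
    jump_agrad = a * du_plus * n_plus + a * du_minus * n_minus  # [[a grad u_h]]
    # Conservativity of -a u' + tau (u - lam) n over both sides, solved for lam.
    lam = (tau_plus * u_plus + tau_minus * u_minus - jump_agrad) / (tau_plus + tau_minus)
    flux_minus = -a * du_minus + tau_minus * (u_minus - lam) * n_minus
    flux_plus = -a * du_plus + tau_plus * (u_plus - lam) * n_plus
    return lam, flux_minus, flux_plus


# Equal penalties tau = 1 / (alpha * h), with hypothetical sample data.
alpha, h = 0.1, 0.5
tau = 1.0 / (alpha * h)
u_p, u_m, du_p, du_m, a = 2.0, 1.0, 0.7, -0.2, 1.5
lam, flux, _ = iph_traces_1d(u_p, u_m, du_p, du_m, a, tau, tau)

jump_u = u_m - u_p                     # [[u_h]] in this orientation
jump_agrad = -a * du_p + a * du_m      # [[a grad u_h]]
print(lam, 0.5 * (u_p + u_m) - 0.5 * alpha * h * jump_agrad)
print(flux, -0.5 * a * (du_p + du_m) + jump_u / (2.0 * alpha * h))
```

With equal penalties the trace reduces to the average of $u_h$ minus $(\alpha h / 2)\, [\![ a \operatorname{grad} u_h ]\!]$, and the flux to minus the average of $a \operatorname{grad} u_h$ plus $(2 \alpha h)^{-1} [\![ u_h ]\!]$, which is the form (3.27) takes when $\tau^+ = \tau^-$.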
We also have results analogous to Corollary 3.2.

Corollary 3.4. The standard IP method is not an IP-H method for any finite $\tau$.

Proof. Comparing the numerical traces of the standard IP method (see [5, Table 3.1]), namely,
\[ \widehat{u}_h^{IP} = \{\!\{ u_h \}\!\} \quad\text{and}\quad \widehat{q}_h^{IP} = -\{\!\{ a \operatorname{grad} u_h \}\!\} + C\, [\![ u_h ]\!], \]
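with the expressions for the numerical traces in Proposition 3.8, we find that they cannot coincide for any value of $\tau$.

In spite of this negative result, a stabilized DG finite element method introduced in [38] and rewritten in [37] as an IP method turns out to be an IP-H method. To describe this scheme in a simple setting, assume that $d = 0$ and $g = 0$. The method, as presented in [38], does not use the function $\lambda_h$ approximating $u|_{\mathcal{E}_h}$. Instead, it uses approximate fluxes $\ell_h$ approximating the normal component of $a \operatorname{grad} u$. The space in which $\ell_h$ lies is the space of scalar double-valued functions defined by
\[ L_h = \{ q : q|_e \in P_k(e) \text{ for all } e \in \mathcal{E}_h \text{ and } q_{K^+} + q_{K^-} = 0 \text{ on } e = \partial K^+ \cap \partial K^- \}. \]
The DG method of [38] seeks $u_h \in W_h$, given by (2.1), with $W(K) = P_k(K)$, and $\ell_h \in L_h$ such that
\[ \sum_{K \in \mathcal{T}_h} \Big[ (a \operatorname{grad} u_h, \operatorname{grad} v)_K - \langle \ell_h, v \rangle_{\partial K} - \langle \eta, u_h \rangle_{\partial K} \Big] - \alpha h \sum_{K \in \mathcal{T}_h} \langle \ell_h - a \operatorname{grad} u_h \cdot n_K,\ \eta - a \operatorname{grad} v \cdot n_K \rangle_{\partial K} = (f, v) \tag{3.28} \]
for all $v \in W_h$ and $\eta \in L_h$. Here, $\alpha > 0$ is a constant stabilization parameter, and $h = \max_{K \in \mathcal{T}_h} h_K$. Taking $v \equiv 0$ and using that $\{\!\{ \eta \}\!\} = 0$ on $\mathcal{E}_h^\circ$, we get
\[ \ell_h = \begin{cases} \{\!\{ a \operatorname{grad} u_h \}\!\} \cdot n - \dfrac{1}{2 \alpha h}\, [\![ u_h ]\!] \cdot n & \text{on } \mathcal{E}_h^\circ, \\[1ex] a \operatorname{grad} u_h \cdot n - \dfrac{1}{\alpha h}\, u_h & \text{on } \mathcal{E}_h^\partial. \end{cases} \tag{3.29} \]
We see from the above equation that $\ell_h$ is indeed an approximation to the normal component of $a \operatorname{grad} u$. Next, taking $\eta \equiv 0$ in (3.28) and substituting therein the expression for $\ell_h$ from (3.29), we get that $u_h \in W_h$ satisfies
\[ (a \operatorname{grad} u_h, \operatorname{grad} v)_{\mathcal{T}_h} - \langle \{\!\{ a \operatorname{grad} v \}\!\}, [\![ u_h ]\!] \rangle_{\mathcal{E}_h} - \Big\langle \{\!\{ a \operatorname{grad} u_h \}\!\} - \frac{1}{2 \alpha h} [\![ u_h ]\!],\ [\![ v ]\!] \Big\rangle_{\mathcal{E}_h^\circ} - \Big\langle a \operatorname{grad} u_h - \frac{1}{\alpha h}\, u_h n,\ v n \Big\rangle_{\mathcal{E}_h^\partial} - \Big\langle \frac{\alpha h}{2} [\![ a \operatorname{grad} u_h ]\!],\ [\![ a \operatorname{grad} v ]\!] \Big\rangle_{\mathcal{E}_h^\circ} = (f, v) \tag{3.30} \]
for all $v \in W_h$. Now, we show that this is an IP-H method. Comparing the above formulation with the general primal formulation given by [5, equation (3.11)], we can easily verify that if we take
\[ \widehat{u}_h = \{\!\{ u_h \}\!\} - \frac{\alpha h}{2}\, [\![ a \operatorname{grad} u_h ]\!] \ \text{ on } \mathcal{E}_h^\circ, \qquad \widehat{q}_h = \begin{cases} -\{\!\{ a \operatorname{grad} u_h \}\!\} + \dfrac{1}{2 \alpha h}\, [\![ u_h ]\!] & \text{on } \mathcal{E}_h^\circ, \\[1ex] -a \operatorname{grad} u_h + \dfrac{1}{\alpha h}\, u_h n & \text{on } \mathcal{E}_h^\partial, \end{cases} \tag{3.31} \]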
we recover (3.30). Hence, the above numerical traces are exactly the numerical traces of the IP-H method given by Proposition 3.8 with $\tau^+ = \tau^- = (\alpha h)^{-1}$. This shows that the DG method proposed in [38] is an IP-H method. The correspondence between their flux approximation $\ell_h$ and our numerical flux trace follows immediately from (3.31) and (3.29):
\[ \widehat{q}_h \cdot n = -\ell_h. \]
It also follows from Proposition 3.6 that the IP method of (3.30) is well defined when $\alpha > 0$ is sufficiently small, a result already established in [38].

Let us end by pointing out that other IP-H-like methods can be obtained. For example, we could take $V(K) = P_{k-1}(K)^n$.

3.7. The NC-H methods. We now consider nonconforming hybridizable (NC-H) methods and show that methods like the $P_1$-nonconforming method introduced in [36] in the framework of the stationary Stokes equations are, in fact, NC-H methods. Again the main components of the NC-H method are defined as follows:

1. For any $k \ge 1$, set
\[ V(K) = P_{k-1}(K)^n, \quad W(K) = P_k(K), \quad M(\partial K) = \{ q : q|_e \in P_{k-1}(e) \text{ for every face } e \text{ of } K \}, \quad T(\partial K) = \{ q\, n_K : q|_e \in P_{k-1}(e) \text{ for every face } e \text{ of } K \}. \tag{3.32} \]

2. Define local solutions $(Qm, Um, (\widehat{Q}m)_K)$ and $(Qf, Uf, (\widehat{Q}f)_K)$ as the elements of $V(K) \times W(K) \times T(\partial K)$ satisfying
\[ (c\, Qm, v)_K - (Um, \operatorname{div} v)_K = -\langle m, v \cdot n \rangle_{\partial K}, \tag{3.33a} \]
\[ -(\operatorname{grad} w, Qm)_K + \langle w, \widehat{Q}m \cdot n \rangle_{\partial K} + (d\, Um, w)_K = 0, \tag{3.33b} \]
\[ \langle Um, \mu \rangle_{\partial K} = \langle m, \mu \rangle_{\partial K}, \tag{3.33c} \]
for all $v \in V(K)$, $w \in W(K)$, and $\mu \in M(\partial K)$, and
\[ (c\, Qf, v)_K - (Uf, \operatorname{div} v)_K = 0, \tag{3.34a} \]
\[ -(\operatorname{grad} w, Qf)_K + \langle w, \widehat{Q}f \cdot n \rangle_{\partial K} + (d\, Uf, w)_K = (f, w), \tag{3.34b} \]
\[ \langle Uf, \mu \rangle_{\partial K} = 0, \tag{3.34c} \]
for all $v \in V(K)$, $w \in W(K)$, and $\mu \in M(\partial K)$.

3. The space of approximate traces is given by $M_h = M_{h,k-1}$.

Having completed the definition of the main ingredients of the method, we now verify the assumptions of Theorem 2.4. Sufficient conditions under which Assumption 2.1 on the existence and the uniqueness of the local solvers holds are given next.

Proposition 3.9. For $k = 1$ and arbitrary $n$ and for odd $k > 1$ and $n = 2$, local solvers (3.33) and (3.34) have unique solutions.

Proof. We prove only the result for the first local solver, since the other can be proven in a similar way. Since (3.33) is a square system, it suffices to prove that if $m = 0$, then $Qm = 0$, $Um = 0$, and $\widehat{Q}m = 0$. Choosing $v = Qm$ and $w = Um$,
adding (3.33a) and (3.33b), and integrating by parts, we get
\[ (c\, Qm, Qm)_K + (d\, Um, Um)_K + \langle Um, (\widehat{Q}m - Qm) \cdot n \rangle_{\partial K} = 0. \]
If $m = 0$, (3.33c) implies that $\langle Um, \mu \rangle_{\partial K} = 0$ for all $\mu \in M(\partial K)$. Since $(\widehat{Q}m - Qm) \cdot n \in M(\partial K)$, the last term on the left-hand side above is zero, and hence, $Qm = 0$ and $d\, Um = 0$. Substituting this into (3.33a), we have
\[ 0 = (Um, \operatorname{div} v)_K = -(\operatorname{grad} Um, v)_K \quad \text{for all } v \in P_{k-1}(K)^n, \]
where, while integrating by parts, we have again used that $\langle Um, v \cdot n \rangle_{\partial K} = 0$. Thus, $\operatorname{grad} Um$ vanishes, so $Um$ is a constant function, and $\langle Um, \mu \rangle_{\partial K} = 0$ implies that it vanishes identically.

It remains to show that $\widehat{Q}m \cdot n$ also vanishes. Since both $Qm$ and $d\, Um$ vanish, (3.33b) implies that
\[ \langle w, \widehat{Q}m \cdot n \rangle_{\partial K} = 0 \quad \text{for all } w \in P_k(K). \tag{3.35} \]
For $k = 1$, that is, for Crouzeix–Raviart nonconforming finite elements, the result follows easily for any dimension $n \ge 2$. Indeed, let $\widehat{Q}m \cdot n|_{e_j} = a_j$ for some constants $a_j$, $j = 1, \ldots, n + 1$. Let $w \in P_1(K)$ be a linear function on $K$ which takes values $a_j$ at the centroids of the faces $e_j$ of $K$, $j = 1, \ldots, n + 1$. Then
\[ 0 = \langle w, \widehat{Q}m \cdot n \rangle_{\partial K} = \sum_{j=1}^{n+1} |e_j|\, a_j^2 \]
implies $a_j = 0$ for all faces, that is, $\widehat{Q}m \cdot n = 0$.

Finally, we show the same for $k$ odd and $n = 2$. Let $e_1$, $e_2$, and $e_3$ denote the three edges of $K$, and let $L_i^{(j)}$ denote the $i$th Legendre polynomial mapped affinely to $e_j$ from $[-1, 1]$. Assume that the first vertex of the edge $e_j$ is mapped to the point $-1$, and that, as we go from its first to its second vertex, the triangle $K$ is to our left. Since $\widehat{Q}m \cdot n|_{e_j} \in P_{k-1}(e_j)$, we can write
\[ (\widehat{Q}m)_K \cdot n_K|_{e_j} = \sum_{i=0}^{k-1} a_i^{(j)} L_i^{(j)}. \]
Note that when $i$ is even, $L_i^{(j)}$ takes the same value at the endpoints of $e_j$. Therefore, for any even $i$, we can choose a $w$ in (3.35) such that $w|_{e_1} = L_i^{(1)}$, $w|_{e_2} = -L_k^{(2)}$, and $w|_{e_3} = L_k^{(3)}$ (because with these choices $w|_{\partial K}$ is continuous). Then (3.35) implies that the coefficient $a_i^{(1)}$ vanishes. Repeating the argument for all edges, we find that $a_i^{(j)} = 0$ for all even $i$ and $j = 1, 2, 3$. Next, for odd $i$, choose $w$ such that $w|_{e_1} = L_i^{(1)}$, $w|_{e_2} = L_{i-1}^{(2)}$, and $w|_{e_3} = -L_k^{(3)}$. Since $k$ is odd, these choices make $w|_{\partial K}$ continuous, so such a $w$ can be found. With this $w$, (3.35) now gives that $a_i^{(1)} = 0$ for all odd $i$ as well. Repeating this argument for other edges, we find all coefficients to be zero, so $\widehat{Q}m$ vanishes.

Conservativity condition (2.6) with $M_h = M_{h,k-1}$ clearly implies strong conservativity. Using Theorem 2.1 and noting that the unknown fluxes $\widehat{Q}\cdot$ cancel off in weak formulation (2.9), by boundary condition (3.33c) for the local solver, we have that the bilinear form is symmetric:
\[ a_h(\eta, \mu) = (c\, Q\eta, Q\mu)_{\mathcal{T}_h} + (d\, U\eta, U\mu)_{\mathcal{T}_h}. \]
Its positive definiteness will follow from Theorem 2.4 once Assumptions 2.2 and 2.3 are verified, which we do next.
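The $k = 1$ step in the proof of Proposition 3.9 rests on the identity $\langle w, \widehat{Q}m \cdot n \rangle_{\partial K} = \sum_j |e_j|\, a_j^2$ for the linear function $w$ taking the values $a_j$ at the edge midpoints. The sketch below checks this on a triangle ($n = 2$). It assumes the standard Crouzeix–Raviart basis $1 - 2\lambda_j$ in barycentric coordinates and a two-point Gauss rule on each edge; the function names and the sample data are hypothetical.

```python
import numpy as np


def barycentric(verts, x):
    """Barycentric coordinates of the point x in the triangle verts (3 x 2)."""
    T = np.vstack([verts.T, np.ones(3)])
    return np.linalg.solve(T, np.array([x[0], x[1], 1.0]))


def cr_function(verts, a):
    """Linear w with w = a[j] at the midpoint of the edge opposite vertex j."""
    return lambda x: float(np.dot(a, 1.0 - 2.0 * barycentric(verts, x)))


def boundary_pairing(verts, a):
    """Compute <w, Qhat.n> on the boundary when Qhat.n = a[j] on edge j."""
    w = cr_function(verts, a)
    g = 1.0 / np.sqrt(3.0)  # two-point Gauss abscissa, exact for linear w
    total = 0.0
    for j in range(3):
        p, q = verts[(j + 1) % 3], verts[(j + 2) % 3]  # edge opposite vertex j
        length = np.linalg.norm(q - p)
        mid, half = 0.5 * (p + q), 0.5 * (q - p)
        integral_w = 0.5 * length * (w(mid - g * half) + w(mid + g * half))
        total += a[j] * integral_w
    return total


verts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
a = np.array([1.0, -2.0, 0.5])
lengths = [np.linalg.norm(verts[(j + 2) % 3] - verts[(j + 1) % 3]) for j in range(3)]
print(boundary_pairing(verts, a), sum(l * v**2 for l, v in zip(lengths, a)))
```

Since the pairing equals a weighted sum of squares, it can vanish only when every $a_j$ is zero, which is the conclusion drawn in the proof.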
Proposition 3.10. Assumption 2.2 on the positive semidefiniteness of the local solvers and Assumption 2.3, the gluing condition, are satisfied with $M(\partial K)$ defined as in (3.32).

Proof. First, we show that condition (2.13a) holds. Taking $v = Qm$ in (3.33a), $w = Um$ in (3.33b), and adding the equations, we get, after a few simple algebraic manipulations, that
\[ -\langle m, \widehat{Q}m \cdot n \rangle_{\partial K} = (c\, Qm, Qm)_K + (d\, Um, Um)_K + \langle (\widehat{Q}m - Qm) \cdot n,\ Um - m \rangle_{\partial K} = (c\, Qm, Qm)_K + (d\, Um, Um)_K, \]
by boundary condition (3.33c) for the local solver. This implies that (2.13a) of Assumption 2.2 is satisfied.

Now, we prove condition (2.13b). If $\langle m, \widehat{Q}m \cdot n \rangle_{\partial K} = 0$, then $Qm|_K = 0$ and (3.33a) becomes
\[ (\operatorname{grad} Um, v)_K = \langle Um - m, v \cdot n \rangle_{\partial K} = 0 \quad \text{for all } v \in V(K). \]
This implies that $Um$ is a constant. This shows that condition (2.13b) of Assumption 2.2 is satisfied. Assumption 2.3 is trivially satisfied, and this completes the proof.

In Tables 3.1 and 3.2, we give the simplified weak formulation of the NC-H method under the further assumption that $c(x)$ is a constant matrix on each $K$ in $\mathcal{T}_h$. In this case, we can show that the original NC method is an NC-H method. To see why, first observe that by summing up the last equation of the local solvers, we find that $u_h = U\lambda_h + Ug_h + Uf$ satisfies
\[ \langle [\![ u_h ]\!], \mu \rangle_e = 0 \quad \text{for all } \mu \in P_{k-1}(e) \]
for all interior faces $e$, so the weak continuity constraints of the discontinuous method are satisfied. Now, (2.12a) and (2.12c) become $(c\, Q\lambda_h + \operatorname{grad} U\lambda_h, v)_{\mathcal{T}_h} = 0$ and $(c\, Qf + \operatorname{grad} Uf, v)_{\mathcal{T}_h} = 0$, which gives
\[ q_h = Q\lambda_h + Qg_h + Qf = -a \operatorname{grad} (U\lambda_h + Ug_h + Uf) = -a \operatorname{grad} u_h. \]
Then (2.8a) implies
\[ (a \operatorname{grad} u_h, \operatorname{grad} v_h)_{\mathcal{T}_h} + (d\, u_h, v_h)_{\mathcal{T}_h} = (f, v_h) \]
for all $v_h \in \{ w : w \in W_h,\ \langle [\![ w ]\!], \mu \rangle_{\mathcal{E}_h^\circ} = 0 \text{ for all } \mu \in M_h \text{ and } \langle w, m \rangle_{\partial\Omega} = 0 \text{ for all } m \in M_h \}$, which is the familiar primal form of this nonconforming method. Note that although the information in $g_h$ disappears from the right-hand side above, it is contained in $u_h$ as $u_h = U\lambda_h + Ug_h + Uf$.

Let us end this subsection by pointing out that, in the case of lowest order polynomials $k = 1$ and for the case in which $d = 0$ and both $c$ and $f$ are constant on each simplex $K$ of triangulation $\mathcal{T}_h$, our hybridization framework allows us to recover a well-known relationship between the RT method of lowest degree and the nonconforming method [4, 47]. Let us sketch how to obtain it. In this case, we can easily show that local solver $Qm$ is the same for both this nonconforming method and that of the RT method of lowest degree; see the computation of the RT method in [26]. Since we also have that $\widehat{Q}m \cdot n = Qm \cdot n$, we can conclude that the stiffness matrix associated with bilinear form $a_h(\cdot, \cdot)$ of both methods is also the same, if the degrees
of freedom for the numerical traces are the barycenters of the faces. Moreover, since the average on each simplex of the local solver Um coincides with the local solver Um of the RT method under consideration, the matrix associated with linear form bh (·) is also the same for both methods. Of course, in both cases, we take gh at the barycenter of each face e ∈ E∂h to be the average of g on the face e. By Theorem 2.1, the degrees of freedom of the approximate traces are the same for both methods. The above-mentioned relation between the two methods now easily follows from the definition of approximate solutions (2.5). 4. Other novel methods. In this section, we build on the work done in the previous section and construct what are perhaps the three most important examples of methods of the unifying framework. The first is a class of methods employing different local solvers in different parts of the domain, which can easily deal with nonconforming meshes. The second is an RT method that can handle hanging nodes. The third is the family of EDG methods; they are constructed from already known hybridized methods in this unified framework in order to reduce their computational complexity. As for the examples of the previous section, we assume that the mesh is simplicial; however, we do not assume it to be necessarily conforming. 4.1. A class of hybridizable methods well suited for adaptivity. We introduce here a class of hybridizable methods able to use different local solvers in different elements and to easily handle nonconforming meshes. They are thus ideal to use with adaptive strategies. After introducing the methods, we prove that they are all well defined. We then discuss their main advantages and give several examples. To define the methods, we need to specify the numerical fluxes, the local finite element spaces, and the space of approximate traces: 1. For any simplex K ∈ Th , we take (4.1)
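\[ \widehat{Q}m = Qm + \tau_K (Um - m)\, n, \qquad \widehat{Q}f = Qf + \tau_K (Uf)\, n \quad \text{on } \partial K; \tag{4.1} \]
the function $\tau_K$ is allowed to change on $\partial K$.

2. The local space $V(K) \times W(K)$ can be any of the following:
\[ \big( P_{k(K)}(K)^n + x\, P_{k(K)}(K) \big) \times P_{k(K)}(K), \quad \text{where } k(K) \ge 0 \text{ and } \tau_K \ge 0 \text{ on } \partial K, \tag{4.2a} \]
\[ P_{k(K)}(K)^n \times P_{k(K)-1}(K), \quad \text{where } k(K) \ge 1 \text{ and } \tau_K \ge 0 \text{ on } \partial K, \tag{4.2b} \]
\[ P_{k(K)}(K)^n \times P_{k(K)}(K), \quad \text{where } k(K) \ge 0 \text{ and } \tau_K > 0 \text{ on at least one face } e \text{ of } K, \tag{4.2c} \]
\[ P_{k(K)-1}(K)^n \times P_{k(K)}(K), \quad \text{where } k(K) \ge 1 \text{ and } \tau_K > 0 \text{ on } \partial K. \tag{4.2d} \]

3. The space of approximate traces is
\[ M_h = \widetilde{M}_h \cap \big\{ \mu : \mu|_{\partial K} \in C(\{ x \in \partial K : \tau_K(x) = \infty \}) \ \ \forall K \in \mathcal{T}_h \big\}, \tag{4.3a} \]
where
\[ \widetilde{M}_h := \{ \mu \in L^2(\mathcal{E}_h) : \mu|_e \in P_{k(e)}(e) \text{ for all } e \in \mathcal{E}_h^\circ \}. \tag{4.3b} \]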
Fig. 4.1. The interior edges e = PQ and e = QR are contained in the face PR of the element K. Assumption 4.1 is satisfied for this element if τK |∂K ∈ [0, ∞] and if τK |PQ and τK |QR are taken in (0, ∞).
Here, if $e = \partial K^+ \cap \partial K^-$, we set
\[ k(e) := \begin{cases} \max\{ k(K^+), k(K^-) \} & \text{if } \tau^+ < \infty \text{ and } \tau^- < \infty, \\ k(K^+) & \text{if } \tau^+ = \infty \text{ and } \tau^- < \infty, \\ k(K^-) & \text{if } \tau^+ < \infty \text{ and } \tau^- = \infty, \\ \min\{ k(K^+), k(K^-) \} & \text{if } \tau^+ = \infty \text{ and } \tau^- = \infty, \end{cases} \tag{4.3c} \]
and take $g_h = I_h g$ for some interpolation operator $I_h$ into $M_h$. Note that the choice $\tau = \infty$ on some interior faces $e \in \mathcal{E}_h^\circ$ is allowed. We follow the convention that in this case, the definition of the local solvers has to be modified as for the limiting cases of LDG-H methods; see subsection 3.4. The definition of the methods is completed with the following assumption on the values of the stabilization parameter $\tau$.

Assumption 4.1. For each element $K \in \mathcal{T}_h$ and each interior face $e \in \mathcal{E}_h^\circ$ on $\partial K$, $\tau_K|_e \in [0, \infty]$ and
\[ \tau_K|_e \in (0, \infty) \quad \text{if } e \text{ is not a face of } K. \tag{4.4} \]
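The bookkeeping in (4.3c) and Assumption 4.1 is simple enough to state as code. The following sketch is only an illustration; the function names `face_degree` and `satisfies_assumption_4_1`, and the use of `math.inf` to encode $\tau = \infty$, are choices made here and not part of the method's definition.

```python
import math


def face_degree(k_plus, k_minus, tau_plus, tau_minus):
    """Polynomial degree k(e) of the trace space on e, following (4.3c)."""
    inf_plus, inf_minus = math.isinf(tau_plus), math.isinf(tau_minus)
    if not inf_plus and not inf_minus:
        return max(k_plus, k_minus)
    if inf_plus and not inf_minus:
        return k_plus
    if not inf_plus and inf_minus:
        return k_minus
    return min(k_plus, k_minus)


def satisfies_assumption_4_1(tau, is_face_of_K):
    """Check tau_K|_e against Assumption 4.1 for one interior face e on dK."""
    if math.isnan(tau) or tau < 0.0:
        return False              # tau must lie in [0, inf]
    if is_face_of_K:
        return True               # any value in [0, inf] is allowed
    return 0.0 < tau < math.inf   # (4.4): e is only part of a face of K


# Degrees k(K+) = 2, k(K-) = 3 with the four combinations of (4.3c).
for taus in [(1.0, 0.0), (math.inf, 1.0), (1.0, math.inf), (math.inf, math.inf)]:
    print(taus, face_degree(2, 3, *taus))
```

Whenever a branch of $\tau$ is infinite on $e$, the returned degree does not exceed the degree of the element carrying that branch, so the strongly imposed trace can be matched by a function in $W(K)$.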
Let us briefly discuss this assumption. First, let us recall the difference between an interior face e ∈ E◦h and the faces of the simplexes of the triangulation. Each simplex K in the partition Th has n + 1 faces determined by its vertices. On the other hand, if e is an interior face, we have that e = ∂K + ∩ ∂K − for some elements K + and K − in Th . We thus see that, for nonconforming meshes, although each interior face e is contained in a face of K + and a face of K − , it is not necessarily a face of K + or K − . See an example in Figure 4.1. The main motivation of the above assumption can now be easily seen. Indeed, take any K ∈ Th . If e ⊂ ∂K is a face in E◦h which is not a face of K, then the above assumption forces us to take the numerical trace corresponding to an LDG-H method; in this way, the nonconformity of the mesh can be dealt with in a very natural way. If, on the contrary, e is actually a face, the assumption allows us to take either τK = 0, τK ∈ (0, ∞), or even τK = ∞. In this way, the verification of Assumptions 2.1, 2.2, and 2.3 becomes extremely easy, as we are going to see next. Next, we show that the approximate solution (q h , uh ), (2.5), provided by this method is well defined. Proposition 4.1. Consider the method defined by (4.1), (4.2), and (4.3), and let Assumption 4.1 hold. Then Assumption 2.1 on the existence and the uniqueness of
the local solvers, Assumption 2.2 on the positive semidefiniteness of the local solvers, and Assumption 2.3, the gluing condition, hold with
(4.5)
M (∂K) = {μ :
on any face e ∈ E◦h on ∂K, μ|e ∈ Pk(K) (e) if τK |e = 0, and
μ|e ∈ L2 (e) if τK |e > 0}.
Proof. Thanks to Theorem 2.4, we have only to satisfy Assumptions 2.1, 2.2, and 2.3. We begin by verifying Assumption 2.1 on the existence and the uniqueness of the local solvers. Let K be an arbitrary simplex of triangulation Th . Then, as discussed above, by condition (4.4), we have either τK = 0, τK ∈ (0, ∞), or τK = ∞ on each of the faces of each simplex K of triangulation Th . As a consequence, the fact that the local solvers are well defined can be easily obtained by a straightforward modification of the proofs of similar results for the LDG-H methods, Proposition 3.5, and the CG-H method, Proposition 3.5. For this reason, we do not present here the proof. However, let us note that whenever τK |e = ∞, we strongly impose a Dirichlet boundary condition, and so the space of approximate traces restricted to ∂K and local space W (K) must satisfy the following compatibility condition: {μ|S :
μ ∈ Mh} ⊂ {w|S : w ∈ W(K)}, where S := {x ∈ ∂K : τK(x) = ∞}.
This condition can be easily verified by noting that, if τ = ∞ on the interior face e ∈ E◦h, then e must be a face of K by the conditions on the stabilization parameters (4.4), and that, by the definition of k(e), (4.3c), we have k(e) ≤ k(K).

Next, let us prove that Assumption 2.2 on the positive semidefiniteness of the local solvers is satisfied with M(∂K) as in (4.5). For choice (4.2a), it is easy to see that it follows from Proposition 3.1 and from the definition of k(e), (4.3c). For the remaining choices, the result follows from Proposition 3.3 and the definition of k(e), (4.3c).

Assumption 2.3, the gluing condition, also follows by using the arguments of the previous section. Indeed, for an interior face e = ∂K+ ∩ ∂K−, if τ+ or τ− is positive, the result trivially follows from condition (4.3c) and the fact that on e, one of the projections P∂K+ or P∂K− becomes the identity by the definition of k(e), (4.3c). It remains to consider the case τ+ = τ− = 0. By (4.3c), either k(K+) or k(K−) equals k(e), say, k(e) = k(K+). Then we immediately have that P∂K+ μ = μ. This completes the proof.

Next, let us discuss the main features of these methods.

(i) Variable degree approximation spaces on conforming meshes. The RT-H, BDM-H, and LDG-H methods considered in the previous section used a single local solver in each of the elements K of the conforming triangulation Th. A variable-degree version of each of these methods is a particular case of the class of methods presented here. Note that the case of the variable-degree RT method, introduced and analyzed in [27], is exactly the variable-degree version of the method using the RT method as local solvers.

(ii) Automatic coupling of different methods on conforming meshes. The methods presented here allow for the use of different local solvers in different elements K of Th, which are then automatically coupled.
For example, if we are working with the RT, LDG, and CG local solvers, the conservativity condition implicitly imposes
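the following expressions for the numerical traces:
\begin{align*}
\widehat{u}_h &= u_h|_{\Omega_{LDG}} + \frac{1}{\tau_{LDG}}\,[\![\boldsymbol{q}_h]\!], & \widehat{\boldsymbol{q}}_h &= \boldsymbol{q}_h|_{\Omega_{RT}} && \text{(coupling of RT and LDG)},\\
\widehat{u}_h &= u_h|_{\Omega_{CG}}, & \widehat{\boldsymbol{q}}_h &= \boldsymbol{q}_h|_{\Omega_{LDG}} + \tau_{LDG}\,[\![u_h]\!] && \text{(coupling of LDG and CG)},\\
\widehat{u}_h &= u_h|_{\Omega_{CG}}, & \widehat{\boldsymbol{q}}_h &= \boldsymbol{q}_h|_{\Omega_{RT}} && \text{(coupling of CG and RT)}.
\end{align*}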
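To make the three couplings concrete, the following Python sketch evaluates them in one space dimension, where the flux is a scalar and the two outward normals on a face are +1 and −1. The function names and the one-dimensional setting are our own illustrative assumptions and are not part of the methods as defined in the text. The helper ldg_h_flux_trace uses the LDG-H numerical trace q_h + τ(u_h − λ_h)n, so one can check that the LDG side reproduces the coupled flux trace.

```python
def ldg_h_flux_trace(q, u, lam, tau, n):
    """LDG-H numerical flux trace q + tau * (u - lam) * n on one side of a face."""
    return q + tau * (u - lam) * n


def couple_rt_ldg(u_ldg, q_ldg, q_rt, n_ldg, tau_ldg):
    """Traces on a face shared by an RT element (tau = 0) and an LDG element."""
    # Jump of the flux: [[q]] = q_ldg * n_ldg + q_rt * n_rt, with n_rt = -n_ldg.
    jump_q = q_ldg * n_ldg + q_rt * (-n_ldg)
    u_hat = u_ldg + jump_q / tau_ldg
    q_hat = q_rt
    return u_hat, q_hat


def couple_ldg_cg(u_ldg, u_cg, q_ldg, n_ldg, tau_ldg):
    """Traces on a face shared by an LDG element and a CG element (tau = infinity)."""
    # Jump of the scalar variable: [[u]] = u_ldg * n_ldg + u_cg * n_cg, with n_cg = -n_ldg.
    jump_u = u_ldg * n_ldg + u_cg * (-n_ldg)
    u_hat = u_cg
    q_hat = q_ldg + tau_ldg * jump_u
    return u_hat, q_hat


def couple_cg_rt(u_cg, q_rt):
    """Traces on a face shared by a CG element and an RT element."""
    return u_cg, q_rt


if __name__ == "__main__":
    # Hypothetical face data, chosen only for illustration.
    u_hat, q_hat = couple_rt_ldg(u_ldg=2.0, q_ldg=1.5, q_rt=0.5, n_ldg=1.0, tau_ldg=4.0)
    print("RT/LDG :", u_hat, q_hat)
    print("LDG side flux trace:", ldg_h_flux_trace(1.5, 2.0, u_hat, 4.0, 1.0))
```

In this sketch the flux trace seen from the LDG side coincides with the RT flux, which is the single-valuedness that the conservativity condition asks for.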
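Note that this coupling holds even for nonconforming meshes. It is interesting to compare the above couplings with other couplings in the available literature, namely,
\begin{align*}
\widehat{u}_h &= u_h|_{\Omega_{LDG}}, & \widehat{\boldsymbol{q}}_h &= \boldsymbol{q}_h|_{\Omega_{RT}} + C_{11}\,[\![u_h]\!] && \text{(coupling of RT and LDG in [23])},\\
\widehat{u}_h &= u_h|_{\Omega_{CG}}, & \widehat{\boldsymbol{q}}_h &= \boldsymbol{q}_h|_{\Omega_{LDG}} + \tau_{LDG}\,[\![u_h]\!] && \text{(coupling of LDG and CG in [48])}.
\end{align*}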
(iii) Mortaring capabilities (for nonconforming meshes). One of the advantageous features of DG methods is their ability to handle nonconforming meshes; see [52] for an application to structural mechanics. The methods under consideration incorporate this mortaring ability thanks to the very form that the numerical trace of the flux on ∂K takes on an interior face e ∈ E◦h which is not a face of K, and thanks to the definition of the stabilization parameter τ therein. Let us give two examples.

If we have a conforming mesh, we can take the first choice of local spaces (4.2a) and set τ ≡ 0. The resulting method, as we have seen, is nothing but the RT-H method. We can easily modify this method to handle nonconforming meshes by simply taking τK ∈ (0, ∞) on every interior face e ∈ E◦h which is not a face of K, and otherwise, taking τK = 0. Thus, the resulting method can be considered as a variation of the RT method, which is capable of handling nonconforming meshes.

We can do something similar with the CG method. Indeed, if the mesh is conforming, we can take the last choice of local spaces (4.2d) and set τ ≡ ∞ to obtain the hybridized CG method. For nonconforming meshes, we can slightly modify the method by simply taking τK ∈ (0, ∞) on every interior face e ∈ E◦h which is not a face of K, and otherwise, taking τK = ∞. The resulting method is thus a variation of the CG method capable of handling nonconforming meshes. It constitutes an alternative to the coupling of the CG and the LDG methods proposed in [48] to deal with nonmatching meshes.

(iv) The conservativity condition. Let us end by noting that the stiffness matrix associated with the approximate trace λh is always symmetric and positive definite. Moreover, on the interior faces on which τ < ∞, the conservativity condition is enforced strongly.

4.2. The RT method on meshes with hanging nodes. Consider the case of the variable degree RT-H method.
The method is obtained by taking the numerical traces as in (4.1) with τ ≡ 0, the local space (4.2a), and the multiplier space as in (4.3). This method does not belong to the family of methods described in the previous subsection because our choice of stabilization parameter does not satisfy condition (4.4). Thus, to ensure the existence and the uniqueness of the approximate solution, we have to impose special conditions on the meshes and link the definition of k(K) to the structure of the mesh. Let us illustrate how to do this in the two-dimensional case.

The meshes Th we consider are constructed as follows. First, construct a conforming triangulation of Ω, Th^(0) := {K^(0)}. Then, take a subset of that triangulation Th^(0,1) and divide each of its triangles into four congruent triangles; the set of those triangles is denoted by Th^(1). Next, for j = 2, . . . , ℓ, given the set Th^(j−1), pick the subset Th^(j−1,j) and create the set of smaller triangles Th^(j). The simple case ℓ = 1 is illustrated in Figure 4.2.

[Figure 4.2: a mesh with hanging nodes; the triangles are labeled Klt, Kl0, Klb, and Kr.]
Fig. 4.2. In the presence of hanging nodes like the above, an RT-H method with the spaces on edges chosen to have the maximum degree from either side is well defined if k(Klt) ≥ k(Kr) and k(Klb) ≥ k(Kr). The degree k(Kl0) can be arbitrarily chosen.

Finally, we establish a link between the mesh and the definition of the polynomial degree of the RT method on the triangle K, k(K), as follows. If e = ∂K+ ∩ ∂K−, we take k(e) := max{k(K+), k(K−)} and require that
\[
\text{if } e \text{ is not an edge of } K^-, \text{ then } k(K^+) \ge k(K^-). \tag{4.6}
\]
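As an illustration of how the degree rule k(e) := max{k(K+), k(K−)} and condition (4.6) could be checked on a given mesh, we include a short Python sketch. The data layout (one record per interior face, with flags saying whether e is a full edge of each neighbor) is a hypothetical choice of ours, and applying (4.6) with either neighbor in the role of K− is our reading of the condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteriorFace:
    """One interior face e shared by two triangles (hypothetical record layout)."""
    k_plus: int          # polynomial degree k(K+)
    k_minus: int         # polynomial degree k(K-)
    edge_of_plus: bool   # True if e is a full edge of K+
    edge_of_minus: bool  # True if e is a full edge of K-


def face_degree(face):
    """k(e) := max{k(K+), k(K-)}."""
    return max(face.k_plus, face.k_minus)


def satisfies_condition_4_6(face):
    """Check (4.6), letting either neighbor play the role of K-."""
    if not face.edge_of_minus and face.k_plus < face.k_minus:
        return False
    if not face.edge_of_plus and face.k_minus < face.k_plus:
        return False
    return True


if __name__ == "__main__":
    # Situation of Fig. 4.2: a small triangle (K+) meets a large one (K-)
    # along a face that is only part of an edge of the large triangle.
    good = InteriorFace(k_plus=2, k_minus=1, edge_of_plus=True, edge_of_minus=False)
    bad = InteriorFace(k_plus=1, k_minus=2, edge_of_plus=True, edge_of_minus=False)
    for face in (good, bad):
        print(face_degree(face), satisfies_condition_4_6(face))
```

On a conforming face (both flags true) the check always passes, in agreement with the fact that (4.6) only restricts the degrees across hanging nodes.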
Next, we show that the method is well defined.

Proposition 4.2. The variable-degree RT-H method on meshes with hanging nodes as described above is uniquely solvable.

Proof. If we proceed exactly as in Proposition 3.1, we can see that Assumption 2.1 on the existence and the uniqueness of the local solvers is verified and that Assumption 2.2 on the positive semidefiniteness of the local solvers is also verified provided we change the definition of the set M(∂K) to M(∂K) = {μ : μ|e ∈ P^{k(e)}(e) for all edges e of ∂K}. The result follows if we prove that there is only one solution λh ∈ Mh of weak formulation (1.7). To do that, we proceed exactly as in the proof of Theorem 2.4. First, since Assumption 2.1 holds, we have that system (2.9) is well defined. Next, we show that ah(μ, μ) = 0 for μ ∈ Mh implies μ = 0. By Assumption 2.2, we readily obtain that, for any given K ∈ Th, we have that, on ∂K, CK = P∂K μ, where P∂K is the L2-projection into M(∂K) as defined above. It remains to show that this implies that μ is a constant on Eh. To do that, we use the structure of the meshes and the definition of k(K) for all K ∈ Th. We proceed as follows.

We claim that, for j = ℓ, ℓ − 1, . . . , 0, we have that μ|∂K is a constant for all K ∈ Th^(j). This immediately implies that μ is a constant on Eh, and since μ|∂Ω = 0, that μ = 0 on Eh. It remains to prove the claim. We proceed by induction on j.

Let us prove the inductive hypothesis for j = ℓ. Let K be any triangle in Th^(ℓ) and pick any of its edges
e. If the edge e lies on ∂Ω, we immediately have that μ = CK = 0. If e = ∂K ∩ ∂K′ for some triangle K′ ∈ Th^(ℓ), we proceed as in the proof of Theorem 2.4 to conclude that μ = CK = CK′ on e. The only other remaining possibility, by construction of triangulation Th^(ℓ), is that e = ∂K ∩ ∂K′ for some triangle K′ ∈ Th^(ℓ′), with ℓ′ < ℓ. In this case, e is not an edge of K′, and, by condition (4.6) on k(K), we have that k(K) ≥ k(K′) and hence k(e) := max{k(K), k(K′)} = k(K). This implies that μ = CK on e.
Since edge e was picked arbitrarily, we conclude that μ|∂K is a constant, as wanted.

Now, let us assume that the inductive hypothesis holds for j = J and let us prove it also holds for j = J − 1. Let K be any triangle in Th^(J−1) and pick any of its edges e. Since, by the inductive hypothesis, μ|∂K is a constant for all K ∈ Th^(J), we have that μ|∂K is a constant for all K ∈ Th^(J−1,J), since, by construction, each of the triangles in Th^(J−1,J) is subdivided into four congruent triangles in Th^(J). Hence, μ = CK on e if the edge e lies on the boundary of any triangle in Th^(J−1,J). To finish the proof, we need only to prove the same result in the remaining three cases: (i) if the edge e lies on ∂Ω, (ii) if e = ∂K ∩ ∂K′ for some triangle K′ ∈ Th^(J−1) \ Th^(J−1,J), and (iii) if e = ∂K ∩ ∂K′ for some triangle K′ ∈ Th^(J′−1), with J′ < J. This can be done exactly as in the previous step.

4.3. The EDG methods. Now we show that new methods [33] can be immediately generated from already existing hybridized methods by simply reducing the space of their approximate traces. The main interest of these EDG methods, introduced in the setting of shells problems in [43], stems from the further reduction in globally coupled unknowns achieved by reducing the approximate trace space Mh.

To construct such methods, we begin with selecting any method defined by uniquely solvable local problems (2.3), (2.4), and conservativity condition (2.6), yielding a unique approximate trace λh. Then, by Theorem 2.1, λh ∈ Mh is the only solution of the weak formulation
\[
a_h(\lambda_h, \mu) = b_h(I_h g; \mu) \quad \text{for all } \mu \in M_h, \tag{4.7}
\]
where we are writing bh(Ih g; μ) instead of just bh(μ) in order to stress its dependency on Ih g ∈ Mh. We now define an EDG method by replacing the original approximate trace space Mh by a subspace $\widetilde{\mathsf{M}}_h$. This forces us to replace Ih g ∈ Mh by $\widetilde{I}_h g \in \widetilde{\mathsf{M}}_h$ and to change the conservativity condition, but the local solvers remain the same.

Now, define the operator $J_h : \widetilde{\mathsf{M}}_h \to M_h$ as the identity operator representing the natural embedding of $\widetilde{\mathsf{M}}_h$ into Mh, hence, the name of these methods, and set
\[
\widetilde{M}_h := \{\tilde\mu \in \widetilde{\mathsf{M}}_h :\ \tilde\mu = 0 \text{ on } \partial\Omega\}.
\]
Then by Theorem 2.1, the new conservativity condition is equivalent to
\[
a_h(J_h \tilde\lambda_h, J_h \tilde\mu) = b_h(J_h \widetilde{I}_h g; J_h \tilde\mu) \quad \text{for all } \tilde\mu \in \widetilde{M}_h, \tag{4.8}
\]
where $\tilde\lambda_h \in \widetilde{M}_h$ is the new approximate trace. Note that we have that $J_h\tilde\mu|_{\partial\Omega} = 0$ for all $\tilde\mu \in \widetilde{M}_h$.
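At the level of matrices, passing from (4.7) to (4.8) amounts to a congruence transformation with a matrix T that represents the embedding Jh in the chosen bases: if A and b denote the matrix and right-hand side of the original trace system, the reduced system has matrix T^t A T and right-hand side T^t b. The following Python sketch illustrates this on a toy example. The sizes, the entries of A and b, and the helper names are hypothetical; T merges two discontinuous trace unknowns sitting at a shared vertex into one continuous unknown, and b is assumed to be already evaluated for the reduced boundary data.

```python
def transpose(m):
    """Transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(a, v):
    """Product of a dense matrix and a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def edg_reduce(A, b, T):
    """Return the reduced pair (T^t A T, T^t b)."""
    Tt = transpose(T)
    return matmul(Tt, matmul(A, T)), matvec(Tt, b)


# Toy symmetric positive definite system for four discontinuous trace unknowns.
A = [[4, 1, 0, 0],
     [1, 4, 0, 0],
     [0, 0, 4, 1],
     [0, 0, 1, 4]]
b = [1, 2, 3, 4]

# Connectivity matrix of zeros and ones: unknowns 1 and 2 of the original
# space are both driven by unknown 1 of the reduced (continuous) space.
T = [[1, 0, 0],
     [0, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]

A_red, b_red = edg_reduce(A, b, T)

if __name__ == "__main__":
    for row in A_red:
        print(row)
    print(b_red)
```

Because T has full column rank, the reduced matrix inherits symmetry and positive definiteness from A, which mirrors the solvability argument for (4.8).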
To show that this EDG method is well defined, it suffices to prove that homogeneous equation (4.8) has only a trivial solution. For simplicity, let us assume that ah(·, ·) is symmetric and positive definite. Thus, taking $\tilde\mu = \tilde\lambda_h$, we get $a_h(J_h\tilde\lambda_h, J_h\tilde\lambda_h) = 0$. By the positive definiteness of ah(·, ·), we have $J_h\tilde\lambda_h = 0$, which implies $\tilde\lambda_h = 0$. Hence, (4.8) is uniquely solvable.

Now, let us show that it is very easy to obtain the equations for the EDG method once those of the original method have been obtained. Denote by [λh] the vector of the degrees of freedom of the function λh with respect to some basis in Mh. Similarly, denote by $[\tilde\lambda_h]$ the vector of degrees of freedom of the function $\tilde\lambda_h$ in $\widetilde{M}_h$. Equation (4.7) can be written in a matrix form as A [λh] = b(Ih g), and if $[J_h\tilde\lambda_h] = T[\tilde\lambda_h]$, then the equation for $[\tilde\lambda_h]$ is $T^t A T\,[\tilde\lambda_h] = T^t b(J_h \widetilde{I}_h g)$. Here T is the rectangular matrix representing the basis of $\widetilde{M}_h$ with respect to basis of Mh. Since $\widetilde{M}_h \subset M_h$, if we use the Lagrange basis functions, T is nothing but a connectivity matrix whose entries are zeroes and ones, so it is extremely easy to compute.

Note that the above considerations continue to hold if Jh is any injective operator from $\widetilde{\mathsf{M}}_h$ into Mh such that $J_h\tilde\mu|_{\partial\Omega} = 0$ for all $\tilde\mu \in \widetilde{M}_h$. Thus, new methods can also be created by using spaces $\widetilde{\mathsf{M}}_h$ that are not necessarily subspaces of Mh. The main task here would be to find the matrix T which represents the basis of $\widetilde{\mathsf{M}}_h$ with respect to basis of Mh.

Let us give some examples of EDG methods. The first example of an EDG method was proposed in [43]: It is obtained from an LDG-H method using approximations of degree k in each variable by forcing the continuity of the traces. Thus, whereas the functions in the space of approximate traces for the LDG-H method Mh are discontinuous on the borders of the elements, the functions of $\widetilde{\mathsf{M}}_h$ are continuous therein. This allows the method to be immediately incorporated into commercial codes. On the other hand, this also results in the degradation of the conservativity properties of the EDG method, which hold only weakly. In some cases, this induces a degradation in the approximating properties of the method as recently proven in [32]. Indeed, in that paper, it was shown that when the stabilization parameter τ is taken to be of order one, the EDG method converges with order k for q and order k + 1 for u. This has to be contrasted with the fact that the original LDG-H method converges with order k + 1 in both variables; see [33]. Moreover, in this case, the LDG-H has superconvergence properties that allow us to compute, in an element-by-element fashion, a new approximation to u converging with order k + 2; see also [33]. Such property does not hold for the corresponding EDG method. Even more, numerical experiments show that the computational advantage of the EDG method does not compensate for its loss of accuracy. On the other hand, if the stabilization parameter τ is taken to be of order $h^{-1}$, both the EDG and the LDG-H methods converge with the same orders, namely, k in q and order k + 1 in u.

The second example is associated with the constructions of subspaces $\widetilde{\mathsf{M}}_h$ of Mh that could be required to be very smooth. For example, we could ask that they be not only continuous on E◦h but C1-continuous. This might be reasonable to do if the solution is very smooth and varies slowly in Ω.

The third and last example is associated with methods for nonmatching grids. Suppose that Ω is divided into two domains Ω1 and Ω2 independently meshed, and that we are using the variation of the CG method to handle nonconforming meshes described in the first subsection. Then, all the interior faces e lying on the interface Γ := ∂Ω1 ∩ ∂Ω2 must be computed and used to define the space of traces Mh. (This can be done, although it is a computational geometry tour de force, especially in
three dimensions; see [52].) To reduce the size of the interface space on Γ, we can alternately find a subspace of Mh of functions which are polynomials on the interior faces determined by, say, the vertices of the triangulation of Ω1 lying on Γ. 5. Extensions and generalizations. Although in all examples we have given, only simplicial elements have been considered, this is not essential. Obviously, quadrilateral, prismatic, and other elements could be handled easily by DG methods. Furthermore, our framework is applicable for mixed methods using other types of elements; see [12]. Note also that the considered DG, mixed, CG, and nonconforming finite element methods used to define the local solvers are not the only choices. Stabilized, Petrov– Galerkin methods, boundary element, and even (if possible) the exact solution can be used as local solvers. For example, the hybridization of the discontinuous Petrov– Galerkin method can be found in [18, 19]. In what follows, we sketch how to extend our results to include Neumann boundary conditions and interface transmission conditions. We also extend them to DG methods using other stabilization mechanisms. 5.1. Other boundary and transmission conditions. The hybridization method proposed here can be easily extended to other types of boundary and transmission conditions. Neumann boundary condition. For example, the case when on part ∂ΩN of the boundary ∂Ω the Neumann boundary condition q ·n = qN is specified can be incorporated easily in the hybridization procedure. We simply require that the approximate trace λh belongs to Mh = {μ ∈ Mh :
μ = 0 on ∂ΩD },
where ∂ΩD := ∂Ω \ ∂ΩN is the Dirichlet boundary, and replace conservativity condition (2.6) by
\[
\langle \mu, [\![\widehat{Q}\lambda_h + \widehat{Q}g_h + \widehat{Q}f]\!] \rangle_{\mathcal{E}_h} = \langle \mu, q_N \rangle_{\partial\Omega_N} \quad \text{for all } \mu \in M_h.
\]

Transmission condition. To handle transmission condition [[q]] = t on the (n − 1)-dimensional surface Γt, we simply have to write
\[
\langle \mu, [\![\widehat{Q}\lambda_h + \widehat{Q}g_h + \widehat{Q}f]\!] \rangle_{\mathcal{E}_h} = \langle \mu, q_N \rangle_{\partial\Omega_N} + \langle \mu, t \rangle_{\Gamma_t} \quad \text{for all } \mu \in M_h,
\]
where we are assuming that Γt ⊂ Eh. This case is equivalent to having a right-hand side that is a δ-function with a support on Γt.

Jump condition. Now, we can add jump condition [[u]] = j on the (n − 1)-dimensional surface Γj, where j · n is given. Then we take triangulation Th such that Γj ⊂ Eh and proceed as follows. Since the exact solution is double valued on Γj, that is, since its traces on Γj are u± := {{u}} + 1/2 n± · j, we take the approximation to these traces to be λh + 1/2 n± · j on Γj and define the function (Qmj, Umj) as the solution of local solver (2.3), with mj given by
\[
m_j = \begin{cases} \tfrac{1}{2}\, n_K \cdot j & \text{on } \partial K \cap \Gamma_j, \\ 0 & \text{elsewhere.} \end{cases}
\]
Then, we simply rewrite the conservativity condition as
\[
\langle \mu, [\![\widehat{Q}\lambda_h + \widehat{Q}m_j + \widehat{Q}g_h + \widehat{Q}f]\!] \rangle_{\mathcal{E}_h} = \langle \mu, q_N \rangle_{\partial\Omega_N} + \langle \mu, t \rangle_{\Gamma_t} \quad \text{for all } \mu \in M_h.
\]
We see that the global system for λh has the same matrix and a right-hand side that incorporates the data related to the boundary and interface conditions. This particular example shows the ease with which the hybridizable methods can handle various types of boundary and transmission conditions for the differential equation.

5.2. Hybridizable DG methods with other stabilization mechanisms. For each finite element K ∈ Th, the LDG-H method uses on ∂K the numerical trace $\widehat{\boldsymbol{q}}_h = \boldsymbol{q}_h + \tau(u_h - \lambda_h)\boldsymbol{n}$ and the IP-H uses the numerical trace $\widehat{\boldsymbol{q}}_h = -a\,\mathrm{grad}\,u_h + \tau(u_h - \lambda_h)\boldsymbol{n}$. However, these are not the only choices for numerical traces we could use to generate stabilization through the difference between uh and λh. Indeed, in the unified analysis of DG methods [5], we see that we can also take $\widehat{\boldsymbol{q}}_h = \boldsymbol{q}_h + \alpha_r((u_h - \lambda_h)\boldsymbol{n})$ for the Brezzi–Manzini–Marini–Pietra–Russo (BMMPR) method [13] and $\widehat{\boldsymbol{q}}_h = -a\,\mathrm{grad}\,u_h + \alpha_r((u_h - \lambda_h)\boldsymbol{n})$ for the Bassi–Rebay–Mariotti–Pedinotti–Savini (BRMPS) method [7]. Here, for any ϕ ∈ L2(∂K), the vector αr(ϕ) is the element of V(K) such that
\[
\alpha_r(\phi) = -\tau\, r_{e,K}(\phi) \ \text{ on each face } e \text{ of } K, \qquad (r_{e,K}(\phi), v)_K = -\langle \phi, v \rangle_e \ \text{ for all } v \in V(K).
\]
It is not difficult to verify that results similar to those obtained for the LDG-H and IP-H methods can also be obtained for similar BMMPR-H and BRMPS-H methods, respectively. Let us briefly comment on a couple of interesting details.

To fix ideas, we consider the BMMPR-H methods. For these methods, Theorem 2.1 holds with
\begin{align*}
a_h(\eta, \mu) &= (c\,Q\eta, Q\mu)_{\mathcal{T}_h} + (d\,U\eta, U\mu)_{\mathcal{T}_h} + \langle 1, [\![(U\mu - \mu)\,\alpha_r((U\eta - \eta)\boldsymbol{n})]\!] \rangle_{\mathcal{E}_h},\\
b_h(\mu) &= \langle g_h, (Q\mu + \alpha_r(U\mu\,\boldsymbol{n})) \cdot \boldsymbol{n} \rangle_{\partial\Omega} + (f, U\mu)_{\mathcal{T}_h},
\end{align*}
provided $g_h|_{\mathcal{E}_h^\circ} = 0$. It is not difficult to see that bilinear form ah(·, ·) is symmetric. Indeed, we have that
\begin{align*}
\langle (U\mu - \mu)\boldsymbol{n}, \alpha_r((U\eta - \eta)\boldsymbol{n}) \rangle_{\partial K}
&= -\sum_{e \text{ face of } K} \tau_K|_e\, \langle (U\mu - \mu)\boldsymbol{n}, r_{e,K}((U\eta - \eta)\boldsymbol{n}) \rangle_e\\
&= +\sum_{e \text{ face of } K} \tau_K|_e\, (r_{e,K}((U\mu - \mu)\boldsymbol{n}), r_{e,K}((U\eta - \eta)\boldsymbol{n}))_K.
\end{align*}
The fact that bilinear form ah(·, ·) is positive definite follows from Theorem 2.4 and a slight modification of Proposition 3.3; in it, we take M(∂K) = {v : v|e ∈ P^k(e), e ∈ ∂K}. Note that Assumption 2.2 is then satisfied, since $r_{e,K}(\phi) = 0$ if and only if the L2-projection of ϕ|e into P^k(e) is zero.

Finally, note that the conservativity condition is enforced strongly. In this case, however, we do not have an explicit expression of the approximate trace λh in terms of (qh, uh) as we have for the LDG-H methods in Proposition 3.4. Instead, we have only the relation
\[
[\![\alpha_r(\lambda_h \boldsymbol{n})]\!] = [\![\alpha_r(u_h \boldsymbol{n})]\!] + [\![\boldsymbol{q}_h]\!] \quad \text{on } \mathcal{E}_h^\circ.
\]
Let us end by noting that extensions of this work to other problems arising in continuum mechanics, fluid dynamics, and electromagnetism constitute the subject of ongoing work.
Acknowledgment. The first author would like to thank Martin Vohralík for bringing to his attention reference [21].

REFERENCES
[1] T. Arbogast and Z. Chen, On the implementation of mixed methods as nonconforming methods for second-order elliptic problems, Math. Comp., 64 (1995), pp. 943–972.
[2] T. Arbogast, L. C. Cowsar, M. F. Wheeler, and I. Yotov, Mixed finite element methods on nonmatching multiblock grids, SIAM J. Numer. Anal., 37 (2000), pp. 1295–1315.
[3] D. N. Arnold, An interior penalty finite element method with discontinuous elements, SIAM J. Numer. Anal., 19 (1982), pp. 742–760.
[4] D. N. Arnold and F. Brezzi, Mixed and nonconforming finite element methods: Implementation, postprocessing and error estimates, RAIRO Modél. Math. Anal. Numér., 19 (1985), pp. 7–32.
[5] D. N. Arnold, F. Brezzi, B. Cockburn, and L. D. Marini, Unified analysis of discontinuous Galerkin methods for elliptic problems, SIAM J. Numer. Anal., 39 (2002), pp. 1749–1779.
[6] G. A. Baker, Finite element methods for elliptic equations using nonconforming elements, Math. Comp., 31 (1977), pp. 45–59.
[7] F. Bassi, S. Rebay, G. Mariotti, S. Pedinotti, and M. Savini, A high-order accurate discontinuous finite element method for inviscid and viscous turbomachinery flows, in Proceedings of the 2nd European Conference on Turbomachinery Fluid Dynamics and Thermodynamics, Antwerpen, Belgium, Technologisch Instituut, 1997, pp. 99–108.
[8] F. Ben Belgacem and Y. Maday, The mortar element method for three dimensional finite elements, M2AN Math. Model. Numer. Anal., 31 (1997), pp. 289–302.
[9] C. Bernardi, Y. Maday, and A. T. Patera, Domain decomposition by the mortar element method, in Asymptotic and Numerical Methods for Partial Differential Equations with Critical Parameters, H. G. Kaper and M. Garbey, eds., Kluwer Academic Publishers, Norwell, MA, 1993, pp. 269–286.
[10] J. H. Bramble and J. Xu, A local post-processing technique for improving the accuracy in mixed finite-element approximations, SIAM J. Numer. Anal., 26 (1989), pp. 1267–1275.
[11] F. Brezzi, J. Douglas, Jr., and L. D. Marini, Two families of mixed finite elements for second order elliptic problems, Numer. Math., 47 (1985), pp. 217–235.
[12] F. Brezzi and M. Fortin, Mixed and Hybrid Finite Element Methods, Springer-Verlag, New York, 1991.
[13] F. Brezzi, G. Manzini, L. D. Marini, P. Pietra, and A. Russo, Discontinuous finite elements for diffusion problems, in Atti Convegno in onore di F. Brioschi, Istituto Lombardo, Accademia di Scienze e Lettere, Milan, 1999, pp. 197–217.
[14] A. Buffa, T. J. R. Hughes, and G. Sangalli, Analysis of a multiscale discontinuous Galerkin method for convection-diffusion problems, SIAM J. Numer. Anal., 44 (2006), pp. 1420–1440.
[15] J. Carrero, B. Cockburn, and D. Schötzau, Hybridized, globally divergence-free LDG methods. Part I: The Stokes problem, Math. Comp., 75 (2006), pp. 533–563.
[16] P. Castillo, Performance of discontinuous Galerkin methods for elliptic PDEs, SIAM J. Sci. Comput., 24 (2002), pp. 524–547.
[17] P. Castillo, B. Cockburn, I. Perugia, and D. Schötzau, An a priori error analysis of the local discontinuous Galerkin method for elliptic problems, SIAM J. Numer. Anal., 38 (2000), pp. 1676–1706.
[18] P. Causin and R. Sacco, A discontinuous Petrov–Galerkin method with Lagrangian multipliers for second order elliptic problems, SIAM J. Numer. Anal., 43 (2005), pp. 280–302.
[19] P. Causin and R. Sacco, Hierarchical mixed hybridized methods for elliptic problems, Comput. Methods Appl. Mech. Engrg., 198 (2009), pp. 1061–1973.
[20] F. Celiker and B. Cockburn, Superconvergence of the numerical traces of discontinuous Galerkin and hybridized mixed methods for convection-diffusion problems in one space dimension, Math. Comp., 76 (2007), pp. 67–96.
[21] Z. Chen, Equivalence between and multigrid algorithms for nonconforming and mixed methods for second-order elliptic problems, East-West J. Numer. Math., 4 (1996), pp. 1–33.
[22] P. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam, 1978.
[23] B. Cockburn and C. Dawson, Approximation of the velocity by coupling discontinuous Galerkin and mixed finite element methods for flow problems, Comput. Geosci. (Special issue: Locally Conservative Numerical Methods for Flow in Porous Media), 6 (2002), pp. 502–522.
[24] B. Cockburn and B. Dong, An analysis of the minimal dissipation local discontinuous Galerkin method for convection-diffusion problems, J. Sci. Comput., 32 (2007), pp. 233–262.
[25] B. Cockburn, B. Dong, and J. Guzmán, A superconvergent LDG-hybridizable Galerkin method for second-order elliptic problems, Math. Comp., to appear.
[26] B. Cockburn and J. Gopalakrishnan, A characterization of hybridized mixed methods for second order elliptic problems, SIAM J. Numer. Anal., 42 (2004), pp. 283–301.
[27] B. Cockburn and J. Gopalakrishnan, Error analysis of variable degree mixed methods for elliptic problems via hybridization, Math. Comp., 74 (2005), pp. 1653–1677.
[28] B. Cockburn and J. Gopalakrishnan, Incompressible finite elements via hybridization. Part I: The Stokes system in two space dimensions, SIAM J. Numer. Anal., 43 (2005), pp. 1627–1650.
[29] B. Cockburn and J. Gopalakrishnan, Incompressible finite elements via hybridization. Part II: The Stokes system in three space dimensions, SIAM J. Numer. Anal., 43 (2005), pp. 1651–1672.
[30] B. Cockburn and J. Gopalakrishnan, New hybridization techniques, GAMM Mitt. Ges. Angew. Math. Mech., 2 (2005), pp. 154–183.
[31] B. Cockburn, J. Gopalakrishnan, and H. Wang, Locally conservative fluxes for the continuous Galerkin method, SIAM J. Numer. Anal., 45 (2007), pp. 1742–1776.
[32] B. Cockburn, J. Guzmán, S.-C. Soon, and H. Stolarski, Analysis of the embedded discontinuous Galerkin method for second-order elliptic problems, submitted.
[33] B. Cockburn, J. Guzmán, and H. Wang, Superconvergent discontinuous Galerkin methods for second-order elliptic problems, Math. Comp., 78 (2009), pp. 1–24.
[34] B. Cockburn and C.-W. Shu, The local discontinuous Galerkin method for time-dependent convection-diffusion systems, SIAM J. Numer. Anal., 35 (1998), pp. 2440–2463.
[35] M. Comodi, The Hellan-Herrmann-Johnson method: Some new error estimates and postprocessing, Math. Comp., 52 (1989), pp. 17–29.
[36] M. Crouzeix and P. A. Raviart, Conforming and nonconforming finite element methods for solving the stationary Stokes equations, RAIRO Modél. Math. Anal. Numér., 7 (1973), pp. 33–75.
[37] V. Dobrev, R. Lazarov, P. Vassilevski, and L. Zikatanov, Two-level preconditioning of discontinuous Galerkin method for second order elliptic problems, Numer. Linear Algebra Appl., 13 (2006), pp. 753–770.
[38] R. Ewing, J. Wang, and Y. Yang, A stabilized discontinuous finite element method for elliptic problems, Numer. Linear Algebra Appl., 10 (2003), pp. 83–104.
[39] B. M. Fraeijs de Veubeke, Displacement and equilibrium models in the finite element method, in Stress Analysis, O. Zienkiewicz and G. Holister, eds., Wiley, New York, 1977, pp. 145–197.
[40] V. Girault, S. Sun, M. F. Wheeler, and I. Yotov, Coupling discontinuous Galerkin and mixed finite element discretizations using mortar finite elements, SIAM J. Numer. Anal., 46 (2008), pp. 949–979.
[41] J. Gopalakrishnan, A Schwarz preconditioner for a hybridized mixed method, Comput. Methods Appl. Math., 3 (2003), pp. 116–134.
[42] J. Gopalakrishnan and J. E. Pasciak, Multigrid for the mortar finite element method, SIAM J. Numer. Anal., 37 (2000), pp. 1029–1052.
[43] S. Güzey, B. Cockburn, and H. Stolarski, The embedded discontinuous Galerkin methods: Application to linear shells problems, Internat. J. Numer. Methods Engrg., 70 (2007), pp. 757–790.
[44] T. J. R. Hughes, G. Scovazzi, P. B. Bochev, and A. Buffa, A multiscale discontinuous Galerkin method with the computational structure of a continuous Galerkin method, Comput. Methods Appl. Mech. Engrg., 195 (2006), pp. 2761–2787.
[45] Y. A. Kuznetsov and M. F. Wheeler, Optimal order substructuring preconditioners for mixed finite element methods on nonmatching grids, East-West J. Numer. Math., 3 (1995), pp. 127–143.
[46] J. L. Lions and E. Magenes, Nonhomogeneous Boundary Value Problems and Applications, Springer-Verlag, Berlin, 1972.
[47] L. D. Marini, An inexpensive method for the evaluation of the solution of the lowest order Raviart–Thomas mixed method, SIAM J. Numer. Anal., 22 (1985), pp. 493–496.
[48] I. Perugia and D. Schötzau, On the coupling of local discontinuous Galerkin and conforming finite element methods, J. Sci. Comput., 16 (2001), pp. 411–433.
[49] P. A. Raviart and J. M. Thomas, A mixed finite element method for second order elliptic problems, in Mathematical Aspects of Finite Element Method, Lecture Notes in Math. 606, I. Galligani and E. Magenes, eds., Springer-Verlag, New York, 1977, pp. 292–315.
[50] B. Rivière and M. F. Wheeler, Coupling locally conservative methods for single-phase flow, Comput. Geosci. (Special issue: Locally Conservative Numerical Methods for Flow in Porous Media), 6 (2002), pp. 269–284.
[51] J.-E. Roberts and J.-M. Thomas, Mixed and hybrid methods, in Finite Element Methods, Part 1, Handb. Numer. Anal. II, P. G. Ciarlet and J. L. Lions, eds., North-Holland, Amsterdam, 1991, pp. 523–639.
[52] S. Siddarth, J. Carrero, B. Cockburn, K. Tamma, and R. Kanapady, The local discontinuous Galerkin method and component design integration for 3D elasticity, in Proceedings of the Third M.I.T. Conference on Computational Fluid and Solid Mechanics, Cambridge, MA, 2005, pp. 492–494.
[53] C. Wieners and B. Wohlmuth, The coupling of mixed and conforming finite element discretizations, in Domain Decomposition Methods 10, Contemp. Math. 218, J. Mandel, C. Farhat, and X.-C. Cai, eds., American Mathematical Society, Providence, RI, 1997, pp. 453–459.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1366–1390
NUMERICAL DISPERSIVE SCHEMES FOR THE NONLINEAR SCHRÖDINGER EQUATION∗
LIVIU I. IGNAT† AND ENRIQUE ZUAZUA‡

Abstract. We consider semidiscrete approximation schemes for the linear Schrödinger equation and analyze whether the classical dispersive properties of the continuous model hold for these approximations. For the conservative finite difference semidiscretization scheme we show that, as the mesh size tends to zero, the semidiscrete approximate solutions lose the dispersion property. This fact is proved by constructing solutions concentrated at the points of the spectrum where the second order derivatives of the symbol of the discrete Laplacian vanish. Therefore this phenomenon is due to the presence of numerical spurious high frequencies. To recover the dispersive properties of the solutions at the discrete level, we introduce two numerical remedies: Fourier filtering and a two-grid preconditioner. For each of them we prove Strichartz-like estimates and a local space smoothing effect, uniform in the mesh size. The methods we employ are based on classical estimates for oscillatory integrals. These estimates allow us to treat nonlinear problems with L2-initial data, without additional regularity hypotheses. We prove the convergence of the two-grid method for nonlinearities that cannot be handled by energy arguments and which, even in the continuous case, require Strichartz estimates.

Key words. finite differences, nonlinear Schrödinger equations, Strichartz estimates

AMS subject classifications. 65M12, 65T50, 35Q55

DOI. 10.1137/070683787
1. Introduction. Let us consider the linear (LSE) and the nonlinear (NSE) Schrödinger equations:
\[
\begin{cases} i u_t + \Delta u = 0, & x \in \mathbb{R}^d,\ t \neq 0, \\ u(0, x) = \varphi(x), & x \in \mathbb{R}^d, \end{cases} \tag{1.1}
\]
and
\[
\begin{cases} i u_t + \Delta u = F(u), & x \in \mathbb{R}^d,\ t \neq 0, \\ u(0, x) = \varphi(x), & x \in \mathbb{R}^d, \end{cases} \tag{1.2}
\]
respectively. The linear equation (1.1) is solved by u(t, x) = S(t)ϕ(x), where $S(t) = e^{it\Delta}$ is the free Schrödinger operator. The linear semigroup has two important properties. First, we have the conservation of the L2-norm
\[
\|u(t)\|_{L^2(\mathbb{R}^d)} = \|\varphi\|_{L^2(\mathbb{R}^d)} \tag{1.3}
\]
∗ Received by the editors February 28, 2007; accepted for publication (in revised form) November 10, 2008; published electronically February 25, 2009. This work has been supported by grant MTM2008-03541 of the Spanish MEC, the DOMINO Project CIT-370200-2005-10 in the PROFIT program, and the SIMUMAT project of the CAM (Spain). http://www.siam.org/journals/sinum/47-2/68378.html
† Institute of Mathematics “Simion Stoilow” of the Romanian Academy, P.O. Box 1-764, RO014700 Bucharest, Romania ([email protected]). This author’s research was also supported by reintegration grant RP-3, contract 4-01/10/2007 of CNCSIS Romania.
‡ Basque Center for Applied Mathematics, Gran Via, 35 - 2, 48009 Bilbao, Spain (zuazua@bcamath.org).
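and then a dispersive estimate of the form
\[ |u(t,x)| = |S(t)\varphi(x)| \le \frac{1}{(4\pi|t|)^{d/2}} \|\varphi\|_{L^1(\mathbb{R}^d)}, \quad x \in \mathbb{R}^d,\ t \neq 0. \tag{1.4} \]
The space-time estimate
\[ \|S(\cdot)\varphi\|_{L^{2+4/d}(\mathbb{R},\, L^{2+4/d}(\mathbb{R}^d))} \le C \|\varphi\|_{L^2(\mathbb{R}^d)}, \tag{1.5} \]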
due to Strichartz [27], is deeper. It guarantees that the solutions decay as $t$ becomes large and that they gain some spatial integrability. Inequality (1.5) was generalized by Ginibre and Velo [8]. They proved the mixed space-time estimate, well known as Strichartz’s estimate:
\[ \|S(\cdot)\varphi\|_{L^q(\mathbb{R},\, L^r(\mathbb{R}^d))} \le C(q,r) \|\varphi\|_{L^2(\mathbb{R}^d)} \tag{1.6} \]
for the so-called $d/2$-admissible pairs $(q,r)$. We recall that the exponent pair $(q,r)$ is $\alpha$-admissible (cf. [14]) if $2 \le q, r \le \infty$, $(q,r,\alpha) \neq (2,\infty,1)$, and
\[ \frac{1}{q} = \alpha\left(\frac{1}{2} - \frac{1}{r}\right). \tag{1.7} \]
The Strichartz estimates play an important role in the proof of the well-posedness of the NSE. Typically they are used when the energy methods fail to provide well-posedness results. The nonlinear problem (1.2) with nonlinearity $F(u) = |u|^p u$, $p < 4/d$, and initial data in $L^2(\mathbb{R}^d)$ was first analyzed by Tsutsumi [30]. The author proved that, in this case, the NSE is globally well posed in $L^\infty(\mathbb{R}, L^2(\mathbb{R}^d)) \cap L^q_{loc}(\mathbb{R}, L^r(\mathbb{R}^d))$, where $(q,r)$ is a $d/2$-admissible pair depending on the nonlinearity $F$.
The Schrödinger equation has another remarkable property guaranteeing the gain of one half space derivative in $L^2_{x,t}$ (cf. [5] and [15]):
\[ \sup_{x_0, R} \frac{1}{R} \int_{B(x_0,R)} \int_{-\infty}^{\infty} |(-\Delta)^{1/4} e^{it\Delta}\varphi|^2 \, dt \, dx \le C \|\varphi\|^2_{L^2(\mathbb{R}^d)}. \tag{1.8} \]
It has played a crucial role in the study of the NSE with nonlinearities involving derivatives (see [16]). In particular, it is extremely useful when deriving compactness properties. For other properties of the Schrödinger equation we refer the reader to [3] and [28].
In this paper we analyze whether semidiscrete schemes for the LSE have dispersive properties similar to (1.4), (1.6), and (1.8), uniform with respect to the mesh sizes. The study of these dispersion properties for these approximation schemes is relevant for introducing convergent schemes in the nonlinear context. Indeed, as mentioned above, the proof of the well-posedness of the NSE requires a fine use of the dispersion properties, and, consequently, it seems unlikely that the convergence of the numerical schemes could be proved if these dispersion properties are not verified at the numerical level. Estimates similar to (1.6) for numerical solutions will allow proving uniform (on the mesh-size parameter) bounds on discrete versions of the space $L^\infty(\mathbb{R}, L^2(\mathbb{R}^d)) \cap L^q_{loc}(\mathbb{R}, L^r(\mathbb{R}^d))$. On the other hand, estimates similar to (1.8) on discrete solutions will give sufficient conditions to guarantee their compactness and thus the convergence towards the solution of the NSE (1.2).
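The admissibility condition (1.7) recalled above is easy to check mechanically. The following is a minimal sketch in exact rational arithmetic; the function names `is_admissible` and `strichartz_pair` are illustrative choices and do not come from the paper.

```python
from fractions import Fraction
from math import inf


def is_admissible(q, r, alpha):
    """Check alpha-admissibility in the sense of (1.7):
    2 <= q, r <= inf, (q, r, alpha) != (2, inf, 1), and 1/q = alpha * (1/2 - 1/r)."""
    if q < 2 or r < 2:
        return False
    if (q, r, alpha) == (2, inf, 1):
        return False
    # Treat 1/inf as exactly zero so that the comparison stays exact.
    inv_q = Fraction(0) if q == inf else 1 / Fraction(q)
    inv_r = Fraction(0) if r == inf else 1 / Fraction(r)
    return inv_q == Fraction(alpha) * (Fraction(1, 2) - inv_r)


def strichartz_pair(d):
    """Diagonal pair q = r = 2 + 4/d appearing in (1.5)."""
    q = 2 + Fraction(4, d)
    return q, q


# The diagonal exponents of (1.5) are d/2-admissible in every dimension.
for d in (1, 2, 3):
    q, r = strichartz_pair(d)
    print(d, q, is_admissible(q, r, Fraction(d, 2)))
```

For instance, in dimension one the diagonal pair is $(6,6)$, and the energy pair $(\infty, 2)$ is admissible for every $\alpha$.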
However, as we shall see, standard numerical approximation schemes often fail to satisfy these dispersive estimates, uniformly in the mesh-size parameter, and important work needs to be done to develop numerical schemes that do fulfill these estimates uniformly. To better illustrate the problems we shall address, let us first consider the conservative semidiscrete numerical scheme
\[ \begin{cases} i \dfrac{du^h}{dt} + \Delta_h u^h = 0, & t > 0, \\ u^h(0) = \varphi^h. \end{cases} \tag{1.9} \]
Here $u^h$ stands for the infinite unknown vector $\{u^h_j\}_{j \in \mathbb{Z}^d}$, $u^h_j(t)$ being the approximation of the solution at the node $x_j = jh$, and $\Delta_h$ the classical second order finite difference approximation of $\Delta$:
\[ (\Delta_h u^h)_j = h^{-2} \sum_{k=1}^{d} \left( u^h_{j+e_k} + u^h_{j-e_k} - 2u^h_j \right). \tag{1.10} \]
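As a quick sanity check of (1.10), one can apply the discrete Laplacian to a plane wave $j \mapsto e^{i\xi\cdot jh}$ and confirm that it acts as multiplication by $-\frac{4}{h^2}\sum_k \sin^2(\xi_k h/2)$. The sketch below does this in pure Python; the helper names and the sample values of `h`, `xi`, and `j` are illustrative assumptions.

```python
import cmath
import math


def discrete_laplacian_at(u, j, h):
    """Evaluate (Delta_h u)_j as in (1.10); u maps an integer tuple to a complex value."""
    d = len(j)
    total = 0j
    for k in range(d):
        plus = tuple(j[i] + (1 if i == k else 0) for i in range(d))
        minus = tuple(j[i] - (1 if i == k else 0) for i in range(d))
        total += u(plus) + u(minus) - 2 * u(j)
    return total / h**2


def plane_wave(xi, h):
    """Lattice plane wave j -> exp(i xi . j h)."""
    return lambda j: cmath.exp(1j * h * sum(x * n for x, n in zip(xi, j)))


def symbol(xi, h):
    """Discrete symbol (4/h^2) * sum_k sin^2(xi_k h / 2)."""
    return 4.0 / h**2 * sum(math.sin(x * h / 2.0) ** 2 for x in xi)


h, xi, j = 0.1, (3.0, -7.5), (2, -5)
w = plane_wave(xi, h)
# The residual should vanish up to rounding: Delta_h w = -symbol * w.
print(abs(discrete_laplacian_at(w, j, h) + symbol(xi, h) * w(j)))
```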
In the one-dimensional (1-d) case, the lack of uniform dispersive estimates for the solutions of (1.9) has been observed by the authors in [12, 13]. The symbol of the Laplacian, $\xi^2$, in the numerical scheme (1.9) is replaced by $\frac{4}{h^2}\sin^2(\xi h/2)$ for the discrete Laplacian (1.10). The first and second derivatives of the latter vanish at the points $\pm\pi/h$ and $\pm\pi/2h$ of the spectrum, respectively. By building wave packets concentrated at the pathological spectral points $\pm\pi/2h$, it is possible to prove the lack of any uniform estimate of the type (1.4) or (1.6). Similar negative results can be shown to hold concerning (1.8) by building wave packets concentrated at $\pm\pi/h$.
The paper is organized as follows. In section 2 we analyze the conservative approximation scheme (1.9). We extend the 1-d results mentioned above and prove that this scheme does not ensure the gain of any uniform integrability or local smoothing property of the solutions with respect to the initial data. The behavior of the Fourier symbol of the numerical scheme provides good insight into this pathological behavior. We then propose a Fourier filtering method allowing recovery of both the integrability and the local smoothing properties of the continuous model. The lack of dispersion properties for the linear scheme makes it of little use to approximate nonlinear problems. In fact, in subsection 2.5, by an explicit construction we see that the solutions of a cubic semidiscrete Schrödinger equation do not satisfy the dispersion property of the continuous one, uniformly in the mesh-size parameter. We then introduce a numerical scheme for which the dispersion estimates are uniform. The proposed scheme involves a two-grid algorithm to precondition the initial data. Based on this numerical scheme for the LSE we build a convergent numerical scheme for the NSE in the class of $L^2(\mathbb{R}^d)$-initial data. Section 3 is dedicated to the analysis of the method based on the two-grid preconditioning of the initial data. 
We analyze the action of the linear semigroup $\exp(it\Delta_h)$ on the subspace of $l^2(h\mathbb{Z}^d)$ consisting of the slowly oscillating sequences generated by the two-grid method. Once we obtain Strichartz-like estimates in this subspace we apply them to approximate the NSE. The nonlinear term is approximated in such a way that it belongs to the class of slowly oscillating data, which permits the use of the uniform Strichartz estimates. The results in this paper should be compared to those in [25]. In that paper the authors analyze the Schrödinger equation on the lattice $\mathbb{Z}^d$ without analyzing the
dependence on the mesh-size parameter $h$. They obtain Strichartz-like estimates in a class of exponents $q$ and $r$ larger than in the continuous one. But none of these results is uniform when working on the scaled lattice $h\mathbb{Z}^d$ and letting $h \to 0$, as our results in section 2 show. In the context of equations on lattices we also mention [6, 19]. In these papers the authors analyze the dynamics of infinite harmonic lattices in the limit of the lattice distance tending to zero. The analysis in this paper can be adapted to address fully discrete schemes. In [10] necessary and sufficient conditions are given guaranteeing uniform dispersion estimates for fully discrete schemes. The work of Nixon [20] is also worth mentioning. There the 1-d KdV equation is considered and space-time estimates are proved for the implicit Euler scheme.
2. A conservative scheme. In this section we analyze the conservative scheme (1.9). This scheme satisfies the classical properties of consistency and stability which imply $L^2$-convergence. We construct pathological explicit solutions for (1.9) for which neither (1.6) nor (1.8) holds uniformly with respect to the mesh-size parameter $h$. In our analysis we make use of the semidiscrete Fourier transform (SDFT) (we refer the reader to [29] for the main properties of the SDFT). For any $v^h \in l^2(h\mathbb{Z}^d)$ we define its SDFT at the scale $h$ by
\[ \hat{v}^h(\xi) = (\mathcal{F}_h v^h)(\xi) = h^d \sum_{j \in \mathbb{Z}^d} e^{-i\xi\cdot jh} v^h_j, \quad \xi \in [-\pi/h, \pi/h]^d. \tag{2.11} \]
We will use the notation $A \lesssim B$ to report the inequality $A \le \text{constant} \times B$, where the multiplicative constant is independent of $h$. The statement $A \simeq B$ is equivalent to $A \lesssim B$ and $B \lesssim A$. Taking the SDFT in (1.9) we obtain that $u^h(t) = S^h(t)\varphi^h$, which is the solution of (1.9), satisfies
\[ i \hat{u}^h_t(t,\xi) + p_h(\xi)\hat{u}^h(t,\xi) = 0, \quad t \in \mathbb{R},\ \xi \in [-\pi/h, \pi/h]^d, \tag{2.12} \]
where the function $p_h : [-\pi/h, \pi/h]^d \to \mathbb{R}$ is defined by
\[ p_h(\xi) = \frac{4}{h^2} \sum_{k=1}^{d} \sin^2\left(\frac{\xi_k h}{2}\right). \tag{2.13} \]
Solving the ODE (2.12) we obtain that the Fourier transform of $u^h$ is given by
\[ \hat{u}^h(t,\xi) = e^{-itp_h(\xi)} \hat{\varphi}^h(\xi), \quad \xi \in [-\pi/h, \pi/h]^d. \tag{2.14} \]
Observe that the new symbol $p_h(\xi)$ is different from the continuous one, $|\xi|^2$. In the 1-d case (see Figure 1), the symbol $p_h(\xi)$ changes convexity at the points $\xi = \pm\pi/2h$ and has critical points also at $\xi = \pm\pi/h$, two properties that the continuous symbol does not have. Using that
\[ \inf_{\xi \in [-\pi/h, \pi/h]} |p_h''(\xi)| + |p_h'''(\xi)| > 0, \]
in [13] (see also [25] for $h = 1$) it has been proved that
\[ \|u^h(t)\|_{l^\infty(h\mathbb{Z})} \lesssim \|\varphi^h\|_{l^1(h\mathbb{Z})} \left( \frac{1}{|t|^{1/2}} + \frac{1}{(|t|h)^{1/3}} \right), \quad t \neq 0. \tag{2.15} \]
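The two pathologies of the 1-d symbol mentioned above can be read off from its derivatives, $p_h'(\xi) = \frac{2}{h}\sin(\xi h)$ and $p_h''(\xi) = 2\cos(\xi h)$, which follow from (2.13) by direct differentiation. The short sketch below evaluates them numerically; the function names and the sample mesh size are illustrative.

```python
import math


def p(xi, h):
    """1-d discrete symbol (2.13): (4/h^2) sin^2(xi h / 2)."""
    return 4.0 / h**2 * math.sin(xi * h / 2.0) ** 2


def dp(xi, h):
    """First derivative of the symbol: (2/h) sin(xi h)."""
    return 2.0 / h * math.sin(xi * h)


def d2p(xi, h):
    """Second derivative of the symbol: 2 cos(xi h)."""
    return 2.0 * math.cos(xi * h)


h = 0.1
print(d2p(math.pi / (2 * h), h))  # convexity changes at xi = pi/(2h)
print(dp(math.pi / h, h))         # critical point at xi = pi/h
print(p(0.01, h), 0.01**2)        # low frequencies: p_h(xi) is close to xi^2
```

By contrast, the continuous symbol $\xi^2$ has constant second derivative 2 and its only critical point is $\xi = 0$.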
Fig. 1. The two symbols in dimension one (continuous and semidiscrete), plotted for $\xi \in [-\pi/h, \pi/h]$.
Fig. 2. Log-log plot of the time evolution of the $l^\infty(\mathbb{Z})$-norm of the fundamental solution $u^1$ for (1.9), compared with the rates $t^{-1/3}$ and $t^{-1/2}$.
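The decay rate shown in Figure 2 can be reproduced with a few lines of code. The sketch below is not the authors' computation: it assumes $h = 1$, replaces $\mathbb{Z}$ by a periodic lattice with `n_modes` points (large enough that the wave emitted by $\delta_0$ has not wrapped around by time `t`), and evaluates (2.14) by a direct discrete Fourier sum.

```python
import cmath
import math


def fundamental_solution_sup(t, n_modes):
    """sup_j |u_j(t)| for (1.9) with h = 1 and initial datum delta_0.

    Assumption: the lattice Z is truncated to a periodic lattice of n_modes
    points; this is accurate as long as n_modes is well above 4 * t.
    """
    xis = [2.0 * math.pi * k / n_modes for k in range(n_modes)]
    # Fourier multiplier exp(-i t p_1(xi)) from (2.14), with p_1 = 4 sin^2(xi/2).
    mult = [cmath.exp(-1j * t * 4.0 * math.sin(xi / 2.0) ** 2) for xi in xis]
    best = 0.0
    # The solution is even in j, so half of the lattice suffices.
    for j in range(n_modes // 2 + 1):
        s = sum(m * cmath.exp(1j * j * xi) for m, xi in zip(mult, xis))
        best = max(best, abs(s) / n_modes)
    return best


s_early = fundamental_solution_sup(8.0, 128)
s_late = fundamental_solution_sup(64.0, 512)
# Observed decay exponent between t = 8 and t = 64.
print(math.log(s_late / s_early) / math.log(8.0))
```

The printed exponent should lie near $-1/3$ rather than $-1/2$, in line with the $t^{-1/3}$ behavior reported for the fundamental solution.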
Note that estimate (2.15) blows up as $h \to 0$. Therefore it does not yield uniform Strichartz estimates. Figure 2 shows that (2.15) cannot be improved for large time $t$. In fact when $h = 1$ and $\varphi^1 = \delta_0$ ($\delta_0$ is the discrete Dirac function, where $(\delta_0)_0$ is one and zero otherwise) the solution $u^1(t)$ behaves as $t^{-1/3}$ for large time $t$ instead of $t^{-1/2}$ in the case of the LSE.
In dimension $d$, similar results can be obtained in terms of the number of nonvanishing principal curvatures of the symbol and its gradient. Observe that, at the points $\xi = (\pm\pi/2h, \dots, \pm\pi/2h)$, all the eigenvalues of the Hessian matrix $H_{p_h} = (\partial_{ij} p_h)_{ij}$ vanish. Moreover, if $k$ components of the vector $\xi$ coincide with $\pm\pi/2h$, the rank of $H_{p_h}$ at this point is $d - k$ instead of $d$, as in the continuous case. This will imply that the solutions of (1.9), concentrated at these points of the spectrum, will behave as $t^{-(d-k)/2}(th)^{-k/3}$ instead of $t^{-d/2}$ as $t \to \infty$. This shows that there are no uniform estimates similar to (1.4) or (1.6) at the discrete level. But these inequalities are necessary to prove the uniform boundedness of the semidiscrete solutions in the nonlinear setting.
On the other hand, at the points $\xi = (\pm\pi/h, \dots, \pm\pi/h)$, the gradient of the symbol $p_h(\xi)$ vanishes. As we will see, these pathologies affect the dispersive properties of the semidiscrete scheme (1.9) and its solutions do not fulfill the regularizing property (1.8), uniformly in $h > 0$, which is needed to guarantee the compactness of the semidiscrete solutions. This constitutes an obstacle when passing to the limit as $h \to 0$ in the nonlinear semidiscrete models.
This section is organized as follows. Section 2.1 deals with the analysis of properties (1.4) and (1.6) for the solutions of (1.9). The local smoothing property is analyzed in section 2.2. In section 2.3 we prove estimates similar to (1.4) and (1.8), uniform with respect to the parameter $h$, in the class of initial data whose Fourier spectrum has been filtered conveniently. Strichartz-like estimates for filtered solutions are given in section 2.4. In section 2.5 we analyze a numerical scheme for the 1-d cubic NSE based on the conservative approximation of the linear Schrödinger semigroup. We prove that its solutions do not remain uniformly bounded in any auxiliary space $L^q_{loc}(\mathbb{R}, l^r(h\mathbb{Z}))$.
2.1. Lack of uniform dispersive estimates. First, we construct explicit examples of solutions of (1.9) for which all the classical estimates of the continuous case (1.6) blow up.
Theorem 2.1. Let $T > 0$, $r_0 \ge 1$, and $r > r_0$. Then
\[ \sup_{h>0,\ \varphi^h \in l^{r_0}(h\mathbb{Z}^d)} \frac{\|S^h(T)\varphi^h\|_{l^r(h\mathbb{Z}^d)}}{\|\varphi^h\|_{l^{r_0}(h\mathbb{Z}^d)}} = \infty \tag{2.16} \]
and
\[ \sup_{h>0,\ \varphi^h \in l^{r_0}(h\mathbb{Z}^d)} \frac{\|S^h(\cdot)\varphi^h\|_{L^1((0,T),\, l^r(h\mathbb{Z}^d))}}{\|\varphi^h\|_{l^{r_0}(h\mathbb{Z}^d)}} = \infty. \tag{2.17} \]
Remark 2.1. A finer analysis can be done. The same result holds if we take the supremum in (2.16) and (2.17) over the set of functions $\varphi^h \in l^{r_0}(h\mathbb{Z}^d)$ such that the support of their Fourier transform (2.11) contains at least one of the points of the set
\[ M^h_1 = \left\{ \xi = (\xi_1, \dots, \xi_d) \in \left[-\frac{\pi}{h}, \frac{\pi}{h}\right]^d : \exists\, i \in \{1, \dots, d\} \text{ such that } |\xi_i| = \frac{\pi}{2h} \right\}. \tag{2.18} \]
Observe that at the above points the rank of the Hessian matrix $H_{p_h}$ is at most $d - 1$.
Remark 2.2. Let $P_h$ be an interpolator, piecewise constant or linear. In view of Theorem 2.1, for any fixed $T > 0$, the uniform boundedness principle guarantees the existence of a function $\varphi \in L^2(\mathbb{R}^d)$ and a sequence $\varphi^h$ such that $P_h\varphi^h \to \varphi$ in $L^2(\mathbb{R}^d)$ and the corresponding solutions $u^h$ of (1.9) satisfy $\|P_h u^h\|_{L^1((0,T),\, L^r(\mathbb{R}^d))} \to \infty$.
Proof of Theorem 2.1. First, observe that it is sufficient to deal with the 1-d case. Indeed, for any sequence $\{\psi^h_j\}_{j \in \mathbb{Z}}$ set $\varphi^h_j = \psi^h_{j_1} \cdots \psi^h_{j_d}$, where $j = (j_1, j_2, \dots, j_d)$. We are thus considering discrete functions in separated variables. Then, for any $t$ the following holds:
\[ (S^h(t)\varphi^h)_j = (S^{1,h}(t)\psi^h)_{j_1} (S^{1,h}(t)\psi^h)_{j_2} \cdots (S^{1,h}(t)\psi^h)_{j_d}, \]
where $S^{1,h}(t)$ is the linear semigroup generated by (1.9) in the 1-d case. Thus it is obvious that (2.16) and (2.17) hold in dimension $d \ge 2$, once we prove them in the 1-d case $d = 1$.
In the following we will consider the 1-d case $d = 1$ and prove (2.16), the other estimate (2.17) being similar. Using the properties of the SDFT it is easy to see that $(S^h(t)\varphi^h)_j = (S^1(t/h^2)\varphi^1)_j$, where $\varphi^1_j = \varphi^h_j$, $j \in \mathbb{Z}$. A scaling argument in (2.16) shows that
\[ \frac{\|S^h(T)\varphi^h\|_{l^q(h\mathbb{Z})}}{\|\varphi^h\|_{l^{q_0}(h\mathbb{Z})}} = h^{\frac{1}{q} - \frac{1}{q_0}} \, \frac{\|S^1(T/h^2)\varphi^1\|_{l^q(\mathbb{Z})}}{\|\varphi^1\|_{l^{q_0}(\mathbb{Z})}}. \tag{2.19} \]
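The scaling identity (2.19) can be unpacked in one line. The derivation below assumes the usual mesh-weighted norm $\|v\|_{l^q(h\mathbb{Z})} = (h \sum_j |v_j|^q)^{1/q}$, a convention this passage does not state explicitly, so it is a sketch under that assumption.

```latex
% Assumption: \|v\|_{l^q(h\mathbb{Z})}^q = h \sum_{j\in\mathbb{Z}} |v_j|^q .
\begin{align*}
\|S^h(T)\varphi^h\|_{l^q(h\mathbb{Z})}
  &= \Big( h \sum_{j\in\mathbb{Z}} \big|(S^1(T/h^2)\varphi^1)_j\big|^q \Big)^{1/q}
   = h^{1/q}\, \|S^1(T/h^2)\varphi^1\|_{l^q(\mathbb{Z})}, \\
\|\varphi^h\|_{l^{q_0}(h\mathbb{Z})}
  &= h^{1/q_0}\, \|\varphi^1\|_{l^{q_0}(\mathbb{Z})},
\end{align*}
% and dividing the first line by the second gives the factor h^{1/q - 1/q_0} in (2.19).
```

Since $q > q_0$, the exponent $1/q - 1/q_0$ is negative, so the prefactor itself grows as $h \to 0$; the remaining work is to control the quotient on the unit lattice at the large time $T/h^2$.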
Let us introduce the operator $S_1(t)$ defined by
\[ (S_1(t)\varphi)(x) = \int_{-\pi}^{\pi} e^{-itp_1(\xi)} e^{ix\xi} \hat{\varphi}(\xi) \, d\xi, \tag{2.20} \]
which is the extension of the semigroup generated by (1.9) for $h = 1$ to all $x \in \mathbb{R}$. We point out that for any sequence $\{\varphi^1_j\}_{j \in \mathbb{Z}}$, $S_1(t)\varphi^1$ as in (2.20), which is defined for all $x \in \mathbb{R}$, is in fact the band-limited interpolator of the semidiscrete function $S^1(t)\varphi^1$. The results of Magyar, Stein, and Wainger [18] (see also Plancherel and Pólya [21]) on band-limited functions show that the following inequalities hold for any $q \ge 1$ and for all continuous functions $\varphi$ with $\hat{\varphi}$ supported in $[-\pi, \pi]$:
\[ c(q)\|\varphi\|_{l^q(\mathbb{Z})} \le \|\varphi\|_{L^q(\mathbb{R})} \le C(q)\|\varphi\|_{l^q(\mathbb{Z})}. \]
Thus for any $q > q_0 \ge 1$ the following holds for all functions $\varphi^1$ whose Fourier transform is supported in $[-\pi, \pi]$:
\[ \frac{\|S^1(t)\varphi^1\|_{l^q(\mathbb{Z})}}{\|\varphi^1\|_{l^{q_0}(\mathbb{Z})}} \ge c(q, q_0) \frac{\|S_1(t)\varphi^1\|_{L^q(\mathbb{R})}}{\|\varphi^1\|_{L^{q_0}(\mathbb{R})}}. \tag{2.21} \]
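In view of this property it is sufficient to deal with the operator $S_1(t)$. Denoting $\tau = T/h^2$, by (2.19) the proof of (2.16) is reduced to the proof of the following fact about the new operator $S_1(t)$:
\[ \lim_{\tau \to \infty} \tau^{\frac{1}{2}\left(\frac{1}{q_0} - \frac{1}{q}\right)} \sup_{\operatorname{supp}(\hat{\varphi}) \subset [-\pi,\pi]} \frac{\|S_1(\tau)\varphi\|_{L^q(\mathbb{R})}}{\|\varphi\|_{L^{q_0}(\mathbb{R})}} = \infty. \tag{2.22} \]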
The following lemma is the key point in the proof of the last estimate.
Lemma 2.1. There exists a positive constant $c$ such that for all $\tau$ sufficiently large, there exists a function $\varphi_\tau$ such that $\|\varphi_\tau\|_{L^p(\mathbb{R})} \simeq \tau^{1/(3p)}$ for all $p \ge 1$ and
\[ |(S_1(t)\varphi_\tau)(x)| \ge \frac{1}{2} \tag{2.23} \]
for all $|t| \le c\tau$ and $|x - tp_1'(\pi/2)| \le c\tau^{1/3}$.
Remark 2.3. Lemma 2.1 shows a lack of dispersion in the semidiscrete setting when compared with the continuous one. In the latter, for any initial data $\varphi_\tau$ such that $\|\varphi_\tau\|_{L^1(\mathbb{R})} \simeq \tau^{1/3}$, the solution $S(t)\varphi_\tau$ of the LSE satisfies
\[ \|S(t)\varphi_\tau\|_{L^\infty(\mathbb{R})} \lesssim \frac{\tau^{1/3}}{|t|^{1/2}} \simeq \frac{1}{\tau^{1/6}} \]
for all $t \simeq \tau$, which is incompatible with (2.23).
The proof of Lemma 2.1 will be given later. Assuming for the moment that Lemma 2.1 holds, we now prove (2.22). In view of Lemma 2.1, given $q > q_0 \ge 1$, for sufficiently large $\tau$ the following holds:
\[ \sup_{\operatorname{supp}(\hat{\varphi}) \subset [-\pi,\pi]} \frac{\|S_1(\tau)\varphi\|_{L^q(\mathbb{R})}}{\|\varphi\|_{L^{q_0}(\mathbb{R})}} \gtrsim \tau^{\frac{1}{3q} - \frac{1}{3q_0}}. \]
Thus (2.22) holds and the proof is done.
Proof of Lemma 2.1. The techniques used below are similar to those used in [7] to get lower bounds on oscillatory integrals.
We define the relevant initial data through its Fourier transform. Let us first fix a positive function $\hat{\varphi}$ supported on $(-1, 1)$ such that $\int_{-\pi}^{\pi} \hat{\varphi} = 1$. For all positive $\tau$, we set
\[ \hat{\varphi}_\tau(\xi) = \tau^{1/3} \hat{\varphi}(\tau^{1/3}(\xi - \pi/2)). \]
We define $\varphi_\tau$ as the inverse Fourier transform of $\hat{\varphi}_\tau$. Observe that $\hat{\varphi}_\tau$ is supported in the interval $(\pi/2 - \tau^{-1/3}, \pi/2 + \tau^{-1/3})$ and $\int_{-\pi}^{\pi} \hat{\varphi}_\tau = 1$. Also using that $\varphi_\tau(x) = \varphi_1(\tau^{-1/3}x)$ we get $\|\varphi_\tau\|_{L^p(\mathbb{R})} \simeq \tau^{1/(3p)}$ for any $p \ge 1$. The mean value theorem applied to the integral occurring in the right-hand side of (2.20) shows that
\[ |S_1(t)\varphi_\tau(x)| \ge \left( 1 - 2\tau^{-1/3} \sup_{\xi \in \operatorname{supp}(\hat{\varphi}_\tau)} |x - tp_1'(\xi)| \right) \int_{-\pi}^{\pi} \hat{\varphi}_\tau(\xi) \, d\xi. \tag{2.24} \]
Using that the second derivative of $p_1$ vanishes at $\xi = \pi/2$ we obtain the existence of a positive constant $c_1$ such that
\[ |x - tp_1'(\xi)| \le |x - tp_1'(\pi/2)| + tc_1|\xi - \pi/2|^2, \quad \xi \sim \pi/2. \]
In particular for all $\xi \in [\pi/2 - \tau^{-1/3}, \pi/2 + \tau^{-1/3}]$ the following holds:
\[ |x - tp_1'(\xi)| \le |x - tp_1'(\pi/2)| + tc_1\tau^{-2/3}. \]
Thus there exists a (small enough) positive constant $c$ such that for all $x$ and $t$ satisfying $|x - tp_1'(\pi/2)| \le c\tau^{1/3}$ and $t \le c\tau$,
\[ 2\tau^{-1/3} \sup_{\xi \in \operatorname{supp}(\hat{\varphi}_\tau)} |x - tp_1'(\xi)| \le \frac{1}{2}. \]
In view of (2.24) this yields (2.23) and finishes the proof.
2.2. Lack of uniform local smoothing effect. In order to analyze the local smoothing effect at the discrete level we introduce the discrete fractional derivatives on the lattice $h\mathbb{Z}^d$. We define, for any $s \ge 0$, the fractional derivative $(-\Delta_h)^{s/2}u^h$ at the scale $h$ as
\[ ((-\Delta_h)^{s/2}u^h)_j = \int_{[-\pi/h,\pi/h]^d} p_h^{s/2}(\xi) e^{ij\cdot\xi h} \mathcal{F}_h(u^h)(\xi) \, d\xi, \quad j \in \mathbb{Z}^d, \tag{2.25} \]
where $p_h(\cdot)$ is as in (2.13) and $\mathcal{F}_h(u^h)$ is the SDFT of the sequence $\{u^h_j\}_{j \in \mathbb{Z}^d}$ at the scale $h$. Concerning the local smoothing effect we have the following result.
Theorem 2.2. Let $T > 0$ and $s > 0$. Then
\[ \sup_{h>0,\ \varphi^h \in l^2(h\mathbb{Z}^d)} \frac{h^d \sum_{|j|h \le 1} |((-\Delta_h)^{s/2} S^h(T)\varphi^h)_j|^2}{\|\varphi^h\|^2_{l^2(h\mathbb{Z}^d)}} = \infty \tag{2.26} \]
and
\[ \sup_{h>0,\ \varphi^h \in l^2(h\mathbb{Z}^d)} \frac{h^d \sum_{|j|h \le 1} \int_0^T |((-\Delta_h)^{s/2} S^h(t)\varphi^h)_j|^2 \, dt}{\|\varphi^h\|^2_{l^2(h\mathbb{Z}^d)}} = \infty. \tag{2.27} \]
Remark 2.4. The same result holds if we take the supremum in (2.26) and (2.27) over the set of functions $\varphi^h \in l^2(h\mathbb{Z}^d)$ such that the support of $\hat{\varphi}^h$ contains at least one of the points of the set
\[ M^h_2 = \left\{ \xi = (\xi_1, \dots, \xi_d) \in \left[-\frac{\pi}{h}, \frac{\pi}{h}\right]^d : \xi_i = \pm\frac{\pi}{h},\ i = 1, \dots, d \right\}. \tag{2.28} \]
Observe that at the above points the gradient of $p_h$ vanishes.
In contrast with the proof of Theorem 2.1 we cannot reduce it to the 1-d case. This is due to the extra factor $p_h^{s/2}(\xi)$ which does not allow us to use separation of variables. The proof consists in reducing (2.26) and (2.27) to the case $h = 1$ and then using the following lemma.
Lemma 2.2. Let $s > 0$. There is a positive constant $c$ such that for all $\tau$ sufficiently large there exists a function $\varphi^1_\tau$ with $\|\varphi^1_\tau\|_{l^2(\mathbb{Z}^d)} = \tau^{d/2}$ and
\[ |((-\Delta_1)^{s/2} S^1(t)\varphi^1_\tau)_j| \ge 1/2 \tag{2.29} \]
for all $|t| \le c\tau^2$, $|j| \le c\tau$.
We postpone the proof of Lemma 2.2 and proceed with the proof of Theorem 2.2.
Proof of Theorem 2.2. We prove (2.26), the other estimate (2.27) being similar. As in the previous section we reduce the proof to the case $h = 1$. By the definition of $(-\Delta_h)^{s/2}$ for any $j \in \mathbb{Z}^d$ we have that
\[ ((-\Delta_h)^{s/2} S^h(t)\varphi^h)_j = h^{-s} ((-\Delta_1)^{s/2} S^1(t/h^2)\varphi^1)_j, \quad j \in \mathbb{Z}^d, \]
where $\varphi^h_j = \varphi^1_j$, $j \in \mathbb{Z}^d$. Thus
\[ \frac{h^d \sum_{|j|h \le 1} |((-\Delta_h)^{s/2} S^h(T)\varphi^h)_j|^2}{\|\varphi^h\|^2_{l^2(h\mathbb{Z}^d)}} = \frac{h^{-2s} \sum_{|j| \le 1/h} |((-\Delta_1)^{s/2} S^1(T/h^2)\varphi^1)_j|^2}{\|\varphi^1\|^2_{l^2(\mathbb{Z}^d)}}. \]
With $c$ and $\varphi^1_\tau$ given by Lemma 2.2 and $\tau$ such that $c\tau^2 = T/h^2$, i.e., $\tau = (T/c)^{1/2}h^{-1}$, we have $\|\varphi^1_\tau\|^2_{l^2(\mathbb{Z}^d)} = \tau^d$ and
\[ \lim_{\tau \to \infty} \frac{h^{-2s} \sum_{|j| \le 1/h} |((-\Delta_1)^{s/2} S^1(T/h^2)\varphi^1_\tau)_j|^2}{\|\varphi^1_\tau\|^2_{l^2(\mathbb{Z}^d)}} \gtrsim \lim_{\tau \to \infty} \frac{\tau^{2s}\tau^d}{\tau^d} = \infty. \]
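This finishes the proof.
Proof of Lemma 2.2. We choose a positive function $\hat{\varphi}$ supported in the unit ball with $\int_{\mathbb{R}^d} \hat{\varphi} = 1$. Set for all $\tau \ge 1$
\[ \hat{\varphi}^1_\tau(\xi) = \tau^d \hat{\varphi}(\tau(\xi - \pi_d)), \]
where $\pi_d = (\pi, \dots, \pi)$. We define $\varphi^1_\tau$ as the inverse Fourier transform at scale $h = 1$ of $\hat{\varphi}^1_\tau$. Thus $\hat{\varphi}^1_\tau$ is supported in $\{\xi : |\xi - \pi_d| \le \tau^{-1}\}$, it has mass one, and $\|\varphi^1_\tau\|_{l^2(\mathbb{Z}^d)} \simeq \tau^{d/2}$. Applying the mean value theorem to the oscillatory integral occurring in the definition of $(-\Delta_1)^{s/2}S^1(t)\varphi^1_\tau$ and using that $p_1(\xi)$ behaves as a positive constant in the support of $\hat{\varphi}^1_\tau$ we obtain that for some positive constant $c_0$
\begin{align*}
|((-\Delta_1)^{s/2} S^1(t)\varphi^1_\tau)_j| &\ge \left( 1 - 2\tau^{-1} \sup_{\xi \in \operatorname{supp}(\hat{\varphi}^1_\tau)} |j - t\nabla p_1(\xi)| \right) \int_{[-\pi,\pi]^d} p_1^{s/2}(\xi) \hat{\varphi}^1_\tau(\xi) \, d\xi \\
&\ge c_0 \left( 1 - 2\tau^{-1} \sup_{\xi \in \operatorname{supp}(\hat{\varphi}^1_\tau)} |j - t\nabla p_1(\xi)| \right) \int_{[-\pi,\pi]^d} \hat{\varphi}^1_\tau(\xi) \, d\xi.
\end{align*}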
Using that $\nabla p_1$ vanishes at $\xi = \pi_d$ we obtain the existence of a positive constant $c_1$ such that
\[ |j - t\nabla p_1(\xi)| \le |j| + tc_1|\xi - \pi_d|, \quad \xi \sim \pi_d. \]
Then there exists a positive constant $c$ such that for all $j$ and $t$ satisfying $|j| \le c\tau$ and $t \le c\tau^2$ the following holds:
\[ 2\tau^{-1} \sup_{\xi \in \operatorname{supp}(\hat{\varphi}^1_\tau)} |j - t\nabla p_1(\xi)| \le \frac{1}{2}. \]
Thus for all $t$ and $j$ as above (2.29) holds. This finishes the proof.
2.3. Filtering of the initial data. As we have seen in the previous section the conservative scheme (1.9) does not reproduce the dispersive properties of the continuous LSE. In this section we prove that a suitable filtering of the initial data in the Fourier space provides uniform dispersive properties and a local smoothing effect. The key point to recover the decay rates (1.4) at the discrete level is to choose initial data with their SDFT supported away from the pathological points $M^h_1$ in (2.18). Similarly, the local smoothing property holds uniformly on $h$ if the SDFT of the initial data is supported away from the points $M^h_2$ in (2.28).
For any positive $\epsilon < \pi/2$ we define $\Omega^h_{\epsilon,d}$, the set of all the points in the cube $[-\pi/h, \pi/h]^d$ whose distance is at least $\epsilon/h$ from the set in which some of the second order derivatives of $p_h(\xi)$ vanish:
\[ \Omega^h_{\epsilon,d} = \left\{ \xi = (\xi_1, \dots, \xi_d) \in \left[-\frac{\pi}{h}, \frac{\pi}{h}\right]^d : \left|\xi_i \mp \frac{\pi}{2h}\right| \ge \frac{\epsilon}{h},\ i = 1, \dots, d \right\}. \]
Let us define the class of functions $I^h_{\epsilon,d} \subset l^2(h\mathbb{Z}^d)$, whose SDFT is supported on $\Omega^h_{\epsilon,d}$:
\[ I^h_{\epsilon,d} = \{\varphi^h \in l^2(h\mathbb{Z}^d) : \operatorname{supp}(\hat{\varphi}^h) \subset \Omega^h_{\epsilon,d}\}. \tag{2.30} \]
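The point of removing a neighborhood of $\pm\pi/2h$ is that the second derivative of the 1-d symbol, $p_h''(\xi) = 2\cos(\xi h)$, is then bounded away from zero by $2\sin(\epsilon)$, independently of $h$. The sketch below checks this on a sampled frequency grid; the function names, the filter width `eps`, and the sampling density are illustrative assumptions.

```python
import math


def in_filtered_set(xi, h, eps):
    """Membership in the 1-d filtered set: xi stays at distance >= eps/h from +-pi/(2h)."""
    half = math.pi / (2.0 * h)
    return abs(xi - half) >= eps / h and abs(xi + half) >= eps / h


def min_abs_second_derivative(h, eps, samples=20001):
    """Smallest |p_h''(xi)| = |2 cos(xi h)| over sampled points of the filtered set."""
    lowest = math.inf
    for k in range(samples):
        xi = -math.pi / h + 2.0 * math.pi / h * k / (samples - 1)
        if in_filtered_set(xi, h, eps):
            lowest = min(lowest, abs(2.0 * math.cos(xi * h)))
    return lowest


eps = 0.3
for h in (1.0, 0.1, 0.01):
    # The minimum stays close to 2 sin(eps) for every mesh size.
    print(h, min_abs_second_derivative(h, eps), 2.0 * math.sin(eps))
```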
We can view this subspace of initial data as a subclass of filtered data in the sense that the Fourier components corresponding to $\xi$ such that $|\xi_i \pm \pi/2h| \le \epsilon/h$ have been cut off or filtered out. The following theorem shows that for initial data in this class the semigroup $S^h(t)$ has the same long time behavior as the continuous one, independently of $h$, in what concerns the $l^{p'}(h\mathbb{Z}^d)$-$l^p(h\mathbb{Z}^d)$ decay property.
Theorem 2.3. Let $0 < \epsilon < \pi/2$ and $p \ge 2$. There exists a positive constant $C(\epsilon, p, d)$ such that
\[ \|S^h(t)\varphi^h\|_{l^p(h\mathbb{Z}^d)} \le C(\epsilon, p, d)|t|^{-\frac{d}{2}\left(1 - \frac{2}{p}\right)} \|\varphi^h\|_{l^{p'}(h\mathbb{Z}^d)}, \quad t \neq 0, \tag{2.31} \]
holds for all $\varphi^h \in l^{p'}(h\mathbb{Z}^d) \cap I^h_{\epsilon,d}$, uniformly on $h > 0$.
Proof. A scaling argument reduces the proof to the case $h = 1$. For any $\varphi^1 \in I^1_{\epsilon,d}$ the solution of (1.9) is given by $S^1(t)\varphi^1 = K^1_{\epsilon,d} * \varphi^1$, where
\[ K^1_{\epsilon,d}(t, j) = \int_{\Omega^1_{\epsilon,d}} e^{itp_1(\xi)} e^{ij\cdot\xi} \, d\xi, \quad j \in \mathbb{Z}^d. \tag{2.32} \]
As a consequence of Young’s inequality it remains to prove that
\[ \|K^1_{\epsilon,d}(t)\|_{l^p(\mathbb{Z}^d)} \le C(\epsilon, p, d)|t|^{-d/2(1-1/p)} \tag{2.33} \]
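for any $p \ge 2$ and for all $t \neq 0$. Observe that it is then sufficient to prove (2.33) in the 1-d case. Using that the second derivative of the function $\sin^2(\xi/2)$ is bounded away from zero on $\Omega^1_{\epsilon,1}$ we obtain by the Van der Corput lemma (see [26, Prop. 2, Chap. 8, p. 332]) that
\[ \|K^1_{\epsilon,1}(t)\|_{l^\infty(\mathbb{Z})} \le c(\epsilon)|t|^{-1/2}, \]
which finishes the proof.
A similar result can be stated for the local smoothing effect. For a positive $\epsilon$, let us define the set $\tilde{\Omega}^h_{\epsilon,d}$ of all points located at a distance of at least $\epsilon/h$ from the points $(\pm\pi/h)^d$:
\[ \tilde{\Omega}^h_{\epsilon,d} = \left\{ \xi \in \left[-\frac{\pi}{h}, \frac{\pi}{h}\right]^d : \left|\xi_i \mp \frac{\pi}{h}\right| \ge \frac{\epsilon}{h},\ i = 1, \dots, d \right\}. \]
Observe that on $\tilde{\Omega}^h_{\epsilon,d}$ the symbol $p_h(\xi)$ has no critical points other than $\xi = 0$. A similar argument as in [15] shows that the linear semigroup $S^h(t)$ gains one half space derivative in $L^2_{t,x}$ with respect to the initial datum filtered as above. More precisely, if $P^*_h$ denotes the band-limited interpolator (cf. [31, Chap. II])
\[ (P^*_h u^h)(x) = \int_{[-\pi/h,\pi/h]^d} \hat{u}^h(\xi) e^{ix\cdot\xi} \, d\xi, \quad x \in \mathbb{R}^d, \tag{2.34} \]
the following holds.
Theorem 2.4. Let $\epsilon > 0$. There exists a positive constant $C(\epsilon, d)$ such that for any $R > 0$
\[ \int_{|x|<R} \int_{-\infty}^{\infty} |(-\Delta)^{1/4} P^*_h e^{it\Delta_h}\varphi^h|^2 \, dt \, dx \le C(\epsilon, d) R \|\varphi^h\|^2_{l^2(h\mathbb{Z}^d)} \]
holds for all $\varphi^h \in l^2(h\mathbb{Z}^d)$ with $\operatorname{supp}(\hat{\varphi}^h) \subset \tilde{\Omega}^h_{\epsilon,d}$, uniformly on $h > 0$.
To prove this result we make use of the following theorem.
Theorem 2.5 (see [15, Theorem 4.1]). Let $O$ be an open set in $\mathbb{R}^d$ and $\psi$ be a $C^1(O)$ function such that $\nabla\psi(\xi) \neq 0$ for any $\xi \in O$. Assume that there is $N \in \mathbb{N}$ such that for any $(\xi_1, \dots, \xi_{d-1}) \in \mathbb{R}^{d-1}$ and $r \in \mathbb{R}$ the equations
\[ \psi(\xi_1, \dots, \xi_k, \xi, \xi_{k+1}, \dots, \xi_{d-1}) = r, \quad k = 0, \dots, d-1, \]
have at most $N$ solutions $\xi \in \mathbb{R}$. For $a \in L^\infty(\mathbb{R}^d \times \mathbb{R})$ and $f \in \mathcal{S}(\mathbb{R}^d)$ define
\[ W(t)f(x) = \int_O e^{i(t\psi(\xi) + x\cdot\xi)} a(x, \psi(\xi)) \hat{f}(\xi) \, d\xi. \]
Then for any $R > 0$
\[ \int_{|x| \le R} \int_{-\infty}^{\infty} |W(t)f(x)|^2 \, dt \, dx \le cRN \int_O \frac{|\hat{f}(\xi)|^2}{|\nabla\psi(\xi)|} \, d\xi, \tag{2.35} \]
where $c$ is independent of $R$, $N$, and $f$.
Remark 2.5. The result remains true for domains $O$ where $|\nabla\psi|$ has zeros, provided that the right-hand side of (2.35) is finite.
Proof of Theorem 2.4. Observe that for any $\varphi^h \in l^2(h\mathbb{Z}^d)$ with $\operatorname{supp}(\hat{\varphi}^h) \subset \tilde{\Omega}^h_{\epsilon,d}$ we have
\[ (P^*_h e^{it\Delta_h}\varphi^h)(x) = \int_{\tilde{\Omega}^h_{\epsilon,d}} e^{itp_h(\xi)} e^{ix\cdot\xi} \hat{\varphi}^h(\xi) \, d\xi, \quad x \in \mathbb{R}^d. \]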
Applying Theorem 2.5 with $O = \tilde{\Omega}^h_{\epsilon,d}$, $\psi = p_h(\xi)$, and $a \equiv 1$ and using that $|\nabla p_h(\xi)| \ge c(\epsilon, d)|\xi|$ for all $\xi \in \tilde{\Omega}^h_{\epsilon,d}$ we obtain that
\[ \int_{|x|<R} \int_{-\infty}^{\infty} |(-\Delta)^{1/4} P^*_h e^{it\Delta_h}\varphi^h|^2 \, dt \, dx \lesssim R \int_{\tilde{\Omega}^h_{\epsilon,d}} \frac{|\hat{\varphi}^h(\xi)|^2 |\xi|}{|\nabla p_h(\xi)|} \, d\xi \lesssim R \|\varphi^h\|^2_{l^2(h\mathbb{Z}^d)}. \]
This finishes the proof.
2.4. Strichartz estimates for filtered data. In this section we are interested in deriving Strichartz-like estimates for the operator $S^h(t)$ when it acts on functions belonging to $I^h_{\epsilon,d}$, the class of functions defined in (2.30). The main ingredient in obtaining Strichartz estimates is the following result due to Keel and Tao [14].
Theorem 2.6 (see [14, Theorem 1.2]). Let $H$ be a Hilbert space, $(X, dx)$ be a measure space, and $U(t) : H \to L^2(X)$ be a one parameter family of mappings, which obey the energy estimate
\[ \|U(t)f\|_{L^2(X)} \le C\|f\|_H \tag{2.36} \]
and the decay estimate
\[ \|U(t)U(s)^* g\|_{L^\infty(X)} \le C|t - s|^{-\sigma} \|g\|_{L^1(X)} \tag{2.37} \]
for some $\sigma > 0$. Then
\[ \begin{aligned} \|U(t)f\|_{L^q(\mathbb{R},\, L^r(X))} &\le C\|f\|_H \quad \forall f \in H, \\ \left\| \int_{\mathbb{R}} U(s)^* F(s, \cdot) \, ds \right\|_H &\le C\|F\|_{L^{q'}(\mathbb{R},\, L^{r'}(X))} \quad \forall F \in L^{q'}(\mathbb{R}, L^{r'}(X)), \end{aligned} \tag{2.38} \]
\[ \left\| \int_0^t U(t)U(s)^* F(s, \cdot) \, ds \right\|_{L^q(\mathbb{R},\, L^r(X))} \le C\|F\|_{L^{\tilde{q}'}(\mathbb{R},\, L^{\tilde{r}'}(X))} \quad \forall F \in L^{\tilde{q}'}(\mathbb{R}, L^{\tilde{r}'}(X)) \tag{2.39} \]
for any $\sigma$-admissible pairs $(q, r)$ and $(\tilde{q}, \tilde{r})$.
Remark 2.6. With the same arguments as in [14], the following also holds for all $(q, r)$ and $(\tilde{q}, \tilde{r})$, $\sigma$-admissible pairs:
\[ \left\| \int_0^t U(t - s) F(s, \cdot) \, ds \right\|_{L^q(\mathbb{R},\, L^r(X))} \le C\|F\|_{L^{\tilde{q}'}(\mathbb{R},\, L^{\tilde{r}'}(X))}. \tag{2.40} \]
In the case of the Schrödinger semigroup, $S(t - s) = S(t)S(s)^*$, so (2.40) and (2.39) coincide. However, in our applications we will often deal with operators that do not satisfy $S(t - s) = S(t)S(s)^*$.
Let us choose $0 < \epsilon < \pi/2$, $K^1_{\epsilon,d}$ as in (2.32), and $U(t)\varphi^1 = K^1_{\epsilon,d} * \varphi^1$. We apply the above theorem to $U(t)$, with $X = \mathbb{Z}^d$, $dx$ being the counting measure, and $H = l^2(\mathbb{Z}^d)$. In this way we obtain Strichartz estimates for the semigroup $S^1(t)$ when acting on $I^1_{\epsilon,d}$, i.e., when $h = 1$. Then, by scaling, we obtain the following result in the class of filtered initial data.
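Theorem 2.7. Let $0 < \epsilon < \pi/2$ and $(q, r)$, $(\tilde{q}, \tilde{r})$ be two $d/2$-admissible pairs.
(i) There exists a positive constant $C(d, r, \epsilon)$ such that
\[ \|S^h(\cdot)\varphi^h\|_{L^q(\mathbb{R},\, l^r(h\mathbb{Z}^d))} \le C(d, r, \epsilon) \|\varphi^h\|_{l^2(h\mathbb{Z}^d)} \tag{2.41} \]
holds for all functions $\varphi^h \in I^h_{\epsilon,d}$ and for all $h > 0$.
(ii) There exists a positive constant $C(d, r, \tilde{r}, \epsilon)$ such that
\[ \left\| \int_0^t S^h(t - s) f^h(s) \, ds \right\|_{L^q(\mathbb{R},\, l^r(h\mathbb{Z}^d))} \le C(d, r, \tilde{r}, \epsilon) \|f^h\|_{L^{\tilde{q}'}(\mathbb{R},\, l^{\tilde{r}'}(h\mathbb{Z}^d))} \tag{2.42} \]
holds for all functions $f^h \in L^{\tilde{q}'}(\mathbb{R}, l^{\tilde{r}'}(h\mathbb{Z}^d))$ with $f^h(t) \in I^h_{\epsilon,d}$ for a.e. $t \in \mathbb{R}$ and for all $h > 0$.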
2.5. On the cubic NSE. In the previous sections we have seen that the linear semidiscrete scheme (1.9) does not satisfy uniform (with respect to $h$) dispersive estimates. Accordingly we cannot use it to get numerical approximations for the NSE with uniform bounds on spaces of the form $L^q((0, T), l^r(h\mathbb{Z}^d))$. However, one could argue that, even if a perturbation argument based on the variation of constants formula and the dispersive properties of the linear scheme does not provide uniform bounds for the nonlinear problem, these estimates could still be true. In this section we give an explicit example showing that a numerical scheme for the cubic NSE based on the conservative scheme (1.9) does not satisfy uniform bounds in $L^q((0, T), l^r(h\mathbb{Z}^d))$. This shows that the conservative scheme (1.9) can be used neither for the LSE nor for the NSE within the $L^q((0, T), l^r(h\mathbb{Z}^d))$-setting. We consider an approximation scheme to the 1-d NSE with nonlinearity $2|u|^2u$:
\[ i\partial_t u^h_n + (\Delta_h u^h)_n = |u^h_n|^2 (u^h_{n+1} + u^h_{n-1}). \tag{2.43} \]
In what follows we shall refer to it as the Ablowitz–Ladik approximation [1] for the NSE. As we shall see, this scheme possesses explicit solutions which blow up in any $L^q_{loc}(\mathbb{R}, l^r(h\mathbb{Z}))$-norm with $r > 2$ and $q \ge 1$. We point out that this is compatible with the $L^2$-convergence of the numerical scheme (2.43) for smooth initial data [1, 2].
Let us consider $\varphi \in L^2(\mathbb{R})$ as initial data for (1.2) with $F(u) = 2u|u|^2$. As initial condition for (2.43) we take $u^h(0) = \varphi^h$, $\varphi^h$ being an approximation of $\varphi$. Let us assume the existence of a positive $T$ such that for any $h > 0$, there exists $u^h \in L^\infty([0, T], l^2(h\mathbb{Z}))$ a solution of (2.43). The uniform boundedness of $\{u^h\}_{h>0}$ in $L^\infty([0, T], l^2(h\mathbb{Z}))$ does not suffice to prove its convergence to the solution of (1.2). One needs to analyze whether the solutions of (2.43) are uniformly bounded, with respect to $h$, in one of the auxiliary spaces $L^q_{loc}(\mathbb{R}, l^r(h\mathbb{Z}))$, a property that will guarantee that any possible limit point of $\{u^h\}_{h>0}$ belongs to $L^q((0, T), L^r(\mathbb{R}))$.
We are going to show that these uniform estimates do not hold in general. To do that we look for explicit travelling wave solutions of (2.43). By scaling, the problem can be reduced to the case $h = 1$. Indeed, $u^h$ is a solution of (2.43) if the scaled function
\[ u^1_n(t) = h u^h_n(th^2), \quad n \in \mathbb{Z},\ t \ge 0, \]
solves (2.43) for $h = 1$. In this case, $h = 1$, there are explicit solutions of (2.43) of the form
\[ u^1_n(t) = A \exp(i(an - bt)) \operatorname{sech}(cn - dt) \tag{2.44} \]
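for suitable constants $A, a, b, c, d$ (for the explicit values we refer the reader to [2, p. 84]). In view of the structure of $u^1$ it is easy to see that the solutions of (2.43), obtained from $u^1$ by scaling, are not uniformly bounded as $h \to 0$ in any auxiliary space $L^q((0, T), l^r(h\mathbb{Z}))$ with $r > 2$. Indeed, a scaling argument shows that
\[ \frac{\|u^h\|_{L^q((0,T),\, l^r(h\mathbb{Z}))}}{\|u^h(0)\|_{l^2(h\mathbb{Z})}} = h^{\frac{1}{r} + \frac{2}{q} - \frac{1}{2}} \, \frac{\|u^1\|_{L^q((0,T/h^2),\, l^r(\mathbb{Z}))}}{\|u^1(0)\|_{l^2(\mathbb{Z})}}. \]
Observe that, for any $t > 0$, the $l^r(\mathbb{Z})$-norm behaves as a constant:
\[ \|u^1(t)\|_{l^r(\mathbb{Z})} \simeq \left( \int_{\mathbb{R}} \operatorname{sech}^r(cx - dt) \, dx \right)^{1/r} = \left( \int_{\mathbb{R}} \operatorname{sech}^r(cx) \, dx \right)^{1/r}. \]
Thus, for all $T > 0$ and $h > 0$ the solution $u^1$ satisfies
\[ \|u^1\|_{L^q((0,T/h^2),\, l^r(\mathbb{Z}))} \simeq (Th^{-2})^{1/q}. \]
Consequently for any $r > 2$ the solution $u^h$ on the lattice $h\mathbb{Z}$ satisfies
\[ \frac{\|u^h\|_{L^q((0,T),\, l^r(h\mathbb{Z}))}}{\|u^h(0)\|_{l^2(h\mathbb{Z})}} \simeq h^{\frac{1}{r} - \frac{1}{2}} \to \infty, \quad h \to 0. \]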
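The claim that the $l^r(\mathbb{Z})$-norm of the travelling wave is essentially independent of time can be checked numerically. The sketch below sums $\operatorname{sech}^r(cn - s)$ over the lattice for several shifts $s = dt$; the values $c = 0.5$ and $r = 4$ are illustrative assumptions and are not the constants of [2].

```python
import math


def lr_norm_sech(shift, c, r, n_max=400):
    """l^r(Z) norm of n -> sech(c*n - shift), truncated to |n| <= n_max."""
    total = sum((1.0 / math.cosh(c * n - shift)) ** r
                for n in range(-n_max, n_max + 1))
    return total ** (1.0 / r)


c, r = 0.5, 4
norms = [lr_norm_sech(s, c, r) for s in (0.0, 0.3, 7.7, 25.0)]
# Continuous counterpart: the integral of sech^4(c x) over R equals 4 / (3 c).
reference = (4.0 / (3.0 * c)) ** (1.0 / r)
print(norms, reference)
```

With the norm frozen in time, the only $h$-dependence left in the quotient is the power $h^{1/r - 1/2}$, which is unbounded as $h \to 0$ exactly when $r > 2$.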
This example shows that, in order to deal with the nonlinear problem, the linear approximation scheme needs to be modified. In the following section we present a method that preserves the dispersion properties and that can be used successfully at the nonlinear level.
3. A two-grid algorithm. In this section we present a conservative scheme that preserves the dispersive properties discussed in the previous sections. In fact, the scheme we shall consider is the standard one (1.9). But, this time, in order to avoid the lack of dispersive properties associated with the high frequency components, the scheme (1.9) will be restricted to the class of filtered data obtained by a two-grid algorithm. The advantage of this filtering method with respect to the Fourier one is that the filtering can be realized in the physical space.
The method, inspired by [9], that extends to several space variables the one introduced in [11], is roughly as follows. We consider two meshes: the coarse one of size $4h$, $4h\mathbb{Z}^d$, and the finer one, the computational one $h\mathbb{Z}^d$, of size $h > 0$. The method relies basically on solving the finite difference semidiscretization (1.9) on the fine mesh $h\mathbb{Z}^d$, but only for slowly oscillating data, interpolated from the coarse grid $4h\mathbb{Z}^d$. As we shall see, the 1/4 ratio between the two meshes is important to guarantee the convergence of the method. This particular structure of the data cancels the two pathologies of the discrete symbol mentioned in section 2. Indeed, a careful Fourier analysis of those initial data shows that their discrete Fourier transform vanishes quadratically in each variable at the points $\xi = (\pm\pi/2h)^d$ and $\xi = (\pm\pi/h)^d$. As we shall see, this suffices to recover at the discrete level the dispersive properties of the continuous model.
Once the discrete version of the dispersive properties has been proved, we explain how this method can be applied to a semidiscretization of the NSE with nonlinearity $f(u) = |u|^p u$. 
To do this, the nonlinearity has to be approximated in such a way that the approximate discrete nonlinearities belong to the subspace of filtered data as well.
LIVIU I. IGNAT AND ENRIQUE ZUAZUA
Fig. 3. The action of the operator $\widetilde{\Pi}$ between the grids $4h\mathbb{Z}$ and $h\mathbb{Z}$ (interpolation weights 1/4, 2/4, 3/4).
3.1. The two-grid algorithm in the linear framework. To be more precise, we introduce the following space of slowly oscillating sequences. These are the sequences on the fine grid $h\mathbb{Z}^d$ that are obtained from the coarse grid $4h\mathbb{Z}^d$ by an interpolation process. Note that, by scaling, any function defined on the lattice $h\mathbb{Z}^d$ can be viewed as a function on the lattice $\mathbb{Z}^d$. Thus it suffices to define this space for $h = 1$. Let us consider the piecewise and continuous interpolator $P^1_1$ acting on the coarse grid $4\mathbb{Z}^d$. We define the extension operator $\widetilde{\Pi} : l^2(4\mathbb{Z}^d) \to l^2(\mathbb{Z}^d)$ (see Figure 3) by
$$(\widetilde{\Pi} f)_j = (P^1_1 f)_j, \qquad j \in \mathbb{Z}^d, \quad f : 4\mathbb{Z}^d \to \mathbb{C}. \tag{3.45}$$
We then define the space of slowly oscillating sequences, $\widetilde{\Pi}(4h\mathbb{Z}^d)$, as the image of the operator $\widetilde{\Pi}$ acting on functions defined on $4h\mathbb{Z}^d$. We will also make use of $\widetilde{\Pi}^* : l^2(h\mathbb{Z}^d) \to l^2(4h\mathbb{Z}^d)$, the adjoint of $\widetilde{\Pi}$, defined by
$$(\widetilde{\Pi} g_1^{4h}, g_2^{h})_{l^2(h\mathbb{Z}^d)} = (g_1^{4h}, \widetilde{\Pi}^* g_2^{h})_{l^2(4h\mathbb{Z}^d)} \qquad \forall\, g_1^{4h} \in l^2(4h\mathbb{Z}^d),\ g_2^{h} \in l^2(h\mathbb{Z}^d), \tag{3.46}$$
where $(\cdot,\cdot)_{l^2(h\mathbb{Z}^d)}$ and $(\cdot,\cdot)_{l^2(4h\mathbb{Z}^d)}$ are the inner products on $l^2(h\mathbb{Z}^d)$ and $l^2(4h\mathbb{Z}^d)$, respectively.

In the 1-d case, the explicit expressions of $\widetilde{\Pi}$ and $\widetilde{\Pi}^*$ are given by
$$(\widetilde{\Pi} g^{4h})_{4j+r} = \frac{4-r}{4}\, g^{4h}_{4j} + \frac{r}{4}\, g^{4h}_{4j+4}, \qquad j \in \mathbb{Z}, \quad r \in \{0, 1, 2, 3\},$$
and
$$(\widetilde{\Pi}^* g^{h})_{4j} = \sum_{r=0}^{3} \left( \frac{4-r}{4}\, g^{h}_{4j+r} + \frac{r}{4}\, g^{h}_{4j-4+r} \right), \qquad j \in \mathbb{Z}.$$
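The 1-d formulas for the extension operator and its adjoint can be sketched in a few lines. This is only an illustration under assumptions that are not in the text: the grids are taken periodic (so that the arrays are finite), the coarse sequence is stored by its index $j$ rather than $4j$, the inner products are plain unweighted dot products, and the names `extend` and `restrict` are hypothetical.

```python
import numpy as np

def extend(coarse):
    """Linear interpolation from a periodic coarse grid to a grid four times finer.

    fine[4j + r] = (4 - r)/4 * coarse[j] + r/4 * coarse[j + 1],  r = 0, 1, 2, 3.
    """
    coarse = np.asarray(coarse, dtype=float)
    nxt = np.roll(coarse, -1)              # nxt[j] = coarse[j + 1] (periodic)
    fine = np.empty(4 * coarse.size)
    for r in range(4):
        fine[r::4] = (4 - r) / 4 * coarse + r / 4 * nxt
    return fine

def restrict(fine):
    """Adjoint of `extend` for the unweighted dot product (assumption).

    out[j] = sum_r (4 - r)/4 * fine[4j + r] + r/4 * fine[4j - 4 + r].
    """
    fine = np.asarray(fine, dtype=float)
    out = np.zeros(fine.size // 4)
    for r in range(4):
        g_r = fine[r::4]                   # g_r[j] = fine[4j + r]
        out += (4 - r) / 4 * g_r + r / 4 * np.roll(g_r, 1)  # roll gives fine[4j - 4 + r]
    return out

rng = np.random.default_rng(0)
c = rng.standard_normal(8)    # coarse data
g = rng.standard_normal(32)   # fine data
# Adjoint identity in the spirit of (3.46): (extend(c), g) = (c, restrict(g)).
print(np.dot(extend(c), g), np.dot(c, restrict(g)))
```

The two printed numbers should agree up to rounding, which is a quick sanity check that the two explicit expressions are indeed adjoint to each other.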
As we will see, $S^h(t)$ has appropriate decay properties when it acts on the subspace $\widetilde{\Pi}(4h\mathbb{Z}^d)$, uniformly in $h > 0$. The main results concerning the gain of integrability are given in the following theorem.

Fig. 4. Log-log plot of the time evolution of the $l^\infty(\mathbb{Z})$-norm of $S^1(t)\widetilde{\Pi}\delta_0$, where $\delta_0$ is one in zero and vanishes otherwise (reference slopes $t^{-1/3}$ and $t^{-1/2}$).

Theorem 3.1. Let $p \ge 2$ and $(q, r)$, $(\tilde q, \tilde r)$ be two $d/2$-admissible pairs. The following hold:
(i) There exists a positive constant $C(d, p)$ such that
$$\|S^h(t)\widetilde{\Pi}\varphi^{4h}\|_{l^p(h\mathbb{Z}^d)} \le C(d,p)\,|t|^{-d\left(\frac{1}{2}-\frac{1}{p}\right)} \|\widetilde{\Pi}\varphi^{4h}\|_{l^{p'}(h\mathbb{Z}^d)} \tag{3.47}$$
for all $\varphi^{4h} \in l^{p'}(4h\mathbb{Z}^d)$, $h > 0$, and $t \ne 0$.
(ii) There exists a positive constant $C(d, r)$ such that
$$\|S^h(t)\widetilde{\Pi}\varphi^{4h}\|_{L^q(\mathbb{R},\, l^r(h\mathbb{Z}^d))} \le C(d,r)\,\|\widetilde{\Pi}\varphi^{4h}\|_{l^2(h\mathbb{Z}^d)} \tag{3.48}$$
for all $\varphi^{4h} \in l^2(4h\mathbb{Z}^d)$ and $h > 0$.
(iii) There exists a positive constant $C(d, r)$ such that
$$\left\| \int_{-\infty}^{\infty} S^h(s)^* \widetilde{\Pi} f^{4h}(s)\,ds \right\|_{l^2(h\mathbb{Z}^d)} \le C(d,r)\,\|\widetilde{\Pi} f^{4h}\|_{L^{q'}(\mathbb{R},\, l^{r'}(h\mathbb{Z}^d))} \tag{3.49}$$
for all $f^{4h} \in L^{q'}(\mathbb{R}, l^{r'}(4h\mathbb{Z}^d))$ and $h > 0$.
(iv) There exists a positive constant $C(d, r, \tilde r)$ such that
$$\left\| \int_0^t S^h(t-s)\widetilde{\Pi} f^{4h}(s)\,ds \right\|_{L^q(\mathbb{R},\, l^r(h\mathbb{Z}^d))} \le C(d,r,\tilde r)\,\|\widetilde{\Pi} f^{4h}\|_{L^{\tilde q'}(\mathbb{R},\, l^{\tilde r'}(h\mathbb{Z}^d))} \tag{3.50}$$
for all $f^{4h} \in L^{\tilde q'}(\mathbb{R}, l^{\tilde r'}(4h\mathbb{Z}^d))$ and $h > 0$.

Remark 3.1. In the particular case $p = \infty$, estimate (3.47) shows that the solution of (1.9) with initial data in $\widetilde{\Pi}(4h\mathbb{Z}^d)$ decays as $t^{-d/2}$ when $t$ becomes large, which agrees with the LSE. This can be seen in Figure 4, where the initial data has been chosen as $\widetilde{\Pi}\delta_0$ ($\delta_0$ being the discrete Dirac function defined on the coarse grid $4h\mathbb{Z}$). The solution behaves as $t^{-1/2}$, in contrast with the case presented in section 2, Figure 2, where the initial data was $\delta_0$ (the discrete Dirac function defined on the fine grid $h\mathbb{Z}$) and the decay was as $t^{-1/3}$.

The following lemma gives a Fourier characterization of the data that are obtained by this two-grid algorithm involving the meshes $4h\mathbb{Z}^d$ and $h\mathbb{Z}^d$. Its proof uses only the definition of the discrete Fourier transform and we omit it.

Lemma 3.1. Let $\psi^{4h} \in l^2(4h\mathbb{Z}^d)$. Then for all $\xi \in [-\pi/h, \pi/h]^d$
$$\widehat{\widetilde{\Pi}\psi^{4h}}(\xi) = 4^d\, \widehat{\Pi\psi^{4h}}(\xi) \prod_{k=1}^{d} \cos^2(\xi_k h)\cos^2\!\left(\frac{\xi_k h}{2}\right), \tag{3.51}$$
where $(\Pi\psi^{4h})_j = \psi^{4h}_j$ if $j \in 4\mathbb{Z}^d$ and vanishes elsewhere.

Remark 3.2. Observe that the right-hand side product in (3.51) vanishes (see the right of Figure 5 for the 1-d case) on the sets $M_h^1$ and $M_h^2$ defined in sections 2.1 and 2.2, respectively. This will allow us to recover the dispersive properties of the numerical scheme introduced in this section.

Remark 3.3. A simpler two-grid construction could be done by interpolating $2h\mathbb{Z}^d$ sequences. We would get for all $\psi^{2h} \in l^2(2h\mathbb{Z}^d)$ and $\xi \in [-\pi/h, \pi/h]^d$
$$\widehat{\widetilde{\Pi}\psi^{2h}}(\xi) = 2^d\, \widehat{\Pi\psi^{2h}}(\xi) \prod_{k=1}^{d} \cos^2\!\left(\frac{\xi_k h}{2}\right),$$
where $(\Pi\psi^{2h})_j = \psi^{2h}_j$ if $j \in 2\mathbb{Z}^d$ and vanishes elsewhere. In the 1-d case the multiplier introduced by this method is plotted in the left of Figure 5. This procedure would cancel the spurious numerical solutions at the frequencies $M_h^2$ but not at $M_h^1$. In this case, as we proved in section 2, the Strichartz estimates would fail to be uniform in $h$. Thus we rather choose $1/4$ as the ratio between the grids for the two-grid algorithm. We also point out that 4 is the smallest quotient of the grids for which the decay $l^1(h\mathbb{Z}^d) - l^\infty(h\mathbb{Z}^d)$ holds uniformly in the mesh parameter.

Fig. 5. Multiplicative factors introduced by the two-grid algorithm in dimension one in the case of mesh ratio 1/2 (left) and 1/4 (right), plotted for $\xi \in [-\pi/h, \pi/h]$.

Proof of Theorem 3.1. Let us define the weighted operators $A^h_\beta(t) : l^2(h\mathbb{Z}^d) \to l^2(h\mathbb{Z}^d)$ by
$$\widehat{(A^h_\beta(t)\psi^h)}(\xi) = e^{-itp_h(\xi)}\, |g(\xi h)|^{\beta}\, \widehat{\psi^h}(\xi), \qquad \xi \in [-\pi/h, \pi/h]^d, \tag{3.52}$$
where
$$g(\xi) = \prod_{k=1}^{d} \cos(\xi_k)\cos\!\left(\frac{\xi_k}{2}\right).$$
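In one dimension with $h = 1$, the factor $4\cos^2(\xi)\cos^2(\xi/2)$ of (3.51), that is, $4\,g(\xi)^2$ with $g$ as above, can be checked numerically with a discrete Fourier transform. The sketch below assumes a periodic lattice and uses NumPy's FFT as a stand-in for the discrete Fourier transform of the paper; the helper name `extend` is hypothetical.

```python
import numpy as np

def extend(coarse):
    """Periodic linear interpolation to a grid four times finer (1-d, h = 1)."""
    coarse = np.asarray(coarse, dtype=float)
    nxt = np.roll(coarse, -1)
    fine = np.empty(4 * coarse.size)
    for r in range(4):
        fine[r::4] = (4 - r) / 4 * coarse + r / 4 * nxt
    return fine

m = 16                                   # number of coarse nodes
rng = np.random.default_rng(1)
coarse = rng.standard_normal(m)

# Zero-stuffed sequence: equal to the coarse data on 4Z and zero elsewhere.
stuffed = np.zeros(4 * m)
stuffed[::4] = coarse

xi = 2 * np.pi * np.fft.fftfreq(4 * m)   # discrete frequencies in [-pi, pi)
multiplier = 4 * np.cos(xi) ** 2 * np.cos(xi / 2) ** 2

lhs = np.fft.fft(extend(coarse))         # transform of the interpolated data
rhs = multiplier * np.fft.fft(stuffed)   # multiplier times transform of stuffed data
print(np.max(np.abs(lhs - rhs)))         # should be at rounding level
```

The multiplier vanishes at $\xi = \pm\pi/2$ and $\xi = \pm\pi$, which is the cancellation the two-grid filtering is designed to produce.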
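We will prove that for any $\beta \ge 1/4$, $A^h_\beta(t)$ satisfies the hypotheses of Theorem 2.6. Then, according to Lemma 3.1, observing that $S^h(t)\widetilde{\Pi}\varphi^{4h} = 4^d A^h_2(t)\Pi\varphi^{4h}$, we obtain (3.48), (3.49), and (3.50). It is easy to see that $\|A^h_\beta(t)\psi^h\|_{l^2(h\mathbb{Z}^d)} \le \|\psi^h\|_{l^2(h\mathbb{Z}^d)}$. According to this, it remains to prove that for any $\beta \ge 1/4$ and $t \ne s$ the following holds:
$$\|A^h_\beta(t)A^h_\beta(s)^*\psi^h\|_{l^\infty(h\mathbb{Z}^d)} \le c(\beta,d)\,|t-s|^{-d/2}\, \|\psi^h\|_{l^1(h\mathbb{Z}^d)}. \tag{3.53}$$
A scaling argument reduces the proof to the case $h = 1$. We claim that (3.53) holds once
$$\|A^1_\gamma(t)\psi^1\|_{l^\infty(\mathbb{Z}^d)} \le c(\gamma,d)\,|t|^{-d/2}\, \|\psi^1\|_{l^1(\mathbb{Z}^d)} \tag{3.54}$$
is satisfied for all $\gamma \ge 1/2$. Indeed, using that the operator $A^1_\alpha(t)$ satisfies $A^1_\alpha(t)^* = A^1_\alpha(-t)$ we obtain
$$\|A^1_\beta(t)A^1_\beta(s)^*\psi^1\|_{l^\infty(\mathbb{Z}^d)} = \|A^1_\beta(t)A^1_\beta(-s)\psi^1\|_{l^\infty(\mathbb{Z}^d)} = \|A^1_{2\beta}(t-s)\psi^1\|_{l^\infty(\mathbb{Z}^d)} \lesssim |t-s|^{-d/2}\,\|\psi^1\|_{l^1(\mathbb{Z}^d)}$$
for all $t \ne s$ and $\psi^1 \in l^1(\mathbb{Z}^d)$.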
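The difference between the decay rates discussed in Remark 3.1 can be explored numerically. The following is a rough sketch, not the authors' experiment: it assumes that scheme (1.9) in one dimension with $h = 1$ is $i u_t + \Delta_1 u = 0$ with the three-point Laplacian, so that the symbol is $p_1(\xi) = 4\sin^2(\xi/2)$, and it uses a periodic lattice large enough that nothing wraps around for the times considered. All function names are hypothetical.

```python
import numpy as np

def evolve(u0, t):
    """Exact Fourier-space solution of i u_t + Delta_1 u = 0 on a periodic lattice."""
    xi = 2.0 * np.pi * np.fft.fftfreq(u0.size)
    symbol = 4.0 * np.sin(xi / 2.0) ** 2          # p_1(xi), assumed form of the symbol
    return np.fft.ifft(np.exp(-1j * t * symbol) * np.fft.fft(u0))

def fine_delta(n):
    """Discrete Dirac mass on the fine grid."""
    u0 = np.zeros(n)
    u0[0] = 1.0
    return u0

def two_grid_delta(n):
    """Coarse-grid Dirac mass, linearly interpolated onto the fine grid."""
    u0 = np.zeros(n)
    for r in range(-3, 4):
        u0[r % n] = (4 - abs(r)) / 4.0
    return u0

def sup_norms(u0, times):
    return np.array([np.max(np.abs(evolve(u0, t))) for t in times])

n = 4096
times = np.array([50.0, 100.0, 200.0, 400.0])
plain = sup_norms(fine_delta(n), times)
filtered = sup_norms(two_grid_delta(n), times)

def fitted_slope(values):
    """Least-squares slope of log(values) against log(times)."""
    return np.polyfit(np.log(times), np.log(values), 1)[0]

# Fitted exponents, to be compared with -1/3 (unfiltered) and -1/2 (two-grid data).
print(fitted_slope(plain), fitted_slope(filtered))
```

The fitted exponents are only indicative, since the time window is short and the sup norm oscillates; they are meant to be compared with the reference slopes of Figure 4, not to reproduce it.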
NUMERICAL DISPERSIVE SCHEMES FOR NSE
In the following we prove (3.54). We write $A^1_\gamma(t)$ as a convolution $A^1_\gamma(t)\psi^1 = K^t_{d,\gamma} * \psi^1$, where $\widehat{K^t_{d,\gamma}}(\xi) = e^{-itp_1(\xi)}|g(\xi)|^\gamma$. By Young's inequality it is sufficient to prove that for any $\gamma \ge 1/2$ and $t \ne 0$ the following holds:
$$\|K^t_{d,\gamma}\|_{l^\infty(\mathbb{Z}^d)} \le c(\gamma,d)\,|t|^{-d/2}. \tag{3.55}$$
We observe that $\widehat{K^t_{d,\gamma}}$ can be written by separation of variables as
$$\widehat{K^t_{d,\gamma}}(\xi) = \prod_{k=1}^{d} e^{-4it\sin^2(\xi_k/2)} \left|\cos(\xi_k)\cos\!\left(\frac{\xi_k}{2}\right)\right|^{\gamma} = \prod_{j=1}^{d} \widehat{K^t_{1,\gamma}}(\xi_j).$$
It remains to prove that (3.55) holds in one space dimension. We make use of the following lemma.

Lemma 3.2 (see [15, Corollary 2.9]). Let $(a, b) \subset \mathbb{R}$ and $\psi \in C^3(a, b)$ be such that $\psi''$ changes monotonicity at finitely many points in the interval $(a, b)$. Then
$$\left| \int_a^b e^{i(t\psi(\xi) - x\xi)}\, |\psi''(\xi)|^{1/2}\, \phi(\xi)\,d\xi \right| \le c_\psi\, |t|^{-1/2} \left( \|\phi\|_{L^\infty(a,b)} + \int_a^b |\phi'(\xi)|\,d\xi \right)$$
holds for all real numbers $x$ and $t$.

Applying the above lemma with $\phi(\xi) = |\cos\xi|^{\gamma-1/2}|\cos(\xi/2)|^{\gamma}$, $\gamma \ge 1/2$, and $\psi(\xi) = -4\sin^2(\xi/2)$, for which $|\psi''(\xi)|^{1/2} = \sqrt{2}\,|\cos\xi|^{1/2}$, we obtain (3.55) for $d = 1$, which finishes the proof.

3.2. A conservative approximation of the NSE. We now build a convergent numerical scheme for the semilinear NSE in $\mathbb{R}^d$:
$$\begin{cases} i u_t + \Delta u = |u|^p u, & t \ne 0, \\ u(0, x) = \varphi(x), & x \in \mathbb{R}^d. \end{cases} \tag{3.56}$$
Our analysis applies for the nonlinearity $f(u) = -|u|^p u$ as well. In fact, the key point for the proof of the global existence of the solutions is that the $L^2$-scalar product $(f(u), u)$ is a real number. All the results extend to more general nonlinearities $f(u)$ satisfying this condition under natural growth assumptions for $L^2$-solutions (see [3, Chap. 4.6, p. 109]).

The first existence and uniqueness result for (3.56) with $L^2(\mathbb{R}^d)$-initial data is as follows.

Theorem 3.2 (global existence in $L^2(\mathbb{R}^d)$; see Tsutsumi [30]). For $0 \le p < 4/d$ and $\varphi \in L^2(\mathbb{R}^d)$, there exists a unique solution $u$ in $C(\mathbb{R}, L^2(\mathbb{R}^d)) \cap L^q_{loc}(\mathbb{R}, L^{p+2}(\mathbb{R}^d))$ with $q = 4(p + 2)/pd$ that satisfies the $L^2$-norm conservation property and depends continuously on the initial condition in $L^2(\mathbb{R}^d)$.

The proof uses standard arguments, the key ingredient being to work in the space $C(\mathbb{R}, L^2(\mathbb{R}^d)) \cap L^q_{loc}(\mathbb{R}, L^{p+2}(\mathbb{R}^d))$. This can only be done using Strichartz estimates. Local existence is proved by applying a fixed point argument to the integral formulation of (3.56) in that space. Global existence holds because of the $L^2(\mathbb{R}^d)$-conservation property, which excludes finite-time blow-up.
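The time exponent $q$ that accompanies $r = p + 2$ can be recomputed from the admissibility relation. The sketch below assumes the usual convention that a pair $(q, r)$ is $d/2$-admissible when $1/q = (d/2)(1/2 - 1/r)$; this definition is not restated in this section, so it is an assumption here, and the function name is hypothetical.

```python
from fractions import Fraction

def strichartz_q(p, d):
    """Return q such that (q, p + 2) satisfies 1/q = (d/2) * (1/2 - 1/(p + 2)).

    Assumes p > 0 so that the right-hand side is nonzero.
    """
    r = Fraction(p) + 2
    inv_q = Fraction(d, 2) * (Fraction(1, 2) - 1 / r)
    return 1 / inv_q

# Compare with the closed form q = 4(p + 2)/(d p) for a few subcritical cases.
for d, p in [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1)]:
    print(d, p, strichartz_q(p, d), Fraction(4 * (p + 2), d * p))
```

Exact rational arithmetic is used so that the comparison with $4(p+2)/(dp)$ does not depend on rounding.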
In order to introduce a numerical approximation of (3.56) it is convenient to give the definition of the weak solution of (3.56).

Definition 3.1. We say that $u$ is a weak solution of (3.56) if the following hold:
(i) $u \in C(\mathbb{R}, L^2(\mathbb{R}^d)) \cap L^q_{loc}(\mathbb{R}, L^{p+2}(\mathbb{R}^d))$.
(ii) $u(0) = \varphi$ a.e. and
$$\int_{\mathbb{R}}\int_{\mathbb{R}^d} u\,(-i\psi_t + \Delta\psi)\,dx\,dt = \int_{\mathbb{R}}\int_{\mathbb{R}^d} |u|^p u\,\psi\,dx\,dt \tag{3.57}$$
for all $\psi \in \mathcal{D}(\mathbb{R}, H^2(\mathbb{R}^d))$, where $p$ and $q$ are as in the statement of Theorem 3.2.

In this section we consider the following numerical approximation scheme for (3.56):
$$i\,\frac{du^h}{dt} + \Delta_h u^h = \widetilde{\Pi} f(\widetilde{\Pi}^* u^h), \quad t \in \mathbb{R}; \qquad u^h(0) = \widetilde{\Pi}\varphi^{4h}, \tag{3.58}$$
with $f(u) = |u|^p u$. In order to prove the global existence of solutions of (3.58), we will need to guarantee the conservation of the $l^2(h\mathbb{Z}^d)$-norm of solutions, a property that the solutions of the NSE satisfy. The choice $\widetilde{\Pi} f(\widetilde{\Pi}^* u^h)$ as an approximation of the nonlinear term $f(u)$ is motivated by the fact that
$$(\widetilde{\Pi} f(\widetilde{\Pi}^* u^h), u^h)_{l^2(h\mathbb{Z}^d)} = (f(\widetilde{\Pi}^* u^h), \widetilde{\Pi}^* u^h)_{l^2(4h\mathbb{Z}^d)} \in \mathbb{R}, \tag{3.59}$$
which, as mentioned above, guarantees the conservation of the $l^2(h\mathbb{Z}^d)$-norm. The following holds.

Theorem 3.3. Let $p \in (0, 4/d)$ and $q = 4(p + 2)/dp$. Then for all $h > 0$ and for every $\varphi^{4h} \in l^2(4h\mathbb{Z}^d)$, there exists a unique global solution $u^h \in C(\mathbb{R}, l^2(h\mathbb{Z}^d)) \cap L^q_{loc}(\mathbb{R}, l^{p+2}(h\mathbb{Z}^d))$ of (3.58). Moreover, $u^h$ satisfies
$$\|u^h\|_{L^\infty(\mathbb{R},\, l^2(h\mathbb{Z}^d))} \le \|\widetilde{\Pi}\varphi^{4h}\|_{l^2(h\mathbb{Z}^d)} \tag{3.60}$$
and, for every finite interval $I$,
$$\|u^h\|_{L^q(I,\, l^{p+2}(h\mathbb{Z}^d))} \le c(I)\,\|\widetilde{\Pi}\varphi^{4h}\|_{l^2(h\mathbb{Z}^d)}, \tag{3.61}$$
where the above constants are independent of $h$.

Proof of Theorem 3.3. The local existence and uniqueness can be proved, as in the continuous case, by a combination of the Strichartz-like estimates in Theorem 3.1 and of a fixed point argument in the space $L^\infty((-T, T), l^2(h\mathbb{Z}^d)) \cap L^q((-T, T), l^{p+2}(h\mathbb{Z}^d))$, $T$ being chosen small enough, depending on the initial data, but independent of $h$. Identity (3.59) guarantees the conservation of the $l^2$-norm of the solutions and, consequently, the lack of blow-up and the global existence of the solutions.

3.3. Convergence of the method. In what follows we use the piecewise constant interpolator $P^h_0$. Given the initial datum $\varphi \in L^2(\mathbb{R}^d)$ for the PDE, we choose the approximating discrete data $(\varphi^{4h}_j)_{j \in \mathbb{Z}^d}$ such that $P^h_0\widetilde{\Pi}\varphi^{4h}$ converges strongly to $\varphi$ in $L^2(\mathbb{R}^d)$. Thus, in particular, $\|P^h_0\widetilde{\Pi}\varphi^{4h}\|_{L^2(\mathbb{R}^d)} \le C(\|\varphi\|_{L^2(\mathbb{R}^d)})$. The main convergence result is the following.

Theorem 3.4. Let $p$ and $q$ be as in Theorem 3.3 and $u^h$ be the unique solution of (3.58) for the approximate initial data $\widetilde{\Pi}\varphi^{4h}$ as above. Then the sequence $P^h_0 u^h$ satisfies
$$P^h_0 u^h \overset{*}{\rightharpoonup} u \ \text{ in } L^\infty(\mathbb{R}, L^2(\mathbb{R}^d)), \qquad P^h_0 u^h \rightharpoonup u \ \text{ in } L^q_{loc}(\mathbb{R}, L^{p+2}(\mathbb{R}^d)), \tag{3.62}$$
$$P^h_0 u^h \to u \ \text{ in } L^2_{loc}(\mathbb{R}^{d+1}), \qquad P^h_0\widetilde{\Pi} f(\widetilde{\Pi}^* u^h) \rightharpoonup |u|^p u \ \text{ in } L^{q'}_{loc}(\mathbb{R}, L^{(p+2)'}(\mathbb{R}^d)), \tag{3.63}$$
where $u$ is the unique solution of the NSE.

First, we sketch the main ideas of the proof. The main difficulty in the proof of Theorem 3.4 is the strong convergence $P^h_0 u^h \to u$ in $L^2_{loc}(\mathbb{R}^{d+1})$, which is needed to pass to the limit in the nonlinear term. Once it is obtained, the second convergence in (3.63) easily follows. Another technical difficulty comes from the fact that the interpolator $P^h_0$ is not compactly supported in the Fourier space. Thus we instead consider the band-limited interpolator $P^h_*$ introduced in (2.34) and prove the compactness for $P^h_* u^h$. Once this is obtained, the $L^2$-strong convergence of $P^h_* u^h$ is transferred to $P^h_0 u^h$. This is a consequence of the following property of both interpolators (cf. [22, Thm. 3.4.2, p. 90]):
$$\|P^h_0 u^h(t) - P^h_* u^h(t)\|_{L^2(\Omega)} \le h\,\|P^h_* u^h(t)\|_{H^1(\Omega)}, \tag{3.64}$$
which holds for all real $t$ and $\Omega \subset \mathbb{R}^d$.

To prove the $L^2$-strong convergence of $P^h_* u^h$ we will show that it is uniformly bounded in $L^2_{loc}(\mathbb{R}, H^{1/2}_{loc}(\mathbb{R}^d))$. We shall also obtain estimates in $L^2_{loc}(\mathbb{R}, H^{1}_{loc}(\mathbb{R}^d))$ which are not uniform in $h$ but, according to (3.64), suffice to ensure that $P^h_0 u^h - P^h_* u^h$ strongly converges to zero in $L^2_{loc}(\mathbb{R}^{d+1})$. The following lemma provides local estimates for $P^h_* u^h$ in the $H^s$-norm.

Lemma 3.3. Let $s \ge 1/2$, let $I \subset \mathbb{R}$ be a bounded interval, and let $\chi \in C_c^\infty(\mathbb{R}^d)$. Then there is a constant $C(I, \chi)$, independent of $h$, such that
$$\|\chi P^h_*(S^h(t)\widetilde{\Pi}\varphi^{4h})\|_{L^2(I,\, H^s(\mathbb{R}^d))} \le \frac{C(I,\chi)}{h^{s-1/2}}\, \|\widetilde{\Pi}\varphi^{4h}\|_{l^2(h\mathbb{Z}^d)} \tag{3.65}$$
holds for all functions $\varphi^{4h} \in l^2(4h\mathbb{Z}^d)$ and $h > 0$. Moreover, for any $d/2$-admissible pair $(q, r)$
$$\left\| \chi P^h_* \int_0^t S^h(t-\tau)\widetilde{\Pi} f^{4h}(\tau)\,d\tau \right\|_{L^2(I,\, H^s(\mathbb{R}^d))} \le \frac{C(I,\chi)}{h^{s-1/2}}\, \|\widetilde{\Pi} f^{4h}\|_{L^{q'}(I,\, l^{r'}(h\mathbb{Z}^d))} \tag{3.66}$$
for all $f^{4h} \in L^{q'}(I, l^{r'}(4h\mathbb{Z}^d))$ and $h > 0$.

Proof. We divide the proof into two steps. The first one concerns the homogeneous estimate (3.65) and the second one (3.66).

Step 1. Regularity of the homogeneous term. To prove (3.65) it is sufficient to prove, for any $R > 0$, the existence of a positive constant $C(I, R)$ such that
$$\int_I\int_{|x|<R} |(-\Delta)^{s/2} P^h_*(S^h(t)\widetilde{\Pi}\varphi^{4h})|^2\,dx\,dt \le \frac{C(I,R)}{h^{2s-1}} \int_{[-\pi/h,\pi/h]^d} |\widehat{\varphi^{4h}}(\xi)|^2\,d\xi.$$
[...]
$$\int_I\int_{|x|<R} |(-\Delta)^{s/2} P^h_*(S^h(t)\psi^h)|^2\,dx\,dt \le \frac{C(I,R)}{h^{2s-1}} \int_{[-\pi/h,\pi/h]^d} \frac{\big(\sum_{j=1}^d \xi_j^2\big)^{1/2}\, |\widehat{\psi^h}(\xi)|^2}{\big(\sum_{j=1}^d \sin^2(\xi_j h)/h^2\big)^{1/2}}\,d\xi \le \frac{C(I,R)}{h^{2s-1}} \int_{[-\pi/h,\pi/h]^d} \frac{|\widehat{\psi^h}(\xi)|^2\,d\xi}{\prod_{j=1}^d |\cos(\xi_j h/2)|}, \tag{3.67}$$
provided that all terms make sense. Note that this estimate holds for all $\psi^h \in l^2(h\mathbb{Z}^d)$. Observe, however, that the term in the denominator in the right-hand side integral may vanish for the high frequencies $\xi = (\pm\pi/h)^d$. In order to compensate for this fact we consider initial data in the class of slowly oscillating sequences $\widetilde{\Pi}(4h\mathbb{Z}^d)$. Now, we apply the last estimates to $\psi^h = \widetilde{\Pi}\varphi^{4h}$. Thus
$$\int_I\int_{|x|<R} |(-\Delta)^{s/2} P^h_*(S^h(t)\widetilde{\Pi}\varphi^{4h})|^2\,dx\,dt \le \frac{C(I,R)}{h^{2s-1}} \int_{[-\pi/h,\pi/h]^d} \frac{|\widehat{\widetilde{\Pi}\varphi^{4h}}(\xi)|^2\,d\xi}{\prod_{j=1}^d |\cos(\xi_j h/2)|}$$
$$\le \frac{C(I,R)}{h^{2s-1}} \int_{[-\pi/h,\pi/h]^d} |\widehat{\varphi^{4h}}(\xi)|^2 \prod_{j=1}^d |\cos(\xi_j h/2)|^3\,d\xi \le \frac{C(I,R)}{h^{2s-1}}\, \|\widetilde{\Pi}\varphi^{4h}\|^2_{l^2(h\mathbb{Z}^d)}.$$
Step 2. Regularity of the inhomogeneous term. In the following we prove (3.66). This estimate will be reduced to the homogeneous one (3.65) by using the argument of Christ and Kiselev [4]. A simplified version, useful in PDE applications, is given in [24].

Lemma 3.4. Let $X$ and $Y$ be Banach spaces and assume that $K(t, s)$ is a continuous function taking its values in $B(X, Y)$, the space of bounded linear mappings from $X$ to $Y$. Suppose that $-\infty \le a < b \le \infty$ and set
$$T f(t) = \int_a^b K(t,s) f(s)\,ds, \qquad W f(t) = \int_a^t K(t,s) f(s)\,ds.$$
Assume that $1 \le p < q \le \infty$ and $\|T f\|_{L^q([a,b],\,Y)} \lesssim \|f\|_{L^p([a,b],\,X)}$. Then $\|W f\|_{L^q([a,b],\,Y)} \lesssim \|f\|_{L^p([a,b],\,X)}$.

Without loss of generality we can consider $I = [0, T]$. In view of the above lemma it is sufficient to prove that the operator
$$T f^{4h}(t) = \chi P^h_* \int_0^T S^h(t-\tau)\widetilde{\Pi} f^{4h}(\tau)\,d\tau$$
satisfies
$$\|T f^{4h}\|_{L^2([0,T],\, H^s(\mathbb{R}^d))} \le \frac{C(T,\chi)}{h^{s-1/2}}\, \|\widetilde{\Pi} f^{4h}\|_{L^{q'}([0,T],\, l^{r'}(h\mathbb{Z}^d))}.$$
We write $T f^{4h}$ as $T f^{4h}(t) = \chi P^h_* S^h(t)\,T_1 f^{4h}$, where
$$T_1 f^{4h} = \int_0^T S^h(s)^*\, \widetilde{\Pi} f^{4h}(s)\,ds.$$
Estimate (3.67) yields
$$\|T f^{4h}\|_{L^2([0,T],\, H^s(\mathbb{R}^d))} \le \frac{C(I,\chi)}{h^{s-1/2}} \left\| \frac{\widehat{T_1 f^{4h}}(\xi)}{\prod_{j=1}^d |\cos(\xi_j h/2)|^{1/2}} \right\|_{L^2([-\pi/h,\pi/h]^d)} \le \frac{C(I,\chi)}{h^{s-1/2}} \left\| \frac{\widehat{T_1 f^{4h}}(\xi)}{\prod_{j=1}^d |\cos(\xi_j h/2)|^{1/2}\,|\cos(\xi_j h)|^{1/2}} \right\|_{L^2([-\pi/h,\pi/h]^d)},$$
provided that all the above integrals are finite. Explicit computations on $T_1 f^{4h}$ show that
$$\frac{\widehat{T_1 f^{4h}}(\xi)}{\prod_{j=1}^d |\cos(\xi_j h/2)|^{1/2}\,|\cos(\xi_j h)|^{1/2}} = 4^d \int_0^T e^{isp_h(\xi)} \prod_{j=1}^d \left|\cos\frac{\xi_j h}{2}\right|^{3/2} |\cos(\xi_j h)|^{3/2}\; \widehat{\Pi f^{4h}}(\xi, s)\,ds = 4^d \left( \int_0^T (A^h_{3/2}(s))^*\, \Pi f^{4h}(s)\,ds \right)^{\!\wedge}(\xi),$$
where the operator $A^h_{3/2}$ is defined in (3.52). Applying Theorem 2.6 to the operator $A^h_{3/2}$ we obtain, by estimate (2.38), that
$$\left\| \int_0^T (A^h_{3/2}(s))^*\, \Pi f^{4h}(s)\,ds \right\|_{l^2(h\mathbb{Z}^d)} \lesssim \|\Pi f^{4h}\|_{L^{q'}([0,T],\, l^{r'}(h\mathbb{Z}^d))} \lesssim \|\widetilde{\Pi} f^{4h}\|_{L^{q'}([0,T],\, l^{r'}(h\mathbb{Z}^d))}.$$
The proof is now complete.

Proof of Theorem 3.4. Using (3.60) we obtain that $P^h_0 u^h$ is uniformly bounded in $L^\infty(\mathbb{R}, L^2(\mathbb{R}^d))$. This guarantees the existence of a function $u \in L^\infty(\mathbb{R}, L^2(\mathbb{R}^d))$ such that, up to a subsequence, $P^h_0 u^h \overset{*}{\rightharpoonup} u$ in $L^\infty(\mathbb{R}, L^2(\mathbb{R}^d))$. By (3.61) we obtain that $u \in L^q(I, L^{p+2}(\mathbb{R}^d))$ and, up to a subsequence, $P^h_0 u^h \rightharpoonup u$ in $L^q(I, L^{p+2}(\mathbb{R}^d))$.

In the following we prove the strong convergence of $P^h_0 u^h$. First, we prove that $P^h_0 u^h - P^h_* u^h \to 0$ in $L^2_{loc}(\mathbb{R} \times \mathbb{R}^d)$. Second, we prove the compactness of $P^h_* u^h$. Finally, we obtain that $P^h_0 u^h \to u$ in $L^2_{loc}(\mathbb{R} \times \mathbb{R}^d)$.

For any $\Omega \subset \mathbb{R}^d$, classical properties of the interpolator $P^h_0$ (see [22, Thm. 3.4.2, p. 90]) give us
$$\int_\Omega |P^h_0 u^h - P^h_* u^h|^2\,dx \le h^2\, \|P^h_* u^h\|^2_{H^1(\Omega)}.$$
Applying Lemma 3.3 with $s = 1$ we obtain, for any $\chi \in C_c^\infty(\mathbb{R}^d)$,
$$\int_I\int_{\mathbb{R}^d} \chi^2\, |P^h_0 u^h - P^h_* u^h|^2\,dx\,dt \le h^2 \int_I\int_{\mathbb{R}^d} \chi^2\, |(I - \Delta)^{1/2} P^h_* u^h|^2\,dx\,dt \le h\,C\big(I, \|\widetilde{\Pi}\varphi^{4h}\|^2_{l^2(h\mathbb{Z}^d)}\big) \to 0, \qquad h \to 0.$$
This shows that $P^h_0 u^h - P^h_* u^h \to 0$ in $L^2_{loc}(\mathbb{R} \times \mathbb{R}^d)$. Using Lemma 3.3 with $s = 1/2$ we obtain that for any smooth function $\chi$, $P^h_* u^h$ satisfies
$$\|\chi P^h_* u^h\|_{L^2(I,\, H^{1/2}(\mathbb{R}^d))} \le C\big(I, \chi, \|\widetilde{\Pi}\varphi^{4h}\|_{l^2(h\mathbb{Z}^d)}\big).$$
We can also prove the following uniform boundedness property of its time derivative:
$$\left\| \frac{dP^h_* u^h}{dt} \right\|_{L^1(I,\, H^{-2}(\mathbb{R}^d))} \le \|\Delta_h P^h_* u^h\|_{L^1(I,\, H^{-2}(\mathbb{R}^d))} + \|P^h_*(|u^h|^p u^h)\|_{L^1(I,\, H^{-2}(\mathbb{R}^d))}$$
$$\le \|P^h_* u^h\|_{L^1(I,\, L^2(\mathbb{R}^d))} + \|P^h_*(|u^h|^p u^h)\|_{L^1(I,\, L^{(p+2)'}(\mathbb{R}^d))} \le C\big(I, \|\varphi\|_{L^2(\mathbb{R}^d)}\big).$$
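Using the embeddings $H^s(\Omega) \hookrightarrow_{comp} L^2(\Omega) \hookrightarrow H^{-2}(\Omega)$, $\Omega \subset \mathbb{R}^d$ being a bounded domain, and the compactness results of [23], we obtain the existence of a function $v$ such that, up to subsequences, $P^h_* u^h \to v$ in $L^2_{loc}(\mathbb{R} \times \mathbb{R}^d)$. Using the strong convergence of $P^h_* u^h$ towards $v$ we obtain that $v = u$ and $P^h_0 u^h \to u$ in $L^2_{loc}(\mathbb{R} \times \mathbb{R}^d)$.

Let $\Gamma \subset \mathbb{Z}^d$ be a finite set. Then for any $s \in \Gamma$ we have $P^h_0 u^h(\cdot + sh) \to u$ in $L^2_{loc}(\mathbb{R} \times \mathbb{R}^d)$ and $P^h_0 u^h(\cdot + sh) \to u$ a.e. in $\mathbb{R} \times \mathbb{R}^d$. The operators $\widetilde{\Pi}$ and $\widetilde{\Pi}^*$ involve only a finite number of translations. Then $P^h_0\widetilde{\Pi} f(\widetilde{\Pi}^* u^h) \to |u|^p u$ a.e. in $\mathbb{R} \times \mathbb{R}^d$ and $P^h_0\widetilde{\Pi} f(\widetilde{\Pi}^* u^h) \rightharpoonup |u|^p u$ in $L^{q'}(I, L^{(p+2)'}(\mathbb{R}^d))$.

Multiplying (3.58) by a function $\psi \in C_c^\infty(\mathbb{R}^{d+1})$, $P^h_0 u^h$ satisfies
$$\int_{\mathbb{R}}\int_{\mathbb{R}^d} P^h_0 u^h\,(-i\psi_t + \Delta_h\psi)\,dx\,dt = \int_{\mathbb{R}}\int_{\mathbb{R}^d} P^h_0\widetilde{\Pi} f(\widetilde{\Pi}^* u^h)\,\psi\,dx\,dt. \tag{3.68}$$
All the above weak convergences of $P^h_0 u^h$ and (3.68) show that $u$ satisfies (3.57). It remains to prove that $u \in C(\mathbb{R}, L^2(\mathbb{R}^d))$ and $u(0) = \varphi$. To prove that $u \in C(\mathbb{R}, L^2(\mathbb{R}^d))$ we show its continuity at $t = 0$; the same argument works at any time $t$. For any $0 \le t \le T < 1$, the Strichartz estimates in Theorem 3.1 and the Hölder inequality in the time variable applied to the variation of constants formula give us
$$\|u^h(t) - S^h(t)\widetilde{\Pi}\varphi^{4h}\|_{l^2(h\mathbb{Z}^d)} \le \left\| \int_0^t S^h(t-s)\widetilde{\Pi} f(\widetilde{\Pi}^* u^h)\,ds \right\|_{L^\infty([0,T],\, l^2(h\mathbb{Z}^d))} \lesssim \big\| |u^h|^p u^h \big\|_{L^{q'}([0,T],\, l^{(p+2)'}(h\mathbb{Z}^d))}$$
$$\le T^{(q-(p+2))/q}\, \|u^h\|^{p+1}_{L^q([0,T],\, l^{p+2}(h\mathbb{Z}^d))} \lesssim T^{1-pd/4}\, C\big(\|\varphi\|_{L^2(\mathbb{R}^d)}\big).$$
Using that $P^h_0 u^h \overset{*}{\rightharpoonup} u$ and $P^h_0 S^h(\cdot)\varphi^h \overset{*}{\rightharpoonup} S(\cdot)\varphi$ in $L^\infty([0, T], L^2(\mathbb{R}^d))$ we get
$$\|u(t) - S(t)\varphi\|_{L^2(\mathbb{R}^d)} \le \liminf_{h \to 0} \|P^h_0 u^h(\cdot) - P^h_0 S^h(\cdot)\widetilde{\Pi}\varphi^{4h}\|_{L^\infty([0,T],\, L^2(\mathbb{R}^d))} \lesssim T^{1-pd/4}\, C\big(\|\varphi\|_{L^2(\mathbb{R}^d)}\big).$$
This proves that the solution $u$ obtained as the limit of $P^h_0 u^h$ satisfies $u(t) \to \varphi$ in $L^2(\mathbb{R}^d)$ as $t \to 0$. The uniqueness of the limit, a solution of the NSE (3.56), allows us to deduce that the whole sequence $P^h_0 u^h$ converges without extracting subsequences. The proof of Theorem 3.4 is now complete.

3.4. The critical case $p = 4/d$. Our method works similarly in the critical case $p = 4/d$ for small initial data. More precisely, the following holds.

Theorem 3.5. There exists a constant $\varepsilon$, independent of $h$, such that for all initial data $\varphi^h \in \widetilde{\Pi}(4h\mathbb{Z}^d)$ with $\|\varphi^h\|_{l^2(h\mathbb{Z}^d)} < \varepsilon$, the semidiscrete critical equation (3.58) with $p = 4/d$ has a unique global solution $u^h \in C(\mathbb{R}, l^2(h\mathbb{Z}^d)) \cap L^{2+4/d}_{loc}(\mathbb{R}, l^{2+4/d}(h\mathbb{Z}^d))$. Moreover, for any $d/2$-admissible pair $(q, r)$, $u^h \in L^q_{loc}(\mathbb{R}, l^r(h\mathbb{Z}^d))$ and
$$\|u^h\|_{L^q(I,\, l^r(h\mathbb{Z}^d))} \le C(q, I)\,\|\varphi^h\|_{l^2(h\mathbb{Z}^d)}$$
for all finite intervals $I$, uniformly in $h$.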
With the same notation as in the subcritical case, the following convergence result holds.

Theorem 3.6. Let $p = 4/d$. Under the smallness assumption of Theorem 3.5, the sequence $P^h_0 u^h$ satisfies
$$P^h_0 u^h \overset{*}{\rightharpoonup} u \ \text{ in } L^\infty(\mathbb{R}, L^2(\mathbb{R}^d)), \qquad P^h_0 u^h \rightharpoonup u \ \text{ in } L^{4/d+2}_{loc}(\mathbb{R}, L^{4/d+2}(\mathbb{R}^d)),$$
$$P^h_0 u^h \to u \ \text{ in } L^2_{loc}(\mathbb{R} \times \mathbb{R}^d), \qquad P^h_0\widetilde{\Pi}(f(\widetilde{\Pi}^* u^h)) \rightharpoonup |u|^{4/d} u \ \text{ in } L^{(4/d+2)'}_{loc}(\mathbb{R}, L^{(4/d+2)'}(\mathbb{R}^d)),$$
where $u$ is the unique weak solution of the critical NSE with $p = 4/d$.

In contrast with the viscous numerical scheme introduced in [12], this time we do not need to modify the exponent $4/d$ of the nonlinearity in the numerical scheme. In the present case, the Strichartz estimates for the linear semidiscrete semigroup hold for $d/2$-admissible pairs and not only for some $\alpha$-admissible pairs with $\alpha > d/2$. This allows us to use, for the numerical scheme based on the two-grid method, exactly the same nonlinearity as that given by the nonlinear problem, after adapting it by means of the extension and restriction operators $\widetilde{\Pi}$ and $\widetilde{\Pi}^*$ as in (3.58).

We have analyzed here the case of small $L^2$-initial data. In the continuous case, the global well-posedness can be proved under a more general assumption:
$$\|e^{it\Delta}\varphi\|_{L^{2+4/d}(\mathbb{R},\, L^{2+4/d}(\mathbb{R}^d))} \le c_0 \tag{3.69}$$
for some sufficiently small constant $c_0$. Examples of $\varphi$ satisfying (3.69) with large $L^2(\mathbb{R}^d)$-norm are given in [17, Chap. 5, section 5.4, pp. 108–109]. At the numerical level, condition (3.69) can be replaced by
$$\|S^h(t)\varphi^h\|_{L^{2+4/d}(\mathbb{R},\, l^{2+4/d}(h\mathbb{Z}^d))} \le c_1, \tag{3.70}$$
where $c_1$ is a positive, small enough constant and $\varphi^h \in \widetilde{\Pi}(4h\mathbb{Z}^d)$. Clearly, for $\varphi^h \in \widetilde{\Pi}(4h\mathbb{Z}^d)$ with small $l^2(h\mathbb{Z}^d)$-norm, estimate (3.48) shows (3.70). The construction of $\varphi^h \in \widetilde{\Pi}(4h\mathbb{Z}^d)$ with large $l^2(h\mathbb{Z}^d)$-norm satisfying (3.70) is an open problem.
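Identity (3.59), on which the norm conservation for scheme (3.58) rests, can be checked numerically in one dimension. The sketch below is an illustration under assumptions that are not in the text: periodic grids, unweighted inner products, coarse data indexed by $j$ instead of $4j$, and hypothetical helper names `extend` (for the extension operator) and `restrict` (for its adjoint).

```python
import numpy as np

def extend(coarse):
    """Periodic linear interpolation to a grid four times finer."""
    coarse = np.asarray(coarse)
    nxt = np.roll(coarse, -1)
    fine = np.empty(4 * coarse.size, dtype=np.result_type(coarse.dtype, np.float64))
    for r in range(4):
        fine[r::4] = (4 - r) / 4 * coarse + r / 4 * nxt
    return fine

def restrict(fine):
    """Adjoint of `extend` for the unweighted inner product (assumption)."""
    fine = np.asarray(fine)
    out = np.zeros(fine.size // 4, dtype=np.result_type(fine.dtype, np.float64))
    for r in range(4):
        g_r = fine[r::4]
        out = out + (4 - r) / 4 * g_r + r / 4 * np.roll(g_r, 1)
    return out

def f(v, p):
    """Power nonlinearity f(v) = |v|^p v."""
    return np.abs(v) ** p * v

p = 2.0
rng = np.random.default_rng(3)
u = rng.standard_normal(32) + 1j * rng.standard_normal(32)   # arbitrary complex fine-grid data

v = restrict(u)                                  # coarse-grid quantity entering f
pairing = np.vdot(u, extend(f(v, p)))            # sum over j of (extend f)_j * conj(u_j)
print(pairing.imag, pairing.real, np.sum(np.abs(v) ** (p + 2)))
```

The imaginary part should vanish up to rounding, and the real part should equal the sum of $|v_j|^{p+2}$ over the coarse grid, which is the discrete analogue of $(f(u), u)$ being real.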
REFERENCES
[1] M. J. Ablowitz and J. F. Ladik, Nonlinear differential-difference equations, J. Math. Phys., 16 (1975), pp. 598–603.
[2] M. J. Ablowitz, B. Prinari, and A. D. Trubatch, Discrete and Continuous Nonlinear Schrödinger Systems, London Math. Soc. Lecture Note Ser. 302, Cambridge University Press, Cambridge, UK, 2004.
[3] T. Cazenave, Semilinear Schrödinger Equations, Courant Lect. Notes Math. 10, American Mathematical Society, Providence, RI, Courant Institute of Mathematical Sciences, New York, 2003.
[4] M. Christ and A. Kiselev, Maximal functions associated to filtrations, J. Funct. Anal., 179 (2001), pp. 409–425.
[5] P. Constantin and J. C. Saut, Local smoothing properties of dispersive equations, J. Amer. Math. Soc., 1 (1988), pp. 413–439.
[6] J. Giannoulis, M. Herrmann, and A. Mielke, Continuum descriptions for the dynamics in discrete lattices: Derivation and justification, in Analysis, Modeling and Simulation of Multiscale Problems, Vol. 18, A. Mielke, ed., Springer, Berlin, 2006, pp. 435–466.
[7] G. Gigante and F. Soria, On a sharp estimate for oscillatory integrals associated with the Schrödinger equation, Int. Math. Res. Not., no. 24 (2002), pp. 1275–1293.
[8] J. Ginibre and G. Velo, The global Cauchy problem for the nonlinear Schrödinger equation revisited, Ann. Inst. H. Poincaré Anal. Non Linéaire, 2 (1985), pp. 309–327.
[9] R. Glowinski, Ensuring well-posedness by analogy: Stokes problem and boundary control for the wave equation, J. Comput. Phys., 103 (1992), pp. 189–221.
[10] L. I. Ignat, Fully discrete schemes for the Schrödinger equation. Dispersive properties, Math. Models Methods Appl. Sci., 17 (2007), pp. 567–591.
[11] L. I. Ignat and E. Zuazua, A two-grid approximation scheme for nonlinear Schrödinger equations: Dispersive properties and convergence, C. R. Acad. Sci. Paris, 341 (2005), pp. 381–386.
[12] L. I. Ignat and E. Zuazua, Dispersive properties of a viscous numerical scheme for the Schrödinger equation, C. R. Acad. Sci. Paris, 340 (2005), pp. 529–534.
[13] L. I. Ignat and E. Zuazua, Dispersive properties of numerical schemes for nonlinear Schrödinger equations, in Foundations of Computational Mathematics, Santander 2005, London Math. Soc. Lecture Note Ser. 331, L. M. Pardo et al., eds., Cambridge University Press, Cambridge, UK, 2006, pp. 181–207.
[14] M. Keel and T. Tao, Endpoint Strichartz estimates, Amer. J. Math., 120 (1998), pp. 955–980.
[15] C. E. Kenig, G. Ponce, and L. Vega, Oscillatory integrals and regularity of dispersive equations, Indiana Univ. Math. J., 40 (1991), pp. 33–69.
[16] C. E. Kenig, G. Ponce, and L. Vega, Small solutions to nonlinear Schrödinger equations, Ann. Inst. H. Poincaré Anal. Non Linéaire, 10 (1993), pp. 255–288.
[17] F. Linares and G. Ponce, Introduction to Nonlinear Dispersive Equations, Publicações Matemáticas, IMPA, Rio de Janeiro, Brazil, 2004.
[18] A. Magyar, E. M. Stein, and S. Wainger, Discrete analogues in harmonic analysis: Spherical averages, Ann. of Math. (2), 155 (2002), pp. 189–208.
[19] A. Mielke, Macroscopic behavior of microscopic oscillations in harmonic lattices via Wigner-Husimi transforms, Arch. Ration. Mech. Anal., 181 (2006), pp. 401–448.
[20] M. Nixon, The discretized generalized Korteweg-de Vries equation with fourth order nonlinearity, J. Comput. Anal. Appl., 5 (2003), pp. 369–397.
[21] M. Plancherel and G. Pólya, Fonctions entières et intégrales de Fourier multiples. II, Comment. Math. Helv., 10 (1937), pp. 110–163.
[22] A. Quarteroni and A. Valli, Numerical Approximation of Partial Differential Equations, Springer Ser. Comput. Math. 23, Springer, Berlin, 1994.
[23] J. Simon, Compact sets in the space $L^p(0, T; B)$, Ann. Mat. Pura Appl. (4), 146 (1987), pp. 65–96.
[24] G. Staffilani and D. Tataru, Strichartz estimates for a Schrödinger operator with nonsmooth coefficients, Comm. Partial Differential Equations, 27 (2002), pp. 1337–1372.
[25] A. Stefanov and P. G. Kevrekidis, Asymptotic behaviour of small solutions for the discrete nonlinear Schrödinger and Klein-Gordon equations, Nonlinearity, 18 (2005), pp. 1841–1857.
[26] E. M. Stein, Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals, Princeton Math. Ser. 43, Princeton University Press, Princeton, NJ, 1993.
[27] R. S. Strichartz, Restrictions of Fourier transforms to quadratic surfaces and decay of solutions of wave equations, Duke Math. J., 44 (1977), pp. 705–714.
[28] T. Tao, Nonlinear Dispersive Equations: Local and Global Analysis, CBMS Regional Conf. Ser. in Math. 106, American Mathematical Society, Providence, RI, 2006.
[29] L. N. Trefethen, Spectral Methods in MATLAB, Software Environ. Tools 10, SIAM, Philadelphia, 2000.
[30] Y. Tsutsumi, $L^2$-solutions for nonlinear Schrödinger equations and nonlinear groups, Funkcial. Ekvac., 30 (1987), pp. 115–125.
[31] R. Vichnevetsky and J. B. Bowles, Fourier Analysis of Numerical Approximations of Hyperbolic Equations, SIAM Stud. Appl. Math., 5, SIAM, Philadelphia, 1982.
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1391–1420
© 2009 Society for Industrial and Applied Mathematics
DISCONTINUOUS GALERKIN METHODS FOR ADVECTION-DIFFUSION-REACTION PROBLEMS∗

BLANCA AYUSO† AND L. DONATELLA MARINI‡

Abstract. We apply the weighted-residual approach recently introduced in [F. Brezzi et al., Comput. Methods Appl. Mech. Engrg., 195 (2006), pp. 3293–3310] to derive discontinuous Galerkin formulations for advection-diffusion-reaction problems. We devise the basic ingredients to ensure stability and optimal error estimates in suitable norms, and propose two new methods.

Key words. discontinuous Galerkin, advection-diffusion-reaction, inf-sup condition

AMS subject classifications. 65N30, 65N12, 65G99, 76R99

DOI. 10.1137/080719583
1. Introduction. In recent years discontinuous Galerkin (DG) methods have become increasingly popular, and they have been used and analyzed for various kinds of applications: see, e.g., [2] for second order elliptic problems, [4], [3] for Reissner– Mindlin plates, and, for advection-diffusion problems, [13], [14], [23], [38], [20], [24], and [10]. Most DG methods for advection-diffusion or hyperbolic problems are constructed by specifying the numerical fluxes at the interelements, and, as far as we know, the advection field is mostly assumed to be either constant or divergence-free. In the present paper we follow a different path. On one hand, we derive DG formulations by applying the so-called weighted-residual approach of [6]. In this approach a DG method is written first in strong form, as a system of equations including the original PDE equation inside each element plus the necessary continuity conditions at interfaces. The variational form is then obtained by combining all these equations. In this way, the DG method establishes a linear relationship between the residual inside each element and the jumps across interelement boundaries. Such a linear relation permits us to recover DG methods proposed earlier in the literature, and at the same time provides a framework for devising new DG methods with the desired stability and consistency properties. As we shall show, this is possible, since stability and consistency can be ensured through a proper selection of the weights in the linear relationship, which in turn determines the DG method. On the other hand, we deal with a variable reaction and a variable advection field which is not divergence-free. With respect to other papers treating variable coefficients (see, e.g., [17], [18], [11]) the novelty of the present paper is that we relax the usual coercivity condition relating advection and reaction (see condition (2.2) in section 2). 
To the best of our knowledge the weaker coercivity condition was assumed in [21], but there the advection field is constant, while for variable coefficients similar assumptions ∗ Received by the editors March 31, 2008; accepted for publication (in revised form) November 17, 2008; published electronically February 25, 2009. http://www.siam.org/journals/sinum/47-2/71958.html † Departamento de Matem´ aticas, Universidad Aut´ onoma de Madrid, Madrid 28049, Spain (blanca.
[email protected]). The work of this author was partially supported by MEC under project MTM200500714 and by CAM under project S0505/ESP-0158. ‡ Dipartimento di Matematica, Universit` a degli Studi di Pavia and IMATI del CNR, Via Ferrata 1, 27100 Pavia, Italy (
[email protected]). The work of this author was partially supported by MIUR under project PRIN2006.
1391
1392
BLANCA AYUSO AND L. DONATELLA MARINI
in a different context were used in [19]. Clearly, the weaker condition (2.2), together with variable coefficients, makes the analysis more complicated than usual, surely more complicated than one could expect at first sight, if one wants to take care of situations where advection and/or reaction dominate in different parts of the domain or, more generally, when diffusion is (comparatively) very small. To ease the presentation we apply the weighted-residual approach to derive two DG methods proposed in the literature: the method introduced in [23], and that proposed in [24] and further analyzed in [10]. The former uses the nonsymmetric NIPG method for the diffusion terms and upwind for the convective part of the flux. In the latter the diffusion terms are treated with three different DG methods, and the whole physical flux is upwinded. This makes the approach well suited for strongly advectiondominated problems (actually, the most interesting cases) but less adequate in the diffusion-dominated or intermediate regimes. We also introduce two new methods. One of them, that we refer to as minimal choice, contains the minimum number of terms needed to get stability and optimal order of convergence in all regimes. The other one is a more refined method, that contains as a particular case the method [20] and the minimal choice. Our formulation allows us also to recover easily, for each of the methods analyzed, the corresponding SUPG-stabilized version. Many others methods could have been considered, but this would have made the paper practically unreadable. Moreover, our aim was not to compare the behavior of different schemes, but mostly to explore the possibilities and the ductility of the weighted-residual approach for designing and analyzing DG methods. 
It is worth noticing that this approach seems particularly well suited for understanding, in a natural way, which stabilization mechanisms hidden in each DG method are responsible for the behavior of the DG approximation in the different regimes of the problem. It also provides a way to perform stability and a priori error analysis in a unified framework. Furthermore, we think that it could also be useful for applications to a posteriori error analysis, a field which is well developed for conforming approximations but much less studied for DG approximations or even stabilized methods. This surely deserves further research. Throughout the paper we shall use standard notation for norms and seminorms in Sobolev spaces. To keep homogeneity of dimensions, we recall that on a domain $\Omega$ of diameter $L$ we define
\[ \|v\|_{k,\Omega}^2 := \sum_{s=0}^{k} L^{2s}\, |v|_{s,\Omega}^2, \qquad v \in H^k(\Omega),\ k \ge 0, \tag{1.1} \]
\[ \|v\|_{k,\infty,\Omega} := \sum_{s=0}^{k} L^{s}\, |v|_{s,\infty,\Omega}, \qquad v \in W^{k,\infty}(\Omega),\ k \ge 0. \tag{1.2} \]
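The role of the powers of $L$ in (1.1)–(1.2) can be illustrated with a minimal Python sketch. The helper names and the test function $v(x) = \sin(\pi x/L)$ on the interval $(0, L)$ are assumptions made only for this illustration; they do not come from the paper.

```python
import math


def scaled_h_norm_sq(seminorms, L):
    """Sum_{s=0}^{k} L^(2s) |v|_s^2 as in (1.1); seminorms[s] holds |v|_{s,Omega}."""
    return sum(L ** (2 * s) * sn ** 2 for s, sn in enumerate(seminorms))


def scaled_w_inf_norm(seminorms, L):
    """Sum_{s=0}^{k} L^s |v|_{s,inf} as in (1.2); seminorms[s] holds |v|_{s,inf,Omega}."""
    return sum(L ** s * sn for s, sn in enumerate(seminorms))


def sine_l2_seminorms(L):
    """Exact L2 seminorms |v|_0, |v|_1 of the illustrative v(x) = sin(pi x / L) on (0, L)."""
    return [math.sqrt(L / 2), (math.pi / L) * math.sqrt(L / 2)]


def sine_sup_seminorms(L):
    """Exact sup seminorms |v|_{0,inf}, |v|_{1,inf} of the same illustrative function."""
    return [1.0, math.pi / L]
```

For this test function the scaled $W^{1,\infty}$ norm equals $1 + \pi$ for every $L$, and the scaled $H^1$ norm squared equals $(L/2)(1 + \pi^2)$, so both terms of each sum carry the same physical dimension.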
The outline of the paper is as follows. In section 2 we present the problem with all the assumptions necessary to the analysis, and we apply the weighted-residual approach. In section 3 we show examples of choices of the “weights,” leading to four methods: the methods of [23] and [24], and two new methods. In section 4 we deal with the approximation and prove stability in a suitable DG norm. We also prove stability in a norm of SUPG-type, thus providing control on the streamline derivative. Section 5 is devoted to a priori error analysis, and optimal convergence is proved in both norms. Finally, in section 6 we present an extensive set of numerical experiments to compare the methods and to validate our theoretical results.
DG FOR ADVECTION-DIFFUSION-REACTION PROBLEMS
2. Setting of the problem. To ease the presentation we shall restrict ourselves to the two-dimensional case, although the results presented here also hold in three dimensions. Let $\Omega$ be a bounded, convex, polygonal domain in $\mathbb{R}^2$, and let $\beta = (\beta_1, \beta_2)^T$ be the velocity vector field defined on $\Omega$ with $\beta_i \in W^{1,\infty}(\Omega)$, $i = 1, 2$, $\gamma \in L^\infty(\Omega)$ the reaction coefficient, and $\varepsilon$ a positive constant diffusivity coefficient. We define the inflow and outflow parts of $\Gamma = \partial\Omega$ in the usual fashion:
\[ \Gamma^- = \{x \in \Gamma : \beta(x)\cdot n(x) < 0\} = \text{inflow}, \qquad \Gamma^+ = \{x \in \Gamma : \beta(x)\cdot n(x) \ge 0\} = \text{outflow}, \]
where $n(x)$ denotes the unit outward normal vector to $\Gamma$ at $x \in \Gamma$. Let $\Gamma_D \ne \emptyset$ and $\Gamma_N$ be the parts of the boundary $\Gamma$ where Dirichlet and Neumann boundary conditions are assigned, so that $\Gamma = \Gamma_D \cup \Gamma_N$, $\Gamma_D \cap \Gamma_N = \emptyset$. Thus,
\[ \Gamma_D^\pm = \Gamma_D \cap \Gamma^\pm, \qquad \Gamma_N^\pm = \Gamma_N \cap \Gamma^\pm. \]
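Let $f \in L^2(\Omega)$, $g_D \in H^{3/2}(\Gamma_D)$, $g_N \in H^{1/2}(\Gamma_N)$. Consider the advection-diffusion-reaction problem
\[ \begin{aligned} \mathrm{div}\,\sigma(u) + \gamma u &= f && \text{in } \Omega,\\ u &= g_D && \text{on } \Gamma_D,\\ (\beta u\,\chi_{\Gamma_N^-} - \varepsilon\nabla u)\cdot n &= g_N && \text{on } \Gamma_N, \end{aligned} \tag{2.1} \]
where $\sigma(u)$ is the (physical) flux, given by $\sigma(u) = -\varepsilon\nabla u + \beta u$, and $\chi_{\Gamma_N^-}$ is the characteristic function of $\Gamma_N^-$. The meaning of the boundary conditions on $\Gamma_N$ is that the total flux is imposed on $\Gamma_N^-$ while on $\Gamma_N^+$ only the diffusive flux is specified (see [24]). Since the first equation in (2.1) is equivalent to $-\varepsilon\Delta u + \beta\cdot\nabla u + (\mathrm{div}\,\beta + \gamma)u = f$, we introduce the “effective” reaction function $\varrho(x)$ and we make the assumption
\[ \varrho(x) := \gamma(x) + \tfrac{1}{2}\,\mathrm{div}\,\beta(x) \ge \varrho_0 \ge 0 \qquad \forall x \in \Omega. \tag{2.2} \]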
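For the subsequent stability and error analysis we shall make the following assumptions on the coefficients: the advective field has neither closed curves nor stationary points, i.e.,
\[ \beta \text{ has no closed curves} \quad\text{and}\quad |\beta(x)| \ne 0 \quad \forall x \in \Omega. \tag{2.3} \]
This implies, as we shall see later on (see Remark 2.1 and Appendix A), that
\[ \exists\, \eta \in W^{k+1,\infty}(\Omega) \ \text{such that}\ \ \beta\cdot\nabla\eta \ge 2b_0 := 2\,\frac{\|\beta\|_{0,\infty,\Omega}}{L} \quad \text{in } \Omega. \tag{H1} \]
Furthermore, we assume that
\[ \exists\, c_\beta > 0 \ \text{such that}\ \ |\beta(x)| \ge c_\beta\, \|\beta\|_{1,\infty,\Omega} \quad \forall x \in \Omega, \tag{H2} \]
and, given a shape-regular family $\mathcal{T}_h$ of decompositions of $\Omega$ into triangles $T$,
\[ \exists\, c > 0 \ \text{such that}\ \ \forall T \in \mathcal{T}_h:\quad \|\varrho\|_{0,\infty,T} \le c\,\big(\min_{T}\varrho(x) + b_0\big). \tag{H3} \]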
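Assumption (H1) can be made concrete in the simplest special case of a constant advection field. The sketch below is only an illustration under stated assumptions: the choice $\eta(x) = 2\,\beta\cdot x/(|\beta| L)$, the use of the Euclidean length of $\beta$ for $\|\beta\|_{0,\infty,\Omega}$, and all function names are hypothetical and are not taken from the paper.

```python
import math


def make_eta(beta, L):
    """Return eta(x, y) = 2 (beta . x) / (|beta| L) for a constant field beta (assumed example)."""
    norm = math.hypot(beta[0], beta[1])

    def eta(x, y):
        return 2.0 * (beta[0] * x + beta[1] * y) / (norm * L)

    return eta


def beta_dot_grad(eta, beta, x, y, step=1e-6):
    """Central-difference approximation of beta . grad(eta) at the point (x, y)."""
    d_dx = (eta(x + step, y) - eta(x - step, y)) / (2 * step)
    d_dy = (eta(x, y + step) - eta(x, y - step)) / (2 * step)
    return beta[0] * d_dx + beta[1] * d_dy


def b0(beta, L):
    """b0 = ||beta||_{0,inf} / L, taking the Euclidean length of the constant beta as its sup norm."""
    return math.hypot(beta[0], beta[1]) / L
```

For a constant field this $\eta$ gives $\beta\cdot\nabla\eta = 2|\beta|/L = 2b_0$, so the lower bound in (H1) holds with equality; for variable $\beta$ the existence of $\eta$ relies on the argument cited in Remark 2.1.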
Remark 2.1. Assumption (2.3), together with the regularity $\beta \in W^{1,\infty}(\Omega)$, ensures the well-posedness of the continuous problem in the pure hyperbolic limit ($\varepsilon = 0$). (See [16] and also [33] for details.) Condition (H1) is based on a result first established in [16, Lemma 2.3] under more regularity assumptions on $\beta$. Namely, for $\beta \in C^k(U)$, $k \ge 1$, satisfying (2.3), $U$ being some neighborhood of $\Omega$, the authors show the existence of $\eta \in C^k(U)$ verifying $\beta\cdot\nabla\eta \ge b_0 > 0$ in $\Omega$. However, by revising the proof in [16], it can be seen that the result holds true also if $\beta \in W^{1,\infty}(\Omega)$, provided it satisfies (2.3) (see Appendix A for details). Assumption (H2) excludes undesirable situations of a small but highly oscillatory advection field and provides useful relations among norms. Indeed, from (1.2) we deduce
\[ c_\beta\,\frac{\|\beta\|_{1,\infty,\Omega}}{L} \le b_0 := \frac{\|\beta\|_{0,\infty,\Omega}}{L} \le \frac{\|\beta\|_{1,\infty,\Omega}}{L}, \qquad |\beta|_{1,\infty,\Omega} \le \frac{1}{L}\,\|\beta\|_{1,\infty,\Omega} \le \frac{\|\beta\|_{0,\infty,\Omega}}{c_\beta L} = \frac{b_0}{c_\beta}. \tag{2.4} \]
Hypothesis (H3) is always verified in the advection-dominated regime (it says nothing more than $\varrho \in L^\infty(\Omega)$). Instead, when the advection field is negligible, it forbids the problem to shift from reaction-dominated to diffusion-dominated within a single element. Note that, since we are interested in the case where the diffusion coefficient $\varepsilon$ is very small, what we refer to as a diffusion-dominated problem (that is, when both reaction and advection are also very small) has little practical interest. Again let $\mathcal{T}_h$ be a shape-regular family of decompositions of $\Omega$ into triangles $T$, such that each (open) boundary edge belongs either to $\Gamma_D$, or to $\Gamma_N^+$, or to $\Gamma_N^-$ (in other words, we avoid edges that belong to two different types of boundaries). We denote by $h_T$ the diameter of $T$, and we set $h = \max_{T \in \mathcal{T}_h} h_T$. Since we look for a solution of (2.1) that is a priori discontinuous, we need to recall the definition of typical tools such as averages and jumps on the edges for scalar- and vector-valued functions. Let $T_1$ and $T_2$ be two neighboring elements, let $n^1$ and $n^2$ be their outward normal unit vectors, and let $\varphi^i$ and $\tau^i$ be the restrictions of $\varphi$ and $\tau$ to $T_i$ ($i = 1, 2$), respectively. Following [2] we set
\[ \{\varphi\} = \tfrac{1}{2}(\varphi^1 + \varphi^2), \qquad [\![\varphi]\!] = \varphi^1 n^1 + \varphi^2 n^2 \qquad \text{on } e \in \mathcal{E}_h^\circ, \tag{2.5} \]
\[ \{\tau\} = \tfrac{1}{2}(\tau^1 + \tau^2), \qquad [\![\tau]\!] = \tau^1\cdot n^1 + \tau^2\cdot n^2 \qquad \text{on } e \in \mathcal{E}_h^\circ, \tag{2.6} \]
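where $\mathcal{E}_h^\circ$ is the set of interior edges $e$. For $e \in \mathcal{E}_h^\partial$, the set of boundary edges, we set
\[ [\![\varphi]\!] = \varphi\, n, \qquad \{\varphi\} = \varphi, \qquad \{\tau\} = \tau. \tag{2.7} \]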
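The definitions (2.5)–(2.6) can be checked at a single point of an interior edge with a few lines of Python. This is a minimal sketch: the function names and the numerical trace values are arbitrary choices made for the illustration. It also verifies the pointwise relation $\tau^1\cdot n^1\varphi^1 + \tau^2\cdot n^2\varphi^2 = \{\tau\}\cdot[\![\varphi]\!] + [\![\tau]\!]\{\varphi\}$, which follows from (2.5)–(2.6) when $n^2 = -n^1$.

```python
def dot(a, b):
    """Euclidean inner product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def average(a1, a2):
    """Arithmetic average {.} of two scalars or of two vectors (tuples)."""
    if isinstance(a1, tuple):
        return tuple(0.5 * (x + y) for x, y in zip(a1, a2))
    return 0.5 * (a1 + a2)


def jump_scalar(phi1, phi2, n1, n2):
    """[[phi]] = phi1 n1 + phi2 n2: the jump of a scalar is a vector."""
    return tuple(phi1 * p + phi2 * q for p, q in zip(n1, n2))


def jump_vector(tau1, tau2, n1, n2):
    """[[tau]] = tau1 . n1 + tau2 . n2: the jump of a vector is a scalar."""
    return dot(tau1, n1) + dot(tau2, n2)


# Arbitrary illustrative traces at one point of an interior edge.
n1 = (0.6, 0.8)
n2 = (-0.6, -0.8)          # outward normal of the neighbor, n2 = -n1
tau1, tau2 = (1.0, 2.0), (-3.0, 0.5)
phi1, phi2 = 2.0, -1.5

# Sum of the two element-boundary contributions ...
lhs = dot(tau1, n1) * phi1 + dot(tau2, n2) * phi2
# ... equals the average/jump splitting on the shared edge.
rhs = (dot(average(tau1, tau2), jump_scalar(phi1, phi2, n1, n2))
       + jump_vector(tau1, tau2, n1, n2) * average(phi1, phi2))
```

Note the type convention: the jump of a scalar is a vector and the jump of a vector is a scalar, so every product in the splitting is well defined.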
For future purposes we also introduce a weighted average, for both scalar- and vector-valued functions, as follows. With each internal edge $e$, shared by elements $T_1$ and $T_2$, we associate two real nonnegative numbers $\alpha_1$ and $\alpha_2$, with $\alpha_1 + \alpha_2 = 1$, and we define
\[ \{\tau\}_\alpha = \alpha_1\tau^1 + \alpha_2\tau^2 \qquad \text{on internal edges.} \tag{2.8} \]
As shown, for instance, in [8] for a pure hyperbolic problem, a proper choice of $\alpha_1$ and $\alpha_2$ will introduce a stabilizing effect of upwind type into the scheme. We note that the arithmetic average is obtained for $\alpha_1 = \alpha_2 = 1/2$, while the classical upwind flux is obtained when $\alpha_i = (\mathrm{sign}(\beta\cdot n^i) + 1)/2$ for $i = 1, 2$ (where, as usual, $\mathrm{sign}(x) = x/|x|$ for $x \ne 0$ and $\mathrm{sign}(0) = 0$). Indeed, for vectors the following relation holds:
\[ \{\tau\}_\alpha\cdot n^e = \Big(\{\tau\} + \frac{[\![\alpha]\!]}{2}\,[\![\tau]\!]\Big)\cdot n^e, \tag{2.9} \]
whenever $n^e$ is orthogonal to $e$. Thus, if, for instance, $T_1$ is the upwind triangle, i.e., $\beta\cdot n^1 > 0$, then $\alpha = (1, 0)$ and
\[ \begin{aligned} \{\tau\}_\alpha\cdot n^1 &= \Big(\{\tau\} + \frac{n^1}{2}[\![\tau]\!]\Big)\cdot n^1 = \tau^1\cdot n^1 =: \{\tau\}_{upw}\cdot n^1,\\ \{\tau\}_{1-\alpha}\cdot n^2 &= \Big(\{\tau\} + \frac{n^2}{2}[\![\tau]\!]\Big)\cdot n^2 = \tau^2\cdot n^2 =: \{\tau\}_{dw}\cdot n^2, \end{aligned} \tag{2.10} \]
while for scalar functions we obviously have
\[ \{v\}_\alpha = v^1 =: \{v\}_{upw}, \qquad \{v\}_{1-\alpha} = v^2 =: \{v\}_{dw}. \]
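The weighted average (2.8), the upwind weights, and relation (2.9) can be sketched as follows. This is an illustration under assumptions: $[\![\alpha]\!]$ is read as $\alpha_1 n^1 + \alpha_2 n^2$ by analogy with (2.5), $n^e$ is taken equal to $n^1$, and the function names and sample values are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def sign(x):
    """sign(x) = x/|x| for x != 0 and sign(0) = 0."""
    return (x > 0) - (x < 0)


def upwind_weights(beta, n1, n2):
    """alpha_i = (sign(beta . n^i) + 1) / 2, the classical upwind choice."""
    return tuple((sign(dot(beta, n)) + 1) / 2 for n in (n1, n2))


def weighted_average(tau1, tau2, alpha):
    """{tau}_alpha = alpha_1 tau^1 + alpha_2 tau^2 as in (2.8)."""
    a1, a2 = alpha
    return tuple(a1 * x + a2 * y for x, y in zip(tau1, tau2))


def rhs_of_2_9(tau1, tau2, alpha, n1, n2, ne):
    """({tau} + [[alpha]]/2 [[tau]]) . n^e, with [[alpha]] = alpha_1 n^1 + alpha_2 n^2 (assumed)."""
    a1, a2 = alpha
    jump_alpha = tuple(a1 * p + a2 * q for p, q in zip(n1, n2))   # vector
    jump_tau = dot(tau1, n1) + dot(tau2, n2)                      # scalar
    mean_tau = tuple(0.5 * (x + y) for x, y in zip(tau1, tau2))
    vec = tuple(m + 0.5 * ja * jump_tau for m, ja in zip(mean_tau, jump_alpha))
    return dot(vec, ne)


# Arbitrary illustrative data on one interior edge.
n1, n2 = (0.6, 0.8), (-0.6, -0.8)
tau1, tau2 = (1.0, 2.0), (-3.0, 0.5)
beta = (1.0, 0.0)                      # beta . n1 > 0, so T1 is the upwind triangle
alpha_upw = upwind_weights(beta, n1, n2)
```

With these data `alpha_upw` is `(1.0, 0.0)`, so the weighted average reduces to the trace from the upwind triangle, in agreement with (2.10).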
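Taking $\alpha_i = 1/2 + t\,\mathrm{sign}(\beta\cdot n^i)$ ($i = 1, 2$) will allow us, choosing $t$ with $0 < t_0 \le t \le 1/2$ on each edge, to tune the amount of upwinding. We shall make extensive use of the identity [2, formula (3.3)]
\[ \sum_{T \in \mathcal{T}_h}\int_{\partial T} \tau\cdot n\,\varphi = \sum_{e \in \mathcal{E}_h}\int_e \{\tau\}\cdot[\![\varphi]\!] + \sum_{e \in \mathcal{E}_h^\circ}\int_e [\![\tau]\!]\{\varphi\}, \tag{2.11} \]
of the trace inequality [1], [2]
\[ \|w\|_{0,e}^2 \le C_t^2\big(|e|^{-1}\|w\|_{0,T}^2 + |e|\,|w|_{1,T}^2\big), \qquad e \subset \partial T,\ w \in H^1(T), \tag{2.12} \]
with $C_t$ a constant depending only on the minimum angle of $T$, and $|e|$ the length of the edge $e$, and finally of the DG–Poincaré inequality [5]
\[ \|v\|_{0,\Omega} \le L\, C_P \Big(|v|_{1,h}^2 + \sum_{e \notin \Gamma_N}\frac{1}{|e|}\,\|[\![v]\!]\|_{0,e}^2\Big)^{1/2}, \tag{2.13} \]
where $C_P$ is a positive constant depending on the minimum angle of $\mathcal{T}_h$, and $|\cdot|_{1,h}$ denotes the broken $H^1$-seminorm. With the previous definitions, problem (2.1) is equivalent to
\[ \begin{cases} \mathrm{div}\,\sigma(u) + \gamma u = f & \text{in each } T \in \mathcal{T}_h,\\ [\![\sigma(u)]\!] = 0 & \text{on each } e \in \mathcal{E}_h^\circ,\\ [\![u]\!] = 0 & \text{on each } e \in \mathcal{E}_h^\circ,\\ u = g_D & \text{on each } e \in \Gamma_D,\\ (\beta u\,\chi_{\Gamma_N^-} - \varepsilon\nabla u)\cdot n = g_N & \text{on each } e \in \Gamma_N. \end{cases} \tag{2.14} \]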
Following the approach of [6], we shall introduce a variational formulation of (2.14) in which each of the equations above has the same relevance and is therefore treated in the same fashion. To do so, we introduce the space
\[ V(\mathcal{T}_h) := \{v \in L^2(\Omega) \ \text{such that}\ v|_T \in H^s(T)\ \ \forall T \in \mathcal{T}_h,\ s > 3/2\}, \]
and we assume that we have five operators $B_0$, $B_1$, $B_2$, $B_1^D$, $B_2^N$ from $V(\mathcal{T}_h)$ to $L^2(\Omega)$, $L^2(\mathcal{E}_h^\circ)$, $L^2(\mathcal{E}_h^\circ)$, $L^2(\Gamma_D)$, $L^2(\Gamma_N)$, respectively. Then we consider the problem: find $u \in V(\mathcal{T}_h)$ such that, for all $v \in V(\mathcal{T}_h)$,
\[ \begin{aligned} &\int_\Omega (\mathrm{div}_h\,\sigma(u) + \gamma u - f)\,B_0 v + \sum_{e \in \mathcal{E}_h^\circ}\int_e [\![u]\!]\cdot B_1 v + \sum_{e \in \mathcal{E}_h^\circ}\int_e [\![\sigma(u)]\!]\,B_2 v\\ &\quad + \sum_{e \in \Gamma_D}\int_e (u - g_D)\,B_1^D v + \sum_{e \in \Gamma_N}\int_e \big((\beta u\,\chi_{\Gamma_N^-} - \varepsilon\nabla u)\cdot n - g_N\big)\,B_2^N v = 0, \end{aligned} \tag{2.15} \]
where $\mathrm{div}_h$ denotes the divergence element by element. Different choices of the operators $B$ will give rise to different formulations. Since the solution of the original problem (2.1) is always a solution of (2.15), if we ensure uniqueness of the solution of (2.15), such a solution will coincide with the solution of the original problem. Sufficient conditions on the operators $B$ to guarantee uniqueness of the solution of (2.15) are given in [6, Theorem 1]. In the next section we shall present some choices of the operators verifying the hypotheses of the cited theorem.
3. Variational formulations. We will present four examples of different choices for the operators in (2.15). Two of them reproduce known formulations, while the other two will give rise to new methods.
Example 1. We set
\[ \begin{aligned} B_0 v|_T &= v && \forall T \in \mathcal{T}_h,\\ B_1 v|_e &= c_e\,\frac{\varepsilon}{|e|}\,[\![v]\!] + \frac{n^+}{2}\,[\![\beta v]\!] && \forall e \in \mathcal{E}_h^\circ,\\ B_2 v|_e &= -\{v\} && \forall e \in \mathcal{E}_h^\circ,\\ B_1^D v|_e &= c_e\,\frac{\varepsilon}{|e|}\,[\![v]\!]\cdot n - \beta\cdot n\, v && \forall e \in \Gamma_D^-,\\ B_2^N v|_e &= -v && \forall e \in \Gamma_N^-. \end{aligned} \tag{3.1} \]
In (3.1) $n^+$ is the normal to $e$ such that $\beta\cdot n^+ \ge 0$, and $c_e$ is a positive constant such that (see [2])
\[ c_e \ge \eta_0 > 0 \qquad \forall e \in \mathcal{E}_h. \tag{3.2} \]
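We shall see that the definition of the operators on $\Gamma^+$ can be made arbitrary, without compromising the stability or consistency properties of the resulting methods. We can choose, for instance,
\[ B_1^D v = c_e\,\frac{\varepsilon}{|e|}\, v \ \ \text{on } e \in \Gamma_D^+, \qquad B_2^N v = -v \ \ \text{on } \Gamma_N^+. \]
With these choices, and setting
\[ S_e = c_e\,\frac{\varepsilon}{|e|}, \]
problem (2.15) reads
\[ \begin{aligned} 0 ={}& \sum_{T \in \mathcal{T}_h}\int_T (\mathrm{div}\,\sigma(u) + \gamma u - f)\,v + \sum_{e \in \mathcal{E}_h^\circ}\int_e [\![u]\!]\cdot\Big(S_e[\![v]\!] + \frac{n^+}{2}[\![\beta v]\!]\Big)\\ & - \sum_{e \in \mathcal{E}_h^\circ}\int_e [\![\sigma(u)]\!]\{v\} + \sum_{e \in \Gamma_D^-}\int_e (u - g_D)\,(S_e[\![v]\!] - \beta v)\cdot n\\ & + \sum_{e \in \Gamma_D^+}\int_e S_e\,(u - g_D)\,v - \sum_{e \in \Gamma_N}\int_e \big((\beta u\,\chi_{\Gamma_N^-} - \varepsilon\nabla_h u)\cdot n - g_N\big)\,v. \end{aligned} \tag{3.3} \]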
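Using the identity (2.11) we have
\[ \int_\Omega \mathrm{div}_h\,\sigma(u)\, v = -\int_\Omega \sigma(u)\cdot\nabla_h v + \sum_{e \in \mathcal{E}_h^\circ}\int_e [\![\sigma(u)]\!]\{v\} + \sum_{e \in \mathcal{E}_h}\int_e \{\sigma(u)\}\cdot[\![v]\!]. \]
Substituting in (3.3), and observing that the continuity of $\beta$ and (2.10) imply
\[ \begin{aligned} \sum_{e \in \mathcal{E}_h^\circ}\int_e \Big(\{\beta u\}\cdot[\![v]\!] + [\![u]\!]\cdot\frac{n^+}{2}[\![\beta v]\!]\Big) &= \sum_{e \in \mathcal{E}_h^\circ}\int_e \Big(\{\beta u\} + \frac{n^+}{2}[\![\beta u]\!]\Big)\cdot[\![v]\!]\\ &= \sum_{e \in \mathcal{E}_h^\circ}\int_e \{\beta u\}_{upw}\cdot n^+\,(v^+ - v^-) = \sum_{e \in \mathcal{E}_h^\circ}\int_e \{\beta u\}_{upw}\cdot[\![v]\!], \end{aligned} \]
we obtain the following formulation: find $u \in V(\mathcal{T}_h)$ such that, for all $v \in V(\mathcal{T}_h)$,
\[ \begin{aligned} &\int_\Omega (\gamma u v - \sigma(u)\cdot\nabla_h v) + \sum_{e \notin \Gamma_N}\int_e S_e[\![u]\!]\cdot[\![v]\!] + \sum_{e \in \mathcal{E}_h^\circ}\int_e \{\beta u\}_{upw}\cdot[\![v]\!]\\ &\qquad - \sum_{e \notin \Gamma_N}\int_e \{\varepsilon\nabla_h u\}\cdot[\![v]\!] + \int_{\Gamma^+}\beta\cdot n\, u v\\ &\quad = \sum_{T \in \mathcal{T}_h}\int_T f v + \sum_{e \in \Gamma_D}\int_e S_e\, g_D v - \sum_{e \in \Gamma_D^-}\int_e \beta\cdot n\, g_D v - \sum_{e \in \Gamma_N}\int_e g_N v. \end{aligned} \tag{3.4} \]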
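We observe that for the diffusive part this method gives the so-called incomplete interior penalty Galerkin (IIPG) method proposed and analyzed in [36], while the advective part is upwinded through the operator $B_1$.
Example 2. We set
\[ \begin{aligned} B_0 v|_T &= v && \forall T \in \mathcal{T}_h,\\ B_1 v|_e &= c_e\,\frac{\varepsilon}{|e|}\,[\![v]\!] + \{\varepsilon\nabla_h v\} + \frac{n^+}{2}\,[\![\beta v]\!] && \forall e \in \mathcal{E}_h^\circ,\\ B_2 v|_e &= -\{v\} && \forall e \in \mathcal{E}_h^\circ,\\ B_2^N v|_e &= -v && \forall e \in \Gamma_N,\\ B_1^D v|_e &= c_e\,\frac{\varepsilon}{|e|}\, v + (\varepsilon\nabla_h v - \beta v\,\chi_{\Gamma_D^-})\cdot n && \forall e \in \Gamma_D. \end{aligned} \tag{3.5} \]
These choices reproduce the method introduced in [23] for the case $\gamma = 0$ and different boundary conditions. Indeed, in [23] the flux was not assigned at the inflow, and the boundary conditions were, with our notation,
\[ u = g_D \ \ \text{on } \Gamma_D \equiv \Gamma \setminus \Gamma_N^+, \qquad (-\varepsilon\nabla u)\cdot n = g_N \ \ \text{on } \Gamma_N^+, \qquad \Gamma_N^- = \emptyset. \]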
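In (3.5) the diffusive part corresponds to the NIPG method of [32], and the advective part is upwinded through $B_1$. Substituting (3.5) in (2.15), and using (2.10) and the continuity of $\beta$, leads to the problem: find $u \in V(\mathcal{T}_h)$ such that, for all $v \in V(\mathcal{T}_h)$,
\[ \begin{aligned} &\int_\Omega (\gamma u v - \sigma(u)\cdot\nabla_h v) + \sum_{e \notin \Gamma_N}\int_e S_e[\![u]\!]\cdot[\![v]\!] + \sum_{e \in \mathcal{E}_h^\circ}\int_e \{\beta u\}_{upw}\cdot[\![v]\!]\\ &\qquad - \sum_{e \notin \Gamma_N}\int_e \big(\{\varepsilon\nabla_h u\}\cdot[\![v]\!] - [\![u]\!]\cdot\{\varepsilon\nabla_h v\}\big) + \sum_{e \in \Gamma^+}\int_e \beta\cdot n\, u v\\ &\quad = \sum_{T \in \mathcal{T}_h}\int_T f v + \sum_{e \in \Gamma_D}\int_e g_D\big(S_e v + (\varepsilon\nabla_h v - \beta v\,\chi_{\Gamma_D^-})\cdot n\big) - \sum_{e \in \Gamma_N}\int_e g_N v. \end{aligned} \tag{3.6} \]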
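Example 3. We set
\[ \begin{aligned} B_0 v|_T &= v && \forall T \in \mathcal{T}_h,\\ B_1 v|_e &= c_e\,\frac{\varepsilon}{h}\,[\![v]\!] - \theta\{\varepsilon\nabla v\}_{upw} && \forall e \in \mathcal{E}_h^\circ,\\ B_2 v|_e &= -\{v\}_{dw} && \forall e \in \mathcal{E}_h^\circ,\\ B_2^N v|_e &= -v && \forall e \in \Gamma_N,\\ B_1^D v|_e &= c_e\,\frac{\varepsilon}{h}\, v - (\theta\varepsilon\nabla v + \beta v\,\chi_{\Gamma_D^-})\cdot n && \forall e \in \Gamma_D^-, \end{aligned} \]
where $\theta$ is a parameter that allows us to include various formulations for treating the diffusive part: symmetric for $\theta = 1$, skew-symmetric for $\theta = -1$, and neutral for $\theta = 0$. This choice of the operators corresponds to the method introduced in [24] and analyzed in [10]. By substituting in (2.15), integrating by parts, and rearranging terms we obtain the following scheme: find $u \in V(\mathcal{T}_h)$ such that, for all $v \in V(\mathcal{T}_h)$,
\[ \begin{aligned} &\int_\Omega (\gamma u v - \sigma(u)\cdot\nabla_h v) + \sum_{e \notin \Gamma_N}\int_e S_e[\![u]\!]\cdot[\![v]\!] + \sum_{e \in \mathcal{E}_h^\circ}\int_e \{\beta u\}_{upw}\cdot[\![v]\!]\\ &\qquad - \sum_{e \in \mathcal{E}_h^\circ}\int_e \big(\{\varepsilon\nabla_h u\}_{upw}\cdot[\![v]\!] + \theta[\![u]\!]\cdot\{\varepsilon\nabla_h v\}_{upw}\big) + \sum_{e \in \Gamma^+}\int_e \beta\cdot n\, u v\\ &\qquad - \sum_{e \in \Gamma_D}\int_e (\varepsilon\nabla_h u\cdot n\, v + \theta u\,\varepsilon\nabla_h v\cdot n)\\ &\quad = \sum_{T \in \mathcal{T}_h}\int_T f v + \sum_{e \in \Gamma_D}\int_e g_D\,(S_e[\![v]\!] - \theta\varepsilon\nabla_h v - \beta v\,\chi_{\Gamma_D^-})\cdot n - \sum_{e \in \Gamma_N}\int_e g_N v. \end{aligned} \tag{3.7} \]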
In (3.7) the whole flux σ(u) is upwinded through the operator B2 , but the upwind effect for the advective part is exactly the same as in methods (3.4) and (3.6).
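Example 4. Let $\{\cdot\}_\alpha$ be the weighted average defined in (2.8)–(2.9). We set
\[ \begin{aligned} B_0 v|_T &= v && \forall T \in \mathcal{T}_h,\\ B_1 v|_e &= c_e\,\frac{\varepsilon}{|e|}\,[\![v]\!] + \theta\big(\{\sigma(v)\}_\alpha - \{\beta v\}\big) && \forall e \in \mathcal{E}_h^\circ,\\ B_2 v|_e &= -\{v\}_{1-\alpha} && \forall e \in \mathcal{E}_h^\circ,\\ B_2^N v|_e &= -v && \forall e \in \Gamma_N,\\ B_1^D v|_e &= c_e\,\frac{\varepsilon}{h}\, v - (\theta\varepsilon\nabla_h v + \beta v\,\chi_{\Gamma_D^-})\cdot n && \forall e \in \Gamma_D. \end{aligned} \]
Substituting in (2.15) yields: find $u \in V(\mathcal{T}_h)$ such that, for all $v \in V(\mathcal{T}_h)$,
\[ \begin{aligned} &\int_\Omega (\gamma u v - \sigma(u)\cdot\nabla_h v) + \sum_{e \notin \Gamma_N}\int_e S_e[\![u]\!]\cdot[\![v]\!] - \theta\sum_{e \in \mathcal{E}_h^\circ}\int_e [\![u]\!]\cdot\{\beta v\}\\ &\qquad + \sum_{e \in \mathcal{E}_h^\circ}\int_e \big(\{\sigma(u)\}_\alpha\cdot[\![v]\!] + \theta[\![u]\!]\cdot\{\sigma(v)\}_\alpha\big) + \sum_{e \in \Gamma^+}\int_e \beta\cdot n\, u v\\ &\qquad - \sum_{e \in \Gamma_D}\int_e (\varepsilon\nabla_h u\cdot n\, v + \theta u\,\varepsilon\nabla_h v\cdot n)\\ &\quad = \sum_{T \in \mathcal{T}_h}\int_T f v + \sum_{e \in \Gamma_D}\int_e g_D\,(S_e[\![v]\!] - \theta\varepsilon\nabla_h v - \beta v\,\chi_{\Gamma_D^-})\cdot n - \sum_{e \in \Gamma_N}\int_e g_N v. \end{aligned} \tag{3.8} \]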
In (3.8) $\theta$ again is a parameter that allows us to include different treatments of the diffusive part: symmetric for $\theta = 1$, giving SIPG($\alpha$) (see [35], [22]), nonsymmetric for $\theta = -1$, and neutral for $\theta = 0$. However, as we shall see in Remark 4.2, the case $\theta = -1$ gives rise to a formulation which is stable only in a norm that is too weak, with a consequent loss of accuracy in the error estimates. Thus, it will not be further considered. The upwind is achieved in (3.8) through both operators $B_1$ and $B_2$. Moreover, the use of the weighted average (2.8) should allow us to tune the amount of upwind on each edge. As a consequence, the formulation enjoys the nice feature of adapting easily from the advection-dominated to the diffusion-dominated regime. All the above formulations share the common form
\[ \text{Find } u \in V(\mathcal{T}_h) \text{ such that}\quad a_h(u, v) = L(v) \qquad \forall v \in V(\mathcal{T}_h). \]
Remark 3.1. In all cases, for obtaining the corresponding SUPG-stabilized DG formulations, one need only change the definition of the operator $B_0$ into $B_0 v = v + c_T\,\beta\cdot\nabla v$ on each $T \in \mathcal{T}_h$, $c_T$ being a constant varying elementwise and depending on $h_T$ and the coefficients of the problem $\beta$, $\varepsilon$, $\gamma$ (see [28], [25], and [24]).
4. Approximation. With any integer $k \ge 1$ we associate the finite element space of discontinuous piecewise polynomial functions
\[ V_h^k = \{v \in L^2(\Omega) : v|_T \in P_k(T)\ \ \forall T \in \mathcal{T}_h\}, \]
where, as usual, $P_k(T)$ is the space of polynomials of degree at most $k$ on $T$. Replacing $V(\mathcal{T}_h)$ by $V_h^k$, we get the discrete problems, all sharing the form
\[ \text{Find } u_h \in V_h^k \text{ such that}\quad a_h(u_h, v_h) = L(v_h) \qquad \forall v_h \in V_h^k. \tag{4.1} \]
Consistency. Consistency holds by construction in all the cases, so that
\[ a_h(u - u_h, v_h) = 0 \qquad \forall v_h \in V_h^k. \tag{4.2} \]
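Stability. We shall prove stability in the norm
\[ |||v|||^2 = |||v|||_d^2 + |||v|||_{rc}^2, \tag{4.3} \]
with
\[ \begin{aligned} |||v|||_d^2 &:= \varepsilon|v|_{1,h}^2 + \varepsilon\|v\|_j^2 := \varepsilon|v|_{1,h}^2 + \sum_{e \notin \Gamma_N}\frac{\varepsilon}{|e|}\,\|[\![v]\!]\|_{0,e}^2,\\ |||v|||_{rc}^2 &:= \|(\overline{\varrho} + b_0)^{1/2} v\|_{0,\Omega}^2 + \sum_{e \in \mathcal{E}_h}\|\,|\beta\cdot n|^{1/2}[\![v]\!]\,\|_{0,e}^2, \end{aligned} \]
where $b_0 = \|\beta\|_{0,\infty}/L$ is defined in (H1), and $\overline{\varrho}$ is the piecewise constant function defined as
\[ \overline{\varrho}(x)|_T = \overline{\varrho}_T, \qquad \overline{\varrho}_T = \min_{x \in T}\varrho(x) \qquad \forall T \in \mathcal{T}_h. \tag{4.4} \]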
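Analogously, it will be useful to write the bilinear forms as
\[ a_h(u, v) = a_h^d(u, v) + a_h^{rc}(u, v). \tag{4.5} \]
For simplicity, we start by considering the method (3.4), which corresponds to the “minimal choice” for the operators. Then we have
\[ a_h^d(u, v) = \int_\Omega \varepsilon\nabla_h u\cdot\nabla_h v + \sum_{e \notin \Gamma_N}\int_e \big(S_e[\![u]\!] - \{\varepsilon\nabla_h u\}\big)\cdot[\![v]\!], \tag{4.6} \]
\[ a_h^{rc}(u, v) = \int_\Omega (\gamma u v - u\,\beta\cdot\nabla_h v) + \sum_{e \in \mathcal{E}_h^\circ}\int_e \{\beta u\}_{upw}\cdot[\![v]\!] + \int_{\Gamma^+}\beta\cdot n\, u v. \tag{4.7} \]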
We note that, using (2.12) and arguing as in [2], we can easily see that there exists a (geometric) constant $C_g$, depending only on the degree of the polynomials and on the minimum angle of the decomposition, such that
\[ \sum_{e \notin \Gamma_N}\int_e \{\varepsilon\nabla_h u\}\cdot[\![v]\!] \le C_g\,\varepsilon\,|u|_{1,h}\|v\|_j \qquad \forall u \in V_h^k,\ \forall v \in V(\mathcal{T}_h). \tag{4.8} \]
This implies that there exists a constant $C_d > 0$ such that
\[ a_h^d(u, v) \le C_d\,|||u|||_d\,|||v|||_d, \qquad u \in V_h^k,\ v \in V(\mathcal{T}_h), \tag{4.9} \]
and, for $\eta_0$ in (3.2) verifying
\[ \eta_0 > C_g^2/4, \tag{4.10} \]
there exists a positive constant $\alpha_d$ such that
\[ a_h^d(v, v) \ge \alpha_d\,|||v|||_d^2, \qquad v \in V_h^k. \tag{4.11} \]
We also note that, in general, one would rather require, say,
\[ \eta_0 > \max\{C_g^2, 1\} \tag{4.12} \]
in order to have a quantifiable constant like $\alpha_d = 1/2$. In any case, the diffusive part alone would easily verify stability in all the methods. However, the technique of taking $v = u$, which is possibly the easiest way of proving stability, will not be sufficient when the reactive-advective part is also present, as it does not provide control on the $L^2$ norm when advection dominates. Indeed, in all the cases we would have only
\[ a_h^{rc}(v, v) \ge \|\varrho^{1/2} v\|_{0,\Omega}^2 + \sum_{e \in \mathcal{E}_h}\|\,|\beta\cdot n|^{1/2}[\![v]\!]\,\|_{0,e}^2, \qquad v \in V_h^k. \]
We will then prove stability in the norm (4.3) through an inf-sup condition. For that, following [28], we introduce the “weighting function” $\chi = \exp(-\eta)$, with $\eta$ defined in (H1). The assumptions on $\eta$ imply the existence of three positive constants $\chi_1^*$, $\chi_2^*$, $\chi_3^*$ such that
\[ \chi_1^* \le \chi \le \chi_2^*, \qquad |\nabla\chi| \le \chi_3^*. \tag{4.13} \]
Our weighting function will be slightly different. Indeed, we shall take
\[ \varphi = \chi + \kappa, \tag{4.14} \]
where $\kappa$ is a constant such that
\[ \chi_1^* + \kappa > 6\,C_P L\,\chi_3^*, \qquad \chi_1^* + \kappa > (\chi_2^* + \kappa)/2, \tag{4.15} \]
and $C_P$ is the Poincaré constant appearing in (2.13). The next lemma is a generalization to the case of variable $\beta$ of that given in [26] for pure hyperbolic problems. See also [28] for the equivalent result for the SUPG-stabilized method and [34] for the conforming residual-free bubbles method. We point out, however, that here, thanks to the choice (4.14), we were able to remove the condition “$\varepsilon$ sufficiently small.”
Lemma 4.1. Let $a_h(\cdot,\cdot)$ be defined in (4.5)–(4.7), with
\[ \eta_0 > \max\{9C_g^2/4, 1\}. \tag{4.16} \]
Then, for every $\kappa$ satisfying (4.15), the corresponding $\varphi$ defined in (4.14) verifies
\[ a_h^d(v_h, \varphi v_h) \ge \frac{\chi_1^* + \kappa}{6}\,|||v_h|||_d^2, \tag{4.17} \]
\[ a_h^{rc}(v_h, \varphi v_h) \ge \frac{\chi_1^*}{2}\,|||v_h|||_{rc}^2, \tag{4.18} \]
\[ |||\varphi v_h||| \le \frac{\sqrt{145}}{6}\,(\chi_1^* + \kappa)\,|||v_h|||. \tag{4.19} \]
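The two requirements in (4.15) are linear in $\kappa$, so an admissible shift is easy to compute once the constants are known. The following minimal sketch uses hypothetical function names and made-up values for $\chi_1^*$, $\chi_2^*$, $\chi_3^*$, $C_P$, and $L$; in practice these constants depend on $\eta$ and on the mesh family and are not given numerically in the paper.

```python
def kappa_threshold(chi1, chi2, chi3, C_P, L):
    """Infimum of the admissible kappa in (4.15); any strictly larger value is admissible.

    The second condition chi1 + kappa > (chi2 + kappa)/2 is equivalent to kappa > chi2 - 2*chi1.
    """
    return max(6.0 * C_P * L * chi3 - chi1, chi2 - 2.0 * chi1)


def satisfies_4_15(kappa, chi1, chi2, chi3, C_P, L):
    """Check both inequalities of (4.15) for a given kappa."""
    return (chi1 + kappa > 6.0 * C_P * L * chi3) and (chi1 + kappa > (chi2 + kappa) / 2.0)


def coercivity_constant(kappa, chi1):
    """Constant (chi1 + kappa)/6 multiplying |||v_h|||_d^2 in (4.17)."""
    return (chi1 + kappa) / 6.0
```

A larger $\kappa$ improves the constant in (4.17) but also enlarges the bound in (4.19), while the constant $\chi_1^*/2$ in (4.18) does not depend on $\kappa$.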
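Proof. To simplify the notation we shall write
\[ \alpha_1 = \chi_1^* + \kappa, \qquad \alpha_2 = \chi_2^* + \kappa, \qquad \alpha_3 \equiv \chi_3^*, \]
so that
\[ \alpha_1 \le \varphi \le \alpha_2, \qquad |\nabla\varphi| \le \alpha_3, \tag{4.20} \]
\[ \text{(i)}\ \ \alpha_1 > 6\,C_P L\,\alpha_3, \qquad \text{(ii)}\ \ 2\alpha_1 > \alpha_2. \tag{4.21} \]
Conditions (4.8) and (4.20) give
\[ \begin{aligned} a_h^d(v_h, \varphi v_h) &= \int_\Omega \varepsilon|\nabla_h v_h|^2\varphi + \sum_{e \notin \Gamma_N}\int_e \big(S_e[\![v_h]\!] - \{\varepsilon\nabla_h v_h\}\big)\cdot[\![v_h]\!]\,\varphi + \int_\Omega \varepsilon\nabla_h v_h\cdot\nabla\varphi\, v_h\\ &\ge \varepsilon\Big(\alpha_1\big(|v_h|_{1,h}^2 + \eta_0\|v_h\|_j^2\big) - \alpha_2 C_g\,|v_h|_{1,h}\|v_h\|_j - \alpha_3\,|v_h|_{1,h}\|v_h\|_{0,\Omega}\Big). \end{aligned} \]
This, using (4.21(ii)) and (4.16), then $\eta_0 \ge 1$ and (2.13), and finally (4.21(i)), gives easily
\[ \begin{aligned} a_h^d(v_h, \varphi v_h) &\ge \varepsilon\Big(\frac{\alpha_1}{3}\big(|v_h|_{1,h}^2 + \eta_0\|v_h\|_j^2\big) - \alpha_3\,|v_h|_{1,h}\|v_h\|_{0,\Omega}\Big)\\ &\ge \varepsilon\,\frac{\alpha_1}{3}\big(|v_h|_{1,h}^2 + \|v_h\|_j^2\big) - \alpha_3 C_P L\,|||v_h|||_d^2 \ge \frac{\alpha_1}{6}\,|||v_h|||_d^2, \end{aligned} \]
that is, (4.17). As regards the reactive-convective part, we observe that, after integration by parts, using (2.11) and the continuity of $\beta$ and $\varphi$ we get
\[ \begin{aligned} -\int_\Omega \beta\cdot\nabla_h(\varphi v_h)\,v_h &= -\int_\Omega (\beta\cdot\nabla\varphi)\,v_h^2 - \frac{1}{2}\int_\Omega \beta\cdot\nabla_h(v_h^2)\,\varphi\\ &= -\frac{1}{2}\int_\Omega (\beta\cdot\nabla\varphi)\,v_h^2 + \frac{1}{2}\int_\Omega (\mathrm{div}\,\beta)\,\varphi v_h^2 - \frac{1}{2}\sum_{e \in \mathcal{E}_h}\int_e \{\beta\varphi\}\cdot[\![v_h^2]\!]. \end{aligned} \tag{4.22} \]
Next, the continuity of $\beta$ and $\varphi$ easily implies that
\[ \sum_{e \in \mathcal{E}_h^\circ}\int_e \{\beta v_h\}\cdot[\![\varphi v_h]\!] = \frac{1}{2}\sum_{e \in \mathcal{E}_h^\circ}\int_e \{\beta\varphi\}\cdot[\![v_h^2]\!]. \]
From this and (2.10) we then have
\[ \sum_{e \in \mathcal{E}_h^\circ}\int_e \{\beta v_h\}_{upw}\cdot[\![\varphi v_h]\!] = \frac{1}{2}\sum_{e \in \mathcal{E}_h^\circ}\int_e \{\beta\varphi\}\cdot[\![v_h^2]\!] + \sum_{e \in \mathcal{E}_h^\circ}\int_e \frac{\beta\cdot n^+}{2}\,\varphi\,|[\![v_h]\!]|^2. \tag{4.23} \]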
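The integrands in (4.23) already agree pointwise on each interior edge, which can be confirmed numerically. The sketch below assumes that $v^+$ denotes the trace from the element whose outward normal is $n^+$ (the upwind side, consistent with (2.10)), that $\beta$ and $\varphi$ are continuous across the edge, and that $n^+$ is a unit vector; the function names and sample values are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def upwind_integrand(beta, n_plus, phi, v_plus, v_minus):
    """{beta v}_upw . [[phi v]] with {beta v}_upw = beta v^+ and [[phi v]] = phi (v^+ - v^-) n^+."""
    flux = tuple(b * v_plus for b in beta)
    jump = tuple(phi * (v_plus - v_minus) * c for c in n_plus)
    return dot(flux, jump)


def split_integrand(beta, n_plus, phi, v_plus, v_minus):
    """(1/2) {beta phi} . [[v^2]] + (beta . n^+ / 2) phi |[[v]]|^2 for continuous beta and phi."""
    mean_flux = tuple(b * phi for b in beta)
    jump_sq = tuple((v_plus ** 2 - v_minus ** 2) * c for c in n_plus)
    penalty = 0.5 * dot(beta, n_plus) * phi * (v_plus - v_minus) ** 2
    return 0.5 * dot(mean_flux, jump_sq) + penalty


# Arbitrary illustrative data with beta . n^+ = 2.0 >= 0.
beta, n_plus = (2.0, 1.0), (0.6, 0.8)
phi, v_plus, v_minus = 1.3, 0.7, -0.4
lhs = upwind_integrand(beta, n_plus, phi, v_plus, v_minus)
rhs = split_integrand(beta, n_plus, phi, v_plus, v_minus)
```

The second term of `split_integrand` is nonnegative whenever $\beta\cdot n^+ \ge 0$ and $\varphi > 0$; this is the contribution that upwinding adds to the edge part of the norm (4.3).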
By noting that (H1) and (4.20) imply $-\beta\cdot\nabla\varphi = (\beta\cdot\nabla\eta)\,\chi \ge 2b_0\chi \ge 2b_0\chi_1^*$, from (4.22)–(4.23), using (4.20), (2.2), and (4.4), we obtain
\[ \begin{aligned} a_h^{rc}(v_h, \varphi v_h) &= \int_\Omega \Big(\gamma + \frac{1}{2}\,\mathrm{div}\,\beta\Big)\varphi v_h^2 - \frac{1}{2}\int_\Omega (\beta\cdot\nabla\varphi)\,v_h^2\\ &\quad + \sum_{e \in \mathcal{E}_h^\circ}\int_e \frac{\beta\cdot n^+}{2}\,\varphi\,|[\![v_h]\!]|^2 - \frac{1}{2}\int_{\Gamma^-}\beta\cdot n\,\varphi v_h^2 + \frac{1}{2}\int_{\Gamma^+}\beta\cdot n\,\varphi v_h^2\\ &\ge \chi_1^*\,\|(\overline{\varrho} + b_0)^{1/2} v_h\|_{0,\Omega}^2 + \frac{\alpha_1}{2}\sum_{e \in \mathcal{E}_h}\|\,|\beta\cdot n|^{1/2}[\![v_h]\!]\,\|_{0,e}^2 \ge \frac{\chi_1^*}{2}\,|||v_h|||_{rc}^2, \end{aligned} \]
that is, (4.18). On the other hand, (4.19) again is an easy consequence of (2.13) and (4.20)–(4.21).
Remark 4.1. We point out that condition (4.16) has been taken in order to simplify the computation and to provide an easily quantifiable constant in (4.17) (very much in the spirit of (4.12) compared with the less demanding (4.10)). Looking at the proof, however, we see that we could stick to (4.10) (changing the conditions on $\kappa$ in (4.15) in order to have $\alpha_2/\alpha_1$ as close to 1 as necessary). Hence, in some sense, the difficulty of finding “how big should $\eta_0$ be in practice” has not been worsened by the above trick.
Remark 4.2. The other three methods (3.6), (3.7), and (3.8) exhibit essentially the same terms, with the only exception of the method (3.8), where the advective part contains
\[ \sum_{e \in \mathcal{E}_h^\circ}\int_e \big((\theta + 1)\{\beta v_h\}_\alpha - \theta\{\beta v_h\}\big)\cdot[\![\varphi v_h]\!] =: I_1, \]
instead of the left-hand term in (4.23). Using the definition (2.9) of the weighted average we obtain, instead of (4.23),
\[ I_1 = \frac{1}{2}\sum_{e \in \mathcal{E}_h^\circ}\int_e \{\beta\varphi\}\cdot[\![v_h^2]\!] + (\theta + 1)\sum_{e \in \mathcal{E}_h^\circ}\int_e \frac{\beta\cdot[\![\alpha]\!]}{2}\,\varphi\,|[\![v_h]\!]|^2, \]
where $\beta\cdot[\![\alpha]\!] = (2\alpha^+ - 1)\,\beta\cdot n^+ > 0$ since $\alpha^+$, the weight associated with the upwind triangle, is $> 1/2$. Hence, (4.18) holds also for method (3.8) (possibly with a different constant) if $\theta > -1$. As already said, choosing $\theta = -1$ in (3.8) produces undesirable cancellations which lead to stability only in a norm too weak to ensure control on the advective part. Namely, we have
\[ a_h(v_h, \varphi v_h) \ge C\Big(\|(\overline{\varrho} + b_0)^{1/2} v_h\|_{0,\Omega}^2 + |||v_h|||_d^2 + \sum_{e \in \Gamma}\|\,|\beta\cdot n|^{1/2}[\![v_h]\!]\,\|_{0,e}^2\Big). \]
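Suboptimal error estimates ($O(h^k)$) in this norm can be obtained, but the method is unstable in strongly advective regimes. Indeed, $\theta = -1$ gives rise to a method without any kind of upwind. The following superapproximation results can be found in [29] and [37]. For convenience we briefly sketch the proof.
Lemma 4.2. Let $\varphi \in W^{k+1,\infty}(\Omega)$ be the function defined in (4.14). For $v_h \in V_h^k$, let $\widetilde{\varphi v_h}$ be the $L^2$-projection of $\varphi v_h$ in $V_h^k$. Then
\[ \|\varphi v_h - \widetilde{\varphi v_h}\|_{0,\Omega} \le C\,\frac{\|\chi\|_{k+1,\infty,\Omega}}{L}\,h\,\|v_h\|_{0,\Omega}, \tag{4.24} \]
\[ |\varphi v_h - \widetilde{\varphi v_h}|_{1,h} \le C\,\frac{\|\chi\|_{k+1,\infty,\Omega}}{L}\,\|v_h\|_{0,\Omega}, \tag{4.25} \]
\[ \Big(\sum_{e \in \mathcal{E}_h}\|\varphi v_h - \widetilde{\varphi v_h}\|_{0,e}^2\Big)^{1/2} \le C\,\frac{\|\chi\|_{k+1,\infty,\Omega}}{L}\,h^{1/2}\,\|v_h\|_{0,\Omega}, \tag{4.26} \]
where $L$ is the diameter of $\Omega$.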
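Proof. We shall deduce (4.24). Observe first that, since $\widetilde{\kappa v_h} \equiv \kappa v_h$, we have $\varphi v_h - \widetilde{\varphi v_h} \equiv \chi v_h - \widetilde{\chi v_h}$. Using classical interpolation results, the definition of the norm (1.2), the inverse inequality (see [12, Theorem 17.2, p. 135]), and $h < L$ we have
\[ \begin{aligned} \|\varphi v_h - \widetilde{\varphi v_h}\|_{0,T} &\le C\,h_T^{k+1}\,|\chi v_h|_{k+1,T} \le C\,h_T^{k+1}\sum_{j=0}^{k}|\chi|_{k+1-j,\infty,T}\,|v_h|_{j,T}\\ &\le C\,\|\chi\|_{k+1,\infty,\Omega}\sum_{j=0}^{k}\frac{h_T^{k+1}}{L^{k+1-j}}\,|v_h|_{j,T}\\ &\le C\,C_{inv}\,\frac{\|\chi\|_{k+1,\infty,\Omega}}{L}\sum_{j=0}^{k}\frac{h_T^{k+1-j}}{L^{k-j}}\,\|v_h\|_{0,T}\\ &\le C\,(k + 1)\,h_T\,\frac{\|\chi\|_{k+1,\infty,\Omega}}{L}\,\|v_h\|_{0,T}. \end{aligned} \tag{4.27} \]
Hence, summing over all elements $T \in \mathcal{T}_h$ we reach (4.24). Exactly in the same way we prove (4.25), while (4.26) is a consequence of (4.24)–(4.25) via the trace inequality (2.12).
Lemma 4.3. In the hypotheses of Lemma 4.1, there exist two positive constants $\chi_4^*$, $\chi_5^*$ such that, for any value of $\kappa$, the corresponding $\varphi$ verifies
\[ a_h^d(v_h, \varphi v_h - \widetilde{\varphi v_h}) \le \chi_4^*\,|||v_h|||_d^2 \qquad \forall v_h \in V_h^k, \tag{4.28} \]
\[ a_h^{rc}(v_h, \varphi v_h - \widetilde{\varphi v_h}) \le \chi_5^*\Big(\frac{h}{L}\Big)^{1/2}|||v_h|||_{rc}^2 \qquad \forall v_h \in V_h^k. \tag{4.29} \]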
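Proof. Using estimates (4.25)–(4.26) from Lemma 4.2, and then (2.13), we see that
\[ |||\widetilde{\varphi v_h} - \varphi v_h|||_d \le C\,\varepsilon^{1/2}\,\frac{\|\chi\|_{k+1,\infty,\Omega}}{L}\,\|v_h\|_{0,\Omega} \le C\,C_P\,\|\chi\|_{k+1,\infty,\Omega}\,|||v_h|||_d. \]
Hence, from (4.9) we have
\[ a_h^d(v_h, \widetilde{\varphi v_h} - \varphi v_h) \le C_d\,|||v_h|||_d\,|||\widetilde{\varphi v_h} - \varphi v_h|||_d \le C_d\,C\,C_P\,\|\chi\|_{k+1,\infty,\Omega}\,|||v_h|||_d^2, \]
that is, (4.28) with $\chi_4^* = C_d\,C\,C_P\,\|\chi\|_{k+1,\infty,\Omega}$. Before dealing with the reactive-convective part we observe that, if $P_h^0\beta$ is the $L^2$-projection of $\beta$ onto constants, by definition of $\widetilde{\varphi v_h}$ it holds that
\[ \int_\Omega P_h^0\beta\cdot\nabla_h v_h\,(\varphi v_h - \widetilde{\varphi v_h}) = 0. \]
By integrating by parts and using (2.11) and (2.10) we then have
\[ \begin{aligned} a_h^{rc}(v_h, \widetilde{\varphi v_h} - \varphi v_h) &= \int_\Omega [\gamma + \mathrm{div}\,\beta]\,v_h\,(\widetilde{\varphi v_h} - \varphi v_h) + \int_\Omega [\beta - P_h^0\beta]\cdot\nabla_h v_h\,(\widetilde{\varphi v_h} - \varphi v_h)\\ &\quad - \sum_{e \notin \Gamma^+}\int_e \beta\cdot[\![v_h]\!]\,\{\widetilde{\varphi v_h} - \varphi v_h\} + \sum_{e \in \mathcal{E}_h^\circ}\int_e \frac{\beta\cdot n^+}{2}\,[\![v_h]\!]\cdot[\![\widetilde{\varphi v_h} - \varphi v_h]\!]\\ &= I + II + III + IV. \end{aligned} \]
From (2.2), (H3), and (2.4) we have
\[ \begin{aligned} I &= \int_\Omega \varrho\, v_h\,(\widetilde{\varphi v_h} - \varphi v_h) + \frac{1}{2}\int_\Omega \mathrm{div}\,\beta\; v_h\,(\widetilde{\varphi v_h} - \varphi v_h)\\ &\le c\,\|(\overline{\varrho} + b_0)^{1/2} v_h\|_{0,\Omega}\,\|(\overline{\varrho} + b_0)^{1/2}(\widetilde{\varphi v_h} - \varphi v_h)\|_{0,\Omega} + \frac{b_0}{2c_\beta}\,\|v_h\|_{0,\Omega}\,\|\widetilde{\varphi v_h} - \varphi v_h\|_{0,\Omega}. \end{aligned} \]
On the other hand, the definition (4.4) of $\overline{\varrho}$ and estimate (4.24) from Lemma 4.2 give
\[ \begin{aligned} \|(\overline{\varrho} + b_0)^{1/2}(\widetilde{\varphi v_h} - \varphi v_h)\|_{0,\Omega}^2 &= \sum_{T \in \mathcal{T}_h}(\overline{\varrho}_T + b_0)\,\|\widetilde{\varphi v_h} - \varphi v_h\|_{0,T}^2\\ &\le C\,\|\chi\|_{k+1,\infty,\Omega}^2\Big(\frac{h}{L}\Big)^2\sum_{T \in \mathcal{T}_h}(\overline{\varrho}_T + b_0)\,\|v_h\|_{0,T}^2 = C\,\|\chi\|_{k+1,\infty,\Omega}^2\Big(\frac{h}{L}\Big)^2\|(\overline{\varrho} + b_0)^{1/2} v_h\|_{0,\Omega}^2, \end{aligned} \]
so that
\[ I \le C\,\|\chi\|_{k+1,\infty,\Omega}\,\frac{h}{L}\,\|(\overline{\varrho} + b_0)^{1/2} v_h\|_{0,\Omega}^2. \tag{4.30} \]
Classical approximation results, (4.24), (2.4), and the inverse inequality give
\[ II \le C\,h\,|\beta|_{1,\infty,\Omega}\,|v_h|_{1,h}\,\frac{\|\chi\|_{k+1,\infty,\Omega}}{L}\,h\,\|v_h\|_{0,\Omega} \le C\,\|\chi\|_{k+1,\infty,\Omega}\,\frac{h}{L}\,\frac{b_0}{c_\beta}\,\|v_h\|_{0,\Omega}^2. \tag{4.31} \]
Finally, from (4.26) we deduce
\[ \begin{aligned} III + IV &\le C\,\frac{h^{1/2}}{L}\,\|\beta\|_{0,\infty,\Omega}^{1/2}\,\|v_h\|_{0,\Omega}\Big(\sum_{e \in \mathcal{E}_h}\|\,|\beta\cdot n|^{1/2}[\![v_h]\!]\,\|_{0,e}^2\Big)^{1/2}\|\chi\|_{k+1,\infty,\Omega}\\ &\le C\Big(\frac{h}{L}\Big)^{1/2}\Big(b_0\,\|v_h\|_{0,\Omega}^2 + \sum_{e \in \mathcal{E}_h}\|\,|\beta\cdot n|^{1/2}[\![v_h]\!]\,\|_{0,e}^2\Big)\|\chi\|_{k+1,\infty,\Omega}. \end{aligned} \tag{4.32} \]
Collecting (4.30)–(4.32) we then get
\[ a_h^{rc}(v_h, \widetilde{\varphi v_h} - \varphi v_h) \le C\Big(\frac{h}{L}\Big)^{1/2}\|\chi\|_{k+1,\infty,\Omega}\,|||v_h|||_{rc}^2, \]
that is, (4.29) with $\chi_5^* = C\,\|\chi\|_{k+1,\infty,\Omega}$.
The next theorem provides the first stability result for the variational formulations presented in section 3.
Theorem 4.4. In the hypotheses of Lemma 4.1, there exist a positive constant $\alpha_S = \alpha_S(\beta, \Omega)$ and $h_0 = h_0(\beta) > 0$ such that, for $h < h_0$,
\[ \sup_{v_h \in V_h^k}\frac{a_h(u_h, v_h)}{|||v_h|||} \ge \alpha_S\,|||u_h||| \qquad \forall u_h \in V_h^k. \]
Proof. For $u_h \in V_h^k$, let $v_h = \widetilde{\varphi u_h} \in V_h^k$ be the $L^2$-projection of $\varphi u_h$ as defined previously. We shall prove that
\[ |||v_h||| \le c_1\,|||u_h|||, \tag{4.33} \]
\[ a_h(u_h, v_h) \ge c_2\,|||u_h|||^2. \tag{4.34} \]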
Adding and subtracting $\varphi u_h$, from (4.17) we have first
\[ a_h^d(u_h, \widetilde{\varphi u_h}) = a_h^d(u_h, \widetilde{\varphi u_h} - \varphi u_h) + a_h^d(u_h, \varphi u_h) \ge a_h^d(u_h, \widetilde{\varphi u_h} - \varphi u_h) + \frac{\chi_1^* + \kappa}{6}\,|||u_h|||_d^2. \]
Using estimate (4.28) we then have easily that, for $\chi_1^* + \kappa$ bigger than $12\chi_4^*$,
\[ a_h^d(u_h, \widetilde{\varphi u_h}) \ge \chi_4^*\,|||u_h|||_d^2. \]
In a similar way, from (4.29) and (4.18) one has, for $h < h_0$,
\[ a_h^{rc}(u_h, \widetilde{\varphi u_h}) \ge C\,|||u_h|||_{rc}^2, \]
with $C$ depending only on $\chi_1^*$, $\chi_5^*$. On the other hand, using (4.19) and Lemma 4.2, we have easily
\[ |||\widetilde{\varphi u_h}||| \le c_1\,|||u_h|||, \]
that is, (4.33), with $c_1$ depending on $\chi_1^*$ and $\|\chi\|_{k+1,\Omega}$.
Stability in a stronger norm. In a strongly advection-dominated regime it is desirable to have control also on the streamline derivative; that is, it is necessary to have in (4.3) a term of SUPG type. We set
\[ |||v|||_{DG}^2 := |||v|||^2 + \|v\|_S^2, \qquad \|v\|_S^2 := \sum_{T \in \mathcal{T}_h}\frac{h_T}{\|\beta\|_{0,\infty,T}}\,\|P_h^k(\beta\cdot\nabla v)\|_{0,T}^2, \tag{4.35} \]
where $P_h^k$ again is the $L^2$-projection on $V_h^k$.
Remark 4.3. The presence of the projection in (4.35) is due to the fact that we assumed $\beta$ to be a variable function, and hence $\beta\cdot\nabla_h u_h \notin V_h^k$. Clearly, whenever $\beta\cdot\nabla_h u_h \in V_h^k$, that is, if $\beta$ is either constant (see [25], [20], [10]) or piecewise linear (see [23]), the projection can be removed. Stability in the norm (4.35) can again be achieved through an inf-sup condition.
Lemma 4.5. There exists a constant $C_S > 0$, independent of $h$, $\varepsilon$, $\beta$, $\gamma$, such that
\[ \sup_{v_h \in V_h^k}\frac{a_h(u_h, v_h)}{|||v_h|||} \ge C_S\,\big(\|u_h\|_S - |||u_h|||\big) \qquad \forall u_h \in V_h^k. \tag{4.36} \]
Proof. For $u_h \in V_h^k$, let $P_h^k(\beta\cdot\nabla_h u_h) \in V_h^k$ be the $L^2$-projection on $V_h^k$ of $\beta\cdot\nabla_h u_h$, for which the following estimates hold:
\[ \forall T \in \mathcal{T}_h:\quad |P_h^k(\beta\cdot\nabla u_h)|_{1,T} \le C_{inv}\,h_T^{-1}\,\|P_h^k(\beta\cdot\nabla u_h)\|_{0,T}, \tag{4.37} \]
and, for any edge $e$, shared by two elements $T^+$ and $T^-$,
\[ \begin{aligned} \|[\![P_h^k(\beta\cdot\nabla_h u_h)]\!]\|_{0,e}^2 &\le C\,|e|^{-1}\,\|P_h^k(\beta\cdot\nabla u_h)\|_{0,T^+\cup T^-}^2,\\ \|\{P_h^k(\beta\cdot\nabla_h u_h)\}\|_{0,e}^2 &\le C\,|e|^{-1}\,\|P_h^k(\beta\cdot\nabla u_h)\|_{0,T^+\cup T^-}^2. \end{aligned} \tag{4.38} \]
Inequality (4.37) is the usual inverse inequality, while (4.38) is deduced through the trace inequality (2.12) and (4.37). We then set $v_h = \sum_{T \in \mathcal{T}_h} c_T\,\big(P_h^k(\beta\cdot\nabla_h u_h)\big)|_T$, where
\[ c_T = \begin{cases} \dfrac{h_T}{\|\beta\|_{0,\infty,T}} & \text{if advection dominates in } T,\\[1ex] 0 & \text{otherwise.} \end{cases} \]
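We shall prove that
\[ |||v_h||| \le C_1\,\|u_h\|_S, \tag{4.39} \]
\[ a_h(u_h, v_h) \ge C_2\,\big(\|u_h\|_S^2 - |||u_h|||\,\|u_h\|_S\big). \tag{4.40} \]
We prove first (4.39), having in mind that, if advection dominates, then
\[ \varepsilon < h_T\,\|\beta\|_{0,\infty,T}/2, \qquad \|\gamma + \mathrm{div}\,\beta\|_{0,\infty,T} < \|\beta\|_{0,\infty,T}/h_T \qquad \forall T \in \mathcal{T}_h. \tag{4.41} \]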
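The elementwise weight $c_T$ used to build the test function can be sketched in code. One caveat: the text states (4.41) as a consequence of advection dominance, whereas the sketch below uses (4.41) as the test that decides whether advection dominates in $T$; that reading, the function names, and the argument names are assumptions made only for this illustration.

```python
def advection_dominates(eps, h_T, beta_inf_T, reaction_inf_T):
    """Use (4.41) as an elementwise criterion (assumed reading).

    eps            : diffusion coefficient
    h_T            : diameter of the element T
    beta_inf_T     : ||beta||_{0,inf,T}
    reaction_inf_T : ||gamma + div(beta)||_{0,inf,T}
    """
    return eps < h_T * beta_inf_T / 2.0 and reaction_inf_T < beta_inf_T / h_T


def c_T(eps, h_T, beta_inf_T, reaction_inf_T):
    """Weight h_T / ||beta||_{0,inf,T} where advection dominates, 0 otherwise."""
    if advection_dominates(eps, h_T, beta_inf_T, reaction_inf_T):
        return h_T / beta_inf_T
    return 0.0


def diffusion_weight(eps, h_T, beta_inf_T, reaction_inf_T):
    """eps * c_T^2 / h_T^2, the factor produced by the inverse inequality (4.37) in eps |v_h|_1^2."""
    c = c_T(eps, h_T, beta_inf_T, reaction_inf_T)
    return eps * c * c / (h_T * h_T)
```

When the criterion holds, `diffusion_weight` is bounded by $h_T/(2\|\beta\|_{0,\infty,T})$, that is, by half of the weight appearing in $\|\cdot\|_S$; this is the mechanism behind the bound $\varepsilon|v_h|_{1,h}^2 \le C\|u_h\|_S^2$.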
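From (4.37) and (4.41) we deduce
\[ \varepsilon|v_h|_{1,h}^2 = \sum_{T \in \mathcal{T}_h}\varepsilon\Big(\frac{h_T}{\|\beta\|_{0,\infty,T}}\Big)^2|P_h^k(\beta\cdot\nabla_h u_h)|_{1,T}^2 \le C\,\|u_h\|_S^2. \tag{4.42} \]
Similarly, from (4.38) and (4.41) we have
\[ \sum_{e \notin \Gamma_N} S_e\,\|[\![v_h]\!]\|_{0,e}^2 = \sum_{e \notin \Gamma_N} c_e\,\frac{\varepsilon}{|e|}\,\|[\![c_T P_h^k(\beta\cdot\nabla_h u_h)]\!]\|_{0,e}^2 \le C\,\|u_h\|_S^2 \tag{4.43} \]
and
\[ \sum_{e \in \mathcal{E}_h}\|\,|\beta\cdot n|^{1/2}[\![v_h]\!]\,\|_{0,e}^2 = \sum_{e \in \mathcal{E}_h}\|\,|\beta\cdot n|^{1/2}[\![c_T P_h^k(\beta\cdot\nabla_h u_h)]\!]\,\|_{0,e}^2 \le C\,\|u_h\|_S^2. \tag{4.44} \]
Since $\varrho = (\gamma + \mathrm{div}\,\beta) - \frac{1}{2}\,\mathrm{div}\,\beta$, in view of (4.41) and (2.4) we deduce
\[ \|\varrho\|_{0,\infty,T} \le \|\gamma + \mathrm{div}\,\beta\|_{0,\infty,T} + \frac{1}{2}\,\|\mathrm{div}\,\beta\|_{0,\infty,T} \le \frac{\|\beta\|_{0,\infty,T}}{h_T} + \frac{\|\beta\|_{1,\infty,\Omega}}{2L}. \]
Hence, from (H2) and since $h_T \le h < L$ we deduce
\[ c_T\,\|\varrho\|_{0,\infty,T} \le 1 + \frac{h_T}{2Lc_\beta} \le 1 + \frac{1}{2c_\beta}. \]
Consequently,
\[ \|\overline{\varrho}^{1/2} v_h\|_{0,\Omega}^2 \le \sum_{T \in \mathcal{T}_h}\|\varrho\|_{0,\infty,T}\,c_T^2\,\|P_h^k(\beta\cdot\nabla u_h)\|_{0,T}^2 \le C\,\|u_h\|_S^2. \tag{4.45} \]
Finally, always from (H2),
\[ \|v_h\|_{0,\Omega}^2 = \sum_{T \in \mathcal{T}_h}\Big(\frac{h_T}{\|\beta\|_{0,\infty,T}}\Big)^2\|P_h^k(\beta\cdot\nabla_h u_h)\|_{0,T}^2 \le \frac{h}{c_\beta\,\|\beta\|_{1,\infty,\Omega}}\,\|u_h\|_S^2, \tag{4.46} \]
and then, since $b_0 = \|\beta\|_{0,\infty,\Omega}/L$, $\|\beta\|_{0,\infty,\Omega} \le \|\beta\|_{1,\infty,\Omega}$, and $h < L$,
\[ b_0\,\|v_h\|_{0,\Omega}^2 \le \frac{1}{c_\beta}\,\|u_h\|_S^2. \]
This and (4.45) can be written as
\[ \|(\overline{\varrho} + b_0)^{1/2} v_h\|_{0,\Omega}^2 \le C\,\|u_h\|_S^2, \tag{4.47} \]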
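and (4.39) is proved. We turn now to prove (4.40), again referring to formulation (3.4). For the diffusive part we have, via the Cauchy–Schwarz inequality and (4.42),
\[ \int_\Omega \varepsilon\nabla_h u_h\cdot\nabla_h v_h \le \varepsilon^{1/2}|u_h|_{1,h}\,\varepsilon^{1/2}|v_h|_{1,h} \le C\,\varepsilon^{1/2}|u_h|_{1,h}\,\|u_h\|_S. \]
For the integrals on the edges, the Cauchy–Schwarz inequality and (4.43) give
\[ \sum_{e \notin \Gamma_N}\int_e S_e\,[\![u_h]\!]\cdot[\![v_h]\!] \le C\Big(\sum_{e \notin \Gamma_N} S_e\,\|u_h\|_{0,e}^2\Big)^{1/2}\|u_h\|_S \le C\,\|u_h\|_j\,\|u_h\|_S. \]
In an analogous way, the Cauchy–Schwarz inequality, trace inequality (2.12), the inverse inequality, and (4.43) give
\[ \sum_{e \notin \Gamma_N}\int_e \{\varepsilon\nabla_h u_h\}\cdot[\![v_h]\!] \le C\,\varepsilon^{1/2}|u_h|_{1,h}\,\|u_h\|_S, \]
so that
\[ a_h^d(u_h, v_h) \le C\,|||u_h|||\,\|u_h\|_S. \tag{4.48} \]
For the reactive and advective terms, integration by parts, formula (2.11), and the definition of the upwind average (2.10) give
\[ \begin{aligned} a_h^{rc}(u_h, v_h) &= \int_\Omega \varrho\,u_h v_h + \int_\Omega (\beta\cdot\nabla_h u_h)\,v_h + \frac{1}{2}\int_\Omega \mathrm{div}\,\beta\;u_h v_h\\ &\quad + \sum_{e \in \mathcal{E}_h^\circ}\int_e \frac{\beta\cdot n^+}{2}\,[\![u_h]\!]\cdot[\![v_h]\!] - \sum_{e \notin \Gamma^+}\int_e \beta\cdot[\![u_h]\!]\,\{v_h\}. \end{aligned} \]
By definition of projection we have
\[ \int_\Omega (\beta\cdot\nabla_h u_h)\,v_h = \int_\Omega P_h^k(\beta\cdot\nabla_h u_h)\,v_h = \|u_h\|_S^2, \tag{4.49} \]
and by the Cauchy–Schwarz inequality, (H2), and (4.47)
\[ \int_\Omega \varrho\,u_h v_h \le c\,\|(\overline{\varrho} + b_0)^{1/2} u_h\|_{0,\Omega}\,\|(\overline{\varrho} + b_0)^{1/2} v_h\|_{0,\Omega} \le C\,\|(\overline{\varrho} + b_0)^{1/2} u_h\|_{0,\Omega}\,\|u_h\|_S. \tag{4.50} \]
Using (2.4), (4.46), and (H2) we obtain
\[ \begin{aligned} \int_\Omega \mathrm{div}\,\beta\;u_h v_h &\le \frac{\|\beta\|_{1,\infty,\Omega}}{L}\,\|u_h\|_{0,\Omega}\Big(\frac{h}{c_\beta\,\|\beta\|_{1,\infty,\Omega}}\Big)^{1/2}\|u_h\|_S\\ &\le \frac{b_0^{1/2}}{c_\beta}\Big(\frac{h}{L}\Big)^{1/2}\|u_h\|_{0,\Omega}\,\|u_h\|_S \le C\,|||u_h|||\,\|u_h\|_S. \end{aligned} \tag{4.51} \]
Finally, from the Cauchy–Schwarz inequality and (4.44) we easily obtain
\[ \sum_{e \in \mathcal{E}_h^\circ}\int_e \frac{\beta\cdot n^+}{2}\,[\![u_h]\!]\cdot[\![v_h]\!] \le C\Big(\sum_{e \in \mathcal{E}_h}\|\,|\beta\cdot n|^{1/2}[\![u_h]\!]\,\|_{0,e}^2\Big)^{1/2}\|u_h\|_S. \tag{4.52} \]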
Collecting (4.49), (4.50), (4.51), and (4.52) we obtain
\[ a_h^{rc}(u_h, v_h) \ge \|u_h\|_S^2 - C |||u_h|||\, \|u_h\|_S. \]
From (4.48) and the above estimate we then have
\[ a_h(u_h, v_h) \ge \|u_h\|_S^2 - C |||u_h|||\, \|u_h\|_S, \]
which, together with (4.39), gives (4.36).

Theorem 4.6. There exists a constant $C_S = C_S(\beta, \Omega) > 0$, and $h_0 = h_0(\beta) > 0$, such that, for $h < h_0$,
\[ \sup_{v_h \in V_h^k} \frac{a_h(u_h, v_h)}{|||v_h|||} \ge C_S |||u_h|||_{DG} \quad \forall u_h \in V_h^k. \]
Proof. The result follows from Theorem 4.4 and Lemma 4.5.

We finally conclude by proving a result which provides stability in a norm of SUPG type but without the projection. However, this requires stronger regularity assumptions on $\beta$, dictated by the polynomial degree. More precisely, when using $V_h^k$, we can prove stability in the norm
\[ |||u_h|||_{SS}^2 := |||u_h|||^2 + \|u_h\|_\beta^2, \quad \text{with} \quad \|u_h\|_\beta^2 = \sum_{T \in \mathcal{T}_h} \frac{h_T}{\|\beta\|_{0,\infty,T}} \|\beta \cdot \nabla u_h\|_{0,T}^2, \tag{4.53} \]
only if $\beta \in W^{k,\infty}(\Omega)$. In other words, our initial assumption $\beta \in W^{1,\infty}(\Omega)$ guarantees stability in the norm (4.53) only for piecewise linear approximations.

Theorem 4.7. Let $\beta \in W^{k,\infty}(\Omega)$, $k \ge 1$ being the polynomial degree of $V_h^k$. Assume that
\[ \exists\, c_\beta > 0 \text{ such that } |\beta(x)| \ge c_\beta \|\beta\|_{k,\infty,\Omega} \quad \forall x \in \Omega. \tag{H2a} \]
Then, there exists a constant $C_{ss} = C_{ss}(\beta, \Omega) > 0$, and $h_0 = h_0(\beta) > 0$, such that, for $h < h_0$,
\[ \sup_{v_h \in V_h^k} \frac{a_h(u_h, v_h)}{|||v_h|||} \ge C_{ss} |||u_h|||_{SS} \quad \forall u_h \in V_h^k. \tag{4.54} \]
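Proof. The proof is accomplished by proceeding similarly as for Theorem 4.6, and we omit the details. Indeed, the only step that needs to be modified is (4.49), as all the others hold with the norm $\|\cdot\|_S$ replaced by $\|\cdot\|_\beta$, by simply using the stability of the $L^2$-projection. By adding and subtracting $\sum_{T \in \mathcal{T}_h} c_T (\beta \cdot \nabla u_h)|_T$ we find
\[ \begin{aligned} \int_\Omega (\beta \cdot \nabla_h u_h) v_h &= \|u_h\|_\beta^2 + \int_\Omega c_T (\beta \cdot \nabla_h u_h) \big[ P_h^k(\beta \cdot \nabla_h u_h) - \beta \cdot \nabla_h u_h \big] \\ &\ge \|u_h\|_\beta^2 - \|u_h\|_\beta \Big( \sum_{T \in \mathcal{T}_h} c_T \|P_h^k(\beta \cdot \nabla u_h) - \beta \cdot \nabla u_h\|_{0,T}^2 \Big)^{1/2}. \end{aligned} \]
To estimate the second term, note that the regularity of $\beta$ allows us to use the superapproximation property (4.27) (with $\beta$ now playing the role of $\varphi$, and $\nabla u_h$ playing the role of $v_h$). This plus inverse inequality and (H2a) give
\[ \|P_h^k(\beta \cdot \nabla u_h) - \beta \cdot \nabla u_h\|_{0,T} \le C h_T^k |\beta \cdot \nabla u_h|_{k,T} \le C_k \frac{\|\beta\|_{k,\infty,\Omega}}{L} h_T \|\nabla u_h\|_{0,T} \le C \frac{\|\beta\|_{k,\infty,\Omega}}{L} \|u_h\|_{0,T} \le C \frac{\|\beta\|_{0,\infty,T}}{c_\beta L} \|u_h\|_{0,T}. \]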
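Since $h < L$ we then have
\[ \sum_{T \in \mathcal{T}_h} c_T \|P_h^k(\beta \cdot \nabla u_h) - \beta \cdot \nabla u_h\|_{0,T}^2 \le C \sum_{T \in \mathcal{T}_h} \frac{h_T}{L} \frac{\|\beta\|_{0,\infty,T}}{c_\beta^2 L} \|u_h\|_{0,T}^2 \le \frac{C}{c_\beta^2}\, b_0 \|u_h\|_{0,\Omega}^2. \]
Thus,
\[ \int_\Omega (\beta \cdot \nabla_h u_h) v_h \ge \|u_h\|_\beta^2 - C \|u_h\|_\beta |||u_h|||. \]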
Then, the result (4.54) follows.

5. A priori error estimates. We next show a priori error estimates in the norms (4.3) and (4.35) for the methods presented. Let $P_h^k$ be the $L^2$-projection in $V_h^k$, for which the following local approximation properties hold:
\[ \|u - P_h^k u\|_{r,T} \le C h^{k+1-r} |u|_{k+1,T}, \quad r = 0, 1, 2, \quad T \in \mathcal{T}_h, \tag{5.1} \]
\[ \|u - P_h^k u\|_{r,p,T} \le C h^{k+1-r} |u|_{k+1,p,T}, \quad 1 \le p \le \infty, \quad r = 0, 1, \quad T \in \mathcal{T}_h. \tag{5.2} \]
Moreover, from (5.1) and (2.12) we deduce that
\[ \|u - P_h^k u\|_{0,e} \le C h_T^{k+1/2} |u|_{k+1,T} \quad \forall e \in \mathcal{E}_h. \tag{5.3} \]
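As a rough numerical illustration of the approximation property (5.1) with $r = 0$, the following sketch computes the elementwise $L^2$-projection error in one space dimension and estimates the observed order, which should be close to $k+1$. It is only a sanity check under stated assumptions: a uniform mesh of $(0,1)$, the test function $\sin(2\pi x)$, degrees $k \le 2$, and a fixed 5-point Gauss rule. The function names are illustrative and are not taken from the paper.

```python
import math

# 5-point Gauss-Legendre rule on [-1, 1] (exact for polynomials of degree <= 9).
GAUSS_X = [-0.9061798459386640, -0.5384693101056831, 0.0,
           0.5384693101056831, 0.9061798459386640]
GAUSS_W = [0.2369268850561891, 0.4786286704993665, 0.5688888888888889,
           0.4786286704993665, 0.2369268850561891]


def legendre(j, x):
    """Legendre polynomial P_j(x), evaluated with the three-term recurrence."""
    if j == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, j):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def l2_projection_error(u, k, n_cells):
    """L2(0,1) error of the cellwise L2-projection of u onto degree-k polynomials.

    Assumption: uniform mesh with n_cells cells; intended for k <= 2 only,
    because the quadrature rule is fixed.
    """
    h = 1.0 / n_cells
    err2 = 0.0
    for i in range(n_cells):
        mid = (i + 0.5) * h
        vals = [u(mid + 0.5 * h * x) for x in GAUSS_X]
        # Coefficient of P_j on the reference cell: (2j+1)/2 * int u P_j.
        coeffs = [
            (2 * j + 1) / 2.0
            * sum(w * v * legendre(j, x) for w, v, x in zip(GAUSS_W, vals, GAUSS_X))
            for j in range(k + 1)
        ]
        for w, v, x in zip(GAUSS_W, vals, GAUSS_X):
            proj = sum(c * legendre(j, x) for j, c in enumerate(coeffs))
            err2 += 0.5 * h * w * (v - proj) ** 2
    return math.sqrt(err2)


def observed_order(u, k, n_coarse=8):
    """Observed convergence order from two meshes with sizes h and h/2."""
    e_coarse = l2_projection_error(u, k, n_coarse)
    e_fine = l2_projection_error(u, k, 2 * n_coarse)
    return math.log2(e_coarse / e_fine)
```

Calling `observed_order(lambda x: math.sin(2 * math.pi * x), k)` for `k = 0, 1, 2` gives a quick check that halving the mesh size reduces the error by roughly a factor $2^{k+1}$.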
Theorem 5.1. Let $u$ be the solution of (2.1), and let $u_h$ be the solution of the discrete problems (4.1). There exists a constant $C_0 = C_0(\Omega)$, depending on the domain $\Omega$, the shape regularity of $\mathcal{T}_h$, and the polynomial degree (but independent of $h$ and the coefficients of the problem), such that
\[ |||u - u_h||| \le C_0(\Omega)\, h^k \Big( \varepsilon^{1/2} + \|\beta\|_{0,\infty,\Omega}^{1/2} h^{1/2} + \|\varrho\|_{0,\infty,\Omega}^{1/2} h \Big) |u|_{k+1,\Omega}. \tag{5.4} \]
Proof. We define
\[ \eta = u - P_h^k u, \qquad \delta = u_h - P_h^k u. \]
From Theorem 4.4 and Galerkin orthogonality (4.2) we have
\[ \alpha_S |||\delta||| \le \frac{a_h(\delta, v_h)}{|||v_h|||} = \frac{a_h(\eta, v_h)}{|||v_h|||}. \tag{5.5} \]
The diffusive part is standard and can be easily estimated through the trace inequality (2.12), (5.1), and (5.3):
\[ a_h^d(\eta, v_h) \le C h^k \varepsilon^{1/2} |u|_{k+1,\Omega} |||v_h|||_d. \tag{5.6} \]
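Regarding the advective part, since $P_h^0 \beta \cdot \nabla_h v_h \in V_h^k$, by definition of projection
\[ \int_\Omega (P_h^0 \beta \cdot \nabla_h v_h)\, \eta = 0. \]
From this, the Cauchy–Schwarz inequality, (5.2), the inverse inequality, (2.4), and (5.1) we have
\[ \begin{aligned} -\int_\Omega (\beta \cdot \nabla_h v_h)\, \eta &= \int_\Omega (P_h^0 \beta - \beta) \cdot \nabla_h v_h\, \eta \le C h |\beta|_{1,\infty,\Omega} |v_h|_{1,h} \|\eta\|_{0,\Omega} \\ &\le C \frac{\|\beta\|_{1,\infty,\Omega}}{L} \|v_h\|_{0,\Omega} \|\eta\|_{0,\Omega} \le C \frac{b_0}{c_\beta} \|v_h\|_{0,\Omega}\, h^{k+1} |u|_{k+1,\Omega} \\ &\le C h^{k+1} b_0^{1/2} |u|_{k+1,\Omega} |||v_h||| = C \Big( \frac{\|\beta\|_{0,\infty,\Omega}}{L} \Big)^{1/2} h^{k+1} |u|_{k+1,\Omega} |||v_h|||. \end{aligned} \tag{5.7} \]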
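Using (5.3) we obtain
\[ \sum_e \int_e \{\beta \eta\} \cdot [[v_h]] \le \|\beta\|_{0,\infty,\Omega}^{1/2} \sum_e \|\{\eta\}\|_{0,e}\, \| |\beta \cdot n|^{1/2} [[v_h]] \|_{0,e} \le C \|\beta\|_{0,\infty,\Omega}^{1/2} h^{k+1/2} |u|_{k+1,\Omega} |||v_h|||, \tag{5.8} \]
and, arguing similarly, we have
\[ \sum_e \int_e \frac{|\beta \cdot n^+|}{2} [[\eta]] \cdot [[v_h]] \le C \|\beta\|_{0,\infty,\Omega}^{1/2} h^{k+1/2} |u|_{k+1,\Omega} |||v_h|||. \tag{5.9} \]
Finally, by writing $\gamma = \varrho - \mathrm{div}\beta/2$, using (H3), (2.4), and (5.1) we obtain
\[ \begin{aligned} \int_\Omega \gamma \eta v_h &\le \|\varrho\|_{0,\infty,\Omega}^{1/2} \|\eta\|_{0,\Omega}\, c_\rho^{1/2} \|(\varrho + b_0)^{1/2} v_h\|_{0,\Omega} + \frac{b_0^{1/2}}{c_\beta} \|\eta\|_{0,\Omega}\, b_0^{1/2} \|v_h\|_{0,\Omega} \\ &\le C h^{k+1} \Big( \|\varrho\|_{0,\infty,\Omega}^{1/2} + \Big( \frac{\|\beta\|_{0,\infty,\Omega}}{L} \Big)^{1/2} \Big) |u|_{k+1,\Omega} |||v_h|||. \end{aligned} \tag{5.10} \]
Then collecting (5.6)–(5.10) and using $h/L < 1$ we obtain
\[ a_h(\eta, v_h) \le C h^k \Big( \varepsilon^{1/2} + \|\beta\|_{0,\infty,\Omega}^{1/2} h^{1/2} + \|\varrho\|_{0,\infty,\Omega}^{1/2} h \Big) |u|_{k+1,\Omega} |||v_h|||. \]
Hence, substituting this estimate into (5.5) gives
\[ |||\delta||| \le C(\Omega)\, h^k \Big( \varepsilon^{1/2} + \|\beta\|_{0,\infty,\Omega}^{1/2} h^{1/2} + \|\varrho\|_{0,\infty,\Omega}^{1/2} h \Big) |u|_{k+1,\Omega}. \]
The result (5.4) then follows by the triangle inequality.

Theorem 5.2. Let $u$ be the solution of (2.1), and let $u_h$ be the solution of the discrete problems (4.1). There exists a constant $C_1 = C_1(\Omega)$, depending on $\Omega$, the shape regularity of $\mathcal{T}_h$, and the polynomial degree (but independent of $\gamma$, $\beta$, $\varepsilon$, and $h$), such that
\[ |||u - u_h|||_{DG} \le C_1(\Omega)\, h^k \Big( \varepsilon^{1/2} + \|\beta\|_{0,\infty,\Omega}^{1/2} h^{1/2} + \|\varrho\|_{0,\infty,\Omega}^{1/2} h \Big) |u|_{k+1,\Omega}. \]
Proof. The proof follows the same steps of Theorem 5.1, using the stability result of Theorem 4.6. Hence we omit the details.

Remark 5.1. The same error estimates hold in the norm $|||\cdot|||_{SS}$ under the assumption $\beta \in W^{k,\infty}(\Omega)$.

Remark 5.2. Theorems 5.1 and 5.2 provide robust a priori error estimates, which are optimal in all regimes. More precisely, we have
\[ |||u - u_h|||,\ |||u - u_h|||_{DG} \simeq \begin{cases} O(h^{k+1/2}) & \text{if advection dominates}, \\ O(h^k) & \text{if diffusion dominates}, \\ O(h^{k+1}) & \text{if reaction dominates}. \end{cases} \]
Corollary 5.3. As a direct consequence of our error analysis we have the following result:
\[ \|u - u_h\|_{0,\Omega} \le C_2 |u|_{k+1,\Omega} \begin{cases} h^{k+1/2} & \text{if advection dominates}, \\ h^k & \text{if diffusion dominates}, \\ h^{k+1} & \text{if reaction dominates}, \end{cases} \tag{5.11} \]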
where $C_2$ depends on the domain $\Omega$, the shape regularity of $\mathcal{T}_h$, the polynomial degree, and the coefficients of the problem $\gamma$, $\beta$, and $\varepsilon$ (but is independent of $h$).

Remark 5.3. Estimate (5.11) is suboptimal in the diffusion-dominated regime, since it was simply obtained through (2.13) and (5.4). In the advection-dominated regime, although suboptimal by half an order, it is the best that one can expect for a regular triangulation without any further assumption on the construction or orientation of the mesh (see [30] for a counterexample in the pure hyperbolic case). Improved estimates in the case of constant $\beta$ have been rigorously shown in [31] (for the pure hyperbolic case) under certain restrictions on the mesh and, more recently, in [15] under milder assumptions on the grid. The techniques used in these papers rely strongly on the hypothesis that $\beta$ is constant and do not seem to be easily extendable to the case of variable $\beta$. However, as we shall see in the next section, in many test cases optimal order of convergence in $L^2$ is attained for quite general mesh partitions.

6. Numerical experiments. In this section we compare the methods analyzed in the previous sections on various test problems. All the experiments were performed on the unit square $\Omega = (0,1)^2$, using piecewise linear approximations on triangular grids, both structured and unstructured. In all the graphics, method (3.4) is represented by − · − · − · −; method (3.6) by − − − −; method (3.7) by · · · ◦ · · ·; and method (3.8) by −x−. For formulations (3.7) and (3.8) we report the results corresponding to $\theta = 1$, i.e., the symmetric treatment of the diffusive part. All the computations were done in MATLAB 7, on a Powerbook 1.5 with 2 GB of RAM.

Example 1: Case of smooth solution. We take $\beta = [1, 1]^T$ and $\gamma = 0$, and we vary the diffusion coefficient $\varepsilon = 1, 10^{-3}, 10^{-9}$. The forcing term $f$ is chosen so that the analytical solution of (2.1), with Dirichlet boundary conditions, is given by $u(x, y) = \sin(2\pi x) \sin(2\pi y)$.
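The forcing term for Example 1 can be written down by inserting the prescribed solution into the equation. The sketch below assumes that, for constant $\beta$, problem (2.1) reads $-\varepsilon \Delta u + \beta \cdot \nabla u + \gamma u = f$; this form is an assumption here, since (2.1) is not restated in this section, and the function names are hypothetical.

```python
import math


def u_exact(x, y):
    """Prescribed solution of Example 1."""
    return math.sin(2 * math.pi * x) * math.sin(2 * math.pi * y)


def forcing(x, y, eps, beta=(1.0, 1.0), gamma=0.0):
    """Manufactured right-hand side f = -eps*Laplace(u) + beta.grad(u) + gamma*u.

    Assumption: beta is constant, so div(beta*u) = beta.grad(u).
    """
    sx, cx = math.sin(2 * math.pi * x), math.cos(2 * math.pi * x)
    sy, cy = math.sin(2 * math.pi * y), math.cos(2 * math.pi * y)
    laplace_u = -8 * math.pi ** 2 * sx * sy
    grad_u = (2 * math.pi * cx * sy, 2 * math.pi * sx * cy)
    return (-eps * laplace_u
            + beta[0] * grad_u[0] + beta[1] * grad_u[1]
            + gamma * sx * sy)
```

For small $\varepsilon$ the diffusive contribution $8\pi^2 \varepsilon \sin(2\pi x)\sin(2\pi y)$ becomes negligible and $f$ is dominated by the advective part, which is one way to see why the same exact solution can be used across all three regimes.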
Figures 6.1 and 6.2 represent, on a log-log scale, the convergence diagrams in the norm $|||\cdot|||_{DG}$ (and $|||\cdot|||$, resp.) versus the mesh size $h = \max_T h_T \approx 1/5, 1/9, 1/18, 1/36$. Clearly, the convergence rates are the same for all the methods, in agreement with the theory of section 5: first order accuracy when diffusion dominates and order 3/2 in the convection-dominated regime. Figure 6.3 depicts, on a log-log scale, the convergence diagrams in the $L^2$-norm with respect to the mesh size, $h = 2^{-2}, 2^{-3}, 2^{-4}, 2^{-5}, 2^{-6}$, on structured grids. Similar results, although not reported here, were obtained on unstructured grids. Observe that, due to the smoothness of the solution, second order convergence is attained in all regimes for all the methods except method (3.7), which is only first order accurate when diffusion dominates. This is due to the fact that in method (3.7) the upwinding is applied to the whole flux. In method (3.8) the whole flux is also upwinded, but the use of the weighted average (2.9) allows
Fig. 6.1. Example 1. Convergence diagrams in the $|||\cdot|||_{DG}$-norm. Unstructured grids. (Panels: $\varepsilon = 1$, $\varepsilon = 10^{-3}$, $\varepsilon = 10^{-9}$; reference slopes 1 and 3/2.)
Fig. 6.2. Example 1. Convergence diagrams in the $|||\cdot|||$-norm. Unstructured grids. (Panels: $\varepsilon = 1$, $\varepsilon = 10^{-3}$, $\varepsilon = 10^{-9}$; reference slopes 1 and 3/2.)
Fig. 6.3. Example 1. Convergence diagrams in the $L^2$-norm. Structured grids. (Panels: $\varepsilon = 1$, $\varepsilon = 10^{-3}$, $\varepsilon = 10^{-9}$; reference slopes 1 and 2.)
us to tune the amount of upwind as a function of the data. It would be worth devising an automatic tuning. We have not done so yet, and found numerically the following “optimal values”: $(\alpha_1, \alpha_2) = (0.55, 0.45)$ for $\varepsilon = 1$, $(\alpha_1, \alpha_2) = (0.64, 0.36)$ for $\varepsilon = 10^{-3}$, and $(\alpha_1, \alpha_2) = (0.9, 0.1)$ for $\varepsilon \le 10^{-5}$.

Example 2: Rotating flow. This example is taken from [24]. The data are $\gamma = 0$, $\beta = [y - 1/2, 1/2 - x]^T$, and no external forces act on the system. The solution $u$ is prescribed along the slit $\{1/2\} \times [0, 1/2]$ as follows:
\[ u(1/2, y) = \sin^2(2\pi y), \qquad y \in [0, 1/2]. \]
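The data of Example 2 can be sketched in a few lines. The advection field is a rigid rotation about $(1/2, 1/2)$, so in the purely advective limit (an idealization assumed here, with $\varepsilon = 0$, $\gamma = 0$, and no forcing) the slit profile is simply carried along circles centered at that point. The helper names below are hypothetical, and values outside the disc of radius 1/2 are left undefined because they depend on boundary data not given in this section.

```python
import math


def beta(x, y):
    """Rotating advection field of Example 2."""
    return (y - 0.5, 0.5 - x)


def slit_profile(y):
    """Prescribed value u(1/2, y) on the slit, for y in [0, 1/2]."""
    return math.sin(2 * math.pi * y) ** 2


def transported_value(x, y):
    """Value carried along the circular streamline through (x, y).

    Assumption: pure transport (eps = 0, gamma = 0, f = 0). Returns None
    when the streamline does not cross the slit.
    """
    r = math.hypot(x - 0.5, y - 0.5)
    if r > 0.5:
        return None
    # The circle of radius r meets the slit at the point (1/2, 1/2 - r).
    return slit_profile(0.5 - r)
```

Since $\beta$ is orthogonal to the radial direction and divergence-free, a numerical solution with very small $\varepsilon$ can be compared against `transported_value` away from the slit to judge how much crosswind smearing a method introduces.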
In Figure 6.4, for $\varepsilon = 10^{-9}$, we have represented the approximate solution obtained with the four methods on a structured triangular grid of 512 elements. As can be seen, all the methods perform similarly, and no significant differences can be appreciated. An important feature of all the methods is the absence of the crosswind diffusion which occurs with stabilized conforming methods (see, e.g., [9], [7]). To better assess this feature of the methods, we have plotted in Figure 6.5 the profile of the approximate solutions at $y = 1/2$.

Fig. 6.4. Example 2. Approximate solutions for $\varepsilon = 10^{-9}$ on structured grids. From left to right: methods (3.4), (3.6), (3.7), and (3.8).
Fig. 6.5. Example 2. Profile of the approximate solutions at $y = 1/2$; $\varepsilon = 10^{-7}$. From left to right: methods (3.4), (3.6), (3.7), and (3.8).
Example 3. Internal layers. The next example is devoted to assessing the performance of the methods in the presence of interior layers. We set $\gamma = 0$, $\beta = [1/2, \sqrt{3}/2]^T$, and Dirichlet boundary conditions as follows:
\[ u = \begin{cases} 1 & \text{on } \{y = 0,\ 0 \le x \le 1\}, \\ 1 & \text{on } \{x = 0,\ y \le 1/5\}, \\ 0 & \text{elsewhere}. \end{cases} \]
The diffusion coefficient is varied from $\varepsilon = 10^{-3}$ to the limit case $\varepsilon = 0$ (pure hyperbolic case). In Figure 6.6 we represent the approximate solutions obtained on structured grids of 512 triangles with all methods for $\varepsilon = 10^{-3}$. They all behave poorly in the intermediate regimes, as they produce wiggles close to the boundary. These oscillations disappear in the strongly advection-dominated regime (see Figure 6.7), and the internal layer is sharply captured, with very small overshooting/undershooting. This can be better observed in Figure 6.8, where we have represented the profiles of the solutions at $x = 0$. Similar results were observed for the profiles at $y = 0.5$. We notice that the boundary layers on the outflow are missed in all the methods. This is a known drawback of DG approximations: as soon as advection dominates they behave as if the problem were purely hyperbolic. See also the next example for a similar behavior.
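To see where the internal layer of Example 3 should sit, one can trace characteristics of the limit problem backwards to the inflow boundary. The sketch below assumes the pure hyperbolic case $\varepsilon = 0$ with zero forcing (the forcing term is not stated in this section, so this is an assumption), in which case $u$ is constant along straight lines parallel to $\beta$. All names are illustrative.

```python
import math

BETA = (0.5, math.sqrt(3) / 2)   # advection field of Example 3


def inflow_data(x, y):
    """Dirichlet data of Example 3 on the inflow sides x = 0 and y = 0."""
    if y == 0.0 and 0.0 <= x <= 1.0:
        return 1.0
    if x == 0.0 and y <= 0.2:
        return 1.0
    return 0.0


def hyperbolic_limit(x, y):
    """Solution of beta.grad(u) = 0 in the unit square (assumes eps = 0, f = 0)."""
    t_left = x / BETA[0]     # travel time back to the side x = 0
    t_bottom = y / BETA[1]   # travel time back to the side y = 0
    if t_left <= t_bottom:
        return inflow_data(0.0, y - t_left * BETA[1])
    return inflow_data(x - t_bottom * BETA[0], 0.0)
```

Under these assumptions the jump in the inflow data at $(0, 1/5)$ is transported along the line $y = 1/5 + \sqrt{3}\,x$, which is where a sharp internal layer is expected for very small $\varepsilon$.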
Fig. 6.6. Example 3. Approximate solutions for ε = 10−3 on unstructured grids. From left to right: methods (3.4), (3.6), (3.7), and (3.8).
Example 4. Boundary layers. In this example we apply the methods to a boundary layer problem taken from [23]. The data are $\gamma = 0$ and $\beta = [1, 1]^T$, and we again vary the diffusion coefficient $\varepsilon$. The forcing term $f$ is chosen so that the exact solution is given by
\[ u(x, y) = x + y(1 - x) + \frac{e^{-1/\varepsilon} - e^{-(1-x)(1-y)/\varepsilon}}{1 - e^{-1/\varepsilon}}, \qquad (x, y) \in \Omega. \]
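The exact solution of Example 4 can be evaluated directly even for very small $\varepsilon$. The following sketch is one possible implementation; the function names are hypothetical. It relies on the fact that the exponentials underflow harmlessly to zero, and it uses `math.expm1` for the denominator $1 - e^{-1/\varepsilon}$.

```python
import math


def exact_solution(x, y, eps):
    """Exact solution of Example 4 on the unit square, for eps > 0."""
    # For tiny eps, exp(-1/eps) underflows to 0.0, which is the desired limit.
    numerator = math.exp(-1.0 / eps) - math.exp(-(1.0 - x) * (1.0 - y) / eps)
    denominator = -math.expm1(-1.0 / eps)   # equals 1 - exp(-1/eps)
    return x + y * (1.0 - x) + numerator / denominator


def interior_limit(x, y):
    """Pointwise limit of the exact solution for eps -> 0 away from x = 1, y = 1."""
    return x + y * (1.0 - x)
```

The solution vanishes on the outflow sides $x = 1$ and $y = 1$, while away from them it approaches the bilinear function $x + y(1 - x)$ as $\varepsilon \to 0$, so the boundary layer has height of order one.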
This problem can be regarded as a multidimensional variant of the one-dimensional problem considered by Melenk and Schwab in [27]. Unlike the classical test case [38],
Fig. 6.7. Example 3. Approximate solutions for ε = 10−9 on unstructured grids. From left to right: methods (3.4), (3.6), (3.7), and (3.8).
Fig. 6.8. Example 3. Profile of the approximate solutions at $x = 0$; $\varepsilon = 10^{-9}$. From left to right: methods (3.4), (3.6), (3.7), and (3.8).
Fig. 6.9. Example 4. Exact solution (left), approximate solution with method (3.7) (right); $\varepsilon = 10^{-9}$.
u does not reduce, in the hyperbolic limit case, to a linear function in the interior of the domain, as shown in Figure 6.9 (left) for $\varepsilon = 10^{-9}$. In Figure 6.9 (right) only the solution obtained with method (3.7) is represented, since the methods exhibit no visible differences in the strongly advective regime. Notice that, since boundary conditions are imposed in a weak way, the boundary layer is not captured by the DG approximations, although the solution is free of spurious oscillations. In Figure 6.10 we compare the methods for $\varepsilon = 10^{-3}$ on structured grids with $24 \times 24 \times 2$ triangles. Again, no substantial differences can be observed, except for small oscillations in method (3.7) (third plot in the figure), probably due to the upwind treatment of the diffusive part of the flux. For this test case we chose not to plot convergence diagrams in the norms (4.3) or (4.35) since, due to the weak approximation of the boundary conditions, the main contribution to the error comes from the error in the boundary layer, which is $O(1)$, as can be seen in Figures 6.9 and 6.10. Figure 6.11 represents the convergence diagrams in the $L^1$-norm for $\varepsilon = 10^{-3}$ and $h = 1/5, 1/9, 1/18, 1/36$. Note that, as we would expect in this regime, and since we are measuring global errors, first order convergence is achieved. Although there are no great differences
between the methods, it seems that in this case method (3.4) gives the most accurate approximation. This can also be checked from Figure 6.10. Finally, Figure 6.12 shows the convergence diagrams in terms of $h = 1/5, 1/9, 1/18, 1/36$ on unstructured grids for $\varepsilon = 10^{-9}$ in the $L^2$-norm (left), the $|||\cdot|||_d$-norm in the interior of the domain (i.e., without the contribution of the boundary elements) (center), and in the norm $|||\cdot|||_S$ defined in (4.35) (right). Note that all the methods give optimal order of convergence in $L^2$ in the advection-dominated regime (see Remark 5.3).

Example 5. Compressible advection-diffusion problem. We conclude with a test where the advection field is not divergence-free. We set $\gamma = 0$ and $\beta = [y x^2 + 1, x y^2 + 1]^T$. The flow enters the computational domain $\Omega$ from two sides of $\Gamma$, namely $\{x = 0\}$ and $\{y = 0\}$. The forcing term is chosen as
\[ f = \begin{cases} 0 & \text{on } 0 \le x \le 1/2,\ 0 \le y \le 1/3, \\ -1 & \text{on } 1/2 < x \le 1,\ 1/3 < y \le 1. \end{cases} \]
Nonhomogeneous Dirichlet boundary conditions were imposed on $\Gamma^-$:
\[ u = \begin{cases} 2x & \text{on } 0 \le x \le 1,\ y = 0, \\ 3y & \text{on } x = 0,\ 0 \le y \le 1, \end{cases} \]
and homogeneous Neumann conditions on $\Gamma^+ = \{x = 1,\ 0 < y < 1\} \cup \{y = 1,\ 0 < x < 1\}$. Figure 6.13 shows a vector diagram of the advection field (left) and two different views of the approximate solution obtained with method (3.8) for $\varepsilon = 10^{-9}$ on a structured triangular mesh with $h = 1/16$. In Figure 6.14 we represent the approximate solutions obtained on structured grids of 512 triangles ($h = 1/16$) with
Fig. 6.10. Example 4. Approximate solutions for ε = 10−3 . From left to right: methods (3.4), (3.6), (3.7), and (3.8).
Fig. 6.11. Example 4. Convergence diagrams in the L1 -norm; ε = 10−3 .
Fig. 6.12. Example 4. Convergence diagrams in the norms $L^2$ (left), interior $|||\cdot|||_d$ (center), and $|||\cdot|||_S$ (right); $\varepsilon = 10^{-9}$. Unstructured grids. (Reference slopes shown: 2, 1, and 3/2.)
Fig. 6.13. Example 5. Left: vector diagram of advection field. Center and right: two views of the approximate solution obtained with method (3.8) for $\varepsilon = 10^{-9}$ on a structured mesh.
Fig. 6.14. Example 5. Approximate solutions for $\varepsilon = 10^{-3}$ on structured meshes. From left to right: methods (3.4), (3.6), (3.7), and (3.8).
all methods for $\varepsilon = 10^{-3}$. From the graphics we observe that all methods behave similarly and provide a good approximate solution also when $\mathrm{div}\,\beta \ne 0$.

Remark 6.1. In general, it is neither easy to compare the performance of different DG methods, nor to design a relevant test. For advection-diffusion-reaction problems it is even more complicated, if one wants to take care of all the possible regimes and of the variety of stabilizations. From the tests that we have performed so far it seems that all the methods presented in the paper behave similarly, at least in the strongly advection-dominated case. Some differences appear in the intermediate regimes but not enough to draw definite conclusions. From the computational point of view method (3.4) is simpler than the others. On the other hand, method (3.8) seems promising for adjusting to varying regimes, provided a sound automatic tuning of the upwind can be found.

7. Conclusions. By using the weighted-residual approach of [6] we set a unified framework for deriving and analyzing various methods for advection-diffusion-reaction problems. The analysis carried out applies to the case of variable convection and reaction fields, and shows that optimal estimates in DG norms are achieved. In particular, we relaxed the usual coercivity condition (see assumption (2.2)), thus allowing for taking care of a variety of situations, if one wants to allow cases of a (comparatively)
very small diffusion. All the methods considered in this paper seem to have the same stability and accuracy properties, in all regimes. This is also confirmed numerically, though the method (3.8) seems to be more flexible in the intermediate regimes, thanks to the possibility of tuning the amount of upwind.

Appendix A. We briefly sketch how the function $\eta \in W^{k+1,\infty}(\Omega)$ in (H1) can be constructed. Arguing as in [16] we can guarantee that, for $\beta$ satisfying (2.3),
\[ \text{if } \beta \in [W^{1,\infty}(\Omega)]^2 \implies \exists\, \tilde\eta \in W^{1,\infty}(\Omega) \text{ s.t. } \beta \cdot \nabla \tilde\eta \ge 2 b_0 > 0 \text{ in } \Omega. \tag{A.1} \]
We next show how from this function $\tilde\eta$ the more regular $\eta$ in (H1) can be constructed. Let $\{U_\alpha^+\}_\alpha$ be a finite open covering of $\Omega$ such that each $U_\alpha^+$ enjoys the following property: there exists some $\varepsilon_1 > 0$ (to be chosen later) such that
\[ \text{if } x, y \in U_\alpha^+ \implies \|\beta(x) - \beta(y)\|_{0,\infty} < \varepsilon_1 \tag{A.2} \]
and
\[ \forall x, y \in U_\alpha^+ \quad \beta(x) \cdot \nabla \tilde\eta(y) \ge b_0. \tag{A.3} \]
Inequality (A.3) is actually a consequence of (A.2) and (A.1). Indeed,
\[ \beta(x) \cdot \nabla \tilde\eta(y) = \beta(y) \cdot \nabla \tilde\eta(y) + [\beta(x) - \beta(y)] \cdot \nabla \tilde\eta(y) \ge 2 b_0 - \varepsilon_1 \|\nabla \tilde\eta\|_{0,\infty}. \]
Hence, by taking $\varepsilon_1 = b_0 / \|\nabla \tilde\eta\|_{0,\infty}$ one can guarantee (A.3). Let $U_\alpha^- \subset U_\alpha^+$ be such that (A.2) holds with such choice of $\varepsilon_1$ (so that (A.3) is valid for all $x$ and $y \in U_\alpha^-$), and such that $\{U_\alpha^-\}_\alpha$ is still an open covering of $\Omega$. Next, on each $U_\alpha^-$ we mollify $\tilde\eta$ by convolution with some mollifier $\rho_\delta$; $\eta_\alpha^\delta = \tilde\eta * \rho_\delta$ in $U_\alpha^-$. Then, by taking a partition of unity $\{\varphi_\alpha\}_\alpha$ associated with the covering $\{U_\alpha^-\}_\alpha$ we can construct $\eta$ as in (H1) by gluing the mollified $\eta_\alpha^\delta$, that is, $\eta = \sum_\alpha \eta_\alpha^\delta \cdot \varphi_\alpha$. Thus, the existence of $\eta$ sufficiently smooth satisfying (H1) is guaranteed.

REFERENCES
[1] D. N. Arnold, An interior penalty finite element method with discontinuous elements, SIAM J. Numer. Anal., 19 (1982), pp. 742–760.
[2] D. N. Arnold, F. Brezzi, B. Cockburn, and L. D. Marini, Unified analysis of discontinuous Galerkin methods for elliptic problems, SIAM J. Numer. Anal., 39 (2002), pp. 1749–1779.
[3] D. N. Arnold, F. Brezzi, R. Falk, and L. D. Marini, Locking-free Reissner-Mindlin elements without reduced integration, Comput. Methods Appl. Mech. Engrg., 196 (2007), pp. 3660–3671.
[4] D. N. Arnold, F. Brezzi, and L. D. Marini, A family of discontinuous Galerkin finite elements for the Reissner-Mindlin plate, J. Sci. Comput., 22/23 (2005), pp. 25–45.
[5] S. C. Brenner, Poincaré–Friedrichs inequalities for piecewise $H^1$ functions, SIAM J. Numer. Anal., 41 (2003), pp. 306–324.
[6] F. Brezzi, B. Cockburn, L. D. Marini, and E. Süli, Stabilization mechanisms in discontinuous Galerkin finite element methods, Comput. Methods Appl. Mech. Engrg., 195 (2006), pp. 3293–3310.
[7] F. Brezzi, L. D. Marini, and A. Russo, On the choice of a stabilizing subgrid for convection-diffusion problems, Comput. Methods Appl. Mech. Engrg., 194 (2005), pp. 127–148.
[8] F. Brezzi, L. D. Marini, and E. Süli, Discontinuous Galerkin methods for first-order hyperbolic problems, Math. Models Methods Appl. Sci., 14 (2004), pp. 1893–1903.
[9] A. N. Brooks and T. J. R. Hughes, Streamline upwind/Petrov-Galerkin formulations for convection dominated flows with particular emphasis on the incompressible Navier-Stokes equations, Comput. Methods Appl. Mech. Engrg., 32 (1982), pp. 199–259.
[10] A. Buffa, T. J. R. Hughes, and G. Sangalli, Analysis of a multiscale discontinuous Galerkin method for convection-diffusion problems, SIAM J. Numer. Anal., 44 (2006), pp. 1420–1440.
[11] E. Burman and P. Zunino, A domain decomposition method based on weighted interior penalties for advection-diffusion-reaction problems, SIAM J. Numer. Anal., 44 (2006), pp. 1612–1638.
[12] P. G. Ciarlet, Basic error estimates for elliptic problems, in Handbook of Numerical Analysis, Vol. II, Handb. Numer. Anal. II, North-Holland, Amsterdam, 1991, pp. 17–351.
[13] B. Cockburn, Discontinuous Galerkin methods for convection-dominated problems, in High-Order Methods for Computational Physics, Lect. Notes Comput. Sci. Eng. 9, Springer-Verlag, Berlin, 1999, pp. 69–224.
[14] B. Cockburn and C. Dawson, Some extensions of the local discontinuous Galerkin method for convection-diffusion equations in multidimensions, in The Mathematics of Finite Elements and Applications, X, MAFELAP 1999 (Uxbridge), Elsevier, Oxford, UK, 2000, pp. 225–238.
[15] B. Cockburn, B. Dong, and J. Guzmán, Optimal convergence of the original DG method for the transport-reaction equation on special meshes, SIAM J. Numer. Anal., 46 (2008), pp. 1250–1265.
[16] A. Devinatz, R. Ellis, and A. Friedman, The asymptotic behavior of the first real eigenvalue of second order elliptic operators with a small parameter in the highest derivatives. II, Indiana Univ. Math. J., 23 (1973–1974), pp. 991–1011.
[17] A. Ern and J.-L. Guermond, Discontinuous Galerkin methods for Friedrichs' systems. I. General theory, SIAM J. Numer. Anal., 44 (2006), pp. 753–778.
[18] A. Ern and J.-L. Guermond, Discontinuous Galerkin methods for Friedrichs' systems. II. Second-order elliptic PDEs, SIAM J. Numer. Anal., 44 (2006), pp. 2363–2388.
[19] A. Ern and J.-L. Guermond, Discontinuous Galerkin methods for Friedrichs' systems. Part III. Multifield theories with partial coercivity, SIAM J. Numer. Anal., 46 (2008), pp. 776–804.
[20] J. Gopalakrishnan and G. Kanschat, A multilevel discontinuous Galerkin method, Numer. Math., 95 (2003), pp. 527–550.
[21] J. Guzmán, Local analysis of discontinuous Galerkin methods applied to singularly perturbed problems, J. Numer. Math., 14 (2006), pp. 41–56.
[22] B. Heinrich and K. Pietsch, Nitsche type mortaring for some elliptic problem with corner singularities, Computing, 68 (2002), pp. 217–238.
[23] P. Houston, C. Schwab, and E. Süli, Discontinuous hp-finite element methods for advection-diffusion-reaction problems, SIAM J. Numer. Anal., 39 (2002), pp. 2133–2163.
[24] T. J. R. Hughes, G. Scovazzi, P. B. Bochev, and A. Buffa, A multiscale discontinuous Galerkin method with the computational structure of a continuous Galerkin method, Comput. Methods Appl. Mech. Engrg., 195 (2006), pp. 2761–2787.
[25] C. Johnson, U. Nävert, and J. Pitkäranta, Finite element methods for linear hyperbolic problems, Comput. Methods Appl. Mech. Engrg., 45 (1984), pp. 285–312.
[26] C. Johnson and J. Pitkäranta, An analysis of the discontinuous Galerkin method for a scalar hyperbolic equation, Math. Comp., 46 (1986), pp. 1–26.
[27] J. M. Melenk and C. Schwab, An hp finite element method for convection-diffusion problems in one dimension, IMA J. Numer. Anal., 19 (1999), pp. 425–453.
[28] U. Nävert, A Finite Element Method for Convection-Diffusion Problems, Ph.D. thesis, Department of Computer Science, Chalmers University of Technology, Göteborg, Sweden, 1982.
[29] J. Nitsche and A. Schatz, On local approximation properties of $L^2$-projection on spline-subspaces, Applicable Anal., 2 (1972), pp. 161–168.
[30] T. E. Peterson, A note on the convergence of the discontinuous Galerkin method for a scalar hyperbolic equation, SIAM J. Numer. Anal., 28 (1991), pp. 133–140.
[31] G. R. Richter, An optimal-order error estimate for the discontinuous Galerkin method, Math. Comp., 50 (1988), pp. 75–88.
[32] B. Rivière, M. F. Wheeler, and V. Girault, Improved energy estimates for interior penalty, constrained and discontinuous Galerkin methods for elliptic problems. I, Comput. Geosci., 3 (1999), pp. 337–360 (2000).
[33] H.-G. Roos, M. Stynes, and L. Tobiska, Numerical Methods for Singularly Perturbed Differential Equations: Convection-Diffusion and Flow Problems, Springer Ser. Comput. Math. 24, Springer-Verlag, Berlin, 1996.
[34] G. Sangalli, Global and local error analysis for the residual-free bubbles method applied to advection-dominated problems, SIAM J. Numer. Anal., 38 (2000), pp. 1496–1522.
[35] R. Stenberg, Mortaring by a method of J. A. Nitsche, in Computational Mechanics (Buenos Aires, 1998), CD-ROM file, Centro Internac. Métodos Numér. Ing., Barcelona, Spain, 1998.
[36] S. Sun and M. F. Wheeler, Symmetric and nonsymmetric discontinuous Galerkin methods for reactive transport in porous media, SIAM J. Numer. Anal., 43 (2005), pp. 195–219.
[37] L. B. Wahlbin, Superconvergence in Galerkin Finite Element Methods, Lecture Notes in Math. 1605, Springer-Verlag, Berlin, 1995.
[38] H. Zarin and H.-G. Roos, Interior penalty discontinuous approximations of convection-diffusion problems with parabolic layers, Numer. Math., 100 (2005), pp. 735–759.
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1421–1444
c 2009 Society for Industrial and Applied Mathematics
ON MESH GEOMETRY AND STIFFNESS MATRIX CONDITIONING FOR GENERAL FINITE ELEMENT SPACES∗ QIANG DU† , DESHENG WANG‡ , AND LIYONG ZHU§ Abstract. The performance of finite element computation depends strongly on the quality of the geometric mesh and the efficiency of the numerical solution of the linear systems resulting from the discretization of partial differential equation (PDE) models. It is common knowledge that mesh geometry affects not only the approximation error of the finite element solution but also the spectral properties of the corresponding stiffness matrix. In this paper, for typical second-order elliptic problems, some refined relationships between the spectral condition number of the stiffness matrix and the mesh geometry are established for general finite element spaces defined on simplicial meshes. The derivation of such relations for general high-order elements is based on a new trace formula for the element stiffness matrix. It is shown that a few universal geometric quantities have the same dominant effect on the stiffness matrix conditioning for different finite element spaces. These results provide guidance to the studies of both linear algebraic solvers and the unstructured geometric meshing. Key words. condition number, mesh quality, finite element method, unstructured mesh AMS subject classifications. 65N30, 65F10 DOI. 10.1137/080718486
1. Introduction. The finite element solution of partial differential equations (PDEs) often involves mesh generation and optimization, the assembly of discrete algebraic systems using the finite element basis, and the solution of these systems by some algebraic solvers. Traditionally, the different components have often been studied separately, so as to maximize the independence between the various software components and to make the finite element method a versatile and popular methodology for many applications. In recent years, the finite element community has been paying increasing attention to an integrated adaptive solution strategy. It thus becomes important to understand the interplay between the various components in order to improve the overall performance of finite element simulations. A major objective of the study we have undertaken recently is to explore the relations among the mesh geometry, the efficiency of the linear solver for the resulting finite element linear system of equations, and the interpolation (or discretization) error. While it has been common in the meshing community to examine the quality of mesh with respect to various geometric measures, there have also been a number of works relating mesh quality to interpolation or discretization errors; see, for instance, [5, 7, 8, 10, 12, 33, 37, 39] and the references cited therein. Connections between the performance of the algebraic solvers and general unstructured meshes have also been made, but with much less rigor and generality. Perhaps the most widely known ∗ Received by the editors March 14, 2008; accepted for publication (in revised form) October 23, 2008; published electronically March 13, 2009. http://www.siam.org/journals/sinum/47-2/71848.html † Department of Mathematics, Pennsylvania State University, University Park, PA 16802 (qdu@ math.psu.edu). The work of this author is supported in part by NSF DMS-0712744. 
‡ Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore (
[email protected]). The work of this author is supported in part by grants NTU start-up M58110011, ARC 29/07 T207B2202, and NRF 2007 IDM-IDM002-010. § Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing, China (zhu
[email protected]).
facts in this direction were based on the vast experiences in the application of finite element technology, such as the belief that poorly shaped elements can give rise to ill-conditioned matrices, which tend to slow down or even prevent the convergence of iterative solvers. Even with the increasing popularity of the unstructured simplicial meshing in finite element simulations, there were relatively few attempts at general discussions on the precise connections between the solver performances and the qualities of unstructured meshing. From among the notable works we recall [36], in which the effect of the unstructured irregular grids on the performance of algebraic solvers and preconditioners has been examined through numerical examples. In [6, 18], the trade-offs associated with the cost of mesh improvement in terms of solution efficiency have been analyzed numerically. In [37], comprehensive discussions have been made on mesh quality measures, and in particular, on how a good element for resolving the discretization error may at the same time be good for the efficient solution of the resulting algebraic systems. More recently in [15], a mesh and solver co-adaptation strategy has been studied in the context of finite element methods for anisotropic problems. In a more general arena, but closely related to our objective, the exploration of the properties of the stiffness matrix resulting from the finite element discretizations in relation to the underlying geometric meshes has remained a continuing theme in the finite element literature for half a century. Precise and explicit descriptions of the relations between mesh geometry and the spectral condition numbers are naturally helpful to the understanding of the whole finite element solution process. Yet the current understanding of such relations remains largely incomplete despite a number of existing investigations [2, 21, 37, 38]. 
In this work, we are able to establish a precise relation between the mesh geometry and the spectral condition number of the stiffness matrix for some typical second-order elliptic equations discretized by general finite element methods based on unstructured simplicial meshes in any space dimension. An important conclusion following from our analysis is that the effect of the element geometry on the conditioning of the stiffness matrices for more general finite element methods is similar to that of the conforming linear Lagrange finite element. Consequently, a simplicial mesh that makes the stiffness matrices less ill-conditioned for the linear element tends to do the same for high-order elements as well. Results of such generality, to the best of our knowledge, have not been presented before in the literature. They bring new understanding to mesh generation and optimization and the solution of discrete algebraic systems. Our analysis is based on the derivation of an explicit trace formula for the element stiffness matrix corresponding to the finite element approximation to the Laplace operator (presented in section 2). While requiring only routine calculations, the trace formula appears to be new and quite elegant. It helps us to derive, in section 3, more precise estimates on the extreme eigenvalues of element stiffness matrices for general finite element spaces in terms of the element and mesh geometries, using an earlier framework on the estimation of stiffness matrix conditioning in [20, 21]. Some known calculations in the literature on the linear Lagrange finite element are also presented there as comparisons. The new estimate not only makes some of the classical works (such as those in [19, 20, 21]) more precise but also makes some observations for special cases (such as those in [37]) more general. In addition, we specialize to various cases and consider the relevant extensions (in section 4). 
The theoretical analysis is also complemented by numerical experiments which serve as further validation.

2. Finite element approximation and a new trace formula. In this section, we first derive a new trace formula for the element stiffness matrix for the Laplace operator using general finite element methods. We then recall briefly the abstract framework on the condition number estimate for general symmetric second-order elliptic equations given in [21] and the discussion on the linear Lagrange finite element given in [37]. These results form the basis of discussions on the condition number estimation for general high-order elements on general unstructured simplicial meshes.

2.1. Basic finite element terminology. Given an open bounded convex domain $\Omega \subset \mathbb{R}^d$ with a Lipschitz-continuous boundary, we consider the following general self-adjoint linear second-order elliptic boundary value problem:

\[
\begin{cases}
-\displaystyle\sum_{i,j=1}^{d} \frac{\partial}{\partial x_i}\Big( a_{ij} \frac{\partial u}{\partial x_j} \Big) + a_0 u = f & \text{in } \Omega, \\
u = 0 & \text{on } \partial\Omega,
\end{cases}
\tag{2.1}
\]

where the coefficient matrix $\tilde{A} = (a_{ij})_{i,j=1}^{d}$ is symmetric positive definite everywhere in Ω and $a_0 \ge 0$ in Ω. Both $\tilde{A}$ and $a_0$ are assumed to be smooth and uniformly bounded for simplicity. In addition, we let $f \in L^2(\Omega)$. The corresponding variational weak form is as follows: Find $u \in H_0^1(\Omega)$ such that

\[
a_\Omega(u, v) = \int_\Omega \sum_{i,j=1}^{d} a_{ij} \frac{\partial u}{\partial x_i} \frac{\partial v}{\partial x_j} \, dx + \int_\Omega a_0 u v \, dx = \int_\Omega f v \, dx \qquad \forall v \in H_0^1(\Omega).
\tag{2.2}
\]

It is well known that the above weak variational form (2.2) has a unique solution in $H_0^1(\Omega)$ [11]. Let τ denote the finite element mesh (a triangulation, or equivalently, a simplicial mesh for much of our discussion). Appropriate finite element spaces with suitably chosen nodal basis functions $\{\phi_j\}_{j=1}^{N}$ may then be employed to discretize the continuous problem (2.2), resulting in algebraic systems associated with the finite element approximations. For any (simplicial) element t ∈ τ, we assume that the nodal basis, when restricted to t, is given by a canonical transformation from a nodal basis defined on a reference simplex described by the barycentric coordinates

\[
t_0 = \Big\{ (b_1, b_2, \ldots, b_{d+1}) \;\Big|\; b_i \ge 0, \ \sum_{j} b_j = 1 \Big\}.
\tag{2.3}
\]

Concerning the finite element space, we make an additional assumption that the nodal basis on $t_0$ is invariant with respect to the permutation of the vertices, a property that is satisfied by most of the finite element spaces. Let K and M be the N × N stiffness and mass matrices, respectively, generated by the finite element methods, that is,

\[
K = (k_{ij}), \quad k_{ij} = a_\Omega(\phi_i, \phi_j), \qquad \text{and} \qquad M = (m_{ij}), \quad m_{ij} = \int_\Omega \rho \, \phi_i \phi_j \, dx .
\]
Here, as in [20], a positive density function ρ = ρ(x) is introduced into the mass matrix. While for much of the discussion we focus on the case when ρ = 1 is a constant, a nonuniform density can be very useful in dealing with highly nonuniform meshes. Without further complicating the discussion, we assume that ρ remains positive and smooth in the domain of interest. Obviously, both K and M are symmetric, with M being positive definite and K being either positive or nonnegative definite. Denote the element matrices corresponding to K and M by Kt and Mt , respectively, for any (simplicial) element t ∈ τ .
We use n to denote the dimension of $K_t$ and $M_t$, which corresponds to the number of degrees of freedom, that is, the number of nodal basis functions, for the element t. The eigenvalues of K and M are denoted by $\{\lambda_i^K\}_{i=1}^{N}$ and $\{\lambda_i^M\}_{i=1}^{N}$, which are ordered by

\[
\lambda_1^K \le \lambda_2^K \le \cdots \le \lambda_N^K, \qquad \lambda_1^M \le \lambda_2^M \le \cdots \le \lambda_N^M .
\]

In this notation, $\lambda_1^K$ and $\lambda_1^M$ are the minimal eigenvalues of K and M, and $\lambda_N^K$ and $\lambda_N^M$ are the maximal eigenvalues, respectively. Similarly, we use $\{\lambda_i^{K_t}\}_{i=1}^{n}$ and $\{\lambda_i^{M_t}\}_{i=1}^{n}$ to denote the eigenvalues of $K_t$ and $M_t$, respectively, which are also ordered by

\[
\lambda_1^{K_t} \le \lambda_2^{K_t} \le \cdots \le \lambda_n^{K_t}, \qquad \lambda_1^{M_t} \le \lambda_2^{M_t} \le \cdots \le \lambda_n^{M_t} .
\]

For the case of a conforming linear finite element, the nodal basis on the element t is simply given by the barycentric coordinates $\{b_1, b_2, \ldots, b_{d+1}\}$. Let $\{z_j\}_{j=1}^{d+1}$ be the vertices of t, with $z_j$ having corresponding barycentric coordinates $b_j = 1$ and $b_i = 0$ for $i \ne j$. It is well known that for each i, $b_i = b_i(x)$ is a linear function of x ∈ t, representing the ratio of the volume of the simplex with vertices $\{x\} \cup \{z_j,\ j \ne i\}$ to the volume of t. Moreover,

\[
x = \sum_{j} b_j z_j .
\]
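As a small illustration of the barycentric coordinates just described, the following sketch solves the linear system expressing $\sum_j b_j = 1$ and $\sum_j b_j z_j = x$. The function name `barycentric` and the sample tetrahedron are hypothetical choices made here for illustration only.

```python
import numpy as np

def barycentric(verts, x):
    """Barycentric coordinates of x for the simplex whose vertices are the rows of verts."""
    verts = np.asarray(verts, dtype=float)           # shape (d+1, d)
    # The rows of T encode sum_j b_j = 1 and sum_j b_j z_j = x.
    T = np.vstack([np.ones(len(verts)), verts.T])
    rhs = np.concatenate([[1.0], np.asarray(x, dtype=float)])
    return np.linalg.solve(T, rhs)

# Hypothetical tetrahedron (d = 3) and a test point.
z = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.2, 1.0, 0.0],
              [0.1, 0.3, 0.9]])
x = np.array([0.3, 0.3, 0.2])
b = barycentric(z, x)
print(b.sum(), b @ z)   # the coordinates sum to one and reproduce x
```

Evaluating `barycentric(z, z[j])` returns the jth unit vector, in line with $b_j = 1$ and $b_i = 0$ for $i \ne j$ at the vertex $z_j$.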
It is also easy to see that $\nabla b_i$ gives the normal direction of the (d − 1)-dimensional face $A_i$ of t, opposite to the vertex $z_i$, and $|\nabla b_i|$ is the reciprocal of the height of the simplex t corresponding to the vertex $z_i$. Equivalently, we have [11]

\[
|\nabla b_i| = \frac{|A_i|}{d\,|t|}
\tag{2.4}
\]

with |t| denoting the volume of t and $|A_i|$ being the area of the face $A_i$ for each 1 ≤ i ≤ d + 1.

2.2. A trace formula for the element stiffness matrix. We now derive a new trace formula for the stiffness matrix associated with the Laplace operator discretized by general simplicial finite element spaces. We adopt the notation introduced in the previous subsection but specialize to the case of

\[
a_\Omega(u, v) = \int_\Omega \nabla u \cdot \nabla v \, dx
\tag{2.5}
\]

for any u, v in $H^1(\Omega)$. In this case, (2.2) corresponds to the Poisson equation with a homogeneous Dirichlet boundary condition if we take $u, v \in H_0^1(\Omega)$. Given a simplex t, we use $\{L_i(\{b_j\})\}_{i=1}^{n}$ to denote a general form of the nodal basis functions on t, and the finite element approximation is given by functions whose restrictions on t are linear combinations of $\{L_i\}$. Notice that it is assumed that the set of basis functions remains invariant under any permutation of the vertices, and thus under any permutation of the barycentric coordinates. Consider the element stiffness matrix $K_t$. Its (k, l)th entry is now given by

\[
a_t(L_k, L_l) = \int_t \nabla L_k \cdot \nabla L_l \, dx .
\]
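In particular, we have the mth diagonal entry given by

\[
\begin{aligned}
a_t(L_m, L_m) &= \int_t \nabla L_m \cdot \nabla L_m \, dx
= \sum_{i=1}^{d} \int_t \Big( \sum_{j=1}^{d+1} \frac{\partial L_m}{\partial b_j} \frac{\partial b_j}{\partial x_i} \Big)^{2} dx \\
&= \sum_{i=1}^{d} \sum_{j=1}^{d+1} \sum_{k=1}^{d+1} \frac{\partial b_j}{\partial x_i} \frac{\partial b_k}{\partial x_i} \int_t \frac{\partial L_m}{\partial b_j} \frac{\partial L_m}{\partial b_k} \, dx .
\end{aligned}
\]

Here, we have used the fact that $\partial b_j / \partial x_i$ and $\partial b_k / \partial x_i$ are constants on t. Now, we sum over m to get the trace of $K_t$,

\[
\begin{aligned}
\operatorname{Tr}(K_t) &= \sum_{m=1}^{n} a_t(L_m, L_m)
= \sum_{m=1}^{n} \sum_{i=1}^{d} \sum_{j=1}^{d+1} \sum_{k=1}^{d+1} \frac{\partial b_j}{\partial x_i} \frac{\partial b_k}{\partial x_i} \int_t \frac{\partial L_m}{\partial b_j} \frac{\partial L_m}{\partial b_k} \, dx \\
&= \sum_{i=1}^{d} \sum_{j=1}^{d+1} \sum_{k=1}^{d+1} \frac{\partial b_j}{\partial x_i} \frac{\partial b_k}{\partial x_i} \int_t \sum_{m=1}^{n} \frac{\partial L_m}{\partial b_j} \frac{\partial L_m}{\partial b_k} \, dx .
\end{aligned}
\tag{2.6}
\]

By the invariance of the set of basis functions under the permutation of the barycentric coordinates, we see that there are two constants $\alpha_n^d$ and $\beta_n^d$ such that

\[
\int_t \sum_{m=1}^{n} \Big( \frac{\partial L_m}{\partial b_j} \Big)^{2} dx = \alpha_n^d \, |t| \qquad \forall j,
\tag{2.7}
\]

\[
\int_t \sum_{m=1}^{n} \frac{\partial L_m}{\partial b_j} \frac{\partial L_m}{\partial b_k} \, dx = \beta_n^d \, |t| \qquad \forall j \ne k .
\tag{2.8}
\]

Thus, we may use (2.7) and (2.8) for the cases k = j and k ≠ j, respectively, to complete the sum in (2.6) over the index m first. This leads to

\[
\operatorname{Tr}(K_t) = \alpha_n^d \, |t| \sum_{i=1}^{d} \sum_{j=1}^{d+1} \Big( \frac{\partial b_j}{\partial x_i} \Big)^{2} + \beta_n^d \, |t| \sum_{i=1}^{d} \sum_{j=1}^{d+1} \sum_{k \ne j} \frac{\partial b_j}{\partial x_i} \frac{\partial b_k}{\partial x_i} .
\]

Noticing from the definition of $\{b_j\}$ that

\[
\nabla \Big( \sum_{j} b_j \Big) \cdot \nabla \Big( \sum_{j} b_j \Big) = 0 ,
\]

so that $\sum_{j} \sum_{k \ne j} \nabla b_j \cdot \nabla b_k = -\sum_{j} |\nabla b_j|^2$, we then further obtain

\[
\operatorname{Tr}(K_t) = (\alpha_n^d - \beta_n^d) \, |t| \sum_{i=1}^{d} \sum_{j=1}^{d+1} \Big( \frac{\partial b_j}{\partial x_i} \Big)^{2}
= (\alpha_n^d - \beta_n^d) \, |t| \sum_{j=1}^{d+1} |\nabla b_j|^2
= (\alpha_n^d - \beta_n^d) \, d^{-2} \, Q_d(t) ,
\tag{2.9}
\]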
where, according to (2.4), the term $Q_d(t)$ is given by

\[
Q_d(t) = \frac{1}{|t|} \sum_{i=1}^{d+1} |A_i|^2
\tag{2.10}
\]

for any d-dimensional simplex t, with |t| being its volume and $\{|A_i|\}_{i=1}^{d+1}$ being the areas (volumes) of its (d − 1)-dimensional faces. Now, if we let $\gamma_n^d = (\alpha_n^d - \beta_n^d) d^{-2}$, then by a symmetry consideration, we can get the following equivalent form of $\gamma_n^d$:

\[
\gamma_n^d = \frac{1}{2 d^3 (d+1) |t|} \sum_{m=1}^{n} \sum_{j=1}^{d+1} \sum_{k=1}^{d+1} \int_t \Big( \frac{\partial L_m}{\partial b_j} - \frac{\partial L_m}{\partial b_k} \Big)^{2} dx .
\]

Moreover, with a change of variable in the integral, we get a geometry-independent form of $\gamma_n^d$ as follows:

\[
\gamma_n^d = \frac{1}{2 d^3 (d+1) |t_0|} \sum_{m=1}^{n} \sum_{j=1}^{d+1} \sum_{k=1}^{d+1} \int_{t_0} \Big( \frac{\partial L_m}{\partial b_j} - \frac{\partial L_m}{\partial b_k} \Big)^{2} dx ,
\tag{2.11}
\]

where $t_0$ is the standard reference simplex defined in (2.3). We thus arrive at the following theorem.

Theorem 2.1 (a new trace formula). For any general finite element spaces defined on a simplicial mesh τ with the nodal basis on any d-dimensional simplex t ∈ τ satisfying the invariance property specified above, the element stiffness matrix for (2.5) has the trace formula

\[
\operatorname{Tr}(K_t) = \gamma_n^d \, Q_d(t),
\tag{2.12}
\]
where n is the cardinality of the set of local nodal basis functions, $\gamma_n^d$ is the positive constant defined by (2.11), and $Q_d(t)$ is as defined by (2.10).

It is important to note that $\gamma_n^d$ is a positive constant that depends only on the corresponding basis functions on the reference simplex $t_0$ and is independent of the geometry of the particular element t. Thus, we see the elegance of the above trace formula: it implies that the trace of the element stiffness matrix for general finite element spaces (with an invariant basis) is a product of two factors, with one being $\gamma_n^d$, which is completely independent of the element t, and the other being $Q_d(t)$, which is $d^2$ times the trace of $K_t$ corresponding to the linear nodal basis consisting of $\{b_j\}_{j=1}^{d+1}$ and is completely independent of the choice of the finite element spaces (as long as they take some invariant basis). While the calculation of the special case for the linear element is widely known in standard finite element texts [2, 11, 38], to the best of our knowledge, the more general cases have not been presented in the literature. Our derivation of the results is indeed for general finite element spaces on simplicial meshes that include the classical standard Lagrange finite element spaces of any order, and other exotic spaces, such as the enrichment of the conforming linear element with bubble functions or stabilized finite element spaces [1].

As a corollary, using the nonnegative definiteness of $K_t$ and the zero eigenvalue associated with constant functions, we can get an estimate for the maximum eigenvalue $\lambda_n^{K_t}$ of the element stiffness matrix $K_t$.

Corollary 2.1. Under the above conditions, we have

\[
\frac{\gamma_n^d}{n-1} \, Q_d(t) \le \lambda_n^{K_t} \le \gamma_n^d \, Q_d(t) .
\tag{2.13}
\]
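The trace formula (2.12) and the bounds (2.13) can be checked numerically. The following is a minimal sketch, assuming the standard quadratic Lagrange basis on a triangle (d = 2, n = 6): vertex functions $b_i(2b_i - 1)$ and edge functions $4 b_i b_j$. For this basis a hand computation gives $\alpha_n^d = 19/3$ and $\beta_n^d = 4/3$, hence $\gamma_n^d = 5/4$; this value, the function names, and the two sample triangles are assumptions of the illustration rather than quantities quoted from the text.

```python
import numpy as np

def bary_gradients(verts):
    """Gradients of the barycentric coordinates (rows) and the area of a triangle."""
    T = np.vstack([np.ones(3), verts.T])       # b(x) = T^{-1} (1, x1, x2)^T
    return np.linalg.inv(T)[:, 1:], abs(np.linalg.det(T)) / 2.0

def q2(verts):
    """Q_2(t) = (sum of squared edge lengths) / |t|, cf. (2.10) with d = 2."""
    _, area = bary_gradients(verts)
    edges = [verts[(i + 1) % 3] - verts[(i + 2) % 3] for i in range(3)]
    return sum(e @ e for e in edges) / area

def p2_dbasis(b):
    """D[m, j] = dL_m/db_j for the six quadratic Lagrange functions at barycentric point b."""
    D = np.zeros((6, 3))
    for i in range(3):
        D[i, i] = 4.0 * b[i] - 1.0             # vertex function b_i (2 b_i - 1)
    for m, (i, j) in enumerate([(0, 1), (0, 2), (1, 2)], start=3):
        D[m, i], D[m, j] = 4.0 * b[j], 4.0 * b[i]   # edge function 4 b_i b_j
    return D

def p2_stiffness(verts):
    """Element stiffness matrix for (2.5) with P2 elements on one triangle."""
    grads, area = bary_gradients(verts)
    K = np.zeros((6, 6))
    # Edge-midpoint rule: exact for the quadratic integrands that occur here.
    for b in np.array([[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]):
        G = p2_dbasis(b) @ grads               # row m: gradient of L_m with respect to x
        K += (area / 3.0) * (G @ G.T)
    return K

gamma, n = 5.0 / 4.0, 6                        # hand-computed for this basis (assumption)
triangles = (np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
             np.array([[0.0, 0.0], [2.0, 0.3], [0.4, 1.1]]))
for verts in triangles:
    K, Q = p2_stiffness(verts), q2(verts)
    lam_max = np.linalg.eigvalsh(K)[-1]
    # Ratio Tr(K_t)/Q_2(t) and whether lam_max lies within the bounds (2.13).
    print(np.trace(K) / Q, gamma * Q / (n - 1) <= lam_max <= gamma * Q)
```

The ratio Tr(K_t)/Q_2(t) is the same for both triangles, which is the geometry independence of $\gamma_n^d$ asserted by Theorem 2.1.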
Though the upper and lower bounds in (2.13) differ by a factor of n − 1, the above estimate does provide a precise control on the contribution due to the mesh geometry on the largest eigenvalue of the element stiffness matrix. As discussed later, this is crucial to the application of the framework developed in [20, 21] for estimating the condition number of the assembled global stiffness matrix K on the whole domain. Naturally, by summing over all elements, we may also get a trace formula for the global stiffness matrix using the result of the above theorem. Let us consider

\[
-\nabla \cdot (\mu \nabla u) = f \qquad \text{in } \Omega,
\tag{2.14}
\]

with a diffusion coefficient μ = μ(x) and Neumann boundary condition $\partial u / \partial n = g$ on ∂Ω. Assume that f and g are compatible so that the equation is solvable.

Corollary 2.2. For any general finite element spaces defined on a simplicial mesh τ with the nodal basis on any d-dimensional simplex t ∈ τ satisfying the invariance property specified above, let $K_\mu$ be the global stiffness matrix of (2.14) with a Neumann boundary condition. If μ remains a constant on t for any t ∈ τ, then $K_\mu$ has the trace formula

\[
\operatorname{Tr}(K_\mu) = \gamma_n^d \sum_{t \in \tau} \mu_t \, Q_d(t) ,
\tag{2.15}
\]
where $\mu_t$ denotes the value of μ on t ∈ τ.

The trace formulae can be extended to more general cases, where μ is not necessarily a constant on t but remains invariant under the transformation of permuting the vertices. For example, in two dimensions, μ on an element t can take the form $c_1 + c_2 b_1 b_2 b_3$, with $\{b_i\}$ being the barycentric coordinates on t and $c_1$, $c_2$ being some constants. Note that for Dirichlet boundary conditions, contributions from the basis functions corresponding to the boundary nodes are not normally assembled into the stiffness matrix, which thus may lead to a minor alteration of the trace formula. We note that in the literature, it has been suggested that the minimization of the trace of the stiffness matrix can be used to optimize finite element grids; we see from (2.15) that the dependence of the trace on the mesh geometry is in fact the same for general finite element spaces.

In practical implementation of the finite element methods, especially with the use of high-order finite element spaces, the assembly of the stiffness and mass matrices is often done with the help of numerical integration. With enough precision in the numerical quadrature, the order of accuracy of the finite element methods can be preserved [38]. It is then natural to ask if the use of quadrature affects the discussions in this paper and thus the relation between the mesh geometry and the conditioning of the stiffness and mass matrices. Let us consider first a simplex t which is mapped via an affine transform F to the reference element $t_0$. Let $\{w_m, y_m \in t_0\}$ be a quadrature formula on $t_0$, that is,

\[
\int_{t_0} g(y) \, dy \sim \sum_{m} w_m \, |t_0| \, g(y_m) .
\]

Notice that a factor $|t_0|$ is added in the quadrature so that the normalization condition $\sum_m w_m = 1$ is satisfied. We assume in addition that $\{w_m, y_m \in t_0\}$ gives an invariant quadrature; that is, it is invariant with respect to a permutation of the vertices of $t_0$, which is satisfied, for instance, by the one-point quadrature at the barycenter, the midside rule, and other invariant high-order Gaussian quadratures [41]. For the entries of the element stiffness matrix for the Poisson equation $a_t(L_k, L_l)$, the integral on t is approximated by

\[
\int_t g(x) \, dx \sim \sum_{m} w_m \, |t| \, g(F^{-1} y_m) .
\tag{2.16}
\]
Now, define the modified bilinear form as

\[
\hat{a}_t(\phi, \psi) = \sum_{m} w_m \, |t| \, \nabla \phi(x_m) \cdot \nabla \psi(x_m)
\]

for any polynomials φ and ψ defined on t and $\{x_m = F^{-1} y_m\}$. We then can follow a derivation similar to the one given above to compute the trace of the modified element stiffness matrix $\hat{K}_t = (\hat{a}_t(L_k, L_j))$ to get the following.

Theorem 2.2. For any general finite element spaces defined on a simplicial mesh τ with the nodal basis on any d-dimensional simplex t ∈ τ satisfying the invariance property specified above, we have the following trace formula for the modified element stiffness matrix $\hat{K}_t$ for (2.5) computed using an invariant numerical quadrature:

\[
\operatorname{Tr}(\hat{K}_t) = \hat{\gamma}_n^d \, Q_d(t) ,
\tag{2.17}
\]

where n and $Q_d(t)$ are as defined before, F is the affine map that maps t to $t_0$, and $\hat{\gamma}_n^d$ is a positive constant defined by

\[
\hat{\gamma}_n^d = \frac{1}{2 d^3 (d+1)} \sum_{i=1}^{n} \sum_{j=1}^{d+1} \sum_{k=1}^{d+1} \sum_{m} w_m \Big( \frac{\partial L_i}{\partial b_j}(y_m) - \frac{\partial L_i}{\partial b_k}(y_m) \Big)^{2} .
\tag{2.18}
\]
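To illustrate Theorem 2.2, the sketch below assembles the quadrature-based element matrix $\hat{K}_t$ for quadratic Lagrange elements on triangles with the one-point barycenter rule and compares its trace with $\hat{\gamma}_n^d Q_2(t)$. Here $\hat{\gamma}_n^d$ is evaluated as $(\hat{\alpha} - \hat{\beta})/d^2$, the quadrature analogue of $\gamma_n^d = (\alpha_n^d - \beta_n^d) d^{-2}$; the symbols $\hat{\alpha}$, $\hat{\beta}$, the function names, and the sample triangles are introduced here only for the illustration.

```python
import numpy as np

def bary_gradients(verts):
    """Gradients of the barycentric coordinates (rows) and the area of a triangle."""
    T = np.vstack([np.ones(3), verts.T])
    return np.linalg.inv(T)[:, 1:], abs(np.linalg.det(T)) / 2.0

def q2(verts):
    """Q_2(t) = (sum of squared edge lengths) / |t|."""
    _, area = bary_gradients(verts)
    edges = [verts[(i + 1) % 3] - verts[(i + 2) % 3] for i in range(3)]
    return sum(e @ e for e in edges) / area

def p2_dbasis(b):
    """D[i, j] = dL_i/db_j for the P2 basis b_i(2 b_i - 1), 4 b_i b_j."""
    D = np.zeros((6, 3))
    for i in range(3):
        D[i, i] = 4.0 * b[i] - 1.0
    for m, (i, j) in enumerate([(0, 1), (0, 2), (1, 2)], start=3):
        D[m, i], D[m, j] = 4.0 * b[j], 4.0 * b[i]
    return D

def quad_stiffness(verts, points, weights):
    """Modified element matrix: sum_m w_m |t| grad L_k(x_m) . grad L_l(x_m)."""
    grads, area = bary_gradients(verts)
    K = np.zeros((6, 6))
    for b, w in zip(points, weights):          # points are given in barycentric coordinates
        G = p2_dbasis(b) @ grads
        K += w * area * (G @ G.T)
    return K

def gamma_hat(points, weights, d=2):
    """(alpha_hat - beta_hat) / d^2, using the coordinate pair (b_1, b_2)."""
    alpha = sum(w * np.sum(p2_dbasis(b)[:, 0] ** 2) for b, w in zip(points, weights))
    beta = sum(w * np.sum(p2_dbasis(b)[:, 0] * p2_dbasis(b)[:, 1])
               for b, w in zip(points, weights))
    return (alpha - beta) / d**2

points, weights = [np.full(3, 1.0 / 3.0)], [1.0]    # invariant one-point rule
g_hat = gamma_hat(points, weights)
triangles = (np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
             np.array([[0.0, 0.0], [2.0, 0.3], [0.4, 1.1]]))
for verts in triangles:
    K_hat = quad_stiffness(verts, points, weights)
    print(np.trace(K_hat), g_hat * q2(verts))       # the two numbers agree
```

The one-point rule underintegrates the quadratic integrands, so $\hat{\gamma}_n^d$ differs from $\gamma_n^d$, yet the geometric factor is still $Q_2(t)$ alone. Replacing the rule by the three edge midpoints with weights 1/3 recovers the exact constant.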
The significance of Theorem 2.2 lies in the fact that the only geometric factor affecting the trace remains $Q_d(t)$ even with the use of a numerical integration. Of course, the assumption that the quadrature is invariant is crucial for the observation to hold. Before we conclude the discussion on the trace formula, we make a few comments on the constant $\gamma_n^d$. First of all, it is possible to get some explicit estimates of $\gamma_n^d$. For instance, as seen before, for a linear finite element in any dimension, we have $\gamma_{d+1}^d = 1/d^2$. Naturally, it would be interesting to investigate the asymptotic behavior of $\gamma_n^d$ as n gets larger. This would be of interest for the case of very high order Lagrange elements and p or h-p finite element spaces. Such behavior will be studied in future works.

3. Mesh-dependent condition number estimates. In this section, we first discuss some detailed computations given in [37] on the relation between condition numbers of the stiffness matrices and the mesh geometry in some special cases. These results provide insight into the type of estimates we can expect in general. Afterwards, we recall some earlier estimates on the condition number of the stiffness matrices presented in [21]. We then use the trace formula derived in the previous section to reveal the detailed dependence of the condition numbers on the mesh geometry in the more general settings.
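3.1. Some known results on the linear Lagrange finite element. We first focus on a special case corresponding to the Poisson equation with a homogeneous boundary condition:

\[
\begin{cases}
-\Delta u = f & \text{in } \Omega, \\
u = 0 & \text{on } \partial\Omega,
\end{cases}
\tag{3.1}
\]

and its equivalent variational weak form: Find $u \in H_0^1(\Omega)$ such that

\[
\int_\Omega \nabla u \cdot \nabla v \, dx = \int_\Omega f v \, dx \qquad \forall v \in H_0^1(\Omega).
\tag{3.2}
\]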
While the explicit forms of the element stiffness matrices for linear triangular and tetrahedral elements can be found in many standard finite element texts, a detailed calculation can be found in [37], where careful discussions on the bounds of the eigenvalues of element stiffness matrices are also presented with respect to the mesh quality corresponding to the linear Lagrange finite element. Here, we briefly recall the results presented in [37]. Similar calculations have been given in many other works; see, for example, [20, 38, 40]. In two space dimensions, let $\{l_i, \theta_i\}$ (i = 1, 2, 3) be the edge lengths and internal angles of a triangle t ∈ τ with area |t|. Then the element stiffness matrix on the triangle t is precisely [37]

\[
\begin{aligned}
K_t = |t| \, (\nabla b_i \cdot \nabla b_j)
&= \frac{1}{8|t|}
\begin{pmatrix}
2 l_1^2 & l_3^2 - l_1^2 - l_2^2 & l_2^2 - l_1^2 - l_3^2 \\
l_3^2 - l_1^2 - l_2^2 & 2 l_2^2 & l_1^2 - l_2^2 - l_3^2 \\
l_2^2 - l_1^2 - l_3^2 & l_1^2 - l_2^2 - l_3^2 & 2 l_3^2
\end{pmatrix} \\
&= \frac{1}{2}
\begin{pmatrix}
\cot\theta_2 + \cot\theta_3 & -\cot\theta_3 & -\cot\theta_2 \\
-\cot\theta_3 & \cot\theta_1 + \cot\theta_3 & -\cot\theta_1 \\
-\cot\theta_2 & -\cot\theta_1 & \cot\theta_1 + \cot\theta_2
\end{pmatrix} .
\end{aligned}
\tag{3.3}
\]
In [37], the roots of its characteristic polynomial are computed as $\lambda_1 = 0$ and

\[
\lambda_{2,3} = \frac{1}{8|t|} \Big( l_1^2 + l_2^2 + l_3^2 \pm \sqrt{ (l_1^2 + l_2^2 + l_3^2)^2 - 48 |t|^2 } \Big) .
\tag{3.4}
\]

The largest root $\lambda_3^{K_t}$ is a scale-invariant indicator of the quality of the triangle's shape in terms of (3.4). Similar calculations can be found in other works as well; see, for example, [36], where eigenvalues of the diagonally preconditioned element stiffness matrix have also been explicitly computed. Note that the eigenvalues are nonnegative and $\lambda_1 = 0$, so

\[
\frac{1}{8|t|} \sum_{j} l_j^2 = \frac{1}{2} \sum_{j} \cot\theta_j \;\le\; \lambda_3^{K_t} \;\le\; \sum_{j} \cot\theta_j = \frac{1}{4|t|} \sum_{j} l_j^2 .
\tag{3.5}
\]

The above estimate is a special case of (2.13), and as explained in [37], it also shows that if any of the angles approaches 0 or π, it would lead to large $\lambda_3^{K_t}$, thus affecting the conditioning of the stiffness matrix. These angle conditions, as pointed out in [36], reflect the common knowledge of minimizing the element distortion, a principle behind the Delaunay triangulation [22, 37], and are compatible with the angle conditions for guaranteeing the uniform finite element approximations of derivatives [3, 38].
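The formulas (3.3)–(3.5) for the linear triangular element are easy to confirm numerically. The following minimal sketch builds $K_t$ both from the barycentric gradients and from the cotangent form, then compares the computed eigenvalues with (3.4); $\theta_i$ is taken as the angle at vertex i and $l_i$ as the length of the opposite edge. The function names and the sample triangle are hypothetical.

```python
import numpy as np

def p1_stiffness(verts):
    """K_t = |t| (grad b_i . grad b_j) for one triangle, together with |t|."""
    T = np.vstack([np.ones(3), verts.T])
    grads = np.linalg.inv(T)[:, 1:]
    area = abs(np.linalg.det(T)) / 2.0
    return area * (grads @ grads.T), area

def cot_stiffness(verts):
    """Cotangent form of (3.3); theta_i is the interior angle at vertex i."""
    cot = np.zeros(3)
    for i in range(3):
        u = verts[(i + 1) % 3] - verts[i]
        v = verts[(i + 2) % 3] - verts[i]
        cot[i] = (u @ v) / abs(u[0] * v[1] - u[1] * v[0])
    c1, c2, c3 = cot
    return 0.5 * np.array([[c2 + c3, -c3, -c2],
                           [-c3, c1 + c3, -c1],
                           [-c2, -c1, c1 + c2]])

verts = np.array([[0.0, 0.0], [2.0, 0.3], [0.4, 1.1]])
K, area = p1_stiffness(verts)

# Sum of squared edge lengths and the closed-form eigenvalues (3.4).
s = sum(np.sum((verts[(i + 1) % 3] - verts[(i + 2) % 3]) ** 2) for i in range(3))
root = np.sqrt(s**2 - 48.0 * area**2)
lam_formula = np.array([0.0, (s - root) / (8 * area), (s + root) / (8 * area)])
lam_numeric = np.linalg.eigvalsh(K)            # ascending order

print(np.allclose(K, cot_stiffness(verts)), np.allclose(lam_formula, lam_numeric))
print(s / (8 * area) <= lam_numeric[-1] <= s / (4 * area))   # bounds (3.5)
```

Moving one vertex so that an angle approaches 0 or π makes `s / area`, and with it the largest eigenvalue, grow without bound, which is the behavior described above.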
Similarly, the element stiffness matrix for the linear Lagrange element on a three-dimensional tetrahedron t can be written as [37]

\[
K_t = \frac{1}{6}
\begin{pmatrix}
k_{11} & -l_{34} \cot\theta_{34} & -l_{24} \cot\theta_{24} & -l_{23} \cot\theta_{23} \\
-l_{34} \cot\theta_{34} & k_{22} & -l_{14} \cot\theta_{14} & -l_{13} \cot\theta_{13} \\
-l_{24} \cot\theta_{24} & -l_{14} \cot\theta_{14} & k_{33} & -l_{12} \cot\theta_{12} \\
-l_{23} \cot\theta_{23} & -l_{13} \cot\theta_{13} & -l_{12} \cot\theta_{12} & k_{44}
\end{pmatrix} ,
\tag{3.6}
\]

where $l_{ij}$ is the length of the edge of t with corresponding dihedral angle $\theta_{ij}$, and the diagonal entries $\{k_{ii}\}$ are such that the row sums are all identically zero. In [37], the characteristic polynomial of $K_t$ is calculated as

\[
p(\lambda) = \lambda^4 - \frac{1}{9|t|} \sum_{i=1}^{4} |A_i|^2 \, \lambda^3 + \frac{1}{36} \sum_{1 \le j < k \le 4} l_{jk}^2 \, \lambda^2 - \frac{|t|}{9} \, \lambda .
\tag{3.7}
\]
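The coefficients in (3.7) can be compared with a direct computation. The sketch below forms $K_t = |t| (\nabla b_i \cdot \nabla b_j)$ for a linear tetrahedral element, obtains the characteristic polynomial from the computed eigenvalues, and evaluates the three geometric coefficients from face areas, edge lengths, and volume. The helper names and the sample tetrahedron are assumptions made for this illustration.

```python
import numpy as np
from itertools import combinations

def p1_stiffness_tet(verts):
    """K_t = |t| (grad b_i . grad b_j) for one tetrahedron, together with |t|."""
    T = np.vstack([np.ones(4), verts.T])
    grads = np.linalg.inv(T)[:, 1:]
    vol = abs(np.linalg.det(T)) / 6.0
    return vol * (grads @ grads.T), vol

def face_areas(verts):
    """Areas |A_i| of the faces opposite to vertex i, i = 0, ..., 3."""
    areas = []
    for i in range(4):
        a, b, c = (verts[j] for j in range(4) if j != i)
        areas.append(0.5 * np.linalg.norm(np.cross(b - a, c - a)))
    return np.array(areas)

verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.2, 1.0, 0.0],
                  [0.1, 0.3, 0.9]])
K, vol = p1_stiffness_tet(verts)
A = face_areas(verts)
edge_sq = sum(np.sum((verts[j] - verts[k]) ** 2) for j, k in combinations(range(4), 2))

# Monic characteristic polynomial built from the eigenvalues, highest power first.
coeffs = np.poly(np.linalg.eigvalsh(K))
expected = np.array([1.0, -np.sum(A**2) / (9.0 * vol), edge_sq / 36.0, -vol / 9.0, 0.0])
print(np.allclose(coeffs, expected, atol=1e-12))
```

The coefficient of $\lambda^3$ is minus the trace, so the check of that entry is the linear three-dimensional instance of the trace formula with $\gamma_4^3 = 1/9$.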
From (3.6), we can see that if one of the dihedral angles approaches 0, its cotangent approaches infinity, and so does $\lambda_4^{K_t}$, the maximum eigenvalue of $K_t$. For a tetrahedron, it is possible for one dihedral angle to be arbitrarily close to π without any dihedral angle of the tetrahedron being small (see [37]). Although an angle approaching π has a cotangent approaching negative infinity, surprisingly, such a tetrahedron does not induce a large eigenvalue in $K_t$ because each entry on the diagonal of $K_t$ is nonnegative and has the form $\sum_{i,j} l_{ij} \cot\theta_{ij}$. Therefore, if t has no dihedral angle close to 0, the diagonal entries of $K_t$ are bounded from above, and thus so is $\lambda_4^{K_t}$. This observation does not depend on whether t has planar angles near 0. For $\lambda_4^{K_t}$ of a tetrahedron t, the following estimate holds [37]:

\[
\frac{Q_3(t)}{27} \le \lambda_4^{K_t} \le \frac{Q_3(t)}{9} ,
\tag{3.8}
\]

where $Q_3(t)$ is as given in (2.10). This is again a special case of our general estimates (2.13) for the linear tetrahedral element (with d = 3 and n = 4). It shows that $\lambda_4^{K_t}$ (and thus $\lambda_N^K$) is not scale-invariant, so that $\lambda_4^{K_t}$ grows linearly with the longest edge, as pointed out in [37].

These calculations give some insight into how the conditioning of the element stiffness matrix for the linear element might be dependent on the mesh geometry. For a general higher-order finite element, it is not always possible to analytically solve for the eigenvalues of element stiffness matrices. Instead, the trace formula developed in the previous section can help establish the link between the mesh and the element stiffness matrices for general finite element spaces. The only key step that remains to be worked out is to see how the global stiffness matrix condition number is related to that of the element stiffness matrix. This is to be addressed next.

3.2. Some known condition number estimates. As stated before, we are interested in studying the stiffness matrix conditioning for general self-adjoint elliptic equations discretized by general finite element spaces on unstructured simplicial meshes. In [20, 21], a general estimate on the spectral properties of the global stiffness matrix in relation to that of the element stiffness matrix was given:

\[
\max_{t \in \tau} \big( \lambda_n^{K_t} \big) \le \lambda_N^K \le P_* \max_{t \in \tau} \big( \lambda_n^{K_t} \big) ,
\tag{3.9}
\]

\[
\lambda_1 \min_{t \in \tau} \big( \lambda_1^{M_t} \big) \le \lambda_1^K \le \lambda_1 P_* \max_{t \in \tau} \big( \lambda_n^{M_t} \big) ,
\tag{3.10}
\]
where $P_*$ is the maximal number of elements in τ meeting at a nodal point, and $\lambda_1$ is the smallest eigenvalue of the following elliptic eigenproblem:

\[
\begin{cases}
-\displaystyle\sum_{i,j=1}^{d} \frac{\partial}{\partial x_i}\Big( a_{ij} \frac{\partial u}{\partial x_j} \Big) = \lambda \rho u & \text{in } \Omega, \\
u = 0 & \text{on } \partial\Omega,
\end{cases}
\tag{3.11}
\]

where λ denotes any of the eigenvalues and u denotes a corresponding nonzero eigenfunction. The density ρ can be taken to be the unit constant in most cases, but for highly nonuniform meshes, a nonuniform density tends to give sharper estimates. From (3.9) and (3.10), the spectral condition number of the stiffness matrix K, Cond(K), satisfies the following inequalities [21]:

\[
\frac{ \max_{t \in \tau} \big( \lambda_n^{K_t} \big) }{ \lambda_1 P_* \max_{t \in \tau} \big( \lambda_n^{M_t} \big) }
\le \operatorname{Cond}(K) \le
\frac{ P_* \max_{t \in \tau} \big( \lambda_n^{K_t} \big) }{ \lambda_1 \min_{t \in \tau} \big( \lambda_1^{M_t} \big) } .
\tag{3.12}
\]

The lowest exact eigenvalue $\lambda_1$ can be regarded as a constant that depends only on the intrinsic properties of the continuous problem but does not depend on the discretization parameters. In this paper, we always consider those meshes with uniformly bounded $P_*$. It is certainly interesting to examine the sharpness of the estimates (3.12), or rather, the corresponding estimates on the extreme eigenvalues in (3.9)–(3.10). As our interests are to explore the connection between the mesh geometry and the condition number estimates, it can be seen from (3.9) that the estimate on the largest eigenvalue of the stiffness matrix is sharp up to at most a mesh-independent constant factor. But the lower and upper bounds in (3.10) can be different by orders of magnitude in an unstructured grid for a constant density ρ. A nonuniform density that matches the element volumes can help make the bounds sharper, as shown in [20]. This issue will be revisited in later sections. We note here that in cases where (3.12) is sharp, it remains to find good estimates on the extreme eigenvalues of the element stiffness and mass matrices.

3.3. Condition number estimates for general finite element spaces. For a given PDE, the relation between mesh geometry and stiffness matrix conditioning may vary with respect to different finite element spaces. To be able to utilize the trace formula and the estimates on the maximum eigenvalues of the element stiffness matrix established in the previous section, we again focus on the model problem (3.1) with an appropriate finite element space. Hence, in this subsection, K denotes the global stiffness matrix corresponding only to (3.1). By (3.12), to bound the condition number of K, we need estimates on $\lambda_1^{M_t}$, $\lambda_n^{M_t}$, and $\lambda_n^{K_t}$ for the element mass and stiffness matrices corresponding to (3.1). The dependence of the spectral properties of the mass matrices on the mesh geometry has been previously studied.
Some detailed computation can be found, for example, in [40]. For the element mass matrices, the computation is even simpler. Consider a general finite element basis function ψ = ψ(x) of the form

\[
\psi(x) = \sum_{|k|_1 \le n} \alpha_k \, b_1^{k_1} b_2^{k_2} \cdots b_{d+1}^{k_{d+1}} ,
\]

where $\{b_i\}$ are the barycentric coordinates, $k = (k_1, \ldots, k_{d+1})$ is a (d + 1)-dimensional multi-index with $|k|_1$ being its $l^1$ norm, and the coefficients $\alpha_k$ depend only on the finite element space chosen and are independent of the mesh geometry. For the uniform density ρ = 1, using a change of variable to the reference element $t_0$, it is easy to get the following.

Lemma 3.1. For the model bilinear form $a_\Omega$ in (2.5), for the constant density ρ = 1, the element mass matrix $M_t$ on the element t satisfies

\[
M_t = \frac{|t|}{|t_0|} \, M_{t_0} .
\tag{3.13}
\]

Consequently,

\[
\min_{t \in \tau} \lambda_1^{M_t} = \delta_n \min_{t \in \tau} |t| , \qquad
\max_{t \in \tau} \lambda_n^{M_t} = \sigma_n \max_{t \in \tau} |t| ,
\tag{3.14}
\]

where $\delta_n$ and $\sigma_n$ are two constants given by

\[
\delta_n = \frac{1}{|t_0|} \lambda_1^{M_{t_0}} , \qquad
\sigma_n = \frac{1}{|t_0|} \lambda_n^{M_{t_0}} .
\tag{3.15}
\]
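The scaling (3.13) and the constants in (3.15) can be illustrated for the linear element on triangles, where the element mass matrix with ρ = 1 has entries $\int_t b_i b_j \, dx$. A minimal sketch follows, assuming the edge-midpoint quadrature rule (exact for these quadratic integrands); the function name and the sample triangles are hypothetical.

```python
import numpy as np

MIDPOINTS = np.array([[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]])

def p1_mass(verts):
    """Element mass matrix (rho = 1) of the linear element on a triangle, and |t|."""
    T = np.vstack([np.ones(3), verts.T])
    area = abs(np.linalg.det(T)) / 2.0
    M = np.zeros((3, 3))
    for b in MIDPOINTS:                        # basis values at a point are the b_i
        M += (area / 3.0) * np.outer(b, b)
    return M, area

triangles = (np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
             np.array([[0.0, 0.0], [2.0, 0.3], [0.4, 1.1]]))
scaled_spectra = []
for verts in triangles:
    M, area = p1_mass(verts)
    scaled_spectra.append(np.linalg.eigvalsh(M) / area)
    print(scaled_spectra[-1])                  # identical for every triangle

delta_n, sigma_n = scaled_spectra[0][0], scaled_spectra[0][-1]
```

Since $M_t / |t|$ does not depend on the triangle, its extreme eigenvalues play the role of $\delta_n$ and $\sigma_n$; for this element they evaluate to 1/12 and 1/3.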
Note that the constants $\delta_n$ and $\sigma_n$ depend only on $t_0$ and the corresponding local finite element basis and are independent of the element t. For a nonuniform density ρ, we have the following.

Lemma 3.2. For the model bilinear form $a_\Omega$ in (2.5), we have

\[
\delta_n \min_{t \in \tau} \{ \rho_{\min}^t |t| \} \le \min_{t \in \tau} \lambda_1^{M_t} \le \delta_n \min_{t \in \tau} \{ \rho_{\max}^t |t| \} ,
\tag{3.16}
\]

\[
\sigma_n \max_{t \in \tau} \{ \rho_{\min}^t |t| \} \le \max_{t \in \tau} \lambda_n^{M_t} \le \sigma_n \max_{t \in \tau} \{ \rho_{\max}^t |t| \} ,
\tag{3.17}
\]

where $\rho_{\min}^t$ and $\rho_{\max}^t$ are the minimum and maximum values of ρ on the element t.

Proof. Given any $z = (z_1, \ldots, z_n)^T$, we define the function $\phi = \sum_{k=1}^{n} z_k L_k$, where $\{L_k\}$ is the nodal basis of the finite element space on t. Obviously, we have

\[
\rho_{\min}^t \int_t \phi^2 \, dx \le \int_t \rho \, \phi^2 \, dx \le \rho_{\max}^t \int_t \phi^2 \, dx .
\]

By the definition of the element mass matrices, it is then easy to see that

\[
\rho_{\min}^t \, z^T M_t^1 z \le z^T M_t z \le \rho_{\max}^t \, z^T M_t^1 z ,
\]

where $M_t^1$ corresponds to the element mass matrix with the constant density ρ = 1. Then by Lemma 3.1 and the variational definitions of the extreme eigenvalues, we immediately get the results in (3.16) and (3.17).

The above lemmas are valid for general finite element spaces, and they imply, by (3.10), that a lower bound for the smallest eigenvalue $\lambda_1^K$ of the global stiffness matrix is proportional to the volume of the smallest element, while an upper bound is proportional to the volume of the largest element; that is, we have the following.

Lemma 3.3. Under the conditions on the finite element spaces described earlier, for the model bilinear form $a_\Omega$ in (2.5), the smallest eigenvalue of the global stiffness matrix satisfies

\[
\lambda_1^* \, \delta_n \min_{t \in \tau} \{ \rho_{\min}^t |t| \} \le \lambda_1^K \le \lambda_1^* \, P_* \, \sigma_n \max_{t \in \tau} \{ \rho_{\max}^t |t| \} ,
\tag{3.18}
\]

where $\delta_n$ and $\sigma_n$ are two constants defined in (3.15), and $\lambda_1^*$ denotes the smallest eigenvalue of the continuous eigenproblem (3.11).
We note that the lower and upper bounds in (3.18) remain nearly on the same order for meshes with quasi-uniform element volumes in terms of the dependence on the mesh geometry. Thus, we expect that (3.18) may be less effective in highly graded or adapted meshes containing elements of very different sizes. We will revisit this in later discussions. Now to complete the condition number estimate, we need only bound the largest eigenvalue $\lambda_N^K$. From (3.9), we know that $\lambda_N^K$ is related to the largest eigenvalues of the element stiffness matrices. Given the bounds on the largest eigenvalues of the element stiffness matrices in (2.13), bounds of $\lambda_N^K$ may be derived.

Lemma 3.4. Under the conditions on the finite element spaces described earlier, for the model bilinear form $a_\Omega$ in (2.5), the largest eigenvalue of the global stiffness matrix satisfies

\[
\frac{\gamma_n^d}{n-1} \max_{t \in \tau} \{ Q_d(t) \} \le \lambda_N^K \le \gamma_n^d \, P_* \max_{t \in \tau} \{ Q_d(t) \} .
\tag{3.19}
\]
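The two-sided bound (3.19) can be observed on a small example. The sketch below assembles the global stiffness matrix of the linear element ($\gamma_3^2 = 1/4$, n = 3) on a perturbed triangulation of the unit square and compares its largest eigenvalue with the bounds. For brevity the matrix is assembled without eliminating boundary nodes; this simplification, the mesh generator, and all names are assumptions of the illustration.

```python
import numpy as np

def unit_square_mesh(m, jitter=0.05, seed=0):
    """Hypothetical test mesh: m x m squares split into triangles, interior nodes perturbed."""
    xs = np.linspace(0.0, 1.0, m + 1)
    pts = np.array([[x, y] for y in xs for x in xs])
    interior = np.all((pts > 1e-12) & (pts < 1.0 - 1e-12), axis=1)
    rng = np.random.default_rng(seed)
    pts[interior] += rng.uniform(-jitter, jitter, size=(interior.sum(), 2))
    tris = []
    for j in range(m):
        for i in range(m):
            a = j * (m + 1) + i
            b, c, d = a + 1, a + m + 1, a + m + 2
            tris += [(a, b, d), (a, d, c)]
    return pts, np.array(tris)

def p1_element(verts):
    """Element stiffness matrix of the linear element and Q_2(t)."""
    T = np.vstack([np.ones(3), verts.T])
    grads = np.linalg.inv(T)[:, 1:]
    area = abs(np.linalg.det(T)) / 2.0
    edges = [verts[(i + 1) % 3] - verts[(i + 2) % 3] for i in range(3)]
    return area * (grads @ grads.T), sum(e @ e for e in edges) / area

pts, tris = unit_square_mesh(4)
K = np.zeros((len(pts), len(pts)))
q_max = 0.0
for tri in tris:
    Kt, q = p1_element(pts[tri])
    K[np.ix_(tri, tri)] += Kt                  # standard element-by-element assembly
    q_max = max(q_max, q)

gamma, n = 0.25, 3
p_star = np.bincount(tris.ravel()).max()       # max number of elements meeting at a node
lam_max = np.linalg.eigvalsh(K)[-1]
print(gamma * q_max / (n - 1), lam_max, gamma * p_star * q_max)
```

The three printed numbers appear in increasing order, with the largest eigenvalue of the assembled matrix sandwiched between the two bounds of (3.19).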
Combining the results of Lemmas 3.3 and 3.4, we get the following.

Theorem 3.1. Under the assumptions on the finite element spaces made earlier, we have, for the model bilinear form $a_\Omega$ in (2.5), the following condition number estimate:

\[
\frac{ \gamma_n^d \max_{t \in \tau} \{ Q_d(t) \} }{ (n-1) \, \lambda_1^* \, P_* \, \sigma_n \max_{t \in \tau} \{ \rho_{\max}^t |t| \} }
\le \operatorname{Cond}(K) \le
\frac{ \gamma_n^d \, P_* \max_{t \in \tau} \{ Q_d(t) \} }{ \lambda_1^* \, \delta_n \min_{t \in \tau} \{ \rho_{\min}^t |t| \} } .
\tag{3.20}
\]

The proof of the theorem follows directly from the application of Lemma 3.1, Corollary 2.1, and estimates (3.9) and (3.10). The above result is for the Poisson equation (3.1), and the results for general diffusion equations can also be derived. As our objective is to explore the mesh dependence, we do not intend to get the optimal estimates with respect to all the quantities and parameters involved. Instead, we focus on results that have precise dependence on the geometric factors of the simplicial meshes. This can be easily achieved. For example, let us consider the following diffusion equation with a variable diffusion coefficient

\[
\begin{cases}
-\nabla \cdot ( A(x) \nabla u ) = f & \text{in } \Omega, \\
u = 0 & \text{on } \partial\Omega,
\end{cases}
\tag{3.21}
\]

with A = A(x) a d × d symmetric positive definite tensor satisfying

\[
0 < \beta_1 I \le A(x) \le \beta_2 I
\tag{3.22}
\]

uniformly for x ∈ Ω for some positive constants $\beta_1$ and $\beta_2$. We use $K_A$ to denote the stiffness matrix associated with the finite element discretization of (3.21) to differentiate from the notation K, which is reserved to denote the stiffness matrix for the Poisson equation (3.1) in this subsection. Then it is easy to check that for any $y \in \mathbb{R}^N$, with $u_h = \sum_{i=1}^{N} y_i \phi_i$ denoting the corresponding finite element function, we have

\[
y^T K_A \, y = \int_\Omega (\nabla u_h)^T A(x) \nabla u_h \, dx \le \beta_2 \int_\Omega (\nabla u_h)^T \nabla u_h \, dx = \beta_2 \, y^T K y ,
\]

and similarly,

\[
y^T K_A \, y \ge \beta_1 \, y^T K y .
\]
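The two quadratic-form inequalities can be seen already on a single linear element. The following minimal sketch, assuming a constant symmetric positive definite tensor A on one triangle (a simplification of the variable-coefficient setting), checks that $K_A - \beta_1 K$ and $\beta_2 K - K_A$ are both nonnegative definite, with $\beta_1$ and $\beta_2$ taken as the extreme eigenvalues of A. The sample data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
verts = np.array([[0.0, 0.0], [1.0, 0.1], [0.3, 0.8]])

# Barycentric gradients and area of the triangle.
T = np.vstack([np.ones(3), verts.T])
grads = np.linalg.inv(T)[:, 1:]
area = abs(np.linalg.det(T)) / 2.0

# A random constant symmetric positive definite coefficient tensor.
B = rng.standard_normal((2, 2))
A = B @ B.T + 0.5 * np.eye(2)
beta1, beta2 = np.linalg.eigvalsh(A)[[0, -1]]

K = area * (grads @ grads.T)                   # Laplace element matrix
KA = area * (grads @ A @ grads.T)              # element matrix with coefficient A

low = np.linalg.eigvalsh(KA - beta1 * K)[0]    # smallest eigenvalue of K_A - beta1 K
high = np.linalg.eigvalsh(beta2 * K - KA)[0]   # smallest eigenvalue of beta2 K - K_A
print(low >= -1e-12, high >= -1e-12)
```

Summing such element contributions over a mesh preserves both inequalities, which is how the bounds transfer to the assembled matrices.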
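Thus, using the standard variational characterization of the extreme eigenvalues, we immediately get the result of the following theorem.

Theorem 3.2. Under the assumptions on the finite element spaces made earlier, we have, for the stiffness matrix $K_A$ corresponding to (3.21), the estimate for the smallest eigenvalue,

\[
\beta_1 \lambda_1^* \, \delta_n \min_{t \in \tau} \{ \rho_{\min}^t |t| \} \le \lambda_1^{K_A} \le \beta_2 \lambda_1^* \, P_* \, \sigma_n \max_{t \in \tau} \{ \rho_{\max}^t |t| \} ,
\tag{3.23}
\]

and the estimate for the largest eigenvalue,

\[
\frac{\beta_1 \gamma_n^d}{n-1} \max_{t \in \tau} \{ Q_d(t) \} \le \lambda_N^{K_A} \le \beta_2 \, \gamma_n^d \, P_* \max_{t \in \tau} \{ Q_d(t) \} .
\tag{3.24}
\]

Consequently, we also have the following condition number estimates:

\[
\frac{ \beta_1 \gamma_n^d \max_{t \in \tau} \{ Q_d(t) \} }{ (n-1) \, \beta_2 \lambda_1^* \, P_* \, \sigma_n \max_{t \in \tau} \{ \rho_{\max}^t |t| \} }
\le \operatorname{Cond}(K_A) \le
\frac{ \beta_2 \gamma_n^d \, P_* \max_{t \in \tau} \{ Q_d(t) \} }{ \beta_1 \lambda_1^* \, \delta_n \min_{t \in \tau} \{ \rho_{\min}^t |t| \} } .
\tag{3.25}
\]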
The above theorem is very general and is valid in any space dimension for a general diffusion equation and for a general and possibly high-order finite element space (with an invariant nodal basis) defined on a general unstructured simplicial mesh. Despite the appearance of many terms in the estimate (3.25), a very precise relation between the conditioning of the global stiffness matrix and the mesh geometry is revealed by the bounds. Indeed, the most relevant quantities in (3.25) to the meshing qualities are simply the two ratios

\[
\frac{ \max_{t \in \tau} \{ Q_d(t) \} }{ \max_{t \in \tau} \{ \rho_{\max}^t |t| \} }
\qquad \text{and} \qquad
\frac{ \max_{t \in \tau} \{ Q_d(t) \} }{ \min_{t \in \tau} \{ \rho_{\min}^t |t| \} } ,
\]
assuming that P∗ , the maximal number of elements meeting at a nodal point, is under control. We note that for highly anisotropic problems or problems with strong inhomogeneous coefficients, the difference between β1 /β2 and β2 /β1 can be large. This issue is to be visited in later sections. 3.4. Mesh geometry and stiffness matrix conditioning. Based on Theorem 3.2, it can be said that, at least for problems that are not highly anisotropic, the most important geometric quantities that affect the conditioning of the global finite element stiffness matrix are the scaled volume ρtmin |t| of each element t (or ρtmax |t|, as we anticipate that ρtmax and ρtmin are of the same order for a given t), and the corresponding value of Qd (t). This is a rather universal property that is valid for general finite element spaces and general model equations. An effective control on these quantities in the meshing procedure may bear significance on the control of the conditioning of the linear systems coming from the finite element approximations.3 In cot θi with {θi }i=1 the two-dimensional case, we know that Q2 (t) corresponds to being the angles of the triangle t; thus, avoiding small angles in the triangulation is always preferred, as in the case of the Delaunay triangulation [22, 34, 37]. In fact, Qd (t) (for d = 2 or 3) has also been used as a mesh quality measure in many earlier studies on unstructured triangular meshes [5, 27, 30]. It has been labeled as a (smooth) conditioning quality measure in [37] based on the explicit calculation quoted earlier for the special case of the Poisson equation with a piecewise linear element. Relations between Qd (t) and other mesh quantity measures (see a nice summary in [37]) can
also be established. For example, let $r_{in}(t)$ be the radius of the largest inner-sphere of $t$; then

$$Q_d(t)\, r_{in}^2(t) \le |t|^{-1} \sum_{i=1}^{d+1} |A_i|^2\, \hat{h}_i^2(t) \le (d+1)^3 |t|,$$

where $\{\hat{h}_i(t)\}$ are the heights of the simplex $t$ corresponding to the faces $\{A_i\}$. Similarly, letting $r_{mc}(t)$ be the radius of the smallest containment sphere of $t$ (the min-containment radius [37]), we have

$$4\, Q_d(t)\, r_{mc}^2(t) \ge |t|^{-1} \sum_{i=1}^{d+1} |A_i|^2\, \hat{h}_i^2(t) \ge (d+1)^3 |t|.$$

These inequalities imply that

(3.26) $\dfrac{(d+1)^3 |t|}{4\, r_{mc}^2(t)} \le Q_d(t) \le \dfrac{(d+1)^3 |t|}{r_{in}^2(t)}$.
Thus, how $Q_d(t)$ varies with respect to a scaled volume is very much related to the traditional characterization of the dependence of $r_{in}(t)$ and $r_{mc}(t)$ on the volume. We leave more discussions along this line for future work.

4. Numerical validation and applications. We now apply the general estimates obtained in the previous section to various special cases. Some of these are widely known and are consistent with the popular understanding in the finite element and meshing community, while others are interesting on their own. Numerical examples are provided to assess whether the estimates are sharp.

4.1. Two-dimensional uniform triangular element. As a special case, we consider a two-dimensional rectangular domain with a uniform triangular mesh consisting of right triangles, but with different aspect ratios; see Figure 4.1 for an illustration. We take $\rho = 1$ in this case. Let $h$ be the length of the diagonal of each right triangle, and let $\theta$ and $\pi/2 - \theta$ be the two acute angles. Theorem 3.1 implies the following.

Corollary 4.1. Given the uniform triangular mesh described above, and under the assumptions on the finite element spaces made earlier, for the model bilinear form $a_\Omega$ in (2.5), we have the condition number estimate

(4.1) $\dfrac{c_1}{h^2 \sin^2(2\theta)} \le \mathrm{Cond}(K) \le \dfrac{c_2}{h^2 \sin^2(2\theta)}$

for some positive constants $c_1$ and $c_2$, independent of $h$ and $\theta$.

Proof. It follows from a simple calculation that for each triangle $t$, we have $|t| = h^2 \sin(2\theta)/4$ and $Q_2(t) = 8/\sin(2\theta)$. Substituting into the inequality (3.20), we get (4.1) immediately.

The result in Corollary 4.1 is widely known in the finite element and meshing community [2, 38]. It is in fact quite sharp. In Tables 4.1 and 4.2, we present some numerical results computed on such uniform triangular meshes with the total number of elements being fixed (= 8192), but with different values for the angle $\theta$. Thus, we get meshes of varying degrees of aspect ratio, and $h^2$ is proportional to $\sin^{-1}(2\theta)$ (that is, to $1/\sin(2\theta)$). The estimate (4.1) in Corollary 4.1 predicts that the condition number is proportional
Fig. 4.1. Uniform triangular meshes with isosceles right triangles (left) and right triangles with small angles (right).
Table 4.1. The linear element case for the uniform triangular mesh.

Mesh        λ_max^{K_t}   λ_max^{K}    λ_min^{K}    Condition number
4 × 1024    256.00        1024.011     0.004699     2.179372e+5
8 × 512     64.004        256.0577     0.004788     5.347539e+4
16 × 256    16.016        64.24519     0.004811     1.335275e+4
32 × 128    4.0655        16.99518     0.004817     3.528104e+3
64 × 64     1.5000        7.995182     0.004818     1.659380e+3
Table 4.2. The quadratic element case for the uniform triangular mesh.

Mesh        λ_max^{K_t}   λ_max^{K}    λ_min^{K}    Condition number
4 × 1024    682.67        1365.352     0.001198     1.140045e+6
8 × 512     170.69        341.4147     0.001204     2.836673e+5
16 × 256    42.750        85.66466     0.001205     7.108828e+4
32 × 128    11.023        22.66466     0.001205     1.880261e+4
64 × 64     4.7420        10.66466     0.001205     8.846961e+3
to $\sin^{-1}(2\theta)$, regardless of the order of the finite element spaces used. The same proportionality is true for $\lambda^K_{\max}$ as predicted by Lemma 3.4 and the computation above. In Tables 4.1 and 4.2, for each mesh we report the corresponding largest eigenvalue of the element stiffness matrix, the extreme eigenvalues, and the condition number of the global stiffness matrix. The quantity $\lambda^K_{\min}$ is nearly unchanged, which is consistent with the theoretical prediction in Lemma 3.3 since the elements have a constant volume. Meanwhile, $\lambda^K_{\max}$ and $\mathrm{Cond}(K)$ both grow when $\theta$ approaches $0$ or $\pi/2$, while their minimum values are attained for $\theta = \pi/4$, corresponding to the 64 × 64 mesh. In Figure 4.2, we plot with respect to $\sin^{-1}(2\theta)$ (the horizontal axis) the curves of the largest eigenvalue and the condition number, respectively, for both the linear and the quadratic elements. The condition number for the quadratic case is normalized by a factor of 5.12 so as to fit into the same plot range. The perfect linear behavior verifies the theoretical prediction.

4.2. Finite element on quasi-volume-uniform, shape-regular meshes. The previous example focuses on the effect of the shape regularity on the condition number with a uniform element size (volume). We now discuss some effect of the element size on the condition number when the shapes of the elements remain regular.
Fig. 4.2. Plots against sin−1 (2θ) (the horizontal axis) of Cond(K) and λmax (K) for the linear element (left) and Cond(K)/5.12 and λmax (K) for the quadratic element (right). Here, the solid lines represent λmax (K).
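Returning to the uniform right-triangle meshes of section 4.1, the linear-element eigenvalues in Table 4.1 can be approximated with a few lines of Python. This is a sketch under two assumptions that are not stated in the text: the domain is the unit square with homogeneous Dirichlet conditions, and the P1 stiffness matrix on such a mesh coincides with the scaled five-point stencil, so that its eigenvalues are sums of one-dimensional eigenvalues $2 - 2\cos(k\pi/n)$. The function names are hypothetical. The helper `right_triangle_quantities` also evaluates $|t|$ and $Q_2(t)$, taking $Q_2(t)$ to be the sum of squared edge lengths divided by the area.

```python
import numpy as np


def right_triangle_quantities(hx, hy):
    """Return (|t|, Q_2(t), 1/sin(2*theta)) for a right triangle with legs hx, hy."""
    h2 = hx * hx + hy * hy                      # squared hypotenuse h^2
    area = 0.5 * hx * hy                        # equals h^2 sin(2 theta) / 4
    q2 = (hx * hx + hy * hy + h2) / area        # sum of squared edges over area
    inv_sin = h2 / (2.0 * hx * hy)              # 1 / sin(2 theta)
    return area, q2, inv_sin


def eigs_1d(n):
    """Eigenvalues of tridiag(-1, 2, -1) of size n-1, in increasing order."""
    k = np.arange(1, n)
    return 2.0 - 2.0 * np.cos(k * np.pi / n)


def extreme_eigs(nx, ny):
    """Smallest and largest eigenvalue of the assumed P1 stiffness matrix."""
    hx, hy = 1.0 / nx, 1.0 / ny
    ex = (hy / hx) * eigs_1d(nx)                # contribution of x-derivatives
    ey = (hx / hy) * eigs_1d(ny)                # contribution of y-derivatives
    return ex[0] + ey[0], ex[-1] + ey[-1]


if __name__ == "__main__":
    for nx, ny in [(4, 1024), (8, 512), (16, 256), (32, 128), (64, 64)]:
        lam_min, lam_max = extreme_eigs(nx, ny)
        _, q2, inv_sin = right_triangle_quantities(1.0 / nx, 1.0 / ny)
        print(f"{nx:3d} x {ny:4d}  1/sin(2θ)={inv_sin:9.3f}  Q2={q2:9.3f}  "
              f"λmax={lam_max:11.5f}  λmin={lam_min:.6f}  "
              f"cond={lam_max / lam_min:.6e}")
```

Because the number of elements is fixed, the printed condition numbers can be compared directly against $1/\sin(2\theta)$, which is the linear relation plotted in Figure 4.2.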
In this subsection, we consider a simplicial mesh $\tau$ with simplices $t \in \tau$ satisfying

(4.2) $\rho_1 |t|^{2-2/d} \le \sum_i A_i^2 \le \rho_2 |t|^{2-2/d} \quad \forall t \in \tau$

for some positive constants $\rho_1$ and $\rho_2$, independent of $t$. We refer to such meshes as shape regular. In light of (3.26), to assure (4.2), it is sufficient to assume that

$$r_{mc}(t) \le \rho_3\, r_{in}(t) \quad \forall t \in \tau$$

for some constant $\rho_3$. Note that the latter condition is consistent with the traditional meaning of shape regularity given in standard texts (see, e.g., [11]). Meanwhile, we refer to a simplicial mesh $\tau$ as being quasi volume-uniform if

(4.3) $\min_{t\in\tau} |t| \ge \rho_3 \max_{t\in\tau} |t| \quad \forall t \in \tau$
holds for some positive constant $\rho_3$, independent of $t$. Note also that this is somewhat different from the traditional notion of a quasi-uniform mesh, which is measured using the diameters of the elements rather than the volumes [11].

First of all, we take $d = 2$ and consider the conforming linear element space on a quasi-volume-uniform and shape-regular triangulation. Theorem 3.1 implies the following.

Corollary 4.2. For the model bilinear form $a_\Omega$ in (2.5) with a two-dimensional linear triangular element space defined on a quasi-volume-uniform and shape-regular triangulation, if $h$ is the mesh parameter (diameter of the largest triangle), then

$$c_1 h^{-2} \le \mathrm{Cond}(K) \le c_2 h^{-2}$$

for some constants $c_1$ and $c_2$, independent of $h$.

Proof. We notice that under the assumption on the triangulation, for each triangle $t$, $Q_2(t) = |t|^{-1} \sum_{i=1}^{3} |A_i|^2$ and $|t| h^{-2}$ remain uniformly bounded below and above by positive constants. Substituting into the inequality (3.20) with $\rho = 1$, we get the corollary immediately.

While the above corollary is widely known, a lesser-known version about general Lagrange triangular finite element spaces remains true [2].

Corollary 4.3. For the model bilinear form $a_\Omega$ in (2.5) discretized by a finite element space with an invariant basis defined on a quasi-volume-uniform d-dimensional
simplicial mesh with $h$ being the mesh parameter (diameter of the largest simplex), if we further assume that all the simplices are shape regular in the sense of (4.2), then

$$c_1^{(n,d)} h^{-2} \le \mathrm{Cond}(K) \le c_2^{(n,d)} h^{-2}$$

for some constants $c_1^{(n,d)}$ and $c_2^{(n,d)}$, which are dependent on the finite element basis on the reference element $t_0$ and dimension $d$, but are independent of $h$.

The proof follows from the same line of argument as in the two-dimensional linear element case. We note that these corollaries can of course be derived in other ways, for instance, with the use of the inverse inequality [11].

4.3. Finite element on nonuniform shape-regular meshes. We now consider the case of shape-regular meshes, as specified by (4.2), but without the quasi-volume-uniform assumption. Thus, the element sizes $|t|$ are allowed to vary in a very large range. We then have the following.

Corollary 4.4. For the model bilinear form $a_\Omega$ in (2.5) with a general simplicial finite element space satisfying the conditions given in Theorem 3.1, corresponding to a d-dimensional simplicial mesh $\tau$ satisfying condition (4.2), we have

(4.4) $\dfrac{c_1 \max_{t\in\tau} |t|^{1-2/d}}{\max_{t\in\tau}\{\rho^t_{\max}|t|\}} \le \mathrm{Cond}(K) \le \dfrac{c_2 \max_{t\in\tau} |t|^{1-2/d}}{\min_{t\in\tau}\{\rho^t_{\min}|t|\}}$
for some constants $c_1$ and $c_2$ which are dependent on the finite element space but are independent of mesh geometry.

The above result is interesting, for example, in the context of adaptive finite element simplicial meshes satisfying (4.2) but containing elements with considerable variations in their sizes. Preserving shape regularity is often implemented in the local mesh refinement procedure so that it is reasonable to expect that (4.2) is satisfied. Let $h_{\min}$ be the diameter of the smallest element in an adaptive finite element mesh satisfying (4.2). In both one and two space dimensions, we see that the use of a constant density $\rho = 1$ would yield an upper bound proportional to $h_{\min}^{-2}$, which is of about the same order as the condition number of the linear system resulting from a uniform mesh of mesh size $h_{\min}$, though the number of degrees of freedom (and thus the dimension of the global stiffness matrix) may be much smaller in the adaptive case than in the uniform case. Yet, this is generally not sharp. In [20], it was shown that with the element size distribution being inversely proportional to the nonuniform density, that is,

(4.5) $c_1 N^{-1} \le \rho^t_{\min}|t| \le \rho^t_{\max}|t| \le c_2 N^{-1}$,
where N is the number of elements in τ , and c1 and c2 are some positive constants, the sharper upper bound (4.6)
Cond(K) ≤ cN hmin
holds. This is naturally consistent with the estimate given in (4.4). The sharper estimate indicates a much better condition number, and thus further demonstrates the greater efficiency of the adaptive mesh in both representing the PDE solutions and improving the conditioning of the resulting linear systems. For the inequalities (4.5) to hold for a smoothly defined density function, the variation in the element sizes needs to be properly controlled. Yet, we present a simple numerical example to
Fig. 4.3. Plots of Cond(K)hmin (left) with respect to different m (horizontal axis) and the logarithms of all 41 eigenvalues of K for m = 21 (right).
illustrate that the bound (4.6) remains quite accurate even for highly graded meshes. We take a two-point boundary value problem,

(4.7) $-u'' = f$ on $(-1, 1)$, and $u(-1) = u(1) = 0$,

and consider the discretization with the linear finite element on a geometrically graded mesh $x_i = \mathrm{sgn}(i - m)\, 2^{|i-m|-m}$ for $0 \le i \le 2m$. For this mesh, $h_{\min} = 2^{1-m}$ and $N = 2m - 1$. We note first that the bounds on the largest eigenvalue in Lemma 3.4 give the sharp estimate

$$\lambda^K_{2m-1} = O(h_{\min}^{-1}) = O(2^m).$$
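The graded-mesh experiment can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the stiffness matrix is the standard linear-element matrix for $-u''$ with homogeneous Dirichlet conditions, and dense eigenvalue routines are used because the matrices have only $2m - 1$ rows.

```python
import numpy as np


def graded_nodes(m):
    """Nodes x_i = sgn(i - m) * 2**(|i - m| - m), 0 <= i <= 2m, on [-1, 1]."""
    i = np.arange(0, 2 * m + 1)
    return np.sign(i - m) * 2.0 ** (np.abs(i - m) - m)


def stiffness_1d(x):
    """P1 stiffness matrix for -u'' on the mesh x with Dirichlet end nodes removed."""
    h = np.diff(x)                       # element lengths, len(x) - 1 of them
    n = len(x) - 2                       # number of interior nodes
    K = np.zeros((n, n))
    idx = np.arange(n)
    K[idx, idx] = 1.0 / h[:-1] + 1.0 / h[1:]
    K[idx[:-1], idx[:-1] + 1] = -1.0 / h[1:-1]
    K[idx[:-1] + 1, idx[:-1]] = -1.0 / h[1:-1]
    return K


def scaled_condition(m):
    """Return h_min * Cond(K) for the graded mesh with parameter m."""
    x = graded_nodes(m)
    lam = np.linalg.eigvalsh(stiffness_1d(x))
    return np.diff(x).min() * lam[-1] / lam[0]


if __name__ == "__main__":
    for m in range(4, 22, 2):
        print(f"m = {m:2d}   h_min * Cond(K) = {scaled_condition(m):8.3f}")
```

Printing the product for increasing $m$ gives a quantity that can be compared with the left panel of Figure 4.3; growth that is roughly linear in $m$ corresponds to $\mathrm{Cond}(K)$ of order $N h_{\min}^{-1}$.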
In Figure 4.3, we plot, with respect to $m$ (the horizontal axis in the left figure), the product of $h_{\min}$ and $\mathrm{Cond}(K)$. The near linear scaling in $m$ of $\mathrm{Cond}(K) h_{\min}$ implies that $\mathrm{Cond}(K)$ grows on the order of $N h_{\min}^{-1}$ rather than $O(h_{\min}^{-2})$, which is consistent with the sharper estimate (4.6). As mentioned in [37], a few small elements in a largely uniform mesh tend to produce large condition numbers, but in fact, they may only lead to a few outliers in the eigenvalue distributions and can thus be treated effectively. We also plot in Figure 4.3 the distribution of the logarithm of all 41 eigenvalues for the stiffness matrix corresponding to $m = 21$. It shows that, rather than giving only a few outliers, the geometrically (exponentially) graded meshes produce nearly exponentially distributed eigenvalues. Similar numerical examples can be constructed for two-dimensional problems.

In dimensions three or higher, if we take $\rho = 1$, then the upper bound of the condition number estimate in Corollary 4.4 shows the dependence on $h_{\min}^{-d}$, which is even worse than the dimension-independent estimate $O(h_{\min}^{-2})$ in Corollary 4.2 for a quasi-volume-uniform shape-regular mesh with mesh size $h_{\min}$. One may expect that it might be possible to get sharper bounds using a nonuniform density. We will examine these issues in greater detail in the future.

4.4. Finite element with three-dimensional tetrahedral meshes. Finite element methods are very popular for many large-scale three-dimensional problems. Three-dimensional unstructured tetrahedral mesh generation and optimization have also attracted much attention. For most mesh generators, a mesh sizing measure is introduced so that a mesh with suitably distributed sizing measure can be produced. Yet, controlling the shape regularity of the elements in spaces of three and higher dimensions remains a challenging task [17, 24, 31, 37].
Table 4.3. Extreme eigenvalues for Poisson equation with three-dimensional coarser meshes. Maxima and minima of $Q_3(t)$ and $|t|$ are taken over $t \in \tau$.

N_τ     max Q_3(t)   min |t|    max |t|   λ_min^{K_1}   λ_max^{K_1}   λ_min^{K_2}   λ_max^{K_2}
7553    934          0.00097    0.236     0.240         110.42        0.03          209
7838    472          0.00193    0.269     0.253         60.64         0.03          107.3
7879    311          0.00298    0.282     0.254         36.15         0.03          72
7837    181          0.00491    0.264     0.254         26.51         0.03          42.5
7532    160          0.00515    0.208     0.2401        17.30         0.03          37.51
7737    118          0.00594    0.242     0.247         18.24         0.03          28.67
7584    108          0.0103     0.235     0.240         16.68         0.03          25.72
7545    95           0.00932    0.202     0.240         13.61         0.03          23.02
Table 4.4. Extreme eigenvalues for Poisson equation with a three-dimensional finer mesh. Maxima and minima of $Q_3(t)$ and $|t|$ are taken over $t \in \tau$.

N_τ      max Q_3(t)   min |t|    max |t|   λ_min^{K_1}   λ_max^{K_1}   λ_min^{K_2}   λ_max^{K_2}
21244    2013         1.43e-4    0.0702    0.0836        227.8         0.01          450.11
22180    1103         1.58e-4    0.105     0.0831        113.37        0.01          247
22098    599          4.34e-4    0.101     0.0833        59.53         0.01          137.1
22060    401          5.99e-4    0.0927    0.0833        49.25         0.01          91.17
22065    335          8.09e-4    0.0958    0.0833        43.32         0.01          75.81
21460    259          8.66e-4    0.0727    0.0833        33.51         0.01          58.67
21522    257          7.51e-4    0.0743    0.0834        33.92         0.01          59.4
21710    255          4.73e-4    0.0854    0.0835        25.15         0.01          58.10
21638    121          2.25e-3    0.0837    0.0834        17.23         0.01          28.13
21575    115          2.18e-3    0.0784    0.0834        18.79         0.01          26.93
21315    114          2.34e-4    0.0749    0.0834        17.05         0.01          26.4
21273    84           2.73e-3    0.0696    0.0835        13.75         0.01          20.33
We now present some examples of the condition numbers of the stiffness matrix for the Poisson equation (2.5) in a cubic box $[0, 10]^3$ with a homogeneous Dirichlet boundary condition. The equation is solved on unstructured tetrahedral meshes generated with a uniform sizing measure. For detailed discussions on the related mesh generation procedures, we refer to [13, 14, 16, 17, 25] and the references cited therein. In our numerical results, computations are performed on meshes having two levels of resolution, with the coarser meshes having element numbers ranging from 7500 to 7900 and the finer meshes having element numbers ranging from 21200 to 22100. The corresponding extreme eigenvalues of the global stiffness matrices, denoted by $\{\lambda^{K_i}_{\min}, \lambda^{K_i}_{\max}\}_{i=1}^{2}$ for the linear and quadratic elements, respectively, are reported in Tables 4.3 and 4.4 for the various meshes. It is of course straightforward to get the condition numbers from the ratios of the extreme eigenvalues. In each case, we also list the number of elements ($N_\tau$) in the mesh $\tau$, the maximum ($\max_{t\in\tau}|t|$) and minimum ($\min_{t\in\tau}|t|$) values of the element volumes, and the maximum value of $Q_3(t)$ for $t \in \tau$. We may see from the tables that the smallest eigenvalues $\{\lambda^{K_i}_{\min}\}_{i=1}^{2}$ remain nearly constant for meshes at the same level, with a ratio of nearly a factor of 8 between the linear and quadratic elements. Notice that the smallest and the largest element volumes do vary between meshes at the same level, so the lower and upper bounds in (3.18) are not tight in this case. Meanwhile, the largest eigenvalues follow proportionally the values of $Q_d(t)$, as predicted by estimate (3.19). More extensive computational studies for more general equations and geometric domains are currently under investigation.
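For readers who wish to tabulate a quantity like $\max_{t\in\tau} Q_3(t)$ for their own meshes, the following Python sketch evaluates $Q_d(t)$ for a single simplex given by its vertices. It assumes the definition $Q_d(t) = |t|^{-1} \sum_{i=1}^{d+1} |A_i|^2$, with $|A_i|$ the measures of the faces, which is how the quantity appears in the proof of Corollary 4.2 and in section 4.5; the function names are hypothetical, and volumes are computed from Gram determinants so that the same routine handles both the simplex and its faces.

```python
import itertools
import math

import numpy as np


def simplex_volume(verts):
    """Volume of a k-simplex with k+1 vertices (rows of verts) in any dimension."""
    verts = np.asarray(verts, dtype=float)
    k = verts.shape[0] - 1
    edges = verts[1:] - verts[0]                 # k edge vectors as rows
    gram = edges @ edges.T                       # k x k Gram matrix
    return math.sqrt(max(np.linalg.det(gram), 0.0)) / math.factorial(k)


def q_measure(verts):
    """Assumed definition: sum of squared face measures divided by the volume."""
    verts = np.asarray(verts, dtype=float)
    n = verts.shape[0]
    vol = simplex_volume(verts)
    faces = itertools.combinations(range(n), n - 1)
    return sum(simplex_volume(verts[list(f)]) ** 2 for f in faces) / vol


if __name__ == "__main__":
    corner = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    sliver = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0.5, 0.5, 0.01)]
    print("corner tetrahedron Q_3 =", q_measure(corner))
    print("sliver tetrahedron Q_3 =", q_measure(sliver))
```

A nearly flat (sliver) tetrahedron has a tiny volume but faces of ordinary size, so its value of $Q_3$ is large; this is the kind of element that dominates the maximum reported in Tables 4.3 and 4.4.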
4.5. Effect of anisotropy. Diffusion equations with highly anisotropic coefficients have wide applications in many practical problems. In [37], some discussions have been given for the linear finite element corresponding to an anisotropic Poisson equation of the form (3.21) with $A = A(x)$ being replaced by a constant matrix $B$. To simplify the notation, we take the two-dimensional case as an example. Let $v_1, v_2$ denote the orthogonal unit eigenvectors of $B$, and let $\xi_1, \xi_2$ be the corresponding eigenvalues. Then, $B = \xi_1 v_1 v_1^T + \xi_2 v_2 v_2^T$. Let

$$G = \frac{1}{\sqrt{\xi_1}}\, v_1 v_1^T + \frac{1}{\sqrt{\xi_2}}\, v_2 v_2^T, \quad\text{or equivalently,}\quad G = B^{-1/2}.$$

Define the change of variable $(\tilde{x}, \tilde{y})^T = G(x, y)^T$ and $\tilde{f}(\tilde{x}, \tilde{y}) = f(G^{-1}(\tilde{x}, \tilde{y}))$. Let $\tilde{\Omega}$ denote the image of $\Omega$ and $\tilde{t}$ denote the image of an element $t$ for any $t \in \tau$. With the above change of variable, (3.21) with the constant coefficient matrix $B$ becomes (2.5) for variables $(\tilde{x}, \tilde{y}) \in \tilde{\Omega}$ with unknown solution $\tilde{u}$ and right-hand side $\tilde{f}$.

When the linear Lagrange finite element method is employed to solve the problem (3.21) with coefficient matrix $B$, since $G^{-1}\nabla b_i = \tilde{\nabla}\tilde{b}_i$ ($\tilde{\nabla}$ and $\{\tilde{b}_i\}$ are the gradient operator and the linear Lagrange basis in the new variable, respectively), we see that the element stiffness matrix $K_t$ on a triangle $t$ is identical to the element stiffness matrix for $\tilde{t}$ corresponding to the Poisson equation (2.5). Consequently, equilateral elements may not necessarily lead to good conditioning for stiffness matrices of anisotropic equations [37]. In [33], it is argued that an optimal uniform triangular mesh is equilateral with respect to the metric given by the inverse of the coefficient matrix, which is consistent with the computation given in [37]. With the help of the transformation $G$, the computations given in [37] can be readily applied to the case of more general finite element spaces, following similar discussions given in the earlier sections. It can thus be seen that the important geometric factors affecting the stiffness matrix conditioning, for highly anisotropic problems, are $\rho^{\tilde{t}}_{\min}|\tilde{t}|$ and $\rho^{\tilde{t}}_{\max}|\tilde{t}|$ of the transformed element $\tilde{t}$, and the corresponding value of $|\tilde{t}|^{-1}\sum_{i=1}^{d+1}|\tilde{A}_i|^2$.

For instance, consider the two-dimensional case with $B$ being a diagonal tensor with diagonal entries 81 and 1. In this case, $G$ is also diagonal with entries 1/9 and 1. Hence, thin triangles with an aspect ratio of roughly nine, oriented parallel to the x-axis, ideally provide the optimal stiffness matrix conditioning.
The numerical results in Table 4.5 are obtained by solving the anisotropic equation in a unit square with a linear finite element on meshes shown in Figure 4.1. As predicted, the condition number corresponding to the triangulation of the 27 × 243 rectangular mesh is the smallest.

Table 4.5. The linear element case for the anisotropic Poisson problem.

Mesh        λ_max^{K_t}    λ_max^{K}      λ_min^{K}   Condition number
3 × 2187    7.290278e+2    2.916332e+3    0.112615    2.589641e+4
9 × 729     81.252329      3.278779e+2    0.122119    2.684904e+3
27 × 243    13.500000      71.876786      0.123214    5.833500e+2
81 × 81     81.252329      3.278767e+2    0.123336    2.658407e+3
243 × 27    7.290278e+2    2.916321e+3    0.123348    2.364306e+4
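The anisotropic experiment can be approximated with a short Python sketch. It rests on assumptions that go beyond the text: $B = \mathrm{diag}(81, 1)$ with the larger entry multiplying the x-derivatives, an "nx × ny" mesh having nx cells in the x-direction, homogeneous Dirichlet conditions, and the fact that on uniform right-triangle meshes the linear-element stiffness matrix reduces to a scaled five-point stencil whose eigenvalues are sums of one-dimensional eigenvalues. The function names are hypothetical.

```python
import numpy as np


def inv_sqrt_spd(B):
    """Return G = B^{-1/2} for a symmetric positive definite matrix B."""
    xi, V = np.linalg.eigh(B)                  # B = V diag(xi) V^T
    return (V / np.sqrt(xi)) @ V.T


def eigs_1d(n):
    """Eigenvalues of tridiag(-1, 2, -1) of size n-1, in increasing order."""
    k = np.arange(1, n)
    return 2.0 - 2.0 * np.cos(k * np.pi / n)


def extreme_eigs(nx, ny, b11=81.0, b22=1.0):
    """Extreme eigenvalues of the assumed stiffness matrix for diag(b11, b22)."""
    hx, hy = 1.0 / nx, 1.0 / ny
    ex = b11 * (hy / hx) * eigs_1d(nx)         # x-derivative contribution
    ey = b22 * (hx / hy) * eigs_1d(ny)         # y-derivative contribution
    return ex[0] + ey[0], ex[-1] + ey[-1]


if __name__ == "__main__":
    print("G =\n", inv_sqrt_spd(np.diag([81.0, 1.0])))
    for nx, ny in [(3, 2187), (9, 729), (27, 243), (81, 81), (243, 27)]:
        lam_min, lam_max = extreme_eigs(nx, ny)
        print(f"{nx:3d} x {ny:4d}  λmax={lam_max:12.6f}  λmin={lam_min:.6f}  "
              f"cond={lam_max / lam_min:.6e}")
```

Under these assumptions the 27 × 243 mesh, whose cells become squares after the change of variable $G$, gives the smallest condition number among the five meshes.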
5. Conclusion and future work. In this paper, the relations between the spectral condition number of the stiffness matrix and mesh geometry are systematically explored. Our main results are rigorously derived and yet applicable to very general equations, finite element spaces, and geometric meshes. They may lead to more work in the following two directions: better understanding of the effect of geometry on the matrix conditioning can lead to the development of better iterative solvers; at the same time, better mesh generation and optimization strategies and mesh quality measures can be devised to generate meshes on which a compromise between the efficiency of the solver and the discretization error can be reached so that optimal performance of finite element computations can be obtained.

There remain many interesting issues to be investigated in the future; for instance, preconditioning can greatly improve the performance of the linear algebraic solvers, and for many practical applications, the discrete algebraic problems can be tractable only if effective preconditioners are used. It will thus be interesting to study the precise dependence of the condition number estimates on the mesh geometry for preconditioned stiffness matrices [36]. Also, it is well known that the stiffness matrix conditioning will be different when different basis functions are employed [4, 9]. Comparisons of different basis selections for high-order elements remain to be investigated. This is particularly important for the p-version or h-p version finite element methods [23, 35]. In addition, we have not considered equations involving convection terms, which may be solved by stabilized finite elements, the streamline-upwind Petrov/Galerkin methods, and the residual-free bubbles methods. Such discussions may become more complex due to the possible lack of symmetry in the stiffness matrix and the loss of variational structure.
Extensions to other interesting physical models such as the elasticity equations and Stokes equations, and to nonsimplicial meshes such as quadrilateral and hexahedral meshes (see [32, 28]), can also be considered. While many more issues remain to be examined, the present work complements existing work in the literature, and together, they provide a rigorous and systematic foundation for future studies.

Acknowledgment. The authors thank the support of Lab for Scientific and Engineering Computing, Chinese Academy of Sciences, where this work was first initiated.

REFERENCES

[1] D. Arnold, F. Brezzi, and M. Fortin, A stable finite element for the Stokes equations, Calcolo, 12 (1984), pp. 337–344.
[2] O. Axelsson and V. Barker, Finite Element Solution of Boundary Value Problems, Academic Press, London, 1983; reprinted as Classics Appl. Math. 35, SIAM, Philadelphia, 2001.
[3] I. Babuška and A. K. Aziz, On the angle condition in the finite element method, SIAM J. Numer. Anal., 13 (1976), pp. 214–226.
[4] E. Barragy and G. Carey, Preconditioners for high degree elements, Comput. Methods Appl. Mech. Engrg., 93 (1991), pp. 97–110.
[5] R. E. Bank and R. K. Smith, Mesh smoothing using a posteriori error estimates, SIAM J. Numer. Anal., 34 (1997), pp. 979–997.
[6] M. Batdorf, L. Freitag, and C. Ollivier-Gooch, Computational study of the effect of unstructured and mesh quality on solution efficiency, in Proceedings of the 13th CFD Conference, AIAA, Reston, VA, 1997.
[7] M. Berzins, Mesh quality: A function of geometry, error estimates or both?, Engineering with Computers, 15 (1999), pp. 236–247.
[8] M. Berzins, A solution-based triangular and tetrahedral mesh quality indicator, SIAM J. Sci. Comput., 19 (1998), pp. 2051–2060.
[9] G. Carey and E. Barragy, Basis function selection and precondition high degree finite element and spectral methods, BIT, 29 (1989), pp. 794–804.
[10] W. Cao, On the error of linear interpolation and orientation, aspect ratio, and internal angles of a triangle, SIAM J. Numer. Anal., 43 (2005), pp. 19–40.
[11] P. Ciarlet, The Finite Element Method for Elliptic Problems, North–Holland, Amsterdam, 1978; reprinted as Classics in Appl. Math. 40, SIAM, Philadelphia, 2002.
[12] M. Delfour, G. Payre, and J. Zolesio, An optimal triangulation for second-order elliptic problems, Comput. Methods Appl. Mech. Engrg., 50 (1985), pp. 231–261.
[13] Q. Du, V. Faber, and M. Gunzburger, Centroidal Voronoi tessellations: Applications and algorithms, SIAM Rev., 41 (1999), pp. 637–676.
[14] Q. Du and M. Gunzburger, Grid generation and optimization based on centroidal Voronoi tessellations, Appl. Comput. Math., 133 (2002), pp. 591–607.
[15] Q. Du, Z. Huang, and D. Wang, Mesh and solver co-adaptation in finite element methods for anisotropic problems, Numer. Methods Partial Differential Equations, 21 (2005), pp. 859–874.
[16] Q. Du and D. Wang, Tetrahedral mesh generation and optimization based on centroidal Voronoi tessellations, Internat. J. Numer. Methods Engrg., 56 (2003), pp. 1355–1373.
[17] Q. Du and D. Wang, Recent progress in robust and quality mesh generation, J. Comput. Appl. Math., 195 (2006), pp. 8–23.
[18] L. Freitag and C. Ollivier-Gooch, A cost/benefit analysis of simplicial mesh improvement techniques as measured by solution efficiency, Internat. J. Comput. Geom. Appl., 10 (2000), pp. 361–382.
[19] I. Fried, Condition of finite element matrices generated from nonuniform meshes, AIAA Journal, 10 (1972), pp. 219–221.
[20] I. Fried, Bounds on the spectral and maximum norms of the finite element stiffness, flexibility and mass matrices, Int. J. Solids Structures, 9 (1973), pp. 1013–1034.
[21] I. Fried, Numerical Solution of Differential Equations, Academic Press, New York, 1979.
[22] P. George and H. Borouchaki, Delaunay Triangulation and Meshing, Application to Finite Elements, Hermès, Paris, 1998.
[23] N. Hu, X. Guo, and I. Katz, Bounds for eigenvalues and condition number in the p-version of the finite element methods, Math. Comp., 67 (1998), pp. 1423–1450.
[24] L. Ju, Conforming centroidal Voronoi Delaunay triangulation for quality mesh generation, Intern. J. Numer. Anal. Model., 4 (2007), pp. 531–547.
[25] L. Ju, M. Gunzburger, and W. Zhao, Adaptive finite element methods for elliptic PDEs based on conforming centroidal Voronoi–Delaunay triangulations, SIAM J. Sci. Comput., 28 (2006), pp. 2023–2053.
[26] M. Kittur, R. Huston, and F. Oswald, Finite-Element Grid Improvement by Minimization of Stiffness Matrix Trace, NASA Tech. Report 87-C-4, Glenn Research Center, Cleveland, OH, 1987.
[27] P. Knupp, Matrix norms and the condition number: A general framework to improve mesh quality via node-movement, in Proceedings of the 8th International Meshing Roundtable (Lake Tahoe, CA), 1999, pp. 13–22.
[28] P. Knupp, Hexahedral and tetrahedral mesh shape optimization, Internat. J. Numer. Methods Engrg., 58 (2003), pp. 319–332.
[29] M. Křížek, On the maximum angle condition for linear tetrahedral elements, SIAM J. Numer. Anal., 29 (1992), pp. 513–520.
[30] A. Liu and B. Joe, Relationship between tetrahedron shape measures, BIT, 34 (1994), pp. 268–287.
[31] G. Miller, D. Talmor, S.-H. Teng, and N. Walkington, A Delaunay based numerical method for three dimensions: Generation, formulation and partition, in Proceedings of the 27th ACM Symposium on Theory of Computing, ACM, New York, 1995, pp. 683–692.
[32] P. Ming and Z. Shi, Quadrilateral mesh, Chin. Ann. Math. Ser. B, 23 (2002), pp. 235–252.
[33] S. Oh and J. Yim, Optimal finite element mesh for elliptic equation of divergence form, Appl. Math. Comput., 162 (2005), pp. 969–989.
[34] A. Okabe, B. Boots, K. Sugihara, and S. Chiu, Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, Wiley, Chichester, UK, 2000.
[35] E. Olsen and J. Douglas, Bounds on spectral condition numbers of matrices arising in the p-version of the finite element method, Numer. Math., 69 (1995), pp. 333–352.
[36] A. Ramage and A. J. Wathen, On preconditioning for finite element equations on irregular grids, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 909–921.
[37] J. Shewchuk, What is a Good Linear Finite Element? Interpolation, Conditioning, Anisotropy and Quality Measures, Tech. report, Department of Computer Science, University of California, Berkeley, CA, 2003.
[38] G. Strang and G. Fix, An Analysis of the Finite Element Method, Prentice-Hall, Englewood Cliffs, NJ, 1973.
[39] I. Tsukerman, A general accuracy criterion for finite element approximation, IEEE Trans. Magnetics, 35 (1998), pp. 1–4.
[40] A. Wathen, Realistic eigenvalue bounds for the Galerkin mass matrix, IMA J. Numer. Anal., 7 (1987), pp. 449–457.
[41] Y. Xu, Orthogonal polynomials and cubature formulae on balls, simplices, and spheres, J. Comput. Appl. Math., 127 (2001), pp. 349–368.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1445–1473
DYNAMICAL SYSTEMS AND NON-HERMITIAN ITERATIVE EIGENSOLVERS∗

MARK EMBREE† AND RICHARD B. LEHOUCQ‡

Abstract. Simple preconditioned iterations can provide an efficient alternative to more elaborate eigenvalue algorithms. We observe that these simple methods can be viewed as forward Euler discretizations of well-known autonomous differential equations that enjoy appealing geometric properties. This connection facilitates novel results describing convergence of a class of preconditioned eigensolvers to the leftmost eigenvalue, provides insight into the role of orthogonality and biorthogonality, and suggests the development of new methods and analyses based on more sophisticated discretizations. These results also highlight the effect of preconditioning on the convergence and stability of the continuous-time system and its discretization.

Key words. eigenvalues, dynamical systems, inverse iteration, preconditioned eigensolvers, geometric invariants

AMS subject classifications. 15A18, 37C10, 65F15, 65L20

DOI. 10.1137/07070187X
1. Introduction. Suppose we seek a small number of eigenvalues (and the associated eigenspace) of the non-Hermitian matrix $A \in \mathbb{C}^{n\times n}$, having at our disposal a nonsingular matrix $N \in \mathbb{C}^{n\times n}$ that approximates $A$. Given a starting vector $p_0 \in \mathbb{C}^n$, compute

(1.1) $p_{j+1} = p_j + N^{-1}(\theta_j - A)p_j$,

where $\theta_j - A$ is shorthand for $I\theta_j - A$, and

$$\theta_j = \frac{(Ap_j, p_j)}{(p_j, p_j)}$$

for some inner product $(\cdot, \cdot)$. Knyazev, Neymeyr, and others have studied this iteration for Hermitian positive definite $A$; see [21, 22] and references therein for convergence analysis and numerical experiments. Clearly the choice of $N$ will influence the behavior of this iteration. With $N = A$, the method (1.1) reduces to (scaled) inverse iteration: $p_{j+1} = A^{-1}p_j\theta_j$. We are interested in the case where $N$ approximates $A$, yet one can apply $N^{-1}$ to a vector much more efficiently than $A^{-1}$ itself. Such an $N$ acts as a preconditioner for $A$, and, hence, (1.1) represents a preconditioned iteration.

∗ Received by the editors September 4, 2007; accepted for publication (in revised form) November 7, 2008; published electronically March 13, 2009. http://www.siam.org/journals/sinum/47-2/70187.html
† Department of Computational and Applied Mathematics, Rice University, 6100 Main Street – MS 134, Houston, TX 77005-1892 (
[email protected]). This author’s research supported by U.S. Department of Energy grant DE-FG03-02ER25531 and National Science Foundation grant DMSCAREER-0449973. ‡ Sandia National Laboratories, P.O. Box 5800, MS 1110, Albuquerque, NM 87185–1110 (rblehou@ sandia.gov). Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the U.S. Department of Energy under contract DE-AC04-94AL85000.
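As a rough illustration of the iteration (1.1), the following Python sketch applies it to a small real matrix. The 2×2 test matrix, the starting vector, the simple choice N = τI with a hypothetical scalar `tau`, and all helper names are assumptions made for this example; none of them comes from the paper.

```python
# Minimal sketch of the preconditioned iteration (1.1):
#     p_{j+1} = p_j + N^{-1} (theta_j - A) p_j,
#     theta_j = (A p_j, p_j) / (p_j, p_j).
# Assumptions (not from the paper): a small real test matrix, the Euclidean
# inner product, and the simple preconditioner N = tau * I.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def rayleigh_quotient(A, p):
    return dot(matvec(A, p), p) / dot(p, p)

def preconditioned_iteration(A, p0, tau, steps):
    p = list(p0)
    for _ in range(steps):
        theta = rayleigh_quotient(A, p)
        Ap = matvec(A, p)
        # residual (theta - A) p, scaled by N^{-1} = 1 / tau
        p = [pi + (theta * pi - api) / tau for pi, api in zip(p, Ap)]
    return p, rayleigh_quotient(A, p)

A = [[1.0, 1.0],
     [0.0, 3.0]]          # nonsymmetric, eigenvalues 1 and 3
p, theta = preconditioned_iteration(A, p0=[1.0, 1.0], tau=4.0, steps=200)
print(theta)              # approaches the leftmost eigenvalue, 1
```

In this example the component of p along the eigenvector for λ = 3 shrinks relative to the component along the eigenvector for λ = 1 at every step, which is why a multiple of the identity already suffices as N here.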
This method contrasts with a different class of algorithms, based on inverse iteration (or the shift-invert Arnoldi algorithm), that apply a preconditioner to accelerate an “inner iteration” that approximates the solution to a linear system at each step; see, e.g., [24, 13, 16] and [6, Chapter 11]. For numerous practical large-scale non-Hermitian eigenvalue problems, such as those described in [25, 41], these inner iterations can be extremely expensive and highly dependent on the quality of the preconditioner. In contrast, as we shall see, the iteration (1.1) can converge to a leftmost eigenpair even when N is a suitable multiple of the identity. This paper provides a rigorous convergence theory that establishes sufficient conditions for (1.1) to converge to the leftmost eigenpair for non-Hermitian A. We obtain these results by viewing this iteration as the forward Euler discretization of the autonomous nonlinear differential equation (1.2)
\[ \dot p = N^{-1}\left( p\,\frac{(Ap, p)}{(p, p)} - Ap \right) \]
with a unit step size. Here A and N are fixed but p depends on a parameter, t; p˙ denotes differentiation with respect to t. In the absence of preconditioning, the differential equation (1.2) has been studied in connection with power iteration [10, 29], as described in more detail below. The nonzero steady-states of this system correspond to (right) eigenvectors of A, and, hence, one might attempt to compute eigenvalues by driving this differential equation to steady-state as swiftly as possible. Properties of the preconditioner determine which of the eigenvectors is an attracting steady-state. The differential equation (1.2) enjoys a distinguished property, observed, for example, in [10, 29] with N = I. Suppose that p solves (1.2), θ = (p, p)−1 (Ap, p), and N is self-adjoint and invertible (A may be non-self-adjoint). Then for all t,
\[ \frac{d}{dt}(p, Np) = \bigl(N^{-1}(p\theta - Ap), Np\bigr) + \bigl(p, NN^{-1}(p\theta - Ap)\bigr) = (p\theta, p) - (Ap, p) + (p, p\theta) - (p, Ap) = 0. \tag{1.3} \]
Thus, (p, Np) is an invariant (or first integral ), as its value is independent of time; see [19, section 1.3] for a discussion of the unpreconditioned case (N = I), and, e.g., [4, 18] for a general introduction to invariant theory and geometric integration. The invariant describes a manifold in n-dimensional space, (p, Np) = (p0 , Np0 ), on which the solution to the differential equation with p(0) = p0 must fall. Simple discretizations, such as Euler’s method (1.1), do not typically respect such invariants, giving approximate solutions that drift from the manifold. Invariant-preserving alternatives (see, e.g., [18, 26]) generally require significantly more computation per step (though a tractable method for the unpreconditioned, Hermitian case has been proposed by Nakamura, Kajiwara, and Shiotani [28]). Our goal is to explain the relationship between convergence and stability of the continuous and discrete dynamical systems. In particular, the quadratic invariant is a crucial property of the continuous system, and plays an important role in the convergence theory of the corresponding discretization, even when that iteration does not preserve the invariant. For a non-Hermitian problem, one naturally wonders how (1.1) can be modified to incorporate estimates of both left and right eigenvectors. In this case, we obtain
the coupled iteration (given here without preconditioning)
\[ \dot p = p\theta - Ap, \qquad \dot q = q\bar\theta - A^*q, \qquad \theta = \frac{(Ap, q)}{(p, q)}, \tag{1.4} \]
and a simple derivation reveals that (p, q) is invariant. Our analysis demonstrates that this two-sided dynamical system often suffers from finite-time blowup; in the discrete scheme this is tantamount to incurable breakdown, a well-known ailment of oblique projection methods (see [5] for a discussion and references to the literature within the context of non-Hermitian Lanczos methods).
A longstanding association exists between eigenvalue iterations and differential equations [1, 2, 3, 10, 11, 15, 19], often involving the observation that iterates of a particular eigenvalue algorithm are exactly discrete-time samples of some underlying continuous-time system. Notable examples include Rayleigh quotient gradient flow [10, 27], connections between the QR algorithm for dense eigenproblems and Toda flow [29, 39], and more general “isospectral flows” [42]. For example, Chu notes that the iterates of the standard power method can be obtained as integer-time samples of the solution to the system (1.2) with N = I and A replaced by log A [10, eq. (2.7)]. The present study draws upon this body of work, but takes a different perspective: we seek a better understanding of iterations such as (1.1) that provide only approximate solutions (with a truncation error due to discretization) to continuous-time systems such as (1.2). The distinction is significant: for example, a continuous-time generalization of the power method will converge, with mild caveats, to the largest magnitude eigenvalue, whereas the related systems we study can potentially converge to the leftmost eigenvalue at a shift-independent rate with little more work per iteration than the power method; see Theorems 4.4 and 6.3. The connection between eigensolvers and continuous-time dynamical systems also arises in applications.
For example, the Car–Parrinello method [8] determines the Kohn–Sham eigenstates from a second-order ordinary differential equation, Newton’s equations of motion (see [34, p. 1086] for a formulation using (1.2) with no preconditioning). The heavy ball optimization method [35] also formulates the minimum of the Rayleigh quotient via a second-order ordinary differential equation. In [7], the ground state solutions of Bose–Einstein condensates are determined via a normalized gradient flow discretized by several time integration schemes. (Both the Kohn–Sham eigenstates and Bose–Einstein condensates give rise to self-adjoint nonlinear eigenvalue problems.)
We begin our investigation with a study of various unpreconditioned iterations (N = I). Section 2 introduces basic differential equations for computation of invariant subspaces of matrix pencils, and then identifies parameter choices that yield invariant-preserving iterations. Near steady states, the solutions to these systems can be viewed as exact invariant subspaces for nearby matrices, as observed in section 3. From this point we focus on single vector iterations for standard eigenvalue problems. Section 4 describes exact solution formulas for two unpreconditioned continuous-time systems, one-sided and two-sided methods. As such exact solutions for the preconditioned case are elusive, we analyze such systems asymptotically using center manifold theory in section 5. These two sections provide the foundation for the main result of section 6, the development of sufficient conditions for convergence of (1.1) for non-Hermitian matrices.
2. Dynamical systems and invariant manifolds. We first examine properties of the dynamical system (1.2) and various generalizations suitable for computing
eigenvalues of non-Hermitian matrix pencils. Let $A, B \in \mathbb{C}^{n\times n}$ be general matrices with fixed (time-invariant) entries. For the generalized eigenvalue problem $Ax = Bx\lambda$ with $N = I$, the system (1.2) expands to $\dot p = Bp\theta - Ap$ for appropriate $\theta = \theta(t)$. This equation suggests a generalization from a system with the single vector $p \in \mathbb{C}^n$ to a system that evolves an entire subspace, given by the range of a matrix $P \in \mathbb{C}^{n\times k}$:
\[ \dot P = BPL - AP, \]
where differentiation is still with respect to the autonomous variable t; we shall address the choice of $L(t) \in \mathbb{C}^{k\times k}$ momentarily. (Quantities such as L are t-dependent unless explicitly stated otherwise; we typically suppress the t argument to simplify notation.) For non-Hermitian problems one might simultaneously evolve an equation for the adjoint to obtain approximations to the left eigenspace, which suggests the system
\[ \dot P = BPL - AP, \qquad \dot Q = B^*QM^* - A^*Q, \tag{2.1} \]
with initial conditions $P(0) = P_0$ and $Q(0) = Q_0$, where $P, Q \in \mathbb{C}^{n\times k}$ and $L, M \in \mathbb{C}^{k\times k}$. The choice we make for the time-dependent L and M can potentially couple P and Q as introduced in (1.4). Here $\cdot^*$ denotes the conjugate transpose and (·, ·) the standard Euclidean inner product (though this analysis generalizes readily to arbitrary inner products). If this system is at a steady state, i.e., $\dot P = \dot Q = 0$, then
\[ BPL = AP, \qquad B^*QM^* = A^*Q, \tag{2.2} \]
and, hence, provided P and Q have full column rank, the eigenvalues of L and M are included in the spectrum of the pencil A − λB, while the columns of P and Q span right- and left-invariant subspaces of the same pencil. We shall motivate the choice of L and M through generalizations of the invariant discussed in the introduction. The following notation facilitates the analysis of these subspace iterations.
Definition 2.1. Given $P, Q \in \mathbb{C}^{n\times k}$, define $(P, Q) = Q^*P \in \mathbb{C}^{k\times k}$; i.e., the (i, j) entry of (P, Q) satisfies $(P, Q)_{i,j} := (Pe_j, Qe_i)$, where $e_\ell$ denotes the $\ell$th column of the k × k identity matrix.
In this notation, we have the homogeneity property $(PL, Q) = Q^*PL = (P, Q)L$. Consider the pairs of (time-dependent) functions
\[ (Q, P),\ (P, Q) \qquad\text{and}\qquad (P, P),\ (Q, Q) \tag{2.3} \]
with derivatives
\[ \frac{d}{dt}(Q, P) = (\dot Q, P) + (Q, \dot P), \qquad \frac{d}{dt}(P, Q) = (\dot P, Q) + (P, \dot Q), \]
and
\[ \frac{d}{dt}(P, P) = (\dot P, P) + (P, \dot P), \qquad \frac{d}{dt}(Q, Q) = (\dot Q, Q) + (Q, \dot Q). \]
Inspired by (1.3), we next investigate how best to choose L and M to make either pair in (2.3) invariant under the system (2.1).
Theorem 2.2. For the system of ordinary differential equations (2.1) with initial conditions $P(0) = P_0 \in \mathbb{C}^{n\times k}$ and $Q(0) = Q_0 \in \mathbb{C}^{n\times k}$, the choices
\[ L = (BP, Q)^{-1}(AP, Q), \qquad M^* = (Q, BP)^{-1}(Q, AP) \tag{2.4} \]
give
\[ \frac{d}{dt}(P, Q) = \frac{d}{dt}(Q, P) = 0, \]
and, hence, $(P, Q) = (P_0, Q_0)$ and $(Q, P) = (Q_0, P_0)$ hold for all t.
Proof. Note that
\[ \begin{aligned} \frac{d}{dt}(P, Q) &= (\dot P, Q) + (P, \dot Q) \\ &= (BP, Q)L - (AP, Q) + M(P, B^*Q) - (P, A^*Q), \\ \frac{d}{dt}(Q, P) &= (P, \dot Q)^* + (\dot P, Q)^* \\ &= \bigl(M(P, B^*Q) - (P, A^*Q)\bigr)^* + \bigl((BP, Q)L - (AP, Q)\bigr)^*, \end{aligned} \]
where we have used (2.1) and the homogeneity property. We can force (d/dt)(P, Q) to zero by setting L and M as in (2.4).
The next result is a direct analogue of Theorem 2.2 for the second pair in (2.3). We omit the proof, a minor adaptation of the last one.
Theorem 2.3. For the system of ordinary differential equations (2.1) with initial conditions $P(0) = P_0 \in \mathbb{C}^{n\times k}$ and $Q(0) = Q_0 \in \mathbb{C}^{n\times k}$, the choices
\[ L = (BP, P)^{-1}(AP, P), \qquad M^* = (Q, BQ)^{-1}(Q, AQ) \]
give
\[ \frac{d}{dt}(P, P) = \frac{d}{dt}(Q, Q) = 0, \]
and, hence, $(P, P) = (P_0, P_0)$ and $(Q, Q) = (Q_0, Q_0)$ for all t.
The formulations for L and M given in Theorems 2.2 and 2.3 are known as generalized Rayleigh quotients [38]. With these values of L and M, we refer to (2.1) as the two-sided and one-sided dynamical systems. Theorem 2.2 shows that if $P_0^*Q_0 = I$, then the two-sided solutions will preserve this property (allowing for biorthogonal bases for left and right invariant subspaces), though possibly at the expense of growing $\|P\|$ or $\|Q\|$. Theorem 2.3, on the other hand, shows that the one-sided iteration maintains $\|P\|$ and $\|Q\|$, though biorthogonality will generally be lost. From the invariants we also see that the system preserves the rank of solutions to both one- and two-sided equations, provided they exist (see section 4). Since (P, P) is fixed for the one-sided system, so too are all singular values (and, thus, the rank) of P. For the two-sided system, if $(P_0, Q_0)$ is full rank, (P, Q) must always be as well, and, hence, P and Q individually have full rank. We denote the dynamical systems (2.1) given the generalized Rayleigh quotients of Theorems 2.2 and 2.3 as “two-sided” and “one-sided”, respectively. We refer to the ensuing schemes that result from discretizing (2.1) as “two-sided” and “one-sided” iterations.
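The invariance statements of Theorems 2.2 and 2.3 can be checked numerically in the single-vector case k = 1, where L and M^* reduce to scalars. The sketch below is only an illustration: the 2×2 complex matrices, the vectors p and q, and the helper names are arbitrary choices made here, not data from the paper.

```python
# Sketch: check Theorems 2.2 and 2.3 in the single-vector case k = 1.
# Assumptions (not from the paper): arbitrary 2x2 complex test matrices
# A, B and test vectors p, q; (x, y) = y^* x as in Definition 2.1.

def inner(x, y):
    """(x, y) = y^* x."""
    return sum(xi * complex(yi).conjugate() for xi, yi in zip(x, y))

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def adjoint(A):
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def rhs(A, B, p, q, L, M_star):
    """Right-hand side of (2.1) for k = 1, where L and M^* are scalars."""
    Ap, Bp = matvec(A, p), matvec(B, p)
    Asq, Bsq = matvec(adjoint(A), q), matvec(adjoint(B), q)
    p_dot = [L * b - a for a, b in zip(Ap, Bp)]
    q_dot = [M_star * b - a for a, b in zip(Asq, Bsq)]
    return p_dot, q_dot

A = [[1 + 2j, 0.5], [-1j, 3.0]]
B = [[2.0, 1j], [0.0, 1 - 1j]]
p = [1.0, 0.5 - 0.5j]
q = [0.3 + 1j, 1.0]
Ap, Bp = matvec(A, p), matvec(B, p)

# Two-sided choice (2.4): d/dt (p, q) = (p_dot, q) + (p, q_dot) vanishes.
L2 = inner(Ap, q) / inner(Bp, q)
M2_star = inner(q, Ap) / inner(q, Bp)
p_dot, q_dot = rhs(A, B, p, q, L2, M2_star)
two_sided = inner(p_dot, q) + inner(p, q_dot)

# One-sided choice (Theorem 2.3): d/dt (p, p) and d/dt (q, q) vanish.
L1 = inner(Ap, p) / inner(Bp, p)
M1_star = inner(q, matvec(A, q)) / inner(q, matvec(B, q))
p_dot, q_dot = rhs(A, B, p, q, L1, M1_star)
one_sided_p = inner(p_dot, p) + inner(p, p_dot)
one_sided_q = inner(q_dot, q) + inner(q, q_dot)

print(abs(two_sided), abs(one_sided_p), abs(one_sided_q))  # rounding level
```

The check evaluates only the instantaneous derivatives of the invariants; it does not integrate the system, so it says nothing about drift under a particular discretization.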
3. Invariants and backward stability. We saw in (2.2) that, at a steady state, the eigenvalues of L and M are exact eigenvalues of the pencil A − λB. As the system approaches a steady state, how well do the eigenvalues of the invariant-preserving choices for L and M approximate the eigenvalues of the pencil? First, consider the one-sided system, with L as given in Theorem 2.3 and P full rank. The first part of (2.1) can then be written as
\[ 0 = BPL - \bigl(A + \dot P(P, P)^{-1}P^*\bigr)P, \]
from which we see that the eigenvalues of L form a subset of the spectrum of the perturbed pencil $(A + \dot P(P, P)^{-1}P^*) - \lambda B$. How large can such perturbations be? Note that $(P, P)^{-1}P^* = P^+$ is the pseudoinverse of P, and so
\[ \bigl\|\dot P(P, P)^{-1}P^*\bigr\| \le \|\dot P\|\,\|P^+\| = \frac{\|\dot P\|}{\sigma_k}, \]
where $\sigma_k$ is the smallest singular value of $P \in \mathbb{C}^{n\times k}$. As discussed at the end of section 2, the choice of L in Theorem 2.3 that makes (P, P) invariant also makes $\sigma_k$ invariant. Thus, when $\dot P$ is small, i.e., near a steady state, we conclude that the eigenvalues of L are the exact eigenvalues of a nearby pencil, with $\sigma_k^{-1}$ acting as a condition number does in a backward error bound; that condition number can be set to one simply by taking $(P_0, P_0) = I$. (This is related to an error bound for Rayleigh–Ritz eigenvalue estimates for a Hermitian matrix using a nonorthogonal basis; see [32, Theorem 11.10.1].) This analysis suggests that a departure from orthogonality in a numerical integration of the differential equation is reflected in degrading accuracy of the approximate eigenvalues.
Now consider the two-sided system with L and M as given by Theorem 2.2 with nonsingular (BP, Q). We wish to rewrite (2.1) in the form
\[ 0 = BPL - (A + E)P, \qquad 0 = B^*QM^* - (A^* + E^*)Q \]
for the same E in both iterations. Lemma 1 of [20] implies that such a perturbation E exists if and only if $(BP, Q)L = M(BP, Q)$, which holds for the choice of L and M given in Theorem 2.2. The perturbation E is not unique, but $EP = \dot P$ and $E^*Q = \dot Q$. Moreover, the “main theorem” of [20] gives
\[ \min \|E\|_2 = \max\bigl\{ \|\dot P\|_2, \|\dot Q\|_2 \bigr\} \]
if $(P, P) = I_k$ and $(Q, Q) = I_k$. However, as the authors of [20] explain, a small $\|E\|_2$ is irrelevant unless $\|(P, Q)^{-1}\|_2$ is also small. In particular, when P is orthogonal to Q, $\min \|E\|_2$ is undefined. The discussion following Theorem 4.1 in subsection 4.1 explains that a large (or undefined) $\|(P, Q)^{-1}\|_2$ is equivalent to near breakdown (or serious breakdown) of the two-sided dynamical system.
We caution the reader that backward stability alone does not provide information on forward error, or accuracy, of the steady-states when $A \ne A^*$. The relevance of backward stability is that the solutions of our one- and two-sided systems are, at all times, steady-states for a related dynamical system. The distance to this related perturbed system depends upon the norm of the residuals.
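For the one-sided system with k = 1 and B = I, the backward-error reading above becomes concrete: the Rayleigh quotient θ is an exact eigenvalue of A + E with the rank-one perturbation E = ṗ p^*/(p, p), and the norm of E is ‖ṗ‖/‖p‖. The following sketch verifies this for an arbitrary real test matrix and vector; both, and the helper names, are assumptions made only for this illustration.

```python
# Sketch: backward-error reading of section 3 for k = 1 and B = I.
# With theta = (Ap, p)/(p, p) and p_dot = p*theta - A p, the rank-one matrix
# E = p_dot p^T / (p, p) satisfies (A + E) p = p*theta exactly, and
# ||E||_2 = ||p_dot|| / ||p||.
# Assumptions (not from the paper): a real 3x3 test matrix and test vector.
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

A = [[2.0, 1.0, 0.0],
     [0.0, 3.0, 1.0],
     [1.0, 0.0, 5.0]]
p = [1.0, 0.2, -0.1]

Ap = matvec(A, p)
pp = dot(p, p)
theta = dot(Ap, p) / pp
p_dot = [theta * pi - api for pi, api in zip(p, Ap)]

# Rank-one perturbation E = p_dot p^T / (p, p) and the perturbed matrix A + E.
E = [[pd * pj / pp for pj in p] for pd in p_dot]
A_pert = [[a + e for a, e in zip(row_a, row_e)] for row_a, row_e in zip(A, E)]

# Residual of the perturbed eigenproblem: (A + E) p - p*theta.
residual = [x - theta * pi for x, pi in zip(matvec(A_pert, p), p)]
residual_norm = math.sqrt(dot(residual, residual))

# For a rank-one matrix the 2-norm equals the Frobenius norm.
backward_error = math.sqrt(dot(p_dot, p_dot)) / math.sqrt(pp)
frobenius_E = math.sqrt(sum(e * e for row in E for e in row))

print(residual_norm, backward_error, frobenius_E)
```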
4. Convergence analysis. At least for single-vector iterations (i.e., k = 1), the analysis of the one- and two-sided dynamical systems follows readily from the remarkable fact that, in many cases, simple formulas give the exact solutions of these nonlinear differential equations. This observation, inspired by a lemma of Nanda [29], informs convergence analysis of the eigeniterations that result from the discretization of these equations. Though expressed for the standard eigenvalue problem, these results can naturally be adapted to the generalized case by replacing A with $B^{-1}A$. We discuss the solution operators for two-sided systems, followed by one-sided systems.
4.1. Two-sided systems. The following result generalizes a result of Nanda [29, Lemma 1.4] for the two-sided dynamical system.
Theorem 4.1. Consider the partitioned set of ordinary differential equations
\[ \dot p = p\theta - Ap, \qquad \dot q = q\bar\theta - A^*q, \tag{4.1} \]
with $p(0) = p_0$ and $q(0) = q_0$, where $p, q \in \mathbb{C}^n$, $(p_0, q_0) \ne 0$, and
\[ \theta = \frac{(Ap, q)}{(p, q)}. \]
Then there exists some $t_f > 0$ such that for all $t \in [0, t_f)$,
\[ p(t) = e^{-At}p_0\,\pi(t), \qquad q(t) = e^{-A^*t}q_0\,\overline{\pi(t)}, \]
where
\[ \pi(t) = \sqrt{\frac{(p_0, q_0)}{(e^{-At}p_0, e^{-A^*t}q_0)}}. \tag{4.2} \]
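To make the closed form in Theorem 4.1 tangible, the sketch below evaluates it for a diagonal test matrix, for which exp(−At) acts entrywise. The data A = diag(0, 1), p0 = (1, 1), q0 = (1, −2) are assumptions chosen for this illustration (they are real, so conjugations drop out); with them the denominator in (4.2) equals 1 − 2e^{−2t}, which vanishes at t_f = (ln 2)/2.

```python
# Sketch: the closed-form solution of Theorem 4.1 for a diagonal test matrix,
# where exp(-A t) is available entrywise.
# Assumptions (not from the paper): A = diag(0, 1), p0 = (1, 1), q0 = (1, -2),
# all real, so that A^* = A and conjugations can be dropped.
import math

lam = [0.0, 1.0]          # diagonal of A
p0 = [1.0, 1.0]
q0 = [1.0, -2.0]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def flow(v, t):
    """exp(-A t) v for the diagonal matrix A."""
    return [math.exp(-l * t) * vi for l, vi in zip(lam, v)]

def g(t):
    """Denominator (e^{-At} p0, e^{-A^* t} q0) appearing in (4.2)."""
    return dot(flow(p0, t), flow(q0, t))

def solution(t):
    """p(t) and q(t) from Theorem 4.1, valid for 0 <= t < t_f."""
    pi_t = math.sqrt(dot(p0, q0) / g(t))
    return [pi_t * x for x in flow(p0, t)], [pi_t * y for y in flow(q0, t)]

def norm_p(t):
    p, _ = solution(t)
    return math.sqrt(dot(p, p))

# Here g(t) = 1 - 2 exp(-2t), which vanishes at t_f = ln(2) / 2.
t_f = math.log(2.0) / 2.0

p, q = solution(0.9 * t_f)
print(dot(p, q), dot(p0, q0))                # the invariant (p, q) = (p0, q0)
print(norm_p(0.5 * t_f), norm_p(0.99 * t_f)) # ||p|| grows as t nears t_f
```

In this example (p, q) stays fixed while ‖p‖ grows without bound as t approaches t_f, which is the finite-time blow-up discussed after the proof.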
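Proof. We define $p(t) = e^{-At}p_0\,\pi(t)$ and $q(t) = e^{-A^*t}q_0\,\overline{\pi(t)}$, and will show that these formulas satisfy the system (4.1). Note that
\[ \begin{aligned} \dot\pi &= \frac{\pi}{2}\,\frac{\bigl(Ae^{-At}p_0, e^{-A^*t}q_0\bigr) + \bigl(e^{-At}p_0, A^*e^{-A^*t}q_0\bigr)}{\bigl(e^{-At}p_0, e^{-A^*t}q_0\bigr)} = \pi\,\frac{\bigl(Ae^{-At}p_0, e^{-A^*t}q_0\bigr)}{\bigl(e^{-At}p_0, e^{-A^*t}q_0\bigr)} \\ &= \pi\,\frac{\bigl(Ae^{-At}p_0\pi, e^{-A^*t}q_0\bar\pi\bigr)}{\bigl(e^{-At}p_0\pi, e^{-A^*t}q_0\bar\pi\bigr)} = \pi\,\frac{(Ap, q)}{(p, q)} = \pi\theta. \end{aligned} \]
Differentiating the formulas for p and q, thus, gives
\[ \begin{aligned} \dot p &= -Ae^{-At}p_0\pi + e^{-At}p_0\dot\pi = -Ap + \theta p, \\ \dot q &= -A^*e^{-A^*t}q_0\bar\pi + e^{-A^*t}q_0\dot{\bar\pi} = -A^*q + \bar\theta q, \end{aligned} \]
as required. The hypothesis that $(p_0, q_0) \ne 0$ ensures the existence of the solution at time t = 0. The formula will hold for all t > 0, until potentially
\[ \bigl(e^{-At}p_0, e^{-A^*t}q_0\bigr) = 0. \tag{4.3} \]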
We define $t_f$ to be the smallest positive t for which (4.3) holds. If no such positive t exists, the solution exists for all t > 0 and we can take $t_f = \infty$ in the statement of the theorem.
Theorem 4.1 gives $(p, q) = (p_0, q_0)$, precisely as Theorem 2.2 indicates. Under the conditions of Theorem 4.1, solutions of the two-sided single-vector equations (4.1) have the same direction as solutions of the simpler linear systems $\dot x = -Ax$, $x(0) = p_0$ and $\dot y = -A^*y$, $y(0) = q_0$, but the magnitudes of p and q vary nonlinearly with (4.2). In particular, the inner product of p and q can be zero, even with both p and q nonzero, leading to finite time blow-up of (4.1). Note that if
\[ \left( \frac{e^{-At}p_0}{(p_0, q_0)}, \frac{e^{-A^*t}q_0}{(q_0, p_0)} \right) = 0, \]
then π(t) is undefined. Hence, finite time blow-up is analogous to serious breakdown [43, p. 389], a problem endemic to oblique projection methods (see, e.g., [5]). This ratio will be nonzero but small in the vicinity of blow-up (or near-breakdown), a situation that commonly occurs in discretizations of these equations. The salient issue is that p and q are nearly orthogonal and so
\[ \frac{|(p, q)|}{\|p\|\,\|q\|} = \frac{\bigl|\bigl(e^{-At}p_0, e^{-A^*t}q_0\bigr)\bigr|}{\|e^{-At}p_0\|\,\|e^{-A^*t}q_0\|} \tag{4.4} \]
is a useful quantity to measure. This number is small when the secant of the angle between p and q is large. In section 6 we shall see the important consequences of these observations for eigensolvers derived from the discretization of (4.1).
One can avoid breakdown altogether by using starting vectors $p_0$ and $q_0$ that are sufficiently accurate approximations to the right and left eigenvectors of A associated with the leftmost eigenvalue. Suppose A is diagonalizable with a simple leftmost eigenvalue $\lambda_1$, and all other eigenvalues strictly to the right of $\lambda_1$. Thus, there exist an invertible X and a diagonal Λ such that $A = X\Lambda X^{-1}$ with $\Lambda_{1,1} = \lambda_1$. Write $\lambda_j = \Lambda_{j,j}$, so that $\operatorname{Re}\lambda_j > \operatorname{Re}\lambda_1$ for j = 2, . . . , n. Define $r = X^{-1}p_0$ and $s = X^*q_0$; i.e., r and s are the expansions of the starting vectors in biorthogonal bases of right and left eigenvectors of A.
Theorem 4.2. Under the setting established in the last paragraph, the condition
\[ |r_1 s_1| > \sum_{j=2}^{n} |r_j s_j| \]
is sufficient to ensure that the dynamical system (4.1) has a solution for all t ≥ 0 given by Theorem 4.1; i.e., no incurable breakdown occurs.
Proof. First note that
\[ \bigl(e^{-At}p_0, e^{-A^*t}q_0\bigr) = \bigl(Xe^{-\Lambda t}X^{-1}p_0, X^{-*}e^{-\Lambda^*t}X^*q_0\bigr) = \bigl(e^{-2\Lambda t}r, s\bigr) = \sum_{j=1}^{n} r_j\overline{s_j}\,e^{-2\lambda_j t}. \]
Since $\operatorname{Re}\lambda_1 < \operatorname{Re}\lambda_j$ for j ≥ 2, we have $|e^{-2\lambda_1 t}| \ge |e^{-2\lambda_j t}|$ for all t ≥ 0. The hypothesis involving r and s, thus, implies, for t ≥ 0, that
\[ \bigl|r_1\overline{s_1}\,e^{-2\lambda_1 t}\bigr| > \sum_{j=2}^{n} \bigl|r_j\overline{s_j}\,e^{-2\lambda_j t}\bigr|. \]
Given this expression, we can twice apply the triangle inequality to conclude
\[ \begin{aligned} 0 &< \bigl|r_1\overline{s_1}\,e^{-2\lambda_1 t}\bigr| - \sum_{j=2}^{n}\bigl|r_j\overline{s_j}\,e^{-2\lambda_j t}\bigr| \le \bigl|r_1\overline{s_1}\,e^{-2\lambda_1 t}\bigr| - \Bigl|\sum_{j=2}^{n} r_j\overline{s_j}\,e^{-2\lambda_j t}\Bigr| \\ &\le \Bigl|\sum_{j=1}^{n} r_j\overline{s_j}\,e^{-2\lambda_j t}\Bigr| = \bigl|\bigl(e^{-At}p_0, e^{-A^*t}q_0\bigr)\bigr|. \end{aligned} \]
Hence, π(t) in Theorem 4.1 is finite for all t ≥ 0, ensuring that the solution to the dynamical system (4.1) does not blow up at finite time.
Theorem 4.2 implies that finite-time blow-up (or serious breakdown) is not generic for (4.1). However, the sufficient condition provided suggests that excellent initial approximations to the leftmost (left and right) eigenvectors are needed.
4.2. One-sided systems. The single vector one-sided system possesses a similar exact solution, which has been studied in the context of gradient flows associated with Rayleigh quotient iteration. We shall see that finite-time blow-up is never a concern for such systems. The following is a modest restatement of a result of Nanda [29, Lemma 1.4] (who considers the differential equation acting on the unit ball in $\mathbb{R}^n$).
Theorem 4.3. Consider the ordinary differential equation
\[ \dot p = p\theta - Ap, \tag{4.5} \]
with $A \in \mathbb{R}^{n\times n}$ and initial condition $p(0) = p_0 \in \mathbb{R}^n$, where $p_0 \ne 0$ and
\[ \theta = \frac{(Ap, p)}{(p, p)}. \]
Then for all t ≥ 0, (4.5) has the exact solution $p(t) = e^{-At}p_0\,\omega(t)$, where
\[ \omega(t) = \sqrt{\frac{(p_0, p_0)}{(e^{-At}p_0, e^{-At}p_0)}}. \]
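The formula in Theorem 4.3 can be spot-checked numerically. The sketch below uses a diagonal test matrix so that exp(−At) acts entrywise, compares a centered finite difference of the closed form with the right-hand side of (4.5), and confirms that (p, p) is conserved. The data A = diag(1, 3), p0 = (1, 1), the sample time, and the difference step are assumptions made for this illustration only.

```python
# Sketch: closed form of Theorem 4.3 for a diagonal test matrix, compared
# against the differential equation (4.5) by a centered finite difference.
# Assumptions (not from the paper): A = diag(1, 3) and p0 = (1, 1).
import math

lam = [1.0, 3.0]          # diagonal of A
p0 = [1.0, 1.0]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def exact(t):
    """p(t) = exp(-A t) p0 * omega(t)."""
    x = [math.exp(-l * t) * v for l, v in zip(lam, p0)]
    omega = math.sqrt(dot(p0, p0) / dot(x, x))
    return [omega * xi for xi in x]

def rhs(p):
    """Right-hand side p*theta - A p of (4.5) for the diagonal matrix A."""
    theta = sum(l * pi * pi for l, pi in zip(lam, p)) / dot(p, p)
    return [theta * pi - l * pi for l, pi in zip(lam, p)]

t, h = 0.7, 1e-5
p = exact(t)
p_dot_fd = [(a - b) / (2 * h) for a, b in zip(exact(t + h), exact(t - h))]
ode_defect = max(abs(a - b) for a, b in zip(p_dot_fd, rhs(p)))

print(dot(p, p), dot(p0, p0))   # (p, p) is conserved
print(ode_defect)               # small: finite-difference error only
print(exact(20.0))              # close to sqrt(2) times the eigenvector e1
```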
We omit the proof of this result, which closely mimics that of Theorem 4.1. Of course, a similar formula can be written for the one-sided equation for q(t). The restriction to real matrices guarantees that $(Ae^{-At}p_0, e^{-At}p_0) = (e^{-At}p_0, Ae^{-At}p_0)$; the result also holds for complex Hermitian A. As before, p has the same direction as the solution to the dynamical system $\dot x = -Ax$ with $x(0) = p_0$, but the magnitude is scaled by the nonlinear scalar ω. Provided $p_0 \ne 0$, the one-sided system (4.5) cannot blow up in finite time, since $(p, p) \ne 0$, in stark contrast to the two-sided iteration. This collinearity implies that the p vectors produced by the one- and two-sided systems provide equally accurate approximations to the desired eigenvector, at least until the latter breaks down. When A has a unique simple eigenvalue of smallest real part and the hypotheses of Theorem 4.1 or 4.3 are met, the asymptotic analysis of the associated dynamical system readily follows; cf. [19, section 1.3] for a generic asymptotic linear stability
analysis of the one-sided iteration. In fact, one can develop explicit bounds on the sine of the angle between p and the desired eigenvector $x_1$, defined as
\[ \sin\angle(p, x_1) := \min_{\alpha\in\mathbb{C}} \frac{\|\alpha p - x_1\|}{\|x_1\|}. \]
Theorem 4.4. Suppose A can be diagonalized, $A = X\Lambda X^{-1}$, and the eigenvalues of A can be ordered as $\operatorname{Re}\lambda_1 < \operatorname{Re}\lambda_2 \le \cdots \le \operatorname{Re}\lambda_n$. Let $x_1$ and $y_1$ denote right and left eigenvectors associated with $\lambda_1$, with $\|x_1\| = 1$ and $y_1^*x_1 = 1$. Then the solution p(t) to both systems (4.1) and (4.5) satisfies
\[ \sin\angle(p(t), x_1) \le \|X\|\,\|X^{-1}\|\,\frac{\|p_0\|}{|y_1^*p_0|}\,e^{\operatorname{Re}(\lambda_1 - \lambda_2)t} \]
for all t ≥ 0 in the case of (4.5), and for all $t \in [0, t_f)$ in the case of (4.1).
Proof. Since $x_1$ is a unit vector, we can write
\[ \sin\angle(p(t), x_1) = \min_{\alpha\in\mathbb{C}} \|\alpha p(t) - x_1\|. \]
In both (4.5) and (4.1), p(t) is collinear with $e^{-At}p_0$, so we can proceed with
\[ \begin{aligned} \sin\angle(p(t), x_1) &= \min_{\alpha\in\mathbb{C}} \bigl\|\alpha Xe^{-\Lambda t}X^{-1}p_0 - x_1\bigr\| \\ &\le \Bigl\|\frac{e^{\lambda_1 t}}{y_1^*p_0}\,Xe^{-\Lambda t}X^{-1}p_0 - x_1\Bigr\| \le \|X\|\,\|X^{-1}\|\,\frac{\|p_0\|}{|y_1^*p_0|}\,e^{\operatorname{Re}(\lambda_1 - \lambda_2)t}. \end{aligned} \]
The first inequality follows from choosing a (suboptimal) value of α that cancels the terms in the $x_1$ direction. (For similar analysis of the Arnoldi eigenvalue iteration, see [37, Proposition 2.1].) An analogous bound could be developed for the convergence of q to the left eigenvector $y_1$. When A is far from normal, one typically observes a transient stage of convergence that could be better described via analysis that avoids the diagonalization of A; see, e.g., [40, section 28], which includes similar analysis for the power method.
The two-sided iteration converges to left and right eigenvectors of A associated with the leftmost eigenvalue, provided the method does not break down on the way to this limit. Several natural questions arise: How common is breakdown? How well do discretizations mimic this dynamical system? Before investigating these issues in section 6, we first address how preconditioning can accelerate (and complicate) the convergence of these continuous-time systems.
5. Preconditioned dynamical systems. What does it mean to precondition the eigenvalue problem? Several different strategies have been proposed in the literature (see especially the discussion in [21, pp. 109–110]); here we shall investigate analogous approaches for our continuous-time dynamical systems, and the implications such modifications have on the convergence behavior described in the last section. One might first consider applying to the generalized eigenvalue problem $Ap = Bp\lambda$ left and right preconditioners M and N, so as to obtain the equivalent pencil
\[ \bigl(M^{-1}AN\bigr)\bigl(N^{-1}p\bigr) = \bigl(M^{-1}BN\bigr)\bigl(N^{-1}p\bigr)\lambda. \tag{5.1} \]
Provided B is invertible, one could then define
\[ \widehat A := \bigl(M^{-1}BN\bigr)^{-1}M^{-1}AN = N^{-1}B^{-1}AN, \qquad \widehat p := N^{-1}p, \]
then apply the concepts from the preceding sections to the standard eigenvalue problem $\widehat A\widehat p = \widehat p\lambda$. For example, we could seek the leftmost eigenpair of $\widehat A$ by evolving the dynamical system
\[ \dot{\widehat p} = \widehat p\,\widehat\theta - \widehat A\widehat p, \]
with the (preconditioned) Rayleigh quotient
\[ \widehat\theta = \frac{(\widehat A\widehat p, \widehat p)}{(\widehat p, \widehat p)} = \frac{\bigl(N^{-1}B^{-1}Ap, N^{-1}p\bigr)}{\bigl(N^{-1}p, N^{-1}p\bigr)}. \]
Note that $\widehat A$ and $B^{-1}A$ share the same spectrum because they are similar, and, hence, the asymptotic rate in Theorem 4.4 is immune to the preconditioner. The application of N could affect the system’s transient behavior, but M exerts no influence at all.1
1 Alternatively, by substituting $(M^{-1}BN)^{-1}\widetilde p := N^{-1}p$ in (5.1), we obtain a system driven by $\widetilde A = M^{-1}AB^{-1}M$ that is independent of N.
Several choices for N are interesting. Taking $N = A^{-1}$ gives $\widehat A = AB^{-1}$, an alternative to the $B^{-1}A$ form suggested by the original problem. Similarity transformations can also be used to balance a matrix to improve the conditioning of the eigenvalue problem [31, 33], in which case N is constructed as a diagonal matrix that reduces the norm of $\widehat A$. Such balancing tends to decrease the departure from normality associated with the largest magnitude eigenvalues. In fact, in the 1960 article that introduced this idea, Osborne refers to this procedure as “pre-conditioning” [31]. A more extreme, if impractical, approach takes N to be a matrix that diagonalizes $B^{-1}A$ (provided such a matrix exists), a choice that minimizes the constant $\|X\|\,\|X^{-1}\|$ that describes the departure from normality in Theorem 4.4.
As useful as such improvements might be, these strategies fail to alter the asymptotic convergence rate described in Theorem 4.4. To potentially improve this rate, one can apply the preconditioner $N^{-1}$ directly to the residual $p\theta - Ap$. Consider the dynamical system
\[ \dot p = N^{-1}(p\theta - Ap), \tag{5.2} \]
where θ refers to the usual (unpreconditioned) Rayleigh quotient $\theta = (Ap, p)/(p, p)$. Discretization of this system results in the familiar preconditioned eigensolver described in (1.1). For this case, a generalization of Theorem 4.3 has proved elusive; we have found no closed form for the exact solution. Indeed, as we shall next see, the choice of preconditioner can even complicate the system’s local behavior. Let $x_1$ denote a unit eigenvector of A associated with the eigenvalue $\lambda_1$. Note that $x_1$ is a steady-state of (5.2), linearizing about which gives the Jacobian
\[ J = N^{-1}(I - x_1x_1^*)(\lambda_1 - A). \tag{5.3} \]
As $Jx_1 = 0$, the Jacobian J always has a zero eigenvalue, adding complexity to conventional linear stability analysis. The challenge can be magnified by a poor
choice for N. For example, suppose
\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \qquad N = N^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad \lambda_1 = 1, \]
so that
\[ J = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 0 & 0 \end{bmatrix}; \]
i.e., the Jacobian is a Jordan block with a double eigenvalue at zero.
To obtain a rough impression of the behavior of the continuous system when θ is in the vicinity of $\lambda_1$, consider the constant-coefficient equation $\dot p = N^{-1}(p\lambda_1 - Ap)$, whose solution obeys the simple formula
\[ p(t) = e^{N^{-1}(\lambda_1 - A)t}\,p(0). \]
Hence, the asymptotic behavior of p is controlled by the spectrum of $N^{-1}(\lambda_1 - A)$. Assuming that $N^{-1}(\lambda_1 - A)$ has a simple zero eigenvalue, the convergence of this system to the dominant eigenvector depends on the nonzero eigenvalues of $N^{-1}(\lambda_1 - A)$: if this matrix has any other eigenvalues in the closed right half plane, the system will not generically converge; if all nonzero eigenvalues are in the open left half plane, then the convergence rate will be determined by the rightmost of them.
Specific choices for $N^{-1}$ will naturally depend significantly on the application problem at hand; in our general setting we seek to characterize basic traits of effective preconditioners. From the perspective of the convergence rate of the continuous dynamical system, we seek a preconditioner $N^{-1}$ such that the nonzero eigenvalues of $N^{-1}(\lambda_1 - A)$ are as far to the left as possible. While the leftmost eigenvalues of $N^{-1}(\lambda_1 - A)$ do not much affect the behavior of the continuous system, they can have a significant effect on the stability of the discretized difference equation, i.e., the related eigensolvers. For example, if $N^{-1}(\lambda_1 - A)$ moves all nonzero eigenvalues into the left half plane, then replacing N by $\tfrac{1}{2}N$ doubles the convergence rate of the continuous system. (We shall see on page 1461 that there is “no free lunch” for practical computations: the improved convergence rate of the continuous system is counter-balanced by the need to use a smaller step size in the discretized system.)
To rigorously analyze the local behavior of the fully nonlinear system when p approximates the eigenvector $x_1$, we shall apply the center manifold theorem [9, 17], a tool for studying a dynamical system whose Jacobian has an eigenvalue on the imaginary axis. (Alternatively, we could restrict the system to the unit sphere in $\mathbb{R}^n$.) We assume that $A \in \mathbb{R}^{n\times n}$. Without loss of generality, assume that $\lambda_1 = 0$, so that the Jacobian at $x_1$ (5.3) takes the form $J = -N^{-1}(I - x_1x_1^*)A$.
Thus, for p near $x_1$ we have
\[ \dot p = Jp + F(p) \]
for the nonlinear function $F(p) = N^{-1}\bigl(\theta(p)p - (Ap, x_1)x_1\bigr)$ that, by definition of the Jacobian, satisfies $F(p) = o(\|p - x_1\|)$. Suppose that J has a simple zero eigenvalue, and the rest of its spectrum is in the open left half plane. There exists some invertible (real, if J is real) matrix S with first column $x_1$ and
\[ S^{-1}JS = \begin{bmatrix} 0 & 0 \\ 0 & C \end{bmatrix} \]
for some $C \in \mathbb{R}^{(n-1)\times(n-1)}$ whose spectrum is in the open left half plane.
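Returning to the 2×2 example above, the Jacobian formula (5.3) can be evaluated directly. The sketch below does so for the swap preconditioner of the example and, for contrast, for N = I; the function names are illustrative only and the code assumes real data.

```python
# Sketch: the Jacobian (5.3), J = N^{-1} (I - x1 x1^*) (lambda1 - A),
# for the 2x2 example in the text and, for contrast, for N = I.
# Helper names are illustrative only; x1 is assumed real.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def jacobian(N_inv, A, x1, lam1):
    n = len(A)
    proj = [[(1.0 if i == j else 0.0) - x1[i] * x1[j] for j in range(n)]
            for i in range(n)]                      # I - x1 x1^T
    shifted = [[(lam1 if i == j else 0.0) - A[i][j] for j in range(n)]
               for i in range(n)]                   # lambda1 I - A
    return matmul(N_inv, matmul(proj, shifted))

A = [[1.0, 0.0], [0.0, 2.0]]
x1, lam1 = [1.0, 0.0], 1.0

J_swap = jacobian([[0.0, 1.0], [1.0, 0.0]], A, x1, lam1)
J_id = jacobian([[1.0, 0.0], [0.0, 1.0]], A, x1, lam1)

print(J_swap)                  # equals [[0, -1], [0, 0]] up to signed zeros
print(matmul(J_swap, J_swap))  # zero matrix: J_swap is nilpotent
print(J_id)                    # diagonal, with entries 0 and -1
```

With N = I the nonzero eigenvalue −1 lies in the open left half plane, whereas the swap preconditioner produces a nilpotent Jacobian with no decaying mode, which is the degenerate situation the example is meant to highlight.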
We now transform coordinates into a form in which the center manifold theorem can most readily be applied. Define $r(t) = S^{-1}(p(t) - x_1)$, so that
\[ \dot r = S^{-1}JS\,S^{-1}(p - x_1) + S^{-1}F(p) = \begin{bmatrix} 0 & 0 \\ 0 & C \end{bmatrix} r + G(r), \]
where $G(r) := S^{-1}F(Sr + x_1) = S^{-1}F(p)$. By design, $S^{-1}x_1 = e_1$; hence, G(r) satisfies
\[ G(r) = S^{-1}N^{-1}S\left( \frac{(ASr, Sr) + (ASr, x_1)}{(Sr, Sr) + 2(x_1, Sr) + 1}\,(r + e_1) - (ASr, x_1)e_1 \right). \tag{5.4} \]
Now we are prepared to cast this diagonalized problem into the conventional setting for center manifold theory. We write
\[ r = \begin{bmatrix} \alpha \\ b \end{bmatrix} \]
for $\alpha \in \mathbb{R}$ and $b \in \mathbb{R}^{n-1}$. Using MATLAB index notation for convenience, the r system is simply
\[ \begin{bmatrix} \dot\alpha \\ \dot b \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & C \end{bmatrix} \begin{bmatrix} \alpha \\ b \end{bmatrix} + \begin{bmatrix} G([\alpha; b])_1 \\ G([\alpha; b])_{2:n} \end{bmatrix}, \]
that is,
\[ \dot\alpha = G([\alpha; b])_1, \qquad \dot b = Cb + G([\alpha; b])_{2:n}. \]
Notice that the component α only figures in the nonlinear terms; we wish to determine how that contribution affects the magnitude of the b component, that is, the portion of the solution that we hope decays as t → ∞.
Notice that b = 0 corresponds to the case when p is collinear with $x_1$. In this case p may differ from the unit eigenvector $x_1$, but regardless it is a fixed point of the dynamical system, and provided $p \ne 0$ we are content. In particular, if b = 0, then ASr = 0 too (recall that $\lambda_1 = 0$), and we can see from (5.4) that G(r) = 0. In this case
\[ \dot\alpha = G([\alpha; 0])_1 = 0, \qquad \dot b = C0 + G([\alpha; 0])_{2:n} = 0, \]
so any such r is a fixed point of the dynamical system. We can put this in grander language: there exists some δ > 0 such that if
\[ r_0 \in \left\{ \begin{bmatrix} \alpha \\ 0 \end{bmatrix} : |\alpha| < \delta \right\} =: M, \]
then the dynamical system with $r(0) = r_0$ satisfies $r(t) \in M$ for all t > 0. (In particular, $r(t) = r(0) \in M$.) The set M is called a local invariant manifold. We can define this manifold (locally) by the requirement that b = g(α) := 0,
MARK EMBREE AND RICHARD B. LEHOUCQ
which trivially satisfies $g(0) = 0$ and the Jacobian of $g$ at $\alpha = 0$ is $Dg(0) = 0$; furthermore, $g$ is arbitrarily smooth near $\alpha = 0$. Together, these properties ensure that $\mathcal{M}$ is a center manifold of the dynamical system. (We are fortunate in this case to have an explicit, trivial expression for this manifold.) All that remains is to apply Theorem 2 from Carr [9, p. 4]. Consider the equation
\[ \dot u = G([u; g(u)])_1 = G([u; 0])_1 = 0. \]
The solution $u(t) = 0$ is clearly stable (if $u(t) = \varepsilon$, then $|u(t) - 0| = |\varepsilon|$ is bounded for all $t > 0$) and, thus, Theorem 2(a) from [9] implies that the solution $r(t) = 0$ is a stable solution of the system
\[ \dot r = \begin{bmatrix} 0 & 0 \\ 0 & C \end{bmatrix} r + G(r). \]
Note that the solution $u(t) = 0$ is not asymptotically stable, that is, we do not have $u(t) \to 0$ if $u(0) = \varepsilon$ for small, nonzero $\varepsilon$. Were this the case, then we would be able to conclude that the $r$ system was asymptotically stable. This would contradict our expectation that the original dynamical system will converge to something in $\mathrm{span}\{x_1\}$, not necessarily to $x_1$ itself. In particular, if $N$ is self-adjoint, then $(Np, p)$ is an invariant of the system, and so we expect that $p(t) \to \xi x_1$ for $\xi$ determined by
\[ |\xi|^2 = \frac{(Np, p)}{(Nx_1, x_1)}. \]
We now have stability of the zero state of the $r$ system, but that only means that solutions sufficiently close to $r = 0$ do not diverge. To say more, namely that the solutions actually converge to the center manifold, we can apply Theorem 2(b) of [9], which we slightly paraphrase here. Since the zero solution of the $r$ equation is stable, for $[\alpha(0); b(0)]$ sufficiently small, there exists some solution $u(t)$ of the equation $\dot u(t) = G([u; g(u)])_1 = 0$ and positive constant $\gamma$ such that
\[ \alpha(t) = u(t) + O(e^{-\gamma t}), \qquad b(t) = g(u(t)) + O(e^{-\gamma t}). \]
In particular, in our setting such solutions $u(t)$ will be constant: $u(t) = c$, and so there exist $\alpha(t) = c + O(e^{-\gamma t})$, $b(t) = O(e^{-\gamma t})$, and, in particular, $b(t) \to 0$ as $t \to \infty$. Thus, for $r_0$ sufficiently small,
\[ r(t) = \begin{bmatrix} c \\ 0 \end{bmatrix} + O(e^{-\gamma t}), \]
so that $p(t) = Sr(t) + x_1 = (1 + c)x_1 + O(e^{-\gamma t})$. The preceding discussion is summarized in the following result.

Theorem 5.1. If $\|p(0) - x_1\|$ is sufficiently small and $N^{-1}(I - x_1x_1^*)(\lambda - A)$ has a simple zero eigenvalue with all other eigenvalues in the open left half plane, then there exists $\gamma > 0$ and $\xi \in \mathbb{R}$ such that, as $t \to \infty$, $\|p(t) - \xi x_1\| = O(e^{-\gamma t})$. In the case of self-adjoint, invertible $N$, $|\xi| = |(p_0, Np_0)|$.

Note that if $N$ is Hermitian and invertible but indefinite, then there always exists some unit vector $p_0$ such that $(p_0, Np_0) = 0$. If this starting vector is sufficiently close to the unit eigenvector $x_1$ of $A$, then we have not ruled out the possibility that the system converges to the zero vector, rather than a desired eigenvector.
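The invariance of $(Np, p)$ invoked above can be seen from a short computation. The following is a sketch under the assumption that $N$ is real symmetric, $p$ is real, and $\theta(p) = p^TAp / p^Tp$ as in the one-sided system $\dot p = N^{-1}(\theta(p)p - Ap)$:
\[ \frac{d}{dt}\,(p^TNp) = 2\,p^TN\dot p = 2\,p^T(\theta(p)p - Ap) = 2\left(\theta(p)\,p^Tp - p^TAp\right) = 0. \]
Consequently, if $p(t) \to \xi x_1$, then $|\xi|^2 (Nx_1, x_1) = (Np_0, p_0)$, which is the relation for $|\xi|^2$ displayed before Theorem 5.1.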
6. Discrete dynamical systems. The previous sections have addressed the quadratic invariant and convergence behavior of the continuous-time, one- and two-sided dynamical systems. For purposes of computation, one naturally wonders how closely such properties are mimicked by the solutions to discretizations of these systems. The present section considers the convergence and preservation of the quadratic invariant by the discrete flow under a forward Euler time integration. We focus on this canonical integrator for three reasons: (1) this discretization leads to the algorithm (1.1) proposed in the literature; (2) analysis for forward Euler serves as a first step toward understanding more sophisticated algorithms; (3) more elaborate methods are not always practical. For example, the implicit midpoint rule will preserve the quadratic invariant $(p, Np)$ [18, IV.2.1] of the one-sided system (1.2), but since this method takes the form
\[ p_{j+1} = p_j + hN^{-1}\left( \theta_{j+1}\,\frac{p_j + p_{j+1}}{2} - A\,\frac{p_j + p_{j+1}}{2} \right), \qquad \theta_{j+1} = \frac{(p_j + p_{j+1})^TA(p_j + p_{j+1})}{(p_j + p_{j+1})^T(p_j + p_{j+1})}, \]
its implementation requires the solution of a (nonlinear) system of equations at each step: a far more expensive proposition (per step) than the humble forward Euler method. (For a more sophisticated discretization in the unpreconditioned Hermitian case, along with a cautionary note about use of large step-size in the forward Euler method, see [28].)

6.1. Departure from the manifold. Given $A \in \mathbb{R}^{n\times n}$, for notational convenience we rewrite the two-sided system in the form
\[ \dot p = p\theta - Ap =: f(p, q), \qquad \dot q = q\theta - A^Tq =: g(p, q), \tag{6.1} \]
with $\theta = (q^Tp)^{-1}q^TAp = \theta^T$ and initial conditions $p(0) = p_0 \in \mathbb{R}^n$ and $q(0) = q_0 \in \mathbb{R}^n$. Similarly, the one-sided system (now including preconditioning) is
\[ \dot p = N^{-1}(p\theta - Ap) =: N^{-1}f(p, p), \tag{6.2} \]
with $\theta = (p^Tp)^{-1}p^TAp = \theta^T$ and $p(0) = p_0 \in \mathbb{R}^n$. In section 2 we showed that this system preserves the quadratic invariant $q^Tp$. To what extent do discretizations respect such conservation, and what are the implications of any drift from this manifold? To understand the role of discrete quadratic invariants, we consider the error when using a forward Euler time integrator. We begin with the two-sided iteration. The finite-time blow-up established in Theorem 4.1 is a strike against this method. Before abandoning it altogether, we wish to investigate the consequences of the blow-up on the discrete two-sided eigensolver. The forward Euler method applied to (6.1) leads to the iteration
\[ p_{j+1} = p_j + hf_j, \tag{6.3} \]
\[ q_{j+1} = q_j + hg_j, \tag{6.4} \]
where $f_j := f(p_j, q_j)$ and $g_j := g(p_j, q_j)$. With the mild caveat that $q_j^Tp_j \ne 0$, the form of the Rayleigh quotient gives $q_j^Tf_j = 0 = p_j^Tg_j$.
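The orthogonality relations $q_j^Tf_j = 0 = p_j^Tg_j$, and their consequence that one forward Euler step changes $q^Tp$ by exactly $h^2 g_0^Tf_0$, can be checked numerically. The sketch below uses only the Python standard library; the matrix `A`, the vectors `p0` and `q0`, and the step `h` are hypothetical test data, not taken from the paper.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def two_sided_rhs(A, p, q):
    """Return (f, g) from (6.1), with theta = q^T A p / q^T p (needs q^T p != 0)."""
    Ap = matvec(A, p)
    theta = dot(q, Ap) / dot(q, p)
    ATq = matvec(transpose(A), q)
    f = [theta * pi - api for pi, api in zip(p, Ap)]
    g = [theta * qi - atqi for qi, atqi in zip(q, ATq)]
    return f, g

# Hypothetical nonsymmetric test matrix and starting vectors.
A = [[2.0, -0.9, 0.0],
     [-1.1, 2.0, -0.9],
     [0.0, -1.1, 2.0]]
p0 = [1.0, 2.0, 3.0]
q0 = [1.0, 1.0, 1.0]
h = 0.1

f0, g0 = two_sided_rhs(A, p0, q0)

# One forward Euler step, (6.3)-(6.4).
p1 = [pi + h * fi for pi, fi in zip(p0, f0)]
q1 = [qi + h * gi for qi, gi in zip(q0, g0)]

orth_qf = dot(q0, f0)                      # should vanish up to rounding
orth_pg = dot(p0, g0)                      # should vanish up to rounding
drift = dot(q1, p1) - dot(q0, p0)          # d_1
predicted = h ** 2 * dot(g0, f0)           # h^2 g_0^T f_0
```

Because the $O(h)$ cross terms cancel, `drift` and `predicted` agree to rounding error even though the step itself is only first-order accurate.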
This simple observation is critical to understanding the drift of the forward Euler iterates from the invariant manifold. It implies, for example, that the first iteration of (6.3)–(6.4) produces an iterate that is quadratically close to the manifold: $q_1^Tp_1 = q_0^Tp_0 + h^2 g_0^Tf_0$, which is perhaps surprising given the forward Euler method's $O(h)$ accuracy. Writing the departure from the manifold as $d_j = q_j^Tp_j - q_0^Tp_0$, we, thus, have $d_1 = h^2(g_0^Tf_0)$. From this we can compute $d_2 = q_2^Tp_2 - q_1^Tp_1 + d_1 = h^2(g_1^Tf_1 + g_0^Tf_0)$ and, in general, $d_{j+1} = h^2\sum_{k=0}^{j} g_k^Tf_k$. (This result is a special case of one derived in [18] for partitioned Runge–Kutta systems.) Thus, we can bound the relative drift from the manifold as
\[ \left| \frac{q_{j+1}^Tp_{j+1} - q_0^Tp_0}{q_0^Tp_0} \right| \le h^2 \sum_{k=0}^{j} \frac{\|f_k\|\,\|g_k\|}{|q_0^Tp_0|}. \tag{6.5} \]
The definitions of $f(p, q)$ and $g(p, q)$ imply
\[ \|f_k\| \le (|\theta_k| + \|A\|)\,\|p_k\| \le \left( 1 + \frac{\|q_k\|\,\|p_k\|}{|q_k^Tp_k|} \right) \|A\|\,\|p_k\|, \]
\[ \|g_k\| \le (|\theta_k| + \|A\|)\,\|q_k\| \le \left( 1 + \frac{\|p_k\|\,\|q_k\|}{|p_k^Tq_k|} \right) \|A\|\,\|q_k\|. \]
Substituting these formulas into (6.5), we arrive at the following result.

Theorem 6.1. The forward Euler iterates (6.3)–(6.4) for the two-sided dynamical system (6.1) satisfy
\[ \left| \frac{q_{j+1}^Tp_{j+1} - q_0^Tp_0}{q_0^Tp_0} \right| \le h^2\,\frac{\|A\|^2}{|q_0^Tp_0|} \sum_{k=0}^{j} \left( 1 + \frac{\|q_k\|\,\|p_k\|}{|q_k^Tp_k|} \right)^{2} \|q_k\|\,\|p_k\|. \tag{6.6} \]

This bound implies that the departure from the manifold is proportional to the square of the step size, and involves the secants of the angles formed by $q_k$ and $p_k$, $k = 0, \ldots, j$, as well as the norms of $q_k$ and $p_k$. Moreover, unless the cosines of the angles between $q_k$ and $p_k$ are bounded away from zero, there does not exist a step size $h$ such that all iterates remain near the quadratic manifold. The proof of the theorem demonstrates that the secant of the angle is at least as large as the normalized residuals. Numerical experiments indicate that these bounds are descriptive; see the first example in section 6.3. A conclusion is that serious breakdown (as discussed after Theorem 4.1) leads to incurable breakdown of the two-sided iteration because forward Euler mimics the continuous solution and cannot "step-over" the point of blow-up.

Given the shortcomings of the two-sided iteration, we shall, henceforth, focus on the one-sided dynamical system, and also include preconditioning (6.2). The associated forward Euler discretization takes the form
\[ p_{j+1} = p_j + hN^{-1}f_j, \tag{6.7} \]
where now $f_j = f(p_j, p_j)$. (Here we see that the time-step $h$ directly multiplies the preconditioner $N$, so that the effect of scaling $N$ to improve the convergence rate of the continuous-time system, as discussed on page 1456, is equivalent to choosing a smaller time-step in the discrete setting.) The following analysis will play a useful role in our main convergence result, Theorem 6.3.

For the rest of the paper we assume that $N$ is symmetric and invertible, which, as seen in the Introduction, ensures that solutions of the continuous system reside on an invariant manifold $p^TNp = \text{constant}$. At each time step, the discrete iteration incurs a local departure from that manifold of
\[ e_{j+1} := p_{j+1}^TNp_{j+1} - p_j^TNp_j = h^2 f_j^TN^{-1}f_j. \]
Hence, if $N^{-1}$ is additionally positive definite (e.g., $N^{-1} = I$), the drift is monotone increasing, an important property for the forthcoming convergence theory. When $N$ is positive definite, we can define vector norms
\[ \|z\|_{N^{-1}}^2 := z^TN^{-1}z, \qquad \|z\|_N^2 := z^TNz \]
(which in turn induce matrix norms), with $\|z\|_{N^{-1}} \le \|N^{-1}\|\,\|z\|_N$. Thus, we write
\[ e_{j+1} = h^2\|f_j\|_{N^{-1}}^2 \le h^2\|N^{-1}\|^2\|f_j\|_N^2 = h^2\|N^{-1}\|^2\,\|r_j\|_N^2\,\|p_j\|_N^2, \]
where we use the normalized residual $r_j := f_j/\|p_j\|_N = (\theta_j - A)p_j/\|p_j\|_N$. Now consider the aggregate, global drift from the manifold:
\[ d_{j+1} := p_{j+1}^TNp_{j+1} - p_0^TNp_0 = \sum_{k=1}^{j+1} e_k \le h^2\|N^{-1}\|^2 \sum_{k=0}^{j} \|r_k\|_N^2 \left( d_k + \|p_0\|_N^2 \right). \]
In particular, $d_{j+1}$ is determined by the step size, the residual norms, and the growth in the norm of the iterates. For further simplification, choose some $M > 0$ such that $\|r_k\|_N^2 \le M$ for all $k = 0, \ldots, j$. One coarse (but $j$-independent) possibility is
\[ M := \inf_{s\in\mathbb{R}} 4\|A - s\|_N^2 \ge \inf_{s\in\mathbb{R}} \|(A - s) - (\theta_k - s)\|_N^2 \ge \|r_k\|_N^2, \tag{6.8} \]
which is invariant to shifts in $A$. (In terms of the Euclidean norm, we, thus, have $M \le 4\kappa(N)\inf_{s\in\mathbb{R}}\|A - s\|^2$, where $\kappa(N) = \|N\|\,\|N^{-1}\|$.) Hence,
\[ d_{j+1} \le h^2M\|N^{-1}\|^2 \sum_{k=0}^{j} \left( d_k + \|p_0\|_N^2 \right) = h^2M\|N^{-1}\|^2 \left( (j+1)\|p_0\|_N^2 + \sum_{k=1}^{j} d_k \right) \]
(since $d_0 = 0$). Thus, if we define the sequence $\{\widehat d_k\}$ by
\[ \widehat d_{j+1} = h^2M\|N^{-1}\|^2 \left( (j+1) + \sum_{k=1}^{j} \widehat d_k \right), \tag{6.9} \]
then the departure from the manifold obeys $d_{j+1} \le \widehat d_{j+1}\|p_0\|_N^2$. Equation (6.9) is a binomial recurrence whose solution can be written explicitly:
\[ \widehat d_{j+1} = \sum_{k=1}^{j+1} \binom{j+1}{k} \left( h^2M\|N^{-1}\|^2 \right)^{k} = \left( 1 + h^2M\|N^{-1}\|^2 \right)^{j+1} - 1. \]
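Both the local drift identity $e_{j+1} = h^2 f_j^TN^{-1}f_j$ and the closed form for the recurrence (6.9) are easy to confirm numerically. The following standard-library sketch does so under stated assumptions: the preconditioner is taken to be diagonal (stored through the hypothetical list `ninv` holding the diagonal of $N^{-1}$), and the matrix, vector, step size, and constant `c` (standing for $h^2M\|N^{-1}\|^2$) are made-up test values.

```python
def dhat_by_recurrence(c, steps):
    """Iterate (6.9): dhat_{j+1} = c * ((j + 1) + sum_{k=1}^{j} dhat_k)."""
    values = []            # values[j] holds dhat_{j+1}
    running_sum = 0.0      # dhat_1 + ... + dhat_j
    for j in range(steps):
        nxt = c * ((j + 1) + running_sum)
        values.append(nxt)
        running_sum += nxt
    return values

def dhat_closed_form(c, steps):
    """Closed form (1 + c)^(j+1) - 1 for j = 0, ..., steps - 1."""
    return [(1.0 + c) ** (j + 1) - 1.0 for j in range(steps)]

def one_sided_step(A, ninv, p, h):
    """One step of (6.7) with diagonal N^{-1} = diag(ninv); returns (p_new, f)."""
    Ap = [sum(a * x for a, x in zip(row, p)) for row in A]
    theta = sum(x * y for x, y in zip(p, Ap)) / sum(x * x for x in p)
    f = [theta * x - y for x, y in zip(p, Ap)]
    p_new = [x + h * w * fi for x, w, fi in zip(p, ninv, f)]
    return p_new, f

def n_norm_sq(ninv, p):
    """p^T N p for N = diag(1 / ninv)."""
    return sum(x * x / w for x, w in zip(p, ninv))

# Recurrence versus closed form, with hypothetical c = h^2 M ||N^{-1}||^2 = 0.25.
c = 0.25
rec = dhat_by_recurrence(c, 50)
closed = dhat_closed_form(c, 50)
max_rel_gap = max(abs(r - s) / s for r, s in zip(rec, closed))

# Local drift of the quadratic invariant for one step (hypothetical data).
A = [[1.0, 2.0], [0.0, 3.0]]
ninv = [0.5, 2.0]
p = [1.0, 2.0]
h = 0.1
p_new, f = one_sided_step(A, ninv, p, h)
local_drift = n_norm_sq(ninv, p_new) - n_norm_sq(ninv, p)
predicted_drift = h ** 2 * sum(w * fi * fi for w, fi in zip(ninv, f))
```

The local drift is nonnegative whenever $N^{-1}$ is positive definite, which is the monotonicity used in the convergence analysis.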
Theorem 6.2. Let $N \in \mathbb{R}^{n\times n}$ be symmetric and positive definite, and define $M$ by (6.8). Then the forward Euler iterates (6.7) for the preconditioned one-sided dynamical system (6.2) satisfy
\[ 0 \le \frac{p_{j+1}^TNp_{j+1} - p_0^TNp_0}{p_0^TNp_0} \le \left( 1 + h^2M\|N^{-1}\|^2 \right)^{j+1} - 1, \tag{6.10} \]
the upper bound being asymptotic to $(j+1)h^2\|N^{-1}\|^2M$ as $h \to 0$.

Note that a small eigenvalue of $N$ results in a small time-step $h$. The bound also provides an estimate of a critical time-step
\[ h \approx \frac{1}{\sqrt{j+1}\,\|N^{-1}\|\sqrt{M}} \]
for forward Euler, limiting the departure from the quadratic manifold. Highly nonnormal problems for which $\|A - s\| \gg \max_k |\lambda_k - s|$ also result in tiny time-steps. Theorem 6.2 leads to an interesting observation: despite the fact that the forward Euler method generally incurs an $O(h)$ truncation error and the global error grows exponentially in $j$ for fixed $h$ (see (6.12) and, e.g., [14, section 1.3]), for a one-sided iteration the drift from the quadratic manifold is $O(h^2)$ and both linear and nondecreasing in $j$ for all starting vectors, under mild restrictions. This monotone departure from the manifold is exploited in the discrete convergence analysis to follow. So, although explicit Runge–Kutta methods (such as forward Euler) do not preserve quadratic invariants (see [18, Chapter IV]), the forward Euler iterates for the one-sided systems remain nearby. The reader is referred to [18, Chapter IV] for further information and references, including the use of projection to remain on the quadratic manifold.

6.2. Discrete convergence theory. Just as the local drift from the manifold at each iteration contributes to the global drift, so local truncation errors committed by each step of an ODE solver aggregate into a global error. How does this accumulated error affect convergence of the discrete method as we compute $p_j$ with $j \to \infty$? In this section, we seek conditions that will ensure that the discrete preconditioned one-sided iteration (6.7) converges to the same eigenvector as the continuous system. First, we establish the setting that will be used throughout the rest of this section. Suppose $A \in \mathbb{R}^{n\times n}$ has a simple eigenvalue $\lambda_1$ strictly to the left of all other eigenvalues (and, hence, real).
Without loss of generality (via a unitary similarity transformation) we can assume that $A$ takes the form
\[ A = \begin{bmatrix} \lambda_1 & d^T \\ 0 & C \end{bmatrix}. \tag{6.11} \]
Let $x_1$ and $y_1$ denote unit-length right and left eigenvectors associated with $\lambda_1$; in these coordinates we can take $x_1 = [1, 0, \ldots, 0]^T$. Theorems 4.3, 4.4, and 5.1 provide conditions under which the solution $p(t)$ of the continuous system converges in angle to the eigenvector $x_1$ (e.g., if $N = I$ and $y_1^Tp_0 \ne 0$).

Before beginning the convergence analysis, one should appreciate that the conditions established in the last paragraph are not sufficient to guarantee convergence of the discrete iteration. Consider the following example. When $N = I$, the forward Euler iterate of the one-sided system at step $k$ can be written as
\[ p_k = \prod_{j=0}^{k-1} \varphi_j(A)\,p_0 \]
for linear factors $\varphi_j(z) = 1 + h(\theta_j - z)$. If any of these factors has $\lambda_1$ as a root, then $p_k$ will have no component in the direction of the eigenvector $x_1$, and so $\lambda_1$ and $x_1$ will not influence the iteration: convergence of $p_k$ to $x_1$ is impossible. Concrete matrices that exhibit such behavior are simple to construct. For any fixed $h > 0$, set
\[ A = \begin{bmatrix} 0 & -1 - 2/h \\ 0 & 1 \end{bmatrix}, \qquad p_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \]
Theorem 4.3 guarantees that the continuous one-sided system will converge for this $A$ and $p_0$. At the first step of the forward Euler method $\theta_0 = -1/h$, so that $\varphi_0(0) = 0$ and $p_1 = [h + 2, -h]^T$ is an eigenvector for $\lambda_2 = 1$, and $p_k$ will never have a component in the $x_1$ direction for any $k \ge 1$. (Note that $\varphi_j(\lambda_1) = 1 + h(\theta_j - \lambda_1) = 0$ implies that $\theta_j - \lambda_1 = -1/h < 0$, and this is impossible if $A$ is normal. As $h$ is reduced, complete deflation requires an increasing departure from normality.) The more sophisticated restarted Arnoldi algorithm exhibits a similar phenomenon; see [12].

Under what circumstances can we guarantee convergence? To answer this question, we first review the conventional global error analysis for the forward Euler method; for details, see, e.g., [14, section 1.3]. The first step begins with the exact solution at time $t = 0$: $p_0 = p(0)$. Each subsequent step introduces a local truncation error, while also magnifying the global error aggregated at previous steps. Suppose we wish to integrate for $t \in [0, \tau]$ with $\tau = kh$ for some integer $k$. With the local truncation error at each step bounded by
\[ T_h := \max_{0 \le t \le \tau} \tfrac{1}{2}\,h\,\|\ddot p(t)\|, \]
one can show that
\[ \|p_k - p(\tau)\| \le \frac{T_h}{L}\left( e^{L\tau} - 1 \right), \tag{6.12} \]
where $L$ is a Lipschitz constant for our differential equation; in Appendix A we show that $L = 10\|N^{-1}\|\,\|A\|$ will suffice. This expression for the global error captures an essential feature: for fixed $\tau$, the fact that $T_h = O(h)$ implies that we can always select $h > 0$ sufficiently small as to make the difference between the forward Euler iterate $p_{\tau/h}$ and the exact solution $p(\tau)$ arbitrarily small. However, if we increase $k$ with $h > 0$ fixed, the bound indicates an exponential growth in the error. To show that $p_k$ converges (in angle) to an eigenvector as $k \to \infty$, further work is required. In this effort, the preservation of the quadratic invariant characterized in Theorem 6.2 plays an essential role.

Preconditioning significantly complicates the convergence theory. For simplicity, our analysis imposes the stringent requirement that, in the coordinates in which $A$ takes the form (6.11), we have
\[ N^{-1} = \begin{bmatrix} \eta & 0 \\ 0 & M \end{bmatrix} \tag{6.13} \]
in addition to the requirement that $N^{-1}$ be symmetric and positive definite. The trivial off-diagonal blocks prevent the preconditioner from using the growing component of $p_k$ in $x_1$ to enlarge the component in the unwanted eigenspace. A crucial ingredient in our convergence analysis is the constant
\[ \gamma := \|\Pi_1(I + hN^{-1}(\lambda_1 - A))\| = \|I + hM(\lambda_1 - C)\|, \]
where $\Pi_1 := I - x_1x_1^T$ is a projector onto the complement of the desired invariant subspace. This constant $\gamma$, a function of $h$, measures the potency of the preconditioner: the smaller, the better. For example, in the ideal case that $M = (C - \lambda_1)^{-1}$, we have $\gamma = |1 - h|$, giving $\gamma = 0$ for the large step size $h = 1$, and that $\gamma \to 1$ as $h \to 0$. With $\gamma$ in hand, we are prepared to state our convergence result. Here, $\kappa(N) = \|N\|\,\|N^{-1}\|$ denotes the condition number of the preconditioner.

Theorem 6.3. Given (6.11), (6.13), and assumptions on $\lambda_1$, $x_1$, and $N$ established in the previous paragraphs, suppose that $p_0$ is chosen so that the continuous dynamical system converges in angle to an eigenvector associated with the distinct, simple leftmost eigenvalue $\lambda_1$ (e.g., $y_1^Tp_0 \ne 0$ suffices if $N = I$). Furthermore, suppose there exists $h > 0$ for which
\[ \gamma \in \left[ 0, 1/\sqrt{\kappa(N)} \right). \tag{6.14} \]
Then after preliminary iteration with a sufficiently small time-step $h_0$, the forward Euler method with time-step $h$ will converge (in angle) to the desired eigenvector:
\[ \sin(\angle(p_k, x_1)) = O(\gamma^k). \tag{6.15} \]
Asymptotically, the Rayleigh quotient converges to $\lambda$ at the same rate:
\[ |\theta_k - \lambda| = O(\gamma^k), \tag{6.16} \]
which in the case $d = 0$ improves to $|\theta_k - \lambda| = O(\gamma^{2k})$.

Proof. Denote the $k$th iterate by
\[ p_k = \begin{bmatrix} \alpha_k \\ b_k \end{bmatrix}. \]
• Convergence of the forward Euler method to the continuous solution, and convergence of the continuous solution to the eigenvector, together ensure that preliminary forward Euler steps will get close to the eigenvector. To show that $\sin(\angle(p_k, x_1)) \to 0$ as $k \to \infty$, we will show that $\|b_k\| \to 0$ while $|\alpha_k|$ is bounded away from zero.
The convergence of the forward Euler method at a fixed time $\tau \ge 0$ (see (6.12)), with the assumption that the continuous system converges for the given $p_0$ (as described in sections 4–5), ensures that we can run the forward Euler iteration with a sufficiently small time-step that, after $k \ge 0$ iterations, $\|b_k\|$ is sufficiently small that
\[ \frac{\|b_k\|^2\,\|\lambda_1 - C\|}{\alpha_k^2 + \|b_k\|^2} + \frac{\|b_k\|\,\|d\|}{\sqrt{\alpha_k^2 + \|b_k\|^2}} \le \frac{\varepsilon}{h\|M\|} \tag{6.17} \]
for some $\varepsilon \in [0, 1/\sqrt{\kappa(N)} - \gamma)$; here $\gamma \in [0, 1/\sqrt{\kappa(N)})$ and $h > 0$ are as in the statement of the theorem. Note that the left-hand side of (6.17) will get small when $\|b_k\|$ is small, since $|\alpha_k|$ is bounded away from zero. This follows from Theorem 6.2 (monotonic drift of the invariant) and the fact that $N$ is symmetric positive definite, which imply that for any $j$,
\[ \|p_j\|^2 \ge \frac{1}{\|N\|}\,p_j^TNp_j \ge \frac{1}{\|N\|}\,p_{j-1}^TNp_{j-1} \ge \frac{1}{\kappa(N)}\,\|p_{j-1}\|^2. \tag{6.18} \]
• Condition (6.17) ensures that $\theta_k$ is close to $\lambda_1$. Since
\[ \theta_k = \frac{\lambda_1\alpha_k^2 + \alpha_k d^Tb_k + b_k^TCb_k}{\alpha_k^2 + \|b_k\|^2}, \]
we have
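\[ \begin{aligned} |\theta_k - \lambda_1| &= \frac{\left| \lambda_1\alpha_k^2 + \alpha_k d^Tb_k + b_k^TCb_k - \lambda_1(\alpha_k^2 + b_k^Tb_k) \right|}{\alpha_k^2 + \|b_k\|^2} \\ &\le \frac{|b_k^T(C - \lambda_1)b_k|}{\alpha_k^2 + \|b_k\|^2} + \frac{|\alpha_k|\,\|b_k\|\,\|d\|}{\alpha_k^2 + \|b_k\|^2} \\ &\le \frac{\|b_k\|^2\,\|C - \lambda_1\|}{\alpha_k^2 + \|b_k\|^2} + \frac{\|b_k\|\,\|d\|}{\sqrt{\alpha_k^2 + \|b_k\|^2}}, \end{aligned} \tag{6.19} \]
where the last inequality uses the fact that $|\alpha_k| \le \sqrt{\alpha_k^2 + \|b_k\|^2}$. Now condition (6.17) implies that the Rayleigh quotient $\theta_k$ is sufficiently close to the eigenvalue $\lambda_1$:
\[ |\theta_k - \lambda_1| \le \frac{\varepsilon}{h\|M\|}. \tag{6.20} \]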
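The next step of the iteration, with time-step $h > 0$ specified in the statement of the theorem, produces
\[ p_{k+1} = \begin{bmatrix} \alpha_{k+1} \\ b_{k+1} \end{bmatrix} = p_k + hN^{-1}(\theta_k - A)p_k = \begin{bmatrix} \alpha_k + \eta h\left( (\theta_k - \lambda_1)\alpha_k - d^Tb_k \right) \\ (I + hM(\theta_k - C))\,b_k \end{bmatrix}. \]
Adding zero in a convenient way gives
\[ \begin{aligned} \|b_{k+1}\| &= \|(I + hM(\lambda_1 - C))b_k + h(\theta_k - \lambda_1)Mb_k\| \\ &\le \|I + hM(\lambda_1 - C)\|\,\|b_k\| + h|\lambda_1 - \theta_k|\,\|M\|\,\|b_k\| \\ &\le (\gamma + \varepsilon)\|b_k\|. \end{aligned} \tag{6.21} \]
In particular, since $0 \le \gamma + \varepsilon < 1/\sqrt{\kappa(N)} \le 1$, this guarantees a fixed reduction in the component of the forward Euler iterate in the unwanted eigenspace. (The second inequality follows from condition (6.14) and bound (6.20).) After checking a few details, we shall see that this condition is the key to convergence.
• Subsequent Rayleigh quotients must also remain close to $\lambda_1$. We now show that the new Rayleigh quotient, $\theta_{k+1}$, automatically satisfies the requirement (6.20) with the same $\varepsilon > 0$ and time-step. Repeating the calculation that culminated in (6.19), we obtain
\[ |\theta_{k+1} - \lambda_1| \le \frac{\|b_{k+1}\|^2\,\|C - \lambda_1\|}{\alpha_{k+1}^2 + \|b_{k+1}\|^2} + \frac{\|d\|\,\|b_{k+1}\|}{\sqrt{\alpha_{k+1}^2 + \|b_{k+1}\|^2}}. \]
Now we use (6.18), a consequence of the monotonic drift from the invariant manifold, to deduce that
\[ \begin{aligned} |\theta_{k+1} - \lambda_1| &\le \frac{\kappa(N)(\gamma + \varepsilon)^2\,\|b_k\|^2\,\|C - \lambda_1\|}{\alpha_k^2 + \|b_k\|^2} + \frac{\sqrt{\kappa(N)}\,(\gamma + \varepsilon)\,\|d\|\,\|b_k\|}{\sqrt{\alpha_k^2 + \|b_k\|^2}} \\ &\le \frac{\|b_k\|^2\,\|C - \lambda_1\|}{\alpha_k^2 + \|b_k\|^2} + \frac{\|d\|\,\|b_k\|}{\sqrt{\alpha_k^2 + \|b_k\|^2}}, \end{aligned} \]
since $\gamma + \varepsilon < 1/\sqrt{\kappa(N)}$. The condition (6.17) then implies that
\[ |\theta_{k+1} - \lambda_1| \le \frac{\varepsilon}{h\|M\|}, \]
which guarantees that the Rayleigh quotient cannot wander too far from $\lambda_1$.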
• Subsequent iterates and Rayleigh quotients must eventually converge. The bound on $|\theta_{k+1} - \lambda_1|$ just established allows us to repeat the argument resulting in (6.21) at future steps, giving $\|b_{k+m}\| \le (\gamma + \varepsilon)^m\|b_k\|$ along with, via a slight modification of (6.18),
\[ \begin{aligned} |\theta_{k+m} - \lambda_1| &\le \frac{\kappa(N)(\gamma + \varepsilon)^{2m}\,\|b_k\|^2\,\|C - \lambda_1\|}{\alpha_k^2 + \|b_k\|^2} + \frac{\sqrt{\kappa(N)}\,(\gamma + \varepsilon)^{m}\,\|d\|\,\|b_k\|}{\sqrt{\alpha_k^2 + \|b_k\|^2}} \\ &\le \frac{\|b_k\|^2\,\|C - \lambda_1\|}{\alpha_k^2 + \|b_k\|^2} + \frac{\|d\|\,\|b_k\|}{\sqrt{\alpha_k^2 + \|b_k\|^2}}. \end{aligned} \tag{6.22} \]
Thus, $|\theta_{k+m} - \lambda_1| \le \varepsilon/(h\|M\|)$ for all $m \ge 1$. As $\|b_{k+m}\| \to 0$, the component in the desired eigenvector does not vanish, as again a generalization of (6.18) gives
\[ \|p_{k+m}\| \ge \frac{1}{\sqrt{\kappa(N)}}\,\|p_0\|. \]
Thus, with $x_1 = e_1$, we have
\[ \sin\angle(p_{k+m}, x_1) = \min_{\xi} \frac{\|\xi p_{k+m} - x_1\|}{\|x_1\|} = \min_{\xi} \left\| \begin{bmatrix} \xi\alpha_{k+m} - 1 \\ \xi b_{k+m} \end{bmatrix} \right\| \le \frac{\|b_{k+m}\|}{|\alpha_{k+m}|} \le (\gamma + \varepsilon)^m\,\frac{\|b_k\|}{|\alpha_{k+m}|}, \]
where we have taken $\xi = \alpha_{k+m}^{-1}$ for the first inequality. As $|\alpha_{k+m}|$ is bounded away from zero, we have $\sin\angle(p_{k+m}, x_1) = O((\gamma + \varepsilon)^m)$ as $m \to \infty$. Since $\|b_{k+m}\| \to 0$ as $m \to \infty$, we can take the $\varepsilon$ used in (6.19) to be arbitrarily small as the iterations progress, giving the asymptotic rate given in (6.15). Similarly, from (6.22) we observe that the Rayleigh quotient converges as in (6.16). The $O(\gamma^m)$ term in that bound falls out if $d = 0$.

We now make several remarks concerning Theorem 6.3 and its proof.
(1) As $N$ becomes increasingly ill-conditioned, the hypothesis (6.14) in the theorem becomes more and more difficult to satisfy. We can only guarantee convergence for an ill-conditioned preconditioner if that preconditioner gives a small value of $\gamma$, i.e., if it gives a rapid convergence rate.
(2) A curiosity of condition (6.17) is that the requirement is more strict when convergence is slower, i.e., when $\gamma$ is near $\kappa(N)^{-1/2}$.
(3) One does not in general know whether $\theta_k$ falls to the left or right of $\lambda_1$. If $A$ is normal, then $\theta_k$ must fall in the convex hull of its spectrum, and so $\theta_k \ge \lambda_1$; for nonnormal $A$, it is possible that $\theta_k < \lambda_1$.
(4) The proof of the theorem exploits the monotonic drift from the manifold described by Theorem 6.2. This drift is easily monitored, thus providing a useful (and cheap) check on convergence of the iteration during computation. If this drift reaches a point where it is not small, projection to the quadratic manifold is easily undertaken; see [18, Chapter IV] for further information.

Theorem 6.3 considers the general case of nonsymmetric $A$ and a somewhat stringent notion of preconditioning. 
For the important special case of symmetric positive definite A, Knyazev and Neymeyr [23] provide convergence estimates (and review much literature) for the one-sided forward Euler discretization (6.3). They provide
rates of convergence given a symmetric positive definite preconditioner $N$ for $A$. However, a connection with dynamical systems is not made and instead optimization is applied to the Rayleigh quotient.

If $M = I$, and $C$ is normal (which is possible even if $A$ itself is not normal due to $d \ne 0$) with spectrum given by $\sigma(C) = \{\lambda_2, \ldots, \lambda_n\}$, we can estimate an optimal time-step as follows. We wish to minimize
\[ \gamma = \max_{i=2,\ldots,n} |1 + h(\lambda_1 - \lambda_i)|, \]
a simple minimax approximation problem on a discrete set; see, e.g., [36, section 8.5]. In particular, if all the eigenvalues are real (i.e., $C$ is symmetric) and $\lambda_2 \le \lambda_3 \le \cdots \le \lambda_n$, then the best $h$ must give $1 + h(\lambda_1 - \lambda_2) = -1 - h(\lambda_1 - \lambda_n)$. This can be solved to obtain $h = 2/(\lambda_2 + \lambda_n - 2\lambda_1)$, from which we compute
\[ \gamma = \frac{\lambda_n - \lambda_2}{\lambda_n + \lambda_2 - 2\lambda_1}. \]
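The optimal time-step formula can be sanity-checked directly from the definition of $\gamma$. The sketch below assumes $M = I$, real eigenvalues, and $\lambda_1$ strictly to the left of the rest; the function names and the sample spectrum are hypothetical and chosen only for illustration.

```python
def gamma(h, lam1, others):
    """gamma(h) = max_i |1 + h (lam1 - lam_i)| over the unwanted eigenvalues."""
    return max(abs(1.0 + h * (lam1 - lam)) for lam in others)

def optimal_step(lam1, others):
    """Equioscillation choice h = 2 / (lam2 + lamn - 2 lam1) and its rate."""
    lam2, lamn = min(others), max(others)
    h = 2.0 / (lam2 + lamn - 2.0 * lam1)
    rate = (lamn - lam2) / (lamn + lam2 - 2.0 * lam1)
    return h, rate

# Hypothetical spectrum: lambda_1 = 0 and unwanted eigenvalues 1, 2, 4.
lam1, others = 0.0, [1.0, 2.0, 4.0]
h_opt, rate = optimal_step(lam1, others)
gamma_at_opt = gamma(h_opt, lam1, others)
gamma_smaller_h = gamma(0.9 * h_opt, lam1, others)
gamma_larger_h = gamma(1.1 * h_opt, lam1, others)
```

For this spectrum the extreme factors $1 + h(\lambda_1 - \lambda_2)$ and $1 + h(\lambda_1 - \lambda_n)$ have equal magnitude and opposite sign at `h_opt`, so moving `h` in either direction increases `gamma`.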
Notice that this agrees with the convergence rate of the power method applied to $A - \sigma I$ for the optimal shift $\sigma = \tfrac{1}{2}(\lambda_2 + \lambda_n)$ to the leftmost eigenvector $x_1$; see, e.g., [43, p. 572]. With the optimal choice of $h$, the forward Euler method recovers the convergence rate of an optimally shifted power method to $x_1$.

Again, suppose that $M = I$, so that $\gamma = \gamma(h) \to 1$ as $h \to 0$. However, this limit need not be approached from below; that is, for some matrices $C$ we will have $\gamma(h) > 1$ for all $h$ sufficiently small.² The behavior of $\gamma$ in this limit bears a close connection to the logarithmic norm of $\lambda_1 - C$, which is defined as
\[ \beta(\lambda_1 - C) := \lim_{h \downarrow 0} \frac{\|I + h(\lambda_1 - C)\| - 1}{h}; \]
see, e.g., [30], [40, Chapter 17]. In particular, $\gamma(h) < 1$ for all sufficiently small $h > 0$ provided $\beta(\lambda_1 - C) < 0$. One can show that the logarithmic norm of a matrix coincides with the numerical abscissa, that is, the real part of the rightmost point in the numerical range:
\[ \beta(\lambda_1 - C) = \max_{v \in \mathbb{C}^{n-1},\ \|v\| = 1} \mathrm{Re}\; v^*(\lambda_1 - C)v = \max\left\{ \eta : \eta \in \sigma\left( \tfrac{1}{2}\left( (\lambda_1 - C) + (\lambda_1 - C)^T \right) \right) \right\}; \]
see, e.g., [40, Theorem 17.4]. When is $\gamma(h) > 1$? That is, for what matrices can we not apply our convergence theory by taking $h$ arbitrarily small? We can answer this question by finding requirements on $C$ that ensure $\beta(\lambda_1 - C) < 0$. From the above analysis we see that
\[ \beta(\lambda_1 - C) = \lambda_1 - \min_{v \in \mathbb{C}^{n-1},\ \|v\| = 1} \mathrm{Re}\; v^*Cv. \]
Since $C$ is essentially the restriction $A|_{x_1^\perp}$ of $A$ to the orthogonal complement of the eigenvector $x_1$, we can summarize as follows.

Lemma 6.4. Suppose $N = I$. Then $\gamma < 1$ for all $h$ sufficiently small if and only if $\lambda_1$ is not in the numerical range of $A|_{x_1^\perp}$ (equivalently, $C$).

² In this case the matrix $A$ does not satisfy the hypotheses of the theorem; convergence is still possible. Experiments with a small example gave convergence after a bit of initial irregularity.
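A small example may help illustrate how $\gamma(h)$ can exceed 1 for small $h$ even when every eigenvalue of $C$ lies to the right of $\lambda_1$. The sketch below is restricted to real $2\times 2$ matrices so that the spectral norm and the numerical abscissa have closed forms; the matrix `C` and all function names are hypothetical, and only the Python standard library is used.

```python
import math

def two_norm_2x2(M):
    """Spectral norm of a real 2x2 matrix via the largest eigenvalue of M^T M."""
    (a, b), (c, d) = M
    g11 = a * a + c * c
    g12 = a * b + c * d
    g22 = b * b + d * d
    mean = 0.5 * (g11 + g22)
    radius = math.hypot(0.5 * (g11 - g22), g12)
    return math.sqrt(mean + radius)

def numerical_abscissa_2x2(B):
    """Largest eigenvalue of the symmetric part (B + B^T) / 2 of a real 2x2 B."""
    (a, b), (c, d) = B
    return 0.5 * (a + d) + math.hypot(0.5 * (a - d), 0.5 * (b + c))

def gamma(h, B):
    """gamma(h) = ||I + h B|| with B = lambda_1 I - C."""
    (a, b), (c, d) = B
    return two_norm_2x2([[1.0 + h * a, h * b], [h * c, 1.0 + h * d]])

# Hypothetical nonnormal block: both eigenvalues of C equal 1 > lambda_1 = 0.
lam1 = 0.0
C = [[1.0, 4.0], [0.0, 1.0]]
B = [[lam1 - C[0][0], -C[0][1]], [-C[1][0], lam1 - C[1][1]]]

beta = numerical_abscissa_2x2(B)              # logarithmic norm of lambda_1 - C
h_small = 1e-6
finite_difference = (gamma(h_small, B) - 1.0) / h_small
gamma_at_small_h = gamma(0.01, B)
```

Here the symmetric part of $\lambda_1 - C$ has eigenvalues $-3$ and $1$, so $\beta = 1 > 0$ and $\gamma(h) > 1$ for small $h$, consistent with $\lambda_1 = 0$ lying inside the numerical range of this $C$.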
[Figure 6.1: four panels plotted against time $t \in [0, 1]$ with logarithmic vertical axes. (a) Residual norms ($\|\dot p\|$, $\|\dot q\|$), exact flow. (b) $\sec\angle(p, q) = \|p\|\,\|q\|/|q^Tp|$, exact flow. (c) FE residual norms ($\|f_k\|$, $\|g_k\|$), $h = 0.025$. (d) FE invariant drift, $q_j^Tp_j/(q_0^Tp_0) - 1$, $h = 0.025$.]

Fig. 6.1. Sampled flow and forward Euler (FE) approximations for the two-sided system with $T_{100}^{\rho}$ and $\rho = 1/(20 \cdot 101)$. The horizontal axis denotes time. Note the blow-up of the exact solution near $t = 0.675$, and the consequences of this behavior for the discretized method.
6.3. Numerical experiments. In this section we investigate Theorems 4.1, 6.1, and 6.3 through several computational examples. Our first experiment applies to the tridiagonal matrix
\[ T_n^{\rho} \equiv \begin{bmatrix} 2 & -1+\rho & & 0 \\ -1-\rho & 2 & \ddots & \\ & \ddots & \ddots & -1+\rho \\ 0 & & -1-\rho & 2 \end{bmatrix} \in \mathbb{R}^{n\times n}, \]
where $n = 100$ and $\rho = 1/(20(n + 1))$. The eigenvalues are all real and the condition number of the matrix of eigenvectors is modest. All computations in Figure 6.1 use the same starting vectors $p_0$ and $q_0$, which are taken to be (different) random vectors. (Results vary with other choices for these vectors.) Figures 6.1(a) and 6.1(b) show the exact solution to the two-sided unpreconditioned system, as given by Theorem 4.1. The residuals $\dot p = p\theta - Ap$ and $\dot q = q\theta - A^*q$ begin to decrease, but then rise as $t$ approaches a critical point
[Figure 6.2: two panels with logarithmic vertical axes plotted against iteration $k = 0, \ldots, 500$, each showing $\gamma^k$, $\gamma^{2k}$, $|\theta_k - \lambda_1|$, and $\|b_k\|$.]

Fig. 6.2. Computational confirmation of Theorem 6.3 for a normal matrix (left) and a nonnormal matrix (right), both with $N = I$. In the normal case, the residual $|\theta_k - \lambda|$ converges like $\gamma^{2k}$, while in the nonnormal case $|\theta_k - \lambda|$ only converges like $\gamma^k$. The vertical lines denote the point at which the hypotheses of the convergence theorem hold.
near $t = 0.675$, where cusps develop, indicating that a pole as given by $\pi(t)$ of Theorem 4.1 is encountered by the flow. The same behavior is seen in a plot of the secant of the angle between $p$ and $q$. Figures 6.1(c) and 6.1(d) display the discrete flow associated with a forward Euler time integrator with a time step of $h = 0.025$. As expected, when the iterates depart from the quadratic manifold, the residuals explode in size, as in the exact solution. One can also show that the secant of the angle between $p_j$ and $q_j$, and the norms of $p_j$ and $q_j$, also begin to grow near $t \approx 0.675$, consistent with Theorem 6.1. Decreasing the time-step $h$ does not avoid the blow-up; in fact, the time at which the explosive growth occurs is largely independent of the time-step because of the onset of incurable breakdown associated with the continuous dynamical system. In contrast to the latter, the discrete dynamical system cannot simply step over the pole associated with the continuous dynamical system. Aside from special cases such as the one described by Theorem 4.2, these results appear to be common and do not significantly depend on specially engineered starting vectors (though breakdown will occur at different points in time, of course). We also implemented the symplectic Euler method (that preserves quadratic invariants) for this class of matrices and observed behavior consistent with the forward Euler method combined with a projection. In contrast, the one-sided discretized forward Euler iterations converge to the leftmost eigenvalue and associated eigenvector.

Next, we investigate the convergence analysis described in Theorem 6.3 for a simple example with $N = I$. Let $A$ be the matrix with $a_{j,j} = (j - 1)/(N - 1)$ for $j = 1, \ldots, N$, and all other entries equal to zero except perhaps for the vector $d^T$ in entries 2 through $N$ of the first row; cf. (6.11). The plots in Figure 6.2 use $N = 64$, comparing $d^T = 0$ (left) and $d^T = [1, \ldots, 1]$ (right). 
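The normal case of this experiment ($d = 0$, so $A$ is diagonal) is simple enough to reproduce in a few lines. The following is a minimal sketch, not the authors' code: it assumes a deterministic starting vector with equal entries instead of a random one, uses the step size $h = 1/2$ of Figure 6.2, and the function name is hypothetical. With diagonal $A$ the forward Euler update (6.7) with identity preconditioner acts componentwise.

```python
import math

def forward_euler_normal(n, h, steps):
    """Forward Euler (6.7), identity preconditioner, A = diag(j/(n-1)), j = 0..n-1.

    Returns the histories of ||b_k|| (all components but the first) and theta_k.
    """
    a = [j / (n - 1) for j in range(n)]      # diagonal of A; lambda_1 = a[0] = 0
    p = [1.0 / math.sqrt(n)] * n             # assumed deterministic unit start vector
    b_norms, thetas = [], []
    for _ in range(steps):
        theta = sum(ai * pi * pi for ai, pi in zip(a, p)) / sum(pi * pi for pi in p)
        b_norms.append(math.sqrt(sum(pi * pi for pi in p[1:])))
        thetas.append(theta)
        # p_{k+1} = p_k + h (theta_k - A) p_k, componentwise for diagonal A
        p = [pi * (1.0 + h * (theta - ai)) for ai, pi in zip(a, p)]
    return b_norms, thetas

n, h = 64, 0.5
# gamma = ||I + h (lambda_1 - C)|| with C = diag(1/63, ..., 1) and lambda_1 = 0
gamma = max(abs(1.0 + h * (0.0 - j / (n - 1))) for j in range(1, n))

b_norms, thetas = forward_euler_normal(n, h, 2000)
b_ratio = b_norms[-1] / b_norms[-2]          # observed contraction of ||b_k||
theta_ratio = thetas[-1] / thetas[-2]        # observed contraction of |theta_k - lambda_1|
```

After the transient, `b_ratio` settles near $\gamma = 1 - 1/126$ and `theta_ratio` near $\gamma^2$, in line with the rates stated in Theorem 6.3 for $d = 0$.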
In both cases we take $h = 1/2$, for which (6.14) gives $\gamma = 0.992\ldots \in [0, 1)$ as required. We take $p_0$ to be the same randomly generated unit vector in both cases. This initial vector does not satisfy (6.17), but this condition is eventually met after a number of iterations, denoted by the vertical line in each plot. For the normal case in the left plot, $\|b_k\|$ converges like $\gamma^k$, while the error in the Rayleigh quotient $|\theta_k - \lambda_1|$ converges like $\gamma^{2k}$ as predicted. The nonnormality induced by the $d$ vector spoils this convergence for the Rayleigh quotient, as seen in the right plot; now both $\|b_k\|$ and $|\theta_k - \lambda_1|$ converge
like $\gamma^k$, consistent with Theorem 6.3. The spikes in the latter plot correspond to points where the Rayleigh quotient $\theta_k$ crossed over the desired eigenvalue $\lambda_1$, something only possible for nonnormal iterations.

7. Summary. This paper demonstrates the fruitful relationship between several nonlinear dynamical systems and certain simple preconditioned eigensolvers for nonsymmetric eigenvalue problems. Properties of the continuous-time systems, such as system invariants and the asymptotic behavior of the exact solution, can inform the convergence theory for practical algorithms derived from discretizations, as we illustrate with Theorem 6.3 for the forward Euler discretization. Generalizations to more sophisticated discretizations, along with relaxation of the stringent requirements on the preconditioner in Theorem 6.3, are natural avenues for future research.

Appendix A. Lipschitz constant for Euler's method. To apply the standard convergence theory for the forward Euler method applied to the system $\dot p = N^{-1}(\theta(p)p - Ap)$, we seek a constant $L > 0$ such that
\[ \|N^{-1}(\theta(u)u - Au) - N^{-1}(\theta(v)v - Av)\| \le L\,\|u - v\| \]
for all $u, v \in \mathbb{R}^n$. First we note that
\[ \|(\theta(u)u - Au) - (\theta(v)v - Av)\| \le \|\theta(u)u - \theta(v)v\| + \|A\|\,\|u - v\|. \]
We focus attention on the first term on the right:
\[ \begin{aligned} \|\theta(u)u - \theta(v)v\| &\le \|\theta(u)u - \theta(v)u\| + \|\theta(v)u - \theta(v)v\| \\ &\le |\theta(u) - \theta(v)|\,\|u\| + |\theta(v)|\,\|u - v\| \\ &\le |\theta(u) - \theta(v)|\,\|u\| + \|A\|\,\|u - v\|. \end{aligned} \tag{A.1} \]
(In this last inequality and others that follow, we neglect the opportunity to take tighter bounds that would lead to smaller constants but greater analytical complexity.) Next, we need to bound $|\theta(u) - \theta(v)|\,\|u\|$ in terms of $\|u - v\|$. For convenience (assuming neither $u$ nor $v$ is zero), define the unit vectors $\hat u = u/\|u\|$ and $\hat v = v/\|v\|$, so that with $\varepsilon = \hat v - \hat u$,
\[ \begin{aligned} |\theta(u) - \theta(v)| &= |\hat u^TA\hat u - \hat v^TA\hat v| \\ &= |\hat u^TA\hat u - \hat u^TA\hat u - \varepsilon^TA\hat u - \hat u^TA\varepsilon - \varepsilon^TA\varepsilon| \\ &\le 2\|\varepsilon\|\,\|A\| + \|\varepsilon\|^2\|A\|. \end{aligned} \tag{A.2} \]
Now note that
\[ \|\varepsilon\| = \|\hat v - \hat u\| = \frac{\big\| \|u\|v - \|v\|v + \|v\|v - \|v\|u \big\|}{\|u\|\,\|v\|} \le \frac{\big| \|u\| - \|v\| \big|}{\|u\|} + \frac{\|u - v\|}{\|u\|}. \]
Apply the triangle inequality to obtain $\big| \|u\| - \|v\| \big| \le \|u - v\|$, from which we conclude
\[ \|\varepsilon\| \le \frac{2}{\|u\|}\,\|u - v\|. \tag{A.3} \]
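Since $\hat u$ and $\hat v$ are unit vectors, we alternatively have the coarse bound $\|\varepsilon\| = \|\hat u - \hat v\| \le 2$, which we can apply to (A.2) to obtain
\[ |\theta(u) - \theta(v)| \le 2\|\varepsilon\|\,\|A\| + \|\varepsilon\|^2\|A\| \le 2\|\varepsilon\|\,\|A\| + 2\|\varepsilon\|\,\|A\| = 4\|A\|\,\|\varepsilon\|. \]
Now using (A.3), the first bound on $\|\varepsilon\|$,
\[ |\theta(u) - \theta(v)| \le 8\,\frac{\|A\|}{\|u\|}\,\|u - v\|. \]
Substituting this bound into (A.1) gives $\|\theta(u)u - \theta(v)v\| \le 9\|A\|\,\|u - v\|$, and, finally, we arrive at the Lipschitz constant
\[ \|N^{-1}(\theta(u)u - Au) - N^{-1}(\theta(v)v - Av)\| \le 10\,\|N^{-1}\|\,\|A\|\,\|u - v\|. \]
Thus, we define
\[ L = 10\,\|N^{-1}\|\,\|A\|. \tag{A.4} \]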
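The bound (A.4) can be sanity-checked numerically. The following is a minimal sketch, not part of the original analysis: the 2-by-2 matrices `A` and `Ninv` are hypothetical test data, the Rayleigh quotient is taken as $\theta(p) = p^T A p / p^T p$, the right-hand side is set to zero at $p = 0$ following the convention adopted in this appendix, and the helper names are illustrative only. It samples random pairs $u, v$ and records the largest observed difference quotient, which should not exceed $L$.

```python
import math
import random


def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def norm(x):
    return math.sqrt(sum(t * t for t in x))


def spectral_norm_2x2(M):
    """2-norm of a real 2x2 matrix: sqrt of the largest eigenvalue of M^T M."""
    a, b = M[0]
    c, d = M[1]
    p = a * a + c * c          # (M^T M)[0][0]
    q = a * b + c * d          # (M^T M)[0][1]
    r = b * b + d * d          # (M^T M)[1][1]
    tr, det = p + r, p * r - q * q
    return math.sqrt((tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0))) / 2.0)


def rhs(A, Ninv, p):
    """N^{-1}(theta(p) p - A p), defined as 0 at p = 0."""
    pp = sum(t * t for t in p)
    if pp == 0.0:
        return [0.0] * len(p)
    Ap = matvec(A, p)
    theta = sum(p[i] * Ap[i] for i in range(len(p))) / pp
    return matvec(Ninv, [theta * p[i] - Ap[i] for i in range(len(p))])


def worst_ratio(A, Ninv, trials=2000, seed=0):
    """Largest sampled value of ||F(u) - F(v)|| / ||u - v||."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        u = [rng.uniform(-1.0, 1.0) for _ in range(2)]
        v = [rng.uniform(-1.0, 1.0) for _ in range(2)]
        duv = norm([u[i] - v[i] for i in range(2)])
        if duv == 0.0:
            continue
        fu, fv = rhs(A, Ninv, u), rhs(A, Ninv, v)
        worst = max(worst, norm([fu[i] - fv[i] for i in range(2)]) / duv)
    return worst


A = [[2.0, 1.0], [0.0, 1.0]]       # hypothetical nonsymmetric test matrix
Ninv = [[0.5, 0.0], [0.0, 1.0]]    # hypothetical inverse preconditioner
L = 10.0 * spectral_norm_2x2(Ninv) * spectral_norm_2x2(A)   # constant (A.4)
ratio = worst_ratio(A, Ninv)
```

Random sampling only gives a lower estimate of the true Lipschitz constant, so this check can refute a bound but cannot prove it; the constant 10 is, as noted above, deliberately not sharp.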
The Rayleigh quotient $\theta(p)$ is undefined in the case that $p = 0$. However, as $p \to 0$, we have that $\theta(p)p - Ap \to 0$, and this motivates the definition that $\theta(p)p - Ap = 0$ if $p = 0$. The above analysis excludes the case that $u = 0$ and/or $v = 0$, but with our definition of this singular case we have, e.g., if $u = 0$, that
\[ \|(\theta(u)u - Au) - (\theta(v)v - Av)\| = \|\theta(v)v - Av\| \le 2\|A\|\,\|v\| \le 10\|A\|\,\|u - v\|, \]
and obviously if $u = v = 0$, we have $\|(\theta(u)u - Au) - (\theta(v)v - Av)\| = 0 = 10\|A\|\,\|u - v\|$. Hence, the Lipschitz constant (A.4) holds for all $u$ and $v$.

Acknowledgments. We thank Pierre-Antoine Absil, Moody Chu, Kyle Gallivan, Anthony Kellems, Christian Lubich, Qiang Ye, and the anonymous referees for their numerous helpful suggestions concerning this work and its presentation.
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1474–1499
© 2009 Society for Industrial and Applied Mathematics
A NEW FICTITIOUS DOMAIN APPROACH INSPIRED BY THE EXTENDED FINITE ELEMENT METHOD∗
JAROSLAV HASLINGER† AND YVES RENARD‡
Abstract. The purpose of this paper is to present a new fictitious domain approach inspired by the extended finite element method introduced by Moës, Dolbow, and Belytschko in [Internat. J. Numer. Methods Engrg., 46 (1999), pp. 131–150]. An optimal method is obtained thanks to an additional stabilization technique. Some a priori estimates are established and numerical experiments illustrate different aspects of the method. The presentation is made on a simple Poisson problem with mixed Neumann and Dirichlet boundary conditions. The extension to other problems or boundary conditions is quite straightforward.
Key words. fictitious domain, Xfem, approximation of elliptic problems, stabilization technique
AMS subject classifications. 65N30, 65N15
DOI. 10.1137/070704435
1. Introduction. The extended finite element method (Xfem) was introduced by Moës, Dolbow, and Belytschko in [18] and developed in many papers such as [5, 16, 19, 23, 28]. The first application of Xfem was done in structural mechanics when dealing with cracked domains. The specificity of the method is that it combines a level-set representation of the geometry of the crack (introduced in [25]) with an enrichment of a finite element space by singular and discontinuous functions. The enrichment of a finite element space with a singular function has been studied earlier by Strang and Fix in [26]. The originality of Xfem consists in a particular way of defining the enrichment via the multiplication by a partition of unity provided by basis functions of a Lagrange finite element method. Several strategies can be considered in order to extend or improve the original Xfem. Some of these strategies are presented in [16]. An a priori error estimate of a variant of Xfem for cracked domains is presented in [5]. In this work we adapt the techniques of Xfem to develop a new method allowing computations in domains whose boundaries are independent of the mesh. A similar attempt was done in [17, 27]. Our goal is to develop a fully optimal method. It can be considered as a fictitious domain-type method. Its advantage, compared to existing ones (see, for instance, [11, 13]), is its ability to easily treat complex boundary conditions. The elementary matrices, however, have to be computed taking into account the geometry of the real boundary (in a nonlinear framework this disadvantage disappears since the tangent stiffness matrix has to be frequently recomputed). Therefore, this method can be of interest for computational domains having moving boundaries or boundaries with a complex geometry and various conditions on them (Dirichlet, Neumann, Signorini, ...).
In this paper, only Dirichlet and Neumann boundary conditions are considered. An extension to more complex boundary data is straightforward, at least from the implementation point of view. The outline of this paper is as follows. In section 2, we introduce the model problem which is represented by a simple Poisson equation with Neumann and Dirichlet boundary conditions. In section 3 we describe the new method for a model problem without any stabilization. Section 4 is devoted to a convergence analysis of this approach. An abstract result is obtained which gives a convergence rate of order $\sqrt{h}$ under reasonable regularity assumptions on the solution even for high order finite elements. The main part of this paper is section 5 where a new stabilized method is introduced. Under appropriate assumptions we prove the stability of this formulation as well as optimal error estimates. In section 6 we briefly mention details on the computational implementation. Numerical experiments for a model example with different choices of finite element spaces are presented in section 7. The paper is completed with three appendices with proofs of trace theorems needed in the text.

∗ Received by the editors October 5, 2007; accepted for publication (in revised form) November 18, 2008; published electronically March 25, 2009. This work was supported by “l’Agence Nationale de la Recherche,” project ANR-05-JCJC-0182-01. http://www.siam.org/journals/sinum/47-2/70443.html
† Department of Numerical Mathematics, Faculty of Mathematics and Physics, Sokolovská 83, 18675 Praha 8, Czech Republic (Jaroslav.Haslinger@mff.cuni.cz). This author’s research was supported by grant MSM0021620839 of the Czech Ministry of Education and IAA100750802 of GAAV CR.
‡ Université de Lyon, CNRS, INSA-Lyon, ICJ UMR5208, LaMCoS UMR5259, F-69621, Villeurbanne, France ([email protected]).

2. Setting of the problem. We present a new approach for numerical realization of elliptic problems. The theoretical presentation is made for a two- or three-dimensional simply connected bounded domain $\Omega$ with a sufficiently smooth boundary. Let $\tilde\Omega \subset \mathbb{R}^d$ ($d = 2$ or $d = 3$) be a rectangular or parallelepiped domain (the fictitious domain) containing $\Omega$ in its interior. We consider that the boundary $\Gamma$ of $\Omega$ is split into two parts $\Gamma_N$ and $\Gamma_D$ (see Figure 1). It is assumed that $\Gamma_D$ has a nonzero $(d-1)$-dimensional Lebesgue measure. Let us consider the following problem in $\Omega$: Find $u : \Omega \to \mathbb{R}$ such that
\[ -\Delta u = f \quad \text{in } \Omega, \tag{1} \]
\[ u = 0 \quad \text{on } \Gamma_D, \tag{2} \]
\[ \partial_n u = g \quad \text{on } \Gamma_N, \tag{3} \]
where $f \in L^2(\Omega)$, $g \in L^2(\Gamma_N)$ are given data and $n$ is the outward unit normal vector to $\Gamma$. The weak formulation of such a problem is well known and reads as follows: Find $u \in V_0$ such that
\[ a(u, v) = l(v) \quad \forall v \in V_0, \tag{4} \]
where $V = H^1(\Omega)$, $V_0 = \{v \in V : v = 0 \text{ on } \Gamma_D\}$,
\[ a(u, v) = \int_\Omega \nabla u \cdot \nabla v \, d\Omega, \qquad l(v) = \int_\Omega f v \, d\Omega + \int_{\Gamma_N} g v \, d\Gamma. \]
Fig. 1. Fictitious and real domains.
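It is also well known that this problem can be expressed by means of the following mixed formulation:
\[ \begin{cases} \text{Find } u \in V \text{ and } \lambda \in W \text{ such that} \\ a(u, v) + \langle \lambda, v \rangle_{W,X} = l(v) \quad \forall v \in V, \\ \langle \mu, u \rangle_{W,X} = 0 \quad \forall \mu \in W, \end{cases} \tag{5} \]
where $X = \{w \in L^2(\Gamma_D) : \exists v \in V \text{ such that } w = v|_{\Gamma_D}\}$, $W = X'$, and $\langle \mu, v \rangle_{W,X}$ denotes the duality pairing between $W$ and $X$. Let
\[ V_0^\# = \Big\{ v \in V : \int_{\Gamma_D} v \, d\Gamma = 0 \Big\}. \]
Then $a(\cdot, \cdot)$ is coercive on $V_0^\#$ (a direct consequence of the Peetre–Tartar lemma, see [10] for instance), i.e., there exists $\alpha > 0$ such that
\[ a(v, v) \ge \alpha \|v\|_V^2 \quad \forall v \in V_0^\#. \tag{6} \]
From this, the existence and uniqueness of a solution to Problem (5) follows. In addition, $\lambda = -\partial_n u$ on $\Gamma_D$. Problem (5) is also equivalent to the problem of finding a saddle point of the following Lagrangian on $V \times W$:
\[ L(v, \mu) = \frac{1}{2} a(v, v) + \langle \mu, v \rangle_{W,X} - l(v). \tag{7} \]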
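To make the mixed formulation (5) and the identity $\lambda = -\partial_n u$ concrete, here is a minimal one-dimensional sketch. It is an illustration under stated assumptions and not the method of this paper: the domain is $\Omega = (0, 1)$ with $\Gamma_D = \{0\}$ and $\Gamma_N = \{1\}$, the mesh is fitted to the boundary, piecewise linear elements are used, and the function names `solve` and `mixed_poisson_1d` are hypothetical. For $f = 1$, $g = 0$ the exact solution is $u(x) = x - x^2/2$, so $\partial_n u = -u'(0) = -1$ at $x = 0$ and the multiplier should equal $1$.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def mixed_poisson_1d(n, f=1.0, g=0.0):
    """P1 discretization of -u'' = f on (0, 1); u(0) = 0 is imposed by a
    Lagrange multiplier, u'(1) = g is imposed naturally.
    Returns (nodal values of u, multiplier)."""
    h = 1.0 / n
    nodes = n + 1
    lam = nodes                      # index of the single multiplier unknown
    K = [[0.0] * (nodes + 1) for _ in range(nodes + 1)]
    rhs = [0.0] * (nodes + 1)
    for e in range(n):               # assemble a(u, v) and l(v) on element e
        K[e][e] += 1.0 / h
        K[e][e + 1] -= 1.0 / h
        K[e + 1][e] -= 1.0 / h
        K[e + 1][e + 1] += 1.0 / h
        rhs[e] += f * h / 2.0
        rhs[e + 1] += f * h / 2.0
    rhs[n] += g                      # Neumann datum at x = 1
    K[0][lam] = 1.0                  # <lambda, v> tested with the hat function at x = 0
    K[lam][0] = 1.0                  # <mu, u> = 0, i.e. u(0) = 0
    sol = solve(K, rhs)
    return sol[:nodes], sol[lam]


u, lam = mixed_poisson_1d(8)
```

The assembled matrix has the saddle-point block structure induced by the Lagrangian (7): a symmetric stiffness block bordered by the constraint row and column, with a zero block for the multiplier.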
3. The new fictitious domain method. The new fictitious domain approach which will be studied in this paper requires the introduction of two finite dimensional finite element spaces $\widetilde V^h \subset H^1(\tilde\Omega)$ and $\widetilde W^h \subset L^2(\tilde\Omega)$ on the fictitious domain $\tilde\Omega$. As $\tilde\Omega$ can be a rectangular or parallelepiped domain, these spaces can be defined on the same structured mesh $\mathcal{T}^h$ (see Figure 2). Note that in the following, we only use the fact that the family of meshes is quasi-uniform (in the classical sense of Ciarlet [6, 7]). Next we shall suppose that
\[ \widetilde V^h = \big\{ v^h \in C(\overline{\tilde\Omega}) : v^h|_T \in P(T) \ \forall T \in \mathcal{T}^h \big\}, \tag{8} \]
where $P(T)$ is a finite dimensional space of regular functions such that $P(T) \supseteq P_k(T)$ for some integer $k \ge 1$. The mesh parameter $h$ stands for $h = \max_{T \in \mathcal{T}^h} h_T$ where $h_T$ is the diameter of $T$.
Fig. 2. Example of a structured mesh.
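Then one can build
\[ V^h := \widetilde V^h|_{\Omega} \quad \text{and} \quad W^h := \widetilde W^h|_{\Gamma_D}, \]
which are natural discretizations of $V$ and $W$, respectively. An approximation of Problem (5) is defined as follows:
\[ \begin{cases} \text{Find } u^h \in V^h \text{ and } \lambda^h \in W^h \text{ such that} \\ a(u^h, v^h) + \displaystyle\int_{\Gamma_D} \lambda^h v^h \, d\Gamma = l(v^h) \quad \forall v^h \in V^h, \\ \displaystyle\int_{\Gamma_D} \mu^h u^h \, d\Gamma = 0 \quad \forall \mu^h \in W^h. \end{cases} \tag{9} \]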
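Since $V^h$ is the restriction of $\widetilde V^h$ to $\Omega$, the element contributions to $a(\cdot,\cdot)$ in (9) are integrated over $T \cap \Omega$ only. A minimal one-dimensional sketch of this, under the assumption of piecewise linear shape functions and a real domain of the form $\Omega = (-\infty, b)$ near the cut, could look as follows; the function name `cut_element_stiffness` is hypothetical.

```python
def cut_element_stiffness(x0, x1, b):
    """P1 stiffness matrix of the element T = [x0, x1], integrated over
    T ∩ Omega only, where Omega = (-inf, b) locally.
    The gradients of the two hat functions are constant on T, so the
    integral is simply |T ∩ Omega| times their products."""
    h = x1 - x0
    length_in = max(0.0, min(x1, b) - x0)      # measure of T ∩ Omega
    grad = (-1.0 / h, 1.0 / h)                 # gradients of the shape functions
    return [[length_in * grad[i] * grad[j] for j in range(2)] for i in range(2)]


K_full = cut_element_stiffness(0.0, 0.5, 1.0)    # element entirely inside Omega
K_half = cut_element_stiffness(0.0, 0.5, 0.25)   # element cut in the middle
K_out = cut_element_stiffness(0.0, 0.5, -1.0)    # element entirely outside Omega
```

As the cut fraction tends to zero the element matrix degenerates, which is one way to see why the geometry of the real boundary has to enter the computation of the elementary matrices.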
Similarly to Xfem, where the shape functions of the finite element space are multiplied with a Heaviside function, this corresponds here to the multiplication of the shape functions with the characteristic function of $\Omega$.

4. Convergence analysis. Let us define the following space:
\[ V_0^h = \Big\{ v^h \in V^h : \int_{\Gamma_D} \mu^h v^h \, d\Gamma = 0 \ \forall \mu^h \in W^h \Big\}. \tag{10} \]
This space can be viewed as a (nonconforming) discretization of $V_0$. In addition, we shall suppose that $\widetilde W^h$ and $\widetilde V^h$ are chosen in such a way that the following two conditions are satisfied for every $h > 0$:
\[ 1|_{\Gamma_D} \in W^h, \tag{11} \]
\[ \mu^h \in W^h : \ \int_{\Gamma_D} \mu^h v^h \, d\Gamma = 0 \ \forall v^h \in V^h \implies \mu^h = 0. \tag{12} \]
Lemma 1. The bilinear form $a(\cdot, \cdot)$ is uniformly $V_0^h$-elliptic; i.e., there exists $\alpha > 0$ independent of $h$ such that
\[ a(v^h, v^h) \ge \alpha \|v^h\|_V^2 \quad \forall v^h \in V_0^h. \]
Proof. By (11), every $v^h \in V_0^h$ has zero mean on $\Gamma_D$, so that $V_0^h \subset V_0^\#$, and the claim follows from (6).
Proposition 1. Suppose that (11) and (12) are satisfied. Then the solution $(u^h, \lambda^h)$ to Problem (9) is unique and there exists a constant $C > 0$ independent of $\widetilde V^h$ and $\widetilde W^h$ such that$^1$
\[ \|u^h\|_V \le C \|l\|_{H^{-1}(\Omega)}. \]
Proof. Since $1|_{\Gamma_D} \in W^h$, it follows from the last equality in (9) that $u^h \in V_0^\#$. The existence and uniqueness of $(u^h, \lambda^h)$ now follows from (12) and Lemma 1. The announced estimate comes from the fact that $a(u^h, u^h) = l(u^h)$.
We prove now the following abstract result (the extension of Cea's lemma).
$^1$ In what follows, the symbol $C$ will be used to denote a generic positive constant which does not depend on $h$ and which can take different values at different places of its appearance.
Lemma 2. Let $(u, \lambda)$ and $(u^h, \lambda^h)$ be the solution to Problems (5) and (9), respectively. Suppose that (11) and (12) are satisfied. Then there exists a constant $C > 0$ independent of $\widetilde V^h$ and $\widetilde W^h$ such that
\[ \|u - u^h\|_V \le C \Big( \inf_{v^h \in V_0^h} \|u - v^h\|_V + \sup_{v^h \in V_0^h,\, v^h \ne 0} \frac{|a(u, v^h) - l(v^h)|}{\|v^h\|_V} \Big). \]
Proof. For a given function $v^h \in V_0^h$ one has
\[ \alpha \|u^h - v^h\|_V^2 \le a(u^h - v^h, u^h - v^h) = a(u - v^h, u^h - v^h) + l(u^h - v^h) - a(u, u^h - v^h). \]
Thus,
\[ \|u^h - v^h\|_V \le C \Big( \|u - v^h\|_V + \sup_{w^h \in V_0^h,\, w^h \ne 0} \frac{|a(u, w^h) - l(w^h)|}{\|w^h\|_V} \Big). \]
From the triangle inequality $\|u - u^h\|_V \le \|u - v^h\|_V + \|u^h - v^h\|_V$ we obtain the result.
Remark 1. The term $\sup_{v^h \in V_0^h,\, v^h \ne 0} |a(u, v^h) - l(v^h)| / \|v^h\|_V$ is called a consistency error.
Corollary 1. Under the assumptions of Lemma 2, there exists a constant $C > 0$ independent of $\widetilde V^h$ and $\widetilde W^h$ such that
\[ \|u - u^h\|_V \le C \Big( \inf_{v^h \in V_0^h} \|u - v^h\|_V + \inf_{\mu^h \in W^h} \|\lambda - \mu^h\|_W \Big). \tag{13} \]
Proof. Since $u$ is a solution to Problem (5) one has
\[ a(u, v^h) = l(v^h) - \langle \lambda, v^h \rangle_{W,X} \quad \forall v^h \in V_0^h. \]
The definition of $V_0^h$ yields
\[ a(u, v^h) - l(v^h) = -\langle \lambda, v^h \rangle_{W,X} = \langle \mu^h - \lambda, v^h \rangle_{W,X} \quad \forall v^h \in V_0^h \ \forall \mu^h \in W^h, \]
so that
\[ |a(u, v^h) - l(v^h)| \le \inf_{\mu^h \in W^h} \|\lambda - \mu^h\|_W \, \|v^h\|_V \quad \forall v^h \in V_0^h. \]
This, together with Lemma 2, gives (13).
We establish now the following convergence result.
Proposition 2. Suppose that (11) and (12) are satisfied and, in addition, let the systems $\{V_0^h\}$, $\{W^h\}$, $h \to 0+$, be dense in $V_0$ and $L^2(\Gamma_D)$, respectively. Then
\[ u^h \to u \quad \text{in } V, \ h \to 0+, \]
where $u$ and $u^h$ are the first components of the solution to (5) and (9), respectively.
Proof. From Proposition 1 it follows that
\[ \|u^h\|_V \le C \quad \forall h > 0. \]
Thus, there exists a subsequence, still denoted by the same symbol, and an element $\bar u \in V$ such that
\[ u^h \rightharpoonup \bar u \quad \text{in } V, \ h \to 0+. \tag{14} \]
Since $\{W^h\}$ is dense in $L^2(\Gamma_D)$, for any $\mu \in L^2(\Gamma_D)$ there exists a sequence $\{\mu^h\}$, $\mu^h \in W^h$, such that
\[ \mu^h \to \mu \quad \text{in } L^2(\Gamma_D), \ h \to 0+. \tag{15} \]
Passing to the limit in the last equality in (9), using (14) and (15) we see that
\[ \int_{\Gamma_D} \mu \bar u \, d\Gamma = 0 \quad \forall \mu \in L^2(\Gamma_D), \]
which is equivalent to $\bar u \in V_0$. Let $v \in V_0$ be given. Then, by the assumption there exists a sequence $\{v^h\}$, $v^h \in V_0^h$, such that
\[ v^h \to v \quad \text{in } V, \ h \to 0+. \tag{16} \]
Since $u^h$ solves (9) we have $a(u^h, v^h) = l(v^h)$. From this, (14), and (16) we see that
\[ a(\bar u, v) = l(v) \quad \forall v \in V_0, \]
i.e., $\bar u = u$ solves the original problem. As $u$ is unique, the whole sequence $\{u^h\}$ tends weakly to $u$ in $V$. Strong convergence of $\{u^h\}$ to $u$ follows from the fact that $|u^h|_{1,\Omega} \to |u|_{1,\Omega}$, which is easy to verify.
In what follows, we shall estimate the first term on the right of (13). To simplify our presentation we shall consider a purely homogeneous Dirichlet problem, i.e., with $\Gamma_D = \Gamma$ and such that its solution $u$ belongs to $H^{1+d/2+\varepsilon}(\Omega) \cap H_0^1(\Omega)$ for some $\varepsilon > 0$ ($\Omega \subset \mathbb{R}^d$). From the embedding theorem it immediately follows that
\[ u \in C^1(\overline{\Omega}). \tag{17} \]
For $\delta > 0$ given, we denote by $\Omega_\delta$ the subset of $\Omega$:
\[ \Omega_\delta = \{x \in \Omega : \operatorname{dist}(x, \Gamma) > \delta\}. \]
Let $\eta_h$ be a sufficiently smooth cutoff function:
\[ \eta_h = \begin{cases} 1 & \text{in } \Omega \setminus \Omega_{2h}, \\ 0 & \text{in } \Omega_{3h}. \end{cases} \]
In $\Omega_{2h} \setminus \Omega_{3h}$ the function $\eta_h$ is defined in such a way that
\[ \|\nabla^j \eta_h\|_{C(\overline{\Omega})} \le \frac{C}{h^j}, \quad j = 1, 2. \tag{18} \]
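The solution $u$ can be split and written in the form $u = \eta_h u + (1 - \eta_h)u$. Next, we show that
\[ \|\eta_h u\|_V \le C\sqrt{h}, \quad h \to 0+. \tag{19} \]
Indeed,
\[ \|\eta_h u\|_V^2 = \|u\|_{1,\Omega \setminus \Omega_{2h}}^2 + \|\eta_h u\|_{1,\Omega_{2h} \setminus \Omega_{3h}}^2. \tag{20} \]
From (17) it immediately follows that $\|u\|_{1,\Omega \setminus \Omega_{2h}}^2 \le Ch$, $h \to 0+$. To get the estimate of the second term on the right of (20) it is sufficient to estimate the respective seminorm. It holds:
\[ |\eta_h u|_{1,\Omega_{2h} \setminus \Omega_{3h}}^2 \le C \Big( \int_{\Omega_{2h} \setminus \Omega_{3h}} |\nabla \eta_h|^2 u^2 \, d\Omega + \int_{\Omega_{2h} \setminus \Omega_{3h}} \eta_h^2 |\nabla u|^2 \, d\Omega \Big) \le Ch, \quad h \to 0+, \tag{21} \]
making use of (18) and the elementary estimate
\[ \max_{x \in \Omega_{2h} \setminus \Omega_{3h}} |u(x)| \le Ch, \tag{22} \]
which holds in view of the fact that $u = 0$ on $\Gamma$. From (21) and (22) we obtain (19).
Let $V_{00}^h$ be a subset of $V^h$ containing functions vanishing in a vicinity of $\Gamma$. More precisely,
\[ V_{00}^h = \big\{ v^h \in V^h : v^h(a) = 0 \ \forall a \in N^h \big\}, \]
where $N^h$ is the set of those nodes of $\mathcal{T}^h$ which lie in $\Omega \setminus \Omega_{3h/2}$. Observe that $V_{00}^h \subset V_0^h$. By $\Pi_T v$ we denote the standard $P$-Lagrange interpolate of $v$ on an element $T \in \mathcal{T}^h$, $T \subset \Omega$. Since $P \supseteq P_k$ ($k \ge 1$) we know that
\[ \|v - \Pi_T v\|_{1,T} \le C h_T \|v\|_{2,T} \tag{23} \]
holds for any $v \in H^2(T)$, $T \in \mathcal{T}^h$ and $T \subset \Omega$.
Proposition 3. Suppose that $\widetilde V^h$ is defined by (8), let (11) and (12) be satisfied, and, in addition,
\[ \inf_{\mu^h \in W^h} \|\lambda - \mu^h\|_W \le C h^\beta \quad \text{for some } \beta \ge 1/2. \tag{24} \]
Let the solution $u$ of (4) with $\Gamma = \Gamma_D$ be such that $u \in H^{1+d/2+\varepsilon}(\Omega) \cap H_0^1(\Omega)$, $\varepsilon > 0$. Then
\[ \|u - u^h\|_V \le C\sqrt{h}, \quad h \to 0+. \]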
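Proof. It is sufficient to estimate the first term on the right of (13). It holds:
\[ \inf_{v^h \in V_0^h} \|u - v^h\|_V \le \inf_{v^h \in V_{00}^h} \|u - v^h\|_V = \inf_{v^h \in V_{00}^h} \|\eta_h u + (1 - \eta_h)u - v^h\|_V, \]
and
\[ \|\eta_h u + (1 - \eta_h)u - v^h\|_V \le \|\eta_h u\|_V + \|(1 - \eta_h)u - v^h\|_V \quad \forall v^h \in V_{00}^h. \]
We construct $v^h$ as follows:
\[ v^h|_T = \Pi_T\big((1 - \eta_h)u|_T\big) \quad \text{if } T \subset \Omega, \]
otherwise, we set $v^h = 0$. It is readily seen that $v^h \in V_{00}^h$ and from (23) it follows that
\[ \|(1 - \eta_h)u - v^h\|_V \le Ch\|(1 - \eta_h)u\|_{2,\Omega} \le Ch\|u\|_{2,\Omega} + Ch\|\eta_h u\|_{2,\Omega}. \tag{25} \]
A direct computation shows that
\[ \|\eta_h u\|_{2,\Omega} \le \frac{C}{\sqrt{h}}, \quad h \to 0+. \tag{26} \]
Indeed, the $H^2(\Omega)$-seminorm can be estimated by
\[ |\eta_h u|_{2,\Omega}^2 \le C \Big( \|\nabla^2 \eta_h\|_{C(\overline{\Omega})}^2 \int_{\Omega_{2h} \setminus \Omega_{3h}} u^2 \, d\Omega + \int_{\Omega_{2h} \setminus \Omega_{3h}} |\nabla \eta_h|^2 |\nabla u|^2 \, d\Omega + |u|_{2,\Omega}^2 \Big) \le \frac{C}{h}, \]
as follows from (18) and (22). Using (26) in (25) we see that
\[ \|(1 - \eta_h)u - v^h\|_V \le C\sqrt{h}, \quad h \to 0+. \]
From this and (19) we finally arrive at
\[ \inf_{v^h \in V_0^h} \|u - v^h\|_V \le C\sqrt{h}. \]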
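The $\sqrt{h}$ scaling in (19), which drives the rate in Proposition 3, can be observed numerically. The sketch below is a one-dimensional illustration under stated assumptions, not a computation from this paper: the variable $x$ plays the role of the distance to $\Gamma$, the test function is $u(x) = \sin x$ (so that $u(0) = 0$), the cutoff is a hypothetical quintic blend satisfying (18), and the $H^1$ norm is approximated with the midpoint rule. Reducing $h$ by a factor of 4 should roughly halve $\|\eta_h u\|_{1}$.

```python
import math


def eta(x, h):
    """C^2 cutoff in the distance x to the boundary and its derivative:
    1 for x <= 2h, 0 for x >= 3h, quintic blend in between."""
    t = (x - 2.0 * h) / h
    if t <= 0.0:
        return 1.0, 0.0
    if t >= 1.0:
        return 0.0, 0.0
    s = t ** 3 * (10.0 - 15.0 * t + 6.0 * t * t)   # smooth step from 0 to 1
    ds = 30.0 * t * t * (1.0 - t) ** 2             # derivative of s with respect to t
    return 1.0 - s, -ds / h


def h1_norm_eta_u(h, m=6000):
    """Midpoint-rule approximation of the H^1(0, 3h) norm of eta_h * u
    for u(x) = sin(x); eta_h * u vanishes for x >= 3h."""
    dx = 3.0 * h / m
    total = 0.0
    for i in range(m):
        x = (i + 0.5) * dx
        e, de = eta(x, h)
        w = e * math.sin(x)
        dw = de * math.sin(x) + e * math.cos(x)
        total += (w * w + dw * dw) * dx
    return math.sqrt(total)


# A factor 4 in h should give a factor close to sqrt(4) = 2 in the norm.
ratio = h1_norm_eta_u(0.01) / h1_norm_eta_u(0.0025)
```

The dominant contribution comes from the gradient term on a strip of width proportional to $h$, where $|\nabla u|$ stays bounded away from zero; this is the same mechanism as in (20) and (21).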
The convergence rate given by the previous proposition is only of order $\sqrt{h}$. The numerical experiments of section 7 show that this result, based on the classical formulation, is optimal, in general. The aim of the next section is to propose a stabilization technique to overcome this limitation.

5. A stabilized formulation. In this section we adapt a stabilization technique presented by Barbosa and Hughes in [2, 3] in order to recover an optimal rate of convergence. Note that the link between this stabilization technique and Nitsche's earlier method [20] has been established in [24]. Moreover, it has been recently applied to interface problems with nonmatching meshes in [4] and to elastostatic contact problems in [14]. We present its symmetric version although the nonsymmetric one can be considered in the same way. This technique is based on the addition of a supplementary term involving the normal derivative on $\Gamma_D$. In fact, we need a slightly more general definition. Let us suppose that we have at our disposal an operator
\[ R^h : V^h \longrightarrow L^2(\Gamma_D), \]
which approximates the normal derivative on $\Gamma_D$ (i.e., for $v^h \in V^h$ converging to a sufficiently smooth function $v$, $R^h(v^h)$ tends to $\partial_n v$ in an appropriate sense). Several
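choices of $R^h$ will be proposed later. We suppose that the following estimate holds for this operator:
\[ h^{1/2} \|R^h(v^h)\|_{0,\Gamma_D} \le C \|\nabla v^h\|_{0,\Omega} \quad \forall v^h \in V^h, \ \forall h > 0. \tag{27} \]
To obtain the stabilized problem we replace the Lagrangian (7) by the following one:
\[ L_h(v^h, \mu^h) = L(v^h, \mu^h) - \frac{\gamma}{2} \int_{\Gamma_D} \big(\mu^h + R^h(v^h)\big)^2 \, d\Gamma, \quad v^h \in V^h, \ \mu^h \in W^h, \]
where for the sake of simplicity $\gamma := h\gamma_0$ is chosen to be a positive constant over $\Omega$ (for nonuniform meshes, an element dependent parameter $\gamma = h_T \gamma_0$ is a better choice). The corresponding discrete problem reads as follows:
\[ \begin{cases} \text{Find } u^h \in V^h \text{ and } \lambda^h \in W^h \text{ such that} \\ a(u^h, v^h) + \displaystyle\int_{\Gamma_D} \lambda^h v^h \, d\Gamma - \gamma \int_{\Gamma_D} \big(\lambda^h + R^h(u^h)\big) R^h(v^h) \, d\Gamma = l(v^h) \quad \forall v^h \in V^h, \\ \displaystyle\int_{\Gamma_D} \mu^h u^h \, d\Gamma - \gamma \int_{\Gamma_D} \big(\lambda^h + R^h(u^h)\big) \mu^h \, d\Gamma = 0 \quad \forall \mu^h \in W^h. \end{cases} \tag{28} \]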
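The discrete problem (28) is the stationarity condition of the stabilized Lagrangian. As a short worked check, assuming that $R^h$ is linear, that $a(\cdot,\cdot)$ is symmetric, and that the duality pairing reduces to the $L^2(\Gamma_D)$ inner product on the discrete spaces, the two directional derivatives of $L_h$ at $(u^h, \lambda^h)$ read:

```latex
\begin{align*}
\frac{d}{dt} L_h(u^h + t v^h, \lambda^h)\Big|_{t=0}
  &= a(u^h, v^h) + \int_{\Gamma_D} \lambda^h v^h \, d\Gamma - l(v^h)
     - \gamma \int_{\Gamma_D} \big(\lambda^h + R^h(u^h)\big) R^h(v^h) \, d\Gamma, \\
\frac{d}{dt} L_h(u^h, \lambda^h + t \mu^h)\Big|_{t=0}
  &= \int_{\Gamma_D} \mu^h u^h \, d\Gamma
     - \gamma \int_{\Gamma_D} \big(\lambda^h + R^h(u^h)\big) \mu^h \, d\Gamma .
\end{align*}
```

Setting both expressions to zero for all $v^h \in V^h$ and all $\mu^h \in W^h$ gives exactly the two equations of (28). The factor $\gamma/2$ in $L_h$ cancels against the derivative of the square, which is why $\gamma$ appears without the $1/2$ in the discrete problem.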
As in [2], let us define the form $B_h : (V^h \times W^h)^2 \longrightarrow \mathbb{R}$ by
\[ B_h(u^h, \lambda^h; v^h, \mu^h) := a(u^h, v^h) + \int_{\Gamma_D} \lambda^h v^h \, d\Gamma + \int_{\Gamma_D} \mu^h u^h \, d\Gamma - \gamma \int_{\Gamma_D} \big(\lambda^h + R^h(u^h)\big)\big(\mu^h + R^h(v^h)\big) \, d\Gamma. \]
Then, (28) is equivalent to
\[ \begin{cases} \text{Find } u^h \in V^h \text{ and } \lambda^h \in W^h \text{ such that} \\ B_h(u^h, \lambda^h; v^h, \mu^h) = l(v^h) \quad \forall (v^h, \mu^h) \in V^h \times W^h. \end{cases} \tag{29} \]
Moreover, this formulation is consistent in the sense that the solution $(u, \lambda)$ to problem (5) satisfies
\[ \bar B_h(u, \lambda; v^h, \mu^h) = l(v^h) \quad \forall v^h \in V^h, \ \forall \mu^h \in W^h, \tag{30} \]
provided that $\lambda \in L^2(\Gamma_D)$, with $\bar B_h$ having the same definition as $B_h$ but replacing $R^h(u)$ by $\partial_n u$. The following hypothesis on the approximation property of $W^h$ will be needed to get an abstract result. Let $P^h : L^2(\Gamma_D) \longrightarrow W^h$ be the $L^2$-projection on $W^h$. We suppose that there exists a constant $C > 0$ independent of $h$ such that
\[ \|P^h v - v\|_{0,\Gamma_D} \le C h^{1/2} \|v\|_{1/2,\Gamma_D} \quad \forall v \in H^{1/2}(\Gamma_D). \tag{31} \]
This allows one to establish the following “inf-sup” property of $B_h$.
Lemma 3. Let hypotheses (11), (27), and (31) be satisfied. Then for $\gamma_0 > 0$ sufficiently small there exists a constant $C > 0$ independent of $h$ such that
\[ \sup_{(0,0) \ne (z^h, \eta^h) \in V^h \times W^h} \frac{B_h(v^h, \mu^h; z^h, \eta^h)}{|||(z^h, \eta^h)|||} \ge C \, |||(v^h, \mu^h)|||, \tag{32} \]
where $|||(z^h, \eta^h)|||^2 := \|z^h\|_V^2 + h^{-1}\|z^h\|_{0,\Gamma_D}^2 + h\|\eta^h\|_{0,\Gamma_D}^2$.
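Proof. The proof is an adaptation of the one in [24], Lemma 5. First of all, for $(v^h, \mu^h) \in V^h \times W^h$ arbitrary, $\gamma_0 > 0$ sufficiently small, and from (27) one has
\[ B_h(v^h, \mu^h; v^h, -\mu^h) = \|\nabla v^h\|_{0,\Omega}^2 + \gamma_0 h \|\mu^h\|_{0,\Gamma_D}^2 - \gamma_0 h \|R^h(v^h)\|_{0,\Gamma_D}^2 \ge C \big( \|\nabla v^h\|_{0,\Omega}^2 + h\|\mu^h\|_{0,\Gamma_D}^2 \big). \tag{33} \]
Next, from (27) and the Young inequality we get for $\bar\mu^h := h^{-1} P^h v^h$:
\begin{align*}
B_h(v^h, \mu^h; 0, \bar\mu^h) &= \int_{\Gamma_D} \bar\mu^h v^h \, d\Gamma - \gamma \int_{\Gamma_D} \big(\mu^h + R^h(v^h)\big) \bar\mu^h \, d\Gamma \\
&\ge h^{-1} \|P^h v^h\|_{0,\Gamma_D}^2 - C \big( \|\nabla v^h\|_{0,\Omega} + h^{1/2} \|\mu^h\|_{0,\Gamma_D} \big) h^{-1/2} \|P^h v^h\|_{0,\Gamma_D} \\
&\ge h^{-1} \|P^h v^h\|_{0,\Gamma_D}^2 - \frac{h^{-1}}{2} \|P^h v^h\|_{0,\Gamma_D}^2 - \frac{C^2}{2} \big( \|\nabla v^h\|_{0,\Omega} + h^{1/2} \|\mu^h\|_{0,\Gamma_D} \big)^2 \\
&\ge \frac{h^{-1}}{2} \|P^h v^h\|_{0,\Gamma_D}^2 - C \big( \|\nabla v^h\|_{0,\Omega}^2 + h \|\mu^h\|_{0,\Gamma_D}^2 \big). \tag{34}
\end{align*}
We now take $(z^h, \eta^h) = (v^h, -\mu^h + \delta\bar\mu^h)$ in (32) with $\delta > 0$. Using (33), (34), and $\delta$ sufficiently small one has
\[ B_h(v^h, \mu^h; z^h, \eta^h) = B_h(v^h, \mu^h; v^h, -\mu^h) + \delta B_h(v^h, \mu^h; 0, \bar\mu^h) \ge C \big( \|\nabla v^h\|_{0,\Omega}^2 + h^{-1} \|P^h v^h\|_{0,\Gamma_D}^2 + h \|\mu^h\|_{0,\Gamma_D}^2 \big). \tag{35} \]
Since $\{1\} \subset W^h$, then for the $L^2$-projection of $v^h$ on $\{1\}$ we obtain
\[ \|P^h v^h\|_{0,\Gamma_D}^2 \ge \int_{\Gamma_D} \Big( \frac{1}{|\Gamma_D|} \int_{\Gamma_D} v^h \, d\Gamma \Big)^2 d\Gamma = \frac{1}{|\Gamma_D|} \Big( \int_{\Gamma_D} v^h \, d\Gamma \Big)^2. \tag{36} \]
Let $\beta > 0$ be sufficiently small. Then it holds:
\begin{align*}
\|\nabla v^h\|_{0,\Omega}^2 + h^{-1} \|P^h v^h\|_{0,\Gamma_D}^2 &= \|\nabla v^h\|_{0,\Omega}^2 + (1 - \beta) h^{-1} \|P^h v^h\|_{0,\Gamma_D}^2 + \beta h^{-1} \|P^h v^h - v^h + v^h\|_{0,\Gamma_D}^2 \tag{37} \\
&\ge \|\nabla v^h\|_{0,\Omega}^2 + (1 - \beta) \frac{1}{|\Gamma_D| \operatorname{diam}(\Omega)} \Big( \int_{\Gamma_D} v^h \, d\Gamma \Big)^2 + \beta h^{-1} \big( \|v^h\|_{0,\Gamma_D}^2 - \|P^h v^h - v^h\|_{0,\Gamma_D}^2 \big) \tag{38} \\
&\ge C \|v^h\|_V^2 + \beta h^{-1} \big( \|v^h\|_{0,\Gamma_D}^2 - C h \|v^h\|_{1/2,\Gamma_D}^2 \big) \\
&\ge C \big( \|v^h\|_V^2 + h^{-1} \|v^h\|_{0,\Gamma_D}^2 \big), \tag{39}
\end{align*}
where we used (31), the fact that $\big( \|\nabla v^h\|_{0,\Omega}^2 + (1 - \beta) \frac{1}{|\Gamma_D| \operatorname{diam}(\Omega)} ( \int_{\Gamma_D} v^h \, d\Gamma )^2 \big)^{1/2}$ is an equivalent norm on $V$, and the trace theorem. Finally, one obtains (32) combining (35) and (39) together with the fact that $|||(z^h, \eta^h)||| \le C \, |||(v^h, \mu^h)|||$.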
Remark 2. The inf-sup condition straightforwardly ensures the existence and uniqueness of a solution to the discrete problem (28) for $\gamma_0 > 0$ sufficiently small.
Now, we can prove the following abstract error estimate.
Theorem 1. Let (11), (27), and (31) be satisfied and $\gamma_0 > 0$ be sufficiently small. If $(u, \lambda)$ is the solution to Problem (5) such that $\lambda \in L^2(\Gamma_D)$, then there exists a constant $C > 0$ independent of $h$ and $(u, \lambda)$ such that the following estimate holds:
\[ |||(u - u^h, \lambda - \lambda^h)||| \le C \inf_{v^h \in V^h,\, \mu^h \in W^h} \Big( |||(u - v^h, \lambda - \mu^h)||| + h^{1/2} \|R^h(v^h) - \partial_n u\|_{0,\Gamma_D} \Big). \]
Proof. From (30) it follows that
\[ \bar B_h(u, \lambda; z^h, \eta^h) = B_h(u^h, \lambda^h; z^h, \eta^h) \quad \forall (z^h, \eta^h) \in V^h \times W^h. \]
Thus, for any $(v^h, \mu^h) \in V^h \times W^h$ one has
\[ B_h(v^h, \mu^h; z^h, \eta^h) - \bar B_h(u, \lambda; z^h, \eta^h) = B_h(v^h - u^h, \mu^h - \lambda^h; z^h, \eta^h) \quad \forall (z^h, \eta^h) \in V^h \times W^h. \]
A direct computation leads to
\[ B_h(v^h, \mu^h; z^h, \eta^h) - \bar B_h(u, \lambda; z^h, \eta^h) \le C \Big( |||(u - v^h, \lambda - \mu^h)||| + h^{1/2} \|R^h(v^h) - \partial_n u\|_{0,\Gamma_D} \Big) |||(z^h, \eta^h)|||. \]
Further,
\begin{align*}
|||(u - u^h, \lambda - \lambda^h)||| &\le |||(u - v^h, \lambda - \mu^h)||| + |||(v^h - u^h, \mu^h - \lambda^h)||| \\
&\le |||(u - v^h, \lambda - \mu^h)||| + C \sup_{(0,0) \ne (z^h, \eta^h) \in V^h \times W^h} \frac{B_h(v^h - u^h, \mu^h - \lambda^h; z^h, \eta^h)}{|||(z^h, \eta^h)|||} \\
&\le C \Big( |||(u - v^h, \lambda - \mu^h)||| + h^{1/2} \|R^h(v^h) - \partial_n u\|_{0,\Gamma_D} \Big)
\end{align*}
holds for any $(v^h, \mu^h) \in V^h \times W^h$.
In the rest of this section we show how to use the abstract result of Theorem 1 to establish an optimal a priori error estimate for the following standard finite element spaces:
\[ \widetilde V^h = \big\{ v^h \in C(\overline{\tilde\Omega}) : v^h|_T \in P_{k_u}(T) \ \forall T \in \mathcal{T}^h \big\}, \quad k_u \ge 1, \tag{40} \]
\[ \widetilde W^h = \big\{ \mu^h \in L^2(\tilde\Omega) : \mu^h|_T \in P_{k_\lambda}(T) \ \forall T \in \mathcal{T}^h \big\}, \quad k_\lambda \ge 0. \tag{41} \]
In order to estimate the boundary terms, we shall need the following classical estimate which is satisfied for any $T \in \mathcal{T}^h$ and any $w \in H^1(T)$ provided that $\Gamma_D$ is smooth enough (see Appendix A for the proof):
\[ \|w\|_{0,\Gamma_D \cap T}^2 \le C \big( h_T^{-1} \|w\|_{0,T}^2 + h_T \|w\|_{1,T}^2 \big). \tag{42} \]
Let $k = \min(k_u, k_\lambda + 1)$ and consider two continuous extension operators:
\[ T_u^k : H^{k+1}(\Omega) \longrightarrow H^{k+1}(\tilde\Omega), \qquad T_\lambda^k : H^{k-1/2}(\Gamma_D) \longrightarrow H^k(\tilde\Omega), \]
where $H^{k-1/2}(\Gamma_D)$ stands for the space of traces on $\Gamma_D$ of functions from $H^k(\Omega)$. Due to Calderón's extension theorem, it is always possible to build such operators provided that the domain $\Omega$ has the uniform cone property (see [1], for instance). This allows us to define the following interpolation operators on $\widetilde V^h$ and $\widetilde W^h$:
\[ \tilde\Pi_u^{k,h}(v) := \Pi^{k,h}\big(T_u^k(v)\big) \quad \forall v \in H^{k+1}(\Omega), \qquad \tilde\Pi_\lambda^{k,h}(\mu) := \Pi^{k-1,h}\big(T_\lambda^k(\mu)\big) \quad \forall \mu \in H^{k-1/2}(\Gamma_D), \]
where $\Pi^{k,h}$ stands for the standard Lagrange interpolation operator by piecewise polynomial functions of degree less than or equal to $k$ defined on the mesh $\mathcal{T}^h$. An exception has to be made for $k = 1$, when the Lagrange interpolation operator will be replaced by Clément's one for the interpolation of the multiplier since functions from $H^1(\tilde\Omega)$ are not generally continuous (see [8]). Due to the known approximation properties of these operators on regular families of meshes (see [7] and [8]), one has for any $v \in H^{k+1}(\Omega)$:
\[ \|\tilde\Pi_u^{k,h}(v) - v\|_V \le \|\tilde\Pi_u^{k,h}(v) - T_u^k(v)\|_{1,\tilde\Omega} \le C h^k \|T_u^k(v)\|_{k+1,\tilde\Omega} \le C h^k \|v\|_{k+1,\Omega}, \]
and for any $\mu \in H^{k-1/2}(\Gamma_D)$ taking into account (42):
\begin{align*}
\|\tilde\Pi_\lambda^{k,h}(\mu) - \mu\|_{0,\Gamma_D}^2 &\le C \sum_{T \in \mathcal{T}^h} \Big( h^{-1} \|\tilde\Pi_\lambda^{k,h}(\mu) - T_\lambda^k(\mu)\|_{0,T}^2 + h \|\tilde\Pi_\lambda^{k,h}(\mu) - T_\lambda^k(\mu)\|_{1,T}^2 \Big) \\
&\le C h^{2k-1} \|T_\lambda^k(\mu)\|_{k,\tilde\Omega}^2 \le C h^{2k-1} \|\mu\|_{k-1/2,\Gamma_D}^2.
\end{align*}
In the same way one can derive the estimate of $\|\tilde\Pi_u^{k,h}(v) - v\|_{0,\Gamma_D}$ for $v \in H^{k+1}(\Omega)$ and also obtain the estimate (31) (using Clément's interpolation operator). Thus, an a priori error estimate can be derived provided that the following approximation property of $R^h$ holds:
\[ \big\| R^h\big(\tilde\Pi_u^{k,h}(v)\big) - \partial_n v \big\|_{0,\Gamma_D} \le C h^{k-1/2} \|v\|_{k+1,\Omega}. \tag{43} \]
Theorem 2. Let $\widetilde V^h$ and $\widetilde W^h$ be defined by (40) and (41), respectively. Let $(u, \lambda)$ be the solution to Problem (5) such that $u \in H^{k+1}(\Omega)$ and $\lambda \in H^{k-1/2}(\Gamma_D)$ for $k = \min\{k_u, k_\lambda + 1\}$. Assume that (27) and (43) are satisfied. Then the following estimate holds:
\[ |||(u - u^h, \lambda - \lambda^h)||| \le C h^k \|u\|_{k+1,\Omega}, \]
where $(u^h, \lambda^h)$ is the solution to Problem (28).
Remark 3. Note that for $k_\lambda \ge 1$ the use of $\widetilde W^h \cap C(\overline{\tilde\Omega})$ instead of $\widetilde W^h$ does not change the result. Note also that the definition of the norm $|||(u - u^h, \lambda - \lambda^h)|||$ involves
a standard error estimate for u − uh V . However, it does not provide an estimate of λ − λh −1/2,ΓD but the one of h1/2 λ − λh 0,ΓD . An additional optimal estimate of u − uh 0,ΓD is also available without supplementary regularity assumptions. This is due to the use of the Pitk¨ aranta technique [21]. Error estimates with natural norms instead of mesh dependent norms are also possible for the stabilized problem (see [3]). 5.1. Case Rh (v h ) = ∂n v h and an additional condition on the mesh. A natural choice for the operator Rh is of course Rh v h = ∂n v h on ΓD , which corresponds to the original method of Barbosa and Hughes. In this case, unfortunately, the stability condition (27) is verified only under an additional regularity assumption on the intersection of the mesh with Ω. We denote by Tˆ a reference element such that T = τT (Tˆ ) for all T ∈ T h , where τT is a regular affine transformation in Rd . The assumption on the mesh can be expressed as follows (see [21] for a similar one):
(44) There exists a radius $\hat\rho > 0$ independent of $h$ such that for any $T \in \mathcal T^h$ with $T \cap \Omega \ne \emptyset$, the reference element $\hat T$ contains a ball $B(\hat y_T, \hat\rho)$ which satisfies $B(\hat y_T, \hat\rho) \subset \tau_T^{-1}(T \cap \Omega)$.
Under this assumption, inequality (27) is satisfied for $\tilde V^h$ defined by (40) (see the proof in Appendix B). Moreover, the following lemma says that (43) is also satisfied.
Lemma 4. Let $\tilde V^h$ be defined by (40), $R^h(v^h) = \partial_n v^h$ on $\Gamma_D$ and assume that (44) is satisfied. Then (43) is satisfied as well.
Proof. Recall that $k = \min(k_u, k_\lambda + 1)$. Using (42) and standard interpolation error estimates one has for any $v \in H^{k+1}(\Omega)$:
$$\begin{aligned}
\big\|R^h\big(\tilde\Pi^{k,h}_u(v)\big) - \partial_n v\big\|^2_{0,\Gamma_D}
&\le \sum_{T \in \mathcal T^h} \big\|\nabla\tilde\Pi^{k,h}_u(v) - \nabla v\big\|^2_{0,\Gamma_D \cap T} \\
&\le C \sum_{T \in \mathcal T^h} \Big( h^{-1}\big\|\nabla\tilde\Pi^{k,h}_u(v) - \nabla T^k_u(v)\big\|^2_{0,T} + h\,\big\|\nabla\tilde\Pi^{k,h}_u(v) - \nabla T^k_u(v)\big\|^2_{1,T} \Big) \\
&\le C \sum_{T \in \mathcal T^h} \Big( h^{-1}\big(h^k \|T^k_u(v)\|_{k+1,T}\big)^2 + h\,\big(h^{k-1} \|T^k_u(v)\|_{k+1,T}\big)^2 \Big) \\
&\le C h^{2k-1}\,\|v\|^2_{k+1,\Omega}.
\end{aligned}$$
We can deduce that if $R^h(v^h) = \partial_n v^h$ on $\Gamma_D$, the estimate of Theorem 2 holds provided that (44) is satisfied. This assumption, however, restricts the use of our fictitious domain approach. Indeed if, for instance, one wants to approximate an evolving boundary, the intersection of elements with the real domain will be arbitrary. The aim of the next section is to introduce an operator $R^h$ with a reinforced stability, enabling us to work with an arbitrary domain.
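To see why an assumption such as (44) cannot simply be dropped, the following minimal sketch looks at a discrete trace inequality of the form $\int_{\Gamma_D \cap T} (v^h)^2\,d\Gamma \le C h_T^{-1} \int_{\Omega \cap T} (v^h)^2\,dx$ (the form established in Lemma 7 of Appendix B). The configuration is hypothetical and not taken from the paper: a square element $T = ]0,h[^2$ whose intersection with $\Omega$ is the strip $\{0 < y < \varepsilon h\}$, with $\Gamma_D \cap T = \{y = \varepsilon h\}$ and the test polynomial $v(x,y) = y^p$. The function name is an illustrative assumption, and the computed value is only a lower bound for the best constant $C$.

```python
def trace_constant_lower_bound(eps, h, p=0):
    """Lower bound for C in  int_{Gamma_D ∩ T} v^2 <= (C / h) * int_{Omega ∩ T} v^2.

    Hypothetical setting (an illustration, not the paper's test case):
    T = ]0,h[^2, Omega ∩ T = {0 < y < eps*h}, Gamma_D ∩ T = {y = eps*h},
    and the test polynomial v(x, y) = y**p.
    """
    if not (0.0 < eps <= 1.0) or h <= 0.0 or p < 0:
        raise ValueError("need 0 < eps <= 1, h > 0 and p >= 0")
    t = eps * h                                         # thickness of the strip
    boundary_integral = h * t ** (2 * p)                # v^2 is constant on y = t
    volume_integral = h * t ** (2 * p + 1) / (2 * p + 1)
    return h * boundary_integral / volume_integral      # equals (2p + 1) / eps


# The bound does not depend on h, only on the relative thickness eps.
for eps in (0.5, 0.1, 0.01):
    print(eps, trace_constant_lower_bound(eps, h=0.05))
```

In this toy setting the bound equals $(2p+1)/\varepsilon$, so it grows without limit as the strip becomes thin; this is the kind of degeneracy that (44) excludes.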
Fig. 3. The choice of $T'$ for an element $T$ having a small intersection with $\Omega$ (the figure shows a "good" element $T'$ and a "bad" element $T$ cut by $\Gamma_D$). In this case, it is more stable to evaluate the normal derivative from a natural extension of $v^h$ from $T'$ to $T$: the smaller the thickness of this intersection, the poorer the approximation of the normal derivative on $T \cap \partial\Omega$ obtained using $v^h|_T$.
Fig. 4. Prolongation of $T'$.
5.2. Operator $R^h$ with a reinforced stability. We give here an example of how to construct an operator $R^h$ ensuring both the approximation property (43) and the stability property (27) for an arbitrary intersection of the mesh $\mathcal T^h$ with the domain $\Omega$. The proposed construction is only local and quite simple to implement. Let $\hat\rho > 0$ be an a priori given small radius ($\hat\rho \ll 1$). For each element $T \in \mathcal T^h$ such that $T \cap \Omega \ne \emptyset$, we will designate by $T'$ either the element $T$ itself if there is a ball $B(\hat y_T, \hat\rho) \subset \tau_T^{-1}(T \cap \Omega)$ (a "good" element) or any neighbor element possessing this property if $T$ itself does not satisfy it ($T$ is a "bad" element). The proposed operator $R^h$ will simply be equal to $\partial_n v^h_{T',T}$, where $v^h_{T',T}$ is either $v^h|_T$ if $T' = T$ or the natural extension of $v^h|_{T'}$ onto $T$ if $T' \ne T$. Of course, $\hat\rho > 0$ has to be sufficiently small such that $T'$ always exists, which is not a big constraint. It is not difficult to see that the stability condition (27) is satisfied with such a choice of the operator $R^h$ (see Appendix C for the sketch of the proof). The following lemma establishes that (43) is also satisfied so that the estimate of Theorem 2 holds again.
Lemma 5. Let $\tilde V^h$ be defined by (40), and $R^h(v^h) := \partial_n v^h_{T',T}$ on $\Gamma_D$. Then (43) is satisfied.
Proof. Suppose that $T$ is a "bad" element, i.e., $T \cap \Omega$ is "thin", and let $T'$ be a "good" neighbor element as described above (see also Figure 3). We prolong $T'$ and construct the new element $T_d$ as shown in Figure 4. The interpolation on $T_d$ is defined by the interpolation on $T'$. More precisely: let $v \in H^{k+1}_{\mathrm{loc}}(\mathbb R^d)$ and $v_{T'} := v|_{T'}$.
By $\Pi_{T'} v_{T'}$ we denote the $P_k$-Lagrange interpolant of $v_{T'}$ constructed on $T'$ (i.e., using degrees of freedom in $T'$) but with the domain of definition being the whole $\mathbb R^d$. The interpolation of $v$ on $T_d$ is defined as
$$\Pi_{T_d} v := \Pi_{T'} v_{T'}\big|_{T_d}.$$
Classical arguments based on the fact that $v - \Pi_{T_d} v$ vanishes for all polynomials of degree less than or equal to $k$ lead to the following approximation property (see [6], for instance):
$$\|v - \Pi_{T_d} v\|_{m,T_d} \le C h_{T_d}^{k+1-m}\,\|v\|_{k+1,T_d}, \qquad h_{T_d} \le 2 h_T.$$
Analogously to Lemma 6 (see Appendix A) it holds:
$$\|v\|^2_{0,\Gamma_D \cap T_d} \le C\big(h_{T_d}^{-1}\|v\|^2_{0,T_d} + h_{T_d}\|v\|^2_{1,T_d}\big).$$
To get (43) we proceed as in Lemma 4; we only have to sort all elements into "good" and "bad" ones and to use either $\Pi_T$ or $\Pi_{T_d}$.
6. Some practical details for implementation. The implementation of the proposed method requires one to overcome a certain number of difficulties. First of all, one has to select bases of the spaces $V^h$ and $W^h$ from the ones of $\tilde V^h$ and $\tilde W^h$. As far as $V^h$ is concerned, the task is rather easy because it suffices to select the basis functions among the ones of $\tilde V^h$ which are not identically equal to zero in $\Omega$ (one can possibly remove those for which the intersection of their support with $\Omega$ is too small). It is a little more difficult to find a basis of the space $W^h$. Indeed, the traces on $\Gamma_D$ of basis functions of $\tilde W^h$ may be linearly dependent, especially if $\Gamma_D$ is rectilinear. A possible way to overcome this difficulty is to eliminate the redundant functions by analyzing the elementary mass matrices whose components are $\int_{\Gamma_D \cap T} \psi_i \psi_j\, d\Gamma$, where $\{\psi_i\}$ are the shape functions of $\tilde W^h$.
Another difficulty concerns the numerical integration: one needs to build integration formulas on the intersection of elements with the domain $\Omega$ as well as on the intersection of elements with $\Gamma_D$. Our finite element library, Getfem++ [22], splits elements into simplices in a conformal way with respect to $\partial\Omega$ and then applies a standard integration formula on each subelement. If $\partial\Omega$ is curved, then some curved subelements can be used. One obtains an integration formula on $\Gamma_D$ by considering the faces of the subelements lying on $\Gamma_D$. The natural extension of functions on "bad" elements which is needed to obtain the fully stabilized method described in section 5.2 consists in seeking information in a "good" nearby element. This can be a handicap for certain finite element codes where calculations are done only elementwise.
A possible remedy is to precompute a global discrete extension operator which gives the solution extended to “bad” elements from the original one. Then, the matrices involving Rh (vh ) are obtained as a composition of classical matrices with this discrete extension operator. The Xfem method is often associated with the use of some level-sets of functions defined on the mesh. This is particularly useful when, for instance, one needs to represent an evolving interface. In our case such a level-set can be utilized to represent the boundary of Ω. The implementation in Getfem++ uses this strategy. Generally, this involves an additional approximation of Ω. In our numerical tests presented in the next section, the level-set functions are piecewise second degree polynomials. In this case the level-set approach has no influence on the rate of convergence of the used finite element methods which are of the first and second order.
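The selection of a "good" nearby element described in sections 5.2 and 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the Getfem++ implementation: the function name, the dictionaries `cut_fraction` and `neighbors`, and the tie-breaking rule are hypothetical, and the ball condition of section 5.2 is replaced by an area-fraction test with the one percent threshold reported in section 7.3.

```python
def choose_evaluation_elements(cut_fraction, neighbors, threshold=0.01):
    """Map each element meeting the domain to the element T' used to evaluate R^h.

    cut_fraction[e] : |e ∩ Omega| / |e| (hypothetical input, e.g. obtained
                      from the integration method on cut elements)
    neighbors[e]    : iterable of elements adjacent to e
    """
    chosen = {}
    for elem, fraction in cut_fraction.items():
        if fraction <= 0.0:
            continue                      # element does not meet Omega
        if fraction >= threshold:
            chosen[elem] = elem           # "good" element: T' = T
            continue
        good = [n for n in neighbors.get(elem, ())
                if cut_fraction.get(n, 0.0) >= threshold]
        if not good:
            raise ValueError(f"no good neighbor for element {elem!r}")
        # Any good neighbor is admissible; taking the largest cut fraction
        # is an arbitrary tie-breaking choice made for this sketch.
        chosen[elem] = max(good, key=lambda n: cut_fraction[n])
    return chosen


# Toy data: element 1 is "bad" (0.5% inside Omega), element 3 lies outside Omega.
fractions = {0: 1.0, 1: 0.005, 2: 0.4, 3: 0.0}
adjacency = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
selection = choose_evaluation_elements(fractions, adjacency)
```

A map of this kind is one possible input for precomputing a global discrete extension operator, since it records for every "bad" element where the extended polynomial comes from.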
Fig. 5. Test domain and a triangular structured mesh (the boundary parts $\Gamma_N$ and $\Gamma_D$ are marked).
Fig. 6. Exact solution.
Fig. 7. Approximated solution on a rough mesh with the P2 /P1 method. Only the elements intersecting Ω are depicted. The black curve is the boundary of Ω and the white curve is the zero level-set of the approximated solution.
7. Numerical experiments. In this section, we present 2D numerical tests. The fictitious domain is $\tilde\Omega = ]-1/2, 1/2[^2$. The exact solution is
$$u(x) = R^4 - |x|^4\,\frac{5 + 3\sin\!\big(7\theta + \tfrac{7\pi}{36}\big)}{2},$$
where $R = 0.47$ and $\theta(x) = \arctan(x_2/x_1)$. The real domain is $\Omega = \{x \in \mathbb R^2 : u(x) < 0\}$, and the Dirichlet and Neumann boundary conditions are defined on $\Gamma_D = \Gamma \cap \{x \in \mathbb R^2 : x_2 < 0\}$ and $\Gamma_N = \Gamma \cap \{x \in \mathbb R^2 : x_2 > 0\}$. The domain $\Omega$ is represented in Figure 5 with an example of a triangular structured mesh. The exact solution is shown in Figure 6 while a computed solution on a rough mesh is depicted in Figure 7.
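The test geometry can be explored with a short script. The sketch below evaluates the exact solution and the radius of its zero level set along a ray. The function names are illustrative, and $\theta$ is taken to be the polar angle computed with `atan2`, which is an assumption made here so that the formula is continuous across $x_1 = 0$.

```python
import math

R = 0.47
PHASE = 7.0 * math.pi / 36.0


def angular_factor(theta):
    """(5 + 3 sin(7 theta + 7 pi / 36)) / 2, which always lies in [1, 4]."""
    return (5.0 + 3.0 * math.sin(7.0 * theta + PHASE)) / 2.0


def exact_solution(x1, x2):
    """u(x) = R^4 - |x|^4 (5 + 3 sin(7 theta + 7 pi / 36)) / 2."""
    theta = math.atan2(x2, x1)           # assumption: theta is the polar angle
    r4 = (x1 * x1 + x2 * x2) ** 2        # |x|^4
    return R ** 4 - r4 * angular_factor(theta)


def boundary_radius(theta):
    """Radius r(theta) at which u vanishes along the ray of angle theta."""
    return R * angular_factor(theta) ** (-0.25)


def boundary_part(theta):
    """'D' on the part of the zero level set with x2 < 0, 'N' where x2 > 0."""
    x2 = boundary_radius(theta) * math.sin(theta)
    if x2 < 0.0:
        return "D"
    if x2 > 0.0:
        return "N"
    return "junction"


# The zero level set stays inside the square ]-1/2, 1/2[^2 because r <= R < 1/2.
radii = [boundary_radius(0.1 * k) for k in range(-31, 32)]
print(min(radii), max(radii))
```

With $u$ as written, $u(0) = R^4 > 0$ and the zero level set is the star-shaped curve $r = R\,\big((5 + 3\sin(7\theta + 7\pi/36))/2\big)^{-1/4}$, whose radius ranges between $R/\sqrt{2}$ and $R$.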
7.1. Without stabilization. First, we present numerical tests without any stabilization. We tested several choices of the finite element spaces $\tilde V^h$ and $\tilde W^h$. In order to avoid locking phenomena, the couple of selected finite element spaces should satisfy as far as possible a discrete mesh-independent inf-sup condition, since the stabilization is not used. For instance, it is known that the P1/P0 method for the discretization of $u$, $\lambda$, respectively, does not satisfy such a condition. The linear system to be solved is of the form
$$\begin{pmatrix} K & B^T \\ B & 0 \end{pmatrix} \begin{pmatrix} U \\ \Lambda \end{pmatrix} = \begin{pmatrix} L \\ 0 \end{pmatrix}, \tag{45}$$
where $U$ and $\Lambda$ are the degrees of freedom of $u^h$ and $\lambda^h$, respectively, and the components of $K$, $B$, and $L$ are
$$K_{ij} = \int_\Omega \nabla\varphi_i \cdot \nabla\varphi_j\, d\Omega, \qquad B_{ij} = \int_{\Gamma_D} \psi_i \varphi_j\, d\Gamma, \qquad L_i = \int_\Omega f \varphi_i\, d\Omega + \int_{\Gamma_N} g \varphi_i\, d\Gamma,$$
with $\{\varphi_i\}$, $\{\psi_j\}$ being the selected basis functions of $\tilde V^h$, $\tilde W^h$, respectively. In our experiments, this system is solved using the library SuperLU [9] (a direct LU solver for sparse matrices). The test program can be downloaded from the Getfem++ web site [22]. It allows one to test many other couples of elements and also to treat 3D problems. The couples of spaces tested are the following: P1/P0, P1+/P0 (a standard continuous P1 element for $u$ enriched by a cubic bubble function and a standard P0 element for the multiplier), Q1/Q0 (standard continuous Q1 and discontinuous Q0 elements on quadrilaterals), P2/P1, P2/P0, and Q2/Q1. Rates of convergence are presented in Figure 8. One can see that in all experiments the rate of convergence in the $H^1(\Omega)$-norm is better than the theoretical one given by Proposition 3, except for the P1/P0 case, which is a little slower than $h^{1/2}$. The choice P1/P0 suffers of course from the non-satisfaction of a mesh-independent inf-sup condition. It has to be stressed that in all the experiments without stabilization, and particularly for the P1/P0 case, a singular linear system can be obtained. However, in all examples presented here, we selected cases with a non-singular linear system. It is also seen that convergence of the multiplier is not generally obtained, especially for degree one methods. Figure 9 illustrates the poor quality of the multiplier for the P1/P0 method. The P2/P1 method gives slightly better results (see Figure 10), still with some oscillations in parts where the intersection of the element with the domain $\Omega$ is very small.
7.2. The stabilized method with $R^h(v^h) = \partial_n v^h$. The numerical experiments are now done using the standard Barbosa–Hughes stabilization technique (with $\gamma = 0.1$). It has been proven in section 5.1 that this method is optimal whenever the intersection of elements with the domain $\Omega$ is not too small. This is not easy to satisfy in computations.
Of course, one way to avoid small intersections would be to move some mesh nodes slightly, at least when a structured mesh is not required. We did not test this possibility. Unlike (45), the linear system to be solved is now of the form
$$\begin{pmatrix} K_\gamma & B_\gamma^T \\ B_\gamma & -M_\gamma \end{pmatrix} \begin{pmatrix} U \\ \Lambda \end{pmatrix} = \begin{pmatrix} L \\ 0 \end{pmatrix},$$
where the components of $K_\gamma$, $B_\gamma$, and $M_\gamma$ are
$$(K_\gamma)_{ij} = \int_\Omega \nabla\varphi_i \cdot \nabla\varphi_j\, d\Omega - \gamma \int_{\Gamma_D} R^h(\varphi_i) R^h(\varphi_j)\, d\Gamma,$$
$$(B_\gamma)_{ij} = \int_{\Gamma_D} \psi_i \big(\varphi_j - \gamma R^h(\varphi_j)\big)\, d\Gamma,$$
$$(M_\gamma)_{ij} = \gamma \int_{\Gamma_D} \psi_i \psi_j\, d\Gamma,$$
respectively. Note that $K_\gamma$ is invertible provided that $\gamma$ is sufficiently small. The whole matrix of the system is then invertible as well, whatever $B_\gamma$ is. Rates of convergence are presented in Figure 11 for the same couples of elements as in the previous section. The stabilization significantly improves the convergence of the P1/P0 choice (the stabilization with bubble functions is no longer necessary) and the convergence of quadratic elements. Moreover, the linear system is guaranteed to be invertible. Figure 12 shows that the approximation of the multiplier is also considerably improved. The convergence rate is improved by the stabilization, but some problems remain with too small intersections of elements with $\Omega$ even for degree two methods (see Figure 13).
Fig. 8. Rates of convergence for some couples of finite element spaces with no stabilization. Panels: $\|u - u^h\|_{0,\Omega}$, $\|u - u^h\|_{1,\Omega}$, $\|\lambda - \lambda^h\|_{0,\Gamma_D}$.
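The effect of the $-M_\gamma$ block on invertibility can be checked on a small dense example. The sketch below is a toy under loud assumptions: the matrices are made up, and the $\gamma$-dependent corrections inside $K_\gamma$ and $B_\gamma$ are ignored, so only the block structure $\begin{pmatrix} K & B^T \\ B & -\gamma M \end{pmatrix}$ is illustrated. The two rows of `B` are identical on purpose, mimicking linearly dependent multiplier traces.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0                    # no usable pivot: singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                    # a row swap flips the sign
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for j in range(col, n):
                a[row][j] -= factor * a[col][j]
    return det


def block_matrix(K, B, M, gamma):
    """Assemble [[K, B^T], [B, -gamma * M]] as a dense list of lists."""
    n_u, n_l = len(K), len(B)
    top = [K[i] + [B[j][i] for j in range(n_l)] for i in range(n_u)]
    bottom = [B[i] + [-gamma * M[i][j] for j in range(n_l)] for i in range(n_l)]
    return top + bottom


K = [[2.0, 0.0], [0.0, 2.0]]      # toy symmetric positive definite block
B = [[1.0, 1.0], [1.0, 1.0]]      # two identical (redundant) multiplier rows
M = [[1.0, 0.0], [0.0, 1.0]]      # toy multiplier mass matrix

det_plain = determinant(block_matrix(K, B, M, gamma=0.0))   # structure of (45)
det_stab = determinant(block_matrix(K, B, M, gamma=0.1))    # with the -gamma*M block
print(det_plain, det_stab)
```

In this toy the unstabilized matrix is singular, while the stabilized one is not: for symmetric positive definite $K$ and $M$ and $\gamma > 0$, the Schur complement $-(\gamma M + B K^{-1} B^T)$ is negative definite whatever the rank of $B$.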
Fig. 9. Multiplier on ΓD with no stabilization for the P1 /P0 method (h = 0.05).
Fig. 10. Multiplier on ΓD with no stabilization for the P2 /P1 method (h = 0.05).
7.3. The fully stabilized method. We now consider the fully stabilized method described in section 5.2. An element $T$ is considered to be "bad" when $|T \cap \Omega|$ is less than one percent of $|T|$. The convergence curves given in Figure 14 are nearly the same as with the standard Barbosa–Hughes stabilization used in the previous section. However, we see that the multipliers behave in a more regular way than before (see Figures 15 and 16). The difference lies only on the elements having a too small intersection with the domain.
Fig. 11. Rates of convergence for some couples of finite element spaces with the Barbosa–Hughes stabilization. Panels: $\|u - u^h\|_{0,\Omega}$, $\|u - u^h\|_{1,\Omega}$, $\|\lambda - \lambda^h\|_{0,\Gamma_D}$.
Fig. 12. Multiplier on $\Gamma_D$ with the Barbosa–Hughes stabilization for the P1/P0 method ($h = 0.05$).
Fig. 13. Multiplier on $\Gamma_D$ with the Barbosa–Hughes stabilization for the P2/P1 method ($h = 0.05$).
Fig. 14. Rates of convergence for some couples of finite element spaces with the fully stabilized method. Panels: $\|u - u^h\|_{0,\Omega}$, $\|u - u^h\|_{1,\Omega}$, $\|\lambda - \lambda^h\|_{0,\Gamma_D}$.
Fig. 15. Multiplier on $\Gamma_D$ with the fully stabilized method for the P1/P0 method ($h = 0.05$).
Fig. 16. Multiplier on $\Gamma_D$ with the fully stabilized method for the P2/P1 method ($h = 0.05$).
8. Concluding remarks. In this paper, we combined the Xfem approach with the Barbosa–Hughes stabilized formulation to get a new fictitious domain method. This method is quite simple to implement since all the variables (multipliers and primal variables) are defined on a single mesh independent of the computational domain. Moreover, it potentially allows one to treat complex boundary conditions (such as contact and friction). The fully stabilized method introduced in section 5.2 leads to a robust method in the sense that it converges whatever the intersection of the domain with the mesh is. This is not the case if the Barbosa–Hughes stabilization technique is used alone, for which the quality of the approximation of the multiplier cannot be guaranteed on the elements having a too small intersection with the domain. Note that in [21] a similar
approach is presented. However, the error estimate is given under the assumption (44) and the definition of multipliers requires the construction of a quasi-uniform family of meshes on the boundary.
Appendix A. In this appendix we prove the trace inequality (42). For a proof in a more classical framework see, for instance, [12]. The proof is done by scaling with respect to a reference element $\hat T$. We recall that for all $T \in \mathcal T^h$ one has $T = \tau_T(\hat T)$, where $\tau_T$ is an affine and invertible mapping in $\mathbb R^d$. We make the following hypotheses:
(a) $\Gamma_D$ is a Lipschitz-continuous boundary.
(b) There exists a constant $C_2 > 0$ independent of $h$ and $T \in \mathcal T^h$ such that $\|\nabla\tau_T\|_{\infty,\hat T} \le C_2 h_T$ and $\|\nabla\tau_T^{-1}\|_{\infty,T} \le C_2 h_T^{-1}$.
These two hypotheses are satisfied for regular families of meshes provided that $\Gamma_D$ is Lipschitz-continuous.
Lemma 6. Let (a) and (b) be satisfied. Then there exists a constant $C > 0$ independent of $h$ and $T \in \mathcal T^h$ such that
$$\|v\|^2_{0,\Gamma_D \cap T} \le C\big(h_T^{-1}\|v\|^2_{0,T} + h_T\|v\|^2_{1,T}\big) \qquad \forall v \in H^1(T).$$
Proof. Since $C^\infty(\bar T)$ is dense in $H^1(T)$ one can confine oneself to functions $v \in C^\infty(\bar T)$. Denoting $\hat\Gamma_D = \tau_T^{-1}(\Gamma_D \cap T)$ and $\hat n$ a unit normal vector to $\hat\Gamma_D$, one has
$$\int_{\Gamma_D \cap T} v^2\, d\Gamma = \int_{\hat\Gamma_D} \hat v^2\, |\det(\nabla\tau_T)|\, \big\|\nabla\tau_T^{-1}\hat n\big\|\, d\hat\Gamma \le C h_T^{d-1} \int_{\hat\Gamma_D} \hat v^2\, d\hat\Gamma,$$
where $\hat v = v \circ \tau_T$. Let us prove now that the following trace inequality:
$$\int_{\hat\Gamma_D} \hat v^2\, d\hat\Gamma \le C_3\, \|\hat v\|^2_{1,\hat T} \qquad \forall \hat v \in C^\infty(\hat T), \tag{46}$$
is such that the constant $C_3 > 0$ does not depend on the position of $\hat\Gamma_D$ inside of $\hat T$. This has been proved for a straight intersection in [11]. Let us consider the case of a curved $\hat\Gamma_D$. For a sufficiently small mesh parameter $h$ the curve $\hat\Gamma_D$ is a graph of a function over a segment $\hat l$ contained in $\hat T$. Without loss of generality we may assume that $\hat l$ coincides with the $\hat x$-axis (after an appropriate shift and rotation of $\hat T$). Then $\hat\Gamma_D$ can be parametrized by means of a function $(\hat x, a(\hat x))$, $\hat x \in \hat l$, and one has
$$\hat v(\hat x, a(\hat x)) = \hat v(\hat x, 0) + \int_0^{a(\hat x)} \frac{\partial \hat v}{\partial y}(\hat x, \tau)\, d\tau.$$
Thus,
$$\hat v^2(\hat x, a(\hat x)) \le C \left( \hat v^2(\hat x, 0) + \int_0^{a(\hat x)} \Big(\frac{\partial \hat v}{\partial y}(\hat x, \tau)\Big)^2 d\tau \right),$$
where $C > 0$ is an absolute constant. Integrating over $\hat l$ we obtain
$$\int_{\hat l} \hat v^2(\hat x, a(\hat x))\, d\hat x \le C \left( \int_{\hat l} \hat v^2(\hat x, 0)\, d\hat x + \int_{\hat T} \Big(\frac{\partial \hat v}{\partial y}(\hat x, \tau)\Big)^2 d\tau\, d\hat x \right) \le C\, \|\hat v\|^2_{1,\hat T},$$
using the result for the straight segment $\hat l$. Now, we can conclude by the fact that
$$\int_{\hat\Gamma_D} \hat v^2\, d\hat\Gamma = \int_{\hat l} \hat v^2(\hat x, a(\hat x)) \sqrt{1 + (a'(\hat x))^2}\, d\hat x \le \max_{\hat l} \sqrt{1 + (a'(\hat x))^2} \int_{\hat l} \hat v^2(\hat x, a(\hat x))\, d\hat x,$$
since $\Gamma_D$ is assumed to be Lipschitz-continuous. Using now (46) and $\|\nabla\hat v\|_{\infty,\hat T} \le C h_T \|\nabla v\|_{\infty,T}$ we can establish the estimate of the lemma:
$$\begin{aligned}
\|v\|^2_{0,\Gamma_D \cap T} &\le C h_T^{d-1} \int_{\hat T} \big(\hat v^2 + |\hat\nabla \hat v|^2\big)\, d\hat x \\
&\le C h_T^{-1} \int_{\hat T} \big(\hat v^2 + |\hat\nabla \hat v|^2\big)\, |\det(\nabla\tau_T)|\, d\hat x \\
&\le C h_T^{-1} \int_T v^2\, dx + C h_T \int_T |\nabla v|^2\, dx.
\end{aligned}$$
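Appendix B. We prove the discrete trace inequality (27) when $R^h(v^h) = \partial_n v^h$ provided that (44) is satisfied, under the same hypotheses on the family of meshes and on $\Gamma_D$ as in Appendix A. First we prove the following auxiliary result.
Lemma 7. Let $v^h$ be defined on $T \in \mathcal T^h$ by $v^h(x) := \hat v(\tau_T^{-1}(x))$ with $\hat v \in P_k(\mathbb R^d)$ and suppose that (44) is satisfied. Then there exists a constant $C > 0$ independent of $h$, $T$, and $\hat v$ such that
$$\int_{\Gamma_D \cap T} (v^h)^2\, d\Gamma \le C h_T^{-1} \int_{\Omega \cap T} (v^h)^2\, dx.$$
Proof. Because of the equivalence of norms on $P_k(\mathbb R^d)$, one has
$$\|\hat v\|^2_{\infty,\hat T} \le \|\hat v\|^2_{\infty,B(\hat y_T,2)} = \|\hat v \circ t_{(-\hat y_T)}\|^2_{\infty,B(0,2)} \le C\, \|\hat v \circ t_{(-\hat y_T)}\|^2_{0,B(0,\hat\rho)} = C\, \|\hat v\|^2_{0,B(\hat y_T,\hat\rho)} \le C \int_{\tau_T^{-1}(T \cap \Omega)} \hat v^2\, d\hat x,$$
where $t_{(-\hat y_T)}$ is the translation defined by $t_{(-\hat y_T)}(x) = x - \hat y_T$. Thus, still with the notations of Appendix A:
$$\begin{aligned}
\int_{\Gamma_D \cap T} (v^h)^2\, d\Gamma &= \int_{\hat\Gamma_D} \hat v^2\, |\det(\nabla\tau_T)|\, \big\|\nabla\tau_T^{-1}\hat n\big\|\, d\hat\Gamma \le C h_T^{d-1}\, \|\hat v\|^2_{\infty,\hat T} \\
&\le C\, \frac{h_T^{d-1}}{h_T^{d}} \int_{\tau_T^{-1}(T \cap \Omega)} \hat v^2\, |\det(\nabla\tau_T)|\, d\hat x = C h_T^{-1} \int_{T \cap \Omega} (v^h)^2\, dx.
\end{aligned}$$
Now, summing up the previous estimate over elements of $\mathcal T^h$ one obtains the following result.
Lemma 8. Let $v^h$ be defined on $\Omega$ by $v^h(x)|_T = \hat v_T(\tau_T^{-1}(x))$, $\hat v_T \in P_k(\mathbb R^d)$, $T \in \mathcal T^h$, and suppose that (44) is satisfied. Then the following estimate holds with a constant $C > 0$ independent of $h$ and $v^h$:
$$h \int_{\Gamma_D} (v^h)^2\, d\Gamma \le C \int_\Omega (v^h)^2\, dx.$$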
The discrete trace inequality (27) can be now easily deduced since $\|\partial_n v^h\|_{0,\Gamma_D} \le \|\nabla v^h\|_{0,\Gamma_D}$, and for a quasi-uniform family of meshes the previous lemma can be applied to $\nabla v^h$ componentwise.
Appendix C. We now adapt the proof of Appendix B to the operator $R^h(v^h)$ defined in section 5.2. The difference comes from those elements $T \in \mathcal T^h$ having a too small intersection with $\Omega$ ("bad" elements) and for which a neighbor element $T'$ has been selected to make a natural extension of functions. For such an element, the proof of Appendix B has to be modified because we evaluate the polynomial on a larger zone than $\hat T = \tau_{T'}^{-1}(T')$, namely, on $\hat T_{T,T'} = \tau_{T'}^{-1}(T' \cup (T \cap \Omega))$. With the quasi-uniform assumption for the meshes, it is readily seen that this zone is included in $\hat T_{\rho_R} = \{x \in \mathbb R^d : \mathrm{dist}(x, \hat T) \le \rho_R\}$ for some $\rho_R > 0$ independent of $h$, $T$ and $T'$. Lemma 7 can be easily adapted by remarking that there exists a constant $C > 0$ independent of $h$ such that
$$\|\hat v\|_{\infty,\hat T_{\rho_R}} \le C\, \|\hat v\|_{\infty,\hat T} \qquad \forall \hat v \in P_k(\mathbb R^d),$$
using again that all norms are equivalent in $P_k(\mathbb R^d)$. From this the estimate
$$\int_{\Gamma_D \cap T} (v^h)^2\, d\Gamma \le C h_T^{-1} \int_{\Omega \cap T'} (v^h)^2\, dx,$$
where $v^h(x) := \hat v(\tau_{T'}^{-1}(x))$, $x \in \mathbb R^d$, follows. Thus, (27) can be established by remarking that the element $T'$ can be selected as a neighbor element only a finite number of times independently of $h$, still due to the quasi-uniform property of the meshes.
Acknowledgment. Many thanks to Julien Pommier for his participation to obtain nice numerical experiments.
REFERENCES
[1] R.A. Adams, Sobolev Spaces, Academic Press, New York, 1975.
[2] H.J.C. Barbosa and T.J.R. Hughes, The finite element method with Lagrange multipliers on the boundary: Circumventing the Babuška-Brezzi condition, Comput. Methods Appl. Mech. Engrg., 85 (1991), pp. 109–128.
[3] H.J.C. Barbosa and T.J.R. Hughes, Boundary Lagrange multipliers in finite element methods: Error analysis in natural norms, Numer. Math., 62 (1992), pp. 1–15.
[4] P. Hansbo, C. Lovadina, I. Perugia, and G. Sangalli, A Lagrange multiplier method for the finite element solution of elliptic interface problems using nonmatching meshes, Numer. Math., 100 (2005), pp. 91–115.
[5] E. Chahine, P. Laborde, and Y. Renard, Crack-tip enrichment in the Xfem method using a cut-off function, Internat. J. Numer. Methods Engrg., 75 (2008), pp. 629–646.
[6] P.G. Ciarlet, The finite element method for elliptic problems, Studies in Mathematics and its Applications 4, North-Holland, Amsterdam, 1978.
[7] P.G. Ciarlet, The finite element method for elliptic problems, in Handbook of Numerical Analysis, Volume II, Part 1, P.G. Ciarlet and J.L. Lions, eds., North-Holland, Amsterdam, 1991, pp. 17–352.
[8] P. Clément, Approximation by finite elements functions using local regularization, RAIRO, Anal. Numer., 9 (1975), pp. 77–84.
[9] J.W. Demmel, J.R. Gilbert, and X.S. Li, A general purpose library for the direct solution of large, sparse, nonsymmetric systems, http://crd.lbl.gov/~xiaoye/SuperLU/.
[10] A. Ern and J.-L. Guermond, Theory and practice of finite elements, Appl. Math. Sci., 159 (2004).
[11] V. Girault and R. Glowinski, Error analysis of a fictitious domain method applied to a Dirichlet problem, Japan J. Indust. Appl. Math., 12 (1995), pp. 487–514.
[12] P. Grisvard, Elliptic problems in nonsmooth domains. Monographs and Studies in Mathematics, Pitman (Advanced Publishing Program), Boston, MA, 1985.
[13] J. Haslinger and A. Klarbring, Fictitious domain/mixed finite element approach for a class of optimal shape design problems, Math. Model. Numer. Anal. (M2AN), 4 (1995), pp. 435–450.
[14] P. Hild and Y. Renard, A stabilized Lagrange multiplier method for the finite element approximation of contact problems in elastostatics, submitted.
[15] T. Hughes and L.P. Franca, A new finite element formulation for computational fluid dynamics. VII. The Stokes problem with various well-posed boundary conditions: Symmetric formulations that converge for all velocity/pressure spaces, Comput. Methods Appl. Mech. Engrg., 65 (1987), pp. 85–96.
[16] P. Laborde, J. Pommier, Y. Renard, and M. Salaün, High order extended finite element method for cracked domains, Internat. J. Numer. Methods Engng., 64 (2005), pp. 354–381.
[17] N. Moës, E. Béchet, and M. Tourbier, Imposing Dirichlet boundary conditions in the eXtended Finite Element Method, Internat. J. Numer. Methods Engng., 12 (2006), pp. 354–381.
[18] N. Moës, J. Dolbow, and T. Belytschko, A finite element method for crack growth without remeshing, Internat. J. Numer. Methods Engrg., 46 (1999), pp. 131–150.
[19] N. Moës, A. Gravouil, and T. Belytschko, Non-planar 3D crack growth by the extended finite element and level sets, Part I: Mechanical model, Internat. J. Numer. Methods Engrg., 11 (2002), pp. 2549–2568.
[20] J. Nitsche, Über ein Variationsprinzip zur Lösung von Dirichlet-Problemen bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind, Abh. Math. Univ. Hamburg, 36 (1971), pp. 9–15.
[21] J. Pitkäranta, Local stability conditions for the Babuška method of Lagrange multipliers, Math. Comput., 35 (1980), pp. 1113–1129.
[22] Y. Renard and J. Pommier, Getfem++. An open source generic C++ library for finite element methods, http://home.gna.org/getfem/.
[23] F.L. Stazi, E. Budyn, J. Chessa, and T. Belytschko, An extended finite element method with higher-order elements for curved cracks, Comput. Mech., 31 (2003), pp. 38–48.
[24] R. Stenberg, On some techniques for approximating boundary conditions in the finite element method, J. Comput. Appl. Math., 63 (1995), pp. 139–148.
[25] M. Stolarska, D.L. Chopp, N. Moës, and T. Belytschko, Modelling crack growth by level sets in the extended finite element method, Internat. J. Numer. Methods Engrg., 51 (2001), pp. 943–960.
[26] G. Strang and G.J. Fix, An Analysis of the Finite Element Method, Prentice-Hall, Englewood Cliffs, NJ, 1973.
[27] N. Sukumar, D.L. Chopp, N. Moës, and T. Belytschko, Modeling holes and inclusions by level sets in the extended finite element method, Comput. Methods Appl. Mech. Eng., 46 (2001), pp. 6183–6200.
[28] N. Sukumar, N. Moës, B. Moran, and T. Belytschko, Extended finite element method for three dimensional crack modelling, Internat. J. Numer. Methods Engrg., 48 (2000), pp. 1549–1570.
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1500–1523
c 2009 Society for Industrial and Applied Mathematics
A SADDLE POINT APPROACH TO THE COMPUTATION OF HARMONIC MAPS∗ QIYA HU† , XUE-CHENG TAI‡ , AND RAGNAR WINTHER§ Abstract. In this paper we consider numerical approximations of a constraint minimization problem, where the object function is a quadratic Dirichlet functional for vector fields and the interior constraint is given by a convex function. The solutions of this problem are usually referred to as harmonic maps. The solution is characterized by a nonlinear saddle point problem, and the corresponding linearized problem is well-posed near strict local minima. The main contribution of the present paper is to establish a corresponding result for a proper finite element discretization in the case of two space dimensions. Iterative schemes of Newton type for the discrete nonlinear saddle point problems are investigated, and mesh independent preconditioners for the iterative methods are proposed. Key words. harmonic maps, nonlinear constraints, saddle point problems, error estimates AMS subject classifications. 35A40, 65C20, 65N30 DOI. 10.1137/060675575
1. Introduction. The solutions of many systems of linear partial differential equations can be characterized as minimizers of quadratic functionals over a set of linear constraints. Examples of such systems are the linear Stokes system for fluid flow, the Reissner–Mindlin plate model, and the so-called mixed formulation of second order elliptic equations. The discretizations of these systems lead to linear systems with a saddle point structure, and conditioning of the systems deteriorates as the mesh becomes finer. As a consequence, substantial research on preconditioned iterative methods for the corresponding discrete systems has taken place; cf., for example, [2, 3] or [18, Chapter 6]. The purpose of the present paper is to perform a corresponding analysis for a nonlinear problem. We will study a simple variant of the problem characterizing harmonic maps with respect to a compact manifold. In particular, we will focus on stability and error estimates for the discretization and on preconditioning of the linear saddle point systems arising in a Newton iteration. For a bounded Lipschitz domain $\Omega \subset \mathbb R^d$, we shall consider the problem of finding local minima of a constrained minimization problem of the form
$$\min_{v \in \mathbf H^1_g(\Omega; M)} E(v) = \frac12 \int_\Omega |\nabla v|^2\, dx. \tag{1.1}$$
∗ Received by the editors November 21, 2006; accepted for publication (in revised form) December 8, 2008; published electronically March 25, 2009. The work was supported by the Norwegian Research Council, LSEC (Laboratory of Scientific and Engineering Computing) at the Chinese Academy of Sciences, the Key Project of the Natural Science Foundation of China G10531080, the National Basic Research Program of China 2005CB321702, and the Natural Science Foundation of China G10771178. http://www.siam.org/journals/sinum/47-2/67557.html
† LSEC, Institute of Computational Mathematics and Scientific Engineering Computing, Chinese Academy of Sciences, Beijing 100080, China.
‡ Department of Mathematics, University of Bergen, Johannes Brunsgate 12, Bergen, 5008, Norway and Division of Mathematical Science, School of Physical & Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, Singapore, 637371, Singapore (xctai@ntu.edu.sg).
§ Centre of Mathematics for Applications and Department of Informatics, University of Oslo, P.B. 1053, Blindern, Oslo, Norway.
Here $\mathbf H^1_g(\Omega; M)$ is the set of vector fields with values in a smooth compact manifold $M$ in $\mathbb R^d$, with function values and first derivatives in $L^2(\Omega)$, and such that the elements $v$ of $\mathbf H^1_g(\Omega; M)$ satisfy $v|_{\partial\Omega} = g$ for a fixed vector field $g$ defined on the boundary $\partial\Omega$. We will further assume that the target manifold $M$ is implicitly given in the form $M = \{v \in \mathbb R^d \mid F(v) = 0\}$, where the function $F : \mathbb R^d \to \mathbb R^k$ is a smooth function, and it will be assumed that the compatibility condition $F(g) = 0$ holds. More specific assumptions on $F$ and the boundary data $g$ will be given later. Problems of the form (1.1) arise, for example, in liquid crystal and superconductor simulations. The solutions of the problem (1.1) are frequently referred to as harmonic maps [7]. In the present paper we will restrict our study to the case $k = 1$, i.e., $M$ is of dimension $d - 1$. We will focus on a nonlinear saddle point approach to compute the solutions of the problem (1.1). For a review of results on the continuous harmonic map problem, we refer to [7, 24, 29, 30]. The purpose of the present paper is to discuss a finite element method for approximating the constraint minimization problem (1.1). For the simplest case of (1.1), with interior constraint given by $|v| = 1$, several numerical approaches have been discussed; cf., for example, [1, 4, 5, 13, 14, 15, 16, 20, 21, 25, 26, 32]. Variants of the projection method are proposed and analyzed in [1, 5, 16]. However, the standard projection method applies only to the simplest model. Moreover, it was illustrated in [5] that the projection method converges only for very special regular and quasi-uniform triangulations for the discretized harmonic map problem. The relaxation method of [13, 21, 25] uses point relaxation with the constraint required at each grid point. Both convergence analysis and numerical experiments are supplied in [25]. An advantage of the relaxation method is that it is very easy to implement. However, disadvantages are that the relaxation parameter has to be chosen properly to obtain convergence and that the convergence of such fixed point iterations is slow. Another commonly used approach for harmonic map problems is to use penalization methods; cf. [4, 14, 15, 16, 20]. It is often combined with the gradient descent method, which produces time evolution equations; cf. [4, 11, 12, 14, 15, 16, 20]. The approach and analysis given in [4] work even for general p-harmonic problems, with p close to 1. The analysis of [14, 15] is also valid for problems coupling harmonic maps with Navier–Stokes equations. The main contribution of the present paper is to discuss the use of a saddle point approach for the construction of numerical methods for the constraint minimization problem (1.1). We will show that the corresponding saddle point problem is stable near exact local minima. This is achieved by verifying the standard stability conditions for linear saddle point problems. This verification has the extra difficulty that the coercivity condition will not hold in general, but only on the kernel of the linearized constraint. Using the standard stability conditions for the corresponding discrete saddle point problem, we will construct finite element methods such that the corresponding discrete solutions admit an optimal error estimate in the energy norm. Due to some technical difficulties, caused by the use of inverse inequalities to handle some nonlinear terms, this analysis of the finite element discretization is restricted to two space dimensions, i.e., $d = 2$. In this case we also establish that any critical point of the functional $E$ with respect to $\mathbf H^1_g(\Omega; M)$ is indeed a local minimum. Compared with other approaches [4, 11, 14, 15], our estimates do not depend on extra artificial parameters like a weight parameter for the penalty method or a step size for a gradient flow. We will also study Newton's method for the discrete nonlinear saddle
QIYA HU, XUE-CHENG TAI, AND RAGNAR WINTHER
point problem and propose a simple and efficient preconditioner for the linear systems arising during the iterations. Numerical tests will be given to show the efficiency of the proposed method. The outline of the paper is as follows. In section 2, the notation and assumptions are specified. In section 3, the continuous problem is studied. The problem (1.1) is formally transformed to a saddle point problem, and stability results will be proved for the continuous model. In section 4 we first describe a finite element discretization for (1.1), and then the discrete stability conditions are established. Using these stability conditions, the existence, local uniqueness, and the error estimates are derived in section 5. Variants of Newton’s method are analyzed in section 6, while numerical experiments are presented in section 7.

2. Notation and preliminaries. Throughout this paper we will use $c$ and $C$ to denote generic positive constants, not necessarily the same at different occurrences. It is assumed that the constants are independent of the mesh size $h$, which will be introduced later. For vectors $v, w \in \mathbb{R}^d$, we use $v \cdot w$ to denote the Euclidean inner product, while the notation $A : B$ is used to denote the Frobenius inner product of two matrices $A, B \in \mathbb{R}^{d\times d}$. The corresponding norms are given by $|v|$ and $|A|$, respectively. For a vector or matrix $A$, $A^t$ is the transpose of $A$. In the special case of vectors $v = (v_1, v_2)$ in $\mathbb{R}^2$ we will use $v^\perp = (-v_2, v_1)$ to denote the corresponding vector obtained by a rotation of 90 degrees. For $m \ge 0$, we will use $H^m = H^m(K)$ to denote the real valued $L^2$-based Sobolev spaces on a domain $K \subset \mathbb{R}^d$; the corresponding norm is denoted by $\|\cdot\|_{m,K}$, and $|\cdot|_{m,K}$ is the seminorm involving only the $m$th order derivatives. The subspace $H^m_0$ is the closure in $H^m$ of $C_0^\infty(K)$, while $H^{-m}$ is the dual of $H^m_0$ with respect to an extension of the $L^2$ inner product $\langle\cdot,\cdot\rangle$. The corresponding $L^\infty$-based Sobolev spaces are denoted $W^{m,\infty}(K)$, with associated norm $\|\cdot\|_{m,\infty,K}$.
For all the Sobolev norms, we will omit $K$ in case $K = \Omega$. In general, we will use boldface symbols for vector or matrix valued functions. The gradient operator with respect to the spatial variable $x = (x_1, x_2, \dots, x_d)$ is denoted $\nabla = (\partial/\partial x_1, \partial/\partial x_2, \dots, \partial/\partial x_d)^t$. Furthermore, the gradient of a vector valued function $v = (v_1, v_2, \dots, v_d)^t$, $\nabla v$, is the matrix valued function obtained by taking the gradient rowwise, i.e., $(\nabla v)_{ij} = \partial v_i/\partial x_j$. In order to specify the properties of the constraint functional $F : \mathbb{R}^d \to \mathbb{R}$, defining the constraint manifold $M$, we will use $DF$ to denote the gradient of $F$, i.e., $DF(v) = (\partial F/\partial v_1, \dots, \partial F/\partial v_d)^t$, and the corresponding Hessian by $D^2F(v) = (\partial^2 F/\partial v_i \partial v_j)_{i,j=1}^d$. Throughout this paper we will assume that the constraint functional $F$ satisfies the following:

(i) $F$ is convex and smooth. Furthermore, there exist constants $c_0$ and $c_1$ such that

(2.1)  $c_0 |v|^2 \le D^2F(\xi)\, v \cdot v \le c_1 |v|^2, \qquad \xi, v \in \mathbb{R}^d.$

(ii) $F(0) < 0$ and $DF(0) = 0$.

(iii) There exists an $L > 0$ such that the matrix function $D^2F$ satisfies

(2.2)  $|D^2F(\xi_1) - D^2F(\xi_2)| \le L\,|\xi_1 - \xi_2|, \qquad \xi_1, \xi_2 \in \mathbb{R}^d.$

The analysis below will still hold if the assumptions (2.1) and (2.2) are only valid for all $\xi, \xi_1, \xi_2$ in a neighborhood of a continuous true solution. For the boundary function $g$ of (1.1), we assume that it has been extended into the interior of $\Omega$ such that $g \in \mathbf{H}^1(\Omega)$. Corresponding to $g$, we let $\mathbf{H}^1_g(\Omega) = \{v \in \mathbf{H}^1(\Omega) : v = g \text{ on } \partial\Omega\}$.
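As a concrete instance of assumptions (i)–(iii), one may take the unit circle in $\mathbb{R}^2$. The following minimal sketch assumes the particular normalization $F(v) = (|v|^2 - 1)/2$, which is our choice for illustration; with it, $DF(v) = v$, $D^2F$ is the identity, and (2.1) holds with $c_0 = c_1 = 1$. The helper names and the sample vectors are hypothetical.

```python
import math


def F(v):
    """Assumed constraint functional for the unit circle: F(v) = (|v|^2 - 1) / 2."""
    return 0.5 * (v[0] ** 2 + v[1] ** 2 - 1.0)


def DF(v):
    """Gradient of F; for this choice DF(v) = v."""
    return (v[0], v[1])


def D2F(v):
    """Hessian of F; the identity matrix, so (2.2) holds with any Lipschitz constant."""
    return ((1.0, 0.0), (0.0, 1.0))


def hessian_quadratic_form(xi, v):
    """Evaluate D2F(xi) v . v."""
    H = D2F(xi)
    return sum(H[i][j] * v[i] * v[j] for i in range(2) for j in range(2))


def check_assumptions(c0=1.0, c1=1.0, samples=50):
    """Check (ii), sample the two-sided bound (2.1), and confirm F = 0 on the circle."""
    ok = F((0.0, 0.0)) < 0 and DF((0.0, 0.0)) == (0.0, 0.0)  # assumption (ii)
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        v = (1.7 * math.cos(t), 0.4 * math.sin(t))  # arbitrary test vectors
        q = hessian_quadratic_form(v, v)
        n2 = v[0] ** 2 + v[1] ** 2
        ok = ok and (c0 * n2 - 1e-12 <= q <= c1 * n2 + 1e-12)  # bound (2.1)
        on_circle = (math.cos(t), math.sin(t))
        ok = ok and abs(F(on_circle)) < 1e-12  # points of M satisfy F = 0
    return ok
```

Calling `check_assumptions()` samples the conditions rather than proving them; for this quadratic $F$ they can of course be verified by hand.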
SADDLE POINT APPROACH FOR HARMONIC MAPS
If $v : \Omega \to \mathbb{R}^d$ is a smooth vector field, then it follows from the chain rule that

(2.3)  $\nabla F(v) = (\nabla v)^t DF(v),$

where the product on the right-hand side is the ordinary matrix-vector product. Furthermore, we have

(2.4)  $\nabla DF(v) = D^2F(v)\nabla v.$

From assumptions (i)–(ii) and the Taylor expansion we obtain the following estimate:

(2.5)  $2c_1^{-1}|F(0)| \le |v(x)|^2 \le 2c_0^{-1}|F(0)|, \qquad x \in \Omega,$

for any $v$ satisfying $F(v) \equiv 0$ in $\Omega$. Similarly, we derive

(2.6)  $|DF(v)| \ge c_0 |v|$
for any $v$, and hence $|DF(v(x))| > 0$ if $v(x) \in M$. Let us note that the interior constraint in (1.1), given by $v(x) \in M$, implies that a local minimum of (1.1) satisfies $u \in \mathbf{H}^1_g(\Omega) \cap \mathbf{L}^\infty(\Omega)$. In fact, if we restrict the analysis to the case $d = 2$, with the manifold $M$ taken to be the unit circle $S^1$, and we assume that the boundary $\partial\Omega$ and the boundary data $g$ are sufficiently regular, then there is a unique smooth global minimizer of (1.1) under the condition that the degree of $g$ is zero; cf. [7, Theorem 12] and [22]. However, this result is not true for more general harmonic map problems [30, 24].

We will first consider the characterization of critical points of the functional $E$ over $\mathbf{H}^1_g(\Omega; M)$. The outline below follows a standard Lagrange multiplier approach to constrained optimization; cf., for example, [6] for the finite-dimensional case or [17, 19] for the infinite-dimensional case. A vector field $u \in \mathbf{H}^1_g(\Omega; M)$ is such a critical point if it satisfies

(2.7)  $\langle \nabla u, \nabla v\rangle = 0$

for any $v$ in the tangent space of $\mathbf{H}^1_g(\Omega; M)$ at $u$, i.e., for any $v \in \mathbf{H}^1_0(\Omega)$ such that $DF(u) \cdot v \equiv 0$. In the saddle point approach which we shall consider here we will view the critical points $u$ as elements of the larger space $\mathbf{H}^1_g(\Omega)$. Assume that $u$ has the extra regularity property that

(2.8)  $u \in \mathbf{H}^1_g(\Omega) \cap \mathbf{W}^{1,\infty}(\Omega).$

Then any such $u$ is a critical point if and only if there is a $\lambda \in L^2(\Omega)$ such that the pair $(u, \lambda)$ satisfies the first order conditions

(2.9)  $\langle \nabla u, \nabla v\rangle + \langle DF(u)\cdot v, \lambda\rangle = 0, \qquad v \in \mathbf{H}^1_0(\Omega),$
       $\langle F(u), \mu\rangle = 0, \qquad \mu \in H^{-1}(\Omega).$

To see this we assume that $u$ is a critical point satisfying (2.8), and let $z = DF(u)/|DF(u)|$. For any $v \in \mathbf{H}^1_0(\Omega)$, let $v_\tau = v - (v \cdot z)z$. As a consequence $DF(u) \cdot v_\tau = 0$, and, by (2.7),

$0 = \langle \nabla u, \nabla v_\tau\rangle = \langle \nabla u, \nabla v\rangle - \langle \nabla u, \nabla((v \cdot z)z)\rangle.$

From (2.3), the constraint implies that $(\nabla u)^t z = 0$. Therefore, the final inner product above can be rewritten as

$\langle \nabla u, \nabla((v \cdot z)z)\rangle = \langle \nabla u : \nabla z, v \cdot z\rangle.$
Hence, the system (2.9) holds with

(2.10)  $\lambda = -\nabla u : \nabla z/|DF(u)| = -\nabla u : \nabla DF(u)/|DF(u)|^2,$

where the last identity again is a consequence of the constraint. Note that it follows from (2.8) that the multiplier $\lambda$ is actually in $L^\infty(\Omega)$. The variational problem (2.9) is the Euler–Lagrange equation for the constrained minimization problem (1.1), and the system is a weak formulation of the problem

(2.11)  $-\Delta u + \lambda DF(u) = 0$ in $\Omega$, $\qquad F(u) = 0$ in $\Omega$.

In the simplest case when $M = S^{d-1}$, i.e., the unit sphere in $\mathbb{R}^d$, we have $\lambda = -|\nabla u|^2$ and

$-\Delta u - |\nabla u|^2 u = 0$ in $\Omega$, $\qquad u = g$ on $\partial\Omega$.

In the present paper we will restrict our attention to the critical points $u$ of $E$ over $\mathbf{H}^1_g(\Omega; M)$ that are local minimizers. So assume that the pair $(u, \lambda)$ is a solution of (2.9), satisfying the regularity property (2.8), and let $w = w(t)$ be a smooth curve in $\mathbf{H}^1_g(\Omega; M)$, defined for $t$ in a neighborhood of the origin such that $w(0) = u$ and $w'(0) = v$. Hence, since $F(w(t)) \equiv 0$, we must have $DF(u) \cdot v = 0$, and

(2.12)  $DF(u) \cdot w''(0) = -D^2F(u)v \cdot v.$
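The unit sphere case of (2.11) can be checked numerically on a simple example. The sketch below assumes $d = 2$, the normalization $F(v) = (|v|^2 - 1)/2$ (so that $DF(u) = u$), and the hypothetical field $u = (\cos\theta, \sin\theta)$ with the harmonic phase $\theta(x, y) = Ax + By$; the coefficients $A$, $B$ and the finite-difference step are illustrative choices, not taken from the paper.

```python
import math

A, B = 0.7, -0.3  # hypothetical coefficients of the harmonic phase theta = A*x + B*y


def u(x, y):
    """Circle-valued field u = (cos theta, sin theta)."""
    t = A * x + B * y
    return (math.cos(t), math.sin(t))


def laplacian(f, x, y, h=1e-3):
    """Componentwise five-point finite-difference Laplacian of a vector field."""
    c = f(x, y)
    e, w, n, s = f(x + h, y), f(x - h, y), f(x, y + h), f(x, y - h)
    return tuple((e[k] + w[k] + n[k] + s[k] - 4.0 * c[k]) / h ** 2 for k in range(2))


def grad_sq(f, x, y, h=1e-3):
    """|grad u|^2 = sum over i, j of (du_i/dx_j)^2, by central differences."""
    e, w, n, s = f(x + h, y), f(x - h, y), f(x, y + h), f(x, y - h)
    return sum(((e[k] - w[k]) / (2 * h)) ** 2 + ((n[k] - s[k]) / (2 * h)) ** 2
               for k in range(2))


def residual(x, y):
    """Residual of -Laplace(u) + lambda * DF(u) with lambda = -|grad u|^2 and DF(u) = u."""
    lap = laplacian(u, x, y)
    lam = -grad_sq(u, x, y)
    val = u(x, y)
    return tuple(-lap[k] + lam * val[k] for k in range(2))
```

For this field one has $\Delta u = -(A^2 + B^2)u$ and $|\nabla u|^2 = A^2 + B^2$, so `residual` should vanish up to discretization and rounding error at any point.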
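Furthermore, if we define a real valued function $\phi = \phi(t)$ by

$\phi(t) = E(w(t)) = \tfrac{1}{2}\langle \nabla w(t), \nabla w(t)\rangle,$

then

$\phi'(t) = \langle \nabla w(t), \nabla w'(t)\rangle$ and $\phi''(t) = \langle \nabla w'(t), \nabla w'(t)\rangle + \langle \nabla w(t), \nabla w''(t)\rangle.$

Hence, it follows from the system (2.9) that $\phi'(0) = \langle \nabla u, \nabla v\rangle = 0$, and if $u$ corresponds to a local minimum of $E$ over $\mathbf{H}^1_g(\Omega; M)$, then the second order condition

$\phi''(0) = \langle \nabla v, \nabla v\rangle + \langle \nabla u, \nabla w''(0)\rangle \ge 0$

must hold. However, by using the system (2.9) and (2.12), we obtain that

$\langle \nabla u, \nabla w''(0)\rangle = -\langle DF(u) \cdot w''(0), \lambda\rangle = \langle D^2F(u)v \cdot v, \lambda\rangle.$

Therefore, the second order condition takes the form

(2.13)  $\phi''(0) = \langle \nabla v, \nabla v\rangle + \langle D^2F(u)v \cdot v, \lambda\rangle \ge 0.$

In fact, let us refer to a local minimum $u$ of $E$ over $\mathbf{H}^1_g(\Omega; M)$ as a strict local minimum if there is a positive constant $\beta$ such that

$\frac{d^2}{dt^2}E(w(t))\big|_{t=0} \ge \beta\|v\|_1^2$

for any smooth curve $w = w(t)$ in $\mathbf{H}^1_g(\Omega; M)$ satisfying $w(0) = u$ and $w'(0) = v$. It follows from the calculation above that the function $\phi(t) = E(w(t))$ satisfies

(2.14)  $\phi''(0) = \langle \nabla v, \nabla v\rangle + \langle D^2F(u)v \cdot v, \lambda\rangle \ge \beta\|v\|_1^2$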
for all $v \in \mathbf{H}^1_0(\Omega)$ satisfying $DF(u) \cdot v = 0$. As we shall see below, this condition is closely tied to a stability condition for a linearization of the system (2.9).

The saddle point approach can be regarded as the limiting case of the penalty method. In the commonly used penalty approach, cf. [4, 14, 15, 16, 20], one seeks a local minimizer of the following regularized problem:

$\min_{v \in \mathbf{H}^1_g(\Omega)} \; E(v) + \frac{1}{2\varepsilon}\int_\Omega |F(v)|^2\,dx,$

where the penalty parameter $\varepsilon > 0$ has to be properly chosen. The saddle point system (2.9) is formally obtained in the limit as $\varepsilon$ tends to zero. The advantage of the saddle point approach is that the standard mixed finite element theory, cf. [9], tells us how to choose the finite element spaces properly to avoid possible instabilities, and there is no need to choose a penalty parameter.

3. Stability of the linearized problem. Throughout the rest of this paper we will assume that the pair $(u, \lambda)$ is a solution of the system (2.9), corresponding to a local minimum of $E$ over $\mathbf{H}^1_g(\Omega; M)$ and satisfying the regularity property

(3.1)  $u \in \mathbf{H}^1_g(\Omega) \cap \mathbf{W}^{1,\infty}(\Omega), \qquad \lambda \in L^\infty(\Omega).$

In particular, $u$ and $\lambda$ are related by (2.10), and the second order condition (2.13) holds, i.e., $a(u, \lambda; v, v) \ge 0$ for all $v \in \mathbf{Z}_u$, where the bilinear form $a(u, \lambda; \cdot, \cdot)$ is given by

$a(u, \lambda; v, \hat v) = \langle \nabla v, \nabla \hat v\rangle + \langle D^2F(u)v \cdot \hat v, \lambda\rangle$

and

$\mathbf{Z}_u = \{v \in \mathbf{H}^1_0(\Omega) : \langle DF(u) \cdot v, \mu\rangle = 0, \ \mu \in L^2(\Omega)\}.$

For the analysis below, it will be useful to consider linearizations of the saddle point system (2.9). More precisely, we consider systems of the following form: Find $(v, \mu) \in \mathbf{H}^1_0(\Omega) \times H^{-1}(\Omega)$ such that

(3.2)  $a(u, \lambda; v, \hat v) + \langle DF(u) \cdot \hat v, \mu\rangle = \langle f, \hat v\rangle, \qquad \hat v \in \mathbf{H}^1_0(\Omega),$
       $\langle DF(u) \cdot v, \hat\mu\rangle = \langle \sigma, \hat\mu\rangle, \qquad \hat\mu \in H^{-1}(\Omega),$

where $(u, \lambda)$ is the exact solution of (2.9) satisfying (3.1). Here $f \in \mathbf{H}^{-1}(\Omega)$ and $\sigma \in H^1_0(\Omega)$ represent data. Our goal is to show that this linear system is well-posed, i.e., we will show that the map $(f, \sigma) \in \mathbf{H}^{-1}(\Omega) \times H^1_0(\Omega) \to (v, \mu) \in \mathbf{H}^1_0(\Omega) \times H^{-1}(\Omega)$ is well defined and bounded. This will be established by verifying the standard stability conditions for saddle point systems; cf. [8] or [9]. We will first establish the so-called inf-sup condition.

Theorem 3.1. Let $(u, \lambda)$ satisfy (3.1) and be related by (2.10). Then there is a positive constant $\beta_1$, depending on $u$, such that

(3.3)  $\inf_{\mu \in H^{-1}(\Omega)} \; \sup_{v \in \mathbf{H}^1_0(\Omega)} \frac{\langle DF(u) \cdot v, \mu\rangle}{\|v\|_1 \|\mu\|_{-1}} \ge \beta_1.$
Proof. For any $\mu \in H^{-1}(\Omega)$, there exists a $\varphi \in H^1_0(\Omega)$ such that

(3.4)  $\frac{\langle \mu, \varphi\rangle}{\|\varphi\|_1} = \|\mu\|_{-1}.$

Define $v = \varphi\, w/|w|^2$, where $w = DF(u)$. Then, by Leibniz’ rule, there exists a $c > 0$, depending on $u$, such that

$\|\nabla v\|_0 \le c\|\varphi\|_1.$

Furthermore,

$\langle DF(u) \cdot v, \mu\rangle = \langle \varphi, \mu\rangle = \|\varphi\|_1 \|\mu\|_{-1}.$

Hence, the desired inequality holds with $\beta_1 = 1/c$.

Next we need to consider the properties of the bilinear form $a(u, \lambda; \cdot, \cdot)$. It is straightforward to check that this bilinear form is bounded in the sense that

(3.5)  $a(u, \lambda; v, \hat v) \le C(u, \lambda)|v|_1 |\hat v|_1, \qquad v, \hat v \in \mathbf{H}^1_0(\Omega),$

where the constant $C(u, \lambda)$ depends on the norms of $u$ and $\lambda$ indicated by (3.1). The final key property for the stability analysis of the linear system (3.2) is the requirement that the bilinear form $a(u, \lambda; \cdot, \cdot)$ is coercive on the linearized constraint space $\mathbf{Z}_u$. It should be noted that this bilinear form is, in general, not coercive on the entire space $\mathbf{H}^1_0(\Omega)$. For example, in the simplest case when $M = S^{d-1}$, we have

$a(u, \lambda; v, v) = \int_\Omega (|\nabla v|^2 - |\nabla u|^2 |v|^2)\,dx.$

On the other hand, the stability theory of [8] requires only that

(3.6)  $a(u, \lambda; v, v) \ge \beta\|v\|_1^2, \qquad v \in \mathbf{Z}_u,$

for a suitable positive constant $\beta$, and this is exactly the strict minimum condition (2.14). Therefore, if $u$ is a strict local minimum, then the linear system (3.2) is well-posed. Furthermore, if we restrict to two space dimensions, i.e., $d = 2$, then the coercivity condition (3.6) always holds. This is a consequence of the following theorem, which implies that in this case every critical point $(u, \lambda)$ satisfying (3.1) is a strict local minimum, and the corresponding problem (3.2) is well-posed.

Theorem 3.2. Assume that $d = 2$. Let $(u, \lambda)$ satisfy (3.1) and be related by (2.10). Then there is a positive constant $\beta_2$, depending on $u$, such that

(3.7)  $a(u, \lambda; v, v) = \langle \nabla v, \nabla v\rangle + \langle D^2F(u)v \cdot v, \lambda\rangle \ge \beta_2\|v\|_1^2, \qquad v \in \mathbf{Z}_u.$
Remark 3.1. The result of this theorem will not be true, in general, if the target manifold M is of higher dimension. However, in [23] a sufficient condition on u and M, referred to as the “cut locus condition,” is given, which ensures that the operator associated with the bilinear form a(u, λ; ·, ·), restricted to the tangent space Zu , is invertible, and hence the linear system (3.2) will be well-posed. Before we give the proof of the theorem we will establish an auxiliary result.
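The contrast between the whole space $\mathbf{H}^1_0(\Omega)$ and the constraint space $\mathbf{Z}_u$ can be illustrated numerically in the unit circle case. The sketch below assumes $\Omega = (0,1)^2$, $F(v) = (|v|^2 - 1)/2$, and the hypothetical critical point $u = (\cos Kx, \sin Kx)$, for which $\lambda = -|\nabla u|^2 = -K^2$; the wave number $K$, the bump function, and the quadrature parameters are illustrative assumptions.

```python
import math

K = 6.0  # assumed wave number; u(x, y) = (cos Kx, sin Kx) maps into the unit circle


def phi(x, y):
    """Scalar bump that vanishes on the boundary of the unit square."""
    return math.sin(math.pi * x) * math.sin(math.pi * y)


def v_fixed(x, y):
    """phi * e1: lies in H^1_0 but is not orthogonal to DF(u) = u."""
    return (phi(x, y), 0.0)


def v_tangent(x, y):
    """phi * u^perp: satisfies DF(u) . v = u . v = 0, so it belongs to Z_u."""
    return (-phi(x, y) * math.sin(K * x), phi(x, y) * math.cos(K * x))


def bilinear_form(v, n=50, d=1e-5):
    """Midpoint-rule value of a(u, lambda; v, v) = int (|grad v|^2 - K^2 |v|^2) dx."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) * h, (j + 0.5) * h
            # central differences for the x- and y-derivatives of both components
            dx = [(a - b) / (2 * d) for a, b in zip(v(x + d, y), v(x - d, y))]
            dy = [(a - b) / (2 * d) for a, b in zip(v(x, y + d), v(x, y - d))]
            grad_sq = sum(c * c for c in dx + dy)
            val_sq = sum(c * c for c in v(x, y))
            total += (grad_sq - K * K * val_sq) * h * h
    return total
```

With these choices `bilinear_form(v_fixed)` approximates $\pi^2/2 - K^2/4$, which is negative for $K = 6$, while `bilinear_form(v_tangent)` approximates $\int_\Omega |\nabla\varphi|^2\,dx = \pi^2/2$, in line with the positivity on $\mathbf{Z}_u$ asserted by Theorem 3.2.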
Lemma 3.1. Assume that the conditions given in Theorem 3.2 hold and define $w = (w_1, w_2)^t = DF(u)$. Then,

$\lambda D^2F(u)w^\perp \cdot w^\perp = -\frac{w_1^2|\nabla w_2|^2 + w_2^2|\nabla w_1|^2 - 2w_1 w_2 \nabla w_1 \cdot \nabla w_2}{|w|^2}.$

Proof. It follows from (2.10) that the multiplier $\lambda$ can be expressed as $\lambda = -\nabla u : \nabla w/|w|^2$. Hence,

(3.8)  $\lambda D^2F(u)w^\perp \cdot w^\perp = -\frac{\nabla u : \nabla w}{|w|^2}\,(F_{11}w_2^2 + F_{22}w_1^2 - 2F_{12}w_1 w_2),$

where $F_{ij} = \partial^2 F/\partial u_i \partial u_j$. Furthermore, since $\nabla F(u) \equiv 0$, we have from (2.3) that $w_1\nabla u_1 + w_2\nabla u_2 = 0$, while (2.4) implies that $\nabla w_i = F_{i1}\nabla u_1 + F_{i2}\nabla u_2$. By combining these identities, we obtain

$(F_{11}w_2^2 + F_{22}w_1^2 - 2F_{12}w_1 w_2)\nabla u_1 \cdot \nabla w_1$
$\quad = w_2^2(F_{11}\nabla u_1 + F_{12}\nabla u_2) \cdot \nabla w_1 - w_1 w_2(F_{22}\nabla u_2 + F_{12}\nabla u_1) \cdot \nabla w_1$
$\quad = w_2^2|\nabla w_1|^2 - w_1 w_2 \nabla w_1 \cdot \nabla w_2.$

A similar argument shows that

$(F_{11}w_2^2 + F_{22}w_1^2 - 2F_{12}w_1 w_2)\nabla u_2 \cdot \nabla w_2 = w_1^2|\nabla w_2|^2 - w_1 w_2 \nabla w_1 \cdot \nabla w_2,$

and hence the desired identity follows from (3.8).

Proof of Theorem 3.2. As above we let $w = DF(u)$. For any $v \in \mathbf{Z}_u$, there exists an $\alpha$ such that $v = \alpha w^\perp$. In fact, we have

(3.9)  $\alpha = \frac{v \cdot w^\perp}{|w|^2}.$

From the estimates (2.5)–(2.6) and condition (3.1), we see that $\alpha \in H^1_0(\Omega)$. The key identity we will use is the pointwise relation

(3.10)  $|\nabla v|^2 + \lambda D^2F(u)v \cdot v = |\nabla(\alpha|w|)|^2.$

In order to verify this identity note that

$\nabla(\alpha|w|) = |w|\nabla\alpha + \frac{\alpha}{|w|}(w_1\nabla w_1 + w_2\nabla w_2).$

Hence,

$|\nabla(\alpha|w|)|^2 = |w|^2|\nabla\alpha|^2 + \frac{\alpha^2}{|w|^2}|w_1\nabla w_1 + w_2\nabla w_2|^2 + 2\alpha(w_1\nabla\alpha \cdot \nabla w_1 + w_2\nabla\alpha \cdot \nabla w_2).$
On the other hand,

$|\nabla v|^2 = |w|^2|\nabla\alpha|^2 + \alpha^2|\nabla w|^2 + 2\alpha(w_1\nabla\alpha \cdot \nabla w_1 + w_2\nabla\alpha \cdot \nabla w_2).$

Therefore,

$|\nabla v|^2 - |\nabla(\alpha|w|)|^2 = \alpha^2\Big(|\nabla w|^2 - \frac{|w_1\nabla w_1 + w_2\nabla w_2|^2}{|w|^2}\Big)$
$\quad = \frac{\alpha^2}{|w|^2}\big(w_1^2|\nabla w_2|^2 + w_2^2|\nabla w_1|^2 - 2w_1 w_2\nabla w_1 \cdot \nabla w_2\big) = -\lambda D^2F(u)v \cdot v,$

where the last identity follows from Lemma 3.1. Hence, we have verified (3.10). Let $\mu = \alpha|w|$. Then $v = \frac{\mu}{|w|}w^\perp$, and hence

$\nabla v = \frac{1}{|w|}w^\perp \cdot \nabla\mu + \mu\nabla\Big(\frac{w^\perp}{|w|}\Big).$

Therefore, since $u$ satisfies (3.1), Poincaré’s inequality implies that

$\|\nabla v\|_0 \le c(\|\nabla\mu\|_0 + \|\mu\|_0) \le c\|\nabla(\alpha|w|)\|_0,$

where the constant $c$ depends on $u$. Together with (3.10) this implies the desired inequality of the theorem.

4. A stable discretization. The purpose of this section is to analyze a finite element discretization of the constrained minimization problem (1.1). Due to some technical difficulties caused by the use of inverse inequalities to treat some nonlinear terms, cf. (4.3) below, the analysis given here is restricted to the case $d = 2$. As a consequence, the bilinear form $a(u, \lambda; \cdot, \cdot)$ will satisfy the coercivity bound given in Theorem 3.2. So, for the rest of the paper, we assume that $d = 2$ and that $\Omega \subset \mathbb{R}^2$ is a polygonal domain. Given a shape regular and quasi-uniform family of triangulations $\{T_h\}$ of $\Omega$, with mesh size $h < 1$, let $N_h$ denote the set of nodes associated with $T_h$. We use $V_h$ to denote the space of continuous piecewise linear functions and $V_{h,0} = V_h \cap H^1_0(\Omega)$. The notations $\mathbf{V}_h$ and $\mathbf{V}_{h,0}$ will be used for the vector versions of the corresponding spaces. We will use $\pi_h$ to denote the usual nodal interpolation operators onto the spaces $V_h$ and $\mathbf{V}_h$. Standard approximation properties of spaces of piecewise linear functions will be used below. In particular, we will use the estimates

(4.1)  $\|(I - \pi_h)v\|_1 \le Ch|v|_2, \qquad v \in H^2(\Omega),$

and

(4.2)  $\|(I - P_h)v\|_{-1} \le Ch\|v\|_0, \qquad v \in L^2(\Omega).$

Here, $P_h : L^2(\Omega) \to V_{h,0}$ is the $L^2$ projection. Due to the quasi-uniformity of the mesh, the operator $P_h$ can be extended to a uniformly bounded operator on $H^{-1}$. Moreover, the following inverse inequalities hold:

(4.3)  $\|v\|_\infty \le C\log(h^{-1})\|v\|_1, \qquad \|v\|_1 \le Ch^{-1}\|v\|_0, \qquad v \in V_h.$
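Set $g_h = \pi_h g$ (on $\partial\Omega$). We define $\mathbf{V}_{h,g} = \{v \in \mathbf{V}_h : v|_{\partial\Omega} = g_h\}$. We will consider the following discretized minimization problem:

(4.4)  $\min_{v \in \mathbf{V}_{h,g}} E(v)$ subject to $F(v) = 0$ on $N_h$.

The Lagrange functional $L : \mathbf{V}_{h,g} \times V_{h,0} \to \mathbb{R}$ is

(4.5)  $L(v, \mu) = E(v) + \int_\Omega \mu\,\pi_h F(v)\,dx, \qquad (v, \mu) \in \mathbf{V}_{h,g} \times V_{h,0}.$

The first order condition defining the critical points of $L$ leads to the following discrete counterpart of the nonlinear saddle point problem (2.9): Find $(u_h, \lambda_h) \in \mathbf{V}_{h,g} \times V_{h,0}$ such that

(4.6)  $\langle \nabla u_h, \nabla v\rangle + \langle \pi_h[DF(u_h) \cdot v], \lambda_h\rangle = 0, \qquad v \in \mathbf{V}_{h,0},$
       $\langle \pi_h F(u_h), \mu\rangle = 0, \qquad \mu \in V_{h,0}.$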
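The constraint in (4.4) is imposed only at the nodes, so $\pi_h F(u_h) = 0$ does not mean that $F(u_h)$ vanishes inside the elements. The following sketch illustrates this along a single mesh line (for instance, a chain of element edges), assuming the unit circle constraint $F(v) = (|v|^2 - 1)/2$ and hypothetical nodal values placed at equally spaced angles on the circle; both the constraint normalization and the nodal data are our assumptions.

```python
import math


def F(v):
    """Assumed unit circle constraint: F(v) = (|v|^2 - 1) / 2."""
    return 0.5 * (v[0] ** 2 + v[1] ** 2 - 1.0)


def nodal_field(n, total_angle=math.pi / 2):
    """Nodal values of a piecewise linear field on n segments, all on the unit circle."""
    return [(math.cos(total_angle * i / n), math.sin(total_angle * i / n))
            for i in range(n + 1)]


def max_constraint_violation(n):
    """Largest |F(u_h)| over segment midpoints, where u_h is the linear interpolant."""
    nodes = nodal_field(n)
    worst = 0.0
    for a, b in zip(nodes, nodes[1:]):
        mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)  # value of u_h at the midpoint
        worst = max(worst, abs(F(mid)))
    return worst
```

The nodal values satisfy $F = 0$ up to rounding, whereas the midpoint values lie on chords strictly inside the circle; in this example the violation equals $\sin^2(\Delta\theta/2)/2$ for the angular step $\Delta\theta$, so halving the mesh size reduces it by roughly a factor of four.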
However, we shall first analyze the discrete counterpart of the linearized system (3.2). For a given $(\hat u, \hat\lambda) \in \mathbf{V}_{h,g} \times V_{h,0}$, let us define the bilinear form $a_h(\hat u, \hat\lambda; \cdot, \cdot)$ to be

$a_h(\hat u, \hat\lambda; v, \hat v) = \langle \nabla v, \nabla \hat v\rangle + \langle \pi_h[D^2F(\hat u)v \cdot \hat v], \hat\lambda\rangle.$

Similarly, as in (3.2) for the continuous problem, the linearized problem for (4.6) is to find $(v, \mu) \in \mathbf{V}_{h,0} \times V_{h,0}$ such that

(4.7)  $a_h(\hat u, \hat\lambda; v, \hat v) + \langle \pi_h[DF(\hat u) \cdot \hat v], \mu\rangle = \langle f, \hat v\rangle, \qquad \hat v \in \mathbf{V}_{h,0},$
       $\langle \pi_h[DF(\hat u) \cdot v], \hat\mu\rangle = \langle \sigma, \hat\mu\rangle, \qquad \hat\mu \in V_{h,0}.$

For a given $\hat u \in \mathbf{V}_{h,g}$, define

$\mathbf{Z}_{h,\hat u} = \{v \in \mathbf{V}_{h,0} : DF(\hat u) \cdot v = 0 \text{ on } N_h\}.$

Lemma 4.1. Let $\Phi : \mathbb{R}^2 \times \mathbb{R}^2 \times \cdots \times \mathbb{R}^2 \to \mathbb{R}^2$ be a smooth function. Then we have the following estimates for all $v_1, v_2, \dots, v_k \in \mathbf{V}_h$:

(4.8)  $|\pi_h\Phi(v_1, v_2, \dots, v_k)|_1 \le C\sum_{i=1}^k \|D_{v_i}\Phi\|_{0,\infty}|v_i|_1;$

(4.9)  $\|(\pi_h - I)\Phi(v_1, v_2, \dots, v_k)\|_0 \le Ch\sum_{i=1}^k \|D_{v_i}\Phi\|_{0,\infty}|v_i|_1.$

Above, the constant $C$ is independent of $h$, $\Phi$, and $v_i$. The norm $\|D_{v_i}\Phi\|_{0,\infty}$ stands for $\|D_{v_i}\Phi(v_1, v_2, \dots, v_k)\|_{0,\infty}$, with $D_{v_i}\Phi(v_1, v_2, \dots, v_k) = \partial\Phi(v_1, v_2, \dots, v_k)/\partial v_i$.

Proof. For clarity, we shall only give the proof for $k = 2$. The extension of the proof to the general case is straightforward. For an element $e \in T_h$, let $p_i$, $i = 1, 2, 3$, be the vertices of $e$. Under the condition that the finite element mesh $T_h$ is regular and quasi-uniform, we have the following equivalent $H^1$ norms for $v \in \mathbf{V}_h$:

(4.10)  $|v|_{1,e}^2 \cong \sum_{i,j=1}^3 |v(p_i) - v(p_j)|^2, \qquad v \in \mathbf{V}_h, \ e \in T_h.$
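The equivalence (4.10) reflects the fact that, in two dimensions, the $H^1$ seminorm of an affine function on a triangle does not change when the triangle is rescaled. A minimal sketch, assuming a scalar affine function, a right reference triangle scaled by $h$, and hypothetical vertex values:

```python
def h1_seminorm_sq(p, vals):
    """|v|_{1,e}^2 = area * |grad v|^2 for the affine function with vertex values vals."""
    (x1, y1), (x2, y2), (x3, y3) = p
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)  # twice the signed area
    # gradient of the affine interpolant, obtained by Cramer's rule
    gx = ((vals[1] - vals[0]) * (y3 - y1) - (vals[2] - vals[0]) * (y2 - y1)) / det
    gy = ((vals[2] - vals[0]) * (x2 - x1) - (vals[1] - vals[0]) * (x3 - x1)) / det
    return 0.5 * abs(det) * (gx * gx + gy * gy)


def vertex_difference_sum(vals):
    """Sum over i, j = 1, 2, 3 of |v(p_i) - v(p_j)|^2."""
    return sum((a - b) ** 2 for a in vals for b in vals)


def ratio(h, vals=(1.0, 3.0, -2.0)):
    """Ratio of the two sides of (4.10) on the triangle (0,0), (h,0), (0,h)."""
    tri = ((0.0, 0.0), (h, 0.0), (0.0, h))
    return h1_seminorm_sq(tri, vals) / vertex_difference_sum(vals)
```

For a fixed element shape, `ratio(h)` is the same for every `h`; the constants hidden in (4.10) depend only on the shape regularity of the mesh.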
In particular,

$|\pi_h\Phi(v_1, v_2)|_{1,e}^2 \le \sum_{i,j=1}^3 |\Phi(v_1(p_i), v_2(p_i)) - \Phi(v_1(p_j), v_2(p_j))|^2.$

Thus, we get (4.8) from the following estimate:

$|\pi_h\Phi(v_1, v_2)|_{1,e}^2 \le 2\sum_{i,j=1}^3 \Big(|\Phi(v_1(p_i), v_2(p_i)) - \Phi(v_1(p_j), v_2(p_i))|^2 + |\Phi(v_1(p_j), v_2(p_i)) - \Phi(v_1(p_j), v_2(p_j))|^2\Big)$
$\quad \le 2\sum_{i,j=1}^3 \Big(\|D_{v_1}\Phi\|_{0,\infty,e}^2 |v_1(p_i) - v_1(p_j)|^2 + \|D_{v_2}\Phi\|_{0,\infty,e}^2 |v_2(p_i) - v_2(p_j)|^2\Big).$

Next, we estimate (4.9). By the definition of the interpolation operator $\pi_h$, we have

$(\pi_h - I)\Phi(v_1, v_2)(p) = \sum_{i=1}^3 [\Phi(v_1(p_i), v_2(p_i)) - \Phi(v_1(p), v_2(p))]\chi_i(p), \qquad p \in e,$

where $\{\chi_i\}_{i=1}^3$ are the barycentric coordinates on $e$. From this, we see that

(4.11)  $\|(\pi_h - I)\Phi(v_1, v_2)\|_{0,e}^2 \le C\int_e \Big|\sum_{i=1}^3 \big(\Phi(v_1(p_i), v_2(p_i)) - \Phi(v_1, v_2)\big)\chi_i\Big|^2$
$\quad \le C\sum_{i=1}^3 \int_e \Big(\|D_{v_1}\Phi\|_{0,\infty,e}^2 |v_1(p_i) - v_1|^2 + \|D_{v_2}\Phi\|_{0,\infty,e}^2 |v_2(p_i) - v_2|^2\Big)$
$\quad \le Ch^2\Big(\|D_{v_1}\Phi\|_{0,\infty,e}^2 |v_1|_{1,e}^2 + \|D_{v_2}\Phi\|_{0,\infty,e}^2 |v_2|_{1,e}^2\Big).$

Thus, the estimate (4.9) is verified.

For the lemma above, it is essential that the functions $v_i$ are finite element functions. If $v_1 \in \mathbf{W}^{1,\infty}(\Omega)$ and $v_2 \in \mathbf{V}_h$, then we obtain

(4.12)  $\|(\pi_h - I)\Phi(v_1, v_2)\|_0 \le Ch\big(\|D_{v_1}\Phi\|_{0,\infty}|v_1|_{1,\infty} + \|D_{v_2}\Phi\|_{0,\infty}|v_2|_1\big).$
The next result, which is essential for our analysis, is a discrete version of Theorem 3.2. As in the previous section, $(u, \lambda)$ is a solution of (2.9) satisfying (3.1).

Theorem 4.1. There exist positive constants $\gamma_0$ and $h_0$ such that, for $(\hat u, \hat\lambda) \in \mathbf{V}_{h,g} \times V_{h,0}$ satisfying

(4.13)  $\|\hat u - \pi_h u\|_1 + \|\hat\lambda - P_h\lambda\|_{-1} \le \gamma/\log^2(h^{-1})$

with $h \le h_0$ and $\gamma \le \gamma_0$, we have

(4.14)  $a_h(\hat u, \hat\lambda; v, v) \ge \beta_3\|v\|_1^2, \qquad v \in \mathbf{Z}_{h,\hat u}.$

Here the constants $\gamma_0$, $h_0$, $\beta_3$ depend on $u$.
In order to prove the above theorem, we need to derive some auxiliary results. The main idea is to relate (4.14) to the continuous problem, and then use Theorem 3.2 and some approximation properties of the operators $\pi_h$ and $P_h$. As before, we shall use $w = DF(u)$, with $u$ being the true solution; see (3.1). Given a $(\hat u, \hat\lambda)$ satisfying (4.13), we define $\hat w = DF(\hat u)$. For any $v \in \mathbf{Z}_{h,\hat u}$, let us define

(4.15)  $\alpha(p_i) = \frac{v(p_i) \cdot \hat w^\perp(p_i)}{|\hat w(p_i)|^2}, \qquad p_i \in N_h.$

From the above definition, it is clear that

$\alpha = \pi_h\Big(\frac{v \cdot \hat w^\perp}{|\hat w|^2}\Big) \in V_{h,0}, \qquad v = \pi_h(\alpha\hat w^\perp).$

We have used the relation $\hat w \cdot v = 0$ on $N_h$ in getting the last equality. Corresponding to the true solution $u$ and a given $v \in \mathbf{Z}_{h,\hat u}$, let $\varepsilon_h \in \mathbf{H}^1_0(\Omega)$ be the function given by $\varepsilon_h = \alpha w^\perp - v$. We see clearly that

(4.16)  $\varepsilon_h + v \in \mathbf{Z}_u.$

For a given $\hat u$ satisfying (4.13), one can verify by assumption (i) on the constraint function $F$, cf. (2.1), and the inverse estimate (4.3) that

$|\hat w(p) - w(p)| = |DF(\hat u(p)) - DF(\pi_h u(p))| \le c_1\gamma, \qquad p \in N_h.$

Thus, by choosing $\gamma$ small enough, one can guarantee that

(4.17)  $0 < c|w(p)| \le |\hat w(p)| \le C|w(p)|, \qquad p \in N_h.$

Hence, we conclude that (4.13) implies that there is a constant $C$, depending only on $u$, such that

(4.18)  $\|\hat u\|_1, \ \|\hat u\|_{0,\infty} \le C.$

Lemma 4.2. Let $(\hat u, \hat\lambda) \in \mathbf{V}_{h,g} \times V_{h,0}$ satisfy (4.13). Then we have the estimate

$\Big|\pi_h\Big(\varphi\frac{\hat w}{|\hat w|^2}\Big)\Big|_1 \le C|\varphi|_1, \qquad \varphi \in V_{h,0},$

where the constant $C$ depends on $u$.

Proof. Let $\psi = \pi_h(\varphi\,\hat w/|\hat w|^2)$. Using (4.10), we see that

(4.19)  $|\psi|_{1,e}^2 \le C\sum_{i,j}\Big|\varphi(p_i)\frac{\hat w(p_i)}{|\hat w(p_i)|^2} - \varphi(p_j)\frac{\hat w(p_j)}{|\hat w(p_j)|^2}\Big|^2$
$\quad \le C\sum_{i,j}\Big(\frac{|\varphi(p_i) - \varphi(p_j)|^2}{|\hat w(p_i)|^2} + |\varphi(p_j)|^2 \cdot \Big|\frac{\hat w(p_i)}{|\hat w(p_i)|^2} - \frac{\hat w(p_j)}{|\hat w(p_j)|^2}\Big|^2\Big).$

It follows from (4.10) and (4.17)–(4.18) that

(4.20)  $\sum_{i,j}\frac{|\varphi(p_i) - \varphi(p_j)|^2}{|\hat w(p_i)|^2} \le C|\varphi|_{1,e}^2.$
On the other hand, we have by (4.17)–(4.18) and assumption (iii) on the constraint function $F$, cf. (2.2),

$\Big|\frac{\hat w(p_i)}{|\hat w(p_i)|^2} - \frac{\hat w(p_j)}{|\hat w(p_j)|^2}\Big|^2 \le C|\hat w(p_i) - \hat w(p_j)|^2 \le C|\hat u(p_i) - \hat u(p_j)|^2$
$\quad \le C\big(|(\hat u - \pi_h u)(p_i) - (\hat u - \pi_h u)(p_j)|^2 + |\pi_h u(p_i) - \pi_h u(p_j)|^2\big).$

Thus, we get by the inverse estimate (4.3) and (4.13) that

(4.21)  $\sum_{i,j}|\varphi(p_j)|^2 \cdot \Big|\frac{\hat w(p_i)}{|\hat w(p_i)|^2} - \frac{\hat w(p_j)}{|\hat w(p_j)|^2}\Big|^2 \le C\big(\|\varphi\|_{0,\infty,e}^2 \cdot |\hat u - \pi_h u|_{1,e}^2 + \|\varphi\|_{0,e}^2 \cdot \|\pi_h u\|_{1,\infty,e}^2\big)$
$\quad \le C(\gamma^2 + \|u\|_{1,\infty,e}^2)\|\varphi\|_{1,e}^2.$

Substituting (4.20)–(4.21) into (4.19), we obtain the desired bound.

Remark 4.1. If we apply Lemma 4.1 to the function $\psi$ defined by $\psi = \pi_h(\varphi\,\hat w/|\hat w|^2)$, we will get that $|\psi|_1 \le C\log(h^{-1})|\varphi|_1$. The result obtained here is better: the factor $\log(h^{-1})$ has been removed.

Lemma 4.3. Let $(\hat u, \hat\lambda) \in \mathbf{V}_{h,g} \times V_{h,0}$ satisfy (4.13). Then, there exist positive constants $h_0$ and $\gamma_0$, depending on $u$, such that

$a(u, \lambda; v, v) \ge \frac{\beta_2}{2}\|v\|_1^2, \qquad v \in \mathbf{Z}_{h,\hat u},$

for $0 < h \le h_0$ and $0 < \gamma \le \gamma_0$.

Proof. For any $v \in \mathbf{Z}_{h,\hat u}$, let $\alpha$ and $\varepsilon_h$ be defined as in (4.15) and (4.16). From $\pi_h(\alpha\pi_h w^\perp) = \pi_h(\alpha w^\perp)$, we have

(4.22)  $\varepsilon_h = (I - \pi_h)(\alpha w^\perp) + \pi_h[\alpha\pi_h(w - \hat w)^\perp].$

From (4.12) and also using the inverse inequality (4.3), we get that

(4.23)  $|(I - \pi_h)(\alpha w^\perp)|_1^2 \le Ch^2\big(\|w^\perp\|_{0,\infty}^2|\alpha|_1^2 + \|\alpha\|_{0,\infty}^2\|w^\perp\|_{1,\infty}^2\big) \le Ch^2\log^2(h^{-1})\|u\|_{1,\infty}^2|\alpha|_1^2.$

Note that there exists a $\xi$ such that

$\pi_h[\alpha\pi_h(w - \hat w)^\perp] = \pi_h\Big[\alpha\,\pi_h\big(\pi_h\big(D^2F(\xi)(\pi_h u - \hat u)\big)\big)^\perp\Big].$

A repeated application of (4.8) and (4.3) gives

(4.24)  $|\pi_h[\alpha\pi_h(w - \hat w)^\perp]|_1^2 \le C\log^4(h^{-1})|\alpha|_1^2|\pi_h u - \hat u|_1^2.$

From Lemma 4.2, we see that

(4.25)  $|\alpha|_1 \le C|v|_1.$

Combining (4.23)–(4.25) with (4.13), we see that

(4.26)  $|\varepsilon_h|_1^2 \le C(h^2\log^2(h^{-1})\|u\|_{1,\infty}^2 + \gamma^2)|\alpha|_1^2 \le C(h^2\log^2(h^{-1})\|u\|_{1,\infty}^2 + \gamma^2)|v|_1^2.$
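The following estimate follows from (3.5) and (3.7):

(4.27)  $a(u, \lambda; v, v) = a(u, \lambda; v + \varepsilon_h, v + \varepsilon_h) - a(u, \lambda; v, \varepsilon_h) + a(u, \lambda; \varepsilon_h, \varepsilon_h)$
$\quad \ge C(\beta_2\|v + \varepsilon_h\|_1^2 - |v|_1|\varepsilon_h|_1 - |\varepsilon_h|_1^2).$

Choosing $h$ and $\gamma$ small enough, we obtain the desired result from (4.26) and (4.27).

Proof of Theorem 4.1. In the proof, we always assume that $h$ and $\gamma$ are small. Note that

(4.28)  $a_h(\hat u, \hat\lambda; v, v) - a(u, \lambda; v, v) = \langle \pi_h[D^2F(\hat u)v \cdot v], \hat\lambda\rangle - \langle D^2F(u)v \cdot v, \lambda\rangle$
$\quad = \langle \pi_h[D^2F(\hat u)v \cdot v], \hat\lambda - \lambda\rangle + \langle (\pi_h - I)[D^2F(\hat u)v \cdot v], \lambda\rangle + \langle (D^2F(\hat u) - D^2F(u))v \cdot v, \lambda\rangle$
$\quad = I_1 + I_2 + I_3.$

The meaning of the terms $I_i$ is self-explanatory. Since $\lambda \in L^2(\Omega)$, we obtain from (4.13) and (4.2) that

$\|\hat\lambda - \lambda\|_{-1} \le \|\hat\lambda - P_h\lambda\|_{-1} + \|P_h\lambda - \lambda\|_{-1} \le \gamma/\log^2(h^{-1}) + Ch\|\lambda\|_0.$

Using Lemma 4.1, we see that

$|\pi_h[D^2F(\hat u)v \cdot v]|_1 \le C\big(\|D^2F(\hat u) \cdot v\|_{0,\infty}|v|_1 + \|v\|_{0,\infty}^2\|D^3F(\hat u)\|_{0,\infty}|\hat u|_1\big) \le C\log^2(h^{-1})\|v\|_1^2.$

For small $h$, a combination of the above two inequalities leads to

$|I_1| = |\langle \pi_h[D^2F(\hat u)v \cdot v], \hat\lambda - \lambda\rangle| \le C\log^2(h^{-1})\|v\|_1^2\big(\gamma/\log^2(h^{-1}) + Ch\|\lambda\|_0\big) \le C\gamma\|v\|_1^2.$

Similarly, we use Lemma 4.1 to prove that

$|I_2| = |\langle (\pi_h - I)[D^2F(\hat u)v \cdot v], \lambda\rangle| \le \|(\pi_h - I)[D^2F(\hat u)v \cdot v]\|_0 \cdot \|\lambda\|_0 \le Ch\log^2(h^{-1})\|v\|_1^2$

and

$|I_3| = |\langle (D^2F(\hat u) - D^2F(u))v \cdot v, \lambda\rangle| \le \|(D^2F(\hat u) - D^2F(u))v \cdot v\|_0 \cdot \|\lambda\|_0 \le C\gamma\|v\|_1^2.$

Choosing $h$ and $\gamma$ small enough, we obtain the desired result from Lemma 4.3 and the estimates above of the three terms appearing in (4.28).

Theorem 4.2. Assume that $(\hat u, \hat\lambda) \in \mathbf{V}_{h,g} \times V_{h,0}$ satisfies the condition (4.13). There exists a constant $\beta_4 > 0$, which depends on $u$, such that

(4.29)  $\inf_{\mu \in V_{h,0}} \; \sup_{v \in \mathbf{V}_{h,0}} \frac{\langle \pi_h[DF(\hat u) \cdot v], \mu\rangle}{\|\mu\|_{-1}\|v\|_1} \ge \beta_4.$

Proof. For the $\varphi$ given in (3.4), let $\varphi_h = P_h\varphi$. Then, we see that

$\frac{\langle \mu_h, \varphi_h\rangle}{\|\varphi_h\|_1} \ge \beta_1\|\mu_h\|_{-1}.$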
Define $v_h = \pi_h\big[\varphi_h\,DF(\hat u)/|DF(\hat u)|^2\big]$. Then,

$\langle \pi_h[DF(\hat u) \cdot v_h], \mu_h\rangle = \langle \mu_h, \varphi_h\rangle.$

From Lemma 4.2, one gets that $|v_h|_1 \le C|\varphi_h|_1$. By collecting these estimates, the theorem is established.

Together with Theorems 4.1 and 4.2, the saddle point theory given in [8] or [9] assures existence, stability, and uniqueness of the solution of the linearized saddle point system (4.7), as long as $(\hat u, \hat\lambda)$ satisfies (4.13). In the next section, we shall use these properties to prove some results for the corresponding nonlinear systems.

Remark 4.2. If we replace $V_{h,0}$ by $V_h$ in (4.29), the inf-sup condition (4.29) may not be satisfied. This is why we use $V_{h,0}$, instead of $V_h$, as the finite element space for the Lagrange multiplier.

5. The discrete nonlinear problem. The main purpose of this section is to establish existence and uniqueness of solutions of the discretized nonlinear saddle point problem (4.6) in a neighborhood of a continuous solution $(u, \lambda)$ of the system (2.9). As above, we assume that $(u, \lambda)$ corresponds to a local minimum of the functional $E$ over $\mathbf{H}^1_g(\Omega; M)$ and that the regularity assumption (3.1) holds. Furthermore, we will show that the discrete solutions converge to the continuous solution with a linear rate with respect to the mesh parameter $h$. However, we start by summarizing some properties of the linearized saddle point system.

For notational simplicity, we shall use $X$, $X_h$, and $X_{h,g}$ defined by $X = \mathbf{H}^1_0(\Omega) \times H^{-1}(\Omega)$, $X_h = \mathbf{V}_{h,0} \times V_{h,0}$, and $X_{h,g} = \mathbf{V}_{h,g} \times V_{h,0}$. Let $\|\cdot\|_X$ denote the norm on the product space $\mathbf{H}^1_0(\Omega) \times H^{-1}(\Omega)$, and let $\|\cdot\|_{X^*}$ denote the norm on the dual space $X^* = \mathbf{H}^{-1}(\Omega) \times H^1_0(\Omega)$. The norm $\|\cdot\|_{L(X,X^*)}$ will be used to denote the norm of a bounded linear operator from $X$ to $X^*$. The spaces $X_h$ and $X_{h,g}$ are equipped with the norm of $X$, while $X_h^*$ is equal to $X_h$ as a set, but equipped with the dual norm of $X$ with respect to the $L^2$ inner products. Similarly, the norm $\|\cdot\|_{L(X_h,X_h^*)}$ is the associated operator norm. Let $x = (u, \lambda)$ be a solution of (2.9). Corresponding to $x$, let $G(x) \in X^*$ be given by

$\langle G(x), y\rangle = \langle \nabla u, \nabla v\rangle + \langle DF(u) \cdot v, \lambda\rangle + \langle F(u), \mu\rangle, \qquad y = (v, \mu) \in X.$

As usual, $\langle\cdot,\cdot\rangle$ is the duality pairing which extends the standard $L^2$ inner product. Associated with $G$, we define a mapping $G'(x) : X \to X^*$ by

(5.1)  $\langle G'(x) \cdot y, \hat y\rangle = a(u, \lambda; v, \hat v) + \langle DF(u) \cdot \hat v, \mu\rangle + \langle DF(u) \cdot v, \hat\mu\rangle$

for all $y = (v, \mu)$, $\hat y = (\hat v, \hat\mu) \in X = \mathbf{H}^1_0(\Omega) \times H^{-1}(\Omega)$. The operator $G'(x)$ is formally the Fréchet differential of $G$ at $x$. Recall from the saddle point theory given in [8, 9] that Theorems 3.1–3.2 imply that the system (3.2) has a unique solution $(v, \mu)$, which depends continuously on $(f, \sigma) \in X^*$. Thus we have the following result.

Theorem 5.1. If $(u, \lambda)$ satisfies the regularity assumption (3.1), then the map $G'(x)$ defined by (5.1) is an isomorphism from $X = \mathbf{H}^1_0(\Omega) \times H^{-1}(\Omega)$ to $X^* = \mathbf{H}^{-1}(\Omega) \times H^1_0(\Omega)$.

For the discretized saddle point problem, let $G_h : X_{h,g} \to X_h^*$ be the map defined by (4.6). For any $\hat x = (\hat u, \hat\lambda) \in X_{h,g}$, $G_h(\hat x)$ is the operator that satisfies

$\langle G_h(\hat x), \hat y\rangle = \langle \nabla\hat u, \nabla\hat v\rangle + \langle \pi_h[DF(\hat u) \cdot \hat v], \hat\lambda\rangle + \langle \pi_h F(\hat u), \hat\mu\rangle, \qquad \hat y = (\hat v, \hat\mu) \in X_h.$
Thus, problem (4.6) is, in fact, to find $x_h = (u_h, \lambda_h) \in X_{h,g}$ such that

(5.2)  $\langle G_h(x_h), y\rangle = 0, \qquad y = (\hat v, \hat\mu) \in X_h.$

Let $G_h'(\hat x)$ be the Fréchet derivative of $G_h$ at $\hat x = (\hat u, \hat\lambda) \in X_{h,g}$. Then, $G_h'(\hat x) : X_h \to X_h^*$ is the linear operator given by

(5.3)  $\langle G_h'(\hat x)y, \hat y\rangle = a_h(\hat u, \hat\lambda; v, \hat v) + \langle \pi_h[DF(\hat u) \cdot \hat v], \mu\rangle + \langle \pi_h[DF(\hat u) \cdot v], \hat\mu\rangle, \qquad y = (v, \mu) \in X_h, \ \hat y = (\hat v, \hat\mu) \in X_h.$

By Theorems 4.1–4.2, the following result is a consequence of the theory given in [8, 9].

Theorem 5.2. Assume that $\hat x = (\hat u, \hat\lambda) \in X_{h,g}$ satisfies the condition (4.13). For sufficiently small $h$ and $\gamma$, the map $G_h'(\hat x)$ is an isomorphism from $X_h$ to $X_h^*$. Moreover,

(5.4)  $\|G_h'(\hat x)^{-1}\|_{L(X_h^*,X_h)} \le M,$

where $M$ is a constant independent of $h$ and $\hat x = (\hat u, \hat\lambda)$.

Define $x^* = (\pi_h u, P_h\lambda)$, and set $y^* = G_h(x^*)$. We can use techniques similar to those in the proof of Theorem 4.1 to prove the following lemma.

Lemma 5.1. For any $\hat x = (\hat u, \hat\lambda) \in X_{h,g}$ satisfying (4.13), we have

$\|G_h'(\hat x) - G_h'(x^*)\|_{L(X_h,X_h^*)} \le C\log(h^{-1})\|\hat x - x^*\|_X,$

where $C$ depends on $u$ and $\lambda$.

Proof. By the definition of $G_h'$, we have, for any $y = (v, \mu) \in X_h$ and $\hat y = (\hat v, \hat\mu) \in X_h$,

(5.5)  $\langle (G_h'(\hat x) - G_h'(x^*))y, \hat y\rangle = \langle \pi_h[D^2F(\hat u)v \cdot \hat v], \hat\lambda - P_h\lambda\rangle$
$\quad + \langle \pi_h[(D^2F(\hat u) - D^2F(\pi_h u))v \cdot \hat v], P_h\lambda\rangle$
$\quad + \langle \pi_h[(DF(\hat u) - DF(\pi_h u)) \cdot \hat v], \mu\rangle$
$\quad + \langle \pi_h[(DF(\hat u) - DF(\pi_h u)) \cdot v], \hat\mu\rangle.$

It is clear that

(5.6)  $\langle \pi_h[D^2F(\hat u)v \cdot \hat v], \hat\lambda - P_h\lambda\rangle \le \|\pi_h[D^2F(\hat u)v \cdot \hat v]\|_1\,\|\hat\lambda - P_h\lambda\|_{-1}.$

As in the proof of Lemma 4.1, we deduce

$\|\pi_h[D^2F(\hat u)v \cdot \hat v]\|_1 \le C\|D^2F(\hat u)v\|_{0,\infty} \cdot \|\hat v\|_1 + C\|D^2F(\hat u)\|_{0,\infty} \cdot \|v\|_1 \cdot \|\hat v\|_{0,\infty} + C\|D^2F(\hat u)\|_{0,\infty} \cdot \|v\|_{0,\infty} \cdot \|\hat v\|_{0,\infty}.$

Then, we further get by the inverse inequality (4.3)

$\|\pi_h[D^2F(\hat u)v \cdot \hat v]\|_1 \le C\log^3(h^{-1})\|v\|_1 \cdot \|\hat v\|_1.$

Plugging this into (5.6), together with (4.13), leads to

$\langle \pi_h[D^2F(\hat u)v \cdot \hat v], \hat\lambda - P_h\lambda\rangle \le C\gamma\log(h^{-1})\|v\|_1\|\hat v\|_1.$
Similarly, we deduce by (2.2), the inverse inequality (4.3), and (4.13)

$\|\pi_h[(D^2F(\hat u) - D^2F(\pi_h u))v \cdot \hat v]\|_1 \le C\log^3(h^{-1})\|\hat u - \pi_h u\|_1 \cdot \|v\|_1 \cdot \|\hat v\|_1 \le C\gamma\log(h^{-1})\|v\|_1\|\hat v\|_1.$

Estimating the last two terms in (5.5) by Lemma 4.1, (4.3), and (4.13), we obtain the result. The constants $C$ in the estimates depend on $u$ and $\lambda$.

At this point, we need to recall the implicit function theorem as, for example, given in Lemma 1 of [10]. From the implicit function theorem, we can conclude that if there is a $\delta > 0$ such that

(5.7)  $\hat x \in X_h, \ \|\hat x - x^*\|_X \le \delta$ implies $\|G_h'(\hat x) - G_h'(x^*)\|_{L(X_h,X_h^*)} \le \frac{1}{2M},$

then the equation

(5.8)  $G_h(\hat x) = \hat y$

has a unique solution for all $\hat y$ satisfying

$\|\hat y - y^*\|_{X^*} \le \frac{\delta}{2M}.$

Here $M$ is the positive constant appearing in Theorem 5.2. From Lemma 5.1, we see that the condition (5.7) is fulfilled if we choose $\delta = 1/(2MC\log(h^{-1}))$. Hence, we have that (5.8) has a unique solution $\hat x$ satisfying

$\|\hat x - x^*\|_X \le \frac{1}{2MC\log(h^{-1})}$

for all $\hat y$ such that

$\|\hat y - y^*\|_{X^*} \le \frac{1}{4M^2C\log(h^{-1})}.$

Furthermore, we can conclude from Lemma 1 of [10] that

(5.9)  $\|\hat x - x^*\|_X \le 2M\|\hat y - y^*\|_{X^*}.$

Note that our desired equation is $G_h(x) = 0$. Thus, if we can verify that

(5.10)  $\|G_h(x^*)\|_{X^*} = \|y^*\|_{X^*} \le \frac{1}{4M^2C\log(h^{-1})},$

we can conclude existence and uniqueness of the solution of this equation. If we assume more smoothness on $u$, this is a consequence of the following lemma.

Lemma 5.2. Assume that $u \in \mathbf{H}^2(\Omega) \cap \mathbf{W}^{1,\infty}(\Omega)$. Then we have

$\|G_h(x^*)\|_{X^*} \le Ch,$

with $x^* = (\pi_h u, P_h\lambda)$.

Proof. It suffices to prove that

(5.11)  $|\langle G_h(x^*), \hat x\rangle| \le Ch\|\hat x\|_X, \qquad \hat x = (v, \mu) \in X_h.$
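We have by (2.9) and the definition of $G_h$
\[
\langle G_h(x^*),\hat x\rangle = \langle\nabla(\pi_h\mathbf u-\mathbf u),\nabla\mathbf v\rangle + \langle\pi_hF(\pi_h\mathbf u),\mu\rangle - \langle F(\mathbf u),\mu\rangle + \langle\pi_h[DF(\pi_h\mathbf u)\cdot\mathbf v],P_h\lambda\rangle - \langle DF(\mathbf u)\cdot\mathbf v,\lambda\rangle. \tag{5.12}
\]
It is clear that
\[
|\langle\nabla(\pi_h\mathbf u-\mathbf u),\nabla\mathbf v\rangle| \le |\pi_h\mathbf u-\mathbf u|_1\,|\mathbf v|_1 \le Ch\,\|\mathbf u\|_2\,|\mathbf v|_1. \tag{5.13}
\]
Note that since $\pi_hF(\pi_h\mathbf u) = \pi_hF(\mathbf u)$, we obtain from (4.1) that
\[
|\langle\pi_hF(\pi_h\mathbf u),\mu\rangle - \langle F(\mathbf u),\mu\rangle| = |\langle(\pi_h-I)F(\mathbf u),\mu\rangle| \le \|(\pi_h-I)F(\mathbf u)\|_1\,\|\mu\|_{-1} \le Ch\,\|F(\mathbf u)\|_2\,\|\mu\|_{-1}. \tag{5.14}
\]
Furthermore, by the assumptions on $F$ and the estimates (4.1), (4.2), and (4.12), we get
\[
\begin{aligned}
|\langle\pi_h[DF(\pi_h\mathbf u)\cdot\mathbf v],P_h\lambda\rangle - \langle DF(\mathbf u)\cdot\mathbf v,\lambda\rangle|
&\le |\langle(\pi_h-I)[DF(\mathbf u)\cdot\mathbf v],P_h\lambda\rangle| + |\langle DF(\mathbf u)\cdot\mathbf v,P_h\lambda-\lambda\rangle| \\
&\le \|(\pi_h-I)[DF(\mathbf u)\cdot\mathbf v]\|_0\,\|P_h\lambda\|_0 + \|DF(\mathbf u)\cdot\mathbf v\|_1\,\|P_h\lambda-\lambda\|_{-1} \\
&\le Ch\,\|DF(\mathbf u)\cdot\mathbf v\|_1\,\|\lambda\|_0 \le Ch\,\|DF(\mathbf u)\|_{1,\infty}\,\|\lambda\|_0\,\|\mathbf v\|_1.
\end{aligned} \tag{5.15}
\]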
Substituting (5.13)–(5.15) into (5.12) gives (5.11).

From this lemma, we see that $y^*$ satisfies (5.10) for small $h$, since $h\log(h^{-1})\to 0$ as $h\to 0$. Thus, there exists a unique solution for (4.6). Moreover, the solution satisfies the estimate (5.9). We state this conclusion more clearly in the following theorem.

Theorem 5.3. Assume that $\mathbf u\in\mathbf H^2(\Omega)\cap\mathbf W^{1,\infty}(\Omega)$. Then, for sufficiently small $h$, there exists a unique saddle point $(\mathbf u_h,\lambda_h)\in X_h$ for (4.6) in a small neighborhood of $(\pi_h\mathbf u,P_h\lambda)$. Moreover, the following error estimate holds:
\[
\|\mathbf u_h-\mathbf u\|_1 + \|\lambda_h-\lambda\|_{-1} \le Ch.
\]

6. Preconditioned iterative methods. We shall combine a preconditioning technique with the classical Newton's method (cf., for example, [27, Chapter 7]) to solve the nonlinear saddle point problem (4.6) or, equivalently, (5.2). Of course, Newton's method will only converge if the initial value is close enough to the true solution. Therefore, in practical computations, it is often necessary to use another global method to obtain an appropriate initial value. A systematic study of such techniques is beyond the scope of the present work. However, some alternatives to supply a good initial value are given in the example in section 7.2 below.

Let $x^0 = (\mathbf u_h^0,\lambda_h^0)\in X_h$ be a suitable initial guess. The Newton iteration is given by
\[
x^{n+1} = x^n - G_h'(x^n)^{-1}G_h(x^n),\qquad n = 0,1,\ldots. \tag{6.1}
\]
Assume that the initial guess $(\mathbf u_h^0,\lambda_h^0)$ satisfies (4.13), with a small $\gamma$. Using Theorem 5.2, combined with Lemma 5.1 and the standard properties of Newton's method, it follows that all $x^n = (\mathbf u_h^n,\lambda_h^n)$ satisfy (4.13), with the same $\gamma$, and all the operators $G_h'(x^n)$ are invertible. Moreover, the sequence $\{(\mathbf u_h^n,\lambda_h^n)\}$ converges with almost order 2, i.e.,
\[
\|\mathbf u_h^{n+1}-\mathbf u_h\|_1 + \|\lambda_h^{n+1}-\lambda_h\|_{-1} \le C\log^2(h^{-1})\,\bigl(\|\mathbf u_h^n-\mathbf u_h\|_1 + \|\lambda_h^n-\lambda_h\|_{-1}\bigr)^2.
\]
For the iteration (6.1), we need to invert $G_h'(x^n)$, i.e., we need to solve the system
\[
G_h'(x^n)(x^{n+1}-x^n) = -G_h(x^n). \tag{6.2}
\]
From Theorem 5.2, we obtain that $G_h'(x^n)$ is an isomorphism from $X_h$ to $X_h^*$. Moreover, $\|G_h'(x^n)\|_{L(X_h,X_h^*)}$ is bounded, and the bound is independent of $h$ and $n$ if the initial value is chosen close enough to the true solution. Hence, based on preconditioning theory as in [2, 3], we see that any isomorphism from $X_h^*$ to $X_h$ is an optimal preconditioner for system (6.2). Due to this, we can construct some efficient preconditioners for (6.2).

Let $\boldsymbol\Delta_h$ and $\Delta_h$ be the finite element discretizations for the vector and scalar Laplacian operators $\boldsymbol\Delta$ and $\Delta$ on $\mathbf V_{h,0}$ and $V_{h,0}$, respectively. To be precise, $\boldsymbol\Delta_h : \mathbf V_{h,0}\to\mathbf V_{h,0}$ is the mapping defined by
\[
\langle\boldsymbol\Delta_h\mathbf u_h,\mathbf v\rangle = -\langle\nabla\mathbf u_h,\nabla\mathbf v\rangle,\qquad \mathbf v\in\mathbf V_{h,0}.
\]
Then the operator
\[
T_h = \begin{pmatrix} -\boldsymbol\Delta_h^{-1} & 0 \\ 0 & -\Delta_h \end{pmatrix}
\]
is an isomorphism from $X_h^*$ to $X_h$, with associated operator norm bounded independently of $h$. Thus, $T_h\circ G_h'(x^n)$ maps $X_h$ to $X_h$, with condition numbers bounded independently of $h$ and $n$. However, in order to make the preconditioner efficient, it is necessary to simplify the evaluation of the operator $T_h$. We therefore replace $\boldsymbol\Delta_h^{-1}$ by another spectrally equivalent operator, i.e., by a preconditioner for the discrete Laplacian using domain decomposition or multigrid methods [31, 33]. The linear system (6.2) is then solved by the preconditioned minimum residual method, with the modified operator $\tilde T_h$ as the preconditioner; cf. [28] or [18, Chapter 6]. Since the condition number of the operator $\tilde T_h\circ G_h'(x^n)$ is bounded independently of $h$ and $n$, so is the convergence rate of the iteration.

7. Numerical experiments. We present numerical experiments for the harmonic map problem with $M = S^1$, i.e., the unit circle. The domain $\Omega$ is always a square. The domain is triangulated by first dividing it into $h\times h$ squares. Then, each square is divided into two triangles by the diagonal with negative slope. The finite element problem (4.6) is to find $(\mathbf u_h,\lambda_h)\in\mathbf V_{h,g}\times V_{h,0}$ such that
\[
\begin{aligned}
\langle\nabla\mathbf u_h,\nabla\hat{\mathbf v}_h\rangle + \langle\pi_h(\mathbf u_h\cdot\hat{\mathbf v}_h),\lambda_h\rangle &= 0, &\hat{\mathbf v}_h&\in\mathbf V_{h,0},\\
\langle\pi_h(|\mathbf u_h|^2-1),\hat\mu_h\rangle &= 0, &\hat\mu_h&\in V_{h,0}.
\end{aligned} \tag{7.1}
\]
For the finite element method, we need to integrate over each element e ∈ Th . If we use the three vertices of e as the integration points, then the mass matrix reduces to a diagonal matrix. Correspondingly, the system (7.1) reduces to
\[
\begin{aligned}
-L_h\mathbf u_h + \lambda_h\mathbf u_h &= 0 &&\text{on } N_h,\\
|\mathbf u_h|^2 - 1 &= 0 &&\text{on } N_h.
\end{aligned} \tag{7.2}
\]
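The diagonal mass matrix produced by vertex quadrature, which underlies the reduction of (7.1) to the nodal system (7.2), can be checked on a single triangle. The following is a minimal sketch, assuming piecewise linear (P1) basis functions; the function names `p1_mass_exact` and `p1_mass_vertex_quadrature` are hypothetical and introduced only for illustration.

```python
def p1_mass_exact(area):
    """Exact P1 element mass matrix: area/6 on the diagonal, area/12 off it."""
    return [[area / 12.0 * (2.0 if i == j else 1.0) for j in range(3)]
            for i in range(3)]


def p1_mass_vertex_quadrature(area):
    """P1 element mass matrix computed with the three-vertex rule.

    The rule approximates the integral of g over the triangle by
    area/3 * (g(v0) + g(v1) + g(v2)). Since the P1 basis function phi_i
    equals 1 at vertex i and 0 at the other two, all products phi_i*phi_j
    with i != j vanish at every quadrature point.
    """
    mass = [[0.0] * 3 for _ in range(3)]
    for k in range(3):  # loop over quadrature points (the vertices)
        phi_at_vertex = [1.0 if i == k else 0.0 for i in range(3)]
        for i in range(3):
            for j in range(3):
                mass[i][j] += area / 3.0 * phi_at_vertex[i] * phi_at_vertex[j]
    return mass


if __name__ == "__main__":
    area = 0.5  # reference triangle with vertices (0,0), (1,0), (0,1)
    print("exact :", p1_mass_exact(area))
    print("lumped:", p1_mass_vertex_quadrature(area))
```

The lumped matrix has the row sums of the exact matrix on its diagonal, so the total mass of the element is preserved while the coupling between nodes disappears.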
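In (7.2), $L_h$ is the standard five-point finite difference approximation of the Laplacian. For the Newton iteration (6.1), we need to solve the system
\[
\begin{pmatrix} -L_h+\Lambda^n & \operatorname{diag}(\mathbf u^n) \\ \operatorname{diag}(\mathbf u^n)^t & 0 \end{pmatrix}
\begin{pmatrix} \mathbf u^{n+1}-\mathbf u^n \\ \lambda^{n+1}-\lambda^n \end{pmatrix}
=
\begin{pmatrix} L_h\mathbf u^n - \lambda^n\mathbf u^n \\ (1-|\mathbf u^n|^2)/2 \end{pmatrix}
\quad\text{on } N_h. \tag{7.3}
\]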
Here and below, we use the simplified notation $(\mathbf u^n,\lambda^n)$ instead of $(\mathbf u_h^n,\lambda_h^n)$. Furthermore, $\Lambda^n$ and $\operatorname{diag}(\mathbf u^n)$ are the matrix representations of the operators $\mathbf v\mapsto\pi_h(\lambda^n\mathbf v)$ and $\mu\mapsto\pi_h(\mu\mathbf u^n)$, respectively. From Theorem 5.2, it is interesting to observe that the block-diagonal matrix $T_h = \operatorname{diag}(L_h^{-1},L_h)$ is a uniform preconditioner for the matrix of system (7.3). For the Newton iteration (7.3) with the preconditioner $T_h = \operatorname{diag}(L_h^{-1},L_h)$, the matrix $L_h^{-1}$ in $T_h$ is replaced by a symmetric and spectrally equivalent multigrid operator, while the matrix $L_h$ is simply a discrete Laplacian with homogeneous Dirichlet boundary conditions. By doing so, no matrix needs to be inverted during the iterations. The cost per iteration is $O(N)$, where $N$ is the number of degrees of freedom of the discretization.

In the following, we will investigate if it is possible to replace Newton's method with a modified method where the linear system (6.2) is only solved to a given accuracy. More precisely, we shall compare the behavior of the exact and an inexact Newton solver:
• The exact Newton solver: This refers to the scheme where we solve the linear system (6.2) with a preconditioned minimum residual method, which is terminated when the residual is reduced by a factor of $10^{10}$.
• The inexact Newton solver: This refers to the scheme where the iterations for the linear system (6.2) are terminated when the residual is reduced by a factor of $10^{2}$.

In the tables, we show the numerical errors $e_n$ versus the iteration number $n$, where $e_n$ is defined as
\[
e_n = \|\mathbf u_h^n-\mathbf u_h\|_{H_h^1} + \|\lambda_h^n-\lambda_h\|_{H_h^{-1}}, \tag{7.4}
\]
where $\|x_h\|_{H_h^1}^2 = (\pi_hx_h)^t(I-L_h)\pi_hx_h$ and $\|y_h\|_{H_h^{-1}}^2 = (\pi_hy_h)^t(I-L_h)^{-1}\pi_hy_h$.
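The structure of the Newton system (7.3) can be illustrated on a one-dimensional analogue. The sketch below is not the authors' implementation: it assumes a uniform grid on $[0,1]$, replaces the five-point Laplacian by the three-point one, uses a dense direct solve instead of preconditioned MINRES, and starts from a smooth perturbation of the known solution. All names (`solve_dense`, `newton_harmonic_map_1d`, the perturbation sizes) are hypothetical. For boundary data on the unit circle, the nodal values $(\cos\theta_i,\sin\theta_i)$ with $\theta_i$ linear in $x_i$ solve the discrete system exactly, which gives a reference to compare against.

```python
import math


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            if f != 0.0:
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def newton_harmonic_map_1d(n=15, theta_end=math.pi / 2, tol=1e-10, max_iter=25):
    """Newton iteration for -L_h u + lam*u = 0, |u|^2 = 1 at n interior nodes."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    g0 = (1.0, 0.0)                                   # Dirichlet value at x = 0
    g1 = (math.cos(theta_end), math.sin(theta_end))   # Dirichlet value at x = 1
    # Assumed initial guess: smooth perturbation of the exact nodal solution.
    u = []
    for x in xs:
        th = theta_end * x + 0.05 * math.sin(math.pi * x)
        rad = 1.0 + 0.02 * math.sin(math.pi * x)
        u.append([rad * math.cos(th), rad * math.sin(th)])
    lam = [-theta_end ** 2] * n
    history = []
    for _ in range(max_iter):
        # Residuals of the two nodal equations.
        r1 = []
        for i in range(n):
            left = u[i - 1] if i > 0 else g0
            right = u[i + 1] if i < n - 1 else g1
            r1.append([(2.0 * u[i][k] - left[k] - right[k]) / h ** 2
                       + lam[i] * u[i][k] for k in range(2)])
        r2 = [0.5 * (u[i][0] ** 2 + u[i][1] ** 2 - 1.0) for i in range(n)]
        res = math.sqrt(sum(v * v for row in r1 for v in row)
                        + sum(v * v for v in r2))
        history.append(res)
        if res < tol:
            break
        # Jacobian with unknown ordering [u^1, u^2, lam], as in (7.3).
        jac = [[0.0] * (3 * n) for _ in range(3 * n)]
        for k in range(2):
            for i in range(n):
                row = k * n + i
                jac[row][row] = 2.0 / h ** 2 + lam[i]      # -L_h + Lambda
                if i > 0:
                    jac[row][row - 1] = -1.0 / h ** 2
                if i < n - 1:
                    jac[row][row + 1] = -1.0 / h ** 2
                jac[row][2 * n + i] = u[i][k]              # diag(u)
                jac[2 * n + i][row] = u[i][k]              # diag(u)^t
        rhs = ([-r1[i][0] for i in range(n)] + [-r1[i][1] for i in range(n)]
               + [-v for v in r2])
        d = solve_dense(jac, rhs)
        for i in range(n):
            u[i][0] += d[i]
            u[i][1] += d[n + i]
            lam[i] += d[2 * n + i]
    return xs, u, lam, history


if __name__ == "__main__":
    _, _, _, history = newton_harmonic_map_1d()
    for it, res in enumerate(history):
        print(f"iteration {it}: residual {res:.3e}")
```

Printing the residual history is a simple way to observe the fast local convergence that the exact and inexact solvers are compared against; in the paper's setting the dense solve would be replaced by the preconditioned minimum residual iteration.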
7.1. A smooth harmonic map. In the first example we consider a smooth harmonic map $\mathbf u = (\sin(\theta(x,y)),\cos(\theta(x,y)))$, with $\theta = k\log((x-a)^2+(y-b)^2)$ and $\lambda = -|\nabla\mathbf u|^2$ on $\Omega = [0,1]\times[0,1]$. We have used $a = b = -0.1$ and $k = 3$. The initial guess was $\mathbf u^0 = 2(\pi_h\mathbf u+\boldsymbol\varepsilon)$, where $\boldsymbol\varepsilon$ is a random noise vector field with values between $-0.3$ and $0.3$, and $\lambda^0 = 0$. When using the inexact Newton solver, the stopping criterion is met in fewer than 20 iterations, with a few exceptions in the first nonlinear iterations where the maximum was 80. For the exact Newton solver, the stopping criterion is met in fewer than 50 iterations, with a few exceptions in the first nonlinear iterations where as many as 300 iterations were required on the finest mesh. Hence, except for the first iterations, the required number of iterations seems to be bounded independently of the mesh size. This is due to the property of the preconditioner.

In Table 1 we estimate the convergence of the $L^2$ and $H^1$ norms of the error $\mathbf u-\mathbf u_h$ in terms of $h$. We observe linear convergence in $H^1$ and quadratic convergence in $L^2$. The convergence in $H^1$ is in accordance with the error estimate of Theorem 5.1. The improved rate of convergence in $L^2$ has not been justified in this paper, but this effect is in agreement with standard linear theory. Also, in the present example the observed convergence for $\lambda-\lambda_h$ is better than what Theorem 5.1 predicts.

A comparison of the exact Newton and inexact Newton solvers is shown in Table 2 for mesh size $h = 2^{-4}$. The convergence for other mesh sizes is similar. These tests indicate that the inexact Newton solver is nearly as efficient as the exact Newton solver. In Table 3, the convergence of the inexact Newton solver with different mesh sizes is shown. It shows the mesh independence property of the preconditioned iterative solver.

Table 1. The $L^2$ and $H^1$ errors of $\mathbf u$ and the $L^2$ error of $\lambda$ with respect to $h$.

| $h$ | $2^{-2}$ | $2^{-3}$ | $2^{-4}$ | $2^{-5}$ | $2^{-6}$ |
|---|---|---|---|---|---|
| $\|\mathbf u-\mathbf u_h\|_0$ | 6.7e-1 | 3.6e-2 | 9.4e-3 | 2.4e-3 | 6.0e-4 |
| $\|\mathbf u-\mathbf u_h\|_1$ | 4.6 | 1.1 | 5.7e-1 | 2.9e-1 | 1.4e-1 |
| $\|\lambda-\lambda_h\|_0$ | 4.2e-1 | 2.2e-2 | 1.6e-3 | 1.5e-4 | 1.2e-5 |

Table 2. Convergence for the exact and inexact Newton solver with $h = 2^{-4}$.

| | $e_1$ | $e_2$ | $e_3$ | $e_4$ | $e_5$ | $e_6$ | $e_7$ | $e_8$ |
|---|---|---|---|---|---|---|---|---|
| Exact | 3.2e+1 | 9.3 | 1.7 | 2.3e-1 | 4.0e-3 | 3.4e-6 | 2.6e-9 | |
| Inexact | 3.2e+1 | 9.5 | 1.7 | 2.4e-1 | 3.5e-3 | 1.1e-5 | 1.0e-7 | 2.7e-9 |

Table 3. Convergence for the inexact Newton solver.

| $h$ \ it. | $e_1$ | $e_2$ | $e_3$ | $e_4$ | $e_5$ | $e_6$ | $e_7$ | $e_8$ |
|---|---|---|---|---|---|---|---|---|
| $2^{-2}$ | 9.2 | 2.6 | 4.7e-1 | 2.8e-2 | 1.9e-4 | 9.9e-7 | 7.7e-9 | 7.6e-10 |
| $2^{-3}$ | 1.6e+1 | 4.7 | 9.1e-1 | 7.6e-2 | 8.8e-4 | 4.0e-6 | 7.9e-8 | 1.4e-9 |
| $2^{-4}$ | 3.2e+1 | 9.5 | 1.7 | 2.4e-1 | 3.5e-3 | 1.1e-5 | 1.0e-7 | 2.7e-9 |
| $2^{-5}$ | 6.4e+1 | 2.4e+1 | 3.6 | 9.6e-1 | 1.5e-2 | 4.7e-5 | 1.5e-6 | 6.6e-9 |

7.2. A harmonic map with singularity. As is well known, the solution of the harmonic map problem is generally not unique and may have singularities even with smooth data. In order to show the applicability of our algorithms for these problems, we test a problem with a singular solution, i.e., $\mathbf u = (x/r,y/r)$, with $r = \sqrt{x^2+y^2}$ and $\lambda = -|\nabla\mathbf u|^2$ on $\Omega = [-0.5,0.5]\times[-0.5,0.5]$. The pair $(\mathbf u,\lambda)$ corresponds to a classical solution of the saddle point system away from the origin, but $\|\mathbf u\|_1 = \infty$. Therefore, this example is not covered by our theoretical results, but we include the example to illustrate additional effects. The Dirichlet boundary conditions are obtained from the analytical solution, while the initial value for $\lambda$ is $\lambda^0 = 0$ everywhere except in $(0,0)$, where $\lambda = 1$. The initial value for $\mathbf u$ is shown in Figure 1(a). The computed solution is shown in Figure 1(b). The numerical errors are given in Table 4. The errors indicate that both $\mathbf u_h$ and $\lambda_h$ converge linearly to the solution when measured in $L^2$. It is interesting to observe that we get convergence for $\|\mathbf u-\mathbf u_h\|_0$ and $\|\lambda-\lambda_h\|_0$ even without mesh refinement around the singularity.

For this example, the Newton solvers are unstable and do not always converge. Thus, we have used the following iteration to produce the initial value for the Newton solvers:
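\[
\begin{pmatrix} -L_h & \operatorname{diag}(\mathbf u^n) \\ \operatorname{diag}(\mathbf u^n)^t & 0 \end{pmatrix}
\begin{pmatrix} \mathbf u^{n+1}-\mathbf u^n \\ \lambda^{n+1}-\lambda^n \end{pmatrix}
=
\begin{pmatrix} L_h\mathbf u^n - \lambda^n\mathbf u^n \\ (1-|\mathbf u^n|^2)/2 \end{pmatrix}. \tag{7.5}
\]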
Fig. 1. Plot of the initial solutions and the computed solutions. (a) The first initial solution. (b) The solution for (a). (c) The second initial solution. (d) The solution for (c).

Table 4. Errors with respect to $h$ for the singular problem.

| $h$ | $2^{-3}$ | $2^{-4}$ | $2^{-5}$ | $2^{-6}$ |
|---|---|---|---|---|
| $\|\mathbf u-\mathbf u_h\|_0$ | 2.2e-1 | 1.3e-1 | 7.4e-2 | 4.0e-2 |
| $\|\lambda-\lambda_h\|_0$ | 8.3e-1 | 4.1e-1 | 2.1e-1 | 1.0e-1 |
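The convergence rates quoted for Tables 1 and 4 can be checked directly from the tabulated errors. The following is a small sketch, assuming only that $h$ is halved between consecutive columns, so that the observed order is $\log_2$ of the ratio of successive errors; the helper name `observed_orders` and the dictionary labels are hypothetical.

```python
import math


def observed_orders(errors):
    """Observed convergence order between consecutive meshes (h halved each time)."""
    return [math.log2(coarse / fine) for coarse, fine in zip(errors, errors[1:])]


# Errors copied from Table 1 (h = 2^-2, ..., 2^-6).
TABLE1 = {
    "u, L2 norm": [6.7e-1, 3.6e-2, 9.4e-3, 2.4e-3, 6.0e-4],
    "u, H1 norm": [4.6, 1.1, 5.7e-1, 2.9e-1, 1.4e-1],
    "lambda, L2 norm": [4.2e-1, 2.2e-2, 1.6e-3, 1.5e-4, 1.2e-5],
}

# Errors copied from Table 4 (h = 2^-3, ..., 2^-6).
TABLE4 = {
    "u, L2 norm": [2.2e-1, 1.3e-1, 7.4e-2, 4.0e-2],
    "lambda, L2 norm": [8.3e-1, 4.1e-1, 2.1e-1, 1.0e-1],
}

if __name__ == "__main__":
    for name, table in (("Table 1", TABLE1), ("Table 4", TABLE4)):
        for label, errs in table.items():
            orders = [round(p, 2) for p in observed_orders(errs)]
            print(f"{name}, {label}: {orders}")
```

Because the tabulated errors carry only two significant digits, the computed orders are indicative rather than precise, especially on the coarsest meshes.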
Table 5. Convergence for the inexact Newton solver for the singular problem.

| $e_1$ | $e_5$ | $e_{10}$ | $e_{11}$ | $e_{12}$ | $e_{13}$ | $e_{14}$ |
|---|---|---|---|---|---|---|
| 1.1e+1 | 6.4e-1 | 1.1e-1 | 8.1e-2 | 9.7e-4 | 2.4e-7 | 1.2e-8 |
Compared with (7.3), the matrix Λn has been dropped. This iterative scheme is globally convergent and is normally slower than the Newton solvers. Its convergence properties will be analyzed and discussed elsewhere. We do ten iterations of (7.5), and the inexact Newton solver is then turned on. The results are shown in Table 5 for h = 2−4 , where it is clear that we have quadratic convergence in the last iterations. For the smooth problem tested in section 7.1, it seems that the iterative solution always converges to the same solution no matter what kind of initial solution we use. For the problem here, we have noticed that the saddle point problem may have
multiple solutions. With another initial solution, as shown in Figure 1(c), we obtain another solution, which is shown in Figure 1(d).

Acknowledgment. The authors are grateful to Kent Mardal who has supplied the numerical experiments for this work.

REFERENCES
[1] F. Alouges, A new algorithm for computing liquid crystal stable configurations: The harmonic mapping case, SIAM J. Numer. Anal., 34 (1997), pp. 1708–1726.
[2] D. N. Arnold, R. S. Falk, and R. Winther, Preconditioning discrete approximations of the Reissner–Mindlin plate model, M2AN Math. Model. Numer. Anal., 31 (1997), pp. 517–557.
[3] D. N. Arnold, R. S. Falk, and R. Winther, Preconditioning in H(div) and applications, Math. Comp., 66 (1997), pp. 957–984.
[4] J. Barrett, S. Bartels, X. Feng, and A. Prohl, A convergent and constraint-preserving finite element method for the p-harmonic flow into spheres, SIAM J. Numer. Anal., 45 (2007), pp. 905–927.
[5] S. Bartels, Stability and convergence of finite-element approximation schemes for harmonic maps, SIAM J. Numer. Anal., 43 (2005), pp. 220–238.
[6] D. Bertsekas, Constrained Minimization and Lagrange Multiplier Methods, Athena Scientific, Belmont, MA, 1996.
[7] H. Brezis, The interplay between analysis and topology in some nonlinear PDE problems, Bull. Amer. Math. Soc., 40 (2003), pp. 179–201.
[8] F. Brezzi, On the existence, uniqueness and approximation of saddle-point problems arising from Lagrangian multipliers, RAIRO Anal. Numér., 8 (1974), pp. 129–151.
[9] F. Brezzi and M. Fortin, Mixed and Hybrid Finite Element Methods, Springer, New York, 1991.
[10] F. Brezzi, J. Rappaz, and P. Raviart, Finite dimensional approximation of nonlinear problems Part I: Branches of nonsingular solution, Numer. Math., 36 (1980), pp. 1–25.
[11] Y. Chen and M. Struwe, Existence and partial regularity results for the heat flow for harmonic maps, Math. Z., 201 (1989), pp. 83–103.
[12] Y. Chen, The weak solutions to the evolution of harmonic maps, Math. Z., 201 (1989), pp. 69–74.
[13] R. Cohen, R. Hardt, D. Kinderlehrer, S. Lin, and M. Luskin, Minimum energy configurations for liquid crystals: Computational results, in Theory and Applications of Liquid Crystals, IMA Vol. Math. Appl. 5, Springer, New York, 1987, pp. 99–121.
[14] Q. Du, B. Guo, and J. Shen, Fourier spectral approximation to a dissipative system modeling the flow of liquid crystals, SIAM J. Numer. Anal., 39 (2001), pp. 735–762.
[15] Q. Du, B. Guo, and J. Shen, Corrigendum: Fourier spectral approximation to a dissipative system modeling the flow of liquid crystals, SIAM J. Numer. Anal., 41 (2003), pp. 796–798.
[16] W. E and X. Wang, Numerical Methods for the Landau–Lifshitz equation, SIAM J. Numer. Anal., 38 (2000), pp. 1647–1665.
[17] I. Ekeland and R. Temam, Convex Analysis and Variational Problems, Classics Appl. Math. 28, SIAM, Philadelphia, PA, 1999.
[18] H. Elman, D. Silvester, and A. J. Wathen, Finite Elements and Fast Iterative Solvers with Applications in Incompressible Fluid Dynamics, Oxford University Press, London, 2005.
[19] R. Glowinski and P. Le Tallec, Augmented Lagrangian and Operator Splitting Methods in Nonlinear Mechanics, SIAM, Philadelphia, PA, 1989.
[20] R. Glowinski, P. Lin, and X. Pan, An operator-splitting method for a liquid crystal model, Comput. Phys. Comm., 152 (2003), pp. 242–252.
[21] R. Hardt, D. Kinderlehrer, and M. Luskin, Remarks about the mathematical theory of liquid crystals, in Calculus of Variations and Partial Differential Equations, Lecture Notes in Math. 1340, Springer, Berlin, 1988, pp. 123–138.
[22] F. Hélein, Régularité des applications faiblement harmoniques entre une surface et une variété riemannienne, C. R. Acad. Sci. Paris, 312 (1991), pp. 591–596.
[23] W. Jäger and H. Kaul, Uniqueness and stability of harmonic maps and their Jacobi fields, Manuscripta Math., 28 (1979), pp. 269–291.
[24] J. Jost, Riemannian Geometry and Geometric Analysis, 4th ed., Springer, Heidelberg, 2005.
[25] S. Lin and M. Luskin, Relaxation methods for liquid crystal problems, SIAM J. Numer. Anal., 26 (1989), pp. 1310–1324.
[26] M. Lysaker, S. Osher, and X.-C. Tai, Noise removal using smoothed normals and surface fitting, IEEE Trans. Image Process., 13 (2004), pp. 1345–1357.
[27] A. Quarteroni, R. Sacco, and F. Saleri, Numerical Mathematics, Springer, New York, 2000.
[28] T. Rusten and R. Winther, A preconditioned iterative method for saddlepoint problems, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 887–904.
[29] R. Schoen and S. T. Yau, Lectures on Harmonic Maps, International Press, Somerville, MA, 1997.
[30] M. Struwe, Variational Methods, 3rd ed., Springer, New York, 2000.
[31] X. C. Tai and J. C. Xu, Global and uniform convergence of subspace correction methods for some convex optimization problems, Math. Comp., 71 (2001), pp. 105–124.
[32] L. Vese and S. Osher, Numerical methods for p-harmonic flows and applications to image processing, SIAM J. Numer. Anal., 40 (2002), pp. 2085–2104.
[33] J. Xu, Iterative methods by space decomposition and subspace correction, SIAM Rev., 34 (1992), pp. 581–613.
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1524–1545
© 2009 Society for Industrial and Applied Mathematics

FIRST-ORDER SYSTEM LEAST-SQUARES METHODS FOR AN OPTIMAL CONTROL PROBLEM BY THE STOKES FLOW∗

SOOROK RYU†, HYUNG-CHUN LEE‡, AND SANG DONG KIM§

Abstract. The least-squares approximations of an optimal control problem governed by the Stokes equations are considered, which leads to an unconstrained coupled optimization problem by the Lagrange multiplier method. The least-squares functionals for the two- and three-dimensional first-order coupled optimality systems are employed by modifying those functionals in [Z. Cai, T. A. Manteuffel, and S. F. McCormick, SIAM J. Numer. Anal., 34 (1997), pp. 1727–1741]. The established ellipticity and continuity in a product $H^1$ norm yield the optimal discretization error estimates in the finite element spaces. For numerical tests, we apply V-cycle multigrid methods to the whole discrete algebraic system.

Key words. optimal control, least-squares finite element methods, coupled Stokes equations, V-cycle

AMS subject classifications. 65M55, 65N30, 49J20, 49K20

DOI. 10.1137/070701157
∗ Received by the editors August 27, 2007; accepted for publication (in revised form) September 5, 2008; published electronically April 1, 2009. This work was supported by Korea Research Foundation under grant KRF-2005-070-C00017. http://www.siam.org/journals/sinum/47-2/70115.html
† Department of Industrial and Applied Mathematics, Kyungpook National University, Daegu 702-701, Korea ([email protected]).
‡ Department of Mathematics, Ajou University, Suwon 443-749, Korea ([email protected]).
§ Department of Mathematics, Kyungpook National University, Daegu 702-701, Korea (skim@knu.ac.kr).

1. Introduction. Optimal control problems involving partial differential equations have been interesting subjects to experimentalists because of computational approaches. Unfortunately, most computational efforts have employed basic optimization strategies. But more recently, there has been considerable practical and theoretical interest in sophisticated optimization strategies such as the Lagrange multiplier method, sensitivity- or adjoint-based gradient methods, and quasi-Newton methods (see [15], for example). The Lagrange multiplier rule is a standard approach to solving optimization and control problems constrained by partial differential equations. This approach has been studied extensively in the literature, including [14], [15], [19], [20], and [24]. There has also been considerable interest in least-squares–type methods for Stokes flow problems, resulting in physically meaningful new variables which induce the corresponding first-order system (see [5], [8], [9], [11], [21], and [22]).

The applications of least-squares principles to optimality systems were previously discussed in [2], [3], [4], [6], and [7]. The rigorous analysis of a penalty/least-squares approach was introduced in [16], [17], and [23]. Recently, [7] developed an abstract form for the first-order system least-squares (FOSLS) for optimal control problems governed by linear, elliptic partial differential equations. Unfortunately, they discuss neither the whole coupled system concretely nor important facts regarding practical results of optimal control problems using FOSLS. In this paper, we complete the work presented in [7] by analyzing the methods from a practical point of view. By drawing on [7] and [11], we provide a nice synthesis of optimal control by the Stokes flow and FOSLS. Further, we show that the least-squares methods are suited to these optimal control problems because of the ability to weight residuals according to the penalty parameter $\sigma$ (defined later) and the viscosity $\nu$.

The object of this paper is to combine the Lagrange multiplier method and the FOSLS methods for a distributed control problem by Stokes flow for use in multigrid methods. The state system is given by the Stokes equations
\[
\begin{cases}
-\nu\Delta\mathbf u + \nabla p = \mathbf f & \text{in } \Omega,\\
\nabla\cdot\mathbf u = 0 & \text{in } \Omega,\\
\mathbf u = \mathbf 0 & \text{on } \partial\Omega,\\
\int_\Omega p\,dx = 0,
\end{cases} \tag{1.1}
\]
where $\mathbf u$ and $p$ denote the velocity and pressure variables, respectively, $\nu$ is the constant kinematic viscosity, and $\mathbf f$ is the control function in $L^2(\Omega)$. Here $\Omega\subset\mathbb R^n$ ($n = 2$ or $3$) is a bounded convex polygon (polyhedron) or has a $C^{1,1}$ boundary. The objective functional is defined by
\[
J(\mathbf u,\mathbf f) = \frac12\int_\Omega|\mathbf u-\hat{\mathbf u}|^2\,dx + \frac{\sigma}{2}\int_\Omega|\mathbf f|^2\,dx, \tag{1.2}
\]
where $\hat{\mathbf u}$ is a given target velocity and $\sigma$ is a positive penalty parameter. The optimization problem we consider is to find an optimal state $(\mathbf u,p)$ and an optimal control $\mathbf f$ which minimize the $L^2$ norm distance between $\mathbf u$ and $\hat{\mathbf u}$ subject to $(\mathbf u,p,\mathbf f)$ satisfying (1.1). The second term in (1.2) is added to limit the cost of control, and the positive penalty parameter $\sigma$ can be used to change the relative importance of the two terms.

Such a constrained optimization problem can be converted into an unconstrained optimization problem by the Lagrange multiplier rule so that one may have a coupled optimality system related to two Stokes-type equations associated with state and adjoint variables. This optimality system can be dealt with by using mixed finite element approaches. But we can use FOSLS techniques in [11] by introducing the state flux $\mathbf U = \nabla\mathbf u^t$ of the state velocity $\mathbf u$ and the adjoint flux $\mathbf V = \nabla\mathbf v^t$ of the adjoint velocity $\mathbf v$ so that we reformulate the coupled second-order optimality systems as coupled first-order optimality systems. This process enables us to avoid using finite elements satisfying the LBB conditions.

For the proof of a product $H^1$ norm equivalence of an $L^2$ functional, we first establish the coercivity of a FOSLS formulation employing the $H^{-1}$-$L^2$ norm as a vehicle. Then we show that our $L^2$ functional is elliptic in a product $H^1$ norm. In addition, we show $H^2$-regularity of the coupled Stokes equations according to the Agmon–Douglis–Nirenberg (ADN) theory in [1] by following [22] (see the appendix). Our proof of the $H^1$ norm equivalence is shown only for $n = 3$ because its proof can be easily modified to the two-dimensional case. But numerical computations are implemented for a two-dimensional model problem defined on the unit square domain.

In numerical tests, it is important to use appropriate weights in least-squares functionals (see section 5). To get desirable numerical results, we balance the adjoint residual terms by weighting them with $1/\sigma^2$ in the functionals (see section 2). Hence, in order to emphasize the importance of this weight, we compare the approximate solutions of the weighted minimizing functional with those of the nonweighted (or partially weighted) minimizing functional in section 5. Taking a target velocity as the solution of the Stokes equations, we observe the controlled flows computed by multigrid methods for the whole discrete algebraic system by decreasing $\sigma$ and by taking several viscosities $\nu$ (see section 5).
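The role of the penalty parameter $\sigma$ in (1.2) can be illustrated with a scalar caricature. The sketch below is only an analogy: it assumes the state equation is replaced by the scalar relation $a\,u = f$ with a hypothetical coefficient $a > 0$, so that $J(f) = \tfrac12(f/a-\hat u)^2 + \tfrac{\sigma}{2}f^2$ has the closed-form minimizer $f = a\hat u/(1+\sigma a^2)$. The function names are introduced here for illustration.

```python
def optimal_control_scalar(a, u_target, sigma):
    """Minimizer of J(f) = 0.5*(f/a - u_target)**2 + 0.5*sigma*f**2.

    Setting J'(f) = (f/a - u_target)/a + sigma*f = 0 gives the formula below.
    Returns the pair (state u, control f).
    """
    f = a * u_target / (1.0 + sigma * a * a)
    return f / a, f


def cost(a, u_target, sigma, f):
    """Scalar analogue of the objective functional (1.2)."""
    u = f / a
    return 0.5 * (u - u_target) ** 2 + 0.5 * sigma * f * f


if __name__ == "__main__":
    a, u_target = 2.0, 1.0  # illustrative values, not taken from the paper
    for sigma in (1.0, 1e-1, 1e-2, 1e-3):
        u, f = optimal_control_scalar(a, u_target, sigma)
        print(f"sigma={sigma:g}  u={u:.4f}  f={f:.4f}  "
              f"|u - target|={abs(u - u_target):.4f}")
```

In this toy setting a smaller $\sigma$ makes control cheaper, so the optimal state moves closer to the target; this is the trade-off that the second term of (1.2) is meant to regulate.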
This paper proceeds as follows. In section 2, we present the first-order optimality systems related to the state-adjoint systems and the optimality condition and then we provide some notation and preliminaries. In section 3, two types of FOSLS functionals are defined for the coupled least-squares problems for which the existence and uniqueness of the solutions are shown. In section 4, the optimal error estimates for the least-squares functionals are proved. Various numerical results are presented by changing the penalty parameter in section 5. Finally, some conclusions are given in the last section, and a proof of $H^2$-regularity for coupled Stokes-type equations is added in the appendix.

2. Coupled first-order system formulations. From the Lagrange multiplier rule (see [15] and [24]), the optimality system which minimizes (1.2) subject to (1.1) is given by the Stokes system (1.1) and the adjoint system
\[
\begin{cases}
\nu\Delta\mathbf v + \nabla q + \mathbf u = \hat{\mathbf u} & \text{in } \Omega,\\
\nabla\cdot\mathbf v = 0 & \text{in } \Omega,\\
\mathbf v = \mathbf 0 & \text{on } \partial\Omega,\\
\int_\Omega q\,dx = 0,
\end{cases} \tag{2.1}
\]
with the optimality condition
\[
-\frac{1}{\sigma}\mathbf v = \mathbf f \quad\text{in } \Omega. \tag{2.2}
\]
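The signs in (2.1) and (2.2) can be recovered formally from a Lagrangian. The following is a sketch under stated assumptions: the functional $\mathcal L$, its sign convention for the multipliers $(\mathbf v,q)$, and the test directions $\mathbf w$, $r$, $\mathbf g$ are notation introduced here for illustration and are not taken from the source; all functions are assumed smooth enough to integrate by parts, $\mathbf w$ and $\mathbf v$ vanish on $\partial\Omega$, and the zero-mean constraints are not tracked.

```latex
% Assumed Lagrangian: objective minus multipliers times the constraints of (1.1).
\begin{aligned}
\mathcal{L}(\mathbf u,p,\mathbf f,\mathbf v,q)
  &= J(\mathbf u,\mathbf f)
   - (-\nu\Delta\mathbf u + \nabla p - \mathbf f,\ \mathbf v)
   - (\nabla\cdot\mathbf u,\ q), \\[4pt]
% Variation in the state velocity; integrate by parts twice in the Laplacian term.
\partial_{\mathbf u}\mathcal{L}[\mathbf w]
  &= (\mathbf u - \hat{\mathbf u},\mathbf w) + \nu(\Delta\mathbf w,\mathbf v) - (\nabla\cdot\mathbf w, q)
   = (\mathbf u - \hat{\mathbf u} + \nu\Delta\mathbf v + \nabla q,\ \mathbf w) = 0
  &&\Longrightarrow\quad \nu\Delta\mathbf v + \nabla q + \mathbf u = \hat{\mathbf u}, \\[4pt]
% Variation in the pressure.
\partial_{p}\mathcal{L}[r]
  &= -(\nabla r,\mathbf v) = (r,\nabla\cdot\mathbf v) = 0
  &&\Longrightarrow\quad \nabla\cdot\mathbf v = 0, \\[4pt]
% Variation in the control.
\partial_{\mathbf f}\mathcal{L}[\mathbf g]
  &= \sigma(\mathbf f,\mathbf g) + (\mathbf g,\mathbf v) = 0
  &&\Longrightarrow\quad \mathbf f = -\frac{1}{\sigma}\mathbf v .
\end{aligned}
```

With this sign convention the three stationarity conditions reproduce the first two equations of (2.1) and the optimality condition (2.2); a different sign choice for the multipliers would give an equivalent system with $\mathbf v$ and $q$ replaced by their negatives.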
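Unfortunately, the resulting optimality systems (1.1) and (2.1)–(2.2) are coupled. Due to the success of the FOSLS approach on Stokes equations for the last decade, we adopt the notation and formulations in [11] in order to solve the optimality systems by the FOSLS methods. As a result, with the state velocity flux variable $\mathbf U = \nabla\mathbf u^t = (\nabla u_1,\ldots,\nabla u_n)$ and the adjoint velocity flux variable $\mathbf V = \nabla\mathbf v^t = (\nabla v_1,\ldots,\nabla v_n)$, the optimality systems turn out to be the following equivalent first-order systems:
\[
\begin{cases}
-\nu(\nabla\cdot\mathbf U)^t + \nabla p = \mathbf f & \text{in } \Omega,\\
\mathbf U - \nabla\mathbf u^t = \mathbf 0 & \text{in } \Omega,\\
\nabla\cdot\mathbf u = 0 & \text{in } \Omega,\\
\mathbf u = \mathbf 0 & \text{on } \partial\Omega,
\end{cases} \tag{2.3}
\]
\[
\begin{cases}
\nu(\nabla\cdot\mathbf V)^t + \nabla q + \mathbf u = \hat{\mathbf u} & \text{in } \Omega,\\
\mathbf V - \nabla\mathbf v^t = \mathbf 0 & \text{in } \Omega,\\
\nabla\cdot\mathbf v = 0 & \text{in } \Omega,\\
\mathbf v = \mathbf 0 & \text{on } \partial\Omega,
\end{cases} \tag{2.4}
\]
and (2.2). From now on, we replace the control $\mathbf f$ in (2.3) by (2.2). Then, following the developments in [11], we obtain the equivalent extended optimality system for (2.3)–(2.4) and (2.2):
\[
\mathcal L_1(\mathbf U,\mathbf u,p,\mathbf v;\nu,\sigma) :=
\begin{cases}
-\nu(\nabla\cdot\mathbf U)^t + \nabla p + \dfrac{\mathbf v}{\sigma} = \mathbf 0 & \text{in } \Omega,\\
\mathbf U - \nabla\mathbf u^t = \mathbf 0 & \text{in } \Omega,\\
\nabla\cdot\mathbf u = 0 & \text{in } \Omega,\\
\nabla\times\mathbf U = \mathbf 0 & \text{in } \Omega,\\
\nabla(\operatorname{tr}\mathbf U) = \mathbf 0 & \text{in } \Omega,\\
\mathbf u = \mathbf 0 & \text{on } \partial\Omega,\\
\mathbf n\times\mathbf U = \mathbf 0 & \text{on } \partial\Omega,
\end{cases} \tag{2.5}
\]
and
\[
\mathcal L_2(\mathbf V,\mathbf v,q,\mathbf u;\nu) :=
\begin{cases}
\nu(\nabla\cdot\mathbf V)^t + \nabla q + \mathbf u = \hat{\mathbf u} & \text{in } \Omega,\\
\mathbf V - \nabla\mathbf v^t = \mathbf 0 & \text{in } \Omega,\\
\nabla\cdot\mathbf v = 0 & \text{in } \Omega,\\
\nabla\times\mathbf V = \mathbf 0 & \text{in } \Omega,\\
\nabla(\operatorname{tr}\mathbf V) = \mathbf 0 & \text{in } \Omega,\\
\mathbf v = \mathbf 0 & \text{on } \partial\Omega,\\
\mathbf n\times\mathbf V = \mathbf 0 & \text{on } \partial\Omega.
\end{cases} \tag{2.6}
\]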
The standard notation and definitions for the Sobolev spaces $H^s(\Omega)$ with their associated inner products $(\cdot,\cdot)_s$ and norms $\|\cdot\|_s$, $s\ge 0$, will be used. For example, $H^0(\Omega)$ is the usual $L^2(\Omega)$ with the norm $\|\cdot\|_0 = \|\cdot\|$ and inner product $(\cdot,\cdot)$. $L_0^2(\Omega)$ is the subspace of all square integrable functions with zero mean. The space $H^{-1}(\Omega)$ denotes the dual of $H_0^1(\Omega)$ equipped with the norm
\[
\|\phi\|_{-1} = \sup_{0\ne v\in H_0^1(\Omega)}\frac{(\phi,v)}{\|v\|_1}.
\]
Finally, the standard div space $H(\operatorname{div};\Omega)$ will be used (see [13]).

3. Norm equivalences of first-order least-squares functionals. In this section, we show $H^1$-ellipticity of the $L^2$ least-squares functional corresponding to (2.5) and (2.6) by proving that the associated $H^{-1}$-$L^2$ functional is elliptic in a product $L^2$-$H^1$ norm. As suggested in [11], the weight $\nu^2$ will be given for the residuals $\mathbf U-\nabla\mathbf u^t$ and $\nabla\cdot\mathbf u$ to balance with the residual of the first-order state momentum equation. Accordingly, the weight $1/\sigma^2$ is used in the adjoint system for balancing weights between norms from the state and adjoint systems (see section 5 for computational results). In this sense, the $H^{-1}$-$L^2$ least-squares functional is defined for the system (2.3)–(2.4) and (2.2) as
\[
\begin{aligned}
G_1(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q;\hat{\mathbf u}) ={}& \Bigl\|\nu(\nabla\cdot\mathbf U)^t - \nabla p - \frac{\mathbf v}{\sigma}\Bigr\|_{-1}^2 + \nu^2\|\mathbf U-\nabla\mathbf u^t\|^2 + \nu^2\|\nabla\cdot\mathbf u\|^2 \\
&+ \frac{1}{\sigma^2}\|\nu(\nabla\cdot\mathbf V)^t + \nabla q + \mathbf u - \hat{\mathbf u}\|_{-1}^2 + \frac{\nu^2}{\sigma^2}\|\mathbf V-\nabla\mathbf v^t\|^2 + \frac{\nu^2}{\sigma^2}\|\nabla\cdot\mathbf v\|^2,
\end{aligned}
\]
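and the $L^2$ functional for the extended system (2.5) and (2.6) is defined as
\[
\begin{aligned}
G_2(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q;\hat{\mathbf u}) ={}& \Bigl\|\nu(\nabla\cdot\mathbf U)^t - \nabla p - \frac{\mathbf v}{\sigma}\Bigr\|^2 + \nu^2\|\mathbf U-\nabla\mathbf u^t\|^2 + \nu^2\|\nabla\cdot\mathbf u\|^2 + \nu^2\|\nabla\times\mathbf U\|^2 \\
&+ \nu^2\|\nabla\operatorname{tr}(\mathbf U)\|^2 + \frac{1}{\sigma^2}\|\nu(\nabla\cdot\mathbf V)^t + \nabla q + \mathbf u - \hat{\mathbf u}\|^2 + \frac{\nu^2}{\sigma^2}\|\mathbf V-\nabla\mathbf v^t\|^2 \\
&+ \frac{\nu^2}{\sigma^2}\|\nabla\cdot\mathbf v\|^2 + \frac{\nu^2}{\sigma^2}\|\nabla\times\mathbf V\|^2 + \frac{\nu^2}{\sigma^2}\|\nabla\operatorname{tr}(\mathbf V)\|^2.
\end{aligned} \tag{3.1}
\]
Then we show the existence of solutions minimizing the quadratic functional $G_2(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q;\hat{\mathbf u})$ over the proper solution spaces $\mathbf W\times\mathbf W$ which will be defined later: find $(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q)\in\mathbf W\times\mathbf W$ such that
\[
(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q) = \arg\inf_{(\boldsymbol\Psi,\boldsymbol\psi,r,\boldsymbol\Xi,\boldsymbol\xi,w)\in\mathbf W\times\mathbf W} G_2(\boldsymbol\Psi,\boldsymbol\psi,r,\boldsymbol\Xi,\boldsymbol\xi,w;\hat{\mathbf u}). \tag{3.2}
\]
Let $\mathbf W_1 = H(\operatorname{div};\Omega)^n\times H_0^1(\Omega)^n\times(L_0^2(\Omega)\cap H^1(\Omega))$.

Lemma 3.1. For any $(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q)\in\mathbf W_1\times\mathbf W_1$, there exists a positive constant $C$ such that
\[
\|p\| \le C\Bigl(\Bigl\|\nu(\nabla\cdot\mathbf U)^t - \nabla p - \frac{\mathbf v}{\sigma}\Bigr\|_{-1} + \nu\|\mathbf U\| + \frac{1}{\sigma}\|\mathbf v\|_{-1}\Bigr) \tag{3.3}
\]
and
\[
\|q\| \le C\bigl(\|\nu(\nabla\cdot\mathbf V)^t + \nabla q + \mathbf u\|_{-1} + \nu\|\mathbf V\| + \|\mathbf u\|_{-1}\bigr). \tag{3.4}
\]
Proof. Since the proofs of (3.3) and (3.4) are similar, we will provide the proof of (3.3). First, note that, for any $(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q)\in\mathbf W_1\times\mathbf W_1$ and $\boldsymbol\phi\in H_0^1(\Omega)^n$, we have
\[
\begin{aligned}
(\nabla p,\boldsymbol\phi) &= \Bigl(-\nu(\nabla\cdot\mathbf U)^t + \nabla p + \frac{\mathbf v}{\sigma},\ \boldsymbol\phi\Bigr) + \nu((\nabla\cdot\mathbf U)^t,\boldsymbol\phi) - \frac{1}{\sigma}(\mathbf v,\boldsymbol\phi) \\
&= \Bigl(-\nu(\nabla\cdot\mathbf U)^t + \nabla p + \frac{\mathbf v}{\sigma},\ \boldsymbol\phi\Bigr) - \nu(\mathbf U,\nabla\boldsymbol\phi^t) - \frac{1}{\sigma}(\mathbf v,\boldsymbol\phi) \\
&\le \Bigl\|-\nu(\nabla\cdot\mathbf U)^t + \nabla p + \frac{\mathbf v}{\sigma}\Bigr\|_{-1}\|\boldsymbol\phi\|_1 + \nu\|\mathbf U\|\,\|\boldsymbol\phi\|_1 + \frac{1}{\sigma}\|\mathbf v\|_{-1}\|\boldsymbol\phi\|_1.
\end{aligned}
\]
Hence, using the fact that $\|p\|\le C\|\nabla p\|_{-1}$ for any $p\in L_0^2(\Omega)$ (see [25]), it follows that
\[
\|p\| \le C\Bigl(\Bigl\|-\nu(\nabla\cdot\mathbf U)^t + \nabla p + \frac{\mathbf v}{\sigma}\Bigr\|_{-1} + \nu\|\mathbf U\| + \frac{1}{\sigma}\|\mathbf v\|_{-1}\Bigr),
\]
which is (3.3).

For convenience, we denote $M_1$ and $M_2$ as the following norms:
\[
M_1(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q) := \|\mathbf U\|^2 + \|\mathbf u\|_1^2 + \|p\|^2 + \|\mathbf V\|^2 + \|\mathbf v\|_1^2 + \|q\|^2
\]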
and
\[
M_2(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q) := \|\mathbf U\|_1^2 + \|\mathbf u\|_1^2 + \|p\|_1^2 + \|\mathbf V\|_1^2 + \|\mathbf v\|_1^2 + \|q\|_1^2.
\]
In order to show the existence and uniqueness of the solution for (3.2), the coercivity and continuity of the bilinear form $B(\cdot,\cdot)$ corresponding to $G_2$ should be provided. This can be achieved by showing the equivalence of $M_2$ and $G_2$, where the equivalence in this norm is dependent on the viscosity constant $\nu$ and the penalty parameter $\sigma$. Let us define for the proper solution spaces
\[
\mathbf W_1 = L^2(\Omega)^{n^2}\times H_0^1(\Omega)^n\times L_0^2(\Omega)
\]
and, with
\[
\mathbf V_0 = \{\mathbf V\in H^1(\Omega)^{n^2} : \mathbf n\times\mathbf V = \mathbf 0 \ \text{on } \partial\Omega\},
\]
\[
\mathbf W = \mathbf V_0\times H_0^1(\Omega)^n\times(H^1(\Omega)\cap L_0^2(\Omega)).
\]
Note that the $H^1$ norm equivalence of the functional $G_2$ can guarantee the convergence of the multigrid cycle on the solution space $\mathbf W$. From now on, we assume that the penalty parameter $\sigma$ is in the range $0<\sigma\le 1$ because small values of $\sigma$ are required for good control (see section 5). The following theorem will be used as an intermediate step to prove the equivalence of $M_2$ and $G_2$.

Theorem 3.2. There are two positive constants $C_1$ and $C_2$ dependent on $\sigma$ and $\nu$ such that, for any $(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q)\in\mathbf W_1\times\mathbf W_1$, we have
\[
C_1M_1(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q) \le G_1(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q;\mathbf 0) \le C_2M_1(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q). \tag{3.5}
\]
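Proof. One may easily modify the proof of Theorem 3.1 in [11] using Lemma 3.1. Hence we omit the proof.

For the norm equivalence of $G_2(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q;\mathbf 0)$ with $M_2(\mathbf U,\mathbf u,p,\mathbf V,\mathbf v,q)$, we need to establish the $H^2$-regularity estimates of the following coupled equations:
\[
\begin{cases}
-\Delta\mathbf u + \nabla p/\nu + \dfrac{\mathbf v}{\sigma\nu} = \mathbf 0 & \text{in } \Omega,\\[4pt]
\Delta\mathbf v + \nabla q/\nu + \dfrac{\mathbf u}{\nu} = \dfrac{1}{\nu}\hat{\mathbf u} & \text{in } \Omega,\\[4pt]
\nabla\cdot\mathbf u = 0 & \text{in } \Omega,\\
\nabla\cdot\mathbf v = 0 & \text{in } \Omega,
\end{cases}
\qquad\text{and}\qquad
\begin{cases}
\mathbf u = \mathbf 0 & \text{on } \partial\Omega,\\
\mathbf v = \mathbf 0 & \text{on } \partial\Omega.
\end{cases} \tag{3.6}
\]
Note that, for uncoupled Stokes-like equations, $H^2$-regularity was provided in [22] by ADN theory (see [1]).

Proposition 3.3. Suppose that the domain $\Omega$ is a bounded convex polyhedron or has a $C^{1,1}$ boundary. Then, for $\mathbf u,\mathbf v\in H^2(\Omega)^n\cap H_0^1(\Omega)^n$ and $p,q\in H^1(\Omega)$, the coupled Stokes equations (3.6) satisfy the $H^2$-regularity estimate
\[
\|\mathbf u\|_2 + \|\mathbf v\|_2 + \|p\|_1 + \|q\|_1 \le C_r\Bigl(\Bigl\|\nu\Delta\mathbf u - \nabla p - \frac{\mathbf v}{\sigma}\Bigr\| + \|\nu\Delta\mathbf v + \nabla q + \mathbf u\| + \|\nabla\cdot\mathbf u\|_1 + \|\nabla\cdot\mathbf v\|_1\Bigr), \tag{3.7}
\]
where $C_r$ depends on $\sigma$, $\nu$, and $\Omega$.

Proof. See the appendix.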
Theorem 3.4. Assume that the domain Ω is a bounded convex polyhedron or has a C 1,1 boundary. Then for any vector v in either H0 (div) ∩ H(curl) or H(div) ∩ H0 (curl) it follows that v21 ≤ C(v2 + ∇ · v2 + ∇ × v2 ).
(3.8)
If, in addition, the domain is simply connected, then v21 ≤ C(∇ · v2 + ∇ × v2 ).
(3.9)
Proof. These results follow from Theorems 3.7–3.9 and Lemmas 3.4–3.6 in [13]. Lemma 3.5. Assume that the same assumptions of Theorem 3.4 hold with simply connected Ω. Then the following hold: (i) It follows that for q ∈ (H01 (Ω) ∩ H 2 (Ω))n and p ∈ H 1 (Ω) ∩ L20 (Ω) (3.10)
∇ · q + δp ≤ C|∇ · q + δp|1 .
(ii) Let each qi ∈ H01 (Ω) ∩ H 2 (Ω) and each φi ∈ H 1 (Ω)3 be divergence free with Δφi ∈ L2 (Ω)3 and n × (∇ × φi ) = 0 on ∂Ω. Then Φ = (φ1 , φ2 , φ3 ), q = (q1 , q2 , q3 )t , and any p ∈ H 1 (Ω) satisfy (3.11)
|∇ · q + δp|1 ≤ C(|∇ · q + tr∇ × Φ + δp|21 + ΔΦ2 ).
(iii) Let S ∈ V0 . Then it has the decomposition (3.12)
S = ∇st + ∇ × Ξ,
where s ∈ H01 (Ω)3 ∩H 2 (Ω)3 with Δs = (∇·S)t and Ξ is columnwise divergence free with n × (∇ × Ξ) = 0 on ∂Ω with ΔΞ = ∇ × S. Proof. See Lemma 3.1 in [11], Lemma 3.2 in [11], and the proof of Theorem 3.2 in [11] for (i), (ii), and (iii), respectively. Now we are ready to prove the norm equivalence for the least-squares functional G2 (U, u, p, V, v, q; 0) with M2 (U, u, p, V, v, q). Theorem 3.6. Assume that the domain Ω is a bounded convex polyhedron or has a C 1,1 boundary. Then there are two constants C1 and C2 dependent on σ and ν such that, for any (U, u, p, V, v, q) ∈ W × W, we have (3.13)
C1 M2 (U, u, p, V, v, q) ≤ G2 (U, u, p, V, v, q; 0) ≤ C2 M2 (U, u, p, V, v, q).
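Proof. The upper bound in (3.13) is straightforward from the triangle and Cauchy–Schwarz inequalities. Next, to prove the lower bound in (3.13), note that, since $\mathbf{W} \subset \mathbf{W}_1$, $G_1 \le G_2$ on $\mathbf{W} \times \mathbf{W}$. Hence, by Theorem 3.2, we have
\[
M_1(\mathbf{U}, \mathbf{u}, p, \mathbf{V}, \mathbf{v}, q) \le C\,G_1(\mathbf{U}, \mathbf{u}, p, \mathbf{V}, \mathbf{v}, q; \mathbf{0}) \le C\,G_2(\mathbf{U}, \mathbf{u}, p, \mathbf{V}, \mathbf{v}, q; \mathbf{0}).
\]
From (3.8) in Theorem 3.4 and the standard Poincaré–Friedrichs inequality, we have
\[
\|\mathbf{U}\|_1^2 + \|p\|_1^2 + \|\mathbf{V}\|_1^2 + \|q\|_1^2
\le C\left(\|\mathbf{U}\|^2 + \|(\nabla\cdot\mathbf{U})^t\|^2 + \|\nabla\times\mathbf{U}\|^2 + \|\nabla p\|^2 + \|\mathbf{V}\|^2 + \|(\nabla\cdot\mathbf{V})^t\|^2 + \|\nabla\times\mathbf{V}\|^2 + \|\nabla q\|^2\right).
\]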
FOSLS METHODS FOR AN OPTIMAL CONTROL PROBLEM
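It thus suffices to show that
\[
\begin{aligned}
C\left(\|(\nabla\cdot\mathbf{U})^t\|^2 + \|\nabla p\|^2 + \|(\nabla\cdot\mathbf{V})^t\|^2 + \|\nabla q\|^2\right)
\le{}& \Big\|\nu(\nabla\cdot\mathbf{U})^t - \nabla p - \frac{\mathbf{v}}{\sigma}\Big\|^2 + |\mathrm{tr}\,\mathbf{U}|_1^2 + \|\nabla\times\mathbf{U}\|^2 + \|\mathbf{U} - \nabla\mathbf{u}^t\|^2 \\
&+ \|\nu(\nabla\cdot\mathbf{V})^t + \nabla q + \mathbf{u}\|^2 + |\mathrm{tr}\,\mathbf{V}|_1^2 + \|\nabla\times\mathbf{V}\|^2 + \|\mathbf{V} - \nabla\mathbf{v}^t\|^2.
\end{aligned}
\tag{3.14}
\]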
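If (3.14) is satisfied for simply connected $\Omega$, then it is also satisfied for $\Omega$ whose boundary $\partial\Omega$ is $C^{1,1}$ by arguments similar to those in the proof of Theorem 3.7 in [13]. Hence it is enough to assume that the domain $\Omega$ is simply connected with connected boundary. Also, we will prove (3.14) only for $n = 3$ because its proof can be reduced to the $n = 2$ case. Since $\mathbf{U}$ and $\mathbf{V}$ are in $\mathbf{V}_0$, there exist $\mathbf{r}$, $\mathbf{w}$, $\Phi$, and $\Psi$ satisfying (iii) of Lemma 3.5 such that $\mathbf{U} = \nabla\mathbf{r}^t + \nabla\times\Phi$ and $\mathbf{V} = \nabla\mathbf{w}^t + \nabla\times\Psi$, with
\[
\Delta\mathbf{r} = (\nabla\cdot\mathbf{U})^t, \qquad
\Delta\mathbf{w} = (\nabla\cdot\mathbf{V})^t, \qquad
\Delta\Phi = \nabla\times\mathbf{U}, \qquad\text{and}\qquad
\Delta\Psi = \nabla\times\mathbf{V}.
\tag{3.15}
\]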
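Hence,
\begin{align}
&\|(\nabla\cdot\mathbf{U})^t\|^2 + \|\nabla p\|^2 + \|(\nabla\cdot\mathbf{V})^t\|^2 + \|\nabla q\|^2 \notag\\
&\quad= \|\Delta\mathbf{r}\|^2 + \|\nabla p\|^2 + \|\Delta\mathbf{w}\|^2 + \|\nabla q\|^2 \notag\\
&\quad\le C_r\left( \Big\|\nu\Delta\mathbf{r} - \nabla p - \frac{\mathbf{w}}{\sigma}\Big\|^2 + \|\nabla\cdot\mathbf{r}\|_1^2 + \|\nu\Delta\mathbf{w} + \nabla q + \mathbf{r}\|^2 + \|\nabla\cdot\mathbf{w}\|_1^2 \right) \tag{3.16}\\
&\quad\le C\left( \Big\|\nu\Delta\mathbf{r} - \nabla p - \frac{\mathbf{w}}{\sigma}\Big\|^2 + |\nabla\cdot\mathbf{r}|_1^2 + \|\nu\Delta\mathbf{w} + \nabla q + \mathbf{r}\|^2 + |\nabla\cdot\mathbf{w}|_1^2 \right) \tag{3.17}\\
&\quad\le C\left( \Big\|\nu\Delta\mathbf{r} - \nabla p - \frac{\mathbf{w}}{\sigma}\Big\|^2 + |\nabla\cdot\mathbf{r} + \mathrm{tr}\,\nabla\times\Phi|_1^2 + \|\Delta\Phi\|^2 \right. \notag\\
&\qquad\qquad \left. {}+ \|\nu\Delta\mathbf{w} + \nabla q + \mathbf{r}\|^2 + |\nabla\cdot\mathbf{w} + \mathrm{tr}\,\nabla\times\Psi|_1^2 + \|\Delta\Psi\|^2 \right) \tag{3.18}\\
&\quad= C\left( \Big\|\nu(\nabla\cdot\mathbf{U})^t - \nabla p - \frac{\mathbf{w}}{\sigma}\Big\|^2 + |\mathrm{tr}\,\mathbf{U}|_1^2 + \|\nabla\times\mathbf{U}\|^2 \right. \notag\\
&\qquad\qquad \left. {}+ \|\nu(\nabla\cdot\mathbf{V})^t + \nabla q + \mathbf{r}\|^2 + |\mathrm{tr}\,\mathbf{V}|_1^2 + \|\nabla\times\mathbf{V}\|^2 \right). \tag{3.19}
\end{align}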
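The inequalities (3.16), (3.17), and (3.18) follow from (3.7), (3.10), and (3.11), respectively. The equality (3.19) follows from (3.15). Using the triangle inequality, we have
\[
\begin{aligned}
&\Big\|\nu(\nabla\cdot\mathbf{U})^t - \nabla p - \frac{\mathbf{w}}{\sigma}\Big\|^2 + \|\nu(\nabla\cdot\mathbf{V})^t + \nabla q + \mathbf{r}\|^2 \\
&\quad\le C\left( \Big\|\nu(\nabla\cdot\mathbf{U})^t - \nabla p - \frac{\mathbf{v}}{\sigma}\Big\|^2 + \|\mathbf{v} - \mathbf{w}\|^2 + \|\nu(\nabla\cdot\mathbf{V})^t + \nabla q + \mathbf{u}\|^2 + \|\mathbf{r} - \mathbf{u}\|^2 \right).
\end{aligned}
\]
Using the Poincaré–Friedrichs inequality and (3.15), it follows that
\[
\|\mathbf{v} - \mathbf{w}\|^2 + \|\mathbf{r} - \mathbf{u}\|^2 \le C\left( \|\nabla\mathbf{v}^t - \mathbf{V} + \nabla\times\Psi\|^2 + \|\mathbf{U} - \nabla\times\Phi - \nabla\mathbf{u}^t\|^2 \right).
\tag{3.20}
\]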
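Then applying the triangle inequality with the Poincaré–Friedrichs inequality, applying (3.9) to the right-hand side of (3.20) with $\mathbf{v} = \nabla\times\Phi$ or $\nabla\times\Psi$, and using (3.15), we have
\[
\begin{aligned}
&\|\nabla\mathbf{v}^t - \mathbf{V} + \nabla\times\Psi\|^2 + \|\mathbf{U} - \nabla\times\Phi - \nabla\mathbf{u}^t\|^2 \\
&\quad\le C\left( \|\mathbf{V} - \nabla\mathbf{v}^t\|^2 + \|\mathbf{U} - \nabla\mathbf{u}^t\|^2 + \|\nabla\times\Psi\|_1^2 + \|\nabla\times\Phi\|_1^2 \right) \\
&\quad\le C\left( \|\mathbf{V} - \nabla\mathbf{v}^t\|^2 + \|\mathbf{U} - \nabla\mathbf{u}^t\|^2 + \|\nabla\cdot\nabla\times\Psi\|^2 + \|\nabla\times\nabla\times\Psi\|^2 + \|\nabla\cdot\nabla\times\Phi\|^2 + \|\nabla\times\nabla\times\Phi\|^2 \right) \\
&\quad\le C\left( \|\mathbf{V} - \nabla\mathbf{v}^t\|^2 + \|\mathbf{U} - \nabla\mathbf{u}^t\|^2 + \|\Delta\Phi\|^2 + \|\Delta\Psi\|^2 \right) \\
&\quad= C\left( \|\mathbf{V} - \nabla\mathbf{v}^t\|^2 + \|\mathbf{U} - \nabla\mathbf{u}^t\|^2 + \|\nabla\times\mathbf{U}\|^2 + \|\nabla\times\mathbf{V}\|^2 \right).
\end{aligned}
\]
This proves (3.14) for a simply connected $\Omega$. Hence, we have the conclusions.

4. Finite element approximations. The finite element approximations for the minimization of the least-squares functional $G_2$ defined in section 3 can be set up with a family of triangulations $\mathcal{T}_h$ of $\Omega$ obtained by a standard finite element subdivision of $\Omega$ into quasi-uniform triangles with $h = \max\{\mathrm{diam}(K) : K \in \mathcal{T}_h\}$. Let $\mathbf{W}^h := \mathbf{V}^h \times \mathbf{U}^h \times Q^h$ be a finite dimensional subspace of $\mathbf{W}$ with the following approximation properties (see [12]): for any $(\mathbf{U}, \mathbf{u}, p) \in \mathbf{W} \cap (H^r(\Omega)^{n^2} \times H^{r+1}(\Omega)^n \times H^r(\Omega))$ $(r \ge 1)$, there exist a constant $C$ and a triple $(\mathbf{U}_h, \mathbf{u}_h, p_h) \in \mathbf{W}^h$ such that
\begin{align}
\inf_{\mathbf{U}_h \in \mathbf{V}^h} \{\|\mathbf{U} - \mathbf{U}_h\|_0 + h\|\mathbf{U} - \mathbf{U}_h\|_1\} &\le C h^r \|\mathbf{U}\|_r, \tag{4.1}\\
\inf_{\mathbf{u}_h \in \mathbf{U}^h} \{\|\mathbf{u} - \mathbf{u}_h\|_0 + h\|\mathbf{u} - \mathbf{u}_h\|_1\} &\le C h^{r+1} \|\mathbf{u}\|_{r+1}, \tag{4.2}\\
\inf_{p_h \in Q^h} \{\|p - p_h\|_0 + h\|p - p_h\|_1\} &\le C h^r \|p\|_r. \tag{4.3}
\end{align}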
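As a one-dimensional illustration of the kind of approximation property stated in (4.2) with $r = 1$, the following minimal sketch measures the $L^2(0,1)$ error of the continuous piecewise linear interpolant of a smooth function on uniform meshes. The interpolant bounds the infimum from above, so its error should decay like $h^2$. The test function $\sin(\pi x)$, the function name `l2_interp_error`, and the mesh sizes are assumptions chosen for illustration only; they are not taken from the paper.

```python
import math

def l2_interp_error(n_cells, f=lambda x: math.sin(math.pi * x)):
    """L2(0,1) error of the piecewise linear interpolant of f on n_cells uniform cells."""
    h = 1.0 / n_cells
    # 3-point Gauss-Legendre nodes and weights on the reference interval [-1, 1]
    nodes = (-math.sqrt(0.6), 0.0, math.sqrt(0.6))
    weights = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)
    total = 0.0
    for k in range(n_cells):
        a, b = k * h, (k + 1) * h
        fa, fb = f(a), f(b)
        for t, w in zip(nodes, weights):
            x = 0.5 * (a + b) + 0.5 * h * t           # quadrature point in the cell
            interp = fa + (fb - fa) * (x - a) / h      # linear interpolant at x
            total += w * 0.5 * h * (f(x) - interp) ** 2
    return math.sqrt(total)

# Errors on successively halved meshes and the observed orders log2(e_h / e_{h/2}).
errors = [l2_interp_error(n) for n in (8, 16, 32, 64)]
rates = [math.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
print(rates)
```

The observed orders should be close to 2, in line with the factor $h^{r+1}$ in (4.2) for $r = 1$.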
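For convenience, let

state variables: $\Phi^s = \{\mathbf{U}, \mathbf{u}, p\} \in \mathbf{W}$,
adjoint variables: $\Lambda^a = \{\mathbf{V}, \mathbf{v}, q\} \in \mathbf{W}$,
test functions: $\tilde\Phi = \{\tilde{\mathbf{U}}, \tilde{\mathbf{u}}, \tilde p\} \in \mathbf{W}$, $\tilde\Lambda = \{\tilde{\mathbf{V}}, \tilde{\mathbf{v}}, \tilde q\} \in \mathbf{W}$.

Then, the finite element approximation to (3.2) for $i = 2$ becomes the following: find $(\Phi^s_h, \Lambda^a_h) \in \mathbf{W}^h \times \mathbf{W}^h$ satisfying
\[
(\Phi^s_h, \Lambda^a_h) = \arg\inf_{(\tilde\Phi_h, \tilde\Lambda_h) \in \mathbf{W}^h \times \mathbf{W}^h} G_2(\tilde\Phi_h, \tilde\Lambda_h; \hat{\mathbf{u}}).
\]
The variational problem is to find $(\Phi^s, \Lambda^a) \in \mathbf{W} \times \mathbf{W}$ satisfying
\[
B((\Phi^s, \Lambda^a), (\tilde\Phi, \tilde\Lambda)) = F((\tilde\Phi, \tilde\Lambda); \hat{\mathbf{u}}) \qquad \forall (\tilde\Phi, \tilde\Lambda) \in \mathbf{W} \times \mathbf{W},
\tag{4.4}
\]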
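where
\[
\begin{aligned}
B((\Phi^s, \Lambda^a), (\tilde\Phi, \tilde\Lambda))
={}& \Big(\nu(\nabla\cdot\mathbf{U})^t - \nabla p - \frac{\mathbf{v}}{\sigma},\ \nu(\nabla\cdot\tilde{\mathbf{U}})^t - \nabla\tilde p - \frac{\tilde{\mathbf{v}}}{\sigma}\Big) \\
&+ \nu^2(\mathbf{U} - \nabla\mathbf{u}^t, \tilde{\mathbf{U}} - \nabla\tilde{\mathbf{u}}^t) + \nu^2(\nabla\cdot\mathbf{u}, \nabla\cdot\tilde{\mathbf{u}}) + \nu^2(\nabla\times\mathbf{U}, \nabla\times\tilde{\mathbf{U}}) + \nu^2(\nabla\mathrm{tr}(\mathbf{U}), \nabla\mathrm{tr}(\tilde{\mathbf{U}})) \\
&+ \frac{1}{\sigma^2}\Big( (\nu(\nabla\cdot\mathbf{V})^t + \nabla q + \mathbf{u},\ \nu(\nabla\cdot\tilde{\mathbf{V}})^t + \nabla\tilde q + \tilde{\mathbf{u}}) + \nu^2(\mathbf{V} - \nabla\mathbf{v}^t, \tilde{\mathbf{V}} - \nabla\tilde{\mathbf{v}}^t) \\
&\qquad\quad + \nu^2(\nabla\cdot\mathbf{v}, \nabla\cdot\tilde{\mathbf{v}}) + \nu^2(\nabla\times\mathbf{V}, \nabla\times\tilde{\mathbf{V}}) + \nu^2(\nabla\mathrm{tr}(\mathbf{V}), \nabla\mathrm{tr}(\tilde{\mathbf{V}})) \Big)
\end{aligned}
\]
and
\[
F(\tilde\Phi, \tilde\Lambda; \hat{\mathbf{u}}) = \frac{1}{\sigma^2}\big(\hat{\mathbf{u}},\ \nu(\nabla\cdot\tilde{\mathbf{V}})^t + \nabla\tilde q + \tilde{\mathbf{u}}\big).
\]
The corresponding finite element discretization of (4.4) is to find $(\Phi^s_h, \Lambda^a_h) \in \mathbf{W}^h \times \mathbf{W}^h$ satisfying
\[
B((\Phi^s_h, \Lambda^a_h), (\tilde\Phi_h, \tilde\Lambda_h)) = F(\tilde\Phi_h, \tilde\Lambda_h; \hat{\mathbf{u}}) \qquad \forall (\tilde\Phi_h, \tilde\Lambda_h) \in \mathbf{W}^h \times \mathbf{W}^h.
\tag{4.5}
\]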
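Proposition 4.1. Let $(\Phi^s, \Lambda^a)$ be the solution of the minimization of $G_2$ over $\mathbf{W} \times \mathbf{W}$ and $(\Phi^s_h, \Lambda^a_h)$ the unique minimizer of $G_2$ over $\mathbf{W}^h \times \mathbf{W}^h$. Then
\[
M_2(\Phi^s - \Phi^s_h, \Lambda^a - \Lambda^a_h) \le C \inf_{(\tilde\Phi_h, \tilde\Lambda_h) \in \mathbf{W}^h \times \mathbf{W}^h} M_2(\Phi^s - \tilde\Phi_h, \Lambda^a - \tilde\Lambda_h).
\tag{4.6}
\]
Proof. Theorem 3.6 and the orthogonality of the error $(\Phi^s - \Phi^s_h, \Lambda^a - \Lambda^a_h)$ to $\mathbf{W}^h \times \mathbf{W}^h$, with respect to the above bilinear form $B(\cdot, \cdot)$ and the Cauchy–Schwarz inequality, imply (4.6).

Theorem 4.2. Assume that $(\Phi^s, \Lambda^a) \in \mathbf{W}^2 \cap (H^r(\Omega)^{n^2} \times H^{r+1}(\Omega)^n \times H^r(\Omega))^2$, where $r \ge 1$, is the solution of the minimization problem for $G_2$ and $(\Phi^s_h, \Lambda^a_h)$ is the unique minimizer of $G_2$ over $\mathbf{W}^h \times \mathbf{W}^h$. Then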
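\[
M_2(\Phi^s - \Phi^s_h, \Lambda^a - \Lambda^a_h) \le C h^{2(r-1)} \left( \|\mathbf{U}\|_r^2 + h^2\|\mathbf{u}\|_{r+1}^2 + \|p\|_r^2 + \|\mathbf{V}\|_r^2 + h^2\|\mathbf{v}\|_{r+1}^2 + \|q\|_r^2 \right).
\]
Proof. The approximation properties (4.1)–(4.3) and Proposition 4.1 lead us to the conclusion.

We note that the convergence depends on the viscosity $\nu$ and the penalty parameter $\sigma$. Slow convergence will be observed in section 5 as these parameters approach zero. For the computations of the functional $G_1$, one may adopt the well-known $H^{-1}$ norm technique proposed in [10].

5. Numerical experimentation. For numerical tests, we take the unit square domain $\Omega := (0, 1) \times (0, 1) \subset \mathbb{R}^2$. First, we need the matrix representations corresponding to (4.5) for numerical implementations on a family of uniform triangulations $\mathcal{T}_h$ of $\Omega$ consisting of the single discrete space of continuous piecewise linear functions with mesh size $h$ for the approximations of all unknowns. Let us choose bases $\{\mathcal{V}_j\}_{j=1}^{4J}$, $\{\mathcal{U}_j\}_{j=1}^{2J}$, and $\{\mathcal{Q}_j\}_{j=1}^{J}$ for $\mathbf{V}^h$, $\mathbf{U}^h$, and $Q^h$, respectively, so that one may have $\mathbf{U}^h = \sum_{j=1}^{4J} U_j \mathcal{V}_j$, $\mathbf{u}^h = \sum_{j=1}^{2J} u_j \mathcal{U}_j$, $p^h = \sum_{j=1}^{J} p_j \mathcal{Q}_j$, $\mathbf{V}^h = \sum_{j=1}^{4J} V_j \mathcal{V}_j$, $\mathbf{v}^h = \sum_{j=1}^{2J} v_j \mathcal{U}_j$, and $q^h = \sum_{j=1}^{J} q_j \mathcal{Q}_j$ for some sets of coefficients $\{U_j\}_{j=1}^{4J}$, $\{u_j\}_{j=1}^{2J}$, $\{p_j\}_{j=1}^{J}$, $\{V_j\}_{j=1}^{4J}$, $\{v_j\}_{j=1}^{2J}$, and $\{q_j\}_{j=1}^{J}$. Hence (4.5) leads to the matrix equation
\[
\begin{bmatrix} A_1 & B^T \\ B & A_2 \end{bmatrix}
\begin{bmatrix} \vec\Phi \\ \vec\Lambda \end{bmatrix}
=
\begin{bmatrix} F_1 \\ F_2 \end{bmatrix},
\tag{5.1}
\]
where
\[
A_1 = \begin{bmatrix} K_1 & C_1^t & C_2^t \\ C_1 & K_2 & 0 \\ C_2 & 0 & K_3 \end{bmatrix},
\qquad
A_2 = \begin{bmatrix} \frac{1}{\sigma^2}K_1 & \frac{1}{\sigma^2}C_1^t & -\frac{1}{\sigma^2}C_2^t \\ \frac{1}{\sigma^2}C_1 & K_4 & 0 \\ -\frac{1}{\sigma^2}C_2 & 0 & \frac{1}{\sigma^2}K_3 \end{bmatrix},
\qquad
B = \begin{bmatrix} 0 & C_3 & 0 \\ C_4 & 0 & C_5 \\ 0 & C_6 & 0 \end{bmatrix},
\]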
\[
\vec\Phi = \begin{bmatrix} \vec U \\ \vec u \\ \vec p \end{bmatrix},
\qquad
\vec\Lambda = \begin{bmatrix} \vec V \\ \vec v \\ \vec q \end{bmatrix},
\qquad
F_1 = \begin{bmatrix} 0 \\ g_1 \\ 0 \end{bmatrix},
\qquad
F_2 = \begin{bmatrix} g_2 \\ 0 \\ g_3 \end{bmatrix},
\]
$\vec U = (U_1, \ldots, U_{4J})^t$, $\vec u = (u_1, \ldots, u_{2J})^t$, $\vec p = (p_1, \ldots, p_J)^t$, $\vec V = (V_1, \ldots, V_{4J})^t$, $\vec v = (v_1, \ldots, v_{2J})^t$, and $\vec q = (q_1, \ldots, q_J)^t$. The block matrix $K_1$ is of size $4J \times 4J$, $K_2$ and $K_4$ are of size $2J \times 2J$, $K_3$ is of size $J \times J$, $C_1$ and $C_4$ are of size $2J \times 4J$, $C_2$ is of size $J \times 4J$, $C_3$ is of size $4J \times 2J$, $C_5$ is of size $2J \times J$, and $C_6$ is of size $J \times 2J$. We note that the matrix in (5.1), whose block entries are easily set up, is symmetric and positive definite. The positivity of (5.1) depends on the viscosity $\nu$ and the penalty parameter $\sigma$.

The implementation of a model problem not only shows that the given target velocity $\hat{\mathbf{u}}$ can be reached by the finite element solutions $\mathbf{u}^h$ as $h$ approaches 0 but also shows the role of the penalty parameter $\sigma$. Since the $H^1$ norm equivalence is shown for the least-squares functional $G_2$, it is possible to use the multigrid V-cycle preconditioner for the linear system (5.1). For the linear system (5.1), the five-point Gaussian quadrature is used on each triangle for all integrals of $g_k$ $(k = 1, 2, 3)$. Let $R_m$ be the $m$th residual; then the relative residual tolerance $\|R_m\|/\|R_0\| < \epsilon := 10^{-5}$ is used with maximum iteration number 300 for terminating the V(1,1)-cycle with the Gauss–Seidel smoothing iterations.

As the target velocity $\hat{\mathbf{u}}(x, y) = (\hat u_1(x, y), \hat u_2(x, y))$, let us take the example in [18], where
\[
\hat u_1(x, y) = \frac{d}{dy}\big(\phi(x)\phi(y)\big)
\qquad\text{and}\qquad
\hat u_2(x, y) = -\frac{d}{dx}\big(\phi(x)\phi(y)\big)
\]
with $\phi(z) = (1 - \cos(\pi z))(1 - z)^2$. With this divergence-free $\hat{\mathbf{u}}$, we examine $L^2$ errors between the finite element solution $\mathbf{u}^h$ and the exact solution $\mathbf{u}$ for the various penalty parameters $\sigma \le 1$ for a fixed viscosity constant $\nu$. Generally, the penalty parameter $\sigma$ requires small values for good control; hence it will be taken as less than or equal to 1 (see [15] and [18]), which was assumed in Theorem 3.2. Note that the initial state velocity and the adjoint variable are set to zero for our computations.

First, we study the performance for different values of the penalty parameter $\sigma$. To show the effects of $\sigma$ as $\sigma \to 0$, we report the $L^2$ errors between the target flows and the controlled flows and the $L^2$ norms of the optimal control $\mathbf{f}^h$ plus the magnitude of the cost functionals. Table 5.1 reveals the aspects of the $L^2$ error $\|\mathbf{u}^h - \hat{\mathbf{u}}\|$, the magnitude $\|\mathbf{f}^h\|$, and the value $\mathcal{J}(\mathbf{u}^h, \mathbf{f}^h)$ for various $\sigma$ with fixed $\nu$ and mesh size $h = 1/64$. According to Table 5.1, the approximate solution $\mathbf{u}^h$ converges to the target velocity $\hat{\mathbf{u}}$ as $\sigma \to 0$ for $\nu = 1$, $10^{-1}$, or $10^{-2}$.
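The target velocity above is the curl of the stream function $\phi(x)\phi(y)$, so it should be divergence free and, because $\phi$ and $\phi'$ vanish at $z = 0$ and $z = 1$, it should vanish on the boundary of the unit square. The following minimal sketch evaluates $\hat{\mathbf{u}}$ pointwise and checks both properties numerically. The function names and the finite-difference step are assumptions made for illustration; the paper does not describe how the target was coded.

```python
import math

def phi(z):
    """phi(z) = (1 - cos(pi z)) (1 - z)^2 as given in the text."""
    return (1.0 - math.cos(math.pi * z)) * (1.0 - z) ** 2

def dphi(z):
    """Derivative of phi, obtained with the product rule."""
    return (math.pi * math.sin(math.pi * z) * (1.0 - z) ** 2
            - 2.0 * (1.0 - math.cos(math.pi * z)) * (1.0 - z))

def target_velocity(x, y):
    """Return (u1_hat, u2_hat) = (d/dy, -d/dx) applied to phi(x) * phi(y)."""
    return phi(x) * dphi(y), -dphi(x) * phi(y)

def divergence(x, y, eps=1e-5):
    """Central-difference approximation of d(u1_hat)/dx + d(u2_hat)/dy."""
    du1_dx = (target_velocity(x + eps, y)[0] - target_velocity(x - eps, y)[0]) / (2 * eps)
    du2_dy = (target_velocity(x, y + eps)[1] - target_velocity(x, y - eps)[1]) / (2 * eps)
    return du1_dx + du2_dy

# Interior sample point: the divergence should vanish up to discretization error.
print(divergence(0.3, 0.7))
# Boundary sample points: both components should vanish.
print(target_velocity(0.0, 0.4), target_velocity(0.6, 1.0))
```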
Table 5.1
The values when the G2 functional is weighted with ν² and 1/σ² for h = 1/64.

ν        σ        ‖u^h − û‖         νσ       ‖f^h‖             J(u^h, f^h)
1        1        1.27407460e-1     1        2.36195082e-3     8.11911988e-3
         10^-1    1.25622183e-1     10^-1    1.92834224e-2     7.90905892e-3
         10^-2    6.11272053e-2     10^-2    9.16813612e-2     1.91029498e-3
         10^-3    2.08869292e-3     10^-3    2.68464510e-2     2.54168503e-6
         10^-4    1.63889436e-4     10^-4    1.35038017e-2     2.25475067e-8
10^-1    1        1.22605207e-1     10^-1    2.26955618e-2     7.77356270e-3
         10^-1    5.91572035e-2     10^-2    9.55968591e-2     2.20672534e-3
         10^-2    4.28504164e-3     10^-3    7.79396372e-2     3.95537262e-5
         10^-3    3.80646256e-4     10^-4    7.02108160e-2     2.53722513e-6
         10^-4    1.48804667e-4     10^-5    1.03155722e-1     5.43126564e-7
10^-2    1        2.82493074e-2     10^-2    4.77763288e-2     1.54030048e-3
         10^-1    3.06976832e-3     10^-3    5.54958655e-2     1.58701293e-4
         10^-2    3.38752045e-4     10^-4    5.67637555e-2     1.61679962e-5
         10^-3    1.41904717e-4     10^-5    6.05699458e-2     1.84442764e-6
         10^-4    1.37947792e-4     10^-6    1.22791866e-1     7.63406921e-7

[Figure 5.1: four panels on the unit square, titled ν = 0.001, σ = 1; ν = 0.01, σ = 0.1; ν = 0.1, σ = 0.01; and ν = 1, σ = 0.001.]
Fig. 5.1. Control flows u^h when νσ = 0.001 and h = 1/32.
Figure 5.1 shows the controlled flows for $\nu\sigma = 0.001$ and $h = 1/32$. Figures 5.2 and 5.3 show the behaviors of the controlled flows when $\sigma = 1$, $10^{-1}$, $10^{-2}$, and $10^{-3}$ for $\nu = 1$, $0.1$, and $h = 1/32$. In Figure 5.1, the controlled flows are very similar to each other for $\nu\sigma = 0.001$, and also in Figures 5.2 and 5.3, the flow shapes are alike for $\nu\sigma = 10^{-1}$, $10^{-2}$, and $10^{-3}$, respectively. In Table 5.1 and Figures 5.1–5.3, we see that $\|\mathbf{u}^h - \hat{\mathbf{u}}\|$ converges to zero as $\nu\sigma \to 0$; that is, the controlled velocity approaches the desired velocity as $\nu\sigma$ is decreased. These phenomena will be explained by the following observations: for $\nu\sigma = \mu$, if $(\mathbf{U}, \mathbf{u}, p, \mathbf{V}, \mathbf{v}, q)$ is the solution of $L_1(\mathbf{U}, \mathbf{u}, p, \mathbf{v}; \nu, \sigma) = 0$ and $L_2(\mathbf{V}, \mathbf{v}, q, \mathbf{u}; \nu) = F$, then $(\mathbf{U}, \mathbf{u}, p, \mathbf{V}, \mathbf{v}, q)$ with a scaled $p$ $(\sigma p)$ is also the solution of $L_1(\mathbf{U}, \mathbf{u}, p, \mathbf{v}; \mu, 1) = 0$ and $L_2(\mathbf{V}, \mathbf{v}, q, \mathbf{u}; \nu) = F$. Hence the $L^2$ errors $\|\mathbf{u}^h - \hat{\mathbf{u}}\|$ may be alike for the same $\nu\sigma$.

[Figure 5.2: target flow panel and four controlled-flow panels on the unit square, titled νσ = 1, νσ = 0.1, νσ = 0.01, and νσ = 0.001.]
Fig. 5.2. Target flow (top), and controlled flows at σ = 1 (middle left), 10^-1 (middle right), 10^-2 (bottom left), and σ = 10^-3 (bottom right) for ν = 1, when h = 1/32.

Remark 5.1. We may note that $\|\mathbf{u}^h - \hat{\mathbf{u}}\|$ cannot approach 0 completely even if $\|\mathbf{u} - \mathbf{u}^h\|$ converges to 0 as $h \to 0$ (this can be done by finite element approximations) because, if $\lim_{h\to 0}\|\mathbf{u} - \mathbf{u}^h\| = 0$, then we have
\[
\|\mathbf{u} - \hat{\mathbf{u}}\| \le \lim_{h\to 0}\left(\|\mathbf{u} - \mathbf{u}^h\| + \|\mathbf{u}^h - \hat{\mathbf{u}}\|\right) = \lim_{h\to 0}\|\mathbf{u}^h - \hat{\mathbf{u}}\| \le \lim_{h\to 0}\left(\|\mathbf{u}^h - \mathbf{u}\| + \|\mathbf{u} - \hat{\mathbf{u}}\|\right) = \|\mathbf{u} - \hat{\mathbf{u}}\|.
\]
Thus, $\lim_{h\to 0}\|\mathbf{u}^h - \hat{\mathbf{u}}\| = 0$ if the exact solution $\mathbf{u}$ approaches the target velocity $\hat{\mathbf{u}}$ under a circumstance such as that wherein $\sigma$ approaches 0 and the admissible set of $\mathbf{f}$ is compact. In this case, of course, $\hat{\mathbf{u}}$ should be in the admissible set.

Remark 5.2. If the adjoint variables $\mathbf{V}$, $\mathbf{v}$, and $q$ in system (2.5) and (2.6) are all zeros, then the state variables $\mathbf{U}$, $\mathbf{u}$, and $p$ also become zeros. This implies that $\mathbf{u} = \hat{\mathbf{u}} = \mathbf{0}$ in (2.6), which is a contradiction if $\hat{\mathbf{u}} \ne \mathbf{0}$, so the adjoint variables cannot be exactly zeros in system (2.6). Tables 5.2 and 5.3 exhibit each norm of the approximate state
and adjoint variables, respectively. In Tables 5.2 and 5.3, the norms of each state variable $\mathbf{U}$, $\mathbf{u}$, and $p$ and their sums increase as $\sigma$ goes to zero; on the other hand, magnitudes of adjoint variables $\mathbf{V}$, $\mathbf{v}$, and $q$ get smaller as $\sigma$ goes to zero. Since the variable $\mathbf{v}$ in (2.5) has the coefficient $\frac{1}{\sigma}$, we balanced (2.5) by multiplying the adjoint equation (2.6) by $\frac{1}{\sigma}$. Therefore, the solver may be focused on the system (2.6) for small $\sigma$. That is, the smaller $\sigma$ provides the better controlled flow, as seen in Figures 5.2 and 5.3, because $\mathbf{V}$, $\mathbf{v}$, and $q$ go to zero as $\sigma \to 0$. Tables 5.4–5.5 exhibit the values $\|\mathbf{u}^h - \hat{\mathbf{u}}\|$ for $\sigma = 10^{-1}$ and $\sigma = 10^{-4}$, respectively, as mesh size $h$ is decreasing. In Table 5.4, the errors $\|\mathbf{u}^h - \hat{\mathbf{u}}\|$ do not approach zero if $\sigma$ is relatively large. Hence, we may say that $\mathbf{u}^h$ is controlled well for small $\sigma$ and $h$. Note the convergence rates are like $O(h^2)$ for $\sigma = 10^{-4}$ (see Table 5.5).

[Figure 5.3: target flow panel and four controlled-flow panels on the unit square, titled νσ = 0.1, νσ = 0.01, νσ = 0.001, and νσ = 0.0001.]
Fig. 5.3. Target flow (top) and controlled flows at σ = 1 (middle left), 10^-1 (middle right), 10^-2 (bottom left), and σ = 10^-3 (bottom right) for ν = 10^-1, when h = 1/32.

Remark 5.3. In this remark, we pay attention to the weight $\frac{1}{\sigma^2}$ balancing the residuals of state and adjoint systems. Now consider the following minimizing problem using a functional $G$, removing the weight $\frac{1}{\sigma^2}$ in the functional $G_2$ such that
\[
(\mathbf{U}, \mathbf{u}, p, \mathbf{V}, \mathbf{v}, q) = \arg\inf_{(\Psi, \psi, r, \Xi, \xi, w) \in \mathbf{W} \times \mathbf{W}} G(\Psi, \psi, r, \Xi, \xi, w; \hat{\mathbf{u}}),
\tag{5.2}
\]
Table 5.2
The magnitudes of the approximate state variables with varying σ when ν = 1 and h = 1/64.

σ        ‖U‖²           ‖u‖²           ‖p‖²           ‖U‖² + ‖u‖² + ‖p‖²
1        1.0548e-07     2.6342e-09     2.9419e-07     4.0230e-07
10^-1    8.0469e-06     3.4994e-06     5.9599e-05     7.1145e-05
10^-2    4.8527e-04     4.6395e-03     4.2292e-02     4.7417e-02
10^-3    4.1143e-04     1.5822e-02     1.3374e-01     1.4997e-01
10^-4    3.5101e-04     1.6226e-02     1.3695e-01     1.5353e-01

Table 5.3
The magnitudes of the approximate adjoint variables with varying σ when ν = 1 and h = 1/64.

σ        ‖V‖²           ‖v‖²           ‖q‖²           ‖V‖² + ‖v‖² + ‖q‖²
1        2.9819e-04     5.5788e-06     2.6133e-05     3.2990e-04
10^-1    2.8780e-04     3.7185e-06     2.5292e-05     3.1681e-04
10^-2    6.5021e-05     8.4055e-07     1.7709e-06     6.7632e-05
10^-3    5.0869e-08     7.2073e-10     3.5029e-08     8.6619e-08
10^-4    1.0949e-10     1.8235e-12     4.0121e-10     5.1253e-10
Table 5.4
The L2-errors ‖u^h − û‖, the L2 norm of control f^h, and the values of cost functional for the different h when ν = 1 and σ = 10^-1.

h        ‖u^h − û‖         ‖f^h‖             J(u^h, f^h)
1/4      9.37742558e-2     7.58448816e-3     4.39968176e-3
1/8      1.06921353e-1     1.40698183e-2     5.72598590e-3
1/16     1.18731641e-1     1.76054484e-2     7.06409883e-3
1/32     1.23921360e-1     1.89002755e-2     7.69611279e-3
1/64     1.25622183e-1     1.92834224e-2     7.90905892e-3

Table 5.5
The L2-errors ‖u^h − û‖, the L2 norm of control f^h, and the values of cost functional for the different h when ν = 1 and σ = 10^-4.

h        ‖u^h − û‖         ρ          ‖f^h‖             J(u^h, f^h)
1/4      3.91818901e-2                1.75520248e-0     9.21647045e-4
1/8      9.72997579e-3     2.0097     5.77694222e-1     6.40227452e-5
1/16     2.40527354e-3     2.0162     1.49528777e-1     4.01061318e-6
1/32     5.92404111e-4     2.0215     3.53178777e-2     2.37838940e-7
1/64     1.63889436e-4     1.8539     1.35038017e-2     2.25476057e-7
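The column ρ in Table 5.5 appears to be the observed convergence order between successive mesh halvings, $\rho = \log_2(e_h / e_{h/2})$ with $e_h = \|\mathbf{u}^h - \hat{\mathbf{u}}\|$. The short sketch below recomputes it from the tabulated errors under that assumption; the helper name `observed_rates` is ours and is not part of the paper.

```python
import math

# L2 errors ||u^h - u_hat|| copied from Table 5.5 (nu = 1, sigma = 1e-4),
# listed for h = 1/4, 1/8, 1/16, 1/32, 1/64.
errors = [3.91818901e-2, 9.72997579e-3, 2.40527354e-3, 5.92404111e-4, 1.63889436e-4]

def observed_rates(errs):
    """Observed order log2(e_h / e_{h/2}) for each successive halving of h."""
    return [math.log2(errs[i] / errs[i + 1]) for i in range(len(errs) - 1)]

rates = observed_rates(errors)
print([round(r, 4) for r in rates])
```

Under this assumption the recomputed values agree with the tabulated ρ to the printed precision, which supports the statement that the rates behave like $O(h^2)$ for $\sigma = 10^{-4}$.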
where the functional $G$ is
\[
\begin{aligned}
G(\mathbf{U}, \mathbf{u}, p, \mathbf{V}, \mathbf{v}, q; \hat{\mathbf{u}})
={}& \Big\|\nu(\nabla\cdot\mathbf{U})^t - \nabla p - \frac{\mathbf{v}}{\sigma}\Big\|^2 + \nu^2\|\mathbf{U} - \nabla\mathbf{u}^t\|^2 + \nu^2\|\nabla\cdot\mathbf{u}\|^2 + \nu^2\|\nabla\times\mathbf{U}\|^2 + \nu^2\|\nabla\mathrm{tr}(\mathbf{U})\|^2 \\
&+ \|\nu(\nabla\cdot\mathbf{V})^t + \nabla q + \mathbf{u} - \hat{\mathbf{u}}\|^2 + \nu^2\|\mathbf{V} - \nabla\mathbf{v}^t\|^2 + \nu^2\|\nabla\cdot\mathbf{v}\|^2 + \nu^2\|\nabla\times\mathbf{V}\|^2 + \nu^2\|\nabla\mathrm{tr}(\mathbf{V})\|^2.
\end{aligned}
\]
Then one may get the norm equivalence of $G$ so that (5.2) has a unique solution, but according to numerical tests in Table 5.6 (compare them with Table 5.1), $\|\mathbf{u}^h - \hat{\mathbf{u}}\|$ is not decreased as $\sigma \to 0$ for the fixed mesh size $h = 1/64$ and the fixed viscosity $\nu$. This may be due to the imbalance of the weight $\frac{1}{\sigma}$ between (2.5) and (2.6). Table 5.7 shows that the flow $\mathbf{u}^h$ without both $\nu^2$ and $\frac{1}{\sigma^2}$ in the $G_2$ functional is not controlled even though $\sigma$ is varied. Due to these numerical observations, the weight $\frac{1}{\sigma^2}$ at the adjoint terms in $G_2$ may be a critical factor for the present optimal control problem (see also Tables 5.1, 5.4, and 5.5).

Table 5.6
The values without the weight 1/σ² in the G2 functional for h = 1/64.

ν        σ        ‖u^h − û‖         ‖f^h‖             J(u^h, f^h)
1        1        1.27407460e-1     2.36195082e-3     8.11911988e-3
         10^-1    1.27016697e-1     2.34315166e-2     8.09407240e-3
         10^-2    1.24526214e-1     1.57872405e-1     7.87800748e-3
         10^-3    1.25850014e-1     9.05242193e-2     7.92321038e-3
         10^-4    1.26830317e-1     3.31493457e-2     8.04301960e-3
10^-1    1        1.22605207e-1     2.26955618e-2     7.77356269e-3
         10^-1    1.00351941e-1     1.40226171e-1     6.01842502e-3
         10^-2    8.97403442e-2     1.98551642e-1     4.22377847e-3
         10^-3    9.15307160e-2     1.87005411e-1     4.20642150e-3
         10^-4    9.18262184e-2     1.85093250e-1     4.21774017e-3
10^-2    1        2.82493073e-2     4.77763288e-2     1.54030048e-3
         10^-1    9.65867353e-3     6.14732212e-2     2.35592833e-4
         10^-2    8.32188190e-3     6.31820665e-2     5.45867269e-5
         10^-3    8.23441322e-3     6.33332474e-2     3.59083307e-5
         10^-4    8.22622963e-3     6.33481285e-2     3.40360763e-5

Table 5.7
The values without the weights ν² and 1/σ² at the G2 functional for h = 1/64.

ν        σ        ‖u^h − û‖         ‖f^h‖             J(u^h, f^h)
1        1        1.27407460e-1     2.36195082e-3     8.11911988e-3
         10^-1    1.27016697e-1     2.34315166e-2     8.09407240e-3
         10^-2    1.24526214e-1     1.57872405e-1     7.87800748e-3
         10^-3    1.25850014e-1     9.05242193e-2     7.92321038e-3
         10^-4    1.26830317e-1     3.31493457e-2     8.04301960e-3
10^-1    1        1.23378335e-1     2.07895089e-2     7.77356270e-3
         10^-1    1.03748970e-1     1.34083973e-1     6.28085002e-3
         10^-2    8.88520222e-2     2.57116549e-1     4.27788552e-3
         10^-3    8.57461071e-2     3.18058282e-1     3.72677798e-3
         10^-4    8.57871028e-2     3.18512380e-1     3.68478601e-3
10^-2    1        8.09929079e-2     2.54933229e-2     3.60488032e-3
         10^-1    7.39311834e-2     3.77040995e-2     2.80398989e-3
         10^-2    7.27525606e-2     4.25219861e-2     2.65550813e-3
         10^-3    7.25866622e-2     4.51049297e-2     2.63542899e-3
         10^-4    7.25795684e-2     4.50406576e-2     2.63399831e-3

The flows are controlled well for small $\sigma$ as seen in Tables 5.1 and 5.5. This small $\sigma$ may have effects on the discrete system, such as altering the condition numbers of the system and the elapsed time of the multigrid cycle. Let us focus on the condition numbers of the matrix $A$ generated by the discrete system (5.1) with different weights. The behaviors of scaled condition numbers of three types of discrete systems $\kappa(A)$ are presented in Figures 5.4 and 5.5 for $\nu = 10^{-1}$ and $\nu = 10^{-2}$, respectively, for fixed
$h = 1/16$ as $\sigma$ changes from $10^{-3}$ to 1. The condition numbers of the systems (5.1) become larger than those of the systems without only $1/\sigma^2$ or without all weights; also the condition numbers are increased as $\sigma \to 0$. For the weighted system (5.1), the slope of the graph of condition numbers is approximately $\log_{10}(\kappa(A))/\log_{10}(\sigma) = -2$, so it seems that the condition numbers are proportional to $\frac{1}{\sigma^2}$. But the elapsed time $t$ of the multigrid cycle is not increased so much. Since we use the relative residual $\|R_m\|/\|R_0\|$ as a tolerance and take zeros for initial variables, $\|R_0\|$ is $\|F\|$ when the linear system is $Ax = F$. The value $\|F\|$ is large if $\sigma$ is a small value; hence the elapsed time is relatively not so large for $\sigma = 10^{-3}$ and $\sigma = 10^{-4}$, although the condition numbers are increased, as seen in Figure 5.6. We note that the spectral condition number $\kappa(A)$ can be bounded by $\kappa(A) < c_0 \frac{C_2}{C_1}$, where $c_0$ is a constant dependent on $h$ and $C_1$, $C_2$ are constants corresponding to (3.13) dependent on $\nu$ and $\sigma$.

[Figure 5.4: log10(κ(A)) plotted against log10(σ) for three weightings, labeled "no weighted", "ν²", and "ν², 1/σ²".]
Fig. 5.4. Condition numbers for different weights with varying σ when ν = 0.1 and h = 1/16.

[Figure 5.5: log10(κ(A)) plotted against log10(σ) for three weightings, labeled "no weighted", "ν²", and "ν², 1/σ²".]
Fig. 5.5. Condition numbers for different weights with varying σ when ν = 0.01 and h = 1/16.
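The stopping rule described in this section (zero initial guess, so that $\|R_0\| = \|F\|$, relative residual tolerance $10^{-5}$, at most 300 iterations) can be sketched on a toy problem as follows. This is only an illustration of the criterion with a plain Gauss–Seidel sweep; it is not the multigrid V(1,1)-cycle used in the paper, and the $3 \times 3$ matrix, right-hand side, and function name are assumptions.

```python
import math

def gauss_seidel(A, b, tol=1e-5, max_iter=300):
    """Solve A x = b from a zero initial guess; stop when ||R_m|| / ||R_0|| < tol."""
    n = len(b)
    x = [0.0] * n

    def residual_norm(vec):
        # Euclidean norm of b - A vec
        return math.sqrt(sum(
            (b[i] - sum(A[i][j] * vec[j] for j in range(n))) ** 2 for i in range(n)))

    r0 = residual_norm(x)  # equals ||b|| because the initial guess is zero
    if r0 == 0.0:
        return x, 0
    for m in range(1, max_iter + 1):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]  # uses already-updated entries of x
        if residual_norm(x) / r0 < tol:
            return x, m
    return x, max_iter

# Small symmetric positive definite test matrix (illustrative only).
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x, iters = gauss_seidel(A, b)
print(x, iters)
```

For this toy system the exact solution is $(1, 1, 1)^t$, so the returned iterate can be compared against it directly.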
[Figure 5.6: log10(t) plotted against log10(σ).]
Fig. 5.6. Scaled elapsed time of V-cycle for different σ when ν = 1 and h = 1/32.

6. Concluding remarks. We have formulated an approach for finite element discretizations of optimality systems based on the applications of FOSLS principles for the optimal control problem governed by Stokes equations. We showed that the least-squares functional is equivalent to $H^1$ product norms by providing the $H^2$-regularity of the coupled Stokes optimal system. This principle results in a symmetric and positive definite algebraic system. In particular, we imposed the weights $\frac{1}{\sigma^2}$ and $\nu^2$ on the present least-squares functionals. These weights are very important for the optimal control problem using least-squares methods. To emphasize the importance of these weights, we compared the numerical solutions of three different weighted functionals by decreasing the parameter $\sigma$ or $\nu$. This was done by multigrid solver
for whole discrete algebraic systems. In this sense, an advantage of least-squares formulation is its flexibility, which allows appropriate weights to get good control results. The present numerical tests may be compared with the block Gauss–Seidel method suggested in [7] for the implementation of (5.1) from a computational point of view. This will be done in a coming paper.

Appendix. Proof of regularity estimates (3.7). The $H^2$-regularity estimates for (3.6) on the domain we prescribed can be derived using ADN theory [1] and following reasoning similar to that of the proof in [22].

Proof. Let $l = \{l_{ij}\}$ for $1 \le i, j \le 2n + 2$ and $B = \{B_{\mu j}\}$ for $1 \le \mu \le 2n$, $1 \le j \le 2n + 2$ denote the differential operator and boundary operator corresponding to (3.6), respectively. Then for $n = 3$ we have
\[
l = \begin{bmatrix}
-\Delta & 0 & 0 & \frac{1}{\nu\sigma} & 0 & 0 & \partial_1 & 0 \\
0 & -\Delta & 0 & 0 & \frac{1}{\nu\sigma} & 0 & \partial_2 & 0 \\
0 & 0 & -\Delta & 0 & 0 & \frac{1}{\nu\sigma} & \partial_3 & 0 \\
\frac{1}{\nu} & 0 & 0 & \Delta & 0 & 0 & 0 & \partial_1 \\
0 & \frac{1}{\nu} & 0 & 0 & \Delta & 0 & 0 & \partial_2 \\
0 & 0 & \frac{1}{\nu} & 0 & 0 & \Delta & 0 & \partial_3 \\
\partial_1 & \partial_2 & \partial_3 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \partial_1 & \partial_2 & \partial_3 & 0 & 0
\end{bmatrix},
\qquad
U = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ v_1 \\ v_2 \\ v_3 \\ p/\nu \\ q/\nu \end{bmatrix},
\]
\[
F = \begin{bmatrix} 0 & 0 & 0 & \hat u_1/\nu & \hat u_2/\nu & \hat u_3/\nu & 0 & 0 \end{bmatrix}^t,
\qquad\text{and}\qquad
B_{\mu j} = \delta_{\mu j},
\]
where $\delta_{\mu j}$ is the Kronecker delta. One may have similar matrices for $n = 2$ easily. Following the developments and notation in [1], we assign a system of integer indices $\{s_i\}$, $s_i \le 0$, for the equations and $\{t_j\}$, $t_j \ge 0$, for the unknown functions. Then, for the system (3.6), we choose Sobolev norms based on the scales
\[
s_i = 0 \ (1 \le i \le 2n), \qquad s_{2n+1} = s_{2n+2} = -1
\]
for the equations and
\[
t_j = 2 \ (1 \le j \le 2n), \qquad t_{2n+1} = t_{2n+2} = 1
\]
for the variables. Next, the principal part $l'$ of the interior operator $l$ can be chosen by taking any term $(i, j)$ whose order is $(s_i + t_j)$ as
\[
l'(\partial) = \begin{bmatrix}
-\Delta & 0 & 0 & 0 & 0 & 0 & \partial_1 & 0 \\
0 & -\Delta & 0 & 0 & 0 & 0 & \partial_2 & 0 \\
0 & 0 & -\Delta & 0 & 0 & 0 & \partial_3 & 0 \\
0 & 0 & 0 & \Delta & 0 & 0 & 0 & \partial_1 \\
0 & 0 & 0 & 0 & \Delta & 0 & 0 & \partial_2 \\
0 & 0 & 0 & 0 & 0 & \Delta & 0 & \partial_3 \\
\partial_1 & \partial_2 & \partial_3 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \partial_1 & \partial_2 & \partial_3 & 0 & 0
\end{bmatrix}.
\]
Next, replacing $\partial_j$ by $\xi_j$ $(1 \le j \le n)$, respectively, we have the determinant of the principal part as
\[
L(\xi) := \det(l'(\xi)) = (-1)^{n-1}|\xi|^{4n},
\]
where $|\xi|^2 = \xi_1^2 + \cdots + \xi_n^2$ and $\xi = (\xi_1, \ldots, \xi_n)$ for $n = 2$ or 3 dimensions. Hence the ellipticity of (3.6) in the ADN sense is shown so that the uniform ellipticity condition
\[
A^{-1}|\xi|^{2m} \le |L(\xi)| \le A|\xi|^{2m}
\]
holds with $m = 2n$ and $A = 1$.

Supplementary condition on $l$. Note that $L(\xi)$ is of even degree $2m$ (with respect to $\xi$). For every pair of linearly independent real vectors $\xi$, $\xi'$, the polynomial $L(\xi + \kappa\xi') = (-1)^{n-1}|\xi + \kappa\xi'|^{4n}$ in the complex variable $\kappa$ has exactly $m$ $(= 2n)$ roots,
\[
\kappa^+(\xi, \xi') = \frac{-\xi\cdot\xi' + i\sqrt{|\xi|^2|\xi'|^2 - |\xi\cdot\xi'|^2}}{|\xi'|^2},
\]
with positive imaginary part.

Complementing boundary conditions. We examine the complementary boundary condition in an attempt to show that the boundary conditions of the system are independent of its interior equations and provide a well-posed problem. Note that the matrix for the boundary operator consists of $B_{\mu j} = \delta_{\mu j}$, where $1 \le \mu \le 2n$, $1 \le j \le 2n + 2$ $(n = 2, 3)$. If we take $r_\mu = -2$ for $\mu = 1, \ldots, 2n$, then $\deg B_{\mu j} \le r_\mu + t_j$. The principal boundary operator $B'$ is the same as $B$. At any point $P$ of $\partial\Omega$, let $\mathbf{n}$ denote the outward unit normal and $\tau \ne 0$ any tangential unit vector to $\partial\Omega$ ($\tau$, in particular, is real). Let us denote by $l(\xi)$ and $l'(\xi)$ the stencils of the interior operator $l$ and the principal interior operator $l'$, respectively, where $\xi = \tau + \gamma\mathbf{n}$, and we will use the same notation $B'$ for the stencil matrix corresponding to the boundary operator $B'$ because it is the constant matrix. Note that the only root of $L(\tau + \gamma\mathbf{n}) = (-1)^{n-1}|\tau + \gamma\mathbf{n}|^{4n} = (-1)^{n-1}(1 + \gamma^2)^{2n} = 0$ with positive imaginary part is $i$ with multiplicity $2n = m$. Let $M^+(\gamma) = (\gamma - i)^m$. Now we examine the linear independence of the rows of
\[
B'(P, \xi)\,\mathrm{adj}(l'(\xi)) = L(\xi)\,B'(P, \xi)\,l'^{-1}(P, \xi) = (-1)^{n-1}|\xi|^{4n}\,B'(P, \xi)\,l'^{-1}(P, \xi).
\tag{A.1}
\]
Hence, we must show that
\[
\sum_{\mu=1}^{m} C_\mu \sum_{j=1}^{2n+2} B'_{\mu j}(P, \tau + \gamma\mathbf{n})\,\mathrm{adj}(l'(\tau + \gamma\mathbf{n}))_{jk} \equiv 0 \pmod{M^+}
\tag{A.2}
\]
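if and only if the constants $C_\mu$ all vanish. For $n = 2$, by straightforward calculations, the inverse of $l'(P, \xi)$ is
\[
l'^{-1}(\xi) = \begin{bmatrix}
-\frac{\xi_2^2}{\rho^2} & \frac{\xi_1\xi_2}{\rho^2} & 0 & 0 & \frac{\xi_1}{\rho} & 0 \\
\frac{\xi_1\xi_2}{\rho^2} & -\frac{\xi_1^2}{\rho^2} & 0 & 0 & \frac{\xi_2}{\rho} & 0 \\
0 & 0 & \frac{\xi_2^2}{\rho^2} & -\frac{\xi_1\xi_2}{\rho^2} & 0 & \frac{\xi_1}{\rho} \\
0 & 0 & -\frac{\xi_1\xi_2}{\rho^2} & \frac{\xi_1^2}{\rho^2} & 0 & \frac{\xi_2}{\rho} \\
\frac{\xi_1}{\rho} & \frac{\xi_2}{\rho} & 0 & 0 & 1 & 0 \\
0 & 0 & \frac{\xi_1}{\rho} & \frac{\xi_2}{\rho} & 0 & -1
\end{bmatrix},
\]
where $\rho = |\xi|^2 = \xi_1^2 + \xi_2^2$. Note that $L(\xi) = -|\xi|^8 = -\rho^4$, and hence
\[
L(\xi)\,B'(P, \xi)\,l'^{-1}(P, \xi) = \rho^2 \begin{bmatrix}
\xi_2^2 & -\xi_1\xi_2 & 0 & 0 & -\rho\xi_1 & 0 \\
-\xi_1\xi_2 & \xi_1^2 & 0 & 0 & -\rho\xi_2 & 0 \\
0 & 0 & -\xi_2^2 & \xi_1\xi_2 & 0 & -\rho\xi_1 \\
0 & 0 & \xi_1\xi_2 & -\xi_1^2 & 0 & -\rho\xi_2
\end{bmatrix}.
\tag{A.3}
\]
Without loss of generality, we may assume that the coordinate axes are aligned with the directions of $\tau$ and $\mathbf{n}$, so that $\tau = (1, 0)$ and $\mathbf{n} = (0, 1)$. Then $M^+(\gamma) = (\gamma - i)^4$, $\rho^2 = (\gamma^2 + 1)^2$, and
\[
B'(P, \xi)\,\mathrm{adj}(l'(\tau + \gamma\mathbf{n})) = (\gamma^2 + 1)^2 \begin{bmatrix}
\gamma^2 & -\gamma & 0 & 0 & -(\gamma^2 + 1) & 0 \\
-\gamma & 1 & 0 & 0 & -\gamma(\gamma^2 + 1) & 0 \\
0 & 0 & -\gamma^2 & \gamma & 0 & -(\gamma^2 + 1) \\
0 & 0 & \gamma & -1 & 0 & -\gamma(\gamma^2 + 1)
\end{bmatrix}.
\]
Hence, (A.2) reduces to
\[
(\gamma^2 + 1)^2(C_1\gamma^2 - C_2\gamma) = A_1(\gamma - i)^4,
\qquad
(\gamma^2 + 1)^2(-C_3\gamma^2 + C_4\gamma) = A_2(\gamma - i)^4,
\tag{A.4}
\]
where $A_i$ $(i = 1, 2)$ are polynomials in $\gamma$. The equality is possible if and only if $C_1 = C_2 = A_1 = 0$ and $C_3 = C_4 = A_2 = 0$.

Now let us discuss $n = 3$. Note that the inverse of $l'(\xi)$ is
\[
l'^{-1}(P, \xi) = \begin{bmatrix}
M_{3\times3} & 0_{3\times3} & C_{3\times1} & 0_{3\times1} \\
0_{3\times3} & -M_{3\times3} & 0_{3\times1} & C_{3\times1} \\
C^t_{1\times3} & 0_{1\times3} & 1 & 0 \\
0_{1\times3} & C^t_{1\times3} & 0 & -1
\end{bmatrix}_{8\times8},
\]
where
\[
M = \frac{1}{\rho^2}\begin{bmatrix}
-(\xi_2^2 + \xi_3^2) & \xi_1\xi_2 & \xi_3\xi_1 \\
\xi_1\xi_2 & -(\xi_1^2 + \xi_3^2) & \xi_2\xi_3 \\
\xi_3\xi_1 & \xi_2\xi_3 & -(\xi_1^2 + \xi_2^2)
\end{bmatrix},
\qquad
C = \frac{1}{\rho}\begin{bmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{bmatrix},
\qquad\text{and}\qquad
\rho = |\xi|^2.
\]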
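As a sanity check on the determinant formula $L(\xi) = (-1)^{n-1}|\xi|^{4n}$ used in this appendix, the following minimal sketch assembles the principal symbol $l'(\xi)$ for $n = 2$ and $n = 3$ at sample rational points and evaluates its determinant exactly. The unknowns are ordered as $(u_1, \ldots, u_n, v_1, \ldots, v_n, p/\nu, q/\nu)$, following the matrix above, and the Laplacian is replaced by $|\xi|^2$ as in the text. The helper names and the sample points are assumptions made for illustration.

```python
from fractions import Fraction

def principal_symbol(xi):
    """Matrix l'(xi): each d_j is replaced by xi_j and the Laplacian by |xi|^2."""
    n = len(xi)
    rho = sum(x * x for x in xi)
    size = 2 * n + 2
    m = [[Fraction(0)] * size for _ in range(size)]
    for i in range(n):
        m[i][i] = -rho                 # -Laplacian acting on u_i
        m[n + i][n + i] = rho          # +Laplacian acting on v_i
        m[i][2 * n] = xi[i]            # gradient of p/nu
        m[n + i][2 * n + 1] = xi[i]    # gradient of q/nu
        m[2 * n][i] = xi[i]            # divergence of u
        m[2 * n + 1][n + i] = xi[i]    # divergence of v
    return m

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result           # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result

# Compare det(l'(xi)) with (-1)^(n-1) |xi|^(4n) at one sample point per dimension.
for xi in ([Fraction(1), Fraction(2)], [Fraction(1), Fraction(-2), Fraction(3)]):
    n = len(xi)
    rho = sum(x * x for x in xi)
    assert det(principal_symbol(xi)) == (-1) ** (n - 1) * rho ** (2 * n)
```

Exact rational arithmetic is used so that the comparison is an identity check at the sample points rather than a floating-point approximation.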
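From now on, we may assume that $\mathbf{n} = (0, 0, 1)$ and $\tau = (a, b, 0)$, where $a$, $b$ are arbitrary constants satisfying $a^2 + b^2 = 1$. Note that $\xi = \tau + \gamma\mathbf{n} = (a, b, \gamma)$. Then, since $L(\xi) = \rho^6$, $M^+(\gamma) = (\gamma - i)^6$, and $\rho = 1 + \gamma^2$, it follows that
\[
B'(P, \tau + \gamma\mathbf{n})\,\mathrm{adj}(l'(\tau + \gamma\mathbf{n})) = (\gamma^2 + 1)^4 \begin{bmatrix}
\hat M_{3\times3} & 0_{3\times3} & *_{3\times1} & 0_{3\times1} \\
0_{3\times3} & -\hat M_{3\times3} & 0_{3\times1} & *_{3\times1}
\end{bmatrix},
\tag{A.5}
\]
where
\[
\hat M = \begin{bmatrix}
-(\gamma^2 + b^2) & ab & a\gamma \\
ab & -(\gamma^2 + a^2) & b\gamma \\
a\gamma & b\gamma & -1
\end{bmatrix}
\]
and $*_{3\times1}$ stands for a matrix that does not affect row independence. Due to the structure of the matrix in (A.5), it is enough to show that
\[
(\gamma^2 + 1)^4\left(-(\gamma^2 + b^2)C_{1+j} + ab\,C_{2+j} + a\gamma\,C_{3+j}\right) = A_1(\gamma - i)^6,
\tag{A.6}
\]
where $j = 0, 3$,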
has all zero coefficients C1 = C2 = C3 = C4 = C5 = C6 = 0 for the row independence of the matrix in (A.1), where A1 is a polynomial for γ. This can be easily verified. Therefore, applying Theorem 10.5 in [1] and the following remark, we have the estimates (3.7). Acknowledgment. We would like to thank the referees, whose valuable comments and corrections improved the paper. REFERENCES [1] S. Agmon, A. Douglis, and L. Nirenberg, Estimates near the boundary for solutions of elliptic partial differential equations satisfying general boundary conditions, II, Comm. Pure Appl. Math., 17 (1964), pp. 35–92. [2] D. Bedivan and G. Fix, Least-squares methods for optimal shape design problems, Comput. Math. Appl., 30 (1995), pp. 17–25. [3] P. Bochev, Least-squares methods for optimal control, Nonlinear Anal., 30 (1997), pp. 237–256. [4] P. Bochev and D. Bedivan, Least-squares methods for Navier-Stokes boundary control problems, Int. J. Comput. Fluid Dyn., 9 (1997), pp. 43–58. [5] P. Bochev and M. Gunzburger, Analysis of least-squares finite element methods for the Stokes equations, Math. Comp., 63 (1994), pp. 479–506. [6] P. Bochev and M. Gunzburger, Least-squares finite element methods for optimization and control problems for the Stokes equations, Comput. Math. Appl., 48 (2004), pp.1035–1057. [7] P. Bochev and M. D. Gunzburger, Least-squares finite element methods for optimality systems arising in optimization and control problems, SIAM J. Numer. Anal., 43 (2006), pp. 2517–2543. [8] P. Bochev, Z. Cai, T. A. Manteuffel, and S. F. McCormick, Analysis of velocity-flux first-order system least-squares principles for the Navier–Stokes equations: Part I, SIAM J. Numer. Anal., 35 (1998), pp. 990–1009. [9] P. Bochev, T. A. Manteuffel, and S. F. McCormick, Analysis of velocity-flux least-squares principles for the Navier–Stokes equations: Part II, SIAM J. Numer. Anal., 36 (1999) pp. 1125–1144. [10] J. H. Bramble, R. D. Lazarov, and J. E. 
Pasciak, A least-squares approach based on a discrete minus one inner product for first order system, Math. Comp., 66 (1997), pp. 935–955. [11] Z. Cai, T. A. Manteuffel, and S. F. McCormick, First-order system least squares for the Stokes equations, with application to linear elasticity, SIAM J. Numer. Anal., 34 (1997), pp. 1727–1741. [12] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, North–Holland, Amsterdam, 1978. [13] V. Girault and P. A. Raviart, Finite Element Methods for Navier-Stokes Equations: Theory and Algorithms, Springer-Verlag, New York, 1986. [14] R. Glowinski and J. He, On shape optimization and related issues, in Computational Methods for Optimal Design and Control (Arlington, VA, 1997), Birkhäuser Boston, Boston, 1998, pp. 151–179. [15] M. D. Gunzburger, Perspectives in Flow Control and Optimization, Adv. Des. Control 5, SIAM, Philadelphia, 2002. [16] M. Gunzburger and H.-C. Lee, Analysis and approximation of optimal control problems for first-order elliptic systems in three dimensions, Appl. Math. Comput., 100 (1999), pp. 49–70.
[17] M. Gunzburger and H.-C. Lee, A penalty/least-squares method for optimal control problems for first-order elliptic systems, Appl. Math. Comput., 107 (2000), pp. 57–75.
[18] M. D. Gunzburger and S. Manservisi, Analysis and approximation of the velocity tracking problem for Navier–Stokes flows with distributed control, SIAM J. Numer. Anal., 37 (2000), pp. 1481–1512.
[19] J. He, M. Chevalier, R. Glowinski, R. Metcalfe, A. Nordlander, and J. Periaux, Drag reduction by active control for flow past cylinders, in Computational Mathematics Driven by Industrial Problems (Martina Franca, 1999), Lecture Notes in Math. 1739, Springer-Verlag, Berlin, 2000, pp. 287–363.
[20] J. He, R. Glowinski, R. Metcalfe, A. Nordlander, and J. Periaux, Active control and drag optimization for flow past a circular cylinder, J. Comput. Phys., 163 (2000), pp. 83–117.
[21] S. D. Kim, C.-O. Lee, T. A. Manteuffel, S. F. McCormick, and O. Rohrler, First-order system least squares for the Oseen equations, Numer. Linear Algebra Appl., 13 (2006), pp. 523–542.
[22] S. D. Kim, T. A. Manteuffel, and S. F. McCormick, First-order system least squares (FOSLS) for spatial linear elasticity: Pure traction, SIAM J. Numer. Anal., 38 (2000), pp. 1454–1482.
[23] H.-C. Lee and Y. Choi, A least-squares method for optimal control problems for a second-order elliptic system in two dimensions, J. Math. Anal. Appl., 242 (2000), pp. 105–128.
[24] J.-L. Lions, Optimal Control of Systems Governed by Partial Differential Equations, Springer-Verlag, New York, 1971.
[25] J. Nečas, Équations aux Dérivées Partielles, Presses de l'Université de Montréal, Montréal, Québec, Canada, 1965.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1546–1575
ESTIMATING MULTIDIMENSIONAL DENSITY FUNCTIONS USING THE MALLIAVIN–THALMAIER FORMULA∗
A. KOHATSU-HIGA† AND KAZUHIRO YASUDA‡

Abstract. The Malliavin–Thalmaier formula was introduced in [P. Malliavin and A. Thalmaier, Stochastic Calculus of Variations in Mathematical Finance, Springer-Verlag, Berlin, 2006] as an alternative expression for the density of a multivariate smooth random variable in Wiener space. In comparison with classical integration by parts formulae, this alternative formulation requires the application of the integration by parts formula only once to obtain an expression that can be simulated. Therefore, this expression is free from the curse of dimensionality. Unfortunately, when this formula is applied directly in computer simulation, it exhibits unstable behavior. To solve this problem, we propose an approximation to the Malliavin–Thalmaier formula in the spirit of the theory of kernel density estimation. In the first part of this paper, we obtain a central limit theorem for the estimation error. In the second part, we apply the Malliavin–Thalmaier formula to the calculation of Greeks in finance.

Key words. Malliavin–Thalmaier formula, multidimensional density function, Greeks

AMS subject classifications. 60H07, 60H35, 60J60, 62G07, 65C05

DOI. 10.1137/070687359
∗ Received by the editors April 4, 2007; accepted for publication (in revised form) November 6, 2008; published electronically April 1, 2009. http://www.siam.org/journals/sinum/47-2/68735.html
† Graduate School of Engineering Sciences, Osaka University, Osaka 560-8531, Japan (kohatsu@sigmath.es.osaka-u.ac.jp).
‡ Faculty of Science and Engineering, Hosei University, Koganei, Tokyo 184-8554, Japan (k[email protected]).

1. Introduction. Let $(\Omega, \mathcal{F}, P)$ denote a complete probability space carrying a $k$-dimensional Wiener process $W$, and let $F : \Omega \to \mathbb{R}^d$, $F = (F_1, \dots, F_d)$, $d \ge 2$, be a random vector defined on this space. The goal of the present article is to discuss how to simulate the probability density function of $F$ for $d \ge 2$ using Malliavin calculus. Applications of this problem can be found in a variety of fields where fundamental solutions or density functions cannot be obtained explicitly. This problem has attracted some interest due to its financial applications, although we frame it here as a general density estimation problem.

The classical integration by parts (IBP) formula of Malliavin calculus is an approach that has been suggested by Fournié et al. [4]. For definitions and results on Malliavin calculus, we refer the reader to section 2 of this article, where a brief introduction is given, and to Nualart [8, Theorem 2.1.4 and Proposition 2.1.5, pp. 102–103] or Sanz-Solé [9, Proposition 5.4, p. 67].

Our starting point is an expression for the density of a smooth $d$-dimensional random vector $F$. This basic result can be stated as follows. Let $F = (F_1, \dots, F_d)$ be a nondegenerate random vector and $G$ a smooth random variable. We write $p_{F,G}(x) := E[G \mid F = x]\, p_{F,1}(x)$, where $p_{F,1}(x)$ denotes the density of $F$. Then there exists a random variable $H_{(1,2,\dots,d)}(F;1) \in L^p(\Omega)$ for any $p > 2$ such that
\[
p_{F,G}(\hat{x}) = E\Bigl[\prod_{i=1}^{d} 1_{[0,\infty)}(F_i - \hat{x}_i)\, H_{(1,2,\dots,d)}(F;G)\Bigr], \tag{1.1}
\]
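where $1_{[0,\infty)}(x)$ denotes the indicator function. Here, for $i = 2, \dots, d$,
\[
H_{(1)}(F;G) := \sum_{j=1}^{d} \delta\bigl(G\,(\gamma_F^{-1})^{1j}\, DF_j\bigr), \qquad
H_{(1,\dots,i)}(F;G) := \sum_{j=1}^{d} \delta\bigl(H_{(1,\dots,i-1)}(F;G)\,(\gamma_F^{-1})^{ij}\, DF_j\bigr). \tag{1.2}
\]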
Here $\delta$ denotes the adjoint operator associated to the Malliavin derivative operator $D$ and $\gamma_F$ the Malliavin covariance matrix of $F$. In particular, we remark that $\delta$ is an extension of the Itô integral that also integrates nonadapted processes and is usually called the Skorohod integral. The iterative definition of $H_{(1,\dots,i)}(F;1)$ in (1.2) shows that, in order to compute this expression, one requires the calculation of $i$-iterated stochastic integrals. The Skorohod integral, being a nonadapted integral, is not easy to simulate in iterative form, and therefore, the above expression takes a relatively large amount of time to simulate when $d$ is large unless an explicit simple expression for $H_{(1,\dots,d)}(F;G)$ is obtained. Besides this problem, one also encounters problems of high variance, and therefore, variance reduction methods have to be incorporated, making the problem even less tractable from an applied point of view.

Recently, Malliavin and Thalmaier [7, section 4.5] introduced an alternative IBP formula that seems to alleviate the computational burden for the simulation of densities in high dimension. In fact, Malliavin and Thalmaier express the multidimensional delta function as
\[
\delta_0(x) = \Delta Q_d(x),
\]
where $\Delta = \sum_{i=1}^{d} \frac{\partial^2}{\partial x_i^2}$ is the Laplace operator and $Q_d$ is the fundamental solution of the Poisson equation. Then, for $\hat{x} \in \mathbb{R}^d$, they obtain the following representation for the density of $F$:
\[
p_{F,G}(\hat{x}) = E\Bigl[\sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d(F - \hat{x})\, H_{(i)}(F;G)\Bigr]. \tag{1.3}
\]
Therefore, one needs to simulate $H_{(i)}(F;G)$, which involves only one Skorohod integral instead of the previous $d$-iterated Skorohod integrals in (1.2). In fact, if we partition the time interval into $N$ intervals in order to carry out simulations of the increments of the Wiener process, then the iterated Skorohod integrals appearing in (1.1) will require the calculation over $N^d$ cross-intervals. Instead, formula (1.3) requires only $Nd$.
In principle, one then expects the calculation time to be greatly reduced. Nevertheless, the high variance problem in formula (1.1) is taken to an extreme, as the variance of the estimator in (1.3) is infinite. This problem appears because the limit of $\frac{\partial}{\partial x_i} Q_d(x)$ at $x = 0$ is $\infty$, although the expectation in (1.3) is finite.

Therefore, we propose a slightly modified estimator that depends on a modification parameter $h$, which will converge to $\frac{\partial}{\partial x_i} Q_d(x)$ as $h \to 0$. This will generate a small bias and a large variance which is not infinite. Then we control the explosive behavior of the variance using the number of simulations. This type of calculation is common in kernel density estimation (KDE) methods where this technique has been
very effective. The main difference between traditional KDE theory and the proposal in this paper is that, although the modification we propose here is mathematically natural, it does not correspond to any of the classical estimation methods studied in KDE theory.

In order to “tune the parameter $h$” (an expression used in KDE, meaning how to choose $h$), we obtain in section 3 the bias of the estimation procedure. In section 4 we study the $L^2(\Omega)$-error of estimation to finally obtain, in section 5, the central limit theorem that shows how to tune the parameters of the estimation procedure.

In section 6 we apply the Malliavin–Thalmaier formula to finance, especially to the calculation of Greeks, in the spirit of Fournié et al. [4], where one-dimensional examples are considered. We give an expression for Greeks using the Malliavin–Thalmaier formula. In particular, the weights are free from the curse of dimensionality. That is, the expression does not have a $d$-iterated Skorohod integral.

The article closes in section 7 with the discussion of various simulation results. In particular, we concentrate on the simulation of the density of the stock value and volatility in the Heston model. In order to avoid long proofs, we have moved all technical details to an appendix, leaving the essential ideas in the proofs of the main theorems.

Also note that the expression in (1.1) corresponds to a density only in the case where $G = 1$. To avoid introducing further terminology, we will keep referring to $p_{F,G}(\hat{x})$ as the “density.”

2. Preliminaries. Let us introduce some notation and basic definitions of Malliavin calculus. For a multi-index $\alpha = (\alpha_1, \dots, \alpha_m) \in \{1, \dots, d\}^m$, we denote by $|\alpha| = m$ the length of the multi-index.

2.1. Malliavin calculus. Let $(\Omega, \mathcal{F}, P)$ be a complete probability space. Let $W$ be a $k$-dimensional Wiener process on the time interval $[0, T]$.
We denote by $C_p^\infty(\mathbb{R}^n)$ the set of all infinitely differentiable functions $f : \mathbb{R}^n \to \mathbb{R}$ such that $f$ and all of its partial derivatives have at most polynomial growth. Let $\mathcal{S}$ denote the class of smooth random variables of the form
\[
F = f(W(t_1), \dots, W(t_n)), \tag{2.1}
\]
where $f \in C_p^\infty(\mathbb{R}^n)$, $t_1, \dots, t_n \in [0, T]$, and $n \ge 1$. If $F$ has form (2.1), we define its derivative $D_s^i F$, $i = 1, \dots, k$, as
\[
D_s^i F = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(W(t_1), \dots, W(t_n))\, 1_{[0,t_j]}(s).
\]
We will denote the domain of $D$ in $L^p(\Omega)$ by $\mathbb{D}^{1,p}$. This space is the closure of the class of smooth random variables $\mathcal{S}$ with respect to the norm
\[
\|F\|_{1,p} = \Bigl(E\bigl[|F|^p\bigr] + E\bigl[\|DF\|^p_{L^2[0,T]}\bigr]\Bigr)^{1/p}.
\]
We can define the iteration of the operator $D$ in such a way that, for a smooth random variable $F$, its derivative $D^k F$ is a random variable with values in $L^2[0,T]^{\otimes k}$. Then, for every $p \ge 1$ and $n \in \mathbb{N}$, we introduce a seminorm on $\mathcal{S}$ defined by
\[
\|F\|^p_{n,p} = E\bigl[|F|^p\bigr] + \sum_{j=1}^{n} E\bigl[\|D^j F\|^p_{L^2[0,T]^{\otimes j}}\bigr].
\]
For any real $p \ge 1$ and any natural number $n \ge 0$, we will denote by $\mathbb{D}^{n,p}$ the completion of the family of smooth random variables $\mathcal{S}$ with respect to the norm $\|\cdot\|_{n,p}$. Note that $\mathbb{D}^{j,p} \subset \mathbb{D}^{n,q}$ if $j \ge n$ and $p \ge q$. Consider the intersection
\[
\mathbb{D}^\infty = \bigcap_{p \ge 1} \bigcap_{n \ge 1} \mathbb{D}^{n,p}.
\]
Then $\mathbb{D}^\infty$ is a complete, countably normed, metric space.

We will denote by $\delta$ the adjoint of the operator $D$ as an unbounded operator from $L^2(\Omega)$ into $L^2(\Omega; L^2[0,T])$. That is, the domain of $\delta$, denoted by $\mathrm{Dom}(\delta)$, is the set of $L^2[0,T]$-valued square integrable random variables $u$ such that
\[
\bigl|E\bigl[\langle DF, u\rangle_{L^2[0,T]}\bigr]\bigr| \le c\,\|F\|_2
\]
for all $F \in \mathbb{D}^{1,2}$, where $c$ is some positive constant depending on $u$. (Here $\|\cdot\|_2$ denotes the $L^2(\Omega)$-norm.) We remark here that any $L^2$ integrable adapted process $u$ belongs to the domain of $\delta$. Furthermore, one can prove that, in such a case, $\delta(u)$ is the Itô integral of $u$. In general, $\delta(u)$ is called the Skorohod integral of $u$. A property of $\delta$ that is frequently used is as follows: Let $G \in \mathbb{D}^{1,2}$ be a real-valued random variable such that $Gu \in L^2(\Omega; L^2[0,T])$; then
\[
\delta(Gu) = G\,\delta(u) - \langle DG, u\rangle_{L^2[0,T]}, \tag{2.2}
\]
where we suppose that the right-hand side is integrable.

Suppose that $F = (F_1, \dots, F_d)$ is a random vector whose components belong to the space $\mathbb{D}^{1,1}$. We associate with $F$ the following random symmetric nonnegative definite matrix:
\[
\gamma_F = \bigl(\langle DF_i, DF_j\rangle_{L^2[0,T]}\bigr)_{1 \le i,j \le d}.
\]
This matrix is called the Malliavin covariance matrix of the random vector $F$.

Definition 2.1. We say that a random vector $F = (F_1, \dots, F_d) \in (\mathbb{D}^\infty)^d$ is nondegenerate if its associated Malliavin covariance matrix $\gamma_F$ is invertible a.s. and
\[
(\det \gamma_F)^{-1} \in \bigcap_{p \ge 1} L^p(\Omega).
\]
It is well known that, if $F$ is nondegenerate and $G \in \mathbb{D}^\infty$, then $p_{F,G}$ exists and is smooth and, in particular, one obtains (1.1).

2.2. Malliavin–Thalmaier representation of multidimensional density functions. We represent the delta function by
\[
\delta_0(x) = \Delta Q_d(x) \quad \text{for } x \in \mathbb{R}^d,\ d \ge 2,
\]
in the following sense (see Evans [3, p. 25]). If $f$ is a twice continuously differentiable function with compact support, then the solution of the Poisson equation $\Delta u = f$ is given by the convolution $Q_d * f$, where the fundamental solution (also called Poisson kernel) $Q_d$ has the following explicit form:
\[
Q_2(x) := a_2^{-1} \ln|x| \quad \text{and} \quad Q_d(x) := -a_d^{-1}\, \frac{1}{|x|^{d-2}} \quad \text{for } d \ge 3.
\]
Here $a_d$ is the area of the unit sphere in $\mathbb{R}^d$. The derivative of the Poisson kernel is
\[
\frac{\partial}{\partial x_i} Q_d(x) = A_d\, \frac{x_i}{|x|^d},
\]
where $i = 1, \dots, d$, $A_2 := a_2^{-1}$, and, for $d \ge 3$, $A_d := a_d^{-1}(d-2)$.

Definition 2.2. Given an $\mathbb{R}^d$-valued random vector $F$, an $\mathbb{R}$-valued random variable $G$, a multi-index $\alpha$, and a power $p \ge 1$, we say that there is an IBP formula if there exists a random variable $H_\alpha(F;G) \in L^p(\Omega)$ such that
\[
IP_{\alpha,p}(F,G):\quad E\Bigl[\frac{\partial^{|\alpha|}}{\partial x^\alpha} f(F)\, G\Bigr] = E\bigl[f(F)\, H_\alpha(F;G)\bigr] \quad \text{for all } f \in C_0^{|\alpha|}(\mathbb{R}^d).
\]
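The closed form for the derivative of the Poisson kernel stated above can be checked numerically. The following is a minimal sketch, not part of the original article: the function names are hypothetical, and the formula $a_d = 2\pi^{d/2}/\Gamma(d/2)$ for the area of the unit sphere is a standard fact that we assume rather than take from the text. The sketch compares $A_d x_i/|x|^d$ with a central finite difference of $Q_d$.

```python
import math
from typing import List, Sequence


def unit_sphere_area(d: int) -> float:
    """Area a_d of the unit sphere in R^d (assumed formula: 2*pi^(d/2)/Gamma(d/2))."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)


def poisson_kernel(x: Sequence[float]) -> float:
    """Q_d(x) with the normalization used in the text (d = len(x) >= 2)."""
    d = len(x)
    r = math.sqrt(sum(v * v for v in x))
    if d == 2:
        return math.log(r) / unit_sphere_area(2)
    return -1.0 / (unit_sphere_area(d) * r ** (d - 2))


def poisson_kernel_grad(x: Sequence[float]) -> List[float]:
    """Closed-form gradient: component i equals A_d * x_i / |x|^d."""
    d = len(x)
    r = math.sqrt(sum(v * v for v in x))
    a_d = unit_sphere_area(d)
    big_a = 1.0 / a_d if d == 2 else (d - 2) / a_d  # A_2 and A_d from the text
    return [big_a * v / r ** d for v in x]


def finite_difference_grad(x: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Central finite-difference gradient of Q_d, used only as a cross-check."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((poisson_kernel(xp) - poisson_kernel(xm)) / (2.0 * eps))
    return grad


if __name__ == "__main__":
    point = [0.2, 0.5, -0.1]
    print(poisson_kernel_grad(point))
    print(finite_difference_grad(point))
```

The two printed vectors should agree to several digits away from the origin; the singularity of the gradient at $x = 0$ is what motivates the regularization introduced in section 3.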
Related to the Malliavin–Thalmaier formula, Bally and Caramellino [2] have obtained the following result, which gives specific conditions for the Malliavin–Thalmaier formula to hold.

Proposition 2.1 (Bally and Caramellino [2]). Suppose that, for some $p > 1$,
\[
\sup_{|a| \le R} E\Bigl[\Bigl|\frac{\partial}{\partial x_i} Q_d(F - a)\Bigr|^{\frac{p}{p-1}} + |Q_d(F - a)|^{\frac{p}{p-1}}\Bigr] < \infty \quad \text{for all } R > 0,\ a \in \mathbb{R}^d.
\]
Then, for $\hat{x} \in \mathbb{R}^d$, we have the following:
(i) If $IP_{i,p}(F;G)$, $i = 1, \dots, d$, holds, then the law of $F$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^d$ and the density $p_{F,G}$ is represented as
\[
p_{F,G}(\hat{x}) = E\Bigl[\sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d(F - \hat{x})\, H_{(i)}(F;G)\Bigr]. \tag{2.3}
\]
(ii) If $IP_{\alpha,p}(F;G)$ holds for every multi-index $\alpha$ with $|\alpha| \le m + 1$, then $p_{F,G} \in C^m(\mathbb{R}^d)$ and, for every multi-index $\rho$ with $|\rho| \le m$, one has
\[
\frac{\partial^{|\rho|}}{\partial x^\rho}\, p_{F,G}(\hat{x}) = E\Bigl[\sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d(F - \hat{x})\, H_{(i,\rho)}(F;G)\Bigr].
\]

The heuristic idea of the above proof is to use the IBP formula as follows:
\begin{align*}
p_{F,G}(\hat{x}) &= E\bigl[\Delta Q_d(F - \hat{x})\, G\bigr] \\
&= E\Bigl[\sum_{i=1}^{d} \frac{\partial^2}{\partial x_i^2} Q_d(F - \hat{x})\, G\Bigr] \\
&= E\Bigl[\sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d(F - \hat{x})\, H_{(i)}(F;G)\Bigr].
\end{align*}

Next, we impose conditions to assure that the assumptions of Proposition 2.1 are satisfied. The proof is given in the appendix.

Corollary 2.1. If $F = (F_1, \dots, F_d)$ is a nondegenerate random vector and $G \in \mathbb{D}^\infty$, then the probability density function of $F$ is, for $\hat{x} \in \mathbb{R}^d$,
\[
p_{F,G}(\hat{x}) = E\Bigl[\sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d(F - \hat{x})\, H_{(i)}(F;G)\Bigr].
\]

Assumption 2.1. From now on, we always assume that $F = (F_1, \dots, F_d)$ is a $d$-dimensional nondegenerate random variable and $G \in \mathbb{D}^\infty$.
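As a sanity check of Corollary 2.1, it may help to work through a case where every object is explicit. The following illustration is ours, not the authors': we assume $k = d$, $G = 1$, and $F = W_T$ for some $T > 0$, and we assume that the one-step weight takes the form $H_{(i)}(F;1) = \sum_{j=1}^{d} \delta\bigl((\gamma_F^{-1})^{ij} DF_j\bigr)$, i.e., the single-index analogue of (1.2). Under these assumptions,
\[
D_s^j F_i = \delta_{ij}\, 1_{[0,T]}(s), \qquad \gamma_F = T\, I_d, \qquad H_{(i)}(F;1) = \frac{W_T^i}{T} = \frac{F_i}{T},
\]
where $\delta_{ij}$ is the Kronecker delta, so $F$ is nondegenerate and the representation reads
\[
p_{F,1}(\hat{x}) = \frac{A_d}{T}\, E\Bigl[\sum_{i=1}^{d} \frac{(F_i - \hat{x}_i)\, F_i}{|F - \hat{x}|^d}\Bigr].
\]
For $d = 2$ and $\hat{x} = 0$, the integrand is identically 1, which gives
\[
p_{F,1}(0) = \frac{A_2}{T} = \frac{1}{2\pi T},
\]
in agreement with the value at the origin of the $N(0, T I_2)$ density. For $\hat{x} \neq 0$ the integrand is no longer bounded near $F = \hat{x}$, which is the source of the variance problem discussed in the introduction.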
3. Bias error estimation. In this section, we find the rate of convergence of the modified estimator of the density at $\hat{x} \in \mathbb{R}^d$. From Assumption 2.1, $IP_{\alpha,p}(F;G)$ will always hold (see Nualart [8, Proposition 2.1.4, p. 100] or Sanz-Solé [9, Proposition 5.4, p. 67]). We start with some definitions and notations.

Definitions and Notations.
1. For $h > 0$ and $x \in \mathbb{R}^d$, define $|\cdot|_h$ by
\[
|x|_h := \sqrt{\sum_{i=1}^{d} x_i^2 + h}.
\]
Without loss of generality, we assume $0 < h < 1$.
2. Define the following approximation to $Q_d$ for $x \in \mathbb{R}^d$:
\[
Q_d^h(x) = \begin{cases} a_2^{-1} \ln|x|_h, & d = 2, \\[2pt] -a_d^{-1}\, \dfrac{1}{|x|_h^{d-2}}, & d \ge 3. \end{cases}
\]
Then, for $i = 1, \dots, d$, we have that
\[
\frac{\partial}{\partial x_i} Q_d^h(x) = A_d\, \frac{x_i}{|x|_h^d}.
\]
3. Then we define the approximation to the density function of $F$, for $x \in \mathbb{R}^d$, as
\[
p^h_{F,G}(x) := E\Bigl[\sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d^h(F - x)\, H_{(i)}(F;G)\Bigr]. \tag{3.1}
\]
4. Consider a function $\eta$ which satisfies
(i) $\eta \in C_0^\infty(\mathbb{R}^d)$, $\eta(x) \ge 0$ ($x \in \mathbb{R}^d$),
(ii) $\mathrm{supp}(\eta) \subset \{x \in \mathbb{R}^d : |x| \le 1\}$,
(iii) $\int_{\mathbb{R}^d} \eta(x)\, dx = 1$,
(iv) $\eta(x)$ is symmetric, that is, $\eta(x) = \eta(y)$ when $|x| = |y|$ for $x, y \in \mathbb{R}^d$.
5. For each $\varepsilon > 0$, we define $\eta_\varepsilon(x)$ as
\[
\eta_\varepsilon(x) := \frac{1}{\varepsilon^d}\, \eta\Bigl(\frac{x}{\varepsilon}\Bigr).
\]
6. We define $\tilde{\eta}_\varepsilon(x)$ as follows:
\[
\tilde{\eta}_\varepsilon(x) := \int_{-\infty}^{x_d} \cdots \int_{-\infty}^{x_1} \eta_\varepsilon(y)\, dy_1 \dots dy_d \quad (\le 1 \text{ from 4}).
\]
7. We often use spherical coordinates. To avoid long expressions, we define $\Theta := (\Theta_1, \dots, \Theta_d)^*$ as the coordinate change
\begin{align*}
r\Theta_1 &:= r\cos(\theta_1)\cos(\theta_2)\cdots\cos(\theta_{d-2})\cos(\theta_{d-1}), \\
r\Theta_i &:= r\cos(\theta_1)\cdots\cos(\theta_{d-i})\sin(\theta_{d-i+1}) \quad \text{for } i = 2, \dots, d,
\end{align*}
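where $0 \le r < \infty$, $-\frac{\pi}{2} \le \theta_i \le \frac{\pi}{2}$, $i = 1, \dots, d-2$, and $0 \le \theta_{d-1} \le 2\pi$. Set $s_i = \sin\theta_i$, $c_i = \cos\theta_i$ for $i = 1, \dots, d-1$.

We will give some preparatory lemmas for the following section.

Lemma 3.1. For $m \in \mathbb{N} \cup \{0\}$, let $\alpha \in \{1, \dots, d\}^m$ be any multi-index. Then, for $\hat{x} = (\hat{x}_1, \dots, \hat{x}_d) \in \mathbb{R}^d$, there exists some constant $C$ such that, for $p \ge 1$,
\[
\Bigl|\lim_{\varepsilon \to 0} \frac{\partial^{|\alpha|}}{\partial x^\alpha} E\bigl[\eta_\varepsilon(F - \hat{x})\, G\bigr]\Bigr| \le \frac{C}{1 + |\hat{x}|^p}.
\]

Proof. It is enough to consider the case $p \in \mathbb{N}$. In such a case, we have
\begin{align*}
\Bigl|\lim_{\varepsilon \to 0} (1 + |\hat{x}|^p)\, \frac{\partial^m}{\partial x^\alpha} E\bigl[\eta_\varepsilon(F - \hat{x})\, G\bigr]\Bigr|
&= \Bigl|\lim_{\varepsilon \to 0} (1 + |\hat{x}|^p)\, E\bigl[\eta_\varepsilon(F - \hat{x})\, H_\alpha(F,G)\bigr]\Bigr| \\
&\le \lim_{\varepsilon \to 0} \bigl|E\bigl[\eta_\varepsilon(F - \hat{x})\,(1 + (|F| + \varepsilon)^p)\, H_\alpha(F,G)\bigr]\bigr| \\
&\le C_p\, E\Bigl[\bigl|H_{(1,\dots,d)}\bigl(F;\, (1 + |F|^{2p})\, H_\alpha(F,G)\bigr)\bigr|\Bigr] < +\infty.
\end{align*}

Lemma 3.2. The following holds for $\hat{x} = (\hat{x}_1, \dots, \hat{x}_d) \in \mathbb{R}^d$:
\[
\lim_{\varepsilon \to 0} E\bigl[\eta_\varepsilon(F - \hat{x})\, G\bigr] = E[G \mid F = \hat{x}]\, p_{F,1}(\hat{x}).
\]

Proof. Set $z_i = F_i - \hat{x}_i$ ($i = 1, \dots, d$). By the dominated convergence theorem, the properties of $\eta_\varepsilon$ stated in 4, and Fubini's theorem, we have, for $\varphi \in C_0^\infty(\mathbb{R}^d)$,
\begin{align*}
\int_{\mathbb{R}^d} \lim_{\varepsilon \to 0} E\bigl[\eta_\varepsilon(F - \hat{x})\, G\bigr]\, \varphi(\hat{x})\, d\hat{x}
&= \lim_{\varepsilon \to 0} E\Bigl[\int_{\mathbb{R}^d} \eta_\varepsilon(z)\, \varphi(F + z)\, dz\; G\Bigr] \\
&= E\bigl[\varphi(F)\, G\bigr] \\
&= \int_{\mathbb{R}^d} \varphi(\hat{x})\, E[G \mid F = \hat{x}]\, p_{F,1}(\hat{x})\, d\hat{x}.
\end{align*}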
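The regularized quantities in Definitions 1–3 above (the norm $|\cdot|_h$, the kernel derivative $A_d x_i/|x|_h^d$, and the approximation $p^h_{F,G}$ of (3.1)) can be sketched as a plain Monte Carlo estimator. The following is only an illustration under assumptions that are ours and not the article's: we take $d = 2$, $G = 1$, and $F \sim N(0, I_2)$ (i.e., $F = W_1$), for which we assume the weight is $H_{(i)}(F;1) = F_i$. The function name is hypothetical.

```python
import math
import random
from typing import Sequence


def mt_density_estimate(x_hat: Sequence[float], h: float,
                        n_samples: int, seed: int = 0) -> float:
    """Monte Carlo sketch of p^h_{F,1}(x_hat) from (3.1) for F ~ N(0, I_2).

    Assumption (not from the article): for F = W_1 the weight is H_(i)(F;1) = F_i.
    """
    rng = random.Random(seed)
    big_a2 = 1.0 / (2.0 * math.pi)  # A_2 = 1 / a_2, with a_2 = 2*pi
    total = 0.0
    for _ in range(n_samples):
        f1 = rng.gauss(0.0, 1.0)
        f2 = rng.gauss(0.0, 1.0)
        z1 = f1 - x_hat[0]
        z2 = f2 - x_hat[1]
        # For d = 2, |z|_h^d = |z|^2 + h, so the kernel derivative is A_2 z_i / (|z|^2 + h).
        denom = z1 * z1 + z2 * z2 + h
        total += big_a2 * (z1 * f1 + z2 * f2) / denom
    return total / n_samples


if __name__ == "__main__":
    point = (0.5, -0.3)
    exact = math.exp(-(point[0] ** 2 + point[1] ** 2) / 2.0) / (2.0 * math.pi)
    for h in (0.1, 0.01, 0.001):
        print(h, mt_density_estimate(point, h, 200_000, seed=1), exact)
```

With $h > 0$ each summand is bounded, at the price of a bias whose order is quantified in Theorem 3.1; setting $h = 0$ would recover the unregularized formula, whose variance is infinite.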
The next result gives the order of the error of the approximation to the density.

Theorem 3.1. Let $F$ be a nondegenerate random vector and $G \in \mathbb{D}^\infty$. Then, for $\hat{x} = (\hat{x}_1, \dots, \hat{x}_d) \in \mathbb{R}^d$,
\[
p_{F,G}(\hat{x}) - p^h_{F,G}(\hat{x}) = C_1^{\hat{x}}\, h \ln\frac{1}{h} + C_2^{\hat{x}}\, h + o(h),
\]
where
\[
C_1^{\hat{x}} := \sum_{i=1}^{d} C_{1,i}^{\hat{x}} \quad \text{and} \quad C_2^{\hat{x}} := \sum_{i=1}^{d} \Bigl\{ C_{2,i}^{\hat{x}} + \sum_{j,k=1}^{d} C_{3,i,j,k}^{\hat{x}} + C_{4,i}^{\hat{x}} \Bigr\},
\]
and the constants appearing above are defined in Lemmas 8.3, 8.4, and 8.5 in the appendix.

Proof. As we will have to change from rectangular to spherical coordinates, set $y_1 - \hat{x}_1 = r\Theta_1$ and $y_i - \hat{x}_i = r\Theta_i$ for $i = 2, \dots, d$.
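By using Lemma 3.2 and spherical coordinates,
\begin{align*}
p_{F,G}(\hat{x}) - p^h_{F,G}(\hat{x})
&= E\Bigl[\sum_{i=1}^{d} \Bigl(\frac{\partial}{\partial x_i} Q_d(F - \hat{x}) - \frac{\partial}{\partial x_i} Q_d^h(F - \hat{x})\Bigr) H_{(i)}(F;G)\Bigr] \\
&= A_d \sum_{i=1}^{d} \int_{\mathbb{R}^d} \Bigl(\frac{y_i - \hat{x}_i}{|y - \hat{x}|^d} - \frac{y_i - \hat{x}_i}{|y - \hat{x}|_h^d}\Bigr) \lim_{\varepsilon \to 0} E\bigl[\eta_\varepsilon(F - y)\, H_{(i)}(F;G)\bigr]\, dy_1 \cdots dy_d \\
&= A_d \sum_{i=1}^{d} \int_0^{2\pi} \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \cdots \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \Bigl(\int_0^1 + \int_1^\infty\Bigr) \frac{(r^2 + h)^{\frac{d}{2}} - r^d}{(r^2 + h)^{\frac{d}{2}}}\, \Theta_i\, c_1^{d-2} \cdots c_{d-2}\, \lim_{\varepsilon \to 0} \Phi^F_{i,\varepsilon}(r\Theta + \hat{x})\, dr\, d\theta_1 \cdots d\theta_{d-1},
\end{align*}
where $\Phi^F_{i,\varepsilon}(y) := E\bigl[\eta_\varepsilon(F - y)\, H_{(i)}(F;G)\bigr]$ for $i = 1, \dots, d$. Here note that the limits appearing in the above formula exist due to Lemmas 3.1 and 3.2.

Next, we consider the integral for $r \in [0, 1]$, where the following Taylor formula is used:
\[
\Phi^F_{i,\varepsilon}(r\Theta + \hat{x}) = \Phi^F_{i,\varepsilon}(\hat{x}) + \sum_{j=1}^{d} r\Theta_j\, \frac{\partial}{\partial y_j} \Phi^F_{i,\varepsilon}(\hat{x}) + \frac{1}{2} \sum_{j,k=1}^{d} r^2\, \Theta_j \Theta_k \int_0^1 \frac{\partial^2}{\partial y_k\, \partial y_j} \Phi^F_{i,\varepsilon}(\hat{x} + \gamma r\Theta)\, d\gamma.
\]
This leads to three terms, whose orders of convergence are analyzed, respectively, in Lemmas 8.2, 8.3, and 8.4 in the appendix. Finally, the integral term for $r \in [1, +\infty)$ is analyzed in Lemma 8.5 in the appendix. Therefore, one obtains that
\[
p_{F,G}(\hat{x}) - p^h_{F,G}(\hat{x}) = \sum_{i=1}^{d} \Bigl\{ C_{1,i}^{\hat{x}}\, h \ln\frac{1}{h} + C_{2,i}^{\hat{x}}\, h + o(h) + \sum_{j,k=1}^{d} \bigl(C_{3,i,j,k}^{\hat{x}}\, h + o(h)\bigr) + C_{4,i}^{\hat{x}}\, h + o(h) \Bigr\}.
\]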
The constants are explicitly given in the appendix.

4. Estimation of the $L^2$-error of the approximation. In this section, we compute the rate at which the $L^2$-error of the estimator diverges. That is,
\begin{align*}
E\Bigl[\Bigl(\sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d^h(F - \hat{x})\, H_{(i)}(F;G) - p_{F,G}(\hat{x})\Bigr)^2\Bigr]
&= E\Bigl[\Bigl(\sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d^h(F - \hat{x})\, H_{(i)}(F;G)\Bigr)^2\Bigr] \\
&\quad + 2\, p_{F,G}(\hat{x}) \bigl(p_{F,G}(\hat{x}) - p^h_{F,G}(\hat{x})\bigr) - p_{F,G}(\hat{x})^2. \tag{4.1}
\end{align*}
Therefore, it is enough to estimate the rate of divergence of the first term in (4.1), as the second term converges to 0 (proved in Theorem 3.1) and the third is a constant.
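The term we will calculate is then
\[
E\Bigl[\Bigl(\sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d^h(F - \hat{x})\, H_{(i)}(F;G)\Bigr)^2\Bigr] = \sum_{i,j=1}^{d} E\Bigl[\frac{\partial}{\partial x_i} Q_d^h(F - \hat{x})\, \frac{\partial}{\partial x_j} Q_d^h(F - \hat{x})\, H_{(i)}(F;G)\, H_{(j)}(F;G)\Bigr].
\]
Let $\hat{\Phi}^F_{i,j,\varepsilon}(y) := E\bigl[\eta_\varepsilon(F - y)\, H_{(i)}(F;G)\, H_{(j)}(F;G)\bigr]$ for $i, j = 1, \dots, d$.

4.1. Case $d = 2$.

Theorem 4.1. Let $F$ be a nondegenerate random vector and $G \in \mathbb{D}^\infty$. Then, for $d = 2$,
\[
E\Bigl[\Bigl(\sum_{i=1}^{2} \frac{\partial}{\partial x_i} Q_2^h(F - \hat{x})\, H_{(i)}(F;G) - p_{F,G}(\hat{x})\Bigr)^2\Bigr] = C_3^{\hat{x}} \ln\frac{1}{h} + O(1), \quad \hat{x} \in \mathbb{R}^d,
\]
where $C_3^{\hat{x}} := \sum_{i=1}^{2} C_{5,i}^{\hat{x}}$ and the constants $C_{5,i}^{\hat{x}}$ are defined in Lemma 8.6 in the appendix.

Proof. Set $y_i - \hat{x}_i = r\Theta_i$ for $i = 1, 2$. For $i, j = 1, 2$, by using Lemma 3.2, Taylor expansion, and spherical coordinates,
\begin{align*}
&E\Bigl[\frac{\partial}{\partial x_i} Q_2^h(F - \hat{x})\, \frac{\partial}{\partial x_j} Q_2^h(F - \hat{x})\, H_{(i)}(F;G)\, H_{(j)}(F;G)\Bigr] \\
&\quad = A_2^2 \int_{\mathbb{R}^2} \frac{(y_i - \hat{x}_i)(y_j - \hat{x}_j)}{|y - \hat{x}|_h^4}\, \lim_{\varepsilon \to 0} \hat{\Phi}^F_{i,j,\varepsilon}(y)\, dy_1\, dy_2 \\
&\quad = A_2^2 \int_0^{2\pi} \int_0^{2|\hat{x}|+1} r\, \frac{r^2\, \Theta_i \Theta_j}{(r^2 + h)^2}\, \lim_{\varepsilon \to 0} \Bigl\{ \hat{\Phi}^F_{i,j,\varepsilon}(\hat{x}) + \sum_{k=1}^{2} r\Theta_k \int_0^1 \frac{\partial}{\partial y_k} \hat{\Phi}^F_{i,j,\varepsilon}(\hat{x} + \gamma r\Theta)\, d\gamma \Bigr\}\, dr\, d\theta \tag{4.2} \\
&\qquad + A_2^2 \int_0^{2\pi} \int_{2|\hat{x}|+1}^{\infty} r\, \frac{r^2\, \Theta_i \Theta_j}{(r^2 + h)^2}\, \lim_{\varepsilon \to 0} \hat{\Phi}^F_{i,j,\varepsilon}(r\Theta + \hat{x})\, dr\, d\theta.
\end{align*}
Then, by using Lemmas 8.6, 8.7, and 8.8, we obtain
\[
(4.2) = \sum_{i=1}^{2} C_{5,i}^{\hat{x}} \ln\frac{1}{h} + O(1).
\]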
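A rough way to see where the $\ln\frac{1}{h}$ rate comes from is to keep only the constant term $\hat{\Phi}^F_{i,j,\varepsilon}(\hat{x})$ of the Taylor expansion in (4.2). This is a heuristic sketch of ours only; the precise constants are those of Lemma 8.6 in the appendix, which is not reproduced here. For $d = 2$ we have $\Theta = (\cos\theta, \sin\theta)$, so the angular integral is $\int_0^{2\pi} \Theta_i \Theta_j\, d\theta = \pi\, \delta_{ij}$ (Kronecker delta), and for any fixed $R > 0$ the radial integral has the closed form (substitute $u = r^2$)
\[
\int_0^R \frac{r^3}{(r^2 + h)^2}\, dr = \frac{1}{2}\Bigl[\ln\frac{R^2 + h}{h} + \frac{h}{R^2 + h} - 1\Bigr] = \frac{1}{2} \ln\frac{1}{h} + O(1) \quad \text{as } h \to 0.
\]
The analogous computation in dimension $d \ge 3$ uses the scaling $r = \sqrt{h}\, s$:
\[
\int_0^1 \frac{r^{d+1}}{(r^2 + h)^d}\, dr = h^{1 - \frac{d}{2}} \int_0^{1/\sqrt{h}} \frac{s^{d+1}}{(s^2 + 1)^d}\, ds,
\]
where the last integral converges as $h \to 0$ because the integrand decays like $s^{1-d}$. This suggests a divergence of order $h^{-(\frac{d}{2} - 1)}$ in higher dimensions.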
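4.2. Case $d \ge 3$.

Theorem 4.2. Let $F$ be a nondegenerate random vector and $G \in \mathbb{D}^\infty$. Then, for $d \ge 3$,
\[
E\Bigl[\Bigl(\sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d^h(F - \hat{x})\, H_{(i)}(F;G) - p_{F,G}(\hat{x})\Bigr)^2\Bigr] = C_4^{\hat{x}}\, \frac{1}{h^{\frac{d}{2}-1}} + o\Bigl(\frac{1}{h^{\frac{d}{2}-1}}\Bigr), \quad \hat{x} \in \mathbb{R}^d,
\]
where $C_4^{\hat{x}} = \sum_{i=1}^{d} C_{8,i}^{\hat{x}}$ and the constants $C_{8,i}^{\hat{x}}$ are defined in Lemma 8.10.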
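Proof. Let $y_i - \hat{x}_i = r\Theta_i$ for $i = 1, \dots, d$. For $i, j = 1, \dots, d$, by using Lemma 3.2, Taylor expansion, and spherical coordinates,
\begin{align*}
&E\Bigl[\frac{\partial}{\partial x_i} Q_d^h(F - \hat{x})\, \frac{\partial}{\partial x_j} Q_d^h(F - \hat{x})\, H_{(i)}(F;G)\, H_{(j)}(F;G)\Bigr] \\
&\quad = A_d^2 \int_{\mathbb{R}^d} \frac{(y_i - \hat{x}_i)(y_j - \hat{x}_j)}{|y - \hat{x}|_h^{2d}}\, \lim_{\varepsilon \to 0} \hat{\Phi}^F_{i,j,\varepsilon}(y)\, dy_1 \dots dy_d \\
&\quad = A_d^2 \int_0^{2\pi} \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \cdots \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \int_0^1 \frac{r^2\, \Theta_i \Theta_j}{(r^2 + h)^d}\, r^{d-1} c_1^{d-2} \cdots c_{d-2} \tag{4.3} \\
&\qquad\quad \times \lim_{\varepsilon \to 0} \Bigl\{ \hat{\Phi}^F_{i,j,\varepsilon}(\hat{x}) + \sum_{k=1}^{d} r\Theta_k \int_0^1 \frac{\partial}{\partial y_k} \hat{\Phi}^F_{i,j,\varepsilon}(\hat{x} + \gamma r\Theta)\, d\gamma \Bigr\}\, dr\, d\theta_1 \dots d\theta_{d-1} \\
&\qquad + A_d^2 \int_0^{2\pi} \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \cdots \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} \int_1^\infty \frac{r^2\, \Theta_i \Theta_j}{(r^2 + h)^d}\, r^{d-1} c_1^{d-2} \cdots c_{d-2}\, \lim_{\varepsilon \to 0} \hat{\Phi}^F_{i,j,\varepsilon}(\hat{x} + r\Theta)\, dr\, d\theta_1 \dots d\theta_{d-1}.
\end{align*}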
Then from Lemmas 8.10, 8.11, and 8.12, we obtain our result.

Remark 4.1. In particular, for $h = 0$, one obtains that the variance of the Malliavin–Thalmaier estimator is infinite. We also point out that this situation appears in KDE theory as well. In particular, one uses as estimator $h^{-1} K\bigl(\frac{F - x}{h}\bigr)$, where $h$ is the tuning parameter and $K$ is a smooth density function with mean 0 and finite moments. In this case, the bias is of order $O(h^2)$, and the $L^2$-error is of order $O\bigl(\frac{1}{h^{d/2}}\bigr)$. In that situation, as we will do in the next section, the solution is to use the sample size in order to obtain the convergence of the estimator.

5. The central limit theorem. When performing simulations, one is also interested in obtaining confidence intervals, and therefore, the central limit theorem is useful in such a situation. In what follows, $\Rightarrow$ denotes weak convergence, and the index $j = 1, \dots, N$ denotes $N$ independent copies of the respective random variables. The symbol $\lfloor \cdot \rfloor$ denotes the greatest integer function.

Theorem 5.1. Let $Z$ be a random variable with standard normal distribution, and let $(F^{(j)}, G^{(j)}) \in (\mathbb{D}^\infty)^d \times \mathbb{D}^\infty$, $j \in \mathbb{N}$, be a sequence of independent identically distributed random vectors.
(i) When $d = 2$, set $n = \Bigl\lfloor \frac{C}{h \ln\frac{1}{h}} \Bigr\rfloor$ and $N = \Bigl\lfloor \frac{C^2}{h^2 \ln\frac{1}{h}} \Bigr\rfloor$ for some positive constant $C$ fixed throughout. Then as $h \to 0$,
\[
n \Bigl( \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{2} \frac{\partial}{\partial x_i} Q_2^h\bigl(F^{(j)} - \hat{x}\bigr)\, H_{(i)}(F;G)^{(j)} - p_{F,G}(\hat{x}) \Bigr) \Longrightarrow \sqrt{C_3^{\hat{x}}}\, Z - C_1^{\hat{x}}\, C, \tag{5.1}
\]
where $H_{(i)}(F;G)^{(j)}$, $i = 1, \dots, d$, $j = 1, \dots, N$, denotes the weight obtained in the $j$th independent simulation (the same that generates $F^{(j)}$ and $G^{(j)}$).
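(ii) When $d \ge 3$, set $n = \Bigl\lfloor \frac{C}{h \ln\frac{1}{h}} \Bigr\rfloor$ and $N = \Bigl\lfloor \frac{C^2}{h^{\frac{d}{2}+1} \bigl(\ln\frac{1}{h}\bigr)^2} \Bigr\rfloor$ for some positive constant $C$ fixed throughout. Then as $h \to 0$,
\[
n \Bigl( \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d^h\bigl(F^{(j)} - \hat{x}\bigr)\, H_{(i)}(F;G)^{(j)} - p_{F,G}(\hat{x}) \Bigr) \Longrightarrow \sqrt{C_4^{\hat{x}}}\, Z - C_1^{\hat{x}}\, C. \tag{5.2}
\]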
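Proof. Consider
\begin{align*}
&n \Bigl( \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d^h\bigl(F^{(j)} - \hat{x}\bigr)\, H_{(i)}(F;G)^{(j)} - p_{F,G}(\hat{x}) \Bigr) \\
&\quad = \frac{n}{N} \sum_{j=1}^{N} \Bigl\{ \sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d^h\bigl(F^{(j)} - \hat{x}\bigr)\, H_{(i)}(F;G)^{(j)} - p^h_{F,G}(\hat{x}) \Bigr\} + n \bigl( p^h_{F,G}(\hat{x}) - p_{F,G}(\hat{x}) \bigr).
\end{align*}
Due to the definition of $n$ and Theorem 3.1, the second term above converges to $-C_1^{\hat{x}} C$. Therefore, it remains only to prove a central limit theorem for $\frac{n}{N} \sum_{j=1}^{N} \zeta_j^h$, where
\[
\zeta_j^h := \sum_{i=1}^{d} \frac{\partial}{\partial x_i} Q_d^h\bigl(F^{(j)} - \hat{x}\bigr)\, H_{(i)}(F;G)^{(j)} - p^h_{F,G}(\hat{x}).
\]
Note that $\{\zeta_j^h\}$ is a sequence of independent and identically distributed random variables with $E[\zeta_1^h] = 0$. To prove the central limit theorem, we compute the characteristic function of $\frac{n}{N} \sum_{j=1}^{N} \zeta_j^h$. By Taylor expansion and Lemmas 8.13 and 8.14,
\[
E\Bigl[\exp\Bigl(\sqrt{-1}\, u\, \frac{n}{N} \sum_{j=1}^{N} \zeta_j^h\Bigr)\Bigr] = \Bigl(1 - \frac{1}{N} \Bigl\{ \frac{u^2}{2}\, \frac{n^2}{N}\, E\bigl[(\zeta_1^h)^2\bigr] - N \times R \Bigr\}\Bigr)^N \longrightarrow \exp\Bigl(-\frac{u^2}{2}\, C^{\hat{x}}\Bigr),
\]
where $C^{\hat{x}} = C_3^{\hat{x}}$ when $d = 2$, $C^{\hat{x}} = C_4^{\hat{x}}$ when $d \ge 3$, and we set
\[
R := E\Bigl[\exp\Bigl(\frac{\sqrt{-1}\, u\, n}{N}\, \zeta_1^h\Bigr)\Bigr] - \Bigl(1 - \frac{1}{2}\, \frac{u^2 n^2}{N^2}\, E\bigl[(\zeta_1^h)^2\bigr]\Bigr).
\]

Remark 5.1. (i) In the assertion of Theorem 5.1, we can freely choose the constant $C$. Therefore, if $C$ is small (with respect to $C_1^{\hat{x}}$), then the bias becomes small.
(ii) This theorem also gives an idea of how to choose $h$ once $n$ or $N$ is fixed.
(iii) The constants $C_3^{\hat{x}}$ and $C_1^{\hat{x}}$ have explicit expressions, but they seem cumbersome to compute for each model. One alternative way to compute these constants is to perform a pilot simulation and estimate the constants through a histogram of the left-hand side (LHS) of (5.1) or (5.2).
(iv) This theorem can be applied to obtain the values of the constants $C_3^{\hat{x}}$ and $C_1^{\hat{x}}$, which can later be used to choose an appropriate value for $h$.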
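The scaling rules of Theorem 5.1 translate directly into a small helper that returns the normalization $n$ and the number of simulations $N$ for a given regularization parameter $h$. The following sketch reflects our reading of the theorem, with the floor function applied to both quantities; the function name and the default $C = 1$ are hypothetical choices made for illustration.

```python
import math
from typing import Tuple


def simulation_sizes(h: float, d: int, c: float = 1.0) -> Tuple[int, int]:
    """Return (n, N) as read from Theorem 5.1 for regularization parameter h.

    n = floor(C / (h ln(1/h))) in all dimensions;
    N = floor(C^2 / (h^2 ln(1/h)))               when d = 2,
    N = floor(C^2 / (h^(d/2 + 1) (ln(1/h))^2))   when d >= 3.
    """
    if not 0.0 < h < 1.0:
        raise ValueError("h must lie in (0, 1)")
    if d < 2:
        raise ValueError("the formula is stated for d >= 2")
    if c <= 0.0:
        raise ValueError("C must be positive")
    log_term = math.log(1.0 / h)
    n = math.floor(c / (h * log_term))
    if d == 2:
        big_n = math.floor(c ** 2 / (h ** 2 * log_term))
    else:
        big_n = math.floor(c ** 2 / (h ** (d / 2.0 + 1.0) * log_term ** 2))
    return n, big_n


if __name__ == "__main__":
    for dim in (2, 3, 4):
        for h in (0.1, 0.01, 0.001):
            print(dim, h, simulation_sizes(h, dim))
```

The output makes the cost of a small bias visible: for fixed $C$, the required number of simulations grows like $h^{-2}$ (up to logarithms) in dimension 2 and like $h^{-(d/2+1)}$ for $d \ge 3$.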
6. Application of the Malliavin–Thalmaier formula to finance. In this section, we compute Greeks using the Malliavin–Thalmaier formula. The setup of this section is rather general and does not refer to the financial issues. We refer the reader to Fournié et al. [4] for more details about the financial background.

We consider a random vector $F^\mu = (F_1^\mu, \dots, F_d^\mu)$, $\mu \in \mathbb{R}^m$, $m \in \mathbb{N}$, which depends on a parameter $\mu$. We suppose throughout this section that $F^\mu$ is a.s. differentiable with respect to $\mu$. Furthermore, we assume that $F^\mu \in (\mathbb{D}^\infty)^d$ is a nondegenerate random vector. Let $f(x_1, \dots, x_d)$ be a payoff function in the following class $\mathcal{A}$:¹
\[
\mathcal{A} := \Bigl\{ f : \mathbb{R}^d \to \mathbb{R} \;:\; f \text{ is continuous a.e. w.r.t. Lebesgue measure, and there exist constants } c, a \text{ such that } |f(x)| \le \frac{c}{(1 + |x|)^a}\ (a > 1) \Bigr\}.
\]
Note that functions in $\mathcal{A}$ are bounded. A Greek is defined for $f \in \mathcal{A}$ as the following quantity for some $j \in \{1, \dots, m\}$:
\[
\frac{\partial}{\partial \mu_j} E\bigl[f(F_1^\mu, \dots, F_d^\mu)\bigr].
\]
As the study of the second derivative is similar, we concentrate on the above quantity and just quote the result for second derivatives in the next section.

First, we give some lemmas. For $i = 1, \dots, d$ and $f \in \mathcal{A}$, set
\[
g_i(y) := \int_{\mathbb{R}^d} f(x)\, \frac{\partial}{\partial x_i} Q_d(y - x)\, dx, \qquad
g_i^h(y) := \int_{\mathbb{R}^d} f(x)\, \frac{\partial}{\partial x_i} Q_d^h(y - x)\, dx.
\]
Note that $g_i^h \in C^\infty(\mathbb{R}^d)$ for $i = 1, \dots, d$.

Lemma 6.1. For $f \in \mathcal{A} \cap L^p(\mathbb{R}^d)$ ($p > 1$) and $i = 1, \dots, d$,
\[
g_i^h(y) \longrightarrow g_i(y) \quad \text{a.e. as } h \to 0.
\]
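Proof. For $\delta > 0$,
\begin{align*}
\int_{\mathbb{R}^d} f(\hat{x}) \Bigl(\frac{\partial Q_d}{\partial x_i}(y - \hat{x}) - \frac{\partial Q_d^h}{\partial x_i}(y - \hat{x})\Bigr) d\hat{x}
&= \int_{|y - \hat{x}| \le \delta} f(\hat{x}) \Bigl(\frac{\partial Q_d}{\partial x_i}(y - \hat{x}) - \frac{\partial Q_d^h}{\partial x_i}(y - \hat{x})\Bigr) d\hat{x} \tag{6.1} \\
&\quad + \int_{|y - \hat{x}| > \delta} f(\hat{x}) \Bigl(\frac{\partial Q_d}{\partial x_i}(y - \hat{x}) - \frac{\partial Q_d^h}{\partial x_i}(y - \hat{x})\Bigr) d\hat{x}. \tag{6.2}
\end{align*}
Note that $f \in \mathcal{A} \Rightarrow f \in L^p(\mathbb{R}^d)$ ($p > d/a$). Then we take $\frac{d}{a} < p < d$ and $\frac{1}{p} + \frac{1}{q} = 1$. By the dominated convergence theorem, we have that, for any $\delta > 0$,
\[
|(6.2)| \le \|f\|_p\, \Bigl\| \frac{\partial Q_d}{\partial x_i}(y - \cdot) - \frac{\partial Q_d^h}{\partial x_i}(y - \cdot) \Bigr\|_{q, B(y;\delta)^c} \longrightarrow 0 \quad (h \to 0),
\]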
¹ Note that, in the case of a put option, clearly $(K - x)^+ \in \mathcal{A}$. Also, in the digital put option case, $1_{[0,K]}(x) \in \mathcal{A}$. In the call-type cases, the results in this section apply if one uses the put-call parity. We remark here that, as pointed out in [4], in some cases a localization procedure is needed in order to obtain a method with small variance.
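where $\|\cdot\|_p$ denotes the $L^p(\mathbb{R}^d)$-norm, $\|\cdot\|_{q, B(y;\delta)^c}$ denotes the $L^q(B(y;\delta)^c)$-norm, and $B(y;\delta)^c$ denotes the complement of the $d$-dimensional ball with center $y \in \mathbb{R}^d$ and radius $\delta > 0$. Next we consider (6.1):
\begin{align*}
(6.1) &= \int_{|y - \hat{x}| \le \delta} \bigl(f(\hat{x}) - f(y)\bigr) \Bigl(\frac{\partial}{\partial x_i} Q_d(y - \hat{x}) - \frac{\partial}{\partial x_i} Q_d^h(y - \hat{x})\Bigr) d\hat{x} \tag{6.3} \\
&\quad + f(y) \int_{|y - \hat{x}| \le \delta} \Bigl(\frac{\partial}{\partial x_i} Q_d(y - \hat{x}) - \frac{\partial}{\partial x_i} Q_d^h(y - \hat{x})\Bigr) d\hat{x}.
\end{align*}
As in the proof of Lemma 8.2, the second term equals 0. Moreover, as $\delta \to 0$, (6.3) converges to 0 due to the a.e. continuity of $f$ and the fact that $\int_{|y - \hat{x}| \le \delta} \bigl|\frac{\partial}{\partial x_i} Q_d(y - \hat{x}) - \frac{\partial}{\partial x_i} Q_d^h(y - \hat{x})\bigr|\, d\hat{x} < \infty$. Therefore, the result follows.

From Lemma 6.1, we obtain the following convergence.

Lemma 6.2. For $f \in \mathcal{A}$ and $i = 1, \dots, d$,
\[
E\bigl[g_i^h(F^\mu)\bigr] \longrightarrow E\bigl[g_i(F^\mu)\bigr] \quad \text{as } h \to 0.
\]
We denote the integration with respect to $p^h_{F^\mu,1}(x)$ by $E^h[\cdot]$. That is,
\[
E^h\bigl[f(F^\mu)\bigr] := \int_{\mathbb{R}^d} f(\hat{x})\, p^h_{F^\mu,1}(\hat{x})\, d\hat{x}.
\]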
Lemma 6.3. For $f \in \mathcal{A}$,
\[
E\bigl[f(F^\mu)\bigr] = \sum_{i=1}^{d} E\bigl[g_i(F^\mu)\, H_{(i)}(F^\mu; 1)\bigr], \qquad
E^h\bigl[f(F^\mu)\bigr] = \sum_{i=1}^{d} E\bigl[g_i^h(F^\mu)\, H_{(i)}(F^\mu; 1)\bigr].
\]
The proof of this lemma is straightforward. In fact, for the proof, use the Malliavin–Thalmaier formula (2.3), multiply it by $f(\hat{x})$, integrate, and finally apply Fubini's theorem.

Now we consider an expression for the first derivative.

Proposition 6.1. Let $k \in \{1, \dots, m\}$ be fixed. Let $F^\mu$ be a nondegenerate random vector which is a.s. differentiable with respect to $\mu_k$. Suppose that, for every $i = 1, \dots, d$, $H_{(1,\dots,d,i)}(F^\mu; 1)$ is a.s. differentiable in $\mu_k$, $\frac{\partial}{\partial \mu_k} H_{(1,\dots,d,i)}(F^\mu; 1) \in L^2(\Omega)$, and also $\frac{\partial F_j^\mu}{\partial \mu_k} \in L^2(\Omega)$ for all $j = 1, \dots, d$. Then we have
\[
\frac{\partial}{\partial \mu_k} E^h\bigl[f(F^\mu)\bigr] = \sum_{i=1}^{d} \frac{\partial}{\partial \mu_k} E\bigl[g_i^h(F^\mu)\, H_{(i)}(F^\mu; 1)\bigr]
\longrightarrow \sum_{i=1}^{d} \frac{\partial}{\partial \mu_k} E\bigl[g_i(F^\mu)\, H_{(i)}(F^\mu; 1)\bigr] = \frac{\partial}{\partial \mu_k} E\bigl[f(F^\mu)\bigr].
\]

Proof. Using the IBP formula $d$ times, for $i = 1, \dots, d$, we have that the following equality is satisfied for $f = g_i^h, g_i$:
\[
E\bigl[f(F^\mu)\, H_{(i)}(F^\mu; 1)\bigr] = E\Bigl[\int_0^{F_1^\mu} \cdots \int_0^{F_d^\mu} f(z)\, dz\; H_{(1,\dots,d,i)}(F^\mu; 1)\Bigr].
\]
For $i = 1, \dots, d$, define
\[
G_i^h(y) := \int_0^{y_d} \cdots \int_0^{y_1} g_i^h(z)\, dz \quad \text{and} \quad G_i(y) := \int_0^{y_d} \cdots \int_0^{y_1} g_i(z)\, dz.
\]
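From Lemma 8.7, we have that $g_i^h$, $i = 1, \dots, d$, has at most polynomial growth. Therefore, the same property is satisfied by $G_i^h$ for $i = 1, \dots, d$, and then
\[
\frac{\partial}{\partial \mu_k} E\bigl[G_i^h(F^\mu)\, H_{(1,\dots,d,i)}(F^\mu; 1)\bigr]
= E\Bigl[\sum_{j=1}^{d} \frac{\partial}{\partial y_j} G_i^h(F^\mu)\, \frac{\partial F_j^\mu}{\partial \mu_k}\, H_{(1,\dots,d,i)}(F^\mu; 1)\Bigr] + E\Bigl[G_i^h(F^\mu)\, \frac{\partial}{\partial \mu_k} H_{(1,\dots,d,i)}(F^\mu; 1)\Bigr].
\]
We consider the first term. Let $y$ be fixed. From Lemma 8.7, $g_i^h$, $i = 1, \dots, d$, has at most polynomial growth, so it is bounded on $[0, y_1] \times \cdots \times [0, y_d]$. Hence, for $j = 1, \dots, d$, as $h \to 0$,
\[
\frac{\partial}{\partial y_j} G_i^h(y) \longrightarrow \frac{\partial}{\partial y_j} G_i(y).
\]
And since $g_i^h$, $i = 1, \dots, d$, has at most polynomial growth, $\frac{\partial}{\partial y_j} G_i^h$, $i, j = 1, \dots, d$, also has polynomial growth, where the growth rate is independent of $h$. Hence, for $i, j = 1, \dots, d$, as $h \to 0$,
\[
E\Bigl[\frac{\partial}{\partial y_j} G_i^h(F^\mu)\, \frac{\partial F_j^\mu}{\partial \mu_k}\, H_{(1,\dots,d,i)}(F^\mu; 1)\Bigr] \longrightarrow E\Bigl[\frac{\partial}{\partial y_j} G_i(F^\mu)\, \frac{\partial F_j^\mu}{\partial \mu_k}\, H_{(1,\dots,d,i)}(F^\mu; 1)\Bigr].
\]
Similarly, we prove the convergence of the second term, for $i = 1, \dots, d$:
\[
E\Bigl[G_i^h(F^\mu)\, \frac{\partial}{\partial \mu_k} H_{(1,\dots,d,i)}(F^\mu; 1)\Bigr] \longrightarrow E\Bigl[G_i(F^\mu)\, \frac{\partial}{\partial \mu_k} H_{(1,\dots,d,i)}(F^\mu; 1)\Bigr].
\]
From here, the result follows in a straightforward manner.

For $i, j = 1, \dots, d$, define
\[
g^h_{i,j}(y) := \frac{\partial g_i^h}{\partial y_j}(y) = \frac{\partial}{\partial y_j} \int_{\mathbb{R}^d} f(\hat{x})\, \frac{\partial}{\partial x_i} Q_d^h(y - \hat{x})\, d\hat{x}, \quad y \in \mathbb{R}^d. \tag{6.4}
\]

Remark 6.1. Note that if $f \in \mathcal{A}$, then $g^h_{i,j}$ exists and is finite for $i, j = 1, \dots, d$ and $y \in \mathbb{R}^d$.

Theorem 6.1. Let $k \in \{1, \dots, m\}$ be fixed, and let $f \in \mathcal{A}$. Moreover, let $F^\mu$ be a nondegenerate random vector which is a.s. differentiable with respect to $\mu_k$. Suppose that, for $j = 1, \dots, d$, $\frac{\partial F_j^\mu}{\partial \mu_k} \in \mathbb{D}^\infty$. Then
\[
\frac{\partial}{\partial \mu_k} E^h\bigl[f(F^\mu)\bigr] = \sum_{i=1}^{d} \frac{\partial}{\partial \mu_k} E\bigl[g_i^h(F^\mu)\, H_{(i)}(F^\mu; 1)\bigr] = \sum_{i,j=1}^{d} E\Bigl[g^h_{i,j}(F^\mu)\, H_{(i)}\Bigl(F^\mu; \frac{\partial F_j^\mu}{\partial \mu_k}\Bigr)\Bigr].
\]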
Moreover, if we assume that, for all i, j = 1, . . . , d, there exists a function gi,j such h that gi,j (F μ ) → gi,j (F μ ) in L1+ε (Ω) as h → 0 for some ε > 0, then d ∂Fjμ ∂ μ μ (6.5) E gi,j (F )H(i) F ; E f (F μ ) . = ∂μk ∂μk i,j=1 Proof. We prove the first part by using the IBP formula. For i = 1, . . . , d, ∂ h μ ∂ ∂ E gih (F μ )H(i) F μ ; 1 = E gi (F ) ∂μk ∂μk ∂yi ⎤ ⎡ μ d 2 ∂F ∂ j ⎦ =E⎣ gih (F μ ) ∂y ∂y ∂μ j i k j=1 d ∂Fjμ ∂ h μ μ , E g (F )H(i) F ; = ∂yj i ∂μk j=1 where we have used Lemma 8.15. Therefore, we obtain the first assertion. The second claim follows by taking limits. Remark 6.2. (i) Note that the expression in Theorem 6.1 is obviously not unique. In fact, we also have μ d d
∂F ∂ h μ j h E gi (F )H(i) (F μ ; 1) = E gi,i (F μ )H(j) F μ ; . ∂μk i=1 ∂μk i,j=1 (ii) In the digital put case, we have an explicit expression of gi,j , i, j = 1, . . . , d. That is, let d = 2 and f (x1 , x2 ) = 1(0 ≤ x1 ≤ K1 )1(0 ≤ x2 ≤ K2 ) ∈ A, where K1 and K2 are positive constants. (6.6) g1,1 (y) y2 − K 2 y2 y2 − K 2 y2 = A2 arctan − arctan − arctan + arctan , y1 y1 y1 − K 1 y1 − K 1 g2,2 (y) y1 − K 1 y1 y1 − K 1 y1 = A2 arctan − arctan − arctan + arctan . y2 y2 y2 − K 2 y2 − K 2 g1,2 (y) y12 + y22 (y1 − K1 )2 + (y2 − K2 )2 A2 ln . = g2,1 (y) = 2 ((y1 − K1 )2 + y22 ) (y12 + (y2 − K2 )2 ) h These expressions are obtained after taking limits of gi,j (y) as h → 0 for i, j = 1, 2. h , i, j = 1, . . . , d has an explicit representation, then one (iii) In general, if gi,j can calculate Greeks easily. If we do not have an explicit expression for the multiple integral, then one can use a suitable approximation for multiple Lebesgue integrals. (iv) The case of second derivatives follows along a similar pattern, and we quote briefly only the result. For more details, see [10]. Let k, l ∈ {1, . . . , m} be
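fixed. Suppose that, for $i = 1, \dots, d$ and $l, k = 1, \dots, n$,
\[
\frac{\partial F_i^\mu}{\partial\mu_k},\ \frac{\partial F_i^\mu}{\partial\mu_l},\ \frac{\partial^2 F_i^\mu}{\partial\mu_k \partial\mu_l} \in \mathbb{D}^\infty.
\]
Furthermore, assume that, for all $i, j = 1, \dots, d$, there exist functions $g_{i,i}$, $g_{i,i,j}$ such that $(g^h_{i,i}, g^h_{i,i,j})(F^\mu) \to (g_{i,i}, g_{i,i,j})(F^\mu)$ in $L^{1+\varepsilon}(\Omega)$ as $h \to 0$ for some $\varepsilon > 0$. Then we have
\begin{align*}
\frac{\partial^2}{\partial\mu_l \partial\mu_k} E^h\big[f(F^\mu)\big]
&= \sum_{i,j_1=1}^d \Big\{ \sum_{j_2=1}^d E\Big[g^h_{i,i,j_2}(F^\mu)\, H_{(j_1)}\Big(F^\mu; \frac{\partial F^\mu_{j_2}}{\partial\mu_l} \frac{\partial F^\mu_{j_1}}{\partial\mu_k}\Big)\Big]
+ E\Big[g^h_{i,i}(F^\mu)\, H_{(j_1)}\Big(F^\mu; \frac{\partial^2 F^\mu_{j_1}}{\partial\mu_l \partial\mu_k}\Big)\Big] \Big\} \\
&\longrightarrow \sum_{i,j_1=1}^d \Big\{ \sum_{j_2=1}^d E\Big[g_{i,i,j_2}(F^\mu)\, H_{(j_1)}\Big(F^\mu; \frac{\partial F^\mu_{j_2}}{\partial\mu_l} \frac{\partial F^\mu_{j_1}}{\partial\mu_k}\Big)\Big]
+ E\Big[g_{i,i}(F^\mu)\, H_{(j_1)}\Big(F^\mu; \frac{\partial^2 F^\mu_{j_1}}{\partial\mu_l \partial\mu_k}\Big)\Big] \Big\} \\
&= \frac{\partial^2}{\partial\mu_l \partial\mu_k} E\big[f(F^\mu)\big].
\end{align*}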
(v) Note that we have written the approximation of the second derivative as part of the statement. This is because, in some particular situations, it may be more convenient to use the approximation to the second derivative rather than the limit expression itself. For example, this is the case for d = 2 and when the second derivative coincides with the density function. 7. Examples and simulations. In this section, we provide some simple examples of application in two cases. In the first, we approximate the multidimensional log-normal density. We take this as a toy example to show the performance of the Malliavin–Thalmaier method with and without a regularization parameter (i.e., h > 0 or h = 0). In order to be concise, we describe only the results and comment on the important issues in the toy example. The case of the bivariate density of the Heston model is solved using the technique presented in this paper and using a finite difference scheme. 7.1. The multivariate geometric Brownian motion. Consider the solution of the following stochastic differential equation:
\[
\frac{dX_t^i}{X_t^i} = \mu_i\,dt + \sum_{j=1}^d \sigma_{ij}\,dW_t^j, \qquad X_0^i = x_i, \tag{7.1}
\]
where Wt = (Wt1 , . . . , Wtd ) is a standard d-dimensional Brownian motion and μi and σij are constants. The density of Xt = (Xt1 , . . . , Xtd ) is the multivariate lognormal distribution. As the goal is to compare the theoretical density with the Malliavin–Thalmaier approach with and without regularization parameter h, we only need to compute the terms in formula (3.1) explicitly. In particular, we derive now an expression for the weight
$H_{(i)}(F; G)$. First, define, for $i = 1, \dots, d$,
\[
f_i(x) := y_i^0 \exp\Big( \Big(\mu_i - \sum_{j=1}^d \frac{\sigma_{ij}^2}{2}\Big) T + x \Big),
\qquad
X_T^i := f_i\Big( \sum_{j=1}^d \sigma_{ij} W_T^j \Big).
\]
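The definition above gives a direct recipe for sampling $X_T$. The following is only a minimal sketch: the function name `sample_gbm_terminal`, the initial values `y0`, the horizon `T = 1`, and the sample size are illustrative assumptions, while `mu` and `sigma` are the values quoted for the simulation in this section. It checks the sampler against the known mean $E[X_T^i] = y_i^0 e^{\mu_i T}$.

```python
import math
import random

def sample_gbm_terminal(y0, mu, sigma, T, rng):
    """Draw one sample of X_T with X_T^i = f_i(sum_j sigma_ij W_T^j)."""
    d = len(y0)
    w = [rng.gauss(0.0, math.sqrt(T)) for _ in range(d)]  # W_T ~ N(0, T I)
    x = []
    for i in range(d):
        drift = (mu[i] - 0.5 * sum(s * s for s in sigma[i])) * T
        noise = sum(sigma[i][j] * w[j] for j in range(d))
        x.append(y0[i] * math.exp(drift + noise))
    return x, w

rng = random.Random(0)
mu = [0.01, 0.02]
sigma = [[0.1, 0.2], [0.3, 0.2]]
y0 = [1.0, 1.0]  # hypothetical initial values
T = 1.0          # hypothetical time horizon
n = 50_000

mean = [0.0, 0.0]
for _ in range(n):
    x, _w = sample_gbm_terminal(y0, mu, sigma, T, rng)
    mean = [m + xi / n for m, xi in zip(mean, x)]

# The sample means should be close to y0_i * exp(mu_i * T).
print(mean, [y0[i] * math.exp(mu[i] * T) for i in range(2)])
```

The Brownian increments `w` are returned together with `x` because the weights $H_{(i)}$ are functions of both.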
Then, we have using the chain rule for Malliavin derivatives that ⎛ ⎞ ⎧ ⎛ ⎞⎫ ⎛ ⎞ σi1 1(· ≤ T ) d d ⎨∂ ⎬ ⎜ ⎟ .. DXTi = σij WTj ⎠ D ⎝ σij WTj ⎠ = XTi ⎝ f ⎝ ⎠. . ⎩ ∂x i ⎭ j=1 j=1 σid 1(· ≤ T ) Lemma 7.1. Let F be a nondegenerate random vector, then the density of F = XT , solution of (7.1), can be expressed as ⎡ #⎤ d d j det Σji W − x ˆ T F σ i i ij T ⎦. (7.2) pF,1 (ˆ x) = Ad E⎣ (−1)i+j + ˆ |d |F − x det(Σ) F F i i i=1 j=1 Here, Σ := (σij )i,j=1,...,d and Σij , j, i = 1, . . . , d, is a (d − 1) × (d − 1) matrix obtained from Σ by deleting row i and column j. Proof. The expression of H(i) does not depend on the function f . Therefore, from Proposition 2.1 and Lemma 8.16 in the appendix, we have a representation of the density pF,G , where F = XT . Our approximation to the density is given by ⎡ #⎤ j d d j det Σ i WT ˆi Fi − x σij T ⎦ (7.3) phF,1 (ˆ . x) = Ad E⎣ (−1)i+j + d det(Σ) F Fi ˆ |F − x | i h j=1 i=1 Now we briefly comment on the simulation results. We realized the Monte Carlo simulation of both formulas (7.3) and (7.2) and compared them with the theoretical result in the d = 2 dimensional case. In particular, the parameters were μ = (0.01, 0.02),
\[
\sigma = \begin{pmatrix} 0.1 & 0.2 \\ 0.3 & 0.2 \end{pmatrix}.
\]
The densities were then compared. In particular, the Malliavin–Thalmaier formula is highly biased in comparison with the approximate formula (7.3). The Malliavin–Thalmaier formula exhibited peaks, which are due to the unstable behavior of $\frac{\partial}{\partial x_i} Q_d$. This instability can also be observed at a local level. In comparison, the regularized version behaves smoothly. The choice of $h = 0.01$ was ad hoc. In fact, the central limit theorem, Theorem 5.1, states that the weak convergence occurs as $h \to 0$. Nevertheless, one may also want to minimize the asymptotic $L^2$-error, as is usually done in KDE theory. This requires a minimization procedure that can be carried out when the constants in the formulas appearing in sections 3 and 4 are known. In fact, they can be obtained in practice through a pilot simulation that gives the histogram of the error sequences in the central limit theorem, Theorem 5.1.
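The difference in behavior can be seen directly on the kernel derivative. The sketch below rests on two assumptions read off from the formulas in this paper rather than stated at this point: that $\frac{\partial}{\partial x_i} Q_d(x) = A_d x_i / |x|^d$ with $A_d = 1/a_d$, where $a_d$ is the surface area of the unit sphere in $\mathbb{R}^d$, and that the regularization replaces $|x|$ by $|x|_h = \sqrt{|x|^2 + h}$. The function names are hypothetical.

```python
import math

def unit_sphere_area(d):
    """Surface area a_d of the unit sphere in R^d (a_2 = 2*pi, a_3 = 4*pi)."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)

def grad_kernel(x, i, h=0.0):
    """i-th partial derivative of the (regularized) kernel at the point x.

    Assumption: |x|_h = sqrt(|x|^2 + h); h = 0 gives the unregularized kernel.
    """
    d = len(x)
    r2 = sum(c * c for c in x) + h
    return x[i] / (unit_sphere_area(d) * r2 ** (d / 2.0))

near_zero = (1e-6, 0.0)
print(grad_kernel(near_zero, 0))          # grows like 1/|x|^(d-1) near the origin
print(grad_kernel(near_zero, 0, h=0.01))  # stays bounded for h > 0
```

Samples of $F$ that fall close to $\hat{x}$ therefore produce very large terms when $h = 0$, which is one way to read the peaks reported above.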
[Figure 7.1: Heston density with confidence intervals (confidence level 95%) plotted against the number of Monte Carlo simulations. Horizontal axis: Num. of MC (0 to 1e+007). Vertical axis: value of density (5.2 to 5.6). Series: MT, MT lower, MT upper; MT with h, MT with h lower, MT with h upper; PDE 1; PDE 2.]
Fig. 7.1. Simulation of Heston model.
7.2. Example: The Heston model. Now we consider the simulation of the joint density of the underlying price and the volatility in the Heston model (see Figure 7.1). First, we define the Heston model [5] as follows:
\begin{align}
dS_t &= \mu S_t\,dt + \sqrt{v_t}\, S_t \Big( \rho\,dW_t^{(2)} + \sqrt{1 - \rho^2}\,dW_t^{(1)} \Big), \notag \\
dv_t &= \gamma(\theta - v_t)\,dt + \kappa \sqrt{v_t}\,dW_t^{(2)}, \tag{7.4}
\end{align}
where $\mu, \gamma, \theta, \kappa$ are positive constants satisfying $\gamma\theta \ge \frac{3\kappa^2}{4}$. This condition assures that $v$ satisfies the necessary differentiability and integrability properties and that it is strictly positive a.s. For more information, we refer the reader to Alos and Ewald [1] and section 6.2.2 of Lamberton and Lapeyre [6]. Next, we consider the following change of variables. Set $X_t := \ln(S_t/S_0) - \mu t$ and $u_t := a v_t$ for a positive constant $a$. Then
\begin{align}
X_t &= X_0 - \frac{1}{2a} \int_0^t u_r\,dr + \frac{\rho}{\sqrt{a}} \int_0^t \sqrt{u_r}\,dW_r^{(2)} + \sqrt{\frac{1 - \rho^2}{a}} \int_0^t \sqrt{u_r}\,dW_r^{(1)}, \notag \\
u_t &= u_0 - \gamma \int_0^t u_r\,dr + a\gamma\theta t + \sqrt{a}\,\kappa \int_0^t \sqrt{u_r}\,dW_r^{(2)}. \tag{7.5}
\end{align}
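For orientation, the dynamics (7.4) and the change of variables can be sketched with a plain Euler–Maruyama step. This is an illustrative discretization only, not the scheme used for the simulations reported here; the function names are hypothetical, and flooring $v$ at zero inside the square root is a convenience added for the sketch.

```python
import math

def heston_euler_step(S, v, dW1, dW2, dt, mu, gamma, theta, kappa, rho):
    """One Euler-Maruyama step for (7.4); v is floored at 0 inside the square root."""
    sqrt_v = math.sqrt(max(v, 0.0))
    dS = mu * S * dt + sqrt_v * S * (rho * dW2 + math.sqrt(1.0 - rho * rho) * dW1)
    dv = gamma * (theta - v) * dt + kappa * sqrt_v * dW2
    return S + dS, v + dv

def to_transformed(S, v, S0, t, mu, a):
    """Change of variables X_t = ln(S_t / S_0) - mu * t and u_t = a * v_t."""
    return math.log(S / S0) - mu * t, a * v

# With zero Brownian increments only the drift acts (hypothetical parameter values).
S1, v1 = heston_euler_step(100.0, 0.04, 0.0, 0.0, 0.01,
                           mu=0.05, gamma=2.0, theta=0.09, kappa=0.3, rho=-0.5)
print(S1, v1)
print(to_transformed(100.0, 0.04, 100.0, 0.0, mu=0.05, a=2.0))
```

In a full simulation the increments `dW1` and `dW2` would be independent Gaussian draws with variance `dt`, and the pair returned by `to_transformed` would play the role of $F = (X_t, u_t)$.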
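In this setting, we have the following facts:
(i)
\[
D_s^{(2)} u_t = \kappa \sqrt{a u_s}\, \frac{e(t)}{e(s)}\, \mathbf{1}_{[0,t]}(s), \tag{7.6}
\]
where
\[
e(t) := \exp\Big( -\gamma t - \frac{a\kappa^2}{8} \int_0^t \frac{1}{u_r}\,dr + \frac{\sqrt{a}\,\kappa}{2} \int_0^t \frac{1}{\sqrt{u_r}}\,dW_r^{(2)} \Big)
\]
and
\[
D_s^{(2)} e(t) = e(t) \Big\{ \frac{\sqrt{a}\,\kappa}{2\sqrt{u_s}} + \frac{a\kappa^2}{8} \int_s^t \frac{D_s^{(2)} u_r}{u_r^2}\,dr - \frac{\sqrt{a}\,\kappa}{4} \int_s^t \frac{D_s^{(2)} u_r}{u_r^{3/2}}\,dW_r^{(2)} \Big\} \mathbf{1}_{[0,t]}(s).
\]
Also, note that $D_s^{(1)} u_t \equiv 0$ and $D_s^{(1)} e(t) \equiv 0$.
(ii)
\[
D_s^{(1)} X_t = \sqrt{\frac{1 - \rho^2}{a}}\, \sqrt{u_s}\, \mathbf{1}_{[0,t]}(s).
\]
(iii)
\[
D_s^{(2)} X_t = \Big\{ \frac{\rho \sqrt{u_s}}{\sqrt{a}} - \frac{\kappa \sqrt{u_s}}{2\sqrt{a}\, e(s)} \int_s^t e(r)\,dr + \frac{\rho\kappa \sqrt{u_s}}{2 e(s)} \int_s^t \frac{e(r)}{\sqrt{u_r}}\,dW_r^{(2)} + \frac{\sqrt{1 - \rho^2}\, \kappa \sqrt{u_s}}{2 e(s)} \int_s^t \frac{e(r)}{\sqrt{u_r}}\,dW_r^{(1)} \Big\} \mathbf{1}_{[0,t]}(s).
\]
(iv)
\[
D_w^{(2)} D_s^{(1)} X_t = \frac{\sqrt{1 - \rho^2}}{2\sqrt{a u_s}}\, D_w^{(2)} u_s\, \mathbf{1}_{[0,t]}(s)\, \mathbf{1}_{[0,s]}(w).
\]
(v)
\begin{align*}
D_w^{(2)} D_s^{(2)} u_t = \kappa \sqrt{a u_s}\, \frac{e(t)}{e(s)} \Big\{ & \frac{a\kappa^3 \sqrt{a u_w}}{8 e(w)} \int_{s \vee w}^t \frac{e(r)}{u_r^2}\,dr - \frac{a\kappa^2 \sqrt{u_w}}{4 e(w)} \int_{s \vee w}^t \frac{e(r)}{u_r^{3/2}}\,dW_r^{(2)} \\
& + \frac{\sqrt{a}\,\kappa}{2\sqrt{u_w}}\, \mathbf{1}(0 \le s \le w \le t) + \frac{D_w^{(2)} u_s}{2 u_s}\, \mathbf{1}(0 \le w < s \le t) \Big\}.
\end{align*}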
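(vi) The calculation of $H_{(1)}(F; 1)$ for $F = (X_t, u_t)$. With the previous calculations, we can apply the Bismut–Elworthy formula (see Exercise 2.3.5 in [8]) to obtain
\[
H_{(1)}(F; 1) = \frac{\sqrt{a}}{\sqrt{1 - \rho^2}\, t} \int_0^t \frac{1}{\sqrt{u_s}}\,dW_s^{(1)}.
\]
(vii) Similarly,
\[
H_{(2)}(F; 1) = \frac{1}{t} \Big\{ \delta^{(2)}\Big( \frac{1}{D_\cdot^{(2)} u_t} \Big) - \delta^{(1)}\Big( D_\cdot^{(2)} X_t\, \frac{1}{D_\cdot^{(1)} X_t\, D_\cdot^{(2)} u_t} \Big) \Big\}, \tag{7.7}
\]
where
\begin{align*}
\int_0^t \frac{1}{D_s^{(2)} u_t}\,dW_s^{(2)}
&= \frac{1}{\sqrt{a}\,\kappa\, e(t)} \int_0^t \frac{e(s)}{\sqrt{u_s}}\,dW_s^{(2)} + \frac{1}{2 e(t)} \int_0^t \frac{e(s)}{u_s}\,ds \\
&\quad + \frac{a\kappa^2}{8 e(t)} \int_0^t r\, \frac{e(r)}{u_r^2}\,dr - \frac{\sqrt{a}\,\kappa}{4 e(t)} \int_0^t r\, \frac{e(r)}{u_r^{3/2}}\,dW_r^{(2)}
\end{align*}
and
\begin{align*}
\delta^{(1)}\Big( D_\cdot^{(2)} X_t\, \frac{1}{D_\cdot^{(1)} X_t\, D_\cdot^{(2)} u_t} \Big)
&= \frac{\rho}{\kappa \sqrt{a(1 - \rho^2)}\, e(t)} \int_0^t \frac{e(s)}{\sqrt{u_s}}\,dW_s^{(1)}
- \frac{1}{2\sqrt{a(1 - \rho^2)}\, e(t)} \int_0^t e(r) \int_0^r \frac{1}{\sqrt{u_s}}\,dW_s^{(1)}\,dr \\
&\quad + \frac{\rho}{2\sqrt{1 - \rho^2}\, e(t)} \int_0^t \frac{e(r)}{\sqrt{u_r}} \int_0^r \frac{1}{\sqrt{u_s}}\,dW_s^{(1)}\,dW_r^{(2)}
+ \frac{1}{2 e(t)} \int_0^t \frac{e(r)}{\sqrt{u_r}} \int_0^r \frac{1}{\sqrt{u_s}}\,dW_s^{(1)}\,dW_r^{(1)},
\end{align*}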
where we have used the fact that $u_t$ and $e(t)$ are independent of $W_t^{(1)}$. We compare the simulation results of the finite difference method for the associated partial differential equation and the Malliavin–Thalmaier formula with and without a regularization parameter. We observe that the finite difference method is sensitive to changes in the initial condition, although the value stabilizes in the range [5.2, 5.4]. The Malliavin–Thalmaier formula without regularization also seems to converge to a similar value, but there seems to be a bias in the results, probably due to the high oscillations of the estimates. The Malliavin–Thalmaier formula with regularization exhibits better behavior with less variance. Confidence intervals have also been computed. As before, the value of $h$ was computed by obtaining the constants $C_1^{\hat{x}}$ and $C_3^{\hat{x}}$ with a pilot simulation in the central limit theorem, Theorem 5.1. Then one minimizes the $L^2$-error in a fashion similar to KDE methods. We have not compared these results with the classical formulation that follows from (1.1), as this would require a long calculation of triple stochastic integrals. As noted before, when the dimension of the problem increases, the dimension of the multiple stochastic integrals in (1.1) increases, while those in the Malliavin–Thalmaier formula remain of order 2. A detailed description of the simulation study will appear elsewhere.
8. Appendix.
8.1. Proof of Corollary 2.1. In this section, we give a proof of Corollary 2.1. Lemma 8.1. For $x_i \ge 0$, $i = 1, \dots, d$, the following inequalities hold. (i) For $d = 2$ and $1 < \kappa < 2 - \frac{2}{p}$, $p > 2$, we have
x2 0
0
x1
|Q2 (y)|
p p−1
π − p dy1 dy2 ≤ a2 p−1 2
2p−1 p−1 + |x| p−1 (2 − κ)p − 2
.
(ii) For p > d − 1 ≥ 2,
xd
x1
···
0
p
|Qd (y)| p−1 dy1 . . . dyd ≤
π d−1 2
0
p − p−1
ad
2p−d p−1 + |x| p−1 2p − d
.
(iii) For p > d ≥ 2, i = 1, . . . , d,
xd
···
0
x1
0
p p−1 π d−1 p ∂ p−1 p−1 + |x| . Q (y) dy . . . dy ≤ A 1 d d ∂yi d 2 p−d
Proof. (i) Here one performs a change of variables from rectangular to spherical coordinates and separates the region of integration in two. The first integral is bounded by the integral over the unit ball and the second by the integral over its complement. For the first, one uses the inequality $|\ln r| < r^{-\kappa}$ for $1 < \kappa$ and $0 < r < 1$. For the second integral, one uses that $\ln r < r$ for $r > 1$. Then the inequality follows after some straightforward integrations, where the condition $\kappa < 2 - \frac{2}{p}$ is used. (ii) and (iii) are proved by changing directly from rectangular to spherical coordinates.
Proof of Corollary 2.1. The goal in the proof is to show that (2.1) is satisfied. First, note that $|Q_d(x)|$ and $|\frac{\partial}{\partial x_i} Q_d(x)|$ ($i = 1, \dots, d$) are symmetric. That is, $|Q_d(x)| = |Q_d(-x)|$ and $|\frac{\partial}{\partial x_i} Q_d(x)| = |\frac{\partial}{\partial x_i} Q_d(-x)|$.
Now, we prove that $\sup_{|a| \le R} E\big[|Q_2(F - a)|^{\frac{p}{p-1}}\big] < \infty$ for all $R > 0$. From the above symmetry property, the IBP formula, Lemma 8.1, and Hölder's inequality,
$\sup_{|a| \le R} E\big[|Q_2(F - a)|^{\frac{p}{p-1}}\big]$
∂2 = sup E ∂y1 ∂y2 |a|≤R
|F2 −a2 | 0
|F2 −a2 |
≤
p π − p−1 a2 2
0
|F1 −a1 |
p p−1 dy1 dy2 Q2 (y)
0
|F1 −a1 |
= sup E |a|≤R
0
p p−1 dy1 dy2 H(1,2) (F ; 1) Q2 (y)
p−1 p−1 + E |F | + R (2 − κ)p − 2
2p−1
p p−1
p−1 p
p p1 < ∞. E H(1,2) (F ; 1)
Proving the inequality for general d and the derivative of Qd follows along the same lines as above. 8.2. Lemmas used in the proof of Theorem 3.1. Lemma 8.2. For i = 1, . . . , d, Ad lim
ε→0
ΦF x) i,ε (ˆ
1
d
(r2 + h) 2 − rd d
0
(r2 + h) 2
dr
2π 0
π 2
−π 2
···
π 2
−π 2
Θi cd−2 · · · cd−2 dθ1 · · · dθd−1 = 0. 1
x) is finite. Similarly, for We know by Lemma 3.1 that limε→0 ΦF i,ε (ˆ 1 (r2 +h) d2 −rd fixed h, we have that 0 dr is finite. Therefore, the result follows from d (r 2 +h) 2 π2 n sin θ cos θdθ = 0 for n ∈ N. −π 2 Lemma 8.3. For i, j = 1, . . . , d, Proof.
1 2
2π π2 π2 d ∂ F (r + h) 2 − rd Ad lim Φi,ε (ˆ x) r dr ··· Θi Θj cd−2 · · · cd−2 dθ1 · · · dθd−1 d 1 ε→0 ∂yj π (r2 + h) 2 0 0 −π − 2 2 1 ˆ ˆ x x h ln + C2,i h + o(h) for i = j, C1,i = h 0 for i = j, where
ˆ x C1,i
ˆ x C2,i
π2 2π π 2 ∂ F d := Ad lim Φi,ε (ˆ x) ··· Θ2i cd−2 · · · cd−2 dθ1 · · · dθd−1 , 1 ε→0 ∂yi π 4 0 −π − 2 2 π2 2π π 2 ∂ F := Ad lim Φi,ε (ˆ x) ··· Θ2i cd−2 · · · cd−2 dθ1 · · · dθd−1 1 ε→0 ∂yi π π 0 −2 −2 d 1 1 (u2 + 1) 2 − ud 1 0 × u du + ln + Md d 4 2d−1 + 2 d2 −1 (u2 + 1) 2 0
and Md0 is a constant (defined in the proof ). Proof. In the case of i = j, using the same argument of Lemma 8.2, the result follows. Next, in the case of i = j, note that
Ad
Set u =
2π π π2 2 ∂ F Φi,ε (ˆ x) ··· Θ2i cd−2 · · · cd−2 dθ1 · · · dθd−1 < ∞. lim 1 ε→0 ∂yi π 0 −π − 2 2
√r . h
r 0
d
1
(r2 + h) 2 − rd d
(r2 + h) 2
dr = h
d
1
u 0
+h 1
(u2 + 1) 2 − ud
√1 h
du d 2(d−l) l=1 l u . du. u d d (u2 + 1) 2 (u2 + 1) 2 + ud d
(u2 + 1) 2 d
Clearly, the first term on the right-hand side has finite value. Next, we consider the second term
h
√1 h
d l=1
-
d l
u2(d−l)
. du d d (u2 + 1) 2 (u2 + 1) 2 + ud d √1 2 d−1 h 2du(u + 1) + d2 (4u3 + 2u)(u4 + u2 ) 2 −1 h = du d 4 1 (u2 + 1)d + (u4 + u2 ) 2 ⎡ √1 d−1 d − 1 h d ⎣ +h − 2u u2(d−1−l) l 4 1 l=1
d −1 1 1 2 2d−1 + 2u −1 1+ 2 1+ 2 2u u ⎤ d d 2(d−l) √1 u d −1 h l=2 l d ⎦ × u2 + 1 + u4 + u2 2 du + d du 2 (u + 1)d + (u4 + u2 ) 2 1 1 d d h d d 2 2 = + hMdh , ln (1 + h) + (1 + h) + d ln − ln 2 + 2 4 h u
1
where Mdh is defined by the term within brackets [ ]. Note that the integrands in Mdh are of order O(u2 ) as u → ∞ and therefore integrable. Then we define Md0 := limh→0 Mdh . Hence, - . d d h d d 2 2 ln (1 + h) − ln 2 + hMdh + (1 + h) + 2 4 lim h→0 h d 1 + Md0 . ln(2) + ln 2d + 2 2 = 4 Lemma 8.4. For i, j, k = 1, . . . , d, (8.1) π2 1 π d Ad 2π 2 (r2 + h) 2 − rd ··· r2 Θi Θj Θk cd−2 · · · cd−2 d 1 2 π π 2 0 2 (r + h) −2 −2 0
1 ∂2 ˆ x F × lim Φi,ε (x + γrΘ)dγdrdθ1 · · · dθd−1 = C3,i,j,k h + o(h), ε→0 0 ∂yk ∂yj where ˆ x := C3,i,j,k
dAd 4
0
2π
π 2
−π 2
···
π 2
−π 2
×
1
Θi Θj Θk cd−2 · · · cd−2 1 0
lim
ε→0
0
1
∂2 F Φ (ˆ x + γrΘ)dγ drdθ1 · · · dθd−1 . ∂yk ∂yj i,ε
Proof. From l’Hˆ opital’s rule, (LHS of (8.1)) h π π2 1 rd+2 Ad 2π 2 d d−2 = lim ··· · · · cd−2 d+2 Θi Θj Θk c1 2 π π h→0 2 2 2 (r + h) 0 −2 −2 0
1 ∂2 F × lim Φi,ε (ˆ x + γrΘ)dγ drdθ1 · · · dθd−1 . ε→0 0 ∂yk ∂yj
lim
h→0
The result follows after applying the bounded convergence theorem.
Lemma 8.5. For i = 1, . . . , d, (8.2)
2π
Ad 0
π 2
···
−π 2
π 2
−π 2
∞
d
(r2 + h) 2 − rd
Θi cd−2 · · · cd−2 d 1 (r2 + h) 2 ˆ x × lim ΦF (ˆ x + rΘ) drdθ1 · · · dθd−1 = C4,i h + o(h), i,ε
1
ε→0
where ˆ x C4,i :=
d Ad 2
2π
π 2
−π 2
0
···
π 2
−π 2
∞
1 Θi cd−2 · · · cd−2 1 r2 × lim ΦF x + rΘ) drdθ1 · · · dθd−1 . i,ε (ˆ
1
ε→0
Proof. As in the previous Lemma, from l’Hˆopital’s rule, we have (LHS of (8.2)) h 2π = lim Ad
lim
h→0
h→0
π 2
−π 2
0
···
π 2
−π 2
∞
1
d rd Θi cd−2 · · · cd−2 1 2 (r2 + h) d2 +1 × lim ΦF (ˆ x + rΘ) drdθ1 · · · dθd−1 . i,ε ε→0
The result follows from the dominated convergence theorem. 8.3. Lemmas used in the proof of Theorem 4.1. We provide some lemmas for section 4.1. We use the same notations and assumptions in section 4.1. Lemma 8.6. For i, j = 1, 2, 2π 2|ˆx|+1 r3 ˆF (ˆ x ) dr Θi Θj dθ1 A22 lim Φ i,j,ε ε→0 (r2 + h)2 0 0 (8.3) 1 ˆ x ln + O(1) for i = j, C5,i = h 0 for i = j, where
π 2 ˆ F (ˆ A2 lim Φ x ) . i,i,ε ε→0 2
ˆ x C5,i =
Proof. In the case of i = j, the LHS of (8.3) = 0, as in Lemma 8.2. Set u := √rh . In the case of i = j, then (LHS of (8.3)) =
πA22
lim
ε→0
ˆ F (ˆ Φ i,i,ε x)
2|ˆx|+1 # x|+1 2|ˆ√ √ h h 4u3 + 4u u 1 du − du . 4 0 (u2 + 1)2 (u2 + 1)2 0
The first term in the brackets can be computed as follows: 1 4
0
2|ˆ x|+1 √ h
4u3 + 4u 1 du = (u2 + 1)2 2
1 ln (2|ˆ x| + 1)2 + h + ln h
We can easily find that the second term is bounded uniformly in h.
.
Lemma 8.7. For i, j, k = 1, 2, 1 2π 2|ˆ x|+1 4 r Θi Θj Θk ∂ ˆF 2 Φ (ˆ x + γrΘ)dγ drdθ lim A2 ≤ C6xˆ , 1 i,j,ε 2 + h)2 ε→0 (r ∂y k 0 0 0 ˆ. where C6xˆ is a positive constant which depends on x 1 ˆ F (ˆ x + γrΘ)dγ is uniformly bounded. Proof. By Lemma 3.1, limε→0 0 ∂y∂ k Φ i,j,ε Therefore, the result follows. Lemma 8.8. For i, j = 1, 2, 2π ∞ r 2 Θi Θj 2 ˆ x F ˆ ˆ Φ lim r 2 (rΘ + x ) drdθ A2 1 ≤ C7 , 2 ε→0 i,j,ε (r + h) 0 2|ˆ x|+1 ˆ. where C7xˆ is a positive constant which depends on x Proof. From Lemma 3.1, 2π ∞ r 2 Θi Θj 2 F ˆ ˆ Φ lim r 2 (rΘ + x ) drdθ A2 1 2 ε→0 i,j,ε (r + h) 0 2|ˆ x|+1
2π
≤ A22
2|ˆ x|+1
0
≤
∞
(r2
r3 C drdθ1 ˆ |2 + h)2 1 + |rΘ + x
C , 1 + (|ˆ x| + 1)2
where $C$ is a positive constant.
8.4. Lemmas used in the proof of Theorem 4.2. We provide the lemmas used in section 4.2. We will use the same notations and assumptions as in section 4.2.
Lemma 8.9. Set $I(n, m) = \int \sin^n x \cos^m x\,dx$ for $n + m \ne 0$. Then
\begin{align*}
I(n, m) &= -\frac{\sin^{n-1} x \cos^{m+1} x}{n + m} + \frac{n - 1}{n + m}\, I(n - 2, m) \\
&= \frac{\sin^{n+1} x \cos^{m-1} x}{n + m} + \frac{m - 1}{n + m}\, I(n, m - 2).
\end{align*}
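The two reduction formulas can be checked numerically. The sketch below evaluates both sides as definite integrals over $[0, \nu]$ with a composite Simpson rule; it assumes $n \ge 2$ and $m \ge 2$ so that the boundary terms vanish at $0$, and all function names are hypothetical.

```python
import math

def integral_sin_cos(n, m, upper, steps=2000):
    """Composite Simpson approximation of int_0^upper sin^n(x) cos^m(x) dx."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals

    def f(x):
        return math.sin(x) ** n * math.cos(x) ** m

    width = upper / steps
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * width)
    return total * width / 3.0

def reduce_sin_power(n, m, upper):
    """Right-hand side of the first identity, taken between 0 and upper (n >= 2)."""
    boundary = -math.sin(upper) ** (n - 1) * math.cos(upper) ** (m + 1) / (n + m)
    return boundary + (n - 1) / (n + m) * integral_sin_cos(n - 2, m, upper)

def reduce_cos_power(n, m, upper):
    """Right-hand side of the second identity, taken between 0 and upper (m >= 2)."""
    boundary = math.sin(upper) ** (n + 1) * math.cos(upper) ** (m - 1) / (n + m)
    return boundary + (m - 1) / (n + m) * integral_sin_cos(n, m - 2, upper)

nu = 1.2
direct = integral_sin_cos(6, 2, nu)
print(direct, reduce_sin_power(6, 2, nu), reduce_cos_power(6, 2, nu))
```

The exponents $n = 6$, $m = 2$ correspond to $\sin^{d+1}\tau \cos^{d-3}\tau$ with $d = 5$, the type of integrand to which the reduction is applied repeatedly in Lemma 8.10.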
Proof. This is proved using the IBP formula for Lebesgue integrals. Lemma 8.10. For i, j = 1, . . . , d, (8.4) . ˆF A2d lim Φ (ˆ x ) i,j,ε ε→0
⎧ ⎨ =
⎩
ˆ x C8,i
0
1 h
d 2 −1
2π π2 π2 rd+1 dr · · · Θi Θj cd−2 · · · cd−2 dθ1 · · · dθd−1 1 2 + h)d π π (r 0 0 −2 −2
1 +o for i = j, d h 2 −1 for i = j, 1
where
ˆ x C8,i
⎧ . 2π π2 ⎪ F ⎪ 3π A2 lim Φ ˆ ⎪ (ˆ x) Θ2i c1 dθ1 dθ2 ⎪ ⎪ π 16 d ε→0 i,i,ε ⎪ 0 − ⎪ 2 ⎛d ⎞ ⎪ ⎪ −1 ⎪ 2 . ⎪ ! 2 + 2k 1 ⎪ ⎪ ˆ F (ˆ ⎝ ⎠ A2d lim Φ ⎪ x) ⎪ i,i,ε ⎪ ε→0 d−2 d + 2k ⎪ ⎪ k=0 ⎪ ⎪ π π 2π ⎨ 2 2 × ··· Θ2i cd−2 · · · cd−2 dθ1 · · · dθd−1 = 1 ⎪ 0 −π −π ⎪ 2 2 ⎪ ⎛ ⎞ ⎛ ⎞ ⎪ d−7 d−1 ⎪ 2 2 ⎪ . ⎪ ⎪ π ⎝ ! 3 + 2k ⎠ ⎝ ! 1 + 2k ⎠ 2 ˆ F (ˆ ⎪ Φ lim x ) A ⎪ d i,i,ε ⎪ ε→0 ⎪ 4 4 + 2k d − 1 + 2k ⎪ k=0 k=0 ⎪ ⎪ π π ⎪ 2π ⎪ 2 2 ⎪ ⎪ ⎪ × ··· Θ2i cd−2 · · · cd−2 dθ1 · · · dθd−1 ⎩ 1 −π 2
0
−π 2
(d = 3),
(d ≥ 4 : even),
(d ≥ 5 : odd),
" d−7 3+2k 2 where if d = 5, then we define k=0 2+2k = 1. Proof. In the case of i = j, the LHS of (8.4) = 0, as in Lemma 8.2. In the case where i = j, we perform, successively, the following changes of variables u := √rh , u = tan τ , and ν = arctan √1h to obtain that 0
1
rd+1 1 dr = d 2 (r + h)d 2 h −1
√1 h
0
ud+1 1 du = d 2 (u + 1)d 2 h −1
ν
sind+1 τ cosd−3 τ dτ. 0
(i) In the case of d = 3, then
ν 1 1 3π 3π 1 3π 1 4 sin τ dτ − = + o . + 1 1 1 1 16 16 16 2 2 2 h2 h h h 0 (ii) In the case where d (≥ 4) and even, we have from Lemma 8.9 that ⎛ ⎞ d d −1 −1 2 ! 2 + 2k 1 2! 2 + 2k 1 ⎝ ν d+1 1 1 d−3 ⎠ sin τ cos τ dτ − + d d d−2 d + 2k h 2 −1 h 2 −1 d − 2 k=0 d + 2k 0 k=0
−1 1 1 2! 2 + 2k = d +o . d d + 2k h 2 −1 d − 2 h 2 −1 d
1
k=0
(iii) In the case where d (≥ 5) is odd, from Lemma 8.9, ⎛ d−7 ⎞ ⎛ d−1 ⎞⎞ ⎛ 2 2 ! ! 3 + 2k 1 + 2k π 1 ⎝ ν d+1 ⎠⎝ ⎠⎠ sin τ cosd−3 τ dτ − ⎝ d 4 4 + 2k d − 1 + 2k h 2 −1 0 k=0 k=0 ⎛ d−7 ⎞ ⎛ d−1 ⎞ 2 2 π ! 3 + 2k ⎠ ⎝ ! 1 + 2k ⎠ 1 + ⎝ 4 4 + 2k d − 1 + 2k h d2 −1 k=0 k=0 ⎛ d−7 ⎞ ⎛ d−1 ⎞
2 2 1 3 + 2k ⎠ ⎝ ! 1 + 2k ⎠ 1 π ⎝! = + o . d 4 4 + 2k d − 1 + 2k h d2 −1 h 2 −1 k=0
k=0
Lemma 8.11. For i, j, k = 1, . . . , d, π2 1 d+2 2π π2 r Θi Θj Θk d−2 2 Ad ··· c1 · · · cd−2 2 + h)d π π (r 0 −2 −2 0 1 ∂ ˆF Φi,j,ε (ˆ × lim x + γrΘ)dγ drdθ1 · · · dθd−1 ε→0 ∂yk ⎧ 0 1 ⎪ ⎪ for d = 3, ⎪ ⎨O ln h =
⎪ 1 ⎪ ⎪ for d ≥ 4. ⎩O d−3 h 2 1 ˆ F (x + γrΘ)dγ) is bounded. Set u = Proof. By Lemma 3.1, limε→0 ( 0 ∂y∂ k Φ i,j,ε Then 1 1 √1 h rd+2 ud+2 1 1 1 dr ≤ du + du. d−3 d−3 2 d 2 d ud−2 h 2 0 (u + 1) h 2 1 0 (r + h)
√r . h
Hence, the result follows. Lemma 8.12. For i, j = 1, . . . , d, there exists some positive constant C such that π π 2π 2 ∞ d+1 2 r Θi Θj d−2 2 F ˆ ˆ ) drdθ1 · · · dθd−1 ≤ C. ··· c1 · · · cd−2 lim Φ Ad i,j,ε (rΘ + x 2 d ε→0 π 0 −π (r + h) −2 1 2 Proof. follows.
ˆ F (rΘ + x) is bounded. By Lemma 3.1, limε→0 Φ i,j,ε
Then the result
8.5. Lemmas used in the proof of Theorem 5.1. In this section, we give some lemmas used to prove the central limit theorem. Lemma 8.13. For any $d \ge 2$ and $0 < p < \frac{1}{2}$, we have $|N \times R| \le o(h^p)$. Proof. Generally, for $n \in \mathbb{N}$ and $x \in \mathbb{R}$, we have
\[
\Big| e^{\sqrt{-1}\,x} - \sum_{k=0}^n \frac{(\sqrt{-1}\,x)^k}{k!} \Big| \le \frac{|x|^{n+1}}{(n + 1)!}.
\]
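This remainder bound for the complex exponential is easy to confirm numerically. A minimal sketch, with hypothetical function names:

```python
import cmath
import math

def taylor_remainder(x, n):
    """|exp(i x) - sum_{k=0}^n (i x)^k / k!| for real x."""
    partial = sum((1j * x) ** k / math.factorial(k) for k in range(n + 1))
    return abs(cmath.exp(1j * x) - partial)

def remainder_bound(x, n):
    """The bound |x|^(n+1) / (n+1)!."""
    return abs(x) ** (n + 1) / math.factorial(n + 1)

for x in (-3.0, 0.1, 2.0):
    for n in (1, 2, 3):
        print(x, n, taylor_remainder(x, n) <= remainder_bound(x, n))
```

With $n = 2$ the bound is of third order in $|x|$, which is why a third-moment estimate of the form $E[|\zeta_1^h|^3]$ is what has to be controlled next.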
Then it is enough to prove the following: For any d ≥ 2, we have E[|ζ1h |3 ] ≤ 3 O(1/hd− 2 ). In fact, (8.5) ⎡ 3 ⎤ d 3 ∂ h ˆ H(i) (F ; G)(1) ⎦ E ζ1h ≤ E ⎣ Qd F (1) − x ∂x i i=1 ⎡ 2 ⎤ d ∂ h ˆ H(i) (F ; G)(1) ⎦ phF (1) ,G (ˆ Qd F (1) − x x) + 3E ⎣ ∂x i i=1 d ∂ 2 3 ˆ H(i) (F ; G)(1) phF (1) ,G (ˆ + 3E Qhd F (1) − x x) + phF (1) ,G (ˆ x) . ∂xi i=1
The second and fourth term have already been studied in Theorems 3.1, 4.1, and 4.2. Hence, we estimate the first and third term.
(1) Define (1) (1) (1) (1) F Φi,j,k (y) := E (F ; G) H (F ; G) H (F ; G) = y pF (1) ,1 (y) H (i) (j) (k) (1) F for i, j, k = 1, . . . , d. Using spherical coordinates and Lemma 3.1, the first term of (8.5) is first divided into two terms as follows: ⎡ 3 ⎤ d ∂ h (1) ˆ )H(i) (F ; 1)(1) ⎦ Qd (F − x E ⎣ ∂x i i=1 (8.6)
≤
d
A3d
r
π 2
−π 2
0
i,j,k=1
×
2π
d+2
···
{r + h}
3d 2
1
∞
+
−π 2
|Θi Θj Θk |
2
π 2
0
1
Φi,j,k (ˆ x + rΘ)cd−2 · · · cd−2 drdθ1 · · · dθd−1 . 1 F (1)
We can easily check the integrability of the second term of (8.6). We consider the first term. √ Set r = h tan(τ ), β := arctan √1h . Then there exists a positive constant M such that β M (First term of (8.6)) ≤ d− 3 sind+2 τ cos2d−4 τ dτ h 2 0 2π π2 π2 d × ··· |Θi Θj Θk |cd−2 · · · cd−2 dθ1 · · · dθd−1 . 1 −π 2
0
Then we obtain the order
1
3
hd− 2
−π 2 i,j,k=1
.
(2) Next, we calculate the third term of (8.5). Set |ΦiF (1) (y)| := E[|H(i) (F ; G)(1) | = y ]pF (1) ,1 (y) for i = 1, . . . , d. By using the spherical coordinates and | F Lemma 3.1, 0 0 d 0 ∂ 0 0 0 h ˆ )H(i) (F ; 1)0 E 0 Qd (F − x 0 0 ∂x i i=1 π2 1 ∞ d 2π π 2 ≤ ··· + (1)
i=1
−π 2
0
×
−π 2
r|Θi | d
{r + h} 2 2
0
1
|ΦiF (ˆ x + rΘ)|rd−1 cd−2 · · · cd−2 drdθ1 · · · dθd−1 1
1 ≤ Cd √ + Cd , h where Cd and Cd are some constants. This completes the proof. Therefore, as a result of above, we have our conclusion. Lemma 8.14. ⎧ ⎪C xˆ ln 1 + O(1) for d = 2, ⎨ 3 h 2 ⎪ h
E ζ1 = 1 1 ⎪ ⎪ ⎩C4xˆ d +o for d ≥ 3. d −1 h2 h 2 −1
Proof. In the case of $d = 2$, the result follows from Theorems 3.1 and 4.1. In the case $d \ge 3$, it follows from Theorems 3.1 and 4.2.
8.6. Lemma for section 6. Lemma 8.15. Assume that $f \in \mathcal{A}$. Then, for $i = 1, \dots, d$,
\[
|g_i(y)| \le a|y| + b \quad \text{and} \quad |g_i^h(y)| \le a|y| + b, \tag{8.7}
\]
where $a$ and $b$ are constants which depend on $d$ and are independent of $h$. Proof. Equation (8.7) follows easily from the assumptions on $f$.
8.7. Lemma for section 7.1. Here we obtain the weight $H_{(i)}$. Lemma 8.16. Let $X_T$ be the solution of (7.1). Then an expression for $H_{(j)}$ is
\[
H_{(j)}(X_T; 1) = \frac{1}{T} \sum_{i=1}^d (-1)^{i+j}\, \frac{\det \Sigma^i_j}{\det(\Sigma)} \Big( \frac{W_T^i}{X_T^j} + \frac{\sigma_{ji} T}{X_T^j} \Big),
\]
where $j = 1, \dots, d$, $\Sigma := (\sigma_{ij})_{i,j=1,\dots,d}$, and $\Sigma^i_j$, $j, i = 1, \dots, d$, is the $(d - 1) \times (d - 1)$ matrix obtained from $\Sigma$ by deleting row $i$ and column $j$.
Proof. Let $f \in C_0^1(\mathbb{R}^d)$ and let $e_i$ be the unit vector whose $i$th component is 1 and whose other components are 0. For $i = 1, \dots, d$, we have, using the chain rule,
\[
D^i f(X_T) = \sum_{j=1}^d \frac{\partial}{\partial x_j} f(X_T)\, D^i X_T^j = \sum_{j=1}^d \frac{\partial}{\partial x_j} f(X_T)\, X_T^j \sigma_{ji}.
\]
If we consider the above as a set of equations for $i = 1, \dots, d$, where the unknowns are $\frac{\partial}{\partial x_j} f(X_T)$, we can solve this set of simultaneous equations using Cramer's formula and obtain
\[
E\Big[ \frac{\partial}{\partial x_j} f(X_T) \Big] = \frac{1}{T}\, E\Big[ \frac{1}{X_T^j \det(\Sigma)} \sum_{i=1}^d (-1)^{i+j} \det \Sigma^i_j\, \big\langle \langle Df(X_T), e_i \rangle_{\mathbb{R}^d}, \mathbf{1}(\cdot \le T) \big\rangle \Big],
\]
where $\langle \cdot, \cdot \rangle_{\mathbb{R}^d}$ is the inner product in $\mathbb{R}^d$. Then, using that the Skorohod integral $\delta$ is the dual operator of $D$, we have that
\[
E\Big[ \frac{\partial}{\partial x_j} f(X_T) \Big] = \frac{1}{T \det(\Sigma)} \sum_{i=1}^d (-1)^{i+j} \det \Sigma^i_j\, E\Big[ \delta\Big( e_i\, \frac{\mathbf{1}(\cdot \le T)}{X_T^j} \Big) f(X_T) \Big].
\]
Then, by (2.2),
\[
\delta^i\Big( \frac{\mathbf{1}(\cdot \le T)}{X_T^j} \Big) = \frac{W_T^i}{X_T^j} + \frac{\sigma_{ji} T}{X_T^j},
\]
and from here, the result follows.
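The weight can be sanity-checked by Monte Carlo through the integration by parts identity $E[\partial_j f(X_T)] = E[f(X_T) H_{(j)}(X_T; 1)]$. The sketch below is a check under stated assumptions, not the simulation code behind the reported results. It writes the cofactor sum through the inverse matrix, $H_{(j)} = \frac{1}{X_T^j}\big(1 + \frac{1}{T} \sum_i (\Sigma^{-1})_{ij} W_T^i\big)$, which follows from $D^i X_T^j = X_T^j \sigma_{ji}$ and $\sum_i (\Sigma^{-1})_{ij} \sigma_{ji} = 1$; this rewriting and its index convention are a derivation made for the sketch and should be compared with the cofactor notation of Lemma 8.16. The test function $f(x) = x_1 x_2$, the values `y0` and `T`, and all function names are hypothetical.

```python
import math
import random

def invert_2x2(s):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]

def gbm_weight(j, x, w, sigma_inv, T):
    """Weight H_(j)(X_T; 1), rewritten with Sigma^{-1} (assumption, see lead-in)."""
    tilt = sum(sigma_inv[i][j] * w[i] for i in range(len(w)))
    return (tilt / T + 1.0) / x[j]

rng = random.Random(1)
sigma = [[0.1, 0.2], [0.3, 0.2]]
mu = [0.01, 0.02]
y0 = [1.0, 1.0]  # hypothetical initial values
T = 1.0          # hypothetical time horizon
sigma_inv = invert_2x2(sigma)

n = 200_000
lhs = rhs = 0.0
for _ in range(n):
    w = [rng.gauss(0.0, math.sqrt(T)) for _ in range(2)]
    x = [y0[i] * math.exp((mu[i] - 0.5 * sum(s * s for s in sigma[i])) * T
                          + sum(sigma[i][k] * w[k] for k in range(2)))
         for i in range(2)]
    lhs += x[0] * x[1] * gbm_weight(0, x, w, sigma_inv, T) / n  # E[f(X_T) H_(1)]
    rhs += x[1] / n                                             # E[df/dx_1 (X_T)]

# Both averages estimate E[X_T^2] = y0_2 * exp(mu_2 * T).
print(lhs, rhs)
```

The estimate on the weighted side has a much larger variance than the direct one, which gives a feeling for why variance control matters once such weights are multiplied by a singular kernel.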
REFERENCES
[1] E. Alos and C.-O. Ewald, Malliavin differentiability of the Heston volatility and applications to option pricing, Adv. Appl. Prob., 40 (2008), pp. 144–162.
[2] V. Bally and L. Caramellino, Lower bounds for the density of Ito processes under weak regularity assumptions, working paper.
[3] L. C. Evans, Partial Differential Equations, Grad. Stud. Math. 19, American Mathematical Society, Providence, RI, 1998.
[4] E. Fournié, J.-M. Lasry, J. Lebuchoux, P.-L. Lions, and N. Touzi, Applications of Malliavin calculus to Monte Carlo methods in finance, Finance Stoch., 3 (1999), pp. 391–412.
[5] S. L. Heston, A closed-form solution for options with stochastic volatility with applications to bond and currency options, Rev. Financ. Stud., 6 (1993), pp. 327–343.
[6] D. Lamberton and B. Lapeyre, Introduction to Stochastic Calculus Applied to Finance, Chapman & Hall, London, 1996.
[7] P. Malliavin and A. Thalmaier, Stochastic Calculus of Variations in Mathematical Finance, Springer Finance, Springer-Verlag, Berlin, 2006.
[8] D. Nualart, The Malliavin Calculus and Related Topics, 2nd ed., Probab. Appl., Springer-Verlag, Berlin, 2006.
[9] M. Sanz-Solé, Malliavin Calculus with Applications to Stochastic Partial Differential Equations, EPFL Press, Lausanne, Switzerland, 2005.
[10] K. Yasuda, Kernel Density Estimation: The Malliavin-Thalmaier Formula and Bayesian Parameter Estimation, Ph.D. thesis, Osaka University, Osaka, Japan, 2008.
© 2009 Society for Industrial and Applied Mathematics
SIAM J. NUMER. ANAL. Vol. 47, No. 2, pp. 1576–1600
A THREE-LEVEL BDDC ALGORITHM FOR MORTAR DISCRETIZATIONS∗ HYEA HYUN KIM† AND XUEMIN TU‡ Abstract. In this paper, a three-level balancing domain decomposition by constraints (BDDC) algorithm is developed for the solutions of large sparse algebraic linear systems arising from the mortar discretization of elliptic boundary value problems. The mortar discretization is considered on geometrically nonconforming subdomain partitions. In two-level BDDC algorithms, the coarse problem needs to be solved exactly. However, its size will increase with the increase of the number of the subdomains. To overcome this limitation, the three-level algorithm solves the coarse problem inexactly while a good rate of convergence is maintained. This is an extension of previous work: the three-level BDDC algorithms for standard finite element discretization. Estimates of the condition numbers are provided for the three-level BDDC method, and numerical experiments are also discussed. Key words. mortar discretization, balancing domain decomposition by constraints, three-level, domain decomposition, coarse problem, condition number AMS subject classifications. 65N30, 65N55 DOI. 10.1137/07069081X
1. Introduction. Mortar methods were introduced by Bernardi, Maday, and Patera [3] to couple different approximations in different subdomains so as to obtain a good global approximate solution. They are useful for modeling multiphysics, adaptivity, problems with joints, and mesh generation for three-dimensional complex structures. The coupling between different subdomains in mortar methods is done by enforcing certain constraints on solutions across the subdomain interface using Lagrange multipliers. We call these constraints the mortar matching conditions. Balancing domain decomposition by constraints (BDDC) methods were introduced and analyzed in [9, 23, 22] for elliptic problems with standard finite element discretizations. These iterative methods are new versions of the balancing Neumann– Neumann algorithms with a coarse problem given in terms of a set of primal constraints. Two-level BDDC methods have been extended to saddle point problems in [19, 10, 28, 30], indefinite problems in [18], nonsymmetric problems in [27], and the problems with mortar finite element discretization in [13, 14]. The complicated geometrically nonconforming subdomain partition leads to a much larger coarse problem than that of the standard discretization. In the two-level BDDC algorithms, the coarse problems are generated and factored by direct solvers at the beginning of the computation. The coarse components can be a bottleneck of the algorithms if the number of the subdomains is large. ∗ Received by the editors May 8, 2007; accepted for publication (in revised form) November 30, 2008; published electronically April 16, 2009. The work of the authors was supported in part by the U.S. Department of Energy under contract DE-FC02-01ER25482. http://www.siam.org/journals/sinum/47-2/69081.html † Department of Mathematics, Chonnam National University, Youngbong-dong, Buk-gu, Gwangju 500-757, Korea (
[email protected],
[email protected]). This author’s work was also supported in part by Chonnam National University, 2008. ‡ Department of Mathematics, University of California at Berkeley and Lawrence Berkeley National Laboratory, Berkeley, CA 94720-3840 (
[email protected]). This author’s research was also supported in part by the Director, Office of Science, Advanced Scientific Computing Research, U.S. Department of Energy under contract DE-AC02-05CH11231.
A THREE-LEVEL BDDC FOR MORTAR DISCRETIZATIONS
Recently, there have been several papers about inexact solvers for BDDC algorithms with standard finite element discretization. In [32, 31], two three-level BDDC algorithms are introduced which solve the coarse problems inexactly by introducing an additional level. Inexact local solvers based on multigrid methods were introduced in [21]. In [11], several inexact solvers for both the coarse and local components are considered. An inexact dual-primal finite element tearing and interconnecting (FETI-DP) algorithm is also introduced in [16]. The connection between FETI-DP and BDDC algorithms has been discussed in [22, 20, 4, 6]. In this paper, we extend the algorithms in [32] to mortar finite element discretization with quite general subdomain partitions. We solve the coarse problem approximately by introducing an additional level and using the BDDC algorithm recursively. We decompose the whole domain into subdomains and then group several subdomains to subregions to obtain a subregion partition. The subdomain partition can be geometrically nonconforming (it does not need to form a triangulation of the original domain), and the subregions usually will be irregular (they may not have uniformly Lipschitz continuous boundaries). We assume that our subregions are uniform domains and apply the results developed for such irregular domains in [15] to our analysis. See [15] and the references therein for the definition of uniform domains. We provide estimates of the condition number bounds of the system with the new preconditioners and show that a good rate of convergence can still be maintained. We note that we have to choose the edge average primal constraints in the mortar discretization due to the mortar matching conditions. The resulting coarse problems are different from the ones in [32], where the vertex primal constraints are used. This difference and the geometrically nonconforming subdomain partition need a more complicated analysis for the condition number bound. 
We also note that this analysis can be used for the three-level BDDC algorithms for standard finite element discretization with edge primal constraints chosen for two dimensions. The rest of the paper is organized as follows. We first briefly review a two-level BDDC method for mortar discretization in section 2. A three-level BDDC method and the corresponding preconditioner $\widetilde{M}^{-1}$ are introduced in section 3. We give some auxiliary results in section 4. In section 5, we provide an estimate of the condition number bound for the system with the preconditioner $\widetilde{M}^{-1}$, which is of the form $C(1 + \log(\hat{H}/H))^2 (1 + \log(H/h))^2$, where $\hat{H}$, $H$, and $h$ are typical diameters of the subregions, subdomains, and elements, respectively; see section 3 for the definitions of subregions and subdomains. Finally, some numerical experiments are discussed in section 6. Throughout the paper, $C$ denotes a generic positive constant that does not depend on any mesh parameters or on the problem coefficients.
2. A two-level BDDC algorithm for mortar discretizations.
2.1. A model problem and the mortar discretizations. We will consider a second order scalar elliptic problem in a two-dimensional region $\Omega$: find $u \in H_0^1(\Omega)$ such that
\[
\int_\Omega \rho\, \nabla u \cdot \nabla v\,dx = \int_\Omega f v\,dx \quad \forall v \in H_0^1(\Omega), \tag{2.1}
\]
where $\rho(x) > 0$ for all $x \in \Omega$ and $f \in L^2(\Omega)$. We decompose $\Omega$ into $N$ nonoverlapping subdomains $\Omega_i$ with diameters $H_i$ and set $H = \max_i H_i$. We make the following assumption for our subdomain partition.
HYEA HYUN KIM AND XUEMIN TU
Assumption 2.1. Subdomains are polygons, and each subdomain has comparable diameter to its neighbors. The partition can be geometrically nonconforming, where a pair of subdomains can intersect only a part of a subdomain edge. In other words, the partition does not need to form a triangulation of $\Omega$.
In the following, we will regard the edges as the interface between subdomains. We then define the interface of the subdomain partition by
$$\Gamma = \Bigl(\bigcup_{ij} \overline{F}_{ij}\Bigr) \setminus \partial\Omega,$$
where $F_{ij} = \partial\Omega_i \cap \partial\Omega_j$.
A quasi-uniform triangulation is given for each subdomain. We introduce $W^{(i)}$, the standard finite element space of continuous, piecewise linear functions associated with the given triangulation in $\Omega_i$. In addition, the functions in $W^{(i)}$ vanish on $\partial\Omega$. We define the product space of subdomain finite element spaces by
$$W = \prod_i W^{(i)}.$$
Functions in $W$ can be discontinuous across the subdomain interface $\Gamma$. The mortar methods are nonconforming finite element methods. To find a good approximate solution, the mortar matching condition is enforced on functions in the space $W$ across the subdomain interface by using suitable Lagrange multipliers. Optimal order of approximation has been proved for the elliptic problems in both two and three dimensions; see [3, 1, 2]. In [3], the error estimate for the mortar approximation was first proved for both geometrically conforming and nonconforming partitions. To introduce Lagrange multiplier spaces, we first select nonmortar and mortar parts of the interface. Among the subdomain edges, we can select edges $F_l$ that provide a disjoint covering of the interface $\Gamma$ (see [25, section 4.1])
$$\bigcup_l \overline{F}_l = \overline{\Gamma}, \qquad F_l \cap F_k = \emptyset, \quad l \ne k.$$
Each $F_l$ is a full edge of a subdomain. We call these edges the nonmortar edges. Since the subdomain partition can be geometrically nonconforming, a single nonmortar edge $F_l \subset \partial\Omega_i$ may intersect several subdomain boundaries. This provides $F_l$ with a partition
$$\overline{F}_l = \bigcup_j \overline{F}_{ij}, \qquad F_{ij} = \partial\Omega_i \cap \partial\Omega_j.$$
We call these $F_{ij}$ the mortar edges, which are opposite to $F_l$ and can be only a part of a subdomain edge. A dual or a standard Lagrange multiplier space $M(F_l)$ is given for each nonmortar edge $F_l \subset \partial\Omega_i$. We define a space
$$\mathring{W}(F_l) := W^{(i)}|_{F_l} \cap H_0^1(F_l)$$
that is the restriction of the finite element functions to the nonmortar edges and vanishes on the boundary of these edges. We require that the space $M(F_l)$ has the same dimension as the space $\mathring{W}(F_l)$ and that it contains the constant functions. Constructions of such Lagrange multiplier spaces were first given in [1, 3] for standard
A THREE-LEVEL BDDC FOR MORTAR DISCRETIZATIONS
Lagrange multiplier spaces and in [33, 34] for dual Lagrange multiplier spaces; see also [12]. We note that the basis functions $\{\psi_k\}_k$ of the Lagrange multiplier space $M(F_l)$ satisfy
$$\sum_k \psi_k = 1. \tag{2.2}$$
For $(w_1, \ldots, w_N) \in W$, we define $\phi \in L^2(F_l)$ by $\phi = w_j$ on $F_{ij} \subset F_l$. The mortar matching condition in the geometrically nonconforming partition is then given by
$$\int_{F_l} (w_i - \phi)\lambda \, ds = 0 \quad \forall \lambda \in M(F_l) \quad \forall F_l. \tag{2.3}$$
We further define the following two product spaces of the $M(F_l)$ and $\mathring{W}(F_l)$, respectively:
$$M = \prod_l M(F_l) \quad \text{and} \quad W_n = \prod_l \mathring{W}(F_l). \tag{2.4}$$
The mortar discretization for problem (2.1) is to approximate the solution by Galerkin's method in the mortar finite element space
$$\widehat{W} := \{w \in W : w \text{ satisfies the mortar matching condition (2.3)}\}.$$
2.2. A two-level BDDC algorithm. In this subsection, we construct a two-level BDDC algorithm for the mortar discretization as in [13]. We first derive the primal form of the mortar discretization and then introduce a BDDC preconditioner for the primal form. We divide unknowns in the subdomain finite element space $W^{(i)}$ into subdomain interior and interface parts. We then select primal unknowns among the interface unknowns and further decompose the interface unknowns into the primal and the rest, called dual unknowns:
$$W^{(i)} = W_I^{(i)} \times W_\Gamma^{(i)} \quad \text{and} \quad W_\Gamma^{(i)} = W_\Pi^{(i)} \times W_\Delta^{(i)}, \tag{2.5}$$
where $I$, $\Gamma$, $\Pi$, and $\Delta$ denote the interior, interface, primal, and dual unknowns, respectively. The primal unknowns are related to certain primal constraints selected from the mortar matching condition (2.3), and they result in a coarse component of the BDDC preconditioner. A proper selection of such constraints is important to obtain a scalable BDDC algorithm. We consider $\{\psi_{ij,k}\}_k$, the basis functions in $M(F_l)$ that are supported in $\overline{F}_{ij}$, and introduce
$$\psi_{ij} = \sum_k \psi_{ij,k}. \tag{2.6}$$
Assumption 2.2. There is at least one basis function $\psi_{ij,k}$ whose support belongs to $\overline{F}_{ij}$.
We introduce the trace space of $W$ on the subdomain boundaries
$$W_\Gamma = \prod_{i=1}^{N} W_\Gamma^{(i)}.$$
We select the primal constraints for $(w_1, \ldots, w_N) \in W_\Gamma$ over each interface $F_{ij}$ to satisfy
$$\int_{F_{ij}} (w_i - w_j)\psi_{ij} \, ds = 0. \tag{2.7}$$
In more detail, the primal unknowns associated to these constraints will be defined by
$$u_\pi = \frac{\int_{F_{ij}} w_i \psi_{ij} \, ds}{\int_{F_{ij}} \psi_{ij} \, ds} = \frac{\int_{F_{ij}} w_j \psi_{ij} \, ds}{\int_{F_{ij}} \psi_{ij} \, ds}.$$
In the case of a geometrically conforming partition, i.e., when $F_{ij}$ is a full edge of two subdomains, the above constraints are the regular edge average matching condition because $\psi_{ij} = 1$, the sum of all Lagrange multiplier basis functions $\{\psi_{ij,k}\}_k$ provided for $F_{ij}$; see (2.6) and (2.2). We make the primal constraints explicit by a change of variables; see [17, section 6.2], [20, section 2.3], and [13, section 2.2]. We then separate the unknowns in the space $W^{(i)}$ as described in (2.5). We will also assume that all of the matrices and vectors are written in terms of the new unknowns.
Throughout this paper, we use the notation $V$ for the product space of local finite element spaces $V^{(i)}$. In addition, we use the notation $\widehat{V}$ for a subspace of $V$ satisfying the mortar matching condition (or pointwise continuity condition) across the subdomain interface and the notation $\widetilde{V}$ for a subspace of $V$ satisfying only the primal constraints. For example, we can represent the space
$$\widetilde{W}_\Gamma = \{w \in W_\Gamma : w \text{ satisfies the primal constraints (2.7)}\} \tag{2.8}$$
in the following way:
$$\widetilde{W}_\Gamma = W_\Delta \times \widehat{W}_\Pi.$$
We further decompose the dual unknowns into the unknowns in the nonmortar part and the rest: $W_\Delta = W_{\Delta,n} \times W_{\Delta,m}$, where $n$ and $m$ denote unknowns in each part, respectively. The matrix representation of the mortar matching condition (2.3) on functions in the space $\widetilde{W}_\Gamma$ can be written as
$$B_n w_n + B_m w_m + B_\Pi w_\Pi = 0. \tag{2.9}$$
Here we enforced the mortar matching condition using a reduced Lagrange multiplier space, since the functions in the space $\widetilde{W}_\Gamma$ satisfy the primal constraints selected from the mortar matching condition (2.3). The reduced Lagrange multiplier space is obtained after eliminating one basis function among $\{\psi_{ij,k}\}_k$ for each $F_{ij} \subset F_l$ so that the matrix $B_n$ in (2.9) is invertible. The unknowns $w_n$ are then determined by the other unknowns $(w_m, w_\Pi)$, which are called the genuine unknowns. We define the space of genuine unknowns by
$$W_G = W_{\Delta,m} \times \widehat{W}_\Pi$$
and define the mortar map by
$$\widetilde{R}_\Gamma = \begin{pmatrix} -B_n^{-1} B_m & -B_n^{-1} B_\Pi \\ I & 0 \\ 0 & I \end{pmatrix} \tag{2.10}$$
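that maps the genuine unknowns in $W_G$ into the unknowns in $\widetilde{W}_\Gamma$ which satisfy the mortar matching condition (2.9). In the following, we will regard $W_G$ as the space $\widehat{W}_\Gamma$ and regard $\widetilde{R}_\Gamma$ as an extension from $\widehat{W}_\Gamma$ to the space $\widetilde{W}_\Gamma$ to be consistent with the notations of the three-level algorithm.
To derive the linear system of the mortar discretization, we introduce several matrices. The matrix $S_\Gamma^{(i)}$ is the local Schur complement matrix obtained by eliminating the subdomain interior unknowns
$$S_\Gamma^{(i)} = K_{\Gamma\Gamma}^{(i)} - K_{\Gamma I}^{(i)} \bigl(K_{II}^{(i)}\bigr)^{-1} K_{\Gamma I}^{(i)T},$$
where $K^{(i)}$ is the local stiffness matrix ordered as follows:
$$K^{(i)} = \begin{pmatrix} K_{II}^{(i)} & K_{I\Gamma}^{(i)} \\ K_{\Gamma I}^{(i)} & K_{\Gamma\Gamma}^{(i)} \end{pmatrix} = \begin{pmatrix} K_{II}^{(i)} & K_{I\Delta}^{(i)} & K_{I\Pi}^{(i)} \\ K_{\Delta I}^{(i)} & K_{\Delta\Delta}^{(i)} & K_{\Delta\Pi}^{(i)} \\ K_{\Pi I}^{(i)} & K_{\Pi\Delta}^{(i)} & K_{\Pi\Pi}^{(i)} \end{pmatrix}.$$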
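We define extensions $\widetilde{R}_\Gamma$ and $R_\Gamma$ by
$$\widehat{W}_\Gamma \xrightarrow{\ \widetilde{R}_\Gamma\ } \widetilde{W}_\Gamma \xrightarrow{\ R_\Gamma\ } W_\Gamma,$$
where $\widetilde{R}_\Gamma$ is the mortar map in (2.10) and $R_\Gamma$ is the product of restriction maps
$$R_\Gamma^{(i)} : \widetilde{W}_\Gamma \to W_\Gamma^{(i)}.$$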
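We next introduce the matrices $S_\Gamma$ and $\widetilde{S}_\Gamma$, the block diagonal matrix and the partially assembled matrix at the primal unknowns, respectively, as
$$S_\Gamma = \mathrm{diag}_i\bigl(S_\Gamma^{(i)}\bigr) \quad \text{and} \quad \widetilde{S}_\Gamma = R_\Gamma^T S_\Gamma R_\Gamma.$$
The linear system of the mortar discretization is then written as follows: find $u_G \in \widehat{W}_\Gamma$ such that
$$\widetilde{R}_\Gamma^T \widetilde{S}_\Gamma \widetilde{R}_\Gamma u_G = \widetilde{R}_\Gamma^T g_G, \tag{2.11}$$
where $g_G \in \widehat{W}_\Gamma$ is the part of genuine unknowns of $g_\Gamma \in W_\Gamma$ and $g_\Gamma$ is given by
$$g_\Gamma|_{\partial\Omega_i} = g_\Gamma^{(i)} = f_\Gamma^{(i)} - K_{\Gamma I}^{(i)} \bigl(K_{II}^{(i)}\bigr)^{-1} f_I^{(i)}, \quad \text{where } f^{(i)} = \begin{pmatrix} f_I^{(i)} \\ f_\Gamma^{(i)} \end{pmatrix} \text{ is}$$
the local load vector.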
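The elimination of interior unknowns behind $S_\Gamma^{(i)}$ and $g_\Gamma^{(i)}$ can be illustrated with a minimal Python sketch. It is only a reading aid under stated assumptions: the 5-by-5 matrix is a hypothetical 1D Laplacian stand-in (not a mortar stiffness matrix), the index sets are chosen by hand, and the helper names `solve` and `schur_condense` are illustrative, not taken from the paper.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(map(float, A[r])) + [float(b[r])] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= t * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def schur_condense(K, f, interior, interface):
    """Return S = K_GG - K_GI K_II^{-1} K_IG and g = f_G - K_GI K_II^{-1} f_I."""
    K_II = [[K[a][b] for b in interior] for a in interior]
    # One interior solve per interface unknown: columns of K_II^{-1} K_IG.
    cols = [solve(K_II, [K[a][g] for a in interior]) for g in interface]
    y = solve(K_II, [f[a] for a in interior])  # K_II^{-1} f_I
    S = [[K[g][h] - sum(K[g][a] * cols[q][p] for p, a in enumerate(interior))
          for q, h in enumerate(interface)] for g in interface]
    g_vec = [f[g] - sum(K[g][a] * y[p] for p, a in enumerate(interior))
             for g in interface]
    return S, g_vec


# Hypothetical local problem: tridiag(-1, 2, -1) with a constant load.
n = 5
K = [[2.0 if r == c else (-1.0 if abs(r - c) == 1 else 0.0) for c in range(n)]
     for r in range(n)]
f = [1.0] * n
interior, interface = [1, 2, 3], [0, 4]

S, g_vec = schur_condense(K, f, interior, interface)
u_full = solve(K, f)           # solve the full system
u_interface = solve(S, g_vec)  # solve only the condensed interface system
```

The condensed solve and the full solve agree on the interface unknowns, which is the property that lets (2.11) be posed on the interface alone.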
In the two-level BDDC algorithm in [13], we solve (2.11) using a preconditioner $M^{-1}$ of the form
$$M^{-1} = \widetilde{R}_{D,\Gamma}^T \widetilde{S}_\Gamma^{-1} \widetilde{R}_{D,\Gamma}, \tag{2.12}$$
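where the weighted extension operator $\widetilde{R}_{D,\Gamma}$ is given by
$$\widetilde{R}_{D,\Gamma} = D\widetilde{R}_\Gamma = \begin{pmatrix} D_n & 0 & 0 \\ 0 & D_m & 0 \\ 0 & 0 & D_\Pi \end{pmatrix} \widetilde{R}_\Gamma, \qquad D_n = 0, \quad D_m = I, \quad D_\Pi = I. \tag{2.13}$$
We call $M^{-1}$ the Neumann–Dirichlet preconditioner. The weight factor $D$ is determined to be zero at the nonmortar interfaces and to be one otherwise. This type of weight was shown to be the most efficient for the elliptic problems with jump coefficients $\rho_i$ when the part with smaller $\rho_i$ is selected to be the nonmortar part; see [7].
Assumption 2.3. We select the nonmortar and mortar parts of the interface $F_{ij}$ $(= \partial\Omega_i \cap \partial\Omega_j)$ to satisfy $\rho_i \le \rho_j$, where $\Omega_i$ is the nonmortar part and $\Omega_j$ is the mortar part.
Using a block Cholesky factorization, we obtain
$$\widetilde{S}_\Gamma^{-1} = R_{\Gamma\Delta}^T \left( \sum_{i=1}^{N} \begin{pmatrix} 0 & R_\Delta^{(i)T} \end{pmatrix} \begin{pmatrix} K_{II}^{(i)} & K_{I\Delta}^{(i)} \\ K_{\Delta I}^{(i)} & K_{\Delta\Delta}^{(i)} \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ R_\Delta^{(i)} \end{pmatrix} \right) R_{\Gamma\Delta} + \Phi S_\Pi^{-1} \Phi^T, \tag{2.14}$$
where the restrictions $R_{\Gamma\Delta}$ and $R_\Delta^{(i)}$ are defined by
$$R_{\Gamma\Delta} : \widetilde{W}_\Gamma \to W_\Delta \quad \text{and} \quad R_\Delta^{(i)} : W_\Delta \to W_\Delta^{(i)}.$$
Here $\Phi$ is the matrix whose columns are the coarse basis functions with minimal energy
$$\Phi = R_{\Gamma\Pi}^T - R_{\Gamma\Delta}^T \sum_{i=1}^{N} \begin{pmatrix} 0 & R_\Delta^{(i)T} \end{pmatrix} \begin{pmatrix} K_{II}^{(i)} & K_{I\Delta}^{(i)} \\ K_{\Delta I}^{(i)} & K_{\Delta\Delta}^{(i)} \end{pmatrix}^{-1} \begin{pmatrix} K_{\Pi I}^{(i)T} \\ K_{\Pi\Delta}^{(i)T} \end{pmatrix} R_\Pi^{(i)},$$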
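where $R_{\Gamma\Pi}$ and $R_\Pi^{(i)}$ are the restrictions
$$R_{\Gamma\Pi} : \widetilde{W}_\Gamma \to \widehat{W}_\Pi \quad \text{and} \quad R_\Pi^{(i)} : \widehat{W}_\Pi \to W_\Pi^{(i)}.$$
The coarse level problem matrix $S_\Pi$ is determined by
$$S_\Pi = \sum_{i=1}^{N} R_\Pi^{(i)T} \left\{ K_{\Pi\Pi}^{(i)} - \begin{pmatrix} K_{\Pi I}^{(i)} & K_{\Pi\Delta}^{(i)} \end{pmatrix} \begin{pmatrix} K_{II}^{(i)} & K_{I\Delta}^{(i)} \\ K_{\Delta I}^{(i)} & K_{\Delta\Delta}^{(i)} \end{pmatrix}^{-1} \begin{pmatrix} K_{\Pi I}^{(i)T} \\ K_{\Pi\Delta}^{(i)T} \end{pmatrix} \right\} R_\Pi^{(i)}, \tag{2.15}$$
which is obtained by assembling subdomain matrices; for additional details, see [9, 20, 23]. Therefore, the preconditioner $M^{-1}$ contains local components and a coarse component that involve solving the Neumann problems in each subdomain and solving the coarse problem with the matrix $S_\Pi$, respectively.
From [13, Theorem 4.7], we know that for any $u_\Gamma \in \widehat{W}_\Gamma$,
$$u_\Gamma^T M u_\Gamma \le u_\Gamma^T \widetilde{R}_\Gamma^T \widetilde{S}_\Gamma \widetilde{R}_\Gamma u_\Gamma \le C (1 + \log(H/h))^2 \, u_\Gamma^T M u_\Gamma. \tag{2.16}$$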
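As a reading aid (a standard consequence, not an additional result of the paper), the two-sided bound (2.16) can be restated as a condition number estimate. Writing $A = \widetilde{R}_\Gamma^T \widetilde{S}_\Gamma \widetilde{R}_\Gamma$ and assuming $M$ is symmetric positive definite, the bound controls the generalized Rayleigh quotient, so the eigenvalues of $M^{-1}A$ lie in $[1, C(1+\log(H/h))^2]$. The last line is the classical conjugate gradient error bound in exact arithmetic; the symbols $\kappa$, $u$, and $u_k$ (exact solution and $k$th iterate) are introduced here only for illustration.

```latex
\begin{aligned}
&1 \;\le\; \frac{u_\Gamma^T A\, u_\Gamma}{u_\Gamma^T M u_\Gamma}
   \;\le\; C\,(1+\log(H/h))^2
   \qquad \forall\, u_\Gamma \ne 0, \\[4pt]
&\kappa(M^{-1}A) \;=\; \frac{\lambda_{\max}(M^{-1}A)}{\lambda_{\min}(M^{-1}A)}
   \;\le\; C\,(1+\log(H/h))^2, \\[4pt]
&\|u - u_k\|_A \;\le\; 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k} \|u - u_0\|_A .
\end{aligned}
```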
Fig. 1. A subregion partition (left) and unknowns at a subregion (right) when $\hat{H}/H = 4$; the small rectangles are subdomains, the white nodes designate primal unknowns at the interior of the subregion, and the black nodes designate primal unknowns on the subregion boundary.
3. A three-level BDDC method. In the three-level algorithms, as in [32, 31], we will not factor the coarse problem matrix $S_\Pi$ defined in (2.15) by a direct solver. Instead, we will introduce another level and solve the coarse problem approximately on this level by using ideas similar to those for the two-level preconditioners.
Let subregion $\Omega^j$ be a union of $N_j$ subdomains $\Omega_i^j$ with diameters $H_i^{(j)}$, and then we obtain a subregion partition $\{\Omega^j\}_{j=1}^{N_c}$. We make the following assumption on our subregions; see [15] and the references therein for the definition of uniform domains.
Assumption 3.1. The subregions are uniform domains.
We denote by $\hat{H}^{(j)}$ the diameter of the subregion $\Omega^j$. Let $\hat{H} = \max_j \hat{H}^{(j)}$ and $H = \max_{i,j} H_i^{(j)}$. Then $N$, the total number of subdomains, can be written as $N = N_1 + \cdots + N_{N_c}$. An example of a subregion partition that is obtained from a geometrically nonconforming subdomain partition is shown in Figure 1.
In the following, we will use a superscript for the subregion index and a subscript for the subdomain index, for example, $\Omega^j$ and $\Omega_i$ for subregions and subdomains, respectively. For subdomains in the subregion $\Omega^j$, we use the notation $\Omega_i^j$. In the subregion partition, we define edges as the intersection of two subregions and vertices as the intersection of more than two subregions, similar to [26, Definition 4.1]. In addition, the finite element spaces for the subregions are given by the primal unknowns of the two-level algorithm so that the subregion partition is equipped with a conforming finite element space for which the unknowns match across the subregion interface. On this new level, the mortar discretization is no longer relevant. We can then develop the theory and algorithm for the subregion partition as in the standard BDDC algorithm for conforming finite element discretizations.
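However, we need to construct appropriate finite element spaces for the subregions equipped with the primal unknowns to provide the condition number bound.
We obtain the subregion matrix $S_\Pi^{(j)}$ by assembling the coarse problem matrices of the subdomains $\Omega_i^j \subset \Omega^j$:
$$S_\Pi^{(j)} = \sum_{i=1}^{N_j} R_\Pi^{(i)T} \left\{ K_{\Pi\Pi}^{(i)} - \begin{pmatrix} K_{\Pi I}^{(i)} & K_{\Pi\Delta}^{(i)} \end{pmatrix} \begin{pmatrix} K_{II}^{(i)} & K_{I\Delta}^{(i)} \\ K_{\Delta I}^{(i)} & K_{\Delta\Delta}^{(i)} \end{pmatrix}^{-1} \begin{pmatrix} K_{\Pi I}^{(i)T} \\ K_{\Pi\Delta}^{(i)T} \end{pmatrix} \right\} R_\Pi^{(i)},$$
where $R_\Pi^{(i)} : \widehat{W}_\Pi|_{\Omega^j} \to \widehat{W}_\Pi^{(i)}$ is the restriction of primal unknowns in the subregion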
$\Omega^j$ to the subdomain $\Omega_i^j$. We note that the global coarse problem matrix $S_\Pi$ can be assembled from the $S_\Pi^{(j)}$ of each subregion.
We will build a BDDC preconditioner for the problem $S_\Pi$ following the same construction as in the two-level algorithm for standard conforming finite element discretizations. In the following, we introduce the same finite element spaces as in the previous section except that they are based on the subregion partition and the subregion unknowns. We will use the subscript $c$ to denote those unknowns, function spaces, and matrices related to the subregion level. For example, $W_c^{(j)}$ denotes the discrete space for the subregion $\Omega^j$. It consists of the primal unknowns of the two-level algorithm contained in the subregion $\Omega^j$. Let $\Gamma_c$ be the interface between the subregions and $\Gamma_c \subset \Gamma$. We then decompose the subregion unknowns into subregion interior and interface unknowns and further decompose the interface unknowns into primal and dual unknowns
$$W_c^{(j)} = W_{I_c}^{(j)} \times W_{\Gamma_c}^{(j)} \quad \text{and} \quad W_{\Gamma_c}^{(j)} = W_{\Pi_c}^{(j)} \times W_{\Delta_c}^{(j)}.$$
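Here the average constraints on subregion edges have been selected as the primal constraints and we have changed the variables to make the primal constraints explicit. Similarly, we define the product space $W_{\Gamma_c}$, its subspaces $\widehat{W}_{\Gamma_c}$ and $\widetilde{W}_{\Gamma_c}$, and the extensions
$$\widehat{W}_{\Gamma_c} \xrightarrow{\ \widetilde{R}_{\Gamma_c}\ } \widetilde{W}_{\Gamma_c} \xrightarrow{\ R_{\Gamma_c}\ } W_{\Gamma_c}. \tag{3.1}$$
We note that $\widehat{W}_{\Gamma_c}$ is the space of vectors of unknowns that have the same values across the subregion interface and $\widetilde{W}_{\Gamma_c}$ is the space of vectors of unknowns that have the same values at the subregional primal unknowns and can have different values at the other interface unknowns.
We define our three-level preconditioner $\widetilde{M}^{-1}$ by
$$\widetilde{M}^{-1} = \widetilde{R}_{D,\Gamma}^T \left\{ R_{\Gamma\Delta}^T \left( \sum_{i=1}^{N} \begin{pmatrix} 0 & R_\Delta^{(i)T} \end{pmatrix} \begin{pmatrix} K_{II}^{(i)} & K_{I\Delta}^{(i)} \\ K_{\Delta I}^{(i)} & K_{\Delta\Delta}^{(i)} \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ R_\Delta^{(i)} \end{pmatrix} \right) R_{\Gamma\Delta} + \Phi M_\Pi^{-1} \Phi^T \right\} \widetilde{R}_{D,\Gamma}, \tag{3.2}$$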
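where $M_\Pi^{-1}$ is an approximation of $S_\Pi^{-1}$; see (2.14). In other words, for a given $\Psi \in \widehat{W}_c$, we compute $z = M_\Pi^{-1} \Psi$ instead of $y = S_\Pi^{-1} \Psi$.
We now introduce the approximation $M_\Pi^{-1}$ in detail. We first order the unknowns $y \in \widehat{W}_c$ into subregion interior and interface unknowns
$$y = \bigl(y_{I_c}^{(1)}, \ldots, y_{I_c}^{(N_c)}, y_{\Gamma_c}\bigr)^T.$$
We then write the problem $S_\Pi y = \Psi$ as
$$\begin{pmatrix} S_{\Pi I_c I_c}^{(1)} & 0 & 0 & S_{\Pi\Gamma_c I_c}^{(1)T} R_{\Gamma_c}^{(1)} \\ 0 & \ddots & 0 & \vdots \\ 0 & 0 & S_{\Pi I_c I_c}^{(N_c)} & S_{\Pi\Gamma_c I_c}^{(N_c)T} R_{\Gamma_c}^{(N_c)} \\ R_{\Gamma_c}^{(1)T} S_{\Pi\Gamma_c I_c}^{(1)} & \cdots & R_{\Gamma_c}^{(N_c)T} S_{\Pi\Gamma_c I_c}^{(N_c)} & S_{\Pi\Gamma_c\Gamma_c} \end{pmatrix} \begin{pmatrix} y_{I_c}^{(1)} \\ \vdots \\ y_{I_c}^{(N_c)} \\ y_{\Gamma_c} \end{pmatrix} = \begin{pmatrix} \Psi_{I_c}^{(1)} \\ \vdots \\ \Psi_{I_c}^{(N_c)} \\ \Psi_{\Gamma_c} \end{pmatrix}, \tag{3.3}$$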
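where $R_{\Gamma_c}^{(j)}$ is the restriction and $S_{\Pi\Gamma_c\Gamma_c}$ is the fully assembled matrix at the subregion interface
$$R_{\Gamma_c}^{(j)} : \widehat{W}_{\Gamma_c} \to W_{\Gamma_c}^{(j)} \quad \text{and} \quad S_{\Pi\Gamma_c\Gamma_c} = \sum_{j=1}^{N_c} R_{\Gamma_c}^{(j)T} S_{\Pi\Gamma_c\Gamma_c}^{(j)} R_{\Gamma_c}^{(j)}.$$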
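Here we solve for $y_{I_c}^{(j)}$
$$y_{I_c}^{(j)} = S_{\Pi I_c I_c}^{(j)-1} \Bigl( \Psi_{I_c}^{(j)} - S_{\Pi\Gamma_c I_c}^{(j)T} R_{\Gamma_c}^{(j)} y_{\Gamma_c} \Bigr) \tag{3.4}$$
and obtain the interface problem
$$\left( \sum_{j=1}^{N_c} R_{\Gamma_c}^{(j)T} \Bigl( S_{\Pi\Gamma_c\Gamma_c}^{(j)} - S_{\Pi\Gamma_c I_c}^{(j)} S_{\Pi I_c I_c}^{(j)-1} S_{\Pi\Gamma_c I_c}^{(j)T} \Bigr) R_{\Gamma_c}^{(j)} \right) y_{\Gamma_c} = h_{\Gamma_c}, \tag{3.5}$$
where
$$h_{\Gamma_c} = \Psi_{\Gamma_c} - \sum_{j=1}^{N_c} R_{\Gamma_c}^{(j)T} S_{\Pi\Gamma_c I_c}^{(j)} S_{\Pi I_c I_c}^{(j)-1} \Psi_{I_c}^{(j)}. \tag{3.6}$$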
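The passage from (3.3) to (3.5) and (3.6) is a block elimination of the subregion interior unknowns. As a sketch of the intermediate step in the same notation: the first line below is the last block row of (3.3); the second line follows by inserting (3.4) and writing $S_{\Pi\Gamma_c\Gamma_c}$ as its sum over subregions. Moving the terms that do not involve $y_{\Gamma_c}$ to the right-hand side gives (3.5) with $h_{\Gamma_c}$ as in (3.6).

```latex
\begin{aligned}
\Psi_{\Gamma_c}
 &= \sum_{j=1}^{N_c} R_{\Gamma_c}^{(j)T} S_{\Pi\Gamma_c I_c}^{(j)}\, y_{I_c}^{(j)}
    + S_{\Pi\Gamma_c\Gamma_c}\, y_{\Gamma_c} \\
 &= \sum_{j=1}^{N_c} R_{\Gamma_c}^{(j)T} S_{\Pi\Gamma_c I_c}^{(j)} S_{\Pi I_c I_c}^{(j)-1}
      \Bigl( \Psi_{I_c}^{(j)} - S_{\Pi\Gamma_c I_c}^{(j)T} R_{\Gamma_c}^{(j)} y_{\Gamma_c} \Bigr)
    + \sum_{j=1}^{N_c} R_{\Gamma_c}^{(j)T} S_{\Pi\Gamma_c\Gamma_c}^{(j)} R_{\Gamma_c}^{(j)}\, y_{\Gamma_c}.
\end{aligned}
```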
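We denote by $T^{(j)}$ the Schur complement of $S_\Pi^{(j)}$,
$$T^{(j)} = S_{\Pi\Gamma_c\Gamma_c}^{(j)} - S_{\Pi\Gamma_c I_c}^{(j)} \bigl(S_{\Pi I_c I_c}^{(j)}\bigr)^{-1} S_{\Pi\Gamma_c I_c}^{(j)T},$$
and define the block diagonal matrix $T = \mathrm{diag}_j\bigl(T^{(j)}\bigr)$. We then introduce the partially assembled matrix and the fully assembled matrix
$$\widetilde{T} = R_{\Gamma_c}^T T R_{\Gamma_c} \quad \text{and} \quad \widehat{T} = \widetilde{R}_{\Gamma_c}^T \widetilde{T} \widetilde{R}_{\Gamma_c}, \tag{3.7}$$
respectively, using the extensions $R_{\Gamma_c}$ and $\widetilde{R}_{\Gamma_c}$ defined in (3.1). The reduced subregional interface problem (3.5) is then written as follows: find $y_{\Gamma_c} \in \widehat{W}_{\Gamma_c}$ such that
$$\widetilde{R}_{\Gamma_c}^T \widetilde{T} \widetilde{R}_{\Gamma_c} y_{\Gamma_c} = h_{\Gamma_c}. \tag{3.8}$$
When using the three-level preconditioner $\widetilde{M}^{-1}$, we do not solve (3.8) exactly. Instead, we replace $y_{\Gamma_c}$ by $z_{\Gamma_c}$, where
$$z_{\Gamma_c} = \widetilde{R}_{D,\Gamma_c}^T \widetilde{T}^{-1} \widetilde{R}_{D,\Gamma_c} h_{\Gamma_c}. \tag{3.9}$$
Here $\widetilde{R}_{D,\Gamma_c}$ is the scaled extension such that $\widetilde{R}_{D,\Gamma_c} = D\widetilde{R}_{\Gamma_c}$. The three-level coarse problem appearing in the computation of $\widetilde{T}^{-1}$ is solved quite cheaply compared to that of the two-level algorithm, since its size is much smaller than that of the two-level algorithm. The weight factor $D$ has the value 1 as its diagonal components corresponding to the global primal unknowns in $\widehat{W}_{\Pi_c}$ and the following values for the other diagonal components:
$$\delta_{c,j}^{\dagger}(x) = \frac{\rho_j^{\gamma}(x)}{\sum_{i \in N_x} \rho_i^{\gamma}(x)}, \qquad x \in n\bigl(W_{\Delta_c}^{(j)}\bigr), \tag{3.10}$$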
where $\gamma \in [1/2, \infty)$ and $n\bigl(W_{\Delta_c}^{(j)}\bigr)$ denotes the set of nodes in the finite element space $W_{\Delta_c}^{(j)}$. In addition, $N_x$ is the set of the subregion indices $i$ such that $x \in n\bigl(W_{\Delta_c}^{(i)}\bigr)$ and $\rho_i(x)$ is the coefficient of (2.1) at $x$ in the subregion $\Omega^i$. In our theory, $\rho_i(x)$ is a positive constant in the subregion $\Omega^i$.
Assumption 3.2. $\rho_i(x)$ is a positive constant in each subregion $\Omega^i$.
We then compute $z_{I_c}^{(j)}$ from $z_{\Gamma_c}$ as in (3.4):
$$z_{I_c}^{(j)} = S_{\Pi I_c I_c}^{(j)-1} \Bigl( \Psi_{I_c}^{(j)} - S_{\Pi I_c \Gamma_c}^{(j)} R_{\Gamma_c}^{(j)} z_{\Gamma_c} \Bigr). \tag{3.11}$$
As a result, we obtain $z = M_\Pi^{-1} \Psi$, the solution of the inexact coarse problem for a given $\Psi$.
Let $\langle u, v \rangle$ denote the $l_2$-inner product for vectors $u$ and $v$. We summarize our three-level algorithm equipped with the preconditioner $\widetilde{M}^{-1}$ in (3.2) as follows:
Let $A = \widetilde{R}_\Gamma^T \widetilde{S}_\Gamma \widetilde{R}_\Gamma$, $b = \widetilde{R}_\Gamma^T g_G$, and $TOL$ be given.
Step 1. Start with initial $x_0$, compute residual $r_0 = b - A x_0$, and set $k = 0$.
Step 2. while ($\|r_k\| / \|r_0\| > TOL$)
    Step 2.1. $z_k = \widetilde{M}^{-1} r_k$
    Step 2.2. $k = k + 1$
    Step 2.3. if ($k \ge 2$)
                  $\beta_k = \langle z_{k-1}, r_{k-1} \rangle / \langle z_{k-2}, r_{k-2} \rangle$
                  $d_k = z_{k-1} + \beta_k d_{k-1}$
              else
                  $\beta_1 = 0$, $d_1 = z_0$
              end if
    Step 2.4. $\alpha_k = \langle z_{k-1}, r_{k-1} \rangle / \langle A d_k, d_k \rangle$
    Step 2.5. Compute $x_k = x_{k-1} + \alpha_k d_k$
    Step 2.6. Compute $r_k = b - A x_k$
end while
Step 3. $x = x_k$ is the required solution.
In the two-level algorithm, $\widetilde{M}^{-1}$ in Step 2.1 is replaced by the two-level preconditioner $M^{-1}$; see (2.12). From (2.12) and (2.14), we know that we need to solve subdomain local problems and one coarse problem exactly when we apply $M^{-1}$ to a vector in Step 2.1. When we use our three-level preconditioner $\widetilde{M}^{-1}$ in Step 2.1, we solve the subdomain local problems exactly as in the two-level algorithm; see (3.2). We do not solve the coarse problem exactly. Instead, we apply the standard two-level BDDC preconditioner to solve the coarse problem. In other words, we use (3.9) and (3.11), which will need to solve a subregion coarse problem and subregion local problems exactly. We note that the size of the subregion coarse problem is much smaller than that of the two-level coarse problem.
4. Some auxiliary results.
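Before the auxiliary results are collected, the iteration summarized in Steps 1 to 3 of section 3 can be made concrete with a minimal Python sketch. It is not the authors' implementation: the matrix is a hypothetical 4-by-4 test problem, and a Jacobi (diagonal) scaling stands in for the action of the preconditioner in Step 2.1, since assembling the three-level BDDC preconditioner is beyond a short example. The names `pcg`, `dot`, `matvec`, and `jacobi` are illustrative.

```python
def dot(u, v):
    """l2-inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def pcg(A, b, apply_prec, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients organized like Steps 1-3.

    apply_prec(r) stands in for the preconditioner action z = M^{-1} r.
    """
    x = [0.0] * len(b)                                   # Step 1: x_0 = 0
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]     # r_0 = b - A x_0
    r0_norm = dot(r, r) ** 0.5
    k, d, zr_prev = 0, None, None
    while r0_norm > 0.0 and dot(r, r) ** 0.5 / r0_norm > tol and k < max_iter:
        z = apply_prec(r)                                # Step 2.1
        k += 1                                           # Step 2.2
        zr = dot(z, r)                                   # <z_{k-1}, r_{k-1}>
        if k >= 2:                                       # Step 2.3
            beta = zr / zr_prev
            d = [zi + beta * di for zi, di in zip(z, d)]
        else:
            d = list(z)
        Ad = matvec(A, d)
        alpha = zr / dot(Ad, d)                          # Step 2.4
        x = [xi + alpha * di for xi, di in zip(x, d)]    # Step 2.5
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))] # Step 2.6
        zr_prev = zr
    return x, k


# Hypothetical SPD test problem whose exact solution is the vector of ones.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
b = [1.0, 0.0, 0.0, 1.0]


def jacobi(r):
    """Stand-in preconditioner: divide by the diagonal of A."""
    return [ri / A[i][i] for i, ri in enumerate(r)]


x, iters = pcg(A, b, jacobi)
```

Only `apply_prec` would change between the two-level and three-level variants; the surrounding iteration is identical, which is why the cost comparison in section 3 focuses on Step 2.1.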
In this section, we will collect a number of results which are needed in our theory. In the following, the notation $f = O(g)$ means that there exist positive constants $c$ and $C$, independent of $H$ and $h$, such that $cg \le f \le Cg$.
Let $E$ be an edge of a subdomain $\Omega_i$. We introduce a Sobolev space $H_{00}^{1/2}(E)$ as
$$H_{00}^{1/2}(E) = \bigl\{ v \in L^2(E) : \tilde{v} \in H^{1/2}(\partial\Omega_i) \bigr\}.$$
Here $\tilde{v}$ is the zero extension of $v$ to the subdomain boundary. The norm is given by
$$\|v\|_{H_{00}^{1/2}(E)}^2 = |v|_{H^{1/2}(E)}^2 + \int_E \frac{v(x)^2}{\mathrm{dist}(x, \partial E)} \, ds(x),$$
where
$$|v|_{H^{1/2}(E)}^2 = \int_E \int_E \frac{|v(x) - v(y)|^2}{|x - y|^2} \, ds(x) \, ds(y).$$
Lemma 4.1. Given a function $g(x) = x(H - x)$ defined on $[0, H]$, we consider a nodal interpolant $g^h(x) = I^h(x(H - x))$ to the finite element space equipped with a quasi-uniform triangulation given on $[0, H]$. Then we have
$$\frac{1}{H} \int_0^H g^h(x) \, dx = O(H^2), \qquad \|g^h(x)\|_{H_{00}^{1/2}([0,H])} = O(H^2)$$
for sufficiently small $h$.
Proof. We can obtain these results by a direct calculation for $g$,
$$\frac{1}{H} \int_0^H g(x) \, dx = O(H^2), \qquad \|g(x)\|_{H_{00}^{1/2}([0,H])} = O(H^2),$$
and interpolation results for $g^h$.
In the BDDC algorithm, we use the Lagrange multiplier function $\psi_{ij}$ across the subdomain interface $F_{ij} = \partial\Omega_i \cap \partial\Omega_j$ to enforce the primal constraint; see (2.7). We note that $\psi_{ij}$ is the sum of Lagrange multiplier basis functions supported in $\overline{F}_{ij}$. We introduce a subinterval $E_{ij}^{(i)}$ of $F_{ij}$ such that
$$E_{ij}^{(i)} = \bigcup_l \bigl\{ \mathrm{supp}\bigl(\phi_l^{(i)}\bigr) : \mathrm{supp}\bigl(\phi_l^{(i)}\bigr) \subset \mathrm{supp}(\psi_{ij}) \bigr\}, \tag{4.1}$$
where $\phi_l^{(i)}$ are the nodal basis functions in the finite element space $W_\Gamma^{(i)}$. Similarly, we introduce $E_{ij}^{(j)}$ using the nodal basis functions in $W_\Gamma^{(j)}$. We select such intervals on the boundary of $\Omega_i$ and denote them by $\{E_k\}_k$ and call them reduced edges of $\Omega_i$. We define our edge average as
$$\overline{v}_{E_k} = \frac{\int_{F_{ij}} v \psi_{ij} \, ds}{\int_{F_{ij}} \psi_{ij} \, ds},$$
where $F_{ij}$ is the interface containing $E_k$ and $\psi_{ij}$ is the Lagrange multiplier function used for the primal constraint on $F_{ij}$. We use the notation $\overline{v}_{E_k}$ for the average value rather than $\overline{v}_{F_{ij}}$ for a simple presentation of the proof in Lemma 4.2.
For a reduced edge $E_k = E_{ij}^{(i)} \subset F_{ij} \subset \partial\Omega_i$, defined in (4.1), we may consider $E_k$ as a straight line with its length $H_k$ $(\le H_i)$. Using Lemma 4.1, we construct such a function $g^h$ in the interval $[0, H_k]$ and obtain a function $g_k(s)$ defined on $E_k$ using an appropriate translation and rotation. We extend $g_k(s)$ by zero to $F_{ij}$. For the function $g_k$, we can prove
$$(\overline{g_k})_{E_k} = \frac{\int_{F_{ij}} g_k \psi_{ij} \, ds}{\int_{F_{ij}} \psi_{ij} \, ds} = O(H_k^2), \qquad \|g_k\|_{H_{00}^{1/2}(E_k)} = O(H_k^2); \tag{4.2}$$
Fig. 2. An example of the function $\psi_{ij}$ with the standard Lagrange multiplier basis in a geometrically nonconforming partition: $\Omega_i$ is the nonmortar part of $F_{ij}$, the big white nodes designate the degrees of freedom of Lagrange multiplier basis $\{\psi_{ij,k}\}_k$ supported in $\overline{F}_{ij}$, and $\psi_{ij} = \sum_k \psi_{ij,k}$.
see Lemma 4.1. Here $H_k$ is the length of $E_k$. In the geometrically nonconforming partition, when $F_{ij}$ is a part of the subdomain edge, $\psi_{ij}$ may not be the constant function with the value one on $F_{ij}$; see Figure 2. However, we can see that $(\overline{g_k})_{E_k}$ with such $\psi_{ij}$ is similar to the regular average of $g_k$ that is used in the conforming finite element case
$$(\overline{g_k})_{F_{ij}} = \frac{\int_{F_{ij}} g_k \, ds}{\int_{F_{ij}} 1 \, ds}.$$
We note that (4.2) also holds for the case when the length $H_k$ is comparable to the mesh size $h_i$. This can be shown by a direct calculation.
Lemma 4.2. Let $\{\Omega_j^i\}_j$ be the subdomains in a subregion $\Omega^i$, and let $\{E_k\}_k$ be the reduced edges of $\Omega_j^i$. For given values $\{m_k\}_k$, let $u$ be the minimal energy extension to the subdomain finite element space $V_{i,j}^h$ with its average values $\overline{u}_{E_k} = m_k$ on each $E_k$. We then have
$$C_1 |u|_{H^1(\Omega_j^i)}^2 \le \sum_{k,l} |\overline{u}_{E_k} - \overline{u}_{E_l}|^2 \le C_2 |u|_{H^1(\Omega_j^i)}^2.$$
Proof. We consider a function $v$ in $V_{i,j}^h$ defined as
$$v(x) = \sum_k \frac{1}{(\overline{g_k})_{E_k}} (\overline{u}_{E_k} - \overline{u}_{E_1}) \phi_k(x) + \overline{u}_{E_1},$$
where $\phi_k$ is the discrete harmonic extension of $g_k$ to $V_{i,j}^h$. Here $g_k(x)$ is the function which satisfies (4.2) on $E_k$ and is zero on $\partial\Omega_j^i \setminus E_k$. We can see easily that
$$\overline{v}_{E_k} = \overline{u}_{E_k}.$$
Since $u$ is the minimal energy extension with the average values $\overline{u}_{E_k} = m_k$, we have
$$|u|_{H^1(\Omega_j^i)}^2 \le |v|_{H^1(\Omega_j^i)}^2.$$
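We consider
$$\begin{aligned} |u|_{H^1(\Omega_j^i)}^2 \le |v|_{H^1(\Omega_j^i)}^2 &= \Bigl| \sum_k \frac{1}{(\overline{g_k})_{E_k}} (\overline{u}_{E_k} - \overline{u}_{E_1}) \phi_k(x) + \overline{u}_{E_1} \Bigr|_{H^1(\Omega_j^i)}^2 \\ &= \Bigl| \sum_k \frac{1}{(\overline{g_k})_{E_k}} (\overline{u}_{E_k} - \overline{u}_{E_1}) \phi_k(x) \Bigr|_{H^1(\Omega_j^i)}^2 \\ &\le C \sum_k \frac{1}{(\overline{g_k})_{E_k}^2} (\overline{u}_{E_k} - \overline{u}_{E_1})^2 |\phi_k|_{H^1(\Omega_j^i)}^2 \\ &\le C \sum_k \frac{1}{(\overline{g_k})_{E_k}^2} (\overline{u}_{E_k} - \overline{u}_{E_1})^2 \|g_k\|_{H_{00}^{1/2}(E_k)}^2, \end{aligned}$$
where we use [26, Lemma 4.10] or [29, Lemma 2.4] for the last inequality. Applying (4.2) to the above equation, we obtain
$$|u|_{H^1(\Omega_j^i)}^2 \le C \sum_k (\overline{u}_{E_k} - \overline{u}_{E_1})^2. \tag{4.3}$$
We now prove the other bound as follows:
$$\begin{aligned} \sum_k (\overline{u}_{E_k} - \overline{u}_{E_1})^2 &= \sum_k \bigl( (\overline{u - \overline{u}_{E_1}})_{E_k} \bigr)^2 \\ &\le C \sum_{F_{ij} \supset E_k,\, k} \frac{1}{\bigl( \int_{F_{ij}} \psi_{ij} \bigr)^2} \|u - \overline{u}_{E_1}\|_{L^2(F_{ij})}^2 \|\psi_{ij}\|_{L^2(F_{ij})}^2 \\ &\le C |u|_{H^1(\Omega_j^i)}^2. \end{aligned} \tag{4.4}$$
Here we have used the facts that
$$\|\psi_{ij}\|_{L^2(F_{ij})} \le C H^{1/2}, \qquad \int_{F_{ij}} \psi_{ij} = O(H),$$
the Poincaré inequality
$$\frac{1}{H} \|u - \overline{u}_{E_1}\|_{L^2(F_{ij})}^2 \le C |u|_{H^{1/2}(F_{ij})}^2,$$
and the trace inequality for the discrete harmonic function $u$
$$|u|_{H^{1/2}(F_{ij})}^2 \le C |u|_{H^1(\Omega_j^i)}^2.$$
Here $H$ stands for the diameter of $F_{ij}$.
Since each subregion is a union of subdomains, we might have a subregion with irregular boundaries as in Figure 3. We introduce a new mesh on each subregion $\Omega^i$. The purpose of introducing this new mesh is to relate the quadratic form in Lemma 4.2 to one for a conventional finite element space. Here we follow [8, 24]. We construct a triangulation of $\Omega^i$ with its node set containing the primal nodes and the subdomain vertices. The vertices of the subdomain $\Omega_j^i$ are the end points of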
Fig. 3. Left: subregion $\Omega^i$ $(= \bigcup_{j=1}^{16} \Omega_j^i)$ with irregular boundary; $v$ are subregion vertices, and the nodes at black dots are unknowns at the subregion boundary. Right: a triangulation for the subregion $\Omega^i$; $p_k$ are primal nodes, $c$ is the center of the primal nodes $\{p_k\}_{k=1}^{6}$, $v_k$ are the subregion vertices, and the nodes at white circles are the subdomain vertices.
$F_{jk} = (\Omega_j^i \cap \Omega_k^i)$, where $\Omega_k^i$ are neighbors of $\Omega_j^i$. We note that we have one primal unknown for each interface $F_{jk}$. We locate the node corresponding to the primal unknown at the midpoint of the two end points of $F_{jk}$. We call these nodes primal nodes. After introducing the primal nodes in the subdomain $\Omega_j^i$, we consider the center point of all of these primal nodes; i.e., each component of the center points is the average of each component of the primal nodes. We then connect all primal nodes and vertices to the center point and obtain a triangulation of $\Omega_j^i$ as in Figure 3. Finally, the union of such triangulations of $\Omega_j^i$ gives a triangulation of the subregion $\Omega^i$. The corresponding finite element space is denoted by $U_H(\Omega^i)$.
We note that the subregion $\Omega^i$ is equipped with the triangulation whose nodes consist of the primal nodes, vertices, and the center points of its subdomains $\Omega_j^i$; see Figure 3. We call the nodes other than the primal nodes the secondary nodes. Among the secondary nodes, we call those at the interior of the subregion $\Omega^i$ the interior secondary nodes and those at the boundary of the subregion $\Omega^i$ the boundary secondary nodes. In addition, we call two nodes in a triangulation adjacent if they are connected through an edge of the triangulation.
For a function $\phi^I(x) \in U_H(\Omega^i)$, we define an interpolant $I_H^{\Omega^i} \phi^I(x)$ to $U_H(\Omega^i)$ by
$$I_H^{\Omega^i} \phi^I(x) = \begin{cases} \phi^I(x) & \text{if } x \text{ is a primal node } \bigl[ I_H^{\Omega^i} \phi^I(p_k) = \phi^I(p_k) \bigr]; \\[4pt] \text{the average of the values at all adjacent primal nodes on edges of } \Omega^i & \text{if } x \text{ is a boundary secondary node } \bigl[ I_H^{\Omega^i} \phi^I(v_1) = \tfrac{1}{2} (\phi^I(p_7) + \phi^I(p_8)) \bigr]; \\[4pt] \text{the average of the values at all adjacent primal nodes} & \text{if } x \text{ is an interior secondary node } \bigl[ I_H^{\Omega^i} \phi^I(c) = \tfrac{1}{6} \sum_{k=1}^{6} \phi^I(p_k) \bigr]. \end{cases}$$
Here we presented the specific values of $I_H^{\Omega^i} \phi^I(x)$ for the case in Figure 3.
We recall that $W_c^{(i)}$ is the discrete space of values at the primal nodes in the subregion $\Omega^i$ and $W_{\Gamma_c}^{(i)}$ is its trace space on the subregion boundary. All of these nodes correspond to the primal unknowns of the subdomain partition. Given any $\phi \in W_c^{(i)}$, we can find a function $\phi^I \in U_H(\Omega^i)$ with the values at the primal nodes equal to the components of $\phi$ that correspond to the primal unknowns associated with
those nodes. For such $\phi \in W_c^{(i)}$, we define a similar interpolant to $U_H(\Omega^i)$ by
$$I_H^{\Omega^i} \phi := I_H^{\Omega^i} \phi^I(x).$$
We note that the function $\phi^I$ is not unique, but $I_H^{\Omega^i} \phi(x)$ will be determined uniquely since the interpolation $I_H^{\Omega^i}$ depends only on the values at the primal nodes.
We now define a mapping $I_H^{\partial\Omega^i} \phi$ from $W_{\Gamma_c}^{(i)}$ to the space $U_H(\partial\Omega^i)$, the trace space of $U_H(\Omega^i)$, by
$$I_H^{\partial\Omega^i} \phi = \bigl( I_H^{\Omega^i} \phi_e \bigr)\big|_{\partial\Omega^i}.$$
Here $\phi_e$ is any function in $W_c^{(i)}$ such that $\phi_e|_{\partial\Omega^i} = \phi$. The map is well defined, since the values of $I_H^{\Omega^i} \phi_e$ on the subregion boundary depend only on the values of $\phi_e$ at the primal nodes on the subregion boundary.
We introduce the range spaces $I_H^{\Omega^i}\bigl(W_c^{(i)}\bigr)$ and $I_H^{\partial\Omega^i}\bigl(W_{\Gamma_c}^{(i)}\bigr)$ and denote them, respectively, by
$$S_H(\Omega^i) := I_H^{\Omega^i}\bigl(W_c^{(i)}\bigr) \quad \text{and} \quad S_H(\partial\Omega^i) := I_H^{\partial\Omega^i}\bigl(W_{\Gamma_c}^{(i)}\bigr).$$
We note that $S_H(\Omega^i)$ and $S_H(\partial\Omega^i)$ are the subspaces of $U_H(\Omega^i)$ and $U_H(\partial\Omega^i)$, respectively. In order to prove Lemma 4.5, which plays an important role in our condition number estimate, we need to establish the equivalence between the $H^1$-norm of the discrete harmonic extensions in the spaces $S_H(\Omega^i)$ and $U_H(\Omega^i)$ for any $\phi \in S_H(\partial\Omega^i)$.
Lemma 4.3. There exists a constant $C > 0$, independent of $H$ and $|\Omega^i|$, the volume of $\Omega^i$, but dependent on the shape regularity of the triangulation of $\Omega^i$, such that
$$\bigl| I_H^{\Omega^i} \phi \bigr|_{H^1(\Omega^i)} \le C |\phi|_{H^1(\Omega^i)} \quad \text{and} \quad \bigl\| I_H^{\Omega^i} \phi \bigr\|_{L^2(\Omega^i)} \le C \|\phi\|_{L^2(\Omega^i)} \quad \forall \phi \in U_H(\Omega^i).$$
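Proof. See [8, Lemma 6.1].
Lemma 4.4. For $\phi \in S_H(\partial\Omega^i)$,
$$\inf_{v \in S_H(\Omega^i),\, v|_{\partial\Omega^i} = \phi} \|v\|_{H^1(\Omega^i)} \approx \inf_{v \in U_H(\Omega^i),\, v|_{\partial\Omega^i} = \phi} \|v\|_{H^1(\Omega^i)}$$
and
$$\inf_{v \in S_H(\Omega^i),\, v|_{\partial\Omega^i} = \phi} |v|_{H^1(\Omega^i)} \approx \inf_{v \in U_H(\Omega^i),\, v|_{\partial\Omega^i} = \phi} |v|_{H^1(\Omega^i)}.$$
Here $S_H(\Omega^i)$ is a subspace of $U_H(\Omega^i)$.
Proof. For the first equivalence, since $S_H(\Omega^i)$ is a subspace of $U_H(\Omega^i)$, we need only to prove that
$$\inf_{v \in S_H(\Omega^i),\, v|_{\partial\Omega^i} = \phi} \|v\|_{H^1(\Omega^i)} \le C \inf_{v \in U_H(\Omega^i),\, v|_{\partial\Omega^i} = \phi} \|v\|_{H^1(\Omega^i)}.$$
Given any function $v \in U_H(\Omega^i)$ with $v = \phi$ on $\partial\Omega^i$, let $w = I_H^{\Omega^i} v \in S_H(\Omega^i)$. Since $\phi \in S_H(\partial\Omega^i)$ and by the definitions of $I_H^{\Omega^i}$ and $I_H^{\partial\Omega^i}$, we have $w = \phi$ on $\partial\Omega^i$.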
Moreover, by Lemma 4.3, we have $\|w\|_{H^1(\Omega^i)} = \|I_H^{\Omega^i} v\|_{H^1(\Omega^i)} \le C \|v\|_{H^1(\Omega^i)}$ for any $v \in U_H(\Omega^i)$ with $v = \phi$ on $\partial\Omega^i$, and we proved the first equivalence. The second equivalence can be obtained similarly.
We note that the hidden constants in the equivalences in Lemma 4.4 depend on the shape regularity of the partition of the subregion $\Omega^i$ by the subdomains $\Omega_j^i$. The constants in the following Lemmas 4.5, 4.7, 4.8, and 5.1 and Theorem 5.2 will have the same dependence. For a discussion of the shape regularity of a partition, see [5].
Lemma 4.5. There exist constants $C_1$ and $C_2 > 0$, independent of $\hat{H}$, $H$, $h$, and $\rho_i$, such that for all $w_i \in W_{\Gamma_c}^{(i)}$,
$$C_1 \rho_i \inf_{v \in U_H(\Omega^i),\, v|_{\partial\Omega^i} = I_H^{\partial\Omega^i} w_i} |v|_{H^1(\Omega^i)}^2 \le \bigl\langle T^{(i)} w_i, w_i \bigr\rangle \le C_2 \rho_i \inf_{v \in U_H(\Omega^i),\, v|_{\partial\Omega^i} = I_H^{\partial\Omega^i} w_i} |v|_{H^1(\Omega^i)}^2,$$
where $\langle T^{(i)} w_i, w_i \rangle = w_i^T T^{(i)} w_i = |w_i|_{T^{(i)}}^2$ and $T^{(i)} = S_{\Pi\Gamma_c\Gamma_c}^{(i)} - S_{\Pi\Gamma_c I_c}^{(i)} \bigl(S_{\Pi I_c I_c}^{(i)}\bigr)^{-1} \bigl(S_{\Pi\Gamma_c I_c}^{(i)}\bigr)^T$.
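Proof. By the definition of $T^{(i)}$, we have
$$\begin{aligned} \bigl\langle T^{(i)} w_i, w_i \bigr\rangle &= \inf_{v \in W_c^{(i)},\, v|_{\partial\Omega^i} = w_i} |v|_{S_\Pi^{(i)}}^2 \\ &= \inf_{v \in W_c^{(i)},\, v|_{\partial\Omega^i} = w_i} \rho_i \sum_{j=1}^{N_i} \ \inf_{u \in V_{i,j}^h,\, \overline{u}_{E_l} = v_l,\, E_l \subset \partial\Omega_j^i} |u|_{H^1(\Omega_j^i)}^2 \\ &\approx \inf_{v \in W_c^{(i)},\, v|_{\partial\Omega^i} = w_i} \rho_i \sum_{j=1}^{N_i} \sum_{k_1, k_2} |v_{k_1} - v_{k_2}|^2 \\ &\approx \inf_{v \in W_c^{(i)},\, v|_{\partial\Omega^i} = w_i} \rho_i \bigl| I_H^{\Omega^i} v \bigr|_{H^1(\Omega^i)}^2 \\ &= \inf_{v \in S_H(\Omega^i),\, v|_{\partial\Omega^i} = I_H^{\partial\Omega^i} w_i} \rho_i |v|_{H^1(\Omega^i)}^2 \\ &\approx \inf_{v \in U_H(\Omega^i),\, v|_{\partial\Omega^i} = I_H^{\partial\Omega^i} w_i} \rho_i |v|_{H^1(\Omega^i)}^2. \end{aligned}$$
We use Lemma 4.2 for the third bound, the definitions of $I_H^{\Omega^i}$ and $I_H^{\partial\Omega^i}$ for the fourth and fifth bounds, and Lemma 4.4 for the final one. Here $v_l$ stands for the value of $v \in W_c^{(i)}$ at the primal node corresponding to the reduced edge $E_l$ of the subdomain $\Omega_j^i$.
Next we refer to Lemma 4.2 in [15] for subdomains with irregular boundary. We rewrite this lemma for our subregions with irregular boundary.
Lemma 4.6. Let $F^{ij}$ be an edge common to the boundaries of $\Omega^i$ and $\Omega^j$. For all $w_i \in U_H(\Omega^i)$ and $w_j \in U_H(\Omega^j)$, which have the same edge average over the common edge $F^{ij}$, we have
$$\bigl| \mathcal{H}^i \bigl( \vartheta_{F^{ij}} (w_i - w_j) \bigr) \bigr|_{H^1(\Omega^i)}^2 \le C \bigl( 1 + \log(\hat{H}^i / H_i) \bigr)^2 |w_i|_{H^1(\Omega^i)}^2 + C \bigl( 1 + \log(\hat{H}^j / H_j) \bigr)^2 |w_j|_{H^1(\Omega^j)}^2,$$
where $\vartheta_{F^{ij}}$ is the discrete harmonic extension of $I_H^{\partial\Omega^i}(\zeta_{F^{ij}})$ to $U_H(\Omega^i)$ and $\zeta_{F^{ij}}$ has its value one at the nodes in $F^{ij}$ and zero at the other part. Here $\hat{H}^i$ and $\hat{H}^j$ are subregion diameters, and $H_i$ and $H_j$ are the element sizes of finite element spaces $U_H(\Omega^i)$ and $U_H(\Omega^j)$, respectively. In addition, $\mathcal{H}^i(v)$ denotes the discrete harmonic extension of $v$ restricted on the boundary of $\Omega^i$ to $U_H(\Omega^i)$.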
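We define the interface average operator $E_{D_c}$ on $\widetilde{W}_{\Gamma_c}$ as $E_{D_c} = \widetilde{R}_{\Gamma_c} \widetilde{R}^T_{D_c,\Gamma_c}$, which computes the averages across the subregion interface $\Gamma_c$ and then distributes the averages to the unknowns at the subregion boundaries. The interface average operator $E_{D_c}$ has the following property.

Lemma 4.7.
\[
|E_{D_c} w_{\Gamma_c}|^2_{\widetilde{T}} \le C \left(1 + \log\frac{\hat{H}}{H}\right)^2 |w_{\Gamma_c}|^2_{\widetilde{T}}
\]
for any $w_{\Gamma_c} \in \widetilde{W}_{\Gamma_c}$, where $C$ is a positive constant independent of $\hat{H}$, $H$, $h$, and the coefficients of (2.1), and $\widetilde{T}$ is defined in (3.7).

Proof. We can follow the proof of [30, Lemma 5]. Given any $w_{\Gamma_c} \in \widetilde{W}_{\Gamma_c}$, we have
\begin{align*}
|E_{D_c} w_{\Gamma_c}|^2_{\widetilde{T}}
&\le 2\left( |w_{\Gamma_c}|^2_{\widetilde{T}} + |w_{\Gamma_c} - E_{D_c} w_{\Gamma_c}|^2_{\widetilde{T}} \right) \\
&\le 2\left( |w_{\Gamma_c}|^2_{\widetilde{T}} + |R_{\Gamma_c}(w_{\Gamma_c} - E_{D_c} w_{\Gamma_c})|^2_{T} \right) \\
&= 2\left( |w_{\Gamma_c}|^2_{\widetilde{T}} + \sum_{i=1}^{N_c} |(w_{\Gamma_c} - E_{D_c} w_{\Gamma_c})_i|^2_{T^{(i)}} \right), \tag{4.5}
\end{align*}
where $(w_{\Gamma_c} - E_{D_c} w_{\Gamma_c})_i$ is the restriction of $w_{\Gamma_c} - E_{D_c} w_{\Gamma_c}$ to the subregion $\Omega^i$. Also let $w_i$ be the restriction of $w_{\Gamma_c}$ to the subregion $\Omega^i$ and set
\[
v_i(x) := (w_{\Gamma_c} - E_{D_c} w_{\Gamma_c})_i(x) = \sum_{j \in N_x} \delta^\dagger_{c,j} (w_i(x) - w_j(x)), \qquad x \in \partial\Omega^i \cap \Gamma_c. \tag{4.6}
\]
Here $N_x$ is the set of indices of the subregions that have $x$ on their boundaries. We recall the definition of $\delta^\dagger_{c,j}$ in (3.10). It satisfies
\[
\rho_i\, \delta^{\dagger 2}_{c,j} \le \min(\rho_i, \rho_j). \tag{4.7}
\]
Let $\zeta_F$ be the unknowns in $W^{(i)}_{\Gamma_c}$ with value one at the nodes in $F$ and zero at the other nodes. We also need a function in the space $U_H(\Omega^i)$, denoted by $\vartheta_F$, which is the discrete harmonic extension of $I_H^{\partial\Omega^i}(\zeta_F)$ to $U_H(\Omega^i)$. We note that the points $x$ in (4.6) are from the subdomain primal unknowns; they belong to exactly two subregions as in Figure 3 so that we have
\[
|v_i|^2_{T^{(i)}} \le C \sum_{F^{ij} \subset \partial\Omega^i} |\zeta_{F^{ij}} v_i|^2_{T^{(i)}}, \tag{4.8}
\]
where $F^{ij}$ is the common interface of the subregions $\Omega^i$ and $\Omega^j$.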
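We then obtain
\begin{align*}
|\zeta_{F^{ij}} v_i|^2_{T^{(i)}}
&\le C \rho_i \inf_{v \in U_H(\Omega^i),\ v|_{\partial\Omega^i} = I_H^{\partial\Omega^i}(\zeta_{F^{ij}} v_i)} |v|^2_{H^1(\Omega^i)} \tag{4.9} \\
&= C \rho_i\, \delta^{\dagger 2}_{c,j} \left| \mathcal{H}^i\!\left( I_H^{\partial\Omega^i}\!\left( \zeta_{F^{ij}} (w_i - w_j) \right) \right) \right|^2_{H^1(\Omega^i)} \\
&= C \rho_i\, \delta^{\dagger 2}_{c,j} \left| \mathcal{H}^i\!\left( I_H^{\partial\Omega^i}\!\left( \zeta_{F^{ij}} \left( I_H^{\partial\Omega^i}(w_i) - I_H^{\partial\Omega^j}(w_j) \right) \right) \right) \right|^2_{H^1(\Omega^i)} \\
&\le C \rho_i\, \delta^{\dagger 2}_{c,j} \left| \mathcal{H}^i\!\left( I_H^{\Omega^i}\!\left( \vartheta_{F^{ij}} \left( \mathcal{H}^i(I_H^{\partial\Omega^i}(w_i)) - \mathcal{H}^j(I_H^{\partial\Omega^j}(w_j)) \right) \right) \right) \right|^2_{H^1(\Omega^i)} \\
&\le C \rho_i\, \delta^{\dagger 2}_{c,j} \left| \mathcal{H}^i\!\left( \vartheta_{F^{ij}} \left( \mathcal{H}^i(I_H^{\partial\Omega^i}(w_i)) - \mathcal{H}^j(I_H^{\partial\Omega^j}(w_j)) \right) \right) \right|^2_{H^1(\Omega^i)}.
\end{align*}
Here $\mathcal{H}^i(v)$ is the discrete harmonic extension of $v$ restricted on the boundary of $\Omega^i$ to $U_H(\Omega^i)$, and Lemmas 4.5 and 4.3 are used for the first and last inequalities, respectively. We can estimate the last term in (4.9) by Lemma 4.6 to obtain
\[
|\zeta_{F^{ij}} v_i|^2_{T^{(i)}} \le C \rho_i\, \delta^{\dagger 2}_{c,j} \left(1 + \log\frac{\hat{H}}{H}\right)^2 \sum_{k=i,j} \left| \mathcal{H}^k(I_H^{\partial\Omega^k}(w_k)) \right|^2_{H^1(\Omega^k)},
\]
where $w_i$ and $w_j$ have the same edge average on $F^{ij}$. Combining the above inequality with (4.7) and Lemma 4.5, we obtain
\[
|\zeta_{F^{ij}} v_i|^2_{T^{(i)}} \le C \left(1 + \log\frac{\hat{H}}{H}\right)^2 \left( |w_i|^2_{T^{(i)}} + |w_j|^2_{T^{(j)}} \right).
\]
From (4.5), (4.6), (4.8), and the above inequality, the desired bound then follows:
\[
|E_{D_c} w_{\Gamma_c}|^2_{\widetilde{T}} \le C \left(1 + \log\frac{\hat{H}}{H}\right)^2 |w_{\Gamma_c}|^2_{\widetilde{T}}.
\]
Using Lemma 4.7, we can prove the following result; see [32, Lemma 4.6] or [31, Lemma 4.7].

Lemma 4.8. Given any $u_\Gamma \in \widehat{W}_\Gamma$, let $\Psi = \Phi^T \widetilde{R}_{D,\Gamma} u_\Gamma$. We have
\[
\Psi^T S_\Pi^{-1} \Psi \le \Psi^T M_\Pi^{-1} \Psi \le C \left(1 + \log\frac{\hat{H}}{H}\right)^2 \Psi^T S_\Pi^{-1} \Psi.
\]

5. Condition number estimate for the new preconditioner. In order to estimate the condition number for the system with the new preconditioner $\widetilde{M}^{-1}$, we compare it to the system with the preconditioner $M^{-1}$ by using Lemma 4.8.

Lemma 5.1. Given any $u_\Gamma \in \widehat{W}_\Gamma$,
\[
u_\Gamma^T M^{-1} u_\Gamma \le u_\Gamma^T \widetilde{M}^{-1} u_\Gamma \le C \left(1 + \log\frac{\hat{H}}{H}\right)^2 u_\Gamma^T M^{-1} u_\Gamma. \tag{5.1}
\]
Proof. See [32, Lemma 5.1] or [31, Lemma 5.1].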
Theorem 5.2. The condition number for the system with the three-level preconditioner $\widetilde{M}^{-1}$ is bounded by $C(1 + \log(\hat{H}/H))^2 (1 + \log(H/h))^2$.

Proof. Combining the condition number bound in (2.16) for the two-level BDDC method and Lemma 5.1, we find that the condition number for the three-level method is bounded by $C(1 + \log(\hat{H}/H))^2 (1 + \log(H/h))^2$.

6. Numerical experiments. In this section, we present numerical results for the suggested algorithm. We consider the elliptic problem in the unit square domain $\Omega = [0, 1]^2$:
\begin{align*}
-\nabla \cdot (\rho(x, y) \nabla u(x, y)) &= f(x, y), & (x, y) &\in \Omega, \\
u(x, y) &= 0, & (x, y) &\in \partial\Omega,
\end{align*}
where $f(x, y)$ is given in $L^2(\Omega)$. In our experiments, we performed the conjugate gradient iterations until the relative residual norm was reduced by a factor of $10^6$.

We test our algorithm with two sets of numerical experiments. In the first set, we take $\rho(x, y) = 1$ everywhere in the domain. In the second set, we take $\rho(x, y)$ to be constant in each subregion but to have large jumps across the subregion boundaries. In each experiment set, we performed the computations for both geometrically conforming and nonconforming subdomain partitions and used the Lagrange multiplier space with dual basis. All of these numerical results are consistent with our theory.

The geometrically conforming partitions are obtained from uniform rectangles of length $1/N$, where $N$ denotes the number of subdomains along each x- and y-directional edge of $\Omega$. For a given $N$, we obtain $N^2$ uniform rectangular subdomains. Each subdomain is equipped with finite elements that can be nonmatching across the subdomain interface. In the three-level algorithm, we group subdomains to obtain a uniform rectangular subregion partition. Each subregion has $\hat{N}$ subdomains along its x- and y-directional edges.

To obtain a geometrically nonconforming subdomain partition, we first partition $\Omega$ into $N$ uniform vertical strips in the x-direction and then divide each strip into $N$ or $N + 1$ rectangles successively. We group subdomains to obtain a subregion partition with $\hat{N} = \hat{H}/H$, the number of subdomains across an edge of a subregion. Figure 4 shows a geometrically conforming subdomain partition, a geometrically nonconforming subdomain partition, and their subregion partitions when $N = 16$ and $\hat{N} = 4$.

In the first set of experiments, we set $\rho(x, y) = 1$. We perform the exact two-level BDDC algorithm and the inexact three-level BDDC algorithm to see the scalability in terms of the number of subdomains and the number of subregions, respectively.
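The stopping rule used for the conjugate gradient iterations can be illustrated with a short sketch. The following Python code is not the authors' implementation; it is a generic preconditioned conjugate gradient loop that stops once the residual norm has dropped by a factor of $10^6$ relative to the initial residual. The operator `laplacian_1d` and the preconditioner `jacobi` are hypothetical stand-ins (assumptions made only for illustration) for the interface problem and the BDDC preconditioner of the paper.

```python
import math


def pcg(apply_A, apply_Minv, b, reduction=1e6, max_iter=1000):
    """Preconditioned CG; stops when ||r_k|| <= ||r_0|| / reduction."""
    x = [0.0] * len(b)
    r = list(b)                                  # r_0 = b - A*0
    r0_norm = math.sqrt(sum(ri * ri for ri in r))
    if r0_norm == 0.0:
        return x, 0
    z = apply_Minv(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for k in range(1, max_iter + 1):
        Ap = apply_A(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        # Relative residual reduction test (factor 10^6 by default).
        if math.sqrt(sum(ri * ri for ri in r)) <= r0_norm / reduction:
            return x, k
        z = apply_Minv(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_1d(v):
    """Toy SPD operator: apply tridiag(-1, 2, -1) to v."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def jacobi(r):
    """Toy preconditioner: inverse of the diagonal of tridiag(-1, 2, -1)."""
    return [ri / 2.0 for ri in r]


b = [1.0] * 20
x, iters = pcg(laplacian_1d, jacobi, b)
print("iterations:", iters)
```

In an actual BDDC setting, `apply_A` would apply the interface operator and `apply_Minv` the two-level or three-level preconditioner; only the stopping test is meant to mirror the text.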
Tables 1 and 2 show the condition numbers and the numbers of iterations for geometrically conforming and nonconforming partitions, respectively. Here $N_d$ and $N_c$ denote the number of subdomains and the number of subregions, respectively. In the inexact case, the subdomain problem size and the subregion problem size are fixed, and in the exact case the subdomain problem size is fixed. Both cases show good scalability. In Tables 1 and 2, each row corresponds to the same subdomain partition, i.e., the same coarse problem $S_\Pi$ in (2.15). The inexact case solves the coarse problem $S_\Pi$ approximately by applying a BDDC preconditioner to it. We observe that using the inexact coarse problem causes only slight increases in the condition numbers and the numbers of iterations compared to the exact coarse problem, while the coarse problem is solved quite cheaply in the inexact case.
Fig. 4. Examples of subdomain and subregion partitions: smaller rectangles are subdomains and each subregion (with thick boundary) is a group of subdomains. Left: a geometrically conforming subdomain partition of $16^2$ subdomains ($N = 16$) and its subregion partition with 4 subdomains ($\hat{N} = 4$) across each subregion (the number of subregions is $4^2$). Right: a geometrically nonconforming subdomain partition of $16^2 + 8$ subdomains ($N = 16$) and its subregion partition with 4 subdomains ($\hat{N} = 4$) across each subregion (the number of subregions is $4^2 + 2$).
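The subdomain and subregion counts quoted in the caption of Figure 4 follow from simple counting. The sketch below assumes that "successively" means that the vertical strips alternate between $N$ and $N + 1$ rectangles; this layout and the function name `strip_partition_count` are assumptions made for illustration and are not taken from the paper.

```python
def strip_partition_count(n_strips: int, n_per_strip: int, conforming: bool) -> int:
    """Number of rectangles when a square is cut into vertical strips.

    Conforming: every strip holds n_per_strip rectangles.
    Nonconforming (assumed layout): strips alternate between
    n_per_strip and n_per_strip + 1 rectangles.
    """
    if conforming:
        return n_strips * n_per_strip
    return sum(n_per_strip + (s % 2) for s in range(n_strips))


# Subdomain counts N_d for the values of N used in the tables.
for n in (16, 32, 64, 80):
    print(n,
          strip_partition_count(n, n, conforming=True),
          strip_partition_count(n, n, conforming=False))
```

Under this assumption the nonconforming count is $N^2 + N/2$ for even $N$, which gives $16^2 + 8$ subdomains for $N = 16$ and, applied to the $4$ strips of subregions, $4^2 + 2$ subregions.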
Table 1
Geometrically conforming subdomain partitions. Left three columns: scalability with respect to the number of subdomains $N_d$ for the BDDC algorithm with the exact coarse problem when the subdomain problem sizes are fixed with $H/h = 5$ or 4. Right three columns: scalability with respect to the number of subregions $N_c$ for the BDDC algorithm with an inexact coarse problem when the subregion problem size, $\hat{N} = \hat{H}/H = 4$, and the subdomain problem sizes, $H/h = 5$ or 4, are fixed.

          Exact                      Inexact
N_d       Cond     Iter     N_c      Cond     Iter
16^2      9.18     18       4^2      9.67     19
32^2      9.26     17       8^2      10.11    21
64^2      9.28     17       16^2     10.13    20
80^2      9.29     17       20^2     10.13    20
Table 2
Geometrically nonconforming partitions. Left three columns: scalability with respect to the number of subdomains $N_d$ for the BDDC algorithm with the exact coarse problem when the subdomain problem sizes are fixed with $H/h = 6$, 8, or 10. Right three columns: scalability with respect to the number of subregions $N_c$ for the BDDC algorithm with an inexact coarse problem when the subregion problem size, $\hat{N} = \hat{H}/H = 4$, and the subdomain problem sizes, $H/h = 6$, 8, or 10, are fixed.

             Exact                        Inexact
N_d          Cond     Iter     N_c        Cond     Iter
16^2 + 8     12.36    23       4^2 + 2    12.70    26
32^2 + 16    12.37    24       8^2 + 4    12.79    27
64^2 + 32    12.40    24       16^2 + 8   12.81    29
80^2 + 40    12.41    25       20^2 + 10  12.82    29
Tables 3 and 4 present the results of the three-level algorithm by changing the subregion problem size and the subdomain problem size. Tables 3 and 4 are for geometrically conforming and nonconforming subdomain partitions, respectively. Both results are consistent with our theory.
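One rough way to check this kind of consistency is to compare measured condition numbers with the logarithmic factor in Theorem 5.2. The sketch below uses the left columns of Table 3 (subdomain problem size fixed, $\hat{N} = \hat{H}/H$ varying) and assumes the natural logarithm; the helper name `log_factor` is hypothetical. If the bound holds, the ratio of the condition number to $(1 + \log \hat{N})^2$ should stay bounded as $\hat{N}$ grows.

```python
import math


def log_factor(ratio: float) -> float:
    """Return (1 + log(ratio))**2, the growth factor in the bound (natural log assumed)."""
    return (1.0 + math.log(ratio)) ** 2


# (N_hat, condition number) pairs copied from the left columns of Table 3.
table3_left = [(4, 9.67), (8, 10.57), (16, 11.73), (20, 12.16)]

ratios = [cond / log_factor(n_hat) for n_hat, cond in table3_left]
for (n_hat, cond), ratio in zip(table3_left, ratios):
    print(f"N_hat={n_hat:2d}  cond={cond:5.2f}  cond/(1+log N_hat)^2={ratio:.3f}")
```

For these four data points the ratio decreases as $\hat{N}$ increases, which is compatible with the bound; it is only a sanity check on a small sample and says nothing about sharpness.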
Table 3
Geometrically conforming subdomain partitions with 4 × 4 subregions. Left three columns: scalability with respect to the subregion problem size $\hat{N}$ when the subdomain problem sizes are fixed with $n = H/h = 5$ or 6. Right three columns: scalability with respect to the subdomain problem size $n$ when the subregion problem sizes are fixed with $\hat{N} = \hat{H}/H = 4$.

4 × 4 subregions, n fixed            4 × 4 subregions, N̂ fixed
N̂ = Ĥ/H    Cond     Iter             n = H/h     Cond     Iter
4           9.67     19               (5,4)       9.67     19
8           10.57    20               (10,8)      13.23    23
16          11.73    24               (20,16)     17.20    26
20          12.16    25               (25,20)     18.56    26
Table 4
Geometrically nonconforming subdomain partitions with $4^2 + 2$ subregions. Left three columns: scalability with respect to the subregion problem size $\hat{N}$ when the subdomain problem sizes are fixed with $n = H/h = 6$, 8, or 10. Right three columns: scalability with respect to the subdomain problem size $n$ when the subregion problem sizes are fixed with $\hat{N} = \hat{H}/H = 4$.

4^2 + 2 subregions, n fixed          4^2 + 2 subregions, N̂ fixed
N̂ = Ĥ/H    Cond     Iter             n = H/h       Cond     Iter
4           12.70    26               (6,8,10)      12.70    26
8           13.11    28               (8,10,12)     14.12    27
16          13.77    29               (18,20,22)    18.39    30
20          14.01    30               (24,26,28)    20.05    30
Table 5
Discontinuous coefficient case for geometrically conforming subdomain partitions. Left three columns: scalability with respect to the number of subdomains $N_d$ for the BDDC algorithm with the exact coarse problem when the subdomain problem sizes are fixed with $H/h = 5$ or 4. Right three columns: scalability with respect to the number of subregions $N_c$ for the BDDC algorithm with an inexact coarse problem when the subregion problem size, $\hat{N} = \hat{H}/H = 4$, and the subdomain problem sizes, $H/h = 5$ or 4, are fixed.

          Exact                      Inexact
N_d       Cond     Iter     N_c      Cond     Iter
16^2      9.18     19       4^2      9.55     22
32^2      9.22     19       8^2      10.01    23
64^2      9.27     18       16^2     10.17    22
80^2      9.27     18       20^2     10.19    21
In our second set of numerical experiments, we test our algorithm with discontinuous coefficients $\rho(x, y)$. The values of $\rho(x, y)$ are selected among 1, 10, 100, and 1000. They are constant in each subregion but can jump across subregion boundaries. As before, we compare the two-level and the three-level algorithms with the same coarse problem size on the geometrically conforming and nonconforming subdomain partitions. The results are reported in Tables 5 and 6. The three-level algorithm requires slightly more iterations because the coarse problem is solved inexactly. However, the computational cost of each iteration is reduced, resulting in a faster computing time than the two-level algorithm. Tables 7 and 8 show the number of iterations and condition numbers of the three-level algorithm regarding the subregion problem size and the subdomain problem
Table 6
Discontinuous coefficient case for geometrically nonconforming partitions. Left three columns: scalability with respect to the number of subdomains $N_d$ for the BDDC algorithm with the exact coarse problem when the subdomain problem sizes are fixed with $H/h = 6$, 8, or 10. Right three columns: scalability with respect to the number of subregions $N_c$ for the BDDC algorithm with an inexact coarse problem when the subregion problem size, $\hat{N} = \hat{H}/H = 4$, and the subdomain problem sizes, $H/h = 6$, 8, or 10, are fixed.

             Exact                        Inexact
N_d          Cond     Iter     N_c        Cond     Iter
16^2 + 8     11.85    25       4^2 + 2    11.87    27
32^2 + 16    12.25    26       8^2 + 4    12.56    29
64^2 + 32    12.37    27       16^2 + 8   12.74    30
80^2 + 40    12.39    28       20^2 + 10  12.77    30
Table 7
Discontinuous coefficient case for geometrically conforming subdomain partitions with 4 × 4 subregions. Left three columns: scalability with respect to the subregion problem size $\hat{N}$ when the subdomain problem sizes are fixed with $n = H/h = 5$ or 6. Right three columns: scalability with respect to the subdomain problem size $n$ when the subregion problem sizes are fixed with $\hat{N} = \hat{H}/H = 4$.

4 × 4 subregions, n fixed            4 × 4 subregions, N̂ fixed
N̂ = Ĥ/H    Cond     Iter             n = H/h     Cond     Iter
4           9.55     22               (5,4)       9.55     22
8           10.46    25               (10,8)      13.01    26
16          11.64    26               (20,16)     16.90    30
20          12.06    28               (25,20)     18.24    31
Table 8
Discontinuous coefficient case for geometrically nonconforming subdomain partitions with $4^2 + 2$ subregions. Left three columns: scalability with respect to the subregion problem size $\hat{N}$ when the subdomain problem sizes are fixed with $n = H/h = 6$, 8, or 10. Right three columns: scalability with respect to the subdomain problem size $n$ when the subregion problem sizes are fixed with $\hat{N} = \hat{H}/H = 4$.

4^2 + 2 subregions, n fixed          4^2 + 2 subregions, N̂ fixed
N̂ = Ĥ/H    Cond     Iter             n = H/h       Cond     Iter
4           11.87    27               (4,6,8)       10.36    26
8           12.23    28               (6,8,10)      11.87    27
16          12.74    30               (10,12,14)    14.36    30
20          13.04    31               (12,14,16)    15.30    31
size with the other mesh parameters fixed. We observe that the theoretical bound is still valid for the discontinuous coefficients in both the geometrically conforming and nonconforming subdomain partitions.

Acknowledgments. The authors are grateful to Professor Olof Widlund for all his help. They also thank the referees for useful comments and suggestions.

REFERENCES

[1] F. B. Belgacem and Y. Maday, The mortar element method for three dimensional finite elements, M2AN Math. Model. Numer. Anal., 31 (1997), pp. 289–302.
[2] F. B. Belgacem, The mortar finite element method with Lagrange multipliers, Numer. Math., 84 (1999), pp. 173–197.
[3] C. Bernardi, Y. Maday, and A. T. Patera, A new nonconforming approach to domain decomposition: The mortar element method, in Nonlinear Partial Differential Equations and Their Applications. Collège de France Seminar, Vol. XI (Paris, 1989–1991), Pitman Res. Notes Math. 299, Longman Scientific and Technical, Harlow, 1994, pp. 13–51.
[4] S. C. Brenner and L.-Y. Sung, BDDC and FETI-DP without matrices or vectors, Comput. Methods Appl. Mech. Engrg., 196 (2007), pp. 1429–1435.
[5] S. C. Brenner, Korn's inequalities for piecewise $H^1$ vector fields, Math. Comp., 73 (2004), pp. 1067–1087.
[6] S. C. Brenner, A functional analytic framework for BDDC and FETI-DP, in Domain Decomposition Methods in Science and Engineering, Vol. XVII, Proceedings of the Seventeenth International Conference of Domain Decomposition Methods, Lect. Notes Comput. Sci. Eng. 60, U. Langer, M. Discacciati, D. Keyes, O. Widlund, and W. Zulehner, eds., Springer-Verlag, Berlin, 2008, pp. 239–246.
[7] Y.-W. Chang, H. H. Kim, and C.-O. Lee, Preconditioners for the dual-primal FETI methods on nonmatching grids: Numerical study, Comput. Math. Appl., 51 (2006), pp. 697–712.
[8] L. C. Cowsar, J. Mandel, and M. F. Wheeler, Balancing domain decomposition for mixed finite elements, Math. Comp., 64 (1995), pp. 989–1015.
[9] C. R. Dohrmann, A preconditioner for substructuring based on constrained energy minimization, SIAM J. Sci. Comput., 25 (2003), pp. 246–258.
[10] C. R. Dohrmann, A Substructuring Preconditioner for Nearly Incompressible Elasticity Problems, Technical report SAND2004-5393, Sandia National Laboratories, Albuquerque, NM, 2004.
[11] C. R. Dohrmann, An approximate BDDC preconditioner, Numer. Linear Algebra Appl., 14 (2007), pp. 149–168.
[12] C. Kim, R. D. Lazarov, J. E. Pasciak, and P. S. Vassilevski, Multiplier spaces for the mortar finite element method in three dimensions, SIAM J. Numer. Anal., 39 (2001), pp. 519–538.
[13] H. H. Kim, M. Dryja, and O. B. Widlund, A BDDC method for mortar discretizations using a transformation of basis, SIAM J. Numer. Anal., 47 (2008), pp. 136–157.
[14] H. H. Kim, A BDDC algorithm for mortar discretization of elasticity problems, SIAM J. Numer. Anal., 46 (2008), pp. 2090–2111.
[15] A. Klawonn, O. Rheinbach, and O. B. Widlund, An analysis of a FETI–DP algorithm on irregular subdomains in the plane, SIAM J. Numer. Anal., 46 (2008), pp. 2484–2504.
[16] A. Klawonn and O. Rheinbach, Inexact FETI-DP methods, Internat. J. Numer. Methods Engrg., 69 (2007), pp. 284–307.
[17] A. Klawonn and O. B. Widlund, Dual-primal FETI methods for linear elasticity, Comm. Pure Appl. Math., 59 (2006), pp. 1523–1572.
[18] J. Li and X. Tu, Convergence Analysis of a Balancing Domain Decomposition Method for Solving Interior Helmholtz Equations, Numer. Linear Algebra Appl., to appear.
[19] J. Li and O. Widlund, BDDC algorithms for incompressible Stokes equations, SIAM J. Numer. Anal., 44 (2006), pp. 2432–2455.
[20] J. Li and O. Widlund, FETI-DP, BDDC, and block Cholesky methods, Internat. J. Numer. Methods Engrg., 66 (2006), pp. 250–271.
[21] J. Li and O. Widlund, On the use of inexact subdomain solvers for BDDC algorithms, Comput. Methods Appl. Mech. Engrg., 196 (2007), pp. 1415–1428.
[22] J. Mandel, C. R. Dohrmann, and R. Tezaur, An algebraic theory for primal and dual substructuring methods by constraints, Appl. Numer. Math., 54 (2005), pp. 167–193.
[23] J. Mandel and C. R. Dohrmann, Convergence of a balancing domain decomposition by constraints and energy minimization, Numer. Linear Algebra Appl., 10 (2003), pp. 639–659.
[24] M. Sarkis, Nonstandard coarse spaces and Schwarz methods for elliptic problems with discontinuous coefficients using non-conforming elements, Numer. Math., 77 (1997), pp. 383–406.
[25] D. Stefanica, Domain Decomposition Methods for Mortar Finite Elements, Ph.D. thesis, Department of Computer Science, Courant Institute, New York University, New York, 2000.
[26] A. Toselli and O. Widlund, Domain Decomposition Methods - Algorithms and Theory, Springer Ser. Comput. Math. 34, Springer-Verlag, Berlin, 2005.
[27] X. Tu and J. Li, A balancing domain decomposition method by constraints for advection-diffusion problems, Commun. Appl. Math. Comput. Sci., 3 (2008), pp. 25–60.
[28] X. Tu, A BDDC algorithm for a mixed formulation of flows in porous media, Electron. Trans. Numer. Anal., 20 (2005), pp. 164–179.
[29] X. Tu, BDDC Domain Decomposition Algorithms: Methods with Three Levels and for Flow in Porous Media, Ph.D. thesis, Courant Institute, New York University, New York, 2006.
[30] X. Tu, A BDDC algorithm for flow in porous media with a hybrid finite element discretization, Electron. Trans. Numer. Anal., 26 (2007), pp. 146–160.
[31] X. Tu, Three-level BDDC in three dimensions, SIAM J. Sci. Comput., 29 (2007), pp. 1759–1780.
[32] X. Tu, Three-level BDDC in two dimensions, Internat. J. Numer. Methods Engrg., 69 (2007), pp. 33–59.
[33] B. I. Wohlmuth, A mortar finite element method using dual spaces for the Lagrange multiplier, SIAM J. Numer. Anal., 38 (2000), pp. 989–1012.
[34] B. I. Wohlmuth, Discretization Methods and Iterative Solvers Based on Domain Decomposition, Lect. Notes Comput. Sci. Eng. 17, A. Toselli and O. Widlund, eds., Springer-Verlag, Berlin, 2001.