$N_f\}$ consists of the standard basis functions for the piecewise linear finite element space that correspond to the $N_f$ free nodes in the mesh. The reader should recall the following requirement on the triangulation: Any point where $\Gamma_1$ and $\Gamma_2$ meet must be a node in the mesh, and this node is considered to belong to $\Gamma_1$.

Most entries $K_{ij}$ of the stiffness matrix are zero, since the corresponding integrand $\kappa\nabla\phi_j\cdot\nabla\phi_i$ is zero throughout $\Omega$. For those entries $K_{ij}$ that are not zero, the support of $\kappa\nabla\phi_j\cdot\nabla\phi_i$ consists of a few triangles. One strategy for computing $K$ is to loop over all $i, j$ pairs, determine if $K_{ij}$ is nonzero, and, if it is, compute the integral that defines it. If $K_{ij}$ is nonzero and the support of $\kappa\nabla\phi_j\cdot\nabla\phi_i$ is
$$T_{r_1}\cup T_{r_2}\cup\cdots\cup T_{r_\ell},$$
then
$$K_{ij}=\sum_{k=1}^{\ell}\int_{T_{r_k}}\kappa\nabla\phi_j\cdot\nabla\phi_i.$$
To compute these integrals, it is necessary to compute the basis functions $\phi_i$ and $\phi_j$ (or, actually, their gradients) on each of the triangles $T_{r_1}, T_{r_2}, \ldots, T_{r_\ell}$. Algorithm 6.1 expresses this approach to computing $K$. This algorithm can be described as node-oriented, since it involves looping over the nodes in the mesh. The reader will notice that only the upper triangle of $K$ is computed directly, since the matrix is known to be symmetric ($K_{ji} = K_{ij}$).
6.1. Programming the finite element method
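Initialize $K$ to the zero matrix
for $i = 1, 2, \ldots, N_f$
    for $j = i, i+1, \ldots, N_f$
        Determine if $K_{ij}$ is nonzero
        if $K_{ij} \neq 0$
            Determine the triangles $T_{r_1}, T_{r_2}, \ldots, T_{r_\ell}$ forming the support of $\kappa\nabla\phi_j\cdot\nabla\phi_i$
            for $k = 1, 2, \ldots, \ell$
                Compute $\nabla\phi_i$ and $\nabla\phi_j$ on $T_{r_k}$
                Compute $\int_{T_{r_k}}\kappa\nabla\phi_j\cdot\nabla\phi_i$ and add it to $K_{ij}$
            Set $K_{ji} = K_{ij}$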
Algorithm 6.1. Node-oriented algorithm for computing the stiffness matrix K.
Figure 6.1. The support of $\phi_{13}$ in a certain mesh. The triangles are labeled in the left graph, while the free nodes are labeled in the right graph.

One problem with Algorithm 6.1 is that the value of any given $\nabla\phi_i$ on any particular $T_k$ will contribute to $K_{ij}$ for several (usually three) values of $j$. For example, for the mesh illustrated in Figure 6.1, the value of $\nabla\phi_{13}$ on $T_{20}$ contributes to $K_{13,12}$, $K_{13,13}$, and $K_{13,18}$ (and, by symmetry, $K_{12,13}$, $K_{18,13}$). Therefore, it must be computed repeatedly (at the cost of some inefficiency) or stored after it is computed (at the cost of some inconvenience). It would be preferable, if possible, to compute $\nabla\phi_i$ just once on each triangle in its support, use its value, and then discard it.

The simplest data structure describing a triangulation is the triangle-node list. This consists of two arrays: The node array contains the coordinates of the nodes, and the triangle array contains three indices for each triangle, identifying the nodes (from the node array) that are the vertices of the given triangle. When Algorithm 6.1 is executed, it is necessary to loop over the vertices of the triangles and to know, for a given vertex, which other vertices are adjacent to it. This implies storing the "connectivity" information of the mesh (that is, storing, for each vertex, the indices of the adjacent nodes). This connectivity information is contained in the triangle-node list, but only implicitly. It would be inefficient
Chapter 6. The mesh data structure
to search through the list of triangles and vertices to determine the connectivity of the vertices. Algorithm 6.1 therefore requires that both the triangle-node list and the connectivity information be stored explicitly.

It turns out that by adopting a different strategy for computing $K$, both of the above problems can be circumvented: $\nabla\phi_i$ on $T_k$ need be computed only once, and the connectivity information need not be stored explicitly. The idea is to loop over the triangles in the mesh and, for each triangle, compute the contributions to all entries $K_{ij}$ that are affected by the given triangle. This is actually quite easy to do. Given a triangle $T_k$, the only basis functions whose support has a nontrivial intersection with $T_k$ are those corresponding to the vertices of $T_k$. There are at most three such basis functions (fewer if one or more vertices are constrained). If all three vertices of $T_k$ are free and the corresponding basis functions are $\phi_{\ell_1}, \phi_{\ell_2}, \phi_{\ell_3}$, then the following entries of $K$ are affected:
$$K_{\ell_r,\ell_s},\quad r, s = 1, 2, 3.$$
The contribution to $K_{\ell_r,\ell_s}$ is
$$\int_{T_k}\kappa\nabla\phi_{\ell_s}\cdot\nabla\phi_{\ell_r}.$$
To be precise,
$$K_{\ell_r,\ell_s} = \int_{T_k}\kappa\nabla\phi_{\ell_s}\cdot\nabla\phi_{\ell_r} + \cdots,$$
where "$+\cdots$" represents integrals of $\kappa\nabla\phi_{\ell_s}\cdot\nabla\phi_{\ell_r}$ over the other triangles that form its support. The integrals computed over $T_k$ are often collected in a $3\times 3$ matrix called the element stiffness matrix (over $T_k$):
$$\left[\int_{T_k}\kappa\nabla\phi_{\ell_s}\cdot\nabla\phi_{\ell_r}\right]_{r,s=1}^{3}.$$
This matrix need not be formed explicitly (except possibly as a programming convenience); rather, its entries are added to the corresponding entries of $K$. When computing the entries of the element stiffness matrix, it may be advantageous to compute the integrals by transforming to the reference triangle, as described in Section 4.6. The advantages of using a reference triangle will be discussed in Chapters 7 and 8.

As always, the symmetry of $K$ should not be ignored. It is necessary to compute only six of the nine entries of the element matrix, namely, those in the upper triangle. If one of the three vertices of $T_k$ is constrained, then $T_k$ contributes to only four entries of $K$ (the $2\times 2$ block belonging to the two free vertices), while if two of the vertices are constrained, then $T_k$ contributes to a single entry in $K$. It is possible for all three vertices of $T_k$ to be constrained, but this can hold for only a few triangles in a given mesh, for example, those lying at the corner of a rectangle.
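Algorithm 6.2 incorporates the above ideas. The reader should recall that the vertices of $T_k$ are $z_{n_{k,1}}, z_{n_{k,2}}, z_{n_{k,3}}$.

Initialize $K$ to the zero matrix
for $k = 1, 2, \ldots, N_t$
    for $r = 1, 2, 3$
        for $s = r, \ldots, 3$
            if $z_{n_{k,r}}$ and $z_{n_{k,s}}$ are both free
                Find the indices $\ell_r$ and $\ell_s$ of $z_{n_{k,r}}$ and $z_{n_{k,s}}$ in the list of free nodes
                Compute $\int_{T_k}\kappa\nabla\phi_{\ell_s}\cdot\nabla\phi_{\ell_r}$ and add it to $K_{\ell_r,\ell_s}$ and (when $s \neq r$) to $K_{\ell_s,\ell_r}$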
Algorithm 6.2. Element-oriented algorithm for computing $K$.

To implement this algorithm, it is necessary to know, for each triangle $T_k$, the nodes $z_{n_{k,j}}$, $j = 1, 2, 3$. This information is required by any conceivable scheme, since integrals over $T_k$ must be computed, and is contained in the triangle-node list. In addition, it must be possible to determine if a given vertex $z_n$ is free or not. If it is free, its index in the list of all free nodes must be known.

I have already established the following notation: The free nodes are enumerated $1, 2, \ldots, N_f$ and the vertices are enumerated $1, 2, \ldots, N_v$. Free node $j$ is vertex $z_{f_j}$. That is, I have established a mapping from $j \in \{1, 2, \ldots, N_f\}$ to $f_j \in \{1, 2, \ldots, N_v\}$. This mapping is necessarily one-to-one, so it has an inverse mapping defined by $p_i = j$ if and only if $j \in \{1, 2, \ldots, N_f\}$ and $i = f_j$. Except in the case that every node is free, the quantity $p_i$ is not defined for some $i \in \{1, 2, \ldots, N_v\}$. For each node $z_n$, it is necessary to store $p_n$ or a flag indicating that $z_n$ is constrained. I will present a convenient way to do this in the next section. For now I just point out that, given this information, the above algorithm is efficient and easy to implement.

Since I will need it later, I will also define $q_i = j$ if and only if $j \in \{1, 2, \ldots, N_c\}$ and $i = c_j$. This establishes the analogous relationship for the constrained nodes.

Other issues that must be addressed are the computation of the gradients of the basis functions on each triangle and the computation of the integrals over the triangles. These will be addressed in Chapter 7.
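The element-oriented loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MATLAB code that accompanies the book: the mesh is given as a plain triangle-node list with 1-based indices, `node_ptrs[i-1]` is positive (the free-node index) for a free node and negative for a constrained one, the coefficient $\kappa$ is taken to be a constant, and the function names `p1_gradients` and `assemble_stiffness` are hypothetical.

```python
def p1_gradients(p1, p2, p3):
    """Return (area, grads) for the three linear basis functions on a triangle."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)  # twice the signed area
    grads = [
        ((y2 - y3) / det, (x3 - x2) / det),  # gradient of the function that is 1 at p1
        ((y3 - y1) / det, (x1 - x3) / det),  # ... at p2
        ((y1 - y2) / det, (x2 - x1) / det),  # ... at p3
    ]
    return abs(det) / 2.0, grads


def assemble_stiffness(nodes, triangles, node_ptrs, n_free, kappa=1.0):
    """Element-oriented assembly of K (dense list of lists, for clarity only)."""
    K = [[0.0] * n_free for _ in range(n_free)]
    for tri in triangles:                      # loop over the triangles T_k
        area, grads = p1_gradients(*(nodes[n - 1] for n in tri))
        for r in range(3):
            for s in range(r, 3):              # upper triangle of the element matrix
                lr = node_ptrs[tri[r] - 1]     # indirect indexing: vertex -> free index
                ls = node_ptrs[tri[s] - 1]
                if lr > 0 and ls > 0:          # both vertices are free
                    dot = grads[r][0] * grads[s][0] + grads[r][1] * grads[s][1]
                    val = kappa * area * dot   # gradients are constant on T_k
                    K[lr - 1][ls - 1] += val
                    if r != s:
                        K[ls - 1][lr - 1] += val
    return K


# Unit square cut into four triangles by its center node (node 5).
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
triangles = [(1, 2, 5), (2, 3, 5), (3, 4, 5), (4, 1, 5)]
node_ptrs = [-1, -2, -3, -4, 1]                # only the center node is free
K = assemble_stiffness(nodes, triangles, node_ptrs, n_free=1)
print(K)
```

Each gradient is computed once per triangle and then discarded, and no connectivity information beyond the triangle-node list is consulted. A practical code would store $K$ in a sparse format rather than as a dense array.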
6.1.2 Computing the load vector
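The algorithm for computing the load vector $F$ is exactly analogous to the algorithm for assembling the stiffness matrix. If the boundary conditions are homogeneous (that is, if $g$ and $h$ are zero in (6.1b) and (6.1c)), then the components of $F$ are defined by
$$F_i = \int_\Omega f\phi_i,\quad i = 1, 2, \ldots, N_f.$$
If the support of $\phi_i$ is
$$T_{r_1}\cup T_{r_2}\cup\cdots\cup T_{r_\ell},$$
then
$$F_i = \sum_{k=1}^{\ell}\int_{T_{r_k}} f\phi_i.$$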
As in the case of the stiffness matrix, the contributions to the components of the load vector are computed while looping over the triangles in the mesh. The result is Algorithm 6.3.

Initialize $F$ to the zero vector
for $k = 1, 2, \ldots, N_t$
    for $i = 1, 2, 3$
        if $z_{n_{k,i}}$ is free
            Find the index $\ell$ of $z_{n_{k,i}}$ in the list of free nodes
            Compute $\int_{T_k} f\phi_\ell$ and add it to $F_\ell$

Algorithm 6.3. Element-oriented algorithm for computing $F$ in the case of homogeneous boundary conditions.

The reader should notice that the index of $z_{n_{k,i}}$ in the list of free nodes is $p_{n_{k,i}}$. This sort of indirect indexing is essential for the computer implementation of the finite element method.

Inhomogeneous Dirichlet conditions
When the Dirichlet conditions are inhomogeneous ($g \neq 0$ in (6.1b)), then the formula for $F_i$ has an extra term,
$$F_i = \int_\Omega f\phi_i - \int_\Omega \kappa\nabla G\cdot\nabla\phi_i,$$
where $G$ is a function interpolating the Dirichlet data $g$ on $\Gamma_1$. When computing $F$ by looping over the triangles, integrals of the form
$$\int_{T_k}\kappa\nabla G\cdot\nabla\phi_i$$
must be computed. It is usual to choose, for $G$, the continuous piecewise linear function satisfying
$$G(z_n) = \begin{cases} g(z_n), & z_n \in \Gamma_1,\\ 0, & \text{otherwise}.\end{cases}$$
But $z_n \in \Gamma_1$ if and only if $z_n$ is a constrained node. Therefore, it is necessary to know, for a given node $z_n$, whether it is constrained or not and, if so, its index $q_n$ in the list of constrained nodes. The mesh data structure must contain this information if it is to be used for problems with inhomogeneous Dirichlet conditions.

Inhomogeneous Neumann conditions

If the Neumann condition is inhomogeneous (that is, if $h \neq 0$ in (6.1c)), then the formula for $F_i$ contains an additional term:
$$F_i = \int_\Omega f\phi_i - \int_\Omega \kappa\nabla G\cdot\nabla\phi_i + \int_{\Gamma_2} h\phi_i.$$
When computing $F$, it is therefore necessary to compute integrals of the form
$$\int_e h\phi_i,$$
where $e$ is a triangle edge lying in $\Gamma_2$, that is, a free boundary edge. These contributions to $F$ can be computed while looping over the triangles (along with the contributions from the right-hand side of the PDE and the inhomogeneous Dirichlet conditions). Then the values of $\phi_i$ on $T_k$ need only be computed once. This suggests that the data structure should record the edges of each triangle, with an indication of which lie on the boundary and which boundary edges are free.

The edges of a triangle $T_k$ are fully specified by their endpoints, which are the nodes $z_{n_{k,1}}, z_{n_{k,2}}, z_{n_{k,3}}$. It would be sufficient, then, for the algorithms described thus far to augment a triangle-node list with flags describing the edges as interior edges, free boundary edges, or constrained boundary edges. However, for the purpose of local refinement (in which some triangles in a given mesh are refined, but others are not), it is necessary to identify not only the edges of a given triangle but also the triangle on the other side of each edge. This information is not easily extracted from the triangle-node list. Therefore, instead of two lists (triangles and nodes), the data structure defined in the next section includes three lists: triangles, edges, and nodes.

I have already introduced the notation for the triangles and for the nodes. I will denote the edges in the mesh by $e_1, e_2, \ldots, e_{N_e}$ and define indices $k_{i,1}, k_{i,2}$ so that $z_{k_{i,1}}$ and $z_{k_{i,2}}$ are the endpoints of $e_i$. Since I need to point from a triangle to its edges, I define indices $s_{k,1}, s_{k,2}, s_{k,3}$ such that $T_k$ has edges $e_{s_{k,1}}, e_{s_{k,2}}, e_{s_{k,3}}$. Each triangle is identified by its three edges, each edge by the nodes forming its endpoints, and each node by its coordinates. To these lists will be added arrays of flags and pointers as needed for the algorithms. For example, the mappings $i \mapsto f_i$ and $j \mapsto p_j$ can be stored as arrays of pointers. From the resulting data structure, all of the information needed by the algorithms described in this chapter can be easily extracted.
Explicitly storing the triangle-edge and edge-vertex lists is convenient for handling inhomogeneous Neumann conditions and essential for local refinement. The vertices of a triangle can be cheaply extracted from these lists, so there is no significant penalty for eliminating the triangle-node list. When the Neumann data are specified for input to the algorithm for computing the load vector, it should be easy to determine which boundary edges are free (without searching through the list of all triangles). I will denote the number of free boundary edges by $N_b$, and the free boundary edges themselves will be denoted $e_{b_1}, e_{b_2}, \ldots, e_{b_{N_b}}$. The mapping $i \mapsto b_i$ will also be stored.
6.2 The mesh data structure
I will now define a data structure for describing a triangular mesh on a polygonal domain in $\mathbf{R}^2$. The data structure describes linear Lagrange triangles and will be extended in later chapters for higher-order Lagrange triangles. The necessary notation for referring to triangles, edges, nodes, free nodes, constrained nodes, and free boundary edges has already been established in Chapter 4 and Section 6.1.

I should emphasize that the data structure presented here is based on simple arrays, as is appropriate for a procedural style of programming. Using object-oriented programming, it would be natural to encapsulate the mesh data structure together with access functions and other manipulations in a class (in the C++ programming language, for example). However, I will not pursue this extension here.

The data structure that I present is designed for ease of use, not to minimize the amount of storage used. There is a trade-off between storing information explicitly and recomputing it when needed; the first approach uses more memory and the second more time. The report by Beall and Shephard [11] discusses this trade-off in detail. Below, in Section 6.3, I give more details about the amount of memory used (see also Exercise 1).
6.2.1 The list of nodes

The most basic information required about the mesh is the list of nodes and their coordinates. The order of the nodes in this list is completely irrelevant. The nodes are $z_1, z_2, \ldots, z_{N_v}$ and their coordinates are $(x_1, y_1), (x_2, y_2), \ldots, (x_{N_v}, y_{N_v})$, respectively. Therefore, the $N_v \times 2$ array Nodes is
$$\texttt{Nodes} = \begin{bmatrix} x_1 & y_1\\ x_2 & y_2\\ \vdots & \vdots\\ x_{N_v} & y_{N_v}\end{bmatrix}.$$
Associated with Nodes are three arrays of pointers, allowing one to retrieve the index of a given node in the list of free or constrained nodes, or to retrieve the index of a given free or constrained node in Nodes. The $N_f \times 1$ array FNodePtrs (free node pointers) contains the pointer into Nodes for each free node:
$$\texttt{FNodePtrs} = \begin{bmatrix} f_1\\ f_2\\ \vdots\\ f_{N_f}\end{bmatrix}.$$
CNodePtrs (constrained node pointers) is the analogous array for the constrained nodes:
$$\texttt{CNodePtrs} = \begin{bmatrix} c_1\\ c_2\\ \vdots\\ c_{N_c}\end{bmatrix}.$$
In Section 4.1.1, I defined the notation used above: $f_1, f_2, \ldots, f_{N_f}$ are the indices of the free nodes in the list of all nodes. That is, $z_{f_i}$ is the $i$th free node. Similarly, $z_{c_i}$ is the $i$th constrained node. Finally, the $N_v \times 1$ array NodePtrs contains the information necessary to determine, for each node in Nodes, if it is free or constrained and its index in the list of free or constrained nodes. I use a small trick here to put all of this information in a single array:
$$\texttt{NodePtrs}(i) = \begin{cases} p_i, & z_i \text{ is free},\\ -q_i, & z_i \text{ is constrained}.\end{cases}$$
The reader will recall that if node $z_i$ is free, then its index in the list of free nodes is $p_i$, while if it is constrained, its index in the list of constrained nodes is $q_i$. By examining the sign of NodePtrs(i), one can determine whether $z_i$ is free or constrained (positive for a free node, negative for a constrained node). The value of NodePtrs(i) (or its negative) then gives the index of $z_i$ in the list of free or constrained nodes.
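The sign trick can be illustrated with a short Python sketch. It is not part of the accompanying MATLAB code; the helper names `build_node_ptrs` and `lookup` are hypothetical, and the arrays hold the 1-based indices used in the text, so 1 is subtracted whenever a Python list is indexed.

```python
def build_node_ptrs(n_nodes, fnode_ptrs, cnode_ptrs):
    """Build NodePtrs from FNodePtrs and CNodePtrs (all indices are 1-based)."""
    node_ptrs = [0] * n_nodes
    for j, i in enumerate(fnode_ptrs, start=1):
        node_ptrs[i - 1] = j       # z_i is the j-th free node, so p_i = j
    for j, i in enumerate(cnode_ptrs, start=1):
        node_ptrs[i - 1] = -j      # z_i is the j-th constrained node, store -q_i
    return node_ptrs


def lookup(node_ptrs, i):
    """Return the kind of node z_i and its index in the corresponding list."""
    v = node_ptrs[i - 1]
    return ("free", v) if v > 0 else ("constrained", -v)


# Five nodes: z_2 and z_5 are free, z_1, z_3, z_4 are constrained.
fnode_ptrs = [2, 5]
cnode_ptrs = [1, 3, 4]
node_ptrs = build_node_ptrs(5, fnode_ptrs, cnode_ptrs)
print(node_ptrs)
print(lookup(node_ptrs, 4))
```

The trick relies on the indices starting at 1, so that no node has pointer 0; with 0-based storage the sign of the first entry would be ambiguous.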
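6.2.2 The list of edges

The $N_e \times 2$ array Edges describes the edges by listing their endpoints, or, to be precise, by listing the indices of their endpoints in the Nodes array. My notation for the endpoints of edge $e_i$ is $z_{k_{i,1}}, z_{k_{i,2}}$, and thus
$$\texttt{Edges} = \begin{bmatrix} k_{1,1} & k_{1,2}\\ k_{2,1} & k_{2,2}\\ \vdots & \vdots\\ k_{N_e,1} & k_{N_e,2}\end{bmatrix}.$$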
The reader should notice that there is no preferred order for listing the endpoints of a given edge, but whatever order is chosen in the Edges array must be followed in the other parts of the code.
6.2.3 The list of elements
Each triangular element is defined by its three edges. I have already established the notation $e_{s_{k,j}}$ to denote the $j$th edge of triangle $T_k$. The edges are listed in counterclockwise order, and each edge has an orientation defined by the order in which the endpoints are listed in Edges. Since the orientation of the edge is important in certain circumstances, it is convenient to indicate it explicitly. I therefore define the $N_t \times 3$ array Elements by
$$\texttt{Elements} = \begin{bmatrix} \pm s_{1,1} & \pm s_{1,2} & \pm s_{1,3}\\ \pm s_{2,1} & \pm s_{2,2} & \pm s_{2,3}\\ \vdots & \vdots & \vdots\\ \pm s_{N_t,1} & \pm s_{N_t,2} & \pm s_{N_t,3}\end{bmatrix}.$$
The positive sign is taken if, in traversing the boundary of triangle $T_k$ counterclockwise along edges $e_{s_{k,1}}, e_{s_{k,2}}, e_{s_{k,3}}$, edge $e_{s_{k,j}}$ is followed in its orientation defined in Edges. Otherwise, the negative sign is recorded.
Figure 6.2. A mesh with two triangles, $T_1$ and $T_2$. The edges are $e_1$ (bottom), $e_2$ (left), $e_3$ (diagonal), $e_4$ (right), and $e_5$ (top).

To make this clear, consider the mesh shown in Figure 6.2. There are two triangles, $T_1$ and $T_2$, five edges, $e_1, e_2, e_3, e_4, e_5$, and four nodes, $z_1, z_2, z_3, z_4$. If the Edges array is defined to be
$$\texttt{Edges} = \begin{bmatrix} 1 & 2\\ 1 & 3\\ 1 & 4\\ 2 & 4\\ 3 & 4\end{bmatrix},$$
then
$$\texttt{Elements} = \begin{bmatrix} -2 & 3 & -5\\ 1 & 4 & -3\end{bmatrix}.$$
This indicates that triangle $T_1$ has edges $e_2, e_3, e_5$, with $e_3$ traversed in the positive direction (that is, the direction defined by Edges) and $e_2$ and $e_5$ traversed in the negative direction. Triangle $T_2$ has edges $e_1, e_4, e_3$, with $e_1$ and $e_4$ traversed in the positive direction and $e_3$ traversed in the negative direction.

At various points in the algorithms presented in subsequent chapters, it is necessary to identify each edge as being an interior edge, a free boundary edge, or a constrained boundary edge. For example, when refining a mesh, new nodes are added. If a new node belongs to an interior edge or a free boundary edge, then it will be free, but if it belongs to a constrained boundary edge, then it will be constrained.

When doing local refinement, it is also necessary to know, for interior edges, the triangles on both sides of the edge. I define indices $t_{i,1}, t_{i,2}$ such that $T_{t_{i,1}}$ and $T_{t_{i,2}}$ are the triangles on the two sides of $e_i$. If $e_i$ is a boundary edge, then $t_{i,2}$ is defined to be 0.

The reader will recall that the free boundary edges are denoted $e_{b_1}, e_{b_2}, \ldots, e_{b_{N_b}}$. I define $i \mapsto a_i$ to be the inverse of $j \mapsto b_j$ (that is, $a_i = j \Leftrightarrow i = b_j$). Then, if $e_i$ is a free boundary edge, its index in the list of all free boundary edges is $a_i$.
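The above information is collected in an $N_e \times 2$ array called EdgeEls. If the $i$th edge is an interior edge, then
$$\texttt{EdgeEls}(i,1) = t_{i,1},\quad \texttt{EdgeEls}(i,2) = t_{i,2}.$$
If $e_i$ is a free boundary edge, then
$$\texttt{EdgeEls}(i,1) = t_{i,1},\quad \texttt{EdgeEls}(i,2) = -a_i.$$
Finally, if $e_i$ is a constrained boundary edge, then
$$\texttt{EdgeEls}(i,1) = t_{i,1},\quad \texttt{EdgeEls}(i,2) = 0.$$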
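To make the conventions of this section concrete, the following Python sketch recovers the vertices of each triangle from Elements and Edges and finds the triangles adjacent to each edge for the two-triangle mesh of Figure 6.2. It is an illustration only, not the accompanying MATLAB code. The node numbering ($z_1$ bottom left, $z_2$ bottom right, $z_3$ top left, $z_4$ top right) is an assumption, the helper names are hypothetical, and the distinction between free and constrained boundary edges is deliberately left out: every boundary edge simply gets 0 as its second triangle.

```python
# Two-triangle mesh of Figure 6.2 (1-based indices; node numbering assumed).
edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]
elements = [(-2, 3, -5), (1, 4, -3)]


def triangle_vertices(elements, edges, k):
    """Vertices of triangle T_k, in counterclockwise order."""
    verts = []
    for s in elements[k - 1]:
        a, b = edges[abs(s) - 1]
        # A positive entry means the edge is traversed from a to b,
        # a negative entry from b to a; record the starting point.
        verts.append(a if s > 0 else b)
    return verts


def edge_triangles(elements, n_edges):
    """For each edge e_i, the pair [t_i1, t_i2]; t_i2 = 0 on the boundary."""
    adj = [[0, 0] for _ in range(n_edges)]
    for k, tri in enumerate(elements, start=1):
        for s in tri:
            slot = adj[abs(s) - 1]
            slot[0 if slot[0] == 0 else 1] = k
    return adj


print(triangle_vertices(elements, edges, 1))
print(triangle_vertices(elements, edges, 2))
print(edge_triangles(elements, len(edges)))
```

Only the diagonal edge $e_3$ ends up with two adjacent triangles, which identifies it as the single interior edge of this mesh.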
6.2.4 The list of free boundary edges

The final information that must be recorded about the mesh is the list of free boundary edges, which must be available to deal with an inhomogeneous Neumann condition. I define the $N_b \times 1$ array FBndyEdges so that the $i$th entry contains the index $b_i$ of the $i$th free boundary edge:
$$\texttt{FBndyEdges} = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_{N_b}\end{bmatrix}.$$
6.2.5 Other fields in the mesh data structure

Since the data structure and algorithms will be extended in later chapters to handle piecewise polynomials of degree greater than one, the mesh data structure will contain an integer called Degree that has value 1 for linear Lagrange triangles, 2 for quadratic Lagrange triangles, and so forth.

The $N_e \times 1$ array EdgeCFlags indicates, for each edge, whether it corresponds to a curved piece of the boundary. EdgeCFlags(i) is 0 if $e_i$ is an interior edge or a boundary edge corresponding to a straight piece of $\partial\Omega$. On the other hand, if $e_i$ is a boundary edge approximating a curved piece of $\partial\Omega$, then EdgeCFlags(i) is 1. The array EdgeCFlags is used by the Refine1 algorithm (described below) to automatically approximate a nonpolygonal domain by a polygonal mesh.

When a domain has a curved boundary, some mechanism is required to describe the precise boundary. Since both the refinement algorithm and the use of isoparametric finite elements require points in the interior of given arcs of the boundary curve, a curved boundary is described by a function BndyFcn with the following property: Given two points on the boundary and a positive integer $k$, the function returns $k - 1$ points in the interior of the arc having the given endpoints. The refinement algorithm uses $k = 2$, that is, just one point between the two given points is needed (see Figure 6.10 below). For generating isoparametric Lagrange triangles of degree $d$, $k = d$ is used. Whenever the domain has a curved boundary, BndyFcn must be provided.
Two optional fields, LevelNodes and NodeParents, are added to the data structure for meshes created by the Refine1 algorithm. The meaning of these fields is described in the next section.
6.3 The MATLAB implementation
There is nothing in the data structure described above or the algorithms described in the following chapters that is specific to MATLAB; they could equally well be implemented in any programming language, such as Fortran, C, C++, and so forth. However, I provide an implementation in MATLAB, and at various points I will add some details about these MATLAB routines.

For convenience, the various arrays in the mesh data structure are stored in a MATLAB struct. A struct is a data structure that collects two or more data fields in one object that can then be passed to routines. The mesh struct has the following fields:
• Degree
• Elements, Edges, Nodes
• EdgeEls, EdgeCFlags, FBndyEdges
• NodePtrs, FNodePtrs, CNodePtrs
• BndyFcn (if necessary)
• LevelNodes, NodeParents (optional)

If an instance of the mesh struct is given the variable name T, then one can refer to any of the fields using the syntax T.FieldName (for example, T.Nodes).

As mentioned earlier, there is a trade-off between memory usage and computational time that must be considered when designing a data structure. I have chosen to store needed information explicitly rather than recompute it, which means that my data structure uses a relatively large amount of memory. To put this into context, I can compare the storage requirement of the mesh to the storage requirement for a corresponding stiffness matrix. For a scalar PDE, such as the model problem (6.1), the mesh takes about three to four times as much memory as the stiffness matrix. However, the mesh only occupies this much memory because I am using the default MATLAB data types, which store all numbers, even integers, in double precision. If all the integer pointers were stored as 2-byte integer values rather than 8-byte double precision numbers, the mesh would take approximately the same amount of storage as the stiffness matrix itself (see Exercise 1). Such storage would be natural in a compiled language such as Fortran, C, or C++, and can be attained in MATLAB at the cost of some inconvenience.

When meshes of higher-degree triangles are used, the stiffness matrix is less sparse and the memory used by the mesh data structure is less significant by comparison. For a system of PDEs, such as the equations of linear elasticity, the memory required for the mesh does not change, but the size of the stiffness matrix increases (by a factor of four for a $2 \times 2$ system). Therefore, for systems of PDEs, the memory used by the mesh is also less significant.
6.3.1 Generating a mesh by refinement
Given an arbitrary polygonal domain $\Omega$, the simplest way to create a mesh is probably to define a coarse mesh and then refine it. I have provided a routine called Refine1 that takes a triangulation $\mathcal{T}_0$ and returns the standard refinement $\mathcal{T}$ described in Section 5.1.1 (each triangle is replaced with four triangles obtained by inserting edges joining the midpoints of the original triangle edges; see Figure 5.1). The algorithm used in Refine1, which is outlined in Algorithm 6.4, is straightforward, though somewhat tedious; it mainly consists of careful bookkeeping.

Copy Nodes, NodePtrs, FNodePtrs, and CNodePtrs from $\mathcal{T}_0$ to $\mathcal{T}$
Copy LevelNodes and NodeParents from $\mathcal{T}_0$ to $\mathcal{T}$ (if they exist in $\mathcal{T}_0$)
Let $N_t$ be the number of triangles in $\mathcal{T}_0$
for $i = 1, 2, \ldots, N_t$
    for $j = 1, 2, 3$
        If edge $j$ of triangle $i$ has not been bisected already
            Create the midpoint of edge $j$ of triangle $i$
            Create the corresponding two new edges
    Begin updating Elements, EdgeEls, NodeParents, EdgeCFlags, and FBndyEdges to create three new interior triangles in $\mathcal{T}$
    Create the three new interior edges in $\mathcal{T}$
    Finish updating Elements and EdgeEls in $\mathcal{T}$: Create the fourth interior triangle in $\mathcal{T}$
    for $j = 1, 2, 3$
        Determine if the new midpoint of edge $j$ is free or constrained
        Update NodePtrs, FNodePtrs, and CNodePtrs in $\mathcal{T}$
Algorithm 6.4. Algorithm for forming the standard refinement $\mathcal{T}$ of a triangulation $\mathcal{T}_0$.

There are several important features to note from this algorithm:
• The nodes from $\mathcal{T}_0$ are also nodes in $\mathcal{T}$, and they are guaranteed to be the first nodes in the Nodes array of $\mathcal{T}$.
• When a mesh $\mathcal{T}_k$ is obtained by repeated refinement of an initial coarse mesh $\mathcal{T}_0$ via a sequence of intermediate meshes $\mathcal{T}_1, \ldots, \mathcal{T}_{k-1}$, there is the possibility of using a hierarchical basis in place of the standard nodal basis. Hierarchical bases will be described in Section 11.2. When employing a hierarchical basis, it is necessary to know which nodes belong to $\mathcal{T}_i$ but not to $\mathcal{T}_{i-1}$. I denote this set by $\mathcal{N}_i$; thus $\mathcal{N}_i$ is the set of nodes added during the $i$th refinement. Because Refine1 adds new nodes to the end of the list, the sets $\mathcal{N}_0, \mathcal{N}_1, \ldots, \mathcal{N}_k$ can be identified by simply recording the number of nodes in each of the meshes $\mathcal{T}_0, \mathcal{T}_1, \ldots, \mathcal{T}_k$. These numbers are recorded in the array LevelNodes, which is automatically added to the mesh data structure when a mesh is created by Refine1; on repeated refinements, the LevelNodes array is updated. The number of nodes in mesh $\mathcal{T}_i$ is stored in LevelNodes(i+1).
• When one mesh is obtained from another by refinement, the new nodes are all obtained as midpoints of edges in the original mesh. It is often necessary to know the endpoints of these edges. For example, this is needed in the hierarchical basis approach, and it is also needed when a piecewise linear function from the coarse mesh is interpolated to obtain the same function on the fine mesh. Therefore, Refine1 adds or updates an $N_v \times 2$ array called NodeParents. If $j_1$ = NodeParents(i,1) and $j_2$ = NodeParents(i,2), then $z_i$ is the midpoint of the line segment with endpoints $z_{j_1}, z_{j_2}$. This means that
$$z_i = \frac{1}{2}\left(z_{j_1} + z_{j_2}\right).$$
The code itself (Refine1.m) is carefully documented and the interested reader can consult the source file to see how it is implemented.
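The use of NodeParents for transferring a piecewise linear function from a coarse mesh to its refinement can be sketched as follows. This Python fragment is an illustration only and is not taken from Refine1.m. The function name `prolong` is hypothetical, and it is assumed that the rows of NodeParents belonging to nodes of the coarse mesh are never read (they are filled with zeros here). Because the function is linear along each coarse edge, its value at a new node is the average of its values at the two parents.

```python
def prolong(u_coarse, node_parents, n_fine):
    """Nodal values on the refined mesh of a piecewise linear coarse-mesh function."""
    n_coarse = len(u_coarse)
    # Coarse nodes come first in the refined mesh and keep their values.
    u_fine = list(u_coarse) + [0.0] * (n_fine - n_coarse)
    for i in range(n_coarse + 1, n_fine + 1):      # new nodes, 1-based numbering
        j1, j2 = node_parents[i - 1]
        u_fine[i - 1] = 0.5 * (u_fine[j1 - 1] + u_fine[j2 - 1])
    return u_fine


# One triangle with nodes 1, 2, 3; refinement adds the midpoints 4, 5, 6.
node_parents = [(0, 0), (0, 0), (0, 0), (1, 2), (2, 3), (3, 1)]
u_fine = prolong([0.0, 2.0, 4.0], node_parents, n_fine=6)
print(u_fine)
```

For several levels of refinement, the same function can be applied level by level, using the node counts stored in LevelNodes to decide where each level begins.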
6.3.2 Generating a mesh from a triangle-node list
Meshes can be generated by many different methods; indeed, mesh generation itself is an active area of research. Any mesh generation tool (that creates a triangulation) will, at the least, produce a triangle-node description of the mesh. For example, MATLAB itself provides a routine called delaunay that takes a list of nodes and produces the Delaunay triangulation, defining the triangles precisely by the triangle-node list. The Delaunay triangulation has the property that the circumscribed circle around each triangle does not enclose any nodes except those of the given triangle. This gives one way to create a mesh: Define a list of nodes on the domain and then compute the Delaunay triangulation for those nodes.

A simple MATLAB mesh generator distmesh2d is due to Persson and Strang [33] (currently available at http://math.mit.edu/~persson/mesh). It is based on two user-defined functions. The first function $d$ defines the domain by giving the signed distance from any point $(x, y)$ to the boundary (negative if $(x, y)$ is inside the domain) so that the boundary is defined by $d(x, y) = 0$. The second function $h$ defines the desired triangle diameters: The relative diameter of a triangle containing the point $(x, y)$ should be about $h(x, y)$ (and thus $h(x, y) = 1$ would give an approximately uniform mesh). The distmesh2d routine also describes the resulting mesh by the triangle-node list.

To allow a broader use of the code that accompanies this book, I wrote a routine, MakeMesh1, that takes a mesh described by a triangle-node list and generates the data structure described in this chapter. The following example illustrates some possibilities.

EXAMPLE 6.1. Suppose an (approximate) triangulation of the unit circle is needed. Here I present the results of three different approaches.
Figure 6.3. A very coarse mesh on the unit circle (left) and the result of refining it three times (right).
Figure 6.4. A set of nodes on the unit circle (left) and its Delaunay triangulation (right).
In the first method, I define a (very) coarse mesh consisting of four triangles; this is shown in Figure 6.3 (left). This mesh is then refined three times, resulting in the mesh shown on the right in Figure 6.3. This construction is explained in detail in Example 6.6 below. Another way to proceed is to define nodes uniformly spaced on the boundary, together with those nodes on a regular rectangular grid that lie in the interior of the circle (but not too close to the boundary). Such a set of nodes is shown in Figure 6.4 (left). The delaunay routine in MATLAB then computes the Delaunay triangulation, which is also shown in Figure 6.4 (right). Finally, the distmesh2d routine described above, when asked for a uniform triangulation, yields the mesh shown in Figure 6.5. The three meshes generated in this example have 256, 263, and 251 triangles, respectively.
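The central step in converting a triangle-node list (such as the output of delaunay or distmesh2d) into the data structure of this chapter is the construction of the Edges and Elements arrays. The following Python sketch shows one way this might be done; it is not the MakeMesh1 routine itself, the function name `edges_from_triangles` is hypothetical, and it is assumed that the vertices of every triangle are listed in counterclockwise order with 1-based node indices.

```python
def edges_from_triangles(triangles):
    """Build Edges and signed Elements from a triangle-node list."""
    edge_index = {}            # unordered endpoint pair -> edge number (1-based)
    edges, elements = [], []
    for tri in triangles:
        row = []
        for j in range(3):
            a, b = tri[j], tri[(j + 1) % 3]       # edge traversed from a to b
            key = (min(a, b), max(a, b))
            if key not in edge_index:
                edges.append((a, b))              # orientation of the first traversal
                edge_index[key] = len(edges)
            e = edge_index[key]
            # Positive if this triangle follows the stored orientation.
            row.append(e if edges[e - 1] == (a, b) else -e)
        elements.append(tuple(row))
    return edges, elements


# Unit square split along the diagonal joining nodes 1 and 4.
edges, elements = edges_from_triangles([(1, 2, 4), (1, 4, 3)])
print(edges)
print(elements)
```

An edge that is met only once in this loop lies on the boundary; deciding whether such an edge is free or constrained requires additional input describing the boundary conditions, which a triangle-node list alone does not carry.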
Figure 6.5. A triangulation of the unit circle generated by distmesh2d.
Figure 6.6. An intentionally bad initial mesh on the unit circle (left) and the result of refining it four times (right).
6.3.3 Assessing the quality of a triangulation
Not every way of generating a triangulation produces equally good results. For example, Figure 6.6 shows an initial mesh for the unit circle that consists of a single triangle, and the result of refining it four times; the final mesh has 256 triangles. A close-up of part of the mesh is shown in Figure 6.7, which shows that some of the triangles are quite distorted. This suggests that this mesh may not be satisfactory in the finite element method. EXAMPLE 6.2. The effect of the mesh on the finite element method will be illustrated on the BVP
Figure 6.7. A close-up of the final mesh from Figure 6.6.

where $\Omega$ is (bounded by) the unit circle and $f$ and $g$ are chosen so that the exact solution is $u(x, y) = \sin(x)\,y^2$. The solution was computed on the meshes $\mathcal{T}_1, \mathcal{T}_2, \mathcal{T}_3, \mathcal{T}_4$ shown in Figures 6.3, 6.4, 6.5, and 6.6, respectively, yielding the following results:

Mesh     T1      T2      T3      T4
Error    0.154   0.152   0.132   0.170
These results suggest that $\mathcal{T}_3$ is the best of the four meshes, while $\mathcal{T}_4$ is the worst.

The a priori error bounds presented in Chapter 5 all contain unknown constants, which, however, are known to depend on $\rho$, the lower bound on $d_T/\mathrm{diam}(T)$ for $T \in \mathcal{T}_h$ (cf. (5.1) and (5.2)). The smaller the value of $\rho$, that is, the further the worst triangles are from equilateral, the larger the bound on the error and, presumably, the larger the error itself. Thus the quality of a mesh can be measured by computing
$$q_1(T) = \sqrt{3}\,\frac{d_T}{\mathrm{diam}(T)}$$
for each $T \in \mathcal{T}_h$. The factor of $\sqrt{3}$ is included so that an equilateral triangle (the ideal) has a quality measure of 1. The computation of $d_T$ was discussed in Exercise 5.6.2.

A related measure of the quality of a triangle is two times the ratio of the radius of the inscribed circle to the radius of the circumscribed circle. The factor of 2 is included so that an equilateral triangle has a quality measure of 1 in this case also. If the lengths of the three sides of $T$ are $b_1, b_2, b_3$, then
$$q_2(T) = 2\,\frac{r_{\mathrm{insc}}}{r_{\mathrm{circ}}} = \frac{(b_2 + b_3 - b_1)(b_3 + b_1 - b_2)(b_1 + b_2 - b_3)}{b_1 b_2 b_3},$$
where $r_{\mathrm{insc}}$ is the radius of the inscribed circle and $r_{\mathrm{circ}}$ is the radius of the circumscribed circle (see Exercise 2).
A third quality measure for a triangle, $q_3(T)$, is the measure of the smallest angle, divided by $\pi/3$ (once again, so that an equilateral triangle will have quality 1).

EXAMPLE 6.3. To illustrate the above quality measures, the quality measures $q_1, q_2, q_3$ were computed for the meshes $\mathcal{T}_1, \mathcal{T}_2, \mathcal{T}_3, \mathcal{T}_4$ from Example 6.2. The minimum and mean quality,
$$\min_{T \in \mathcal{T}_h} q_i(T),\qquad \frac{1}{N_t}\sum_{T \in \mathcal{T}_h} q_i(T),$$
are given in the following table:

Mesh   min q1   mean q1   min q2    mean q2   min q3   mean q3
T1     0.701    0.8944    0.806     0.916     0.568    0.792
T2     0.588    0.744     0.658     0.848     0.572    0.752
T3     0.620    0.917     0.702     0.971     0.637    0.889
T4     0.0428   0.604     0.00572   0.656     0.0341   0.571
Mesh $\mathcal{T}_3$ clearly has the best average triangle quality, while $\mathcal{T}_4$ is the worst from the standpoint of both average and minimum triangle quality. Comparing these results with the errors in the finite element solutions in Example 6.2 suggests that average triangle quality is the best indicator of the quality of the mesh.

The MATLAB implementation includes a MeshQuality1 routine that computes $q_1$, $q_2$, or $q_3$ for the triangles in a given mesh. MATLAB has a help command for getting details about built-in and user-defined commands; one can type, for example, "help MeshQuality1" at the MATLAB prompt to get details about the MeshQuality1 command. The help command can be used to get the precise usage and other details about all of the MATLAB routines that accompany this book.
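The three quality measures are easy to compute from the vertex coordinates of a triangle. The Python sketch below is an illustration only and is not the MeshQuality1 routine. The function names are hypothetical, and $d_T$ is assumed to be the diameter of the inscribed circle, which is the interpretation under which the factor $\sqrt{3}$ gives an equilateral triangle quality 1.

```python
import math


def _angle(opposite, a, b):
    """Angle opposite the side of length `opposite` (law of cosines)."""
    c = (a * a + b * b - opposite * opposite) / (2.0 * a * b)
    return math.acos(max(-1.0, min(1.0, c)))


def triangle_quality(p1, p2, p3):
    """Return (q1, q2, q3) for the triangle with vertices p1, p2, p3."""
    b1 = math.hypot(p3[0] - p2[0], p3[1] - p2[1])
    b2 = math.hypot(p1[0] - p3[0], p1[1] - p3[1])
    b3 = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    s = 0.5 * (b1 + b2 + b3)                                   # semiperimeter
    area = math.sqrt(max(s * (s - b1) * (s - b2) * (s - b3), 0.0))  # Heron's formula
    d_t = 2.0 * area / s                       # diameter of the inscribed circle
    q1 = math.sqrt(3.0) * d_t / max(b1, b2, b3)
    q2 = (b2 + b3 - b1) * (b3 + b1 - b2) * (b1 + b2 - b3) / (b1 * b2 * b3)
    smallest = min(_angle(b1, b2, b3), _angle(b2, b3, b1), _angle(b3, b1, b2))
    q3 = smallest / (math.pi / 3.0)
    return q1, q2, q3


def mesh_quality(nodes, triangles, which):
    """Minimum and mean of q_which (1, 2, or 3) over a triangle-node list (1-based)."""
    vals = [triangle_quality(*(nodes[n - 1] for n in tri))[which - 1] for tri in triangles]
    return min(vals), sum(vals) / len(vals)


print(triangle_quality((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)))
print(triangle_quality((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))
```

All three measures lie between 0 and 1 and approach 0 for a degenerate (sliver) triangle, but they penalize distortion at different rates, which is consistent with the differing minima reported for the same mesh in the table above.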
6.3.4 Viewing a mesh
To visualize a mesh stored in the data structure described above, I wrote a routine called ShowMesh1, which takes as input a mesh and (optionally) an array of flags indicating how the mesh is to be labeled. The nodes, edges, and triangles can be labeled in the figure. The following examples show how to define and plot meshes.
EXAMPLE 6.4. Let $\Omega$ be the region bounded by the triangle with vertices (0, 0), (1, 0), and (0, 1), and assume that Dirichlet conditions are to be imposed on the left edge (joining (0, 0) and (0, 1)) and Neumann conditions elsewhere. The following quantities define a triangulation of $\Omega$ consisting of a single triangle:
Figure 6.8. A coarse mesh and three successive refinements (see Example 6.4). (The nodes and triangles are labeled in (a) and (b) but not in the others.)
This coarse mesh was refined three times by repeated application of the command T = Refine1(T). Figure 6.8 shows the original mesh and the three successive refinements. Plot (a) shows results from the command ShowMesh1(T,[1,0,1,0]). The first three flags accepted by ShowMesh1 determine whether the nodes, edges, and/or triangles are enumerated, and the fourth flag determines whether the nodes are unmarked (flag = 0), marked with a dot (flag = 1), or marked with a circle for a free node and a star for a constrained node (flag = 2). Thus, ShowMesh1(T,[1,0,1,0]) means that the nodes and triangles are enumerated. Plot (b) in Figure 6.8 was produced by the same command, while (c) and (d) were produced by ShowMesh1(T), which is equivalent to ShowMesh1(T,[0,0,0,0]).
EXAMPLE 6.5. Let $\Omega$ be the region bounded by the pentagon with vertices (0, 0), (1, 0), (1.5, 0.5), (0.5, 1.5), and (0, 1), and assume that Dirichlet conditions are to be imposed on all the edges except the edge $e$ with endpoints (1.5, 0.5) and (0.5, 1.5), on which Neumann
conditions are imposed. In order that the triangles have the same area (which is not a requirement), I put nodes at (0.5, 0.5) and (1, 1), in addition to the five vertices. My initial triangulation of $\Omega$ has seven nodes and six triangles:
Next I refine this coarse mesh three times with the command T = Refine1(T). Figure 6.9 shows the original mesh and the three successive refinements. The edges are enumerated in the coarsest mesh, and the free and constrained nodes are distinguished in the second mesh. The commands to plot the first two meshes are ShowMesh1(T,[0,1,0,0]) and ShowMesh1(T,[0,0,0,2]), respectively.
Figure 6.9. A coarse mesh and three successive refinements. The edges are labeled in the coarsest mesh. In the second mesh, the free nodes are indicated by a small circle and the constrained nodes by a small star.
Figure 6.10. Refining an edge along a curved boundary.
6.3.5 Handling a domain with a curved boundary
I briefly described above how a mesh approximating a domain with a curved boundary can be refined. The mesh data structure includes an array EdgeCFlags, which indicates whether an edge is approximating a piece of a curved boundary or not. If it is, then during refinement the curve, rather than the edge, is bisected to generate the new "midpoint." The edge is still replaced with two (straight) edges, but these edges now follow the boundary. The process is illustrated in Figure 6.10.
If any EdgeCFlags is nonzero, a "boundary function" must be included as the field BndyFcn in the mesh data structure. This function is called whenever an edge approximating an arc of a curved boundary is to be refined; the endpoints of the edge are passed in, and the boundary function must return the midpoint of the corresponding arc of the boundary curve.
EXAMPLE 6.6. Let $\Omega$ be the unit circle, and assume that Dirichlet conditions are to be imposed everywhere on the boundary. Using the strategy described above, I can begin with a coarse mesh (consisting of only four triangles, a poor approximation to $\Omega$) and obtain a good approximate mesh after a few refinements. Here is the initial mesh, which is shown in Figure 6.11:
Since $\Omega$ has a curved boundary, T.BndyFcn is set to 'Circlef'; Circlef.m defines a boundary function, of the type described above, for circles centered at the origin. I refine this coarse mesh three times. Figure 6.11 shows the original mesh and the three successive refinements.
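To make the role of a boundary function concrete, here is a minimal Python sketch of one for circles centered at the origin. It is only an illustration of the idea, not a transcription of Circlef.m: the name `circle_arc_midpoint` and the calling convention (two endpoints in, one point out) are assumptions. The midpoint of the shorter arc is obtained by pushing the midpoint of the chord radially out to the circle.

```python
import math

def circle_arc_midpoint(p, q):
    """Midpoint of the shorter arc joining p and q on a circle centered at the origin.

    p and q are (x, y) pairs assumed to lie on the same circle.
    """
    radius = math.hypot(p[0], p[1])
    # Midpoint of the straight edge (the chord) joining p and q.
    mx, my = (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0
    m = math.hypot(mx, my)
    if m == 0.0:
        # Antipodal endpoints: both half-circles are candidates.
        raise ValueError("arc midpoint is ambiguous for antipodal endpoints")
    # Scale the chord midpoint radially so that it lies on the circle.
    return (radius * mx / m, radius * my / m)
```

With endpoints (1, 0) and (0, 1) on the unit circle, the chord midpoint (0.5, 0.5) is replaced by the point on the circle in the same direction, so the two new edges follow the boundary more closely than the original edge did.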
6.3.6 Viewing a piecewise linear function
When working with the finite element code presented in this book, it is helpful to be able to graph a continuous piecewise linear function defined on a triangular mesh. I have provided a routine, ShowPWLinFcn1, for this purpose. In order to use ShowPWLinFcn1, it is necessary to specify the nodal values of the desired function at all nodes in the mesh. The calling sequence is
ShowPWLinFcn1(T,u,g)
The input T is the mesh data structure, u is a vector containing the nodal values corresponding to the free nodes, and g is a vector containing the nodal values corresponding to the constrained nodes. Thus u has $N_f$ components and g has $N_c$ components. If the values at the constrained nodes are all zero, or if every node is free, then it is not necessary to provide the vector g.
Figure 6.11. A coarse mesh and three successive refinements.
EXAMPLE 6.7. Consider the function $u(x, y) = x^2 + y^2$ defined on the unit disk. If T is the third mesh from Figure 6.11, then the following MATLAB commands evaluate the function u at the nodes of the mesh and plot the resulting piecewise linear function. The first line uses the built-in MATLAB functions inline and vectorize to define u as a function of two variables, while the second line evaluates u at the nodes of the mesh T, producing the free and constrained nodal values. The graph produced by the third line is shown in Figure 6.12.
Figure 6.12. Continuous piecewise linear function graphed using ShowPWLinFcn1 (see Example 6.7).
6.3.7 MATLAB functions
Functions that deal exclusively with meshes of linear triangles have a name ending with "1." The reader should recall that the MATLAB command help fcnname will display information about the function named fcnname, which is implemented in the file fcnname.m.
• Mesh1: Type help Mesh1 to see a description of the mesh data structure.
• Generating sample meshes:
  - RectangleMeshD1 Generates a uniform mesh on a rectangle $[0, \ell_x] \times [0, \ell_y]$ (Dirichlet conditions).
  - RectangleMeshN1 Generates a uniform mesh on a rectangle $[0, \ell_x] \times [0, \ell_y]$ (Neumann conditions).
  - RectangleMeshTopD1 Generates a uniform mesh on a rectangle $[0, \ell_x] \times [0, \ell_y]$ (Dirichlet conditions on the top edge, Neumann conditions elsewhere). Similar meshes are RectangleMeshLeftN1, RectangleMeshTopLeftD1.
  - CoarseCircleMeshD1, CoarseCircleMeshN1 Generate a coarse mesh (four triangles) on the unit circle (Dirichlet, Neumann conditions, respectively).
  - CoarseSemiCircleMeshD1, CoarseSemiCircleMeshBottomD1 Generate a coarse mesh (two triangles) on the unit semicircle (Dirichlet, mixed conditions, respectively).
  - CoarseEllipseMeshD1, CoarseEllipseMeshN1 Generate a coarse mesh (four triangles) on the ellipse $x^2 + y^2/4 = 1$ (Dirichlet, Neumann conditions, respectively).
  - NGonMeshD1 Generates a coarse mesh on a regular n-gon of area one (Dirichlet conditions). Suitable for n not too large (if n is large, say much bigger than 10, the mesh quality is poor).
  Some of the Rectangle routines above have a "1a" version, which generates a mesh according to the pattern on the right in Figure 6.13, instead of the standard pattern shown on the left (see Exercise 7.6.11).
• Refine1 Standard refinement of a given mesh.
• MakeMesh1 Assembles the mesh data structure from a triangle-node list.
• MakeEdgesCurved1 Labels the boundary edges of a given mesh as curved.
• NeumannMesh Converts the boundary conditions on a mesh to pure Neumann.
• ShowMesh1 Graphs a mesh.
• MeshQuality1 Computes the measures of mesh quality discussed in Section 6.3.3.
• JiggleMesh1 Improves mesh quality by moving the interior nodes.¹⁰
• ShowSupport1 Graphs a mesh and shades the support of a given basis function.
• ShowPWLinFcn1 Graphs a continuous piecewise linear function on a given mesh.
• ShowPWConstFcn Graphs a piecewise constant function on a given mesh.
Figure 6.13. The standard pattern of triangles for a uniform mesh (left) and an alternate pattern (right).
6.3.8 A summary of the notation
I have now introduced a large amount of notation, all of which is necessary to describe the mesh precisely. For convenient reference, I summarize it here. Where appropriate, I also indicate where in the mesh data structure the corresponding information is to be found.
¹⁰ JiggleMesh1 was adapted from a similar routine in the MATLAB PDE Toolbox.
Number of triangles: $N_t$
Number of edges: $N_e$
Number of nodes: $N_v$
Number of free nodes: $N_f$
Number of constrained nodes: $N_c$
Number of free boundary edges: $N_b$
Triangles: $T_1, T_2, \ldots, T_{N_t}$
Edges: $e_1, e_2, \ldots, e_{N_e}$
Edges belonging to triangle $T_i$: $e_{s_{i,1}}, e_{s_{i,2}}, e_{s_{i,3}}$ (Elements)
Triangles to which edge $e_i$ belongs: $T_{t_{i,1}}, T_{t_{i,2}}$ ($t_{i,2}$ is undefined if $e_i$ is a boundary edge) (EdgeEls)
Nodes: $z_1, z_2, \ldots, z_{N_v}$, $z_j = (x_j, y_j)$ (Nodes)
Nodes belonging to edge $e_i$: $z_{k_{i,1}}, z_{k_{i,2}}$ (Edges)
Nodes belonging to triangle $T_i$: $z_{n_{i,1}}, z_{n_{i,2}}, z_{n_{i,3}}$
Free nodes: $z_{f_1}, z_{f_2}, \ldots, z_{f_{N_f}}$ (FNodePtrs)
Inverse of $j \mapsto f_j$: $i \mapsto p_i$ ($p_i = j \Leftrightarrow i = f_j$) (NodePtrs)
Constrained nodes: $z_{c_1}, z_{c_2}, \ldots, z_{c_{N_c}}$ (CNodePtrs)
Inverse of $j \mapsto c_j$: $i \mapsto q_i$ ($q_i = j \Leftrightarrow i = c_j$) (NodePtrs)
Nodal basis functions: $\psi_1, \psi_2, \ldots, \psi_{N_v}$
Nodal basis functions corresponding to free nodes: $\phi_1, \phi_2, \ldots, \phi_{N_f}$, $\phi_j = \psi_{f_j}$
Free boundary edges: $e_{b_1}, e_{b_2}, \ldots, e_{b_{N_b}}$ (FBndyEdges)
Inverse of $i \mapsto b_i$: $j \mapsto a_j$ ($a_j = i \Leftrightarrow j = b_i$) (EdgeEls)
6.4 Exercises for Chapter 6
1. (a) Consider a family of uniform triangulations on the unit square (such as the mesh shown in Figure 4.5). Give asymptotic expressions for $N_t$ and $N_e$ (the numbers of triangles and edges, respectively) in terms of $N_v$ (the number of nodes). These should be of the form
$$N_t \approx c_t N_v, \qquad N_e \approx c_e N_v; \tag{6.2}$$
the problem is to find the constants $c_t$ and $c_e$.
(b) Now consider a family of meshes obtained by standard refinement from an initial mesh. Do (6.2) still hold for the same constants $c_t$, $c_e$? Why or why not?
(c) Using the results from above and assuming integers require two bytes of storage and double precision numbers require eight bytes of storage, estimate the amount of storage used by the mesh data structure in terms of $N_v$, the number of nodes.
(d) Consider a family of meshes as described in part 1(b). About how many nonzeros will the stiffness matrix K have per row? In total?
(e) A typical (double precision) sparse matrix storage scheme will require about 12 bytes of storage per nonzero. (It could be a little less; MATLAB uses a little more.) In terms of $N_v$, about how much storage does the stiffness matrix K require?
2. Let $T$ be a triangle with side lengths $b_1, b_2, b_3$, and let $r_{\mathrm{insc}}$ and $r_{\mathrm{circ}}$ be the radii of the inscribed and circumscribed circles, respectively. Show that
$$\frac{2\,r_{\mathrm{insc}}}{r_{\mathrm{circ}}} = \frac{(b_2+b_3-b_1)(b_3+b_1-b_2)(b_1+b_2-b_3)}{b_1 b_2 b_3}.$$
3. (MATLAB) Suppose $\Omega$ is the annulus (that is, the region bounded by two concentric circles) centered at the origin, with inner radius $r = 0.5$ and outer radius $R = 1$. Using any of the tools mentioned in Section 6.2 (Refine1, delaunay, distmesh2d), create a triangulation of $\Omega$ having a good mesh quality.
4. (MATLAB) Repeat the previous exercise when $\Omega$ is the region bounded by the parabola $y = x^2$ and the line $y = 1$.
5. (MATLAB) Repeat the previous exercise when $\Omega$ is the region bounded by the ellipse
Chapter 7
Programming the finite element method: Linear Lagrange triangles
This chapter discusses programs for solving model problem (6.1) using piecewise linear finite elements and the mesh data structure introduced in the previous chapter. Some details about quadrature and the evaluation of basis functions and their gradients are presented in Section 7.1. Then the assembly of the stiffness matrix and of the load vector is explained in Sections 7.2 and 7.3, respectively. Section 7.4 presents various examples of (6.1), solved using the MATLAB version of these algorithms.
7.1 Quadrature
There is one more technical matter to discuss before I can explain in detail the algorithms for assembling the stiffness matrix and load vector, namely, the computation of the integrals that define these quantities. As I discussed in Section 5.5, integrals such as
$$\int_{T_k} \kappa\,\nabla\phi_j \cdot \nabla\phi_i \quad\text{and}\quad \int_{T_k} f\,\phi_i$$
are usually estimated by quadrature rather than computed exactly. Section 5.5 presented the necessary theory: If the quadrature rule is exact for polynomials of degree $2d - 2$, then the convergence rate for Lagrange triangles of degree $d$ will be just as good as if the integrals were computed exactly.
7.1.1 Gaussian quadrature
One-dimensional Gaussian quadrature
I begin by describing Gaussian quadrature for one-dimensional integrals. This will simplify the introduction, and these rules will also be useful for the (one-dimensional) boundary integrals arising in a problem with inhomogeneous Neumann conditions. Moreover, quadrature rules over squares, useful for quadrilateral elements, are based directly on one-dimensional Gaussian quadrature rules.
It is standard to define Gaussian quadrature rules for the reference interval $[-1, 1]$; that is, the rules are developed for integrals of the form
$$\int_{-1}^{1} f(t)\,dt.$$
General integrals of the form
$$\int_a^b f(x)\,dx$$
are then handled by a linear change of variables, which is described below. The best-known quadrature rules are probably the trapezoidal rule and Simpson's rule. The simple trapezoidal rule is based on linear interpolation: The integrand $f$ is approximated by the linear function $\ell$ agreeing with $f$ at the endpoints $-1$ and $1$ of the interval of integration. The result is¹¹
$$\int_{-1}^{1} f(t)\,dt \approx \int_{-1}^{1} \ell(t)\,dt = f(-1) + f(1).$$
Since the trapezoidal rule is based on linear interpolation, it is obviously exact for every polynomial $f$ of degree one or less ($\ell = f$ for such an $f$). Simpson's rule is based on quadratic interpolation. The integrand $f$ is approximated by the quadratic polynomial $q$ that interpolates $f$ at $-1$, $0$, and $1$. The result is¹²
$$\int_{-1}^{1} f(t)\,dt \approx \int_{-1}^{1} q(t)\,dt = \frac{1}{3}f(-1) + \frac{4}{3}f(0) + \frac{1}{3}f(1).$$
Since Simpson's rule is based on quadratic interpolation, it is exact for any polynomial $f$ of degree 2 or less. In fact, though, more is true: Simpson's rule is actually exact for polynomials of degree up to 3. In short, Simpson's rule has degree of precision 3 (while the trapezoidal rule has degree of precision 1). This extra degree of accuracy, together with the simplicity of Simpson's rule, makes it a popular quadrature rule.
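¹¹ The reader has probably seen the composite trapezoidal rule applied to
$$\int_a^b f(x)\,dx,$$
which is derived by dividing the interval $[a, b]$ into $n$ subintervals $[x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$, where $x_i = a + i(b - a)/n$, and applying the trapezoidal rule on each subinterval. Summing the results yields the familiar formula
$$\int_a^b f(x)\,dx \approx \frac{b-a}{2n}\left[f(x_0) + 2f(x_1) + \cdots + 2f(x_{n-1}) + f(x_n)\right].$$
¹² Again, the reader is probably more familiar with the composite Simpson's rule, which is derived in the same way as the composite trapezoidal rule and takes the form
$$\int_a^b f(x)\,dx \approx \frac{b-a}{3n}\left[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)\right]$$
(n even).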
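The degrees of precision quoted for the trapezoidal rule and Simpson's rule on $[-1, 1]$ can be checked mechanically by applying each rule to the monomials $1, t, t^2, \ldots$ and comparing with the exact integrals. The following Python sketch does this in exact rational arithmetic; the helper names are hypothetical, and the nodes and weights are the ones of the two simple rules above (nodes $-1, 1$ with weights $1, 1$, and nodes $-1, 0, 1$ with weights $1/3, 4/3, 1/3$).

```python
from fractions import Fraction as Fr

def exact_monomial(k):
    """Exact value of the integral of t**k over [-1, 1]."""
    return Fr(0) if k % 2 else Fr(2, k + 1)

def apply_rule(nodes, weights, k):
    """Value of the quadrature rule applied to t**k."""
    return sum(w * t ** k for t, w in zip(nodes, weights))

def degree_of_precision(nodes, weights, kmax=10):
    """Largest d such that the rule is exact for 1, t, ..., t**d (checked up to kmax)."""
    d = -1
    for k in range(kmax + 1):
        if apply_rule(nodes, weights, k) != exact_monomial(k):
            break
        d = k
    return d

TRAPEZOID = ([Fr(-1), Fr(1)], [Fr(1), Fr(1)])
SIMPSON = ([Fr(-1), Fr(0), Fr(1)], [Fr(1, 3), Fr(4, 3), Fr(1, 3)])

if __name__ == "__main__":
    print("trapezoidal rule:", degree_of_precision(*TRAPEZOID))
    print("Simpson's rule:  ", degree_of_precision(*SIMPSON))
```

Because integration and the quadrature rule are both linear in the integrand, exactness on the monomials up to degree $d$ is the same as exactness on all polynomials of degree at most $d$, which is why testing monomials suffices.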
Having observed this extra degree of precision displayed by Simpson's rule, it is natural to pursue the following idea: to derive quadrature rules directly from the requirement that they be exact on polynomials of as high a degree as possible. The trapezoidal rule, Simpson's rule, and similar numerical integration formulas have the form
$$\int_{-1}^{1} f(t)\,dt \approx \sum_{i=1}^{n} w_i f(t_i), \tag{7.1}$$
where $t_1, t_2, \ldots, t_n \in [-1, 1]$ are called the quadrature nodes and $w_1, w_2, \ldots, w_n$ are the corresponding weights. For the trapezoidal rule, the nodes are $t_1 = -1$, $t_2 = 1$ and the weights are $w_1 = 1$, $w_2 = 1$. The nodes for Simpson's rule are $t_1 = -1$, $t_2 = 0$, $t_3 = 1$ and the weights are $w_1 = 1/3$, $w_2 = 4/3$, $w_3 = 1/3$. A quadrature rule of the form (7.1) is called an n-point rule. The general n-point quadrature rule depends on $2n$ parameters, namely, the $n$ nodes and the $n$ weights. It is reasonable to expect that the $2n$ degrees of freedom could be chosen so that the rule has degree of precision $2n - 1$, since a polynomial of degree $2n - 1$ is determined by $2n$ coefficients. To put it a different way, since both integration and the n-point rule are linear in the integrand, a quadrature rule is exact for polynomials of degree at most $d$ if and only if it is exact on the $d + 1$ monomials $1, t, t^2, \ldots, t^d$. An n-point rule, having $2n$ parameters $t_1, t_2, \ldots, t_n, w_1, w_2, \ldots, w_n$, can be expected to satisfy $2n$ equations:
$$\sum_{i=1}^{n} w_i t_i^k = \int_{-1}^{1} t^k\,dt, \quad k = 0, 1, \ldots, 2n - 1.$$
In fact, this is possible for all $n$, and the resulting quadrature rules are called the Gaussian quadrature rules. For example, the one-point Gaussian quadrature rule is
$$\int_{-1}^{1} f(t)\,dt \approx 2f(0)$$
(the midpoint rule), and the two-point rule is
$$\int_{-1}^{1} f(t)\,dt \approx f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right).$$
The one-point Gaussian quadrature rule is as accurate, from the point of view of integrating polynomials, as the two-point trapezoidal rule, and the two-point Gaussian quadrature rule is as accurate as the three-point Simpson's rule. The one- and two-point Gaussian quadrature rules are easily derived directly from the conditions given above, but for $n > 2$ this is not so easy. The existence and uniqueness of the general n-point Gaussian quadrature rule is proved using the theory of orthogonal polynomials, and the details are not relevant for this book.
The last matter that I wish to discuss concerning one-dimensional integrals is the question of transforming the general integral
$$\int_a^b f(x)\,dx$$
into the special form
$$\int_{-1}^{1} g(t)\,dt.$$
This is accomplished using a linear change of variables:
$$x = \frac{a+b}{2} + \frac{b-a}{2}\,t.$$
The result is
$$\int_a^b f(x)\,dx = \frac{b-a}{2}\int_{-1}^{1} f\left(\frac{a+b}{2} + \frac{b-a}{2}\,t\right)dt.$$
Applying the n-point quadrature rule (7.1) yields
$$\int_a^b f(x)\,dx \approx \frac{b-a}{2}\sum_{i=1}^{n} w_i f(x_i),$$
where
$$x_i = \frac{a+b}{2} + \frac{b-a}{2}\,t_i, \quad i = 1, 2, \ldots, n.$$
Thus, to apply the n-point rule, it is necessary to compute the quadrature nodes on $[a, b]$ from the nodes on the reference interval $[-1, 1]$ and also compute the Jacobian factor $(b - a)/2$, which is constant since the change of variables is linear.
Gaussian-type quadrature on triangles
In keeping with the development of one-dimensional Gaussian quadrature in the previous section, I will begin by examining quadrature rules for integrals over the reference triangle (sometimes called the master element) $T_R$ having vertices (0, 0), (1, 0), and (0, 1). An n-point rule now takes the form
$$\int_{T_R} f(s,t)\,ds\,dt \approx \sum_{i=1}^{n} w_i f(s_i, t_i),$$
and it is desired to choose the quadrature nodes $(s_1, t_1), (s_2, t_2), \ldots, (s_n, t_n)$ and the weights $w_1, w_2, \ldots, w_n$ so that the rule is exact for polynomials of as high a degree as possible. The monomials are $1, s, t, s^2, st, t^2, \ldots$, and the goal is to integrate as many of them exactly as is possible. With a one-point rule, it ought to be easy to integrate constant functions (that is, the monomial 1) exactly. A one-point rule is defined by three parameters, $s_1$, $t_1$, $w_1$, and a constant function imposes only one constraint. Indeed, with $f(s, t) = 1$,
$$\int_{T_R} f(s,t)\,ds\,dt = \frac{1}{2}$$
and
$$w_1 f(s_1, t_1) = w_1.$$
Therefore, $(s_1, t_1)$ can be any point in $T_R$ as long as the weight is correct: $w_1 = 1/2$. Any such rule has degree of precision 0. (But the reader should note that it is still necessary to evaluate the integrand at some point in $T_R$, since constant functions usually do not have value 1. In other words, one cannot use a zero-point rule.) In order that a rule have degree of precision 1, it must satisfy three constraints: It must integrate $1$, $s$, and $t$ exactly. A one-point rule, having three parameters, should work here as well. The three conditions are
$$w_1 = \frac{1}{2}, \qquad w_1 s_1 = \frac{1}{6}, \qquad w_1 t_1 = \frac{1}{6},$$
and the unique solution is $s_1 = t_1 = 1/3$, $w_1 = 1/2$ (see Exercise 1). The point (1/3, 1/3) is the centroid of the reference triangle $T_R$. To integrate all second-degree polynomials exactly implies satisfying six constraints (namely, that $1, s, t, s^2, st$, and $t^2$ are integrated exactly), and it would appear that a two-point rule, having six degrees of freedom, would suffice. However, it is here that the situation begins to break down, for, in fact, there is no two-point rule with degree of precision 2. The reader can write down the six (nonlinear) equations and, with a little effort, show that there is no solution (see Exercise 2). A three-point rule is required to integrate all second-degree polynomials exactly, but there is no uniqueness. In Table 7.1, I give two three-point rules for the reference triangle $T_R$, with degree of precision 2. Rules with higher precision are presented in Chapter 8 in the context of Lagrange triangles of degree greater than 1. The one-point rule suffices for linear Lagrange triangles, at least for computing the stiffness matrix and load vector. In Exercise 16, the reader is asked to write a program to compute the mass matrix, which was introduced in Exercise 4.8.17.
Since the basis functions $\phi_i$ and $\phi_j$, restricted to each triangle, are linear, the integrand in each entry of the mass matrix is at least quadratic and a rule having degree of precision 2 is required.
            Rule 1        Rule 2
(s1, t1)    (1/6, 1/6)    (1/2, 0)
(s2, t2)    (2/3, 1/6)    (1/2, 1/2)
(s3, t3)    (1/6, 2/3)    (0, 1/2)
w1          1/6           1/6
w2          1/6           1/6
w3          1/6           1/6
Table 7.1. Two three-point quadrature rules for $T_R$, each with degree of precision 2.
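The claims about the one-point rule and the two three-point rules can be verified in exact arithmetic. The Python sketch below (helper names hypothetical) uses the standard formula $\int_{T_R} s^a t^b\,ds\,dt = a!\,b!/(a+b+2)!$ for the reference triangle and tests each rule on all monomials of a given total degree. The three-point nodes are those of Table 7.1, with all weights taken to be 1/6 so that they sum to the area 1/2 of $T_R$.

```python
from fractions import Fraction as Fr
from math import factorial

def exact_monomial(a, b):
    """Exact integral of s**a * t**b over the reference triangle T_R."""
    return Fr(factorial(a) * factorial(b), factorial(a + b + 2))

# Each rule is a list of ((s_i, t_i), w_i) pairs on T_R.
CENTROID = [((Fr(1, 3), Fr(1, 3)), Fr(1, 2))]
RULE1 = [((Fr(1, 6), Fr(1, 6)), Fr(1, 6)),
         ((Fr(2, 3), Fr(1, 6)), Fr(1, 6)),
         ((Fr(1, 6), Fr(2, 3)), Fr(1, 6))]
RULE2 = [((Fr(1, 2), Fr(0)), Fr(1, 6)),
         ((Fr(1, 2), Fr(1, 2)), Fr(1, 6)),
         ((Fr(0), Fr(1, 2)), Fr(1, 6))]

def degree_of_precision(rule, dmax=6):
    """Largest total degree d for which the rule integrates every monomial exactly."""
    d = -1
    for total in range(dmax + 1):
        for a in range(total + 1):
            b = total - a
            approx = sum(w * s ** a * t ** b for (s, t), w in rule)
            if approx != exact_monomial(a, b):
                return d
        d = total
    return d

if __name__ == "__main__":
    for name, rule in [("one-point", CENTROID), ("Rule 1", RULE1), ("Rule 2", RULE2)]:
        print(name, degree_of_precision(rule))
```

The check confirms degree of precision 1 for the centroid rule and exactly 2 for both three-point rules: each of them already fails on a cubic monomial.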
Integrating over general triangles
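Now I consider an arbitrary triangle $T$ with vertices $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ and show how to estimate
$$\int_T f(x,y)\,dx\,dy$$
by a change of variables. The reference triangle $T_R$ is mapped to $T$ by the following transformation (cf. Section 4.6), which sends (0, 0) to $(x_1, y_1)$, (1, 0) to $(x_2, y_2)$, and (0, 1) to $(x_3, y_3)$: $(s, t) \in T_R \mapsto (x, y) \in T$,
$$x = x_1 + (x_2 - x_1)s + (x_3 - x_1)t, \qquad y = y_1 + (y_2 - y_1)s + (y_3 - y_1)t.$$
In vector form, the transformation is
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} x_2 - x_1 & x_3 - x_1 \\ y_2 - y_1 & y_3 - y_1 \end{bmatrix}\begin{bmatrix} s \\ t \end{bmatrix},$$
or $z = z_1 + Ju$. The standard formula for a change of variables in a multiple integral gives
$$\int_T f(x,y)\,dx\,dy = \int_{T_R} g(s,t)\,|\det(J)|\,ds\,dt,$$
where $g$ is defined by
$$g(s,t) = f\bigl(x_1 + (x_2 - x_1)s + (x_3 - x_1)t,\; y_1 + (y_2 - y_1)s + (y_3 - y_1)t\bigr).$$
The Jacobian factor is constant:
$$|\det(J)| = \left|(x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)\right|.$$
Applying the one-point rule
$$\int_{T_R} g(s,t)\,ds\,dt \approx \frac{1}{2}\,g\left(\frac{1}{3}, \frac{1}{3}\right)$$
to the transformed integral, the result is
$$\int_T f(x,y)\,dx\,dy \approx \frac{|\det(J)|}{2}\,f(\bar{x}, \bar{y}),$$
where
$$\bar{x} = x_1 + \frac{1}{3}(x_2 - x_1) + \frac{1}{3}(x_3 - x_1), \qquad \bar{y} = y_1 + \frac{1}{3}(y_2 - y_1) + \frac{1}{3}(y_3 - y_1).$$
The reader should notice that the quadrature node on $T$ is the centroid of $T$, just as (1/3, 1/3) is the centroid of $T_R$:
$$(\bar{x}, \bar{y}) = \left(\frac{x_1 + x_2 + x_3}{3}, \frac{y_1 + y_2 + y_3}{3}\right).$$
Moreover, $|\det(J)|/2$ is the area of triangle $T$ (see Exercise 3).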
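A similar result holds if a three-point rule is used to estimate the transformed integral
$$\int_{T_R} g(s,t)\,|\det(J)|\,ds\,dt.$$
Using Rule 1 from Table 7.1,
$$\int_T f(x,y)\,dx\,dy \approx \frac{|\det(J)|}{6}\left[f(\hat{x}_1, \hat{y}_1) + f(\hat{x}_2, \hat{y}_2) + f(\hat{x}_3, \hat{y}_3)\right],$$
where
$$(\hat{x}_1, \hat{y}_1) = \tfrac{2}{3}(x_1, y_1) + \tfrac{1}{6}(x_2, y_2) + \tfrac{1}{6}(x_3, y_3), \tag{7.3a}$$
$$(\hat{x}_2, \hat{y}_2) = \tfrac{1}{6}(x_1, y_1) + \tfrac{2}{3}(x_2, y_2) + \tfrac{1}{6}(x_3, y_3), \tag{7.3b}$$
$$(\hat{x}_3, \hat{y}_3) = \tfrac{1}{6}(x_1, y_1) + \tfrac{1}{6}(x_2, y_2) + \tfrac{2}{3}(x_3, y_3). \tag{7.3c}$$
As the above examples suggest, because of the linearity of the transformation between $T_R$ and $T$, the quadrature nodes are conveniently expressed in terms of barycentric coordinates. If the vertices of $T$ are $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, then the barycentric coordinates $(\zeta_1, \zeta_2, \zeta_3)$ of $(x, y) \in T$ are defined by the conditions
$$x = \zeta_1 x_1 + \zeta_2 x_2 + \zeta_3 x_3, \qquad y = \zeta_1 y_1 + \zeta_2 y_2 + \zeta_3 y_3, \qquad \zeta_1 + \zeta_2 + \zeta_3 = 1.$$
For Rule 1 from Table 7.1, the barycentric coordinates for the three quadrature nodes on $T_R$ are (2/3, 1/6, 1/6), (1/6, 2/3, 1/6), and (1/6, 1/6, 2/3); the corresponding quadrature nodes (7.3a)-(7.3c) on $T$ have the same barycentric coordinates. For this reason, it is convenient to give quadrature rules using barycentric coordinates. When a quadrature rule is expressed in barycentric coordinates, it is conventional to express it for an arbitrary triangle of area 1. Since $T_R$ has area 1/2, the reader will notice that the weights are doubled when expressed in barycentric coordinates as compared to the coordinates of $T_R$. Table 7.2 contains the same three-point rules as does Table 7.1, but expressed in terms of barycentric coordinates. To apply one of these rules to an arbitrary triangle $T$,
• use nodes on $T$ with the given barycentric coordinates;
• multiply the weights by the area of triangle $T$.
The reader can verify that this procedure yields the results derived above.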
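The two-step recipe above (place the nodes by their barycentric coordinates, then scale the weights by the area) can be sketched in a few lines of Python. The function names are hypothetical, and the rule data are derived from the text rather than copied from Table 7.2: the barycentric nodes of Rule 1 are those listed above, and each weight is 1/3, that is, twice the weight 1/6 used on $T_R$.

```python
def triangle_area(verts):
    """Area of the triangle with vertices verts = [(x1, y1), (x2, y2), (x3, y3)]."""
    (x1, y1), (x2, y2), (x3, y3) = verts
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0

# Rules as lists of ((zeta1, zeta2, zeta3), weight), for a triangle of area 1.
ONE_POINT = [((1 / 3, 1 / 3, 1 / 3), 1.0)]
RULE1 = [((2 / 3, 1 / 6, 1 / 6), 1 / 3),
         ((1 / 6, 2 / 3, 1 / 6), 1 / 3),
         ((1 / 6, 1 / 6, 2 / 3), 1 / 3)]

def integrate_on_triangle(f, verts, rule):
    """Estimate the integral of f(x, y) over the triangle using a barycentric rule."""
    total = 0.0
    for (z1, z2, z3), w in rule:
        # Quadrature node on T with the given barycentric coordinates.
        x = z1 * verts[0][0] + z2 * verts[1][0] + z3 * verts[2][0]
        y = z1 * verts[0][1] + z2 * verts[1][1] + z3 * verts[2][1]
        total += w * f(x, y)
    # Multiply the (area-1) weights by the area of T.
    return triangle_area(verts) * total
```

For example, on the triangle with vertices (0, 0), (2, 0), (0, 1), the one-point rule integrates $f(x, y) = x$ exactly and Rule 1 integrates $f(x, y) = x^2$ exactly; both integrals equal 2/3.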
Table 7.2. Two three-point quadrature rules, each with degree of precision 2. The nodes are given in terms of barycentric coordinates for a triangle of area 1.
7.1.2 Evaluating the standard basis functions on a triangle
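In order to apply the quadrature rules described above, it is necessary to evaluate the basis functions and their gradients over each triangle. For linear Lagrange triangles, the standard basis functions are linear on each triangle, and, given any triangle $T_k$ in the mesh, there are only three basis functions that are nonzero on $T_k$. To simplify the notation, I will use local indices in this section and assume that $T_k$ has vertices $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$. The corresponding nonzero basis functions will be denoted $\phi_1$, $\phi_2$, and $\phi_3$, respectively.¹³ The functions $\phi_1, \phi_2, \phi_3$ are defined by
$$\phi_i(x_j, y_j) = \begin{cases} 1, & i = j, \\ 0, & i \ne j, \end{cases}$$
and can be represented by
$$\phi_i(x, y) = a_i + b_i x + c_i y, \quad i = 1, 2, 3.$$
The coefficients $a_i, b_i, c_i$ can be found by solving a $3 \times 3$ system of equations. For example, for $i = 1$ this system is
$$a_1 + b_1 x_1 + c_1 y_1 = 1, \qquad a_1 + b_1 x_2 + c_1 y_2 = 0, \qquad a_1 + b_1 x_3 + c_1 y_3 = 0,$$
or, in matrix-vector form,
$$\begin{bmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{bmatrix}\begin{bmatrix} a_1 \\ b_1 \\ c_1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}. \tag{7.4}$$
¹³ Using global indices, the vertices would be $(x_{n_{k,i}}, y_{n_{k,i}})$, $i = 1, 2, 3$.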
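Each triple $(a_i, b_i, c_i)$ is determined by a system like (7.4), and in each of these three systems, the coefficient matrix is the same. The three systems can therefore be combined into the matrix equation
$$\begin{bmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{bmatrix}\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
which shows that
$$\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}$$
is the inverse of
$$M = \begin{bmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{bmatrix}. \tag{7.6}$$
Therefore, the computation of a $3 \times 3$ matrix inverse suffices to give the coefficients of all three linear functions on $T_k$. When assembling the stiffness matrix, it is necessary to compute integrals of the form
$$\int_{T_k} \kappa\,\nabla\phi_j \cdot \nabla\phi_i.$$
Since each $\phi_i$ is linear over $T_k$, the gradients are constant:
$$\nabla\phi_i = \begin{bmatrix} b_i \\ c_i \end{bmatrix}.$$
Therefore,
$$\int_{T_k} \kappa\,\nabla\phi_j \cdot \nabla\phi_i = (b_i b_j + c_i c_j)\int_{T_k} \kappa.$$
The integral of $\kappa$ can be estimated by the one-point rule:
$$\int_{T_k} \kappa \approx A\,\kappa(\bar{x}, \bar{y})$$
($A$ is the area of $T_k$ and $(\bar{x}, \bar{y})$ is the centroid of $T_k$). The calculation then proceeds like this:
1. Estimate $\int_{T_k} \kappa$.
2. Invert the $3 \times 3$ matrix $M$ defined by (7.6) and extract the gradients of the basis functions from $M^{-1}$.
3. Compute the dot products of the gradients: $\nabla\phi_j \cdot \nabla\phi_i = b_i b_j + c_i c_j$, $i, j = 1, 2, 3$.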
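The entries in the element stiffness matrix are then available, and they can be added to the proper entries in the global stiffness matrix. Computing the integrals that contribute to the load vector is even easier, at least when the standard one-point quadrature rule
$$\int_{T_k} f\,\phi_i \approx A\,f(\bar{x}, \bar{y})\,\phi_i(\bar{x}, \bar{y})$$
is used. The computation is simplified by the fact that all three basis functions have value 1/3 at the centroid $(\bar{x}, \bar{y})$ (see Exercise 4), and so the estimate of
$$\int_{T_k} f\,\phi_i$$
is the same for each $i = 1, 2, 3$. Thus, when using the one-point rule, which is accurate enough when using linear Lagrange triangles, there is no work involved in computing the basis functions at the quadrature node. For future reference, I will show how to compute the basis functions $\phi_1, \phi_2, \phi_3$ at a set of arbitrary points inside of $T_k$. This is easy if the points are given by their barycentric coordinates, since the barycentric coordinates $(\zeta_1, \zeta_2, \zeta_3)$ of $(x, y) \in T_k$ are simply the values $(\phi_1(x, y), \phi_2(x, y), \phi_3(x, y))$ (see Exercise 5). If, on the other hand, the points are specified by their Cartesian coordinates, evaluating $\phi_1, \phi_2, \phi_3$ requires the solution of a linear system. If $M$ is the matrix defined by (7.6), then
$$\begin{bmatrix} \phi_1(x,y) & \phi_2(x,y) & \phi_3(x,y) \end{bmatrix} = \begin{bmatrix} 1 & x & y \end{bmatrix} M^{-1}.$$
The values of $\phi_1, \phi_2, \phi_3$ at $(\xi_1, \eta_1), (\xi_2, \eta_2), \ldots, (\xi_p, \eta_p)$ are given by
$$\begin{bmatrix} \phi_1(\xi_i,\eta_i) & \phi_2(\xi_i,\eta_i) & \phi_3(\xi_i,\eta_i) \end{bmatrix} = \begin{bmatrix} 1 & \xi_i & \eta_i \end{bmatrix} M^{-1}, \quad i = 1, 2, \ldots, p.$$
These values are the entries of the matrix-matrix product
$$\begin{bmatrix} 1 & \xi_1 & \eta_1 \\ 1 & \xi_2 & \eta_2 \\ \vdots & \vdots & \vdots \\ 1 & \xi_p & \eta_p \end{bmatrix} M^{-1}.$$
Defining
$$C = \begin{bmatrix} 1 & \xi_1 & \eta_1 \\ 1 & \xi_2 & \eta_2 \\ \vdots & \vdots & \vdots \\ 1 & \xi_p & \eta_p \end{bmatrix},$$
$V = CM^{-1}$ satisfies $V_{ij} = \phi_j(\xi_i, \eta_i)$. The equation $V = CM^{-1}$ can be rearranged to give the matrix-matrix equation $M^T V^T = C^T$, which can be solved efficiently using Gaussian elimination.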
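The computations of this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's MATLAB code: the function names are hypothetical, and instead of calling a general matrix-inversion or Gaussian-elimination routine the sketch writes out the inverse of the $3 \times 3$ matrix $M$ in closed form (by cofactors), which is practical only because $M$ is so small. Each returned triple $(a_i, b_i, c_i)$ is a column of $M^{-1}$, so $(b_i, c_i)$ is the constant gradient of $\phi_i$.

```python
def basis_coefficients(verts):
    """Coefficients (a_i, b_i, c_i) of phi_i(x, y) = a_i + b_i*x + c_i*y, i = 1, 2, 3.

    verts = [(x1, y1), (x2, y2), (x3, y3)]; the triples are the columns of M^{-1}.
    """
    (x1, y1), (x2, y2), (x3, y3) = verts
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)   # det(M), twice the signed area
    if det == 0:
        raise ValueError("degenerate triangle")
    return [
        ((x2 * y3 - x3 * y2) / det, (y2 - y3) / det, (x3 - x2) / det),
        ((x3 * y1 - x1 * y3) / det, (y3 - y1) / det, (x1 - x3) / det),
        ((x1 * y2 - x2 * y1) / det, (y1 - y2) / det, (x2 - x1) / det),
    ]

def basis_gradients(verts):
    """Constant gradients (b_i, c_i) of the three basis functions."""
    return [(b, c) for (_, b, c) in basis_coefficients(verts)]

def evaluate_basis(verts, points):
    """Matrix V with V[i][j] = phi_j evaluated at points[i] (the product C * M^{-1})."""
    coef = basis_coefficients(verts)
    return [[a + b * x + c * y for (a, b, c) in coef] for (x, y) in points]
```

Evaluating at the vertices themselves returns the identity matrix, and evaluating at the centroid returns the value 1/3 for each basis function, in agreement with the text.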
Using the reference triangle
The computations described above can be performed by transforming all integrals to the reference triangle, as described in Section 4.6. When using linear Lagrange triangles, there is a little gain in efficiency in using the reference triangle: The gradients of the basis functions need not be computed over every triangle, which saves the computation of a $3 \times 3$ matrix inverse. On the other hand, the transformation $J$ must be computed and then $J^{-T}\nabla\phi_i$, $i = 1, 2, 3$, which is nearly as expensive (see Exercise 6). Since the values of the basis functions at the centroid of any triangle are already known, there is no advantage in transforming
$$\int_{T_k} f\,\phi_i$$
to the reference triangle. When using higher-order Lagrange triangles, there is a significant savings to be had in using the reference triangle. The values and gradients of the basis functions on $T_R$ can be computed once and used for every triangle in the mesh. I will discuss this further in Chapter 8.
7.1.3 Quadrature over a square
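When using quadrilateral elements, all integrations are performed over the reference square $S_R = [-1, 1] \times [-1, 1]$. Quadrature rules suitable for bilinear finite element spaces are easily obtained from the Gauss quadrature rules for the interval $[-1, 1]$. Any polynomial in $s, t$ is the sum of terms of the form $s^k t^\ell$, and
$$\int_{S_R} s^k t^\ell\,ds\,dt = \left(\int_{-1}^{1} s^k\,ds\right)\left(\int_{-1}^{1} t^\ell\,dt\right).$$
This suggests using a one-dimensional Gauss rule for each integral:
$$\int_{S_R} s^k t^\ell\,ds\,dt \approx \left(\sum_{i=1}^{n} w_i t_i^k\right)\left(\sum_{j=1}^{n} w_j t_j^\ell\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} w_i w_j\,t_i^k t_j^\ell.$$
This product Gauss rule is applied to a general function $f = f(s, t)$ as follows:
$$\int_{S_R} f(s,t)\,ds\,dt \approx \sum_{i=1}^{n}\sum_{j=1}^{n} w_i w_j f(t_i, t_j). \tag{7.7}$$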
Since the n-point (one-dimensional) Gauss rule is exact for polynomials of degree up to $2n - 1$, rule (7.7) is exact for polynomials whose terms are of the form $s^k t^\ell$, $k, \ell \le 2n - 1$. This space of polynomials does not include all polynomials of degree $4n - 2$ or less, but it does include all polynomials of the form
$$\sum_{k=0}^{2n-1}\sum_{\ell=0}^{2n-1} c_{k\ell}\,s^k t^\ell.$$
The product Gauss rules are therefore natural for use with bilinear finite elements, described in Section 4.5.2, and also with their generalizations to biquadratic, bicubic, and so forth. In Exercise 9, the reader is asked to show that the product Gauss rule with n = 2 (four points in all) is suitable for implementing bilinear quadrilateral finite elements.
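As a small illustration of the product rule, the Python sketch below builds the four-point rule on $[-1, 1] \times [-1, 1]$ from the two-point Gauss rule (nodes $\pm 1/\sqrt{3}$, weights 1). The function name `product_gauss` is hypothetical, and only the case $n = 2$ is set up here.

```python
import math

# Two-point Gauss rule on [-1, 1]: nodes and weights.
GAUSS2_NODES = [-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)]
GAUSS2_WEIGHTS = [1.0, 1.0]

def product_gauss(f, nodes=GAUSS2_NODES, weights=GAUSS2_WEIGHTS):
    """Estimate the integral of f(s, t) over the reference square [-1, 1] x [-1, 1].

    The same one-dimensional rule is used in each variable, so an n-point
    rule yields n*n evaluation points with weights w_i * w_j.
    """
    return sum(wi * wj * f(si, tj)
               for si, wi in zip(nodes, weights)
               for tj, wj in zip(nodes, weights))
```

With $n = 2$ the rule is exact for every term $s^k t^\ell$ with $k, \ell \le 3$; for instance, it reproduces $\int s^2 t^2 = 4/9$. It is not exact for $s^4$, whose exact integral over the square is 4/5, even though $s^4$ has lower total degree than $s^3 t^3$.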
7.2 Assembling the stiffness matrix
I will now present the main computation of the finite element method, namely, the assembly of the stiffness matrix. In Section 6.1, I presented an outline of an element-oriented algorithm for assembling K (see Algorithm 6.2). I can now express this algorithm in terms of the mesh data structure presented in Section 6.2. A fundamental operation needed for implementing element-oriented algorithms is the determination of the vertices of a given triangle. To be precise, it is necessary to know, for each vertex, its coordinates, whether it is free or constrained, and its index in the list of free or constrained nodes. The reader will recall that the mesh data structure stores, for each triangle in a mesh, (pointers to) the edges of the triangle. It also stores, for each edge, (pointers to) the endpoints of the edge. By a simple algorithm, then, one can extract pointers to the vertices of a given triangle. If the triangle is $T_k$, then, according to the notation developed earlier, these pointers are $n_{k,1}, n_{k,2}, n_{k,3}$.
The matrix $M^{-1}$ contains the coefficients of the basis functions as its columns, and since the basis functions are linear, these coefficients include the partial derivatives $\partial\phi_i/\partial x$ and $\partial\phi_i/\partial y$.
for j = 1, 2, 3
    eptr(j) = T.Elements(k,j)
if eptr(1) > 0
    indices(1) = T.Edges(eptr(1),1)
    indices(2) = T.Edges(eptr(1),2)
else
    indices(1) = T.Edges(-eptr(1),2)
    indices(2) = T.Edges(-eptr(1),1)
if eptr(2) > 0
    indices(3) = T.Edges(eptr(2),2)
else
    indices(3) = T.Edges(-eptr(2),1)
for i = 1, 2, 3
    ptrs(i) = T.NodePtrs(indices(i))
    for j = 1, 2
        coords(i,j) = T.Nodes(indices(i),j)
Algorithm 7.1. The getNodes1 algorithm: Given a mesh T and a triangle index k, determines the coordinates of the vertices of $T_k$ and the corresponding entries in T.NodePtrs. The coordinates are returned in a 3 x 2 array coords and the pointers in the 3 x 1 array ptrs.
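A Python rendering of Algorithm 7.1 may help to fix the idea. It is a sketch under assumptions, not the MATLAB routine: the mesh is represented by a plain dictionary with fields Nodes, NodePtrs, Edges, and Elements holding 1-based pointers as in the text, and it is assumed that an entry of Elements is negative when the stored edge is traversed backward as one goes around the triangle, so that its endpoints must be swapped.

```python
def get_nodes(T, k):
    """Return (coords, ptrs) for the vertices of triangle k (1-based index).

    T is a dict with the (assumed) fields
      T["Nodes"][i-1]    = (x, y) of node i,
      T["NodePtrs"][i-1] = signed pointer for node i,
      T["Edges"][e-1]    = (first node, second node) of edge e,
      T["Elements"][k-1] = three signed edge pointers for triangle k.
    """
    eptr = T["Elements"][k - 1]
    e1 = T["Edges"][abs(eptr[0]) - 1]
    e2 = T["Edges"][abs(eptr[1]) - 1]
    # The first edge supplies the first two vertices, in traversal order.
    if eptr[0] > 0:
        indices = [e1[0], e1[1]]
    else:
        indices = [e1[1], e1[0]]
    # The far endpoint of the second edge is the third vertex.
    indices.append(e2[1] if eptr[1] > 0 else e2[0])
    ptrs = [T["NodePtrs"][i - 1] for i in indices]
    coords = [tuple(T["Nodes"][i - 1]) for i in indices]
    return coords, ptrs

# A hypothetical two-triangle mesh of the unit square; edge 3 is shared and is
# traversed backward by the second triangle.
SQUARE = {
    "Nodes": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "NodePtrs": [-1, 1, 2, -2],
    "Edges": [(1, 2), (2, 3), (3, 1), (3, 4), (4, 1)],
    "Elements": [(1, 2, 3), (-3, 4, 5)],
}
```

Only the first two edges of the triangle are consulted, since together they already contain all three vertices.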
Therefore, upon computing $M^{-1}$, the gradients of the three basis functions are immediately available. The integral
$$\int_{T_k} \kappa$$
is estimated using the one-point rule
$$\int_{T_k} \kappa \approx A\,\kappa(\bar{x}, \bar{y}),$$
where $A$ is the area and $(\bar{x}, \bar{y})$ the centroid of the triangle. The computation of the area was explained in the previous section (see also Exercise 3). Algorithm 7.2 is the complete element-oriented algorithm for assembling K. I assume that a routine for computing a matrix inverse is available. MATLAB contains such a routine. If the code is to be written in a high-level language such as Fortran, C, or C++, it is recommended that code from the LAPACK package [3] be used. Since K is symmetric, the upper triangle is first computed and then copied to the lower triangle. Depending on the software to be used to solve $KU = F$, it may not be necessary to fill in the lower triangle of K.
Chapter 7. Programming the finite element method: Linear Lagrange triangles
Initialize K to the zero matrix
for k = 1,2,...,$N_t$
    Call getNodes1 to get coords and ptrs
    Compute the matrix M and its inverse
    for r = 1,2,3
        for s = r,...,3
            G(r,s) = $\nabla\phi_r \cdot \nabla\phi_s$
    Estimate $I = \int_{T_k} \kappa$ using the one-point quadrature rule
    for r = 1,2,3
        for s = r,...,3
            if ptrs(r)>0 and ptrs(s)>0
                i = min{ptrs(r),ptrs(s)}
                j = max{ptrs(r),ptrs(s)}
                Add G(r,s)I to K(i,j)
for i = 2,3,...,$N_f$
    for j = 1,2,...,i-1
        K(i,j) = K(j,i)

Algorithm 7.2. The complete algorithm for assembling K. The matrix M is defined by (7.8) and $M^{-1}$ contains $\nabla\phi_1$, $\nabla\phi_2$, $\nabla\phi_3$.
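A minimal Python sketch of the element-oriented assembly follows. It makes several simplifying assumptions that are not the book's implementation: the mesh is given as a plain triangle-node list with 0-based node indices, ptrs[v] is positive for a free node and negative for a constrained one, the basis-function gradients come from the closed-form expressions for a linear triangle instead of an explicit inverse of M, and K is stored as a dense list of lists that is filled completely rather than upper triangle first.

```python
def element_stiffness(coords, kappa):
    """3x3 element matrix on one linear triangle, using the one-point rule for kappa."""
    (x1, y1), (x2, y2), (x3, y3) = coords
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)   # twice the signed area
    area = abs(det) / 2.0
    # Constant gradients of the three nodal basis functions on this triangle.
    grads = [((y2 - y3) / det, (x3 - x2) / det),
             ((y3 - y1) / det, (x1 - x3) / det),
             ((y1 - y2) / det, (x2 - x1) / det)]
    I = area * kappa((x1 + x2 + x3) / 3.0, (y1 + y2 + y3) / 3.0)
    return [[I * (gr[0] * gs[0] + gr[1] * gs[1]) for gs in grads] for gr in grads]


def assemble_stiffness(nodes, triangles, ptrs, n_free, kappa):
    """Accumulate element matrices into a dense n_free x n_free matrix K."""
    K = [[0.0] * n_free for _ in range(n_free)]
    for tri in triangles:
        Ke = element_stiffness([nodes[v] for v in tri], kappa)
        for r in range(3):
            for s in range(3):
                i, j = ptrs[tri[r]], ptrs[tri[s]]
                if i > 0 and j > 0:            # both nodes are free
                    K[i - 1][j - 1] += Ke[r][s]
    return K


# Hypothetical test mesh: unit square cut into four triangles by its center.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
triangles = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
ptrs = [-1, -2, -3, -4, 1]                     # only the center node is free
K = assemble_stiffness(nodes, triangles, ptrs, 1, lambda x, y: 1.0)
```

With a constant coefficient equal to one, each of the four triangles contributes 1 to the single entry of K, so K[0][0] comes out as 4.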
7.3
Computing the load vector
The basic algorithm for computing the load vector F is similar to that for assembling the stiffness matrix K, although it is complicated by the need to handle inhomogeneous boundary conditions. I will begin by completing the description of Algorithm 6.3 (see page 132), which applies to the following BVP (with homogeneous boundary conditions):
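The integrals

$$\int_{T_k} f\phi_i$$

will be estimated using the one-point quadrature rule

$$\int_{T_k} f\phi_i \doteq A f(\bar{x}, \bar{y})\,\phi_i(\bar{x}, \bar{y}),$$

where A is the area and $(\bar{x}, \bar{y})$ the centroid of $T_k$. Moreover, since all three basis functions have value 1/3 at $(\bar{x}, \bar{y})$, this estimate is the same for all three basis functions that are nonzero over $T_k$:

$$\int_{T_k} f\phi_i \doteq \frac{A f(\bar{x}, \bar{y})}{3}.$$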
The details (getting the coordinates of the vertices of $T_k$, manipulating the pointers, computing the area of $T_k$, and so forth) are the same as for assembling the stiffness matrix. The complete algorithm for assembling F, under homogeneous boundary conditions, is given in Algorithm 7.3.

Initialize F to the zero vector
for k = 1,2,...,$N_t$
    Call getNodes1 to get coords and ptrs
    Compute the area A and the centroid $(\bar{x}, \bar{y})$ of $T_k$
    Compute $I = A f(\bar{x}, \bar{y})/3$
    for r = 1,2,3
        if ptrs(r)>0
            Add I to F(ptrs(r))

Algorithm 7.3. The algorithm for assembling the load vector in the case of homogeneous boundary conditions.
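Algorithm 7.3 can be sketched in Python as follows. As assumptions for the illustration only, the mesh is a triangle-node list with 0-based node indices and ptrs[v] is positive exactly when node v is free; the function name and the two-triangle mesh are hypothetical.

```python
def assemble_load(nodes, triangles, ptrs, n_free, f):
    """Load vector for homogeneous boundary conditions, one-point rule per triangle."""
    F = [0.0] * n_free
    for tri in triangles:
        (x1, y1), (x2, y2), (x3, y3) = (nodes[v] for v in tri)
        area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0
        xc, yc = (x1 + x2 + x3) / 3.0, (y1 + y2 + y3) / 3.0
        I = area * f(xc, yc) / 3.0       # same value for all three basis functions
        for v in tri:
            if ptrs[v] > 0:              # skip constrained nodes
                F[ptrs[v] - 1] += I
    return F


# Hypothetical check: unit square split into two triangles, every node free, f = 1.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
F = assemble_load(nodes, triangles, [1, 2, 3, 4], 4, lambda x, y: 1.0)
```

Because the basis functions sum to one, the entries of F add up to the integral of f over the square, which is 1 here.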
7.3.1
Inhomogeneous Dirichlet conditions
It is not difficult to extend the above algorithm to handle inhomogeneous Dirichlet boundary conditions. In place of
it is necessary to compute
The function G is the continuous piecewise linear function defined by
By definition, G is zero over every triangle except those having at least one constrained vertex. When looping over the triangles of the mesh, a contribution to F (from the inhomogeneous Dirichlet condition) must be computed whenever the triangle Tk contains at least one constrained node and at least one free node. These contributions have the form
Since both $\nabla G$ and $\nabla\phi_i$ are constant on $T_k$, it follows that

$$\int_{T_k} \kappa \nabla G \cdot \nabla\phi_i = (\nabla G \cdot \nabla\phi_i) \int_{T_k} \kappa.$$

Moreover, $\nabla G$ is easily computed. If $w_1$, $w_2$, $w_3$ are the nodal values of G at the three nodes of $T_k$, then, on $T_k$,

$$\nabla G = w_1 \nabla\phi_1 + w_2 \nabla\phi_2 + w_3 \nabla\phi_3$$

(these formulas are expressed in local indices). Algorithm 7.4 computes the load vector, taking into account the influence of the right-hand-side function f and the Dirichlet data g (but still ignoring any nonzero Neumann data). This algorithm assumes that the nonzero Dirichlet data are given in an $N_c \times 1$ array.

Initialize F to the zero vector
for k = 1,2,...,$N_t$
    Call getNodes1 to get coords and ptrs
    Compute the area A and the centroid $(\bar{x}, \bar{y})$ of $T_k$
    if f is nonzero
        Compute $I = A f(\bar{x}, \bar{y})/3$
        for r = 1,2,3
            if ptrs(r)>0
                Add I to F(ptrs(r))
    if g is nonzero and at least one ptrs(r) is negative
        Form the matrix M and its inverse
        Extract the nodal values of G on $T_k$
        Compute the gradient of G on $T_k$
        Compute $I = A\kappa(\bar{x}, \bar{y})$
        for r = 1,2,3
            if ptrs(r)>0
                Add $-(\nabla G \cdot \nabla\phi_r) I$ to F(ptrs(r))

Algorithm 7.4. The algorithm for assembling the load vector in the case of a nonzero right-hand-side f and/or nonzero Dirichlet data g.
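The Dirichlet part of Algorithm 7.4, for a single triangle, might look like the following Python sketch. The assumptions are mine: ptrs[r] = -j marks the jth constrained node (1-based), g is a list holding the Dirichlet value at each constrained node, the gradients are obtained from closed-form expressions rather than from the inverse of M, and the contribution is taken with a minus sign because the term involving G is moved to the right-hand side of the variational equation.

```python
def dirichlet_load_terms(coords, ptrs, g, kappa):
    """Local contributions -(grad G . grad phi_r) * I for r = 0, 1, 2 (0 if r is constrained)."""
    (x1, y1), (x2, y2), (x3, y3) = coords
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)   # twice the signed area
    area = abs(det) / 2.0
    grads = [((y2 - y3) / det, (x3 - x2) / det),
             ((y3 - y1) / det, (x1 - x3) / det),
             ((y1 - y2) / det, (x2 - x1) / det)]
    # Nodal values of G: Dirichlet data at constrained nodes, zero at free nodes.
    w = [g[-p - 1] if p < 0 else 0.0 for p in ptrs]
    gx = sum(w[m] * grads[m][0] for m in range(3))
    gy = sum(w[m] * grads[m][1] for m in range(3))
    I = area * kappa((x1 + x2 + x3) / 3.0, (y1 + y2 + y3) / 3.0)
    return [-(gx * gr[0] + gy * gr[1]) * I if p > 0 else 0.0
            for gr, p in zip(grads, ptrs)]


# Hypothetical triangle: node 1 free, nodes 2 and 3 constrained with g = 2.
terms = dirichlet_load_terms([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
                             [1, -1, -2], [2.0, 2.0], lambda x, y: 1.0)
```

In this toy case the element stiffness entry for the free node is 1 and the load term is 2, so the computed nodal value is 2, matching the constant function that satisfies the Dirichlet data.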
7.3.2
Inhomogeneous Neumann conditions
If the problem involves inhomogeneous Neumann conditions, the resulting contribution to the load vector can also be computed while looping over the triangles of the mesh. It can be determined from the array EdgeEls whether each edge is an interior edge, a constrained boundary edge, or a free boundary edge. To be precise, T.EdgeEls(j,2) is positive if edge $e_j$ is an interior edge, zero if $e_j$ is a constrained boundary edge, and $-i$ if $e_j$ is the $i$th free boundary edge.
The formula for the load vector, taking into account both Dirichlet and Neumann data, is
If node $z_n$ is a free node that lies on $\partial\Omega$, then it is the endpoint of two free boundary edges. Therefore, if $i = p_n$ (this notation means that $z_n$ is the $i$th free node) and $z_n$ is an endpoint of free edges $e_{j_1}$ and $e_{j_2}$, then
These integrals can be computed while looping over the triangles. The reader should notice that $e_{j_1}$ and $e_{j_2}$ may or may not belong to the same triangle, but, as in the case of the integrals defining $K_{ij}$, it is not necessary that the two integrals be computed together. The nodal values of the Neumann data h, at the endpoints of the free edges, can be provided in an $N_b \times 2$ array. The function h is then approximated by its piecewise linear interpolant on each edge. Each integral of the form
is then estimated by the one-point Gaussian quadrature rule (the midpoint rule) for one-dimensional integrals. Algorithm 7.5 is the complete algorithm for computing the load vector. It is coded in the MATLAB routine Load1. The reader may notice that the array FBndyEdges, which identifies the free boundary edges, is not used while computing the load vector. Its only purpose is to identify the free boundary edges so that the needed Neumann data can be assembled conveniently for input to Load1.
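The midpoint-rule treatment of one free boundary edge can be sketched in Python. The function name and argument layout are hypothetical; the sketch assumes that h1 and h2 are the Neumann data at the two endpoints of the edge and that ptrs1 holds the NodePtrs entries of those endpoints.

```python
import math


def neumann_edge_terms(p1, p2, h1, h2, ptrs1):
    """Return (free-node index, contribution) pairs for one free boundary edge."""
    length = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    h_mid = 0.5 * (h1 + h2)          # linear interpolant of h at the midpoint
    I = 0.5 * length * h_mid         # each endpoint basis function equals 1/2 there
    return [(p, I) for p in ptrs1 if p > 0]


# Hypothetical edge of length 5 whose second endpoint is constrained.
terms = neumann_edge_terms((0.0, 0.0), (3.0, 4.0), 1.0, 3.0, [2, -1])
```

Both endpoints receive the same value because the two linear basis functions coincide at the midpoint of the edge; only free endpoints are kept.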
7.4
Examples
In this section, I solve several BVPs using the MATLAB code implementing the algorithms of this chapter. I have two goals in mind: 1. to show the steps involved in solving a BVP; and 2. to illustrate the theory described in Part I. To these ends, the solved problems illustrate a variety of boundary conditions and geometries. The easiest way to illustrate the convergence theory is to solve problems with known solutions. Few BVPs can be solved in closed form (that is, with the solution expressed as a finite combination of elementary functions). Therefore, to generate a test problem of the form
the following procedure is used:
1. Choose a domain $\Omega$ and a partition $\partial\Omega = \Gamma_1 \cup \Gamma_2$ of its boundary (possibly with $\Gamma_1$ or $\Gamma_2$ empty). Choose the coefficient $\kappa$ for the PDE.
2. Choose the solution u. If the boundary conditions are homogeneous, then u must be chosen accordingly (it is easier to choose a solution for a problem with inhomogeneous boundary conditions).
3. Compute the right-hand-side f and, if u was not chosen to satisfy homogeneous boundary conditions, compute the boundary functions g and h.

Initialize F to the zero vector
for k = 1,2,...,$N_t$
    Call getNodes1 to get coords and ptrs
    if f is nonzero
        Compute the area A and the centroid $(\bar{x}, \bar{y})$ of $T_k$
        Compute $I = A f(\bar{x}, \bar{y})/3$
        for r = 1,2,3
            if ptrs(r)>0
                Add I to F(ptrs(r))
    if g is nonzero and at least one ptrs(r) is negative
        if not already done
            Compute the area A and the centroid $(\bar{x}, \bar{y})$ of $T_k$
        Form the matrix M and its inverse
        Extract the nodal values of G on $T_k$
        Compute the gradient of G on $T_k$
        Compute $I = A\kappa(\bar{x}, \bar{y})$
        for r = 1,2,3
            if ptrs(r)>0
                Add $-(\nabla G \cdot \nabla\phi_r) I$ to F(ptrs(r))
    if h is nonzero
        for j = 1,2,3
            if edge j is a free boundary edge
                Extract the coordinates of the endpoints of the edge
                Extract the corresponding pointers ptrs1 from NodePtrs
                Compute the length L of the edge
                Interpolate the boundary data at the midpoint of the edge to get h(m)
                Evaluate $I = 0.5 L h(m)$
                if ptrs1(1)>0
                    Add I to F(ptrs1(1))
                if ptrs1(2)>0
                    Add I to F(ptrs1(2))

Algorithm 7.5. The complete algorithm for assembling the load vector.
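Steps 1 to 3 of the test-problem procedure can be illustrated with a short Python sketch. The coefficient $\kappa(x,y) = 1 + xy^2$ is the one used in Example 7.1, but the solution u below is a hypothetical choice made only for this illustration (it vanishes on the boundary of the unit square); it is not claimed to be the function used in the book. The right-hand side worked out by hand is cross-checked against a central-difference approximation of $-\nabla\cdot(\kappa\nabla u)$.

```python
def kappa(x, y):
    return 1.0 + x * y * y


def u(x, y):
    # Hypothetical solution; zero on all four sides of the unit square.
    return x * (1.0 - x) * y * (1.0 - y)


def f_exact(x, y):
    """f = -div(kappa grad u), differentiated by hand for the u above."""
    ux = (1.0 - 2.0 * x) * y * (1.0 - y)
    uy = x * (1.0 - x) * (1.0 - 2.0 * y)
    uxx = -2.0 * y * (1.0 - y)
    uyy = -2.0 * x * (1.0 - x)
    kx, ky = y * y, 2.0 * x * y
    return -(kx * ux + kappa(x, y) * uxx + ky * uy + kappa(x, y) * uyy)


def f_numeric(x, y, d=1e-3):
    """Central-difference approximation of -div(kappa grad u) at (x, y)."""
    flux_x = (kappa(x + d / 2, y) * (u(x + d, y) - u(x, y))
              - kappa(x - d / 2, y) * (u(x, y) - u(x - d, y))) / d ** 2
    flux_y = (kappa(x, y + d / 2) * (u(x, y + d) - u(x, y))
              - kappa(x, y - d / 2) * (u(x, y) - u(x, y - d))) / d ** 2
    return -(flux_x + flux_y)
```

A numerical cross-check of this kind is a cheap guard against algebra slips when f is derived by hand or copied from a computer algebra system.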
7.4.1
Homogeneous boundary conditions
The following two examples involve homogeneous boundary conditions on a simple geometry.

EXAMPLE 7.1. The first example will involve homogeneous Dirichlet conditions on the unit square $\Omega = (0,1) \times (0,1) = \{(x,y) : 0 < x < 1,\ 0 < y < 1\}$. The coefficient $\kappa$ is nonconstant: $\kappa(x,y) = 1 + xy^2$. In constructing a test problem, the only hard part is choosing a function u that satisfies the homogeneous boundary conditions; in this case, since the geometry is so simple, this is not hard at all. The function
is obviously zero on the boundary, and f is computed to be
The computed f can be rather complicated; however, the procedure is purely mechanical and one can employ a computer algebra system (such as the Symbolic Toolbox in MATLAB, or Mathematica®, or Maple®). Using the MATLAB code implementing the algorithms described in this chapter, the BVP was solved on five (successively finer) regular meshes, having 8, 32, 128, 512, and 2048 triangles. The MATLAB routine RectangleMeshD1 generates a regular mesh, assuming Dirichlet conditions, on any rectangle $[0, \ell_x] \times [0, \ell_y]$. The mesh size h decreases by a factor of 2 with each successive refinement of the mesh. Therefore, according to the convergence theory presented in Chapter 5, the energy norm of the error should also decrease by a factor of approximately 2 with each mesh refinement. In order to estimate the energy norm of the error (when the exact solution is known), I wrote a MATLAB routine EnergyNormErr1 to estimate
where u is a given smooth function and $u_h$ is a piecewise linear function defined on a mesh $\mathcal{T}_h$. The integral is estimated using a three-point quadrature rule on each triangle in $\mathcal{T}_h$, and the result may be somewhat inaccurate if the mesh is coarse. Table 7.3 shows the errors, measured in the energy norm, of the five computed solutions. It also shows the ratio of the errors in going from one mesh to the next finer mesh. The results displayed in Table 7.3 are in agreement with the predictions of the theory, namely, that $\|u - u_h\|_E = O(h)$. The energy norm of the exact solution is approximately 0.1613, so even on a mesh with 2048 triangles, the error is still about 5%.

EXAMPLE 7.2. In this example, $\Omega$, $\kappa$, f are the same as in the previous example, but the boundary conditions are different:
$\|u - u_h\|_E$    ratio
0.1128             -
0.06275            1.797
0.03232            1.941
0.01629            1.985
0.008159           1.996

Table 7.3. The errors in Example 7.1.
Figure 7.1. The computed solutions to Examples 7.1 (left) and 7.2 (right). The only difference in the two BVPs is that Dirichlet conditions are imposed on the bottom edge of $\Omega$ in Example 7.1, while Neumann conditions are imposed in Example 7.2.

The set $\Gamma_2$ is the bottom edge of the square, while the other three edges comprise $\Gamma_1$. To illustrate the effect of replacing the Dirichlet condition on $\Gamma_2$ by the Neumann condition, the computed solutions to the previous example and this example are displayed in Figure 7.1.
7.4.2
Inhomogeneous boundary conditions
I now turn to problems with inhomogeneous boundary conditions.
Figure 7.2. The initial and final meshes from Example 7.3.
h         $\|u - u_h\|_E$    ratio
0.9428    3.592              -
0.4714    2.513              1.429
0.2357    1.453              1.730
0.1179    0.7630             1.904

Table 7.4. The errors in Example 7.3.
EXAMPLE 7.3. In this example, $\Omega$ is the region bounded by two squares centered at the origin; the inner square has side length 2/3 and the outer square has side length 2 (see Figure 7.2). The BVP is
where f and g are chosen so that the exact solution is $u(x,y) = x/(x^2 + y^2)$. The initial mesh is defined by 16 triangles and 16 nodes, as shown in the left-hand plot of Figure 7.2. The mesh is then refined three times and the BVP is solved on each of the four meshes. Figure 7.2 also shows the final mesh, while Figure 7.3 shows the final computed solution. The errors in the computed solutions are displayed in Table 7.4.

Figure 7.3. The computed solution, corresponding to the finest mesh in Figure 7.2, from Example 7.3. The error in the energy norm is about 16.8%.

EXAMPLE 7.4. The next example is a pure Neumann problem on the unit square. The reader should recall that the right-hand-side f and the boundary data h cannot be chosen arbitrarily in the BVP
Rather, f and h must satisfy the following compatibility condition:

$$\int_\Omega f + \int_{\partial\Omega} h = 0.$$
If the compatibility condition is not satisfied, then the BVP has no solution; if it is satisfied, then the BVP has infinitely many solutions, any two of which differ by a constant. By choosing u and computing f, h (so as to produce a problem with a known solution), I guarantee that the compatibility condition will be satisfied. However, I will still have to deal with the nonuniqueness of the solution, which manifests itself in a singular stiffness matrix. I choose $u(x,y) = x^2y^2 + 2y$, which yields $f(x,y) = -2y^2 - 2x^2$. To specify h, I partition $\partial\Omega$ into its four edges, labeled $\Gamma_1$ (bottom), $\Gamma_2$ (right), $\Gamma_3$ (top), and $\Gamma_4$ (left). Then h is given by

$$h = -2 \text{ on } \Gamma_1, \qquad h = 2y^2 \text{ on } \Gamma_2, \qquad h = 2x^2 + 2 \text{ on } \Gamma_3, \qquad h = 0 \text{ on } \Gamma_4.$$

The data for the Neumann problem consist of the values of h at the endpoints of each free boundary edge. However, it is important to notice that h is determined not only by the points (x, y) on the boundary, but also by the normal vector on the edge:

$$h = \kappa \frac{\partial u}{\partial n} = \kappa \nabla u \cdot n.$$
Figure 7.4. The initial mesh for Example 7.4. The edges are labeled.

Therefore, at corner points, where the unit normal changes discontinuously, h effectively has two different values. For example, the initial mesh, with the edges labeled, is shown in Figure 7.4, which shows that the free boundary edges are the eight edges numbered 1, 2, 3, 4, 7, 8, 9, and 10. The data for the Neumann problem are then the following 8 x 2 array:
(The rows of H correspond to the edges in the order listed in T.FBndyEdges; the entries in each row correspond to the endpoints of the edge in the order given in T.Edges.) Thus, for example, the corner point (x, y) = (1, 1) belongs to edges 8 and 10. Considered as an endpoint of $e_8$, the value of h is

$$h(1,1) = \frac{\partial u}{\partial x}(1,1) = 2,$$

but considered as an endpoint of $e_{10}$,

$$h(1,1) = \frac{\partial u}{\partial y}(1,1) = 4.$$

Thus H(6, 2) = 2.0 and H(8, 2) = 4.0. Mathematically, the fact that h(1, 1) is not uniquely defined is unimportant, since a single point is a set of measure zero in $\partial\Omega$. But when the problem is discretized, the value of h at a point is no longer negligible and both values are needed.
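The two corner values, and the compatibility of f with h, can be checked numerically. The Python sketch below assumes $\kappa = 1$ (which is what $f = -\Delta u$ implies for the chosen u) and uses composite midpoint sums with a hypothetical resolution n; the function names are illustrative only.

```python
def grad_u(x, y):
    # Gradient of u(x, y) = x^2 y^2 + 2y.
    return (2.0 * x * y * y, 2.0 * x * x * y + 2.0)


def f(x, y):
    return -2.0 * y * y - 2.0 * x * x


def h(x, y, normal):
    # Assumes kappa = 1, so h is the normal derivative of u.
    ux, uy = grad_u(x, y)
    return ux * normal[0] + uy * normal[1]


def compatibility_residual(n=200):
    """Midpoint-rule estimate of (integral of f over the square) + (integral of h over its boundary)."""
    d = 1.0 / n
    mids = [(i + 0.5) * d for i in range(n)]
    interior = sum(f(x, y) for x in mids for y in mids) * d * d
    boundary = sum(h(t, 0.0, (0.0, -1.0)) + h(1.0, t, (1.0, 0.0))
                   + h(t, 1.0, (0.0, 1.0)) + h(0.0, t, (-1.0, 0.0))
                   for t in mids) * d
    return interior + boundary


h_right = h(1.0, 1.0, (1.0, 0.0))   # corner seen from the right edge
h_top = h(1.0, 1.0, (0.0, 1.0))     # same corner seen from the top edge
```

The same point (1, 1) yields two different data values, one per edge, which is why the Neumann data are stored per edge endpoint rather than per node.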
In addition to the computation of the Neumann data, the singular stiffness matrix requires special attention. The null space of K is one-dimensional and is spanned by E, the vector of all ones. A nonsingular system results from removing the first row and column of K and the first component of F. Removing the first column of K is equivalent to setting the first nodal value to 0; removing the first row of K and the first component of F is equivalent to removing the equation corresponding to the first free node. The resulting system, KU = F, has a unique solution that corresponds to the piecewise linear function that is zero at the first free node. In the meshes used in this problem, the first free node is the origin, so the particular solution $u(x,y) = x^2y^2 + 2y$ is approximated (as opposed to $u(x,y) = x^2y^2 + 2y + C$ for some nonzero C).

The technique described in the previous paragraph is probably the simplest way to deal with a singular stiffness matrix, but it may not be the best way. Constraining a single node on the boundary amounts to imposing a Dirichlet condition at a single point on the boundary, which is not a well-posed boundary condition. The practical effect of this technique is to make the stiffness matrix more ill-conditioned; this effect is analyzed in detail in the paper by Bochev and Lehoucq [12]. Section 11.5 discusses alternatives for dealing with a singular stiffness matrix. The error in the computed solution, on the finest mesh, is shown in Figure 7.5, while the estimated errors in all the computed solutions are displayed in Table 7.5.
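The row-and-column removal can be sketched in Python on a toy system. The 3 x 3 matrix below is hypothetical: it is not a stiffness matrix from the book, only a small symmetric matrix whose null space is spanned by the vector of all ones, paired with a right-hand side whose entries sum to zero. The dense solver is a plain Gaussian elimination written for the illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= m * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def solve_pinned(K, F):
    """Drop the first row and column of K and the first entry of F; pin U[0] = 0."""
    K_red = [row[1:] for row in K[1:]]
    return [0.0] + solve(K_red, F[1:])


# Hypothetical singular system: rows of K sum to zero, entries of F sum to zero.
K = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
F = [-1.0, 0.0, 1.0]
U = solve_pinned(K, F)
```

The discarded first equation is still satisfied by U here because F is compatible with the null space of K.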
Figure 7.5. The error in the computed solution in Example 7.4 (on the finest mesh). (To be precise, the graph is of the difference between the piecewise linear interpolant of the exact solution and the computed solution.)
$\|u - u_h\|_E$    ratio
0.4955             -
0.2498             1.984
0.1271             1.966
0.06404            1.984
0.03212            1.994

Table 7.5. The errors in Example 7.4.
7.4.3
A more realistic example
Problems with known solutions are useful for testing the correctness of code, and also for testing convergence theory. Obviously, though, the real purpose of finite element code is to solve problems whose solutions are not known. In this section I will discuss a more realistic problem than the preceding examples, that is, a problem whose solution is not known. A critical part of solving a realistic problem is assessing the accuracy of the computed solution. Normally there is some goal for the quality of the solution (such as computing the solution to within 5% in some norm, for example), and it is important to have some assurance that the computed solution is correct to within that limit. Efficiently producing validated solutions is the topic of Part IV of this book. Here I will produce a simple heuristic which is effective under certain conditions. Assuming the goal is to approximate the solution to within some tolerance in the energy norm, the a priori error bound
is useful. To develop a heuristic, it is assumed that the bound is achieved as an equation, that is,

$$\|u - u_h\|_E = Ch \qquad (7.13)$$

for some (unknown) positive constant C. If the solution is computed on two successive meshes, $\mathcal{T}_{2h}$ and $\mathcal{T}_h$, where $\mathcal{T}_h$ is the refinement of $\mathcal{T}_{2h}$, then the computable quantity $\|u_h - u_{2h}\|_E$ is a reasonable estimate of $\|u - u_h\|_E$, as I will now show. The reader should bear in mind, though, that this heuristic depends on the assumption (7.12), which in turn depends on the assumption that the true solution belongs to $H^2(\Omega)$ (and that the meshes are fine enough for (7.13) to hold approximately). By the reverse triangle inequality and assumption (7.13),
Thus
should be approximately true if (7.13) is approximately true. The reasoning above also suggests

$$\frac{\|u_{2h} - u_{4h}\|_E}{\|u_h - u_{2h}\|_E} \approx 2,$$

which is a computable test on whether (7.13) is approximately valid (even if the true solution u is smooth enough, (7.13) may fail if the meshes employed are too coarse). The error estimate (7.14) is an example of an a posteriori error estimate.

To compute $u_h - u_{2h}$, it is necessary to interpolate $u_{2h}$ onto the mesh $\mathcal{T}_h$. Since $\mathcal{T}_h$ is assumed to be obtained from $\mathcal{T}_{2h}$ by standard refinement, this is straightforward. Each node $z_i$ in $\mathcal{T}_h$ either belongs to $\mathcal{T}_{2h}$ or is of the form

$$z_i = \frac{z_{j_1} + z_{j_2}}{2}$$

for some nodes $z_{j_1}$ and $z_{j_2}$ in $\mathcal{T}_{2h}$. Since $u_{2h}$ is linear over each edge in $\mathcal{T}_{2h}$, it follows that

$$u_{2h}(z_i) = \frac{u_{2h}(z_{j_1}) + u_{2h}(z_{j_2})}{2}.$$

The mapping from i to $j_1$, $j_2$ is part of the mesh data structure, so this interpolation is easy to implement. The MATLAB routine Interpolate1 implements the interpolation of $u_{2h}$ from $\mathcal{T}_{2h}$ to $\mathcal{T}_h$.

EXAMPLE 7.5. Consider a square metal plate measuring 20 cm by 20 cm, completely insulated on the top and bottom and also on three edges. To model the plate mathematically, it is assumed that it occupies the set $\Omega = (0, 20) \times (0, 20)$ in the plane, with $\Gamma_1 = \{(x, 20) : 0 < x < 20\}$ (the "top" edge) and $\Gamma_2 = \partial\Omega \setminus \Gamma_1$. The problem is to determine the steady-state temperature distribution u = u(x, y) in the plate if the temperature along $\Gamma_1$ is held fixed at $g(x) = 25 - 5\cos(\pi x/10)$. The BVP that models this experiment was presented in Section 1.1.1:
The coefficient $\kappa$ is the thermal conductivity of the metal; it is constant if the plate is homogeneous and variable ($\kappa = \kappa(x, y)$) if the plate is heterogeneous. In this example, I will compare two situations. In the first, the plate is homogeneous and made of iron; its thermal conductivity is the constant $\kappa_1 = 0.836$ W/(cm K). In the second, the plate has a region of high thermal conductivity in the center; its thermal conductivity is given by
for (x, y) inside the circle $(x - 10)^2 + (y - 10)^2 = 25$ and $\kappa_2(x, y) = 0.836$ elsewhere. The function $\kappa_2$ is displayed in Figure 7.6. For the purpose of this example, the goal will be to compute the solutions to within 10% in the energy norm. For both BVPs, the initial mesh is uniform, with 128 triangles and $h = (5/2)\sqrt{2} \approx 3.536$. Solving the first BVP (with $\kappa = \kappa_1 = 0.836$) on four successive meshes yields the following results:
Figure 7.6. The thermal conductivity $\kappa_2$ (in W/(cm K)) for the second plate in Example 7.5.
h         $\|u_h - u_{2h}\|_E$    ratio    $\|u_h\|_E$    est. rel. error (%)
3.536     -                       -        8.277          -
1.768     2.471                   -        8.153          30.3
0.8839    1.318                   1.875    8.116          16.2
0.4419    0.6756                  1.950    8.106          8.3

The ratios of $\|u_{2h} - u_{4h}\|_E / \|u_h - u_{2h}\|_E$ suggest that the assumption (7.13) is not unreasonable, and the values of $\|u_h - u_{2h}\|_E$ show that the estimated error in the fourth computed solution is less than 10%. Repeating the above computations with the nonconstant thermal conductivity $\kappa_2$ yields similar results:

h         $\|u_h - u_{2h}\|_E$    ratio    $\|u_h\|_E$    est. rel. error (%)
3.536     -                       -        8.291          -
1.768     2.487                   -        8.164          30.5
0.8839    1.324                   1.879    8.127          16.3
0.4419    0.6785                  1.951    8.117          8.4
Again, the fourth computed solution seems to be sufficiently accurate. The goal of this example was to compare the heat distributions in the two plates. If $u^{(1)}$ is the true heat distribution in the first (homogeneous) plate and $u^{(2)}$ is the true heat distribution in the second (heterogeneous) plate, then Figure 7.7 shows the computed estimate of $u^{(2)} - u^{(1)}$. The figure shows that the second plate is cooler on one side of the high conductivity zone and warmer on the other, which would be expected because heat energy flows more easily across that zone.
Figure 7.7. The difference in the temperature distributions in the two plates of Example 7.5.
While the heuristic used in the previous example can be useful, its applicability is limited because the true solution must be smooth and the number of nodes increases rapidly as the mesh is uniformly refined. Methods for nonuniform mesh refinement are presented in Part IV; these methods are much more efficient and can also handle problems with singular solutions. However, they are also much more complex.
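The bookkeeping behind the heuristic can be sketched in Python using the numbers reported for the homogeneous plate in Example 7.5. Two points are assumptions of this sketch: the fourth column of that table is read as the energy norm of the computed solution, and the percentage is taken to be the difference $\|u_h - u_{2h}\|_E$ divided by that norm. The function name is hypothetical.

```python
def refinement_report(diffs, norms):
    """diffs[m] estimates ||u_h - u_2h||_E on mesh m; norms[m] is ||u_h||_E on the same mesh."""
    # Ratio of successive differences; values near 2 support the O(h) assumption.
    ratios = [diffs[m - 1] / diffs[m] for m in range(1, len(diffs))]
    # Estimated relative error in percent on each refined mesh.
    percents = [100.0 * d / nrm for d, nrm in zip(diffs, norms)]
    return ratios, percents


# Values for the second, third, and fourth meshes of the homogeneous plate.
diffs = [2.471, 1.318, 0.6756]
norms = [8.153, 8.116, 8.106]
ratios, percents = refinement_report(diffs, norms)
tolerance_met = percents[-1] < 10.0
```

A driver routine would stop refining once the last percentage drops below the requested tolerance, provided the ratios are close enough to 2 for the estimate to be trusted.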
7.5
The MATLAB implementation
The main algorithms from this chapter are implemented in the MATLAB functions Stiffness1 and Load1. Several routines are provided to estimate various norms of solutions and errors. There is also a master routine, TestConv1, which monitors the convergence of the finite element method on a model problem with a known solution.
7.5.1
MATLAB functions
• Stiffness1: Assembles the stiffness matrix (as a MATLAB sparse matrix) for the model problem on a mesh of linear Lagrange triangles.
• Load1: Assembles the load vector for the model problem on a mesh of linear Lagrange triangles.
• EnergyNorm1: Estimates (by quadrature on a given mesh) the energy norm of a known function.
• EnergyNormErr1: Estimates (by quadrature) the energy norm of the difference between a known function and a piecewise linear function on a given mesh.
• L2Norm1, L2NormErr1: Like the previous routines, but for the $L^2$-norm.
• LinfNormErr1: Like the previous routines, but for the $L^\infty$-norm.
• Interpolate1: Interpolates a piecewise linear function from a given mesh onto a finer mesh (obtained by one or more applications of Refine1).
• TestConv1: Monitors the convergence of the finite element method on a model problem with a known solution.
• The functions for retrieving information from the mesh data structure are
  - getDirichletData: Evaluates a given function at the constrained nodes.
  - getNeumannData1: Evaluates $h = \kappa\,\partial u/\partial n$ on the free boundary edges, given $\kappa$ and u.
  - getNeumannData1a: Evaluates a given function h on the free boundary edges.
  - getGradients1: Evaluates the gradients of a given piecewise linear function on all the triangles.
  - getNodalValues: Evaluates a given function on the nodes of a given mesh.
  - getNodes1: Extracts the coordinates of the vertices of a triangle, along with their indices.
  - getNormal1: Computes the outward unit normal to a given side of a triangle.
  - getTriNodeIndices1: Creates the triangle-node list for a given mesh.
7.6
Exercises for Chapter 7
1. Show that there is a unique one-point rule that integrates 1, s, and t exactly on $T_R$, namely,

2. Show that there is no two-point rule that integrates 1, s, t, $s^2$, $st$, and $t^2$ exactly on $T_R$. (Hint: Derive the six equations that must be satisfied by the six parameters, $s_1$, $t_1$, $w_1$, $s_2$, $t_2$, $w_2$, and use algebra to prove that there is no solution.)

3. Let T be the triangle with vertices $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$, and let

Prove that $|\det(J)|$ is twice the area of T. (Hint: Define the vectors $v_1 = (x_2 - x_1, y_2 - y_1)$, $v_2 = (x_3 - x_1, y_3 - y_1)$, and let $\theta$ be the angle between $v_1$ and $v_2$. Then $v_1 \cdot v_2 = \|v_1\|\|v_2\|\cos(\theta)$ and the area of T is $(1/2)\|v_1\|\|v_2\|\sin(\theta)$. The trigonometric identity $\cos^2(\theta) + \sin^2(\theta) = 1$ and some algebra give the desired result.)

4. Let T be the triangle with vertices $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, and let $\phi_1$, $\phi_2$, $\phi_3$ be the standard nodal basis functions that are nonzero on T. Let $(\bar{x}, \bar{y})$ be the centroid of T. Prove that $\phi_i(\bar{x}, \bar{y}) = 1/3$ for i = 1, 2, 3.
5. Using the notation of the previous exercise, show that if $(x, y) \in T$, the values $\phi_1(x, y)$, $\phi_2(x, y)$, $\phi_3(x, y)$ are equal to the barycentric coordinates expressing (x, y) in terms of $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$. Show that solving for the barycentric coordinates of $(\xi_i, \eta_i) \in T$, $i = 1, 2, \ldots, p$, leads directly to the equation $M^T V^T = C^T$ described near the end of Section 7.1.2.

6.
(a) Let $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ be the vertices of a triangle T. The basis $\{\phi_1, \phi_2, \phi_3\}$ for the shape functions can be determined by computing the inverse of
as explained in Section 7.1.2. (The gradients $\nabla\phi_i$ are then known.) Count the number of arithmetic operations required to compute $M^{-1}$.
(b) If the reference triangle $T_R$ is employed, it is necessary to compute $\nabla\phi_i = J^{-T}\nabla\gamma_i$, where $\{\gamma_1, \gamma_2, \gamma_3\}$ is the basis for the shape functions on $T_R$ and J is the Jacobian of the transformation from $T_R$ to T. Count carefully the number of operations necessary to compute $J^{-T}\nabla\gamma_i$, i = 1, 2, 3. (Hint: This is done most efficiently by row reducing the matrix $[J \,|\, \nabla\gamma_1 \,|\, \nabla\gamma_2 \,|\, \nabla\gamma_3]$.)
(c) Count the rest of the arithmetic operations required to process one triangle by either approach to assembling the stiffness matrix. What is the overall savings, expressed as a percentage, in using a reference triangle in the computation?

7. Using the results of the previous exercise, give total operation counts for assembling the stiffness matrix, with and without the use of the reference triangle. Express the results in terms of $N_t$ and also $N_v$ (given that $N_v = O(N_t/2)$).

8. Do an operation count for the assembly of the load vector. Express the results in terms of $N_t$ and also $N_v$ (given that $N_v = O(N_t/2)$).

9. Assume that when implementing bilinear quadrilateral elements, all integrals of the form
where Q is an arbitrary quadrilateral, must be computed exactly in the case that $\kappa$ is a constant. Show that the product Gauss rule (7.7) corresponding to n = 2 satisfies this requirement. Note: Remember that the above integral is first transformed to the reference square $S_R$. Do not neglect the Jacobian determinant factor in the transformed integral. It may be assumed that this determinant never changes sign.

10. (MATLAB) Test the heuristic suggested in Section 7.4.3 on a problem with a known solution, such as any of Examples 7.1, 7.3, and 7.4. Is $\|u_h - u_{2h}\|_E$ a good estimate of $\|u - u_h\|_E$?
11. (MATLAB) There are two obvious ways to define a uniform triangulation of a rectangle; these are shown in Figure 6.13 and are produced by the MATLAB routines RectangleMeshD1 and RectangleMeshD1a. Choose a test problem with a known solution and determine which kind of mesh is more efficient (that is, which gives a smaller error for a given number of degrees of freedom).

12. (MATLAB) Consider the BVP
where $\Omega$ is the unit circle and g is defined by
Here $\theta$ is the polar angle corresponding to (x, y) on the unit circle ($(x, y) = (\cos(\theta), \sin(\theta))$). Using piecewise linear finite elements, try to compute the solution to this BVP to within 5% in the energy norm.

13. (MATLAB) Repeat the previous exercise, replacing the Dirichlet data g with
Which problem is easier to solve to the given accuracy? Why?

14. (Programming) Write a MATLAB routine Solve1 to apply piecewise linear finite elements with uniformly refined meshes to solve a given BVP to within a given tolerance. Use the heuristic from Section 7.4.3 to decide when the solution is accurate enough. Also include a limit on the number of refinements or the total number of triangles so that the routine will stop in a reasonable amount of time, even if an unrealistically small tolerance is given.

15. Consider the following BVP:
where
is a partition of
and
on
(a) Derive the variational form of the BVP (cf. Exercise 2.7.12; notice the inhomogeneous Dirichlet conditions here). (b) Derive the formula for computing the load vector F from the Galerkin formulation (cf. Exercise 4.8.17).
16. (Programming) Write a MATLAB routine Mass1, similar to Stiffness1, for computing the mass matrix $M \in \mathbb{R}^{N_f \times N_f}$ defined by
(cf. Exercise 4.8.17, where M was introduced). Use a quadrature rule having degree of precision 2.

17. (Programming) Extend Load1 so that it computes the load vector for the BVP from Exercise 15.

18. (Programming) Rewrite Stiffness1 and Load1 to perform all computations over the reference triangle $T_R$. Compare the performance of the new version with the old.

19. (Programming project) Write a collection of MATLAB routines that completely implements bilinear quadrilateral finite elements. This will involve modifying the mesh data structure to describe quadrilateral elements and rewriting the routines Stiff1 and Load1, as well as the routines they invoke. As discussed in Section 4.5, it is necessary to perform all integrations over a reference square.
Chapter 8
Lagrange triangles of arbitrary degree
This chapter extends the algorithms and codes of the previous chapters to Lagrange triangles of arbitrary degree, and also to isoparametric finite elements. The presentation continues to address the model problem
More general BVPs are considered in the next chapter. A few simple changes allow the mesh data structure to describe a mesh of Lagrange triangles of degree $d \ge 1$. Each edge now contains d + 1 nodes, so the Edges array will be $N_e \times (d + 1)$, with row i listing the (indices of the) nodes of $e_i$ in order. In addition to the edge nodes, if $d \ge 3$, each triangle contains interior nodes. These (or their indices) will be listed in the IntNodes array. IntNodes has $N_t$ rows and $(d - 2)(d - 1)/2$ columns; row k contains the indices of the interior nodes belonging to $T_k$.
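The array widths described above follow from simple counting, which the Python sketch below spells out. The function name is hypothetical; the total is obtained by adding the three vertices, the d - 1 non-vertex nodes on each of the three edges, and the interior nodes.

```python
def lagrange_counts(d):
    """Return (nodes per edge, interior nodes per triangle, total nodes per triangle)."""
    nodes_per_edge = d + 1                 # width of the Edges array
    interior = (d - 2) * (d - 1) // 2      # width of the IntNodes array
    total = 3 + 3 * (d - 1) + interior     # vertices + edge nodes + interior nodes
    return nodes_per_edge, interior, total


counts = {d: lagrange_counts(d) for d in range(1, 5)}
```

The total agrees with $(d + 1)(d + 2)/2$, the dimension of the space of polynomials of degree d in two variables, and the first interior node appears at d = 3.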
8.1
Quadrature for higher-order elements
When using piecewise polynomials of degree greater than one, it is necessary to use a quadrature rule with a sufficiently high degree of precision, or else the finite element method may break down. (For example, the stiffness matrix can be singular if it is computed with a quadrature rule that is not sufficiently accurate.) If $\phi_i$ and $\phi_j$, restricted to a triangle T, are polynomials of degree d, then $\nabla\phi_j \cdot \nabla\phi_i$ has degree $2d - 2$. If the quadrature rule used to compute integrals of the form

$$\int_T \kappa \nabla\phi_j \cdot \nabla\phi_i$$

has degree of precision $2d - 2$, then the integrals are computed exactly in the case that $\kappa$ is constant. Moreover, as explained in Section 5.5, such a quadrature rule is sufficiently accurate even if $\kappa$ is not constant.
p   n   w                    $\xi_1$              $\xi_2$              $\xi_3$
1   1    1.000000000000000   0.333333333333333   0.333333333333333   0.333333333333333
2   3    0.333333333333333   0.666666666666667   0.166666666666667   0.166666666666667
3   1   -0.562500000000000   0.333333333333333   0.333333333333333   0.333333333333333
    3    0.520833333333333   0.600000000000000   0.200000000000000   0.200000000000000
4   3    0.223381589678011   0.108103018168070   0.445948490915965   0.445948490915965
    3    0.109951743655322   0.816847572980459   0.091576213509771   0.091576213509771
5   1    0.225000000000000   0.333333333333333   0.333333333333333   0.333333333333333
    3    0.132394152788506   0.059715871789770   0.470142064105115   0.470142064105115
    3    0.125939180544827   0.797426985353087   0.101286507323456   0.101286507323456
6   3    0.116786275726379   0.501426509658179   0.249286745170910   0.249286745170910
    3    0.050844906370207   0.873821971016996   0.063089014491502   0.063089014491502
    6    0.082851075618374   0.053145049844817   0.310352451033784   0.636502499121399

Table 8.1. Symmetric Gaussian quadrature rules (Dunavant [18]). The degree of precision is p, the quadrature weight for each point is w, and n permutations of each point of the form $(\xi_1, \xi_2, \xi_3)$ appear in the formula, each with weight w.
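Dunavant [18] has derived quadrature rules of degree of precision up to 20. These rules have a certain symmetry: if one of the quadrature nodes is $(\xi_1, \xi_1, \xi_2)$ (in barycentric coordinates), then the following three points are all quadrature nodes, and with the same weight:
$$(\xi_1,\xi_1,\xi_2),\quad (\xi_1,\xi_2,\xi_1),\quad (\xi_2,\xi_1,\xi_1).$$
Similarly, if $(\xi_1, \xi_2, \xi_3)$ is one quadrature node, then the following six nodes are all quadrature nodes, and with the same weight:
$$(\xi_1,\xi_2,\xi_3),\ (\xi_1,\xi_3,\xi_2),\ (\xi_2,\xi_1,\xi_3),\ (\xi_2,\xi_3,\xi_1),\ (\xi_3,\xi_1,\xi_2),\ (\xi_3,\xi_2,\xi_1).$$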
Table 8.1 gives quadrature rules having degree of precision up to 6.

In assembling the stiffness matrix, the element stiffness matrices are computed on each triangle $T$ and added to the global stiffness matrix. There are two ways this can be done: Estimate
$$\int_T \kappa\,\nabla\phi_j\cdot\nabla\phi_i \tag{8.2}$$
by the appropriate quadrature rule over $T$, or transform the integral to the reference triangle $T_R$ and then estimate it. For a generic integral
$$\int_T f$$
there is really not much difference between the two approaches. However, since the integrand in (8.2) has such a special form, there is, as it turns out, a great advantage in using the reference triangle.
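The symmetric storage used in Table 8.1 can be expanded into an explicit list of points with a few lines of code. The following is a minimal sketch in Python, not the book's MATLAB routine DunavantData; the names `RULE_P4`, `expand_rule`, and `integrate_monomial` are hypothetical, and the sketch assumes that the tabulated weights sum to 1 (so they are scaled by the area 1/2 of the reference triangle) and that the barycentric point $(\xi_1,\xi_2,\xi_3)$ corresponds to the Cartesian point $(\xi_2,\xi_3)$.

```python
from itertools import permutations
from math import factorial

# Rows of Table 8.1 for degree of precision 4: (n, w, xi1, xi2, xi3).
RULE_P4 = [
    (3, 0.223381589678011, 0.108103018168070, 0.445948490915965, 0.445948490915965),
    (3, 0.109951743655322, 0.816847572980459, 0.091576213509771, 0.091576213509771),
]

def expand_rule(rows):
    """Expand symmetric rows into a full list of (weight, (xi1, xi2, xi3))."""
    points = []
    for n, w, *xi in rows:
        distinct = sorted(set(permutations(xi)))
        # The multiplicity n in the table must equal the number of distinct permutations.
        assert len(distinct) == n
        points.extend((w, p) for p in distinct)
    return points

def integrate_monomial(points, a, b):
    """Estimate the integral of x^a y^b over the reference triangle.

    Assumption: weights sum to 1, so the sum is scaled by the area 1/2.
    """
    return 0.5 * sum(w * p[1] ** a * p[2] ** b for w, p in points)

def exact_monomial(a, b):
    """Exact integral of x^a y^b over the triangle with vertices (0,0), (1,0), (0,1)."""
    return factorial(a) * factorial(b) / factorial(a + b + 2)
```

With these definitions, `expand_rule(RULE_P4)` yields six points, and the rule reproduces the exact integral of every monomial of total degree at most 4, which is what degree of precision 4 means.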
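I will begin by showing how to compute the integrals without using the reference triangle and estimate the work necessary to do so. In order to estimate (8.2) directly, it is necessary to know the values of $\nabla\phi_i$ and $\nabla\phi_j$ at the quadrature nodes. I will assume that $i$ and $j$ are local indices, so each of $\phi_i$ and $\phi_j$ has value 1 at one of the nodes on $T$ and value 0 at all the other nodes. If
$$\phi_j(x,y) = a_{1j} + a_{2j}x + a_{3j}y + a_{4j}x^2 + a_{5j}xy + a_{6j}y^2 + \cdots$$
and $(x_k, y_k)$, $k = 1, 2, \ldots, i_d$, are the nodes on $T$, then the conditions defining $\phi_1, \phi_2, \ldots, \phi_{i_d}$ on $T$ can be written as the matrix equation $MA = I$, where $I$ is the identity matrix,
$$M = \begin{bmatrix} 1 & x_1 & y_1 & x_1^2 & x_1y_1 & y_1^2 & \cdots \\ 1 & x_2 & y_2 & x_2^2 & x_2y_2 & y_2^2 & \cdots \\ \vdots & & & & & & \vdots \\ 1 & x_{i_d} & y_{i_d} & x_{i_d}^2 & x_{i_d}y_{i_d} & y_{i_d}^2 & \cdots \end{bmatrix},$$
and
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1 i_d} \\ \vdots & & & \vdots \\ a_{i_d 1} & a_{i_d 2} & \cdots & a_{i_d i_d} \end{bmatrix}$$
is the matrix of coefficients of $\phi_1, \phi_2, \ldots, \phi_{i_d}$. (The system $MA = I$ represents $i_d$ systems of $i_d$ equations in $i_d$ unknowns; each system has the same coefficient matrix.) Therefore, all of the local basis functions can be computed simultaneously by computing $A = M^{-1}$.

In addition to the mesh nodes $(x_1, y_1), (x_2, y_2), \ldots, (x_{i_d}, y_{i_d})$, there are $n$ quadrature nodes on $T$, say $(\xi_k, \eta_k)$, $k = 1, 2, \ldots, n$. The values of the basis functions at these quadrature nodes form a matrix $V \in \mathbf{R}^{n\times i_d}$, $V_{ij} = \phi_j(\xi_i, \eta_i)$, which can be computed by the formula
$$V = CA,$$
where
$$C = \begin{bmatrix} 1 & \xi_1 & \eta_1 & \xi_1^2 & \xi_1\eta_1 & \eta_1^2 & \cdots \\ \vdots & & & & & & \vdots \\ 1 & \xi_n & \eta_n & \xi_n^2 & \xi_n\eta_n & \eta_n^2 & \cdots \end{bmatrix}.$$
If the values of the basis functions are needed, there is no reason to compute the coefficients $A = M^{-1}$ explicitly; instead, $V$ can be computed directly by solving
$$M^TV^T = C^T.$$
The cost of solving such a matrix equation, where $M$ is $i_d \times i_d$ and $C$ is $n$ by $i_d$, is approximately
$$\tfrac{2}{3}i_d^3 + 2ni_d^2 \tag{8.3}$$
arithmetic operations.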
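The computation of $V$ without forming $M^{-1}$ can be sketched as follows. This is an illustrative Python version under stated assumptions, not the book's MATLAB code: the monomial ordering $1, x, y, x^2, xy, y^2, \ldots$ is a choice made here, the helper names `monomials`, `solve`, and `basis_values` are hypothetical, and the linear system $M^TV^T = C^T$ is solved by plain Gaussian elimination with partial pivoting.

```python
def monomials(x, y, d):
    """Monomials x^a y^b with a + b <= d, ordered 1, x, y, x^2, xy, y^2, ..."""
    return [x ** (t - b) * y ** b for t in range(d + 1) for b in range(t + 1)]

def solve(A, B):
    """Solve A X = B (A square, B with several columns) by Gaussian elimination."""
    n, m = len(A), len(B[0])
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            mult = aug[r][c] / aug[c][c]
            for k in range(c, n + m):
                aug[r][k] -= mult * aug[c][k]
    X = [[0.0] * m for _ in range(n)]
    for c in range(n - 1, -1, -1):  # back substitution, one column of B at a time
        for k in range(m):
            s = aug[c][n + k] - sum(aug[c][j] * X[j][k] for j in range(c + 1, n))
            X[c][k] = s / aug[c][c]
    return X

def basis_values(nodes, eval_pts, d):
    """Return V with V[k][j] = phi_j(eval_pts[k]) by solving M^T V^T = C^T."""
    M = [monomials(x, y, d) for x, y in nodes]      # i_d x i_d
    C = [monomials(x, y, d) for x, y in eval_pts]   # n x i_d
    Mt = [list(col) for col in zip(*M)]
    Ct = [list(col) for col in zip(*C)]             # i_d x n
    Vt = solve(Mt, Ct)
    return [list(row) for row in zip(*Vt)]          # n x i_d
```

For example, with the six nodes of a quadratic triangle, evaluating at the nodes themselves returns the identity matrix, and at any other point the values in a row of $V$ sum to 1.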
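The gradients of the local basis functions can be computed similarly. I will write
$$C_\xi = \begin{bmatrix} 0 & 1 & 0 & 2\xi_1 & \eta_1 & 0 & \cdots \\ \vdots & & & & & & \vdots \\ 0 & 1 & 0 & 2\xi_n & \eta_n & 0 & \cdots \end{bmatrix}$$
and
$$C_\eta = \begin{bmatrix} 0 & 0 & 1 & 0 & \xi_1 & 2\eta_1 & \cdots \\ \vdots & & & & & & \vdots \\ 0 & 0 & 1 & 0 & \xi_n & 2\eta_n & \cdots \end{bmatrix}$$
(these matrices are obtained by differentiating the columns of $C$ with respect to $\xi$ and $\eta$). Then, if $V_x$, $V_y$ are defined by
$$(V_x)_{ij} = \frac{\partial\phi_j}{\partial x}(\xi_i,\eta_i), \qquad (V_y)_{ij} = \frac{\partial\phi_j}{\partial y}(\xi_i,\eta_i),$$
it follows that
$$V_x = C_\xi A, \qquad V_y = C_\eta A.$$
The matrices $V_x$ and $V_y$ can be efficiently computed by solving the matrix equation
$$M^T\left[V_x^T \,|\, V_y^T\right] = \left[C_\xi^T \,|\, C_\eta^T\right].$$
The cost of solving this system, in which $M$ is $i_d \times i_d$ and $[C_\xi^T | C_\eta^T]$ is $i_d \times (2n)$, is approximately
$$\tfrac{2}{3}i_d^3 + 4ni_d^2$$
arithmetic operations.

Having computed the local basis functions (or their gradients), the integrals are easy to estimate. If $w_k$, $k = 1, 2, \ldots, n$, are the quadrature weights over $T$ and $(\xi_k, \eta_k)$, $k = 1, 2, \ldots, n$, are the corresponding quadrature nodes (as noted above), then
$$\int_T f\phi_i \approx \sum_{k=1}^n w_k f(\xi_k,\eta_k)\phi_i(\xi_k,\eta_k).$$
It is convenient to define the vector $\tilde f \in \mathbf{R}^n$ by $\tilde f_k = w_k f(\xi_k, \eta_k)$ (since these products are needed for each $i = 1, 2, \ldots, i_d$). Then
$$\int_T f\phi_i \approx \sum_{k=1}^n V_{ki}\tilde f_k = \left(V^T\tilde f\right)_i,$$
so all the integrals over $T$ can be estimated by the matrix-vector product $V^T\tilde f$, which involves $2ni_d$ arithmetic operations.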
Similarly,
$$\int_T \kappa\nabla\phi_i\cdot\nabla\phi_j \approx \sum_{k=1}^n g_k\left[(V_x)_{ki}(V_x)_{kj} + (V_y)_{ki}(V_y)_{kj}\right],$$
where $g_k = w_k\kappa(\xi_k, \eta_k)$. The cost of the above sum is $4n$ operations. Since there are $i_d(i_d + 1)/2$ integrals of the form $\int_T \kappa\nabla\phi_j\cdot\nabla\phi_i$ over $T$, the total cost of the integrals for the element stiffness matrix is about $2ni_d^2$. The above results show that the cost of computing the element stiffness matrix is about
$$\tfrac{2}{3}i_d^3 + 6ni_d^2 \tag{8.4}$$
operations. I have ignored some of the necessary calculations, such as computing the quadrature weights and nodes on $T$, but these cost little compared to (8.4). The cost of computing the element load vector is about
$$\tfrac{2}{3}i_d^3 + 2ni_d^2 \tag{8.5}$$
operations. This total comes from computing the matrix $V$ (that is, the values of the local basis functions); the additional cost of computing $V^T\tilde f$, $2i_dn$ operations, is negligible by comparison. In obtaining the totals (8.4) and (8.5), I have ignored the cost of computing the values of $\kappa$ and $f$ at the quadrature nodes. Depending on how complicated these functions are, the cost of computing them might be much less or much greater than the costs counted above.

In light of the above discussion, it is easy to see the advantage of using a reference triangle. The values and gradients of the local basis functions on $T_R$, $\gamma_1, \gamma_2, \ldots, \gamma_{i_d}$, can be computed once and saved. Then, given any triangle $T$,
$$\int_T f\phi_i \approx \sum_{k=1}^n w_k^{(T)} f\left(\xi_k^{(T)},\eta_k^{(T)}\right)\gamma_i(\xi_k,\eta_k). \tag{8.6}$$
Here $w_k^{(T)}$, $(\xi_k^{(T)}, \eta_k^{(T)})$, $k = 1, 2, \ldots, n$, are the quadrature weights and nodes on $T$ and $w_k$, $(\xi_k, \eta_k)$, $k = 1, 2, \ldots, n$, are the quadrature weights and nodes on $T_R$. The weights on $T$ are obtained by scaling the weights on $T_R$ by a factor proportional to $A$, the area of $T$. The dominant cost of computing the element load vector over $T$, namely the cost of computing the matrix $V$ of values of the basis functions, has now been eliminated. Or, to be precise, the same computation must now be performed only once (for $T_R$), so that the number of operations per triangle in (8.5) has been replaced by the negligible cost of
$$2ni_d$$
operations per triangle. The other costs remain the same (the quadrature nodes still must be computed over $T$, the values of $f$ at the quadrature nodes of $T$ must still be obtained, etc.). The overall result is to greatly decrease the cost of assembling the load vector.

When using the reference triangle to assemble the stiffness matrix, the formula is
$$\int_T \kappa\nabla\phi_i\cdot\nabla\phi_j \approx \sum_{k=1}^n w_k^{(T)}\kappa\left(\xi_k^{(T)},\eta_k^{(T)}\right)\left(J^{-T}\nabla\gamma_i(\xi_k,\eta_k)\right)\cdot\left(J^{-T}\nabla\gamma_j(\xi_k,\eta_k)\right). \tag{8.7}$$
Instead of computing the matrices $V_x$, $V_y$ described above, this approach requires the computation of $J^{-T}\nabla\gamma_i(\xi_k, \eta_k)$ for $k = 1, 2, \ldots, n$ and $i = 1, 2, \ldots, i_d$. This requires about $6ni_d$ operations (see Exercise 1). The other operation counts remain the same. Therefore, the cost of computing the element stiffness matrix for each triangle, using the reference triangle approach, is about
$$6ni_d + 2ni_d^2.$$
This is again a considerable reduction over the first approach described above, though not as dramatic as for the load vector.

The approximate operation counts given above, such as (8.3), are valid for $i_d$ large. Since $i_d$ is not particularly large when $d$ is small (for example, $i_2 = 6$, $i_3 = 10$), the specific formulas should not be regarded as precise. Nevertheless, the overall conclusion is valid, even for $d = 2$: There is a significant gain in efficiency in using the reference triangle approach.
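To get a feel for the size of the savings, the leading-order counts can be evaluated for small $d$. The sketch below is only illustrative: it assumes the per-triangle counts $\tfrac{2}{3}i_d^3 + 6ni_d^2$ for the direct approach (which relies on the standard operation count for an LU-based solve, an assumption here) and $6ni_d + 2ni_d^2$ for the reference triangle approach, with $i_d = (d+1)(d+2)/2$. The function names are hypothetical.

```python
def num_nodes(d):
    """Number of nodes i_d on a Lagrange triangle of degree d."""
    return (d + 1) * (d + 2) // 2

def direct_cost(d, n):
    """Assumed leading-order cost per triangle without the reference triangle."""
    i_d = num_nodes(d)
    return (2 / 3) * i_d ** 3 + 6 * n * i_d ** 2

def reference_cost(d, n):
    """Assumed leading-order cost per triangle using the reference triangle."""
    i_d = num_nodes(d)
    return 6 * n * i_d + 2 * n * i_d ** 2

# Quadratic triangles need degree of precision 2d - 2 = 2, i.e. the 3-point rule.
ratio = direct_cost(2, 3) / reference_cost(2, 3)
```

For $d = 2$ and $n = 3$ these formulas give roughly 792 versus 324 operations per triangle. As the text cautions, the counts are asymptotic, so such numbers indicate a trend rather than a precise cost.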
8.2 Assembling the stiffness matrix and load vector
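Most of the necessary details for assembling the stiffness matrix and load vector were described in the preceding section. A remaining detail is the definition of the interpolation nodes on the triangular elements, specifically when there are interior nodes ($d > 2$). Lagrange triangles of degree $d = 4, 5, 6$ were shown in Figure 4.16. Using barycentric coordinates, the nodes are easily seen to be
$$\left(\frac{i}{d},\frac{j}{d},\frac{k}{d}\right), \qquad i, j, k \ge 0,\ i + j + k = d.$$
The interior nodes are
$$\left(\frac{i}{d},\frac{j}{d},\frac{k}{d}\right), \qquad i, j, k \ge 1,\ i + j + k = d. \tag{8.8}$$
The interior nodes on the reference triangle $T_R$ are (in Cartesian coordinates)
$$\left(\frac{j}{d},\frac{i}{d}\right), \qquad i = 1, 2, \ldots, d-2,\ j = 1, 2, \ldots, d-1-i.$$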
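The node layout is easy to generate programmatically. The following Python sketch (hypothetical helper names, exact rational arithmetic via `fractions`) assumes the usual equally spaced Lagrange nodes: barycentric coordinates $(i/d, j/d, k/d)$ with $i + j + k = d$, and interior reference nodes $(j/d, i/d)$ for $i = 1, \ldots, d-2$, $j = 1, \ldots, d-1-i$.

```python
from fractions import Fraction

def lagrange_nodes_barycentric(d):
    """All nodes (i/d, j/d, k/d) with i, j, k >= 0 and i + j + k = d."""
    return [(Fraction(i, d), Fraction(j, d), Fraction(d - i - j, d))
            for i in range(d + 1) for j in range(d + 1 - i)]

def interior_nodes_reference(d):
    """Interior nodes (j/d, i/d) of the reference triangle with vertices (0,0), (1,0), (0,1)."""
    return [(Fraction(j, d), Fraction(i, d))
            for i in range(1, d - 1)      # i = 1, ..., d-2
            for j in range(1, d - i)]     # j = 1, ..., d-1-i
```

The counts agree with the sizes quoted earlier in the chapter: $(d+1)(d+2)/2$ nodes per triangle, of which $(d-2)(d-1)/2$ are interior (so none for $d \le 2$ and exactly one, the centroid, for $d = 3$).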
Because the various integrals are computed over $T_R$ rather than over each triangle $T_k$, the algorithms for assembling the stiffness matrix and load vector change somewhat in outline from Algorithms 7.2 and 7.5. The gradients of the nodal basis functions $\gamma_1, \gamma_2, \ldots, \gamma_{i_d}$, evaluated at the quadrature nodes on $T_R$, are computed as described in the previous section and stored in two matrices:
$$(V_s)_{kj} = \frac{\partial\gamma_j}{\partial s}(\xi_k,\eta_k), \qquad (V_t)_{kj} = \frac{\partial\gamma_j}{\partial t}(\xi_k,\eta_k).$$
Algorithm 8.1 is the algorithm for assembling the stiffness matrix.

    Initialize K to the zero matrix
    Record the quadrature weights and nodes on T_R
    Evaluate the gradients of the local basis functions, on T_R, at the quadrature nodes (this yields V_s and V_t)
    for k = 1, 2, ..., N_t
        Call getNodes1 to get coords and ptrs
        Compute the transformation J from T_R to T_k, including |det(J)|
        Transform the gradients of the local basis functions to T_R
        if κ is nonconstant
            Transform the quadrature nodes to T_k
            Compute κ at the quadrature nodes on T_k
        for r = 1, 2, ..., i_d
            if ptrs(r) > 0
                for s = r, ..., i_d
                    if ptrs(s) > 0
                        Estimate I = ∫_T κ ∇φ_r · ∇φ_s using (8.7)
                        i = min{ptrs(r), ptrs(s)}
                        j = max{ptrs(r), ptrs(s)}
                        Add I to K(i, j)
    for i = 2, 3, ..., N_f
        for j = 1, 2, ..., i - 1
            K(i, j) = K(j, i)

Algorithm 8.1. The complete algorithm for assembling K. The evaluation of the gradients on $T_R$ was explained in Section 8.1.

The assembly of the load vector is complicated by the fact that the BVP (8.1) has three types of data: the right-hand side $f$ of the PDE, the Dirichlet data $g$, and the Neumann data $h$. The necessary calculations in the case of a nonzero $f$ were discussed in the previous section, and the reader should be able to follow the part of Algorithm 8.2 that pertains to $f$. The quadratures use the values of $\gamma_1, \gamma_2, \ldots, \gamma_{i_d}$ at the quadrature nodes on $T_R$:
$$V_{kj} = \gamma_j(\xi_k,\eta_k).$$
Inhomogeneous Dirichlet conditions lead to integrals of the form
$$\int_T \kappa\nabla G\cdot\nabla\phi_i,$$
which are handled exactly as in Algorithm 8.1. The part of Algorithm 8.2 that handles $g$ should also be understandable at this point.

Inhomogeneous Neumann conditions lead to integrals of the form
$$\int_e h\phi_i, \tag{8.9}$$
where $e$ is a free boundary edge and $\phi_i$ is one of the local basis functions on the triangle to which $e$ belongs. When using Lagrange triangles of degree $d$, $e$ contains $d + 1$ nodes. I will assume that $h$ is not provided exactly, but rather that the values of $h$ at the nodes of each free boundary edge are given as input to the routine that assembles the load vector. Then $h$, restricted to $e$, will be replaced with the polynomial $h_e$ (in one variable) interpolating $h$ at the $d + 1$ nodes on $e$, and (8.9) will be estimated by
$$\int_e h_e\phi_i. \tag{8.10}$$
The local basis function $\phi_i$, restricted to $e$, reduces to a polynomial in one variable of degree $d$. Therefore, the integrand of (8.10) is a polynomial of degree $2d$, and (8.10) can be computed exactly by the $(d+1)$-point Gaussian quadrature rule introduced in Section 7.1.1. To transform (8.10) to the reference interval $[-1, 1]$, $e$ can be parametrized by arc length: $e = \{(x(s), y(s)) : 0 \le s \le \ell\}$, where $\ell$ is the length of the line segment $e$. Then, since $[-1, 1]$ is mapped onto $[0, \ell]$ by $s = \ell(t + 1)/2$, there exist polynomials $\hat h_e(t)$, $\hat\gamma_i(t)$ such that
$$h_e(x(s),y(s)) = \hat h_e(t), \qquad \phi_i(x(s),y(s)) = \hat\gamma_i(t).$$
Then
$$\int_e h_e\phi_i = \frac{\ell}{2}\int_{-1}^{1}\hat h_e(t)\hat\gamma_i(t)\,dt = \frac{\ell}{2}\sum_{j=1}^{d+1} w_j\hat h_e(t_j)\hat\gamma_i(t_j),$$
where $w_j$, $t_j$, $j = 1, 2, \ldots, d + 1$, are the weights and nodes of the $(d+1)$-point Gauss quadrature rule on $[-1, 1]$. Finally, the functions $\hat\gamma_i$, $i = 1, 2, \ldots, d + 1$, are the one-dimensional nodal basis functions on $[-1, 1]$ (analogous to $\gamma_i$, $i = 1, 2, \ldots, i_d$, on the triangle $T_R$); each is a polynomial of degree $d$ defined by
$$\hat\gamma_i(\tau_j) = \begin{cases} 1, & i = j, \\ 0, & i \ne j, \end{cases} \tag{8.11}$$
where $\tau_j = -1 + 2(j - 1)/d$, $j = 1, 2, \ldots, d + 1$. Any polynomial of degree $d$ can be expressed in terms of $\hat\gamma_1, \hat\gamma_2, \ldots, \hat\gamma_{d+1}$, and thus the values of these basis functions at the quadrature nodes can be computed once and for all. Thus the reference interval $[-1, 1]$ is used in a fashion exactly analogous to how the reference triangle $T_R$ is used when the stiffness matrix and load vector are assembled. The remaining details are left to the reader (see Exercise 3). The complete algorithm for assembling the load vector is given in Algorithm 8.2.
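The edge computation can be sketched for quadratic elements as follows. This Python fragment is an illustration, not the book's Load routine: it assumes nodal data `h_nodes` given in order along a straight edge of length `length`, uses the Lagrange form of the one-dimensional nodal basis on the equally spaced points $-1 + 2(j-1)/d$, and hard-codes the 3-point Gauss rule, which is exact for the degree-$2d = 4$ integrand when $d = 2$. All names are hypothetical.

```python
import math

def basis_1d(i, t, d):
    """One-dimensional nodal basis function i (1-based) of degree d on [-1, 1]."""
    tau = [-1 + 2 * j / d for j in range(d + 1)]
    val = 1.0
    for j in range(d + 1):
        if j != i - 1:
            val *= (t - tau[j]) / (tau[i - 1] - tau[j])
    return val

# 3-point Gauss rule on [-1, 1] as (weight, node); exact for polynomials of degree <= 5.
GAUSS3 = [(5 / 9, -math.sqrt(3 / 5)), (8 / 9, 0.0), (5 / 9, math.sqrt(3 / 5))]

def edge_integrals(h_nodes, length, d=2, rule=GAUSS3):
    """Estimate the integrals of h_e * phi_i over a straight edge, i = 1, ..., d+1."""
    out = []
    for i in range(1, d + 2):
        total = 0.0
        for w, t in rule:
            # h_e interpolates the nodal values of h on the edge.
            h_e = sum(h_nodes[j] * basis_1d(j + 1, t, d) for j in range(d + 1))
            total += w * h_e * basis_1d(i, t, d)
        out.append(0.5 * length * total)   # factor ell/2 from s = ell (t + 1) / 2
    return out
```

With $h \equiv 1$ on an edge of length 2, the three integrals are $1/3$, $4/3$, $1/3$ (the Simpson weights), and they sum to the length of the edge.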
8.3 Implementing the isoparametric method
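The isoparametric method for solving a BVP on a nonpolygonal domain was described in Section 4.7. If $T$ is an element next to the boundary, then $T$ is defined to be the image of the reference triangle under a transformation of the form
$$x = p(s,t), \qquad y = q(s,t), \tag{8.12}$$
where $p$ and $q$ are polynomials of degree $d$. The reader will recall that "isoparametric" means that $d$ is also the degree of the shape functions. As shown in Section 4.7, the transformation can be represented explicitly as
$$x = \sum_{i=1}^{i_d} x_i\gamma_i(s,t), \qquad y = \sum_{i=1}^{i_d} y_i\gamma_i(s,t), \tag{8.13}$$
where $(x_1, y_1), (x_2, y_2), \ldots, (x_{i_d}, y_{i_d})$ are the nodes on the element $T$ and $\gamma_1, \gamma_2, \ldots, \gamma_{i_d}$ are the nodal basis functions on the reference triangle $T_R$. The basis functions on $T$ are then defined by
$$\phi_i(x,y) = \gamma_i(s,t),$$
where $(s, t)$ and $(x, y)$ are related by (8.13). Since $\phi_1, \phi_2, \ldots, \phi_{i_d}$ are not polynomials, it is essential to compute
$$\int_T \kappa\nabla\phi_i\cdot\nabla\phi_j, \qquad \int_T f\phi_i$$
by transforming the integrals to $T_R$:
$$\int_T \kappa\nabla\phi_i\cdot\nabla\phi_j = \int_{T_R}\hat\kappa\left(J^{-T}\nabla\gamma_i\right)\cdot\left(J^{-T}\nabla\gamma_j\right)|\det(J)|, \qquad \int_T f\phi_i = \int_{T_R}\hat f\,\gamma_i\,|\det(J)|.$$
When implementing these formulas, it is important to bear in mind that the Jacobian matrix $J$ is not constant.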
    Initialize F to the zero vector
    if f or g is nonzero
        Record the quadrature weights and nodes on T_R
    if f is nonzero
        Evaluate the local basis functions, on T_R, at the quadrature nodes (this yields the matrix V)
    if g is nonzero
        Evaluate the gradients of the local basis functions, on T_R, at the quadrature nodes (yielding V_s, V_t)
    if h is nonzero
        Record the quadrature weights and nodes on the (1D) interval [-1, 1]
        Evaluate the (1D) basis functions at the quadrature nodes on [-1, 1]
    for k = 1, 2, ..., N_t
        Call getNodes to get coords and ptrs
        if f or g is nonzero
            Compute the transformation J from T_R to T_k
            Transform the quadrature weights to T_k
        if f is nonzero
            Compute f at the quadrature nodes on T_k
            for r = 1, 2, ..., i_d
                if ptrs(r) > 0
                    Estimate ∫_T f φ_r using (8.6) and add to F(ptrs(r))
        if g is nonzero and there is a constrained node on T_k
            Compute κ at the quadrature nodes on T_k
            Transform the gradients of the local basis functions to T_R
            Transform the gradient of G to T_R
            for r = 1, 2, ..., i_d
                if ptrs(r) > 0
                    Estimate ∫_T κ ∇φ_r · ∇G using (8.7) and add to F(ptrs(r))
        if h is nonzero
            for j = 1, 2, 3
                if edge j of T_k is a free boundary edge
                    Extract the coordinates and pointers ptrs1 of the endpoints
                    Interpolate the Neumann data at the quadrature nodes
                    Estimate the integrals ∫_e h φ_r, r = 1, 2, ..., d + 1, over the edge using a 1D Gauss rule on [-1, 1]
                    for r = 1, 2, ..., d + 1
                        if node r of edge e is free
                            Add ∫_e h φ_r to F(ptrs1(r))

Algorithm 8.2. The algorithm for assembling the load vector.
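Using an $n$-point quadrature rule with degree of precision $2d - 2$ results in the following formulas:
$$\int_T \kappa\nabla\phi_i\cdot\nabla\phi_j \approx \sum_{k=1}^n w_k\,\hat\kappa(\xi_k,\eta_k)\left(J_k^{-T}\nabla\gamma_i(\xi_k,\eta_k)\right)\cdot\left(J_k^{-T}\nabla\gamma_j(\xi_k,\eta_k)\right)|\det(J_k)|,$$
$$\int_T f\phi_i \approx \sum_{k=1}^n w_k\,\hat f(\xi_k,\eta_k)\,\gamma_i(\xi_k,\eta_k)\,|\det(J_k)|, \qquad J_k = J(\xi_k,\eta_k).$$
Here $w_k$, $(\xi_k, \eta_k)$, $k = 1, 2, \ldots, n$, are the quadrature weights and nodes on $T_R$, and $\hat f$, $\hat\kappa$ are the functions $f$, $\kappa$ transformed from $T$ to $T_R$. These formulas can be implemented using the matrices $V$, $V_s$, $V_t$, as in the previous section. The only novelties are the computation of the quadrature nodes on an arbitrary $T$ and the computation of the nonconstant Jacobians.

I will write $(\xi_i^{(T)}, \eta_i^{(T)})$ for the quadrature node on $T$ corresponding to $(\xi_i, \eta_i)$. In light of the correspondence (8.13), $(\xi_i^{(T)}, \eta_i^{(T)})$ is given by
$$\xi_i^{(T)} = \sum_{j=1}^{i_d} x_j\gamma_j(\xi_i,\eta_i), \qquad \eta_i^{(T)} = \sum_{j=1}^{i_d} y_j\gamma_j(\xi_i,\eta_i).$$
Therefore, the matrix-vector products $Vx$, $Vy$, where
$$V_{ij} = \gamma_j(\xi_i,\eta_i),$$
yield the coordinates of the quadrature nodes on $T$. (I am abusing notation by using the same symbol for the variable $x$ and the vector whose components are $x_1, x_2, \ldots, x_{i_d}$, and similarly for $y$.) The values
$$\hat f(\xi_i,\eta_i) = f\left(\xi_i^{(T)},\eta_i^{(T)}\right), \qquad \hat\kappa(\xi_i,\eta_i) = \kappa\left(\xi_i^{(T)},\eta_i^{(T)}\right)$$
can then be computed. The Jacobian matrix is defined by
$$J = \begin{bmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial x}{\partial t} \\[2mm] \dfrac{\partial y}{\partial s} & \dfrac{\partial y}{\partial t} \end{bmatrix}.$$
From (8.13), it follows that
$$\frac{\partial x}{\partial s}(s,t) = \sum_{j=1}^{i_d} x_j\frac{\partial\gamma_j}{\partial s}(s,t),$$
and hence
$$\frac{\partial x}{\partial s}(\xi_i,\eta_i) = (V_sx)_i.$$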
Chapter 8. Lagrange triangles of arbitrary degree
The other partial derivatives needed to compute J (£/, 77,-) can be computed similarly. All of the needed values of J can thus be computed from four matrix-vector products: Vsx, Vtx, Vsy, Vty. The above discussion covers the assembly of K and the contribution of the function / (the right-hand side of the PDE) to the load vector. Computation of the contribution of nonzero Dirichlet data to the load vector parallels the computation of entries in the stiffness matrix, and the details are left to the reader. If the BVP includes inhomogeneous Neumann conditions, then integrals of the form
must be computed, where e is an edge of triangle T. I adopt the convention that if T contains a curved edge, then it must be the second edge. It then corresponds to the second edge of TR, {(I — s, s) : 0 < s < 1}, and can therefore be parametrized as where and (p, q) defines the transformation from TR to T (see (8.12) and (8.13)). Then
(using the element of arc length to write the line integral). An n-point (one-dimensional) Gauss quadrature rule can be used to estimate the integral:
(rather than invent new notation, I now use w,, f, to denote the weights and nodes from the one-dimensional Gauss quadrature rule). Evaluation of
now follows a familiar pattern. The matrices V, Vs, and Vt are defined by
These matrices can be computed in the same fashion as V, Vs, and Vt (see Section 8.1). Then
8.3. Implementing the isoparametric method
199
The last two formulas follow from the chain rule applied to a(s) — p(\ — s, s) and b(s) — q(\ - s,s). If the function h is not given explicitly (so that h(a(ti), b(tj)) cannot be evaluated directly), then an interpolating polynomial can be defined to approximate h, as was described in the previous section. This is the approach taken in the MATLAB code.
8.3.1 Placement of nodes in the isoparametric method
A subtlety that arises in implementing the isoparametric method is the definition of the nodes interior to the elements having curved edges. Formula (8.8), which defines the interior nodes for an ordinary triangle, gives the nodes in terms of the vertices of the triangle. When the element has a curved edge, this formula is no longer adequate; the interior nodes must move with the curved edge. This is necessary because the transformation from $T_R$ to $T$ is defined by the nodal placements, and the integration theory requires that the Jacobian determinant of this transformation not change sign. The convergence theory imposes even more stringent conditions, such as uniform bounds, on the norms of the Jacobians and their inverses. Ciarlet and Raviart [15] derived the basic theory that governs the placement of the nodes, and Scott [38] devised an algorithm for computing nodes on triangular meshes that satisfy the required conditions. This algorithm was later extended by Lenoir [27] to higher-dimensional problems. I will now describe Scott's algorithm.

Given an element with a curved boundary, there are actually three related elements (plus the reference element $T_R$) involved. As in Section 4.7, I denote by $\omega$ a subregion of $\Omega$ that includes part of the (curved) boundary of $\Omega$. The ordinary triangle approximating $\omega$ (see Figure 4.20) will be denoted by $\hat T$, while the isoparametric triangle approximating $\omega$ will be denoted by $T$. The curved part of the boundary of $\omega$ will be denoted by $\Gamma_\omega$, which is parametrized as
$$\Gamma_\omega = \{(\alpha(s),\beta(s)) : 0 \le s \le 1\}.$$
The element $T$ is obtained by approximating $\Gamma_\omega$ by a polynomial curve
$$\{(a(s),b(s)) : 0 \le s \le 1\},$$
where $a$, $b$ are polynomials of degree $d$ that interpolate $\alpha$ and $\beta$ at $d + 1$ nodes on $\Gamma_\omega$, namely, at the points
$$(\alpha(i/d),\beta(i/d)), \qquad i = 0, 1, \ldots, d.$$
The reference triangle $T_R$ can be mapped to $\hat T$ by a linear mapping $F$. By convention, the curved edge of $T$ must be the second edge, which corresponds to the second edge of $T_R$:
$$\{(1-s,s) : 0 \le s \le 1\}.$$
Thus the second edge of $\hat T$ is $\{F(1 - s, s) : 0 \le s \le 1\}$. The interior nodes of $T_R$ lie on (horizontal) lines joining a node $(0, i/d)$ on the third edge of $T_R$ with a node $(1 - i/d, i/d)$ on the second edge of $T_R$. The linear mapping sends these nodes to $d - i - 1$ nodes on the line joining the node $F(0, i/d)$ on the third edge of $\hat T$ with the node $F(1 - i/d, i/d)$ on the second edge of $\hat T$. The Scott algorithm merely moves these nodes to be evenly spaced between $F(0, i/d)$ (which is a node on the third edge of $T$) and $(a(i/d), b(i/d))$ (the node on the second edge of $T$ corresponding to $F(1 - i/d, i/d)$). Thus the interior nodes are
$$F(0,i/d) + \frac{j}{d-i}\left[(a(i/d),b(i/d)) - F(0,i/d)\right]$$
for $i = 1, 2, \ldots, d - 2$, $j = 1, 2, \ldots, d - 1 - i$. An example is shown in Figure 8.1.

Figure 8.1. The reference triangle $T_R$ of degree 5, and a corresponding isoparametric "triangle."
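The node placement just described amounts to a short double loop. The Python sketch below is an illustration under stated assumptions rather than the book's GenLagrangeMesh code: `F` is taken to be a callable for the affine map from the reference triangle to the straight-sided triangle, `a` and `b` are callables giving the coordinates of the curved second edge at parameter $s \in [0, 1]$, and the nodes on each line are spaced evenly, at fractions $j/(d-i)$ of the way from the third edge to the curved edge. The function name is hypothetical.

```python
def scott_interior_nodes(d, F, a, b):
    """Interior nodes of an isoparametric triangle whose second edge is curved.

    F(s, t): affine map from the reference triangle to the straight-sided triangle.
    a(s), b(s): coordinates of the curved second edge at parameter s in [0, 1].
    """
    nodes = []
    for i in range(1, d - 1):                 # i = 1, ..., d-2
        x0, y0 = F(0.0, i / d)                # node on the third edge
        x1, y1 = a(i / d), b(i / d)           # matching node on the curved edge
        for j in range(1, d - i):             # j = 1, ..., d-1-i
            theta = j / (d - i)               # evenly spaced between the two end nodes
            nodes.append((x0 + theta * (x1 - x0), y0 + theta * (y1 - y0)))
    return nodes
```

A useful sanity check is that when the "curved" edge is actually straight, that is, $(a(s), b(s)) = F(1-s, s)$, the construction reproduces the ordinary interior nodes $F(j/d, i/d)$.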
8.4 Examples
To illustrate the use of higher-order Lagrange triangles, I will repeat Example 7.3 from Section 7.4, but now using Lagrange triangles of increasing order.

EXAMPLE 8.1. In this example, $\Omega$ is a region bounded by two squares (see Figure 7.2). The BVP is the one from Example 7.3, with $f$ and $g$ chosen so that the exact solution is $u(x, y) = x/(x^2 + y^2)$. The initial mesh is defined by 16 triangles and 16 nodes; the initial mesh and three refinements were shown in Figure 7.2. In Example 7.3, it was shown that linear elements on the finest mesh from Figure 7.2 resulted in an energy norm error of about 16.8%, which is not particularly small. In this example, higher-order elements are used to solve the same problem. The following table shows the percent errors (in the energy norm) resulting from solving the BVP using Lagrange triangles of degree $d = 1, 2, 3, 4$ on the four meshes $\mathcal{T}_0, \mathcal{T}_1, \mathcal{T}_2, \mathcal{T}_3$ from Figure 7.2. The numbers in parentheses are the numbers of free nodes for each mesh.

| Mesh | $d = 1$ | $d = 2$ | $d = 3$ | $d = 4$ |
|---|---|---|---|---|
| $\mathcal{T}_0$ | 79.3 (0) | 50.5 (16) | 28.0 (48) | 14.6 (96) |
| $\mathcal{T}_1$ | 55.4 (16) | 18.6 (96) | 5.6 (240) | 1.6 (448) |
| $\mathcal{T}_2$ | 32.0 (96) | 5.8 (448) | 0.97 (1056) | 0.16 (1920) |
| $\mathcal{T}_3$ | 16.8 (448) | 1.6 (1920) | 0.14 (4416) | 0.012 (7936) |
The number of free nodes gives some indication of the amount of work to form and solve the finite element equations, and therefore provides some basis for comparison. The results show that, for this problem, higher-order elements are more efficient when an accurate solution is desired. For example, $d = 1$ on mesh $\mathcal{T}_3$, $d = 2$ on mesh $\mathcal{T}_2$, and $d = 4$ on mesh $\mathcal{T}_1$ all result in 448 degrees of freedom. The errors are 16.8%, 5.8%, and 1.6%, respectively, confirming that the higher-order method gives a smaller error for a given amount of work. This is always true when the true solution is smooth: higher-order elements are asymptotically more efficient than low-order elements.

EXAMPLE 8.2. To illustrate the use of isoparametric finite elements, this example solves the same BVP as in the previous example, except that the outer boundary of $\Omega$ is changed to the unit circle. The following table compares the percent errors obtained using linear Lagrange triangles with those obtained using cubic isoparametric triangles. In each case, only the (energy norm) error over the approximate domain $\Omega_h$ is computed. In the case of the cubic elements, the additional error is quite small, since $\Omega_h$ is very close to $\Omega$. For linear elements, the neglected error is much more significant (cf. Example 4.5 in Section 4.7), but it does go to zero at the same rate as the computed error. As in the previous example, the number of free nodes on each mesh is given in parentheses.

| Mesh | Linear | Cubic isoparametric |
|---|---|---|
| $\mathcal{T}_0$ | 78.8 (0) | 27.0 (36) |
| $\mathcal{T}_1$ | 54.5 (12) | 5.3 (180) |
| $\mathcal{T}_2$ | 31.3 (72) | 0.89 (792) |
| $\mathcal{T}_3$ | 16.4 (3236) | 0.12 (7923) |
| $\mathcal{T}_4$ | 8.3 (1440) | 0.016 (13536) |

Comparing these results to the results of the previous example shows that the results of using isoparametric elements on a domain with a curved boundary are quite similar to using ordinary elements on a domain with a polygonal boundary. Figure 8.2 shows the initial nonisoparametric mesh and the initial isoparametric mesh. Figure 8.3 shows the final computed solution.
Figure 8.2. The initial nonisoparametric mesh (left) and the initial isoparametric mesh (right) from Example 8.2. The exact (curved) boundary of $\Omega$ is shown as a dashed line. In the isoparametric mesh, the cubic boundary segments follow the curved boundary quite closely.
Figure 8.3. The final computed solution from Example 8.2.

If Lagrange triangles of degree $d$ are used to solve a BVP that has a smooth solution, then
$$\|u - u_h\|_E = O(h^d)$$
holds. Under the assumption $\|u - u_h\|_E \doteq Ch^d$, the reverse triangle inequality yields
$$\|u_h - u_{2h}\|_E \ge \|u - u_{2h}\|_E - \|u - u_h\|_E \doteq (2^d - 1)Ch^d \doteq (2^d - 1)\|u - u_h\|_E.$$
This leads to the heuristic
$$\|u - u_h\|_E \approx \frac{\|u_h - u_{2h}\|_E}{2^d - 1}, \tag{8.16}$$
which can be used to assess the quality of a solution.

EXAMPLE 8.3. As the final example for this chapter, I will solve the second BVP from Example 7.5 again, this time using quadratic Lagrange triangles. The BVP is
and it models the steady-state temperature distribution in a square metal plate measuring 20 cm by 20 cm, completely insulated on the top and bottom and also on three edges. The plate is heterogeneous, with the thermal conductivity plotted in Figure 7.6. The reader can review Example 7.5 for the rest of the details about the BVP. In this example, the goal will be to compute the solution to within 1% in the energy norm. The heuristic (8.16), in the case $d = 2$, is
$$\|u - u_h\|_E \approx \tfrac{1}{3}\|u_h - u_{2h}\|_E.$$
The initial mesh is uniform, with 32 triangles and $h = 5\sqrt{2} \approx 7.071$. Solving the BVP on four successive meshes consisting of quadratic Lagrange triangles yields the following results: 7.071 3.536 1.768 0.8839
— 0.456 0.128 0.0335
— — 3.56 3.83
— 5.09 5.15 5.10
— 9.0 2.5 0.66
The ratios $\|u_{2h} - u_{4h}\|_E / \|u_h - u_{2h}\|_E$ suggest that the assumption
$$\|u - u_h\|_E \doteq Ch^2$$
is not unreasonable, and the values of $\|u_h - u_{2h}\|_E$ show that the estimated error in the fourth computed solution is less than 1%.
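The heuristic is simple enough to script. The following Python sketch applies it to the differences $\|u_h - u_{2h}\|_E$ reported in the example (0.456, 0.128, 0.0335); the function name `estimated_error` is hypothetical, and the factor $1/(2^d - 1)$ rests on the assumption $\|u - u_h\|_E \doteq Ch^d$ discussed above.

```python
def estimated_error(diff_norm, d):
    """Heuristic estimate of ||u - u_h|| from ||u_h - u_2h||, assuming error ~ C h^d."""
    return diff_norm / (2 ** d - 1)

# ||u_h - u_2h||_E on the second, third, and fourth meshes of Example 8.3.
diffs = [0.456, 0.128, 0.0335]

# Ratios of successive differences; values near 2^d = 4 support the O(h^2) assumption.
ratios = [diffs[k] / diffs[k + 1] for k in range(len(diffs) - 1)]

# Estimated energy-norm errors for quadratic elements (d = 2).
estimates = [estimated_error(x, d=2) for x in diffs]
```

The ratios computed from these rounded values come out a little below 4, in line with the ratios reported in the example. Note that the estimate is an absolute error; to compare it with a percentage tolerance, it must still be divided by the energy norm of the computed solution.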
8.5 The MATLAB implementation

8.5.1 version2
MATLAB routines that handle Lagrange triangles of arbitrary degree form version2; the names of routines generally have a suffix "2." The main routines in version2 of the MATLAB code are Stiffness2 and Load2, which assemble the stiffness matrix and load vector for the model problem. These routines work for Lagrange triangles of any degree, with one limiting factor: Quadrature rules have been coded only up to degree of precision 20, limiting the degree of the triangles for model problem (8.1) to $d = 11$. The quadrature rules are coded in DunavantData. Other major computational routines are EvalNodalBasisFcns, for computing the matrix $V_{ij} = \gamma_j(s_i, t_i)$, and EvalNodalBasisGrads, for computing the related matrices $V_s$ and $V_t$. Other routines presented in Section 7.5, such as ShowMesh1, are extended to meshes of arbitrary degree.

MATLAB functions

• Mesh2: Type help Mesh2 to see a description of the mesh data structure.
• GenLagrangeMesh2 Converts a mesh of degree 1 to a mesh of degree d.
• ExtractLinearMesh Recreates the original (linear) mesh from the output of GenLagrangeMesh2.
• RefTri Creates the reference triangle as a mesh of degree d (with one triangle).
• ShowMesh2 Graphs a mesh.
• ShowPWPolyFcn2 Graphs a continuous piecewise polynomial function on a given mesh.
• Stiffness2 Assembles the stiffness matrix (as a MATLAB sparse matrix) for the model problem on a mesh of arbitrary degree.
• Load2 Assembles the load vector for the model problem on a mesh of arbitrary degree.
• TransToRefTri Computes the transformation from the reference triangle $T_R$ to an arbitrary triangle $T$.
• DunavantData Defines the quadrature rules for a triangle.
• GaussData Defines the quadrature rules for an interval.
• EvalNodalBasisFcns Evaluates the nodal basis functions at a list of evaluation nodes.
• EvalNodalBasisGrads Evaluates the gradients of the nodal basis functions at a list of evaluation nodes.
• EvalNodalBasisFcns1D Evaluates the nodal basis functions for a one-dimensional interval at a list of evaluation nodes.
• EvalPWPolyFcn2 Evaluates a piecewise polynomial function of arbitrary degree at a list of evaluation nodes.
• EnergyNorm2 Estimates (by quadrature on a given mesh) the energy norm of a known function.
• EnergyNormErr2 Estimates (by quadrature) the energy norm of the difference between a known function and a piecewise linear function on a given mesh.
• L2Norm2, L2NormErr2 Like the previous routines, but for the $L^2$-norm.
• Interpolate2 Interpolates a piecewise linear function onto a mesh of higher degree obtained by an application of GenLagrangeMesh2.
• Interpolate2a Interpolates a piecewise polynomial function onto a finer mesh obtained by a single application of Refine1.
• TestConv2 Monitors the convergence of the finite element method on a model problem with a known solution.
• The functions for retrieving information from the mesh data structure are
  - getNeumannData2 Evaluates $h = \kappa\,\partial u/\partial n$ on the free boundary edges, given $\kappa$ and $u$.
  - getNeumannData2a Evaluates a given function $h$ on the free boundary edges.
  - getNodes Extracts the coordinates of the nodes of a triangle, along with their indices.
  - getVertices Extracts the coordinates of the vertices of a triangle, along with their indices.
  - getNormal2 Computes the outward unit normal to a given side of a triangle.
  - getTriNodeIndices Creates the triangle-node list for a given mesh.

Some routines, like RefTri and getNodes, have no suffix of "2" because they do not need updating for version3.
8.5.2 version3
The most general routines handle Lagrange triangles of any order, and use isoparametric elements when the boundary is curved. These routines comprise version3, and their names have no suffix: Stiffness, Load, GenLagrangeMesh, ShowMesh, ShowPWPolyFcn, and so forth.

MATLAB functions

All of the following routines, including the graphic routines, handle isoparametric elements correctly.

• Mesh: Type help Mesh to see a description of the mesh data structure.
• GenLagrangeMesh Converts a mesh of degree 1 to a mesh of degree d.
• ShowMesh Graphs a mesh.
• ShowPWPolyFcn Graphs a continuous piecewise polynomial function on a given mesh.
• Stiffness Assembles the stiffness matrix (as a MATLAB sparse matrix) for the model problem on a mesh of arbitrary degree.
• Load Assembles the load vector for the model problem on a mesh of arbitrary degree.
• EnergyNorm Estimates (by quadrature on a given mesh) the energy norm of a known function.
• EnergyNormErr Estimates (by quadrature) the energy norm of the difference between a known function and a piecewise linear function on a given mesh.
• TestConv Monitors the convergence of the finite element method on a model problem with a known solution.
• The functions for retrieving information from the mesh data structure are
  - getNeumannData Evaluates $h = \kappa\,\partial u/\partial n$ on the free boundary edges, given $\kappa$ and $u$.
  - getNormal Computes the outward unit normal to a given side of a triangle.
8.6 Exercises for Chapter 8
1. Show that solving JTB — G for B, where J is a nonsingular 2 x 2 matrix and G is a 2 x n matrix, requires about 6n arithmetic operations. 2. Count carefully the number of arithmetic operations required to assemble the stiffness matrix (for the model problem) using the algorithm described in this chapter in the case of quadratic Lagrange triangles. Express the results in terms of TV, and also Nv (cf. Exercise 7.6.7). 3. Letrj = -\+ 2(j - \)/d, j = 1, 2 , . . . , d + 1, and let y;, / = 1 , 2 , . . . , d + 1, be defined by (8.11). (a) Given quadrature nodes r;, j — 1 , 2 , . . . , d + 1, on [— 1, 1], show how to efficiently compute Yi(tj),i, j — \, 2 , . . . , d + 1. (b) How many arithmetic operations does it cost to perform the computation described in the previous part? 4. (MATLAB) Test the heuristic suggested in Example 8.3 on the following BVP:
Here Q is the unit square, f ( x , y) = sin (nx) sin (2ny), and the exact solution i s w ( j c , y ) = (Sjr2)"1 sin(7TJt)sin(27ry). Use piecewise quadratic functions (the MATLAB function Interpolate2a will be useful). Is (1/3)\\Uh — U2h\\E a good estimate of \\u — «/, ||E? 5. (MATLAB) Consider the BVP
8.6. Exercises for Chapter 8
207
where $\Omega$ is the upper half of the unit circle ($\Omega = \{(x, y) : x^2 + y^2 < 1,\ y > 0\}$), $\kappa(x, y) = 1 + y$, and $f$ is chosen so that the exact solution is $u(x, y) = y(1 - x^2 - y^2)e^x$. The BVP can be solved using regular triangles or isoparametric triangles (the two coincide for linear triangles). The purpose of this exercise is to examine the improvement in accuracy obtained by using isoparametric triangles. Measuring the errors in this case is complicated by the fact that there are three different domains involved ($\Omega$, the polygonal approximation to $\Omega$, and the isoparametric approximation to $\Omega$). In this exercise, the errors near the boundary will be disregarded and only the errors in the interior will be compared. To simplify matters further, only errors in the nodal values will be considered. Start with a coarse mesh on $\Omega$ and refine it several times. On each mesh, solve the BVP using ordinary triangles and again using isoparametric triangles. Record the level of refinement and the maximum error at any free node for each computed solution. At what rate is each error (nonisoparametric, isoparametric) going to zero?
6. (MATLAB) Repeat the previous exercise, but this time use cubic triangles. Do the errors resulting from ordinary triangles improve in going from quadratic elements to cubic elements? Explain your results.
7. (MATLAB) Let $\Omega$ be the polygonal region having vertices (0, 0), (1, 0), (1, 1), (-1, 1), (-1, -1), (0, -1) and let u be the function defined in polar coordinates by $u = r^{2/3}\sin(2\theta/3)$, $0 < \theta < 2\pi$. (a) Verify that u is harmonic (that is, satisfies $-\Delta u = 0$). (b) Verify that $\nabla u$ is singular at the origin. In fact, u does not belong to $H^2(\Omega)$, so the standard convergence theory from Chapter 5 does not apply. In particular, when solving by finite elements, $\|u - u_h\|_E = O(h)$ need not hold. (c) Solve the Dirichlet problem
($g = u$ on $\partial\Omega$) using piecewise linear functions on a sequence of uniformly refined meshes. Record the energy norm errors. At what rate does $\|u - u_h\|_E$ appear to go to zero? (d) Repeat the previous part using piecewise quadratic functions. (e) Repeat using piecewise cubic functions. Does increasing the degree of the shape functions improve the accuracy of the solution?
8. (MATLAB) Consider the BVP
where $\Omega$ is the unit circle and g is defined by
Here $\theta$ is the polar angle corresponding to $(x, y)$ on the unit circle ($(x, y) = (\cos(\theta), \sin(\theta))$). Using piecewise quadratic finite elements, try to compute the solution to this BVP to within 1% in the energy norm.
9. (MATLAB) Repeat the previous exercise, replacing the Dirichlet data g with
Which problem is easier to solve to the given accuracy? Why?
10. (Programming) Write a MATLAB routine Solve2 to apply finite elements, with quadratic Lagrange triangles and uniformly refined meshes, to solve a given BVP to within a given tolerance. Use the heuristic (8.16) to decide when the solution is accurate enough. Also include a limit on the number of refinements or the total number of triangles so that the routine will stop in a reasonable amount of time, even if an unrealistically small tolerance is given.
11. (Programming) Write a MATLAB routine Mass2, similar to Stiffness2, for computing the mass matrix $M \in \mathbf{R}^{N_f \times N_f}$ defined by
(cf. Exercise 4.8.17, where M was introduced). Your routine should handle a polygonal mesh of arbitrary degree d. Use a quadrature rule having degree of precision 2d.
12. (Programming) Write a MATLAB routine Mass, similar to Stiffness, for computing the mass matrix $M \in \mathbf{R}^{N_f \times N_f}$ defined by
(cf. Exercise 4.8.17, where M was introduced). Your routine should handle a mesh of arbitrary degree d with isoparametric elements. Use a quadrature rule having degree of precision 2d.
Chapter 9
The finite element method for general BVPs
The previous two chapters have been restricted to the model problem (8.1). In this chapter, I will discuss some of the more general problems that can be treated by the same techniques. I will also discuss some limitations of the Galerkin method.
9.1 Scalar BVPs
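The most general second-order PDE in x and y takes the form
$$-a_{11}\frac{\partial^2 u}{\partial x^2} - a_{12}\frac{\partial^2 u}{\partial x\,\partial y} - a_{21}\frac{\partial^2 u}{\partial y\,\partial x} - a_{22}\frac{\partial^2 u}{\partial y^2} + c_1\frac{\partial u}{\partial x} + c_2\frac{\partial u}{\partial y} + pu = f \qquad (9.1)$$
(the negative signs on the leading coefficients are for convenience). When the solution u is sufficiently smooth, the mixed partial derivatives of u are equal:
$$\frac{\partial^2 u}{\partial x\,\partial y} = \frac{\partial^2 u}{\partial y\,\partial x}.$$
Then
$$a_{12}\frac{\partial^2 u}{\partial x\,\partial y} + a_{21}\frac{\partial^2 u}{\partial y\,\partial x} = \frac{a_{12} + a_{21}}{2}\frac{\partial^2 u}{\partial x\,\partial y} + \frac{a_{12} + a_{21}}{2}\frac{\partial^2 u}{\partial y\,\partial x},$$
and thus the matrix of coefficients
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \qquad (9.2)$$
might as well be replaced by the symmetric matrix
$$\begin{bmatrix} a_{11} & (a_{12} + a_{21})/2 \\ (a_{12} + a_{21})/2 & a_{22} \end{bmatrix}.$$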
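For this reason, it will be assumed that the matrix A defined by (9.2) is symmetric. Furthermore, the reader can easily verify that
$$-\nabla\cdot(A\nabla u) = -a_{11}\frac{\partial^2 u}{\partial x^2} - a_{12}\frac{\partial^2 u}{\partial x\,\partial y} - a_{21}\frac{\partial^2 u}{\partial y\,\partial x} - a_{22}\frac{\partial^2 u}{\partial y^2} - (\nabla\cdot A)\cdot\nabla u \qquad (9.3)$$
(see Exercise 1), where $\nabla\cdot A$ is the vector whose components are the divergences of the columns of A. Therefore, writing c for the vector-valued function with components $c_1$, $c_2$, (9.1) can be rewritten as
$$-\nabla\cdot(A\nabla u) + (c + \nabla\cdot A)\cdot\nabla u + pu = f. \qquad (9.4)$$
The theory and techniques developed in this book are applicable when the matrix A is symmetric and uniformly positive definite over $\Omega$: There exists a constant $\lambda > 0$ such that
$$\xi\cdot A(x, y)\xi \ge \lambda\|\xi\|^2 \quad \text{for all } \xi \in \mathbf{R}^2,\ (x, y) \in \Omega.$$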
If this fails to be true, the PDE is not elliptic and (9.1) has a different character. In such a case, data from part of the boundary are typically propagated across the domain $\Omega$ to the remainder of the boundary, and BVPs in which the boundary data are specified on the entire boundary (such as have been studied in this book) are not well-posed. Finite element methods can be developed for such problems; in the most-studied cases, one of the variables x or y is time and the propagation of information is explicitly recognized in that the time variable is treated differently than the spatial variable. Such problems are beyond the scope of this book.
As discussed in Section 2.6, the presence of the term $(c + \nabla\cdot A)\cdot\nabla u$ means that the variational form of (9.4) involves a nonsymmetric bilinear form, which may or may not be elliptic. Also, if $p$ is negative, the bilinear form may fail to be elliptic. Therefore, the theory and techniques developed thus far in the text apply most readily to the PDE
$$-\nabla\cdot(A\nabla u) + pu = f \quad \text{in } \Omega,$$
where A is symmetric positive definite and $p > 0$ in $\Omega$. A well-posed BVP would have the form
$$\begin{aligned} -\nabla\cdot(A\nabla u) + pu &= f \ \text{in } \Omega, & (9.5a) \\ u &= g \ \text{on } \Gamma_1, & (9.5b) \\ (A\nabla u)\cdot n &= h \ \text{on } \Gamma_2. & (9.5c) \end{aligned}$$
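The variational form of (9.5) can be derived using the identity
$$\int_\Omega -\nabla\cdot(A\nabla u)\,v = \int_\Omega A\nabla u\cdot\nabla v - \int_{\partial\Omega} v\,(A\nabla u)\cdot n \qquad (9.6)$$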
(a version of Green's identity—see Exercise 2). The result is
find
Here $V = \{v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_1\}$ and G is any function in $H^1(\Omega)$ satisfying $G = g$ on $\partial\Omega$. Under the assumptions given above, the bilinear form
$$a(u, v) = \int_\Omega \left(A\nabla u\cdot\nabla v + puv\right) \qquad (9.8)$$
is V-elliptic (see Exercise 3). Given a finite element subspace $V_h$ of V, having basis $\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}$, (9.7) can be treated in the usual way. The Galerkin method leads to the variational problem
where
and $G_h$ interpolates $g$ on $\Gamma_1$. Writing
and substituting into (9.9) yields
or
Defining the matrices K and M by
and the vector F by
this can be written as the matrix-vector equation
The matrix K is the version of the stiffness matrix suitable for this problem. The matrix M is called the mass matrix; it was the subject of Exercises 2.7.12, 4.8.17, and 7.6.16. The vector F is the load vector for this problem.
Assembling K, M, and F presents little that is different from previous calculations. The entries of K have the form
$$K_{ij} = \int_\Omega A\nabla\phi_j\cdot\nabla\phi_i.$$
These terms are similar in form to
$$\int_\Omega \kappa\nabla\phi_j\cdot\nabla\phi_i$$
and can be handled in the same way. The entries of M,
$$M_{ij} = \int_\Omega p\,\phi_j\phi_i,$$
are assembled from integrals of the form
$$\int_T p\,\phi_j\phi_i,$$
where T is an arbitrary triangle in the mesh. Transforming to the reference triangle $T_R$ and using an n-point quadrature rule,
Here $\mathcal{J} = |\det(J)|$ is the Jacobian determinant and $\hat{p}$ is the function $p$ transformed from T to $T_R$ ($\hat{p}(s, t) = p(x, y)$, $(s, t) \in T_R$, $(x, y) \in T$). The quadrature rule should be chosen to be exact in the case that $p$ is constant, that is, to have degree of precision 2d if Lagrange triangles of degree d are used. Finally, the term
is the only new part of the expression for the load vector, and a simple modification of Algorithm 8.2 will handle this.
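The quadrature step for the mass matrix can be sketched for the simplest case. The following Python function is an illustration only, not one of the book's MATLAB routines: the name `triangle_mass_matrix`, the argument layout, and the choice of the three-point edge-midpoint rule (degree of precision 2, which is 2d for d = 1) are assumptions made for this example. It handles a single linear Lagrange triangle with straight sides.

```python
def triangle_mass_matrix(verts, p):
    """Element mass matrix M[i][j] ~ integral over T of p * phi_j * phi_i
    for one linear Lagrange triangle T with vertices verts (hypothetical helper)."""
    (x1, y1), (x2, y2), (x3, y3) = verts
    # |det(J)| for the affine map from the reference triangle T_R
    # (vertices (0,0), (1,0), (0,1)) onto T.
    det_j = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
    # Edge-midpoint rule on T_R: three points, weight 1/6 each,
    # exact for polynomials of degree 2.
    points = [(0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
    weights = [1.0 / 6.0, 1.0 / 6.0, 1.0 / 6.0]
    M = [[0.0] * 3 for _ in range(3)]
    for (s, t), w in zip(points, weights):
        # Linear shape functions on the reference triangle.
        phi = (1.0 - s - t, s, t)
        # Image of the quadrature point in T, where p is evaluated.
        x = x1 + (x2 - x1) * s + (x3 - x1) * t
        y = y1 + (y2 - y1) * s + (y3 - y1) * t
        p_val = p(x, y)
        for i in range(3):
            for j in range(3):
                M[i][j] += w * p_val * phi[j] * phi[i] * det_j
    return M


# Constant coefficient p = 1 on a triangle of area 1.
M = triangle_mass_matrix([(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)], lambda x, y: 1.0)
```

For constant $p$ the rule is exact, so the result should agree with the closed form (area/12) times the matrix with 2 on the diagonal and 1 elsewhere. For a variable coefficient the same code gives only an approximation.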
9.1.1 An example
Replacing the scalar-valued function $\kappa$ in the PDE
$$-\nabla\cdot(\kappa\nabla u) = f \qquad (9.10)$$
with a symmetric positive definite matrix A introduces anisotropy into the model. For example, when (9.10) represents steady-state heat flow, the basic physical law is Fourier's law of heat conduction,
$$q = -\kappa\nabla u,$$
where u is the temperature, $\kappa$ is the thermal conductivity, and q is the heat flux. Since $\kappa$ is a scalar, the magnitude of the heat flux at a point $(x, y) \in \Omega$, $\|q(x, y)\|_2 = \|\kappa(x, y)\nabla u(x, y)\|_2$, is the same regardless of the direction of $\nabla u$. Also, the direction of the heat flux is the (opposite of the) direction of $\nabla u$. On the other hand, the law
$$q = -A\nabla u$$
says that the magnitude of the heat flux depends on both the direction and size of $\nabla u$, and the direction of the heat flux is not simply the direction of $\nabla u$. According to this model, heat flows in some directions more readily than in others. I will refer to the PDE (9.5a) as the matrix conductivity problem.
EXAMPLE 9.1. Consider the (constant) matrix
$$A = \begin{bmatrix} 2.75 & -2.25 \\ -2.25 & 2.75 \end{bmatrix},$$
which has eigenvalues 0.5 and 5. The corresponding eigenvectors point in the directions of the vectors (1, 1) and (-1, 1), respectively. Therefore, if A represents an anisotropic thermal conductivity, then heat flows from the lower right to the upper left much more readily than from the lower left to the upper right. Now consider the BVP
where $\Omega$ is the unit square and $\Gamma_1$, $\Gamma_2$, $\Gamma_3$, $\Gamma_4$ are the bottom, right, top, left edges, respectively. This BVP models an anisotropic plate with the left and right edges insulated and the top edge held at 0 degrees. Heat energy is entering across the bottom edge at the rate of 10 (in W/cm$^2$, for example). To illustrate the effect of the anisotropy, I solved the BVP using a regular mesh of 128 linear Lagrange triangles. The resulting heat flux ($-A\nabla u$) is displayed on the left in Figure 9.1 as a vector field. For comparison, I solved the same BVP but with A replaced by a scalar; the result is the heat flux shown on the right in Figure 9.1. As one would expect, the heat flows toward the upper left corner in the anisotropic plate but straight up in the isotropic plate.
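The direction of the flux in Example 9.1 can be checked with a few lines of arithmetic. The Python sketch below is illustrative only. It assumes that A is the symmetric matrix determined by the eigenvalues 0.5 and 5 and the eigenvector directions (1, 1) and (-1, 1) quoted in the example, and that the temperature gradient points straight down, as it would in an isotropic plate heated from below. The helper names are hypothetical.

```python
import math


def sym2x2_from_eigenpairs(lam1, v1, lam2, v2):
    """Build A = lam1*q1*q1^T + lam2*q2*q2^T from two orthogonal eigenvector directions."""
    def unit(v):
        norm = math.hypot(v[0], v[1])
        return (v[0] / norm, v[1] / norm)

    q1, q2 = unit(v1), unit(v2)
    return [[lam1 * q1[i] * q1[j] + lam2 * q2[i] * q2[j] for j in range(2)]
            for i in range(2)]


def heat_flux(A, grad_u):
    """Anisotropic Fourier law: q = -A * grad(u)."""
    return tuple(-(A[i][0] * grad_u[0] + A[i][1] * grad_u[1]) for i in range(2))


# Matrix assumed from the eigen-data stated in Example 9.1.
A = sym2x2_from_eigenpairs(0.5, (1.0, 1.0), 5.0, (-1.0, 1.0))

# Temperature decreasing straight upward: grad(u) = (0, -1).
q_aniso = heat_flux(A, (0.0, -1.0))
q_iso = heat_flux([[1.0, 0.0], [0.0, 1.0]], (0.0, -1.0))
```

With the scalar conductivity the flux points straight up. With A it acquires a negative x-component, that is, it tilts toward the upper left, which matches the qualitative description of Figure 9.1.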
9.2 Isotropic elasticity
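The equations of isotropic elasticity form perhaps the simplest interesting elliptic system of PDEs (as opposed to the scalar PDEs that have been treated up to now). This system describes the displacement u of an isotropic elastic membrane with Lamé moduli $\mu$, $\lambda$ under the influence of a body force f:
$$-\nabla\cdot\sigma = f \ \text{in } \Omega, \qquad \sigma = 2\mu\epsilon + \lambda\,\mathrm{tr}(\epsilon)I, \qquad \epsilon = \frac{1}{2}\left(\nabla u + \nabla u^T\right).$$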
Figure 9.1. The heat flux from the anisotropic plate (left) and the isotropic plate (right) of Example 9.1. The lengths of the flux vectors are scaled, so that the vectors in each graph can be compared only to other vectors in the same graph (that is, comparing the vector lengths on the left to the vector lengths on the right is meaningless).
The displacement u is a vector-valued function of $(x, y) \in \Omega$ and $\nabla u$ is the 2 x 2 matrix (called the Jacobian of u in other contexts) whose ith row contains the partial derivatives of $u_i$ with respect to x and y. The tensor $\epsilon$ is the (linearized) strain, and $\sigma$ is the stress tensor. The Lamé moduli describe the elastic properties of the membrane (see the exercises in Chapter 1 for more details). The tensors $\sigma$ and $\epsilon$ depend on u, and I will write $\sigma_u$, $\epsilon_u$ when necessary. A typical BVP for a membrane is
$$-\nabla\cdot\sigma_u = f \ \text{in } \Omega, \qquad u = g \ \text{on } \Gamma_1, \qquad \sigma_u n = h \ \text{on } \Gamma_2, \qquad (9.12)$$
where, as usual, $\Gamma_1$, $\Gamma_2$ partition $\partial\Omega$. If $\Gamma_1 = \partial\Omega$, $\Gamma_2 = \emptyset$, the BVP is called a pure displacement problem, while if $\Gamma_1 = \emptyset$, $\Gamma_2 = \partial\Omega$, it is a pure traction problem. For simplicity, I will begin by considering only homogeneous Dirichlet conditions ($g = 0$ in (9.12)). The variational form of (9.12) (with $g = 0$) is
where
In the finite element method, the approximating subspace of V can be taken to be a vector-valued version of the usual space of scalar piecewise polynomials,
Here $P_h^{(d)}$ represents the space of continuous piecewise polynomial functions of degree at most d, relative to a given triangulation $\mathcal{T}_h$ of $\Omega$. The vector-valued version of $V_h$ is
If {01 > 02, • • • ,
where I will write a(u, v) for the bilinear form
and $\ell$ for the linear functional
At the abstract level, the Galerkin method is unchanged: The approximate solution is
where $U \in \mathbf{R}^{2N_f}$ solves $KU = F$ and
To assemble K and F, though, it is necessary to take into account the fact that there are two different kinds of basis functions, $\boldsymbol{\phi}_i = (\phi_i, 0)$ and $\boldsymbol{\phi}_{N_f + i} = (0, \phi_i)$. As a result, K and F are naturally partitioned as
where
and
The matrices $K^{(11)}$ and $K^{(22)}$ are symmetric and $K^{(21)} = (K^{(12)})^T$, but $K^{(12)}$ is not symmetric. The problem of assembling K and F now reduces to assembling $K^{(11)}$, $K^{(12)}$, $K^{(21)}$, $K^{(22)}$ and $F^{(1)}$, $F^{(2)}$. As in the case of the matrix conductivity considered in the previous section, the computations are very similar to those considered in previous chapters. I begin with $K^{(11)}$. Since
it is necessary to simplify the expressions
The above formulas yield
Equation (9.15) is a slight variation of formulas that have been treated earlier. Exercise 4 asks the reader to derive the following formulas:
The vector $F^{(1)}$ is computed as follows:
where $f_1$ and $h_1$ are the first components of the vector-valued functions f and h, respectively. Similarly,
Analogous formulas appeared when treating scalar PDEs. Finally, if the Dirichlet data function g is nonzero, then the variational problem becomes
find
where G is any function in $(H^1(\Omega))^2$ such that $G = g$ on $\Gamma_1$. When applying the Galerkin method with finite element functions, G can be replaced with $G_h$, the piecewise polynomial interpolating g on $\Gamma_1$ and having all other nodal values equal to zero. It remains then to compute the contributions $a(G_h, (\phi_i, 0))$ and $a(G_h, (0, \phi_i))$ to $F^{(1)}$ and $F^{(2)}$, respectively. The derivations follow the pattern used when computing $K^{(11)}$ above, and the results are
EXAMPLE 9.2. Consider a square isotropic elastic membrane occupying the unit square $\Omega$, and subjected to a purely longitudinal traction in the x-direction:
Suppose $\mu$ and $\lambda$ are the constants $\mu = 1$, $\lambda = 2$, which correspond to a Young's modulus of $E = 3$ and a Poisson's ratio of $\nu = 0.5$ (see Exercise 1.3.6). Solving the BVP
yields the solution shown in Figure 9.2. This simple traction problem was examined in Exercise 1.3.6, in which the solution was expressed in terms of the Young's modulus $E$ and Poisson's ratio $\nu$ of the membrane,
Figure 9.2. The homogeneous membrane from Example 9.2. The mesh on the deformed membrane is superimposed on the mesh on the undeformed membrane.
the exact solution is
Since u is linear, it should compute exactly with linear Lagrange triangles, so it is easy to check whether the code performs correctly. Now consider the same membrane with a stiff region, the disk of radius 0.2 centered at $(x, y) = (0.6, 0.7)$. This region also has $\nu = 0.5$, but E is increased by a factor of about 10. To be precise, Young's modulus in the stiff region is defined, for this simulation, by
The necessary formulas for $\mu$ and $\lambda$ can be computed from
(cf. Exercise 1.3.7). The resulting BVP was solved and the solution graphed in Figure 9.3. The effect of the stiff region can be clearly seen.
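The conversion between the two pairs of moduli can be sketched as follows. The formulas in this Python fragment are an assumption: they are the two-dimensional (membrane) relations that reproduce the values quoted in Example 9.2 ($\mu = 1$, $\lambda = 2$ corresponding to $E = 3$, $\nu = 0.5$), not a transcription of Exercise 1.3.7. The function names are hypothetical.

```python
def lame_to_young_poisson(mu, lam):
    """Two-dimensional relations (assumed): E = 4*mu*(mu+lam)/(2*mu+lam), nu = lam/(2*mu+lam)."""
    E = 4.0 * mu * (mu + lam) / (2.0 * mu + lam)
    nu = lam / (2.0 * mu + lam)
    return E, nu


def young_poisson_to_lame(E, nu):
    """Inverse relations: mu = E/(2*(1+nu)), lam = E*nu/(1-nu**2)."""
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / (1.0 - nu * nu)
    return mu, lam


# Homogeneous membrane of Example 9.2.
E, nu = lame_to_young_poisson(1.0, 2.0)

# A region with the same Poisson's ratio and E ten times larger
# (the factor 10 is a round number chosen for illustration).
mu_stiff, lam_stiff = young_poisson_to_lame(10.0 * E, nu)
```

Under these relations, scaling E at fixed $\nu$ scales both Lamé moduli by the same factor, so in this illustration the stiff region has $\mu = 10$ and $\lambda = 20$.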
9.3 Mesh locking
The elastic properties of a membrane can be described by the Lamé moduli $\mu$ and $\lambda$, or equivalently by Young's modulus E and Poisson's ratio $\nu$. As Exercise 1.3.6 suggests, another pair of moduli that could be used instead are the bulk modulus $\mu + \lambda$ and the shear modulus $\mu$. The bulk modulus determines the response of the material to a pure expansion; when the bulk modulus is large, the material resists expansion. On the other hand, the shear modulus determines the response to a shear. When the bulk modulus is very large compared to the shear modulus, that is, when $\lambda$ is large compared to $\mu$, then the material is nearly incompressible. Such cases present difficulties for simulation with Galerkin finite elements.
Figure 9.3. The heterogeneous membrane from Example 9.2. The stiff region of the (undeformed) heterogeneous membrane is indicated by the circle.
EXAMPLE 9.3. Consider a square membrane, occupying the unit square when at rest, and fixed on its top boundary. The bulk modulus $\mu + \lambda$ and the shear modulus $\mu$ of the membrane are taken to be 5000 and 2.5, respectively, so that $\lambda = 4997.5$ and the membrane is nearly incompressible. A traction of magnitude 1 is applied in the downward direction on the bottom of the membrane, so that the displacement u of the membrane is governed by the BVP
where $\sigma = 2\mu\epsilon_u + \lambda\,\mathrm{tr}(\epsilon_u)I$ and $\Gamma_1$, $\Gamma_2$, $\Gamma_3$, $\Gamma_4$ are the bottom, right, top, and left sides of the square, respectively. The reader should be able to predict the shape of the deformed membrane without difficulty; it is stretched downward from its fixed top edge and contracted horizontally. On the other hand, when this BVP is solved using the Galerkin method with linear Lagrange triangles, the computed displacement is as shown on the left in Figure 9.4. This result is clearly inaccurate; this inaccuracy is due to an effect called mesh locking. A computed displacement free from the effects of locking can be found using higher-order elements. The right-hand graph in Figure 9.4 shows an accurate displacement computed using quadratic Lagrange triangles.
The mesh locking in the previous example can be explained as follows: The strain $\epsilon_u$ satisfies the variational problem
With the boundary conditions (which determine the right-hand side) fixed, it seems believable that increasing $\lambda$ must cause $\mathrm{tr}(\epsilon_u)$ to decrease, and that $\lambda \to \infty$ implies that $\mathrm{tr}(\epsilon_u) \to 0$.
Figure 9.4. The computed displacement for Example 9.3, using linear Lagrange triangles (left) and quadratic Lagrange triangles (right). The inaccuracy in the left-hand graph is due to mesh locking.
This can be proved rigorously (see [13, Section 11.3] and the references contained therein), so $\lambda$ large implies that $\mathrm{tr}(\epsilon_u) = \nabla\cdot u$ is small in $\Omega$. This makes a finite element space based on linear Lagrange triangles a poor choice, since such a space typically contains no nonzero divergence-free functions. This implies that $\nabla\cdot u_h$ is small only when $u_h$ itself is small; that is, the mesh "locks" (displays inaccurately small displacements). Nearly incompressible materials can be simulated using higher-order Galerkin finite elements; for example, Babuška and Suri [9] show that locking cannot occur for Lagrange triangles of degree $d \ge 4$. If the use of higher-order elements is undesirable, then it is necessary to move outside of the Galerkin framework and, moreover, into an active research problem. The papers by Arnold and Winther [4, 5] and by Cai and Starke [14] will introduce the reader to recent results.
9.4 The MATLAB implementation
versions contains the MATLAB routines necessary to generate the examples for this chapter. No attempt has been made to provide a complete set of codes. For example, there is no updated Load function for the PDE with a matrix coefficient. Load2 works for this problem as long as the Dirichlet data (if any) are zero; a new routine would be necessary to handle nonzero Dirichlet data.
MATLAB functions
All of the following routines handle isoparametric elements correctly.
• ShowDisplacement Graphs a two-dimensional displacement as a perturbation to a mesh.
• StiffnessIso Assembles the stiffness matrix (as a MATLAB sparse matrix) for the system of isotropic elasticity on a mesh of arbitrary degree.
• LoadIso Assembles the load vector for the system of isotropic elasticity on a mesh of arbitrary degree.
• StiffnessMC Assembles the stiffness matrix (as a MATLAB sparse matrix) for the matrix conductivity problem on a mesh of arbitrary degree.
• TestConvIso Monitors the convergence of the finite element method for a system of isotropic elasticity with a known solution.
• TestConvMC Monitors the convergence of the finite element method for a matrix conductivity problem with a known solution.
• getNeumannDataIso Evaluates $h = \sigma_u n$ on the free boundary edges, given the entries in $\nabla u$ and $\mu$, $\lambda$.
9.5 Exercises for Chapter 9
1. Suppose A is a matrix-valued function $A : \Omega \to \mathbf{R}^{2\times 2}$ defined by (9.2). Apply the product rule to the expression $\nabla\cdot(A\nabla u)$ to derive (9.3).
2. Derive (9.6), assuming $A : \Omega \to \mathbf{R}^{2\times 2}$, $u : \Omega \to \mathbf{R}$, and $v : \Omega \to \mathbf{R}$ are smooth and extend smoothly to the boundary of $\Omega$.
3. Suppose $A \in \mathbf{R}^{2\times 2}$ is symmetric positive definite. Then the eigenvalues of A are positive, and if $\lambda_2 \ge \lambda_1 > 0$ are the eigenvalues, then
Use these facts to show that the bilinear form (9.8) is V-elliptic when $p > 0$ on $\Omega$, A is symmetric and uniformly positive definite on $\Omega$, and $V = \{v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_1\}$.
4. Derive formulas (9.16) and (9.17).
5. (MATLAB) Solve the BVP from Example 9.3 using piecewise linear finite elements, beginning with the mesh shown in Figure 9.4, and refining the mesh twice. Are the effects of locking still visible on the refined mesh?
6. Consider the pure traction problem for the system of isotropic elasticity,
where $\sigma = 2\mu\epsilon + \lambda\,\mathrm{tr}(\epsilon)I$. When the finite element method is applied, the result is a singular stiffness matrix K. (a) Use the results of Exercise 1.3.5 to show that the null space of K is three-dimensional, and to find a basis for the null space.
(b) Explain how to produce a nonsingular matrix by removing three (properly chosen) rows and columns from K (cf. Example 7.4). (Hint: Removing a column from K is equivalent to setting the corresponding component of the displacement to zero. It would be physically reasonable to fix one point (thus eliminating the translational degrees of freedom) and to set either the x- or y-component of displacement of another point to zero (thus eliminating the rotational degree of freedom).)
7. (MATLAB) Let $\Omega$ be the unit circle and consider the pure traction problem (9.21), where $\mu = 1000$, $\lambda = 10$, and the traction on the boundary is given by $\sigma n = 100n$. Solve the BVP using piecewise linear finite elements, graph the displacement, and verify that the computed displacement is correct by comparing it with the results of Exercises 1.3.6b and 1.3.8. (Warning: Recall that a pure traction problem like this results in a singular stiffness matrix. Use the results of the previous exercise to solve $KU = F$.)
8. (MATLAB) Let $\Omega$ be the unit square and consider the BVP
where $\Gamma_1$ is the bottom side of the square (corresponding to $y = 0$) and the other three sides comprise $\Gamma_2$. (a) What boundary traction h will produce the pure shear of Exercise 1.3.6a? (b) Formulate and solve the BVP in MATLAB for $\mu = 1000$, $\lambda = 10$, and $\delta = 0.1$ ($\delta$ is the constant from Exercise 1.3.6a).
9. (MATLAB) Consider the pure traction problem (9.21), where $\Omega$ is the unit circle, $\mu = 1000$, $\lambda = 10$, and h is given in polar coordinates by $h(\theta) = 1000\cos(k\theta)\,n$, $n = (\cos(\theta), \sin(\theta))$. Solve the BVP and plot the displacement for various (small) integers k. Use quadratic isoparametric triangles.
10. (Programming) Write a MATLAB function LoadMC that assembles the load vector for BVP (9.5). (The reader can choose whether to do this for linear meshes, arbitrary polygonal meshes, or arbitrary meshes with isoparametric elements.) LoadMC should handle inhomogeneous boundary conditions.
11. (Programming project) Write a collection of routines (like StiffnessIso, LoadIso, and so forth) to solve the BVP (9.12) on a mesh of quadrilaterals. Use bilinear shape functions. Test your code on the BVP from Example 9.3 and on similar problems with nearly incompressible materials. Are the same locking effects seen?
Part III
Solving the Finite Element Equations
Chapter 10
Direct solution of sparse linear systems
This part of the book discusses the problem of solving the finite element equations $KU = F$. As I have pointed out several times, one of the primary advantages of the finite element method is that the matrix K is sparse, which makes it possible to solve $KU = F$ even when K is very large. There are two classes of algorithms for solving linear systems: direct and iterative methods. Direct methods are related to the fundamental Gaussian elimination algorithm and are distinguished by the fact that the exact solution (up to round-off error) is computed in a finite number of calculations. On the other hand, an iterative method computes a sequence of approximations that converges to the exact solution. Although the exact solution is computed only in the limit, an acceptable approximation is often found after relatively few iterations. For many problems, an appropriate iterative method uses much less memory and computational time than a direct method. Most of Part III focuses on iterative methods. However, I begin with a brief discussion of direct methods for solving sparse systems. This will provide a context for explaining the advantages and disadvantages of iterative methods. For simplicity, I will restrict myself to symmetric positive definite systems $KU = F$.
10.1 The Cholesky factorization for positive definite matrices
Mathematically, the solution of $KU = F$ can be expressed as $U = K^{-1}F$, where $K^{-1}$ is the inverse of the matrix K. Computationally, though, it is not a good idea to compute $K^{-1}$. Instead, a factorization of K can be computed which then makes it easy to solve $KU = F$. For symmetric positive definite systems, the Cholesky factorization is preferred.
10.1.1 The Cholesky factorization for dense matrices
The Cholesky factorization of a symmetric positive definite matrix K takes the form $K = R^TR$, where R is an upper triangular matrix.$^{14}$ An upper triangular matrix R has nonzeros only on or above the main diagonal; that is, $R_{ij} = 0$ for $i > j$. I will begin by discussing the Cholesky factorization for dense matrices, and later turn to the implications of sparsity. Given $K = R^TR$, it is easy to solve $KU = F$:
$$KU = F \;\Rightarrow\; R^TRU = F \;\Rightarrow\; RU = R^{-T}F \;\Rightarrow\; U = R^{-1}R^{-T}F.$$
The inverse matrices are used for notational convenience. However, no inverse matrices need be formed. The system $R^TV = F$ takes the form
$$\begin{aligned} R_{11}V_1 &= F_1, \\ R_{12}V_1 + R_{22}V_2 &= F_2, \\ &\;\;\vdots \\ R_{1N}V_1 + R_{2N}V_2 + \cdots + R_{NN}V_N &= F_N. \end{aligned}$$
The first equation can be solved to get $V_1$, which can then be substituted into the second equation to get $V_2$, and so on. This algorithm is called forward substitution. Solving the upper triangular system $RU = V$ by back substitution is no harder. The last equation is solved for $U_N$, which is then substituted into the next-to-the-last equation to get $U_{N-1}$, and so on. Forward and back substitution each require $N^2$ arithmetic operations when R is N x N and it cannot be assumed that any of the upper triangular entries in R are zero (see Exercise 1).
The factorization $K = R^TR$ can be derived directly from the definition of matrix-matrix multiplication:
$$K_{ij} = \sum_{k=1}^{N} (R^T)_{ik}R_{kj} = \sum_{k=1}^{N} R_{ki}R_{kj}.$$
Since R is upper triangular by hypothesis, the index k can be restricted so that the above summation does not involve any elements from the strict lower triangle of R:
$$K_{ij} = \sum_{k=1}^{\min\{i,j\}} R_{ki}R_{kj}.$$
In particular,
$$K_{11} = R_{11}^2,$$
and thus $R_{11} = \sqrt{K_{11}}$. It can be shown that all the diagonal entries of a positive definite matrix are positive, so the square root produces a real number. On the other hand, if $K_{11}$ fails to be positive, then the algorithm can detect that K is not positive definite. Having computed $R_{11}$, it is possible to compute the remainder of the first row of R:
$$K_{1j} = R_{11}R_{1j} \;\Rightarrow\; R_{1j} = \frac{K_{1j}}{R_{11}}, \quad j = 2, 3, \ldots, N.$$
14 Sometimes the factorization is expressed as $K = LL^T$, where L is a lower triangular matrix. Obviously the two forms are equivalent, with $L = R^T$.
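In general, having computed the first $i - 1$ rows of R, the ith row can be computed as follows. First of all,
$$K_{ii} = \sum_{k=1}^{i} R_{ki}^2 \;\Rightarrow\; R_{ii} = \sqrt{K_{ii} - \sum_{k=1}^{i-1} R_{ki}^2}.$$
It can be shown that if K is positive definite, then
$$K_{ii} - \sum_{k=1}^{i-1} R_{ki}^2 > 0 \qquad (10.1)$$
must hold, and therefore, once again, taking the square root is allowed. On the other hand, if
$$K_{ii} - \sum_{k=1}^{i-1} R_{ki}^2 \le 0$$
is detected, this demonstrates that K is not positive definite. Finally, having computed $R_{ii}$, the remainder of row i of R can be computed as follows:
$$K_{ij} = \sum_{k=1}^{i} R_{ki}R_{kj} \;\Rightarrow\; R_{ij} = \frac{K_{ij} - \sum_{k=1}^{i-1} R_{ki}R_{kj}}{R_{ii}}, \quad j = i + 1, \ldots, N.$$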
The complete algorithm (omitting the test to verify (10.1), that is, to verify that K is really positive definite) is shown in Algorithm 10.1.
Algorithm 10.1. The Cholesky factorization for a dense N x N symmetric positive definite matrix.
Computing the Cholesky factorization of a (dense) N x N matrix requires $N^3/3 + N^2/2 - 5N/6$ arithmetic operations, plus N square roots (see Exercise 2). This operation count is usually expressed as
$$O\left(\frac{N^3}{3}\right),$$
since only the leading term in the polynomial is significant when N is large.
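The row-by-row procedure derived above, together with forward and back substitution, can be sketched in Python as follows. This is an illustration, not the book's Algorithm 10.1 or any of its MATLAB routines. The function names are hypothetical, dense list-of-lists storage is used for readability, and, unlike the algorithm as described, the sketch includes the positivity test (10.1). The operation counter is there only to check the stated count.

```python
import math


def cholesky_upper(K):
    """Return (R, ops) with K = R^T R, R upper triangular.
    ops counts additions, subtractions, multiplications and divisions
    (square roots are not counted)."""
    n = len(K)
    R = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        # Diagonal entry: R_ii = sqrt(K_ii - sum_{k<i} R_ki^2).
        d = K[i][i]
        for k in range(i):
            d -= R[k][i] * R[k][i]
            ops += 2
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        R[i][i] = math.sqrt(d)
        # Remainder of row i: R_ij = (K_ij - sum_{k<i} R_ki R_kj) / R_ii.
        for j in range(i + 1, n):
            s = K[i][j]
            for k in range(i):
                s -= R[k][i] * R[k][j]
                ops += 2
            R[i][j] = s / R[i][i]
            ops += 1
    return R, ops


def solve_with_cholesky(R, F):
    """Solve R^T R U = F by forward substitution, then back substitution."""
    n = len(F)
    V = [0.0] * n
    for i in range(n):                      # R^T V = F (lower triangular)
        s = F[i]
        for k in range(i):
            s -= R[k][i] * V[k]             # (R^T)_ik = R_ki
        V[i] = s / R[i][i]
    U = [0.0] * n
    for i in range(n - 1, -1, -1):          # R U = V (upper triangular)
        s = V[i]
        for j in range(i + 1, n):
            s -= R[i][j] * U[j]
        U[i] = s / R[i][i]
    return U


K = [[4.0, 2.0, 2.0], [2.0, 10.0, 7.0], [2.0, 7.0, 21.0]]
R, ops = cholesky_upper(K)
U = solve_with_cholesky(R, [8.0, 19.0, 30.0])
```

For this 3 x 3 example the factor has rows (2, 1, 1), (0, 3, 2), (0, 0, 4), the solution is (1, 1, 1), and the counter agrees with $N^3/3 + N^2/2 - 5N/6 = 11$ for N = 3.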
10.1.2 The Cholesky factorization for banded matrices
The simplest kind of sparse matrix is a banded matrix, in which all of the nonzeros belong to a band centered at the main diagonal. The precise definition is as follows: K is a banded matrix with half-bandwidth p if $K_{ij} = 0$ for $|i - j| > p$. The stiffness matrix is banded in problems with simple geometry when the nodes are ordered in a regular fashion. For example, Figure 10.1 shows a mesh on a square domain with the free nodes numbered by rows. The square is divided into $n^2$ smaller squares, with each small square then divided into two triangles ($n = 6$ in this example). A typical free node $z_i$ is adjacent to free nodes with indices
$$i - n,\quad i - n + 1,\quad i - 1,\quad i + 1,\quad i + n - 1,\quad i + n,$$
which shows that the stiffness matrix is banded with half-bandwidth n.$^{15}$ A stiffness matrix K corresponding to Figure 10.1 is shown in Figure 10.2. The Cholesky factor R of a banded symmetric positive definite matrix K with half-bandwidth p is upper triangular with half-bandwidth p. Typically R has more nonzeros than the corresponding lower triangle of K, since zero entries within the nonzero band of K can become nonzero in the corresponding entries of R. This is referred to as fill-in, and it is an important concept in the direct solution of sparse systems. Due to fill-in, the factors of a sparse matrix are usually less sparse than the original matrix. However, when the original matrix is banded, fill-in is limited to the band, and so the factors cannot be too much denser than the original matrix. This is illustrated in Figure 10.2, which shows the sparsity patterns of both K and its Cholesky factor R for the example described above.
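The half-bandwidth claim can be checked by building the nonzero pattern of K directly from the node numbering. The Python sketch below rests on several assumptions that are not spelled out in the text: every boundary node is constrained, so that there are $(n - 1)^2$ free nodes; the free nodes are numbered by rows starting from the lower left; each small square is cut by the diagonal running from lower left to upper right; and indices start at 0. The function names are hypothetical.

```python
def uniform_mesh_pattern(n):
    """Set of index pairs (i, j) for which K_ij may be nonzero,
    for the free nodes of the uniform mesh described above (assumed layout)."""
    m = n - 1                               # free nodes per row

    def index(r, c):
        return r * m + c

    # The node itself, its horizontal and vertical neighbours, and the two
    # neighbours along the diagonal edges of the triangulation.
    offsets = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (-1, -1)]
    pattern = set()
    for r in range(m):
        for c in range(m):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < m and 0 <= cc < m:
                    pattern.add((index(r, c), index(rr, cc)))
    return pattern


def half_bandwidth(pattern):
    """Largest |i - j| over all potentially nonzero entries."""
    return max(abs(i - j) for i, j in pattern)


p = half_bandwidth(uniform_mesh_pattern(6))
```

Under these assumptions the largest index difference comes from the diagonal neighbours, which are n positions apart, so the pattern for $n = 6$ has half-bandwidth 6.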
Figure 10.1. A uniform mesh with regular mesh numbering.
15 If the PDE is Laplace's equation, then the entries $K_{i,i-n} = K_{i,i+n} = 0$ due to cancellation. In that case, the half-bandwidth reduces to $n - 1$.
Figure 10.2. Left: The sparsity pattern of a stiffness matrix K corresponding to the mesh of Figure 10.1. Right: The sparsity pattern of the Cholesky factor of K.
Algorithm 10.2. The Cholesky factorization for a banded N x N symmetric positive definite matrix with half-bandwidth p.
The algorithm for the banded Cholesky factorization differs from the dense case only in that the various loops are limited so that the computations remain in the band. Algorithm 10.2 gives the details. Computing the Cholesky factorization of an N x N banded symmetric positive definite matrix with half-bandwidth p requires $O(2p^2N)$ arithmetic operations, while solving the corresponding triangular systems requires $O(2pN)$ operations each. The advantage of factoring K instead of computing $K^{-1}$ is dramatic for a banded (or, more generally, sparse) matrix. The inverse of a banded matrix is dense, so the number of operations required to compute $K^{-1}$ is proportional to $pN^2$. When $N \gg p$, this is much more than is required to factor K. Even after $K^{-1}$ is known, $O(2N^2)$ operations are required to compute $K^{-1}F$, whereas only $O(4pN)$ are needed to solve the two triangular systems in the Cholesky factorization approach.
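A banded variant can be sketched by restricting the loop limits of the dense factorization so that no work is done outside the band. The Python function below is an illustration under stated assumptions, not a transcription of Algorithm 10.2. The name `banded_cholesky_upper` is hypothetical, and the matrix is stored as a full list of lists for clarity, whereas a practical code would store only the band.

```python
import math


def banded_cholesky_upper(K, p):
    """Cholesky factor R (K = R^T R) of a symmetric positive definite matrix
    with half-bandwidth p, touching only entries inside the band."""
    n = len(K)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # R_ki = 0 for k < i - p, so the sum starts at max(0, i - p).
        lo = max(0, i - p)
        d = K[i][i] - sum(R[k][i] * R[k][i] for k in range(lo, i))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        R[i][i] = math.sqrt(d)
        # Only columns j <= i + p can hold nonzeros in row i.
        for j in range(i + 1, min(i + p, n - 1) + 1):
            # R_kj = 0 for k < j - p, which shortens the sum further.
            lo_j = max(0, j - p)
            s = K[i][j] - sum(R[k][i] * R[k][j] for k in range(lo_j, i))
            R[i][j] = s / R[i][i]
    return R


# Tridiagonal example (half-bandwidth 1).
K = [[4.0, -2.0, 0.0, 0.0],
     [-2.0, 5.0, -2.0, 0.0],
     [0.0, -2.0, 5.0, -2.0],
     [0.0, 0.0, -2.0, 5.0]]
R = banded_cholesky_upper(K, 1)
```

For this tridiagonal matrix the factor is bidiagonal, with 2 on the diagonal and -1 on the superdiagonal, so the factor stays inside the band of K.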
10.2 Factoring general sparse matrices
Figure 10.3. A triangulation with 1024 triangles and 481 free nodes.
Figure 10.4. Left: The sparsity pattern of the stiffness matrix K corresponding to the mesh of Figure 10.3 (and a certain ordering of the free nodes). Right: The sparsity pattern of the Cholesky factor R of K.
If K is a sparse symmetric positive definite matrix with no particular sparsity pattern, then it is difficult to predict how many arithmetic operations will be required to compute the Cholesky factorization $K = R^TR$. This is because, in the absence of some regular pattern to the position of nonzeros in the matrix, it is hard to predict how much fill-in will occur. For example, the mesh in Figure 10.3 has 1024 triangles and 481 free nodes. Figure 10.4 shows the sparsity patterns of the stiffness matrix K and its Cholesky factor R. About 1.3% of the entries of K are nonzero, while almost 12% of the entries of (the upper triangle of) R are nonzero. It is important to note that both the sparsity pattern of K and the resulting sparsity pattern of R are dependent on the ordering of the free nodes in the mesh. The number of nonzeros in K is not changed when the free nodes are listed in a different order, but the sparsity pattern is changed. On the other hand, because fill-in is influenced not just by the
number of nonzeros but by the sparsity pattern, the number of nonzeros in R changes when the free nodes in the mesh are listed in a different order. There are various algorithms for reordering the nodes of a mesh so as to reduce the bandwidth, or more generally the fill-in, of the resulting stiffness matrix. Alternatively, similar algorithms reorder the rows and columns of a sparse matrix to reduce bandwidth or fill-in. The details of these algorithms are beyond the scope of this book; I will just briefly mention two such algorithms that apply directly to the matrix.
The symmetric reverse Cuthill-McKee (RCM) algorithm defines a permutation matrix Q such that $QKQ^T$ tends to have its nonzero entries closer to the diagonal than does K, thus reducing the bandwidth. A permutation matrix results from permuting the rows of the identity matrix. Left-multiplying K by Q permutes the rows of K, while right-multiplying K by $Q^T$ permutes the columns of K according to the same permutation. Figure 10.5 shows the sparsity patterns of $QKQ^T$ (for the matrix K illustrated in Figure 10.4) and the corresponding Cholesky factor, where Q is obtained from the symmetric RCM method. The number of nonzeros in the Cholesky factor is about 23% less than the number of nonzeros in the original Cholesky factor R. The symmetric RCM algorithm is implemented in the built-in MATLAB function symrcm.
The symmetric minimum degree permutation defines another permutation matrix, Q, such that $QKQ^T$ tends to have a sparser Cholesky factor than K. This algorithm tries to minimize fill-in directly rather than to reduce bandwidth. Since it is expensive to implement the minimum degree algorithm exactly, there are various approximate algorithms, one of which is implemented in the built-in MATLAB function symamd. Figure 10.6 shows the sparsity patterns of $QKQ^T$ (still for the same matrix K), where Q is obtained from symamd, and the corresponding Cholesky factor.
The number of nonzeros in the Cholesky factor is now reduced by 50%.
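The dependence of fill-in on the ordering can be seen on a very small example. The following Python sketch is only an illustration under stated assumptions: it uses a dense textbook Cholesky routine and a hypothetical 6 x 6 "arrow" matrix rather than a finite element stiffness matrix, and the helper names (cholesky_upper, permute, nnz, arrow_matrix) are invented for this example. The matrix and its symmetric permutation have the same number of nonzeros, but their Cholesky factors do not.

```python
import math

def cholesky_upper(K):
    """Dense Cholesky factorization K = R^T R; returns the upper triangular R."""
    n = len(K)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        pivot = K[i][i] - sum(R[k][i] ** 2 for k in range(i))
        R[i][i] = math.sqrt(pivot)
        for j in range(i + 1, n):
            s = K[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            R[i][j] = s / R[i][i]
    return R

def permute(K, p):
    """Return Q K Q^T, where row i of Q is row p[i] of the identity matrix."""
    return [[K[pi][pj] for pj in p] for pi in p]

def nnz(A, tol=1e-12):
    """Count the entries of A whose magnitude exceeds tol."""
    return sum(1 for row in A for a in row if abs(a) > tol)

def arrow_matrix(n):
    """Hypothetical SPD test matrix: diagonal plus a full first row and column."""
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        K[i][i] = float(n)          # diagonally dominant, hence SPD
    for i in range(1, n):
        K[0][i] = K[i][0] = 1.0
    return K

n = 6
K = arrow_matrix(n)
reverse = list(range(n - 1, -1, -1))        # list the nodes in reverse order
K_perm = permute(K, reverse)

print(nnz(K), nnz(K_perm))                  # same number of nonzeros in K
print(nnz(cholesky_upper(K)))               # factor of the original ordering
print(nnz(cholesky_upper(K_perm)))          # factor of the reordered matrix
```

With the full row and column first, the upper triangle of R fills in completely; with them last, no fill-in occurs at all. Reordering algorithms such as RCM and minimum degree search for orderings of the second kind.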
Figure 10.5. Left: The sparsity pattern of $QKQ^T$, where K is the matrix from Figure 10.4 and the permutation matrix Q is obtained from the symmetric RCM algorithm. Right: The sparsity pattern of the Cholesky factor of $QKQ^T$.
Figure 10.6. Left: The sparsity pattern of $QKQ^T$, where K is the matrix from Figure 10.4 and the permutation matrix Q is obtained from the symmetric minimum degree permutation. Right: The sparsity pattern of the Cholesky factor of $QKQ^T$.
Without the use of reordering, the cost (in computational time and memory) of factoring K can easily become unacceptable. For example, the following table shows the number of nonzeros in the stiffness matrix K and its Cholesky factor R as the mesh shown in Figure 10.3 is refined. It also shows the time required to compute R.^16

    N        nnz(K)     nnz(R)      Time (s)
    481      3017       13748       0.031552
    1985     12681      121387      0.4246
    8065     51977      1035889     8.6038
    32513    210441     8613949     183.8831
The size of K grows by a factor of about four each time the mesh is refined, as would be expected, and the number of nonzeros in K grows proportionately. However, the number of nonzeros in R grows much more quickly, and the time taken in computing R grows more quickly still. Using the symamd function in MATLAB to reorder the rows and columns of K leads to the following results:

    N        nnz(K)     nnz(R)      Time (s)
    481      3017       6841        0.090548
    1985     12681      41451       0.13162
    8065     51977      230098      0.87183
    32513    210441     1212742     7.9944
16 The actual times are irrelevant, as they were obtained on a nearly obsolete laptop computer. But the relative times are still meaningful.
The improvement is dramatic; nevertheless, the number of nonzeros in R and the computational time are both growing noticeably faster than the number of nonzeros in K. For large problems, the cost of solving KU = F will therefore overwhelm the cost of forming K and F. It is for this reason that iterative methods for solving KU = F are usually preferred.
10.3
Exercises for Chapter 10
1. Verify that forward or back substitution requires $N^2$ arithmetic operations for a triangular N x N system.

2. Verify that the exact operation count for Algorithm 10.1 is $N^3/3 + N^2/2 - 5N/6$.

3. Suppose $R \in \mathbf{R}^{N \times N}$ is a banded triangular matrix with half-bandwidth p. Determine the exact operation count for solving RU = F or $R^T U = F$.

4. An alternate form for the Cholesky factorization is $K = R^T D R$, where D is a diagonal matrix with positive diagonal entries and R is an upper triangular matrix with ones on the diagonal. The advantage of this alternate form is that no square roots must be computed to form D and R.
(a) Write out an algorithm, similar to Algorithm 10.1, for computing D and R.
(b) What is the relationship between D and R and the Cholesky factor R satisfying $K = R^T R$?

5. (MATLAB) Consider a sequence of uniformly refined meshes approximating the unit circle, such as shown in Figure 6.3. Consider solving the resulting finite element equations KU = F with and without reordering the nodes (or, equivalently, reordering the rows and columns of K). Make a table showing the number of refinements, the time required to solve KU = F without reordering, and the time required to solve KU = F with reordering. Use the MATLAB routines chol to compute the Cholesky factorization and symamd to reorder the rows and columns of K.

6. In Section 3.1, the possibility of using an orthogonal basis in the context of Galerkin's method was discounted as too expensive. The purpose of this exercise is to examine the cost of producing an orthogonal basis via the Gram-Schmidt process.^17 Let $V_h$ be a finite element space, and suppose $\{\phi_1, \ldots, \phi_n\}$ is the standard nodal basis for $V_h$. Let K be the usual stiffness matrix. Let $\{\theta_1, \ldots, \theta_n\}$ be the orthonormal basis produced by the Gram-Schmidt orthogonalization process applied to $\{\phi_1, \ldots, \phi_n\}$.
(a) Prove that there is a lower triangular matrix L such that

$$\theta_i = \sum_{j=1}^{i} L_{ij}\phi_j, \quad i = 1, 2, \ldots, n.$$

(b) Show that $LKL^T = I$, where I is the n x n identity matrix.
(c) Find the relationship between L and the Cholesky factor R of K (R upper triangular, $K = R^T R$).
(d) Compare the cost of finding the orthonormal basis with the cost of forming the matrix K and solving the equation KU = F.

17 The Gram-Schmidt process is a standard method for producing an orthogonal basis from a given basis. An explanation can be found in any introductory text on linear algebra, such as those listed in the bibliography.
Chapter 11
Iterative methods: Conjugate gradients
In the preceding chapter, direct methods for sparse systems were shown to be hampered by fill-in, which in turn was determined by the sparsity pattern of the matrix. For iterative methods, on the other hand, the sparsity pattern matters little, since the only requirement is that the matrix-vector product KV can be computed efficiently. The sparsity of K is important only in that it makes the computation of KV inexpensive. I will discuss two general classes of iterative methods: those based on minimizing an associated functional, and stationary iterations, based on fixed point iteration. The most important example of the first class is the conjugate gradient (CG) method, which is the subject of this chapter. The second class, examined in the next chapter, includes the Jacobi, Gauss-Seidel, and successive overrelaxation (SOR) methods.
11.1
The CG method
The CG method is actually an algorithm for minimizing a quadratic form. If $K \in \mathbf{R}^{N \times N}$ is symmetric positive definite, $F \in \mathbf{R}^N$, and $\phi : \mathbf{R}^N \to \mathbf{R}$ is defined by

$$\phi(U) = \frac{1}{2}\, U \cdot KU - U \cdot F,$$

then a direct calculation shows that

$$\nabla\phi(U) = KU - F.$$

Therefore, the unique stationary point of $\phi$ is

$$U = K^{-1}F.$$

Moreover, a consideration of the second derivative matrix shows that this stationary point is the global minimizer of $\phi$ (a quadratic form defined by a symmetric positive definite matrix is analogous to a scalar quadratic $ax^2 + bx + c$ with $a > 0$; see Figure 11.1). Therefore, solving KU = F and minimizing $\phi$ are equivalent. Any iterative minimization algorithm can therefore be applied to $\phi$ and, assuming it works, it will converge to the desired value of U.

Figure 11.1. The graph of a quadratic form defined by a positive definite matrix. The contours of the function are also shown.

An important class of minimization algorithms consists of descent methods based on a line search. Such algorithms are based on the idea of a descent direction: Given an estimate $U^{(i)}$ of the solution, a descent direction P for $\phi$ at $U^{(i)}$ is a direction such that, starting from $U^{(i)}$, $\phi$ decreases in the direction of P. This means that, for all $\alpha > 0$ sufficiently small,

$$\phi(U^{(i)} + \alpha P) < \phi(U^{(i)})$$

holds. Equivalently, this means that the directional derivative of $\phi$ at $U^{(i)}$ in the direction of P is negative, that is,

$$\nabla\phi(U^{(i)}) \cdot P < 0.$$

Given a descent direction, a line search algorithm will seek to minimize $\phi$ along the ray $\{U^{(i)} + \alpha P : \alpha > 0\}$ (that is, it will search along this "line," which is really a ray). The quadratic $\phi$ reduces to a scalar quadratic along a one-dimensional subset, so it is particularly easy to perform the line search. Indeed,

$$\phi(U^{(i)} + \alpha P) = \phi(U^{(i)}) + \alpha\, P \cdot (KU^{(i)} - F) + \frac{\alpha^2}{2}\, P \cdot KP$$
(the symmetry of K was used to combine the terms $U^{(i)} \cdot KP/2$ and $P \cdot KU^{(i)}/2$). The minimum is easily seen to occur at

$$\alpha = \frac{P \cdot (F - KU^{(i)})}{P \cdot KP}.$$

How should the descent direction be chosen? The obvious choice is the steepest descent direction, $P = -\nabla\phi(U^{(i)})$, since the directional derivative of $\phi$ at $U^{(i)}$ is as negative as possible in this direction. The resulting algorithm (choose a starting point, move to the minimum in the steepest descent direction, calculate the steepest descent direction at that new point, and repeat) is called the steepest descent algorithm. For an example of a line search in the steepest descent direction, see Figure 11.2. The steepest descent method is guaranteed to converge to the minimizer U of $\phi$, that is, to the solution of KU = F. However, it can be shown that the steepest descent method converges slowly, especially when K is ill-conditioned. The condition number of a symmetric positive definite matrix K is

$$\mathrm{cond}(K) = \frac{\lambda_{\max}}{\lambda_{\min}},$$

where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues of K, respectively. The matrix K is called ill-conditioned if $\mathrm{cond}(K) \gg 1$. When K is the stiffness matrix for a (two-dimensional) BVP and a nodal basis is used, its condition number is $O(h^{-2})$, where h is the mesh size. This shows that K becomes increasingly ill-conditioned as the mesh is refined.

To show the relationship between the condition number of K and the convergence of the steepest descent algorithm, it is convenient to introduce an alternate inner product and norm for $\mathbf{R}^N$:

$$(x, y)_K = x \cdot Ky, \quad \|x\|_K = \sqrt{(x, x)_K}.$$

It can be shown that $(\cdot, \cdot)_K$ defines an inner product on $\mathbf{R}^N$ when K is symmetric positive definite.^18 Moreover, the sequence of vectors $U^{(0)}, U^{(1)}, U^{(2)}, \ldots$ produced by the steepest descent algorithm satisfies

$$\|U^{(k)} - U\|_K \le \left(\frac{\mathrm{cond}(K) - 1}{\mathrm{cond}(K) + 1}\right)^{k} \|U^{(0)} - U\|_K. \qquad (11.4)$$

A proof can be found in Luenberger [28].

18 If K is the stiffness matrix for a BVP, then the inner product defined by K is the discrete version of the energy inner product $a(\cdot, \cdot)$.
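The steepest descent algorithm with the exact line search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an efficient implementation: K is stored as a dense list of rows, the starting point is zero, the iteration stops when the relative residual falls below a tolerance, and the function names (dot, matvec, steepest_descent) and the small test matrix are hypothetical choices made for the example.

```python
def dot(x, y):
    """Euclidean dot product of two lists."""
    return sum(a * b for a, b in zip(x, y))

def matvec(K, x):
    """Dense matrix-vector product K x (K is a list of rows)."""
    return [dot(row, x) for row in K]

def steepest_descent(K, F, tol=1e-6, max_iter=10000):
    """Minimize phi(U) = U.KU/2 - U.F by steepest descent with exact line search.

    Returns the approximate solution of K U = F and the number of iterations.
    """
    U = [0.0] * len(F)
    R = list(F)                       # residual F - K U = -grad phi(U); U = 0 here
    norm_F = dot(F, F) ** 0.5
    for k in range(max_iter):
        if dot(R, R) ** 0.5 <= tol * norm_F:
            return U, k
        V = matvec(K, R)              # K P with search direction P = R
        alpha = dot(R, R) / dot(R, V) # alpha = P.(F - K U) / (P.K P)
        U = [u + alpha * r for u, r in zip(U, R)]
        R = [r - alpha * v for r, v in zip(R, V)]
    return U, max_iter

# Hypothetical 3 x 3 SPD system whose exact solution is (1, 1, 1).
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
F = [1.0, 0.0, 1.0]
U, iterations = steepest_descent(K, F)
print(U, iterations)
```

The residual is updated as R - alpha*KP instead of being recomputed from U, so each iteration costs one matrix-vector product.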
Figure 11.2. The contours of the quadratic form from Figure 11.1. The steepest descent direction from U = (4, 2) (marked by o) is indicated, along with the minimizer in the steepest descent direction (marked by o). The desired (global) minimizer is marked by x.

The bound (11.4) implies that the number of iterations needed by the steepest descent method is roughly proportional to the condition number of K (see Exercise 2). The algorithm therefore takes increasingly long to converge as the condition number of K increases (for example, as the underlying mesh is refined). As I will show, much better algorithms are available.

EXAMPLE 11.1. A fixed test problem will be used to test the algorithms described in this chapter, namely, the BVP
where $\Omega$ is the unit square and
The exact solution is
The mesh on $\Omega$ is uniform, constructed by dividing the x and y intervals into n subintervals each. This results in $2n^2$ triangles and $(n - 1)^2$ free nodes. The finite element equation, KU = F, is therefore N x N, with $N = (n - 1)^2$.
The steepest descent algorithm is used to solve KU = F, starting with $U^{(0)} = 0$ and stopping when the relative residual

$$\frac{\|F - KU^{(k)}\|}{\|F\|}$$

falls below $10^{-6}$. The vector $R^{(k)} = F - KU^{(k)}$ is called the residual in the equation KU = F; it is the amount by which the equation fails to be satisfied. The iterations required are reported in the following table, along with the condition number of the matrix K. The last column is the ratio of the number of iterations to cond(K).

    N        Iterations    cond(K)    Ratio
    9        36            9.00       4.00
    49       162           37.26      4.35
    225      685           150.42     4.55
    961      2798          603.05     4.64
    3969     11254         2413.60    4.66
    16129    45062         9655.79    4.67

In this experiment, the number of iterations is roughly proportional to the condition number of K, as suggested above.
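11.1.1 The CG algorithm

The CG algorithm is another descent algorithm that is usually a great improvement over the steepest descent method. The problem with the steepest descent method is that while the steepest descent direction is locally optimal, the search directions are poorly chosen from a global point of view. Indeed, it can be shown that successive search directions are orthogonal, and thus the path followed to the solution is not efficient. The CG algorithm defines the successive search directions to satisfy a pleasing global property: Each step preserves the optimality of previous steps. To be precise, after k steps of CG, the estimated solution is the minimizer of $\phi$ over the k-dimensional subspace spanned by the first k search directions.

It is rather difficult to derive the CG algorithm; the final form results from several nonobvious simplifications. I will content myself with showing the critical step: the computation of the search direction. I assume that the initial estimate of the solution is $U^{(0)} = 0$, that the first k search directions are $P^{(1)}, P^{(2)}, \ldots, P^{(k)}$, and that after k steps $\alpha_1, \alpha_2, \ldots, \alpha_k$ are determined so that

$$U^{(k)} = \sum_{i=1}^{k} \alpha_i P^{(i)}$$

solves

$$\min\{\phi(U) : U \in S_k\}, \quad S_k = \mathrm{span}\{P^{(1)}, P^{(2)}, \ldots, P^{(k)}\}.$$

I now wish to find a new search direction $P^{(k+1)}$ with the following property: If

$$U^{(k+1)} = U^{(k)} + \alpha_{k+1} P^{(k+1)},$$

where $\alpha_{k+1}$ solves

$$\min_{\alpha}\ \phi(U^{(k)} + \alpha P^{(k+1)}),$$

then $U^{(k+1)}$ solves

$$\min\{\phi(U) : U \in S_{k+1}\}. \qquad (11.6)$$

It is not clear a priori how to compute such a $P^{(k+1)}$. However, it turns out to be easy, and this is the secret of the CG method. The solution of (11.6) is given by

$$U = \sum_{i=1}^{k} \beta_i P^{(i)} + \beta_{k+1} P^{(k+1)},$$

with the property that

$$\phi\left(\sum_{i=1}^{k} \beta_i P^{(i)} + \beta_{k+1} P^{(k+1)}\right)$$

is as small as possible. I separate the last term, $\beta_{k+1} P^{(k+1)}$, from the sum because I already know how to make

$$\phi\left(\sum_{i=1}^{k} \beta_i P^{(i)}\right)$$

as small as possible. Some straightforward algebra shows that

$$\phi\left(\sum_{i=1}^{k} \beta_i P^{(i)} + \beta_{k+1} P^{(k+1)}\right) = \phi\left(\sum_{i=1}^{k} \beta_i P^{(i)}\right) + \beta_{k+1} \left(\sum_{i=1}^{k} \beta_i P^{(i)}\right) \cdot KP^{(k+1)} + \phi\left(\beta_{k+1} P^{(k+1)}\right).$$

Here is the crucial observation: If $P^{(k+1)}$ is chosen so that

$$\left(\sum_{i=1}^{k} \beta_i P^{(i)}\right) \cdot KP^{(k+1)}$$

is zero, then

$$\phi\left(\sum_{i=1}^{k} \beta_i P^{(i)} + \beta_{k+1} P^{(k+1)}\right) = \phi\left(\sum_{i=1}^{k} \beta_i P^{(i)}\right) + \phi\left(\beta_{k+1} P^{(k+1)}\right).$$

The minimization problem is then "decoupled." That is, $\beta_1, \beta_2, \ldots, \beta_k$ can be chosen to minimize

$$\phi\left(\sum_{i=1}^{k} \beta_i P^{(i)}\right) \qquad (11.7)$$

and $\beta_{k+1}$ to minimize

$$\phi\left(\beta_{k+1} P^{(k+1)}\right), \qquad (11.8)$$

and the resulting $\beta_1, \beta_2, \ldots, \beta_{k+1}$ will be the solution of (11.6). By assumption, $\beta_i = \alpha_i$, $i = 1, 2, \ldots, k$, solves (11.7), and I have already shown how to compute $\beta_{k+1}$ using the fact that (11.8) reduces to a quadratic in one variable.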
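The problem then reduces to finding $P^{(k+1)}$ to satisfy

$$\left(\sum_{i=1}^{k} \beta_i P^{(i)}\right) \cdot KP^{(k+1)} = 0.$$

It is certainly sufficient to satisfy

$$P^{(i)} \cdot KP^{(k+1)} = 0, \quad i = 1, 2, \ldots, k.$$

To find $P^{(k+1)}$, it can be assumed that this property was satisfied at the previous steps:

$$P^{(i)} \cdot KP^{(j)} = 0, \quad i \ne j, \quad i, j = 1, 2, \ldots, k. \qquad (11.9)$$

Condition (11.9) states that the vectors $P^{(1)}, P^{(2)}, \ldots, P^{(k)}$ are orthogonal with respect to the inner product $(\cdot, \cdot)_K$ introduced above. A search direction $P^{(k+1)}$ with the desired property can be computed from any descent direction by subtracting off its component lying in the subspace

$$S_k = \mathrm{span}\{P^{(1)}, P^{(2)}, \ldots, P^{(k)}\};$$

the result will be orthogonal (with respect to $(\cdot, \cdot)_K$) to each of the vectors $P^{(1)}, P^{(2)}, \ldots, P^{(k)}$. The CG method results from choosing the descent direction to be the steepest descent direction $R^{(k)} = -\nabla\phi(U^{(k)}) = F - KU^{(k)}$. Then

$$P^{(k+1)} = R^{(k)} - \sum_{i=1}^{k} \frac{R^{(k)} \cdot KP^{(i)}}{P^{(i)} \cdot KP^{(i)}}\, P^{(i)}. \qquad (11.10)$$

Finally, computing $\alpha_{k+1}$ is simple, as shown earlier:

$$\alpha_{k+1} = \frac{P^{(k+1)} \cdot R^{(k)}}{P^{(k+1)} \cdot KP^{(k+1)}}.$$

Then

$$U^{(k+1)} = U^{(k)} + \alpha_{k+1} P^{(k+1)}.$$

The above formulas define the CG algorithm. However, efficient implementation depends on several simplifications that are not straightforward to derive. The reader is referred to Golub and Van Loan [21] for proofs of the following assertions. It can be shown that if $P^{(1)}, P^{(2)}, \ldots, P^{(k)}$ are chosen as described above, then

$$S_k = \mathrm{span}\{P^{(1)}, P^{(2)}, \ldots, P^{(k)}\}$$

has an alternate basis:

$$S_k = \mathrm{span}\{F, KF, K^2F, \ldots, K^{k-1}F\}. \qquad (11.12)$$

A subspace of the form (11.12) is called a Krylov subspace, and so the CG method can be described as a Krylov subspace method. Because (11.12) holds, it can be shown that

$$R^{(k)} \cdot KP^{(i)} = 0, \quad i = 1, 2, \ldots, k-1,$$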
and thus (11.10) reduces to

$$P^{(k+1)} = R^{(k)} - \frac{R^{(k)} \cdot KP^{(k)}}{P^{(k)} \cdot KP^{(k)}}\, P^{(k)}.$$

In other words, $R^{(k)}$ is automatically orthogonal to $P^{(1)}, P^{(2)}, \ldots, P^{(k-1)}$, so it need only be orthogonalized against $P^{(k)}$. This results in a significant savings in computation that makes the CG method affordable. Further analysis shows that

$$\alpha_{k+1} = \frac{R^{(k)} \cdot R^{(k)}}{P^{(k+1)} \cdot KP^{(k+1)}}$$

and

$$P^{(k+1)} = R^{(k)} + \frac{R^{(k)} \cdot R^{(k)}}{R^{(k-1)} \cdot R^{(k-1)}}\, P^{(k)}.$$

Using these formulas, which reduce the amount of computational work that must be done, the CG algorithm can be written in the efficient form found in Algorithm 11.1. The reader should note that only a single matrix-vector product is required at each step of the algorithm, making it very efficient. In addition to the matrix-vector product, only O(10N) arithmetic operations are required for each iteration. The storage requirements, beyond K and F, are only four vectors, denoted R, P, V, U in Algorithm 11.1. The CG algorithm is usually halted when the relative residual is reduced to a level considered satisfactory, or when a predetermined iteration limit is reached.
    U <- 0
    R <- F
    P <- R
    c1 <- R · R
    for k = 1, 2, ...
        V <- KP
        c2 <- P · V
        α <- c1/c2
        U <- U + αP
        R <- R − αV
        c3 <- R · R
        β <- c3/c1
        P <- βP + R
        c1 <- c3
Algorithm 11.1. The CG algorithm for solving KU = F, where K is a symmetric positive definite matrix.

The name "conjugate gradient" is derived from the fact that many authors refer to the orthogonality of the search directions, in the inner product defined by K, as K-conjugacy. Therefore, the key step is to make the (negative) gradient direction conjugate to the previous search directions.
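Algorithm 11.1 translates almost line for line into code. The following Python sketch is an illustration under stated assumptions: K is a dense list of rows (a sparse matrix-vector product would be used in practice), the starting point is zero, the stopping test on the relative residual and the iteration limit are added as suggested in the text, and the names dot, matvec, and conjugate_gradient, as well as the test system, are hypothetical.

```python
def dot(x, y):
    """Euclidean dot product of two lists."""
    return sum(a * b for a, b in zip(x, y))

def matvec(K, x):
    """Dense matrix-vector product K x (K is a list of rows)."""
    return [dot(row, x) for row in K]

def conjugate_gradient(K, F, tol=1e-6, max_iter=None):
    """CG for K U = F with K symmetric positive definite, starting from U = 0.

    Stops when ||R|| <= tol * ||F|| or after max_iter iterations.
    Returns the approximate solution and the number of iterations used.
    """
    n = len(F)
    if max_iter is None:
        max_iter = 10 * n
    U = [0.0] * n
    R = list(F)                      # residual F - K U for U = 0
    P = list(R)
    c1 = dot(R, R)
    if c1 == 0.0:                    # F = 0, so U = 0 is already the solution
        return U, 0
    stop = tol * tol * c1            # compare squared norms to avoid square roots
    for k in range(1, max_iter + 1):
        V = matvec(K, P)             # the single matrix-vector product per step
        c2 = dot(P, V)
        alpha = c1 / c2
        U = [u + alpha * p for u, p in zip(U, P)]
        R = [r - alpha * v for r, v in zip(R, V)]
        c3 = dot(R, R)
        if c3 <= stop:
            return U, k
        beta = c3 / c1
        P = [beta * p + r for p, r in zip(P, R)]
        c1 = c3
    return U, max_iter

# Hypothetical 3 x 3 SPD system whose exact solution is (1, 1, 1).
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
F = [1.0, 0.0, 1.0]
U, iterations = conjugate_gradient(K, F)
print(U, iterations)
```

Only the four vectors R, P, V, U are stored besides K and F, matching the storage count given above.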
11.1.2 Convergence of the CG algorithm
According to the above derivation, CG produces a sequence $U^{(1)}, U^{(2)}, \ldots$ with the property that $U^{(k)}$ minimizes

$$\phi(U) = \frac{1}{2}\, U \cdot KU - U \cdot F$$

over the k-dimensional subspace $S_k$. Since $S_N = \mathbf{R}^N$ and the exact solution of KU = F minimizes $\phi$ over $\mathbf{R}^N$, this shows that $U^{(N)}$ is the exact solution of KU = F. Therefore, CG can be regarded as a direct method; in exact arithmetic, it produces the exact solution of the equation in a finite number of steps! However, this fact is not really relevant, for two reasons. First of all, round-off errors tend to lead to a loss of orthogonality, so that $U^{(N)}$ is usually not as accurate a solution of KU = F as one expects from a direct method. More importantly, CG is usually applied to systems that are large enough that performing N iterations is unrealistic. The CG method is useful because it can produce quite accurate results in many fewer than N iterations. The convergence analysis, which is outlined in Luenberger [28], shows that

$$\|U^{(k)} - U\|_K \le 2 \left(\frac{\sqrt{\mathrm{cond}(K)} - 1}{\sqrt{\mathrm{cond}(K)} + 1}\right)^{k} \|U^{(0)} - U\|_K.$$

Therefore, the convergence behavior depends on the square root of the condition number rather than the condition number itself, a considerable improvement over the steepest descent algorithm; one can show that the number of iterations required is proportional to the square root of the condition number (see Exercise 3). In many cases, the algorithm behaves even better than this bound suggests. Nonetheless, it is still advantageous to try to replace KU = F by an equivalent system with a better-conditioned coefficient matrix. This is the subject of the next two sections.

EXAMPLE 11.2. The convergence of the CG algorithm is illustrated on the test problem from Example 11.1, solving KU = F for a sequence of increasingly fine meshes. The following table shows the number of iterations taken by CG in reducing the relative residual to $10^{-6}$. The last column is the ratio of the number of iterations to $\sqrt{\mathrm{cond}(K)}$.

    N        Iterations    sqrt(cond(K))    Ratio
    9        5             3.00             1.67
    49       16            6.10             2.62
    225      33            12.26            2.69
    961      68            24.56            2.69
    3969     138           49.13            2.71
    16129    282           98.26            2.87

In this example, the number of iterations is roughly proportional to the square root of cond(K), a striking improvement over the steepest descent algorithm.
11.2
Hierarchical bases for finite element spaces
So far, I have assumed that a nodal basis

$$\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}$$

will be used to represent the finite element solution. However, a finite-dimensional space has many possible bases, and it may be advantageous to choose one basis over another. The nodal basis is the most popular basis because of ease of use; its interpretation is simple and it is straightforward to assemble K and F. On the other hand, the condition number of K depends on the particular basis chosen, and a nodal basis leads to a condition number of $O(h^{-2})$, which grows significantly as the mesh is refined.

An alternative to a standard nodal basis is a hierarchical basis (first proposed by Zienkiewicz et al. [45]; see also Yserentant [44]). For the piecewise linear case, a hierarchical basis can be defined in a natural way when the mesh is obtained by several refinements of an initial, coarse mesh. In the following discussion I will consider a sequence of meshes $\mathcal{T}_0, \mathcal{T}_1, \mathcal{T}_2, \ldots$, where $\mathcal{T}_0$ is a given (coarse) mesh and each $\mathcal{T}_{i+1}$ is the standard refinement of $\mathcal{T}_i$. Each mesh is assumed to consist of linear Lagrange triangles, so every node in a given mesh is a vertex of one or more triangles. By construction, each node in $\mathcal{T}_i$ is also a node in $\mathcal{T}_{i+1}$. Moreover, every node belonging to $\mathcal{T}_{i+1}$ but not to $\mathcal{T}_i$ is the midpoint of an edge in $\mathcal{T}_i$. I will assume that the nodes belonging to $\mathcal{T}_0$ are $z_1, z_2, \ldots, z_{N^{(0)}}$, the nodes of $\mathcal{T}_1$ are $z_1, z_2, \ldots, z_{N^{(1)}}$, and so forth. Thus the nodes of $\mathcal{T}_i$ are the first $N^{(i)}$ nodes of $\mathcal{T}_{i+1}$.
11.2.1
Hierarchical bases for linear Lagrange triangles
I will write $P_i^{(1)}$ for the space of continuous piecewise linear functions defined on $\mathcal{T}_i$. The standard basis functions for $P_{i+1}^{(1)}$ do not belong to the space $P_i^{(1)}$, since they are not linear over the triangles of $\mathcal{T}_i$. However, every function in $P_i^{(1)}$ is a member of $P_{i+1}^{(1)}$ (see Figure 11.3). In particular, any basis for $P_i^{(1)}$ is a linearly independent set in $P_{i+1}^{(1)}$, and can therefore be extended to a basis for $P_{i+1}^{(1)}$. This is the foundation of the hierarchical basis construction.

The first step in the hierarchical basis construction is to define

$$\{\psi_1^{(0)}, \psi_2^{(0)}, \ldots, \psi_{N^{(0)}}^{(0)}\}$$

to be the standard Lagrange basis for $P_0^{(1)}$. The hierarchical basis for $P_1^{(1)}$ is then taken to be

$$\{\psi_1^{(0)}, \ldots, \psi_{N^{(0)}}^{(0)}, \psi_{N^{(0)}+1}^{(1)}, \ldots, \psi_{N^{(1)}}^{(1)}\}, \qquad (11.15)$$

where $\psi_i^{(j)}$ denotes one of the standard basis functions defined relative to $\mathcal{T}_j$. I will now show the exact relationship between the hierarchical basis (11.15) and the standard Lagrange basis

$$\{\psi_1^{(1)}, \psi_2^{(1)}, \ldots, \psi_{N^{(1)}}^{(1)}\}$$

for $P_1^{(1)}$.
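Figure 11.3. A standard basis function on a mesh (left) and the same function on the refined mesh (right).

If $i \le N^{(0)}$, then all the basis functions in (11.15), except $\psi_i^{(0)}$, are zero at node $z_i$. That is,

$$\psi_i^{(0)}(z_i) = 1; \quad \psi_j^{(0)}(z_i) = 0,\ j \le N^{(0)},\ j \ne i; \quad \psi_j^{(1)}(z_i) = 0,\ j = N^{(0)}+1, \ldots, N^{(1)}.$$

It follows that if $u \in P_1^{(1)}$ is written as

$$u = \sum_{j=1}^{N^{(0)}} \alpha_j \psi_j^{(0)} + \sum_{j=N^{(0)}+1}^{N^{(1)}} \alpha_j \psi_j^{(1)}, \qquad (11.17)$$

then

$$\alpha_i = u(z_i), \quad i = 1, 2, \ldots, N^{(0)}. \qquad (11.18)$$

In other words, the first $N^{(0)}$ coefficients representing u are its nodal values at the nodes of $\mathcal{T}_0$. On the other hand, if $i > N^{(0)}$, then $\psi_1^{(0)}, \ldots, \psi_{N^{(0)}}^{(0)}$ are not all zero at $z_i$, which is a midpoint of some edge in $\mathcal{T}_0$. If the endpoints of this edge are $z_{\mathrm{end}(i,1)}$ and $z_{\mathrm{end}(i,2)}$, then

$$\psi_j^{(0)}(z_i) = \begin{cases} 1/2, & j = \mathrm{end}(i,1) \text{ or } j = \mathrm{end}(i,2), \\ 0, & \text{otherwise}, \end{cases} \quad j = 1, 2, \ldots, N^{(0)},$$

and

$$\psi_j^{(1)}(z_i) = \begin{cases} 1, & j = i, \\ 0, & j \ne i, \end{cases} \quad j = N^{(0)}+1, \ldots, N^{(1)}.$$

Therefore, (11.17) implies

$$u(z_i) = \frac{1}{2}\alpha_{\mathrm{end}(i,1)} + \frac{1}{2}\alpha_{\mathrm{end}(i,2)} + \alpha_i = \frac{1}{2}u(z_{\mathrm{end}(i,1)}) + \frac{1}{2}u(z_{\mathrm{end}(i,2)}) + \alpha_i,$$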
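where the second equality comes from applying (11.18). Thus

$$\alpha_i = u(z_i) - \frac{1}{2}\left(u(z_{\mathrm{end}(i,1)}) + u(z_{\mathrm{end}(i,2)})\right), \quad i = N^{(0)}+1, \ldots, N^{(1)}.$$

This equation should be interpreted as follows. Suppose $u^{(0)}$ is the piecewise linear function on $\mathcal{T}_0$ that interpolates u at the nodes of $\mathcal{T}_0$. Then, as I have already shown,

$$u^{(0)} = \sum_{i=1}^{N^{(0)}} \alpha_i \psi_i^{(0)}.$$

The function u itself can then be written as

$$u = u^{(0)} + \sum_{i=N^{(0)}+1}^{N^{(1)}} \alpha_i \psi_i^{(1)},$$

where the coefficients $\alpha_{N^{(0)}+1}, \ldots, \alpha_{N^{(1)}}$ are the differences between the nodal values at the nodes $z_{N^{(0)}+1}, \ldots, z_{N^{(1)}}$ of u and the interpolant $u^{(0)}$:

$$\alpha_i = u(z_i) - u^{(0)}(z_i), \quad i = N^{(0)}+1, \ldots, N^{(1)}.$$

This pattern continues as more levels are added to the hierarchy. For example, if $u \in P_2^{(1)}$, then u can be written as

$$u = u^{(1)} + \sum_{i=N^{(1)}+1}^{N^{(2)}} \alpha_i \psi_i^{(2)},$$

where

$$u^{(1)} = \sum_{i=1}^{N^{(0)}} \alpha_i \psi_i^{(0)} + \sum_{i=N^{(0)}+1}^{N^{(1)}} \alpha_i \psi_i^{(1)}$$

is the piecewise linear function from $P_1^{(1)}$ interpolating u at the nodes of $\mathcal{T}_1$. Since, for $i > N^{(1)}$, each $\psi_i^{(2)}$ is zero at all of the nodes of $\mathcal{T}_1$, the coefficients $\alpha_1, \alpha_2, \ldots, \alpha_{N^{(1)}}$ are defined exactly as above. To determine the coefficients $\alpha_{N^{(1)}+1}, \ldots, \alpha_{N^{(2)}}$, I reason as before. Suppose $i > N^{(1)}$, so that $z_i$ is a node belonging to $\mathcal{T}_2$ but not to $\mathcal{T}_1$. Then $z_i$ is the midpoint of some edge in $\mathcal{T}_1$, and

$$u(z_i) = u^{(1)}(z_i) + \alpha_i = \frac{1}{2}\left(u(z_{\mathrm{end}(i,1)}) + u(z_{\mathrm{end}(i,2)})\right) + \alpha_i.$$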
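Therefore,

$$\alpha_i = u(z_i) - \frac{1}{2}\left(u(z_{\mathrm{end}(i,1)}) + u(z_{\mathrm{end}(i,2)})\right), \quad i = N^{(1)}+1, \ldots, N^{(2)}.$$

In general, the hierarchical basis for $P_i^{(1)}$ is the hierarchical basis for $P_{i-1}^{(1)}$, augmented by the nodal basis functions corresponding to nodes belonging to $\mathcal{T}_i$ but not to $\mathcal{T}_{i-1}$. Any $u \in P_k^{(1)}$ can be written as

$$u = u^{(0)} + \sum_{i=1}^{k} \left(u^{(i)} - u^{(i-1)}\right),$$

where $u^{(i)}$ is the element of $P_i^{(1)}$ interpolating u at the nodes of $\mathcal{T}_i$. Each of the differences $u^{(i)} - u^{(i-1)}$ has the property that it is zero at the nodes of $\mathcal{T}_{i-1}$, and thus $u^{(i)} - u^{(i-1)}$ is oscillatory (see Figure 11.4).

Figure 11.4. The function $u^{(0)}$ (top left), $u^{(1)} - u^{(0)}$ (top right), $u^{(2)} - u^{(1)}$ (bottom left), and $u^{(3)} - u^{(2)}$ (bottom right) for $u(x, y) = \sin(\pi x)\sin(\pi y)$.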
The relationships derived above lead to compact algorithms for converting from a nodal basis to a hierarchical basis, and vice versa. The linear operator taking the coefficients representing $u \in P_k^{(1)}$ in the hierarchical basis and producing the vector U of nodal values of u will be denoted by S. To describe algorithms for computing S and $S^{-1}$, the following notation will be used: The set $\mathcal{N}_i$, $i = 1, 2, \ldots, k$, consists of the nodes belonging to $\mathcal{T}_i$ but not to $\mathcal{T}_{i-1}$. As above, any $z_j \in \mathcal{N}_i$ ($i \ge 1$) is the midpoint of the edge with endpoints $z_{\mathrm{end}(j,1)}$ and $z_{\mathrm{end}(j,2)}$.

Suppose the nodal values of a piecewise linear function u, defined on the mesh $\mathcal{T}_k$, are stored in an array U: $U(i) = u(z_i)$. The following double loop^19 implements $U \leftarrow S^{-1}U$; that is, it overwrites the values of U with the coefficients $\alpha_i$, $i = 1, 2, \ldots, N^{(k)}$, expressing u in terms of the hierarchical basis:

    for i = k, k−1, ..., 1
        for z_j ∈ N_i
            U(j) <- U(j) − (U(end(j,1)) + U(end(j,2)))/2

This algorithm works for the following reason: While the coefficients corresponding to $z_j \in \mathcal{N}_i$ are being computed, the condition $U(t) = u(z_t)$ still holds for $z_t \in \mathcal{N}_s$, $s < i$ (and for the nodes of $\mathcal{T}_0$). It is therefore a simple matter to subtract off the average of the nodal values at the endpoints of the edge whose midpoint is $z_j$. The following double loop takes an array U containing the coefficients expressing $u \in P_k^{(1)}$ in terms of the hierarchical basis and overwrites it with SU, the vector of nodal values of u:

    for i = 1, 2, ..., k
        for z_j ∈ N_i
            U(j) <- U(j) + (U(end(j,1)) + U(end(j,2)))/2

In solving most BVPs, the approximating subspace is a subspace $V_k$ of $P_k^{(1)}$ rather than $P_k^{(1)}$ itself. The subspace takes into account the essential (Dirichlet) boundary conditions. In this case, the nodal basis for $V_k$ is a subset of the nodal basis for $P_k^{(1)}$, denoted

$$\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}.$$

In this case, the hierarchical basis for $V_k$ is obtained in the same way, by omitting from the hierarchical basis for $P_k^{(1)}$ the basis functions corresponding to constrained nodes. The above algorithms for computing S and $S^{-1}$ are still valid; the coefficients (in either the nodal or hierarchical basis) of basis functions corresponding to constrained nodes are assigned value zero upon input and ignored upon output. For the sake of convenience, I will denote the hierarchical basis for $V_k$ as

$$\{\psi_1, \psi_2, \ldots, \psi_{N_f}\}.$$

Each $\psi_i$ is a nodal basis function from one of the meshes $\mathcal{T}_0, \mathcal{T}_1, \ldots, \mathcal{T}_k$.

19 cf. Yserentant [44].
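The two conversions between nodal values and hierarchical coefficients can be sketched in Python as follows. The data layout is an assumption made for this illustration: levels[i-1] lists, for each node z_j added at refinement level i, a triple (j, end1, end2) of 0-based indices giving the node and the endpoints of the edge whose midpoint it is. To keep the example short, the hypothetical mesh is a nested one-dimensional mesh on [0, 1] rather than a triangulation; the loops use only the midpoint and endpoint relation, which has the same form for triangles.

```python
def to_hierarchical(U, levels):
    """Overwrite nodal values U with hierarchical coefficients (U <- S^{-1} U).

    The finest level is processed first, so the endpoint entries still hold
    nodal values when they are used.
    """
    for level in reversed(levels):
        for j, e1, e2 in level:
            U[j] -= 0.5 * (U[e1] + U[e2])
    return U

def to_nodal(U, levels):
    """Overwrite hierarchical coefficients U with nodal values (U <- S U).

    The coarsest level is processed first, so the endpoint entries have
    already been converted to nodal values when they are used.
    """
    for level in levels:
        for j, e1, e2 in level:
            U[j] += 0.5 * (U[e1] + U[e2])
    return U

# Hypothetical nested mesh on [0, 1]:
#   level 0: nodes 0 (x = 0) and 1 (x = 1)
#   level 1: node 2 (x = 0.5), midpoint of nodes 0 and 1
#   level 2: node 3 (x = 0.25), midpoint of 0 and 2; node 4 (x = 0.75), midpoint of 2 and 1
X = [0.0, 1.0, 0.5, 0.25, 0.75]
LEVELS = [
    [(2, 0, 1)],
    [(3, 0, 2), (4, 2, 1)],
]

nodal = [x * x for x in X]                    # nodal values of u(x) = x^2
coeffs = to_hierarchical(list(nodal), LEVELS)
recovered = to_nodal(list(coeffs), LEVELS)
print(coeffs)
print(recovered)
```

The coefficients of the nodes added at levels 1 and 2 are the differences between u and the interpolant from the next coarser mesh, and they shrink as the level increases, as expected for a smooth function.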
11.2.2
Relationship between the stiffness matrices in nodal and hierarchical bases
In a hierarchical basis resulting from a mesh that has been refined several times, some of the basis functions (those coming from the coarse meshes) have supports that cover many triangles. One of the main advantages of the nodal basis is the sparsity of the resulting stiffness matrix, which results directly from the fact that the nodal basis functions have small support. It would appear, then, that the hierarchical basis surrenders this sparsity of the stiffness matrix, and would therefore be of limited use. However, when the finite element equations are solved by iterative methods, such as the CG method, the matrix itself is not needed but only the ability to multiply it by a vector.

For the sake of the following discussion, I will denote by K the stiffness matrix arising from the nodal basis for $V_k$ and by $\hat{K}$ the stiffness matrix arising from the corresponding hierarchical basis. Similarly, F and $\hat{F}$ will denote the two load vectors. I will now show how the matrix-vector product $\hat{K}\hat{U}$ can be computed nearly as cheaply as KU, even though $\hat{K}$ is much less sparse than K. The matrix K is defined by the values $W \cdot KU$, where U and W are arbitrary vectors in $\mathbf{R}^{N_f}$. Indeed, if $U, W \in \mathbf{R}^{N_f}$, and u, w are the continuous piecewise linear functions defined by

$$u = \sum_{i=1}^{N_f} U_i \phi_i, \quad w = \sum_{i=1}^{N_f} W_i \phi_i,$$

then

$$W \cdot KU = a(u, w).$$

In particular, the entries of K are determined by taking u, w to be nodal basis functions and U, W to be the corresponding Euclidean vectors. By the same reasoning, the matrix $\hat{K}$ is determined by the values of

$$\hat{W} \cdot \hat{K}\hat{U} = a(u, w),$$

where, with $\{\psi_1, \ldots, \psi_{N_f}\}$ denoting the hierarchical basis for $V_k$,

$$u = \sum_{i=1}^{N_f} \hat{U}_i \psi_i, \quad w = \sum_{i=1}^{N_f} \hat{W}_i \psi_i.$$
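But if $U = S\hat{U}$, then U and $\hat{U}$ define the same piecewise linear function, and similarly for $W = S\hat{W}$ (the $\phi_i$ are the nodal basis functions and the $\psi_i$ the hierarchical basis functions):

$$u = \sum_{i=1}^{N_f} U_i \phi_i = \sum_{i=1}^{N_f} \hat{U}_i \psi_i, \quad w = \sum_{i=1}^{N_f} W_i \phi_i = \sum_{i=1}^{N_f} \hat{W}_i \psi_i.$$

It follows, then, that

$$\hat{W} \cdot \hat{K}\hat{U} = W \cdot KU = (S\hat{W}) \cdot K(S\hat{U}) = \hat{W} \cdot (S^T K S)\hat{U}.$$

Therefore, since this holds for all $\hat{U}, \hat{W}$,

$$\hat{K} = S^T K S.$$

Similar reasoning shows that $\hat{F} = S^T F$. If an iterative method is used to solve $\hat{K}\hat{U} = \hat{F}$, the matrix $\hat{K} = S^T K S$ need not be formed explicitly. Rather, matrix-vector products of the form $\hat{K}\hat{U}$ can be computed via

$$\hat{K}\hat{U} = S^T(K(S\hat{U})).$$

I have already given an algorithm for computing $S\hat{U}$ efficiently. Here is the corresponding algorithm for overwriting U with $S^T U$:

    for i = k, k−1, ..., 1
        for z_j ∈ N_i
            U(end(j,1)) <- U(end(j,1)) + U(j)/2
            U(end(j,2)) <- U(end(j,2)) + U(j)/2

For the sake of completeness, I will also give the algorithm for overwriting U with $S^{-T}U$:

    for i = 1, 2, ..., k
        for z_j ∈ N_i
            U(end(j,1)) <- U(end(j,1)) − U(j)/2
            U(end(j,2)) <- U(end(j,2)) − U(j)/2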
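The transposed transformations and the matrix-free product with the hierarchical stiffness matrix can be sketched in Python as follows. As before, this is an illustration under stated assumptions: levels[i-1] holds hypothetical triples (j, end1, end2) of 0-based indices for the nodes added at level i, the small nested mesh is invented for the example, and the function names are not from the text. The check at the end verifies the defining property of the transpose, (S^T W) · U = W · (S U).

```python
def dot(x, y):
    """Euclidean dot product of two lists."""
    return sum(a * b for a, b in zip(x, y))

def apply_S(U, levels):
    """U <- S U: hierarchical coefficients to nodal values (coarse to fine)."""
    for level in levels:
        for j, e1, e2 in level:
            U[j] += 0.5 * (U[e1] + U[e2])
    return U

def apply_ST(U, levels):
    """U <- S^T U: each midpoint entry is distributed to its endpoints (fine to coarse)."""
    for level in reversed(levels):
        for j, e1, e2 in level:
            U[e1] += 0.5 * U[j]
            U[e2] += 0.5 * U[j]
    return U

def apply_S_inv_T(U, levels):
    """U <- S^{-T} U: undoes apply_ST (coarse to fine)."""
    for level in levels:
        for j, e1, e2 in level:
            U[e1] -= 0.5 * U[j]
            U[e2] -= 0.5 * U[j]
    return U

def hier_matvec(K_matvec, P, levels):
    """Compute (S^T K S) P as S^T (K (S P)) without forming S^T K S.

    K_matvec is any function returning K x for the nodal-basis matrix K.
    """
    return apply_ST(K_matvec(apply_S(list(P), levels)), levels)

# Hypothetical nested mesh: node 2 is the midpoint of (0, 1);
# nodes 3 and 4 are the midpoints of (0, 2) and (2, 1).
LEVELS = [[(2, 0, 1)], [(3, 0, 2), (4, 2, 1)]]

W = [1.0, 2.0, 3.0, 4.0, 5.0]
U_hat = [1.0, -1.0, 2.0, 0.5, -2.0]

lhs = dot(apply_ST(list(W), LEVELS), U_hat)     # (S^T W) . U_hat
rhs = dot(W, apply_S(list(U_hat), LEVELS))      # W . (S U_hat)
print(lhs, rhs)
```

Because hier_matvec takes the product with K as a function argument, it can be passed to any CG routine that needs only matrix-vector products.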
11.3
The hierarchical basis CG method
As I discussed in Section 11.2, the condition number of the stiffness matrix K is not determined solely by the BVP, mesh, and choice of approximating subspace. The choice of basis is of primary importance. If a hierarchical basis is used in place of the standard nodal basis, then the condition number of the resulting stiffness matrix is much smaller than when the nodal basis is used. As in Section 11.2, $\hat{K}$ and $\hat{F}$ denote the stiffness matrix and load vector corresponding to the hierarchical basis.
The reader will recall that any piecewise linear function $u_h \in P_h^{(1)}$ can be written either in terms of the nodal basis or the hierarchical basis, and there is a vector of coefficients representing $u_h$ in either basis. For the nodal basis, the vector U of coefficients contains the nodal values of $u_h$. For the hierarchical basis, the vector $\hat{U}$ of coefficients can be related to U by a simple linear transformation: $U = S\hat{U}$. The transformation S can be implemented in a simple and efficient double loop, which was given in Section 11.2. When the approximating subspace $V_h$ is a subspace of $P_h^{(1)}$ obtained by omitting the basis functions corresponding to constrained nodes, there is a similar transformation, which will also be denoted by S.

The hierarchical basis CG method is simple: The ordinary CG method is applied to $\hat{K}\hat{U} = \hat{F}$ instead of to KU = F. The solution $\hat{U}$ is then transformed into the corresponding U. It was shown in Section 11.2.2 that $\hat{K} = S^T K S$ and $\hat{F} = S^T F$. The hierarchical basis CG method is then applied as follows:

1. Form K and F in the usual manner.
2. Compute $\hat{F} = S^T F$.
3. Apply the CG method to $\hat{K}\hat{U} = \hat{F}$ to get an approximate solution $\hat{U}$. The matrix $\hat{K}$ is needed only to compute matrix-vector products and is never formed explicitly. Instead, when it is necessary to compute $\hat{K}P = (S^T K S)P$, the equivalent computation $\hat{K}P = S^T(K(SP))$ is used.
4. The approximate solution $\hat{U}$ of $\hat{K}\hat{U} = \hat{F}$ is transformed into an approximate solution of KU = F by $U = S\hat{U}$.

The algorithms for applying S and $S^T$ each require about $3N_v$ operations, for a total of $6N_v \approx 6N_f$ operations. One iteration of CG requires one matrix-vector multiplication plus about $10N_f$ operations. Since computing KP will typically require at least $10N_f$ operations, the operation count per iteration is about $20N_f$ for ordinary CG versus about $26N_f$ for the hierarchical CG method. Therefore, an iteration of hierarchical CG costs, at most, only about 30% more than an iteration of the ordinary CG algorithm.

EXAMPLE 11.3. To illustrate the convergence of the hierarchical basis CG method, I use the test problem from Example 11.2. The following table shows the number of iterations taken in reducing the relative residual to $10^{-6}$:

    N        Iterations    cond(K)
    9        6             9.00
    49       18            37.26
    225      26            150.42
    961      34            603.05
    3969     41            2413.60
    16129    48            9655.79
In this example, the number of iterations grows only slowly with N. In particular, on the finest mesh, the hierarchical basis CG method requires less than 14% of the iterations of the standard CG, and so the total time required should be less than 20% of the time for the standard CG.
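11.4 The preconditioned CG method

The CG method works well for solving KU = F provided K is not too ill-conditioned. If K is ill-conditioned, it may be possible to replace KU = F by an equivalent system $\hat{K}\hat{U} = \hat{F}$ with the property that $\hat{K}$ is better conditioned than K. This is the idea behind preconditioning. A preconditioner is a matrix L that is somehow an approximation of K but has the property that it is significantly easier to solve systems of the form LU = G than KU = F. I assume that L is also symmetric positive definite, which means that it has a square root, a symmetric positive definite matrix $L^{1/2}$ such that $L = (L^{1/2})^2$. Then $\hat{K} = L^{-1/2} K L^{-1/2}$ is symmetric and positive definite, and $\hat{K}$ may be better conditioned than K. For example, if L were actually equal to K, then $\hat{K}$ would be the identity matrix, which is perfectly conditioned. If L is close to K in some sense, then $\hat{K}$ may be close to the identity in some sense and therefore not too badly conditioned.

The system KU = F is equivalent to the system $\hat{K}\hat{U} = \hat{F}$, where $\hat{F} = L^{-1/2}F$ and the solutions are related by $U = L^{-1/2}\hat{U}$. Given a preconditioner, the CG method can then be applied to $\hat{K}\hat{U} = \hat{F}$, yielding the following formulas:

$$\begin{aligned}
\alpha_{k+1} &= \frac{\hat{R}^{(k)} \cdot \hat{R}^{(k)}}{\hat{P}^{(k+1)} \cdot \hat{K}\hat{P}^{(k+1)}}, \\
\hat{U}^{(k+1)} &= \hat{U}^{(k)} + \alpha_{k+1}\hat{P}^{(k+1)}, \\
\hat{R}^{(k+1)} &= \hat{R}^{(k)} - \alpha_{k+1}\hat{K}\hat{P}^{(k+1)}, \\
\beta_{k+1} &= \frac{\hat{R}^{(k+1)} \cdot \hat{R}^{(k+1)}}{\hat{R}^{(k)} \cdot \hat{R}^{(k)}}, \\
\hat{P}^{(k+2)} &= \hat{R}^{(k+1)} + \beta_{k+1}\hat{P}^{(k+1)}.
\end{aligned} \qquad (11.19)$$

This approach is not useful as it stands, because computing $L^{1/2}$ is nearly always more expensive than solving KU = F in the first place. However, by the following substitutions, the explicit use of $L^{1/2}$ can be avoided. The vectors $P^{(k)}$, $U^{(k)}$, and $R^{(k)}$ are defined by

$$P^{(k)} = L^{-1/2}\hat{P}^{(k)}, \quad U^{(k)} = L^{-1/2}\hat{U}^{(k)}, \quad R^{(k)} = L^{1/2}\hat{R}^{(k)}.$$

The reader should notice the difference in the definition of $R^{(k)}$. These definitions yield the following simplifications:

$$\hat{R}^{(k)} \cdot \hat{R}^{(k)} = R^{(k)} \cdot L^{-1}R^{(k)}, \quad \hat{P}^{(k)} \cdot \hat{K}\hat{P}^{(k)} = P^{(k)} \cdot KP^{(k)}, \quad \hat{K}\hat{P}^{(k)} = L^{-1/2}KP^{(k)}.$$

It is then straightforward to see that (11.19) becomes

$$\begin{aligned}
\alpha_{k+1} &= \frac{R^{(k)} \cdot L^{-1}R^{(k)}}{P^{(k+1)} \cdot KP^{(k+1)}}, \\
U^{(k+1)} &= U^{(k)} + \alpha_{k+1}P^{(k+1)}, \\
R^{(k+1)} &= R^{(k)} - \alpha_{k+1}KP^{(k+1)}, \\
\beta_{k+1} &= \frac{R^{(k+1)} \cdot L^{-1}R^{(k+1)}}{R^{(k)} \cdot L^{-1}R^{(k)}}, \\
P^{(k+2)} &= L^{-1}R^{(k+1)} + \beta_{k+1}P^{(k+1)}.
\end{aligned} \qquad (11.21)$$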
The preconditioner $L$ appears in (11.21) only in that an equation of the form $LZ = R$ must be solved. One of the fundamental assumptions about the preconditioner $L$ is that $LZ = R$ is considerably cheaper to solve than $KU = F$, and now the meaning of this is made clear: It must be reasonable to solve a system of the form $LZ = R$ at each iteration of the algorithm. The preconditioned conjugate gradient (PCG) algorithm is presented in Algorithm 11.2. The storage requirement is five vectors, $R$, $Z$, $P$, $V$, $U$, in addition to $K$ and $F$. Each iteration requires $O(10N)$ arithmetic operations plus one matrix-vector product $KP$ and the solution of $LZ = R$.
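$R \leftarrow F$
$Z \leftarrow L^{-1}R$
$P \leftarrow Z$
$c_1 \leftarrow R \cdot Z$
for $k = 1, 2, \ldots$
    $V \leftarrow KP$
    $c_2 \leftarrow P \cdot V$
    $\alpha \leftarrow c_1/c_2$
    $U \leftarrow U + \alpha P$
    $R \leftarrow R - \alpha V$
    $Z \leftarrow L^{-1}R$
    $c_3 \leftarrow R \cdot Z$
    $\beta \leftarrow c_3/c_1$
    $P \leftarrow \beta P + Z$
    $c_1 \leftarrow c_3$
Algorithm 11.2. The PCG algorithm for solving $KU = F$, where $K$ is a symmetric positive definite matrix and $L$ is a symmetric positive definite preconditioner.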
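The steps of Algorithm 11.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `pcg` and `solve_L` are hypothetical, matrices are stored as dense lists of rows, the starting guess is taken to be $U = 0$ (consistent with $R \leftarrow F$), and a relative-residual stopping test is added that is not part of the algorithm as listed. The preconditioner in the example is the diagonal of $K$, chosen only because $LZ = R$ is then trivial to solve.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def pcg(K, F, solve_L, tol=1e-10, max_iter=1000):
    """Sketch of PCG for K U = F, following the steps of Algorithm 11.2.

    solve_L(R) must return Z with L Z = R for an SPD preconditioner L.
    Assumptions of this sketch: U starts at zero, and the loop stops when
    the relative residual drops below tol.
    """
    U = [0.0] * len(F)
    R = list(F)                      # R <- F (residual for U = 0)
    Z = solve_L(R)                   # Z <- L^{-1} R
    P = list(Z)                      # P <- Z
    c1 = dot(R, Z)                   # c1 <- R . Z
    norm_F = dot(F, F) ** 0.5
    for k in range(1, max_iter + 1):
        V = matvec(K, P)             # V <- K P
        c2 = dot(P, V)               # c2 <- P . V
        alpha = c1 / c2
        U = [u + alpha * p for u, p in zip(U, P)]
        R = [r - alpha * v for r, v in zip(R, V)]
        if dot(R, R) ** 0.5 <= tol * norm_F:
            return U, k
        Z = solve_L(R)               # Z <- L^{-1} R
        c3 = dot(R, Z)               # c3 <- R . Z
        beta = c3 / c1
        P = [beta * p + z for p, z in zip(P, Z)]
        c1 = c3
    return U, max_iter


# Small SPD test matrix whose diagonal entries vary strongly.
K = [[4.0, 1.0, 0.0],
     [1.0, 40.0, 2.0],
     [0.0, 2.0, 400.0]]
F = [5.0, 43.0, 402.0]               # chosen so that the solution is (1, 1, 1)

# Diagonal preconditioner: L = diag(K), so L Z = R is a componentwise division.
diag = [K[i][i] for i in range(len(K))]
U, iters = pcg(K, F, lambda R: [r / d for r, d in zip(R, diag)])
```

Passing the preconditioner as a solve routine rather than as a matrix mirrors the observation in the text: $L$ enters the algorithm only through the solution of $LZ = R$.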
11.4.1 Alternate derivation of PCG
A symmetric positive definite preconditioner $L$ can be factored as $L = EE^T$ instead of $L = (L^{1/2})^2$. The matrix $E$ could be obtained from the Cholesky factorization, but there are other possible choices. The system $KU = F$ is then transformed into $\hat{K}\hat{U} = \hat{F}$, where
$$\hat{K} = E^{-1}KE^{-T}, \qquad \hat{U} = E^{T}U, \qquad \hat{F} = E^{-1}F.$$
Defining
and substituting into the CG algorithm applied to $\hat{K}\hat{U} = \hat{F}$ yields the same PCG algorithm given in Algorithm 11.2. Once again, $E$ need not be computed explicitly; the preconditioner only appears in that the system $LZ = R$ must be solved at each iteration.
11.4.2 Preconditioners
In this section, I mention several preconditioners that can be used in conjunction with the CG method.
Hierarchical bases
The hierarchical basis CG method transforms $KU = F$ to
$$(S^T K S)\hat{U} = S^T F,$$
where $S$ is the transformation from the hierarchical basis to the nodal basis. Taking $E = S^{-T}$, this can be interpreted in terms of Section 11.4.1. The preconditioner is $L = S^{-T}S^{-1}$, and the hierarchical basis CG method is an example of PCG. In this special case, however, the transformation that is so important in expressing the PCG algorithm (obviating the need to form $L^{1/2}$ or $E$) is not needed, since $S$ is readily available. Because of its generality and relative simplicity, the hierarchical CG method is highly recommended for two-dimensional problems.
Diagonal preconditioning
Perhaps the simplest preconditioner is a diagonal matrix, and the obvious diagonal entries are those of $K$:
$$L = \operatorname{diag}(K_{11}, K_{22}, \ldots, K_{NN}).$$
This is referred to as Jacobi preconditioning, for reasons that will be obvious in the next chapter. Although this preconditioner is very simple to use, it is frequently not very helpful. If the diagonal of $K$ is constant, as in Example 11.1, then Jacobi preconditioning has no effect on the convergence of CG; this is one extreme. On the other hand, Jacobi preconditioning can make a significant difference when the magnitudes of the diagonal entries of $K$ vary significantly, as in the following example.
EXAMPLE 11.4. To illustrate the advantage of Jacobi preconditioning on certain problems, I will use the BVP
where $\Omega$ is the unit disk,
and
Since $\kappa$ varies noticeably over $\Omega$, so do the diagonal entries of $K$, which range from about 4 to almost 90. The following table shows the number of iterations taken by CG and PCG with Jacobi preconditioning in reducing the relative residual to $10^{-6}$:
N                 5   25   113   481   1985   8065
Iterations (CG)   1    6    25    62    149    317
Iterations (PCG)  1    6    16    33     69    144
SSOR preconditioning
In the next chapter, I describe the symmetric successive overrelaxation (SSOR) iteration. While SSOR is an iterative method in its own right, it also defines a preconditioner for CG. I will describe the use of this preconditioner in Section 12.2.5.
The incomplete Cholesky factorization
As I discussed in Chapter 10, the difficulty with using the Cholesky factorization is that the level of fill-in is unpredictable and can be significant. A preconditioner can be obtained by computing an approximate Cholesky factorization of $K$ in which fill-in is limited by fiat. Such a factorization $K \approx R^TR$ is called an incomplete Cholesky factorization. For example, the level 0 incomplete Cholesky factorization allows no fill-in; the only nonzeros in the factor correspond to nonzero entries in (the lower triangle of) $K$. Another variation of the incomplete Cholesky factorization replaces small entries in the Cholesky factorization with zero, thus reducing the level of fill-in. The drop tolerance determines which entries are considered small enough to be set to zero. An incomplete Cholesky factorization defines a preconditioner $L = R^TR$, which can then be used in the PCG algorithm. For more details about incomplete Cholesky factorizations, the reader can consult the seminal papers by Meijerink and van der Vorst [31] and Manteuffel [29]. The books by Saad [37] and Axelsson [6] discuss incomplete factorizations and many other types of preconditioners.
Fast Poisson solvers
When the domain is a rectangle and a uniform mesh is used, it is possible to develop a fast solver for constant coefficient PDEs such as the Laplace or Poisson problem using the fast Fourier transform (FFT). Such a solver is called a fast Poisson solver; it can be used as a preconditioner for nonconstant coefficient problems. For more details, the reader can consult Saad [37].
11.5 The pure Neumann problem
Some problems with pure Neumann boundary conditions, such as
lead to singular systems of finite element equations. (An example of this type was encountered in Example 7.4.) When the finite element method is applied, resulting in the equation $KU = F$, the stiffness matrix is singular. In this section, I will explain why $K$ is singular in this context, and how the system $KU = F$ can be solved. The reader will recall from Section 1.1 that (11.23) has a solution only if the compatibility condition
is satisfied. If (11.24) holds, then there are infinitely many solutions, any two of which differ by a constant. When (11.23) is expressed in its weak form,
the bilinear form $a(u, v) = \int_\Omega \kappa \nabla u \cdot \nabla v$ fails to be $H^1(\Omega)$-elliptic. To be precise, the constant function 1 has the property that
and thus
cannot hold for all $u \in H^1(\Omega)$. When the finite element method is applied (with a nodal basis), the result is the system $KU = F$, where
Since the constant function 1 can be written as
the vector $E$ of all ones represents the constant function 1. The following calculations show that $E$ spans the null space of $K$, that is, that $KV = 0$ if and only if $V = cE$ for some constant $c$:
Therefore, $K$ is a positive semidefinite matrix with a one-dimensional null space (spanned by $E$), and special methods are needed to solve $KU = F$. As I discussed in Section 2.5, the variational problem (11.25) can be made well-posed by restricting the solution $u$ and the test functions $v$ to the space
$$V = \left\{ v \in H^1(\Omega) : \int_\Omega v = 0 \right\}.$$
The bilinear form $a(\cdot, \cdot)$ is $V$-elliptic, and thus
has a unique solution. The question then arises of how to translate this into an effective computational technique, that is, to a nonsingular sparse system $KU = F$. Any basis for $V_h = V \cap P_h^{(d)}$ (the space of continuous piecewise polynomials of degree $d$ with mean zero) will lead to a nonsingular stiffness matrix. However, unless the basis is constructed carefully, $K$ will be dense. Bochev and Lehoucq [12] show how to define a nodal basis for $V_h$; under such a basis, $K$ will be sparse and one could use either direct or iterative methods to solve $KU = F$. An alternative (also suggested in [12]) is to modify the bilinear form $a(\cdot, \cdot)$ to make it $H^1(\Omega)$-elliptic, dispensing with the need for a basis for $V_h$. To see how to do this, it is helpful to recall the results of Section 2.2.1: (11.25) is equivalent to the problem
where
The function $J$ does not have a unique minimizer but rather a one-dimensional set of minimizers. However, the constrained minimization problem
has a unique minimizer, the solution $u$ of (11.26). Constrained minimization problems are more difficult to solve than unconstrained problems, and so various techniques have been developed to convert a constrained problem into an unconstrained problem (or a sequence of unconstrained problems). The simplest such technique is the quadratic penalty method, in which the square of the constraint is added to the function to be minimized as a penalty term. The penalty term is multiplied by a weight so that the constraint can be more or less emphasized in the minimization. Thus (11.27) is replaced by
where
and $\rho$ is the penalty parameter. In the typical application of the quadratic penalty method, there would be no minimizer of $J$ that also satisfied the constraint, so the exact solution of (11.27) would be found only in the limit as $\rho \to \infty$ (increasing $\rho$ causes the constraint to be satisfied more closely). However, the application at hand is a special case: $J$ does have a minimizer, namely $u$, that also satisfies the constraint. It is easy to see, therefore, that the unique solution of (11.28), for any $\rho > 0$, is $u$. The optimality condition for (11.28) is
where $DJ_\rho(u)$ is the derivative of $J_\rho$ at $u$. The following computation reveals $DJ_\rho(u)$:
This expresses $J_\rho(u + v)$ as $J_\rho(u)$ + (a term linear in $v$) + (a term quadratic in $v$). The linear term must be $DJ_\rho(u)v$:
The solution $u$ of (11.26) and (11.28) is the unique solution of the variational equation
or, equivalently, where
The preceding discussion was intended to motivate replacing (11.26) with (11.29). I will now show directly that (11.29) has $u$ as its unique solution. First of all, since $a_\rho(u, v) = a(u, v)$ whenever $\int_\Omega u = 0$, $u$ solves (11.29). It remains only to show that $a_\rho(\cdot, \cdot)$ is $H^1(\Omega)$-elliptic, and therefore (11.29) has a unique solution. Any $v \in H^1(\Omega)$ can be written as $v = \hat{v} + \bar{v}$, where $\hat{v} \in V$ and $\bar{v}$ is a constant. The constant $\bar{v}$ is defined by
$$\bar{v} = \frac{1}{|\Omega|}\int_\Omega v, \qquad (11.30)$$
where $|\Omega|$ is the measure of $\Omega$ ($|\Omega| = \int_\Omega 1$). The function $\hat{v}$ is defined by $\hat{v} = v - \bar{v}$, and a direct calculation shows that $\hat{v} \in V$ (see Exercise 4). Moreover,
since $\int_\Omega \hat{v} = 0$ and $\nabla \bar{v} = 0$. Therefore, $\hat{v}$ and $\bar{v}$ are orthogonal and hence
follows from the Pythagorean theorem. The bilinear form $a(\cdot, \cdot)$ is $V$-elliptic:
(see Section 2.5). Also, for any constant function $\bar{v}$,
while
Therefore, for any $v \in H^1(\Omega)$,
This shows that $a_\rho(\cdot, \cdot)$ is $H^1(\Omega)$-elliptic. When the finite element method is applied to (11.29), the result is the system $K_\rho U = F$, where $F$ is the usual load vector and
The matrix $K_\rho$ can be written as $K_\rho = K + \rho WW^T$, where $K$ is the usual stiffness matrix and $W \in \mathbf{R}^{N_v}$ is the vector defined by
$$W_i = \int_\Omega \phi_i, \quad i = 1, 2, \ldots, N_v.$$
The matrix $WW^T$ is called a rank-one matrix; each column is a multiple of the vector $W$. Although $WW^T$ and therefore $K_\rho$ are dense, in the context of iterative methods this turns out not to matter. It is necessary only to efficiently multiply $K_\rho$ by a vector, and this can be done by taking advantage of the special structure of $WW^T$:
$$K_\rho V = KV + \rho (W^T V) W$$
($W^T V$ is the scalar $W \cdot V$). The matrix $K_\rho$ is necessarily symmetric and positive definite, since it is derived from the $H^1(\Omega)$-elliptic bilinear form $a_\rho(\cdot, \cdot)$. This can be seen directly, since $V \cdot KV \ge 0$ for all vectors $V$, with $V \cdot KV = 0$ only if $V$ is a multiple of $E$, the vector of all ones. On the other hand, $V \cdot WW^TV = (W \cdot V)^2 \ge 0$ for all $V$, and
It follows that $V \cdot K_\rho V > 0$ for all $V \neq 0$, since
and both terms are nonnegative, with at least one of them positive. The unique solution of $K_\rho U = F$ is the unique solution of $KU = F$ satisfying the constraint $W \cdot U = 0$. This corresponds to the piecewise polynomial $u_h$ that satisfies $a(u_h, v) = \ell(v)$ for all $v \in V_h$ and also the constraint $\int_\Omega u_h = 0$. If desired, the vector $W$ can be replaced with $E$ itself, the vector of all ones. Writing now $K_\rho = K + \rho EE^T$, the solution of $K_\rho U = F$ is the unique solution of $KU = F$ that satisfies the additional constraint $E \cdot U = 0$. This avoids the computation of the vector $W$.
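The point about never forming the dense matrix $WW^T$ can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the singular matrix is a one-dimensional model (a tridiagonal matrix whose rows sum to zero, so that its null space is spanned by the vector of all ones) rather than a finite element stiffness matrix, $W$ is replaced by $E$, and the function names are hypothetical. Plain CG is applied to $K_\rho$ through a routine that computes $KV + \rho (W \cdot V) W$.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def apply_K(V):
    """Singular model matrix: tridiagonal, rows sum to zero, K E = 0."""
    n = len(V)
    out = []
    for i in range(n):
        s = 0.0
        if i > 0:
            s += V[i] - V[i - 1]
        if i < n - 1:
            s += V[i] - V[i + 1]
        out.append(s)
    return out


def apply_K_rho(V, W, rho):
    """K_rho V = K V + rho (W . V) W, without forming the dense W W^T."""
    KV = apply_K(V)
    s = rho * dot(W, V)
    return [kv + s * w for kv, w in zip(KV, W)]


def cg(apply_A, F, tol=1e-12, max_iter=200):
    """Plain CG using only the action of the matrix on a vector."""
    U = [0.0] * len(F)
    R = list(F)
    P = list(R)
    c1 = dot(R, R)
    norm_F = c1 ** 0.5
    for _ in range(max_iter):
        if c1 ** 0.5 <= tol * norm_F:
            break
        V = apply_A(P)
        alpha = c1 / dot(P, V)
        U = [u + alpha * p for u, p in zip(U, P)]
        R = [r - alpha * v for r, v in zip(R, V)]
        c3 = dot(R, R)
        P = [r + (c3 / c1) * p for r, p in zip(R, P)]
        c1 = c3
    return U


n = 6
E = [1.0] * n
F = [1.0, -2.0, 0.5, 0.5, 1.0, -1.0]      # components sum to zero (consistent)
rho = 0.5
U = cg(lambda V: apply_K_rho(V, E, rho), F)

# U should solve the singular system K U = F and have mean zero.
residual = [k - f for k, f in zip(apply_K(U), F)]
mean_U = sum(U) / n
```

Each product with $K_\rho$ costs one product with $K$ plus one dot product and one vector update, so the penalty term adds only $O(N_v)$ work per iteration.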
The value of the penalty parameter $\rho$ should be chosen so that the matrix $K_\rho$ is as well-conditioned as possible. The eigenvalues of $K$ are $0 < \lambda_2 \le \cdots \le \lambda_{N_v}$, and thus the effective condition number of $K$ (in a sense that I will not define precisely) is $\lambda_{N_v}/\lambda_2$. If $\rho$ is defined to be $\alpha/N_v$, where $\alpha$ satisfies $\lambda_2 \le \alpha \le \lambda_{N_v}$, then the eigenvalues of $K_\rho = K + \rho EE^T$ lie in the interval $[\lambda_2, \lambda_{N_v}]$, and the condition number of $K_\rho$ is also $\lambda_{N_v}/\lambda_2$. The reader is asked to justify this conclusion in Exercise 5. If $K_\rho$ is defined by $K_\rho = K + \rho WW^T$ instead, then a reasonable choice of $\rho$ is $\rho = \alpha N_v/(W \cdot E)^2$ (see [12]). A reasonable value of $\alpha$ can be found by the formula
$$\alpha = \frac{V \cdot KV}{V \cdot V},$$
where $V$ is a vector whose components are random numbers. The resulting $\alpha$ is guaranteed to lie in the interval $[0, \lambda_{N_v}]$ and virtually certain to lie in the interval $[\lambda_2, \lambda_{N_v}]$.
The preceding discussion assumed that $K$ and $F$ are computed exactly, so that the null space of $K$ is known and $KU = F$ is a consistent singular system. When quadrature is used, the exact $K$ and $F$ are replaced with $\tilde{K}$ and $\tilde{F}$, and the system $\tilde{K}U = \tilde{F}$ may not be consistent. I will ignore the error in $\tilde{K}$, for the following reason: The quadrature rule is unlikely to change the null space of $K$, since, as I showed above,
and the quadrature rule need merely preserve the condition that $\int_\Omega \kappa \nabla v \cdot \nabla v = 0$ if and only if $\nabla v$ is identically zero. Therefore, although $K$ is not computed exactly when $\kappa$ is nonconstant, $\tilde{K}$ has the same null space as $K$, so the errors in $\tilde{K}$ can be ignored when discussing the consistency of the linear system. Thus, I will discuss the system $KU = \tilde{F}$, where $\tilde{F}$ is an estimate of the load vector $F$. The error $\tilde{F} - F$ can be decomposed into two components, the part that lies in the null space of $K$ and the part that is orthogonal to the null space. Thus $\tilde{F} - F = \hat{F} + \bar{F}$, where $E \cdot \hat{F} = 0$ and $\bar{F} = \mu E$. The system $KU = \tilde{F}$ is therefore inconsistent unless $\bar{F}$ happens to equal zero. On the other hand, since $K_\rho$ is nonsingular, the system $K_\rho U = \tilde{F}$ must be consistent (with a unique solution). If the solution is denoted by $U_\rho$, then
$$U_\rho = K_\rho^{-1}F + K_\rho^{-1}\hat{F} + K_\rho^{-1}\bar{F}.$$
In Exercise 6, the reader is asked to show that $K_\rho^{-1}\hat{F}$ is orthogonal to $E$ and independent of $\rho$, and that $K_\rho^{-1}\bar{F}$ is a multiple of $E$ whose size depends on $\rho$. The error $K_\rho^{-1}\hat{F}$ is consistent with the equation $KU = F$; that is, it cannot be separated from the true solution $U = K_\rho^{-1}F$ (unless $F$ is known exactly). On the other hand, the error $K_\rho^{-1}\bar{F}$ arises from the inconsistency of the right-hand side $\tilde{F}$. It is a multiple of the constant vector $E$ and its presence means that $U_\rho$ will not satisfy $E \cdot U_\rho = 0$. The error $K_\rho^{-1}\bar{F}$ can be eliminated by replacing $\tilde{F}$ with its projection onto the orthogonal complement of the null space of $K$, namely,
$$\tilde{F} - \frac{E \cdot \tilde{F}}{N_v}E. \qquad (11.31)$$
Doing so sets $\bar{F}$ to zero and eliminates the inconsistent error in the solution of the linear system. This is not necessary; doing it changes the computed solution by a constant vector. Since two solutions of the original BVP (or two solutions of $KU = F$) differ by a constant, this is just a question of which solution is to be computed. I suggest replacing $\tilde{F}$ with (11.31) since then $U_\rho = K_\rho^{-1}\tilde{F}$ is independent of $\rho$, is a solution of $KU = \tilde{F}$, and satisfies $E \cdot U_\rho = 0$.
EXAMPLE 11.5. This example is a continuation of Example 7.4, where a pure Neumann problem (posed on the unit square) was solved. The linear system $KU = F$ from that example (corresponding to the finest mesh) consists of 1089 equations and 1089 unknowns. Denoting as above the computed load vector by $\tilde{F}$, I will solve $K_\rho U = \tilde{F}$ by the CG method. Using a relative residual tolerance of $10^{-8}$, CG requires 134 iterations to solve the system. The value of $\rho$ (computed from a random vector, as suggested above) was $3.46 \cdot 10^{-3}$. In this example, the average value of the components of $\tilde{F}$ was $7.97 \cdot 10^{-7}$; this is nonzero due to quadrature error. As a result, the computed solution did not satisfy $E \cdot U = 0$; instead, the average value of the components of the computed $U$, $2.11 \cdot 10^{-7}$, was of the same order of magnitude. When $\tilde{F}$ was replaced with (11.31), the resulting system also required 134 CG iterations to solve, but the computed solution satisfied $E \cdot U = 0$ up to round-off error (the mean value of the components of $U$ was $2.32 \cdot 10^{-17}$).
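Two of the practical steps discussed above, choosing $\rho$ from a random vector and removing the constant component of the computed load vector, are short enough to sketch in Python. The sketch rests on assumptions that should be read as such: the estimate of $\alpha$ is taken to be the Rayleigh quotient $V \cdot KV / (V \cdot V)$, the matrix is the same one-dimensional model with null space spanned by the vector of all ones (not the matrix of Example 11.5), and the helper names `estimate_alpha` and `remove_mean` are hypothetical.

```python
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def apply_K(V):
    """Singular model matrix: tridiagonal, rows sum to zero, K E = 0."""
    n = len(V)
    out = []
    for i in range(n):
        s = 0.0
        if i > 0:
            s += V[i] - V[i - 1]
        if i < n - 1:
            s += V[i] - V[i + 1]
        out.append(s)
    return out


def estimate_alpha(n, seed=0):
    """Rayleigh quotient of K at a random vector (assumed form of the estimate)."""
    rng = random.Random(seed)
    V = [rng.random() for _ in range(n)]
    return dot(V, apply_K(V)) / dot(V, V)


def remove_mean(F):
    """Replace F by F - (E . F / N) E, removing its component along E."""
    mean = sum(F) / len(F)
    return [f - mean for f in F]


n = 50
alpha = estimate_alpha(n)
rho = alpha / n                       # penalty parameter for K + rho E E^T

# A load vector with a small spurious constant component, mimicking quadrature error.
F_noisy = [((-1.0) ** i) + 1e-3 for i in range(n)]
F_proj = remove_mean(F_noisy)
```

For this model matrix every eigenvalue lies in $[0, 4]$, so the Rayleigh quotient necessarily falls in that interval; after the projection the components of the load vector sum to zero up to round-off.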
11.6 The MATLAB implementation
Some of the algorithms discussed in this chapter apply to any matrix-vector equation $KU = F$, where $K$ is symmetric and positive definite. These include CG and its preconditioned version. Other algorithms, such as the hierarchical basis CG method, are developed specifically for solving finite element equations. Code is provided for such algorithms for the piecewise linear case only and therefore is found in version1.
11.6.1 MATLAB functions
• CG: Applies the CG method to solve a linear system with a symmetric positive definite coefficient matrix.
• PCG: Applies the PCG method to solve a linear system with a symmetric positive definite coefficient matrix.
• HierCG1: Applies the hierarchical basis CG method to solve the finite element equation $KU = F$ (piecewise linear functions only).
• CGsing, HierCGsing1: Versions of CG and HierCG1 for a singular matrix.
• The functions implementing the operators $S$, $S^{-1}$, $S^T$, $S^{-T}$ discussed in Section 11.2.1 are
  - HierToNodal1: Transforms the coefficients representing a piecewise linear function in terms of a hierarchical basis into the nodal values (the operator $S$).
  - HierToNodalTrans1: The operator $S^T$.
  - NodalToHier1: Transforms the nodal values of a piecewise linear function into the coefficients representing it in terms of a hierarchical basis (the operator $S^{-1}$).
  - NodalToHierTrans1: The operator $S^{-T}$.
11.7 Exercises for Chapter 11
1. Let $K$ be an $n \times n$ symmetric positive definite matrix, let $F \in \mathbf{R}^n$ be given, and let $\phi : \mathbf{R}^n \to \mathbf{R}$ be defined by
Show that
2. Let $K$ be an $n \times n$ symmetric positive definite matrix and let $F \in \mathbf{R}^n$ be given. Suppose the steepest descent algorithm is applied to solve $KU = F$, starting with the initial estimate $U^{(0)}$. Show that the bound (11.4) implies that the number of iterations required to achieve
is proportional to $\mathrm{cond}(K)$. (Hints:
3. Use (11.14) to show that the number of CG iterations required to achieve
is proportional to $\sqrt{\mathrm{cond}(K)}$. (Hint: See the previous exercise.)
4. For each $v \in H^1(\Omega)$, define $\bar{v}$ by (11.30). Prove that $v - \bar{v} \in V$.
5. Let $K$ be the (singular) stiffness matrix for the pure Neumann problem (11.23), and let the eigenvalues of $K$ be $0 < \lambda_2 \le \cdots \le \lambda_{N_v}$. Denote the corresponding eigenvectors by $E, V_2, \ldots, V_{N_v}$, where $E$ is the vector of all ones. Show that the eigenvalues and eigenvectors of $K_\rho = K + \rho EE^T$ are the same as those of $K$, except that the zero eigenvalue is changed to $N_v\rho$.
6. Let $K$, $E$, and $K_\rho$ be as in the preceding exercise.
(a) Show that if $E \cdot F = 0$, then $E \cdot K_\rho^{-1}F = 0$. Show also that $K_\rho^{-1}F$ is independent of $\rho$.
(b) Show that if $F$ is a multiple of $E$, then so is $K_\rho^{-1}F$. Show that the norm of $K_\rho^{-1}F$ is inversely proportional to $\rho$.
7. In Section 11.2, algorithms are presented for computing $SU$, $S^{-1}U$, $S^TU$, and $S^{-T}U$, where $S$ maps the weights in the hierarchical basis representation of a piecewise linear function to its nodal values. Given that the algorithm implementing the action of $S^{-1}$ is correct (which is justified in the text), show that the other three algorithms are correct. (Hint: Represent $S^{-1}$ as the composition of $k$ linear operators, where $k$ is the number of levels of refinement: $S^{-1} = S_k^{-1}S_{k-1}^{-1} \cdots S_1^{-1}$.)
8. (MATLAB) Consider the BVP
where $\Omega$ is the unit circle. Establish a mesh on $\Omega$ by starting with a coarse triangulation with four triangles (as in Figure 6.11) and refining it five times. Form the finite element equations $KU = F$ and solve them using both CG and the hierarchical basis CG method. How many iterations of each are required to reduce the relative residual below $10^{-6}$? How large is the difference between the solutions computed by the two CG methods and the "exact" solution computed by the MATLAB direct solver?
9. (MATLAB) Repeat the previous exercise, but take $\Omega$ to be the polygon with vertices (0, 0), (1, 0), (1, 1), (-1, 1), (-1, -1), and (0, -1). Start with a coarse mesh consisting of six triangles and refine it five times to obtain the final mesh.
10. (MATLAB) Apply the CG algorithm to solve $KU = F$, the finite element equations resulting from solving the BVP (11.5) using piecewise quadratic functions. Create a table analogous to the table in Example 11.2.
11. Use the results of Exercise 4.8.5 to determine the operation count for each iteration of CG in the previous exercise. Compare this to the operation count when $K$ is the stiffness matrix for piecewise linear finite elements. Taking into account the number of CG iterations required in each case, the cost per iteration, and the number of unknowns, compare the cost of solving the finite element equations using piecewise linear and piecewise quadratic functions (for the given example).
12. (Project) The idea of a hierarchical basis can be extended to piecewise polynomials of higher degree. For example, to incorporate quadratic shape functions on a sequence of meshes $\mathcal{T}_0, \mathcal{T}_1, \ldots, \mathcal{T}_k$, it can be assumed that the basis functions corresponding to $\mathcal{T}_0, \mathcal{T}_1, \ldots, \mathcal{T}_{k-1}$ are piecewise linear, exactly as before. The new basis functions corresponding to nodes belonging to $\mathcal{T}_k$ but not to $\mathcal{T}_{k-1}$ are then taken to be the standard piecewise quadratic basis functions corresponding to the midpoints of the edges in $\mathcal{T}_{k-1}$ (these midpoints are precisely the new nodes in $\mathcal{T}_k$).
(a) In this scheme, approximately what percentage of the basis functions are quadratic?
(b) Work out the details (for example, the relationship between the stiffness matrices in the standard and hierarchical bases), and implement the corresponding CG method. (c) Compare this approach with the ordinary CG algorithm applied to the stiffness matrix corresponding to the standard nodal basis of piecewise quadratics for BVP (11.5). Is there a difference in efficiency? In accuracy?
Chapter 12
The classical stationary iterations
In this chapter I will briefly cover the classical Jacobi, Gauss-Seidel, and SOR (successive overrelaxation) methods, all of which can be classified as stationary iterations. These methods generally do not converge as rapidly as CG, but they are still important because they can sometimes be used as preconditioners, and because they play an important role in the multigrid method, which is covered in the next chapter.
12.1 Stationary iterations
Since the methods described in this section are not necessarily restricted to symmetric or positive definite systems, I will write the equation to be solved as
$$Ax = b,$$
where $A \in \mathbf{R}^{N \times N}$ is nonsingular and $b \in \mathbf{R}^N$. It may be possible to transform the equation $Ax = b$ into an equivalent equation
$$x = Bx + c. \qquad (12.1)$$
Equation (12.1) has a unique solution if and only if $I - B$ is nonsingular. A solution of an equation of the form $x = f(x)$ is called a stationary point (or fixed point) of $f$. A general algorithm for computing a stationary point is to choose an initial estimate $x^{(0)}$ and compute $x^{(1)}, x^{(2)}, \ldots$ by $x^{(k+1)} = f(x^{(k)})$. This is referred to as fixed point iteration. In the context of (12.1), fixed point iteration takes the form
$$x^{(k+1)} = Bx^{(k)} + c. \qquad (12.2)$$
The iteration (12.2) is called a stationary iteration for $Ax = b$. The exact solution $x$ satisfies $x = Bx + c$, and thus
Therefore,
where the error is defined by
This in turn implies that
and thus
This equation makes the analysis of stationary iterations fairly straightforward, but this analysis requires the use of matrix norms.
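The relationship between the iterates and the error can be checked numerically. The following Python sketch is an illustration only: the $2 \times 2$ matrix $B$ and vector $c$ are made up, the function name `stationary_iteration` is hypothetical, and the error is taken to be $e^{(k)} = x^{(k)} - x$, so that repeated application of $B$ to $e^{(0)}$ should reproduce $e^{(k)}$.

```python
def matvec(B, x):
    return [sum(b * xj for b, xj in zip(row, x)) for row in B]


def stationary_iteration(B, c, x0, num_steps):
    """Apply x <- B x + c a fixed number of times."""
    x = list(x0)
    for _ in range(num_steps):
        Bx = matvec(B, x)
        x = [bx + ci for bx, ci in zip(Bx, c)]
    return x


# Made-up example; both eigenvalues of B equal 0.5, so the iteration converges.
B = [[0.5, 0.25],
     [0.0, 0.5]]
c = [1.0, 1.0]
x_exact = [3.0, 2.0]                 # solves x = B x + c

k = 20
x0 = [0.0, 0.0]
xk = stationary_iteration(B, c, x0, k)

# Propagate the initial error through B, k times: e <- B e.
e = [a - b for a, b in zip(x0, x_exact)]
for _ in range(k):
    e = matvec(B, e)

# xk - x_exact should agree with the propagated error up to round-off.
mismatch = max(abs((xi - ei) - ti) for xi, ei, ti in zip(xk, e, x_exact))
```

Because the error after $k$ steps is obtained by applying $B$ to the initial error $k$ times, the size of the powers of $B$ controls convergence, which is why matrix norms enter the analysis.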
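12.1.1 Matrix norms
The general definition of a norm for a vector space was given in Section 2.4.1. Since matrices can be added and multiplied by scalars, the set of matrices of a given size can be regarded as a vector space. In this book, only the space $\mathbf{R}^{N \times N}$ of square matrices with real entries is used. A norm on $\mathbf{R}^{N \times N}$ is any real-valued function $\|\cdot\|$ satisfying the three fundamental properties:
1. $\|A\| \ge 0$ for all $A \in \mathbf{R}^{N \times N}$, and $\|A\| = 0$ if and only if $A = 0$;
2. $\|\alpha A\| = |\alpha| \, \|A\|$ for all $A \in \mathbf{R}^{N \times N}$ and all $\alpha \in \mathbf{R}$;
3. $\|A + B\| \le \|A\| + \|B\|$ for all $A, B \in \mathbf{R}^{N \times N}$.
A matrix norm on $\mathbf{R}^{N \times N}$ is required to satisfy a fourth property:
$$\|AB\| \le \|A\| \, \|B\| \quad \text{for all } A, B \in \mathbf{R}^{N \times N}.$$
In particular,
$$\|A^2\| \le \|A\|^2,$$
and, by induction,
$$\|A^k\| \le \|A\|^k, \quad k = 1, 2, 3, \ldots.$$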
Here I use the same notation for the vector norm || • || on RN and the induced matrix norm || • || on R^ x N . it is easy to show from the definition that an induced matrix norm satisfies
Here are the most important induced matrix norms. 1. The Euclidean norm on
12.1.
Stationary iterations
269
induces a matrix norm that is also denoted || • ||2. It can be shown that ||A||2 is the square root of the largest eigenvalue of the positive semidefinite matrix AT A. For a symmetric positive definite (or positive semidefinite) matrix A, this reduces to
where A. max (A) is the largest eigenvalue of A. 2. The £°° norm on RN is defined by
The corresponding induced matrix norm is also denoted by || • ||oo. It can be shown that
In other words, \\A\\QQ is the maximum absolute row sum of A. It should be noted that ||A||2 is quite expensive to compute and is therefore used primarily for theoretical analysis. The most important result on matrix norms, which I give below, refers to the spectral radius of a matrix:
The reader should note the difference between p(A) and A. max (A): A. max (A) is meaningful only when the eigenvalues of A are real, while p(A) makes sense for any matrix, even one with complex eigenvalues. For symmetric positive definite matrices, the two are equal: p(A) = A max (A). Therefore, if A is symmetric positive definite, then ||A||2 = p(A) — 00000000000
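The maximum absolute row sum and the spectral radius are easy to compute for a small example. The Python sketch below is an illustration with a made-up $2 \times 2$ matrix whose eigenvalues are complex, so that $\lambda_{\max}$ is not meaningful but $\rho$ is. The function names are hypothetical, and the spectral radius is computed from the characteristic polynomial, which is practical only for a $2 \times 2$ matrix.

```python
import cmath


def norm_inf_vector(x):
    return max(abs(xi) for xi in x)


def norm_inf_matrix(A):
    """Induced infinity norm: the maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def spectral_radius_2x2(A):
    """Largest eigenvalue modulus of a 2x2 matrix, via its characteristic polynomial."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2.0), abs((tr - disc) / 2.0))


# Made-up example: the eigenvalues are 1 +/- i*sqrt(2), both of modulus sqrt(3).
A = [[1.0, -2.0],
     [1.0, 1.0]]

norm_A = norm_inf_matrix(A)          # row sums are 3 and 2
rho_A = spectral_radius_2x2(A)

# Check the defining inequality ||Ax|| <= ||A|| ||x|| on one vector.
x = [1.0, -1.0]
Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
holds = norm_inf_vector(Ax) <= norm_A * norm_inf_vector(x)
```

In this example the spectral radius is smaller than the infinity norm, which is consistent with the general relationship between the spectral radius and matrix norms discussed next.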
The following results about matrix norms are fundamental.
THEOREM 12.1.
1. If $A \in \mathbf{R}^{N \times N}$ and $\|\cdot\|$ is any matrix norm (induced by a vector norm or not), then
$$\rho(A) \le \|A\|.$$
2. If $A \in \mathbf{R}^{N \times N}$ and $\epsilon$ is any positive real number, then there exists at least one vector norm $\|\cdot\|$ such that the corresponding induced matrix norm satisfies
$$\|A\| \le \rho(A) + \epsilon.$$
This theorem and the next are proved in Ciarlet [17, Sections 1.4 and 1.5]. As noted above, $\|A^k\| \le \|A\|^k$ for all $k = 1, 2, 3, \ldots$. It follows that, for any matrix norm $\|\cdot\|$,
$$\|A\| < 1 \ \Longrightarrow \ A^k \to 0 \text{ as } k \to \infty.$$
The main result now follows from the previous theorem.
THEOREM 12.2. Let $A \in \mathbf{R}^{N \times N}$. Then the following are equivalent:
$\|A\| < 1$ for at least one induced matrix norm $\|\cdot\|$.
12.1.2 Convergence of stationary iterations
As I showed earlier, the error $e^{(k)} = x^{(k)} - x$ in the sequence produced by the stationary iteration satisfies
Convergence of $x^{(k)}$ to $x$ is equivalent to the convergence of $e^{(k)}$ to zero. Therefore, by Theorem 12.2, the iteration (12.4) converges for each possible $x^{(0)}$ if and only if $\rho(B) < 1$. Moreover, (12.5) and the inequality
suggest that the smaller $\|B\|$ (and hence $\rho(B)$) is, the faster the error converges to zero. This conclusion is not necessarily true for all $e^{(0)}$, since the inequalities
need not be tight. Nevertheless, it can be shown that there is an asymptotic sense in which the above statement is true, namely, that given two stationary iterative methods
and
the first is better than the second if $\rho(B_1) < \rho(B_2)$. For this more refined analysis, see Ciarlet [17, Theorem 5.1-2] or Varga [42].
12.2 The classical iterations
A general way to construct a stationary iteration for a system of the form $Ax = b$ is to "split" the matrix $A$ into $A = M - N$, where $M$ is nonsingular. Then
Generally the iteration is actually implemented as
and so it must be inexpensive to solve systems in which $M$ is the coefficient matrix. The convergence of the iteration is determined by the matrix $B = M^{-1}N$, and thus the fundamental question is whether $\rho(M^{-1}N) < 1$ holds for $A$ in the class of matrices under consideration. I now describe the most popular splittings and resulting iterations. I will use the notation that $D$ is the diagonal matrix whose diagonal entries are those of $A$, $L$ is the negative of the strict lower triangle of $A$, and $U$ is the negative of the strict upper triangle of $A$. That is,
and A = D - L - U.
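12.2.1 Jacobi iteration
The Jacobi iteration corresponds to the splitting $M = D$, $N = L + U$, and is only defined if all of the diagonal entries of $A$ are nonzero. This iteration is easily implemented as follows:
$$x_i^{(k+1)} = \frac{1}{A_{ii}} \left( b_i - \sum_{j \neq i} A_{ij} x_j^{(k)} \right), \quad i = 1, 2, \ldots, N.$$
One class of matrices for which Jacobi iteration is guaranteed to converge is the class of strictly diagonally dominant matrices. A matrix $A \in \mathbf{R}^{N \times N}$ is called strictly diagonally dominant if
$$|A_{ii}| > \sum_{j \neq i} |A_{ij}|, \quad i = 1, 2, \ldots, N.$$
If $A$ is strictly diagonally dominant, then
$$\|M^{-1}N\|_\infty < 1,$$
and hence $\rho(M^{-1}N) < 1$ in this case (see Exercise 1).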
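A minimal Python sketch of the Jacobi iteration, together with a test for strict diagonal dominance, is given here for illustration. The test matrix, the function names, and the relative-residual stopping rule are assumptions of the sketch and are not taken from the examples in the text.

```python
def is_strictly_diagonally_dominant(A):
    n = len(A)
    return all(
        abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )


def relative_residual(A, x, b):
    n = len(b)
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return (sum(ri * ri for ri in r) ** 0.5) / (sum(bi * bi for bi in b) ** 0.5)


def jacobi(A, b, x0, tol=1e-6, max_iter=10000):
    """Jacobi iteration: every component is updated from the previous iterate."""
    n = len(b)
    x = list(x0)
    for k in range(1, max_iter + 1):
        # The comprehension reads only the old x; the new vector replaces it afterward.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if relative_residual(A, x, b) <= tol:
            return x, k
    return x, max_iter


# Made-up strictly diagonally dominant system with solution (1, 1, 1).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]

dominant = is_strictly_diagonally_dominant(A)
x, iters = jacobi(A, b, [0.0, 0.0, 0.0])
```

Note that a complete new vector is built before the old one is discarded, which is the two-vector storage requirement mentioned in the discussion of Gauss-Seidel.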
EXAMPLE 12.3. To illustrate the convergence of the Jacobi iteration, I use the test problem from Example 11.1. This example does not fit into the convergence theorem just mentioned, as $K$ is diagonally dominant but not strictly diagonally dominant. However, Jacobi iteration does converge. The following table shows the number of iterations taken in reducing the relative residual to $10^{-6}$:
N 9 49 225 961 3969 16129
Iterations 40 171 692 2776 11107 44422
The iteration converges slowly in this example, and this is not uncommon. Jacobi iteration is easy to implement and each iteration is inexpensive, but convergence is quite slow.
12.2.2 Gauss-Seidel iteration
Jacobi iteration requires two vectors of storage for the solution, since $x^{(k)}$ must be kept while $x^{(k+1)}$ is being computed. The value of $x^{(k)}$ can be overwritten with $x^{(k+1)}$ if the value of $x_i^{(k+1)}$ is used as soon as it is known. One might also expect that this would lead to faster convergence, since the (presumably better) value $x_i^{(k+1)}$ is used in place of $x_i^{(k)}$. The result is the following iteration, which is called the Gauss-Seidel iteration:
$$x_i^{(k+1)} = \frac{1}{A_{ii}} \left( b_i - \sum_{j < i} A_{ij} x_j^{(k+1)} - \sum_{j > i} A_{ij} x_j^{(k)} \right), \quad i = 1, 2, \ldots, N.$$
Gauss-Seidel corresponds to the splitting $M = D - L$, $N = U$. It might be thought that, since $M$ is lower triangular rather than diagonal, Gauss-Seidel is more costly per iteration than Jacobi. In fact, however, the number of operations per iteration is exactly the same, and the memory cost is less since it is not necessary to maintain separate arrays for $x^{(k)}$ and $x^{(k+1)}$. It can be shown that the Gauss-Seidel iteration converges for all symmetric positive definite matrices $A$.
EXAMPLE 12.4. The following table shows the number of iterations of Gauss-Seidel taken in reducing the relative residual in the system from Example 11.1 to $10^{-6}$:
N 9 49 225 961 3969 16129
Iterations 21 88 353 1411 5639 22544
In this example, Gauss-Seidel converged much more quickly than Jacobi (which is typical), although the performance is still poor.
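The in-place character of Gauss-Seidel is easiest to see in code. The following Python sketch is illustrative only: the tridiagonal test matrix is a made-up symmetric positive definite example (diagonally dominant but not strictly so), and the function names and stopping rule are assumptions of the sketch.

```python
def relative_residual(A, x, b):
    n = len(b)
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return (sum(ri * ri for ri in r) ** 0.5) / (sum(bi * bi for bi in b) ** 0.5)


def gauss_seidel(A, b, x0, tol=1e-6, max_iter=10000):
    """Gauss-Seidel: x is overwritten in place, so new values are used at once."""
    n = len(b)
    x = list(x0)
    for k in range(1, max_iter + 1):
        for i in range(n):
            # Entries x[j] with j < i already hold the new iterate.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        if relative_residual(A, x, b) <= tol:
            return x, k
    return x, max_iter


# Made-up SPD tridiagonal system with solution (1, 1, 1, 1, 1).
n = 5
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0, 0.0, 0.0, 0.0, 1.0]

x, iters = gauss_seidel(A, b, [0.0] * n)
```

Only one solution vector is stored, and the arithmetic per sweep is the same as for Jacobi; the only change is which values of $x_j$ are read.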
12.2.3 SOR iteration
Another popular iteration can be obtained by modifying the Gauss-Seidel method. A parameter $\omega > 0$ is chosen and $A$ is split as follows:
$$A = M_\omega - N_\omega, \qquad M_\omega = \frac{1}{\omega}D - L, \qquad N_\omega = \frac{1 - \omega}{\omega}D + U.$$
The resulting iteration is called the successive overrelaxation (SOR) iteration. (When $\omega < 1$, the method is sometimes referred to as underrelaxation.) Gauss-Seidel is the special case of SOR corresponding to $\omega = 1$. The eigenvalues of $B_\omega = M_\omega^{-1}N_\omega$ depend continuously on $\omega$. Therefore, if Gauss-Seidel converges, then so does SOR for $\omega$ sufficiently close to $\omega = 1$. It might be hoped that one could find $\omega$ such that
$$\rho(M_\omega^{-1}N_\omega) < \rho(M_1^{-1}N_1).$$
Ideally, one would find the optimal $\omega$ by solving
$$\min_\omega \rho(M_\omega^{-1}N_\omega),$$
leading to the optimal SOR method. Actually computing the optimal $\omega$, though, might be costly and not worthwhile. The following results are known:
1. In order that SOR converge, it is necessary that $|\omega - 1| < 1$ hold. In fact,
$$\rho(M_\omega^{-1}N_\omega) \ge |\omega - 1|.$$
2. If $A$ is symmetric and positive definite, then SOR converges provided $|\omega - 1| < 1$, that is, provided $0 < \omega < 2$.
3. If $A$ is symmetric positive definite and tridiagonal or block tridiagonal, then
$$\rho(M_1^{-1}N_1) = \rho(M_J^{-1}N_J)^2 < 1,$$
so that Jacobi and Gauss-Seidel both converge and Gauss-Seidel converges more rapidly than Jacobi. Here I denote the Jacobi splitting by $A = M_J - N_J$ ($M_J = D$, $N_J = L + U$). Moreover, in this case SOR has an optimal parameter $\omega_{\mathrm{opt}}$ given by
$$\omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \rho(M_J^{-1}N_J)^2}}.$$
Ciarlet [17, Section 5.3] contains proofs of the above results. It should be emphasized that even if the optimal SOR parameter exists, it may be a nontrivial task to compute it. Young [43] contains an extensive discussion of practical methods to estimate the optimal SOR parameter.
EXAMPLE 12.5. The following table shows the number of iterations of SOR taken in reducing the relative residual in the system from Example 11.1 to $10^{-6}$. The parameter $\omega$ was chosen to be $\omega = 1.5$. (I did not attempt to estimate the optimal value of $\omega$.)
N 9 49 225 961 3969 16129
Iterations 23 23 117 487 1964 7866
In this example, SOR converged much more quickly than Gauss-Seidel.
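The componentwise form of the SOR splitting can be sketched in a few lines of Python. This is only an illustrative sketch, not the MATLAB code that accompanies the text: the names `sor_sweep` and `sor_solve` are hypothetical, and the test matrix (tridiagonal, with 2 on the diagonal and -1 beside it) is an assumption standing in for the system from Example 11.1. Setting `omega = 1` gives Gauss-Seidel, and the update is done in place, so no second array is needed.

```python
def sor_sweep(A, b, x, omega):
    """One in-place SOR sweep on A x = b; omega = 1 is Gauss-Seidel."""
    n = len(b)
    for i in range(n):
        # Off-diagonal sum; x[j] for j < i has already been updated.
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        gauss_seidel_value = (b[i] - s) / A[i][i]
        x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel_value
    return x

def relative_residual(A, b, x):
    n = len(b)
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(t * t for t in r) ** 0.5 / sum(t * t for t in b) ** 0.5

def sor_solve(A, b, omega, tol=1e-6, max_iter=10000):
    """Iterate from x = 0 until the relative residual is at most tol."""
    x = [0.0] * len(b)
    for k in range(1, max_iter + 1):
        sor_sweep(A, b, x, omega)
        if relative_residual(A, b, x) <= tol:
            return x, k
    return x, max_iter

# Hypothetical test problem (an assumption, not the system of Example 11.1).
n = 20
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x_gs, its_gs = sor_solve(A, b, 1.0)    # Gauss-Seidel
x_sor, its_sor = sor_solve(A, b, 1.5)  # SOR with omega = 1.5
print("Gauss-Seidel iterations:", its_gs, " SOR iterations:", its_sor)
```

On this small tridiagonal problem the SOR count should come out well below the Gauss-Seidel count, in line with the tables above.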
12.2.4
Symmetric SOR
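Given a splitting $A = M - N$, the matrix $M$ is a candidate for a preconditioner for the CG method, provided $M$ is symmetric and positive definite. Of the methods considered thus far, only Jacobi's method leads to such an $M$. However, the SOR method can be symmetrized to give another possible preconditioner. The reader may have noticed that, in the Gauss-Seidel method, the splitting $M = D - L$, $N = U$ is rather arbitrary; it might just as well be $M = D - U$, $N = L$. The same remark applies to the SOR method: One could use

$$M_\omega = \frac{1}{\omega}D - U, \quad N_\omega = \frac{1-\omega}{\omega}D + L$$

instead of

$$M_\omega = \frac{1}{\omega}D - L, \quad N_\omega = \frac{1-\omega}{\omega}D + U.$$

The resulting iteration is referred to as backward SOR, since a step corresponds to solving an upper triangular system by back substitution. When $A$ is symmetric, $U^T = L$, and thus

$$\frac{1}{\omega}D - U = \left(\frac{1}{\omega}D - L\right)^T.$$

The symmetric SOR (SSOR) method consists of taking one step of the ordinary SOR method followed by one step of the backward SOR method. Applying one step of SOR to $x^{(k)}$ yields

$$x^{(k+1/2)} = \left(\frac{1}{\omega}D - L\right)^{-1}\left(\frac{1-\omega}{\omega}D + U\right)x^{(k)} + \left(\frac{1}{\omega}D - L\right)^{-1}b,$$

and following this by one step of backward SOR yields

$$x^{(k+1)} = \left(\frac{1}{\omega}D - U\right)^{-1}\left(\frac{1-\omega}{\omega}D + L\right)x^{(k+1/2)} + \left(\frac{1}{\omega}D - U\right)^{-1}b.$$

Thus the SSOR iteration is given by

$$x^{(k+1)} = \left(\frac{1}{\omega}D - U\right)^{-1}\left(\frac{1-\omega}{\omega}D + L\right)\left(\frac{1}{\omega}D - L\right)^{-1}\left(\frac{1-\omega}{\omega}D + U\right)x^{(k)} + \left(\frac{1}{\omega}D - U\right)^{-1}\left[\left(\frac{1-\omega}{\omega}D + L\right)\left(\frac{1}{\omega}D - L\right)^{-1} + I\right]b. \quad (12.8)$$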
EXAMPLE 12.6. The following table shows the number of iterations of SSOR taken in reducing the relative residual in the system from Example 11.1 to $10^{-6}$. The parameter $\omega$ was chosen to be $\omega = 1.5$.
N 9 49 225 961 3969 16129
Iterations 27 93 351 1355 5332 21193
In this example, SSOR converged about as quickly as Gauss-Seidel (in terms of the number of iterations), and much slower than SOR with the same parameter. The reader should notice that the cost per iteration of SSOR is twice that of Gauss-Seidel or SOR.
12.2.5
CG with SSOR preconditioning
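Although it is not immediately obvious, the SSOR iteration given by (12.8) results from a splitting of $A$, $A = M_\omega - N_\omega$. The matrix $M_\omega$ must be given by

$$M_\omega^{-1} = \left(\frac{1}{\omega}D - U\right)^{-1}\left[\left(\frac{1-\omega}{\omega}D + L\right)\left(\frac{1}{\omega}D - L\right)^{-1} + I\right]$$

($M_\omega^{-1}$ must be the matrix multiplying $b$ in (12.8)), and some simplification shows that

$$M_\omega^{-1} = \frac{2-\omega}{\omega}\left(\frac{1}{\omega}D - U\right)^{-1}D\left(\frac{1}{\omega}D - L\right)^{-1}.$$

Therefore,

$$M_\omega = \frac{\omega}{2-\omega}\left(\frac{1}{\omega}D - L\right)D^{-1}\left(\frac{1}{\omega}D - U\right).$$

A formula for the matrix $N_\omega$ could also be derived, but it is not needed. The matrix $M_\omega$ is symmetric and positive definite and so is a candidate for preconditioning the CG algorithm.

EXAMPLE 12.7. The following table shows the number of iterations of CG, preconditioned by SSOR with $\omega = 1.5$, taken in reducing the relative residual in the system from Example 11.1 to $10^{-6}$.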
N 9 49 225 961 3969 16129
Iterations 1 13 23 42 79 165
The number of iterations is reduced by a factor of almost two over unpreconditioned CG.
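Applying the SSOR preconditioner never requires forming $M_\omega$: solving $M_\omega z = r$ amounts to a forward triangular solve, a multiplication by $D$, a backward triangular solve, and a scaling by $(2-\omega)/\omega$. The following Python sketch shows this together with a standard preconditioned CG loop. It is a sketch under stated assumptions rather than the CGSSOR function of the accompanying MATLAB code: the names `ssor_apply` and `pcg_ssor` are hypothetical, matrices are stored as dense lists for clarity, and the tridiagonal test matrix is an assumption, not the system from Example 11.1.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def ssor_apply(A, r, omega):
    """Return z with M z = r, M = omega/(2-omega) (D/omega - L) D^{-1} (D/omega - U)."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):                       # forward solve with D/omega - L
        s = sum(A[i][j] * y[j] for j in range(i))
        y[i] = omega * (r[i] - s) / A[i][i]
    w = [A[i][i] * y[i] for i in range(n)]   # multiply by D
    z = [0.0] * n
    for i in reversed(range(n)):             # backward solve with D/omega - U
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = omega * (w[i] - s) / A[i][i]
    scale = (2.0 - omega) / omega
    return [scale * t for t in z]

def pcg_ssor(A, b, omega, tol=1e-6, max_iter=500):
    """Preconditioned CG from x = 0; stops when ||r|| <= tol ||b||."""
    x = [0.0] * len(b)
    r = list(b)
    z = ssor_apply(A, r, omega)
    p = list(z)
    rz = dot(r, z)
    bnorm = dot(b, b) ** 0.5
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * bnorm:
            return x, k
        z = ssor_apply(A, r, omega)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Hypothetical symmetric positive definite test problem (an assumption).
n = 30
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0 + (i % 3) for i in range(n)]
x, its = pcg_ssor(A, b, 1.5)
print("PCG iterations:", its)
```

The two triangular solves cost about as much as one multiplication by $A$, which is worth keeping in mind when comparing iteration counts with unpreconditioned CG.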
12.3 The MATLAB implementation
The algorithms discussed in this chapter are implemented for a general system $Ax = b$; none of them are mesh-dependent.
12.3.1
MATLAB functions
• Jacobi: Jacobi method.
• GaussSeidel: Gauss-Seidel method.
• SOR: Successive overrelaxation method (user must provide relaxation parameter).
• SSOR: Symmetric SOR method (user must provide relaxation parameter).
• CGSSOR: Conjugate gradients with SSOR preconditioning (user must provide relaxation parameter).
12.4
Exercises for Chapter 12
1. Suppose $A \in \mathbf{R}^{n\times n}$ is strictly diagonally dominant and that $A = D - L - U$ is the splitting of $A$ into its diagonal, lower triangular, and upper triangular parts, as in this chapter. Prove that $\|D^{-1}(L+U)\|_\infty < 1$ and hence that Jacobi iteration is guaranteed to converge.
2. Verify that the Gauss-Seidel iteration corresponds to the splitting $M = D - L$, $N = U$.
3. Suppose $A \in \mathbf{R}^{n\times n}$ is symmetric and positive definite, and let $A = M - N$, where $M$ is nonsingular. Consider the matrix $C = M^T + N$.
(a) Prove that $C$ is symmetric.
(b) Prove that if $C$ is positive definite, then $\rho(M^{-1}N) < 1$. (Hint: Recall that $\rho(B) \le \|B\|$ for any matrix norm $\|\cdot\|$. Prove that $\|M^{-1}N\|_A < 1$, where $\|\cdot\|_A$ is the matrix norm induced by the vector norm $\|x\|_A = \sqrt{x \cdot Ax}$. Notice that $M^{-1}N = I - M^{-1}A$, so that $M^{-1}Nx = x - y$ with $y = M^{-1}Ax$. Some simplification shows that $\|M^{-1}Nx\|_A^2 = \|x\|_A^2 - y \cdot Cy$.)
(c) Using the previous result, prove that when $A$ is symmetric and positive definite, the SOR method converges for $0 < \omega < 2$.
4. (From [16]) Consider the two matrices
For each matrix, determine the following by direct computation: (a) the spectral radius of the iteration matrix for Jacobi's method; (b) the spectral radius of the iteration matrix for the Gauss-Seidel method; (c) whether Jacobi and/or Gauss-Seidel converges.
5. Prove that the matrix norm $\|\cdot\|_2$ induced by the Euclidean vector norm $\|\cdot\|_2$ reduces to $\|A\|_2 = \lambda_{\max}(A)$ when $A$ is symmetric and positive definite. (Hint: There is an orthonormal basis of eigenvectors $x_1, x_2, \ldots, x_n$ with corresponding eigenvalues $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. For any $x = c_1x_1 + c_2x_2 + \cdots + c_nx_n$, $\|x\|_2^2 = c_1^2 + c_2^2 + \cdots + c_n^2$. Compute $Ax$ using the expansion of $x$ in terms of eigenvectors; then compute $\|Ax\|_2^2$ and compare it to $\|x\|_2^2$.)
6. Show that the matrix norm $\|\cdot\|_\infty$ induced by the vector norm $\|\cdot\|_\infty$ is given by

$$\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |A_{ij}|.$$

(Hint: Notice that if $\|x\|_\infty = 1$, then no component of $x$ is more than 1 in magnitude. Use this to show that $\|A\|_\infty$ is bounded by the maximum absolute row sum of $A$. Then suppose row $k$ of $A$ has the largest absolute row sum and consider a vector with all components equal to $\pm 1$, the signs matching the signs of the entries in row $k$ of $A$.)
7. Consider a model problem on a rectangle resulting in a stiffness matrix with no more than five nonzeros per row. What is the cost of an iteration of CG? What is the cost of an iteration of CG with SSOR preconditioning? Taking into account the added cost of the SSOR preconditioning, did the SSOR preconditioning lead to an increase in efficiency in Example 12.7?
8. (MATLAB) Consider the BVP
where $\Omega$ is the unit circle. Starting with a coarse triangulation with four triangles (as in Figure 6.11), refine it five times to obtain a sequence of six meshes. Form the finite element equations $KU = F$ and apply each of the following methods to solve $KU = F$, reducing the relative residual to below $10^{-6}$. Create a table analogous to that of Example 12.3 for each: (a) Jacobi iteration, (b) Gauss-Seidel,
(c) SOR (choose the relaxation parameter by trial and error), (d) SSOR with the same relaxation parameter, (e) CG with SSOR preconditioning.
9. (MATLAB) Repeat the previous exercise, but take $\Omega$ to be the polygon with vertices (0, 0), (1, 0), (1, 1), (-1, 1), (-1, -1), and (0, -1). Start with a coarse mesh consisting of six triangles and refine it five times to obtain a sequence of six meshes.
10. (MATLAB) Apply each of the following methods to solve $KU = F$, the finite element equations resulting from solving BVP (11.5) using piecewise quadratic functions. Create a table analogous to that of Example 12.3 for each: (a) Jacobi iteration, (b) Gauss-Seidel, (c) SOR (choose the relaxation parameter by trial and error), (d) SSOR with the same relaxation parameter, (e) CG with SSOR preconditioning. Is the relative efficiency of the methods the same as in the piecewise linear case?
Chapter 13
The multigrid method
In the preceding chapter, I described the classical stationary iterations and showed how the rate of convergence of a given method depends on the spectral radius of the iteration matrix. However, there is more to an iterative method than the rate at which the norm of the error goes to zero. As I will show in Section 13.1, certain iterative methods smooth the error: the high frequency components of the error go to zero quickly, while low frequency components are reduced much more slowly. Multigrid methods take advantage of this phenomenon, switching from one mesh to another to reduce various components of the error efficiently.
13.1
Stationary iterations as smoothers
To lay the foundation for the multigrid method, I begin by examining in detail the performance of Jacobi and Gauss-Seidel iterations on a simple model problem. The BVP is Poisson's equation under Dirichlet conditions, the domain is the unit square, and the mesh is a uniform triangulation. It is possible to describe very precisely the fashion in which the error in the solution to $KU = F$ goes to zero.
13.1.1
The stiffness matrix for the model problem
In this section, $\Omega$ will denote the unit square,

$$\Omega = \{(x, y) : 0 < x < 1,\ 0 < y < 1\},$$

and $\mathcal{T}_h$ will denote an $n \times n$ uniform triangulation of $\Omega$, as shown in Figure 13.1 for $n = 4$. In this section, the integer $n$ will be of the form $n = 2^m$ for some positive integer $m$. Such a mesh has $2n^2$ triangles, $(n+1)^2$ nodes, and $(n-1)^2$ free nodes. The nodes will always be numbered by rows, from left to right and bottom to top, from 1 to $(n+1)^2$, and the free nodes, which are the interior nodes for a Dirichlet problem, will be numbered in the same way.
Figure 13.1. A uniform triangulation of the unit square. All the nodes are labeled on the left; the free nodes are labeled on the right.

For reasons that will become clear, it is convenient to define an alternate numbering of the nodes in the mesh, one that explicitly displays the two-dimensional nature of the mesh. Defining $h = 1/n$, a typical node is $(ih, jh) = (i/n, j/n)$, $i, j = 0, 1, \ldots, n$. In this section, the nodes will be denoted

$$z_{ij} = (ih, jh), \quad i, j = 0, 1, \ldots, n.$$

The free nodes are then $z_{ij}$, $i, j = 1, 2, \ldots, n-1$. The stiffness matrix for the BVP

$$-\Delta u = f \ \text{in } \Omega, \quad u = 0 \ \text{on } \partial\Omega$$

takes a special form on the uniform mesh $\mathcal{T}_h$. First of all, no row of $K$ has more than five nonzero entries. Referring to Figure 13.1, it might appear that, for example, $K_{51}, K_{52}, K_{54}, K_{55}, K_{56}, K_{58}, K_{59}$ would all be nonzero, for a total of seven nonzeros in row five of $K$. However, due to the geometry of the mesh, $K_{51}$ and $K_{59}$ are both zero ($\nabla\phi_5$ and $\nabla\phi_1$ are orthogonal on their common support, as are $\nabla\phi_5$ and $\nabla\phi_9$). The analogous cancellation occurs in each row of $K$. Second, the five nonzeros in a typical row correspond to the nodes

$$z_{i,j-1},\ z_{i-1,j},\ z_{ij},\ z_{i+1,j},\ z_{i,j+1}.$$

Assuming $z_{ij}$ is the $k$th free node (although I do not use it explicitly, the relationship is $k = (j-1)(n-1) + i$), these nodes are numbered

$$k - (n-1),\ k - 1,\ k,\ k + 1,\ k + (n-1)$$

in the linear ordering of the nodes, which determines the order of the unknowns and the sparsity structure of $K$. Therefore, nonzeros in $K$ occur only in diagonals $-(n-1), -1, 0, 1, n-1$, where the main diagonal is numbered 0, the subdiagonals are numbered $-1, -2, \ldots$, and the superdiagonals are numbered $1, 2, \ldots$. Finally, due to the uniformity of the mesh and the PDE, the nonzero entries are

$$K_{kk} = 4, \quad K_{k,k\pm 1} = -1, \quad K_{k,k\pm(n-1)} = -1.$$

Most rows have these five nonzeros, although rows corresponding to nodes next to the boundary have only four nonzeros and rows corresponding to nodes in the corners (nodes $z_{1,1}$, $z_{n-1,1}$, $z_{1,n-1}$, $z_{n-1,n-1}$) have only three nonzeros. The result is the block tridiagonal matrix

$$K = \begin{bmatrix} A & -I & & & \\ -I & A & -I & & \\ & \ddots & \ddots & \ddots & \\ & & -I & A & -I \\ & & & -I & A \end{bmatrix},$$

where $A$ is an $(n-1) \times (n-1)$ tridiagonal matrix,

$$A = \begin{bmatrix} 4 & -1 & & & \\ -1 & 4 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 4 & -1 \\ & & & -1 & 4 \end{bmatrix},$$

and $I$ is the $(n-1) \times (n-1)$ identity matrix. Those readers familiar with finite difference methods for solving Laplace's or Poisson's equation will recognize that $K$ is a multiple of the finite difference matrix resulting from the standard five-point stencil for the Laplacian:

$$-\Delta u(z_{ij}) \approx \frac{-u(z_{i,j-1}) - u(z_{i-1,j}) + 4u(z_{ij}) - u(z_{i+1,j}) - u(z_{i,j+1})}{h^2}. \quad (13.2)$$
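The structure just described is easy to reproduce in code. The following Python sketch builds the model stiffness matrix directly from the row-by-row numbering of the free nodes; the function name `model_stiffness` is hypothetical, and a dense list-of-lists is used only to keep the example short (a practical code would use sparse storage).

```python
def model_stiffness(n):
    """Dense (n-1)^2 x (n-1)^2 matrix: 4 on the diagonal, -1 for mesh neighbors."""
    m = n - 1
    K = [[0.0] * (m * m) for _ in range(m * m)]
    for j in range(1, n):              # row of the mesh, bottom to top
        for i in range(1, n):          # position in the row, left to right
            k = (j - 1) * m + (i - 1)  # zero-based form of k = (j-1)(n-1) + i
            K[k][k] = 4.0
            if i > 1:
                K[k][k - 1] = -1.0     # neighbor z_{i-1,j}
            if i < m:
                K[k][k + 1] = -1.0     # neighbor z_{i+1,j}
            if j > 1:
                K[k][k - m] = -1.0     # neighbor z_{i,j-1}
            if j < m:
                K[k][k + m] = -1.0     # neighbor z_{i,j+1}
    return K

K = model_stiffness(4)
counts = [sum(1 for v in row if v != 0.0) for row in K]
print(counts)  # nonzeros per row: corners, edges, and the single interior row
```

For $n = 4$ the nine rows show the pattern described above: three nonzeros for corner nodes, four for nodes next to the boundary, and five for the center node.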
13.1.2
Fourier modes and the spectral decomposition of K
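The matrix $K$ has the following (rare) property: It is possible to write down analytic formulas for its eigenvalues and eigenvectors. It turns out that the eigenvectors of $K$ can be obtained from the eigenfunctions of the Laplace operator, which are also known analytically. Defining

$$w^{(k,\ell)}(x, y) = \sin(k\pi x)\sin(\ell\pi y),$$

it follows that

$$-\Delta w^{(k,\ell)} = (k^2 + \ell^2)\pi^2 w^{(k,\ell)}.$$

Therefore, with $\mu_{k\ell} = (k^2 + \ell^2)\pi^2$,

$$-\Delta w^{(k,\ell)} = \mu_{k\ell} w^{(k,\ell)} \ \text{in } \Omega, \quad w^{(k,\ell)} = 0 \ \text{on } \partial\Omega,$$

showing that $w^{(k,\ell)}$, $\mu_{k\ell}$ form an eigenfunction-eigenvalue pair for each $k, \ell \ge 1$. Perhaps surprisingly, the eigenvectors of $K$ are obtained by merely sampling the eigenfunctions $w^{(k,\ell)}$ at the free nodes. The vector $W^{(k,\ell)} \in \mathbf{R}^{(n-1)^2}$ is defined by

$$W^{(k,\ell)}_{ij} = w^{(k,\ell)}(z_{ij}) = \sin\left(\frac{k\pi i}{n}\right)\sin\left(\frac{\ell\pi j}{n}\right), \quad i, j = 1, 2, \ldots, n-1.$$

When it is necessary to arrange the components of $W^{(k,\ell)}$ in a one-dimensional array, as when forming the matrix-vector product $KW^{(k,\ell)}$, the order is the same as the order of the nodes:

$$W^{(k,\ell)}_{11}, W^{(k,\ell)}_{21}, \ldots, W^{(k,\ell)}_{n-1,1}, W^{(k,\ell)}_{12}, \ldots, W^{(k,\ell)}_{n-1,n-1}.$$

I will now show that $W^{(k,\ell)}$ is an eigenvector of $K$ by computing a typical entry $(KW^{(k,\ell)})_{ij}$. The manipulation that follows is based on the trigonometric identities

$$\sin(\alpha \pm \beta) = \sin\alpha\cos\beta \pm \cos\alpha\sin\beta,$$

which imply $\sin(\alpha + \beta) + \sin(\alpha - \beta) = 2\sin\alpha\cos\beta$. The reader should notice how the following expression arises from the nonzeros $-1, -1, 4, -1, -1$ in the typical row of $K$:

$$(KW^{(k,\ell)})_{ij} = -W^{(k,\ell)}_{i,j-1} - W^{(k,\ell)}_{i-1,j} + 4W^{(k,\ell)}_{ij} - W^{(k,\ell)}_{i+1,j} - W^{(k,\ell)}_{i,j+1}$$
$$= 4\sin\left(\frac{k\pi i}{n}\right)\sin\left(\frac{\ell\pi j}{n}\right) - 2\cos\left(\frac{k\pi}{n}\right)\sin\left(\frac{k\pi i}{n}\right)\sin\left(\frac{\ell\pi j}{n}\right) - 2\cos\left(\frac{\ell\pi}{n}\right)\sin\left(\frac{k\pi i}{n}\right)\sin\left(\frac{\ell\pi j}{n}\right)$$
$$= \left(4 - 2\cos\left(\frac{k\pi}{n}\right) - 2\cos\left(\frac{\ell\pi}{n}\right)\right)W^{(k,\ell)}_{ij}.$$

The above calculation, which is derived for a typical row of $K$ with five nonzeros, is also valid for the rows with four or three nonzeros. In these cases, one or two of the adjacent nodes $z_{i,j-1}$, $z_{i-1,j}$, $z_{i+1,j}$, $z_{i,j+1}$ lie on the boundary, where the value of $w^{(k,\ell)}$ is zero. In such a case, any term that does not belong in the above calculation of $(KW^{(k,\ell)})_{ij}$ is in fact zero and has no effect.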
I have thus shown that

$$KW^{(k,\ell)} = \lambda_{k\ell}W^{(k,\ell)},$$

where $\lambda_{k\ell} = 4 - 2\cos(k\pi/n) - 2\cos(\ell\pi/n)$. The relationship between $W^{(k,\ell)}$ and $w^{(k,\ell)}$ has already been stated (sampling $w^{(k,\ell)}$ yields $W^{(k,\ell)}$). When $k$ and $\ell$ are small compared to $n$, then, by a Taylor expansion,

$$\lambda_{k\ell} \approx \frac{(k^2 + \ell^2)\pi^2}{n^2} = h^2\mu_{k\ell}.$$

The stiffness matrix $K$ would be scaled by $1/h^2$ to make it a discrete approximation to $-\Delta$ (see (13.2)), and this scaling would make $\lambda_{k\ell}$ close to $\mu_{k\ell}$ for $k, \ell$ small. Since $K$ is $(n-1)^2 \times (n-1)^2$, it can have only $(n-1)^2$ independent eigenvectors. Indeed, the vectors $W^{(k,\ell)}$ are distinct only for $k, \ell = 1, 2, \ldots, n-1$. For larger values of $k$ and/or $\ell$, aliasing occurs because the discrete mesh cannot resolve such large frequencies. To be precise, the following relationships hold (Exercise 1):

$$W^{(n,\ell)} = 0, \quad W^{(n+k,\ell)} = -W^{(n-k,\ell)}, \quad W^{(k+2n,\ell)} = W^{(k,\ell)},$$

and similarly in the second index. The eigenvectors $W^{(k,\ell)}$, $k, \ell = 1, 2, \ldots, n-1$, can be shown to be orthogonal under the Euclidean dot product. They therefore form an orthogonal basis for $\mathbf{R}^{(n-1)^2}$. It is convenient to normalize these eigenvectors so as to obtain an orthonormal basis, and this is facilitated by the fact that each $W^{(k,\ell)}$ has the same Euclidean norm:

$$\|W^{(k,\ell)}\| = \frac{n}{2}.$$

Therefore, if each $W^{(k,\ell)}$ is rescaled by the factor $2/n$, then $\{W^{(k,\ell)} : k, \ell = 1, 2, \ldots, n-1\}$ is an orthonormal basis for $\mathbf{R}^{(n-1)^2}$. The significance of the above calculations can now be explained. When $U \in \mathbf{R}^{(n-1)^2}$ is written in terms of the basis of eigenvectors,

$$U = \sum_{k=1}^{n-1}\sum_{\ell=1}^{n-1}\left(U \cdot W^{(k,\ell)}\right)W^{(k,\ell)}$$

(this is the standard formula for expressing a vector in terms of a given orthonormal basis; see Exercise 2), the frequency content of $U$ is revealed by the coefficients $U \cdot W^{(k,\ell)}$. This is because each eigenvector is a sinusoid that oscillates with a certain frequency, as illustrated in Figure 13.2. The eigenvectors $W^{(k,\ell)}$ are called (discrete) Fourier modes. When $k$ and $\ell$ are both small compared to $n$, then $W^{(k,\ell)}$ is relatively smooth, a low frequency mode. When $k$ and $\ell$ are both close to $n$, then $W^{(k,\ell)}$ is quite oscillatory, a high frequency mode. When one of the indices is small and the other large, then $W^{(k,\ell)}$ is smooth in one variable and oscillatory in the other. Such a mode is called a mixed frequency mode. It will be useful below to define these terms precisely: $1 \le k, \ell \le n/2$ correspond to a low frequency mode, $n/2 < k, \ell \le n-1$ to a high frequency mode, and the remaining values of $k, \ell$ to mixed frequency modes.

Figure 13.2. Four Fourier modes for $n = 16$: $W^{(1,1)}$ (upper left), $W^{(1,15)}$ (upper right), $W^{(15,1)}$ (lower left), $W^{(15,15)}$ (lower right).
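The eigenvector property, the normalization, and the classification of modes can all be checked numerically. The Python sketch below does so for one mode with $n = 16$. The function names are hypothetical, the product $KW$ is computed matrix-free from the five-point pattern, the factor $2/n$ is the normalization discussed above, and the treatment of the borderline index $k = n/2$ as low frequency is an assumption about how the inequalities should be read.

```python
import math

def fourier_mode(n, k, l):
    """Normalized mode with entries (2/n) sin(k pi i/n) sin(l pi j/n), ordered by rows."""
    return [(2.0 / n) * math.sin(k * math.pi * i / n) * math.sin(l * math.pi * j / n)
            for j in range(1, n) for i in range(1, n)]

def apply_K(n, V):
    """Matrix-free product K V using the pattern -1, -1, 4, -1, -1."""
    m = n - 1
    def val(i, j):  # value at node z_ij; zero on the boundary
        return V[(j - 1) * m + (i - 1)] if 1 <= i <= m and 1 <= j <= m else 0.0
    return [4 * val(i, j) - val(i - 1, j) - val(i + 1, j) - val(i, j - 1) - val(i, j + 1)
            for j in range(1, n) for i in range(1, n)]

def eigenvalue(n, k, l):
    return 4 - 2 * math.cos(k * math.pi / n) - 2 * math.cos(l * math.pi / n)

def classify(n, k, l):
    """Low, high, or mixed frequency (k = n/2 counted as low: an assumption)."""
    low_k, low_l = k <= n / 2, l <= n / 2
    if low_k and low_l:
        return "low"
    if not low_k and not low_l:
        return "high"
    return "mixed"

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

n = 16
W = fourier_mode(n, 3, 5)
KW = apply_K(n, W)
lam = eigenvalue(n, 3, 5)
max_err = max(abs(a - lam * c) for a, c in zip(KW, W))
print("max |K W - lambda W| =", max_err)
print("W . W =", dot(W, W), "  W . W' =", dot(W, fourier_mode(n, 5, 3)))
print(classify(n, 1, 1), classify(n, 15, 15), classify(n, 1, 15))
```

The same check can be repeated for any pair $k, \ell$ between 1 and $n - 1$.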
13.1.3
Jacobi iteration
The reader will recall that when a stationary iteration

$$U^{(k+1)} = M^{-1}NU^{(k)} + M^{-1}F$$

is applied to solve $KU = F$, the error

$$E^{(k)} = U^{(k)} - U$$

(where $U$ is the exact solution of $KU = F$) satisfies

$$E^{(k+1)} = M^{-1}NE^{(k)}.$$

For this reason, when analyzing convergence, it is common to take $F = 0$, so that the system is $KU = 0$, the solution is $U = 0$, and the error is just $E^{(k)} = U^{(k)}$. This simplifies the notation somewhat, but does not limit the applicability of the results. Of course, the initial vector $U^{(0)}$, which plays the role of the initial error, must be nonzero. The following example shows how the error goes to zero under Jacobi iteration.

EXAMPLE 13.1. Consider the model problem with $n = 16$, so that $K \in \mathbf{R}^{225\times 225}$. The initial vector is chosen to contain all frequencies equally:

$$U^{(0)} = \sum_{k=1}^{n-1}\sum_{\ell=1}^{n-1} W^{(k,\ell)}.$$

This vector is displayed (as a piecewise linear function) in Figure 13.3. The equivalent vector, having as components $U^{(0)} \cdot W^{(k,\ell)}$ and thus revealing the frequency content of $U^{(0)}$, is also displayed in Figure 13.3. This equivalent vector can conveniently be computed as $W^TU^{(0)}$, where $W$ is the matrix whose columns are the Fourier modes:

$$W = \left[W^{(1,1)} \,|\, W^{(2,1)} \,|\, \cdots \,|\, W^{(n-1,n-1)}\right].$$

Figure 13.3. The initial vector $U^{(0)}$ from Example 13.1 (left) and its frequency content (right).

Ten steps of the Jacobi iteration yield the vector $U^{(10)}$, which is shown in Figure 13.4 along with $W^TU^{(10)}$. The norm of the error is reduced by about a factor of 10 by these 10 iterations.
Figure 13.4. The vector $U^{(10)}$ from Example 13.1 (left) and its frequency content (right).

More importantly, Figure 13.4 shows that the various Fourier modes of the error were damped according to a distinct pattern: The low and high frequency modes were reduced much less than the mixed frequency modes. The results of the previous example could have been predicted. The Jacobi splitting is $K = M_J - N_J$, where $M_J$ is the diagonal matrix agreeing with $K$ on the diagonal, and $N_J$ is the rest of the matrix. Jacobi's method is then

$$U^{(k+1)} = M_J^{-1}N_JU^{(k)} + M_J^{-1}F,$$

and the eigenvectors of $B_J = M_J^{-1}N_J$ are precisely the same as the eigenvectors of $K$. Indeed, a calculation similar to that given in Section 13.1.2 shows that

$$N_JW^{(k,\ell)} = \left(2\cos\left(\frac{k\pi}{n}\right) + 2\cos\left(\frac{\ell\pi}{n}\right)\right)W^{(k,\ell)}.$$

Therefore, since $M_J = 4I$,

$$B_JW^{(k,\ell)} = \theta_{k\ell}W^{(k,\ell)},$$

where

$$\theta_{k\ell} = \frac{1}{2}\left(\cos\left(\frac{k\pi}{n}\right) + \cos\left(\frac{\ell\pi}{n}\right)\right).$$

It follows that

$$B_J^mW^{(k,\ell)} = \theta_{k\ell}^mW^{(k,\ell)}, \quad m = 1, 2, \ldots.$$

The Fourier modes are damped since $|\theta_{k\ell}| < 1$, and the smaller the eigenvalue $|\theta_{k\ell}|$, the faster the mode $W^{(k,\ell)}$ is damped. The expression $\theta_{k\ell}$, as a function of $k$ and $\ell$, is graphed in Figure 13.5. Since the three-dimensional view of $\theta_{k\ell}$ is difficult to interpret, it is helpful to view $\theta_{k\ell}$ as a function of a single index; this is also displayed in Figure 13.5. All of the eigenvalues $\theta_{k\ell}$ are less than one in magnitude, but those that correspond to the lowest frequency and the highest frequency modes are close to one in magnitude. It follows that these modes are slowly eliminated by the Jacobi iteration. On the other hand, the eigenvalues corresponding to the mixed frequency modes are small and these modes are eliminated quickly by the Jacobi iteration.

Figure 13.5. The eigenvalues $\theta_{k\ell}$ of the Jacobi iteration matrix, graphed as a function of $k$ and $\ell$ (left) and as a function of a single index (right). The eigenvalues corresponding to low, mixed, and high frequencies are indicated by distinct markers.
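Because each Fourier mode is simply multiplied by its eigenvalue at every Jacobi step, the damping over several steps can be tabulated without running the iteration at all. The following Python sketch does this for $n = 16$ and ten steps. The name `theta` is hypothetical, and the last quantity assumes an initial vector in which every normalized mode has coefficient 1, which is one reading of "all frequencies equally"; no particular printed value is claimed here.

```python
import math

def theta(n, k, l):
    """Eigenvalue of the Jacobi iteration matrix belonging to the mode W^(k,l)."""
    return 0.5 * (math.cos(k * math.pi / n) + math.cos(l * math.pi / n))

n = 16
steps = 10
# Factor by which each mode's coefficient shrinks (in magnitude) after `steps` iterations.
factors = {(k, l): abs(theta(n, k, l)) ** steps
           for k in range(1, n) for l in range(1, n)}

# Relative reduction of the Euclidean norm if every mode starts with coefficient 1.
reduction = math.sqrt(sum(f * f for f in factors.values()) / len(factors))

for mode in [(1, 1), (15, 15), (1, 15), (8, 8)]:
    print(mode, factors[mode])
print("norm reduction after", steps, "steps:", reduction)
```

The lowest and highest frequency modes keep most of their size, while mixed modes such as $(1, 15)$ are essentially annihilated, which matches the pattern described above.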
13.1.4
Weighted Jacobi iteration
As I show in the next section, it is useful to have iterative methods that smooth the error; that is, iterations that eliminate the high frequency modes most rapidly and the low frequency modes most slowly. Such an iteration can be designed using weighted Jacobi iteration. Given any iteration matrix $B$ and a scalar $\omega \in (0, 1)$, a new iteration can be defined as follows:

$$B_\omega = \omega B + (1 - \omega)I.$$

When $B$ is $B_J$, the Jacobi iteration matrix, this is called the weighted Jacobi iteration. The eigenvalues of $B_{J,\omega} = \omega B_J + (1 - \omega)I$ are simply $\omega\theta_{k\ell} + 1 - \omega$ (see Exercise 3), and

$$-1 < \omega\theta_{k\ell} + 1 - \omega < 1$$

(since $-1 < \theta_{k\ell} < 1$, as was shown above). To damp the high frequencies, $\omega$ should be chosen so that the values $|\omega\theta_{k\ell} + 1 - \omega|$ are as small as possible for $n/2 < k, \ell \le n-1$. Exercise 4 asks the reader to show that the optimal value of $\omega$, from this point of view, is $\omega = 2/3$; with this value of $\omega$,

$$\left|\omega\theta_{k\ell} + 1 - \omega\right| < \frac{1}{3}, \quad n/2 < k, \ell \le n-1.$$

The eigenvalues of $B_{J,2/3}$ are shown in Figure 13.6. Examination of Figure 13.6 shows that, as expected, the eigenvalues corresponding to high frequency modes are all less than 1/3 in magnitude, while those corresponding to mixed frequency modes are all less than 2/3. Only the eigenvalues corresponding to low frequency modes are close to 1 in magnitude. It can be predicted, then, that the iteration matrix $B_{J,2/3}$ will eliminate the high frequency modes of the error quickly, leaving behind only the smooth (low frequency) modes. This is confirmed in the following example.

Figure 13.6. The eigenvalues of the weighted Jacobi iteration matrix $B_{J,2/3}$, graphed as a function of $k$ and $\ell$ (left) and as a function of a single index (right). The eigenvalues corresponding to low, mixed, and high frequencies are indicated by distinct markers. (The dotted lines correspond to $\pm 1/3$.)

EXAMPLE 13.2. The model problem, with $n = 16$, is solved by weighted Jacobi iteration with $\omega = 2/3$. Once again the initial vector is chosen to contain all frequencies equally (see Figure 13.3). Ten steps of the weighted Jacobi iteration yield the vector $U^{(10)}$ shown in Figure 13.7 along with $W^TU^{(10)}$. The convergence is actually degraded a bit as compared to the ordinary Jacobi iteration, when only the norm of the error is considered. However, Figure 13.7 shows the expected damping of high frequency modes. Only the lowest frequencies are discernible in $U^{(10)}$.

Figure 13.7. The vector $U^{(10)}$ from Example 13.2 (left) and its frequency content (right).
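The bounds 1/3 and 2/3 quoted for the weighted Jacobi eigenvalues can be confirmed by brute force over all modes. The Python sketch below does this for $n = 16$ and compares with the unweighted case $\omega = 1$. The helper names are hypothetical, and the borderline index $k = n/2$ is again counted as low frequency, which is an assumption about how the frequency classes are delimited.

```python
import math

def theta(n, k, l):
    """Eigenvalue of the (unweighted) Jacobi iteration matrix for mode (k, l)."""
    return 0.5 * (math.cos(k * math.pi / n) + math.cos(l * math.pi / n))

def weighted(n, k, l, omega):
    """Eigenvalue of omega * B_J + (1 - omega) * I for mode (k, l)."""
    return omega * theta(n, k, l) + 1.0 - omega

def largest(n, omega, kind):
    """Largest eigenvalue magnitude over the low, mixed, or high frequency modes."""
    best = 0.0
    for k in range(1, n):
        for l in range(1, n):
            highs = (k > n / 2) + (l > n / 2)   # number of high indices: 0, 1, or 2
            if {"low": 0, "mixed": 1, "high": 2}[kind] == highs:
                best = max(best, abs(weighted(n, k, l, omega)))
    return best

n = 16
max_high = largest(n, 2.0 / 3.0, "high")
max_mixed = largest(n, 2.0 / 3.0, "mixed")
max_low = largest(n, 2.0 / 3.0, "low")
max_high_plain = largest(n, 1.0, "high")   # ordinary Jacobi, for comparison
print(max_high, max_mixed, max_low, max_high_plain)
```

The comparison value for $\omega = 1$ shows why ordinary Jacobi is a poor smoother: its largest high frequency eigenvalue is nearly 1 in magnitude.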
Other stationary iterations can be seen to smooth the error, although it may not be so simple to analyze the eigenvalues and eigenvectors of the iteration matrix in other cases. For example, the Gauss-Seidel iteration matrix $B_{GS}$ is not symmetric and has complex eigenvalues and eigenvectors, so its effects cannot be as easily analyzed as above. Nevertheless, the Gauss-Seidel iteration has a pronounced smoothing effect on the error, as the following example suggests.

EXAMPLE 13.3. This time the model problem with $n = 16$ is solved using the Gauss-Seidel iteration. The initial vector $U^{(0)}$ is as before (see Figure 13.3). Ten steps of the Gauss-Seidel iteration yield the vector $U^{(10)}$ shown in Figure 13.8 along with $W^TU^{(10)}$. The convergence is much faster than was exhibited by either Jacobi or weighted Jacobi iteration. Moreover, Figure 13.8 shows a very pronounced smoothing effect.

Figure 13.8. The vector $U^{(10)}$ from Example 13.3 (left) and its frequency content (right).

Gauss-Seidel will be used as the basic iterative method while developing the multigrid method. The following example shows that the above ideas are not restricted to the model problem.

EXAMPLE 13.4. Consider the equation $KU = 0$ arising from the BVP

$$-\nabla \cdot (\kappa\nabla u) = 0 \ \text{in } \Omega, \quad u = 0 \ \text{on } \partial\Omega,$$

where $\kappa(x, y) = 1 + x^2 + 2y^2$ and $\Omega$ is the unit disk. A mesh is established on $\Omega$ as shown in Figure 13.9; the mesh is intentionally asymmetric. The initial vector $U^{(0)}$ is taken to be the piecewise linear interpolant of
Thus $U^{(0)}$ contains many frequencies. Figure 13.10 shows $U^{(0)}$, $U^{(1)}$, $U^{(2)}$, $U^{(4)}$, obtained by applying the Gauss-Seidel method to $KU = 0$. The results show that the error is smoothed by the Gauss-Seidel iteration.

Figure 13.9. The mesh for Example 13.4.
Figure 13.10. The vectors $U^{(0)}$, $U^{(1)}$, $U^{(2)}$, $U^{(4)}$ from Example 13.4.
13.2
The coarse grid correction algorithm
Here is a key idea underlying the multigrid method: Some Fourier modes that are classified as low frequency on a given mesh appear to be high frequency on a coarser mesh. This is because the frequency at which a Fourier mode oscillates is independent of the mesh, while the classification into low and high frequency modes depends on $n$, that is, on how fine or coarse the mesh is. To explain this idea in detail, I consider a sequence of nested meshes, with one mesh obtained from the previous by refinement. The final (finest) mesh will be denoted $\mathcal{T}_h$, and the previous, coarser meshes by $\mathcal{T}_{2h}, \mathcal{T}_{4h}, \ldots$. For motivation, the model problem should be kept in mind, but the following development is valid for any BVP considered in this book. Other quantities associated with these meshes will be identified by the subscripts $h, 2h, 4h, \ldots$. For example, $K_h$ will denote the stiffness matrix for mesh $\mathcal{T}_h$, $K_{2h}$ the stiffness matrix for mesh $\mathcal{T}_{2h}$, and so forth. I will denote the exact solution of $K_hU = F_h$ by $U_h$, and similarly for the other meshes. The number of free nodes in $\mathcal{T}_h$ will be denoted by $N_h$, so that $K_h \in \mathbf{R}^{N_h \times N_h}$, $F_h \in \mathbf{R}^{N_h}$, and so forth.
13.2.1
Projecting the equation onto a coarser mesh
Applying $k$ steps of the Gauss-Seidel method to $K_hU = F_h$ yields an estimate $U_h^{(k)}$ of the solution $U_h$. Since the multigrid method switches between meshes, performing iterations on each, it becomes inconvenient to keep track of the iteration number $k$. Therefore, I will denote the current approximate solution on $\mathcal{T}_h$ by $V_h$ rather than by $U_h^{(k)}$. The error $E_h = U_h - V_h$ is expected to be relatively smooth, which has two important implications. First of all, a smooth (low frequency) function on $\mathcal{T}_h$ can be well-represented on $\mathcal{T}_{2h}$; this is not true of a function that contains significant high frequency content. This is illustrated in Figure 13.11. It follows that it should be possible to compute $E_h$ accurately (though not exactly) on $\mathcal{T}_{2h}$. Second, since $E_h$ is smooth, it is mostly made up of low frequency modes on $\mathcal{T}_h$. Referring for the moment to the model problem, this would mean that the significant components of $E_h$ are $W^{(k,\ell)}$, where $1 \le k \le n/2$ and $1 \le \ell \le n/2$. However, on mesh $\mathcal{T}_{2h}$, $k$ is a high frequency if $k > (n/2)/2 = n/4$, and similarly for $\ell$. This means that many of the frequencies that are hard to eliminate on $\mathcal{T}_h$ can be eliminated quickly on $\mathcal{T}_{2h}$.
Figure 13.11. Top: A smooth function on a fine mesh (left) and on a coarse mesh (right). Bottom: An oscillatory function on a fine mesh (left) and on a coarse mesh (right).
I must now explain how to move from $\mathcal{T}_h$ to $\mathcal{T}_{2h}$ when solving $K_hU = F_h$. It is not possible to transfer the equation itself from $\mathcal{T}_h$ to $\mathcal{T}_{2h}$, since the solution $U_h$ may not be well-represented on the coarser mesh ($U_h$ may contain significant high frequency modes). However, the error $E_h = U_h - V_h$ is known to be smooth, so it makes sense to look for an equation satisfied by $E_h$. This is easy to find:

$$K_hE_h = K_h(U_h - V_h) = F_h - K_hV_h = R_h.$$

The residual $R_h = F_h - K_hV_h$ is the amount by which $V_h$ fails to satisfy the equation $K_hU = F_h$. While $E_h$ cannot be computed (if $V_h$ and $E_h$ were both known, then the exact solution $U_h = V_h + E_h$ would be known), $R_h$ is computable once the estimate $V_h$ has been computed. Therefore, $E_h$ satisfies the known equation

$$K_hE_h = R_h.$$

To understand the next step, it is important to recall that each vector $V_h \in \mathbf{R}^{N_h}$ corresponds to a piecewise linear function $v_h$ on $\mathcal{T}_h$ (that is, $v_h \in P_h^{(1)}$). Similarly, $V_{2h} \in \mathbf{R}^{N_{2h}}$ corresponds to a piecewise linear function $v_{2h}$ on $\mathcal{T}_{2h}$ ($v_{2h} \in P_{2h}^{(1)}$). Since $\mathcal{T}_h$ is obtained from $\mathcal{T}_{2h}$ by refinement, it follows that $P_{2h}^{(1)} \subset P_h^{(1)}$. That is, any function $v_{2h}$ that is piecewise linear on $\mathcal{T}_{2h}$ is also piecewise linear on $\mathcal{T}_h$. The function $v_{2h}$ can be represented by the vector $V_{2h}$ containing the nodal values of $v_{2h}$ on $\mathcal{T}_{2h}$. To obtain the corresponding vector $V_h$ containing the nodal values of $v_{2h}$ on $\mathcal{T}_h$, it is necessary to interpolate $V_{2h}$ on $\mathcal{T}_h$. There is an operator $I_{2h,h}$ that takes $V_{2h}$ and produces $V_h$ by interpolation. The operator $I_{2h,h}$ can be represented by a matrix, but all that is needed is to compute the action of $I_{2h,h}$ on an arbitrary vector $V_{2h}$: $I_{2h,h}V_{2h}$. This is easy because each node in $\mathcal{T}_h$ either belongs to $\mathcal{T}_{2h}$ or is the midpoint of an edge in $\mathcal{T}_{2h}$. If $\mathcal{N}_{2h}$ represents the (indices of) nodes of $\mathcal{T}_{2h}$ and $\mathcal{M}_h$ represents the (indices of) nodes in $\mathcal{T}_h$ that are not in $\mathcal{T}_{2h}$, then the following algorithm computes $V_h = I_{2h,h}V_{2h}$. This algorithm is slightly complicated by the fact that a free node in $\mathcal{T}_h$ might be the midpoint of an edge in $\mathcal{T}_{2h}$ with one free node and one constrained node as endpoints. Therefore, the algorithm actually computes $\tilde{V}_h$, the vector of nodal values for all nodes, free and constrained. The components corresponding to free nodes are then extracted.

Copy the components of $V_{2h}$ to the appropriate components of $\tilde{V}_h$ (those corresponding to nodes in $\mathcal{N}_{2h}$).
For $i \in \mathcal{M}_h$,
    $\tilde{V}_h(i) = \frac{1}{2}\left(\tilde{V}_h(\mathrm{end}(i,1)) + \tilde{V}_h(\mathrm{end}(i,2))\right)$.
Extract the components of $\tilde{V}_h$ corresponding to free nodes to obtain $V_h$.

Here $\mathrm{end}(i,1)$, $\mathrm{end}(i,2)$ represent the indices of the endpoints of the edge in $\mathcal{T}_{2h}$ of which $z_i \in \mathcal{T}_h$ is the midpoint. The equation $K_hE_h = R_h$ is now to be solved for a vector corresponding to a piecewise linear function belonging to the subspace $P_{2h}^{(1)}$ of $P_h^{(1)}$. This means that the solution will be of the form $I_{2h,h}E_{2h}$, and the equation becomes

$$K_hI_{2h,h}E_{2h} = R_h. \quad (13.5)$$

Equation (13.5) represents $N_h$ equations in only $N_{2h}$ unknowns, and is therefore overdetermined and unlikely to have a solution. As I mentioned above, it should be possible to compute $E_h$ accurately but not exactly on $\mathcal{T}_{2h}$. To recover a square system, the transpose of $I_{2h,h}$ can be applied to both sides of (13.5):

$$I_{2h,h}^TK_hI_{2h,h}E_{2h} = I_{2h,h}^TR_h. \quad (13.6)$$

Since $I_{2h,h}$ maps $\mathbf{R}^{N_{2h}}$ into $\mathbf{R}^{N_h}$, the transpose $I_{2h,h}^T$ maps $\mathbf{R}^{N_h}$ into $\mathbf{R}^{N_{2h}}$. For this reason, $I_{2h,h}^T$ is sometimes denoted as $I_{h,2h}$. The operator $I_{2h,h}^T$ could also be represented by a matrix, but its action can be computed directly and efficiently without a matrix representation. The following algorithm computes $V_{2h} = I_{2h,h}^TV_h$. Once again, the intermediate vector $\tilde{V}_h$, containing all of the nodal values (free and constrained), is used.

Copy the components of $V_h$ to the appropriate components of $\tilde{V}_h$ (those corresponding to the free nodes of $\mathcal{T}_h$). Initialize the other components of $\tilde{V}_h$ to zero.
For $i \in \mathcal{M}_h$,
    $\tilde{V}_h(\mathrm{end}(i,1)) = \tilde{V}_h(\mathrm{end}(i,1)) + \frac{1}{2}\tilde{V}_h(i)$,
    $\tilde{V}_h(\mathrm{end}(i,2)) = \tilde{V}_h(\mathrm{end}(i,2)) + \frac{1}{2}\tilde{V}_h(i)$.
Extract the components of $\tilde{V}_h$ corresponding to free nodes in $\mathcal{T}_{2h}$ to obtain $V_{2h}$.
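The two transfer algorithms are easiest to see in one space dimension, where every new node of the fine mesh is the midpoint of a coarse interval and its two endpoints are simply its left and right neighbors. The Python sketch below is such a one-dimensional analogue, not the triangular-mesh code described in the text: the names `interpolate` and `restrict` are hypothetical, the nodes are numbered $0, 1, \ldots, n$ with the two boundary nodes constrained, and the list `full` plays the role of the intermediate vector holding all nodal values.

```python
def interpolate(V2h):
    """Action of I_{2h,h} in 1D: coarse free-node values -> fine free-node values."""
    nc = len(V2h) + 1               # number of coarse intervals
    n = 2 * nc                      # number of fine intervals
    full = [0.0] * (n + 1)          # all fine nodes; boundary values stay zero
    for I, v in enumerate(V2h, start=1):
        full[2 * I] = v             # fine nodes that are also coarse nodes
    for i in range(1, n, 2):        # midpoints of coarse intervals
        full[i] = 0.5 * (full[i - 1] + full[i + 1])
    return full[1:n]                # extract the free nodes

def restrict(Vh):
    """Action of the transpose of I_{2h,h} in 1D: fine free-node values -> coarse."""
    n = len(Vh) + 1
    full = [0.0] + list(Vh) + [0.0]  # constrained components initialized to zero
    for i in range(1, n, 2):         # hand half of each midpoint value to each endpoint
        full[i - 1] += 0.5 * full[i]
        full[i + 1] += 0.5 * full[i]
    return [full[2 * I] for I in range(1, n // 2)]  # free nodes of the coarse mesh

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

V2h = [1.0, 2.0, 3.0]
Wh = [1.0, -2.0, 0.5, 4.0, 0.0, 1.5, -1.0]
print(interpolate(V2h))
print(restrict(Wh))
# Transpose property: (I V2h) . Wh equals V2h . (I^T Wh).
print(dot(interpolate(V2h), Wh), dot(V2h, restrict(Wh)))
```

Contributions that land on the constrained boundary nodes during restriction are simply discarded when the free components are extracted, which mirrors the role of the intermediate vector in the algorithms above.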
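13.2.2 The projected equation and the Galerkin idea
It is now necessary to solve, at least approximately, the equation

$$I_{2h,h}^TK_hI_{2h,h}E_{2h} = I_{2h,h}^TR_h.$$

This can be done using the Gauss-Seidel iteration. Before I discuss this, however, I want to prove that the matrix $I_{2h,h}^TK_hI_{2h,h}$ is just the stiffness matrix for the mesh $\mathcal{T}_{2h}$:

$$K_{2h} = I_{2h,h}^TK_hI_{2h,h}.$$

This can be shown by considering arbitrary functions $v_{2h}$ and $w_{2h}$ in $P_{2h}^{(1)}$ and the corresponding vectors $V_{2h}, W_{2h} \in \mathbf{R}^{N_{2h}}$ of nodal values. If $V_h$ and $W_h$ are defined by

$$V_h = I_{2h,h}V_{2h}, \quad W_h = I_{2h,h}W_{2h},$$

and $v_h$, $w_h$ are the corresponding functions in $P_h^{(1)}$, then $v_h = v_{2h}$ and $w_h = w_{2h}$. Therefore,

$$W_{2h} \cdot \left(I_{2h,h}^TK_hI_{2h,h}\right)V_{2h} = W_h \cdot K_hV_h = a(v_h, w_h),$$

where $a(\cdot,\cdot)$ is the usual energy inner product:

$$a(v, w) = \int_\Omega \kappa\nabla v \cdot \nabla w.$$

Since $v_h = v_{2h}$ and $w_h = w_{2h}$, this shows that

$$W_{2h} \cdot \left(I_{2h,h}^TK_hI_{2h,h}\right)V_{2h} = a(v_{2h}, w_{2h}).$$

But $K_{2h}$ is the unique matrix with the property that

$$W_{2h} \cdot K_{2h}V_{2h} = a(v_{2h}, w_{2h}) \quad \text{for all } V_{2h}, W_{2h} \in \mathbf{R}^{N_{2h}},$$

and thus $K_{2h} = I_{2h,h}^TK_hI_{2h,h}$, as desired. It is also true that $F_{2h} = I_{2h,h}^TF_h$ (see Exercise 7). Multiplying by $I_{2h,h}^T$ to produce

$$I_{2h,h}^TK_hI_{2h,h}E_{2h} = I_{2h,h}^TR_h$$

may seem rather arbitrary, but it is actually an instance of the Galerkin method. The system

$$K_hI_{2h,h}E_{2h} = R_h$$

is equivalent to the variational equation

$$W_h \cdot K_hI_{2h,h}E_{2h} = W_h \cdot R_h \quad \text{for all } W_h \in \mathbf{R}^{N_h}. \quad (13.7)$$

As I remarked above, this equation is overdetermined and cannot be solved in general. However, it can be projected onto the subspace $\mathbf{R}^{N_{2h}}$ by requiring that (13.7) hold only for $W_h$ of the form $W_h = I_{2h,h}W_{2h}$, $W_{2h} \in \mathbf{R}^{N_{2h}}$. The new variational equation is then

$$\left(I_{2h,h}W_{2h}\right) \cdot K_hI_{2h,h}E_{2h} = \left(I_{2h,h}W_{2h}\right) \cdot R_h \quad \text{for all } W_{2h} \in \mathbf{R}^{N_{2h}}.$$

But this is equivalent to

$$W_{2h} \cdot I_{2h,h}^TK_hI_{2h,h}E_{2h} = W_{2h} \cdot I_{2h,h}^TR_h \quad \text{for all } W_{2h} \in \mathbf{R}^{N_{2h}},$$

which in turn is equivalent to the system (13.6).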
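The identity relating the coarse stiffness matrix to the fine one can be checked by direct multiplication in a small case. The Python sketch below does this for a one-dimensional analogue: linear elements for $-u''$ on the unit interval with Dirichlet conditions, for which the stiffness matrix is $(1/h)\,\mathrm{tridiag}(-1, 2, -1)$. The one-dimensional setting and all function names are assumptions made for illustration; the argument in the text covers the general case.

```python
def stiffness_1d(n):
    """(n-1) x (n-1) stiffness matrix for -u'' on (0,1), linear elements, h = 1/n."""
    h = 1.0 / n
    m = n - 1
    return [[(2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0) / h
             for j in range(m)] for i in range(m)]

def interpolation_matrix(n):
    """(n-1) x (n/2-1) matrix representing I_{2h,h} in 1D (n even)."""
    P = [[0.0] * (n // 2 - 1) for _ in range(n - 1)]
    for I in range(1, n // 2):
        f = 2 * I                 # fine-mesh number of coarse node I (1-based)
        P[f - 1][I - 1] = 1.0     # the shared node
        P[f - 2][I - 1] = 0.5     # midpoint to the left
        P[f][I - 1] = 0.5         # midpoint to the right
    return P

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

n = 8
Kh = stiffness_1d(n)
P = interpolation_matrix(n)
galerkin = matmul(transpose(P), matmul(Kh, P))   # I^T K_h I
K2h = stiffness_1d(n // 2)                       # stiffness matrix assembled on the coarse mesh
max_diff = max(abs(galerkin[i][j] - K2h[i][j])
               for i in range(len(K2h)) for j in range(len(K2h)))
print("max |I^T K_h I - K_2h| =", max_diff)
```

In practice the coarse matrix would be assembled directly on the coarse mesh; the triple product is formed here only to confirm that the two agree.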
13.2.3 The two-grid multigrid algorithm The basic two-grid multigrid algorithm can be summarized as follows: 1. Do n\ Gauss-Seidel iterations on KhU — Fh to get an estimate V/7 of the solution Uh. 2. Compute the residual Rh — Fh — KhVh and project it onto T2h to get I^h hRh3. Do n\ + n2 Gauss-Seidel iterations on K2f,E — I^h hRh to get an estimate D2h of EU4. Interpolate D2h onto TH and update the approximate solution on the fine mesh:
296
Chapter 13. The multigrid method
It is easy to verify empirically that the corrected Vh, while more accurate than the initial V/,, tends to have relatively more high frequency errors. This is because, during step 3, the values of Eh are estimated more accurately at the nodes of Tih (where, after all, the computation of Dih is performed) than at the nodes of Th that do not belong to Tih- Therefore, it is usual to add a fifth step to the above algorithm: 5. Do HI Gauss-Seidel iterations on K^U — Fh, beginning with the value of V/, from step 4. Replace V/, with this improved estimate of £//,. Step 1 is calledpresmoothing, steps 2 to 4 coarse-grid correction, and step 5 postsmoothing. In a two-dimensional problem, computations on Tih cost about one fourth of the analogous computations on Th- Applying the intermesh transfer operators lih,h and I*h h during coarse-grid correction costs significantly less than one Gauss-Seidel iteration on Th, and therefore will be ignored. Computing the residual Rh = Ft, — Kh Vf, costs about the same as one Gauss-Seidel iteration. Therefore, the total cost of the two-grid algorithm is no more than (5/4)(nj + «2 + 1) Gauss-Seidel iterations on K^U — Fh. It makes sense, then, to compare the performance of the two-grid algorithm and Gauss-Seidel for the equivalent amount of work. EXAMPLE 13.5. The finite element method with linear Lagrange triangles is applied to the BVP
where $\kappa(x, y) = 1 + x^2 + 2y^2$, $\Omega$ is the unit square, and $f$ is chosen so that the exact solution is $u(x, y) = x(1 - x)\sin(\pi y)$. The coarse and fine meshes are shown in Figure 13.12. Table 13.1 shows the error in the computed solution for the two-grid algorithm described above and for the equivalent number of Gauss-Seidel iterations. The parameters $n_1 = 2$ and $n_2 = 1$ were used. The results show that the two-grid algorithm is significantly more efficient than the Gauss-Seidel iteration.
As in the preceding example, the cost of various multigrid algorithms can be compared to the cost of Gauss-Seidel iterations on the finest mesh. In this context, the computational cost of one Gauss-Seidel iteration on the finest mesh is called one work unit.
13.3
The multigrid V-cycle
The previous example shows that the two-grid algorithm is an improvement over Gauss-Seidel; however, the convergence of the two-grid method is still not particularly good. The real power of the multigrid idea lies in applying coarse-grid correction recursively. When solving the equation $K_{2h} E = I_{h,2h} R_h$ on $\mathcal{T}_{2h}$, a coarse-grid correction can be applied using $\mathcal{T}_{4h}$. Similarly, when solving the equation on $\mathcal{T}_{4h}$, a coarse-grid correction using $\mathcal{T}_{8h}$ can be applied. The resulting method is called the multigrid V-cycle. The name arises from the schematic shown in Figure 13.13, which shows the method beginning on the finest mesh (at the top), descending to the coarsest mesh (at the bottom), and then returning to the finest mesh. On the way down, presmoothing iterations are performed; on the way up, postsmoothing is done.
Table 13.1. Comparison of the two-grid algorithm with Gauss-Seidel. The number of Gauss-Seidel iterations is chosen so that the two methods will use roughly the same amount of computational work. The ratio in the fifth column is the error in the solution produced by Gauss-Seidel divided by the error in the solution produced by the two-grid algorithm.
Figure 13.12. The coarse and fine meshes for Example 13.5.
The multigrid V-cycle algorithm is defined in Algorithm 13.1. This algorithm can be iterated as many times as needed to reduce the residual $R_h$ to an acceptable level. The first time the V-cycle is invoked, the initial estimate $V_h = 0$ is used. Since the parameters $n_1$, $n_2$ (the number of presmoothing and postsmoothing iterations, respectively) determine the cost and rate of convergence of the V-cycle, the V-cycle with these parameters is often called the V($n_1$, $n_2$)-cycle. As the following example shows, the V-cycle is much more effective than the two-grid algorithm. However, it is not much more costly. Since a Gauss-Seidel iteration on $\mathcal{T}_{2^j h}$ costs only $1/4^j$ of an iteration on $\mathcal{T}_h$, the total cost of the V-cycle is
$$(n_1 + n_2 + 1)\left(1 + \frac{1}{4} + \frac{1}{4^2} + \cdots\right) < \frac{4}{3}(n_1 + n_2 + 1)$$
work units.
Figure 13.13. The multigrid V-cycle.
This should be compared with an upper bound of $(5/4)(n_1 + n_2 + 1)$ for the two-grid algorithm; the cost of the V-cycle is only about 8% more.
Given an estimate $V_h$ of $U_h$:
    Perform $n_1$ iterations of Gauss-Seidel on $K_h U = F_h$ to improve the estimate $V_h$ of $U_h$
    Compute the residual $R_h = F_h - K_h V_h$
    for $j = 1, 2, \ldots, k$
        Project $R_{2^{j-1}h}$ onto $\mathcal{T}_{2^j h}$ to get $F_{2^j h} = I_{2^{j-1}h,2^j h} R_{2^{j-1}h}$
        Starting with the zero vector, perform $n_1$ iterations of Gauss-Seidel on $K_{2^j h} E = F_{2^j h}$ ...
V-cycle its   Rel. error        Gauss-Seidel its   Rel. error       Ratio
1             8.802 · 10^-2     6                  7.906 · 10^-1    8.98 · 10^0
2             8.119 · 10^-3     11                 6.452 · 10^-1    7.95 · 10^1
3             7.847 · 10^-4     16                 5.262 · 10^-1    6.71 · 10^2
4             7.767 · 10^-5     22                 4.117 · 10^-1    5.30 · 10^3
5             7.755 · 10^-6     27                 3.354 · 10^-1    4.32 · 10^4
6             7.741 · 10^-7     32                 2.731 · 10^-1    3.53 · 10^5
7             7.701 · 10^-8     38                 2.134 · 10^-1    2.77 · 10^6
8             7.654 · 10^-9     43                 1.737 · 10^-1    2.27 · 10^7
9             7.666 · 10^-10    48                 1.413 · 10^-1    1.84 · 10^8
10            7.839 · 10^-11    54                 1.103 · 10^-1    1.41 · 10^9
Table 13.2. Comparison of the multigrid V(2, 1)-cycle with Gauss-Seidel. The number of Gauss-Seidel iterations is chosen so that the two methods will use roughly the same amount of computational work. The ratio in the fifth column is the error in the solution produced by Gauss-Seidel divided by the error in the solution produced by the V-cycle.
EXAMPLE 13.6. The V(2, 1)-cycle is applied to the system $K_h U = F_h$ from Example 13.5, using a sequence of four meshes, with 8, 32, 128, and 512 triangles. Table 13.2 shows the error in the computed solution for the V-cycle algorithm described above and for the equivalent number of Gauss-Seidel iterations.
The multigrid V-cycle is conveniently expressed and implemented as a recursive algorithm. In this context, it is more natural to write the relevant system on each level as $K_{2^j h} U_{2^j h} = F_{2^j h}$, even though $F_{2^j h}$ is not the load vector except on the finest mesh. Similarly, $V_{2^j h}$ is not an estimate of the desired solution except on the finest mesh; on the coarser meshes, it represents an estimate of the error on the next finer mesh. With this understanding, the multigrid V-cycle can be expressed as a recursive algorithm $V_h \leftarrow \mathrm{mgv}(K_h, F_h, V_h)$ for replacing an estimate $V_h$ of the solution of $K_h U_h = F_h$ with a better estimate. The recursion is expressed succinctly as Algorithm 13.2.
Given an estimate $V_h$ of the solution of $K_h U_h = F_h$:
    Perform $n_1$ iterations of Gauss-Seidel to improve the estimate $V_h$
    If this is not the coarsest mesh:
        Compute $F_{2h} = I_{h,2h}(F_h - K_h V_h)$ and set $V_{2h} = 0$
        Call mgv recursively: $V_{2h} \leftarrow \mathrm{mgv}(K_{2h}, F_{2h}, V_{2h})$
        Correct $V_h$: $V_h \leftarrow V_h + I_{2h,h} V_{2h}$
    Perform $n_2$ iterations of Gauss-Seidel to improve the estimate $V_h$
Algorithm 13.2. The recursive form of the multigrid V-cycle for solving $K_h U = F_h$ on the mesh $\mathcal{T}_h$.
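Algorithm 13.2 translates almost line for line into code. The following Python sketch is not the book's MATLAB implementation; it assumes a one-dimensional model problem ($-u'' = f$ on $(0, 1)$ with linear elements), dense NumPy matrices, and Galerkin coarse-mesh matrices $K_{2h} = I_{2h,h}^T K_h I_{2h,h}$. The helper names (`stiffness_1d`, `interpolation_1d`, `build_hierarchy`, `gauss_seidel`) are hypothetical.

```python
import numpy as np

def stiffness_1d(n):
    """Stiffness matrix of -u'' = f on (0, 1): linear elements, n interior nodes."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h

def interpolation_1d(nc):
    """Linear interpolation from nc coarse interior nodes to 2*nc + 1 fine ones."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j, j] = 0.5       # fine node just left of coarse node j
        P[2 * j + 1, j] = 1.0   # fine node that coincides with coarse node j
        P[2 * j + 2, j] = 0.5   # fine node just right of coarse node j
    return P

def build_hierarchy(n_fine):
    """Matrices Ks[0] (finest) ... Ks[-1] (coarsest) and interpolations Ps."""
    Ks, Ps = [stiffness_1d(n_fine)], []
    n = n_fine
    while n > 1:
        n = (n - 1) // 2
        P = interpolation_1d(n)
        Ps.append(P)
        Ks.append(P.T @ Ks[-1] @ P)   # Galerkin coarse-mesh matrix (assumption)
    return Ks, Ps

def gauss_seidel(K, F, V, iters):
    """Forward Gauss-Seidel sweeps: V <- V + (D + L)^{-1} (F - K V)."""
    lower = np.tril(K)
    for _ in range(iters):
        V = V + np.linalg.solve(lower, F - K @ V)
    return V

def mgv(Ks, Ps, level, F, V, n1=2, n2=1):
    """One V(n1, n2)-cycle on mesh `level`, following Algorithm 13.2."""
    K = Ks[level]
    V = gauss_seidel(K, F, V, n1)                 # presmoothing
    if level + 1 < len(Ks):                       # not the coarsest mesh
        P = Ps[level]
        F2 = P.T @ (F - K @ V)                    # restrict the residual
        V2 = mgv(Ks, Ps, level + 1, F2, np.zeros(P.shape[1]), n1, n2)
        V = V + P @ V2                            # coarse-grid correction
    return gauss_seidel(K, F, V, n2)              # postsmoothing

n = 63                                            # must have the form 2**k - 1
Ks, Ps = build_hierarchy(n)
F = np.full(n, 1.0 / (n + 1))                     # load vector for f = 1
U = np.linalg.solve(Ks[0], F)                     # reference solution
V = np.zeros(n)
for _ in range(10):
    V = mgv(Ks, Ps, 0, F, V)
rel_err = np.linalg.norm(U - V) / np.linalg.norm(U)
print(rel_err)
```

Passing the whole hierarchy plus a level index, rather than the three arguments of mgv in the text, is only a convenience so that the recursion can find the next coarser matrix.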
13.3.1 W-cycles and $\mu$-cycles
As Example 13.6 shows, a single V-cycle may not be enough to solve the equation accurately. An alternative to simply performing multiple V-cycles is to do multiple cycles on the coarse-grid equations. This makes the coarse-grid correction more accurate without incurring additional (relatively expensive) computations on the finest mesh. When the coarse-grid correction cycle is performed $\mu$ times, the result is called a $\mu$-cycle. The recursive algorithm $V_h \leftarrow \mathrm{mg}\mu(K_h, F_h, V_h)$ is presented in Algorithm 13.3.
Given an estimate $V_h$ of the solution of $K_h U_h = F_h$:
    Perform $n_1$ iterations of Gauss-Seidel to improve the estimate $V_h$
    If this is not the coarsest mesh:
        Compute $F_{2h} = I_{h,2h}(F_h - K_h V_h)$ and set $V_{2h} = 0$
        Call mg$\mu$ recursively $\mu$ times:
            for $j = 1, 2, \ldots, \mu$
                $V_{2h} \leftarrow \mathrm{mg}\mu(K_{2h}, F_{2h}, V_{2h})$
        Correct $V_h$: $V_h \leftarrow V_h + I_{2h,h} V_{2h}$
    Perform $n_2$ iterations of Gauss-Seidel to improve the estimate $V_h$
Algorithm 13.3. The recursive form of the multigrid $\mu$-cycle for solving $K_h U = F_h$ on the mesh $\mathcal{T}_h$.
When $\mu = 1$, the $\mu$-cycle is just the V-cycle, while $\mu = 2$ results in the W-cycle, which is illustrated (for four meshes) in Figure 13.14. The name W($n_1$, $n_2$)-cycle is used to specify the number of Gauss-Seidel iterations at each step. Exercise 5 asks the reader to show that the cost of the W($n_1$, $n_2$)-cycle is bounded by $2(n_1 + n_2 + 1)$ work units. Therefore, for example, a W(1, 1)-cycle costs roughly the same as a V(2, 1)-cycle. Increasing $\mu$ beyond $\mu = 2$ produces little additional accuracy for the additional computational cost; therefore, in practice, only the V-cycle and the W-cycle are used.
EXAMPLE 13.7. This example applies the W(1, 1)-cycle to the system $K_h U = F_h$ from Example 13.5. Table 13.3 shows the error in the computed solution for the W-cycle algorithm described above and for the equivalent number of Gauss-Seidel iterations. The results should also be compared with Table 13.2. For this example, the W(1, 1)-cycle is somewhat more efficient than the V(2, 1)-cycle.
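The work-unit counts quoted for the V-cycle and the W-cycle come from a simple recurrence: a $\mu$-cycle visits the mesh $j$ levels below the finest $\mu^j$ times, and each visit costs about $(n_1 + n_2 + 1)/4^j$ work units. The short Python sketch below evaluates that sum; the function name `cycle_cost` is hypothetical, and the estimate charges a residual computation on every mesh, including the coarsest, so it slightly overestimates.

```python
def cycle_cost(mu, n1, n2, levels):
    """Estimated work units for one mu-cycle on `levels` nested 2D meshes."""
    per_visit = n1 + n2 + 1      # smoothing sweeps plus one residual computation
    cost, visits = 0.0, 1
    for j in range(levels):
        cost += visits * per_visit / 4**j   # mesh j costs 1/4**j of the finest
        visits *= mu                        # each visit spawns mu coarser visits
    return cost

v_cost = cycle_cost(1, 2, 1, levels=4)   # V(2, 1)-cycle on four meshes
w_cost = cycle_cost(2, 1, 1, levels=4)   # W(1, 1)-cycle on four meshes
print(v_cost, w_cost)
```

As the number of meshes grows, the two values approach $(4/3)(n_1 + n_2 + 1)$ and $2(n_1 + n_2 + 1)$, respectively, which are the bounds stated in the text.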
Figure 13.14. The multigrid W-cycle.

W-cycle its   Rel. error        Gauss-Seidel its   Rel. error       Ratio
1             2.350 · 10^-2     6                  7.906 · 10^-1    3.36 · 10^1
2             5.464 · 10^-4     12                 6.194 · 10^-1    1.13 · 10^3
3             4.077 · 10^-5     18                 4.849 · 10^-1    1.19 · 10^4
4             2.397 · 10^-6     24                 3.793 · 10^-1    1.58 · 10^5
5             2.131 · 10^-7     30                 2.965 · 10^-1    1.39 · 10^6
6             2.213 · 10^-8     36                 2.317 · 10^-1    1.05 · 10^7
7             2.348 · 10^-9     42                 1.810 · 10^-1    7.71 · 10^7
8             2.583 · 10^-10    48                 1.413 · 10^-1    5.47 · 10^8
9             (missing)         54                 1.103 · 10^-1    3.85 · 10^9
10            (missing)         60                 8.614 · 10^-2    2.70 · 10^10
Table 13.3. Comparison of the multigrid W-cycle with Gauss-Seidel. The number of Gauss-Seidel iterations is chosen so that the two methods will use roughly the same amount of computational work. The ratio in the fifth column is the error in the solution produced by Gauss-Seidel divided by the error in the solution produced by the W-cycle.

13.4
Full multigrid
Multigrid V-cycles and W-cycles perform well, but there is one way in which they might be improved: The initial estimate of $U_h$ (on the finest mesh) is taken to be $V_h = 0$, and it is natural to try to do better. A simple idea is to estimate the solution to $K_{2h} U = F_{2h}$ using the mesh $\mathcal{T}_{2h}$, and then interpolate this solution onto $\mathcal{T}_h$ to use as the initial $V_h$. However, there is no reason to start on $\mathcal{T}_{2h}$; an initial estimate of the solution to $K_{2h} U = F_{2h}$ can be obtained by estimating the solution to $K_{4h} U = F_{4h}$ on $\mathcal{T}_{4h}$, and so forth. The full multigrid algorithm begins by solving the equation $K_{2^k h} U = F_{2^k h}$ on the coarsest mesh $\mathcal{T}_{2^k h}$ and then interpolating to obtain an initial estimate of the solution $U_{2^{k-1}h}$ of $K_{2^{k-1}h} U = F_{2^{k-1}h}$, which is then improved by $n$ $\mu$-cycles. This estimate of $U_{2^{k-1}h}$ is then interpolated onto $\mathcal{T}_{2^{k-2}h}$ and used as an initial estimate of $U_{2^{k-2}h}$. Again, $n$ $\mu$-cycles are used
to improve the estimate of $U_{2^{k-2}h}$, and the process continues. The full multigrid algorithm is easily expressed, in Algorithm 13.4, in terms of the $\mu$-cycle. Figure 13.15 shows the order in which the meshes are visited for $\mu = 1$ (the V-cycle version) and $n = 1$. The full multigrid algorithm can also be expressed recursively, as in Algorithm 13.5, in the form $V_h \leftarrow \mathrm{fmg}(K_h, F_h)$. The reader will note that the full multigrid algorithm is determined by the parameters $n_1$ (presmoothing iterations), $n_2$ (postsmoothing iterations), $\mu$ ($\mu = 1$ for V-cycle, $\mu = 2$ for W-cycle), and $n$ (the number of $\mu$-cycles at each level).
In the full multigrid algorithm, $n$ $\mu$-cycles are performed at each of the meshes $\mathcal{T}_{2^{k-1}h}, \mathcal{T}_{2^{k-2}h}, \ldots, \mathcal{T}_h$. The cost of a $\mu$-cycle on $\mathcal{T}_{2^j h}$ is about $1/4^j$ times that on $\mathcal{T}_h$. Therefore, reasoning as before, the total cost of the full multigrid is bounded by 4/3 times the cost of $n$ $\mu$-cycles, or a total of $(16/9)n(n_1 + n_2 + 1)$ work units for the V-cycle version and $(8/3)n(n_1 + n_2 + 1)$ work units for the W-cycle version.
Algorithm 13.4. The full multigrid algorithm for solving $K_h U = F_h$ on the mesh $\mathcal{T}_h$. The nested family of meshes is assumed to be $\mathcal{T}_{2^k h}, \ldots, \mathcal{T}_{2h}, \mathcal{T}_h$.
Figure 13.15. The full multigrid algorithm.
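The prose description of full multigrid can be turned into a short program. The Python sketch below follows that description rather than the book's Algorithm 13.4 or its MATLAB code, and it rests on several assumptions: a one-dimensional model problem with linear elements, Galerkin coarse-mesh matrices, coarse load vectors obtained by restriction ($F_{2h} = I_{h,2h} F_h$, the relation stated in Exercise 7), a load vector computed by a simple quadrature rule, and an exact solve on the coarsest mesh. All function names are hypothetical.

```python
import numpy as np

def interpolation_1d(nc):
    """Linear interpolation from nc coarse interior nodes to 2*nc + 1 fine ones."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j, j], P[2 * j + 1, j], P[2 * j + 2, j] = 0.5, 1.0, 0.5
    return P

def gauss_seidel(K, F, V, iters):
    """Forward Gauss-Seidel sweeps on K U = F starting from V."""
    lower = np.tril(K)
    for _ in range(iters):
        V = V + np.linalg.solve(lower, F - K @ V)
    return V

def mu_cycle(Ks, Ps, level, F, V, n1, n2, mu):
    """One mu-cycle on mesh `level` (mu = 1: V-cycle, mu = 2: W-cycle)."""
    K = Ks[level]
    V = gauss_seidel(K, F, V, n1)
    if level + 1 < len(Ks):
        P = Ps[level]
        F2 = P.T @ (F - K @ V)
        V2 = np.zeros(P.shape[1])
        for _ in range(mu):
            V2 = mu_cycle(Ks, Ps, level + 1, F2, V2, n1, n2, mu)
        V = V + P @ V2
    return gauss_seidel(K, F, V, n2)

def fmg(Ks, Ps, F, n=1, n1=1, n2=1, mu=1):
    """Full multigrid: coarsest solve, then interpolate and run n mu-cycles per mesh."""
    Fs = [F]
    for P in Ps:                                # load vectors on the coarser meshes
        Fs.append(P.T @ Fs[-1])
    V = np.linalg.solve(Ks[-1], Fs[-1])         # exact solve on the coarsest mesh
    for level in range(len(Ks) - 2, -1, -1):    # work upward to the finest mesh
        V = Ps[level] @ V                       # initial estimate on mesh `level`
        for _ in range(n):
            V = mu_cycle(Ks, Ps, level, Fs[level], V, n1, n2, mu)
    return V

# Model hierarchy: 63, 31, 15, 7, 3, 1 interior nodes.
n_fine = 63
h = 1.0 / (n_fine + 1)
Ks = [(2.0 * np.eye(n_fine) - np.eye(n_fine, k=1) - np.eye(n_fine, k=-1)) / h]
Ps = []
while Ks[-1].shape[0] > 1:
    P = interpolation_1d((Ks[-1].shape[0] - 1) // 2)
    Ps.append(P)
    Ks.append(P.T @ Ks[-1] @ P)

x = np.linspace(h, 1.0 - h, n_fine)
F = h * np.pi**2 * np.sin(np.pi * x)            # quadrature load for u = sin(pi x)
U = np.linalg.solve(Ks[0], F)                   # exact solution of K U = F
V = fmg(Ks, Ps, F)
alg_err = np.linalg.norm(U - V) / np.linalg.norm(U)
print(alg_err)
```

The defaults correspond to $(\mu, n, n_1, n_2) = (1, 1, 1, 1)$; passing `mu=2` gives the W-cycle version.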
Algorithm 13.5. Recursive version of the full multigrid algorithm for solving $K_h U = F_h$ on the mesh $\mathcal{T}_h$. The nested family of meshes is assumed to be $\mathcal{T}_{2^k h}, \ldots, \mathcal{T}_{2h}, \mathcal{T}_h$.
13.4.1
Discretization, algebraic, and total errors
To evaluate the performance of full multigrid and, in particular, to compare it with the $\mu$-cycle, it is helpful to consider the errors involved in solving $K_h U = F_h$ and to think about the goal of an iterative algorithm. When a BVP is discretized by the finite element method and the system $K_h U = F_h$ is solved to get an estimate of $U_h$, there are actually three "solutions" involved. One is the exact (continuous) solution $u$ of the BVP. The second is $U_h$, the exact solution of the discretized problem $K_h U = F_h$. Finally, there is the computed solution $V_h$. The vectors $U_h$ and $V_h$ correspond to piecewise polynomials $u_h$ and $v_h$, respectively. In solving the system $K_h U = F_h$, one is nominally trying to estimate $U_h$ (and hence $u_h$). However, the real goal is that $v_h$ approximate $u$. The error in $v_h$ can be bounded by the triangle inequality:
$$\|u - v_h\|_E \le \|u - u_h\|_E + \|u_h - v_h\|_E.$$
The error $u - u_h$ (or its norm) is referred to as the discretization error, while $u_h - v_h$ is the algebraic error and $u - v_h$ is the total error in $v_h$. There is no point in expending computational effort reducing the algebraic error much below the discretization error, since doing so reduces the total error in $v_h$ marginally at best. Therefore, to compare the $\mu$-cycle with full multigrid, it is necessary to determine which algorithm reduces the error in $v_h$ to the level of discretization more efficiently. Since the $\mu$-cycle is determined by three parameters ($\mu$, $n_1$, $n_2$) and full multigrid by four ($\mu$, $n$, $n_1$, $n_2$), a precise comparison is complicated. The following example will give the reader an idea of how the various possibilities compare on a specific problem.
EXAMPLE 13.8. This example considers the BVP from Example 13.5 on a sequence of seven meshes. The number of free nodes ranges from 1 on the coarsest mesh to 16129 on the finest. Five different multigrid algorithms were applied: V(2, 1)-cycle, W(1, 1)-cycle, and three versions of the full multigrid algorithm, $(\mu, n, n_1, n_2) = (1, 1, 2, 1)$, $(\mu, n, n_1, n_2) = (2, 1, 1, 1)$, $(\mu, n, n_1, n_2) = (1, 2, 1, 1)$. To compare the effectiveness of the algorithms, the ratio of algebraic error to discretization error was recorded. (Since the exact solution is known for this problem, the discretization error can be computed.) The results are reported in Table 13.4.

V-cycle
Its   WU      $\|u_h - v_h\|_E / \|u - u_h\|_E$
1     5.33    7.14
2     10.67   0.767
3     16      0.091
4     21.33   0.011

W-cycle
Its   WU      $\|u_h - v_h\|_E / \|u - u_h\|_E$
1     6       0.542
2     12      0.028
3     18      0.002

Full multigrid
$(\mu, n, n_1, n_2)$   WU      $\|u_h - v_h\|_E / \|u - u_h\|_E$
(1, 1, 2, 1)           7.11    0.027
(2, 1, 1, 1)           8       0.093
(1, 2, 1, 1)           10.67   0.006
Table 13.4. A comparison of different multigrid algorithms on a model problem (see Example 13.8). WU denotes work units.

This example suggests that the full multigrid algorithm is more efficient than either the V-cycle or the W-cycle alone. Roughly speaking, solving the system $K_h U = F_h$ to the level of discretization required 16 work units using the V(2, 1)-cycle, 12 work units using the W(1, 1)-cycle, and 7 work units using a full multigrid scheme.
The reader should notice how remarkable these results are. One work unit equates to one Gauss-Seidel iteration on the finest mesh. An example from the previous chapter would suggest that 10 iterations of Gauss-Seidel would not begin to solve the system satisfactorily; indeed, 1000 iterations of Gauss-Seidel for this problem produced
It would take many thousands of Gauss-Seidel iterations to solve the system to the level of discretization. The best nonmultigrid algorithm presented in this text is the hierarchical basis CG (HCG) method, an iteration of which is somewhat more costly than a Gauss-Seidel iteration. Ten iterations of HCG produced
Even the powerful HCG algorithm does not approach the level of discretization for this amount of computational cost.
13.5 The MATLAB implementation
The MATLAB code for the algorithms from this chapter assumes that the inputs are the quantities from the finest mesh (and, in particular, the finest mesh itself). It is therefore necessary to extract a coarser mesh from its refinement; this is inexpensive and coded in unRefine1. The other supporting functions are Interpolate1, the intermesh transfer operator $I_{2h,h}$, and its transpose, InterpolateTrans1.
13.5.1
MATLAB functions
• mgmu1 Applies the multigrid $\mu$-cycle to the model problem (implemented using recursion).
• fullmg1 Applies the full multigrid algorithm to the model problem (implemented using recursion).
• Interpolate1 Implements the intermesh transfer operator $I_{2h,h}$.
• InterpolateTrans1 Implements the intermesh transfer operator $I_{h,2h} = I_{2h,h}^T$.
• unRefine1 Extracts the original mesh from a mesh produced by Refine1.
13.6
Exercises for Chapter 13
1. Use trigonometric identities to show that (13.3a)-(13.3e) hold.
2. Suppose $\{u_1, u_2, \ldots, u_n\}$ is an orthonormal basis for $\mathbf{R}^n$, and $v$ is any vector in $\mathbf{R}^n$. Show that
(Hint: Since $\{u_1, u_2, \ldots, u_n\}$ is a basis, $v$ can be written as $v = \sum_{i=1}^n \alpha_i u_i$. Solve for $\alpha_j$ by taking the dot product of both sides with $u_j$.)
3. Show that the eigenvalues of $B_\omega$ are $\omega\theta_{k,\ell} + 1 - \omega$.
4. Show that $\omega$ should be chosen to be 2/3 in the weighted Jacobi method to damp the high frequencies. (Hint: First show that
Then show that
where the optimal value of $\omega$ is 2/3.)
5. Suppose $n_1$ presmoothing and $n_2$ postsmoothing Gauss-Seidel steps are used. Show that the cost of a W-cycle is bounded by $2(n_1 + n_2 + 1)$ work units.
6. In Section 13.2.2, it was shown that (13.6) is the result of applying Galerkin's method to (13.5). It follows that the solution of (13.6) must be the best approximation to the solution of (13.5) in some sense. State this conclusion precisely.
7. Show that the load vector $F_{2h}$ on $\mathcal{T}_{2h}$ is characterized by the condition
Use this fact to show that $F_{2h} = I_{h,2h} F_h$, where $F_h$ is the load vector on $\mathcal{T}_h$.
8. (MATLAB) Consider the BVP
where $\Omega$ is the unit circle and
The exact solution is $u(x, y) = (1 - x^2 - y^2)e^x$.
(a) Create a mesh on $\Omega$ as in Example 6.6, refining the coarse mesh four times.
(b) Compute the stiffness matrix $K$ and the load vector $F$.
(c) Compute the "exact" finite element solution $u_h$ by solving $KU = F$ using the built-in solver in MATLAB. Compute the discretization error.
(d) Solve $KU = F$ using the multigrid V(2, 1)-cycle, and record the algebraic error after each cycle. How many cycles are required to solve the equations to the level of discretization (for example, algebraic error at most 10% of discretization error)?
(e) Solve $KU = F$ using the full multigrid method with $n_1 = n_2 = \mu = n = 1$. Is the system solved to the level of discretization?
9. (MATLAB) Repeat the previous exercise, using the BVP from Example 7.1. Find the combination of parameters $n_1$, $n_2$, $\mu$, and $n$ with which the full multigrid algorithm most efficiently solves $KU = F$ to the level of discretization.
10. (MATLAB) In Section 13.3.1, it was stated that a $\mu$-cycle with $\mu$ greater than 2 is not cost effective. Using the BVPs from the previous two exercises, demonstrate that this is true.
11. (MATLAB) Consider the BVP
where $\Omega$ is the unit circle. Establish a mesh on $\Omega$ by starting with a coarse triangulation with four triangles (as in Figure 6.11) and refining it five times. Form the finite element equations $KU = F$ and solve them using the full multigrid method with each of the following choices for $(n_1, n_2, \mu, n)$: (2, 1, 1, 1), (1, 1, 2, 1), (1, 1, 1, 2). How many work units are required by each method? What is the algebraic error produced by each method?
12. (MATLAB) Repeat the previous exercise, but take $\Omega$ to be the polygon with vertices (0, 0), (1, 0), (1, 1), (-1, 1), (-1, -1), and (0, -1). Start with a coarse mesh consisting of six triangles and refine it five times to obtain the final mesh.
Part IV
Adaptive Methods
Chapter 14
Adaptive mesh generation
When solving a BVP, the goal is to obtain a solution that is sufficiently accurate; frequently the exact details of how this is accomplished are unimportant. This is especially true of the mesh; in many cases, it would be desirable if the finite element algorithm could automatically generate a suitable mesh. That is the topic of this part of the book. In order for the computed solution to be accurate, it is necessary that the mesh be fine enough to represent the variation in the true solution; if the solution is changing rapidly, then the mesh must be quite fine. On the other hand, if the solution changes slowly, then a coarser mesh will suffice. Many solutions have the property that they change rapidly over part of the computational domain $\Omega$ and slowly over other parts. An example is the function
which has a sharp peak at (x, y) = (0.5, 0.75). Two meshes on the unit square are shown in Figure 14.1. One is a uniform mesh with 2048 triangles, while the other is refined in the neighborhood of the point (0.5, 0.75) and has only 858 triangles. The piecewise linear interpolants of u on these two meshes are shown in Figure 14.2. The error in the interpolant is less on the locally refined mesh, whether measured in the energy norm (0.2912 versus 0.3534) or the maximum pointwise error (0.03078 versus 0.04538).
Figure 14.1. Two meshes on the unit square. The mesh on the left has 2048 triangles, while the mesh on the right has 858 triangles.
Figure 14.2. The piecewise linear interpolants of the function (14.1) on the meshes of Figure 14.1.
Adaptive finite element methods try to determine a locally refined mesh, like the one in the previous example, automatically during the process of solving a BVP. The basic algorithm is this:
Given an initial mesh $\mathcal{T}$:
    repeat
        Solve the BVP on $\mathcal{T}$
        Estimate the error in the computed solution on each element
        If the total error is sufficiently small, stop
        Otherwise, use the error estimates to select certain elements to refine
        Locally refine the mesh
There are therefore three components to an adaptive algorithm:
1. an element-by-element error estimator;
2. a strategy for choosing which triangles to refine; and
3. an algorithm for locally refining a mesh.
There are a number of possible choices for each of these algorithms. Indeed, adaptive finite element methods form an ongoing area of research, and there is no currently accepted "best" algorithm. In this chapter, I will discuss algorithms for local refinement of a triangulation and triangle selection strategies. I end the chapter with a conceptually simple but expensive error estimator, and show how an adaptive algorithm would perform. In the next chapter, I present several efficient error estimators.
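The structure of the adaptive loop (estimate, test, select, refine) can be seen in a deliberately simplified setting. The Python sketch below is a one-dimensional stand-in, not a finite element code: the "solve" step is replaced by piecewise linear interpolation of a known function, the error indicator is the interpolation error sampled at each element midpoint (possible only because the function is known), and an element is selected when its indicator exceeds a fixed fraction of the largest one. The test function, the fraction 0.5, and all names are assumptions made for illustration.

```python
import numpy as np

def u(x):
    """Hypothetical test function with a sharp peak at x = 0.75."""
    return np.exp(-200.0 * (x - 0.75) ** 2)

def element_errors(nodes):
    """Interpolation error sampled at the midpoint of each element."""
    mid = 0.5 * (nodes[:-1] + nodes[1:])
    interp_mid = 0.5 * (u(nodes[:-1]) + u(nodes[1:]))
    return np.abs(u(mid) - interp_mid)

def adapt(nodes, tol, fraction=0.5, max_steps=50):
    """Refine elements with large indicators until all indicators are <= tol."""
    for _ in range(max_steps):
        err = element_errors(nodes)              # "estimate the error"
        if err.max() <= tol:                     # "if small enough, stop"
            break
        marked = err > fraction * err.max()      # "select elements to refine"
        mid = 0.5 * (nodes[:-1] + nodes[1:])
        nodes = np.sort(np.concatenate([nodes, mid[marked]]))  # bisect them
    return nodes

nodes = adapt(np.linspace(0.0, 1.0, 9), tol=1e-3)
h = np.diff(nodes)
print(len(nodes), h.min(), h.max())
```

In one dimension, bisecting an interval never creates a nonconforming mesh, which is why the refinement step here is a single line; the difficulty of that step for triangulations is the subject of the next section.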
14.1
Algorithms for local mesh refinement
14.1.1 Algorithms based on the standard refinement
Locally refining a triangulation is not straightforward because the resulting mesh must conform to the following rule, first mentioned in Section 4.1: The intersection of any two triangles must be a common vertex or a common edge. For instance, if it is decided that one triangle in a mesh is to be refined, then the standard refinement, applied to the single triangle, leads to a nonconforming mesh, as shown in Figure 14.3. The situation illustrated by Figure 14.3 arises in any conceivable algorithm for local mesh refinement, and so any algorithm must be able to deal with it. The usual method is to refine neighboring triangles, as necessary, until a conforming triangulation is obtained. For example, the second triangle in the original mesh of Figure 14.3 could be bisected, leading to the situation shown on the left in Figure 14.4. It is customary to refer to the bisection of a triangle as a green refinement and the resulting (sub)triangles as green triangles.
Although green refinement creates a conforming mesh, it can also lead to a degenerate sequence of meshes. The reader will recall, from Section 5.1, that the convergence theory and standard error estimates are based on the nondegeneracy of the family of meshes, which requires that the shapes of the triangles not degenerate (informally, that the triangles not become arbitrarily skinny). However, as suggested by the mesh on the right in Figure 14.4, repeated green refinements can produce unacceptable triangles.
Figure 14.3. A mesh with two triangles (left) and a nonconforming refinement (right).
Figure 14.4. Left: The nonconforming mesh of Figure 14.3, made to conform by a triangle bisection. Right: Another step of this refinement process; a degenerate family of meshes could arise by continuing this process.
To avoid degeneracy, a given triangle is typically subjected to at most one green refinement. One way to ensure this is to keep track of which triangles are descended from a green refinement; if such a triangle becomes nonconforming due to refinement of an adjacent triangle, then it must be subject to a regular refinement. Another way to handle green refinements is to remove all green triangles from a given mesh before locally refining it. Of course, this introduces nonconforming triangles, but so does the local refinement, so not much added difficulty is introduced. In this second option, the sequence of meshes is not nested (not every triangle in a given mesh need be a subtriangle of a triangle in the previous mesh).
14.1.2
Algorithms based on bisection
Since standard refinements must be supplemented by green refinements to enforce triangle conformity, it is natural to consider using triangle bisection exclusively for local refinement. There are several algorithms based on triangle bisection, and I will describe two of them in detail.
Newest-node bisection
Triangle bisection leads to degeneracy if repeated bisecting edges are incident with the same node, as in Figure 14.4. A simple way to avoid this is to always bisect any given triangle from the newest node of that triangle, that is, from the node most recently added to the family of meshes. The newest node in a triangle is referred to as the peak of the triangle, and the opposite edge as the base. When a triangle is bisected, the new node becomes the peak of both subtriangles. This leads to the sequence of refinements shown in Figure 14.5. Newest-node bisection was introduced by Sewell [39], who showed that every descendant of a given triangle falls into one of four similarity classes (that is, each subtriangle is similar to one of four triangles), as illustrated in Figure 14.5. This is enough to prove that newest-node bisection cannot lead to degeneracy.
Triangle bisection can lead to nonconforming triangles just as does standard refinement. However, if a triangle and one of its neighbors share a common base, then the two triangles can be refined together without creating a nonconformity. In this case, each of the two triangles is said to be compatibly divisible. (A triangle is also called compatibly divisible if its base is a boundary edge.) Mitchell [32] extended the newest-node method to a simple recursive algorithm, which is based on the following observation: Suppose $T_k$, with base $e$, is to be bisected, and $T_j$ is the neighbor of $T_k$ sharing the edge $e$. If $e$ is not the base of $T_j$, then, after a single bisection of $T_j$, $e$ will be the base of a subtriangle of $T_j$. This is illustrated in Figure 14.6. Mitchell's idea was to recursively refine $T_j$ so that $T_k$ and its (new) neighbor can then be refined together. The recursion continues until it reaches a pair of neighbors sharing a base or a triangle whose base is a boundary edge.
It can be shown that the recursion "bottoms out" after a finite number of steps provided that, in the initial mesh, the bases are defined so that every triangle either shares its base with a neighbor or has a boundary edge as a base. The recursive algorithm is summarized in Algorithm 14.1.
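The recursive newest-node algorithm is short enough to sketch in full. The Python code below is an illustration, not the book's implementation: it assumes a triangle is stored as a tuple `(peak, b1, b2)` of node indices in counterclockwise order, so that the base is the edge from `b1` to `b2`, and it finds neighbors by a linear search through the triangle list (simple, but slow for large meshes). The function names are hypothetical.

```python
def find_neighbor(tris, t, a, b):
    """Return the triangle other than t that contains edge {a, b}, or None."""
    for s in tris:
        if s != t and a in s and b in s:
            return s
    return None

def split(tris, t, m):
    """Replace t = (peak, b1, b2) by two children whose peak is the new node m."""
    peak, b1, b2 = t
    tris.remove(t)
    tris.extend([(m, peak, b1), (m, b2, peak)])

def bisect(tris, coords, t):
    """Bisect t by newest-node bisection, keeping the mesh conforming."""
    _, b1, b2 = t
    nbr = find_neighbor(tris, t, b1, b2)
    if nbr is not None and {nbr[1], nbr[2]} != {b1, b2}:
        bisect(tris, coords, nbr)   # afterwards a child of nbr has base {b1, b2}
        bisect(tris, coords, t)     # t is now compatibly divisible
        return
    (x1, y1), (x2, y2) = coords[b1], coords[b2]
    coords.append(((x1 + x2) / 2, (y1 + y2) / 2))   # midpoint of the base
    m = len(coords) - 1
    split(tris, t, m)
    if nbr is not None:             # interior base: bisect the neighbor as well
        split(tris, nbr, m)

def area(coords, t):
    """Signed area of triangle t (positive for counterclockwise ordering)."""
    (xa, ya), (xb, yb), (xc, yc) = (coords[i] for i in t)
    return 0.5 * ((xb - xa) * (yc - ya) - (xc - xa) * (yb - ya))

# Unit square split along a diagonal; the diagonal is the base of both triangles.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tris = [(1, 2, 0), (3, 0, 2)]
bisect(tris, coords, (1, 2, 0))   # compatible pair: both triangles are bisected
bisect(tris, coords, (4, 1, 2))   # base is a boundary edge
bisect(tris, coords, (5, 4, 1))   # neighbor must be bisected first (recursion)
print(len(tris), len(coords))
```

The initial mesh satisfies the condition stated above (both triangles share the diagonal as their base), which is what allows the recursion in the third call to terminate after one level.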
Figure 14.5. Refined triangles created by newest-node bisection. The triangles are labeled by their similarity classes.
Figure 14.6. Newest-node bisection. The left-hand triangle in the mesh on the left is to be bisected. Its base is not the base of its neighbor; however, after a single bisection of the neighbor, the first triangle shares a base with one of the new subtriangles.
Given a triangle $T_k$ of mesh $\mathcal{T}$ to bisect:
    If the base $e$ of $T_k$ is a boundary edge
        replace $T_k$ by two subtriangles
    else
        Let $T_j$ be adjacent to $T_k$ across $e$
        If $e$ is also the base of $T_j$
            replace $T_k$ and $T_j$ each by two subtriangles
        else
            recursively call this routine to bisect $T_j$
            recursively call this routine to bisect $T_k$
Algorithm 14.1. The recursive form of newest-node bisection.
Longest-edge bisection
Another algorithm based on bisection prevents degeneracy by choosing the longest edge of the triangle as the edge to be bisected. This is referred to as longest-edge bisection. Rivara [35] proposed two algorithms based on longest-edge bisection, one of which is described here. When a triangle is bisected by the longest edge, a neighboring triangle can become nonconforming. The remedy is to bisect the neighboring triangle by the longest edge. In the simplest case, this leads to a conforming triangulation, as in Figure 14.7. If it does not, then one of the subtriangles in the neighboring triangle is bisected to make the mesh conforming. This is illustrated in Figure 14.8. Since the subtriangle is bisected by the newest node, this algorithm is a hybrid of the longest-edge and newest-node methods. (Rivara also presented a method based entirely on longest-edge bisection, which will not be described here.)
Figure 14.7. The nonconforming refinement of Figure 14.3 (left) and a further refinement (right), which is conforming.
Figure 14.8. The original mesh (left) and a nonconforming refinement (middle). The lower right triangle is bisected twice, first by the longest side, to make the mesh conforming again.
The process illustrated in Figure 14.8 may create new nonconforming triangles, since making a neighboring triangle conform may induce a nonconformity in one of its neighbors. In this case, the above process is iterated. That is, first the desired triangles are bisected by the longest side. Then any nonconformities created in the first step are removed, possibly introducing new nonconformities. Then any nonconformities introduced in the second step are removed, and so forth. Since there is a finite number of triangles in the original mesh, the process must terminate with a conforming mesh.
An application of this algorithm refines certain triangles $T$ from the original mesh $\mathcal{T}$. Each refined triangle is replaced by two, three, or four subtriangles, and it is guaranteed that even repeated application will not reduce the minimum angle in the mesh by more than a factor of two. Moreover, the nondegeneracy condition from Section 5.1 is guaranteed to hold (see Rivara [35]).
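The effect of always cutting the longest edge can be checked numerically on a single triangle. The Python sketch below is only an illustration of the basic bisection step: it repeatedly bisects every descendant of one triangle by its longest edge, with no attempt to keep a surrounding mesh conforming, so it is not the hybrid algorithm described above. The starting triangle and the function names are assumptions.

```python
import math

def angles(tri):
    """Interior angles (radians) of a triangle given as three (x, y) points."""
    out = []
    for i in range(3):
        a, b, c = tri[i], tri[(i + 1) % 3], tri[(i + 2) % 3]
        v = (b[0] - a[0], b[1] - a[1])
        w = (c[0] - a[0], c[1] - a[1])
        cross = v[0] * w[1] - v[1] * w[0]
        dot = v[0] * w[0] + v[1] * w[1]
        out.append(math.atan2(abs(cross), dot))
    return out

def bisect_longest(tri):
    """Split a triangle through the midpoint of its longest edge."""
    # Edge k is the edge opposite vertex k.
    k = max(range(3), key=lambda i: math.dist(tri[(i + 1) % 3], tri[(i + 2) % 3]))
    p, a, b = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
    m = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return [(p, a, m), (p, m, b)]

tri0 = ((0.0, 0.0), (4.0, 0.0), (1.0, 2.0))     # hypothetical starting triangle
alpha0 = min(angles(tri0))                      # smallest angle before refinement
smallest = alpha0
generation = [tri0]
for _ in range(6):                              # six rounds of bisection
    generation = [child for t in generation for child in bisect_longest(t)]
    smallest = min(smallest, min(min(angles(t)) for t in generation))
print(math.degrees(alpha0), math.degrees(smallest))
```

For this triangle the smallest angle over all descendants stays above half of the original smallest angle, in line with the factor-of-two statement in the text.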
The performance of the newest-node and longest-edge/newest-node hybrid bisection algorithms tends to be similar in practice (see [32]). The newest-node algorithm is simpler to implement, since no nonconforming triangles are ever created, and therefore it seems to be a better choice in practice. The only drawback of the newest-node method is that the bases must be chosen in the initial mesh so that every triangle is compatibly divisible. This is easy to do by inspection for a coarse mesh, and algorithms can be devised to perform this step automatically.
14.2
Selecting triangles for local refinement
The previous section presented algorithms for refining certain triangles in a mesh without refining all of the triangles. Another essential ingredient of an adaptive algorithm is a method for selecting which triangles to refine. The reader will recall the framework for an adaptive algorithm: The given BVP is solved on a mesh $\mathcal{T}_h$ to produce an approximation $u_h$ of the true solution $u$. The error $u - u_h$ is then estimated on each triangle $T \in \mathcal{T}_h$, and triangles are selected to be refined based on the size of the errors. It is necessary to choose a norm in which to measure the error on $T$; possibilities are the $L^2$- and energy norms:
Another possibility is to estimate the maximum pointwise error over $T$; this introduces a new norm, the $L^\infty$-norm, which is defined for continuous functions by
An adaptive algorithm might then try to estimate
for each $T \in \mathcal{T}_h$. When the true solution is smooth enough and elliptic regularity holds, the following error estimates hold when $u_h$ is the piecewise linear finite element solution:
(see Chapter 5). Using the $L^\infty$-norm, the analogous result is
(see Chapter 8, and page 224 in particular, of Brenner and Scott [13]). The bound (14.2) is stated more precisely in the next chapter. Since $|\log(h)|$ grows so slowly as $h \to 0$, it is reasonable to ignore it in designing practical numerical algorithms. The most common triangle selection strategy is due to Babuška and Rheinboldt [7] and is based on two assumptions:
1. The error estimate $\epsilon$ on a given triangle $T$ (and its subtriangles formed as the mesh is refined) has the form $\epsilon = c h^{\lambda}$, where $c$, $\lambda$ are positive constants and $h$ is the diameter of $T$.

2. A mesh is (nearly) optimal when the errors are equilibrated, that is, when the elementwise errors are nearly constant.

The asymptotic error estimates given above suggest that the first assumption is reasonable, at least asymptotically, whether the error is measured in the $L^2$-, energy, or $L^\infty$-norm. The second assumption can be justified rigorously (see Section 3.5 of [8]) under certain circumstances. As working hypotheses for the development of practical algorithms, the above assumptions have been quite successful.

The Babuška-Rheinboldt strategy works as follows: Consider a triangle $T \in \mathcal{T}_h$ with diameter $h_1$ and suppose the estimated error over $T$ is $\epsilon_1$. Suppose further that $T$ was obtained by refining a triangle (in an earlier mesh) having diameter $h_0$, and assume that the estimated error over that triangle was $\epsilon_0$. The assumptions

$$\epsilon_0 = c h_0^{\lambda}, \qquad \epsilon_1 = c h_1^{\lambda} \tag{14.3}$$

allow the determination of $c$ and $\lambda$, which in turn allows an estimate of the error that would result if $T$ were refined so that the diameter of its subtriangles were $h_1/2$:

$$\epsilon_2 = c \left(\frac{h_1}{2}\right)^{\lambda}.$$

The estimate $\epsilon_2$ is computed for each triangle in the current mesh and $M$ is defined to be the largest of these values. That is, $M$ is an estimate of the largest elementwise error that would result from a uniform refinement of the current mesh. Now the second assumption above is used, that the mesh can be optimized by equilibrating the error. Every triangle in the current mesh whose estimated error $\epsilon_1$ is greater than $M$ is selected for refinement.

The above strategy also provides a basis for deciding how much to refine a given triangle. The reader should notice that a single bisection of a triangle $T$ produces two subtriangles $T_1$, $T_2$ satisfying

(for example, bisecting an equilateral triangle $T$ produces two subtriangles with the same diameter as that of $T$). Bisecting $T$ twice (that is, bisecting $T$ and both its subtriangles) produces four subtriangles whose diameters are at most half of that of $T$. Should every selected triangle be bisected twice? Since the goal is to equilibrate the mesh, this may not be the best strategy. Subtriangles of $T$ would ideally have a diameter $h_2$ satisfying

$$c h_2^{\lambda} = M,$$

which yields $h_2 = (M/c)^{1/\lambda}$. Of course, the bisection algorithm does not allow $h_2$ to be specified, so $T$ could be bisected once or twice so that subtriangles have diameter at most 0000000
There is one more technicality to be considered. The reader will recall that the bisection algorithm may require the refinement of unselected triangles in order to maintain compatibility. Since a single bisection does not necessarily reduce the diameter, there is the possibility that $h_1 = h_0$, in which case (14.3) does not determine $c$ and $\lambda$. Some ad hoc procedure is necessary in this case; I suggest replacing $h_1$ with $h_0/\sqrt{2}$ before determining $c$ and $\lambda$ (although the diameter of $T$ is the same as that of its supertriangle, its area is half that of the supertriangle).

The triangle selection strategy described above has one drawback: It may select very few triangles for refinement at a given step. This is appropriate if the main goal is to equilibrate the error on the mesh. However, since the finite element equations must be formed and solved at each step of the adaptive algorithm, efficiency demands that the adaptive algorithm terminate in relatively few steps. For this reason, it is reasonable to augment the above strategy by requiring that at least a fixed fraction $r$ of the triangles be selected for refinement at each step; a reasonable value of this fraction would be $r = 0.2$. If each selected triangle is bisected twice, then the number of nodes would increase by nearly a factor of two, at least, at each step. (With $r = 0.2$, the factor could be as little as 1.6, but since nonselected triangles must usually be bisected to maintain a conforming mesh, the factor is greater in practice.)
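The selection rule of this section can be sketched in a few lines. The following Python sketch is only an illustration, not the book's MATLAB routine SelectTris: the function name `select_triangles`, its argument layout, and the use of 0-based indices are assumptions made here. It fits $\epsilon = ch^{\lambda}$ through the two known (diameter, error) pairs, predicts the error after halving the diameter (which reduces to $\epsilon_1 / 2^{\lambda}$), applies the $h_0/\sqrt{2}$ fallback when $h_1 = h_0$, and enforces the minimum fraction $r$ by adding the triangles with the largest current errors. It also assumes $\epsilon_1 < \epsilon_0$ on every triangle, so that $\lambda > 0$.

```python
import math

def select_triangles(h0, eps0, h1, eps1, min_fraction=0.2):
    """Return the sorted (0-based) indices of triangles selected for refinement.

    h0[k], eps0[k]: diameter and estimated error of the parent of triangle k.
    h1[k], eps1[k]: diameter and estimated error of triangle k itself.
    Hypothetical interface; assumes eps1[k] < eps0[k] so that lambda > 0.
    """
    n = len(h1)
    predicted = []
    for k in range(n):
        hk = h1[k]
        if math.isclose(hk, h0[k]):
            # A single bisection need not reduce the diameter; use the
            # ad hoc replacement h1 -> h0/sqrt(2) suggested in the text.
            hk = h0[k] / math.sqrt(2.0)
        # Solve eps0 = c*h0^lam and eps1 = c*hk^lam for lam.
        lam = math.log(eps1[k] / eps0[k]) / math.log(hk / h0[k])
        # Predicted error after halving the diameter: c*(hk/2)^lam = eps1/2^lam.
        predicted.append(eps1[k] * 0.5 ** lam)
    M = max(predicted)
    selected = {k for k in range(n) if eps1[k] > M}
    # Augment the selection so that at least a fraction r of triangles is chosen.
    needed = math.ceil(min_fraction * n)
    for k in sorted(range(n), key=lambda j: -eps1[j]):
        if len(selected) >= needed:
            break
        selected.add(k)
    return sorted(selected)

# Four triangles that all came from parents of diameter 1 and error 0.4.
print(select_triangles([1.0] * 4, [0.4] * 4, [0.5] * 4, [0.2, 0.05, 0.06, 0.05]))
```

In this small example only the first triangle has a current error above the largest predicted error, so it alone is selected unless the minimum fraction forces more triangles in.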
14.3 A complete adaptive algorithm
The reader will recall from the introduction to this chapter that an adaptive algorithm requires three ingredients: an elementwise error estimator, a triangle selection strategy, and a local refinement algorithm. I have presented options for the second and third ingredients; in this section I describe a conceptually simple but computationally expensive error estimator, and show some results from the resulting algorithm. The next chapter is devoted to practical error estimators.

The type of error estimate needed for an adaptive algorithm is an a posteriori error estimate, one that uses the computed solution itself to estimate how accurate it is. In Section 7.4, I compared the finite element solutions on two meshes, the second a refinement of the first, to estimate the error in the second solution. In that case, both solutions were piecewise linear. A similar approach would be to compute solutions of two different degrees, say piecewise linear and piecewise quadratic, on the same mesh. Given the piecewise linear approximation $u_h$ on a mesh $\mathcal{T}_h$, one could compute the piecewise quadratic finite element solution $u_h^{(2)}$ on the same mesh and estimate $u - u_h$ by $u_h^{(2)} - u_h$. Knowing $u_h^{(2)}$ and $u_h$ on each triangle, the elementwise error can be computed in any desired norm.

The three examples below are solved using an adaptive algorithm. The BVPs in these examples are taken from Mitchell's comparative study [32] of adaptive methods. In each example, the adaptive algorithm consists of newest-node bisection, the Babuška-Rheinboldt triangle selection strategy, and the quadratic error estimator, with the elementwise errors estimated in the energy norm.

Some criterion is needed to evaluate the effectiveness of an adaptive method. We know that when the true solution $u$ is smooth enough,

$$\|u - u_h\|_E = O(h).$$
Moreover, in a two-dimensional problem with uniformly refined meshes,

$$h = O\left(N_v^{-1/2}\right),$$

where $N_v$ is the number of nodes in the mesh. It follows that, for $h$ small enough,

$$\|u - u_h\|_E \le C N_v^{-1/2} \tag{14.4}$$

for some constant $C$. When $u$ is smooth but varies sharply in some parts of $\Omega$ (as in the examples presented below), it might be necessary that $h$ be unrealistically small before this asymptotic rate of convergence is observed (and thus, in practice, this rate of convergence is not observed when uniform refinement is used). A fair test of the effectiveness of an adaptive algorithm is whether the optimal rate of convergence (14.4) is observed. Therefore, in the examples shown below and in the next chapter, the values of $(N_v, \|u - u_h\|_E)$ are recorded for the meshes generated during the adaptive algorithm, and values of $C$ and $p$ are determined such that $\epsilon = CN^p$ fits the data points $(N, \epsilon) = (N_v, \|u - u_h\|_E)$ as nearly as possible in the least-squares sense. If $p$ is (close to) $-1/2$, then the algorithm has exhibited a (nearly) optimal rate of convergence, and can be judged a success.

The following three examples are given in order of increasing difficulty. The first has a mild boundary layer (region of sharp change near the boundary). Since the boundary layer is not severe, a sequence of uniform meshes exhibits the optimal convergence rate, and the adaptive algorithm shows only a small improvement in efficiency. The second has a sharp peak in the interior of $\Omega$, and the adaptive algorithm is noticeably more efficient than the nonadaptive algorithm. The third problem has a region of rapid change in the interior of $\Omega$ and is the most difficult of the three.

EXAMPLE 14.1. In this example, the domain is the unit square, and the true solution, the function
changes somewhat rapidly near the top and right boundaries. The function u is shown in Figure 14.9. The BVP is
where g is chosen so that the function u given above is the solution. A sequence of uniform meshes yielded errors satisfying
The final mesh contained 8192 triangles and 4225 nodes, and the error in the energy norm was about 0.0713. The convergence of the error to zero during the course of the adaptive iteration displayed the following behavior:
Figure 14.9. The exact solution for Example 14.1.
Figure 14.10. A uniformly refined mesh (left) and a locally refined mesh (right) generated in Example 14.1.

The final mesh has 9728 triangles and 5022 nodes, and the energy norm error in the solution was about 0.0388. Figure 14.10 shows a sample uniformly refined mesh and a sample locally refined mesh from this example. Figure 14.11 displays the convergence to zero of the error for both methods. The graph is given on a log-log plot, on which a relationship of the form $\epsilon = CN^p$ appears as a straight line with slope $p$. In this example, the two methods are converging at roughly the same rate, so the two lines are nearly parallel. The adaptive method is slightly superior ($C$ is less for the adaptive method), as seen in Figure 14.11.

EXAMPLE 14.2. Consider the Dirichlet problem
Figure 14.11. The convergence to zero of the error (in the energy norm) for Example 14.1: uniform refinement (solid line) and local refinement (dashed line).
Figure 14.12. A locally refined mesh in Example 14.2 and the computed solution.
where $\Omega$ is the unit square and $f$ is chosen so that the exact solution is
This function has a sharp peak at $(x, y) = (0.5, 0.117)$ (see Figure 14.12).
Figure 14.13. The convergence to zero of the error (in the energy norm) for Example 14.2: uniform refinement (solid line) and local refinement (dashed line).

The finite element method with a sequence of uniform meshes exhibited the following convergence:

The adaptive algorithm behaved like this:

The errors for the two methods are plotted in Figure 14.13. It is also instructive to compare the two methods in the following way. On a uniform mesh with 8192 triangles and 4225 nodes, the error (in the energy norm) was about 0.0144. On the other hand, the adaptive algorithm achieved a similar error (0.0138) on a mesh with only 388 triangles and 209 nodes. Even using the quadratic error estimator, which is quite inefficient, the adaptive algorithm only took about 21% of the time required to achieve this level of accuracy on the uniform mesh.

EXAMPLE 14.3. In this example, the domain is the rectangle $\Omega = (0.01, 1) \times (-1, 1)$, and the true solution is the harmonic function

The function $u$ changes abruptly as $(x, y)$ approaches the lower left corner of $\Omega$ (see Figure 14.14); indeed, $u$ has a singularity at the origin, which lies just outside of $\Omega$. The BVP is the Dirichlet problem
where g is chosen so that the function u given above is the solution.
Figure 14.14. A locally refined mesh in Example 14.3 and the computed solution.

The finite element method with a sequence of uniform meshes exhibited the following convergence:
The adaptive algorithm behaved like this:
The errors for the two methods are plotted in Figure 14.15. The preceding examples suggest that the adaptive algorithm described in this section is quite satisfactory in every respect but one: The error estimator is extremely expensive. In fact, computing the error estimate $u_h^{(2)} - u_h$ costs much more time than computing $u_h$ itself. To be satisfactory, an a posteriori error estimate must require less computational time than computing the solution $u_h$ itself, preferably much less time. The next chapter describes several such estimators that, in most respects, work as well as the expensive estimator used above.
14.4 The MATLAB implementation

The main algorithms from this chapter are the newest-node refinement algorithm, the Babuška-Rheinboldt triangle selection strategy, and the quadratic error estimator. These algorithms are implemented in the MATLAB functions LocalRefine1, SelectTris, and QuadElementErrEst1, respectively. Estimating the errors on each element requires the element stiffness matrices corresponding to the quadratic mesh, so a new version of Stiffness2, named StiffnessE, is provided to return these quantities. The Interpolate2 routine is used to interpolate a piecewise linear function onto a piecewise quadratic mesh. Several new access functions are needed to extract information from the mesh data structure.

Figure 14.15. The convergence to zero of the error (in the energy norm) for Example 14.3: uniform refinement (solid line) and local refinement (dashed line).

In order to apply the newest-node algorithm to a mesh, the base of each triangle must be defined. The bases are stored in an $N_t \times 1$ array Bases in the mesh data structure; Bases(k) contains the index of the base of triangle $T_k$ in the list of edges. Only LocalRefine1 uses the Bases field, and it automatically updates the array when it refines a mesh.

It was mentioned above in Section 14.1.2 that the recursive newest-node algorithm is guaranteed to work provided that every triangle in the initial mesh is compatibly divisible (that is, shares a base with a neighboring triangle or has a boundary edge as base). Choosing the bases in the initial mesh to satisfy this property is an example of a matching problem from graph theory. Algorithms for this problem are complicated and beyond the scope of this book. Interested readers can consult the book by Jungnickel [25] for a detailed discussion of matching algorithms.

For a coarse mesh, the user can easily choose the bases appropriately and define the Bases array. For example, Figure 14.16 shows two meshes with the elements and edges labeled by their indices. In the mesh on the left, a natural choice of T.Bases is T.Bases = [4, 4, 6, 6, 11, 11, 13, 13]. This indicates that triangles 1 and 2 share edge 4 as base, triangles 3 and 4 share edge 6, triangles 5 and 6 share edge 11, and triangles 7 and 8 share edge 13. In the mesh on the right, one choice of T.Bases is T.Bases = [8, 9, 10, 11, 12, 13, 14].
In this case, each triangle has its boundary edge as base. In neither case is the choice of bases unique. For example, in the mesh on the right in Figure 14.16, another possible choice of bases is given by T.Bases = [2, 2, 4, 4, 6, 6, 14]. If a mesh is passed to LocalRefine1 and the mesh data structure does not contain the Bases field, then LocalRefine1 calls DefineBases to choose the bases. The MATLAB routine DefineBases implements a heuristic algorithm that is likely but not guaranteed to find an acceptable choice of bases. If DefineBases fails, then LocalRefine1 prints an error message and terminates (in which case the user would have to define T.Bases directly). The interested reader can consult the MATLAB file DefineBases.m for details about the algorithm it implements.

Figure 14.16. Two triangulations with the triangles and edges labeled by their indices.

14.4.1 MATLAB functions

These functions, together with those described in Section 15.5, comprise the adaptive code.

• LocalRefine1: The newest-node algorithm for local refinement. Updates the lists of triangle diameters and error estimates needed for triangle selection.
• DefineBases: Heuristic algorithm for defining the base of each triangle in a mesh. If the algorithm succeeds, the resulting mesh is compatibly divisible.
• SelectTris: Babuška-Rheinboldt triangle selection strategy. Always selects at least 20% of the triangles.
• QuadElementErrEst1: Estimates the errors on each triangle by comparing to the piecewise quadratic solution on the same mesh. Returns the more accurate piecewise quadratic solution if requested.
• Solve: Implements the adaptive algorithm (for the model problem) described in this chapter. Can also use one of the efficient error estimators described in the next chapter.
• Solve1: Similar to Solve but with uniform refinement. Intended for comparison with Solve.
• StiffnessE: Version of Stiffness2 which also returns the element stiffness matrices; needed by QuadElementErrEst1.
• The functions for retrieving information from the mesh data structure are
  - getAdjacentTriangle: Gets the index of the triangle on the other side of a given edge of a triangle.
  - getAdjacentTriangles: Gets the indices of all the triangles adjacent to a given triangle.
  - getDiameter: Computes the diameter of a given triangle.
  - getDiameters: Computes the diameters of a list of triangles.
  - getFBndyEdgeNodes: Gets the coordinates of the endpoints of a free boundary edge.
  - getGrad1: Gets the gradient of a piecewise linear function on a given triangle.
  - getOppositeVertex: Gets the index of the third vertex of a triangle.
  - getOtherEdges: Given the index of an edge of a triangle, gets the indices of the other two edges.
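The compatibility condition on the bases discussed in Section 14.4 (every triangle either has a boundary edge as base or shares its base with the neighbor across that edge) is easy to test for a given choice of bases. The following Python sketch is an illustration only and is not part of the book's MATLAB code: the function `is_compatibly_divisible` and the `edge_triangles` map are hypothetical, and indices are 0-based rather than MATLAB's 1-based convention.

```python
def is_compatibly_divisible(bases, edge_triangles):
    """Check that every triangle's base is a boundary edge or a shared base.

    bases[k]          : index of the edge chosen as base of triangle k.
    edge_triangles[e] : tuple of the triangles containing edge e
                        (one triangle for a boundary edge, two otherwise).
    Both containers are hypothetical stand-ins for the mesh data structure.
    """
    for k, e in enumerate(bases):
        neighbors = [t for t in edge_triangles[e] if t != k]
        if not neighbors:
            continue  # the base is a boundary edge
        if bases[neighbors[0]] != e:
            return False  # the neighbor across the base uses a different base
    return True

# Unit square split by a diagonal: edges 0-3 are boundary edges, edge 4 is
# the diagonal shared by triangles 0 and 1.
edge_triangles = {0: (0,), 1: (0,), 2: (1,), 3: (1,), 4: (0, 1)}
print(is_compatibly_divisible([4, 4], edge_triangles))  # both use the diagonal
print(is_compatibly_divisible([4, 2], edge_triangles))  # mismatched bases
```

A check of this kind only verifies a proposed choice; finding a valid choice for an arbitrary mesh is the matching problem mentioned in the text.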
14.5 Exercises for Chapter 14
1. Here are the values of $N$ and $\epsilon$ obtained in the adaptive algorithm in Example 14.3:

| $N$ | $\epsilon$ |
|---|---|
| 1180 | 0.23224 |
| 2250 | 0.16039 |
| 4379 | 0.11387 |
| 8440 | 0.082623 |
| 16182 | 0.059271 |

Verify that the least-squares fit to these data is $\epsilon = 8.86 N^{-0.518}$. (Hint: The problem can be expressed as a linear least-squares problem by taking the log:

$$\log(\epsilon) = \log(C) + p \log(N).$$

The values of $C$ and $p$ are estimated by solving the system

$$\log(C) + p \log(N_i) = \log(\epsilon_i), \quad i = 1, 2, \ldots, 5,$$

in the least-squares sense for $\log(C)$ and $p$.)
2. If a triangle is bisected repeatedly by the newest-node algorithm, all the subtriangles fall into one of four similarity classes (see Figure 14.5). Determine the angles for triangles in each of the four classes, assuming the initial triangle is (a) an equilateral triangle; (b) an isosceles right triangle (take the hypotenuse to be the base).

3. (MATLAB) Let $\Omega$ be the polygonal region having vertices $(0, 0)$, $(1, 0)$, $(1, 1)$, $(-1, 1)$, $(-1, -1)$, and $(0, -1)$, and let $u$ be the harmonic function defined in polar coordinates by $u = r^{2/3} \sin(2\theta/3)$, $0 < \theta < 2\pi$ (see Exercise 8.6.7). Using the adaptive algorithm described in this chapter, solve the Dirichlet problem

where $g$ is chosen so that the given function $u$ is the solution. At what rate does $\|u - u_h\|_E$ go to zero? Solve the problem using a sequence of uniformly refined meshes. How much more efficient is the adaptive algorithm?

4. (MATLAB) The function

is harmonic. Let $\Omega$ be the square $(-1, 1) \times (-1, 1)$ and use the adaptive algorithm described in this chapter to solve the Dirichlet problem

where $g$ is chosen so that the given function $u$ is the solution. (The solution $u$ has peaks at the four corners of the domain.) At what rate does $\|u - u_h\|_E$ go to zero? Solve the problem using a sequence of uniformly refined meshes. How much more efficient is the adaptive algorithm?

5. (MATLAB) Solve the Dirichlet problem

where $\Omega$ is the unit square and $\kappa$ is the discontinuous function defined by

Use the adaptive algorithm described in this chapter.
Figure 14.17. The initial mesh for Exercise 7.

6. (MATLAB) Repeat the preceding exercise with

7. (MATLAB) Let $\Omega$ be the hexagon with vertices $(1, 0)$, $(1/2, \sqrt{3}/2)$, $(-1/2, \sqrt{3}/2)$, $(-1, 0)$, $(-1/2, -\sqrt{3}/2)$, and $(1/2, -\sqrt{3}/2)$, and let $\mathcal{T}_0$ be the triangulation of $\Omega$ shown in Figure 14.17. Consider the Dirichlet problem

where $\kappa$ is the discontinuous function with value 1 on triangles 1, 3, and 5 and value 100 on triangles 2, 4, and 6 of $\mathcal{T}_0$. The boundary data are chosen so that the exact solution is

Solve the BVP using the adaptive algorithm described in this chapter, and using a sequence of uniform meshes produced by the standard refinement. In both cases, use $\mathcal{T}_0$ as the initial mesh. Which method is more efficient?

8. (Programming) Extend the MATLAB codes Solve and QuadElementErrEst1 to allow a zero-order term in the PDE, as in

Test your code by reproducing the results of Example 14.1.
9. (MATLAB) Change the BVP in Example 14.1 so that the solution is
and solve using the adaptive algorithm from this chapter. (The code from the previous exercise is required.) Also solve using a sequence of uniformly refined meshes, and compare the efficiency of the adaptive algorithm to the nonadaptive algorithm.
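The convergence test used throughout Section 14.3, fitting $\epsilon = CN^p$ to recorded pairs $(N_v, \|u - u_h\|_E)$, amounts to a straight-line fit in log-log coordinates. The following Python sketch is one minimal way to carry it out, under the assumption that an ordinary least-squares line through the points $(\log N_i, \log \epsilon_i)$ is what is wanted; the function name `fit_power_law` is introduced here for illustration and the data are the values tabulated in Exercise 1.

```python
import math

def fit_power_law(N, eps):
    """Least-squares fit of eps = C * N**p via log(eps) = log(C) + p*log(N)."""
    x = [math.log(n) for n in N]
    y = [math.log(e) for e in eps]
    m = len(x)
    xbar = sum(x) / m
    ybar = sum(y) / m
    # Slope and intercept of the least-squares line through (x_i, y_i).
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    p = sxy / sxx
    C = math.exp(ybar - p * xbar)
    return C, p

# Data from Exercise 1 (adaptive algorithm, Example 14.3).
N = [1180, 2250, 4379, 8440, 16182]
eps = [0.23224, 0.16039, 0.11387, 0.082623, 0.059271]
C, p = fit_power_law(N, eps)
print(f"eps = {C:.3f} * N^({p:.4f})")
```

An exponent $p$ close to $-1/2$ indicates the (nearly) optimal rate (14.4).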
Chapter 15
Error estimators and indicators
In this chapter, several practical error estimators will be presented. An a posteriori error estimator can serve two purposes. First of all, as was demonstrated in the preceding chapter, an elementwise error estimate allows an algorithm to choose which triangles to refine so as to reduce the error with as little computational effort as possible. Second, an accurate error estimate shows when the problem has been solved accurately enough and therefore when the algorithm can be halted.

Some techniques give rise to error indicators, which effectively indicate where the error is large, and therefore which elements can fruitfully be refined, without giving a quantitative measure of the error. An indicator can be used in an adaptive algorithm, although one would have to use other criteria to decide when to halt the algorithm. Error indicators are worth pursuing because it has been shown that the most accurate error estimator does not necessarily lead to the most efficient adaptive algorithm (see [32]). Although this may be surprising at first, it follows naturally from the fact that the nodal values are found by solving a coupled system $KU = F$; no nodal values are independent of any others. Therefore, if the mesh is insufficiently refined in one region, the error in the associated nodal values will affect the nodal values away from the given region.

I will typically write "error estimators" when, strictly speaking, "error estimators and indicators" would be more precise (for example, in the next paragraph). However, when discussing a specific technique, I will indicate the category into which it falls (error estimator or error indicator).

There are two types of error estimators: explicit and implicit. An implicit error estimator requires the solution of systems of (algebraic) equations; the estimator is expressed as an implicit function of the computed solution. On the other hand, an explicit error estimator is given as an explicit function of the computed solution.
It follows that explicit estimators are less expensive to compute. In the following sections, I present two explicit estimators and one implicit estimator. Throughout the discussion, the model BVP
will be used for illustration.
15.1 An explicit error indicator based on estimating the curvature of the solution
The first practical error indicator presented here is due to Eriksson and Johnson [19] and is based on an a priori error estimate for

$$\|u - u_h\|_{L^\infty(\Omega)}.$$

Here $u$ is the exact solution of the BVP and $u_h$ is the finite element solution, which is always assumed to be piecewise linear in this chapter. The reader will recall that, for continuous functions, the $L^\infty$-norm measures the largest pointwise value of the given function. When the function might be discontinuous, the $L^\infty$-norm ignores sets of measure zero.²¹ In the previous chapter, I presented the following asymptotic error bound:

$$\|u - u_h\|_{L^\infty(\Omega)} \le C h^2 |\log(h)|. \tag{15.2}$$
The reader will recall from Chapter 5 that error estimates typically are expressed in terms of higher derivatives of the exact solution; a typical result was
where
is defined by
(see Section 5.3). The bound (15.2) can be stated more precisely in terms of the second derivatives of $u$, but this requires the definition of the following Sobolev spaces:

$$W^{k,\infty}(\Omega) = \{u : u \text{ and its partial derivatives up to order } k \text{ are in } L^\infty(\Omega)\}.$$

The $W^{k,\infty}$-norm is just the largest of the $L^\infty$-norms of $u$ and its partial derivatives up to order $k$, while the $W^{k,\infty}$-seminorm $|u|_{W^{k,\infty}(\Omega)}$ is the largest of the $L^\infty$-norms of the partial derivatives of $u$ of order exactly $k$. Given these definitions, (15.2) can be expressed as

²¹ The precise definition is

$$\|f\|_{L^\infty(\Omega)} = \operatorname{ess\,sup}\{|f(x, y)| : (x, y) \in \Omega\}.$$

The number defined on the right is called the essential supremum of $|f|$ on $\Omega$. It is the smallest number $M$ such that $|f| \le M$ except possibly on a set of measure zero.
(the constant $C$ in (15.3) is different than in (15.2); the constant in (15.2) absorbed the factor of $\|u\|_{W^{2,\infty}(\Omega)}$). A variation of (15.3), which is useful for local refinement, is expressed element by element:

Based on (15.4), it can be expected that the values of $h_T^2 |u|_{W^{2,\infty}(T)}$ are representative of the relative sizes of $\|u - u_h\|_{L^\infty(T)}$. Since $|\log(h)|$ grows so slowly as $h$ decreases, it is usually considered to be absorbed in the constant $C$. Since the value of $C$ is unknown, this method produces an error indicator, not an error estimator (although Eriksson and Johnson [19] do discuss methods for estimating $C$, thereby turning the indicator into an estimator).

To develop a practical method, it remains only to find a way of estimating $|u|_{W^{2,\infty}(T)}$ from the computed solution $u_h$. It is not surprising that the size of the second derivatives of $u$ would provide an indication of where the error in $u_h$ is large. Since $u_h$ is piecewise linear, its second derivatives are zero inside each triangle $T$. If the second derivatives of $u$ are large (that is, if $\nabla u$ is changing rapidly) on part of $\Omega$, $u_h$ could accurately represent $u$ on that region only if the mesh were highly refined there. By definition, $|u|_{W^{2,\infty}(T)}$ is the largest of

$$\left\|\frac{\partial^2 u}{\partial x^2}\right\|_{L^\infty(T)}, \quad \left\|\frac{\partial^2 u}{\partial x\,\partial y}\right\|_{L^\infty(T)}, \quad \left\|\frac{\partial^2 u}{\partial y^2}\right\|_{L^\infty(T)}.$$
Eriksson and Johnson proposed to estimate this quantity by the largest of

$$\frac{\left|\dfrac{\partial u_h}{\partial x}(x_{T'}, y_{T'}) - \dfrac{\partial u_h}{\partial x}(x_T, y_T)\right|}{\|(x_{T'}, y_{T'}) - (x_T, y_T)\|}, \qquad \frac{\left|\dfrac{\partial u_h}{\partial y}(x_{T'}, y_{T'}) - \dfrac{\partial u_h}{\partial y}(x_T, y_T)\right|}{\|(x_{T'}, y_{T'}) - (x_T, y_T)\|},$$

where $(x_T, y_T)$ represents the centroid of the triangle $T$ and $T'$ ranges over the three triangles adjacent to $T$. (Thus there are six difference quotients to compute for each $T$, fewer if $T$ is adjacent to $\partial\Omega$.)

To demonstrate the effectiveness of the Eriksson-Johnson error indicator, the examples from Section 14.3 will be solved using it in place of the quadratic error estimate.

EXAMPLE 15.1 (cf. Example 14.1). The Eriksson-Johnson error indicator attempts to estimate (at least up to an unknown multiplicative constant) the $L^\infty$-norm of the error. The following table records the maximum of the elementwise indicators and also the actual $L^\infty$-norm of the error in the computed solution. The last column gives the ratio of the actual error to the estimated error.
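| $N_v$ | $\lVert u - u_h \rVert_{L^\infty(\Omega)}$ | Estimated error | Ratio |
|---|---|---|---|
| 25 | $2.1296 \cdot 10^{-1}$ | $1.46878$ | 0.145 |
| 81 | $7.9937 \cdot 10^{-2}$ | $7.63613 \cdot 10^{-1}$ | 0.105 |
| 145 | $4.5236 \cdot 10^{-2}$ | $5.24178 \cdot 10^{-1}$ | 0.0863 |
| 266 | $1.6384 \cdot 10^{-2}$ | $2.23028 \cdot 10^{-1}$ | 0.0734 |
| 537 | $1.5173 \cdot 10^{-2}$ | $1.24818 \cdot 10^{-1}$ | 0.122 |
| 1134 | $5.1592 \cdot 10^{-3}$ | $6.35376 \cdot 10^{-2}$ | 0.0812 |
| 2169 | $2.5034 \cdot 10^{-3}$ | $3.0163 \cdot 10^{-2}$ | 0.0830 |
| 4176 | $1.4781 \cdot 10^{-3}$ | $1.58193 \cdot 10^{-2}$ | 0.0934 |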
Figure 15.1. The convergence to zero of the error (in the energy norm) for Example 15.1: uniform refinement (solid line) and local refinement (dashed line).

These results suggest that the indicator is quite successful. The convergence of the energy norm of the error to zero was

which is similar to the result obtained in Example 14.1. This convergence is compared to the results for uniform refinement in Figure 15.1 (cf. Figure 14.11). The reader will recall that this BVP is not particularly difficult, and the adaptive refinement does not lead to a large improvement in this case.

EXAMPLE 15.2 (cf. Example 14.2). The solution of this problem has a sharp peak in the interior of $\Omega$ (see Figure 14.12) and is more difficult than the previous problem. Here are the results of the adaptive algorithm, displayed as in the previous example.
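| $N_v$ | $\lVert u - u_h \rVert_{L^\infty(\Omega)}$ | Estimated error | Ratio |
|---|---|---|---|
| 45 | $7.0416 \cdot 10^{-2}$ | $1.6036 \cdot 10^{-1}$ | 0.43911 |
| 86 | $8.5744 \cdot 10^{-2}$ | $6.0089 \cdot 10^{-2}$ | 1.427 |
| 145 | $1.3493 \cdot 10^{-2}$ | $2.7297 \cdot 10^{-2}$ | 0.49430 |
| 239 | $3.0149 \cdot 10^{-3}$ | $2.0955 \cdot 10^{-2}$ | 0.14387 |
| 398 | $7.1862 \cdot 10^{-4}$ | $6.2306 \cdot 10^{-3}$ | 0.11534 |
| 802 | $4.3305 \cdot 10^{-4}$ | $4.0484 \cdot 10^{-3}$ | 0.10697 |
| 1546 | $1.6103 \cdot 10^{-4}$ | $1.3948 \cdot 10^{-3}$ | 0.11545 |
| 2815 | $1.4172 \cdot 10^{-4}$ | $8.5832 \cdot 10^{-4}$ | 0.16511 |
| 5243 | $7.1135 \cdot 10^{-5}$ | $4.1649 \cdot 10^{-4}$ | 0.17080 |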
Figure 15.2. The convergence to zero of the error (in the energy norm) for Example 15.2: uniform refinement (solid line) and local refinement (dashed line).
The convergence of the energy norm of the error to zero was
which is virtually identical to the result obtained in Example 14.2. This convergence is compared to the results for uniform refinement in Figure 15.2 (cf. Figure 14.13).

EXAMPLE 15.3 (cf. Example 14.3). This problem is the most difficult of the three for uniform refinement (the solution and domain are shown in Figure 14.14). Here are the results of the adaptive algorithm, displayed as in the previous examples.
| $N_v$ | $\lVert u - u_h \rVert_{L^\infty(\Omega)}$ | Estimated error | Ratio |
|---|---|---|---|
| 95 | $5.5502 \cdot 10^{-1}$ | 2.49002 | 0.22290 |
| 198 | $3.7094 \cdot 10^{-1}$ | 2.07305 | 0.17893 |
| 364 | $1.6246 \cdot 10^{-1}$ | 1.43412 | 0.11328 |
| 692 | $5.6060 \cdot 10^{-2}$ | 0.764897 | 0.073291 |
| 1350 | $1.7011 \cdot 10^{-2}$ | 0.289175 | 0.058826 |
| 2477 | $5.0250 \cdot 10^{-3}$ | 0.130231 | 0.038585 |
| 4717 | $4.02478 \cdot 10^{-3}$ | 0.0875076 | 0.046028 |
Figure 15.3. The convergence to zero of the error (in the energy norm) for Example 15.3: uniform refinement (solid line) and local refinement (dashed line).

The convergence of the energy norm of the error to zero was
which, again, is similar to the result obtained in Section 14.3. This convergence is compared to the results for uniform refinement in Figure 15.3 (cf. Figure 14.15).
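The curvature-based indicator of this section can be illustrated with a short sketch. The Python code below is not the book's MATLAB implementation; it assumes that the difference quotients are formed from the two partial derivatives of $u_h$ on a triangle $T$ and on each edge-adjacent triangle $T'$, divided by the distance between the two centroids, and that the indicator on $T$ is $h_T^2$ times the largest such quotient. The function names, the node and triangle lists, and the 0-based indexing are all hypothetical.

```python
import math

def p1_gradient(p, u):
    """Gradient of the linear function with values u at the three vertices p."""
    (x1, y1), (x2, y2), (x3, y3) = p
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    gx = ((u[1] - u[0]) * (y3 - y1) - (u[2] - u[0]) * (y2 - y1)) / det
    gy = ((u[2] - u[0]) * (x2 - x1) - (u[1] - u[0]) * (x3 - x1)) / det
    return gx, gy

def curvature_indicator(nodes, tris, u):
    """Return h_T^2 times the largest gradient difference quotient, per triangle."""
    grads, cents, diams = [], [], []
    for t in tris:
        p = [nodes[i] for i in t]
        grads.append(p1_gradient(p, [u[i] for i in t]))
        cents.append((sum(q[0] for q in p) / 3.0, sum(q[1] for q in p) / 3.0))
        diams.append(max(math.dist(p[i], p[j]) for i, j in ((0, 1), (1, 2), (2, 0))))
    # Map each edge (as an unordered node pair) to the triangles containing it.
    edge_to_tris = {}
    for k, t in enumerate(tris):
        for i in range(3):
            edge = frozenset((t[i], t[(i + 1) % 3]))
            edge_to_tris.setdefault(edge, []).append(k)
    d2 = [0.0] * len(tris)
    for pair in edge_to_tris.values():
        if len(pair) == 2:  # interior edge: the two triangles are adjacent
            a, b = pair
            dist = math.dist(cents[a], cents[b])
            q = max(abs(grads[a][0] - grads[b][0]),
                    abs(grads[a][1] - grads[b][1])) / dist
            d2[a] = max(d2[a], q)
            d2[b] = max(d2[b], q)
    return [diams[k] ** 2 * d2[k] for k in range(len(tris))]

# Unit square split along a diagonal into two triangles.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tris = [(0, 1, 2), (0, 2, 3)]
print(curvature_indicator(nodes, tris, [0.0, 1.0, 2.0, 0.0]))
```

For a globally linear $u_h$ the gradients agree on all triangles and the indicator vanishes, which matches the observation that the indicator responds only to changes in $\nabla u_h$ between neighboring triangles.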
15.2 An explicit error indicator based on the residual
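The finite element solution $u_h$ is defined by

$$u_h \in V_h, \quad a(u_h, v) = \ell(v) \text{ for all } v \in V_h,$$

and it satisfies the orthogonality condition

$$a(u - u_h, v) = 0 \text{ for all } v \in V_h,$$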
where $u$ is the exact solution. If $e_h = u - u_h$ is the error in $u_h$ and $v$ is any element of $V$, then

$$a(e_h, v) = a(u, v) - a(u_h, v) = \ell(v) - a(u_h, v).$$

Therefore, $e_h$ satisfies the variational equations

$$a(e_h, v) = \ell(v) - a(u_h, v) \text{ for all } v \in V, \tag{15.5a}$$

$$a(e_h, v) = 0 \text{ for all } v \in V_h. \tag{15.5b}$$

Many error estimators are based on manipulating these equations. In deriving the following error estimator, I will assume that any Dirichlet boundary conditions are homogeneous, as in the following model problem:

The necessary modification for nonzero Dirichlet data will be given at the end of the derivation. The weak form of the model problem (15.6) is $a(u, v) = \ell(v)$ for all $v \in V = \{v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_1\}$, where
The derivation of the error estimate begins with (15.5a), expressed in terms of the individual triangles:
In deriving the above formula, Green's identity
was used; it is valid since uh, restricted to a single triangle T, is smooth.
The above expression for $a(e_h, v)$ involves the element-by-element residuals
and
which measure by how much $u_h$ fails to solve the PDE in the interior of each triangle and by how much it fails to satisfy the boundary condition on each boundary edge. The above expression can be further simplified by defining $\mathcal{E}_h$ to be the set of all edges in the triangulation and $X_h$ to be the set of all $e \in \mathcal{E}_h$ such that $e$ is not a boundary edge. If $e$ lies on $\Gamma_1$, then, for any test function $v \in V$, $v = 0$ on $e$ and hence
For each $e \in X_h$, $T_{e,1}$ and $T_{e,2}$ will denote the triangles on either side of $e$ and $n_e$ will be the outward-pointing unit normal vector to $\partial T_{e,1}$ on $e$. The order of $T_{e,1}$, $T_{e,2}$ is arbitrary. Then
where
denotes the jump in the discontinuous function $\kappa\,\partial u_h/\partial n_e$ across $e$. (It is important to notice that this jump is the same regardless of which direction is chosen for $n_e$.) It thus follows that
To simplify this expression further, the definition of the boundary residual $R$ is extended to all edges in $\mathcal{E}_h$:
With this notation, (15.7) becomes
For the next step of the derivation, a piecewise linear approximation to an arbitrary $v \in V$ is needed. If $v$ lies in $H^2(\Omega)$, then the piecewise linear interpolant $v_I$ of $v$ is available; unfortunately, the size of $v - v_I$ is bounded in terms of $|v|_{H^2(\Omega)}$. A similar approximation $v_h$ is needed with $\|v - v_h\|_{L^2(T)}$ bounded in terms of the $H^1$ norm of $v$. Such an estimate can be given by referring to the patch $\tilde{T}$ of a given triangle $T$:
(thus $\tilde{T}$ is the collection of triangles in $\mathcal{T}_h$ that intersect $T$, even at a single point). Theorem 1.7 in Ainsworth and Oden [2] yields, for any $v \in H^1(\Omega)$, a continuous piecewise linear $v_h$ satisfying
where $e$ is any edge of $T$. Since $v_h \in V_h$, (15.5b) implies that
and thus, by (15.5a),
It follows that $v$ can be replaced with $v - v_h$ in the right-hand side of (15.8). Therefore,
The reader should notice the use of the Cauchy-Schwarz inequality in the second and fourth steps (the first time for integrals, the second for Euclidean vectors). In the final upper bound, $\tilde{T}_e$ denotes the patch of a triangle having $e$ as an edge. It can be shown that
(this follows from the fact that each $T'$ can belong to at most $k$ patches $\tilde{T}$ for some finite integer $k$; otherwise, the minimum angle condition would be violated).
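The patch construction can be made concrete with a short sketch. The code below is illustrative only: it assumes a mesh given as a list of triangles, each a tuple of three node indices (the function names `triangle_patches` and `max_patch_membership` are hypothetical and are not part of the book's MATLAB code). It builds the patch of each triangle as the set of triangles sharing at least one vertex with it, and counts the largest number of patches any single triangle belongs to.

```python
def triangle_patches(triangles):
    """For each triangle, return the set of triangle indices that share
    at least one vertex with it (the triangle itself is included)."""
    # Map each node index to the set of triangles that contain it.
    node_to_tris = {}
    for t, tri in enumerate(triangles):
        for node in tri:
            node_to_tris.setdefault(node, set()).add(t)
    patches = []
    for tri in triangles:
        patch = set()
        for node in tri:
            patch |= node_to_tris[node]
        patches.append(patch)
    return patches


def max_patch_membership(triangles):
    """Largest number of patches that any one triangle belongs to."""
    counts = [0] * len(triangles)
    for patch in triangle_patches(triangles):
        for t in patch:
            counts[t] += 1
    return max(counts)


# Three triangles in a chain: 0 and 1 share node 2, 1 and 2 share node 4.
chain = [(0, 1, 2), (2, 3, 4), (4, 5, 6)]
print(triangle_patches(chain))
print(max_patch_membership(chain))
```

Because "sharing a vertex" is a symmetric relation, the number of patches containing a triangle equals the size of its own patch; the count is bounded as long as the number of triangles meeting at any node is bounded, which is what the minimum angle condition guarantees.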
It therefore follows that
Taking $v = e_h$ and using the $V$-ellipticity of $a(\cdot, \cdot)$ yields
or, regrouping the boundary integrals and noting that most edges belong to two triangles,
Based on this indicator, the explicit residual indicator is defined by
When the BVP contains inhomogeneous Dirichlet data, the finite element problem solves
where $G_h$ is the piecewise linear function interpolating $g$ (the Dirichlet data) at the constrained nodes and having value zero at the free nodes. Since $G_h$ is given, the finite element procedure, as described in previous chapters, actually solves for $w_h$. The explicit residual indicator $\eta_T$ in this case is exactly as defined above, bearing in mind that $u_h$, not $w_h$, is used to define the residuals $r$ and $R$.
As in the previous section, the effectiveness of the indicator will be illustrated by applying it to the examples from Section 14.3.
EXAMPLE 15.4 (cf. Example 14.1). The explicit residual estimator led to
The results are compared to the results for uniform refinement in Figure 15.4 (cf. Figure 14.11).
EXAMPLE 15.5 (cf. Example 14.2). The explicit residual estimator led to
The results are compared to the results for uniform refinement in Figure 15.5 (cf. Figure 14.13).
Figure 15.4. The convergence to zero of the error (in the energy norm) for Example 15.4: uniform refinement (solid line), local refinement based on the explicit residual indicator (dashed line).
Figure 15.5. The convergence to zero of the error (in the energy norm) for Example 15.5: uniform refinement (solid line), local refinement based on the explicit residual estimator (dashed line).
Figure 15.6. The convergence to zero of the error (in the energy norm) for Example 15.6: uniform refinement (solid line), local refinement based on the explicit residual estimator (dashed line).
EXAMPLE 15.6 (cf. Example 14.3). The explicit residual estimator led to
These results are compared to the results for uniform refinement in Figure 15.6 (cf. Figure 14.15).
In all these examples, the explicit residual indicator was as effective as the explicit indicator presented in the last section, or, indeed, as the expensive quadratic error estimator of the last chapter.
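To make the ingredients of an explicit residual indicator concrete, here is a minimal sketch. It is not the ExpResidual routine described in this book, and the formula it evaluates is an assumption: it uses a commonly used form, $\eta_T^2 = h_T^2 \|r\|_{L^2(T)}^2 + \tfrac{1}{2}\sum_{e \subset \partial T} h_e \|R\|_{L^2(e)}^2$, with constant $\kappa$ and constant $f$, a pure Dirichlet boundary (so boundary edges contribute nothing), and $R$ taken to be the jump in $\kappa\,\partial u_h/\partial n_e$ on interior edges. All function and variable names are hypothetical.

```python
import math


def p1_gradient(pts, vals):
    """Gradient and area for the linear function with values `vals`
    at the three vertices `pts` of a triangle."""
    (x1, y1), (x2, y2), (x3, y3) = pts
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)  # twice the signed area
    du2, du3 = vals[1] - vals[0], vals[2] - vals[0]
    gx = (du2 * (y3 - y1) - du3 * (y2 - y1)) / det
    gy = (du3 * (x2 - x1) - du2 * (x3 - x1)) / det
    return (gx, gy), abs(det) / 2.0


def explicit_residual_indicators(nodes, triangles, u, f=1.0, kappa=1.0):
    """Per-triangle indicators under the assumptions stated above."""
    grads, eta2 = [], []
    edge_tris = {}  # sorted node pair -> triangles containing that edge
    for t, tri in enumerate(triangles):
        pts = [nodes[i] for i in tri]
        g, area = p1_gradient(pts, [u[i] for i in tri])
        grads.append(g)
        h_T = max(math.dist(pts[k], pts[(k + 1) % 3]) for k in range(3))
        # u_h is linear and kappa is constant, so the interior residual is r = f.
        eta2.append(h_T ** 2 * f ** 2 * area)
        for k in range(3):
            edge = tuple(sorted((tri[k], tri[(k + 1) % 3])))
            edge_tris.setdefault(edge, []).append(t)
    for (i, j), ts in edge_tris.items():
        if len(ts) != 2:
            continue  # boundary edge, assumed to lie on the Dirichlet boundary
        length = math.dist(nodes[i], nodes[j])
        nx = (nodes[j][1] - nodes[i][1]) / length
        ny = -(nodes[j][0] - nodes[i][0]) / length
        g1, g2 = grads[ts[0]], grads[ts[1]]
        jump = kappa * (nx * (g1[0] - g2[0]) + ny * (g1[1] - g2[1]))
        # The jump is constant on the edge, so its squared L2 norm is length * jump^2.
        contrib = 0.5 * length * (length * jump ** 2)
        for t in ts:
            eta2[t] += contrib
    return [math.sqrt(v) for v in eta2]


# Unit square split into four triangles around a center node.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
tris = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
pyramid = [0.0, 0.0, 0.0, 0.0, 1.0]
print(explicit_residual_indicators(nodes, tris, pyramid))
```

A globally linear $u_h$ with $f = 0$ produces zero indicators, since there are no gradient jumps and no interior residual; the "pyramid" function above has a gradient jump across every interior edge, so every triangle receives a nonzero indicator.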
15.3 The element residual error estimator
In the last chapter, I showed how an effective a posteriori estimate for the error in a piecewise linear solution could be obtained from the piecewise quadratic solution on the same mesh. This is not practical because of the expense of solving the global problem to get the piecewise quadratic solution. The idea of implicit residual methods is to obtain a higher-order estimate of the solution element by element. This can be done by posing a BVP for the true solution $u$ on each element $T$. In this section, I will present one such BVP and the resulting error estimate. This method is due to Bank and Weiser [10] and is called the element residual method. Given that the exact solution to the BVP (15.1) is $u$ and that the approximate solution $u_h$ has been computed on the finite element mesh $\mathcal{T}_h$, it is easy to derive the PDE satisfied by the error $e_h = u - u_h$ in each $T \in \mathcal{T}_h$:
This calculation is valid in the interior of each triangle since the piecewise linear function $u_h$ is smooth there. This PDE must be augmented by suitable boundary conditions; in the element residual method, Neumann conditions are used:
The Neumann boundary conditions follow immediately from $e_h = u - u_h$. It is not possible to solve (15.9) to find $e_h$ on $T$, since the exact value of $\partial u/\partial n$ on $\partial T$ is unknown except on $\Gamma_2$ (where $\kappa\,\partial u/\partial n = h$). However, if $e \subset \partial T$ is an edge in the interior of $\Omega$ and $T'$ is the triangle on the other side, then a reasonable estimate is
In this formula, $n$ is the outward unit normal to $T$ and $\nabla u_h|_T$, $\nabla u_h|_{T'}$ are the gradients of $u_h$ on triangles $T$ and $T'$, respectively. Since $u_h$ is piecewise linear, these gradients are constant. The notation
will be used to denote the averaged value of $\kappa\,\partial u_h/\partial n$. If an edge of $T$ lies in $\Gamma_2$, then the boundary condition
will be applied on that edge. On the other hand, if an edge of $T$ lies in $\Gamma_1$, then the Dirichlet condition $e_h = 0$ can be applied on that edge. (Strictly speaking, the correct condition would be $e_h = g - u_h$, since $u_h$ likely satisfies the Dirichlet condition exactly only at the endpoints of the edge. However, it is simpler to take $e_h = 0$ on such an edge, and the additional error introduced does not significantly affect the performance of the method.) Thus $e_h$ is estimated on $T$ by solving the PDE
together with the following boundary conditions applied to each edge e of dT:
Exercise 2 asks the reader to show that the weak form of this BVP is
where and
is interpreted as $h$ on edges lying in $\Gamma_2$. Having solved (15.14), at least approximately, the element residual error estimate is defined by
It remains only to decide how to solve (15.14). Except when $T$ has an edge lying in $\Gamma_1$, the BVP is a pure Neumann problem, and there is, in fact, no guarantee that a solution exists. (The reader should recall that a pure Neumann problem such as (15.14) has no solution except when the appropriate compatibility condition is satisfied.) Nevertheless, it has been shown (cf. [10]) that a useful estimate of $e_h$ can be obtained by solving (15.14) by Galerkin finite elements over a special subspace of quadratic functions, one over which the bilinear form defining (15.14) is elliptic. The space $P_2(T)$ of quadratic functions on a triangle $T$ is six-dimensional, and elements of $P_2(T)$ are uniquely determined by their nodal values at the three vertices and the midpoints of the three edges (cf. Section 4.2.1). The bilinear form
is not $P_2(T)$-elliptic, but it is elliptic over the three-dimensional subspace $M(T)$ spanned by the nodal basis functions corresponding to the three edge midpoints. Restricting the approximating subspace from $P_2(T)$ to $M(T)$ is equivalent to estimating $e_h$ to be zero at the vertices of $T$. Again, it has been shown that this is not too severe an assumption. The Galerkin problem used to estimate $e_h$ on $T$ is then
The (three-point) element residual estimator is then
Sometimes the space $M(T)$ is augmented by a fourth basis function, which is cubic with value 0 on $\partial T$ and value 1 at the centroid of $T$. The resulting four-dimensional space will be denoted by $\hat{M}(T)$, and the solution of the above Galerkin problem, with $M(T)$ replaced by $\hat{M}(T)$, will be denoted by $\hat{\varepsilon}_h$. The (four-point) element residual estimator is
Figure 15.7. The convergence to zero of the error (in the energy norm) for Example 15.7: uniform refinement (solid line), local refinement based on the three-point element residual estimator (dashed line), local refinement based on the four-point element residual estimator (dotted line). The dashed and dotted lines are almost indistinguishable.
Because of their distinctive shapes, the basis functions of $M(T)$ are called bubble functions (see Figures 4.11 and 4.15). To illustrate their effectiveness, the two element residual estimators will be applied to the test problems from Section 14.3.
EXAMPLE 15.7 (cf. Example 14.1). The three-point element residual estimator led to
while the four-point element residual method resulted in
These results are compared to the results for uniform refinement in Figure 15.7 (cf. Figure 14.11).
EXAMPLE 15.8 (cf. Example 14.2). The three-point element residual estimator led to
while the four-point element residual method resulted in
Figure 15.8. The convergence to zero of the error (in the energy norm) for Example 15.8: uniform refinement (solid line), local refinement based on the three-point element residual estimator (dashed line), local refinement based on the four-point element residual estimator (dotted line). The dashed and dotted lines are almost indistinguishable.
These results are compared to the results for uniform refinement in Figure 15.8 (cf. Figure 14.13).
EXAMPLE 15.9 (cf. Example 14.3). The three-point element residual estimator led to
while the four-point element residual method resulted in
These results are compared to the results for uniform refinement in Figure 15.9 (cf. Figure 14.15).
Figure 15.9. The convergence to zero of the error (in the energy norm) for Example 15.9: uniform refinement (solid line), local refinement based on the three-point element residual estimator (dashed line), local refinement based on the four-point element residual estimator (dotted line). The dashed and dotted lines are almost indistinguishable.
In all three examples, the three-point estimator was just as effective as the four-point estimator and therefore should be preferred due to its reduced cost. Even the three-point element residual method is significantly more expensive than the explicit error indicators presented in the previous sections. Implicit estimators, such as the element residual method, are thought to be more robust and accurate than explicit estimators (see, for example, the discussions in Sections 2.1 and 3.1 of Ainsworth and Oden [2]). Indeed, the explicit estimators presented in this book are really error indicators, not error estimators. The element residual method, on the other hand, is derived as a true estimator, and it is quite effective. The following table shows the ratios of true energy norm error to estimated energy norm error for the three-point element residual estimator for Example 15.8; the estimator provides a close upper bound for the true error.
N       830     1712    3303    6722    12594
Ratio   0.851   0.844   0.836   0.834   0.829
However, in his comparative study of error estimation and local refinement techniques, Mitchell [32] found that a wide variety of implicit and explicit estimators resulted in a similar performance in practice. Indeed, using the exact error (computed from a knowledge of the exact solution) as the error estimator did not usually result in the best mesh, at least in his numerical experiments. For this reason, inexpensive explicit indicators may be preferred over more expensive implicit estimators.
There are a number of other estimators and indicators that have been proposed, beyond the three presented in this chapter. More information can be found in the book by Ainsworth and Oden [2].
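The bubble functions used by the three-point and four-point estimators can be written compactly in barycentric coordinates. The sketch below is an illustration, not the getBubbleVals routine: it assumes the standard formulas $4\lambda_i\lambda_j$ for the quadratic nodal basis functions at the edge midpoints and $27\lambda_1\lambda_2\lambda_3$ for the cubic function that vanishes on the boundary and equals 1 at the centroid. The function names are hypothetical.

```python
def edge_bubbles(lam):
    """Values of the three quadratic edge-midpoint basis functions at the
    point with barycentric coordinates lam = (l1, l2, l3).
    Entry k belongs to the edge opposite vertex k+1."""
    l1, l2, l3 = lam
    return (4.0 * l2 * l3, 4.0 * l3 * l1, 4.0 * l1 * l2)


def interior_bubble(lam):
    """Cubic bubble: zero on the boundary of the triangle, one at the centroid."""
    l1, l2, l3 = lam
    return 27.0 * l1 * l2 * l3


midpoint_23 = (0.0, 0.5, 0.5)        # midpoint of the edge opposite vertex 1
vertex_1 = (1.0, 0.0, 0.0)
centroid = (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)

print(edge_bubbles(midpoint_23))     # one basis function is 1, the others 0
print(edge_bubbles(vertex_1))        # all edge bubbles vanish at a vertex
print(interior_bubble(centroid))     # approximately 1
```

Since every edge bubble vanishes at the three vertices, any function in their span is zero there, which matches the statement that restricting to $M(T)$ amounts to estimating the error to be zero at the vertices of $T$.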
15.4 Some final examples
To illustrate the workings of the finite element method, I have presented a number of numerical examples throughout this book. Most of these have been problems with known smooth solutions, as such problems are convenient for illustrating various aspects of the performance of the algorithms. This final section will examine some typical problems that lead to singular solutions. The exact solutions are not known, but the adaptive algorithm developed in the last two chapters can be used to analyze the singularities numerically.
Figure 15.10. The mesh obtained by the adaptive finite element method and corresponding computed solution for the BVP described in Section 15.4.1.
15.4.1 A discontinuous coefficient
If $\kappa$ is piecewise continuous and the weak form of the BVP
is solved, it can be shown that the flux $\kappa\nabla u$ of the solution is continuous. Therefore, $\nabla u$ will necessarily have a discontinuity wherever $\kappa$ does (unless $\nabla u$ is zero there). To illustrate this, I will define $\Omega$ to be the unit square,
and $f(x, y) = 1$, and solve the BVP using the adaptive algorithm described above (with the three-point element residual error estimator). Figure 15.10 shows the final mesh obtained and the corresponding solution; both the mesh and the solution show the effect of the discontinuous coefficient $\kappa$. Figure 15.11 shows the graph of $z = u(0.5, y)$, a "slice" of the computed solution. The discontinuity in the gradient is clearly visible.
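The link between a continuous flux and a discontinuous gradient is easiest to check in one dimension. The sketch below is a hypothetical 1-D analogue, not the two-dimensional example of this section: it assumes the problem $-(\kappa u')' = 1$ on $(0, 1)$ with $u(0) = u(1) = 0$ and $\kappa$ equal to one constant on $(0, 1/2)$ and another on $(1/2, 1)$. Integrating once gives the flux $\kappa u' = C - x$, and the constant $C$ follows from the two boundary conditions.

```python
def interface_slopes(kappa_left, kappa_right):
    """One-sided derivatives of the exact solution at x = 1/2, plus the flux,
    for -(kappa u')' = 1 on (0, 1), u(0) = u(1) = 0, kappa piecewise constant."""
    # The flux is q(x) = kappa * u'(x) = C - x.  The condition
    # u(1) - u(0) = integral of q / kappa over (0, 1) = 0 determines C.
    num = (1.0 / 8.0) / kappa_left + (3.0 / 8.0) / kappa_right
    den = 0.5 * (1.0 / kappa_left + 1.0 / kappa_right)
    c = num / den
    flux = c - 0.5                       # continuous across the interface
    return flux / kappa_left, flux / kappa_right, flux


slope_left, slope_right, flux = interface_slopes(1.0, 10.0)
print(slope_left, slope_right, flux)
```

With these values of $\kappa$ the two one-sided slopes differ by a factor of 10 while $\kappa u'$ agrees from both sides; when the two constants are equal, $C = 1/2$ and the flux (and hence the slope) is zero at the midpoint, which is the exceptional case mentioned above.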
15.4.2 A reentrant corner
Elliptic regularity, described in Section 5.4, holds for planar regions $\Omega$ if $\partial\Omega$ is smooth or $\Omega$ is convex. A nonconvex region $\Omega$ with a boundary that fails to be smooth can lead to solutions with singularities even if the problem functions (coefficients, forcing functions, boundary data) are smooth. The region $\Omega$ shown in Figure 15.12 is such a region; the corner at the center of the graph is commonly referred to as a reentrant corner.
Figure 15.11. The graph of $z = u(0.5, y)$, where $u$ is the computed solution from the example in Section 15.4.1. The discontinuity in the gradient is clearly seen.
Figure 15.12. The mesh obtained by the adaptive finite element method and corresponding computed solution for the BVP described in Section 15.4.2.
The simple Dirichlet BVP
was solved by the adaptive algorithm, resulting in the mesh and computed solution shown in Figure 15.12. The locally refined mesh, in a neighborhood of the reentrant corner, is magnified by factors of 4 and 16 in Figure 15.13.
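The local form of such a corner singularity can be stated explicitly in a model case. The following is a standard fact, given here under assumptions rather than taken from this example: it assumes the operator is the Laplacian, that homogeneous Dirichlet conditions hold on both sides of the corner, and that $(r, \theta)$ are polar coordinates centered at the corner with interior angle $\omega$. The value $\omega = 3\pi/2$ is only an illustration; the angle in Figure 15.12 is not assumed.

```latex
\begin{align*}
u(r,\theta) &\approx c\, r^{\pi/\omega} \sin\!\left(\frac{\pi\theta}{\omega}\right)
  + \text{smoother terms}, \qquad 0 < \theta < \omega, \\
|\nabla u(r,\theta)| &\sim r^{\pi/\omega - 1} \quad \text{as } r \to 0, \\
\omega = \tfrac{3\pi}{2}: \quad u &\sim r^{2/3}, \qquad |\nabla u| \sim r^{-1/3}.
\end{align*}
```

For a reentrant corner ($\omega > \pi$) the exponent $\pi/\omega$ is less than 1, so the gradient is unbounded near the corner; this is consistent with the strong local refinement seen around the corner.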
Figure 15.13. The mesh from Figure 15.12, magnified 4 times (left) and 16 times (right) around the reentrant corner.
Figure 15.14. The mesh obtained by the adaptive finite element method and corresponding computed solution for the BVP described in Section 15.4.3.
15.4.3 Transition from Dirichlet to Neumann conditions
The final example involves mixed boundary conditions, with a singularity appearing where the boundary conditions change from Dirichlet to Neumann. The BVP is
where $\Omega$ is the unit square, $\Gamma_2$ is the interval on the $y$-axis with endpoints (0, 0.25) and (0, 0.75), and $\Gamma_1$ is the rest of $\partial\Omega$. The mesh obtained from the adaptive algorithm and the corresponding computed solution are graphed in Figure 15.14. The solution has singularities at the points (0, 0.25) and (0, 0.75), where the boundary conditions change type. This is clearly seen in Figure 15.15, where $z = u(0, y)$ is graphed.
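The typical local behavior at such a transition point can be sketched in a model case. The expression below is a standard fact stated under assumptions, not a result derived from this example: it assumes the operator is the Laplacian, the boundary is straight near the transition point, the data are homogeneous there, and $(r, \theta)$ are polar coordinates centered at the transition point with $\theta = 0$ along the Dirichlet side and $\theta = \pi$ along the Neumann side.

```latex
\begin{align*}
u(r,\theta) &\approx c\, r^{1/2} \sin\!\left(\frac{\theta}{2}\right) + \text{smoother terms},
  \qquad 0 \le \theta \le \pi, \\
u(r, 0) &= 0, \qquad \frac{\partial u}{\partial \theta}(r, \pi)
  = \frac{c}{2}\, r^{1/2} \cos\!\left(\frac{\pi}{2}\right) = 0, \\
|\nabla u(r,\theta)| &\sim r^{-1/2} \quad \text{as } r \to 0 .
\end{align*}
```

Along the Neumann side the leading term grows like the square root of the distance from the transition point, so its slope is unbounded there.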
Figure 15.15. A graph of $z = u(0, y)$, where $u$ is the computed solution from the example in Section 15.4.3. The discontinuity in the gradient is clearly seen.
15.5 The MATLAB implementation
In addition to routines implementing the estimators from this chapter, there are functions to estimate the actual errors, in the energy and $L^\infty$-norms, on each triangle. These routines require that the exact solution be provided. I say estimate, even in this context, because quadrature is used to estimate the integrals defining the energy norm, and the $L^\infty$-norm is estimated by sampling the given function at a judicious selection of points.
15.5.1 MATLAB functions
• EJIndicator1: The Eriksson-Johnson error indicator.
• ExpResidual1: The explicit residual error indicator.
• ElementResidual1: The element residual error estimator (three- or four-point versions).
• ElementEnergyNormErrs1: True energy norm errors (estimated by quadrature). Requires the true solution.
• ElementLinfNormErrs1: True $L^\infty$-norm errors (estimated by sampling). Requires the true solution.
• getBubbleVals: Computes the values and gradients of the bubble functions at specified evaluation nodes on $T^R$.
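The idea of estimating an $L^\infty$-norm error by sampling can be sketched as follows. This is not the ElementLinfNormErrs routine, and the choice of sample points is an assumption: the sketch samples the error on a uniform barycentric lattice with `n` subdivisions per edge, whereas the points actually used by the MATLAB code are not specified here. The function name and parameters are hypothetical.

```python
def linf_error_on_triangle(verts, err, n=4):
    """Estimate max |err(x, y)| over a triangle by sampling on a uniform
    barycentric lattice with n subdivisions along each edge."""
    (x1, y1), (x2, y2), (x3, y3) = verts
    best = 0.0
    for i in range(n + 1):
        for j in range(n + 1 - i):
            l1, l2 = i / n, j / n
            l3 = 1.0 - l1 - l2
            # Convert barycentric coordinates to a physical point.
            x = l1 * x1 + l2 * x2 + l3 * x3
            y = l1 * y1 + l2 * y2 + l3 * y3
            best = max(best, abs(err(x, y)))
    return best


reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(linf_error_on_triangle(reference, lambda x, y: x * y))
```

Sampling can only underestimate the true maximum, so a finer lattice (larger `n`) trades cost for reliability; this is why the text speaks of estimating, rather than computing, the true error.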
15.6 Exercises for Chapter 15
1. In Section 15.2, it was stated that each triangle $T' \in \mathcal{T}_h$ belongs to at most $k$ patches $\tilde{T}$ for some finite integer $k$. Find $k$ if $\mathcal{T}_h$ is a standard uniform mesh on a rectangle (such as the mesh on the left in Figure 14.1).
2. Show that the weak form of the BVP defined by (15.10) and (15.11)-(15.13) is (15.14).
3. Perform an operation count for each of the error estimators described in this chapter. How does each compare to the cost of assembling the stiffness matrix $K$? (See Exercise 7.6.7.)
4. (MATLAB) Solve the BVP from Exercise 14.5.4 using the adaptive algorithm with the element residual error estimator. How does the total error estimated by the element residual method compare with the total error in the final solution?
5. (MATLAB) Consider the BVP
where $\Omega$ is the polygon with vertices (1, -1), (1, 1), (-1, 1), (-1, -1), and (0, 0). Using an adaptive algorithm, estimate the solution with an (estimated) error of no more than 5% in the energy norm.
6. (MATLAB) Consider the linear system $KU = F$ arising from the BVP in Example 14.2 on a locally refined mesh. Solve this system using each of the following algorithms:
(a) CG;
(b) CG with the hierarchical basis preconditioning;
(c) CG with SSOR preconditioning.
Repeat for several levels of (local) refinement. Which method seems to be the most efficient?
7. (MATLAB) Repeat the previous exercise for the system $KU = F$ arising from the BVP in Example 14.3.
8. (MATLAB) Consider the BVP from Exercise 14.5.7.
(a) Solve the BVP using the adaptive algorithm with each of the error estimators presented in this chapter. Which method is most efficient?
(b) Does the element residual estimator accurately estimate the errors in the computed solution? If not, why not? (Hint: Consider the behavior of the true solution across the discontinuities of $\kappa$.)
9. (Programming) Extend the MATLAB codes ExpResidual1 and ElementResidual1 to allow a zero-order term in the PDE, as in
10. (Programming) Extend the MATLAB code ElementEnergyNormErrs1 to allow a zero-order term in the PDE, as in
11. (MATLAB) The solution of
has a boundary layer whose severity increases with $p$. Solve this BVP on the unit square for $p = 10, 100, 1000$ and observe how the meshes change with $p$.
12. (Programming) The triangle selection algorithm involves a number of choices. The MATLAB codes Solve and SelectTris include the following features:
• SelectTris always selects at least 20% of the triangles for refinement.
• Each selected triangle is refined until the diameters of its subtriangles are at most half of its diameter.
These choices are almost certainly not optimal. The purpose of this exercise is to design a more efficient triangle selection process. Modify the code as follows:
(a) SelectTris selects at least $rN_t$ triangles to refine, where $r \in [0, 1)$ is an algorithmic parameter. (If $r = 0$, then the algorithm selects only the triangles indicated by the strategy described in Section 14.2, even if the number is small.)
(b) Solve refines each selected triangle until the diameters of its subtriangles are at most $p$ times its diameter, where $p \in (0, 1]$ is an algorithmic parameter. (If $p = 1$, then Solve refines each selected triangle only once, which may not lead to a decreased diameter since the refinement is done by bisection.)
(c) Solve calls CG to solve the finite element equations $KU = F$.
By numerical testing, determine good values for $r$ and $p$. Use the test problems from Examples 14.1, 14.2, and 14.3, and measure the efficiency by the total amount of work required to reduce the error in the computed solution below a given threshold. (To measure this total work, the results of Exercises 7.6.7, 7.6.8, and 15.6.3, as well as the operation count for the CG algorithm given in Section 11.1.1, will be useful.) Be sure to include the cost of the calculations on the intermediate meshes in the totals.
Bibliography
[1] R. A. Adams. Sobolev Spaces. Academic Press, New York, 1975.
[2] Mark Ainsworth and J. Tinsley Oden. A Posteriori Error Estimation in Finite Element Analysis. John Wiley, New York, 2000.
[3] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. 3rd edition, SIAM, Philadelphia, 1999.
[4] D. N. Arnold and R. Winther. Mixed finite elements for elasticity. Numer. Math., 42:401-419, 2002.
[5] D. N. Arnold and R. Winther. Nonconforming mixed elements for elasticity. Math. Models Methods Appl. Sci., 13:295-307, 2003.
[6] Owe Axelsson. Iterative Solution Methods. Cambridge University Press, Cambridge, UK, 1994.
[7] I. Babuska and W. C. Rheinboldt. Error estimates for adaptive finite element computations. SIAM J. Numer. Anal., 15:736-754, 1978.
[8] I. Babuska and T. Strouboulis. The Finite Element Method and its Reliability. Oxford University Press, New York, 2001.
[9] Ivo Babuska and Manil Suri. Locking effects in the finite element approximation of elasticity problems. Numer. Math., 62:439-463, 1992.
[10] R. E. Bank and A. Weiser. Some a posteriori error estimators for elliptic partial differential equations. Math. Comp., 44:283-301, 1985.
[11] Mark W. Beall and Mark S. Shephard. Mesh Data Structures for Advanced Finite Element Applications. Technical Report SCOREC Report #23-1995, Scientific Computation Research Center, Rensselaer Polytechnic Institute, Troy, NY, 1995.
[12] P. Bochev and R. B. Lehoucq. On the finite element solution of the pure Neumann problem. SIAM Review, 47:50-66, 2005.
[13] Susanne C. Brenner and L. Ridgway Scott. The Mathematical Theory of Finite Element Methods. Springer-Verlag, New York, 1994.
[14] Zhiqiang Cai and Gerhard Starke. Least-squares methods for linear elasticity. SIAM J. Numer. Anal., 42:826-842, 2004.
[15] P. G. Ciarlet and P. A. Raviart. Interpolation theory over curved elements, with applications to finite element methods. Comput. Methods Appl. Mech. Engrg., 1:217-249, 1972.
[16] Philippe G. Ciarlet. The Finite Element Method for Elliptic Problems. North-Holland, Amsterdam, 1978. Reprinted as Classics in Applied Mathematics 40, SIAM, Philadelphia, 2002.
[17] Philippe G. Ciarlet. Introduction to Numerical Linear Algebra and Optimisation. Cambridge University Press, Cambridge, UK, 1989.
[18] D. A. Dunavant. High degree efficient symmetrical Gaussian quadrature rules for the triangle. Int. J. Numer. Methods Engrg., 21:1129-1148, 1985.
[19] Kenneth Eriksson and Claes Johnson. An adaptive finite element method for linear elliptic problems. Math. Comp., 50:361-383, 1988.
[20] Gerald B. Folland. Real Analysis: Modern Techniques and their Applications. John Wiley, New York, 1984.
[21] Gene H. Golub and Charles F. Van Loan. Matrix Computations. 3rd edition, The Johns Hopkins University Press, Baltimore, MD, 1996.
[22] Michael Greenberg. Advanced Engineering Mathematics. 2nd edition, Prentice-Hall, Upper Saddle River, NJ, 1998.
[23] P. Grisvard. Elliptic Problems in Nonsmooth Domains. Pitman, Boston, 1985.
[24] M. E. Gurtin. An Introduction to Continuum Mechanics. Academic Press, New York, 1981.
[25] Dieter Jungnickel. Graphs, Networks and Algorithms. Springer-Verlag, Berlin, 1999.
[26] Wilfred Kaplan. Advanced Calculus. 4th edition, Addison-Wesley, Reading, MA, 1994.
[27] M. Lenoir. Optimal isoparametric finite elements and error estimates for domains involving curved boundaries. SIAM J. Numer. Anal., 23:562-580, 1986.
[28] David G. Luenberger. Introduction to Linear and Nonlinear Programming. Addison-Wesley, Reading, MA, 1973.
[29] T. A. Manteuffel. An incomplete factorization technique for positive definite linear systems. Math. Comp., 34:473-497, 1980.
[30] Jerrold E. Marsden and Anthony J. Tromba. Vector Calculus. 4th edition, W. H. Freeman, New York, 1996.
[31] J. A. Meijerink and H. A. van der Vorst. An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix. Math. Comp., 31:148-162, 1977.
[32] William F. Mitchell. A comparison of adaptive refinement techniques for elliptic problems. ACM Trans. Math. Software, 15:326-347, 1989.
[33] Per-Olof Persson and Gilbert Strang. A simple mesh generator in MATLAB. SIAM Review, 46:329-345, 2004.
[34] Jeffrey Rauch. Partial Differential Equations. Springer-Verlag, New York, 1991.
[35] Maria-Cecilia Rivara. Algorithms for refining triangular grids suitable for adaptive and multigrid techniques. Int. J. Numer. Methods Engrg., 20:745-756, 1984.
[36] H. L. Royden. Real Analysis. 2nd edition, Macmillan, New York, 1968.
[37] Yousef Saad. Iterative Methods for Sparse Linear Systems. 2nd edition, SIAM, Philadelphia, 2003.
[38] R. Scott. Finite Element Techniques for Curved Boundaries. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 1973.
[39] E. G. Sewell. Automatic Generation of Triangulations for Piecewise Polynomial Approximation. PhD thesis, Purdue University, West Lafayette, IN, 1972.
[40] Gilbert Strang. Variational crimes in the finite element method. In A. K. Aziz, ed., The Mathematical Foundations of the Finite Element Method with Applications to Partial Differential Equations, Academic Press, New York, 1972, pp. 689-710.
[41] Gilbert Strang and George J. Fix. An Analysis of the Finite Element Method. Wellesley-Cambridge Press, Wellesley, MA, 1988.
[42] Richard S. Varga. Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, NJ, 1962.
[43] David M. Young. Iterative Solution of Large Linear Systems. Academic Press, New York, 1971.
[44] Harry Yserentant. On the multi-level splitting of finite element spaces. Numer. Math., 49:379-412, 1986.
[45] O. C. Zienkiewicz, B. M. Irons, F. E. Scott, and J. S. Campbell. High speed computing of elastic structures. In Proceedings of the Symposium of the International Union of Theoretical and Applied Mechanics. Liege, 1970.
This page intentionally left blank
Index boundary layer, 3 1 8 boundary value problem, 3, 4 Dirichlet, 20 elliptic, 8 Neumann, 7, 17, 47 one-dimensional, 64 variational form, 3, 15, 21, 22 weak form, 3, 15,21,27,34,57 bounded set, 4 bulk modulus, 12, 13,49,218
adaptive finite elements basic algorithm, 310 admissible displacement, 22 algebraic error, 303 approximating subspace, 70, 79 for Dirichlet conditions, 72 back substitution, 226 balance law, 9 banded matrix, 228 barycentric coordinates, 122, 161 basis, 58 hierarchical, 85 Lagrange, 70, 78 nodal, 70 best approximation, 58 bilinear form, 43 bounded, 43 elliptic, 43 BndyFcn, 137, 148 body force, 10 boundary curved, 93 of a domain, 4 boundary condition inhomogeneous Neumann, 132 boundary conditions, 4 Dirichlet, 4, 5, 7, 45, 169 essential, 30, 33 inhomogeneous Dirichlet, 74, 111, 132, 194 inhomogeneous Neumann, 194, 198 interpolated, 112 mixed, 8, 10,31,33,46,72,78 natural, 30 Neumann, 4, 5, 47, 70, 170, 177 transition, 348
C A (ft),20 C0°°(ft),23
c£(n),2o
Cauchy sequence, 40 Cauchy-Schwarz inequality, 37 Cea's theorem, 61, 63 CG, 262, 276 CGsing, 262 CGSSOR, 276 change of variables in a 1-D integral, 158 in a 2-D integral, 160 Cholesky factorization, 226 of a banded matrix, 229 of a general sparse matrix, 230 Ciarlet, 199 closure of a domain, 4 CNodePtrs, 134 coarse-grid correction, 296 CoarseCircleMeshDl, 150 CoarseCircleMeshNl, 150 CoarseEllipseMeshDl, 151 CoarseEllipseMeshNl, 151 CoarseSemiCircleMeshBottomDl, 150 CoarseSemiCircleMeshDl, 150 357
358
compact set, 23 compactly supported function, 23 compatibility condition, 5, 7, 17, 256 compatibly divisible triangles, 312 complete space, 40 condition number and choice of basis, 244 of a stiffness matrix, 237 of a symmetric positive definite matrix, 237 conjugate directions, 242 conjugate gradient algorithm, 239, 242 preconditioned, 253 with hierarchical bases, 251, 254 connected set, 4 connectivity of a mesh, 129 consistency of a singular system, 261 constitutive hypothesis, 9 continuous dependence, 44 convergence of the conjugate gradient algorithm, 243 of the steepest descent algorithm, 237 degenerate family of meshes, 311 Degree, 137 Delaunay triangulation, 140 dense subset, 40 descent direction, 236 diagonally dominant matrix, 271 direct method for solving a linear system, 225 discontinuous coefficient, 346 discretization error, 303 displacement problem, 49 distmesh2d, 140 divergence of a tensor, 9 divergence operator, 6, 16 divergence theorem, 15, 18 for a vector-valued function, 18 domain, 4 dot product, 36 of tensors, 11
Index
dual space of a normed vector space, 42 duality trick, 116 Dunavant (quadrature rules), 188 DunavantData, 203,204 edge constrained, 72 free, 72 EdgeCFlags, 137 EdgeEls, 137 Edges, 135 efficiency of high-order elements, 201 EJ Indicator, 349 elasticity, 8 general, 34 isotropic, 9, 48, 213 element quadrilateral, 87, 165 rectangular, 86 reference, 87, 91, 188, 195 triangular, 68 ElementEnergyNormErrs, 349 ElementLinfNormErrs, 349 ElementResidual, 349 Elements, 135 elliptic regularity, 115 EnergyNorm, 205 EnergyNorml, 182 EnergyNorm2, 204 EnergyNormErr, 205 EnergyNormErrl, 173, 182 EnergyNormErr2, 204 error bound a posteriori, 106 a priori, 105 asymptotic, 105 finite element solution, 111 inhomogeneous Dirichlet problem, 114 isoparametric finite elements, 122 piecewise linear interpolation, 107 piecewise polynomial interpolation, 109 quadrature, 121
359
Index
error estimate a posteriori, 180, 317 heuristic, 179,202 error estimator element residual, 340 explicit versus implicit, 329 quadratic, 317 error indicator, 329 Eriksson-Johnson, 331 explicit residual, 338 EvalNodalBasisFcns, 203, 204 EvalNodalBasisFcnslD, 204 EvalNodalBasisGrads, 203, 204 EvalPWPolyFcn2, 204 existence, 4 ExpResidual, 349 ExtractLinearMesh, 204 fast Poisson solver, 255 FBndyEdges, 137 fill-in, 228 reducing, 231 finite differences for the Laplacian, 281 finite element method, 3, 67 Galerkin, 3 fixed point, 267 fixed point iteration, 267 FNodePtrs, 134 forward substitution, 226 Fourier modes, 284 Fourier's law of heat conduction, 6, 212 Friedrich's inequality, 48 fullmgl,305 fundamental theorem of calculus, 17 Galerkin method, 57, 60, 295 Gauss-Seidel iteration, 272, 289 GaussData, 204 GenLagrangeMesh, 205 GenLagrangeMesh2, 203 getAdjacentTriangle, 325 getAdjacentTriangles, 325 getBubbleVals, 349 getDiameter, 325 getDiameters, 325
getDirichletData, 183 getFBndyEdgeNodes, 325 getGrad 1,325 getGradientsl, 183 getNeumannData, 206 getNeumannDatal, 183 getNeumannData la, 183 getNeumannData2, 205 getNeumannData2a, 205 getNeumannDatalso, 221 getNodal Values, 183 getNodes, 205 getNodesl, 166, 183 getNormal, 206 getNormall, 183 getNorma!2, 205 getOppositeVertex, 325 getOtherEdges, 325 getTriNodelndices, 205 getTrilModelndicesl, 183 getVertices, 205 global indices, 69 gradient, 9 Gram matrix, 59 Green's (first) identity, 18 alternate version, 19 extension, 19 //'(ft), 27, 71 half-bandwidth, 228 harmonic function, 4 heat flux, 6, 7, 16,212 help in MATLAB, 144 hierarchical basis, 244 HierCG 1,262 HierCGsing 1,262 HierToNodal 1,262 HierToNodalTransl, 262 Hilbert space, 40 hyperelasticity, 11 ill-conditioned matrix, 237 incomplete Cholesky factorization, 255 incompressible material, 218 indices, 69
360
Index
induced matrix norm, 268
infinitesimal rigid displacements, 12
inner product, 36, 57
  H^1, 39
  L^2, 38
  energy, 43
integral
  Lebesgue, 28
  Riemann, 28
integration by parts
  in multiple dimensions, 18
interior node, 192
  isoparametric, 199
interpolant
  piecewise linear, 107
  piecewise polynomial, 109
Interpolate1, 180, 183, 305
InterpolateTrans1, 305
interpolation operator, 293
IntNodes, 187
isoparametric finite elements, 95, 121, 195, 201
iterative method
  for solving a linear system, 225
Jacobi, 276
Jacobi iteration, 271, 285
  weighted, 287
Jacobi preconditioning, 254
Jacobian
  determinant, 93, 160
  matrix, 90
  nonconstant, 98, 197
JiggleMesh1, 151
Korn's inequality, 49, 51
Krylov subspace, 241
L^2(Ω), 27
L^2(a,b), 37
L2Norm1, 182
L2Norm2, 204
L2NormErr1, 182
L2NormErr2, 204
Lagrange triangle
  cubic, 82
  linear, 78
  quadratic, 78
Lamé moduli, 9, 214
Laplace operator, 3, 16
Laplace's equation, 3
Laplacian, 3
Lax-Milgram theorem, 52
Lenoir, 199
LevelNodes, 140
line search, 236
linear functional, 41
  bounded, 42
LinfNormErr1, 182
Load, 205
load vector, 60, 93, 169, 195
  element-oriented assembly, 132
Load1, 171, 182
Load2, 203, 204
LoadIso, 221
local indices, 69
locally integrable function, 24
LocalRefine1, 324
longest-edge bisection, 313
MakeEdgesCurved1, 151
MakeMesh1, 140, 151
mass matrix, 104
matrix
  banded, 228
  diagonally dominant, 271
  ill-conditioned, 85
  sparse, 74
  symmetric positive definite, 65
matrix conductivity problem, 213
matrix norm, 268
  induced, 268
measurable function, 28
membrane
  isotropic, 8
  small vertical deflections of, 7
Mesh, 205
mesh, 67
  data structure, 134, 187
  nonconforming, 68, 311
  nonuniform, 95
  quality, 143
  uniform versus locally refined, 309
mesh locking, 219
mesh size, 69
Mesh1, 150
Mesh2, 203
MeshQuality1, 144, 151
mgmu1, 304
midpoint rule, 157
multigrid
  μ-cycle, 299
  full multigrid, 301
  two-grid algorithm, 295
  V-cycle, 296
  V-cycle, recursive version, 299
  W-cycle, 299
Neumann data, 177
NeumannMesh, 151
newest node bisection, 312
NGonMeshD1, 151
nodal value, 69
NodalToHier1, 263
NodalToHierTrans1, 263
node
  constrained, 72
  free, 72
  interior, 192
NodeParents, 140
NodePtrs, 135
Nodes, 134
nodes
  constrained, 79
  free, 79
nondegenerate family of triangulations, 107
norm, 36
  equivalent, 43
  of a linear functional, 41
  of a matrix, 268
normal derivative, 4
normal equations, 59
notation
  summary, 151
numerical integration, 119
open set, 4
orthogonal basis, 59
orthogonal vectors, 57
partial derivative
  classical, 23
  weak, 24, 71
piecewise linear function
  continuous, 69, 107
piecewise polynomial, 3, 67, 109
piecewise quadratic function
  continuous, 78
Poincaré's inequality, 46
Poisson's equation, 4
Poisson's ratio, 9, 13, 217
polynomial, 67
  bilinear, 86
postsmoothing, 296
potential energy, 21, 62
  minimal, 22
preconditioner, 252
presmoothing, 296
product rule, 17
  in multiple dimensions, 17, 18
programming
  object-oriented, 134
  procedural, 134
projection theorem, 58
Pythagorean theorem, 58
QuadElementErrEst1, 324
quadratic form, 235
quadratic penalty method, 258
quadrature
  degree of precision, 156
  Gaussian, 1-D, 156
  midpoint rule, 157
  on a square, 165
  on a triangle, 158, 188, 196
  product Gauss rule, 165
  Simpson's rule, 156
  trapezoidal rule, 156
quadrature rule, 119
  degree of precision, 119
R^n, 35
rank-one matrix, 260
rational numbers, 39
Raviart, 199
RectangleMeshD1, 150, 173, 185
RectangleMeshD1a, 185
RectangleMeshN1, 150
RectangleMeshTopD1, 150
RectangleMeshTopLeftD1, 150
RectangleMeshTopLeftN1, 150
reentrant corner, 346
Refine1, 139, 151
refinement
  and curved boundaries, 147
  green, 311
  standard, 106
RefTri, 204
residual, 293
Riemann sum, 38
Riesz representation theorem, 42, 47-49
rigid displacement
  infinitesimal, 11, 50
Ritz method, 62
Scott, 199
SelectTris, 324
shape functions, 78, 97
  rational, 89
shear modulus, 12, 13, 49, 218
ShowDisplacement, 220
ShowMesh, 205
ShowMesh1, 144, 151
ShowMesh2, 204, 205
ShowPWConstFcn, 151
ShowPWLinFcn1, 148, 151
ShowPWPolyFcn, 205
ShowPWPolyFcn2, 204
ShowSupport1, 151
Simpson's rule, 156
Sobolev space, 27, 28
Solve, 324
Solve1, 324
SOR, 273, 276
  optimal, 273
sparse matrix, 59
sparsity pattern, 76, 80
spectral radius
  of a matrix, 269
splitting
  of a matrix, 270
SSOR, 276
SSOR preconditioning, 255
stability, 44
stationary iteration, 267
stationary point, 267
steady-state heat flow, 5, 16
steepest descent
  algorithm, 237
  direction, 237
Stiffness, 205
stiffness matrix, 60, 93, 167, 193
  element-oriented assembly, 130
  hierarchical versus nodal bases, 249
  node-oriented assembly, 128
  singular, 178, 256
Stiffness1, 182
Stiffness2, 203, 204
StiffnessE, 325
StiffnessIso, 220
StiffnessMC, 221
strain tensor
  linearized, 9
stress tensor, 9
stress-strain law
  general, 10
struct
  MATLAB, 138
subspace, 39, 58
successive overrelaxation, 273
support
  of a function, 23, 129
symamd, 231
symmetric minimum degree permutation, 231
symmetric reverse Cuthill-McKee algorithm, 231
symmetric successive overrelaxation, 255, 274
  and conjugate gradients, 275
symrcm, 231
temperature gradient, 6
test function, 20
TestConv, 206
TestConv1, 183
TestConv2, 204
TestConvIso, 221
TestConvMC, 221
thermal conductivity, 6, 45
  anisotropic, 213
total error, 303
trace, 9
trace theorem, 28, 46
traction problem, 10, 49, 50
TransToRefTri, 204
trapezoidal rule, 156
triangle
  green, 311
  with curved edge, 97
triangle inequality, 3
triangle selection
  for local refinement, 315
triangle-node list, 129
triangulation, 68
  Delaunay, 140
  nonconforming, 68
uniqueness, 4
unRefine1, 305
variation
  first, 22
variational crime, 118
variational problem
  nonsymmetric, 51, 63
vector space, 35
work unit, 296
Young's modulus, 9, 13, 217