φ(t) : [0,1] → C^m and φ(t) ∉ Q* for t ∈ (0,1] include all the nonsingular roots of F(z; φ(0)) = 0.

Lemma 7.1.2  Let q_0 ∈ C^m be a point and let A ⊂ C^m be a proper algebraic subset not containing q_0. Then, for all q_1 ∈ C^m except those lying in a set of real dimension at most 2m − 1, the one-real-dimensional open line segment

    φ(t) := t q_1 + (1 − t) q_0,    t ∈ (0,1],

is contained in C^m \ A.

Proof.  Set A has complex dimension at most m − 1, so it has real dimension at most 2m − 2. Let B be the union of all real one-dimensional lines through q_0 and any point of A. B has real dimension at most 2m − 1, and so its complement in C^m has real dimension 2m. The set of points q_1 ∈ C^m that give a line segment satisfying the condition of the lemma includes all of C^m \ B.  •

Item 5 of Theorem 7.1.1 with Lemma 7.1.2 imply that for a given target set of parameters q_0, almost any starting set of parameters q_1 will give a homotopy

    F(z; t q_1 + (1 − t) q_0) = 0
(7.1.1)
whose solution paths include all the nonsingular solutions of F(z; q_0) = 0 at their endpoints as t goes from 1 to 0 on the real line. If somehow we can arrange to solve F(z; q_1) = 0 for a random, complex set of parameters q_1, we are ready to solve the
target system, because the one-real-dimensional open line segment of the homotopy is contained in C^m \ Q* with probability one. Suppose we have all the nonsingular solutions for only the particular system F(z; q_1) = 0, with N(q_1) = N. Even though q_1 is generic, it could happen that we wish to solve the system for a target q_0 for which the homotopy of Equation 7.1.1 fails. This means there is some relation between q_1 and q_0; for example, they might both be real with a degenerate point on the real line segment joining them. Referring to the proof of Lemma 7.1.2, we have that q_1 is not in the degenerate set, Q*, but it is in the set of points lying on a real straight line from q_0 to a point of Q*. When q_1 is generic, in the sense that N(q_1) = N, but not random complex independent of q_0, can we still formulate a homotopy to find all nonsingular solutions of F(z; q_0) = 0 with probability one? Yes: the answer is to follow a different continuation path, one that is not the real straight-line segment from q_1 to q_0 and that includes some extra parameter or parameters that can be chosen generically to avoid degeneracies. Here are three, among many, possibilities:

• Pick a third random, complex parameter point p ∈ C^m and follow the broken-line homotopy path from q_1 to p to q_0. Each of the two real straight-line segments will succeed with probability one, and so the concatenation of the two will succeed also.

• Pick p as in the previous item, and employ a curved-path homotopy such as

    F(z; t q_1 + t(1 − t) p + (1 − t) q_0) = 0.
(7.1.2)
By similar reasoning to Lemma 7.1.2, the endpoints at t = 0 of N paths from the nonsingular solutions of F(z; q_1) = 0 will include all the nonsingular solutions of F(z; q_0) = 0 for almost all choices of p ∈ C^m.

• Use the same homotopy as in Equation 7.1.1, but follow a more general path in the complex line defined by t, instead of just following the real segment [0,1]. A convenient way of doing so is to reparameterize the homotopy by τ ∈ [0,1], setting

    t = γτ / (1 + (γ − 1)τ),
for generic γ ∈ C. This maneuver is justified in the following lemma.

Lemma 7.1.3 ("Gamma Trick")  Fix a point q_0 ∈ C^m, a proper algebraic set A ⊂ C^m, and a point q_1 ∈ C^m, q_1 ∉ A. For all γ ∈ C except for a finite number of one-real-dimensional rays from the origin, the one-real-dimensional arc
    φ(t) := t q_1 + (1 − t) q_0,    t = γτ / (1 + (γ − 1)τ),    τ ∈ (0,1],
is contained in C^m \ A. Furthermore, if we let γ = e^{iθ}, the foregoing statement still holds for all but a finite number of points θ ∈ [−π, π].
Proof.  Since the set T := {t ∈ C | (t q_1 + (1 − t) q_0) ∈ A} is algebraic, it must either be all of C or a finite number of points in C. But by assumption, t = 1 is not in T, so T must be finite. The bilinear transform from τ to t maps [0,1] to a circular arc in the Argand plane for t, leaving t = 0 with angle equal to the angle of γ. Hence, any two choices of γ ≠ 0 having different angles give distinct circular arcs that meet only in the two points t = 0 and t = 1. This implies that there is only one such arc through each t ∈ T, and each such arc is produced by values of γ on a one-real-dimensional ray from the origin. For all other values of γ ∈ C, the path φ(t) for τ ∈ (0,1] is contained in C^m \ A. The final statement follows because each ray from the origin hits the unit circle, |γ| = 1, in a single point.  •

There are many alternative ways one could set up paths with the desired genericity, but these simple approaches suffice. We have already seen the usefulness of a variant of the "gamma trick" in the example of Figure 2.1, and we will return to it in § 8.3.

Theorem 7.1.1 covers many of the cases that arise in practice, but situations arise when more refined versions are useful. Some useful variants are: (1) the variables z live on projective space or on a cross product of projective spaces instead of on Euclidean space; (2) we count solutions on a Zariski open subset of the variable space instead of on the whole space, that is, solutions that satisfy prespecified algebraic conditions are to be ignored; (3) the parameters q live on an irreducible algebraic set in Euclidean space or in projective space or in a cross product of projective spaces. In the case that the variable space or the parameter space involves a projective factor, the system of equations must be multihomogeneous in a way that is compatible with those spaces. Recall from § 3.6 the definition of a multiprojective space as a product of projective spaces, for which we have the associated concept of multihomogeneous polynomials.

Theorem 7.1.4 (Generalized Parameter Continuation)  Let X be a multiprojective space of dimension n, that is, X = P^{n_1} × ⋯ × P^{n_k} with n_1 + ⋯ + n_k = n. Let U ⊂ X be a Zariski open subset of X. Let Q ⊂ Y be an irreducible multiprojective algebraic set in a multiprojective space Y. Let F(z; q) be a system of n multihomogeneous polynomials compatible with X × Y such that z and q are homogeneous
coordinates for X and Y, respectively. Furthermore, let N(q, U, Q) denote the number of nonsingular solutions in U as a function of q ∈ Q:

    N(q, U, Q) := # { z ∈ U | F(z; q) = 0,  rank (∂F/∂z)(z; q) = n }.
Then,
(1) N(q, U, Q) is finite and it is the same, say N(U, Q), for almost all q ∈ Q;
(2) For all q ∈ Q, N(q, U, Q) ≤ N(U, Q);
(3) The subset of Q where N(q, U, Q) = N(U, Q) is a Zariski open set; we denote the exceptional set where N(q, U, Q) < N(U, Q) as Q*;
(4) The homotopy F(z; γ(t)) = 0 with γ(t) : [0,1] → Q \ Q* has N(U, Q) continuous, nonsingular solution paths z(t) ∈ U;
(5) As t → 0, the limits of the solution paths of the homotopy F(z; γ(t)) = 0 with γ(t) : [0,1] → Q and γ(t) ∉ Q* for t ∈ (0,1] include all the nonsingular roots in U of F(z; γ(0)) = 0.

Note that computations will be done in z ∈ C^{n_1+1} × ⋯ × C^{n_k+1} but interpreted as points in X. For each projective factor, we typically add an inhomogeneous hyperplane equation to make the scaling factor unique. This is the projective transformation technique described in Chapter 3. The constancy of the number of solutions for the algebraic case still follows from Corollary A.14.2, which allows an even more general situation than we use here. We require Q to be irreducible so that it is path connected, which implies the constancy of the root count; if Q were not irreducible, the root count could be different on different components of Q. Since C^n is a Zariski open subset of P^n, Theorem 7.1.4 clearly includes Theorem 7.1.1, by letting k = 1, U = C^n, and Q = C^m, an irreducible algebraic set. Notice that in the generalized version of the theorem, we denote the generic number of nonsingular solutions as N(U, Q), because the count may change if we consider a different Zariski open set U for the variables or if we restrict the parameters to a different algebraic set Q. We will consider both of these possibilities in the succeeding sections.

We can generalize the theorem further. It sometimes happens that the parameters appear via analytic expressions instead of polynomial ones. That is, the coefficients of F(z; q) as a polynomial system in z may be trigonometric or other analytic functions of q. All the same conclusions follow. This is discussed in § A.14.2, so we omit further discussion here and simply state the analytic version of the theorem in the following abbreviated form.

Theorem 7.1.5 (Analytic Parameter Continuation)  Consider the same situation as in Theorem 7.1.4 except that Q = C^m and each of the n functions in F(z; q) is a multihomogeneous polynomial in z with coefficients that are holomorphic functions of q ∈ Q. Then, we have the same conclusions as Theorem 7.1.4 for items 1, 2, 4, and 5, with item 3 modified as
(3) The subset of Q where N(q, U, Q) = N(U, Q) is an analytic Zariski open set.

Elsewhere, without the qualifier analytic, we use the term Zariski open set to mean the algebraic case. The inclusion of analytic in item 3 of Theorem 7.1.5 implies a weaker condition than the algebraic case, as is to be expected since the set of holomorphic functions is larger than the set of polynomial functions. The difference is illustrated by the algebraic case f(z; q) = z^2 − q, which has N(q) = 2 everywhere in C except q = 0, as compared to the analytic case of f(z; q) = z^2 − sin(q), which has exceptions for q = kπ, k any integer. An algebraic equation can never have an infinite number of isolated roots, but an analytic one can. Even so, an analytic Zariski open set of C^m is path connected, so continuation will succeed.

A final generalization of the theorem is to consider not just nonsingular roots, but isolated roots of any multiplicity. Theorem A.14.1 and Corollary A.14.2 are general enough to justify a restatement of Theorem 7.1.4 for isolated roots. Care must be taken in the restatement of items 2 and 5, as the limit behavior of multiple roots as a parameter path approaches the exceptional set is more complicated than for nonsingular roots. The fact is that in this limit only three things can happen: a solution path can leave U by landing on X \ U (this may include paths going to infinity); a solution path can land on a higher-dimensional solution component and thus cease being an isolated point; and two or more solution paths may merge to form an isolated solution whose multiplicity is the sum of those for the incoming paths. The number of isolated roots of a given multiplicity can increase, but only at the expense of a corresponding decrease in the number of roots having a lower multiplicity.
Theorem 7.1.6 (Parameter Continuation of Isolated Roots)  Let X, Y, U, Q, and F(z; q) be as in Theorem 7.1.4. Furthermore, let N_i(q, U, Q) denote the number of multiplicity-i isolated solutions in U as a function of q ∈ Q.
(1) N_i(q, U, Q) is finite and it is the same, say N_i(U, Q), for almost all q ∈ Q, and there is some finite number μ such that for all i > μ, N_i(U, Q) = 0;
(2) For all q ∈ Q and any m, Σ_{i=1}^{m} i N_i(q, U, Q) ≤ Σ_{i=1}^{m} i N_i(U, Q);
(3) The subset of Q where N_i(q, U, Q) = N_i(U, Q) for all i ≤ m is a Zariski open set; we denote the exceptional set where any of these equalities fails as Q*_m;
(4) For each i, the homotopy F(z; γ(t)) = 0 with γ(t) : [0,1] → Q \ Q*_m has N_i(U, Q) continuous, isolated solution paths z(t) ∈ U of multiplicity i;
(5) As t → 0, the limits of the set of multiplicity-i solution paths such that i ≤ m of the homotopy F(z; γ(t)) = 0 with γ(t) : [0,1] → Q and γ(t) ∉ Q*_m for t ∈ (0,1] include all the isolated roots of F(z; γ(0)) = 0 in U of multiplicity less than m′, where m′ is such that N_i(U, Q) = 0 for m < i < m′.

In numerical work, the paths traced by roots of multiplicity greater than one are hard to track, but in principle, singular path tracking is possible; see § 15.6. If we track only nonsingular paths, item (5) tells us that we are assured of obtaining
all nonsingular roots of the target system, which is what was claimed in the earlier theorems. To be assured of finding all isolated roots of the target system, we must track all the generically isolated roots, as indicated when m in item (5) is equal to μ in item (1). A special case of particular interest is when all the isolated roots of a generic system in the family are nonsingular, that is, when μ = 1 in item (1) of the theorem. Then, we can easily track all the isolated solution paths, and we are assured that the endpoints of these include all isolated solutions, even those with multiplicity greater than one. It is important to note that where Theorems 7.1.1, 7.1.4, and 7.1.5 refer to a polynomial system F(z; q), it is acceptable for F to be given in straight-line form (see Definition 1.2.4).
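Before moving on, it may help to see the gamma trick numerically. The following Matlab fragment is a standalone sketch (not part of HOMLAB; the two "bad" t-values are made-up placeholders): it draws a random γ on the unit circle and checks that the arc t(τ) = γτ/(1 + (γ − 1)τ) keeps its distance from prescribed real singular values of t that a purely real path would have to pass through.

    % Sketch: the "gamma trick" arc of Lemma 7.1.3 (illustration only).
    % For gamma = 1 the path is the real segment [0,1]; for a random complex
    % gamma it is a circular arc whose interior leaves the real axis.
    gamma = exp(1i*2*pi*rand);          % random point on the unit circle
    tau   = linspace(1e-3, 1, 500);     % tau in (0,1]
    t     = gamma*tau ./ (1 + (gamma - 1)*tau);
    t_bad = [0.3, 0.75];                % hypothetical singular t-values in (0,1)
    d     = min(abs(t(:) - t_bad), [], 1);
    fprintf('gamma = %.3f%+.3fi\n', real(gamma), imag(gamma));
    fprintf('closest approach to the bad t-values: %.3g and %.3g\n', d(1), d(2));
    % The endpoints t(1) = 1 and t -> 0 are unchanged; only the interior bends
    % into the complex plane, so generically both distances stay well above zero.

Setting gamma = 1 reproduces the real segment and drives both distances to zero, which is exactly the failure the lemma rules out for generic γ.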
7.2  Parameter Homotopy in Application
The foregoing describes the essence of the polynomial continuation method. To find nonsingular solutions of the polynomial system p(z) = 0 in a Zariski open set U, we do the following, a restatement in mathematical terms of the steps enumerated in the introduction to Part II.

Ab Initio Procedure: To find all solutions in a Zariski open set U of p(z) = 0.
(1) Embed p(z) : C^n → C^n as a member of a parameterized family F(z; q) : C^n × Q → C^n of polynomial systems. Denote by q_0 ∈ Q the particular parameter values that correspond to p(z), that is, F(z; q_0) = p(z).
(2) Arrange the embedding such that we have starting parameters q_1 ∈ Q, q_1 ∉ Q*, for which we either have or can compute all N(U, Q) nonsingular solutions to F(z; q_1) = 0. Call these the "start points."
(3) Construct a continuous path γ(t) : [0,1] → Q such that γ(1) = q_1, γ(0) = q_0, and γ(t) ∉ Q* for t in the real interval (0,1]. That is, γ(t) for t ∈ [0,1] connects the start parameters to the target parameters without intersecting the exceptional set, except possibly at t = 0.
(4) Follow the N(U, Q) solution paths of F(z; γ(t)) = 0 from t = 1 along the real axis to the vicinity of t = 0. These paths begin at the start points, and we propagate them towards t = 0 using a numerical path-tracking algorithm.
(5) In the neighborhood of t = 0, determine which paths are converging to nonsingular solutions. Refine these to numerically approximate the solutions to the desired accuracy.
(6) Keep only those roots which are in U, that is, eliminate those that lie on the algebraic set C^n \ U.

Suppose that p(z) is not just a single system of interest, but rather it is a member of a family of systems G(z; q) : X × Q′ → C^n of the sort we have been discussing:
p(z) = G(z; q) for some q ∈ Q′. For the sake of item 2 above, we may have had to cast p(z) in a larger family of systems than G. That is, G(z; q) is F(z; q) restricted to Q′ ⊂ Q. This is often necessary when we have no generic member of G for which we have (or can easily generate) all nonsingular solutions. The larger family F is chosen in a way that provides such a start system. However, once we have solved an initial generic member of G, we can then solve any other member of G by parameter continuation along paths in Q′. This can be advantageous because the generic root count on G can be smaller (perhaps much smaller) than the generic root count for F. To capture this advantage, one may apply a two-phase procedure as follows.

Two-Phase Procedure: To find all solutions of G(z; q) = 0 in a Zariski open set U for several parameter points, say q_1, . . . , q_k ∈ Q′.
(1) Phase 1: solve G(z; q_0) = 0.
   (a) Choose q_0 random, complex in Q′.
   (b) Solve G(z; q_0) = 0 using an ab initio technique as above.
   (c) Let Z be the set of nonsingular solutions in U so obtained.
(2) Phase 2: for each q_i, i = 1, . . . , k, solve G(z; q_i) = 0 by a continuation on a straight-line homotopy.
   (a) Form the homotopy G(z; t q_0 + (1 − t) q_i) = 0.
   (b) Track each root in Z from t = 1 to near t = 0 (a bare-bones sketch of such a tracker appears below).
   (c) In the neighborhood of t = 0, determine which paths are converging to nonsingular solutions and compute their endpoints to the desired accuracy.
   (d) Keep only those roots which are in U, that is, eliminate those that lie on the algebraic set C^n \ U.

In the remainder of this chapter, we will concentrate on Phase 2 of this procedure; that is, we will assume that we have the solution set for some initial generic system. Phase 1, the ab initio procedure, is the subject of Chapter 8.
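To make step (2b) of Phase 2 concrete, here is a bare-bones tracker written directly in Matlab rather than with HOMLAB (whose tracker, described in Appendix C, is far more robust). It follows one start solution of H(z, t) = G(z; t q_0 + (1 − t) q_i) = 0 from t = 1 down to a small t. The fixed step size, the crude "reuse the last point" predictor, the absence of an endgame, and the function handles F and J (the system and its Jacobian in z at given parameters) are all simplifying assumptions of this sketch, saved as its own file track_path.m.

    function z = track_path(F, J, qstart, qtarget, z, tend, nsteps)
    % Track one solution path of H(z,t) = F(z; t*qstart + (1-t)*qtarget) = 0
    % from t = 1 down to t = tend, starting from a solution z of F(z; qstart) = 0.
    %   F(z,q) : n-vector of residuals,  J(z,q) : n-by-n Jacobian dF/dz.
    % Illustrative sketch only: no adaptive steps, no endgame, no failure checks.
    tvals = linspace(1, tend, nsteps);
    for t = tvals(2:end)
        q = t*qstart + (1 - t)*qtarget;     % parameters at the new t value
        for iter = 1:10                     % Newton corrector at fixed t
            dz = -J(z, q) \ F(z, q);
            z  = z + dz;
            if norm(dz) < 1e-12*max(1, norm(z)), break, end
        end
    end
    end

A production tracker would add a tangent (Euler) predictor, adaptive step control, and an endgame near t = 0, which is what HOMLAB's tracker provides.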
7.3  An Illustrative Example: Triangles
Before embarking on a more complete examination of parameter continuation, let's look at a simple example where the parameterization and the start system are rather easily obtained. One would not use continuation to solve this problem, but it may help illustrate what goes on in more challenging problems. To make a concrete example, we consider the classic problem of solving for the angles of a triangle given the lengths of its sides, a, b, c. Let θ be the angle opposite side c. We will write a system of polynomials in two variables c_θ = cos θ and s_θ = sin θ. As shown in Figure 7.1, we have the three vertices of the triangle as
(0,0), (b,0), and (a c_θ, a s_θ), and the system to solve is

    f_1(c_θ, s_θ; a, b, c) = c_θ^2 + s_θ^2 − 1 = 0,    (7.3.3)
    f_2(c_θ, s_θ; a, b, c) = (a c_θ − b)^2 + (a s_θ)^2 − c^2 = 0.    (7.3.4)
The first of these is the basic trigonometric identity for sine and cosine, and the second says that point (a c_θ, a s_θ) is distance c from point (b, 0). Our parameters are the physical parameters q = (a, b, c), and the variables are z = (c_θ, s_θ). The coefficients in f_1 are constants and, when expanded out, the coefficients in f_2 are quadratic polynomials in (a, b, c).
Fig. 7.1  Triangle with side lengths a, b, c.
The system is easily solved without using continuation by forming f_2 − a^2 f_1 to get

    a^2 + b^2 − 2ab c_θ − c^2 = 0,        s_θ = ±√(1 − c_θ^2).
(7.3.5)
The first of these is the familiar Law of Cosines for planar triangles. For almost all (a, b, c), there is a unique value of c_θ, the exceptions being if a = 0 or b = 0. In these cases, the angle θ is not well defined, because one of the sides of the triangle is nonexistent. Away from these sets, the second of Equations 7.3.5 gives two distinct values of s_θ unless c_θ = ±1, in which case there is a double root. Substituting this in the Law of Cosines equation, one sees that there will be double roots for (a, b, c) on any of the four planes a ± b ± c = 0. These are the boundaries of the triangle inequality conditions. For real (a, b, c) that violate the triangle inequality, one has |c_θ| > 1, and s_θ is a pair of complex conjugate roots. Now, let's pretend that we do not know the solution via Equation 7.3.5 and that we seek a solution by parameter continuation using (a, b, c) ∈ C^3 as our parameter space. The first hurdle is to obtain a start system. For a more complicated system, we would normally pick (a, b, c)_1 at random and rely on one of the special homotopies discussed in Chapter 8, such as the total degree homotopy, to solve it. We will discuss this type of maneuver more below. However, for this simple system, we can pick out a known solution easily: let (a_1, b_1, c_1) = (5, 4, 3), a Pythagorean triple. Then, we have two solution points (c_θ, s_θ) = (4/5, ±3/5). Note that f_2 in Equation 7.3.4 is homogeneous in the parameters; in particular, all the coefficients are homogeneous quadratics in (a, b, c). This means that the solution does not
change under scaling, and so for (a, b, c) = (5α, 4α, 3α) we have the same solution points (c_θ, s_θ) = (4/5, ±3/5) for any nonzero, complex α. One may wonder if there are any other solutions. The total degree of the system is four, and its one-homogenization has two roots at infinity of the form [z_0, c_θ, s_θ] = [0, 1, ±i], so there are only two finite roots. Here, the one-homogenization is obtained via the substitutions c_θ → c_θ/z_0 and s_θ → s_θ/z_0. Next, we need a homotopy path from our starting system (a_1, b_1, c_1) = (5α, 4α, 3α) to the target (a, b, c)_0. The straight-line path

    γ(t) = t (5α, 4α, 3α) + (1 − t)(a_0, b_0, c_0)
(7.3.6)
will suffice for almost all targets. It is not difficult to check that when α is complex and the target is real, the values of t for which the path intersects the singularity conditions are complex, unless the target itself is singular. So we will not encounter any singularities for t on the real interval (0,1]. For a fixed complex-valued α, there will exist complex targets for which the homotopy path hits a singularity, but if we choose α at random, independent of the target, then there is a zero probability of this failure. It may be instructive¹ to consider what would happen if we were to choose a homotopy path in the reals, say α = 1. The homotopy is still fine for any real target that is inside the triangle inequalities, since these bound a convex region of the real parameter space. However, a line segment connecting a real target outside the triangle inequality region to a real start system inside must cross the singularity. These real targets form a set of measure zero in C^3, so considering all targets in C^3, the homotopy is still valid with probability one. But in practice, we usually want to solve systems for real-valued parameters. This illustrates that it is important to use some sort of complex randomizing factor in the homotopy so that real systems are solved with probability one.

¹ Exercise 7.1 at the end of the chapter is a good way to get a feel for the numerical behavior of this simple homotopy.
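As a concrete illustration (independent of the triangle.m routine supplied with HOMLAB), the two known solutions at the start parameters (5α, 4α, 3α) can be pushed to a target triangle with the track_path sketch of § 7.2; the particular target (2, 2, 3) is an arbitrary choice.

    % Parameter continuation for the triangle system of Section 7.3.
    % Variables z = [c_theta; s_theta], parameters q = [a; b; c].
    F = @(z,q) [ z(1)^2 + z(2)^2 - 1;
                 (q(1)*z(1) - q(2))^2 + (q(1)*z(2))^2 - q(3)^2 ];
    J = @(z,q) [ 2*z(1),                     2*z(2);
                 2*q(1)*(q(1)*z(1) - q(2)),  2*q(1)^2*z(2) ];
    alpha  = exp(1i*2*pi*rand);      % random complex scaling of the start system
    qstart = alpha*[5; 4; 3];        % Pythagorean-triple start parameters
    qtarg  = [2; 2; 3];              % target triangle (a0, b0, c0)
    starts = [4/5, 4/5; 3/5, -3/5];  % the two known start solutions
    for k = 1:2
        z = track_path(F, J, qstart, qtarg, starts(:,k), 1e-6, 400);
        fprintf('c_theta = %+.6f%+.6fi   s_theta = %+.6f%+.6fi\n', ...
                real(z(1)), imag(z(1)), real(z(2)), imag(z(2)));
    end

For the real target (2, 2, 3), the two endpoints should come out numerically real with c_theta = −1/8, in agreement with the Law of Cosines; repeating the run with alpha = 1 and a target outside the triangle inequality exhibits the kind of failure discussed above.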
7.4  Nested Parameter Homotopies
In practice, it is quite common for a parameterized family of problems to have special cases that are themselves of significant interest. In fact, we often see an elaborate network of special cases, each one inheriting the special structure of the solution sets of the more general cases of which it is a member. The forward kinematics problem for Stewart-Gough robot manipulators discussed below (§ 7.7) illustrates this. Let us be a bit more precise about this situation.

Corollary 7.4.1  For a family of polynomial systems F(z; q) : C^n × Q_0 → C^n, a chain of parameter spaces
    Q_0 ⊃ Q_1 ⊃ Q_2 ⊃ ⋯ ,
each of which is an irreducible quasiprojective algebraic set, and a Zariski open set U ⊂ C^n, the generic nonsingular root counts N(U, Q_i) obey the inequalities

    N(U, Q_0) ≥ N(U, Q_1) ≥ N(U, Q_2) ≥ ⋯

Proof.
This is just the repeated application of item 2 of Theorem 7.1.4.
•
We know that we can use parameter homotopy within any one of these spaces to compute the nonsingular roots of the associated polynomial systems, assuming we have all nonsingular solutions at an initial generic point in the family. Suppose we wish to use parameter continuation within the space Q_1, but we do not yet have a solution for any point in that space. Suppose that instead we have all nonsingular solutions for the system f(z; q_0) = 0, for a generic point q_0 ∈ Q_0. Let q_1 ∈ Q_1 be a generic point of Q_1. But Q_1 ⊂ Q_0 implies q_1 ∈ Q_0, so we may find all nonsingular solutions to f(z; q_1) = 0 by parameter continuation in Q_0, starting at q_0. If Q_1 ⊂ Q_0*, the exceptional set in Q_0, then there are fewer solutions at q_1 than at q_0. Now, we may proceed to solve the system for any other parameters q_i ∈ Q_1, i = 1, 2, . . ., using this smaller number of paths, by continuation inside Q_1 starting at q_1. Obviously, the same approach can be applied to solve a start system in any Q_i once we have a solution for a system in one of its ancestors, Q_j, j < i.

Unlike the simple triangle problem discussed above, when solving problems in engineering or science, we rarely have all the solutions for any generic point in the natural parameter space of the problem. So how do we get started? A very useful trick is to solve the first naturally-parameterized problem by embedding the whole family within a larger, artificially-parameterized family of problems, within which we do have a solved general case. This is the Ab Initio Procedure of § 7.2. Suppose, for example, that an engineering problem is a system of two quadratics in two variables. There are a total of 12 coefficients in two bivariate quadratics, but for our problem these may depend on just a few physical parameters. We may solve the initial problem given by generic physical parameters using a homotopy in Q_0 = C^12, the parameter space of all coefficients of two bivariate quadratics. Then, Q_1 ⊂ Q_0 consists of the sets of coefficients that are generated by ranging over the physical parameters.
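As a toy illustration of such an embedding (the particular physical parameterization is hypothetical, not taken from the text), suppose the two quadrics are {x^2 + y^2 − p_1^2, x y − p_2}; the inclusion of the physical parameter space Q_1 into Q_0 = C^12 is then just the map from p to the twelve coefficients.

    % Coefficients of two bivariate quadrics in the monomial basis
    % [x^2, x*y, y^2, x, y, 1]; stacking them gives a point of Q0 = C^12.
    % Hypothetical physical family: {x^2 + y^2 - p1^2,  x*y - p2}.
    coeffs = @(p) [ 1, 0, 1, 0, 0, -p(1)^2;     % first quadric
                    0, 1, 0, 0, 0, -p(2)   ];   % second quadric
    q0point = reshape(coeffs([2; 0.5]).', [], 1);   % a point of Q1 inside C^12

An ab initio homotopy in Q_0 solves one generic member of this family; thereafter, continuation can stay inside the two-dimensional image of the map above.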
7.5  Side Conditions
In the statement of coefficient-parameter homotopy above, the generic number of nonsingular roots, N(U, Q), is counted on a Zariski open subset U in complex space. The result is stated in that way to justify the application of "side conditions" for eliminating uninteresting solution paths from a parameter homotopy. Suppose the zeros of a system of analytic functions s(z) : C^n → C^k are not of interest as solutions of F(z; q) = 0. We call s(z) = 0 "side conditions," and U = C^n \ s^{−1}(0). Typically, the side conditions identify degenerate solution sets
that are known by other means, but they may also be certain pro forma conditions that have been noticed to arise often. A common choice of the latter type, especially when using monomial product homotopies, is the side condition s(z) = z_1 z_2 ⋯ z_n = 0, which simply means that we are not interested in solutions that have any coordinate equal to zero. This is equivalent to saying that we are working on the open set U = (C*)^n, where C* = C \ {0}. We will see below the use of side conditions specific to a particular application, such as two variables being equal: s(z) = z_1 − z_2 = 0. In essence, even when we work on U = C^n, we are invoking a side condition on P^n: we are ignoring solutions at infinity.

Side conditions work hand-in-hand with nested parameter homotopies. Whenever we solve the first generic example in a parameter space, we check the solutions against the side conditions. Then, when solving other problems in the same parameter space using the first example as the start system, we drop the solutions that satisfy the side conditions from the list of start points for the continuation. In some cases, the degenerate solutions specified by the side conditions vastly outnumber the interesting ones, and the number of paths in the parameter continuation is dramatically reduced.
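Numerically, applying side conditions is just a filter on the array of computed endpoints. A minimal sketch follows; the tolerance and the two particular conditions (any zero coordinate, or z_1 = z_2) are illustrative choices.

    function Zkeep = apply_side_conditions(Z, tol)
    % Z is n-by-N, each column a computed solution; drop the columns on
    % which a side condition s(z) = 0 holds to within tol.
    if nargin < 2, tol = 1e-8; end
    on_coordinate_plane = any(abs(Z) < tol, 1);        % some z_i = 0
    on_diagonal         = abs(Z(1,:) - Z(2,:)) < tol;  % z_1 = z_2
    Zkeep = Z(:, ~(on_coordinate_plane | on_diagonal));
    end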
7.6  Homotopies that Respect Symmetry Groups
Some systems respect symmetry groups, and we can reduce the number of paths to follow accordingly. Suppose we have a mapping S : C^n → C^n such that for any q ∈ Q, if F(z; q) = 0, then F(S(z); q) = 0. Furthermore, suppose that if z is a nonsingular solution, then so is S(z). Often, F(S(z); q) is either exactly F(z; q) or a rearrangement of the polynomials of F(z; q). For example, under the mapping S : (x, y) ↦ (y, x), the polynomial system {xy − q_1, x^2 + y^2 + q_2} is invariant, whereas the polynomials in the system {x y^3 − a, x^3 y − a} interchange. In such cases, it is clear that nonsingular roots map to nonsingular roots. Using the notation S^2(z) = S(S(z)), S^3(z) = S(S(S(z))), etc., suppose k is the smallest integer such that z = S^k(z). We say that f respects S as a symmetry group of order k. The symmetry implies that for the homotopy F(z; q(t)) = 0, a solution path z_0(t) is matched by the paths z_i(t) = S^i(z_0(t)), i = 1, . . . , k − 1. So we only need to compute one of the k paths: we use S to compute the endpoints of the matching paths without knowing their intermediate points. It can happen that for the same symmetry mapping, roots appear in symmetry groups of different orders. For example, for the system {x y^3 − 1, x^3 y − 1} = 0 and the mapping (x, y) ↦ (y, x), the root (x, y) = (1, 1) maps to itself, while the root ((1 + i)√2/2, −(1 + i)√2/2) is in a group of order two. This must be taken into account when using symmetry to reduce the number of solution paths.

When we solve the first generic example in a parameter space, we usually must resort to an ab initio procedure (§ 7.2), embedding the target system into a larger family of systems. Since the members of this larger family generally do not respect
the symmetry, we must follow all the paths in the first run. The symmetry can still be useful as a check on the computation: do all roots appear in the requisite symmetries? If so, we have some assurance that the numerical process was carried out successfully. Then, in subsequent runs using Phase 2 of the two-phase parameter homotopy procedure, the symmetry is used to reduce the number of paths in the computation.
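In code, using the symmetry amounts to expanding each tracked endpoint into its orbit under S. A sketch follows; the swap map and the tolerance are illustrative, and a full implementation would also merge orbits that coincide.

    function orbit = symmetry_orbit(S, z, tol)
    % Apply the symmetry map S repeatedly to a solution z until it returns
    % (numerically) to the start, collecting the whole orbit as columns.
    % Example: S = @(z) z([2 1]) implements the swap (x,y) -> (y,x).
    if nargin < 3, tol = 1e-8; end
    orbit = z;
    w = S(z);
    while norm(w - z) > tol
        orbit = [orbit, w];   %#ok<AGROW>
        w = S(w);
    end
    end

For the system {x y^3 − 1, x^3 y − 1} above, the orbit of (1, 1) has length one while the orbit of a root with y = −x has length two, matching the discussion of groups of different orders.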
7.7  Case Study: Stewart-Gough Platforms
For the first significant example of this book, we examine an important family of problems from mechanical engineering: the forward kinematics of Stewart-Gough platform robots. As we will see shortly, there are a number of different options for the design of such robots, and these can be organized into nested families of robot types. These parameterized families are ideal for illustrating the concept of parameter continuation.

A Stewart-Gough platform, shown schematically in Figure 7.2, is a type of parallel-link robot, having a stationary base platform upon which a moving platform is supported by six "legs." Each of these legs has a spherical (ball-and-socket) joint at each end,² with a prismatic joint (linearly telescoping) in between. The prismatic joint is actuated, usually by a ball screw and electric motor, so that the distance between the centers of its adjacent universal and spherical joints can be controlled by computer. That is, leg i, i = 1, . . . , 6, connects point A_i of the stationary platform to point B_i of the moving platform, and we control the lengths L_i = |B_i − A_i|. By proper coordination of the six leg lengths, the moving plate can be placed in any position and orientation within a working volume (actually a six-dimensional workspace, a subset of R^3 × SO(3)), whose boundaries are determined by the limits of travel of the prismatic joints. Collisions between the legs can also limit the range of motion.

² One ball joint on each leg can be replaced by a universal joint to eliminate rotation of the leg around its axis, but this does not alter the motion of the moving platform, our present object of study.

These robots are best known as the mechanism beneath motion platforms for aircraft flight simulators, but they are applicable to tasks as varied as aiming telescopes or welding automotive bodies. The kinematics of these robots has been the subject of extensive academic research, which we cannot begin to address here. We refer the interested reader to (Merlet, 2000; Tsai, 1999) as a starting point. Although many interesting algebraic problems arise in the study of these mechanisms, for the moment, we will consider only the so-called "forward kinematics" problem, which is as follows:

Given: the geometry of the stationary and moving platforms and the six leg lengths,
Find: the position and orientation of the moving platform with respect to the stationary one.

As usual, in what follows, we embed the real problem into complex space, so even though only real values of the leg lengths are physically meaningful, we consider complex L_i ∈ C. Similarly, we treat the robot workspace as C^3 × SO(3, C), where SO(3, C) = {A ∈ C^{3×3} | A^T A = I, det A = 1}.
Fig. 7.2  General Stewart-Gough platform robot.
To write a system of polynomial equations, we need to precisely define the problem data. Choose reference frames in the stationary and moving platforms. Let the position of point A_i be given by vector a_i ∈ C^3 in the stationary frame, and let B_i be given by vector b_i ∈ C^3 in the moving frame. Rather than use a direct coordinatization of C^3 × SO(3, C), it is more convenient for the problem at hand to use Study coordinates, also known as "soma coordinates" (p. 150-152 Bottema & Roth, 1979). These consist of all points [e, g] = [e_0, e_1, e_2, e_3, g_0, g_1, g_2, g_3] ∈ P^7 that lie on the Study quadric

    f_0(e, g) = e_0 g_0 + e_1 g_1 + e_2 g_2 + e_3 g_3 = 0.
(7.7.7)
This is an isomorphism of C^3 × SO(3, C), wherein the elements e are a quaternion that represents the orientation of the moving platform with respect to the stationary one and g is a quaternion that encodes translation as p = g e′/(e e′). Accordingly,
the position of point B_i in the reference frame of the stationary platform is written

    (g e′ + e b_i e′)/(e e′),

where multiplication follows the rules for quaternions and g′ = (g_0, −g_1, −g_2, −g_3) and e′ = (e_0, −e_1, −e_2, −e_3) are quaternion conjugates of g and e. Clearly, we must exclude the points that satisfy

    s(e, g) = e e′ = 0.
(7.7.8)
The Study quadric is exactly the condition that the translation g e′ be a pure vector, and since b_i is a pure vector, so is e b_i e′. These facts and the fact that the squared length of a pure vector v, considered as a quaternion, is just v v′, allow us to write the basic kinematic equations for the Stewart-Gough platform as

    L_i^2 = ((g e′ + e b_i e′)/(e e′) − a_i) ((g e′ + e b_i e′)/(e e′) − a_i)′,    i = 1, . . . , 6.
(7.7.9)
Note that this system of equations immediately solves the "inverse" kinematic problem: given the position and orientation of the moving platform as [e, g], we can calculate the leg lengths L_i. We are looking to solve the opposite problem: given L_i, find [e, g]. To proceed, we expand Equation 7.7.9 and multiply through by e e′ to get, for i = 1, . . . , 6,

    f_i(e, g) = g g′ + (b_i b_i′ + a_i a_i′ − L_i^2) e e′ + (g b_i′ e′ + e b_i g′) − (g e′ a_i′ + a_i e g′) − (e b_i e′ a_i′ + a_i e b_i′ e′) = 0.    (7.7.10)
In summary, Equations (7.7.7, 7.7.10) form the forward kinematic problem for Stewart-Gough platforms as

    F(e, g) = {f_0, f_1, . . . , f_6} = 0,
(7.7.11)
subject to the side condition s(e, g) ≠ 0 from Equation 7.7.8. System F(e, g) = 0 is a set of seven homogeneous quadratic equations in [e, g] ∈ P^7.
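The quaternion algebra behind Equations 7.7.7-7.7.9 is easy to exercise numerically. The sketch below (sample numbers are placeholders; this is not part of the HOMLAB routines) builds Study coordinates [e, g] from a rotation quaternion e and a pure translation quaternion p via g = p e, which automatically satisfies the Study quadric, and then evaluates the "inverse" direction of Equation 7.7.9: the world position of a platform point b and the resulting leg length.

    % Quaternion helpers, components ordered [w x y z] (row vectors).
    qmul  = @(p,q) [ p(1)*q(1) - p(2:4)*q(2:4).', ...
                     p(1)*q(2:4) + q(1)*p(2:4) + cross(p(2:4), q(2:4)) ];
    qconj = @(q) [q(1), -q(2:4)];

    ang = pi/3;                                 % sample rotation: 60 degrees about z
    e   = [cos(ang/2), 0, 0, sin(ang/2)];       % rotation quaternion
    p   = [0, 1.0, 2.0, 0.5];                   % translation as a pure quaternion
    g   = qmul(p, e);                           % then p = g e'/(e e') and e.g = 0

    a  = [1, 0, 0];   b = [0, 1, 0];            % one base and one platform joint center
    bq = [0, b];                                % platform point as a pure quaternion
    ee = qmul(e, qconj(e));                     % (|e|^2, 0, 0, 0)
    Bw = (qmul(g, qconj(e)) + qmul(qmul(e, bq), qconj(e))) / ee(1);
    fprintf('scalar part of B (should be ~0): %.2e\n', Bw(1));
    fprintf('leg length L = %.6f\n', norm(Bw(2:4) - a));

Squaring the printed leg length and comparing with the expanded form f_i of Equation 7.7.10, evaluated at the same e, g, a_i, b_i, L_i, is a useful consistency check when implementing the forward-kinematics system.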
7.7.1  General Case
The complete family of Stewart-Gough forward kinematic problems is parameterized by the joint center points and the leg lengths, {(a_i, b_i, L_i), i = 1, . . . , 6} ∈ (C^3 × C^3 × C)^6, a 42-dimensional space. Hence, in the preceding section, we should have written the equations as F((e, g); p) = 0, where p ∈ C^42. It is of historical interest to note that the number of solutions to the forward kinematics of general Stewart-Gough platforms was found to be 40 by several different researchers at about the same time³ using entirely different approaches: continuation (Raghavan,
1993), vector bundles and Chern classes from algebraic geometry (Ronga & Vust, 1995), computer algebra using Gröbner bases (Lazard, 1993), and computation of a resultant using computer algebra (Mourrain, 1993). See also (Mourrain, 1996). The formulation of the problem we use here follows (Wampler, 1996a), wherein a simple proof of 40 roots is given. The same formulation was derived independently by Husty (Husty, 1996), who gave a procedure that uses computer algebra to derive a degree-40 equation in one variable. This is but a small indication of the level of interest this problem has attracted.

³ Historical note: preprints of (Ronga & Vust, 1995) circulated widely in 1992 and were referenced in (Lazard, 1993; Mourrain, 1993). The conference paper (Raghavan, 1991) was the first report of the count of 40, and this numerical result may have helped motivate the proofs.

If we could solve the forward kinematics problem for just one general member of C^42, we could solve any other member by parameter continuation. The question of how to get that first solution set is addressed in the next chapter. For the moment, let us just say that the trick is to cast the Stewart-Gough forward kinematics problems as members of a much larger family, the family of all systems of seven quadrics on [e, g] ∈ P^7. General members of this family have 2^7 = 128 isolated solution points, so we can find all isolated solutions for an initial Stewart-Gough problem by tracking 128 solution paths for a homotopy defined in this larger space. Doing so reveals that a generic Stewart-Gough platform, p_0 ∈ C^42 (chosen using a random number generator), has 40 nonsingular solutions and 88 singular ones. The singular solutions are on the degenerate set of Equation 7.7.8, so we can safely ignore them as they are not of physical significance. In short, we have N(P^7, C^42) = 40 and only these roots are of interest. Having the 40 isolated solutions x_0 ∈ F(·; p_0)^{−1}(0) to a generic Stewart-Gough platform, p_0 ∈ C^42, we are ready to apply parameter continuation within the family. By Lemma 7.1.2, a straight-line path from p_0 to almost any other p_1 ∈ C^42 stays generic, and so by Theorem 7.1.4, the 40 solution paths starting at x_0 for t = 1 of the homotopy
    H_SG((e, g), t) := F((e, g); t p_0 + (1 − t) p_1) = 0
(7.7.12)
will lead to a set of endpoints that contains all isolated solutions of F((e, g); p_1) = 0. (We invoke the generalized Theorem 7.1.4 instead of the basic version, Theorem 7.1.1, because we are working on projective space P^7.) There exist points p* for which the line segment between p_0 and p*, parameterized by t ∈ (0, 1] in the homotopy above, strikes a singular point. Such points are a set of measure zero in C^42, but they do exist. If one happens to encounter such a problem, where some homotopy paths founder before t approaches zero, all that is necessary is to first continue from p_0 to another random point in C^42 before proceeding to the final target. Or, to accomplish the same thing, we may choose a random γ ∈ C and follow the homotopy H_SG((e, g), t(s)) = 0 along a nonlinear path t(s) = s + γ s(1 − s) on the real segment s ∈ [0, 1]. In practice, unless one is solving a large number of such problems, the exceptions to the linear homotopy path will almost certainly not be encountered, so Equation 7.7.12 is sufficient.
7.7.2  Platforms with Coincident Joints
Various special families of Stewart-Gough platform robots may be defined by requiring some joint centers to coincide. For example, suppose legs 1 and 2 both connect to the same point on the moving platform; in other words, points B_1 and B_2 coincide. This is an example of a so-called 6-5 platform, where 6 and 5 are the numbers of distinct joint centers on the stationary and moving platforms, respectively. Such special platform robots can have advantageous kinematic properties, so they are of practical interest. In fact, the limiting case of a 3-3 platform, discussed below, is one of the most popular designs in practice. A 6-6 platform is the most general type, treated in the preceding paragraphs.

The number of joint centers on a platform can take on any value from 3 to 6. (If there were only 2 joint centers, rotation around the line through them cannot be resisted by the mechanism, making it useless.) Moreover, these two integers are not enough in general to fully specify the mechanism type, since it matters, for example, if one of the legs connects two double joints. We can schematically represent the topological type of a platform with coincident joints by two rows of dots representing joint centers, with lines between them representing connecting legs. There are always six legs, but the number of dots is reduced by the presence of coincident joints. We will assume that the top row of dots represents the joint centers on the moving platform and the bottom row represents those of the stationary platform. The connection patterns

    [connection-pattern diagrams for types 4-4a and 4-4b]

are both 4-4 patterns, but they are topologically distinct. We will only address a few of the possibilities in the next few paragraphs. A more complete catalog of coincident-joint geometries and their root counts can be found in (Faugère & Lazard, 1995).

Consider first the 4-4 connection pattern illustrated on the left above, which we label 4-4a. It is given as a quasiprojective algebraic subset of C^42 by the equations {a_1 = a_2, a_5 = a_6, b_2 = b_3, b_4 = b_5}. We may solve such an example by making it the target system of either a total degree homotopy or the general Stewart-Gough homotopy H_SG, because it is a member of both. Usually, it is more efficient to use the 40-path option than the 128 paths of the total degree homotopy. But either way, one finds only 16 solutions, with the rest of the paths having endpoints on the degenerate condition, Equation 7.7.8. With 16 solutions for a generic example in family 4-4a in hand, we can solve any other problem in that subfamily using H_SG and only 16 paths.

This is just the tip of the iceberg in terms of the possible subfamilies of the Stewart-Gough platform. Figure 7.3 shows a family tree of six sub-families, with arrows indicating inclusions (lower families in the figure are sub-families of higher ones). At the top, "quad7" is the family of all systems of 7 quadrics, which contains
all of the Stewart-Gough platform systems. Table 7.1 lists these same families: each is given a name, such as 4-4a, and the pattern of coincident joints is indicated graphically. Ignore for the moment the families whose names end in "P"; these are discussed in the next subsection. The number of nonsingular roots is indicated as N. This will be the number of homotopy paths for a parameter homotopy starting from a generic point in the family and ending at any other point in the family, including any point in a family that is a subset of that family. For each family, the dots in the table in its row indicate which families it belongs to. For example, the first column is the family of all systems of 7 quadrics, which contains all of the other families, so there is a dot in every cell of the first column. We can solve any Stewart-Gough platform by a 128-path homotopy through the parameter space of 7 quadrics or by a 40-path homotopy through the space of general 6-6 platforms. Of course, if the target system is a member of some other subfamily, it is more efficient to work within that family after a first generic member of the family has been solved by continuation in a family above it. This is why, for example, we need the seven-quadric system to get the process started.
Fig. 7.3 Stewart-Gough coincident joint family tree
Table 7.1 is not an exhaustive list of special Stewart-Gough sub-families. Among the coincident-joint families, any type K-L with 3 ≤ K, L ≤ 6 is possible, including cases where 3 joints are coincident. Four coincident joints will be degenerate—either no solutions or a positive-dimensional solution set—so these can be ignored. Further exploration of the coincident-joint families is an exercise at the end of this chapter. Besides these families, there exist special cases where no joints are coincident, but rather, there is some other geometric relationship, such as joints in a straight line.
Table 7.1  Stewart-Gough Sub-Families

    Name     N
    quad7    128
    6-6      40
    6-6P     20 + 20
    6-4      32
    6-4P     16 + 16
    4-4a     16
    4-4aP    8 + 8
    4-4b     24
    4-4bP    12 + 12
    3-3      8 + 8

(The coincident-joint pattern diagrams and the family-membership dots of the original table are graphical and are not reproduced here.)
We will have reason to study such a case later, in Part III.
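For reference, a random member of the coincident-joint subfamily 4-4a can be generated by drawing all 42 parameters at random and then imposing the coincidences {a_1 = a_2, a_5 = a_6, b_2 = b_3, b_4 = b_5}; the array layout below is an illustrative assumption, not the format expected by sgparhom.

    % Random complex parameters for a 4-4a Stewart-Gough platform.
    % Columns of A and B hold the joint centers a_i and b_i; L holds leg lengths.
    crand = @(m,n) randn(m,n) + 1i*randn(m,n);
    A = crand(3,6);  B = crand(3,6);  L = crand(1,6);
    A(:,2) = A(:,1);   A(:,6) = A(:,5);    % a1 = a2,  a5 = a6
    B(:,3) = B(:,2);   B(:,5) = B(:,4);    % b2 = b3,  b4 = b5
    p = [A(:); B(:); L(:)];                % a point of the 4-4a subfamily in C^42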
7.7.3  Planar Platforms
Every family in Table 7.1 has a planar version, indicated by the suffix "P" in its name. These have the six points of the stationary platform in a plane and similarly for the moving platform. In the interest of simplicity, these have not been added to Figure 7.3, but we may summarize the membership relationship as follows. If A and B are non-planar families, AP and BP their planar sub-families, and B is a sub-family of A, then we have the following inclusions.

    A  ⊃  AP
    ∪        ∪
    B  ⊃  BP

The planarity condition results in a symmetry, because the moving platform and its mirror image reflected through the plane of the stationary platform are congruent and all the leg lengths are preserved by the reflection. Hence, solutions appear in symmetric pairs. If we perform continuation in a planar family, this symmetry applies at every step, and hence all solutions can be obtained by tracking only one of each pair. This is the reason that N in Table 7.1 is written in the form N/2 + N/2: only half the paths must be tracked to solve a member of that family.
7.7.4  Summary of Case Study
The main point to remember is that if we have a list of N(U, Q) nonsingular solutions for one generic member of a parameterized family F of polynomial systems, we can find the nonsingular solutions of any other member of the family using these as the start points of N(U, Q) homotopy solution paths. In the case of the forward
kinematics of Stewart-Gough platforms, N(P^7, C^42) = 40, so any problem can be solved using a 40-path homotopy. We have identified a number of sub-families that have a reduced number of nonsingular solutions, and a homotopy that stays within such a parameter subspace solves other members of the sub-family using the reduced number of solution paths. Sub-families with planar platforms admit a two-way symmetry which can be used to reduce the number of solution paths by half. We see that parameter continuation can be an effective way to explore such nested families and discover the generic number of nonsingular roots for each. In the exercises in the next section, we encourage the reader to experience this directly, by running Matlab routines supplied for this purpose.

It should be mentioned that there are many other approaches to such a study. In addition to studies of the general 6-6 case already mentioned (Husty, 1996; Mourrain, 1996; Raghavan, 1993; Ronga & Vust, 1995; Wampler, 1996a), for several of the subfamilies, kinematicians have found elimination procedures reducing the problem to a single polynomial (Chen & Song, 1994; Nanua, Waldron, & Murthy, 1991; Sreenivasan, Waldron, & Nanua, 1994; Zhang & Song, 1994) or have applied their own variants of continuation (Sreenivasan & Nanua, 1992; Dhingra, Kohli, & Xu, 1992). An extensive study of coincident-joint sub-families using Gröbner bases can be found in (Faugère & Lazard, 1995).
7.8  Historical Note: The Cheater's Homotopy
Among those who have some passing knowledge of developments in polynomial continuation, there has sometimes been confusion between parameter homotopy and a similar approach called the "cheater's homotopy" by its inventors (Li, Sauer, & Yorke, 1989). Appearing in print before the article establishing "coefficient-parameter homotopy" (Morgan & Sommese, 1989), the cheater's homotopy presaged much of the flavor of the full parameter theory. Consequently, the cheater's homotopy holds an important place in the development of the subject, even though it was soon eclipsed by the more general parameter homotopy theory.

Rather than working in the natural parameter space Q associated to a system f(z; q) = 0, the cheater's homotopy expands the parameter space by generic constants b ∈ C^n. The method starts by solving the initial system f(z; q_1) + b = 0 for generic q_1 ∈ Q and b ∈ C^n. Then, the finite, nonsingular solutions of this system are used as start points in a homotopy to find all the finite, nonsingular solutions to some other example in the family, say f(z; q_0) = 0, q_0 ∈ Q. This is done by following the solution paths from t = 1 to t = 0 in the homotopy f(z; q(t)) + t b = 0, where q(t) ∈ Q is a continuous path in Q with q(1) = q_1 and q(0) = q_0. We can see immediately from the parameter homotopy theory that this approach works: we have a generic start system (q_1, b) in an expanded parameter space Q × C^n and the target system is given by (q_0, 0) ∈ Q × C^n. However, the addition of
the generic constants to each equation often destroys crucial structure, causing an increase in the number of paths to track, often substantially. A simple example that shows a big difference is
For general q, this has one nonsingular solution (x,y) = (q,q), so a parameter homotopy will have just one path to track. But the start system for the cheater's homotopy
    f(x, y; q_1) + b = 0.
(7.8.14)
has six nonsingular solutions. Computing solutions of Equation 7.8.13 for several different values of q by the cheater's homotopy requires six paths each time. The added constants b_1 and b_2 destroy all the structure of the original system. This kind of difference arises in meaningful problems as well; for the nine-point path synthesis problem discussed in § 9.6.7, a parameter homotopy requires only 1442 solution paths, whereas the cheater's homotopy would require at least 90,000 continuation paths (see Wampler, Morgan, & Sommese, 1992, 1997). The difference is due to the presence of positive dimensional solution components. Parameter homotopy preserves these components and so the associated paths can be safely ignored. But the cheater's homotopy perturbs these components, replacing them with thousands of nonsingular paths that must be tracked.

The same property that makes the cheater's homotopy undesirable in the general situation can make it the method of choice in certain specialized situations: the addition of the random constants makes all finite roots nonsingular. For example, Equation 7.8.13 has a quintuple root at the origin, (x, y) = (0, 0). Adding the constants as in Equation 7.8.14 perturbs this into five distinct roots. If we wish to have the origin appear as the endpoint of nonsingular homotopy paths, the cheater's homotopy will accomplish this. Usually though, our aims are in the opposite direction: we would like to avoid computing degenerate solutions whenever possible.
7.9  Exercises
The following exercises are intended to help the reader understand the principles of parameter continuation and also to experience the numerical behavior of the continuation method. They assume that the user has access to Matlab, and that the package HOMLAB, available on the authors' websites, has been installed on the Matlab search path. A user's guide to HOMLAB appears in Appendix C. Demonstration codes are provided for most of the exercises, so they can be run with minimal knowledge of Matlab commands. A few exercises require the user to
write or modify an m-file. Even those with minimal prior experience with Matlab should be able to handle these after a little experimentation.

A few words about HOMLAB. The main output of the demonstration programs is always stored in two arrays: xsoln and stats. Each column of xsoln contains a solution of the system in homogeneous coordinates, and column i of stats compiles some statistics on the numerics of the ith solution. HOMLAB treats all problems as formulated in a multiprojective space to take advantage of the ability of the projective transformation to handle paths leading to solutions at infinity. For the Stewart-Gough platform problems, this is natural, since we have formulated them on P^7. The code requires that problems naturally formulated in C^n, such as the initial triangle example, be homogenized for solution in P^n. Typically, the homogeneous coordinate that is added in this process is appended as the last row in xsoln. (See the user's guide for information on the full range of options.) Function y=dehomog(xsoln,eps0) de-homogenizes solutions by dividing through by the homogeneous coordinate for any solution for which the homogeneous coordinate is nonzero as judged by the test abs(xsoln(n+1,:))>eps0. Part of the learning process of the exercises will be to see how to set tolerances such as eps0.

The second output, stats, compiles some statistics for the run. Each column of stats corresponds to the matching column in xsoln. Full information is given in the user's guide. For the exercises to follow, we are mainly concerned with rows 2, 3, and 5, having the following meanings:

Row 2  This is a convergence test on the solution. It is a two-norm estimate of how accurately the solution has been computed.
Row 3  This is the maximum of the absolute values of the polynomials evaluated at the solution point. If this is not small, an error has occurred.
Row 5  Condition number of the Jacobian matrix of the polynomial system evaluated at the solution point. A large condition number implies the solution is singular.

Exercise 7.1 (Triangle)  This exercise experiments with file triangle.m, which solves the triangle example of § 7.3 using the parameter homotopy path given in Equation 7.3.6. It uses a path tracker without an endgame to handle singular roots so that one can see what happens in such cases. The routine allows the option of accepting a randomly-generated, complex value for the path constant α in Equation 7.3.6. Try the following experiments:
(1) Solve several triangles of your own choice, accepting the option to use a random, complex value for α. Does the routine reliably return accurate solutions?
(2) Try again, but choose α = 1. Can you find examples for which the routine fails? Succeeds? Can you determine a condition on (a, b, c) that predicts success versus failure?
(3) Now choose α = 1 + 1i. Can you find an (a, b, c) for which the algorithm now fails? What happens if you add a small random perturbation to the values?
(4) Enter an (a, b, c) that is on the boundary of the triangle inequality, for example, (2, 1, 1). Let the routine pick a random value for α. What happens? How about for (a, b, c) = (2, 1, 1 + 1e−8)?

Exercise 7.2 (Symmetry)  Consider the family of systems F(x, y; a) = {x y^3 − a, x^3 y − a} parameterized by a ∈ C. Solve F(x, y; a) = 0 symbolically by hand. Find a mapping that gives symmetry groups of order 4. How many roots are there in (x, y) ∈ C^2? How many paths would you need to track if symmetry is used to its fullest extent?

Exercise 7.3 (Cheater's Homotopy)  This exercise addresses the system in Equation 7.8.13.
(1) Prove the claim that Equation 7.8.13 has just one nonsingular solution for a generic value of q.
(2) Use the script cheatrun.m provided with HOMLAB to numerically determine the number of nonsingular roots for Equation 7.8.13 and for the cheater's start system, Equation 7.8.14, assuming generic b_1 and b_2.
(3) How many solution paths would a parameter continuation have when solving for different settings of the parameter q? How about the cheater's homotopy?
(4) What are the singular solutions of Equation 7.8.13?

Exercise 7.4 (Stewart-Gough Platforms)  The goal of this exercise is to reconfirm the results presented in the case study of § 7.7. You will use Matlab routine Stewart/sgparhom.m.
(1) In Matlab, type >> sgparhom to begin solving Stewart-Gough forward kinematics problems. A file, strt66.mat, containing random parameters and the 40 corresponding solutions for a generic 6-6 problem is provided to bootstrap the process.
(2) Plan a strategy for reconfirming the solution counts for all the subfamilies shown in Table 7.1. Try to minimize the total number of paths that are tracked. The program provides a facility for saving solutions to re-use as start points in subsequent runs. Run at least one of each topological type, some planar and some not.
(3) Pick a subfamily and write an m-file that defines a specific example in that subfamily. Then, compute solutions to that example twice: once using a homotopy in the subfamily and once as a special case of a larger subfamily that contains it. For example, you might solve a specific 3-3 case with a 16-path homotopy in that family and also with a 32-path homotopy in the 6-4 family. Compare computation times and check that the same (nondegenerate) solutions are obtained both ways. Remember that the points are computed using homogeneous
coordinates in P^7, so you will need to devise a scheme for judging that two such points are equal. How closely do the points match?
(4) Run a real case and check that any complex solutions appear in conjugate pairs. Change the parameters and see if the number of real roots changes.
(5) Solve a problem with real parameters, p ∈ R^42. Then, use 3-D graphics commands to draw simple (stick-figure) models of the Stewart-Gough platform in all its real poses.

Exercise 7.5 (Secant Homotopy)  Let f(x; p) : C^n × Q → C^n be a system of parameterized polynomials. Then the secant system derived from f(x; p) is g(x; λ, μ, p_1, p_2) = λ f(x; p_1) + μ f(x; p_2).
(1) What is the parameter space for g(x; λ, μ, p_1, p_2) = 0? (Note, we may consider [λ, μ] ∈ P^1. Why?) Denote the parameter space as Q′ in the following items.
(2) What is the relationship between the nonsingular root count for g, N_g(U, Q′), and the one for f, N_f(U, Q), where U is any Zariski open subset⁴ of C^n?
(3) Suppose we know all N_f(U, Q) nonsingular roots of f(x; p_1) = 0 for some general p_1 ∈ Q. We would like to use these as start points for a secant homotopy

    h(x, t) = γ t f(x; p_1) + (1 − t) f(x; p_2)
(7.9.15)
to find all nonsingular solutions in U of f(x;p2) = 0 by tracking solution paths as t goes from 1 to 0. Why do we need Mf(U, Q) = Mg{U, Q') for this to be justified? If this equality does not hold, can you think of a way the homotopy might fail? If conditions of the previous item are satisfied and the constant 7 in Equa(4) tion 7.9.15 is chosen randomly in C, the secant homotopy will be successful with probability 1. Choosing 7 randomly on the unit circle (|7| = 1) also works. Can you see why the random 7 is necessary? (Try to think of a counter-example if 7 = 1.) (5) Prove the claims of the previous item. (Hint: see § 8.3.)
Exercise 7.6 (Secant Homotopy for Stewart-Gough Platforms) The conditions laid out in the previous exercise for success of the secant homotopy hold for general Stewart-Gough forward kinematics problem (type 6-6) as defined by Equation 7.7.11. A set of 40 solutions for a general 6-6 platform are provided in file stewart/strt66.mat. These were used in Exercise 7.4 as start points for a parameter homotopy, but they can also be used for secant homotopy as implemented in Stewart/sgsecant. (1) In (Wampler, 1996a), it is shown that the root count of 40 for general 6-6 platforms follows from the fact that Bi is antisymmetric when we re-write Equa4
see § 12.1.1 for definition
116
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
tion 7.7.10 for leg i as a quadratic form, eTAie + 2eTBig + gTg = 0,
(2) (3)
(4)
(5) (6) (7)
where e and g are interpreted as 4 x 1 column matrices. Use this fact to prove that the secant homotopy is valid for 6-6 platforms. Use sgsecant to solve a random 6-6 example. How does the running time compare to the parameter homotopy? Can you explain why? Use the secant homotopy to solve examples of the other coincident-point subfamilies, using the 40 start points from strt66.mat. Why is this justified? In particular, solve a 3-3 problem. How does the computation time compare to item 2? Can you explain? We would like to solve problems in a coincident-joint subfamily using a start problem from the same subfamily so that the number of solution paths is equal to the generic number of solution points. For example, we would like to solve 4-4b problems using just 24 paths. What check must be performed to see that this is justified? (Challenging) Write a program to do the check for subfamily 4-4b. (Tip: modify a copy of sgsecant .m.) What is your conclusion? Try the same for other families. What needs to be checked to conclude that a secant homotopy between two 6-6P platforms can be done using just 20 paths? Modify sgsecant.m so that you can do this check. What is your conclusion? Use the results of the last two items to determine the minimum number of paths required to solve a 3-3 problem by secant homotopy.
Exercise 7.7 (Numerics of Tracking) File htopyset .m sets constants that control the behavior of the path tracker. The two most important ones are maxit, and epsbig. Small values require the numerical solution point to stay close to the true path denned by the equations. Large values allow more deviation. Adjust the settings by putting a copy of htopyset .m in your local directory and editing it. Run sgparhom and observe the effect on computation time and reliability by recording changes in runtime, the number of function evaluations (last row in stats), and by noting any path failures. Also, type >> pathcros(xsoln) to check if any solutions have "jumped paths," causing some root to be reported more than once and leaving out the root at the end of the solution path that was left behind in the jump. Can you make sgparhom run faster?
Chapter 8
Polynomial Structures
In the previous chapter, we introduced the basic concept of a coefficient-parameter homotopy. This is the underlying principle for all of the homotopies discussed in this book; each system that we solve has a parameter space, and a homotopy is just a continuous path between two points in this parameter space. Whenever we approach a new polynomial system, the first question we face is how to parameterize it. Problems from engineering or science generally come with a natural set of parameters built in: the dimensions of the links in a mechanical system or the rate coefficients for chemical reactions, for example. But rarely do we know all the solutions for a general choice of such parameters. We need to cast the naturallyparameterized problems in some larger family of problems in which a start system is more easily found. We called this the Ab Initio procedure in § 7.2, but postponed detailed discussion for later. We now return to this important question. At the opposite end of the spectrum from the natural, physical parameterization of a system are total degree homotopies. These can in principle solve any system, because as we shall see, every system is a member of a total-degree family parameterized by the coefficients. Moreover, in each such family, there is a start system whose solutions are immediately apparent. The downside is that, depending on the target system, the total-degree homotopy may have many paths that go to solutions at infinity or other degeneracies. These waste computer time, and the process of carefully distinguishing between degenerate and nondegenerate solutions can also cause extra work. Even so, if the extra work is not so excessive as to make the computation infeasible, we only have to do it once to get the solutions for a general member of the naturally parameterized family. Then, we can use a homotopy in the natural parameter space to solve any other system of that parameterized family. But what if the extra work is excessive? Over the years, a number of useful classes of homotopies have been invented to populate the territory between total degree and naturally-parameterized homotopies. We choose among these with the objective of best matching the target system without overly complicating the solution of the start system. The purpose of this chapter is to discuss the most important signposts in this territory.
117
118
8.1
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
A Hierarchy of Structures
Fig. 8.1 Classes of Product Structures. Below line A, start systems can be solved using only routines for solving linear systems of equations. Above line B, special methods must be designed case-by-case.
Figure 8.1 shows a hierarchy of classes of special structures that are useful in constructing homotopies. Each structure in the diagram is a member of the class above it; for example, a total degree structure is a particular kind of multihomogeneous structure. (In particular, as we will shortly see, it is a one-homogeneous structure.) As we ascend the hierarchy, each class of structures presents more and more possibilities for matching a particular target system that we wish to solve. As indicated on the right of the diagram, this means that we can select a more special structure, usually with the aim of reducing the number of solution paths to track in the homotopy. The trade-off we face in this ascent is indicated by the downward pointing arrow on the left of the diagram: the lower structures allow us to select start systems that are easier to solve. For some problems, the ascent up the diagram pays handsomely in path reduction and may turn an intractable problem into a solvable one. On the other hand, it can happen that solving a start system for a higher structure can consume more computer time than is saved in path reduction. Unfortunately, even just counting the number of roots of the start system can be expensive, so it is a matter of experience to decide the most advantageous spot in this hierarchy to solve a particular problem. Two dashed lines appear in Figure 8.1 to demarcate significant differences in the start systems of homotopies respecting the various special structures. Below Line A,
Polynomial Structures
119
the start systems can be chosen in a factored form which permits all solutions to be computed using simple combinatorics and routines for solving linear systems. Thus, for these structures, the time spent solving the start system is insignificant compared with tracking the solution paths to the target system. Above Line A, some path tracking is usually required just to solve the start system. Furthermore, above Line B, solving the start system usually requires the use of a homotopy based on one of the structures below it in the hierarchy. Typically, these are not optimal in the sense that some paths lead to degenerate points. Between the two lines lie the monomial-product and Newton-polytope homotopies. These require path tracking to solve the start system, but the homotopies involved can be specially designed to produce all solutions of the start system without any extra paths leading to degenerate solutions. In addition to the cost of the path tracking, the combinatoric calculations can be significant. In addition to differences in computation times, the position in the hierarchy also has an effect on the complexity of the computer code that implements it. In this regard, the two extremes are the simplest. All homotopies require routines for path tracking. To this, a total degree homotopy adds a simple start system that is almost trivially solved. Consequently, the corresponding computer code is as simple as possible. At the other extreme, we may formulate a coefficient-parameter homotopy in terms of the physical parameters of the engineering or science problem at hand, a step which we must do in any case. The start system simply amounts to choosing random, complex values for these parameters. The difficulty comes in solving the start system. A simple way to proceed is to solve the start system with a total degree homotopy. This may be expensive, but it only has to be done once. After that, we may solve any target system in the same parameterized family using only the paths from the nondegenerate solutions of that first start system. So, once we have implemented a general-purpose solver for total degree homotopies, coefficient-parameter homotopies require only a bit of data management to solve a start system and store its nondegenerate solution list. The other intermediate structures introduce intermediate levels of complexity to a computer code. Multihomogeneous and linear product homotopies introduce simple combinatorics into the enumeration of the start solutions. In contrast, the combinatorics introduced by monomial homotopies have been the subject of significant mathematical study, of which we give only a hint in § 8.5. A final important consideration in the choice of homotopy is numerical stability and robustness. For the paths leading to nonsingular solutions, there is not much difference to be expected in this regard no matter which homotopy is chosen. However, it can happen that if one uses a homotopy near the bottom of Figure 8.1, the singular solutions may vastly outnumber the nonsingular ones. In some practical situations, we may be satisfied to casually discard all badly-conditioned solutions without wasting much computer time on them. This runs the risk of dropping out some generically nonsingular solutions that happen to have marginal conditioning.
120
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
When we wish to be more careful about finding all nonsingular solutions, a great deal of effort may be necessary to resolve all the badly-conditioned solutions. Moving up the hierarchy to a more special structure may eliminate these solutions from the homotopy and avoid the cost and uncertainty of computing singular solutions. In some cases, singular solutions remain but have reduced multiplicity, making them easier to compute accurately using "singular endgames," (see Chapter 10). With this general picture in mind, we will proceed to examine each of the special structures in some detail. Before starting this journey, we present a discussion of homotopy paths that is relevant to all the special structures. Then, we start at the bottom of the diagram of Figure 8.1 and work our way up to structures of increasing specificity. We give only simple examples in this chapter, postponing case studies of more significant examples to the next chapter.
8.2
Notation
Throughout the remainder of this chapter, it will be convenient to use the following notations. be the n-dimensional vector space having basis elements (1) Let (ei,...,en) e i , . . . , e n and coefficients from C. Any point in this space may be written in the form X^Li Ci&i w i* n c i £ C for all i. Note that we have not specified anything about the basis elements: in the structures we discuss below these will be variously individual variables, monomials, or polynomials. (2) Let {pi, • • •,pn} ® {Qi, • • • ,Qm} be the product of two sets, that is, the set {Pi ® Qj-, 1 < i < n, 1 < j < m} having nm elements. Throughout this chapter, we take this product as the image inside the polynomial ring; that is, x ®y = y ® x = xy is just the product of two polynomials. (3) Define P x Q = {pq | p s P,q £ Q}. Accordingly, {P ® Q) is the space whose members are sums of members of (P) x (Q). Since this includes a sum of one item, we have (P) x (Q) C (P
8.3
Homotopy Paths for Linearly Parameterized Families
As we shall soon see, in our hierarchy of special structures, Figure 8.1, all but the top case (general coefficient-parameter structures) have parameters that appear linearly. This means that the family of systems F(z\ q) : C™ x C m —> C n has the
Polynomial Structures
121
property that for any a, (3 £ C and qi, q2 £ C m , F(z; aqi + f3q2) = aF(z; Ql) + 0F(z; q2). The special structures of this chapter all obey this linearity condition because they are parameterized by coefficients which multiply a basis set of monomials or polynomials. Since the parameter space, C m , is linear, we can easily construct an homotopy that stays in the parameter space while continuing from a start system, F(z; qi), at t = 1, to a target system, F(z; q2), at t = 0, as H(z, t) := F(z; tqx + (1 - t)q2) = 0. By Lemma 7.1.2, to solve the system for a given target q2, we just need the solutions at almost any starting qi £ C m , from which we can follow the real straight line path t £ (0,1]. However, in the case of an Ab Initio homotopy, where we have chosen
where 7 £ C is chosen randomly and r € (0,1]. For nonzero 7 not on the negative real axis and r G [0,1], the denominator 1 + (7 — l)r ^ 0. By the linearity of F(z; q) with respect to q, we can clear the denominator to get H(z, T) := F{z; 7TQ1 + (1 - r)q2) = 0, without changing the solution paths. It can save computation to further rewrite this as H(z, r) := 1TF(z- qx) + (1 - r)F(z; q2) = 0. This is sufficiently convenient that we state it formally below. The upshot is that in the succeeding sections, we may concentrate on finding start systems for each of the special structures. Any start system in the family will do, as long as it has the generic number of roots. Recall from the previous chapter, the notation N{q, U, Q) is the number of nonsingular roots in U of F(z; q) = 0 at parameter point q € Q. Theorem 8.3.1 Suppose F(z; q) : Cn x C m —> Cra is polynomial in z and linear in q, and let f(z) = F(z;q0) for some given q0 G C m . If g(z) = F(z;q*) with M{q*, U, Cm) = Af(U, Cm) for some Zariski open set U C Cn, then
122
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
(1) for almost all 7 G S1, i.e., for all but finitely many complex numbers 7 of absolute value one, the homotopy h(z,t):=-ytg(z) +
(l-t)f(z)=O
has Af(U, C m ) nonsingular solution paths ont 6 (0,1] whose endpoints as t —* 0 include all of the nonsingular roots of f(z) = 0 in U; (2) if g(z) = 0 has no isolated roots of multiplicity greater than 1, the endpoints of the nonsingular solution paths include all isolated solutions of f(z) = 0 in U; and (3) if we let 7 = el6, the foregoing statements still hold for all but a finite number points 9 G [—7T, TT]. Proof. This is a consequence of Theorem 7.1.4, Theorem 7.1.6, and Lemma 7.1.3 with rearrangements described above for linearly parameterized families. • Remark 8.3.2 In cases where g already incorporates a generic complex scaling factor, 7 is superfluous; it can be dropped from the homotopy. (This is equivalent to choosing 7 = 1 . ) Through use of Theorem 8.3.1, as long as the parameters of the family of systems appear linearly, all that we need to form a good homotopy is to find one start system in the family having the generic number of nonsingular roots. Then, by picking 7 at random in C, the homotopy leads to all nonsingular solutions of a target system, with probability one. 8.4
Product Homotopies
Let us now jump to the bottom of the hierarchy of Figure 8.1 and work our way up. Although the lower structures can be justified as special cases of the higher ones, it is better for building understanding and intuition to start with the simpler cases. Not surprisingly, for the most part, this follows the historical development of the subject. 8.4.1
Total Degree Homotopies
At the bottom of the hierarchy, the total degree homotopy uses the least detail of the structure of the target system to be solved. The structure is completely characterized by the number of variables n and a list of degrees di, i = 1,..., n. (Here, the di are all positive integers.) Let F(z, q) : Cn x Q —> C" be the family consisting of n polynomials in n variables with dt being the degree of the ith polynomial. The parameter space Q consists of the coefficients of all monomials that respect the
123
Polynomial Structures
specified degree structure. In other words, we have fi(z;q)=
qi>aza,
^
i = l,...,n,
(8.4.1)
\a\
where a = {ai,...,an}
G Z | o , \a\ := ax -\
+ an,
and za := z™1z%2 • • • < " .
1
The number of monomials in n variables having degree less than or equal to d is ("n^)' s o denoting rrii = (™~^di), the parameter space for the total degree homotopy is Q = C m i x • • • x C m ". Using the notation of § 8.1, we may write a description of F in the alternative form /i(z)e({l,;zi,...,zn}<*>), where the parameter space is the set of coefficients multiplying the elements of the vector space. Since the parameters of F appear linearly, we can apply Theorem 8.3.1, if only we can find a start system g £ F that has the generic number of nonsingular roots and is easy to solve. We know from the classical Bezout Theorem for systems that the number of finite, nonsingular solutions to a generic member of the total degree family is J\f = d\ • • • dn. A simple system that achieves this bound is
(#-1) z
g(z) = I
d
_
2
2
.
-^
\ = 0.
(8.4.2)
We can solve the individual equations independently, obtaining dt roots for z»; the solutions of the system g(z) = 0 are the d\ • • • dn combinations of these. It is easy to see that all of these roots are nonsingular. So, even though it is very sparse, g(z) has as many roots as the most general member of the total degree family. We summarize the net result in the following theorem. Theorem 8.4.1 (Total Degree Homotopy) Given a system of polynomials : C™ -> C" with the degree of fc equal to di} let g(z) / ( z ) = {fi(z),...,fn(z}} be any system of polynomials of matching degrees such that g{z) = 0 has d = fj™ di nonsingular solutions. Then, the d solution paths of the homotopy h(z,t):=
(l-t)f(z)=O
starting at the solutions of g(z) = 0 are nonsingular for t £ (0,1] and their endpoints as t —> 0 include all of the nonsingular solutions of f(z) — 0 for almost all 7 £ C, excepting a finite number of real-one-dimensional rays through the origin. In 1
Simple demonstration: a monomial za, z = {21,..., zn}, \a\ < d, can be written as a string of
d + n symbols as za = 1 • • • 1 X z\ • • • z\ X • • • X zn • • • zn, where the positions of the n occurrences
of the "x" symbol uniquely specify the monomial. Hence, the choices of n items in a list of n -f- d things enumerate the monomials.
124
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
particular, restricting 7 to the unit circle, 7 = e%0, the exceptions are a finite number of points 9 £ [0, 2TT].
Proof. Because the family of all polynomial systems with the specified degrees is a vector space over the coefficients of its monomials, this follows directly from Theorem 8.3.1 under the condition that g(z) = 0 has the generic nonsingular root count. The classical Bezout Theorem says that d = 11"= 1 di i s the generic root count for this family, so we are done. • Remark 8.4.2 The system g(z) from Equation 8.4.2 satisfies the conditions of the theorem, and so it can be used as the start system of a homotopy to solve f(z) = 0. There are, however, many viable alternatives. One that is occasionally useful has gi(z) a product of di generic linear factors. Using the notation of § 8.1, we may write gi(z)e{z1,...,zn,l)(d').
The roots of this start system are found by choosing one factor from each equation and solving the resulting linear system of equations. If we choose the coefficients of all the linear factors at random, these linear systems will all be nonsingular with probability one. Equation 8.4.2 is a special case in which 9i{z)
G (Zi,l)idi) .
Instead of taking the classical Bezout Theorem as given, we can prove it with the tools at hand. It is instructive to do so, because a slight generalization of the same argument will apply for multihomogeneous structures in the next section. First, we rephrase Bezout's Theorem in the current notation. Theorem 8.4.3 (Projective Bezout) Given positive integers di,...,dn, let F(z, q) : C r a + 1 xQ —> C" be the family of homogeneous polynomial systems whose ith function is a member of the vector space ({?o, ~z\,. •., z n } d i ) and whose parameters Q are the coefficients of this space. Then, n t=i
Corollary 8.4.4 (Affine Bezout) Given positive integers d\,..., dn, let F(z,q) : C " x Q ^ C n be the family of polynomial systems whose ith function is a member of the vector space ({1, z\,..., zn}d') and whose parameters Q are the coefficients of this space. Then, n
Af(Ci,Q) = '[[di.
125
Polynomial Structures
Proof. Let q* G Q be the set of coefficients for the system 5(2) = F(£; g*) as
z
z
l
5(2) = I \zn
2
0
. ° [=0. z
0
(8.4.3)
>
We see that 5 has no solutions at infinity, because if ?o = 0, then all of the % = 0, but [0,..., 0] is not a point in projective space. Away from infinity, we may dehomogenize by setting z0 = 1, and find the remaining % as the djth roots of unity. Clearly, there are d = JJ"=1 di distinct solutions, and they are all nonsingular. Theorem 7.1.4 says that since q* G Q, the generic root count A^(Pn, Q) > d. Suppose q' 6 Q in the neighborhood of q* has N > d nonsingular solutions. Theorem A. 14.1 implies that nonsingular roots continue in an open neighborhood, so since P" is compact, the nonsingular solutions along a path from q' to q* must have a limit in P n as the path approaches q*. Accordingly, some solution of g(z) = 0 must have at least two solution paths approaching it. But this contradicts Theorem A. 14.1, leaving M = d as the only possible conclusion. The corollary follows immediately from the observation that since ^(2) = 0 has no roots at infinity, this is the case generically on the whole family F(z; q) = 0, and therefore the affine root count is the same as the root count onP". • Remark 8.4.5 We call d = Yl7=i °k * n e total degree of the system. Thus, we may say that the number of finite, nonsingular roots of a system of n polynomials on C71 is less than or equal to its total degree. Remark 8.4.6 The system of Equation 8.4.3 can be used as the start system in an homogeneous homotopy to solve n homogeneous polynomials in n + 1 variables using the homogeneous analogue of Theorem 8.4.1. In fact, it is very useful to homogenize a target system and solve it on P n , so that solution paths that would diverge to infinity in C n can be followed to their endpoints at infinity in Pn. See Chapter 3 for more on this. The total degree homotopy is easy to implement and very effective for systems of dense polynomials. However, systems arising in practice often display patterns of sparsity that result in fewer than the total degree number of roots. The next few sections move up the hierarchy of Figure 8.1 to capture more of the structure of the target system.
126
8.4.2
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Multihomogeneous
Homotopies
The quickest way to understand multihomogeneous structures is to start with an example. Suppose we have the system
f(x,v) = {*Hll}=0-
(8-4-4)
The total degree of this system is four, but it has only two finite roots, (x, y) = ±(1,1). When we use a total degree homotopy on C2, we are in essence solving a one-homogenization of the system on a patch of P 2 . In this case, the onehomogenization of f(x, y) is obtained by substituting x = X/W and y = Y/W and clearing denominators to get 2
~) FX(W,X, Y) =f |XY£ [-_W w2 I = 0.
(8.4.5)
Now the finite roots are [W, X, Y] = [1,1, ±1] and there is an additional double root at infinity: [W, X, Y] = [0,1,0]. The total degree homotopy not only wastes computation by following four solution paths, but the two unwanted paths lead to a singular root. If this root is not handled properly, the procedure may spend much more time on it than is spent on the meaningful finite roots. It would be better to use a different treatment of infinity, so that the undesired roots no longer exist. In this case, this can be done by introducing a separate homogeneous coordinate for each variable; that is, set x = X/U and y = Y/V and clear denominators to get
F2(X,Y,U,V) = [X£Z™ } =°-
(8A6)
We now seek solutions ([U,X], [V,Y]) G P x P and find that there are only the two finite solutions ([1,1], [1,1]) and ([—1,1], [—1,1]). There are no solutions at infinity, because setting U = 0 implies (U,X) = (0,0), which is not allowed, since [0,0] ^ P, and setting V = 0 has similar consequences. An homotopy that respects the two-homogeneous structure of the system will have only two paths. This can be understood in another way using the vector space notation of § 8.1. Recall from the previous section that the total degree homotopy treats f(x,y) as follows: (xy - 1) G ({x, y, 1}
[ l>
^
In contrast the two-homogeneous treatment places f(x, y) as a member of the family as follows: (xy - 1) G ({x, 1} ® {y, 1}) = (xy, x, y, 1), (x2 - 1) G ({x, 1} ® {x, 1}} = ( x 2 , x , l ) .
. l8 4 8j
- -
127
Polynomial Structures
Clearly, for this system, the two-homogeneous treatment is more restrictive than the one-homogeneous treatment. The corresponding start system is 9i(x,y)£(x,l)x(y,l), g2(x,y) e (x,l) x (x,l).
^ ^
A particular instance that is sufficient is
which has two solutions (x, y) = (±1,1). When solving this system, we cannot choose the first factor x = 0 in the first equation as it is incompatible with either factor in the second equation. This hints at the general phenomenon that we make use of in multihomogeneous homotopies. More formally, the structure used in a multihomogeneous treatment of a system can be summarized as follows. We have n variables that are partitioned into m disjoint subsets of size ki,..., km, (fci + • • • + km = n); that is, we have z G C™ written as z = {zi,...,
zm} with Zj = {ZJI, . . . ,
zjk.}.
Furthermore, in the target system f(z), the degree of the zth polynomial fi(z) with respect to the jth set of variables Zj is d^. This can be written for i = 1 , . . . , n as
fi=
£
c { Q l ,..., Q m } zf 1 ... Z °™
(8.4.11)
{ = !,...,or m }
where each a^ is a multidegree. Equivalently fi e ({1, 2l}
.
(8.4.12)
We consider the family of all such systems, parameterized by the coefficients of all the monomials that appear in this vector space. In the remainder of this section, f(z) is a particular member of this family and Af(Cn, Q) is the root count for the family, where the parameters, forming space Q, are the coefficients of all the monomials of the vector space specified by Equation 8.4.12. As we will justify below, a start system that corresponds to F given by Equation 8.4.12 is g with 9i
e (zu l ) ( d i l ) x • • • x (Zm, l){d""}.
(8.4.13)
That is, gi is the product of linear factors, with d^ factors of the variables Zj. Let G be the family of all such systems, having a parameter space Q' consisting of the cross product of the parameter spaces for the vector spaces of the factors. Clearly, after expanding the product and collecting terms, each such g is in the family defined by Equation 8.4.12, which defines a map <j> : Q' —> Q. Let Qg C Q denote the image
128
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
of Q' under the map 0. We know that Qg is irreducible, because Q' is, so we may speak of M(U, Qg), the generic nonsingular root count of the start system family G as a subfamily of F. To find a solution of g{z) = 0, choose one factor from each equation and solve these n linear equations simultaneously. One finds all of the solutions by ranging over all possible choices of the factors. As we saw in the example of Equation 8.4.10, some combinations of factors will be incompatible; in fact, we must choose exactly kj factors for each group of variables Zj. There are several ways to count the number of solutions of the start system g(z) = 0. Let D be the n x m matrix of nonnegative integers with entries d\j and let K = {k\,..., km}. For generic coefficients in the linear factors, we have a generic root count that depends only on D and K. We'll call this function Bez(D,K) = J\f(Cn,Qg). Let s(K) be a list of length n containing kj copies of a,- for j = 1,..., m. From this, let ir{K) be all the distinct permutations of the list s(K), of which there will be n\/(k\\ • • • km\). Then, a direct formulation of the combinatoric process described in the previous paragraph is
Bez(D,K)=
n
(8A14)
11^-
J2
An equivalent definition is
(
n
m
\
^•••amm.I[2dtfaJ h
(8A15
)
where coeff(x,p(x)) reads as "the coefficient of monomial x in the polynomial p(x)." A special case of this formula occurs when m = n, which implies kj = 1, j = 1,... ,m. Then, D is a square matrix and Bez(£>, K) — permanent(Z?), where the permanent of a matrix is just the determinant except all terms are added without introducing negative signs on the odd permutations. If D has all nonzero entries, then there are n! terms in the sum. The other extreme is the one-homogeneous case m — 1, k\ = n, for which we get one term, the total degree Bez(D,{n}) = dn • • -dni.
Now, let's justify the use of this start system by proving the following theorem. Theorem 8.4.7 (Multihomogeneous Bezout Theorem) Let F : Cn x Q —> C n and G : Cn x Qg —> Cn be the families of systems specified by Equation 8.4.12 and Equation 8.4.13, respectively. Then JV(C", Q) = M(Cn, Qg) = Bez(D, K), where a formula for Bez(D, K) is given in Equation 8.4.I4. Proof. The proof is essentially the same as the proof of Theorem 8.4.4, except we use multihomogenizations of F and G to compactify the solution domain. See
Polynomial Structures
129
§ 3.6 for the definition of multiprojective spaces and multihomogeneous polynomial systems compatible with them. The multihomogenizations of F and G are functions on C fcl+1 x • • • x Ckm+1 compatible with the multiprojective space X = ¥kl x • • • x pfem j n particular, the multihomogenization G of G, using the homogenization substitutions Zj£ = z"j(/u>j, has an ith function of the form
A solution to 5 = 0 must have at least one factor in each equation equal to zero. For a generic J E G , a choice of kj factors in the group of variables {WJ, 2ji,..., "z^ } from kj different equations, determines a unique point in the corresponding ¥kj, and a collective choice of one factor from each equation that has kj factors in each group of variables for j — 1,..., m gives one nonsingular solution of 5 = 0 in X. These are the only possible choices, since any other choice must have more than kj factors in some group j and so has only the trivial solution {0, ...,0} ^ Fkj. These are the same combinatorics that define Bez(D,K), so we have Af(X,Qg) = Hez(D,K). Moreover, generically none of the roots are at infinity, and no other solutions exist. Since the multiprojective space X is compact, by the same argument used in Theorem 8.4.4, we have for the multihomogenized family F that Af(X, Q) = Af(X, Qg). Since generically none of the roots is at infinity, and since the affine roots of the original inhomogeneous systems F and G are in one-to-one correspondence with those of their multihomogenizations, the affine root counts are the same as the root counts on X. • Remark 8.4.8 Although we have not stated it as a separate theorem, it is clear from the proof that Bez(D,K) is also the generic nonsingular root count for a multihomogeneous polynomial system with degree matrix D and group sizes K = {hi,..., km} compatible with the multiprojective space X = Pfcl x • • • x Pfcm. The final step is to connect our start system g with a target system / of the same multidegree structure. Since the parameters of F(z\ q) appear linearly, we may use the homotopy given in Theorem 8.3.1. For the record, we state this as the following theorem. Theorem 8.4.9 (Multihomogeneous Homotopy) Given a system of polynomials f(z) = {fi(z),..., fn(z)} '• C n —> Cn having a degree matrix D for variables partitioned into subsets of sizes K, as above, let g(z) be any system of polynomials of matching degrees such that g(z) = 0 has Bez(D,K) nonsingular solutions. Then, the Bez(D, K) solution paths of the homotopy h(z,t):=jtg(z)
+
(l-t)f(z)=O
starting at the solutions ofg(z) = 0 are nonsingular for t £ (0,1] and their endpoints as t —* 0 include all of the nonsingular solutions of f(z) = 0 for almost all 7 € C, excepting a finite number of real-one-dimensional rays through the origin. In
130
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
particular, restricting 7 to the unit circle, 7 = e , the exceptions are a finite number of points 0 e [0,2TT]. Proof. Because the family of all polynomial systems with the specified degrees is linear with respect to the coefficients of its monomials, this follows directly from Theorem 8.3.1 under the condition that g(z) — 0 has the generic nonsingular root count. Theorem 8.4.7 establishes that Bez(D, K) is this count. • Remark 8.4.10 A similar homotopy works on a multiprojective space X = pfci x • • • x Pfc™ for compatible multihomogeneous functions and start systems. This is the preferred formulation when the target system might have solutions at infinity, for the reasons cited in Chapter 3. Example 8.4.11 (Matrix Eigenvalues) In the realm of numerical linear algebra, efficient and robust methods already exist for solving matrix eigenvalue problems, but for purposes of illustration, let's consider the problem of finding eigenvectors and eigenvalues by multihomogeneous homotopy. Given two n x n matrices A and B, the generalized eigenvalue problem is to find (v,X) e p™-1 x IP such that f = (X1A + \2B)v = 0. This becomes a conventional eigenvalue problem for A if we set Ai = 1 and B — —I. The problem consists of n quadratic equations, thus the total degree is 2". Partitioning the variables in the natural way as Z\ = v and z-z = A, we have da = di2 = 1; that is, the equations are bilinear. The root count is the coefficient of a™~1a2 in the polynomial (c*i + 02)") which is simply n. This agrees with the well-known result from linear algebra. A suitable start system has gi(v,X):=(aJv)(bJX)=O,
i =
l,...,n
where a* € C™ and 6j £ C2 are chosen randomly. For k = 1,... ,n, we choose the second factor in the /cth equation to solve for A and solve the linear system formed by the first factors from the remaining (n — 1) equations to get v. This gives n start points. Notice that the equations are all two-homogenized from the outset. To treat these numerically, we may dehomogenize by appending a random, inhomogeneous linear equation for v and one for A. This amounts to choosing a random patch C"" 1 x C 1 on P"" 1 x P. 8.4.3
Linear Product
Homotopies
Multihomogeneous homotopies are linear product homotopies that respect a given partitioning of the variables. They are ideal for problems that have a natural
Polynomial Structures
131
partitioning, such as the eigenvector-eigenvalue problem, but some problems benefit from a less restrictive partitioning, introduced in (Verschelde & Cools, 1993). We call a linear set any subset of {1, z\,..., zn}. A linear product structure is specified by a list of linear sets for each equation. Assume the variables are z = {zi,... ,zn}. Let TOj be the number of linear sets for equation i, and let them be denoted s^ C {1, z\,..., zn}. Then, a linear product family is given by / < € (sn ® • • • ® simi),
(8.4.16)
with, as usual, the parameters being the coefficients of the vector space. For such a family, a sufficient family of start systems G has for the ith equation 9i(z)
e (s x . . - x ( s i m t ) .
(8.4.17)
As discussed in the previous section on multihomogeneous systems, we may consider the family of G as a subfamily of F, having an irreducible parameter space Qg C Q, where Q is the parameter space of F. The sufficiency of G as a start system for F just means that it has the proper root count, which is stated formally as the following theorem.
Theorem 8.4.12 (Linear-Product Root Count) Let F and G be the families of systems specified by Equation 8.4-16 and Equation 8-4-17, respectively, and let Q and Qg C Q be their parameter spaces. Then, for a Zariski open set U C C n Af(U,Q)=M(U,Qg). This is an easy consequence of the general product decomposition theorem, Theorem 8.4.14, below, so we postpone proof to that point. The combinatorics of finding all nonsingular roots to g(z) = 0 is slightly more complicated than in the multihomogeneous case, because the variable groupings are not necessarily the same across all the factors. However, it is just a matter of determining, for each collective choice of one factor from each of the polynomials in g(z), whether the resulting linear system is compatible. We return to this below, but first, let us state the corollary that justifies using a linear-product homotopy. Corollary 8.4.13 For any f(z) in the family defined by Equation 8.4-16 and a generic g(z) from the family defined by Equation 8.4-17, the solution paths of h{z,t):=-ytg(z)
+
(l-t)f(z)=O
starting at the nonsingular roots of g(z) = 0 are nonsingular for t g (0,1] and their endpoints at t = 0 include all the nonsingular roots of f(z) = 0, for all 7 G C excepting a finite number of one-real-dimensional rays through the origin. Proof. This is the usual application of Theorem 8.3.1 in light of Theorem 8.4.12.
•
132
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
The theorem and its corollary are quite simple to apply. Consider the system
^U'H^C^H-
(8418)
We see that /i G ({x,y}
*(*,*) = (" W ,
(X+ )(1
J ^v +2/ M=0-
(^.19)
v lS2 / \ ( ^ - 2 / ) ( ^ + 22/){l + 3/) / > Although the total degree of g is 6, it has only 4 nonsingular roots, since (0, 0) is a double root. Although we chose very simple coefficients, it is easy to see that this is true for generic coefficients. Hence / has at most 4 nonsingular roots on C2. We give a more substantial example in the case studies below (see § 9.3). It is easy to build a computer program that takes advantage of linear-product homotopies, if we rely on the user to identify the product structure. Then, the program forms a start system consisting of linear factors with coefficients picked by a random number generator. This gives a system that is generic with probability one. The program cycles through the various combinations of choosing one factor from each equation and, if the resulting linear subsystem is full rank, its solution is determined. This potential start point is a solution of the start system, but it is a true start point of the homotopy only if it is nonsingular and it is in the set U. We can check for singularity by numerically evaluating the condition number of the Jacobian matrix of partial derivatives at the point. Assume U is defined explicitly as the complement of the solution set of a given polynomial system, say U = Cn \ s-\0) where s : Cn ->
Polynomial Structures
133
include total degree structures. In HOMLAB, the Matlab code distributed for use with this text (see Appendix C), the general-purpose code uses linear products. The drivers for multihomogeneous and total degree homotopies construct equivalent linear-product structures and then proceed as in the general linear-product case. 8.4.4
Monomial Product
Homotopies
Next up the hierarchy of Figure 8.1 are monomial product structures. Truth be told, these are not usually used directly, but we introduce them as a conceptual bridge to the next level of polynomial products and polytope structures. All we note here is that the entire theory of linear-product structures carries over to the more general case where the sets si;,- are collections of monomials. In the case of linear products, we restricted these monomials to just {1, z\,..., zn}. Let's consider a simple example to fix ideas. Suppose we have two equations involving only the monomials {x,y,x2y,xy2}, that is, /i,/z e (x,y,x2y,xy2). These are cubics, so the total degree is 9. The two-homogeneous Bezout number is the coefficient of a(3 in (2a + 2/?)2, which is 8. The best linear-product structure that contains the given monomials is ({x, y} ® {l,x} ® {l,y}}. If we work on (C*)2, this structure has 6 roots. But the equations obey the following monomial product structure A,/2G {{l,xy}®{x,y}). This structure gives the same root count as the factored system gi,92 G (l,xy) x (x,y). One sees that two generic factors from (l,xy) have no finite roots and two generic factors from (x,y) have only the origin in common, so working on (C*)2, we have ^({/i,/2},(C*) 2 )-AT({ gi ,(; 2 },(C*) 2 ) = 4. The drawback of monomial products, in contrast to linear products, is that it is no longer easy to solve the start system. Fortunately, as covered in the next section, that problem has been solved in a quite general way via the use of convex polytopes. Another advantage of the advanced methods is that they do not require the analyst to find decompositions by hand; it is all automatic. In fact, although convex polytopes can be used to justify the theory of monomial product structures, it is more powerful, applying also to monomial vector spaces that do not reduce to products. If our simple example above is modified to A, h e (x, V, xy, x2y, xy2) 0 ({1, xy} ® {x, y}),
134
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
the monomial product theory does not apply, but the convex polytope approach gives the same root count of 4, because xy is inside the "convex hull" of the other monomials. Still, despite its limitations, it may occasionally be useful to analyze a small system by monomial products. It also serves as a stepping stone to our final product structure: polynomial products. 8.4.5
Polynomial Product Homotopies
As throughout this chapter, let's consider a family of polynomial systems F(z; q) : C n x Q —> Cn. More specifically, let F = {/i,... , / „ } , where each polynomial fi : C™ x Qvt —> C has as its parameter space the coefficients of a vector space Vj defined by a polynomial product structure as follows. Each Vj is specified by rrii sets of polynomials s^-, j = 1 , . . . ,mj, which letting /cy be the number of polynomials in the set s^, can be denoted as s^ = {ptji, • • • ,Pijktj}- All the polynomials pijk are given. The vector space Vj is constructed from these as the polynomial product Vz := (Sil ® • • • ® s i m i ) ,
i =l,...,n.
(8.4.20)
The basis elements of V are all the polynomials obtained by choosing one element from each sy, j ~ 1 , . . . , rrii and multiplying them together. If two or more of these choices give an identical element, the duplicates can be dropped, but in any case, there are at most JTjli ^ij basis elements for Vi. The parameter space for Vi, which we call Qvi, is the set of coefficients multiplying these elements. The parameter space for the family of systems F is just Qv1 x • • • , xQyn. Alternatively said, if a polynomial Wi can be written in the form
Wi
Ti
rrii
=£ n ww>
t 8 - 4 - 21 )
where wm £ (sy), then wt e Vj. A particular system in the family F consists of an n-tuple of polynomials {w\,... ,wn}, Wi G Vj. Now, consider a special member of F wherein each Wi is formed from a single product, that is, rj = 1 in Equation 8.4.21. We will argue that a generic system of this type is sufficient as a start system for a homotopy to find all nonsingular solutions of any system in F. Accordingly, we will choose a generic start system 9{z) = {gi(z)> • • -,9n(z)} with rrii
9i(z) = I ] 9ij{*),
9lj e (Sij),
(8.4.22)
j=i
or what is equivalent, 9i(z) e (sn) x • • • x {simi).
(8.4.23)
Each vector space (sy) has Ckij coefficients, so the entire family of start systems G of the form of Equation 8.4.22 has a parameterization as the cross product of all
Polynomial Structures
135
of these Euclidean spaces, which is therefore just a big Euclidean space. But since every g{z) e G is also in F, we can cast G as a subfamily of F having parameter space Qg C Q. Clearly, Qg is connected, because it is the image of a Euclidean space, where the map is defined by expanding the product and collecting terms. Accordingly G{z;q) is just F(z;q) restricted to C n x Qg, where Qg is the set of systems in F that factors as Equation 8.4.22. The sufficiency of g(z) € G as a start system for any f(z) G F is established by the following theorem. Theorem 8.4.14 (Polynomial-Product Root Count) Let F and G be the families of systems specified by Equation 8-4-20 and Equation 8-4-23, respectively, having parameter spaces Q and Qg c Q, as described above. Then, for any U that is a Zariski open subset ofCn, Af(U,Q)=Af(U,Qg). In other words, the number of nonsingular roots in U for a generic start system, one that factors in the specified way, is the same as the generic nonsingular root count of the whole family. Such a start system g(z) is much easier to solve than a general system in the family, because g^z) = 0 implies that at least one of gij(z)
= 0, j =
I,...,mi.
Our earlier proofs of Theorems 8.4.4 and 8.4.7 hinged on showing that the start system had no singular solutions and no solutions at infinity. The question of excluding roots that satisfy some side conditions, that is, the limiting of the root count to some Zariski open subset U, did not arise, because those start systems will not generically have roots on any given quasiprojective set. In the case of polynomial-product structures, a generic system g 6 G may have singular solutions, solutions at infinity (in some multihomogenization of C"), or solutions on some quasiprojective set. The inclusion of U in the theorem strengthens the result (as compared to using just C n in its place), because it will allow us to drop solutions that generically lie on some quasiprojective set that we wish to ignore. So while these possibilities give the formulation extra power to eliminate solution paths in the homotopy, we must pay for them with a more difficult proof. In particular, we must argue in more detail that in a continuation from a generic member of F to a generic member of G, none of the nonsingular, finite solution paths end at such degeneracies. The proof is a little long by the standards of this chapter, but we attempt to keep the arguments elementary. This sacrifices some rigor and elegance, but hopefully it grants the reader an easier grasp of the essential facts. In the linear-product example of Equation 8.4.18 with start systems like Equation 8.4.19, we already saw an example of a singular solution to the start system which also happens to lie on the affine algebraic set x = 0. These conditions persist generically for the entire family of start systems for that example. We pause here a moment to emphasize that the theorem can be readily applied
136
Numerical Solution of Systems of Polynomials Arising m Engineering and Science
without understanding its proof. In fact, we will give only a sketch of a proof here, as a rigorous one requires the language of line bundles and sheaves. The reader who is versed in these technicalities may wish to consult (Morgan, Sommese, & Wampler, 1995) for a better proof. The proof sketch below may be useful as a guide to understanding the rigorous proof. On that note, some readers may wish to skip to the end of the proof now. Proof, (sketch) We consider the one-homogenizations of F and G with solutions that live on P", but to keep notation simpler, let us retain the same names. After homogenization, the variables z are replaced by homogeneous coordinates x = [xo,xi,... ,xn] G P " and the basis elements of the sets Sik are replaced by their homogenizations. We count the nonsingular solutions on a Zariski open subset U C P n . This includes the special case of counting finite solutions, since Cn = P"\^4 where A = {x E Pn\xo = 0}. The finite solutions of the homogenized systems, i.e., the solutions with Xo =fi 0, are in one-to-one correspondence with the solutions of the original systems via the mappings [xo,xi,... ,xn] — i > (XI/XQ,. .. ,xn/xo) and ( z i , . . . , zn) i—> [1, x\,..., xn], so counting the finite solutions of the homogenized systems is the same as counting the solutions to the original systems. Let iJfc = {!,..., gk, fk+i, • • •, fn} be the system obtained by replacing the first k functions in F by the corresponding functions in G. Accordingly, Ho = F, Hn = G and we have a corresponding sequence of parameter spaces Q = Qo D Q\ D • • • D Qn = Qg- Suppose we can show that M(U, Qi) = N(U, Qo). Then since the order of the functions doesn't affect the solution set, when stepping from Hk to Hk+\ by replacing fk+1 in Hk by gk+i, we may reorder to place fk+x as the first function in the set and conclude that J\f(U,Qk+i) = N(U,Qk)- Chaining these equalities together, we get Af(U,Q) = N{U,Qg), thus establishing the result we seek. Thus, the proof of the theorem hangs only on the lemma Af(U, Q\) = J\f{U, Qo). For the lemma, we fix {/2,..., f n } , and consider what happens for generic f\ and g\. Abusing notation, we will still call the parameter spaces Q and Qi, respectively, from here on. To prove the lemma, we begin by considering that g\ is the product of m; factors, say si = 311912 • • -01m!, where 0ij e (sij), j = 1,.. • ,m\. For each factor, there is a generic nonsingular root count dj in U for the system {(sij), /2, • • •, fn}- By the elementary rules of differentiation of a product, it is easily seen that if a point x* is a zero of more than one factor in the product, then all the first derivatives of 0i at x* are zero. On the other hand, if x* is a nonsingular root of one and only one of the systems {01 j , f2,. • •, / „ } = 0, j = 1 , . . . , mi, then it is a nonsingular solution of {01, /2, • • •, fn} = 0. Consequently, N{U, Q\) is the sum dx + • • • + dmi minus the number of roots that are generically at the intersection of two or more of the factors {sij). We will be more precise in a moment. Consider W = {/ 2 ,. •., /n}~1(0)> t n e solution set, with multiplicities, of the last n — \ equations in F. This set can be decomposed into its irreducible com-
Polynomial Structures
137
ponents, which may be of any dimension from n down to 1. The intersection of a fc-dimensional component with a hypersurface produces components of dimension k or k — 1, and the multiplicity of the intersection is at least as great as that of the component. Accordingly, to count the nonsingular solutions of F = Ho or Hi, we only need to retain from W the irreducible components having both dimension 1 and multiplicity 1. Call this collection of components the curve K. The root count for Ho concerns the intersection K H /^(O), whereas the count for Hi concerns Kngi\0) = U^iKng-Jl(0). In a continuation path through the parameter space for /i as we approach pi, we must consider whether nonsingular roots might become singular so that Af(U,Q) > M(U,Qi). Recall that the base locus Bs(V) of a vector space V = (ei,. •., em) is the set of common zeros of all the basis elements of the space: Bs(V) = {ei,... ,e m }~ 1 (0). The key observation is that generically the singular intersections with gf 1(0) can only occur where K meets the base loci of (sy). Any other singular intersections disappear under generic perturbations of the parameters of gx. The completion of the proof depends on technical arguments about these base loci. Basically, since /i is a sum of polynomials each of the same form as gi, the base loci are preserved under the sum, and moreover, so are their effect on the root count. We leave the details to (Morgan et al., 1995). • There is one phenomenon mentioned in the proof sketch that is relevant to practical implementation of polynomial product homotopies. This is that a point that is a nonsingular solution to one subset of factors (i.e., to a choice of one factor from each gi) might be a singular solution of the whole system g = 0. Such points must be dropped from the list of start points. Except in the special case of linear products, treated in § 8.4.3, polynomial product structures require special methods to solve the start system. After breaking the start system into its various subsystems, one could apply a simpler structure, say multihomogeneous, to each subsystem. However, this is the same amount of work as solving the entire start system, and therefore the target system, with the simpler embedding. We only come out ahead if we use something in the structure of the subsystems to solve them more efficiently. A common occurrence is that many of the starting subsystems have the same structure. After one such subsystem is solved, it can be used as a start system for the other similar subsystems in a parameter continuation. A second major inhibitor to the use of polynomial products is that there is no automatic way to identify a useful breakdown of a given system into a product. Usually, the product is suggested by the method of derivation of the equations. The dual difficulties of finding a useful breakdown into a product and solving the start system means that polynomial products are only appealing for very large problems where the potential payoff is worth the analyst's time. Otherwise, one may as well employ a more automated method and let the computer do the work.
138
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
This completes our tour of the product structures in Figure 8.1. In the next section, we consider a different generalization of monomial products, using monomial polytopes, which respect product structures but also take advantage of monomial sparsity that is not captured in by any breakdown into products. 8.5
Polytope Structures
A natural way to specify a family of polynomial systems is just to list the monomials that may appear in each polynomial. The family is parameterized by the coefficients. By the general coefficient-parameter theory, it is clear that there is a root count associated to such a family. A remarkable theorem, repeated below, due to Bernstein (Bernstein, 1975) tells how the root count depends on the pattern of the monomials. Since the family is linear in its coefficients, we can use the homotopy of Theorem 8.3.1 to solve problems in the family, if only we can solve a start system having the generic number of roots. Several methods for formulating and solving such systems have been invented. We describe here only the basics, so that the reader can appreciate the methodology, but due to the highly technical nature of efficient combinatorial formulations, we defer to references for the details. After reading this section, one might next wish to consult the review article (Li, 2003). Before we can state Bernstein's theorem, we need a few definitions. 8.5.1
Newton Polytopes and Mixed Volume
Let C* = C \ 0, the complex numbers excepting the origin. A Laurent polynomial fi : (C*)n x Cm" —> C is given in multidegree notation as Ji\X,Ci) = y Ci,ct'E aeSi
where Si C Z n is the set of exponent vectors appearing in the monomials, #(£i) = rrii is the number of monomials, and Ci%a G C is the coefficient for the Laurent monomial xa. The qualifier "Laurent" acknowledges that we allow negative exponents, which are disallowed in our usual definition of polynomials. The set Si is called the "support" of fu and its convex hull Qi = conv(5j) in W1 is its "Newton polytope.2" The polynomial family f{x\c) = f(x;ci,...,cn)
= {/i(x;ci),...,/ n (ar;cn)}
is parameterized by mi+m2H \~mn coefficients for the support S = {Si,..., Sn}. When working on (C*)™, multiplication of any equation by a monomial does not change the root count, as the zero set of xap(x) = 0 is just the union of the zero set of p(x) = 0 and the zero set of xa = 0, the latter having no points that A convex polytope is a bounded region of n-dimensional real space enclosed by hyperplanes. "Polytope" is to n dimensions as "polyhedron" is to three dimensions.
139
Polynomial Structures
are in (C*)n. So given a Laurent polynomial, we can always multiply through by some monomial with large enough exponent to clear any negative exponents that appear. Said another way, we can translate the support into the nonnegative orthant without changing the zero set on (C*)n. Thus, it is clear that the parameter theory of Chapter 7 for polynomials with nonnegative exponents applies also to Laurent polynomials. There are several operations on convex polytopes that are of interest to us. One is the Minkowski sum of two polytopes: Qi + Q2 = {Qi + 42 I qi 6 Qi, 2 G Q2}Second, defining the n-dimensional volume, denoted Vol n of a unit hypercube to be 1, we may speak of the n-volume Voln(Q) for any polytope Q C R n . In fact, the volume of the simplex having vertices VQ,VI, ... ,vn is Vol n (conv(u 0 , • • •, vn)) = — |det[t>i
-vo,...,vn-vo]\.
From these definitions, it can be shown that Vbln(AiQi+A2<32H l-AnQn), where 0 < A, £ M, is a homogeneous polynomial of degree n in the scalars Aj. Definition 8.5.1 (Mixed Volume) The mixed volume of convex polytopes Qi! • • • > Qn is defined as M(QU
...,Qn)
= c o e f f ( A i • • • Xn, V o l n ( A 1 Q 1 + \2Q2
We say that the mixed volume, Ai(S\,..., volume of their convex hulls. 8.5.2
Bernstein's
+ ••• + A n Q n ) ) .
Sn), of supports Si,..., Sn is the mixed
Theorem
We have argued above that a (Laurent) polynomial family has a well-defined root count on (C*)ra. The following theorem tells us how to determine it from the geometry of the supports. Theorem 8.5.2 (Bernstein, 1975) The root count on (C*)n of a Laurent polynomial family specified by supports Si,...,Sn and parameterized by the coefficients of the corresponding monomials is the mixed volume M(Si,... ,Sn).
This result is variously called the "Bernstein count," the "BKK bound" (a term coined in (Canny & Rojas, 1991) in recognition of the contributions of (Kushnirenko, 1976) and (Khovanski, 1978)), the "polyhedral root count," or the "polytope root count." We adopt the last convention as the most descriptive and precise. If all the exponents are positive, that is, if the system is polynomial in the usual sense, there is a well-defined root count on C", which may be higher than the polytope root count on (C*)n. The count in C" can be determined by the procedure in (Li & Wang, 1996) with further refinements in (Huber & Sturmfels, 1997).
140
8.5.3
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Computing Mixed Volumes
The computation of the mixed volume is a combinatorial problem. As mentioned at the outset, efficient methods for this computation are highly technical and we will not delve into them here. Instead, we will describe a very basic approach that is of practical use only for two variables, three variables at the most. This will be enough to show the nature of the beast. With this level of understanding, the reader can knowledgeably use software provided by experts, but further study of the references will be necessary to understand the internal workings of such software. Let's begin with the direct application of Definition 8.5.1 for two polynomials in two variables. We know that Vol2(AiQi + X2Q2) is a homogeneous quadratic in Ai, A2, that is, it is of the form p(Ai, A2) = c20Ai + C11A1A2 + C02A2. The mixed volumes is the coefficient of A1A2, which is Ci\. But notice that cn=p(l,l)-p(l)0)-p(0,l), or in other words, M(Q1,Q2) = Vol2(Qi + Q2) ~ Vol2(Qi) - Vol2(Q2).
(8.5.24)
Since Vol2(Q) is just the area of polytope Q, it is easy to see how to apply this using familiar area calculations. Following exactly the same line of reasoning, one may see that M{QUQ2, Q3) = Vol3(Qi + Q2 + Q3) - Vol3(Qi + Q2) - Vol3(Q2 + Q3) - Vol3(Q3 + Qi) (8.5.25) + Vol3(Qi) + Vol3(Q2) + Vol3(Q3), and generally,
M{Qu...,Qn)
= Y,{-l)^-^Yo\n I J2 QJ I » i=i
\jecr*
( 8 - 5 - 26 )
J
where the inner sum is a Minkowski sum of polytopes and Cf are the combinations of n things taken i at a time. It is instructive to see how the mixed volume relates to Bezout's theorem. Suppose fi(x,y) and f2ix,y) are general polynomials of degree d\ and d2, respectively. This implies that their support polytopes Q\ and Q2 are isosceles right triangles of size d\ and d2, shown in Figure 8.2, and the Minkowski sum Qi + Q2 is another such triangle of size d\ + d2- Accordingly, by Equation 8.5.24, the root count is
M(QUQ2) = i(di + d2)2 - id? - id! =
did2)
141
Polynomial Structures
which is, of course, the same result as given by Bezout's Theorem. The subtraction of areas is shown graphically in the drawing of Qi + Q2- Alternatively, we can visualize the definition of the mixed volume directly by drawing a picture of AxQi + X2Q2, as shown at the right side of Figure 8.2. Only the area of the shaded parallelogram scales as A1A2, whereas the triangles scale as \\ and X2-
Fig. 8.2
Mixed volume for two polynomials of degree d\ and di.
In a similar fashion, one may easily see that the mixed volume for two equations having bidegrees (dix,d\y) and (d2X,d,2V) is dixd2y + diyd2x, in agreement with the two-homogeneous Bezout count. Figure 8.3 shows this in a self-explanatory way.
Fig. 8.3 Mixed volume for polynomials with bidegrees {d\x,diy)
and (d2x, <^2j<).
Although the preceding examples only examine the two-variable case, the mixed volume does in fact generalize the multihomogeneous Bezout count in any dimension. This relationship is pursued further in one of the exercises at the end of the chapter. Any linear product structure is exactly captured by the polytope root count, as a linear product is just another way of saying that the monomials appear in a certain pattern. There are, however, more general patterns that are captured by the polytope formulation but not by any linear product formulation. Systems having such patterns are said to be "sparse," because some of the monomials which could appear in a total degree formulation are missing. Many of the problems that arise in applications have such sparseness.
142
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Consider, for example, the system /j (x, y) = l + ax + bx2y2 = 0 f2(x, y) = 1 + ex + dy + exy2 = 0
(8.5.27) (8.5.28)
This system has a total degree root count of 4 • 3 = 12 and a two-homogeneous root count of 2 • 2 + 2 • 1 = 6, but as illustrated in Figure 8.4, the mixed volume is only four.
Fig. 8.4
Mixed volume for polynomials in Equations (8.5.27, 8.5.28).
These diagrams hint at the main idea that underlies efficient algorithms for computing the mixed volume. In each of the drawings of Qi + Q2, notice that the gray cells, whose areas sum to the mixed volume, are parallelograms having one edge in common with Q\ and one edge in common with Qi- These are known as the "mixed cells" in a "mixed subdivision" of Q\ +Q%- Mixed subdivisions are not unique, as we show in Figure 8.5. It is only required to find one.
Fig. 8.5 Alternative subdivisions for each example.
One approach to finding subdivisions for the mixed volume calculation is based on "liftings." A lifting algorithm augments each polytope by adding an (n + l)th coordinate axis and assigning a value using a lifting function. That is, point a e Qi, corresponding to monomial xa in fi(x), is lifted to (a,Wj(a)), where the lifting function, u>i : Z n —> K, for the ith polytope assigns a lift value to each exponent vector. If these assignments are chosen at random, the following procedure gives a valid subdivision with probability one. Let Q\ be the (n + 1)-dimensional polytope derived from Qi using u>i. Then, one forms the Minkowski sum Q[ + • • • + Q'n and finds the lower convex hull. The projection of the edges of this lower hull onto the original n coordinates gives a mixed subdivision, from which the mixed cells
143
Polynomial Structures
can be readily identified and their volumes computed. In fact, for efficiency, one avoids forming the convex hull of the Minkowski sum and instead searches for the mixed cells directly. See (Gao & Li, 2000, 2003; Li, 2003; Li & Li, 2001; Huber & Sturmfels, 1995; Verschelde, Gatermann, & Cools, 1996). In (Huber & Sturmfels, 1995), it is also shown how to take advantage of several of the equations having the same support.
8.5.4
Polyhedral
Homotopies
The mixed volume root count by itself does not enable us to solve the system by continuation. We need a start system that we can solve ab initio. This can be done using information gleaned from the mixed volume calculation to identify monomial combinations that contribute to the mixed volume. This was accomplished in (Verschelde, Verlinden, & Cools, 1994), using a recursive formula for the mixed volume, following in that way Bernstein's proof. The same objective was attained by using mixed subdivisions in (Huber k Sturmfels, 1995). In fact, the homotopy denned in (Huber & Sturmfels, 1995) can be used to establish an independent proof of Bernstein's theorem. A good review of subsequent developments is (Li, 2003). To form a homotopy, one usually chooses the lifting values not from the reals, but from the small nonnegative integers. Such a choice is not necessarily sufficiently generic, but this can be discovered by testing and correcting. In fact, we require the subdivision induced by the lifting to be "fine mixed," a technical condition which is best left for study in the references. In the end, one has a lifting function u>i for each equation. We select a generic member G(x) = {gi(x),... ,gn(x)} of the family of polynomials by picking random complex coefficients Cj>a for monomials at the vertices of the convex hull Q, to get gi{x) = ] T citClxa,
i = l,...,n,
aeQi
and form homotopy functions H(x, t) = {h\(x, t),..., hi(x,t)=
Y,
Ci,axat^a\
hn(x, t)} as i = l,...,n.
(8.5.29)
Att=l, we have H(x, 1) = G(x). We solve G(x) = 0 by first solving H(x, 0) = 0 and then tracking solution paths from t = 0 to t = 1. Subsequently, we solve the original, possibly nongeneric, target system F(x) = 0, using the homotopy
H(x, t) = tG(x) + (1 - t)F(x) tracking paths for t = 1 to t = 0 starting at the solutions for G(x) = 0. At first glance, H(x, 0) as defined in Equation 8.5.29 does not look so easy to solve. However, if we consider the limit as t approaches zero, the solutions x(t) are algebraic, having a number of branches each with its own Puiseux series (fractional power series). Each branch corresponds to a mixed cell in the subdivision, and it
144
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
has a number of solutions equal to the volume of that cell. These solutions can be found by elementary means. Altogether, the paths emanating from the mixed cells give the full set of solutions to G(x) = 0, whose number totals to the mixed volume of the system. In principle the homotopy could go directly to the target system F(x) — 0, using the coefficients and monomials of F in Equation 8.5.29 instead of those of G. In practice it is advisable to use the two-stage procedure, solving G and then progressing to F. This is because target systems are often not generic in the family defined by their support (that is, the coefficients may satisfy a degeneracy condition) and this may cause the standard algorithm for solving H(x, 0) to fail. 8.5.5
Example
Rather than delve any deeper into the technicalities, let us simply show the workings on the example of Equations (8.5.27) and (8.5.28). A choice of lifting functions as u>i = 0 and u>2{a) = (1,1) • a yields the subdivision shown in Figure 8.4. To see this, note that the Newton polytopes of the supports of the polynomials are Ql = [(0,0), (0,1), (2,2)],
Q2 = [(0,0), (1,0), (0,1), (1,2)],
which are convex already. These lift to Qi = [(0,0,0), (0,1,0), (2,2,0)],
Q'2 = [(0,0,0), (1,0,1), (0,1,1), (1,2,3)],
The lower hull of Q[ + Q'2 has the faces shown in the figure with vertices [(0,0,0), (0,1,0), (2,2,0), (2,0,1), (0,1,1), (3,2,1), (2,3,1), (3,4,3)]. Using these liftings, the homotopy of Equation 8.5.29 applied to this example becomes
H(x,y,t) = Clf^l) v
y
=(
1 + ax
+ bx2y2 2 A 3
v
(8.5.30)
} ' ' ' \h2(x,y,t)J \1 + cxt + dyt + exyH ) The solution paths of H(x, y, t) = 0 are intimately related to the two mixed cells, labeled A and B in the figure. It can be shown3 (Lemma 3.1 Huber & Sturmfels, 1995) that H(x,y,t) only has branches of the form
(x(t),y(t)) = (zoi71,Vof2) + higher-order terms when (-ji,72,1) is an inner normal of the mixed cell of the lower convex hull of Qi + Q'2- As i ^ 0, the lowest order terms dominate and we solve them to obtain the leading coefficients Zo;2/o of the fractional power series. Let us start by examining cell A, which is generated by monomials \,x2y2 from /i and l,y in / 2 . The inner normal for that cell is (71,72,1) = (1, — 1,1). One may 3
The result stated here generalizes to any number of variables.
145
Polynomial Structures
check that the inner product of (71,72,1) with the lifted vertices takes a minimal value of 0 on the cell. In the case at hand, this means that {x(t),y(t)) = ( x o t 1 , ^ " 1 ) + higher-order terms.
(8.5.31)
Substituting into H(x,t) = 0 gives hi (x, y, t) = 1 + axot + bxfyg + higher-order terms, h,2(x,y, t) = 1 + cxot2 + dyo + exoy^t2 + higher-order terms. Keeping just the lowest-order terms in t, we have equations for the initial coefficients xo,yo as 0 = l + bx%yZ, 0 = l + dy0These give two solutions (zo,2A>) =
(±id/Vb,-l/d).
For each of these, we may use Equation 8.5.31 to predict the values of x(t),y(t) for small t and then commence path tracking on the homotopy Equation 8.5.30 to t = 1. In similar fashion, the mixed cell B in Figure 8.4 is generated by monomials x,x2y2 from fi and l,x from J2- This time the inner normal is (71,72,1) = (-1,1/2,1), so we get (x(t),y(t)) = (xot-^yot1/2) + higher-order terms,
(8.5.32)
which gives h\(x,y,t)
= 1 + axot^1 + bx1ylt~l + higher-order terms,
h,2(x, y, t) = 1 + cx0 + dy^t3/2 + exoy2t3 + higher-order terms. This time, the lowest-order terms in t give 0 = axot~l +
bxlylt'1,
0 = 1 + cx0, giving two solutions / ChC
{xo,yo) = (-i/c,±J—). As before, these allow us to predict (x(t),y(t)) for small t, now using Equation 8.5.32, and then track the homotopy Equation 8.5.30 to t = 1. Together, these give four paths to the four solutions of Equations (8.5.27) and (8.5.28). Any other choice of (71, 72) fails to give any nonzero solutions of (XQ, yo) in the initial fractional power series, as there is only one leading term in one or both of
146
Numerical Solution of Systems of Polynomials Arising in Engineering and Science Table 8.1
Various root counts for the toy example, Equation 8.6.33
Structure
Embedding
Total Degree
<{1, zu z2, z3, 2 4 })
Two-Homogeneous Linear Product
{1, zi, Z2}
(2)
Count
U (4)
C (2)
({21,22} ® {1, zi, 22}®
4
C
256
4
96 4
54
(C*) 4
26
(C*)
{23,24}® {1,23,24}) Monomial Product
({2124,2223,21,22}®
or Polytopes Polynomial
{2124, 2223, 23, 24}) {{2124 — 2223,21,22}®
Product
(C*) 4
6
{2124 — 2223,23,24})
the homotopy equations. Both XQ and yo must be nonzero, because by assumption, they are the leading coefficients of the series. This kind of argument is the key to the general result for any number of equations.
8.6
A Summarizing Example
Let us review by studying a "toy" example for which each product structure gives a different root count. Consider a system of four equations, each of the form fi = {qn{z\Zi ~ z2z3) + ql2z\ + qi3Z2){qii(zizA - z2z3) + qibz3 + qi6z4) +qi7ZiZ3 + ql8ziz4 + qi9z2z3 + qnoz2z4.
(8.6.33)
We have four variables 2 = {z\, z2, z3, Z4} and forty parameters g^, i = 1,...,4, j = 1 , . . . , 10. Table 8.1 gives the root counts for various embeddings of the system. Here is a quick summary of how each of these is calculated: • The total degree is 4 4 = 256. • With the variables split into two groups {zi,z2} and {23,24}, the twohomogeneous Bezout count is the coefficient of a2f32 in (2a -I- 2/?)4, which is 96. More explicitly, each polynomial in the start system has the form x (1,2:3, Zi)^2\ There are (2) = 6 ways to choose the factor (1,2:1,2:2) (1,2:1, Z2) from two start polynomials and (1,2:3,2:4)^ from the remaining two, and then there are 2 4 solutions for each such choice, yielding 6 • 2 4 solutions in all. • Notice that the equations have no constant or linear terms, so the start systems can be chosen of the form (2:1,2:2) x {\,zl,z2)
x (2:3,2:4) x (1,23,24)
Polynomial Structures
147
The combinatorics for the linear-product embedding follows those for the twohomogeneous case, but the simultaneous choice of two factors {zi,z2) gives a solution with z\ = z2 — 0, and two choices of the form {z3, z±) yield a similar result. Thus, we get a smaller root count when working on (C*)4 of 3-3- (4) = 54. • The monomial product root count and the polytope root count are the same for this system. Evaluation of the mixed volume by computer yields the count of 26. • The polytope root count does not account for the fact that Z\Z4 and z2z3 do not appear independently in the factors. The polynomial product structure captures this fact, and as a result the root count decreases to 6. To determine this count, one must consider the 24 ways to choose one factor from each equation in the corresponding start system: gi G {zxz4 - z2z3, zi,z2) x {ziZi - z2z3, z3, z 4 ). It turns out that only choices with two of each kind of factor give roots in (C*)4. Each of these (2) = 6 combinations gives a single root. Although for this example polynomial products give a lower count than the polytope root count, that is not necessarily true in general. It depends on whether the equations admit a favorable polynomial product. Often the polytope root count is lowest. Other than that, the ordering in the table is fixed, as the structures lower in the table are generalizations of those higher in the table, as indicated at the outset of the chapter in Figure 8.1. 8.7
Exercises
The next chapter on case studies contains more challenging exercises connected to applications. For now, the exercises are simpler, illustrative problems. Exercise 8.1 (Warm Up) Use HOMLAB to solve the system 8.4.4 using • a total-degree homotopy (see routine totdtab), and • a two-homogeneous homotopy (see routine mhomtab). Exercise 8.2 (Linear Products) Consider the system 8.4.18. What is its total degree, its two-homogeneous Bezout count, and its best linear-product root count. Solve it all three ways using the HOMLAB routines totdtab, mhomtab, and lpdtab. Exercise 8.3 (Generalized Eigenvalues) Create a straight-line program for the generalized eigenvalue problem (XtA + X2B)v = 0, where A and B are n x n matrixes, [Ai,A2] G P 1 , and v £ Fra. Solve a randomly generated example using a two-homogeneous homotopy with n paths. (Use routine lpdsolve.) Compare your result to the gz algorithm in Matlab.
148
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Exercise 8.4 (Multihomogeneous and Polytopes) For a general system having a given multihomogeneous degree structure (D,K), show that the polytope root count and the multihomogeneous Bezout count Bez(D, K) are the same. Use Equation 8.4.15. Exercise 8.5 (Toy Problem) Use HOMLAB to confirm all the root counts reported in Table 8.1. How can you confirm the polytope root count even though HOMLAB does not implement a mixed volume calculation? Exercise 8.6 (Circle Tangents) A circle of radius r and center (a, b) has the equation f(x,y) := (x — a)2 + {y — b)2 — r2 = 0. The condition for a line through (x, y) and point (c, d) to be tangent to the circle is g(x, y) := (x — a)(x — c) + (y — b)(y-d)= 0. (1) Assume r, a, b, c, d are given. Find the points of the circle where it is touched by the tangents through (c, d). Do so by solving the system {/ = 0, g = 0}, then try again by solving {/ = 0, f — g = 0}. Is there a difference in the number of paths for a total degree homotopy? How about for a two-homogeneous homotopy? (2) Assume two circles are given. Find the point pairs where a line simultaneously touches both circles in a tangency. Use the same trick as in item 1 to reduce the number of homotopy paths. (3) Show that with the change of variables (z, z) :— (x + iy,x — iy) and judicious linear combinations of the equations, the simultaneous tangents to two circles can be found with a system having total degree 8, linear-product root count 6, and polytope root count 4. (4) What happens if the two circles are tangent to each other?
Chapter 9
Case Studies
As a means of reviewing the computation of isolated solutions by continuation, we present a collection of application problems in this chapter. Reflecting our own experiences, these are weighted heavily towards problems in kinematics, with chemistry and game theory also represented. Readers who have no interest in these application areas are encouraged nonetheless to study this chapter to solidify concepts. We order these roughly by the complexity of the analysis of the polynomial structure. The first case concerning Nash equilibria is naturally formulated as a multihomogeneous system, while succeeding cases offer a range of options to consider. The final case study on the design of four-bar linkages is actually a collection of problems ranging from the very easy, four-bar motion analysis, to rather hard, nine-point path synthesis. In these examples, one may notice that there is an art to choosing a clean formulation and simple manipulations of the equations can sometimes lead to homotopy formulations having fewer paths. Although such manipulations are sometimes not really necessary, as a few extra solution paths are not of practical consequence, our objective is to give some sense of the full range of possibilities.
9.1
Nash Equilibria
An important problem in game theory, with application to economics, is the determination of Nash equilibria. A description of the problem and results of using several different solution methods, including Grobner methods and continuation, can be found in (Datta, 2003), and related information is in (Sturmfels, 2002). The problem concerns N players, and the ith player has Si + 1 possible choices of play, called "pure strategies." For every combination of strategies, there is a payoff for each player. There are ni=i( s i + -0 possible combinations and TV players, so the game is defined by TV rTi=i(s* + -0 numbers, tabulated in utility matrices as follows. Let's say there are 3 players, Alice, Bob, and Chuck, abbreviated as A, B, or C, and they make, respectively, the plays a,b,c. Denote by U^bc,U^bc,U^bc the respective payoffs to the players. More generally, the utilities are U^ J w , where i 149
150
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
ranges 1 to N, and each jk runs 0 to s^. The game is played multiple times. Suppose Alice observes that in the last round a change in her strategy would have earned her a higher payoff. Then, she is likely to change her play in the next round. Bob and Chuck will act similarly. An equilibrium occurs if every player finds that there is no unilateral change of strategy that would have increased his or her payoff. Suppose the players can split their bets between the possible strategies. This is called a "mixed strategy," which models either the situation of putting a fraction of ones money on each strategy or of putting all ones money on a single strategy chosen probabilistically according to the mixed strategy. Let Xi = (xio,... ,XiSi) be the ith player's mixed strategy. Then, the total payoff Pi{x\,..., XN) to player i is obtained by summing his/her utility over all the mixed strategies as Pi(xU...,XN)=
J2'"Y1 ji=0
U
xi31,-,xNJNxljix2h---XNjN-
(9.1.1)
j«=0
Notice that this is multilinear in the players' mixed strategies. Equilibrium occurs for player A if, while holding B and C's mixed strategies fixed, every pure strategy for A returns the same payoff. Otherwise, A would be motivated to bet more heavily on the higher paying pure strategy. Let e/- be a pure bet on the fcth strategy: eo = ( 1 , 0 , . . . , 0), ei = (0,1,0,..., 0), etc. Then, a Nash equilibrium occurs when for i = 1 , . . . , N and k — 1 , . . . , Si Pi(xi,... ,Xi-i,ek,xi+i,...
,xN) = Pi(xi,... ,Xi-i,eo,xi+i,...
,xN).
(9.1.2)
This comprises a total of X)i=i s* homogeneous equations on P S l x ••• x FSN, a multilinear system of polynomial equations. Since the entries in the mixed strategy x\ are the percentages that player i bets on each pure strategy, these should all be in the real interval [0,1] , and they should sum to one. Each player's strategy Xi £ FSi has a unique scaling factor that makes the sum of its homogeneous coordinates equal to unity. These can then be filtered against the [0,1] condition to find the meaningful solutions. Thoseforwhich all bets are in the interior of the interval (0,1) are called "totally mixed Nash equilibria." A given game can also have partially mixed equilibria, where some players adopt pure strategies, due to unequal payoffs, while others adopt mixed strategies. We consider only the totally mixed Nash equilibria. The system given in Equation 9.1.2 has two essential structural characteristics: the equations are all multilinear, and the group of variables Xi does not appear in the ith block of equations (those that involve Pi). This structure is perfectly captured by a multihomogeneous formulation. In (Datta, 2003; Sturmfels, 2002), the solutions are counted using the polyhedral mixed volume and computed via the associated polyhedral homotopy. This is of course valid, since the polyhedral formulation sharply bounds any multihomogeneous formulation, but it is a bit of overkill when the multihomogeneous formulation is already sharp. If the payoffs
151
Case Studies
were such that more monomials vanish from the equations, such as may happen when payoffs for two pure strategies are equal, then the polyhedral method could provide a lower root count. For small systems, a multihomogeneous root count can be done by hand while a general multihomogeneous routine for larger systems remains a simple and efficient alternative to polyhedral approaches. Let us take, for example, the case of N = 3 players, with players 1 and 2 having Si + 1 = 3 pure strategies each, and player 3 having just s 3 + 1 = 2 pure strategies, so (si, s2, S3) = (2,2,1). By Equation 8.4.15, the multihomogeneous root count is B = coeS(a2b2c\ (b + c)2(a + c)2(a + b)1) = coeft(a2b2c, b2{a + c)2(a + b)) + coeff(a2b2,2b(a + c)2(a + b)) = coeff (a2c, (a + c)2a) + coeff(a2b, 2a2(a + 6)) = 2 + 2 = 4 The explanation of the first line is that the exponents in a2fc2c1 match the dimensions of the space, P Sl x PS2 x PS3, on which we work, while those in the polynomial (b + c)2(a 4- e)2(a + b)1 match the number of equations of each type, which are also s\, S2, S3 by Equation 9.1.2. The factor (b + c)2 says that the two equilibrium equations for player 1 do not involve player l's bets while those of players 2 and 3 appear linearly, and similar factors come from the other two players' equilibrium conditions. It is clear, we hope, from this example how to generalize to other N and
Sj.
Another way to arrive at the same result is to examine the linear product start system. For the (si, S2, S3) = (2,2,1) game, the 3-homogeneous start system is / 2(
) 3( r player 1 equilibrium
(xi) x <x3) 1 , . .... . ; ; ; > player 2 equilibrium
(9.1.3) v
;
Wxwr
(xi) x (x2) } player 3 equilibrium. Among the 25 ways to choose one factor from each equation, we are limited to 2 choices each in x\ and x-i and only one choice for X3. Making the choice for x% first, which can only be done 4 ways, one may see that all the other choices are forced. The valid choices are
'fa)} (x2)
< te> \ (Xi)
[ten
[ten
\ (zi> >
\ te> >
te)
(Xi)
(x2)
(Xi)
[ter (x2)
< (xi) > (x3)
(9.1.4)
. (x2) ) I (x2) ) I (xi) ) \ (xi> . The disparity between the multihomogeneous root count and the total degree, here 4 and 32, respectively, grows rapidly with the size of the problem, for example, for N = 4 players having 4 pure strategies each, the 4-homogeneous Bezout count is 13,833, while the total degree is 3 12 = 531,441.
152
9.2
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Chemical Equilibrium
Imagine a reaction vessel, or an automobile engine, in which a mixture of chemical compounds are reacting. The compounds may break up and recombine into a myriad of intermediate species, settling down eventually to a final equilibrium mixture. While the transient behavior of the reaction is governed by differential equations, the final equilibrium conditions are well-modeled by a system of polynomial equations. The system typically has at least one real root with positive values for the concentrations of all the chemical species. It is possible for there to be more than one positive root, in which case the transient behavior determines which of several possible equilibria is reached. A basic presentation of modeling chemical reactions can be found in (Morgan, 1987), from which the following discussion is derived; more sophisticated treatments are given in (Feinberg, 1980). The variables in the system represent the molar concentrations of the species. The concentrations at a state of equilibrium are governed by two types of equations: conservation equations state that the total number of atoms of each element must stay constant (we assume a closed system), and reaction equations model the propensity of certain combinations of species to transform into each other. In such a model, a chemical reaction equation of the familiar form, such as H2O ^2H + O, gives rise to an equilibrium reaction equation governing the balance between the constituents on the two sides, in this case kXH2o = XHXO, where k is an equilibrium constant that depends on temperature. (Equilibrium constants for many reactions are available in standard tables, typically derived from laboratory experiments.) To go with this reaction equation, the conservation equations would be 2XH2O + XH = TH, XH2O + XO =
To,
where TH and To stand for the total amount of hydrogen and oxygen in the vessel. Notice that the coefficient of 2 on XH2O in the conservation equation for hydrogen comes from the fact that each water molecule has two hydrogen atoms. The conservation equations are always linear, and the reaction equations are polynomial. The three equations just given determine the equilibrium balance between water, hydrogen and oxygen in a simple model that ignores molecular hydrogen and oxygen, if 2 and OiMorgan presents a model (Model B in (Chapter 9 Morgan, 1987)) involving eleven species formed from oxygen, hydrogen, carbon and nitrogen. The reaction equations, given in standard chemical notation at left and in polynomial form at
153
Case Studies
right, are: 02 ^ 20
kiXO2 = Xo
(9.2.5)
H2 ^ 2H
k2XH2 = X2H
(9.2.6)
7V2 ^ 2N C02^0
k3XN2 + CO
=X
2
(9.2.7)
N
(9.2.8)
k4XCO2 = XOXCO
OH^O +H H2O^±O + 2H
k5XOH = XOXH k6XH2o=XoX2H
(9.2.9) (9.2.10)
NO^O
k7XNO=XoXN.
(9.2.11)
+N
There are four conservation equations: TH = XH + 2XH2 + XOH + 2XH2o
(9.2.12)
Tc=XCo + Xco2
(9.2.13)
TO = XO + Xco + 2X O2 + 2XCo2 + XOH + XH2O + XNO
(9.2.14)
TN=XN
(9.2.15)
+ 2XN, + XNO 6
These are eleven equations in eleven variables, with total degree 2 • 3 = 192. We could readily solve the system as given, but it is easy to reduce. The obvious move is to substitute from the reaction equations into the conservation equations to eliminate all variables except Xfj, Xo, XQOI a n d ^JV- This gives four equations of total degree 3 • 2 • 3 • 2 = 36. Note , however, that there is only one cubic monomial in the equations, which comes when we eliminate XH2O using Equation 9.2.10. So it is a simple maneuver to replace Equation 9.2.14 with 2To — TH = 2(Xo + Xco + 2Xo2 + 2Xco2 + XOH + X^o) —
+ 2XH2 + XOH)(9.2.16) After substituting from the conservation equations, the system of Equations (9.2.12, 9.2.13, 9.2.16, 9.2.15) has total degree 3 • 2 3 = 24. Now, let's see if any of the product structures can further reduce the number of homotopy paths. First, for convenience, we list the monomial structure of the equations: {XH
(l,Xo,XH,XoXH,Xfj,XoXff) (l,Xco,XoXCo) (l, Xo, XH, XCO, XQ,XH,
XOXH,XOXCO,
/g 2 1 7 \ XOXN)
\ 1, XN , XN, XOXN ) .
A four-homogeneous formulation gives a root count of 18, which is the lowest possible multihomogeneous count. We can improve on that slightly with a linear product
154
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
homotopy having the start system (l,Xo)x(l,XH)i2) (l,Xo) x (l,XCo) {1,XO,XH)
x
(1,XN) x
/Q2
lgx
(1,XO,XH,XCO,XN)
(1,XO,XN),
which gives a root count of just 16. As always, a sparse monomial homotopy would do just as well as the best linear product homotopy. In chemical equilibrium problems a significant numerical issue arises: equilibrium constants often have wide ranges of magnitude. For a temperature of 1000°, Morgan gives reciprocal equilibrium constants Ri = 1/fcj that range from 10 22120 to 10 47 ' 970 . It is essential to rescale the variables and the equations to work in double precision arithmetic. We will not discuss this issue here. The interested reader may refer to Morgan's treatment in (Morgan, 1987) or (Meintjes & Morgan, 1987), or study the implementation in the function scalepol distributed as part of HOMLAB. This problem is treated in the exercises of this chapter. 9.3
Stewart-Gough Forward Kinematics
A detailed description of Stewart-Gough platform robots and the associated forward kinematics problem has already been given as a case study in parameter continuation, § 7.7. However, the discussion there assumed that we had the solutions for some general member of the problem family which could then be used as the start system for parameter continuation. Here, we return to the problem to examine our options for solving the first example. The family of Stewart-Gough platform problems is a sub-family in the family of all systems of seven quadrics on [e, g] G P 7 . Any member of this family has at most 27 = 128 isolated solution points, and it is easy to write down an example with that many roots, a simple one being G(e,g)
= {e2 - e2, e 2 - e2,, e2 - e2, e2 - g2, e2 - g2, e2 - g22, e2 ~ g2} = 0. (9.3.19)
We immediately see that this system has exactly 128 solutions, all of the form = ±e0. The theory presented in § 8.3,8.4.1 shows that with ex = ±eo,...,g3 probability one, the solution paths of the homotopy H((e,g),t) = -ytG(e,g) + (1 - t)F((e,g);p0) = 0,
(9.3.20)
for any p0 and random 7 e C, will lead from the 128 solutions of G = 0 to a set of endpoints that contains all isolated solutions of F = 0 as t goes from 1 to 0 along the real line. The Stewart-Gough forward kinematic equations can be reduced to a form in which a linear-product decomposition yields a lower root count than the total degree. The reduction is based on the observation that the quadratic terms in g in all
155
Case Studies
six leg equations, Equation 7.7.10, are the same, namely gg'. Hence, if we subtract the equation for leg 1 from all the others, this term is eliminated from five of the equations. That is, the system becomes (9.3.21)
fo(e,g)=ge' = 0, fi(e,g) = ( M i + aia[ - L\)ee' + (gb'xe' + ehg') - (ge'a[ + aveg') - (e&ie'ai + axeb\^) + gg' = 0,
(9.3.22)
fi(e,g) = (bil/i + aia'i - L})ee' + (gbtf + ebig') - {ge'4 + a%eg') - {ehe'a'i + a ^ e ' ) = 0 ,
i = 2 , . . . , 6.
(9.3.23)
This system admits the linear product decomposition /o G {g ® e)
(9.3.24)
he({e,g}®{e,g})
(9.3.25)
/,efej}®«).
i = 2,...,6.
(9.3.26)
Consequently, we may use a start system of the form 9o € (g) x (e) Si€<e,s>
(9.3.27)
(2)
gi<E(e,g)x(e),
(9.3.28) i = 2 , . . . , 6.
(9.3.29)
The linear-product root count may be tallied up by noting that in picking one factor from each equation, we must never choose more than three of the form (e), because choosing four or more forces e = 0, and we wish to ignore any solutions on that degenerate set. Accordingly, if we pick the factor (g) in go, we may pick either of two factors in gx and among the remaining five equations, we may choose (e) from zero to three times. If instead we choose (e) in go, we must limit the last five equations to choose (e) at most twice. These observations give a root count of
4(HHK)H[0KMDH It is shown in (Wampler, 1996a) that the count of 40 for general Stewart-Gough platforms is due to the antisymmetry of the mixed quadratic terms. That is, if we write Equation 7.7.10 for leg i in the form, eTAte + 2eTBi9 + gTg = 0, where e and g are interpreted as 4 x 1 column matrices, then the 4 x 4 matrix Bi is antisymmetric, Bf = — Bj. [This can be seen in the quaternion formulation by noting that (gb'e' + ebg') = —{eb'g' + gbe'), and similarly for the other mixed terms.] Accordingly, any further reduction of the problem must take advantage of this property. A monomial product or sparse monomial homotopy does not account
156
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
for any relationships between the coefficients of the monomials, so these give 84 roots when applied to Equation 9.3.21. 9.4
Six-Revolute Serial-Link Robots
The solution of the inverse kinematic problem of general six-revolute, serial-link robots, once called the "Mount Everest of Kinematics" by renowned kinematician F. Freudenstein,1 is a milestone in the development of polynomial continuation. The problem is, given a stationary ground link and six subsequent moving links connected in series by rotational joints, find all sets of joint angles to place the final link in a given position, p, and orientation, {X7,y7,z7}, as schematically shown in Figure 9.1. The links are assumed to be rigid bodies, a good approximation for most industrial robots. The space of rigid-body displacements, E 3 x 50(3), is six-dimensional, which matches the dimensionality of the joint space, so we expect in general a finite number of isolated solutions to the problem. The stature of the problem justifies a historical synopsis, which may help to place the development of the continuation method in context with other approaches.
Fig. 9.1
Schematic six-revolute serial-link robot
The high points in the history of the problem begin in 1968 with (Pieper, 1968), who gave a formulation of the general problem having total degree 64,000. This 1 Ferdinand Freudenstein, Higgins Professor Emeritus of Mechanical Engineering, Columbia University
Case Studies
157
upper bound was substantially sharpened in 1973 to only 32 (Roth, Rastegar, & Scheinman, 1974), but it was not until 1980 that (Duffy & Crane, 1980) derived a reduction of the problem to a single polynomial of degree 32. This essentially solved the problem in the sense that good numerical methods exist for factoring a polynomial in one variable and also in the sense that one could solve a generic example and find the true root count. The count is only 16, since 16 of the 32 roots were extraneous ones introduced by the reduction process. However, at the time this was not fully appreciated and the prevailing attitude at the time was that the problem could not be considered fully solved until a reduction to single univariate polynomial of degree 16 was found. Besides, a numerical demonstration does not carry the full weight of mathematical proof. It was into this scene that, in 1985, (Tsai & Morgan, 1985) introduced the method of polynomial continuation to the kinematics community. They cast the problem as eight quadratics (total degree 256) and found that only 16 endpoints of the ensuing homotopy were valid solutions. Perhaps the most important contribution of that work was not the confirmation of the count of 16, but rather the demonstration that systems of polynomial equations could be solved reliably by numerical means. Work continued after that on two fronts: elimination methods and continuation. (Primrose, 1986) gave the first real proof of the root count of 16, by showing that the other 16 roots of the Crane-Duffy polynomial correspond to solutions at infinity for the intermediate joints. Morgan and Sommese (Morgan & Sommese, 1987a) showed that the Tsai-Morgan system had a two-homogeneous Bezout number of only 96, the first application of multihomogeneous continuation. Finally, in 1988, Lee and Liang (Lee & Liang, 1988) produced the long sought-after reduction to a univariate polynomial of minimal degree, although it was a complicated procedure. A simpler one was later given by (Raghavan k. Roth, 1993), and a numerical treatment of this reduction as an eigenvalue problem was given by (Manocha & Canny, 1994). Complementing all of these works, (Manseur & Doty, 1989) found an example with all 16 solutions being real. The reduction of a problem to a univariate polynomial of minimal degree has two payoffs: it proves an upper bound on the root count and it leads to a numerical solution. But it is not the only route to either of these. A system of equations that admits a sharp root count via a multihomogeneous formulation or a monomial polytope analysis suffices for proof, and continuation can provide the numerical method. We should not fail to mention the extensive work in computer algebra to compute Grobner bases as a means of proof; see (Cox et al., 1997) and (Cox et al., 1998) as a beginning point to the extensive literature on this. Any reduction of a problem to a Grobner basis can be converted for numerical solution to an equivalent eigenvalue problem (Auzinger & Stetter, 1988; Moller & Stetter, 1995). But even as late as Raghavan and Roth's paper, algorithms for computing Grobner bases were not capable of handling a problem as difficult as the six-revolute inverse
158
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
position problem. If we are willing to give up rigorous proof of the true root count, it is often convenient to find a "good enough" formulation of a multivariate system with a root count low enough that continuation can be reasonably applied. In this sense, with the tremendous increase in computer power of late, even Pieper's original formulation of total degree 64,000 might be considered within range. But we will proceed below to give a much more amenable formulation than that. The approach we give here, first published in (Wampler & Morgan, 1993), is of a different cloth than all the others we have mentioned. Those others begin with a formulation of the kinematics as a product of homogeneous transformation matrices (Denavit & Hartenberg, 1955), (Chapter 12 Hartenberg & Denavit, 1964). Reductions starting from that point lead to rather long algebraic expressions, as one can see from the cited references and (Chap. 10 Morgan, 1987). Instead, we write down a system mirroring closely the geometry of the problem, and proceed to solve it in its unmodified form. Let Zi € R3, i = 1 , . . . , 6, be unit vectors along the joint axes of a six-revolute serial-link chain; see Figure 9.1. The kinematic chain is completely described by finding the common normal between each pair of successive joint axes and listing three values: the "twist angle" ar between joint % and i + 1, the distance a, between these joint axes (a.k.a, the "link length"), and the distance d» (a.k.a., the "joint offset") between successive common normals. If none of the successive joints are parallel2, the common normal directions are Xi = z% x z i + i / s i n a j ,
i = l,...,5,
where "x" means the vector cross-product in 3-space. Then, the six-revolute inverse position problem can be written as the system Zi-Zi = l, Zi-zi+1=
cos on
i = 2,3,4,5
(9.4.30)
i = 1,2,3,4,5
(9.4.31)
5
(ai/sinai)zi x z2 + ^ ( d i z i + ( a l / s i n a i ) 2 i x z i + 1 ) = p,
(9.4.32)
i=1
where ft is a known vector from where the first common normal intersects joint 1 to where the last common normal intersects joint 6. The vectors z0, XQ, and z\ are known, being fixed in the ground, as are ZQ and xy, being fixed in the last link whose position and orientation is given. From these, and the known lengths and offsets of the links, ft is readily computed from p, and we take it as given. So we have 12 equations (vector Equation 9.4.32 is equivalent to 3 scalar ones) in 12 variables, which are the 3 elements each of £2,2:3,2:4^5. Although these vectors naturally live in R 3 , we will treat them as if they live in C 3 , by the usual embedding. 2
See (Wampler & Morgan, 1993) for how to handle parallel links.
Case Studies
159
Among the equations, two are linear and the rest are quadratic, for a total degree of 210 = 1024. Using the two-homogeneous groupings (I,z 2 ,z 4 ) and (I,z3,z5), we get a lower root count of 24(g) = 320. Although this is quite a bit inflated over the true root count of 16, it is low enough that we have no trouble tracking all paths by continuation. Then, we can solve subsequent examples with only a sixteen-path parameter homotopy. 9.5
Planar Seven-Bar Structures
One of the most prevalent classes of mechanical systems consists of planar links joined by rotational joints, also known as "pin joints," or simply "hinges." The axes of all the joints in the mechanism are perpendicular to the plane of motion. In reality, the links occupy three-dimensional volumes, and they can move in separate parallel planes, but for the purpose of analyzing their motion, only their projection onto one of these planes needs to be considered. Consider the seven-bar assembly shown in Figure 9.2, consisting of four triangles and three simple bars. (We call this the "type a" seven-bar, as there are two other topological arrangements of interest; see Exercise 9.6) For general dimensions of the links, such an assembly is a structure, meaning that it will be rigid. However, it is quite possible that if we disconnect a joint and reposition the pieces, we can reconnect that same joint with the links in new relative positions. The question of finding all such assembly configurations comes up in the study of related six-bar and eight-bar linkages which have internal motion.
Fig. 9.2 Seven-bar linkage, type a.
160
9.5.1
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Isotropic
Coordinates
Before presenting equations for the problem, we take a brief aside to explain "isotropic coordinates." Suppose we have a point in the real plane (a, b) £ R2. We can naturally associate to this point the complex number z = a + ib G C This is quite convenient because vector addition in R2 becomes just the usual addition of complex numbers, and the approach is known in the kinematics community as the "complex vector formulation." Moreover, a rotation around the origin through an angle 0 moves point z e C to a new point el@z. For brevity, we use the convention 6 := el&. In this manner, any rotation in the plane corresponds to a 9 6 C of unit magnitude, \9\ = 1. Now, suppose we extend (a, b) into C 2 by letting a and b take on complex values. Then, to preserve the convenient modeling of rotations by complex multiplication, we associate to (a,b) e C 2 the point (z,z) := (a + ib,a - ib) G C 2 . For reasons beyond the current discussion, the pair (z, z) are known as "isotropic coordinates." Note that z and z are complex conjugates if, and only if, a and b are real. Rotation through an angle 0 now gives the point (8z, 9~lz). Any vector loop equation written in terms of z and 9 has a corresponding equation in which z is replaced by z and 9 is replaced by 9~1. Alternatively, we may let 9 := 9~l, so that rotation is represented by the isotropic pair (9,9), with the extra equation 99 = 1.
9.5.2
Seven-Bar Equations
Without loss of generality, let us take the position of link 0 to be fixed; that is, assume 9o = 1. Then the squared lengths of the three simple bars can be written as i\ =(a0 + M i + b292){a0 + M i + b292),
(9.5.33)
l\ =(c 0 + a292 + b393)(c0 + a292 + b39s),
(9.5.34)
£l ={b0 + a393 + Mi)(&o + M s + M i ) ,
(9.5.35)
9^
= 1,
8292 = 1,
9393 = 1.
(9.5.36)
This is a system of six quadratics, for a total degree of 26 = 64. The system is bilinear when treated with the two-homogeneous partition {1,01,02,03} X {1,01,02,03}. In the corresponding linear-product start system, only choices of factors having three of each type of factor give finite roots, so the two-homogeneous root count is (3) = 20A sharp root count is obtained by matching the sparsity of the equations using
Case Studies
161
a linear product decomposition as follows: {1,01,02} x { l , M 2 } {1,02,03} X {1,02,03} {1,03,01} X {1,03,0!} {1,01} X {1,0!} {1,02} X {1,02} {1,03} X {1,03}
f q , w (9 5 37)
" -
Of the same 20 combinations of factors that gave start points in the twohomogeneous formulation, six now do not give solutions. For example, we cannot simultaneously choose the initial factor from the first, fourth and fifth equations, as we would then have three equations in only two variables: 9%, 02- From this, we see that the linear product homotopy based on Equation 9.5.37 has a root count of 14. Readers with a particular interest in planar linkages may wish to look at (Wampler, 2001) to see an alternative solution approach which, when applied to the seven-bar problems, converts them to eigenvalue problems of size 14, 16, or 18. 9.6
Four-Bar Linkage Design
We have already studied several systems concerning the kinematics of mechanisms and robots, namely, Stewart-Gough platforms, six-revolute serial-link robots, and planar seven-bar linkages. In each of these, the objective was analysis: given the mechanical structure of the links, we sought all assembly configurations. In this section, we study a simpler linkage, the planar four-bar, but we ask synthesis questions, that is, we seek structural dimensions of the links so that the four-bar produces a specified motion. Depending on the requirements set forth for the motion, we may face a system of polynomial equations ranging from easy to hard. The easy examples have been solved long ago by a variety of methods, but the most difficult, the nine-point path synthesis problem, stood for almost 70 years before being solved by modern continuation methods. The use of an efficient product structure was critical to that success. In all these examples, we begin with basic loop-closure equations and manipulate them into a form amenable to efficient solution by continuation. In earlier times, kinematicians usually declared a problem "solved" when an elimination procedure had been found for reducing it to a single polynomial in one variable, especially if that polynomial was of minimal degree (having no extraneous factors). Such a polynomial could then be solved by a variety of numerical methods. With the advent of continuation, this is no longer necessary, for we can reliably find all solutions to a system of multivariate polynomials. Part of the art in applying continuation is to make informed decisions about how much symbolic pre-processing to do before turning the problem over to numerical solution. With this in mind, let us take a look at some four-bar design problems.
162
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Fig. 9.3 Four-bar linkage. Heavy lines are rigid links, whereas thin lines are vectors. Open circles mark hinge joints, and hash marks indicate a stationary link.
9.6.1
Four-Bar
Synthesis
Except for the simple lever, perhaps the most ubiquitous linkage mechanism is the planar four-bar, Figure 9.3. It consists of four rigid planar bodies connected in a loop, with one link, the "ground link," held in fixed position. A set of three links connected in such a loop would form a rigid triangle, a fundamental structural component in bridges and the like. In contrast, a hinged quadrilateral deforms, making it useless for structures but leading to a multitude of applications in machines that perform useful motions. In particular, points such as A and B on the two links adjacent to the ground link trace out circles centered on the fixed hinge points Ao and BQ, respectively, while points such as C on the "coupler link" opposite the ground link generally trace out sixth-degree curves. Linkages where one or more of the hinge joints are replaced by linear (slider) joints are also four-bars, but we will not discuss them here. The motion characteristics of four-bars can be used in several ways. Most applications fall into one of the following categories: Function Generation In this case, the purpose of the four-bar is to transfer an input rotation at one ground pivot to the other. If the four-bar is a parallelogram, this does nothing more than duplicate the input motion at the output side (transferring power in the process), but if the linkage is a general quadrilateral, some reshaping of the motion takes place. That is, a uniform rotation speed at the input gives a nonuniform speed at the output, which can be very useful. Quite often, a steady rotation at the input is converted to an oscillatory output. A windshield wiper operates on this principle, for instance. Path Generation In this case, there is a designated point on the coupler link, where we might place the tip of a tool for the machine to do its work, and so the path traced out by this tool is of top concern. The designated point is called the coupler point and its path is called the coupler curve. The motion of the foot of a simple walking machine might be generated in this way (assuming a
163
Case Studies
ball-shaped foot so that only its center position matters, not its orientation). Body Guidance In this case, the entire motion of the coupler is at stake, both position and orientation. Such a machine might scoop up material in one location, carry it without spillage to deposit the contents in a second location. A four-bar might guide the motion of the scoop. Four-bar synthesis means that we specify at the outset the desired motion, and seek to find a four-bar that will produce it. Synthesis is the inverse process of analysis, which seeks to describe the motion characteristics of a given mechanism. We will proceed to write out the basic equations of four-bar motion, which can be employed for analysis and for various kinds of synthesis, depending on which quantities are given and which are treated as unknowns. We will then describe several synthesis problems. Among these, the most challenging are path-synthesis problems, and as we shall discuss in some detail, the most challenging of all is the synthesis of a coupler curve to pass through nine given points. 9.6.2
Four-Bar Equations
The kind of synthesis problems we treat here are called precision-point methods, because we give a certain number of points through which the coupler curve must pass precisely or a number of locations through which the coupler must guide a body. So in the following equations, we use an index j to denote the configuration of the four-bar at the j t h precision point or precision position. Referring to Figure 9.3, vectors a and b describe the locations of the fixed pivots with respect to the origin O, vectors u and v are the links connected to ground at these pivots having rotations 4>j a n d 4>j-> respectively, pj is the vector from the coupler point C to the origin, x and y tell the location of the rotational joints in the coupler link, while Bj is the rotation of the coupler link. Quantities a, b, u, v, x, y do not change as the four-bar moves, while 4>,ip,6,p do change and hence have a subscript j in our formulae. Without loss of generality, we may assume 6>o = 4>o — V'o = 1) because the initial orientation of the links can be absorbed into the orientations of the vectors u,v,x,y. The four-bar can be viewed as consisting of a left "dyad," a,u,x and a right dyad, b,v,y, that are rigidly connected at the coupler point. In the following equations, we will use isotropic coordinates to represent vectors in the plane. Recall from § 9.5.1, where we give more details, that a vector from (0,0) to (ao,ai) in the plane is represented by isotropic coordinates as (a,a) := (a0 + iai,ao — ia,i). Summing vectors around the left and right dyads, we have loop equations for the j t h position, as pj+x6j+u(pj+a
= 0,
Pj+ydJ+vipj+b = O,
pj + x6~1+u
pj+yej
l
+vtpj +1 = 0.
(9.6.38) (9.6.39)
164
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
From these basic equations, we can define a wide variety of synthesis problems, varying in how many positions are prescribed and which of the symbols in the above equations are known quantities versus variables. 9.6.3
Four-Bar
Analysis
Before proceeding to the synthesis questions, let's look at the analysis question of determining the motion of a given four-bar. This will come down to nothing more than solving a quadratic equation, but we include it for background in case the reader wishes to animate any of the linkages synthesized in the subsequent sections. We assume that in Equations (9.6.38) and (9.6.39) we know the shapes of the links as given by x,y,u,v,a,b
and x, y, u, v, a, b. This leaves five unknowns, pj, pj, Oj, (pj, xjjj
and since we have just four equations, we expect a solution curve. One way to plot the curve is to rotate the left input link through a sequence of closely spaced angles, say $j = 0,1°, 2°,..., 360° and solve for the other four variables. First, eliminate Pj and pj by subtracting one equation from another to get {x-y)ej+u
= 0,
(x-y)ejx+u(j>~1 -vi/j~1+a-b
= 0. (9.6.40)
Next, we eliminate 6j to get [x - y)(x -y) = (u(j)j - vipj +a- b){u(j)~l - vipj1 +a-b).
(9.6.41)
Since
We should mention that the engineering analysis of a four-bar under consideration for a real machine would encompass much more than just plotting its motion curve. One would need to consider, for example, the forces transmitted through the links. This and other considerations are beyond the scope of the present discussion. 9.6.4
Function
Generation
For function generation, we prescribe pairs (
165
Case Studies
(9.6.39) to get, for j = 0 , . . . , n,
x9j
x9j + ucj)j = vipj + 1,
(9.6.42)
l
(9.6.43)
l
1
+ u(f>J = vt/jj + 1.
This leaves as variables u, u, v, v, x, x and 9j, j = 1 , . . . , n, since we assume 90 = 1. Equations (9.6.42) and (9.6.43) for j = 0 , . . . ,n are 2(n + 1) equations in 6 + n variables, so n < 4. This implies that we can specify up to five pairs of angles, ((j>j,ipj), j = 0 , . . . , 4 , and still expect to find four-bars that exactly interpolate them. The system of Equations (9.6.42) and (9.6.43), j = 0 , . . . ,4, after clearing the negative exponent on 9j, consists of 8 quadratics and two linear equations for a total degree of 64. We leave it as an exercise to show that the system has a multihomogeneous formulation with a root count of only six. We could solve this using continuation, preferably using a sparse linear solver in the path tracker since only a few variables appear in each equation. An alternative is to reduce the number of variables. Eliminating 9j between Equations (9.6.42) and (9.6.43) and then using the equation for j = 0 to eliminate xx from each of the others, one obtains, for j = 1 , . . . , 4, the single equation (-uct>j+vipj + l){-u(l)-l+v^Jl
+ l) = (-u
It is now easy to see that the total degree is 2 4 = 16, whereas a two-homogeneous structure {u, v, 1}
9.6.5
Body Guidance
This time, we are given positions (pj,pj) and orientations 9j of a body, j = 0 , . . . , n. We want to find four-bars which carry this body through these locations while it is rigidly attached to the coupler link. The Equations 9.6.38 for the left dyad are decoupled from those for the right dyad, Equations 9.6.39. In fact, they are exactly the same form, so if we find multiple solutions to Equations 9.6.38, we can choose one of them for the left dyad and one for the right dyad to form a four-bar that guides the body through the specified locations. For the left dyad, we have 2(n +1) equations in the 6 + n variables x,x,u,u,a,a and
166
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Burmester points, and points (—a, —a) are the Burmester centers, named after the man who first solved the problem (Burmester, 1888). Eliminating
(9.6.45)
Using case j — 0 to eliminate uu from the others, we have, for j = 1 , . . . , 4, (Pj + x9j + a)(pj + x9~x +a) = (p0 + x90 + a)(p0 + X8QX + a).
(9.6.46)
This is almost identical in form to Equation 9.6.44, except this time the constant term does not cancel out since pjpj ^ PoPo- Thus, the system has total degree 24 = 16, two-homogeneous degree (2) = 6, and only 4 finite roots. (The same two roots at infinity exist as in the function generator problem.) In fact, one classical approach to solving the function generator problem is to use the principle of kinematic inversion to convert it to the Burmester body guidance problem, but we will not go into that here.
9.6.6
Five-Point Path Synthesis
Many different path synthesis problems can be formulated, depending on what additional information is given besides the path points (pj,pj). One version is to give the ground pivots {a, a, b, b}. The simplification of the equations is exactly as for body guidance, except we must simultaneously consider both the left and right dyads, since 93 is unknown. Thus, the system to be solved is, for j = 1 , . . . , n, (Pj + x6j + a){pj + x9~l + a) = (p0 + x + a){p0 + x + a), (Pj + y9j + b) (pj + y6j * +b) = (Po + y + b)(p0 + y + b),
(9.6.47) (9.6.48)
where we have used 60 = 1. For five precision points, i.e., n — 4, we have eight equations in the unknowns x, x, y, y and 9j, j = 1 , . . . , 4. Expanding the products and cancelling terms, we have the system 8M0j(Pi + a) " (Po + a)] + x{6j\p3
+ a) - (p0 + a)}
+(Pj + a)(pj +a)~ (po + a)(po + 0)) = 0, 03{y[03(Pi +b)-
(Po + b)\ + y[9-\Pj
(9.6.49)
+ b) - (Po + b))
+(Pj + b){p3- + b)- (p 0 + b)(p0 + b)) = 0, where the 9j multiplying each equation clears negative exponents. cubic equations, for a total degree of 3 8 = 6561. This obviously ple monomial structure of the system. We can do much better Equation 9.6.49 has the monomial structure {x,x,l} x {l,9j,9^} Equation 9.6.50 has the monomial structure {y, y, 1}
(9.6.50) This gives eight misses the simby noting that and similarly, The reader may
Case Studies
167
The monomial structure is in truth sparser than the product structure just given appear in Equation 9.6.49, would imply. Only the monomials {X6J,X6J,X,X~6J,0J} and Equation 9.6.50 has a similar pattern. This allows solutions of the form x = y = 0j = 0, so it is clear that the root count is lower than 96. In fact, the polyhedral mixed volume yields a root count of 36, which is sharp. 9.6.7
Nine-Point Path Synthesis
In 1923, Alt (Alt, 1923) noted that the extreme path-synthesis problem for four-bars is to specify nine points on the coupler curve. Compared to the six-revolute seriallink problem, this one has a longer chronology, but a shorter historical account. The problem has so far proven to be invulnerable to reduction by hand, and it seems no one as yet has made a serious attempt at it using computer algebra. To date, the problem has only been solved by polynomial continuation. After Alt, the main advance came in 1962, when Roth (Roth, 1962) (Roth & Preudenstein, 1963) abandoned analytical methods and invented an early form of the continuation method, which he called the "bootstrap method." The work was done using real variables, so Roth invented heuristics to work around difficulties which we now recognize to be solution paths that meet and branch out into complex space. Most bootstrap paths never found a solution, but nevertheless, the approach did produce for the first time linkages to interpolate nine specified points. After the invention of the cheater's homotopy (see § 7.8), Tsai and Lu (Tsai & Lu, 1989) used a heuristic version of it to improve the yield of solutions, but a complete solution was not found until 1992, by Wampler, Morgan, and Sommese (Wampler et al., 1992). A follow-up discussion of this article (Wampler, Morgan, & Sommese, 1997) showed how the approach could be specialized to design symmetric four-bar coupler curves with a maximal specification of precision points (five points plus the line of symmetry). The system of equations is exactly the same as Equations (9.6.49) and (9.6.50), except now a, a, b, b are unknown and the index ranges over j = 1 , . . . , 8. Accordingly, the system has the product structure, for j = 1 , . . . , 8, (l,x,x,a,a,ax,ax){l,0j,0?),
(9.6.51)
(l,y,y,b,b,by,by) {1^3,6*).
(9.6.52)
Using the fact that four general equations in the monomials {l,x,x,a,a,ax,ax} have just 4 solutions (hint: introduce new variables n = ax and ft = ax), one sees that this system has a root count of 212(®) = 286,720. This is the root count of the formulation used to solve the problem in (Wampler et al., 1992), which at the time was probably the largest polynomial system ever solved. This is a case where symmetry can play a helpful role. It is easy to see that swapping (x, x, a, a) with (y, y, b, b) leaves the equations reordered but otherwise unchanged. If we can arrange our start system to have this same two-way symmetry,
168
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
we can track just half the number of paths. This can be done by using the same random coefficients for the factors in Equation 9.6.51 as in Equation 9.6.52. Thus, the problem can be solved using only 143,360 paths. This is far from the end of the story. The system has numerous solutions at infinity. Moreover, if (x, x,a, a) = (y,y,b,b), Equation 9.6.49 and Equation 9.6.50 are identical, so there is a positive dimensional solution component obeying this relation. Many continuation paths terminate on this singular set. Actual solution of the problem showed that there were only 8652 nonsingular solutions, appearing in 4326 pairs due to the two-way symmetry. Since the two-way symmetry amounts to nothing more than swapping the labels between the left and right dyads of the mechanism, we may say that there are 4326 distinct four-bars that interpolate nine general points. Moreover, these appear in triplets, called Roberts cognates, which not only go through the nine points but have exactly the same coupler curve. This means there are just 1442 distinct four-bar coupler curves that pass through the points. By using parameter continuation, we can solve subsequent examples using only 1442 paths, about a 100-fold reduction from the 143,360 used to solve the first example. When dealing with a very sparse system like Equations (9.6.49) and (9.6.50), it is often advantageous to eliminate some variables. This is because one of the main costs of the continuation method is solving the linear systems for Euler prediction and Newton correction. The cost of linear solves grows as 0(n 3 ) with the number of variables, unless sparse solving methods can be applied. In the problem at hand, we can eliminate all the 8j variables without increasing the root count, thereby increasing efficiency when using a linear solver for full systems. The elimination is accomplished by applying Cramer's rule for linear systems. The system
at9 + a20~l + a3 = 0,
M + w-i + p3 = o,
(9A53)
SiS2 + Sj = 0,
(9.6.54)
has solutions only if
where 6l =
ft ft '
* = ft fc '
S
> = ft ft '
(9-6'55)
Applying this to Equations (9.6.49) and (9.6.50) gives a system of 8 equations with the monomial product structure {xd, xa, x, x, a, a, 1 } ( 2 )
(9.6.56)
This reduced system has been the subject of further study. The mixed volume of the reduced version of the system, computed by Verschelde (Verschelde, 1996)
Case Studies
169
(Verschelde et al., 1996), was found to be 83,977. The best root count known was found by applying polynomial products (Morgan et al., 1995). The approach is to observe that Equation 9.6.54 admits the product decomposition {5i, 53} ® {52S3}. A homotopy based on this decomposition has 18,700 paths appearing with twoway symmetry so that only 9,350 paths must be tracked. However, the start system itself must be solved by continuation since the subsystems obtained by choosing one factor from each equation are not linear. The whole computation requires 24,300 paths. Although this is a substantial reduction in the number of paths, it requires a specialized computer program, so one may prefer to use a general purpose algorithm with more paths. No matter which method is used to solve the first random example, considerable efficiency is to be gained in subsequent examples by applying parameter continuation to track only 1442 paths.
9.6.8
Four-Bar
Summary
The purpose of this discussion of four-bar linkages is to show a spectrum of problems, having multihomogeneous root counts ranging from six to 286,720. Each geometric problem can be formulated in several ways as an algebraic system to be solved, and each algebraic system can be placed in any one of several homotopies for numerical solution. Generally, a well-chosen multihomogeneous formulation yields a root count considerably lower than the total degree, while the mixed volume of the Newton polytopes gives a somewhat lower root count. General linear products in which different equations have different linear decompositions are not useful for these synthesis problems, because they have the same monomial structure at each precision point. For the hardest problem in the set, the nine-point problem, symmetry can cut the number of paths in half, while polynomial products give the smallest root count at the expense of a more complicated computer program. Even with that approach, the number of continuation paths is more than ten times as large as the actual number of isolated roots. Only parameter homotopy can solve the problem using only 1442 paths, but we need to bootstrap the process by solving one example with one of the other homotopies. In several examples, we see that there is more at stake than just the number of homotopy paths. We can choose between two homotopies having the same root count, one having sparse equations in many variables and the other having some variables eliminated but less sparsity. Which is more efficient depends on the details of how function evaluation and linear solving are computed. To be efficient, the large, sparse formulation of the equations requires sparse linear algebra routines in the path-tracking code. On the other hand, elimination of variables tends to raise the degrees of the equations that remain, which can adversely affect the numerical stability of the equations.
170
Numerical Solution of Systems of Polynomials Arising in Engineering and Science Table 9.1 Equilibrium Constants Iog10(l/fei) Iog 10 (l/fc 2 ) Iog 10 (l/fc 3 ) Iog 10 (l/fc 4 ) Iog 10 (l/fc s ) Iog 10 (l/fc 6 ) Iog 10 (l/fc 7 )
9.7
Constants for the chemical equilibrium model, Exercise 9.2 T = 1000° 24.528 22.206 47.970 24.942 22.120 46.989 32.187
T = 3000° 7.289 6.997 15.107 6.825 7.208 14.680 10.285
Total Concentrations
T = 6000° 3.108 3.270 6.942 2.559 3.541 6.791 4.878
To TH Tc TN
5.e-5 3.e-5 l.e-5 l.e-5
|
Exercises
Exercise 9.1 (Nash Equilibria) (1) Compute the generic number of Nash equilibria for the following cases: (a) 3 players, 3 pure strategies each; (b) 4 players, 2 pure strategies each; (c) 3 players with (4,3,3) pure strategies, respectively. (2) Let Nash(N, S) be the generic number of Nash equilibria for N players having S strategies each. Derive a recursive formula for Nash(./V, 2) in terms of Nash(AT — 1,2) and Nash(iV - 2,2). Use it to find Nash(7,2). (3) Write a code to compute Nash(iV, {Si,..., SJV}), where player i has Si pure strategies. Compute Nash(5, {4,3,3,2,2}). (4) Use HOMLAB'S routine bezno to find Nash(3, {4, 3,3}). In general, routine bezno is not an efficient way to perform such a count, because it works by forming and solving a linear-product start system. Demonstrate this by using it to count Nash(6,2). What goes awry for a larger number of players? Exercise 9.2 (Chemical Equilibrium) This exercise concerns the chemical system of § 9.2. Data for this problem is given in Table 9.1. (A typographical error in Morgan's Table 9-2, corrected here, reverses the constants Tc and TH•) (1) Carefully verify the 4-homogeneous and the linear-product root counts given in § 9-2. (2) Find a 3-homogeneous formulation that also has a root count of 18. (3) Follow the steps outlined in § 9.2 to derive expressions for the coefficients of the monomials listed in Equation 9.2.17 in terms of the mass conservation parameters TH, To,Tc,TN and the equilibrium constants k\,..., k7. (4) Use routine chemsys in HOMLAB to compute solutions to the system. First, choose random coefficients. Try the different start systems. Do you get the same number of finite roots each way? What do the roots at infinity look like? (Hint: s t a t s (4,:) indicates the multiplicity of roots as determined by the end game. See Chap.10.)
171
Case Studies Table 9.2 Concentrations for T = 1000°, the only physically meaningful answer Components Xo XH Xco XN
Concentration
Compound
Concentration
1.4911556-015 3.212064e-019 7.664381e-016 2.314587e-027
Xo2 XH2 XN2 XCo2 XOH XH2O XNO
7.499733e-006 1.657938e-015 4.999735e-006 1.000000e-005 6.314036e-012 1.500000e-005 5.308800e-010
(5) Compute the solutions for random parameters. Is the result the same as for random coefficients? (6) Compute the solutions for T = 6000°, 3000°, and 1000°. How many physically meaningful roots are there (concentration values must be real and nonnegative)? (7) The test in chemsys for real solutions checks if the imaginary part of the concentrations is less than 10~6. Why is this not an adequate test for this problem? Can you devise a better one? Can you spot complex conjugate pairs in the list of "real" solutions? (8) Try turning off scaling for T = 6000° and see what happens. What do you think will happen for T = 3000°? Try it and see. (9) (Open ended.) Why is T = 1000° so difficult? Can you devise a strategy to treat this problem more easily? The sole physically meaningful answer for T = 1000° is given in Table 9.2. Exercise 9.3 (Stewart-Gough by total degree) Try running the Matlab file stewart/sgtotdeg.m to solve the forward kinematics of a general 6-6 StewartGough platform. (1) Confirm that among the 128 endpoints of the total-degree homotopy, 88 lie on the afHne algebraic set {(e, g) : e = 0, gg' = 0}. (2) The degenerate points are all singular. Why? (3) Save the 40 nonsingular roots and use them as start points for parameter homotopy, as directed in Exercise 7.4. Exercise 9.4 (Stewart-Gough by LPD) HOMLAB provides a routine, called lpdsolve, that creates a linear-product start system for a given product structure and tracks the resulting homotopy paths. The user must provide an m-file function that computes the function value f(x) and its Jacobian matrix df/dx. The script file stewart/sglpdhom.m does all of this for Stewart-Gough forward kinematics problems. (1) Run sglpdhom and check that it tracks 84 paths and obtains 40 nondegenerate solution points for a general 6-6 platform.
172
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
(2) The routine warns that the start system has 30 singular solutions of the form e = 0. Can you see why these are present and why there are 30 of them? (Hint: they are nonsingular roots for some choice of factors in the start system G = {go, • • • ,ge}, but singular as solutions of G.) Exercise 9.5 (Six-Revolute Inverse Position) The following exercises pertain to the system Equations 9.4.30-9.4.32. (1) Confirm the two-homogeneous root count of 320. (2) Run routine sixrevl in HOMLAB and check that there are 16 finite roots. (3) If your computer is fast enough, modify sixrevl to solve the system with a total-degree homotopy (1024 paths) and reconfirm the root count. (4) Run routine sixrev2, which uses a 16-path parameter homotopy, on the following and observe the number of finite roots and the number of real roots. (a) (b) (c) (d)
a random, complex example, a random, real example, the Manseur-Doty example, a real example with intersecting "wrist" axes: a4 = d5 = a5 = 0.
Fig. 9.4
Seven-bar linkage, type b.
Exercise 9.6 (Seven-Bar Structures) The structure in Figure 9.2 is one of just three topological arrangements of seven links in a structure that cannot be solved by analyzing a five-bar or three-bar substructure. The other two are shown in Figures 9.4 and 9.5. (1) Derive equations for each of the seven-bar structures in Figures 9.4 and 9.5 and find linear product decompositions having root counts of 16 and 18, respectively.
Case Studies
Fig. 9.5
173
Seven-bar linkage, type c.
(2) Create a single program using HOMLAB to solve any of the seven-bar structures with a 20-path two-homogeneous homotopy. Solve a random example of each type and verify the root counts of 14, 16, and 18. (3) Create individual programs for the three cases using linear-product decompositions having the minimal number of paths. Run the same examples as you used in the previous item and verify that the same solutions are found.
Exercise 9.7 (Four-bar Function Generation) (1) Clear the negative exponent from Equation 9.6.43 and show that the system Equations (9.6.42)' and (9.6.43), j = 0, . . . , 4 , has a six-homogeneous root (x,u,v,l), and (6j,l), count of four. (Hint: use the groupings (x,u,v,l), j = 1,2,3,4.) (2) Confirm the root count of four for the system of Equation 9.6.44, j = 1, 2,3, 4. (3) Use routine f cngen in H O M L A B to synthesize some function generators. Remember that a real linkage has u* = u and v* = v, where "*" is complex conjugation. (a) Let *j- = $2 for $ , = {0.0,0.1,0.2,0.3,0.4}. Set (<^-,Vj) = (e'^.e***). (b) Do the same except *_,• = sin($j) for $.,• = {-1.0,-0.5,0.0,0.5,1.0}. (c) Construct an original example. How many real solutions are there in each case? (4) For real linkages synthesized in the previous item, plot angle * versus $ on a fine grid and animate the motion of the linkage. (5) Write a program to use H O M L A B to solve the six-homogeneous version of the problem from item (1) above.
174
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Exercise 9.8 (Four-bar Body Guidance) (1) Use routine burmest in HOMLAB to synthesize some dyads for body guidance. (a) Let 9j = {e~u, 0,0,0, elu}, Pj = {-2, - 1 - 0.5i, 0,1,2 + i}, and Pj = p* (complex conjugate). (b) Construct an original example. (2) A four-bar is obtained by combining a left with a right dyad. For the problems above, pick two real solutions and use one as the left dyad and one as the right dyad. Sketch the four-bar linkage at each of the five given positions. (3) What is the maximum number of distinct four-bars that can guide a body through five general positions? (4) Confirm that the only monomials appearing in Equation 9.6.46 are {xa, ax, x, x, a, a, 1}. Show that by introducing new variables s = xa and s = ax, one can reformulate the problem as six equations of total degree four. (This trick is similar to one due to Bottema (ch.8, §5 Bottema & Roth, 1979).) Exercise 9.9 (Five-Point Path Synthesis) • Use HOMLAB to solve the five-point problem using a six-homogeneous formulation with 96 paths. You may wish to write a script to form the equations in "tableau" form and then apply mhomtab. Determine the number of endpoints that are (1) at infinity, (2) singular, (3) finite and singular, (4) contained in (C*)8. • Using the results of the previous run, construct a parameter homotopy to solve subsequent problems in this family using as start points only the solutions in (C*)8. • Explore the symmetric five-point problem in which the fixed pivots and the precision points are placed with mirror symmetry about the vertical axis. Instead of writing equations specialized to the symmetric case, just use the general formulation with symmetric data. (Hints: Let the zero-th precision point be on the vertical axis. Also, note that in isotropic coordinates, (a, a) being mirrorsymmetric to (b,b) means (b,b) = (—a,—a).) What is the generic number of symmetric solutions? • How can you set up a parameter homotopy that preserves symmetry? How many paths must be tracked? • Find a formulation of the symmetric problem that uses just half as many variables and equations. Program it in HOMLAB and verify that you get the same results as using the general formulation with symmetric data. • Solve the case of a = 0,6 = l,p = (0.765 + 0.735i, 0.935 + 0.595i, 1.335 + 0.595i, 1.685 + 0.945i, 1.08 + 1.05i), with "real" data, meaning a = a*, etc.
Case Studies
175
Verify that one of the solutions has x w 0.71477 + 1.3365i. How many "real" solutions are there? • For some real solutions, plot the coupler curve and verify that it passes through the specified points. A "circuit defect" is said to occur if the real coupler curve has two circuits and some precision points fall on each. Find examples with and without circuit defects. Can you find an example having multiple real solutions without circuit defects? • Download one of the publicly available packages that implements polyhedral homotopy and use it to solve the five-point problem.
Chapter 10
Endpoint Estimation
In earlier chapters we studied polynomial homotopies H(z, t) : CN x C ^ C ^ with t going from 1 to 0. In this chapter we investigate the last part of the continuation procedure as t goes to 0. This is called the endgame in the continuation algorithm. In § 10.1, we look at nonsingular solutions of H(z,0) = 0, the system we want to solve. For these solutions Newton's method 1 is excellent. In § 10.2, we look at the situation of singular roots of H(z, 0) = 0. For these solutions, we follow (Morgan, Sommese,
177
178
10.1
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Nonsingular Endpoints
Assume that x is an isolated nonsingular solution of H(z, 0) = 0, i.e., that H(x, 0) = 0 and that the Jacobian dH/dz is an invertible matrix at (x, 0). Then we know that applying Newton's method to H(z,0) = 0 starting at any point (x',0) sufficiently near x will converge quadratically to (x,0). Given a homotopy continuation path z(t) with limt^o z(t) = x, the usual prediction-correction methods described in § 2.3 work well. The final prediction to t — 0 provides the initial guess (x',0) for Newton's method. Usually, it is not difficult to decide that the limit (x, 0) is a nonsingular solution. Convergence itself is a good indicator that the solution is nonsingular, but a surer test is to examine the condition number of the Jacobian at the endpoint. If the solution converges, as indicated by a small step in the final Newton iteration, and the condition number is mild, then the solution can be confidently declared nonsingular. Because convergence behavior and the condition number can be affected by poor scaling and high degrees, the definition of "mild" is problem dependent. Histograms of condition numbers for all the solutions of a problem can be very useful in such judgements, as described further in Chapter 11. If the solution does not converge well, the condition number computed at the solution estimate might not accurately reflect the condition number at the true solution. Because of this, one cannot confidently tell the difference between a cluster of nonsingular roots, each having a rather high condition number, and an inaccurate estimate of a true multiple solution. One way to clarify the situation is to increase the digits of accuracy of the computation. If the solution is truly nonsingular, then a sufficient level of accuracy will eventually reveal this. One can even apply interval arithmetic (see § 6.1) to obtain proof that a suspected nonsingular solution really is nonsingular. However, one cannot prove a solution is singular in this way; that is, higher accuracy arithmetic applied to a truly singular solution will increase the condition number at the estimated solution, but it likely will never show exact singularity. The interval evaluation of the Jacobian matrix will show that a singular matrix is within the bounds of the computation, but that does not prove singularity. One must finally stop at some level of accuracy and accept the judgement that the solution is singular to that level of approximation. This moves us into the realm of singular endgames, discussed next. As a practical matter, it is not necessary to determine if a solution is singular or not to estimate it well. This is because the singular endgames that follow work equally well on nonsingular endpoints. Thus, to keep a computer code simpler, one may apply the singular endgame to all paths and judge singular vs. nonsingular afterwards, according to the results.
Endpoint Estimation
10.2
179
Singular Endpoints
When the endpoint of a solution path is singular, there are several approaches that can improve the accuracy of its estimate. All the singular endgames are based on the fact that the homotopy continuation path z(t) approaching a solution of H(z,0) = 0 as t —> 0 lies on a complex algebraic curve containing (x,0). In this section we collect the facts that follow from this and underpin the methods. In particular, we will see that the methods become valid only after the path z(t) has been tracked into an "endgame operating zone" around t = 0. For very singular endpoints, this operating zone may only be reached by increasing the number of digits used. In § 10.4, we discuss in fuller detail what happens if one computes an estimate while still outside of this operating zone. Since this chapter is about local behavior of holomorphic functions, our homotopies H(z,t) will usually only need to be assumed holomorphic and not algebraic. In § 10.2.1, we collect all the assumptions we use in one place.
10.2.1
Basic Setup
Assume that H(z, f ) : [ / x A - ^
{(z,t)£UxA
H(z,t) = 0},
i.e., X is the closure of a connected component of the set of points of X with neighborhoods biholomorphic to an open set of C; and (4) the projection TT : U x A —> A restricts to a proper holomorphic surjection •KX • X -> A with 7^(0) = (x,0). At first sight this seems like a large number of assumptions that might be difficult to check! The crucial observation is that all the polynomial homotopies H(z,t) = 0 considered in this book fall into this setup. Indeed, if we are tracking a path z(t) starting at a nonsingular root z(l) of H(z, 1) = 0 and are trying to estimate the root of 7i(z,0) = 0 as z(i) —> x := 2(0), then the path is part of a one-dimensional irreducible component X of
{(z,t)eUxA
H(z,t) = o}.
By choosing small enough neighborhoods U and A and taking H(z, t) := 7iux&{z, t) and X to be the irreducible component of X n (U x A), all the hypotheses of the basic setup are satisfied.
180
Numerical Solution of Systems of'Polynomials Arising in Engineering and Science
In simpler terms, we know that the paths in our polynomial homotopies remain nonsingular for t £ (0,1], so each path is one-dimensional and makes a steady advancement as t goes to zero. The defining equations for the homotopy are all polynomial, so the path is a complex analytic set. This is the essence of the conditions stated above as applied to polynomial continuation.
10.2.2
Fractional Power Series and Winding Numbers
We have the following consequence of Corollary A.3.3. Recall that Ar(a) c C means the disk of radius r centered on a. Lemma 10.2.1 Assume that we are in the basic setup above. There is a neighborhood V c X of (x, 0) € X, a positive number r > 0, and a holomorphic mapping
•
We call c the winding number of X at (a;, 0). Given an isolated solution (x, 0) of H(z, 0) = 0, there is a positive e e R such that for 0 < t < e, H(z, t) = 0, considered as a system in z, has only nonsingular solutions in the vicinity of (x, 0). From this, it follows that the multiplicity of the solution as a solution of H(z, 0) = 0 is the sum of the winding numbers of the one-dimensional irreducible components of the solution set of H{z,t) at (x,0). The nonsingularity condition is satisfied automatically for many algebraic systems. Note that since the components Zi{4>(s)) are holomorphic functions of s, they can be expressed as convergent power series of s. We can consider these as fractional power series in t1^0. For the above representation of the components of z(t) to hold we must be within a disk A r c := {t e C | \t\ < rc}, such that Tfxnir-1(Ar ) has either no branch point (in which case c = 1) or a branch point at (x, 0). We refer to rc as the endgame convergence radius. A good way to visualize the situation is to consider what happens when we track a solution path as t circles the origin in the complex (Argand) plane at a real radius r, say as t = retB as 9 goes from 0 to 2TT. We start at z$ satisfying H(zo,r) = 0 and follow the path implicitly defined by H{z,rel9) = 0. For example, the reader may think about H(z, t) = z2 — t(r] — t) with 77 a small positive number. For almost all r, paths satisfying the basic setup above will remain nonsingular as we continue around such a circle, returning at the end of the loop either to z0 again or to a distinct nonsingular solution z\. For the example H(z,t) = z2 — t{j] — t), paths will remain nonsingular except for r = n, and we will go from z0 = y/t(rj — t) to z i = "V^l 7 ? ~ 0 o r t o z i = \A(^ ~ *) depending on whether r < r\ or r > 77. We
Endpoint Estimation
181
may then proceed around the circle again and again to return to solutions z%, Z3,... Since there are only a finite number of nonsingular solutions, after some number of such loops, the solution path must return to the original point; that is, for some k, we have Zk = z0. In the example, H(z, t) = z2 — t{j] — t), k = 2 or k = 1 depending on whether r < r\ or r > r\. Considering this whole process again at a slightly smaller radius r', we generally expect the same picture again, meaning that we get a sequence of return points z'o, z[,..., z'k with z'k = z'o and z[, i = 1,..., k, being the continuation of zt as t goes from r to r' on the real line. However, there may be exceptional values r* of r where at least one of the loops hits a singularity, thus breaking continuity. In the example, this value is rj. Stepping across this value, the return sequence may change such that the ith return values Zi and z[ for r and r' with r > r* > r', i.e., on opposite sides of the exceptional value, are no longer joined by continuation of t from r to r' in the reals. The value of k that closes the sequence may change as well. The endgame convergence radius rc is the smallest such exceptional value of r*: for all smaller radii, the return map remains stable and the winding number c of the path is the value of k in this range. Remark 10.2.2 For simplicity we have slid over questions about whether you can indeed choose small enough open sets so that we can decompose the solution set of H(z,t) = 0 in a neighborhood of a solution (x,0) components so that for the one-dimensional components, we have the desired uniformization result. The language of germs is the way to gently deal with these issues in a rigorous manner. We have included a short introduction to germs in § A.3. 10.3
Singular Endgames
For a singular endpoint, Newton's method applied to solve H(z, 0) = 0 is no longer satisfactory for several reasons. First, Newton's method loses its quadratic convergence at a singularity, and in some circumstances, it may even diverge. Second, the prediction along the incoming path may give a poor initial guess, which exacerbates the problem of slow convergence. Finally, while the endpoint of the continuation path is well defined in the limit, the path might very well end on a positive-dimensional solution set of H(z, 0) = 0, so unconstrained Newton iterations may wander along this set rather than give us the endpoint we desire. All of this is to say that to deal with singular solutions, we need a strategy different than the one we described above for nonsingular endpoints. We call such a strategy a singular endgame. All singular endgames estimate the endpoint at t = 0 by building a local model of the path inside the endgame convergence radius. The overwhelming problem is that the paths approaching singular solutions of a system approach their limit very slowly. To deal with this, we wish to sample the path as close as possible to t = 0, but numerical ill-conditioning precludes accurate computation too near t = 0. This
182
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
leads to the idea of an endgame operating zone, described next. 10.3.1
Endgame Operating Zone
For a fixed precision of arithmetic, there is typically some small zone around t — 0 inside which a path with a singular endpoint cannot be numerically tracked within a prescribed accuracy of the true path. Since the endgame can only work inside the convergence radius, this leaves an annular endgame operating zone, as illustrated in Figure 10.1.
Fig. 10.1 Endgame operating zone
The endgame operating zone can be empty in the case that the ill-conditioned zone is larger than the convergence radius. However, whereas the convergence radius is completely defined by the homotopy, the size of the ill-conditioned zone is not. It depends on the precision of the arithmetic, so it can be made smaller by using more digits. Roughly speaking, if we wish to estimate the endpoint with k digits of accuracy, then we need to sample the path with k digits of accuracy also. Let 10 c denote the condition number of the Jacobian J(z, t) of H{z, t) with respect to the z-variables and some fixed norm. When we do a correction step of Newton's method we solve the equation J(z,t)5z = —H(z,t). Here we lose roughly C digits of accuracy. Computing with d digits of precision, we need d — C > k for success.2 By increasing d, one may effectively shrink the ill-conditioned zone. With enough 2 This analysis of Newton's method is very rough, as the iterative nature of the method can correct some errors. It would be closer to the truth to say that Newton's method converges quadratically only to k < d — C digits, but even that is a rough generalization. Our comments are meant to give a correct general picture without a complicated analysis.
Endpoint Estimation
183
digits, one can ensure that the endgame operating zone is not empty. Once inside the endgame operating zone, we can sample the path just for real t or we can sample for complex t in the zone. For a given precision of arithmetic, better accuracy in the estimate is achieved by sampling for complex t. 10.3.2
Simple
Prediction
The simplest approach is to track the path as close to t = 0 as possible using extended precision to get the same accuracy as a nonsingular root. Let us analyze a simple example to see what happens. Assume we were trying to solve zc = 0 for some integer c > 1 using the homotopy H(z, t) = zc — t = 0. Note in this special case of solving a one variable complex polynomial, the condition number of J{z,t) is 1. So we can track with precision on the same order as the number of digits, i.e., k = d. If we follow the path z(t) with z(l) = 1, our path is then z(t) := £=, but in practice we do not know the path explicitly, but must track it. Assume we have tracked the path t^ + e(t) where e(i) is a random error of size O(l0~k). Once t= is of the same order as e(i), path crossing will likely happen. So we cannot track for t beyond R « 10~k. In this case we have an estimate 10~fe/c for the solution. This is not very good. For example, with c = 5 and 15 digits of precision, we get 10~3 as an estimate for the solution 0. If we wanted 10 digits of accuracy, we could achieve this by using this method and 50 digits of precision. 10.3.3
Power-Series
Method
The simple prediction approach of § 10.3.2 can easily be improved. The idea here is to estimate the winding number c and then approximate the map cj> : Ar(0) —> C " x C of Lemma 10.2.1. There are different schemes to achieve this. We begin by tracking a solution path z{t) from t = 1 down to t = R for some R G (0,1). We then collect samples of z(t) by continuation from t = R to use in fitting a power series to 4>(s), where t = sc. There is a separate power series for each component of z. Assume for the moment that we know c and that t = R is inside the endgame operating zone. We choose some number of points s\,..., sK in the s-disk, such that each si is inside the endgame operating zone, and find the values z(t) = Z(SJ) by continuation. At each such point, we can compute derivatives. If we compute the first ki derivatives at a particular Sj, then Si is equivalent to ki + 1 points without derivatives when determining the order of the power series we can compute. That is, for each j = 1,..., N, we have a polynomial Pj(s) of degree (X^i=i(^ + -0) ~ 1 approximating <> /_•, (s) and satisfying p3 (st) = <\>^ (s») for i from 1 to K and for v from 0 to ki. The standard error estimate (Theorem 3.6.1 Davis, 1975) tells us the error of the approximation of Pj(0) to (pj(O) is O (PliLi Isi]fei+1)- F° r brevity below, we shall say this is an Mth order fit, where M — (X)"=1(&i + 1)) — 1.
184
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
This leaves open two questions that must be answered in order to deploy the method: • How do we find the endgame operating zone so that we can sample within it? • How do we determine the winding number c? The only practical approach to finding the endgame operating zone seems to be adaptive trial-and-error. Suppose we fix a pattern of the sample points, {asi,..., asK}, where a is a scaling factor for shrinking the sample pattern around the origin. Typically, asi is real and we arrive at it by tracking t in (0,1). The remaining sample points may be real or complex, but either way, we evaluate z at them by continuation. We may execute the endgame repeatedly for a geometrically decreasing set of scalings aj = A* for some fixed real number A S (0,1), say A = 0.3. When successive estimates of the endpoint agree to some pre-specified tolerance, we declare the method a success and stop. If this tolerance is never satisfied, we stop when the scaling gets so small that we can no longer accurately track paths due to the ill-conditioning near t = 0. If this happens, we must report that the tolerance was not met and return as our best estimate the one for which the smallest successive difference was found. There are several good ways to determine c. One is to directly measure the winding number by tracking a circular path, t — Re^^6 until the path closes up at 9 = 2TTC with c a positive integer, i.e., with z(Re2'KC^1) = z(R). If R is inside the endgame operating zone, then c, the number of loops around the origin necessary to close the path, is the winding number. As always, there is the numerical problem of deciding when two approximate numbers, z f Re2lxc^~^\ and z(R) are equal. This is the same as the problem of needing to keep the allowed error in our tracking small enough that we do not have path crossing. A less computationally-expensive method for small c is to note that since c is an integer, we can quickly test small values of c, say, from 1 to 4, for consistency with a power-series fit to an oversampled data set. Such a data set can be obtained with less path-tracking than would be required to find the winding number by path closure. A method for determining c and estimating z(0) is as follows. (1) Use continuation to collect sample values of z(t) for t = ti,... (2) For c = 1,..., c max , do the following.
,tK,tK+i.
(a) Transform the sample points into the s-plane, using Si = ti . The continuation path in t determines the proper matching angle of each Si, that is, if U = Re^16 for R e (0,1), then s; = R}/ceV=ie/c taking R1/0 in the reals. (b) Derivatives with respect to t at the sample points must also be converted to derivatives with respect to s using the value of c, e.g., dz/ds = (dz/dt)(dt/ds) = (dz/dt)csc-\ (c) Fit an Mth-order power series,
185
Endpoint Estimation
scribed above, (d) Calculate the prediction error at the extra sample point as ec = \\4>c(sK+i) —
^(Wi)ll(3) Use the c that gives the smallest prediction error ec as the estimate of the winding number, so (f>(s) = 4>c{s). Estimate the path endpoint as z(0) = 4>(0). When used in conjunction with the adaptive method of determining the endgame operating zone, one often observes that c = 1 gives the best prediction when the path is far outside the convergence radius. As the path is tracked into the operating zone, c settles into the correct value. This is because the order of the prediction error for an incorrect value c' of the winding number is O(tl/C), whereas for the correct value it is O{tM>c). One way to collect samples is in a geometric sequence along the reals: (^0)^1)^2) •••) = (R,XR,X2R,...) for some A € (0,1). Using z and dz/dt at two successive values ti and tj+i, one may make a cubic prediction of the next value at ti+2- A n i c e feature of this sampling pattern is that it advances by adding just one sample point to the sequence, reusing the last two points of the previous sample. That is, at one iteration we use samples at (to,£i,<2) a n d at the next (ti, *2J ^3)Such a geometric sequence can be used to determine the winding number without trial and error. The value z{t) is approximately z(t) = z(0) + at1/0 + higher-order terms, where a is the first coefficient in the fractional power series. Thus, z(R) — z(XR) « a(l - \l'c)R}-/c and z(XR) - z(\2R) « a(l - A ^ A ^ i ? 1 / 0 and so z(XR) - z(X2R) ^ z(R)-z(XR) ~
1/c
Since we know X, this can be used to estimate c, keeping in mind that c is a positive integer. This method can fail when a is zero or small, so that the first nonconstant term in the power series is order £2/c or higher. A method that attempts to deal with such subtleties is described in (Huber & Verschelde, 1998) (see also (Verschelde, 2000)). We shall not pursue this further here. As we approach t = 0, we can expect the predictions of the power series to be quite accurate. Accordingly, we may use it in place of the linear predictor in the predictor-corrector path tracker when collecting new samples. Of course, one should use the current best estimate of c at each stage, which may change as the endgame proceeds. Even when c is not correct, because the path has not fully entered the endgame operating zone, the best estimate for c obtained by the above method will generally be better than just assuming c = 1. A final variation on the power-series method is worth mentioning. Once the endgame operating zone is entered, it is valuable to quickly gather more samples to raise the order of the prediction. This allows the process to converge to full accuracy at larger values of t, before the ill-conditioned zone is encountered. Suppose we have
186
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
sampled along real t £ (0, R) and the prediction of 4>c{s) gives an accurate estimate of z(tK+i) in step 2d. Then we may try to predict across the origin in s and use Newton's method to refine samples there. It is particularly convenient to gather a symmetric sample set —si, — S2, • • •, —sK, because the odd-powered terms in the power series for tp(s) = {
For double precision arithmetic and samples on the real line in t, experience has shown that there is little profit in attempting the use of winding numbers greater than four or five. For higher precision arithmetic, this limit can be extended. The problem is that an Mth-order power series in s corresponds to a power series in t of order only M/c. To get a good estimate, we will need a large value of M and a numerically stable method of computing the estimate 0(0) without finding all M + 1 coefficients of the power series. The Cauchy integral method of the next section provides this. 10.3.4
Cauchy Integral Method
The Cauchy integral method is based on the use of the Cauchy Integral Theorem to estimate the solution of H(z,t) = 0 by 0(0), where <j> : Ar(0) -> C^ x C of Lemma 10.2.1. As in the power-series method, we first track z(t) until t = R. We then track as 6 varies, to both determine the winding number c and to collect z (Re^^e) samples around this circular path. Letting s denote the coordinate of A r (0), we have t = sc, and z$ = <j>i(s)fori = 1,..., N with the sought after solution given by (zi,... ,z^) = (>i(0)... ,<^JV(0)). The Cauchy Integral Theorem gives
fc(0) = -±== f 27TV-1 J{sec | |«|=flv<=}
^ds.
(10.3.1)
s
In terms of 9 and z (Re^~*e) we get the vector integral
Because of periodicity, an excellent method to evaluate this integral is the trapezoid method, e.g., (Eq.(3.3.4) Stoer & Bulirsch, 2002). This method yields an estimate of z(0) with error of the same magnitude as the error with which we know the sample values z(Re^zzl0). As in the power-series method, we can benefit from choosing a special sample set. If M + 1 points around the circle are sampled at equal angles, Sk =
End/point Estimation
187
j^e2n^ikc/(M+i)^ faen the trapezoid method gives exactly the average of the sample points: 1
M+i
Moreover, it is easily shown that this is the same result as would be obtained from a power series fit to the same points. The success of the Cauchy integral method depends on finding an appropriate radius for the circular sample. As in the power-series method, we do not know a priori the convergence radius. The most practical recourse is to discover it adaptively, by trying the method at geometrically decreasing radii. Convergence may then be judged by agreement in winding number and endpoint estimate between successive trials. 10.3.5
The Clustering or Trace Method
This last approach is based on the trace, see § 15.5. Assume that we have a number of paths Zi(t) converging to what appears to be the same solution z* of the system H(z,t) = 0 that we want to solve. Denote the paths as wi(t),..., wm(t). We have a finite number of one-dimensional irreducible analytic sets Xlt... ,Xk passing through a small neighborhood of (z*, 0). We assume that the projections to the taxis -Ki : Xi —> C are proper for all i when restricted to 7r~1(Ar(0)). This will be true for some r > 0. Each map iri n-i 0, e.g., see § 15.5 or (Appendix Morgan, Sommese, & Wampler, 1992a), this sum extends to a holomorphic function tr(t) for t £ A r (0). We are in a situation similar to the situation with the power series method of § 10.3.3, but simpler since we do not worry about c. This method predicts the value tr(0)/m for z*. This prediction is a prediction for the average of the limit points wi(0) + • • • + Wm(0) m
Each of the Wi(t) has a fractional power series, but their sum is holomorphic, that is, it has a power series with integer exponents. Thus, we may conveniently estimate z(0) by fitting an integer-exponent power series to the average of the Wi(t). The main difficulty with this method is determining which solutions are converging to the same endpoints. The difficulty arises because the estimate of the endpoints of the individual paths is inaccurate unless the winding number is employed in the estimation. Only the average endpoint is well-behaved (holomorphic),
188
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
not the individual paths. A lesser disadvantage of the trace method is that to sample the average solution path, one averages the values of the individual paths at each sample point. This means that the individual paths must be sampled at the same points. Hence, the processing of the paths becomes coupled, whereas the power series or the Cauchy integral can be applied to each endpoint independently.
10.4
Losing the Endgame
It may happen that the endgame is applied outside the endgame convergence radius, either because there are insufficient digits to track within that radius or because the endgame zone is not identified correctly. It is natural to ask what happens in such a circumstance. When there is a tight cluster of distinct solutions, the precision of the arithmetic must be high enough not only to distinguish between them, but also to track paths accurately near them. If the cluster is too tight in comparison to the precision of the arithmetic, the end of path tracking, and hence the application of the endgame, will occur outside the radius of convergence. There is the stability question of whether the methods will compute some sort of average of some of the solutions of the cluster. The methods do, in fact, have good stability properties, which hold in a larger range than the endgame operating range. The setup is that we have a holomorphic function, H(z, t) : CN x U —> C, where 0 6 U C C. Let 7T : C^ x U —> U be the product projection. Of course, in practice this is our homotopy. We are trying to solve at t = 0. We have introduced three interrelated methods. The Cauchy integral method and the power series method are the most accurate. The clustering method of § 10.3.5 is less accurate but clearly fails gracefully: it gives the weighted average of roots of the cluster. The full gamut of possible behaviors of the methods when we are not in the endgame operating region is not clear, but we can get some idea of the behavior from the following examinations. Consider the simple example on C2 H(z,t) =
z2-t2-e2=0.
If we track down to t — R and R < e, then we are in the endgame operating region. If R > e, we are not. Let's see what we end up computing. The solution set TZ of H(z, t) = 0 over AR(0) is a Riemann surface that can be shown to be biholomorphic to some annulus. The important point is that 7r~1({t G C | |t| = R}) is the union of two disjoint circles C\ and C?,-
Endpoi^it Estimation
189
Applying the Cauchy integral method we end up evaluating -1
/>2TT
— /
2TT JO
VR2e2^e
+ e2d6,
with a choice of one of the two branches of the square root. If R < e, the Cauchy integral method yields the roots ±e depending on the choice of the branch. If R > e we get a function dependent on R. This integral is an elliptic integral, but for explicit values of R and e it is easy to evaluate numerically. Fixing e = 10" 7 and R = 10~5 we get 0.64- 1(T 5 -0.50-lQ- 7 \f-[, which does not compare favorably with the actual roots ±10~ 7 . Indeed, the error 0.63 • 10~5 is two orders of magnitude larger than the root. Since the Cauchy integral method applied to an approximating polynomial gives the value at the origin of the approximating polynomial, we see that choosing interpolation points on the circle C\ or C2, the power series method will yield answers identical to the Cauchy integral method. It is important to realize that the trace method is not better than the powerseries or Cauchy integral method. Indeed, if we chose the paths wi(t),... ,wm(t) apparently converging to a common root as in the trace method, and applied the power series or Cauchy integral method to all the points and summed, we would get the same sort of answer as in the trace method. Let's see this precisely for the Cauchy integral method, realizing, as noted above, that this implies the analogous statement for the power series method using interpolation points on the curves over the circle \t\ = R. We assume that over some small disk, AR := {( 6 C \t\ < R}, of radius R around 0, with A# C U, the set H~1 (0) r\TT~1 (AR — 0) is a one-dimensional analytic set with closure X in AR X C^ such that irx '• X —> AR and TT-^ : X —> AR are proper. This is phrased this way to allow the possibility that there is a positive dimensional analytic solution set in the fiber over 0. By definition, proper means that the inverse image of any compact set is compact. One significance of properness for a holomorphic map is that the map has a well-defined sheet number on each irreducible component of X, e.g., see Corollary A.4.15. As mentioned previously, these conditions are satisfied for all of our homotopies. We are not assuming that we are in the endgame convergence radius. Theoretically this means that we do not necessarily have a map 0 as in § 10.2.1. We still have the normalization mapping v : TZ —> X, which for curves is the most classical special case of Theorem A.4.1. Here TZ is a smooth curve (a Riemann surface in the terminology of complex analysis), v is proper; and for a finite set of points B C AR, the map 7TTC\77-I(B), i.e., the map n restricted to TZ minus the finite set TT~1(B) is a biholomorphism. When we are in endgame convergence radius, TZ is a disk and v is 0. Since TXX extends to a neighborhood of X, v extends to V : TZ —> X, where TZ is a Riemann surface with boundary a union of circles, i.e., dTZ := TZ — TZ is a union of disjoint smooth connected curves C\,..., CL for some integer L > 1. Now the Cauchy integral method (Morgan et al., 1991) that we are using starts
190
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
with a point po £ dTZ and follows its continuation p as n(p) goes around the circle, {t £ C | |t| = R}, c times, where c is the minimum positive number of times it is necessary to go around the circle, {t e C | |i| = R}, until p returns to po. Note p traces out a connected component, d, of OX = UjCj containing p. We let c, denote the cycle number associated to the curve CV In analogy with the cluster method we compute the integral
J-yfz.^(±)
(10.4.2)
where, abusing notation, we let z : TZ —> C is the vector of coordinate functions on CN pulled back to TZ. 72. is a noncompact Riemann surface and TZ is a compact Riemann surface with boundary a finite number of circles, such that dTZ = TZ — TZ, i.e., TZ is the set of interior points of TZ. We assume that TT : TZ —> AJJ is a proper holomorphic map from TZ to the disk, A# := {t 6 C | |t| < R}, of radius R around 0, and that TT extends to a differentiable, finite to one map, TT^ : X —> A#. We let Pi for i in a finite set I denote the distinct points in ir~1(0). We let n, denote the multiplicity of the pi as a zero of the holomorphic function n. The following consequence of Stokes Theorem will let us work out estimates for the effect of branch points on the Cauchy integral method. L e m m a 10.4.1 Let n, TZ, dTZ, n^ be as above. Let z : TZ —> C be a holomorphic map. Let {pi | i G / } be the set of points, Pi, in the set, n~1(0), with multiplicities, rii. Then letting c = C\ + • • • + ci : 1
f
» fdt\
yr--^ rii
Note c = X l i e / n i ' ^ u ^ * n e n * an<^ ^ e Cj can be different (though each rii is a sum of a subset of the c,. This consequence of Cauchy's integral theorem is left to the reader as an exercise. Corollary 10.4.2 If dp = dTZ, then the equation (10.4.2) computes the average of c (counting multiplicities) solutions of H(z, 0) = 0.
10.5
Deflation of Isolated Singularities
Endpoints of homotopy solution paths can be divided into two types: isolated solutions and points on positive solution sets. We say that z* £ CN is an isolated root of f{z) = 0, f(z) : CN -> C ^ , if for a small enough positive e e l , the ball Be(z*) c CN defined by B€(z*) = {z £ CN \ \z - z*\ < e} contains no other root of f(z) = 0 besides z*. Isolated singular roots can be computed accurately without resorting to the kinds of singular endgames we have discussed above. This is
191
Endpoint Estimation
brought about by a symbolic reformulation of the equations so that z* becomes a nonsingular root of the new system. Before describing the method, let us review some facts about the behavior of Newton's method near an isolated root. If z* is a nonsingular root, that is, if the Jacobian matrix df/dz(z*) is nonsingular, then it is well-known both that z* is isolated and that Newton's method converges quadratically to z* when initialized from any point close enough to it. In most cases, but not all, Newton's method will also converge for isolated singular roots, but convergence will be slower and the final accuracy lower than for nonsingular roots. An illustration of a system for which Newton's method fails near an isolated root is (Griewank & Osborne, 1983) (29/16)**-2^0, xz - y — 0.
No matter how close one starts to the multiplicity-three isolated root at the origin, (x,y) = (0,0), Newton's method diverges. See (Griewank & Osborne, 1983) for more on how Newton's method behaves near such irregular singular roots. The system of Equation 10.5.3 is very special in the sense that if the coefficient (29/16) is changed to a generic value, Newton's method converges even though the origin remains a root of multiplicity three. However, we do not wish to depend on this kind of genericity, as we may indeed be given a system with an irregular singularity. Moreover, even when Newton's method converges, its behavior may not be satisfactory. For a root of multiplicity fi > 1, its rate of convergence is only linear and the function must be evaluated with precision /x times greater than the accuracy desired in the estimated root. To be precise, consider a single polynomial f(z) : C —> C with a root z* of multiplicity /x > 1. Denoting the kth iterate of Newton's method as Azk, we have the iteration formulae A2fc = -f{zk-i)/f'(zk~i),
Zk = zk-i + Azk.
Let £k := zk — z* be the error between the kth iterate and the true value z*. If the sequence of iterates converges to z*, then it obeys the following relation in the limit
tk+l = (»-iyk+o(ek). (A simple demonstration of this result can be found in (Ojika et al., 1983).) So for H > 1, the convergence rate is linear with geometric ratio (/i — l)//x. For fi = 1, convergence is quadratic, a much faster process. 10.5.1
Polynomials in One Variable
How can we restore quadratic convergence for roots with multiplicity greater than one? For a polynomial in one variable, this is rather simple. By Theorem 5.1.2, we know that a multiplicity fi root of f(z) is a multiplicity one root of f^~1"l{z).
192
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Suppose we begin by solving f{z) by a homotopy method, and we observe that fi roots are approaching a common endpoint. Then, we may switch to solving yO-i)(^), initializing Newton's method using the estimated singular endpoint of the first stage. While it is clear that in theory this deflation maneuver is valid, one might wonder if it is numerically stable. The polynomial that we solve in floating point arithmetic is only an approximation to an exact polynomial and so the multiplicity // root of the exact polynomial will appear to have a cluster of [i roots for the numerical polynomial. How does this cluster behave under differentiation? Can we be sure that f^~l^(z) has a root in the vicinity of the cluster of roots of f(z)l As detailed in (Sommese, Verschelde, & Wampler, 2004d), the answer depends on the degree d of f(z) and the distribution of its roots. Let z* be the centroid of a cluster of fi roots inside a disk Ap(z*) of radius p centered on z*, and let R denote the distance from z* to the nearest root outside that disk. Then, the condition
5,_1SL p
a — /i + 1
(104M)
is a conservative estimate that guarantees that f^k\z), for all k < /i — 1, has exactly /x — k zeros in AP(ZQ). Even if the root is truly a multiple root due to the structure of the equations, at any finite level of precision in floating point, it will likely become a cluster of roots. However, the higher the precision, the tighter the cluster, and so beyond some precision, the cluster radius p will become small enough that condition 10.5.4 will be satisfied and the deflation maneuver will succeed. This does not resolve the question of deciding whether a given polynomial has an exact multiple root or it has a cluster of closely-spaced roots. As we have indicated before, this is not a question that can be resolved in favor of a multiple root using floating point arithmetic. If it is a cluster, a high enough level of precision will reveal it, but if it is a true multiple root, only exact arithmetic can prove it. 10.5.2
More than One Variable
It is natural to consider how to generalize the approach for one variable to systems of equations in several variables. The following formulation is based on (Leykin et al, 2004), which in turn was motivated by (Ojika et al., 1983; Ojika, 1987) (see also (Lecerf, 2002)). Assume that f(z) : CN —> CN is a polynomial system with an isolated singular root z*. Denote its N x N Jacobian matrix as J(z) : — df/dz. At the singular root, J(z*) will have rank r < N. This implies that the matrix equation J(z*)v = 0 has a linear solution set for v 6 f>w-i of dimension N — r — 1. We can pick out a unique point of this null set in P ^ " 1 by appending N — r — 1 homogeneous linear equations and dehomogenize by appending one more inhomogeneous equation. Equivalently, we can pick a random r-dimensional linear space to intersect the null space in a point in CN, that is, pick VQ,. .. ,vr G CN at
193
Endpoint Estimation
random and set v = v0 + Y^i=i \vu with unknowns Ai,..., Ar e C. Combining this condition with the system f(z), we have 2N equations in TV + r unknowns
S(*,A)=(
f(^r
) =0,
(10.5.5)
\J{z)(vo + J2z=i^iVt)J where A = (Ai,...,A r ). An initial guess for A can be found by standard linear algebra applied to J{z) at the estimated value of z* coming from the solution of system f(z) = 0. The system of Equation 10.5.5 has more equations than unknowns. It can be reduced to square using a randomization procedure (see § 13.5), but this is not necessary. We are only seeking a local solution, not forming a global homotopy, so it suffices to use Gauss-Newton iteration. This is identical to Newton's method except that the overdetermined iteration step is solved by least-squares (pseudoinversion). Let (z*,X*) denote the solution of g(z, A) = 0 that uniquely projects to the solution z* of f(z) = 0. It is not immediately clear that the multiplicity of (z*, A*) as a solution of g(z, A) = 0 will be lower than that of z* as a solution of f(z) — 0, but a proof of this is given in (Leykin et al., 2004), subject to the assumption that z* is an isolated solution of f(z) = 0. To desingularize an isolated root of multiplicity fi > 2, deflation may need to be applied multiple times. Indeed, in the case of n = 1, a single polynomial, the foregoing is exactly the same as the differentiation approach discussed in Subsection 10.5.1, where we saw that fi — 1 deflation steps are required. In the general case, the statement is that at most \i — 1 deflation steps are required. The fewer deflations required, the better, as each one adds more variables. The deflation process is local in the sense that different singular points of the same system may have different deflations. The singularities may differ not only in their multiplicities, but also in the rank of the Jacobian at each stage of deflation. An analysis of the numerical properties of deflation is not yet developed for the multivariate situation: there are no known formulae analogous to Equation 10.5.4 for the univariate case. Experiments reported in (Leykin et al., 2004) indicate that the approach is effective for a number of test cases having isolated singularities. In several variables, there is an additional concern that does not arise for just one variable. This is the possibility of positive dimensional solution sets. Deflation is only valid for isolated roots. This is a big drawback, because we have no clear way of deciding which singular endpoints in a homotopy are isolated and which ones are landing on positive dimensional sets. This issue will be treated further in Part III, where we consider the treatment of positive dimensional solutions. The frequent appearance of positive dimensional solution sets, especially at infinity, means that we cannot depend on deflation alone: the general purpose singular endgames remain necessary if we wish to find all path endpoints accurately.
194
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
10.6
Exercises
Exercise 10.1 (Power-Series Method) The power-series endgame is implemented in HOMLAB using samples for real t only. The control variables are set in htopyset .m and are described in § C.7.1. Systems of the form x y - l = 0,
Or + l) f e =0,
have a multiplicity k root at (x,y) = (—1, —1). • A total-degree homotopy has 2k paths. We have identified (—1,-1) as a multiplicity k root. Analytically determine the endpoints of the other k paths. • For k=3, solve the problem with HOMLAB by writing the system in tableau form and using the script totdtab to solve it with a total-degree homotopy. Does it give the result you expect? • Try similar problems for k = 2,3,4,... How high can you go and get good endpoint estimates? Pay attention to the setting of CycleMax. • The default setting is allowjump=l, which causes the endgame to also collect sample points for negative values of s by predicting across the ill-conditioned zone at the origin. Compare the performance of the endgame for allowjump=0 versus allowjump=l. You may set global verbose=l to get intermediate results from the endgame, see § C.7.2. Exercise 10.2 (Power Series Error Analysis) There are two sources of numerical error in the estimate produced by the power series method: truncation error due to the order of the fit and amplification in the fitting process of errors in the sample points. Formulate the fitting process as the solution of a linear system whose unknowns are the coefficients of the power series:
aM]T
for i = 1,..., M + 1. We may write this in matrix form as $ = 5a,
(10.6.6)
where $ is the column of sample values, S is the Vandermonde matrix whose (i,j)th element is s^~ , and a is the column of power series coefficients. The final estimate will be 0(0) = do- The condition number of the Vandermonde matrix affects how errors in the samples <j>(si) are transmitted to the estimate ao• For the same order fit, M, compare the condition number for the following sample patterns: (1) a geometric sequence s^ = R, XR, X2R,... for various A, (2) a symmetric, two-sided geometric sequence s, — ±R, ±XR, ±X2R,..., (3) the transformation of the two-sided sample set to fit a power series in w = s2,
195
Endpoint Estimation
(4) a circular sample, s, = ReV=T2m/(M+i)_ • How are the numerics affected by rescaling the fitting as
(pSi)2 • • • (p S i ) M ][a 0 ai/p
a2/p2 • • •
aM/pMf
with p = 1/R? • What sample pattern is best for a thin endgame operating zone, characterized by having an ill-conditioned region almost as big as the convergence radius? • Give two reasons why the Cauchy integral method is a good approach for endpoints with large winding numbers. Exercise 10.3 (Circular Sample Sets) • For an evenly-spaced circular sample set, s» = fie^-l2nt/(M+1)j find the sums ££+ 1 sJforfc = 0 ) l,2 ) ...,oc. • Show how this implies equivalence between the trapezoid rule for the Cauchy integral on evenly spaced circular samples and the power series fit to those points. • Show how this also implies that the average of all paths approaching the same endpoint is a holomorphic function (given by a power series with nonnegative integer exponents) of the path parameter t. Consider that there can be several subgroups of paths approaching the same endpoint, each subgroup having its own winding number. • Let S be the Vandermonde matrix, as in Equation 10.6.6, formed for the evenlyspaced circular sample. What is S1"1? Exercise 10.4 (Multiprecision) (open research topic: see (Bates, Sommese, & Wampler, 2005b)) The control settings for the endgame in HOMLAB reflect the fact that Matlab computes in double precision. How should these be changed if multiprecision arithmetic were available? If the precision of the arithmetic could be changed at will during the endgame, how should the endgame algorithm best use this capability? Exercise 10.5 (Deflation 1) The system x2 + y2 = 0,
x2 - y2 = 0
has a multiplicity four isolated root at (x, y) — (0,0). Show that one stage of deflation gives a nonsingular system defining the root. Exercise 10.6 (Deflation 2) Do the following for Griewank and Osborne's system of Equation 10.5.3. • Formulate Newton's method and experimentally observe that initial guesses near (0,0) diverge.
196
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
• Use HOMLAB to solve the system with the power-series endgame and observe the winding number of the origin (suggestion: use totdtab.m). • Use deflation to obtain a new system for which the origin is a nonsingular solution. • How many stages of deflation are required? How many variables does the final system have?
Chapter 11
Checking Results and Other Implementation Tips
This is a very short chapter to help those who might try to create their own continuation codes. These tips can also be useful in getting more secure results when using an existing code. Since continuation is a floating point numerical process, there is the possibility of several kinds of failure. The first step in correcting a failure is recognizing that it has happened. Sophisticated codes detect some failures automatically and take corrective action. Whether done automatically or manually, the basic techniques are similar. 11.1
Checks
There are two kinds of checks: local checks examine an endpoint in isolation using numerical analysis of the iterative method used in the endgame, whereas global checks use knowledge of the polynomial nature of the problem, primarily the fact that we expect to find all isolated solutions. If the path tracker fails mid-course, that fact should be flagged and a corrective action taken. See § 11.2 below. 11.1.1
Endpoint Quality Measures
Any numerical solution method should provide some measures of the quality of the solutions it produces. Let us assume we are solving the square system f{x) = 0 and x* is an estimate. An entire treatise could be written on how to analyze the accuracy of x*, but we will be very brief and simply list some useful indicators: Function Residual The size of the function value, |/(x*)|. This measure is affected by the scaling of the function, that is, if g(x) = 100/(x), then |p(x*)| gives a 100 times worse function residual than |/(x*)|, even though the error in the solution is the same. Even so, this gives a first look at whether the solution has been successfully computed. Newton Residual If we are using Newton's method to refine the endpoint, the 197
198
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
magnitude of the last step, \fx(x*)~lf(x*)\, is a good estimate of the distance between x* and an actual zero of f(x), providing that the Jacobian matrix is nonsingular. Endgame Residual If the endpoint is singular, the methods of Chapter 10 are preferred to Newton's method. Typically, the method is performed several times for successively smaller values of t as t —> 0. The distance between successive endpoint approximations, \x* — x*_1|, replaces the Newton residual as an accuracy estimate. Condition number The condition number of the Jacobian matrix, K,(fx(x*)), is a good measure of how singular the solution is. Using the 2-norm, it is the ratio of the largest to smallest singular value of the matrix. A large condition number indicates singularity. However, the value can be near one for a near-rank-zero matrix, having all singular values small, although these appear only rarely in practice. To signal these, the largest singular value, or any other matrix norm, \fx{x*)\, can be useful as an auxiliary measure. If one finds the complete list of singular values, say o\ >
Checking Results and Other Implementation Tips
199
It is typical that nonsingular solutions will attain very small Newton residuals, while the accuracy of singular ones will depend on the multiplicity of the root. Without a singular endgame, a double root usually attains only about half the accuracy of a nonsingular one. If the condition number is high enough (and we have taken care that the bad conditioning is not due to poor scaling of the equations), we can be relatively secure in classifying the root as singular and, if we are only looking for the nonsingular roots, it can be discarded. It is more satisfying, of course, to invoke a singular endgame and clean up the solution, if possible. Also, higher-precision arithmetic can be invoked to clarify the situation. 11.1.2
Global Checks
In addition to the measures above, which are computed for each endpoint separately, there are some checks that depend on the patterns of roots in the computed solution set. These are tied to the polynomial character of the problem. Path Crossing Check By using random complex numbers in our formulations, we ensure that, with probability one, the solution paths do not cross in the middle of the homotopy; only at the end might they merge together in a singularity. However, if two paths become sufficiently close, it is possible for the path tracking algorithm to jump from one to the other while still staying within the tracking tolerances. Thus, it is a good idea to stop at some small t and check if all the solutions are still distinct. That is, we pick some small te G (0,1) and do the tracking in two phases: first from t = 1 to t = te, then from te to 0. (A value of te =0.1 is typical.) If two solution estimates at te are very close, this indicates that the tracker jumped paths. Re-running just those paths with tighter tracking tolerance usually corrects the error. Multiplicity Check If one uses the power-series endgame of § 10.3.3, an estimate of the winding number, c, is obtained for each endpoint, and this implies that in the neighborhood of t — 0, this path is part of a cluster of c paths approaching the same endpoint. Since we are tracking a complete set of solutions paths, all c of them should be found. It is possible for more than one cluster to approach the same endpoint, so the check is to see if the total number of solutions approaching the same endpoint are compatible with the winding numbers assigned to them. Examples of valid clusters of winding numbers are {2, 2} (one cluster with winding number 2); {2,2,2,2}, two clusters, each with c = 2); and {2,2,3,3,3}, (two clusters, one each with winding numbers 2 and 3). One could go further to extract not just the endpoint of a path, i.e., the constant term of the power series, but also the next term in the power series to match endpoints into clusters. The Cauchy integral method of § 10.3.4 gives an even stronger check for matching up paths: each time the path tracker circles around the origin without returning to the original point generates another point in the cluster. We can check for the existence of such points in the incoming solutions.
200
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Multiple Run Comparisons If one runs the same problem two or more times with different choices for the random constants, the same results should be obtained. This principle can be invoked at several levels. • In a homotopy of the form h(z, t) = -ftg(z) + (1 —t)f(z) (see Theorem 8.3.1 for details), this means using a different value for 7. Then, one should obtain the exact same list of path endpoints, because although the tracking path has changed, it is a real-one-dimensional curve inside the same complex curve and its destination point is the same. The association of start points to endpoints likely will be permuted, however. If the endpoints from two such runs cannot be sorted to match up, then one or both are in error, and one can concentrate path re-runs on those paths whose endpoints have no match in the other set. • A stronger test than the above is to change the start system to another in the same class. The start systems described in Chapter 8 all contain random constants which can be reset to new values. Two such runs should have the same set of nonsingular endpoints, which can be compared. The singular endpoints will typically move, but usually these are not of primary interest. • For a parameterized family of systems, F(z; q) = 0, using the notation of Chapter 7, one may solve two instances for different, randomly chosen, values of the parameters q. The number of nonsingular roots should be constant, but, of course, their values will change. To cross check them, one can track paths from one to the other in a parameter homotopy F(z; tq\ + (l-t)«2)=0.
11.2
Corrective Actions
Points with good quality measures at t = te and which pass the path-crossing test are ready for the endgame. Those which fail on either count should be re-run from the beginning, t = 1 to t = te, with different path-tracking parameters. Paths that fail in one endgame might benefit from another. For example, the power-series endgame in double precision is only effective up to c — 4, while the Cauchy integral endgame has no such limit. But ultimately, the only way to compute some difficult endpoints is to increase the precision of the arithmetic. We briefly address these two issues next. How much extra effort should be devoted to corrective actions depends on one's aims. In an engineering problem, one might not care much about lost solution paths. This is especially true if the trouble is due to a nearly singular endpoint, as it may likely be useless for practical purposes anyway. However, if one is doing an initial run to solve a random-parameter example in preparation for repeated parameter continuations, then one wants to ensure that a full solution set has been
Checking Results and Other Implementation Tips
201
found. This is because there is no way to predict which of these starting solutions will lead to the desired answers in a subsequent application. 11.2.1
Adaptive Re-Runs
We saw in Chapter 2 that path tracking benefits greatly from using an adaptive step size in place of a fixed one. In a similar way, the remaining heuristic control parameters, such as the path tracking tolerance, can be made adaptive. Too small a path-tracking tolerance makes progress slow, while too large allows path crossing. This works hand in hand with the number of iterations allowed in each corrector step. For concreteness, let's say that the path-tracking tolerance is 10~4 and we allow up to three iterations in the corrector. Then, a path-crossing incident is often cleared up by decreasing the tracking tolerance to 1CT6, and if not, try decreasing the iterations allowed to just two. (We have found such settings effective when using double precision on systems of low-degree equations.) These kinds of re-run strategies are easily automated so that human intervention is not necessary. As tighter tolerances are set, it may be necessary to decrease the minimum step size allowed and increase the number of steps allowed, if such constraints are in place to cut off expensive paths. This presumes, of course, that one is willing to pay the extra computational cost to get the answer. If one is planning a large run with a path count on the order of 100,000 or more, it can be worthwhile to collect run statistics on perhaps 1% of the paths and make adjustments in the tracking parameters. Once the initial 1% runs well, the entire run can be launched with confidence, although automatic adaptive re-runs should be left in place. 11.2.2
Verified Path Tracking
Instead of controlling tracking by a tracking tolerance, one can instead use interval arithmetic (see § 6.1) to guarantee that the solution estimate stays in a unique convergence zone throughout, thereby having absolute assurance that path crossing cannot occur (Kearfott & Xing, 1994). This tends to give conservative step sizes, so it can be very expensive. 11.2.3
Multiple Precision
There are several difficult situations that may arise that are most simply resolved with multiple-precision arithmetic. One is the case of a generally ill-conditioned target system, which can be due to high degree equations or coefficients with widely different scales. Sometimes, high degree is the result of applying elimination to an initial system having many equations of lower degree, in which case it might be better to solve the initial system rather than the reduced one. For systems with
202
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
wide-ranging coefficients, such as the chemical systems presented in § 9.2, a scaling algorithm can help. But for some systems, there is no practical recourse except raising the precision of the arithmetic. A common situation is the existence of singular endpoints. As illustrated in Figure 10.1, the endgame operating zone is a disk minus an ill-conditioned region near t = 0. It can happen, especially for endpoints of high multiplicity, that the ill-conditioned region takes up a large portion (or all) of the convergence disk, thus preventing the endgame from succeeding. If the desired accuracy is held constant, higher-precision arithmetic shrinks the ill-conditioned zone and allows the endgame to succeed. (If one deploys multiple precision and makes the accuracy requirement more stringent simultaneously, the latter may cancel the former so that there is no net gain.) A final possibility is the phenomenon of path crossing. Although in theory there is a zero probability of two paths crossing, they can approach each other close enough to require higher precision to negotiate past the near collision. For small systems, it is acceptable to just pick new random constants and re-run the whole procedure, but for a large problem, one wouldn't want to throw away a significant investment of computation if a near collision should happen on some path late in the process. It would be better to detect ill conditioning in the middle of a path and increase precision on the fly, or lacking that capability, rerun the paths in question with, higher precision and tighter path-tracking tolerance. In a sense, singular endpoints are a case of this same difficulty, except we are not trying to slip by the collision, but instead we are aiming directly at it. As multiple paths approach the same endpoint, we need to keep from jumping from one to another so that the endgame attributes the correct angle in the s-plane to the samples, where sc = t. Extra precision may be needed to maintain accuracy. 11.3
Exercises
Exercise 11.1 (Checking) Revisit any problem from the exercises of previous chapters; the six-revolute inverse position problem of Exercise 9.5 might be a good choice. Do the following. • Run the problem using standard settings in HOMLAB and make histograms of condition number, function residual, and the homogeneous coordinate. Note that for any of these quantities, a histogram of the exponents of the values in scientific notation is more useful than a histogram of the values themselves. Use routine pathcros to check for path crossings among the points in xendgame, which is a list of the solutions for t — Endgame ^ ^- ^ s e P a t hcros again for the list of solution points, xsoln, at t = 0. For any occurrence of multiple paths having the same endpoint, check that the incoming paths have winding numbers consistent with the multiplicity check described above.
Checking Results and Other Implementation Tips
203
• Loosen the path tracking tolerance so that pathcros discovers path crossing errors. • Return the path tracking tolerance to its default value, but this time cripple the endgame by setting CycleMax=l. See what difference this makes in the histograms. Exercise 11.2 (Multiple-Run Checking) For any parameterized problem of your choice, do a multiple-run global check that shows that the nonsingular solutions for two independent total-degree runs match up under parameter homotopy.
PART III
Positive Dimensional Solutions
Chapter 12
Basic Algebraic Geometry
In this chapter we discuss the basic properties of the different sorts of algebraic sets that arise in the numerical solution of polynomial systems. The flexible "probabilityone" methods underlying the numerical approach to polynomial systems, developed in Chapter 13, are based on the fact that given any system of polynomials, the set of solutions breaks up into a finite number of irreducible components. Recall that we say that an affine, projective, or quasiprojective algebraic set Z is irreducible if ZTes is connected. The dimension of an irreducible algebraic set Z is defined to be dim Zreg as a complex manifold, which is half the dimension of ZTeg as a real manifold. Irreducible components, discussed in § 12.2 are nice sets that are almost manifolds. For example, the system f{x y)
' -[x(y>-x>)(v-2)(3x
+ y)\-°
^ ^
vanishes on the union of four irreducible components {x = 0} U {y2 - x3 = 0} U {(1,2)} U {(1, -3)}. It is a striking and powerful fundamental fact that the most general solution set is not much worse than this simple example. To even state this result, which is called the irreducible decomposition, we need to make precise what is meant by an algebraic set. The aim of this chapter is to familiarize the reader with the basic types of algebraic sets and their properties. Four types of algebraic sets are useful to us: affine algebraic sets, projective algebraic sets, quasiprojective algebraic sets, and constructive algebraic sets. The first three of these were introduced briefly in the introduction of Chapters 3 and 4. We consider them in more detail in the succeeding sections. In § 12.1, we revisit affine algebraic sets, i.e., the solution sets of systems of polynomials on C^, to discuss the topologies and the maps defined on them. In § 12.2, we discuss the irreducible decomposition for affine algebraic sets. Often polynomials are homogeneous, e.g., f(x,y) = x2 + y2, and in this case acknowledging that their solution set is naturally defined on P^ simplifies matters, 207
208
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
both conceptually and numerically. For this reason we introduced projective algebraic sets, i.e., solution sets on FN, in Chapter 3 and consider them further in § 12.3. Often we need to consider all points in a projective algebraic set X except for some that are in a second projective algebraic set Y, i.e., sets of the form X\(Xf~)Y), such as C2 \ {(0,0)}. These sets, which include affine algebraic sets and projective algebraic sets, are called quasiprojective algebraic sets. They are discussed in § 12.4. A map / : X —> Y between quasiprojective algebraic sets X and Y is said to be an algebraic map if the graph of / is a quasiprojective subset oi X xY: see § 12.4 and § A.4 for more details. Finally, we discuss constructible algebraic sets in § 12.5. These sets, which include all quasiprojective algebraic sets, may be defined as follows. Constructible algebraic sets A constructible algebraic set, or constructible set for short, is any set constructed from projective algebraic sets by a finite number of the Boolean operations of union, intersection, and complementation. Constructible algebraic sets prove useful for two reasons. First, many natural sets, e.g., images of algebraic sets or the set of points of the image of an algebraic map where the fiber is a given dimension, are not quasiprojective, but are constructible (see Theorem 12.5.6 and Lemma 12.5.9). Second, a constructible set A contained in a quasiprojective set X is quite close to being an algebraic set, e.g., the closure A of A in the complex topology is a quasiprojective algebraic subset of A (see Lemma 12.5.3), and there is a dense Zariski open set U of A contained in A (see Lemma 12.5.2). We end with § 12.6, a brief discussion of multiplicity of algebraic sets. Roughly speaking, this notion allows us to relate the algebraic degree of a system of equations to the degrees of the irreducible components of the system's solution set. For a single polynomial in several variables, this is a straightforward generalization of the phenomenon of multiple roots (double roots, triple roots, etc.) that may appear when factoring a polynomial in one variable. For systems of more than one equation, the situation becomes a bit more delicate, as we shall discuss. All four basic kinds of algebraic sets arise quite naturally in discussing the solutions of polynomials on CN, as we show by examples. We include in this chapter only the rudimentary facts about these different classes of sets, with further useful facts collected in Appendix A. As this book is focussed entirely on polynomial systems, we may sometimes drop the modifier "algebraic" and speak simply of "affine sets," "projective sets," etc., but meaning these in the algebraic sense. Before diving in, let's clarify briefly how quasiprojective sets include both projective and affine algebraic sets, and how constructible sets include them all. Since quasiprojective sets are of the form X \ (X n Y), where X and Y are both projective, they include projective sets as the special case where Y is empty. As for affine sets, recall that CN is equal to P^ minus its hyperplane at infinity, Hoo, which is
209
Some Concepts From Algebraic Geometry
a projective algebraic set equivalent to P"" 1 given by the homogeneous equation XQ = 0. So if A is an affine algebraic set defined as the solution of a polynomial system F(x), and B is the projective algebraic set defined by the homogenization of F(x), then A = B \ (B n #00) is seen to be quasiprojective. Finally, the defining form, X \ (X n Y), of a quasiprojective set is just a Boolean construction: we could rewrite it as X n (not V). So quasiprojective sets are a kind of constructible set. We now examine each type of algebraic set in more detail. 12.1
Affine Algebraic Sets
Naively, an algebraic set is nothing more than the common zeros of a set of polynomials. Making this precise and convenient to use takes some work. We start with a polynomial system ~ f(x)
:=
fi(xi,...,xN)~ :
(12.1.2)
Jn{x1,...,xN)_ consisting of n polynomials fi{x\,..., XJV) on CN contained in the ring C[zi,..., XN] of polynomials in the variables Xi, ..., xM with complex coefficients. We denote the set of common zeros on C by V(fi,. ..,/„) := { i e C w | / 1 ( i ) = 0;... i fn(x) = 0} . Such a set of common zeros is called an affine algebraic set. The word affine in "affine algebraic set" signifies that the set is a closed subset of Euclidean space, which is sometimes called affine space. For a system / as above in Equation 12.1.2, we usually abbreviate V ( / i , . . . , /„) by V(f). Example 12.1.1 The simplest polynomial system is p(x) = 0 where p(x) is a monic polynomial of degree d in one variable with complex coefficients, i.e., p(x) := xd + axxd-1
+ • • • + ad, k
with a,i £ C constants. As discussed in § 5.3, p(x) factors as \[(x — x,)Mi. Thus i=l
V(p) consists of the k complex numbers xt. The multiplicity of X{ equals /x, (see § 12.6 for further discussion of multiplicity). Thus p(x) = x3 — x2 = x2(x — 1) = 0 has a zero set consisting of 0 and 1. Unions of affine sets are affine, e.g., if A := V(f) for polynomials / := (A) • • • > fr) and B := V(g) for polynomials g := ( g i , . . . ,5s), then A U B is de-
210
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
fined by V({fl9j
| t = l,...,r; j =
l,...,s}).
Since any point is an affine set, i.e., (xf,... ,x*N) is denned by (xi — x*,... ,XN~X*N), we have that have that any finite set is an affine algebraic set. Lemma 12.4.3 will show that these are the only compact affine sets. For a single polynomial p(x\, £2) £ C[xi, X2] not equal to a constant, the solution set is a nonempty one-dimensional affine algebraic set. Example 12.1.2 A simple polynomial system on C2 is given by X\ = 0. Here the solution set is the X2-axis. It is worth emphasizing that passing from a system / of polynomials to V(f) throws away all multiplicity information. For example, on C, x5, and x define the same affine algebraic set V(x). Also note that CN is the affine algebraic set corresponding to the identically zero polynomial, and the empty set is the affine algebraic set defined by a constant polynomial. Here is a less trivial one-dimensional example of an affine algebraic set. Example 12.1.3
Consider the polynomial w — z2. The set V{w - z2) := {(z,w)eC2\w-z2=
0}
is a smooth connected two-real-dimensional manifold. Indeed, the mappings (z,w) 1—> z and z 1 — > (z,z2) show that there is a one-to-one correspondence between points (z,w) G V(w — z2) and z e C . Note that an m-dimensional complex manifold is a 2m-real-dimensional manifold, since C has real and imaginary parts. In this book, "dimension" always means complex dimension; otherwise, we explicitly say "real dimension." A map / : X —> Y from one affine algebraic set X c C^ to a second affine algebraic set Y C C M is said to be an algebraic map if there is a map F : CN — • > CM such that (1) F = (FU..., FM) with all the F* G C ^ , . . . , % ] ; and (2) / = Fx, the restriction of F to X. When it is clear from the context, we sometimes refer to an algebraic map as a map. We define an algebraic function on an affine algebraic set X to be an algebraic map from X to C. We say that two affine algebraic sets X C CN and Y C C M are isomorphic if there exist algebraic maps F : X —> Y and G : Y —> X such that F o G i s the identity on Y and G o F is the identity on X. Example 12.1.4 Let Y := Viw - z2) be as in Example 12.1.3 and let X := C. We have the map G : Y —> X given by G(z,w) = z and F : X —> Y given by F(z) = (z, z2) which shows Y and X are isomorphic.
Some Concepts From Algebraic Geometry
12.1.1
211
The Zariski Topology and the Complex Topology
Noting that given two systems / = {/i,... , / n } and g = {g\,... ,gm} of polynomials V{f)
U V(g) = V({fi9j\l
and
V(f)nV(g) = V(f,g), we conclude that affine algebraic sets in CN are closed under finite unions and intersections. Given an arbitrary, possibly infinite, set of polynomials on CN, the Noetherian property for ideals in C[zi,..., zjv] (see, e.g, (page 74 Cox et al., 1997)) guarantees that there is always a finite subset of the polynomials with the same common zeros on C^. This guarantees that an arbitrary intersection of affine algebraic subsets of C^ is an affine algebraic set. This implies that the set of affine algebraic subsets of CN that lie on a given affine algebraic set X C CN satisfy the axioms to be the closed sets of a topology on X, which is called the Zariski topology. Here the open sets U C X are the sets X \ Y, where Y C C^ is an affine algebraic set contained in X. Open sets in this topology are called Zariski open sets. Similarly the affine algebraic subsets of CN that lie on the given affine algebraic set X c CN are the Zariski closed sets of X.
Besides the Zariski topology, there is the complex topology, which is also called the classical topology. Given an affine algebraic set X C C^, the complex topology on X is the topology that X inherits from the usual Euclidean topology on C^, i.e., a basis of open sets on X at a point x* € X is given by the intersection of X with the balls
{xeCN | ||x-x*|| <e} for 0 < e e IR and M the Euclidean norm. Both topologies are useful. Since every closed set Y in the Zariski topology on an affine algebraic set X C CN is the zero set of a finite number of polynomials, it follows that Y is also closed in the complex topology. Thus the complex topology has at least as many open sets as the Zariski topology. Except for the case of X a finite set, the Zariski topology has many fewer open sets than the complex topology. For example, if X is one-dimensional, then the open sets of the Zariski topology are the complements of finite subsets of X, that is, X minus a finite number of points. For X = C, this follows immediately from the fundamental theorem of algebra, Theorem 5.1.1. The point to understand is that a statement about Zariski open sets is much stronger than one about open sets in the complex topology. In particular, a nonempty Zariski open set of an irreducible affine algebraic set X is dense, and therefore a property that holds on a nonempty Zariski open set of X holds with
212
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
probability one on random points of X, as was discussed in Chapter 4. For example, the nonempty open sets of C in the Zariski topology are the complements of finite sets, but for the complex topology, the interior of the unit disk is a possible open set. For more on the material in this section, (Red Book: Chapter 1.10 Mumford, 1999) is a good reference. In § 12.4, we discuss the quasiprojective algebraic sets, a very broad class of algebraic sets that includes both affine algebraic sets and Zariski open sets of affine algebraic sets. For now, we would like to point out that certain Zariski open sets of afHne algebraic sets may be identified with affine algebraic sets in different Euclidean spaces. Given a Zariski open set U of an affine algebraic set X c C ^ , we define the algebraic functions on U to be all functions of the form - where p,q G C [ x i , . . . , XJV] and V(q)nU = 0. Given a Zariski open set U on an affine algebraic set X C C ^ and a Zariski open set V on an affine algebraic set Y C C M , a map / : [ / — > V is said to be an algebraic map if / := FTJ where F : U —> C M is given by F := ( F 1 ; . . . , FM), with all of the Fj being algebraic functions. In line with the earlier definition of isomorphism in the case of affine algebraic sets, we say that U and V are isomorphic if there are algebraic maps F : U —> V and G : V —> U with F o G the identity on V and G o F the identity on U. If g is an algebraic function on an affine algebraic set X C C*, then X \ V(g) is isomorphic to an affine algebraic set. See Lemma A.2.4 for a proof of this useful fact. The Zariski open sets U of the form X \ V(g) are a basis for the Zariski open sets on X. To see this let Y := V(h\,..., hr) be an affine algebraic set on X. Then X\Y
= ur=1 (X \ v(hi)).
Not every Zariski open set of an affine algebraic set X is of the form X \ V(g), e.g., in Example A.2.3, we show that 0 C C w for iV > 2 is not of the form V(g) for a polynomial g. 12.1.2
Proper Maps
A continuous map / : X —> Y between topological spaces is called proper if for each y E Y, there is an open set U C Y containing y and such that U and f~x ([/) are compact. An algebraic map / : X —> Y between quasiprojective algebraic sets is called a proper algebraic map if / is proper as a continuous map in the complex topology. Proper maps are very nice, e.g., see § A.4. They also arise naturally when working in a probability-one framework. 12.1.3
Linear Projections
In this subsection, we give a brief introduction to linear projections: see § A.8 for more details.
Some Concepts From Algebraic Geometry
213
A linear projection ix: CN -> Cfc, N > k, is a surjective affine map 7r(x!, ...,XN)
= (LI(X),
. . . , Ljt(a:)),
(12.1.3)
where JV
Li(x) := al0 + ^2 a%ixh
a
ij
e
C.
We say that TT is a generic linear projection if the coefficients a^ are chosen "randomly." Precisely speaking, this only has meaning in the context of some property we are interested in. For example, in Theorem 12.1.5 below, we say that a generic linear projection restricted to X is proper, which means that there is a Zariski open dense subset of the a^ £ £kx{N+i) w^ t n e prOper^y t n a t ^ ne restriction to X of the linear projection, constructed from the Oy, is proper. Choosing a generic linear change of coordinates, i.e., choosing N generic linear maps to C, any projections along the coordinate axes is generic. The simplest example of a nontrivial linear projection 7r : C2 —> C is given by sending (2:1,£2) to X\. To see what this corresponds to in projective space, fix the ernbeddings • C2 into P2 given by sending (xi,x2) —> [l,Xi,x 2 ]; and • C into P 1 given by sending Xi —> [l,Xj]. We now have a commutative diagram C2 ^ P 2 \ {[0,0,1]} ni in' C ^P 1 where the map TT' : P 2 \ {[0,0,1]} —> P 1 is given by sending [xcXi,^] ~* [^Oj^i]Given two distinct points a, b £ FN, let (a, b) denote the unique line through them. The map TT' is often referred to as the projection from {[0,0,1]} because we can think of the map as sending each point i £ P z \ { [ 0 , 0 , 1 ] } to (x,[0,0,l])f){x 2 =0}. Intuitively we have a source of light at {[0,0,1]} and we send each point to the shadow it casts on {xi — 0}. With projections, we are perfectly happy to change the image by a linear transformation, and with this notion of equivalence, the projection is uniquely determined by the point {[0,0,1]}. The point {[0,0,1]} is called the center of the projection. Projections from points at infinity, i.e., points of the form [0, a, b], correspond to linear projections C2 —> C given by sending (x\, X2) to x\ — (a/b)x2 G C, as illustrated in Figure 12.1. From the point of view of projective space, there is nothing special about the points at infinity, and indeed on occasion, e.g., (Sommese, Verschelde, & Wampler,
214
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Fig. 12.1 Projection from point at infinity {[0, a, b]}
2001b) and (Calabri & Ciliberto, 2001), it is useful to project from points not at infinity. The case of a projection C2 —• C with a finite center is illustrated in Figure 12.2, where point c is the center of the projection. (We only draw the real part.) The set of all the lines through c are equivalent to a projective space P 1 , and the projection of a point x is the line P{x) through x and c. To perform calculations, we will often select a line, such as line L, and set TT(X) := LnP(x). No matter which line we choose in place of L, the essential fact is that all points along P{x) \ {c} have the same projection as point x. From this observation, it follows that the projection is determined uniquely by the center c.
Fig. 12.2 Projection with finite center c
We need the following important result, which is proven in § A. 10.4. Theorem 12.1.5 (Noether Normalization Theorem) Let X C CN denote an affine algebraic set. Let ix : CN —> Ck denote a generic linear projection. Then if dim X < k, the map nx is a proper algebraic map with allfibersT^x^iv)finitefor
Some Concepts From Algebraic Geometry
215
ally £ Y :=n{X). If dim X < k, then there is a Zariski dense subset U c X such that i\u : U —> TT(C/) is an isomorphism. If X is of pure dimension k, then nx is a branched covering of degree degX. 12.2
The Irreducible Decomposition for Affine Algebraic Sets
Given an affine algebraic set Z, we let Z reg denote the set of smooth points of Z. The set Z reg is an open set, dense in Z, with Z \ Zreg equal to a union of affine algebraic sets, which is why smooth points are also referred to as regular points. We say that Z is irreducible if Z reg is connected. We would like to follow the traditional, and very common, usage, e.g., (Mumford, 1995), and call an irreducible affine algebraic set an affine variety. It is unfortunate that affine variety has been used as a synonym for affine algebraic set by some authors. At this point it is safe to say that anyone picking up a book on algebraic or complex geometry must check whether varieties are irreducible or not (also reduced or nonreduced if that applies). For example, in (Mumford, 1995) affine variety means irreducible affine algebraic set, but in (Gunning & Rossi, 1965), a variety is a not necessarily irreducible reduced analytic set. The word variety is easier to say than irreducible algebraic set, but, to avoid confusion, we have reluctantly avoided use of this ancient word. The irreducible decomposition of an affine algebraic set Z C C is the decomposition Z := UaezZa obtained by first decomposing Zreg into the disjoint union of connected components Ua and letting Za denote the closure of Ua. Here I is just an index set assigning subscript numbers to the irreducible components. For many of our algorithms, it will be useful to group the irreducible components according to their dimensions, in which case we have index set Xi for dimension i, and we write Z-^UjLoZi,
Zi = UjeIiZij
(12.2.4)
where Zi is the union of all i-dimensional irreducible components of Z, and where Zij for j € Xi are the finite number of distinct irreducible components of Zi. Some of the Zi may be empty, that is, Z might not have components at every dimension. Indeed, the only possible component at dimension n is the whole of C n , which precludes any lower-dimensional pieces, so the decomposition is only interesting when Zn = 0. A simple example is given by Z := V{x\X2). This affine set is the union of the X\ and x2 axes, and since this set is clearly singular only at the origin, the irreducible decomposition of Z is Z = V{x{) U V{x2). The irreducible decomposition is a fundamental tool in understanding solution sets of polynomial systems. The primary aim of the remainder of this book is to show how to numerically find and manipulate this decomposition. (D'Andrea & Emiris, 2003) is a good place for obtaining an overview of symbolic algorithms for rinding the irreducible decomposition.
216
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Remark 12.2.1 (The algebraic situation) Though we will use the geometric approach to solution sets, there is a natural approach based on the underlying algebra of the polynomial system. Let 1{f) C C[x\,... ,XN] denote the ideal generated by the polynomials / i ( x i , . . . , XN), • • •, fn(xi> • • • > XN) making up the polynomial system /. Note that V(/) = V(J(/)). Given an affine algebraic set S c C N , let I(S) C C ^ ! , . . . , xN], denote the ideal of polynomials vanishing on S. An affine algebraic set Z is irreducible if and only if I{Z) is a prime ideal. Given any ideal J c C[xi,..., XN], V% the radical of I, is the ideal consisting of all / G C[xi,... ,XN] such that fk G I for some k. The irreducible decomposition is equivalent to the fact that for any ideal I C C[xi,..., XN], we can write \/T = C\a^j(Pa, where Va are the finite number of minimal prime ideals containing I. For example, y/l{x\x\) = I(x\X2) = I(xi) n I(x2). One weakness of the exact irreducible decomposition is that it assumes that the polynomials are exact and an algebraic set will be said to be irreducible even though it is for all practical purposes reducible. For example, let p{x,y) = xy — e. For e = 0, V(p) has two components, but for e ^ 0, V(p) is irreducible, even if e is so small that, in a problem arising in engineering or science, it is just noise. This sort of discontinuous behavior is not realistic for problems where data is never completely exact. For small e, numerical-geometrical methods will rather gracefully give different answers depending on the precision used. 12.2.1
The Dimension of an Algebraic Set
Using the irreducible decomposition, we can finish the definition of dimension. We define the dimension of an irreducible affine algebraic set to be the dimension of the smooth points, Xreg. Since the smooth points of an irreducible component are connected and dense, this is very natural. We say that an affine algebraic set X is pure-dimensional if all the irreducible components of X have the same dimension. We define the dimension dim^ Z of an affine algebraic set Z at a point x G Z to be the maximum of the dimensions of the irreducible components of Z that contain x. We define the dimension dim Z to be max dim-r Z. xez Here is a basic fact about dimension, which follows from the general result (Theorem III.C.14 Gunning & Rossi, 1965). Theorem 12.2.2 Let Z be an irreducible affine algebraic set Z C C^ of dimension k. Then given a polynomial f on CN which is not identically zero on Z, it follows that the dimension of every component of Z n V(f) is k — 1. Here are some points to be aware of.
Some Concepts From Algebraic Geometry
217
(1) Since the smooth points of an irreducible affine algebraic set Z are connected, it follows that given any point z G Z, every Zariski open neighborhood of z is irreducible. This can fail in the complex topology, as shown in the following example. Consider the curve Z := V(x2—Xi(xi + 1)) in the neighborhood of the point z = (0,0). The real part of this curve is shown to the right, where one may see that near the origin, the curve is 2 / resembles two lines, xi = ±xi, so in the local neighborhood it is not irreducible, even though globally the curve is one irreducible /~\/ piece. The solution set over the complexes is topologically a real \ Z ^ \ ~x[ two-plane stretched and bent such that two points touch each \ other. Local to the point of contact, it looks like two disks touching transversely, but globally it is all one surface. This is discussed in more detail in Example A.4.18. (2) Real points of irreducible algebraic sets do not have to be connected, nor do the components have to have the same dimensions. V{x\ — X\{x\ — l)(xi — 2)) is an example of the former and V{x\ — x\(x\ ~ 2)) is an example of the latter. Nor does there have to be much relation between degrees and number of real isolated zeros. For example, following (Example 13.6 Fulton, 1998), let p(x, y) := U^{x
- if + Iif=1(y - j) 2 .
We have m2 zeroes on R2 despite degp(x,y) = 2m. Over C, we have a curve with these m2 points all singular. 12.3
Further Remarks on Projective Algebraic Sets
Though, for applications, affine algebraic sets are the main interest, we must also define projective algebraic sets. We need them to be able to discuss what happens at infinity for a given polynomial system, and in particular to be able to carry out accurate counts of solutions of polynomial systems. Also the behavior of projective algebraic sets is often easy to understand, e.g., see the Proper Mapping Theorem A.4.3, and they can be used to understand the behavior of affine algebraic sets. In this section we continue the discussion of projective sets started in § 3.5. FN is a compact manifold containing CN as a dense open set. The natural approach to the definition of algebraic sets on WN is to define them as the solution sets of finite numbers of whatever are the analogue for FN of polynomials on C^. At first glance this does not look hopeful, since we cannot expect any nontrivial global algebraic functions. To see this consequence of the compactness of P™, consider the representative case of P 1 . Polynomials on C are holomorphic functions, and so under any reasonable definition, an algebraic function / on P 1 should be a holomorphic function. The
218
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
snag is that since P^1 is compact, it follows from continuity that |f(x)| has a maximum at some point x* in P^1. Thus, by the Maximum Principle, Lemma A.2.7, f(x) must be constant on an open neighborhood of x*, and therefore on all of P^1. At first sight this is discouraging, but the key insight is that although there is no reasonable class of algebraic functions on P^N, there are some "almost functions" lying around, i.e., the homogeneous polynomials. It is important to realize that, even though homogeneous polynomials are not functions on projective space, they behave as "extensions" to P^N of polynomials on C^N. Later we will return to homogeneous polynomials in § A.13 and see that they are the prototypical nontrivial example of "sections of line bundles." Before we give definitions, let's work out a simple representative example. Let p(x_1, x_2) = x_1^2 - x_2 + 1 be a function on C^2. Regarding C^2 as the coordinate patch U_0 ⊂ P^2 as above, we have in terms of the homogeneous coordinates [z_0, z_1, z_2] on P^2 that x_1 = z_1/z_0 and x_2 = z_2/z_0. Thus the function x_1^2 - x_2 + 1 is represented by
(z_1/z_0)^2 - (z_2/z_0) + 1 = (z_1^2 - z_0 z_2 + z_0^2) / z_0^2.
Under the identification of U_0 with C^2, it is easy to check that the closure in P^2 of the zero set V(p) is the zero set V(f) of the homogeneous polynomial f(z) := z_1^2 - z_0 z_2 + z_0^2.
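A small computational sketch of this homogenization step follows. It is not from the book's HOMLAB code; it assumes sympy is available and uses Poly.homogenize with a new symbol z0 of our own choosing.

```python
# Homogenize p(x1, x2) = x1**2 - x2 + 1 and check the dehomogenization.
from sympy import symbols, Poly, simplify

x1, x2, z0 = symbols('x1 x2 z0')

p = Poly(x1**2 - x2 + 1, x1, x2)

# Expected: Poly(x1**2 - x2*z0 + z0**2, x1, x2, z0), i.e., after renaming
# x1 -> z1, x2 -> z2 this is f(z) = z1**2 - z0*z2 + z0**2.
f = p.homogenize(z0)
print(f)

# Setting z0 = 1 (the patch U0) recovers p, so the difference is 0.
print(simplify(f.as_expr().subs(z0, 1) - p.as_expr()))
```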
The following two examples indicate that counting solutions in C^2, even when we just have points, is not so clear cut as on C.

Example 12.3.1 Consider the system
f(x, y) := [ x - y^2 ; ax + by + c ] = 0.
The reader can check that if a ≠ 0, then there are two solutions to f(x, y) = 0 (counting multiplicities in the obvious way when b^2 - 4ac = 0). But what about the case a = 0, b ≠ 0, where we only have one solution?

Example 12.3.2
Consider the system of two polynomials on C^2
f(x, y) := [ y ; y - 1 ] = 0.    (12.3.6)
We expect two lines to meet in a point, but these two parallel lines do not. We already met similar systems in Chapter 3, so we know that the key to simplifying solution counts is to homogenize the systems. In this way, Example 12.3.1 becomes
g(w, x, y) := [ wx - y^2 ; ax + by + cw ] = 0,    (12.3.7)
which now has, for a = 0, a second solution point at infinity of [w, x, y] = [0, 1, 0] in P^2, formerly "missing" from the affine version. Similarly, Example 12.3.2 becomes
g(w, x, y) := [ y ; y - w ] = 0,    (12.3.8)
which now has the solution point at infinity along the x-axis, [w, x, y] = [0, 1, 0] in P^2. Note that Example 12.3.2 shows that if we have a system f on C^N, then the closure in P^N of the set of solutions of f may be smaller than the set of solutions V(f̄) of the associated system f̄ of homogeneous polynomials on P^N. In that example, V(f) is empty, so its closure is too, whereas V(f̄) is the point {[0, 1, 0]}. It is easily checked that V(f̄) ∩ C^N = V(f).
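The behavior of Example 12.3.1 can be checked symbolically. The sketch below uses the reconstruction of that example given above and assumes sympy; the specific degenerate coefficients a = 0, b = 1, c = -2 are our own choice.

```python
from sympy import symbols, solve

w, x, y = symbols('w x y')
a, b, c = 0, 1, -2                    # a degenerate case: a = 0, b != 0

g = [w*x - y**2, a*x + b*y + c*w]     # the homogenization, Equation 12.3.7

# Affine patch w = 1: only one finite solution when a = 0.
print(solve([e.subs(w, 1) for e in g], [x, y]))          # [(4, 2)]

# At infinity (w = 0): the first equation forces y = 0 and the second is
# automatic since a = 0, giving the extra root [w, x, y] = [0, 1, 0].
print(solve([e.subs(w, 0) for e in g] + [x - 1], [x, y]))  # [(1, 0)]
```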
12.4 Quasiprojective Algebraic Sets
Sets of the form X \X nY, where X, Y C FN are projective algebraic sets, are called quasiprojective algebraic set. These include sets of the form X \ X nY, where X, Y c P ^ are affine algebraic sets. The simplest nontrivial example of a quasi-projective algebraic set which is neither projective nor affine is C 2 \ 0. As with affine algebraic sets, we can with no changes define the Zariski and complex topology and the notion of irreducibility. The following is a basic fact. Theorem 12.4.1 Let U be a Zariski open dense subset of a quasiprojective algebraic set X. Then the closure of U in X in the complex topology is X. Proof. This follows immediately from (Theorem 2.33 Mumford, 1995).
•
Finally we note that all the basic results such as the irreducible decomposition of § 12.2 hold for quasiprojective algebraic sets (respectively projective algebraic sets) and not just for affine algebraic sets. The only difference is that the irreducible components in this generality are not affine algebraic sets, but are only quasiprojective varieties (respectively projective algebraic sets). Using this we carry over all the definitions of dimension. For example, a pure-dimensional quasiprojective set is a quasiprojective set with all irreducible components having the same dimension. Let X and Y be quasiprojective algebraic sets. We define an algebraic map f : X —> Y between X and Y to be a map such that for all x € X and j e F there are affine open sets U C X containing x and V C Y containing y such that f(U) C V and / : U —> V is algebraic. The set X x Y is a quasiprojective set, which may be shown by elaborating on § A. 10.2. The graph of a map f : X —> Y is the set Graph(f) := {(x,f(x)) e X x Y | x G X}.
It may be shown that an equivalent definition for an algebraic map f : X —• Y between quasiprojective algebraic sets is that / is a map from X to Y such that the graph Graph(f) C X x Y of / is a quasiprojective algebraic subset of X x Y. The following fact is useful. Theorem 12.4.2 The complement of a proper quasiprojective algebraic subset Y in an irreducible quasiprojective set X is connected. If a quasiprojective set X is connected, then X is path connected. Proof. The first assertion follows immediately from (Chapter 4, Corollary (4.16) Mumford, 1995). The second assertion would follow if we knew it for irreducible quasiprojective algebraic sets. Given any irreducible quasiprojective set, there is a connected smooth manifold mapping onto it by Hironaka's Desingularization Theorem A.4.1. Since • connected manifolds are path connected, we are done. Few algebraic sets are both affine and projective. Lemma 12.4.3
Let X C CN denote a compact affine set. Then X is finite.
Proof. To see this assume otherwise. By the irreducible decomposition from § 12.2, we know that if X is compact and not finite, then X contains a compact irreducible infinite affine algebraic set. We can assume without loss of generality that X is this set. The absolute value of any coordinate function z_i restricted to X has a maximum on X. By Lemma A.4.2, the restrictions of all the coordinate functions are constants, and hence X is a single point, a contradiction. □

12.5 Constructible Algebraic Sets
Let us start with an example leading to a constructible set.

Example 12.5.1 Suppose we were interested in the family of systems of polynomials in C[x, y]
F_{(t,u)}(x, y) := [ x - t ; xy - u ] = 0,    (12.5.9)
parameterized by (t, u) in C^2. The set of (t, u) in C^2 where F_{(t,u)}(x, y) = 0 has a nonempty solution set is
{(0, 0)} ∪ {t ≠ 0}.
This set is not quasiprojective, but it is constructible. Let X be a quasiprojective algebraic set. Let A(X) denote the set of closed algebraic subsets of X. A(X) is closed under finite unions and arbitrary intersec-
tions. The set T(X) of complements of the elements of A(X) are the open sets of the Zariski topology of X. The set C(X) of constructible sets of X is the smallest set of subsets of X that • contains A(X) and • is closed under a finite number of Boolean operations, where the Boolean operations are union, intersection, and sending a subset of X to its complement in X. Otherwise said, C(X) is the Boolean algebra of subsets of X generated by A{X) (or equivalently T{X)). Constructible sets are the outer limits of the type of sets that need to be considered in the numerical analysis of polynomial systems. We will see that they arise naturally when working with affine algebraic sets. We present here a few key facts about constructible sets. A fuller discussion may be found in (Chaps. AG.l and AG.10 Borel, 1969). Lemma 12.5.2 Let X be a quasiprojective algebraic set. Assume that A C X is a constructible set such that A = X, where the closure is in the Zariski topology. Then there exists a Zariski open and dense set U C X such that U C A. Proof. See (Proposition in Chap. AG.2 Borel, 1969)
•
Lemma 12.5.3 Let A be a constructible subset of a quasiprojective algebraic set X. Then the closure of A in the complex topology and in the Zariski topology are the same. Proof. Use Lemma 12.5.2 and Lemma 12.4.1.
•
When we take closures of constructible sets (and almost every set that comes up in this book is at worst constructible) this lemma tells us it does not matter whether we use the complex or Zariski topology: in either case we get the same algebraic set. For this reason, we often do not specify which topology we are taking the closure in. It is useful to record the trivial case when a constructible set is automatically algebraic, a corollary to Lemma 12.5.3. Lemma 12.5.4 Let A be a constructible algebraic subset of an affine (respectively protective, respectively quasiprojective) set. If A = A, e.g., if A is closed in the complex topology, then A is an affine (respectively projective, respectively quasiprojective) set. Example 12.5.1 is a simple and fairly typical example of a constructible set. Here it is said a slightly different way. Example 12.5.5 Consider the map F : C2 —> C2 which sends (z,w) —• (z,zw). This is a nice algebraic map, but the image is (C2 \ {z = 0}) U {(0,0)}, which is
neither the set of zeros of a set of polynomials nor the complement of such a set. This is about the worst the image of an algebraic map gets. Theorem 12.5.6 (Chevalley's Theorem) Let F : X —> Y be an algebraic map between quasiprojective algebraic sets. If Z G C(X), then F(Z) s C(Y). Proof. See (Corollary AG.10.2 Borel, 1969).
a
Chevalley's Theorem is one of the features distinguishing algebraic geometry from complex analytic geometry, e.g., holomorphic maps are too wild to admit any such result. Corollary 12.5.7 Let f : X —> Y be an algebraic map between quasiprojective algebraic sets. Then given any irreducible component B' of f(X), there is an irreducible component A of X with f(A) = B'. In particular, if X is irreducible, then f(X) is irreducible. Proof. By Chevalley's Theorem 12.5.6 we know that B := f(X) is algebraic. We first show the special case when X is irreducible. We have the irreducible decomposition B = U*j=1Bj for some positive integer r. Since X is irreducible and contained in \Jj=1f~1(Bj), we conclude that X = f~l(Bj) for some j . Thus f{X) CB3 and B = By For the general case assume that X has an irreducible decomposition \J\=1Xj for some finite s. We have the irreducible decomposition B = U^=1Bj for some positive integer r. By the last paragraph, f(Xi) is irreducible. Since for any j we have that Bj C B = Uf=1/(Xj), we conclude that Bj C f(Xi) for some i. Since any component of the irreducible decomposition of an algebraic set B is not contained • in any larger irreducible algebraic subset of B, we conclude that Bj = f{Xi). Maps of algebraic sets that "should" be surjective often fail to be because the domain lacks some points at infinity. For example, the map from V{zw — 1) C C2 to C obtained by sending (z,w) —> z misses z = 0. Example 12.5.5 is also of this sort. For this reason, it is often more useful to use the notion of a dominant map. A map / : X —> Y between quasiprojective algebraic sets is called dominant if f(X) = Y. Lemma 12.5.8 Let f : X —>Y be a dominant algebraic map from a quasiprojective set X to an irreducible quasiprojective set Y. There exists a Zariski open dense set V C Y contained in f(X). Proof. This is an immediate consequence of Theorem 12.5.6 and Lemma 12.5.2. • Thus using the Upper-Semicontinuity of dimension Theorem A.4.5 and Chevalley's Theorem 12.5.6, we have the following result.
Lemma 12.5.9 Let f : X → Y be an algebraic map of quasiprojective algebraic sets. Then for any integer k, the set {y ∈ Y | dim f^{-1}(y) > k} is constructible.
12.6 Multiplicity
Multiplicity appears in numerous places in algebraic geometry. In its simplest form, it is very easy to understand, e.g., given a not identically zero polynomial of one variable p(x) in C[x], the multiplicity of a point x* in V(p) is the integer μ > 0 such that p(x) = (x - x*)^μ q(x) for a polynomial q(x) with q(x*) ≠ 0. In several variables, the story for a single polynomial is the same. Let p(x_1, ..., x_N) in C[x_1, ..., x_N] be a not identically zero polynomial on C^N. The irreducible decomposition of V(p(x)), the solution set of p(x) = 0, is a decomposition
V(p) = ∪_{i=1}^{r} Z_{N-1,i},
where the Z_{N-1,i} are distinct affine algebraic varieties, i.e., distinct irreducible affine algebraic sets. Moreover, dim Z_{N-1,i} = N - 1 for all i and there are polynomials q_i(x) such that
(1) Z_{N-1,i} = V(q_i);
(2) the multiplicities of the solutions of the one-variable polynomial obtained by restriction of q_i(x) to a generic line are all one; and
(3) p(x) = q_1(x)^{μ_1} ⋯ q_r(x)^{μ_r}.
This is a satisfying description of the multiplicity of a component, although already the situation is not so easy to prove as in one variable. What about the multiplicity of an isolated solution x* of a polynomial system
f(x) := [ f_1(x_1, ..., x_N) ; ⋮ ; f_n(x_1, ..., x_N) ] = 0 ?    (12.6.10)
The difficulties with multiplicities begin when we have a set defined by more than one polynomial. Theorem 12.2.2 implies that for this system to have an isolated solution, n must be at least N. Perhaps the simplest example is given by the system
z_1 = 0,  z_2^2 = 0.    (12.6.11)
It is completely reasonable to say that the origin (0, 0) is a multiplicity 2 solution of this system. Indeed, since z_1 = 0 defines the z_2-axis, and since the restriction of z_2^2 = 0 to the z_2-axis has 0 as a multiplicity 2 root, we must either have that (0, 0)
is a multiplicity 2 solution of the system in Equation 12.6.11 or give up any sort of reasonable compatibility with the already defined notions. We define the multiplicity of x* as a solution of the system to be the dimension μ of the finite-dimensional vector space O_{C^N, x*} / (f_1, ..., f_n), where
(1) O_{C^N, x*} is the ring of convergent power series centered at x*; and
(2) (f_1, ..., f_n) is the ideal of O_{C^N, x*} generated by the polynomials f_i.
It is straightforward to see that when n = N = 1 this agrees with the notion of multiplicity that we are used to, but it is certainly not clear what this means when N > 1. Also, why convergent power series? It turns out this is just a convenience for us. One could use instead formal power series, or the ring of rational functions p(x)/q(x) with q(x*) ≠ 0. But the equivalence of the multiplicities obtained these different ways is not obvious! In the special case n = N, μ has a simple geometrical interpretation. If x* is a multiplicity μ isolated solution of f(x) = 0 and you choose a generic vector v in C^N sufficiently near 0, then f(x) = v has exactly μ nonsingular isolated solutions x_1^*, ..., x_μ^* near x*. By nonsingular we mean that the Jacobian matrix, J, with elements
J_{ij} = ∂f_i(x) / ∂x_j,
is invertible at each of x*,..., x*. This, in fact, implies that ji = 1 is equivalent to the solution x* being nonsingular isolated. Another consequence of this, in the case n = N, is that with appropriate homotopies of the sort we construct, the number of paths ending at x* equals the multiplicity. Unfortunately, when n ^ N, the meaning of multiplicity becomes a bit more obscure, and not so closely connected to geometric intuition. This is a reflection of the complexity of the nonreduced structures on points in higher dimensions, i.e., the zero dimensional nonreduced schemes. Since we do not make much use of multiplicity we do not pursue this. If you do, you need to put multiplicity into a broader context of Hilbert functions, e.g., see the discussion of multiplicity in (Hartshorne, 1977), and in particular (Exercise V.3.4c Hartshorne, 1977). The books (Eisenbud, 1995; Fulton, 1998) are good algebraic references. See also (Bates, Peterson, & Sommese, 2005a) for a numerical-symbolic algorithm for computing multiplicity. Multiplicity for us arises in another way. Consider C := V(x\ - x\) c C2. The multiplicity of C as a component of the solution set of x\ — x\ = 0 is 1. In this case, it is useful to attach a multiplicity to each point of C. We define the multiplicity of a point x* € C as a point of C to be the multiplicity of x* as an isolated solution
x* of the system obtained by appending to the defining equation of C the linear equation
a_0 + a_1 x_1 + a_2 x_2 = 0,
where V(a_0 + a_1 x_1 + a_2 x_2) is a generic line vanishing at x*. An excellent and very readable reference for this sort of multiplicity is (Chapter 8 Fischer, 2001).
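The geometric interpretation of multiplicity in the case n = N is easy to see numerically. The sketch below, with helper names of our own, perturbs the system of Equation 12.6.11 to f(z) = v for a small random complex v and checks that the multiplicity-2 root at the origin splits into 2 nonsingular roots.

```python
import numpy as np

def f(z):                       # the system z1 = 0, z2**2 = 0
    return np.array([z[0], z[1]**2])

def jac(z):
    return np.array([[1.0 + 0j, 0.0], [0.0, 2.0 * z[1]]])

rng = np.random.default_rng(0)
v = 1e-6 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))

# Solutions of f(z) = v can be written down explicitly here.
roots = [np.array([v[0], s * np.sqrt(v[1])]) for s in (+1, -1)]

for z in roots:
    print(np.linalg.norm(f(z) - v),             # ~0: it solves f = v
          abs(np.linalg.det(jac(z))) > 1e-12)   # True: nonsingular
```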
12.7 Exercises
Exercise 12.1 (Solution Components) Solve the system on page 207 using a total-degree homotopy. Do you get points on every component? How many?

Exercise 12.2 (Projection from a Point) Write out the formula for a projection C^2 → C from center c onto the line {x_2 = 0}.

Exercise 12.3 (Composition of Projections) Write the projection C^N → C^{N-1} from center c in C^N onto the hyperplane {x_N = 0}. For points c_1, c_2 in C^N, consider the projection C^N → C^{N-2} given by the composition of the projection π_1 : C^N → C^{N-1} from center c_1 followed by the projection π_2 : C^{N-1} → C^{N-2} having π_1(c_2) as center. Is the result the same or different if we reverse the order of c_1 and c_2?

Exercise 12.4 (Dimension of an Affine Algebraic Set) Let Z be the solution set of the initial example on page 207. What is the dimension of Z? What is dim_{(1,2)} Z?

Exercise 12.5 (Classifying Sets) Classify each of the following sets as affine, projective, quasiprojective, or constructible. Remember that the classifications are not mutually exclusive.
(1) V(xy - y - 1).
(2) The image of V(xy - y - 1) under the projection (x, y) ↦ x.
(3) V(x^2 + y^2 + yz, xz - 2z^2).
(4) The set of quadratic equations in one variable that have two distinct roots.
(5) The nonsingular solution points of y^2 - x^2(x - 1) = 0.
(6) Points in C^2 that are not nonsingular solutions of y^2 - x^2(x - 1) = 0.
(7) Pairs of points in C^2 such that there is a unique line containing them.
(8) Pairs of points as in the previous item such that the line contains the origin.
Exercise 12.6 (Real Solution Points) Verify the statements in item 2 on page 217 concerning the real points of the two algebraic sets mentioned there.
Exercise 12.7 (Multiplicity of an Isolated Solution) Prove that the multiplicity of (0, 0) as a solution of
z_1 = 0,  z_2^2 = 0
is 2. (Hint: n = N.) Demonstrate this numerically using HOMLAB.

Exercise 12.8 (Multiplicity of an Irreducible Affine Set at a Point) Show that the multiplicity of a point x* of the curve C considered at the end of § 12.6 is 1 for all points but x* = (0, 0), for which it is 3. Demonstrate these facts numerically using HOMLAB.
Chapter 13
Basic Numerical Algebraic Geometry
Our overarching goal is to numerically encode an algebraic set Z in a form that allows us to answer such basic questions as:
Membership Is point x in Z?
Dimension What is the dimension of Z?
Degree What is the degree of the pure i-dimensional component of Z?
Decomposition What are the irreducible components of Z?
This is just a beginning, however, for suppose we have a similar encoding for a second algebraic set Y. Then, we would like to answer:
Inclusion Is Y a subset of Z?
Equality Is Y equal to Z?
Finally, we would like to propagate the encoding through Boolean binary operations, that is, if we have encodings for algebraic sets Y and Z, we would like to:
Union Find an encoding for X = Y ∪ Z; and
Intersection Find an encoding for X = Y ∩ Z.
(We regard the third Boolean operation of complementation as just the negation of the membership test.) Numerical algorithms to answer these questions form the foundation of numerical algebraic geometry. Typically, we begin not with an algebraic set, but rather, with a system of polynomials f(x) : C^N → C^n,
f(x) = [ f_1(x) ; ⋮ ; f_n(x) ] = 0.    (13.0.1)
Then, our object of study is the solution set of f = 0, which we often write as
Z = V(f).
As discussed in § 12.2, we know that any affine algebraic set decomposes as
Z := ∪_{i=0}^{dim Z} Z_i,   Z_i := ∪_{j ∈ I_i} Z_{ij},    (13.0.2)
where Zj is the union of all i-dimensional irreducible components of Z, and where Zij for j £ Xi are the finite number of distinct irreducible components of Zi. Geometrically, the Z^ are the closures of the connected components of the set of manifold points of Z. The algebraic set Z might be the entire solution set V(f) or it might be the union of several of its irreducible pieces. In the latter case, once we have built our encoding for Z, we wish to answer all our questions about Z alone, as if all the components excluded from Z did not exist. The purpose of this chapter is to motivate and describe an encoding of algebraic sets that we call witness sets. A first look at these is given in an introductory section, § 13.1, without worrying about how we can compute them or even justifying that they are well denned. In § 13.2, we present basic theory concerning the intersection of irreducible components with linear spaces, which is the underpinning for our formulation of witness sets, given precisely in §13.3. Next in §13.4, we define the rank of a polynomial system and present a fast algorithm to compute it. Then, in § 13.5, we show how the solution set of a system of polynomial equations relates to the solution set of a system of random linear combinations of those same polynomials. This prepares us for an algorithm in §13.6 to compute a loose inclusion of witness sets, called witness supersets. The final section, § 13.7, uses these concepts and procedures to obtain numerical methods to answer several of our basic questions itemized above. Much of this Chapter is based on the article (Sommese & Wampler, 1996), where the subject Numerical Algebraic Geometry was started, and its name coined. The name was chosen to indicate that this subject would be to algebraic geometry what numerical linear algebra is to linear algebra. After this chapter, one major problem remains before we can compute the numerical irreducible decomposition, Equation 13.1.3, namely, the witness point supersets are only a crude approximation to the numerical irreducible decomposition. A lesser problem is that the procedures given in this chapter for finding the witness point supersets are not as efficient as we would like. The two chapters following this chapter show how to solve these problems. Chapter 14 gives the efficient algorithm of (Sommese & Verschelde, 2000) to find the witness point supersets. Chapter 15 gives efficient algorithms (Sommese & Wampler, 1996; Sommese, Verschelde, & Wampler, 2001c, 2002b) to process the numerical irreducible decomposition out of the witness point supersets. The notion of witness set has developed over time. Originally in (Sommese & Wampler, 1996) and continuing through (Sommese & Verschelde, 2000), the central notion was that of generic point of a component, though all the information contained in what we now call witness sets was being computed and used. In the successive articles (Sommese et al., 2001c, 2002b), the notion of irreducible witness sets was distilled out as the essential numerical output of our algorithms. The enriched version of the witness sets for nonreduced components, presented in this chapter for the first time, is based on the experience gained from (Sommese, Verschelde, &
Wampler, 2002a).

13.1 Introduction to Witness Sets
What should we adopt as our numerical encoding of algebraic sets? Let's begin by considering the simplest case, a zero-dimensional algebraic set Z. This is just a finite set of points, so we can use as our encoding a list of the points. When we are given a system of N polynomial equations in TV unknowns, the methods of Part II allow us to find a numerical approximation to all nonsingular solution points, and in fact, those methods give us a list of homotopy path endpoints that includes all isolated singular solution points as well, although we cannot readily sort these out from singular endpoints on higher dimensional components. Nevertheless, we have some confidence that the encoding of Z as a list of solution points is computable. Moreover, up to the approximation of numerical roundoff, we can easily answer all our questions about membership, union, intersection, etc. The subtlety regarding isolated singular solutions is a concern that we can resolve by considering the larger picture that includes higher dimensional components. But what shall we do when Z is positive dimensional? Looking at natural examples, e.g., the set V{x{) C C2, there are two obvious ways of encoding the points of these algebraic sets. The first approach is to use a parametric representation, e.g., representing V(xi) C C2 as {t £ C | (xi,x2) = (0,t)}. Unfortunately, while parametric representations are very useful, they are also rare. For example, in Remark A.2.10, we sketch an argument showing that a curve as simple as V{x\—x\{x\ — l)(xi —2)) has no parametric representation. A nice discussion of which curves have a parametric representation may be found in (Abhyankar, 1990). A second approach is to use denning equations. Since by definition, algebraic sets are solution sets of polynomial equations, we know that this approach has to work. Indeed, this is the approach taken in computational algebra. Low degree equations vanishing on an algebraic set are nothing to scoff at: they can be very useful. Unfortunately, computing denning equations is numerically expensive. Furthermore such equations can be numerically unstable. Numerical Algebraic Geometry rests on a third approach, using the notion of witness sets. This natural data structure to encode algebraic sets is based on the concept of generic points and the classical notion of a linear space section. Since we are going to talk often about linear subspaces of CN, it is convenient to introduce a shorthand notation for them. We use the following conventions: • Z/dL*J c C^ denotes an affine linear subspace of dimension i; and • Lcr*l c C^ denotes an affine linear subspace of codimension i, or equivalently, of dimension N — i. Depending on context, it is sometimes easier to use the notation of codimension
instead of dimension, which is why we introduce both. Consider two generic linear subspaces, L_1 and L_2. Their dimensions add under the operation of union, while their codimensions add under the operation of intersection. If their dimensions are complementary, their intersection is zero-dimensional; i.e., L^{d[i]} ∩ L^{c[i]} is a point. The following fact, demonstrated in § 13.2, is the foundation of the notion of a witness set. Let A ⊂ C^N be a pure i-dimensional algebraic set. Given a generic affine linear subspace L^{c[i]} ⊂ C^N, the set A ∩ L^{c[i]} consists of a well-defined number d of points lying in A_reg. The number d is called the degree of A and denoted deg A. We refer to A ∩ L^{c[i]} as a set of witness points for A, and we call L^{c[i]} the associated slicing (N - i)-plane, or just slicing plane for short. The number of witness points tells us the degree of the set A, and if we determine the codimension of the slicing plane that cuts A in isolated points, we have determined the dimension of A. However, to answer most any other question about A, such as to test whether a given point x is in A, we need the ability to track the paths of the witness points as the slicing plane is moved. When A is a pure i-dimensional reduced component of a polynomial system f, then the witness points are nonsingular roots of f restricted to L^{c[i]} (more on this in a moment), so the data structure W := (A ∩ L^{c[i]}, L^{c[i]}, f) is everything a nonsingular path tracker needs to track solution paths starting at the witness points as L^{c[i]} evolves continuously. Accordingly, we call W a witness set for A. When A is not reduced we need a slightly richer structure, which will be discussed in § 13.3.2 and § 13.3.3. We will see later, in § 15.2, how to generate from a witness set as large a number as we wish of widely spaced points on A. The witness set data structure, more fully described in § 13.3, has many advantages: (1) it is stable and much cheaper numerically than finding defining equations; (2) it is sparing of memory; (3) it can be used to compute quantities of interest, e.g., if you really want defining equations they may be computed from this encoding; and (4) it is a special case of the notion of a linear space section, for which there is an extensive theory (Beltrametti & Sommese, 1995). Using witness sets, we can make numerical sense out of what it means to find the solution set of a system of polynomials f(x) = 0 in Equation 13.0.1. We wish to find a numerical irreducible decomposition that mirrors the irreducible decomposition of Equation 13.0.2, by which we mean to find a collection of witness sets W_i for the i-dimensional components V_i, which are themselves decomposed into irreducible witness sets W_{ij} for the irreducible components V_{ij}, i.e.,
W := ∪_{i=0}^{dim V(f)} W_i,   W_i := ∪_{j ∈ I_i} W_{ij}.    (13.1.3)
In Equation 13.1.3, we should be a little careful to define what we mean by the union of witness sets. Let us use the notation that WA means the witness set for
an algebraic set A. When two algebraic sets, say A and B, have no components of the same dimension, the witness set for their union is just a formal union of their witness sets, that is,
W_{A∪B} = W_A ∪ W_B = { W_A, W_B },   dim A ≠ dim B.
However, when A and B have some irreducible components of the same dimension, we require the witness sets of the components with the same dimension to have the same slicing planes L. So, in the reduced case with A a pure-dimensional union of components of V(f) for a system of polynomials f on C^N, and B a pure-dimensional union of components of V(g) for a possibly different system of polynomials g, of the same dimension as A, the formal union resolves as
W_{A∪B} = W_A ∪ W_B = { W_A, W_B } = { (A ∩ L, L, f), (B ∩ L, L, g) },
where A ∩ L and B ∩ L are the witness point sets for A and B, respectively. The resolution of unions in this fashion is not necessary, but it is convenient, and if two witness sets have different slicing planes, they can always be brought to a common slicing plane by homotopy continuation. In computing a numerical irreducible decomposition, we are faced with the opposite problem of computing a union. Our procedures will first find the witness set W_i for the i-dimensional component V_i, and subsequently, its witness point set will be partitioned into the irreducible witness point sets corresponding to the V_{ij}. In the above overview, we have claimed the existence of witness sets and asserted some of their basic properties. This chapter aims to justify these assertions and to describe some rudimentary algorithms based on them. Subsequent chapters will discuss refinements and extensions. We begin in the next section with the basic facts about intersecting irreducible components with linear spaces, thereby establishing that witness sets do indeed exist and have the main properties that we asserted above.
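Before turning to linear slicing, here is a toy sketch of the bookkeeping just described. It is not HOMLAB code; the class and field names are our own, and a real implementation would also carry the extra data for nonreduced components introduced in § 13.3.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
import numpy as np

@dataclass
class WitnessSet:
    points: np.ndarray        # witness points A ∩ L, one per row
    slice_coeffs: np.ndarray  # i x (N+1) matrix defining the codim-i slice L
    system: Callable          # black-box f returning values and Jacobian
    dim: int                  # dimension i of the component

def union(witness_sets: List[WitnessSet]) -> Dict[int, List[WitnessSet]]:
    """Formal union: group witness sets by dimension.  Sets of equal
    dimension should share a slicing plane; a fuller implementation would
    first move them to a common slice by homotopy continuation."""
    out: Dict[int, List[WitnessSet]] = {}
    for ws in witness_sets:
        out.setdefault(ws.dim, []).append(ws)
    return out
```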
13.2 Linear Slicing
We use the terms slicing or linear slicing to mean intersecting algebraic sets with linear spaces. The answer to the following question supports the use of linear slicing and will give witness sets much of their power. How does the irreducible decomposition, Equation 13.0.2, behave under slicing by general hyperplanes? The crucial value of linear slices is that they have good preservation properties, i.e., given a general hyperplane L C CN, Z and ZnL share several important properties.
An affine hyperplane (or hyperplane for short) L^{c[1]} ⊂ C^N is the zero set of a linear equation, which we denote
ℒ(x; a) = a_0 + a_1 x_1 + ⋯ + a_N x_N,
with the a_i in C not all zero for i ≥ 1. ℒ_1(x; a) and ℒ_2(x; b) have the same zero set if and only if a = λ b for some complex number λ ≠ 0. Thus affine hyperplanes are parameterized by the subset of points [a_0, a_1, ..., a_N] in P^N with a_i in C not all zero for i ≥ 1. The single point not in this set, [1, 0, ..., 0] in P^N, corresponds to the hyperplane at infinity. Similarly, we regard affine linear spaces L^{c[i]} ⊂ C^N as parameterized by i-tuples (A_1, ..., A_i) in (P^N)^i, where A_j := [a_{j,0}, ..., a_{j,N}] and the rank of the matrix
A = [ a_{1,0} ⋯ a_{1,N} ; ⋮ ⋱ ⋮ ; a_{i,0} ⋯ a_{i,N} ]
is i. The linear space is the zero set of the linear equations, so we may write
C^N ∩ L^{c[i]} = V(ℒ(x; A)),   A in (P^N)^i.
Though we use this representation below, it is not optimal for i ≥ 2. For example, given an invertible i × i matrix F, the linear equations associated to the matrix (F · A) define the same affine linear space as the linear equations associated to A. A much crisper parameterization is given by the use of the Grassmannian, as discussed in § A.8.1. We are interested in the relation between the solution set of the polynomial system f(x) = 0 of Equation 13.0.1 and the augmented polynomial system
[ f_1(x) ; ⋮ ; f_n(x) ; ℒ(x; a) ] = 0    (13.2.4)
on C^N, where ℒ(x; a) is a general linear equation. The basic facts are as follows.

Theorem 13.2.1 (Slicing Theorem) Let X ⊂ C^N denote a pure i-dimensional affine algebraic set. There is a Zariski open dense subset U ⊂ P^N such that for a in U and L = V(ℒ(x; a)),
(1) if i = 0, then L ∩ X is empty;
(2) if i > 0, then L ∩ X is nonempty and (i - 1)-dimensional, and deg(L ∩ X) = deg X; and
(3) if i > 1 and X is irreducible, then L ∩ X is irreducible.
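A quick numerical illustration of item (2): a random complex line meets the degree-2 curve V(x^2 + y^2 - 1) in exactly 2 points. The sketch below, with helper names of our own, parameterizes the line and counts intersections.

```python
import numpy as np

rng = np.random.default_rng(1)
a0, a1, a2 = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# Parameterize the line a0 + a1*x + a2*y = 0 as (x, y) = p + t*d.
p = np.array([-a0 / a1, 0.0])          # a particular solution
d = np.array([-a2 / a1, 1.0])          # a direction in the null space

# Substitute into x**2 + y**2 - 1 = 0: a quadratic in t.
c2 = d[0]**2 + d[1]**2
c1 = 2 * (p[0]*d[0] + p[1]*d[1])
c0 = p[0]**2 + p[1]**2 - 1
ts = np.roots([c2, c1, c0])
print(len(ts))                          # 2 = degree of the curve
for t in ts:
    x, y = p + t * d
    print(abs(x**2 + y**2 - 1), abs(a0 + a1*x + a2*y))   # both ~0
```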
Items 1 and 2 of the theorem are rather elementary consequences of Bertini's Theorem A.7.1, but item 3 is deeper. A quick proof of this fact follows from the Hironaka Desingularization Theorem A.4.1 and a vanishing theorem of Kodaira type. See (Theorem 3.42 Shiffman & Sommese, 1985) for a proof in the projective case. The affine case follows from this since (1) the closure X̄ of X in P^N under the natural embedding C^N ⊂ P^N is projective; (2) X is irreducible if and only if X̄ is irreducible; and (3) X ∩ L is irreducible if X̄ ∩ L̄ is irreducible. Theorem 13.2.1 is not quite strong enough to be conveniently used. We say that a set of linear equations
[ L_1(x) ; ⋮ ; L_K(x) ] := A · [ x_1 ; ⋮ ; x_N ; 1 ]
is generic with respect to an irreducible affine algebraic set X if given any subset L_{i_1}, ..., L_{i_r} of r distinct L_j, it follows that
(1) either X ∩ V(L_{i_1}, ..., L_{i_r}) is empty or dim X ∩ V(L_{i_1}, ..., L_{i_r}) = dim X - r ≥ 0; and
(2) Sing(X ∩ V(L_{i_1}, ..., L_{i_r})) ⊂ Sing(X).
We say that a set of linear equations L_1, ..., L_K is generic with respect to the irreducible components of an algebraic set X if the set of equations is generic with respect to all irreducible components of X, plus all irreducible components of intersections of any number of the irreducible components of X.

Theorem 13.2.2 Let X ⊂ C^N be an irreducible affine algebraic set. There is a Zariski open dense subset U ⊂ C^{K×(N+1)} of K × (N + 1) matrices such that for A in U, the linear equations
[ L_1(x) ; ⋮ ; L_K(x) ] := A · [ x_1 ; ⋮ ; x_N ; 1 ]
are generic with respect to the irreducible components of X.

This is a special case of the more general result Theorem A.9.2. There is a further consequence that we do not make much of because we do not keep track of multiplicities.

Lemma 13.2.3 Let f(x) = 0 be as in Equation 13.0.1. Assume that X is a positive dimensional solution component of f(x) = 0 with multiplicity μ; then there
is a Zariski open dense subset U ⊂ P^N such that for a in U and L = V(ℒ(x; a)), every component of X ∩ L is a component of the solution set of the augmented system in Equation 13.2.4 with multiplicity μ.

Proof. This follows from the stronger result (Lemma 1.7.2 Fulton, 1998). □

13.2.1 Extrinsic and Intrinsic Slicing
In putting Theorem 13.2.1 to use, we will often simultaneously slice an algebraic set by several hyperplanes. The theorem implies that slicing an algebraic set by i generic hyperplanes will cut the i-dimensional components of the set down to isolated points. As we will see subsequently, this is a standard maneuver in many of the algorithms of numerical algebraic geometry. Accordingly, it is useful to consider how different formulations of slicing might affect computational efficiency. The formulation we have used so far in this chapter, which we call an extrinsic formulation, represents a linear space by a set of equations, as L^{c[i]} = V(ℒ(x; A)), where A is an i × (N + 1) matrix. To find V(f) ∩ L^{c[i]}, where f(x) : C^N → C^n, we simply concatenate the two systems of equations to obtain the augmented system, a map C^N → C^{n+i},
[ f(x) ; ℒ(x; A) ] = 0.    (13.2.5)
Clearly, a general A with i < N has full rank i. Using standard techniques from linear algebra, we can write the solution set of ℒ(x; A) = 0 in the form
L^{c[i]} = { x in C^N | x = p + B · u for some u in C^{N-i} },
where p in C^N is a particular solution of ℒ(x; A) = 0 and where the columns of the N × (N - i) matrix B are a basis for the null space of the last N columns of A. Accordingly, an intrinsic formulation of slicing is the system
f_L(u) := f(p + B · u) = 0.    (13.2.6)
The solutions of the extrinsic and intrinsic systems are isomorphic under the mappings u ↦ p + B · u and x ↦ B^†(x - p), where B^† is the pseudoinverse of B. Since f_L : C^{N-i} → C^n has fewer equations and variables than f(x), the intrinsic formulation can save computation compared to the extrinsic one. From a geometric point of view, the extrinsic and intrinsic formulations are identical: they both describe the intersection of V(f) with L^{c[i]}. Furthermore, in a situation where we wish to choose the slicing plane generically among the set of all affine (N - i)-planes, it does not matter if we do so by choosing random coefficients A as a point in (P^N)^i or if we choose random (p, B) for the intrinsic formulation. Either way, we are choosing a random slicing (N - i)-plane from the Grassmannian of all such planes in C^N.
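The intrinsic formulation is easy to set up numerically. The sketch below, using numpy and helper names of our own, builds (p, B) from a random extrinsic slice and restricts f to it as in Equation 13.2.6.

```python
import numpy as np

def intrinsic_slice(a, A_lin):
    """Return (p, B) with {x : a + A_lin @ x = 0} = {p + B @ u}."""
    i, N = A_lin.shape
    p = np.linalg.lstsq(A_lin, -a, rcond=None)[0]   # particular solution
    _, _, Vh = np.linalg.svd(A_lin)                 # full SVD
    B = Vh[i:].conj().T                             # null-space basis, N x (N-i)
    return p, B

def restrict(f, p, B):
    """The intrinsic system f_L(u) := f(p + B @ u)."""
    return lambda u: f(p + B @ u)

# Tiny usage example: slice C^3 by one random complex hyperplane.
rng = np.random.default_rng(2)
a = rng.standard_normal(1) + 1j * rng.standard_normal(1)
A_lin = rng.standard_normal((1, 3)) + 1j * rng.standard_normal((1, 3))
p, B = intrinsic_slice(a, A_lin)
print(np.linalg.norm(a + A_lin @ p), np.linalg.norm(A_lin @ B))   # both ~0
```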
13.3 Witness Sets
The strong version of the slicing theorem, Theorem 13.2.2, gives us everything we need to justify our definition of a witness set. It tells us that for an affine algebraic set X, a generic Lc^1^ c CN meets the irreducible components of X as follows. • It misses any irreducible components of dimension less than i. • It meets each irreducible component X^ of dimension i in degXij isolated points, and these points do not lie on any other component. • It intersects irreducible components of dimension k > i in an irreducible algebraic set of dimension k — i. Moreover, Theorem 13.2.2 implies that LCW will be generic with probability one if we choose the coefficients of its defining linear equations at random from
component of V(f) is of interest.1 We hasten to add that situations may arise where we define an afHne algebraic set in some indirect manner such that, although there must exist a set of polynomial equations that define the set, we do not necessarily have such a set at hand. As such situations arise, our definition of a witness set will be adapted to accommodate them. While a witness set of the form {X n Lci>l\ Lc^l\ f} is everything we need for theoretical purposes, it is not always sufficient from the numerical point of view. As a data structure in a computer program, we want to treat / as a pointer to a black-box routine that, given a point x*, returns just the function value f(x*) and the Jacobian matrix df/dx(x*), as floating point numbers. We would prefer not to perform any symbolic manipulations to extract information from / . Even in the case of a zero-dimensional algebraic set, which is just a finite set of points, a numerical witness point set is just a list of approximations to those points. A witness set should carry along enough additional information to allow us to numerically refine these approximations to higher precision. Exactly what additional information we carry along to numerically encode an algebraic set X will depend on the properties of X and also on the initial symbolic information we have been given to uniquely describe X. Accordingly, we will have several different flavors of witness set, but each will include a witness point set and enough additional information to allow us to use the witness set in our numerical algorithms. In an implementation of these algorithms in computer code, the witness set would be a data structure that includes a field identifying its flavor, and basic operations on witness sets need to be able to handle all such flavors. In the next few paragraphs, we define threeflavorsof witness sets that are useful in numerical work. We will not yet give numerical algorithms for computing such sets; these come later in the chapter. 13.3.1
Witness Sets for Reduced Components
Let us remind ourselves of the meaning of "reduced." The notions of reduced versus nonreduced are not to be confused with reducible versus irreducible. The line {(x,y) s C2 | x = 0} is an irreducible algebraic set that is a reduced solution component of the equation xy = 0, but it is a nonreduced solution component of the equation x2y = 0. Thus, we see that reduced and nonreduced are not intrinsic properties of the set as a geometric entity, but relate to algebraic properties of the system of polynomials that we use to define the set. Reduced is synonymous with multiplicity-one, while nonreduced implies a multiplicity greater than one. The salient point is that if Xi C C^ is an i-dimensional reduced solution component of the system of equations / = 0, it meets a generic codimension i slicing 1
Strictly speaking, the slicing plane Lcl~ll is not necessary, because it is either uniquely determined by the witness points or, if not, we can pick a slicing plane at random from among all the (TV — i)-dimensional linear spaces that interpolate the witness points. Nevertheless, it is convenient in our algorithms to have it on hand rather than to regenerate it when needed.
Basic Numerical Algebraic Geometry
237
plane in witness points having multiplicity-one. Such a point is numerically tame: the Jacobian matrix of the system of equations defining the point is full rank N. Letting L(x) — 0 denote a system of % independent linear equations for the slicing plane, the witness points are solutions of the augmented system {f(x), L(x)} = 0. If the number of equations, n, in this system is equal to the number of unknowns, N, then Newton's method converges quadratically in the neighborhood of the witness point. If n > N, the Gauss-Newton method, that is, Newton's method modified to use a least-squares iterative step, converges quadratically. This is quite satisfactory for our numerical work, so when X, is a reduced component, we use {XinLcW,LcW,f} as its witness set. 13.3.2
Witness Sets for Deflated
Components
In the case that an irreducible algebraic set X is a nonreduced solution component of a polynomial system f(x) = 0, its witness points are isolated roots of multiplicity greater than one for the augmented system f(x) = {f(x),L(x)} — 0. This means that the Jacobian matrix evaluated at such a root is not full rank: the witness points are singular. As discussed in § 10.5, the behavior of Newton's method near singular roots is greatly degraded and may even diverge. However, since the witness points are isolated, one or more stages of deflation, as described in Equation 10.5.5, may allow us to compute the witness points in a nonsingular manner. That is, for a witness point x*, deflation produces a new system of equations, say g(y) = 0, with a projection operator, say n : y H-> X, such that y* is a nonsingular solution of g{y) = 0 and x* = 7r(y*). In fact, since the slicing plane V(L) is generic and X is irreducible, the same deflation system that works on one witness point x* G X n V(L) must work for all other witness points in the set. Let us restate this in a slightly more general way, independent of the specific deflation technique described in § 10.5. Suppose that we have an i-dimensional irreducible algebraic set X C C^, and a linear projection TT : C M —> CN, with M > N. Let C(x; A) = 0 be a system of linear equations parameterized by matrix A, as in § 13.2, such that a generic A defines a codimension % linear space. Suppose that for generic A, each witness point x* £ X D V(L(x; A)) has above it a point y* G C M that is a nonsingular solution of a system of polynomial equations g(y; A) = 0. For a particular generic slicing plane Lc^l\ suppose Wy C C M is a collection of such nonsingular solution points, one for each point i n l f l Lc^^, so that ir(Wy) = X n Lc^ • Then, we may use as our numerical witness set for X the data {Wy,LcW,g,n}. Of course, in our numerical work, Wy will be a numerical approximation to the ideal points. To refine the witness point set n(Wy), we use Newton's method to refine
238
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Wy as solution points of g(y) = 0, and then project.
13.3.3
Witness Sets for Nonreduced Components
It might not always be possible or convenient to find a deflation formulation for a nonreduced solution component. For example, it may be that several stages of deflation would be necessary, giving an unreasonably large dimension M in which the numerical work is carried out. Alternatively, it may happen that the algebraic set in question is not given to us as a solution component of a system of polynomial equations. For example, it could be defined as the intersection of two solution components of a system. Such a set is certainly described by some system of equations, but it might take considerable symbolic computation to construct them from the data at hand. In this section, we present a third flavor of witness set that can handle many such situations. Suppose that we use a homotopy method to solve for the witness points, that is, each point w € W := X n L c ^ is the endpoint of some solution path xw(t) of a homotopy function h(x,t) = 0, i.e., h(xw(t),t) = 0, \imt^oxw(t) = w. We will construct explicit examples of such homotopies below, but for now, we simply posit the existence of one. When X is multiplicity greater than one as an i-dimensional solution component of a given polynomial system / , the homotopy we construct will have w as a singular endpoint. Recall from Chapter 10 that we have several methods for computing singular endpoints of homotopy paths. In the power-series endgame or the closely-related Cauchy integral endgame, we estimate the endpoint by building a local model of the end of the path, sampling the path for small t inside the endgame convergence radius but outside the ill-conditioned zone at t — 0. Taking this route, we define the set
W(e) := {xw(e) \ w e W} consisting of the solutions of h(x, e) = 0 that lead to the witness points W as e —>• 0, with e € (0,1]. Our third flavor of witness set for an i-dimensional algebraic set X is, accordingly, the data
{W,Lc^,h(x,t),W(e),e}, where in addition to the conditions that L c ^ is a generic (AT — «)-dimensional linear subspace of C^ with W := Lc^ fl X, we have h(x, t) and W(e) satisfying (1) for each point w G W, we have a positive e > 0 and a nonsingular path xw(t):(0,e]->CN with h(xw(t),t) (2) W(e) = {xw(e)
— 0 and limt^oxw(t) \w£W}.
= w, and
Basic Numerical Algebraic Geometry
239
Whenever we wish to refine a numerical approximation of the witness set, we can do so by re-playing the singular endgame in higher precision, using W(e) and e to initialize the solution paths of the homotopy. Whichever treatment of nonreduced components we choose, we still refer to W as a witness point set for X. For simplicity of statement, we often suppress the reference to h(x,t) or g(y) and refer to the witness set (W, Lc^l\f) in both the reduced and nonreduced case. We will soon turn to the task of computing witness point sets, but first we prepare by discussing the rank of a polynomial system and randomizations of polynomial systems. 13.4
Rank of a Polynomial System
Let f(x) = 0 denote a system of n polynomials on
A- x := A-
:
,
_XN .
where A is an n x N matrix, the rank of the system is the classical rank A of the matrix A and the corank is the dimension of the null space of A • x = 0. Note that given a system / as above, neither adding polynomials in the equations of / to the system nor replacing / with F • f, where F is an invertible nxn matrix, changes the rank of the system.
Lemma 13.4.1 Let f(x) = 0 denote the system of n polynomials on CN. Then there is a Zariski open set Y c f(CN) such that for y eY, V(f(x) — y) is smooth of dimension equal to the corank of f. Moreover, the Jacobian matrix of f is of rank equal to rank/ at all points ofV(f(x) — y). Proof. This is a corollary of Theorem A.6.1 with X taken as CN.
•
An important consequence of the above is that for the dense Zariski open set U := f~1(Y), we have for all points 2* € U, the rank of the Jacobian of / evaluated at £* equals rank/. This gives us a fast probability-one algorithm for the rank for a system f. Explicitly, given a system f(x) of n polynomials on CN, then the rank of / equals the rank of the Jacobian at a random point of C^. To emphasize its importance, we restate the algorithm below.
240
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Rank of a System: r = rank(/) • Input: Polynomial system f(x): CN —> Cn. • Output: r := rank/. • Procedure: — Choose a random x* £ C^. — v '=
r a n k I ——IT** ) I
— return(r). The numerical determination of the rank of the Jacobian matrix is best done using the singular value decomposition. The rank intervenes in the study of systems in the following way, which will play an important role in subsequent developments. Theorem 13.4.2 Given a system f(x) of n polynomials on CN, all irreducible components ofV(f) have dimension at least equal to the corank of f. Proof. As noted in Lemma 13.4.1, the corank of / is dimxf~1(f(x)) dense Zariski open set of CN. Thus, by Theorem A.4.5, the set {xeCN must equal C^.
\ dimx f-\f(x))
for £ in a
> corank/} •
Remark 13.4.3 Surprisingly, previous to this book, the rank of a system has not been defined explicitly in numerical algebraic geometry. Example 13.4.4 Consider the space of 3 x 3 orthogonal matrices, usually denoted 50(3). For any matrix A e 5*0(3), we have the defining equations ATA = I, detA = 1, where / is the identity. Because ATA is symmetric, the first matrix condition amounts to just 6 scalar equations, so we have 7 polynomial equations depending on the 9 entries in A. However, the rank of the system is only 6. Thus, 5*0(3) can have no components of dimension smaller than 9 — 6 = 3. Of course, it is well-known that the dimension of 50(3) is three, so the rank condition is sharp in this case. The definitions of rank and corank make sense for systems of algebraic functions, e.g., rational functions, defined on an irreducible quasiprojective algebraic set X. Lemma 13.4.1, Theorem 13.4.2, and the algorithm for the rank carry over immediately with the same proofs to this situation. This generalization, which will be needed in Chapter 16, is presented in § A.6.
Basic Numerical Algebraic Geometry
13.5
241
Randomization and Nonsquare Systems
We define a square system to be a system f(x) =
7i(*)" : =0
(13.5.7)
Jn(x)_ of polynomials on C^ with n = N. When we numerically solve a system of equations, it is usually convenient, and sometimes necessary, to have the same number of equations as unknowns. The systems we wish to study might not be square. If n < N, we call the system underdetermined, and if n > N, we call it overdetermined. However, if it is underdetermined, its rank is at most n, so by Theorem 13.4.2, its irreducible solution components must be dimension at least N — n. We will work with such components by slicing them with at least N — n hyperplanes, resulting in an augmented system having at least as many equations as unknowns. Of course, when augmented by slicing planes, square systems become overdetermined, and overdetermined systems stay overdetermined. Therefore, we see that the overdetermined case needs attention. To find the isolated solutions of an overdetermined system, n > N, the naive approach is to pick out N equations, solve them, and check the solution points against the remaining equations. This approach is fraught with peril. For example, consider the system: xy = 0 x(z-y) = 0 y(x - y) = 0.
(13.5.8)
Any two of the 3 equations have a 1 dimensional solution set, but all three together have the origin (with multiplicity 3) as the solution set. There is a natural procedure for obtaining a square system from the above system, f(x) — 0. Given a n i V x n matrix of complex numbers A G CNxn, we can form a square system / *1,1 fl +•••+M,nfn A./=
\
: \-V/V,l/l + • • • + AjV,n/n /
As we will show below, this square system has all the properties we need to compute an irreducible decomposition of V(/), and in our first article (Sommese & Wampler, 1996) on Numerical Algebraic Geometry, this was our approach. In the following paragraphs, we present a somewhat more general view of randomization, which is essential in dealing with intersections of irreducible algebraic
sets (Sommese et al., 2004b). In particular, let us consider the system A • /(x), where the k x n matrix A G Ckxn is chosen generically. If k = N, this is a square system, but we may consider k ^ N as well. Let us discuss the principal facts about this construction. First note that this construction is only of interest if k < n. To see this, note that if k = n, then such a A is invertible. Consequently, the systems A • f(x) = 0 and f(x) = A" 1 • A • f(x) = 0 are equivalent. If k > n, then we may break A G C fcxn into two submatrices by rows; the matrix Ai formed from the first n rows is an invertible matrix for a nonempty Zariski open set of the k x n matrices A £ Ckxn. Let A2 be the remaining (k — n) x n matrix formed from the last k — n rows of A and let r
._ [
Aj
• [-Aa-Ar
1
Onx(k-n)]
/*-„ J
with 0nX(k-n) the n x {k — n) matrix with all zero entries and Ik-n the (k — n) x (k — n) identity matrix. Then T is invertible and A • f{x) = 0 is equivalent to
r h(x) ' L-A2-A!1
/fc_n J [ A 2 J M '
/n(x)
.0(fe-n)xl.
Thus, only if k < n is this construction interesting. If k < n, then we may break A into two submatrices as A = [Ai A2], where Ai is k x k. Submatrix Ai is an invertible matrix for a nonempty Zariski open set of the kxn matrices A G Ckxn. Thus, A-/(x) is equivalent to A^"1 -A-f(x) = [I A']-f(x), where A' = A1"1A2- In other words, the system is of the form /l
fk+l A
': + ' • ': Jk\ L fn . for a nonempty Zariski open set of k x (n - k) matrices A' G Ckx(n~k), It is important to note that though mathematically A • f(x) and [/ A'] • f(x) are equivalent, the latter may be better than the former for homotopy continuation. Moreover, the ordering of the equations can matter. For example, if the equations were
' x\ +xl - 1" x2 Xi — 1
=0
Basic Numerical Algebraic Geometry
243
the randomization 'x2 + x22-l
+ Xl(x1-l)^
a;2 + A 2 (x 1 -l)
J
= Q
would be better than the randomization [ X2 + Xi{xl + x22-l)
1_
[Xl-l+X2(xl+xl-l)\ since there would be only two paths to follow using a total degree homotopy on the former as opposed to four paths on the latter. The key properties of randomization are given by the following simple theorem of Bertini type. Theorem 13.5.1 Let 7i(*)' /(*) = : =0 Jn(x)_ be a system of polynomials on CN. Assume that A C CN is an irreducible affine algebraic set. Then there is a nonempty Zariski open set U of k x n matrices A e C fexn such that for A e U (1) if dim A > N — k, then A is an irreducible component of V(f) if and only if it is an irreducible component ofV(A • f); (2) if dim ^4 = N — k, then A is an irreducible component ofV(f) implies that A is also an irreducible component ofV(A • / ) ; and (3) if A is an irreducible component ofV(f), its multiplicity as a solution component of A • f(x) = 0 is greater than or equal to its multiplicity as a solution component of f(x) = 0, with equality if either multiplicity is 1. It is important to emphasize that although an irreducible component of V(f) is an irreducible component of the randomized system, V(A • / ) , its multiplicity as an irreducible solution component of A • / = 0 (if not 1) might be larger than as an irreducible solution component of / = 0. The following system, which is equivalent to the system Equation 13.5.8 illustrates this: xy = 0 x2 = 0
(13.5.9)
y2 = o. The origin is an isolated solution of multiplicity 3. The randomized square system x(y + /xix) = 0 y(x + n2y) = 0.
(13.5.10)
244
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
has the origin as an isolated solution of multiplicity 4. The randomization of a system will be used often enough that we introduce a new notation for it. We let fK(/(x);/c) denote a randomization A • f{x) with A a k x n matrix. We also may write 9l(fi(x),..., fn(x); k) to mean the same sort of randomization acting on the system obtained by stacking up all the functions fi(x). When we use the randomization method in probability-one algorithms, A must be chosen from a Zariski open dense set that is defined by the problem at hand, and it may depend on other choices we make in the algorithm. Logically, the open set from which we must choose A is not defined until all such choices are made, and so we should choose A last. Operationally, we usually do not have a computationally useful description of the set nor do we need one, since a random choice of A will be in the set with probability one. Accordingly, it does not matter when we choose A in the course of the procedure, as long as the choice is made independently of the conditions that define the invalid set.
13.6
Witness Supersets
Suppose that we wish to compute the numerical irreducible decomposition, Equation 13.1.3, of V(f) for some polynomial system / : C^ —> C n . The logical first step is to find a witness point set, Wi := V* (~l Lc^l\ for each pure-dimensional solution component, Vi. A second step would then decompose these into irreducible components. Unfortunately, we do not have an algorithm for directly computing the Wi, but we can readily compute a looser set Wi that contains Wi. We will call such a set a witness point superset, defined as follows.
Definition 13.6.1 (Witness Point Superset) Let Z ⊂ C^N be an affine algebraic set, and let X be a pure i-dimensional component of Z. Then Ŵ ⊂ C^N is a witness point superset for X as a component of Z if it meets the requirements: (1) Ŵ is a finite set of points; (2) Ŵ ⊂ Z ∩ L^{c(i)}; and (3) (X ∩ L^{c(i)}) ⊂ Ŵ, where L^{c(i)} ⊂ C^N is a generic linear space of codimension i. A witness superset for Z is just a collection of one witness point superset at each dimension along with the corresponding linear slicing space, L^{c(i)}, at each dimension.
Remark 13.6.2 Since for generic L^{c(i)}, Z ∩ L^{c(i)} is empty for i > dim Z, we see that the witness point supersets for all dimensions greater than dim Z are empty.
Let V_i be the union of all the i-dimensional irreducible components of V(f), and let Ŵ_i be a witness superset for V_i. If V_i is not the maximal dimensional component of V(f), then a linear space L^{c(i)} will meet the higher dimensional components, and
Ŵ_i will likely contain some points on those components. That is,

Ŵ_i = W_i + J_i,    (13.6.11)

where J_i ⊂ ∪_{k>i} V_k. We call J_i the "junk points" in Ŵ_i. Even when i = 0, i.e., the classical case of finding isolated solutions of f(x) = 0, the homotopy methods of Part II return Ŵ_0 and give no ready method to distinguish isolated singular solutions in W_0 from the junk points J_0. In Chapter 15, we will present algorithms that discard the junk points J_i to get W_i and then further decompose the W_i into the W_{ij} of Equation 13.1.3. This will give the numerical irreducible decomposition. For the present, we will concentrate on finding the witness point supersets.
We can compute witness point supersets using homotopy continuation. Theorem 8.3.1 gives conditions for a homotopy to find all isolated solutions of a square system of polynomial equations. Total degree homotopies and multihomogeneous homotopies as given in § 8.4.1 and § 8.4.2 have start systems with only nonsingular roots, so they satisfy the required conditions for finding all isolated roots in C^N. The linear product homotopies of § 8.4.3 and the polyhedral homotopies of § 8.5 do the same on (C*)^N.

Witness Superset for Dimension i (extrinsic): [Ŵ, L] = WitnessSupi(f, i)
• Input: Polynomial system f : C^N → C^n and a dimension i.
• Output: A witness point superset Ŵ for the i-dimensional component of V(f), along with the slicing space L.
• Procedure:
- If i = N, apply the probabilistic null test at a random point x* ∈ C^N: if f(x*) = 0, return(Ŵ := x*, L := C^N); else return(Ŵ := ∅, L := C^N).
- Else, choose a random point a ∈ C^i and a random matrix A ∈ C^{i×N}. Let L be V(ℓ), where ℓ(x) := a + A · x.
- If i ≥ N − rank f,
  * Compute S := HomSolve({ℜ(f; N − i), ℓ}).
  * Let Ŵ := {s ∈ S | f(s) = 0}.
  * return(Ŵ, L).
- Else return(Ŵ := ∅, L).
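The extrinsic step above can be sketched as follows in Python/NumPy; the black-box solver hom_solve stands in for HomSolve, and its calling convention is an assumption made only for illustration.

```python
import numpy as np

def witness_superset_extrinsic(f, N, i, rank_f, hom_solve, tol=1e-8):
    """Sketch of WitnessSupi for 0 < i < N: slice with i random affine-linear
    equations ell(x) = a + A x, solve the squared-up system {R(f; N-i), ell}
    with a user-supplied black-box solver hom_solve, and keep only the
    solutions that also satisfy the original system f."""
    rng = np.random.default_rng()
    a = rng.normal(size=i) + 1j * rng.normal(size=i)
    A = rng.normal(size=(i, N)) + 1j * rng.normal(size=(i, N))
    ell = lambda x: a + A @ x
    if i < N - rank_f:
        return [], (a, A)
    # hom_solve is expected to return a list of isolated solutions of the
    # square system {R(f; N-i), ell}; its construction is problem specific.
    S = hom_solve(f, ell, N - i)
    W_hat = [s for s in S if np.linalg.norm(f(s)) < tol]
    return W_hat, (a, A)
```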
Theorem 13.6.3 For 0 ≤ i < N, there is a Zariski open dense set U ⊂ C^{i×(N+1)} such that for (a, A) ∈ U, algorithm WitnessSupi returns a witness point superset for the i-dimensional component of V(f).
Proof. By Theorem 13.5.1, if A is an i-dimensional irreducible solution component of f(x) = 0, then it is also an irreducible solution component of F(x) := ℜ(f; N − i). Therefore, the witness points A ∩ L^{c(i)} are isolated points in both V(f) ∩ L^{c(i)} and V(F) ∩ L^{c(i)}, for a generic linear space L^{c(i)} ⊂ C^N of codimension i. By assumption, the set of points S returned by S := HomSolve(g) is finite and includes all isolated solutions of V(g), so Ŵ must include A ∩ L^{c(i)}. This holds for every i-dimensional irreducible component of V(f), so Ŵ includes the witness points for the entire i-dimensional component of V(f). Thus, items 1 and 3 of Definition 13.6.1 are satisfied.
To see that item 2 of that definition is satisfied, we argue as follows. By our assumptions on HomSolve, we have S ⊂ (V(F) ∩ L^{c(i)}). By Theorem 13.2.1, for generic L^{c(i)}, V(F) ∩ L^{c(i)} includes only points of V(F) lying on components of dimension i or greater. By Theorem 13.5.1, any components of V(F) of dimension k > i must also be k-dimensional components of V(f). Thus, any points in S that lie on components of V(F) of dimension k > i must also be in V(f). Consequently, any s ∈ S such that s ∉ V(f) ∩ L^{c(i)} must lie in a component of V(F) \ V(f) of dimension i. Such points do not satisfy f(s) = 0, and so they are not copied from S into Ŵ. □
For i = N, the algorithm uses the probabilistic null test to see if all the functions in f are the zero polynomial. For all other i ≥ N − rank f, we solve a square system of size N. The statement of the algorithm above uses an extrinsic formulation of slicing. To work intrinsically, we just change a few lines, and in so doing, decrease the size of the square system we solve to only N − i.

Witness Superset for Dimension i (intrinsic):
- Choose a random point b ∈ C^N and a random matrix B ∈ C^{N×(N−i)}.
- Let L be the space defined intrinsically as L(u) := b + B · u, u ∈ C^{N−i}.
- If i ≥ N − rank f,
  * Let F(x) = ℜ(f; N − i).
  * Compute S := HomSolve(F(b + B · u)) ⊂ C^{N−i}.
  * Let Ŵ := {w ∈ C^N | w = b + B · s, s ∈ S and f(w) = 0}.
  * return(Ŵ, L).
- Else return(Ŵ := ∅, L).
When i = N − 1, the system to be solved has only one variable, so the call to HomSolve could be replaced by any other method for solving polynomials in one variable.
With WitnessSupi available to find a witness superset for the i-dimensional component of V(f), it is a simple matter to assemble a collection of such sets for every possible dimension. To be explicit, we display the full algorithm, WitnessSuper, below.

Witness Superset: [Ŵ] = WitnessSuper(f)
• Input: Polynomial system f : C^N → C^n.
• Output: A witness superset Ŵ = {(Ŵ_0, L_0), ..., (Ŵ_N, L_N)} for V(f), where (Ŵ_i, L_i) is a witness superset for the dimension i component. Empty dimensions may be omitted.
• Procedure:
- Initialize Ŵ = {}.
- Append (Ŵ_N, L_N) = WitnessSupi(f, N) to Ŵ. If Ŵ_N ≠ ∅, return(Ŵ).
- Loop: For i = N − 1, ..., N − rank f
  * Append (Ŵ_i, L_i) = WitnessSupi(f, i) to Ŵ.
- End loop.
- return(Ŵ).
Recall that in the case of nonreduced components, we wish to include in our numerical witness sets additional information to allow robust numerical computation of the witness points, either a deflation formulation as in § 13.3.2 or a homotopy formulation as in § 13.3.3. Clearly, a homotopy is available inside algorithm WitnessSupi; we merely have to return the information. A deflation formulation can be returned if one is used in the endgame of HomSolve. Notice that deflation can only work on the true witness points in the witness superset, because these are isolated solutions, whereas the junk points are not. So before trying to deploy deflation, we need to separate the junk from the witness points, which requires the methods of Chapter 15.

13.6.1 Examples
In the following examples, the tables summarizing runs of algorithm WitnessSuper have columns labeled as follows:
Dim        — Dimension of component under investigation.
Paths      — Number of paths in the homotopy.
#Ŵ         — Number of endpoints in the witness superset.
#Ŵ_sing    — Number of those points that are singular.
#N         — Number of "nonsolutions," i.e., endpoints x ∉ V(f).
#∞         — Number of endpoints at infinity.
avg nfe    — Average number of function evaluations per path.
total nfe  — Total number of function evaluations for this dimension.
The number of function evaluations depends on details of the path tracker and the endgame, including the various control settings they use. The figures reported here are for the default settings in HOMLAB. These numbers will change slightly in repeat runs, because the paths depend on random choices of slices and in the randomization to square up a system. They are included to give a sense of where the algorithm spends most of its effort.
Example 13.6.4 Consider the system given in Equation 12.0.1, which for convenience we repeat here:
f(x, y) := [ x(y^2 − x^3)(x − 1) ; ⋯ ] = 0.
The polynomials are degree 5 and 6, so using total-degree homotopies, algorithm WitnessSuper tracks 6 paths to find a dimension 1 witness superset and 30 paths to find a dimension 0 witness superset. The results are summarized in the following table.

Dim | Paths | #Ŵ | #Ŵ_sing | #N | #∞ | avg nfe | total nfe
 1  |   6   |  4 |    0    |  2 |  0 |   120   |     721
 0  |  30   | 30 |   28    |  0 |  0 |   165   |    4948
This is consistent with the fact that V(f) has a degree 4 component at dimension 1, decomposable into V(x) and V(y^2 − x^3). The superset at dimension 1 has four points and no junk. At dimension 0, all paths must end on V(f) as there is no slice involved. (This would not necessarily be true if f had more equations than variables.) Of the 30 path endpoints, 28 are singular. The true witness points are the two nonsingular points in the witness superset; the other 28 are junk. Junk points are always singular, but it would be erroneous to conclude that singular points in the superset are necessarily junk. In fact, if the factor (x − 1) in the first equation were changed to (x − 1)^2, the zero dimensional solution points would become double points and therefore would be singular.
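The singular/nonsingular classification of endpoints reported in these tables can be made concrete with a numerical rank test on the Jacobian of the sliced system. The sketch below is illustrative only; the finite-difference step and tolerance are assumed values, not settings from HOMLAB.

```python
import numpy as np

def jacobian_fd(F, x, h=1e-6):
    """Forward-difference Jacobian of F: C^N -> C^m at x (illustrative only)."""
    x = np.asarray(x, dtype=complex)
    F0 = np.asarray(F(x), dtype=complex)
    J = np.zeros((F0.size, x.size), dtype=complex)
    for j in range(x.size):
        xp = x.copy(); xp[j] += h
        J[:, j] = (np.asarray(F(xp), dtype=complex) - F0) / h
    return J

def is_nonsingular_solution(F, x, tol=1e-6):
    """For a square or overdetermined sliced system, a solution x of F(x) = 0 is
    counted as nonsingular when the Jacobian has full column rank, i.e., its
    smallest singular value is bounded away from zero."""
    s = np.linalg.svd(jacobian_fd(F, x), compute_uv=False)
    return s.size >= np.size(x) and s.min() > tol
```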
Example 13.6.5 The system

f(x, y) := [ x^2 y − x y − 2 y ; 3 x y^3 − y ] = 0
leads to the following results from WitnessSuper:

Dim | Paths | #Ŵ | #Ŵ_sing | #N | #∞ | avg nfe | total nfe
 1  |   4   |  1 |    0    |  3 |  0 |   158   |     631
 0  |  12   | 10 |    6    |  0 |  2 |   166   |    1997
At dimension 1, we have one witness point for the set V(y). At dimension 0, two of twelve paths go to infinity, leaving ten points in the witness superset. The six singular points in the zero-dimensional witness superset are in fact junk: they all have y = 0. The remaining four points are the finite isolated roots in V(f).
Example 13.6.6 Running WitnessSuper on the equations for SO(3), see Example 13.4.4, one obtains the following table. Note that the rank test saves us from trying to compute witness points for dimension 2, which would have required 192 paths.

Dim | Paths | #Ŵ | #Ŵ_sing | #N | #∞ | avg nfe | total nfe
 8  |   3   |  0 |    0    |  3 |  0 |    58   |     173
 7  |   6   |  0 |    0    |  6 |  0 |   107   |     639
 6  |  12   |  0 |    0    | 12 |  0 |   151   |    1815
 5  |  24   |  0 |    0    | 24 |  0 |   150   |    3590
 4  |  48   |  0 |    0    | 48 |  0 |   229   |   10986
 3  |  96   |  8 |    0    | 40 | 48 |   254   |   24337
In all the examples, we may observe that the number of function calls grows as we descend dimensions. This is due both to an increase in the number of paths (which grows geometrically) and also a general tendency for the number of calls per path to increase. Not reflected in the table is the additional fact that the number of variables climbs as we descend, so the linear solving routine used in prediction/correction iterations will be more expensive. So, by every measure, the bottom run is by far the most expensive. This underscores the importance of using the rank of the system to eliminate low-dimensional runs.

13.7 Probabilistic Algorithms About Algebraic Sets
In this section we follow (Sommese & Wampler, 1996) and show how the witness supersets immediately give some numerical algorithms. Subsequent chapters will present more efficient algorithms, so the main point here is to recognize the capabilities that witness supersets make possible.
13.7.1 An Algorithm for the Dimension of an Algebraic Set
One consequence of Remark 13.6.2 is a simple algorithm for finding the dimension of V(f), i.e., the maximum of the dimensions of the components of V(f).
Top Dimension: d = TopDimen(f)
• Input: Polynomial system f : C^N → C^n.
• Output: The dimension of V(f), i.e., d := dim V(f). If V(f) = ∅, then d := 0.
• Procedure:
- Loop: For i = N, N − 1, ..., N − rank f
  * Let (Ŵ, L) := WitnessSupi(f, i).
  * If Ŵ ≠ ∅, then return(d := i).
- End loop.
- return(d := 0).
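A minimal sketch of this loop, with witness_supi standing in for the WitnessSupi routine (an assumed interface used only for illustration):

```python
def top_dimension(f, N, rank_f, witness_supi):
    """Sketch of TopDimen: search downward from dimension N to N - rank(f) and
    return the first dimension whose witness superset is nonempty.
    witness_supi(f, i) is a stand-in for the WitnessSupi routine and is assumed
    to return a pair (list of witness superset points, slicing space)."""
    for i in range(N, N - rank_f - 1, -1):
        W_hat, L = witness_supi(f, i)
        if W_hat:
            return i
    return 0  # the convention used in the text when nothing is found
```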
13.7.2 An Algorithm for the Dimension of an Algebraic Set at a Point
Let Z be an algebraic subset of C^N defined by a system of polynomial equations f = 0. Let p ∈ Z, i.e., p ∈ C^N and f(p) = 0. Recall from § 12.2.1 that if Z = ∪_i Z_i is the decomposition of Z into pure-dimensional algebraic sets, then the dimension of Z at p ∈ Z is max{i | p ∈ Z_i}. In this section we give an algorithm to compute the dimension of Z at p. In particular:
(1) if p is a generic point of an irreducible component Z_i of Z, then this algorithm computes dim Z_i;
(2) this algorithm lets us decide whether a solution p of a system f = 0 is isolated.
This is the local variant of the dimension algorithm of § 13.6. The algorithm proceeds as follows. If Z_i is an irreducible component of Z containing p, then any affine C^{N−dim Z_i} near a generic affine C^{N−dim Z_i} containing p meets Z_i in at least one point near p. Moreover, if dim Z_i is the maximum dimension of any irreducible component of Z containing p, then for k > dim Z_i it follows that generically an affine C^{N−k} near a generic affine C^{N−k} containing p does not meet Z in any points near p. A generic affine C^{N−k} containing p := (p_1, ..., p_N) is specified parametrically by {x ∈ C^N | x = p + B · u, u ∈ C^{N−k}}, where B is a generic N × (N − k) matrix. An affine C^{N−k} nearby is one parameterized by (p', B') ∈ C^{N+N×(N−k)} in the neighborhood of (p, B), using the complex topology.
Let us first lay this out as a conceptual algorithm in which many implementation details are left for later. In particular, the algorithm depends on a procedure [S] = LocalSlice(Z, p, L) that returns a list of points S ⊂ Z ∩ L that contains all isolated
points of Z ∩ L near p, where L ⊂ C^N is an affine linear space. We do not specify an implementation for LocalSlice here, but one possibility is given in a numerical version of the algorithm later in this section.
Local Dimension: (conceptual algorithm) [d] = LocalDimen(Z, p)
• Input: A numerical description of an algebraic set Z ⊂ C^N and a point p ∈ Z.
  * Let L'(u) be a generic affine C^i near L.
  * Let f_{L'}(u) := f(L'(u)), a system of n polynomials in i variables.
  * Let g(u) := ℜ(f_{L'}; i), a square system of size i.
  * Compute S := HomSolve(g).
  * If S contains a point u* such that p' := L'(u*) is near p, then return(d := N − i).
- End loop.
The definition of nearness in this algorithm is a bit problematic. One prescription would be to repeat the test for a sequence of linear spaces closer and closer to L and see if this produces a sequence of solution points closer and closer to p. Ideally, this would be done using continuation from a generic L' to L. The problem is that the set S can contain singular solutions. Since we do not know the local dimension at these, we do not know the dimension of the solution paths as L' varies, and so it is not possible to numerically track them. The methods of subsequent chapters will refine the situation so that the nearness test can be implemented as testing equality between p and the endpoint of a well-defined one-dimensional homotopy path.
The use of HomSolve in the algorithm above is overkill, because it finds all isolated solutions of g = 0, when all we need is to find one near p, if it exists. A better alternative might be to use an exclusion method (see § 6.1) initialized on a small box containing p. An interesting purely local heuristic for checking the dimension at a point p is given in (Kuo, Li, & Wu, 2004), based on the methods of (Li & Zheng, 2004). It is a variant of the conceptual algorithm above, with a heuristic for LocalSlice. If this could be strengthened to a probability-one algorithm, it might be substantially more efficient than the numerical procedure above.
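The parameterization of a generic affine C^{N−k} through p and of a nearby slice, as used above, can be sketched as follows; the perturbation size eps is an illustrative assumption, not a recommended setting.

```python
import numpy as np

def local_slices(p, k, eps=1e-2, seed=None):
    """Return parameterizations of a generic affine C^(N-k) containing p,
    L(u) = p + B u, and a nearby slice L'(u) = (p + dp) + (B + dB) u,
    as used in the local dimension test."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=complex)
    N = p.size
    B  = rng.normal(size=(N, N - k)) + 1j * rng.normal(size=(N, N - k))
    dp = eps * (rng.normal(size=N) + 1j * rng.normal(size=N))
    dB = eps * (rng.normal(size=B.shape) + 1j * rng.normal(size=B.shape))
    L  = lambda u: p + B @ u
    Lp = lambda u: (p + dp) + (B + dB) @ u
    return L, Lp
```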
13.7.3 An Algorithm for Deciding Inclusion and Equality of Reduced Algebraic Sets
At this point, we can succinctly formulate an algorithm for deciding inclusion of the solution sets of two systems of polynomial equations, which will immediately yield an algorithm for deciding equality of such solution sets.

Inclusion Test: [t] = Inclusion(f, g)
• Input: Polynomial systems f : C^N → C^n, g : C^N → C^m.
• Output: Logical t := true if V(f) ⊂ V(g); otherwise, t := false.
• Procedure:
- Loop: For i = N, N − 1, ..., N − rank f,
  * Let Ŵ := WitnessSupi(f, i).
  * If g(x) ≠ 0 for any x ∈ Ŵ, then return(t := false).
- End loop.
- return(t := true).
The inclusion test leads immediately to an equality testing algorithm, as follows.

Equality Test: [t] = Equal(f, g)
• Input: Polynomial systems f : C^N → C^n, g : C^N → C^m.
• Output: Logical t := true if V(f) = V(g); otherwise, t := false.
• Procedure:
- t_1 := Inclusion(f, g).
- t_2 := Inclusion(g, f).
- If both t_1 and t_2 are true, then return(t := true).
- Else, return(t := false).
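The two tests can be sketched directly in terms of witness (super)set points; the data layout assumed below (a list of point lists, one per dimension) is an illustrative choice, not the HOMLAB format.

```python
import numpy as np

def inclusion(f_witness_supersets, g, tol=1e-8):
    """Sketch of the Inclusion test: V(f) is contained in V(g) exactly when g
    vanishes at every witness superset point computed for f."""
    for W_hat in f_witness_supersets:
        for x in W_hat:
            if np.linalg.norm(np.asarray(g(x))) > tol:
                return False
    return True

def equal(Wf, Wg, f, g):
    """Equality of the reduced sets V(f) and V(g) via two inclusion tests,
    Wf and Wg being witness supersets for f and g respectively."""
    return inclusion(Wf, g) and inclusion(Wg, f)
```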
We have not dealt with multiplicities in this algorithm. Thus this algorithm gives a way of deciding if the reduced algebraic set defined by f = 0 is an algebraic subset of the reduced algebraic set defined by g = 0. This algorithm is a translation of the algorithm from van der Waerden's classic (§93 to §98 van der Waerden, 1950). It is a strength of our numerical model of generic points that they model the classical generic points closely enough that such results of classical algebraic geometry translate without difficulty.

13.8 Summary
Given a polynomial system f(x) = 0 on C^N, the algorithms of this chapter produce a witness superset for its solution set V(f): for each dimension, a finite set of slice points containing a witness point set for that dimension, possibly along with some junk points to be removed by the methods of Chapter 15.
Exercises
Several of these exercises refer to routines from HOMLAB (see Appendix C). Routine witsup.m implements algorithm WitnessSuper. If the system to be analyzed is provided in tableau form, script wsuptab.m will sort the equations by descending degree and then call witsup.m.
Exercise 13.1 (Multiplicity and Randomization) Show that the system of Equation 13.5.9 has the origin as an isolated zero of multiplicity 3. Show that the system of Equation 13.5.10 has the origin as an isolated zero of multiplicity 4.
Exercise 13.2 (Inclusion Test) Use witsup.m to find witness points for the twisted cubic, V(y − x^2, z − x^3), and also for V(xy − z^2, xz − y^2). Apply the inclusion test to see if either of these contains the other.
Exercise 13.3 (Seven-Bar Linkage) Refer to Figure 9.5 and derive a set of six equations similar to the ones in Equations 9.5.33–9.5.36, consisting of three loop equations and three unit magnitude conditions. Compute a witness superset for general link parameters (a_0, b_0, c_0, a_1, a_2, b_2, a_3, b_3, ℓ_4, ℓ_5, ℓ_6) ∈ C^11. Then repeat the exercise arbitrarily choosing a_0 = −0.3 − 1i, c_0 = −1, a_1 = 0.28, b_2 = 0.37, ℓ_6 = 0.55, and setting the remaining parameters with the formulae b_0 = 0, a_2 = a_0 b_2 / c_0, a_3 = a_1, b_3 = a_1(a_0 − c_0)/a_0, ℓ_4 = ℓ_6 |a_0/c_0|, and ℓ_5 = |b_2|. Make a table like those shown in § 13.6.1.
Chapter 14
A Cascade Algorithm for Witness Supersets

This chapter revisits the construction of a witness superset for the solution set of a system f(x) = 0 of n polynomials on C^N, a topic addressed earlier in § 13.6. The algorithm, WitnessSuper, from that section leaves room for improvement both from a theoretical and practical point of view. To understand why this might be so, let us assume that we use total degree homotopies to solve the systems arising in the algorithm. Without loss of generality, we may assume that we have squared-up the system, so n = N, and we have sorted the polynomials f_i(x) from the system f(x) by descending degree, so that letting d_i = deg f_i, we have d_1 ≥ ⋯ ≥ d_N. Under these conditions, WitnessSuper tracks

∑_{i=1}^{N} ∏_{j=1}^{i} d_j

paths. In comparison, it is a classical fact, e.g., (12.3.1 Fulton, 1998), that given the irreducible components Z_{ij} of V(f) with Z_{ij} occurring with multiplicity μ_{ij}, it follows that

∑_{i,j} μ_{ij} deg Z_{ij} ≤ ∏_{i=1}^{N} d_i.
At first sight this does not look so terrible. In the case when all the d_i = 2, there are 2^{N+1} − 2 paths to be tracked in the algorithm to find at most 2^N solutions. Since, all other things being equal, computational work is proportional to the number of paths followed, this amounts to only about twice as much work as is theoretically needed. But all other things are not equal! Paths that do not lead to witness points often end up going to singular solutions. This can be expensive.
In § 14.1, we explain an algorithm that follows only ∏_{i=1}^{N} d_i paths in the total degree case. These paths are tracked in N stages, yielding at each stage a witness superset for each successively smaller dimension, and hence we call this the cascade algorithm. In the worst case, all the paths survive to the end of the cascade, requiring the equivalent of N ∏_{i=1}^{N} d_i paths to track, but in the typical case, many paths terminate early in the process, making the algorithm relatively efficient. Moreover,
to survive to the next stage, a path must remain nonsingular, which helps keep computational cost down and reliability high.
The version we present here differs slightly from its first appearance in (Sommese & Verschelde, 2000). The most notable difference is the removal of slack variables, which were never used in actual implementations. The new presentation draws on Theorem 13.2.2 to establish the genericity of the slicing hyperplanes, removing any dependence on the order in which they are used.
For ease of reading, in this chapter we act as if components are reduced, e.g., we talk about witness supersets (Ŵ_i, L_i, f) instead of (Ŵ_i, L_i, f, h_i(x,t), Ŵ_i(ε), ε). All our arguments and algorithms hold for nonreduced components also. For example, the cascade algorithm for witness supersets produces the h_i(x,t) whether the components are reduced or nonreduced, and to obtain the Ŵ_i(ε) we would only need to have the t variable in the homotopies in the cascade algorithm take on the value t = ε for an appropriate small value of ε.
This short chapter has only two sections: the description of the cascade algorithm in § 14.1, and presentation of some examples of its use in § 14.2.

14.1 The Cascade Algorithm

The form of this algorithm is:
  Input: a system f(x) = 0 of n polynomials on C^N
  Output: a witness superset for V(f) (see Definition 13.6.1)
For simplicity in forming the algorithm, we begin by squaring up the system so that we have the same number of equations as unknowns. Let r = rank f. By Theorem 13.4.2, the lowest possible dimension of any irreducible solution component is N − r, and by Theorem 13.5.1, all such components are also irreducible components of [I_r Λ] · f, where I_r is the r × r identity matrix and Λ is a generic r × (n − r) matrix in C^{r×(n−r)}. To get a witness point set for dimension i, we slice simultaneously with i generic hyperplanes, so we see that we use at least N − r such planes no matter which dimension is being investigated. By Theorem 13.2.2, with probability one, we can pick a set of N − r such hyperplanes, generic with respect to all solution components, by choosing random, complex coefficients for their equations. This is equivalent to choosing an r-dimensional linear space L ⊂ C^N intrinsically as L = b + B · u with random b ∈ C^N and B ∈ C^{N×r}. Combining these maneuvers, we have the square system of size r

g(u) = [I_r Λ] · f(b + B · u).

Any solution u* of g(u) = 0 maps to a point x* = b + B · u* ∈ L, and such points that also satisfy f(x*) = 0 are the witness points that we seek. Whatever the values of n and N may be, we use this approach to convert the problem of analyzing
f : C^N → C^n to treating a square system g : C^r → C^r. Accordingly, without loss of generality, from this point on we assume f is square of size n = N = rank f.
Recall that for i = 0 to N, algorithm WitnessSuper obtains a witness superset for the i-dimensional component of V(f) by intersecting V(f) with i generic linear equations. Instead of treating each of these as an independent problem, the cascade approach embeds all of them into a common formulation. For this purpose, we introduce an N-tuple of parameters t = (t_1, ..., t_N) ∈ C^N, the diagonal matrix

T(t) := diag(t_1, ..., t_N),    (14.1.1)
and the notational device t^{[i]} = (t_1, ..., t_i, 0, ..., 0). By Theorem 13.2.2, there is a Zariski open dense set U ⊂ C^{N×(N+1)} such that for A := [a_0 A_1] ∈ U the linear functions

L(A, x) := a_0 + A_1 · x    (14.1.2)

are generic with respect to all the irreducible components of V(f), where a_0 is the first column of the N × (N+1) matrix A and A_1 is the remaining columns. The witness point superset for dimension i is a finite set of points containing all isolated solutions of the system

F(A, x, t^{[i]}) := [ f(x) ; T(t^{[i]}) · L(A, x) ] = 0    (14.1.3)

for nonzero values of t_1, ..., t_i. The zeros on the diagonal of T(t^{[i]}) knock out N − i of the linear equations in L(A, x), leaving us with a system of N + i equations in x ∈ C^N.
To obtain all isolated solutions of F(A, x, t) = 0 by homotopy methods, we need a square system. Theorem 13.5.1 tells us that there is a Zariski open dense set U ⊂ C^{N×N} such that for Λ ∈ U the isolated solutions of F(A, x, t) = 0 are contained in those of

E(Λ, A, x, t) := f(x) + Λ · T(t) · L(A, x).    (14.1.4)

Accordingly, a witness point superset for the i-dimensional component of V(f) can be found by computing all isolated solutions to

E_i(Λ, A, x) := E(Λ, A, x, t^{[i]}) = 0.    (14.1.5)
For clarity, let's denote the kth row of L(A, x) as L_k(x) and the kth entry in f(x) as f_k(x), so that we write E(Λ, A, x, t) expanded as

E(Λ, A, x, t) := [ f_1(x) ; ⋮ ; f_N(x) ] + Λ · [ t_1 L_1(x) ; ⋮ ; t_N L_N(x) ].    (14.1.6)
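A small sketch in Python/NumPy of building the embedding of Equation 14.1.6 and of forming the level-i system by zeroing the trailing entries of t; the function names and the calling convention are placeholders assumed only for illustration.

```python
import numpy as np

def embedding(f, Lam, A):
    """Build eps(x, t) = f(x) + Lam @ (t * L(A, x)), where L(A, x) = a0 + A1 x
    with a0 the first column of A and A1 the remaining columns, and where the
    elementwise product t * L(A, x) realizes the diagonal scaling T(t)."""
    a0, A1 = A[:, 0], A[:, 1:]
    L = lambda x: a0 + A1 @ x
    def eps(x, t):
        return np.asarray(f(x)) + Lam @ (np.asarray(t) * L(x))
    return eps

def t_level(t, i):
    """Return t^{[i]}: keep the first i entries of t and zero the rest."""
    out = np.array(t, dtype=complex)
    out[i:] = 0
    return out
```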
We summarize what we have done, with some additional useful conclusions, in the following theorem, carrying over the notation of the preceding paragraphs.
Theorem 14.1.1 For a given polynomial system f : C^N → C^N, there is a Zariski open dense set U ⊂ C^{N×N} × C^{N×(N+1)} such that for (Λ, A) ∈ U and any integer i satisfying 0 ≤ i ≤ N, it follows that
(1) a witness point superset for all i-dimensional components of V(f) is a subset of the isolated solutions of E_i(Λ, A, x) = 0; and
(2) if x' is a solution of E_i(Λ, A, x) = 0, then either:
  (a) x' is in a component of V(f) of dimension at least i, and L_k(x') = 0 for all 1 ≤ k ≤ i; or
  (b) L_k(x') ≠ 0 for all 1 ≤ k ≤ i, in which case x' is one of a finite number of nonsingular isolated solutions, whose number is independent of the choice of (Λ, A) ∈ U.
Having embedded all the systems of interest into E(Λ, A, x, t), we now turn to the cascade for solving the E_i(Λ, A, x) = 0 as i descends from N to 0. With probability one, a random choice of (Λ, A) satisfies the genericity conditions of Theorem 14.1.1. Choosing them so, we consider them fixed and suppress them from our notation, hence writing E(x, t) for the embedding and E_i(x) for the ith embedded system. Define the level i nonsolutions as the set of solutions x' of E_i(x) = 0 with L_i(x') ≠ 0. Denote these by N_i. They depend on the choice of Λ and A, but by Theorem 14.1.1, the number of them, which we denote ν_i, is independent of (Λ, A) ∈ U. Each E_i(x) is in the family of systems E(x, t) for a particular value of t^{[i]} ∈ C^N. Moreover, holding t^{[i−1]} fixed but letting t_i vary, we can view E_i(x; t_i) = 0 as a parameterized family of systems which includes as a special case E_{i−1}(x) = E_i(x; 0). By the principles of parameter continuation, see § 7.4, if we can solve E_i(x; t_i) = 0 for a generic t_i ∈ C, then we can use those solutions as start points in a homotopy
E_i(x; s) = 0 as s goes from t_i to 0. By similar reasoning, we can descend from E_i(x) = 0 to any E_j(x) = 0, j < i, using the homotopy

H_{ji}(x, s) := E(x, (t_1, ..., t_j, s t_{j+1}, ..., s t_i, 0, ..., 0)) = 0    (14.1.7)
for s going from 1 to 0. We refer to this as a cascade of homotopies. We can be more precise about which solutions of the higher system lead to solutions for the lower one, as follows.
Theorem 14.1.2 Let H_{ji}(x, s) be defined as above. There is a Zariski open dense set U ⊂ C^{N×N} × C^{N×(N+1)} × C^i such that for (Λ, A, t^{[i]}) ∈ U, there are nonsingular paths (φ_k(s), s) : C → C^N × C with 1 ≤ k ≤ ν_i such that:
(1) the set of the φ_k(1) is equal to the set of nonsolutions at level i;
(2) H_{ji}(φ_k(s), s) = 0;
(3) the limits lim_{s→0} φ_k(s) with L_j(lim_{s→0} φ_k(s)) ≠ 0 are the level j nonsolutions; and
(4) the limits lim_{s→0} φ_k(s) with L_j(lim_{s→0} φ_k(s)) = 0 contain the witness point superset for the j-dimensional components of V(f).
260
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
we can start the cascade by solving £K(X) = 0 using any homotopy that will find all isolated solutions, for example, a total degree homotopy. We can check for the trivial case when V(f) = CN, using the probabilistic null test, and so we usually start at level K = N — 1. Alternatively, one might use the algorithm TopDimen from § 13.7.1 to determine a lower starting dimension for the cascade. A final important note: at the top of the section, we began by squaring up the system / to size r = rank/. Theorem 14.1.1 applies to the square system; call it / ' and its witness superset as W. If the original system / has more than r equations, then W may include points which do not satisfy / . We simply discard these. Cascade: [W] = Cascade(/) • Input: Polynomial system / : C^ —> Cn. • Output: A witness point supersets W = {Wo,..., WJV} for V(f), where Wi is a witness point superset for the codimension i component. Empty dimensions may be omitted. • Procedure: — Initialize W = {}. — If / is null, return the appropriate result. Otherwise, continue. — Comment: square up f{x) to form g(u) of size rank/. — Let r = rank/. - L e t / ' = 5H(/;r). — Choose random b G CN and B 6 C i V x r . — Define g{u) = f'(b + B-u). — Comment: form embedding and solve for codimension 1. — Choose random A e C r x r and A £ C r x < r + 1 ) . — Form £(u, t) = g{u) + A • T(t) • L(A, w), where T(t) is diagonal of size r x r , — Let S := HomSolve(£'(u,^ r ^ 1 ))), discarding any solutions at infinity. — Partition S as W := {u G S : g(u) =0},N' = S\W. — Loop: For i = 1 , . . . , r — 1 * Comment: i is the codimension. * Append Wi := b + B • W to W. * Let d = r — i. * Track solution paths of £(u, st^ + (1 — s ) ^ " 1 ' ) = 0 as s goes from 1 to 0, starting at each of the points in J\f. Discard any endpoints at infinity and call the remaining ones set S. * Partition S as W := {u G 5 : g(u) = 0}, Af = S \ W. — End loop. — Comment: the lowest dimension might have extraneous points. — Let Wr := b + B • W, and expunge any points x G Wr such that f(x) ^ 0. — Append Wr to W. — return(VF).
261
A Cascade Algorithm for Witness Supersets
For simplicity, we state the algorithm concentrating on the witness point sets. The linear slicing equations are easily constructed from b, B, and A. 14.2
Examples
For direct comparison with algorithm WitnessSuper of the previous chapter, we repeat the same examples as in § 13.6.1, this time using Cascade. Please refer to the earlier section for the meanings of the table entries. Example 14.2.1
For the system f ( x v )
- \
*(2/2-*3)(z-l)
l _
0
the cascade results are as follows. There is a new column called "fail" to record that some paths did not converge well. Dim 1 0
Paths 30 9
#W 4 9
#Wsing 0 7
#JV 9 0
#00 4 0
fail 13 0
nfe nfe 223 6711 64 643
As we know the answer before hand, we can verify that the witness supersets contain a valid witness set. The 13 failed paths are worrisome, but it appears that they are highly singular points at infinity. This example is rather degenerate with high degrees; it calls for higher precision arithmetic for a secure treatment. Example 14.2.2
The system
f(x,y):=\x2y - Xy-2y}=0J [ 3xyA-y leads to the following results from the cascade: Dim 1 0 Example 14.2.3 Dim 8 7 6 5 4
Paths 12 6
#W 1 6
#Wsing 0 2
#jV (3 0
#00 5 2
nfe nfe 117 1403 37 222
Running the cascade on the equations for SO(3), we obtain Paths 96 72 72 72 72
#W 0 0 0 0 0
#Wsing 0 0 0 0 0
3 I 72 I 8 1 0
#N 72~~ 72 72 72 72
#00 24 0 0 0 0
nfe 267 46 44 47 48
nfe 25620 3290 3195 3357 3465
40 I 24 1 107 I 7674
262
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
For each stage after the first, the nonsolutions of the previous stage become the start points of the next, so the number of paths can only decrease at each stage. Examples like the SO (3) problem are the worst case for the cascade as far as the total number of paths is concerned, because all the paths survive to the last stage. A saving grace is that the number of function evaluations per path falls dramatically after the top dimension. We can only surmise that the initial homotopy between a generic start system and our sliced target has longer, perhaps more twisted, paths, while the cascade homotopies connect highly related systems, so that the paths are short and relatively straight. The rise in nfe for the final dimension of the 50(3) problem is due to the solutions at infinity being singular, thus requiring a more expensive endgame to compute them accurately. Comparing these tables to the ones in § 13.6.1, we see that Cascade consistently tracks more paths than WitnessSuper, but the total number of function evaluations is almost the same. We experience some numerical difficulty on the first cascade example, but it still returned a correct witness superset. There is one clear difference in performance: Cascade returns a smaller superset than WitnessSuper on each of these examples. This means the supersets contain fewer junk points. This is particularly notable in the zero-dimensional sets for Example 14.2.1, for which WitnessSuper gave a set of 30 points containing 28 junk points, while Cascade gave a set of only 9 points containing 7 junk points. When we move on to computing a numerical irreducible decomposition, the first step is to remove the junk points. It is quite advantageous to have fewer of them at the outset. 14.3
Exercises
Exercise 14.1 (Comparisons) Run Example 14.2.2 and Example 14.2.3 using HOMLAB. DO SO both using witsup. m, an implementation of WitnessSuper of the previous chapter, and using cascade.m, an implementation of Cascade. Compare run times for the two methods. Use the profiler tool in Matlab to track which routine is using the most computation. If you are using the "tableau" format, supplied for both examples, see how much you can improve performance by writing an efficient straight-line program. Exercise 14.2 (Slicing Equations) Find an expression, in terms of b, B, and A, for the slicing equations for dimension i in algorithm Cascade. Exercise 14.3 (Spherical Pentad) A pentad mechanism is topologically two triangles, A and B, with three line segments, each one joining corresponding vertices of the triangles. The segments and triangles represent rigid links, but relative motion is allowed where they meet. In the spherical version, the joints are all revolute (one-degree-of-freedom hinges), and their centerlines all intersect in a common point. This means that the possible relative positions of one triangle with respect
A Cascafe Algorithm for Witness Supersets
263
to the other is constrained to rotations in M3. Let a i, 02,03 € K3 be unit vectors at the joints of triangle A and bi,b2,b3 E R3 the same triangle B. Let q 6 R be the cosine of the arc subtended by the segment from at to bi. Let X e SO(3) be the rotation of triangle B with respect to A. Then, we have the three equations a[Xbi = cu
i = 1,2,3,
to describe all possible placements of B with respect to A such that the pentad can be assembled. Explain how results presented in this chapter allow you to conclude that for general parameters a^, bi, c,, i — 1,2,3, the spherical pentad has at most 8 assembly configurations. Exercise 14.4 (Griffis-Duffy Platform) A special case of the Stewart-Gough platform we studied in § 7.7 and § 9.3, Griffis-Duffy platforms (Griffis & Duffy, 1993) have triangular upper and lower platforms, with the vertices of each connected to a point on the edge of the other. An even more special case, which we call the GriffisDuffy Type I platform, is when the triangles are equilateral (not necessarily the same size) and the joints on the edges are at the midpoints (Husty k Karger, 2000; Sommese, Verschelde, & Wampler, 2004a). That is, connecting point a, on the base to bi on the upper plate, a\, a3 and a5 are vertices of an equilateral triangle, and a-2 = (ai + a^)/2 and so on cyclically. Meanwhile, &2, 64 and b& are vertices, and bj = (&6 + fr2)/2 and so on cyclically. The leg lengths, Li: are arbitrary. What is the dimension and degree of the top-dimensional component? Use Equations (7.7.7) and 7.7.10), and ignore any points on the degenerate set of Equation 7.7.8. Exercise 14.5 (Seven-Bar Revisited) Repeat Exercise 13.3 using Cascade.
Chapter 15
The Numerical Irreducible Decomposition
Let Z be an affine algebraic set on C^. This means that Z is the solution set of some system of polynomials / : C^ —» C", i.e., Z = V(f). In a typical situation, we start with / as given, and we seek to find a description of its solution set. In other cases, such as we address in the next chapter, Z may be only a portion of the full solution set of the polynomials on hand. But for the moment, it does no harm to think of Z as the full solution set V(f). No matter its origins, Z has a decomposition into its pure-dimensional parts Zi, i.e., Z = L)fl!£lzZi with dim Zi = i. Furthermore, each Zi can be decomposed into irreducible pieces Zij, i.e.,Zi = Uj^XiZij, where each Zij is a distinct irreducible component and the index sets X% are finite. Our goal is to find a numerical irreducible decomposition, that is, we wish to find witness point sets, W^ := Z^ fl L c ^ for each irreducible component Z.Lj of Z, where Lc^l is a generic linear space of codimension i. From Chapters 13 and 14, we have algorithms WitnessSuper and Cascade that, given polynomial system /, find a witness superset for V(f). That is, for each dimension i, they give a set Wi D W{ := Zi n Lc^\ which contains all the witness points for all the irreducible components of dimension i along with some possible junk points. Accordingly, our goal becomes to find the breakups Wi = Wi + J, = Uj-ez, W^ + Jj
(15.0.1)
where Jj C (Uj>iZj) n L^1^. To achieve this, we show • how to trim the junk points out of the witness supersets, Wi, to obtain the witness sets, Wi, i — 0,..., dim Z, and • how to decompose a witness set, Wi, into its irreducible components, Wij. In the sequel, Chapter 16, we present methods for finding witness supersets for the intersection of algebraic sets. The methods of the current chapter will apply equally well to those witness supersets. One way to approach the processing task is to employ a membership test. Junk points in a witness superset at dimension i are members of component of dimension greater than i, so one way of detecting them is to start at the highest dimension and work down, eliminating any points found to be members of higher-dimensional 265
266
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
components. Then, at a fixed dimension, we need a test of whether two or more witness points belong to the same irreducible component. By such tests, we can group the points to form the numerical irreducible decomposition. Hence, much of this chapter is devoted to membership tests, and in § 15.1, we begin the chapter by discussing how different types of membership tests, defined abstractly by their inputs and outputs, can be used to process witness supersets into the irreducible decomposition. The remaining sections present concrete approaches to providing the necessary membership tests. All the algorithms of this chapter rely on a basic maneuver we call sampling, which generates new points on a component by tracking witness points as the slicing plane is moved continuously. Thus, in § 15.2 we discuss sampling for each of the three variants of witness set put forward in § 13.3: reduced, deflated, and nonreduced. The general nonreduced case requires a method for tracking singular paths, which we outline in § 15.6. A sampling routine enables three kinds of algorithms that are useful in computing a numerical irreducible decomposition. Numerical elimination theory, § 15.3, interpolates sample points to find equations that vanish on a component thereby providing a membership test. This approach provides a complete solution to both the junk elimination and the decomposition stages of processing, but it becomes prohibitively expensive and numerically unstable for all but the lowest degrees and dimensions. As a more practical alternative, in § 15.4, we discuss a homotopy membership test based on the fact that regular points of an irreducible algebraic set are path connected. This approach provides a complete method for junk elimination, but its use in monodromy loops to heuristically find connection paths between witness points at the same dimension provides only a partial solution to the decomposition phase. To complement this, the trace test discussed in § 15.5 determines whether a given subset of witness points forms a complete component. It can be used to quickly certify a putative decomposition found by monodromy or to complete a partial one by exhaustive testing. It can even be used by itself to combinatorially test subsets of points until the entire decomposition is determined. Our presentation follows the order in which the methods were originally developed: numerical elimination theory in (Sommese et al., 2001a), monodromy in (Sommese et al., 2001c), and traces in (Sommese, Verschelde, & Wampler, 2002b)—inspired by ideas in (Rupprecht, 2004). The different approaches each have their own niches. For a pure i-dimensional component of moderate degree, meaning not much more than degree 10, traces prove to be fastest decomposition method, but the worst-case cost grows exponentially with degree. For this reason, monodromy certified by traces eventually becomes more effective. Numerical elimination is not generally competitive for determining a decomposition, but it could still be useful if one seeks equations vanishing on a component.
The Numerical Irreducible Decomposition
15.1
267
Membership Tests and the Numerical Irreducible Decomposition
Our task is: • Given: A witness superset, W, (see Definition 13.6.1) for an affine algebraic set Z. ^ « Find: The decomposition of W into a numerical irreducible decomposition for Z, that is, find the breakup of W as in Equation 15.0.1. We will outline three variations on a procedure to complete this task, each based on a different type of membership test. The details of how to implement the tests follow in subsequent sections. In this and the following sections, we denote the witness superset for dimension i as Wi, which is composed of a witness set Wj for dimension i plus, possibly, some junk points, J;. In addition to witness points, witness sets and witness supersets carry along linear slicing planes and some description of Z in a form that allows witness points to be refined numerically. When we speak of a point w G Wj, it is implied that w is in the witness point set for Wi. Before employing membership tests, we reduce the amount of work by partially categorizing the points in the witness superset. The first observation is that all points in the top-dimensional witness superset are true witness points: there is no junk in the top dimension. This is because, by definition, the junk points in a witness superset for an i dimensional component of Z must lie in some higher-dimensional component of Z. A second observation is that any nonsingular points in W must be true witness points. Assume that Z = V(/), where / is a system polynomials. A point t t e W j lies in Z n £ c ' 2 ', so letting / ^ m (x) denote the restriction of / to the linear space Lc'1', we have / ^ m (w) = 0. Then, w is nonsingular if the Jacobian matrix of partial derivatives for //,cm has full column rank.1 For this purpose, it does not matter whether the linear slice is represented extrinsically or intrinsically. (See § 13.2.1 for explanation of these terms.) Nonsingularity implies that the point is an isolated point of Z D £CM. In contrast, junk points in Wi lie in a component of Z of dimension greater than i and hence in a component of Z fl Lc ^ of dimension at least one. The final observation builds on the second. A point w £ Wi is a true witness point if, and only if, it is an isolated solution to fLo[^(x) = 0. Any test of local dimension can serve to distinguish between junk points and witness points. If point w £ Z n I c f ' l C CN is not isolated, then the slice Zn Lc^ must intersect a closed hypersurface surrounding w. Interval arithmetic might be used to find that the point is isolated by showing that none of the 2N faces of a rectangular box enclosing w x
We use the usual convention that rows of the Jacobian correspond to functions in / and columns correspond to variables.
268
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
contains a solution. Alternatively, a heuristic like the one in (Kuo et al., 2004) could be used to find nearby points on Z D Lc^1^, thereby showing that the point is not isolated and must be junk. Taking such observations into account, we mark some points in W as true witness points, we discard any points known to be junk, and we mark the remaining ones as needing further investigation. If local dimension testing of the sort mentioned in the previous paragraph is reliably complete, then no questionable points remain, but we do not count on that outcome in what follows. We can complete the decomposition with one of several types of membership test. The first of these has the following inputs and outputs. Irreducible Membership: [Yi,>2] := Memberl(Y,w) • Input: A finite set of test points Y 6 CN and an isolated point w G Z C\ Lc'*', where Z is an algebraic set and Lc ^ is a linear space of codimension i generic with respect to Z. • Output: Set Y\ consisting of the points in Y that are on the same irreducible component of Z as w, and set 5^ := Y \ Y\ being the rest of Y. • Procedure: See § 15.3. This membership test yields a complete algorithm for the numerical irreducible decomposition of a witness superset as follows. Irreducible Decomposition: [W] := IrrDecompl (W) • Input: A witness superset W for an algebraic set Z. • Output: The witness set W contained in W decomposed into its irreducible pieces as in Equation 15.0.1. • Procedure: - Initialize W = 0. - While: W ^ 0, * * * *
Let k be the top dimension of W. Pick any w € W^Let \YUY2] :=Memberl(t?,w;). Points in Y\ from Wk form an irreducible witness set. Append this set to W. ^ * Points in Y\ from Wi, i < k, are junk. Discard them. * Remove Yx from W, i.e., W := W \ Yl.
- End while. - return(W). On each pass through the main loop, at least one point w is removed from Wk, so eventually it is emptied out and the algorithm descends to the next dimension.
The Numerical Irreducible Decomposition
269
Eventually, W is completely empty and the algorithm terminates. Irrdecompl does both jobs of removing junk and decomposing the witness sets. The only trouble with this approach is that Memberl turns out to be expensive. For this reason, we develop more efficient alternatives. These alternatives proceed by eliminating junk as an independent process from decomposing the witness set. The key to junk removal is the following algorithm. •
i
Membership: [t] := Member2(?/, W) • Input: A single point y € CN and a witness set W for a pure-dimensional algebraic set X. • Output: If y £ X, return t := true, else return t := false. • Procedure: See the homotopy membership test of § 15.4. With this test available, one can remove all junk points as follows. Remember that the top dimensional component of a witness superset contains no junk points. Junk Removal: [W] := JunkRemove(W/) • Input: A witness superset W for an algebraic set Z. • Output: The witness set W obtained by removing all junk points from W. • Procedure: - Let k be the top dimension of W and set i := k - 1. - Let Wk := Wk. - While: i > 0,
* For each w £ Wi, if Member2(w;, Wj) for any j > i, then discard w. Otherwise, copy it into Wj. * Let i :—i — \. - End while. - return(W:=[W 0 ,...,W fe ]). i
i
With the junk removed, it remains to partition the witness sets at each dimension into irreducible witness sets. The monodromy method, though not complete on its own, is useful for this task. Monodromy: [W] := Monodromy(W) • Input: A witness set W for a pure-dimensional algebraic set Z. • Output: A witness set W having the same points as W in some permutation such that corresponding points in the lists are known to be in the same irreducible component of Z. • Procedure: See the monodromy algorithm of § 15.4.
270
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
To make the monodromy approach complete, we may employ a trace test. This test can also be used on its own, without monodromy, to find irreducible decompositions. Its format is as follows. Trace Test: [t] := Trace(F) • Input: A set of points Y C W, where W is a witness set for a pure-dimensional algebraic set Z. • Output: An array t containing linear traces of the points in Y. The traces have the property that if the sum of traces for a set points is zero, then that set is the union of one or more irreducible witness sets. • Procedure: See § 15.5. i
i
As we will discuss in § 15.5, if w € W is on an irreducible component of degree d, there is one and only one subset W of size d that contains w and has trace of zero. (The trace of a set of points is just the sum of the traces of its members.) Moreover, any zero-trace set of size greater than d that contains w is the union of the irreducible one of size d and one or more other irreducible witness sets. Combining monodromy and the trace test, we have a complete algorithm for irreducible decomposition of a pure-dimensional algebraic set as follows. Irreducible Decomposition: [W] := IrrDecompPure(W, M, K) • Input: A witness set W for a pure-dimensional algebraic set Z. Also, integers M, K that control when to switch to exhaustive enumeration. • Output: The a list W of the irreducible components Wj of W. • Procedure: — For j = 1,..., #W, initialize Yj as a set containing the jth point of W. Let Y be the list of all Yj. — Associate to each Yj a trace value tj := Trace(Y,). — For each tj that is zero, move Yj from Y to W'. — Comment: try heuristic monodromy loops first. Integer k counts the number of attempts without making progress. — Initialize k = 0. — While: m := #Y > M and k < K, * Let {Y{, ...,Y^}:= monodromy {{Yu ..., Ym}). * If there is any Yj ^ Yo•, we have found a path connecting a point in Yj to a point of some Yj, i ^ j . * Regroup the Yi, merging all sets that have a monodromy connection and updating the corresponding trace as the sum of those for the merged sets. * For each new trace that is zero, move the merged set from Y to W. * If there were no mergers, increment k, else set k = 0. — End while.
The Numerical Irreducible Decomposition
271
- Comment: Switch to the exhaustive tests either because the number of groups is low enough or because we give up on the monodromy heuristic. - While: Y / 0 * Among all combinations of one or more Yj eY, find the smallest combination that both contains Y\ and has a summed trace of zero. "Smallest" means having the fewest witness points. * Merge this combination into one set and move it to W.
i
- End While. - return(W)-
_
m
^
i
With care in programming, the exhaustive phase requires at most 2 m ~ 1 — 1 combinations to be examined. There are 2 m possible combinations in all, but if one combination passes the trace, so does its complement in Y, so we never have to test both. Also, we know that the trace for the whole of Y must be zero, because the initial set W is a complete witness set. A further refinement of the algorithm recognizes that some witness points appear with multiplicities greater than one, and all witness points on the same component must have the same multiplicity. Therefore, if we keep track of how many times each witness point appears in the output of WitnessSuper or Cascade, we can limit the combinations to be tested in the exhaustive phase to only those combining points of the same multiplicity. Here, multiplicity means the multiplicity of the component as a solution to the squared-up system used in witness superset generation, which may be greater than its multiplicity as a solution of the original system. If we do not wish to use monodromy, a negative value of K causes IrrDecompPure to skip directly to exhaustive trace testing. The numerical irreducible decomposition is obtained from a witness superset by removing junk and then the witness sets for each dimension one by one. For completeness, we list out the algorithm as follows. Irreducible Decomposition: [W] := IrrDecomp2(VF,M, K) • Input: A witness superset W for an algebraic set Z. Integers M and K are control parameters for IrrDecompPure. • Output: The witness set W contained in W decomposed into its irreducible pieces as in Equation 15.0.1. • Procedure: - Let W := JunkRemove(W0. - For % = 1,..., dim W, let W{ := IrrDecompPure(W', M, K). - return(W / ). This completes the top-down description of numerical irreducible decomposition. The rest of the chapter builds the required membership tests from the bottom up
272
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
and shows that they have the properties that we rely on here. 15.2
Sampling a Component
The fundamental capability upon which all the membership tests depend is the ability to sample a component given a witness point on it. Recall from § 13.3 that as a theoretical construct, witness points are the isolated points of intersection between an affine algebraic set and a generic linear space. To sample a component, we simply move the linear space in continuous fashion, i.e., move it along a realone-dimensional path through the Grassmannian of linear spaces. As long as the prescribed path avoids a proper algebraic subset of nongeneric linear spaces, the intersection with the component remains isolated and defines a real-one-dimensional path of points on the component. Sampling is just the process of setting up such paths and following the points of intersection. Suppose the algebraic set under study is Z, and let L(s) denote a continuous path of linear spaces that are generic with respect to Z for s S (0,1]. Then, x(s) := Z n L(s) is a path of isolated points, with a well-defined endpoint x(0) = lim s ^o3 ; (s). When we choose L(0) to be generic also, then x(0) is a new sample point lying on the same component as x(l) and on no other component of Z. As a numerical construct, witness sets carry along extra information that allows a numerical approximation to a witness point to be refined to higher precision. This same information allows us to update the witness point when the slicing plane is moved slightly, hence we can numerically follow the path x(s). The details vary according to whether the component is reduced, deflated or nonreduced. A linear space can be represented extrinsically as a set of linear equations L{x) = a + A • x = 0 or intrinsically as x(u) = b + B • u. In the extrinsic form, a linear interpolation between two such spaces of the same dimension, say L\{x) and L0(x), can be written as L(x, s) = sL^x) + (1 - s)LQ(x).
(15.2.2)
If the coefficients of Li(x) and L0(x) are chosen at random, then L(x, s) = 0 defines a linear space of the that same dimension for all s G [0,1], with probability one. Intrinsically defined paths work in an analogous way, so we don't write out the details. In the rest of this section, we write only extrinsic formulations, but it should be understood that intrinsic ones can be used instead, usually with some increase in efficiency for implementations. 15.2.1
Sampling a Reduced Component
A numerical witness point on a reduced component in V(f) is a nonsingular solu— 0, for some known slicing tion, say xi, to the augmented system {f(x),Li(x)} equations Li(x). To sample, we simply replace Li(x) with the path L(x,s) of
The Numerical Irreducible Decomposition
Equation 15.2.2 to get the homotopy
h
Q
^=[i(th -
273
(1523)
--
We wish to track the path beginning at xi for s = 1 to find the endpoint as s —> 0. For x € CN, the homotopy h(x,s) has at least N equations. When it is not square, we can use randomization to square it up as h'(x,s) := 9i(h(x,s);N) and then apply the usual nonsingular path tracker of § 2.3. An alternative is to use a Gauss-Newton predictor-corrector, meaning that we use least-squares pseudoinversion in place of Gaussian elimination to solve the overdetermined linear systems in the predictor and corrector steps (see Equations 2.3.5, 2.3.6). 15.2.2
Sampling a Deflated
Component
Recall from § 13.3.2 that for a nonreduced component in V(/), we have the option of constructing a deflation such that the component in question is the projection of a reduced solution component of a related system of polynomials g. That is, the witness set has the form (W,L,g, TT) such that the points W are nonsingular points in V(g) D n~1(L) and the witness points are W = Tr(W'). When L is given by equations L{x) = 0, the pullback TT^^L) is given by the same equations, so the path L(x, s) is still just as in Equation 15.2.2. We proceed as in the case of a reduced component but with g replacing /, obtaining a solution path y(s) in some larger dimension. The path we seek is just x(s) = ir(y(s)). 15.2.3
Witness Sets in the Nonreduced Case
The nonreduced case without deflation is the most difficult. In this case, witness points are singular endpoints of solution paths in a homotopy h(x, t) = 0. This homotopy is constructed in the course of computing a witness superset either by WitnessSupi called from WitnessSuper of § 13.6 or by algorithm Cascade of § 14.1. Either way, the homotopy depends on the coefficients of the linear slicing equations, which we may explicitly show by the notation h(x, t; A) = 0. Consequently, if Ai is the matrix of coefficients for the slicing plane on which our witness point lies and Ao is same for the target slice, the sampling homotopy becomes doubly parameterized by t and s as H(x, t, s) := h(x, t, sAi + (1 - s)A0) = 0.
(15.2.4)
We have in our witness set the start points Wt that satisfy H{x, e, 1) = 0 and which lead to the witness point as t —> 0 for s = 1. We wish to track the solution path as s moves along (0,1]. We know that the solution path exists, but it consists of singular isolated points for each value of s. This is a case of singular path tracking. So as not to unduly interrupt the flow of the chapter, we postpone discussion of singular path tracking to the last section, § 15.6.
274
15.3
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Numerical Elimination Theory
The first approach to the numerical irreducible decomposition, reported in (Sommese et al., 2001a), uses a membership test based on a numerical version of elimination theory. This is the test we called Memberl in § 15.1. Let X denote an irreducible k dimensional component of the solution set V(f) and let x* be a known generic point on it, that is, x* = X n L c ^ l for a generic linear space Lcrfel of codimension k. We need to give a criterion for y G C^ to belong to X. We can assume that N > k > 0 since otherwise nothing needs to be done. Assume first that k = N — 1. Using the sampling techniques of § 15.2, we vary the slicing plane and collect as many widely separated general points on X as we wish. For each positive integer d, there are m(d) := {N^d) monomials of degree less than or equal to d; the coefficients of these monomials are homogeneous coordinates in p m ( d ), thereby forming the complex linear space Pd(CN) of polynomial equations of degree < d. Each point on X gives a linear condition on the Pd(CN). Choosing m(d) — 1 general points, we get a polynomial Pd{x) vanishing on the points, unique up to multiplication by a nonvanishing complex number. Choosing one additional general point e on X we have either (1) Pd(e) / 0, in which case there are no elements of Pd(CN) vanishing on X and deg X must be greater than d; or (2) or Pd(e) = 0, which by genericity implies that Pd{x) is identically zero. In this case, degX < d and if this is the smallest d for which such a polynomial exists, we know that degX = d, and in fact, X = V(pd)- Consequently, we have a membership test: y G X if, and only if, pd(y) = 0. Thus, we may proceed progressively d = 1,2,... until we find a d for which there is a polynomial Pd{x) vanishing on X. We know that degX is at most the cardinality of the witness set for dimension N — 1, which limits the complexity of the method. Now assume that 0 < k < N — 1. Take a generic linear projection TT : CN —> Cfc+1. We know by Theorem A.10.5 that 7r is generically one-to-one and proper on X, and in particular, that TT(X) is an algebraic hypersurface with deg?r(X) = deg^f. We sample X as usual and project each sample point x to y = TT(X) £ Cfc+1. Just as above, we now find a polynomial qd(y) of minimal degree that vanishes on the projected samples and we conclude that TT(X) = V(qd)Any point of x' e Tr~l(n(X)) satisfies qd(n(x')) = 0, so at first blush, qd does not seem adequate for testing membership in X. However, it is sufficient for testing membership for a finite set F C C^, because a general projection such as TT has the property that for all x* G F, n(x*) G n(X) if, and only if, x* G X. So choosing the projection at random, we have a probability-one membership test for points x* G F: x* € X if, and only if, qd(Tr(x*)) = 0. This is all we need for algorithm Memberl. The main problem with this approach is that (fc+^+d) grows rapidly with the dimension k and degree d of the component. Also, fitting polynomials of high degree
The Numerical Irreducible Decomposition
275
to numerical data is often numerically ill-conditioned. The dimensionality of the problem can sometimes be reduced by detecting that the linear span of a component is smaller than N and the degree can be lowered mildly by projecting from points on the component; see (Sommese et al., 2001b) for more on these. Still the approach is often too inefficient for practical use. 15.4
Homotopy Membership and Monodromy
We can avoid the computational cost of numerical elimination by switching to a weaker membership test, called Member2 in § 15.1. It has the more stringent condition that the input is a witness set for a pure-dimensional component, whereas Member 1 only requires a single generic point. However, since our methods of generating witness supersets always give a top-dimensional witness set free of junk points, we have the necessary input to start the junk removal process for lower dimensions using Member2. The same theoretical underpinning that justifies Member2 gives us routine Monodromy: both rely on a homotopy membership test. The main principle is that if X C C^ is an irreducible algebraic set, X and XTeg are path connected. Assume X is i-dimensional, i < N, and let G be the Grassmannian consisting of all codimension i linear spaces in CN. A general point in G is a generic slicing plane with respect to X, while there is some proper algebraic subset, say G* C G, of nongeneric slicing planes. A generic slicing plane, say Li £ G \G*, cuts X in a witness point set W\ := X D L\. For any LQ £ G, let L(s) c G b e a one-real-dimensional path with L(l) = L\ and L(0) = LQ and L(s) <EG\G* for all s £ (0,1]. By Theorems A.14.1 and A.14.2, since Wi is the entire solution set of XDL(0), the solution paths XflL(s) start at W\ for s = 1 and the limits of their endpoints as s —> 0 includes all isolated solutions of X n L(0). For convenience throughout this section, we abuse notation by using the same symbol L for both the linear space and the linear functions which define it, i.e., L = V{L{x)). Suppose we wish to test if point y £ CN is in X, where X is as in the previous paragraph. If y £ X, then among all the linear spaces in G that pass through y, generic ones meet X at y transversely, that is, letting Lo be a generic element of G meeting y, we have that y is an isolated point of X n Lo- Accordingly, the endpoints of the homotopy paths X n L(s) starting from W\ must include y as s —* 0. The only remaining question is how to construct L(s) so that it misses G*. This is easily accomplished because L\ is generic, so by Lemma 7.1.2, the path L(s) = sL\ + (1 — S)LQ avoids G* for s 6 (0,1] with probability one. Here, the interpolation formula for L(s) assumes L\ and Lo are represented extrinsically as a set of i linear functions. Now, suppose that X is the union of several irreducible pieces, all of dimension i. We have y G X if, and only if, it is in one of the pieces. We just conduct the homotopy membership test for each piece. Notice that if we have a witness set for X, it includes witness points for all the irreducible pieces even though we may not
276
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
know which points match which pieces. It doesn't matter as far as the membership test is concerned; if we track all the witness points, we still get all the endpoints, without knowing which ones are on which piece. According to the above, we may now write pseudocode for Member2 as follows. Membership: [t] := Member2(y, W) • Input: A single point y € CN and a witness set W for a pure-dimensional algebraic set X of dimension i. • Output: If y € X, return t := true, else return t :— false. • Procedure: — Comment: W includes a linear space L\ and the witness points W = XC\L\. — Choose a random complex i x N matrix A. — Let LQ(X) := A • (x — y). This is a generic linear space passing through y. — Track the paths X n (sLi + (1 - s)L0) from W at s = 1 to get endpoints Y at s = 0. — If y £ y, then return(true), otherwise return(false). Obviously, we can save some computation by tracking the paths one at a time and returning a positive result as soon as one ends on y. The worst case is when y $ X, because then we always have to track all the paths to find this out. 15.4.1
Monodromy
The same principle underlying the homotopy membership test leads directly to the concept of monodromy. In our context, the basic idea is that if L(s) C G \ G* is a one-real-dimensional closed loop, that is, L(0) = L(l), then the set of witness points at s = 1 are equal to those at s = 0, i.e., W = X n L(l) = X n L(0). This is true both when X is irreducible and when it is the union of irreducibles. What makes this useful to us is that, although the set of points is the same, the paths leaving at s = 1 may arrive back at s = 0 in permuted order. A path beginning at point u £ W and arriving at point v G W with u ^ v demonstrates that u and v are in the same irreducible component. This is just the homotopy membership test applied on a closed loop. When we begin with a witness set W for a pure-dimensional component X, such as would be generated by successive application of algorithms Cascade and JunkRemove, we do not know how many irreducible components X contains. Any partition of the points is possible, from every witness point lying in its own linear component to all witness points on the same component of degree #W. Each connection between distinct witness points found by monodromy restricts the possible break up. This is how algorithm Monodromy is used in algorithm IrrDecompPure of § 15.1. Pseudocode for the monodromy algorithm follows.
The Numerical, Irreducible Decomposition
277
Monodromy: [W] := Monodromy(M^) • Input: A witness set W for a pure-dimensional algebraic set Z. • Output: A witness set W having the same points as W in some permutation such that corresponding points in the lists are known to be in the same irreducible component of Z. • Procedure: — Comment: W includes a linear space L\ and the witness points W € XC\L\. — Choose a random linear space LQ(X) = 0 of the same dimension as Li{x). - Let L(s) = sLi + (1 - s)L0. - Track the paths X H L(s) starting at W for s = 1 to get new endpoints V at s = 0. — Choose a random, complex 7 e C. - Let L(s) = sjLQ + (1 - s)L\. — Track the paths beginning at V for s = 1 to get endpoints W at s = 0. - return(W). In the lists of points W, V, and W, we maintain the path ordering throughout, so that the kth point in W is path connected to the kth point of W', for all k. Note that we have used the fact that ryL(x) = 0 defines the same linear space as L{x) — 0, so X D LQ = I f l 7L0 • This means the start points of the second homotopy are the endpoints of the first. The 7 causes the return path to be different than the outbound path. See the figure following Lemma 7.1.3 for illustration. Note that Monodromy as written above uses two stages of path-tracking to produce one monodromy loop. In the process, it generates a witness set at a second slice, but this information is thrown away. For efficiency, one could save this intermediate witness set and use it to close monodromy loops with less work on subsequent executions of the algorithm. For example, if we go from i 2 to LQ to L\ to LQ, we have closed two loops, L\ —> LQ —> L\ and Lo —> L\ —> L o , using only three rounds of path tracking instead of four. See (Sommese et al., 2001c) for more on such practical issues. 15.4.2
Completeness of Monodromy
The monodromy procedure above is clearly valid, but it could be vacuous in the sense that the points might always come back in the same order. This is not usually observed in practice, and in fact, theory tells us that there exist monodromy loops sufficient to generate all possible permutations of the witness points on an irreducible set. This section presents an even stronger result that is key to the next topic of traces. But before we show that result, we give an extended discussion of a simple case. Historically, this was probably the first example of monodromy ever studied. Assume we have a polynomial p(z, w) = w2 - z on C2. Assume that p(z, w) = 0
278
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
and we wish to express w as a function of z, i.e., we wish to make sense out of the expression y/z with z € C. We would like this to be a global function, but this is not possible in any continuous way. Let us assume it is possible and see what goes wrong. At z = 1, we need %/T to be set to either 1 or — 1. Let's assume that y/z is set to 1: the case of —1 is identical. For z = elS we have either y/z = e%e/2 or y/z — —el6/2. Since \/T = \/e® we conclude by continuity that y/z = eie/2. The trouble comes when we go full circle and reach Vei2lT. By continuity we have V ^ F = ei7r = - 1 . The easiest classical solution of the problem of defining y/z (or ln(.z) for that matter) is to slit the plane from 0 to —oo, e.g., remove the real numbers from 0 to —oo from C. On the slit plane there are two "branches" of y/z. One has -\/l set to 1 and the other has vT set to —1. Similarly setting a more complicated polynomial p(z,w) = 0 with the w degree equal to d, we will have functions w = qi{z) for i = l,...,d solving p(z,w) = 0. Each is branch of the solution is defined on an appropriately slit region of the plane. Analytic continuation is the classical name for the process of extending the function, e.g., extending y/z denned in a small neighborhood of 1 to a function on a larger region. Hille has a nice detailed discussion of analytic continuation (Chapter 10 Hille, 1962). Notice that trying to define y/z and tracking v e * as z goes around the unit circle leads to a permutation of the set {1,-1} of roots of z2 = 1. Looking at this a bit more abstractly, we have that w — z2 — 0 defines an algebraic curve X in C 2 . Projection to the z variable gives a two-sheeted branched cover ?r : X —> C. Over C* := C \ {0}, we have that 7r : X \ {(0,0)} —» C* is a two-sheeted unramified cover with the fiber over a point z being (z, w), with w running over the "two square roots" of z. The fundamental group of C* is the additive group Z, and we have the monodromy action of Z on the fiber of TT over a fixed basepoint, e.g., 1. The even elements of Z leave {1,-1} fixed and odd integers send 1 to —1 and —1 to 1. How does this apply to decomposing an algebraic set into its irreducible components? Let's assume we have a purefc-dimensionalaffine algebraic set X C C^. Let 7T : X —> Ck denote the restriction to X of a generic linear projection from X to Cfe. Note that by genericity we conclude from the Noether Normalization Theorem 12.1.5 that 7r is a proper d := degX branched covering of Cfe. The union of the sets where n is not a covering and X is not a manifold form a proper algebraic X' C X with dimX' < dimX. Since n is proper, we know that TT(X') is an algebraic subset of Ck by the proper mapping theorem A.4.3. Moreover since the fibers of the map TT are finite, we know that dimTr(X') = diiaX1 < dimX = k. Thus letting X = X\ U • • • LJJ r denote the decomposition of X into irreducible components, we have that Y := Ck \ n(X') and Xz := Xt \ -K-1(K{X')) are all irreducible and connected. Moreover, letting X equal the manifold Ul_1Xi, the map n : X —> Y is a d sheeted unramified covering map. Fix a basepoint y* £ Y and consider the monodromy action of /n\(Y,y*), the fundamental group of Y with basepoint y*, on F := tr~l(y*). Note we have a
279
The Numerical Irreducible Decomposition
decomposition F = Fl U • • • U F r
(15.4.5)
given by setting Fi :— F n Xi. For our purposes, it is enough to take smooth embeddings g : S1 —> Y of the unit circle S 1 into Y with 1 going to y*, and for points in F track them as 9 S [0, TT] goes from 0 to ~n over the path g. We get different permutations of F as we carry out this tracking with different embeddings of S1 • By using the permuations, we break F into disjoint sets F = FiU---UFl,.
(15.4.6)
Since the Xi are connected, we see that the decomposition given in Equation 15.4.5 is compatible with F = U ^ - F / in the sense that each Fj is a subset of one of the Fi. The immediate question that raises itself is: Question 15.4.1
Do we have r = r' and is each Fj equal to one of the F{1
If we take sufficiently many smooth immersions g : S1 —* Y with g(l) — y, the answer to this is yes as we will see in §A.12. By a smooth immersion g : S1 —> Y, we mean a smooth map with the differential of rank one at all points of S1. This suggests the method of using monodromy along paths to decompose X into irreducible components. The problem is that the set X' can be expensive to compute. Therefore, although it is easy to find random paths in Y and consequently permutations of F, we have no cheap way in general to find generators of -K\(Y, y*), and so we have no way to know whether the breakup of F into the F/ equals the breakup of F into the F^. This raises the second question: Question 15.4.2 Is there a cheap way of checking that the breakup of F into the F[ equals the breakup of F into the i^? The answer to this is yes. Based on Theorem A. 12.2, the trace test to certify the breakup is explained in § 15.5. Remark 15.4.3 (Monodromy over general bases) Everything we said above works equally well for F := p~1(y) where p : X —> Y is a proper finite-to-one covering map from a pure-dimensional quasiprojective manifold X onto an connected quasiprojective manifold Y and y is a point we treat as a basepoint.
15.5
The Trace Test
The trace test is based on an explicit geometric description of a defining equation of a hypersurface built out of traces.
280
15.5.1
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Traces of Functions
The trace of a function is an old concept arising in a number of different situations. In this section we summarize the main results about the trace and some related constructions that go under the same general name. We follow the approach to these concepts as they arise in the Weierstrass Preparation Theorem (Gunning, 1990; Gunning & Rossi, 1965). We also follow (Morgan et al., 1992a; Sommese et al., 2002b) where we have used these concepts in a numerical context. We refer the reader to these places for more details. 15.5.2
The Simplest Traces
Let us explain what a trace is in the simplest case. We have a finite set F c C consisting of d not necessarily distinct elements Ai,...,Ad. Keeping track of multiplicities, or assuming that the A^ are distinct, we have a polynomial p{x) of degree d, unique up to a multiple by a nonzero complex number, with the property that V(p) = F. It is easy to write down: p(i) = n t i ( x - A i ) .
(15.5.7)
Multiplying this out we get d
P(x) = £(-1) V * .
(15.5.8)
i=0
where the ti are elementary symmetric functions of the roots, i.e., to := 1, and for i>0,
ti :—
2_^
An '" " AH-
l<ji
The parameterized version of these ti are the traces we are interested in. Before we turn to the parameterized situation, let us note an interpretation of the above ti as traces of matrices. Recall from linear algebra that the trace of a matrix is the sum of its diagonal elements. Let A:=diag (Ai,...,A d ). The trace of A is clearly t\. The matrix A induces linear transformations A*A of the exterior products AJCd. Using the basis {eh
A---Aen\l<j1<j2<---<jl
where efc is the d-tuple with zero entries in all places but the fc-th place, where there is a 1, we see that the trace of A1 A is U.
The Numerical Irreducible Decomposition
15.5.3
Traces in the Parameterized
281
Situation
Now we want to deal with the trace in the parameterized situation. Assume we have a finite-to-one proper degree d algebraic map TT : X —> Y from one pure-dimensional quasiprojective algebraic set onto a connected smooth quasi-projective algebraic set, or more generally onto an irreducible normal quasi-projective algebraic set. In practice, X is usually an aflane algebraic set in CN and Y is Euclidean space. From Corollary A.4.14, we know that properness implies there is a Zariski open dense set U CY and a positive integer d such that TV^-^U) : TT~1(U) —» U is an unbranched d-sheeted cover. The integer d is called the degree of TT and denoted deg TT. We call the function ti, that extends to Y, the i-th trace of g with respect to TT, and we denote it trX)j(A). If g and TT are algebraic, then so are the traces tr^^A). We have d
5^(-l)'tr W i i (A)A d -' = 0.
(15.5.9)
i=0
Assume we have an algebraic function A(a;) defined on X. If Y was a point, we would be in the case of § 15.5.2. Over the dense Zariski open subset U C Y, where n is an unramified covering, each fiber consists of exactly d inverse images, and we can do the construction of the ti pointwise over each y G Y to get functions tr7r,i(A)(y) defined on U. More explicitly, fix a point y G U. The set n~1{y) consists of d := deg?r points x\,..., Xd- We can form the degree d polynomial zero at the numbers X(xi) counted with multiplicity
{w-\{xi))---{w-\{xd))Expanded we have
i=0
where to — 1
an
d ti for i > 0 denotes the elementary symmetric function
l<ji<-<ji
of the roots X(xi). This unramified assumption is too restrictive for us. The wonderful fact is that under the modest assumption that Y is normal, e.g., a manifold such as C^, these functions ti, which depend only on y G U, have unique extensions to Y as holomorphic functions. We call the extension of ti, the i-th trace. These extensions exists because the properness of n implies that given any y E Y, there exists an open set V C Y containing y such that V and TT~1(V) are compact. Thus tr7r]i(A)(y) is bounded on UC\V. By using Theorem A.2.5 when Y is
282
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
smooth, and Remark A.2.6 when Y is merely normal, we conclude that trnj(\)(y) extends to V. This gives a holomorphic extension of ti7V:i(X)(y) to all of Y, which we also denote tr^j(A). The functions tr7rj(A)(y) are algebraic functions. This is a consequence of the characterization (discussed briefly in §A.l) of algebraic functions by their growth. The equation corresponding to the relation between the U and the A^ in § 15.5.2 is the key equation d
^(-l)Hr T , i (A)( 2 /)A( aJ ) d - i = 0
(15.5.10)
i=0
for (x,y) G X x Y with TT(X) = y. For y G Y where n~1{y) consists of d distinct points, this is nothing more than the fact that the roots of Equation 15.5.7 satisfy Equation 15.5.8. 15.5.4
Writing Down Defining Equations: An Example
Consider the cuspidal cubic, defined as the solution set of z2 — z\ = 0 in C2. Let X = V{z\ — z\). The projection {z\,z2) >-> (z\) restricted to X gives a proper degree two map it : X —> C. Then given g{z\, z2), a polynomial on C 2 , we have
tTn,1(\x(zi,z2))(z1) = A U , y ^ J + A (Zl, -yf$\ •
(15.5.11)
Though y/zf is not well-defined, the unordered pair -I y/zf, — \fz\ > is well defined, and thus tr7rjl(Ax(^i,^2))(-2i) is well-defined. Consider the function Xx(zi,z2) := z2. Substituting into Equation 15.5.11, the first trace of the function z2 is found to be
trw,i(z2)(zi) = \fz% +—\f$ = 0Note that yz^ is only well defined if we choose a branch of the square root, but whichever branch we choose, we have 0. Similarly trva{z2){zi) = yjz~l ( ~ \ A i )
=
~zi'
Recalling that t0 = 1 by convention, Equation 15.5.10 gives 0=(l)z22-(0)z2
+
(-z31)z02=z22~zl
It is no surprise that we get z\ — z\ back again, since we know that up to a nonzero constant multiple, z\ — z\ is the lowest polynomial vanishing on X.
283
The Numerical Irreducible Decomposition
Note the linear projection given above is far from generic, e.g., if the linear projection was generic, we know that the degree of the projection restricted to X would equal the degree of X, i.e., deg^f — z\) = 3.
15.5.5
Linear Traces
Let X be a pure (N — l)-dimensional algebraic subset of CN. Choose a generic projection of CN to C ^ " 1 . Then we know by Theorem 12.1.5 that the restriction 7T of the projection to X is finite and proper of degree equal to d :— deg X. Choose as coordinates X\,... ,XN of CN, the composition of coordinates xi,... ,5?/v-i of C ^ " 1 with 7r; and x^ equal to a general linear function on CN that is nonconstant on a fiber of IT. Then Equation 15.5.10 gives the polynomial d
P(*) = ^ ( - l y t r ^ a ^ X a ; ! , • • •, arjv-i)*^'
(15.5.12)
2=0
of the Xi that vanishes on X. We know that p(x) is a defining equation of X. Since degp(a;) = d, we conclude from Equation 15.5.12 that tr7rii(a;iv)(a;i, • • • ,^Ar-i) is a linear function.
(15.5.13)
Indeed, if it is not then the coefficient of a;^"1 would be of degree at least two contradicting degp(x) = d. If X c CN is a pure fc-dimensional affine algebraic set of degree d, then by the Noether Normalization Theorem 12.1.5, a generic linear projection of X to Cfc is proper and finite-to-one, and taking the trace t r ^ i (A) of the restriction to X of any generic linear function A on C ^ , we also obtain a linear function. This can be seen by noting that the map (TT, A) : X —> C fc+1 is an embedding on a Zariski open dense set V of X. Thus, we fall into the case covered by Equation 15.5.13 with N = k +1. We are now in a position to give an answer to Question 15.4.2. Given X C C ^ of dimension k, choose a generic linear Lo :=
284
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Now assume that F? is not equal to any of the sets Fj. It must be properly contained in one of the Fj, say F/ C Fj. Let qi,...,qb be the points of F/ and let q0 be a point of Fi \ F/. It follows from Theorem A. 12.2 that there is a path c : S1 —> C, where S1 are the complex numbers of absolute value 1 with c(l) = 0, such that monodromy under Lo + c(t)v takes q\, q2, • • •, qb to qo, q2, • • •, qb- Since S F ' X(s) is linear in s we conclude that A(
Let Lo be the linear space that cuts out Y C Z n Lo. Choose a random, complex v £ CN. Choose two distinct, nonzero, real numbers s± and S2Track the paths of Z D (Lo + sv) from Y at s = 0 to get Y\ at s = si and y 2 at s = s2. - Choose a random, complex 1 x N matrix A, and define \(y) := A- y. - Evaluate q0 = A(Y), qx = A(Yi), and q2 = X(Y2). - return(i := (qi - qo)/s1 - (q2 - qa)/s2)
15.6
Singular Path Tracking
In § 15.2.3, we saw that sampling a nonreduced solution component can lead to a singular path-tracking problem. This can be viewed as a special case of the following situation. Suppose a parameterized family of polynomial systems (see Chapter 7), f(z;q) : Cra x C m —> C n , has an isolated singular solution (z*,q*) at a generic
The Numerical Irreducible Decomposition
285
parameter point q*, where singular means that the Jacobian matrix df/dz(z*;q*) has rank less than n. This solution will continue to other isolated singular solutions on an open set in C m (see § A.14.1), and as described in Theorem 7.1.6, we may wish to track such a solution along a continuous path in parameter space, say q(s) C O™, where q(0) = q*. In general, this would be a nearly intractable numerical problem, but we have a little extra leverage if we have obtained the solution point (z*,q*) as the endpoint of a nonsingular solution path to a homotopy h(x,t;q*) = 0. Then, we may define the doubly parameterized homotopy H(x, t, s) := h(x, t; q(s)) = 0.
(15.6.14)
At its root, singular path tracking is based on a singular endgame. For each - 0 of a nonsingular value of s, the point on the singular path is the limit as t —> path. In Chapter 10, we discussed how to estimate such endpoints with the powerseries endgame or the related Cauchy integral endgame. Both of these work by building a local model of the solution path for small t. The gist of singular path tracking is to update this local model as we advance s and in essence, replay the endgame at every s. The power-series endgame and the Cauchy integral endgame both collect sample data on the incoming paths of the homotopy to determine the winding number c and to build a local model of the holomorphic function 4>{rj) from Lemma 10.2.1, where t = rf. The singular path tracker uses prediction/correction techniques to update the local model as we step along the path. Recall from Chapter 10, that a cluster of /i paths approaching the same endpoint may break into cycles, each cycle having a winding number, say c, such that the solution path closes up as t circles the origin c times. Although we will not argue the issue carefully here, it is clear that these cycles also continue in the local neighborhood. In a nutshell, the closing up of the solution path in c loops is an algebraic condition that holds on at the generic parameter q*, so it continues on an open subset in the neighborhood of q*. The endgame convergence radius within which the local model holds varies as q(s) varies with s. It may become zero within a proper algebraic subset of the parameter space, but by Lemma 7.1.2, a one-real-dimensional path between two generic parameter points will miss the degenerate set with probability one. Therefore, at each value of s, we have a nonzero endgame operating zone as in Figure 10.1, with a convergence radius and an ill-conditioned zone. If we use sufficiently high precision, the ill-conditioned zone stays inside the convergence radius for all s 6 [0,1], and our task is to track the local model along this endgame operating zone. As we have several ways of formulating an endgame based on the local model, the details of tracking the model must be adjusted accordingly. In essence though, all the methods are similar. For conciseness, it is helpful to adopt the notation that for a cluster of points C = {w\,..., wc}, we let H(C,t,s) = 0 mean H(wi, t, s) = 0 for i = 1,..., c. Also, the following definitions are convenient. Definition 15.6.1
A convergent cluster (C,to,s) = ({wi,..., wc},to,s) with
286
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
H(C,to,s) = 0 is such that to is inside the endgame convergence radius for fixed s and for i = 1,..., c, the solution path of H(w, t,s) = 0 beginning at (wi, to, s) continues to (tUj+i, to, s) as t travels once around the circle |t| = to- For this definition, wc continues to w\. By requiring that to is inside the convergence radius, we implicitly require that all the points in the cluster approach the same endpoint w* as t —> 0 and the same cyclic mapping from u>i to IUJ+I holds under continuation around a circle for every £ ^ 0 in the disk A to (0). In other words, the projection (w,t,s) —> t gives a proper c-sheeted finite mapping from the solution set of H(w, t, s) = 0 in a neighborhood (w*,0, s) to a neighborhood of 0 6 C. We call w* the convergence point of the cluster. Definition 15.6.2 For fixed s, the convergence point of a convergent cluster is the common endpoint as t —» 0 of the solution paths of H(w, t, s) = 0 emanating from each cluster point (u>i,to, s). The nonsingular path tracking algorithm of § 2.3 can be adapted to our current situation to arrive at the following singular path tracking algorithm. • Given: System of equations, H(w, t, s) = 0, and an initial convergent cluster Co, such that H(Co,to,0) w 0. Also, an initial step length h and a tracking tolerance e. • Find: Sequence of convergent clusters (Ci,U,Si), i = 1,2,..., along the path such that with Sj+i > Sj, terminating with sn = 1. Return the final cluster at s = 1 and a high-accuracy estimate of its convergence point. • Procedure: - Loop: For i — 1,2,... (1) Predict: Predict cluster (U,t',s') with s' = min(si_i + h,l) and t' = U-i. (2) Correct: In the vicinity of U, attempt to find a corrected cluster W such thatff(W,i',s')~0. (3) Recondition: If correction is successful, play a singular endgame in t' to compute the convergence point of the cluster at s'. If the convergence point is computed to accuracy better than e, declare the endgame successful and do the following. * Adjust t: Pick a new £j in the endgame operating zone. * Update: Set Si = s' and generate the corresponding cluster C^. Increment i. (4) Adjust h: Adjust the step length h. - Terminate: Terminate when s; = 1. - Refine endpoint: Play the endgame at s = 1 to compute thefinalconvergence point to high accuracy.
The Numerical Irreducible Decomposition
287
In the context of witness points generated by the cascade algorithm, the paths of the cluster points are nonsingular away from t = 0. Accordingly, the usual prediction/correction techniques for nonsingular paths apply. The adjustment step for reconditioning must select a new value of t, which will be held constant in the next prediction step. One sensible way to select it is to use the largest value for which the singular endgame meets the convergence tolerance e. If the endgame meets the tolerance on the first try at the current value t', it may be useful to try increasing it. If it fails, we try decreasing t, unless the condition of the Jacobian matrix indicates that failure may be due to having entered the ill-conditioned zone around t = 0. With such rules in place, the value of t can adaptively decrease and increase as s proceeds. Similar to the nonsingular path tracker, we adaptively adjust the step length h by halving it when the correction step or the reconditioning step fail. On the other hand, if these steps both succeed several times in a row, we try doubling h. A variant of the procedure is to save some computation by applying reconditioning only occasionally to verify that the cluster is convergent. One criterion for deciding when to recondition is to monitor the condition number of the Jacobian matrix dH/dw along the paths. Even more computation might be saved by tracking only one path in the cluster along s holding t constant, and when the condition of the Jacobian matrix indicates reconditioning is necessary, to regenerate the other points in the cluster by looping t around the origin. This risks path crossing, because it is not clear how to set the reconditioning criterion to ensure that t has remained within the convergence radius as s progresses. There is very little experience at this point to judge whether such variants can be made both reliable and efficient. By reconditioning at every step, we have greater assurance that the local model remains valid for the whole extent of s 6 [0,1]. The techniques we have discussed show that in principle singular path tracking is feasible, although in practice a fully satisfactory approach is still a matter of research. The approach was first presented in (Sommese et al., 2002a), which also reports on some initial experiments with the technique of using the condition number to decide when to recondition. It may seem that we could completely avoid singular path tracking by using deflation to convert problems into nonsingular path tracking problems. This is true in the context of witness points generated by the cascade algorithm, because such points are isolated solutions cut out by the slicing procedure. However, in Chapter 16, we will see how to find witness points for a set denned as the intersection of two given algebraic sets, say A and B. If A and B are both components of the same system of equations f(x) = 0, then although a slice of appropriate dimension cuts out a unique point on the intersection set, such a point is not an isolated solution of the system obtained by appending the linear slicing equations to f(x) = 0. Consequently, witness points for A D B are defined only as singular endpoints of solution paths in a new kind of homotopy, called the diagonal homotopy, and such
288
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
points can be moved along AdB only by singular path tracking. Of course, it could be that a more elaborate form of deflation could desingularize these points as well; such a procedure could be subject matter for a new line of inquiry. To raise the bar even higher, consider intersecting two algebraic sets whose witness points are only known as singular endpoints of a diagonal homotopy. Then, we could have a very difficult singular path tracking problem in which each point in the convergent cluster is itself only known as the convergence point of a prior homotopy. We have yet to face such a nasty calculation, but it is quite within the scope of numerical algebraic geometry to consider it. 15.7
Exercises
Exercise 15.1 (Degree of p(x)) Conclude that degp(x) — d for the polynomial in Equation 15.5.12 by showing that (1) the highest degree that XN occurs with is d; and (2) by genericity of XJV we know that there is at least one fiber of n on which XJV is nowhere zero, and therefore that tT7T^(xN)(xi,...,a;jv-i) is not identically zero. Exercise 15.2 (Spherical Parallelogram Mechanism) Pick two unit vectors ax, a2 € M3 and a random value of a £ I . Let 6i, &2, 63 6 M3- Consider the system of polynomial equations ajbi = a,
a2b2 = a,
b[b2 = aja2,
bjbi = 1,
6^62 = 1,
63 = (6i + b2)/2.
These eight equations describe a curve in (61,62,63) € C9. Find a numerical irreducible decomposition of that curve. Report the number of irreducible components and their degrees. Exercise 15.3 (Griffls-Duffy Decomposition) Revisit Exercise 14.4 and find the irreducible decomposition. Do it again for the special case when 6j = a* and Ci = 1, i = 1,..., 6. Report the number of irreducible components and their degrees. Exercise 15.4 (Seven-Bar Problem) Use exhaustive trace testing to show that the one-dimensional component of the seven-bar system presented in Exercise 13.3 is irreducible.
Chapter 16
The Intersection Of Algebraic Sets God keep me from ever completing anything. This whole book is but a draught—nay, but the draught of a draught. Oh, Time, Strength, Cash, and Patience! —Herman Melville
Up to this point, we have concentrated on describing the numerical solution of a given system of polynomial equations. That is, given a polynomial system / , we have numerically described V(f). In Part II, we sought just the isolated points in V(f), while in Part III, we have sought the numerical irreducible decomposition of V(f). In this final chapter, we discuss operations on irreducible components. In particular, we present algorithms from (Sommese et al., 2004b, 2004c) to compute the numerical irreducible decomposition of A n B, where A and B are irreducible components of V(f) and V(g), respectively. The capability to work with individual pieces of the solution sets and to intersect pieces from different sets of equations gives a new level of refinement, allowing resources to be concentrated on just the objects of interest, especially when the solution sets of the systems on hand include extra components that are not of interest, as happens frequently. When one wants the intersection of reducible algebraic sets, it is just a matter of bookkeeping to intersect all of their irreducible pieces. For reasons which will become apparent later, we call the workhorse of the new approach the diagonal intersection algorithm. The diagonal intersection technique even allows one to examine certain algebraic sets that are proper subsets of the irreducible components of the equations on hand. A case in point is where A and B are both irreducible components of V(f). In this case A n B is certainly an affine algebraic set, but we do not have on hand a set of polynomials for which it is an irreducible component. One could derive such a polynomial system with appropriate symbolic operations on / , but we can find witness points for the set with only numerical operations on / . Somewhat surprisingly, the ability to work with individual components gives us new leverage in finding just the isolated solution points. We find that much of the special structure of a system, which we worked so hard to exploit in Part II, can be captured starting with total-degree homotopies to decompose individual equations and then applying the diagonal intersection algorithm to find intersections equation-by-equation. While the approach is at present too new to have much practical experience, early experiments (Sommese, Verschelde, & Wampler, 2004e)
289
290
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
show promise. It is hoped that the approach might solve some problems that were previously too large to solve in one blow by the traditional approaches of Part II. 16.1
Intersection of Irreducible Algebraic Sets
A good idea of the way the diagonal intersection algorithm proceeds can be gleaned by studying a special case. Assume that we have two polynomials /, g on C2. Let A be an irreducible component of V(f) and let B be an irreducible component of V(g). We would like to find A n B. Assume that A has degree d\, and B has degree d2, and let a.\,..., a^ and /?!,..., (3d2 be witness point sets of A and B, respectively. That is, for generic linear equations LA(x) = 0 and LB(X) = 0, we assume that we have already computed the intersections V(LA)^A = {a\,... , a ^ } and V(L B ) Dfl = {/?i,... ,/JdJ. Note that AC\B can be interpreted as solutions to a system on C4 by a procedure from algebraic geometry called reduction to the diagonal (Ex. 13.15 Eisenbud, 1995). The procedure is to form the system ~f{xi,x2)~ F(x1,x2,y1,y2)
=
S
= 0.
^
. x2-y2
.
The solutions of the system consists of points {x\,x^,x\,X2) G C4 with (x*,^) a point of V(f,g). This identification respects components and all multiplicity structure. In particular, all the irreducible components of AC\B have corresponding irreducible components in V(F). Ignoring for a moment the two diagonal linears, let's consider the set V{f(x1,x2),g{yi,y2))Clearly, A x B is an irreducible component of this set. To see this, remember that an algebraic set being irreducible means by definition that its set of smooth points is connected. To see that the smooth points of A x B, (Ax B)leg, is connected, note that (A x B) reg = ATeg x BTeg, and that the product of connected sets is connected. Moreover, we know a set of witness points of A x B, i.e., the set of points {(ati,l3j),i — l,...,d\,j = I,...,d2} are the intersection of A x B with the linear space V(LA(x1,x2),LB(yi,y2))Consider the homotopy
_{1 - t){x2 - y2) + jtLB{x)_ with 7 a general point of S1, the complex numbers of absolute value one. In this special case, the diagonal intersection theory of (Sommese et al., 2004b) implies
291
Intersection of Algebraic Sets
that the endpoints as t —> 0 of the solution paths of H(x, y, t) starting at the points (<Xi, (3j) at t — 1 includes (using the identification given by reduction to the diagonal) all the isolated points of An B. The general case is conceptually not much harder, although the procedural details get a bit technical. We sketch only the main idea here. We use notation similar to that above, but now work in higher dimensions. That is, let A C V(f) C CN and B C V(g) C C ^ be irreducible algebraic sets, with / and g as polynomial systems. Let dim A — a and dim B = b. The main idea is that, letting x £ Cfc be the variables for A and y £ Cfc those for B, we wish to find the irreducible decomposition of the diagonal polynomial system, namely x — y, restricted to Ax B. The cascade homotopies of Chapter 14 carry over with A x B in place of Euclidean space. In short, we have an embedding like Equation 14.1.4 that includes all of the systems for slicing witness sets at every dimension. As in the cascade method on Euclidean space, we need to square up systems as necessary. Omitting detailed argumentation, this just amounts to choosing random, complex matrices M / , Mg, Mxy,S, U, v with dimensions as follows: Matrix rows columns
M/ N- a #(/)
Mg N -b #(g)
Mxy a+b N
S a+b N
U v N N 2N 1
The result is a system of 2N polynomials: Mf • f(x) Mg-g(y)
£(x,y,t)=
Mxy(x-y)
S-T(t).(u-[xy\+vy
+
(16.1.1)
where T(t) is & NxN diagonal matrix with entries ti,... ,tN. Just as in the regular cascade method, we choose t\,... ,tn randomly, and a witness set for dimension i is found by solving the equations £ (x, y, t^) = 0, where t ^ = (ti,...,ti,O,...,O). To get started, note that we have at the outset the solutions (a*, f3j) 6 CN x CN, i = 1 , . . . , deg A, j = 1 , . . . , deg B of the system J-(x, y) = 0, where
'Mrf(xy H*,V)=
M
['S]
•
(16-1.2)
LJA\X)
. LB(y) _ Now, the top dimensional component of A n B is at most fci := min(a, b) and the lowest is at leastfco:= max(0,a + b - N). We solve for dimension fci by tracking the solution paths of s^(x,y) + (1 - s)£ (z,y,tM) = 0,
(16.1.3)
from each of the start points (ai, 13j) at s = 1 to get at s = 0 three kinds of points: witness points on the diagonal x — y = 0, points at infinity, and "nonsolutions." The
292
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
nonsolutions at dimension i are the start points for the homotopy to dimension i — 1, £(x,y,{t1,...,ti-1,s,0,...,0))
= 0,
(16.1.4)
whose solution paths we follow from s = 1 to 0. This is a brief, but procedurally complete, description of the diagonal intersection method. As outlined above, each homotopy is a system of 2iV equations in 2N unknowns. In (Sommese, Verschelde, & Wampler, 2004c), it is shown how to consistently reduce the size of the homotopy by using intrinsic formulations of the linear equations. Finally, it is important to note that the output of the diagonal homotopy method is a witness superset. We still need to remove junk points and, if desired, break the witness sets into irreducible witness sets. The algorithms of Chapter 15 are directly applicable.
16.2
Equation-by-Equation Solution of Polynomial Systems
With the diagonal intersection algorithm in hand, we have much more flexibility in how we solve systems of polynomials. For example, we can subdivide a system into two sets of polynomials, compute the irreducible decomposition of each, and the use the diagonal method to intersect each irreducible component of the first subsystem with each one of the second. With a little bookkeeping, for eliminating duplications and so on, we get a numerical irreducible decomposition for the whole system. Taking this approach to the extreme, we may first find witness sets for each polynomial individually, and then intersect these one-by-one. We call this solving the system equation-by-equation (Sommese et al., 2004e). The approach is most easily described in terms of a flowchart, shown in Figure 16.1. The post-processing of points coming out of the diagonal homotopy discards duplicates and checks whether singular points are junk. In the junk removal box, we have used the shorthand V{W) to mean the algebraic set witnessed by W. We also allow an affine algebraic set Q to be pre-specified for discarding points on known degenerate sets or sets not of interest. For example, should we wish to work on (C*)N, Q is the union of the coordinate planes, X{ = 0, any i. The flowchart also includes two tests that eliminate some witness points of the subsystems before they get to the diagonal homotopy routine. The one on the left, "/fc+i = 0?," recognizes that if a witness point satisfies the new equation, then the set it represents does too, and it passes to the output without change of dimension. The points eliminated by the similar test on the right, ufi(x) = 0 any i < fc?," discards points on components we have already found. Such tests are cheap compared to running the diagonal homotopy, so it is useful to employ them. The pruning of points in the flowchart can be made more stringent if all we wish to find are the nonsingular isolated points of the system. Supposing that the original system is square, / : CN —> C^, we can keep in the output for V ( / i , . . . , ft) just the nonsingular witness points for dimension N -i. There are not enough polynomials
293
Intersection of Algebraic Sets
remaining to cut any higher-dimensional components down to isolated points. To understand why the equation-by-equation approach might be valuable, consider that systems of 50 or more low degree polynomial equations occur naturally in the study of polynomial systems. It can happen that such a system has only a few thousand isolated solutions, and we might wish to find them. Straightforward use of traditional homotopy continuation, such as we described in Part II, may have little chance of succeeding. For example, assume that we had a system of 60 polynomials of order two. A total degree homotopy continuation would have 260 « 1018 paths. Assuming we had a thousand node computer, each node of which could compute 20 paths a second, it would take a few million years. Of course, if the system has many fewer than 260 solutions, we should not be using a total degree homotopy, but instead use a start system to take some advantage of the special structure. However, the computation of a special start system also suffers from a curse of dimensionality, so we may not find a good one in a reasonable amount of time. Consider the following simple case: an eigenvalue problem. We have A a given 60 x 60 matrix of constants. We have the polynomial system Ax — \x = 0. Regarding it as a system on P 59 x C and homogenizing, we get the system fxAx — Xx = 0
on P 59 x P 1 . Embedding P 59 x P 1 into P 119 using the Segre embedding described in § A. 10.2, we see that the total degree of the solution components of the first k equations is k. This means the number of solution paths working equation-byequation never gets large. A reasonable observation is that using the bihomogeneous structure we just wrote down, the usual homotopy continuation will work well. The point is that in this case we could see a special structure. If we hadn't, the total degree homotopy would be useless, but the equation-by-equation approach would automatically utilize the special structure. 16.2.1
An Example
Consider once more the system given in Equation 12.0.1, which we treated with WitnessSuper in Example 13.6.4 and with Cascade in Example 14.2.1. The equations are
\h(x,y)] [h{x,y)\-
[ x(y2-x3)(x-l) 1 2 3 [x(y -x )(y-2)(3x + y)\
U
"
It is easy to confirm by hand that the equation-by-equation algorithm flows as follows. The numbers next to the flow lines indicate how many points flow that direction. Counting the computation of witness points for the individual equations,
294
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
there are a total of 5 + 6 + 2 = 13 homotopy paths in the procedure for this problem. This compares to 36 paths for WitnessSuper and 39 paths for Cascade. Witness V(fx)
Witness V(f2)
#Wf = 5
#X2 = 6
5
^
-.
6
Witness y ( / i , / 2 ) 16.3
Exercises
Exercise 16.1 (Flowchart) Draw a diagram showing how witness points flow when the equation-by-equation method is applied to the system of Example 13.6.5. (Hint: some points coming from the diagonal homotopy go to infinity.) Exercise 16.2 (Eigenvalues) Chart the flow of witness points for an equationby-equation treatment of the eigenvalue problem described in § 16.2. Assume the size of the matrix i s n x n . The output of the diagonal homotopy at each stage consists of only nonsingular points and points at infinity. How many paths are tracked in total? Exercise 16.3 (Diagonal Intersection of Reducible Components) The description of the diagonal approach is for intersecting irreducible sets. Despite this, the equation-by-equation flowchart does not require the witness sets to be decomposed into irreducibles. Explain why this is valid.
295
Intersection of Algebraic Sets
Witness
Witness V(h, ...,fr) w*
w$
•••
w*
•••
xk+1
w£
w
/fc+i(w) = O? —
V
•
•
N>
x
1
^
V{fk+1)
|
W
/i(a;) = Oanyi
^
X
r
7—
V7 f I
<^N
Diagonal Homotopy
y at CXD? or y e Q?
ye W^li1?
y singular?
\ I
Y^>
Y>
,
Y> •
1
y € V{Wf+1)
I
for any i < j ?
I
Y>
r^>^ \
Wk
+1
Wk
+1
...
^fc+l
Wk+1
...
py-fc+l
Discard /
^ ^ 1
Witness V ( / i , . . . , / f c +i)
Fig. 16.1 Stage A; of equation-by-equation generation of witness sets for V(fi,..., fn) 6 C w \ <5The witness sets are subscripted by codimension and superscripted by stage. Q is some prespecified algebraic set on which we wish to ignore solutions.
Appendices
Appendix A
Algebraic Geometry
A basic goal underlying algebraic geometry is to translate between algebra and geometry, and take advantage of people's strong visual intuition and the tools developed in mathematics to support this intuition. Over the complex numbers, the relationship between algebra and geometry is remarkably strong, and sadly over the real numbers this relationship is very weak. In this appendix we present useful results about these concepts, but we have left many facts to be introduced as needed throughout the book. What we have tried to do is give adequate definitions and examples so that the reader can understand the techniques in the book. Towards this goal we add to the basic concepts introduced earlier in this book. There are a plethora of introductory books on algebraic geometry. Unfortunately, many of these, based on a computational algebra approach, are not centered on the basic geometric facts we need, e.g., the equivalence of an algebraic set being irreducible with the connectedness of its smooth points. (Kendig, 1977) is a good geometric introduction. Though restricted to plane curves, (Fischer, 2001) is a gentle introduction that covers a surprising amount of important material. (Fischer, 1976) is a wonderful book for getting a detailed understanding with precise statements of the analytic geometry that is useful in the study of polynomial systems. No one book will cover everything, but for further study we suggest the fine books (Griffths & Harris, 1994; Harris, 1995; Mumford, 1995), which discuss many geometrical issues that arise. (Eisenbud, 1995) is a useful book covering the algebra underlying the symbolic methods with attention to the background geometry. (Decker & Schreyer, 2001) is a good survey of computational algebraic geometry. (Cox et al., 1997, 1998; Decker & Schreyer, 2005; Greuel & Pfister, 2002; Schenck, 2003) are good introductions to computational algebra and computational algebraic geometry. Except when explicitly stated, algebraic sets are reduced, i.e., we ignore multiplicity information. Systems of polynomials on C'" are not sufficient. For example, if we have a system of polynomials on C^, it might well happen that there is some algebraic subset B of CN, known in advance of solving the system, such that solutions in B 299
300
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
may be ignored. Working directly with a system of polynomials on CN \ B leads to conceptual clarity. A more serious situation occurred in Chapter 16 where the natural space is not a Zariski open set of C^, but rather a pure-dimensional affine algebraic set. The compromise we make here is to deal with algebraic functions on pure-dimensional quasiprojective algebraic sets. Systems of homogeneous polynomials on projective algebraic sets may be reduced to this situation by the moves discussed throughout the book, i.e., for projective space, we pass to the Euclidean space of one dimension higher with the addition of a random linear equation or equivalently passing to a "general" Euclidean patch inside the projective space. We make several remarks in the rest of this appendix about more general situations, e.g., working with line bundles and vector bundles. A significant part of this appendix is devoted to Bertini Theorems, which are crucial for applications of numerical analysis to polynomial systems. Many of these results assert that certain sets are smooth with appropriate dimensions or they are empty. These statements, which do not assert any existence, are usually simple to prove, and reduce to Theorem A.4.10 combined with some form of the constructions of § A.7 or § A.8.1. There are also statements asserting certain sets are nonempty or irreducible. These results are more difficult and rapidly lead beyond the scope of the book. For this reason, we have multiple statements of Bertini's Theorem with different levels of generality. A.I
Holomorphic Functions and Complex Analytic Spaces
The complex neighborhoods introduced in § 12.1.1 are convenient because they may be chosen small enough to discard global information. Loosely speaking, they let us put local properties of a space "under the microscope." When using complex neighborhoods it is often useful to choose local coordinates which are not polynomials. Here is a typical example. Example A.1.1 Consider the affine algebraic set Z := V(w2 - z). We have a map 7T : Z —> C given by ir(z,w) = z. There are two points in the fiber TT~1(1) over 1, i.e., (1,1) and (1,-1). As we will see, Z is a manifold, and a natural parameterization of Z at (1,1) g Z is given by (z, ^/z) where we choose the branch of yfz with \/I = 1, and stay in a neighborhood of 1, e.g., {z £ C | z ^ (—oo,0]}, where the branch gives a well-defined function. For doing algebraic geometry over the complex numbers, it has been standard for over a century to use holomorphic functions such as the function ^/z in Example A.1.1 and holomorphic functions such as ez. When talking about holomorphic functions, we use the complex topology unless we explicitly say otherwise, e.g., that a set is a Zariski open set. A function / defined on an open set V C CN is said to be a holomorphic function on V if given any x = (xi,..., xN) G V, there exists a neighborhood U C V of x
301
Algebraic Geometry
on which there is an absolutely convergent power series expansion oo
f(zu...,zN) = ^2 X!
i=0 \J\=i
a z x J
j( - ) >
where all of the aj G C. Here we use multidegree notation (1) J denotes an JV-tuple of nonnegative integers (ji,..., (2) \J\ :=ji+---+jN\ and (3) {z - x ) J : = (zi - x ^ • • • ( z N - x N y > » .
J'JV);
Just as in one complex variable there are many equivalent ways of denning holomorphic functions, e.g., in terms of the Cauchy-Riemann equations. We refer the reader to (Pritzsche & Grauert, 2002; Gunning, 1990; Gunning & Rossi, 1965) for more on holomorphic functions. We need only a few facts about them. The first is the obvious fact that polynomials are holomorphic. Locally, polynomials and holomorphic functions look and behave the same, but when looked at globally, holomorphic functions can be much more wild than polynomials, e.g., ez — 1 has infinitely many complex zeros. On the other hand, there are many results that assert that a holomorphic function with growth as moderate as an "algebraic function" is an "algebraic function." For example, any holomorphic function / on C ^ with the property that there is a constant C > 0 and an integer K > 0 such that
l/(*)l < c(1 + v / N 2 + --- + k v | 2 ) X is a polynomial of degree < K. This follows immediately from the Cauchy Inequalities (page 21 Pritzsche & Grauert, 2002). In analogy to affine algebraic sets, we define a complex analytic set X C U on an open subset U C CN as the set of common zeros of a finite number of holomorphic functions f\,..., fk on U. Given a complex analytic set X := V ( / i , . . . , fk) C U on an open subset U C CN and a complex analytic set Y := V( '• X —> Y is a function from X to Y such that (f> is the restriction to J of a holomorphic map A : U' —> V from an open subset U' C U containing X to an open subset V' C V containing Y, i.e., the restriction of a mapping of the form (z1,...,zN)
-> (A^Z!,...,
zN),...,
AM(zi,
• • •, zN))
with Ai(zi,... ,ZN),...,AM(ZI,...,ZN) holomorphic functions. A holomorphic mapping <j> : X —> Y is called a biholomorphic mapping if there exists a homomorphic mapping tp : Y —> X such that <j> o ip is the identity mapping on Y and tjj°4>\s the identity mapping on X. In this situation we say that X is biholomorphic to Y. Recall, e.g., (Milnor, 1965), that an n-dimensional differentiable manifold X
302
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
is a metric space that locally looks like Euclidean space. The definition of differentiable manifold requires some technicalities because manifolds can have different degrees of smoothness. You need a set {Ua \ a € 1} of open sets which covers the manifold, i.e., X = UaeiUa, and for each Ua, a map (pa : Ua —> R n that gives a homeomorphism Ua to an open set of ]Rn. Moreover: (1) given any compact set K C X, only finitely many Ua meet K; (2) whenever Ua D U0 ^ 0, fa o 0 " 1 : 4>a(Ua n Up) -> 4>p{Ua D L^) is C°°, i.e., has infinitely many continuous derivatives; (3) there is a countable basis of open sets, i.e., there are a countable set B of open sets such that every open set on X is a union of open sets from B. If we replace Rn by C™ and C°° by holomorphic, we have the definition of ndimensional complex manifold. Before we go any further, we point out that all manifolds connected to algebraic geometry are quite nice. For algebraic sets, we rarely need worse than complex manifolds, which are much "nicer" than even C°°-manifolds, i.e., infinitely smooth manifolds. We never stray below infinitely differentiable manifolds. The complex analytic sets we have defined so far are analogous to affine algebraic sets. There exists a very natural more global notion of complex analytic spaces defined analogously to complex manifolds using the complex analytic sets as local models, e.g., (Fischer, 1976; Gunning & Rossi, 1965). Complex analytic sets, quasiprojective algebraic sets, and complex manifolds are complex analytic spaces. In what follows we will state some results for complex analytic spaces.
A.2
Some Further Results on Holomorphic Functions
Holomorphic functions satisfy very strong restraints that are often considerably stronger when the domain of the functions is at least two dimensions. For example, there are several convenient extension theorems. Theorem A.2.1 (Hartogs' Theorem) Let U C CN be an open set with N > 2 and let Y = V(gi,... ,gi) be a complex analytic subset of CM. If K C U is a compact set with U \ K connected, then any holomorphic mapping A : U \K —• Y has a unique extension to a holomorphic mapping U —> Y. Proof. The map A is given by functions Ai,...,AM, and has a unique extension to U, since the Ai extend uniquely to U by the single function version of Hartogs' Theorem, e.g., (page 307 Fritzsche & Grauert, 2002). Since the holomorphic functions gt(Ai(z),..., AM{Z)) are identically zero on U \ K, the extensions to U are identically zero. Thus A(U) CY. •
Algebraic Geometry
303
Remark A.2.2 Theorem A.2.1 is not true with Y merely a complex analytic subset of an open set U C CM. For example, if G is the open unit ball in C2, K is the closed ball in C2 of radius 1/2, and Y = U := G\K, the result is false. It is true whenever U is a holomorphically convex open set of C M , see, (page 75 Fritzsche & Grauert, 2002). Such sets include CN and open balls. Here is a typical use of Hartogs' Theorem. Example A.2.3 Let X := CN \ 0 with N > 2. Then X is not isomorphic to an affine algebraic set. To see this assume otherwise that it was isomorphic via F : X —> X' to an affine algebraic set X' C C M for some positive integer M. Then, since X' is closed, any sequence xn G X converging to 0 £ C^ cannot have their images F(xn) converge in C M . But, such a sequence does converge, since by Hartogs' Theorem A.2.1, the mapping F has a holomorphic and hence continuous extension to CN. The following simple result puts Example A.2.3 in perspective. Lemma A.2.4 Let g be an algebraic function on an affine algebraic set X C CN. Then X \V(g) is isomorphic to an affine algebraic set. Proof. By definition we have a polynomial p £ C[zi,..., ZN] such that Px = 9 and hence V(g) = V{Px) = V(p,f,,..., fk) where X := V ( / 1 ; . . . , fk). We let z denote the AT-tuple (zi,..., zN). We define the map F : X\V(g) -> V(fi,..., fk, wp-1) C C ^ 1 by F(z) = (z,l/p(z)). Define G : V{h,..., fk,wp - 1) ^ X\V(g) by G(z, w) = z. Note that G o F is the identity on X and F o G is the identity on V(f1,...,fk,wp-1).
a
Another very useful result is the Riemann Extension Theorem (page 38 Fritzsche & Grauert, 2002). Theorem A.2.5 (Riemann Bounded Extension Theorem) Let U be a complex manifold. If Y c U is a complex analytic subset of U with Y ^ U, then any bounded holomorphic function on U \Y has a unique extension to a bounded holomorphic function on U. Remark A.2.6 Analogous to the generalization mentioned in Remark A.2.2, Theorem A.2.5 remains true if the bounded holomorphic function on U \ Y is replaced by a holomorphic mapping from U \ Y to an analytic subset X of a bounded holomorphically convex open set G C C M . The condition that U is smooth may be relaxed to the condition that U is normal, which will be briefly touched on in § A.2.2. For holomorphic functions, there is the maximum principle, e.g., (Theorem LA.7 Gunning & Rossi, 1965).
304
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
L e m m a A.2.7 (Maximum Principle) Let f{x) be a holomorphic function on a connected open set U C C ^ . / / |/(:r)| has a maximum on U, then f(x) is constant. For holomorphic functions, partial derivatives with respect to the coordinates can be shown to be well defined, e.g., by differentiating the power series term by term. The analogues of many differential calculus results hold with no change of statement. For example, there is the important Implicit Function Theorem. A set of holomorphic functions / i , . . . , /JV defined on an open neighborhood U of a point x € CN is called a system of coordinates on U centered at x e C ^ if (1) fi{x) — 0 for all z; and (2) the mapping ( / i , . . . , fN) : U —> C ^ is a biholomorphic mapping from U to an open set V oi CN. By Theorem A.2.8, the second condition, with U possibly replaced by a smaller open set, is equivalent to a condition that the Jacobian at x r dfi dzx '" dfN • dz\
dfx -I dzN
_ _ _ dfN dzN
-
is invertible at x. T h e o r e m A . 2 . 8 ( I m p l i c i t F u n c t i o n T h e o r e m ) Let fi,...,fk
be
holomorphic
functions defined in a neighborhood of a point x € CN with fi(x) = 0 for all i. Assume that the Jacobian rdfx dzx
dfx -i dzN
dfk . - dzx
3/fc dzpi -
has rank k at x. Then on some possibly smaller neighborhood U of x, there exist holomorphic functions fk+i, • • •, /JV such that / i , . . . , fN form a system of coordinates on U centered at x. The analogues of the many consequences of the differentiable implicit function theorem hold with no change. For example, we have a corollary that we will use below. Corollary A.2.9 Let Z e.g., let U = CN and Z a holomorphic map from 4>(0) — x £ Z and
CU be a complex analytic subset of an open set U C C CN be an affine algebraic set. Let <j> : B -> the open ball B in a complex Euclidean space Cm equal to a neighborhood of Z containing x. Assume
C^, C ^ be with further
305
Algebraic Geometry
that the complex Jacobian • dfa_ . . . 9
: •.. :
d4>=
L dzi '"
dzm J
has rank m at 0. Then there exist holomorphic coordinates / i , . . . , /jv in a open set U' C U ofx such that ZnU' = (j>{B)C\U' = {x € U' | fm+i(x) = 0;...; fN{x) = 0}. Proof. By renaming if necessary we can assume without loss of generality that • <Mi . . . Mx. -\ dz\ dZrn d
has rank m a t 0. In addition to the coordinates z\,..., zm on C m , let zm+i,... ,zjq be coordinates on CN~m. Define the map f : B x C N " m -^ C^ by f{(zi,..., zN) = 4>i(zi,..., zm) for i from 1 to m, and /i(zi,..., z/v) = —4>l(zi,..., zm) + z, for z fromTO+ 1 to N. The Jacobian of / at 0 is 9zi
a^m
u
"9^7
9i^"
U
dtpm + l 3zi '" 90w L
aZl
u
'
U
''"
L
90m + l 1-1 rv 9 z m -- • • • U i
d(f>N r\ '"'
dzm
u
J
By the implicit function theorem with k = N, the /j form a system of coordinates at 0. Knowing that / is one-to-one in a neighborhood of 0, it follows by construction that (j>{B) n U = {x e U' | fm+i{x) = 0;...; fN{x) = 0}. • The reader might observe that in Examples 12.1.3 and 12.1.4, there is a one-toone and onto mapping C —> V(w2 — z) given by sending j o e C t o (w, w2). Note that the differential of the mapping everywhere has rank one, and the map (z, w) —> w gives an inverse. Given this, it is natural to hope that given a smooth point x of an affine algebraic set Z, there is a Zariski open dense neighborhood U C Z of x which can be identified with a Zariski open dense subset of some Euclidean space. It is a fact of life that this is false. Example A.2.10 w2 ~z{z-l)(z-2)
Let I c C 2 denote the affine algebraic set defined by p(z, w) = =0. Since
306
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
is the empty set, it follows from Corollary A.2.9 that V{p) is a manifold. It can be shown that V(p) is as a differentiable manifold homeomorphic to a torus minus one point, i.e., homeomorphic to S1 x S1 minus a point, where S1 denotes the circle S*1 := { z e C \z\ = l} . Any Zariski open set U C V(p) is the complement of a finite set on V(p). Thus, there will be two differentiable embeddings of the circle 5 1 to V(p) that meet transversely in only one point. But, there can be no such maps of S1 into C. One of the beauties of Zariski open sets is that they are very big. The problem here though, is that Zariski open sets are too big. A.2.1
Manifold Points and Singular Points
How bad a set is an affine algebraic set? How far are they from being smooth, i.e., from being a manifold? As we will see, the answers are "quite nice" and "not very far" respectively. In fact, given any affine algebraic set Z, the set of smooth points of Z will be a Zariski open dense set of Z. Let us introduce definitions and concepts to make this precise. Given an affine algebraic set Z C CN, we define a point x G Z to be a smooth point (also called a manifold point or a regular point) if there is a holomorphic map
:
d<j>=
9
Uz,
•-.
: 9
' ' ' dzm J
has rank m a t i , where (f>{zi,...
,zm)
= (4>i(zi,...
,zm),...
,<j>N(zi,...
,zm)).
For example, at (1,1) on Example A.1.1, (j> can be taken to be z —> (z, yfz). Note that by Corollary A.2.9, it follows that given a smooth point x £ Z, there are holomorphic coordinates z\,..., z^ defined on a complex open set U C CN containing x and such that Zi(x) = 0 for all i, and such that U(~\Z = V(zm+i, • • •, .ZJV). This integer m is defined to be the complex dimension of Z at a regular point x £ Z. The complex dimension of Z at m is half the usual dimension of Z considered as a topological manifold at x. We typically use the word dimension for complex dimension and refer to the usual dimension as the real dimension. For example, the complex dimension of C is one and the real dimension is two. It is traditional to denote the smooth points of a quasiprojective set Z by Z reg . The points in Z \ Zreg are called singular points. The singular points of Z are denoted Sing(Z). The dimension of Z at a smooth point is well defined. A nice argument for this follows by adapting the very short argument for differentiable manifolds (page 7 Milnor, 1965). We gave a general definition of dimension in § 12.2 based on the irreducible decomposition.
307
Algebraic Geometry
One difficulty with deciding for which points an algebraic set are smooth is that the defining equations for the set might have too much information packed in them. Here is an example where the defining functions will not suffice. Example A.2.11 Let Z := V(z2) C C. In this case, Z = V(z) also, and using the defining equation z, we see that Z is a manifold. The problem with the defining equation z2 is that it also includes multiplicity information about Z. Remark A.2.12 There is no easy computational solution to the problem posed by the last example. The set of smooth points of an affine algebraic set V(f) is Zariski open and dense, but the prescription for the singular set is nontrivial. Given an affine algebraic set Z C C^, Z = V(I(Z)) where I(Z) denotes the ideal of polynomials in C[zi,... ,ZN] that vanish on Z. One version of Hilbert's Nullstellensatz, e.g., (Cox et al., 1997), says that given an ideal X C C[zi,..., z^\, then I(V(X)) = y/X, where y/X, the radical of X, consists of all polynomials g such that gk G X for some positive integer k. For example, on C, y/(z3) = (z). The passing from an ideal to its radical throws away all multiplicity information. The radical intervenes in the algebraic characterization of the set of smooth points of an affine algebraic set. Let g\, • • • -,gu be a basis of the radical of X(f). It follows that (Chapter 1A Mumford, 1995) that the singular set, Sing(V(/)), of V(f) is equal to T/
( \
dgi dzi
dgM \ dzN)
It must be noted that for a fixed N, M can be arbitrarily large. A special case of the above, mainly useful for making illustrative examples, is the following. Lemma A.2.13
Let p G C[zi,..., z^\. The singular set of V(p) is contained in (to
dp^\
Here is an example using Lemma A.2.13. Example A.2.14
Let Z = V(zw) C C 2 . In this case the potential singular
points, V I , -r—, zw I, the common zeros of zw and its partial derivatives, \ oz ow j equals the origin (0,0) € C 2 , which is clearly a singular point of Z. Remark A.2.15 The inclusion in Lemma A.2.13 is an equality if the dimension of the set ( \
dp_ dzx
^ \ dzN J
is < N — 2. For explicit examples, such a criterion is useful, but in our numerical work such criteria have not yet proven useful.
308
A.2.2
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Normal Spaces
There is an extensive literature on singularities. The simplest general class of complex analytic spaces after complex manifolds are normal complex analytic spaces, e.g., see (Fischer, 1976). A complex analytic space Y is normal if given any y GY and any complex neighborhood U of y and any bounded holomorphic function / on U \ Singf/, it follows that / extends holomorphically to U. A quasiprojective algebraic set is said to be normal if it is a normal complex analytic set. Given any complex analytic space (respectively, any quasiprojective algebraic set) X, there exists a unique normal complex analytic space (respectively, a unique normal quasiprojective algebraic set) X' with a finite proper holomorphic (respectively, algebraic) map -K : X' —> X with n : X' \ 7r~1(SingA') —> X \ Sing(X) isomorphic. Normal spaces include the affine algebraic subsets X C CN with the properties: (1) X is a reduced complete intersection, i.e., all irreducible components of X have with the same dimension; there are k := N — dimX polynomials pi,...,pk X — V(pi,... ,Pk)', and all components of X occur with multiplicity one; and (2) Sing(X) is codimension at least two in X. These special sets are irreducible and naturally occur as parameter spaces.
A.3
Germs of Complex Analytic Sets
There are situations, e.g., in the study of endgames in Chapter 10, when we want to look carefully at behavior in a neighborhood of a point. In such situations specifying a fixed neighborhood of the point is inconvenient, and the notion of a germ of a complex analytic set improves clarity. (Chap. II, Sec. E Gunning & Rossi, 1965) is an excellent place for becoming comfortable with germs of complex analytic sets. Since we will not be talking about germs of other types of sets, e.g., germs of affine algebraic sets, which are denned analogously using the Zariski topology in place of the complex topology, we will often refer to the germ of a complex analytic set as a germ of an analytic set or a germ. Given a point x e CN we define an equivalence relation on complex analytic sets containing x: if X c U and X' C U' are two complex analytic sets denned on open neighborhoods of x in C ^ , then we say that X and X' have the same germ at x if there is an open neighborhood V C U D V with X n V = X' D V. Thus the complex analytic sets V(z) and V(zw) define the same germ of a complex analytic set at (0,4) but not at (0,0). G C ^ , we let ||u;|| = \/|u!i| 2 + • • • + \WN\2 Given a point w = (WI,...,WN) denote the Euclidean norm and we denote the ball of radius r about a point x by
Br(x), i.e.,
Br(x)~{zeCN
||z-x||
Algebraic Geometry
309
We say a germ X at a point x £ C ^ is irreducible if there is a positive number e' such that for all positive e that are less than e', there is an irreducible representative of X in Be(x). This is equivalent to the usual definition that a germ l a t i G C^ is irreducible if there is no way to write X as a union Xi U X2 of germs X\, X2 at x unless either X = X1 or X = X2. The dimension of an irreducible germ X at a point x E C ^ is defined as the dimension of any one of the germ's irreducible representatives. An important fact about germs is that the irreducible decomposition holds, e.g., see (Theorem II.E.13 Gunning & Rossi, 1965). Theorem A.3.1 Let X be a germ of a complex analytic set at a point x € CN. Xk at x £ CN such that There are a finite number of irreducible germs X\,..., X = Xi U • • • U Xf, and for all j = 1 , . . . , k we have that Xj
310
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
dimensional, the set of singular points of X is O-dimensional and therefore meets a neighborhood of x with compact closure in a finite set. We can therefore (by taking a smaller U if needed) assume that the only possible singular point in U is x. If a; is a smooth point there is nothing to prove. Similarly IT : U \ {x} —> Ai (0) \ 0 is an unramified covering with a well-defined sheet number d. Basic topology tells us that the restriction / : Ai(0) \ 0 —> Ai(0) \ 0 of the mapping z —• zd, factors through n : U \ {x} -> Ai(0) \ 0 as f(z) = n(g(z)) with g : Ai(0) \ 0 -> U \ {x}. Using the Riemann Extension Theorem A.2.5, we conclude we have an extension <> / of g satisfying the conclusions we are trying to show. • From Theorem A.3.2 follow results classically referred to as Puiseux's theorem, e.g., (Chapter 7 Fischer, 2001). The following corollary is a version of this result. Corollary A.3.3 Let X be a one-dimensional complex analytic subset of an open set U C C ^ . Assume that X is irreducible at a point x e X. Let
g(x)+Y^aiw\ k=c
where ac ^ 0. Since Y^k=oai+cwl 1S nonzero at 0 we may express it on A r (0) for some positive r < 1 as h(w)c, with h(w) holomorphic on A r (0). Choosing r positive, but possibly smaller, s = h(w) is the desired coordinate. •
A.4
Useful Results About Algebraic and Complex Analytic Sets
The Hironaka Desingularization Theorem, which holds for both complex algebraic sets and complex analytic spaces, is highly nontrivial, but extremely useful. Given a quasiprojective algebraic set X (respectively, a complex analytic space X), a desingularization f : X —> X of X is a quasiprojective manifold X (respectively, a complex manifold X) and a proper surjective algebraic (respectively, holomorphic) map / : X —> X such that //-i(x r e g ) : / -1 (-Xreg) —> XTeg is an isomorphism Zariski open and dense in X. Xreg (respectively, a biholomorphism) with f~l(Xreg) is always Zariski open and dense in X.
Algebraic Geometry
311
Theorem A.4.1 (Hironaka Desingularization Theorem) Let X be a quasiprojective algebraic set or a complex analytic space. Then there is a desingularization f :X -> X ofX, More refined versions of the result tell us that we may choose the desingularization map so that the inverse image of the singular set under the desingularization map is a union of smooth codimension one algebraic sets which meet transversely. See (Lipman, 1975) for a nice exposition of this result. In the case when all components of X are of dimension one, Theorem A.4.1 is simply the normalization of X, e.g., see (Fischer, 1976). The Hironaka Desingularization Theorem makes many facts that are easy for manifolds carry over immediately to general algebraic sets. Here is one simple example often referred to as the maximum principle, e.g., (Theorem III.B.16 Gunning & Rossi, 1965). Theorem A.4.2 Let X be an irreducible complex analytic space with infinitely many points. Let f be a holomorphic function of X. If \f\ has a maximum on X, then f is a constant function. Proof. Let n : X —> X be a desingularization. Let / denote the composition of / with 7T. If |/| has a maximum, then so does |/|. By Lemma A.2.7, it follows that / is constant, and hence / is constant. • Theorem A.4.20 will give another illustration of the clarity brought by using Hironaka's theorem. The proper mapping theorem of Grauert assures us that many operations with algebraic sets yield algebraic sets. Theorem A.4.3 Let f : X —> Y be a proper holomorphic mapping (respectively proper algebraic mapping) of complex analytic spaces (respectively protective, respectively affine, respectively quasiprojective algebraic) sets. Then f(X) is a closed complex analytic (respectively protective, respectively affine, respectively quasiprojective algebraic) subset ofY. Proof. The analytic statement may be found in (Fischer, 1976). This, or the simple fact that in the complex topology proper maps take closed sets to closed sets, automatically implies the algebraic statements. To see this note that if X is projective, affine, or quasiprojective algebraic, we know that ir(X) is constructible by Theorem 12.5.6. Since n(X) is closed, we have the conclusion from Lemma 12.5.4.
•
Recall that an algebraic map from a quasiprojective set to an irreducible quasiprojective set Y is dominant if the image of the map is dense in Y.
312
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Lemma A.4.4 Let f : X —•> Y be a dominant proper algebraic map from a quasiprojective algebraic set to a quasiprojective algebraic set. Then f(X) — Y. Proof. Let y be a point not in the image of X. By dominance we can find a sequence of points yj G f(X) with yj converging to y. Choose Xj G X with f(xj) = yj. By the definition of properness, there is a neighborhood U that contains y such that f~l(U) is compact. Thus there is a subsequence of the Xj that converges to a point • x G X. By continuity of / , we have the contradiction that f(x) = y. For algebraic sets there is a strong result on upper semicontinuity of dimension (Corollary 3.16 Mumford, 1995) for the algebraic case or (Theorem, page 137 Fischer, 1976) for the more difficult complex analytic version. Theorem A.4.5 Let f : X —> Y be an algebraic map (respectively a holomorphic map) between quasiprojective algebraic sets (respectively complex analytic spaces). Then for each positive integer k,
{xex\ dimxr'ifix^yk} is a quasiprojective algebraic set (respectively a complex analytic space). Remark A.4.6 Let / : X —> Y be an algebraic map between algebraic sets. As we see from Example 12.5.5, the sets
{yGY
| dim f-\y)>
k)
do not have to be algebraic sets, though by Theorem A.4.5 and by Theorem 12.5.6, they are constructible. Using Theorem A.4.5 and Theorem A.4.3 we have the following result. Corollary A.4.7 Let f : X —> Y be a proper algebraic mapping of quasiprojective algebraic sets. For each integerfc> 0, the set {y G Y\ d i m / ^ 1 ( y ) > k} is a closed quasiprojective subset ofY. Finally we have the very useful Factorization Theorem of Remmert and Stein (III Corollary 11.5 Hartshorne, 1977). Note that finite-to-one proper maps are called finite maps by algebraic geometers, e.g., (Hartshorne, 1977). Theorem A.4.8 (Stein Factorization Theorem) Let f : X —> Y be a proper algebraic mapping of quasiprojective algebraic sets. Then f factors as s or, where r : X —> Z is a proper algebraic map from X onto a quasiprojective algebraic set Z with all fibers connected, and s : Z —>Y is a finite-to-one proper map. The following general lemma, which is a special case of (III Proposition 10.6 Hartshorne, 1977), is often useful. Lemma A.4.9 Let f : X —* Y be an algebraic map from a quasiprojective algebraic set X to a quasiprojective algebraic set Y. Let Xr denote the closure of
313
Algebraic Geometry
those points from x e Xreg such that f(x) 6 f(X) dimf{Xr) < r.
and rank dfx < r. Then
The algebraic analogue of Sard's Theorem, e.g., ((3.7) Mumford, 1995), is much crisper than the usual Sard's theorem for differentiable maps. It is responsible, through the Bertini theorems of § A.9 for many of the strong probability-one statements in this book. Theorem A.4.10 (Sard's Theorem) Let n : X —> Y be a dominant algebraic map between irreducible quasiprojective algebraic sets X and Y. Then there exists a Zariski open dense subset V cY such that letting U denote the Zariski open dense set A^reg (~l TT~1(V), TTu : U —> V is surjective and of maximal rank, i.e., dnx has rank dimY at all points of x e U. In particular, for all v e V, T T " 1 ^ ) fl XTeg is smooth of dimension dim X — dim Y.
Proof. In a nutshell the proof goes as follows. By replacing Y by a Zariski open dense set Y' of Y with Y' C n(X) \ Sing(Y), and X by Xreg n TT~1(Y'), we can assume without loss of generality that X and Y are smooth and n is surjective. Let X' denote the closed algebraic subset of X consisting of points for which dirx has rank < dimY. By Lemma A.4.9, n(X') is a proper algebraic subset of Y. • Remark A.4.11 The differentiable form of Theorem A.4.10 is quite weak. For example, consider the infinitely differentiable map / : E 2 —• R defined by
/(*,„) : = f e x p ( ^ i f ^ < l . [
0
if x2 + y2 > 1
The image of this map is [0,e -1 ]. Over the dense set (Oje"1) of the image [0, e" 1 ], / is of maximal rank, but f~1(U) is far from dense in IR2. Another useful fact is that generically dimensions add. Corollary A.4.12 (Additivity of Dimensions) Let f : X -^ Y be a dominant map between irreducible quasiprojective algebraic sets. There is a dense Zariski open setUcY such that for all y GU, f"1(y) is pure dimension dimX - dimY. Proof. By Theorem A.4.5, there is a dense Zariski open set V C X such that dimx f~1(f(x)) is a constant k for all x G V, and for all x G Z := X \ V, dimx f~1(f(x)) > k. Using Theorem A.4.10, we see that k = dimX — dimY. We will be done if we show that f(Z) is not Zariski dense. Assume it was. Then we would have an irreducible component Z' of Z mapped dominantly to Y. Using Theorem A.4.10 we conclude that a dense set of points x £ Z' satisfy dimx fz^{fz'{x)) — dimZ' — dimY. But this gives the contradiction that dimX - dimY = k < dim^ f^ifz'ix))
= dimZ' - dimY < dimX - dimY. a
314
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
The following result (Corollary, page 138 Fischer, 1976) is useful for analyzing not necessarily proper maps. The algebraic case with the Zariski topology follows from it by using Theorem 12.5.6. Theorem A.4.13 Let f : X —> Y be a holomorphic spaces. Assume for a point x € X that there is an O C X of x, such that for all x' € O, dinx^ f~1(f(x')) are arbitrarily small open complex neighborhoods U C such that
mapping of complex analytic open complex neighborhood is a constant k. Then there O of x and V C Y of f(x)
(1) f(U) is a complex analytic subset ofV; (2) fu'.U—t /(J7) is an open map, i.e., all open subsets of U in the complex topology are mapped to open subsets; and (3) d\mx U = k + dim /(a; ) f(U). By a covering map g : A —> B between differentiable manifolds of the same dimension is meant a differentiable map such that each point y GY has a neighborhood U such that g~1 (U) is a union of disjoint open sets each mapped isomorphically onto U by / . We have the important consequence of Theorem A.4.3 and Theorem A.4.10. Corollary A.4.14 Let f : X —> Y be a surjective proper algebraic map between quasiprojective algebraic sets. Assume that X and Y are pure dimensional with dimX = dimY. Then there is a Zariski dense open set U C Y which is smooth and such that f : f~l(U) —> U is a covering map. If Y is irreducible, the map / in Corollary A.4.14 has a well-defined degree. Corollary A.4.15 Let f : X —> Y be a proper map from a pure-dimensional quasiprojective algebraic set X to an irreducible quasiprojective algebraic set of the same dimension. Assume that f is surjective on every component of X, e.g., assume that f is finite-to-one. Then there is a Zariski dense open set U C Y such that f : f~1(U) —> U is a covering map of degree degf, which is equal to the number of points in any fiber of fj-i^uy The most important case of Corollary A.4.15 is when -K : X —» Y is finite-to-one. We say that an algebraic map from a pure-dimensional quasiprojective algebraic set X to an irreducible quasiprojective algebraic set of the same dimension is a branched covering if / is proper and finite. We define the degree of / to be the number deg / in Corollary A.4.15. The following Lemma gives an easy local condition for properness. Lemma A.4.16 (Stein) Let f : X —> Y be a holomorphic map between complex analytic spaces. Assume that y £Y and A is a connected component (not necessarily
Algebraic Geometry
315
irreducible) of f~l(y). If A is compact, then there are open complex neighborhoods U c X of A and V
•
Using this and Grauert's Proper Mapping Theorem A.4.3, we have an extremely important existence result. Theorem A.4.17 Let f : X —> Y be a holomorphic map between complex analytic spaces. Assume that Y is irreducible and that all irreducible components of X have dimension at least equal to dimY. Assume that y € Y and x is an isolated point °f f1(y)IfY is locally irreducible at y, then, there are arbitrarily small complex neighborhoods U C X of x and V C Y of y such that f\j : U —> V is a proper surjective map with finite fibers. Proof. Choose a neighborhood X' of x. By Lemma A.4.16, there are open complex neighborhoods U C X' of x and V C Y of y, such that fu : U —> V is proper. By Theorem A.4.3, f(U) is a complex analytic subspace of V. Since x is isolated, we would be done in the algebraic case by Corollary A.4.12 and the irreducibility of Y at y. In the complex analytic case, we instead use Theorem A.4.13, which implies dim/([/) = dimY. From this and the irreducibility of Y at y, we conclude that f(U) contains a complex open neighborhood V of y. The rest of the result follows by replacing V by V, and U by U n f~l{V). U Example A.4.18 The local irreducibility is needed for the above result. Let X := C and let Y := V(g) C C2 be defined by g(x,y) = y2 - x2(x + 1). Consider the algebraic map / from to X -* Y given by f(t) = (-(1 + t2),yf-i(t + t3)). The reader can check that / is surjective and one-to-one everywhere except at ±%/^T which are mapped to (0,0). A small complex neighborhood U of (0,0) on Y is biholomorphic to a small complex neighborhood of 0 on V(xy). A small neighborhood of t = yf—l does not map onto a neighborhood of (0,0) in Y, but only onto a complex neighborhood of (0,0) on one of the irreducible components of U at (0,0). The following result underpins most constructions of homotopies. It asserts that under minimal conditions, isolated solutions of a system in a family of systems are limits of isolated solutions of nearby systems. Corollary A.4.19 Let X and Y be irreducible quasiprojective algebraic sets. Let f(x;y) be a system of N := dimX algebraic functions on X xY. Letir : XxY —> Y denote the product projection. Assume that x* is an isolated solution of f(x;y*) = 0 for some point y* such that Y is locally irreducible at y*. Then each irreducible component Z ofV(f) containing (x*,y*) satisfies the following properties: (1) dim Z = dim Y; and
316
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
(2) there exist arbitrarily small open neighborhoods U C Z of x* and V C Y of y* such that 7T[/ is a proper finite map of U onto V. A number of stronger versions of Corollary A.4.19 are contained in § A. 14.
A.4.1
Generic
Factorization
The next theorem tells us that if we are satisfied with generic results, as we often are in numerical work, properness is unnecessary. The proof requires some constructions involving the graph of a map. Theorem A.4.20 Let n : X —> Y be a dominant algebraic map from an irreducible quasiprojective algebraic set X to an irreducible quasiprojective algebraic set Y. There exists a smooth Zariski dense open set U C Y and a smooth Zariski dense open set W of TT~1(U) such that nw factors nw — s or, where r : W —> V is a surjective maximal rank algebraic map with connected fibers and s : V —> U is an algebraic map and a finite covering map. In particular each fiber of TTW '-W-^U has the same number of irreducible components, i.e., degs irreducible components. Proof. In the following argument, we will repeatedly replace algebraic sets by dense Zariski open sets. For the most part, we call these shrunk sets by the same names. Replacing X by Xreg, we may assume that X is smooth. By shrinking Y and replacing X by the inverse image under n of the shrunk Y, we may assume that Y is smooth. Similarly using Chevalley's Theorem 12.5.6, we may assume that ?r(X) = Y. By using the algebraic Sard's Theorem A.4.10 and shrinking X and Y further we may further assume that IT is of maximal rank. Let X denote an irreducible projective algebraic set in which X is Zariski open. Let X denote the closure of Graph(7r) C X x Y in X x Y. The induced map W : X —> Y extending n is proper. By Hironaka's Theorem A.4.1, there exists a desingularization / : X —> X. Thus following Theorem A.4.8, we can factor ?F O / as a o p where p : X —> Z is an algebraic map with connected fibers; where Z is an irreducible quasiprojective algebraic set; and a : Z —» Y is a finite-to-one proper algebraic map. By Corollary A.4.15, there is a Zariski open dense set U' C Y such that U' and cr' 1 (t/ / ) are smooth and ov-^t/') : <J~1(U') —> U' is a covering. Thus by shrinking we may assume without loss of generality that a : Z —> Y is a finite-to-one covering; Z is smooth; and that p(X) = Z. As we shrink Z we may automatically shrink Y so as not to lose the properties already obtained. To see this let V' be a Zariski open set of Z. Since the image under a of the proper algebraic subset Z \ V is a proper algebraic subset, we may replace Y by U := Y \ a{Z \ V) and Z by V := o-1 (Y \ a(Z \ V1)) to still have an algebraic finite-to-one covering map a :V —> U between manifolds.
Algebraic Geometry
317
By using the algebraic Sard Theorem A.4.10 on p we may, after shrinking, assume without loss of generality that p is of maximal rank. Since X is smooth and / is an isomorphism on the inverse image of the regular points, we may regard X as a subset of X. By using Chevalley's Theorem 12.5.6, we may assume that px '• X —> Z surjects onto Z. Since all fibers of p are smooth and connected they are irreducible. We conclude that all fibers of px are smooth and connected. Indeed, if this failed, we would have a fiber of p which is irreducible, and which after removing a proper algebraic subset is disconnected. Taking U as the final shrunk Y, V :— a~1{U) and W as the final shrunk X, we have finished the proof of the theorem. • Corollary A.4.21 Let n : X —> Y be a dominant algebraic map from an irreducible quasiprojective algebraic set X to an irreducible quasiprojective algebraic set Y. Assume that 7r(Sing(X)) is a proper algebraic subset ofY. There exists a Zariski dense open set U C Yreg such that W := ?r~1(!7) is smooth; and ITW factors nw = s o r, where r : W —> V is a surjective maximal rank algebraic map with connected fibers and s : V —> U is an algebraic map and a finite covering map. In particular each fiber of irw '• W —> U has the same number of irreducible components, i.e., degs irreducible components.
Proof. Replacing Y by Y' := Y\ (Sing(y) U7r(Sing(X))) and X by TT^1 (Y1), it may be assumed without loss of generality that X and Y are smooth. The rest of the proof follows by carefully going through the proof of Theorem A.4.20. • A.5
Rational Mappings
Besides algebraic mappings, there is a more general notion of mapping that is often very useful. On C, the assignment / : x H-> 1/X is a well-defined function on C \ {0}. Using the identification of x € C with [1, x] € P 1 ,, it is natural to think of / as extending to a function on all of C that takes 0 to the value oo equal to [0,1]. In this case, the algebraic set {(a;, [x, 1]) e C x P 1 | x G C} is the graph of the map x —> 1/x regarded as a map from C —> P 1 . This is the simplest example of a rational mapping. A rational mapping f : X —> Y between quasiprojective algebraic sets is defined to be an algebraic set F C X x Y such that there is a Zariski open dense set U c X such that F n (U x Y) is the graph of an algebraic map and V D (U x Y) = T. Every algebraic mapping is a rational mapping, but rational mappings are much general. A rational mapping often has a set of indeterminacy, where it cannot be
318
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
assigned any value. Consider the rational mapping given by the assignment / : (zi, z
The Rank and the Projective Rank of an Algebraic System
We define an algebraic system on a pure .ZV-dimensional quasiprojective algebraic set X to be a set of n algebraic functions f{x) := { / 1 , . . . , / „ } on X. Typically we assume irreducibility in theorems about systems and leave it to the reader to make the trivial adjustments in statements for the reducible case. We define the rank of the algebraic system f(x) = 0 on an TV-dimensional irreducible quasiprojective algebraic set X to be the dimension of the closure of the image / ( I ) c C". By Corollary 12.5.7, f(X) is an irreducible algebraic set. We denote the rank of / by rank/. We define the corank of the algebraic system f(x) = 0 to be N — rank/. Neither adjoining polynomial functions of the equations of / to create a larger system nor replacing / with g • f, where g is an invertible nxn matrix, changes the rank of a system. Theorem A.6.1 Let f(x) = 0 denote a system of n algebraic functions on an irreducible quasiprojective set X. Then there is a Zariski open set U c f{X) c C" such that for y € U, V(f(x) — y) PI XIeg is smooth of dimension equal to the corank of f. Moreover, the Jacobian matrix of f is of rank equal to rank/ at all points of
V(f(x)-y)nXTeg.
Proof. By Theorem A.4.10, we know there is a Zariski open set U C f{X) such that V(f(x) — y) is smooth and such that the Jacobian matrix of / has rank equal to dim/pQ at all points of V(f(x) -y). Corollary A.4.12 gives that dim V(f(x)-y) = N-dimf(X). • Let V and / be as in Theorem A.6.1. For the dense Zariski open set U :=
Algebraic Geometry
319
/~ 1 (y)nX r e g , we have for all points £* 6 U, the rank of the Jacobian of / evaluated at x* equals rank/. This gives us a quick probability-one algorithm for the rank for a system f. Given an algebraic system f(x) of n algebraic functions on an irreducible ./V-dimensional quasiprojective set X, the rank of / equals the rank of the Jacobian at a random point of X. The following is useful. Corollary A.6.2 Let f(x) = 0 denote a system of n algebraic functions on an irreducible quasiprojective set X. If the rank of the Jacobian of f at some point x E Xreg is k, then rank/ > k. Proof. The set on X reg where the Jacobian has rank less than or equal to A; is a quasiprojective subset of X reg . If it is dense then rank/ = k. If it is not dense, then the rank of the Jacobian is greater than A; on a Zariski dense set, which would imply rank/ > k. • Theorem A.6.3 Given a system f(x) of n algebraic functions on an irreducible quasiprojective set X, all irreducible components of V(f) have dimension at least equal to the corank of f. Proof. Use the proof of Theorem 13.4.2
D
The rank of a system is a useful invariant, but from the viewpoint of the Bertini Theorems, a closely related invariant, the projective rank of a system, plays a more central role. The first time reader may safely ignore the rest of this section and any mention of the projective rank of a system. The importance of projective rank stems from Theorem A.8.6, which states that projective rank controls the nonemptiness of the zero sets in Bertini Theorems. The projective rank of a system f is the dimension of the closure of the image of the rational mapping given by sending x G X \ V(f) to [/i(a;),..., fn(x)]. We denote the projective rank of / by rankp/. A system / on CN having projective rank N is called a big system. Remark A.6.4 For an algebraic line bundle and a system of algebraic sections / i , . . . ,fn, rank does not make good sense but projective rank does. It is closely related to the notion of Kodaira dimension, e.g., (Iitaka, 1982). Lemma A.6.5 Let f be a system of n algebraic functions on an irreducible quasiprojective algebraic set X. Then rank/ — 1 < rankp/ < rank/. Proof. The rational mapping used in the definition of projective rank factors as the composition of the map in the definition of rank followed by the map Cn \ {0} —> P™"1, which sends (ZQ, ..., zn-i) —> [ZQ, ..., zn^i]. Since the fiber of the map C™ \ {0} —> P " - 1 is dimension one, we are done. •
320
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
The above proof makes clear how the two ranks may fail to be equal. Before we make this precise in Lemma A.6.7, let us give a definition and an example. Let X c P™ be an irreducible projective set. X is said to be a cone with vertex x e X if for some hyperplane H of Pra not containing x, the projection •KX : P™ \ {x} —> H maps X \ {x} to a set whose closure has dimension less than dimX. An irreducible affine algebraic set X C C™ is said to be a cone with vertex x G X if, when we regard Cn as a subset of P n , the closure of X in P n is a cone with vertex x. Example A.6.6 (Cones) Let Y denote an irreducible (N — l)-dimensional smooth complete intersection in p™-1 defined by homogeneous equations / i , . . . ,fn-N in the variables zo,..., zn-\. The Af-dimensional cone X := V ( / i , . . . , / n -jv) C C n intersects a general (n - iV)-dimensional linear subspace through the origin in the origin only. To see this, regard C" as contained in ¥n with the hyperplane at infinity Hoo given by V(zn). For positive integers m satisfying (m — 1) + (N — 1) < n — 1, there is a dense Zariski open set of m-dimensional linear subspaces L c C™ with
I n I n E 0 0 = (In H^) n (x n H^) = 0. Thus X C\ L = {0}, and this point is singular if degX > 2. Lemma A.6.7 Let f be a system of n algebraic functions on an N-dimensional irreducible quasiprojective set X. Then rank/ = rankp/ except when f(X) is a cone over 0. Proof. The proof of this is immediate from the definitions and left to the reader. • To compute the projective rank is easy. Theorem A.6.8 Let f be a system of n algebraic functions / i , . . . , / „ on an irreducible quasiprojective algebraic set X with one of the functions, fi, which we relabel f\, not identically equal to the zero function. Then
rankp/ = rank j , . . . , - ^ , I /i h ) where the system of quotients is defined on X \ V(/i). Proof. The proof of this, and the independence of the choice of the not identically zero fi, follows immediately from definitions and is left to the reader. • A.7
Universal Functions and Systems
In this section we will give a detailed discussion of certain special families of polynomials, which are useful in the study of polynomial systems.
321
Algebraic Geometry
A. 7.1
One Variable Polynomials
We start with the case of the most general degree d polynomial. Fix an integer d > 1. We have the family p(z, c) := c0 + ciz + ... + cdzd = 0.
(A.7.1)
We have some related issues we must face already. (1) Should we insist that cd ^ 0? (2) Would it be better to include "solutions at infinity" by using the homogenized system
p(z,c) := cO2o + ciz^zi
+ ... + cdzf = 0
withz := [zo,zi] G P 1 ? (3) Since multiplying an equation by a nonzero complex number does not change the solution set of the polynomial, should we make the convention that c is not the point (c 0 , ...,cd) G C d + 1 , but instead [c0, ...,cd] G P d ? Doing this, of course, implicitly throws away the identically zero polynomial, that corresponds to c = 0. For simplicity, we look only at polynomials with no restrictions on the c*, i.e., we assume that (z,c) = (z,c0,... ,cd) G Cd+2 corresponding to polynomials of degree < d. The different choices raised by the issues listed above are treated in a similar way. Let's introduce some notation. We let Zd c Cd+2 denote the solution set of p(z, c) = 0. We let n : Zd —> Cd+1 denote the map induced by the projection (z, c) —> c, and we let p : Zd —> C denote the map induced by the projection 0,c) ->• z. Note that for any given c € Cd, n~1(c) consists of the points (z,c) satisfying p(z, c) = Co + c\z + . . . + cdzd = 0. It is important that the zero set Zd C Cd+2 oip(z,c) = 0 is a connected (d + l)-dimensional complex manifold. Indeed, Zd is dimension d+ 1 since it is denned by a single algebraic function on an irreducible d'oi z cl (d+ 1)-dimensional algebraic set. Moreover, since —^ ' = 1, it is a consequence OCQ
of Theorem A.2.8, that Zd is smooth. To show the connectedness of Zd, we apply the criterion that a space (in our case Zd) is connected if a continuous map (in our case p : Zd —> C) has connected image and fibers. To see this, note that the fiber p~1(zt) of p over an arbitrary point zt G C is the set of (z*,c) such that p{z*,c) = 0. Since this is the linear equation CQ + C\Z* + ... + cdzd = 0 in the variables c G C d + 1 , we see that p~1(z*) is identified with a hyperplane of Cd+l by ir. Since Zd is connected and smooth, it is irreducible. Over all points except 0 6 C d + 1 , n has finite fibers.
322
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Let us consider the points c for which p(z, c) has multiple roots, i.e., less than d distinct roots. This corresponds to points c for which the equations P(Z>C)] _n dP(z,c) — u . dz J
,A 7 9 N {A. (.A)
have a common root. The classical prescription, e.g., (Chapter 1 Walker, 1962) and (Cox et al., 1997), for how to eliminate z from these equations constructs the discriminant of p(z,c), a polynomial of degree Id — 1 in c, which is the resultant of the two polynomials in the system (A.7.2). We only discuss resultants briefly in § 6.2.1 of this book. For us it will be enough to note that (1) for some c», e.g., the c* corresponding to the polynomial zd — 1 = 0, the roots are distinct; (2) for c in a complex neighborhood of c* the roots remain distinct; and (3) the set S in Cd+1 denned by the system A.7.2 is an affine algebraic set. We know from item 3) and Chevalley's Theorem 12.5.6 that TT(<S) is a constructible set. We know that the closure T> of n(S) in the complex topology is an affine algebraic set by Lemma 12.5.3. By item 2) we conclude that V ^ C d+1 and thus that the Zariski open set C d + 1 \ T> is nonempty, and hence dense. A.7.2
Polynomials of Several Variables
The construction of § A.7.1 carries over to several variables. We summarize the construction for polynomials of degree < d on CN. Such a polynomial
P(z,c)= Yl °JzJ \j\
depends on A/" := ( j~ ) coefficients cj, where we use the multidegree notation. We regard p(z, c) as a polynomial on CN x C . By the same reasoning as in § A.7.1, we see that Z^ := V(p(z,c)) is smooth, connected, and of codimension one. Moreover by Theorem A.4.10, there is a Zariski open dense set U C O^", such that the restriction of the projection map TT : CN x C —> CN to Zd H TT~1(U) is a maximal rank map. A.7.3
A More General Case
Let fi(x),..., fn{x) be a set of algebraic functions on an irreducible quasiprojective algebraic set X. For example, these might be a set of rational functions PlQ)
Pn(x)
Qi(x)''""'' qn(x) where the Pi(x) and <&(#) are polynomials on CN and X := CN \ (\J™=1V(qi(x))).
323
Algebraic Geometry n
We define the universal function F(X: x) := V^ A;/,(z) on Cn x X. i=l
It is traditional in this context to refer to the solution set V(f) of the set of algebraic functions fi(x),..., fn{x) as the base locus of the set of functions. We will not use this language, but the reader should be aware of it. Zf : = V(F) i s a quasiprojective algebraic set with Zf (1 [Cn x (Xieg \ V(f))] smooth. Moreover there is a Zariski open dense set U C Cn, such that the restriction of the projection map -K : Cn X (Xreg \ V(/)) —> C" to Zf H TC~1 (U) is either empty or a maximal rank map. This is important enough to state as a Theorem. Theorem A.7.1 (Simple Bertini's Theorem) Let f{x) := {fi,- • • ,fn} be a system of algebraic functions on an irreducible quasiprojective algebraic set X. There is a Zariski open dense set U C C™, such that for (Ai,...,A n ) G U, it follows that g := Yli=i Ai/i has a possibly empty quasiprojective zero set Z such that Z (~i (Xreg \ V(f)) is smooth with the differential dg nowhere zero on
zn(xieg\V(f)).
Proof. First note that we can assume that X is smooth and V(f) is empty, by simply replacing X with (XTeg \ V(/)) and renaming. Note that if rank/ = 0, then each g is constant and the theorem is vacuously true. Therefore we can assume that rank/ > 0. We have the "universal function" F(\,x) :— X^ILi ^ifi(x) defined for (X,x) G n C x X. Zf := V(F) C X is smooth by the same reasoning as used in § A.7.1. Consider the maps TTI : Zf —>• Cra and 7T2 : Zf —> X induced by the projections C " x X - » C " a n d C n x I ^ I respectively. The fiber ^(x) for any x G X is an affine hyperplane of C™. It can be further checked that given any x e X, there is a neighborhood O of x in the complex topology such that 7r^"1(C7) is biholomorphic to C"" 1 x C. Thus Zj is a bundle over X and therefore irreducible of dimension dimX + 71—1. We are in the situation of Theorem A.4.10, and would be done, if we knew that 7Ti is dominant. Assume it is not. Then, there is a Zariski open dense set U C C n such that for A G U we would have that V(£)™=1 A,/*) = 0. • A. 7.4
Universal
Systems
Let / i ( x ) , . . . , fn(x) be a set of algebraic functions on an irreducible quasiprojective algebraic set X. For any positive integer s, we define the universal system "Fi(A,x)i F(A,x):=
:
_F s (A,x)J on C s x " x X.
["Ai,i ••• Ai,n"i = A • /(z) =
:
•..
:
LAS,1 ••• A s , n J
r/r •
:
[fn.
324
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Define Zf := V{F(A,x)) C Csxn x X. Let -KX : Zf -> C s x n and TT2 : Z/ -> X denote the projections induced from C sX71 x X -+ C SXri and C s x " x I -> X respectively. L e m m a A.7.2
IfV(f)
= 0 £/ien Z j is irreducible of dimension dim.X + s(n— 1).
Proof. The set 7r^"1(x) for x 6 X may be identified with the linear space of A 6 Csx™ satisfying A • / ( x ) = 0. Since f(x) ^ 0, this space has codimension s. It can be further checked that given any x G X there is a neighborhood 0 of a: in the complex topology such that T T ^ ^ O ) is biholomorphic to ^ ' " " ' ' x O . Prom this it follows that the set Zf C C s x n x X consisting of (A, x) such that F(A, x) = 0 is an irreducible • set. Lemma A.7.3
IfV(f)
is empty, then Zf D (CsX™ x Xreg)
is smooth.
Proof. Since at any point x G V(F) C Csxn x X at least one of the / , is nonzero, we can see that all the partial derivatives
dFj{A,x) From this we see that V{F) n (Csxn x X r e g ) is smooth.
D
After we develop a few results on linear projections and subspaces, we will prove Theorem A.8.7, the analogue for systems of Theorem A.7.1.
A.8
Linear Projections
"Generic" projections have been used since classical times to reduce questions about general algebraic sets to questions about hypersurfaces. Here we present the basic facts that we need. We follow the presentation in (Sommese et al., 2001c) closely. A linear projection n : CN —> C m is a surjective affine map ?r(x) = a + ^ x ,
(A.8.3)
where «i,o
o=
: .«m,oj
We work with from CN onto with T(TTI(X)) fibers through
a i , i • • • ai,N
; A=
: • . : L a m , l • ' • a m,JVJ
Xi
; and x =
:
(A.8.4)
LXN.
equivalence classes of projections, considering two projections 7Ti,7r2 C m equivalent if there is an affine linear isomorphism T : C m —• C m = 7T2(x). Thus, for us two linear projections are the same if their the origin are parallel (N - m)-dimensional linear subspaces of C ^ ,
Algebraic Geometry
325
i.e., TTJ~ (TTI(O)) is parallel to TT^" (7^(0)). So in the special case of linear projections from C^ —> C^^ 1 with N > 2, we can consider the projections to be parameterized by the lines through the origin, or equivalently the hyperplane at infinity H^ := N V(ZQ), where we regard C^ as embedded in ¥ by ( x i , . . . , XN) - » [z 0 , • • • , ZN\ = (1, Xi, . . . , I J V ) .
This observation will play an important role in § A.10.3. Though we can use the set ofmxJV matrices A g C mXiv with rank A — m to parameterize the linear projections, it helps to keep the geometrical correspondence between projections and the nullspaces of the matrices A in mind. As noted in the last paragraph, when m = N — 1, we are dealing with lines and the natural parameter space is the projective space parameterizing lines though the origin in C^. For linear subspaces of other dimensions this leads us to Grassmannians. A.8.1
Grassmannians
We denned the iV-dimensional projective space P^ in § 3.2 as the set of lines through the origin in CN+1. Replacing linesby (m+1) -dimensional linear subspaces through the origin leads to the notion of a Grassmannian. We define the Grassmannian of (m + l)-planes in (N + l)-space to be the set of all (m + l)-dimensional linear subspaces of CN+1 through the origin. Equivalently this is the space of linear P m s in FN. We denote this space Gr(m, N). The reader should be aware that there is a second convention in the literature where the focus is on CN+1, and the space we denote Gr(m, N) is denoted Gr(m + 1, N + 1). An (TO + l)-dimensional subspace of C w + 1 through the origin is determined by m + 1 elements of CN+1. In analogy with homogeneous coordinates on projective space, we may represent an element of Gr(m, N) by an (m + 1) x (N + 1) matrix A. Conversely, we would like an (m + 1) x (N + 1) matrix A to represent an element of Gr(m, N). For A to represent an (m + l)-dimensional linear subspace, A must have rank m + 1 , e.g., for projective space P^ = Gr(0,N), the (N + l)-tuples [zo,... ,ZN] £ FN are not allowed to have all entries 0. In analogy with homogeneous coordinates on PN, if G is an (m + 1) x (m + 1) invertible matrix, then the rank m + 1 matrices A and G • A represent the same linear subspace. As with FN, we can define embeddings of c( m + 1 ) x ( i V - m ) into Gr(m, N). Indeed, given an (m+1) x (N — m) matrix B, if we send it to [Im+i B], we have a one-to-one mapping. We take this as giving a neighborhood of any of the elements of the image of this map. We can construct other embeddings of c(m+i)x(W-™) whose unions cover Gr(m, N), but do not do so since we do not need this. Grassmannians are connected projective manifolds. As we saw above, the dimension of Gr(m, N) is (m + 1) x (N — m). There is a natural embedding
GrfoNO-pG+D-1
326
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
called the Plucker embedding obtained by sending an (m + 1) x (AT + 1) matrix A representing a point of Gr(m, N) to the point in p U + i ) " 1 with the (^1J) determinants of (m + 1) x (m + 1) submatrices of A as homogeneous coordinates. Since using G • A in place of A multiplies all these homogeneous coordinates by det G, this mapping is well defined. There is a large literature on Grassmannians. The analogy for Grassmannians of the linear projections from FN to lower dimensional projective spaces, that we consider in § A.8.2, are similar mappings from Gr(m, N) to lower dimensional Grassmannians. The isomorphism of a vector space and its dual leads to the isomorphism between Gr(m, N) and Gr(N —m, N). A good place to read more on these useful homogeneous manifolds is (Griffths & Harris, 1994). For detailed information, (Hodge & Pedoe, 1994b) is particularly helpful. In the same vein as § A.7.4, it may be shown that there is a connected projective manifold Ti c Gr(m, N) x P N consisting of all points (L, x) where L G Gr(m, N) is an m-dimensional linear subspace of P ^ and x G PN is a point in the subspace of PN represented by L. By standard abuse of notation, this is denoted by x £ L. Let 7i"i : 7i —> Gr(m, N) and 7T2 : 7i —> PN denote the algebraic mappings induced by the projections Gr(m,N) x P N -> Gr(m,N) and G r ( m , N ) x P N - > P N respectively. The mapping TTI is of maximal rank with the fiber ir~1(L) over L G Gr(m, N) mapped isomorphically to L by 7T2- Thus
dimH = (m + 1)(N - m) + m. The mapping TT2 is of maximal rank with the fiber 7r^"1(a;) for x G PN mapped isomorphically by w\ onto the Gr(m — 1, JV — 1) of m-dimensional linear spaces of P ^ that contain x. Thus dim7r^"1(a:) = m(N — m) Regarding CN as P ^ minus a hyperplane H, there is a one-to-one correspondence of m-dimensional affine linear subspaces of CN with the dense Zariski open set U C Gr(m, N) of m-dimensional linear subspaces L of P ^ not completely contained in H. The algebraic set Gr(m, N) \U is thus identified with Gr(m, N — 1). Theorem A.8.1 Let X be an n-dimensional affine algebraic subset of CN (respectively projective algebraic subset of¥N). Ifn + m < N, there is a dense Zariski open set U C Gr(m, N) of affine linear subspaces L C C ^ (respectively of linear spaces L C PN) of dimension m not meeting X. Proof. Using the identification of affine linear spaces of C ^ with linear spaces in P ^ , we only need to show this result in the case of FN. Note that d i m T r ^ X ) = n + m(N — m). Thus 7Ti(7r^"1(X)) cannot be dense because if it was
(m+l)(N-m) = dimGr(m,N) = dimTr^Tr^^X)) < dimTr^pQ = n+m(N-m). This implies the contradiction N — m < n. Theorem A.8.2
•
Let X be an n-dimensional algebraic subset of CN (respectively
Algebraic Geometry
327
projective algebraic subset of FN). Assume n + m < N. For a given x € X, there is a dense Zariski open set U of m-dimensional affine linear spaces L C CN containing x (respectively m-dimensional linear spaces L C fN containing x) such that L D X = {x}. Moreover if x € X r e g , then U can be chosen so that in addition TL,X n Tx,x =x £ TCNIX (respectively TL,X n Tx,x =x£ T¥N^X), where TLtX! Tx,x, Tpjvj,, TCN >x are the tangent spaces of L, X, P w , C ^ respectively at x. Proof. This theorem is proved by reasoning similar to that for Theorem A.8.1. We only prove the case when X is projective algebraic (the quasiprojective case requires the projective case plus an application of Lemma 12.5.2). Fix a point x £ X C ¥N. The L € Gr(m, N) that contain x, i.e., G\ := 7T1(7r2"1(ar)) is isomorphic to Gr(m — 1,N — 1) and thus irreducible and m(N — m) dimensional. The set L containing x and a point y ^ x is isomorphic to the set G2 '•— Gr(m — 2, N — 2) and thus (m — 1)(N — m) dimensional. The set W of L that contain x and some other point y of X is thus of dimension at most (m — l)(iV —TO)+ n. Here we are using the fact that W is projective. To see this let 52 : G± —> fN denote the algebraic ir^q^iX)). mapping induced by TT2- W is the set = N-m-n>l, Since dimGi - d i m W > m(N -m) - ((m - l)(iV -m)+n) we conclude W is a proper algebraic subset of an irreducible projective algebraic set C?2- Thus there is a Zariski dense open set G2 \ W of m-dimensional projective linear spaces L containing x and no other point of X. The tangent space assertions follow by a dimension count showing that the space of L € Gi such that TL,X H TX,X ¥" x ^ TPN<X is of dimension less than dim G2- The details are left to the reader. • Theorem A.8.3 Let X be an n-dimensional affine algebraic subset of CN. Assume n + m > N. There is a dense Zariski open set U of m-dimensional linear m>N,U subspaces L C CN such that LH X is of dimension n + m — N. Ifn + may be chosen so that if L G U then L D XTeg is nonempty. Proof. Using the same sort of reasoning used in Theorem A.8.2 or a repeated use of Theorem A.7.1 gives this result. •
A.8.2
Linear Projections on FN
We need to consider the extension of projections to projective space. Such projections have traditionally been a major tool in algebraic geometry and are a perennial focus for research, e.g., (Beltrametti, Howard, Schneider, & Sommese, 2000). Let [z0,..., ZN\ denote linear coordinates on fN. As above, we regard CN C fN using the inclusion (xu ... ,xN) —> [z0,... ,zN] = [ l , x i , . . .,xN]. Thus C w = P w \ Hoo, where HOQ := V(ZQ) is the hyperplane at infinity.
328
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
A projection from WN to P m is a surjective map nL : FN \ L -> P m with n(z) = Az,
(A.8.5)
where a
A=
0,0
a
'• .SO
a
l,l
'''
'•
"•
rn,l
a
0,N
ZQ
:
and
z =
: L ZN .
••• a m . w j
and where i is the linear projective space p^- 7 7 1 " 1 c P w defined by the vanishing of the linear equations Az. L is the center of the projection. Theoretically, we work with equivalence classes of projections, considering two projections TTI, TT2 from PN onto P m equivalent if they have a common center L and there is a projective linear isomorphism T : P m -> P m with T(TTI(X)) = -K2{x) on P ^ - L. Note that two projections from P ^ to P m are equivalent if and only if they have the same center L. Thus the linear projections ¥N —> P m are naturally parameterized by the Grassmannian Gr(N — m — 1,N) of (N — m— l)-dimensional linear spaces L c PN. Geometrically, the projection -KI has a simple description. Let L be the center of the projection. Choose any P m C P ^ with the property that L D P m = 0. Given a point x e WN \L and letting (a;, L) denote the linear subspace FN~m C P ^ generated by x and L, the projection from P ^ to P m with center L sends x to (i,L)nPra. The projections nL from P ^ to P m that are extensions of projections from C^ to C m are precisely the projections with centers L C H^. Indeed, let y i , . . . , ym be coordinates on C m and let the usual embedding of C m to P m be given by { y i , . . . , y m ) -> [ w o , . . . , w m ] = [ l , y i , . . . , y m ] , Since we must have a linear equation in Xi,... ,XN when we dehomogenize with respect to w0, we conclude that A is of the form " ao,o
. a m,0
0
a
•• •
0 '
m,l ' ' ' am,N .
with o0,o / 0. Using the invertible linear transformation on P m
T
-=\ \°1
329
Algebraic Geometry
ai,o
where u :=
•
, we see that an equivalent form for A is
.am,0.
where A is as in Equation A.8.4. For example, the projection (xi,... ,XN) —» (xi,... ,a;jv_i) extends to the projection [XQ, ..., XJV] —> [xo,a;i,..., rrjv-i] with center L := {[0,..., 0,1]}. To recapitulate a main point: an equivalence class of linear projections is naturally identified with the center of the projection in the projective case and with the center of the projective extension of the linear projection in the affine case. A.8.3
Further Results on System Ranks
We have a few more properties on the behavior of the rank of a system under randomization. Lemma A.8.4 Let X C C n denote an irreducible affine algebraic set. Then for any nonnegative integer s, there is a dense Zariski open set of linear projections CN —> Cs such that the dimension of the closure of the image of X is min{dimX, s}. Proof. We regard Cn as the complement in P n of a hyperplane Hoc. We first do the case of s > dim X. As we saw above, the linear projections are parameterized by the Grassmannian G :— Gr(n - s — 1, n — 1) of linear P"~'s~1s contained in H^. We have dimXnffco < d i m X - l . Since (dimX - 1) + (n - s - 1) = n — (s - dimX) — 2 < n — 1, we conclude from Theorem A.8.1 that there is a Zariski open dense set U of G corresponding to linear pn-s-i s m i s s m g X n Hoo. Given one of these, say L, and the associated linear map irL,
t h e fiber -K^(irL{x))
t h r o u g h x G X i s (x,L)
H X.
S i n c e L n (X \ X) = 0 ,
(x, L)nX is compact and hence finite by Lemma 12.4.3. Thus by Corollary A.4.12, the closure of the image of X has dimension the same as X. The case of s < dimX follows from the case s = dimX and the observation that if s < dimX, then a dense open set of linear projections from Cd™^ —> C s are onto. • Theorem A.8.5 Let f(x) — 0 denote a system of n algebraic functions on an irreducible quasiprojective set X. For any positive integer s, there is a dense Zariski open set of matrices U C Csxn such that ifAeU, then rank A • / = min{s, rank/}.
330
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Proof. Applying Lemma A.8.4 to f{X)~; there is a dense Zariski open set U of A £ C s x n such that dim(A-/)(X) = min |dim/(X), s\ . • The following useful result adds an existence component to Bertini's Theorem. Theorem A.8.6 Let f be a system of n algebraic functions / i , . . . , / n on an irreducible N-dimensional quasiprojective algebraic set X. Assume that rankp/ — K. Then there is a Zariski open dense subset ofUd CKXn of K X n matrices such that for A £ U, any subset of i distinct functions from the K functions A • / has a nonempty solution set on Xieg, which is smooth and of dimension N — i. Proof. This follows from application of Theorem A.8.2 to the closure of the image ofXinP"-1. • A.8.4
Some Genericity Properties
We have the important generalization of Theorem A.7.1. T h e o r e m A . 8 . 7 ( S i m p l e B e r t i n i T h e o r e m f o r S y s t e m s ) Let fi,--.,fn
be a
system of algebraic functions on an irreducible quasiprojective algebraic set X with solution set V(f). For each s x n matrix A £ Csxn, let F{A,x)~
'Fi(A,x): =A-/(x) .Fs(A,x)_
on C s x n x X. There is a Zariski open dense set U C Csxn of s x n matrices such that for A £ U, it follows that V(A • / ) C X is a quasiprojective set such that if Z\ '•= V(A • f) \ V(f) is nonempty, then dim^A = dimX — s, and Z\ (~l XTeg is smooth with the differentials dFj spanning the normal bundle of ZhC\Xve&. Moreover the number of components of Z/^ fl Xreg is independent of A £ U. Proof. The set U' of A with rank equal to min{s, n} is dense and Zariski open. Therefore, by replacing any dense Zariski open set U C C s x n that is constructed below with its intersection with U', we may assume that all A G U have rank equal to m'm{s,n}. By replacing X with X\V(f) we can assume that the /; have no common zeros. Denote V(F(A,x)) C C s x n x X by Zf. By Lemma A.7.2, Zf is irreducible of dimension dimX + s(n — 1). By Lemma A.7.3, Zf f) (C s x n x Xreg) is smooth. Let TTI : Zf —> CSXn denote the algebraic map induced by the product projection sx C ™ x X —> C s X n and let TT2 : Zf —> X denote the algebraic map induced by the product projection
Algebraic Geometry
331
Therefore we may assume without loss of generality that wi restricted to Zf is dominant. By Corollary A.4.12, there is a Zariski open set U C Csxn such that for y eU, all components of ^:[1{y) have dimension dimZy - dim
z& n Xies is smooth with the differentials dFj spanning the normal bundle of Z\ n Xreg. A.9
•
Bertini's Theorem and Some Consequences
In this section we present a general Bertini Theorem about the solution sets of systems. Since the intersection of any finite number of dense Zariski open sets is Zariski open and dense, we can (and typically do) apply Bertini's Theorem to conclude that a generic choice of some parameters leads to a long list of generic properties. To state such a result succinctly, let us define the constellation of algebraic sets associated to a finite number of quasiprojective subsets X\,..., Xr of a quasiprojective set X to be the collection of sets obtained by repeatedly doing in any order the operations of (1) (2) (3) (4) (5)
taking irreducible components; taking intersections; taking the singular set of a quasiprojective algebraic set; taking finite unions; and given two sets A, B taking the set A \ A D B.
Lemma A.9.1 The constellation of algebraic sets, C, associated to a finite number of quasiprojective subsets X\,-.., Xr of an algebraic set X is a finite set of quasiprojective sets. Proof. All these operations start with quasiprojective algebraic sets and produce quasiprojective algebraic sets. To prove that C is finite, it suffices to show that the set of all the irreducible components of the quasiprojective sets obtained by these operations is finite. Since an irreducible quasiprojective algebraic set A minus a proper algebraic subset remains irreducible, the last operation leads only to the finite number of quasiprojective algebraic sets A \ A f) B for the collection of quasiprojective sets J4, B generated by the first four operations. Thus it suffices to prove the finiteness of
332
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
the collection of sets generated from X±,..., Xr by a repeated use of the operations 1), 2), 3), and 4). Since any intersection is a finite union of intersections of irreducible quasiprojective algebraic sets, it suffices to prove the finiteness of the collection of sets obtained by starting with X\,... ,Xr and repeatedly doing only the operations 1), 2), and 3). Note that the operations of taking intersections of irreducible sets and taking singular sets decreases dimensions if it leads to anything new. Thus, by the fact that dimension is finite, the operations 1), 2), and 3) lead to only a finite number of quasiprojective sets. D Let gi,... ,gs be a set of algebraic functions defined on a quasiprojective algebraic set X. Denote the solution set of all the functions, i.e., V(gi,... ,gg), by V(g). We say that g\,..., gs are simply generic with respect to a /c-dimensional irreducible algebraic set Z C X if given any integers 1 < i\ < ... < ir < s, it follows that (1) ifr >kthenV{gtl,...,gir)nZ (2) if r < k then either V{gtl,...,
cV(g); and gir) D (Z \ V(g)) is empty or
dimV(gil,...,gir)n(Z\V(g))
=
k-r
and V{gil,...,gir) n (Zreg \ V(g)) is smooth with the differentials dgit,..., dgir having rank r in the tangent space Tz,x any x e V(gil,..., gir) n (ZTeg \ V())• Given an s x n complex matrix Ai,i • • • Ai i7l A :
=
;
••.
•
'
. A s , l • ' • ^s,n .
the s x b submatrix A ( j i , . . . , j b ) 6 Csxb of A G C s x n associated to the list of integers 1 < j i < . . . < jb < n is defined to be Ai,ji • • •
A(ji,...,jfc) : =
:
••.
^i,jb
:
-As,ji ' • • *s,jb _
The following Bertini theorem expands on the conclusions reached in § A.7.3. T h e o r e m A . 9 . 2 ( B e r t i n i T h e o r e m f o r C o n s t e l l a t i o n s ) Let fi,...,fn
be a
set of algebraic functions on a quasiprojective set X. Given any finite number A\,... ,Am of quasiprojective subsets of X, let C denote the constellation of quasiprojective sets associated to (1) the sets A\,..., Am; (2) all irreducible components of X; and (3) all sets of the form V(fjl,... ,fjb) for the lists of integers 1 < jx < ... < j b < n.
333
Algebraic Geometry
Then there is a Zariski dense open set U C CsXra, such that for A e U and any list 1 < ji < • • • < jb < n the functions
:
:=A(ji,...,jb)-
\
.9s \
[fjb.
are simply generic with respect to every irreducible set in C. Proof. Since the intersection of dense Zariski open subsets of Csxn is dense and Zariski open, it suffices to prove the result for a single irreducible set Z e C of some dimension k. Further if we showed that for a given list l < i i < . . . < i r < s , the result is true ioi gi1,..., gir where "fill :
[/ji" := A ( j i , . . . , j b ) •
.9s J
:
[fjb.
with A in a dense Zariski open set U(ii, •.., ir;ji, • • • ,jb) C C s x n , we will be done by taking the intersection of these open sets indexed by the finite number of lists of integers 1 < i\ < ... < ir < s and 1 < j \ < ... < % < n. Therefore by renaming, it suffices to prove that there is a dense Zariski open set (/ C C s x n for any s
r/r
"si] :
:=A-
.9s\
:
,
[fn.
it follows that (1) if r > k then % , . . . , j s ) n Z c V(/); and = k - r and V(gi,... ,gB) n (2) if r < k then dimV(9l,... ,gs) n (Z\V(f)) (Zreg \ V(f)) is smooth with the differentials dgi,..., dgs having rank r in the t a n g e n t s p a c e Tz,x for a n y x e V(gil
,...,gZr)n
(Zreg \
These assertions follow immediately from Theorem A.8.7.
V(g)).
•
There are many versions of Bertini's Theorem in the literature, e.g., (Example 12.1.11 Fulton, 1998). For a further discussion of Bertini theorems, see also (§1-7 Beltrametti & Sommese, 1995).
334
A.10
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Some Useful Embeddings
There are some natural embeddings of algebraic sets that are useful. For simplicity, we give versions for projective algebraic sets, though similar constructions are equally useful for affine algebraic sets.
A.10.1
Veronese Embeddings
Let N and d be positive integers. The Veronese embedding is the natural embedding
to the point with homogeneous coorobtained by sending the point [ZO,...,ZN] J dinates made out of all the monomials {z \ \J\ = d} where we use multidegree notation. The restrictions of the linear equations ¥^ N >~ to the image of the Veronese embedding give all the degree d equations of VN.
A.10.2
The Segre Embedding
Let Ni,...,
Nr be r positive integers. The Segre embedding is the natural embedding P^ 1 x . . . x FNr -> p n L i W + i J - i
given by sending the point [zito,..., Z±>N1 ; . . . ; zr,o, • • •, Zr,Nr] to the point with homogeneous coordinates made out of all the monomials z\^ • • • zr^T. Remark A.10.1 The degree of the image of the Segre embedding of the multiprojective space FNl x . . . x PNr in p n r=i( Ar ;+ 1 )- 1 i s the multihomogeneous Bezout number for the system with X)[=i Ni equations all of type ( 1 , . . . , 1). This may be checked, e.g., using Equation 8.4.15, to be / V
N- \
\N1,---,NrJ-
CV"
7V-V
N1\---Nr\-
On a theoretical level, the Segre embedding shows that subsets of multiprojective spaces defined by multihomogeneous equations may be regarded as projective algebraic sets. One case is of special interest. Example A.10.2 (The Quadric Surface) Let S := P 1 x P 1 with bihomogeneous coordinates [zi,o, zi,i; Z2,o, ^2,i]- Let [wo, wi, ^2,^3] denote the homogeneous coordinates on P 3 . The Segre embedding of P ' x P U P 3 given by [wO,Wi,W2,W3]
: = [zi,0Z2,O,Zl,1*2,0, Zl,0Z2,l>zl,1^2,1]
has as image the smooth quadric V(u>oW3 — Wiit^)-
Algebraic Geometry
335
The Segre embedding is useful because it gives a consistent way of measuring the degrees of pure-dimensional algebraic sets on P^ 1 x ... x WNr. Remark A.10.3 (Measuring Degrees) Measuring degree by using the Segre embedding gives the smallest possible values of all the consistent ways of measuring the degrees of pure-dimensional algebraic sets. Other consistent ways may be obtained by using the Veronese embedding on the different projective spaces followed by a Segre embedding. In the language of line bundles mentioned briefly in § A. 13, such a choice is equivalent to choosing an ample line bundle L on M := P^ 1 x ... x ¥Nr and then denning the L-degree degL(X) of a pure fc-dimensional X C M to be c\{L)k • X, where ci(L) is the first Chern class of L. A.10.3
The Secant Variety
To derive properties of generic projections, we need to define the secant variety of an affine algebraic set X. Let X be an irreducible affine algebraic subset of C^. Given two distinct points x, y e X, we have a unique line between them, parameterized by u € C as (1 — u)x + uy. Let A denote the diagonal A := {{z, w) € X x X \ z = w} . Then the image of the map / : ( I x I \ A ) x C - > C A r , defined by f(x, y, u) = (1 u)x + uy is a constructible set by Theorem 12.5.6. The secant variety of X, denoted Sec(X), is the closure in CN of the image of this map. By Corollary 12.5.7, Sec(X) is an irreducible affine algebraic set. By Lemma 12.5.2, dimSec(X) < 2dimX + 1. L e m m a A.10.4 Let X be an irreducible affine algebraic subset of CN. If N > 2d\mX + 1, then a generic linear projection TT :
ZN) —> [XQ, . . . , XN] = [1, Z\,..., ZJV].
Let HQ := V(XQ) denote the hyperplane at infinity. Then Sec(X), the closure of Sec(X) in ¥N, meets Ho in a proper algebraic set of Sec(X). Thus dim iJonSec(X) < dim Sec (X). So we conclude that dim# 0 n Sec(X) < dimSec(X) - 1 < 2dimX < N - 1 = dimH0. Thus Ho
336
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
A.10.4
Some Genericity
Results
Let X C C ^ be an irreducible affine algebraic set. Let n : CN —> C m be a linear projection. The restriction nx, of n to X, does not have to be proper. For example, the map from the hyperbola V(xi^ — 1) to the x\ axis has as image the complement of the origin and is therefore not proper. It is a part of the Noether Normalization Theorem 12.1.5, that if 7r is general, then the restriction wx, of n to X, is proper. We now go over the geometric proof behind a form of Theorem 12.1.5, which includes some degree information.
Theorem A.10.5 (Noether Normalization Theorem) Let X C CN denote an affine algebraic set. Let TV : CN —> Ck denote a generic linear projection. Then if dim X < k, the map TTX is a proper algebraic map with all fibers nxl{y) finite for ally GY :=TT(X). If dim X < k, then there is a Zariski dense subset U C X such that iru : U —> TT(U) is an isomorphism. If X is of pure dimension k, then nx is a branched covering of degree degX. Proof. Let X denote the closure of X in P ^ . Here we embed C ^ by sending (xi,...,xN)
G CN t o [zQ,...,zN]
= [l,xi,...
,xN] 1
G FN. A s a b o v e , w e l e t # o o N Q 1P ; equal to V(ZQ). Linear
denote the hyperplane at infinity, i.e., the p ^ projections CN —> Ck correspond to (N — k — l)-dimensional linear subspaces Z/Ar_fc_i c Hoo. Fixing a general fc-dimensional linear subspace Sk C P ^ , the to map 7T£ : ¥N —> ¥k associated to £, an L w _ fc _i c i?oo, sends x ePN \ LN-k-i T^cix) = Sk n (x,Ljv-fc-i). If C does not meet the projective algebraic set X \X, then TT£ is proper when restricted to X. Since d i m X \ X < dimX < k, we conclude that the set of C C H^ that meet X \ X is a proper algebraic subset A of the Grassmannian Gr(N - k, N) of linear PJV~fcs in P ^ " 1 . This implies properness of the restrictions to X of projections 7f£ with C in the complement of A. If a fiber of -KC on X was not finite, then since the restriction of TT£ is proper, we would have a compact projective subset of X which is not finite. This is absurd by Lemma 12.4.3. If dimX < k, then it is sufficient to show that given a general point x of an irreducible component of X, a general £ = fN~k containing x meets X in no other points and the map associated to H^ n C has maximal rank at x. This makes sense since the general point of an irreducible quasiprojective algebraic set is smooth. This follows from Theorem A.8.2. If X is of pure dimension k, then a general £ = fN~k meets X in deg X points. The general map associated to C = £ fl iJoo has degree degX. • N + 1 different projections may be used to separate points. Lemma A.10.6 Let X be an affine algebraic subset ofCN, all of whose irreducible components are of dimension < k. Fix a finite set S C CN. For a general linear
Algebraic Geometry
337
projection IT : CN -> C m with m> k + 1, K(X) = ir(y) for x e S and y e X U 5 implies that x — y. Proof. Since the lemma is vacuous if m — N, we can assume that m < JV — 1 and thus that k < N — 2. We can reduce by induction to the case when m = N - 1. Let Hoc := f1^-1 denote the hyperplane at infinity in FN. Let y be a point of S. If y ^ X, consider the map
A. 11
The Dual Variety
In classical projective geometry there is a simple but basic duality between points and hyperplanes. To make this precise, let P ^ denote the A^-dimensional projective
338
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
space. A point is represented in homogeneous coordinates by an (TV + l)-tuple [zo,..., ZN\- A hyperplane is represented by a linear equation a^zo + • • • + OJVZJV = 0 with not all coefficients zero. Since multiplication of a linear defining equation of a hyperplane does not change the hyperplane, we see that there is a one-to-one correspondence of hyperplanes with points in a projective space represented in the homogeneous coordinates \ao,..., a/v]- This second projective space is referred to as the dual projective space FN*. Note the relationship is completely symmetric, i.e., P^**, the dual of the dual of FN, is just FN. The family of hyperplanes containing a P N ~ 2 C ¥N corresponds to a line in P^*. With such a duality, it is natural to try to extend it to subsets of projective space besides points and linear spaces. To see how this might be done, let C be an irreducible curve in P 2 . If C was smooth we could send x £ C to the point in P2* representing the line in P2 tangent to C at x. This curve C" is called the dual curve of C despite the fact that if C is a line then C is a point. If C is a singular curve we could define C as the closure of the image of the smooth points. For such a singular curve, the map C —• C is a rational mapping but not necessarily a function. This duality makes sense in general. Given an irreducible projective algebraic set I c P " , define the dual variety X' as the closure in PN* of the set consisting of hyperplanes which contain at least one tangent space of some smooth point of X. We can similarly define the dual of an algebraic set. There is a strong result about dual algebraic sets in complex projective space. Theorem A . l l . l
Let X be an irreducible subset of¥N. Then (XJ = X.
Proof. (Kleiman, 1986) is a good reference for this result and related material.
•
Note this result says that in the case when X is a curve in P 2 and not a line, the rational map X —> X' gives an isomorphism from a Zariski open set of X to a Zariski open set of X'. To see this note that the image of X is either a point or a curve. If it is a point then X" = X is a line. So we have that if X is not a line it has image a curve. The rational mapping X' —> X is a well-defined map on the smooth points of X'. Prom this we conclude that X - t l ' could not be r to one for an r > 1. We need a special consequence of this result. Corollary A.11.2 Let C be a pure dimension-one, not necessarily irreducible, algebraic subset o/P 2 . Assume that C has no irreducible components of degree one. Then C" = C. Further let x be a general point of any one of the components C of C with the tangent line £ to C at x. Then the defining equation of C given by Theorem A. 10.7 restricted to £ has x as a zero of multiplicity two with all other zeros of multiplicity one. Proof. Since C" = C for an irreducible curve and the degrees of the components of C are all of degree greater than one, we have from Theorem A.ll.l that the images of the components of C are distinct irreducible curves. Choosing a general point x
Algebraic Geometry
339
of a component D we get a general point of a component D' of C. This implies that any line £ tangent to a general point of a component D of C corresponds to a point of P2* not on any component of C other than D'. In particular £ must be transverse to C away from x. The condition that a neighborhood of x on C" goes isomorphically to a neighborhood of x on C is equivalent to the fact that x is a multiplicity-two zero of the restriction to £ of the defining equation of C. • A.12
A Monodromy Result
Let X be a pure fc-dimensional affine algebraic subset of CN and let Gr(m, N) denote the Grassmannian of P m s in FN. We close X up to get a pure fc-dimensional projective algebraic set X C PN. We consider the family of intersections £;v-fc H i for A;-dimensional linear spaces Lpi^k C fN. The set of pairs F := {(LN_k,x)
GGr(N-k,N) x X |x
is a projective algebraic set. This is completely analogous to the simpler construction in § A.7. We have the maps p : T —> Gr(N — k, N) and q : T —> X induced by the product projections on Gr(N — k, N) x X. Since a generic L^-k meets X transversely in a set of degX distinct points of X reg , we conclude from Corollary A.4.14 that there is a Zariski open set U c Gr(N - k, N) such that pp-i(u) '• P'1^) —> U is a finite covering. Fix a general point y € U, we have the monodromy action of the fundamental group ni(U,y) on the set p~l(y)- Statements for monodromy using slices of X follow immediately from the statements for monodromy using slices of X. Indeed, by shrinking U further it may be assumed that q(p~1(U)) C X, and so the lemmas and theorems we state hold equally for X and its closure X. Reflecting the bias in this book to regard polynomial systems as being defined on Euclidean space rather than projective space, we state the results for affine algebraic sets X in the rest of this subsection. Lemma A.12.1 If Xi is an irreducible component of X, then the above monodromy action acts transitively on the set Xi np~1(y). Proof. Note that q : J- —> X is a fiber bundle with the fibers isomorphic to the Grassmannian Gr(N — k, N). Thus the set q"1{Xi) is irreducible, and therefore the Zariski dense open subset p~l(U) D q^1(Xi) C q~l(Xi) is also irreducible. Since y is general, p~l{y) consists of smooth points of the irreducible and hence pathconnected manifold (p^1(f7) n q~1(Xi))reg. The monodromy action under a path connecting two distinct points of p~1(y) gives the transitivity. • We need a much stronger result. Choose a general affine linear subspace B :=
340
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
£N-k+i c £N containing the (N -fc)-dimensionalaffine linear space corresponding to the basepoint y € Y. Let BQ denote the (N —fc)-dimensionallinear subspace of C* parallel to B. Though in practice it does not matter, we should theoretically choose the general B first and the general space corresponding to y afterwards. We let 7r : CN~k+1 ->C:= CN~k+1/B0 be the induced linear map. Let Ls := TT" 1 ^). Let V C U be the linear curve through y corresponding to the Ls. We have the following result from (§3 Sommese et al., 2002b). Theorem A.12.2 Let X = U^1Xi denote the decomposition of a pure kdimensional affine algebraic set X c C^ into irreducible components. We assume that k > 1. Let ir~l(y) = VJTi=1Fi, where Ft = ir~l(y) n Xt with n, y, and V as above. The image ofit\ (V, y) into the automorphism group of the set n'1(y) induced by the monodromy action of slices of X by the (N —fc)-dimensionallinear subspaces Ls c CN is r
0Sym(Fi), where Sym(F;) is the symmetric group of F^. Proof. First we may discard all degree one components since they do not effect the veracity of the theorem. Next we reduce to the case when k = 1. Since B is general, we know from part 3) of Theorem 13.2.1 that each X, is irreducible. The map -K may now be regarded as the linear map from CN to C with Ls of the form LQ + sv for a fixed vector v G CN with v £ LQ. By renaming if necessary, we may assume that X is one-dimensional. Next we take a general projection TT' : C^ —* C. The generic linear map II := (7r,7r') : CN —> C2 maps X generically one-to-one to its image by Theorem A.10.5. Let TTi denote the projection of C2 onto its ith factor. There is a Zariski open dense set V of C such that ?rf 1{V) nll(X) is smooth and m : Tr^l(V')r\Tl(X) -> V is a d := degX sheeted covering map. Since n = TT\ O LT, we may regard V as an open subset of V. Since every immersion g : S1 —> V' gives an immersion g : S1 —> V, it suffices to prove the result for V'. This reduces us to the case of a curve in C2 with V a family of lines parameterized by an open Zariski dense set of a line in the dual P2 to the P 2 containing C2. This case follows in two steps. First we prove the statement for the family U of all affine lines in C2. This follows using Corollary A. 11.2 and a modification of the proof of the classical statement when X is an irreducible curve, e.g., (page 111 Arbarello, Cornalba, Griffiths, & Harris, 1985). The proof foiVcU follows from a theorem (Theorem, §5.2. Part II Goresky & MacPherson, 1988) of Lefschetz type asserting that the homomorphism TT\(V, y) —> 7Ti(U,y) induced by the inclusion V C U is a surjection. • We refer the reader to (Sommese et al., 2002b) for a more detailed proof.
341
Algebraic Geometry
A. 13
Line Bundles and Vector Bundles
We have mentioned earlier that homogeneous functions are not functions on projective space, though they are functions on a related Euclidean space. One difficulty posed by this is that the usual statements for algebraic functions on affine algebraic sets are not literally true for homogeneous functions on projective space. If homogeneous functions on projective space were the only issue, we could state the results for polynomials with slight rewording for homogeneous functions. But, faced with a number of very useful generalizations of homogeneous functions, e.g., bihomogeneous and more generally multihomogeneous functions, this is not a viable approach. In this section we first introduce bihomogeneous and multihomogeneous polynomials, and then define line bundles and their sections. A.13.1
Bihomogeneity and
Multihomogeneity
Let X denote the product of two projective spaces, P m x P n . We can denote a point in this space by a (a + b + 2)-tuple [ZQ, ..., zm; Wo,..., wn] of points with neither Zi = 0 for all i nor Wj = 0 for all j , and with the equivalence relation [zQ>...,zm;wo,...,wn]
~ [Xz'o,...,
Xz'm; fiw0,...,
p,w'n}
for all 0 j^ A e C and 0 ^ /x € C A polynomial p(z, w) in the variables ZQ,.. . ,zm,Wo,... ,wn is said to be bihomogeneous of degree (a,b) if it is of the form Yl\i\=a \j\=bcuzIwJ• Note that since p(Xz, /j,w) = Xanbp(z,w), it follows that the set where p(z,w) = 0 is a well-defined subset of Fm x P™. Similarly, we can define multihomogeneous polynomials on P™1 x • • • x Pnfc. A. 13.2
Line Bundles and Their Sections
First, let's consider the case of C^. If we have a polynomial p(z) on CN, we can think of p(z) in terms of its graph ap :— {(z, A) e C^ x C j A = p(z)} . We say that CN x C is the trivial line bundle on CN and av is a section. In loose terms, a line bundle over X is a quasiprojective algebraic set which maps onto X with fibers identified with C in such a way that the vector space structure on C is preserved. Precisely, we can define line bundles on any quasiprojective algebraic set X. A line bundle L on X consists of the data (1) UQ, .. •, Ue, a covering of X by affine Zariski open sets Ui dense in X; (2) for each 0 < i < £, 0 < j < £, an algebraic function ptj defined and nowhere zero on Uij := Ui f] Uj with pijPji = 1 on Uitj and pa = 1 on Ui\ and (3) PijPjk = Pik on Ui n Uj n Uk for all 0 < i < I, 0 < j < I, 0 < k < £. Associated to a line bundle is a space generalizing the trivial bundle. The space, also called L by abuse of notation, is covered by open sets UiXC where for x € Uij we identify {x,At) G £7, x C with (x,Aj) 6 Uj x C if Aj = pij{x)Al. The cocycle
342
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
condition pijPjk — Pik guarantees the identifications are well defined. There is a further natural, but involved, definition of when different covers and choices of pij lead to the "same" line bundle. Sections, which are basically like graphs of functions, are defined as a choice of algebraic functions at : Ui —> C with the property that
aj{x) = ai{x)pij{x) for x e Uld and all 0 < i < I, 0 < j < L An algebraic line bundle L on a quasiprojective algebraic set X is spanned by a vector space V of global sections of L if for each point x 6 X, there is at least one section s & V such that s(x) ^ 0. For example, letting [zo,zi] denote homogeneous coordinates on P 1 , we may cover P 1 with Uo :— P 1 \ V(zx) and Ux := P 1 \ V(z0). We have the coordinate z :— ZQ/ZI on UQ and w := 1/'z = Z\/ZQ on U\. We may form the line bundle Opi (d) by taking the data consisting of the function poi = l/zd on Uo fl U\. We may regard a homogeneous polynomial p(zo,z\) of degree d > 0 a s a section of Opi(d) by assigning ao(z) := p(z, 1) to Uo and
= p(l, 1/z) = z" d p(-s ; 1) = Po,io-o(^)-
Note that Opi (0) is the trivial bundle, and for d > 0 the bundles 0pi (rf) are spanned. There are no other sections of (Dpi (d) besides the ones just constructed using the homogeneous polynomials. On P ^ , the line bundles are not much more complicated than the ones just constructed for P 1 . They are in one-to-one correspondence with the integers d, with the line bundle corresponding to d being denoted Cpw(d). For d < 0 the only algebraic section of OFw(rf) is the 0-section, i.e., the choice of a cover [7; of FN and (Ti — 0 for all i. For d = 0 we have the trivial bundle, whose only sections are the constant functions, and for d > 0 the algebraic sections are again in one-to-one correspondence with the homogeneous polynomials of degree d. It turns out that up to equivalence that the only algebraic line bundle on CN is the trivial line bundle. Any algebraic line bundle L on an irreducible projective algebraic set X gives rise to a well-defined element C\{L) in the second integral cohomology group H2(X, Z) of X. This element c\(L) is called the first Chern class of L. If L has a not identically zero section s, then ci(L) is Poincare dual to the zero set Z of s. Let us assume we have line bundles L\,... ,LN on an irreducible projective algebraic set X of dimension N. If the line bundles are spanned by global sections, then given general sections Si of Li for i = 1 , . . . , N, it follows that the system siO) = 0 :
(A.13.6)
sN(z) = 0 has exactly (c\{L\) • •
-CI(LJV))
[X] isolated solutions and they are all nonsingular.
343
Algebraic Geometry
For example, if X = FN and Li = OpN(di), then the Sj are homogeneous polynomials of degree di, and we have the classical Bezout Theorem.
A. 13.3
Some Remarks on Vector Bundles
Replacing C in the definition of line bundles by C , and letting p^ be invertible r x r matrix-valued holomorphic functions we end up with the definition of a vector bundle of rank r. In terms of this definition, given line bundles L\,..., L^ on an irreducible projective algebraic set X of dimension N, and sections Sj of L, for i = 1,..., N it follows that the system given by Equation A.13.6 is equivalent to s = 0, where s = s1 © • • • © s^ is the section of the rank N vector bundle E := L\ © • • • © LN obtained by taking the direct sum of the N line bundles L^ The cohomology class c\{L) •••cN(L) e H2N{X,Z) is just the iVth Chern class Cff(E) of E, and the Bezout number for the system is just c^/(E)[X]. Such numbers are very often easy to compute. As a concrete example, we give the simplest nontrivial system on CN arising as a section of a rank N bundle on P w restricted to C^. For the bundle we take the tangent bundle Tpw of FN. The Bezout number for the system associated to a general section s of TpN is AT + 1. Written in terms of coordinates x i , . . . , x^ on CN the system becomes ' £i(x) -
Xleo(x)
'
:
=0
JN{X) -xNe0(x)_ where li{x) = a^o + auxi + • • • + dixx^ for generic choices of all the a^. By the theory of vector bundles it may be checked that this system has exactly N + 1 nonsingular isolated solutions.
A.13.4
Detecting Positive-Dimensional Components
The algebraic geometric structure that best captures what is meant by a polynomial system is that consisting of a vector bundle and one of its sections. For the sake of simplicity we have avoided line bundles and vector bundles in this book, but they are in the background and they lead to useful results, e.g., (Morgan et al., 1995) and (Morgan & Sommese, 1989). Here is one (Theorem 7 Morgan & Sommese, 1989). Theorem A. 13.1 (Morgan and Sommese) Let £ be a spanned rank N holomorphic vector bundle on an N-dimensional irreducible compact complex analytic space X. Assume that CN{£)[X\ ^ 0. Let a0 and o\ be two holomorphic sections of £. Then letting \ZQ,ZI] be homogeneous coordinates on P 1 , the solution set of 1 ZQCTQ + z\cj\ on P x X is connected.
344
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Before we start the proof of this theorem we would like to show what it says about down-to-earth polynomial systems. Let X := P^ 1 x • • • x ¥Nr be a product of projective spaces. Consider "systems of polynomials" a consisting of N := YH=i Ni equations where the ith equation has the nonnegative multidegrees d^i,... ,dktT with respect to the multihomogeneous structure. Letting TTJ be the product projection of X onto the jth factor P ^ , a is a section of the bundle
£:=0(0*;
(A13-7)
When the solutions of a system a are isolated and nonsingular, then the Nth Chern class of £ evaluated on X, i.e., CN{£)[X], equals the number of points in V(a). In the case of the £ of Equation A.13.7, c^(£)[X] equals the coefficient of tr 1 • • • t^r inIT^=1(M»,i + \-Uditr)Next let ai be a section of the £ in Equation A. 13.7 with isolated nonsingular solutions. Assume that we know the solutions of <j\. Now consider the "homotopy" CTJ := (1 — £)<7o + itai where ao is a second section, whose solution set we are computing. We know that for all but a finite number of 7 £ 5 1 , the solution set of <7t — 0 is isolated and nonsingular for t G (0,1]. By Theorem A.13.1, we know that the limits of V(at) as t —> 0 include points from every connected component of V(a0). Proof, (sketch of the proof of Theorem A.13.1) Letting X be a desingularization of X by Theorem A.4.1 and noting that sections from a Zariski open dense space of sections of £ are nowhere zero on a proper analytic subset of X, we conclude that we can assume that X is smooth without any loss of generality. Analogously to the arguments in § A.7, the universal space of solutions of sections of £ is a smooth connected projective bundle over X. Using the proof of item (3) • of Theorem 13.2.1, we have the connectedness. A. 14
Generic Behavior of Solutions of Polynomial Systems
Systems of polynomials that arise in engineering and science often depend on parameters. In this section, we take a general approach to polynomial systems with parameters, and discuss what we can say about the dependence of solution sets on the parameters. There are two questions we are interested in: (1) what properties hold for general values of the parameters, e.g., a well-defined number of isolated solutions; and (2) given some property for a system with a special value of the parameter, e.g., having an isolated solution, what can we conclude for general values of the parameters.
345
Algebraic Geometry
Since the proofs require material beyond the scope of this book, we refer to references for essential points. Our approach is the same as (Morgan & Sommese, 1989), though the focus there was mainly on isolated solutions of systems. Let / i ( z i , . . .,xN;qi,..
.,qM)
:
f(x;q):= _fn{xi,
• • • ,XN',Ql,
(A.14.8) • • • ,QM) .
be a system of polynomials of (x;q) G CN x C M . We regard this as a family of polynomial systems in the x variables with the g-variables as parameters. Though the algebraic system given in Equation A.14.8 is quite general, it is not general enough. We need to allow also the possibility that systems in the family are defined on any algebraic subset of
(A.14.9)
be the restriction to X x Y of an algebraic section of an algebraic rank n vector bundle £ on X x Y, where X is a Zariski open and dense subset of an iV-dimensional connected projective manifold X, and Y is an irreducible smooth quasiprojective algebraic set of dimension M. A special case of this would be the situation that X x Y is a smooth Zariski open set of an irreducible projective algebraic subset of some projective space and f(x; q) consists of the restriction of n homogeneous polynomials fi(x; q) to X x Y. Though we briefly discussed vector bundles in § A.13, we suggest strongly, the first-time reader proceed with X := CN, Y := C M , and f(x;q) = 0 in Equation A.14.8 satisfying the extra property that it is a set of n polynomials on CN+M. Let X denote the nonreduced solution set of f(x; q) = 0, and let Z := V(f(x; q)) denote the reduction of X. Let n : X —> Y be the map induced from the product projection X x Y —> Y {CN x C M —> C M if you are following in the simpler setup). Let Xo denote the union of irreducible components Z of Z such that TTZ is dominant and such that dim Z = M. T h e o r e m A.14.1 If n = N and if there is an isolated solution (x*;q*) of f(x;q*) = 0, then (x*;q*) £ XQ. Moreover there are arbitrarily small complex open sets U C X x Y that contain (x*; q*) and such that (1) (x*;q*) is the only solution of f(x; q*) = 0 in U D (X x {<7*})/ (2) f(x;ql) = 0 has only isolated solutions for q' G TT(W) and x G U fl (X x {q1}); and
346
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
(3) the multiplicity of (x*;q*) as a solution of f(x;q*) = 0 equals the sum of the multiplicities of the isolated solutions of f(x; q1) = 0 for q' € ir(U) and x € Un(Xx{q'}). Proof. The first two statements are proven in Theorem A.4.17. Since the codimension of XQ is N and XQ is defined near (x*;q*) by N functions, we know that Xo is a local complete intersection near (x*;q*). Thus Xo has at worst Cohen-Macaulay singularities. Since Y is smooth, this implies that -Kx0 '• X$ —> C M is flat in a neighborhood of (x*;q*). Here we are using the nonreduced structure of XQ in the neighborhood of (x*;q*). Flatness yields this result, e.g., (Prop. 3.13 Fischer, 1976) and the corollary following that proposition. • Corollary A.14.2 Let f(x;q) be as in Equation A.14-9. Then there is a Zariski open set U
tions counting multiplicity is d := yjidj, where the sum is always finite and i=l
bounded by the product of the JV largest degrees (with respect to the x variables) of the equations making up f(x; q) = 0. We call d, the generic Bezout number or the generic root count of the system f(x; q) = 0. Theorem A. 14.1 is one large reason why we use square systems. The following example is typical of the case n > N. Example A.14.4
For a system of polynomials in (x; qit q2) € C x C2, take
For q\ — q§, the system has isolated solutions, but for q\ ^ q2., there are no solutions. Theorem A.14.5 Assume that M + JV > n and that there is an isolated solution (x*;q*) of f(x;q*) = 0 where f(x;q) is as in Equation A.14.9. There is a germ of an irreducible complex analytic set Q containing q* with dim Q > M — {n — N) such that for all points q' in arbitrarily small open sets U
Algebraic Geometry
347
obtain a system equivalent to f(x;q) = 0. Using Theorem 12.2.2 successively, we cut X'Q down to an affuie algebraic set with dimension > M - (n — N). Take a component Z of this set at (x*;q*). By Lemma A.4.16, there are arbitrarily small open sets V of q* on this component (in the complex topology) on which the restriction 7rgn7r-i(y) : ZPi-K~l(V) —> V is proper (in addition to being finite by construction), e.g., see (§3, Theorem 8(b) Gunning, 1970) for a discussion. By Theorem A.4.3 applied to map irzr\-ir-1{y)i w e a r e done. • Remark A.14.6 A similar statement to Theorem A.14.1 can be proved, when we are talking about k dimensional components in place of an isolated x*. In this case when M + N > n + k, we get a Q of dimension at least M + N — n — k. A.14.1
Generic Behavior of Solutions
As at the start of the section, let f(x;q) be as in Equation A.14.9 (or simply as in Equation A. 14.8). We let X denote the solution set of f(x;q) = 0 with the induced nonreduced structure, and TT : X —> Y the induced morphism. The easiest route to generic statements is to exploit the fact that the morphism IT : X —> Y is "generically flat." Before we do this, let us show some generic properties, just to give the flavor of how the arguments go. We will continually choose smaller Zariski open dense sets U CY, and by abuse of notation call them U. Lemma A.14.7 There is a Zariski open dense set U cY such that either TT"1 ([/) is empty or n^-i^ : 7r^1(t/) —> U maps every irreducible component of X surjectively onto Y. Proof. To see this note that there are finitely many irreducible components Z of X. The set TT(Z) is constructible by Theorem 12.5.6, and so either n(Z) is Y or a proper algebraic subset of Y. Setting U equal to the complement of the union of the proper algebraic sets arising in this way, we can assume TT(Z) is dense in U for every component of Xu, the solution set of f(x;q) over U. We know, by Lemma 12.5.8, that for such a Z there is a Zariski open dense set of Y contained in ir(Z). By taking the intersection of these sets, we get a Zariski open dense set U with the desired property. • Lemma A.14.8 There is a Zariski open dense set U C Y such that given any irreducible component Z of n~1(U), TT^-I^) : TT~1([7) —> U maps Z surjectively onto Y with every fiber of nz having dimension exactly dim Z — M. Proof The argument follows from Corollary A.4.7 combined with the same reasoning as Lemma A.14.7. • The same sort of arguments yield the following result.
348
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Lemma A. 14.9 There is a Zariski open dense set U C Y such that given any distinct irreducible components Z\ and Z2 ofn'1^), either Z\V\Zi — 0 or 7rff-i((/) : TT~1(U) —> U maps every irreducible component W of Z\ D Z 2 surjectively onto Y with every fiber W of irw having dimension exactly dim W — M Many results such as this are immediate consequences of the generic flatness theorem. The generic flatness theorem is a useful algebraic result of Grothendieck, e.g., (pg. 57 Mumford, 1966), which Frisch (Frisch, 1967) showed holds for holomorphic maps between complex analytic spaces. We are not going to define flatness, but it geometrically says "fibers change without discontinuity." Good places to read about flatness (and some of the results that justify such a statement) are (pg. 146-161 Fischer, 1976) and (Chapter III. 10 Mumford, 1999). The generic flatness theorem mentioned above says that there is a Zariski open set U C Y such that either 7r~1(C/) is empty or T I V - I ^ ) : TT~1(C/) —> U is a flat surjection. From here on we will assume that U is not empty, since the statements we show are all trivially true in that case.
The generic irreducible decomposition Is there a "generic irreducible decomposition?" The answer is a strong yes, but first we must understand what we mean by this. For a point y € Y, let Xy denote the solution set of f(x; y) = 0. Forgetting about multiplicity information, we have the irreducible decomposition dimZy /
\
(J M J Zy,it3.
Zy:=V(f(x;y))=
(A.14.10)
We would like there to be a Zariski open dense set U
n-\U), i.e., dim Zy /
Zv=
\
U (\J Zv,itk i=l
\keJi
.
(A.14.11)
/
Note we are using Lemma A. 14.8, which tells us that given any irreducible component ZUthk of ZJJ, dimZ[/,j;fc = M + d\mZUylyk,y = M + i, where ZUtitk,y = Zu,i,k n (X x {y}).
349
Algebraic Geometry
Theorem A.14.10 Let f(x;q) be as in Equation A.14-9. Then there is a Zariski open dense set U C Y such that for any y £ U and each Zu,i,k occurring in Equation A.14-11, it follows that Zy^fcClTr—1(y) is a union of the irreducible components Zy,i,j °f Zy occurring in Equation A. 14-10. Moreover, for each of the i,k, all fibers of Zjj,i^k under n have the same number of components. Proof. Assume that it is not true, for the U selected in Lemmas A. 14.7, A. 14.8, and A.14.9, that Zu,i,k H Tr~1(y) is a union of the irreducible components 2y,i,j of Zy. Then one of the components Zy^j of Zy must contain one of the components of Zu^^k H ir~l{y). Moreover one of the components Zuytk' of Z\j must contain Zyjj. Thus we get that Z[/,i',fc' H •Zf/.i.fc contains a component W with fiber under 7T of dimension i. But this means W is dense in Zu,i,k, which gives the absurdity that Zutifk C Zuyk'By Theorem A.4.20, we may shrink U to a smaller dense Zariski open set U, so that each Zu,%,k contains a smooth Zariski open set W such that for all y e U, W Pi 7r~1(y) is dense in 7r^1(y); and IT : W —» U is of maximal rank with all fibers • having the same number of irreducible components. A.14.2
Analytic Parameter Spaces
It is a natural question to ask whether the results in this section are true when the parameters do not vary algebraically but only vary holomorphically. The short answer is "yes, with certain minor modifications." Because it is useful to allow complex analytic parameters, we explain what we mean by this and moreover state the generalization of the above results with the changes needed to prove them. In this one subsection, Zariski topology refers to the Zariski topology using zero sets of sets of homomorphic functions. The simplest case is a system '
fi(xi,...,xN;qi,...,qM)~
f(x;q):=
(A.14.12)
: Jn(xi,-
•• ,xN;qi,.
..,qM).
of holomorphic functions of (x; q) £ CN x C M , that are polynomial in the x variables. We regard this as a family of polynomial systems with the q-variables as parameters, there is a positive integer di such that i.e., for each i — l,...,n,
fi(x;q)= ] T aI(q)xI, \i\
where each ai(q) is holomorphic on all of C M . The situation analogous to Equation A. 14.9 is a system f(x;q),
(A.14.13)
350
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
which is the restriction to X x Y of a holomorphic section of a holomorphic rank n vector bundle £ on X x Y, where X is a smooth and dense Zariski open subset of an irreducible projective algebraic set X of dimension N, and Y is a connected complex manifold of dimension M. We allow the possibility that X — X. In the case of Equation A.14.12, X = P^, and £ = FN and £ = OPN(di)
® • • •®
O¥N(dn),
and f(x;q) = fi(x;q)(B- • -®fn(x;q). As before, the first-time reader should assume that X := CN, Y := C M , and f(x; q) = 0 is as in Equation A.14.12. As above we let n : V(f) —> Y denote the holomorphic mapping induced by the product projection X x Y —» Y. We let n : V(f) —> Y denote the holomorphic mapping induced by the product projection X x Y —> Y. If Z is an irreducible component of V(/), then by Theorem A.4.3, W (Z) is a complex analytic subspace of Y. Since n {Z\Z) is a proper complex analytic subspace of n (Z), we conclude that U :— 7f (Z) \ W [Z \ Z) c n(Z) is a Zariski open dense subset of TT(Z). This plays the role of Lemma 12.5.8. We will continue to replace U by Zariski open dense subsets of U as needed, and call them by the name U. This implies that each irreducible component of 7T~1(C/) maps surjectively on U. We state only the analogues of Theorem A.14.1 and Corollary A.14.2. Theorem A.14.11 If n = N and if there is an isolated solution (x*;q*) of f(x;q*) = 0, then (x*;q*) € XQ. Moreover there are arbitrarily small open sets U C X xY that contain (x*; q*) and such that (1) (x*;q*) is the only solution of f(x;q*) = 0 in U n (X x {q*}); (2) f(x; q') = 0 has only isolated solutions for q' € TT(U) and x 6 U ("1 (X x {9'}); and (3) the multiplicity of (x*;q*) as a solution of f(x;q*) = 0 equals the sum of the multiplicities of the isolated solutions of f(x; q') — 0 for q' e TT(W) and x G Un(Xx{q'}). Proof. The argument is the same as that for Theorem A.14.1.
O
When working with complex analytic spaces it is useful to define an analytic Zariski open set to be a subset U C X of an irreducible complex analytic space of the form X \Y where Y is a complex analytic subspace of X. All the usual notions, e.g., probability-one and generic point, carry over with no change. We would call the Zariski open sets we have dealt with up to now, algebraic Zariski open sets, if we needed to deal in any significant way with both sorts of Zariski open sets. Corollary A.14.12 Let f(x;q) be as in Equation A. 14-13. Then there is an analytic Zariski open set U C C M such that for q € U the system f(x; q) = 0 has di
351
Algebraic Geometry
isolated solutions (not counting multiplicity) of multiplicity i where di is an integer independent of q &U. Remark A.14.13
Thus, as in the purely algebraic case, the generic number of oo
isolated solutions counting multiplicity is d := ~S^idi, where the sum is always i=\
finite and bounded by the product of the N largest degrees (with respect to the x variables) of the equations making up f(x; q) — 0. We, as in the purely algebraic case, call d, the generic Bezout number or the generic root count of the system f{x;q) = 0. Corollary A. 14.12 holds with X singular.
Appendix B
Software for Polynomial Continuation
There is much to be said for the motto "learn by doing," and in our case, this means solving polynomial systems with numerical continuation. Even though this book offers substantially all the information one would need to write a solver from scratch, that is rather far beyond the level of commitment most readers will muster. To provide an easy entry to the area, we provide a suite of m-file routines called HOMLAB for performing polynomial continuation in the Matlab environment. After gaining experience with HOMLAB, one may wish to download one of several freely available software packages for polynomial continuation. These may offer speed advantages and advanced options, such as polytope methods, not available in HOMLAB. Some of these have been adapted to run on multi-processor machines for large computations. A partial listing of packages available as of the writing of this book is as follows.
• HOMLAB runs in the Matlab environment and implements general linear-product homotopies and parameter homotopies. See Appendix C.
• HOMPACK, HOMPACK90, POLSYS_PLP are a sequence of increasingly sophisticated continuation algorithms, written in Fortran. The "PLP" in POLSYS_PLP stands for Partitioned Linear Products, a special case of the general linear products discussed in § 8.4.3. This code finds only isolated solutions for square systems (same number of equations as variables).
• PHoM is a C++ code that implements polyhedral homotopies (see § 8.5). This package finds isolated solutions for square systems.
• PHCpack implements a variety of homotopies in a menu-driven interface that includes all the structures discussed in Chapter 8, except polynomial products. In addition to isolated roots, the algorithms from Part III of this book for handling positive dimensional solutions, nonsquare systems, etc., are implemented. This package is written by our collaborator, J. Verschelde, and it has been the experimental platform for validation of these algorithms. Both executables and Ada source code are available.
• Algorithms for mixed volume computations can be found on T.Y. Li's webpage. This is the most difficult phase of a polyhedral homotopy (§ 8.5).
• BERTINI is a C code soon to be available on A. Sommese's webpage. An effort of D. Bates, C. Monico, A. Sommese, and C. Wampler, led by A. Sommese, BERTINI features a high-level interface for parameter homotopies (including automatic differentiation) and multiple-precision routines that can adjust precision on the fly.
As URLs are often subject to change, we suggest that the packages be located by use of a search engine.
Appendix C
HomLab User's Guide
HOMLAB, a suite of scripts and functions for the Matlab environment, is designed as an easy entry into the use of polynomial continuation and, for the experienced user, as a platform for experimental development of new methods. Many of the exercises of this book assume the availability of HOMLAB, and special routines using HOMLAB functions are provided for some exercises. The use of a routine for a particular exercise is described in the exercise statement itself, while the general structure and use of HOMLAB is documented below. The best way to learn HOMLAB is simply to work the exercises in the order they appear in this book. These progress from the simple application of the core path-tracking routine to successively more sophisticated homotopies that use it. Help describing the usage of individual routines, say, endgamer.m, is available by typing "help endgamer" at the Matlab prompt. The main text of this book is the reference for the methodologies used, and the help facility just mentioned is the reference for individual routines. However, to help the user in getting started quickly, we provide this user's guide. We assume the user has at least a minimal acquaintance with Matlab; in particular, the user must know how to write and execute simple scripts and functions. A script is a sequence of Matlab commands recorded in a file, say "myscript.m," which are executed by typing >> myscript at the Matlab prompt, here indicated as ">>". (Scripts can also be called within other scripts or functions.) A function is a file, say "myfunc.m," which starts with a declaration line something like
function [out1,out2]=myfunc(in1,in2,in3)
followed by lines of Matlab code that compute the two outputs, out1 and out2, from the three inputs in1, in2, and in3. This function might be called as [a,b]=myfunc(0.1,[1 3],x), where x is an existing variable in the workspace. For more on using Matlab, please see the Matlab documentation.
C.1 Preliminaries

C.1.1 "As is" Clause
HOMLAB is distributed free of charge on an "as is" basis. Its intended usage is educational, so that the user may gain a greater understanding of the use of numerical homotopy continuation for solving systems of polynomial equations. Any other use is strictly the user's responsibility.
C.1.2 License Fee
There is no license fee for HOMLAB. In lieu of this, we hereby request each user to buy a copy of this book.

C.1.3 Citation and Attribution
The use of HOMLAB for research purposes, either in its original form or as modified by the user, is highly encouraged, subject only to professional ethical conduct, as follows. In publications based on results obtained using HOMLAB or its successors, the use of HOMLAB should be acknowledged and this book should be cited. The author of the code is Charles Wampler. Any redistribution of HOMLAB in unaltered form must retain the same name and acknowledge the author. Any distribution of derived codes that extend or modify HOMLAB should acknowledge the original source and authorship. In addition, the differences from the original should be clearly documented and attributed to the new author. These conditions extend to users of the derived codes.

C.1.4 Compatibility and Modifications
HOMLAB is a suite of Matlab routines. Version HOMLAB1.0 has been restricted to the conventions of Matlab v.4.0 to provide compatibility with both old and new Matlab installations. (Even the file names have been restricted to eight characters for compatibility with old operating systems.) The exception to this rule is that routines based on Part III of this book for generating witness point supersets use cell arrays to store sets for different dimensions. Users who advance to that level will need a more recent version of Matlab, or else they must modify the code. All routines have been verified to run under Matlab v.6.5. By avoiding advanced features of newer versions of Matlab (except as just noted), we hope the package will be easier to translate to run in other environments, in case some readers lack access to Matlab. In particular, Octave and SciLab are both freely available packages that implement a large subset of Matlab functions, so they are good candidates for substitute environments. Anyone who successfully
ports HOMLAB to one of these, or similar, environments is requested to notify the authors and to make the ported version freely available. Citation of HOMLAB and this book is required, and any differences in functionality must be documented. The authors are not bound to fix bugs in the current version or to upgrade HOMLAB for compatibility with any future release of the Matlab product. However, user comments and bug reports are welcome, so that, at our discretion, we can maintain and possibly improve the educational value of the package. Please see the HOMLAB webpage for instructions on how to submit a comment or bug report. The exercises for this book have been written under Matlab v.6.5. Some of these use features not available in previous releases, namely function pointers and function files that include subfunctions in the same file. This should be more convenient for those with an up-to-date release of Matlab; those with old versions will, we hope, have little trouble revising the source code to run in their environment.

C.1.5 Installation
As a suite of m-files, HOMLAB becomes functional by simply adding the folder containing the routines to Matlab's search path. The folder for the current release, HOMLAB1.0, is HomLab10. Let's say that you have copied this folder onto your machine with the full path name of c:\mypath\HomLab10, where "mypath" could be any path in the file structure of your machine. There are three basic options for adding HOMLAB to the Matlab path:
• In Matlab (v.6.5 and above), use "File -> Set Path" on the Matlab menu bar to launch a dialog box for setting the path and use it to add c:\mypath\HomLab10 and its subfolders to the top of the search path. The change becomes effective immediately in the current session, while the "Save" button in the dialog box records it for future sessions.
• At the Matlab prompt, use the command >> addpath c:\mypath\HomLab10. HOMLAB will then be available for the current session only. Similarly, add the subfolders of HomLab10 to the path.
• Create a file called startup.m in a directory already on Matlab's search path and put the appropriate addpath commands there (a minimal sketch of such a file follows this list). HOMLAB will then be available for all future sessions.
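For the third option, a startup.m might look like the following sketch; the path c:\mypath\HomLab10 is the example path used above, and the use of the standard Matlab function genpath to pick up every subfolder at once is our own suggestion, not a HOMLAB requirement.

% startup.m -- a minimal sketch; replace c:\mypath with your own location
addpath(genpath('c:\mypath\HomLab10'));   % adds HomLab10 and all of its subfolders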
Any one of these three options is sufficient. See the Matlab help facility to obtain more detailed instructions on modifying the search path. To test if the installation is successful, type >> simpltst at the Matlab prompt. If all is well, the session should look something like:
>> simpltst
Number of start points = 2
elapsed_time = 0
Path 1
elapsed_time = 1.4100e-001
Path 2
elapsed_time = 3.1300e-001
The solutions are:
   1.0000e+000 -2.2204e-016i    -1.0000e+000 -1.1102e-016i
   1.0000e+000 -1.6653e-016i    -1.0000e+000 -1.6653e-016i
   1.0000e+000                   1.0000e+000 +5.5511e-017i
>>
The times will vary according to your machine, and the tiny values of the imaginary parts of the answers will typically change with each run. This test solves the simple system

    x^2 - 1 = 0,
    xy - 1 = 0,

in the homogenized form

    x^2 - w^2 = 0,
    xy - w^2 = 0,

using a two-path homotopy based on the linear-product formulation f_1 ∈ ⟨x,1⟩ × ⟨x,1⟩, f_2 ∈ ⟨x,1⟩ × ⟨y,1⟩. Accordingly, the answers should be (x,y,w) = (1,1,1) and (-1,-1,1), as above. More information on interpreting the results is given below.

C.1.6 About Scripts
In HOMLAB, the high-level functions are written not as true functions, which hide their internal variables from the workspace, but as scripts, which are a sequence of commands that run directly in the top-level workspace. One advantage of this is that a Matlab save command can save all of the data necessary to execute an exact re-run, including all random constants used in defining a start system, and so on. A negative consequence is that all such data is in the workspace until the user clears it. If one wishes to avoid this, one can write a function to call the HOMLAB script and pass out only the desired results.
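For instance, a thin wrapper along the following lines keeps the base workspace clean; this is our own sketch, and the names solve_mysys and mysolvescript are hypothetical placeholders for whatever solving script you use.

function xsoln = solve_mysys()
% Run a HOMLAB solving script inside this function's workspace so that its
% many intermediate variables stay out of the base workspace; only the
% solution matrix is returned.  'mysolvescript' is a hypothetical script
% that ends with xsoln defined.
mysolvescript;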
C.2 Overview of HOMLAB
HOMLAB is a collection of compatible routines for defining and executing homotopy algorithms. The workhorse routine is endgamer.m, which tracks solution paths for a homotopy h(x,t) = 0 from a list of startpoint solutions of h(x,1) = 0 to their
endpoints satisfying h(x,0) = 0. Specifically, endgamer has the usage

[xsoln,stats,xendgame]=endgamer(startpoint,hfun)

which is more completely documented in § C.7 below. Briefly, the inputs are startpoint, a matrix with one startpoint solution of the homotopy in each column, and hfun, a string name of the homotopy function. The matrix xsoln lists the endpoints of the solution paths in columnwise fashion. As its name suggests, endgamer applies an endgame to get better estimates of the endpoints for paths which approach singularities as t -> 0. Specifically, it uses the power-series endgame described in § 10.3.3. Usage of HOMLAB mainly comes down to specifying a homotopy and finding its start points. This can be done by writing one's own m-files or by making use of utilities and drivers in HOMLAB. The main alternatives are as follows.

Linear Products This option includes total-degree homotopies (§ 8.4.1), multihomogeneous homotopies (§ 8.4.2), and general linear-product homotopies (§ 8.4.3). The user must specify a target function f(x), its derivative f_x(x), and the linear-product structure. Automatic differentiation is available if the function is specified in fully-expanded form (see § C.3.1). Driver routine lpdsolve does everything else to construct and solve a homotopy of the form

    h(x,t) = γ t g(x) + (1-t) f(x) = 0.

That is, lpdsolve constructs a compatible start system g(x), solves it, and calls endgamer to get the final answers. See § C.4 for details.

Parameter Homotopy This option handles general homotopies of the type described in Chapter 7. The user gives a parameterized function f(x,q), its derivatives f_x(x,q) and f_q(x,q), starting and ending parameter values q_1 and q_0, and startpoint solutions for f(x,q_1) = 0. (Usually, the start points are found with a single linear-product run, then parameter homotopy is used for all subsequent runs for various target values of q_0.) A means is provided for selecting a linear path from q_1 to q_0; otherwise, the user must write an m-file to implement a nonlinear path. When the linear path is selected, the homotopy is of the form

    h(x,t) = f(x, t q_1 + (1-t) q_0) = 0.

Secant Homotopy This option solves homotopies of the form

    h(x,t) = γ t f(x,q_1) + (1-t) f(x,q_0) = 0.

The user supplies the function f(x,q), the derivative f_x(x,q), and startpoint solutions to f(x,q_1) = 0. (Again, as in the parameter continuation case, one usually solves f(x,q_1) = 0 with a single linear-product homotopy, reusing the same q_1 for subsequent homotopies to various target values of q_0.) It is the
user's responsibility to verify that the homotopy is valid in the sense that the linear combination of two functions from the family is still in the same family. This is the least used option, but as shown in Exercise 7.6, it is sometimes handy. The usual process involves creating two m-files:
• a function defining the system to be solved,
• a script that sets up the required data structures before calling endgamer to get the solutions.
The exception is if one chooses to specify the function in "tableau" form (§ C.3.1), in which case the function evaluation routine is already provided. Facilities are available to make the whole process easy in the most common formulations, while the more advanced user can directly access the basic routines to implement specialized homotopies. In the next few sections, we illustrate each of the main options by examining example scripts and functions.

C.3 Defining the System to Solve
HOMLAB allows a target system to be defined in one of two ways: as a fully expanded sum of terms or as a black-box function. The fully expanded form is convenient for simple, sparse polynomials, while user-written functions are more flexible and often more efficient. Parameterized families of systems must always be written as a user-defined function, but the underlying functions that HOMLAB uses for evaluating fully expanded functions can be employed in a user-defined function as well.
C.3.1 Fully-Expanded Polynomials
The simplest option for specifying a target polynomial is to list out its monomials and coefficients. As discussed in § 1.2, this is not generally an efficient formulation: for complicated problems, straight-line programs can require much less computation. However, for simple systems, the fully expanded form is quite reasonable. HOMLAB supports a "tableau" style definition for systems, wherein the entire polynomial system is laid out in a single numerical matrix with n + 1 columns for an n-variable problem. The convention is that each row is a term of a polynomial, with the coefficient in the first column and the exponents d_1,...,d_n for monomial x_1^{d_1} ⋯ x_n^{d_n} in the remaining columns. The end of a polynomial is marked by a row with a negative exponent for x_1. A complete script for solving the system

    x^2 - x - 2 = 0,
    xy - 1 = 0,

using a tableau definition of the system and a total-degree homotopy is as follows.
% Define the target system in tableau form
eop = [0 -1 0];   % marker for end of polynomial
tableau = [ 1  2  0
           -1  1  0
           -2  0  0
            eop
            1  1  1
           -1  0  0
            eop ];
% decode tableau and solve with total-degree homotopy
totdtab
% display the dehomogenized solutions
disp('The solutions are:');
disp(dehomog(xsoln,1e-8))

The total degree is 2 · 2 = 4, and there are two finite solutions, [x,y,w] = [2, 0.5, 1] and [-1, -1, 1], and a double root at infinity, [0, 1, 0]. More information on the solution script totdtab is given in § C.4. A related script, lpdtab, can be used to solve tableau-style systems using multihomogeneous or general linear-product homotopies. Using this capability, a two-path version of the above would be as follows.

% Define the target system in tableau form
eop = [0 -1 0];   % marker for end of polynomial
tableau = [ 1  2  0
           -1  1  0
           -2  0  0
            eop
            1  1  1
           -1  0  0
            eop ];
% define a linear-product decomposition
xw=[1 0 1];
yw=[0 1 1];
LPDstruct=[ xw; xw; xw; yw ];
HomStruct=[];   % default to 1-homogeneous
% decode tableau and solve with linear-product homotopy
lpdtab
% display the dehomogenized solutions
disp('The solutions are:');
disp(dehomog(xsoln,1e-8))
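As a further illustration of the tableau convention (our own example, not one of HOMLAB's demos), the pair of circles x^2 + y^2 - 1 = 0 and x^2 + y^2 - 2x = 0 in the two variables (x,y) would be entered as:

eop = [0 -1 0];          % end-of-polynomial marker, as above
tableau = [ 1  2  0      % x^2
            1  0  2      % y^2
           -1  0  0      % -1
            eop
            1  2  0      % x^2
            1  0  2      % y^2
           -2  1  0      % -2x
            eop ];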
See § C.4 for details on specifying linear-product structures, and see the help for lpdtab for details on how this script automatically homogenizes the tableau-style polynomial using the information in HomStruct. These scripts parse the tableau matrix into a more basic form and pass the results to a built-in function, ftabcall, which in turn calls function ftabsys. The latter can be used directly if one wishes to write a straight-line function (see next section) while specifying some subset of the polynomials in tableau form. For details, use the help facility or look at the source code for ftabsys.
C.3.2 Straight-Line Functions
For more complicated functions, it is usually more efficient to express them in straight-line form. There are two contexts in HOMLAB where a user might define such a function:
• to define a target polynomial for solution in a linear-product homotopy, or
• to define a parameterized family of functions for a coefficient-parameter homotopy.
In the first case, the function must have the form
function [f,fx]=function_name(x)
where x is the input variable list, f is the output function value, and fx is the output Jacobian matrix of partial derivatives df/dx. The function must be homogeneous, possibly multihomogeneous. The careful reader might raise an objection that a homogeneous polynomial on P^n is not truly a function (see § 12.3), but for our purposes we consider it as a function on C^{n+1}, which it certainly is. The script which defines the linear-product homotopy appends random linear equations to effect the projective transformation of § 3.7, one such equation for each projective subspace when working multihomogeneously. To repeat the example above of the system
    x^2 - x - 2 = 0,
    xy - 1 = 0,
in straight-line form, one could define the function
function [f,fz]=simplfcn(z)
% Straight-line function for
%   x^2-x-2=0, xy-1=0
x=z(1); y=z(2); w=z(3);
f = [ x^2-x*w-2*w^2
      x*y-w^2 ];
fz = [ 2*x-w, 0, -x-4*w
       y,     x, -2*w ];

This is not really useful for such a simple example, but it can be significant for more complicated systems. Notice the use of the homogeneous coordinate w. Similarly, parameterized functions must also be homogeneous in the unknowns, but not necessarily in the parameters. The Matlab format for a parameterized family of systems is simply
function [f,fx,fp]=function_name(x,p)
where the third output, fp, is the matrix of derivatives df/dp. Here is a complete specification for the intersection of two circles, where a subfunction for a single circle is used twice.
function [f,fx,fp]=twocircle(x,p)
% Straight-line function for intersection of two circles
f=zeros(2,1); fx=zeros(2,3); fp=zeros(2,6);
[f(1),fx(1,:),fp(1,1:3)]=onecircle(x,p(1:3));
[f(2),fx(2,:),fp(2,4:6)]=onecircle(x,p(4:6));
%
function [f,fx,fp]=onecircle(z,p)
% straight-line function for one circle
% parameters are [cx;cy;r^2] where (cx,cy)=center, r=radius
x=z(1); y=z(2); w=z(3);
cx=p(1); cy=p(2); rsq=p(3);
a=x-cx*w; b=y-cy*w;
f = 0.5*( a^2 + b^2 - rsq*w^2 );
fx = [ a, b, -cx*a-cy*b-rsq*w ];
fp = [ -w*a, -w*b, -0.5*w^2 ];

The most error-prone part of writing a straight-line program is in generating the derivatives. To aid in debugging, utilities are provided to numerically check the coding of the function. See § C.3.4.
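As a quick sanity check of twocircle (our own illustration, with hypothetical parameter values), one can evaluate it at the affine point (1,0), written homogeneously as z=[1;0;1], which lies on both of the circles chosen below, so both residuals should be numerically zero.

% circle 1 centered at (0,0), circle 2 at (2,0), both with r^2 = 1;
% the point (1,0) lies on both circles
p = [0; 0; 1;  2; 0; 1];
[f,fx,fp] = twocircle([1;0;1],p);
disp(f)    % both entries should be ~0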
C.3.3 Homogenization
It is highly recommended that all systems presented to HOMLAB be defined in homogeneous form. This is the user's responsibility, with the exception that tableau-style systems will be homogenized automatically. Homogenization is recommended because path endpoints approaching infinity are very common, and the projective transformation available after homogenization keeps both the magnitudes of the coordinates and the arclengths of the homotopy paths finite. If one wishes to compute homotopy paths without homogenization, the path-tracker routines endgame and tracker will still work, but they do not include any special stopping conditions for diverging solutions, which may therefore take up inordinate computation time. (Such paths can never make it to t = 0, so eventually they must fail on a too-small step size condition or on the limit on the number of steps.) The choices of a linear-product structure and a multihomogenization are independent. For example, if a system is bilinear, one can reflect this in the linear-product structure while one-homogenizing the system. The one-homogeneous start system will have solutions at infinity, but HOMLAB will ignore these. If the system is two-homogenized instead, respecting the bilinear structure, the linear-product start system has no solutions at infinity. This is a bit cleaner mathematically, but in practical terms, both formulations have the same number of solution paths to follow. To be clear, consider again the example

    x^2 - x - 2 = 0,
    xy - 1 = 0.
In a one-homogeneous treatment using coordinates [x, y, w] ∈ P^2, we have the equations

    x^2 - xw - 2w^2 = 0,
    xy - w^2 = 0,

and we must specify a compatible homogeneous structure:

HomStruct=[1 1 1];

which directs HOMLAB to append an inhomogeneous linear equation ax + by + cw = 1 for some random, complex {a, b, c}, thereby choosing a random patch on P^2. (For a discussion of projective spaces, see Chapter 3.) To get a two-path homotopy, we specify the linear-product structure

LPDstruct=[1 0 1; 1 0 1;
           1 0 1; 0 1 1];
that is, f_1 ∈ ⟨x,w⟩ × ⟨x,w⟩ and f_2 ∈ ⟨x,w⟩ × ⟨y,w⟩. This start system has a double root at infinity of [x,y,w] = [0,1,0], but HOMLAB will ignore it. The two-homogeneous treatment of the same system using coordinates {[x,u],[y,v]} ∈ P^1 × P^1 is

    x^2 - xu - 2u^2 = 0,
    xy - uv = 0.
The compatible HomStruct is, assuming the coordinates are ordered as (x,y,u,v),

HomStruct=[1 0 1 0; 0 1 0 1];

which directs HOMLAB to append two linear equations

    ax + 0y + bu + 0v = 1,
    0x + cy + 0u + dv = 1,

for random, complex values of {a, b, c, d}. This picks a random patch on each of the two P^1 subspaces. Now, the two-path linear-product decomposition is

LPDstruct=[1 0 1 0; 1 0 1 0;
010
1];
See § C.8 for a description of the dehomog function to dehomogenize a solution point.
C.3.4
Function Utilities and Checking
With a few sample scripts in hand, it is easy to set up and run any of the various kinds of homotopies once the function and its derivatives are available. To make the definition of these easier, some utilities are available. • Function f tableau accepts a list of coefficients and a matrix of exponents to define the terms of a polynomial. It then provides both the function and derivative evaluations. It works only for a single function / : C n —> C, so a wrapper function ftabsys is provided to call ftableau multiple times for a system of such functions. • Utility scalepol is available for scaling a system for f tabsys, as is sometimes necessary. For example, see the chemical system of § 9.2. Since Matlab is a numerical package, there has been no attempt to automate differentiation and homogenization except for the simple case of fully expanded polynomials via the ftableau function. Otherwise, this onerous task falls to the user. Symbolic packages can be employed to preprocess functions in this way and then copy the results into an m-file function. The most error-prone step in defining a straight-line program for a function is in giving formulae for the partial derivatives. A helpful way of checking these is to compare the computed derivatives with a computation based on numerical differentiation. The function must also be homogenized, which can also be checked numerically. The following checking utilities are provided for these purposes.
366
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
function [fxerr]=chekffun(fname,nx,epsO) —> checks derivatives of target functions [f,fx]=myfunc(x) function [fxerr,fperr]=chekpfun(fname,nx,np,epsO) —> checks derivatives for parameterized functions [f,fx,fp]=myfunc(x,p) function [homerr]=chekhmog(fname,HomStruct,mdeg) —> Checks multihomogenization of function [f]=myfunc(x) User provides homogeneous structure and multidegree matrix function [homerr]=chekhmog(fname,HomStruct,deg,lpd) —> Checks multihomogenization of function [f]=myfunc(x) User provides homogeneous structure, t o t a l degrees, and linear product structure. Code computes multidegree matrix from these. In each case, the checking is done at a random point x £ Cnx. The functions provide a numerical comparison and also use the Matlab spy function to graphically show which elements are suspicious, having an error greater than espO. If epsO is omitted from the call, it defaults to 10~6. Note that high-level scripts define a global FFUN, which can be used for fname.
C.4
Linear Product Homotopies
One of the two main options in HOMLAB is the linear-product homotopy, implemented in the script lpdsolve. With appropriate settings, this script performs the equivalent of a total-degree homotopy, a multihomogeneous homotopy, or a general linear-product homotopy. For total-degree and multihomogeneous homotopies and a tableau-style function definition, the higher-level scripts totdtab and mhomtab automatically perform some preliminary processing steps for you before initiating lpdsolve. Let's first see all the set-up information required by lpdsolve by studying a script to solve a simple system specified in straight-line form. Such a function is treated as a "black box," so the user must supply all the structural information necessary to specify the linear-product formulation. To this end, consider the straight-line function called simplf en in § C.3 above, that implements the system x2 - x - 2 = 0,
xy - 1 = 0.
It has two variables (before homogenization), and each equation is quadratic. A complete script to solve this with a total-degree homotopy, with four paths, is as follows.
HomLab User's Guide
367
% script to solve "simplfcn" by t o t a l degree using lpdsolve global nvar degrees FFUN nvar=2; degrees=[2 2 ] ; FFUN='simplfcn'; LPDstruct=ones(sum(degrees),nvar+l); % t o t a l degree structure lpdsolve dispOThe solutions a r e : ' ) ; disp(dehomog(xsoln, ie-8)) The meaning of the global variables is self-explanatory. The degrees of the polynomials as listed in degrees must be in the same order as they appear in the evaluation function, although in this example they are the same. The item that needs explanation is LPDstruct, which defines the linear-product structure to be used. Each row in LPDstruct represents one linear factor, and there must be degrees (i) factors for the ith equation, for a total of sum (degrees) rows in all. The columns of LPDstruct correspond to the variables in x as it is passed into simplfcn(x). Typically, the final column is the homogeneous coordinate, but this is at the discretion of the user when writing the function. A nonzero entry in element (i,j) of LPDstruct indicates that variable j appears in the ith linear factor, and factors are assigned to equations in accordance with the entries in degrees. For a total-degree homotopy, LPDstruct is just a full matrix of ones. We can run the same problem using only two paths just by changing LPDstruct. In this case, a two-path homotopy is obtained with the following script. '/„ script to solve "simplfcn" with two paths using lpdsolve global nvar degrees FFUN nvar=2; degrees=[2 2]; FFUN='simplfcn'; xw= [ 1 0 1]; yw=[0 1 l ] ; LPDstruct=[ xw; xw; xw; yw; ];
HomStruct=[]; % default to 1-homogeneous lpdsolve disp('The solutions a r e : ' ) ; disp(dehomog(xsoln,le-8)) This is exactly the script simpltst that is suggested as an installation check in § C.1.5. The script above uses only two paths even though simplfcn is only onehomogenized. That is, the homotopy runs in the projective space P 2 . The choice of linear product structure guarantees that the start system, the target system, and consequently the whole homotopy, have a double root at infinity that we choose not to track. If we two-homogenize the equations instead, then this double root
368
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
does not exist at all. It makes no difference in this case, but in cases where some endpoints arrive at infinity only at the target (as t —• 0), multihomogenization can change their representation, and sometimes this can make them numerically more tame. For instance, a singular double root at infinity might break into two distinct nonsingular roots at infinity. To show how HomStruct is used to set up a homotopy in a cross product of projective spaces, let's rework the running example. First, we need a two-homogenized version of the equations. function [f,fz]=simplefcn2(z) % Straight-line function for '/. x~2-x-2=0, xy-l=0 % Two-homogeneous version on [x,u] \times [y,v] x=z(l); y=z(2); u=z(3); v=z(4); f = [ x~2-x*u-2*u~2 x*y-u*v ]; fz = [ 2*x-u, 0, -x-4*u, 0 y. x, -v, -u ];
The script to solve this as a two-homogeneous system follows. 5i script to solve "simplfcn2" two-homogeneously global nvar degrees FFUN nvar=2; degrees=[2 2]; FFUN='simplfcn'; xu=[l 0 1 0 ] ; yv=[0 1 0 1 ] ; LPDstruct=[ xu; xu; xu; yv; ]; HomStruct=[ xu ];
lpdsolve dispOThe solutions a r e : ' ) ; disp(dehomog(xsoln,le-8)) In general, the groupings in the linear factors specified in LPDstruct do not have to be copies of those in HomStruct, as indeed, they are different in the two-path,
369
HomLab User's Guide
one-homogeneous example above. The given LPDstruct and HomStruct must be compatible with the target function. In tableau style functions, the automated scripts will ensure compatibility, but straight-line functions are treated as black boxes, so HOMLAB has no way of checking compatibility. It is the user's responsibility to ensure compatibility. In the case of errors, the resulting behavior will be erratic, sometimes signalled by path-tracking failures, but not necessarily. The solution script, lpdsolve, does the following: • generates a start system g(x) according to the linear-product structure of LPDstruct, • appends random hyperplane slices to implement the projective transformation compatible with HomStruct, • solves g(x) = 0 to get all the start points, • forms the homotopy h(x,t) = 1tg(x) +
{l-t)f(x),
• calls endgamer to track the solution paths, invoking a power-series endgame. The results are in matrices xsoln, stats, and xendgame, as described in § C.7.3. C.5
Parameter Homotopies
Suppose we have written a coefficient-parameter target function, f(x;p), implemented as an m-file function, say myf unc, having the calling sequence [f,fx,fp]=myfunc(x,p) as described in § C.3. An example is function twocircle above. How can we form a parameter homotopy function to solve it for some target value of pO? Let's assume we have a solution list for random, complex parameter values pi. (We will see in a moment how to get this using lpdsolve.) What we need is a homotopy function h(x,t;pl,pO) = f(x,p(t;pl,pQ)) where p : C x Q x Q —> Q with Q the parameter space, and p(l;pi,p0) = pi, p(O;pi,j>o) — Po- The path function p must give a continuous path, with continuous first derivative, starting at pi, ending at po, and always staying in the parameter space. HOMLAB does not offer a general solution for arbitrary parameter spaces, but in the special case that Q = C m , a Euclidean space, a linear path suffices: p(t\pi,p0) = tpi + ( l - i ) p 0 Our path tracker and endgame function, endgamer, expects a homotopy function with the calling sequence [h,hx,ht]=h(x,t). Therefore, the parameters and the
370
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
path function must be passed from the top level script to the homotopy evaluation function via global variables. Moreover, we write myfunc in homogeneous form, so projective transformation equations must be appended. The script parsolve takes care of all the formatting once the minimal set of information has been established. Let's assume that pi and pO are in memory along with the start points, as matrix startpoint, listed columnwise and satisfying f(x,pi) = 0. Then, a script for solving the system f(x,p0) = 0 is as follows, assuming myfunc implements f(x,p). global FFUN PATHFUN FFUN = 'myfunc'; PATHFUN = ' l i n . p a t h ' ; global ParStart ParGoal ParStart = p i ; ParGoal = pO; HomStruct= [] ; °/« defaulting to 1-homogeneous parsolve disp('The solutions a r e : ' ) ; disp(dehomog(xsoln,le-8)) Here, lin_path is a pre-defined function for a linear path. Clearly, HomStruct must be set to agree with the homogenization that has been applied to the user-defined myfunc. C.5.1
Initializing Parameter Homotopies
For parameter homotopy to be useful, we must have some way to solve the first example, f(x,pi) — 0. This can be done with a linear-product homotopy. Once the parameterized family of systems has been defined, in the form f(x;p) = 0, HOMLAB can treat it like any other black box target system. This requires, as described in § C.4, one to provide the linear product structure to be used in the homotopy. One additional wrinkle is that the initial set of parameters p\ must be chosen at random, and then passed behind the scenes through a global variable. Script Ipd2par takes care of all of this. An example usage of this to solve the example of the intersection of two circles, function twocircle above, is as follows. global nvar degrees FFUN nvar=2; degrees=[2 2 ] ; FFUN='twocircle'; LPDstruct=ones(4,3) ; °/0 t o t a l degree structure HomStruct=[l 1 1]; °/0 1-homogeneous % 6 random, complex parameters pi = crand(6,l); global ParGoal ParGoal = p i ; Ipd2par dispOThe solutions a r e : ' ) ; disp(dehomog(xsoln,le-8))
HomLab User's Guide
371
This sets up and solves a homotopy of the form 1ftg(x) + (l-t)f{x;p1) Here, we have chosen random, complex parameters, using the function crand, as this is the desired first step in establishing a parameter homotopy. One can use the same script with nonrandom values of ParStart to solve other problems in the family using a linear-product homotopy, but each such run uses the full linear-product root count number of paths. If any of the endpoints are degenerate in the run of Ipd2par for random target parameters, then correspondingly fewer paths can be used in solving subsequent members of the family by parameter homotopy. Just copy pi to ParStart and copy the nondegenerate endpoints into startpoint and you are ready to apply the parameter homotopy of the previous section. Here, "degenerate" can mean singular solutions, solutions at infinity, or solutions on any pre-specified (i.e., independent of the random choice of parameters) irreducible quasiprojective algebraic set. See Chapter 7 for details. C.6
Defining a Homotopy Function
In all the above usages, HOMLAB automatically constructs a homotopy in accord with the instructions provided by the user. Alternatively, one can define a complete homotopy from scratch and then call up HOMLAB'S path tracker to solve it. The homotopy function must be denned with the following interface: function [h,hx,ht]=myriomotopy(x>t) where myhomotopy can be any name of the user's choosing. The user must also provide a list of start points, whereupon the corresponding endpoints can be obtained with the command [xsoln,stats,xendgame]=endgamer(startpoint,'myhomotopy'); See § C.7 for details. C.6.1
Defining a Parameter
Path
In linear-product decompositions, the homotopy path is automatically chosen by HOMLAB as a straight line through the corresponding coefficient space, as justified by Theorem 8.3.1. In parameter homotopies, however, one must ensure that the homotopy path stays in the desired parameter space for all t, not just for the start and target systems at t = 1 and t = 0. If the parameter space is Euclidean, then a linear path is acceptable. As in the example usage of parsolve above, this is easily obtained by the declaration PATHFUN=' lin_path'; which makes use of a pre-defined function lin_path for linearly interpolating between points in parameter space. If
372
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
the parameter space is non-Euclidean, a more general type of path is needed. The user must provide the definition in the form function
[p,dpdt]=mypath(pl,pO,t)
where mypath. m is a user-written m-file function. Then, set PATHFUN=' mypath'; before calling parsolve to execute the homotopy. See the source code for lin_path.m for an example to follow. C.6.2
Homotopy Checking
Most errors in coding a homotopy function can be revealed by checking if computed derivatives agree with a computation based on numerical differentiation. The following routine is provided for this purpose. function [hxerr,hterr]=chekhfun(fname,nx,epsO) —> checks homotopy functions [f,fx,ft]=myfunc(x,t) The checking is done at a random point (x,t) € C nx x C. The functions provide a numerical comparison and also use the Matlab spy function to graphically show which elements are suspicious, having an error greater than espO. If epsO is omitted from the call, it defaults to 10~6. Note that high-level scripts define a global HFUN, which can be used for f name. C.7
The Workhorse: Endgamer
The workhorse routine is endgamer. m which tracks solution paths for a homotopy h(x,t) = 0 from a list of startpoint solutions of h(x, 1) — 0 to their endpoints satisfying h(x, 0) = 0. Specifically, endgamer has the usage [xsoln,stats,xendgame]=endgamer(startpoint,hfun) with thefollowinginputs and outputs. Inputs startpoint An n x N matrix of N start points, listed columnwise. hfun A string name of the homotopy function, h(x,t) : Cn x C - » C n . The function routine must provide derivatives (see § C.6). It is recommended that the homotopy be homogenized. Outputs xsoln Ann x N matrix of the endpoints of the homotopy paths. stats A6xJV matrix of statistics regarding the paths and their endpoints. xendgame An n x N matrix recording the solutions for t at the start of the endgame.
HomLab User's Guide
373
There are a number of control settings regarding path-tracking tolerances and the like which must be set prior to calling endgamer. These global variables can be set by calling htopyset, as is done automatically by the high-level solving scripts totdtab, mhomtab, lpdsolve, and parsolve. To change the default settings, one just puts a copy of htopyset in the current working directory and edits the values. Matlab will find and use the copy in the current directory, overriding the copy in HOMLABIO, which is best left in its original condition. Comments in the original copy ofhtopyset.m tell the default settings in case the user needs them for reference. Routine endgamer loops through the start points and for each one does the following: (1) tracks the path to the beginning of the endgame, t=t_endgame, a global control variable; (2) records the solution at t=t_endgame as a column in xendgame; (3) executes the power-series endgame (§ 10.3.3), monitoring the convergence criterion and stopping when either convergence is reached or when one of several protective stopping conditions is satisfied; (4) records the best solution estimate, as judged by the convergence criterion, as a column in xsoln and certain statistics concerning the solution are recorded as a column in stats. The details of all the required control settings are given next, followed by a detailed description of the outputs. C.7.1
Control Settings
As mentioned above, the control settings are established in htopyset.m, which is called automatically by the high-level scripts. Here, we give a detailed list and describe what each control means, as well as give the default value. We group these into three general categories. These are all global variables. LPD Start System (used by lpdstart) • epsstart = le-12; Each solution of the linear-product start system is found by choosing one linear factor from each equation and solving the resulting linear system. Choices that give a singular linear system are ignored. The solver builds the linear system one equation at a time, using Gaussian elimination to triangularize as it proceeds. If at any stage the magnitude of the largest available pivot is less than epsstart, that combination of linear factors is declared invalid, and the solver moves on to the next combination. Path Tracking (used by tracker). This variable step-size tracker is a correctorpredictor type as described in § 2.3.
374
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
• stepmin = le-6; The minimum step size in t, below which a path is declared as having failed. • maxit = 3; The maximum number of Newton iterations allowed in the corrector. If the designated convergence criterion is not met within this number of iterations, the step is a failure and the step size will be halved. • maxnf e = 1500; The maximum number of function evaluations allowed per path. This limits the amount of computing time that a diverging path may consume. For well-scaled, homogenized homotopies, this criterion should rarely come into play. • epstiny = le-12; In rare instances, a path may fail due to vanishing of the tangent vector (dh/dx)~1dh/dt. This is detected using the tolerance epstiny. Typically, when this occurs, it is a signal that the homotopy is not properly formed, possibly an error in a user-written function for evaluation of the derivatives. End Game (used by endgamer) This routine calls tracker to get to the start of the endgame, then runs the power-series endgame. • stepstart = 0.1; The initial step size for t in the tracker. • epsbig = le-4; The tracking accuracy to be maintained in the initial phase of tracking. This is the convergence tolerance for the corrector. If a path does not successfully reach the endgame, it is tried once more from the beginning with a tighter tolerance of epsbig/100. • epssmall = le-6; The tracking accuracy to be maintained in the endgame. • t_endgame = 0.1; The value of t where the endgame starts. • tstop = le-10; The value of t where the endgame gives up. • t r a t i o = 0.3; During the endgame, samples are taken for t in a geometric series where £& = tratio*£/c_i. The value 0.3 is a compromise between the need to spread the samples out for a well-conditioned fit (tratio smaller) and the need to stay away from t = 0, where the path may be singular. • eps_end = le-10; The criterion for deciding when the endpoint estimate has converged. When two successive estimates agree to this tolerance, success is declared. • CycleMax = 4; This is the maximum winding number tested by the powerseries endgame. In double precision, the endgame is rarely successful above winding number c = 4. • maxerrup = 10; The endgame keeps a record of the smallest change in the endpoint estimate in successive iterations. (This is compared to eps_end for declaring success.) Usually, this measure improves with each successive iteration, unless the path gets too close to t = 0 before converging. However, in the early stages of the endgame, the convergence measure can sometimes increase briefly before entering the endgame operating zone. If there are more than maxerrup successive iterations without improving on
HomLab ,User's Guide
375
the best iteration, the path is stopped. • allowjump = 1; When nonzero, this flag allows the endgame to predictcorrect across the origin in s, where s = t1//<2 is the un-wound path variable. This allows the endgame to sample on both sides of s = 0 to estimate the value of the endpoint by seventh-order interpolation. If allowjump=O, samples are only taken for s > 0, and the endpoint is estimated using cubic extrapolation to s — 0. C.7.2
Verbose Mode
By declaring global verbose and setting verbose=l;, the user will cause endgamer to print out its progress during the endgame for each path. This allows one to see how well the endgame is performing. Usually this is not of great interest, but if one is running a huge problem, it may be worth monitoring a small sample of paths and tuning the control settings for greater efficiency. It is also a useful way of confirming that all is working well: if superlinear convergence is obtained in the endgame, it is a strong indicator that everything is in good order. The five columns of information printed in verbose mode are: • the t value of the current endgame sample, • the difference between the last two endpoint estimates (maximum absolute value of the difference in any variable), • the current best guess for the winding number c, • the status of the endgame, which is the number of samples involved in estimating the endpoint. Each sample includes derivative information. "1" means there is only one sample, so the estimate will be done by linear extrapolation. "2" means two samples, so cubic extrapolation is available. "3" means an additional sample has been acquired on the other side of s = 0, but cubic extrapolation is still used. "4" means there are two samples on each side, so seventh-order interpolation is used for the estimate. • the fifth column is the number of successive iterations that have not improved on the best estimate. C.7.3
Path Statistics
The main tool for interpreting the results is to examine the s t a t s matrix. It has one column per path and six rows. The rows are as follows. • s t a t s (1,:) The value of t at which the endgame gave its best estimate. • s t a t s (2,:) The convergence estimate at the endpoint, which is the maximum absolute value of the difference in any variable between two successive endpoint estimates. • s t a t s (3,:) The function residual, that is the maximum absolute value of any entry in h(x*,0) for the endpoint estimate x*.
376
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
• s t a t s (4, :) The estimated winding number. If the true winding number is higher than CycleMax, this will typically result in a best guess of c=CycleMax in this place. • stats (5,:) Condition number of the Jacobian matrix |^ (x*, 0) at the endpoint estimate x*. • stats (6,:) The total number of function evaluations used in computing this path. For large runs, it is tedious to examine stats by looking at the raw numbers. It is much easier to look at histograms and other types of summary statistics. Any endpoint that does not have a small function residual, s t a t s ( 3 , :), has failed in a serious way; quite likely the path tracker stopped with a large value of t in s t a t s ( l , : ) . A histogram plot of Iogl0(stats(3,:)) gives a quick check whether all the paths have ended in at least an approximate solution. If the endpoint is singular, its function residual can be small while it is still relatively far from the true endpoint. Check loglO (stats (2,:)) to see the endpoint convergence measure. One hopes that all endpoints are computed to the desired accuracy, eps_end, but some may not, especially if they are singularities with cycle number of 4 or greater. If the accuracy is only moderate, say better than 10~6 but not at the 10~10 ones desires, check if the condition number is at least moderately high, say 108 or greater. This would indicate that the root really is singular and failed to be computed accurately for that reason. Depending on one's purpose, that may be enough. When the singular endgame works well, the endpoint accuracy will be better than 10~10 and the condition number of a singular point will be greater than 1010 often as high as 1016 or more. C.8
Solutions at Infinity and Dehomogenization
When the solutions are computed in homogeneous coordinates or multihomogeneous coordinates, they can be scaled in each projective factor. Usually the original formulation is in Cn and it has been recast in P", or in a cross product of projective spaces, by introducing one or more homogenizing coordinates. Solutions at infinity are indicated by a small homogenizing coordinate, or if multihomogenized, by at least one homogenizing coordinate being near zero. Here, being near zero means, typically, being of the same magnitude as the convergence estimate in s t a t s ( 2 , : ) . If the homogenizing coordinate is in row k of xsoln, then a histogram of absdoglO(xsoln(k,:))) can be very revealing. For a finite solution, we wish to rescale to make the homogenizing coordinate(s) equal to one. Subroutine dehomog does this. The short form is x=dehomog(xsoln,epsO) ; where epsO is the magnitude of the homogenizing coordinate below which a solu-
HomLab User's Guide
377
tion is declared to be at infinity. This form assumes that the solutions are onehomogenized and that the homogenizing coordinate is the last entry. Any solution determined to be at infinity is rescaled by its largest element, while a finite one is rescaled by the homogenizing coordinate. This is usually what one wants, but the result can be a bit surprising if epsO is made too small so that a poorly computed solution at infinity gets erroneously rescaled as if it were finite. A more elaborate form must be used for multihomogenized solutions: x=dehomog(xsoln,espO,HomStruct,homvar); where HomStruct identifies the membership in the various homogeneous groupings, and homvar is a list of the row number for each homogenizing variable. If homvar is missing, the last variable of each group is assumed by default to be the homogenizing variable for that group. In either the short form or the long form, dehomog sets one variable of each homogeneous group to one.
Bibliography
Abhyankar, S. S. (1990). Algebraic geometry for scientists and engineers, Vol. 35 of Mathematical Surveys and Monographs. Providence, RI: American Mathematical Society. Alefeld, G., k Herzberger, J. (1983). Introduction to interval computations. Computer Science and Applied Mathematics. New York: Academic Press Inc. [Harcourt Brace Jovanovich Publishers]. Translated from the German by Jon Rokne. Allgower, E. L., Erdmann, M., k Georg, K. (2002). On the complexity of exclusion algorithms for optimization. J. Complexity, 18(2), 573-588. Algorithms and complexity for continuous problems/Algorithms, computational complexity, and models of computation for nonlinear and multivariate problems (Dagstuhl/South Hadley, MA, 2000). Allgower, E. L., k Georg, K. (1993). Continuation and path following. In Ada numerica, Vol. 2 (pp. 1-64). Cambridge: Cambridge Univ. Press. Allgower, E. L., k Georg, K. (1997). Numerical path following. In Handbook of numerical analysis, Vol. V (pp. 3-207). Amsterdam: North-Holland. Allgower, E. L., k Georg, K. (2003). Introduction to numerical continuation methods, Vol. 45 of Classics in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). Reprint of the 1990 edition [SpringerVerlag, Berlin]. Allgower, E. L., Georg, K., k Miranda, R. (1992). The method of resultants for computing real solutions of polynomial systems. SIAM J. Numer. Anal., 29(3), 831-844. Allgower, E. L., k Sommese, A. J. (2002). Piecewise linear approximation of smooth compact fibers. J. Complexity, 18(2), 547-556. Algorithms and complexity for continuous problems/Algorithms, computational complexity, and models of computation for nonlinear and multivariate problems (Dagstuhl/South Hadley, MA, 2000). Alt, H. (1923). Uber die Erzeugung gegebener ebener Kurven mit Hilfe des Gelenkvierecks. Zeitschrift fur Angewandte Mathematik und Mechanik, 3(1), 13-19. Arbarello, E., Cornalba, M., Griffiths, P. A., & Harris, J. (1985). Geometry of algebraic curves. Vol. I, Vol. 267 of Grundlehren der Mathematischen Wissenschaften 379
380
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
[Fundamental Principles of Mathematical Sciences]. New York: Springer-Verlag. Auzinger, W., & Stetter, H. J. (1988). An elimination algorithm for the computation of all zeros of a system of multivariate polynomial equations. In Numerical mathematics, Singapore 1988, Vol. 86 of Internat. Schriftenreihe Numer. Math. (pp. 11-30). Basel: Birkhauser. Bates, D., Peterson, C , & Sommese, A. J. (2005a). A numerical-symbolic algorithm for computing the multiplicity of a component of an algebraic set. in preparation. Bates, D., Sommese, A. J., & Wampler, C. W. (2005b). Multiprecision endgames for homotopy continuation, in preparation. Beltrametti, M. C , Howard, A., Schneider, M., &, Sommese, A. J. (2000). Projections from subvarieties. In Complex analysis and algebraic geometry (pp. 71-107). Berlin: de Gruyter. Beltrametti, M. C , & Sommese, A. J. (1995). The adjunction theory of complex protective varieties, Vol. 16 of de Gruyter Expositions in Mathematics. Berlin: Walter de Gruyter & Co. Bernstein, D. N. (1975). The number of roots of a system of equations. Functional Anal. Appl., 9(3), 183-185. Translated from Funktsional. Anal, i Prilozhen 9(3):1-4,1975. Borel, A. (1969). Linear algebraic groups. Notes taken by H. Bass. W. A. Benjamin, Inc., New York-Amsterdam. Bottema, O., h Roth, B. (1979). Theoretical kinematics, Vol. 24 of North-Holland Series in Applied Mathematics and Mechanics. Amsterdam: North-Holland Publishing Co. Burmester, L. E. H. (1888). Lehrbuch der Kinematik. Leipzig A. Felix. Calabri, A., & Ciliberto, C. (2001). On special projections of varieties: epitome to a theorem of Beniamino Segre. Adv. Geom., 1(1), 97-106. Canny, J. (1990). Generalised characteristic polynomials. J. Symbolic Comput., 9, 241-250. Canny, J., & Manocha, D. (1993). Multipolynomial resultant algorithms. J. Symbolic Comput., 15, 99-122. Canny, J., & Rojas, J. M. (1991). An optimal condition for determining the exact number of roots of a polynomial system. Proceedings of the 1991 International Symposium on Symbolic and Algebraic Computation (pp. 96-101). ACM, New York. Chablat, D., Wenger, P., Majou, R., & Merlet, J.-P. (2004). An interval based study for the design and the comparison of three-degrees-of-freedom parallel kinematic machines. Int. J. Robotics Research, 23(6), 615-624. Chen, N. X., & Song, S.-M. (1994). Direct position analysis of the 4-6 Stewart platform. ASME J. Mech. Design, 116(1), 61-66. Chow, S. N., Mallet-Paret, J., & Yorke, J. A. (1979). A homotopy method for locating all zeros of a system of polynomials. In Functional differential equations and approximation of fixed points (proc. summer school and conf, univ. bonn,
Bibliography
381
bonn, 1978), Vol. 730 of Lecture Notes in Math. (pp. 77-88). Berlin: Springer. Chu, M. T., Li, T.-Y., & Sauer, T. (1988). Homotopy method for general A-matrix problems. SI AM J. Matrix Anal. AppL, 9(4), 528-536. Cox, D., Little, J., & O'Shea, D. (1997). Ideals, varieties, and algorithms. Undergraduate Texts in Mathematics. New York: Springer-Verlag, second edition. An introduction to computational algebraic geometry and commutative algebra. Cox, D., Little, J., & O'Shea, D. (1998). Using algebraic geometry, Vol. 185 of Graduate Texts in Mathematics. New York: Springer-Verlag. D'Andrea, C , & Emiris, I. Z. (2003). Sparse resultant perturbations. In Algebra, geometry, and software systems (pp. 93-107). Berlin: Springer. Datta, R. S. (2003). Using computer algebra to find Nash equilibria. Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation (pp. 74-79). New York: ACM. Davidenko, D. F. (1953a). On a new method of numerical solution of systems of nonlinear equations. Doklady Akad. Nauk SSSR (N.S.), 88, 601-602. Davidenko, D. F. (1953b). On approximate solution of systems of nonlinear equations. Ukrain. Mat. Zurnal, 5, 196-206. Davis, P. J. (1975). Interpolation and approximation. New York: Dover Publications Inc. Republication, with minor corrections, of the 1963 original, with a new preface and bibliography. Decker, W., Greuel, G.-M., & Pfister, G. (1999). Primary decomposition: algorithms and comparisons. In Algorithmic algebra and number theory (Heidelberg, 1997) (pp. 187-220). Berlin: Springer. Decker, W., & Schreyer, F.-O. (2001). Computational algebraic geometry today. In Applications of algebraic geometry to coding theory, physics and computation (Eilat, 2001), Vol. 36 of NATO Sci. Ser. II Math. Phys. Chem. (pp. 65-119). Dordrecht: Kluwer Acad. Publ. Decker, W., & Schreyer, F.-O. (2005). Solving polynomial equations: Foundations, algorithms, and applications, to appear. Denavit, J., & Hartenberg, R. S. (1955). A kinematic notation for lower pair mechanisms based on matrices. J. Appl. Mechanics, 22, 215-221. Trans. ASME, vol. 77. Dhingra, A., Kohli, D., & Xu, Y. X. (1992). Direct kinematic of general Stewart platforms. DE-Vol. 45, Robotics, Spatial Mechanisms, and Mechanical Systems (pp. 107-112). ASME. Dian, J., & Kearfott, R. B. (2003). Existence verification for singular and nonsmooth zeros of real nonlinear systems. Math. Comp., 72(242), 757-766. Dickenstein, A., & Emiris, I. Z. (Eds.), (preprint). Solving polynomial equations: Foundations, algorithms, and applications. Berlin Heidelberg New York: Springer-Verlag. Dietmaier, P. (1998). The Stewart-Gough platform of general geometry can have 40 real postures. In J. Lenarcic, & M. L. Husty (Eds.), Advances in robot kinematics:
382
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Analysis and control (pp. 1-10). Dordrecht: Kluwer Academic Publishers. Dixon, A. L. (1909). The eliminant of three quantics in two independent variables. Proc. London Math. Soc, 2(7), 49-69. Drexler, F. J. (1977). Eine Methode zur Berechnung samtlicher Losungen von Polynomgleichungssystemen. Numer. Math., 29(1), 45-58. Drexler, F. J. (1978). A homotopy method for the calculation of all zeros of zerodimensional polynomial ideals. In Developments in statistics, vol. 1 (pp. 69-93). New York: Academic Press. Duffy, J., & Crane, C. (1980). A displacement analysis of the general spatial 7-link, 7R mechanism. Mechanism Machine Theory, i5(3-A), 153-169. Eisenbud, D. (1995). Commutative Algebra with a view toward algebraic geometry, Vol. 150 of Graduate Texts in Mathematics. New York: Springer-Verlag. Emiris, I. Z. (1994). Sparse elimination and applications in kinematics. PhD thesis, Computer Science Division, Dept. of Electrical Engineering and Computer Science, University of California, Berkeley. Emiris, I. Z. (1995). A general solver based on sparse resultants. Proc. PoSSo (Polynomial System Solving) Workshop on Software (pp. 35-54). Paris. Emiris, I. Z. (2003). Discrete geometry for algebraic elimination. In Algebra, geometry, and software systems (pp. 77-91). Berlin: Springer. Faugere, J. C , & Lazard, D. (1995). The combinatorial classes of parallel manipulators. Mechanism Machine Theory, 30(6), 765-776. Feinberg, M. (1980). Chemical oscillations, multiple equilibria, and reaction network structure. In W. E. Stewart (Ed.), Dynamics and modelling of reactive systems (pp. 59-130). Academic Press, Inc. Fischer, G. (1976). Complex analytic geometry. Berlin: Springer-Verlag. Lecture Notes in Mathematics, Vol. 538. Fischer, G. (2001). Plane algebraic curves, Vol. 15 of Student Mathematical Library. Providence, RI: American Mathematical Society. Translated from the 1994 German original by Leslie Kay. Freudenstein, F., & Roth, B. (1963). Numerical solution of systems of nonlinear equations. J. ACM, 10(4), 550-556. Frisch, J. (1967). Points de platitude d'un morphisme d'espaces analytiques complexes. Invent. Math., 4, 118-138. Fritzsche, K., & Grauert, H. (2002). From holomorphic functions to complex manifolds, Vol. 213 of Graduate Texts in Mathematics. New York: Springer-Verlag. Fulton, W. (1998). Intersection theory, Vol. 2 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics]. Berlin: Springer-Verlag, second edition. Gao, T., & Li, T.-Y. (2000). Mixed volume computation via linear programming. Taiwanese J. Math., 4(4), 599-619. Gao, T., & Li, T.-Y. (2003). Mixed volume computation for semi-mixed systems.
Bibliography
383
Discrete Comput. Geom., 29(2), 257-277. Gao, T., Li, T.-Y., Verschelde, J., & Wu, M. (2000). Balancing the lifting values to improve the numerical stability of polyhedral homotopy continuation methods. Appl. Math. Comput, 114(2-3), 233-247. Gao, T., Li, T.-Y., & Wang, X. (1999). Finding all isolated zeros of polynomial systems in C" via stable mixed volumes. J. Symbolic Comput., 28(1-2), 187-211. Polynomial elimination—algorithms and applications. Garcia, C. B., & Zangwill, W. I. (1979). Finding all solutions to polynomial systems and other systems of equations. Math. Programming, 16(2), 159-176. Garcia, C. B., & Zangwill, W. I. (1980). Global continuation methods for finding all solutions to polynomial systems of equations in n variables. In Extremal methods and systems analysis (Interned. Sympos., Univ. Texas, Austin, Tex., 1977), Vol. 174 of Lecture Notes in Econom. and Math. Systems (pp. 481-497). Berlin: Springer. Gelfand, I., Kapranov, M., & Zelevinsky, A. (1994). Discriminants, resultants and multidimensional determinants. Boston: Birkhauser. Georg, K. (2001). Improving the efficiency of exclusion algorithms. Adv. Geom., 1(2), 193-210. Georg, K. (2003). A new exclusion test. J. Comput. Appl. Math., 152(1-2), 147160. Proceedings of the International Conference on Recent Advances in Computational Mathematics (ICRACM 2001) (Matsuyama). Giusti, M., Hagele, K., Lecerf, G., Marchand, J., & Salvy, B. (2000). The projective Noether Maple package: computing the dimension of a projective variety. J. Symbolic Comput, 30(3), 291-307. Goedecker, S. (1994). Remark on algorithms to find roots of polynomials. SI AM J. Sci. Comput, 15(5), 1059-1063. Goresky, M., & MacPherson, R. (1988). Stratified Morse theory, Vol. 14 of Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)]. Berlin: Springer-Verlag. Greuel, G.-M. (2000). Computer algebra and algebraic geometry—achievements and perspectives. J. Symbolic Comput, 30(3), 253-289. Greuel, G.-M., & Pfister, G. (2002). A singular introduction to commutative algebra. Berlin: Springer-Verlag. With contributions by O. Bachmann, C. Lossen and H. Schonemann, With 1 CD-ROM (Windows, Macintosh, and UNIX). Griewank, A., & Osborne, M. R. (1983). Analysis of Newton's method at irregular singularities. SIAM J. Numer. Anal, 20(4), 747-773. Griffis, M., & Duffy, J. (1993). Method and apparatus for controlling geometrically simple parallel mechanisms with distinctive connections. US Patent 5,179,525. Griffths, P. A., & Harris, J. (1994). Principles of algebraic geometry. Wiley Classics Library. New York: John Wiley & Sons Inc. Reprint of the 1978 original. Gunning, R. C. (1970). Lectures on complex analytic varieties: The local parametrization theorem. Mathematical Notes. Princeton, N.J.: Princeton University
384
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Press. Gunning, R. C. (1990). Introduction to holomorphic functions of several variables. Vol. II. The Wadsworth & Brooks/Cole Mathematics Series. Monterey, CA: Wadsworth k Brooks/Cole Advanced Books & Software. Local theory. Gunning, R. C, k Rossi, H. (1965). Analytic functions of several complex variables. Englewood Cliffs, N.J.: Prentice-Hall Inc. Hamming, R. W. (1986). Numerical methods for scientists and engineers. New York: Dover Publications Inc., second edition. Harris, J. (1995). Algebraic geometry, Vol. 133 of Graduate Texts in Mathematics. New York: Springer-Verlag. A first course, Corrected reprint of the 1992 original. Hartenberg, R. S., & Denavit, J. (1964). Kinematic synthesis of linkages. McGrawHill, N.Y. Hartshorne, R. (1977). Algebraic geometry. New York: Springer-Verlag. Graduate Texts in Mathematics, No. 52. Hille, E. (1959). Analytic function theory. Vol. 1. Introduction to Higher Mathematics. Ginn and Company, Boston. Hille, E. (1962). Analytic function theory. Vol. II. Introductions to Higher Mathematics. Ginn and Co., Boston, Mass.-New York-Toronto, Ont. Hodge, W. V. D., k Pedoe, D. (1994a). Methods of algebraic geometry. Vol. I. Cambridge Mathematical Library. Cambridge: Cambridge University Press. Book I: Algebraic preliminaries, Book II: Projective space, Reprint of the 1947 original. Hodge, W. V. D., k Pedoe, D. (1994b). Methods of algebraic geometry. Vol. II. Cambridge Mathematical Library. Cambridge: Cambridge University Press. Book III: General theory of algebraic varieties in projective space, Book IV: Quadrics and Grassmann varieties, Reprint of the 1952 original. Hodge, W. V. D., k Pedoe, D. (1994c). Methods of algebraic geometry. Vol. III. Cambridge Mathematical Library. Cambridge: Cambridge University Press. Book V: Birational geometry, Reprint of the 1954 original. Ho§ten, S., k Shapiro, J. (2000). Primary decomposition of lattice basis ideals. J. Symbolic Comput., 29(4-5), 625-639. Symbolic computation in algebra, analysis, and geometry (Berkeley, CA, 1998). Huang, Y., Wu, W., Stetter, H. J., k Zhi, L. (2000). Pseudofactors of multivariate polynomials. Proceedings of the 2000 International Symposium on Symbolic and Algebraic Computation (St. Andrews) (pp. 161-168). New York: ACM. Huber, B., Sottile, F., k Sturmfels, B. (1998). Numerical Schubert calculus. J. Symbolic Comput., 26(6), 767-788. Symbolic numeric algebra for polynomials. Huber, B., k Sturmfels, B. (1995). A polyhedral method for solving sparse polynomial systems. Math. Comp., 64(212), 1541-1555. Huber, B., k Sturmfels, B. (1997). Bernstein's theorem in affine space. Discrete Comput. Geom., 17(2), 137-141. Huber, B., k Verschelde, J. (1998). Polyhedral end games for polynomial continuation. Numer. Algorithms, 18(1), 91-108.
Bibliography
385
Huber, B., & Verschelde, J. (2000). Pieri homotopies for problems in enumerative geometry applied to pole placement in linear systems control. SIAM J. Control Optim., 38(4), 1265-1287. Husty, M. L. (1996). An algorithm for solving the direct kinematics of general Stewart-Gough platforms. Mechanism Machine Theory, 31(4), 365-380. Husty, M. L., & Karger, A. (2000). Self-motions of Griffis-Duffy type parallel manipulators. Proceedings of the 2000 IEEE Int. Conf. Robotics and Automation, CDROM, San Francisco, CA, April 24-28, 2000. IEEE. Iitaka, S. (1982). Algebraic geometry, Vol. 76 of Graduate Texts in Mathematics. New York: Springer-Verlag. An introduction to birational geometry of algebraic varieties, North-Holland Mathematical Library, 24. Innocenti, C. (1995). Polynomial solution to the position analysis of the 7-link Assur kinematic chain with one quaternary link. Mechanism Machine Theory, 30(8), 1295-1303. Isaacson, E., & Keller, H. B. (1994). Analysis of numerical methods. New York: Dover Publications Inc. Corrected reprint of the 1966 original [Wiley, New York]. Kearfott, R. B. (1996). Rigorous global search: continuous problems, Vol. 13 of Nonconvex Optimization and its Applications. Dordrecht: Kluwer Academic Publishers. Kearfott, R. B. (1997). Empirical evaluation of innovations in interval branch and bound algorithms for nonlinear systems. SIAM J. Sci. Comp., 18(2), 574-594. Kearfott, R. B., & Novoa, M. (1990). Algorithm 681: INTBIS, a portable interval Newton/bisection package. ACM Trans. Math. Softw., 16(2), 152-157. Kearfott, R. B., & Xing, Z. (1994). An interval step control for continuation methods. SIAM J. Numer. Anal., 31(3), 892-914. Keller, H. B. (1981). Geometrically isolated nonisolated solutions and their approximation. SIAM J. Numer. Anal., 18(5), 822-838. Kendig, K. (1977). Elementary algebraic geometry. New York: Springer-Verlag. Graduate Texts in Mathematics, No. 44. Khovanski, A. G. (1978). Newton polyhedra, and the genus of complete intersections. Funktsional. Anal, i Prilozhen., 12(1), 51-61. Kleiman, S. L. (1986). Tangency and duality. Proceedings of the 1984 Vancouver conference in algebraic geometry, Vol. 6 of CMS Conf. Proc. (pp. 163-225). Providence, RI: Amer. Math. Soc. Knuth, D. E. (1981). The art of computer programming. Vol. 2. Addison-Wesley Publishing Co., Reading, Mass., second edition. Seminumerical algorithms, Addison-Wesley Series in Computer Science and Information Processing. Krick, T. (2004). Straight-line programs in polynomial equation solving. In F. Cucker, R. DeVore, P. Olver, & E. Siili (Eds.), Foundations of computational mathematics, Minneapolis 2002. Cambridge University Press. Kuo, Y.-C, Li, T.-Y., & Wu, D. (2004). Determining whether a numerical solution of a polynomial system is isolated, preprint.
386
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Kushnirenko, A. G. (1976). Newton polytopes and the Bezout theorem. Funktsional. Anal, i Prilozhen., 10(3), 82-83. Lazard, D. (1993). On the representation of rigid-body motions and its application to generalized platform manipulators. In J. Angeles, P. Kovacs, & G. Hommel (Eds.), Computational kinematics (pp. 175—182). Kluwer. Lecerf, G. (2001). Une alternative aux methodes de reecriture pour la resolution des system.es algebriques. PhD thesis, Ecole Polytechnique. Lecerf, G. (2002). Quadratic Newton iteration for systems with multiplicity. Found. Comput. Math., 2(3), 247-293. Lee, H.-Y., & Liang, C.-G. (1988). Displacement analysis of the general spatial 7-link 7R mechanism. Mechanism Machine Theory, 23(3), 219-226. Leykin, A., Verschelde, J., & Zhao, A. (2004). Newton's method with deflation for isolated singularities of polynomial systems, preprint. Li, T.-Y. (1983). On Chow, Mallet-Paret and Yorke homotopy for solving system of polynomials. Bull. Inst. Math. Acad. Sinica, 11(3), 433-437. Li, T.-Y. (1993). Solving polynomial systems by homotopy continuation methods. In Computer mathematics (Tianjin, 1991), Vol. 5 of Nankai Ser. Pure Appl. Math. Theoret. Phys. (pp. 18-35). River Edge, NJ: World Sci. Publishing. Li, T.-Y. (1997). Numerical solution of multivariate polynomial systems by homotopy continuation methods. In Ada numerica, Vol. 6 (pp. 399-436). Cambridge: Cambridge Univ. Press. Li, T.-Y. (1999). Solving polynomial systems by polyhedral homotopies. Taiwanese J. Math., 3(3), 251-279. Li, T.-Y. (2003). Numerical solution of polynomial systems by homotopy continuation methods. In Handbook of numerical analysis, Vol. XI (pp. 209-304). Amsterdam: North-Holland. Li, T.-Y., & Li, X. (2001). Finding mixed cells in the mixed volume computation. Found. Comput. Math., 1(2), 161-181. Li, T.-Y., & Sauer, T. (1987a). Homotopy method for generalized eigenvalue problems Ax = XBx. Linear Algebra Appl., 91, 65-74. Li, T.-Y., & Sauer, T. (1987b). Regularity results for solving systems of polynomials by homotopy method. Numer. Math., 50(3), 283-289. Li, T.-Y., &; Sauer, T. (1989). A simple homotopy for solving deficient polynomial systems. Japan J. Appl. Math., 6(3), 409-419. Li, T.-Y., Sauer, T., & Yorke, J. A. (1987a). Numerical solution of a class of deficient polynomial systems. SIAM J. Numer. Anal, 24(2), 435-451. Li, T.-Y., Sauer, T., & Yorke, J. A. (1987b). The random product homotopy and deficient polynomial systems. Numer. Math., 51(5), 481-500. Li, T.-Y., Sauer, T., & Yorke, J. A. (1988). Numerically determining solutions of systems of polynomial equations. Bull. Amer. Math. Soc. (N.S.), 18(2), 173-177. Li, T.-Y., Sauer, T., & Yorke, J. A. (1989). The cheater's homotopy: an efficient procedure for solving systems of polynomial equations. SIAM J. Numer. Anal.,
Bibliography
387
26(5), 1241-1251. Li, T. Y., Wang, T., & Wang, X. (1996). Random product homotopy with minimal BKK bound. In The mathematics of numerical analysis (Park City, UT, 1995), Vol. 32 of Lectures in Appl. Math. (pp. 503-512). Providence, RI: Amer. Math. Soc. Li, T.-Y., & Wang, X. (1991). Solving deficient polynomial systems with homotopies which keep the subschemes at infinity invariant. Math. Comp., 56(194), 693-710. Li, T.-Y., & Wang, X. (1992). Nonlinear homotopies for solving deficient polynomial systems with parameters. SIAM J. Numer. Anal, 29(4), 1104-1118. Li, T.-Y., & Wang, X. (1996). The BKK root count in C". Math. Comp., 65(216), 1477-1484. Li, T.-Y., & Zheng, Z. (2004). A rank-revealing method and its applications. preprint. Lipman, J. (1975). Introduction to resolution of singularities. In Algebraic geometry (Proc. Sympos. Pure Math., Vol. 29, Humboldt State Univ., Arcata, Calif., 1974) (pp. 187-230). Providence, R.I.: Amer. Math. Soc. Lo Cascio, M. L., Pasquini, L., & Trigiante, D. (1989). Simultaneous determination of polynomial roots and multiplicities: an algorithm and related problems. Ricerche Mat, 38(2), 283-305. Losch, S. (1995). Parallel redundant manipulators based on open and closed normal Assur chains. In J.-P. Merlet, & B. Ravani (Eds.), Computational kinematics '95, Proceedings of the Second Workshop held in Sophia Antipolis, September 4-6, 1995, Vol. 40 of Solid Mechanics and its Applications (pp. x+310). Dordrecht: Kluwer Academic Publishers Group. Lu, Y., Sommese, A. J., & Wampler, C. W. (2005). Finding all real solutions of polynomial systems: I the curve case, in preparation. Macaulay, F. (1902). On some formulas in elimination. Proc. London Math. Soc, 3, 3-27. Manocha, D. (1993). Efficient algorithms for multipolynomial resultant. The Computer Journal, 36, 485-496. Manocha, D. (1994). Solving systems of polynomial equations. IEEE Comput. Graph. Appl, 36, 46-55. Manocha, D., & Canny, J. F. (1994). Efficient inverse kinematics for general 6R manipulators. IEEE Trans. Rob. Auto., 10(5), 648-657. Manseur, R., & Doty, K. (1989). A robot manipulator with 16 real inverse kinematic solution set. Int. J. Robotics Res., 8(5), 75-79. Marden, M. (1966). Geometry of polynomials. Second edition. Mathematical Surveys, No. 3. Providence, R.I.: American Mathematical Society. Mavroidis, C, & Roth, B. (1995a). Analysis of overconstrained mechanisms. ASME J. Mech. Design, 117, 69-74. Mavroidis, C, & Roth, B. (1995b). New and revised overconstrained mechanisms. ASME J. Mech. Design, 117, 75-82.
388
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Mayer St-Onge, B., &: Gosselin, C. M. (2000). Singularity analysis and representation of the general Gough-Stewart platform. Int. J. Robotics Research, 19, Li 1— ZOO.
Meintjes, K., & Morgan, A. P. (1987). A methodology for solving chemical equilibrium systems. Appl. Math. Com/put., 22, 333-361. Merlet, J.-P. (1989). Singular configurations of parallel manipulators and Grassmann geometry. Int. J. Robotics Research, 8, 45—56. Merlet, J.-P. (2000). Parallel robots. Kluwer Academic Publishers, Dordrecht, The Netherlands. Merlet, J.-P. (2001). A parser for the interval evaluation of analytical functions and its applications to engineering problems. J. Symbolic Computation, 31, 475-486. Mignotte, M., & Stefanescu, D. (1999). Polynomials. Springer Series in Discrete Mathematics and Theoretical Computer Science. Springer-Verlag, Singapore. An algorithmic approach. Milnor, J. W. (1965). Topology from the differentiate viewpoint. Based on notes by David W. Weaver. The University Press of Virginia, Charlottesville, Va. Moller, H. M. (1998). Grobner bases and numerical analysis. In Grobner bases and applications (Linz, 1998), Vol. 251 of London Math. Soc. Lecture Note Ser. (pp. 159-178). Cambridge: Cambridge Univ. Press. Moller, H. M., & Stetter, H. J. (1995). Multivariate polynomial equations with multiple zeros solved by matrix eigenproblems. Num. Math., 70, 311-329. Moore, R. E. (1979). Methods and applications of interval analysis, Vol. 2 of SIAM Studies in Applied Mathematics. Philadelphia, Pa.: Society for Industrial and Applied Mathematics (SIAM). Morgan, A. P. (1983). A method for computing all solutions to systems of polynomial equations. ACM Trans. Math. Software, 9(1), 1-17. Morgan, A. P. (1986a). A homotopy for solving polynomial systems. Appl. Math. Comput., 18(1), 87-92. Morgan, A. P. (1986b). A transformation to avoid solutions at infinity for polynomial systems. Appl. Math. Comput., 18(1), 77-86. Morgan, A. P. (1987). Solving polynomial systems using continuation for engineering and scientific problems. Prentice-Hall, Englewood Cliffs, N.J. Morgan, A. P., & Sommese, A. J. (1987a). A homotopy for solving general polynomial systems that respects m-homogeneous structures. Appl. Math. Comput., 101-113. Morgan, A. P., & Sommese, A. J. (1987b). Computing all solutions to polynomial systems using homotopy continuation. Appl. Math. Comput., 115-138. Errata: Appl. Math. Comput. 51 (1992), p. 209. Morgan, A. P., & Sommese, A. J. (1989). Coefficient-parameter polynomial continuation. Appl. Math. Comput, 29(2), 123-160. Errata: Appl. Math. Comput. 51:207(1992). Morgan, A. P., & Sommese, A. J. (1990). Generically nonsingular polynomial
Bibliography
389
continuation. In Computational solution of nonlinear systems of equations (Fort Collins, CO, 1988), Vol. 26 of Lectures in Appl. Math. (pp. 467-493). Providence, RI: Amer. Math. Soc. Morgan, A. P., Sommese, A. J., & Wampler, C. W. (1990). Polynomial continuation for mechanism design problems. In Computational solution of nonlinear systems of equations (Fort Collins, CO, 1988), Vol. 26 of Lectures in Appl. Math. (pp. 495-517). Providence, RI: Amer. Math. Soc. Morgan, A. P., Sommese, A. J., & Wampler, C. W. (1991). Computing singular solutions to nonlinear analytic systems. Numer. Math., 58(7), 669-684. Morgan, A. P., Sommese, A. J., & Wampler, C. W. (1992a). Computing singular solutions to polynomial systems. Adv. in Appl. Math., 13(3), 305-327. Morgan, A. P., Sommese, A. J., & Wampler, C W. (1992b). A power series method for computing singular solutions to nonlinear analytic systems. Numer. Math., 63(3), 391-409. Morgan, A. P., Sommese, A. J., & Wampler, C. W. (1995). A productdecomposition bound for Bezout numbers. SIAM J. Numer. Anal, 32(A), 13081325. Morgan, A. P., Sommese, A. J., & Watson, L. T. (1989). Finding all isolated solutions to polynomial systems using HOMPACK. A CM Trans. Math. Software, 15(2), 93-122. Morgan, A. P., & Wampler, C. W. (1990). Solving a planar four-bar design problem using continuation. ASME J. Mech. Design, 112, 544-550. Morgan, A. P., & Watson, L. T. (1987). Solving polynomial systems of equations on a hypercube. In Hypercube multiprocessors 1987 (Knoxville, TN, 1986) (pp. 501-511). Philadelphia, PA: SIAM. Morgan, A. P., & Watson, L. T. (1989). A globally convergent parallel algorithm for zeros of polynomial systems. Nonlinear Anal., 13(\1), 1339-1350. Mourrain, B. (1993, July). The 40 generic positions of a parallel robot. In M. Bronstein (Ed.), Proc. ISSAC'93 (Kiev) (pp. 173-182). ACM Press. Mourrain, B. (1996). Enumeration problems in geometry, robotics and vision. In Algorithms in algebraic geometry and applications (Santander, 1994), Vol. 143 of Progr. Math. (pp. 285-306). Basel: Birkhauser. Mourrain, B. (1998). Computing the isolated roots by matrix methods. J. Symbolic Comput., 26(6), 715-738. Symbolic numeric algebra for polynomials. Mumford, D. (1966). Lectures on curves on an algebraic surface. With a section by G. M. Bergman. Annals of Mathematics Studies, No. 59. Princeton, N.J.: Princeton University Press. Mumford, D. (1970). Varieties denned by quadratic equations. In E. Marchionna (Ed.), Questions on algebraic varieties (C.I.M.E., III Ciclo, Varenna, 1969) (pp. 29-100). Rome: Edizioni Cremonese. Mumford, D. (1995). Algebraic geometry. I. Classics in Mathematics. Berlin: Springer-Verlag. Complex projective varieties, Reprint of the 1976 edition.
390
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Mumford, D. (1999). The red book of varieties and schemes, Vol. 1358 of Lecture
Notes in Mathematics. Berlin: Springer-Verlag, expanded edition. Includes the Michigan lectures (1974) on curves and their Jacobians, With contributions by E. Arbarello. Nanua, P., Waldron, K. J., & Murthy, V. (1991). Direct kinematic solution of a Stewart platform. IEEE Trans, on Robotics and Automation, 6(4), 438-444. Neumaier, A. (1990). Interval methods for systems of equations, Vol. 37 of Encyclopedia of Mathematics and its Applications. Cambridge: Cambridge University Press. Nielsen, J., & Roth, B. (1999). Solving the input/output problem for planar mechanisms. ASME J. Mech. Design, 121(2), 206-211. Ojika, T. (1987). Modified deflation algorithm for the solution of singular problems. I. A system of nonlinear algebraic equations. J. Math. Anal. Appl., 123, 199-221. Ojika, T., Watanabe, S., & Mitsui, T. (1983). Deflation algorithm for the multiple roots of a system of nonlinear equations. J. Math. Anal. Appl., 96, 463-479. Pan, V. Y. (1997). Solving a polynomial equation: some history and recent progress. SI AM Rev., 39(2), 187-220. Pasquini, L., & Trigiante, D. (1985). A globally convergent method for simultaneously finding polynomial roots. Math. Comp., ^^(169), 135-149. Pernkopf, F., & Husty, M. L. (2002). Singularity analysis of spatial stewart-gough platforms with planar base and platform. Proc. ASME Design Eng. Tech. Conf, Montreal, Canada, Sept. 30~Oct. 2, 2002. Pieper, D. L. (1968). The kinematics of manipulators under computer control. PhD thesis, Computer Science Dept., Stanford University. Primrose, E. J. F. (1986). On the input-output equation of the general 7Rmechanism. Mechanism Machine Theory, 21(6), 509-510. Raghavan, M. (1991). The Stewart platform of general geometry has 40 configurations. Proc. ASME Design and Automation Conf, vol. 32-2 (pp. 397-402). ASME. Raghavan, M. (1993). The Stewart platform of general geometry has 40 configurations. ASME J. Mech. Design, 115, 277-282. Raghavan, M., & Roth, B. (1993). Inverse kinematics of the general 6R manipulator and related linkages. ASME J. Mech. Design, 115, 502-508. Raghavan, M., & Roth, B. (1995). Solving polynomial systems for the kinematic analysis and synthesis of mechanisms and robot manipulators. ASME J. Mech. Design, 117, 71-79. Roberts, S. (1875). On three-bar motion in plane space. Proc. London Math. Soc, VII, 14-23. Rojas, J. M. (1994). A convex geometric approach to counting the roots of a polynomial system. Theoret. Comput. Sci., 133(1), 105-140. Selected papers of the Workshop on Continuous Algorithms and Complexity (Barcelona, 1993). Rojas, J. M. (1999). Toric intersection theory for affine root counting. J. Pure Appl.
Bibliography
391
Algebra, 136(1), 67-100. Rojas, J. M., & Wang, X. (1996). Counting affine roots of polynomial systems via pointed Newton polytopes. J. Complexity, 12(2), 116-133. Ronga, F., & Vust, T. (1995). Stewart platforms without computer? In Real analytic and algebraic geometry (Trento, 1992) (pp. 197-212). Berlin: de Gruyter. Roth, B. (1962). A generalization of Burmester theory: Nine-point path generation of geared five-bar mechanisms with gear ratio plus and minus one. PhD thesis, Columbia University. Roth, B., & Freudenstein, F. (1963). Synthesis of path-generating mechanisms by numerical means. J. Eng. Industry, 298-306. Trans. ASME, vol. 85, Series B. Roth, B., Rastegar, J., & Scheinman, V. (1974). On the design of computer controlled manipulators. On the Theory and Practice of Robots and Manipulators: First CSIM-IFToMM Symposium (pp. 93-113). Springer-Verlag. Rump, S. M. (1999). INTLAB - INTerval LABoratory. In T. Csendes (Ed.), Developments in reliable computing, Proc. of (SCAN-98), Budapest, September 22-25, 1998 (pp. 77-104). Dordrecht: Kluwer Academic Publishers. Rupprecht, D. (2004). Semi-numerical absolute factorization of polynomials with integer coefficients. J. Symbolic Comput., 37(5), 557-574. Sasaki, T. (2001). Approximate multivariate polynomial factorization based on zero-sum relations. In B. Mourrain (Ed.), Proceedings of the 2001 international symposium on symbolic and algebraic computation (ISSAC 2001) (pp. 284-291). ACM. Schenck, H. (2003). Computational algebraic geometry, Vol. 58 of London Mathematical Society Student Texts. Cambridge: Cambridge University Press. Shiftman, B., & Sommese, A. J. (1985). Vanishing theorems on complex manifolds, Vol. 56 of Progress in Mathematics. Boston, MA: Birkhauser Boston Inc. Sommese, A. J., & Verschelde, J. (2000). Numerical homotopies to compute generic points on positive dimensional algebraic sets. J. Complexity, 16(3), 572-602. Complexity theory, real machines, and homotopy (Oxford, 1999). Sommese, A. J., Verschelde, J., & Wampler, C. W. (2001a). Numerical decomposition of the solution sets of polynomial systems into irreducible components. SIAM J. Numer. Anal., 38(6), 2022-2046. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2001b). Numerical irreducible decomposition using projections from points on the components. In Symbolic computation: solving equations in algebra, geometry, and engineering (South Hadley, MA, 2000), Vol. 286 of Contemp. Math. (pp. 37-51). Providence, RI: Amer. Math. Soc. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2001c). Using monodromy to decompose solution sets of polynomial systems into irreducible components. In Applications of algebraic geometry to coding theory, physics and computation (Eilat, 2001), Vol. 36 of NATO Sci. Ser. II Math. Phys. Chem. (pp. 297-315). Dordrecht: Kluwer Acad. Publ.
392
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Sommese, A. J., Verschelde, J., & Wampler, C. W. (2002a). A method for tracking singular paths with application to the numerical irreducible decomposition. In Algebraic geometry (pp. 329-345). Berlin: de Gruyter. Sommese, A. J., Verschelde, J., & Wampler, C W. (2002b). Symmetric functions applied to decomposing solution sets of polynomial systems. SIAM J. Numer. Anal, 40(6), 2026-2046. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2003). Numerical irreducible decomposition using PHCpack. In Algebra, geometry, and software systems (pp. 109-129). Berlin: Springer. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004a). Advances in polynomial continuation for solving problems in kinematics. ASME J. Mech. Design, 126(2), 262-268. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004b). Homotopies for intersecting solution components of polynomial systems. SIAM J. Numer. Anal., 42(4), 1552-1571. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004c). An intrinsic homotopy for intersecting algebraic varieties. J. Complexity, to appear. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004d). Numerical factorization of multivariate complex polynomials. Theoretical Computer Science, 315, 651— 669. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004e). Solving polynomial systems equation by equation, in preparation. Sommese, A. J., & Wampler, C. W. (1996). Numerical algebraic geometry. In The mathematics of numerical analysis (Park City, UT, 1995), Vol. 32 of Lectures in Appi Math. (pp. 749-763). Providence, RI: Amer. Math. Soc. Sosonkina, M., Watson, L. T., & Stewart, D. E. (1996). Note on the end game in homotopy zero curve tracking. ACM Trans. Math. Software, 22(3), 281-287. Sreenivasan, S. V., & Nanua, P. (1992). Solution of the direct position kinematics problem of the general Stewart platform using advanced polynomial continuation. DE-Vol. 45, Robotics, Spatial Mechanisms, and Mechanical Systems (pp. 99-106). ASME. Sreenivasan, S. V., Waldron, K. J., & Nanua, P. (1994). Closed-form direct displacement analysis of a 6-6 Stewart platform. Mechanism Machine Theory, 29(6), 855-864. Stetter, H. J. (2004). Numerical polynomial algebra. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). Stoer, J., & Bulirsch, R. (2002). Introduction to numerical analysis, Vol. 12 of Texts in Applied Mathematics. New York: Springer-Verlag, third edition. Translated from the German by R. Bartels, W. Gautschi and C. Witzgall. Sturmfels, B. (2002). Solving systems of polynomial equations, Vol. 97 of CBMS Regional Conference Series in Mathematics. Published for the Conference Board of the Mathematical Sciences, Washington, DC.
Bibliography
393
Sturmfels, B., & Zelevinsky, A. (1994). Multigraded resultants of Sylvester type. J. of Algebra, 163(1), 115-127. Su, H.-J., Wampler, C. W., & McCarthy, J. M. (2004). Geometric design of cylindric PRS serial chains. ASME J. Mech. Design, 126(2), 269-277. Tsai, L. W. (1999). Robot analysis: the mechanics of serial and parallel manipulators. New York: John Wiley & Sons Inc. Tsai, L. W., & Lu, J.-J. (1989). Coupler-point curve synthesis using homotopy methods. In B. Ravani (Ed.), Advances in Design Automation-1989: Mechanical Systems Analysis, Design and Simulation, Vol. DE-Vol. 19-3 (pp. 417-424). ASME. Tsai, L. W., & Morgan, A. P. (1985). Solving the kinematics of the most general six- and five-degree-of-freedom manipulators by continuation methods. ASME J. Mech., Trans., Auto. Design, 107, 48-57. van der Waerden, B. L. (1949). Modern Algebra. Vol. I. New York, N. Y.: Frederick Ungar Publishing Co. Translated from the second revised German edition by Fred Blum, With revisions and additions by the author. van der Waerden, B. L. (1950). Modern Algebra. Vol. II. New York, N. Y.: Frederick Ungar Publishing Co. Translated from the first German edition by Theodore Benac. Verschelde, J. (1996). Homotopy continuation methods for solving polynomial systems. PhD thesis, Katholieke Universiteit Leuven. Verschelde, J. (1999). Algorithm 795: PHCpack: A general-purpose solver for polynomial systems by homotopy continuation. A CM Trans, on Math. Software, 25(2), 251-276. Verschelde, J. (2000). Toric Newton method for polynomial homotopies. J. Symbolic Comput., £9(4-5), 777-793. Symbolic computation in algebra, analysis, and geometry (Berkeley, CA, 1998). Verschelde, J., & Cools, R. (1993). Symbolic homotopy construction. Appl. Algebra Engrg. Comm. Comput., ^(3), 169-183. Verschelde, J., Gatermann, K., & Cools, R. (1996). Mixed-volume computation by dynamic lifting applied to polynomial system solving. Discrete Comput. Geom., 16(1), 69-112. Verschelde, J., Verlinden, P., & Cools, R. (1994). Homotopies exploiting Newton polytopes for solving sparse polynomial systems. SI AM J. Numer. Anal., 31 (3), 915-930. Verschelde, J., & Wang, Y. (2004). Computing feedback laws for linear systems with a parallel Pieri homotopy. In Y. Yang (Ed.), Proceedings of 2004 International Conference on Parallel Processing Workshops, August 15-18, 2004 (PP- 222-229). IEEE. Walker, R. J. (1962). Algebraic curves. Dover, New York. Wampler, C. W. (1992). Bezout number calculations for multi-homogeneous polynomial systems. Appl. Math. Comput, 51(2-3), 143-157.
394
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Wampler, C. W. (1994). An efficient start system for multihomogeneous polynomial continuation. Numer. Math., 66(4), 517-523. Wampler, C. W. (1996a). Forward displacement analysis of general six-in-parallel SPS (Stewart) platform manipulators using soma coordinates. Mechanism Machine Theory, 31, 331-337. Wampler, C. W. (1996b). Isotropic coordinates, circularity and Bezout numbers: planar kinematics from a new perspective. In J. M. McCarthy (Ed.), Proceedings of the 1996 ASME Design Engineering Technical Conference, Irvine, California August 18-22, 1996. American Society of Mechanical Engineers, CD-ROM. Also available as GM Technical Report, Publication R&D-8188., 1996. Wampler, C. W. (1999). Solving the kinematics of planar mechanisms. ASME J. Mech. Design, 121, 387-391. Wampler, C. W. (2001). Solving the kinematics of planar mechanisms by Dixon determinant and a complex-plane formulation. ASME J. Mech. Design, 123(3), 382-387. Wampler, C. W. (2004). Displacement analysis of spherical mechanisms having three or fewer loops. ASME J. Mech. Design, 126(1), 93-100. Wampler, C. W., & Morgan, A. P. (1993). Solving the kinematics of general 6R manipulators using polynomial continuation. In Robotics: applied mathematics and computational aspects (Loughborough, 1989), Vol. 41 of Inst. Math. Appl. Conf. Ser. New Ser. (pp. 57-69). New York: Oxford Univ. Press. Wampler, C. W., Morgan, A. P., & Sommese, A. J. (1990). Numerical continuation methods for solving polynomial systems arising in kinematics. ASME J. Mech. Design, 112, 59-68. Wampler, C. W., Morgan, A. P., & Sommese, A. J. (1992). Complete solution of the nine-point path synthesis problem for four-bar linkages. ASME J. Mech. Design, 114, 153-159. Wampler, C. W., Morgan, A. P., & Sommese, A. J. (1997). Complete solution of the nine-point path synthesis problem for four-bar linkages - closure. ASME J. Mech. Design, 119, 150-152. Watson, L. T., Billups, S. C , k Morgan, A. P. (1987). Algorithm 652. HOMPACK: a suite of codes for globally convergent homotopy algorithms. ACM Trans. Math. Software, 13(3), 281-310. Watson, L. T., Sosonkina, M., Melville, R. C , Morgan, A. P., & Walker, H. F. (1997). Algorithm 777: HOMPACK90: a suite of Fortran 90 codes for globally convergent homotopy algorithms. ACM Trans. Math. Software, 23(4), 514-549. Weil, A. (1962). Foundations of algebraic geometry. Providence, R.I.: American Mathematical Society. Wilkinson, J. H. (1984). The perfidious polynomial. In Studies in numerical analysis, Vol. 24 of MAA Stud. Math. (pp. 1-28). Washington, DC: Math. Assoc. America. Wilkinson, J. H. (1994). Rounding errors in algebraic processes. New York: Dover
Bibliography
395
Publications Inc. Reprint of the 1963 original [Prentice-Hall, Englewood Cliffs, NJ]. Xu, Z.-B., Zhang, J.-S., & Wang, W. (1996). A cell exclusion algorithm for determining all the solutions of a nonlinear system of equations. Appl. Math. Comput., 80'(2-3), 181-208. Zhang, C.-D., & Song, S.-M. (1994). Forward position analysis of nearly general Stewart platform. ASME J. Mech. Design, 116(1), 54-60.
Index
Z_reg, 44, 215
#, xxii
Sing(X), 44
Sing(Z), 306
C*, xxii
(x,L), 328
A, xxii
P^N, 29
Gr(m,N), 325
V(f), 8
O_{P^1}(d), 342
\ (set minus), xxii
affine algebraic set, 43, 47, 56, 207, 209
affine hyperplane, 232
affine space, 209
affine variety, 215
algebraic function, 210, 212
algebraic map, 208, 210, 212, 219, 220
algebraic probability one, 50
algebraic set, 43, 44, 207, 209
  affine, see affine algebraic set
  constructible, see constructible set
  projective, see projective algebraic set
  quasiprojective, see quasiprojective set
algebraic set associated to f, 8
algebraic set of f, see algebraic set associated to f, 8
algorithm
  Inclusion, 252
  LocalDimen, 251
  Equal, 253
  IrrDecomp1, 268
  IrrDecomp2, 271
  IrrDecompPure, 270, 284
  JunkRemove, 269
  Local Dimension, 251
  LocalDimen, 251
  Member1, 268, 275
  Member2, 269, 276
  Monodromy, 269, 277
  Rank, 240
  TopDimen, 250
  Trace, 270, 284
  WitnessSuper, 247
  WitnessSupi, 245
  WitnessSupi(intrinsic), 246
algorithm for the rank of a system, 319
analysis, 163
analytic continuation, 278
analytic parameter spaces, 349
analytic Zariski open set, 350
base locus, 323
Bertini Theorems, 313, 323, 330-333
big system, 319
biholomorphic mapping, 301
biholomorphic to, 301
BKK bound, 139
body guidance, 163, 165
branched covering, 314
Buchberger's algorithm, 82
Burmester centers, 166
Burmester points, 166
cascade algorithm, 255, 259
Cauchy integral, 199
Cauchy integral endgame, 285
Cauchy integral method, 186, 187, 189
Cauchy's Lemma, 58
center of a projection, 213, 328
Chebychev polynomials, 65
chemical equilibria, 152-154, 170 Cauchy integral method, see Cauchy Chern class, 343 integral method Chevalley's Theorem, 222 cluster method, see trace method classical topology, see complex topology, power-series method, see power-series 211 method cluster method, 187 trace method, see trace method coefficient, 5 endgame convergence radius, 180 coefficient-parameter homotopy, 91 endgame operating zone, 179, 182, 183, coefficient-parameter theory, 92 185 compact affine set, 220 equation-by-equation, 292 companion matrix, 4 Exclusion method, 68 complex analytic set, 301 extension theorems, 302 complex analytic space, 302 extrinsic slicing, 234 complex dimension, see dimension, 306 complex manifolds, 302 finite affine set, 210 complex projective space, see projective finite map, 312 space first Chern class, 342 complex topology, 211, 300 five-point path synthesis, 166, 174 condition number, 198 four-bar analysis, 164 cone with vertex x, 320 four-bar equations, 163 constellation of algebraic sets, 331, 332 four-bar function generation, 173 constructible algebraic set, 207, 208 four-bar linkages, 169 constructible set, 208, 209, 221 four-bar synthesis, 162, 163 convex polytope, 138 four-body guidance, 174 corank of a polynomial system, 239 fractional power series, 180 corank of an algebraic system, 318 function generation, 162, 164 coupler curve, 162 Fundamental Theorem of Algebra, 55 covering map, 314 cuspidal cubic, 282 gamma trick, 18, 94, 95 general point, 44 deflation, 190-193, 195 generic, 45, 46 degree, 230 simply, 332 degree of a polynomial, 5 generic Bezout number, 346, 351 desingularization, 310 generic factorization, 316, 317 diagonal intersection, 289 generic line, 46 differentiable manifold, 301 generic linear change of coordinates, 213 dimension, 44, 207, 216, 306 generic linear projection, 213, 324 upper semicontinuity, see upper generic point, 44 semicontinuity of dimension generic projection, see generic linear dimension of a germ, 309 projection dimensional complex manifold, 302 generic root count, 346, 351 discriminant, 57 generic with respect to an algebraic set, disk, 58 233 Dixon determinant, 77 generically, 45 dominant map, 222, 311 genericity, 43 dual curve, 338 germ, 308 dimension of a, 309 elementary symmetric functions, 280 irreducible, 309 elimination methods, 72 germ of a complex analytic set, 308 endgame, 177 germ of an affine algebraic set, 308
linear projections, 212 linear slicing, 231 link length, 158 locally irreducible, 309 losing the endgame, 188 manifold, 301 manifold point, 44, 306 map finite, 312 proper, 212 maximum principle, 304, 311 membership test, 266 Minkowski sum, 139 mixed strategy, 150 mixed volume, 138, 140 monodromy, 275-277, 339, 348 monodromy action, 278 monomial, 5 Mount Everest of Kinematics, see six-revolute serial-link robots multidegree notation, xxi, 5, 301, 322 multihomogeneous polynomial, 35 multiplicity, 8, 209, 223, 224, 236 multiprojective space, 35 Nash equilibria, 149-151, 170 nested parameter homotopy, 101 Newton polytope, 138 Newton's method, 17, 18, 24, 71, 177, 182 Newton-Raphson method, 17 nine-point path synthesis problem, 112, 161, 167 Noether Normalization Theorem, 214, 336 nonreduced, 236 nonsolutions, 258 normal, 281, 303, 308 normal complex analytic space, 308 normalization, 189, 311 Nullstellensatz, 307 numerical algebraic geometry, vii, 227-229, 241 numerical elimination theory, 266 numerical irreducible decomposition, 228, 230, 231, 253, 265 overdetermined, 241 parameter homotopy, see coefficient-parameter homotopy
patch switching, 38 path generation, 162 path synthesis problems, 166 Plucker embedding, 326 point at infinity, 30 polyhedron, 138 polynomial system, 209 polytope, 138 polytope root count, 139 power-series endgame, 199, 285 power-series method, 183, 185, 186, 189, 194 precision-point methods, 163 primary decomposition, 216 probabilistic algorithm, 249 probability one, 43, 50, 313 probability-one methods, 207 projective transformation, 39 projective algebraic set, 34, 207, 217 projective line, 30 projective plane, 32 projective rank of an algebraic system, 319, 320, 330 projective set, 43 projective space, 27-30 projective transformation, 38, 40, 198 proper, 189 proper algebraic map, 212 proper map, 212 proper mapping theorem, 311 Puiseux's Theorem, 310 pure-dimensional, 216, 219
Riemann Bounded Extension Theorem, 303
quadric surface, 334 quasiprojective algebraic set, 44, 207, 208 quasiprojective set, see quasiprojective algebraic set, 219
sampling, 272, 273 Sard's Theorem, 313 secant variety, 335 section of a line bundle, 218 Segre embedding, 293, 334, 335 set of indeterminacy, 317 seven-bar structures, 172 simply generic, 332 singular path tracking, 273, 284 singular point, 44, 306 singular set, 307 six-revolute inverse position, 172 six-revolute serial-link robots, 156 slicing, 231 smooth immersion, 279 smooth point, 44, 306 solution sets, 8 spanned, 342 square system, 241 Stein factorization Theorem, 312 Stewart-Gough forward kinematics, 154 Stewart-Gough platform robots, ix, 101, 104-106, 108, 109, 111, 113-115, 154, 171 straight-line function, 6, 11, 12, 48, 70, 85, 362 submatrix, 332 Sylvester determinant, 56 Sylvester matrix, 65 Sylvester Resultant, 57 symmetric group, 340 synthesis, 163 synthesis problems, 164, 169 system of coordinates, 304
radical, 216 rank of a polynomial system, 228, 239, 240 rank of an algebraic system, 318, 319, 329 rational mapping, 317, 338 real dimension, 210, 306 reduced, 236 reduction to the diagonal, 290 regular point, 215, 306 Remmert-Stein Factorization Theorem see Stein Factorization Theorem, 312 resultant, 57, 73
topologically unibranch, 309 topology, 211 classical, see complex topology complex, see complex topology Zariski, see Zariski topology total degree of a polynomial, 5 trace, 187, 279-281 trace method, 187, 189 trace test, 279 trigonometric equations, 7 twist angle, 158
underdetermined, 241 universal field, 52 universal function, 323 universal system, 323 upper semicontinuity of dimension, 312 variety, 8 vector bundle, 341, 343 Veronese embedding, 334 Wilkinson polynomials, 11 winding number, 180, 182, 183 witness point superset, 244, 245, 255 witness set, see witness point set, 8, 229, 235 witness superset, 253, 256 Zariski closed set, 211 Zariski open set, 92, 211 Zariski topology, 211, 221