φ(t) : [0,1] → C^m and φ(t) ∉ Q* for t ∈ (0,1] include all the nonsingular roots of F(z; φ(0)) = 0.

Lemma 7.1.2  Let q_0 ∈ C^m be a point and let A ⊂ C^m be a proper algebraic subset not containing q_0. Then, for all q_1 ∈ C^m except those lying in a set of real dimension at most 2m − 1, the one-real-dimensional open line segment

    φ(t) := t q_1 + (1 − t) q_0,    t ∈ (0,1],

is contained in C^m \ A.

Proof.  Set A has complex dimension at most m − 1, so it has real dimension at most 2m − 2. Let B be the union of all real one-dimensional lines through q_0 and any point of A. B has real dimension at most 2m − 1, and so its complement in C^m has real dimension 2m. The set of points q_1 ∈ C^m that give a line segment satisfying the condition of the lemma includes all of C^m \ B.  •

Item 5 of Theorem 7.1.1 with Lemma 7.1.2 imply that for a given target set of parameters q_0, almost any starting set of parameters q_1 will give a homotopy

    F(z; t q_1 + (1 − t) q_0) = 0
(7.1.1)
whose solution paths include all the nonsingular solutions of F(z; q_0) = 0 at their endpoints as t goes from 1 to 0 on the real line. If somehow we can arrange to solve F(z; q_1) = 0 for a random, complex set of parameters q_1, we are ready to solve the
target system, because the one-real-dimensional open line segment of the homotopy is contained in C^m \ Q* with probability one. Suppose we have all the nonsingular solutions for only the particular system F(z; q_1) = 0, with N(q_1) = N. Even though q_1 is generic, it could happen that we wish to solve the system for a target q_0 for which the homotopy of Equation 7.1.1 fails. This means there is some relation between q_1 and q_0; for example, they might both be real with a degenerate point on the real line segment joining them. Referring to the proof of Lemma 7.1.2, we have that q_1 is not in the degenerate set, Q*, but it is in the set of points lying on a real straight line from q_0 to a point of Q*. When q_1 is generic, in the sense that N(q_1) = N, but not random complex independent of q_0, can we still formulate a homotopy to find all nonsingular solutions of F(z; q_0) = 0 with probability one? Yes: the answer is to follow a different continuation path, one that is not the real straight-line segment from q_1 to q_0 and that includes some extra parameter or parameters that can be chosen generically to avoid degeneracies. Here are three, among many, possibilities:

• Pick a third random, complex parameter point p ∈ C^m and follow the broken-line homotopy path from q_1 to p to q_0. Each of the two real straight-line segments will succeed with probability one, and so the concatenation of the two will succeed also.

• Pick p as in the previous item, and employ a curved-path homotopy such as

    F(z; t q_1 + t(1 − t) p + (1 − t) q_0) = 0.
(7.1.2)
By similar reasoning to Lemma 7.1.2, the endpoints at t = 0 of N paths from the nonsingular solutions of F(z; q_1) = 0 will include all the nonsingular solutions of F(z; q_0) = 0 for almost all choices of p ∈ C^m.

• Use the same homotopy as in Equation 7.1.1, but follow a more general path in the complex line defined by t, instead of just following the real segment [0,1]. A convenient way of doing so is to reparameterize the homotopy by τ ∈ [0,1], setting

    t = γτ / (1 + (γ − 1)τ),
for generic γ ∈ C. This maneuver is justified in the following lemma.

Lemma 7.1.3 ("Gamma Trick")  Fix a point q_0 ∈ C^m, a proper algebraic set A ⊂ C^m, and a point q_1 ∈ C^m, q_1 ∉ A. For all γ ∈ C except for a finite number of one-real-dimensional rays from the origin, the one-real-dimensional arc
    φ(t) := t q_1 + (1 − t) q_0,    t = γτ / (1 + (γ − 1)τ),    τ ∈ (0,1],
is contained in C^m \ A. Furthermore, if we let γ = e^{iθ}, the foregoing statement still holds for all but a finite number of points θ ∈ [−π, π].
Proof.  Since the set T := {t ∈ C | (t q_1 + (1 − t) q_0) ∈ A} is algebraic, it must either be all of C or a finite number of points in C. But by assumption, t = 1 is not in T, so T must be finite. The bilinear transform from τ to t maps [0,1] to a circular arc in the Argand plane for t, leaving t = 0 with angle equal to the angle of γ. Hence, any two choices of γ ≠ 0 having different angles give distinct circular arcs that meet only in the two points t = 0 and t = 1. This implies that there is only one such arc through each t ∈ T, and each such arc is produced by values of γ on a one-real-dimensional ray from the origin. For all other values of γ ∈ C, the path φ(t) for τ ∈ (0,1] is contained in C^m \ A. The final statement follows because each ray from the origin hits the unit circle, |γ| = 1, in a single point.  •

There are many alternative ways one could set up paths with the desired genericity, but these simple approaches suffice. We have already seen the usefulness of a variant of the "gamma trick" in the example of Figure 2.1, and we will return to it in § 8.3.

Theorem 7.1.1 covers many of the cases that arise in practice, but situations arise when more refined versions are useful. Some useful variants are: (1) the variables z live on projective space or on a cross product of projective spaces instead of on Euclidean space; (2) we count solutions on a Zariski open subset of the variable space instead of on the whole space, that is, solutions that satisfy prespecified algebraic conditions are to be ignored; (3) the parameters q live on an irreducible algebraic set in Euclidean space or in projective space or in a cross product of projective spaces. In the case that the variable space or the parameter space involves a projective factor, the system of equations must be multihomogeneous in a way that is compatible with those spaces. Recall from § 3.6 the definition of a multiprojective space as a product of projective spaces, for which we have the associated concept of multihomogeneous polynomials.

Theorem 7.1.4 (Generalized Parameter Continuation)  Let X be a multiprojective space of dimension n, that is, X = P^{n_1} × ⋯ × P^{n_k} with n_1 + ⋯ + n_k = n. Let U ⊂ X be a Zariski open subset of X. Let Q ⊂ Y be an irreducible multiprojective algebraic set in a multiprojective space Y. Let F(z; q) be a system of n multihomogeneous polynomials compatible with X × Y such that z and q are homogeneous
coordinates for X and Y, respectively. Furthermore, let N(q, U, Q) denote the number of nonsingular solutions in U as a function of q ∈ Q:

    N(q, U, Q) := # { z ∈ U | F(z; q) = 0,  rank (∂F/∂z)(z; q) = n }.
Then,
(1) N(q, U, Q) is finite and it is the same, say N(U, Q), for almost all q ∈ Q;
(2) For all q ∈ Q, N(q, U, Q) ≤ N(U, Q);
(3) The subset of Q where N(q, U, Q) = N(U, Q) is a Zariski open set; we denote the exceptional set where N(q, U, Q) < N(U, Q) as Q*;
(4) The homotopy F(z; γ(t)) = 0 with γ(t) : [0,1] → Q \ Q* has N(U, Q) continuous, nonsingular solution paths z(t) ∈ U;
(5) As t → 0, the limits of the solution paths of the homotopy F(z; γ(t)) = 0 with γ(t) : [0,1] → Q and γ(t) ∉ Q* for t ∈ (0,1] include all the nonsingular roots in U of F(z; γ(0)) = 0.

Note that computations will be done in z ∈ C^{n_1+1} × ⋯ × C^{n_k+1} but interpreted as points in X. For each projective factor, we typically add an inhomogeneous hyperplane equation to make the scaling factor unique. This is the projective transformation technique described in Chapter 3. The constancy of the number of solutions for the algebraic case still follows from Corollary A.14.2, which allows an even more general situation than we use here. We require Q to be irreducible so that it is path connected, which implies the constancy of the root count; if Q were not irreducible, the root count could be different on different components of Q. Since C^n is a Zariski open subset of P^n, Theorem 7.1.4 clearly includes Theorem 7.1.1, by letting k = 1, U = C^n, and Q = C^m, an irreducible algebraic set. Notice that in the generalized version of the theorem, we denote the generic number of nonsingular solutions as N(U, Q), because the count may change if we consider a different Zariski open set U for the variables or if we restrict the parameters to a different algebraic set Q. We will consider both of these possibilities in the succeeding sections.

We can generalize the theorem further. It sometimes happens that the parameters appear via analytic expressions instead of polynomial ones. That is, the coefficients of F(z; q) as a polynomial system in z may be trigonometric or other analytic functions of q. All the same conclusions follow. This is discussed in § A.14.2, so we omit further discussion here and simply state the analytic version of the theorem in the following abbreviated form.

Theorem 7.1.5 (Analytic Parameter Continuation)  Consider the same situation as in Theorem 7.1.4 except that Q = C^m and each of the n functions in F(z; q) is a multihomogeneous polynomial in z with coefficients that are holomorphic functions of q ∈ Q. Then, we have the same conclusions as Theorem 7.1.4 for items 1, 2, 4, and 5, with item 3 modified as
(3) The subset of Q where N(q, U, Q) = N(U, Q) is an analytic Zariski open set.

Elsewhere, without the qualifier analytic, we use the term Zariski open set to mean the algebraic case. The inclusion of analytic in item 3 of Theorem 7.1.5 implies a weaker condition than the algebraic case, as is to be expected since the set of holomorphic functions is larger than the set of polynomial functions. The difference is illustrated by the algebraic case f(z; q) = z^2 − q, which has N(q) = 2 everywhere in C except q = 0, as compared to the analytic case of f(z; q) = z^2 − sin(q), which has exceptions for q = kπ, k any integer. An algebraic equation can never have an infinite number of isolated roots, but an analytic one can. Even so, an analytic Zariski open set of C^m is path connected, so continuation will succeed.

A final generalization of the theorem is to consider not just nonsingular roots, but isolated roots of any multiplicity. Theorem A.14.1 and Corollary A.14.2 are general enough to justify a restatement of Theorem 7.1.4 for isolated roots. Care must be taken in the restatement of items 2 and 5, as the limit behavior of multiple roots as a parameter path approaches the exceptional set is more complicated than for nonsingular roots. The fact is that in this limit only three things can happen: a solution path can leave U by landing on X \ U (this may include paths going to infinity); a solution path can land on a higher-dimensional solution component and thus cease being an isolated point; and two or more solution paths may merge to form an isolated solution whose multiplicity is the sum of those for the incoming paths. The number of isolated roots of a given multiplicity can increase, but only at the expense of a corresponding decrease in the number of roots having a lower multiplicity.
Theorem 7.1.6 (Parameter Continuation of Isolated Roots)  Let X, Y, U, Q, and F(z; q) be as in Theorem 7.1.4. Furthermore, let N_i(q, U, Q) denote the number of multiplicity-i isolated solutions in U as a function of q ∈ Q.
(1) N_i(q, U, Q) is finite and it is the same, say N_i(U, Q), for almost all q ∈ Q, and there is some finite number μ such that for all i > μ, N_i(U, Q) = 0;
(2) For all q ∈ Q and any m, Σ_{i=1}^{m} i N_i(q, U, Q) ≤ Σ_{i=1}^{m} i N_i(U, Q);
(3) The subset of Q where N_i(q, U, Q) = N_i(U, Q) for all i ≤ m is a Zariski open set; we denote the exceptional set where any of these equalities fails as Q*_m;
(4) For each i, the homotopy F(z; γ(t)) = 0 with γ(t) : [0,1] → Q \ Q*_m has N_i(U, Q) continuous, isolated solution paths z(t) ∈ U of multiplicity i;
(5) As t → 0, the limits of the set of multiplicity-i solution paths such that i ≤ m of the homotopy F(z; γ(t)) = 0 with γ(t) : [0,1] → Q and γ(t) ∉ Q*_m for t ∈ (0,1] include all the isolated roots of F(z; γ(0)) = 0 in U of multiplicity less than m′, where m′ is such that N_i(U, Q) = 0 for m < i < m′.

In numerical work, the paths traced by roots of multiplicity greater than one are hard to track, but in principle, singular path tracking is possible; see § 15.6. If we track only nonsingular paths, item (5) tells us that we are assured of obtaining
all nonsingular roots of the target system, which is what was claimed in the earlier theorems. To be assured of finding all isolated roots of the target system, we must track all the generically isolated roots, as indicated when m in item (5) is equal to μ in item (1). A special case of particular interest is when all the isolated roots of a generic system in the family are nonsingular, that is, when μ = 1 in item (1) of the theorem. Then, we can easily track all the isolated solution paths, and we are assured that the endpoints of these include all isolated solutions, even those with multiplicity greater than one. It is important to note that where Theorems 7.1.1, 7.1.4, and 7.1.5 refer to a polynomial system F(z; q), it is acceptable for F to be given in straight-line form (see Definition 1.2.4).
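Before moving on, it may help to see the gamma trick numerically. The following Matlab fragment is a standalone sketch (not part of HOMLAB; the two "bad" t-values are made-up placeholders): it draws a random γ on the unit circle and checks that the arc t(τ) = γτ/(1 + (γ − 1)τ) keeps its distance from prescribed real singular values of t that a purely real path would have to pass through.

    % Sketch: the "gamma trick" arc of Lemma 7.1.3 (illustration only).
    % For gamma = 1 the path is the real segment [0,1]; for a random complex
    % gamma it is a circular arc whose interior leaves the real axis.
    gamma = exp(1i*2*pi*rand);          % random point on the unit circle
    tau   = linspace(1e-3, 1, 500);     % tau in (0,1]
    t     = gamma*tau ./ (1 + (gamma - 1)*tau);
    t_bad = [0.3, 0.75];                % hypothetical singular t-values in (0,1)
    d     = min(abs(t(:) - t_bad), [], 1);
    fprintf('gamma = %.3f%+.3fi\n', real(gamma), imag(gamma));
    fprintf('closest approach to the bad t-values: %.3g and %.3g\n', d(1), d(2));
    % The endpoints t(1) = 1 and t -> 0 are unchanged; only the interior bends
    % into the complex plane, so generically both distances stay well above zero.

Setting gamma = 1 reproduces the real segment and drives both distances to zero, which is exactly the failure the lemma rules out for generic γ.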
7.2  Parameter Homotopy in Application
The foregoing describes the essence of the polynomial continuation method. To find nonsingular solutions of the polynomial system p(z) = 0 in a Zariski open set U, we do the following, a restatement in mathematical terms of the steps enumerated in the introduction to Part II.

Ab Initio Procedure: To find all solutions in a Zariski open set U of p(z) = 0.
(1) Embed p(z) : C^n → C^n as a member of a parameterized family F(z; q) : C^n × Q → C^n of polynomial systems. Denote by q_0 ∈ Q the particular parameter values that correspond to p(z), that is, F(z; q_0) = p(z).
(2) Arrange the embedding such that we have starting parameters q_1 ∈ Q, q_1 ∉ Q*, for which we either have or can compute all N(U, Q) nonsingular solutions to F(z; q_1) = 0. Call these the "start points."
(3) Construct a continuous path γ(t) : [0,1] → Q such that γ(1) = q_1, γ(0) = q_0, and γ(t) ∉ Q* for t in the real interval (0,1]. That is, γ(t) for t ∈ [0,1] connects the start parameters to the target parameters without intersecting the exceptional set, except possibly at t = 0.
(4) Follow the N(U, Q) solution paths of F(z; γ(t)) = 0 from t = 1 along the real axis to the vicinity of t = 0. These paths begin at the start points, and we propagate them towards t = 0 using a numerical path-tracking algorithm.
(5) In the neighborhood of t = 0, determine which paths are converging to nonsingular solutions. Refine these to numerically approximate the solutions to the desired accuracy.
(6) Keep only those roots which are in U, that is, eliminate those that lie on the algebraic set C^n \ U.

Suppose that p(z) is not just a single system of interest, but rather it is a member of a family of systems G(z; q) : X × Q′ → C^n of the sort we have been discussing:
p(z) = G(z; q) for some q ∈ Q′. For the sake of item 2 above, we may have had to cast p(z) in a larger family of systems than G. That is, G(z; q) is F(z; q) restricted to Q′ ⊂ Q. This is often necessary when we have no generic member of G for which we have (or can easily generate) all nonsingular solutions. The larger family F is chosen in a way that provides such a start system. However, once we have solved an initial generic member of G, we can then solve any other member of G by parameter continuation along paths in Q′. This can be advantageous because the generic root count on G can be smaller (perhaps much smaller) than the generic root count for F. To capture this advantage, one may apply a two-phase procedure as follows.

Two-Phase Procedure: To find all solutions of G(z; q) = 0 in a Zariski open set U for several parameter points, say q_1, . . . , q_k ∈ Q′.
(1) Phase 1: solve G(z; q_0) = 0.
   (a) Choose q_0 random, complex in Q′.
   (b) Solve G(z; q_0) = 0 using an ab initio technique as above.
   (c) Let Z be the set of nonsingular solutions in U so obtained.
(2) Phase 2: for each q_i, i = 1, . . . , k, solve G(z; q_i) = 0 by a continuation on a straight-line homotopy.
   (a) Form the homotopy G(z; t q_0 + (1 − t) q_i) = 0.
   (b) Track each root in Z from t = 1 to near t = 0 (a bare-bones sketch of such a tracker appears below).
   (c) In the neighborhood of t = 0, determine which paths are converging to nonsingular solutions and compute their endpoints to the desired accuracy.
   (d) Keep only those roots which are in U, that is, eliminate those that lie on the algebraic set C^n \ U.

In the remainder of this chapter, we will concentrate on Phase 2 of this procedure; that is, we will assume that we have the solution set for some initial generic system. Phase 1, the ab initio procedure, is the subject of Chapter 8.
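To make step (2b) of Phase 2 concrete, here is a bare-bones tracker written directly in Matlab rather than with HOMLAB (whose tracker, described in Appendix C, is far more robust). It follows one start solution of H(z, t) = G(z; t q_0 + (1 − t) q_i) = 0 from t = 1 down to a small t. The fixed step size, the crude "reuse the last point" predictor, the absence of an endgame, and the function handles F and J (the system and its Jacobian in z at given parameters) are all simplifying assumptions of this sketch, saved as its own file track_path.m.

    function z = track_path(F, J, qstart, qtarget, z, tend, nsteps)
    % Track one solution path of H(z,t) = F(z; t*qstart + (1-t)*qtarget) = 0
    % from t = 1 down to t = tend, starting from a solution z of F(z; qstart) = 0.
    %   F(z,q) : n-vector of residuals,  J(z,q) : n-by-n Jacobian dF/dz.
    % Illustrative sketch only: no adaptive steps, no endgame, no failure checks.
    tvals = linspace(1, tend, nsteps);
    for t = tvals(2:end)
        q = t*qstart + (1 - t)*qtarget;     % parameters at the new t value
        for iter = 1:10                     % Newton corrector at fixed t
            dz = -J(z, q) \ F(z, q);
            z  = z + dz;
            if norm(dz) < 1e-12*max(1, norm(z)), break, end
        end
    end
    end

A production tracker would add a tangent (Euler) predictor, adaptive step control, and an endgame near t = 0, which is what HOMLAB's tracker provides.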
7.3  An Illustrative Example: Triangles
Before embarking on a more complete examination of parameter continuation, let's look at a simple example where the parameterization and the start system are rather easily obtained. One would not use continuation to solve this problem, but it may help illustrate what goes on in more challenging problems. To make a concrete example, we consider the classic problem of solving for the angles of a triangle given the lengths of its sides, a, b, c. Let θ be the angle opposite side c. We will write a system of polynomials in two variables c_θ = cos θ and s_θ = sin θ. As shown in Figure 7.1, we have the three vertices of the triangle as
(0,0), (b,0), and (a c_θ, a s_θ), and the system to solve is

    f_1(c_θ, s_θ; a, b, c) = c_θ^2 + s_θ^2 − 1 = 0,    (7.3.3)
    f_2(c_θ, s_θ; a, b, c) = (a c_θ − b)^2 + (a s_θ)^2 − c^2 = 0.    (7.3.4)
The first of these is the basic trigonometric identity for sine and cosine, and the second says that point (a c_θ, a s_θ) is distance c from point (b, 0). Our parameters are the physical parameters q = (a, b, c), and the variables are z = (c_θ, s_θ). The coefficients in f_1 are constants and, when expanded out, the coefficients in f_2 are quadratic polynomials in (a, b, c).
Fig. 7.1  Triangle with side lengths a, b, c.
The system is easily solved without using continuation by forming f_2 − a^2 f_1 to get

    a^2 + b^2 − 2ab c_θ − c^2 = 0,        s_θ = ±√(1 − c_θ^2).
(7.3.5)
The first of these is the familiar Law of Cosines for planar triangles. For almost all (a, b, c), there is a unique value of c_θ, the exceptions being if a = 0 or b = 0. In these cases, the angle θ is not well defined, because one of the sides of the triangle is nonexistent. Away from these sets, the second of Equations 7.3.5 gives two distinct values of s_θ unless c_θ = ±1, in which case there is a double root. Substituting this in the Law of Cosines equation, one sees that there will be double roots for (a, b, c) on any of the four planes a ± b ± c = 0. These are the boundaries of the triangle inequality conditions. For real (a, b, c) that violate the triangle inequality, one has |c_θ| > 1, and s_θ is a pair of complex conjugate roots. Now, let's pretend that we do not know the solution via Equation 7.3.5 and that we seek a solution by parameter continuation using (a, b, c) ∈ C^3 as our parameter space. The first hurdle is to obtain a start system. For a more complicated system, we would normally pick (a, b, c)_1 at random and rely on one of the special homotopies discussed in Chapter 8, such as the total degree homotopy, to solve it. We will discuss this type of maneuver more below. However, for this simple system, we can pick out a known solution easily: let (a_1, b_1, c_1) = (5, 4, 3), a Pythagorean triple. Then, we have two solution points (c_θ, s_θ) = (4/5, ±3/5). Note that f_2 in Equation 7.3.4 is homogeneous in the parameters; in particular, all the coefficients are homogeneous quadratics in (a, b, c). This means that the solution does not
change under scaling, and so for (a, b, c) = (5α, 4α, 3α) we have the same solution points (c_θ, s_θ) = (4/5, ±3/5) for any nonzero, complex α. One may wonder if there are any other solutions. The total degree of the system is four, and its one-homogenization has two roots at infinity of the form [z_0, c_θ, s_θ] = [0, 1, ±i], so there are only two finite roots. Here, the one-homogenization is obtained via the substitutions c_θ → c_θ/z_0 and s_θ → s_θ/z_0. Next, we need a homotopy path from our starting system (a_1, b_1, c_1) = (5α, 4α, 3α) to the target (a, b, c)_0. The straight-line path

    γ(t) = t (5α, 4α, 3α) + (1 − t)(a_0, b_0, c_0)
(7.3.6)
will suffice for almost all targets. It is not difficult to check that when α is complex and the target is real, the values of t for which the path intersects the singularity conditions are complex, unless the target itself is singular. So we will not encounter any singularities for t on the real interval (0,1]. For a fixed complex-valued α, there will exist complex targets for which the homotopy path hits a singularity, but if we choose α at random, independent of the target, then there is a zero probability of this failure. It may be instructive¹ to consider what would happen if we were to choose a homotopy path in the reals, say α = 1. The homotopy is still fine for any real target that is inside the triangle inequalities, since these bound a convex region of the real parameter space. However, a line segment connecting a real target outside the triangle inequality region to a real start system inside must cross the singularity. These real targets form a set of measure zero in C^3, so considering all targets in C^3, the homotopy is still valid with probability one. But in practice, we usually want to solve systems for real-valued parameters. This illustrates that it is important to use some sort of complex randomizing factor in the homotopy so that real systems are solved with probability one.

¹ Exercise 7.1 at the end of the chapter is a good way to get a feel for the numerical behavior of this simple homotopy.
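As a concrete illustration (independent of the triangle.m routine supplied with HOMLAB), the two known solutions at the start parameters (5α, 4α, 3α) can be pushed to a target triangle with the track_path sketch of § 7.2; the particular target (2, 2, 3) is an arbitrary choice.

    % Parameter continuation for the triangle system of Section 7.3.
    % Variables z = [c_theta; s_theta], parameters q = [a; b; c].
    F = @(z,q) [ z(1)^2 + z(2)^2 - 1;
                 (q(1)*z(1) - q(2))^2 + (q(1)*z(2))^2 - q(3)^2 ];
    J = @(z,q) [ 2*z(1),                     2*z(2);
                 2*q(1)*(q(1)*z(1) - q(2)),  2*q(1)^2*z(2) ];
    alpha  = exp(1i*2*pi*rand);      % random complex scaling of the start system
    qstart = alpha*[5; 4; 3];        % Pythagorean-triple start parameters
    qtarg  = [2; 2; 3];              % target triangle (a0, b0, c0)
    starts = [4/5, 4/5; 3/5, -3/5];  % the two known start solutions
    for k = 1:2
        z = track_path(F, J, qstart, qtarg, starts(:,k), 1e-6, 400);
        fprintf('c_theta = %+.6f%+.6fi   s_theta = %+.6f%+.6fi\n', ...
                real(z(1)), imag(z(1)), real(z(2)), imag(z(2)));
    end

For the real target (2, 2, 3), the two endpoints should come out numerically real with c_theta = −1/8, in agreement with the Law of Cosines; repeating the run with alpha = 1 and a target outside the triangle inequality exhibits the kind of failure discussed above.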
7.4  Nested Parameter Homotopies
In practice, it is quite common for a parameterized family of problems to have special cases that are themselves of significant interest. In fact, we often see an elaborate network of special cases, each one inheriting the special structure of the solution sets of the more general cases of which it is a member. The forward kinematics problem for Stewart-Gough robot manipulators discussed below (§ 7.7) illustrates this. Let us be a bit more precise about this situation.

Corollary 7.4.1  For a family of polynomial systems F(z; q) : C^n × Q_0 → C^n, a chain of parameter spaces
    Q_0 ⊃ Q_1 ⊃ Q_2 ⊃ ⋯ ,
each of which is an irreducible quasiprojective algebraic set, and a Zariski open set U ⊂ C^n, the generic nonsingular root counts N(U, Q_i) obey the inequalities

    N(U, Q_0) ≥ N(U, Q_1) ≥ N(U, Q_2) ≥ ⋯

Proof.
This is just the repeated application of item 2 of Theorem 7.1.4.
•
We know that we can use parameter homotopy within any one of these spaces to compute the nonsingular roots of the associated polynomial systems, assuming we have all nonsingular solutions at an initial generic point in the family. Suppose we wish to use parameter continuation within the space Q_1, but we do not yet have a solution for any point in that space. Suppose that instead we have all nonsingular solutions for the system f(z; q_0) = 0, for a generic point q_0 ∈ Q_0. Let q_1 ∈ Q_1 be a generic point of Q_1. But Q_1 ⊂ Q_0 implies q_1 ∈ Q_0, so we may find all nonsingular solutions to f(z; q_1) = 0 by parameter continuation in Q_0, starting at q_0. If Q_1 ⊂ Q_0*, the exceptional set in Q_0, then there are fewer solutions at q_1 than at q_0. Now, we may proceed to solve the system for any other parameters q_i ∈ Q_1, i = 1, 2, . . ., using this smaller number of paths, by continuation inside Q_1 starting at q_1. Obviously, the same approach can be applied to solve a start system in any Q_i once we have a solution for a system in one of its ancestors, Q_j, j < i.

Unlike the simple triangle problem discussed above, when solving problems in engineering or science, we rarely have all the solutions for any generic point in the natural parameter space of the problem. So how do we get started? A very useful trick is to solve the first naturally-parameterized problem by embedding the whole family within a larger, artificially-parameterized family of problems, within which we do have a solved general case. This is the Ab Initio Procedure of § 7.2. Suppose, for example, that an engineering problem is a system of two quadratics in two variables. There are a total of 12 coefficients in two bivariate quadratics, but for our problem these may depend on just a few physical parameters. We may solve the initial problem given by generic physical parameters using a homotopy in Q_0 = C^12, the parameter space of all coefficients of two bivariate quadratics. Then, Q_1 ⊂ Q_0 consists of the sets of coefficients that are generated by ranging over the physical parameters.
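As a toy illustration of such an embedding (the particular physical parameterization is hypothetical, not taken from the text), suppose the two quadrics are {x^2 + y^2 − p_1^2, x y − p_2}; the inclusion of the physical parameter space Q_1 into Q_0 = C^12 is then just the map from p to the twelve coefficients.

    % Coefficients of two bivariate quadrics in the monomial basis
    % [x^2, x*y, y^2, x, y, 1]; stacking them gives a point of Q0 = C^12.
    % Hypothetical physical family: {x^2 + y^2 - p1^2,  x*y - p2}.
    coeffs = @(p) [ 1, 0, 1, 0, 0, -p(1)^2;     % first quadric
                    0, 1, 0, 0, 0, -p(2)   ];   % second quadric
    q0point = reshape(coeffs([2; 0.5]).', [], 1);   % a point of Q1 inside C^12

An ab initio homotopy in Q_0 solves one generic member of this family; thereafter, continuation can stay inside the two-dimensional image of the map above.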
7.5  Side Conditions
In the statement of coefficient-parameter homotopy above, the generic number of nonsingular roots, N(U, Q), is counted on a Zariski open subset U in complex space. The result is stated in that way to justify the application of "side conditions" for eliminating uninteresting solution paths from a parameter homotopy. Suppose the zeros of a system of analytic functions s(z) : C^n → C^k are not of interest as solutions of F(z; q) = 0. We call s(z) = 0 "side conditions," and U = C^n \ s^{−1}(0). Typically, the side conditions identify degenerate solution sets
that are known by other means, but they may also be certain pro forma conditions that have been noticed to arise often. A common choice of the latter type, especially when using monomial product homotopies, is the side condition s(z) = z_1 z_2 ⋯ z_n = 0, which simply means that we are not interested in solutions that have any coordinate equal to zero. This is equivalent to saying that we are working on the open set U = (C*)^n, where C* = C \ {0}. We will see below the use of side conditions specific to a particular application, such as two variables being equal: s(z) = z_1 − z_2 = 0. In essence, even when we work on U = C^n, we are invoking a side condition on P^n: we are ignoring solutions at infinity.

Side conditions work hand-in-hand with nested parameter homotopies. Whenever we solve the first generic example in a parameter space, we check the solutions against the side conditions. Then, when solving other problems in the same parameter space using the first example as the start system, we drop the solutions that satisfy the side conditions from the list of start points for the continuation. In some cases, the degenerate solutions specified by the side conditions vastly outnumber the interesting ones, and the number of paths in the parameter continuation is dramatically reduced.
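Numerically, applying side conditions is just a filter on the array of computed endpoints. A minimal sketch follows; the tolerance and the two particular conditions (any zero coordinate, or z_1 = z_2) are illustrative choices.

    function Zkeep = apply_side_conditions(Z, tol)
    % Z is n-by-N, each column a computed solution; drop the columns on
    % which a side condition s(z) = 0 holds to within tol.
    if nargin < 2, tol = 1e-8; end
    on_coordinate_plane = any(abs(Z) < tol, 1);        % some z_i = 0
    on_diagonal         = abs(Z(1,:) - Z(2,:)) < tol;  % z_1 = z_2
    Zkeep = Z(:, ~(on_coordinate_plane | on_diagonal));
    end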
7.6  Homotopies that Respect Symmetry Groups
Some systems respect symmetry groups, and we can reduce the number of paths to follow accordingly. Suppose we have a mapping S : C^n → C^n such that for any q ∈ Q, if F(z; q) = 0, then F(S(z); q) = 0. Furthermore, suppose that if z is a nonsingular solution, then so is S(z). Often, F(S(z); q) is either exactly F(z; q) or a rearrangement of the polynomials of F(z; q). For example, under the mapping S : (x, y) ↦ (y, x), the polynomial system {xy − q_1, x^2 + y^2 + q_2} is invariant, whereas the polynomials in the system {x y^3 − a, x^3 y − a} interchange. In such cases, it is clear that nonsingular roots map to nonsingular roots. Using the notation S^2(z) = S(S(z)), S^3(z) = S(S(S(z))), etc., suppose k is the smallest integer such that z = S^k(z). We say that f respects S as a symmetry group of order k. The symmetry implies that for the homotopy F(z; q(t)) = 0, a solution path z_0(t) is matched by the paths z_i(t) = S^i(z_0(t)), i = 1, . . . , k − 1. So we only need to compute one of the k paths: we use S to compute the endpoints of the matching paths without knowing their intermediate points. It can happen that for the same symmetry mapping, roots appear in symmetry groups of different orders. For example, for the system {x y^3 − 1, x^3 y − 1} = 0 and the mapping (x, y) ↦ (y, x), the root (x, y) = (1, 1) maps to itself, while the root ((1 + i)√2/2, −(1 + i)√2/2) is in a group of order two. This must be taken into account when using symmetry to reduce the number of solution paths.

When we solve the first generic example in a parameter space, we usually must resort to an ab initio procedure (§ 7.2), embedding the target system into a larger family of systems. Since the members of this larger family generally do not respect
the symmetry, we must follow all the paths in the first run. The symmetry can still be useful as a check on the computation: do all roots appear in the requisite symmetries? If so, we have some assurance that the numerical process was carried out successfully. Then, in subsequent runs using Phase 2 of the two-phase parameter homotopy procedure, the symmetry is used to reduce the number of paths in the computation.
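In code, using the symmetry amounts to expanding each tracked endpoint into its orbit under S. A sketch follows; the swap map and the tolerance are illustrative, and a full implementation would also merge orbits that coincide.

    function orbit = symmetry_orbit(S, z, tol)
    % Apply the symmetry map S repeatedly to a solution z until it returns
    % (numerically) to the start, collecting the whole orbit as columns.
    % Example: S = @(z) z([2 1]) implements the swap (x,y) -> (y,x).
    if nargin < 3, tol = 1e-8; end
    orbit = z;
    w = S(z);
    while norm(w - z) > tol
        orbit = [orbit, w];   %#ok<AGROW>
        w = S(w);
    end
    end

For the system {x y^3 − 1, x^3 y − 1} above, the orbit of (1, 1) has length one while the orbit of a root with y = −x has length two, matching the discussion of groups of different orders.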
7.7  Case Study: Stewart-Gough Platforms
For the first significant example of this book, we examine an important family of problems from mechanical engineering: the forward kinematics of Stewart-Gough platform robots. As we will see shortly, there are a number of different options for the design of such robots, and these can be organized into nested families of robot types. These parameterized families are ideal for illustrating the concept of parameter continuation.

A Stewart-Gough platform, shown schematically in Figure 7.2, is a type of parallel-link robot, having a stationary base platform upon which a moving platform is supported by six "legs." Each of these legs has a spherical (ball-and-socket) joint at each end,² with a prismatic joint (linearly telescoping) in between. The prismatic joint is actuated, usually by a ball screw and electric motor, so that the distance between the centers of its adjacent universal and spherical joints can be controlled by computer. That is, leg i, i = 1, . . . , 6, connects point A_i of the stationary platform to point B_i of the moving platform, and we control the lengths L_i = |B_i − A_i|. By proper coordination of the six leg lengths, the moving plate can be placed in any position and orientation within a working volume (actually a six-dimensional workspace, a subset of R^3 × SO(3)), whose boundaries are determined by the limits of travel of the prismatic joints. Collisions between the legs can also limit the range of motion.

² One ball joint on each leg can be replaced by a universal joint to eliminate rotation of the leg around its axis, but this does not alter the motion of the moving platform, our present object of study.

These robots are best known as the mechanism beneath motion platforms for aircraft flight simulators, but they are applicable to tasks as varied as aiming telescopes or welding automotive bodies. The kinematics of these robots has been the subject of extensive academic research, which we cannot begin to address here. We refer the interested reader to (Merlet, 2000; Tsai, 1999) as a starting point. Although many interesting algebraic problems arise in the study of these mechanisms, for the moment, we will consider only the so-called "forward kinematics" problem, which is as follows:

Given: the geometry of the stationary and moving platforms and the six leg lengths,
Find: the position and orientation of the moving platform with respect to the stationary one.

As usual, in what follows, we embed the real problem into complex space, so even though only real values of the leg lengths are physically meaningful, we consider complex L_i ∈ C. Similarly, we treat the robot workspace as C^3 × SO(3, C), where SO(3, C) = {A ∈ C^{3×3} | A^T A = I, det A = 1}.
Fig. 7.2  General Stewart-Gough platform robot.
To write a system of polynomial equations, we need to precisely define the problem data. Choose reference frames in the stationary and moving platforms. Let the position of point A_i be given by vector a_i ∈ C^3 in the stationary frame, and let B_i be given by vector b_i ∈ C^3 in the moving frame. Rather than use a direct coordinatization of C^3 × SO(3, C), it is more convenient for the problem at hand to use Study coordinates, also known as "soma coordinates" (p. 150-152 Bottema & Roth, 1979). These consist of all points [e, g] = [e_0, e_1, e_2, e_3, g_0, g_1, g_2, g_3] ∈ P^7 that lie on the Study quadric

    f_0(e, g) = e_0 g_0 + e_1 g_1 + e_2 g_2 + e_3 g_3 = 0.
(7.7.7)
This is an isomorphism of C^3 × SO(3, C), wherein the elements e are a quaternion that represents the orientation of the moving platform with respect to the stationary one and g is a quaternion that encodes translation as p = g e′/(e e′). Accordingly,
the position of point B_i in the reference frame of the stationary platform is written

    (g e′ + e b_i e′)/(e e′),

where multiplication follows the rules for quaternions and g′ = (g_0, −g_1, −g_2, −g_3) and e′ = (e_0, −e_1, −e_2, −e_3) are quaternion conjugates of g and e. Clearly, we must exclude the points that satisfy

    s(e, g) = e e′ = 0.
(7.7.8)
The Study quadric is exactly the condition that the translation g e′ be a pure vector, and since b_i is a pure vector, so is e b_i e′. These facts and the fact that the squared length of a pure vector v, considered as a quaternion, is just v v′, allow us to write the basic kinematic equations for the Stewart-Gough platform as

    L_i^2 = ((g e′ + e b_i e′)/(e e′) − a_i) ((g e′ + e b_i e′)/(e e′) − a_i)′,    i = 1, . . . , 6.
(7.7.9)
Note that this system of equations immediately solves the "inverse" kinematic problem: given the position and orientation of the moving platform as [e, g], we can calculate the leg lengths L_i. We are looking to solve the opposite problem: given L_i, find [e, g]. To proceed, we expand Equation 7.7.9 and multiply through by e e′ to get, for i = 1, . . . , 6,

    f_i(e, g) = g g′ + (b_i b_i′ + a_i a_i′ − L_i^2) e e′ + (g b_i′ e′ + e b_i g′) − (g e′ a_i′ + a_i e g′) − (e b_i e′ a_i′ + a_i e b_i′ e′) = 0.    (7.7.10)
In summary, Equations (7.7.7, 7.7.10) form the forward kinematic problem for Stewart-Gough platforms as

    F(e, g) = {f_0, f_1, . . . , f_6} = 0,
(7.7.11)
subject to the side condition s(e, g) ≠ 0 from Equation 7.7.8. System F(e, g) = 0 is a set of seven homogeneous quadratic equations in [e, g] ∈ P^7.
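The quaternion algebra behind Equations 7.7.7-7.7.9 is easy to exercise numerically. The sketch below (sample numbers are placeholders; this is not part of the HOMLAB routines) builds Study coordinates [e, g] from a rotation quaternion e and a pure translation quaternion p via g = p e, which automatically satisfies the Study quadric, and then evaluates the "inverse" direction of Equation 7.7.9: the world position of a platform point b and the resulting leg length.

    % Quaternion helpers, components ordered [w x y z] (row vectors).
    qmul  = @(p,q) [ p(1)*q(1) - p(2:4)*q(2:4).', ...
                     p(1)*q(2:4) + q(1)*p(2:4) + cross(p(2:4), q(2:4)) ];
    qconj = @(q) [q(1), -q(2:4)];

    ang = pi/3;                                 % sample rotation: 60 degrees about z
    e   = [cos(ang/2), 0, 0, sin(ang/2)];       % rotation quaternion
    p   = [0, 1.0, 2.0, 0.5];                   % translation as a pure quaternion
    g   = qmul(p, e);                           % then p = g e'/(e e') and e.g = 0

    a  = [1, 0, 0];   b = [0, 1, 0];            % one base and one platform joint center
    bq = [0, b];                                % platform point as a pure quaternion
    ee = qmul(e, qconj(e));                     % (|e|^2, 0, 0, 0)
    Bw = (qmul(g, qconj(e)) + qmul(qmul(e, bq), qconj(e))) / ee(1);
    fprintf('scalar part of B (should be ~0): %.2e\n', Bw(1));
    fprintf('leg length L = %.6f\n', norm(Bw(2:4) - a));

Squaring the printed leg length and comparing with the expanded form f_i of Equation 7.7.10, evaluated at the same e, g, a_i, b_i, L_i, is a useful consistency check when implementing the forward-kinematics system.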
7.7.1  General Case
The complete family of Stewart-Gough forward kinematic problems is parameterized by the joint center points and the leg lengths, {(a_i, b_i, L_i), i = 1, . . . , 6} ∈ (C^3 × C^3 × C)^6, a 42-dimensional space. Hence, in the preceding section, we should have written the equations as F((e, g); p) = 0, where p ∈ C^42. It is of historical interest to note that the number of solutions to the forward kinematics of general Stewart-Gough platforms was found to be 40 by several different researchers at about the same time³ using entirely different approaches: continuation (Raghavan,
1993), vector bundles and Chern classes from algebraic geometry (Ronga & Vust, 1995), computer algebra using Gröbner bases (Lazard, 1993), and computation of a resultant using computer algebra (Mourrain, 1993). See also (Mourrain, 1996). The formulation of the problem we use here follows (Wampler, 1996a), wherein a simple proof of 40 roots is given. The same formulation was derived independently by Husty (Husty, 1996), who gave a procedure that uses computer algebra to derive a degree-40 equation in one variable. This is but a small indication of the level of interest this problem has attracted.

³ Historical note: preprints of (Ronga & Vust, 1995) circulated widely in 1992 and were referenced in (Lazard, 1993; Mourrain, 1993). The conference paper (Raghavan, 1991) was the first report of the count of 40, and this numerical result may have helped motivate the proofs.

If we could solve the forward kinematics problem for just one general member of C^42, we could solve any other member by parameter continuation. The question of how to get that first solution set is addressed in the next chapter. For the moment, let us just say that the trick is to cast the Stewart-Gough forward kinematics problems as members of a much larger family, the family of all systems of seven quadrics on [e, g] ∈ P^7. General members of this family have 2^7 = 128 isolated solution points, so we can find all isolated solutions for an initial Stewart-Gough problem by tracking 128 solution paths for a homotopy defined in this larger space. Doing so reveals that a generic Stewart-Gough platform, p_0 ∈ C^42 (chosen using a random number generator), has 40 nonsingular solutions and 88 singular ones. The singular solutions are on the degenerate set of Equation 7.7.8, so we can safely ignore them as they are not of physical significance. In short, we have N(P^7, C^42) = 40 and only these roots are of interest. Having the 40 isolated solutions x_0 ∈ F(·; p_0)^{−1}(0) to a generic Stewart-Gough platform, p_0 ∈ C^42, we are ready to apply parameter continuation within the family. By Lemma 7.1.2, a straight-line path from p_0 to almost any other p_1 ∈ C^42 stays generic, and so by Theorem 7.1.4, the 40 solution paths starting at x_0 for t = 1 of the homotopy
    H_SG((e, g), t) := F((e, g); t p_0 + (1 − t) p_1) = 0
(7.7.12)
will lead to a set of endpoints that contains all isolated solutions of F((e, g); p_1) = 0. (We invoke the generalized Theorem 7.1.4 instead of the basic version, Theorem 7.1.1, because we are working on projective space P^7.) There exist points p* for which the line segment between p_0 and p*, parameterized by t ∈ (0, 1] in the homotopy above, strikes a singular point. Such points are a set of measure zero in C^42, but they do exist. If one happens to encounter such a problem, where some homotopy paths founder before t approaches zero, all that is necessary is to first continue from p_0 to another random point in C^42 before proceeding to the final target. Or, to accomplish the same thing, we may choose a random γ ∈ C and follow the homotopy H_SG((e, g), t(s)) = 0 along a nonlinear path t(s) = s + γ s(1 − s) on the real segment s ∈ [0, 1]. In practice, unless one is solving a large number of such problems, the exceptions to the linear homotopy path will almost certainly not be encountered, so Equation 7.7.12 is sufficient.
7.7.2  Platforms with Coincident Joints
Various special families of Stewart-Gough platform robots may be defined by requiring some joint centers to coincide. For example, suppose legs 1 and 2 both connect to the same point on the moving platform; in other words, points B_1 and B_2 coincide. This is an example of a so-called 6-5 platform, where 6 and 5 are the numbers of distinct joint centers on the stationary and moving platforms, respectively. Such special platform robots can have advantageous kinematic properties, so they are of practical interest. In fact, the limiting case of a 3-3 platform, discussed below, is one of the most popular designs in practice. A 6-6 platform is the most general type, treated in the preceding paragraphs.

The number of joint centers on a platform can take on any value from 3 to 6. (If there were only 2 joint centers, rotation around the line through them cannot be resisted by the mechanism, making it useless.) Moreover, these two integers are not enough in general to fully specify the mechanism type, since it matters, for example, if one of the legs connects two double joints. We can schematically represent the topological type of a platform with coincident joints by two rows of dots representing joint centers, with lines between them representing connecting legs. There are always six legs, but the number of dots is reduced by the presence of coincident joints. We will assume that the top row of dots represents the joint centers on the moving platform and the bottom row represents those of the stationary platform. The connection patterns

    [connection-pattern diagrams for types 4-4a and 4-4b]

are both 4-4 patterns, but they are topologically distinct. We will only address a few of the possibilities in the next few paragraphs. A more complete catalog of coincident-joint geometries and their root counts can be found in (Faugère & Lazard, 1995).

Consider first the 4-4 connection pattern illustrated on the left above, which we label 4-4a. It is given as a quasiprojective algebraic subset of C^42 by the equations {a_1 = a_2, a_5 = a_6, b_2 = b_3, b_4 = b_5}. We may solve such an example by making it the target system of either a total degree homotopy or the general Stewart-Gough homotopy H_SG, because it is a member of both. Usually, it is more efficient to use the 40-path option than the 128 paths of the total degree homotopy. But either way, one finds only 16 solutions, with the rest of the paths having endpoints on the degenerate condition, Equation 7.7.8. With 16 solutions for a generic example in family 4-4a in hand, we can solve any other problem in that subfamily using H_SG and only 16 paths.

This is just the tip of the iceberg in terms of the possible subfamilies of the Stewart-Gough platform. Figure 7.3 shows a family tree of six sub-families, with arrows indicating inclusions (lower families in the figure are sub-families of higher ones). At the top, "quad7" is the family of all systems of 7 quadrics, which contains
all of the Stewart-Gough platform systems. Table 7.1 lists these same families: each is given a name, such as 4-4a, and the pattern of coincident joints is indicated graphically. Ignore for the moment the families whose names end in "P"; these are discussed in the next subsection. The number of nonsingular roots is indicated as N. This will be the number of homotopy paths for a parameter homotopy starting from a generic point in the family and ending at any other point in the family, including any point in a family that is a subset of that family. For each family, the dots in the table in its row indicate which families it belongs to. For example, the first column is the family of all systems of 7 quadrics, which contains all of the other families, so there is a dot in every cell of the first column. We can solve any Stewart-Gough platform by a 128-path homotopy through the parameter space of 7 quadrics or by a 40-path homotopy through the space of general 6-6 platforms. Of course, if the target system is a member of some other subfamily, it is more efficient to work within that family after a first generic member of the family has been solved by continuation in a family above it. This is why, for example, we need the seven-quadric system to get the process started.
Fig. 7.3 Stewart-Gough coincident joint family tree
Table 7.1 is not an exhaustive list of special Stewart-Gough sub-families. Among the coincident-joint families, any type K-L with 3 ≤ K, L ≤ 6 is possible, including cases where 3 joints are coincident. Four coincident joints will be degenerate—either no solutions or a positive-dimensional solution set—so these can be ignored. Further exploration of the coincident-joint families is an exercise at the end of this chapter. Besides these families, there exist special cases where no joints are coincident, but rather, there is some other geometric relationship, such as joints in a straight line.
Table 7.1  Stewart-Gough Sub-Families

    Name     N
    quad7    128
    6-6      40
    6-6P     20 + 20
    6-4      32
    6-4P     16 + 16
    4-4a     16
    4-4aP    8 + 8
    4-4b     24
    4-4bP    12 + 12
    3-3      8 + 8

(The coincident-joint pattern diagrams and the family-membership dots of the original table are graphical and are not reproduced here.)
We will have reason to study such a case later, in Part III.
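For reference, a random member of the coincident-joint subfamily 4-4a can be generated by drawing all 42 parameters at random and then imposing the coincidences {a_1 = a_2, a_5 = a_6, b_2 = b_3, b_4 = b_5}; the array layout below is an illustrative assumption, not the format expected by sgparhom.

    % Random complex parameters for a 4-4a Stewart-Gough platform.
    % Columns of A and B hold the joint centers a_i and b_i; L holds leg lengths.
    crand = @(m,n) randn(m,n) + 1i*randn(m,n);
    A = crand(3,6);  B = crand(3,6);  L = crand(1,6);
    A(:,2) = A(:,1);   A(:,6) = A(:,5);    % a1 = a2,  a5 = a6
    B(:,3) = B(:,2);   B(:,5) = B(:,4);    % b2 = b3,  b4 = b5
    p = [A(:); B(:); L(:)];                % a point of the 4-4a subfamily in C^42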
7.7.3  Planar Platforms
Every family in Table 7.1 has a planar version, indicated by the suffix "P" in its name. These have the six points of the stationary platform in a plane and similarly for the moving platform. In the interest of simplicity, these have not been added to Figure 7.3, but we may summarize the membership relationship as follows. If A and B are non-planar families, AP and BP their planar sub-families, and B is a sub-family of A, then we have the following inclusions.

    A  ⊃  AP
    ∪        ∪
    B  ⊃  BP

The planarity condition results in a symmetry, because the moving platform and its mirror image reflected through the plane of the stationary platform are congruent and all the leg lengths are preserved by the reflection. Hence, solutions appear in symmetric pairs. If we perform continuation in a planar family, this symmetry applies at every step, and hence all solutions can be obtained by tracking only one of each pair. This is the reason that N in Table 7.1 is written in the form N/2 + N/2: only half the paths must be tracked to solve a member of that family.
7.7.4  Summary of Case Study
The main point to remember is that if we have a list of N(U, Q) nonsingular solutions for one generic member of a parameterized family F of polynomial systems, we can find the nonsingular solutions of any other member of the family using these as the start points of N(U, Q) homotopy solution paths. In the case of the forward
kinematics of Stewart-Gough platforms, N(P^7, C^42) = 40, so any problem can be solved using a 40-path homotopy. We have identified a number of sub-families that have a reduced number of nonsingular solutions, and a homotopy that stays within such a parameter subspace solves other members of the sub-family using the reduced number of solution paths. Sub-families with planar platforms admit a two-way symmetry which can be used to reduce the number of solution paths by half. We see that parameter continuation can be an effective way to explore such nested families and discover the generic number of nonsingular roots for each. In the exercises in the next section, we encourage the reader to experience this directly, by running Matlab routines supplied for this purpose.

It should be mentioned that there are many other approaches to such a study. In addition to studies of the general 6-6 case already mentioned (Husty, 1996; Mourrain, 1996; Raghavan, 1993; Ronga & Vust, 1995; Wampler, 1996a), for several of the subfamilies, kinematicians have found elimination procedures reducing the problem to a single polynomial (Chen & Song, 1994; Nanua, Waldron, & Murthy, 1991; Sreenivasan, Waldron, & Nanua, 1994; Zhang & Song, 1994) or have applied their own variants of continuation (Sreenivasan & Nanua, 1992; Dhingra, Kohli, & Xu, 1992). An extensive study of coincident-joint sub-families using Gröbner bases can be found in (Faugère & Lazard, 1995).
7.8  Historical Note: The Cheater's Homotopy
Among those who have some passing knowledge of developments in polynomial continuation, there has sometimes been confusion between parameter homotopy and a similar approach called the "cheater's homotopy" by its inventors (Li, Sauer, & Yorke, 1989). Appearing in print before the article establishing "coefficient-parameter homotopy" (Morgan & Sommese, 1989), the cheater's homotopy presaged much of the flavor of the full parameter theory. Consequently, the cheater's homotopy holds an important place in the development of the subject, even though it was soon eclipsed by the more general parameter homotopy theory.

Rather than working in the natural parameter space Q associated to a system f(z; q) = 0, the cheater's homotopy expands the parameter space by generic constants b ∈ C^n. The method starts by solving the initial system f(z; q_1) + b = 0 for generic q_1 ∈ Q and b ∈ C^n. Then, the finite, nonsingular solutions of this system are used as start points in a homotopy to find all the finite, nonsingular solutions to some other example in the family, say f(z; q_0) = 0, q_0 ∈ Q. This is done by following the solution paths from t = 1 to t = 0 in the homotopy f(z; q(t)) + t b = 0, where q(t) ∈ Q is a continuous path in Q with q(1) = q_1 and q(0) = q_0. We can see immediately from the parameter homotopy theory that this approach works: we have a generic start system (q_1, b) in an expanded parameter space Q × C^n and the target system is given by (q_0, 0) ∈ Q × C^n. However, the addition of
the generic constants to each equation often destroys crucial structure, causing an increase in the number of paths to track, often substantially. A simple example that shows a big difference is
For general q, this has one nonsingular solution (x,y) = (q,q), so a parameter homotopy will have just one path to track. But the start system for the cheater's homotopy
    f(x, y; q_1) + b = 0.
(7.8.14)
has six nonsingular solutions. Computing solutions of Equation 7.8.13 for several different values of q by the cheater's homotopy requires six paths each time. The added constants b_1 and b_2 destroy all the structure of the original system. This kind of difference arises in meaningful problems as well; for the nine-point path synthesis problem discussed in § 9.6.7, a parameter homotopy requires only 1442 solution paths, whereas the cheater's homotopy would require at least 90,000 continuation paths (see Wampler, Morgan, & Sommese, 1992, 1997). The difference is due to the presence of positive dimensional solution components. Parameter homotopy preserves these components and so the associated paths can be safely ignored. But the cheater's homotopy perturbs these components, replacing them with thousands of nonsingular paths that must be tracked.

The same property that makes the cheater's homotopy undesirable in the general situation can make it the method of choice in certain specialized situations: the addition of the random constants makes all finite roots nonsingular. For example, Equation 7.8.13 has a quintuple root at the origin, (x, y) = (0, 0). Adding the constants as in Equation 7.8.14 perturbs this into five distinct roots. If we wish to have the origin appear as the endpoint of nonsingular homotopy paths, the cheater's homotopy will accomplish this. Usually though, our aims are in the opposite direction: we would like to avoid computing degenerate solutions whenever possible.
7.9  Exercises
The following exercises are intended to help the reader understand the principles of parameter continuation and also to experience the numerical behavior of the continuation method. They assume that the user has access to Matlab, and that the package HOMLAB, available on the authors' websites, has been installed on the Matlab search path. A user's guide to HOMLAB appears in Appendix C. Demonstration codes are provided for most of the exercises, so they can be run with minimal knowledge of Matlab commands. A few exercises require the user to
write or modify an m-file. Even those with minimal prior experience with Matlab should be able to handle these after a little experimentation.

A few words about HOMLAB. The main output of the demonstration programs is always stored in two arrays: xsoln and stats. Each column of xsoln contains a solution of the system in homogeneous coordinates, and column i of stats compiles some statistics on the numerics of the ith solution. HOMLAB treats all problems as formulated in a multiprojective space to take advantage of the ability of the projective transformation to handle paths leading to solutions at infinity. For the Stewart-Gough platform problems, this is natural, since we have formulated them on P^7. The code requires that problems naturally formulated in C^n, such as the initial triangle example, be homogenized for solution in P^n. Typically, the homogeneous coordinate that is added in this process is appended as the last row in xsoln. (See the user's guide for information on the full range of options.) Function y=dehomog(xsoln,eps0) de-homogenizes solutions by dividing through by the homogeneous coordinate for any solution for which the homogeneous coordinate is nonzero as judged by the test abs(xsoln(n+1,:))>eps0. Part of the learning process of the exercises will be to see how to set tolerances such as eps0.

The second output, stats, compiles some statistics for the run. Each column of stats corresponds to the matching column in xsoln. Full information is given in the user's guide. For the exercises to follow, we are mainly concerned with rows 2, 3, and 5, having the following meanings:

Row 2  This is a convergence test on the solution. It is a two-norm estimate of how accurately the solution has been computed.
Row 3  This is the maximum of the absolute values of the polynomials evaluated at the solution point. If this is not small, an error has occurred.
Row 5  Condition number of the Jacobian matrix of the polynomial system evaluated at the solution point. A large condition number implies the solution is singular.

Exercise 7.1 (Triangle)  This exercise experiments with file triangle.m, which solves the triangle example of § 7.3 using the parameter homotopy path given in Equation 7.3.6. It uses a path tracker without an endgame to handle singular roots so that one can see what happens in such cases. The routine allows the option of accepting a randomly-generated, complex value for the path constant α in Equation 7.3.6. Try the following experiments:
(1) Solve several triangles of your own choice, accepting the option to use a random, complex value for α. Does the routine reliably return accurate solutions?
(2) Try again, but choose α = 1. Can you find examples for which the routine fails? Succeeds? Can you determine a condition on (a, b, c) that predicts success versus failure?
(3) Now choose α = 1 + 1i. Can you find an (a, b, c) for which the algorithm now fails? What happens if you add a small random perturbation to the values?
(4) Enter an (a, b, c) that is on the boundary of the triangle inequality, for example, (2, 1, 1). Let the routine pick a random value for α. What happens? How about for (a, b, c) = (2, 1, 1 + 1e−8)?

Exercise 7.2 (Symmetry)  Consider the family of systems F(x, y; a) = {x y^3 − a, x^3 y − a} parameterized by a ∈ C. Solve F(x, y; a) = 0 symbolically by hand. Find a mapping that gives symmetry groups of order 4. How many roots are there in (x, y) ∈ C^2? How many paths would you need to track if symmetry is used to its fullest extent?

Exercise 7.3 (Cheater's Homotopy)  This exercise addresses the system in Equation 7.8.13.
(1) Prove the claim that Equation 7.8.13 has just one nonsingular solution for a generic value of q.
(2) Use the script cheatrun.m provided with HOMLAB to numerically determine the number of nonsingular roots for Equation 7.8.13 and for the cheater's start system, Equation 7.8.14, assuming generic b_1 and b_2.
(3) How many solution paths would a parameter continuation have when solving for different settings of the parameter q? How about the cheater's homotopy?
(4) What are the singular solutions of Equation 7.8.13?

Exercise 7.4 (Stewart-Gough Platforms)  The goal of this exercise is to reconfirm the results presented in the case study of § 7.7. You will use Matlab routine Stewart/sgparhom.m.
(1) In Matlab, type >> sgparhom to begin solving Stewart-Gough forward kinematics problems. A file, strt66.mat, containing random parameters and the 40 corresponding solutions for a generic 6-6 problem is provided to bootstrap the process.
(2) Plan a strategy for reconfirming the solution counts for all the subfamilies shown in Table 7.1. Try to minimize the total number of paths that are tracked. The program provides a facility for saving solutions to re-use as start points in subsequent runs. Run at least one of each topological type, some planar and some not.
(3) Pick a subfamily and write an m-file that defines a specific example in that subfamily. Then, compute solutions to that example twice: once using a homotopy in the subfamily and once as a special case of a larger subfamily that contains it. For example, you might solve a specific 3-3 case with a 16-path homotopy in that family and also with a 32-path homotopy in the 6-4 family. Compare computation times and check that the same (nondegenerate) solutions are obtained both ways. Remember that the points are computed using homogeneous
coordinates in P^7, so you will need to devise a scheme for judging that two such points are equal. How closely do the points match?
(4) Run a real case and check that any complex solutions appear in conjugate pairs. Change the parameters and see if the number of real roots changes.
(5) Solve a problem with real parameters, p ∈ R^42. Then, use 3-D graphics commands to draw simple (stick-figure) models of the Stewart-Gough platform in all its real poses.

Exercise 7.5 (Secant Homotopy)  Let f(x; p) : C^n × Q → C^n be a system of parameterized polynomials. Then the secant system derived from f(x; p) is g(x; λ, μ, p_1, p_2) = λ f(x; p_1) + μ f(x; p_2).
(1) What is the parameter space for g(x; λ, μ, p_1, p_2) = 0? (Note, we may consider [λ, μ] ∈ P^1. Why?) Denote the parameter space as Q′ in the following items.
(2) What is the relationship between the nonsingular root count for g, N_g(U, Q′), and the one for f, N_f(U, Q), where U is any Zariski open subset⁴ of C^n?
(3) Suppose we know all N_f(U, Q) nonsingular roots of f(x; p_1) = 0 for some general p_1 ∈ Q. We would like to use these as start points for a secant homotopy

    h(x, t) = γ t f(x; p_1) + (1 − t) f(x; p_2)
(7.9.15)
to find all nonsingular solutions in U of f(x;p2) = 0 by tracking solution paths as t goes from 1 to 0. Why do we need Mf(U, Q) = Mg{U, Q') for this to be justified? If this equality does not hold, can you think of a way the homotopy might fail? If conditions of the previous item are satisfied and the constant 7 in Equa(4) tion 7.9.15 is chosen randomly in C, the secant homotopy will be successful with probability 1. Choosing 7 randomly on the unit circle (|7| = 1) also works. Can you see why the random 7 is necessary? (Try to think of a counter-example if 7 = 1.) (5) Prove the claims of the previous item. (Hint: see § 8.3.)
Exercise 7.6 (Secant Homotopy for Stewart-Gough Platforms) The conditions laid out in the previous exercise for success of the secant homotopy hold for general Stewart-Gough forward kinematics problem (type 6-6) as defined by Equation 7.7.11. A set of 40 solutions for a general 6-6 platform are provided in file stewart/strt66.mat. These were used in Exercise 7.4 as start points for a parameter homotopy, but they can also be used for secant homotopy as implemented in Stewart/sgsecant. (1) In (Wampler, 1996a), it is shown that the root count of 40 for general 6-6 platforms follows from the fact that Bi is antisymmetric when we re-write Equa4
see § 12.1.1 for definition
116
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
tion 7.7.10 for leg i as a quadratic form, eTAie + 2eTBig + gTg = 0,
(2) (3)
(4)
(5) (6) (7)
where e and g are interpreted as 4 x 1 column matrices. Use this fact to prove that the secant homotopy is valid for 6-6 platforms. Use sgsecant to solve a random 6-6 example. How does the running time compare to the parameter homotopy? Can you explain why? Use the secant homotopy to solve examples of the other coincident-point subfamilies, using the 40 start points from strt66.mat. Why is this justified? In particular, solve a 3-3 problem. How does the computation time compare to item 2? Can you explain? We would like to solve problems in a coincident-joint subfamily using a start problem from the same subfamily so that the number of solution paths is equal to the generic number of solution points. For example, we would like to solve 4-4b problems using just 24 paths. What check must be performed to see that this is justified? (Challenging) Write a program to do the check for subfamily 4-4b. (Tip: modify a copy of sgsecant .m.) What is your conclusion? Try the same for other families. What needs to be checked to conclude that a secant homotopy between two 6-6P platforms can be done using just 20 paths? Modify sgsecant.m so that you can do this check. What is your conclusion? Use the results of the last two items to determine the minimum number of paths required to solve a 3-3 problem by secant homotopy.
Exercise 7.7 (Numerics of Tracking) File htopyset .m sets constants that control the behavior of the path tracker. The two most important ones are maxit, and epsbig. Small values require the numerical solution point to stay close to the true path denned by the equations. Large values allow more deviation. Adjust the settings by putting a copy of htopyset .m in your local directory and editing it. Run sgparhom and observe the effect on computation time and reliability by recording changes in runtime, the number of function evaluations (last row in stats), and by noting any path failures. Also, type >> pathcros(xsoln) to check if any solutions have "jumped paths," causing some root to be reported more than once and leaving out the root at the end of the solution path that was left behind in the jump. Can you make sgparhom run faster?
Chapter 8
Polynomial Structures
In the previous chapter, we introduced the basic concept of a coefficient-parameter homotopy. This is the underlying principle for all of the homotopies discussed in this book; each system that we solve has a parameter space, and a homotopy is just a continuous path between two points in this parameter space. Whenever we approach a new polynomial system, the first question we face is how to parameterize it. Problems from engineering or science generally come with a natural set of parameters built in: the dimensions of the links in a mechanical system or the rate coefficients for chemical reactions, for example. But rarely do we know all the solutions for a general choice of such parameters. We need to cast the naturallyparameterized problems in some larger family of problems in which a start system is more easily found. We called this the Ab Initio procedure in § 7.2, but postponed detailed discussion for later. We now return to this important question. At the opposite end of the spectrum from the natural, physical parameterization of a system are total degree homotopies. These can in principle solve any system, because as we shall see, every system is a member of a total-degree family parameterized by the coefficients. Moreover, in each such family, there is a start system whose solutions are immediately apparent. The downside is that, depending on the target system, the total-degree homotopy may have many paths that go to solutions at infinity or other degeneracies. These waste computer time, and the process of carefully distinguishing between degenerate and nondegenerate solutions can also cause extra work. Even so, if the extra work is not so excessive as to make the computation infeasible, we only have to do it once to get the solutions for a general member of the naturally parameterized family. Then, we can use a homotopy in the natural parameter space to solve any other system of that parameterized family. But what if the extra work is excessive? Over the years, a number of useful classes of homotopies have been invented to populate the territory between total degree and naturally-parameterized homotopies. We choose among these with the objective of best matching the target system without overly complicating the solution of the start system. The purpose of this chapter is to discuss the most important signposts in this territory.
117
118
8.1
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
A Hierarchy of Structures
Fig. 8.1 Classes of Product Structures. Below line A, start systems can be solved using only routines for solving linear systems of equations. Above line B, special methods must be designed case-by-case.
Figure 8.1 shows a hierarchy of classes of special structures that are useful in constructing homotopies. Each structure in the diagram is a member of the class above it; for example, a total degree structure is a particular kind of multihomogeneous structure. (In particular, as we will shortly see, it is a one-homogeneous structure.) As we ascend the hierarchy, each class of structures presents more and more possibilities for matching a particular target system that we wish to solve. As indicated on the right of the diagram, this means that we can select a more special structure, usually with the aim of reducing the number of solution paths to track in the homotopy. The trade-off we face in this ascent is indicated by the downward pointing arrow on the left of the diagram: the lower structures allow us to select start systems that are easier to solve. For some problems, the ascent up the diagram pays handsomely in path reduction and may turn an intractable problem into a solvable one. On the other hand, it can happen that solving a start system for a higher structure can consume more computer time than is saved in path reduction. Unfortunately, even just counting the number of roots of the start system can be expensive, so it is a matter of experience to decide the most advantageous spot in this hierarchy to solve a particular problem. Two dashed lines appear in Figure 8.1 to demarcate significant differences in the start systems of homotopies respecting the various special structures. Below Line A,
Polynomial Structures
119
the start systems can be chosen in a factored form which permits all solutions to be computed using simple combinatorics and routines for solving linear systems. Thus, for these structures, the time spent solving the start system is insignificant compared with tracking the solution paths to the target system. Above Line A, some path tracking is usually required just to solve the start system. Furthermore, above Line B, solving the start system usually requires the use of a homotopy based on one of the structures below it in the hierarchy. Typically, these are not optimal in the sense that some paths lead to degenerate points. Between the two lines lie the monomial-product and Newton-polytope homotopies. These require path tracking to solve the start system, but the homotopies involved can be specially designed to produce all solutions of the start system without any extra paths leading to degenerate solutions. In addition to the cost of the path tracking, the combinatoric calculations can be significant. In addition to differences in computation times, the position in the hierarchy also has an effect on the complexity of the computer code that implements it. In this regard, the two extremes are the simplest. All homotopies require routines for path tracking. To this, a total degree homotopy adds a simple start system that is almost trivially solved. Consequently, the corresponding computer code is as simple as possible. At the other extreme, we may formulate a coefficient-parameter homotopy in terms of the physical parameters of the engineering or science problem at hand, a step which we must do in any case. The start system simply amounts to choosing random, complex values for these parameters. The difficulty comes in solving the start system. A simple way to proceed is to solve the start system with a total degree homotopy. This may be expensive, but it only has to be done once. After that, we may solve any target system in the same parameterized family using only the paths from the nondegenerate solutions of that first start system. So, once we have implemented a general-purpose solver for total degree homotopies, coefficient-parameter homotopies require only a bit of data management to solve a start system and store its nondegenerate solution list. The other intermediate structures introduce intermediate levels of complexity to a computer code. Multihomogeneous and linear product homotopies introduce simple combinatorics into the enumeration of the start solutions. In contrast, the combinatorics introduced by monomial homotopies have been the subject of significant mathematical study, of which we give only a hint in § 8.5. A final important consideration in the choice of homotopy is numerical stability and robustness. For the paths leading to nonsingular solutions, there is not much difference to be expected in this regard no matter which homotopy is chosen. However, it can happen that if one uses a homotopy near the bottom of Figure 8.1, the singular solutions may vastly outnumber the nonsingular ones. In some practical situations, we may be satisfied to casually discard all badly-conditioned solutions without wasting much computer time on them. This runs the risk of dropping out some generically nonsingular solutions that happen to have marginal conditioning.
120
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
When we wish to be more careful about finding all nonsingular solutions, a great deal of effort may be necessary to resolve all the badly-conditioned solutions. Moving up the hierarchy to a more special structure may eliminate these solutions from the homotopy and avoid the cost and uncertainty of computing singular solutions. In some cases, singular solutions remain but have reduced multiplicity, making them easier to compute accurately using "singular endgames," (see Chapter 10). With this general picture in mind, we will proceed to examine each of the special structures in some detail. Before starting this journey, we present a discussion of homotopy paths that is relevant to all the special structures. Then, we start at the bottom of the diagram of Figure 8.1 and work our way up to structures of increasing specificity. We give only simple examples in this chapter, postponing case studies of more significant examples to the next chapter.
8.2
Notation
Throughout the remainder of this chapter, it will be convenient to use the following notations. be the n-dimensional vector space having basis elements (1) Let (ei,...,en) e i , . . . , e n and coefficients from C. Any point in this space may be written in the form X^Li Ci&i w i* n c i £ C for all i. Note that we have not specified anything about the basis elements: in the structures we discuss below these will be variously individual variables, monomials, or polynomials. (2) Let {pi, • • •,pn} ® {Qi, • • • ,Qm} be the product of two sets, that is, the set {Pi ® Qj-, 1 < i < n, 1 < j < m} having nm elements. Throughout this chapter, we take this product as the image inside the polynomial ring; that is, x ®y = y ® x = xy is just the product of two polynomials. (3) Define P x Q = {pq | p s P,q £ Q}. Accordingly, {P ® Q) is the space whose members are sums of members of (P) x (Q). Since this includes a sum of one item, we have (P) x (Q) C (P
8.3
Homotopy Paths for Linearly Parameterized Families
As we shall soon see, in our hierarchy of special structures, Figure 8.1, all but the top case (general coefficient-parameter structures) have parameters that appear linearly. This means that the family of systems F(z\ q) : C™ x C m —> C n has the
Polynomial Structures
121
property that for any a, (3 £ C and qi, q2 £ C m , F(z; aqi + f3q2) = aF(z; Ql) + 0F(z; q2). The special structures of this chapter all obey this linearity condition because they are parameterized by coefficients which multiply a basis set of monomials or polynomials. Since the parameter space, C m , is linear, we can easily construct an homotopy that stays in the parameter space while continuing from a start system, F(z; qi), at t = 1, to a target system, F(z; q2), at t = 0, as H(z, t) := F(z; tqx + (1 - t)q2) = 0. By Lemma 7.1.2, to solve the system for a given target q2, we just need the solutions at almost any starting qi £ C m , from which we can follow the real straight line path t £ (0,1]. However, in the case of an Ab Initio homotopy, where we have chosen
where 7 £ C is chosen randomly and r € (0,1]. For nonzero 7 not on the negative real axis and r G [0,1], the denominator 1 + (7 — l)r ^ 0. By the linearity of F(z; q) with respect to q, we can clear the denominator to get H(z, T) := F{z; 7TQ1 + (1 - r)q2) = 0, without changing the solution paths. It can save computation to further rewrite this as H(z, r) := 1TF(z- qx) + (1 - r)F(z; q2) = 0. This is sufficiently convenient that we state it formally below. The upshot is that in the succeeding sections, we may concentrate on finding start systems for each of the special structures. Any start system in the family will do, as long as it has the generic number of roots. Recall from the previous chapter, the notation N{q, U, Q) is the number of nonsingular roots in U of F(z; q) = 0 at parameter point q € Q. Theorem 8.3.1 Suppose F(z; q) : Cn x C m —> Cra is polynomial in z and linear in q, and let f(z) = F(z;q0) for some given q0 G C m . If g(z) = F(z;q*) with M{q*, U, Cm) = Af(U, Cm) for some Zariski open set U C Cn, then
122
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
(1) for almost all 7 G S1, i.e., for all but finitely many complex numbers 7 of absolute value one, the homotopy h(z,t):=-ytg(z) +
(l-t)f(z)=O
has Af(U, C m ) nonsingular solution paths ont 6 (0,1] whose endpoints as t —* 0 include all of the nonsingular roots of f(z) = 0 in U; (2) if g(z) = 0 has no isolated roots of multiplicity greater than 1, the endpoints of the nonsingular solution paths include all isolated solutions of f(z) = 0 in U; and (3) if we let 7 = el6, the foregoing statements still hold for all but a finite number points 9 G [—7T, TT]. Proof. This is a consequence of Theorem 7.1.4, Theorem 7.1.6, and Lemma 7.1.3 with rearrangements described above for linearly parameterized families. • Remark 8.3.2 In cases where g already incorporates a generic complex scaling factor, 7 is superfluous; it can be dropped from the homotopy. (This is equivalent to choosing 7 = 1 . ) Through use of Theorem 8.3.1, as long as the parameters of the family of systems appear linearly, all that we need to form a good homotopy is to find one start system in the family having the generic number of nonsingular roots. Then, by picking 7 at random in C, the homotopy leads to all nonsingular solutions of a target system, with probability one. 8.4
Product Homotopies
Let us now jump to the bottom of the hierarchy of Figure 8.1 and work our way up. Although the lower structures can be justified as special cases of the higher ones, it is better for building understanding and intuition to start with the simpler cases. Not surprisingly, for the most part, this follows the historical development of the subject. 8.4.1
Total Degree Homotopies
At the bottom of the hierarchy, the total degree homotopy uses the least detail of the structure of the target system to be solved. The structure is completely characterized by the number of variables n and a list of degrees di, i = 1,..., n. (Here, the di are all positive integers.) Let F(z, q) : Cn x Q —> C" be the family consisting of n polynomials in n variables with dt being the degree of the ith polynomial. The parameter space Q consists of the coefficients of all monomials that respect the
123
Polynomial Structures
specified degree structure. In other words, we have fi(z;q)=
qi>aza,
^
i = l,...,n,
(8.4.1)
\a\
where a = {ai,...,an}
G Z | o , \a\ := ax -\
+ an,
and za := z™1z%2 • • • < " .
1
The number of monomials in n variables having degree less than or equal to d is ("n^)' s o denoting rrii = (™~^di), the parameter space for the total degree homotopy is Q = C m i x • • • x C m ". Using the notation of § 8.1, we may write a description of F in the alternative form /i(z)e({l,;zi,...,zn}<*>), where the parameter space is the set of coefficients multiplying the elements of the vector space. Since the parameters of F appear linearly, we can apply Theorem 8.3.1, if only we can find a start system g £ F that has the generic number of nonsingular roots and is easy to solve. We know from the classical Bezout Theorem for systems that the number of finite, nonsingular solutions to a generic member of the total degree family is J\f = d\ • • • dn. A simple system that achieves this bound is
(#-1) z
g(z) = I
d
_
2
2
.
-^
\ = 0.
(8.4.2)
We can solve the individual equations independently, obtaining dt roots for z»; the solutions of the system g(z) = 0 are the d\ • • • dn combinations of these. It is easy to see that all of these roots are nonsingular. So, even though it is very sparse, g(z) has as many roots as the most general member of the total degree family. We summarize the net result in the following theorem. Theorem 8.4.1 (Total Degree Homotopy) Given a system of polynomials : C™ -> C" with the degree of fc equal to di} let g(z) / ( z ) = {fi(z),...,fn(z}} be any system of polynomials of matching degrees such that g{z) = 0 has d = fj™ di nonsingular solutions. Then, the d solution paths of the homotopy h(z,t):=
(l-t)f(z)=O
starting at the solutions of g(z) = 0 are nonsingular for t £ (0,1] and their endpoints as t —> 0 include all of the nonsingular solutions of f(z) — 0 for almost all 7 £ C, excepting a finite number of real-one-dimensional rays through the origin. In 1
Simple demonstration: a monomial za, z = {21,..., zn}, \a\ < d, can be written as a string of
d + n symbols as za = 1 • • • 1 X z\ • • • z\ X • • • X zn • • • zn, where the positions of the n occurrences
of the "x" symbol uniquely specify the monomial. Hence, the choices of n items in a list of n -f- d things enumerate the monomials.
124
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
particular, restricting 7 to the unit circle, 7 = e%0, the exceptions are a finite number of points 9 £ [0, 2TT].
Proof. Because the family of all polynomial systems with the specified degrees is a vector space over the coefficients of its monomials, this follows directly from Theorem 8.3.1 under the condition that g(z) = 0 has the generic nonsingular root count. The classical Bezout Theorem says that d = 11"= 1 di i s the generic root count for this family, so we are done. • Remark 8.4.2 The system g(z) from Equation 8.4.2 satisfies the conditions of the theorem, and so it can be used as the start system of a homotopy to solve f(z) = 0. There are, however, many viable alternatives. One that is occasionally useful has gi(z) a product of di generic linear factors. Using the notation of § 8.1, we may write gi(z)e{z1,...,zn,l)(d').
The roots of this start system are found by choosing one factor from each equation and solving the resulting linear system of equations. If we choose the coefficients of all the linear factors at random, these linear systems will all be nonsingular with probability one. Equation 8.4.2 is a special case in which 9i{z)
G (Zi,l)idi) .
Instead of taking the classical Bezout Theorem as given, we can prove it with the tools at hand. It is instructive to do so, because a slight generalization of the same argument will apply for multihomogeneous structures in the next section. First, we rephrase Bezout's Theorem in the current notation. Theorem 8.4.3 (Projective Bezout) Given positive integers di,...,dn, let F(z, q) : C r a + 1 xQ —> C" be the family of homogeneous polynomial systems whose ith function is a member of the vector space ({?o, ~z\,. •., z n } d i ) and whose parameters Q are the coefficients of this space. Then, n t=i
Corollary 8.4.4 (Affine Bezout) Given positive integers d\,..., dn, let F(z,q) : C " x Q ^ C n be the family of polynomial systems whose ith function is a member of the vector space ({1, z\,..., zn}d') and whose parameters Q are the coefficients of this space. Then, n
Af(Ci,Q) = '[[di.
125
Polynomial Structures
Proof. Let q* G Q be the set of coefficients for the system 5(2) = F(£; g*) as
z
z
l
5(2) = I \zn
2
0
. ° [=0. z
0
(8.4.3)
>
We see that 5 has no solutions at infinity, because if ?o = 0, then all of the % = 0, but [0,..., 0] is not a point in projective space. Away from infinity, we may dehomogenize by setting z0 = 1, and find the remaining % as the djth roots of unity. Clearly, there are d = JJ"=1 di distinct solutions, and they are all nonsingular. Theorem 7.1.4 says that since q* G Q, the generic root count A^(Pn, Q) > d. Suppose q' 6 Q in the neighborhood of q* has N > d nonsingular solutions. Theorem A. 14.1 implies that nonsingular roots continue in an open neighborhood, so since P" is compact, the nonsingular solutions along a path from q' to q* must have a limit in P n as the path approaches q*. Accordingly, some solution of g(z) = 0 must have at least two solution paths approaching it. But this contradicts Theorem A. 14.1, leaving M = d as the only possible conclusion. The corollary follows immediately from the observation that since ^(2) = 0 has no roots at infinity, this is the case generically on the whole family F(z; q) = 0, and therefore the affine root count is the same as the root count onP". • Remark 8.4.5 We call d = Yl7=i °k * n e total degree of the system. Thus, we may say that the number of finite, nonsingular roots of a system of n polynomials on C71 is less than or equal to its total degree. Remark 8.4.6 The system of Equation 8.4.3 can be used as the start system in an homogeneous homotopy to solve n homogeneous polynomials in n + 1 variables using the homogeneous analogue of Theorem 8.4.1. In fact, it is very useful to homogenize a target system and solve it on P n , so that solution paths that would diverge to infinity in C n can be followed to their endpoints at infinity in Pn. See Chapter 3 for more on this. The total degree homotopy is easy to implement and very effective for systems of dense polynomials. However, systems arising in practice often display patterns of sparsity that result in fewer than the total degree number of roots. The next few sections move up the hierarchy of Figure 8.1 to capture more of the structure of the target system.
126
8.4.2
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Multihomogeneous
Homotopies
The quickest way to understand multihomogeneous structures is to start with an example. Suppose we have the system
f(x,v) = {*Hll}=0-
(8-4-4)
The total degree of this system is four, but it has only two finite roots, (x, y) = ±(1,1). When we use a total degree homotopy on C2, we are in essence solving a one-homogenization of the system on a patch of P 2 . In this case, the onehomogenization of f(x, y) is obtained by substituting x = X/W and y = Y/W and clearing denominators to get 2
~) FX(W,X, Y) =f |XY£ [-_W w2 I = 0.
(8.4.5)
Now the finite roots are [W, X, Y] = [1,1, ±1] and there is an additional double root at infinity: [W, X, Y] = [0,1,0]. The total degree homotopy not only wastes computation by following four solution paths, but the two unwanted paths lead to a singular root. If this root is not handled properly, the procedure may spend much more time on it than is spent on the meaningful finite roots. It would be better to use a different treatment of infinity, so that the undesired roots no longer exist. In this case, this can be done by introducing a separate homogeneous coordinate for each variable; that is, set x = X/U and y = Y/V and clear denominators to get
F2(X,Y,U,V) = [X£Z™ } =°-
(8A6)
We now seek solutions ([U,X], [V,Y]) G P x P and find that there are only the two finite solutions ([1,1], [1,1]) and ([—1,1], [—1,1]). There are no solutions at infinity, because setting U = 0 implies (U,X) = (0,0), which is not allowed, since [0,0] ^ P, and setting V = 0 has similar consequences. An homotopy that respects the two-homogeneous structure of the system will have only two paths. This can be understood in another way using the vector space notation of § 8.1. Recall from the previous section that the total degree homotopy treats f(x,y) as follows: (xy - 1) G ({x, y, 1}
[ l>
^
In contrast the two-homogeneous treatment places f(x, y) as a member of the family as follows: (xy - 1) G ({x, 1} ® {y, 1}) = (xy, x, y, 1), (x2 - 1) G ({x, 1} ® {x, 1}} = ( x 2 , x , l ) .
. l8 4 8j
- -
127
Polynomial Structures
Clearly, for this system, the two-homogeneous treatment is more restrictive than the one-homogeneous treatment. The corresponding start system is 9i(x,y)£(x,l)x(y,l), g2(x,y) e (x,l) x (x,l).
^ ^
A particular instance that is sufficient is
which has two solutions (x, y) = (±1,1). When solving this system, we cannot choose the first factor x = 0 in the first equation as it is incompatible with either factor in the second equation. This hints at the general phenomenon that we make use of in multihomogeneous homotopies. More formally, the structure used in a multihomogeneous treatment of a system can be summarized as follows. We have n variables that are partitioned into m disjoint subsets of size ki,..., km, (fci + • • • + km = n); that is, we have z G C™ written as z = {zi,...,
zm} with Zj = {ZJI, . . . ,
zjk.}.
Furthermore, in the target system f(z), the degree of the zth polynomial fi(z) with respect to the jth set of variables Zj is d^. This can be written for i = 1 , . . . , n as
fi=
£
c { Q l ,..., Q m } zf 1 ... Z °™
(8.4.11)
{ = !,...,or m }
where each a^ is a multidegree. Equivalently fi e ({1, 2l}
.
(8.4.12)
We consider the family of all such systems, parameterized by the coefficients of all the monomials that appear in this vector space. In the remainder of this section, f(z) is a particular member of this family and Af(Cn, Q) is the root count for the family, where the parameters, forming space Q, are the coefficients of all the monomials of the vector space specified by Equation 8.4.12. As we will justify below, a start system that corresponds to F given by Equation 8.4.12 is g with 9i
e (zu l ) ( d i l ) x • • • x (Zm, l){d""}.
(8.4.13)
That is, gi is the product of linear factors, with d^ factors of the variables Zj. Let G be the family of all such systems, having a parameter space Q' consisting of the cross product of the parameter spaces for the vector spaces of the factors. Clearly, after expanding the product and collecting terms, each such g is in the family defined by Equation 8.4.12, which defines a map <j> : Q' —> Q. Let Qg C Q denote the image
128
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
of Q' under the map 0. We know that Qg is irreducible, because Q' is, so we may speak of M(U, Qg), the generic nonsingular root count of the start system family G as a subfamily of F. To find a solution of g{z) = 0, choose one factor from each equation and solve these n linear equations simultaneously. One finds all of the solutions by ranging over all possible choices of the factors. As we saw in the example of Equation 8.4.10, some combinations of factors will be incompatible; in fact, we must choose exactly kj factors for each group of variables Zj. There are several ways to count the number of solutions of the start system g(z) = 0. Let D be the n x m matrix of nonnegative integers with entries d\j and let K = {k\,..., km}. For generic coefficients in the linear factors, we have a generic root count that depends only on D and K. We'll call this function Bez(D,K) = J\f(Cn,Qg). Let s(K) be a list of length n containing kj copies of a,- for j = 1,..., m. From this, let ir{K) be all the distinct permutations of the list s(K), of which there will be n\/(k\\ • • • km\). Then, a direct formulation of the combinatoric process described in the previous paragraph is
Bez(D,K)=
n
(8A14)
11^-
J2
An equivalent definition is
(
n
m
\
^•••amm.I[2dtfaJ h
(8A15
)
where coeff(x,p(x)) reads as "the coefficient of monomial x in the polynomial p(x)." A special case of this formula occurs when m = n, which implies kj = 1, j = 1,... ,m. Then, D is a square matrix and Bez(£>, K) — permanent(Z?), where the permanent of a matrix is just the determinant except all terms are added without introducing negative signs on the odd permutations. If D has all nonzero entries, then there are n! terms in the sum. The other extreme is the one-homogeneous case m — 1, k\ = n, for which we get one term, the total degree Bez(D,{n}) = dn • • -dni.
Now, let's justify the use of this start system by proving the following theorem. Theorem 8.4.7 (Multihomogeneous Bezout Theorem) Let F : Cn x Q —> C n and G : Cn x Qg —> Cn be the families of systems specified by Equation 8.4.12 and Equation 8.4.13, respectively. Then JV(C", Q) = M(Cn, Qg) = Bez(D, K), where a formula for Bez(D, K) is given in Equation 8.4.I4. Proof. The proof is essentially the same as the proof of Theorem 8.4.4, except we use multihomogenizations of F and G to compactify the solution domain. See
Polynomial Structures
129
§ 3.6 for the definition of multiprojective spaces and multihomogeneous polynomial systems compatible with them. The multihomogenizations of F and G are functions on C fcl+1 x • • • x Ckm+1 compatible with the multiprojective space X = ¥kl x • • • x pfem j n particular, the multihomogenization G of G, using the homogenization substitutions Zj£ = z"j(/u>j, has an ith function of the form
A solution to 5 = 0 must have at least one factor in each equation equal to zero. For a generic J E G , a choice of kj factors in the group of variables {WJ, 2ji,..., "z^ } from kj different equations, determines a unique point in the corresponding ¥kj, and a collective choice of one factor from each equation that has kj factors in each group of variables for j — 1,..., m gives one nonsingular solution of 5 = 0 in X. These are the only possible choices, since any other choice must have more than kj factors in some group j and so has only the trivial solution {0, ...,0} ^ Fkj. These are the same combinatorics that define Bez(D,K), so we have Af(X,Qg) = Hez(D,K). Moreover, generically none of the roots are at infinity, and no other solutions exist. Since the multiprojective space X is compact, by the same argument used in Theorem 8.4.4, we have for the multihomogenized family F that Af(X, Q) = Af(X, Qg). Since generically none of the roots is at infinity, and since the affine roots of the original inhomogeneous systems F and G are in one-to-one correspondence with those of their multihomogenizations, the affine root counts are the same as the root counts on X. • Remark 8.4.8 Although we have not stated it as a separate theorem, it is clear from the proof that Bez(D,K) is also the generic nonsingular root count for a multihomogeneous polynomial system with degree matrix D and group sizes K = {hi,..., km} compatible with the multiprojective space X = Pfcl x • • • x Pfcm. The final step is to connect our start system g with a target system / of the same multidegree structure. Since the parameters of F(z\ q) appear linearly, we may use the homotopy given in Theorem 8.3.1. For the record, we state this as the following theorem. Theorem 8.4.9 (Multihomogeneous Homotopy) Given a system of polynomials f(z) = {fi(z),..., fn(z)} '• C n —> Cn having a degree matrix D for variables partitioned into subsets of sizes K, as above, let g(z) be any system of polynomials of matching degrees such that g(z) = 0 has Bez(D,K) nonsingular solutions. Then, the Bez(D, K) solution paths of the homotopy h(z,t):=jtg(z)
+
(l-t)f(z)=O
starting at the solutions ofg(z) = 0 are nonsingular for t £ (0,1] and their endpoints as t —* 0 include all of the nonsingular solutions of f(z) = 0 for almost all 7 € C, excepting a finite number of real-one-dimensional rays through the origin. In
130
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
particular, restricting 7 to the unit circle, 7 = e , the exceptions are a finite number of points 0 e [0,2TT]. Proof. Because the family of all polynomial systems with the specified degrees is linear with respect to the coefficients of its monomials, this follows directly from Theorem 8.3.1 under the condition that g(z) — 0 has the generic nonsingular root count. Theorem 8.4.7 establishes that Bez(D, K) is this count. • Remark 8.4.10 A similar homotopy works on a multiprojective space X = pfci x • • • x Pfc™ for compatible multihomogeneous functions and start systems. This is the preferred formulation when the target system might have solutions at infinity, for the reasons cited in Chapter 3. Example 8.4.11 (Matrix Eigenvalues) In the realm of numerical linear algebra, efficient and robust methods already exist for solving matrix eigenvalue problems, but for purposes of illustration, let's consider the problem of finding eigenvectors and eigenvalues by multihomogeneous homotopy. Given two n x n matrices A and B, the generalized eigenvalue problem is to find (v,X) e p™-1 x IP such that f = (X1A + \2B)v = 0. This becomes a conventional eigenvalue problem for A if we set Ai = 1 and B — —I. The problem consists of n quadratic equations, thus the total degree is 2". Partitioning the variables in the natural way as Z\ = v and z-z = A, we have da = di2 = 1; that is, the equations are bilinear. The root count is the coefficient of a™~1a2 in the polynomial (c*i + 02)") which is simply n. This agrees with the well-known result from linear algebra. A suitable start system has gi(v,X):=(aJv)(bJX)=O,
i =
l,...,n
where a* € C™ and 6j £ C2 are chosen randomly. For k = 1,... ,n, we choose the second factor in the /cth equation to solve for A and solve the linear system formed by the first factors from the remaining (n — 1) equations to get v. This gives n start points. Notice that the equations are all two-homogenized from the outset. To treat these numerically, we may dehomogenize by appending a random, inhomogeneous linear equation for v and one for A. This amounts to choosing a random patch C"" 1 x C 1 on P"" 1 x P. 8.4.3
Linear Product
Homotopies
Multihomogeneous homotopies are linear product homotopies that respect a given partitioning of the variables. They are ideal for problems that have a natural
Polynomial Structures
131
partitioning, such as the eigenvector-eigenvalue problem, but some problems benefit from a less restrictive partitioning, introduced in (Verschelde & Cools, 1993). We call a linear set any subset of {1, z\,..., zn}. A linear product structure is specified by a list of linear sets for each equation. Assume the variables are z = {zi,... ,zn}. Let TOj be the number of linear sets for equation i, and let them be denoted s^ C {1, z\,..., zn}. Then, a linear product family is given by / < € (sn ® • • • ® simi),
(8.4.16)
with, as usual, the parameters being the coefficients of the vector space. For such a family, a sufficient family of start systems G has for the ith equation 9i(z)
e (s x . . - x ( s i m t ) .
(8.4.17)
As discussed in the previous section on multihomogeneous systems, we may consider the family of G as a subfamily of F, having an irreducible parameter space Qg C Q, where Q is the parameter space of F. The sufficiency of G as a start system for F just means that it has the proper root count, which is stated formally as the following theorem.
Theorem 8.4.12 (Linear-Product Root Count) Let F and G be the families of systems specified by Equation 8.4-16 and Equation 8-4-17, respectively, and let Q and Qg C Q be their parameter spaces. Then, for a Zariski open set U C C n Af(U,Q)=M(U,Qg). This is an easy consequence of the general product decomposition theorem, Theorem 8.4.14, below, so we postpone proof to that point. The combinatorics of finding all nonsingular roots to g(z) = 0 is slightly more complicated than in the multihomogeneous case, because the variable groupings are not necessarily the same across all the factors. However, it is just a matter of determining, for each collective choice of one factor from each of the polynomials in g(z), whether the resulting linear system is compatible. We return to this below, but first, let us state the corollary that justifies using a linear-product homotopy. Corollary 8.4.13 For any f(z) in the family defined by Equation 8.4-16 and a generic g(z) from the family defined by Equation 8.4-17, the solution paths of h{z,t):=-ytg(z)
+
(l-t)f(z)=O
starting at the nonsingular roots of g(z) = 0 are nonsingular for t g (0,1] and their endpoints at t = 0 include all the nonsingular roots of f(z) = 0, for all 7 G C excepting a finite number of one-real-dimensional rays through the origin. Proof. This is the usual application of Theorem 8.3.1 in light of Theorem 8.4.12.
•
132
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
The theorem and its corollary are quite simple to apply. Consider the system
^U'H^C^H-
(8418)
We see that /i G ({x,y}
*(*,*) = (" W ,
(X+ )(1
J ^v +2/ M=0-
(^.19)
v lS2 / \ ( ^ - 2 / ) ( ^ + 22/){l + 3/) / > Although the total degree of g is 6, it has only 4 nonsingular roots, since (0, 0) is a double root. Although we chose very simple coefficients, it is easy to see that this is true for generic coefficients. Hence / has at most 4 nonsingular roots on C2. We give a more substantial example in the case studies below (see § 9.3). It is easy to build a computer program that takes advantage of linear-product homotopies, if we rely on the user to identify the product structure. Then, the program forms a start system consisting of linear factors with coefficients picked by a random number generator. This gives a system that is generic with probability one. The program cycles through the various combinations of choosing one factor from each equation and, if the resulting linear subsystem is full rank, its solution is determined. This potential start point is a solution of the start system, but it is a true start point of the homotopy only if it is nonsingular and it is in the set U. We can check for singularity by numerically evaluating the condition number of the Jacobian matrix of partial derivatives at the point. Assume U is defined explicitly as the complement of the solution set of a given polynomial system, say U = Cn \ s-\0) where s : Cn ->
Polynomial Structures
133
include total degree structures. In HOMLAB, the Matlab code distributed for use with this text (see Appendix C), the general-purpose code uses linear products. The drivers for multihomogeneous and total degree homotopies construct equivalent linear-product structures and then proceed as in the general linear-product case. 8.4.4
Monomial Product
Homotopies
Next up the hierarchy of Figure 8.1 are monomial product structures. Truth be told, these are not usually used directly, but we introduce them as a conceptual bridge to the next level of polynomial products and polytope structures. All we note here is that the entire theory of linear-product structures carries over to the more general case where the sets si;,- are collections of monomials. In the case of linear products, we restricted these monomials to just {1, z\,..., zn}. Let's consider a simple example to fix ideas. Suppose we have two equations involving only the monomials {x,y,x2y,xy2}, that is, /i,/z e (x,y,x2y,xy2). These are cubics, so the total degree is 9. The two-homogeneous Bezout number is the coefficient of a(3 in (2a + 2/?)2, which is 8. The best linear-product structure that contains the given monomials is ({x, y} ® {l,x} ® {l,y}}. If we work on (C*)2, this structure has 6 roots. But the equations obey the following monomial product structure A,/2G {{l,xy}®{x,y}). This structure gives the same root count as the factored system gi,92 G (l,xy) x (x,y). One sees that two generic factors from (l,xy) have no finite roots and two generic factors from (x,y) have only the origin in common, so working on (C*)2, we have ^({/i,/2},(C*) 2 )-AT({ gi ,(; 2 },(C*) 2 ) = 4. The drawback of monomial products, in contrast to linear products, is that it is no longer easy to solve the start system. Fortunately, as covered in the next section, that problem has been solved in a quite general way via the use of convex polytopes. Another advantage of the advanced methods is that they do not require the analyst to find decompositions by hand; it is all automatic. In fact, although convex polytopes can be used to justify the theory of monomial product structures, it is more powerful, applying also to monomial vector spaces that do not reduce to products. If our simple example above is modified to A, h e (x, V, xy, x2y, xy2) 0 ({1, xy} ® {x, y}),
134
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
the monomial product theory does not apply, but the convex polytope approach gives the same root count of 4, because xy is inside the "convex hull" of the other monomials. Still, despite its limitations, it may occasionally be useful to analyze a small system by monomial products. It also serves as a stepping stone to our final product structure: polynomial products. 8.4.5
Polynomial Product Homotopies
As throughout this chapter, let's consider a family of polynomial systems F(z; q) : C n x Q —> Cn. More specifically, let F = {/i,... , / „ } , where each polynomial fi : C™ x Qvt —> C has as its parameter space the coefficients of a vector space Vj defined by a polynomial product structure as follows. Each Vj is specified by rrii sets of polynomials s^-, j = 1 , . . . ,mj, which letting /cy be the number of polynomials in the set s^, can be denoted as s^ = {ptji, • • • ,Pijktj}- All the polynomials pijk are given. The vector space Vj is constructed from these as the polynomial product Vz := (Sil ® • • • ® s i m i ) ,
i =l,...,n.
(8.4.20)
The basis elements of V are all the polynomials obtained by choosing one element from each sy, j ~ 1 , . . . , rrii and multiplying them together. If two or more of these choices give an identical element, the duplicates can be dropped, but in any case, there are at most JTjli ^ij basis elements for Vi. The parameter space for Vi, which we call Qvi, is the set of coefficients multiplying these elements. The parameter space for the family of systems F is just Qv1 x • • • , xQyn. Alternatively said, if a polynomial Wi can be written in the form
Wi
Ti
rrii
=£ n ww>
t 8 - 4 - 21 )
where wm £ (sy), then wt e Vj. A particular system in the family F consists of an n-tuple of polynomials {w\,... ,wn}, Wi G Vj. Now, consider a special member of F wherein each Wi is formed from a single product, that is, rj = 1 in Equation 8.4.21. We will argue that a generic system of this type is sufficient as a start system for a homotopy to find all nonsingular solutions of any system in F. Accordingly, we will choose a generic start system 9{z) = {gi(z)> • • -,9n(z)} with rrii
9i(z) = I ] 9ij{*),
9lj e (Sij),
(8.4.22)
j=i
or what is equivalent, 9i(z) e (sn) x • • • x {simi).
(8.4.23)
Each vector space (sy) has Ckij coefficients, so the entire family of start systems G of the form of Equation 8.4.22 has a parameterization as the cross product of all
Polynomial Structures
135
of these Euclidean spaces, which is therefore just a big Euclidean space. But since every g{z) e G is also in F, we can cast G as a subfamily of F having parameter space Qg C Q. Clearly, Qg is connected, because it is the image of a Euclidean space, where the map is defined by expanding the product and collecting terms. Accordingly G{z;q) is just F(z;q) restricted to C n x Qg, where Qg is the set of systems in F that factors as Equation 8.4.22. The sufficiency of g(z) € G as a start system for any f(z) G F is established by the following theorem. Theorem 8.4.14 (Polynomial-Product Root Count) Let F and G be the families of systems specified by Equation 8-4-20 and Equation 8-4-23, respectively, having parameter spaces Q and Qg c Q, as described above. Then, for any U that is a Zariski open subset ofCn, Af(U,Q)=Af(U,Qg). In other words, the number of nonsingular roots in U for a generic start system, one that factors in the specified way, is the same as the generic nonsingular root count of the whole family. Such a start system g(z) is much easier to solve than a general system in the family, because g^z) = 0 implies that at least one of gij(z)
= 0, j =
I,...,mi.
Our earlier proofs of Theorems 8.4.4 and 8.4.7 hinged on showing that the start system had no singular solutions and no solutions at infinity. The question of excluding roots that satisfy some side conditions, that is, the limiting of the root count to some Zariski open subset U, did not arise, because those start systems will not generically have roots on any given quasiprojective set. In the case of polynomial-product structures, a generic system g 6 G may have singular solutions, solutions at infinity (in some multihomogenization of C"), or solutions on some quasiprojective set. The inclusion of U in the theorem strengthens the result (as compared to using just C n in its place), because it will allow us to drop solutions that generically lie on some quasiprojective set that we wish to ignore. So while these possibilities give the formulation extra power to eliminate solution paths in the homotopy, we must pay for them with a more difficult proof. In particular, we must argue in more detail that in a continuation from a generic member of F to a generic member of G, none of the nonsingular, finite solution paths end at such degeneracies. The proof is a little long by the standards of this chapter, but we attempt to keep the arguments elementary. This sacrifices some rigor and elegance, but hopefully it grants the reader an easier grasp of the essential facts. In the linear-product example of Equation 8.4.18 with start systems like Equation 8.4.19, we already saw an example of a singular solution to the start system which also happens to lie on the affine algebraic set x = 0. These conditions persist generically for the entire family of start systems for that example. We pause here a moment to emphasize that the theorem can be readily applied
136
Numerical Solution of Systems of Polynomials Arising m Engineering and Science
without understanding its proof. In fact, we will give only a sketch of a proof here, as a rigorous one requires the language of line bundles and sheaves. The reader who is versed in these technicalities may wish to consult (Morgan, Sommese, & Wampler, 1995) for a better proof. The proof sketch below may be useful as a guide to understanding the rigorous proof. On that note, some readers may wish to skip to the end of the proof now. Proof, (sketch) We consider the one-homogenizations of F and G with solutions that live on P", but to keep notation simpler, let us retain the same names. After homogenization, the variables z are replaced by homogeneous coordinates x = [xo,xi,... ,xn] G P " and the basis elements of the sets Sik are replaced by their homogenizations. We count the nonsingular solutions on a Zariski open subset U C P n . This includes the special case of counting finite solutions, since Cn = P"\^4 where A = {x E Pn\xo = 0}. The finite solutions of the homogenized systems, i.e., the solutions with Xo =fi 0, are in one-to-one correspondence with the solutions of the original systems via the mappings [xo,xi,... ,xn] — i > (XI/XQ,. .. ,xn/xo) and ( z i , . . . , zn) i—> [1, x\,..., xn], so counting the finite solutions of the homogenized systems is the same as counting the solutions to the original systems. Let iJfc = {!,..., gk, fk+i, • • •, fn} be the system obtained by replacing the first k functions in F by the corresponding functions in G. Accordingly, Ho = F, Hn = G and we have a corresponding sequence of parameter spaces Q = Qo D Q\ D • • • D Qn = Qg- Suppose we can show that M(U, Qi) = N(U, Qo). Then since the order of the functions doesn't affect the solution set, when stepping from Hk to Hk+\ by replacing fk+1 in Hk by gk+i, we may reorder to place fk+x as the first function in the set and conclude that J\f(U,Qk+i) = N(U,Qk)- Chaining these equalities together, we get Af(U,Q) = N{U,Qg), thus establishing the result we seek. Thus, the proof of the theorem hangs only on the lemma Af(U, Q\) = J\f{U, Qo). For the lemma, we fix {/2,..., f n } , and consider what happens for generic f\ and g\. Abusing notation, we will still call the parameter spaces Q and Qi, respectively, from here on. To prove the lemma, we begin by considering that g\ is the product of m; factors, say si = 311912 • • -01m!, where 0ij e (sij), j = 1,.. • ,m\. For each factor, there is a generic nonsingular root count dj in U for the system {(sij), /2, • • •, fn}- By the elementary rules of differentiation of a product, it is easily seen that if a point x* is a zero of more than one factor in the product, then all the first derivatives of 0i at x* are zero. On the other hand, if x* is a nonsingular root of one and only one of the systems {01 j , f2,. • •, / „ } = 0, j = 1 , . . . , mi, then it is a nonsingular solution of {01, /2, • • •, fn} = 0. Consequently, N{U, Q\) is the sum dx + • • • + dmi minus the number of roots that are generically at the intersection of two or more of the factors {sij). We will be more precise in a moment. Consider W = {/ 2 ,. •., /n}~1(0)> t n e solution set, with multiplicities, of the last n — \ equations in F. This set can be decomposed into its irreducible com-
Polynomial Structures
137
ponents, which may be of any dimension from n down to 1. The intersection of a fc-dimensional component with a hypersurface produces components of dimension k or k — 1, and the multiplicity of the intersection is at least as great as that of the component. Accordingly, to count the nonsingular solutions of F = Ho or Hi, we only need to retain from W the irreducible components having both dimension 1 and multiplicity 1. Call this collection of components the curve K. The root count for Ho concerns the intersection K H /^(O), whereas the count for Hi concerns Kngi\0) = U^iKng-Jl(0). In a continuation path through the parameter space for /i as we approach pi, we must consider whether nonsingular roots might become singular so that Af(U,Q) > M(U,Qi). Recall that the base locus Bs(V) of a vector space V = (ei,. •., em) is the set of common zeros of all the basis elements of the space: Bs(V) = {ei,... ,e m }~ 1 (0). The key observation is that generically the singular intersections with gf 1(0) can only occur where K meets the base loci of (sy). Any other singular intersections disappear under generic perturbations of the parameters of gx. The completion of the proof depends on technical arguments about these base loci. Basically, since /i is a sum of polynomials each of the same form as gi, the base loci are preserved under the sum, and moreover, so are their effect on the root count. We leave the details to (Morgan et al., 1995). • There is one phenomenon mentioned in the proof sketch that is relevant to practical implementation of polynomial product homotopies. This is that a point that is a nonsingular solution to one subset of factors (i.e., to a choice of one factor from each gi) might be a singular solution of the whole system g = 0. Such points must be dropped from the list of start points. Except in the special case of linear products, treated in § 8.4.3, polynomial product structures require special methods to solve the start system. After breaking the start system into its various subsystems, one could apply a simpler structure, say multihomogeneous, to each subsystem. However, this is the same amount of work as solving the entire start system, and therefore the target system, with the simpler embedding. We only come out ahead if we use something in the structure of the subsystems to solve them more efficiently. A common occurrence is that many of the starting subsystems have the same structure. After one such subsystem is solved, it can be used as a start system for the other similar subsystems in a parameter continuation. A second major inhibitor to the use of polynomial products is that there is no automatic way to identify a useful breakdown of a given system into a product. Usually, the product is suggested by the method of derivation of the equations. The dual difficulties of finding a useful breakdown into a product and solving the start system means that polynomial products are only appealing for very large problems where the potential payoff is worth the analyst's time. Otherwise, one may as well employ a more automated method and let the computer do the work.
138
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
This completes our tour of the product structures in Figure 8.1. In the next section, we consider a different generalization of monomial products, using monomial polytopes, which respect product structures but also take advantage of monomial sparsity that is not captured in by any breakdown into products. 8.5
Polytope Structures
A natural way to specify a family of polynomial systems is just to list the monomials that may appear in each polynomial. The family is parameterized by the coefficients. By the general coefficient-parameter theory, it is clear that there is a root count associated to such a family. A remarkable theorem, repeated below, due to Bernstein (Bernstein, 1975) tells how the root count depends on the pattern of the monomials. Since the family is linear in its coefficients, we can use the homotopy of Theorem 8.3.1 to solve problems in the family, if only we can solve a start system having the generic number of roots. Several methods for formulating and solving such systems have been invented. We describe here only the basics, so that the reader can appreciate the methodology, but due to the highly technical nature of efficient combinatorial formulations, we defer to references for the details. After reading this section, one might next wish to consult the review article (Li, 2003). Before we can state Bernstein's theorem, we need a few definitions. 8.5.1
Newton Polytopes and Mixed Volume
Let C* = C \ 0, the complex numbers excepting the origin. A Laurent polynomial fi : (C*)n x Cm" —> C is given in multidegree notation as Ji\X,Ci) = y Ci,ct'E aeSi
where Si C Z n is the set of exponent vectors appearing in the monomials, #(£i) = rrii is the number of monomials, and Ci%a G C is the coefficient for the Laurent monomial xa. The qualifier "Laurent" acknowledges that we allow negative exponents, which are disallowed in our usual definition of polynomials. The set Si is called the "support" of fu and its convex hull Qi = conv(5j) in W1 is its "Newton polytope.2" The polynomial family f{x\c) = f(x;ci,...,cn)
= {/i(x;ci),...,/ n (ar;cn)}
is parameterized by mi+m2H \~mn coefficients for the support S = {Si,..., Sn}. When working on (C*)™, multiplication of any equation by a monomial does not change the root count, as the zero set of xap(x) = 0 is just the union of the zero set of p(x) = 0 and the zero set of xa = 0, the latter having no points that A convex polytope is a bounded region of n-dimensional real space enclosed by hyperplanes. "Polytope" is to n dimensions as "polyhedron" is to three dimensions.
139
Polynomial Structures
are in (C*)n. So given a Laurent polynomial, we can always multiply through by some monomial with large enough exponent to clear any negative exponents that appear. Said another way, we can translate the support into the nonnegative orthant without changing the zero set on (C*)n. Thus, it is clear that the parameter theory of Chapter 7 for polynomials with nonnegative exponents applies also to Laurent polynomials. There are several operations on convex polytopes that are of interest to us. One is the Minkowski sum of two polytopes: Qi + Q2 = {Qi + 42 I qi 6 Qi, 2 G Q2}Second, defining the n-dimensional volume, denoted Vol n of a unit hypercube to be 1, we may speak of the n-volume Voln(Q) for any polytope Q C R n . In fact, the volume of the simplex having vertices VQ,VI, ... ,vn is Vol n (conv(u 0 , • • •, vn)) = — |det[t>i
-vo,...,vn-vo]\.
From these definitions, it can be shown that Vbln(AiQi+A2<32H l-AnQn), where 0 < A, £ M, is a homogeneous polynomial of degree n in the scalars Aj. Definition 8.5.1 (Mixed Volume) The mixed volume of convex polytopes Qi! • • • > Qn is defined as M(QU
...,Qn)
= c o e f f ( A i • • • Xn, V o l n ( A 1 Q 1 + \2Q2
We say that the mixed volume, Ai(S\,..., volume of their convex hulls. 8.5.2
Bernstein's
+ ••• + A n Q n ) ) .
Sn), of supports Si,..., Sn is the mixed
Theorem
We have argued above that a (Laurent) polynomial family has a well-defined root count on (C*)ra. The following theorem tells us how to determine it from the geometry of the supports. Theorem 8.5.2 (Bernstein, 1975) The root count on (C*)n of a Laurent polynomial family specified by supports Si,...,Sn and parameterized by the coefficients of the corresponding monomials is the mixed volume M(Si,... ,Sn).
This result is variously called the "Bernstein count," the "BKK bound" (a term coined in (Canny & Rojas, 1991) in recognition of the contributions of (Kushnirenko, 1976) and (Khovanski, 1978)), the "polyhedral root count," or the "polytope root count." We adopt the last convention as the most descriptive and precise. If all the exponents are positive, that is, if the system is polynomial in the usual sense, there is a well-defined root count on C", which may be higher than the polytope root count on (C*)n. The count in C" can be determined by the procedure in (Li & Wang, 1996) with further refinements in (Huber & Sturmfels, 1997).
140
8.5.3
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Computing Mixed Volumes
The computation of the mixed volume is a combinatorial problem. As mentioned at the outset, efficient methods for this computation are highly technical and we will not delve into them here. Instead, we will describe a very basic approach that is of practical use only for two variables, three variables at the most. This will be enough to show the nature of the beast. With this level of understanding, the reader can knowledgeably use software provided by experts, but further study of the references will be necessary to understand the internal workings of such software. Let's begin with the direct application of Definition 8.5.1 for two polynomials in two variables. We know that Vol2(AiQi + X2Q2) is a homogeneous quadratic in Ai, A2, that is, it is of the form p(Ai, A2) = c20Ai + C11A1A2 + C02A2. The mixed volumes is the coefficient of A1A2, which is Ci\. But notice that cn=p(l,l)-p(l)0)-p(0,l), or in other words, M(Q1,Q2) = Vol2(Qi + Q2) ~ Vol2(Qi) - Vol2(Q2).
(8.5.24)
Since Vol2(Q) is just the area of polytope Q, it is easy to see how to apply this using familiar area calculations. Following exactly the same line of reasoning, one may see that M{QUQ2, Q3) = Vol3(Qi + Q2 + Q3) - Vol3(Qi + Q2) - Vol3(Q2 + Q3) - Vol3(Q3 + Qi) (8.5.25) + Vol3(Qi) + Vol3(Q2) + Vol3(Q3), and generally,
M{Qu...,Qn)
= Y,{-l)^-^Yo\n I J2 QJ I » i=i
\jecr*
( 8 - 5 - 26 )
J
where the inner sum is a Minkowski sum of polytopes and Cf are the combinations of n things taken i at a time. It is instructive to see how the mixed volume relates to Bezout's theorem. Suppose fi(x,y) and f2ix,y) are general polynomials of degree d\ and d2, respectively. This implies that their support polytopes Q\ and Q2 are isosceles right triangles of size d\ and d2, shown in Figure 8.2, and the Minkowski sum Qi + Q2 is another such triangle of size d\ + d2- Accordingly, by Equation 8.5.24, the root count is
M(QUQ2) = i(di + d2)2 - id? - id! =
did2)
141
Polynomial Structures
which is, of course, the same result as given by Bezout's Theorem. The subtraction of areas is shown graphically in the drawing of Qi + Q2- Alternatively, we can visualize the definition of the mixed volume directly by drawing a picture of AxQi + X2Q2, as shown at the right side of Figure 8.2. Only the area of the shaded parallelogram scales as A1A2, whereas the triangles scale as \\ and X2-
Fig. 8.2
Mixed volume for two polynomials of degree d\ and di.
In a similar fashion, one may easily see that the mixed volume for two equations having bidegrees (dix,d\y) and (d2X,d,2V) is dixd2y + diyd2x, in agreement with the two-homogeneous Bezout count. Figure 8.3 shows this in a self-explanatory way.
Fig. 8.3 Mixed volume for polynomials with bidegrees {d\x,diy)
and (d2x, <^2j<).
Although the preceding examples only examine the two-variable case, the mixed volume does in fact generalize the multihomogeneous Bezout count in any dimension. This relationship is pursued further in one of the exercises at the end of the chapter. Any linear product structure is exactly captured by the polytope root count, as a linear product is just another way of saying that the monomials appear in a certain pattern. There are, however, more general patterns that are captured by the polytope formulation but not by any linear product formulation. Systems having such patterns are said to be "sparse," because some of the monomials which could appear in a total degree formulation are missing. Many of the problems that arise in applications have such sparseness.
142
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Consider, for example, the system /j (x, y) = l + ax + bx2y2 = 0 f2(x, y) = 1 + ex + dy + exy2 = 0
(8.5.27) (8.5.28)
This system has a total degree root count of 4 • 3 = 12 and a two-homogeneous root count of 2 • 2 + 2 • 1 = 6, but as illustrated in Figure 8.4, the mixed volume is only four.
Fig. 8.4
Mixed volume for polynomials in Equations (8.5.27, 8.5.28).
These diagrams hint at the main idea that underlies efficient algorithms for computing the mixed volume. In each of the drawings of Qi + Q2, notice that the gray cells, whose areas sum to the mixed volume, are parallelograms having one edge in common with Q\ and one edge in common with Qi- These are known as the "mixed cells" in a "mixed subdivision" of Q\ +Q%- Mixed subdivisions are not unique, as we show in Figure 8.5. It is only required to find one.
Fig. 8.5 Alternative subdivisions for each example.
One approach to finding subdivisions for the mixed volume calculation is based on "liftings." A lifting algorithm augments each polytope by adding an (n + l)th coordinate axis and assigning a value using a lifting function. That is, point a e Qi, corresponding to monomial xa in fi(x), is lifted to (a,Wj(a)), where the lifting function, u>i : Z n —> K, for the ith polytope assigns a lift value to each exponent vector. If these assignments are chosen at random, the following procedure gives a valid subdivision with probability one. Let Q\ be the (n + 1)-dimensional polytope derived from Qi using u>i. Then, one forms the Minkowski sum Q[ + • • • + Q'n and finds the lower convex hull. The projection of the edges of this lower hull onto the original n coordinates gives a mixed subdivision, from which the mixed cells
143
Polynomial Structures
can be readily identified and their volumes computed. In fact, for efficiency, one avoids forming the convex hull of the Minkowski sum and instead searches for the mixed cells directly. See (Gao & Li, 2000, 2003; Li, 2003; Li & Li, 2001; Huber & Sturmfels, 1995; Verschelde, Gatermann, & Cools, 1996). In (Huber & Sturmfels, 1995), it is also shown how to take advantage of several of the equations having the same support.
8.5.4
Polyhedral
Homotopies
The mixed volume root count by itself does not enable us to solve the system by continuation. We need a start system that we can solve ab initio. This can be done using information gleaned from the mixed volume calculation to identify monomial combinations that contribute to the mixed volume. This was accomplished in (Verschelde, Verlinden, & Cools, 1994), using a recursive formula for the mixed volume, following in that way Bernstein's proof. The same objective was attained by using mixed subdivisions in (Huber k Sturmfels, 1995). In fact, the homotopy denned in (Huber & Sturmfels, 1995) can be used to establish an independent proof of Bernstein's theorem. A good review of subsequent developments is (Li, 2003). To form a homotopy, one usually chooses the lifting values not from the reals, but from the small nonnegative integers. Such a choice is not necessarily sufficiently generic, but this can be discovered by testing and correcting. In fact, we require the subdivision induced by the lifting to be "fine mixed," a technical condition which is best left for study in the references. In the end, one has a lifting function u>i for each equation. We select a generic member G(x) = {gi(x),... ,gn(x)} of the family of polynomials by picking random complex coefficients Cj>a for monomials at the vertices of the convex hull Q, to get gi{x) = ] T citClxa,
i = l,...,n,
aeQi
and form homotopy functions H(x, t) = {h\(x, t),..., hi(x,t)=
Y,
Ci,axat^a\
hn(x, t)} as i = l,...,n.
(8.5.29)
Att=l, we have H(x, 1) = G(x). We solve G(x) = 0 by first solving H(x, 0) = 0 and then tracking solution paths from t = 0 to t = 1. Subsequently, we solve the original, possibly nongeneric, target system F(x) = 0, using the homotopy
H(x, t) = tG(x) + (1 - t)F(x) tracking paths for t = 1 to t = 0 starting at the solutions for G(x) = 0. At first glance, H(x, 0) as defined in Equation 8.5.29 does not look so easy to solve. However, if we consider the limit as t approaches zero, the solutions x(t) are algebraic, having a number of branches each with its own Puiseux series (fractional power series). Each branch corresponds to a mixed cell in the subdivision, and it
144
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
has a number of solutions equal to the volume of that cell. These solutions can be found by elementary means. Altogether, the paths emanating from the mixed cells give the full set of solutions to G(x) = 0, whose number totals to the mixed volume of the system. In principle the homotopy could go directly to the target system F(x) — 0, using the coefficients and monomials of F in Equation 8.5.29 instead of those of G. In practice it is advisable to use the two-stage procedure, solving G and then progressing to F. This is because target systems are often not generic in the family defined by their support (that is, the coefficients may satisfy a degeneracy condition) and this may cause the standard algorithm for solving H(x, 0) to fail. 8.5.5
Example
Rather than delve any deeper into the technicalities, let us simply show the workings on the example of Equations (8.5.27) and (8.5.28). A choice of lifting functions as u>i = 0 and u>2{a) = (1,1) • a yields the subdivision shown in Figure 8.4. To see this, note that the Newton polytopes of the supports of the polynomials are Ql = [(0,0), (0,1), (2,2)],
Q2 = [(0,0), (1,0), (0,1), (1,2)],
which are convex already. These lift to Qi = [(0,0,0), (0,1,0), (2,2,0)],
Q'2 = [(0,0,0), (1,0,1), (0,1,1), (1,2,3)],
The lower hull of Q[ + Q'2 has the faces shown in the figure with vertices [(0,0,0), (0,1,0), (2,2,0), (2,0,1), (0,1,1), (3,2,1), (2,3,1), (3,4,3)]. Using these liftings, the homotopy of Equation 8.5.29 applied to this example becomes
H(x,y,t) = Clf^l) v
y
=(
1 + ax
+ bx2y2 2 A 3
v
(8.5.30)
} ' ' ' \h2(x,y,t)J \1 + cxt + dyt + exyH ) The solution paths of H(x, y, t) = 0 are intimately related to the two mixed cells, labeled A and B in the figure. It can be shown3 (Lemma 3.1 Huber & Sturmfels, 1995) that H(x,y,t) only has branches of the form
(x(t),y(t)) = (zoi71,Vof2) + higher-order terms when (-ji,72,1) is an inner normal of the mixed cell of the lower convex hull of Qi + Q'2- As i ^ 0, the lowest order terms dominate and we solve them to obtain the leading coefficients Zo;2/o of the fractional power series. Let us start by examining cell A, which is generated by monomials \,x2y2 from /i and l,y in / 2 . The inner normal for that cell is (71,72,1) = (1, — 1,1). One may 3
The result stated here generalizes to any number of variables.
145
Polynomial Structures
check that the inner product of (71,72,1) with the lifted vertices takes a minimal value of 0 on the cell. In the case at hand, this means that {x(t),y(t)) = ( x o t 1 , ^ " 1 ) + higher-order terms.
(8.5.31)
Substituting into H(x,t) = 0 gives hi (x, y, t) = 1 + axot + bxfyg + higher-order terms, h,2(x,y, t) = 1 + cxot2 + dyo + exoy^t2 + higher-order terms. Keeping just the lowest-order terms in t, we have equations for the initial coefficients xo,yo as 0 = l + bx%yZ, 0 = l + dy0These give two solutions (zo,2A>) =
(±id/Vb,-l/d).
For each of these, we may use Equation 8.5.31 to predict the values of x(t),y(t) for small t and then commence path tracking on the homotopy Equation 8.5.30 to t = 1. In similar fashion, the mixed cell B in Figure 8.4 is generated by monomials x,x2y2 from fi and l,x from J2- This time the inner normal is (71,72,1) = (-1,1/2,1), so we get (x(t),y(t)) = (xot-^yot1/2) + higher-order terms,
(8.5.32)
which gives h\(x,y,t)
= 1 + axot^1 + bx1ylt~l + higher-order terms,
h,2(x, y, t) = 1 + cx0 + dy^t3/2 + exoy2t3 + higher-order terms. This time, the lowest-order terms in t give 0 = axot~l +
bxlylt'1,
0 = 1 + cx0, giving two solutions / ChC
{xo,yo) = (-i/c,±J—). As before, these allow us to predict (x(t),y(t)) for small t, now using Equation 8.5.32, and then track the homotopy Equation 8.5.30 to t = 1. Together, these give four paths to the four solutions of Equations (8.5.27) and (8.5.28). Any other choice of (71, 72) fails to give any nonzero solutions of (XQ, yo) in the initial fractional power series, as there is only one leading term in one or both of
146
Numerical Solution of Systems of Polynomials Arising in Engineering and Science Table 8.1
Various root counts for the toy example, Equation 8.6.33
Structure
Embedding
Total Degree
<{1, zu z2, z3, 2 4 })
Two-Homogeneous Linear Product
{1, zi, Z2}
(2)
Count
U (4)
C (2)
({21,22} ® {1, zi, 22}®
4
C
256
4
96 4
54
(C*) 4
26
(C*)
{23,24}® {1,23,24}) Monomial Product
({2124,2223,21,22}®
or Polytopes Polynomial
{2124, 2223, 23, 24}) {{2124 — 2223,21,22}®
Product
(C*) 4
6
{2124 — 2223,23,24})
the homotopy equations. Both XQ and yo must be nonzero, because by assumption, they are the leading coefficients of the series. This kind of argument is the key to the general result for any number of equations.
8.6
A Summarizing Example
Let us review by studying a "toy" example for which each product structure gives a different root count. Consider a system of four equations, each of the form fi = {qn{z\Zi ~ z2z3) + ql2z\ + qi3Z2){qii(zizA - z2z3) + qibz3 + qi6z4) +qi7ZiZ3 + ql8ziz4 + qi9z2z3 + qnoz2z4.
(8.6.33)
We have four variables 2 = {z\, z2, z3, Z4} and forty parameters g^, i = 1,...,4, j = 1 , . . . , 10. Table 8.1 gives the root counts for various embeddings of the system. Here is a quick summary of how each of these is calculated: • The total degree is 4 4 = 256. • With the variables split into two groups {zi,z2} and {23,24}, the twohomogeneous Bezout count is the coefficient of a2f32 in (2a -I- 2/?)4, which is 96. More explicitly, each polynomial in the start system has the form x (1,2:3, Zi)^2\ There are (2) = 6 ways to choose the factor (1,2:1,2:2) (1,2:1, Z2) from two start polynomials and (1,2:3,2:4)^ from the remaining two, and then there are 2 4 solutions for each such choice, yielding 6 • 2 4 solutions in all. • Notice that the equations have no constant or linear terms, so the start systems can be chosen of the form (2:1,2:2) x {\,zl,z2)
x (2:3,2:4) x (1,23,24)
Polynomial Structures
147
The combinatorics for the linear-product embedding follows those for the twohomogeneous case, but the simultaneous choice of two factors {zi,z2) gives a solution with z\ = z2 — 0, and two choices of the form {z3, z±) yield a similar result. Thus, we get a smaller root count when working on (C*)4 of 3-3- (4) = 54. • The monomial product root count and the polytope root count are the same for this system. Evaluation of the mixed volume by computer yields the count of 26. • The polytope root count does not account for the fact that Z\Z4 and z2z3 do not appear independently in the factors. The polynomial product structure captures this fact, and as a result the root count decreases to 6. To determine this count, one must consider the 24 ways to choose one factor from each equation in the corresponding start system: gi G {zxz4 - z2z3, zi,z2) x {ziZi - z2z3, z3, z 4 ). It turns out that only choices with two of each kind of factor give roots in (C*)4. Each of these (2) = 6 combinations gives a single root. Although for this example polynomial products give a lower count than the polytope root count, that is not necessarily true in general. It depends on whether the equations admit a favorable polynomial product. Often the polytope root count is lowest. Other than that, the ordering in the table is fixed, as the structures lower in the table are generalizations of those higher in the table, as indicated at the outset of the chapter in Figure 8.1. 8.7
Exercises
The next chapter on case studies contains more challenging exercises connected to applications. For now, the exercises are simpler, illustrative problems. Exercise 8.1 (Warm Up) Use HOMLAB to solve the system 8.4.4 using • a total-degree homotopy (see routine totdtab), and • a two-homogeneous homotopy (see routine mhomtab). Exercise 8.2 (Linear Products) Consider the system 8.4.18. What is its total degree, its two-homogeneous Bezout count, and its best linear-product root count. Solve it all three ways using the HOMLAB routines totdtab, mhomtab, and lpdtab. Exercise 8.3 (Generalized Eigenvalues) Create a straight-line program for the generalized eigenvalue problem (XtA + X2B)v = 0, where A and B are n x n matrixes, [Ai,A2] G P 1 , and v £ Fra. Solve a randomly generated example using a two-homogeneous homotopy with n paths. (Use routine lpdsolve.) Compare your result to the gz algorithm in Matlab.
148
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Exercise 8.4 (Multihomogeneous and Polytopes) For a general system having a given multihomogeneous degree structure (D,K), show that the polytope root count and the multihomogeneous Bezout count Bez(D, K) are the same. Use Equation 8.4.15. Exercise 8.5 (Toy Problem) Use HOMLAB to confirm all the root counts reported in Table 8.1. How can you confirm the polytope root count even though HOMLAB does not implement a mixed volume calculation? Exercise 8.6 (Circle Tangents) A circle of radius r and center (a, b) has the equation f(x,y) := (x — a)2 + {y — b)2 — r2 = 0. The condition for a line through (x, y) and point (c, d) to be tangent to the circle is g(x, y) := (x — a)(x — c) + (y — b)(y-d)= 0. (1) Assume r, a, b, c, d are given. Find the points of the circle where it is touched by the tangents through (c, d). Do so by solving the system {/ = 0, g = 0}, then try again by solving {/ = 0, f — g = 0}. Is there a difference in the number of paths for a total degree homotopy? How about for a two-homogeneous homotopy? (2) Assume two circles are given. Find the point pairs where a line simultaneously touches both circles in a tangency. Use the same trick as in item 1 to reduce the number of homotopy paths. (3) Show that with the change of variables (z, z) :— (x + iy,x — iy) and judicious linear combinations of the equations, the simultaneous tangents to two circles can be found with a system having total degree 8, linear-product root count 6, and polytope root count 4. (4) What happens if the two circles are tangent to each other?
Chapter 9
Case Studies
As a means of reviewing the computation of isolated solutions by continuation, we present a collection of application problems in this chapter. Reflecting our own experiences, these are weighted heavily towards problems in kinematics, with chemistry and game theory also represented. Readers who have no interest in these application areas are encouraged nonetheless to study this chapter to solidify concepts. We order these roughly by the complexity of the analysis of the polynomial structure. The first case concerning Nash equilibria is naturally formulated as a multihomogeneous system, while succeeding cases offer a range of options to consider. The final case study on the design of four-bar linkages is actually a collection of problems ranging from the very easy, four-bar motion analysis, to rather hard, nine-point path synthesis. In these examples, one may notice that there is an art to choosing a clean formulation and simple manipulations of the equations can sometimes lead to homotopy formulations having fewer paths. Although such manipulations are sometimes not really necessary, as a few extra solution paths are not of practical consequence, our objective is to give some sense of the full range of possibilities.
9.1
Nash Equilibria
An important problem in game theory, with application to economics, is the determination of Nash equilibria. A description of the problem and results of using several different solution methods, including Grobner methods and continuation, can be found in (Datta, 2003), and related information is in (Sturmfels, 2002). The problem concerns N players, and the ith player has Si + 1 possible choices of play, called "pure strategies." For every combination of strategies, there is a payoff for each player. There are ni=i( s i + -0 possible combinations and TV players, so the game is defined by TV rTi=i(s* + -0 numbers, tabulated in utility matrices as follows. Let's say there are 3 players, Alice, Bob, and Chuck, abbreviated as A, B, or C, and they make, respectively, the plays a,b,c. Denote by U^bc,U^bc,U^bc the respective payoffs to the players. More generally, the utilities are U^ J w , where i 149
150
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
ranges 1 to N, and each jk runs 0 to s^. The game is played multiple times. Suppose Alice observes that in the last round a change in her strategy would have earned her a higher payoff. Then, she is likely to change her play in the next round. Bob and Chuck will act similarly. An equilibrium occurs if every player finds that there is no unilateral change of strategy that would have increased his or her payoff. Suppose the players can split their bets between the possible strategies. This is called a "mixed strategy," which models either the situation of putting a fraction of ones money on each strategy or of putting all ones money on a single strategy chosen probabilistically according to the mixed strategy. Let Xi = (xio,... ,XiSi) be the ith player's mixed strategy. Then, the total payoff Pi{x\,..., XN) to player i is obtained by summing his/her utility over all the mixed strategies as Pi(xU...,XN)=
J2'"Y1 ji=0
U
xi31,-,xNJNxljix2h---XNjN-
(9.1.1)
j«=0
Notice that this is multilinear in the players' mixed strategies. Equilibrium occurs for player A if, while holding B and C's mixed strategies fixed, every pure strategy for A returns the same payoff. Otherwise, A would be motivated to bet more heavily on the higher paying pure strategy. Let e/- be a pure bet on the fcth strategy: eo = ( 1 , 0 , . . . , 0), ei = (0,1,0,..., 0), etc. Then, a Nash equilibrium occurs when for i = 1 , . . . , N and k — 1 , . . . , Si Pi(xi,... ,Xi-i,ek,xi+i,...
,xN) = Pi(xi,... ,Xi-i,eo,xi+i,...
,xN).
(9.1.2)
This comprises a total of X)i=i s* homogeneous equations on P S l x ••• x FSN, a multilinear system of polynomial equations. Since the entries in the mixed strategy x\ are the percentages that player i bets on each pure strategy, these should all be in the real interval [0,1] , and they should sum to one. Each player's strategy Xi £ FSi has a unique scaling factor that makes the sum of its homogeneous coordinates equal to unity. These can then be filtered against the [0,1] condition to find the meaningful solutions. Thoseforwhich all bets are in the interior of the interval (0,1) are called "totally mixed Nash equilibria." A given game can also have partially mixed equilibria, where some players adopt pure strategies, due to unequal payoffs, while others adopt mixed strategies. We consider only the totally mixed Nash equilibria. The system given in Equation 9.1.2 has two essential structural characteristics: the equations are all multilinear, and the group of variables Xi does not appear in the ith block of equations (those that involve Pi). This structure is perfectly captured by a multihomogeneous formulation. In (Datta, 2003; Sturmfels, 2002), the solutions are counted using the polyhedral mixed volume and computed via the associated polyhedral homotopy. This is of course valid, since the polyhedral formulation sharply bounds any multihomogeneous formulation, but it is a bit of overkill when the multihomogeneous formulation is already sharp. If the payoffs
151
Case Studies
were such that more monomials vanish from the equations, such as may happen when payoffs for two pure strategies are equal, then the polyhedral method could provide a lower root count. For small systems, a multihomogeneous root count can be done by hand while a general multihomogeneous routine for larger systems remains a simple and efficient alternative to polyhedral approaches. Let us take, for example, the case of N = 3 players, with players 1 and 2 having Si + 1 = 3 pure strategies each, and player 3 having just s 3 + 1 = 2 pure strategies, so (si, s2, S3) = (2,2,1). By Equation 8.4.15, the multihomogeneous root count is B = coeS(a2b2c\ (b + c)2(a + c)2(a + b)1) = coeft(a2b2c, b2{a + c)2(a + b)) + coeff(a2b2,2b(a + c)2(a + b)) = coeff (a2c, (a + c)2a) + coeff(a2b, 2a2(a + 6)) = 2 + 2 = 4 The explanation of the first line is that the exponents in a2fc2c1 match the dimensions of the space, P Sl x PS2 x PS3, on which we work, while those in the polynomial (b + c)2(a 4- e)2(a + b)1 match the number of equations of each type, which are also s\, S2, S3 by Equation 9.1.2. The factor (b + c)2 says that the two equilibrium equations for player 1 do not involve player l's bets while those of players 2 and 3 appear linearly, and similar factors come from the other two players' equilibrium conditions. It is clear, we hope, from this example how to generalize to other N and
Sj.
Another way to arrive at the same result is to examine the linear product start system. For the (si, S2, S3) = (2,2,1) game, the 3-homogeneous start system is / 2(
) 3( r player 1 equilibrium
(xi) x <x3) 1 , . .... . ; ; ; > player 2 equilibrium
(9.1.3) v
;
Wxwr
(xi) x (x2) } player 3 equilibrium. Among the 25 ways to choose one factor from each equation, we are limited to 2 choices each in x\ and x-i and only one choice for X3. Making the choice for x% first, which can only be done 4 ways, one may see that all the other choices are forced. The valid choices are
'fa)} (x2)
< te> \ (Xi)
[ten
[ten
\ (zi> >
\ te> >
te)
(Xi)
(x2)
(Xi)
[ter (x2)
< (xi) > (x3)
(9.1.4)
. (x2) ) I (x2) ) I (xi) ) \ (xi> . The disparity between the multihomogeneous root count and the total degree, here 4 and 32, respectively, grows rapidly with the size of the problem, for example, for N = 4 players having 4 pure strategies each, the 4-homogeneous Bezout count is 13,833, while the total degree is 3 12 = 531,441.
152
9.2
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Chemical Equilibrium
Imagine a reaction vessel, or an automobile engine, in which a mixture of chemical compounds are reacting. The compounds may break up and recombine into a myriad of intermediate species, settling down eventually to a final equilibrium mixture. While the transient behavior of the reaction is governed by differential equations, the final equilibrium conditions are well-modeled by a system of polynomial equations. The system typically has at least one real root with positive values for the concentrations of all the chemical species. It is possible for there to be more than one positive root, in which case the transient behavior determines which of several possible equilibria is reached. A basic presentation of modeling chemical reactions can be found in (Morgan, 1987), from which the following discussion is derived; more sophisticated treatments are given in (Feinberg, 1980). The variables in the system represent the molar concentrations of the species. The concentrations at a state of equilibrium are governed by two types of equations: conservation equations state that the total number of atoms of each element must stay constant (we assume a closed system), and reaction equations model the propensity of certain combinations of species to transform into each other. In such a model, a chemical reaction equation of the familiar form, such as H2O ^2H + O, gives rise to an equilibrium reaction equation governing the balance between the constituents on the two sides, in this case kXH2o = XHXO, where k is an equilibrium constant that depends on temperature. (Equilibrium constants for many reactions are available in standard tables, typically derived from laboratory experiments.) To go with this reaction equation, the conservation equations would be 2XH2O + XH = TH, XH2O + XO =
To,
where TH and To stand for the total amount of hydrogen and oxygen in the vessel. Notice that the coefficient of 2 on XH2O in the conservation equation for hydrogen comes from the fact that each water molecule has two hydrogen atoms. The conservation equations are always linear, and the reaction equations are polynomial. The three equations just given determine the equilibrium balance between water, hydrogen and oxygen in a simple model that ignores molecular hydrogen and oxygen, if 2 and OiMorgan presents a model (Model B in (Chapter 9 Morgan, 1987)) involving eleven species formed from oxygen, hydrogen, carbon and nitrogen. The reaction equations, given in standard chemical notation at left and in polynomial form at
153
Case Studies
right, are: 02 ^ 20
kiXO2 = Xo
(9.2.5)
H2 ^ 2H
k2XH2 = X2H
(9.2.6)
7V2 ^ 2N C02^0
k3XN2 + CO
=X
2
(9.2.7)
N
(9.2.8)
k4XCO2 = XOXCO
OH^O +H H2O^±O + 2H
k5XOH = XOXH k6XH2o=XoX2H
(9.2.9) (9.2.10)
NO^O
k7XNO=XoXN.
(9.2.11)
+N
There are four conservation equations: TH = XH + 2XH2 + XOH + 2XH2o
(9.2.12)
Tc=XCo + Xco2
(9.2.13)
TO = XO + Xco + 2X O2 + 2XCo2 + XOH + XH2O + XNO
(9.2.14)
TN=XN
(9.2.15)
+ 2XN, + XNO 6
These are eleven equations in eleven variables, with total degree 2 • 3 = 192. We could readily solve the system as given, but it is easy to reduce. The obvious move is to substitute from the reaction equations into the conservation equations to eliminate all variables except Xfj, Xo, XQOI a n d ^JV- This gives four equations of total degree 3 • 2 • 3 • 2 = 36. Note , however, that there is only one cubic monomial in the equations, which comes when we eliminate XH2O using Equation 9.2.10. So it is a simple maneuver to replace Equation 9.2.14 with 2To — TH = 2(Xo + Xco + 2Xo2 + 2Xco2 + XOH + X^o) —
+ 2XH2 + XOH)(9.2.16) After substituting from the conservation equations, the system of Equations (9.2.12, 9.2.13, 9.2.16, 9.2.15) has total degree 3 • 2 3 = 24. Now, let's see if any of the product structures can further reduce the number of homotopy paths. First, for convenience, we list the monomial structure of the equations: {XH
(l,Xo,XH,XoXH,Xfj,XoXff) (l,Xco,XoXCo) (l, Xo, XH, XCO, XQ,XH,
XOXH,XOXCO,
/g 2 1 7 \ XOXN)
\ 1, XN , XN, XOXN ) .
A four-homogeneous formulation gives a root count of 18, which is the lowest possible multihomogeneous count. We can improve on that slightly with a linear product
154
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
homotopy having the start system (l,Xo)x(l,XH)i2) (l,Xo) x (l,XCo) {1,XO,XH)
x
(1,XN) x
/Q2
lgx
(1,XO,XH,XCO,XN)
(1,XO,XN),
which gives a root count of just 16. As always, a sparse monomial homotopy would do just as well as the best linear product homotopy. In chemical equilibrium problems a significant numerical issue arises: equilibrium constants often have wide ranges of magnitude. For a temperature of 1000°, Morgan gives reciprocal equilibrium constants Ri = 1/fcj that range from 10 22120 to 10 47 ' 970 . It is essential to rescale the variables and the equations to work in double precision arithmetic. We will not discuss this issue here. The interested reader may refer to Morgan's treatment in (Morgan, 1987) or (Meintjes & Morgan, 1987), or study the implementation in the function scalepol distributed as part of HOMLAB. This problem is treated in the exercises of this chapter. 9.3
Stewart-Gough Forward Kinematics
A detailed description of Stewart-Gough platform robots and the associated forward kinematics problem has already been given as a case study in parameter continuation, § 7.7. However, the discussion there assumed that we had the solutions for some general member of the problem family which could then be used as the start system for parameter continuation. Here, we return to the problem to examine our options for solving the first example. The family of Stewart-Gough platform problems is a sub-family in the family of all systems of seven quadrics on [e, g] G P 7 . Any member of this family has at most 27 = 128 isolated solution points, and it is easy to write down an example with that many roots, a simple one being G(e,g)
= {e2 - e2, e 2 - e2,, e2 - e2, e2 - g2, e2 - g2, e2 - g22, e2 ~ g2} = 0. (9.3.19)
We immediately see that this system has exactly 128 solutions, all of the form = ±e0. The theory presented in § 8.3,8.4.1 shows that with ex = ±eo,...,g3 probability one, the solution paths of the homotopy H((e,g),t) = -ytG(e,g) + (1 - t)F((e,g);p0) = 0,
(9.3.20)
for any p0 and random 7 e C, will lead from the 128 solutions of G = 0 to a set of endpoints that contains all isolated solutions of F = 0 as t goes from 1 to 0 along the real line. The Stewart-Gough forward kinematic equations can be reduced to a form in which a linear-product decomposition yields a lower root count than the total degree. The reduction is based on the observation that the quadratic terms in g in all
155
Case Studies
six leg equations, Equation 7.7.10, are the same, namely gg'. Hence, if we subtract the equation for leg 1 from all the others, this term is eliminated from five of the equations. That is, the system becomes (9.3.21)
fo(e,g)=ge' = 0, fi(e,g) = ( M i + aia[ - L\)ee' + (gb'xe' + ehg') - (ge'a[ + aveg') - (e&ie'ai + axeb\^) + gg' = 0,
(9.3.22)
fi(e,g) = (bil/i + aia'i - L})ee' + (gbtf + ebig') - {ge'4 + a%eg') - {ehe'a'i + a ^ e ' ) = 0 ,
i = 2 , . . . , 6.
(9.3.23)
This system admits the linear product decomposition /o G {g ® e)
(9.3.24)
he({e,g}®{e,g})
(9.3.25)
/,efej}®«).
i = 2,...,6.
(9.3.26)
Consequently, we may use a start system of the form 9o € (g) x (e) Si€<e,s>
(9.3.27)
(2)
gi<E(e,g)x(e),
(9.3.28) i = 2 , . . . , 6.
(9.3.29)
The linear-product root count may be tallied up by noting that in picking one factor from each equation, we must never choose more than three of the form (e), because choosing four or more forces e = 0, and we wish to ignore any solutions on that degenerate set. Accordingly, if we pick the factor (g) in go, we may pick either of two factors in gx and among the remaining five equations, we may choose (e) from zero to three times. If instead we choose (e) in go, we must limit the last five equations to choose (e) at most twice. These observations give a root count of
4(HHK)H[0KMDH It is shown in (Wampler, 1996a) that the count of 40 for general Stewart-Gough platforms is due to the antisymmetry of the mixed quadratic terms. That is, if we write Equation 7.7.10 for leg i in the form, eTAte + 2eTBi9 + gTg = 0, where e and g are interpreted as 4 x 1 column matrices, then the 4 x 4 matrix Bi is antisymmetric, Bf = — Bj. [This can be seen in the quaternion formulation by noting that (gb'e' + ebg') = —{eb'g' + gbe'), and similarly for the other mixed terms.] Accordingly, any further reduction of the problem must take advantage of this property. A monomial product or sparse monomial homotopy does not account
156
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
for any relationships between the coefficients of the monomials, so these give 84 roots when applied to Equation 9.3.21. 9.4
Six-Revolute Serial-Link Robots
The solution of the inverse kinematic problem of general six-revolute, serial-link robots, once called the "Mount Everest of Kinematics" by renowned kinematician F. Freudenstein,1 is a milestone in the development of polynomial continuation. The problem is, given a stationary ground link and six subsequent moving links connected in series by rotational joints, find all sets of joint angles to place the final link in a given position, p, and orientation, {X7,y7,z7}, as schematically shown in Figure 9.1. The links are assumed to be rigid bodies, a good approximation for most industrial robots. The space of rigid-body displacements, E 3 x 50(3), is six-dimensional, which matches the dimensionality of the joint space, so we expect in general a finite number of isolated solutions to the problem. The stature of the problem justifies a historical synopsis, which may help to place the development of the continuation method in context with other approaches.
Fig. 9.1
Schematic six-revolute serial-link robot
The high points in the history of the problem begin in 1968 with (Pieper, 1968), who gave a formulation of the general problem having total degree 64,000. This 1 Ferdinand Freudenstein, Higgins Professor Emeritus of Mechanical Engineering, Columbia University
Case Studies
157
upper bound was substantially sharpened in 1973 to only 32 (Roth, Rastegar, & Scheinman, 1974), but it was not until 1980 that (Duffy & Crane, 1980) derived a reduction of the problem to a single polynomial of degree 32. This essentially solved the problem in the sense that good numerical methods exist for factoring a polynomial in one variable and also in the sense that one could solve a generic example and find the true root count. The count is only 16, since 16 of the 32 roots were extraneous ones introduced by the reduction process. However, at the time this was not fully appreciated and the prevailing attitude at the time was that the problem could not be considered fully solved until a reduction to single univariate polynomial of degree 16 was found. Besides, a numerical demonstration does not carry the full weight of mathematical proof. It was into this scene that, in 1985, (Tsai & Morgan, 1985) introduced the method of polynomial continuation to the kinematics community. They cast the problem as eight quadratics (total degree 256) and found that only 16 endpoints of the ensuing homotopy were valid solutions. Perhaps the most important contribution of that work was not the confirmation of the count of 16, but rather the demonstration that systems of polynomial equations could be solved reliably by numerical means. Work continued after that on two fronts: elimination methods and continuation. (Primrose, 1986) gave the first real proof of the root count of 16, by showing that the other 16 roots of the Crane-Duffy polynomial correspond to solutions at infinity for the intermediate joints. Morgan and Sommese (Morgan & Sommese, 1987a) showed that the Tsai-Morgan system had a two-homogeneous Bezout number of only 96, the first application of multihomogeneous continuation. Finally, in 1988, Lee and Liang (Lee & Liang, 1988) produced the long sought-after reduction to a univariate polynomial of minimal degree, although it was a complicated procedure. A simpler one was later given by (Raghavan k. Roth, 1993), and a numerical treatment of this reduction as an eigenvalue problem was given by (Manocha & Canny, 1994). Complementing all of these works, (Manseur & Doty, 1989) found an example with all 16 solutions being real. The reduction of a problem to a univariate polynomial of minimal degree has two payoffs: it proves an upper bound on the root count and it leads to a numerical solution. But it is not the only route to either of these. A system of equations that admits a sharp root count via a multihomogeneous formulation or a monomial polytope analysis suffices for proof, and continuation can provide the numerical method. We should not fail to mention the extensive work in computer algebra to compute Grobner bases as a means of proof; see (Cox et al., 1997) and (Cox et al., 1998) as a beginning point to the extensive literature on this. Any reduction of a problem to a Grobner basis can be converted for numerical solution to an equivalent eigenvalue problem (Auzinger & Stetter, 1988; Moller & Stetter, 1995). But even as late as Raghavan and Roth's paper, algorithms for computing Grobner bases were not capable of handling a problem as difficult as the six-revolute inverse
158
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
position problem. If we are willing to give up rigorous proof of the true root count, it is often convenient to find a "good enough" formulation of a multivariate system with a root count low enough that continuation can be reasonably applied. In this sense, with the tremendous increase in computer power of late, even Pieper's original formulation of total degree 64,000 might be considered within range. But we will proceed below to give a much more amenable formulation than that. The approach we give here, first published in (Wampler & Morgan, 1993), is of a different cloth than all the others we have mentioned. Those others begin with a formulation of the kinematics as a product of homogeneous transformation matrices (Denavit & Hartenberg, 1955), (Chapter 12 Hartenberg & Denavit, 1964). Reductions starting from that point lead to rather long algebraic expressions, as one can see from the cited references and (Chap. 10 Morgan, 1987). Instead, we write down a system mirroring closely the geometry of the problem, and proceed to solve it in its unmodified form. Let Zi € R3, i = 1 , . . . , 6, be unit vectors along the joint axes of a six-revolute serial-link chain; see Figure 9.1. The kinematic chain is completely described by finding the common normal between each pair of successive joint axes and listing three values: the "twist angle" ar between joint % and i + 1, the distance a, between these joint axes (a.k.a, the "link length"), and the distance d» (a.k.a., the "joint offset") between successive common normals. If none of the successive joints are parallel2, the common normal directions are Xi = z% x z i + i / s i n a j ,
i = l,...,5,
where "x" means the vector cross-product in 3-space. Then, the six-revolute inverse position problem can be written as the system Zi-Zi = l, Zi-zi+1=
cos on
i = 2,3,4,5
(9.4.30)
i = 1,2,3,4,5
(9.4.31)
5
(ai/sinai)zi x z2 + ^ ( d i z i + ( a l / s i n a i ) 2 i x z i + 1 ) = p,
(9.4.32)
i=1
where ft is a known vector from where the first common normal intersects joint 1 to where the last common normal intersects joint 6. The vectors z0, XQ, and z\ are known, being fixed in the ground, as are ZQ and xy, being fixed in the last link whose position and orientation is given. From these, and the known lengths and offsets of the links, ft is readily computed from p, and we take it as given. So we have 12 equations (vector Equation 9.4.32 is equivalent to 3 scalar ones) in 12 variables, which are the 3 elements each of £2,2:3,2:4^5. Although these vectors naturally live in R 3 , we will treat them as if they live in C 3 , by the usual embedding. 2
See (Wampler & Morgan, 1993) for how to handle parallel links.
Case Studies
159
Among the equations, two are linear and the rest are quadratic, for a total degree of 210 = 1024. Using the two-homogeneous groupings (I,z 2 ,z 4 ) and (I,z3,z5), we get a lower root count of 24(g) = 320. Although this is quite a bit inflated over the true root count of 16, it is low enough that we have no trouble tracking all paths by continuation. Then, we can solve subsequent examples with only a sixteen-path parameter homotopy. 9.5
Planar Seven-Bar Structures
One of the most prevalent classes of mechanical systems consists of planar links joined by rotational joints, also known as "pin joints," or simply "hinges." The axes of all the joints in the mechanism are perpendicular to the plane of motion. In reality, the links occupy three-dimensional volumes, and they can move in separate parallel planes, but for the purpose of analyzing their motion, only their projection onto one of these planes needs to be considered. Consider the seven-bar assembly shown in Figure 9.2, consisting of four triangles and three simple bars. (We call this the "type a" seven-bar, as there are two other topological arrangements of interest; see Exercise 9.6) For general dimensions of the links, such an assembly is a structure, meaning that it will be rigid. However, it is quite possible that if we disconnect a joint and reposition the pieces, we can reconnect that same joint with the links in new relative positions. The question of finding all such assembly configurations comes up in the study of related six-bar and eight-bar linkages which have internal motion.
Fig. 9.2 Seven-bar linkage, type a.
160
9.5.1
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Isotropic
Coordinates
Before presenting equations for the problem, we take a brief aside to explain "isotropic coordinates." Suppose we have a point in the real plane (a, b) £ R2. We can naturally associate to this point the complex number z = a + ib G C This is quite convenient because vector addition in R2 becomes just the usual addition of complex numbers, and the approach is known in the kinematics community as the "complex vector formulation." Moreover, a rotation around the origin through an angle 0 moves point z e C to a new point el@z. For brevity, we use the convention 6 := el&. In this manner, any rotation in the plane corresponds to a 9 6 C of unit magnitude, \9\ = 1. Now, suppose we extend (a, b) into C 2 by letting a and b take on complex values. Then, to preserve the convenient modeling of rotations by complex multiplication, we associate to (a,b) e C 2 the point (z,z) := (a + ib,a - ib) G C 2 . For reasons beyond the current discussion, the pair (z, z) are known as "isotropic coordinates." Note that z and z are complex conjugates if, and only if, a and b are real. Rotation through an angle 0 now gives the point (8z, 9~lz). Any vector loop equation written in terms of z and 9 has a corresponding equation in which z is replaced by z and 9 is replaced by 9~1. Alternatively, we may let 9 := 9~l, so that rotation is represented by the isotropic pair (9,9), with the extra equation 99 = 1.
9.5.2
Seven-Bar Equations
Without loss of generality, let us take the position of link 0 to be fixed; that is, assume 9o = 1. Then the squared lengths of the three simple bars can be written as i\ =(a0 + M i + b292){a0 + M i + b292),
(9.5.33)
l\ =(c 0 + a292 + b393)(c0 + a292 + b39s),
(9.5.34)
£l ={b0 + a393 + Mi)(&o + M s + M i ) ,
(9.5.35)
9^
= 1,
8292 = 1,
9393 = 1.
(9.5.36)
This is a system of six quadratics, for a total degree of 26 = 64. The system is bilinear when treated with the two-homogeneous partition {1,01,02,03} X {1,01,02,03}. In the corresponding linear-product start system, only choices of factors having three of each type of factor give finite roots, so the two-homogeneous root count is (3) = 20A sharp root count is obtained by matching the sparsity of the equations using
Case Studies
161
a linear product decomposition as follows: {1,01,02} x { l , M 2 } {1,02,03} X {1,02,03} {1,03,01} X {1,03,0!} {1,01} X {1,0!} {1,02} X {1,02} {1,03} X {1,03}
f q , w (9 5 37)
" -
Of the same 20 combinations of factors that gave start points in the twohomogeneous formulation, six now do not give solutions. For example, we cannot simultaneously choose the initial factor from the first, fourth and fifth equations, as we would then have three equations in only two variables: 9%, 02- From this, we see that the linear product homotopy based on Equation 9.5.37 has a root count of 14. Readers with a particular interest in planar linkages may wish to look at (Wampler, 2001) to see an alternative solution approach which, when applied to the seven-bar problems, converts them to eigenvalue problems of size 14, 16, or 18. 9.6
Four-Bar Linkage Design
We have already studied several systems concerning the kinematics of mechanisms and robots, namely, Stewart-Gough platforms, six-revolute serial-link robots, and planar seven-bar linkages. In each of these, the objective was analysis: given the mechanical structure of the links, we sought all assembly configurations. In this section, we study a simpler linkage, the planar four-bar, but we ask synthesis questions, that is, we seek structural dimensions of the links so that the four-bar produces a specified motion. Depending on the requirements set forth for the motion, we may face a system of polynomial equations ranging from easy to hard. The easy examples have been solved long ago by a variety of methods, but the most difficult, the nine-point path synthesis problem, stood for almost 70 years before being solved by modern continuation methods. The use of an efficient product structure was critical to that success. In all these examples, we begin with basic loop-closure equations and manipulate them into a form amenable to efficient solution by continuation. In earlier times, kinematicians usually declared a problem "solved" when an elimination procedure had been found for reducing it to a single polynomial in one variable, especially if that polynomial was of minimal degree (having no extraneous factors). Such a polynomial could then be solved by a variety of numerical methods. With the advent of continuation, this is no longer necessary, for we can reliably find all solutions to a system of multivariate polynomials. Part of the art in applying continuation is to make informed decisions about how much symbolic pre-processing to do before turning the problem over to numerical solution. With this in mind, let us take a look at some four-bar design problems.
162
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Fig. 9.3 Four-bar linkage. Heavy lines are rigid links, whereas thin lines are vectors. Open circles mark hinge joints, and hash marks indicate a stationary link.
9.6.1
Four-Bar
Synthesis
Except for the simple lever, perhaps the most ubiquitous linkage mechanism is the planar four-bar, Figure 9.3. It consists of four rigid planar bodies connected in a loop, with one link, the "ground link," held in fixed position. A set of three links connected in such a loop would form a rigid triangle, a fundamental structural component in bridges and the like. In contrast, a hinged quadrilateral deforms, making it useless for structures but leading to a multitude of applications in machines that perform useful motions. In particular, points such as A and B on the two links adjacent to the ground link trace out circles centered on the fixed hinge points Ao and BQ, respectively, while points such as C on the "coupler link" opposite the ground link generally trace out sixth-degree curves. Linkages where one or more of the hinge joints are replaced by linear (slider) joints are also four-bars, but we will not discuss them here. The motion characteristics of four-bars can be used in several ways. Most applications fall into one of the following categories: Function Generation In this case, the purpose of the four-bar is to transfer an input rotation at one ground pivot to the other. If the four-bar is a parallelogram, this does nothing more than duplicate the input motion at the output side (transferring power in the process), but if the linkage is a general quadrilateral, some reshaping of the motion takes place. That is, a uniform rotation speed at the input gives a nonuniform speed at the output, which can be very useful. Quite often, a steady rotation at the input is converted to an oscillatory output. A windshield wiper operates on this principle, for instance. Path Generation In this case, there is a designated point on the coupler link, where we might place the tip of a tool for the machine to do its work, and so the path traced out by this tool is of top concern. The designated point is called the coupler point and its path is called the coupler curve. The motion of the foot of a simple walking machine might be generated in this way (assuming a
163
Case Studies
ball-shaped foot so that only its center position matters, not its orientation). Body Guidance In this case, the entire motion of the coupler is at stake, both position and orientation. Such a machine might scoop up material in one location, carry it without spillage to deposit the contents in a second location. A four-bar might guide the motion of the scoop. Four-bar synthesis means that we specify at the outset the desired motion, and seek to find a four-bar that will produce it. Synthesis is the inverse process of analysis, which seeks to describe the motion characteristics of a given mechanism. We will proceed to write out the basic equations of four-bar motion, which can be employed for analysis and for various kinds of synthesis, depending on which quantities are given and which are treated as unknowns. We will then describe several synthesis problems. Among these, the most challenging are path-synthesis problems, and as we shall discuss in some detail, the most challenging of all is the synthesis of a coupler curve to pass through nine given points. 9.6.2
Four-Bar Equations
The kind of synthesis problems we treat here are called precision-point methods, because we give a certain number of points through which the coupler curve must pass precisely or a number of locations through which the coupler must guide a body. So in the following equations, we use an index j to denote the configuration of the four-bar at the j t h precision point or precision position. Referring to Figure 9.3, vectors a and b describe the locations of the fixed pivots with respect to the origin O, vectors u and v are the links connected to ground at these pivots having rotations 4>j a n d 4>j-> respectively, pj is the vector from the coupler point C to the origin, x and y tell the location of the rotational joints in the coupler link, while Bj is the rotation of the coupler link. Quantities a, b, u, v, x, y do not change as the four-bar moves, while 4>,ip,6,p do change and hence have a subscript j in our formulae. Without loss of generality, we may assume 6>o = 4>o — V'o = 1) because the initial orientation of the links can be absorbed into the orientations of the vectors u,v,x,y. The four-bar can be viewed as consisting of a left "dyad," a,u,x and a right dyad, b,v,y, that are rigidly connected at the coupler point. In the following equations, we will use isotropic coordinates to represent vectors in the plane. Recall from § 9.5.1, where we give more details, that a vector from (0,0) to (ao,ai) in the plane is represented by isotropic coordinates as (a,a) := (a0 + iai,ao — ia,i). Summing vectors around the left and right dyads, we have loop equations for the j t h position, as pj+x6j+u(pj+a
= 0,
Pj+ydJ+vipj+b = O,
pj + x6~1+u
pj+yej
l
+vtpj +1 = 0.
(9.6.38) (9.6.39)
164
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
From these basic equations, we can define a wide variety of synthesis problems, varying in how many positions are prescribed and which of the symbols in the above equations are known quantities versus variables. 9.6.3
Four-Bar
Analysis
Before proceeding to the synthesis questions, let's look at the analysis question of determining the motion of a given four-bar. This will come down to nothing more than solving a quadratic equation, but we include it for background in case the reader wishes to animate any of the linkages synthesized in the subsequent sections. We assume that in Equations (9.6.38) and (9.6.39) we know the shapes of the links as given by x,y,u,v,a,b
and x, y, u, v, a, b. This leaves five unknowns, pj, pj, Oj, (pj, xjjj
and since we have just four equations, we expect a solution curve. One way to plot the curve is to rotate the left input link through a sequence of closely spaced angles, say $j = 0,1°, 2°,..., 360° and solve for the other four variables. First, eliminate Pj and pj by subtracting one equation from another to get {x-y)ej+u
= 0,
(x-y)ejx+u(j>~1 -vi/j~1+a-b
= 0. (9.6.40)
Next, we eliminate 6j to get [x - y)(x -y) = (u(j)j - vipj +a- b){u(j)~l - vipj1 +a-b).
(9.6.41)
Since
We should mention that the engineering analysis of a four-bar under consideration for a real machine would encompass much more than just plotting its motion curve. One would need to consider, for example, the forces transmitted through the links. This and other considerations are beyond the scope of the present discussion. 9.6.4
Function
Generation
For function generation, we prescribe pairs (
165
Case Studies
(9.6.39) to get, for j = 0 , . . . , n,
x9j
x9j + ucj)j = vipj + 1,
(9.6.42)
l
(9.6.43)
l
1
+ u(f>J = vt/jj + 1.
This leaves as variables u, u, v, v, x, x and 9j, j = 1 , . . . , n, since we assume 90 = 1. Equations (9.6.42) and (9.6.43) for j = 0 , . . . ,n are 2(n + 1) equations in 6 + n variables, so n < 4. This implies that we can specify up to five pairs of angles, ((j>j,ipj), j = 0 , . . . , 4 , and still expect to find four-bars that exactly interpolate them. The system of Equations (9.6.42) and (9.6.43), j = 0 , . . . ,4, after clearing the negative exponent on 9j, consists of 8 quadratics and two linear equations for a total degree of 64. We leave it as an exercise to show that the system has a multihomogeneous formulation with a root count of only six. We could solve this using continuation, preferably using a sparse linear solver in the path tracker since only a few variables appear in each equation. An alternative is to reduce the number of variables. Eliminating 9j between Equations (9.6.42) and (9.6.43) and then using the equation for j = 0 to eliminate xx from each of the others, one obtains, for j = 1 , . . . , 4, the single equation (-uct>j+vipj + l){-u(l)-l+v^Jl
+ l) = (-u
It is now easy to see that the total degree is 2 4 = 16, whereas a two-homogeneous structure {u, v, 1}
9.6.5
Body Guidance
This time, we are given positions (pj,pj) and orientations 9j of a body, j = 0 , . . . , n. We want to find four-bars which carry this body through these locations while it is rigidly attached to the coupler link. The Equations 9.6.38 for the left dyad are decoupled from those for the right dyad, Equations 9.6.39. In fact, they are exactly the same form, so if we find multiple solutions to Equations 9.6.38, we can choose one of them for the left dyad and one for the right dyad to form a four-bar that guides the body through the specified locations. For the left dyad, we have 2(n +1) equations in the 6 + n variables x,x,u,u,a,a and
166
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Burmester points, and points (—a, —a) are the Burmester centers, named after the man who first solved the problem (Burmester, 1888). Eliminating
(9.6.45)
Using case j — 0 to eliminate uu from the others, we have, for j = 1 , . . . , 4, (Pj + x9j + a)(pj + x9~x +a) = (p0 + x90 + a)(p0 + X8QX + a).
(9.6.46)
This is almost identical in form to Equation 9.6.44, except this time the constant term does not cancel out since pjpj ^ PoPo- Thus, the system has total degree 24 = 16, two-homogeneous degree (2) = 6, and only 4 finite roots. (The same two roots at infinity exist as in the function generator problem.) In fact, one classical approach to solving the function generator problem is to use the principle of kinematic inversion to convert it to the Burmester body guidance problem, but we will not go into that here.
9.6.6
Five-Point Path Synthesis
Many different path synthesis problems can be formulated, depending on what additional information is given besides the path points (pj,pj). One version is to give the ground pivots {a, a, b, b}. The simplification of the equations is exactly as for body guidance, except we must simultaneously consider both the left and right dyads, since 93 is unknown. Thus, the system to be solved is, for j = 1 , . . . , n, (Pj + x6j + a){pj + x9~l + a) = (p0 + x + a){p0 + x + a), (Pj + y9j + b) (pj + y6j * +b) = (Po + y + b)(p0 + y + b),
(9.6.47) (9.6.48)
where we have used 60 = 1. For five precision points, i.e., n — 4, we have eight equations in the unknowns x, x, y, y and 9j, j = 1 , . . . , 4. Expanding the products and cancelling terms, we have the system 8M0j(Pi + a) " (Po + a)] + x{6j\p3
+ a) - (p0 + a)}
+(Pj + a)(pj +a)~ (po + a)(po + 0)) = 0, 03{y[03(Pi +b)-
(Po + b)\ + y[9-\Pj
(9.6.49)
+ b) - (Po + b))
+(Pj + b){p3- + b)- (p 0 + b)(p0 + b)) = 0, where the 9j multiplying each equation clears negative exponents. cubic equations, for a total degree of 3 8 = 6561. This obviously ple monomial structure of the system. We can do much better Equation 9.6.49 has the monomial structure {x,x,l} x {l,9j,9^} Equation 9.6.50 has the monomial structure {y, y, 1}
(9.6.50) This gives eight misses the simby noting that and similarly, The reader may
Case Studies
167
The monomial structure is in truth sparser than the product structure just given appear in Equation 9.6.49, would imply. Only the monomials {X6J,X6J,X,X~6J,0J} and Equation 9.6.50 has a similar pattern. This allows solutions of the form x = y = 0j = 0, so it is clear that the root count is lower than 96. In fact, the polyhedral mixed volume yields a root count of 36, which is sharp. 9.6.7
Nine-Point Path Synthesis
In 1923, Alt (Alt, 1923) noted that the extreme path-synthesis problem for four-bars is to specify nine points on the coupler curve. Compared to the six-revolute seriallink problem, this one has a longer chronology, but a shorter historical account. The problem has so far proven to be invulnerable to reduction by hand, and it seems no one as yet has made a serious attempt at it using computer algebra. To date, the problem has only been solved by polynomial continuation. After Alt, the main advance came in 1962, when Roth (Roth, 1962) (Roth & Preudenstein, 1963) abandoned analytical methods and invented an early form of the continuation method, which he called the "bootstrap method." The work was done using real variables, so Roth invented heuristics to work around difficulties which we now recognize to be solution paths that meet and branch out into complex space. Most bootstrap paths never found a solution, but nevertheless, the approach did produce for the first time linkages to interpolate nine specified points. After the invention of the cheater's homotopy (see § 7.8), Tsai and Lu (Tsai & Lu, 1989) used a heuristic version of it to improve the yield of solutions, but a complete solution was not found until 1992, by Wampler, Morgan, and Sommese (Wampler et al., 1992). A follow-up discussion of this article (Wampler, Morgan, & Sommese, 1997) showed how the approach could be specialized to design symmetric four-bar coupler curves with a maximal specification of precision points (five points plus the line of symmetry). The system of equations is exactly the same as Equations (9.6.49) and (9.6.50), except now a, a, b, b are unknown and the index ranges over j = 1 , . . . , 8. Accordingly, the system has the product structure, for j = 1 , . . . , 8, (l,x,x,a,a,ax,ax){l,0j,0?),
(9.6.51)
(l,y,y,b,b,by,by) {1^3,6*).
(9.6.52)
Using the fact that four general equations in the monomials {l,x,x,a,a,ax,ax} have just 4 solutions (hint: introduce new variables n = ax and ft = ax), one sees that this system has a root count of 212(®) = 286,720. This is the root count of the formulation used to solve the problem in (Wampler et al., 1992), which at the time was probably the largest polynomial system ever solved. This is a case where symmetry can play a helpful role. It is easy to see that swapping (x, x, a, a) with (y, y, b, b) leaves the equations reordered but otherwise unchanged. If we can arrange our start system to have this same two-way symmetry,
168
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
we can track just half the number of paths. This can be done by using the same random coefficients for the factors in Equation 9.6.51 as in Equation 9.6.52. Thus, the problem can be solved using only 143,360 paths. This is far from the end of the story. The system has numerous solutions at infinity. Moreover, if (x, x,a, a) = (y,y,b,b), Equation 9.6.49 and Equation 9.6.50 are identical, so there is a positive dimensional solution component obeying this relation. Many continuation paths terminate on this singular set. Actual solution of the problem showed that there were only 8652 nonsingular solutions, appearing in 4326 pairs due to the two-way symmetry. Since the two-way symmetry amounts to nothing more than swapping the labels between the left and right dyads of the mechanism, we may say that there are 4326 distinct four-bars that interpolate nine general points. Moreover, these appear in triplets, called Roberts cognates, which not only go through the nine points but have exactly the same coupler curve. This means there are just 1442 distinct four-bar coupler curves that pass through the points. By using parameter continuation, we can solve subsequent examples using only 1442 paths, about a 100-fold reduction from the 143,360 used to solve the first example. When dealing with a very sparse system like Equations (9.6.49) and (9.6.50), it is often advantageous to eliminate some variables. This is because one of the main costs of the continuation method is solving the linear systems for Euler prediction and Newton correction. The cost of linear solves grows as 0(n 3 ) with the number of variables, unless sparse solving methods can be applied. In the problem at hand, we can eliminate all the 8j variables without increasing the root count, thereby increasing efficiency when using a linear solver for full systems. The elimination is accomplished by applying Cramer's rule for linear systems. The system
at9 + a20~l + a3 = 0,
M + w-i + p3 = o,
(9A53)
SiS2 + Sj = 0,
(9.6.54)
has solutions only if
where 6l =
ft ft '
* = ft fc '
S
> = ft ft '
(9-6'55)
Applying this to Equations (9.6.49) and (9.6.50) gives a system of 8 equations with the monomial product structure {xd, xa, x, x, a, a, 1 } ( 2 )
(9.6.56)
This reduced system has been the subject of further study. The mixed volume of the reduced version of the system, computed by Verschelde (Verschelde, 1996)
Case Studies
169
(Verschelde et al., 1996), was found to be 83,977. The best root count known was found by applying polynomial products (Morgan et al., 1995). The approach is to observe that Equation 9.6.54 admits the product decomposition {5i, 53} ® {52S3}. A homotopy based on this decomposition has 18,700 paths appearing with twoway symmetry so that only 9,350 paths must be tracked. However, the start system itself must be solved by continuation since the subsystems obtained by choosing one factor from each equation are not linear. The whole computation requires 24,300 paths. Although this is a substantial reduction in the number of paths, it requires a specialized computer program, so one may prefer to use a general purpose algorithm with more paths. No matter which method is used to solve the first random example, considerable efficiency is to be gained in subsequent examples by applying parameter continuation to track only 1442 paths.
9.6.8
Four-Bar
Summary
The purpose of this discussion of four-bar linkages is to show a spectrum of problems, having multihomogeneous root counts ranging from six to 286,720. Each geometric problem can be formulated in several ways as an algebraic system to be solved, and each algebraic system can be placed in any one of several homotopies for numerical solution. Generally, a well-chosen multihomogeneous formulation yields a root count considerably lower than the total degree, while the mixed volume of the Newton polytopes gives a somewhat lower root count. General linear products in which different equations have different linear decompositions are not useful for these synthesis problems, because they have the same monomial structure at each precision point. For the hardest problem in the set, the nine-point problem, symmetry can cut the number of paths in half, while polynomial products give the smallest root count at the expense of a more complicated computer program. Even with that approach, the number of continuation paths is more than ten times as large as the actual number of isolated roots. Only parameter homotopy can solve the problem using only 1442 paths, but we need to bootstrap the process by solving one example with one of the other homotopies. In several examples, we see that there is more at stake than just the number of homotopy paths. We can choose between two homotopies having the same root count, one having sparse equations in many variables and the other having some variables eliminated but less sparsity. Which is more efficient depends on the details of how function evaluation and linear solving are computed. To be efficient, the large, sparse formulation of the equations requires sparse linear algebra routines in the path-tracking code. On the other hand, elimination of variables tends to raise the degrees of the equations that remain, which can adversely affect the numerical stability of the equations.
170
Numerical Solution of Systems of Polynomials Arising in Engineering and Science Table 9.1 Equilibrium Constants Iog10(l/fei) Iog 10 (l/fc 2 ) Iog 10 (l/fc 3 ) Iog 10 (l/fc 4 ) Iog 10 (l/fc s ) Iog 10 (l/fc 6 ) Iog 10 (l/fc 7 )
9.7
Constants for the chemical equilibrium model, Exercise 9.2 T = 1000° 24.528 22.206 47.970 24.942 22.120 46.989 32.187
T = 3000° 7.289 6.997 15.107 6.825 7.208 14.680 10.285
Total Concentrations
T = 6000° 3.108 3.270 6.942 2.559 3.541 6.791 4.878
To TH Tc TN
5.e-5 3.e-5 l.e-5 l.e-5
|
Exercises
Exercise 9.1 (Nash Equilibria) (1) Compute the generic number of Nash equilibria for the following cases: (a) 3 players, 3 pure strategies each; (b) 4 players, 2 pure strategies each; (c) 3 players with (4,3,3) pure strategies, respectively. (2) Let Nash(N, S) be the generic number of Nash equilibria for N players having S strategies each. Derive a recursive formula for Nash(./V, 2) in terms of Nash(AT — 1,2) and Nash(iV - 2,2). Use it to find Nash(7,2). (3) Write a code to compute Nash(iV, {Si,..., SJV}), where player i has Si pure strategies. Compute Nash(5, {4,3,3,2,2}). (4) Use HOMLAB'S routine bezno to find Nash(3, {4, 3,3}). In general, routine bezno is not an efficient way to perform such a count, because it works by forming and solving a linear-product start system. Demonstrate this by using it to count Nash(6,2). What goes awry for a larger number of players? Exercise 9.2 (Chemical Equilibrium) This exercise concerns the chemical system of § 9.2. Data for this problem is given in Table 9.1. (A typographical error in Morgan's Table 9-2, corrected here, reverses the constants Tc and TH•) (1) Carefully verify the 4-homogeneous and the linear-product root counts given in § 9-2. (2) Find a 3-homogeneous formulation that also has a root count of 18. (3) Follow the steps outlined in § 9.2 to derive expressions for the coefficients of the monomials listed in Equation 9.2.17 in terms of the mass conservation parameters TH, To,Tc,TN and the equilibrium constants k\,..., k7. (4) Use routine chemsys in HOMLAB to compute solutions to the system. First, choose random coefficients. Try the different start systems. Do you get the same number of finite roots each way? What do the roots at infinity look like? (Hint: s t a t s (4,:) indicates the multiplicity of roots as determined by the end game. See Chap.10.)
171
Case Studies Table 9.2 Concentrations for T = 1000°, the only physically meaningful answer Components Xo XH Xco XN
Concentration
Compound
Concentration
1.4911556-015 3.212064e-019 7.664381e-016 2.314587e-027
Xo2 XH2 XN2 XCo2 XOH XH2O XNO
7.499733e-006 1.657938e-015 4.999735e-006 1.000000e-005 6.314036e-012 1.500000e-005 5.308800e-010
(5) Compute the solutions for random parameters. Is the result the same as for random coefficients? (6) Compute the solutions for T = 6000°, 3000°, and 1000°. How many physically meaningful roots are there (concentration values must be real and nonnegative)? (7) The test in chemsys for real solutions checks if the imaginary part of the concentrations is less than 10~6. Why is this not an adequate test for this problem? Can you devise a better one? Can you spot complex conjugate pairs in the list of "real" solutions? (8) Try turning off scaling for T = 6000° and see what happens. What do you think will happen for T = 3000°? Try it and see. (9) (Open ended.) Why is T = 1000° so difficult? Can you devise a strategy to treat this problem more easily? The sole physically meaningful answer for T = 1000° is given in Table 9.2. Exercise 9.3 (Stewart-Gough by total degree) Try running the Matlab file stewart/sgtotdeg.m to solve the forward kinematics of a general 6-6 StewartGough platform. (1) Confirm that among the 128 endpoints of the total-degree homotopy, 88 lie on the afHne algebraic set {(e, g) : e = 0, gg' = 0}. (2) The degenerate points are all singular. Why? (3) Save the 40 nonsingular roots and use them as start points for parameter homotopy, as directed in Exercise 7.4. Exercise 9.4 (Stewart-Gough by LPD) HOMLAB provides a routine, called lpdsolve, that creates a linear-product start system for a given product structure and tracks the resulting homotopy paths. The user must provide an m-file function that computes the function value f(x) and its Jacobian matrix df/dx. The script file stewart/sglpdhom.m does all of this for Stewart-Gough forward kinematics problems. (1) Run sglpdhom and check that it tracks 84 paths and obtains 40 nondegenerate solution points for a general 6-6 platform.
172
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
(2) The routine warns that the start system has 30 singular solutions of the form e = 0. Can you see why these are present and why there are 30 of them? (Hint: they are nonsingular roots for some choice of factors in the start system G = {go, • • • ,ge}, but singular as solutions of G.) Exercise 9.5 (Six-Revolute Inverse Position) The following exercises pertain to the system Equations 9.4.30-9.4.32. (1) Confirm the two-homogeneous root count of 320. (2) Run routine sixrevl in HOMLAB and check that there are 16 finite roots. (3) If your computer is fast enough, modify sixrevl to solve the system with a total-degree homotopy (1024 paths) and reconfirm the root count. (4) Run routine sixrev2, which uses a 16-path parameter homotopy, on the following and observe the number of finite roots and the number of real roots. (a) (b) (c) (d)
a random, complex example, a random, real example, the Manseur-Doty example, a real example with intersecting "wrist" axes: a4 = d5 = a5 = 0.
Fig. 9.4
Seven-bar linkage, type b.
Exercise 9.6 (Seven-Bar Structures) The structure in Figure 9.2 is one of just three topological arrangements of seven links in a structure that cannot be solved by analyzing a five-bar or three-bar substructure. The other two are shown in Figures 9.4 and 9.5. (1) Derive equations for each of the seven-bar structures in Figures 9.4 and 9.5 and find linear product decompositions having root counts of 16 and 18, respectively.
Case Studies
Fig. 9.5
173
Seven-bar linkage, type c.
(2) Create a single program using HOMLAB to solve any of the seven-bar structures with a 20-path two-homogeneous homotopy. Solve a random example of each type and verify the root counts of 14, 16, and 18. (3) Create individual programs for the three cases using linear-product decompositions having the minimal number of paths. Run the same examples as you used in the previous item and verify that the same solutions are found.
Exercise 9.7 (Four-bar Function Generation) (1) Clear the negative exponent from Equation 9.6.43 and show that the system Equations (9.6.42)' and (9.6.43), j = 0, . . . , 4 , has a six-homogeneous root (x,u,v,l), and (6j,l), count of four. (Hint: use the groupings (x,u,v,l), j = 1,2,3,4.) (2) Confirm the root count of four for the system of Equation 9.6.44, j = 1, 2,3, 4. (3) Use routine f cngen in H O M L A B to synthesize some function generators. Remember that a real linkage has u* = u and v* = v, where "*" is complex conjugation. (a) Let *j- = $2 for $ , = {0.0,0.1,0.2,0.3,0.4}. Set (<^-,Vj) = (e'^.e***). (b) Do the same except *_,• = sin($j) for $.,• = {-1.0,-0.5,0.0,0.5,1.0}. (c) Construct an original example. How many real solutions are there in each case? (4) For real linkages synthesized in the previous item, plot angle * versus $ on a fine grid and animate the motion of the linkage. (5) Write a program to use H O M L A B to solve the six-homogeneous version of the problem from item (1) above.
174
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Exercise 9.8 (Four-bar Body Guidance) (1) Use routine burmest in HOMLAB to synthesize some dyads for body guidance. (a) Let 9j = {e~u, 0,0,0, elu}, Pj = {-2, - 1 - 0.5i, 0,1,2 + i}, and Pj = p* (complex conjugate). (b) Construct an original example. (2) A four-bar is obtained by combining a left with a right dyad. For the problems above, pick two real solutions and use one as the left dyad and one as the right dyad. Sketch the four-bar linkage at each of the five given positions. (3) What is the maximum number of distinct four-bars that can guide a body through five general positions? (4) Confirm that the only monomials appearing in Equation 9.6.46 are {xa, ax, x, x, a, a, 1}. Show that by introducing new variables s = xa and s = ax, one can reformulate the problem as six equations of total degree four. (This trick is similar to one due to Bottema (ch.8, §5 Bottema & Roth, 1979).) Exercise 9.9 (Five-Point Path Synthesis) • Use HOMLAB to solve the five-point problem using a six-homogeneous formulation with 96 paths. You may wish to write a script to form the equations in "tableau" form and then apply mhomtab. Determine the number of endpoints that are (1) at infinity, (2) singular, (3) finite and singular, (4) contained in (C*)8. • Using the results of the previous run, construct a parameter homotopy to solve subsequent problems in this family using as start points only the solutions in (C*)8. • Explore the symmetric five-point problem in which the fixed pivots and the precision points are placed with mirror symmetry about the vertical axis. Instead of writing equations specialized to the symmetric case, just use the general formulation with symmetric data. (Hints: Let the zero-th precision point be on the vertical axis. Also, note that in isotropic coordinates, (a, a) being mirrorsymmetric to (b,b) means (b,b) = (—a,—a).) What is the generic number of symmetric solutions? • How can you set up a parameter homotopy that preserves symmetry? How many paths must be tracked? • Find a formulation of the symmetric problem that uses just half as many variables and equations. Program it in HOMLAB and verify that you get the same results as using the general formulation with symmetric data. • Solve the case of a = 0,6 = l,p = (0.765 + 0.735i, 0.935 + 0.595i, 1.335 + 0.595i, 1.685 + 0.945i, 1.08 + 1.05i), with "real" data, meaning a = a*, etc.
Case Studies
175
Verify that one of the solutions has x w 0.71477 + 1.3365i. How many "real" solutions are there? • For some real solutions, plot the coupler curve and verify that it passes through the specified points. A "circuit defect" is said to occur if the real coupler curve has two circuits and some precision points fall on each. Find examples with and without circuit defects. Can you find an example having multiple real solutions without circuit defects? • Download one of the publicly available packages that implements polyhedral homotopy and use it to solve the five-point problem.
Chapter 10
Endpoint Estimation
In earlier chapters we studied polynomial homotopies H(z, t) : CN x C ^ C ^ with t going from 1 to 0. In this chapter we investigate the last part of the continuation procedure as t goes to 0. This is called the endgame in the continuation algorithm. In § 10.1, we look at nonsingular solutions of H(z,0) = 0, the system we want to solve. For these solutions Newton's method 1 is excellent. In § 10.2, we look at the situation of singular roots of H(z, 0) = 0. For these solutions, we follow (Morgan, Sommese,
177
178
10.1
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Nonsingular Endpoints
Assume that x is an isolated nonsingular solution of H(z, 0) = 0, i.e., that H(x, 0) = 0 and that the Jacobian dH/dz is an invertible matrix at (x, 0). Then we know that applying Newton's method to H(z,0) = 0 starting at any point (x',0) sufficiently near x will converge quadratically to (x,0). Given a homotopy continuation path z(t) with limt^o z(t) = x, the usual prediction-correction methods described in § 2.3 work well. The final prediction to t — 0 provides the initial guess (x',0) for Newton's method. Usually, it is not difficult to decide that the limit (x, 0) is a nonsingular solution. Convergence itself is a good indicator that the solution is nonsingular, but a surer test is to examine the condition number of the Jacobian at the endpoint. If the solution converges, as indicated by a small step in the final Newton iteration, and the condition number is mild, then the solution can be confidently declared nonsingular. Because convergence behavior and the condition number can be affected by poor scaling and high degrees, the definition of "mild" is problem dependent. Histograms of condition numbers for all the solutions of a problem can be very useful in such judgements, as described further in Chapter 11. If the solution does not converge well, the condition number computed at the solution estimate might not accurately reflect the condition number at the true solution. Because of this, one cannot confidently tell the difference between a cluster of nonsingular roots, each having a rather high condition number, and an inaccurate estimate of a true multiple solution. One way to clarify the situation is to increase the digits of accuracy of the computation. If the solution is truly nonsingular, then a sufficient level of accuracy will eventually reveal this. One can even apply interval arithmetic (see § 6.1) to obtain proof that a suspected nonsingular solution really is nonsingular. However, one cannot prove a solution is singular in this way; that is, higher accuracy arithmetic applied to a truly singular solution will increase the condition number at the estimated solution, but it likely will never show exact singularity. The interval evaluation of the Jacobian matrix will show that a singular matrix is within the bounds of the computation, but that does not prove singularity. One must finally stop at some level of accuracy and accept the judgement that the solution is singular to that level of approximation. This moves us into the realm of singular endgames, discussed next. As a practical matter, it is not necessary to determine if a solution is singular or not to estimate it well. This is because the singular endgames that follow work equally well on nonsingular endpoints. Thus, to keep a computer code simpler, one may apply the singular endgame to all paths and judge singular vs. nonsingular afterwards, according to the results.
Endpoint Estimation
10.2
179
Singular Endpoints
When the endpoint of a solution path is singular, there are several approaches that can improve the accuracy of its estimate. All the singular endgames are based on the fact that the homotopy continuation path z(t) approaching a solution of H(z,0) = 0 as t —> 0 lies on a complex algebraic curve containing (x,0). In this section we collect the facts that follow from this and underpin the methods. In particular, we will see that the methods become valid only after the path z(t) has been tracked into an "endgame operating zone" around t = 0. For very singular endpoints, this operating zone may only be reached by increasing the number of digits used. In § 10.4, we discuss in fuller detail what happens if one computes an estimate while still outside of this operating zone. Since this chapter is about local behavior of holomorphic functions, our homotopies H(z,t) will usually only need to be assumed holomorphic and not algebraic. In § 10.2.1, we collect all the assumptions we use in one place.
10.2.1
Basic Setup
Assume that H(z, f ) : [ / x A - ^
{(z,t)£UxA
H(z,t) = 0},
i.e., X is the closure of a connected component of the set of points of X with neighborhoods biholomorphic to an open set of C; and (4) the projection TT : U x A —> A restricts to a proper holomorphic surjection •KX • X -> A with 7^(0) = (x,0). At first sight this seems like a large number of assumptions that might be difficult to check! The crucial observation is that all the polynomial homotopies H(z,t) = 0 considered in this book fall into this setup. Indeed, if we are tracking a path z(t) starting at a nonsingular root z(l) of H(z, 1) = 0 and are trying to estimate the root of 7i(z,0) = 0 as z(i) —> x := 2(0), then the path is part of a one-dimensional irreducible component X of
{(z,t)eUxA
H(z,t) = o}.
By choosing small enough neighborhoods U and A and taking H(z, t) := 7iux&{z, t) and X to be the irreducible component of X n (U x A), all the hypotheses of the basic setup are satisfied.
180
Numerical Solution of Systems of'Polynomials Arising in Engineering and Science
In simpler terms, we know that the paths in our polynomial homotopies remain nonsingular for t £ (0,1], so each path is one-dimensional and makes a steady advancement as t goes to zero. The defining equations for the homotopy are all polynomial, so the path is a complex analytic set. This is the essence of the conditions stated above as applied to polynomial continuation.
10.2.2
Fractional Power Series and Winding Numbers
We have the following consequence of Corollary A.3.3. Recall that Ar(a) c C means the disk of radius r centered on a. Lemma 10.2.1 Assume that we are in the basic setup above. There is a neighborhood V c X of (x, 0) € X, a positive number r > 0, and a holomorphic mapping
•
We call c the winding number of X at (a;, 0). Given an isolated solution (x, 0) of H(z, 0) = 0, there is a positive e e R such that for 0 < t < e, H(z, t) = 0, considered as a system in z, has only nonsingular solutions in the vicinity of (x, 0). From this, it follows that the multiplicity of the solution as a solution of H(z, 0) = 0 is the sum of the winding numbers of the one-dimensional irreducible components of the solution set of H{z,t) at (x,0). The nonsingularity condition is satisfied automatically for many algebraic systems. Note that since the components Zi{4>(s)) are holomorphic functions of s, they can be expressed as convergent power series of s. We can consider these as fractional power series in t1^0. For the above representation of the components of z(t) to hold we must be within a disk A r c := {t e C | \t\ < rc}, such that Tfxnir-1(Ar ) has either no branch point (in which case c = 1) or a branch point at (x, 0). We refer to rc as the endgame convergence radius. A good way to visualize the situation is to consider what happens when we track a solution path as t circles the origin in the complex (Argand) plane at a real radius r, say as t = retB as 9 goes from 0 to 2TT. We start at z$ satisfying H(zo,r) = 0 and follow the path implicitly defined by H{z,rel9) = 0. For example, the reader may think about H(z, t) = z2 — t(r] — t) with 77 a small positive number. For almost all r, paths satisfying the basic setup above will remain nonsingular as we continue around such a circle, returning at the end of the loop either to z0 again or to a distinct nonsingular solution z\. For the example H(z,t) = z2 — t{j] — t), paths will remain nonsingular except for r = n, and we will go from z0 = y/t(rj — t) to z i = "V^l 7 ? ~ 0 o r t o z i = \A(^ ~ *) depending on whether r < r\ or r > 77. We
Endpoint Estimation
181
may then proceed around the circle again and again to return to solutions z%, Z3,... Since there are only a finite number of nonsingular solutions, after some number of such loops, the solution path must return to the original point; that is, for some k, we have Zk = z0. In the example, H(z, t) = z2 — t{j] — t), k = 2 or k = 1 depending on whether r < r\ or r > r\. Considering this whole process again at a slightly smaller radius r', we generally expect the same picture again, meaning that we get a sequence of return points z'o, z[,..., z'k with z'k = z'o and z[, i = 1,..., k, being the continuation of zt as t goes from r to r' on the real line. However, there may be exceptional values r* of r where at least one of the loops hits a singularity, thus breaking continuity. In the example, this value is rj. Stepping across this value, the return sequence may change such that the ith return values Zi and z[ for r and r' with r > r* > r', i.e., on opposite sides of the exceptional value, are no longer joined by continuation of t from r to r' in the reals. The value of k that closes the sequence may change as well. The endgame convergence radius rc is the smallest such exceptional value of r*: for all smaller radii, the return map remains stable and the winding number c of the path is the value of k in this range. Remark 10.2.2 For simplicity we have slid over questions about whether you can indeed choose small enough open sets so that we can decompose the solution set of H(z,t) = 0 in a neighborhood of a solution (x,0) components so that for the one-dimensional components, we have the desired uniformization result. The language of germs is the way to gently deal with these issues in a rigorous manner. We have included a short introduction to germs in § A.3. 10.3
Singular Endgames
For a singular endpoint, Newton's method applied to solve H(z, 0) = 0 is no longer satisfactory for several reasons. First, Newton's method loses its quadratic convergence at a singularity, and in some circumstances, it may even diverge. Second, the prediction along the incoming path may give a poor initial guess, which exacerbates the problem of slow convergence. Finally, while the endpoint of the continuation path is well defined in the limit, the path might very well end on a positive-dimensional solution set of H(z, 0) = 0, so unconstrained Newton iterations may wander along this set rather than give us the endpoint we desire. All of this is to say that to deal with singular solutions, we need a strategy different than the one we described above for nonsingular endpoints. We call such a strategy a singular endgame. All singular endgames estimate the endpoint at t = 0 by building a local model of the path inside the endgame convergence radius. The overwhelming problem is that the paths approaching singular solutions of a system approach their limit very slowly. To deal with this, we wish to sample the path as close as possible to t = 0, but numerical ill-conditioning precludes accurate computation too near t = 0. This
182
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
leads to the idea of an endgame operating zone, described next. 10.3.1
Endgame Operating Zone
For a fixed precision of arithmetic, there is typically some small zone around t — 0 inside which a path with a singular endpoint cannot be numerically tracked within a prescribed accuracy of the true path. Since the endgame can only work inside the convergence radius, this leaves an annular endgame operating zone, as illustrated in Figure 10.1.
Fig. 10.1 Endgame operating zone
The endgame operating zone can be empty in the case that the ill-conditioned zone is larger than the convergence radius. However, whereas the convergence radius is completely defined by the homotopy, the size of the ill-conditioned zone is not. It depends on the precision of the arithmetic, so it can be made smaller by using more digits. Roughly speaking, if we wish to estimate the endpoint with k digits of accuracy, then we need to sample the path with k digits of accuracy also. Let 10 c denote the condition number of the Jacobian J(z, t) of H{z, t) with respect to the z-variables and some fixed norm. When we do a correction step of Newton's method we solve the equation J(z,t)5z = —H(z,t). Here we lose roughly C digits of accuracy. Computing with d digits of precision, we need d — C > k for success.2 By increasing d, one may effectively shrink the ill-conditioned zone. With enough 2 This analysis of Newton's method is very rough, as the iterative nature of the method can correct some errors. It would be closer to the truth to say that Newton's method converges quadratically only to k < d — C digits, but even that is a rough generalization. Our comments are meant to give a correct general picture without a complicated analysis.
Endpoint Estimation
183
digits, one can ensure that the endgame operating zone is not empty. Once inside the endgame operating zone, we can sample the path just for real t or we can sample for complex t in the zone. For a given precision of arithmetic, better accuracy in the estimate is achieved by sampling for complex t. 10.3.2
Simple
Prediction
The simplest approach is to track the path as close to t = 0 as possible using extended precision to get the same accuracy as a nonsingular root. Let us analyze a simple example to see what happens. Assume we were trying to solve zc = 0 for some integer c > 1 using the homotopy H(z, t) = zc — t = 0. Note in this special case of solving a one variable complex polynomial, the condition number of J{z,t) is 1. So we can track with precision on the same order as the number of digits, i.e., k = d. If we follow the path z(t) with z(l) = 1, our path is then z(t) := £=, but in practice we do not know the path explicitly, but must track it. Assume we have tracked the path t^ + e(t) where e(i) is a random error of size O(l0~k). Once t= is of the same order as e(i), path crossing will likely happen. So we cannot track for t beyond R « 10~k. In this case we have an estimate 10~fe/c for the solution. This is not very good. For example, with c = 5 and 15 digits of precision, we get 10~3 as an estimate for the solution 0. If we wanted 10 digits of accuracy, we could achieve this by using this method and 50 digits of precision. 10.3.3
Power-Series
Method
The simple prediction approach of § 10.3.2 can easily be improved. The idea here is to estimate the winding number c and then approximate the map cj> : Ar(0) —> C " x C of Lemma 10.2.1. There are different schemes to achieve this. We begin by tracking a solution path z{t) from t = 1 down to t = R for some R G (0,1). We then collect samples of z(t) by continuation from t = R to use in fitting a power series to 4>(s), where t = sc. There is a separate power series for each component of z. Assume for the moment that we know c and that t = R is inside the endgame operating zone. We choose some number of points s\,..., sK in the s-disk, such that each si is inside the endgame operating zone, and find the values z(t) = Z(SJ) by continuation. At each such point, we can compute derivatives. If we compute the first ki derivatives at a particular Sj, then Si is equivalent to ki + 1 points without derivatives when determining the order of the power series we can compute. That is, for each j = 1,..., N, we have a polynomial Pj(s) of degree (X^i=i(^ + -0) ~ 1 approximating <> /_•, (s) and satisfying p3 (st) = <\>^ (s») for i from 1 to K and for v from 0 to ki. The standard error estimate (Theorem 3.6.1 Davis, 1975) tells us the error of the approximation of Pj(0) to (pj(O) is O (PliLi Isi]fei+1)- F° r brevity below, we shall say this is an Mth order fit, where M — (X)"=1(&i + 1)) — 1.
184
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
This leaves open two questions that must be answered in order to deploy the method: • How do we find the endgame operating zone so that we can sample within it? • How do we determine the winding number c? The only practical approach to finding the endgame operating zone seems to be adaptive trial-and-error. Suppose we fix a pattern of the sample points, {asi,..., asK}, where a is a scaling factor for shrinking the sample pattern around the origin. Typically, asi is real and we arrive at it by tracking t in (0,1). The remaining sample points may be real or complex, but either way, we evaluate z at them by continuation. We may execute the endgame repeatedly for a geometrically decreasing set of scalings aj = A* for some fixed real number A S (0,1), say A = 0.3. When successive estimates of the endpoint agree to some pre-specified tolerance, we declare the method a success and stop. If this tolerance is never satisfied, we stop when the scaling gets so small that we can no longer accurately track paths due to the ill-conditioning near t = 0. If this happens, we must report that the tolerance was not met and return as our best estimate the one for which the smallest successive difference was found. There are several good ways to determine c. One is to directly measure the winding number by tracking a circular path, t — Re^^6 until the path closes up at 9 = 2TTC with c a positive integer, i.e., with z(Re2'KC^1) = z(R). If R is inside the endgame operating zone, then c, the number of loops around the origin necessary to close the path, is the winding number. As always, there is the numerical problem of deciding when two approximate numbers, z f Re2lxc^~^\ and z(R) are equal. This is the same as the problem of needing to keep the allowed error in our tracking small enough that we do not have path crossing. A less computationally-expensive method for small c is to note that since c is an integer, we can quickly test small values of c, say, from 1 to 4, for consistency with a power-series fit to an oversampled data set. Such a data set can be obtained with less path-tracking than would be required to find the winding number by path closure. A method for determining c and estimating z(0) is as follows. (1) Use continuation to collect sample values of z(t) for t = ti,... (2) For c = 1,..., c max , do the following.
,tK,tK+i.
(a) Transform the sample points into the s-plane, using Si = ti . The continuation path in t determines the proper matching angle of each Si, that is, if U = Re^16 for R e (0,1), then s; = R}/ceV=ie/c taking R1/0 in the reals. (b) Derivatives with respect to t at the sample points must also be converted to derivatives with respect to s using the value of c, e.g., dz/ds = (dz/dt)(dt/ds) = (dz/dt)csc-\ (c) Fit an Mth-order power series,
185
Endpoint Estimation
scribed above, (d) Calculate the prediction error at the extra sample point as ec = \\4>c(sK+i) —
^(Wi)ll(3) Use the c that gives the smallest prediction error ec as the estimate of the winding number, so (f>(s) = 4>c{s). Estimate the path endpoint as z(0) = 4>(0). When used in conjunction with the adaptive method of determining the endgame operating zone, one often observes that c = 1 gives the best prediction when the path is far outside the convergence radius. As the path is tracked into the operating zone, c settles into the correct value. This is because the order of the prediction error for an incorrect value c' of the winding number is O(tl/C), whereas for the correct value it is O{tM>c). One way to collect samples is in a geometric sequence along the reals: (^0)^1)^2) •••) = (R,XR,X2R,...) for some A € (0,1). Using z and dz/dt at two successive values ti and tj+i, one may make a cubic prediction of the next value at ti+2- A n i c e feature of this sampling pattern is that it advances by adding just one sample point to the sequence, reusing the last two points of the previous sample. That is, at one iteration we use samples at (to,£i,<2) a n d at the next (ti, *2J ^3)Such a geometric sequence can be used to determine the winding number without trial and error. The value z{t) is approximately z(t) = z(0) + at1/0 + higher-order terms, where a is the first coefficient in the fractional power series. Thus, z(R) — z(XR) « a(l - \l'c)R}-/c and z(XR) - z(\2R) « a(l - A ^ A ^ i ? 1 / 0 and so z(XR) - z(X2R) ^ z(R)-z(XR) ~
1/c
Since we know X, this can be used to estimate c, keeping in mind that c is a positive integer. This method can fail when a is zero or small, so that the first nonconstant term in the power series is order £2/c or higher. A method that attempts to deal with such subtleties is described in (Huber & Verschelde, 1998) (see also (Verschelde, 2000)). We shall not pursue this further here. As we approach t = 0, we can expect the predictions of the power series to be quite accurate. Accordingly, we may use it in place of the linear predictor in the predictor-corrector path tracker when collecting new samples. Of course, one should use the current best estimate of c at each stage, which may change as the endgame proceeds. Even when c is not correct, because the path has not fully entered the endgame operating zone, the best estimate for c obtained by the above method will generally be better than just assuming c = 1. A final variation on the power-series method is worth mentioning. Once the endgame operating zone is entered, it is valuable to quickly gather more samples to raise the order of the prediction. This allows the process to converge to full accuracy at larger values of t, before the ill-conditioned zone is encountered. Suppose we have
186
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
sampled along real t £ (0, R) and the prediction of 4>c{s) gives an accurate estimate of z(tK+i) in step 2d. Then we may try to predict across the origin in s and use Newton's method to refine samples there. It is particularly convenient to gather a symmetric sample set —si, — S2, • • •, —sK, because the odd-powered terms in the power series for tp(s) = {
For double precision arithmetic and samples on the real line in t, experience has shown that there is little profit in attempting the use of winding numbers greater than four or five. For higher precision arithmetic, this limit can be extended. The problem is that an Mth-order power series in s corresponds to a power series in t of order only M/c. To get a good estimate, we will need a large value of M and a numerically stable method of computing the estimate 0(0) without finding all M + 1 coefficients of the power series. The Cauchy integral method of the next section provides this. 10.3.4
Cauchy Integral Method
The Cauchy integral method is based on the use of the Cauchy Integral Theorem to estimate the solution of H(z,t) = 0 by 0(0), where <j> : Ar(0) -> C^ x C of Lemma 10.2.1. As in the power-series method, we first track z(t) until t = R. We then track as 6 varies, to both determine the winding number c and to collect z (Re^^e) samples around this circular path. Letting s denote the coordinate of A r (0), we have t = sc, and z$ = <j>i(s)fori = 1,..., N with the sought after solution given by (zi,... ,z^) = (>i(0)... ,<^JV(0)). The Cauchy Integral Theorem gives
fc(0) = -±== f 27TV-1 J{sec | |«|=flv<=}
^ds.
(10.3.1)
s
In terms of 9 and z (Re^~*e) we get the vector integral
Because of periodicity, an excellent method to evaluate this integral is the trapezoid method, e.g., (Eq.(3.3.4) Stoer & Bulirsch, 2002). This method yields an estimate of z(0) with error of the same magnitude as the error with which we know the sample values z(Re^zzl0). As in the power-series method, we can benefit from choosing a special sample set. If M + 1 points around the circle are sampled at equal angles, Sk =
End/point Estimation
187
j^e2n^ikc/(M+i)^ faen the trapezoid method gives exactly the average of the sample points: 1
M+i
Moreover, it is easily shown that this is the same result as would be obtained from a power series fit to the same points. The success of the Cauchy integral method depends on finding an appropriate radius for the circular sample. As in the power-series method, we do not know a priori the convergence radius. The most practical recourse is to discover it adaptively, by trying the method at geometrically decreasing radii. Convergence may then be judged by agreement in winding number and endpoint estimate between successive trials. 10.3.5
The Clustering or Trace Method
This last approach is based on the trace, see § 15.5. Assume that we have a number of paths Zi(t) converging to what appears to be the same solution z* of the system H(z,t) = 0 that we want to solve. Denote the paths as wi(t),..., wm(t). We have a finite number of one-dimensional irreducible analytic sets Xlt... ,Xk passing through a small neighborhood of (z*, 0). We assume that the projections to the taxis -Ki : Xi —> C are proper for all i when restricted to 7r~1(Ar(0)). This will be true for some r > 0. Each map iri n-i 0, e.g., see § 15.5 or (Appendix Morgan, Sommese, & Wampler, 1992a), this sum extends to a holomorphic function tr(t) for t £ A r (0). We are in a situation similar to the situation with the power series method of § 10.3.3, but simpler since we do not worry about c. This method predicts the value tr(0)/m for z*. This prediction is a prediction for the average of the limit points wi(0) + • • • + Wm(0) m
Each of the Wi(t) has a fractional power series, but their sum is holomorphic, that is, it has a power series with integer exponents. Thus, we may conveniently estimate z(0) by fitting an integer-exponent power series to the average of the Wi(t). The main difficulty with this method is determining which solutions are converging to the same endpoints. The difficulty arises because the estimate of the endpoints of the individual paths is inaccurate unless the winding number is employed in the estimation. Only the average endpoint is well-behaved (holomorphic),
188
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
not the individual paths. A lesser disadvantage of the trace method is that to sample the average solution path, one averages the values of the individual paths at each sample point. This means that the individual paths must be sampled at the same points. Hence, the processing of the paths becomes coupled, whereas the power series or the Cauchy integral can be applied to each endpoint independently.
10.4
Losing the Endgame
It may happen that the endgame is applied outside the endgame convergence radius, either because there are insufficient digits to track within that radius or because the endgame zone is not identified correctly. It is natural to ask what happens in such a circumstance. When there is a tight cluster of distinct solutions, the precision of the arithmetic must be high enough not only to distinguish between them, but also to track paths accurately near them. If the cluster is too tight in comparison to the precision of the arithmetic, the end of path tracking, and hence the application of the endgame, will occur outside the radius of convergence. There is the stability question of whether the methods will compute some sort of average of some of the solutions of the cluster. The methods do, in fact, have good stability properties, which hold in a larger range than the endgame operating range. The setup is that we have a holomorphic function, H(z, t) : CN x U —> C, where 0 6 U C C. Let 7T : C^ x U —> U be the product projection. Of course, in practice this is our homotopy. We are trying to solve at t = 0. We have introduced three interrelated methods. The Cauchy integral method and the power series method are the most accurate. The clustering method of § 10.3.5 is less accurate but clearly fails gracefully: it gives the weighted average of roots of the cluster. The full gamut of possible behaviors of the methods when we are not in the endgame operating region is not clear, but we can get some idea of the behavior from the following examinations. Consider the simple example on C2 H(z,t) =
z2-t2-e2=0.
If we track down to t — R and R < e, then we are in the endgame operating region. If R > e, we are not. Let's see what we end up computing. The solution set TZ of H(z, t) = 0 over AR(0) is a Riemann surface that can be shown to be biholomorphic to some annulus. The important point is that 7r~1({t G C | |t| = R}) is the union of two disjoint circles C\ and C?,-
Endpoi^it Estimation
189
Applying the Cauchy integral method we end up evaluating -1
/>2TT
— /
2TT JO
VR2e2^e
+ e2d6,
with a choice of one of the two branches of the square root. If R < e, the Cauchy integral method yields the roots ±e depending on the choice of the branch. If R > e we get a function dependent on R. This integral is an elliptic integral, but for explicit values of R and e it is easy to evaluate numerically. Fixing e = 10" 7 and R = 10~5 we get 0.64- 1(T 5 -0.50-lQ- 7 \f-[, which does not compare favorably with the actual roots ±10~ 7 . Indeed, the error 0.63 • 10~5 is two orders of magnitude larger than the root. Since the Cauchy integral method applied to an approximating polynomial gives the value at the origin of the approximating polynomial, we see that choosing interpolation points on the circle C\ or C2, the power series method will yield answers identical to the Cauchy integral method. It is important to realize that the trace method is not better than the powerseries or Cauchy integral method. Indeed, if we chose the paths wi(t),... ,wm(t) apparently converging to a common root as in the trace method, and applied the power series or Cauchy integral method to all the points and summed, we would get the same sort of answer as in the trace method. Let's see this precisely for the Cauchy integral method, realizing, as noted above, that this implies the analogous statement for the power series method using interpolation points on the curves over the circle \t\ = R. We assume that over some small disk, AR := {( 6 C \t\ < R}, of radius R around 0, with A# C U, the set H~1 (0) r\TT~1 (AR — 0) is a one-dimensional analytic set with closure X in AR X C^ such that irx '• X —> AR and TT-^ : X —> AR are proper. This is phrased this way to allow the possibility that there is a positive dimensional analytic solution set in the fiber over 0. By definition, proper means that the inverse image of any compact set is compact. One significance of properness for a holomorphic map is that the map has a well-defined sheet number on each irreducible component of X, e.g., see Corollary A.4.15. As mentioned previously, these conditions are satisfied for all of our homotopies. We are not assuming that we are in the endgame convergence radius. Theoretically this means that we do not necessarily have a map 0 as in § 10.2.1. We still have the normalization mapping v : TZ —> X, which for curves is the most classical special case of Theorem A.4.1. Here TZ is a smooth curve (a Riemann surface in the terminology of complex analysis), v is proper; and for a finite set of points B C AR, the map 7TTC\77-I(B), i.e., the map n restricted to TZ minus the finite set TT~1(B) is a biholomorphism. When we are in endgame convergence radius, TZ is a disk and v is 0. Since TXX extends to a neighborhood of X, v extends to V : TZ —> X, where TZ is a Riemann surface with boundary a union of circles, i.e., dTZ := TZ — TZ is a union of disjoint smooth connected curves C\,..., CL for some integer L > 1. Now the Cauchy integral method (Morgan et al., 1991) that we are using starts
190
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
with a point po £ dTZ and follows its continuation p as n(p) goes around the circle, {t £ C | |t| = R}, c times, where c is the minimum positive number of times it is necessary to go around the circle, {t e C | |i| = R}, until p returns to po. Note p traces out a connected component, d, of OX = UjCj containing p. We let c, denote the cycle number associated to the curve CV In analogy with the cluster method we compute the integral
J-yfz.^(±)
(10.4.2)
where, abusing notation, we let z : TZ —> C is the vector of coordinate functions on CN pulled back to TZ. 72. is a noncompact Riemann surface and TZ is a compact Riemann surface with boundary a finite number of circles, such that dTZ = TZ — TZ, i.e., TZ is the set of interior points of TZ. We assume that TT : TZ —> AJJ is a proper holomorphic map from TZ to the disk, A# := {t 6 C | |t| < R}, of radius R around 0, and that TT extends to a differentiable, finite to one map, TT^ : X —> A#. We let Pi for i in a finite set I denote the distinct points in ir~1(0). We let n, denote the multiplicity of the pi as a zero of the holomorphic function n. The following consequence of Stokes Theorem will let us work out estimates for the effect of branch points on the Cauchy integral method. L e m m a 10.4.1 Let n, TZ, dTZ, n^ be as above. Let z : TZ —> C be a holomorphic map. Let {pi | i G / } be the set of points, Pi, in the set, n~1(0), with multiplicities, rii. Then letting c = C\ + • • • + ci : 1
f
» fdt\
yr--^ rii
Note c = X l i e / n i ' ^ u ^ * n e n * an<^ ^ e Cj can be different (though each rii is a sum of a subset of the c,. This consequence of Cauchy's integral theorem is left to the reader as an exercise. Corollary 10.4.2 If dp = dTZ, then the equation (10.4.2) computes the average of c (counting multiplicities) solutions of H(z, 0) = 0.
10.5
Deflation of Isolated Singularities
Endpoints of homotopy solution paths can be divided into two types: isolated solutions and points on positive solution sets. We say that z* £ CN is an isolated root of f{z) = 0, f(z) : CN -> C ^ , if for a small enough positive e e l , the ball Be(z*) c CN defined by B€(z*) = {z £ CN \ \z - z*\ < e} contains no other root of f(z) = 0 besides z*. Isolated singular roots can be computed accurately without resorting to the kinds of singular endgames we have discussed above. This is
191
Endpoint Estimation
brought about by a symbolic reformulation of the equations so that z* becomes a nonsingular root of the new system. Before describing the method, let us review some facts about the behavior of Newton's method near an isolated root. If z* is a nonsingular root, that is, if the Jacobian matrix df/dz(z*) is nonsingular, then it is well-known both that z* is isolated and that Newton's method converges quadratically to z* when initialized from any point close enough to it. In most cases, but not all, Newton's method will also converge for isolated singular roots, but convergence will be slower and the final accuracy lower than for nonsingular roots. An illustration of a system for which Newton's method fails near an isolated root is (Griewank & Osborne, 1983) (29/16)**-2^0, xz - y — 0.
No matter how close one starts to the multiplicity-three isolated root at the origin, (x,y) = (0,0), Newton's method diverges. See (Griewank & Osborne, 1983) for more on how Newton's method behaves near such irregular singular roots. The system of Equation 10.5.3 is very special in the sense that if the coefficient (29/16) is changed to a generic value, Newton's method converges even though the origin remains a root of multiplicity three. However, we do not wish to depend on this kind of genericity, as we may indeed be given a system with an irregular singularity. Moreover, even when Newton's method converges, its behavior may not be satisfactory. For a root of multiplicity fi > 1, its rate of convergence is only linear and the function must be evaluated with precision /x times greater than the accuracy desired in the estimated root. To be precise, consider a single polynomial f(z) : C —> C with a root z* of multiplicity /x > 1. Denoting the kth iterate of Newton's method as Azk, we have the iteration formulae A2fc = -f{zk-i)/f'(zk~i),
Zk = zk-i + Azk.
Let £k := zk — z* be the error between the kth iterate and the true value z*. If the sequence of iterates converges to z*, then it obeys the following relation in the limit
tk+l = (»-iyk+o(ek). (A simple demonstration of this result can be found in (Ojika et al., 1983).) So for H > 1, the convergence rate is linear with geometric ratio (/i — l)//x. For fi = 1, convergence is quadratic, a much faster process. 10.5.1
Polynomials in One Variable
How can we restore quadratic convergence for roots with multiplicity greater than one? For a polynomial in one variable, this is rather simple. By Theorem 5.1.2, we know that a multiplicity fi root of f(z) is a multiplicity one root of f^~1"l{z).
192
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Suppose we begin by solving f{z) by a homotopy method, and we observe that fi roots are approaching a common endpoint. Then, we may switch to solving yO-i)(^), initializing Newton's method using the estimated singular endpoint of the first stage. While it is clear that in theory this deflation maneuver is valid, one might wonder if it is numerically stable. The polynomial that we solve in floating point arithmetic is only an approximation to an exact polynomial and so the multiplicity // root of the exact polynomial will appear to have a cluster of [i roots for the numerical polynomial. How does this cluster behave under differentiation? Can we be sure that f^~l^(z) has a root in the vicinity of the cluster of roots of f(z)l As detailed in (Sommese, Verschelde, & Wampler, 2004d), the answer depends on the degree d of f(z) and the distribution of its roots. Let z* be the centroid of a cluster of fi roots inside a disk Ap(z*) of radius p centered on z*, and let R denote the distance from z* to the nearest root outside that disk. Then, the condition
5,_1SL p
a — /i + 1
(104M)
is a conservative estimate that guarantees that f^k\z), for all k < /i — 1, has exactly /x — k zeros in AP(ZQ). Even if the root is truly a multiple root due to the structure of the equations, at any finite level of precision in floating point, it will likely become a cluster of roots. However, the higher the precision, the tighter the cluster, and so beyond some precision, the cluster radius p will become small enough that condition 10.5.4 will be satisfied and the deflation maneuver will succeed. This does not resolve the question of deciding whether a given polynomial has an exact multiple root or it has a cluster of closely-spaced roots. As we have indicated before, this is not a question that can be resolved in favor of a multiple root using floating point arithmetic. If it is a cluster, a high enough level of precision will reveal it, but if it is a true multiple root, only exact arithmetic can prove it. 10.5.2
More than One Variable
It is natural to consider how to generalize the approach for one variable to systems of equations in several variables. The following formulation is based on (Leykin et al, 2004), which in turn was motivated by (Ojika et al., 1983; Ojika, 1987) (see also (Lecerf, 2002)). Assume that f(z) : CN —> CN is a polynomial system with an isolated singular root z*. Denote its N x N Jacobian matrix as J(z) : — df/dz. At the singular root, J(z*) will have rank r < N. This implies that the matrix equation J(z*)v = 0 has a linear solution set for v 6 f>w-i of dimension N — r — 1. We can pick out a unique point of this null set in P ^ " 1 by appending N — r — 1 homogeneous linear equations and dehomogenize by appending one more inhomogeneous equation. Equivalently, we can pick a random r-dimensional linear space to intersect the null space in a point in CN, that is, pick VQ,. .. ,vr G CN at
193
Endpoint Estimation
random and set v = v0 + Y^i=i \vu with unknowns Ai,..., Ar e C. Combining this condition with the system f(z), we have 2N equations in TV + r unknowns
S(*,A)=(
f(^r
) =0,
(10.5.5)
\J{z)(vo + J2z=i^iVt)J where A = (Ai,...,A r ). An initial guess for A can be found by standard linear algebra applied to J{z) at the estimated value of z* coming from the solution of system f(z) = 0. The system of Equation 10.5.5 has more equations than unknowns. It can be reduced to square using a randomization procedure (see § 13.5), but this is not necessary. We are only seeking a local solution, not forming a global homotopy, so it suffices to use Gauss-Newton iteration. This is identical to Newton's method except that the overdetermined iteration step is solved by least-squares (pseudoinversion). Let (z*,X*) denote the solution of g(z, A) = 0 that uniquely projects to the solution z* of f(z) = 0. It is not immediately clear that the multiplicity of (z*, A*) as a solution of g(z, A) = 0 will be lower than that of z* as a solution of f(z) — 0, but a proof of this is given in (Leykin et al., 2004), subject to the assumption that z* is an isolated solution of f(z) = 0. To desingularize an isolated root of multiplicity fi > 2, deflation may need to be applied multiple times. Indeed, in the case of n = 1, a single polynomial, the foregoing is exactly the same as the differentiation approach discussed in Subsection 10.5.1, where we saw that fi — 1 deflation steps are required. In the general case, the statement is that at most \i — 1 deflation steps are required. The fewer deflations required, the better, as each one adds more variables. The deflation process is local in the sense that different singular points of the same system may have different deflations. The singularities may differ not only in their multiplicities, but also in the rank of the Jacobian at each stage of deflation. An analysis of the numerical properties of deflation is not yet developed for the multivariate situation: there are no known formulae analogous to Equation 10.5.4 for the univariate case. Experiments reported in (Leykin et al., 2004) indicate that the approach is effective for a number of test cases having isolated singularities. In several variables, there is an additional concern that does not arise for just one variable. This is the possibility of positive dimensional solution sets. Deflation is only valid for isolated roots. This is a big drawback, because we have no clear way of deciding which singular endpoints in a homotopy are isolated and which ones are landing on positive dimensional sets. This issue will be treated further in Part III, where we consider the treatment of positive dimensional solutions. The frequent appearance of positive dimensional solution sets, especially at infinity, means that we cannot depend on deflation alone: the general purpose singular endgames remain necessary if we wish to find all path endpoints accurately.
194
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
10.6
Exercises
Exercise 10.1 (Power-Series Method) The power-series endgame is implemented in HOMLAB using samples for real t only. The control variables are set in htopyset .m and are described in § C.7.1. Systems of the form x y - l = 0,
Or + l) f e =0,
have a multiplicity k root at (x,y) = (—1, —1). • A total-degree homotopy has 2k paths. We have identified (—1,-1) as a multiplicity k root. Analytically determine the endpoints of the other k paths. • For k=3, solve the problem with HOMLAB by writing the system in tableau form and using the script totdtab to solve it with a total-degree homotopy. Does it give the result you expect? • Try similar problems for k = 2,3,4,... How high can you go and get good endpoint estimates? Pay attention to the setting of CycleMax. • The default setting is allowjump=l, which causes the endgame to also collect sample points for negative values of s by predicting across the ill-conditioned zone at the origin. Compare the performance of the endgame for allowjump=0 versus allowjump=l. You may set global verbose=l to get intermediate results from the endgame, see § C.7.2. Exercise 10.2 (Power Series Error Analysis) There are two sources of numerical error in the estimate produced by the power series method: truncation error due to the order of the fit and amplification in the fitting process of errors in the sample points. Formulate the fitting process as the solution of a linear system whose unknowns are the coefficients of the power series:
aM]T
for i = 1,..., M + 1. We may write this in matrix form as $ = 5a,
(10.6.6)
where $ is the column of sample values, S is the Vandermonde matrix whose (i,j)th element is s^~ , and a is the column of power series coefficients. The final estimate will be 0(0) = do- The condition number of the Vandermonde matrix affects how errors in the samples <j>(si) are transmitted to the estimate ao• For the same order fit, M, compare the condition number for the following sample patterns: (1) a geometric sequence s^ = R, XR, X2R,... for various A, (2) a symmetric, two-sided geometric sequence s, — ±R, ±XR, ±X2R,..., (3) the transformation of the two-sided sample set to fit a power series in w = s2,
195
Endpoint Estimation
(4) a circular sample, s, = ReV=T2m/(M+i)_ • How are the numerics affected by rescaling the fitting as
(pSi)2 • • • (p S i ) M ][a 0 ai/p
a2/p2 • • •
aM/pMf
with p = 1/R? • What sample pattern is best for a thin endgame operating zone, characterized by having an ill-conditioned region almost as big as the convergence radius? • Give two reasons why the Cauchy integral method is a good approach for endpoints with large winding numbers. Exercise 10.3 (Circular Sample Sets) • For an evenly-spaced circular sample set, s» = fie^-l2nt/(M+1)j find the sums ££+ 1 sJforfc = 0 ) l,2 ) ...,oc. • Show how this implies equivalence between the trapezoid rule for the Cauchy integral on evenly spaced circular samples and the power series fit to those points. • Show how this also implies that the average of all paths approaching the same endpoint is a holomorphic function (given by a power series with nonnegative integer exponents) of the path parameter t. Consider that there can be several subgroups of paths approaching the same endpoint, each subgroup having its own winding number. • Let S be the Vandermonde matrix, as in Equation 10.6.6, formed for the evenlyspaced circular sample. What is S1"1? Exercise 10.4 (Multiprecision) (open research topic: see (Bates, Sommese, & Wampler, 2005b)) The control settings for the endgame in HOMLAB reflect the fact that Matlab computes in double precision. How should these be changed if multiprecision arithmetic were available? If the precision of the arithmetic could be changed at will during the endgame, how should the endgame algorithm best use this capability? Exercise 10.5 (Deflation 1) The system x2 + y2 = 0,
x2 - y2 = 0
has a multiplicity four isolated root at (x, y) — (0,0). Show that one stage of deflation gives a nonsingular system defining the root. Exercise 10.6 (Deflation 2) Do the following for Griewank and Osborne's system of Equation 10.5.3. • Formulate Newton's method and experimentally observe that initial guesses near (0,0) diverge.
196
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
• Use HOMLAB to solve the system with the power-series endgame and observe the winding number of the origin (suggestion: use totdtab.m). • Use deflation to obtain a new system for which the origin is a nonsingular solution. • How many stages of deflation are required? How many variables does the final system have?
Chapter 11
Checking Results and Other Implementation Tips
This is a very short chapter to help those who might try to create their own continuation codes. These tips can also be useful in getting more secure results when using an existing code. Since continuation is a floating point numerical process, there is the possibility of several kinds of failure. The first step in correcting a failure is recognizing that it has happened. Sophisticated codes detect some failures automatically and take corrective action. Whether done automatically or manually, the basic techniques are similar. 11.1
Checks
There are two kinds of checks: local checks examine an endpoint in isolation using numerical analysis of the iterative method used in the endgame, whereas global checks use knowledge of the polynomial nature of the problem, primarily the fact that we expect to find all isolated solutions. If the path tracker fails mid-course, that fact should be flagged and a corrective action taken. See § 11.2 below. 11.1.1
Endpoint Quality Measures
Any numerical solution method should provide some measures of the quality of the solutions it produces. Let us assume we are solving the square system f{x) = 0 and x* is an estimate. An entire treatise could be written on how to analyze the accuracy of x*, but we will be very brief and simply list some useful indicators: Function Residual The size of the function value, |/(x*)|. This measure is affected by the scaling of the function, that is, if g(x) = 100/(x), then |p(x*)| gives a 100 times worse function residual than |/(x*)|, even though the error in the solution is the same. Even so, this gives a first look at whether the solution has been successfully computed. Newton Residual If we are using Newton's method to refine the endpoint, the 197
198
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
magnitude of the last step, \fx(x*)~lf(x*)\, is a good estimate of the distance between x* and an actual zero of f(x), providing that the Jacobian matrix is nonsingular. Endgame Residual If the endpoint is singular, the methods of Chapter 10 are preferred to Newton's method. Typically, the method is performed several times for successively smaller values of t as t —> 0. The distance between successive endpoint approximations, \x* — x*_1|, replaces the Newton residual as an accuracy estimate. Condition number The condition number of the Jacobian matrix, K,(fx(x*)), is a good measure of how singular the solution is. Using the 2-norm, it is the ratio of the largest to smallest singular value of the matrix. A large condition number indicates singularity. However, the value can be near one for a near-rank-zero matrix, having all singular values small, although these appear only rarely in practice. To signal these, the largest singular value, or any other matrix norm, \fx{x*)\, can be useful as an auxiliary measure. If one finds the complete list of singular values, say o\ >
Checking Results and Other Implementation Tips
199
It is typical that nonsingular solutions will attain very small Newton residuals, while the accuracy of singular ones will depend on the multiplicity of the root. Without a singular endgame, a double root usually attains only about half the accuracy of a nonsingular one. If the condition number is high enough (and we have taken care that the bad conditioning is not due to poor scaling of the equations), we can be relatively secure in classifying the root as singular and, if we are only looking for the nonsingular roots, it can be discarded. It is more satisfying, of course, to invoke a singular endgame and clean up the solution, if possible. Also, higher-precision arithmetic can be invoked to clarify the situation. 11.1.2
Global Checks
In addition to the measures above, which are computed for each endpoint separately, there are some checks that depend on the patterns of roots in the computed solution set. These are tied to the polynomial character of the problem. Path Crossing Check By using random complex numbers in our formulations, we ensure that, with probability one, the solution paths do not cross in the middle of the homotopy; only at the end might they merge together in a singularity. However, if two paths become sufficiently close, it is possible for the path tracking algorithm to jump from one to the other while still staying within the tracking tolerances. Thus, it is a good idea to stop at some small t and check if all the solutions are still distinct. That is, we pick some small te G (0,1) and do the tracking in two phases: first from t = 1 to t = te, then from te to 0. (A value of te =0.1 is typical.) If two solution estimates at te are very close, this indicates that the tracker jumped paths. Re-running just those paths with tighter tracking tolerance usually corrects the error. Multiplicity Check If one uses the power-series endgame of § 10.3.3, an estimate of the winding number, c, is obtained for each endpoint, and this implies that in the neighborhood of t — 0, this path is part of a cluster of c paths approaching the same endpoint. Since we are tracking a complete set of solutions paths, all c of them should be found. It is possible for more than one cluster to approach the same endpoint, so the check is to see if the total number of solutions approaching the same endpoint are compatible with the winding numbers assigned to them. Examples of valid clusters of winding numbers are {2, 2} (one cluster with winding number 2); {2,2,2,2}, two clusters, each with c = 2); and {2,2,3,3,3}, (two clusters, one each with winding numbers 2 and 3). One could go further to extract not just the endpoint of a path, i.e., the constant term of the power series, but also the next term in the power series to match endpoints into clusters. The Cauchy integral method of § 10.3.4 gives an even stronger check for matching up paths: each time the path tracker circles around the origin without returning to the original point generates another point in the cluster. We can check for the existence of such points in the incoming solutions.
200
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Multiple Run Comparisons If one runs the same problem two or more times with different choices for the random constants, the same results should be obtained. This principle can be invoked at several levels. • In a homotopy of the form h(z, t) = -ftg(z) + (1 —t)f(z) (see Theorem 8.3.1 for details), this means using a different value for 7. Then, one should obtain the exact same list of path endpoints, because although the tracking path has changed, it is a real-one-dimensional curve inside the same complex curve and its destination point is the same. The association of start points to endpoints likely will be permuted, however. If the endpoints from two such runs cannot be sorted to match up, then one or both are in error, and one can concentrate path re-runs on those paths whose endpoints have no match in the other set. • A stronger test than the above is to change the start system to another in the same class. The start systems described in Chapter 8 all contain random constants which can be reset to new values. Two such runs should have the same set of nonsingular endpoints, which can be compared. The singular endpoints will typically move, but usually these are not of primary interest. • For a parameterized family of systems, F(z; q) = 0, using the notation of Chapter 7, one may solve two instances for different, randomly chosen, values of the parameters q. The number of nonsingular roots should be constant, but, of course, their values will change. To cross check them, one can track paths from one to the other in a parameter homotopy F(z; tq\ + (l-t)«2)=0.
11.2
Corrective Actions
Points with good quality measures at t = te and which pass the path-crossing test are ready for the endgame. Those which fail on either count should be re-run from the beginning, t = 1 to t = te, with different path-tracking parameters. Paths that fail in one endgame might benefit from another. For example, the power-series endgame in double precision is only effective up to c — 4, while the Cauchy integral endgame has no such limit. But ultimately, the only way to compute some difficult endpoints is to increase the precision of the arithmetic. We briefly address these two issues next. How much extra effort should be devoted to corrective actions depends on one's aims. In an engineering problem, one might not care much about lost solution paths. This is especially true if the trouble is due to a nearly singular endpoint, as it may likely be useless for practical purposes anyway. However, if one is doing an initial run to solve a random-parameter example in preparation for repeated parameter continuations, then one wants to ensure that a full solution set has been
Checking Results and Other Implementation Tips
201
found. This is because there is no way to predict which of these starting solutions will lead to the desired answers in a subsequent application. 11.2.1
Adaptive Re-Runs
We saw in Chapter 2 that path tracking benefits greatly from using an adaptive step size in place of a fixed one. In a similar way, the remaining heuristic control parameters, such as the path tracking tolerance, can be made adaptive. Too small a path-tracking tolerance makes progress slow, while too large allows path crossing. This works hand in hand with the number of iterations allowed in each corrector step. For concreteness, let's say that the path-tracking tolerance is 10~4 and we allow up to three iterations in the corrector. Then, a path-crossing incident is often cleared up by decreasing the tracking tolerance to 1CT6, and if not, try decreasing the iterations allowed to just two. (We have found such settings effective when using double precision on systems of low-degree equations.) These kinds of re-run strategies are easily automated so that human intervention is not necessary. As tighter tolerances are set, it may be necessary to decrease the minimum step size allowed and increase the number of steps allowed, if such constraints are in place to cut off expensive paths. This presumes, of course, that one is willing to pay the extra computational cost to get the answer. If one is planning a large run with a path count on the order of 100,000 or more, it can be worthwhile to collect run statistics on perhaps 1% of the paths and make adjustments in the tracking parameters. Once the initial 1% runs well, the entire run can be launched with confidence, although automatic adaptive re-runs should be left in place. 11.2.2
Verified Path Tracking
Instead of controlling tracking by a tracking tolerance, one can instead use interval arithmetic (see § 6.1) to guarantee that the solution estimate stays in a unique convergence zone throughout, thereby having absolute assurance that path crossing cannot occur (Kearfott & Xing, 1994). This tends to give conservative step sizes, so it can be very expensive. 11.2.3
Multiple Precision
There are several difficult situations that may arise that are most simply resolved with multiple-precision arithmetic. One is the case of a generally ill-conditioned target system, which can be due to high degree equations or coefficients with widely different scales. Sometimes, high degree is the result of applying elimination to an initial system having many equations of lower degree, in which case it might be better to solve the initial system rather than the reduced one. For systems with
202
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
wide-ranging coefficients, such as the chemical systems presented in § 9.2, a scaling algorithm can help. But for some systems, there is no practical recourse except raising the precision of the arithmetic. A common situation is the existence of singular endpoints. As illustrated in Figure 10.1, the endgame operating zone is a disk minus an ill-conditioned region near t = 0. It can happen, especially for endpoints of high multiplicity, that the ill-conditioned region takes up a large portion (or all) of the convergence disk, thus preventing the endgame from succeeding. If the desired accuracy is held constant, higher-precision arithmetic shrinks the ill-conditioned zone and allows the endgame to succeed. (If one deploys multiple precision and makes the accuracy requirement more stringent simultaneously, the latter may cancel the former so that there is no net gain.) A final possibility is the phenomenon of path crossing. Although in theory there is a zero probability of two paths crossing, they can approach each other close enough to require higher precision to negotiate past the near collision. For small systems, it is acceptable to just pick new random constants and re-run the whole procedure, but for a large problem, one wouldn't want to throw away a significant investment of computation if a near collision should happen on some path late in the process. It would be better to detect ill conditioning in the middle of a path and increase precision on the fly, or lacking that capability, rerun the paths in question with, higher precision and tighter path-tracking tolerance. In a sense, singular endpoints are a case of this same difficulty, except we are not trying to slip by the collision, but instead we are aiming directly at it. As multiple paths approach the same endpoint, we need to keep from jumping from one to another so that the endgame attributes the correct angle in the s-plane to the samples, where sc = t. Extra precision may be needed to maintain accuracy. 11.3
Exercises
Exercise 11.1 (Checking) Revisit any problem from the exercises of previous chapters; the six-revolute inverse position problem of Exercise 9.5 might be a good choice. Do the following. • Run the problem using standard settings in HOMLAB and make histograms of condition number, function residual, and the homogeneous coordinate. Note that for any of these quantities, a histogram of the exponents of the values in scientific notation is more useful than a histogram of the values themselves. Use routine pathcros to check for path crossings among the points in xendgame, which is a list of the solutions for t — Endgame ^ ^- ^ s e P a t hcros again for the list of solution points, xsoln, at t = 0. For any occurrence of multiple paths having the same endpoint, check that the incoming paths have winding numbers consistent with the multiplicity check described above.
Checking Results and Other Implementation Tips
203
• Loosen the path tracking tolerance so that pathcros discovers path crossing errors. • Return the path tracking tolerance to its default value, but this time cripple the endgame by setting CycleMax=l. See what difference this makes in the histograms. Exercise 11.2 (Multiple-Run Checking) For any parameterized problem of your choice, do a multiple-run global check that shows that the nonsingular solutions for two independent total-degree runs match up under parameter homotopy.
PART III
Positive Dimensional Solutions
Chapter 12
Basic Algebraic Geometry
In this chapter we discuss the basic properties of the different sorts of algebraic sets that arise in the numerical solution of polynomial systems. The flexible "probabilityone" methods underlying the numerical approach to polynomial systems, developed in Chapter 13, are based on the fact that given any system of polynomials, the set of solutions breaks up into a finite number of irreducible components. Recall that we say that an affine, projective, or quasiprojective algebraic set Z is irreducible if ZTes is connected. The dimension of an irreducible algebraic set Z is defined to be dim Zreg as a complex manifold, which is half the dimension of ZTeg as a real manifold. Irreducible components, discussed in § 12.2 are nice sets that are almost manifolds. For example, the system f{x y)
' -[x(y>-x>)(v-2)(3x
+ y)\-°
^ ^
vanishes on the union of four irreducible components {x = 0} U {y2 - x3 = 0} U {(1,2)} U {(1, -3)}. It is a striking and powerful fundamental fact that the most general solution set is not much worse than this simple example. To even state this result, which is called the irreducible decomposition, we need to make precise what is meant by an algebraic set. The aim of this chapter is to familiarize the reader with the basic types of algebraic sets and their properties. Four types of algebraic sets are useful to us: affine algebraic sets, projective algebraic sets, quasiprojective algebraic sets, and constructive algebraic sets. The first three of these were introduced briefly in the introduction of Chapters 3 and 4. We consider them in more detail in the succeeding sections. In § 12.1, we revisit affine algebraic sets, i.e., the solution sets of systems of polynomials on C^, to discuss the topologies and the maps defined on them. In § 12.2, we discuss the irreducible decomposition for affine algebraic sets. Often polynomials are homogeneous, e.g., f(x,y) = x2 + y2, and in this case acknowledging that their solution set is naturally defined on P^ simplifies matters, 207
208
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
both conceptually and numerically. For this reason we introduced projective algebraic sets, i.e., solution sets on FN, in Chapter 3 and consider them further in § 12.3. Often we need to consider all points in a projective algebraic set X except for some that are in a second projective algebraic set Y, i.e., sets of the form X\(Xf~)Y), such as C2 \ {(0,0)}. These sets, which include affine algebraic sets and projective algebraic sets, are called quasiprojective algebraic sets. They are discussed in § 12.4. A map / : X —> Y between quasiprojective algebraic sets X and Y is said to be an algebraic map if the graph of / is a quasiprojective subset oi X xY: see § 12.4 and § A.4 for more details. Finally, we discuss constructible algebraic sets in § 12.5. These sets, which include all quasiprojective algebraic sets, may be defined as follows. Constructible algebraic sets A constructible algebraic set, or constructible set for short, is any set constructed from projective algebraic sets by a finite number of the Boolean operations of union, intersection, and complementation. Constructible algebraic sets prove useful for two reasons. First, many natural sets, e.g., images of algebraic sets or the set of points of the image of an algebraic map where the fiber is a given dimension, are not quasiprojective, but are constructible (see Theorem 12.5.6 and Lemma 12.5.9). Second, a constructible set A contained in a quasiprojective set X is quite close to being an algebraic set, e.g., the closure A of A in the complex topology is a quasiprojective algebraic subset of A (see Lemma 12.5.3), and there is a dense Zariski open set U of A contained in A (see Lemma 12.5.2). We end with § 12.6, a brief discussion of multiplicity of algebraic sets. Roughly speaking, this notion allows us to relate the algebraic degree of a system of equations to the degrees of the irreducible components of the system's solution set. For a single polynomial in several variables, this is a straightforward generalization of the phenomenon of multiple roots (double roots, triple roots, etc.) that may appear when factoring a polynomial in one variable. For systems of more than one equation, the situation becomes a bit more delicate, as we shall discuss. All four basic kinds of algebraic sets arise quite naturally in discussing the solutions of polynomials on CN, as we show by examples. We include in this chapter only the rudimentary facts about these different classes of sets, with further useful facts collected in Appendix A. As this book is focussed entirely on polynomial systems, we may sometimes drop the modifier "algebraic" and speak simply of "affine sets," "projective sets," etc., but meaning these in the algebraic sense. Before diving in, let's clarify briefly how quasiprojective sets include both projective and affine algebraic sets, and how constructible sets include them all. Since quasiprojective sets are of the form X \ (X n Y), where X and Y are both projective, they include projective sets as the special case where Y is empty. As for affine sets, recall that CN is equal to P^ minus its hyperplane at infinity, Hoo, which is
209
Some Concepts From Algebraic Geometry
a projective algebraic set equivalent to P"" 1 given by the homogeneous equation XQ = 0. So if A is an affine algebraic set defined as the solution of a polynomial system F(x), and B is the projective algebraic set defined by the homogenization of F(x), then A = B \ (B n #00) is seen to be quasiprojective. Finally, the defining form, X \ (X n Y), of a quasiprojective set is just a Boolean construction: we could rewrite it as X n (not V). So quasiprojective sets are a kind of constructible set. We now examine each type of algebraic set in more detail. 12.1
Affine Algebraic Sets
Naively, an algebraic set is nothing more than the common zeros of a set of polynomials. Making this precise and convenient to use takes some work. We start with a polynomial system ~ f(x)
:=
fi(xi,...,xN)~ :
(12.1.2)
Jn{x1,...,xN)_ consisting of n polynomials fi{x\,..., XJV) on CN contained in the ring C[zi,..., XN] of polynomials in the variables Xi, ..., xM with complex coefficients. We denote the set of common zeros on C by V(fi,. ..,/„) := { i e C w | / 1 ( i ) = 0;... i fn(x) = 0} . Such a set of common zeros is called an affine algebraic set. The word affine in "affine algebraic set" signifies that the set is a closed subset of Euclidean space, which is sometimes called affine space. For a system / as above in Equation 12.1.2, we usually abbreviate V ( / i , . . . , /„) by V(f). Example 12.1.1 The simplest polynomial system is p(x) = 0 where p(x) is a monic polynomial of degree d in one variable with complex coefficients, i.e., p(x) := xd + axxd-1
+ • • • + ad, k
with a,i £ C constants. As discussed in § 5.3, p(x) factors as \[(x — x,)Mi. Thus i=l
V(p) consists of the k complex numbers xt. The multiplicity of X{ equals /x, (see § 12.6 for further discussion of multiplicity). Thus p(x) = x3 — x2 = x2(x — 1) = 0 has a zero set consisting of 0 and 1. Unions of affine sets are affine, e.g., if A := V(f) for polynomials / := (A) • • • > fr) and B := V(g) for polynomials g := ( g i , . . . ,5s), then A U B is de-
210
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
fined by V({fl9j
| t = l,...,r; j =
l,...,s}).
Since any point is an affine set, i.e., (xf,... ,x*N) is denned by (xi — x*,... ,XN~X*N), we have that have that any finite set is an affine algebraic set. Lemma 12.4.3 will show that these are the only compact affine sets. For a single polynomial p(x\, £2) £ C[xi, X2] not equal to a constant, the solution set is a nonempty one-dimensional affine algebraic set. Example 12.1.2 A simple polynomial system on C2 is given by X\ = 0. Here the solution set is the X2-axis. It is worth emphasizing that passing from a system / of polynomials to V(f) throws away all multiplicity information. For example, on C, x5, and x define the same affine algebraic set V(x). Also note that CN is the affine algebraic set corresponding to the identically zero polynomial, and the empty set is the affine algebraic set defined by a constant polynomial. Here is a less trivial one-dimensional example of an affine algebraic set. Example 12.1.3
Consider the polynomial w — z2. The set V{w - z2) := {(z,w)eC2\w-z2=
0}
is a smooth connected two-real-dimensional manifold. Indeed, the mappings (z,w) 1—> z and z 1 — > (z,z2) show that there is a one-to-one correspondence between points (z,w) G V(w — z2) and z e C . Note that an m-dimensional complex manifold is a 2m-real-dimensional manifold, since C has real and imaginary parts. In this book, "dimension" always means complex dimension; otherwise, we explicitly say "real dimension." A map / : X —> Y from one affine algebraic set X c C^ to a second affine algebraic set Y C C M is said to be an algebraic map if there is a map F : CN — • > CM such that (1) F = (FU..., FM) with all the F* G C ^ , . . . , % ] ; and (2) / = Fx, the restriction of F to X. When it is clear from the context, we sometimes refer to an algebraic map as a map. We define an algebraic function on an affine algebraic set X to be an algebraic map from X to C. We say that two affine algebraic sets X C CN and Y C C M are isomorphic if there exist algebraic maps F : X —> Y and G : Y —> X such that F o G i s the identity on Y and G o F is the identity on X. Example 12.1.4 Let Y := Viw - z2) be as in Example 12.1.3 and let X := C. We have the map G : Y —> X given by G(z,w) = z and F : X —> Y given by F(z) = (z, z2) which shows Y and X are isomorphic.
Some Concepts From Algebraic Geometry
12.1.1
211
The Zariski Topology and the Complex Topology
Noting that given two systems / = {/i,... , / n } and g = {g\,... ,gm} of polynomials V{f)
U V(g) = V({fi9j\l
and
V(f)nV(g) = V(f,g), we conclude that affine algebraic sets in CN are closed under finite unions and intersections. Given an arbitrary, possibly infinite, set of polynomials on CN, the Noetherian property for ideals in C[zi,..., zjv] (see, e.g, (page 74 Cox et al., 1997)) guarantees that there is always a finite subset of the polynomials with the same common zeros on C^. This guarantees that an arbitrary intersection of affine algebraic subsets of C^ is an affine algebraic set. This implies that the set of affine algebraic subsets of CN that lie on a given affine algebraic set X C CN satisfy the axioms to be the closed sets of a topology on X, which is called the Zariski topology. Here the open sets U C X are the sets X \ Y, where Y C C^ is an affine algebraic set contained in X. Open sets in this topology are called Zariski open sets. Similarly the affine algebraic subsets of CN that lie on the given affine algebraic set X c CN are the Zariski closed sets of X.
Besides the Zariski topology, there is the complex topology, which is also called the classical topology. Given an affine algebraic set X C C^, the complex topology on X is the topology that X inherits from the usual Euclidean topology on C^, i.e., a basis of open sets on X at a point x* € X is given by the intersection of X with the balls
{xeCN | ||x-x*|| <e} for 0 < e e IR and M the Euclidean norm. Both topologies are useful. Since every closed set Y in the Zariski topology on an affine algebraic set X C CN is the zero set of a finite number of polynomials, it follows that Y is also closed in the complex topology. Thus the complex topology has at least as many open sets as the Zariski topology. Except for the case of X a finite set, the Zariski topology has many fewer open sets than the complex topology. For example, if X is one-dimensional, then the open sets of the Zariski topology are the complements of finite subsets of X, that is, X minus a finite number of points. For X = C, this follows immediately from the fundamental theorem of algebra, Theorem 5.1.1. The point to understand is that a statement about Zariski open sets is much stronger than one about open sets in the complex topology. In particular, a nonempty Zariski open set of an irreducible affine algebraic set X is dense, and therefore a property that holds on a nonempty Zariski open set of X holds with
212
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
probability one on random points of X, as was discussed in Chapter 4. For example, the nonempty open sets of C in the Zariski topology are the complements of finite sets, but for the complex topology, the interior of the unit disk is a possible open set. For more on the material in this section, (Red Book: Chapter 1.10 Mumford, 1999) is a good reference. In § 12.4, we discuss the quasiprojective algebraic sets, a very broad class of algebraic sets that includes both affine algebraic sets and Zariski open sets of affine algebraic sets. For now, we would like to point out that certain Zariski open sets of afHne algebraic sets may be identified with affine algebraic sets in different Euclidean spaces. Given a Zariski open set U of an affine algebraic set X c C ^ , we define the algebraic functions on U to be all functions of the form - where p,q G C [ x i , . . . , XJV] and V(q)nU = 0. Given a Zariski open set U on an affine algebraic set X C C ^ and a Zariski open set V on an affine algebraic set Y C C M , a map / : [ / — > V is said to be an algebraic map if / := FTJ where F : U —> C M is given by F := ( F 1 ; . . . , FM), with all of the Fj being algebraic functions. In line with the earlier definition of isomorphism in the case of affine algebraic sets, we say that U and V are isomorphic if there are algebraic maps F : U —> V and G : V —> U with F o G the identity on V and G o F the identity on U. If g is an algebraic function on an affine algebraic set X C C*, then X \ V(g) is isomorphic to an affine algebraic set. See Lemma A.2.4 for a proof of this useful fact. The Zariski open sets U of the form X \ V(g) are a basis for the Zariski open sets on X. To see this let Y := V(h\,..., hr) be an affine algebraic set on X. Then X\Y
= ur=1 (X \ v(hi)).
Not every Zariski open set of an affine algebraic set X is of the form X \ V(g), e.g., in Example A.2.3, we show that 0 C C w for iV > 2 is not of the form V(g) for a polynomial g. 12.1.2
Proper Maps
A continuous map / : X —> Y between topological spaces is called proper if for each y E Y, there is an open set U C Y containing y and such that U and f~x ([/) are compact. An algebraic map / : X —> Y between quasiprojective algebraic sets is called a proper algebraic map if / is proper as a continuous map in the complex topology. Proper maps are very nice, e.g., see § A.4. They also arise naturally when working in a probability-one framework. 12.1.3
Linear Projections
In this subsection, we give a brief introduction to linear projections: see § A.8 for more details.
Some Concepts From Algebraic Geometry
213
A linear projection ix: CN -> Cfc, N > k, is a surjective affine map 7r(x!, ...,XN)
= (LI(X),
. . . , Ljt(a:)),
(12.1.3)
where JV
Li(x) := al0 + ^2 a%ixh
a
ij
e
C.
We say that TT is a generic linear projection if the coefficients a^ are chosen "randomly." Precisely speaking, this only has meaning in the context of some property we are interested in. For example, in Theorem 12.1.5 below, we say that a generic linear projection restricted to X is proper, which means that there is a Zariski open dense subset of the a^ £ £kx{N+i) w^ t n e prOper^y t n a t ^ ne restriction to X of the linear projection, constructed from the Oy, is proper. Choosing a generic linear change of coordinates, i.e., choosing N generic linear maps to C, any projections along the coordinate axes is generic. The simplest example of a nontrivial linear projection 7r : C2 —> C is given by sending (2:1,£2) to X\. To see what this corresponds to in projective space, fix the ernbeddings • C2 into P2 given by sending (xi,x2) —> [l,Xi,x 2 ]; and • C into P 1 given by sending Xi —> [l,Xj]. We now have a commutative diagram C2 ^ P 2 \ {[0,0,1]} ni in' C ^P 1 where the map TT' : P 2 \ {[0,0,1]} —> P 1 is given by sending [xcXi,^] ~* [^Oj^i]Given two distinct points a, b £ FN, let (a, b) denote the unique line through them. The map TT' is often referred to as the projection from {[0,0,1]} because we can think of the map as sending each point i £ P z \ { [ 0 , 0 , 1 ] } to (x,[0,0,l])f){x 2 =0}. Intuitively we have a source of light at {[0,0,1]} and we send each point to the shadow it casts on {xi — 0}. With projections, we are perfectly happy to change the image by a linear transformation, and with this notion of equivalence, the projection is uniquely determined by the point {[0,0,1]}. The point {[0,0,1]} is called the center of the projection. Projections from points at infinity, i.e., points of the form [0, a, b], correspond to linear projections C2 —> C given by sending (x\, X2) to x\ — (a/b)x2 G C, as illustrated in Figure 12.1. From the point of view of projective space, there is nothing special about the points at infinity, and indeed on occasion, e.g., (Sommese, Verschelde, & Wampler,
214
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Fig. 12.1 Projection from point at infinity {[0, a, b]}
2001b) and (Calabri & Ciliberto, 2001), it is useful to project from points not at infinity. The case of a projection C2 —• C with a finite center is illustrated in Figure 12.2, where point c is the center of the projection. (We only draw the real part.) The set of all the lines through c are equivalent to a projective space P 1 , and the projection of a point x is the line P{x) through x and c. To perform calculations, we will often select a line, such as line L, and set TT(X) := LnP(x). No matter which line we choose in place of L, the essential fact is that all points along P{x) \ {c} have the same projection as point x. From this observation, it follows that the projection is determined uniquely by the center c.
Fig. 12.2 Projection with finite center c
We need the following important result, which is proven in § A. 10.4. Theorem 12.1.5 (Noether Normalization Theorem) Let X C CN denote an affine algebraic set. Let ix : CN —> Ck denote a generic linear projection. Then if dim X < k, the map nx is a proper algebraic map with allfibersT^x^iv)finitefor
Some Concepts From Algebraic Geometry
215
ally £ Y :=n{X). If dim X < k, then there is a Zariski dense subset U c X such that i\u : U —> TT(C/) is an isomorphism. If X is of pure dimension k, then nx is a branched covering of degree degX. 12.2
The Irreducible Decomposition for Affine Algebraic Sets
Given an affine algebraic set Z, we let Z reg denote the set of smooth points of Z. The set Z reg is an open set, dense in Z, with Z \ Zreg equal to a union of affine algebraic sets, which is why smooth points are also referred to as regular points. We say that Z is irreducible if Z reg is connected. We would like to follow the traditional, and very common, usage, e.g., (Mumford, 1995), and call an irreducible affine algebraic set an affine variety. It is unfortunate that affine variety has been used as a synonym for affine algebraic set by some authors. At this point it is safe to say that anyone picking up a book on algebraic or complex geometry must check whether varieties are irreducible or not (also reduced or nonreduced if that applies). For example, in (Mumford, 1995) affine variety means irreducible affine algebraic set, but in (Gunning & Rossi, 1965), a variety is a not necessarily irreducible reduced analytic set. The word variety is easier to say than irreducible algebraic set, but, to avoid confusion, we have reluctantly avoided use of this ancient word. The irreducible decomposition of an affine algebraic set Z C C is the decomposition Z := UaezZa obtained by first decomposing Zreg into the disjoint union of connected components Ua and letting Za denote the closure of Ua. Here I is just an index set assigning subscript numbers to the irreducible components. For many of our algorithms, it will be useful to group the irreducible components according to their dimensions, in which case we have index set Xi for dimension i, and we write Z-^UjLoZi,
Zi = UjeIiZij
(12.2.4)
where Zi is the union of all i-dimensional irreducible components of Z, and where Zij for j € Xi are the finite number of distinct irreducible components of Zi. Some of the Zi may be empty, that is, Z might not have components at every dimension. Indeed, the only possible component at dimension n is the whole of C n , which precludes any lower-dimensional pieces, so the decomposition is only interesting when Zn = 0. A simple example is given by Z := V{x\X2). This affine set is the union of the X\ and x2 axes, and since this set is clearly singular only at the origin, the irreducible decomposition of Z is Z = V{x{) U V{x2). The irreducible decomposition is a fundamental tool in understanding solution sets of polynomial systems. The primary aim of the remainder of this book is to show how to numerically find and manipulate this decomposition. (D'Andrea & Emiris, 2003) is a good place for obtaining an overview of symbolic algorithms for rinding the irreducible decomposition.
216
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Remark 12.2.1 (The algebraic situation) Though we will use the geometric approach to solution sets, there is a natural approach based on the underlying algebra of the polynomial system. Let 1{f) C C[x\,... ,XN] denote the ideal generated by the polynomials / i ( x i , . . . , XN), • • •, fn(xi> • • • > XN) making up the polynomial system /. Note that V(/) = V(J(/)). Given an affine algebraic set S c C N , let I(S) C C ^ ! , . . . , xN], denote the ideal of polynomials vanishing on S. An affine algebraic set Z is irreducible if and only if I{Z) is a prime ideal. Given any ideal J c C[xi,..., XN], V% the radical of I, is the ideal consisting of all / G C[xi,... ,XN] such that fk G I for some k. The irreducible decomposition is equivalent to the fact that for any ideal I C C[xi,..., XN], we can write \/T = C\a^j(Pa, where Va are the finite number of minimal prime ideals containing I. For example, y/l{x\x\) = I(x\X2) = I(xi) n I(x2). One weakness of the exact irreducible decomposition is that it assumes that the polynomials are exact and an algebraic set will be said to be irreducible even though it is for all practical purposes reducible. For example, let p{x,y) = xy — e. For e = 0, V(p) has two components, but for e ^ 0, V(p) is irreducible, even if e is so small that, in a problem arising in engineering or science, it is just noise. This sort of discontinuous behavior is not realistic for problems where data is never completely exact. For small e, numerical-geometrical methods will rather gracefully give different answers depending on the precision used. 12.2.1
The Dimension of an Algebraic Set
Using the irreducible decomposition, we can finish the definition of dimension. We define the dimension of an irreducible affine algebraic set to be the dimension of the smooth points, Xreg. Since the smooth points of an irreducible component are connected and dense, this is very natural. We say that an affine algebraic set X is pure-dimensional if all the irreducible components of X have the same dimension. We define the dimension dim^ Z of an affine algebraic set Z at a point x G Z to be the maximum of the dimensions of the irreducible components of Z that contain x. We define the dimension dim Z to be max dim-r Z. xez Here is a basic fact about dimension, which follows from the general result (Theorem III.C.14 Gunning & Rossi, 1965). Theorem 12.2.2 Let Z be an irreducible affine algebraic set Z C C^ of dimension k. Then given a polynomial f on CN which is not identically zero on Z, it follows that the dimension of every component of Z n V(f) is k — 1. Here are some points to be aware of.
Some Concepts From Algebraic Geometry
217
(1) Since the smooth points of an irreducible affine algebraic set Z are connected, it follows that given any point z G Z, every Zariski open neighborhood of z is irreducible. This can fail in the complex topology, as shown in the following example. Consider the curve Z := V(x2—Xi(xi + 1)) in the neighborhood of the point z = (0,0). The real part of this curve is shown to the right, where one may see that near the origin, the curve is 2 / resembles two lines, xi = ±xi, so in the local neighborhood it is not irreducible, even though globally the curve is one irreducible /~\/ piece. The solution set over the complexes is topologically a real \ Z ^ \ ~x[ two-plane stretched and bent such that two points touch each \ other. Local to the point of contact, it looks like two disks touching transversely, but globally it is all one surface. This is discussed in more detail in Example A.4.18. (2) Real points of irreducible algebraic sets do not have to be connected, nor do the components have to have the same dimensions. V{x\ — X\{x\ — l)(xi — 2)) is an example of the former and V{x\ — x\(x\ ~ 2)) is an example of the latter. Nor does there have to be much relation between degrees and number of real isolated zeros. For example, following (Example 13.6 Fulton, 1998), let p(x, y) := U^{x
- if + Iif=1(y - j) 2 .
We have m2 zeroes on R2 despite degp(x,y) = 2m. Over C, we have a curve with these m2 points all singular. 12.3
Further Remarks on Projective Algebraic Sets
Though, for applications, affine algebraic sets are the main interest, we must also define projective algebraic sets. We need them to be able to discuss what happens at infinity for a given polynomial system, and in particular to be able to carry out accurate counts of solutions of polynomial systems. Also the behavior of projective algebraic sets is often easy to understand, e.g., see the Proper Mapping Theorem A.4.3, and they can be used to understand the behavior of affine algebraic sets. In this section we continue the discussion of projective sets started in § 3.5. FN is a compact manifold containing CN as a dense open set. The natural approach to the definition of algebraic sets on WN is to define them as the solution sets of finite numbers of whatever are the analogue for FN of polynomials on C^. At first glance this does not look hopeful, since we cannot expect any nontrivial global algebraic functions. To see this consequence of the compactness of P™, consider the representative case of P 1 . Polynomials on C are holomorphic functions, and so under any reasonable definition, an algebraic function / on P 1 should be a holomorphic function. The
218
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
snag is that since P^1 is compact, it follows from continuity that |f(x)| has a maximum at some point x* in P^1. Thus, by the Maximum Principle, Lemma A.2.7, f(x) must be constant on an open neighborhood of x*, and therefore on all of P^1. At first sight this is discouraging, but the key insight is that although there is no reasonable class of algebraic functions on P^N, there are some "almost functions" lying around, i.e., the homogeneous polynomials. It is important to realize that, even though homogeneous polynomials are not functions on projective space, they behave as "extensions" to P^N of polynomials on C^N. Later we will return to homogeneous polynomials in § A.13 and see that they are the prototypical nontrivial example of "sections of line bundles." Before we give definitions, let's work out a simple representative example. Let p(x_1, x_2) = x_1^2 - x_2 + 1 be a function on C^2. Regarding C^2 as the coordinate patch U_0 ⊂ P^2 as above, we have in terms of the homogeneous coordinates [z_0, z_1, z_2] on P^2 that x_1 = z_1/z_0 and x_2 = z_2/z_0. Thus the function x_1^2 - x_2 + 1 is represented by
(z_1/z_0)^2 - (z_2/z_0) + 1 = (z_1^2 - z_0 z_2 + z_0^2) / z_0^2.
Under the identification of U_0 with C^2, it is easy to check that the closure in P^2 of the zero set V(p) is the zero set V(f) of the homogeneous polynomial f(z) := z_1^2 - z_0 z_2 + z_0^2.
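A small computational sketch of this homogenization step follows. It is not from the book's HOMLAB code; it assumes sympy is available and uses Poly.homogenize with a new symbol z0 of our own choosing.

```python
# Homogenize p(x1, x2) = x1**2 - x2 + 1 and check the dehomogenization.
from sympy import symbols, Poly, simplify

x1, x2, z0 = symbols('x1 x2 z0')

p = Poly(x1**2 - x2 + 1, x1, x2)

# Expected: Poly(x1**2 - x2*z0 + z0**2, x1, x2, z0), i.e., after renaming
# x1 -> z1, x2 -> z2 this is f(z) = z1**2 - z0*z2 + z0**2.
f = p.homogenize(z0)
print(f)

# Setting z0 = 1 (the patch U0) recovers p, so the difference is 0.
print(simplify(f.as_expr().subs(z0, 1) - p.as_expr()))
```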
The following two examples indicate that counting solutions in C^2, even when we just have points, is not so clear cut as on C.

Example 12.3.1 Consider the system
f(x, y) := [ x - y^2 ; ax + by + c ] = 0.
The reader can check that if a ≠ 0, then there are two solutions to f(x, y) = 0 (counting multiplicities in the obvious way when b^2 - 4ac = 0). But what about the case a = 0, b ≠ 0, where we only have one solution?

Example 12.3.2
Consider the system of two polynomials on C^2
f(x, y) := [ y ; y - 1 ] = 0.    (12.3.6)
We expect two lines to meet in a point, but these two parallel lines do not. We already met similar systems in Chapter 3, so we know that the key to simplifying solution counts is to homogenize the systems. In this way, Example 12.3.1 becomes
g(w, x, y) := [ wx - y^2 ; ax + by + cw ] = 0,    (12.3.7)
which now has, for a = 0, a second solution point at infinity of [w, x, y] = [0, 1, 0] in P^2, formerly "missing" from the affine version. Similarly, Example 12.3.2 becomes
g(w, x, y) := [ y ; y - w ] = 0,    (12.3.8)
which now has the solution point at infinity along the x-axis, [w, x, y] = [0, 1, 0] in P^2. Note that Example 12.3.2 shows that if we have a system f on C^N, then the closure in P^N of the set of solutions of f may be smaller than the set of solutions V(f̄) of the associated system f̄ of homogeneous polynomials on P^N. In that example, V(f) is empty, so its closure is too, whereas V(f̄) is the point {[0, 1, 0]}. It is easily checked that V(f̄) ∩ C^N = V(f).
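The behavior of Example 12.3.1 can be checked symbolically. The sketch below uses the reconstruction of that example given above and assumes sympy; the specific degenerate coefficients a = 0, b = 1, c = -2 are our own choice.

```python
from sympy import symbols, solve

w, x, y = symbols('w x y')
a, b, c = 0, 1, -2                    # a degenerate case: a = 0, b != 0

g = [w*x - y**2, a*x + b*y + c*w]     # the homogenization, Equation 12.3.7

# Affine patch w = 1: only one finite solution when a = 0.
print(solve([e.subs(w, 1) for e in g], [x, y]))          # [(4, 2)]

# At infinity (w = 0): the first equation forces y = 0 and the second is
# automatic since a = 0, giving the extra root [w, x, y] = [0, 1, 0].
print(solve([e.subs(w, 0) for e in g] + [x - 1], [x, y]))  # [(1, 0)]
```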
12.4 Quasiprojective Algebraic Sets
Sets of the form X \X nY, where X, Y C FN are projective algebraic sets, are called quasiprojective algebraic set. These include sets of the form X \ X nY, where X, Y c P ^ are affine algebraic sets. The simplest nontrivial example of a quasi-projective algebraic set which is neither projective nor affine is C 2 \ 0. As with affine algebraic sets, we can with no changes define the Zariski and complex topology and the notion of irreducibility. The following is a basic fact. Theorem 12.4.1 Let U be a Zariski open dense subset of a quasiprojective algebraic set X. Then the closure of U in X in the complex topology is X. Proof. This follows immediately from (Theorem 2.33 Mumford, 1995).
•
Finally we note that all the basic results such as the irreducible decomposition of § 12.2 hold for quasiprojective algebraic sets (respectively projective algebraic sets) and not just for affine algebraic sets. The only difference is that the irreducible components in this generality are not affine algebraic sets, but are only quasiprojective varieties (respectively projective algebraic sets). Using this we carry over all the definitions of dimension. For example, a pure-dimensional quasiprojective set is a quasiprojective set with all irreducible components having the same dimension. Let X and Y be quasiprojective algebraic sets. We define an algebraic map f : X —> Y between X and Y to be a map such that for all x € X and j e F there are affine open sets U C X containing x and V C Y containing y such that f(U) C V and / : U —> V is algebraic. The set X x Y is a quasiprojective set, which may be shown by elaborating on § A. 10.2. The graph of a map f : X —> Y is the set Graph(f) := {(x,f(x)) e X x Y | x G X}.
It may be shown that an equivalent definition for an algebraic map f : X —• Y between quasiprojective algebraic sets is that / is a map from X to Y such that the graph Graph(f) C X x Y of / is a quasiprojective algebraic subset of X x Y. The following fact is useful. Theorem 12.4.2 The complement of a proper quasiprojective algebraic subset Y in an irreducible quasiprojective set X is connected. If a quasiprojective set X is connected, then X is path connected. Proof. The first assertion follows immediately from (Chapter 4, Corollary (4.16) Mumford, 1995). The second assertion would follow if we knew it for irreducible quasiprojective algebraic sets. Given any irreducible quasiprojective set, there is a connected smooth manifold mapping onto it by Hironaka's Desingularization Theorem A.4.1. Since • connected manifolds are path connected, we are done. Few algebraic sets are both affine and projective. Lemma 12.4.3
Let X C CN denote a compact affine set. Then X is finite.
Proof. To see this assume otherwise. By the irreducible decomposition from § 12.2, we know that if X is compact and not finite, then X contains a compact irreducible infinite affine algebraic set. We can assume without loss of generality that X is this set. The absolute value of any coordinate function z_i restricted to X has a maximum on X. By Lemma A.4.2, the restrictions of all the coordinate functions are constants, and hence X is a single point, a contradiction. □

12.5 Constructible Algebraic Sets
Let us start with an example leading to a constructible set.

Example 12.5.1 Suppose we were interested in the family of systems of polynomials in C[x, y]
F_{(t,u)}(x, y) := [ x - t ; xy - u ] = 0,    (12.5.9)
parameterized by (t, u) in C^2. The set of (t, u) in C^2 where F_{(t,u)}(x, y) = 0 has a nonempty solution set is
{(0, 0)} ∪ {t ≠ 0}.
This set is not quasiprojective, but it is constructible. Let X be a quasiprojective algebraic set. Let A(X) denote the set of closed algebraic subsets of X. A(X) is closed under finite unions and arbitrary intersec-
tions. The set T(X) of complements of the elements of A(X) are the open sets of the Zariski topology of X. The set C(X) of constructible sets of X is the smallest set of subsets of X that • contains A(X) and • is closed under a finite number of Boolean operations, where the Boolean operations are union, intersection, and sending a subset of X to its complement in X. Otherwise said, C(X) is the Boolean algebra of subsets of X generated by A{X) (or equivalently T{X)). Constructible sets are the outer limits of the type of sets that need to be considered in the numerical analysis of polynomial systems. We will see that they arise naturally when working with affine algebraic sets. We present here a few key facts about constructible sets. A fuller discussion may be found in (Chaps. AG.l and AG.10 Borel, 1969). Lemma 12.5.2 Let X be a quasiprojective algebraic set. Assume that A C X is a constructible set such that A = X, where the closure is in the Zariski topology. Then there exists a Zariski open and dense set U C X such that U C A. Proof. See (Proposition in Chap. AG.2 Borel, 1969)
•
Lemma 12.5.3 Let A be a constructible subset of a quasiprojective algebraic set X. Then the closure of A in the complex topology and in the Zariski topology are the same. Proof. Use Lemma 12.5.2 and Lemma 12.4.1.
•
When we take closures of constructible sets (and almost every set that comes up in this book is at worst constructible) this lemma tells us it does not matter whether we use the complex or Zariski topology: in either case we get the same algebraic set. For this reason, we often do not specify which topology we are taking the closure in. It is useful to record the trivial case when a constructible set is automatically algebraic, a corollary to Lemma 12.5.3. Lemma 12.5.4 Let A be a constructible algebraic subset of an affine (respectively protective, respectively quasiprojective) set. If A = A, e.g., if A is closed in the complex topology, then A is an affine (respectively projective, respectively quasiprojective) set. Example 12.5.1 is a simple and fairly typical example of a constructible set. Here it is said a slightly different way. Example 12.5.5 Consider the map F : C2 —> C2 which sends (z,w) —• (z,zw). This is a nice algebraic map, but the image is (C2 \ {z = 0}) U {(0,0)}, which is
neither the set of zeros of a set of polynomials nor the complement of such a set. This is about the worst the image of an algebraic map gets. Theorem 12.5.6 (Chevalley's Theorem) Let F : X —> Y be an algebraic map between quasiprojective algebraic sets. If Z G C(X), then F(Z) s C(Y). Proof. See (Corollary AG.10.2 Borel, 1969).
a
Chevalley's Theorem is one of the features distinguishing algebraic geometry from complex analytic geometry, e.g., holomorphic maps are too wild to admit any such result. Corollary 12.5.7 Let f : X —> Y be an algebraic map between quasiprojective algebraic sets. Then given any irreducible component B' of f(X), there is an irreducible component A of X with f(A) = B'. In particular, if X is irreducible, then f(X) is irreducible. Proof. By Chevalley's Theorem 12.5.6 we know that B := f(X) is algebraic. We first show the special case when X is irreducible. We have the irreducible decomposition B = U*j=1Bj for some positive integer r. Since X is irreducible and contained in \Jj=1f~1(Bj), we conclude that X = f~l(Bj) for some j . Thus f{X) CB3 and B = By For the general case assume that X has an irreducible decomposition \J\=1Xj for some finite s. We have the irreducible decomposition B = U^=1Bj for some positive integer r. By the last paragraph, f(Xi) is irreducible. Since for any j we have that Bj C B = Uf=1/(Xj), we conclude that Bj C f(Xi) for some i. Since any component of the irreducible decomposition of an algebraic set B is not contained • in any larger irreducible algebraic subset of B, we conclude that Bj = f{Xi). Maps of algebraic sets that "should" be surjective often fail to be because the domain lacks some points at infinity. For example, the map from V{zw — 1) C C2 to C obtained by sending (z,w) —> z misses z = 0. Example 12.5.5 is also of this sort. For this reason, it is often more useful to use the notion of a dominant map. A map / : X —> Y between quasiprojective algebraic sets is called dominant if f(X) = Y. Lemma 12.5.8 Let f : X —>Y be a dominant algebraic map from a quasiprojective set X to an irreducible quasiprojective set Y. There exists a Zariski open dense set V C Y contained in f(X). Proof. This is an immediate consequence of Theorem 12.5.6 and Lemma 12.5.2. • Thus using the Upper-Semicontinuity of dimension Theorem A.4.5 and Chevalley's Theorem 12.5.6, we have the following result.
Lemma 12.5.9 Let f : X → Y be an algebraic map of quasiprojective algebraic sets. Then for any integer k, the set {y ∈ Y | dim f^{-1}(y) > k} is constructible.
12.6 Multiplicity
Multiplicity appears in numerous places in algebraic geometry. In its simplest form, it is very easy to understand, e.g., given a not identically zero polynomial of one variable p(x) in C[x], the multiplicity of a point x* in V(p) is the integer μ > 0 such that p(x) = (x - x*)^μ q(x) for a polynomial q(x) with q(x*) ≠ 0. In several variables, the story for a single polynomial is the same. Let p(x_1, ..., x_N) in C[x_1, ..., x_N] be a not identically zero polynomial on C^N. The irreducible decomposition of V(p(x)), the solution set of p(x) = 0, is a decomposition
V(p) = ∪_{i=1}^{r} Z_{N-1,i},
where the Z_{N-1,i} are distinct affine algebraic varieties, i.e., distinct irreducible affine algebraic sets. Moreover, dim Z_{N-1,i} = N - 1 for all i and there are polynomials q_i(x) such that
(1) Z_{N-1,i} = V(q_i);
(2) the multiplicities of the solutions of the one-variable polynomial obtained by restriction of q_i(x) to a generic line are all one; and
(3) p(x) = q_1(x)^{μ_1} ⋯ q_r(x)^{μ_r}.
This is a satisfying description of the multiplicity of a component, although already the situation is not so easy to prove as in one variable. What about the multiplicity of an isolated solution x* of a polynomial system
f(x) := [ f_1(x_1, ..., x_N) ; ⋮ ; f_n(x_1, ..., x_N) ] = 0 ?    (12.6.10)
The difficulties with multiplicities begin when we have a set defined by more than one polynomial. Theorem 12.2.2 implies that for this system to have an isolated solution, n must be at least N. Perhaps the simplest example is given by the system
z_1 = 0,  z_2^2 = 0.    (12.6.11)
It is completely reasonable to say that the origin (0, 0) is a multiplicity 2 solution of this system. Indeed, since z_1 = 0 defines the z_2-axis, and since the restriction of z_2^2 = 0 to the z_2-axis has 0 as a multiplicity 2 root, we must either have that (0, 0)
is a multiplicity 2 solution of the system in Equation 12.6.11 or give up any sort of reasonable compatibility with the already defined notions. We define the multiplicity of x* as a solution of the system to be the dimension μ of the finite-dimensional vector space O_{C^N, x*} / (f_1, ..., f_n), where
(1) O_{C^N, x*} is the ring of convergent power series centered at x*; and
(2) (f_1, ..., f_n) is the ideal of O_{C^N, x*} generated by the polynomials f_i.
It is straightforward to see that when n = N = 1 this agrees with the notion of multiplicity that we are used to, but it is certainly not clear what this means when N > 1. Also, why convergent power series? It turns out this is just a convenience for us. One could use instead formal power series, or the ring of rational functions p(x)/q(x) with q(x*) ≠ 0. But the equivalence of the multiplicities obtained these different ways is not obvious! In the special case n = N, μ has a simple geometrical interpretation. If x* is a multiplicity μ isolated solution of f(x) = 0 and you choose a generic vector v in C^N sufficiently near 0, then f(x) = v has exactly μ nonsingular isolated solutions x_1^*, ..., x_μ^* near x*. By nonsingular we mean that the Jacobian matrix, J, with elements
J_{ij} = ∂f_i(x) / ∂x_j,
is invertible at each of x*,..., x*. This, in fact, implies that ji = 1 is equivalent to the solution x* being nonsingular isolated. Another consequence of this, in the case n = N, is that with appropriate homotopies of the sort we construct, the number of paths ending at x* equals the multiplicity. Unfortunately, when n ^ N, the meaning of multiplicity becomes a bit more obscure, and not so closely connected to geometric intuition. This is a reflection of the complexity of the nonreduced structures on points in higher dimensions, i.e., the zero dimensional nonreduced schemes. Since we do not make much use of multiplicity we do not pursue this. If you do, you need to put multiplicity into a broader context of Hilbert functions, e.g., see the discussion of multiplicity in (Hartshorne, 1977), and in particular (Exercise V.3.4c Hartshorne, 1977). The books (Eisenbud, 1995; Fulton, 1998) are good algebraic references. See also (Bates, Peterson, & Sommese, 2005a) for a numerical-symbolic algorithm for computing multiplicity. Multiplicity for us arises in another way. Consider C := V(x\ - x\) c C2. The multiplicity of C as a component of the solution set of x\ — x\ = 0 is 1. In this case, it is useful to attach a multiplicity to each point of C. We define the multiplicity of a point x* € C as a point of C to be the multiplicity of x* as an isolated solution
x* of the system obtained by appending to the defining equation of C the linear equation
a_0 + a_1 x_1 + a_2 x_2 = 0,
where V(a_0 + a_1 x_1 + a_2 x_2) is a generic line vanishing at x*. An excellent and very readable reference for this sort of multiplicity is (Chapter 8 Fischer, 2001).
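The geometric interpretation of multiplicity in the case n = N is easy to see numerically. The sketch below, with helper names of our own, perturbs the system of Equation 12.6.11 to f(z) = v for a small random complex v and checks that the multiplicity-2 root at the origin splits into 2 nonsingular roots.

```python
import numpy as np

def f(z):                       # the system z1 = 0, z2**2 = 0
    return np.array([z[0], z[1]**2])

def jac(z):
    return np.array([[1.0 + 0j, 0.0], [0.0, 2.0 * z[1]]])

rng = np.random.default_rng(0)
v = 1e-6 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))

# Solutions of f(z) = v can be written down explicitly here.
roots = [np.array([v[0], s * np.sqrt(v[1])]) for s in (+1, -1)]

for z in roots:
    print(np.linalg.norm(f(z) - v),             # ~0: it solves f = v
          abs(np.linalg.det(jac(z))) > 1e-12)   # True: nonsingular
```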
12.7 Exercises
Exercise 12.1 (Solution Components) Solve the system on page 207 using a total-degree homotopy. Do you get points on every component? How many?

Exercise 12.2 (Projection from a Point) Write out the formula for a projection C^2 → C from center c onto the line {x_2 = 0}.

Exercise 12.3 (Composition of Projections) Write the projection C^N → C^{N-1} from center c in C^N onto the hyperplane {x_N = 0}. For points c_1, c_2 in C^N, consider the projection C^N → C^{N-2} given by the composition of the projection π_1 : C^N → C^{N-1} from center c_1 followed by the projection π_2 : C^{N-1} → C^{N-2} having π_1(c_2) as center. Is the result the same or different if we reverse the order of c_1 and c_2?

Exercise 12.4 (Dimension of an Affine Algebraic Set) Let Z be the solution set of the initial example on page 207. What is the dimension of Z? What is dim_{(1,2)} Z?

Exercise 12.5 (Classifying Sets) Classify each of the following sets as affine, projective, quasiprojective, or constructible. Remember that the classifications are not mutually exclusive.
(1) V(xy - y - 1).
(2) The image of V(xy - y - 1) under the projection (x, y) ↦ x.
(3) V(x^2 + y^2 + yz, xz - 2z^2).
(4) The set of quadratic equations in one variable that have two distinct roots.
(5) The nonsingular solution points of y^2 - x^2(x - 1) = 0.
(6) Points in C^2 that are not nonsingular solutions of y^2 - x^2(x - 1) = 0.
(7) Pairs of points in C^2 such that there is a unique line containing them.
(8) Pairs of points as in the previous item such that the line contains the origin.
Exercise 12.6 (Real Solution Points) Verify the statements in item 2 on page 217 concerning the real points of the two algebraic sets mentioned there.
Exercise 12.7 (Multiplicity of an Isolated Solution) Prove that the multiplicity of (0, 0) as a solution of
z_1 = 0,  z_2^2 = 0
is 2. (Hint: n = N.) Demonstrate this numerically using HOMLAB.

Exercise 12.8 (Multiplicity of an Irreducible Affine Set at a Point) Show that the multiplicity of a point x* of the curve C considered at the end of § 12.6 is 1 for all points but x* = (0, 0), for which it is 3. Demonstrate these facts numerically using HOMLAB.
Chapter 13
Basic Numerical Algebraic Geometry
Our overarching goal is to numerically encode an algebraic set Z in a form that allows us to answer such basic questions as:
Membership Is point x in Z?
Dimension What is the dimension of Z?
Degree What is the degree of the pure i-dimensional component of Z?
Decomposition What are the irreducible components of Z?
This is just a beginning, however, for suppose we have a similar encoding for a second algebraic set Y. Then, we would like to answer:
Inclusion Is Y a subset of Z?
Equality Is Y equal to Z?
Finally, we would like to propagate the encoding through Boolean binary operations, that is, if we have encodings for algebraic sets Y and Z, we would like to:
Union Find an encoding for X = Y ∪ Z; and
Intersection Find an encoding for X = Y ∩ Z.
(We regard the third Boolean operation of complementation as just the negation of the membership test.) Numerical algorithms to answer these questions form the foundation of numerical algebraic geometry. Typically, we begin not with an algebraic set, but rather, with a system of polynomials f(x) : C^N → C^n,
f(x) = [ f_1(x) ; ⋮ ; f_n(x) ] = 0.    (13.0.1)
Then, our object of study is the solution set of f = 0, which we often write as
Z = V(f).
As discussed in § 12.2, we know that any affine algebraic set decomposes as
Z := ∪_{i=0}^{dim Z} Z_i,   Z_i := ∪_{j ∈ I_i} Z_{ij},    (13.0.2)
where Zj is the union of all i-dimensional irreducible components of Z, and where Zij for j £ Xi are the finite number of distinct irreducible components of Zi. Geometrically, the Z^ are the closures of the connected components of the set of manifold points of Z. The algebraic set Z might be the entire solution set V(f) or it might be the union of several of its irreducible pieces. In the latter case, once we have built our encoding for Z, we wish to answer all our questions about Z alone, as if all the components excluded from Z did not exist. The purpose of this chapter is to motivate and describe an encoding of algebraic sets that we call witness sets. A first look at these is given in an introductory section, § 13.1, without worrying about how we can compute them or even justifying that they are well denned. In § 13.2, we present basic theory concerning the intersection of irreducible components with linear spaces, which is the underpinning for our formulation of witness sets, given precisely in §13.3. Next in §13.4, we define the rank of a polynomial system and present a fast algorithm to compute it. Then, in § 13.5, we show how the solution set of a system of polynomial equations relates to the solution set of a system of random linear combinations of those same polynomials. This prepares us for an algorithm in §13.6 to compute a loose inclusion of witness sets, called witness supersets. The final section, § 13.7, uses these concepts and procedures to obtain numerical methods to answer several of our basic questions itemized above. Much of this Chapter is based on the article (Sommese & Wampler, 1996), where the subject Numerical Algebraic Geometry was started, and its name coined. The name was chosen to indicate that this subject would be to algebraic geometry what numerical linear algebra is to linear algebra. After this chapter, one major problem remains before we can compute the numerical irreducible decomposition, Equation 13.1.3, namely, the witness point supersets are only a crude approximation to the numerical irreducible decomposition. A lesser problem is that the procedures given in this chapter for finding the witness point supersets are not as efficient as we would like. The two chapters following this chapter show how to solve these problems. Chapter 14 gives the efficient algorithm of (Sommese & Verschelde, 2000) to find the witness point supersets. Chapter 15 gives efficient algorithms (Sommese & Wampler, 1996; Sommese, Verschelde, & Wampler, 2001c, 2002b) to process the numerical irreducible decomposition out of the witness point supersets. The notion of witness set has developed over time. Originally in (Sommese & Wampler, 1996) and continuing through (Sommese & Verschelde, 2000), the central notion was that of generic point of a component, though all the information contained in what we now call witness sets was being computed and used. In the successive articles (Sommese et al., 2001c, 2002b), the notion of irreducible witness sets was distilled out as the essential numerical output of our algorithms. The enriched version of the witness sets for nonreduced components, presented in this chapter for the first time, is based on the experience gained from (Sommese, Verschelde, &
Wampler, 2002a).

13.1 Introduction to Witness Sets
What should we adopt as our numerical encoding of algebraic sets? Let's begin by considering the simplest case, a zero-dimensional algebraic set Z. This is just a finite set of points, so we can use as our encoding a list of the points. When we are given a system of N polynomial equations in TV unknowns, the methods of Part II allow us to find a numerical approximation to all nonsingular solution points, and in fact, those methods give us a list of homotopy path endpoints that includes all isolated singular solution points as well, although we cannot readily sort these out from singular endpoints on higher dimensional components. Nevertheless, we have some confidence that the encoding of Z as a list of solution points is computable. Moreover, up to the approximation of numerical roundoff, we can easily answer all our questions about membership, union, intersection, etc. The subtlety regarding isolated singular solutions is a concern that we can resolve by considering the larger picture that includes higher dimensional components. But what shall we do when Z is positive dimensional? Looking at natural examples, e.g., the set V{x{) C C2, there are two obvious ways of encoding the points of these algebraic sets. The first approach is to use a parametric representation, e.g., representing V(xi) C C2 as {t £ C | (xi,x2) = (0,t)}. Unfortunately, while parametric representations are very useful, they are also rare. For example, in Remark A.2.10, we sketch an argument showing that a curve as simple as V{x\—x\{x\ — l)(xi —2)) has no parametric representation. A nice discussion of which curves have a parametric representation may be found in (Abhyankar, 1990). A second approach is to use denning equations. Since by definition, algebraic sets are solution sets of polynomial equations, we know that this approach has to work. Indeed, this is the approach taken in computational algebra. Low degree equations vanishing on an algebraic set are nothing to scoff at: they can be very useful. Unfortunately, computing denning equations is numerically expensive. Furthermore such equations can be numerically unstable. Numerical Algebraic Geometry rests on a third approach, using the notion of witness sets. This natural data structure to encode algebraic sets is based on the concept of generic points and the classical notion of a linear space section. Since we are going to talk often about linear subspaces of CN, it is convenient to introduce a shorthand notation for them. We use the following conventions: • Z/dL*J c C^ denotes an affine linear subspace of dimension i; and • Lcr*l c C^ denotes an affine linear subspace of codimension i, or equivalently, of dimension N — i. Depending on context, it is sometimes easier to use the notation of codimension
instead of dimension, which is why we introduce both. Consider two generic linear subspaces, L_1 and L_2. Their dimensions add under the operation of union, while their codimensions add under the operation of intersection. If their dimensions are complementary, their intersection is zero-dimensional; i.e., L^{d[i]} ∩ L^{c[i]} is a point. The following fact, demonstrated in § 13.2, is the foundation of the notion of a witness set. Let A ⊂ C^N be a pure i-dimensional algebraic set. Given a generic affine linear subspace L^{c[i]} ⊂ C^N, the set A ∩ L^{c[i]} consists of a well-defined number d of points lying in A_reg. The number d is called the degree of A and denoted deg A. We refer to A ∩ L^{c[i]} as a set of witness points for A, and we call L^{c[i]} the associated slicing (N - i)-plane, or just slicing plane for short. The number of witness points tells us the degree of the set A, and if we determine the codimension of the slicing plane that cuts A in isolated points, we have determined the dimension of A. However, to answer most any other question about A, such as to test whether a given point x is in A, we need the ability to track the paths of the witness points as the slicing plane is moved. When A is a pure i-dimensional reduced component of a polynomial system f, then the witness points are nonsingular roots of f restricted to L^{c[i]} (more on this in a moment), so the data structure W := (A ∩ L^{c[i]}, L^{c[i]}, f) is everything a nonsingular path tracker needs to track solution paths starting at the witness points as L^{c[i]} evolves continuously. Accordingly, we call W a witness set for A. When A is not reduced we need a slightly richer structure, which will be discussed in § 13.3.2 and § 13.3.3. We will see later, in § 15.2, how to generate from a witness set as large a number as we wish of widely spaced points on A. The witness set data structure, more fully described in § 13.3, has many advantages: (1) it is stable and much cheaper numerically than finding defining equations; (2) it is sparing of memory; (3) it can be used to compute quantities of interest, e.g., if you really want defining equations they may be computed from this encoding; and (4) it is a special case of the notion of a linear space section, for which there is an extensive theory (Beltrametti & Sommese, 1995). Using witness sets, we can make numerical sense out of what it means to find the solution set of a system of polynomials f(x) = 0 in Equation 13.0.1. We wish to find a numerical irreducible decomposition that mirrors the irreducible decomposition of Equation 13.0.2, by which we mean to find a collection of witness sets W_i for the i-dimensional components V_i, which are themselves decomposed into irreducible witness sets W_{ij} for the irreducible components V_{ij}, i.e.,
W := ∪_{i=0}^{dim V(f)} W_i,   W_i := ∪_{j ∈ I_i} W_{ij}.    (13.1.3)
In Equation 13.1.3, we should be a little careful to define what we mean by the union of witness sets. Let us use the notation that WA means the witness set for
an algebraic set A. When two algebraic sets, say A and B, have no components of the same dimension, the witness set for their union is just a formal union of their witness sets, that is,
W_{A∪B} = W_A ∪ W_B = { W_A, W_B },   dim A ≠ dim B.
However, when A and B have some irreducible components of the same dimension, we require the witness sets of the components with the same dimension to have the same slicing planes L. So, in the reduced case with A a pure-dimensional union of components of V(f) for a system of polynomials f on C^N, and B a pure-dimensional union of components of V(g) for a possibly different system of polynomials g, of the same dimension as A, the formal union resolves as
W_{A∪B} = W_A ∪ W_B = { W_A, W_B } = { (A ∩ L, L, f), (B ∩ L, L, g) },
where A ∩ L and B ∩ L are the witness point sets for A and B, respectively. The resolution of unions in this fashion is not necessary, but it is convenient, and if two witness sets have different slicing planes, they can always be brought to a common slicing plane by homotopy continuation. In computing a numerical irreducible decomposition, we are faced with the opposite problem of computing a union. Our procedures will first find the witness set W_i for the i-dimensional component V_i, and subsequently, its witness point set will be partitioned into the irreducible witness point sets corresponding to the V_{ij}. In the above overview, we have claimed the existence of witness sets and asserted some of their basic properties. This chapter aims to justify these assertions and to describe some rudimentary algorithms based on them. Subsequent chapters will discuss refinements and extensions. We begin in the next section with the basic facts about intersecting irreducible components with linear spaces, thereby establishing that witness sets do indeed exist and have the main properties that we asserted above.
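Before turning to linear slicing, here is a toy sketch of the bookkeeping just described. It is not HOMLAB code; the class and field names are our own, and a real implementation would also carry the extra data for nonreduced components introduced in § 13.3.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
import numpy as np

@dataclass
class WitnessSet:
    points: np.ndarray        # witness points A ∩ L, one per row
    slice_coeffs: np.ndarray  # i x (N+1) matrix defining the codim-i slice L
    system: Callable          # black-box f returning values and Jacobian
    dim: int                  # dimension i of the component

def union(witness_sets: List[WitnessSet]) -> Dict[int, List[WitnessSet]]:
    """Formal union: group witness sets by dimension.  Sets of equal
    dimension should share a slicing plane; a fuller implementation would
    first move them to a common slice by homotopy continuation."""
    out: Dict[int, List[WitnessSet]] = {}
    for ws in witness_sets:
        out.setdefault(ws.dim, []).append(ws)
    return out
```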
13.2 Linear Slicing
We use the terms slicing or linear slicing to mean intersecting algebraic sets with linear spaces. The answer to the following question supports the use of linear slicing and will give witness sets much of their power. How does the irreducible decomposition, Equation 13.0.2, behave under slicing by general hyperplanes? The crucial value of linear slices is that they have good preservation properties, i.e., given a general hyperplane L C CN, Z and ZnL share several important properties.
An affine hyperplane (or hyperplane for short) L^{c[1]} ⊂ C^N is the zero set of a linear equation, which we denote
ℒ(x; a) = a_0 + a_1 x_1 + ⋯ + a_N x_N,
with the a_i in C not all zero for i ≥ 1. ℒ_1(x; a) and ℒ_2(x; b) have the same zero set if and only if a = λ b for some complex number λ ≠ 0. Thus affine hyperplanes are parameterized by the subset of points [a_0, a_1, ..., a_N] in P^N with a_i in C not all zero for i ≥ 1. The single point not in this set, [1, 0, ..., 0] in P^N, corresponds to the hyperplane at infinity. Similarly, we regard affine linear spaces L^{c[i]} ⊂ C^N as parameterized by i-tuples (A_1, ..., A_i) in (P^N)^i, where A_j := [a_{j,0}, ..., a_{j,N}] and the rank of the matrix
A = [ a_{1,0} ⋯ a_{1,N} ; ⋮ ⋱ ⋮ ; a_{i,0} ⋯ a_{i,N} ]
is i. The linear space is the zero set of the linear equations, so we may write
C^N ∩ L^{c[i]} = V(ℒ(x; A)),   A in (P^N)^i.
Though we use this representation below, it is not optimal for i ≥ 2. For example, given an invertible i × i matrix F, the linear equations associated to the matrix (F · A) define the same affine linear space as the linear equations associated to A. A much crisper parameterization is given by the use of the Grassmannian, as discussed in § A.8.1. We are interested in the relation between the solution set of the polynomial system f(x) = 0 of Equation 13.0.1 and the augmented polynomial system
[ f_1(x) ; ⋮ ; f_n(x) ; ℒ(x; a) ] = 0    (13.2.4)
on C^N, where ℒ(x; a) is a general linear equation. The basic facts are as follows.

Theorem 13.2.1 (Slicing Theorem) Let X ⊂ C^N denote a pure i-dimensional affine algebraic set. There is a Zariski open dense subset U ⊂ P^N such that for a in U and L = V(ℒ(x; a)),
(1) if i = 0, then L ∩ X is empty;
(2) if i > 0, then L ∩ X is nonempty and (i - 1)-dimensional, and deg(L ∩ X) = deg X; and
(3) if i > 1 and X is irreducible, then L ∩ X is irreducible.
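A quick numerical illustration of item (2): a random complex line meets the degree-2 curve V(x^2 + y^2 - 1) in exactly 2 points. The sketch below, with helper names of our own, parameterizes the line and counts intersections.

```python
import numpy as np

rng = np.random.default_rng(1)
a0, a1, a2 = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# Parameterize the line a0 + a1*x + a2*y = 0 as (x, y) = p + t*d.
p = np.array([-a0 / a1, 0.0])          # a particular solution
d = np.array([-a2 / a1, 1.0])          # a direction in the null space

# Substitute into x**2 + y**2 - 1 = 0: a quadratic in t.
c2 = d[0]**2 + d[1]**2
c1 = 2 * (p[0]*d[0] + p[1]*d[1])
c0 = p[0]**2 + p[1]**2 - 1
ts = np.roots([c2, c1, c0])
print(len(ts))                          # 2 = degree of the curve
for t in ts:
    x, y = p + t * d
    print(abs(x**2 + y**2 - 1), abs(a0 + a1*x + a2*y))   # both ~0
```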
Items 1 and 2 of the theorem are rather elementary consequences of Bertini's Theorem A.7.1, but item 3 is deeper. A quick proof of this fact follows from the Hironaka Desingularization Theorem A.4.1 and a vanishing theorem of Kodaira type. See (Theorem 3.42 Shiffman & Sommese, 1985) for a proof in the projective case. The affine case follows from this since (1) the closure X̄ of X in P^N under the natural embedding C^N ⊂ P^N is projective; (2) X is irreducible if and only if X̄ is irreducible; and (3) X ∩ L is irreducible if X̄ ∩ L̄ is irreducible. Theorem 13.2.1 is not quite strong enough to be conveniently used. We say that a set of linear equations
[ L_1(x) ; ⋮ ; L_K(x) ] := A · [ x_1 ; ⋮ ; x_N ; 1 ]
is generic with respect to an irreducible affine algebraic set X if given any subset L_{i_1}, ..., L_{i_r} of r distinct L_j, it follows that
(1) either X ∩ V(L_{i_1}, ..., L_{i_r}) is empty or dim X ∩ V(L_{i_1}, ..., L_{i_r}) = dim X - r ≥ 0; and
(2) Sing(X ∩ V(L_{i_1}, ..., L_{i_r})) ⊂ Sing(X).
We say that a set of linear equations L_1, ..., L_K is generic with respect to the irreducible components of an algebraic set X if the set of equations is generic with respect to all irreducible components of X, plus all irreducible components of intersections of any number of the irreducible components of X.

Theorem 13.2.2 Let X ⊂ C^N be an irreducible affine algebraic set. There is a Zariski open dense subset U ⊂ C^{K×(N+1)} of K × (N + 1) matrices such that for A in U, the linear equations
[ L_1(x) ; ⋮ ; L_K(x) ] := A · [ x_1 ; ⋮ ; x_N ; 1 ]
are generic with respect to the irreducible components of X.

This is a special case of the more general result Theorem A.9.2. There is a further consequence that we do not make much of because we do not keep track of multiplicities.

Lemma 13.2.3 Let f(x) = 0 be as in Equation 13.0.1. Assume that X is a positive dimensional solution component of f(x) = 0 with multiplicity μ; then there
is a Zariski open dense subset U ⊂ P^N such that for a in U and L = V(ℒ(x; a)), every component of X ∩ L is a component of the solution set of the augmented system in Equation 13.2.4 with multiplicity μ.

Proof. This follows from the stronger result (Lemma 1.7.2 Fulton, 1998). □

13.2.1 Extrinsic and Intrinsic Slicing
In putting Theorem 13.2.1 to use, we will often simultaneously slice an algebraic set by several hyperplanes. The theorem implies that slicing an algebraic set by i generic hyperplanes will cut the i-dimensional components of the set down to isolated points. As we will see subsequently, this is a standard maneuver in many of the algorithms of numerical algebraic geometry. Accordingly, it is useful to consider how different formulations of slicing might affect computational efficiency. The formulation we have used so far in this chapter, which we call an extrinsic formulation, represents a linear space by a set of equations, as L^{c[i]} = V(ℒ(x; A)), where A is an i × (N + 1) matrix. To find V(f) ∩ L^{c[i]}, where f(x) : C^N → C^n, we simply concatenate the two systems of equations to obtain the augmented system, a map C^N → C^{n+i},
[ f(x) ; ℒ(x; A) ] = 0.    (13.2.5)
Clearly, a general A with i < N has full rank i. Using standard techniques from linear algebra, we can write the solution set of ℒ(x; A) = 0 in the form
L^{c[i]} = { x in C^N | x = p + B · u for some u in C^{N-i} },
where p in C^N is a particular solution of ℒ(x; A) = 0 and where the columns of the N × (N - i) matrix B are a basis for the null space of the last N columns of A. Accordingly, an intrinsic formulation of slicing is the system
f_L(u) := f(p + B · u) = 0.    (13.2.6)
The solutions of the extrinsic and intrinsic systems are isomorphic under the mappings u ↦ p + B · u and x ↦ B^†(x - p), where B^† is the pseudoinverse of B. Since f_L : C^{N-i} → C^n has fewer equations and variables than f(x), the intrinsic formulation can save computation compared to the extrinsic one. From a geometric point of view, the extrinsic and intrinsic formulations are identical: they both describe the intersection of V(f) with L^{c[i]}. Furthermore, in a situation where we wish to choose the slicing plane generically among the set of all affine (N - i)-planes, it does not matter if we do so by choosing random coefficients A as a point in (P^N)^i or if we choose random (p, B) for the intrinsic formulation. Either way, we are choosing a random slicing (N - i)-plane from the Grassmannian of all such planes in C^N.
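The intrinsic formulation is easy to set up numerically. The sketch below, using numpy and helper names of our own, builds (p, B) from a random extrinsic slice and restricts f to it as in Equation 13.2.6.

```python
import numpy as np

def intrinsic_slice(a, A_lin):
    """Return (p, B) with {x : a + A_lin @ x = 0} = {p + B @ u}."""
    i, N = A_lin.shape
    p = np.linalg.lstsq(A_lin, -a, rcond=None)[0]   # particular solution
    _, _, Vh = np.linalg.svd(A_lin)                 # full SVD
    B = Vh[i:].conj().T                             # null-space basis, N x (N-i)
    return p, B

def restrict(f, p, B):
    """The intrinsic system f_L(u) := f(p + B @ u)."""
    return lambda u: f(p + B @ u)

# Tiny usage example: slice C^3 by one random complex hyperplane.
rng = np.random.default_rng(2)
a = rng.standard_normal(1) + 1j * rng.standard_normal(1)
A_lin = rng.standard_normal((1, 3)) + 1j * rng.standard_normal((1, 3))
p, B = intrinsic_slice(a, A_lin)
print(np.linalg.norm(a + A_lin @ p), np.linalg.norm(A_lin @ B))   # both ~0
```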
13.3 Witness Sets
The strong version of the slicing theorem, Theorem 13.2.2, gives us everything we need to justify our definition of a witness set. It tells us that for an affine algebraic set X, a generic Lc^1^ c CN meets the irreducible components of X as follows. • It misses any irreducible components of dimension less than i. • It meets each irreducible component X^ of dimension i in degXij isolated points, and these points do not lie on any other component. • It intersects irreducible components of dimension k > i in an irreducible algebraic set of dimension k — i. Moreover, Theorem 13.2.2 implies that LCW will be generic with probability one if we choose the coefficients of its defining linear equations at random from
component of V(f) is of interest.1 We hasten to add that situations may arise where we define an afHne algebraic set in some indirect manner such that, although there must exist a set of polynomial equations that define the set, we do not necessarily have such a set at hand. As such situations arise, our definition of a witness set will be adapted to accommodate them. While a witness set of the form {X n Lci>l\ Lc^l\ f} is everything we need for theoretical purposes, it is not always sufficient from the numerical point of view. As a data structure in a computer program, we want to treat / as a pointer to a black-box routine that, given a point x*, returns just the function value f(x*) and the Jacobian matrix df/dx(x*), as floating point numbers. We would prefer not to perform any symbolic manipulations to extract information from / . Even in the case of a zero-dimensional algebraic set, which is just a finite set of points, a numerical witness point set is just a list of approximations to those points. A witness set should carry along enough additional information to allow us to numerically refine these approximations to higher precision. Exactly what additional information we carry along to numerically encode an algebraic set X will depend on the properties of X and also on the initial symbolic information we have been given to uniquely describe X. Accordingly, we will have several different flavors of witness set, but each will include a witness point set and enough additional information to allow us to use the witness set in our numerical algorithms. In an implementation of these algorithms in computer code, the witness set would be a data structure that includes a field identifying its flavor, and basic operations on witness sets need to be able to handle all such flavors. In the next few paragraphs, we define threeflavorsof witness sets that are useful in numerical work. We will not yet give numerical algorithms for computing such sets; these come later in the chapter. 13.3.1
Witness Sets for Reduced Components
Let us remind ourselves of the meaning of "reduced." The notions of reduced versus nonreduced are not to be confused with reducible versus irreducible. The line {(x,y) s C2 | x = 0} is an irreducible algebraic set that is a reduced solution component of the equation xy = 0, but it is a nonreduced solution component of the equation x2y = 0. Thus, we see that reduced and nonreduced are not intrinsic properties of the set as a geometric entity, but relate to algebraic properties of the system of polynomials that we use to define the set. Reduced is synonymous with multiplicity-one, while nonreduced implies a multiplicity greater than one. The salient point is that if Xi C C^ is an i-dimensional reduced solution component of the system of equations / = 0, it meets a generic codimension i slicing 1
Strictly speaking, the slicing plane Lcl~ll is not necessary, because it is either uniquely determined by the witness points or, if not, we can pick a slicing plane at random from among all the (TV — i)-dimensional linear spaces that interpolate the witness points. Nevertheless, it is convenient in our algorithms to have it on hand rather than to regenerate it when needed.
Basic Numerical Algebraic Geometry
237
plane in witness points having multiplicity-one. Such a point is numerically tame: the Jacobian matrix of the system of equations defining the point is full rank N. Letting L(x) — 0 denote a system of % independent linear equations for the slicing plane, the witness points are solutions of the augmented system {f(x), L(x)} = 0. If the number of equations, n, in this system is equal to the number of unknowns, N, then Newton's method converges quadratically in the neighborhood of the witness point. If n > N, the Gauss-Newton method, that is, Newton's method modified to use a least-squares iterative step, converges quadratically. This is quite satisfactory for our numerical work, so when X, is a reduced component, we use {XinLcW,LcW,f} as its witness set. 13.3.2
Witness Sets for Deflated
Components
In the case that an irreducible algebraic set X is a nonreduced solution component of a polynomial system f(x) = 0, its witness points are isolated roots of multiplicity greater than one for the augmented system f(x) = {f(x),L(x)} — 0. This means that the Jacobian matrix evaluated at such a root is not full rank: the witness points are singular. As discussed in § 10.5, the behavior of Newton's method near singular roots is greatly degraded and may even diverge. However, since the witness points are isolated, one or more stages of deflation, as described in Equation 10.5.5, may allow us to compute the witness points in a nonsingular manner. That is, for a witness point x*, deflation produces a new system of equations, say g(y) = 0, with a projection operator, say n : y H-> X, such that y* is a nonsingular solution of g{y) = 0 and x* = 7r(y*). In fact, since the slicing plane V(L) is generic and X is irreducible, the same deflation system that works on one witness point x* G X n V(L) must work for all other witness points in the set. Let us restate this in a slightly more general way, independent of the specific deflation technique described in § 10.5. Suppose that we have an i-dimensional irreducible algebraic set X C C^, and a linear projection TT : C M —> CN, with M > N. Let C(x; A) = 0 be a system of linear equations parameterized by matrix A, as in § 13.2, such that a generic A defines a codimension % linear space. Suppose that for generic A, each witness point x* £ X D V(L(x; A)) has above it a point y* G C M that is a nonsingular solution of a system of polynomial equations g(y; A) = 0. For a particular generic slicing plane Lc^l\ suppose Wy C C M is a collection of such nonsingular solution points, one for each point i n l f l Lc^^, so that ir(Wy) = X n Lc^ • Then, we may use as our numerical witness set for X the data {Wy,LcW,g,n}. Of course, in our numerical work, Wy will be a numerical approximation to the ideal points. To refine the witness point set n(Wy), we use Newton's method to refine
238
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Wy as solution points of g(y) = 0, and then project.
13.3.3
Witness Sets for Nonreduced Components
It might not always be possible or convenient to find a deflation formulation for a nonreduced solution component. For example, it may be that several stages of deflation would be necessary, giving an unreasonably large dimension M in which the numerical work is carried out. Alternatively, it may happen that the algebraic set in question is not given to us as a solution component of a system of polynomial equations. For example, it could be defined as the intersection of two solution components of a system. Such a set is certainly described by some system of equations, but it might take considerable symbolic computation to construct them from the data at hand. In this section, we present a third flavor of witness set that can handle many such situations. Suppose that we use a homotopy method to solve for the witness points, that is, each point w € W := X n L c ^ is the endpoint of some solution path xw(t) of a homotopy function h(x,t) = 0, i.e., h(xw(t),t) = 0, \imt^oxw(t) = w. We will construct explicit examples of such homotopies below, but for now, we simply posit the existence of one. When X is multiplicity greater than one as an i-dimensional solution component of a given polynomial system / , the homotopy we construct will have w as a singular endpoint. Recall from Chapter 10 that we have several methods for computing singular endpoints of homotopy paths. In the power-series endgame or the closely-related Cauchy integral endgame, we estimate the endpoint by building a local model of the end of the path, sampling the path for small t inside the endgame convergence radius but outside the ill-conditioned zone at t — 0. Taking this route, we define the set
W(e) := {xw(e) \ w e W} consisting of the solutions of h(x, e) = 0 that lead to the witness points W as e —>• 0, with e € (0,1]. Our third flavor of witness set for an i-dimensional algebraic set X is, accordingly, the data
{W,Lc^,h(x,t),W(e),e}, where in addition to the conditions that L c ^ is a generic (AT — «)-dimensional linear subspace of C^ with W := Lc^ fl X, we have h(x, t) and W(e) satisfying (1) for each point w G W, we have a positive e > 0 and a nonsingular path xw(t):(0,e]->CN with h(xw(t),t) (2) W(e) = {xw(e)
— 0 and limt^oxw(t) \w£W}.
= w, and
Basic Numerical Algebraic Geometry
239
Whenever we wish to refine a numerical approximation of the witness set, we can do so by re-playing the singular endgame in higher precision, using W(e) and e to initialize the solution paths of the homotopy. Whichever treatment of nonreduced components we choose, we still refer to W as a witness point set for X. For simplicity of statement, we often suppress the reference to h(x,t) or g(y) and refer to the witness set (W, Lc^l\f) in both the reduced and nonreduced case. We will soon turn to the task of computing witness point sets, but first we prepare by discussing the rank of a polynomial system and randomizations of polynomial systems. 13.4
Rank of a Polynomial System
Let f(x) = 0 denote a system of n polynomials on
A- x := A-
:
,
_XN .
where A is an n x N matrix, the rank of the system is the classical rank A of the matrix A and the corank is the dimension of the null space of A • x = 0. Note that given a system / as above, neither adding polynomials in the equations of / to the system nor replacing / with F • f, where F is an invertible nxn matrix, changes the rank of the system.
Lemma 13.4.1 Let f(x) = 0 denote the system of n polynomials on CN. Then there is a Zariski open set Y c f(CN) such that for y eY, V(f(x) — y) is smooth of dimension equal to the corank of f. Moreover, the Jacobian matrix of f is of rank equal to rank/ at all points ofV(f(x) — y). Proof. This is a corollary of Theorem A.6.1 with X taken as CN.
•
An important consequence of the above is that for the dense Zariski open set U := f~1(Y), we have for all points 2* € U, the rank of the Jacobian of / evaluated at £* equals rank/. This gives us a fast probability-one algorithm for the rank for a system f. Explicitly, given a system f(x) of n polynomials on CN, then the rank of / equals the rank of the Jacobian at a random point of C^. To emphasize its importance, we restate the algorithm below.
240
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Rank of a System: r = rank(/) • Input: Polynomial system f(x): CN —> Cn. • Output: r := rank/. • Procedure: — Choose a random x* £ C^. — v '=
r a n k I ——IT** ) I
— return(r). The numerical determination of the rank of the Jacobian matrix is best done using the singular value decomposition. The rank intervenes in the study of systems in the following way, which will play an important role in subsequent developments. Theorem 13.4.2 Given a system f(x) of n polynomials on CN, all irreducible components ofV(f) have dimension at least equal to the corank of f. Proof. As noted in Lemma 13.4.1, the corank of / is dimxf~1(f(x)) dense Zariski open set of CN. Thus, by Theorem A.4.5, the set {xeCN must equal C^.
\ dimx f-\f(x))
for £ in a
> corank/} •
Remark 13.4.3 Surprisingly, previous to this book, the rank of a system has not been defined explicitly in numerical algebraic geometry. Example 13.4.4 Consider the space of 3 x 3 orthogonal matrices, usually denoted 50(3). For any matrix A e 5*0(3), we have the defining equations ATA = I, detA = 1, where / is the identity. Because ATA is symmetric, the first matrix condition amounts to just 6 scalar equations, so we have 7 polynomial equations depending on the 9 entries in A. However, the rank of the system is only 6. Thus, 5*0(3) can have no components of dimension smaller than 9 — 6 = 3. Of course, it is well-known that the dimension of 50(3) is three, so the rank condition is sharp in this case. The definitions of rank and corank make sense for systems of algebraic functions, e.g., rational functions, defined on an irreducible quasiprojective algebraic set X. Lemma 13.4.1, Theorem 13.4.2, and the algorithm for the rank carry over immediately with the same proofs to this situation. This generalization, which will be needed in Chapter 16, is presented in § A.6.
Basic Numerical Algebraic Geometry
13.5
241
Randomization and Nonsquare Systems
We define a square system to be a system f(x) =
7i(*)" : =0
(13.5.7)
Jn(x)_ of polynomials on C^ with n = N. When we numerically solve a system of equations, it is usually convenient, and sometimes necessary, to have the same number of equations as unknowns. The systems we wish to study might not be square. If n < N, we call the system underdetermined, and if n > N, we call it overdetermined. However, if it is underdetermined, its rank is at most n, so by Theorem 13.4.2, its irreducible solution components must be dimension at least N — n. We will work with such components by slicing them with at least N — n hyperplanes, resulting in an augmented system having at least as many equations as unknowns. Of course, when augmented by slicing planes, square systems become overdetermined, and overdetermined systems stay overdetermined. Therefore, we see that the overdetermined case needs attention. To find the isolated solutions of an overdetermined system, n > N, the naive approach is to pick out N equations, solve them, and check the solution points against the remaining equations. This approach is fraught with peril. For example, consider the system: xy = 0 x(z-y) = 0 y(x - y) = 0.
(13.5.8)
Any two of the 3 equations have a 1 dimensional solution set, but all three together have the origin (with multiplicity 3) as the solution set. There is a natural procedure for obtaining a square system from the above system, f(x) — 0. Given a n i V x n matrix of complex numbers A G CNxn, we can form a square system / *1,1 fl +•••+M,nfn A./=
\
: \-V/V,l/l + • • • + AjV,n/n /
As we will show below, this square system has all the properties we need to compute an irreducible decomposition of V(/), and in our first article (Sommese & Wampler, 1996) on Numerical Algebraic Geometry, this was our approach. In the following paragraphs, we present a somewhat more general view of randomization, which is essential in dealing with intersections of irreducible algebraic
sets (Sommese et al., 2004b). In particular, let us consider the system A • /(x), where the k x n matrix A G Ckxn is chosen generically. If k = N, this is a square system, but we may consider k ^ N as well. Let us discuss the principal facts about this construction. First note that this construction is only of interest if k < n. To see this, note that if k = n, then such a A is invertible. Consequently, the systems A • f(x) = 0 and f(x) = A" 1 • A • f(x) = 0 are equivalent. If k > n, then we may break A G C fcxn into two submatrices by rows; the matrix Ai formed from the first n rows is an invertible matrix for a nonempty Zariski open set of the k x n matrices A £ Ckxn. Let A2 be the remaining (k — n) x n matrix formed from the last k — n rows of A and let r
._ [
Aj
• [-Aa-Ar
1
Onx(k-n)]
/*-„ J
with 0nX(k-n) the n x {k — n) matrix with all zero entries and Ik-n the (k — n) x (k — n) identity matrix. Then T is invertible and A • f{x) = 0 is equivalent to
r h(x) ' L-A2-A!1
/fc_n J [ A 2 J M '
/n(x)
.0(fe-n)xl.
Thus, only if k < n is this construction interesting. If k < n, then we may break A into two submatrices as A = [Ai A2], where Ai is k x k. Submatrix Ai is an invertible matrix for a nonempty Zariski open set of the kxn matrices A G Ckxn. Thus, A-/(x) is equivalent to A^"1 -A-f(x) = [I A']-f(x), where A' = A1"1A2- In other words, the system is of the form /l
fk+l A
': + ' • ': Jk\ L fn . for a nonempty Zariski open set of k x (n - k) matrices A' G Ckx(n~k), It is important to note that though mathematically A • f(x) and [/ A'] • f(x) are equivalent, the latter may be better than the former for homotopy continuation. Moreover, the ordering of the equations can matter. For example, if the equations were
' x\ +xl - 1" x2 Xi — 1
=0
Basic Numerical Algebraic Geometry
243
the randomization 'x2 + x22-l
+ Xl(x1-l)^
a;2 + A 2 (x 1 -l)
J
= Q
would be better than the randomization [ X2 + Xi{xl + x22-l)
1_
[Xl-l+X2(xl+xl-l)\ since there would be only two paths to follow using a total degree homotopy on the former as opposed to four paths on the latter. The key properties of randomization are given by the following simple theorem of Bertini type. Theorem 13.5.1 Let 7i(*)' /(*) = : =0 Jn(x)_ be a system of polynomials on CN. Assume that A C CN is an irreducible affine algebraic set. Then there is a nonempty Zariski open set U of k x n matrices A e C fexn such that for A e U (1) if dim A > N — k, then A is an irreducible component of V(f) if and only if it is an irreducible component ofV(A • f); (2) if dim ^4 = N — k, then A is an irreducible component ofV(f) implies that A is also an irreducible component ofV(A • / ) ; and (3) if A is an irreducible component ofV(f), its multiplicity as a solution component of A • f(x) = 0 is greater than or equal to its multiplicity as a solution component of f(x) = 0, with equality if either multiplicity is 1. It is important to emphasize that although an irreducible component of V(f) is an irreducible component of the randomized system, V(A • / ) , its multiplicity as an irreducible solution component of A • / = 0 (if not 1) might be larger than as an irreducible solution component of / = 0. The following system, which is equivalent to the system Equation 13.5.8 illustrates this: xy = 0 x2 = 0
(13.5.9)
y2 = o. The origin is an isolated solution of multiplicity 3. The randomized square system x(y + /xix) = 0 y(x + n2y) = 0.
(13.5.10)
244
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
has the origin as an isolated solution of multiplicity 4. The randomization of a system will be used often enough that we introduce a new notation for it. We let fK(/(x);/c) denote a randomization A • f{x) with A a k x n matrix. We also may write 9l(fi(x),..., fn(x); k) to mean the same sort of randomization acting on the system obtained by stacking up all the functions fi(x). When we use the randomization method in probability-one algorithms, A must be chosen from a Zariski open dense set that is defined by the problem at hand, and it may depend on other choices we make in the algorithm. Logically, the open set from which we must choose A is not defined until all such choices are made, and so we should choose A last. Operationally, we usually do not have a computationally useful description of the set nor do we need one, since a random choice of A will be in the set with probability one. Accordingly, it does not matter when we choose A in the course of the procedure, as long as the choice is made independently of the conditions that define the invalid set.
13.6
Witness Supersets
Suppose that we wish to compute the numerical irreducible decomposition, Equation 13.1.3, of V(f) for some polynomial system / : C^ —> C n . The logical first step is to find a witness point set, Wi := V* (~l Lc^l\ for each pure-dimensional solution component, Vi. A second step would then decompose these into irreducible components. Unfortunately, we do not have an algorithm for directly computing the Wi, but we can readily compute a looser set Wi that contains Wi. We will call such a set a witness point superset, defined as follows.
Definition 13.6.1 (Witness Point Superset) Let Z ⊂ C^N be an affine algebraic set, and let X be a pure i-dimensional component of Z. Then Ŵ ⊂ C^N is a witness point superset for X as a component of Z if it meets the requirements: (1) Ŵ is a finite set of points; (2) Ŵ ⊂ Z ∩ L^{c(i)}; and (3) (X ∩ L^{c(i)}) ⊂ Ŵ, where L^{c(i)} ⊂ C^N is a generic linear space of codimension i. A witness superset for Z is just a collection of one witness point superset at each dimension along with the corresponding linear slicing space, L^{c(i)}, at each dimension.
Remark 13.6.2 Since for generic L^{c(i)}, Z ∩ L^{c(i)} is empty for i > dim Z, we see that the witness point supersets for all dimensions greater than dim Z are empty.
Let V_i be the union of all the i-dimensional irreducible components of V(f), and let Ŵ_i be a witness superset for V_i. If V_i is not the maximal dimensional component of V(f), then a linear space L^{c(i)} will meet the higher dimensional components, and
Ŵ_i will likely contain some points on those components. That is,

Ŵ_i = W_i + J_i,    (13.6.11)

where J_i ⊂ ∪_{k>i} V_k. We call J_i the "junk points" in Ŵ_i. Even when i = 0, i.e., the classical case of finding isolated solutions of f(x) = 0, the homotopy methods of Part II return Ŵ_0 and give no ready method to distinguish isolated singular solutions in W_0 from the junk points J_0. In Chapter 15, we will present algorithms that discard the junk points J_i to get W_i and then further decompose the W_i into the W_{ij} of Equation 13.1.3. This will give the numerical irreducible decomposition. For the present, we will concentrate on finding the witness point supersets.
We can compute witness point supersets using homotopy continuation. Theorem 8.3.1 gives conditions for a homotopy to find all isolated solutions of a square system of polynomial equations. Total degree homotopies and multihomogeneous homotopies as given in § 8.4.1 and § 8.4.2 have start systems with only nonsingular roots, so they satisfy the required conditions for finding all isolated roots in C^N. The linear product homotopies of § 8.4.3 and the polyhedral homotopies of § 8.5 do the same on (C*)^N.

Witness Superset for Dimension i (extrinsic): [Ŵ, L] = WitnessSupi(f, i)
• Input: Polynomial system f : C^N → C^n and a dimension i.
• Output: A witness point superset Ŵ for the i-dimensional component of V(f), along with the slicing space L.
• Procedure:
- If i = N, apply the probabilistic null test at a random point x* ∈ C^N: if f(x*) = 0, return(Ŵ := x*, L := C^N); else return(Ŵ := ∅, L := C^N).
- Else, choose a random point a ∈ C^i and a random matrix A ∈ C^{i×N}. Let L be V(ℓ), where ℓ(x) := a + A · x.
- If i ≥ N − rank f,
  * Compute S := HomSolve({ℜ(f; N − i), ℓ}).
  * Let Ŵ := {s ∈ S | f(s) = 0}.
  * return(Ŵ, L).
- Else return(Ŵ := ∅, L).
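The extrinsic step above can be sketched as follows in Python/NumPy; the black-box solver hom_solve stands in for HomSolve, and its calling convention is an assumption made only for illustration.

```python
import numpy as np

def witness_superset_extrinsic(f, N, i, rank_f, hom_solve, tol=1e-8):
    """Sketch of WitnessSupi for 0 < i < N: slice with i random affine-linear
    equations ell(x) = a + A x, solve the squared-up system {R(f; N-i), ell}
    with a user-supplied black-box solver hom_solve, and keep only the
    solutions that also satisfy the original system f."""
    rng = np.random.default_rng()
    a = rng.normal(size=i) + 1j * rng.normal(size=i)
    A = rng.normal(size=(i, N)) + 1j * rng.normal(size=(i, N))
    ell = lambda x: a + A @ x
    if i < N - rank_f:
        return [], (a, A)
    # hom_solve is expected to return a list of isolated solutions of the
    # square system {R(f; N-i), ell}; its construction is problem specific.
    S = hom_solve(f, ell, N - i)
    W_hat = [s for s in S if np.linalg.norm(f(s)) < tol]
    return W_hat, (a, A)
```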
Theorem 13.6.3 For 0 ≤ i < N, there is a Zariski open dense set U ⊂ C^{i×(N+1)} such that for (a, A) ∈ U, algorithm WitnessSupi returns a witness point superset for the i-dimensional component of V(f).
Proof. By Theorem 13.5.1, if A is an i-dimensional irreducible solution component of f(x) = 0, then it is also an irreducible solution component of F(x) := ℜ(f; N − i). Therefore, the witness points A ∩ L^{c(i)} are isolated points in both V(f) ∩ L^{c(i)} and V(F) ∩ L^{c(i)}, for a generic linear space L^{c(i)} ⊂ C^N of codimension i. By assumption, the set of points S returned by S := HomSolve(g) is finite and includes all isolated solutions of V(g), so Ŵ must include A ∩ L^{c(i)}. This holds for every i-dimensional irreducible component of V(f), so Ŵ includes the witness points for the entire i-dimensional component of V(f). Thus, items 1 and 3 of Definition 13.6.1 are satisfied.
To see that item 2 of that definition is satisfied, we argue as follows. By our assumptions on HomSolve, we have S ⊂ (V(F) ∩ L^{c(i)}). By Theorem 13.2.1, for generic L^{c(i)}, V(F) ∩ L^{c(i)} includes only points of V(F) lying on components of dimension i or greater. By Theorem 13.5.1, any components of V(F) of dimension k > i must also be k-dimensional components of V(f). Thus, any points in S that lie on components of V(F) of dimension k > i must also be in V(f). Consequently, any s ∈ S such that s ∉ V(f) ∩ L^{c(i)} must lie in a component of V(F) \ V(f) of dimension i. Such points do not satisfy f(s) = 0, and so they are not copied from S into Ŵ. □
For i = N, the algorithm uses the probabilistic null test to see if all the functions in f are the zero polynomial. For all other i ≥ N − rank f, we solve a square system of size N. The statement of the algorithm above uses an extrinsic formulation of slicing. To work intrinsically, we just change a few lines, and in so doing, decrease the size of the square system we solve to only N − i.

Witness Superset for Dimension i (intrinsic):
- Choose a random point b ∈ C^N and a random matrix B ∈ C^{N×(N−i)}.
- Let L be the space defined intrinsically as L(u) := b + B · u, u ∈ C^{N−i}.
- If i ≥ N − rank f,
  * Let F(x) = ℜ(f; N − i).
  * Compute S := HomSolve(F(b + B · u)) ⊂ C^{N−i}.
  * Let Ŵ := {w ∈ C^N | w = b + B · s, s ∈ S and f(w) = 0}.
  * return(Ŵ, L).
- Else return(Ŵ := ∅, L).
When i = N − 1, the system to be solved has only one variable, so the call to HomSolve could be replaced by any other method for solving polynomials in one variable.
With WitnessSupi available to find a witness superset for the i-dimensional component of V(f), it is a simple matter to assemble a collection of such sets for every possible dimension. To be explicit, we display the full algorithm, WitnessSuper, below.

Witness Superset: [Ŵ] = WitnessSuper(f)
• Input: Polynomial system f : C^N → C^n.
• Output: A witness superset Ŵ = {(Ŵ_0, L_0), ..., (Ŵ_N, L_N)} for V(f), where (Ŵ_i, L_i) is a witness superset for the dimension i component. Empty dimensions may be omitted.
• Procedure:
- Initialize Ŵ = {}.
- Append (Ŵ_N, L_N) = WitnessSupi(f, N) to Ŵ. If Ŵ_N ≠ ∅, return(Ŵ).
- Loop: For i = N − 1, ..., N − rank f
  * Append (Ŵ_i, L_i) = WitnessSupi(f, i) to Ŵ.
- End loop.
- return(Ŵ).
Recall that in the case of nonreduced components, we wish to include in our numerical witness sets additional information to allow robust numerical computation of the witness points, either a deflation formulation as in § 13.3.2 or a homotopy formulation as in § 13.3.3. Clearly, a homotopy is available inside algorithm WitnessSupi; we merely have to return the information. A deflation formulation can be returned if one is used in the endgame of HomSolve. Notice that deflation can only work on the true witness points in the witness superset, because these are isolated solutions, whereas the junk points are not. So before trying to deploy deflation, we need to separate the junk from the witness points, which requires the methods of Chapter 15.

13.6.1 Examples
In the following examples, the tables summarizing runs of algorithm WitnessSuper have columns labeled as follows:
Dim        — Dimension of component under investigation.
Paths      — Number of paths in the homotopy.
#Ŵ         — Number of endpoints in the witness superset.
#Ŵ_sing    — Number of those points that are singular.
#N         — Number of "nonsolutions," i.e., endpoints x ∉ V(f).
#∞         — Number of endpoints at infinity.
avg nfe    — Average number of function evaluations per path.
total nfe  — Total number of function evaluations for this dimension.
The number of function evaluations depends on details of the path tracker and the endgame, including the various control settings they use. The figures reported here are for the default settings in HOMLAB. These numbers will change slightly in repeat runs, because the paths depend on random choices of slices and in the randomization to square up a system. They are included to give a sense of where the algorithm spends most of its effort.
Example 13.6.4 Consider the system given in Equation 12.0.1, which for convenience we repeat here:
f(x, y) := [ x(y^2 − x^3)(x − 1) ; ⋯ ] = 0.
The polynomials are degree 5 and 6, so using total-degree homotopies, algorithm WitnessSuper tracks 6 paths to find a dimension 1 witness superset and 30 paths to find a dimension 0 witness superset. The results are summarized in the following table.

Dim | Paths | #Ŵ | #Ŵ_sing | #N | #∞ | avg nfe | total nfe
 1  |   6   |  4 |    0    |  2 |  0 |   120   |     721
 0  |  30   | 30 |   28    |  0 |  0 |   165   |    4948
This is consistent with the fact that V(f) has a degree 4 component at dimension 1, decomposable into V(x) and V(y^2 − x^3). The superset at dimension 1 has four points and no junk. At dimension 0, all paths must end on V(f) as there is no slice involved. (This would not necessarily be true if f had more equations than variables.) Of the 30 path endpoints, 28 are singular. The true witness points are the two nonsingular points in the witness superset; the other 28 are junk. Junk points are always singular, but it would be erroneous to conclude that singular points in the superset are necessarily junk. In fact, if the factor (x − 1) in the first equation were changed to (x − 1)^2, the zero dimensional solution points would become double points and therefore would be singular.
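The singular/nonsingular classification of endpoints reported in these tables can be made concrete with a numerical rank test on the Jacobian of the sliced system. The sketch below is illustrative only; the finite-difference step and tolerance are assumed values, not settings from HOMLAB.

```python
import numpy as np

def jacobian_fd(F, x, h=1e-6):
    """Forward-difference Jacobian of F: C^N -> C^m at x (illustrative only)."""
    x = np.asarray(x, dtype=complex)
    F0 = np.asarray(F(x), dtype=complex)
    J = np.zeros((F0.size, x.size), dtype=complex)
    for j in range(x.size):
        xp = x.copy(); xp[j] += h
        J[:, j] = (np.asarray(F(xp), dtype=complex) - F0) / h
    return J

def is_nonsingular_solution(F, x, tol=1e-6):
    """For a square or overdetermined sliced system, a solution x of F(x) = 0 is
    counted as nonsingular when the Jacobian has full column rank, i.e., its
    smallest singular value is bounded away from zero."""
    s = np.linalg.svd(jacobian_fd(F, x), compute_uv=False)
    return s.size >= np.size(x) and s.min() > tol
```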
Example 13.6.5 The system

f(x, y) := [ x^2 y − x y − 2 y ; 3 x y^3 − y ] = 0
leads to the following results from WitnessSuper:

Dim | Paths | #Ŵ | #Ŵ_sing | #N | #∞ | avg nfe | total nfe
 1  |   4   |  1 |    0    |  3 |  0 |   158   |     631
 0  |  12   | 10 |    6    |  0 |  2 |   166   |    1997
At dimension 1, we have one witness point for the set V(y). At dimension 0, two of twelve paths go to infinity, leaving ten points in the witness superset. The six singular points in the zero-dimensional witness superset are in fact junk: they all have y = 0. The remaining four points are the finite isolated roots in V(f).
Example 13.6.6 Running WitnessSuper on the equations for SO(3), see Example 13.4.4, one obtains the following table. Note that the rank test saves us from trying to compute witness points for dimension 2, which would have required 192 paths.

Dim | Paths | #Ŵ | #Ŵ_sing | #N | #∞ | avg nfe | total nfe
 8  |   3   |  0 |    0    |  3 |  0 |    58   |     173
 7  |   6   |  0 |    0    |  6 |  0 |   107   |     639
 6  |  12   |  0 |    0    | 12 |  0 |   151   |    1815
 5  |  24   |  0 |    0    | 24 |  0 |   150   |    3590
 4  |  48   |  0 |    0    | 48 |  0 |   229   |   10986
 3  |  96   |  8 |    0    | 40 | 48 |   254   |   24337
In all the examples, we may observe that the number of function calls grows as we descend dimensions. This is due both to an increase in the number of paths (which grows geometrically) and also a general tendency for the number of calls per path to increase. Not reflected in the table is the additional fact that the number of variables climbs as we descend, so the linear solving routine used in prediction/correction iterations will be more expensive. So, by every measure, the bottom run is by far the most expensive. This underscores the importance of using the rank of the system to eliminate low-dimensional runs.

13.7 Probabilistic Algorithms About Algebraic Sets
In this section we follow (Sommese & Wampler, 1996) and show how the witness supersets immediately give some numerical algorithms. Subsequent chapters will present more efficient algorithms, so the main point here is to recognize the capabilities that witness supersets make possible.
13.7.1 An Algorithm for the Dimension of an Algebraic Set
One consequence of Remark 13.6.2 is a simple algorithm for finding the dimension of V(f), i.e., the maximum of the dimensions of the components of V(f).
Top Dimension: d = TopDimen(f)
• Input: Polynomial system f : C^N → C^n.
• Output: The dimension of V(f), i.e., d := dim V(f). If V(f) = ∅, then d := 0.
• Procedure:
- Loop: For i = N, N − 1, ..., N − rank f
  * Let (Ŵ, L) := WitnessSupi(f, i).
  * If Ŵ ≠ ∅, then return(d := i).
- End loop.
- return(d := 0).
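A minimal sketch of this loop, with witness_supi standing in for the WitnessSupi routine (an assumed interface used only for illustration):

```python
def top_dimension(f, N, rank_f, witness_supi):
    """Sketch of TopDimen: search downward from dimension N to N - rank(f) and
    return the first dimension whose witness superset is nonempty.
    witness_supi(f, i) is a stand-in for the WitnessSupi routine and is assumed
    to return a pair (list of witness superset points, slicing space)."""
    for i in range(N, N - rank_f - 1, -1):
        W_hat, L = witness_supi(f, i)
        if W_hat:
            return i
    return 0  # the convention used in the text when nothing is found
```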
13.7.2 An Algorithm for the Dimension of an Algebraic Set at a Point
Let Z be an algebraic subset of C^N defined by a system of polynomial equations f = 0. Let p ∈ Z, i.e., p ∈ C^N and f(p) = 0. Recall from § 12.2.1 that if Z = ∪_i Z_i is the decomposition of Z into pure-dimensional algebraic sets, then the dimension of Z at p ∈ Z is max{i | p ∈ Z_i}. In this section we give an algorithm to compute the dimension of Z at p. In particular:
(1) if p is a generic point of an irreducible component Z_i of Z, then this algorithm computes dim Z_i;
(2) this algorithm lets us decide whether a solution p of a system f = 0 is isolated.
This is the local variant of the dimension algorithm of § 13.6. The algorithm proceeds as follows. If Z_i is an irreducible component of Z containing p, then any affine C^{N−dim Z_i} near a generic affine C^{N−dim Z_i} containing p meets Z_i in at least one point near p. Moreover, if dim Z_i is the maximum dimension of any irreducible component of Z containing p, then for k > dim Z_i it follows that generically an affine C^{N−k} near a generic affine C^{N−k} containing p does not meet Z in any points near p. A generic affine C^{N−k} containing p := (p_1, ..., p_N) is specified parametrically by {x ∈ C^N | x = p + B · u, u ∈ C^{N−k}}, where B is a generic N × (N − k) matrix. An affine C^{N−k} nearby is one parameterized by (p', B') ∈ C^{N+N×(N−k)} in the neighborhood of (p, B), using the complex topology.
Let us first lay this out as a conceptual algorithm in which many implementation details are left for later. In particular, the algorithm depends on a procedure [S] = LocalSlice(Z, p, L) that returns a list of points S ⊂ Z ∩ L that contains all isolated
points of Z ∩ L near p, where L ⊂ C^N is an affine linear space. We do not specify an implementation for LocalSlice here, but one possibility is given in a numerical version of the algorithm later in this section.
Local Dimension: (conceptual algorithm) [d] = LocalDimen(Z, p)
• Input: A numerical description of an algebraic set Z ⊂ C^N and a point p ∈ Z.
  * Let L'(u) be a generic affine C^i near L.
  * Let f_{L'}(u) := f(L'(u)), a system of n polynomials in i variables.
  * Let g(u) := ℜ(f_{L'}; i), a square system of size i.
  * Compute S := HomSolve(g).
  * If S contains a point u* such that p' := L'(u*) is near p, then return(d := N − i).
- End loop.
The definition of nearness in this algorithm is a bit problematic. One prescription would be to repeat the test for a sequence of linear spaces closer and closer to L and see if this produces a sequence of solution points closer and closer to p. Ideally, this would be done using continuation from a generic L' to L. The problem is that the set S can contain singular solutions. Since we do not know the local dimension at these, we do not know the dimension of the solution paths as L' varies, and so it is not possible to numerically track them. The methods of subsequent chapters will refine the situation so that the nearness test can be implemented as testing equality between p and the endpoint of a well-defined one-dimensional homotopy path.
The use of HomSolve in the algorithm above is overkill, because it finds all isolated solutions of g = 0, when all we need is to find one near p, if it exists. A better alternative might be to use an exclusion method (see § 6.1) initialized on a small box containing p. An interesting purely local heuristic for checking the dimension at a point p is given in (Kuo, Li, & Wu, 2004), based on the methods of (Li & Zheng, 2004). It is a variant of the conceptual algorithm above, with a heuristic for LocalSlice. If this could be strengthened to a probability-one algorithm, it might be substantially more efficient than the numerical procedure above.
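The parameterization of a generic affine C^{N−k} through p and of a nearby slice, as used above, can be sketched as follows; the perturbation size eps is an illustrative assumption, not a recommended setting.

```python
import numpy as np

def local_slices(p, k, eps=1e-2, seed=None):
    """Return parameterizations of a generic affine C^(N-k) containing p,
    L(u) = p + B u, and a nearby slice L'(u) = (p + dp) + (B + dB) u,
    as used in the local dimension test."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=complex)
    N = p.size
    B  = rng.normal(size=(N, N - k)) + 1j * rng.normal(size=(N, N - k))
    dp = eps * (rng.normal(size=N) + 1j * rng.normal(size=N))
    dB = eps * (rng.normal(size=B.shape) + 1j * rng.normal(size=B.shape))
    L  = lambda u: p + B @ u
    Lp = lambda u: (p + dp) + (B + dB) @ u
    return L, Lp
```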
13.7.3 An Algorithm for Deciding Inclusion and Equality of Reduced Algebraic Sets
At this point, we can succinctly formulate an algorithm for deciding inclusion of the solution sets of two systems of polynomial equations, which will immediately yield an algorithm for deciding equality of such solution sets.

Inclusion Test: [t] = Inclusion(f, g)
• Input: Polynomial systems f : C^N → C^n, g : C^N → C^m.
• Output: Logical t := true if V(f) ⊂ V(g); otherwise, t := false.
• Procedure:
- Loop: For i = N, N − 1, ..., N − rank f,
  * Let Ŵ := WitnessSupi(f, i).
  * If g(x) ≠ 0 for any x ∈ Ŵ, then return(t := false).
- End loop.
- return(t := true).
The inclusion test leads immediately to an equality testing algorithm, as follows.

Equality Test: [t] = Equal(f, g)
• Input: Polynomial systems f : C^N → C^n, g : C^N → C^m.
• Output: Logical t := true if V(f) = V(g); otherwise, t := false.
• Procedure:
- t_1 := Inclusion(f, g).
- t_2 := Inclusion(g, f).
- If both t_1 and t_2 are true, then return(t := true).
- Else, return(t := false).
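The two tests can be sketched directly in terms of witness (super)set points; the data layout assumed below (a list of point lists, one per dimension) is an illustrative choice, not the HOMLAB format.

```python
import numpy as np

def inclusion(f_witness_supersets, g, tol=1e-8):
    """Sketch of the Inclusion test: V(f) is contained in V(g) exactly when g
    vanishes at every witness superset point computed for f."""
    for W_hat in f_witness_supersets:
        for x in W_hat:
            if np.linalg.norm(np.asarray(g(x))) > tol:
                return False
    return True

def equal(Wf, Wg, f, g):
    """Equality of the reduced sets V(f) and V(g) via two inclusion tests,
    Wf and Wg being witness supersets for f and g respectively."""
    return inclusion(Wf, g) and inclusion(Wg, f)
```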
We have not dealt with multiplicities in this algorithm. Thus this algorithm gives a way of deciding if the reduced algebraic set defined by f = 0 is an algebraic subset of the reduced algebraic set defined by g = 0. This algorithm is a translation of the algorithm from van der Waerden's classic (§93 to §98 van der Waerden, 1950). It is a strength of our numerical model of generic points that they model the classical generic points closely enough that such results of classical algebraic geometry translate without difficulty.

13.8 Summary
Given a polynomial system f(x) = 0 on C^N, the algorithms of this chapter produce a witness superset for its solution set V(f): for each dimension, a finite set of slice points containing a witness point set for that dimension, possibly along with some junk points to be removed by the methods of Chapter 15.
Exercises
Several of these exercises refer to routines from HOMLAB (see Appendix C). Routine witsup.m implements algorithm WitnessSuper. If the system to be analyzed is provided in tableau form, script wsuptab.m will sort the equations by descending degree and then call witsup.m.
Exercise 13.1 (Multiplicity and Randomization) Show that the system of Equation 13.5.9 has the origin as an isolated zero of multiplicity 3. Show that the system of Equation 13.5.10 has the origin as an isolated zero of multiplicity 4.
Exercise 13.2 (Inclusion Test) Use witsup.m to find witness points for the twisted cubic, V(y − x^2, z − x^3), and also for V(xy − z^2, xz − y^2). Apply the inclusion test to see if either of these contains the other.
Exercise 13.3 (Seven-Bar Linkage) Refer to Figure 9.5 and derive a set of six equations similar to the ones in Equations 9.5.33–9.5.36, consisting of three loop equations and three unit magnitude conditions. Compute a witness superset for general link parameters (a_0, b_0, c_0, a_1, a_2, b_2, a_3, b_3, ℓ_4, ℓ_5, ℓ_6) ∈ C^11. Then repeat the exercise arbitrarily choosing a_0 = −0.3 − 1i, c_0 = −1, a_1 = 0.28, b_2 = 0.37, ℓ_6 = 0.55, and setting the remaining parameters with the formulae b_0 = 0, a_2 = a_0 b_2 / c_0, a_3 = a_1, b_3 = a_1(a_0 − c_0)/a_0, ℓ_4 = ℓ_6 |a_0/c_0|, and ℓ_5 = |b_2|. Make a table like those shown in § 13.6.1.
Chapter 14
A Cascade Algorithm for Witness Supersets

This chapter revisits the construction of a witness superset for the solution set of a system f(x) = 0 of n polynomials on C^N, a topic addressed earlier in § 13.6. The algorithm, WitnessSuper, from that section leaves room for improvement both from a theoretical and practical point of view. To understand why this might be so, let us assume that we use total degree homotopies to solve the systems arising in the algorithm. Without loss of generality, we may assume that we have squared-up the system, so n = N, and we have sorted the polynomials f_i(x) from the system f(x) by descending degree, so that letting d_i = deg f_i, we have d_1 ≥ ⋯ ≥ d_N. Under these conditions, WitnessSuper tracks

∑_{i=1}^{N} ∏_{j=1}^{i} d_j

paths. In comparison, it is a classical fact, e.g., (12.3.1 Fulton, 1998), that given the irreducible components Z_{ij} of V(f) with Z_{ij} occurring with multiplicity μ_{ij}, it follows that

∑_{i,j} μ_{ij} deg Z_{ij} ≤ ∏_{i=1}^{N} d_i.
At first sight this does not look so terrible. In the case when all the d_i = 2, there are 2^{N+1} − 2 paths to be tracked in the algorithm to find at most 2^N solutions. Since, all other things being equal, computational work is proportional to the number of paths followed, this amounts to only about twice as much work as is theoretically needed. But all other things are not equal! Paths that do not lead to witness points often end up going to singular solutions. This can be expensive.
In § 14.1, we explain an algorithm that follows only ∏_{i=1}^{N} d_i paths in the total degree case. These paths are tracked in N stages, yielding at each stage a witness superset for each successively smaller dimension, and hence we call this the cascade algorithm. In the worst case, all the paths survive to the end of the cascade, requiring the equivalent of N ∏_{i=1}^{N} d_i paths to track, but in the typical case, many paths terminate early in the process, making the algorithm relatively efficient. Moreover,
to survive to the next stage, a path must remain nonsingular, which helps keep computational cost down and reliability high.
The version we present here differs slightly from its first appearance in (Sommese & Verschelde, 2000). The most notable difference is the removal of slack variables, which were never used in actual implementations. The new presentation draws on Theorem 13.2.2 to establish the genericity of the slicing hyperplanes, removing any dependence on the order in which they are used.
For ease of reading, in this chapter we act as if components are reduced, e.g., we talk about witness supersets (Ŵ_i, L_i, f) instead of (Ŵ_i, L_i, f, h_i(x,t), Ŵ_i(ε), ε). All our arguments and algorithms hold for nonreduced components also. For example, the cascade algorithm for witness supersets produces the h_i(x,t) whether the components are reduced or nonreduced, and to obtain the Ŵ_i(ε) we would only need to have the t variable in the homotopies in the cascade algorithm take on the value t = ε for an appropriate small value of ε.
This short chapter has only two sections: the description of the cascade algorithm in § 14.1, and presentation of some examples of its use in § 14.2.

14.1 The Cascade Algorithm

The form of this algorithm is:
  Input: a system f(x) = 0 of n polynomials on C^N
  Output: a witness superset for V(f) (see Definition 13.6.1)
For simplicity in forming the algorithm, we begin by squaring up the system so that we have the same number of equations as unknowns. Let r = rank f. By Theorem 13.4.2, the lowest possible dimension of any irreducible solution component is N − r, and by Theorem 13.5.1, all such components are also irreducible components of [I_r Λ] · f, where I_r is the r × r identity matrix and Λ is a generic r × (n − r) matrix in C^{r×(n−r)}. To get a witness point set for dimension i, we slice simultaneously with i generic hyperplanes, so we see that we use at least N − r such planes no matter which dimension is being investigated. By Theorem 13.2.2, with probability one, we can pick a set of N − r such hyperplanes, generic with respect to all solution components, by choosing random, complex coefficients for their equations. This is equivalent to choosing an r-dimensional linear space L ⊂ C^N intrinsically as L = b + B · u with random b ∈ C^N and B ∈ C^{N×r}. Combining these maneuvers, we have the square system of size r

g(u) = [I_r Λ] · f(b + B · u).

Any solution u* of g(u) = 0 maps to a point x* = b + B · u* ∈ L, and such points that also satisfy f(x*) = 0 are the witness points that we seek. Whatever the values of n and N may be, we use this approach to convert the problem of analyzing
f : C^N → C^n to treating a square system g : C^r → C^r. Accordingly, without loss of generality, from this point on we assume f is square of size n = N = rank f.
Recall that for i = 0 to N, algorithm WitnessSuper obtains a witness superset for the i-dimensional component of V(f) by intersecting V(f) with i generic linear equations. Instead of treating each of these as an independent problem, the cascade approach embeds all of them into a common formulation. For this purpose, we introduce an N-tuple of parameters t = (t_1, ..., t_N) ∈ C^N, the diagonal matrix

T(t) := diag(t_1, ..., t_N),    (14.1.1)
and the notational device t^{[i]} = (t_1, ..., t_i, 0, ..., 0). By Theorem 13.2.2, there is a Zariski open dense set U ⊂ C^{N×(N+1)} such that for A := [a_0 A_1] ∈ U the linear functions

L(A, x) := a_0 + A_1 · x    (14.1.2)

are generic with respect to all the irreducible components of V(f), where a_0 is the first column of the N × (N+1) matrix A and A_1 is the remaining columns. The witness point superset for dimension i is a finite set of points containing all isolated solutions of the system

F(A, x, t^{[i]}) := [ f(x) ; T(t^{[i]}) · L(A, x) ] = 0    (14.1.3)

for nonzero values of t_1, ..., t_i. The zeros on the diagonal of T(t^{[i]}) knock out N − i of the linear equations in L(A, x), leaving us with a system of N + i equations in x ∈ C^N.
To obtain all isolated solutions of F(A, x, t) = 0 by homotopy methods, we need a square system. Theorem 13.5.1 tells us that there is a Zariski open dense set U ⊂ C^{N×N} such that for Λ ∈ U the isolated solutions of F(A, x, t) = 0 are contained in those of

E(Λ, A, x, t) := f(x) + Λ · T(t) · L(A, x).    (14.1.4)

Accordingly, a witness point superset for the i-dimensional component of V(f) can be found by computing all isolated solutions to

E_i(Λ, A, x) := E(Λ, A, x, t^{[i]}) = 0.    (14.1.5)
For clarity, let's denote the kth row of L(A, x) as L_k(x) and the kth entry in f(x) as f_k(x), so that we write E(Λ, A, x, t) expanded as

E(Λ, A, x, t) := [ f_1(x) ; ⋮ ; f_N(x) ] + Λ · [ t_1 L_1(x) ; ⋮ ; t_N L_N(x) ].    (14.1.6)
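A small sketch in Python/NumPy of building the embedding of Equation 14.1.6 and of forming the level-i system by zeroing the trailing entries of t; the function names and the calling convention are placeholders assumed only for illustration.

```python
import numpy as np

def embedding(f, Lam, A):
    """Build eps(x, t) = f(x) + Lam @ (t * L(A, x)), where L(A, x) = a0 + A1 x
    with a0 the first column of A and A1 the remaining columns, and where the
    elementwise product t * L(A, x) realizes the diagonal scaling T(t)."""
    a0, A1 = A[:, 0], A[:, 1:]
    L = lambda x: a0 + A1 @ x
    def eps(x, t):
        return np.asarray(f(x)) + Lam @ (np.asarray(t) * L(x))
    return eps

def t_level(t, i):
    """Return t^{[i]}: keep the first i entries of t and zero the rest."""
    out = np.array(t, dtype=complex)
    out[i:] = 0
    return out
```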
We summarize what we have done, with some additional useful conclusions, in the following theorem, carrying over the notation of the preceding paragraphs.
Theorem 14.1.1 For a given polynomial system f : C^N → C^N, there is a Zariski open dense set U ⊂ C^{N×N} × C^{N×(N+1)} such that for (Λ, A) ∈ U and any integer i satisfying 0 ≤ i ≤ N, it follows that
(1) a witness point superset for all i-dimensional components of V(f) is a subset of the isolated solutions of E_i(Λ, A, x) = 0; and
(2) if x' is a solution of E_i(Λ, A, x) = 0, then either:
  (a) x' is in a component of V(f) of dimension at least i, and L_k(x') = 0 for all 1 ≤ k ≤ i; or
  (b) L_k(x') ≠ 0 for all 1 ≤ k ≤ i, in which case x' is one of a finite number of nonsingular isolated solutions, whose number is independent of the choice of (Λ, A) ∈ U.
Having embedded all the systems of interest into E(Λ, A, x, t), we now turn to the cascade for solving the E_i(Λ, A, x) = 0 as i descends from N to 0. With probability one, a random choice of (Λ, A) satisfies the genericity conditions of Theorem 14.1.1. Choosing them so, we consider them fixed and suppress them from our notation, hence writing E(x, t) for the embedding and E_i(x) for the ith embedded system. Define the level i nonsolutions as the set of solutions x' of E_i(x) = 0 with L_i(x') ≠ 0. Denote these by N_i. They depend on the choice of Λ and A, but by Theorem 14.1.1, the number of them, which we denote ν_i, is independent of (Λ, A) ∈ U. Each E_i(x) is in the family of systems E(x, t) for a particular value of t^{[i]} ∈ C^N. Moreover, holding t^{[i−1]} fixed but letting t_i vary, we can view E_i(x; t_i) = 0 as a parameterized family of systems which includes as a special case E_{i−1}(x) = E_i(x; 0). By the principles of parameter continuation, see § 7.4, if we can solve E_i(x; t_i) = 0 for a generic t_i ∈ C, then we can use those solutions as start points in a homotopy
E_i(x; s) = 0 as s goes from t_i to 0. By similar reasoning, we can descend from E_i(x) = 0 to any E_j(x) = 0, j < i, using the homotopy

H_{ji}(x, s) := E(x, (t_1, ..., t_j, s t_{j+1}, ..., s t_i, 0, ..., 0)) = 0    (14.1.7)
for s going from 1 to 0. We refer to this as a cascade of homotopies. We can be more precise about which solutions of the higher system lead to solutions for the lower one, as follows.
Theorem 14.1.2 Let H_{ji}(x, s) be defined as above. There is a Zariski open dense set U ⊂ C^{N×N} × C^{N×(N+1)} × C^i such that for (Λ, A, t^{[i]}) ∈ U, there are nonsingular paths (φ_k(s), s) : C → C^N × C with 1 ≤ k ≤ ν_i such that:
(1) the set of the φ_k(1) is equal to the set of nonsolutions at level i;
(2) H_{ji}(φ_k(s), s) = 0;
(3) the limits lim_{s→0} φ_k(s) with L_j(lim_{s→0} φ_k(s)) ≠ 0 are the level j nonsolutions; and
(4) the limits lim_{s→0} φ_k(s) with L_j(lim_{s→0} φ_k(s)) = 0 contain the witness point superset for the j-dimensional components of V(f).
260
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
we can start the cascade by solving £K(X) = 0 using any homotopy that will find all isolated solutions, for example, a total degree homotopy. We can check for the trivial case when V(f) = CN, using the probabilistic null test, and so we usually start at level K = N — 1. Alternatively, one might use the algorithm TopDimen from § 13.7.1 to determine a lower starting dimension for the cascade. A final important note: at the top of the section, we began by squaring up the system / to size r = rank/. Theorem 14.1.1 applies to the square system; call it / ' and its witness superset as W. If the original system / has more than r equations, then W may include points which do not satisfy / . We simply discard these. Cascade: [W] = Cascade(/) • Input: Polynomial system / : C^ —> Cn. • Output: A witness point supersets W = {Wo,..., WJV} for V(f), where Wi is a witness point superset for the codimension i component. Empty dimensions may be omitted. • Procedure: — Initialize W = {}. — If / is null, return the appropriate result. Otherwise, continue. — Comment: square up f{x) to form g(u) of size rank/. — Let r = rank/. - L e t / ' = 5H(/;r). — Choose random b G CN and B 6 C i V x r . — Define g{u) = f'(b + B-u). — Comment: form embedding and solve for codimension 1. — Choose random A e C r x r and A £ C r x < r + 1 ) . — Form £(u, t) = g{u) + A • T(t) • L(A, w), where T(t) is diagonal of size r x r , — Let S := HomSolve(£'(u,^ r ^ 1 ))), discarding any solutions at infinity. — Partition S as W := {u G S : g(u) =0},N' = S\W. — Loop: For i = 1 , . . . , r — 1 * Comment: i is the codimension. * Append Wi := b + B • W to W. * Let d = r — i. * Track solution paths of £(u, st^ + (1 — s ) ^ " 1 ' ) = 0 as s goes from 1 to 0, starting at each of the points in J\f. Discard any endpoints at infinity and call the remaining ones set S. * Partition S as W := {u G 5 : g(u) = 0}, Af = S \ W. — End loop. — Comment: the lowest dimension might have extraneous points. — Let Wr := b + B • W, and expunge any points x G Wr such that f(x) ^ 0. — Append Wr to W. — return(VF).
261
A Cascade Algorithm for Witness Supersets
For simplicity, we state the algorithm concentrating on the witness point sets. The linear slicing equations are easily constructed from b, B, and A. 14.2
Examples
For direct comparison with algorithm WitnessSuper of the previous chapter, we repeat the same examples as in § 13.6.1, this time using Cascade. Please refer to the earlier section for the meanings of the table entries. Example 14.2.1
For the system f ( x v )
- \
*(2/2-*3)(z-l)
l _
0
the cascade results are as follows. There is a new column called "fail" to record that some paths did not converge well. Dim 1 0
Paths 30 9
#W 4 9
#Wsing 0 7
#JV 9 0
#00 4 0
fail 13 0
nfe nfe 223 6711 64 643
As we know the answer before hand, we can verify that the witness supersets contain a valid witness set. The 13 failed paths are worrisome, but it appears that they are highly singular points at infinity. This example is rather degenerate with high degrees; it calls for higher precision arithmetic for a secure treatment. Example 14.2.2
The system
f(x,y):=\x2y - Xy-2y}=0J [ 3xyA-y leads to the following results from the cascade: Dim 1 0 Example 14.2.3 Dim 8 7 6 5 4
Paths 12 6
#W 1 6
#Wsing 0 2
#jV (3 0
#00 5 2
nfe nfe 117 1403 37 222
Running the cascade on the equations for SO(3), we obtain Paths 96 72 72 72 72
#W 0 0 0 0 0
#Wsing 0 0 0 0 0
3 I 72 I 8 1 0
#N 72~~ 72 72 72 72
#00 24 0 0 0 0
nfe 267 46 44 47 48
nfe 25620 3290 3195 3357 3465
40 I 24 1 107 I 7674
262
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
For each stage after the first, the nonsolutions of the previous stage become the start points of the next, so the number of paths can only decrease at each stage. Examples like the SO (3) problem are the worst case for the cascade as far as the total number of paths is concerned, because all the paths survive to the last stage. A saving grace is that the number of function evaluations per path falls dramatically after the top dimension. We can only surmise that the initial homotopy between a generic start system and our sliced target has longer, perhaps more twisted, paths, while the cascade homotopies connect highly related systems, so that the paths are short and relatively straight. The rise in nfe for the final dimension of the 50(3) problem is due to the solutions at infinity being singular, thus requiring a more expensive endgame to compute them accurately. Comparing these tables to the ones in § 13.6.1, we see that Cascade consistently tracks more paths than WitnessSuper, but the total number of function evaluations is almost the same. We experience some numerical difficulty on the first cascade example, but it still returned a correct witness superset. There is one clear difference in performance: Cascade returns a smaller superset than WitnessSuper on each of these examples. This means the supersets contain fewer junk points. This is particularly notable in the zero-dimensional sets for Example 14.2.1, for which WitnessSuper gave a set of 30 points containing 28 junk points, while Cascade gave a set of only 9 points containing 7 junk points. When we move on to computing a numerical irreducible decomposition, the first step is to remove the junk points. It is quite advantageous to have fewer of them at the outset. 14.3
Exercises
Exercise 14.1 (Comparisons) Run Example 14.2.2 and Example 14.2.3 using HOMLAB. DO SO both using witsup. m, an implementation of WitnessSuper of the previous chapter, and using cascade.m, an implementation of Cascade. Compare run times for the two methods. Use the profiler tool in Matlab to track which routine is using the most computation. If you are using the "tableau" format, supplied for both examples, see how much you can improve performance by writing an efficient straight-line program. Exercise 14.2 (Slicing Equations) Find an expression, in terms of b, B, and A, for the slicing equations for dimension i in algorithm Cascade. Exercise 14.3 (Spherical Pentad) A pentad mechanism is topologically two triangles, A and B, with three line segments, each one joining corresponding vertices of the triangles. The segments and triangles represent rigid links, but relative motion is allowed where they meet. In the spherical version, the joints are all revolute (one-degree-of-freedom hinges), and their centerlines all intersect in a common point. This means that the possible relative positions of one triangle with respect
A Cascafe Algorithm for Witness Supersets
263
to the other is constrained to rotations in M3. Let a i, 02,03 € K3 be unit vectors at the joints of triangle A and bi,b2,b3 E R3 the same triangle B. Let q 6 R be the cosine of the arc subtended by the segment from at to bi. Let X e SO(3) be the rotation of triangle B with respect to A. Then, we have the three equations a[Xbi = cu
i = 1,2,3,
to describe all possible placements of B with respect to A such that the pentad can be assembled. Explain how results presented in this chapter allow you to conclude that for general parameters a^, bi, c,, i — 1,2,3, the spherical pentad has at most 8 assembly configurations. Exercise 14.4 (Griffis-Duffy Platform) A special case of the Stewart-Gough platform we studied in § 7.7 and § 9.3, Griffis-Duffy platforms (Griffis & Duffy, 1993) have triangular upper and lower platforms, with the vertices of each connected to a point on the edge of the other. An even more special case, which we call the GriffisDuffy Type I platform, is when the triangles are equilateral (not necessarily the same size) and the joints on the edges are at the midpoints (Husty k Karger, 2000; Sommese, Verschelde, & Wampler, 2004a). That is, connecting point a, on the base to bi on the upper plate, a\, a3 and a5 are vertices of an equilateral triangle, and a-2 = (ai + a^)/2 and so on cyclically. Meanwhile, &2, 64 and b& are vertices, and bj = (&6 + fr2)/2 and so on cyclically. The leg lengths, Li: are arbitrary. What is the dimension and degree of the top-dimensional component? Use Equations (7.7.7) and 7.7.10), and ignore any points on the degenerate set of Equation 7.7.8. Exercise 14.5 (Seven-Bar Revisited) Repeat Exercise 13.3 using Cascade.
Chapter 15
The Numerical Irreducible Decomposition
Let Z be an affine algebraic set on C^. This means that Z is the solution set of some system of polynomials / : C^ —» C", i.e., Z = V(f). In a typical situation, we start with / as given, and we seek to find a description of its solution set. In other cases, such as we address in the next chapter, Z may be only a portion of the full solution set of the polynomials on hand. But for the moment, it does no harm to think of Z as the full solution set V(f). No matter its origins, Z has a decomposition into its pure-dimensional parts Zi, i.e., Z = L)fl!£lzZi with dim Zi = i. Furthermore, each Zi can be decomposed into irreducible pieces Zij, i.e.,Zi = Uj^XiZij, where each Zij is a distinct irreducible component and the index sets X% are finite. Our goal is to find a numerical irreducible decomposition, that is, we wish to find witness point sets, W^ := Z^ fl L c ^ for each irreducible component Z.Lj of Z, where Lc^l is a generic linear space of codimension i. From Chapters 13 and 14, we have algorithms WitnessSuper and Cascade that, given polynomial system /, find a witness superset for V(f). That is, for each dimension i, they give a set Wi D W{ := Zi n Lc^\ which contains all the witness points for all the irreducible components of dimension i along with some possible junk points. Accordingly, our goal becomes to find the breakups Wi = Wi + J, = Uj-ez, W^ + Jj
(15.0.1)
where Jj C (Uj>iZj) n L^1^. To achieve this, we show • how to trim the junk points out of the witness supersets, Wi, to obtain the witness sets, Wi, i — 0,..., dim Z, and • how to decompose a witness set, Wi, into its irreducible components, Wij. In the sequel, Chapter 16, we present methods for finding witness supersets for the intersection of algebraic sets. The methods of the current chapter will apply equally well to those witness supersets. One way to approach the processing task is to employ a membership test. Junk points in a witness superset at dimension i are members of component of dimension greater than i, so one way of detecting them is to start at the highest dimension and work down, eliminating any points found to be members of higher-dimensional 265
266
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
components. Then, at a fixed dimension, we need a test of whether two or more witness points belong to the same irreducible component. By such tests, we can group the points to form the numerical irreducible decomposition. Hence, much of this chapter is devoted to membership tests, and in § 15.1, we begin the chapter by discussing how different types of membership tests, defined abstractly by their inputs and outputs, can be used to process witness supersets into the irreducible decomposition. The remaining sections present concrete approaches to providing the necessary membership tests. All the algorithms of this chapter rely on a basic maneuver we call sampling, which generates new points on a component by tracking witness points as the slicing plane is moved continuously. Thus, in § 15.2 we discuss sampling for each of the three variants of witness set put forward in § 13.3: reduced, deflated, and nonreduced. The general nonreduced case requires a method for tracking singular paths, which we outline in § 15.6. A sampling routine enables three kinds of algorithms that are useful in computing a numerical irreducible decomposition. Numerical elimination theory, § 15.3, interpolates sample points to find equations that vanish on a component thereby providing a membership test. This approach provides a complete solution to both the junk elimination and the decomposition stages of processing, but it becomes prohibitively expensive and numerically unstable for all but the lowest degrees and dimensions. As a more practical alternative, in § 15.4, we discuss a homotopy membership test based on the fact that regular points of an irreducible algebraic set are path connected. This approach provides a complete method for junk elimination, but its use in monodromy loops to heuristically find connection paths between witness points at the same dimension provides only a partial solution to the decomposition phase. To complement this, the trace test discussed in § 15.5 determines whether a given subset of witness points forms a complete component. It can be used to quickly certify a putative decomposition found by monodromy or to complete a partial one by exhaustive testing. It can even be used by itself to combinatorially test subsets of points until the entire decomposition is determined. Our presentation follows the order in which the methods were originally developed: numerical elimination theory in (Sommese et al., 2001a), monodromy in (Sommese et al., 2001c), and traces in (Sommese, Verschelde, & Wampler, 2002b)—inspired by ideas in (Rupprecht, 2004). The different approaches each have their own niches. For a pure i-dimensional component of moderate degree, meaning not much more than degree 10, traces prove to be fastest decomposition method, but the worst-case cost grows exponentially with degree. For this reason, monodromy certified by traces eventually becomes more effective. Numerical elimination is not generally competitive for determining a decomposition, but it could still be useful if one seeks equations vanishing on a component.
The Numerical Irreducible Decomposition
15.1
267
Membership Tests and the Numerical Irreducible Decomposition
Our task is: • Given: A witness superset, W, (see Definition 13.6.1) for an affine algebraic set Z. ^ « Find: The decomposition of W into a numerical irreducible decomposition for Z, that is, find the breakup of W as in Equation 15.0.1. We will outline three variations on a procedure to complete this task, each based on a different type of membership test. The details of how to implement the tests follow in subsequent sections. In this and the following sections, we denote the witness superset for dimension i as Wi, which is composed of a witness set Wj for dimension i plus, possibly, some junk points, J;. In addition to witness points, witness sets and witness supersets carry along linear slicing planes and some description of Z in a form that allows witness points to be refined numerically. When we speak of a point w G Wj, it is implied that w is in the witness point set for Wi. Before employing membership tests, we reduce the amount of work by partially categorizing the points in the witness superset. The first observation is that all points in the top-dimensional witness superset are true witness points: there is no junk in the top dimension. This is because, by definition, the junk points in a witness superset for an i dimensional component of Z must lie in some higher-dimensional component of Z. A second observation is that any nonsingular points in W must be true witness points. Assume that Z = V(/), where / is a system polynomials. A point t t e W j lies in Z n £ c ' 2 ', so letting / ^ m (x) denote the restriction of / to the linear space Lc'1', we have / ^ m (w) = 0. Then, w is nonsingular if the Jacobian matrix of partial derivatives for //,cm has full column rank.1 For this purpose, it does not matter whether the linear slice is represented extrinsically or intrinsically. (See § 13.2.1 for explanation of these terms.) Nonsingularity implies that the point is an isolated point of Z D £CM. In contrast, junk points in Wi lie in a component of Z of dimension greater than i and hence in a component of Z fl Lc ^ of dimension at least one. The final observation builds on the second. A point w £ Wi is a true witness point if, and only if, it is an isolated solution to fLo[^(x) = 0. Any test of local dimension can serve to distinguish between junk points and witness points. If point w £ Z n I c f ' l C CN is not isolated, then the slice Zn Lc^ must intersect a closed hypersurface surrounding w. Interval arithmetic might be used to find that the point is isolated by showing that none of the 2N faces of a rectangular box enclosing w x
We use the usual convention that rows of the Jacobian correspond to functions in / and columns correspond to variables.
268
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
contains a solution. Alternatively, a heuristic like the one in (Kuo et al., 2004) could be used to find nearby points on Z D Lc^1^, thereby showing that the point is not isolated and must be junk. Taking such observations into account, we mark some points in W as true witness points, we discard any points known to be junk, and we mark the remaining ones as needing further investigation. If local dimension testing of the sort mentioned in the previous paragraph is reliably complete, then no questionable points remain, but we do not count on that outcome in what follows. We can complete the decomposition with one of several types of membership test. The first of these has the following inputs and outputs. Irreducible Membership: [Yi,>2] := Memberl(Y,w) • Input: A finite set of test points Y 6 CN and an isolated point w G Z C\ Lc'*', where Z is an algebraic set and Lc ^ is a linear space of codimension i generic with respect to Z. • Output: Set Y\ consisting of the points in Y that are on the same irreducible component of Z as w, and set 5^ := Y \ Y\ being the rest of Y. • Procedure: See § 15.3. This membership test yields a complete algorithm for the numerical irreducible decomposition of a witness superset as follows. Irreducible Decomposition: [W] := IrrDecompl (W) • Input: A witness superset W for an algebraic set Z. • Output: The witness set W contained in W decomposed into its irreducible pieces as in Equation 15.0.1. • Procedure: - Initialize W = 0. - While: W ^ 0, * * * *
Let k be the top dimension of W. Pick any w € W^Let \YUY2] :=Memberl(t?,w;). Points in Y\ from Wk form an irreducible witness set. Append this set to W. ^ * Points in Y\ from Wi, i < k, are junk. Discard them. * Remove Yx from W, i.e., W := W \ Yl.
- End while. - return(W). On each pass through the main loop, at least one point w is removed from Wk, so eventually it is emptied out and the algorithm descends to the next dimension.
The Numerical Irreducible Decomposition
269
Eventually, W is completely empty and the algorithm terminates. Irrdecompl does both jobs of removing junk and decomposing the witness sets. The only trouble with this approach is that Memberl turns out to be expensive. For this reason, we develop more efficient alternatives. These alternatives proceed by eliminating junk as an independent process from decomposing the witness set. The key to junk removal is the following algorithm. •
i
Membership: [t] := Member2(?/, W) • Input: A single point y € CN and a witness set W for a pure-dimensional algebraic set X. • Output: If y £ X, return t := true, else return t := false. • Procedure: See the homotopy membership test of § 15.4. With this test available, one can remove all junk points as follows. Remember that the top dimensional component of a witness superset contains no junk points. Junk Removal: [W] := JunkRemove(W/) • Input: A witness superset W for an algebraic set Z. • Output: The witness set W obtained by removing all junk points from W. • Procedure: - Let k be the top dimension of W and set i := k - 1. - Let Wk := Wk. - While: i > 0,
* For each w £ Wi, if Member2(w;, Wj) for any j > i, then discard w. Otherwise, copy it into Wj. * Let i :—i — \. - End while. - return(W:=[W 0 ,...,W fe ]). i
i
With the junk removed, it remains to partition the witness sets at each dimension into irreducible witness sets. The monodromy method, though not complete on its own, is useful for this task. Monodromy: [W] := Monodromy(W) • Input: A witness set W for a pure-dimensional algebraic set Z. • Output: A witness set W having the same points as W in some permutation such that corresponding points in the lists are known to be in the same irreducible component of Z. • Procedure: See the monodromy algorithm of § 15.4.
270
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
To make the monodromy approach complete, we may employ a trace test. This test can also be used on its own, without monodromy, to find irreducible decompositions. Its format is as follows. Trace Test: [t] := Trace(F) • Input: A set of points Y C W, where W is a witness set for a pure-dimensional algebraic set Z. • Output: An array t containing linear traces of the points in Y. The traces have the property that if the sum of traces for a set points is zero, then that set is the union of one or more irreducible witness sets. • Procedure: See § 15.5. i
i
As we will discuss in § 15.5, if w € W is on an irreducible component of degree d, there is one and only one subset W of size d that contains w and has trace of zero. (The trace of a set of points is just the sum of the traces of its members.) Moreover, any zero-trace set of size greater than d that contains w is the union of the irreducible one of size d and one or more other irreducible witness sets. Combining monodromy and the trace test, we have a complete algorithm for irreducible decomposition of a pure-dimensional algebraic set as follows. Irreducible Decomposition: [W] := IrrDecompPure(W, M, K) • Input: A witness set W for a pure-dimensional algebraic set Z. Also, integers M, K that control when to switch to exhaustive enumeration. • Output: The a list W of the irreducible components Wj of W. • Procedure: — For j = 1,..., #W, initialize Yj as a set containing the jth point of W. Let Y be the list of all Yj. — Associate to each Yj a trace value tj := Trace(Y,). — For each tj that is zero, move Yj from Y to W'. — Comment: try heuristic monodromy loops first. Integer k counts the number of attempts without making progress. — Initialize k = 0. — While: m := #Y > M and k < K, * Let {Y{, ...,Y^}:= monodromy {{Yu ..., Ym}). * If there is any Yj ^ Yo•, we have found a path connecting a point in Yj to a point of some Yj, i ^ j . * Regroup the Yi, merging all sets that have a monodromy connection and updating the corresponding trace as the sum of those for the merged sets. * For each new trace that is zero, move the merged set from Y to W. * If there were no mergers, increment k, else set k = 0. — End while.
The Numerical Irreducible Decomposition
271
- Comment: Switch to the exhaustive tests either because the number of groups is low enough or because we give up on the monodromy heuristic. - While: Y / 0 * Among all combinations of one or more Yj eY, find the smallest combination that both contains Y\ and has a summed trace of zero. "Smallest" means having the fewest witness points. * Merge this combination into one set and move it to W.
i
- End While. - return(W)-
_
m
^
i
With care in programming, the exhaustive phase requires at most 2 m ~ 1 — 1 combinations to be examined. There are 2 m possible combinations in all, but if one combination passes the trace, so does its complement in Y, so we never have to test both. Also, we know that the trace for the whole of Y must be zero, because the initial set W is a complete witness set. A further refinement of the algorithm recognizes that some witness points appear with multiplicities greater than one, and all witness points on the same component must have the same multiplicity. Therefore, if we keep track of how many times each witness point appears in the output of WitnessSuper or Cascade, we can limit the combinations to be tested in the exhaustive phase to only those combining points of the same multiplicity. Here, multiplicity means the multiplicity of the component as a solution to the squared-up system used in witness superset generation, which may be greater than its multiplicity as a solution of the original system. If we do not wish to use monodromy, a negative value of K causes IrrDecompPure to skip directly to exhaustive trace testing. The numerical irreducible decomposition is obtained from a witness superset by removing junk and then the witness sets for each dimension one by one. For completeness, we list out the algorithm as follows. Irreducible Decomposition: [W] := IrrDecomp2(VF,M, K) • Input: A witness superset W for an algebraic set Z. Integers M and K are control parameters for IrrDecompPure. • Output: The witness set W contained in W decomposed into its irreducible pieces as in Equation 15.0.1. • Procedure: - Let W := JunkRemove(W0. - For % = 1,..., dim W, let W{ := IrrDecompPure(W', M, K). - return(W / ). This completes the top-down description of numerical irreducible decomposition. The rest of the chapter builds the required membership tests from the bottom up
272
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
and shows that they have the properties that we rely on here. 15.2
Sampling a Component
The fundamental capability upon which all the membership tests depend is the ability to sample a component given a witness point on it. Recall from § 13.3 that as a theoretical construct, witness points are the isolated points of intersection between an affine algebraic set and a generic linear space. To sample a component, we simply move the linear space in continuous fashion, i.e., move it along a realone-dimensional path through the Grassmannian of linear spaces. As long as the prescribed path avoids a proper algebraic subset of nongeneric linear spaces, the intersection with the component remains isolated and defines a real-one-dimensional path of points on the component. Sampling is just the process of setting up such paths and following the points of intersection. Suppose the algebraic set under study is Z, and let L(s) denote a continuous path of linear spaces that are generic with respect to Z for s S (0,1]. Then, x(s) := Z n L(s) is a path of isolated points, with a well-defined endpoint x(0) = lim s ^o3 ; (s). When we choose L(0) to be generic also, then x(0) is a new sample point lying on the same component as x(l) and on no other component of Z. As a numerical construct, witness sets carry along extra information that allows a numerical approximation to a witness point to be refined to higher precision. This same information allows us to update the witness point when the slicing plane is moved slightly, hence we can numerically follow the path x(s). The details vary according to whether the component is reduced, deflated or nonreduced. A linear space can be represented extrinsically as a set of linear equations L{x) = a + A • x = 0 or intrinsically as x(u) = b + B • u. In the extrinsic form, a linear interpolation between two such spaces of the same dimension, say L\{x) and L0(x), can be written as L(x, s) = sL^x) + (1 - s)LQ(x).
(15.2.2)
If the coefficients of Li(x) and L0(x) are chosen at random, then L(x, s) = 0 defines a linear space of the that same dimension for all s G [0,1], with probability one. Intrinsically defined paths work in an analogous way, so we don't write out the details. In the rest of this section, we write only extrinsic formulations, but it should be understood that intrinsic ones can be used instead, usually with some increase in efficiency for implementations. 15.2.1
Sampling a Reduced Component
A numerical witness point on a reduced component in V(f) is a nonsingular solu— 0, for some known slicing tion, say xi, to the augmented system {f(x),Li(x)} equations Li(x). To sample, we simply replace Li(x) with the path L(x,s) of
The Numerical Irreducible Decomposition
Equation 15.2.2 to get the homotopy
h
Q
^=[i(th -
273
(1523)
--
We wish to track the path beginning at xi for s = 1 to find the endpoint as s —> 0. For x € CN, the homotopy h(x,s) has at least N equations. When it is not square, we can use randomization to square it up as h'(x,s) := 9i(h(x,s);N) and then apply the usual nonsingular path tracker of § 2.3. An alternative is to use a Gauss-Newton predictor-corrector, meaning that we use least-squares pseudoinversion in place of Gaussian elimination to solve the overdetermined linear systems in the predictor and corrector steps (see Equations 2.3.5, 2.3.6). 15.2.2
Sampling a Deflated
Component
Recall from § 13.3.2 that for a nonreduced component in V(/), we have the option of constructing a deflation such that the component in question is the projection of a reduced solution component of a related system of polynomials g. That is, the witness set has the form (W,L,g, TT) such that the points W are nonsingular points in V(g) D n~1(L) and the witness points are W = Tr(W'). When L is given by equations L{x) = 0, the pullback TT^^L) is given by the same equations, so the path L(x, s) is still just as in Equation 15.2.2. We proceed as in the case of a reduced component but with g replacing /, obtaining a solution path y(s) in some larger dimension. The path we seek is just x(s) = ir(y(s)). 15.2.3
Witness Sets in the Nonreduced Case
The nonreduced case without deflation is the most difficult. In this case, witness points are singular endpoints of solution paths in a homotopy h(x, t) = 0. This homotopy is constructed in the course of computing a witness superset either by WitnessSupi called from WitnessSuper of § 13.6 or by algorithm Cascade of § 14.1. Either way, the homotopy depends on the coefficients of the linear slicing equations, which we may explicitly show by the notation h(x, t; A) = 0. Consequently, if Ai is the matrix of coefficients for the slicing plane on which our witness point lies and Ao is same for the target slice, the sampling homotopy becomes doubly parameterized by t and s as H(x, t, s) := h(x, t, sAi + (1 - s)A0) = 0.
(15.2.4)
We have in our witness set the start points Wt that satisfy H{x, e, 1) = 0 and which lead to the witness point as t —> 0 for s = 1. We wish to track the solution path as s moves along (0,1]. We know that the solution path exists, but it consists of singular isolated points for each value of s. This is a case of singular path tracking. So as not to unduly interrupt the flow of the chapter, we postpone discussion of singular path tracking to the last section, § 15.6.
274
15.3
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Numerical Elimination Theory
The first approach to the numerical irreducible decomposition, reported in (Sommese et al., 2001a), uses a membership test based on a numerical version of elimination theory. This is the test we called Memberl in § 15.1. Let X denote an irreducible k dimensional component of the solution set V(f) and let x* be a known generic point on it, that is, x* = X n L c ^ l for a generic linear space Lcrfel of codimension k. We need to give a criterion for y G C^ to belong to X. We can assume that N > k > 0 since otherwise nothing needs to be done. Assume first that k = N — 1. Using the sampling techniques of § 15.2, we vary the slicing plane and collect as many widely separated general points on X as we wish. For each positive integer d, there are m(d) := {N^d) monomials of degree less than or equal to d; the coefficients of these monomials are homogeneous coordinates in p m ( d ), thereby forming the complex linear space Pd(CN) of polynomial equations of degree < d. Each point on X gives a linear condition on the Pd(CN). Choosing m(d) — 1 general points, we get a polynomial Pd{x) vanishing on the points, unique up to multiplication by a nonvanishing complex number. Choosing one additional general point e on X we have either (1) Pd(e) / 0, in which case there are no elements of Pd(CN) vanishing on X and deg X must be greater than d; or (2) or Pd(e) = 0, which by genericity implies that Pd{x) is identically zero. In this case, degX < d and if this is the smallest d for which such a polynomial exists, we know that degX = d, and in fact, X = V(pd)- Consequently, we have a membership test: y G X if, and only if, pd(y) = 0. Thus, we may proceed progressively d = 1,2,... until we find a d for which there is a polynomial Pd{x) vanishing on X. We know that degX is at most the cardinality of the witness set for dimension N — 1, which limits the complexity of the method. Now assume that 0 < k < N — 1. Take a generic linear projection TT : CN —> Cfc+1. We know by Theorem A.10.5 that 7r is generically one-to-one and proper on X, and in particular, that TT(X) is an algebraic hypersurface with deg?r(X) = deg^f. We sample X as usual and project each sample point x to y = TT(X) £ Cfc+1. Just as above, we now find a polynomial qd(y) of minimal degree that vanishes on the projected samples and we conclude that TT(X) = V(qd)Any point of x' e Tr~l(n(X)) satisfies qd(n(x')) = 0, so at first blush, qd does not seem adequate for testing membership in X. However, it is sufficient for testing membership for a finite set F C C^, because a general projection such as TT has the property that for all x* G F, n(x*) G n(X) if, and only if, x* G X. So choosing the projection at random, we have a probability-one membership test for points x* G F: x* € X if, and only if, qd(Tr(x*)) = 0. This is all we need for algorithm Memberl. The main problem with this approach is that (fc+^+d) grows rapidly with the dimension k and degree d of the component. Also, fitting polynomials of high degree
The Numerical Irreducible Decomposition
275
to numerical data is often numerically ill-conditioned. The dimensionality of the problem can sometimes be reduced by detecting that the linear span of a component is smaller than N and the degree can be lowered mildly by projecting from points on the component; see (Sommese et al., 2001b) for more on these. Still the approach is often too inefficient for practical use. 15.4
Homotopy Membership and Monodromy
We can avoid the computational cost of numerical elimination by switching to a weaker membership test, called Member2 in § 15.1. It has the more stringent condition that the input is a witness set for a pure-dimensional component, whereas Member 1 only requires a single generic point. However, since our methods of generating witness supersets always give a top-dimensional witness set free of junk points, we have the necessary input to start the junk removal process for lower dimensions using Member2. The same theoretical underpinning that justifies Member2 gives us routine Monodromy: both rely on a homotopy membership test. The main principle is that if X C C^ is an irreducible algebraic set, X and XTeg are path connected. Assume X is i-dimensional, i < N, and let G be the Grassmannian consisting of all codimension i linear spaces in CN. A general point in G is a generic slicing plane with respect to X, while there is some proper algebraic subset, say G* C G, of nongeneric slicing planes. A generic slicing plane, say Li £ G \G*, cuts X in a witness point set W\ := X D L\. For any LQ £ G, let L(s) c G b e a one-real-dimensional path with L(l) = L\ and L(0) = LQ and L(s) <EG\G* for all s £ (0,1]. By Theorems A.14.1 and A.14.2, since Wi is the entire solution set of XDL(0), the solution paths XflL(s) start at W\ for s = 1 and the limits of their endpoints as s —> 0 includes all isolated solutions of X n L(0). For convenience throughout this section, we abuse notation by using the same symbol L for both the linear space and the linear functions which define it, i.e., L = V{L{x)). Suppose we wish to test if point y £ CN is in X, where X is as in the previous paragraph. If y £ X, then among all the linear spaces in G that pass through y, generic ones meet X at y transversely, that is, letting Lo be a generic element of G meeting y, we have that y is an isolated point of X n Lo- Accordingly, the endpoints of the homotopy paths X n L(s) starting from W\ must include y as s —* 0. The only remaining question is how to construct L(s) so that it misses G*. This is easily accomplished because L\ is generic, so by Lemma 7.1.2, the path L(s) = sL\ + (1 — S)LQ avoids G* for s 6 (0,1] with probability one. Here, the interpolation formula for L(s) assumes L\ and Lo are represented extrinsically as a set of i linear functions. Now, suppose that X is the union of several irreducible pieces, all of dimension i. We have y G X if, and only if, it is in one of the pieces. We just conduct the homotopy membership test for each piece. Notice that if we have a witness set for X, it includes witness points for all the irreducible pieces even though we may not
276
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
know which points match which pieces. It doesn't matter as far as the membership test is concerned; if we track all the witness points, we still get all the endpoints, without knowing which ones are on which piece. According to the above, we may now write pseudocode for Member2 as follows. Membership: [t] := Member2(y, W) • Input: A single point y € CN and a witness set W for a pure-dimensional algebraic set X of dimension i. • Output: If y € X, return t := true, else return t :— false. • Procedure: — Comment: W includes a linear space L\ and the witness points W = XC\L\. — Choose a random complex i x N matrix A. — Let LQ(X) := A • (x — y). This is a generic linear space passing through y. — Track the paths X n (sLi + (1 - s)L0) from W at s = 1 to get endpoints Y at s = 0. — If y £ y, then return(true), otherwise return(false). Obviously, we can save some computation by tracking the paths one at a time and returning a positive result as soon as one ends on y. The worst case is when y $ X, because then we always have to track all the paths to find this out. 15.4.1
Monodromy
The same principle underlying the homotopy membership test leads directly to the concept of monodromy. In our context, the basic idea is that if L(s) C G \ G* is a one-real-dimensional closed loop, that is, L(0) = L(l), then the set of witness points at s = 1 are equal to those at s = 0, i.e., W = X n L(l) = X n L(0). This is true both when X is irreducible and when it is the union of irreducibles. What makes this useful to us is that, although the set of points is the same, the paths leaving at s = 1 may arrive back at s = 0 in permuted order. A path beginning at point u £ W and arriving at point v G W with u ^ v demonstrates that u and v are in the same irreducible component. This is just the homotopy membership test applied on a closed loop. When we begin with a witness set W for a pure-dimensional component X, such as would be generated by successive application of algorithms Cascade and JunkRemove, we do not know how many irreducible components X contains. Any partition of the points is possible, from every witness point lying in its own linear component to all witness points on the same component of degree #W. Each connection between distinct witness points found by monodromy restricts the possible break up. This is how algorithm Monodromy is used in algorithm IrrDecompPure of § 15.1. Pseudocode for the monodromy algorithm follows.
The Numerical, Irreducible Decomposition
277
Monodromy: [W] := Monodromy(M^) • Input: A witness set W for a pure-dimensional algebraic set Z. • Output: A witness set W having the same points as W in some permutation such that corresponding points in the lists are known to be in the same irreducible component of Z. • Procedure: — Comment: W includes a linear space L\ and the witness points W € XC\L\. — Choose a random linear space LQ(X) = 0 of the same dimension as Li{x). - Let L(s) = sLi + (1 - s)L0. - Track the paths X H L(s) starting at W for s = 1 to get new endpoints V at s = 0. — Choose a random, complex 7 e C. - Let L(s) = sjLQ + (1 - s)L\. — Track the paths beginning at V for s = 1 to get endpoints W at s = 0. - return(W). In the lists of points W, V, and W, we maintain the path ordering throughout, so that the kth point in W is path connected to the kth point of W', for all k. Note that we have used the fact that ryL(x) = 0 defines the same linear space as L{x) — 0, so X D LQ = I f l 7L0 • This means the start points of the second homotopy are the endpoints of the first. The 7 causes the return path to be different than the outbound path. See the figure following Lemma 7.1.3 for illustration. Note that Monodromy as written above uses two stages of path-tracking to produce one monodromy loop. In the process, it generates a witness set at a second slice, but this information is thrown away. For efficiency, one could save this intermediate witness set and use it to close monodromy loops with less work on subsequent executions of the algorithm. For example, if we go from i 2 to LQ to L\ to LQ, we have closed two loops, L\ —> LQ —> L\ and Lo —> L\ —> L o , using only three rounds of path tracking instead of four. See (Sommese et al., 2001c) for more on such practical issues. 15.4.2
Completeness of Monodromy
The monodromy procedure above is clearly valid, but it could be vacuous in the sense that the points might always come back in the same order. This is not usually observed in practice, and in fact, theory tells us that there exist monodromy loops sufficient to generate all possible permutations of the witness points on an irreducible set. This section presents an even stronger result that is key to the next topic of traces. But before we show that result, we give an extended discussion of a simple case. Historically, this was probably the first example of monodromy ever studied. Assume we have a polynomial p(z, w) = w2 - z on C2. Assume that p(z, w) = 0
278
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
and we wish to express w as a function of z, i.e., we wish to make sense out of the expression y/z with z € C. We would like this to be a global function, but this is not possible in any continuous way. Let us assume it is possible and see what goes wrong. At z = 1, we need %/T to be set to either 1 or — 1. Let's assume that y/z is set to 1: the case of —1 is identical. For z = elS we have either y/z = e%e/2 or y/z — —el6/2. Since \/T = \/e® we conclude by continuity that y/z = eie/2. The trouble comes when we go full circle and reach Vei2lT. By continuity we have V ^ F = ei7r = - 1 . The easiest classical solution of the problem of defining y/z (or ln(.z) for that matter) is to slit the plane from 0 to —oo, e.g., remove the real numbers from 0 to —oo from C. On the slit plane there are two "branches" of y/z. One has -\/l set to 1 and the other has vT set to —1. Similarly setting a more complicated polynomial p(z,w) = 0 with the w degree equal to d, we will have functions w = qi{z) for i = l,...,d solving p(z,w) = 0. Each is branch of the solution is defined on an appropriately slit region of the plane. Analytic continuation is the classical name for the process of extending the function, e.g., extending y/z denned in a small neighborhood of 1 to a function on a larger region. Hille has a nice detailed discussion of analytic continuation (Chapter 10 Hille, 1962). Notice that trying to define y/z and tracking v e * as z goes around the unit circle leads to a permutation of the set {1,-1} of roots of z2 = 1. Looking at this a bit more abstractly, we have that w — z2 — 0 defines an algebraic curve X in C 2 . Projection to the z variable gives a two-sheeted branched cover ?r : X —> C. Over C* := C \ {0}, we have that 7r : X \ {(0,0)} —» C* is a two-sheeted unramified cover with the fiber over a point z being (z, w), with w running over the "two square roots" of z. The fundamental group of C* is the additive group Z, and we have the monodromy action of Z on the fiber of TT over a fixed basepoint, e.g., 1. The even elements of Z leave {1,-1} fixed and odd integers send 1 to —1 and —1 to 1. How does this apply to decomposing an algebraic set into its irreducible components? Let's assume we have a purefc-dimensionalaffine algebraic set X C C^. Let 7T : X —> Ck denote the restriction to X of a generic linear projection from X to Cfe. Note that by genericity we conclude from the Noether Normalization Theorem 12.1.5 that 7r is a proper d := degX branched covering of Cfe. The union of the sets where n is not a covering and X is not a manifold form a proper algebraic X' C X with dimX' < dimX. Since n is proper, we know that TT(X') is an algebraic subset of Ck by the proper mapping theorem A.4.3. Moreover since the fibers of the map TT are finite, we know that dimTr(X') = diiaX1 < dimX = k. Thus letting X = X\ U • • • LJJ r denote the decomposition of X into irreducible components, we have that Y := Ck \ n(X') and Xz := Xt \ -K-1(K{X')) are all irreducible and connected. Moreover, letting X equal the manifold Ul_1Xi, the map n : X —> Y is a d sheeted unramified covering map. Fix a basepoint y* £ Y and consider the monodromy action of /n\(Y,y*), the fundamental group of Y with basepoint y*, on F := tr~l(y*). Note we have a
279
The Numerical Irreducible Decomposition
decomposition F = Fl U • • • U F r
(15.4.5)
given by setting Fi :— F n Xi. For our purposes, it is enough to take smooth embeddings g : S1 —> Y of the unit circle S 1 into Y with 1 going to y*, and for points in F track them as 9 S [0, TT] goes from 0 to ~n over the path g. We get different permutations of F as we carry out this tracking with different embeddings of S1 • By using the permuations, we break F into disjoint sets F = FiU---UFl,.
(15.4.6)
Since the Xi are connected, we see that the decomposition given in Equation 15.4.5 is compatible with F = U ^ - F / in the sense that each Fj is a subset of one of the Fi. The immediate question that raises itself is: Question 15.4.1
Do we have r = r' and is each Fj equal to one of the F{1
If we take sufficiently many smooth immersions g : S1 —* Y with g(l) — y, the answer to this is yes as we will see in §A.12. By a smooth immersion g : S1 —> Y, we mean a smooth map with the differential of rank one at all points of S1. This suggests the method of using monodromy along paths to decompose X into irreducible components. The problem is that the set X' can be expensive to compute. Therefore, although it is easy to find random paths in Y and consequently permutations of F, we have no cheap way in general to find generators of -K\(Y, y*), and so we have no way to know whether the breakup of F into the F/ equals the breakup of F into the F^. This raises the second question: Question 15.4.2 Is there a cheap way of checking that the breakup of F into the F[ equals the breakup of F into the i^? The answer to this is yes. Based on Theorem A. 12.2, the trace test to certify the breakup is explained in § 15.5. Remark 15.4.3 (Monodromy over general bases) Everything we said above works equally well for F := p~1(y) where p : X —> Y is a proper finite-to-one covering map from a pure-dimensional quasiprojective manifold X onto an connected quasiprojective manifold Y and y is a point we treat as a basepoint.
15.5
The Trace Test
The trace test is based on an explicit geometric description of a defining equation of a hypersurface built out of traces.
280
15.5.1
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Traces of Functions
The trace of a function is an old concept arising in a number of different situations. In this section we summarize the main results about the trace and some related constructions that go under the same general name. We follow the approach to these concepts as they arise in the Weierstrass Preparation Theorem (Gunning, 1990; Gunning & Rossi, 1965). We also follow (Morgan et al., 1992a; Sommese et al., 2002b) where we have used these concepts in a numerical context. We refer the reader to these places for more details. 15.5.2
The Simplest Traces
Let us explain what a trace is in the simplest case. We have a finite set F c C consisting of d not necessarily distinct elements Ai,...,Ad. Keeping track of multiplicities, or assuming that the A^ are distinct, we have a polynomial p{x) of degree d, unique up to a multiple by a nonzero complex number, with the property that V(p) = F. It is easy to write down: p(i) = n t i ( x - A i ) .
(15.5.7)
Multiplying this out we get d
P(x) = £(-1) V * .
(15.5.8)
i=0
where the ti are elementary symmetric functions of the roots, i.e., to := 1, and for i>0,
ti :—
2_^
An '" " AH-
l<ji
The parameterized version of these ti are the traces we are interested in. Before we turn to the parameterized situation, let us note an interpretation of the above ti as traces of matrices. Recall from linear algebra that the trace of a matrix is the sum of its diagonal elements. Let A:=diag (Ai,...,A d ). The trace of A is clearly t\. The matrix A induces linear transformations A*A of the exterior products AJCd. Using the basis {eh
A---Aen\l<j1<j2<---<jl
where efc is the d-tuple with zero entries in all places but the fc-th place, where there is a 1, we see that the trace of A1 A is U.
The Numerical Irreducible Decomposition
15.5.3
Traces in the Parameterized
281
Situation
Now we want to deal with the trace in the parameterized situation. Assume we have a finite-to-one proper degree d algebraic map TT : X —> Y from one pure-dimensional quasiprojective algebraic set onto a connected smooth quasi-projective algebraic set, or more generally onto an irreducible normal quasi-projective algebraic set. In practice, X is usually an aflane algebraic set in CN and Y is Euclidean space. From Corollary A.4.14, we know that properness implies there is a Zariski open dense set U CY and a positive integer d such that TV^-^U) : TT~1(U) —» U is an unbranched d-sheeted cover. The integer d is called the degree of TT and denoted deg TT. We call the function ti, that extends to Y, the i-th trace of g with respect to TT, and we denote it trX)j(A). If g and TT are algebraic, then so are the traces tr^^A). We have d
5^(-l)'tr W i i (A)A d -' = 0.
(15.5.9)
i=0
Assume we have an algebraic function A(a;) defined on X. If Y was a point, we would be in the case of § 15.5.2. Over the dense Zariski open subset U C Y, where n is an unramified covering, each fiber consists of exactly d inverse images, and we can do the construction of the ti pointwise over each y G Y to get functions tr7r,i(A)(y) defined on U. More explicitly, fix a point y G U. The set n~1{y) consists of d := deg?r points x\,..., Xd- We can form the degree d polynomial zero at the numbers X(xi) counted with multiplicity
{w-\{xi))---{w-\{xd))Expanded we have
i=0
where to — 1
an
d ti for i > 0 denotes the elementary symmetric function
l<ji<-<ji
of the roots X(xi). This unramified assumption is too restrictive for us. The wonderful fact is that under the modest assumption that Y is normal, e.g., a manifold such as C^, these functions ti, which depend only on y G U, have unique extensions to Y as holomorphic functions. We call the extension of ti, the i-th trace. These extensions exists because the properness of n implies that given any y E Y, there exists an open set V C Y containing y such that V and TT~1(V) are compact. Thus tr7r]i(A)(y) is bounded on UC\V. By using Theorem A.2.5 when Y is
282
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
smooth, and Remark A.2.6 when Y is merely normal, we conclude that trnj(\)(y) extends to V. This gives a holomorphic extension of ti7V:i(X)(y) to all of Y, which we also denote tr^j(A). The functions tr7rj(A)(y) are algebraic functions. This is a consequence of the characterization (discussed briefly in §A.l) of algebraic functions by their growth. The equation corresponding to the relation between the U and the A^ in § 15.5.2 is the key equation d
^(-l)Hr T , i (A)( 2 /)A( aJ ) d - i = 0
(15.5.10)
i=0
for (x,y) G X x Y with TT(X) = y. For y G Y where n~1{y) consists of d distinct points, this is nothing more than the fact that the roots of Equation 15.5.7 satisfy Equation 15.5.8. 15.5.4
Writing Down Defining Equations: An Example
Consider the cuspidal cubic, defined as the solution set of z2 — z\ = 0 in C2. Let X = V{z\ — z\). The projection {z\,z2) >-> (z\) restricted to X gives a proper degree two map it : X —> C. Then given g{z\, z2), a polynomial on C 2 , we have
tTn,1(\x(zi,z2))(z1) = A U , y ^ J + A (Zl, -yf$\ •
(15.5.11)
Though y/zf is not well-defined, the unordered pair -I y/zf, — \fz\ > is well defined, and thus tr7rjl(Ax(^i,^2))(-2i) is well-defined. Consider the function Xx(zi,z2) := z2. Substituting into Equation 15.5.11, the first trace of the function z2 is found to be
trw,i(z2)(zi) = \fz% +—\f$ = 0Note that yz^ is only well defined if we choose a branch of the square root, but whichever branch we choose, we have 0. Similarly trva{z2){zi) = yjz~l ( ~ \ A i )
=
~zi'
Recalling that t0 = 1 by convention, Equation 15.5.10 gives 0=(l)z22-(0)z2
+
(-z31)z02=z22~zl
It is no surprise that we get z\ — z\ back again, since we know that up to a nonzero constant multiple, z\ — z\ is the lowest polynomial vanishing on X.
283
The Numerical Irreducible Decomposition
Note the linear projection given above is far from generic, e.g., if the linear projection was generic, we know that the degree of the projection restricted to X would equal the degree of X, i.e., deg^f — z\) = 3.
15.5.5
Linear Traces
Let X be a pure (N — l)-dimensional algebraic subset of CN. Choose a generic projection of CN to C ^ " 1 . Then we know by Theorem 12.1.5 that the restriction 7T of the projection to X is finite and proper of degree equal to d :— deg X. Choose as coordinates X\,... ,XN of CN, the composition of coordinates xi,... ,5?/v-i of C ^ " 1 with 7r; and x^ equal to a general linear function on CN that is nonconstant on a fiber of IT. Then Equation 15.5.10 gives the polynomial d
P(*) = ^ ( - l y t r ^ a ^ X a ; ! , • • •, arjv-i)*^'
(15.5.12)
2=0
of the Xi that vanishes on X. We know that p(x) is a defining equation of X. Since degp(a;) = d, we conclude from Equation 15.5.12 that tr7rii(a;iv)(a;i, • • • ,^Ar-i) is a linear function.
(15.5.13)
Indeed, if it is not then the coefficient of a;^"1 would be of degree at least two contradicting degp(x) = d. If X c CN is a pure fc-dimensional affine algebraic set of degree d, then by the Noether Normalization Theorem 12.1.5, a generic linear projection of X to Cfc is proper and finite-to-one, and taking the trace t r ^ i (A) of the restriction to X of any generic linear function A on C ^ , we also obtain a linear function. This can be seen by noting that the map (TT, A) : X —> C fc+1 is an embedding on a Zariski open dense set V of X. Thus, we fall into the case covered by Equation 15.5.13 with N = k +1. We are now in a position to give an answer to Question 15.4.2. Given X C C ^ of dimension k, choose a generic linear Lo :=
284
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Now assume that F? is not equal to any of the sets Fj. It must be properly contained in one of the Fj, say F/ C Fj. Let qi,...,qb be the points of F/ and let q0 be a point of Fi \ F/. It follows from Theorem A. 12.2 that there is a path c : S1 —> C, where S1 are the complex numbers of absolute value 1 with c(l) = 0, such that monodromy under Lo + c(t)v takes q\, q2, • • •, qb to qo, q2, • • •, qb- Since S F ' X(s) is linear in s we conclude that A(
Let Lo be the linear space that cuts out Y C Z n Lo. Choose a random, complex v £ CN. Choose two distinct, nonzero, real numbers s± and S2Track the paths of Z D (Lo + sv) from Y at s = 0 to get Y\ at s = si and y 2 at s = s2. - Choose a random, complex 1 x N matrix A, and define \(y) := A- y. - Evaluate q0 = A(Y), qx = A(Yi), and q2 = X(Y2). - return(i := (qi - qo)/s1 - (q2 - qa)/s2)
15.6
Singular Path Tracking
In § 15.2.3, we saw that sampling a nonreduced solution component can lead to a singular path-tracking problem. This can be viewed as a special case of the following situation. Suppose a parameterized family of polynomial systems (see Chapter 7), f(z;q) : Cra x C m —> C n , has an isolated singular solution (z*,q*) at a generic
The Numerical Irreducible Decomposition
285
parameter point q*, where singular means that the Jacobian matrix df/dz(z*;q*) has rank less than n. This solution will continue to other isolated singular solutions on an open set in C m (see § A.14.1), and as described in Theorem 7.1.6, we may wish to track such a solution along a continuous path in parameter space, say q(s) C O™, where q(0) = q*. In general, this would be a nearly intractable numerical problem, but we have a little extra leverage if we have obtained the solution point (z*,q*) as the endpoint of a nonsingular solution path to a homotopy h(x,t;q*) = 0. Then, we may define the doubly parameterized homotopy H(x, t, s) := h(x, t; q(s)) = 0.
(15.6.14)
At its root, singular path tracking is based on a singular endgame. For each - 0 of a nonsingular value of s, the point on the singular path is the limit as t —> path. In Chapter 10, we discussed how to estimate such endpoints with the powerseries endgame or the related Cauchy integral endgame. Both of these work by building a local model of the solution path for small t. The gist of singular path tracking is to update this local model as we advance s and in essence, replay the endgame at every s. The power-series endgame and the Cauchy integral endgame both collect sample data on the incoming paths of the homotopy to determine the winding number c and to build a local model of the holomorphic function 4>{rj) from Lemma 10.2.1, where t = rf. The singular path tracker uses prediction/correction techniques to update the local model as we step along the path. Recall from Chapter 10, that a cluster of /i paths approaching the same endpoint may break into cycles, each cycle having a winding number, say c, such that the solution path closes up as t circles the origin c times. Although we will not argue the issue carefully here, it is clear that these cycles also continue in the local neighborhood. In a nutshell, the closing up of the solution path in c loops is an algebraic condition that holds on at the generic parameter q*, so it continues on an open subset in the neighborhood of q*. The endgame convergence radius within which the local model holds varies as q(s) varies with s. It may become zero within a proper algebraic subset of the parameter space, but by Lemma 7.1.2, a one-real-dimensional path between two generic parameter points will miss the degenerate set with probability one. Therefore, at each value of s, we have a nonzero endgame operating zone as in Figure 10.1, with a convergence radius and an ill-conditioned zone. If we use sufficiently high precision, the ill-conditioned zone stays inside the convergence radius for all s 6 [0,1], and our task is to track the local model along this endgame operating zone. As we have several ways of formulating an endgame based on the local model, the details of tracking the model must be adjusted accordingly. In essence though, all the methods are similar. For conciseness, it is helpful to adopt the notation that for a cluster of points C = {w\,..., wc}, we let H(C,t,s) = 0 mean H(wi, t, s) = 0 for i = 1,..., c. Also, the following definitions are convenient. Definition 15.6.1
A convergent cluster (C,to,s) = ({wi,..., wc},to,s) with
286
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
H(C,to,s) = 0 is such that to is inside the endgame convergence radius for fixed s and for i = 1,..., c, the solution path of H(w, t,s) = 0 beginning at (wi, to, s) continues to (tUj+i, to, s) as t travels once around the circle |t| = to- For this definition, wc continues to w\. By requiring that to is inside the convergence radius, we implicitly require that all the points in the cluster approach the same endpoint w* as t —> 0 and the same cyclic mapping from u>i to IUJ+I holds under continuation around a circle for every £ ^ 0 in the disk A to (0). In other words, the projection (w,t,s) —> t gives a proper c-sheeted finite mapping from the solution set of H(w, t, s) = 0 in a neighborhood (w*,0, s) to a neighborhood of 0 6 C. We call w* the convergence point of the cluster. Definition 15.6.2 For fixed s, the convergence point of a convergent cluster is the common endpoint as t —» 0 of the solution paths of H(w, t, s) = 0 emanating from each cluster point (u>i,to, s). The nonsingular path tracking algorithm of § 2.3 can be adapted to our current situation to arrive at the following singular path tracking algorithm. • Given: System of equations, H(w, t, s) = 0, and an initial convergent cluster Co, such that H(Co,to,0) w 0. Also, an initial step length h and a tracking tolerance e. • Find: Sequence of convergent clusters (Ci,U,Si), i = 1,2,..., along the path such that with Sj+i > Sj, terminating with sn = 1. Return the final cluster at s = 1 and a high-accuracy estimate of its convergence point. • Procedure: - Loop: For i — 1,2,... (1) Predict: Predict cluster (U,t',s') with s' = min(si_i + h,l) and t' = U-i. (2) Correct: In the vicinity of U, attempt to find a corrected cluster W such thatff(W,i',s')~0. (3) Recondition: If correction is successful, play a singular endgame in t' to compute the convergence point of the cluster at s'. If the convergence point is computed to accuracy better than e, declare the endgame successful and do the following. * Adjust t: Pick a new £j in the endgame operating zone. * Update: Set Si = s' and generate the corresponding cluster C^. Increment i. (4) Adjust h: Adjust the step length h. - Terminate: Terminate when s; = 1. - Refine endpoint: Play the endgame at s = 1 to compute thefinalconvergence point to high accuracy.
The Numerical Irreducible Decomposition
287
In the context of witness points generated by the cascade algorithm, the paths of the cluster points are nonsingular away from t = 0. Accordingly, the usual prediction/correction techniques for nonsingular paths apply. The adjustment step for reconditioning must select a new value of t, which will be held constant in the next prediction step. One sensible way to select it is to use the largest value for which the singular endgame meets the convergence tolerance e. If the endgame meets the tolerance on the first try at the current value t', it may be useful to try increasing it. If it fails, we try decreasing t, unless the condition of the Jacobian matrix indicates that failure may be due to having entered the ill-conditioned zone around t = 0. With such rules in place, the value of t can adaptively decrease and increase as s proceeds. Similar to the nonsingular path tracker, we adaptively adjust the step length h by halving it when the correction step or the reconditioning step fail. On the other hand, if these steps both succeed several times in a row, we try doubling h. A variant of the procedure is to save some computation by applying reconditioning only occasionally to verify that the cluster is convergent. One criterion for deciding when to recondition is to monitor the condition number of the Jacobian matrix dH/dw along the paths. Even more computation might be saved by tracking only one path in the cluster along s holding t constant, and when the condition of the Jacobian matrix indicates reconditioning is necessary, to regenerate the other points in the cluster by looping t around the origin. This risks path crossing, because it is not clear how to set the reconditioning criterion to ensure that t has remained within the convergence radius as s progresses. There is very little experience at this point to judge whether such variants can be made both reliable and efficient. By reconditioning at every step, we have greater assurance that the local model remains valid for the whole extent of s 6 [0,1]. The techniques we have discussed show that in principle singular path tracking is feasible, although in practice a fully satisfactory approach is still a matter of research. The approach was first presented in (Sommese et al., 2002a), which also reports on some initial experiments with the technique of using the condition number to decide when to recondition. It may seem that we could completely avoid singular path tracking by using deflation to convert problems into nonsingular path tracking problems. This is true in the context of witness points generated by the cascade algorithm, because such points are isolated solutions cut out by the slicing procedure. However, in Chapter 16, we will see how to find witness points for a set denned as the intersection of two given algebraic sets, say A and B. If A and B are both components of the same system of equations f(x) = 0, then although a slice of appropriate dimension cuts out a unique point on the intersection set, such a point is not an isolated solution of the system obtained by appending the linear slicing equations to f(x) = 0. Consequently, witness points for A D B are defined only as singular endpoints of solution paths in a new kind of homotopy, called the diagonal homotopy, and such
288
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
points can be moved along AdB only by singular path tracking. Of course, it could be that a more elaborate form of deflation could desingularize these points as well; such a procedure could be subject matter for a new line of inquiry. To raise the bar even higher, consider intersecting two algebraic sets whose witness points are only known as singular endpoints of a diagonal homotopy. Then, we could have a very difficult singular path tracking problem in which each point in the convergent cluster is itself only known as the convergence point of a prior homotopy. We have yet to face such a nasty calculation, but it is quite within the scope of numerical algebraic geometry to consider it. 15.7
Exercises
Exercise 15.1 (Degree of p(x)) Conclude that degp(x) — d for the polynomial in Equation 15.5.12 by showing that (1) the highest degree that XN occurs with is d; and (2) by genericity of XJV we know that there is at least one fiber of n on which XJV is nowhere zero, and therefore that tT7T^(xN)(xi,...,a;jv-i) is not identically zero. Exercise 15.2 (Spherical Parallelogram Mechanism) Pick two unit vectors ax, a2 € M3 and a random value of a £ I . Let 6i, &2, 63 6 M3- Consider the system of polynomial equations ajbi = a,
a2b2 = a,
b[b2 = aja2,
bjbi = 1,
6^62 = 1,
63 = (6i + b2)/2.
These eight equations describe a curve in (61,62,63) € C9. Find a numerical irreducible decomposition of that curve. Report the number of irreducible components and their degrees. Exercise 15.3 (Griffls-Duffy Decomposition) Revisit Exercise 14.4 and find the irreducible decomposition. Do it again for the special case when 6j = a* and Ci = 1, i = 1,..., 6. Report the number of irreducible components and their degrees. Exercise 15.4 (Seven-Bar Problem) Use exhaustive trace testing to show that the one-dimensional component of the seven-bar system presented in Exercise 13.3 is irreducible.
Chapter 16
The Intersection Of Algebraic Sets God keep me from ever completing anything. This whole book is but a draught—nay, but the draught of a draught. Oh, Time, Strength, Cash, and Patience! —Herman Melville
Up to this point, we have concentrated on describing the numerical solution of a given system of polynomial equations. That is, given a polynomial system / , we have numerically described V(f). In Part II, we sought just the isolated points in V(f), while in Part III, we have sought the numerical irreducible decomposition of V(f). In this final chapter, we discuss operations on irreducible components. In particular, we present algorithms from (Sommese et al., 2004b, 2004c) to compute the numerical irreducible decomposition of A n B, where A and B are irreducible components of V(f) and V(g), respectively. The capability to work with individual pieces of the solution sets and to intersect pieces from different sets of equations gives a new level of refinement, allowing resources to be concentrated on just the objects of interest, especially when the solution sets of the systems on hand include extra components that are not of interest, as happens frequently. When one wants the intersection of reducible algebraic sets, it is just a matter of bookkeeping to intersect all of their irreducible pieces. For reasons which will become apparent later, we call the workhorse of the new approach the diagonal intersection algorithm. The diagonal intersection technique even allows one to examine certain algebraic sets that are proper subsets of the irreducible components of the equations on hand. A case in point is where A and B are both irreducible components of V(f). In this case A n B is certainly an affine algebraic set, but we do not have on hand a set of polynomials for which it is an irreducible component. One could derive such a polynomial system with appropriate symbolic operations on / , but we can find witness points for the set with only numerical operations on / . Somewhat surprisingly, the ability to work with individual components gives us new leverage in finding just the isolated solution points. We find that much of the special structure of a system, which we worked so hard to exploit in Part II, can be captured starting with total-degree homotopies to decompose individual equations and then applying the diagonal intersection algorithm to find intersections equation-by-equation. While the approach is at present too new to have much practical experience, early experiments (Sommese, Verschelde, & Wampler, 2004e)
289
290
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
show promise. It is hoped that the approach might solve some problems that were previously too large to solve in one blow by the traditional approaches of Part II. 16.1
Intersection of Irreducible Algebraic Sets
A good idea of the way the diagonal intersection algorithm proceeds can be gleaned by studying a special case. Assume that we have two polynomials /, g on C2. Let A be an irreducible component of V(f) and let B be an irreducible component of V(g). We would like to find A n B. Assume that A has degree d\, and B has degree d2, and let a.\,..., a^ and /?!,..., (3d2 be witness point sets of A and B, respectively. That is, for generic linear equations LA(x) = 0 and LB(X) = 0, we assume that we have already computed the intersections V(LA)^A = {a\,... , a ^ } and V(L B ) Dfl = {/?i,... ,/JdJ. Note that AC\B can be interpreted as solutions to a system on C4 by a procedure from algebraic geometry called reduction to the diagonal (Ex. 13.15 Eisenbud, 1995). The procedure is to form the system ~f{xi,x2)~ F(x1,x2,y1,y2)
=
S
= 0.
^
. x2-y2
.
The solutions of the system consists of points {x\,x^,x\,X2) G C4 with (x*,^) a point of V(f,g). This identification respects components and all multiplicity structure. In particular, all the irreducible components of AC\B have corresponding irreducible components in V(F). Ignoring for a moment the two diagonal linears, let's consider the set V{f(x1,x2),g{yi,y2))Clearly, A x B is an irreducible component of this set. To see this, remember that an algebraic set being irreducible means by definition that its set of smooth points is connected. To see that the smooth points of A x B, (Ax B)leg, is connected, note that (A x B) reg = ATeg x BTeg, and that the product of connected sets is connected. Moreover, we know a set of witness points of A x B, i.e., the set of points {(ati,l3j),i — l,...,d\,j = I,...,d2} are the intersection of A x B with the linear space V(LA(x1,x2),LB(yi,y2))Consider the homotopy
_{1 - t){x2 - y2) + jtLB{x)_ with 7 a general point of S1, the complex numbers of absolute value one. In this special case, the diagonal intersection theory of (Sommese et al., 2004b) implies
291
Intersection of Algebraic Sets
that the endpoints as t —> 0 of the solution paths of H(x, y, t) starting at the points (<Xi, (3j) at t — 1 includes (using the identification given by reduction to the diagonal) all the isolated points of An B. The general case is conceptually not much harder, although the procedural details get a bit technical. We sketch only the main idea here. We use notation similar to that above, but now work in higher dimensions. That is, let A C V(f) C CN and B C V(g) C C ^ be irreducible algebraic sets, with / and g as polynomial systems. Let dim A — a and dim B = b. The main idea is that, letting x £ Cfc be the variables for A and y £ Cfc those for B, we wish to find the irreducible decomposition of the diagonal polynomial system, namely x — y, restricted to Ax B. The cascade homotopies of Chapter 14 carry over with A x B in place of Euclidean space. In short, we have an embedding like Equation 14.1.4 that includes all of the systems for slicing witness sets at every dimension. As in the cascade method on Euclidean space, we need to square up systems as necessary. Omitting detailed argumentation, this just amounts to choosing random, complex matrices M / , Mg, Mxy,S, U, v with dimensions as follows: Matrix rows columns
M/ N- a #(/)
Mg N -b #(g)
Mxy a+b N
S a+b N
U v N N 2N 1
The result is a system of 2N polynomials: Mf • f(x) Mg-g(y)
£(x,y,t)=
Mxy(x-y)
S-T(t).(u-[xy\+vy
+
(16.1.1)
where T(t) is & NxN diagonal matrix with entries ti,... ,tN. Just as in the regular cascade method, we choose t\,... ,tn randomly, and a witness set for dimension i is found by solving the equations £ (x, y, t^) = 0, where t ^ = (ti,...,ti,O,...,O). To get started, note that we have at the outset the solutions (a*, f3j) 6 CN x CN, i = 1 , . . . , deg A, j = 1 , . . . , deg B of the system J-(x, y) = 0, where
'Mrf(xy H*,V)=
M
['S]
•
(16-1.2)
LJA\X)
. LB(y) _ Now, the top dimensional component of A n B is at most fci := min(a, b) and the lowest is at leastfco:= max(0,a + b - N). We solve for dimension fci by tracking the solution paths of s^(x,y) + (1 - s)£ (z,y,tM) = 0,
(16.1.3)
from each of the start points (ai, 13j) at s = 1 to get at s = 0 three kinds of points: witness points on the diagonal x — y = 0, points at infinity, and "nonsolutions." The
292
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
nonsolutions at dimension i are the start points for the homotopy to dimension i — 1, £(x,y,{t1,...,ti-1,s,0,...,0))
= 0,
(16.1.4)
whose solution paths we follow from s = 1 to 0. This is a brief, but procedurally complete, description of the diagonal intersection method. As outlined above, each homotopy is a system of 2iV equations in 2N unknowns. In (Sommese, Verschelde, & Wampler, 2004c), it is shown how to consistently reduce the size of the homotopy by using intrinsic formulations of the linear equations. Finally, it is important to note that the output of the diagonal homotopy method is a witness superset. We still need to remove junk points and, if desired, break the witness sets into irreducible witness sets. The algorithms of Chapter 15 are directly applicable.
16.2
Equation-by-Equation Solution of Polynomial Systems
With the diagonal intersection algorithm in hand, we have much more flexibility in how we solve systems of polynomials. For example, we can subdivide a system into two sets of polynomials, compute the irreducible decomposition of each, and the use the diagonal method to intersect each irreducible component of the first subsystem with each one of the second. With a little bookkeeping, for eliminating duplications and so on, we get a numerical irreducible decomposition for the whole system. Taking this approach to the extreme, we may first find witness sets for each polynomial individually, and then intersect these one-by-one. We call this solving the system equation-by-equation (Sommese et al., 2004e). The approach is most easily described in terms of a flowchart, shown in Figure 16.1. The post-processing of points coming out of the diagonal homotopy discards duplicates and checks whether singular points are junk. In the junk removal box, we have used the shorthand V{W) to mean the algebraic set witnessed by W. We also allow an affine algebraic set Q to be pre-specified for discarding points on known degenerate sets or sets not of interest. For example, should we wish to work on (C*)N, Q is the union of the coordinate planes, X{ = 0, any i. The flowchart also includes two tests that eliminate some witness points of the subsystems before they get to the diagonal homotopy routine. The one on the left, "/fc+i = 0?," recognizes that if a witness point satisfies the new equation, then the set it represents does too, and it passes to the output without change of dimension. The points eliminated by the similar test on the right, ufi(x) = 0 any i < fc?," discards points on components we have already found. Such tests are cheap compared to running the diagonal homotopy, so it is useful to employ them. The pruning of points in the flowchart can be made more stringent if all we wish to find are the nonsingular isolated points of the system. Supposing that the original system is square, / : CN —> C^, we can keep in the output for V ( / i , . . . , ft) just the nonsingular witness points for dimension N -i. There are not enough polynomials
293
Intersection of Algebraic Sets
remaining to cut any higher-dimensional components down to isolated points. To understand why the equation-by-equation approach might be valuable, consider that systems of 50 or more low degree polynomial equations occur naturally in the study of polynomial systems. It can happen that such a system has only a few thousand isolated solutions, and we might wish to find them. Straightforward use of traditional homotopy continuation, such as we described in Part II, may have little chance of succeeding. For example, assume that we had a system of 60 polynomials of order two. A total degree homotopy continuation would have 260 « 1018 paths. Assuming we had a thousand node computer, each node of which could compute 20 paths a second, it would take a few million years. Of course, if the system has many fewer than 260 solutions, we should not be using a total degree homotopy, but instead use a start system to take some advantage of the special structure. However, the computation of a special start system also suffers from a curse of dimensionality, so we may not find a good one in a reasonable amount of time. Consider the following simple case: an eigenvalue problem. We have A a given 60 x 60 matrix of constants. We have the polynomial system Ax — \x = 0. Regarding it as a system on P 59 x C and homogenizing, we get the system fxAx — Xx = 0
on P 59 x P 1 . Embedding P 59 x P 1 into P 119 using the Segre embedding described in § A. 10.2, we see that the total degree of the solution components of the first k equations is k. This means the number of solution paths working equation-byequation never gets large. A reasonable observation is that using the bihomogeneous structure we just wrote down, the usual homotopy continuation will work well. The point is that in this case we could see a special structure. If we hadn't, the total degree homotopy would be useless, but the equation-by-equation approach would automatically utilize the special structure. 16.2.1
An Example
Consider once more the system given in Equation 12.0.1, which we treated with WitnessSuper in Example 13.6.4 and with Cascade in Example 14.2.1. The equations are
\h(x,y)] [h{x,y)\-
[ x(y2-x3)(x-l) 1 2 3 [x(y -x )(y-2)(3x + y)\
U
"
It is easy to confirm by hand that the equation-by-equation algorithm flows as follows. The numbers next to the flow lines indicate how many points flow that direction. Counting the computation of witness points for the individual equations,
294
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
there are a total of 5 + 6 + 2 = 13 homotopy paths in the procedure for this problem. This compares to 36 paths for WitnessSuper and 39 paths for Cascade. Witness V(fx)
Witness V(f2)
#Wf = 5
#X2 = 6
5
^
-.
6
Witness y ( / i , / 2 ) 16.3
Exercises
Exercise 16.1 (Flowchart) Draw a diagram showing how witness points flow when the equation-by-equation method is applied to the system of Example 13.6.5. (Hint: some points coming from the diagonal homotopy go to infinity.) Exercise 16.2 (Eigenvalues) Chart the flow of witness points for an equationby-equation treatment of the eigenvalue problem described in § 16.2. Assume the size of the matrix i s n x n . The output of the diagonal homotopy at each stage consists of only nonsingular points and points at infinity. How many paths are tracked in total? Exercise 16.3 (Diagonal Intersection of Reducible Components) The description of the diagonal approach is for intersecting irreducible sets. Despite this, the equation-by-equation flowchart does not require the witness sets to be decomposed into irreducibles. Explain why this is valid.
295
Intersection of Algebraic Sets
Witness
Witness V(h, ...,fr) w*
w$
•••
w*
•••
xk+1
w£
w
/fc+i(w) = O? —
V
•
•
N>
x
1
^
V{fk+1)
|
W
/i(a;) = Oanyi
^
X
r
7—
V7 f I
<^N
Diagonal Homotopy
y at CXD? or y e Q?
ye W^li1?
y singular?
\ I
Y^>
Y>
,
Y> •
1
y € V{Wf+1)
I
for any i < j ?
I
Y>
r^>^ \
Wk
+1
Wk
+1
...
^fc+l
Wk+1
...
py-fc+l
Discard /
^ ^ 1
Witness V ( / i , . . . , / f c +i)
Fig. 16.1 Stage A; of equation-by-equation generation of witness sets for V(fi,..., fn) 6 C w \ <5The witness sets are subscripted by codimension and superscripted by stage. Q is some prespecified algebraic set on which we wish to ignore solutions.
Appendices
Appendix A
Algebraic Geometry
A basic goal underlying algebraic geometry is to translate between algebra and geometry, and take advantage of people's strong visual intuition and the tools developed in mathematics to support this intuition. Over the complex numbers, the relationship between algebra and geometry is remarkably strong, and sadly over the real numbers this relationship is very weak. In this appendix we present useful results about these concepts, but we have left many facts to be introduced as needed throughout the book. What we have tried to do is give adequate definitions and examples so that the reader can understand the techniques in the book. Towards this goal we add to the basic concepts introduced earlier in this book. There are a plethora of introductory books on algebraic geometry. Unfortunately, many of these, based on a computational algebra approach, are not centered on the basic geometric facts we need, e.g., the equivalence of an algebraic set being irreducible with the connectedness of its smooth points. (Kendig, 1977) is a good geometric introduction. Though restricted to plane curves, (Fischer, 2001) is a gentle introduction that covers a surprising amount of important material. (Fischer, 1976) is a wonderful book for getting a detailed understanding with precise statements of the analytic geometry that is useful in the study of polynomial systems. No one book will cover everything, but for further study we suggest the fine books (Griffths & Harris, 1994; Harris, 1995; Mumford, 1995), which discuss many geometrical issues that arise. (Eisenbud, 1995) is a useful book covering the algebra underlying the symbolic methods with attention to the background geometry. (Decker & Schreyer, 2001) is a good survey of computational algebraic geometry. (Cox et al., 1997, 1998; Decker & Schreyer, 2005; Greuel & Pfister, 2002; Schenck, 2003) are good introductions to computational algebra and computational algebraic geometry. Except when explicitly stated, algebraic sets are reduced, i.e., we ignore multiplicity information. Systems of polynomials on C'" are not sufficient. For example, if we have a system of polynomials on C^, it might well happen that there is some algebraic subset B of CN, known in advance of solving the system, such that solutions in B 299
300
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
may be ignored. Working directly with a system of polynomials on CN \ B leads to conceptual clarity. A more serious situation occurred in Chapter 16 where the natural space is not a Zariski open set of C^, but rather a pure-dimensional affine algebraic set. The compromise we make here is to deal with algebraic functions on pure-dimensional quasiprojective algebraic sets. Systems of homogeneous polynomials on projective algebraic sets may be reduced to this situation by the moves discussed throughout the book, i.e., for projective space, we pass to the Euclidean space of one dimension higher with the addition of a random linear equation or equivalently passing to a "general" Euclidean patch inside the projective space. We make several remarks in the rest of this appendix about more general situations, e.g., working with line bundles and vector bundles. A significant part of this appendix is devoted to Bertini Theorems, which are crucial for applications of numerical analysis to polynomial systems. Many of these results assert that certain sets are smooth with appropriate dimensions or they are empty. These statements, which do not assert any existence, are usually simple to prove, and reduce to Theorem A.4.10 combined with some form of the constructions of § A.7 or § A.8.1. There are also statements asserting certain sets are nonempty or irreducible. These results are more difficult and rapidly lead beyond the scope of the book. For this reason, we have multiple statements of Bertini's Theorem with different levels of generality. A.I
Holomorphic Functions and Complex Analytic Spaces
The complex neighborhoods introduced in § 12.1.1 are convenient because they may be chosen small enough to discard global information. Loosely speaking, they let us put local properties of a space "under the microscope." When using complex neighborhoods it is often useful to choose local coordinates which are not polynomials. Here is a typical example. Example A.1.1 Consider the affine algebraic set Z := V(w2 - z). We have a map 7T : Z —> C given by ir(z,w) = z. There are two points in the fiber TT~1(1) over 1, i.e., (1,1) and (1,-1). As we will see, Z is a manifold, and a natural parameterization of Z at (1,1) g Z is given by (z, ^/z) where we choose the branch of yfz with \/I = 1, and stay in a neighborhood of 1, e.g., {z £ C | z ^ (—oo,0]}, where the branch gives a well-defined function. For doing algebraic geometry over the complex numbers, it has been standard for over a century to use holomorphic functions such as the function ^/z in Example A.1.1 and holomorphic functions such as ez. When talking about holomorphic functions, we use the complex topology unless we explicitly say otherwise, e.g., that a set is a Zariski open set. A function / defined on an open set V C CN is said to be a holomorphic function on V if given any x = (xi,..., xN) G V, there exists a neighborhood U C V of x
301
Algebraic Geometry
on which there is an absolutely convergent power series expansion oo
f(zu...,zN) = ^2 X!
i=0 \J\=i
a z x J
j( - ) >
where all of the aj G C. Here we use multidegree notation (1) J denotes an JV-tuple of nonnegative integers (ji,..., (2) \J\ :=ji+---+jN\ and (3) {z - x ) J : = (zi - x ^ • • • ( z N - x N y > » .
J'JV);
Just as in one complex variable there are many equivalent ways of denning holomorphic functions, e.g., in terms of the Cauchy-Riemann equations. We refer the reader to (Pritzsche & Grauert, 2002; Gunning, 1990; Gunning & Rossi, 1965) for more on holomorphic functions. We need only a few facts about them. The first is the obvious fact that polynomials are holomorphic. Locally, polynomials and holomorphic functions look and behave the same, but when looked at globally, holomorphic functions can be much more wild than polynomials, e.g., ez — 1 has infinitely many complex zeros. On the other hand, there are many results that assert that a holomorphic function with growth as moderate as an "algebraic function" is an "algebraic function." For example, any holomorphic function / on C ^ with the property that there is a constant C > 0 and an integer K > 0 such that
l/(*)l < c(1 + v / N 2 + --- + k v | 2 ) X is a polynomial of degree < K. This follows immediately from the Cauchy Inequalities (page 21 Pritzsche & Grauert, 2002). In analogy to affine algebraic sets, we define a complex analytic set X C U on an open subset U C CN as the set of common zeros of a finite number of holomorphic functions f\,..., fk on U. Given a complex analytic set X := V ( / i , . . . , fk) C U on an open subset U C CN and a complex analytic set Y := V( '• X —> Y is a function from X to Y such that (f> is the restriction to J of a holomorphic map A : U' —> V from an open subset U' C U containing X to an open subset V' C V containing Y, i.e., the restriction of a mapping of the form (z1,...,zN)
-> (A^Z!,...,
zN),...,
AM(zi,
• • •, zN))
with Ai(zi,... ,ZN),...,AM(ZI,...,ZN) holomorphic functions. A holomorphic mapping <j> : X —> Y is called a biholomorphic mapping if there exists a homomorphic mapping tp : Y —> X such that <j> o ip is the identity mapping on Y and tjj°4>\s the identity mapping on X. In this situation we say that X is biholomorphic to Y. Recall, e.g., (Milnor, 1965), that an n-dimensional differentiable manifold X
302
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
is a metric space that locally looks like Euclidean space. The definition of differentiable manifold requires some technicalities because manifolds can have different degrees of smoothness. You need a set {Ua \ a € 1} of open sets which covers the manifold, i.e., X = UaeiUa, and for each Ua, a map (pa : Ua —> R n that gives a homeomorphism Ua to an open set of ]Rn. Moreover: (1) given any compact set K C X, only finitely many Ua meet K; (2) whenever Ua D U0 ^ 0, fa o 0 " 1 : 4>a(Ua n Up) -> 4>p{Ua D L^) is C°°, i.e., has infinitely many continuous derivatives; (3) there is a countable basis of open sets, i.e., there are a countable set B of open sets such that every open set on X is a union of open sets from B. If we replace Rn by C™ and C°° by holomorphic, we have the definition of ndimensional complex manifold. Before we go any further, we point out that all manifolds connected to algebraic geometry are quite nice. For algebraic sets, we rarely need worse than complex manifolds, which are much "nicer" than even C°°-manifolds, i.e., infinitely smooth manifolds. We never stray below infinitely differentiable manifolds. The complex analytic sets we have defined so far are analogous to affine algebraic sets. There exists a very natural more global notion of complex analytic spaces defined analogously to complex manifolds using the complex analytic sets as local models, e.g., (Fischer, 1976; Gunning & Rossi, 1965). Complex analytic sets, quasiprojective algebraic sets, and complex manifolds are complex analytic spaces. In what follows we will state some results for complex analytic spaces.
A.2
Some Further Results on Holomorphic Functions
Holomorphic functions satisfy very strong restraints that are often considerably stronger when the domain of the functions is at least two dimensions. For example, there are several convenient extension theorems. Theorem A.2.1 (Hartogs' Theorem) Let U C CN be an open set with N > 2 and let Y = V(gi,... ,gi) be a complex analytic subset of CM. If K C U is a compact set with U \ K connected, then any holomorphic mapping A : U \K —• Y has a unique extension to a holomorphic mapping U —> Y. Proof. The map A is given by functions Ai,...,AM, and has a unique extension to U, since the Ai extend uniquely to U by the single function version of Hartogs' Theorem, e.g., (page 307 Fritzsche & Grauert, 2002). Since the holomorphic functions gt(Ai(z),..., AM{Z)) are identically zero on U \ K, the extensions to U are identically zero. Thus A(U) CY. •
Algebraic Geometry
303
Remark A.2.2 Theorem A.2.1 is not true with Y merely a complex analytic subset of an open set U C CM. For example, if G is the open unit ball in C2, K is the closed ball in C2 of radius 1/2, and Y = U := G\K, the result is false. It is true whenever U is a holomorphically convex open set of C M , see, (page 75 Fritzsche & Grauert, 2002). Such sets include CN and open balls. Here is a typical use of Hartogs' Theorem. Example A.2.3 Let X := CN \ 0 with N > 2. Then X is not isomorphic to an affine algebraic set. To see this assume otherwise that it was isomorphic via F : X —> X' to an affine algebraic set X' C C M for some positive integer M. Then, since X' is closed, any sequence xn G X converging to 0 £ C^ cannot have their images F(xn) converge in C M . But, such a sequence does converge, since by Hartogs' Theorem A.2.1, the mapping F has a holomorphic and hence continuous extension to CN. The following simple result puts Example A.2.3 in perspective. Lemma A.2.4 Let g be an algebraic function on an affine algebraic set X C CN. Then X \V(g) is isomorphic to an affine algebraic set. Proof. By definition we have a polynomial p £ C[zi,..., ZN] such that Px = 9 and hence V(g) = V{Px) = V(p,f,,..., fk) where X := V ( / 1 ; . . . , fk). We let z denote the AT-tuple (zi,..., zN). We define the map F : X\V(g) -> V(fi,..., fk, wp-1) C C ^ 1 by F(z) = (z,l/p(z)). Define G : V{h,..., fk,wp - 1) ^ X\V(g) by G(z, w) = z. Note that G o F is the identity on X and F o G is the identity on V(f1,...,fk,wp-1).
a
Another very useful result is the Riemann Extension Theorem (page 38 Fritzsche & Grauert, 2002). Theorem A.2.5 (Riemann Bounded Extension Theorem) Let U be a complex manifold. If Y c U is a complex analytic subset of U with Y ^ U, then any bounded holomorphic function on U \Y has a unique extension to a bounded holomorphic function on U. Remark A.2.6 Analogous to the generalization mentioned in Remark A.2.2, Theorem A.2.5 remains true if the bounded holomorphic function on U \ Y is replaced by a holomorphic mapping from U \ Y to an analytic subset X of a bounded holomorphically convex open set G C C M . The condition that U is smooth may be relaxed to the condition that U is normal, which will be briefly touched on in § A.2.2. For holomorphic functions, there is the maximum principle, e.g., (Theorem LA.7 Gunning & Rossi, 1965).
304
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
L e m m a A.2.7 (Maximum Principle) Let f{x) be a holomorphic function on a connected open set U C C ^ . / / |/(:r)| has a maximum on U, then f(x) is constant. For holomorphic functions, partial derivatives with respect to the coordinates can be shown to be well defined, e.g., by differentiating the power series term by term. The analogues of many differential calculus results hold with no change of statement. For example, there is the important Implicit Function Theorem. A set of holomorphic functions / i , . . . , /JV defined on an open neighborhood U of a point x € CN is called a system of coordinates on U centered at x e C ^ if (1) fi{x) — 0 for all z; and (2) the mapping ( / i , . . . , fN) : U —> C ^ is a biholomorphic mapping from U to an open set V oi CN. By Theorem A.2.8, the second condition, with U possibly replaced by a smaller open set, is equivalent to a condition that the Jacobian at x r dfi dzx '" dfN • dz\
dfx -I dzN
_ _ _ dfN dzN
-
is invertible at x. T h e o r e m A . 2 . 8 ( I m p l i c i t F u n c t i o n T h e o r e m ) Let fi,...,fk
be
holomorphic
functions defined in a neighborhood of a point x € CN with fi(x) = 0 for all i. Assume that the Jacobian rdfx dzx
dfx -i dzN
dfk . - dzx
3/fc dzpi -
has rank k at x. Then on some possibly smaller neighborhood U of x, there exist holomorphic functions fk+i, • • •, /JV such that / i , . . . , fN form a system of coordinates on U centered at x. The analogues of the many consequences of the differentiable implicit function theorem hold with no change. For example, we have a corollary that we will use below. Corollary A.2.9 Let Z e.g., let U = CN and Z a holomorphic map from 4>(0) — x £ Z and
CU be a complex analytic subset of an open set U C C CN be an affine algebraic set. Let <j> : B -> the open ball B in a complex Euclidean space Cm equal to a neighborhood of Z containing x. Assume
C^, C ^ be with further
305
Algebraic Geometry
that the complex Jacobian • dfa_ . . . 9
: •.. :
d4>=
L dzi '"
dzm J
has rank m at 0. Then there exist holomorphic coordinates / i , . . . , /jv in a open set U' C U ofx such that ZnU' = (j>{B)C\U' = {x € U' | fm+i(x) = 0;...; fN{x) = 0}. Proof. By renaming if necessary we can assume without loss of generality that • <Mi . . . Mx. -\ dz\ dZrn d
has rank m a t 0. In addition to the coordinates z\,..., zm on C m , let zm+i,... ,zjq be coordinates on CN~m. Define the map f : B x C N " m -^ C^ by f{(zi,..., zN) = 4>i(zi,..., zm) for i from 1 to m, and /i(zi,..., z/v) = —4>l(zi,..., zm) + z, for z fromTO+ 1 to N. The Jacobian of / at 0 is 9zi
a^m
u
"9^7
9i^"
U
dtpm + l 3zi '" 90w L
aZl
u
'
U
''"
L
90m + l 1-1 rv 9 z m -- • • • U i
d(f>N r\ '"'
dzm
u
J
By the implicit function theorem with k = N, the /j form a system of coordinates at 0. Knowing that / is one-to-one in a neighborhood of 0, it follows by construction that (j>{B) n U = {x e U' | fm+i{x) = 0;...; fN{x) = 0}. • The reader might observe that in Examples 12.1.3 and 12.1.4, there is a one-toone and onto mapping C —> V(w2 — z) given by sending j o e C t o (w, w2). Note that the differential of the mapping everywhere has rank one, and the map (z, w) —> w gives an inverse. Given this, it is natural to hope that given a smooth point x of an affine algebraic set Z, there is a Zariski open dense neighborhood U C Z of x which can be identified with a Zariski open dense subset of some Euclidean space. It is a fact of life that this is false. Example A.2.10 w2 ~z{z-l)(z-2)
Let I c C 2 denote the affine algebraic set defined by p(z, w) = =0. Since
306
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
is the empty set, it follows from Corollary A.2.9 that V{p) is a manifold. It can be shown that V(p) is as a differentiable manifold homeomorphic to a torus minus one point, i.e., homeomorphic to S1 x S1 minus a point, where S1 denotes the circle S*1 := { z e C \z\ = l} . Any Zariski open set U C V(p) is the complement of a finite set on V(p). Thus, there will be two differentiable embeddings of the circle 5 1 to V(p) that meet transversely in only one point. But, there can be no such maps of S1 into C. One of the beauties of Zariski open sets is that they are very big. The problem here though, is that Zariski open sets are too big. A.2.1
Manifold Points and Singular Points
How bad a set is an affine algebraic set? How far are they from being smooth, i.e., from being a manifold? As we will see, the answers are "quite nice" and "not very far" respectively. In fact, given any affine algebraic set Z, the set of smooth points of Z will be a Zariski open dense set of Z. Let us introduce definitions and concepts to make this precise. Given an affine algebraic set Z C CN, we define a point x G Z to be a smooth point (also called a manifold point or a regular point) if there is a holomorphic map
:
d<j>=
9
Uz,
•-.
: 9
' ' ' dzm J
has rank m a t i , where (f>{zi,...
,zm)
= (4>i(zi,...
,zm),...
,<j>N(zi,...
,zm)).
For example, at (1,1) on Example A.1.1, (j> can be taken to be z —> (z, yfz). Note that by Corollary A.2.9, it follows that given a smooth point x £ Z, there are holomorphic coordinates z\,..., z^ defined on a complex open set U C CN containing x and such that Zi(x) = 0 for all i, and such that U(~\Z = V(zm+i, • • •, .ZJV). This integer m is defined to be the complex dimension of Z at a regular point x £ Z. The complex dimension of Z at m is half the usual dimension of Z considered as a topological manifold at x. We typically use the word dimension for complex dimension and refer to the usual dimension as the real dimension. For example, the complex dimension of C is one and the real dimension is two. It is traditional to denote the smooth points of a quasiprojective set Z by Z reg . The points in Z \ Zreg are called singular points. The singular points of Z are denoted Sing(Z). The dimension of Z at a smooth point is well defined. A nice argument for this follows by adapting the very short argument for differentiable manifolds (page 7 Milnor, 1965). We gave a general definition of dimension in § 12.2 based on the irreducible decomposition.
307
Algebraic Geometry
One difficulty with deciding for which points an algebraic set are smooth is that the defining equations for the set might have too much information packed in them. Here is an example where the defining functions will not suffice. Example A.2.11 Let Z := V(z2) C C. In this case, Z = V(z) also, and using the defining equation z, we see that Z is a manifold. The problem with the defining equation z2 is that it also includes multiplicity information about Z. Remark A.2.12 There is no easy computational solution to the problem posed by the last example. The set of smooth points of an affine algebraic set V(f) is Zariski open and dense, but the prescription for the singular set is nontrivial. Given an affine algebraic set Z C C^, Z = V(I(Z)) where I(Z) denotes the ideal of polynomials in C[zi,... ,ZN] that vanish on Z. One version of Hilbert's Nullstellensatz, e.g., (Cox et al., 1997), says that given an ideal X C C[zi,..., z^\, then I(V(X)) = y/X, where y/X, the radical of X, consists of all polynomials g such that gk G X for some positive integer k. For example, on C, y/(z3) = (z). The passing from an ideal to its radical throws away all multiplicity information. The radical intervenes in the algebraic characterization of the set of smooth points of an affine algebraic set. Let g\, • • • -,gu be a basis of the radical of X(f). It follows that (Chapter 1A Mumford, 1995) that the singular set, Sing(V(/)), of V(f) is equal to T/
( \
dgi dzi
dgM \ dzN)
It must be noted that for a fixed N, M can be arbitrarily large. A special case of the above, mainly useful for making illustrative examples, is the following. Lemma A.2.13
Let p G C[zi,..., z^\. The singular set of V(p) is contained in (to
dp^\
Here is an example using Lemma A.2.13. Example A.2.14
Let Z = V(zw) C C 2 . In this case the potential singular
points, V I , -r—, zw I, the common zeros of zw and its partial derivatives, \ oz ow j equals the origin (0,0) € C 2 , which is clearly a singular point of Z. Remark A.2.15 The inclusion in Lemma A.2.13 is an equality if the dimension of the set ( \
dp_ dzx
^ \ dzN J
is < N — 2. For explicit examples, such a criterion is useful, but in our numerical work such criteria have not yet proven useful.
308
A.2.2
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Normal Spaces
There is an extensive literature on singularities. The simplest general class of complex analytic spaces after complex manifolds are normal complex analytic spaces, e.g., see (Fischer, 1976). A complex analytic space Y is normal if given any y GY and any complex neighborhood U of y and any bounded holomorphic function / on U \ Singf/, it follows that / extends holomorphically to U. A quasiprojective algebraic set is said to be normal if it is a normal complex analytic set. Given any complex analytic space (respectively, any quasiprojective algebraic set) X, there exists a unique normal complex analytic space (respectively, a unique normal quasiprojective algebraic set) X' with a finite proper holomorphic (respectively, algebraic) map -K : X' —> X with n : X' \ 7r~1(SingA') —> X \ Sing(X) isomorphic. Normal spaces include the affine algebraic subsets X C CN with the properties: (1) X is a reduced complete intersection, i.e., all irreducible components of X have with the same dimension; there are k := N — dimX polynomials pi,...,pk X — V(pi,... ,Pk)', and all components of X occur with multiplicity one; and (2) Sing(X) is codimension at least two in X. These special sets are irreducible and naturally occur as parameter spaces.
A.3
Germs of Complex Analytic Sets
There are situations, e.g., in the study of endgames in Chapter 10, when we want to look carefully at behavior in a neighborhood of a point. In such situations specifying a fixed neighborhood of the point is inconvenient, and the notion of a germ of a complex analytic set improves clarity. (Chap. II, Sec. E Gunning & Rossi, 1965) is an excellent place for becoming comfortable with germs of complex analytic sets. Since we will not be talking about germs of other types of sets, e.g., germs of affine algebraic sets, which are denned analogously using the Zariski topology in place of the complex topology, we will often refer to the germ of a complex analytic set as a germ of an analytic set or a germ. Given a point x e CN we define an equivalence relation on complex analytic sets containing x: if X c U and X' C U' are two complex analytic sets denned on open neighborhoods of x in C ^ , then we say that X and X' have the same germ at x if there is an open neighborhood V C U D V with X n V = X' D V. Thus the complex analytic sets V(z) and V(zw) define the same germ of a complex analytic set at (0,4) but not at (0,0). G C ^ , we let ||u;|| = \/|u!i| 2 + • • • + \WN\2 Given a point w = (WI,...,WN) denote the Euclidean norm and we denote the ball of radius r about a point x by
Br(x), i.e.,
Br(x)~{zeCN
||z-x||
Algebraic Geometry
309
We say a germ X at a point x £ C ^ is irreducible if there is a positive number e' such that for all positive e that are less than e', there is an irreducible representative of X in Be(x). This is equivalent to the usual definition that a germ l a t i G C^ is irreducible if there is no way to write X as a union Xi U X2 of germs X\, X2 at x unless either X = X1 or X = X2. The dimension of an irreducible germ X at a point x E C ^ is defined as the dimension of any one of the germ's irreducible representatives. An important fact about germs is that the irreducible decomposition holds, e.g., see (Theorem II.E.13 Gunning & Rossi, 1965). Theorem A.3.1 Let X be a germ of a complex analytic set at a point x € CN. Xk at x £ CN such that There are a finite number of irreducible germs X\,..., X = Xi U • • • U Xf, and for all j = 1 , . . . , k we have that Xj
310
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
dimensional, the set of singular points of X is O-dimensional and therefore meets a neighborhood of x with compact closure in a finite set. We can therefore (by taking a smaller U if needed) assume that the only possible singular point in U is x. If a; is a smooth point there is nothing to prove. Similarly IT : U \ {x} —> Ai (0) \ 0 is an unramified covering with a well-defined sheet number d. Basic topology tells us that the restriction / : Ai(0) \ 0 —> Ai(0) \ 0 of the mapping z —• zd, factors through n : U \ {x} -> Ai(0) \ 0 as f(z) = n(g(z)) with g : Ai(0) \ 0 -> U \ {x}. Using the Riemann Extension Theorem A.2.5, we conclude we have an extension <> / of g satisfying the conclusions we are trying to show. • From Theorem A.3.2 follow results classically referred to as Puiseux's theorem, e.g., (Chapter 7 Fischer, 2001). The following corollary is a version of this result. Corollary A.3.3 Let X be a one-dimensional complex analytic subset of an open set U C C ^ . Assume that X is irreducible at a point x e X. Let
g(x)+Y^aiw\ k=c
where ac ^ 0. Since Y^k=oai+cwl 1S nonzero at 0 we may express it on A r (0) for some positive r < 1 as h(w)c, with h(w) holomorphic on A r (0). Choosing r positive, but possibly smaller, s = h(w) is the desired coordinate. •
A.4
Useful Results About Algebraic and Complex Analytic Sets
The Hironaka Desingularization Theorem, which holds for both complex algebraic sets and complex analytic spaces, is highly nontrivial, but extremely useful. Given a quasiprojective algebraic set X (respectively, a complex analytic space X), a desingularization f : X —> X of X is a quasiprojective manifold X (respectively, a complex manifold X) and a proper surjective algebraic (respectively, holomorphic) map / : X —> X such that //-i(x r e g ) : / -1 (-Xreg) —> XTeg is an isomorphism Zariski open and dense in X. Xreg (respectively, a biholomorphism) with f~l(Xreg) is always Zariski open and dense in X.
Algebraic Geometry
311
Theorem A.4.1 (Hironaka Desingularization Theorem) Let X be a quasiprojective algebraic set or a complex analytic space. Then there is a desingularization f :X -> X ofX, More refined versions of the result tell us that we may choose the desingularization map so that the inverse image of the singular set under the desingularization map is a union of smooth codimension one algebraic sets which meet transversely. See (Lipman, 1975) for a nice exposition of this result. In the case when all components of X are of dimension one, Theorem A.4.1 is simply the normalization of X, e.g., see (Fischer, 1976). The Hironaka Desingularization Theorem makes many facts that are easy for manifolds carry over immediately to general algebraic sets. Here is one simple example often referred to as the maximum principle, e.g., (Theorem III.B.16 Gunning & Rossi, 1965). Theorem A.4.2 Let X be an irreducible complex analytic space with infinitely many points. Let f be a holomorphic function of X. If \f\ has a maximum on X, then f is a constant function. Proof. Let n : X —> X be a desingularization. Let / denote the composition of / with 7T. If |/| has a maximum, then so does |/|. By Lemma A.2.7, it follows that / is constant, and hence / is constant. • Theorem A.4.20 will give another illustration of the clarity brought by using Hironaka's theorem. The proper mapping theorem of Grauert assures us that many operations with algebraic sets yield algebraic sets. Theorem A.4.3 Let f : X —> Y be a proper holomorphic mapping (respectively proper algebraic mapping) of complex analytic spaces (respectively protective, respectively affine, respectively quasiprojective algebraic) sets. Then f(X) is a closed complex analytic (respectively protective, respectively affine, respectively quasiprojective algebraic) subset ofY. Proof. The analytic statement may be found in (Fischer, 1976). This, or the simple fact that in the complex topology proper maps take closed sets to closed sets, automatically implies the algebraic statements. To see this note that if X is projective, affine, or quasiprojective algebraic, we know that ir(X) is constructible by Theorem 12.5.6. Since n(X) is closed, we have the conclusion from Lemma 12.5.4.
•
Recall that an algebraic map from a quasiprojective set to an irreducible quasiprojective set Y is dominant if the image of the map is dense in Y.
312
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Lemma A.4.4 Let f : X —•> Y be a dominant proper algebraic map from a quasiprojective algebraic set to a quasiprojective algebraic set. Then f(X) — Y. Proof. Let y be a point not in the image of X. By dominance we can find a sequence of points yj G f(X) with yj converging to y. Choose Xj G X with f(xj) = yj. By the definition of properness, there is a neighborhood U that contains y such that f~l(U) is compact. Thus there is a subsequence of the Xj that converges to a point • x G X. By continuity of / , we have the contradiction that f(x) = y. For algebraic sets there is a strong result on upper semicontinuity of dimension (Corollary 3.16 Mumford, 1995) for the algebraic case or (Theorem, page 137 Fischer, 1976) for the more difficult complex analytic version. Theorem A.4.5 Let f : X —> Y be an algebraic map (respectively a holomorphic map) between quasiprojective algebraic sets (respectively complex analytic spaces). Then for each positive integer k,
{xex\ dimxr'ifix^yk} is a quasiprojective algebraic set (respectively a complex analytic space). Remark A.4.6 Let / : X —> Y be an algebraic map between algebraic sets. As we see from Example 12.5.5, the sets
{yGY
| dim f-\y)>
k)
do not have to be algebraic sets, though by Theorem A.4.5 and by Theorem 12.5.6, they are constructible. Using Theorem A.4.5 and Theorem A.4.3 we have the following result. Corollary A.4.7 Let f : X —> Y be a proper algebraic mapping of quasiprojective algebraic sets. For each integerfc> 0, the set {y G Y\ d i m / ^ 1 ( y ) > k} is a closed quasiprojective subset ofY. Finally we have the very useful Factorization Theorem of Remmert and Stein (III Corollary 11.5 Hartshorne, 1977). Note that finite-to-one proper maps are called finite maps by algebraic geometers, e.g., (Hartshorne, 1977). Theorem A.4.8 (Stein Factorization Theorem) Let f : X —> Y be a proper algebraic mapping of quasiprojective algebraic sets. Then f factors as s or, where r : X —> Z is a proper algebraic map from X onto a quasiprojective algebraic set Z with all fibers connected, and s : Z —>Y is a finite-to-one proper map. The following general lemma, which is a special case of (III Proposition 10.6 Hartshorne, 1977), is often useful. Lemma A.4.9 Let f : X —* Y be an algebraic map from a quasiprojective algebraic set X to a quasiprojective algebraic set Y. Let Xr denote the closure of
313
Algebraic Geometry
those points from x e Xreg such that f(x) 6 f(X) dimf{Xr) < r.
and rank dfx < r. Then
The algebraic analogue of Sard's Theorem, e.g., ((3.7) Mumford, 1995), is much crisper than the usual Sard's theorem for differentiable maps. It is responsible, through the Bertini theorems of § A.9 for many of the strong probability-one statements in this book. Theorem A.4.10 (Sard's Theorem) Let n : X —> Y be a dominant algebraic map between irreducible quasiprojective algebraic sets X and Y. Then there exists a Zariski open dense subset V cY such that letting U denote the Zariski open dense set A^reg (~l TT~1(V), TTu : U —> V is surjective and of maximal rank, i.e., dnx has rank dimY at all points of x e U. In particular, for all v e V, T T " 1 ^ ) fl XTeg is smooth of dimension dim X — dim Y.
Proof. In a nutshell the proof goes as follows. By replacing Y by a Zariski open dense set Y' of Y with Y' C n(X) \ Sing(Y), and X by Xreg n TT~1(Y'), we can assume without loss of generality that X and Y are smooth and n is surjective. Let X' denote the closed algebraic subset of X consisting of points for which dirx has rank < dimY. By Lemma A.4.9, n(X') is a proper algebraic subset of Y. • Remark A.4.11 The differentiable form of Theorem A.4.10 is quite weak. For example, consider the infinitely differentiable map / : E 2 —• R defined by
/(*,„) : = f e x p ( ^ i f ^ < l . [
0
if x2 + y2 > 1
The image of this map is [0,e -1 ]. Over the dense set (Oje"1) of the image [0, e" 1 ], / is of maximal rank, but f~1(U) is far from dense in IR2. Another useful fact is that generically dimensions add. Corollary A.4.12 (Additivity of Dimensions) Let f : X -^ Y be a dominant map between irreducible quasiprojective algebraic sets. There is a dense Zariski open setUcY such that for all y GU, f"1(y) is pure dimension dimX - dimY. Proof. By Theorem A.4.5, there is a dense Zariski open set V C X such that dimx f~1(f(x)) is a constant k for all x G V, and for all x G Z := X \ V, dimx f~1(f(x)) > k. Using Theorem A.4.10, we see that k = dimX — dimY. We will be done if we show that f(Z) is not Zariski dense. Assume it was. Then we would have an irreducible component Z' of Z mapped dominantly to Y. Using Theorem A.4.10 we conclude that a dense set of points x £ Z' satisfy dimx fz^{fz'{x)) — dimZ' — dimY. But this gives the contradiction that dimX - dimY = k < dim^ f^ifz'ix))
= dimZ' - dimY < dimX - dimY. a
314
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
The following result (Corollary, page 138 Fischer, 1976) is useful for analyzing not necessarily proper maps. The algebraic case with the Zariski topology follows from it by using Theorem 12.5.6. Theorem A.4.13 Let f : X —> Y be a holomorphic spaces. Assume for a point x € X that there is an O C X of x, such that for all x' € O, dinx^ f~1(f(x')) are arbitrarily small open complex neighborhoods U C such that
mapping of complex analytic open complex neighborhood is a constant k. Then there O of x and V C Y of f(x)
(1) f(U) is a complex analytic subset ofV; (2) fu'.U—t /(J7) is an open map, i.e., all open subsets of U in the complex topology are mapped to open subsets; and (3) d\mx U = k + dim /(a; ) f(U). By a covering map g : A —> B between differentiable manifolds of the same dimension is meant a differentiable map such that each point y GY has a neighborhood U such that g~1 (U) is a union of disjoint open sets each mapped isomorphically onto U by / . We have the important consequence of Theorem A.4.3 and Theorem A.4.10. Corollary A.4.14 Let f : X —> Y be a surjective proper algebraic map between quasiprojective algebraic sets. Assume that X and Y are pure dimensional with dimX = dimY. Then there is a Zariski dense open set U C Y which is smooth and such that f : f~l(U) —> U is a covering map. If Y is irreducible, the map / in Corollary A.4.14 has a well-defined degree. Corollary A.4.15 Let f : X —> Y be a proper map from a pure-dimensional quasiprojective algebraic set X to an irreducible quasiprojective algebraic set of the same dimension. Assume that f is surjective on every component of X, e.g., assume that f is finite-to-one. Then there is a Zariski dense open set U C Y such that f : f~1(U) —> U is a covering map of degree degf, which is equal to the number of points in any fiber of fj-i^uy The most important case of Corollary A.4.15 is when -K : X —» Y is finite-to-one. We say that an algebraic map from a pure-dimensional quasiprojective algebraic set X to an irreducible quasiprojective algebraic set of the same dimension is a branched covering if / is proper and finite. We define the degree of / to be the number deg / in Corollary A.4.15. The following Lemma gives an easy local condition for properness. Lemma A.4.16 (Stein) Let f : X —> Y be a holomorphic map between complex analytic spaces. Assume that y £Y and A is a connected component (not necessarily
Algebraic Geometry
315
irreducible) of f~l(y). If A is compact, then there are open complex neighborhoods U c X of A and V
•
Using this and Grauert's Proper Mapping Theorem A.4.3, we have an extremely important existence result. Theorem A.4.17 Let f : X —> Y be a holomorphic map between complex analytic spaces. Assume that Y is irreducible and that all irreducible components of X have dimension at least equal to dimY. Assume that y € Y and x is an isolated point °f f1(y)IfY is locally irreducible at y, then, there are arbitrarily small complex neighborhoods U C X of x and V C Y of y such that f\j : U —> V is a proper surjective map with finite fibers. Proof. Choose a neighborhood X' of x. By Lemma A.4.16, there are open complex neighborhoods U C X' of x and V C Y of y, such that fu : U —> V is proper. By Theorem A.4.3, f(U) is a complex analytic subspace of V. Since x is isolated, we would be done in the algebraic case by Corollary A.4.12 and the irreducibility of Y at y. In the complex analytic case, we instead use Theorem A.4.13, which implies dim/([/) = dimY. From this and the irreducibility of Y at y, we conclude that f(U) contains a complex open neighborhood V of y. The rest of the result follows by replacing V by V, and U by U n f~l{V). U Example A.4.18 The local irreducibility is needed for the above result. Let X := C and let Y := V(g) C C2 be defined by g(x,y) = y2 - x2(x + 1). Consider the algebraic map / from to X -* Y given by f(t) = (-(1 + t2),yf-i(t + t3)). The reader can check that / is surjective and one-to-one everywhere except at ±%/^T which are mapped to (0,0). A small complex neighborhood U of (0,0) on Y is biholomorphic to a small complex neighborhood of 0 on V(xy). A small neighborhood of t = yf—l does not map onto a neighborhood of (0,0) in Y, but only onto a complex neighborhood of (0,0) on one of the irreducible components of U at (0,0). The following result underpins most constructions of homotopies. It asserts that under minimal conditions, isolated solutions of a system in a family of systems are limits of isolated solutions of nearby systems. Corollary A.4.19 Let X and Y be irreducible quasiprojective algebraic sets. Let f(x;y) be a system of N := dimX algebraic functions on X xY. Letir : XxY —> Y denote the product projection. Assume that x* is an isolated solution of f(x;y*) = 0 for some point y* such that Y is locally irreducible at y*. Then each irreducible component Z ofV(f) containing (x*,y*) satisfies the following properties: (1) dim Z = dim Y; and
316
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
(2) there exist arbitrarily small open neighborhoods U C Z of x* and V C Y of y* such that 7T[/ is a proper finite map of U onto V. A number of stronger versions of Corollary A.4.19 are contained in § A. 14.
A.4.1
Generic
Factorization
The next theorem tells us that if we are satisfied with generic results, as we often are in numerical work, properness is unnecessary. The proof requires some constructions involving the graph of a map. Theorem A.4.20 Let n : X —> Y be a dominant algebraic map from an irreducible quasiprojective algebraic set X to an irreducible quasiprojective algebraic set Y. There exists a smooth Zariski dense open set U C Y and a smooth Zariski dense open set W of TT~1(U) such that nw factors nw — s or, where r : W —> V is a surjective maximal rank algebraic map with connected fibers and s : V —> U is an algebraic map and a finite covering map. In particular each fiber of TTW '-W-^U has the same number of irreducible components, i.e., degs irreducible components. Proof. In the following argument, we will repeatedly replace algebraic sets by dense Zariski open sets. For the most part, we call these shrunk sets by the same names. Replacing X by Xreg, we may assume that X is smooth. By shrinking Y and replacing X by the inverse image under n of the shrunk Y, we may assume that Y is smooth. Similarly using Chevalley's Theorem 12.5.6, we may assume that ?r(X) = Y. By using the algebraic Sard's Theorem A.4.10 and shrinking X and Y further we may further assume that IT is of maximal rank. Let X denote an irreducible projective algebraic set in which X is Zariski open. Let X denote the closure of Graph(7r) C X x Y in X x Y. The induced map W : X —> Y extending n is proper. By Hironaka's Theorem A.4.1, there exists a desingularization / : X —> X. Thus following Theorem A.4.8, we can factor ?F O / as a o p where p : X —> Z is an algebraic map with connected fibers; where Z is an irreducible quasiprojective algebraic set; and a : Z —» Y is a finite-to-one proper algebraic map. By Corollary A.4.15, there is a Zariski open dense set U' C Y such that U' and cr' 1 (t/ / ) are smooth and ov-^t/') : <J~1(U') —> U' is a covering. Thus by shrinking we may assume without loss of generality that a : Z —> Y is a finite-to-one covering; Z is smooth; and that p(X) = Z. As we shrink Z we may automatically shrink Y so as not to lose the properties already obtained. To see this let V' be a Zariski open set of Z. Since the image under a of the proper algebraic subset Z \ V is a proper algebraic subset, we may replace Y by U := Y \ a{Z \ V) and Z by V := o-1 (Y \ a(Z \ V1)) to still have an algebraic finite-to-one covering map a :V —> U between manifolds.
Algebraic Geometry
317
By using the algebraic Sard Theorem A.4.10 on p we may, after shrinking, assume without loss of generality that p is of maximal rank. Since X is smooth and / is an isomorphism on the inverse image of the regular points, we may regard X as a subset of X. By using Chevalley's Theorem 12.5.6, we may assume that px '• X —> Z surjects onto Z. Since all fibers of p are smooth and connected they are irreducible. We conclude that all fibers of px are smooth and connected. Indeed, if this failed, we would have a fiber of p which is irreducible, and which after removing a proper algebraic subset is disconnected. Taking U as the final shrunk Y, V :— a~1{U) and W as the final shrunk X, we have finished the proof of the theorem. • Corollary A.4.21 Let n : X —> Y be a dominant algebraic map from an irreducible quasiprojective algebraic set X to an irreducible quasiprojective algebraic set Y. Assume that 7r(Sing(X)) is a proper algebraic subset ofY. There exists a Zariski dense open set U C Yreg such that W := ?r~1(!7) is smooth; and ITW factors nw = s o r, where r : W —> V is a surjective maximal rank algebraic map with connected fibers and s : V —> U is an algebraic map and a finite covering map. In particular each fiber of irw '• W —> U has the same number of irreducible components, i.e., degs irreducible components.
Proof. Replacing Y by Y' := Y\ (Sing(y) U7r(Sing(X))) and X by TT^1 (Y1), it may be assumed without loss of generality that X and Y are smooth. The rest of the proof follows by carefully going through the proof of Theorem A.4.20. • A.5
Rational Mappings
Besides algebraic mappings, there is a more general notion of mapping that is often very useful. On C, the assignment / : x H-> 1/X is a well-defined function on C \ {0}. Using the identification of x € C with [1, x] € P 1 ,, it is natural to think of / as extending to a function on all of C that takes 0 to the value oo equal to [0,1]. In this case, the algebraic set {(a;, [x, 1]) e C x P 1 | x G C} is the graph of the map x —> 1/x regarded as a map from C —> P 1 . This is the simplest example of a rational mapping. A rational mapping f : X —> Y between quasiprojective algebraic sets is defined to be an algebraic set F C X x Y such that there is a Zariski open dense set U c X such that F n (U x Y) is the graph of an algebraic map and V D (U x Y) = T. Every algebraic mapping is a rational mapping, but rational mappings are much general. A rational mapping often has a set of indeterminacy, where it cannot be
318
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
assigned any value. Consider the rational mapping given by the assignment / : (zi, z
The Rank and the Projective Rank of an Algebraic System
We define an algebraic system on a pure .ZV-dimensional quasiprojective algebraic set X to be a set of n algebraic functions f{x) := { / 1 , . . . , / „ } on X. Typically we assume irreducibility in theorems about systems and leave it to the reader to make the trivial adjustments in statements for the reducible case. We define the rank of the algebraic system f(x) = 0 on an TV-dimensional irreducible quasiprojective algebraic set X to be the dimension of the closure of the image / ( I ) c C". By Corollary 12.5.7, f(X) is an irreducible algebraic set. We denote the rank of / by rank/. We define the corank of the algebraic system f(x) = 0 to be N — rank/. Neither adjoining polynomial functions of the equations of / to create a larger system nor replacing / with g • f, where g is an invertible nxn matrix, changes the rank of a system. Theorem A.6.1 Let f(x) = 0 denote a system of n algebraic functions on an irreducible quasiprojective set X. Then there is a Zariski open set U c f{X) c C" such that for y € U, V(f(x) — y) PI XIeg is smooth of dimension equal to the corank of f. Moreover, the Jacobian matrix of f is of rank equal to rank/ at all points of
V(f(x)-y)nXTeg.
Proof. By Theorem A.4.10, we know there is a Zariski open set U C f{X) such that V(f(x) — y) is smooth and such that the Jacobian matrix of / has rank equal to dim/pQ at all points of V(f(x) -y). Corollary A.4.12 gives that dim V(f(x)-y) = N-dimf(X). • Let V and / be as in Theorem A.6.1. For the dense Zariski open set U :=
Algebraic Geometry
319
/~ 1 (y)nX r e g , we have for all points £* 6 U, the rank of the Jacobian of / evaluated at x* equals rank/. This gives us a quick probability-one algorithm for the rank for a system f. Given an algebraic system f(x) of n algebraic functions on an irreducible ./V-dimensional quasiprojective set X, the rank of / equals the rank of the Jacobian at a random point of X. The following is useful. Corollary A.6.2 Let f(x) = 0 denote a system of n algebraic functions on an irreducible quasiprojective set X. If the rank of the Jacobian of f at some point x E Xreg is k, then rank/ > k. Proof. The set on X reg where the Jacobian has rank less than or equal to A; is a quasiprojective subset of X reg . If it is dense then rank/ = k. If it is not dense, then the rank of the Jacobian is greater than A; on a Zariski dense set, which would imply rank/ > k. • Theorem A.6.3 Given a system f(x) of n algebraic functions on an irreducible quasiprojective set X, all irreducible components of V(f) have dimension at least equal to the corank of f. Proof. Use the proof of Theorem 13.4.2
D
The rank of a system is a useful invariant, but from the viewpoint of the Bertini Theorems, a closely related invariant, the projective rank of a system, plays a more central role. The first time reader may safely ignore the rest of this section and any mention of the projective rank of a system. The importance of projective rank stems from Theorem A.8.6, which states that projective rank controls the nonemptiness of the zero sets in Bertini Theorems. The projective rank of a system f is the dimension of the closure of the image of the rational mapping given by sending x G X \ V(f) to [/i(a;),..., fn(x)]. We denote the projective rank of / by rankp/. A system / on CN having projective rank N is called a big system. Remark A.6.4 For an algebraic line bundle and a system of algebraic sections / i , . . . ,fn, rank does not make good sense but projective rank does. It is closely related to the notion of Kodaira dimension, e.g., (Iitaka, 1982). Lemma A.6.5 Let f be a system of n algebraic functions on an irreducible quasiprojective algebraic set X. Then rank/ — 1 < rankp/ < rank/. Proof. The rational mapping used in the definition of projective rank factors as the composition of the map in the definition of rank followed by the map Cn \ {0} —> P™"1, which sends (ZQ, ..., zn-i) —> [ZQ, ..., zn^i]. Since the fiber of the map C™ \ {0} —> P " - 1 is dimension one, we are done. •
320
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
The above proof makes clear how the two ranks may fail to be equal. Before we make this precise in Lemma A.6.7, let us give a definition and an example. Let X c P™ be an irreducible projective set. X is said to be a cone with vertex x e X if for some hyperplane H of Pra not containing x, the projection •KX : P™ \ {x} —> H maps X \ {x} to a set whose closure has dimension less than dimX. An irreducible affine algebraic set X C C™ is said to be a cone with vertex x G X if, when we regard Cn as a subset of P n , the closure of X in P n is a cone with vertex x. Example A.6.6 (Cones) Let Y denote an irreducible (N — l)-dimensional smooth complete intersection in p™-1 defined by homogeneous equations / i , . . . ,fn-N in the variables zo,..., zn-\. The Af-dimensional cone X := V ( / i , . . . , / n -jv) C C n intersects a general (n - iV)-dimensional linear subspace through the origin in the origin only. To see this, regard C" as contained in ¥n with the hyperplane at infinity Hoo given by V(zn). For positive integers m satisfying (m — 1) + (N — 1) < n — 1, there is a dense Zariski open set of m-dimensional linear subspaces L c C™ with
I n I n E 0 0 = (In H^) n (x n H^) = 0. Thus X C\ L = {0}, and this point is singular if degX > 2. Lemma A.6.7 Let f be a system of n algebraic functions on an N-dimensional irreducible quasiprojective set X. Then rank/ = rankp/ except when f(X) is a cone over 0. Proof. The proof of this is immediate from the definitions and left to the reader. • To compute the projective rank is easy. Theorem A.6.8 Let f be a system of n algebraic functions / i , . . . , / „ on an irreducible quasiprojective algebraic set X with one of the functions, fi, which we relabel f\, not identically equal to the zero function. Then
rankp/ = rank j , . . . , - ^ , I /i h ) where the system of quotients is defined on X \ V(/i). Proof. The proof of this, and the independence of the choice of the not identically zero fi, follows immediately from definitions and is left to the reader. • A.7
Universal Functions and Systems
In this section we will give a detailed discussion of certain special families of polynomials, which are useful in the study of polynomial systems.
321
Algebraic Geometry
A. 7.1
One Variable Polynomials
We start with the case of the most general degree d polynomial. Fix an integer d > 1. We have the family p(z, c) := c0 + ciz + ... + cdzd = 0.
(A.7.1)
We have some related issues we must face already. (1) Should we insist that cd ^ 0? (2) Would it be better to include "solutions at infinity" by using the homogenized system
p(z,c) := cO2o + ciz^zi
+ ... + cdzf = 0
withz := [zo,zi] G P 1 ? (3) Since multiplying an equation by a nonzero complex number does not change the solution set of the polynomial, should we make the convention that c is not the point (c 0 , ...,cd) G C d + 1 , but instead [c0, ...,cd] G P d ? Doing this, of course, implicitly throws away the identically zero polynomial, that corresponds to c = 0. For simplicity, we look only at polynomials with no restrictions on the c*, i.e., we assume that (z,c) = (z,c0,... ,cd) G Cd+2 corresponding to polynomials of degree < d. The different choices raised by the issues listed above are treated in a similar way. Let's introduce some notation. We let Zd c Cd+2 denote the solution set of p(z, c) = 0. We let n : Zd —> Cd+1 denote the map induced by the projection (z, c) —> c, and we let p : Zd —> C denote the map induced by the projection 0,c) ->• z. Note that for any given c € Cd, n~1(c) consists of the points (z,c) satisfying p(z, c) = Co + c\z + . . . + cdzd = 0. It is important that the zero set Zd C Cd+2 oip(z,c) = 0 is a connected (d + l)-dimensional complex manifold. Indeed, Zd is dimension d+ 1 since it is denned by a single algebraic function on an irreducible d'oi z cl (d+ 1)-dimensional algebraic set. Moreover, since —^ ' = 1, it is a consequence OCQ
of Theorem A.2.8, that Zd is smooth. To show the connectedness of Zd, we apply the criterion that a space (in our case Zd) is connected if a continuous map (in our case p : Zd —> C) has connected image and fibers. To see this, note that the fiber p~1(zt) of p over an arbitrary point zt G C is the set of (z*,c) such that p{z*,c) = 0. Since this is the linear equation CQ + C\Z* + ... + cdzd = 0 in the variables c G C d + 1 , we see that p~1(z*) is identified with a hyperplane of Cd+l by ir. Since Zd is connected and smooth, it is irreducible. Over all points except 0 6 C d + 1 , n has finite fibers.
322
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Let us consider the points c for which p(z, c) has multiple roots, i.e., less than d distinct roots. This corresponds to points c for which the equations P(Z>C)] _n dP(z,c) — u . dz J
,A 7 9 N {A. (.A)
have a common root. The classical prescription, e.g., (Chapter 1 Walker, 1962) and (Cox et al., 1997), for how to eliminate z from these equations constructs the discriminant of p(z,c), a polynomial of degree Id — 1 in c, which is the resultant of the two polynomials in the system (A.7.2). We only discuss resultants briefly in § 6.2.1 of this book. For us it will be enough to note that (1) for some c», e.g., the c* corresponding to the polynomial zd — 1 = 0, the roots are distinct; (2) for c in a complex neighborhood of c* the roots remain distinct; and (3) the set S in Cd+1 denned by the system A.7.2 is an affine algebraic set. We know from item 3) and Chevalley's Theorem 12.5.6 that TT(<S) is a constructible set. We know that the closure T> of n(S) in the complex topology is an affine algebraic set by Lemma 12.5.3. By item 2) we conclude that V ^ C d+1 and thus that the Zariski open set C d + 1 \ T> is nonempty, and hence dense. A.7.2
Polynomials of Several Variables
The construction of § A.7.1 carries over to several variables. We summarize the construction for polynomials of degree < d on CN. Such a polynomial
P(z,c)= Yl °JzJ \j\
depends on A/" := ( j~ ) coefficients cj, where we use the multidegree notation. We regard p(z, c) as a polynomial on CN x C . By the same reasoning as in § A.7.1, we see that Z^ := V(p(z,c)) is smooth, connected, and of codimension one. Moreover by Theorem A.4.10, there is a Zariski open dense set U C O^", such that the restriction of the projection map TT : CN x C —> CN to Zd H TT~1(U) is a maximal rank map. A.7.3
A More General Case
Let fi(x),..., fn{x) be a set of algebraic functions on an irreducible quasiprojective algebraic set X. For example, these might be a set of rational functions PlQ)
Pn(x)
Qi(x)''""'' qn(x) where the Pi(x) and <&(#) are polynomials on CN and X := CN \ (\J™=1V(qi(x))).
323
Algebraic Geometry n
We define the universal function F(X: x) := V^ A;/,(z) on Cn x X. i=l
It is traditional in this context to refer to the solution set V(f) of the set of algebraic functions fi(x),..., fn{x) as the base locus of the set of functions. We will not use this language, but the reader should be aware of it. Zf : = V(F) i s a quasiprojective algebraic set with Zf (1 [Cn x (Xieg \ V(f))] smooth. Moreover there is a Zariski open dense set U C Cn, such that the restriction of the projection map -K : Cn X (Xreg \ V(/)) —> C" to Zf H TC~1 (U) is either empty or a maximal rank map. This is important enough to state as a Theorem. Theorem A.7.1 (Simple Bertini's Theorem) Let f{x) := {fi,- • • ,fn} be a system of algebraic functions on an irreducible quasiprojective algebraic set X. There is a Zariski open dense set U C C™, such that for (Ai,...,A n ) G U, it follows that g := Yli=i Ai/i has a possibly empty quasiprojective zero set Z such that Z (~i (Xreg \ V(f)) is smooth with the differential dg nowhere zero on
zn(xieg\V(f)).
Proof. First note that we can assume that X is smooth and V(f) is empty, by simply replacing X with (XTeg \ V(/)) and renaming. Note that if rank/ = 0, then each g is constant and the theorem is vacuously true. Therefore we can assume that rank/ > 0. We have the "universal function" F(\,x) :— X^ILi ^ifi(x) defined for (X,x) G n C x X. Zf := V(F) C X is smooth by the same reasoning as used in § A.7.1. Consider the maps TTI : Zf —>• Cra and 7T2 : Zf —> X induced by the projections C " x X - » C " a n d C n x I ^ I respectively. The fiber ^(x) for any x G X is an affine hyperplane of C™. It can be further checked that given any x e X, there is a neighborhood O of x in the complex topology such that 7r^"1(C7) is biholomorphic to C"" 1 x C. Thus Zj is a bundle over X and therefore irreducible of dimension dimX + 71—1. We are in the situation of Theorem A.4.10, and would be done, if we knew that 7Ti is dominant. Assume it is not. Then, there is a Zariski open dense set U C C n such that for A G U we would have that V(£)™=1 A,/*) = 0. • A. 7.4
Universal
Systems
Let / i ( x ) , . . . , fn(x) be a set of algebraic functions on an irreducible quasiprojective algebraic set X. For any positive integer s, we define the universal system "Fi(A,x)i F(A,x):=
:
_F s (A,x)J on C s x " x X.
["Ai,i ••• Ai,n"i = A • /(z) =
:
•..
:
LAS,1 ••• A s , n J
r/r •
:
[fn.
324
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Define Zf := V{F(A,x)) C Csxn x X. Let -KX : Zf -> C s x n and TT2 : Z/ -> X denote the projections induced from C sX71 x X -+ C SXri and C s x " x I -> X respectively. L e m m a A.7.2
IfV(f)
= 0 £/ien Z j is irreducible of dimension dim.X + s(n— 1).
Proof. The set 7r^"1(x) for x 6 X may be identified with the linear space of A 6 Csx™ satisfying A • / ( x ) = 0. Since f(x) ^ 0, this space has codimension s. It can be further checked that given any x G X there is a neighborhood 0 of a: in the complex topology such that T T ^ ^ O ) is biholomorphic to ^ ' " " ' ' x O . Prom this it follows that the set Zf C C s x n x X consisting of (A, x) such that F(A, x) = 0 is an irreducible • set. Lemma A.7.3
IfV(f)
is empty, then Zf D (CsX™ x Xreg)
is smooth.
Proof. Since at any point x G V(F) C Csxn x X at least one of the / , is nonzero, we can see that all the partial derivatives
dFj{A,x) From this we see that V{F) n (Csxn x X r e g ) is smooth.
D
After we develop a few results on linear projections and subspaces, we will prove Theorem A.8.7, the analogue for systems of Theorem A.7.1.
A.8
Linear Projections
"Generic" projections have been used since classical times to reduce questions about general algebraic sets to questions about hypersurfaces. Here we present the basic facts that we need. We follow the presentation in (Sommese et al., 2001c) closely. A linear projection n : CN —> C m is a surjective affine map ?r(x) = a + ^ x ,
(A.8.3)
where «i,o
o=
: .«m,oj
We work with from CN onto with T(TTI(X)) fibers through
a i , i • • • ai,N
; A=
: • . : L a m , l • ' • a m,JVJ
Xi
; and x =
:
(A.8.4)
LXN.
equivalence classes of projections, considering two projections 7Ti,7r2 C m equivalent if there is an affine linear isomorphism T : C m —• C m = 7T2(x). Thus, for us two linear projections are the same if their the origin are parallel (N - m)-dimensional linear subspaces of C ^ ,
Algebraic Geometry
325
i.e., TTJ~ (TTI(O)) is parallel to TT^" (7^(0)). So in the special case of linear projections from C^ —> C^^ 1 with N > 2, we can consider the projections to be parameterized by the lines through the origin, or equivalently the hyperplane at infinity H^ := N V(ZQ), where we regard C^ as embedded in ¥ by ( x i , . . . , XN) - » [z 0 , • • • , ZN\ = (1, Xi, . . . , I J V ) .
This observation will play an important role in § A.10.3. Though we can use the set ofmxJV matrices A g C mXiv with rank A — m to parameterize the linear projections, it helps to keep the geometrical correspondence between projections and the nullspaces of the matrices A in mind. As noted in the last paragraph, when m = N — 1, we are dealing with lines and the natural parameter space is the projective space parameterizing lines though the origin in C^. For linear subspaces of other dimensions this leads us to Grassmannians. A.8.1
Grassmannians
We denned the iV-dimensional projective space P^ in § 3.2 as the set of lines through the origin in CN+1. Replacing linesby (m+1) -dimensional linear subspaces through the origin leads to the notion of a Grassmannian. We define the Grassmannian of (m + l)-planes in (N + l)-space to be the set of all (m + l)-dimensional linear subspaces of CN+1 through the origin. Equivalently this is the space of linear P m s in FN. We denote this space Gr(m, N). The reader should be aware that there is a second convention in the literature where the focus is on CN+1, and the space we denote Gr(m, N) is denoted Gr(m + 1, N + 1). An (TO + l)-dimensional subspace of C w + 1 through the origin is determined by m + 1 elements of CN+1. In analogy with homogeneous coordinates on projective space, we may represent an element of Gr(m, N) by an (m + 1) x (N + 1) matrix A. Conversely, we would like an (m + 1) x (N + 1) matrix A to represent an element of Gr(m, N). For A to represent an (m + l)-dimensional linear subspace, A must have rank m + 1 , e.g., for projective space P^ = Gr(0,N), the (N + l)-tuples [zo,... ,ZN] £ FN are not allowed to have all entries 0. In analogy with homogeneous coordinates on PN, if G is an (m + 1) x (m + 1) invertible matrix, then the rank m + 1 matrices A and G • A represent the same linear subspace. As with FN, we can define embeddings of c( m + 1 ) x ( i V - m ) into Gr(m, N). Indeed, given an (m+1) x (N — m) matrix B, if we send it to [Im+i B], we have a one-to-one mapping. We take this as giving a neighborhood of any of the elements of the image of this map. We can construct other embeddings of c(m+i)x(W-™) whose unions cover Gr(m, N), but do not do so since we do not need this. Grassmannians are connected projective manifolds. As we saw above, the dimension of Gr(m, N) is (m + 1) x (N — m). There is a natural embedding
GrfoNO-pG+D-1
326
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
called the Plucker embedding obtained by sending an (m + 1) x (AT + 1) matrix A representing a point of Gr(m, N) to the point in p U + i ) " 1 with the (^1J) determinants of (m + 1) x (m + 1) submatrices of A as homogeneous coordinates. Since using G • A in place of A multiplies all these homogeneous coordinates by det G, this mapping is well defined. There is a large literature on Grassmannians. The analogy for Grassmannians of the linear projections from FN to lower dimensional projective spaces, that we consider in § A.8.2, are similar mappings from Gr(m, N) to lower dimensional Grassmannians. The isomorphism of a vector space and its dual leads to the isomorphism between Gr(m, N) and Gr(N —m, N). A good place to read more on these useful homogeneous manifolds is (Griffths & Harris, 1994). For detailed information, (Hodge & Pedoe, 1994b) is particularly helpful. In the same vein as § A.7.4, it may be shown that there is a connected projective manifold Ti c Gr(m, N) x P N consisting of all points (L, x) where L G Gr(m, N) is an m-dimensional linear subspace of P ^ and x G PN is a point in the subspace of PN represented by L. By standard abuse of notation, this is denoted by x £ L. Let 7i"i : 7i —> Gr(m, N) and 7T2 : 7i —> PN denote the algebraic mappings induced by the projections Gr(m,N) x P N -> Gr(m,N) and G r ( m , N ) x P N - > P N respectively. The mapping TTI is of maximal rank with the fiber ir~1(L) over L G Gr(m, N) mapped isomorphically to L by 7T2- Thus
dimH = (m + 1)(N - m) + m. The mapping TT2 is of maximal rank with the fiber 7r^"1(a;) for x G PN mapped isomorphically by w\ onto the Gr(m — 1, JV — 1) of m-dimensional linear spaces of P ^ that contain x. Thus dim7r^"1(a:) = m(N — m) Regarding CN as P ^ minus a hyperplane H, there is a one-to-one correspondence of m-dimensional affine linear subspaces of CN with the dense Zariski open set U C Gr(m, N) of m-dimensional linear subspaces L of P ^ not completely contained in H. The algebraic set Gr(m, N) \U is thus identified with Gr(m, N — 1). Theorem A.8.1 Let X be an n-dimensional affine algebraic subset of CN (respectively projective algebraic subset of¥N). Ifn + m < N, there is a dense Zariski open set U C Gr(m, N) of affine linear subspaces L C C ^ (respectively of linear spaces L C PN) of dimension m not meeting X. Proof. Using the identification of affine linear spaces of C ^ with linear spaces in P ^ , we only need to show this result in the case of FN. Note that d i m T r ^ X ) = n + m(N — m). Thus 7Ti(7r^"1(X)) cannot be dense because if it was
(m+l)(N-m) = dimGr(m,N) = dimTr^Tr^^X)) < dimTr^pQ = n+m(N-m). This implies the contradiction N — m < n. Theorem A.8.2
•
Let X be an n-dimensional algebraic subset of CN (respectively
Algebraic Geometry
327
projective algebraic subset of FN). Assume n + m < N. For a given x € X, there is a dense Zariski open set U of m-dimensional affine linear spaces L C CN containing x (respectively m-dimensional linear spaces L C fN containing x) such that L D X = {x}. Moreover if x € X r e g , then U can be chosen so that in addition TL,X n Tx,x =x £ TCNIX (respectively TL,X n Tx,x =x£ T¥N^X), where TLtX! Tx,x, Tpjvj,, TCN >x are the tangent spaces of L, X, P w , C ^ respectively at x. Proof. This theorem is proved by reasoning similar to that for Theorem A.8.1. We only prove the case when X is projective algebraic (the quasiprojective case requires the projective case plus an application of Lemma 12.5.2). Fix a point x £ X C ¥N. The L € Gr(m, N) that contain x, i.e., G\ := 7T1(7r2"1(ar)) is isomorphic to Gr(m — 1,N — 1) and thus irreducible and m(N — m) dimensional. The set L containing x and a point y ^ x is isomorphic to the set G2 '•— Gr(m — 2, N — 2) and thus (m — 1)(N — m) dimensional. The set W of L that contain x and some other point y of X is thus of dimension at most (m — l)(iV —TO)+ n. Here we are using the fact that W is projective. To see this let 52 : G± —> fN denote the algebraic ir^q^iX)). mapping induced by TT2- W is the set = N-m-n>l, Since dimGi - d i m W > m(N -m) - ((m - l)(iV -m)+n) we conclude W is a proper algebraic subset of an irreducible projective algebraic set C?2- Thus there is a Zariski dense open set G2 \ W of m-dimensional projective linear spaces L containing x and no other point of X. The tangent space assertions follow by a dimension count showing that the space of L € Gi such that TL,X H TX,X ¥" x ^ TPN<X is of dimension less than dim G2- The details are left to the reader. • Theorem A.8.3 Let X be an n-dimensional affine algebraic subset of CN. Assume n + m > N. There is a dense Zariski open set U of m-dimensional linear m>N,U subspaces L C CN such that LH X is of dimension n + m — N. Ifn + may be chosen so that if L G U then L D XTeg is nonempty. Proof. Using the same sort of reasoning used in Theorem A.8.2 or a repeated use of Theorem A.7.1 gives this result. •
A.8.2
Linear Projections on FN
We need to consider the extension of projections to projective space. Such projections have traditionally been a major tool in algebraic geometry and are a perennial focus for research, e.g., (Beltrametti, Howard, Schneider, & Sommese, 2000). Let [z0,..., ZN\ denote linear coordinates on fN. As above, we regard CN C fN using the inclusion (xu ... ,xN) —> [z0,... ,zN] = [ l , x i , . . .,xN]. Thus C w = P w \ Hoo, where HOQ := V(ZQ) is the hyperplane at infinity.
328
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
A projection from WN to P m is a surjective map nL : FN \ L -> P m with n(z) = Az,
(A.8.5)
where a
A=
0,0
a
'• .SO
a
l,l
'''
'•
"•
rn,l
a
0,N
ZQ
:
and
z =
: L ZN .
••• a m . w j
and where i is the linear projective space p^- 7 7 1 " 1 c P w defined by the vanishing of the linear equations Az. L is the center of the projection. Theoretically, we work with equivalence classes of projections, considering two projections TTI, TT2 from PN onto P m equivalent if they have a common center L and there is a projective linear isomorphism T : P m -> P m with T(TTI(X)) = -K2{x) on P ^ - L. Note that two projections from P ^ to P m are equivalent if and only if they have the same center L. Thus the linear projections ¥N —> P m are naturally parameterized by the Grassmannian Gr(N — m — 1,N) of (N — m— l)-dimensional linear spaces L c PN. Geometrically, the projection -KI has a simple description. Let L be the center of the projection. Choose any P m C P ^ with the property that L D P m = 0. Given a point x e WN \L and letting (a;, L) denote the linear subspace FN~m C P ^ generated by x and L, the projection from P ^ to P m with center L sends x to (i,L)nPra. The projections nL from P ^ to P m that are extensions of projections from C^ to C m are precisely the projections with centers L C H^. Indeed, let y i , . . . , ym be coordinates on C m and let the usual embedding of C m to P m be given by { y i , . . . , y m ) -> [ w o , . . . , w m ] = [ l , y i , . . . , y m ] , Since we must have a linear equation in Xi,... ,XN when we dehomogenize with respect to w0, we conclude that A is of the form " ao,o
. a m,0
0
a
•• •
0 '
m,l ' ' ' am,N .
with o0,o / 0. Using the invertible linear transformation on P m
T
-=\ \°1
329
Algebraic Geometry
ai,o
where u :=
•
, we see that an equivalent form for A is
.am,0.
where A is as in Equation A.8.4. For example, the projection (xi,... ,XN) —» (xi,... ,a;jv_i) extends to the projection [XQ, ..., XJV] —> [xo,a;i,..., rrjv-i] with center L := {[0,..., 0,1]}. To recapitulate a main point: an equivalence class of linear projections is naturally identified with the center of the projection in the projective case and with the center of the projective extension of the linear projection in the affine case. A.8.3
Further Results on System Ranks
We have a few more properties on the behavior of the rank of a system under randomization. Lemma A.8.4 Let X C C n denote an irreducible affine algebraic set. Then for any nonnegative integer s, there is a dense Zariski open set of linear projections CN —> Cs such that the dimension of the closure of the image of X is min{dimX, s}. Proof. We regard Cn as the complement in P n of a hyperplane Hoc. We first do the case of s > dim X. As we saw above, the linear projections are parameterized by the Grassmannian G :— Gr(n - s — 1, n — 1) of linear P"~'s~1s contained in H^. We have dimXnffco < d i m X - l . Since (dimX - 1) + (n - s - 1) = n — (s - dimX) — 2 < n — 1, we conclude from Theorem A.8.1 that there is a Zariski open dense set U of G corresponding to linear pn-s-i s m i s s m g X n Hoo. Given one of these, say L, and the associated linear map irL,
t h e fiber -K^(irL{x))
t h r o u g h x G X i s (x,L)
H X.
S i n c e L n (X \ X) = 0 ,
(x, L)nX is compact and hence finite by Lemma 12.4.3. Thus by Corollary A.4.12, the closure of the image of X has dimension the same as X. The case of s < dimX follows from the case s = dimX and the observation that if s < dimX, then a dense open set of linear projections from Cd™^ —> C s are onto. • Theorem A.8.5 Let f(x) — 0 denote a system of n algebraic functions on an irreducible quasiprojective set X. For any positive integer s, there is a dense Zariski open set of matrices U C Csxn such that ifAeU, then rank A • / = min{s, rank/}.
330
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Proof. Applying Lemma A.8.4 to f{X)~; there is a dense Zariski open set U of A £ C s x n such that dim(A-/)(X) = min |dim/(X), s\ . • The following useful result adds an existence component to Bertini's Theorem. Theorem A.8.6 Let f be a system of n algebraic functions / i , . . . , / n on an irreducible N-dimensional quasiprojective algebraic set X. Assume that rankp/ — K. Then there is a Zariski open dense subset ofUd CKXn of K X n matrices such that for A £ U, any subset of i distinct functions from the K functions A • / has a nonempty solution set on Xieg, which is smooth and of dimension N — i. Proof. This follows from application of Theorem A.8.2 to the closure of the image ofXinP"-1. • A.8.4
Some Genericity Properties
We have the important generalization of Theorem A.7.1. T h e o r e m A . 8 . 7 ( S i m p l e B e r t i n i T h e o r e m f o r S y s t e m s ) Let fi,--.,fn
be a
system of algebraic functions on an irreducible quasiprojective algebraic set X with solution set V(f). For each s x n matrix A £ Csxn, let F{A,x)~
'Fi(A,x): =A-/(x) .Fs(A,x)_
on C s x n x X. There is a Zariski open dense set U C Csxn of s x n matrices such that for A £ U, it follows that V(A • / ) C X is a quasiprojective set such that if Z\ '•= V(A • f) \ V(f) is nonempty, then dim^A = dimX — s, and Z\ (~l XTeg is smooth with the differentials dFj spanning the normal bundle of ZhC\Xve&. Moreover the number of components of Z/^ fl Xreg is independent of A £ U. Proof. The set U' of A with rank equal to min{s, n} is dense and Zariski open. Therefore, by replacing any dense Zariski open set U C C s x n that is constructed below with its intersection with U', we may assume that all A G U have rank equal to m'm{s,n}. By replacing X with X\V(f) we can assume that the /; have no common zeros. Denote V(F(A,x)) C C s x n x X by Zf. By Lemma A.7.2, Zf is irreducible of dimension dimX + s(n — 1). By Lemma A.7.3, Zf f) (C s x n x Xreg) is smooth. Let TTI : Zf —> CSXn denote the algebraic map induced by the product projection sx C ™ x X —> C s X n and let TT2 : Zf —> X denote the algebraic map induced by the product projection
Algebraic Geometry
331
Therefore we may assume without loss of generality that wi restricted to Zf is dominant. By Corollary A.4.12, there is a Zariski open set U C Csxn such that for y eU, all components of ^:[1{y) have dimension dimZy - dim
z& n Xies is smooth with the differentials dFj spanning the normal bundle of Z\ n Xreg. A.9
•
Bertini's Theorem and Some Consequences
In this section we present a general Bertini Theorem about the solution sets of systems. Since the intersection of any finite number of dense Zariski open sets is Zariski open and dense, we can (and typically do) apply Bertini's Theorem to conclude that a generic choice of some parameters leads to a long list of generic properties. To state such a result succinctly, let us define the constellation of algebraic sets associated to a finite number of quasiprojective subsets X\,..., Xr of a quasiprojective set X to be the collection of sets obtained by repeatedly doing in any order the operations of (1) (2) (3) (4) (5)
taking irreducible components; taking intersections; taking the singular set of a quasiprojective algebraic set; taking finite unions; and given two sets A, B taking the set A \ A D B.
Lemma A.9.1 The constellation of algebraic sets, C, associated to a finite number of quasiprojective subsets X\,-.., Xr of an algebraic set X is a finite set of quasiprojective sets. Proof. All these operations start with quasiprojective algebraic sets and produce quasiprojective algebraic sets. To prove that C is finite, it suffices to show that the set of all the irreducible components of the quasiprojective sets obtained by these operations is finite. Since an irreducible quasiprojective algebraic set A minus a proper algebraic subset remains irreducible, the last operation leads only to the finite number of quasiprojective algebraic sets A \ A f) B for the collection of quasiprojective sets J4, B generated by the first four operations. Thus it suffices to prove the finiteness of
332
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
the collection of sets generated from X±,..., Xr by a repeated use of the operations 1), 2), 3), and 4). Since any intersection is a finite union of intersections of irreducible quasiprojective algebraic sets, it suffices to prove the finiteness of the collection of sets obtained by starting with X\,... ,Xr and repeatedly doing only the operations 1), 2), and 3). Note that the operations of taking intersections of irreducible sets and taking singular sets decreases dimensions if it leads to anything new. Thus, by the fact that dimension is finite, the operations 1), 2), and 3) lead to only a finite number of quasiprojective sets. D Let gi,... ,gs be a set of algebraic functions defined on a quasiprojective algebraic set X. Denote the solution set of all the functions, i.e., V(gi,... ,gg), by V(g). We say that g\,..., gs are simply generic with respect to a /c-dimensional irreducible algebraic set Z C X if given any integers 1 < i\ < ... < ir < s, it follows that (1) ifr >kthenV{gtl,...,gir)nZ (2) if r < k then either V{gtl,...,
cV(g); and gir) D (Z \ V(g)) is empty or
dimV(gil,...,gir)n(Z\V(g))
=
k-r
and V{gil,...,gir) n (Zreg \ V(g)) is smooth with the differentials dgit,..., dgir having rank r in the tangent space Tz,x any x e V(gil,..., gir) n (ZTeg \ V())• Given an s x n complex matrix Ai,i • • • Ai i7l A :
=
;
••.
•
'
. A s , l • ' • ^s,n .
the s x b submatrix A ( j i , . . . , j b ) 6 Csxb of A G C s x n associated to the list of integers 1 < j i < . . . < jb < n is defined to be Ai,ji • • •
A(ji,...,jfc) : =
:
••.
^i,jb
:
-As,ji ' • • *s,jb _
The following Bertini theorem expands on the conclusions reached in § A.7.3. T h e o r e m A . 9 . 2 ( B e r t i n i T h e o r e m f o r C o n s t e l l a t i o n s ) Let fi,...,fn
be a
set of algebraic functions on a quasiprojective set X. Given any finite number A\,... ,Am of quasiprojective subsets of X, let C denote the constellation of quasiprojective sets associated to (1) the sets A\,..., Am; (2) all irreducible components of X; and (3) all sets of the form V(fjl,... ,fjb) for the lists of integers 1 < jx < ... < j b < n.
333
Algebraic Geometry
Then there is a Zariski dense open set U C CsXra, such that for A e U and any list 1 < ji < • • • < jb < n the functions
:
:=A(ji,...,jb)-
\
.9s \
[fjb.
are simply generic with respect to every irreducible set in C. Proof. Since the intersection of dense Zariski open subsets of Csxn is dense and Zariski open, it suffices to prove the result for a single irreducible set Z e C of some dimension k. Further if we showed that for a given list l < i i < . . . < i r < s , the result is true ioi gi1,..., gir where "fill :
[/ji" := A ( j i , . . . , j b ) •
.9s J
:
[fjb.
with A in a dense Zariski open set U(ii, •.., ir;ji, • • • ,jb) C C s x n , we will be done by taking the intersection of these open sets indexed by the finite number of lists of integers 1 < i\ < ... < ir < s and 1 < j \ < ... < % < n. Therefore by renaming, it suffices to prove that there is a dense Zariski open set (/ C C s x n for any s
r/r
"si] :
:=A-
.9s\
:
,
[fn.
it follows that (1) if r > k then % , . . . , j s ) n Z c V(/); and = k - r and V(gi,... ,gB) n (2) if r < k then dimV(9l,... ,gs) n (Z\V(f)) (Zreg \ V(f)) is smooth with the differentials dgi,..., dgs having rank r in the t a n g e n t s p a c e Tz,x for a n y x e V(gil
,...,gZr)n
(Zreg \
These assertions follow immediately from Theorem A.8.7.
V(g)).
•
There are many versions of Bertini's Theorem in the literature, e.g., (Example 12.1.11 Fulton, 1998). For a further discussion of Bertini theorems, see also (§1-7 Beltrametti & Sommese, 1995).
334
A.10
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Some Useful Embeddings
There are some natural embeddings of algebraic sets that are useful. For simplicity, we give versions for projective algebraic sets, though similar constructions are equally useful for affine algebraic sets.
A.10.1
Veronese Embeddings
Let N and d be positive integers. The Veronese embedding is the natural embedding
to the point with homogeneous coorobtained by sending the point [ZO,...,ZN] J dinates made out of all the monomials {z \ \J\ = d} where we use multidegree notation. The restrictions of the linear equations ¥^ N >~ to the image of the Veronese embedding give all the degree d equations of VN.
A.10.2
The Segre Embedding
Let Ni,...,
Nr be r positive integers. The Segre embedding is the natural embedding P^ 1 x . . . x FNr -> p n L i W + i J - i
given by sending the point [zito,..., Z±>N1 ; . . . ; zr,o, • • •, Zr,Nr] to the point with homogeneous coordinates made out of all the monomials z\^ • • • zr^T. Remark A.10.1 The degree of the image of the Segre embedding of the multiprojective space FNl x . . . x PNr in p n r=i( Ar ;+ 1 )- 1 i s the multihomogeneous Bezout number for the system with X)[=i Ni equations all of type ( 1 , . . . , 1). This may be checked, e.g., using Equation 8.4.15, to be / V
N- \
\N1,---,NrJ-
CV"
7V-V
N1\---Nr\-
On a theoretical level, the Segre embedding shows that subsets of multiprojective spaces defined by multihomogeneous equations may be regarded as projective algebraic sets. One case is of special interest. Example A.10.2 (The Quadric Surface) Let S := P 1 x P 1 with bihomogeneous coordinates [zi,o, zi,i; Z2,o, ^2,i]- Let [wo, wi, ^2,^3] denote the homogeneous coordinates on P 3 . The Segre embedding of P ' x P U P 3 given by [wO,Wi,W2,W3]
: = [zi,0Z2,O,Zl,1*2,0, Zl,0Z2,l>zl,1^2,1]
has as image the smooth quadric V(u>oW3 — Wiit^)-
Algebraic Geometry
335
The Segre embedding is useful because it gives a consistent way of measuring the degrees of pure-dimensional algebraic sets on P^ 1 x ... x WNr. Remark A.10.3 (Measuring Degrees) Measuring degree by using the Segre embedding gives the smallest possible values of all the consistent ways of measuring the degrees of pure-dimensional algebraic sets. Other consistent ways may be obtained by using the Veronese embedding on the different projective spaces followed by a Segre embedding. In the language of line bundles mentioned briefly in § A. 13, such a choice is equivalent to choosing an ample line bundle L on M := P^ 1 x ... x ¥Nr and then denning the L-degree degL(X) of a pure fc-dimensional X C M to be c\{L)k • X, where ci(L) is the first Chern class of L. A.10.3
The Secant Variety
To derive properties of generic projections, we need to define the secant variety of an affine algebraic set X. Let X be an irreducible affine algebraic subset of C^. Given two distinct points x, y e X, we have a unique line between them, parameterized by u € C as (1 — u)x + uy. Let A denote the diagonal A := {{z, w) € X x X \ z = w} . Then the image of the map / : ( I x I \ A ) x C - > C A r , defined by f(x, y, u) = (1 u)x + uy is a constructible set by Theorem 12.5.6. The secant variety of X, denoted Sec(X), is the closure in CN of the image of this map. By Corollary 12.5.7, Sec(X) is an irreducible affine algebraic set. By Lemma 12.5.2, dimSec(X) < 2dimX + 1. L e m m a A.10.4 Let X be an irreducible affine algebraic subset of CN. If N > 2d\mX + 1, then a generic linear projection TT :
ZN) —> [XQ, . . . , XN] = [1, Z\,..., ZJV].
Let HQ := V(XQ) denote the hyperplane at infinity. Then Sec(X), the closure of Sec(X) in ¥N, meets Ho in a proper algebraic set of Sec(X). Thus dim iJonSec(X) < dim Sec (X). So we conclude that dim# 0 n Sec(X) < dimSec(X) - 1 < 2dimX < N - 1 = dimH0. Thus Ho
336
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
A.10.4
Some Genericity
Results
Let X C C ^ be an irreducible affine algebraic set. Let n : CN —> C m be a linear projection. The restriction nx, of n to X, does not have to be proper. For example, the map from the hyperbola V(xi^ — 1) to the x\ axis has as image the complement of the origin and is therefore not proper. It is a part of the Noether Normalization Theorem 12.1.5, that if 7r is general, then the restriction wx, of n to X, is proper. We now go over the geometric proof behind a form of Theorem 12.1.5, which includes some degree information.
Theorem A.10.5 (Noether Normalization Theorem) Let X C CN denote an affine algebraic set. Let TV : CN —> Ck denote a generic linear projection. Then if dim X < k, the map TTX is a proper algebraic map with all fibers nxl{y) finite for ally GY :=TT(X). If dim X < k, then there is a Zariski dense subset U C X such that iru : U —> TT(U) is an isomorphism. If X is of pure dimension k, then nx is a branched covering of degree degX. Proof. Let X denote the closure of X in P ^ . Here we embed C ^ by sending (xi,...,xN)
G CN t o [zQ,...,zN]
= [l,xi,...
,xN] 1
G FN. A s a b o v e , w e l e t # o o N Q 1P ; equal to V(ZQ). Linear
denote the hyperplane at infinity, i.e., the p ^ projections CN —> Ck correspond to (N — k — l)-dimensional linear subspaces Z/Ar_fc_i c Hoo. Fixing a general fc-dimensional linear subspace Sk C P ^ , the to map 7T£ : ¥N —> ¥k associated to £, an L w _ fc _i c i?oo, sends x ePN \ LN-k-i T^cix) = Sk n (x,Ljv-fc-i). If C does not meet the projective algebraic set X \X, then TT£ is proper when restricted to X. Since d i m X \ X < dimX < k, we conclude that the set of C C H^ that meet X \ X is a proper algebraic subset A of the Grassmannian Gr(N - k, N) of linear PJV~fcs in P ^ " 1 . This implies properness of the restrictions to X of projections 7f£ with C in the complement of A. If a fiber of -KC on X was not finite, then since the restriction of TT£ is proper, we would have a compact projective subset of X which is not finite. This is absurd by Lemma 12.4.3. If dimX < k, then it is sufficient to show that given a general point x of an irreducible component of X, a general £ = fN~k containing x meets X in no other points and the map associated to H^ n C has maximal rank at x. This makes sense since the general point of an irreducible quasiprojective algebraic set is smooth. This follows from Theorem A.8.2. If X is of pure dimension k, then a general £ = fN~k meets X in deg X points. The general map associated to C = £ fl iJoo has degree degX. • N + 1 different projections may be used to separate points. Lemma A.10.6 Let X be an affine algebraic subset ofCN, all of whose irreducible components are of dimension < k. Fix a finite set S C CN. For a general linear
Algebraic Geometry
337
projection IT : CN -> C m with m> k + 1, K(X) = ir(y) for x e S and y e X U 5 implies that x — y. Proof. Since the lemma is vacuous if m — N, we can assume that m < JV — 1 and thus that k < N — 2. We can reduce by induction to the case when m = N - 1. Let Hoc := f1^-1 denote the hyperplane at infinity in FN. Let y be a point of S. If y ^ X, consider the map
A. 11
The Dual Variety
In classical projective geometry there is a simple but basic duality between points and hyperplanes. To make this precise, let P ^ denote the A^-dimensional projective
338
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
space. A point is represented in homogeneous coordinates by an (TV + l)-tuple [zo,..., ZN\- A hyperplane is represented by a linear equation a^zo + • • • + OJVZJV = 0 with not all coefficients zero. Since multiplication of a linear defining equation of a hyperplane does not change the hyperplane, we see that there is a one-to-one correspondence of hyperplanes with points in a projective space represented in the homogeneous coordinates \ao,..., a/v]- This second projective space is referred to as the dual projective space FN*. Note the relationship is completely symmetric, i.e., P^**, the dual of the dual of FN, is just FN. The family of hyperplanes containing a P N ~ 2 C ¥N corresponds to a line in P^*. With such a duality, it is natural to try to extend it to subsets of projective space besides points and linear spaces. To see how this might be done, let C be an irreducible curve in P 2 . If C was smooth we could send x £ C to the point in P2* representing the line in P2 tangent to C at x. This curve C" is called the dual curve of C despite the fact that if C is a line then C is a point. If C is a singular curve we could define C as the closure of the image of the smooth points. For such a singular curve, the map C —• C is a rational mapping but not necessarily a function. This duality makes sense in general. Given an irreducible projective algebraic set I c P " , define the dual variety X' as the closure in PN* of the set consisting of hyperplanes which contain at least one tangent space of some smooth point of X. We can similarly define the dual of an algebraic set. There is a strong result about dual algebraic sets in complex projective space. Theorem A . l l . l
Let X be an irreducible subset of¥N. Then (XJ = X.
Proof. (Kleiman, 1986) is a good reference for this result and related material.
•
Note this result says that in the case when X is a curve in P 2 and not a line, the rational map X —> X' gives an isomorphism from a Zariski open set of X to a Zariski open set of X'. To see this note that the image of X is either a point or a curve. If it is a point then X" = X is a line. So we have that if X is not a line it has image a curve. The rational mapping X' —> X is a well-defined map on the smooth points of X'. Prom this we conclude that X - t l ' could not be r to one for an r > 1. We need a special consequence of this result. Corollary A.11.2 Let C be a pure dimension-one, not necessarily irreducible, algebraic subset o/P 2 . Assume that C has no irreducible components of degree one. Then C" = C. Further let x be a general point of any one of the components C of C with the tangent line £ to C at x. Then the defining equation of C given by Theorem A. 10.7 restricted to £ has x as a zero of multiplicity two with all other zeros of multiplicity one. Proof. Since C" = C for an irreducible curve and the degrees of the components of C are all of degree greater than one, we have from Theorem A.ll.l that the images of the components of C are distinct irreducible curves. Choosing a general point x
Algebraic Geometry
339
of a component D we get a general point of a component D' of C. This implies that any line £ tangent to a general point of a component D of C corresponds to a point of P2* not on any component of C other than D'. In particular £ must be transverse to C away from x. The condition that a neighborhood of x on C" goes isomorphically to a neighborhood of x on C is equivalent to the fact that x is a multiplicity-two zero of the restriction to £ of the defining equation of C. • A.12
A Monodromy Result
Let X be a pure fc-dimensional affine algebraic subset of CN and let Gr(m, N) denote the Grassmannian of P m s in FN. We close X up to get a pure fc-dimensional projective algebraic set X C PN. We consider the family of intersections £;v-fc H i for A;-dimensional linear spaces Lpi^k C fN. The set of pairs F := {(LN_k,x)
GGr(N-k,N) x X |x
is a projective algebraic set. This is completely analogous to the simpler construction in § A.7. We have the maps p : T —> Gr(N — k, N) and q : T —> X induced by the product projections on Gr(N — k, N) x X. Since a generic L^-k meets X transversely in a set of degX distinct points of X reg , we conclude from Corollary A.4.14 that there is a Zariski open set U c Gr(N - k, N) such that pp-i(u) '• P'1^) —> U is a finite covering. Fix a general point y € U, we have the monodromy action of the fundamental group ni(U,y) on the set p~l(y)- Statements for monodromy using slices of X follow immediately from the statements for monodromy using slices of X. Indeed, by shrinking U further it may be assumed that q(p~1(U)) C X, and so the lemmas and theorems we state hold equally for X and its closure X. Reflecting the bias in this book to regard polynomial systems as being defined on Euclidean space rather than projective space, we state the results for affine algebraic sets X in the rest of this subsection. Lemma A.12.1 If Xi is an irreducible component of X, then the above monodromy action acts transitively on the set Xi np~1(y). Proof. Note that q : J- —> X is a fiber bundle with the fibers isomorphic to the Grassmannian Gr(N — k, N). Thus the set q"1{Xi) is irreducible, and therefore the Zariski dense open subset p~l(U) D q^1(Xi) C q~l(Xi) is also irreducible. Since y is general, p~l{y) consists of smooth points of the irreducible and hence pathconnected manifold (p^1(f7) n q~1(Xi))reg. The monodromy action under a path connecting two distinct points of p~1(y) gives the transitivity. • We need a much stronger result. Choose a general affine linear subspace B :=
340
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
£N-k+i c £N containing the (N -fc)-dimensionalaffine linear space corresponding to the basepoint y € Y. Let BQ denote the (N —fc)-dimensionallinear subspace of C* parallel to B. Though in practice it does not matter, we should theoretically choose the general B first and the general space corresponding to y afterwards. We let 7r : CN~k+1 ->C:= CN~k+1/B0 be the induced linear map. Let Ls := TT" 1 ^). Let V C U be the linear curve through y corresponding to the Ls. We have the following result from (§3 Sommese et al., 2002b). Theorem A.12.2 Let X = U^1Xi denote the decomposition of a pure kdimensional affine algebraic set X c C^ into irreducible components. We assume that k > 1. Let ir~l(y) = VJTi=1Fi, where Ft = ir~l(y) n Xt with n, y, and V as above. The image ofit\ (V, y) into the automorphism group of the set n'1(y) induced by the monodromy action of slices of X by the (N —fc)-dimensionallinear subspaces Ls c CN is r
0Sym(Fi), where Sym(F;) is the symmetric group of F^. Proof. First we may discard all degree one components since they do not effect the veracity of the theorem. Next we reduce to the case when k = 1. Since B is general, we know from part 3) of Theorem 13.2.1 that each X, is irreducible. The map -K may now be regarded as the linear map from CN to C with Ls of the form LQ + sv for a fixed vector v G CN with v £ LQ. By renaming if necessary, we may assume that X is one-dimensional. Next we take a general projection TT' : C^ —* C. The generic linear map II := (7r,7r') : CN —> C2 maps X generically one-to-one to its image by Theorem A.10.5. Let TTi denote the projection of C2 onto its ith factor. There is a Zariski open dense set V of C such that ?rf 1{V) nll(X) is smooth and m : Tr^l(V')r\Tl(X) -> V is a d := degX sheeted covering map. Since n = TT\ O LT, we may regard V as an open subset of V. Since every immersion g : S1 —> V' gives an immersion g : S1 —> V, it suffices to prove the result for V'. This reduces us to the case of a curve in C2 with V a family of lines parameterized by an open Zariski dense set of a line in the dual P2 to the P 2 containing C2. This case follows in two steps. First we prove the statement for the family U of all affine lines in C2. This follows using Corollary A. 11.2 and a modification of the proof of the classical statement when X is an irreducible curve, e.g., (page 111 Arbarello, Cornalba, Griffiths, & Harris, 1985). The proof foiVcU follows from a theorem (Theorem, §5.2. Part II Goresky & MacPherson, 1988) of Lefschetz type asserting that the homomorphism TT\(V, y) —> 7Ti(U,y) induced by the inclusion V C U is a surjection. • We refer the reader to (Sommese et al., 2002b) for a more detailed proof.
341
Algebraic Geometry
A. 13
Line Bundles and Vector Bundles
We have mentioned earlier that homogeneous functions are not functions on projective space, though they are functions on a related Euclidean space. One difficulty posed by this is that the usual statements for algebraic functions on affine algebraic sets are not literally true for homogeneous functions on projective space. If homogeneous functions on projective space were the only issue, we could state the results for polynomials with slight rewording for homogeneous functions. But, faced with a number of very useful generalizations of homogeneous functions, e.g., bihomogeneous and more generally multihomogeneous functions, this is not a viable approach. In this section we first introduce bihomogeneous and multihomogeneous polynomials, and then define line bundles and their sections. A.13.1
Bihomogeneity and
Multihomogeneity
Let X denote the product of two projective spaces, P m x P n . We can denote a point in this space by a (a + b + 2)-tuple [ZQ, ..., zm; Wo,..., wn] of points with neither Zi = 0 for all i nor Wj = 0 for all j , and with the equivalence relation [zQ>...,zm;wo,...,wn]
~ [Xz'o,...,
Xz'm; fiw0,...,
p,w'n}
for all 0 j^ A e C and 0 ^ /x € C A polynomial p(z, w) in the variables ZQ,.. . ,zm,Wo,... ,wn is said to be bihomogeneous of degree (a,b) if it is of the form Yl\i\=a \j\=bcuzIwJ• Note that since p(Xz, /j,w) = Xanbp(z,w), it follows that the set where p(z,w) = 0 is a well-defined subset of Fm x P™. Similarly, we can define multihomogeneous polynomials on P™1 x • • • x Pnfc. A. 13.2
Line Bundles and Their Sections
First, let's consider the case of C^. If we have a polynomial p(z) on CN, we can think of p(z) in terms of its graph ap :— {(z, A) e C^ x C j A = p(z)} . We say that CN x C is the trivial line bundle on CN and av is a section. In loose terms, a line bundle over X is a quasiprojective algebraic set which maps onto X with fibers identified with C in such a way that the vector space structure on C is preserved. Precisely, we can define line bundles on any quasiprojective algebraic set X. A line bundle L on X consists of the data (1) UQ, .. •, Ue, a covering of X by affine Zariski open sets Ui dense in X; (2) for each 0 < i < £, 0 < j < £, an algebraic function ptj defined and nowhere zero on Uij := Ui f] Uj with pijPji = 1 on Uitj and pa = 1 on Ui\ and (3) PijPjk = Pik on Ui n Uj n Uk for all 0 < i < I, 0 < j < I, 0 < k < £. Associated to a line bundle is a space generalizing the trivial bundle. The space, also called L by abuse of notation, is covered by open sets UiXC where for x € Uij we identify {x,At) G £7, x C with (x,Aj) 6 Uj x C if Aj = pij{x)Al. The cocycle
342
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
condition pijPjk — Pik guarantees the identifications are well defined. There is a further natural, but involved, definition of when different covers and choices of pij lead to the "same" line bundle. Sections, which are basically like graphs of functions, are defined as a choice of algebraic functions at : Ui —> C with the property that
aj{x) = ai{x)pij{x) for x e Uld and all 0 < i < I, 0 < j < L An algebraic line bundle L on a quasiprojective algebraic set X is spanned by a vector space V of global sections of L if for each point x 6 X, there is at least one section s & V such that s(x) ^ 0. For example, letting [zo,zi] denote homogeneous coordinates on P 1 , we may cover P 1 with Uo :— P 1 \ V(zx) and Ux := P 1 \ V(z0). We have the coordinate z :— ZQ/ZI on UQ and w := 1/'z = Z\/ZQ on U\. We may form the line bundle Opi (d) by taking the data consisting of the function poi = l/zd on Uo fl U\. We may regard a homogeneous polynomial p(zo,z\) of degree d > 0 a s a section of Opi(d) by assigning ao(z) := p(z, 1) to Uo and
= p(l, 1/z) = z" d p(-s ; 1) = Po,io-o(^)-
Note that Opi (0) is the trivial bundle, and for d > 0 the bundles 0pi (rf) are spanned. There are no other sections of (Dpi (d) besides the ones just constructed using the homogeneous polynomials. On P ^ , the line bundles are not much more complicated than the ones just constructed for P 1 . They are in one-to-one correspondence with the integers d, with the line bundle corresponding to d being denoted Cpw(d). For d < 0 the only algebraic section of OFw(rf) is the 0-section, i.e., the choice of a cover [7; of FN and (Ti — 0 for all i. For d = 0 we have the trivial bundle, whose only sections are the constant functions, and for d > 0 the algebraic sections are again in one-to-one correspondence with the homogeneous polynomials of degree d. It turns out that up to equivalence that the only algebraic line bundle on CN is the trivial line bundle. Any algebraic line bundle L on an irreducible projective algebraic set X gives rise to a well-defined element C\{L) in the second integral cohomology group H2(X, Z) of X. This element c\(L) is called the first Chern class of L. If L has a not identically zero section s, then ci(L) is Poincare dual to the zero set Z of s. Let us assume we have line bundles L\,... ,LN on an irreducible projective algebraic set X of dimension N. If the line bundles are spanned by global sections, then given general sections Si of Li for i = 1 , . . . , N, it follows that the system siO) = 0 :
(A.13.6)
sN(z) = 0 has exactly (c\{L\) • •
-CI(LJV))
[X] isolated solutions and they are all nonsingular.
343
Algebraic Geometry
For example, if X = FN and Li = OpN(di), then the Sj are homogeneous polynomials of degree di, and we have the classical Bezout Theorem.
A. 13.3
Some Remarks on Vector Bundles
Replacing C in the definition of line bundles by C , and letting p^ be invertible r x r matrix-valued holomorphic functions we end up with the definition of a vector bundle of rank r. In terms of this definition, given line bundles L\,..., L^ on an irreducible projective algebraic set X of dimension N, and sections Sj of L, for i = 1,..., N it follows that the system given by Equation A.13.6 is equivalent to s = 0, where s = s1 © • • • © s^ is the section of the rank N vector bundle E := L\ © • • • © LN obtained by taking the direct sum of the N line bundles L^ The cohomology class c\{L) •••cN(L) e H2N{X,Z) is just the iVth Chern class Cff(E) of E, and the Bezout number for the system is just c^/(E)[X]. Such numbers are very often easy to compute. As a concrete example, we give the simplest nontrivial system on CN arising as a section of a rank N bundle on P w restricted to C^. For the bundle we take the tangent bundle Tpw of FN. The Bezout number for the system associated to a general section s of TpN is AT + 1. Written in terms of coordinates x i , . . . , x^ on CN the system becomes ' £i(x) -
Xleo(x)
'
:
=0
JN{X) -xNe0(x)_ where li{x) = a^o + auxi + • • • + dixx^ for generic choices of all the a^. By the theory of vector bundles it may be checked that this system has exactly N + 1 nonsingular isolated solutions.
A.13.4
Detecting Positive-Dimensional Components
The algebraic geometric structure that best captures what is meant by a polynomial system is that consisting of a vector bundle and one of its sections. For the sake of simplicity we have avoided line bundles and vector bundles in this book, but they are in the background and they lead to useful results, e.g., (Morgan et al., 1995) and (Morgan & Sommese, 1989). Here is one (Theorem 7 Morgan & Sommese, 1989). Theorem A. 13.1 (Morgan and Sommese) Let £ be a spanned rank N holomorphic vector bundle on an N-dimensional irreducible compact complex analytic space X. Assume that CN{£)[X\ ^ 0. Let a0 and o\ be two holomorphic sections of £. Then letting \ZQ,ZI] be homogeneous coordinates on P 1 , the solution set of 1 ZQCTQ + z\cj\ on P x X is connected.
344
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Before we start the proof of this theorem we would like to show what it says about down-to-earth polynomial systems. Let X := P^ 1 x • • • x ¥Nr be a product of projective spaces. Consider "systems of polynomials" a consisting of N := YH=i Ni equations where the ith equation has the nonnegative multidegrees d^i,... ,dktT with respect to the multihomogeneous structure. Letting TTJ be the product projection of X onto the jth factor P ^ , a is a section of the bundle
£:=0(0*;
(A13-7)
When the solutions of a system a are isolated and nonsingular, then the Nth Chern class of £ evaluated on X, i.e., CN{£)[X], equals the number of points in V(a). In the case of the £ of Equation A.13.7, c^(£)[X] equals the coefficient of tr 1 • • • t^r inIT^=1(M»,i + \-Uditr)Next let ai be a section of the £ in Equation A. 13.7 with isolated nonsingular solutions. Assume that we know the solutions of <j\. Now consider the "homotopy" CTJ := (1 — £)<7o + itai where ao is a second section, whose solution set we are computing. We know that for all but a finite number of 7 £ 5 1 , the solution set of <7t — 0 is isolated and nonsingular for t G (0,1]. By Theorem A.13.1, we know that the limits of V(at) as t —> 0 include points from every connected component of V(a0). Proof, (sketch of the proof of Theorem A.13.1) Letting X be a desingularization of X by Theorem A.4.1 and noting that sections from a Zariski open dense space of sections of £ are nowhere zero on a proper analytic subset of X, we conclude that we can assume that X is smooth without any loss of generality. Analogously to the arguments in § A.7, the universal space of solutions of sections of £ is a smooth connected projective bundle over X. Using the proof of item (3) • of Theorem 13.2.1, we have the connectedness. A. 14
Generic Behavior of Solutions of Polynomial Systems
Systems of polynomials that arise in engineering and science often depend on parameters. In this section, we take a general approach to polynomial systems with parameters, and discuss what we can say about the dependence of solution sets on the parameters. There are two questions we are interested in: (1) what properties hold for general values of the parameters, e.g., a well-defined number of isolated solutions; and (2) given some property for a system with a special value of the parameter, e.g., having an isolated solution, what can we conclude for general values of the parameters.
345
Algebraic Geometry
Since the proofs require material beyond the scope of this book, we refer to references for essential points. Our approach is the same as (Morgan & Sommese, 1989), though the focus there was mainly on isolated solutions of systems. Let / i ( z i , . . .,xN;qi,..
.,qM)
:
f(x;q):= _fn{xi,
• • • ,XN',Ql,
(A.14.8) • • • ,QM) .
be a system of polynomials of (x;q) G CN x C M . We regard this as a family of polynomial systems in the x variables with the g-variables as parameters. Though the algebraic system given in Equation A.14.8 is quite general, it is not general enough. We need to allow also the possibility that systems in the family are defined on any algebraic subset of
(A.14.9)
be the restriction to X x Y of an algebraic section of an algebraic rank n vector bundle £ on X x Y, where X is a Zariski open and dense subset of an iV-dimensional connected projective manifold X, and Y is an irreducible smooth quasiprojective algebraic set of dimension M. A special case of this would be the situation that X x Y is a smooth Zariski open set of an irreducible projective algebraic subset of some projective space and f(x; q) consists of the restriction of n homogeneous polynomials fi(x; q) to X x Y. Though we briefly discussed vector bundles in § A.13, we suggest strongly, the first-time reader proceed with X := CN, Y := C M , and f(x;q) = 0 in Equation A.14.8 satisfying the extra property that it is a set of n polynomials on CN+M. Let X denote the nonreduced solution set of f(x; q) = 0, and let Z := V(f(x; q)) denote the reduction of X. Let n : X —> Y be the map induced from the product projection X x Y —> Y {CN x C M —> C M if you are following in the simpler setup). Let Xo denote the union of irreducible components Z of Z such that TTZ is dominant and such that dim Z = M. T h e o r e m A.14.1 If n = N and if there is an isolated solution (x*;q*) of f(x;q*) = 0, then (x*;q*) £ XQ. Moreover there are arbitrarily small complex open sets U C X x Y that contain (x*; q*) and such that (1) (x*;q*) is the only solution of f(x; q*) = 0 in U D (X x {<7*})/ (2) f(x;ql) = 0 has only isolated solutions for q' G TT(W) and x G U fl (X x {q1}); and
346
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
(3) the multiplicity of (x*;q*) as a solution of f(x;q*) = 0 equals the sum of the multiplicities of the isolated solutions of f(x; q1) = 0 for q' € ir(U) and x € Un(Xx{q'}). Proof. The first two statements are proven in Theorem A.4.17. Since the codimension of XQ is N and XQ is defined near (x*;q*) by N functions, we know that Xo is a local complete intersection near (x*;q*). Thus Xo has at worst Cohen-Macaulay singularities. Since Y is smooth, this implies that -Kx0 '• X$ —> C M is flat in a neighborhood of (x*;q*). Here we are using the nonreduced structure of XQ in the neighborhood of (x*;q*). Flatness yields this result, e.g., (Prop. 3.13 Fischer, 1976) and the corollary following that proposition. • Corollary A.14.2 Let f(x;q) be as in Equation A.14-9. Then there is a Zariski open set U
tions counting multiplicity is d := yjidj, where the sum is always finite and i=l
bounded by the product of the JV largest degrees (with respect to the x variables) of the equations making up f(x; q) = 0. We call d, the generic Bezout number or the generic root count of the system f(x; q) = 0. Theorem A. 14.1 is one large reason why we use square systems. The following example is typical of the case n > N. Example A.14.4
For a system of polynomials in (x; qit q2) € C x C2, take
For q\ — q§, the system has isolated solutions, but for q\ ^ q2., there are no solutions. Theorem A.14.5 Assume that M + JV > n and that there is an isolated solution (x*;q*) of f(x;q*) = 0 where f(x;q) is as in Equation A.14.9. There is a germ of an irreducible complex analytic set Q containing q* with dim Q > M — {n — N) such that for all points q' in arbitrarily small open sets U
Algebraic Geometry
347
obtain a system equivalent to f(x;q) = 0. Using Theorem 12.2.2 successively, we cut X'Q down to an affuie algebraic set with dimension > M - (n — N). Take a component Z of this set at (x*;q*). By Lemma A.4.16, there are arbitrarily small open sets V of q* on this component (in the complex topology) on which the restriction 7rgn7r-i(y) : ZPi-K~l(V) —> V is proper (in addition to being finite by construction), e.g., see (§3, Theorem 8(b) Gunning, 1970) for a discussion. By Theorem A.4.3 applied to map irzr\-ir-1{y)i w e a r e done. • Remark A.14.6 A similar statement to Theorem A.14.1 can be proved, when we are talking about k dimensional components in place of an isolated x*. In this case when M + N > n + k, we get a Q of dimension at least M + N — n — k. A.14.1
Generic Behavior of Solutions
As at the start of the section, let f(x;q) be as in Equation A.14.9 (or simply as in Equation A. 14.8). We let X denote the solution set of f(x;q) = 0 with the induced nonreduced structure, and TT : X —> Y the induced morphism. The easiest route to generic statements is to exploit the fact that the morphism IT : X —> Y is "generically flat." Before we do this, let us show some generic properties, just to give the flavor of how the arguments go. We will continually choose smaller Zariski open dense sets U CY, and by abuse of notation call them U. Lemma A.14.7 There is a Zariski open dense set U cY such that either TT"1 ([/) is empty or n^-i^ : 7r^1(t/) —> U maps every irreducible component of X surjectively onto Y. Proof. To see this note that there are finitely many irreducible components Z of X. The set TT(Z) is constructible by Theorem 12.5.6, and so either n(Z) is Y or a proper algebraic subset of Y. Setting U equal to the complement of the union of the proper algebraic sets arising in this way, we can assume TT(Z) is dense in U for every component of Xu, the solution set of f(x;q) over U. We know, by Lemma 12.5.8, that for such a Z there is a Zariski open dense set of Y contained in ir(Z). By taking the intersection of these sets, we get a Zariski open dense set U with the desired property. • Lemma A.14.8 There is a Zariski open dense set U C Y such that given any irreducible component Z of n~1(U), TT^-I^) : TT~1([7) —> U maps Z surjectively onto Y with every fiber of nz having dimension exactly dim Z — M. Proof The argument follows from Corollary A.4.7 combined with the same reasoning as Lemma A.14.7. • The same sort of arguments yield the following result.
348
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Lemma A. 14.9 There is a Zariski open dense set U C Y such that given any distinct irreducible components Z\ and Z2 ofn'1^), either Z\V\Zi — 0 or 7rff-i((/) : TT~1(U) —> U maps every irreducible component W of Z\ D Z 2 surjectively onto Y with every fiber W of irw having dimension exactly dim W — M Many results such as this are immediate consequences of the generic flatness theorem. The generic flatness theorem is a useful algebraic result of Grothendieck, e.g., (pg. 57 Mumford, 1966), which Frisch (Frisch, 1967) showed holds for holomorphic maps between complex analytic spaces. We are not going to define flatness, but it geometrically says "fibers change without discontinuity." Good places to read about flatness (and some of the results that justify such a statement) are (pg. 146-161 Fischer, 1976) and (Chapter III. 10 Mumford, 1999). The generic flatness theorem mentioned above says that there is a Zariski open set U C Y such that either 7r~1(C/) is empty or T I V - I ^ ) : TT~1(C/) —> U is a flat surjection. From here on we will assume that U is not empty, since the statements we show are all trivially true in that case.
The generic irreducible decomposition Is there a "generic irreducible decomposition?" The answer is a strong yes, but first we must understand what we mean by this. For a point y € Y, let Xy denote the solution set of f(x; y) = 0. Forgetting about multiplicity information, we have the irreducible decomposition dimZy /
\
(J M J Zy,it3.
Zy:=V(f(x;y))=
(A.14.10)
We would like there to be a Zariski open dense set U
n-\U), i.e., dim Zy /
Zv=
\
U (\J Zv,itk i=l
\keJi
.
(A.14.11)
/
Note we are using Lemma A. 14.8, which tells us that given any irreducible component ZUthk of ZJJ, dimZ[/,j;fc = M + d\mZUylyk,y = M + i, where ZUtitk,y = Zu,i,k n (X x {y}).
349
Algebraic Geometry
Theorem A.14.10 Let f(x;q) be as in Equation A.14-9. Then there is a Zariski open dense set U C Y such that for any y £ U and each Zu,i,k occurring in Equation A.14-11, it follows that Zy^fcClTr—1(y) is a union of the irreducible components Zy,i,j °f Zy occurring in Equation A. 14-10. Moreover, for each of the i,k, all fibers of Zjj,i^k under n have the same number of components. Proof. Assume that it is not true, for the U selected in Lemmas A. 14.7, A. 14.8, and A.14.9, that Zu,i,k H Tr~1(y) is a union of the irreducible components 2y,i,j of Zy. Then one of the components Zy^j of Zy must contain one of the components of Zu^^k H ir~l{y). Moreover one of the components Zuytk' of Z\j must contain Zyjj. Thus we get that Z[/,i',fc' H •Zf/.i.fc contains a component W with fiber under 7T of dimension i. But this means W is dense in Zu,i,k, which gives the absurdity that Zutifk C Zuyk'By Theorem A.4.20, we may shrink U to a smaller dense Zariski open set U, so that each Zu,%,k contains a smooth Zariski open set W such that for all y e U, W Pi 7r~1(y) is dense in 7r^1(y); and IT : W —» U is of maximal rank with all fibers • having the same number of irreducible components. A.14.2
Analytic Parameter Spaces
It is a natural question to ask whether the results in this section are true when the parameters do not vary algebraically but only vary holomorphically. The short answer is "yes, with certain minor modifications." Because it is useful to allow complex analytic parameters, we explain what we mean by this and moreover state the generalization of the above results with the changes needed to prove them. In this one subsection, Zariski topology refers to the Zariski topology using zero sets of sets of homomorphic functions. The simplest case is a system '
fi(xi,...,xN;qi,...,qM)~
f(x;q):=
(A.14.12)
: Jn(xi,-
•• ,xN;qi,.
..,qM).
of holomorphic functions of (x; q) £ CN x C M , that are polynomial in the x variables. We regard this as a family of polynomial systems with the q-variables as parameters, there is a positive integer di such that i.e., for each i — l,...,n,
fi(x;q)= ] T aI(q)xI, \i\
where each ai(q) is holomorphic on all of C M . The situation analogous to Equation A. 14.9 is a system f(x;q),
(A.14.13)
350
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
which is the restriction to X x Y of a holomorphic section of a holomorphic rank n vector bundle £ on X x Y, where X is a smooth and dense Zariski open subset of an irreducible projective algebraic set X of dimension N, and Y is a connected complex manifold of dimension M. We allow the possibility that X — X. In the case of Equation A.14.12, X = P^, and £ = FN and £ = OPN(di)
® • • •®
O¥N(dn),
and f(x;q) = fi(x;q)(B- • -®fn(x;q). As before, the first-time reader should assume that X := CN, Y := C M , and f(x; q) = 0 is as in Equation A.14.12. As above we let n : V(f) —> Y denote the holomorphic mapping induced by the product projection X x Y —» Y. We let n : V(f) —> Y denote the holomorphic mapping induced by the product projection X x Y —> Y. If Z is an irreducible component of V(/), then by Theorem A.4.3, W (Z) is a complex analytic subspace of Y. Since n {Z\Z) is a proper complex analytic subspace of n (Z), we conclude that U :— 7f (Z) \ W [Z \ Z) c n(Z) is a Zariski open dense subset of TT(Z). This plays the role of Lemma 12.5.8. We will continue to replace U by Zariski open dense subsets of U as needed, and call them by the name U. This implies that each irreducible component of 7T~1(C/) maps surjectively on U. We state only the analogues of Theorem A.14.1 and Corollary A.14.2. Theorem A.14.11 If n = N and if there is an isolated solution (x*;q*) of f(x;q*) = 0, then (x*;q*) € XQ. Moreover there are arbitrarily small open sets U C X xY that contain (x*; q*) and such that (1) (x*;q*) is the only solution of f(x;q*) = 0 in U n (X x {q*}); (2) f(x; q') = 0 has only isolated solutions for q' € TT(U) and x 6 U ("1 (X x {9'}); and (3) the multiplicity of (x*;q*) as a solution of f(x;q*) = 0 equals the sum of the multiplicities of the isolated solutions of f(x; q') — 0 for q' e TT(W) and x G Un(Xx{q'}). Proof. The argument is the same as that for Theorem A.14.1.
O
When working with complex analytic spaces it is useful to define an analytic Zariski open set to be a subset U C X of an irreducible complex analytic space of the form X \Y where Y is a complex analytic subspace of X. All the usual notions, e.g., probability-one and generic point, carry over with no change. We would call the Zariski open sets we have dealt with up to now, algebraic Zariski open sets, if we needed to deal in any significant way with both sorts of Zariski open sets. Corollary A.14.12 Let f(x;q) be as in Equation A. 14-13. Then there is an analytic Zariski open set U C C M such that for q € U the system f(x; q) = 0 has di
351
Algebraic Geometry
isolated solutions (not counting multiplicity) of multiplicity i where di is an integer independent of q &U. Remark A.14.13
Thus, as in the purely algebraic case, the generic number of oo
isolated solutions counting multiplicity is d := ~S^idi, where the sum is always i=\
finite and bounded by the product of the N largest degrees (with respect to the x variables) of the equations making up f(x; q) — 0. We, as in the purely algebraic case, call d, the generic Bezout number or the generic root count of the system f{x;q) = 0. Corollary A. 14.12 holds with X singular.
Appendix B
Software for Polynomial Continuation
There is much to be said for the motto "learn by doing," and in our case, this means solving polynomial systems with numerical continuation. Even though this book offers substantially all the information one would need to write a solver from scratch, that is rather far beyond the level of commitment most readers will muster. To provide an easy entry to the area, we provide a suite of m-file routines called HOMLAB for performing polynomial continuation in the Matlab environment. After gaining experience with HOMLAB, one may wish to download one of several freely available software packages for polynomial continuation. These may offer speed advantages and advanced options, such as polytope methods, not available in HOMLAB. Some of these have been adapted to run on multi-processor machines for large computations. A partial listing of packages available as of the writing of this book is as follows.
• HOMLAB runs in the Matlab environment and implements general linear-product homotopies and parameter homotopies. See Appendix C.
• HOMPACK, HOMPACK90, POLSYS_PLP are a sequence of increasingly sophisticated continuation algorithms, written in Fortran. The "PLP" in POLSYS_PLP stands for Partitioned Linear Products, a special case of the general linear products discussed in § 8.4.3. This code finds only isolated solutions for square systems (same number of equations as variables).
• PHoM is a C++ code that implements polyhedral homotopies (see § 8.5). This package finds isolated solutions for square systems.
• PHCpack implements a variety of homotopies in a menu-driven interface that includes all the structures discussed in Chapter 8, except polynomial products. In addition to isolated roots, the algorithms from Part III of this book for handling positive dimensional solutions, nonsquare systems, etc., are implemented. This package is written by our collaborator, J. Verschelde, and it has been the experimental platform for validation of these algorithms. Both executables and Ada source code are available.
• Algorithms for mixed volume computations can be found on T.Y. Li's webpage. This is the most difficult phase of a polyhedral homotopy (§ 8.5).
• BERTINI is a C code soon to be available on A. Sommese's webpage. An effort of D. Bates, C. Monico, A. Sommese, and C. Wampler, led by A. Sommese, BERTINI features a high-level interface for parameter homotopies (including automatic differentiation) and multiple-precision routines that can adjust precision on the fly.
As URLs are often subject to change, we suggest that the packages be located by use of a search engine.
Appendix C
HomLab User's Guide
HOMLAB, a suite of scripts and functions for the Matlab environment, is designed as an easy entry into the use of polynomial continuation and, for the experienced user, as a platform for experimental development of new methods. Many of the exercises of this book assume the availability of HOMLAB, and special routines using HOMLAB functions are provided for some exercises. The use of a routine for a particular exercise is described in the exercise statement itself, while the general structure and use of HOMLAB is documented below. The best way to learn HOMLAB is simply to work the exercises in the order they appear in this book. These progress from the simple application of the core path-tracking routine to successively more sophisticated homotopies that use it. Help describing the usage of individual routines, say, endgamer.m, is available by typing "help endgamer" at the Matlab prompt. The main text of this book is the reference for the methodologies used, and the help facility just mentioned is the reference for individual routines. However, to help the user in getting started quickly, we provide this user's guide. We assume the user has at least a minimal acquaintance with Matlab; in particular, the user must know how to write and execute simple scripts and functions. A script is a sequence of Matlab commands recorded in a file, say "myscript.m," which are executed by typing >> myscript at the Matlab prompt, here indicated as ">>". (Scripts can also be called within other scripts or functions.) A function is a file, say "myfunc.m," which starts with a declaration line something like
function [out1,out2]=myfunc(in1,in2,in3)
followed by lines of Matlab code that compute the two outputs, out1 and out2, from the three inputs in1, in2, and in3. This function might be called as [a,b]=myfunc(0.1,[1 3],x), where x is an existing variable in the workspace. For more on using Matlab, please see the Matlab documentation.
C.1 Preliminaries

C.1.1 "As is" Clause
HOMLAB is distributed free of charge on an "as is" basis. Its intended usage is educational, so that the user may gain a greater understanding of the use of numerical homotopy continuation for solving systems of polynomial equations. Any other use is strictly the user's responsibility.
C.1.2 License Fee
There is no license fee for HOMLAB. In lieu of this, we hereby request each user to buy a copy of this book.

C.1.3 Citation and Attribution
The use of HOMLAB for research purposes, either in its original form or as modified by the user, is highly encouraged, subject only to professional ethical conduct, as follows. In publications based on results obtained using HOMLAB or its successors, the use of HOMLAB should be acknowledged and this book should be cited. The author of the code is Charles Wampler. Any redistribution of HOMLAB in unaltered form must retain the same name and acknowledge the author. Any distribution of derived codes that extend or modify HOMLAB should acknowledge the original source and authorship. In addition, the differences from the original should be clearly documented and attributed to the new author. These conditions extend to users of the derived codes.

C.1.4 Compatibility and Modifications
HOMLAB is a suite of Matlab routines. Version HOMLAB1.0 has been restricted to the conventions of Matlab v.4.0 to provide compatibility with both old and new Matlab installations. (Even the file names have been restricted to eight characters for compatibility with old operating systems.) The exception to this rule is that routines based on Part III of this book for generating witness point supersets use cell arrays to store sets for different dimensions. Users who advance to that level will need a more recent version of Matlab, or else they must modify the code. All routines have been verified to run under Matlab v.6.5. By avoiding advanced features of newer versions of Matlab (except as just noted), we hope the package will be easier to translate to run in other environments, in case some readers lack access to Matlab. In particular, Octave and SciLab are both freely available packages that implement a large subset of Matlab functions, so they are good candidates for substitute environments. Anyone who successfully
ports HOMLAB to one of these, or similar, environments is requested to notify the authors and to make the ported version freely available. Citation of HOMLAB and this book is required, and any differences in functionality must be documented. The authors are not bound to fix bugs in the current version or to upgrade HOMLAB for compatibility with any future release of the Matlab product. However, user comments and bug reports are welcome, so that, at our discretion, we can maintain and possibly improve the educational value of the package. Please see the HOMLAB webpage for instructions on how to submit a comment or bug report. The exercises for this book have been written under Matlab v.6.5. Some of these use features not available in previous releases, namely function pointers and function files that include subfunctions in the same file. This should be more convenient for those with an up-to-date release of Matlab; those with old versions will, we hope, have little trouble revising the source code to run in their environment.

C.1.5 Installation
As a suite of m-files, HOMLAB becomes functional by simply adding the folder containing the routines to Matlab's search path. The folder for the current release, HOMLAB1.0, is HomLab10. Let's say that you have copied this folder onto your machine with the full path name of c:\mypath\HomLab10, where "mypath" could be any path in the file structure of your machine. There are three basic options for adding HOMLAB to the Matlab path:
• In Matlab (v.6.5 and above), use "File -> Set Path" on the Matlab menu bar to launch a dialog box for setting the path and use it to add c:\mypath\HomLab10 and its subfolders to the top of the search path. The change becomes effective immediately in the current session, while the "Save" button in the dialog box records it for future sessions.
• At the Matlab prompt, use the command >> addpath c:\mypath\HomLab10. HOMLAB will then be available for the current session only. Similarly, add the subfolders of HomLab10 to the path.
• Create a file called startup.m in a directory already on Matlab's search path and put the appropriate addpath commands there (a minimal sketch of such a file follows this list). HOMLAB will then be available for all future sessions.
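For the third option, a startup.m might look like the following sketch; the path c:\mypath\HomLab10 is the example path used above, and the use of the standard Matlab function genpath to pick up every subfolder at once is our own suggestion, not a HOMLAB requirement.

% startup.m -- a minimal sketch; replace c:\mypath with your own location
addpath(genpath('c:\mypath\HomLab10'));   % adds HomLab10 and all of its subfolders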
Any one of these three options is sufficient. See the Matlab help facility to obtain more detailed instructions on modifying the search path. To test if the installation is successful, type >> simpltst at the Matlab prompt. If all is well, the session should look something like:
>> simpltst
Number of start points = 2
elapsed_time = 0
Path 1
elapsed_time = 1.4100e-001
Path 2
elapsed_time = 3.1300e-001
The solutions are:
   1.0000e+000 -2.2204e-016i    -1.0000e+000 -1.1102e-016i
   1.0000e+000 -1.6653e-016i    -1.0000e+000 -1.6653e-016i
   1.0000e+000                   1.0000e+000 +5.5511e-017i
>>
The times will vary according to your machine, and the tiny values of the imaginary parts of the answers will typically change with each run. This test solves the simple system

    x^2 - 1 = 0,
    xy - 1 = 0,

in the homogenized form

    x^2 - w^2 = 0,
    xy - w^2 = 0,

using a two-path homotopy based on the linear-product formulation f_1 ∈ ⟨x,1⟩ × ⟨x,1⟩, f_2 ∈ ⟨x,1⟩ × ⟨y,1⟩. Accordingly, the answers should be (x,y,w) = (1,1,1) and (-1,-1,1), as above. More information on interpreting the results is given below.

C.1.6 About Scripts
In HOMLAB, the high-level functions are written not as true functions, which hide their internal variables from the workspace, but as scripts, which are a sequence of commands that run directly in the top-level workspace. One advantage of this is that a Matlab save command can save all of the data necessary to execute an exact re-run, including all random constants used in defining a start system, and so on. A negative consequence is that all such data is in the workspace until the user clears it. If one wishes to avoid this, one can write a function to call the HOMLAB script and pass out only the desired results.
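For instance, a thin wrapper along the following lines keeps the base workspace clean; this is our own sketch, and the names solve_mysys and mysolvescript are hypothetical placeholders for whatever solving script you use.

function xsoln = solve_mysys()
% Run a HOMLAB solving script inside this function's workspace so that its
% many intermediate variables stay out of the base workspace; only the
% solution matrix is returned.  'mysolvescript' is a hypothetical script
% that ends with xsoln defined.
mysolvescript;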
C.2 Overview of HOMLAB
HOMLAB is a collection of compatible routines for defining and executing homotopy algorithms. The workhorse routine is endgamer.m, which tracks solution paths for a homotopy h(x,t) = 0 from a list of startpoint solutions of h(x,1) = 0 to their
endpoints satisfying h(x,0) = 0. Specifically, endgamer has the usage

[xsoln,stats,xendgame]=endgamer(startpoint,hfun)

which is more completely documented in § C.7 below. Briefly, the inputs are startpoint, a matrix with one startpoint solution of the homotopy in each column, and hfun, a string name of the homotopy function. The matrix xsoln lists the endpoints of the solution paths in columnwise fashion. As its name suggests, endgamer applies an endgame to get better estimates of the endpoints for paths which approach singularities as t -> 0. Specifically, it uses the power-series endgame described in § 10.3.3. Usage of HOMLAB mainly comes down to specifying a homotopy and finding its start points. This can be done by writing one's own m-files or by making use of utilities and drivers in HOMLAB. The main alternatives are as follows.

Linear Products This option includes total-degree homotopies (§ 8.4.1), multihomogeneous homotopies (§ 8.4.2), and general linear-product homotopies (§ 8.4.3). The user must specify a target function f(x), its derivative f_x(x), and the linear-product structure. Automatic differentiation is available if the function is specified in fully-expanded form (see § C.3.1). Driver routine lpdsolve does everything else to construct and solve a homotopy of the form

    h(x,t) = γ t g(x) + (1-t) f(x) = 0.

That is, lpdsolve constructs a compatible start system g(x), solves it, and calls endgamer to get the final answers. See § C.4 for details.

Parameter Homotopy This option handles general homotopies of the type described in Chapter 7. The user gives a parameterized function f(x,q), its derivatives f_x(x,q) and f_q(x,q), starting and ending parameter values q_1 and q_0, and startpoint solutions for f(x,q_1) = 0. (Usually, the start points are found with a single linear-product run, then parameter homotopy is used for all subsequent runs for various target values of q_0.) A means is provided for selecting a linear path from q_1 to q_0; otherwise, the user must write an m-file to implement a nonlinear path. When the linear path is selected, the homotopy is of the form

    h(x,t) = f(x, t q_1 + (1-t) q_0) = 0.

Secant Homotopy This option solves homotopies of the form

    h(x,t) = γ t f(x,q_1) + (1-t) f(x,q_0) = 0.

The user supplies the function f(x,q), the derivative f_x(x,q), and startpoint solutions to f(x,q_1) = 0. (Again, as in the parameter continuation case, one usually solves f(x,q_1) = 0 with a single linear-product homotopy, reusing the same q_1 for subsequent homotopies to various target values of q_0.) It is the
user's responsibility to verify that the homotopy is valid in the sense that the linear combination of two functions from the family is still in the same family. This is the least used option, but as shown in Exercise 7.6, it is sometimes handy. The usual process involves creating two m-files:
• a function defining the system to be solved,
• a script that sets up the required data structures before calling endgamer to get the solutions.
The exception is if one chooses to specify the function in "tableau" form (§ C.3.1), in which case the function evaluation routine is already provided. Facilities are available to make the whole process easy in the most common formulations, while the more advanced user can directly access the basic routines to implement specialized homotopies. In the next few sections, we illustrate each of the main options by examining example scripts and functions.

C.3 Defining the System to Solve
HOMLAB allows a target system to be defined in one of two ways: as a fully expanded sum of terms or as a black-box function. The fully expanded form is convenient for simple, sparse polynomials, while user-written functions are more flexible and often more efficient. Parameterized families of systems must always be written as a user-defined function, but the underlying functions that HOMLAB uses for evaluating fully expanded functions can be employed in a user-defined function as well.
C.3.1 Fully-Expanded Polynomials
The simplest option for specifying a target polynomial is to list out its monomials and coefficients. As discussed in § 1.2, this is not generally an efficient formulation: for complicated problems, straight-line programs can require much less computation. However, for simple systems, the fully expanded form is quite reasonable. HOMLAB supports a "tableau" style definition for systems, wherein the entire polynomial system is laid out in a single numerical matrix with n + 1 columns for an n-variable problem. The convention is that each row is a term of a polynomial, with the coefficient in the first column and the exponents d_1,...,d_n for monomial x_1^{d_1} ⋯ x_n^{d_n} in the remaining columns. The end of a polynomial is marked by a row with a negative exponent for x_1. A complete script for solving the system

    x^2 - x - 2 = 0,
    xy - 1 = 0,

using a tableau definition of the system and a total-degree homotopy is as follows.
% Define the target system in tableau form
eop = [0 -1 0];   % marker for end of polynomial
tableau = [ 1  2  0
           -1  1  0
           -2  0  0
            eop
            1  1  1
           -1  0  0
            eop ];
% decode tableau and solve with total-degree homotopy
totdtab
% display the dehomogenized solutions
disp('The solutions are:');
disp(dehomog(xsoln,1e-8))

The total degree is 2 · 2 = 4, and there are two finite solutions, [x,y,w] = [2, 0.5, 1] and [-1, -1, 1], and a double root at infinity, [0, 1, 0]. More information on the solution script totdtab is given in § C.4. A related script, lpdtab, can be used to solve tableau-style systems using multihomogeneous or general linear-product homotopies. Using this capability, a two-path version of the above would be as follows.

% Define the target system in tableau form
eop = [0 -1 0];   % marker for end of polynomial
tableau = [ 1  2  0
           -1  1  0
           -2  0  0
            eop
            1  1  1
           -1  0  0
            eop ];
% define a linear-product decomposition
xw=[1 0 1];
yw=[0 1 1];
LPDstruct=[ xw; xw; xw; yw ];
HomStruct=[];   % default to 1-homogeneous
% decode tableau and solve with linear-product homotopy
lpdtab
% display the dehomogenized solutions
disp('The solutions are:');
disp(dehomog(xsoln,1e-8))
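As a further illustration of the tableau convention (our own example, not one of HOMLAB's demos), the pair of circles x^2 + y^2 - 1 = 0 and x^2 + y^2 - 2x = 0 in the two variables (x,y) would be entered as:

eop = [0 -1 0];          % end-of-polynomial marker, as above
tableau = [ 1  2  0      % x^2
            1  0  2      % y^2
           -1  0  0      % -1
            eop
            1  2  0      % x^2
            1  0  2      % y^2
           -2  1  0      % -2x
            eop ];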
See § C.4 for details on specifying linear-product structures, and see the help for lpdtab for details on how this script automatically homogenizes the tableau-style polynomial using the information in HomStruct. These scripts parse the tableau matrix into a more basic form and pass the results to a built-in function, ftabcall, which in turn calls function ftabsys. The latter can be used directly if one wishes to write a straight-line function (see next section) while specifying some subset of the polynomials in tableau form. For details, use the help facility or look at the source code for ftabsys.
C.3.2 Straight-Line Functions
For more complicated functions, it is usually more efficient to express them in straight-line form. There are two contexts in HOMLAB where a user might define such a function:
• to define a target polynomial for solution in a linear-product homotopy, or
• to define a parameterized family of functions for a coefficient-parameter homotopy.
In the first case, the function must have the form
function [f,fx]=function_name(x)
where x is the input variable list, f is the output function value, and fx is the output Jacobian matrix of partial derivatives df/dx. The function must be homogeneous, possibly multihomogeneous. The careful reader might raise an objection that a homogeneous polynomial on P^n is not truly a function (see § 12.3), but for our purposes we consider it as a function on C^{n+1}, which it certainly is. The script which defines the linear-product homotopy appends random linear equations to effect the projective transformation of § 3.7, one such equation for each projective subspace when working multihomogeneously. To repeat the example above of the system
    x^2 - x - 2 = 0,
    xy - 1 = 0,
in straight-line form, one could define the function
function [f,fz]=simplfcn(z)
% Straight-line function for
%   x^2-x-2=0, xy-1=0
x=z(1); y=z(2); w=z(3);
f = [ x^2-x*w-2*w^2
      x*y-w^2 ];
fz = [ 2*x-w, 0, -x-4*w
       y,     x, -2*w ];

This is not really useful for such a simple example, but it can be significant for more complicated systems. Notice the use of the homogeneous coordinate w. Similarly, parameterized functions must also be homogeneous in the unknowns, but not necessarily in the parameters. The Matlab format for a parameterized family of systems is simply
function [f,fx,fp]=function_name(x,p)
where the third output, fp, is the matrix of derivatives df/dp. Here is a complete specification for the intersection of two circles, where a subfunction for a single circle is used twice.
function [f,fx,fp]=twocircle(x,p)
% Straight-line function for intersection of two circles
f=zeros(2,1); fx=zeros(2,3); fp=zeros(2,6);
[f(1),fx(1,:),fp(1,1:3)]=onecircle(x,p(1:3));
[f(2),fx(2,:),fp(2,4:6)]=onecircle(x,p(4:6));
%
function [f,fx,fp]=onecircle(z,p)
% straight-line function for one circle
% parameters are [cx;cy;r^2] where (cx,cy)=center, r=radius
x=z(1); y=z(2); w=z(3);
cx=p(1); cy=p(2); rsq=p(3);
a=x-cx*w; b=y-cy*w;
f = 0.5*( a^2 + b^2 - rsq*w^2 );
fx = [ a, b, -cx*a-cy*b-rsq*w ];
fp = [ -w*a, -w*b, -0.5*w^2 ];

The most error-prone part of writing a straight-line program is in generating the derivatives. To aid in debugging, utilities are provided to numerically check the coding of the function. See § C.3.4.
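As a quick sanity check of twocircle (our own illustration, with hypothetical parameter values), one can evaluate it at the affine point (1,0), written homogeneously as z=[1;0;1], which lies on both of the circles chosen below, so both residuals should be numerically zero.

% circle 1 centered at (0,0), circle 2 at (2,0), both with r^2 = 1;
% the point (1,0) lies on both circles
p = [0; 0; 1;  2; 0; 1];
[f,fx,fp] = twocircle([1;0;1],p);
disp(f)    % both entries should be ~0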
C.3.3 Homogenization
It is highly recommended that all systems presented to HOMLAB be defined in homogeneous form. This is the user's responsibility, with the exception that tableau-style systems will be homogenized automatically. Homogenization is recommended because path endpoints approaching infinity are very common, and the projective transformation available after homogenization keeps both the magnitudes of the coordinates and the arclengths of the homotopy paths finite. If one wishes to compute homotopy paths without homogenization, the path-tracker routines endgame and tracker will still work, but they do not include any special stopping conditions for diverging solutions, which may therefore take up inordinate computation time. (Such paths can never make it to t = 0, so eventually they must fail on a too-small step size condition or on the limit on the number of steps.) The choices of a linear-product structure and a multihomogenization are independent. For example, if a system is bilinear, one can reflect this in the linear-product structure while one-homogenizing the system. The one-homogeneous start system will have solutions at infinity, but HOMLAB will ignore these. If the system is two-homogenized instead, respecting the bilinear structure, the linear-product start system has no solutions at infinity. This is a bit cleaner mathematically, but in practical terms, both formulations have the same number of solution paths to follow. To be clear, consider again the example

    x^2 - x - 2 = 0,
    xy - 1 = 0.
In a one-homogeneous treatment using coordinates [x, y, w] ∈ P^2, we have the equations

    x^2 - xw - 2w^2 = 0,
    xy - w^2 = 0,

and we must specify a compatible homogeneous structure:

HomStruct=[1 1 1];

which directs HOMLAB to append an inhomogeneous linear equation ax + by + cw = 1 for some random, complex {a, b, c}, thereby choosing a random patch on P^2. (For a discussion of projective spaces, see Chapter 3.) To get a two-path homotopy, we specify the linear-product structure

LPDstruct=[1 0 1; 1 0 1;
           1 0 1; 0 1 1];
that is, f_1 ∈ ⟨x,w⟩ × ⟨x,w⟩ and f_2 ∈ ⟨x,w⟩ × ⟨y,w⟩. This start system has a double root at infinity of [x,y,w] = [0,1,0], but HOMLAB will ignore it. The two-homogeneous treatment of the same system using coordinates {[x,u],[y,v]} ∈ P^1 × P^1 is

    x^2 - xu - 2u^2 = 0,
    xy - uv = 0.
The compatible HomStruct is, assuming the coordinates are ordered as (x,y,u,v),

HomStruct=[1 0 1 0; 0 1 0 1];

which directs HOMLAB to append two linear equations

    ax + 0y + bu + 0v = 1,
    0x + cy + 0u + dv = 1,

for random, complex values of {a, b, c, d}. This picks a random patch on each of the two P^1 subspaces. Now, the two-path linear-product decomposition is

LPDstruct=[1 0 1 0; 1 0 1 0;
010
1];
See § C.8 for a description of the dehomog function to dehomogenize a solution point.
C.3.4
Function Utilities and Checking
With a few sample scripts in hand, it is easy to set up and run any of the various kinds of homotopies once the function and its derivatives are available. To make the definition of these easier, some utilities are available. • Function f tableau accepts a list of coefficients and a matrix of exponents to define the terms of a polynomial. It then provides both the function and derivative evaluations. It works only for a single function / : C n —> C, so a wrapper function ftabsys is provided to call ftableau multiple times for a system of such functions. • Utility scalepol is available for scaling a system for f tabsys, as is sometimes necessary. For example, see the chemical system of § 9.2. Since Matlab is a numerical package, there has been no attempt to automate differentiation and homogenization except for the simple case of fully expanded polynomials via the ftableau function. Otherwise, this onerous task falls to the user. Symbolic packages can be employed to preprocess functions in this way and then copy the results into an m-file function. The most error-prone step in defining a straight-line program for a function is in giving formulae for the partial derivatives. A helpful way of checking these is to compare the computed derivatives with a computation based on numerical differentiation. The function must also be homogenized, which can also be checked numerically. The following checking utilities are provided for these purposes.
366
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
function [fxerr]=chekffun(fname,nx,epsO) —> checks derivatives of target functions [f,fx]=myfunc(x) function [fxerr,fperr]=chekpfun(fname,nx,np,epsO) —> checks derivatives for parameterized functions [f,fx,fp]=myfunc(x,p) function [homerr]=chekhmog(fname,HomStruct,mdeg) —> Checks multihomogenization of function [f]=myfunc(x) User provides homogeneous structure and multidegree matrix function [homerr]=chekhmog(fname,HomStruct,deg,lpd) —> Checks multihomogenization of function [f]=myfunc(x) User provides homogeneous structure, t o t a l degrees, and linear product structure. Code computes multidegree matrix from these. In each case, the checking is done at a random point x £ Cnx. The functions provide a numerical comparison and also use the Matlab spy function to graphically show which elements are suspicious, having an error greater than espO. If epsO is omitted from the call, it defaults to 10~6. Note that high-level scripts define a global FFUN, which can be used for fname.
C.4
Linear Product Homotopies
One of the two main options in HOMLAB is the linear-product homotopy, implemented in the script lpdsolve. With appropriate settings, this script performs the equivalent of a total-degree homotopy, a multihomogeneous homotopy, or a general linear-product homotopy. For total-degree and multihomogeneous homotopies and a tableau-style function definition, the higher-level scripts totdtab and mhomtab automatically perform some preliminary processing steps for you before initiating lpdsolve. Let's first see all the set-up information required by lpdsolve by studying a script to solve a simple system specified in straight-line form. Such a function is treated as a "black box," so the user must supply all the structural information necessary to specify the linear-product formulation. To this end, consider the straight-line function called simplf en in § C.3 above, that implements the system x2 - x - 2 = 0,
xy - 1 = 0.
It has two variables (before homogenization), and each equation is quadratic. A complete script to solve this with a total-degree homotopy, with four paths, is as follows.
HomLab User's Guide
367
% script to solve "simplfcn" by t o t a l degree using lpdsolve global nvar degrees FFUN nvar=2; degrees=[2 2 ] ; FFUN='simplfcn'; LPDstruct=ones(sum(degrees),nvar+l); % t o t a l degree structure lpdsolve dispOThe solutions a r e : ' ) ; disp(dehomog(xsoln, ie-8)) The meaning of the global variables is self-explanatory. The degrees of the polynomials as listed in degrees must be in the same order as they appear in the evaluation function, although in this example they are the same. The item that needs explanation is LPDstruct, which defines the linear-product structure to be used. Each row in LPDstruct represents one linear factor, and there must be degrees (i) factors for the ith equation, for a total of sum (degrees) rows in all. The columns of LPDstruct correspond to the variables in x as it is passed into simplfcn(x). Typically, the final column is the homogeneous coordinate, but this is at the discretion of the user when writing the function. A nonzero entry in element (i,j) of LPDstruct indicates that variable j appears in the ith linear factor, and factors are assigned to equations in accordance with the entries in degrees. For a total-degree homotopy, LPDstruct is just a full matrix of ones. We can run the same problem using only two paths just by changing LPDstruct. In this case, a two-path homotopy is obtained with the following script. '/„ script to solve "simplfcn" with two paths using lpdsolve global nvar degrees FFUN nvar=2; degrees=[2 2]; FFUN='simplfcn'; xw= [ 1 0 1]; yw=[0 1 l ] ; LPDstruct=[ xw; xw; xw; yw; ];
HomStruct=[]; % default to 1-homogeneous lpdsolve disp('The solutions a r e : ' ) ; disp(dehomog(xsoln,le-8)) This is exactly the script simpltst that is suggested as an installation check in § C.1.5. The script above uses only two paths even though simplfcn is only onehomogenized. That is, the homotopy runs in the projective space P 2 . The choice of linear product structure guarantees that the start system, the target system, and consequently the whole homotopy, have a double root at infinity that we choose not to track. If we two-homogenize the equations instead, then this double root
368
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
does not exist at all. It makes no difference in this case, but in cases where some endpoints arrive at infinity only at the target (as t —• 0), multihomogenization can change their representation, and sometimes this can make them numerically more tame. For instance, a singular double root at infinity might break into two distinct nonsingular roots at infinity. To show how HomStruct is used to set up a homotopy in a cross product of projective spaces, let's rework the running example. First, we need a two-homogenized version of the equations. function [f,fz]=simplefcn2(z) % Straight-line function for '/. x~2-x-2=0, xy-l=0 % Two-homogeneous version on [x,u] \times [y,v] x=z(l); y=z(2); u=z(3); v=z(4); f = [ x~2-x*u-2*u~2 x*y-u*v ]; fz = [ 2*x-u, 0, -x-4*u, 0 y. x, -v, -u ];
The script to solve this as a two-homogeneous system follows. 5i script to solve "simplfcn2" two-homogeneously global nvar degrees FFUN nvar=2; degrees=[2 2]; FFUN='simplfcn'; xu=[l 0 1 0 ] ; yv=[0 1 0 1 ] ; LPDstruct=[ xu; xu; xu; yv; ]; HomStruct=[ xu ];
lpdsolve dispOThe solutions a r e : ' ) ; disp(dehomog(xsoln,le-8)) In general, the groupings in the linear factors specified in LPDstruct do not have to be copies of those in HomStruct, as indeed, they are different in the two-path,
369
HomLab User's Guide
one-homogeneous example above. The given LPDstruct and HomStruct must be compatible with the target function. In tableau style functions, the automated scripts will ensure compatibility, but straight-line functions are treated as black boxes, so HOMLAB has no way of checking compatibility. It is the user's responsibility to ensure compatibility. In the case of errors, the resulting behavior will be erratic, sometimes signalled by path-tracking failures, but not necessarily. The solution script, lpdsolve, does the following: • generates a start system g(x) according to the linear-product structure of LPDstruct, • appends random hyperplane slices to implement the projective transformation compatible with HomStruct, • solves g(x) = 0 to get all the start points, • forms the homotopy h(x,t) = 1tg(x) +
{l-t)f(x),
• calls endgamer to track the solution paths, invoking a power-series endgame. The results are in matrices xsoln, stats, and xendgame, as described in § C.7.3. C.5
Parameter Homotopies
Suppose we have written a coefficient-parameter target function, f(x;p), implemented as an m-file function, say myf unc, having the calling sequence [f,fx,fp]=myfunc(x,p) as described in § C.3. An example is function twocircle above. How can we form a parameter homotopy function to solve it for some target value of pO? Let's assume we have a solution list for random, complex parameter values pi. (We will see in a moment how to get this using lpdsolve.) What we need is a homotopy function h(x,t;pl,pO) = f(x,p(t;pl,pQ)) where p : C x Q x Q —> Q with Q the parameter space, and p(l;pi,p0) = pi, p(O;pi,j>o) — Po- The path function p must give a continuous path, with continuous first derivative, starting at pi, ending at po, and always staying in the parameter space. HOMLAB does not offer a general solution for arbitrary parameter spaces, but in the special case that Q = C m , a Euclidean space, a linear path suffices: p(t\pi,p0) = tpi + ( l - i ) p 0 Our path tracker and endgame function, endgamer, expects a homotopy function with the calling sequence [h,hx,ht]=h(x,t). Therefore, the parameters and the
370
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
path function must be passed from the top level script to the homotopy evaluation function via global variables. Moreover, we write myfunc in homogeneous form, so projective transformation equations must be appended. The script parsolve takes care of all the formatting once the minimal set of information has been established. Let's assume that pi and pO are in memory along with the start points, as matrix startpoint, listed columnwise and satisfying f(x,pi) = 0. Then, a script for solving the system f(x,p0) = 0 is as follows, assuming myfunc implements f(x,p). global FFUN PATHFUN FFUN = 'myfunc'; PATHFUN = ' l i n . p a t h ' ; global ParStart ParGoal ParStart = p i ; ParGoal = pO; HomStruct= [] ; °/« defaulting to 1-homogeneous parsolve disp('The solutions a r e : ' ) ; disp(dehomog(xsoln,le-8)) Here, lin_path is a pre-defined function for a linear path. Clearly, HomStruct must be set to agree with the homogenization that has been applied to the user-defined myfunc. C.5.1
Initializing Parameter Homotopies
For parameter homotopy to be useful, we must have some way to solve the first example, f(x,pi) — 0. This can be done with a linear-product homotopy. Once the parameterized family of systems has been defined, in the form f(x;p) = 0, HOMLAB can treat it like any other black box target system. This requires, as described in § C.4, one to provide the linear product structure to be used in the homotopy. One additional wrinkle is that the initial set of parameters p\ must be chosen at random, and then passed behind the scenes through a global variable. Script Ipd2par takes care of all of this. An example usage of this to solve the example of the intersection of two circles, function twocircle above, is as follows. global nvar degrees FFUN nvar=2; degrees=[2 2 ] ; FFUN='twocircle'; LPDstruct=ones(4,3) ; °/0 t o t a l degree structure HomStruct=[l 1 1]; °/0 1-homogeneous % 6 random, complex parameters pi = crand(6,l); global ParGoal ParGoal = p i ; Ipd2par dispOThe solutions a r e : ' ) ; disp(dehomog(xsoln,le-8))
HomLab User's Guide
371
This sets up and solves a homotopy of the form 1ftg(x) + (l-t)f{x;p1) Here, we have chosen random, complex parameters, using the function crand, as this is the desired first step in establishing a parameter homotopy. One can use the same script with nonrandom values of ParStart to solve other problems in the family using a linear-product homotopy, but each such run uses the full linear-product root count number of paths. If any of the endpoints are degenerate in the run of Ipd2par for random target parameters, then correspondingly fewer paths can be used in solving subsequent members of the family by parameter homotopy. Just copy pi to ParStart and copy the nondegenerate endpoints into startpoint and you are ready to apply the parameter homotopy of the previous section. Here, "degenerate" can mean singular solutions, solutions at infinity, or solutions on any pre-specified (i.e., independent of the random choice of parameters) irreducible quasiprojective algebraic set. See Chapter 7 for details. C.6
Defining a Homotopy Function
In all the above usages, HOMLAB automatically constructs a homotopy in accord with the instructions provided by the user. Alternatively, one can define a complete homotopy from scratch and then call up HOMLAB'S path tracker to solve it. The homotopy function must be denned with the following interface: function [h,hx,ht]=myriomotopy(x>t) where myhomotopy can be any name of the user's choosing. The user must also provide a list of start points, whereupon the corresponding endpoints can be obtained with the command [xsoln,stats,xendgame]=endgamer(startpoint,'myhomotopy'); See § C.7 for details. C.6.1
Defining a Parameter
Path
In linear-product decompositions, the homotopy path is automatically chosen by HOMLAB as a straight line through the corresponding coefficient space, as justified by Theorem 8.3.1. In parameter homotopies, however, one must ensure that the homotopy path stays in the desired parameter space for all t, not just for the start and target systems at t = 1 and t = 0. If the parameter space is Euclidean, then a linear path is acceptable. As in the example usage of parsolve above, this is easily obtained by the declaration PATHFUN=' lin_path'; which makes use of a pre-defined function lin_path for linearly interpolating between points in parameter space. If
372
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
the parameter space is non-Euclidean, a more general type of path is needed. The user must provide the definition in the form function
[p,dpdt]=mypath(pl,pO,t)
where mypath. m is a user-written m-file function. Then, set PATHFUN=' mypath'; before calling parsolve to execute the homotopy. See the source code for lin_path.m for an example to follow. C.6.2
Homotopy Checking
Most errors in coding a homotopy function can be revealed by checking if computed derivatives agree with a computation based on numerical differentiation. The following routine is provided for this purpose. function [hxerr,hterr]=chekhfun(fname,nx,epsO) —> checks homotopy functions [f,fx,ft]=myfunc(x,t) The checking is done at a random point (x,t) € C nx x C. The functions provide a numerical comparison and also use the Matlab spy function to graphically show which elements are suspicious, having an error greater than espO. If epsO is omitted from the call, it defaults to 10~6. Note that high-level scripts define a global HFUN, which can be used for f name. C.7
The Workhorse: Endgamer
The workhorse routine is endgamer. m which tracks solution paths for a homotopy h(x,t) = 0 from a list of startpoint solutions of h(x, 1) — 0 to their endpoints satisfying h(x, 0) = 0. Specifically, endgamer has the usage [xsoln,stats,xendgame]=endgamer(startpoint,hfun) with thefollowinginputs and outputs. Inputs startpoint An n x N matrix of N start points, listed columnwise. hfun A string name of the homotopy function, h(x,t) : Cn x C - » C n . The function routine must provide derivatives (see § C.6). It is recommended that the homotopy be homogenized. Outputs xsoln Ann x N matrix of the endpoints of the homotopy paths. stats A6xJV matrix of statistics regarding the paths and their endpoints. xendgame An n x N matrix recording the solutions for t at the start of the endgame.
HomLab User's Guide
373
There are a number of control settings regarding path-tracking tolerances and the like which must be set prior to calling endgamer. These global variables can be set by calling htopyset, as is done automatically by the high-level solving scripts totdtab, mhomtab, lpdsolve, and parsolve. To change the default settings, one just puts a copy of htopyset in the current working directory and edits the values. Matlab will find and use the copy in the current directory, overriding the copy in HOMLABIO, which is best left in its original condition. Comments in the original copy ofhtopyset.m tell the default settings in case the user needs them for reference. Routine endgamer loops through the start points and for each one does the following: (1) tracks the path to the beginning of the endgame, t=t_endgame, a global control variable; (2) records the solution at t=t_endgame as a column in xendgame; (3) executes the power-series endgame (§ 10.3.3), monitoring the convergence criterion and stopping when either convergence is reached or when one of several protective stopping conditions is satisfied; (4) records the best solution estimate, as judged by the convergence criterion, as a column in xsoln and certain statistics concerning the solution are recorded as a column in stats. The details of all the required control settings are given next, followed by a detailed description of the outputs. C.7.1
Control Settings
As mentioned above, the control settings are established in htopyset.m, which is called automatically by the high-level scripts. Here, we give a detailed list and describe what each control means, as well as give the default value. We group these into three general categories. These are all global variables. LPD Start System (used by lpdstart) • epsstart = le-12; Each solution of the linear-product start system is found by choosing one linear factor from each equation and solving the resulting linear system. Choices that give a singular linear system are ignored. The solver builds the linear system one equation at a time, using Gaussian elimination to triangularize as it proceeds. If at any stage the magnitude of the largest available pivot is less than epsstart, that combination of linear factors is declared invalid, and the solver moves on to the next combination. Path Tracking (used by tracker). This variable step-size tracker is a correctorpredictor type as described in § 2.3.
374
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
• stepmin = le-6; The minimum step size in t, below which a path is declared as having failed. • maxit = 3; The maximum number of Newton iterations allowed in the corrector. If the designated convergence criterion is not met within this number of iterations, the step is a failure and the step size will be halved. • maxnf e = 1500; The maximum number of function evaluations allowed per path. This limits the amount of computing time that a diverging path may consume. For well-scaled, homogenized homotopies, this criterion should rarely come into play. • epstiny = le-12; In rare instances, a path may fail due to vanishing of the tangent vector (dh/dx)~1dh/dt. This is detected using the tolerance epstiny. Typically, when this occurs, it is a signal that the homotopy is not properly formed, possibly an error in a user-written function for evaluation of the derivatives. End Game (used by endgamer) This routine calls tracker to get to the start of the endgame, then runs the power-series endgame. • stepstart = 0.1; The initial step size for t in the tracker. • epsbig = le-4; The tracking accuracy to be maintained in the initial phase of tracking. This is the convergence tolerance for the corrector. If a path does not successfully reach the endgame, it is tried once more from the beginning with a tighter tolerance of epsbig/100. • epssmall = le-6; The tracking accuracy to be maintained in the endgame. • t_endgame = 0.1; The value of t where the endgame starts. • tstop = le-10; The value of t where the endgame gives up. • t r a t i o = 0.3; During the endgame, samples are taken for t in a geometric series where £& = tratio*£/c_i. The value 0.3 is a compromise between the need to spread the samples out for a well-conditioned fit (tratio smaller) and the need to stay away from t = 0, where the path may be singular. • eps_end = le-10; The criterion for deciding when the endpoint estimate has converged. When two successive estimates agree to this tolerance, success is declared. • CycleMax = 4; This is the maximum winding number tested by the powerseries endgame. In double precision, the endgame is rarely successful above winding number c = 4. • maxerrup = 10; The endgame keeps a record of the smallest change in the endpoint estimate in successive iterations. (This is compared to eps_end for declaring success.) Usually, this measure improves with each successive iteration, unless the path gets too close to t = 0 before converging. However, in the early stages of the endgame, the convergence measure can sometimes increase briefly before entering the endgame operating zone. If there are more than maxerrup successive iterations without improving on
HomLab ,User's Guide
375
the best iteration, the path is stopped. • allowjump = 1; When nonzero, this flag allows the endgame to predictcorrect across the origin in s, where s = t1//<2 is the un-wound path variable. This allows the endgame to sample on both sides of s = 0 to estimate the value of the endpoint by seventh-order interpolation. If allowjump=O, samples are only taken for s > 0, and the endpoint is estimated using cubic extrapolation to s — 0. C.7.2
Verbose Mode
By declaring global verbose and setting verbose=l;, the user will cause endgamer to print out its progress during the endgame for each path. This allows one to see how well the endgame is performing. Usually this is not of great interest, but if one is running a huge problem, it may be worth monitoring a small sample of paths and tuning the control settings for greater efficiency. It is also a useful way of confirming that all is working well: if superlinear convergence is obtained in the endgame, it is a strong indicator that everything is in good order. The five columns of information printed in verbose mode are: • the t value of the current endgame sample, • the difference between the last two endpoint estimates (maximum absolute value of the difference in any variable), • the current best guess for the winding number c, • the status of the endgame, which is the number of samples involved in estimating the endpoint. Each sample includes derivative information. "1" means there is only one sample, so the estimate will be done by linear extrapolation. "2" means two samples, so cubic extrapolation is available. "3" means an additional sample has been acquired on the other side of s = 0, but cubic extrapolation is still used. "4" means there are two samples on each side, so seventh-order interpolation is used for the estimate. • the fifth column is the number of successive iterations that have not improved on the best estimate. C.7.3
Path Statistics
The main tool for interpreting the results is to examine the s t a t s matrix. It has one column per path and six rows. The rows are as follows. • s t a t s (1,:) The value of t at which the endgame gave its best estimate. • s t a t s (2,:) The convergence estimate at the endpoint, which is the maximum absolute value of the difference in any variable between two successive endpoint estimates. • s t a t s (3,:) The function residual, that is the maximum absolute value of any entry in h(x*,0) for the endpoint estimate x*.
376
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
• s t a t s (4, :) The estimated winding number. If the true winding number is higher than CycleMax, this will typically result in a best guess of c=CycleMax in this place. • stats (5,:) Condition number of the Jacobian matrix |^ (x*, 0) at the endpoint estimate x*. • stats (6,:) The total number of function evaluations used in computing this path. For large runs, it is tedious to examine stats by looking at the raw numbers. It is much easier to look at histograms and other types of summary statistics. Any endpoint that does not have a small function residual, s t a t s ( 3 , :), has failed in a serious way; quite likely the path tracker stopped with a large value of t in s t a t s ( l , : ) . A histogram plot of Iogl0(stats(3,:)) gives a quick check whether all the paths have ended in at least an approximate solution. If the endpoint is singular, its function residual can be small while it is still relatively far from the true endpoint. Check loglO (stats (2,:)) to see the endpoint convergence measure. One hopes that all endpoints are computed to the desired accuracy, eps_end, but some may not, especially if they are singularities with cycle number of 4 or greater. If the accuracy is only moderate, say better than 10~6 but not at the 10~10 ones desires, check if the condition number is at least moderately high, say 108 or greater. This would indicate that the root really is singular and failed to be computed accurately for that reason. Depending on one's purpose, that may be enough. When the singular endgame works well, the endpoint accuracy will be better than 10~10 and the condition number of a singular point will be greater than 1010 often as high as 1016 or more. C.8
Solutions at Infinity and Dehomogenization
When the solutions are computed in homogeneous coordinates or multihomogeneous coordinates, they can be scaled in each projective factor. Usually the original formulation is in Cn and it has been recast in P", or in a cross product of projective spaces, by introducing one or more homogenizing coordinates. Solutions at infinity are indicated by a small homogenizing coordinate, or if multihomogenized, by at least one homogenizing coordinate being near zero. Here, being near zero means, typically, being of the same magnitude as the convergence estimate in s t a t s ( 2 , : ) . If the homogenizing coordinate is in row k of xsoln, then a histogram of absdoglO(xsoln(k,:))) can be very revealing. For a finite solution, we wish to rescale to make the homogenizing coordinate(s) equal to one. Subroutine dehomog does this. The short form is x=dehomog(xsoln,epsO) ; where epsO is the magnitude of the homogenizing coordinate below which a solu-
HomLab User's Guide
377
tion is declared to be at infinity. This form assumes that the solutions are onehomogenized and that the homogenizing coordinate is the last entry. Any solution determined to be at infinity is rescaled by its largest element, while a finite one is rescaled by the homogenizing coordinate. This is usually what one wants, but the result can be a bit surprising if epsO is made too small so that a poorly computed solution at infinity gets erroneously rescaled as if it were finite. A more elaborate form must be used for multihomogenized solutions: x=dehomog(xsoln,espO,HomStruct,homvar); where HomStruct identifies the membership in the various homogeneous groupings, and homvar is a list of the row number for each homogenizing variable. If homvar is missing, the last variable of each group is assumed by default to be the homogenizing variable for that group. In either the short form or the long form, dehomog sets one variable of each homogeneous group to one.
Bibliography
Abhyankar, S. S. (1990). Algebraic geometry for scientists and engineers, Vol. 35 of Mathematical Surveys and Monographs. Providence, RI: American Mathematical Society. Alefeld, G., k Herzberger, J. (1983). Introduction to interval computations. Computer Science and Applied Mathematics. New York: Academic Press Inc. [Harcourt Brace Jovanovich Publishers]. Translated from the German by Jon Rokne. Allgower, E. L., Erdmann, M., k Georg, K. (2002). On the complexity of exclusion algorithms for optimization. J. Complexity, 18(2), 573-588. Algorithms and complexity for continuous problems/Algorithms, computational complexity, and models of computation for nonlinear and multivariate problems (Dagstuhl/South Hadley, MA, 2000). Allgower, E. L., k Georg, K. (1993). Continuation and path following. In Ada numerica, Vol. 2 (pp. 1-64). Cambridge: Cambridge Univ. Press. Allgower, E. L., k Georg, K. (1997). Numerical path following. In Handbook of numerical analysis, Vol. V (pp. 3-207). Amsterdam: North-Holland. Allgower, E. L., k Georg, K. (2003). Introduction to numerical continuation methods, Vol. 45 of Classics in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). Reprint of the 1990 edition [SpringerVerlag, Berlin]. Allgower, E. L., Georg, K., k Miranda, R. (1992). The method of resultants for computing real solutions of polynomial systems. SIAM J. Numer. Anal., 29(3), 831-844. Allgower, E. L., k Sommese, A. J. (2002). Piecewise linear approximation of smooth compact fibers. J. Complexity, 18(2), 547-556. Algorithms and complexity for continuous problems/Algorithms, computational complexity, and models of computation for nonlinear and multivariate problems (Dagstuhl/South Hadley, MA, 2000). Alt, H. (1923). Uber die Erzeugung gegebener ebener Kurven mit Hilfe des Gelenkvierecks. Zeitschrift fur Angewandte Mathematik und Mechanik, 3(1), 13-19. Arbarello, E., Cornalba, M., Griffiths, P. A., & Harris, J. (1985). Geometry of algebraic curves. Vol. I, Vol. 267 of Grundlehren der Mathematischen Wissenschaften 379
380
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
[Fundamental Principles of Mathematical Sciences]. New York: Springer-Verlag. Auzinger, W., & Stetter, H. J. (1988). An elimination algorithm for the computation of all zeros of a system of multivariate polynomial equations. In Numerical mathematics, Singapore 1988, Vol. 86 of Internat. Schriftenreihe Numer. Math. (pp. 11-30). Basel: Birkhauser. Bates, D., Peterson, C , & Sommese, A. J. (2005a). A numerical-symbolic algorithm for computing the multiplicity of a component of an algebraic set. in preparation. Bates, D., Sommese, A. J., & Wampler, C. W. (2005b). Multiprecision endgames for homotopy continuation, in preparation. Beltrametti, M. C , Howard, A., Schneider, M., &, Sommese, A. J. (2000). Projections from subvarieties. In Complex analysis and algebraic geometry (pp. 71-107). Berlin: de Gruyter. Beltrametti, M. C , & Sommese, A. J. (1995). The adjunction theory of complex protective varieties, Vol. 16 of de Gruyter Expositions in Mathematics. Berlin: Walter de Gruyter & Co. Bernstein, D. N. (1975). The number of roots of a system of equations. Functional Anal. Appl., 9(3), 183-185. Translated from Funktsional. Anal, i Prilozhen 9(3):1-4,1975. Borel, A. (1969). Linear algebraic groups. Notes taken by H. Bass. W. A. Benjamin, Inc., New York-Amsterdam. Bottema, O., h Roth, B. (1979). Theoretical kinematics, Vol. 24 of North-Holland Series in Applied Mathematics and Mechanics. Amsterdam: North-Holland Publishing Co. Burmester, L. E. H. (1888). Lehrbuch der Kinematik. Leipzig A. Felix. Calabri, A., & Ciliberto, C. (2001). On special projections of varieties: epitome to a theorem of Beniamino Segre. Adv. Geom., 1(1), 97-106. Canny, J. (1990). Generalised characteristic polynomials. J. Symbolic Comput., 9, 241-250. Canny, J., & Manocha, D. (1993). Multipolynomial resultant algorithms. J. Symbolic Comput., 15, 99-122. Canny, J., & Rojas, J. M. (1991). An optimal condition for determining the exact number of roots of a polynomial system. Proceedings of the 1991 International Symposium on Symbolic and Algebraic Computation (pp. 96-101). ACM, New York. Chablat, D., Wenger, P., Majou, R., & Merlet, J.-P. (2004). An interval based study for the design and the comparison of three-degrees-of-freedom parallel kinematic machines. Int. J. Robotics Research, 23(6), 615-624. Chen, N. X., & Song, S.-M. (1994). Direct position analysis of the 4-6 Stewart platform. ASME J. Mech. Design, 116(1), 61-66. Chow, S. N., Mallet-Paret, J., & Yorke, J. A. (1979). A homotopy method for locating all zeros of a system of polynomials. In Functional differential equations and approximation of fixed points (proc. summer school and conf, univ. bonn,
Bibliography
381
bonn, 1978), Vol. 730 of Lecture Notes in Math. (pp. 77-88). Berlin: Springer. Chu, M. T., Li, T.-Y., & Sauer, T. (1988). Homotopy method for general A-matrix problems. SI AM J. Matrix Anal. AppL, 9(4), 528-536. Cox, D., Little, J., & O'Shea, D. (1997). Ideals, varieties, and algorithms. Undergraduate Texts in Mathematics. New York: Springer-Verlag, second edition. An introduction to computational algebraic geometry and commutative algebra. Cox, D., Little, J., & O'Shea, D. (1998). Using algebraic geometry, Vol. 185 of Graduate Texts in Mathematics. New York: Springer-Verlag. D'Andrea, C , & Emiris, I. Z. (2003). Sparse resultant perturbations. In Algebra, geometry, and software systems (pp. 93-107). Berlin: Springer. Datta, R. S. (2003). Using computer algebra to find Nash equilibria. Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation (pp. 74-79). New York: ACM. Davidenko, D. F. (1953a). On a new method of numerical solution of systems of nonlinear equations. Doklady Akad. Nauk SSSR (N.S.), 88, 601-602. Davidenko, D. F. (1953b). On approximate solution of systems of nonlinear equations. Ukrain. Mat. Zurnal, 5, 196-206. Davis, P. J. (1975). Interpolation and approximation. New York: Dover Publications Inc. Republication, with minor corrections, of the 1963 original, with a new preface and bibliography. Decker, W., Greuel, G.-M., & Pfister, G. (1999). Primary decomposition: algorithms and comparisons. In Algorithmic algebra and number theory (Heidelberg, 1997) (pp. 187-220). Berlin: Springer. Decker, W., & Schreyer, F.-O. (2001). Computational algebraic geometry today. In Applications of algebraic geometry to coding theory, physics and computation (Eilat, 2001), Vol. 36 of NATO Sci. Ser. II Math. Phys. Chem. (pp. 65-119). Dordrecht: Kluwer Acad. Publ. Decker, W., & Schreyer, F.-O. (2005). Solving polynomial equations: Foundations, algorithms, and applications, to appear. Denavit, J., & Hartenberg, R. S. (1955). A kinematic notation for lower pair mechanisms based on matrices. J. Appl. Mechanics, 22, 215-221. Trans. ASME, vol. 77. Dhingra, A., Kohli, D., & Xu, Y. X. (1992). Direct kinematic of general Stewart platforms. DE-Vol. 45, Robotics, Spatial Mechanisms, and Mechanical Systems (pp. 107-112). ASME. Dian, J., & Kearfott, R. B. (2003). Existence verification for singular and nonsmooth zeros of real nonlinear systems. Math. Comp., 72(242), 757-766. Dickenstein, A., & Emiris, I. Z. (Eds.), (preprint). Solving polynomial equations: Foundations, algorithms, and applications. Berlin Heidelberg New York: Springer-Verlag. Dietmaier, P. (1998). The Stewart-Gough platform of general geometry can have 40 real postures. In J. Lenarcic, & M. L. Husty (Eds.), Advances in robot kinematics:
382
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Analysis and control (pp. 1-10). Dordrecht: Kluwer Academic Publishers. Dixon, A. L. (1909). The eliminant of three quantics in two independent variables. Proc. London Math. Soc, 2(7), 49-69. Drexler, F. J. (1977). Eine Methode zur Berechnung samtlicher Losungen von Polynomgleichungssystemen. Numer. Math., 29(1), 45-58. Drexler, F. J. (1978). A homotopy method for the calculation of all zeros of zerodimensional polynomial ideals. In Developments in statistics, vol. 1 (pp. 69-93). New York: Academic Press. Duffy, J., & Crane, C. (1980). A displacement analysis of the general spatial 7-link, 7R mechanism. Mechanism Machine Theory, i5(3-A), 153-169. Eisenbud, D. (1995). Commutative Algebra with a view toward algebraic geometry, Vol. 150 of Graduate Texts in Mathematics. New York: Springer-Verlag. Emiris, I. Z. (1994). Sparse elimination and applications in kinematics. PhD thesis, Computer Science Division, Dept. of Electrical Engineering and Computer Science, University of California, Berkeley. Emiris, I. Z. (1995). A general solver based on sparse resultants. Proc. PoSSo (Polynomial System Solving) Workshop on Software (pp. 35-54). Paris. Emiris, I. Z. (2003). Discrete geometry for algebraic elimination. In Algebra, geometry, and software systems (pp. 77-91). Berlin: Springer. Faugere, J. C , & Lazard, D. (1995). The combinatorial classes of parallel manipulators. Mechanism Machine Theory, 30(6), 765-776. Feinberg, M. (1980). Chemical oscillations, multiple equilibria, and reaction network structure. In W. E. Stewart (Ed.), Dynamics and modelling of reactive systems (pp. 59-130). Academic Press, Inc. Fischer, G. (1976). Complex analytic geometry. Berlin: Springer-Verlag. Lecture Notes in Mathematics, Vol. 538. Fischer, G. (2001). Plane algebraic curves, Vol. 15 of Student Mathematical Library. Providence, RI: American Mathematical Society. Translated from the 1994 German original by Leslie Kay. Freudenstein, F., & Roth, B. (1963). Numerical solution of systems of nonlinear equations. J. ACM, 10(4), 550-556. Frisch, J. (1967). Points de platitude d'un morphisme d'espaces analytiques complexes. Invent. Math., 4, 118-138. Fritzsche, K., & Grauert, H. (2002). From holomorphic functions to complex manifolds, Vol. 213 of Graduate Texts in Mathematics. New York: Springer-Verlag. Fulton, W. (1998). Intersection theory, Vol. 2 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics]. Berlin: Springer-Verlag, second edition. Gao, T., & Li, T.-Y. (2000). Mixed volume computation via linear programming. Taiwanese J. Math., 4(4), 599-619. Gao, T., & Li, T.-Y. (2003). Mixed volume computation for semi-mixed systems.
Bibliography
383
Discrete Comput. Geom., 29(2), 257-277. Gao, T., Li, T.-Y., Verschelde, J., & Wu, M. (2000). Balancing the lifting values to improve the numerical stability of polyhedral homotopy continuation methods. Appl. Math. Comput, 114(2-3), 233-247. Gao, T., Li, T.-Y., & Wang, X. (1999). Finding all isolated zeros of polynomial systems in C" via stable mixed volumes. J. Symbolic Comput., 28(1-2), 187-211. Polynomial elimination—algorithms and applications. Garcia, C. B., & Zangwill, W. I. (1979). Finding all solutions to polynomial systems and other systems of equations. Math. Programming, 16(2), 159-176. Garcia, C. B., & Zangwill, W. I. (1980). Global continuation methods for finding all solutions to polynomial systems of equations in n variables. In Extremal methods and systems analysis (Interned. Sympos., Univ. Texas, Austin, Tex., 1977), Vol. 174 of Lecture Notes in Econom. and Math. Systems (pp. 481-497). Berlin: Springer. Gelfand, I., Kapranov, M., & Zelevinsky, A. (1994). Discriminants, resultants and multidimensional determinants. Boston: Birkhauser. Georg, K. (2001). Improving the efficiency of exclusion algorithms. Adv. Geom., 1(2), 193-210. Georg, K. (2003). A new exclusion test. J. Comput. Appl. Math., 152(1-2), 147160. Proceedings of the International Conference on Recent Advances in Computational Mathematics (ICRACM 2001) (Matsuyama). Giusti, M., Hagele, K., Lecerf, G., Marchand, J., & Salvy, B. (2000). The projective Noether Maple package: computing the dimension of a projective variety. J. Symbolic Comput, 30(3), 291-307. Goedecker, S. (1994). Remark on algorithms to find roots of polynomials. SI AM J. Sci. Comput, 15(5), 1059-1063. Goresky, M., & MacPherson, R. (1988). Stratified Morse theory, Vol. 14 of Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)]. Berlin: Springer-Verlag. Greuel, G.-M. (2000). Computer algebra and algebraic geometry—achievements and perspectives. J. Symbolic Comput, 30(3), 253-289. Greuel, G.-M., & Pfister, G. (2002). A singular introduction to commutative algebra. Berlin: Springer-Verlag. With contributions by O. Bachmann, C. Lossen and H. Schonemann, With 1 CD-ROM (Windows, Macintosh, and UNIX). Griewank, A., & Osborne, M. R. (1983). Analysis of Newton's method at irregular singularities. SIAM J. Numer. Anal, 20(4), 747-773. Griffis, M., & Duffy, J. (1993). Method and apparatus for controlling geometrically simple parallel mechanisms with distinctive connections. US Patent 5,179,525. Griffths, P. A., & Harris, J. (1994). Principles of algebraic geometry. Wiley Classics Library. New York: John Wiley & Sons Inc. Reprint of the 1978 original. Gunning, R. C. (1970). Lectures on complex analytic varieties: The local parametrization theorem. Mathematical Notes. Princeton, N.J.: Princeton University
384
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Press. Gunning, R. C. (1990). Introduction to holomorphic functions of several variables. Vol. II. The Wadsworth & Brooks/Cole Mathematics Series. Monterey, CA: Wadsworth k Brooks/Cole Advanced Books & Software. Local theory. Gunning, R. C, k Rossi, H. (1965). Analytic functions of several complex variables. Englewood Cliffs, N.J.: Prentice-Hall Inc. Hamming, R. W. (1986). Numerical methods for scientists and engineers. New York: Dover Publications Inc., second edition. Harris, J. (1995). Algebraic geometry, Vol. 133 of Graduate Texts in Mathematics. New York: Springer-Verlag. A first course, Corrected reprint of the 1992 original. Hartenberg, R. S., & Denavit, J. (1964). Kinematic synthesis of linkages. McGrawHill, N.Y. Hartshorne, R. (1977). Algebraic geometry. New York: Springer-Verlag. Graduate Texts in Mathematics, No. 52. Hille, E. (1959). Analytic function theory. Vol. 1. Introduction to Higher Mathematics. Ginn and Company, Boston. Hille, E. (1962). Analytic function theory. Vol. II. Introductions to Higher Mathematics. Ginn and Co., Boston, Mass.-New York-Toronto, Ont. Hodge, W. V. D., k Pedoe, D. (1994a). Methods of algebraic geometry. Vol. I. Cambridge Mathematical Library. Cambridge: Cambridge University Press. Book I: Algebraic preliminaries, Book II: Projective space, Reprint of the 1947 original. Hodge, W. V. D., k Pedoe, D. (1994b). Methods of algebraic geometry. Vol. II. Cambridge Mathematical Library. Cambridge: Cambridge University Press. Book III: General theory of algebraic varieties in projective space, Book IV: Quadrics and Grassmann varieties, Reprint of the 1952 original. Hodge, W. V. D., k Pedoe, D. (1994c). Methods of algebraic geometry. Vol. III. Cambridge Mathematical Library. Cambridge: Cambridge University Press. Book V: Birational geometry, Reprint of the 1954 original. Ho§ten, S., k Shapiro, J. (2000). Primary decomposition of lattice basis ideals. J. Symbolic Comput., 29(4-5), 625-639. Symbolic computation in algebra, analysis, and geometry (Berkeley, CA, 1998). Huang, Y., Wu, W., Stetter, H. J., k Zhi, L. (2000). Pseudofactors of multivariate polynomials. Proceedings of the 2000 International Symposium on Symbolic and Algebraic Computation (St. Andrews) (pp. 161-168). New York: ACM. Huber, B., Sottile, F., k Sturmfels, B. (1998). Numerical Schubert calculus. J. Symbolic Comput., 26(6), 767-788. Symbolic numeric algebra for polynomials. Huber, B., k Sturmfels, B. (1995). A polyhedral method for solving sparse polynomial systems. Math. Comp., 64(212), 1541-1555. Huber, B., k Sturmfels, B. (1997). Bernstein's theorem in affine space. Discrete Comput. Geom., 17(2), 137-141. Huber, B., k Verschelde, J. (1998). Polyhedral end games for polynomial continuation. Numer. Algorithms, 18(1), 91-108.
Bibliography
385
Huber, B., & Verschelde, J. (2000). Pieri homotopies for problems in enumerative geometry applied to pole placement in linear systems control. SIAM J. Control Optim., 38(4), 1265-1287. Husty, M. L. (1996). An algorithm for solving the direct kinematics of general Stewart-Gough platforms. Mechanism Machine Theory, 31(4), 365-380. Husty, M. L., & Karger, A. (2000). Self-motions of Griffis-Duffy type parallel manipulators. Proceedings of the 2000 IEEE Int. Conf. Robotics and Automation, CDROM, San Francisco, CA, April 24-28, 2000. IEEE. Iitaka, S. (1982). Algebraic geometry, Vol. 76 of Graduate Texts in Mathematics. New York: Springer-Verlag. An introduction to birational geometry of algebraic varieties, North-Holland Mathematical Library, 24. Innocenti, C. (1995). Polynomial solution to the position analysis of the 7-link Assur kinematic chain with one quaternary link. Mechanism Machine Theory, 30(8), 1295-1303. Isaacson, E., & Keller, H. B. (1994). Analysis of numerical methods. New York: Dover Publications Inc. Corrected reprint of the 1966 original [Wiley, New York]. Kearfott, R. B. (1996). Rigorous global search: continuous problems, Vol. 13 of Nonconvex Optimization and its Applications. Dordrecht: Kluwer Academic Publishers. Kearfott, R. B. (1997). Empirical evaluation of innovations in interval branch and bound algorithms for nonlinear systems. SIAM J. Sci. Comp., 18(2), 574-594. Kearfott, R. B., & Novoa, M. (1990). Algorithm 681: INTBIS, a portable interval Newton/bisection package. ACM Trans. Math. Softw., 16(2), 152-157. Kearfott, R. B., & Xing, Z. (1994). An interval step control for continuation methods. SIAM J. Numer. Anal., 31(3), 892-914. Keller, H. B. (1981). Geometrically isolated nonisolated solutions and their approximation. SIAM J. Numer. Anal., 18(5), 822-838. Kendig, K. (1977). Elementary algebraic geometry. New York: Springer-Verlag. Graduate Texts in Mathematics, No. 44. Khovanski, A. G. (1978). Newton polyhedra, and the genus of complete intersections. Funktsional. Anal, i Prilozhen., 12(1), 51-61. Kleiman, S. L. (1986). Tangency and duality. Proceedings of the 1984 Vancouver conference in algebraic geometry, Vol. 6 of CMS Conf. Proc. (pp. 163-225). Providence, RI: Amer. Math. Soc. Knuth, D. E. (1981). The art of computer programming. Vol. 2. Addison-Wesley Publishing Co., Reading, Mass., second edition. Seminumerical algorithms, Addison-Wesley Series in Computer Science and Information Processing. Krick, T. (2004). Straight-line programs in polynomial equation solving. In F. Cucker, R. DeVore, P. Olver, & E. Siili (Eds.), Foundations of computational mathematics, Minneapolis 2002. Cambridge University Press. Kuo, Y.-C, Li, T.-Y., & Wu, D. (2004). Determining whether a numerical solution of a polynomial system is isolated, preprint.
386
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Kushnirenko, A. G. (1976). Newton polytopes and the Bezout theorem. Funktsional. Anal, i Prilozhen., 10(3), 82-83. Lazard, D. (1993). On the representation of rigid-body motions and its application to generalized platform manipulators. In J. Angeles, P. Kovacs, & G. Hommel (Eds.), Computational kinematics (pp. 175—182). Kluwer. Lecerf, G. (2001). Une alternative aux methodes de reecriture pour la resolution des system.es algebriques. PhD thesis, Ecole Polytechnique. Lecerf, G. (2002). Quadratic Newton iteration for systems with multiplicity. Found. Comput. Math., 2(3), 247-293. Lee, H.-Y., & Liang, C.-G. (1988). Displacement analysis of the general spatial 7-link 7R mechanism. Mechanism Machine Theory, 23(3), 219-226. Leykin, A., Verschelde, J., & Zhao, A. (2004). Newton's method with deflation for isolated singularities of polynomial systems, preprint. Li, T.-Y. (1983). On Chow, Mallet-Paret and Yorke homotopy for solving system of polynomials. Bull. Inst. Math. Acad. Sinica, 11(3), 433-437. Li, T.-Y. (1993). Solving polynomial systems by homotopy continuation methods. In Computer mathematics (Tianjin, 1991), Vol. 5 of Nankai Ser. Pure Appl. Math. Theoret. Phys. (pp. 18-35). River Edge, NJ: World Sci. Publishing. Li, T.-Y. (1997). Numerical solution of multivariate polynomial systems by homotopy continuation methods. In Ada numerica, Vol. 6 (pp. 399-436). Cambridge: Cambridge Univ. Press. Li, T.-Y. (1999). Solving polynomial systems by polyhedral homotopies. Taiwanese J. Math., 3(3), 251-279. Li, T.-Y. (2003). Numerical solution of polynomial systems by homotopy continuation methods. In Handbook of numerical analysis, Vol. XI (pp. 209-304). Amsterdam: North-Holland. Li, T.-Y., & Li, X. (2001). Finding mixed cells in the mixed volume computation. Found. Comput. Math., 1(2), 161-181. Li, T.-Y., & Sauer, T. (1987a). Homotopy method for generalized eigenvalue problems Ax = XBx. Linear Algebra Appl., 91, 65-74. Li, T.-Y., & Sauer, T. (1987b). Regularity results for solving systems of polynomials by homotopy method. Numer. Math., 50(3), 283-289. Li, T.-Y., &; Sauer, T. (1989). A simple homotopy for solving deficient polynomial systems. Japan J. Appl. Math., 6(3), 409-419. Li, T.-Y., Sauer, T., & Yorke, J. A. (1987a). Numerical solution of a class of deficient polynomial systems. SIAM J. Numer. Anal, 24(2), 435-451. Li, T.-Y., Sauer, T., & Yorke, J. A. (1987b). The random product homotopy and deficient polynomial systems. Numer. Math., 51(5), 481-500. Li, T.-Y., Sauer, T., & Yorke, J. A. (1988). Numerically determining solutions of systems of polynomial equations. Bull. Amer. Math. Soc. (N.S.), 18(2), 173-177. Li, T.-Y., Sauer, T., & Yorke, J. A. (1989). The cheater's homotopy: an efficient procedure for solving systems of polynomial equations. SIAM J. Numer. Anal.,
Bibliography
387
26(5), 1241-1251. Li, T. Y., Wang, T., & Wang, X. (1996). Random product homotopy with minimal BKK bound. In The mathematics of numerical analysis (Park City, UT, 1995), Vol. 32 of Lectures in Appl. Math. (pp. 503-512). Providence, RI: Amer. Math. Soc. Li, T.-Y., & Wang, X. (1991). Solving deficient polynomial systems with homotopies which keep the subschemes at infinity invariant. Math. Comp., 56(194), 693-710. Li, T.-Y., & Wang, X. (1992). Nonlinear homotopies for solving deficient polynomial systems with parameters. SIAM J. Numer. Anal, 29(4), 1104-1118. Li, T.-Y., & Wang, X. (1996). The BKK root count in C". Math. Comp., 65(216), 1477-1484. Li, T.-Y., & Zheng, Z. (2004). A rank-revealing method and its applications. preprint. Lipman, J. (1975). Introduction to resolution of singularities. In Algebraic geometry (Proc. Sympos. Pure Math., Vol. 29, Humboldt State Univ., Arcata, Calif., 1974) (pp. 187-230). Providence, R.I.: Amer. Math. Soc. Lo Cascio, M. L., Pasquini, L., & Trigiante, D. (1989). Simultaneous determination of polynomial roots and multiplicities: an algorithm and related problems. Ricerche Mat, 38(2), 283-305. Losch, S. (1995). Parallel redundant manipulators based on open and closed normal Assur chains. In J.-P. Merlet, & B. Ravani (Eds.), Computational kinematics '95, Proceedings of the Second Workshop held in Sophia Antipolis, September 4-6, 1995, Vol. 40 of Solid Mechanics and its Applications (pp. x+310). Dordrecht: Kluwer Academic Publishers Group. Lu, Y., Sommese, A. J., & Wampler, C. W. (2005). Finding all real solutions of polynomial systems: I the curve case, in preparation. Macaulay, F. (1902). On some formulas in elimination. Proc. London Math. Soc, 3, 3-27. Manocha, D. (1993). Efficient algorithms for multipolynomial resultant. The Computer Journal, 36, 485-496. Manocha, D. (1994). Solving systems of polynomial equations. IEEE Comput. Graph. Appl, 36, 46-55. Manocha, D., & Canny, J. F. (1994). Efficient inverse kinematics for general 6R manipulators. IEEE Trans. Rob. Auto., 10(5), 648-657. Manseur, R., & Doty, K. (1989). A robot manipulator with 16 real inverse kinematic solution set. Int. J. Robotics Res., 8(5), 75-79. Marden, M. (1966). Geometry of polynomials. Second edition. Mathematical Surveys, No. 3. Providence, R.I.: American Mathematical Society. Mavroidis, C, & Roth, B. (1995a). Analysis of overconstrained mechanisms. ASME J. Mech. Design, 117, 69-74. Mavroidis, C, & Roth, B. (1995b). New and revised overconstrained mechanisms. ASME J. Mech. Design, 117, 75-82.
388
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Mayer St-Onge, B., &: Gosselin, C. M. (2000). Singularity analysis and representation of the general Gough-Stewart platform. Int. J. Robotics Research, 19, Li 1— ZOO.
Meintjes, K., & Morgan, A. P. (1987). A methodology for solving chemical equilibrium systems. Appl. Math. Com/put., 22, 333-361. Merlet, J.-P. (1989). Singular configurations of parallel manipulators and Grassmann geometry. Int. J. Robotics Research, 8, 45—56. Merlet, J.-P. (2000). Parallel robots. Kluwer Academic Publishers, Dordrecht, The Netherlands. Merlet, J.-P. (2001). A parser for the interval evaluation of analytical functions and its applications to engineering problems. J. Symbolic Computation, 31, 475-486. Mignotte, M., & Stefanescu, D. (1999). Polynomials. Springer Series in Discrete Mathematics and Theoretical Computer Science. Springer-Verlag, Singapore. An algorithmic approach. Milnor, J. W. (1965). Topology from the differentiate viewpoint. Based on notes by David W. Weaver. The University Press of Virginia, Charlottesville, Va. Moller, H. M. (1998). Grobner bases and numerical analysis. In Grobner bases and applications (Linz, 1998), Vol. 251 of London Math. Soc. Lecture Note Ser. (pp. 159-178). Cambridge: Cambridge Univ. Press. Moller, H. M., & Stetter, H. J. (1995). Multivariate polynomial equations with multiple zeros solved by matrix eigenproblems. Num. Math., 70, 311-329. Moore, R. E. (1979). Methods and applications of interval analysis, Vol. 2 of SIAM Studies in Applied Mathematics. Philadelphia, Pa.: Society for Industrial and Applied Mathematics (SIAM). Morgan, A. P. (1983). A method for computing all solutions to systems of polynomial equations. ACM Trans. Math. Software, 9(1), 1-17. Morgan, A. P. (1986a). A homotopy for solving polynomial systems. Appl. Math. Comput., 18(1), 87-92. Morgan, A. P. (1986b). A transformation to avoid solutions at infinity for polynomial systems. Appl. Math. Comput., 18(1), 77-86. Morgan, A. P. (1987). Solving polynomial systems using continuation for engineering and scientific problems. Prentice-Hall, Englewood Cliffs, N.J. Morgan, A. P., & Sommese, A. J. (1987a). A homotopy for solving general polynomial systems that respects m-homogeneous structures. Appl. Math. Comput., 101-113. Morgan, A. P., & Sommese, A. J. (1987b). Computing all solutions to polynomial systems using homotopy continuation. Appl. Math. Comput., 115-138. Errata: Appl. Math. Comput. 51 (1992), p. 209. Morgan, A. P., & Sommese, A. J. (1989). Coefficient-parameter polynomial continuation. Appl. Math. Comput, 29(2), 123-160. Errata: Appl. Math. Comput. 51:207(1992). Morgan, A. P., & Sommese, A. J. (1990). Generically nonsingular polynomial
Bibliography
389
continuation. In Computational solution of nonlinear systems of equations (Fort Collins, CO, 1988), Vol. 26 of Lectures in Appl. Math. (pp. 467-493). Providence, RI: Amer. Math. Soc. Morgan, A. P., Sommese, A. J., & Wampler, C. W. (1990). Polynomial continuation for mechanism design problems. In Computational solution of nonlinear systems of equations (Fort Collins, CO, 1988), Vol. 26 of Lectures in Appl. Math. (pp. 495-517). Providence, RI: Amer. Math. Soc. Morgan, A. P., Sommese, A. J., & Wampler, C. W. (1991). Computing singular solutions to nonlinear analytic systems. Numer. Math., 58(7), 669-684. Morgan, A. P., Sommese, A. J., & Wampler, C. W. (1992a). Computing singular solutions to polynomial systems. Adv. in Appl. Math., 13(3), 305-327. Morgan, A. P., Sommese, A. J., & Wampler, C W. (1992b). A power series method for computing singular solutions to nonlinear analytic systems. Numer. Math., 63(3), 391-409. Morgan, A. P., Sommese, A. J., & Wampler, C. W. (1995). A productdecomposition bound for Bezout numbers. SIAM J. Numer. Anal, 32(A), 13081325. Morgan, A. P., Sommese, A. J., & Watson, L. T. (1989). Finding all isolated solutions to polynomial systems using HOMPACK. A CM Trans. Math. Software, 15(2), 93-122. Morgan, A. P., & Wampler, C. W. (1990). Solving a planar four-bar design problem using continuation. ASME J. Mech. Design, 112, 544-550. Morgan, A. P., & Watson, L. T. (1987). Solving polynomial systems of equations on a hypercube. In Hypercube multiprocessors 1987 (Knoxville, TN, 1986) (pp. 501-511). Philadelphia, PA: SIAM. Morgan, A. P., & Watson, L. T. (1989). A globally convergent parallel algorithm for zeros of polynomial systems. Nonlinear Anal., 13(\1), 1339-1350. Mourrain, B. (1993, July). The 40 generic positions of a parallel robot. In M. Bronstein (Ed.), Proc. ISSAC'93 (Kiev) (pp. 173-182). ACM Press. Mourrain, B. (1996). Enumeration problems in geometry, robotics and vision. In Algorithms in algebraic geometry and applications (Santander, 1994), Vol. 143 of Progr. Math. (pp. 285-306). Basel: Birkhauser. Mourrain, B. (1998). Computing the isolated roots by matrix methods. J. Symbolic Comput., 26(6), 715-738. Symbolic numeric algebra for polynomials. Mumford, D. (1966). Lectures on curves on an algebraic surface. With a section by G. M. Bergman. Annals of Mathematics Studies, No. 59. Princeton, N.J.: Princeton University Press. Mumford, D. (1970). Varieties denned by quadratic equations. In E. Marchionna (Ed.), Questions on algebraic varieties (C.I.M.E., III Ciclo, Varenna, 1969) (pp. 29-100). Rome: Edizioni Cremonese. Mumford, D. (1995). Algebraic geometry. I. Classics in Mathematics. Berlin: Springer-Verlag. Complex projective varieties, Reprint of the 1976 edition.
390
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Mumford, D. (1999). The red book of varieties and schemes, Vol. 1358 of Lecture
Notes in Mathematics. Berlin: Springer-Verlag, expanded edition. Includes the Michigan lectures (1974) on curves and their Jacobians, With contributions by E. Arbarello. Nanua, P., Waldron, K. J., & Murthy, V. (1991). Direct kinematic solution of a Stewart platform. IEEE Trans, on Robotics and Automation, 6(4), 438-444. Neumaier, A. (1990). Interval methods for systems of equations, Vol. 37 of Encyclopedia of Mathematics and its Applications. Cambridge: Cambridge University Press. Nielsen, J., & Roth, B. (1999). Solving the input/output problem for planar mechanisms. ASME J. Mech. Design, 121(2), 206-211. Ojika, T. (1987). Modified deflation algorithm for the solution of singular problems. I. A system of nonlinear algebraic equations. J. Math. Anal. Appl., 123, 199-221. Ojika, T., Watanabe, S., & Mitsui, T. (1983). Deflation algorithm for the multiple roots of a system of nonlinear equations. J. Math. Anal. Appl., 96, 463-479. Pan, V. Y. (1997). Solving a polynomial equation: some history and recent progress. SI AM Rev., 39(2), 187-220. Pasquini, L., & Trigiante, D. (1985). A globally convergent method for simultaneously finding polynomial roots. Math. Comp., ^^(169), 135-149. Pernkopf, F., & Husty, M. L. (2002). Singularity analysis of spatial stewart-gough platforms with planar base and platform. Proc. ASME Design Eng. Tech. Conf, Montreal, Canada, Sept. 30~Oct. 2, 2002. Pieper, D. L. (1968). The kinematics of manipulators under computer control. PhD thesis, Computer Science Dept., Stanford University. Primrose, E. J. F. (1986). On the input-output equation of the general 7Rmechanism. Mechanism Machine Theory, 21(6), 509-510. Raghavan, M. (1991). The Stewart platform of general geometry has 40 configurations. Proc. ASME Design and Automation Conf, vol. 32-2 (pp. 397-402). ASME. Raghavan, M. (1993). The Stewart platform of general geometry has 40 configurations. ASME J. Mech. Design, 115, 277-282. Raghavan, M., & Roth, B. (1993). Inverse kinematics of the general 6R manipulator and related linkages. ASME J. Mech. Design, 115, 502-508. Raghavan, M., & Roth, B. (1995). Solving polynomial systems for the kinematic analysis and synthesis of mechanisms and robot manipulators. ASME J. Mech. Design, 117, 71-79. Roberts, S. (1875). On three-bar motion in plane space. Proc. London Math. Soc, VII, 14-23. Rojas, J. M. (1994). A convex geometric approach to counting the roots of a polynomial system. Theoret. Comput. Sci., 133(1), 105-140. Selected papers of the Workshop on Continuous Algorithms and Complexity (Barcelona, 1993). Rojas, J. M. (1999). Toric intersection theory for affine root counting. J. Pure Appl.
Bibliography
391
Algebra, 136(1), 67-100. Rojas, J. M., & Wang, X. (1996). Counting affine roots of polynomial systems via pointed Newton polytopes. J. Complexity, 12(2), 116-133. Ronga, F., & Vust, T. (1995). Stewart platforms without computer? In Real analytic and algebraic geometry (Trento, 1992) (pp. 197-212). Berlin: de Gruyter. Roth, B. (1962). A generalization of Burmester theory: Nine-point path generation of geared five-bar mechanisms with gear ratio plus and minus one. PhD thesis, Columbia University. Roth, B., & Freudenstein, F. (1963). Synthesis of path-generating mechanisms by numerical means. J. Eng. Industry, 298-306. Trans. ASME, vol. 85, Series B. Roth, B., Rastegar, J., & Scheinman, V. (1974). On the design of computer controlled manipulators. On the Theory and Practice of Robots and Manipulators: First CSIM-IFToMM Symposium (pp. 93-113). Springer-Verlag. Rump, S. M. (1999). INTLAB - INTerval LABoratory. In T. Csendes (Ed.), Developments in reliable computing, Proc. of (SCAN-98), Budapest, September 22-25, 1998 (pp. 77-104). Dordrecht: Kluwer Academic Publishers. Rupprecht, D. (2004). Semi-numerical absolute factorization of polynomials with integer coefficients. J. Symbolic Comput., 37(5), 557-574. Sasaki, T. (2001). Approximate multivariate polynomial factorization based on zero-sum relations. In B. Mourrain (Ed.), Proceedings of the 2001 international symposium on symbolic and algebraic computation (ISSAC 2001) (pp. 284-291). ACM. Schenck, H. (2003). Computational algebraic geometry, Vol. 58 of London Mathematical Society Student Texts. Cambridge: Cambridge University Press. Shiftman, B., & Sommese, A. J. (1985). Vanishing theorems on complex manifolds, Vol. 56 of Progress in Mathematics. Boston, MA: Birkhauser Boston Inc. Sommese, A. J., & Verschelde, J. (2000). Numerical homotopies to compute generic points on positive dimensional algebraic sets. J. Complexity, 16(3), 572-602. Complexity theory, real machines, and homotopy (Oxford, 1999). Sommese, A. J., Verschelde, J., & Wampler, C. W. (2001a). Numerical decomposition of the solution sets of polynomial systems into irreducible components. SIAM J. Numer. Anal., 38(6), 2022-2046. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2001b). Numerical irreducible decomposition using projections from points on the components. In Symbolic computation: solving equations in algebra, geometry, and engineering (South Hadley, MA, 2000), Vol. 286 of Contemp. Math. (pp. 37-51). Providence, RI: Amer. Math. Soc. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2001c). Using monodromy to decompose solution sets of polynomial systems into irreducible components. In Applications of algebraic geometry to coding theory, physics and computation (Eilat, 2001), Vol. 36 of NATO Sci. Ser. II Math. Phys. Chem. (pp. 297-315). Dordrecht: Kluwer Acad. Publ.
392
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Sommese, A. J., Verschelde, J., & Wampler, C. W. (2002a). A method for tracking singular paths with application to the numerical irreducible decomposition. In Algebraic geometry (pp. 329-345). Berlin: de Gruyter. Sommese, A. J., Verschelde, J., & Wampler, C W. (2002b). Symmetric functions applied to decomposing solution sets of polynomial systems. SIAM J. Numer. Anal, 40(6), 2026-2046. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2003). Numerical irreducible decomposition using PHCpack. In Algebra, geometry, and software systems (pp. 109-129). Berlin: Springer. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004a). Advances in polynomial continuation for solving problems in kinematics. ASME J. Mech. Design, 126(2), 262-268. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004b). Homotopies for intersecting solution components of polynomial systems. SIAM J. Numer. Anal., 42(4), 1552-1571. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004c). An intrinsic homotopy for intersecting algebraic varieties. J. Complexity, to appear. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004d). Numerical factorization of multivariate complex polynomials. Theoretical Computer Science, 315, 651— 669. Sommese, A. J., Verschelde, J., & Wampler, C. W. (2004e). Solving polynomial systems equation by equation, in preparation. Sommese, A. J., & Wampler, C. W. (1996). Numerical algebraic geometry. In The mathematics of numerical analysis (Park City, UT, 1995), Vol. 32 of Lectures in Appi Math. (pp. 749-763). Providence, RI: Amer. Math. Soc. Sosonkina, M., Watson, L. T., & Stewart, D. E. (1996). Note on the end game in homotopy zero curve tracking. ACM Trans. Math. Software, 22(3), 281-287. Sreenivasan, S. V., & Nanua, P. (1992). Solution of the direct position kinematics problem of the general Stewart platform using advanced polynomial continuation. DE-Vol. 45, Robotics, Spatial Mechanisms, and Mechanical Systems (pp. 99-106). ASME. Sreenivasan, S. V., Waldron, K. J., & Nanua, P. (1994). Closed-form direct displacement analysis of a 6-6 Stewart platform. Mechanism Machine Theory, 29(6), 855-864. Stetter, H. J. (2004). Numerical polynomial algebra. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). Stoer, J., & Bulirsch, R. (2002). Introduction to numerical analysis, Vol. 12 of Texts in Applied Mathematics. New York: Springer-Verlag, third edition. Translated from the German by R. Bartels, W. Gautschi and C. Witzgall. Sturmfels, B. (2002). Solving systems of polynomial equations, Vol. 97 of CBMS Regional Conference Series in Mathematics. Published for the Conference Board of the Mathematical Sciences, Washington, DC.
Bibliography
393
Sturmfels, B., & Zelevinsky, A. (1994). Multigraded resultants of Sylvester type. J. of Algebra, 163(1), 115-127. Su, H.-J., Wampler, C. W., & McCarthy, J. M. (2004). Geometric design of cylindric PRS serial chains. ASME J. Mech. Design, 126(2), 269-277. Tsai, L. W. (1999). Robot analysis: the mechanics of serial and parallel manipulators. New York: John Wiley & Sons Inc. Tsai, L. W., & Lu, J.-J. (1989). Coupler-point curve synthesis using homotopy methods. In B. Ravani (Ed.), Advances in Design Automation-1989: Mechanical Systems Analysis, Design and Simulation, Vol. DE-Vol. 19-3 (pp. 417-424). ASME. Tsai, L. W., & Morgan, A. P. (1985). Solving the kinematics of the most general six- and five-degree-of-freedom manipulators by continuation methods. ASME J. Mech., Trans., Auto. Design, 107, 48-57. van der Waerden, B. L. (1949). Modern Algebra. Vol. I. New York, N. Y.: Frederick Ungar Publishing Co. Translated from the second revised German edition by Fred Blum, With revisions and additions by the author. van der Waerden, B. L. (1950). Modern Algebra. Vol. II. New York, N. Y.: Frederick Ungar Publishing Co. Translated from the first German edition by Theodore Benac. Verschelde, J. (1996). Homotopy continuation methods for solving polynomial systems. PhD thesis, Katholieke Universiteit Leuven. Verschelde, J. (1999). Algorithm 795: PHCpack: A general-purpose solver for polynomial systems by homotopy continuation. A CM Trans, on Math. Software, 25(2), 251-276. Verschelde, J. (2000). Toric Newton method for polynomial homotopies. J. Symbolic Comput., £9(4-5), 777-793. Symbolic computation in algebra, analysis, and geometry (Berkeley, CA, 1998). Verschelde, J., & Cools, R. (1993). Symbolic homotopy construction. Appl. Algebra Engrg. Comm. Comput., ^(3), 169-183. Verschelde, J., Gatermann, K., & Cools, R. (1996). Mixed-volume computation by dynamic lifting applied to polynomial system solving. Discrete Comput. Geom., 16(1), 69-112. Verschelde, J., Verlinden, P., & Cools, R. (1994). Homotopies exploiting Newton polytopes for solving sparse polynomial systems. SI AM J. Numer. Anal., 31 (3), 915-930. Verschelde, J., & Wang, Y. (2004). Computing feedback laws for linear systems with a parallel Pieri homotopy. In Y. Yang (Ed.), Proceedings of 2004 International Conference on Parallel Processing Workshops, August 15-18, 2004 (PP- 222-229). IEEE. Walker, R. J. (1962). Algebraic curves. Dover, New York. Wampler, C. W. (1992). Bezout number calculations for multi-homogeneous polynomial systems. Appl. Math. Comput, 51(2-3), 143-157.
394
Numerical Solution of Systems of Polynomials Arising in Engineering and Science
Wampler, C. W. (1994). An efficient start system for multihomogeneous polynomial continuation. Numer. Math., 66(4), 517-523. Wampler, C. W. (1996a). Forward displacement analysis of general six-in-parallel SPS (Stewart) platform manipulators using soma coordinates. Mechanism Machine Theory, 31, 331-337. Wampler, C. W. (1996b). Isotropic coordinates, circularity and Bezout numbers: planar kinematics from a new perspective. In J. M. McCarthy (Ed.), Proceedings of the 1996 ASME Design Engineering Technical Conference, Irvine, California August 18-22, 1996. American Society of Mechanical Engineers, CD-ROM. Also available as GM Technical Report, Publication R&D-8188., 1996. Wampler, C. W. (1999). Solving the kinematics of planar mechanisms. ASME J. Mech. Design, 121, 387-391. Wampler, C. W. (2001). Solving the kinematics of planar mechanisms by Dixon determinant and a complex-plane formulation. ASME J. Mech. Design, 123(3), 382-387. Wampler, C. W. (2004). Displacement analysis of spherical mechanisms having three or fewer loops. ASME J. Mech. Design, 126(1), 93-100. Wampler, C. W., & Morgan, A. P. (1993). Solving the kinematics of general 6R manipulators using polynomial continuation. In Robotics: applied mathematics and computational aspects (Loughborough, 1989), Vol. 41 of Inst. Math. Appl. Conf. Ser. New Ser. (pp. 57-69). New York: Oxford Univ. Press. Wampler, C. W., Morgan, A. P., & Sommese, A. J. (1990). Numerical continuation methods for solving polynomial systems arising in kinematics. ASME J. Mech. Design, 112, 59-68. Wampler, C. W., Morgan, A. P., & Sommese, A. J. (1992). Complete solution of the nine-point path synthesis problem for four-bar linkages. ASME J. Mech. Design, 114, 153-159. Wampler, C. W., Morgan, A. P., & Sommese, A. J. (1997). Complete solution of the nine-point path synthesis problem for four-bar linkages - closure. ASME J. Mech. Design, 119, 150-152. Watson, L. T., Billups, S. C , k Morgan, A. P. (1987). Algorithm 652. HOMPACK: a suite of codes for globally convergent homotopy algorithms. ACM Trans. Math. Software, 13(3), 281-310. Watson, L. T., Sosonkina, M., Melville, R. C , Morgan, A. P., & Walker, H. F. (1997). Algorithm 777: HOMPACK90: a suite of Fortran 90 codes for globally convergent homotopy algorithms. ACM Trans. Math. Software, 23(4), 514-549. Weil, A. (1962). Foundations of algebraic geometry. Providence, R.I.: American Mathematical Society. Wilkinson, J. H. (1984). The perfidious polynomial. In Studies in numerical analysis, Vol. 24 of MAA Stud. Math. (pp. 1-28). Washington, DC: Math. Assoc. America. Wilkinson, J. H. (1994). Rounding errors in algebraic processes. New York: Dover
Bibliography
395
Publications Inc. Reprint of the 1963 original [Prentice-Hall, Englewood Cliffs, NJ]. Xu, Z.-B., Zhang, J.-S., & Wang, W. (1996). A cell exclusion algorithm for determining all the solutions of a nonlinear system of equations. Appl. Math. Comput., 80'(2-3), 181-208. Zhang, C.-D., & Song, S.-M. (1994). Forward position analysis of nearly general Stewart platform. ASME J. Mech. Design, 116(1), 54-60.
Index
Z_reg, 44, 215
#, xxii
Sing(X), 44
Sing(Z), 306
C*, xxii
(x,L), 328
A, xxii
P^N, 29
Gr(m,N), 325
V(f), 8
O_{P^1}(d), 342
\ (set minus), xxii
affine algebraic set, 43, 47, 56, 207, 209
affine hyperplane, 232
affine space, 209
affine variety, 215
algebraic function, 210, 212
algebraic map, 208, 210, 212, 219, 220
algebraic probability one, 50
algebraic set, 43, 44, 207, 209
  affine, see affine algebraic set
  constructible, see constructible set
  projective, see projective algebraic set
  quasiprojective, see quasiprojective set
algebraic set associated to f, 8
algebraic set of f, see algebraic set associated to f, 8
algorithm
  Inclusion, 252
  LocalDimen, 251
  Equal, 253
  IrrDecomp1, 268
  IrrDecomp2, 271
  IrrDecompPure, 270, 284
  JunkRemove, 269
  Local Dimension, 251
  LocalDimen, 251
  Member1, 268, 275
  Member2, 269, 276
  Monodromy, 269, 277
  Rank, 240
  TopDimen, 250
  Trace, 270, 284
  WitnessSuper, 247
  WitnessSupi, 245
  WitnessSupi(intrinsic), 246
algorithm for the rank of a system, 319
analysis, 163
analytic continuation, 278
analytic parameter spaces, 349
analytic Zariski open set, 350
base locus, 323
Bertini Theorems, 313, 323, 330-333
big system, 319
biholomorphic mapping, 301
biholomorphic to, 301
BKK bound, 139
body guidance, 163, 165
branched covering, 314
Buchberger's algorithm, 82
Burmester centers, 166
Burmester points, 166
cascade algorithm, 255, 259
Cauchy integral, 199
Cauchy integral endgame, 285
Cauchy integral method, 186, 187, 189
Cauchy's Lemma, 58
center of a projection, 213, 328
Chebychev polynomials, 65
chemical equilibria, 152-154, 170 Cauchy integral method, see Cauchy Chern class, 343 integral method Chevalley's Theorem, 222 cluster method, see trace method classical topology, see complex topology, power-series method, see power-series 211 method cluster method, 187 trace method, see trace method coefficient, 5 endgame convergence radius, 180 coefficient-parameter homotopy, 91 endgame operating zone, 179, 182, 183, coefficient-parameter theory, 92 185 compact affine set, 220 equation-by-equation, 292 companion matrix, 4 Exclusion method, 68 complex analytic set, 301 extension theorems, 302 complex analytic space, 302 extrinsic slicing, 234 complex dimension, see dimension, 306 complex manifolds, 302 finite affine set, 210 complex projective space, see projective finite map, 312 space first Chern class, 342 complex topology, 211, 300 five-point path synthesis, 166, 174 condition number, 198 four-bar analysis, 164 cone with vertex x, 320 four-bar equations, 163 constellation of algebraic sets, 331, 332 four-bar function generation, 173 constructible algebraic set, 207, 208 four-bar linkages, 169 constructible set, 208, 209, 221 four-bar synthesis, 162, 163 convex polytope, 138 four-body guidance, 174 corank of a polynomial system, 239 fractional power series, 180 corank of an algebraic system, 318 function generation, 162, 164 coupler curve, 162 Fundamental Theorem of Algebra, 55 covering map, 314 cuspidal cubic, 282 gamma trick, 18, 94, 95 general point, 44 deflation, 190-193, 195 generic, 45, 46 degree, 230 simply, 332 degree of a polynomial, 5 generic Bezout number, 346, 351 desingularization, 310 generic factorization, 316, 317 diagonal intersection, 289 generic line, 46 differentiable manifold, 301 generic linear change of coordinates, 213 dimension, 44, 207, 216, 306 generic linear projection, 213, 324 upper semicontinuity, see upper generic point, 44 semicontinuity of dimension generic projection, see generic linear dimension of a germ, 309 projection dimensional complex manifold, 302 generic root count, 346, 351 discriminant, 57 generic with respect to an algebraic set, disk, 58 233 Dixon determinant, 77 generically, 45 dominant map, 222, 311 genericity, 43 dual curve, 338 germ, 308 dimension of a, 309 elementary symmetric functions, 280 irreducible, 309 elimination methods, 72 germ of a complex analytic set, 308 endgame, 177 germ of an affine algebraic set, 308
linear projections, 212 linear slicing, 231 link length, 158 locally irreducible, 309 losing the endgame, 188 manifold, 301 manifold point, 44, 306 map finite, 312 proper, 212 maximum principle, 304, 311 membership test, 266 Minkowski sum, 139 mixed strategy, 150 mixed volume, 138, 140 monodromy, 275-277, 339, 348 monodromy action, 278 monomial, 5 Mount Everest of Kinematics, see six-revolute serial-link robots multidegree notation, xxi, 5, 301, 322 multihomogeneous polynomial, 35 multiplicity, 8, 209, 223, 224, 236 multiprojective space, 35 Nash equilibria, 149-151, 170 nested parameter homotopy, 101 Newton polytope, 138 Newton's method, 17, 18, 24, 71, 177, 182 Newton-Raphson method, 17 nine-point path synthesis problem, 112, 161, 167 Noether Normalization Theorem, 214, 336 nonreduced, 236 nonsolutions, 258 normal, 281, 303, 308 normal complex analytic space, 308 normalization, 189, 311 Nullstellensatz, 307 numerical algebraic geometry, vii, 227-229, 241 numerical elimination theory, 266 numerical irreducible decomposition, 228, 230, 231, 253, 265 overdetermined, 241 parameter homotopy, see coefficient-parameter homotopy
patch switching, 38 path generation, 162 path synthesis problems, 166 Plucker embedding, 326 point at infinity, 30 polyhedron, 138 polynomial system, 209 polytope, 138 polytope root count, 139 power-series endgame, 199, 285 power-series method, 183, 185, 186, 189, 194 precision-point methods, 163 primary decomposition, 216 probabilistic algorithm, 249 probability one, 43, 50, 313 probability-one methods, 207 projective transformation, 39 projective algebraic set, 34, 207, 217 projective line, 30 projective plane, 32 projective rank of an algebraic system, 319, 320, 330 projective set, 43 projective space, 27-30 projective transformation, 38, 40, 198 proper, 189 proper algebraic map, 212 proper map, 212 proper mapping theorem, 311 Puiseux's Theorem, 310 pure-dimensional, 216, 219
Riemann Bounded Extension Theorem, 303
quadric surface, 334 quasiprojective algebraic set, 44, 207, 208 quasiprojective set, see quasiprojective algebraic set, 219
sampling, 272, 273 Sard's Theorem, 313 secant variety, 335 section of a line bundle, 218 Segre embedding, 293, 334, 335 set of indeterminacy, 317 seven-bar structures, 172 simply generic, 332 singular path tracking, 273, 284 singular point, 44, 306 singular set, 307 six-revolute inverse position, 172 six-revolute serial-link robots, 156 slicing, 231 smooth immersion, 279 smooth point, 44, 306 solution sets, 8 spanned, 342 square system, 241 Stein factorization Theorem, 312 Stewart-Gough forward kinematics, 154 Stewart-Gough platform robots, ix, 101, 104-106, 108, 109, 111, 113-115, 154, 171 straight-line function, 6, 11, 12, 48, 70, 85, 362 submatrix, 332 Sylvester determinant, 56 Sylvester matrix, 65 Sylvester Resultant, 57 symmetric group, 340 synthesis, 163 synthesis problems, 164, 169 system of coordinates, 304
radical, 216 rank of a polynomial system, 228, 239, 240 rank of an algebraic system, 318, 319, 329 rational mapping, 317, 338 real dimension, 210, 306 reduced, 236 reduction to the diagonal, 290 regular point, 215, 306 Remmert-Stein Factorization Theorem see Stein Factorization Theorem, 312 resultant, 57, 73
topologically unibranch, 309 topology, 211 classical, see complex topology complex, see complex topology Zariski, see Zariski topology total degree of a polynomial, 5 trace, 187, 279-281 trace method, 187, 189 trace test, 279 trigonometric equations, 7 twist angle, 158
underdetermined, 241 universal field, 52 universal function, 323 universal system, 323 upper semicontinuity of dimension, 312 variety, 8 vector bundle, 341, 343 Veronese embedding, 334 Wilkinson polynomials, 11 winding number, 180, 182, 183 witness point superset, 244, 245, 255 witness set, see witness point set, 8, 229, 235 witness superset, 253, 256 Zariski closed set, 211 Zariski open set, 92, 211 Zariski topology, 211, 221