MULTISCALE OPTIMIZATION METHODS AND APPLICATIONS
Nonconvex Optimization and Its Applications VOLUME 82 Managing Editor: Panos Pardalos University of Florida, U.S.A.
Advisory Board: J. R. Birge University of Chicago, U.S.A. Ding-Zhu Du University of Minnesota, U.S.A. C. A. Floudas Princeton University, U.S.A. J. Mockus Lithuanian Academy of Sciences, Lithuania H. D. Sherali Virginia Polytechnic Institute and State University, U.S.A. G. Stavroulakis Technical University Braunschweig, Germany H. Tuy National Centre for Natural Science and Technology, Vietnam
MULTISCALE OPTIMIZATION METHODS AND APPLICATIONS
Edited by WILLIAM W. HAGER University of Florida, Gainesville, Florida SHU-JEN HUANG University of Florida, Gainesville, Florida PANOS M. PARDALOS University of Florida, Gainesville, Florida OLEG A. PROKOPYEV University of Florida, Gainesville, Florida
Springer
Library of Congress Control Number: 2005933792 ISBN-10: 0-387-29549-6
e-ISBN: 0-387-29550-X
ISBN-13: 978-0387-29549-7
Printed on acid-free paper.
© 2006 Springer Science+Business Media, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed in the United States of America. 987654321 springeronline.com
Contents
Multiscale Optimization in VLSI Physical Design Automation
Tony F. Chan, Jason Cong, Joseph R. Shinnerl, Kenton Sze, Min Xie, Yan Zhang ..... 1

A Distributed Method for Solving Semidefinite Programs Arising from Ad Hoc Wireless Sensor Network Localization
Pratik Biswas, Yinyu Ye ..... 69

Optimization Algorithms for Sparse Representations and Applications
Pando G. Georgiev, Fabian Theis, Andrzej Cichocki ..... 85

A Unified Framework for Modeling and Solving Combinatorial Optimization Problems: A Tutorial
Gary A. Kochenberger, Fred Glover ..... 101

Global Convergence of a Non-monotone Trust-Region Filter Algorithm for Nonlinear Programming
Nicholas I. M. Gould, Philippe L. Toint ..... 125

Factors Affecting the Performance of Optimization-based Multigrid Methods
Robert Michael Lewis, Stephen G. Nash ..... 151

A Local Relaxation Method for Nonlinear Facility Location Problems
Walter Murray, Uday V. Shanbhag ..... 173

Fluence Map Optimization in IMRT Cancer Treatment Planning and A Geometric Approach
Yin Zhang, Michael Merritt ..... 205

Panoramic Image Processing using Non-Commutative Harmonic Analysis Part I: Investigation
Amal Aafif, Robert Boyer ..... 229

Generating Geometric Models through Self-Organizing Maps
Jung-ha An, Yunmei Chen, Myron N. Chang, David Wilson, Edward Geiser ..... 241

Self-similar Solution of Unsteady Mixed Convection Flow on a Rotating Cone in a Rotating Fluid
Devarapu Anilkumar, Satyajit Roy ..... 251

Homogenization of a Nonlinear Elliptic Boundary Value Problem Modelling Galvanic Interactions on a Heterogeneous Surface
Y.S. Bhat ..... 263

A Simple Mathematical Approach for Determining Intersection of Quadratic Surfaces
Ken Chan ..... 271

Applications of Shape-Distance Metric to Clustering Shape-Databases
Shantanu H. Joshi, Anuj Srivastava ..... 299

Accurately Computing the Shape of Sandpiles
Christopher M. Kuster, Pierre A. Gremaud ..... 305

Shape Optimization of Transfer Functions
Jiawang Nie, James W. Demmel ..... 313

Achieving Wide Field of View Using Double-Mirror Catadioptric Sensors
Ronald Perline, Emek Kose ..... 327

Darcy Flow, Multigrid, and Upscaling
James M. Rath ..... 337

Iterated Adaptive Regularization for the Operator Equations of the First Kind
Yanfei Wang, Qinghua Ma ..... 367

Recover Multi-tensor Structure from HARD MRI Under Bi-Gaussian Assumption
Qingguo Zeng, Yunmei Chen, Weihong Guo, Yijun Liu ..... 379

PACBB: A Projected Adaptive Cyclic Barzilai-Borwein Method for Box Constrained Optimization
Hongchao Zhang, William W. Hager ..... 387

Nonrigid Correspondence and Classification of Curves Based on More Desirable Properties
Xiqiang Zheng, Yunmei Chen, David Groisser, David Wilson ..... 393
Preface
The Conference on Multiscale Optimization Methods and Applications (February 26-28, 2004) and a Student Workshop (March 3-4) took place at the University of Florida (UF), hosted by the Center for Applied Optimization and the local SIAM student chapter, the SIAM Gators. The organizers of the optimization conference were William Hager, Timothy Davis, and Panos Pardalos, while the student workshop was organized by a committee chaired by Beyza Aslan, president of the SIAM Gators. In addition, Jung-ha An, Yermal Bhat, Shu-Jen Huang, Oleg Prokopyev, and Hongchao Zhang co-edited the student paper submissions to this volume. The conferences were supported by the National Science Foundation and by UF's Mathematics Department, Industrial and Systems Engineering Department, Computer and Information Science and Engineering Department, College of Liberal Arts and Sciences, and Division of Sponsored Research.

At the optimization conference, poster prizes were awarded to Lei Wang, Rutgers University; Kenton Sze, UCLA; Jiawei Zhang, Stanford University; and Balabhaskar Balasundaram, Texas A&M University. At the student workshop, the awards for best presentation were given to Dionisio Fleitas, University of Texas, Arlington, and Firmin Ndeges, Virginia Polytechnic Institute. For details concerning the student workshop, see the article "By students for students: SIAM Gators welcome chapters nationwide to Florida conference," SIAM News, Vol. 37, Issue 8.

The conferences focused on the development of new solution methodologies, including general multilevel solution techniques, for tackling difficult, large-scale optimization problems that arise in science and industry. Applications presented at the conference included:

(a) the circuit placement problem in VLSI design,
(b) the protein folding problem and drug design,
(c) a wireless sensor location problem,
(d) internet optimization,
(e) the siting of substations in an electrical network,
(f) optimal dosages in the treatment of cancer by radiation therapy,
(g) facility location, and
(h) shape and topology optimization.
These problems are challenging and intriguing, often easy to state, but difficult to solve. In each case, complexity is related to geometry: components must be placed so as to satisfy geometric constraints, while optimizing a cost function. The geometric constraints lead to an exponentially large solution space. The development of efficient techniques to probe this huge solution space is an ongoing effort that can have an enormous economic impact. The key to success seems to be the development of techniques that exploit problem structure.

In solving difficult problems, it is important to have a quantitative way to compare the effectiveness of different approaches. For the circuit placement problem (the subject of talks by Jason Cong and Tony Chan, and a poster by Kenton Sze), benchmarks have been developed with known optimal solutions. When state-of-the-art algorithms for the circuit placement problem were applied to these benchmarks, researchers were startled to see the large gap between the best algorithms and the actual optimum (a factor of two difference between the best approximation and the optimum). The idea of generating test problems with known optimal solutions seems to be one research direction of importance in the coming years. For certain classes of problems, such as the quadratic assignment problems, best-known solutions as well as lower bounds on the cost are published. Although bounds and estimates are useful, test problems with known optimal solutions provide a quantitative way to compare the advantages and disadvantages of algorithms. The Netlib LP test set catalyzed the development of linear programming algorithms during the past decade, while the circuit placement benchmarks appear to be having a similar impact in VLSI design.

In the area of VLSI design, a multilevel approach has proved to be an effective way to cope with the huge solution space (see paper of Tony F. Chan et al.). This approach, closely connected with ideas developed for multigrid techniques in partial differential equations, uses a series of scaled approximations to the original problem. The coarsest approximation is easiest to solve, and coarse information passed on to the next finer level gives a good starting point for a more difficult problem. Research is progressing towards systematic ways of moving back and forth between problem scales in an increasing number of application areas (see paper of Michael Lewis and Stephen Nash). In large-scale discrete optimization, another important strategy is to transform the discrete problem into a continuous setting. This is being done in many different ways. Semidefinite programming is used to obtain approximations to partitioning problems with a guaranteed error bound. Continuous quadratic programs are used in reformulations of both graph partitioning and maximum clique problems. A parameterized exponential transformation is used in the siting substation problem (paper by Walter Murray and Uday Shanbhag) to obtain a feasible point.
Interest in interior point methods remains strong; they are the basis of some powerful optimization packages. On the other hand, recent success was reported by Igor Griva with an exterior approach based on a primal-dual nonlinear rescaling method. The approach was particularly effective in a neighborhood of an optimum, where numerical errors and stability issues impede convergence of interior point methods. In the area of continuous, smooth optimization, the "Newton Liberation Front" (NLF), introduced by Philippe Toint, received strong endorsement. In the past, researchers placed much emphasis on global convergence of algorithms. This led to rather stringent criteria for the acceptance of a Newton step. A new class of acceptance criteria is emerging, the so-called filter methods. A new variation, with a less stringent acceptance criterion, yielded superior performance on a large set of test problems (paper of Nicholas Gould and Philippe Toint).

The papers in this book highlight some of the research presented at the Gainesville conferences. Additional papers will appear in a special issue of Computational Optimization and Applications. We would like to thank the sponsors and participants of the conference, the authors, the anonymous referees, and the publisher for helping us produce this volume.
Gainesville, Florida, USA April, 2005
William W. Hager
Shu-Jen Huang
Panos M. Pardalos
Oleg A. Prokopyev
List of Contributors
Amal Aafif, Department of Mathematics, Drexel University, Philadelphia, PA 19104, USA. amal@drexel.edu

Jung-ha An, Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. jungha@math.ufl.edu

Devarapu Anilkumar, Department of Mathematics, Indian Institute of Technology Madras, Chennai 600 036, India. [email protected]

Y.S. Bhat, Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. ybhat@math.ufl.edu

Pratik Biswas, Electrical Engineering, Stanford University, Stanford, CA 94305, USA. pbiswas@stanford.edu

Robert Boyer, Department of Mathematics, Drexel University, Philadelphia, PA 19104, USA. rboyer@mcs.drexel.edu

Ken Chan, The Aerospace Corporation, 15049 Conference Center Drive, Chantilly, VA 20151, USA. kenneth.f.chan@aero.org

Tony F. Chan, UCLA Mathematics Department, Los Angeles, California 90095-1555, USA. chan@math.ucla.edu

Myron N. Chang, Department of Biostatistics, University of Florida, Gainesville, FL 32611, USA. mchang@cog.ufl.edu
Yunmei Chen, Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. yun@math.ufl.edu

Andrzej Cichocki, Laboratory for Advanced Brain Signal Processing, Brain Science Institute, RIKEN, Wako-shi, Japan. cia@bsp.brain.riken.jp

Jason Cong, UCLA Computer Science Department, Los Angeles, California 90095-1596, USA. cong@cs.ucla.edu

James W. Demmel, Department of Mathematics, University of California, Berkeley, CA 94710, USA. demmel@math.berkeley.edu

Edward Geiser, Department of Medicine, University of Florida, Gainesville, FL 32611, USA. geiseea@medicine.ufl.edu

Pando G. Georgiev, ECECS Department, University of Cincinnati, Cincinnati, Ohio 45221-0030, USA. pgeorgie@ececs.uc.edu

Fred Glover, School of Business, University of Colorado at Boulder, Boulder, Colorado 80304, USA. Fred.Glover@Colorado.edu

Nick Gould, Computational Science and Engineering Department, Rutherford Appleton Laboratory, Chilton, Oxfordshire, England. gould@rl.ac.uk

Pierre A. Gremaud, Department of Mathematics and Center for Research in Scientific Computation, Raleigh, NC 27695, USA. gremaud@math.ncsu.edu

David Groisser, Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. groisser@math.ufl.edu

Weihong Guo, Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. guo@math.ufl.edu

William W. Hager, Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. [email protected]
Shantanu H. Joshi, Department of Electrical Engineering, Florida State University, Tallahassee, FL 32310, USA. joshi@eng.fsu.edu

Gary A. Kochenberger, School of Business, University of Colorado at Denver, Denver, Colorado 80217, USA. Gary.Kochenberger@cudenver.edu

Emek Kose, Department of Mathematics, Drexel University, Philadelphia, PA 19103, USA. [email protected]

Christopher M. Kuster, Department of Mathematics and Center for Research in Scientific Computation, Raleigh, NC 27695, USA. cmkuster@math.ncsu.edu

Robert Michael Lewis, Department of Mathematics, College of William & Mary, Williamsburg, Virginia 23187-8795, USA. buckaroo@math.wm.edu

Yijun Liu, Department of Psychiatry, University of Florida, Gainesville, FL 32611, USA. [email protected]

Qinghua Ma, Department of Information Sciences, College of Arts and Science of Beijing Union University, Beijing 100038, P.R. China. qinghua@ygi.edu.cn

Michael Merritt, Department of Computational and Applied Mathematics, Rice University, Houston, TX 77005-4805, USA. mmerritt@caam.rice.edu

Walter Murray, Department of Management Science and Engineering, Stanford University, Stanford, CA 94305-4026, USA. walter@stanford.edu

Stephen G. Nash, School of Information Technology and Engineering, Mail Stop 5C8, George Mason University, Fairfax, VA 22030, USA. snash@gmu.edu

Jiawang Nie, Department of Mathematics, University of California, Berkeley, CA 94710, USA. njw@math.berkeley.edu

Ronald Perline, Department of Mathematics, Drexel University, Philadelphia, PA 19103, USA. rperline@mcs.drexel.edu
James M. Rath, Institute for Computational Engineering and Sciences, University of Texas at Austin, TX 78712, USA. organism@ices.utexas.edu

Satyajit Roy, Department of Mathematics, Indian Institute of Technology Madras, Chennai 600 036, India. sjroy@iitm.ac.in

Uday V. Shanbhag, Department of Mechanical and Industrial Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. udaybag@stanford.edu

Joseph R. Shinnerl, UCLA Computer Science Department, Los Angeles, California 90095-1596, USA. shinnerl@cs.ucla.edu

Anuj Srivastava, Department of Statistics, Florida State University, Tallahassee, FL 32306, USA. anuj@stat.fsu.edu

Kenton Sze, UCLA Mathematics Department, Los Angeles, California 90095-1555, USA. nksze@math.ucla.edu

Fabian Theis, Institute of Biophysics, University of Regensburg, D-93040 Regensburg, Germany. fabian@theis.name

Philippe L. Toint, Department of Mathematics, University of Namur, 61, rue de Bruxelles, B-5000 Namur, Belgium. philippe.toint@fundp.ac.be

Yanfei Wang, State Key Laboratory of Remote Sensing Science, P.O. Box 9718, Beijing 100101, P.R. China. yfwang_ucf@yahoo.com

David Wilson, Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. dcw@math.ufl.edu

Min Xie, UCLA Computer Science Department, Los Angeles, California 90095-1596, USA. xie@cs.ucla.edu

Yinyu Ye, Management Science and Engineering, Stanford University, Stanford, CA 94305, USA. yinyu-ye@stanford.edu

Qingguo Zeng, Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. qingguo@math.ufl.edu
Hongchao Zhang, Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. hzhang@ufl.edu

Yin Zhang, Department of Computational and Applied Mathematics, Rice University, Houston, TX 77005-4805, USA. yzhang@caam.rice.edu

Yan Zhang, UCLA Computer Science Department, Los Angeles, California 90095-1596, USA. zhangyan@cs.ucla.edu

Xiqiang Zheng, Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. xzheng@math.ufl.edu
Multiscale Optimization in VLSI Physical Design Automation

Tony F. Chan¹, Jason Cong², Joseph R. Shinnerl², Kenton Sze¹, Min Xie², and Yan Zhang²

¹ UCLA Mathematics Department, Los Angeles, California 90095-1555, USA. {chan,nksze}@math.ucla.edu
² UCLA Computer Science Department, Los Angeles, California 90095-1596, USA. {cong,shinnerl,xie,zhangyan}@cs.ucla.edu
Summary. The enormous size and complexity of current and future integrated circuits (IC's) presents a host of challenging global, combinatorial optimization problems. As IC's enter the nanometer scale, there is increased demand for scalable and adaptable algorithms for VLSI physical design: the transformation of a logical-temporal circuit specification into a spatially explicit one. There are several key problems in physical design. We review recent advances in multiscale algorithms for three of them: partitioning, placement, and routing.

Key words: VLSI, VLSICAD, layout, physical design, design automation, scalable algorithms, combinatorial optimization, multiscale, multilevel
1 Introduction

In the computer-aided design of very-large-scale integrated circuits (VLSICAD), physical design is concerned with the computation of a precise, spatially explicit, geometrical layout of circuit modules and wires from a given logical and temporal circuit specification. Mathematically, the various stages of physical design generally amount to extremely challenging mixed integer nonlinear-programming problems, including large numbers of both continuous and discrete constraints. The numbers of variables, nonconvex constraints, and discrete constraints range into the tens of millions and beyond. Viewed discretely, the solution space grows combinatorially with the number of variables. Viewed continuously, the number of local extrema grows combinatorially. The principal goals of algorithms for physical design are (i) speed and scalability; (ii) the ability to accurately model and satisfy complex physical constraints; and (iii) the ability to attain states with low objective values subject to (i) and (ii).
Highly successful multiscale algorithms for circuit partitioning first appeared in the 1990s [CS93, KAKS97, CAM00]. Since then, multiscale metaheuristics for VLSICAD physical design have steadily gained ground. Today they are among the leading methods for the most critical problems, including partitioning, placement, and routing. Recent experiments strongly suggest, however, that the gap between optimal and attainable solutions remains quite substantial, despite the burst of progress in the last decade. Thus, an improved understanding of the application of multiscale methods to the large-scale combinatorial optimization problems of physical design is widely sought [CS03].

A brief survey of some leading multiscale algorithms for the principal stages of physical design — partitioning, placement, and routing — is presented here. First, the role of physical design in VLSICAD is briefly described, and recent experiments revealing a large optimality gap in the results produced by leading placement algorithms are reviewed.

1.1 Overview of VLSI Design

As illustrated in Figure 1, VLSI design can be divided into the following steps: system modeling, architectural synthesis, logic synthesis, physical design, fabrication, and packaging.
• System modeling. The concepts in the designer's mind are captured as a set of computational operations and data dependencies subject to constraints on timing, chip area, etc.
• Functional Design. The resources that can implement the system's operations are identified, and the operations are scheduled. As a result, the control logic and datapath interconnections are also identified. Functional design is also called high-level synthesis.
• Logic synthesis. The high-level specification is transformed into an interconnection of gate-level Boolean primitives — nand, xor, etc. The circuit components that can best realize the functions derived in functional design are assembled. Circuit delay and power consumption are considered at this step. The output description of the interconnection between different gate-level primitives is usually called a netlist (Section 1.2).
• Physical design. The actual spatial layout of circuit components on the chip is determined. The objectives during this step usually include total wirelength, maximum signal propagation time ("performance"), etc. Physical design can be further divided into steps including partitioning, floorplanning, placement, and routing; these are described in Section 1.2 below.
• Fabrication. Fabrication involves the deposition and diffusion of material onto a silicon wafer to achieve desired electronic circuit properties. Since designs will make use of several layers of metal for wiring, masks mirroring the layout on each metal layer will be applied in turn to produce the required interconnection pattern by photolithography.
• Packaging. The wafer is diced into individual chips, which are then packaged and tested.

Fig. 1. VLSI design includes system specification, functional design, logic synthesis, physical design, fabrication, and packaging.
As the fundamental physical barriers to continued transistor miniaturization begin to take shape, efforts in synthesis and physical design have intensified. The main component stages of physical design are reviewed next in more detail.

1.2 Overview of VLSI Physical Design

At the physical level, an integrated circuit is a collection of rectangular modules connected by rectangular wires. The wires are arranged in parallel, horizontal layers stacked along the z axis; the wires in each layer are also parallel. Each module has one face in a prescribed rectangle in the xy-plane known as the placement region. However, different modules may intersect with different numbers of metal wiring layers. After logic synthesis, most modules are
selected from a cell library and assigned to logic elements as part of a process known as technology mapping. These modules are called standard cells or simply cells. Their widths (x-direction) may vary freely, but their heights (y-direction) are taken from a small, discrete set. Other, larger modules may represent separately designed elements known as IP blocks (intellectual-property blocks) or macros; the heights of these larger blocks typically do not fall within the standard-cell heights or their integer multiples. The area of a module refers to the area of its cross-sections in the xy-plane. A signal may propagate from a source point on a module to any number of sinks on other modules. The source and sinks together define a net. At steady state, a net is an equipotential of the circuit. A connection point between a net and a module is called a pin. Hence, a net may be abstracted as either a set of pins, or, less precisely, as the set of modules to which these pins belong. The netlist specifies the nets of a circuit as lists of pins and is a product of logic synthesis (Section 1.1). The physical elements of an IC are illustrated in Figure 2. As illustrated in Figure 3, VLSI physical design proceeds through several stages, including partitioning, floorplanning, placement, routing, and compaction [DiM94, She99].

Fig. 2. A 2-D illustration of the physical elements of an integrated circuit. The routing layers have been superimposed.
Partitioning
Floorplanning
Placement
Routing
Compaction
Fabrication
Fig. 3. Stages in the physical design of integrated circuits. Partitioning Due to the complexity of integrated circuits, the first step in physical design is usually to divide a design into subdesigns. Considerations include area, logic functionality, and interconnections between subdesigns. Partitioning is applied recursively until the complexity in each subdesign is reduced to the extent that it can be handled efficiently by existing tools. Floorplanning The shapes and locations of the components within each partitioning block are determined at this stage. These components are also called blocks and may be reshaped. Floorplanning takes as input a set of rectangular blocks, their fixed areas, their allowed shapes expresed as maximum aspect ratios, and the connection points on each block for the nets containing it. Its output includes the shape and location of each block. Constraints may involve the location
6
Tony F, Chan et al.
of a block and/or adjacency requirements between arbitrary pairs of blocks. The blocks are not allowed to overlap. Floorplanning is typically limited to problems with a few hundred or a few thousand blocks. As such, it is typically used as a means of coarse placement on a simplified circuit model, either as a precursor to placement or as a means of guiding logic synthesis to a physically reasonable solution. Placement In contrast to floorplanning, placement treats the shapes of all blocks as fixed; i.e., it only determines the location of each block on the chip. The variables are the xy-locations of the blocks; most blocks are standard cells (Section 1.2). The y-locations of cells are restricted to standard-cell rows, as in Figure 2. Placement instance sizes range into the tens of millions and will continue to increase. Placement is usually divided into two steps: global placement and detailed placement. Global placement assigns blocks to certain subregions of the chip without determining the exact location of each component within its subregion. As a result, the blocks may still overlap. Detailed placement starts from the result of global placement, removes all overlap between blocks, and further optimizes the design. Placement objectives include the estimated total wirelength needed to connect blocks in nets, the maximum expected wiring congestion in subsequent routing, and/or the timing performance of the circuit. A simphfied formulation of placement is given in Section 1.5.1. Routing With the locations of the blocks fixed, their interconnections as specified by the netlist must be realized. That is, the shapes and locations of the metal wires connecting the blocks must be determined. This wiring layout is performed not only within the placement region but also in a sequence of parallel metal routing layers above it. Cells constitute routing obstacles in layers which pass through them. Above the cells, all the wires in the same routing layer are parallel to the same coordinate axis, either x or y. Routing layers alternate in the direction of their wires. Interlayer connections are called vias. The objective of routing is to minimize the total wirelength while realizing all connections subject to wire spacing constraints within each layer. In addition, the timing performance of the circuit may also be considered. Routing is usually done in two steps, global routing and detailed routing. Global-routing algorithms determine a route for each connection in terms of the regions it passes through, without giving the exact coordinates of the connection. During this phase, the maximum congestion in each region must be kept below a certain hmit. The goal of detailed routing is to realize a point-to-point path for each net following the guidance given by the global routing. It is in this step that the geometric location and shape of each wire is determined. Due
Multiscale Optimization in VLSI Physical Design Automation
7
to the sequential nature of most routing algorithms, a 100% completion rate may not be obtained for many designs. An additional step called rip-up and reroute is used to remove a subset of connections already made and find alternate routes, so that the overall completion rate can be improved. The rip-up and reroute process works in an iterative fashion until either no improvement can be obtained or a certain iteration limit is reached. Compaction Compaction is used to reduce the white space on the chip so that the chip area can be minimized. This step involves heavy manipulation of geometric objects. Depending on the movement these geometric objects are allowed, compaction can be categorized into 1-D compaction or 2-D compaction. However, the chip area for many of the designs are given as fixed. In this case, instead of compacting the design, intelligent allocation of white space can be adopted to further optimize certain metrics, e.g., routing congestion, maximum temperature, etc. 1.3 Hypergraph Circuit Model for Physical Design An integrated circuit is abstracted more accurately as a hypergraph than as a graph, because each of its nets may connect not just a pair of nodes but rather an arbitrarily large subset of nodes. The details of the abstraction depend on the point in the design flow where it is used. A generic definition of the hypergraph concept is given here. In later sections, specific instances of it are given for partitioning, placement, and routing. A hypergraph H = {V, E} consists of a set of vertices V = {fi,f2, •••Vn] and a set of hyperedges E = {ei, 6 2 , . . . , e ^ } . Each hyperedge Cj is just some subset of V, i.e., ej = {t'jijVjj, ...f^^} C V. Each hyperedge corresponds to some net in the circuit. Each vertex Vi may have weight wivi) associated with it, e.g., area; each hyperedge ej may have weight w{ej) associated with it, e.g., timing criticality. In either case, the hypergraph itself is said to be weighted as well. The number of vertices contained by e^ (we will also say "connected by" ej ) is called the degree of Cj and is denoted |ej|. The number of hyperedges containing vi is called the degree of Vi and is denoted \vi\. Every hypergraph, weighted or unweighted, has a dual. The dual hypergraph H' = {V, £"} of a given hypergraph H — {y, -B} is defined as follows. First, let V = E; if H is weighted, then let w{vl) = w{ei). Second, for each Vi & V, let e[ e E' be the set of Cj £ E that contain Vi. li H is weighted, then let w{e'j) = 'w{vi). It is straightforward to show that H", the dual of H', is isomorphic to H. 1.4 The Gigascale Challenge Since the early 1960s, the number of transistors in an integrated circuit has doubled roughly every 18 months. This trend, known as Moore's Law, is ex-
8
Tony F. Chan et al.
pected to continue into the next decade. Projected statistics from the 2003 International Technology Roadmap for Semiconductors (ITRS 2003) are summarized in Table 1. Production year DRAM 1/2 pitch (nm) M transistors/chip Chip size (mm'') Local clock (MHz) Wiring levels
2003 100 153 140 2976 13
2004 90 193 140 4171 14
2005 80 243 140 5204 15
2006 70 307 140 6783 15
2007 65 386 140 9285 15
2008 57 487 140 10972 16
2009 50 614 140 12369 16
Table 1. Circuit Statistics Projections from ITRS 2003 [itr].
Over 40 years of this exponential growth have brought enormous complexity to integrated circuits, several hundred million transistors integrated on a single chip. Although the power of physical-design algorithms has also increased over this period, evidence suggests that the relative gap between optimal and achievable widens with increasing circuit size (Section 1.5). As the number and heterogeneity of devices on chip continues to increase, so does the difficulty in accurately modeling and satisfying various manufacturability constraints. Typically, constraints in VLSICAD physical design are concerned with module nonoverlap ("overlap"), signal propagation times ("timing"), wiring congestion ("routability"), and maximum temperature. A detailed survey of the modeling techniques currently used for these conditions is beyond the scope of this chapter. However, the practical utility of any proposed algorithm rests largely in (a) its scalability and (b) its ability to incorporate constraint modeling efficiently and accurately at every step. Recent research [CCXOSb, JCX03] strongly suggests that, over the last few decades, advances in algorithms for physical design have not kept pace with increasing circuit complexity. These studies are reviewed next. 1.5 Quantifying the Optimality Gap As even the simplest formulations of core physical-design problems are NPhard [She99j, practical algorithms rely heavily on heuristics. Meaningful bounds on the deviation from optimal are not yet known for these algorithms as applied to the design of real circuits. However, a recent optimality study of VLSI placement algorithms shows a large gap between solutions from stateof-the-art placement algorithms and the true optima for a special class of synthetic benchmarks. In this section, the wirelength-driven placement problem is introduced for the purpose of summarizing this study. In Section 3, the role of placement in VLSICAD is considered in more detail, and some recent multiscale algorithms for it are reviewed.
Multiscale Optimization in VLSI Physical Design Automation
9
1.5.1 The Placement Model Problem In the given hypergraph-netlist representation H = {V, E) of an integrated circuit, we require for placement that each Vi has a given, fixed rectangular shape. We assume given a bounded rectangle R in the plane whose boundaries are parallel to coordinate axes x and y. The orientation of each Vi will also be assumed prescribed in alignment with the boundaries of R, although in some cases flipping Vi across coordinate axes may be allowed. The length of Vi along the a;-axis is called its width; its length along the y-axis is called its height. The vertices Vi are typically represented at some fixed level of abstraction. Possible levels are, from lowest to highest, transistors, logic gates, standard cells, or macros (cf. Section 1.2). As IC's become more heterogeneous, the mixed-size problem, in which elements from several of these levels are simultaneously placed, increases in importance. The coverage in this chapter assumes the usual level of standard cells. Interconnections among placed cells (Section 4) are ultimately made not only within R but also in multiple routing layers above R; each routing layer has the same x and y coordinates as R but a different z coordinate. For this reason, the total area of all Vi G V may range anywhere from 50% to 98% or more of the area of R. With multiple routing layers above placed cells, making metal connections between all the pins of a net can usually be accomplished within the bounding box of the net. Therefore, the most commonly used estimate of the length of wire i{ei) that will be required for routing a given net ej = {vi^ ,Vi^,... ,Vij} is simply the half-perimeter of its bounding box: i{ei) = (m&xx{vi^) - minx{vi^)j
+ f m a x y ( v i j - mmy{v,,)j
.
(1)
Wirelength-Driven Model Problem In the simplest commonly used abstraction of placement, total 2D-boundingbox wirelength is minimized subject to the pairwise nonoverlap, row-alignment, and placement-boundary constraints. Let w{R) denote the width (along the x-direction) of the placement region R, and let y i , y 2 , . . . ,F„^ denote the ycoordinates of its standard-cell rows' centers; assume every cell fits in every row. With {xi, yi) denoting the center of cell Vi and Wi denoting its width, this wirelength-driven form of placement may be expressed as ™n(j.._y,) I]eg£;iy(e)^(e)
for t{e) defined in (1)
subject to yj G {Yi,...,y„^} all ^i G V" 0 < Xj < w{R) - Wi/2 all Vi&V \xi -Xj\ > {wi + Wj)/2 or yi ^ j/j all Vi,Vj G V.
(2)
Despite its apparent simplicity, this formulation captures much of the difficulty in placement. High-quality solutions to (2) generally serve as useful starting
10
Tony F, Chan et al.
wm
mm \ r
F
\
i
w T)
H
i
Sii
^
n %
W
T J
imm^^m^^m:
m mm
:•;•:':
% :•:•:•.
•<:•:•
Fig. 4. PEKO generation for p = 9, D = (6,2,2). points for more elaborate models of real circuits. The optimality gap for most leading academic tools applied to (2) has been observed to be quite large for the PEKO benchmarks discussed next. 1.5.2 Placement Examples with Known Optima (PEKO) Placement algorithms have been actively studied for the past 30 years. However, there is little understanding of how far computed solutions are from optimal. It is also not known how much the deviation from optimality is likely to grow with respect to problem size. Recently, significant progress was made toward answers to these questions using cleverly constructed placement examples with known optima (PEKO) [CCXOSa]. The construction of PEKO can be stated as follows. Given a netlist A'^, let D{N) = {d2, ( i s , . . . , dn) be the Net Distribution Vector (NDV), where dk is the total number of fc-pin nets in the nethst. PEKO examples have all cells of equal size. Given a number p and a vector D, we construct a placement example with p placeable cells such that (i) its nethst has D as its NDV and (ii) it has a known placement of optimal half-perimeter wirelength. The cells are first arranged in a nearly square rectangular region as a regular 2-D array of uniform rows and columns, except possibly the last row, which may not be filled. After that, nets are defined one by one on the cells in such a way that the bounding box for each net has minimal perimeter. Each fc-pin net connects cells within a region of size \/fc X k/
vfc
(or k/
vfc
x
\/fe ). The
wirelength for each fc-pin net thus constructed is optimal. In the end, the specific netlist is extracted from this placed configuration. Figure 4 shows an example, where p = 9, D = (6,2,2). Net A is a 4-pin net. Accordingly, it will connect four cells located in a 2 x 2 rectangular region. In Figure 4, it connects the four cells in the lower left corner. The other 4-pin net, B, is placed on the lower right corner. Using the same method, the two 3pin nets are generated as C and D respectively. This process is repeated until the NDV is exhausted. The total wirelength for this example is 6*l-f2*2+2*2
Multiscale Optimization in VLSI Physical Design Automation
11
Fig. 5. An 8x8 PEKO instance. = 14. To mimic real circuits, the NDV used in [CCXOSa] to generate PEKO are extracted from real circuits [Alp98]. Another, still tiny example with a more realistic NDV is shown in Figure 5. Four state-of-the-art placers from academia including Dragon [WYSOOa], Capo [CKMOO], mPL [CCKSOO], mPG [CCPY02], and one industrial placer, QPlace [Cad99] are compared on PEKO. Figure 6 compares their ratios of attained wirelengths to optimal wirelengths for several instances of PEKO of different size. Overall, their wirelengths are 1.59 to 2.40 times the optimal in the worst cases and are 1.43 to 2.12 times the optimal on average. As for scalability, the average solution quality of each tool deteriorates by an additional 9% to 17% when the problem size increases by a factor of 10. These results indicate significant room for improvement in existing placement algorithms. Run times are shown in Figure 7. These results have generated great interest among industrial designers and academic researchers, with over 150 downloads by major universities and EDA and semiconductor companies, e.g.. Cadence, Synopsys, Magma, IBM, and Intel, etc., in just the first year of their release. There have also been three EE times articles covering the study [Goe03c, GoeOSb, Goe03a]. 1.6 Multiscale Optimization — Opportunities and Obstacles Concurrently with the steady advances in VLSI design, multiscale methods have emerged as a means of of generating scalable solutions to many diverse
12
Tony F. Chan et al.
'
Dragon 2.20 Capo 8.6 mPGLO mPL3.0 QPIace6.1
/
/\
I ''
/
"-.
AL
_ i...
:ri^s|"
\p
- * - . •
-»'-
»
* 1
.U*-
—X— •
II
V-V
•'
-
p ?
^ Ik?-- J .,.Q„,
.
.••-°s"l„e.'
-J ...a/ \!fl •=
HB
.
Fig. 6. Wirelength ratios vs. numbers of cells on PEKO test cases. Each ratio is the wirelength attained by the given tool divided by the optimal wirelength on the indicated benchmark.
90000
-
1
80000
30000
*1 20000
/
-
V*
5J
*
-
/
/J
40000
••
i 1
-
/
\
50000
-
-
;
60000
0
• - * • -
1 1
70000
10000
Dragon 2.20 Capo 8.6 - —X'mPG 1.0 • m PL 3.0 •„.Q„.. QPIace5.1 '
;
/
zlEi^sl!;
F i g . 7. Run time vs. numbers of cells on PEKO test cases.
-
Multiscale Optimization in VLSI Physical Design Automation
13
mathematical problems in the gigascale range. However, multiscale methods for PDEs are not readily transferred to the large-scale combinatorial optimization problems common to VLSICAD. A lack of continuity presents one obstacle. The presence of myriad local extrema presents another. Nevertheless, in recent years, multiscale methods have also been successfully applied to VLSICAD [CS03]. In circuit partitioning, hMetis and MLpart [KAKS97, CAMOO] produce the best cutsize minimization, and MLPR [CW02] produces the best balance of timing delay and cutsize. Significant progress has also been made in multiscale placement [SR99, CCK+03, SWY02, CCPY02, KW04, HMS04] and routing [CFZOl, CXZ02, CL04]. Hierarchical levels of abstraction are indispensable in the design of gigascale complex systems, but hierarchies must properly represent physical relationships, viz., interconnects, among constituent parts. The flexibility of the multiscale heuristic provides the opportunity both to merge previously distinct phases in the design flow and to simultaneously model very diverse, heterogeneous kinds of constraints. 1.7 An Operative Definition of Multiscale Optimization The engineering complexity of VLSICAD has led researchers to a large variety of algorithms and terminology quite distinct from what exists in other areas. In this section, we take a fairly inclusive view and try not to impose artificial distinctions. By multiscale optimization, we mean (i) the use of optimization at every level of a hierarchy of problem formulations, wherein (ii) each variable at any given coarser level represents a subset of variables at the adjacent finer level. In particular, each coarse-level formulation can be viewed directly as a coarse representation of the original problem. Therefore, coarse-level solutions implicitly provide approximate solutions at the finest level as well. While many long-standing paradigms in VLSICAD employ optimization hierarchicaUy, multiscale algorithms as defined above are a relatively new phenomenon. This distinction is considered further in Section 3.2. The terms multilevel and multiscale are used synonomously. Characterization of Multiscale Algorithnis L Hierarchy Construction. Although the construction is usually from the bottom up by recursive aggregation, top-down constructions are also possible, as described in Section 3.4 below. 2. Relaxation. In the combinatorial setting, the purpose of intralevel optimization is not generally viewed as error smoothing but rather the efficient, iterative exploration of the solution space at that level. Continuous, discrete, local, global, stochastic, and deterministic formulations may be used in various combinations.
14
Tony F. Chan et al,
3. Interpolation. A coarse-level solution can be transferred to and represented at its adjacent finer level in a variety of ways. The simplest and most common is simply the placement of all components of a cluster concentrically at the cluster's center. This choice amounts to a piecewiseconstant interpolation. 4. Iteration Flow. The levels of the hierarchy may be traversed in different ways. A single pass from the coarsest to the finest level is still the most common. Alternatives include standard flows such as a single V-cycle, multiple V-cycles, W-cycles, and the full multigrid (FMG) F-cycle (see Figure 8 on Page 14). The forms taken by these components are usually tightly coupled with the diverse objective models and constraint models used by different algorithms.
Iterated V-Cycles 0(logN))
F-Cycle (O(logNlogN))
Backtracking V-Cycle O(logN)
Fig. 8. Some iteration flows for multiscale optimization.
2 Multiscale Hypergraph Partitioning Given a hypergraph H = {V, E}, the hypergraph partitioning problem is to partition V into disjoint subsets P ~ {Vi,V2i •••, Vfc}, i.e., Vi nVj =4) for i ^ j ,
and UVi = V,
subject to certain objectives and constraints. Each block Vi of partition P is also called a partition. The area of partition Vi is defined as area{Vi) = Y J area(v) veVi
Multiscale Optimization in VLSI Physical Design Automation
15
The objective in hypergraph partitioning is usually to minimize the outsize, the number of hyperedges that connect vertices from two different partition blocks. Typical constraints in the hypergraph partitioning problem include the following. • •
The relative area of each partition block. E.g., li x area{V) < area{Vi) < Ui X area{V) for user-specified parameters /j and u,. The number of partition blocks k. When fc = 2, the problem is called bipartitioning. When fc > 2, the problem is called multiway partitioning. Multiway partitioning problems are often reduced to a recursive sequence of bipartitioning problems. We focus on bipartitioning algorithms for the rest of this section.
2.1 Early Multiscale Hypergraph Partitioning: F M C An early effort to accelerate the standard Fiduccia-Mattheyses Algorithm (FM, Figure 10 on Page 17) by recursive clustering was made without any deliberate connection to existing multiscale algorithms [CS93]. To facilitate coarsening, a graph corresponding to the original hypergraph is constructed. A bottom-up clustering algorithm is applied to recursively collapse small cliques into clusters. An iterative-refinement algorithm is then applied to the clustered hypergraph to optimize the cutsize, satisfying the area constraint at the same time. This refinement is coupled with a recursive declustering process until the partitioning result on the original hypergraph is obtained. This early work is referred to as FMC (FM with clustering). 2.1.1 Hypergraph-to-graph transformation Coarsening decisions in FMC are guided by the strength of connection between pairs of vertices. Because a hyperedge may connect more than 2 vertices, the hypergraph if first approximated by a graph. Following the so called cliquemodel approximation, each r-vertex hyperedge becomes a clique on the same r vertices in the graph. Since a hyperedge is now represented as a union of (l)
edges, their weights should be properly scaled. Several weighting
schemes have been proposed [CP68, Don88, HK72]. The one adopted in FMC assigns | to each edge of the clique, so that the total weight of the r-chque is the number of connections required to connect r vertices, namely, r — 1. When the same two vertices belong to several hyperedges, the edge-weight contributions from all of them are summed to determine the edge weight. For a pair of vertices vi and V2, the final edge weight between them is thus calculated as w{vi,V2) = J2 ifj' Figure 9 gives an example of the cUque model approximation to a 4-vertex hypergraph. The edge weight between t^i
16
Tony F. Chan et al. el
Fig. 9. Transformation from a hypergraph to graph. Each hyperedge is transformed into a clique between all the vertices in the hyperedge. and V2 is 2/2+2/3 = 5/3, whose two components correspond to hyperedges ei and 63, respectively. As most real circuits contain a small fraction of high-degree nets — these range in cardinality from 6 to a few hundred vertices — the actual size of the largest clique could easily reach several tens of thousands. Practical graphbased approximations therefore do not use cliques to model high-degree nets; stars are a popular alternative. However, large nets are empirically observed to have Httle impact on the result of partitioning. In FMC, all nets of degree more than 5 are simply ignored. 2.1.2 Coarsening The coarsening scheme in FMC recursively collapses small cliques in a graph into clusters. The intuition comes from the theory of a random graph of n nodes and edge probability p. Let Xr be the expected number of r-cliques. Then for most n, there exists a threshold ro such that Xr^ ^ 1 and XfO+i < 1. The threshold ro is calculated as ro = 2log(, n - 2logblogj, n + 2logb - + l + o(l), where b = I/p. In other words, the value of ro is an approximation of the size of the largest clique in the graph. It is empirically observed that ro is usually no more than 5 for typical hypergraphs found in VLSICAD. Starting from the transformed graph, the coarsening searches for ro-cUques and (ro + l)-cliques for clustering. However, only cliques meeting the following criteria are accepted. • • •
Area limit. No cluster's area may exceed this fixed fraction of the total area of the original graph. Size limit. No cluster may contain more than this total number of vertices from the original graph. Density threshold. The density of a cluster with c nodes, defined as the sum of the weights of its edges divided by I
|, must equal or exceed a
fixed fraction of the density of the whole graph, a x D, where D is the
Multiscale Optimization in VLSI Physical Design Automation
17
Algorithm: FM refinement Initial partition generation Compute the cutsize reduction of all vertex movements While there exists a movable vertex do find V with the maximum reduction in cutsize move V to the destination partition and fix v record the movement and the intermediate cutsize update cutsize the reduction of vertices connected with v End While Find the minimum intermediate cutsize Cmin Apply the sequence of moves which leads to Cmin Fig. 10. The FM partitioning algorithm for hypergraphs and graphs [FM82].
total edge weight of the graph divided by ( ^ j , n the total number of vertices in the graph. The parameter a is empirically determined. The density threshold is imposed to ensure that the vertices in a cluster are strongly connected. It also prevents cliques introduced by high-cardinality hyperedges from automatically becoming clusters. After each pass of clustering, the threshold ro is recomputed for the clustered graph, and another pass is made with the new threshold on the clustered graph. The process terminates once the number of clusters produced is too small. A weighted matching is then applied to reduce the number of single unclustered vertices. Each quahfying pair is collapsed to a cluster. This step helps to balance the size of the clusters and further reduces the number of nodes in the clustered graph. 2.1.3 Initial Solution, Declustering, and Iterative Refinement After the final clustered graph is obtained, partitioning is performed on it by FM (Figure 10). This refinement is iterated for several passes. Afterwards, the coarsened hypergraph is recursively declustered following the coarsening hierarchy. Because many of the clusters may be very large in area, the areabalance constraint is not strictly enforced at coarser levels. At each level of hierarchy, the refinement is repeated on the clusters at that level to improve area balance and further reduce cutsize. This process of gradual declustering and refinement is repeated until the original hypergraph is restored, where further FM refinement is performed on the entire hypergraph. 2.1.4 Impact of Coarsening on the FM algorithm The final partitioning results for FMC are compared with those for pure FM. The characteristics of the hypergraphs used for comparison are given in Table 2. On average, the coarsening step helps to reduce the best cut size by
18
Tony F. Chan et al. hypergraph ^vertices #hyperedges 8870 bml PrimGAl PrimSCl 5655 Test04 Test03 Test02
TestOe Test05 19ks PrimGA2 PrimSC2 industry2
502 882 733 733 921 1515 1607 1663 1752 2595 2844 3014 3014 12142 Avg.
494 902 902 902 760 1658 1618 1721 1674 2751 3282 3029 3029 12949
Best Cut FM FMC 17 15 65 53 48 49 46 45 54 48 44 44 93 64 121 80 60 64 42 42 151 129 246 130 199 130 458 315 16.60%
Avg. Cut FM FMC 27.4 17.3 82.9 69.6 69.2 64.7 74 56.2 67.9 62.6 46 48.4 137.9 80.1 181.1 105.9 79.9 74.8 61.3 57.5 173.7 156.1 284 181.3 277.2 229.1 812.6 402.3 21.20%
Table 2. Impact of coarsening on FM algorithm. On average, coarsening helps to reduce the best cut size by 16.6%. It reduces the average cut size by 21.2% [CS93].
16.6%. It reduces the average cut size by 21.2%. The impact of coarsening on the refinement stage is obvious in this scenario. The results are very promising; however, they do not demonstrate the full power of multiscale methods. There are several limitations in this work. First, the coarsening scheme only collapses cliques. This artificially imposed constraint will eliminate many possible coarsening candidates which might lead to better final cutsize. Second, its control on the cluster size is limited. Since a whole chque is collapsed each time, the coarsened graph tends to become more uneven in cluster size. To our knowledge, FMC is the first application of multiscale optimization to circuit partitioning. Following FMC, there have been several efforts to apply multiscale methods to the hypergraph partitioning problem [AHK97, KAKS97, CLW99, CLOO, CLWOO, CAMOO]. Among the most successful are hMetis [KAKS97] and MLpart [CAMOO], described next. 2.2 h M e t i s hMetis [KAKS97, Kar99, Kar02] was first proposed for bipartitioning and later extended to multiway partitioning. Its iteration flow is a sequence of the traditional multiscale V-cycles shown in Figure 11. It constructs a hierarchy of partitioning problems by recursive clustering, generates multiple initialsolution candidates, then recursively declusters and refines at each level. It improves its candidate solutions at each level of declustering by FM-based local relaxation (Figure 10). A certain fraction of the poorest solution candidates is discarded after relaxation so as not to incur excessive run-time at
Multiscale Optimization in VLSI Physical Design Automation
19
Refinement
Initial solution
Fig. 11. V-cycle of hMetis. It consists of a coarsening phoase, an initial partition generation, and a refinement phase. the adjacent finer level. Each V-cycle after the first improves on the the result of its predecessor by restricting its coarsening to preserve the initial given partition. 2.2.1 First-Choice Coarsening In First-Choice Clustering, each vertex is associated witii one of its most closely-connected neighbors, irrespective of whether that neighbor has already been associated with some other vertex. The function (p : V x V —* [0, oo) defining how closely two vertices are connected is called the vertex affinity. It defines a weighted affinity graph over the vertices of the hypergraph, as follows. Two vertices u and v are connected in the aflnity graph if and only if they both belong to some common hyperedge, in which case the weight of the edge joining them is
20
Tony F. Chan et al. 5/3 v]
2/2 ii
^v3
2/3
v2
Fig, 12. First-Choice Clustering. Vertex v will be clustered with vertex vi since the edge between v and vi has the highest weight among all the edges incident to v. 2.2.2 Initial partition generation At the coarsest level, where the number of clusters is small compared with the original hypergraph, hMetis generates a pool of candidate solutions and refines all of them. In the end, the ten best initial solutions are kept and propagated to the adjacent finer level. At each level, all candidates solutions are improved by iterative refinement, and some of the worst candidates are discarded prior to interpolation to the next level, so that the total run time is not compromised. The pool of candidates is thus gradually pared down to a single best solution at the finest level. Since many clusters are relatively large in area, moving them across the partitioning boundary will often result in an area-constraint violation. On the other hand, forbidding such moves may lead the partitioning solution to a local minimum. As a compromise, hMetis allows intermediate solutions to violate the area constraints, as does FMC. However, it saves those solutions that satisfy the area constraints in the end. 2.2.3 Iterative Refinement Two algorithms are used during the refinement at each level. The first is based on FM with early termination. The vertices are visited in random order. Each vertex is moved across the outline if the move reduces cutsize without violating the area constraints. As soon as k vertices are moved without improvement in cutsize, the refinement stops. The maximum number of passes is limited to two. The second algorithm, Hyperedge Refinement (HER), simultaneously moves groups of vertices in the same hyperedge across the partitioning boundary so that this hyperedge is no longer cut. Compared with FM, HER targets hyperedges rather than individual vertices to reduce the cutsize. 2.2.4 Iterative V-cycle After the first V-cycle, hMetis applies additional V-cycles to further refine the cutsize until no appreciable improvement can be obtained. In the coarsening
Fig. 13. Iterative flow of hMetis. The recoarsening does not have to start from the finest level. It can start in the middle of the V-cycle as well.

phase of each subsequent V-cycle, the partitioning result from the preceding V-cycle is preserved: only vertices that belong to the same partition block will be considered for clustering. The partitioning P_{i+1} of the next-level coarser hypergraph H_{i+1} is derived directly from the partitioning result P_i on H_i. The coarsening thus obtained differs from that used in the preceding V-cycle and often leads to improved results in the subsequent interpolation and refinement at each level. Due to complexity considerations, the recoarsening does not necessarily start from the finest level. It can start in the middle of the V-cycle instead, as in the standard W-cycle flow for numerical PDE solvers [BHM00]. Figure 13 gives such an example. Instead of letting the V-cycle finish and starting another, the recoarsening can start from a level between the coarsest and finest, and go through the following refinement stage, as in a normal V-cycle.

2.2.5 Comparison of hMetis with other Partitioning Algorithms

The improvement of hMetis in run time and quality over the previous state of the art is dramatic. Numerical experiments compare hMetis to CDIP-LA3 and CLIP-PROP [DD97], PARABOLI [RDJ94], GFM [LLC95], and GMetis [AK96]. Hypergraph characteristics for the test cases and the corresponding results for these algorithms are given in Table 3. Due to complexity issues, some algorithms do not produce reasonable output for some hypergraphs. Overall, the cutsize produced by hMetis is 4% to 23% less than that produced by previous partitioning algorithms. The speed and scalability of hMetis is even more impressive, as its run time (not shown in Table 3) is one to two orders of magnitude less than the competition's.

2.3 MLpart

MLpart [CAM00] is another successful multiscale hypergraph partitioner. As with hMetis, it has a recursive coarsening phase, generation of an initial solu-
hypergraph    |V|      |E|      CDIP-LA3  CLIP-PROP  PARABOLI  GFM  GMetis  hMetis
S15850        10470    10383      42        59         56       91    63      53
industry2     12637    13419     182       192        193      211   177     168
industry3     15406    21923     243       243        267      241   243     241
S35932        18148    17828      73        42         62       41    57      42
S38584        20995    20717      47        51         55       47    53      48
avq.small     21918    22124     139       144        224       na   144     127
S38417        23849    23843      74        65         49       81    69      50
avq.large     25178    25384     137       143        139       na   145     127
golem3       103048   144949      na        na       1629       na  2111    1424

Table 3. Comparison of hMetis with other state-of-the-art partitioning algorithms in 1997. For each given hypergraph, |V| denotes the number of vertices, and |E| denotes the number of hyperedges. Overall, the cutsize produced by hMetis is 4% to 23% better than that by previous partitioning algorithms [KAKS97].
tion, and interleaved interpolation and iterative refinement from the coarsest level back to the finest level.

2.3.1 Coarsening

MLpart has several distinct features in its coarsening scheme. One is that the netlist is continuously updated as the coarsening progresses. When a pair of vertices is chosen for clustering, the clustering effect is immediately visible to all the remaining unclustered vertices. The vertex-to-vertex connection-weight calculation of MLpart is likewise a sum of weight contributions from the hyperedges connecting a pair of vertices. However, in MLpart, the weight contribution of 2-vertex hyperedges is set to 2, whereas the contribution of hyperedges with more than 2 vertices is set to 1. The intuition is that the weight should correspond to the potential reduction of pins from the hypergraph. (When the vertices of a 2-pin net are clustered together, that net becomes a singleton at all coarser levels and can thus be removed at those levels.) To discourage the formation of large clusters, the weight is divided by the sum of the cluster areas. An area-balance limit set to 4.5× the average cluster size on each clustering level is imposed, so that clusters with area beyond the limit will not be formed. Clustering stops when the total number of clusters drops below 200.

2.3.2 Initial Partition Generation

The initial partitioning at the coarsest level is randomly generated by assigning clusters in decreasing order of area with biased probability. Before all the partition blocks reach a minimum area threshold, the probability of a cluster being assigned to a particular partition block is proportional to the area slack of that partition after the cluster is assigned. After the threshold is reached,
the probability is proportional to the maximum allowed area of each cluster. Following the random initial generation, FM-based refinement is applied to the initial solution. Similar to hMetis, multiple initial solutions are generated, but only the best solution is kept and propagated to finer levels.

2.3.3 Iterative refinement

The refinement of MLpart is also based on FM. The authors also note that strictly satisfying the area constraint at coarser levels does not lead to the smallest possible cutsize at the finest level. Therefore, the acceptance criterion for a cluster move is relaxed so that a move is accepted as long as it does not increase the violation of the balance constraints. It is empirically observed that, although the area-balance criterion is relaxed, the refinement can usually reach a legal solution with smaller cutsize after the first pass.

2.3.4 Comparison with hMetis
hypergraph  #vertices  #hyperedges  hMetis  MLpart
ibm01         12506      14111        255     238
ibm02         19342      19584        320     348
ibm03         22853      27401        779     802
ibm04         27220      31970        506     535
ibm05         28146      28446       1740    1726
ibm06         32332      34826        374     394
ibm07         45639      48117        809     790
ibm08         51023      50513       1166    1195
ibm09         53110      60902        540     560
ibm10         68685      75196        779     938
ibm11         70152      81454        744     779
ibm12         70439      77240       2326    2349
ibm13         83709      99666       1045    1095
ibm14        147088     152777       1894    1759
ibm15        161187     186608       2032    2029
ibm16        182980     190048       1721    1714
ibm17        184752     189581       2417    2316
ibm18        210341     201920       1632    1666

Table 4. Comparison of MLpart and hMetis. The two partitioners produce comparable results on 18 hypergraphs extracted from real circuits [CAM00].
Table 4 gives the comparison between MLpart and hMetis. Overall, MLpart produces cutsize comparable to that of hMetis on 18 hypergraphs extracted from real circuits [Alp98]. The average relative cutsize difference is within 1%. Run times reported by the authors are also comparable. A recent
optimality study of partitioning algorithms, however, suggests that MLpart may be considerably more robust than hMetis. On a certain class of synthetic bipartitioning benchmarks with known upper bounds on optimal cutsize, MLpart consistently obtains cuts attaining the upper bounds, while hMetis finds cuts up to 20% worse [JCX03, CCRX04].
3 Multiscale Placement

The spatial arrangement of circuit elements fundamentally constrains the layout of the circuit's interconnect and, therefore, the signal timing of the integrated circuit. Thus, as interconnect delay continues to increase relative to device delay, the importance of a good placement also increases. Rapid progress in placement has been made in the last few years independently across several different families of algorithms, with order-of-magnitude improvements in run-time and quality. Significant challenges remain, however, particularly with respect to the increasing size, complexity, and heterogeneity of integrated circuits and the constraint models associated with their design. Recent estimates of the gap between optimal and attainable placements (Section 1.5) strongly suggest that design improvements alone may produce the equivalent of at least a full technology generation's worth of improved performance [CCX05b].

3.1 Problem Description

As described in Section 1.5, the placement problem is to assign coordinates (x_i, y_i) ∈ R to the v_i ∈ V. The wirelength-driven problem (2) is, however, only a generic representative of the true objectives and constraints, which include the following.

1. Overlap. No two cells may overlap. Expressed literally, this condition amounts to N(N − 1)/2 nonsmooth, nonconvex constraints for the placement of N cells.
2. Wirelength. The estimated total length of all wires forming these connections must be either as small as possible (an objective) or below a prescribed limit (a constraint). Prior to routing (Section 4), estimates of the wirelength must be used; a common estimate is sketched just after this list.
3. Timing Delay. The maximum propagation time of a signal along any path in the IC should be either as small as possible (an objective) or below a specified limit (a constraint).
4. Routing Congestion. The estimated density of wires necessary to route the nets for the placed cells must not be so large that the spacing between adjacent wires falls below the limits of manufacturability.
5. Additional Considerations. Other conditions may involve total power consumption, maximum temperature, inductance, noise, and other complex manufacturability criteria.
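The wirelength estimate most commonly used during placement is the half-perimeter wirelength (HPWL) of each net's bounding box. A minimal sketch follows; the representation of a net as a list of pin coordinates is assumed purely for illustration.

```python
def hpwl(net_pins):
    """Half-perimeter wirelength of one net.

    net_pins: list of (x, y) pin coordinates belonging to the net.
    Returns the half-perimeter of the smallest box enclosing all pins.
    """
    xs = [x for x, _ in net_pins]
    ys = [y for _, y in net_pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(nets):
    """Total estimated wirelength over all nets of the netlist."""
    return sum(hpwl(pins) for pins in nets)
```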
Multiple mathematical formulations exist both for the data functions representing these conditions and the manner in which they are combined to formulate the placement problem. In practice, different formulations are often employed at different stages of design. For simplicity, the discussion in this section is limited primarily to wirelength-driven placement; however, it must be noted that the utility of any particular approach rests largely in its adaptability to more specific formulations which emphasize some subset of the above conditions over the others, e.g., timing-driven placement, routing-congestion-driven placement, temperature-driven placement, etc. A good placement supports a good routing, and ultimately, the quality of a placement can only be accurately judged after routing.

3.2 An Operative Definition of Multiscale Placement

A precise, universally accepted definition of multiscale placement has yet to emerge. Many long-standing [Bre77, QB79, SS95, Vyg97, KSJA91] and more recent [CKM00, YM01, WYS00b, BR05, CCK+03, CCPY02, KW04] algorithms for placement employ optimization at every level of a recursively constructed hierarchy of circuit models. None of these, however, includes all the core elements of the most successful multiscale solvers for PDEs [TOS00, Bra77, BHM00]. For example, no leading placement tool uses relaxation during recursive coarsening as a means of either improving the choice of coarse-level variables or reducing the error in the coarse-level solution. It is not yet clear whether active research will bring placement algorithms closer to the "standard" multiscale metaheuristics [Bra86, Bra01, BR02]. The coverage here follows the operative definition of multiscale optimization given in Section 1.7. In particular, each variable at any given coarser level must represent a subset of variables at the adjacent finer level. This requirement distinguishes multiscale algorithms from so-called "hierarchical" methods, as explained below.

Comparison with Traditional Hierarchical Approaches

Within the placement community, the term "hierarchical" is generally used as a synonym for top-down recursive-partitioning based approaches, in which spatial constraints on cell movement are added recursively by top-down subregion-and-subnetlist refinement, but the variables at every step remain essentially identical to the variables for the problem as initially given. Initially, cells are partitioned into two almost-equal-area subsets such that the total weight of all nets containing cells in both subsets (the "cutsize") is minimized (Section 2). The placement region is divided by a straight-line cut into two subregions, and each of the cell subsets is assigned to one of these subregions. Connections to fixed input-output (I/O) pads, if present, can be used as a guide in selecting the cut orientation and position as well as the cell-subset-to-placement-subregion assignment. The cutsize-driven area
bipartitioning continues recursively on the subregions until these are small enough that simple discrete branch-and-bound search heuristics can be efficiently used. At each step of partitioning, connections between blocks are modeled by terminal propagation. When the cells in block B_i are to be bipartitioned, cells in external blocks B_j are modeled as fixed points along the boundary of B_i when they belong to nets also containing cells in B_i. That is, a net containing cells in B_i and cells in B_j is modeled simply as a net containing the same movable cells in B_i and, for each cell in B_j, a fixed point on the boundary of B_i.

An important variation on the top-down recursive-bisection based framework combines the partitioning with analytical minimization of smoothed wirelength in subtle ways. In this system, an initial layout is calculated by unconstrained minimization, typically of a weighted quadratic wirelength model, without regard to overlap. (The quadratic objective is preferred principally to support the speed and stability of the numerical solvers used; it can be iteratively reweighted to better approximate half-perimeter wirelength [SDJ91].) This initial solution tends to knot cells together in an extremely dense subregion near the center of the placement region, but the presence of I/O pads along the chip boundary gives the cells a nontrivial relative ordering. This ordering then guides the subsequent recursive partitioning of cells toward a final layout. In Gordian [KSJA91], cells are initially partitioned consistently with the given relative ordering. Cutsize-driven iterative improvement of the partitioning is then used not to displace cells but rather to define a recursive sequence of center-of-mass positions for the cell subsets calculated by the partitionings. These center-of-mass positions form hierarchical grids and are iteratively added as equality constraints to the global, weighted quadratic optimization. Eventually, the accumulation of these constraints leads to a relatively uniform cell-area distribution achieved by global wirelength minimization at each step. In BonnPlace [KSJA91, Vyg97, BR05], cell displacement from the analytically determined layout is minimized rather than cutsize. No center-of-mass or other constraints are explicitly used. Instead, the wirelength objective is directly modified to incorporate the partitioning results; cell displacements to their assigned subregions are minimized at subsequent iterations.

The multiscale framework departs significantly from the traditional top-down hierarchical placement techniques. The key distinction is that, in the multiscale setting, optimization at coarser levels is performed on aggregates of cells, while in the traditional formulation, optimization at coarser levels is still done on the individual cells. That is, the traditional flow employs a hierarchy of constraints but not a hierarchy of variables. The distinction is blurred somewhat by the fact that state-of-the-art top-down placement algorithms based on recursive cutsize-driven partitioning all employ multiscale partitioning at most if not all levels. It is not yet precisely
understood to what extent this use of multiscale partitioning at every level of the top-down flow matches or exceeds the capabilities of multiscale placement.

Classification of Multiscale Placement Algorithms

Our survey of leading multiscale placement algorithms is organized as follows. Algorithms constructing their problem hierarchies from the bottom up by recursive aggregation are described in Section 3.3. Those constructing their problem hierarchies from the top down by recursive partitioning are described in Section 3.4. The overview of each method follows the organization outlined in Section 1.7: (i) hierarchy construction, (ii) intralevel relaxation, (iii) interpolation, and (iv) multiscale iteration flow.

3.3 Clustering-based Methods

Among the known leading methods, clustering-based algorithms are perhaps the closest to traditional multiscale methods in scientific computation. Local connections among vertices at one level are used to define vertex clusters in various ways, some hypergraph-based, others graph-based. Assuming each vertex is assigned to some nontrivial cluster of vertices, the number of clusters is at most half the number of vertices. Recursively clustering clusters and transferring nets produces the requisite multiscale problem formulation. For clarity, we speak of vertices at one level and clusters of vertices at its adjacent, coarser level. Although various alternatives exist for defining the vertex clusters, connectivity among the clusters at a given level is generally defined by directly transferring nets of vertices at the adjacent finer level to nets of clusters, as described in Section 2.2.1. If the vertices in the hyperedge e_j = {v_{i_1}, v_{i_2}, ..., v_{i_j}} are assigned to the clusters \bar{v}_{i_1}, \bar{v}_{i_2}, ..., \bar{v}_{i_k}, then net e_j becomes net \bar{e}_j = {\bar{v}_{i_1}, \bar{v}_{i_2}, ..., \bar{v}_{i_k}} at the coarser level. In this process, nets may be eliminated in two ways. First, because tightly interconnected vertices at the finer level are often clustered together at the coarser level, many low-degree nets at the finer level become singleton nets at the coarser level, where they are then deleted. Second, two or more distinct nets at the finer level may become identical at the coarser level, where they can be merged. Hyperedge degrees decrease or remain constant as hyperedges are propagated toward coarser levels, because k ≤ j. However, the nets eliminated during coarsening are mainly of low degree. Thus, the average degree of nets at coarser levels is typically about the same as that at finer levels, with some decrease in the maximum net degree. Vertex degree, however, increases dramatically with coarsening. Under first-choice clustering on one standard benchmark (IBM/ISPD98 #18), the average number of nets to which each vertex belongs jumps from 3 at the finest level to 458 at the coarsest level.
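The net-transfer step just described, including the deletion of singleton nets and the merging of duplicates, can be sketched as follows; the representation (nets as sets of vertex ids, plus a map from fine vertices to coarse clusters) is assumed for illustration only.

```python
def coarsen_netlist(nets, cluster_of):
    """Transfer fine-level nets to the coarse level.

    nets: iterable of nets, each a set of fine-level vertex ids.
    cluster_of: dict mapping each fine vertex id to its cluster id.
    Returns a list of distinct coarse nets containing at least two clusters.
    """
    coarse_nets = set()
    for net in nets:
        coarse_net = frozenset(cluster_of[v] for v in net)
        if len(coarse_net) > 1:          # singleton nets are deleted
            coarse_nets.add(coarse_net)  # identical nets are merged
    return [set(n) for n in coarse_nets]
```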
Put another way, the number of nets relative to the number of vertices increases significantly at coarser levels. In this sense, the hypergraph model at the coarsest level is quite different from the one at the finest level. The accuracy of hypergraph coarsening remains, as far as we know, a heuristic and poorly understood notion. There is no generally accepted means of comparing two different clustering rules other than to attempt them both for the problem at hand, e.g., partitioning, placement, or routing, and observe whether one rule consistently produces better final outcomes. Leading methods generally attempt to merge vertices in a way that eliminates as many hyperedges at the coarser level as possible. A recent study [HMS03a, HMS04] explores vertex clustering on hypergraphs in some detail for the purpose of accelerating a leading top-down partitioning-based placer (Capo [CKM00]) by applying just one level of clustering.

3.3.1 Ultrafast VPR

Though limited use of clustering appears earlier in the placement literature [SS95, HL99], to our knowledge, Ultrafast VPR [SR99] is the first published work to recursively cluster a circuit model into a hierarchy of models for placement by multiscale optimization. Ultrafast VPR is used to accelerate the annealing-based VPR algorithm ("Versatile Packing, Placement and Routing" [BR97]) in order to reduce design times on field-programmable gate arrays (FPGAs) at some expense in placement quality. (FPGA placement quality in Ultrafast VPR is measured by the area used.) Ultrafast VPR is not seen as a general means of improving final placement quality over VPR, when both approaches are allowed to run to normal termination. The authors observe, however, that when the run-time for the two algorithms is explicitly limited to about 10 seconds, the multiscale approach does produce superior results.

Hierarchy construction

Designed for FPGAs, Ultrafast VPR exploits their regular geometry. It creates only uniform, square clusters; i.e., the number of vertices per cluster at each level is held fixed at a perfect square: 4, 9, 16, etc. To form a cluster, a seed vertex is randomly selected and denoted c. Each vertex b connected to the cluster is then ranked for merging with c by the value
m = A_bc + Σ_{e ∈ E : b, c ∈ e} 1/|e| ,
where A_bc denotes the number of nets containing both b and c that will be eliminated if b is merged with c. The terms in the sum indicate that a low-degree net containing vertices b and c connects them more tightly than a high-degree net. An efficient bucketing data structure supports fast updates of the rankings as vertices are added to the cluster.
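A minimal sketch of this seed-and-grow clustering follows. Because the summand in the ranking above is garbled in the source, the 1/|e| term used below is an assumption chosen to match the stated intent (low-degree shared nets bind more tightly); the A_bc term and the bucketing structure are omitted, and all names are illustrative.

```python
import random
from collections import defaultdict

def grow_cluster(unclustered, nets_of, cluster_size):
    """Grow one square cluster around a random seed, Ultrafast-VPR style.

    unclustered: set of not-yet-clustered vertex ids.
    nets_of: dict vertex id -> list of nets (each net a set of vertex ids).
    cluster_size: target number of vertices per cluster (4, 9, 16, ...).
    """
    seed = random.choice(list(unclustered))
    cluster = {seed}
    while len(cluster) < min(cluster_size, len(unclustered)):
        rank = defaultdict(float)
        for c in cluster:
            for net in nets_of[c]:
                for b in net:
                    if b in unclustered and b not in cluster:
                        rank[b] += 1.0 / len(net)   # assumed 1/|e| attraction
        if not rank:
            break
        cluster.add(max(rank, key=rank.get))        # merge best-ranked vertex
    return cluster
```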
Relaxation

Intralevel improvement in Ultrafast VPR proceeds in two stages: constructive placement followed by adaptive simulated annealing. In the constructive phase, clusters are considered in order of their connectivity with I/O pads. That is, first the I/O pads themselves are placed, then clusters connected to output pads are placed, then clusters connected to input pads, then clusters connected to already placed clusters, and so on. Each cluster at a given level is placed as close as possible to its "optimal location," determined as the arithmetic mean of the already placed clusters with which it shares connections. The initial I/O pad placement at the coarsest level is random; at subsequent, finer levels, pad positions are interpolated and improved in the same way that other clusters are. At the coarsest level, only connections to already placed clusters are considered in the weighted average used to compute a given cluster's ideal position. At all other levels, the mean of all clusters connected to the given cluster being placed is used, where clusters yet to be placed at the current level are temporarily given the positions of their parent clusters.

The solution to the constructive placement defines an initial configuration for subsequent fast annealing-based relaxation. High-temperature annealing is used at coarser levels, low-temperature annealing at finer levels. In both VPR and Ultrafast VPR, the temperature T is decreased by an adaptively selected factor α ∈ [0.5, 0.95] determined by the number of moves accepted at the preceding temperature, T ← αT. In Ultrafast VPR, however, α is squared to accelerate the search. The number of moves per temperature is set to n_moves = c·N^{4/3}, where N is the number of movable objects, and the constant c ∈ [0.01, 10] may also be decreased by the factor α. The starting temperature T_0, number of moves n_moves, stopping temperature T_f, and decrease factor α at each level are the key parameters characterizing the annealing schedule of Ultrafast VPR. The authors consider three different annealing schedules: (i) an aggressive, adaptive schedule in which the parameters are dynamically updated; (ii) a greedy "quench" in which no uphill (wirelength-increasing) moves are permitted; (iii) a fixed choice of T_0, T_f, and α. The quality/run-time trade-offs of these schedules are reported in experiments for Ultrafast VPR. Overall, schedule (ii) performs best for the shortest allowed run-times (1 second or less), schedule (iii) performs best for intermediate run-times of 1-100 seconds, and schedules (i) and (iii) perform comparably after 100 seconds, on average. In general, the average results over 20 circuits are not very sensitive to the choice of annealing schedule. During the annealing, component clusters are allowed to migrate across the boundaries of their parent clusters; the reported experiments indicate that this freedom of movement significantly improves final placement quality, allowing finer-level moves to essentially correct errors in clustering not visible at coarser levels.
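The schedule parameters above can be summarized in a small sketch. The 4/3 exponent and the [0.5, 0.95] range follow the description in the text, while the mapping from acceptance ratio to α is an illustrative placeholder rather than the published rule.

```python
def moves_per_temperature(num_movable, c=1.0):
    """Number of proposed moves per temperature, n_moves = c * N^(4/3)."""
    return int(c * num_movable ** (4.0 / 3.0))

def next_temperature(t, acceptance_ratio):
    """Adaptive cooling T <- alpha*T; Ultrafast VPR squares alpha to cool faster.

    alpha is taken from [0.5, 0.95] based on the fraction of moves accepted at
    the previous temperature (this clamping is an assumed placeholder).
    """
    alpha = min(0.95, max(0.5, acceptance_ratio))
    return (alpha * alpha) * t
```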
Interpolation

Initially, each vertex inherits the position of its parent. That is, the components of each cluster are simply placed concentrically at the cluster's center.

Iteration Flow

Only one pass from the coarsest to the finest level is used. The burden of iterative improvement is placed entirely on the annealing process; multiple V-cycles are not attempted or even mentioned. There are no citations in the original report on Ultrafast VPR to any other work on multiscale algorithms. It appears the authors arrived at their algorithm without any awareness of existing multiscale techniques used in other areas.

3.3.2 mPL
mPL [CCKS00, CCKS03, CCK+03] is the first clustering-based multiscale placement algorithm for standard-cell circuits. It evolved from an effort to apply recent advances in numerical algorithms for nonlinear programming and particle systems to circuit placement. The initial goal was the development of a scalable nonlinear-programming formulation, possibly making use of multiscale preconditioning for large Newton-based linear systems of equations. Experiments showed that for problem sizes above roughly 1000, steplength restrictions slowed progress to a crawl, as the intricacy of the O(N²) constraints limited the utility of pointwise approximations to tiny neighborhoods of the evaluation points. Multiscale optimization was seen as a means of escaping the complexity trap. Early numerical experiments demonstrated superb scalability at some loss in quality. Subsequent improvements have brought mPL's quality and scalability to a level comparable to the best available academic tools.

Hierarchy construction

mPL uses first-choice clustering (FC, Section 2.2.1). The mPL-FC affinity that vertex i has for vertex j is

    r_ij = Σ_{e ∈ E : v_i, v_j ∈ e} w(e) / ((|e| − 1) area(e)),        (3)
where w(e) is the weight assigned to hyperedge (net) e, area(e) denotes the sum of the areas of the vertices in e, and |e| denotes the number of vertices in hyperedge e. Dividing by the net degree promotes the elimination of small hyperedges at coarser levels, making the coarse-level hypergraph netlists sparser and hence easier to place [Kar99, SR99, HMS03b]. The area factor in the denominator gives greater affinity to smaller cells and thus promotes a more
uniform distribution of areas at coarser levels; this property supports the nonlinear programming and slot-assignment modules discussed below. For each vertex i at the finer level, the vertex j assigned to it for clustering is not necessarily of maximal mPL-FC affinity but is instead of least hyperedge degree among those vertices within 10% of i's maximum FC affinity. When this choice is not unique, a least-area vertex is selected from the least-degree candidates. Hyperedges are transferred from vertices to clusters in the usual fashion, as described at the beginning of Section 3.3. Initially, edge-separability clustering (ESC) [CL00] was used in mPL to define clusters based on fast, global min-cut estimates. The first-choice strategy improves overall quality of results by about 3% over ESC [CCK+03].

Relaxation

No relaxation is used in the recursive coarsening; the initial cluster hierarchy is determined completely by netlist connectivity. Nonlinear programming (NLP) is used at the coarsest level to obtain an initial solution, and local refinements on subsets are used at all other levels. Immediately after nonlinear programming at the coarsest level or interpolation to finer levels, the area distribution of the placement is evened out by recursive bisection and linear assignment. Subsequent subset relaxation is accompanied by area-congestion control; these area-control steps ultimately enable, by local perturbations, the complete removal of all cell overlap at the finest level prior to detailed placement. By default, clustering stops when the number of clusters reaches 500 or fewer. At the coarsest level, vertices v_i and v_j are modeled as disks, and their pairwise nonoverlap constraint c_ij(X, Y) is directly expressed in terms of their radii ρ_i and ρ_j:

    c_ij(X, Y) = (x_i − x_j)² + (y_i − y_j)² − (ρ_i + ρ_j)² ≥ 0    for all i < j.
Quadratic wirelength is minimized subject to the pairwise nonoverlap constraints by a customized interior-point method with a slack variable added to the objective and the nonoverlap constraints to gradually remove overlap. Interestingly, experiments suggest that area variations among the disks can be ignored without loss in solution quality. That is, the radius of each disk can be set to the average over all the disks, ρ̄ = (1/N) Σ_k ρ_k. After nonlinear programming, larger-than-average cells are chopped into average-size fragments, and an overlap-free configuration is obtained by linear assignment on the cells and cell fragments. Fragments of the same cell are then reunited, the area overflow incurred being removed by the ripple-move cell propagation described below. Discrete Goto-based swaps are then employed as described below to further reduce wirelength prior to interpolation to the next level. Relaxation at each level therefore starts from a reasonably uniform area-density distribution of vertices.
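A minimal sketch of this coarsest-level model (quadratic wirelength plus disk nonoverlap constraints) is given below. The solver interface shown (scipy.optimize.minimize with inequality constraints) stands in for mPL's customized interior-point method, and two-pin edges stand in for hyperedges; both are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def coarsest_level_model(edges, radii):
    """Build the quadratic wirelength objective and nonoverlap constraints.

    edges: list of (i, j) two-pin connections between clusters.
    radii: array of disk radii, one per cluster.
    Variables are packed as [x_0..x_{n-1}, y_0..y_{n-1}].
    """
    n = len(radii)

    def wirelength(z):
        x, y = z[:n], z[n:]
        return sum((x[i] - x[j]) ** 2 + (y[i] - y[j]) ** 2 for i, j in edges)

    def nonoverlap(i, j):
        def c(z):
            x, y = z[:n], z[n:]
            return ((x[i] - x[j]) ** 2 + (y[i] - y[j]) ** 2
                    - (radii[i] + radii[j]) ** 2)
        return {"type": "ineq", "fun": c}

    cons = [nonoverlap(i, j) for i in range(n) for j in range(i + 1, n)]
    return wirelength, cons

# illustrative use on a toy three-cluster instance
obj, cons = coarsest_level_model([(0, 1), (1, 2)], np.array([1.0, 1.0, 1.0]))
res = minimize(obj, x0=np.random.rand(6), constraints=cons)
```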
A uniform bin grid is used to monitor the area-density distribution. The first four versions of mPL rely on two sweeps of relaxations on local subsets at all levels except the coarsest. These local-subset relaxations are described in the next two paragraphs.

The first of these sweeps allows vertices to move continuously and is called quadratic relaxation on subsets (QRS). It orders the vertices by a simple depth-first search (DFS) on the netlist and selects movable vertices from the DFS ordering in small batches, one batch at a time. For each batch, the quadratic wirelength of all nets containing at least one of the movable vertices is minimized, and the vertices in the batch are relocated. Typically, the relocation introduces additional area congestion. In order to maintain a consistent area-density distribution, a "ripple-move" algorithm [HL00] is applied to any overfull bins after QRS on each batch. Ripple-move computes a maximum-gain monotone path of vertex swaps along a chain of bins leading from an overfull bin to an underfull bin. Keeping the QRS batches small facilitates the area-congestion control; the batch size is set to three in the reported experiments.

After the entire sweep of QRS plus ripple-move, a second sweep of Goto-style permutations [Got81] further improves the wirelength. In this scheme, vertices are visited one at a time in netlist order. Each vertex's optimal "Goto" location is computed by holding all its vertex neighbors fixed and minimizing the sum of the bounding-box lengths of all nets containing it. If that location is occupied by b, say, then b's optimal Goto location is similarly computed along with the optimal Goto locations of all of b's nearest neighbors. The computations are repeated at each of these target locations and their nearest neighbors up to a predetermined limit (3-5). Chains of swaps are examined by moving a to some location in the Manhattan unit disk centered at b, and moving the vertex at that location to some location in the Manhattan unit disk centered at its Goto location, and so on. The last vertex in the chain is then forced into a's original location. If the best such chain of swaps reduces wirelength, it is accepted; otherwise, the search begins anew at another vertex. See Figure 14.
Fig. 14. Goto-based discrete relaxation in mPL.
In the most recent implementation of mPL [CCS05], global relaxations, in which all movable objects are simultaneously displaced, have been scalably incorporated at every level of the hierarchy. The redistribution of smoothed area density is formulated as a Helmholtz equation subject to Neumann boundary conditions, the bins defining area-density constraints serving as a discretization. A log-sum-exp smoothing of half-perimeter wirelength, defined in Section 3.4.2 below, is the objective. Given an initial unconstrained solution at the coarsest level or an interpolated solution at finer levels, an Uzawa method is used to iteratively improve the configuration.

Interpolation

AMG-based weighted aggregation [BHM00], in which each vertex may be fractionally assigned to several generalized aggregates rather than to just one cluster, has yet to be successfully applied in the hypergraph context. The obstacle is that it is not known how to transfer the finer-level hyperedges, objectives, and constraints accurately to the coarser level in this case. AMG-based weighted disaggregation is simpler, however, and has been successfully applied to placement in mPL. For each cluster at the coarser level, a C-point representative is selected from it as the vertex largest in area among those of maximal weighted hyperedge degree. C-points simply inherit their parent clusters' positions and serve as fixed anchors. The remaining vertices, called F-points, are ordered by nonincreasing weighted hyperedge degree and placed at the weighted average of their strong C-point neighbors and strong, already-placed F-point neighbors. This F-point repositioning is iterated a few times, but the C-points are held fixed all the while.

Iteration Flow

Two backtracking V-cycles are used (Figure 8). The first follows the connectivity-based FC clustering hierarchy described above. The second follows a similar FC-cluster hierarchy in which both connectivity and proximity are used to calculate vertex affinities:
    r_ij = Σ_{e ∈ E : v_i, v_j ∈ e} w(e) / ((|e| − 1) area(e) ‖(x_i, y_i) − (x_j, y_j)‖).
During this second aggregation, positions are preserved by placing clusters at the weighted average of their component vertices' positions. No nonlinear programming is used in the second cycle, because it alters the initial placement too much and degrades the final result.
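The C-point/F-point interpolation described above can be sketched as follows; the neighbor weights (here, affinity values), the fixed number of repositioning passes, and the data layout are assumptions made purely for illustration.

```python
def interpolate(clusters, cpoint_of, neighbors, pos, passes=3):
    """mPL-style interpolation from a coarse placement to the finer level.

    clusters: dict cluster id -> list of member vertex ids.
    cpoint_of: dict cluster id -> its C-point vertex id.
    neighbors: dict vertex id -> list of (neighbor vertex id, weight) pairs
               (strong affinity-graph neighbors).
    pos: dict of coarse-level positions, keyed by cluster id.
    """
    placed = {}
    # Every vertex initially inherits its parent cluster's position;
    # C-points keep that position and act as fixed anchors.
    for c, members in clusters.items():
        for v in members:
            placed[v] = pos[c]
    fpoints = [v for c, ms in clusters.items() for v in ms if v != cpoint_of[c]]
    for _ in range(passes):
        for v in fpoints:  # assumed already ordered by weighted degree
            wsum = xsum = ysum = 0.0
            for u, w in neighbors[v]:
                if u in placed:
                    x, y = placed[u]
                    wsum += w; xsum += w * x; ysum += w * y
            if wsum > 0:
                placed[v] = (xsum / wsum, ysum / wsum)
    return placed
```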
3.3.3 mPG
As described in Section 4, the wires implementing netlist connections are placed not only in the same region containing circuit cells, but also in a set of 3-12 routing layers directly above the placement region. The bottom layers closest to the cells are used for the shortest connections. The top layers are used for global connections. These can be made faster by increasing wire widths and wire spacing. As circuit sizes continue to increase, so do both the number of layers and the competition for wiring paths at the top layers. Over half the wires in a recent microprocessor design are over 0.5 mm in length, while only 4.1% are below 0.5 mm in length [CCPY02]. While many simple, statistical methods for estimating routing congestion during placement exist (topology-free congestion estimation [LTKS02]), it is generally believed that, for a placement algorithm to consistently produce routable results, some form of approximate routing topology must be explicitly constructed during placement as a guide. The principal goal of mPG [CCPY02] is to incorporate fast, constructive routing-congestion estimates, including layer assignment, into a wirelength-driven, simulated-annealing based multiscale placement engine. Compared to Gordian-L [SDJ91], mPG is 4-6.7 times faster and generates slightly better wirelength for test circuits with more than 100,000 cells. In congestion-driven mode, mPG reduces wiring-overflow estimates by 45%-74%, with a 5% increase in wirelength compared to wirelength-driven mode, but 3-7% less wirelength after global routing. The results of the mPG experiments show that the multiscale placement framework is readily adapted to incorporate complex routability constraints effectively.

Hierarchy construction

mPG uses connectivity-driven, recursive first-choice clustering (FC, Section 2.2.1) to build its placement hierarchy on the netlist. The vertex affinity used to define clusters is similar to that used by mPL, as defined in (3). However, instead of matching a given vertex to its highest-affinity neighbor, mPG selects a neighbor at random from those within the top 10% of affinity. Moreover, the mPG vertex affinity does not consider vertex area. Experiments for mPG show that imposing explicit constraints on cluster areas in order to limit the cluster-area variation increases run time without significantly improving placement quality. The strategy in mPG is instead to allow unlimited variation in cluster areas, thereby reducing the number of cluster levels and allowing more computation time at each level. Large variations in cluster sizes are manageable in mPG due to its hierarchical area-density model. Optimization at each level of the netlist-cluster hierarchy is performed over the exact same set of regular, uniform, nested bin-density grids. By gradually reducing the area overflow in bins at all scales from the size of the smallest cells up to 1/4 the placement region, a sufficiently
uniform distribution of cell areas is obtained for detailed placement. The same grid hierarchy is also used to perform fast incremental global routing, including fast layer assignment, and to estimate routing congestion in each bin. In wirelength mode, the mPG objective is simply bounding-box wirelength, as in (1). In (routability) congestion mode, the objective is equivalent to a weighted-wirelength version of (1), in which the weight of a net is proportional to the sum of the estimated wire usages of the bins used by that net's rectilinear Steiner-tree routing. The congestion-based objective is used only at finer levels of the cluster hierarchy.

Relaxation

Relaxation in mPG is by simulated annealing. Throughout the process, vertices are positioned only at bin centers. All vertex moves are discrete, from one bin center to another. At each step, a cluster is randomly selected. A target location for the cluster is then selected either (a) randomly within some range limit or (b) to minimize the total bounding-box wirelength of the nets containing it. The probability of selecting the target randomly is set to max{α, 0.6}, where α is the "acceptance ratio." The probability p of accepting a move with cost change ΔC is one if ΔC < 0 and exp{−ΔC/T} if ΔC > 0, where T denotes the temperature. At the coarsest level k, the starting temperature is set to approximately 20 times the standard deviation of the cost changes of n_k random moves, where n_i denotes the number of clusters at level i. At other levels, binary search is used to estimate a temperature for which the expected cost change is zero. These approximate "equilibrium temperatures" are used as starting temperatures for those levels. When accepting a move to a target location would result in a density-constraint violation, an alternative location near the target can be found efficiently, if it exists, by means of the hierarchical bin-density structure. Annealing proceeds at a given temperature in sweeps of n_i vertex moves as long as the given objective can be decreased. After a certain number of consecutive sweeps with a net increase in the objective, the temperature is decreased by a factor μ = μ(α), in a manner similar to that used by Ultrafast VPR. The default stopping temperature is taken to be 0.005·C/|E|, where C is the objective value and |E| is the number of nets at the current level.

Interpolation

Cluster components are placed concentrically at the cluster center.

Iteration Flow

One V-cycle is used: recursive netlist coarsening followed by one recursive interpolation pass from the coarsest to the finest level. Relaxation is used at each level only in the interpolation pass.
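The acceptance rule in the relaxation above is the standard Metropolis criterion; a minimal sketch of it and of the target-selection probability follows. Everything beyond the two probabilities described in the text (cost function, range limits, density checks) is deliberately left out.

```python
import math
import random

def accept_move(delta_cost, temperature):
    """Metropolis acceptance: always accept improvements, otherwise accept
    with probability exp(-delta_cost / T)."""
    if delta_cost <= 0:
        return True
    return random.random() < math.exp(-delta_cost / temperature)

def choose_target_randomly(acceptance_ratio):
    """mPG picks a random target with probability max(alpha, 0.6); otherwise
    the target is chosen to minimize bounding-box wirelength."""
    return random.random() < max(acceptance_ratio, 0.6)
```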
3.4 Partitioning-based Methods
An apparent deficiency of clustering-based hierarchies is that, while they are observed to perform well on circuits that can be placed with mostly short, local connections, they may be less effective for circuits that necessarily contain a high proportion of relatively long, global connections. A bottom-up approach to placement may not work well on a design formed from the top down. A placement hierarchy can also be constructed from the top down in an effort to better capture the global interconnections of a design. An aggregate need not be defined by recursive clustering; recursive circuit partitioning (Section 2) can be used instead. In this approach, the coarsest level is defined first as just two (or perhaps four) partition blocks. Each placement level is obtained from its coarser neighboring level by partitioning the partition blocks at that coarser level.

Although partitioning's use as a means of defining a hierarchy for multiscale placement is relatively new, placement by recursive partitioning has a long tradition. The average number of interconnections between subregions is obviously correlated with total wirelength. A good placement can therefore be viewed as one requiring as few interconnections as possible between subregions. Minimizing cutsize is generally acknowledged as easier than placement, and, since the arrival of the multiscale hypergraph partitioners hMetis and MLpart (Sections 2.2 and 2.3), little or no progress in partitioning tools has been made. Partitioning tools are thus generally seen as more mature than placement tools, whose development continues to progress rapidly. Placement algorithms based on partitioning gain some leverage from the superior performance of the state-of-the-art partitioning tools.

3.4.1 Dragon
Since its introduction in 2000 [WYS00b], Dragon has become a standard for comparison among academic placers for the low wirelength and high routability of the placements it generates. Dragon combines a partitioning-based, cutsize-driven optimization with wirelength-driven simulated annealing on partition blocks to produce placements at each level. Like mPG and Ultrafast VPR, Dragon relies on simulated annealing as its principal means of intralevel iterative improvement. Unlike these algorithms, Dragon's hierarchy is ultimately defined by top-down recursive partitioning rather than recursive clustering. Heavy reliance on annealing slows Dragon's performance relative to other techniques and may diminish its scalability somewhat (Section 1.5). The flexibility, simplicity, and power of the annealing-based approach, however, makes Dragon adaptable to a variety of problem formulations [SWY02, XWCS03].
Hierarchy construction

Dragon's placement hierarchy is built from the top down. Initially, a cutsize-driven quadrisection of the circuit is computed by hMetis (Section 2.2). Each of the four partition blocks is then viewed as an aggregate. The aggregate is given an area in proportion to its cell content, and the cells within each such aggregate are placed at the aggregate's center. Half-perimeter-wirelength-driven annealing on these aggregates is then used to determine their relative locations in the placement region. Cutsize-driven quadrisection is then applied to each of the aggregates, producing 16 = 4 × 4 aggregates at the next level. Wirelength-driven annealing then determines positions of these new, smaller aggregates within some limited distance of their parent aggregates' locations. This sequence of cutsize-driven subregion quadrisection followed by wirelength-driven positioning continues until the number of cells in each partition block is approximately 7. At that point, greedy heuristics are used to obtain a final, overlap-free placement.

When a given partition block B_i is quadrisected, the manner in which its connections to other partition blocks B_j at the same level are modeled may have considerable impact. In standard non-multiscale approaches, various forms of terminal propagation (Section 3.2) are the most effective known technique. Experiments reported by Dragon's authors indicate, however, that terminal propagation is inappropriate in the multiscale setting, where entire partition blocks are subsequently moved. Among a variety of attempted strategies, the one ultimately selected for Dragon is simply to replace any net containing cells in both B_i and other partition blocks by the cells in B_i contained by that net. Thus, during quadrisection, all connections within B_i are preserved, but connections to external blocks are ignored. Connections between blocks are accurately modeled only during the relaxation phase described below.

Although hMetis is a multiscale partitioner and therefore generates its own hierarchy by recursive clustering, Dragon makes no explicit use of hMetis's clustering hierarchy in its top-down phase. Instead, Dragon uses the final result of the partitioner as a means of defining a new hierarchy from the top down. It is this top-down, partitioning-based hierarchy which defines the placement problems to which Dragon's wirelength-driven relaxations are applied.

Relaxation

Low-temperature wirelength-driven simulated annealing is used to perform pairwise swaps of nearby partition blocks. These blocks are not required to remain within the boundaries of their parent blocks. Thus, relaxation at finer levels can to some extent correct premature decisions made at earlier, coarser levels. However, to control the run time, the range of moves that can be considered must be sharply limited.
After the final series of quadrisections, four stages lead to a final placement. First, annealing based on swapping cells between partition blocks is performed. Second, area-density-balancing linear programming [CXWS03] is used. Third, cells are spread out to remove all overlap. Finally, small permutations of cells are greedily considered separately along horizontal and vertical directions from randomly selected locations.

Interpolation

The top-down hierarchy construction completely determines the manner in which a coarse-level solution is converted to a solution at the adjacent finer level. First, the cutsize-driven netlist quadrisection performed by hMetis divides a partition-block aggregate into four equal-area subblocks. Second, the annealing-based relaxation is used to assign each subblock to a subregion. The initial assignment of subregions to subblocks is unspecified, but, due to the distance-limited annealing that follows, it may be taken simply as a collection of randomly selected 4-way assignments, each of these made locally within each parent aggregate's subregion. As stated above, the ultimate subregion selected for a subblock by the annealing need not belong to the region associated with its parent aggregate.

Iteration Flow

Dragon's flow proceeds top-down directly from the coarsest level to the finest level. Because the multiscale hierarchy is constructed from the top down, there is no explicit bottom-up phase and thus no notion of a V-cycle.

3.4.2 Aplace
In VLSI CAD, the word "analytical" is generally used to describe optimization techniques relying on smooth approximations. Among these methods, force-directed algorithms [QB79, EJ98] model the hypergraph netlist as a generalized spring system and introduce a scalar potential field for area density. A smooth approximation to half-perimeter wirelength (1) is minimized subject to implicit bound constraints on area density. Regions of high cell-area density are sources of cell-displacement force gradients; regions of low cell-area density are sinks. The placement problem becomes a search for equilibrium states, in which the tension in the spring system is balanced by the area-displacement forces. Aplace [KW04] (the 'A' stands for "analytic") is essentially a multiscale implementation of this approach. The wirelength model for a net t consisting of pin locations t = {(x_i, y_i) | i = 1, ..., deg(t)} follows a well-known log-sum-exp approximation. (A pin is a point on a cell where a net is connected to the cell.)
    ℓ_exp(t) = α · ( ln Σ_i e^{x_i/α} + ln Σ_i e^{−x_i/α} + ln Σ_i e^{y_i/α} + ln Σ_i e^{−y_i/α} ),      (4)

where α is a smoothing parameter. To estimate area densities, uniform grids are used. Each subregion of a grid G is called a bin (the word "cell" is reserved for the movable subcircuits being placed). The scalar potential field φ(x, y) used to generate area-density-balancing forces is defined as a sum over cells and bins as follows. For a single cell v at position (x_v, y_v) overlapping with a single bin b centered at (x_b, y_b), the potential is the bell-shaped function φ_v(b) = a(v) p(|x_v − x_b|) p(|y_v − y_b|), where a(v) is selected so that Σ_{b∈G} φ_v(b) = area(v), and

    p(d) = 1 − 2d²/r²       if 0 ≤ d ≤ r/2,
           2(d − r)²/r²     if r/2 ≤ d ≤ r,                                   (5)

and r is the radius of the potential. The potential φ at any bin b is then defined as the sum of the potentials φ_v(b) for the individual cells overlapping with that bin. Let (X, Y) denote all positions of all cells in the placement region R. Let |G| denote the total number of bins in grid G. Then the target potential for each bin is simply φ̄ = Σ_{v∈V} area(v)/|G|, and the area-density penalty term for a current placement (X, Y) on grid G is defined as

    ψ_G(X, Y) = Σ_{b∈G} (φ(b) − φ̄)².

For the given area-density grid G, Aplace then formulates placement as the unconstrained minimization problem

    min_{X,Y}  ρ_ψ ψ_G(X, Y) + ρ_ℓ Σ_{t∈E} ℓ_exp(t)
for appropriate, grid-dependent scalar weights ρ_ℓ and ρ_ψ. This formulation has been successfully augmented in Aplace to model routing congestion, movable I/O pads, and symmetry constraints on placed objects.

Hierarchy construction

The aggregates are defined from the top down by recursive cutsize-driven area bipartitioning on the hypergraph netlist. The partitioning engine used is MLpart (Section 2.3). As in Dragon, Aplace makes no attempt to reuse MLpart's clustering hierarchy, and Aplace's entire hierarchy of aggregates is defined before any relaxation begins. Thus, at the start of relaxation on the coarsest level, all the aggregates at all levels have been defined, but none has been assigned a position.
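Since the relaxation described next needs both the smoothed wirelength (4) and its gradient, a short generic sketch of the log-sum-exp model is given below; it is an illustration of the formula, not Aplace's actual implementation.

```python
import numpy as np

def lse_wirelength(xs, ys, alpha):
    """Log-sum-exp smoothing of half-perimeter wirelength for one net.

    xs, ys: numpy arrays of the net's pin coordinates.
    alpha: smoothing parameter; the estimate approaches HPWL as alpha -> 0.
    """
    def lse(v):
        return alpha * np.log(np.sum(np.exp(v / alpha)))
    return lse(xs) + lse(-xs) + lse(ys) + lse(-ys)

def lse_gradient(xs, ys, alpha):
    """Gradient of the smoothed wirelength with respect to the pin coordinates."""
    def softmax(v):
        e = np.exp((v - v.max()) / alpha)   # shifted for numerical stability
        return e / e.sum()
    gx = softmax(xs) - softmax(-xs)
    gy = softmax(ys) - softmax(-ys)
    return gx, gy
```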
Relaxation

Optimization at each level of Aplace proceeds by the Polak-Ribiere variant of nonlinear conjugate gradients [NS96] with Golden-Section line search [GMW81]. A hard iteration limit of 100 is imposed. The grid size |G|, objective weights ρ_ℓ and ρ_ψ, wirelength smoothing parameter α (4), and area-density potential radius r (5) are selected and adjusted at each level to guide the convergence. Bin size and α are taken proportional to the average aggregate size at the current level. The potential radius r is set to 2 on most grids but is increased to 4 at the finest grid in order to prevent oscillations in the maximum cell-area density of any bin. The potential weight ρ_ψ is fixed at one. The wirelength weight ρ_ℓ is initially set rather large and is subsequently decreased by a factor of 0.5 to escape from local minima with too much overlap. As iterations proceed, the relative weight of the area-density penalty increases, and a relatively uniform cell-area distribution is obtained.

Interpolation

When the conjugate-gradient iterations converge, simple declustering is used. The components of each aggregate are placed concentrically at the location of the aggregate's center. This configuration is used as the starting point for conjugate-gradient iterations at the adjacent, finer level.

Iteration Flow

As in Dragon, there is just one pass from the coarsest to the finest level. There is no bottom-up aggregation, and there are no V-cycles. After conjugate gradients at the finest level, a simple Tetris-based legalizer [Hil02] is used to remove overlap in a way that tends to preserve cells' relative orderings.

3.5 Numerical Comparison

The results of Dragon and mPL on the IBM/ISPD98 benchmarks [Alp98] are compared in Figure 15 to those of two leading representatives of the top-down partitioning-based paradigm described in Section 3.2: Capo and Feng-Shui. (As this chapter goes to press, no executable for Aplace is yet publicly available.) To illustrate the time/quality trade-offs, the figure shows scaled run time vs. scaled total wirelength. All values have been scaled relative to those for mPL4. From the figure, the following conclusions can be drawn. (i) Dragon produces the best wirelength, by 4%, but requires the most run time, by about 2.2×. (ii) Capo requires the least run time (by 3×) but produces the longest wirelength (by about 9%). (iii) mPL and Feng-Shui produce the best wirelength-to-run-time trade-off; both are competitive with Dragon in wirelength and with Capo in run time.
Fig. 15. Comparison of 4 state-of-the-art placement algorithms on the non-uniform-sized ISPD/IBM (version 1) benchmarks: mPL4, Dragon 2.23, Capo 8.7, and Feng-Shui 2.2. All values are scaled relative to those for mPL4.
4 Multiscale Routing

After placement, the positions and the sizes of the cells and the large intellectual-property (IP) blocks are determined. In the following step, routing, the placed cells and IP blocks are connected by metal wires according to the netlist and subject to the constraints of design rules, timing performance, noise, etc. As VLSI technology reaches deep-submicron feature size and gigahertz clock frequencies, the interconnect has become the dominant factor in the performance, power, reliability, and cost of the overall system. The difficulty of VLSI routing has increased with the development of IC technology for the following reasons.

• The current trend of component-based design requires a flexible, full-chip-based (i.e., not reduced by partitioning) area-routing algorithm that can simultaneously handle both the interconnects between large macro blocks and the interconnects within individual blocks.
• The numbers of gates, nets, and routing layers in IC designs keep increasing rapidly, and the size of the routing problem grows correspondingly. According to the International Technology Roadmap for Semiconductors [itr], a router may have to handle designs of over 100 million transistors and 8-9 metal layers in the near future, assuming the full-chip-based routing approach.
• Many optimization techniques, including high-performance tree structures such as A-tree, BA-tree, etc.; buffer insertion; wire sizing; and wire ordering and spacing, have been proposed for high-performance circuit designs. These make the routing problem even more complicated.
In this section, we review recent efforts to apply multiscale optimization to VLSI routing. We start with a brief introduction to the routing problem and basic techniques.

4.1 Introduction to VLSI Routing

Routing is the process of finding the geometric layouts of all the nets. The input of a routing problem includes the following.

routing area: the dimensions of the rectangular routing region and the number of available routing layers.
netlist: required interconnects as sets of connection points ("pins") on placed objects.
design rules: specifications of the minimum and/or maximum values allowed for wire width, wire spacing, via width, via spacing, etc.
pin locations: as determined by the placement or the floorplanning process.
obstacles: objects which wires cannot pass through, including placed cells, IP blocks, pads, pre-routed wires, etc.
constraint-related parameters: such as the electrical parameters (e.g., RC constants) for performance and signal-integrity constraints; thermal parameters for the thermal constraints, etc.
Fig. 16. A routing example with four layers (poly, m1, m2, and m3) and three nets (N1, N2, and N3).
from source s to target t in a breadth-first fashion. The routing area is represented by a grid map, with each bin in the grid either empty or occupied by an obstacle. The algorithm starts the search process from s like a propagating wave. Source point s is labeled '0', and all its unblocked neighbors are labeled '1'. All empty neighbors of all '1'-labeled nodes are then visited and labeled '2', and so on until the target t is finally reached. An example is shown in Figure 17. The shortest-path solution is guaranteed as long as a solution exists.
Fig. 17. A net routed by a maze-routing algorithm.
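The wave propagation in Lee's algorithm is essentially a breadth-first search over the bin grid. The following is a minimal sketch of that idea (not the implementation used in any particular router); the grid encoding and function name are illustrative assumptions.

```python
from collections import deque

def lee_maze_route(grid, src, dst):
    """Breadth-first wave propagation on a bin grid.

    grid[r][c] is True when the bin is blocked by an obstacle.
    Returns a shortest source-to-target path as a list of bins,
    or None when no path exists.
    """
    rows, cols = len(grid), len(grid[0])
    label = {src: 0}            # src labeled 0, its free neighbors 1, and so on
    parent = {src: None}
    frontier = deque([src])
    while frontier:
        cur = frontier.popleft()
        if cur == dst:
            path = []           # retrace the wave from target back to source
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc] \
                    and (nr, nc) not in label:
                label[(nr, nc)] = label[cur] + 1
                parent[(nr, nc)] = cur
                frontier.append((nr, nc))
    return None                 # target unreachable
```

Because the wave expands one grid step at a time, the first time the target is reached the retraced path has minimum length, which is why the shortest-path guarantee holds whenever a path exists.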
Line-probe algorithms were proposed independently by Mikami and Tabuchi in 1968 [MT68] and by Hightower in 1969 [Hig69]. The basic difference between
line-probe and maze-routing algorithms is that the line-probe algorithms use line segments instead of grid points to search for a path. Search lines are generated from both the source s and the target t. At the next iteration, new line segments are generated from escape points selected along the existing line segments. The generated lines cannot go through an obstacle or the area boundary. The main difference between the Mikami algorithm and the Hightower algorithm lies in the selection of escape points. In the Mikami algorithm, every point on the search line is an escape point; hence, the optimal solution is guaranteed, if a path exists. In the Hightower algorithm, only one escape point will be generated for each line segment, the choice of location determined by heuristics designed to support navigation around obstacles. However, the shortest path is not guaranteed, and a path may not be found, even when one exists. Figure 18 shows an example of routing a net using the Hightower line-probe algorithm.
Fig. 18. A net routed by a line-search algorithm.
4.3 Traditional Two-Level Routing Flow

The traditional routing system is usually divided into two stages: global routing and detailed routing, as shown in Figure 19. During global routing, the entire routing region is partitioned into tiles or channels. A rough route for each net is determined among these tiles to optimize the overall congestion and performance. After global routing, detailed routing is performed within each tile or channel, and the exact geometric layout of all nets is determined. There are two kinds of global routing algorithms, sequential and iterative. Sequential methods route the nets one by one in a predetermined order, using either maze routing or line probing. However, the solution quality is often affected by the net ordering. Iterative methods try to overcome the net-ordering problem by performing multiple iterations. A negotiation-based iterative global routing scheme was proposed [Nai87] and later used in FPGA
routing [ME95]. In flow-based iterative methods [CLC96, Alb00], the global routing problem is relaxed to a linear-programming formulation and solved by an approximate multi-terminal, multicommodity flow algorithm. A combination of maze routing and iterative deletion has recently been used for performance-driven global routing [CM98]. There are two types of detailed routing approaches, grid-based and gridless. In grid-based detailed routing, routing grids with fixed spacings from the design rules are defined before routing begins. The path of each net is restricted to connect grid points rectilinearly. As the grids are uniform, the layout representation in grid-based routing is quite simple, usually just a 3-dimensional array. Variable widths and spacings may be used for delay and noise minimization (e.g. [CHKM96, CC00]). A gridless detailed router allows arbitrary widths and spacings for different nets; these can be used to optimize performance and reduce noise. However, the design size that a gridless router can handle is usually limited, due to its higher complexity.
Fig. 19. Traditional two-level routing flow: global routing (maze searching, multicommodity flow-based methods, iterative deletion, hierarchical methods) followed by detailed routing (grid-based, e.g. rip-up and reroute; or gridless, e.g. shape-based, tile-based, or non-uniform grid graph), producing the final layout.
4.4 More Scalable Approaches

Because routing tiles are confined to discrete routing layers, the global routing problem is sometimes described as "2.5-dimensional." In this setting, flat two-level (global routing + detailed routing) approaches have two limitations in current and future VLSI designs. First, future designs may integrate several hundred million transistors in a single chip. The traditional two-level design flow may not scale well to handle problems of such size. For example, a 2.5 × 2.5 cm² chip using a 0.07 µm processing technology may have over 360,000 horizontal and vertical routing tracks at the full-chip
level [itr]. That translates to about 600 × 600 routing tiles, if the problem size is balanced between the global-routing and detailed-routing stages, and presents a significant challenge to the efficiency of both stages. To handle the problem, several routing approaches have been proposed to scale to large circuit designs. Top-down hierarchical approaches have also been used to handle large routing problems. The flow is shown in Figure 20. The first hierarchical method was proposed for channel routing by Burstein [BP83]. Heisterman and Lengauer proposed a hierarchical integer programming-based algorithm for global routing [HL91]. Wang and Kuh proposed a hierarchical (α, β)* algorithm [WK97] for multi-chip module (MCM) global routing. The main problems with the hierarchical approaches are (1) the higher-level solutions over-constrain the lower-level solutions, and (2) the lack of detailed routing information at coarser levels makes it difficult to make good decisions there. Therefore, when an unwise decision is made at some point, it is very costly, through rip-up and reroute, to revise it later at a finer level. To overcome the disadvantages of hierarchical methods, hybrid routing systems have been proposed. Lin, Hsu, and Tsai proposed a combined top-down hierarchical maze-routing method [LHT90]. Parameter-controlled expansion instead of strictly confined expansion is used to overcome the first disadvantage, but there is still no way to learn finer-level routing information at coarser levels. Hayashi and Tsukiyama proposed a combination of a top-down and a bottom-up hierarchical approach [HT95], aiming to resolve the second problem of the original hierarchical approach, but the fine-level planning results are still fixed once they are generated, causing a net-ordering problem.
Fig. 20. Hierarchical routing flow.
A 3-level routing scheme with an additional wire-planning phase between performance-driven global routing and detailed routing has also been proposed [CFK00] and is illustrated in Figure 21. The additional planning phase improves both the completion rate and the runtime. However, for large designs, even with the three-level routing system, the problem size at each level
may still be very large. For the previous example of a 2.5 × 2.5 cm² chip using a 0.07 µm processing technology, the routing region has to be partitioned into over 100 × 100 tiles at both the top-level global routing and the intermediate-level wire planning stage (assuming the final tile for detailed routing is about 30 × 30 tracks for the efficiency of the gridless router). Therefore, as designs grow, more levels of routing are needed. Rather than a predetermined, manual partition of levels, which may have discontinuities between levels, an automated flow is needed to enable seamless transitions between levels.
Fig. 21. 3-level routing flow: performance-driven global routing, congestion-driven wire planning, and gridless detailed routing.
4.5 MARS - A Multiscale Routing System

The success of multiscale algorithms in VLSI partitioning and placement led naturally to the investigation of multiscale methods for large-scale VLSI routing. A novel multiscale router, MRS [CFZ01], was proposed for gridless routing. Soon afterward, an enhanced, more effective version of MRS called MARS (Multiscale Advanced Routing System) was proposed [CXZ02]. Independently of MARS, the MR system [CL04] was also developed based on the multiscale-routing framework of MRS. The MARS algorithm is reviewed in this section. Section 4.5.1 provides an overview. Section 4.5.2 describes the tile-partitioning and resource-estimation algorithms employed during coarsening. Section 4.5.3 describes the multicommodity flow algorithm used to compute the initial routing solution at the coarsest level. The history-based iterative refinement process is explained
in Section 4.5.4. Results of experiments with MARS are presented in Section 4.5.7.

Fig. 22. Multiscale routing flow: initial fine routing-tile generation, tile coarsening, initial routing, route refinement, and detailed routing.

4.5.1 Overview of the MARS Routing Flow

Figure 22 illustrates the V-cycle flow for MARS. The routing area is partitioned into routing tiles. The algorithm employs a multiscale planning procedure to find a tile-to-tile path for each net. In contrast, most traditional global-routing algorithms [CLC96, Alb00, CM98] try to find routing solutions on the finest tiles directly. For large designs, the number of tiles may be too large for the flat algorithms to handle. The multiscale approach first estimates routing resources using a line-sweeping algorithm on the finest tile level. A recursive coarsening process is then employed to merge the tiles, to reserve the routing resources for local nets, and to build a hierarchy of coarser representations. At each coarsening stage, the resources of each tile are calculated from the previously considered finer-level tiles which form it. Also, part of the routing resources are assigned to nets local to that level. Once the coarsening process has reduced the number of tiles below a certain limit, the initial routing is computed using a multicommodity flow-based algorithm. A congestion-driven Steiner-tree generator gradually decomposes multi-pin nets into two-pin nets. During recursive interpolation, the initial routing result is refined at each level by a maze-search algorithm. When the final tile-to-tile paths are found at the finest level of tiles for all the nets, a gridless detailed routing algorithm [CFK99] is applied to find the exact path for each net.

The V-cycle multiscale flow has clear advantages over the top-down hierarchical approach. The multiscale subproblems are closer to the original
problem, because at each level, all nets, including those local to the current level, are considered during resource reservation. In the interpolation pass, the finer-level router may alter the coarser-level result according to its more detailed information about local resources and congestion. The coarse-level solution is used as a guide to the finer-level solver, but not as a constraint. Compared to the traditional top-down hierarchical approach, this feature leads the multiscale method to better solutions with higher efficiency.
Fig. 23. Limitation of top-down hierarchical approaches. In a top-down hierarchical approach, coarse-level decisions are based on limited information (a). Finer-level refinement cannot change the coarse-level solution, even when it leads to local congestion at the finer level (b).
4.5.2 Coarsening Process and Resource Reservation

The hierarchy of coarse-grain routing problems is built by recursive aggregation of routing subregions. Initially, at the finest level, all routing layers are partitioned the same way into regular 2-D arrays of identical rectangular tiles. The finest level is denoted Level 0. A three-dimensional routing graph G0 is built there, each node in G0 representing a tile in some routing layer at Level 0. Every two neighboring nodes in G0 that have a direct routing path between them are connected by an edge. The edge capacity represents the routing resources at the common boundary of the two tiles. The ultimate objective of the multiscale routing algorithm is to find, for each net, a tile-to-tile path in G0. These paths are then used to guide the detailed router in searching for actual connections for the nets. The multiscale router first accurately estimates the routing resources at the finest level, then recursively coarsens the representations into multiple levels. The number of routing layers does not change during the coarsening. All layout objects such as obstacles, pins and pre-routed wires are counted in the calculation. Because the detailed router is gridless, actual obstacle dimensions are used to compute the routing-resource estimates. Three kinds of edge capacities are computed: wiring capacity, interlayer capacity and through capacity.
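As a concrete picture of what G0 looks like, the sketch below builds a capacitated tile graph over a stack of routing layers. It is an illustrative simplification, not the MARS data structure; the capacity callbacks `wiring_cap` and `via_cap` are assumed to come from the resource estimation described next.

```python
from dataclasses import dataclass, field

@dataclass
class RoutingGraph:
    """Tile-level routing graph: one node per (layer, row, col) tile and one
    capacitated edge per pair of adjacent tiles, either within a layer or
    between layers.  A simplified stand-in for G0."""
    capacity: dict = field(default_factory=dict)   # edge -> available capacity

    def add_edge(self, tile_a, tile_b, cap):
        self.capacity[frozenset((tile_a, tile_b))] = cap

def build_finest_graph(layers, rows, cols, wiring_cap, via_cap):
    g = RoutingGraph()
    for l in range(layers):
        for r in range(rows):
            for c in range(cols):
                if c + 1 < cols:                 # neighbor to the east (wiring edge)
                    g.add_edge((l, r, c), (l, r, c + 1), wiring_cap(l, r, c, 'E'))
                if r + 1 < rows:                 # neighbor to the south (wiring edge)
                    g.add_edge((l, r, c), (l, r + 1, c), wiring_cap(l, r, c, 'S'))
                if l + 1 < layers:               # interlayer (via) edge
                    g.add_edge((l, r, c), (l + 1, r, c), via_cap(l, r, c))
    return g
```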
A line-sweeping resource-estimation algorithm [CFK00] is used in MARS. This method slices each routing layer along lines parallel to the layer's wiring direction, horizontal or vertical. The depth of a slice relative to boundary B is the distance along the slice from B to the nearest obstacle in the slice or, if there is no obstacle, to the end of the tile. For the example in Figure 24, the wiring capacity at boundary B can be computed as a weighted sum of slice areas reachable from B along lines in the slice direction,

C = Σ_i (W_i × D_i / D),    (6)

where W_i and D_i are the width and depth of each slice S_i, and D is the tile depth. To calculate W_i and D_i, we maintain a contour list of B, defined as a sorted list of the boundaries of all the rectangular obstacles that can be seen from B. In Figure 24, the contour list of B is C1, C2, C3, C4. The interlayer edge capacity, which corresponds to the resources used by vias, is calculated as the sum of the areas of all empty spaces in the tile. The through capacity, which corresponds to the paths that go straight through a routing tile, is the sum of the boundary-capacity contributions of those empty rectangles that span the whole tile. The through capacity in the horizontal direction of the tile in Figure 24 is C_th = W2 + W4. All three capacities contribute to the path costs. For the example in Figure 25, the total path cost is

C_path = C_{1,right} + C_{2,left} + C_{2,right} + C_{2,through} + C_{3,left} + C_{3,up} + C_{4,down} + C_{4,right} + C_{5,left},

where C_{1,right}, C_{2,left}, C_{2,right}, C_{3,left}, C_{4,right} and C_{5,left} are the costs related to the wiring capacities of tiles 1, 2, 3, 4 and 5; C_{2,through} is the cost corresponding to the through capacity of tile 2; and C_{3,up}, C_{4,down} are the via costs related to interlayer capacities.
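A short, self-contained rendering of Eq. (6), assuming the slices reachable from boundary B have already been extracted from the contour list (names are illustrative):

```python
def wiring_capacity(slices, tile_depth):
    """Weighted-area estimate of Eq. (6): C = sum_i W_i * D_i / D.

    `slices` is a list of (width, depth) pairs for the slices reachable from
    boundary B, and `tile_depth` is D.
    """
    return sum(w * d / tile_depth for w, d in slices)

# Example: three slices of width 2 with depths 10, 4 and 10 in a tile of
# depth 10 contribute 2*1.0 + 2*0.4 + 2*1.0 = 4.8 tracks' worth of capacity.
print(wiring_capacity([(2, 10), (2, 4), (2, 10)], 10))
```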
Fig. 24. Resource-estimation model for one layer. Shaded rectangles represent obstacles.

Given the accurate routing-capacity estimation at the finest level and the grid graph G0, the coarsening process generates a series of reduced graphs
G_i, each representing a coarsened Level-i routing problem, P_i, with a different resolution. At a coarser Level i+1, the tiles are built simply by merging neighboring tiles on the adjacent, finer Level i. The coarser-level graph, G_{i+1}, which represents the routing resources available at the coarse tiles, can also be derived from the finer-level graph G_i directly. The routing graphs are iteratively coarsened until the size of the graph falls below a predetermined limit.

Fig. 25. Path cost.

Resource Merging

Every move from a finer Level i to a coarser Level i+1 requires merging a certain number of component tiles (2 × 2 in our implementation). A new contour list for a resulting Level-(i+1) tile is obtained by merging the contour lists of the component Level-i tiles. The wiring capacities of the new tile can also be derived from (6). Figure 26 illustrates the merging process. Level-i tiles T1, T2, T3 and T4, whose left boundaries are B1, B2, B3 and B4, respectively, are to be merged. The contour lists of B1, B2, B3 and B4 are retrieved and merged into the contour list of the new edge B. Since the contour lists are sorted, the merging process can be accomplished in O(n) time, where n is the number of segments in the new contour list. With the contour list of B, it is straightforward to derive the rectangles abutting B and then calculate the wiring capacity of B. The interlayer capacity of the new tile is calculated as the sum of the interlayer capacities of the component tiles. The through capacity of the new tile is calculated as the sum of the heights of the empty slices that span the whole tile.

Resource Reservation

The estimation computed by the above procedure still cannot precisely model the available routing capacities at coarser levels. When the planning engine moves from Level i to coarser Level i+1, a subset of the nets at Level i become completely local to individual Level-(i+1) tiles. Such nets are therefore invisible at that level and all coarser levels. If the number of such local nets is large, a solution to the coarse-level problem may not be aware of locally congested areas and may therefore generate poor planning results.
Fig. 26. Merging contour lists.
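The linear-time merge of sorted contour lists in Figure 26 can be sketched as an ordinary sorted-list merge. The segment representation below is a deliberate simplification (the real structure also tracks which obstacles remain visible from the merged boundary), so treat it as illustrative only.

```python
from heapq import merge

def merge_contour_lists(contour_lists):
    """Merge the sorted contour lists of component-tile boundaries into one
    sorted contour list for the merged boundary B (cf. Figure 26).

    Each contour list is a list of segments (offset, length, depth), sorted by
    offset along the boundary; merging already-sorted lists keeps the cost
    linear in the total number of segments for a fixed number of lists.
    """
    return list(merge(*contour_lists, key=lambda seg: seg[0]))

# Example: contours of two stacked component boundaries B1 and B2.
b1 = [(0, 5, 12), (5, 3, 2)]
b2 = [(8, 4, 7), (12, 4, 15)]
print(merge_contour_lists([b1, b2]))
```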
Figure 27(a) shows an example of the effect of local nets. Nets 1 and 2 are local to Level 1 and appear only at Level 0. Net 3 is local to Level 2, and Net 4 is global to the coarsest level (Level 2). Each net is planned without any information about the nets local to the planning level. Net 3 and Net 4 will be planned as in Figure 27(a). Both Net 3 and Net 4 have to be changed in Level-0 planning to minimize the local congestion. This adjustment not only places a heavier burden on the refinements at finer levels but also wastes the effort spent at coarser levels.

In order to correctly model the effect of local nets, the resources that will be used by nets local to each level are predicted and reserved during coarsening. This process is called resource reservation. More specifically, suppose the coarsening process goes through Level 0, Level 1, ..., Level k, with Level 0 being the finest level. Let c_{i,j} denote the initial capacity of edge e_{i,j} in routing graph G_i, and let the capacity vector C_i = [c_{i,1}, c_{i,2}, ..., c_{i,j}] represent all routing capacities at Level i. Let T_{n,i} = {the minimal set of tiles in which the pins of net n are located on Level i}, which is called the spanning tile set of net n on Level i. The level of net n, level(n), is defined as the level above which the net becomes contained within the boundary of one tile. It is calculated as follows:

level(n) = k, if |T_{n,k}| > 1;
         = 0, if |T_{n,0}| = 1;
         = min{ i : |T_{n,i}| > 1 and |T_{n,i+1}| = 1 }, otherwise.    (7)
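A small sketch of how (7) could be evaluated, assuming the spanning tile sets T_{n,i} for all levels have already been computed (illustrative only):

```python
def net_level(spanning_tiles):
    """Compute level(n) from Eq. (7).

    `spanning_tiles` is the list [T_{n,0}, T_{n,1}, ..., T_{n,k}], where
    T_{n,i} is the set of Level-i tiles containing pins of net n.
    """
    k = len(spanning_tiles) - 1
    if len(spanning_tiles[k]) > 1:
        return k                    # still spans several tiles at the coarsest level
    if len(spanning_tiles[0]) == 1:
        return 0                    # already inside one finest-level tile
    return min(i for i in range(k)
               if len(spanning_tiles[i]) > 1 and len(spanning_tiles[i + 1]) == 1)
```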
Let L_i = {n | level(n) = i}; L_i is called the local net set on Level i. Let M_i = {n | level(n) > i}; M_i is called the global net set on Level i. To better estimate the local nets, we use a maze-routing engine to find a path for each net in L_i before going from Level i to Level i+1. Then we deduct the routing capacity taken by these local nets in resource reservation. Figure 28 shows an example of the calculation of the reservation for edges CD, AC and BD of a Level-(i+1) tile. An L-shaped path connects pins s and t in a horizontal layer. The wiring capacities on CD, AC and BD are first calculated from (6). After the horizontal wire is added, one segment in the contour list of CD will be pushed right by h1. Therefore, the reserved capacity is r = w · h1/h, where
w is the wire width. Similarly, the vertical resource reservations on AC and BD are w · v1/v and w · v2/v, respectively. However, since pins are treated as obstacles in contour-list generation, the capacity reservation on AB remains zero. A vector R_{i+1} = [r_{i+1,1}, r_{i+1,2}, ..., r_{i+1,j}] can be obtained by repeating this process; each element corresponds to the reservation on e_{i+1,j} in G_{i+1}. The routing capacity of edges in G_{i+1} is updated as C_{i+1} = C_{i+1} − R_{i+1} = [c_{i+1,1} − r_{i+1,1}, c_{i+1,2} − r_{i+1,2}, ..., c_{i+1,j} − r_{i+1,j}].

Fig. 27. Effect of resource reservation: (a) effect of local nets (without resource reservation); (b) after resource reservation.

Fig. 28. Reservation calculation.
Figure 27(b) shows the planning result with the resource-reservation approach. Net 1 and Net 2 are routed at Level 0, and the reservations are made for them on CO' and AD'. At Level 1, Net 3 will take a route different from
that of Figure 27(a), due to the resource reservation for local nets made at Level 0. For the same reason, Net 4 is routed through the less congested tiles at Level 2. Because the local nets at a given level occupy a small number of adjacent or nearly adjacent tiles, the maze-routing engine will route them very quickly. Hence, the reservation procedure is very fast. The routes during this phase are usually short and straight, so the reservation amount is close to the lower bound of the resources actually needed by the nets. Furthermore, the reserved routes are not fixed and can be changed during the refinement process when necessary.

4.5.3 Initial Routing

The coarsening process stops once the size of the tile grid falls below a certain user-specified value; the default maximum coarsest-level size is 20 × 20 on each routing layer. A set of tile-to-tile paths is then computed at the coarsest level for the nets which cross tile boundaries at that level. This process is called the initial routing. It is important to the final result of multiscale routing for two main reasons. First, long interconnects that span more tiles are among the nets that appear during the initial routing. Normally, these long interconnects are timing-critical and may also suffer noise problems due to coupling along the paths. Initial routing algorithms should be capable of handling these performance issues. Second, the initial routing result will be carried all the way down to the finest tiles through the refinement process in the multiscale scheme. Although a multiscale framework allows finer-level designs to change coarser-level solutions, a bad initial routing solution will slow down the refinement process and is likely to degrade the final solution.

In MARS, a multicommodity flow-based algorithm is used to compute the initial routing solution at the coarsest level, k. The flow-based algorithm is chosen for several reasons. First, the flow-based algorithm is fast for the relatively small grid size at the coarsest level. Also, the flow-based algorithm considers all the nets at the same time and thus avoids the net-ordering problem of net-by-net approaches. Last, the flow-based algorithm can be integrated with other optimization algorithms to consider special requirements of certain critical nets. For example, high-performance tree structures such as the A-Tree [CKL99] or a buffered tree [CY00] can be taken instead of the Steiner tree as the candidate tree structure in the flow-based initial routing.

The objective of the initial routing is to minimize congestion on the routing graph. A set of candidate trees is computed for each net on the coarsest-level routing graph G_k. For a given net i, let P_i = {P_{i,1}, ..., P_{i,l_i}} be the set of possible trees. In the current version of MARS, graph-based Steiner trees are used as candidates for each net. Assume the capacity of each edge on the routing graph is c(e), and w_{i,e} is the cost for net i to go through edge e. Let x_{i,j} be an integer variable with possible values 1 or 0 indicating whether tree P_{i,j} is chosen or not (1 ≤ j ≤ l_i). Then, the initial routing problem can be
formulated as a mixed integer linear-programming problem as follows:

minimize    λ
subject to  Σ_{i,j: e ∈ P_{i,j}} w_{i,e} x_{i,j} ≤ λ c(e)   for every edge e of G_k,
            Σ_{j=1}^{l_i} x_{i,j} = 1   for i = 1, ..., n_k,
            x_{i,j} ∈ {0, 1}   for i = 1, ..., n_k,    (8)

where n_k is the number of nets to be routed at level k. Normally, this mixed integer linear-programming problem is relaxed to a linear-programming problem by replacing the last constraint in Equation 8 by

x_{i,j} ≥ 0   for i = 1, ..., n_k.    (9)
After relaxation, a maximum-flow approximation algorithm can be used to compute the value of x_{i,j} ∈ [0, 1] for each net in the above linear-programming formulation. The algorithm (Figure 29) is implemented based on an approximation method [Alb00]. Garg and Konemann [GK98] have proved the optimality of this method and have given a detailed explanation of its application to multicommodity flow computation. After the fractional results for each path are computed, a randomized rounding algorithm is used to convert the fractional values into 0 or 1 values for the candidate paths of each net so that one path is chosen for each net. Error bounds have been estimated for the randomized rounding approach to global routing [KLR+87].

4.5.4 History-Based Incremental Refinement

One major difference between the hierarchical routing and multiscale routing approaches is that a multiscale framework allows the finer-level relaxations to change coarser-level routing solutions. In the interpolation phase of the multiscale framework, paths computed by the initial flow-based algorithm are refined at every level. In MARS, a congestion-driven minimum Steiner tree is used to decompose a multipin net. The Steiner tree is generated using a point-to-path A*-search algorithm in both the initial routing and the relaxation at finer levels. Steiner-tree-based decomposition of a multipin net achieves better wire length and routability than a minimum-spanning-tree-based decomposition.

4.5.5 The Path-Search Algorithm

Two types of nets must be handled at each level of the refinement. One type is "new" nets that appear just at the current level, as shown by solid lines in Figure 30(b). These nets are relatively short and do not cross coarser tile
boundaries, so they are not modeled at coarser levels. These nets are called local nets. New paths must be created for local nets at the current level. Another set of nets, global nets, are those carried over from the previous coarser-level routing, whose paths must be refined at the current level. During each refinement stage, local nets are routed prior to global nets. Finding paths for the local nets is relatively easy, as each local net crosses at most two tiles. Furthermore, routing local nets before any refinement provides a more accurate estimation of local resources. The major part of the refinement work comes from refining the global nets.

In general, the amount of work needed for refinement depends on the quality of the coarse-level solution. At one extreme, the choice of paths at the coarser level is also optimal at the current level. The paths need only be refined within the regions defined by the paths in coarse tiles. With this restriction, the multiscale algorithm can be viewed as a hierarchical algorithm. At the other extreme, when the coarser solution is totally useless or misleading, optimization at the current level must discard the coarser-level solution and search for a new path for each coarse-level net among all finer tiles at this level. In this case, a whole-region search is required at the current level. The reality lies between these two extreme cases. A good coarse-level routing provides hints as to where the best solution might be. However, if the search space is restricted to be totally within the coarse-level tiles as in a hierarchical approach, the flexibility to correct the mistakes made by coarser levels will be lost.

    parameter initialization;
    for each iteration {
        for each net {
            if ((there is no candidate topology for this net) ||
                (the cost of the net's last topology in the current graph increases too much))
                generate a new topology T for this net;
            else
                keep the last topology T;
            assign a unit flow to T;
            update the routing graph edges;
        }
    }
    pick one topology for each net by randomized rounding according to the assigned flow volume;

Fig. 29. Approximate multicommodity-flow algorithm.
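The following is a runnable rendering of the loop in Figure 29 combined with the randomized rounding step. It is a simplified sketch under stated assumptions, not the MARS code: `steiner_tree(net, edge_cost)` is assumed to return a candidate tree as a set of routing-graph edges, and the cost-growth test and the multiplicative edge-cost update are hypothetical simplifications of the approximation method.

```python
import random
from collections import defaultdict

def cost_increased_too_much(tree, edge_cost, threshold=2.0):
    # Hypothetical test: regenerate the topology once its cost has doubled
    # relative to the unit edge costs it started from.
    return sum(edge_cost[e] for e in tree) > threshold * len(tree)

def initial_routing(nets, edge_capacity, steiner_tree, iterations=10):
    flow = {net: defaultdict(float) for net in nets}    # accumulated flow per tree
    edge_cost = {e: 1.0 for e in edge_capacity}
    last_tree = {}
    for _ in range(iterations):
        for net in nets:
            tree = last_tree.get(net)
            if tree is None or cost_increased_too_much(tree, edge_cost):
                tree = frozenset(steiner_tree(net, edge_cost))   # new topology
            last_tree[net] = tree
            flow[net][tree] += 1.0                               # assign a unit flow
            for e in tree:                                       # update edge costs
                edge_cost[e] *= 1.0 + 1.0 / edge_capacity[e]
    # Randomized rounding: pick one tree per net, weighted by accumulated flow.
    chosen = {}
    for net in nets:
        trees, weights = zip(*flow[net].items())
        chosen[net] = random.choices(trees, weights=weights)[0]
    return chosen
```

The multiplicative cost update penalizes heavily used edges, so later nets (and later iterations) are steered away from congested regions, which is the intuition behind the congestion-minimizing objective in (8).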
In order to keep the coarser-level guide while still maintaining the flexibility to search all finer tiles, a net-by-net refinement algorithm using a modified maze-search algorithm is implemented. Figure 30 shows an example. The path on coarser tiles guides the search for a path at the current level. A preferred region is defined as the set of tiles that the coarse-level path goes through. Weights and penalties associated with each routing-graph edge are computed based on the capacities and usage by routed nets. Additional penalties are assigned to graph edges linking to and going between the graph nodes corresponding to tiles that are not located within the preferred region, as shown in Figure 30(c). Dijkstra's shortest-path algorithm [Dij59] is used to find a weighted shortest path for each net, considering wire length, congestion, and the coarser-level planning results. In general, Dijkstra's algorithm may be slow in searching for a path in a large graph. However, by putting penalties on non-preferred regions, we can guide the search to explore the preferred regions first. The penalty is chosen so that it does not prevent the router from finding a better solution that does not fall into the preferred region.
Fig. 30. Constrained maze refinement. Previous routing at coarse grid is shown in (a). Local nets at current level are routed first. Preferred regions are defined by a path at the coarser tile grid, as shown in (c). However, the modified maze algorithm is not restricted to higher-level results and can leave the preferred regions according to local congestion, thereby obtaining better solutions (d).
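A minimal sketch of this biased search: ordinary Dijkstra over the tile graph, with an additive penalty on edges that leave the preferred region. The adjacency encoding, the penalty form, and the parameter value are illustrative assumptions, not the MARS weighting scheme.

```python
import heapq

def refine_path(adj, src, dst, preferred, penalty=4.0):
    """Dijkstra search biased toward the coarse-level path (cf. Figure 30).

    `adj[u]` yields (v, base_cost) pairs on the tile graph; `preferred` is the
    set of tiles covered by the coarse-level path.  Edges touching tiles
    outside the preferred region pay an extra `penalty`, which guides but does
    not forbid excursions outside that region.
    """
    dist = {src: 0.0}
    parent = {src: None}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, base_cost in adj[u]:
            cost = base_cost + (0.0 if u in preferred and v in preferred else penalty)
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                parent[v] = u
                heapq.heappush(heap, (dist[v], v))
    return None
```

Because the penalty is finite, a path that detours around local congestion can still win when the preferred region is too crowded, which is exactly the behavior illustrated in Figure 30(d).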
4.5.6 History-Based Iterative Refinement

Limiting the process to only one round of refinement may not be enough to guarantee satisfactory results. In MARS, a form of history-based iterative refinement [ME95]
is applied at each level. Edge routing costs are iteratively updated with the consideration of historical congestion information, and all nets are re-routed based on the new edge costs. The cost of edge e during the ith iteration is calculated by:

Cost(e, i) = α · congestion(e, i) + β · history(e, i)    (10)
history(e, i) = history(e, i−1) + γ · congestion(e, i−1)    (11)
where congestion(e, i) is a three-tiered slope function of the congestion on e, history(e, i) is the history cost, indicating how congested that edge was during previous iterations, and α, β, γ are scaling parameters. Explicit use of congestion history is observed to prevent oscillations and smooth the convergence of the iterative refinement. The congestion estimates of the routing edges are updated every time a path of a net is routed. After each iteration, the history cost of each edge is increased according to (11). Then the congestion estimates of all edges are scanned to determine whether another iteration is necessary. If so, all edge usages are reset to zero and the refinement process at the same level is restarted.

4.5.7 Experiments and Results

MARS has been implemented in C++. The multiscale-routing results are finalized using an efficient multilayer gridless routing engine [CFK99]. MARS has been tested on a wide range of benchmarks, including MCM examples and several standard-cell examples, as shown in Table 5 [CXZ02]. Mcc1 and Mcc2 are MCM examples, where Mcc1 has 6 modules and Mcc2 has 56 modules [CFZ01]. The results are collected on a Sun Ultra-5 440 MHz with 384 MB of memory. MARS is compared with the three-level routing flow recently presented at ISPD 2000 [CFK00]. The three-level flow features a performance-driven global router [CM98], a noise-constrained wire-spacing and track-assignment algorithm [CC00], and finally a gridless detailed-routing algorithm with wire planning [CFK00]. In this experiment, the global router partitions each example into 16 × 16 routing tiles. Nets crossing the tile boundaries are partitioned into subnets within each tile. After the pin assignment, the gridless detailed routing algorithm routes each tile one by one. "#Total Sub-nets" are the total two-pin nets seen by the detailed router. Since long two-pin nets are segmented into shorter subnets, this number depends not only on the number of multiple-pin nets, but also on the net planning results. From the results in Table 6, the multiscale routing algorithm helps to eliminate failed nets and reduces the runtime by 11.7×.

The difference between a multiscale router and a hierarchical router has been discussed in Section 4.5.1. For comparison, the MARS implementation has been modified and transformed into a hierarchical implementation.
Table 5. Examples used for multiscale routing.

Circuit    Size (µm)          #Layers  #2-Pin Nets  #Cells  #Pins   #Levels  Division
Mcc1       39000 × 45000      4        1694         MCM     3101    2        25 × 22
Mcc2       152400 × 152400    4        7541         MCM     25024   3        43 × 43
Struct     4903 × 4904        3        3551         n/a     5471    5        273 × 273
Primary1   7552 × 4988        3        2037         n/a     2941    3        53 × 35
Primary2   10438 × 6468       3        8197         n/a     11226   6        73 × 46
S5378      4330 × 2370        3        3124         1659    4734    3        61 × 34
S9234      4020 × 2230        3        2774         1450    4185    3        57 × 32
S13207     6590 × 3640        3        6995         3719    10562   4        92 × 51
S15850     7040 × 3880        3        8321         4395    12566   4        98 × 55
S38417     11430 × 6180       3        21035        11281   32210   3        159 × 86
S38584     12940 × 6710       3        28177        14716   42589   4        180 × 94
Table 7 compares the routing results of such a hierarchical approach with those of the multiscale approach. Although the hierarchical approach gains a little bit in runtime in some cases, by constraining the search space during the refinement process it loses to the multiscale routing in terms of completion rate. This trend holds true especially in designs with many global nets, such as Mcc1 and Mcc2. This result indicates that the multiscale method can generate planning results with better quality.
5 Conclusion
Multiscale algorithms have captured a large part of the state of the art in VLSI physical design. However, experiments on synthetic benchmarks suggest that a significant gap between attainable and optimal solutions still exists, and other kinds of algorithms remain competitive in most areas other than partitioning. Therefore, a deeper understanding of the underlying principles most relevant to multiscale optimization in physical design is widely sought. Current efforts to improve multiscale methods for placement and routing include attempts to answer the following questions.
Table 6. Comparison of 3-level and multiscale routing.

                     3-level Routing                          Multiscale Routing
Circuit     #Failed Nets (#Total sub-nets)  Runtime(s)    #Failed Nets  Runtime(s)
S5378       517 (3124)                      430.2         0             30
S9234       307 (2774)                      355.2         0             22.8
S13207      877 (6995)                      1099.5        0             85.2
S15850      978 (8321)                      1469.1        0             107.1
S38417      1945 (21035)                    3560.9        0             250.9
S38584      2535 (28177)                    7086.5        0             466.1
Struct      21 (3551)                       406.2         0             31.6
Primary1    19 (2037)                       239.1         0             33.5
Primary2    88 (8197)                       1311          0             162.7
Mcc1        195 (1694)                      933.2         0             105.9
Mcc2        2090 (7541)                     12333.6       0             1916.9
Avg.                                        11.7                        1
Table 7. Comparison of hierarchical routing and multiscale routing.

                     Hierarchical Routing                     Multiscale Routing
Circuit     #Failed Nets (#Total sub-nets)  Runtime(s)    #Failed Nets  Runtime(s)
S5378       1 (3390)                        22.6          0             30
S9234       0                               15.7          0             22.8
S13207      1 (7986)                        60.9          0             85.2
S15850      1 (9587)                        75.8          0             107.1
S38417      0                               223.2         0             250.9
S38584      3 (31871)                       334.6         0             466.1
Struct      0                               21.6          0             31.6
Primary1    0                               34.6          0             33.5
Primary2    0                               164.8         0             162.7
Mcc1        377 (13338)                     205.3         0             105.9
Mcc2        7409 (96030)                    4433.0        0             1916.9
Avg.                                        1.04                        1
• How should an optimization hierarchy be defined? Is it important to maintain a hypergraph model at coarser levels, or will a graph model suffice? Can relaxation itself be used as a means of selecting coarse-level variables [BR02]?
• How should the accuracy and quality of hypergraph coarsenings and coarse-level data functions and placements be quantified?
• Is it possible to simultaneously employ multiple hierarchies constructed by different means and perhaps targeting different objectives? Can optimization on the dual hypergraph hierarchy be coordinated with optimization on the primal?
• Given multiple candidate solutions at a given level, is there a general prescription for combining them into a single, superior solution?
• Current approaches directly optimize a coarse-level formulation of the fine-level problem. Would it be more effective at the coarser level to focus instead on the error or displacement in the given, adjacent finer-level configuration?
• Which methods of relaxation are the most effective at each level? What should be the relative roles of continuous and discrete formulations? How should they be combined? At what point in the flow should constraints be strictly satisfied and subsequently enforced?
• Are relaxations best restricted to sequences of small subsets? How should these subsets be selected? When should all variables be updated at every step?
• Which is preferable: a small number of V-cycles with relatively expensive relaxations, or a large number of V-cycles (or other hierarchy traversals) with relatively inexpensive relaxations?
• Is there some underlying convergence theory for multiscale optimization, either to local or global solutions, that can be used to guide the practical design of the coarsening, relaxation, interpolation, and iteration flow?

As the final, physical barriers to the continuation of Moore's Law in fabrication technology emerge, these questions pose both a challenge and an opportunity for significant advances in VLSI physical design.
6 Acknowledgments

Funding for this work comes from Semiconductor Research Consortium Contracts 2003-TJ-1091 and 99-TJ-686 and National Science Foundation Grants CCR-0096383 and CCF-0430077. The authors thank these agencies for their support.
References

[AHK97] C. Alpert, J.-H. Huang, and A.B. Kahng. Multilevel circuit partitioning. In Proc. 34th IEEE/ACM Design Automation Conf., 1997.
[AK96] C. Alpert and A. Kahng. A hybrid multilevel/genetic approach for circuit partitioning. In Proceedings of the Fifth ACM/SIGDA Physical Design Workshop, pages 100-105, 1996.
[Ake67] S.B. Akers. A modification of Lee's path connection algorithm. IEEE Trans. on Computers, EC-16:97-98, Feb. 1967.
[Alb00] Christoph Albrecht. Provably good global routing by a new approximation algorithm for multicommodity flow. In Proc. International Symposium on Physical Design, pages 19-25, Mar. 2000.
[Alp98] C.J. Alpert. The ISPD98 circuit benchmark suite. In Proc. Int'l Symposium on Physical Design, pages 80-85, 1998.
[BHM00] W.L. Briggs, V.E. Henson, and S.F. McCormick. A Multigrid Tutorial. SIAM, Philadelphia, second edition, 2000.
[BP83] M. Burstein and R. Pelavin. Hierarchical channel router. Proc. of 20th Design Automation Conference, pages 519-597, 1983.
[BR97] V. Betz and J. Rose. VPR: A new packing, placement, and routing tool for FPGA research. In Proc. Intl. Workshop on FPL, pages 213-222, 1997.
[BR02] A. Brandt and D. Ron. Multigrid Solvers and Multilevel Optimization Strategies, chapter 1 of Multilevel Optimization and VLSICAD. Kluwer Academic Publishers, Boston, 2002.
[BR03] U. Brenner and A. Rohe. An effective congestion-driven placement framework. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 22(4):387-394, April 2003.
[Bra77] A. Brandt. Multi-level adaptive solutions to boundary value problems. Mathematics of Computation, 31(138):333-390, 1977.
[Bra86] A. Brandt. Algebraic multigrid theory: The symmetric case. Appl. Math. Comp., 19:23-56, 1986.
[Bra01] A. Brandt. Multiscale scientific computation: Review 2001. In T. Barth, R. Haimes, and T. Chan, editors, Multiscale and Multiresolution Methods. Springer Verlag, 2001.
[Bre77] M.A. Breuer. Min-cut placement. J. Design Automation and Fault Tolerant Comp., 1(4):343-362, Oct 1977.
[Cad99] Cadence Design Systems Inc. Envisia ultra placer reference. http://www.cadence.com, QPlace version 5.1.55, compiled on 10/25/1999.
[CAM00] A.E. Caldwell, A.B. Kahng, and I.L. Markov. Improved algorithms for hypergraph partitioning. In Proc. IEEE/ACM Asia South Pacific Design Automation Conf., 2000.
[CC00] C. Chang and J. Cong. Pseudo pin assignment with crosstalk noise control. In Proc. International Symposium on Physical Design, Apr 2000.
[CCK+03] T.F. Chan, J. Cong, T. Kong, J. Shinnerl, and K. Sze. An enhanced multilevel algorithm for circuit placement. In Proc. IEEE International Conference on Computer Aided Design, San Jose, CA, Nov 2003.
[CCKS00] T.F. Chan, J. Cong, T. Kong, and J. Shinnerl. Multilevel optimization for large-scale circuit placement. In Proc. IEEE International Conference on Computer Aided Design, pages 171-176, San Jose, CA, Nov 2000.
[CCKS03] T.F. Chan, J. Cong, T. Kong, and J. Shinnerl. Multilevel Circuit Placement, chapter 4 of Multilevel Optimization in VLSICAD. Kluwer Academic Publishers, Boston, 2003.
[CCPY02] C.C. Chang, J. Cong, Z. Pan, and X. Yuan. Physical hierarchy generation with routing congestion control. In Proc. ACM International Symposium on Physical Design, pages 36-41, San Diego, CA, Apr 2002.
[CCRX04] C. Chang, J. Cong, M. Romesis, and M. Xie. Optimality and scalability study of existing placement algorithms. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, pages 537-549, 2004.
[CCS05] T.F. Chan, J. Cong, and K. Sze. Multilevel generalized force-directed method for circuit placement. In Proc. Int'l Symp. on Phys. Design, pages 185-192, 2005.
[CCX03a] C.-C. Chang, J. Cong, and M. Xie. Optimality and scalability study of existing placement algorithms. In Proc. Asia South Pacific Design Automation Conference, pages 621-627, 2003.
[CCX03b] C.C. Chang, J. Cong, and M. Xie. Optimality and scalability study of existing placement algorithms. In Asia South Pacific Design Automation Conference, pages 325-330, Kitakyushu, Japan, Jan 2003.
[CFK99] J. Cong, J. Fang, and K.Y. Khoo. An implicit connection graph maze routing algorithm for ECO routing. In Proc. International Conference on Computer Aided Design, pages 163-167, Nov. 1999.
[CFK00] J. Cong, J. Fang, and K.Y. Khoo. DUNE: A multi-layer gridless routing system with wire planning. In Proc. International Symposium on Physical Design, pages 12-18, Apr. 2000.
[CFZ01] J. Cong, J. Fang, and Y. Zhang. Multilevel approach to full-chip gridless routing. Proc. IEEE International Conference on Computer Aided Design, pages 396-403, 2001.
[CHKM96] J. Cong, Lei He, C.-K. Koh, and P. Madden. Performance optimization of VLSI interconnect layout. Integration, the VLSI Journal, 21(1-2):1-94, 1996.
[CKL99] J. Cong, A.B. Kahng, and K.S. Leung. Efficient algorithms for the minimum shortest path Steiner arborescence problem with applications to VLSI physical design. IEEE Trans. on Computer-Aided Design, 17(1):24-39, Jan. 1999.
[CKM00] A.E. Caldwell, A.B. Kahng, and I.L. Markov. Can recursive bisection produce routable placements? In Proc. 37th IEEE/ACM Design Automation Conf., pages 477-482, 2000.
[CL00] J. Cong and S.K. Lim. Edge separability based circuit clustering with application to circuit partitioning. In Asia South Pacific Design Automation Conference, Yokohama, Japan, pages 429-434, 2000.
[CL04] Y. Chang and S. Lin. MR: A new framework for multilevel full-chip routing. IEEE Trans. on Computer Aided Design, 23(5), May 2004.
[CLC96] R.C. Carden, J. Li, and C.K. Cheng. A global router with a theoretical bound on the optimal solution. IEEE Trans. Computer-Aided Design, 15(2):208-216, Feb. 1996.
[CLW99] J. Cong, H. Li, and C. Wu. Simultaneous circuit partitioning/clustering with retiming for performance optimization. Proc. 36th ACM/IEEE Design Automation Conf., pages 460-465, Jun 1999.
[CLW00] J. Cong, S.K. Lim, and C. Wu. Performance-driven multi-level and multiway partitioning with retiming. In Proceedings of Design Automation Conference, pages 274-279, Los Angeles, California, Jun 2000.
[CM98] J. Cong and P. Madden. Performance driven multi-layer general area routing for PCB/MCM designs. In Proc. 35th Design Automation Conference, pages 356-361, Jun 1998.
[CP68] H.R. Charney and D.L. Plato. Efficient partitioning of components. In Proc. of the 5th Annual Design Automation Workshop, pages 16-0 to 16-21, 1968.
[CS93] J. Cong and M. Smith. A parallel bottom-up clustering algorithm with applications to circuit partitioning in VLSI designs. In Proc. Design Automation Conference, pages 755-760, San Jose, CA, 1993.
[CS03] J. Cong and J.R. Shinnerl, editors. Multilevel Optimization in VLSICAD. Kluwer Academic Publishers, Boston, 2003.
[CW02] J. Cong and C. Wu. Global clustering-based performance-driven circuit partitioning. In Proc. Int. Symp. on Physical Design, pages 149-154, 2002.
[CXWS03] B. Choi, H. Xu, M. Wang, and M. Sarrafzadeh. Flow-based cell moving algorithm for desired cell distribution. Proc. IEEE International Conference on Computer Design, pages 218-225, Oct 2003.
[CXZ02] J. Cong, M. Xie, and Y. Zhang. An enhanced multilevel routing system. IEEE International Conference on Computer Aided Design, pages 51-58, 2002.
[CY00] J. Cong and X. Yuan. Routing tree construction under fixed buffer locations. In Proc. 37th Design Automation Conference, pages 379-384, Jun. 2000.
[DD97] Shantanu Dutt and Wenyong Deng. VLSI circuit partitioning by cluster-removal using iterative improvement techniques. In Proc. Int'l Conf. on Computer-Aided Design, pages 194-200, 1997.
[Dij59] E.W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269-271, 1959.
[DiM94] G. DiMicheli. Synthesis and Optimization of Digital Circuits. McGraw Hill, 1994.
[Don88] W.E. Donath. Logic partitioning. Physical Design Automation in VLSI Systems, 1988.
[EJ98] H. Eisenmann and F.M. Johannes. Generic global placement and floorplanning. In Proc. 35th ACM/IEEE Design Automation Conference, pages 269-274, 1998.
[FM82] C.M. Fiduccia and R.M. Mattheyses. A linear-time heuristic for improving network partitions. In Proc. Design Automation Conference, pages 175-181, 1982.
[GK98] N. Garg and J. Konemann. Faster and simpler algorithms for multicommodity flow and other fractional packing problems. In Proc. Annual Symposium on Foundations of Computer Science, pages 300-309, Nov. 1998.
[GMW81] P.E. Gill, W. Murray, and M.H. Wright. Practical Optimization. Academic Press, London and New York, 1981. ISBN 0-12-283952-8.
[Goe03a] R. Goering. FPGA placement performs poorly, study says. EE Times, 2003. http://www.eedesign.com/story/0EG20031113S0048.
[Goe03b] R. Goering. IC placement benchmarks needed, researchers say. EE Times, 2003. http://www.eedesign.com/story/0EG20030410S0029.
[Goe03c] R. Goering. Placement tools criticized for hampering IC designs. EE Times, 2003. http://www.eedesign.com/story/0EG20030205S0014.
[Got81] S. Goto. An efficient algorithm for the two-dimensional placement problem in electrical circuit layout. IEEE Trans. on Circuits and Systems, 28(1):12-18, January 1981.
[Had75] F. Hadlock. Finding a maximum cut of a planar graph in polynomial time. SIAM Journal of Computing, 4(3):221-225, Sep. 1975.
[Hig69] D.W. Hightower. A solution to line routing problems on the continuous plane. In Proc. IEEE 6th Design Automation Workshop, pages 1-24, 1969.
[Hil02] D. Hill. Method and system for high speed detailed placement of cells within an integrated circuit design. US Patent 6370673, Apr 2002.
[HK72] M. Hannan and J.M. Kurtzberg. A review of the placement and quadratic assignment problems. SIAM Review, 14, 1972.
[HL91] J. Heisterman and T. Lengauer. The efficient solution of integer programs for hierarchical global routing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 10(6):748-753, Jun. 1991.
[HL99] S.-W. Hur and J. Lillis. Relaxation and clustering in a local search framework: Application to linear placement. In Proc. ACM/IEEE Design Automation Conference, pages 360-366, New Orleans, LA, Jun 1999.
[HL00] S.-W. Hur and J. Lillis. Mongrel: Hybrid techniques for standard-cell placement. In Proc. IEEE International Conference on Computer Aided Design, pages 165-170, San Jose, CA, Nov 2000.
[HMS03a] B. Hu and M. Marek-Sadowska. Fine granularity clustering for large-scale placement problems. In Proc. Int'l Symp. on Physical Design, Apr. 2003.
[HMS03b] B. Hu and M. Marek-Sadowska. Wire length prediction based clustering and its application in placement. In Proc. Design Automation Conference, Jun. 2003.
[HMS04] B. Hu and M. Marek-Sadowska. Fine granularity clustering based placement. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Apr. 2004.
[HT95] M. Hayashi and S. Tsukiyama. A hybrid hierarchical approach for multi-layer global routing. Proceedings of the 1995 European Conference on Design and Test, pages 492-496, Mar. 1995.
[itr] International Technology Roadmap for Semiconductors. http://public.itrs.net/.
[JCX03] J. Cong, M. Romesis, and M. Xie. Optimality, scalability and stability study of partitioning and placement algorithms. In Proc. International Symposium on Physical Design, 2003.
[KAKS97] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: Application in VLSI domain. In Proc. 34th ACM/IEEE Design Automation Conference, pages 526-529, 1997.
[Kar99] G. Karypis. Multilevel algorithms for multi-constraint hypergraph partitioning. Technical Report 99-034, Department of Computer Science, University of Minnesota, Minneapolis, 1999.
[Kar02] G. Karypis. Multilevel Hypergraph Partitioning, chapter 3 of Multilevel Optimization and VLSICAD. Kluwer Academic Publishers, Boston, 2002.
[KLR+87] R.M. Karp, F.T. Leighton, R.L. Rivest, C.D. Thompson, U.V. Vazirani, and V.V. Vazirani. Global wire routing in two-dimensional arrays. Algorithmica, 2:113-129, 1987.
[KSJA91] J.M. Kleinhans, G. Sigl, F.M. Johannes, and K.J. Antreich. Gordian: VLSI placement by quadratic programming and slicing optimization. IEEE Trans. on Computer-Aided Design, 10:356-365, 1991.
[KW04] A.B. Kahng and Q. Wang. Implementation and extensibility of an analytic placer. In Proc. Int'l Symp. on Physical Design, pages 18-25, 2004.
[LHT90] Y. Lin, Y. Hsu, and F. Tsai. Hybrid routing. IEEE Transactions on Computer-Aided Design, 9(2):151-157, Feb. 1990.
[LLC95] J. Li, J. Lillis, and C. Cheng. Linear decomposition algorithm for VLSI design applications. In Proc. Int'l Conf. on Computer-Aided Design, pages 223-228, 1995.
[LTKS02] J. Lou, S. Thakur, S. Krishnamoorthy, and H. Sheng. Estimating routing congestion using probabilistic analysis. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 21(1):32-41, January 2002.
[ME95] L. McMurchie and C. Ebeling. PathFinder: A negotiation-based performance-driven router for FPGAs. In Proc. of ACM Symposium on Field-Programmable Gate Arrays, pages 111-117, Feb. 1995.
[MT68] K. Mikami and K. Tabuchi. A computer program for optimal routing of printed circuit connectors. IFIPS Proc., H-47:1475-1478, 1968.
[Nai87] R. Nair. A simple yet effective technique for global wiring. IEEE Trans. on Computer-Aided Design, 6(2), 1987.
[NS96] S.G. Nash and A. Sofer. Linear and Nonlinear Programming. McGraw Hill, New York, 1996.
[QB79] N. Quinn and M. Breuer. A force-directed component placement procedure for printed circuit boards. IEEE Trans. on Circuits and Systems, CAS-26:377-388, 1979.
[RDJ94] Bernhard M. Riess, Konrad Doll, and Frank M. Johannes. Partitioning very large circuits using analytical placement techniques. In Proc. Design Automation Conference, pages 646-651, 1994.
[SDJ91] G. Sigl, K. Doll, and F.M. Johannes. Analytical placement: A linear or a quadratic objective function? In Proc. 28th ACM/IEEE Design Automation Conference, pages 427-432, 1991.
[She99] Naveed Sherwani. Algorithms for VLSI Physical Design Automation. Kluwer Academic Publishers, Boston, Dordrecht, London, third edition, 1999.
[Sou78] J. Soukup. Fast maze router. In Proc. 15th Design Automation Conference, pages 100-102, 1978.
[SR99] Y. Sankar and J. Rose. Trading quality for compile time: Ultra-fast placement for FPGAs. In FPGA '99, ACM Symp. on FPGAs, pages 157-166, 1999.
[SS95] W.-J. Sun and C. Sechen. Efficient and effective placement for very large circuits. IEEE Trans. on Computer-Aided Design, pages 349-359, Mar 1995.
[SWY02] M. Sarrafzadeh, M. Wang, and X. Yang. Modern Placement Techniques. Kluwer Academic Publishers, Boston, 2002.
[TOS00] U. Trottenberg, C.W. Oosterlee, and A. Schüller. Multigrid. Academic Press, London, 2000.
[Vyg97] Jens Vygen. Algorithms for large-scale flat placement. In Proc. 34th ACM/IEEE Design Automation Conference, pages 746-751, 1997.
[WK97] Dongsheng Wang and E.S. Kuh. A new timing-driven multilayer MCM/IC routing algorithm. In Proc. IEEE Multi-Chip Module Conference, pages 89-94, Feb. 1997.
[WYS00a] M. Wang, X. Yang, and M. Sarrafzadeh. Dragon2000: Standard-cell placement tool for large industry circuits. In Proc. International Conference on Computer-Aided Design, pages 260-264, 2000.
[WYS00b] M. Wang, X. Yang, and M. Sarrafzadeh. Dragon2000: Standard-cell placement tool for large circuits. Proc. IEEE/ACM International Conference on Computer-Aided Design, pages 260-263, Apr 2000.
[XWCS03] H. Xu, M. Wang, B. Choi, and M. Sarrafzadeh. A trade-off oriented placement tool. Proc. IEEE/ACM International Conference on Computer-Aided Design, pages 467-471, Apr 2003.
[YM01] M.C. Yildiz and P.H. Madden. Improved cut sequences for partitioning-based placement. In Proc. Design Automation Conference, pages 776-779, 2001.
A Distributed Method for Solving Semidefinite Programs Arising from Ad Hoc Wireless Sensor Network Localization

Pratik Biswas¹ and Yinyu Ye²

¹ Electrical Engineering, Stanford University, Stanford, CA 94305, USA. pbiswas@stanford.edu
² Management Science and Engineering and, by courtesy, Electrical Engineering, Stanford University, Stanford, CA 94305, USA. yinyu-ye@stanford.edu

Summary. We describe a distributed or decomposed semidefinite programming (SDP) method for solving Euclidean metric localization problems that arise from ad hoc wireless sensor networks. Using the distributed method, we can solve very large scale semidefinite programs which are intractable for the centralized methods. Our distributed or decomposed SDP scheme also seems to be applicable to solving other Euclidean geometry problems where points are locally connected.
1 Introduction

There has been an increase in the use of semidefinite programming (SDP) for solving a wide range of Euclidean distance geometry problems, such as data compression, metric-space embedding, ball packing, chain folding, etc. One application of the SDP Euclidean distance geometry model lies in ad hoc wireless sensor networks which are constructed for monitoring environmental information (temperature, sound levels, light, etc.) across an entire physical space. Typical networks of this type consist of a large number of densely deployed sensor nodes which gather local data and communicate with other nodes within a small range. The sensor data from these nodes are relevant only if we know what location they refer to. Therefore knowledge of each sensor position becomes imperative. Generally, the use of a GPS system is a very expensive solution to this requirement. Indeed, other techniques to estimate node positions are being developed that rely just on the measurements of distances between neighboring nodes [BY04, BHE00, DGP01, GKW02, HB01, HMS01, NN01, SRL02, SHS01, SHS02, SRZ03]. The distance information could be based on criteria such as time of arrival, angle of arrival and received signal strength. Depending on the accuracy of these measurements and the processor, power and memory constraints at each of the nodes, there is some degree of error in the distance
information. Furthermore, it is assumed that we already know the positions of a few anchor nodes. The problem of finding the positions of all the nodes given a few anchor nodes and partial distance information between the nodes is called the position estimation or localization problem.

In particular, the paper [BY04] describes an SDP relaxation based model for the position estimation problem in sensor networks. The optimization problem is set up so as to minimize the error in the approximate distances between the sensors. Observable traces are developed to measure the quality of the distance data. The basic idea behind the technique is to convert the non-convex quadratic distance constraints into linear constraints by introducing relaxations to remove the quadratic term in the formulation. The performance of this technique is highly satisfactory compared to other techniques. Very few anchor nodes are required to accurately estimate the position of all the unknown sensors in a network. Also the estimation errors are minimal even when the anchor nodes are not suitably placed within the network. More importantly, for each sensor the model generates numerical data to measure the reliability and accuracy of the positions computed from the model, which can be used to detect erroneous or outlier sensors.

Unfortunately, the existing SDP solvers have very poor scalability. They can only handle SDP problems with the dimension and the number of constraints up to a few thousand, whereas in the SDP sensor localization model the number of constraints is of the order O(n²), where n is the number of sensors. The difficulty is that each iteration of interior-point SDP solvers needs to factorize and solve a dense matrix linear system whose dimension is the number of constraints. While we could solve localization problems with 50 sensors in a few seconds, we have tried to use several off-the-shelf codes to solve localization problems with 200 sensors and often these codes quit either due to memory shortage or having reached the maximum computation time.

In this report we describe an iterative distributed SDP computation scheme to overcome this difficulty. We first partition the anchors into many clusters according to their physical positions, and assign some sensors to these clusters if a sensor has a direct connection to one of the anchors. We then solve semidefinite programs independently at each cluster, and fix those sensors' positions which have high accuracy measures according to the SDP computation. These positioned sensors become "ghost anchors" and are used to decide the remaining un-positioned sensors. The distributed scheme then repeats. The distributed scheme is highly scalable and we have solved randomly generated sensor networks of 4,000 sensors in a few minutes with a sequential implementation (that is, the cluster SDP problems are solved sequentially on a single processor), while the solution quality remains as good as that of using the centralized method for solving small networks. We remark that our distributed or decomposed computation scheme should be applicable to solving other Euclidean geometry problems where points are locally connected.
2 The Semidefinite Programming Model

We first present a quadratic programming formulation of the position estimation problem, then introduce its semidefinite programming model. For simplicity, let the sensor points be placed on a plane. Recall that we have $m$ known anchor points $a_k \in \mathbb{R}^2$, $k = 1,\dots,m$, and $n$ unknown sensor points $x_j \in \mathbb{R}^2$, $j = 1,\dots,n$. For every pair of two points, we have a Euclidean distance measure if the two are within a communication distance range $R$. Therefore, say for $i,j \in N_1$, we are given Euclidean distance data $d_{ij}$ between unknown sensors $i$ and $j$, and for $k,j \in N_2$ we know the distance $\bar d_{kj}$ between anchor $k$ and sensor $j$. Note that for the rest of the pairs we have only a lower bound $R$ on their pairwise distances. Therefore, the localization problem can be formulated as an error minimization problem with mixed equalities and inequalities:

minimize   $\sum_{i,j \in N_1,\, i<j} \alpha_{ij} + \sum_{k,j \in N_2} \alpha_{kj}$
subject to $\|x_i - x_j\|^2 = d_{ij}^2 + \alpha_{ij}$, $\forall\, i,j \in N_1,\ i<j$,
           $\|a_k - x_j\|^2 = \bar d_{kj}^2 + \alpha_{kj}$, $\forall\, k,j \in N_2$,
           $\|x_i - x_j\|^2 \ge R^2$, for the rest $i<j$,
           $\|a_k - x_j\|^2 \ge R^2$, for the rest $k,j$,
           $\alpha_{ij} \ge 0$, $\alpha_{kj} \ge 0$.

Let $X = [x_1\ x_2\ \dots\ x_n]$ be the $2 \times n$ matrix that needs to be determined. Then

$\|x_i - x_j\|^2 = e_{ij}^T X^T X e_{ij}$,
$\|a_k - x_j\|^2 = (a_k; e_j)^T [I\ X]^T [I\ X] (a_k; e_j)$,

where $e_{ij}$ is the vector with $1$ at the $i$th position, $-1$ at the $j$th position and zero everywhere else; and $e_j$ is the vector of all zeros except $-1$ at the $j$th position. Let $Y = X^T X$. Then the problem can be rewritten as:

minimize   $\sum_{i,j \in N_1,\, i<j} \alpha_{ij} + \sum_{k,j \in N_2} \alpha_{kj}$
subject to $e_{ij}^T Y e_{ij} = d_{ij}^2 + \alpha_{ij}$, $\forall\, i<j \in N_1$,
           $(a_k; e_j)^T \begin{pmatrix} I & X \\ X^T & Y \end{pmatrix} (a_k; e_j) = \bar d_{kj}^2 + \alpha_{kj}$, $\forall\, k,j \in N_2$,
           $e_{ij}^T Y e_{ij} \ge R^2$, $\forall\, i<j \notin N_1$,
           $(a_k; e_j)^T \begin{pmatrix} I & X \\ X^T & Y \end{pmatrix} (a_k; e_j) \ge R^2$, $\forall\, k,j \notin N_2$,
           $Y = X^T X$,
           $\alpha_{ij} \ge 0$, $\alpha_{kj} \ge 0$.    (1)
Unfortunately, the above problem is not a convex optimization problem. Doherty et al. [DGP01] ignore the non-convex inequality constraints but keep the convex ones, resulting in a convex second-order cone optimization problem. A drawback of their technique is that all position estimations will lie in the convex hull of the known points. Others have essentially used various types of
nonlinear equation and optimization solvers to solve similar quadratic models, where the final solutions are highly dependent on initial solutions and search directions.

Our method is to relax problem (1) to a semidefinite program:

minimize   $\sum_{i,j \in N_1,\, i<j} \alpha_{ij} + \sum_{k,j \in N_2} \alpha_{kj}$
subject to $e_{ij}^T Y e_{ij} = d_{ij}^2 + \alpha_{ij}$, $\forall\, i<j \in N_1$,
           $(a_k; e_j)^T \begin{pmatrix} I & X \\ X^T & Y \end{pmatrix} (a_k; e_j) = \bar d_{kj}^2 + \alpha_{kj}$, $\forall\, k,j \in N_2$,
           $e_{ij}^T Y e_{ij} \ge R^2$, $\forall\, i<j \notin N_1$,
           $(a_k; e_j)^T \begin{pmatrix} I & X \\ X^T & Y \end{pmatrix} (a_k; e_j) \ge R^2$, $\forall\, k,j \notin N_2$,
           $Y \succeq X^T X$,
           $\alpha_{ij} \ge 0$, $\alpha_{kj} \ge 0$.    (2)

The last matrix inequality is equivalent to (Boyd et al. [BEF94])

$Z := \begin{pmatrix} I & X \\ X^T & Y \end{pmatrix} \succeq 0$.

Thus, the problem can be written as a standard SDP problem:

minimize   $\sum_{i,j \in N_1,\, i<j} \alpha_{ij} + \sum_{k,j \in N_2} \alpha_{kj}$
subject to $(1;0;0)^T Z (1;0;0) = 1$,
           $(0;1;0)^T Z (0;1;0) = 1$,
           $(1;1;0)^T Z (1;1;0) = 2$,
           $(0; e_{ij})^T Z (0; e_{ij}) = d_{ij}^2 + \alpha_{ij}$, $\forall\, i<j \in N_1$,
           $(a_k; e_j)^T Z (a_k; e_j) = \bar d_{kj}^2 + \alpha_{kj}$, $\forall\, k,j \in N_2$,
           $(0; e_{ij})^T Z (0; e_{ij}) \ge R^2$, $\forall\, i<j \notin N_1$,
           $(a_k; e_j)^T Z (a_k; e_j) \ge R^2$, $\forall\, k,j \notin N_2$,
           $Z \succeq 0$, $\alpha_{ij} \ge 0$, $\alpha_{kj} \ge 0$.    (3)

The matrix $Z$ has $2n + n(n+1)/2$ unknown variables. If $N_1$ is sufficiently large and all distance measures are perfect, then there is a unique optimal solution $\bar Z$, with zero objective value, for (3). Moreover, in

$\bar Z = \begin{pmatrix} I & \bar X \\ \bar X^T & \bar Y \end{pmatrix}$    (4)

we must have $\bar Y = \bar X^T \bar X$ and $\bar X$ equal to the true positions of the unknown sensors. That is, the SDP relaxation solves the original problem exactly. More precisely, we have the following theorem.

Theorem 1. Let all distance measures $d_{ij}$ and $\bar d_{kj}$ be perfect. Then, the minimal value of (3) is zero. Moreover, let $N_1$ and $N_2$ be the sets of all unknown sensors and anchors, the number of anchors equal 3, and the left-hand matrix of the linear equations of (3) have full rank. Then we must have
$\bar Y = \bar X^T \bar X$ in the optimal solution $\bar Z$ of (3), i.e., $\bar Z$ has rank 2, and $\bar X$ represents the true positions of all unknown sensors.

Proof. Let $X^*$ be the true position matrix of the $n$ unknown points, and

$Z^* = \begin{pmatrix} I & X^* \\ (X^*)^T & (X^*)^T X^* \end{pmatrix}$.

Then $Z^*$ is a feasible solution for (3) with all $\alpha_{ij} = 0$ and $\alpha_{kj} = 0$. But the objective value of (3) is greater than or equal to zero, so the minimal value of (3) must be zero. If $N_1$ and $N_2$ contain all unknown sensors and anchors, that is, we have perfect distance relations between all pairs of points, we have

$|N_1| = n(n-1)/2$  and  $|N_2| = 3n$,

which give $3n + n(n-1)/2$ linear equations in (3). This number equals $2n + n(n+1)/2$, the total number of variables in the matrix $Z$. Furthermore, if the left-hand constraint matrix of these linear equations has full rank, $Z$ is uniquely determined and it must be $Z^*$, so that $\bar X = X^*$ and $\bar Y = (X^*)^T X^* = \bar X^T \bar X$. □

As discussed in [BY04], the condition to have a full rank constraint matrix is that the three anchors are not on a same line. Generally, for imperfect information cases, we have $\bar Y - \bar X^T \bar X \succeq 0$. This inequality constitutes an error analysis of the position estimation. For example,

$\mathrm{Trace}(\bar Y - \bar X^T \bar X) = \sum_{j=1}^{n} (\bar y_{jj} - \|\bar x_j\|^2)$,

the total trace of the difference matrix, measures the efficiency and quality of the distance data $d_{ij}$ and $\bar d_{kj}$. In particular, the individual trace

$\bar y_{jj} - \|\bar x_j\|^2$    (5)

helps us to evaluate the position estimation $\bar x_j$ for sensor $j$. The smaller the trace, the higher the accuracy of the estimation.
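For concreteness, the relaxation (2)-(3) can be prototyped with a generic modeling layer. The sketch below uses CVXPY, which is an assumption on my part (the authors' experiments use SeDuMi and DSDP, described in Section 4); the bounding-away inequalities are omitted for brevity, and the function name and calling convention are mine.

```python
# Minimal sketch of the SDP relaxation (2)-(3) using CVXPY (requires an
# SDP-capable solver such as SCS); not the authors' implementation.
import numpy as np
import cvxpy as cp

def localize_sdp(anchors, n, d_sensor, d_anchor):
    """anchors: 2 x m array; d_sensor: {(i, j): d_ij}; d_anchor: {(k, j): d_kj}."""
    Z = cp.Variable((2 + n, 2 + n), PSD=True)      # Z = [I X; X^T Y]
    cons = [Z[0:2, 0:2] == np.eye(2)]              # fixes the identity block
    resid = []
    for (i, j), d in d_sensor.items():
        # e_ij^T Y e_ij = Y_ii + Y_jj - 2 Y_ij
        expr = Z[2 + i, 2 + i] + Z[2 + j, 2 + j] - 2 * Z[2 + i, 2 + j]
        a = cp.Variable(nonneg=True); resid.append(a)
        cons.append(expr == d**2 + a)
    for (k, j), d in d_anchor.items():
        u = np.zeros(2 + n)
        u[0:2] = anchors[:, k]
        u[2 + j] = -1.0                            # (a_k; e_j)
        a = cp.Variable(nonneg=True); resid.append(a)
        cons.append(u @ Z @ u == d**2 + a)
    cp.Problem(cp.Minimize(sum(resid)), cons).solve()
    X_hat = Z.value[0:2, 2:]                       # estimated sensor positions
    traces = np.diag(Z.value[2:, 2:]) - np.sum(X_hat**2, axis=0)  # individual traces (5)
    return X_hat, traces
```

The returned individual traces are exactly the per-sensor quality measure (5) used below to decide which estimates to trust.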
3 A Distributed SDP Method

A round of the distributed computation method is straightforward and intuitive:

1. Partition the anchors into a number of clusters according to their geographical positions. In our implementation, we partition the entire sensor area into a number of equal-sized squares, and those anchors in the same square form a regional cluster.
Fig. 1. First round position estimations for the 2,000 sensor network, noisy-factor=0, radio-range=.06, and the number of clusters=49.
2. Each (un-positioned) sensor sees if it has a direct connection to an anchor (within the communication range to an anchor). If it does, it becomes an unknown sensor point in the cluster to which the anchor belongs. Note that a sensor may be assigned to multiple clusters and some sensors are not assigned to any cluster.
3. For each cluster of anchors and unknown sensors, formulate the error minimization problem for that cluster, and solve the resulting SDP model if the number of anchors is more than 2. Typically, each cluster has fewer than 100 sensors and the model can be solved efficiently.
4. After solving each SDP model, check the individual trace (5) for each unknown sensor in the model. If it is below a predetermined small tolerance, label the sensor as positioned and its estimation $\bar x_j$ becomes an "anchor". If a sensor is assigned to multiple clusters, we choose the $\bar x_j$ that has the smallest individual trace. This is done so as to choose the best estimation of the particular sensor from the estimations provided by solving the different clusters.
5. Consider positioned sensors as anchors and return to Step 1 to start the next round of estimation.
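A minimal sketch of one such round follows. It assumes a per-cluster solver cluster_sdp (for example, the localize_sdp sketch above) that returns position estimates and individual traces keyed by sensor id; the grid size, tolerance, and the layout of the distance dictionary are my assumptions, not the authors' code.

```python
# Sketch of Steps 1-5 of one round of the distributed scheme.
import numpy as np

def one_round(anchor_pos, anchor_ids, sensor_ids, dist, R,
              grid=7, tol=1e-4, cluster_sdp=None):
    # Step 1: partition anchors into grid x grid equal-sized squares.
    cells = {}
    for a in anchor_ids:
        cell = tuple(np.floor((anchor_pos[a] + 0.5) * grid).astype(int))
        cells.setdefault(cell, []).append(a)
    newly_positioned = {}
    for cell, cell_anchors in cells.items():
        # Step 2: a sensor joins this cluster if it hears one of its anchors.
        members = [s for s in sensor_ids
                   if any((a, s) in dist for a in cell_anchors)]
        if len(cell_anchors) < 3 or not members:
            continue                      # need more than 2 anchors to solve
        # Step 3: solve the cluster's SDP model.
        est, trace = cluster_sdp(cell_anchors, members, dist, R)
        # Step 4: keep estimates whose individual trace (5) is small enough,
        # breaking ties across clusters by the smallest trace.
        for s in members:
            if trace[s] < tol and (s not in newly_positioned
                                   or trace[s] < newly_positioned[s][1]):
                newly_positioned[s] = (est[s], trace[s])
    # Step 5: positioned sensors become "ghost anchors" for the next round.
    return {s: xy for s, (xy, _) in newly_positioned.items()}
```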
Fig. 2. Second round position estimations for the 2,000 sensor network, noisy-factor=0, radio-range=.06, and the number of clusters=49.
Note that the solution of the SDP problem in each cluster can be carried out at the cluster level, so that the computation is highly distributive. The only information that needs to be passed among the neighboring clusters is which of the unknown sensors become positioned after a round of SDP solutions.

In solving the SDP model for each cluster, even if the number of sensors is below 100, the total number of constraints could be in the range of thousands. However, many of those "bounding away" constraints, i.e., the constraints between two remote points, are inactive or redundant at the optimal solution. Therefore, we adopt an iterative active constraint generation method. First, we solve the problem including only partial equality constraints and completely ignoring the bounding-away inequality constraints to obtain a solution. Second, we verify the equality and inequality constraints, add those violated at the current solution into the model, and then resolve it with a "warm-start" solution. We can repeat this process until all of the constraints are satisfied. Typically, only about $O(n+m)$ constraints are active at the final solution, so that the total number of constraints in the model can be controlled at $O(n+m)$.
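The constraint-generation loop just described can be sketched as follows; solve_sdp and violated are placeholders for a cluster SDP solve and a constraint-feasibility check, not the authors' code, and the initial constraint subset is an assumption.

```python
# Sketch of the iterative active-constraint-generation loop.
def solve_with_constraint_generation(equalities, inequalities, solve_sdp,
                                     violated, max_rounds=20):
    # Start from part of the equality constraints and no "bounding away"
    # inequalities.
    active = list(equalities[: len(equalities) // 2])
    solution = solve_sdp(active, warm_start=None)
    for _ in range(max_rounds):
        # Add every equality or inequality violated at the current solution.
        new_cons = [c for c in equalities + inequalities
                    if c not in active and violated(c, solution)]
        if not new_cons:
            break                                  # all constraints satisfied
        active.extend(new_cons)
        solution = solve_sdp(active, warm_start=solution)  # warm-started re-solve
    return solution
```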
Fig. 3. Third and final round position estimations for the 2,000 sensor network, noisy-factor=0, radio-range=.06, and the number of clusters=49.
Two SDP codes are used in our method: the primal-dual homogeneous and self-dual interior-point algorithm SeDuMi of [Stu01] and the dual interior-point algorithm DSDP2.0 of [BYZ03]. Typically, DSDP2.0 is 2 to 3 times faster than SeDuMi, but SeDuMi is often more accurate and robust. DSDP is faster due to the fact that the sparse data structure of the problem is more suitable for DSDP.

4 Computational Results

Simulations were performed on networks of 2,000 to 4,000 sensor points which are randomly generated in a square region of $[-.5\ .5] \times [-.5\ .5]$ using rand(2,n) − .5 in MATLAB. The distance between two points is calculated as follows. If the distance is less than a given radiorange between $[0,1]$, a random error is added to it,

$\hat d_{ij} = d_{ij} \cdot (1 + \mathrm{randn}(1) \cdot \mathrm{noisyfactor}),$
Fig. 4. First round position estimations for the 2,000 sensor network, noisy-factor=0.05, radio-range=.06, and the number of clusters=49.
where noisyfactor is a given number between $[0,1]$. If the distance is beyond the given radiorange, no distance information is known to the algorithm except that it is greater than radiorange. We generally select the first 10% of the points as anchors; that is, anchors are also uniformly distributed in the same random manner. The tolerance for labeling a sensor as positioned is set at $0.01 \cdot (1 + \mathrm{noisyfactor}) \cdot \mathrm{radiorange}$.

The original true and the estimated sensor positions are plotted. The blue points refer to the positions of the anchors, green points to the true locations of the unknown sensors, and red points to their estimated positions from the computation. The error offset between the true and estimated positions for an individual sensor is depicted by a solid blue line in each figure.

The first simulation is carried out for solving a network localization with 2,000 sensors, where the iterative distributed SDP method terminates in three rounds; see Figures 1, 2 and 3. When a sensor is not positioned, its estimation is typically at the origin. In this simulation, the entire sensor region is partitioned into 7 × 7 equal-sized squares, that is, 49 clusters, and the radio range is set at .06. The total solution time for the three round computation
Fig. 5. Sixth round position estimations for the 2,000 sensor network, noisy-factor=0.05, radio-range=.06, and the number of clusters=49.
on a single Pentium 1.2 GHz and 500 MB PC, excluding the computation of $d_{ij}$, is about two minutes. As can be seen from Figures 1, 2 and 3, it is usually the outlying sensors at the boundary, or the sensors which do not have many anchors within the radio range, that are not estimated in the initial stages of the method. Gradually, as the number of well estimated sensors or 'ghost' anchors grows, more and more of these points are estimated.

The second simulation is for solving the same network of 2,000 sensors and 49 clusters, but the distance $d_{ij}$ is perturbed by a random noise, either plus or minus, that is,

$\hat d_{ij} = d_{ij} \cdot (1 + \mathrm{randn}(1) \cdot 0.05),$

where randn(1) is a standard normal random number. The iterative distributed SDP method terminates in thirteen rounds; see Figures 4, 5 and 6 for rounds 1, 6 and 13. It is expected that the noisy cases will take more iterations, since the number of 'ghost' anchors added at each iteration will be lower due to higher
Fig. 6. Thirteenth and final round position estimations for the 2,000 sensor network, noisy-factor=0.05, radio-range=.06, and the number of clusters=49.
Fig. 7. Diamond: the offset distance between estimated and true positions; Square: the square root of the individual trace (5) for the 2,000 sensor network.
Fig. 8. First round position estimations in the 4,000 sensor network, noisy-factor=0, radio-range=.035, and the number of clusters=100.
errors in estimation. The final rounds mainly refine the position estimations for a small number of sensors, and each round runs in a few seconds. Note that the estimation from the distributed method possesses good quality and there is no propagation of noisy errors. One sensor at the very upper-right corner is unconnected to the network in this noisy data, so it remains un-positioned at the final solution. Its individual trace also indicates this fact in the simulation; see Figure 7 for the correlation between the individual error offset (blue diamond) and the square root of the trace (red square) for a few sensors whose trace is higher than $0.01 \cdot (1 + \mathrm{noisyfactor}) \cdot \mathrm{radiorange}$ after the final round.

The third simulation solves a network localization with 4,000 sensors, where the iterative distributed SDP method terminates in five rounds; see Figures 8, 9, and 10 for rounds 1, 3 and 5. In this simulation, the entire sensor region is partitioned into 10 × 10 equal-sized squares, that is, 100 clusters, and the radio range is set at .035. The total solution time for the five round computation on the single Pentium 1.2 GHz and 500 MB PC, excluding computing $d_{ij}$, is about four minutes.
Fig. 9. Third round position estimations in the 4,000 sensor network, noisy-factor=0, radio-range=.035, and the number of clusters=100.
It is interesting to note that the erroneous points are concentrated within particular regions. This clearly indicates that the clustering approach prevents the propagation of errors to other clusters. Again, see Figure 11 for the correlation between the individual error offset (blue diamond) and the square root of the trace (red square) for a few sensors whose trace is higher than $0.008 \cdot (1 + \mathrm{noisyfactor}) \cdot \mathrm{radiorange}$ after the final round of the third simulation.

5 Work in Progress

The current clustering approach assumes that the anchor nodes are more or less uniformly distributed over the entire space, so that by dividing the entire space into smaller equal-sized square clusters, the number of anchors in each cluster is also more or less the same. However, this may or may not be the case in a real scenario. A better approach would be to create clusters more intelligently based on local connectivity information. Keeping this in mind, we try to find, for each sensor,
Fig. 10. Fifth and final round position estimations in the 4,000 sensor network, noisy-factor=0, radio-range=.035, and the number of clusters=100.
Fig. 11. Diamond: the offset distance between estimated and true positions; Square: the square root of the individual trace (5) for the 4,000 sensor network.
its immediate neighborhood, that is, the points within radio range of it. Such points can be said to be within one hop of each other. Higher degrees of connectivity between different points can also be evaluated by calculating the minimum number of hops between the two points. Using the hop information, we propose to construct clusters which are not necessarily of any particular geometric configuration but are instead defined by their connectivity with neighboring points. Such clusters would yield much more efficient SDP models and faster and more accurate estimations.
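The minimum number of hops between two nodes is simply a shortest-path length in the connectivity graph induced by the radio range; a breadth-first-search sketch (my own illustration of the idea, not the authors' implementation) is:

```python
# Hop counts from a source node over the radio-range connectivity graph.
from collections import deque

def hop_counts(neighbors, source):
    """neighbors: dict node -> iterable of nodes within radio range."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in hops:          # first visit gives the minimum hop count
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops                        # nodes absent from the dict are unreachable
```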
6 Concluding Remarks

The distributed SDP approach solves, with great accuracy and speed, very large estimation problems which would otherwise be extremely time consuming in a centralized approach. Also, due to the smaller independent clusters, the noise or error propagation is quite limited, as opposed to centralized algorithms. In fact, the trace error (5) provides us with a very reliable measure of how accurate an estimation is, and it is used both to discard estimations which may be very inaccurate and to determine good estimations which may be used in future estimations. This distributed algorithm is particularly relevant in the ad hoc network scenario, where so much emphasis is given to decentralized computation schemes.
References

[BYZ03] S. J. Benson, Y. Ye and X. Zhang. DSDP, http://www-unix.mcs.anl.gov/~benson/ or http://www.stanford.edu/~yyye/Col.html, 1998-2003.
[BY98] D. Bertsimas and Y. Ye. Semidefinite relaxations, multivariate normal distributions, and order statistics. Handbook of Combinatorial Optimization (Vol. 3), D.-Z. Du and P.M. Pardalos (Eds.), pp. 1-19, Kluwer Academic Publishers, 1998.
[BY04] P. Biswas and Y. Ye. Semidefinite Programming for Ad Hoc Wireless Sensor Network Localization. Proc. IPSN04 (2004).
[BEF94] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory. SIAM, 1994.
[BHE00] N. Bulusu, J. Heidemann, D. Estrin. GPS-less low cost outdoor localization for very small devices. TR 00-729, Computer Science, University of Southern California, April 2000.
[DGP01] L. Doherty, L. E. Ghaoui, and K. Pister. Convex position estimation in wireless sensor networks. Proc. Infocom 2001, Anchorage, AK, April 2001.
[GKW02] D. Ganesan, B. Krishnamachari, A. Woo, D. Culler, D. Estrin, and S. Wicker. An empirical study of epidemic algorithms in large scale multihop wireless networks. UCLA/CSD-TR-02-0013, Computer Science, UCLA, 2002.
[HB01] J. Hightower and G. Boriello. Location systems for ubiquitous computing. IEEE Computer, 34(8) (2001) 57-66.
[HMS01] A. Howard, M. J. Mataric, and G. S. Sukhatme. Relaxation on a mesh: a formalism for generalized localization. In Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems (IROS 01) (2001) 1055-1060.
[NN01] D. Niculescu and B. Nath. Ad-hoc positioning system. In IEEE GlobeCom, Nov. 2001.
[SRL02] C. Savarese, J. Rabaey, and K. Langendoen. Robust positioning algorithm for distributed ad-hoc wireless sensor networks. In USENIX Technical Annual Conf., Monterey, CA, June 2002.
[SHS01] A. Savvides, C. C. Han, and M. Srivastava. Dynamic fine-grained localization in ad hoc networks of sensors. In ACM/IEEE Int'l Conf. on Mobile Computing and Networking (MOBICOM), July 2001.
[SHS02] A. Savvides, H. Park, and M. Srivastava. The bits and flops of the n-hop multilateration primitive for node localization problems. In 1st ACM Int'l Workshop on Wireless Sensor Networks and Applications (WSNA '02), 112-121, Atlanta, 2002.
[SRZ03] Y. Shang, W. Ruml, Y. Zhang and M. Fromherz. Localization From Mere Connectivity. MobiHoc'03, Annapolis, Maryland, June 2003.
[Stu01] J. F. Sturm. Let SeDuMi seduce you, http://fewcal.kub.nl/sturm/software/sedumi.html, October 2001.
[XY97] G.L. Xue and Y. Ye. An efficient algorithm for minimizing a sum of Euclidean norms with applications. SIAM Journal on Optimization 7 (1997) 1017-1036.
Optimization Algorithms for Sparse Representations and Applications

Pando G. Georgiev¹, Fabian Theis², and Andrzej Cichocki³

¹ Laboratory for Advanced Brain Signal Processing, Brain Science Institute, RIKEN, Wako-shi, Japan. Current address: ECECS Department, University of Cincinnati, ML 0030, Cincinnati, Ohio 45221-0030, USA. pgeorgie@ececs.uc.edu
² Institute of Biophysics, University of Regensburg, D-93040 Regensburg, Germany. fabian@theis.name
³ Laboratory for Advanced Brain Signal Processing, Brain Science Institute, RIKEN, Wako-shi, Japan. cia@bsp.brain.riken.jp

Summary. We consider the following sparse representation problem, which is called Sparse Component Analysis: identify the matrices $S \in \mathbb{R}^{n \times N}$ and $A \in \mathbb{R}^{m \times n}$ ($m < n < N$) uniquely (up to permutation and scaling), knowing only their product $X = AS$, under some conditions, expressed either in terms of $A$ and the sparsity of $S$ (identifiability conditions), or in terms of $X$ (Sparse Component Analysis conditions). A crucial assumption (sparsity condition) is that $S$ is sparse of level $k$ in the sense that each column of $S$ has at most $k$ nonzero elements ($k = 1, 2, \dots, m-1$). We present two types of optimization problems for such identification. The first one is used for identifying the mixing matrix $A$: this is a typical clustering type problem, aimed at finding hyperplanes in $\mathbb{R}^m$ which contain the columns of $X$. We present a general algorithm for this clustering problem and a modification of Bradley-Mangasarian's k-planes clustering algorithm for data allowing reduction of this problem to an orthogonal one. The second type of problems is that of identifying the source matrix $S$. This corresponds to finding a sparse solution of a linear system. We present a source recovery algorithm, which allows us to treat the underdetermined case. Applications include Blind Signal Separation of under-determined linear mixtures of signals in which the sparsity is either given a priori, or obtained with some preprocessing techniques such as wavelets, filtering, etc. We apply our orthogonal m-planes clustering algorithm to fMRI analysis.

Keywords: Sparse Component Analysis, Blind Source Separation, underdetermined mixtures
1 Introduction

One of the fundamental questions in data analysis, signal processing, data mining, neuroscience, etc. is how to represent a large data set X (given in the form of an $(m \times N)$-matrix) in different ways. A simple approach is a linear matrix factorization:

$X = AS$,  $A \in \mathbb{R}^{m \times n}$, $S \in \mathbb{R}^{n \times N}$,    (1)

where the unknown matrices A (dictionary) and S (source signals) have some specific properties, for instance:
1) the rows of S are (discrete) random variables, which are statistically independent as much as possible -- this is the Independent Component Analysis (ICA) problem;
2) S contains as many zeros as possible -- this is the sparse representation or Sparse Component Analysis (SCA) problem;
3) the elements of X, A and S are nonnegative -- this is Nonnegative Matrix Factorization (NMF) (see [LS99]).

There is a large number of papers devoted to ICA problems (see for instance [CA02], [HKO01] and references therein), but mostly for the case $m \ge n$. We refer to [BZ01, LLGS99, TLP03, WS03, ZP01] and references therein for some recent papers on SCA and underdetermined ICA ($m < n$). A related problem is the so-called Blind Source Separation (BSS) problem, in which we know a priori that a representation such as in equation (1) exists and the task is to recover the sources (and the mixing matrix) as accurately as possible. A fundamental property of the complete BSS problem is that such a recovery (under the assumptions in 1) and non-Gaussianity of the sources) is possible up to permutation and scaling of the sources, which makes the BSS problem so attractive.

In this paper we consider SCA and BSS problems in the underdetermined case ($m < n$, i.e. more sources than sensors, which is the more challenging problem), where the additional information compensating the limited number of sensors is the sparseness of the sources. It should be noted that this problem is quite general and fundamental, since the sources need not be sparse in the time domain. It would be sufficient to find a linear transformation (e.g. wavelet packets) in which the sources are sufficiently sparse.

In the sequel, we present new algorithms for solving the BSS problem: a matrix identification algorithm and a source recovery algorithm, under the conditions that the source matrix S has at most $m-1$ nonzero elements in each column and that the identifiability conditions are satisfied (see Theorem 1). We demonstrate the effectiveness of our general matrix identification algorithm and the source recovery algorithm in the underdetermined case for 7 artificially created sparse source signals, such that the source matrix S has at most 2 nonzero elements in each column, mixed with a randomly generated $(3 \times 7)$ matrix. For comparison, we present a recovery using $\ell_1$-norm minimization
[CDS98], [DE03], which gives signals that are far from the original ones. This implies that the conditions which ensure equivalence of $\ell_1$-norm and $\ell_0$-norm minimization ([DE03], Theorem 7) are generally not satisfied for randomly generated matrices. Note that $\ell_1$-norm minimization gives solutions which have at most m non-zeros [CDS98], [DE03]. Another connection with [DE03] is the fact that our algorithm for source recovery works "with probability one", i.e. for almost all data vectors x (in the measure sense) such that the system $x = As$ has a sparse solution with fewer than m nonzero elements, this solution is unique, while in [DE03] the authors proved that for all data vectors x such that the system $x = As$ has a sparse solution with fewer than $\mathrm{Spark}(A)/2$ nonzero elements, this solution is unique. Note that $\mathrm{Spark}(A) \le m + 1$, where $\mathrm{Spark}(A)$ is the smallest number of linearly dependent columns of A.
2 Blind Source Separation

In this section we develop a method for solving the BSS problem if the following assumptions are satisfied:
A1) the mixing matrix $A \in \mathbb{R}^{m \times n}$ has the property that any square $m \times m$ submatrix of it is nonsingular;
A2) each column of the source matrix S has at most $m-1$ nonzero elements;
A3) the sources are sufficiently richly represented in the following sense: for any index set of $n-m+1$ elements $I = \{i_1, \dots, i_{n-m+1}\} \subset \{1,\dots,n\}$ there exist at least m column vectors of the matrix S such that each of them has zero elements in places with indexes in I and each $m-1$ of them are linearly independent.

2.1 Matrix identification

We describe conditions in the sparse BSS problem under which we can identify the mixing matrix uniquely up to permutation and scaling of the columns. We give two types of such conditions. The first corresponds to the least sparse case in which such identification is possible. Further, we consider the sparsest case (for a small number of samples), as in this case the algorithm is much simpler.

2.1.1 General case - full identifiability

Theorem 1 (Identifiability conditions - general case) Assume that in the representation X = AS the matrix A satisfies condition A1), the matrix S satisfies conditions A2) and A3), and only the matrix X is known. Then the mixing matrix A is identifiable uniquely up to permutation and scaling of the columns.
Proof. It is clear that any column $a_j$ of the mixing matrix lies in the intersection of all $\binom{n-1}{m-2}$ hyperplanes generated by those columns of A in which $a_j$ participates. We will show that these hyperplanes can be obtained from the columns of the data X under the conditions of the theorem.

Let $\mathcal{J}$ be the set of all subsets of $\{1,\dots,n\}$ containing $m-1$ elements and let $J \in \mathcal{J}$. Note that $\mathcal{J}$ consists of $\binom{n}{m-1}$ elements. We will show that the hyperplane (denoted by $H_J$) generated by the columns of A with indexes from J can be obtained from some columns of X. By A2) and A3), there exist m indexes $\{t_k\}_{k=1}^{m} \subset \{1,\dots,N\}$ such that any $m-1$ vector columns of $\{S(:,t_k)\}_{k=1}^{m}$ form a basis of the $(m-1)$-dimensional coordinate subspace of $\mathbb{R}^n$ with zero coordinates given by $\{1,\dots,n\} \setminus J$. Because of the mixing model, vectors of the form

$v_k = \sum_{j \in J} S(j,t_k)\, a_j$,  $k = 1,\dots,m$,

belong to the data matrix X. Now, by condition A1) it follows that any $m-1$ of the vectors $\{v_k\}_{k=1}^{m}$ are linearly independent, which implies that they span the same hyperplane $H_J$. By A1) and the above, it follows that we can cluster the columns of X in $\binom{n}{m-1}$ groups $\mathcal{H}_k$, $k = 1,\dots,\binom{n}{m-1}$, uniquely, such that each group $\mathcal{H}_k$ contains at least m elements and they span one hyperplane $H_{J_k}$ for some $J_k \in \mathcal{J}$. Now we cluster the hyperplanes obtained in such a way into the smallest number of groups such that the intersection of all hyperplanes in one group gives a single one-dimensional subspace. It is clear that such a one-dimensional subspace will contain one column of the mixing matrix, the number of these groups is n, and each group consists of $\binom{n-1}{m-2}$ hyperplanes. □

The proof of this theorem gives the idea for the matrix identification algorithm.

Algorithm 2.1 SCA matrix identification algorithm
Data: samples x(1),...,x(N) of X
Result: estimated mixing matrix $\hat A$
Hyperplane identification.
1. Cluster the columns of X in $\binom{n}{m-1}$ groups $\mathcal{H}_k$, $k = 1,\dots,\binom{n}{m-1}$, such that the span of the elements of each group $\mathcal{H}_k$ produces one hyperplane and these hyperplanes are different.
Matrix identification.
2. Cluster the normal vectors to these hyperplanes into the smallest number of groups $G_j$, $j = 1,\dots,n$ (which gives the number of sources n), such that the normal vectors to the hyperplanes in each group $G_j$ lie in a new hyperplane $\hat H_j$.
3. Calculate the normal vectors $\hat a_j$ to each hyperplane $\hat H_j$, $j = 1,\dots,n$.
4. The matrix $\hat A$ with columns $\hat a_j$ is an estimation of the mixing matrix (up to permutation and scaling of the columns).

Remark. The above algorithm works for data for which we know a priori that they lie on hyperplanes (or near to hyperplanes).

2.2 Identification of sources

Theorem 2 (Uniqueness of sparse representation) Let $\mathcal{H}$ be the set of all $x \in \mathbb{R}^m$ such that the linear system $As = x$ has a solution with at least $n-m+k$ zero components. If A fulfills A1), then there exists a subset $\mathcal{H}_0 \subset \mathcal{H}$ with measure zero with respect to $\mathcal{H}$, such that for every $x \in \mathcal{H} \setminus \mathcal{H}_0$ this system has no other solution with this property.

Proof. Obviously $\mathcal{H}$ is the union of all $\binom{n}{m-k} = \frac{n!}{(m-k)!\,(n-m+k)!}$ k-codimensional linear subspaces of $\mathbb{R}^m$ (which are hyperplanes if $k = 1$), produced by taking the linear hull of every subset of the columns of A with $m-k$ elements. Let $\mathcal{H}_0$ be the union of all intersections of any two such subspaces. Then $\mathcal{H}_0$ has measure zero in $\mathcal{H}$ and satisfies the conclusion of the theorem. Indeed, assume that $x \in \mathcal{H} \setminus \mathcal{H}_0$ and $As = A\tilde s = x$, where s and $\tilde s$ have at least $n-m+k$ zeros. Since $x \notin \mathcal{H}_0$, x belongs to only one k-codimensional linear subspace, produced as the linear hull of some $m-k$ columns $a_{i_1},\dots,a_{i_{m-k}}$ of A. It means that the vectors s and $\tilde s$ have $n-m+k$ zeros in places with indexes in $\{1,\dots,n\} \setminus \{i_1,\dots,i_{m-k}\}$. Now from the equation $A(s - \tilde s) = 0$ it follows that the $m-k$ column vectors $a_{i_1},\dots,a_{i_{m-k}}$ of A are linearly dependent, which is a contradiction with A1). □

From Theorem 2 it follows that the sources are identifiable generically, i.e. up to a set with measure zero, if they have level of sparseness greater than or equal to $n-m+1$ (i.e., each column of S has at least $n-m+1$ zero elements) and the mixing matrix is known. Below we present an algorithm based on the observation in Theorem 2. Note that this theorem is used in the sequel only for the case $k = 1$.

Algorithm 2.2 Source Recovery Algorithm
Data: samples x(1),...,x(N) (vector columns) of the data matrix X, and the mixing matrix A
Result: estimated source matrix S
1. Identify the set of k-codimensional subspaces $\mathcal{H}$ produced by taking the linear hull of every subset of the columns of A with $m-k$ elements;
2. Repeat for i = 1 to N:
3. Identify the subspace $H \in \mathcal{H}$ containing $x_i := X(:,i)$, or, in a practical situation with presence of noise, identify the one to which the distance from $x_i$ is minimal and project $x_i$ onto H to obtain $\tilde x_i$;
4. If H is produced by the linear hull of the column vectors $a_{i_1},\dots,a_{i_{m-k}}$, then find coefficients $\lambda_{i_j}$ such that

$\tilde x_i = \sum_{j=1}^{m-k} \lambda_{i_j} a_{i_j}$.

These coefficients are uniquely determined if $\tilde x_i$ does not belong to the set $\mathcal{H}_0$ with measure zero with respect to $\mathcal{H}$ (see Theorem 2);
5. Construct the solution S(:,i): it contains $\lambda_{i_j}$ in place $i_j$ for $j = 1,\dots,m-k$, and the rest of its components are zero.
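A sketch of the source recovery step for $k = 1$ follows: each data column is assigned to the nearest hyperplane spanned by $m-1$ columns of A, and the corresponding coefficients are recovered by least squares. The function name and the brute-force enumeration over all $\binom{n}{m-1}$ subsets are my choices, not part of the paper.

```python
# Source recovery for k = 1 (Algorithm 2.2 specialized to hyperplanes).
import numpy as np
from itertools import combinations

def recover_sources(A, X):
    m, n = A.shape
    S = np.zeros((n, X.shape[1]))
    subsets = list(combinations(range(n), m - 1))
    bases = [A[:, list(idx)] for idx in subsets]
    for i in range(X.shape[1]):
        x = X[:, i]
        best_idx, best_dist, best_coef = None, np.inf, None
        for idx, B in zip(subsets, bases):
            coef, *_ = np.linalg.lstsq(B, x, rcond=None)
            dist = np.linalg.norm(B @ coef - x)   # distance to the subspace span(B)
            if dist < best_dist:
                best_idx, best_dist, best_coef = idx, dist, coef
        S[list(best_idx), i] = best_coef          # m-1 nonzeros, the rest stay zero
    return S
```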
3 Sparse Component Analysis

In this section we develop a method for the complete solution of the SCA problem. Now the conditions are formulated only in terms of the data matrix X.

Theorem 3 (SCA conditions) Assume that $m < n < N$ and the matrix $X \in \mathbb{R}^{m \times N}$ satisfies the following conditions:
(i) the columns of X lie in the union $\mathcal{H}$ of $\binom{n}{m-1}$ different hyperplanes, each column lies in only one such hyperplane, and each hyperplane contains at least m columns of X such that each $m-1$ of them are linearly independent;
(ii) for each $i \in \{1,\dots,n\}$ there exist $p = \binom{n-1}{m-2}$ different hyperplanes $\{H_{i,j}\}_{j=1}^{p}$ in $\mathcal{H}$ such that their intersection $L_i = \bigcap_{j=1}^{p} H_{i,j}$ is a one-dimensional subspace;
(iii) any m different $L_i$ span the whole $\mathbb{R}^m$.
Then the matrix X is representable uniquely (up to permutation and scaling of the columns of A and S) in the form X = AS, where the matrices $A \in \mathbb{R}^{m \times n}$ and $S \in \mathbb{R}^{n \times N}$ satisfy the conditions A1) and A2), A3) respectively.

Proof. Let $L_i$ be spanned by $a_i$ and set $\mathcal{A} = \{a_i\}_{i=1}^{n}$. Condition (iii) implies that any hyperplane from $\mathcal{H}$ contains at most $m-1$ vectors from $\mathcal{A}$. By (i) and (ii) it follows that these vectors are exactly $m-1$; only in this case does the calculation of the number of all hyperplanes by (ii) give the number in (i): $n\binom{n-1}{m-2}/(m-1) = \binom{n}{m-1}$. Let A be a matrix whose column vectors are all vectors from $\mathcal{A}$ (taken in an arbitrary order). Since every column vector x of X lies in only one hyperplane from $\mathcal{H}$, the linear system $As = x$ has a unique solution, which has at least $n-m+1$ zeros (see the proof of Theorem 2). Let $\{x_i\}_{i=1}^{m}$ be m column vectors from X which span one hyperplane from $\mathcal{H}$, such that $m-1$ of them are linearly independent (such vectors exist by (i)). Then we have $As_i = x_i$ for some uniquely determined vectors $s_i$, $i = 1,\dots,m-1$, which are linearly independent and have at least
$n-m+1$ zeros in the same coordinates. In such a way we can write X = AS for some uniquely determined matrix S, which satisfies A2) and A3). □

4 Skeletons of a finite set of points

Let X be a finite set of points represented by the columns $\{x_j\}_{j=1}^{N}$ of the matrix $X \in \mathbb{R}^{m \times N}$. The solution $\{(n_i, b_i)\}_{i=1}^{n}$ of the following minimization problem

minimize $\sum_{j=1}^{N} \min_{1 \le i \le n} |n_i^T x_j - b_i|$  subject to $\|n_i\| = 1$, $b_i \in \mathbb{R}$, $i = 1,\dots,n$,    (2)

defines the $n^{(1)}$-skeleton of X (introduced in [RU03]). It consists of a union of n hyperplanes $H_i = \{x \in \mathbb{R}^m : n_i^T x = b_i\}$, $i = 1,\dots,n$, such that the sum of the minimum distances of every point $x_j$ to them is minimal. Analogously, the solution of the following minimization problem

minimize $\sum_{j=1}^{N} \min_{1 \le i \le n} |n_i^T x_j - b_i|^2$  subject to $\|n_i\| = 1$, $b_i \in \mathbb{R}$, $i = 1,\dots,n$,    (3)

defines the $n^{(2)}$-skeleton of X (introduced in [BM00]). It consists of a union of n hyperplanes $\{H_i\}_{i=1}^{n}$ (defined as above) such that the sum of squared minimum distances of every point $x_j$ to them is minimal.

Assuming that the matrices A and S satisfy conditions A1) and A3), it is clear by Theorem 1 that the representation X = AS is sparse (in the sense that each column of S contains at most $m-1$ non-zero elements, i.e. it satisfies condition A2)) if and only if the two skeletons defined above coincide and the data points (columns of X) lie on them.
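For concreteness, the objectives in (2) and (3) can be evaluated for a candidate set of hyperplanes as follows; this is a small helper of my own (the names are mine), useful for checking whether two candidate skeletons coincide on a given data set.

```python
# Evaluate the n^(1)- and n^(2)-skeleton objectives for given hyperplanes.
import numpy as np

def skeleton_objectives(normals, offsets, X):
    """normals: n x m array of unit normals; offsets: length-n b_i; X: m x N data."""
    # |n_i^T x_j - b_i| for every hyperplane i and data point j
    D = np.abs(normals @ X - offsets[:, None])
    closest = D.min(axis=0)                     # nearest hyperplane per point
    return closest.sum(), (closest ** 2).sum()  # objective (2), objective (3)
```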
5 Orthogonal m-planes clustering algorithm

In this section we propose a modification of the k-plane clustering algorithm of Bradley and Mangasarian [BM00]. The idea is to reduce the problem of finding the m-skeleton of X to an orthogonal problem: requiring that its hyperplanes are orthogonal, i.e. defined by an orthonormal matrix $W \in \mathbb{R}^{m \times n}$. This can be done in the following way, if we assume that the source matrix S after normalization of the rows is semi-orthogonal, i.e. $SS^T = I$ (where X = AS). Let $XX^T = ULU^T$ be the eigenvalue decomposition of the matrix $XX^T$. Assume that the diagonal elements of L are positive. Then, denoting $W = L^{-1/2}U^T A$ and $Y = L^{-1/2}U^T X$, we have

$Y = WS$,  $WW^T = I$,  $YY^T = I$.    (4)
Then the cluster update steps in the Bradley-Mangasarian algorithm [BM00] can be unified in the following optimization problem with orthogonality constraints:

minimize $\sum_{i=1}^{n} w_i^T Y^{(i)} (Y^{(i)})^T w_i$    (5)

under the constraints

$w_i^T w_j = \delta_{ij}$,    (6)

where $Y^{(i)}$ is the matrix whose vector columns are the elements of the i-th cluster.

Algorithm 5.1 Orthogonal m-planes clustering algorithm
Data: samples x(1),...,x(T) of X
Result: estimated orthonormal mixing matrix W in (4)
1. Initialize randomly W = (w_1,...,w_n), an orthonormal matrix.
for i <- 1,...,j_0
  Cluster assignment.
  for t <- 1,...,T
    2. Add x(t) to cluster $Y^{(i)}$, where i is chosen to minimize $|w_i^T x(t)|$ (the distance to the hyperplane given by the i-th column of W).
  end
  3. Exit if the mean distance to the hyperplanes is smaller than some preset value.
  Matrix update.
  for k <- 1,...,n
    4. Define the projection matrix P with rows consisting of an orthonormal basis of the orthogonal complement of w_1,...,w_{k-1}.
    5. Calculate the projected cluster covariance $C := P Y^{(k)} (Y^{(k)})^T P^T$.
    6. Choose the eigenvector $v_k$ of C corresponding to a minimal eigenvalue.
    7. Set $w_k \leftarrow P^T v_k$.
  end
end

The constant $j_0$ is chosen in practice sufficiently large. The finite termination of the algorithm is proved in [BM00], Theorem 3.7.
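A Python sketch of Algorithm 5.1 is given below. The whitening step follows (4); the random orthonormal initialization via QR, the stopping tolerance, and the round limit are my assumptions, and the variable names are mine.

```python
# Sketch of the orthogonal m-planes clustering algorithm (Algorithm 5.1).
import numpy as np

def orthogonal_m_planes(X, n_clusters, rounds=50, tol=1e-6, seed=None):
    rng = np.random.default_rng(seed)
    # Whitening as in (4): XX^T = U L U^T, Y = L^{-1/2} U^T X.
    L, U = np.linalg.eigh(X @ X.T)
    Y = np.diag(1.0 / np.sqrt(L)) @ U.T @ X
    m = Y.shape[0]
    W, _ = np.linalg.qr(rng.standard_normal((m, n_clusters)))   # random orthonormal W
    labels = np.zeros(Y.shape[1], dtype=int)
    for _ in range(rounds):
        # Cluster assignment: sample t goes to the plane minimizing |w_i^T y(t)|.
        dists = np.abs(W.T @ Y)
        labels = np.argmin(dists, axis=0)
        if np.mean(np.min(dists, axis=0)) < tol:
            break
        # Matrix update.
        for k in range(n_clusters):
            if k == 0:
                P = np.eye(m)                     # nothing to project out yet
            else:
                Q, _ = np.linalg.qr(W[:, :k], mode="complete")
                P = Q[:, k:].T                    # complement of w_1,...,w_{k-1}
            Yk = Y[:, labels == k]
            C = P @ Yk @ Yk.T @ P.T               # projected cluster covariance
            _, evecs = np.linalg.eigh(C)          # eigenvalues in ascending order
            W[:, k] = P.T @ evecs[:, 0]           # eigenvector of minimal eigenvalue
    return W, labels
```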
6 Applications

6.1 Computer simulation examples: Underdetermined case

We consider a mixture of 7 artificially created sources (see Fig. 2, left), sparsified randomly generated signals with at least 5 zeros in each column, mixed with a randomly generated mixing matrix of dimension 3 × 7.
Fig. 1. Mixed signals (left) and normalized scatter plot (density) of the mixtures (right), together with the 21 data set hyperplanes, visualized by their intersection with the unit sphere in $\mathbb{R}^3$.

Figure 1 gives the mixed signals together with a normalized scatterplot of the mixtures; the data lies in $21 = \binom{7}{2}$ hyperplanes. Applying the underdetermined matrix recovery algorithm (Algorithm 2.1) to the mixtures recovers the mixing matrix perfectly well, up to permutation and scaling. Applying the source recovery algorithm (Algorithm 2.2), we recover the source signals up to permutation and scaling (see Fig. 2, middle). This figure (right) also shows that the recovery by $\ell_1$-norm minimization (known as the Basis Pursuit method of S. Chen, D. Donoho and M. Saunders [CDS98]) does not perform well, even if the mixing matrix is perfectly known.

6.2 Complete case: orthogonal m-planes clustering algorithm applied to fMRI data

We now analyze the performance of the orthogonal m-planes clustering algorithm when applied to functional magnetic resonance imaging (fMRI) measurements. The typical setup of fMRI experiments is the following: NMR brain imaging techniques are used to record brain activity data over a certain span of time, during which the subject is asked to perform some kind of task (e.g. 5 seconds of activity in the motor cortex followed by 5 seconds of activity in the visual cortex; this iterative procedure is often called a block diagram). The brain recordings show areas of high and of low brain activity (using the BOLD effect). Analysis is performed on the 2d-image slices recorded at the discrete time steps. General linear model (GLM) approaches or ICA-based fMRI analysis then decompose this data set into a certain set of component maps, i.e. sets of (independent) images that are active at certain time steps corresponding to the block diagram.

fMRI data were recorded from six subjects (3 female, 3 male, age 20-37) performing a visual task. In five subjects, five slices with 100 images (TR/TE
Fig. 2. The original source signals are shown in the left column. The middle column gives the recovered source signals; the signal-to-noise ratio between the original sources and the recoveries is very high (above 278 dB after permutation and normalization). Note that only 200 samples are enough for excellent separation. The right column shows the recovered source signals using $\ell_1$-norm minimization and the known mixing matrix. Simple comparison confirms that the recovered signals are far from the original ones; the signal-to-noise ratio is only around 4 dB.
= 3000/60 msec) were acquired with five periods of rest and five photic stimulation periods. Stimulation and rest periods comprised 10 repetitions each, i.e. 30 s. Resolution was 3 x 3 x 4 mm. The slices were oriented parallel to the calcarine fissure. Photic stimulation was performed using an 8 Hz alternating checkerboard stimulus with a central fixation point and a dark background with a central fixation point during the control periods [WLDL02]. The first scans were discarded for remaining saturation effects. Motion artifacts were compensated by automatic image alignment (AIR, [WCM92]).

Blind Signal Separation, mainly based on ICA, is nowadays a quite common tool in fMRI analysis (see for example [MJMB98], [MHS03]). Here, we analyze the fMRI data set using as a separation criterion a spatial decomposition of the fMRI images into sparse component maps. We consider such an approach very reasonable and advantageous when the stimuli are sparse and dependent, so that ICA methods could not give good results. For the available fMRI data, it appears that our SCA method and the ICA method give similar results, which we consider a surprising fact in itself. Here we use our orthogonal m-planes clustering algorithm for matrix identification.
Figure 3 shows the performance of the SCA method; see the figure caption for interpretation. Using only the first 9 principal components, our orthogonal m-planes clustering algorithm could recover the stimulus component as well as detect additional components. It performs equally well as fastICA [HKO01] (Figure 4), which is interesting in itself: apparently the two different criteria, sparseness and independence, lead to similar results in this setting. This can be partially explained by noting that all components, mainly the stimulus component, have high kurtoses, i.e. strongly peaked densities.

Acknowledgements

The authors would like to thank Dr. Dorothee Auer from the Max Planck Institute of Psychiatry in Munich, Germany, for providing the fMRI data, and Oliver Lange from the Department of Clinical Radiology, Ludwig-Maximilian University, Munich, Germany, for data preprocessing and visualization.

7 Conclusion

We defined rigorously the Blind Signal Separation problem (BSS) and the Sparse Component Analysis problem (SCA) of sparse signals and presented sufficient conditions for solving them. The main theoretical contributions are:

- identifiability conditions for BSS, especially condition A3),
- uniqueness of sparse representations up to a set with measure zero,
- SCA conditions for sparse representation.

These theoretical results are supported by algorithmic implementations: the SCA matrix identification algorithm and the Source recovery algorithm. The k-planes clustering algorithm of Bradley and Mangasarian is modified to an orthogonal one, which has superior performance. Our SCA methods are illustrated with examples: computer-simulated ones for the underdetermined case, and fMRI data analysis by our orthogonal clustering algorithm.
Fig. 3. fMRI analysis by our orthogonal m-planes clustering algorithm. The data was reduced to the first 9 principal components. (a) shows the recovered component maps (white points indicate values stronger than 3 standard deviations), and (b) their time courses. The stimulus component is given in component 6 (indicated by the high cross-correlation cc = 0.89 with the stimulus time course, delayed by roughly 2 seconds due to the BOLD effect), which is strongly active in the visual cortex as expected.

Fig. 4. FastICA result during fMRI analysis of the same data set as in Figure 3. The stimulus component is given in component 4 with high stimulus cross-correlation cc = 0.87.

References

[BZ01] P. Bofill and M. Zibulevsky, "Underdetermined Blind Source Separation using Sparse Representation", Signal Processing, vol. 81, no. 11, pp. 2353-2362, 2001.
[BM00] P.S. Bradley and O. L. Mangasarian, "k-Plane Clustering", J. Global Optim., 16 (2000), no. 1, 23-32.
[CA02] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing. John Wiley, Chichester, 2002.
[CDS98] S. Chen, D. Donoho and M. Saunders, "Atomic decomposition by basis pursuit", SIAM J. Sci. Comput., Vol. 20, no. 1, pp. 33-61, 1998.
[DE03] D. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization", Proc. Nat. Acad. Sci., vol. 100, no. 5, pp. 2197-2202, 2003.
[DS03] D. Donoho and V. Stodden, "When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?", Neural Information Processing Systems (NIPS) 2003 Conference, http://books.nips.cc
[GC04] P. G. Georgiev and A. Cichocki, "Sparse component analysis of overcomplete mixtures by improved basis pursuit method", in Proc. 2004 IEEE International Symposium on Circuits and Systems (ISCAS 2004), Vancouver, Canada, May 23-26.
[GTC04] P. G. Georgiev, F. Theis and A. Cichocki, "Blind Source Separation and Sparse Component Analysis of overcomplete mixtures", in Proc. of International Conference on Acoustics and Statistical Signal Processing (ICASSP 2004), Montreal, Canada, May 17-21, 2004.
[GCB04] Georgiev P., Cichocki A., Bakardjian H., "Optimization Techniques for Independent Component Analysis with Applications to EEG Data", in: Pardalos et al., editors, Quantitative Neuroscience: Models, Algorithms, Diagnostics, and Therapeutic Applications, Kluwer Academic Publishers, 2004, pp. 53-68.
[GB01] Gorodnitsky I., Belouchrani A., "Joint cumulant and correlation based signal separation with application to EEG data analysis", in Proc. 3rd Int. Conf. on Independent Component Analysis and Signal Separation, San Diego, California, Dec. 9-13, 2001, pp. 475-480.
[HKO01] A. Hyvarinen, J. Karhunen and E. Oja, Independent Component Analysis, John Wiley & Sons, 2001.
[LLGS99] T.-W. Lee, M.S. Lewicki, M. Girolami, T.J. Sejnowski, "Blind source separation of more sources than mixtures using overcomplete representations", IEEE Signal Process. Lett., Vol. 6, no. 4, pp. 87-90, 1999.
[LS99] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization", Nature, Vol. 401, pp. 788-791, 1999.
[MJMB98] McKeown, M., Jung, T., Makeig, S., Brown, G., Kindermann, S., Bell, A., Sejnowski, T., "Analysis of fMRI data by blind separation into independent spatial components", Human Brain Mapping 6, 1998, pp. 160-188.
[MHS03] M. McKeown, L. Hansen and T. Sejnowski, "Independent component analysis of functional MRI: what is signal and what is noise?", Current Opinion in Neurobiology, 13, 2003, pp. 620-629.
[RU03] A. M. Rubinov and J. Ugon, "Skeletons of finite sets of points", Research working paper 03/06, 2003, School of Information Technology & Mathematical Sciences, Univ. of Ballarat.
[TLP03] F.J. Theis, E.W. Lang, and C.G. Puntonet, "A geometric algorithm for overcomplete linear ICA", Neurocomputing, in print, 2003.
[WS03] K. Waheed, F. Salem, "Algebraic Overcomplete Independent Component Analysis", in Proc. Int. Conf. ICA2003, Nara, Japan, pp. 1077-1082.
[WLDL02] A. Wismüller, O. Lange, D. Dersch, G. Leinsinger, K. Hahn, B. Pütz and D. Auer, "Cluster Analysis of Biomedical Image Time-Series", International Journal on Computer Vision, Vol. 46, 2, 2002, pp. 102-128.
[WCM92] R. Woods, S. Cherry and J. Mazziotta, "Rapid automated algorithm for aligning and reslicing PET images", Journal of Computer Assisted Tomography, Vol. 16, 8, 1992, pp. 620-633.
[ZP01] M. Zibulevsky and B. A. Pearlmutter, "Blind source separation by sparse decomposition in a signal dictionary", Neural Comput., Vol. 13, no. 4, pp. 863-882, 2001.
A Unified Framework for Modeling and Solving Combinatorial Optimization Problems: A Tutorial*

Gary A. Kochenberger¹ and Fred Glover²

¹ School of Business, University of Colorado at Denver, Denver, Colorado 80217, USA. Gary.Kochenberger@cudenver.edu
² School of Business, University of Colorado at Boulder, Boulder, Colorado 80304, USA.
[email protected]
Summary. In recent years the unconstrained quadratic binary program (UQP) has emerged as a unified framework for modeling and solving a wide variety of combinatorial optimization problems. This tutorial gives an introduction to this evolving area. The methodology is illustrated by several examples and substantial computational experience demonstrating the viability and robustness of the approach.
1 Introduction

The unconstrained quadratic binary program (UQP) has a lengthy history as an interesting and challenging combinatorial problem. Simple in its appearance, the model is given by

UQP: opt xQx,

where x is an n-vector of binary variables and Q is an n-by-n symmetric matrix of constants. Published accounts of this model go back at least as far as the sixties (see for instance Hammer and Rudeanu [HR68]), with applications reported in such diverse areas as spin glasses [DDJMRR95, GJR88], machine scheduling [AKA94], the prediction of epileptic seizures [ISSP00], solving satisfiability problems [BH02, BP89, HR68, HJ90], and determining maximum cliques [BH02, PR92, PX94]. The application potential of UQP is much greater than might be imagined, due to the re-formulation possibilities afforded by the use of quadratic infeasibility penalties as an alternative to imposing constraints in an explicit manner. In fact, any linear or quadratic discrete (deterministic) problem with linear constraints in bounded integer variables can in principle be re-cast into the form of UQP via the use of

* Earlier versions of this material appear in references [KGAR04a, KGAR04b]
such penalties. This process of re-formulating a given combinatorial problem into an instance of UQP is easy to carry out, enabling UQP to serve as a common model form for a widely diverse set of combinatorial models. This common modeling framework, coupled with recently reported advances in solution methods for UQP, helps to make the model a viable alternative to more traditional combinatorial optimization models, as illustrated in the sections that follow.

1.1 Re-casting Into the Unified Framework

For certain types of constraints, equivalent quadratic penalty representations are known in advance, making it easy to embody the constraints within the UQP objective function. For instance, let $x_i$ and $x_j$ be binary variables and consider the constraint³

$x_i + x_j \le 1$    (1)

which precludes setting both variables to one simultaneously. A quadratic infeasibility penalty that imposes the same condition on $x_i$ and $x_j$ is

$P x_i x_j$,

where P is a large positive scalar. This penalty function evidently is positive when both variables are set to one (i.e., when (1) is violated), and otherwise the function is equal to zero. For a minimization problem, then, adding the penalty function to the objective function is an alternative equivalent to imposing the constraint of (1) in the traditional manner. In the context of our transformations involving UQP, we say that a penalty function is a valid infeasible penalty (VIP) if it is zero for feasible solutions and otherwise positive. Including quadratic VIPs in the objective function for each constraint in the original model yields a transformed model in the form of UQP. VIPs for several commonly encountered constraints are given below (where x and y are binary variables and P is a large positive scalar):

Classical Constraint    Equivalent Penalty (VIP)
x + y <= 1              P(xy)
x + y >= 1              P(1 - x - y + xy)
x + y  = 1              P(1 - x - y + 2xy)
x <= y                  P(x - xy)
³ The degree-2 constraints of this form commonly appear in optimization problems pertaining to graphs, as described in [BH02, PR92, PX94]. As we'll see later in this paper, however, their application extends far beyond classical graph problems.
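As a quick sanity check of the table above, one can enumerate all binary assignments and verify that each penalty vanishes exactly on the feasible points. The small script below is my own illustration, not part of the tutorial.

```python
# Brute-force verification of the VIPs on binary x, y.
from itertools import product

P = 10  # any positive scalar
cases = {
    "x + y <= 1": (lambda x, y: x + y <= 1, lambda x, y: P * x * y),
    "x + y >= 1": (lambda x, y: x + y >= 1, lambda x, y: P * (1 - x - y + x * y)),
    "x + y == 1": (lambda x, y: x + y == 1, lambda x, y: P * (1 - x - y + 2 * x * y)),
    "x <= y":     (lambda x, y: x <= y,     lambda x, y: P * (x - x * y)),
}
for name, (feasible, penalty) in cases.items():
    for x, y in product((0, 1), repeat=2):
        # the penalty must be zero exactly when the constraint is satisfied
        assert (penalty(x, y) == 0) == feasible(x, y), (name, x, y)
print("all VIPs verified on binary inputs")
```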
The penalty term in each case is zero if the associated constraint is satisfied, and otherwise is positive. These penalties, then, can be directly employed as an alternative to explicitly introducing the original constraints. For other, more general constraints, however, VIPs are not known in advance and need to be "discovered." A simple procedure for finding an appropriate VIP for any linear constraint is given in section 1.3. Before moving on to this more general case, however, we give a complete illustration of the re-casting process by considering the set packing problem.

1.2 Set Packing

Set packing problems (SPP) are important in the field of combinatorial optimization due to their application potential and their computational challenge. The standard formulation for SPP is:

SPP:  max $\sum_{j=1}^{n} w_j x_j$
      s.t. $\sum_{j=1}^{n} a_{ij} x_j \le 1$  for $i = 1,\dots,m$,
      x binary,

where the $a_{ij}$ are 0/1 coefficients and the $w_j$ are positive weights. The number of constraints m is determined by the application, and generally may be very large. Many important applications of SPP have been reported in the literature, and an extensive survey of set packing and related models may be found in Vemuganti [Vem98]. The recent paper by Delorme, Gandibleux, and Rodriguez [DGR04] reports applications in railway infrastructure design, ship scheduling, resource constrained project scheduling, and the ground holding problem. Applications in combinatorial auctions and forestry are reported by Pekec and Rothkopf [PR03] and Ronnqvist [Ron03], respectively. Other applications, particularly as part of larger models, are found throughout the literature.

Since SPP is known to be NP-hard, exact methods generally cannot be relied upon to generate good solutions in a timely manner. In particular, the linear programming relaxation does not provide good bounds for these difficult problems. Nonetheless, considerable work has been devoted to improving exact methods for SPP, with innovations in branch & cut methods based on polyhedral facets as described in Padberg [Pad73] and the extensive work of Cornuejols [Cor95]. Despite these advances, however, SPP remains resistant to exact methods and, in general, it is necessary to employ heuristic methods to obtain solutions of reasonably decent quality within a reasonable amount of time. This is particularly true for problem instances with a large number of variables that are neither loosely nor tightly constrained.
Recasting SPP into the form of xQx: The structure of the constraints in SPP enables quadratic VIPs to be easily constructed for each constraint simply by summing all products of constraint variables taken two at a time. To illustrate, consider the constraint

x_1 + x_2 + x_3 ≤ 1.

Such a constraint can be replaced by the quadratic penalty function P(x_1 x_2 + x_1 x_3 + x_2 x_3), where P is a positive scalar. Clearly this quadratic penalty function is zero for feasible solutions and positive otherwise. Similarly, the general packing (or GUB) constraint

Σ_{j=1}^{n} x_j ≤ 1

can be replaced by the penalty function

P Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} x_i x_j.
By subtracting such penalty functions from the objective function of a maximization problem, we have a model in the general, unified form of xQx. Note that this reformulation is accomplished without introducing new variables. This procedure is illustrated by the following two examples:

Example 1: Find binary variables that solve:

SPP:  max  x_1 + x_2 + x_3 + x_4
      s.t. x_1 + x_3 + x_4 ≤ 1
           x_1 + x_2 ≤ 1
           x binary

Representing the scalar penalty P by 2M, the equivalent unconstrained problem is:

max x_1 + x_2 + x_3 + x_4 − 2M x_1 x_3 − 2M x_1 x_4 − 2M x_3 x_4 − 2M x_1 x_2,

which can be re-written as
max (x_1 x_2 x_3 x_4)  [  1  -M  -M  -M
                         -M   1   0   0
                         -M   0   1  -M
                         -M   0  -M   1 ]  (x_1 x_2 x_3 x_4)^T    =>    max xQx
where Q, as shown above, is a square, symmetric matrix. All the problem characteristics of SPP are embedded in the Q matrix.

Example 2 (Schrage [Sch97]):

max  Σ_{j=1}^{22} x_j
s.t. x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 ≤ 1
     ⋮
     x binary
The Q matrix for the equivalent transformed model (max xQx), with M arbitrarily chosen to be 8, is a 22 × 22 symmetric matrix with 1 in every diagonal position and −8 in position (i, j) whenever variables x_i and x_j appear together in at least one of the packing constraints; all remaining entries are 0. Solving(*) this instance of xQx gives an optimal solution with an objective function value of 4, in which four of the variables equal 1 and all other variables equal 0. We conclude this section by summarizing some of the key points about the procedure illustrated above:

1. In the manner illustrated, any SPP problem can be re-cast into an equivalent instance of UQP.

(*) All instances of UQP solved in this tutorial were solved using the tabu search method described in [GKAA99, GKA98].
2. This reformulation is accomplished without the introduction of new variables.
3. It is always possible to choose the scalar penalty sufficiently large so that the solution to xQx is feasible for SPP. At optimality the two problems are equivalent in the sense that they have the same set of optimal solutions.
4. For "weighted" instances of SPP, the weights, w_j, show up on the main diagonal of Q.

We subsequently describe the outcome of using this and other types of problem reformulations as a means for solving a variety of optimization models.

1.3 Accommodating General Linear Constraints

The preceding section illustrated how to re-cast a constrained problem into the form of UQP when the VIPs were known in advance. In this section we indicate how to proceed in the more general case when VIPs are not known in advance. We take as our starting point the general constrained problem

min x_0 = xQx
s.t. Ax = b,  x binary                                            (3)

This model accommodates both quadratic and linear objective functions since the linear case results when Q is a diagonal matrix (observing that x_j^2 = x_j when x_j is a 0-1 variable). Problems with inequality constraints can also be put into this form by representing their bounded slack variables by a binary expansion. These constrained quadratic optimization models are converted into equivalent UQP models by adding a quadratic infeasibility penalty function to the objective function in place of explicitly imposing the constraints Ax = b. Specifically, for a positive scalar P, we have

x_0 = xQx + P (Ax − b)^t (Ax − b) = xQx + xDx + c = xQx + c        (4)

where the matrix D and the additive constant c result directly from the matrix multiplication indicated. Dropping the additive constant, the equivalent unconstrained version of our constrained problem becomes

UQP(PEN): min xQx,  x binary                                       (5)
From a theoretical standpoint, a suitable value of the penalty scalar P can always be chosen so that the optimal solution to UQP(PEN) is the optimal solution to the original constrained problem. Remarkably, as we later demonstrate, it is often easy to find such a suitable value in practice as well. We refer to the preceding general transformation that takes us from (3) through (4) to (5) as transformation # 1. This approach along with related material can be found in [BH02, Han79, HJM93]. This is the general procedure
that could in principle be employed to transform any problem in the form of (3) into an equivalent instance of UQP. As indicated earlier in section 1.1, VIPs are known in advance for certain simple constraints, and when such constraints are encountered it is usually preferred to use the known VIP directly rather than applying transformation # 1. One special constraint in particular,

x_j + x_k ≤ 1,

appears in many important applications and, as indicated in section 1.1, can be handled by a VIP of the form P x_j x_k. Due to the importance of this constraint and its frequency of occurrence in applications, we refer to this special case as transformation # 2. The use of these two transformations is illustrated in the next section by considering two classical problems in combinatorial optimization.
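To make transformation # 1 concrete, the following Python sketch (not part of the original text; numpy is assumed) builds the penalized matrix and additive constant of (4) from Q, A, b and P, folding the linear terms of the penalty onto the diagonal using x_j^2 = x_j.

```python
import numpy as np

def transformation_1(Q, A, b, P):
    """Return (Q_pen, c) such that, for binary x,
    x'Qx + P*(Ax - b)'(Ax - b) == x'Q_pen x + c  (cf. equation (4))."""
    Q_pen = Q + P * (A.T @ A)               # quadratic part of the penalty
    Q_pen -= 2.0 * P * np.diag(A.T @ b)     # linear part folded onto the diagonal
    c = P * float(b @ b)                    # additive constant, dropped in UQP(PEN)
    return Q_pen, c
```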
2 Further Illustrative Examples

Before highlighting some of the problem classes we have successfully solved using the foregoing transformation approaches, we give two small examples from classical NP-hard problem settings to provide additional concrete illustrations.

Example 1: Set Partitioning. The classical set partitioning problem is found in applications that range from vehicle routing to crew scheduling [Jos02, MBRB99]. As an illustration, consider the following small example:

min x_0 = 3x_1 + 2x_2 + x_3 + x_4 + 3x_5 + 2x_6
subject to
    x_1 + x_3 + x_6 = 1
    x_2 + x_3 + x_5 + x_6 = 1
    x_3 + x_4 + x_5 = 1
    x_1 + x_2 + x_4 + x_6 = 1

and x binary. Applying Transformation 1 with P = 10 gives the equivalent UQP model:

UQP(PEN): min xQx,  x binary

where the additive constant, c, is 40 and

Q = [ -17   10   10   10    0   20
       10  -18   10   10   10   20
       10   10  -29   10   20   20
       10   10   10  -19   10   10
        0   10   20   10  -17   10
       20   20   20   10   10  -28 ]
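The following small, self-contained check (assuming numpy; not part of the original text) reproduces the Q matrix and the constant c = 40 above by applying transformation # 1 to this example.

```python
import numpy as np

c_obj = np.array([3, 2, 1, 1, 3, 2], dtype=float)   # objective coefficients
A = np.array([[1, 0, 1, 0, 0, 1],                   # x1 + x3 + x6 = 1
              [0, 1, 1, 0, 1, 1],                   # x2 + x3 + x5 + x6 = 1
              [0, 0, 1, 1, 1, 0],                   # x3 + x4 + x5 = 1
              [1, 1, 0, 1, 0, 1]], dtype=float)     # x1 + x2 + x4 + x6 = 1
b = np.ones(4)
P = 10.0

# Transformation #1: fold P*(Ax - b)'(Ax - b) into the quadratic form (x_j^2 = x_j).
Q = np.diag(c_obj) + P * (A.T @ A) - 2.0 * P * np.diag(A.T @ b)
const = P * float(b @ b)

assert const == 40.0
assert Q[0, 0] == -17 and Q[1, 1] == -18 and Q[2, 2] == -29
assert Q[0, 5] == 20 and Q[0, 4] == 0 and Q[2, 4] == 20
```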
Solving UQP(PEN) we obtain an optimal solution x_1 = x_5 = 1 (all other variables equal to 0) for which x_0 = 6. In the straightforward application of Transformation 1 to this example, the replacement of the original problem formulation by the UQP(PEN) model did not involve the introduction of new variables. In many applications, Transformation 1 and Transformation 2 can be used in concert to produce an equivalent UQP model, as demonstrated next.

Example 2: The K-Coloring Problem. Vertex coloring problems seek to assign colors to the nodes of a graph in such a way that adjacent nodes receive different colors. The K-coloring problem attempts to find such a coloring using exactly K colors. A wide range of applications, ranging from frequency assignment problems to printed circuit board design problems, can be represented by the K-coloring model. These problems can be modeled as satisfiability problems using assignment variables as follows: Let x_ij be 1 if node i is assigned color j, and 0 otherwise. Since each node must be colored, we have

Σ_{j=1}^{K} x_ij = 1,   i = 1, ..., n                              (6)

where n is the number of nodes in the graph. A feasible coloring, in which adjacent nodes are assigned different colors, is assured by imposing the constraints

x_ip + x_jp ≤ 1,   p = 1, ..., K                                   (7)

for all adjacent nodes (i, j) in the graph. This problem can be re-cast into the form of UQP by using Transformation 1 on the assignment constraints of (6) and Transformation 2 on the adjacency constraints of (7). No new variables are required. Since the model of (6) and (7) has no explicit objective function, any positive value for the penalty P will do. The following example gives a concrete illustration of the re-formulation process. Consider the graph given in Figure 1 and assume we want to find a feasible coloring of the nodes using 3 colors. Our satisfiability problem is that of finding a solution to:

x_i1 + x_i2 + x_i3 = 1,   i = 1, ..., 5                            (8)
x_ip + x_jp ≤ 1,   p = 1, ..., 3                                   (9)

(for all adjacent nodes i and j). In this traditional form, the model has 15 variables and 26 constraints.
Fig. 1. Example of a graph for the K-Coloring Problem.
To recast this problem into the form of UQP, we use Transformation 1 on the equations of (8) and Transformation 2 on the inequalities of (9). Arbitrarily choosing the penalty P to be 4, we get the equivalent problem: UQP(PEN): min xQx, where the Q matrix is:
Q is the 15 × 15 symmetric matrix with −4 in every diagonal position, +4 linking the three color variables of each node (from Transformation 1 on (8)), and +4 linking the same color on each pair of adjacent nodes (from Transformation 2 on (9)); all other entries are 0.
Solving this unconstrained model, xQx, yields the feasible coloring x_11 = x_22 = x_33 = x_41 = x_53 = 1, all other x_ij = 0. This approach to coloring problems has proven to be very effective for a wide variety of coloring instances from the literature. Later in this paper we present some computational results for several standard K-coloring problems. An extensive presentation of the xQx approach to a variety of coloring problems, including a generalization of the K-coloring problem considered here, is given in Kochenberger, Glover, Alidaee and Rego [KGAR02].
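As an illustration of how such a coloring Q matrix can be assembled (a sketch under assumed data, since the graph of Figure 1 is not reproduced here; numpy is assumed), the following builds the UQP matrix for a K-coloring instance from an arbitrary edge list, using Transformation 1 on the assignment constraints and Transformation 2 on the adjacency constraints.

```python
import numpy as np

def coloring_qubo(n_nodes, edges, K, P=4.0):
    """Q for min x'Qx with x[i*K + p] = 1 iff node i gets color p (0-based).
    Transformation #1 enforces 'one color per node'; transformation #2 enforces
    'adjacent nodes may not share a color'."""
    def idx(i, p):
        return i * K + p

    N = n_nodes * K
    Q = np.zeros((N, N))
    for i in range(n_nodes):
        cols = [idx(i, p) for p in range(K)]
        # P*(sum_p x_ip - 1)^2: -P on the diagonal, +P on each ordered off-diagonal pair
        for a in cols:
            Q[a, a] -= P
            for b in cols:
                if a != b:
                    Q[a, b] += P
    for (i, j) in edges:
        for p in range(K):
            # P * x_ip * x_jp, split over the two symmetric positions
            Q[idx(i, p), idx(j, p)] += P / 2
            Q[idx(j, p), idx(i, p)] += P / 2
    return Q

# Example with hypothetical data: a 4-cycle colored with K = 3 colors.
Q = coloring_qubo(4, [(0, 1), (1, 2), (2, 3), (3, 0)], K=3)
```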
3 Solving UQP

Employing the UQP unified framework to solve combinatorial problems requires the availability of a solution method for xQx. The recent literature reports major advances in such methods involving modern metaheuristic methodologies. The reader is referred to references [AHA98, AAK99, Bea99, BS94, BHS89, CS94, GARK02, GKAA99, GKA98, KTN00, Lau70, LAL97, MF99, Pau95, PR90] for a description of some of the more successful methods. The pursuit of further advances in solution methods for xQx remains an active research arena. In the work reported here, we used a basic tabu search method due to Glover, Kochenberger, and Alidaee [GL97, GKAA99, GKA98]. A brief overview of the approach is given below. For complete details the reader is referred to the aforementioned references.

Our TS method for UQP is centered around the use of strategic oscillation, which constitutes one of the primary strategies of tabu search. The method alternates between constructive phases that progressively set variables to 1 (whose steps we call "add moves") and destructive phases that progressively set variables to 0 (whose steps we call "drop moves"). To control the underlying search process, we use a memory structure that is updated at critical events, identified by conditions that generate a subclass of locally optimal solutions. Solutions corresponding to critical events are called critical solutions. A parameter span is used to indicate the amplitude of oscillation about a critical event. We begin with span equal to 1 and gradually increase it to some limiting value. For each value of span, a series of alternating constructive and destructive phases is executed before progressing to the next value. At the limiting point, span is gradually decreased, allowing again for a series of alternating constructive and destructive phases. When span reaches a value of 1, a complete span cycle has been completed and the next cycle is launched. The search process is typically allowed to run for a pre-set number of span cycles. Information stored at critical events is used to influence the search process by penalizing potentially attractive add moves (during a constructive phase) and inducing drop moves (during a destructive phase) associated with assignments of values to variables in recent critical solutions. Cumulative critical event information is used to introduce a subtle long term bias into the search process by means of additional penalties and inducements similar to those discussed above. Other standard elements of tabu search such as short and long term memory structures are also included.
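The following Python sketch (not the authors' code; a minimal illustration of the alternating add/drop structure described above, omitting the critical-event memory and tabu restrictions) shows the basic strategic-oscillation skeleton for maximizing x'Qx over binary x.

```python
import numpy as np

def uqp_oscillation(Q, n_cycles=10, span_max=5, seed=0):
    """Toy strategic-oscillation heuristic for max x'Qx, x binary (Q symmetric)."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    x = np.zeros(n)
    best_x, best_val = x.copy(), 0.0

    def gain(j):
        # change in x'Qx when bit j is flipped: delta^2*Q_jj + 2*delta*(Qx)_j, delta^2 = 1
        delta = 1 - 2 * x[j]
        return Q[j, j] + 2 * delta * np.dot(Q[j], x)

    for _ in range(n_cycles):
        for span in list(range(1, span_max + 1)) + list(range(span_max, 0, -1)):
            for setting_to_one in (True, False):       # constructive, then destructive phase
                for _ in range(span):
                    candidates = np.where(x == (0 if setting_to_one else 1))[0]
                    if candidates.size == 0:
                        break
                    gains = np.array([gain(j) for j in candidates])
                    j = candidates[int(np.argmax(gains))]
                    x[j] = 1 - x[j]                    # add or drop move
                    val = float(x @ Q @ x)
                    if val > best_val:
                        best_x, best_val = x.copy(), val
        # light diversification between span cycles (stand-in for critical-event memory)
        x = (rng.random(n) < 0.5).astype(float)
    return best_x, best_val
```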
4 Applications

To date several important classes of combinatorial problems have been successfully modeled and solved by employing the unified framework. Our results
with the unified framework applied to these problems have been uniformly attractive in terms of both solution quality and computation times. While our solution method is designed for the completely general form of UQP, without any specialization to take advantage of particular types of problems reformulated in this general representation, our outcomes have typically proved competitive with or even superior to those of specialized methods designed for the specific problem structure at hand. Our broad base of experience with UQP as a modeling and solution framework includes a substantial range of problem classes, including: Quadratic Assignment Problems, Capital Budgeting Problems, Multiple Knapsack Problems, Task Allocation Problems (distributed computer systems), Maximum Diversity Problems, P-Median Problems, Asymmetric Assignment Problems, Symmetric Assignment Problems, Side Constrained Assignment Problems, Quadratic Knapsack Problems, Constraint Satisfaction Problems (CSPs), Set Partitioning Problems, Fixed Charge Warehouse Location Problems, Maximum Clique Problems, Maximum Independent Set Problems, Maximum Cut Problems, Graph Coloring Problems, Graph Partitioning Problems, Number Partitioning Problems, and Linear Ordering Problems. Additional test problems representing a variety of other applications (which do not have "well-known" names) have also been reformulated and solved via UQP. In the section below we report specific computational experience with some of the problem classes listed above. Additional applications are discussed by Boros and Hammer [BH91] and Lewis, Alidaee and Kochenberger [LAK04].
5 Illustrative Computational Experience

Sections 1 and 2 of this paper presented small examples intended to illustrate the mechanics of the transformation process. Here, we highlight our computational experience with several well-known problem classes. In each case, we specify the standard problem formulation, comment on the transformation(s)
used to recast the problem into the form of UQP, and summarize our computational experience. It is not our objective here to provide a comprehensive comparison with the best known methods for the problem classes considered below. Rather, our purpose in this section is to provide additional validation of the potential merits of this unified approach. Nonetheless, the results shown below are very attractive and motivate such head-to-head comparisons in future studies.

5.1 Warehouse Location (Single Source, Uncapacitated)

Zero/One formulation:

min  Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij x_ij + Σ_{i=1}^{m} f_i y_i
s.t. Σ_{i=1}^{m} x_ij = 1,   j = 1, ..., n
     x_ij ≤ y_i   (for all i, j)
     x, y binary

Recast as xQx:
• Complement the y variables (to enable the use of transformation # 2)
• Use both transformations
• No new variables required

Computational Experience:
• Total number of Problems Solved: 4

# variables    m    n    # TS cycles   Soln Time (sec)   Soln Optimal?
    55         5   10        20             < 1              Yes
   210        10   20        50             < 5               *
   410        10   40       100            < 30               *
   820        20   40       100           < 120               *
* Optimal solutions not known.

Remarks: Transformation # 1 was used for the assignment constraints and transformation # 2, once the "y" variables were complemented, was used for the variable upper bound constraints. No new variables were required. The problems were randomly generated with c_ij = U(50, 100) and f_i = U(100, 200). Each instance was recast as xQx using a penalty, P, equal to 200. Our tabu search heuristic was allowed to run for a fixed number of oscillation cycles as shown above, with the largest problem taking less than 2 minutes on a Pentium 400 PC. In each case feasible solutions were easily found. Moreover, the solution found for the first problem proved to be optimal. Optimal solutions to the other problems are not known.
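To see why complementing y makes transformation # 2 applicable (an illustrative check, not from the original text): with ȳ = 1 − y, the constraint x ≤ y is equivalent to x + ȳ ≤ 1, so P·x·(1 − y) is a VIP. The snippet below verifies this on all binary pairs.

```python
from itertools import product

P = 200  # penalty value used for the warehouse location runs above

for x, y in product((0, 1), repeat=2):
    y_bar = 1 - y                 # complemented facility variable
    penalty = P * x * y_bar       # transformation #2 applied to x + y_bar <= 1
    feasible = x <= y             # original variable upper bound constraint
    assert (penalty == 0) == feasible
```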
5.2 Constraint Satisfaction Problems (CSPs)

Zero/One formulation:

Ax = b,   a_ij ∈ {−1, 0, 1},   b_i = 1 or 2

Recast as xQx:
• Transformation # 1
• No additional variables

Computational Experience:
• Total number of Problems Solved: 26

# variables   # rows   # problems   Soln Time    Soln Feasible
     20          6         3         < 1 sec         Yes
     50         10         3         < 3 sec         Yes
    100         30        10        < 15 sec         Yes
    500         50         5        1 - 2 min        Yes
   1000        100         5        4 - 5 min        Yes
Remarks: Transformation # 1, with P taken to be 2, was used to develop the equivalent xQx model for each of these problems. No new variables were required. In all, a total of 26 random problem instances were solved by letting our tabu search heuristic run until the objective function was equal to zero, implying a feasible (and in this case, optimal) solution. Feasible solutions were quickly found in each case, with solutions for the largest instances found, on average, in roughly 4-5 minutes on a Pentium 200 PC. The smaller problems took only a few seconds.

5.3 Quadratic Knapsack Problems

Zero/One Formulation:

max  xQx
s.t. Ax ≤ b,  x binary

Recast as xQx:
• Add slack variables
• Use transformation # 1

Computational Experience:
• Total number of Problems Solved: 53
# Variables     # constraints   # problems   Soln times       Optimal Solns?
10, 20, 30            1             24        < 2 sec          23 proven opt
40, 100, 500          1             20        4, 9, 240 sec         *
    20              2, 4             8        < 4 sec           All opt
    50                5              1        < 16 sec              *
* Optimal solutions not known.

Remarks: For this class of problems, a total of 53 random problems were solved. The problem instances ranged in size from 10 to 500 variables and 1 to 5 constraints. The largest of these problems are much larger (in terms of both variable and constraint count) than previously reported in the literature. Instances were constructed with q_ij = U(−25, 25), a_ij = U(1, 10), and b_i chosen to be a fraction of the sum of the a_ij for a given row. Slack variables (in binary expansion form) allowing for a maximum slack activity of 31 were added to each row to produce equality constraints, and transformation # 1 was then used to produce the equivalent xQx representation. The variable counts given in the table above portray the original problems and do not include these slack variables. The value of the penalty P used to achieve an equivalent xQx representation was heuristically incremented as needed in order to achieve feasible solutions for the larger instances solved. For the largest of the problems, n = 500, xQx was first generated with P = 150. Solving this model gave a solution that was infeasible with respect to the original problem. P was then raised to 1500 and a new xQx instance was formed whose solution was feasible. P = 1500 was then used in the transformation of each of the other (smaller) problems and in each case the solutions generated proved to be feasible. Moreover, the solutions obtained for 23 of the 24 problems of size n = 30 or less proved to be optimal. Each problem was allowed to run for 100 TS cycles. The largest of the problems required 4 minutes on a Pentium 200 PC; all others were solved in less than 16 seconds, with the smallest problems taking less than 2 seconds. Many of the smaller problems (n = 30 or less) were solved again with a much smaller value of the penalty. P = 50 was large enough to produce optimal solutions to the n = 10 and n = 20 variable problems and P = 250 was large enough for the n = 30 problems. Computational times, as expected, remained very small, showing no apparent relationship to the penalty value.
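The binary-expansion slack device mentioned above can be written down directly; the sketch below (illustrative, not the authors' code; numpy assumed) converts a single inequality a'x ≤ b into an equality by appending slack bits worth 1, 2, 4, 8, 16 (maximum slack activity 31), after which transformation # 1 applies.

```python
import numpy as np

def add_binary_slack(a, b, max_slack=31):
    """Turn a'x <= b into a_aug'[x, s] == b, where the slack is encoded in binary:
    slack = sum_k 2^k * s_k, with each s_k in {0, 1}."""
    n_bits = int(max_slack).bit_length()         # 31 -> 5 slack bits
    weights = [2 ** k for k in range(n_bits)]    # 1, 2, 4, 8, 16
    a_aug = np.concatenate([np.asarray(a, float), np.array(weights, float)])
    return a_aug, float(b)

# Example with hypothetical data: 3x1 + 5x2 + 4x3 <= 10
a_aug, b = add_binary_slack([3, 5, 4], 10)
# a_aug = [3, 5, 4, 1, 2, 4, 8, 16]; the equality row can now be penalized
# with transformation #1 exactly as in Section 1.3.
```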
5.4 Maximum Diversity

Zero/One Formulation:

max  Σ_{i=1}^{n} Σ_{j=1}^{n} q_ij x_i x_j
s.t. Σ_{i=1}^{n} x_i = m
     x binary
Recast as xQx:
• Transformation # 1
• No new variables

Computational Experience:
• Total number of Problems Solved: 25

# vars (n)            m                  # TS cycles    Soln Time       Solns Opt?
   100      10, 15, 20, 25, 30               20          2 sec (each)       *
   300      30, 45, 60, 75, 90               50         15 sec (each)       *
   500      50, 75, 100, 125, 150           100         58 sec (each)       *
  1000      100, 150, 200, 250, 300         100        194 sec (each)       *
  2000      200, 300, 400, 500, 600         200         16 min (each)       *
* Optimal solutions not known.

Remarks: For this class of problems we solved a total of 25 random instances of sizes ranging from 100 to 2000 variables. For each size, five different values of m were considered, as shown in the table above. For all problems, the q_ij values were chosen from U(−50, 50). Transformation # 1 was used with P = 2n. Our tabu search heuristic was run for a fixed number of cycles, terminating in each case with a feasible solution. Optimal solutions for these problems are not known. However, we have also solved much smaller problems for which optimal solutions are known, and in each such case our approach was successful in finding the optimal solution. All runs were made on a Pentium 200 PC. Prior to this paper, the largest instances reported in the literature were of size n = 100. Our results greatly expand the state of the art and illustrate a solution capability much beyond that reported by others.

5.5 Set Partitioning

Zero/One Formulation:

min  Σ_{j=1}^{n} c_j x_j
s.t. Σ_{j=1}^{n} a_ij x_j = 1,   i = 1, ..., m
     x binary

Recast as xQx:
• Transformation # 1
• No additional variables

Computational Experience:
• Total number of Problems Solved: 18

  n      M    # TS cycles   # problems   Soln time   Soln Feas*
  80    10        20            5          3 sec        Yes
 100    20        50            5          9 sec        Yes
 400    40       100            5        220 sec        Yes
 800    80       100            3          5 min        Yes
* Optimal solutions not known.

Remarks: For this class of problems, we solved a total of 18 random instances of sizes ranging from 80 variables and 10 constraints to 800 variables and 80 constraints. The problems varied in density from .1 to .3 and in each case the c_j values were chosen from U(10, 100). Transformation # 1 was used to convert to xQx with P = max c_j value for each problem. Our tabu search heuristic was run for a fixed number of cycles as shown in the table above. While optimal solutions are not known, feasible solutions were quickly found for each problem, with the largest of the problems being solved in less than 5 minutes on a Pentium 200 PC.

5.6 Vertex Coloring

Zero/One Formulation:

min  Σ_{j=1}^{nc} y_j
s.t. Σ_{j=1}^{nc} x_ij = 1,   i = 1, ..., nv
     x_ik + x_jk ≤ 1   for each edge (i, j) and color k
     x_ik ≤ y_k        for each vertex i and each color k
     x, y binary

where:
nc = maximum number of colors allowed
nv = number of vertices in the graph

Recast as xQx:
• Complement the y variables
• Transformation # 1 for the first set of constraints
• Transformation # 2 for the last two sets
• No additional variables

Computational Experience:
• Total number of Problems Solved: 9

ID             # nodes   # edges   nc   # xQx variables   Soln Time   xQx solution   Opt Solution
Jean_col          80       254     12        972           < 2 min        10             10
David_col         87       406     12       1056           < 2 min        11             11
Huck_col          74       301     14       1050           < 2 min        11             11
Myciel3_col       11        20      8         96           < 1 min         4              4
Myciel4_col       23        71     10        240           < 1 min         5              5
Myciel5_col       47       236     10        480           < 1 min         6              6
Myciel6_col       95       755     10        960           < 2 min         7              7
Queen5_5_col      25       160     10        260           < 1 min         5              5
Queen6_6_col      36       290     10        370           < 2 min         8              7
Remarks: In section 2 we presented a small example of a 3-coloring problem. Here we consider a generalization of the vertex coloring problem where we want to find a coloring with the minimum number of colors rather than finding one with a given number of colors. For this class of problems, we solved 9 standard problems from the literature, which can be found at http://mat.gsia.cmu.edu/COLOR/instances.html. The conversion to xQx was achieved by using transformation # 1 on the first set of constraints and, after complementing the "y" variables, using transformation # 2 on the last two sets of constraints. In each case, the penalty P was taken to be 20. Note that no new variables are required. Optimal solutions were found in 8 of the 9 cases. We were off by one color for problem Queen6_6. The solution time for the largest problem was slightly less than 2 minutes on a Pentium 200 PC.

5.7 Maximum Clique (Max Independent Set)

Given a graph G and its complement graph Ḡ:

G = (V, E),   Ḡ = (V, Ē)

Zero/One Formulation:

max  Σ_{j=1}^{n} x_j
s.t. x_i + x_j ≤ 1   ∀ (i, j) ∈ Ē

Recast as xQx:
• Transformation # 2
• No additional variables

Computational Experience (Max Clique):
• Total number of Problems Solved: 33

ID           # nodes   # instances   xQx solns         Soln Time   Solns Optimal?
P-hat 300       300          3       8, 25, 36          < 2 sec         Yes
P-hat 500       500          3       9, 36, 50          < 2 sec         Yes
P-hat 700       700          3       11, 44, 62         < 4 sec         Yes
P-hat 1000     1000          3       10, 46, 68        < 15 sec         Yes
P-hat 1500     1500          3       12, 65, 94         < 6 min         Yes
C-fat 200       200          3       12, 24, 58         < 1 sec         Yes
C-fat 500       500          3       14, 26, 64         < 1 sec         Yes
Brock 200       200          4       21, 12, 15, 17     < 4 sec         Yes
Brock 400       400          4       27, 29, 31, 33      7 min          Yes
Brock 800       800          4       23, 24, 25, 26     32 min          Yes
Remarks: For this class of problems we solved 33 standard test problems from the literature. These problems can be found at ftp://dimacs.rutgers.edu/pub/challenge. The conversion to xQx was achieved by using transformation # 2, taking the penalty P to be 2 in each case. No new variables are required. Optimal solutions were found for all 33 problems. With few exceptions, these optimal solutions were found in only a few seconds by our tabu search method on a Pentium 333 PC. As noted in the table above, Brock 400, Brock 800 and P-hat 1500-1 took somewhat longer, requiring 7, 32 and 6 minutes, respectively.

5.8 Comments on Computational Experience

The computational experience reported above is intended to demonstrate the viability and utility of the reformulation approach. We have successfully applied this approach to many other problem classes as well. While our intention is to disclose the general applicability of the unified modeling and solution methodology, and not to provide a comprehensive comparison of this approach with the best known methods at this time, we nonetheless emphasize that the results presented in section 5 clearly establish that the reformulation approach not only works across a wide array of problem classes but works very well.
The test bed we used in this section, 168 problems in all, is a combination of new, randomly generated problems and widely used problems from the literature. Optimal solutions are known for 100 of the 168 problems. For each problem class, the instances considered are representative of the problems considered by other researchers. In some cases, most notably for the maximum diversity and quadratic knapsack problems, we considered problem instances much larger than previously addressed in the literature. In all cases, our method was able to quickly find feasible solutions. For the problems where optimal solutions are known, our method matched the optimal solution in 99 of the 100 instances. This performance lends support to the expectation that the solutions obtained for the other problems, where optimal solutions are not known, are of very high quality as well. Solution times for our approach across the board are very small.
6 Special Transformations

Transformation # 1 can in principle be used to transform any linear constraint in bounded integer variables into a quadratic penalty term, and in fact this transformation is the general workhorse of this recasting approach. However, its use with general inequalities requires the introduction of additional variables, and thus alternative transformations not requiring additional variables should be employed where possible. As we have indicated in the examples and computational outcomes given above, it is often possible to use a mixture of transformations in the same problem, constructing penalties with an eye toward avoiding the introduction of new variables where circumstances permit. The most common example of such an alternative is transformation # 2, which accommodates a frequently encountered but very special class of inequalities. Section 1.1 listed a few additional special cases for which VIPs are known. This list is by no means exhaustive. Many additional special cases, either for single constraints or groups of constraints, are waiting to be discovered. We illustrate the possibility of discovering important special cases by considering the classical problem of linear ordering.

Linear Ordering: The linear ordering problem is defined by an n-by-n matrix of weights C = {c_ij}, where the problem is to find a permutation, p, of columns (and rows) such that the sum of the weights on the upper triangular matrix is maximized. Such problems arise in a variety of settings (such as finding an acyclic tournament of maximum weight, or the aggregate ordering of paired observations) but are most often associated with the triangulation of input-output matrices in economics, where the data in question often refers to sectors. This problem can be modeled utilizing the decision variable: x_ij = 1 if sector
i goes before sector j in the permutation; 0 otherwise. Taking advantage of the fact that x_ij + x_ji = 1 for all i and j, a standard integer programming formulation for the problem is given by:

max  Σ_{i<j} c_ij x_ij + Σ_{i<j} c_ji (1 − x_ij)
s.t. x_ij + x_jk − x_ik ≤ 1    ∀ (i, j, k): i < j < k
     x_ij + x_jk − x_ik ≥ 0    ∀ (i, j, k): i < j < k
     x_ij ∈ {0, 1}             ∀ (i, j): i < j
After introducing slack variables, this model could be recast into the form of UQP by employing general transformation # 1. However, the above constraints allow a special quadratic penalty not involving new variables that is greatly preferable to the penalty derived from transformation # 1. To see how this special penalty arises, note that for a particular set i < j < k, the pair of constraints shown above allows 6 of the 8 possible solutions, excluding only x_ij = 1, x_jk = 1, x_ik = 0 and x_ij = 0, x_jk = 0, x_ik = 1. It is easy to see that an exact quadratic penalty that precludes these same two solutions, while allowing the others, is given by:

P ( x_ik + x_ij x_jk − x_ij x_ik − x_jk x_ik ).

Thus, without introducing additional variables, this special penalty can be used to easily transform the linear ordering problem into an equivalent UQP. For a problem with n sectors, both the IP formulation and the equivalent UQP model will have n(n − 1)/2 variables. This approach is illustrated below by a small example.

Example: Consider the 4-sector example with an initial permutation p = (1, 2, 3, 4) and matrix:

C = [  0  12   5   3
       4   0   2   6
       8   3   0   9
      11   4   2   0 ]

The IP formulation becomes:

max x_0 = 32 + 8 x_12 − 3 x_13 − 8 x_14 − 1 x_23 + 2 x_24 + 7 x_34
s.t. x_12 + x_23 − x_13 ≤ 1        x_12 + x_23 − x_13 ≥ 0
     x_12 + x_24 − x_14 ≤ 1        x_12 + x_24 − x_14 ≥ 0
     x_13 + x_34 − x_14 ≤ 1        x_13 + x_34 − x_14 ≥ 0
     x_23 + x_34 − x_24 ≤ 1        x_23 + x_34 − x_24 ≥ 0
Representing P by 2M, the equivalent xQx model is given by the 6x6 matrix:
Q = [   8       M       M      -M      -M       0
        M    -3-2M      M       M       0      -M
        M       M    -8-4M      0       M       M
       -M       M       0      -1       M      -M
       -M       0       M       M     2-2M      M
        0      -M       M      -M       M       7  ]

(rows and columns ordered x_12, x_13, x_14, x_23, x_24, x_34)
Choosing the penalty, M, to be 10 and solving the problem:
max xQx yields the value 15 with x_12 = x_34 = 1, for which the corresponding permutation is p = (3, 4, 1, 2) and the (original) objective function value is 15 + 32 = 47.
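As a check on the special linear-ordering penalty introduced above (an illustrative snippet, not from the original text), the following enumerates all eight assignments of (x_ij, x_jk, x_ik) and confirms that the penalty is positive exactly on the two orderings excluded by the pair of triangle constraints.

```python
from itertools import product

def triangle_penalty(x_ij, x_jk, x_ik, P=1):
    # special linear-ordering VIP for the triple i < j < k
    return P * (x_ik + x_ij * x_jk - x_ij * x_ik - x_jk * x_ik)

for x_ij, x_jk, x_ik in product((0, 1), repeat=3):
    feasible = 0 <= x_ij + x_jk - x_ik <= 1
    assert (triangle_penalty(x_ij, x_jk, x_ik) == 0) == feasible
print("the triple penalty is zero exactly on the 6 feasible orderings")
```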
7 Summary

In this tutorial we have demonstrated how a variety of disparate combinatorial problems can be solved by first re-casting them into the common modeling framework of the unconstrained quadratic binary program. Once in this unified form, the problems can be solved effectively by adaptive memory tabu search metaheuristics or other recently developed solution approaches for UQP. Our findings challenge the conventional wisdom that places high priority on preserving linearity and exploiting specific structure. Although the merits of such a priority are well-founded in many cases, the UQP domain appears to offer a partial exception. In forming UQP(PEN), we destroy any linearity that the original problem may have exhibited. Moreover, any exploitable structure that may have existed originally is "folded" into the Q matrix, and the general solution procedure we apply takes no advantage of it. Nonetheless, our solution outcomes have been remarkably successful, yielding results that rival the effectiveness of the best specialized methods. This combined modeling/solution approach provides a unifying theme that can be applied in principle to all linearly constrained quadratic and linear programs in bounded integer variables, and our computational findings for a broad spectrum of problem classes raise the possibility that similarly successful results may be obtained for even wider ranges of problems. As additional research is conducted to provide enhanced methods for solving the UQP model, the approach of recasting diverse problems into this general framework will become even more attractive. At present, we are solving problems reformulated as UQP that have more than 50,000 variables in the quadratic representation. On-going research will further expand our ability to solve instances of UQP, further establishing this approach as a unified framework with noteworthy practical and theoretical merit.
8 Acknowledgements

The authors would like to acknowledge the contributions of their co-workers Drs. Bahram Alidaee, Cesar Rego, and Haibo Wang, whose work contributed to some of the results appearing in this paper.
References [AKA94] Alidaee, B, G. Kochenberger, and A. Ahmadian, "0-1 Quadratic Programming Approach for the Optimal Solution of Two Scheduling Problems," International Journal of Systems Science, 25, 401-408, 1994. [AHA98] Alkhamis, T. M., M. Hasan, and M. A. Ahmed, "Simulated Annealing for the Unconstrained Binary Quadratic Pseudo-Boolean Function," European Journal of Operational Research, 108, (1998), 641-652. [AAK99] Amini, M., B. Alidaee, G. Kochenberger, "A Scatter Search Approach to Unconstrained Quadratic Binary Programs," A^eui Methods in Optimization, Cone, Dorigo, and Glover, eds., McGraw-Hill, 317-330, 1999. [Bea99] Beasley, J. E.,"Heuristic Algorithms for the Unconstrained Binary Quadratic Programming Problem, Working Paper, Imperial College, 1999. [BS94] Billionet, A. and A. Sutter, "Minimization of a Quadratic Pseudo-Boolean Function," European Journal of OR, 78 pp. 106-115, (1994). [BH02] Boros, E. and P. Hammer, " Pseudo-Boolean Optimization," Discrete Applied Mathematics, 123(1-3), 155-225 (2002) [BH91] Boros, E., and P. Hammer, "The Max-Cut Problem and Quadratic 0-1 Optimization, Polyhedral Aspects, Relaxations and Bounds," Annals of OR, 33,151-225 (1991) [BHS89] Boros, E, P. Hammer, and X, Sun, "The DDT Method for Quadratic 0-1 Minimization," RUTCOR Research Center, RRR 39-89, 1989. [BP89] Boros, E. and A. Prekopa, "Probabilistic Bounds and Algorithms for the Maximum Satisfiability Problem," Annals of OR, 21 (1989), pp. 109-126. [BGLM94] BourjoUy, J.M., P. Gill, G. Laporte, and H. Mercure, "A Quadratic 0/1 Optimization Algorithm for the Maximum Clique and Stable Set Problems," Working paper, University of Montreal, (1994). [CS94] Chardaire, P, and A. Sutter, "A Decomposition Method for Quadratic Zero-One Programming," Management Science, 41:4, 704-712, 1994. [Cor95] Cornuejolos, G., Combinatorial Optimization: Packing and Covering, CBMS-NSF, SIAM, (2001). [DDJMRR95] De Simone, C. M. Diehl, M. Junger, P. Mutzel, G. Reinelt, and G. Rinaldi, 'Exact Ground State4s of Ising Spin Glasses: New Experimental Results with a Branch and Cut Algorithm," Journal of Statistical Physics, 80, 487-496 (1995) [DGR04] Delorme, X., X. Gandibleau, and J. Rodriques, "GRASP for Set packing," EJOR, 153, (2004), pp. 564-580. [GARK02] Glover, F., B. Ahdaee, C. Rego, and G. Kochenberger, "One-Pass Heuristics for Large-Scale Unconstrained Binary Quadratic Programs," EJOR 137, pp. 272-287, 2002.
[GL97]
Glover, F, and M. Laguna, "Tabu Search," Kluwer Academic Publishers, 1997. [GKAA99] Glover, F., G. Kochenberger, B. Alidaee, and M.M. Amini, "Tabu with Search Critical Event Memory: An Enhanced Application for Binary Quadratic Programs," In: MetaHeuristics: Advances and Trends in Local Search Paradigms for Optimization, (Eds.) S. Voss, S. Martello, I. Osman, and C. Roucairol. Kluwer Academic Publisher, Boston, 1999. [GKA98] Glover, F., G. Kochenberger., and B. Alidaee, "Adaptive Memory Tabu Search for Binary Quadratic Programs," Management Science, 44:3, 336345, 1998. [GJR88] Grotschel, M., M. Junger, and G. Reinelt, "An Application of Combinatorial Optimization to Statistical Physics and Circuit Layout Design," Operations Research, Vol.36, # 3 , May-June, (1988), pp. 493-513. [HR68] Hammer, P., and S. Rudeanu, Boolean Methods in Operations Research, Springer-Verlag, New York, 1968. [Han79] Hansen, P.B., "Methods of Nonlinear 0-1 Programming," Annals Discrete Math, vol. 5, pp.53-70, 1979. [HJ90] Hansen, P. and B. Jaumard, "Algorithms for the Maximum Satisfiability Problem," Computing, 44, 279-303 (1990). [HJM93] Hansen, P, B. Jaumard., and V. Mathon, "Constrained Nonlinear 0-1 Programming," INFORMS Journal on Computing, 5:2, 97-119, 1993. [ISSPOO] lasemidis, L. D., D. S. Shiau, J.C. Sackellares, and P. Pardalos, "Transition to Epileptic Seizures: Optimization," DIMACS Series in Discrete Math and Theoretical Computer Science, Vol. 55, (2000), pp. 55-73. [Jos02] Joseph, A. "A Concurrent Processing Framework for the Set Partitioning Problem," Computers & Operations Research, 29, 1375-1391, 2002. [KTNOO] Katayama, K., M. Tani, and H. Narihisa, "Solving Large Binary Quadratic Programming Problems by an Effective Genetic Local Search Algorithm," In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'OO). Morgan Kaufmann, 2000. [KGAR04a] Kochenberger, G., F. Glover, B. Ahdaee, and C. Rego, "A Unified Modeling and Solution Framework for Combinatorial Optimization Problems," OR Spectrum, 26, pp. 237-250, (2004). [KGAR04b] Kochenberger, G., F. Glover, B. Alidaee, and C. Rego, "Solving Combinatorial Optimization Problems via Reformulation and Adaptive Memory Metaheuristics," Revolutionary Visions In Evolutionary Computation, ed. Anil Menon and David Goldberg, Kluwer Publisher, (to appear in 2004) [KGAR02] Kochenberger, G., F. Glover, B. Alidaee, and C. Rego, " An Unconstrained Quadratic Binary Programming Approach to the Vertex Coloring Problem," Working Paper, University of Colorado at Denver, 2002. [Lau70] Laughunn, D.J, "Quadratic Binary programming," Operations Research, 14, 454-461, 1970. [LAK04] Lewis, M., B. Ahdaee, and G. Kochenberger, "Using xQx to Model and Solve the Uncapacitated Task Allocation Problem," To Appear in OR Letters (2004). [LAL97] Lodi, A., K. Allemand, and T. M. Liebling, "An Evolutionary Heuristic for Quadratic 0-1 Programming," Technical Report OR-97-12, D.E.I.S., University of Bologna, 1997.
[MF99]
Merz, P. and B. Freisleben, "Genetic Algorithms for Binary Quadratic Programming," Proceedings of the 1999 International Genetic and Evolutionary Computation Conference (GECCO'99), pp. An-A2A, Morgan Kaufmann, 1999. [MBRB99] Mingozzi, A., M. Boschetti, S. Ricciardelli and L. Blanco, "A Set Partitioning Approach to the Crew Scheduling Problem," Operations Research, 47 (6) 873- 888, 1999. [Pad73] Padberg, M., "On the Facial Structure of set packing Polyhedra," Mathematical Programmming,vol 5, (1973), pp.199-215. [Pau95] Palubeckis, G. "A Heuristic-Branch and Bound Algorithm for the Unconstrained Quadratic Zero-One Programming Problem," Computing, pp. 284-301, (1995). [PR90] Pardalos, P, and G.P. Rodgers, "Computational Aspects of a Branch and Bound Algorithm for Quadratic Zero-one Programming," Computing, 45, 131-144, 1990. [PR92] Pardalos, P, and G.P. Rodgers, "A Branch and Bound Algorithm for Maximum Clique problem," Computers & OR, 19, 363-375, 1992. [PX94] Pardalos, P, and J. Xue, "The Maximum Clique Problem," The Journal of Global Optimization, 4, 301-328, 1994. [PROS] Pekec, A., and M. Rothkopf, "Combinatorial Auction Design," Management Science, Vol. 49, # 1 1 , Nov.2003, pp. 1485-1503. [Ron03] Ronnqvist, M., "Optimization in Forestry," Math. Programming, Series B, 97, (2003). Pp. 267-284. [Sch97] Schrage, L., Optimization Modeling with LINDO, Duxbury Press, (1997) [Vem98] Vemuganti, R, "Applications of Set Covering, Set Packing and Set Partitioning Models: A Survey," Handbook of Combinatorial Optimization, eds. D. Zhu and P. Pardalos, Kluwer Academic Publishers, (1998).
Global Convergence of a Non-monotone Trust-Region Filter Algorithm for Nonlinear Programming

Nicholas I. M. Gould¹ and Philippe L. Toint²

¹ Rutherford Appleton Laboratory, Computational Science and Engineering Department, Chilton, Oxfordshire, England, n.i.m.gould@rl.ac.uk
² University of Namur, Department of Mathematics, 61, rue de Bruxelles, B-5000 Namur, Belgium, philippe.toint@fundp.ac.be

Summary. A non-monotone variant of the trust-region SQP-filter algorithm analyzed in Fletcher et al (SIAM J. Opt. 13(3), 2002, pp. 653-659) is defined, that directly uses the dominated area of the filter as an acceptability criterion for trial points. It is proved that, under reasonable assumptions and for all possible choices of the starting point, the algorithm generates at least a subsequence converging to a first-order critical point.
1 Introduction

Our objective is to define and analyze a new algorithm for solving constrained minimization problems where both the objective function and the constraints are smooth, that is

minimize  f(x)
subject to c_E(x) = 0                                              (1)
           c_I(x) ≥ 0,

where f is a twice continuously differentiable real valued function of the variables x ∈ R^n, and c_E(x) and c_I(x) are twice continuously differentiable functions from R^n into R^m and from R^n into R^p, respectively. Let c(x)^T = (c_E(x)^T  c_I(x)^T). Note that no convexity assumption is made. The algorithm that we discuss is a trust-region filter method, and belongs as such to a class of algorithms introduced by [FL02]. A global convergence theory for this class is proposed in [FLT98], in which the objective function is locally approximated by a linear function, leading, at each iteration, to the (exact) solution of a linear program. Similar results are shown in [FLT02], where the approximation of the objective function is quadratic, leading to a Sequential Quadratic Programming (SQP) method. However, this is accomplished at the (very high) cost of finding a global minimizer of the possibly nonconvex quadratic programming subproblem. This latter requirement
is relaxed in [FGLTW02], where the SQP step is decomposed into "normal" and "tangential" components. The main purpose of the current paper, a companion of [FGLTW02], is to analyze an algorithm where the filter acceptance criterion for new iterates is relaxed to allow dominated iterates to be accepted in some cases. This is potentially important as it is known that SQP methods can generate such iterates in their asymptotic fast convergence phase. The theory developed here therefore provides a possible convergence framework for a filter method with quadratic convergence properties without the need to introduce second-order corrections. Results along this line are already known for linesearch-filter methods [WB01], and for another variant of trust-region-filter methods where the definition of filter entries is modified [Ulb04]. Our objective is to introduce a framework suitable for trust-region-filter methods using the original definition of the filter entries. Moreover, an advantage of the new theory is that it no longer needs the notion of a "margin" around the filter, a device which is common to all theoretical approaches of the filter method so far.
2 A Non-monotone Filter Algorithm

As indicated above, the algorithm that we are about to describe is of the SQP type. At a given iterate x_k, Newton's method is implicitly applied to solve (a local version of) the first-order necessary optimality conditions by solving the quadratic programming subproblem QP(x_k) given by

minimize  f_k + ⟨g_k, s⟩ + ½⟨s, H_k s⟩
subject to c_E(x_k) + A_E(x_k) s = 0                               (2)
           c_I(x_k) + A_I(x_k) s ≥ 0,
where f_k = f(x_k), g_k = g(x_k) = ∇_x f(x_k), where A_E(x_k) and A_I(x_k) are the Jacobians of the constraint functions c_E and c_I at x_k, and where H_k is a symmetric matrix. We will not immediately be concerned about how H_k is obtained, but we will return to this point in Section 3. The solution of QP(x_k) then yields a step s_k. If s_k = 0, then x_k is first-order critical for problem (1).

2.1 The composite SQP step

The step s_k is typically computed by solving, possibly approximately, a variant of (2). In the trust-region approach, one takes into account the fact that (2) only approximates our original problem locally: the step s_k is thus restricted in norm to ensure that x_k + s_k remains in a trust region centred at x_k, where we believe this approximation to be adequate. The subproblem QP(x_k) is thus replaced by its TRQP(x_k, Δ_k) variant given by
minimize  m_k(x_k + s)
subject to c_E(x_k) + A_E(x_k) s = 0,
           c_I(x_k) + A_I(x_k) s ≥ 0,                              (3)
           and ‖s‖ ≤ Δ_k,
for some (positive) value of the trust-region radius Δ_k, where we have defined

m_k(x_k + s) = f_k + ⟨g_k, s⟩ + ½⟨s, H_k s⟩,                        (4)
(5)
and assume that C£{xk) + A£{xk)nk=0,
C2{xk) + Ai{xk)nk
> 0,
\\sk\\<^k,
(6) (7)
and C£{xk)+A£{xk)sk=0,
cx{xk) + Ax{xk)sk>0.
(8)
Of course, this is a strong assumption, since in particular (6) or (7)/(8) may not have a solution. We shall return to this possibility shortly. There are many ways to compute suitable Uk and tk- As in [FGLTW02], we only assume that Uk exists and, for some constants K,,^^, > 0 and 6n > 0, ll'^fcll < K,„Jk, whenever 9k < 5^,
(9)
where 9k = 9{xk) is the maximum violation of the nonlinear constraints at the fc-th iterate defined by 9{x) = max 0,rnax|cj(a;)|,max—Ci(x) ies
iei
(10)
These conditions can be shown to avoid unduly restrictive conditions on the constraints or the normal step itself (see [FGLTW02] for a discussion). We then may use the normal step Uk if it falls within the trust-region, that is if ||nfc|| < Ak- In this case, we write
128
Nicholas I. M, Gould and Philippe L. Toint x"^ =Xk + nk,
(11)
and observe that n^ satisfies the constraints of TRQP(xfc,zifc) and thus also of QP(xfc). It is crucial to note, at this stage, that such an Uk may fail to exist because the constraints of QP(x/c) may be incompatible or because all feasible points for QP(a;fc) may lie outside the trust region. If a normal step Uk fias been found with ||nfc|| < Z\fc, we then find a tangential step tk, starting from x^ and satisfying (7) and (8), whose objective is to decrease the value of the objective function. This is achieved by computing a step that produces a sufficient decrease in ruk, which is to say that we wish '^fc(3;fc) —?Tifc(x'fc+Sfc) to be "sufficiently large". Of course, this is only possible if the maximum permitted size of tk is not too small, which is to say that x'^ is not too close to the trust-region boundary. We formalize this condition by strengthening our requirement that j|nfcj| < Ak so that \\nk\\ < KAAkmm[l,K^A'^],
(12)
for some K/^ G (0,1], some K,J > 0 and some /i G [0,1). If condition (12) does not hold, we assume, as in [FGLTW02], that the computation of tk is unlikely to produce a satisfactory decrease in ruk, and proceed just as if the feasible set of TRQF{xk, Ak) were empty. If Uk can be computed and (12) holds, TRQP(a;^:,zifc) is said to be compatible for /x . In this sufficient model decrease seems possible. We formalize this notion in the form of a familiar Cauchy-point condition, and, recalling that the feasible set of QP(x'fc) is convex, we introduce the first-order criticality measure Xk = \
min cxixk) +
{gk + Hknk,t)\
(13)
Ax{xk)(nk+t)>0
l|tll
Q,
(14)
which is, up to the constant term ^{nk,Hknk), equivalent to Q,P{xk) with s = Uk + t. The sufficient decrease condition then consists in assuming that there exists a constant K,,,„d > 0 such that mk{Xk) -rukixk
+tk) > KtmdXfc]
Xk ^ Pk
(15)
whenever TRQP(xfc, Z\/c) is compatible, where /3fc = 1 -t- ||i:f/c||. We know from [Toi88] and [CGST93] that such steps can be computed, even if we recognise that (15) may be difficult to expficitly verify in practice for large problems.
Non-monotone Trust-Region Filter Algorithm
129
2,2 The restoration procedure If TRQP(a;fc, Ak) is not compatible for /x, that is when the feasible set determined by the constraints of QP{xk) is empty, or the freedom left to reduce ruk within the trust region is too small in the sense that (12) fails, we must consider an alternative. Observe that, if 9{xk) is sufficiently small and the true nonlinear constraints are locally compatible, the linearized constraints should also be compatible, since they approximate the nonlinear constraints (locally) correctly. Furthermore, the feasible region for the linearized constraints should then be close enough to Xk for there to be some room to reduce m^, at least if Z\fc is large enough. If the nonlinear constraints are locally incompatible, we have to find a neighbourhood where this is not the case, since the problem (1) does not make sense in the current one. As in [FGLTW02], we rely on a restoration procedure. The aim of this procedure is to produce a new point Xk + fk that satisfies two conditions: we require TRQP(a:;fe + r/j, ^fe+i) to be compatible for some Ak+i > 0, and also require that x/^ + r/. be acceptable, in the sense that we discuss in the Section 2.3.3 (precisely, we require that either (20) or (21) holds for such an x j ) . In what follows, we wiU denote TZ = {k \ Uk does not satisfy (9) or ||nfc|j > K^Zifcmin[l,K^iZ\j!]}, the set of restoration iterations. The idea of the restoration procedure is to (approximately) solve min 9(x) (16) xeIR" starting from Xk, the current iterate. This is a non-smooth problem, but there exist methods, possibly of trust-region type (such as that suggested by [Yua94]), which can be successfully applied to solve it. Thus we wih not describe the restoration procedure in detail. Note that we have chosen here to reduce the infinity norm of the constraint violation, but we could equally well consider other norms, such as ii or £2, in which case the methods of [FL98] or of [HT95] and [DAW99] can respectively be considered. Of course, this technique only guarantees convergence to a first-order critical point of the chosen measure of constraint violation, which means that, in fact, the restoration procedure may fail as this critical point may not be feasible for the constraints of (1). However, even in this case, the result of the procedure is of interest because it typicaUy produces a local minimizer of 9{x), or of whatever other measure of constraint violation we choose for the restoration, yielding a point of locally-least infeasibility. There seems to be no easy way to circumvent this drawback, as it is known that finding a feasible point or proving that no such point exists is a global optimization problem and can be as difficult as the optimization problem (1) itself. One therefore has to accept two possible outcomes of the restoration procedure: either the procedure fails in that it does not produce a sequence of iterates converging to feasibility, or a point Xk + rk is produced such that 6{xk + rk) is as small as desired.
130
Nicholas I. M. Gould and Philippe L. Toint
2.3 The filter as a criterion to accept trial points Unfortunately, because the SQP iteration may only be locally convergent, the step Sfc or rk may not always be very useful. Thus, having computed a step Sk or rk from our current iterate Xk, we need to decide whether the trial point x^, defined by ^+ def (xk+Vk if ken, ,^^. '^ \xk + Sk otherwise ^ ' is any better than Xk as an approximate solution to our original problem (1). If we decide that this is the case, we say that iteration k is successful and choose x'^ as our next iterate. Let us denote by S the set of (Indices of) all successful iterations, that is S = {k\ Xk+i = a ; ^ } . We will discuss the details of precisely when we accept x'^ as our next iterate in Section 2.3.3, but note that an important ingredient in the process is the notion of a filter, a notion itself based on that of dominance. We say that a point xi dominates a point X2 whenever 0{xi) < e{x2) and fixi)
< /(xz).
Thus, if iterate Xk dominates iterate Xj, the latter is unlikely to be of real interest to us since Xk is at least as good as Xj on account of both feasibility and optimality. All we need to do now is to remember iterates that are not dominated by other iterates using a structure called a filter. A filter is a list J^ of pairs of the form {Oi, fi) such that either %< dj or f, < fj for i 7^ j . [FGLTW02] propose to accept a new trial iterate Xk + Sk only if it is not dominated by any other iterate in the filter and Xk- In the vocabulary of multi-criteria optimization, this amounts to building elements of the efficient frontier associated with the bi-criteria problem of reducing infeasibihty and the objective function value. We may describe this concept by associating with each iterate Xk its {9, /)-pair {9k, fk) and might choose to accept Xk + Sfc only if its {9, /)-pair does not lie, in the two-dimensional space spanned by constraint violation and objective function value, above and on the right of a previously accepted pair. If we define p(jr) = {(5)^ f)\9>9j
and / > fj for some j e J"},
(18)
the part of the (9, /)-space that is dominated by the pairs in tiie filter, this amounts to say that xjj" could be accepted if {9{x'^),f{x'l)) 0 V{J^k)i where jFfc denotes the filter at iteration k.
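As an illustration of the basic dominance test just described (a sketch, not the authors' implementation), the following Python snippet checks whether a trial (θ, f)-pair lies in the dominated region D(F_k) of a filter stored as a list of pairs.

```python
def is_dominated(theta, f, filter_pairs):
    """True if (theta, f) lies in D(F): some filter pair has both a smaller-or-equal
    constraint violation and a smaller-or-equal objective value."""
    return any(theta >= theta_j and f >= f_j for theta_j, f_j in filter_pairs)

# Example with a hypothetical filter of three non-dominated pairs.
F = [(0.0, 5.0), (0.3, 2.0), (1.0, 1.0)]
assert is_dominated(0.5, 3.0, F)        # dominated by (0.3, 2.0)
assert not is_dominated(0.1, 3.0, F)    # undominated by every pair in F
```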
2.3.1 The contribution of a trial point to the filter

However, we may not wish to accept a new point x_k^+ if its (θ, f)-pair

(θ_k^+, f_k^+) = (θ(x_k^+), f(x_k^+))

is arbitrarily close to being dominated by another point already in the filter. [FGLTW02], as all other theoretical analyses of the filter that we know of, set a small "margin" around the border of D(F_k) in which trial points are also rejected. We follow here a different idea and define, for any (θ, f)-pair, an area that represents a sufficient part of its contribution to the area of D(F_k). For this purpose, we partition the right half-plane [0, +∞] × [−∞, +∞] into four different regions (see Figure 1).
Fig. 1. The partition of the right half-plane for a filter F_k containing four (θ, f) pairs.
If we define D(F_k)^c to be the complement of D(F_k) in the right half-plane,

θ_min^{F_k} = min_{j ∈ F_k} θ_j,    θ_max^{F_k} = max_{j ∈ F_k} θ_j,

and

f_min^{F_k} = min_{j ∈ F_k} f_j,    f_max^{F_k} = max_{j ∈ F_k} f_j,
these four parts are:
1. the dominated part of the filter, D(F_k),
2. the undominated part of the lower left (south-west) corner of the half-plane,
   SW(F_k) = D(F_k)^c ∩ [0, θ_max^{F_k}] × [−∞, f_max^{F_k}],
3. the undominated upper left (north-west) corner,
   NW(F_k) = D(F_k)^c ∩ [0, +∞] × (f_max^{F_k}, +∞],
4. the undominated lower right (south-east) corner,
   SE(F_k) = (θ_max^{F_k}, +∞] × [−∞, f_min^{F_k}).

Consider first a trial iterate x_k^+ with its associated (θ, f)-pair (θ_k^+, f_k^+) with θ_k^+ > 0. If the filter is empty (F_k = ∅), then we measure its contribution to the area of the filter by the simple formula
for some constant Kp > 0. If the filter already contains some past iterates, we measure the contribution of xt to the area of the filter by
4
a{x+,Tk)
'^' area(^P(^fc)''n[0+, C * „ , + « , ] x [ / + , / ; ^ L + « K ] )
if (^^4+) G SW{:Fk
by
a{xt,:Fk) = K,{e^L - Gt) if {etJi) e NW{n), by a ( 4 , n ) "^ K.{f^,^ - / + )
if (0+,/+) G SE{J^,k)
and by a{xl,T,)
"^ -area(^2?(^On[0fc+-e:t„]x[/+-/^|=jj,
if ( e + , / + ) G V{Tu),
where Vk ='' m,fi)
Gn
I ^i < 0fe+ and /j < / + } ,
(the set of filter pairs that dominate (0^, / ^ ) ) , and ^min =^ min Oj,
e'^i^ ='' rnax 9j.
Figure 2 illustrates the corresponding areas in the filter for four possible (θ_k^+, f_k^+) pairs (in D(F_k), SW(F_k), NW(F_k) and SE(F_k)). Horizontally dashed surfaces indicate a positive contribution and vertically dashed ones a negative contribution. Observe that, by construction, the assigned areas a(x_k^+, F_k) for nondominated points are all disjoint and that the negative area for dominated points is chosen such that D(F_k) is updated correctly. Also note that a(x, F) is a continuous function of (θ(x), f(x)), and thus of x, for a given filter F. Furthermore, a(x, F) is identically zero if (θ(x), f(x)) is on the boundary of the dominated region D(F). Also note that, although seemingly complicated, the value of a(x, F) is not difficult to compute, since its calculation requires, in the worst case, considering all the points currently in the filter only once.
Fig. 2. The contributions of four (θ_k^+, f_k^+) pairs (in D(F_k), SW(F_k), NW(F_k) and SE(F_k)) to the area of the filter. Horizontal stripes indicate a positive contribution and vertical stripes a negative one.
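A rough Python sketch of the area contribution is given below. It is only illustrative: the constant KAPPA_F, the convention used for an empty filter, and the exact box used in the dominated case are assumptions made for the sketch rather than quotations of the formulas above, and the staircase sweep in dominated_area_in_box is simply one convenient way to evaluate the areas involved.

```python
KAPPA_F = 1e-4   # stands in for the constant kappa_F of Section 2.3.1

def dominated_area_in_box(pairs, th_lo, th_hi, f_lo, f_hi):
    """Area of D(pairs) intersected with the box [th_lo, th_hi] x [f_lo, f_hi],
    where D is the union of the quadrants {theta >= th_i, f >= f_i}."""
    cuts = sorted({th_lo, th_hi} | {th for th, _ in pairs if th_lo < th < th_hi})
    area = 0.0
    for left, right in zip(cuts[:-1], cuts[1:]):
        active = [g for th, g in pairs if th <= left]   # pairs dominating this slab
        if not active:
            continue
        top = max(min(active), f_lo)                    # lower edge of the dominated strip
        area += (right - left) * max(0.0, f_hi - top)
    return area

def contribution(theta, f, filt):
    """Sketch of a(x^+, F): the signed area the pair (theta, f) would add to D(F)."""
    if not filt:
        return KAPPA_F          # empty filter: any fixed positive value serves in this sketch
    th_min = min(th for th, _ in filt); th_max = max(th for th, _ in filt)
    f_min = min(g for _, g in filt);    f_max = max(g for _, g in filt)
    P = [(th, g) for th, g in filt if th <= theta and g <= f]   # dominating pairs
    if P:                                        # dominated: area that would be cut out of D(F)
        thP_min = min(th for th, _ in P); fP_min = min(g for _, g in P)
        return -dominated_area_in_box(P, thP_min, theta, fP_min, f)
    if theta <= th_max and f <= f_max:           # south-west corner
        box = (th_max + KAPPA_F - theta) * (f_max + KAPPA_F - f)
        return box - dominated_area_in_box(filt, theta, th_max + KAPPA_F, f, f_max + KAPPA_F)
    if f > f_max:                                # north-west corner
        return KAPPA_F * (th_min - theta)
    return KAPPA_F * (f_min - f)                 # south-east corner
```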
2.3.2 Updating the filter

The procedure to update the filter for a particular (θ, f) pair is extremely simple. If (θ_k, f_k) = (θ(x_k), f(x_k)) does not belong to D(F_k) (i.e. if x_k is not dominated), then F_{k+1} = F_k ∪ {(θ_k, f_k)}, while if (θ_k, f_k) ∈ D(F_k) (if x_k is dominated),
F_{k+1} = ( F_k \ P_k ) ∪ {(θ_min^{P_k}, f_k)} ∪ {(θ_k, f_min^{P_k})},

where P_k is now the subset of pairs in F_k that dominate (θ_k, f_k). This last situation is illustrated by Figure 3, which shows the filter resulting from the operation of including the pair (θ_k, f_k) belonging to D(F_k) (that associated with the vertically shaded "decrement" in the filter area of Figure 2) in the filter. The two points in P_k that have been removed are marked with crossed circles and their associated dominated orthants are indicated by dotted lines. Observe that it may happen that the number of points in the filter decreases when the set of dominating points P_k contains more than two filter pairs. Moreover, the pair for which the filter is updated is not always itself included in the filter (as shown in Figure 3).
Fig. 3. The filter F_{k+1} after including the dominated pair (θ_k, f_k) into F_k.
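The update itself is only a few lines of code. The sketch below assumes filter pairs are stored as (θ, f) tuples; it is an illustration of the mechanism above, not the authors' implementation.

```python
def update_filter(filt, theta_k, f_k):
    """Sketch of the filter update of Section 2.3.2."""
    P = [(th, g) for th, g in filt if th <= theta_k and g <= f_k]   # pairs dominating (theta_k, f_k)
    if not P:
        return filt + [(theta_k, f_k)]            # not dominated: simply add the new pair
    thP_min = min(th for th, _ in P)
    fP_min = min(g for _, g in P)
    # Dominated: remove the dominating pairs and add the two "corner" pairs
    # (theta^P_min, f_k) and (theta_k, f^P_min), so that D(F) shrinks accordingly.
    return [q for q in filt if q not in P] + [(thP_min, f_k), (theta_k, fP_min)]
```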
2.3.3 Acceptability of potential iterates

We now return to the question of deciding whether or not a trial point x_k^+ is acceptable for the filter. We will insist that this is a necessary condition for the iteration k to be successful in the sense that x_{k+1} = x_k^+, i.e. the algorithm changes its current iterate to the trial point. Note that all restoration iterations are successful (R ⊆ S). Note also that (except for x_0) all iterates are produced
by successful iterations: if we consider an iterate x_k, there must exist a predecessor iteration of index p(k) ∈ S such that

x_k = x_{p(k)+1}.   (19)
Observe that we do not always have p(k) = k − 1, since not all iterations need be successful. A monotone version of our method (rather similar to that developed in [FGLTW02], but using a(x, F) rather than a margin around the filter) would be to accept x_k^+ whenever this trial point results in a sufficient increase in the dominated area of the filter D(F_k). This is to say that x_k^+ would be acceptable for the filter whenever

a_k ≥ γ_F (θ_k^+)²,   (20)

where a_k = a(x_k^+, F_k) and where γ_F ∈ (0,1) is a constant. The non-monotone version that we analyze below replaces this condition by the weaker requirement that

Σ_{j = r(k)+1, j ∈ U}^{k} a_{p(j)} + a_k ≥ γ_F [ Σ_{j = r(k)+1, j ∈ U}^{k} θ_j² + (θ_k^+)² ],   (21)
where a_q = a(x_q^+, F_q) (and thus a_{p(q)} = a(x_q, F_{p(q)})), where

U = {k | the filter is updated for (θ_k, f_k)},

and where r(k) < k is some past reference iteration such that r(k) ∈ U. Note that condition (21) may equivalently be written in the more symmetric form

Σ_{j = r(k)+1, j ∈ U}^{k} a_{p(j)} + a_k ≥ γ_F [ Σ_{j = r(k)+1, j ∈ U}^{k} (θ_{p(j)}^+)² + (θ_k^+)² ]

because of (19). The reader may notice that condition (21) is reminiscent of the condition for non-monotone trust-region algorithms developed in [Toi96]. It requires that the average contribution to the filter area of the last points included in the filter and x_k^+ together be globally (sufficiently) positive, but makes it possible to accept x_k^+ even though it may be dominated (i.e. lie in D(F_k)). However, if x_k^+ provides a clear monotonic improvement, in the sense that (20) holds, we are also prepared to accept it. Thus, x_k^+ will be called acceptable at iteration k if either (20) or (21) holds. We will denote
A = {k ∈ S | (21) holds}.   (22)
Observe also that we could replace θ_k^+ by min[θ_k^+, κ_θ] in (20) and (21), where κ_θ is a strictly positive constant. This variant may be more numerically sensible, and does not affect the theory developed below.
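The two acceptability tests can be summarized in a few lines. The Python sketch below is only illustrative; the bookkeeping of the reference window (the pairs for iterations j ∈ U with r(k) < j ≤ k) is assumed to be supplied by the caller, and the default value of γ_F matches the value suggested in Section 2.4.

```python
def acceptable(a_k, theta_plus, window, gamma_F=1e-4):
    """Sketch of the acceptability tests (20) and (21).

    a_k        -- a(x_k^+, F_k), the area contribution of the trial point
    theta_plus -- theta(x_k^+)
    window     -- list of (a_p(j), theta_j) pairs for the filter-update iterations j
                  with r(k) < j <= k (the non-monotone reference window)
    """
    if a_k >= gamma_F * theta_plus**2:                      # monotone test (20)
        return True
    lhs = sum(a for a, _ in window) + a_k                   # non-monotone test (21)
    rhs = gamma_F * (sum(th**2 for _, th in window) + theta_plus**2)
    return lhs >= rhs
```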
2.4 The non-monotone Algorithm

We are now ready to define our algorithm formally as Algorithm 2.1. A simplified flow-chart of the algorithm is given as Figure 4.
Algorithm 2.1: Non-monotone Filter Algorithm

Step 0: Initialization. Let an initial point x_0, an initial trust-region radius Δ_0 > 0 and an initial symmetric matrix H_0 be given, as well as constants 0 < γ_0 < γ_1 < 1 < γ_2, 0 < η_1 < η_2 < 1, γ_F ∈ (0,1), κ_θ ∈ (0,1), κ_Δ ∈ (0,1], κ_μ > 0, μ ∈ (0,1), ψ > 1/(1+μ) and κ_tmd ∈ (0,1]. Compute f(x_0) and c(x_0). Set F_0 = ∅ and k = 0.

Step 1: Test for optimality. If θ_k = χ_k = 0, stop.

Step 2: Ensure compatibility. Attempt to compute a normal step n_k. If TRQP(x_k, Δ_k) is compatible, go to Step 3. Otherwise, update the filter for (θ_k, f_k) and compute a restoration step r_k for which TRQP(x_k + r_k, Δ_{k+1}) is compatible for some Δ_{k+1} > 0, and x_k^+ = x_k + r_k is acceptable. If this proves impossible, stop. Otherwise, set x_{k+1} = x_k^+ and go to Step 7.

Step 3: Determine a trial step. Compute a tangential step t_k, set x_k^+ = x_k + n_k + t_k, and evaluate c(x_k^+) and f(x_k^+).

Step 4: Test acceptability of the trial point. If x_k^+ is not acceptable, again set x_{k+1} = x_k, choose Δ_{k+1} ∈ [γ_0 Δ_k, γ_1 Δ_k], set n_{k+1} = n_k, and go to Step 7. If

m_k(x_k) − m_k(x_k^+) < κ_θ θ_k^ψ,   (23)

then update the filter for (θ_k, f_k) and go to Step 6.

Step 5: Test predicted versus achieved reduction. If

ρ_k = [f(x_k) − f(x_k^+)] / [m_k(x_k) − m_k(x_k^+)] < η_1,   (24)

set x_{k+1} = x_k, choose Δ_{k+1} ∈ [γ_0 Δ_k, γ_1 Δ_k], set n_{k+1} = n_k and go to Step 7.

Step 6: Move to the new iterate. Set x_{k+1} = x_k^+ and choose Δ_{k+1} such that Δ_{k+1} ∈ [Δ_k, γ_2 Δ_k] if ρ_k ≥ η_2 and (23) fails.

Step 7: Update the Hessian approximation. Determine H_{k+1}. Increment k by one and go to Step 1.
As in [FL98, FL02], one may choose ψ = 2. (Note that the choice ψ = 1 is always possible because μ > 0.) Reasonable values for the constants might then be

γ_F = 10⁻⁴, γ_0 = 0.1, γ_1 = 0.5, γ_2 = 2, η_1 = 0.01, η_2 = 0.9, κ_Δ = 0.7, κ_μ = 100, μ = 0.01, κ_θ = 10⁻⁴, and κ_tmd = 0.01,

but it is too early to know if these are even close to the best possible choices.
Fig. 4. Flowchart of the algorithm (without termination tests).
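To make the control flow of Algorithm 2.1 concrete, the following is a very high-level Python skeleton that reuses the helper functions sketched in Section 2.3. It is not the authors' implementation: the object `prob` (with methods f, theta, chi, trqp_step, model_decrease and restoration) is hypothetical, the normal/tangential decomposition, the reference-window bookkeeping for r(k) and the exact radius-update rules are all simplified.

```python
def nonmonotone_filter_sqp(x, prob, Delta, kappa_theta=1e-4, psi=2.0,
                           eta1=0.01, eta2=0.9, gamma1=0.5, gamma2=2.0,
                           gamma_F=1e-4, max_iter=200):
    """Structural sketch of Algorithm 2.1 (x is assumed to be a NumPy array)."""
    filt, window = [], []            # filter pairs and the (a, theta) reference window
    for k in range(max_iter):
        if prob.theta(x) == 0.0 and prob.chi(x) == 0.0:        # Step 1
            return x
        step = prob.trqp_step(x, Delta)                        # Steps 2-3 (None if incompatible)
        if step is None:
            filt = update_filter(filt, prob.theta(x), prob.f(x))
            x, Delta = prob.restoration(x, Delta, filt)        # restoration iteration
            continue
        x_plus = x + step
        a_k = contribution(prob.theta(x_plus), prob.f(x_plus), filt)
        if not acceptable(a_k, prob.theta(x_plus), window, gamma_F):   # Step 4
            Delta *= gamma1
            continue
        pred = prob.model_decrease(x, x_plus)
        if pred < kappa_theta * prob.theta(x)**psi:            # test (23) holds
            filt = update_filter(filt, prob.theta(x), prob.f(x))
            window.append((a_k, prob.theta(x_plus)))
            x = x_plus                                         # Step 6
            continue
        rho = (prob.f(x) - prob.f(x_plus)) / pred              # Step 5, test (24)
        if rho < eta1:
            Delta *= gamma1
            continue
        x = x_plus                                             # Step 6
        if rho >= eta2:
            Delta *= gamma2
    return x
```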
For the restoration procedure in Step 2 to succeed, we have to evaluate whether TRQP(x_k + r_k, Δ_{k+1}) is compatible for a suitable value of Δ_{k+1}. This requires that a suitable normal step be computed which successfully passes the test (12). Of course, once this is achieved, this normal step may be reused at iteration k + 1. Thus we shall require that the normal step calculated to verify compatibility of TRQP(x_k + r_k, Δ_{k+1}) should actually be used as n_{k+1}. Also note that the restoration procedure cannot be applied on two successive iterations, since the iterate x_k + r_k produced by the first of these iterations leads to a compatible TRQP(x_{k+1}, Δ_{k+1}) and is acceptable. As it stands, the algorithm is not specific about how to choose Δ_{k+1} during a restoration iteration. We refer the reader to [FGLTW02] for a more complete
discussion of this issue, whose implementation may involve techniques such as the internal doubling strategy of [BSS87] to increase the new radius, or the intelligent radius choice described by [Sar97]. However, we recognize that numerical experience with the algorithm is too limited at this stage to make definite recommendations.

The role of condition (23) may be interpreted as follows. If it holds, then one may think that the constraint violation is significant and that one should aim to improve on this situation in the future, by inserting the current point in the filter. If it fails, then the reduction in the objective function predicted by the model is more significant than the current constraint violation and it is thus appealing to let the algorithm behave as if it were unconstrained. In this case, it is important that the predicted decrease in the model is realized by the actual decrease in the function, which is why we then perform the test (24). In particular, if the iterate x_k is feasible, then (9) implies that x_k = x_k + n_k and

κ_θ θ_k^ψ = 0 ≤ m_k(x_k + n_k) − m_k(x_k^+) = m_k(x_k) − m_k(x_k^+).   (25)

As a consequence, the filter mechanism is irrelevant if all iterates are feasible, and the algorithm reduces to a traditional unconstrained trust-region method. Another consequence of (25) is that no feasible iterate is ever included in the filter, which is crucial in allowing finite termination of the restoration procedure, as explained in [FGLTW02]. Note that the argument may fail and a restoration step may not terminate in a finite number of iterations if we do not assume the existence of the normal step when the constraint violation is small enough, even if this violation converges to zero (see Fletcher, Leyffer and Toint, 1998, for an example). Notice also that the failure of (23) ensures that the denominator of ρ_k in (24) will be strictly positive whenever θ_k is. If θ_k = 0, then x_k = x_k + n_k, and the denominator of (24) will be strictly positive unless x_k is a first-order critical point because of (15).

The reader may have observed that Step 6 allows a relatively wide choice of the new trust-region radius Δ_{k+1}. While the stated conditions appear to be sufficient for the theory developed below, one must obviously be more specific in practice. We refer again to [FGLTW02] for a more detailed discussion of this issue. Finally, observe that the mechanism of the algorithm imposes that

U ⊆ S,   (26)

i.e. that iterates are included in the filter only at successful iterations.
3 Convergence to First-Order Critical Points

We now prove that our non-monotone algorithm generates a globally convergent sequence of iterates. In the following analysis, we concentrate on the case where the restoration iteration always succeeds. If this is not the case, then it
usually follows that the restoration phase has converged to an approximate solution of the feasibility problem (16) and we can conclude that (1) is locally inconsistent. In order to obtain our global convergence result, we will use the assumptions

AS1: f and the constraint functions c_E and c_I are twice continuously differentiable;
AS2: there exists κ_umh > 1 such that ‖H_k‖ ≤ κ_umh − 1 for all k;
AS3: the iterates {x_k} remain in a closed, bounded domain X ⊂ R^n.

If, for example, H_k is chosen as the Hessian of the Lagrangian function ℓ(x, y) = f(x) + ⟨y_E, c_E(x)⟩ + ⟨y_I, c_I(x)⟩ at x_k, in that

H_k = ∇_xx f(x_k) + Σ_{i ∈ E ∪ I} [y_k]_i ∇_xx c_i(x_k),   (1)

where [y_k]_i denotes the i-th component of the vector of Lagrange multipliers y_k = (y_{E,k}, y_{I,k}), then we see from AS1 and AS3 that AS2 is satisfied when these multipliers remain bounded. The same is true if the Hessian matrices in (1) are replaced by bounded approximations.

A first immediate consequence of AS1-AS3 is that there exists a constant κ_ubh > 1 such that, for all k,

|f(x_k^+) − m_k(x_k^+)| ≤ κ_ubh Δ_k².   (2)

A proof of this property, based on Taylor expansion, may be found, for instance, in [CGT00]. A second important consequence of our assumptions is that AS1 and AS3 together directly ensure that, for all k,

f^min ≤ f(x_k) ≤ f^max and 0 ≤ θ_k ≤ θ^max   (3)

for some constants f^min < f^max and θ^max > 0. Thus the part of the (θ, f)-space in which the (θ, f)-pairs associated with the filter iterates lie is restricted to the rectangle [0, θ^max] × [f^min, ∞]. We also note the following simple consequence of (9) and AS3.

Lemma 1. Suppose that Algorithm 2.1 is applied to problem (1). Suppose also that (9) and AS3 hold, and that θ_k ≤ δ_n. Then there exists a constant κ_lsc > 0, independent of k, such that

κ_lsc θ_k ≤ ‖n_k‖.   (4)
Proof. See [FGLTW02], Lemma 3.1.

Our assumptions and the definition of χ_k in (13) also ensure that θ_k and χ_k can be used (together) to measure criticality for problem (1).

Lemma 2. Suppose that Algorithm 2.1 is applied to problem (1) and that finite termination does not occur. Suppose also that AS1 and AS3 hold, and that there exists a subsequence {k_i} with k_i ∉ R such that

lim_{i→∞} χ_{k_i} = 0 and lim_{i→∞} θ_{k_i} = 0.   (5)

Then every limit point of the subsequence {x_{k_i}} is a first-order critical point for problem (1).

Proof. See [FGLTW02], Lemma 3.2.

We start our analysis by examining the impact of our non-monotone acceptance criteria (20) and (21). Once a trial point is accepted as a new iterate, it must be because it provides some improvement, compared to either a past reference iterate (using (21)), or to the previous iterate (using (20)). We formalize this notion by saying that iterate x_k = x_{p(k)+1} improves on iterate x_{i(k)}, where i(k) = r(p(k)) if p(k) ∈ A, that is if x_k is accepted at iteration p(k) using (21), and

i(k) = p(k) if p(k) ∉ A,   (6)

that is if x_k is accepted at iteration p(k) using (20). Now consider any iterate x_k. This iterate improved on x_{i(k)}, which was itself accepted because it improved on x_{i(i(k))}, and so on, back to the stage where x_0 is reached by this backwards referencing process. Hence we may construct, for each k, a chain of successful iterations indexed by C_k = {ℓ_1, ℓ_2, ..., ℓ_q} such that

ℓ_1 = 0, ℓ_q = k and x_{ℓ_j} = x_{i(ℓ_{j+1})} for j = 1, ..., q − 1.
We start by proving the following useful lemma. Lemma 3. Suppose that Algorithm 2.1 is applied to problem (1). Then, for each k, fe-i
area(2?(J-fe)) > 7 ^ ^
di
Proof. Consider now the backward referencing chain from iteration fc — 1, Ck-i, and any £j {j > 0) in this chain. Observe that, if p{ij) G A, then (21) implies that i{ij) = r{p{£j)) = ij-i and that
Non-monotone Trust-Region Filter Algorithm If now p{ij)
^ .4, then £j-i
= p{ij)
141
and thus
{£j^i + i,...,ej}nuc
{^^_i + i,...,ij}ns
= {ij},
where we have used (26). Moreover, (20) then implies t h a t ap(^,) > Jy^dfi t h a t (7) holds again in this case. Observe also t h a t
so
q
fc-l
area(I'(j^fc)) > ^ a p
^^ j=0
ieu
E
^p(i)
='•3-I-'
ieu
since the ap(^i) are all disjoint for nondominated points and the dominated area of the filter is updated correctly for dominated ones. Combining this inequality with (7) then gives the desired result. We now consider what happens when the filter is u p d a t e d an infinite number of times. L e m m a 4 . Suppose t h a t Algorithm 2.1 is applied to problem (1). Suppose also t h a t A S l and ASS hold and t h a t \U\ = oo. Then hm 0k = 0. keu Proof. Suppose, for the purpose of obtaining a contradiction, t h a t there exists an infinite subsequence {ki} C U such t h a t 0fc. > e for all i and for some e > 0. Applying now Lemma 3, we deduce t h a t area(2?(J'fc,+i)) > i-fy^e'^. However, (3) implies t h a t , for any k, area('Z?(^fc)) is bounded above by a constant Kp"" > 0 independent of k. Hence we obtain t h a t t <
9 '
and i must also be finite. This contradicts the fact t h a t the subsequence {ki} is infinite. Hence this latter assumption is impossible and the conclusion follows. We next examine the size of the constraint violation before and after an iteration where restoration did not occur. L e m m a 5. Suppose t h a t Algorithm 2.1 is applied to problem (1), t h a t A S l and AS3 hold, t h a t k ^ TZ and t h a t n^ satisfies (4). Then Ok < K u M ^ r ^
(8)
9ix+) < K,„„Al.
(9)
and for some constant K^M, > 0.
Proof. See [FGLTW02], Lemma 3.4. We next assess the model decrease when the trust-region radius is sufRciently small. Lemma 6. Suppose that Algorithm 2.1 is applied to problem (1). Suppose also that AS1-AS3, (12) and (15) hold, that fc 0 7^, that (10)
Xk > e, for some e > 0, and that 1+M
d e f •-
^k ^ iiiin
^l^ubg^Af^/J,
f^umbt^Af^l^
(11)
where K^^^ = max^^x ||Va:/(x)||. Then rukixk) -mk{xl)
> ^Ktmd^Ak.
Proof. See [FGLTW02], Lemma 3.5. We continue our analysis by showing, as the reader has grown to expect, that iterations have to be very successful when the trust-region radius is sufficiently small. Lemma 7. Suppose that Algorithm 2.1 is applied to problem (1). Suppose also that AS1-AS3, (15) and (10) hold, that fc 0 7^, and that Ak < min
(1 -r]2)K,^ae
def r.
^m.)
(12)
Then pk > mProof. See [FGLTW02], Lemma 3.6. Note that this proof could easily be extended if the definition of pk in (24) were altered to be of the form
d^f f{xk) mk{xk)
f{xl)-\-ek -mk{xl)
(13)
for some 0/c? provided this quantity is bounded above by a multiple of A\. Now, we also show that the test (23) will always fail when the trust-region radius is sufficiently small. Lemma 8. Suppose that Algorithm 2.1 is applied to problem (1). Suppose also that AS1-AS3, (12), (15) and (10) hold, that k^ll, that Uk satisfies (4), and that ^(1 + / ^ ) - !
Ah < min Then m/e(x/c) - mk{xl)
> KeO'l.
def
(14)
Proof. This directly results from the inequalities K09f < KsK'^^,,Af^^'^''^ < iKt^jeZifc < nikixk) -
mk{x^),
where we successively used Lemma 5, (14) and Lemma 6. We may also guarantee a decrease in the objective function, large enough to ensure that the trial point is acceptable with respect to the {9, /)-pair associated with Xk, so long as the constraint violation is itself sufficiently small. Lemma 9. Suppose that Algorithm 2.1 is appHed to problem (1). Suppose also that AS1-AS3, (15), (10) and (12) hold, that k ^Tl, that Ufc satisfies (4), and that 9k < i^nhi
„ ,—
= So-
15)
Then
f{4) < fM
- Vl^9k.
Proof. Applying Lemmas 5-7—which is possible because of (10), (12), k ^ TZ and rik satisfies (4)—and (15), we obtain that
f{xk) - f{x^) > mimkixk)
- •rrikix'^)]
and the desired inequality follows. >V^9k We now estabUsh that if the trust-region radius and the constraint violation are both small at a non-critical iterate Xk, TRQP{xk,Ak) must be compatible. Lemma 10. Suppose that Algorithm 2.1 is applied to problem (1). Suppose also that AS1-AS3, (9) and (10) hold, that (15) holds for k^TZ, and that "^Sn.
Ak < min
(16)
Suppose furthermore that 9k < mm[5o,Sn].
(17)
Then k<^n. Proof Because 9k < Sn, we know from (9) and Lemma 1 that rik satisfies (9) and (4). Moreover, since 9k < 6$, we have that (15) also holds. Assume, for the purpose of deriving a contradiction, that k GTZ, that is nfc||>K4K^Zi^+^
(18)
144
Nicholas I. M. Gould and Philippe L. Toint
where we have used (12) and the fact that Kf^A'/^ < 1 because of (16). In this case, the mechanism of the algorithm then ensures that fc — 1 ^ 7 ^ . Now assume that iterationfc— 1 is unsuccessful. Because of Lemmas 7 and 9, which hold at iteration k — 1 ^ TZ because of (16), the fact that dk = Ok-ij (9), and (15), we obtain that Pk-i > m and /(.T^t^i) < f{xk-i)
- y^9k-i.
(19)
Hence, if iteration fc — 1 is unsuccessful, this must be because x'^_-^ is not acceptable for the filter. However, if we have that eti
< {I - V^)Ok~i,
(20)
then, using the second part of (19) and the fact that {O^^^, fil_^) G a ( 4 - i . - ^ f c - i ) > ifi^k-i)
SW{J^k-i),
- /(a;+_i)][0fc-i - ^ j ] > 7 ^ ^ L i >
lAOt-if,
and a;^_j is acceptable for the filter because of (20). Since this is not the case, (20) cannot hold and we must have that ^ ^ l > (1 - vi^Wk-i
= (1 -
v^)9k-
But Lemma 5 and the mechanism of the algorithm then imply that (1 - Vi^)Ok < /^uM^ti <
^ ^ l 7o
Combining this last bound with (18) and (9), we deduce that 7O(1-\/T^)
and hence that 1-^ ^ 7o(l -
^/l¥)KAfi,i
Since this last inequality contradicts (16), our assumption that iteration fc — 1 is unsuccessful must be false. Thus iteration fc — 1 is successful and 9^ = 9'l_-y. We then obtain from (18), (9) and (9) that KAI^HA]^''
< \\nk\\ < K,,,,9k < K.,„K„i,tZifc_i <
"°%"''Mfe, 7o
which is again impossible because of (16) and because (1 — y^r?) < 1. Hence our initial assumption (18) must be false, which yields the desired conclusion. The rest of the theory follows the development in [FGLTW02], but, because the variations with respect to this reference are difficult to isolate out of context, we prefer to restate the results explicitly. We now distinguish two mutually exclusive cases. For the first, we consider what happens if there is an infinite subsequence of iterates belonging to the filter.
Lemma 11. Suppose that Algorithm 2.1 is apphed to problem (1). Suppose also that AS1-AS3, (9) hold and (15) holds for k ^ TZ. Suppose furthermore that \U\ = 00. Then there exists a subsequence {kj} CU such that hra Ok, = 0 —»oo
(21)
^
and lim xk, = 0.
(22)
Proof. Let {/cj} be any infinite subsequence of U. We observe that (21) follows from Lemma 4. Suppose now that Xfe, > £2 > 0
(23)
for all i and some t2 > 0. Suppose furthermore that there exists ea > 0 such that, for all i > IQ, Ak.„ > es.
(24)
Observe first that (21) and (9) ensure that lim \\nk, II = 0.
(25)
i--*oo
Thus (24) ensures that (12) holds for sufficiently large i and thus ki ^ TZ for such i. Now, as we noted in the proof of Lemma 6, |wfe,;(xfcj -mfc,(a;^,)| < K„,,g||nfcJ| + ^Kumh||nfc,f, which in turn, with (25), yields that lim [irik, {xk,) - mki {x'^ )] = 0.
(26)
I—too
We also deduce from (15) and AS2 that «2 ,£3
-^ 5 > 0.
(27)
We now decompose the model decrease in its normal and tangential components, that is f^kA^ki)
~ mk,{xl.) = nikiixk,) - mkiixlj
+ rukiixl^) -
mk,{x'l).
Substituting (26) and (27) into this decomposition, we find that hm inf [mfc,; {xk,) - ruk^ {xt.)] > S > 0. i—too
(28)
'
We now observe that, because fcj G U \TZ, we know from the mechanism of the algorithm that (23) must hold, that is
Nicholas I. M. Gould and Philippe L. Toint rrik, {xki) - viki {xj:^) < KgOf,.
(29)
Combining this bound with (28), we find that 9^. is bounded away from zero for i sufficiently large, which is impossible in view of (21). We therefore deduce that (24) cannot hold and obtain that there is a subsequence {ke} C {/cj} for which lim Ak, = 0. •+00
We now restrict our attention to the tail of this subsequence, that is to the set of indices ke that are large enough to ensure that (14), (15) and (16) hold, which is possible by definition of the subsequence and because of (21). For these indices, we may therefore apply Lemma 10, and deduce that iteration ki ^TZ for (. sufficiently large. Hence, as above, (29) must hold for £ sufficiently large. However, we may also apply Lemma 8, which contradicts (29), and therefore (23) cannot hold, yielding the desired result. Thus, if the filter is updated at an infinite subsequence of iterates. Lemma 2 ensures that there exists a limit point which is a first-order critical point. Our remaining analysis then naturally concentrates on the possibility that there may be no such infinite subsequence. In this case, the filter is unchanged for k sufficiently large. In particular, this means that the number of restoration iterations, \TZ\, must be finite. In what follows, we assume that /CQ > 0 is the last iteration at which the filter was updated. Lemma 12. Suppose that Algorithm 2.1 is applied to problem (1), that finite termination does not occur and that \U\ < oo. Suppose also that AS1-AS3, (9) hold and that (15) holds for k ^TZ. Then we have that fim 6ifc = 0.
(30)
/c—>oo
Furthermore, n^ satisfies (4) for aU k > ko sufficiently large. Proof. Consider any successful iterate with k > ko. Since the filter is not updated at iteration k, it follows from the mechanism of the algorithm that Pk > Vi holds and thus that f{xk) - f{xk+\) > Vi\mk{xk) - mkix'l)] > r]iKg0f > 0.
(31)
Thus the objective function does not increase for all successful iterations with k > ko. But ASl and AS3 imply (3) and therefore we must have, from the first part of this statement, that lim / ( x f c ) - / ( x f c + i ) = 0 .
fces fc—>oo
(32)
The hmit (30) then immediately follows from (31) and the fact that 9j = 6^ for all unsuccessful iterations j that immediately follow the successful iteration k, if any. The last conclusion then results from (9) and Lemma 1. We now show that the trust-region radius cannot become arbitrarily small if the (asymptoticaUy feasible) iterates stay away from first-order critical points.
Lemma 13. Suppose that Algorithm 2.1 is applied to problem (1), that finite termination does not occur and that \U\ < oo. Suppose also that AS1-AS3 hold and (15) holds for k ^ TZ. Suppose furthermore that (10) holds for all k > ko. Then there exists a A^in > 0 such that
for all k. Proof. Suppose that fci > fco is chosen sufficiently large to ensure that (17) holds and that Uk satisfies (9) for all k > ki, which is possible because of Lemma 12. Suppose also, for the purpose of obtaining a contradiction, that iteration j is the first iteration following iteration fcj for which Aj < 7o min
= loSs,
(33)
where OF d e f
.
,
r = mmt ieu is the smallest constraint violation appearing in the filter. Note also that the inequahty Aj < joAki, which is implied by (33), ensures that j > fci + 1 and hence that j — I > ki and thus that j — 1 ^ TZ. Then the mechanism of the algorithm and (33) imply that Aj_i < —Aj < 5,
(34)
70
and Lemma 7, which is applicable because (33) and (34) together imply (12) with k replaced by j — 1, then ensures that Pj-i > m-
(35)
Furthermore, since rij^i satisfies (9), Lemma 1 implies that we can apply Lemma 5. This together with (33) and (34), gives that 0+_i < «...,/i,2_i < (1 - Vi^)e'.
(36)
We may also apply Lemma 9 because (33) and (34) ensure that (12) holds and because (15) also holds for j — I > ki. Hence we deduce that /(
This last relation and (36) ensure that x t ^ is acceptable for the filter. Combining this conclusion with (35) and the mechanism of the algorithm, we obtain that Aj > / i j - i . As a consequence, and since (23) also fails at iteration j — 1, iteration j cannot be the first iteration foUowing fci for which (33)
holds. This contradiction shows that A^ > lo^s for all k > ki, and the desired result follows if we define ^min
=inm\Ao,...,Ak,,'yoSs].
We may now analyze the convergence of Xk itself. Lemma 14. Suppose that Algorithm 2.1 is applied to problem (1), that finite termination does not occur and that \U\ < oo. Suppose also that AS1-AS3, (9) hold and (15) holds for k^U. Then liminf Xfc = 0.
(37)
fc—*CX)
Proof. We start by observing that Lemma 12 implies that the second conclusion of (9) holds for k sufficiently large. Moreover, as in Lemma 12, we obtain (31) and therefore (32) for each k £ S, k > ko- Suppose now, for the purpose of obtaining a contradiction, that (10) holds and notice that mfc(xfc) - mkix'l^) = mk{xk) - mk{xl) + mfc(x^) - mfc(a;^).
(38)
Moreover, note, as in Lemma 6, that \mk{xk) - mfc(Xfc)| < Kubgll^fcll + K„„h||nfcf, which in turn yields that lim [mk{xk) - nikixk)] = 0 k—*oo
because of Lemma 12 and the first inequality of (9). This limit, together with (31), (32) and (38), then gives that lim [mfc(4)~mfe(x+)]=0.
(39)
k€S
But (15), (10), AS2 and Lemma 13 together imply that, for all k > ko "mkixk) - mfc(x^) > K.^^Xk I
f,Ak
(40)
Pk
immediately giving a contradiction with (39). Hence (10) cannot hold and the desired result follows. We may summarize all of the above in our main global convergence result. Lemma 15. Suppose that Algorithm 2.1 is apphed to problem (1) and that finite termination does not occur. Suppose also that AS1-AS3 and (9) hold, and that (15) holds for k ^ TZ. Let {xk} be the sequence of iterates produced by the algorithm. Then either the restoration procedure terminates unsuccessfully by converging to an infeasible first-order critical point of problem (16), or there is a subsequence {kj} for which hm Xk.: = X* j->oo
and
first-order
critical point for problem (1).
Proof. Suppose that the restoration iteration always terminates successfully. From AS3, Lemmas 11, 12 and 14, we obtain that, for some subsequence {k_j},

lim_{j→∞} θ_{k_j} = lim_{j→∞} χ_{k_j} = 0.   (41)

The conclusion then follows from Lemma 2.
4 Conclusion and Perspectives

We have introduced a trust-region SQP-filter algorithm for general nonlinear programming, and have shown this algorithm to be globally convergent to first-order critical points. The proposed algorithm differs from that discussed by [FL02], notably because it uses a decomposition of the step in its normal and tangential components and imposes some restrictions on the length of the former. It also differs from the algorithm of [FGLTW02] in two main aspects. The first and most important is that the rule for deciding whether a trial point is acceptable for the filter is non-monotone, and allows, in some circumstances, acceptance of points that are dominated by other filter pairs. This gives hope that an SQP filter algorithm can be developed without introducing second-order correction steps. The second is that the algorithm no longer relies on the definition of a "margin" around the filter, but directly uses the dominated area of the filter as an acceptance criterion.
References

[BSS87] R. H. Byrd, R. B. Schnabel, and G. A. Shultz. A trust region algorithm for nonlinearly constrained optimization. SIAM Journal on Numerical Analysis, 24, 1152-1170, 1987.
[CGT00] A. R. Conn, N. I. M. Gould, and Ph. L. Toint. Trust-Region Methods. Number 01 in 'MPS-SIAM Series on Optimization'. SIAM, Philadelphia, USA, 2000.
[CGST93] A. R. Conn, N. I. M. Gould, A. Sartenaer, and Ph. L. Toint. Global convergence of a class of trust region algorithms for optimization using inexact projections on convex constraints. SIAM Journal on Optimization, 3(1), 164-221, 1993.
[DAW99] J. E. Dennis, M. El-Alem, and K. A. Williamson. A trust-region approach to nonlinear systems of equalities and inequalities. SIAM Journal on Optimization, 9(2), 291-315, 1999.
[HT95] M. El-Hallabi and R. A. Tapia. An inexact trust-region feasible-point algorithm for nonlinear systems of equalities and inequalities. Technical Report TR95-09, Department of Computational and Applied Mathematics, Rice University, Houston, Texas, USA, 1995.
[FL98] R. Fletcher and S. Leyffer. User manual for filterSQP. Numerical Analysis Report NA/181, Department of Mathematics, University of Dundee, Dundee, Scotland, 1998.
[FL02] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Mathematical Programming, 91(2), 239-269, 2002.
[FGLTW02] R. Fletcher, N. I. M. Gould, S. Leyffer, Ph. L. Toint, and A. Wächter. Global convergence of trust-region SQP-filter algorithms for nonlinear programming. SIAM Journal on Optimization, 13(3), 635-659, 2002.
[FLT98] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of an SLP-filter algorithm. Technical Report 98/13, Department of Mathematics, University of Namur, Namur, Belgium, 1998.
[FLT02] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of a filter-SQP algorithm. SIAM Journal on Optimization, 13(1), 44-59, 2002.
[Omo89] E. O. Omojokun. Trust region algorithms for optimization with nonlinear equality and inequality constraints. PhD thesis, University of Colorado, Boulder, Colorado, USA, 1989.
[Sar97] A. Sartenaer. Automatic determination of an initial trust region in nonlinear programming. SIAM Journal on Scientific Computing, 18(6), 1788-1803, 1997.
[Toi88] Ph. L. Toint. Global convergence of a class of trust region methods for nonconvex minimization in Hilbert space. IMA Journal of Numerical Analysis, 8(2), 231-252, 1988.
[Toi96] Ph. L. Toint. A non-monotone trust-region algorithm for nonlinear optimization subject to convex constraints. Mathematical Programming, 77(1), 69-94, 1997.
[Ulb04] S. Ulbrich. On the superlinear local convergence of a filter-SQP method. Mathematical Programming, Series B, 100(1), 217-245, 2004.
[Var85] A. Vardi. A trust region algorithm for equality constrained minimization: convergence properties and implementation. SIAM Journal on Numerical Analysis, 22(3), 575-591, 1985.
[WB01] A. Wächter and L. T. Biegler. Global and local convergence of line search filter methods for nonlinear programming. Technical Report CAPD B-01-09, Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, USA, 2001. Available on http://www.optimizationonline.org/DB.HTML/2001/08/367.html.
[Yua94] Y. Yuan. Trust region algorithms for nonlinear programming. In Z. C. Shi, ed., 'Contemporary Mathematics', Vol. 163, pp. 205-225, Providence, Rhode Island, USA, 1994. American Mathematical Society.
Factors Affecting the Performance of Optimization-based Multigrid Methods*

Robert Michael Lewis¹ and Stephen G. Nash²

¹ Department of Mathematics, College of William & Mary, P.O. Box 8795, Williamsburg, Virginia, 23187-8795, USA. [email protected]
² Associate Dean, School of Information Technology and Engineering, Mail Stop 5C8, George Mason University, Fairfax, VA 22030, USA. [email protected]

* This research was supported by the National Aeronautics and Space Administration under NASA Grant NCC-1-02029, and by the National Science Foundation under grant DMS-0215444.

Summary. Many large nonlinear optimization problems are based upon discretizations of underlying continuous functions. Optimization-based multigrid methods are designed to solve such discretized problems efficiently by taking explicit advantage of the family of discretizations. The methods are generalizations of more traditional multigrid methods for solving partial differential equations. The goal of this paper is to clarify the factors that affect the performance of an optimization-based multigrid method. There are five main factors involved: (1) global convergence, (2) local convergence, (3) role of the underlying optimization method, (4) role of the multigrid recursion, and (5) properties of the optimization model. We discuss all five of these issues, and illustrate our analysis with computational examples. Optimization-based multigrid methods are an intermediate tool between general-purpose optimization software and customized software. Because discretized optimization problems arise in so many practical settings we think that they could become a valuable tool for engineering design.
1 Introduction

Many large nonlinear optimization problems are based upon discretizations of underlying continuous functions. For example, the underlying infinite dimensional model may be governed by a differential or integral equation representing, for example, the flow of air over an airplane. When solved computationally, the underlying functions are typically approximated using a discretization (an approximation at a discrete set of points) or a finite-element approximation (for example, approximating the solution by a spline function). We focus here on discretizations to simplify the discussion.

Optimization-based multigrid methods are designed to solve such discretized problems efficiently by taking explicit advantage of the family of
discretizations. That is, they perform computations on a less costly version of the optimization problem based on a coarse discretization (grid), and use the results to improve the estimate of the solution on a finer discretization (grid). On each individual grid the algorithm applies iterations of a traditional optimization method to improve the estimate of the solution. The methods are generalizations of more traditional multigrid methods for solving partial differential equations.

The goal of this paper is to clarify the factors that affect the performance of an optimization-based multigrid method. There are five main factors involved:
• Global convergence: Is the algorithm guaranteed to converge to a solution of the optimization problem?
• Local convergence: How rapidly does the algorithm converge close to the solution?
• Behavior of the underlying optimization method: What is the role of the underlying optimization method in improving the estimate of the solution?
• Behavior of the multigrid recursion: What is the role of the multigrid recursion in improving the estimate of the solution?
• Properties of the optimization model: What types of optimization models are well-suited to an optimization-based multigrid method?
These questions will be addressed in subsequent sections, but it is possible to give brief responses here. In short:
• Optimization-based multigrid algorithms can be implemented in a manner that guarantees convergence to a local solution.
• The multigrid algorithm can be very efficient, with a fast linear rate of convergence near the solution (the same as for traditional multigrid algorithms).
• The underlying optimization method and the multigrid recursion are complementary. The underlying optimization method is effective at approximating the high-frequency (rapidly changing) components of the solution, and the multigrid recursion is effective at approximating the low-frequency (slowly changing) components of the solution. Thus, combining the two approaches is far more effective than using either one separately.
• The multigrid algorithm will be effective if the reduced Hessian of the optimization model is nearly diagonal in the Fourier (frequency) basis. There is evidence that suggests that large classes of optimization models have this property. (The reduced Hessian should also be positive semidefinite, but this will always be true near a local solution of an optimization problem.)
Multigrid algorithms were first proposed for solving elliptic linear partial differential equations (PDEs). In that setting they are well known for their efficiency, with computational costs that are linear in the number of variables. Multigrid algorithms have been extended to nonlinear PDEs, and to non-elliptic PDEs, but their behavior in these settings is not as ideal. More
specifically, multigrid algorithms for nonlinear equations are not guaranteed to converge to a solution of the equations, and multigrid algorithms for non-elliptic equations can be far less efficient than for elliptic equations.

We are proposing that optimization-based multigrid algorithms be applied to discretizations of models of the form

minimize F(a) = f(a, u(a)),
(1)
where a is an infinite-dimensional set of design variables (for example, a might be a function on an interval), and u = u(a) is a set of state variables. Given a, the state variables are defined implicitly by a system of equations

S(a, u(a)) = 0
(2)
in a and u. We assume that S(a, u) = 0 is either a system of partial differential equations or a system of integral equations. The design variables a might represent boundary conditions, might define the shape of a machine part, etc. There may also be additional constraints on the design and state variables. Problems of this type can be very difficult to solve using general-purpose optimization software.

In other words, we are applying multigrid methods to an optimization model, i.e., to the minimization of a nonlinear function subject to nonlinear constraints. Some of these constraints may be inequalities that are not always active, and become active in an abrupt, discontinuous manner. This is in contrast to applying multigrid methods to a system of nonlinear equations, where all the equalities (equations) will be active (satisfied) at a solution.

Given the limitations of traditional multigrid algorithms applied to non-elliptic or nonlinear equations, it might seem questionable to apply optimization-based multigrid methods to the more general model (1)-(2). (We are not assuming that the state equation (2) is elliptic or linear.) To further complicate matters, the performance of the multigrid algorithm depends on the properties of the reduced Hessian for (1)-(2), which in turn depends on the Jacobian of S(a, u(a)) and the Hessian of F(a). If the state equation were ill-suited for multigrid, would not the optimization model be even worse?

Perhaps counter-intuitively, we believe that the optimization model can be a better setting for multigrid than a system of PDEs. In particular, it is possible to design the multigrid algorithm so that it is guaranteed to converge to a local solution of the optimization problem. (Typically, optimization algorithms have better guarantees of convergence than algorithms for solving systems of nonlinear equations.) Also, broad classes of optimization models appear to be well-suited to multigrid. An additional advantage is that the optimization setting is more general than the nonlinear-equations context in the sense that it can include auxiliary constraints (including inequalities) [LN05].

The effectiveness of multigrid for optimization depends on all five of the factors mentioned above. Without careful design of the multigrid algorithm, it
is not guaranteed to converge. Without the fast local convergence, the multigrid algorithm is not competitive with existing techniques. Both the underlying optimization algorithm and the multigrid recursion are needed to achieve the fast local convergence. And the multigrid recursion will be of no value if the reduced Hessian does not have the right properties.

Our results build on existing work, especially on the extensive research on multigrid algorithms for PDEs (see, e.g., [Bra77, Hac85, McC89]), but also on multigrid optimization research [FP95, KTS95, Nas00, Ta'91, TT98, ZC92]. In our work we have focused on geometric multigrid, in which the optimization problem decomposes along length-scales.

Here is an outline of the paper. Section 2 gives a template for the method. Section 3 describes several broad categories of models that are candidates for an optimization-based multigrid method. Sections 4 and 5 discuss global and local convergence, respectively. Sections 6 and 7 explain the complementary roles of the underlying optimization algorithm and the multigrid recursion. The properties of the reduced Hessian are the topic of Section 8. Section 9 has computational examples, and conclusions are in Section 10.

The optimization-based multigrid methods we discuss are not completely general-purpose optimization methods. They assume that the optimization problem is based on a discretization, and not all large optimization problems are of this type. Nevertheless, the methods have great flexibility. They can adapt to specialized software (e.g., for grid generation, and for solving the underlying state equation), and they require little effort from the user beyond that required to apply a traditional optimization method to the problem. Thus, these optimization-based multigrid methods are an intermediate tool between general-purpose optimization software (which may have difficulty solving these problems) and customized software and preconditioners (which require extensive human effort to develop). Because discretized optimization problems arise in so many practical settings (e.g., aircraft design) we think that they could become a valuable tool for engineering design.
2 The Multigrid Algorithm

Our goal in this paper is to clarify the issues that affect the performance of an optimization-based multigrid algorithm. We specify an algorithm with sufficient generality for the results to have broader interest, yet with sufficient detail that properties of the algorithm (such as convergence theorems) can be deduced. The algorithm below, called MG/Opt, is an attempt to satisfy these conflicting aims.

The description of the algorithm MG/Opt is taken from [LN05]. The recursion is a traditional multigrid V-cycle. The coarse-grid subproblems are motivated by the full approximation scheme, a multigrid method for solving systems of nonlinear equations [McC89]. Yet, despite the motivation of more
traditional multigrid methods for equations, MG/Opt is truly based on optimization. The solver ("smoother") used on the fine grid is an optimization algorithm, and the coarse-grid subproblems are optimization problems. MG/Opt differs in two other ways from a more traditional multigrid algorithm. The coarse-grid subproblem imposes bounds on the solution (thus limiting the length of the step taken at each iteration), and the result of the coarse-grid subproblem is used to define a search direction for the fine grid. This search direction is used within a line search, a tool for ensuring that the algorithm makes progress toward the solution (as measured in terms of the value of the objective function). These additions guarantee that MG/Opt will converge to a local solution to the optimization problem. (See Section 4.)

Several steps in the algorithm require explanation. In two places there is the requirement to "partially minimize" F(a). In our implementation this means to apply some (typically small) number of iterations of a nonlinear optimization algorithm. In our computational tests, we use one outer iteration of a truncated-Newton method.

The algorithm MG/Opt refers to multiple versions of quantities corresponding to the various grids. At each iteration of MG/Opt, however, there are only references to two grids: the current grid, identified by the symbol h, and the next coarser grid, identified by H. Thus a_h is the version of the vector of design variables on the current grid, and F_H is the objective function on the next coarser grid.

MG/Opt requires update and downdate operators, I_H^h and I_h^H, respectively. These operators transform a vector on one grid to a vector on the next finer or coarser grid. For theoretical reasons, we require that these two operators be essentially the transposes of one another: I_h^H = constant × (I_H^h)^T, where the constant is required to be a positive number. This is a standard assumption [Bri87]. In other respects, MG/Opt offers considerable flexibility. Any optimization algorithm could be used. There are no additional assumptions about the update and downdate operators. No assumptions are made about the relationship among the various grids (e.g., they need not be nested, though this may be desirable). The line search is not specified in detail. Of course, these choices would have a major effect on the practical performance of MG/Opt.

One iteration of the algorithm takes a step from a_h^{(0)}, an initial estimate of the solution on the finest grid, to a_h^{(1)}, via the following steps:
• If on the coarsest grid, minimize F_h(a_h) = f_h(a_h), with initial estimate a_h^{(0)}, to obtain a_h^{(1)}.
• Otherwise:
  - Partially minimize F_h(a_h) = f_h(a_h), with initial estimate a_h^{(0)}, to obtain a_{h,1}.
  - Compute a_{H,1} = I_h^H a_{h,1} and v_H = ∇F_H(a_{H,1}) − I_h^H ∇F_h(a_{h,1}).
  - Recursively apply MG/Opt (with initial estimate a_{H,1}) to solve
      minimize_{a_H} F_H(a_H) − v_H^T a_H
    subject to the bound constraints a_{H,low} ≤ a_H ≤ a_{H,up},
    to obtain a_{H,2}. (See below for a definition of the bounds.)
  - Compute the search direction e_h = I_H^h (a_{H,2} − a_{H,1}).
  - Use a line search to obtain a_{h,2} = a_{h,1} + α e_h.
  - Partially minimize F_h(a_h), with initial estimate a_{h,2}, to obtain a_h^{(1)}.
The coarse-grid optimization problem includes the bound constraints a_{H,low} ≤ a_H ≤ a_{H,up}. The bounds are defined by the formulas

a_{H,low} = a_{H,1} − γ e,   a_{H,up} = a_{H,1} + γ e,

where

e = (1, ..., 1)^T,   γ = max{ ‖v_H‖, ‖∇F_H(a_{H,1})‖, ‖I_h^H ∇F_h(a_{h,1})‖ }.

The bounds are used to limit the step e_h, with the goal of only using the coarse-grid approximation near to the current estimate of the solution. We only trust the coarse-grid approximation within a limited region. This is an adaptation of ideas from trust-region methods [NS96]. In the preceding description of the algorithm we have chosen an L^∞ trust region to limit the length of steps taken in the multigrid recursion. Alternatively, we could have chosen an L² or H¹ trust region. One iteration of the resulting recursion is sketched below.
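The following Python sketch shows the shape of one MG/Opt V-cycle under stated assumptions: the user supplies the grid-dependent gradient, the "partially minimize" smoother and the line search; the coarse-grid bound constraints (the L^∞ trust region above) are omitted for brevity; and the 1-D transfer operators are just one possible pair satisfying the positive-constant-times-transpose assumption. It is not the Matlab implementation of [LN05].

```python
import numpy as np

def interpolate(aH):
    """I_H^h: linear interpolation from a coarse 1-D grid (m points) to the
    next finer grid (2m - 1 points)."""
    ah = np.zeros(2 * len(aH) - 1)
    ah[::2] = aH
    ah[1::2] = 0.5 * (aH[:-1] + aH[1:])
    return ah

def restrict(ah):
    """I_h^H: full weighting, equal to 1/2 times the transpose of `interpolate`."""
    aH = ah[::2].copy()
    aH[:-1] += 0.5 * ah[1::2]
    aH[1:] += 0.5 * ah[1::2]
    return 0.5 * aH

def mgopt(level, a, grad, smooth, linesearch, v=None):
    """One V-cycle of MG/Opt (sketch only).

    grad(level, a)             -- gradient of F on grid `level` (user supplied)
    smooth(level, a, v, its)   -- partially minimize F(a) - v.a (user supplied)
    linesearch(level, a, e, v) -- step length along e (user supplied)
    v                          -- linear correction term carried by the recursion
    """
    if v is None:
        v = np.zeros_like(a)
    if level == 0:                                    # coarsest grid: solve accurately
        return smooth(0, a, v, 25)
    a1 = smooth(level, a, v, 1)                       # pre-smoothing ("partially minimize")
    aH1 = restrict(a1)
    vH = grad(level - 1, aH1) - restrict(grad(level, a1) - v)   # coarse correction term
    aH2 = mgopt(level - 1, aH1, grad, smooth, linesearch, vH)   # recursive coarse solve
    e = interpolate(aH2 - aH1)                        # search direction e_h
    alpha = linesearch(level, a1, e, v)               # safeguards global convergence
    a2 = a1 + alpha * e
    return smooth(level, a2, v, 1)                    # post-smoothing
```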
2.1 Software for MG/Opt and Related Algorithms

In our computational tests we use the Matlab implementation of MG/Opt from [LN05]. It uses the truncated-Newton algorithm TN [NN91] as the underlying optimization method. "Partially minimize" is implemented as one outer iteration of TN. "If on coarsest grid, minimize" is implemented as using TN with a maximum of 25 outer iterations. The line search in MG/Opt is the same line search as in TN.

We compare MG/Opt with two other algorithms. One is just the application of TN to the optimization problem on the finest grid, and is referred to as "optimization". The other, called "successive refinement", applies TN on the coarsest grid (with an upper limit of 25 outer iterations), then interpolates the solution onto the next finer grid and applies TN again with this interpolated vector as an initial guess. This continues until the finest grid, when TN is applied to the full problem without an upper limit on the number of outer iterations.

We do not compare MG/Opt with traditional multigrid methods. In part, this is because the two approaches are for different categories of problems. Perhaps more importantly, we are trying to understand the properties of optimization-based multigrid methods and not the behavior of a specific software implementation of MG/Opt. By using test software with many common components, we hope to isolate the properties of the underlying approach. For further justification and commentary on this issue, see [LN05].
3 Some Typical Problem Categories

Some insight into the nature of the optimization model (1) and the way in which the governing equation (2) affects it can be obtained by examining the Hessian of F. Implicit differentiation of (2) yields the Jacobian of u(a) with respect to a:

du/da = −S_u^{−1} S_a,

where S_a and S_u are the derivatives of S with respect to a and u. The operator S_u^{−1} is the solution operator of the linearized state equation with respect to the state u. The Lagrangian of the optimization model is

L(a, u; λ) = f(a, u) + ⟨λ, S(a, u)⟩,

and its Hessian is

∇²_{(a,u)} L(a, u; λ) = ∇²_{(a,u)} f(a, u) + ∇²_{(a,u)} S(a, u) λ.

The Hessian of L with respect to both a and u has the block structure

∇²_{(a,u)} L = ( L_aa  L_au ; L_ua  L_uu ).

We then have the following expression for the Hessian of F with respect to a:

∇²F = L_aa − L_au S_u^{−1} S_a − S_a^* S_u^{−*} L_ua + S_a^* S_u^{−*} L_uu S_u^{−1} S_a,   (3)

where * denotes the adjoint of an operator. This is the reduced Hessian [GMW81, NS96] of f(a, u) with respect to the equality constraints S(a, u) = 0 in the equivalent equality constrained formulation of (1):

minimize_{a,u} f(a, u) subject to S(a, u) = 0.   (4)
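In the finite-dimensional (discretized) setting, formula (3) is just a congruence transformation of the Lagrangian Hessian by a basis for the null space of the linearized constraints. The short NumPy illustration below makes this explicit; the block matrices are assumed to be given as plain arrays, and the function name is only illustrative.

```python
import numpy as np

def reduced_hessian(L_aa, L_au, L_ua, L_uu, S_a, S_u):
    """Finite-dimensional illustration of formula (3): eliminate the state
    variables using du/da = -S_u^{-1} S_a."""
    du_da = -np.linalg.solve(S_u, S_a)            # Jacobian of u(a)
    W = np.vstack([np.eye(S_a.shape[1]), du_da])  # basis for the constraint null space
    L = np.block([[L_aa, L_au], [L_ua, L_uu]])    # Hessian of the Lagrangian
    return W.T @ L @ W                            # reduced Hessian of F, as in (3)
```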
In many models of interest, the state equation is a differential equation, or an integral operator. The reduced Hessian that results, however, may be a very different type of operator, though it might also be an operator like a differential or integral operator. As can be seen from (3), the dependence of the reduced Hessian on the state equation is complicated. Since there are myriad possibilities for the type of state equation and the particular objective, there are myriad possibilities for the nature of the resulting optimization model. We contrast here two particular types of optimization models, which differ in the nature of their nonlinearity as manifest in the reduced Hessian of the objective.
• The reduced Hessian amplifies high frequencies, and thus is a roughening operator. It may amplify high frequencies over all of the domain of the functions it acts on, or only in a region of their domain, or perhaps only microlocally (i.e., only in a subdomain, and only for frequencies with certain directions). Informally speaking, such reduced Hessians behave like differential operators. The minimizers of such models tend to be well-determined insofar as the reduced Hessian at a solution is a coercive operator. (An operator A is coercive if there is a positive constant C such that ‖Ax‖ ≥ C‖x‖ for all x.)
• The reduced Hessian is a smoothing operator. High frequency components are damped over the domain of the functions the reduced Hessian acts on. The minimizers of such models tend to be under-determined, with the high-frequency components effectively unrecoverable. Fine discretizations of the optimization variables introduce as much spurious content as real detail about the solution.
Our focus here is models which exhibit the first type of behavior. We know from the prototype of applying conjugate gradients (CG) to Laplace's equation (see Section 6) that the solution will be slow if no preconditioning is used. On the other hand, multigrid is extremely effective, either as a solution technique in itself or as a preconditioner for CG. Consequently, if the reduced Hessian resembles a differential operator then the optimization problem will likely be difficult to solve. At the same time, if the reduced Hessian resembles an elliptic differential operator then multigrid will be an effective acceleration technique.

For models of the second class, successive refinement is frequently an effective and straightforward strategy. In successive refinement one first solves the problem on a coarse grid, then uses that coarse grid solution transferred to a finer mesh as a starting point for the solution on the finer mesh, and so on. Since the low-frequency components of the solution are the ones that are most well-determined, and they are resolved in the coarser mesh solves, this approach is appropriate for underdetermined problems.
4 Global Convergence

A fundamental question for any optimization algorithm is whether it is guaranteed to converge to a solution of the optimization problem. For many traditional algorithms for solving

minimize F(a)

it is possible to prove (under appropriate assumptions) that

lim_{k→∞} ‖∇F(a_k)‖ = 0,

where a_k is the k-th estimate of the solution produced by the algorithm. If the level set {a | F(a)
recursion generates a descent direction, so that the search direction from the multigrid recursion also contributes to the improvement of the function value. The following theorem is adapted from [LN02].

Theorem 2. In algorithm MG/Opt, assume that

for some constant γ > 0, and that the multigrid subproblem generates a point a_{H,2} satisfying F_H(a_{H,2}) ≤ F_H(a_{H,1}). If the bounds in MG/Opt are chosen so that ‖a_{H,up} − a_{H,low}‖ is sufficiently small, then e_h is a descent direction for F_h at a_{h,1}, i.e.,

e_h^T ∇F_h(a_{h,1}) < 0.
MG/Opt is based on a line-search strategy. An alternative approach for guaranteeing convergence is to use a multigrid-style optimization algorithm based on a trust-region strategy [SGT04]. (There are some differences between MG/Opt and the trust-region method in [SGT04]. For example, the trust-region method uses either the multigrid recursion or the underlying optimization method, but not both. Thus it does not share all the computational properties of MG/Opt.)

If MG/Opt is applied without a line search for the multigrid search direction, that is, if we set

a_{h,2} = a_{h,1} + e_h,

then MG/Opt is not guaranteed to converge. See [LN05] for an example where this occurs. Likewise, if a nonlinear multigrid algorithm is applied to the optimality conditions for the optimization model, there is no guarantee that the algorithm will converge to a stationary point of the optimization model. This is analogous to the behavior of algorithms applied to solve

minimize F(a)   and   ∇F(a) = 0.

Although the algorithms for both problems are similar, and are typically based on Newton's method for solving nonlinear equations, the convergence guarantees are different. For the optimization problem, it is typically possible to guarantee convergence to a stationary point of the optimization problem. For the nonlinear equations, the best that can typically be guaranteed is that the algorithm will find a local solution to minimize ‖∇F(a)‖. Thus, the optimization setting provides better guarantees of convergence.
5 Local Convergence

In the previous section we discussed global convergence, whether MG/Opt is guaranteed to converge to a solution of the optimization problem. Here we discuss local convergence, or the rate at which MG/Opt converges close to the solution. For reasons we explain below, we can expect asymptotically that the line search will use a step of α = 1. For this reason the behavior of MG/Opt will approximate the behavior of a traditional linear multigrid method applied to the Newton equations for (1):

∇²F(a_k) x = −∇F(a_k).

Thus the convergence rate for MG/Opt will be the same as for a traditional multigrid algorithm, i.e., it will have a linear rate of convergence. When applied to appropriate PDEs, multigrid algorithms are renowned for their efficiency. The computational cost of solving a linear system with N equations is O(N), which is an optimal computational complexity. The convergence rate is linear, but the rate constant is significantly less than one. This is in contrast to the steepest-descent method, which also converges linearly but typically has a rate constant that is very close to one.

It would also be possible to stop using MG/Opt at some iteration and just use the underlying optimization method starting from the current estimate of the solution. If the underlying optimization method had a more rapid (e.g., superlinear) rate of convergence, and if it was desirable to compute the solution to high accuracy, this could be a sensible approach.

We now return to the issue of the effect (if any) of the line search on the convergence rate. In a traditional multigrid algorithm, no line search is applied to the result of the multigrid recursion, but MG/Opt uses a line search to guarantee global convergence. It is possible that the line search could interfere with the desired rapid convergence associated with multigrid algorithms. It turns out that this is not a major concern, at least as MG/Opt approaches the solution of the optimization problem. It is possible to prove that, as MG/Opt converges, the multigrid search direction e_h can be expected to be "well scaled" in the sense that a step of α = 1 will be accepted in the line search [LN05]. In other words, a step of α = 1 will approximate the solution of

minimize_α F_h(a_{h,1} + α e_h).
This is a property shared by Newton-like methods for optimization. Thus, as MG/Opt converges, the line search will not interfere with the rapid convergence of the multigrid algorithm.
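The role of the unit step can be illustrated with a generic Armijo backtracking search that tries α = 1 first. This is a sketch of the standard textbook procedure, not the safeguarded line search used inside MG/Opt.

```python
import numpy as np

def backtracking_line_search(F, gradF, a, e, c1=1e-4, shrink=0.5, max_tries=30):
    """Armijo backtracking that tries the unit step alpha = 1 first.
    If the direction e is well scaled (as the text argues happens for the
    multigrid direction near the solution), the full step is accepted."""
    f0 = F(a)
    slope = float(np.dot(gradF(a), e))   # directional derivative, negative for descent
    alpha = 1.0
    for _ in range(max_tries):
        if F(a + alpha * e) <= f0 + c1 * alpha * slope:
            return alpha                 # sufficient decrease: accept this step
        alpha *= shrink                  # otherwise take a more conservative step
    return alpha
```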
6 The Underlying Optimization Algorithm

There are many choices for the underlying optimization algorithm in MG/Opt. If we focus on large problems, then the set of practical choices is much smaller. It will typically not be feasible to use a method that requires matrix storage (such as Newton's method, or a full quasi-Newton method), nor will it be feasible to use a direct-search method [KLT03, NS96]. This leaves three commonly-used classes of methods:

• truncated-Newton methods (where the conjugate-gradient method is applied to the Newton equations to compute a search direction)
• limited-memory quasi-Newton methods (where a limited number of quasi-Newton updates are used to compute a search direction)
• nonlinear conjugate-gradient methods (adaptations of the linear conjugate-gradient method)
If these methods are applied to a quadratic function, and if exact line searches are used, then the three methods are equivalent up to rounding errors; in fact, they are equivalent to applying the linear conjugate-gradient method to the quadratic function. On more general problems, their qualitative behavior is similar, at least as they converge to a local solution of a nonlinear optimization problem satisfying the second-order sufficiency conditions for optimality.

Our overall goal in this paper is to explain the factors that affect the performance of MG/Opt. If we are trying to understand the influence of the underlying optimization algorithm in MG/Opt, it is appropriate to study the behavior of the linear conjugate-gradient algorithm (CG algorithm) applied to a quadratic function. Imagine applying the CG algorithm to the solution of the quadratic problem

minimize ½ xᵀAx − xᵀb     (5)

where A is a symmetric, positive-definite matrix. The k-th estimate of the solution is obtained by minimizing the quadratic function over the Krylov subspace span{b, Ab, A²b, ..., A^{k−1}b}; that is, the subspace spanned by the sequence of Krylov vectors generated by the CG algorithm. The CG algorithm will work well if the Krylov subspace contains good approximations to the solution of (5). In the case of MG/Opt, the matrix A will correspond to the reduced Hessian, and the properties of the reduced Hessian will determine the behavior of the CG algorithm.

If the reduced Hessian is a differential operator (such as the one-dimensional Laplacian) then A will be a "roughener". That is, multiplying a vector by A
will amplify the high-frequency components, and decrease the relative magnitude of the low-frequency components. Thus the Krylov vectors will become ever more oscillatory. In contrast, the solution operator A⁻¹ will be a smoother. The CG algorithm will compute the k-th estimate of the solution A⁻¹b via

x_k = Σ_{i=0}^{k−1} α_i Aⁱ b,

for appropriately chosen coefficients α_i. Since the solution is smoother than the first Krylov vector b, and the Krylov vectors get ever rougher, the performance of the CG algorithm will deteriorate as k increases. The CG algorithm is trying to approximate a vector dominated by low-frequency components using a sequence of Krylov vectors that are dominated by high-frequency components. This is counter-productive. The CG algorithm will be most effective during its first few iterations, and it will be most effective at approximating the high-frequency components of the solution. In Figure 1 below we plot the Fourier transforms of successive Krylov vectors from a least-squares optimization problem with a hyperbolic differential equation as a constraint; the problem and figure are from [LN05]. The Krylov vectors have been normalized to have unit length. The first vector is plotted on the left and its successor on the right. In contrast, the Fourier transform of the solution (in Figure 2) is dominated by low-frequency terms.
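An experiment in the spirit of Figure 1 is easy to reproduce. The sketch below uses our own small test problem (the one-dimensional negative Laplacian and a random starting vector), not the hyperbolic least-squares problem of [LN05]; it generates normalized Krylov vectors and reports a crude measure of where their Fourier content sits, which drifts toward high frequencies as k grows.

```python
import numpy as np

n = 256
h = 1.0 / (n - 1)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2   # 1D negative Laplacian

rng = np.random.default_rng(1)
v = rng.standard_normal(n)
v /= np.linalg.norm(v)

krylov = [v]
for _ in range(5):                       # b, Ab, A^2 b, ..., each normalized
    v = A @ v
    v /= np.linalg.norm(v)
    krylov.append(v)

for k, vk in enumerate(krylov):
    spec = np.abs(np.fft.rfft(vk))
    centroid = (np.arange(spec.size) * spec).sum() / spec.sum()
    print(f"Krylov vector {k}: spectral centroid {centroid:.1f}")  # grows with k
```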
Fig. 1. The magnitude of the FFTs of successive Krylov vectors.
Fig. 2. The FFT of the solution of the optimization problem.

As we discuss in the next section, the multigrid recursion is an ideal complement to the CG method, in that it is successful at approximating the low-frequency components of the solution. Together, the algorithms are able to approximate all components of the solution efficiently.

CG also behaves perversely if the reduced Hessian is a smoothing operator. In this case the successive Krylov vectors will become smoother. On the other hand, the inverse of the reduced Hessian will be a roughening operator, so the minimizer of the quadratic model is oscillatory. Consequently, successive iterations of CG fail to introduce helpful information. Once again, CG is doing the wrong thing.
7 The Multigrid Recursion

If the optimization problem has a nonlinear structure that resembles an elliptic operator, then standard multigrid ideas apply. The discussion here follows that in [Bri87], to which we refer the reader for an explanation of multigrid. However, we recast some of the ideas in the context of optimization.

We return to the example of the previous section. Consider the solution of Laplace's equation Δu = f via the minimization of the quadratic

½ uᵀAu + fᵀu,

where A is the discretization of −Δ. As is well known, CG will typically first find the component of the solution that lies in the span of the eigenvectors associated with the largest eigenvalues of A. In the case of the Laplacian, the largest eigenvalues are associated with eigenvectors that represent oscillatory functions. Thus, CG quickly captures the high-frequency components of the solution, and the error u − u* consists of smooth features. If we transfer the information to a coarser grid and continue with CG, then CG will quickly reduce the high-frequency component of the error at the length-scale of the coarser grid. However, high-frequency components on the coarser grid are relatively smooth on the finer grid. This means that the coarse grid calculation resolves a significantly different part of the solution than the fine grid CG iterations. If we imagine only a two-grid system, there are two direct sum decompositions of interest:
Smooth(h) ⊕ Oscillatory(h)        and        R(I_h^H) ⊕ N(I_h^H).
Here Smooth(h) and Oscillatory(h) are the subspaces representing smooth and oscillatory components on the h-grid (for instance, low- and high-frequency sinusoids). R(I_h^H) is the range of the h-mesh to H-mesh intergrid transfer operator I_h^H, which takes discrete functions from the h-mesh to smoother versions on the H-mesh. N(I_h^H) is the null space of I_h^H. There is a rough (but not exact) correspondence between Smooth(h) and R(I_h^H) and between Oscillatory(h) and N(I_h^H).

After a few iterations of CG on the h-mesh, most of the error lies in the subspace Smooth(h). This subspace is approximately the same as the range R(I_h^H) of the fine-to-coarse grid transfer operator. Once the problem is solved on the H-mesh, the H-mesh portion of the solution can be transferred to the h-mesh by the coarse-to-fine grid transfer operator. There still remains a high-frequency component in the error, however. Part of this high-frequency component can come from any residual in the earlier fine grid solves. However, even if the high-frequency component were completely eliminated earlier, there would still be a high-frequency component in the error due to the fact that R(I_h^H) is not exactly Smooth(h), but has a nontrivial projection onto Oscillatory(h). So, if the objective F is approximately separable along length-scales,

F(a) = F(a_lo + a_hi) ≈ F_lo(a_lo) + F_hi(a_hi),

then we may expect geometric multigrid to be an effective technique to accelerate optimization algorithms.

Thus the CG algorithm and the multigrid recursion are complementary. If the matrix A is like a differential operator, the CG algorithm quickly resolves the high-frequency components of the solution but then becomes less effective as the Krylov vectors become highly oscillatory. But by using the multigrid recursion to move to a coarser grid, the smooth error on the finer grid becomes rougher on the coarser grid, so the CG algorithm can again be an effective tool (at least for a few iterations, until it is necessary to use recursion again).
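The complementary behavior described above can be seen in a bare-bones two-grid correction for a linear model problem Au = f: a few CG iterations on the fine grid damp the oscillatory error, a coarse-grid solve removes much of the smooth remainder, and a few more CG iterations finish the job. The interpolation operator, restriction scaling, and iteration counts below are our own simple choices, offered only as an illustration.

```python
import numpy as np

def neg_laplacian(n):
    h = 1.0 / (n + 1)
    return (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def cg(A, b, x, iters):
    """A few iterations of plain conjugate gradients."""
    r = b - A @ x
    p = r.copy()
    for _ in range(iters):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x

def prolongation(n_coarse):
    """Linear interpolation from n_coarse interior points to 2*n_coarse+1 fine points."""
    n_fine = 2 * n_coarse + 1
    P = np.zeros((n_fine, n_coarse))
    for j in range(n_coarse):
        P[2*j, j] += 0.5
        P[2*j + 1, j] = 1.0
        P[2*j + 2, j] += 0.5
    return P

nH = 63
nh = 2 * nH + 1
Ah, f = neg_laplacian(nh), np.ones(nh)
P = prolongation(nH)
R = 0.5 * P.T                              # restriction = (1/2) P^T
AH = R @ Ah @ P                            # Galerkin coarse-grid operator

u = cg(Ah, f, np.zeros(nh), 5)             # pre-smoothing: damps oscillatory error
r = f - Ah @ u
u = u + P @ np.linalg.solve(AH, R @ r)     # coarse-grid correction of smooth error
u = cg(Ah, f, u, 5)                        # post-smoothing
print("residual norm:", np.linalg.norm(f - Ah @ u))
```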
8 Properties of the Reduced Hessian

When multigrid is applied to a PDE, it is appropriate to analyze the coefficient matrix/operator or (in the nonlinear case) its Jacobian. In the context of optimization, this corresponds to analyzing the reduced Hessian for the optimization model, a more complicated endeavor. When we apply multigrid to the Laplacian, for instance, we have direct access to its representation A_h on a fine grid, and can thus work with the coarser grid representation given by A_H = I_h^H A_h I_H^h. The situation is not so straightforward for (1)-(2) because of the indirect way the solution operator of the state equation and its discretization enter the calculation of the Hessian.
The relationship between the Hessian and the solution operator of (2) lies in the structure of the Hessian of (1) and the reduced Hessian of the constrained optimization model. The formula for the reduced Hessian was derived in Section 3:

∇²F_h = M_{aa,h} + M_{au,h} S_{u,h}^{-1} S_{a,h} + S_{a,h}ᵀ S_{u,h}^{-T} M_{ua,h} + S_{a,h}ᵀ S_{u,h}^{-T} M_{uu,h} S_{u,h}^{-1} S_{a,h}.

In general, on a coarser level H,

∇²F_H ≈ I_h^H ∇²F_h I_H^h.     (6)
Multigrid methods for PDEs are most effective when applied to elliptic models. An elliptic operator corresponds, in the language of optimization, to a convex operator or coercive bilinear form. At a local minimizer, the reduced Hessian will be positive semi-definite and, in non-degenerate cases, will be positive definite. Thus, in non-degenerate cases, we can expect that the reduced Hessian at the solution will be elliptic.

Does this mean that MG/Opt will be effective close to the solution of a nondegenerate optimization problem? We believe that more than ellipticity is required for MG/Opt to be effective, for the reasons outlined in Section 7. The multigrid recursion computes an estimate of the solution (or, more precisely, the step to the solution) on a coarse grid. The CG method, applied on the fine grid, is effective at estimating the high-frequency components of the solution. If MG/Opt is to be effective, it is necessary that these two pieces of the solution are (approximately) independent of one another. The coarse-grid estimates of the solution correspond to low-frequency components of the solution. Thus, we expect that MG/Opt will be effective if the low-frequency components of the solution can be determined independently of the high-frequency components of the solution. This will occur if the reduced Hessian is (approximately) diagonal in the Fourier basis. If the reduced Hessian is diagonal in the Fourier basis, then individual frequency components can be computed independently of one another. If the reduced Hessian is approximately diagonal in this basis, then there will be little interaction among the frequencies, and we would expect MG/Opt to be an effective algorithm.

It is easy to demonstrate that convexity, by itself, is not enough to guarantee the effectiveness of MG/Opt. This is shown in Section 9. It is not possible to make any general statement about the efficacy of MG/Opt, since it depends on the nature of both the objective and the governing PDE. However, there are many situations where MG/Opt will be effective. Again, consider the nonlinear problem from the perspective of its quadratic approximation, and the way in which the term S⁻¹, which represents the solution operator for the linearization of the governing PDE, enters into the expression for ∇²F(x) given in (3). In many cases, the Hessian is a singular integral operator that is nearly diagonal in both space and frequency. That is, in a basis that is local in space and frequency (e.g., wavelets) the kernel
representing the integral operator is primarily concentrated along its diagonal in both space and frequency. In cases such as this we would expect MG/Opt to work well.
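One crude way to probe whether an operator is close to diagonal in the Fourier basis is to transform it into that basis and measure the off-diagonal mass. The diagnostic below is our own, not a routine from the paper; it contrasts a constant-coefficient (circulant) operator, which is exactly diagonal in the Fourier basis, with a random symmetric matrix, which is not.

```python
import numpy as np

def offdiagonal_fraction(H):
    """Fraction of the Frobenius norm of F^* H F lying off the diagonal,
    where F is the unitary discrete Fourier transform matrix."""
    n = H.shape[0]
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)         # unitary DFT matrix
    Hf = F.conj().T @ H @ F                        # operator in the Fourier basis
    off = Hf - np.diag(np.diag(Hf))
    return np.linalg.norm(off) / np.linalg.norm(Hf)

n = 64
# Periodic (circulant) 1D Laplacian: exactly diagonal in the Fourier basis.
C = sum(c * np.roll(np.eye(n), k, axis=1) for k, c in [(-1, -1.0), (0, 2.0), (1, -1.0)])
# A random symmetric matrix: strong coupling between frequencies.
M = np.random.default_rng(0).standard_normal((n, n))
M = M + M.T
print(offdiagonal_fraction(C))   # ~ 0 (up to rounding)
print(offdiagonal_fraction(M))   # close to 1
```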
9 Computational Examples

As we have argued, MG/Opt will be effective if the reduced Hessian is (approximately) diagonal in the Fourier basis. A simple example can be used to demonstrate that ellipticity by itself is not sufficient. Let A_n be the n×n tridiagonal matrix corresponding to the negative of the discretized one-dimensional Laplacian on the interval [0,1]:

A_n = (1/h²) ·
    [  2  -1                 ]
    [ -1   2  -1             ]
    [     -1   2  -1         ]
    [         ...  ...  ...  ]
    [              -1   2    ],
where h = 1/(n − 1). This is considered an ideal matrix for multigrid in that linear systems based on this matrix can be solved with optimal efficiency. A_n is diagonal in the Fourier basis for all values of n. We will apply MG/Opt to two related quadratic optimization problems based on this matrix for n = 129. The first optimization problem is

minimize_x  ½ xᵀA_129 x − b_129ᵀ x,

where the right-hand side is a normally-distributed random vector obtained via the Matlab function randn: b_129 = b = 50*randn(129,1), and the random-number generator is initialized via randn('state',0). The second optimization problem is derived from this problem through an orthonormal change of variables. Form the eigenvalue-eigenvector decomposition of A_129:

A_129 = Q D Qᵀ,

where Q is an orthogonal matrix of eigenvectors, and D is a diagonal matrix of eigenvalues. Then define

B_129 = V D Vᵀ = (VQᵀ) A_129 (VQᵀ)ᵀ,

where V is the matrix of Lanczos vectors obtained from applying the Lanczos algorithm to A_129 with initial vector b_129. The second optimization problem is

minimize_x  ½ xᵀB_129 x − c_129ᵀ x,

where c_129 = (VQᵀ) b_129.
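The construction of the two test problems can be mimicked as follows. This is a sketch under our own assumptions: the random seed is not Matlab's randn('state',0), so the data will not reproduce the paper's numbers, and the Lanczos routine is a simple implementation with full reorthogonalization rather than the plain three-term recurrence.

```python
import numpy as np

def lap1d(n):
    h = 1.0 / (n - 1)
    return (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def lanczos(A, b, m):
    """Orthonormal Krylov basis via Lanczos with full reorthogonalization."""
    n = b.size
    V = np.zeros((n, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m - 1):
        w = A @ V[:, j]
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # orthogonalize against earlier vectors
        V[:, j + 1] = w / np.linalg.norm(w)
    return V

n = 129
A = lap1d(n)
b = 50.0 * np.random.default_rng(0).standard_normal(n)   # stands in for 50*randn(129,1)
lam, Q = np.linalg.eigh(A)          # A = Q diag(lam) Q^T
V = lanczos(A, b, n)
W = V @ Q.T                         # orthogonal change of variables
B = W @ A @ W.T                     # B has the same eigenvalues as A
c = W @ b
```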
The two problems have the same optimal values and, up to an orthonormal change of variables, the same solution. The Hessians for the two optimization problems have the same eigenvalues and in exact arithmetic the CG algorithm will behave the same way when applied to the two problems. Thus, the underlying optimization algorithm will behave comparably when applied to the two problems. The observed behavior of the optimization algorithm can be different for the two problems, in part because of inexact arithmetic, and in part because of the line search and other safeguards that are built into the optimization algorithms. For the purposes of multigrid, we construct coarse-grid versions of these problems for n = 65,33,17 using an explicit projection matrix:
P = I_H^h =
    [ 1             ]
    [ ½   ½         ]
    [     1         ]
    [     ½   ½     ]
    [        ...    ]
    [             1 ],        I_h^H = ½ Pᵀ.

Then B_H = ½ PᵀB_hP, A_H = ½ PᵀA_hP, b_H = ½ Pᵀb_h, and c_H = ½ Pᵀc_h. With these definitions, it is possible to apply optimization, successive refinement, and MG/Opt to the two quadratic optimization problems. As an initial estimate of the solution, we use the zero vector. Since 0 = ½ Pᵀ0, this is an equivalent starting point for both optimization problems.

The results of the computations are in the tables and figures. On the original problem, MG/Opt performs extraordinarily well, as we would expect since multigrid is an ideal algorithm when applied to the Laplacian. Successive Refinement is not as successful as MG/Opt, but it too is effective because information from the coarse grids provides useful information about the solution on the finer grids. On both problems Optimization has comparable performance, as we would expect. MG/Opt and Successive Refinement, however, are dramatically less effective on the transformed problem. This is as predicted because of the strong interactions among the different frequency components of the solution. This illustrates the misalignment of the high- and low-frequency subspaces discussed in Section 7. The final two figures compare the solutions to the two optimization problems at the four different resolutions. For the original problem, the four solutions are similar, but for the transformed problem, there is far less similarity among the solutions. This gives further explanation for the results.
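The coarse-grid construction just described can also be sketched in code. The interpolation matrix and the ½ Pᵀ restriction below follow the definitions given above; the fragment is an illustration of ours, not the authors' implementation.

```python
import numpy as np

def interpolation(n_fine):
    """Linear interpolation P = I_H^h from (n_fine+1)//2 coarse grid points to
    n_fine fine grid points (endpoints included in both grids)."""
    n_coarse = (n_fine + 1) // 2
    P = np.zeros((n_fine, n_coarse))
    for j in range(n_coarse):
        P[2*j, j] = 1.0                    # fine point coinciding with coarse point j
        if 2*j + 1 < n_fine:
            P[2*j + 1, j] = 0.5            # fine midpoint between coarse points j, j+1
            P[2*j + 1, j + 1] = 0.5
    return P

def coarsen(Bh, bh):
    """B_H = (1/2) P^T B_h P and b_H = (1/2) P^T b_h, i.e. I_h^H = (1/2) P^T."""
    P = interpolation(Bh.shape[0])
    return 0.5 * P.T @ Bh @ P, 0.5 * P.T @ bh

n = 129
h = 1.0 / (n - 1)
A = (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
b = 50.0 * np.random.default_rng(0).standard_normal(n)

levels = [(A, b)]
for _ in range(3):
    levels.append(coarsen(*levels[-1]))
print([M.shape[0] for M, _ in levels])      # [129, 65, 33, 17]
```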
                          n = 129   n = 65   n = 33   n = 17
Optimization        it        29
                    ls        30
                    cg       632
Successive          it        18       11       10       11
Refinement          ls        19       11       12       12
                    cg       242      117       93       58
MG/Opt              it         6        8       29       10
                    ls        12       16       20       35
                    cg        14       35       33      157

Table 1. Results for the original quadratic problem.
                          n = 129   n = 65   n = 33   n = 17
Optimization        it        25
                    ls        26
                    cg       505
Successive          it        25       10       15       15
Refinement          ls        26       44       39       16
                    cg       599       43      138       75
MG/Opt              it        18       22       20       48
                    ls        36       44       40      116
                    cg       496      140      110      236

Table 2. Results for the transformed quadratic problem.
10 Conclusions

We believe that optimization-based multigrid methods can be an important tool in engineering design because of the prevalence of design problems governed by differential and integral equations. The optimization setting is appropriate for these problems because of the goal of finding optimal or at least better designs. In addition, the optimization setting is advantageous for multigrid. The optimization-based multigrid method MG/Opt that we propose is designed to solve discretized problems efficiently by taking explicit advantage of the family of discretizations.

We have discussed five main factors important to the performance of MG/Opt: (1) global convergence, (2) local convergence, (3) role of the underlying optimization method, (4) role of the multigrid recursion, and (5) properties of the optimization model. As MG/Opt and its software implementation demonstrate, it is possible to design an optimization-based multigrid algorithm that responds to the first four of these factors. The last factor (the properties of the model) is not something that can be controlled, but there are broad classes of optimization models that are well suited to multigrid.
Fig. 3. Comparison of approaches for the original quadratic problem (objective vs. work, measured in equivalent fine-mesh gradient evaluations).

Fig. 4. Comparison of approaches for the transformed quadratic problem (objective vs. work, measured in equivalent fine-mesh gradient evaluations).
Thus, not only is it possible to design an appropriate optimization-based multigrid algorithm, but optimization models are also a fruitful setting for algorithms of this type.
Fig. 5. Comparison of solutions at various resolutions (original problem).

Fig. 6. Comparison of solutions at various resolutions (transformed problem).
References

[Bra77] Achi Brandt. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation, 31:333-390, 1977.
[Bri87] William L. Briggs. A Multigrid Tutorial. Society for Industrial and Applied Mathematics, 1987.
[FP95] Dan Feng and Thomas H. Pulliam. Aerodynamic design optimization via reduced Hessian SQP with solution refining. Technical Report 95-24, Research Institute for Advanced Computer Science (RIACS), NASA Ames Research Center, 1995.
[GMW81] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Academic Press, London, 1981.
[Hac85] W. Hackbusch. Multi-grid Methods and Applications, volume 4 of Springer Series in Computational Mathematics. Springer-Verlag, Berlin, 1985.
[KLT03] Tamara G. Kolda, Robert Michael Lewis, and Virginia Torczon. Optimization by direct search: New perspectives on some classical and modern methods. SIAM Review, 45(3):385-482, 2003.
[KTS95] G. Kuruvila, S. Ta'asan, and M. D. Salas. Airfoil design and optimization by the one-shot method. AIAA paper 95-0478, 1995.
[LN02] Robert Michael Lewis and Stephen G. Nash. Practical aspects of multiscale optimization methods for VLSICAD. In Jason Cong and Joseph R. Shinnerl, editors, Multiscale Optimization and VLSI/CAD, pages 265-291. Kluwer Academic Publishers, Boston, 2002.
[LN05] Robert Michael Lewis and Stephen G. Nash. Model problems for the multigrid optimization of systems governed by differential equations. SIAM Journal on Scientific Computing, 2005. To appear.
[McC89] Stephen F. McCormick. Multilevel Adaptive Methods for Partial Differential Equations. Society for Industrial and Applied Mathematics, 1989.
[Nas00] Stephen G. Nash. A multigrid approach to discretized optimization problems. Journal of Computational and Applied Mathematics, 14:99-116, 2000.
[NN91] Stephen G. Nash and Jorge Nocedal. A numerical study of the limited memory BFGS method and the truncated-Newton method for large scale optimization. SIAM Journal on Optimization, 1:358-372, 1991.
[NS96] Stephen G. Nash and Ariela Sofer. Linear and Nonlinear Programming. McGraw-Hill, New York, 1996.
[SGT04] S. Gratton, A. Sartenaer, and Ph. L. Toint. Recursive trust-region methods for multilevel nonlinear optimization (part I): global convergence and complexity. Technical Report 04/06, CERFACS, av. G. Coriolis, Toulouse, France, 2004.
[Ta'91] Shlomo Ta'asan. "One Shot" methods for optimal control of distributed parameter systems I: Finite dimensional control. Technical Report 91-2, Institute for Computer Applications in Science and Engineering, January 1991.
[TT98] Xue-Cheng Tai and Paul Tseng. Convergence rate analysis of an asynchronous space decomposition method for convex optimization. Technical report, Department of Mathematics, University of Washington, Seattle, WA, 1998.
[ZC92] Jianping Zhu and Yung Ming Chen. Parameter estimation for multiphase reservoir models on hypercubes. Impact of Computing in Science and Engineering, 4:97-123, 1992.
A Local Relaxation Method for Nonlinear Facility Location Problems*

Walter Murray¹ and Uday V. Shanbhag²

¹ Department of Management Science and Engineering, Stanford University, Stanford, CA 94305-4026, USA. walter@stanford.edu
² Department of Mechanical and Industrial Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. udaybag@stanford.edu

Summary. A common problem that arises is the number and placement of facilities such as warehouses, manufacturing plants, stores, sensors, etc., needed to provide service to a region. Typically, the greater the number of facilities, the cheaper the cost of providing the service but the higher the capital cost of providing the facilities. The location of a facility is usually limited to a number of fixed locations. Consequently, when modeling such problems, binary variables are introduced to indicate whether or not a facility exists at a particular location. The resulting problem may then be modeled as a discrete optimization problem and could, in theory, be solved by general purpose algorithms for such problems. However, even in the case of a linear objective, such problems are NP-hard. Consequently, fast algorithms for large problems assured of finding a solution do not exist. Two alternatives to exact algorithms are heuristic algorithms and α-approximation algorithms. The latter ensure that a feasible point is found whose objective is no worse than a multiple α of the optimal objective. However, there has been little success in discovering α-approximation algorithms [Hoc97] when the problem has a nonlinear objective. Here we discuss a generic heuristic approach that exploits the fact that the number of facilities is usually small compared to the number of locations. It also takes advantage of the notion that moving a facility to a neighboring location has a much smaller impact on the cost of service compared to that of moving it to a distant location. A specific form of this algorithm is then applied to the problem of optimizing the placement of substations in an electrical network.
1 Introduction

A common problem faced by retail chains and utilities³ is the location of facilities. A company may want to decide where to best place its stores. The further away a store is from potential customers, the less likely they are to

* This research was supported by Office of Naval Research grant N00014-02-1-0076 and National Science Foundation grant CCR-0306662.
³ Utility companies such as water, gas and electricity.
buy from it. Such a problem occurs quite frequently in diverse settings. We may want to know where to place servers in a network or, as in the application discussed in detail here, where to place substations in an electrical network. If the number of facilities is very small compared to the number of potential sites, it may be possible to solve the problem by exhaustive search. However, that approach very quickly becomes untenable. It may be thought that where to place facilities is easy to determine since they could be placed at demand centers, but typically demand is diffused and defining its "centers" is the essence of the problem.

Typically in facility location problems (FLPs) [AM093], the cost of facilities is linear (or close enough) in the number of facilities. However, as the number of facilities rises, the benefits of additional facilities diminish. The optimization problem is to find when the addition of an incremental facility is no longer merited. Solving the problem would be quite simple if the benefit was independent of where the facilities are placed, and it is this aspect of the problem that is the most challenging. If the number of facilities increases from say 20 to 21, the set of best locations for the case of 20 facilities may be completely different for the instance of 21 facilities.

To model the problem, we can define a vector of binary variables, say y ∈ Rⁿ, such that n is equal to the number of possible locations of a facility. When y_i = 1 (or 0) there is (is not) a facility at location i. The cost then of providing the facilities is Σ_i γ y_i, where γ is the cost of a single facility. Given a set of facilities at specific locations, there is a cost (or benefit) to their operation. Since the facility is serving some region (this could be one, two or even three-dimensional), this cost is related to the size of the region. The facility could be a warehouse needing to make deliveries to the supermarkets in some region. Presuming that the units are served by just one facility, we may assign a supermarket to the nearest warehouse. However, "nearest" may be measured in a variety of ways. Also, there are often constraints on how large a region a facility can serve. For example, if the facility is a pump for waste water, it will have a maximum rating. In some cases the problem is really to find the minimum number of facilities that can do the job. However, there is often a cost saving if more than the minimum number is used. If it is necessary to ship goods from a facility, then the further the shipment the greater the cost. We now describe an example that we use as a reference throughout our discussion.

Example 1. Suppose Apple were to establish a chain of outlets in the United States, with the objective of maximizing revenue less the cost of installing and supplying these facilities. The constraints are the satisfaction of demand⁴ and the network constraints associated with keeping the facilities well-stocked. Note that such a model would assume an exogenously provided set of Apple warehouses that stock their items. Obviously, then the number of potential customers within a certain driving time of an outlet will increase

⁴ We assume that the demand is deterministic for the present.
(a benefit) with the number of outlets. The problem is to specify the placement of new Apple stores in this network so as to maximize profit subject to the aforementioned constraints.

This operating cost, although not a capital expense, can be merged with it to give the objective function of the problem. Often the operating cost may be approximated well by a linear function. However, that is not always the case and it is the nonlinear case that we focus on in this paper. We do assume that the operating cost is a sufficiently smooth function but we do not assume it is convex (or concave). The nonlinear facility location problem is given by

FLP    minimize_{x,y}   γ eᵀy + f(x, y)
       subject to       b_l ≤ Ax + By ≤ b_u,
                        l ≤ x ≤ u,
                        y ∈ {0,1},

where γ represents the cost of a facility and e the column of ones. The smooth nonlinear function f(x,y) represents a cost of losses or transport associated with a set of locations specified by y, where y is required to have binary values. Linear constraints are often needed since facilities have limits on capacity.

The nonlinear FLP has few references in the literature. A Lagrangian relaxation algorithm is presented for a sub-class of these problems in [AM093]. However, linear facility location problems have been studied quite extensively in the recent past with a bulk of the effort being performed in the area of approximation algorithms. Such algorithms guarantee a solution within α of the true solution [STA97, JV99]. Unfortunately, these approaches need the linearity of the objectives to make such guarantees. Our interest is in making only smoothness assumptions on the objective (nonlinear part) and then attempting to construct fast and scalable algorithms. The problem (FLP) belongs to the class (MINLP):
MINLP    minimize_{x,y}   F(x, y)
         subject to       g(x, y) ≤ 0,
                          x ∈ X,
                          y ∈ Y ⊂ Zⁿ,

where F(x,y) and g(x,y) represent nonlinear functions in x and y and y is required to have integer values. MINLPs generalize two hard classes of problems: the class of mixed-integer programs or MIPs and the class of nonconvex NLPs or nonlinear programs. Both classes belong to the larger class of NP-complete problems, implying that MINLPs are also a hard class of problems [NW88]. No convenient optimality conditions exist for MINLPs and one has to find an optimal solution and then verify that the solution is indeed optimal. In fact, such verification may require an exponential number of iterations even if one were to start from a globally optimal solution.
One of the more common approaches to solving such MINLPs is to apply branch-and-bound methods [NW88]. Such methods solve continuous relaxations of the MINLPs and partition the feasible region to avoid fractional integer solutions. Leyffer's results [Ley93] conclude that the computation times grow exponentially with the number of integer variables. This approach contrasts sharply with a method that alternates between solving a mixed-integer linear programming problem and a nonlinear programming problem. The master problem may be defined by outer-approximation [DG86] or through the ideas of generalized Benders decomposition [Geo72]. Either case results in a mixed-integer linear program for the master problem, which in turn provides a new estimate of integer variables y. These are subsequently used in a nonlinear programming (NLP) subproblem. The solution to the NLP subproblem provides a new estimate of the continuous variables x. The master problem provides a lower bound to the optimal solution and these bounds are used to test for termination. Leyffer [Ley93] discusses an example that demonstrates that the outer-approximation algorithm may take an exponential number of iterations. Some alternate approaches are also discussed in the same thesis, in particular, the use of linearizations [AH89], rounding [OV89], or even extensions of sequential quadratic programming (SQP) in the discrete domain [CM89]. A recent doctoral thesis by Ng [Ng02] focused on homotopy methods to solve MINLPs.

This chapter is organized into five sections. Section 2 discusses the components of a generic local relaxation algorithm as it would apply to nonlinear facility location problems. There are several components of this algorithm that warrant a more detailed discussion and are the subject of sections 3 to 5. A key idea we introduce is that of a "good neighbor." This is a set of locations that are "close" to the set of current best known locations. Section 3 presents a global relaxation procedure for finding an initial integer solution, given a set of existing facilities. Section 4 articulates the notion of a neighborhood as it pertains to a nonlinear facility location problem. Here, we distinguish between regular and more general graph-based problems and discuss a means for finding a good neighbor, given an integer solution. Often the move to a good neighbor may not prove beneficial from the standpoint of cost, and we discuss a mechanism to find an improved integer solution in section 5. Section 6 discusses the application of our algorithm on the placement of substations on a regular electrical grid. The resulting problem is an FLP with quadratic costs and additional linear constraints. Some computational results for our algorithm are presented in this section, in particular a comparison with existing mixed-integer nonlinear programming algorithms. Finally, section 7 discusses the bounds that may be obtained at each iterate of the algorithm. We also present a related continuous function that forms a smooth lower bounding function to the original objective.
2 A New Approach to the Facility Location Problem

The FLP has many features that cannot be taken advantage of by a general-purpose algorithm. For example, suppose the best known set of locations were San Francisco, Washington, and Dallas. Alternatives include San Jose, Baltimore and Houston. Clearly, when considering an alternative set of locations, San Jose, Baltimore and Houston would seem to make more sense than San Francisco, San Jose and Dallas. What is implicit in this statement is that alternatives to a location in the vicinity of a current choice have a different impact than an option from a distant location. Moreover, replacing a distant facility by one that is close to another facility is often a poor choice. If these facilities were simply binary variables, then this distinction would be missed. To be able to capture this distinction, we introduce the concept of neighboring locations. Associated with each location i is a set, S_i, of neighboring locations. There are many ways to define S_i, the most obvious being those locations within some prescribed distance of i.

It is instructive to examine the basis for algorithms with continuous variables. Typically an improvement to the current best estimate is made based upon local information. For example, in the case of linesearch methods, a direction of descent is obtained and then used to determine an improved estimate. Initially what is thought to be the best step is taken along this direction but this may not prove to be a better point. However, by changing the steplength (making it shorter) an improvement is assured. In general, optimization algorithms are not based on choosing potential improvements at points that are remote from the current best point. Of course in the case of continuous variables the concept of closeness is obvious. In general for binary variables, there is no meaningful analogy. By introducing the concept of neighboring locations we can define "close" for the FLP. A set of locations x is close to y if x_i ∈ S_{y_i} for all i. Rather than search for an improved point in the whole space, we propose restricting the choice to the points that are close and then, akin to methods for continuous variables, make a sequence of moves to "close" points. Note that this approach moves all facilities simultaneously and this key feature ensures that the algorithm is scalable. As the number of locations increases, provided the density of facilities stays similar, the "distance" from a reasonable initial estimate of the solution to the actual solution remains constant. In this context "distance" is measured in terms of the number of "close" steps needed to move from the initial estimate to the solution.

In order to define an algorithm, we need the equivalence of a direction of search. We replace this notion by the idea of a "good" neighbor (see section 4.3). To find a "good neighbor", we could define a new discrete optimization problem under the restriction that the new estimate is close to the old. Indeed if we solved this problem, we would find the best neighbor. However, although this problem is much smaller in terms of choice than the original problem, it is still extremely hard to solve. Moreover, finding the best neighbor is unnecessary
especially when the current estimate is not close to the solution. As opposed to the original FLP, this problem attempts to decide the optimal set of neighboring positions, given a current feasible integer solution. We now provide a formulation of this FLP, which has a lower dimension than the original problem. Let S_{y_0} be the set of nodes housing new facilities and their neighbors. The input to this problem is the current integer solution (given by y_0). Suppose S_{y_0}^j represents the set of neighbors of facility j. We may look at the total set of neighbors as

S_{y_0} = ∪_{j ∈ F_n} S_{y_0}^j,

where j is the index of a new facility. The problem FLP(y_0) then obtains the best set of neighboring locations to the set y_0 and is defined as follows:

FLP(y_0)    minimize_{x,y}   γ eᵀy + f(x, y)
            subject to       b_l ≤ Ax + By ≤ b_u,
                             l ≤ x ≤ u,
                             y_i ∈ {0,1},   i ∈ S_{y_0},
                             y_i = 0,       i ∉ S_{y_0},
where F_n represents the set of new facilities. While smaller in the number of integers, this problem could still be large. For instance, if there were 100 facilities and each location had 6 close locations, then FLP(y_0) would have 700 integers. So, we would obviously need to determine a way of obtaining a good solution to FLP(y_0) if not the best one.

In addition to defining a good neighbor, we need to define the equivalence of the linesearch, given a search direction. One view of a linesearch is that it provides us with a more conservative choice than the initial step. The set of acceptable points is the infinite set along the line of the search direction. The ranking in terms of conservatism is clear and is obtained by the proximity to the original point. Given a good neighbor, we need to define a similar ranking of more conservative moves. Obviously in this case, the number of choices will be finite, albeit extremely large. Suppose the good neighbor differs from the original estimate at k locations; then there are 2^k − 1 choices.

The generic algorithm is shown in figure 1. We introduce three components of the algorithm that correspond to three important features of linesearch methods in continuous variables. In particular,

1. Finding an initial discrete point: This is similar to finding a starting point in a linesearch algorithm. While it is easy to find a poor feasible point, a good feasible point is more difficult to determine. We discuss a general approach for getting a solution to a nonlinear FLP by using a technique called global relaxation. As the name suggests, this technique involves the relaxation of all integrality constraints. To ensure a reasonable feasible
point, we modify the objective function to make it favorable for some continuous variables to move toward their upper bounds. We pursue this discussion further in section 3.
2. Determining a good neighbor: This step is analogous to the direction-finding step in linesearch methods and requires determining a set of candidate nodes from the neighborhood of current positions. We discuss the appropriate subproblem of interest and associated issues in section 4.
3. Determining an improved feasible point: The good neighbor may not provide an improved integer solution. As in the case of continuous problems, we construct a ranking of points that are analyzed in sequence to determine what step to take. In linesearch methods, this is iterating on the step size. In this instance, we achieve a similar objective by restricting the number of facilities that can move. Finally, if even moving a single facility causes an increase in cost, we terminate at the current integer solution.

We describe these steps in detail in sections 3, 4 and 5.
Fig. 1. Local Relaxation Algorithm for Nonlinear Facility Location Problems. (Flow chart: determine an initial discrete feasible solution and initial number of facilities; determine a search direction; determine a search step to get an improved solution; update the positions of the facilities; when no improved solution can be found and the number of facilities has not converged, adjust the number of facilities and repeat; otherwise output the final number and positions of the facilities.)
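The flow chart of Figure 1 can be paraphrased as a short skeleton. The helper routines passed in below (initial_solution, good_neighbor, improved_point, adjust_facility_count, converged) are hypothetical placeholders standing in for the procedures of sections 3 to 5; the skeleton is our own schematic rendering, not the authors' code.

```python
def local_relaxation(problem, initial_solution, good_neighbor,
                     improved_point, adjust_facility_count, converged):
    """Schematic outer loop of the local relaxation method (cf. Figure 1)."""
    y, k = initial_solution(problem)               # Section 3: global relaxation + round up
    while True:
        while True:
            candidate = good_neighbor(problem, y)            # Section 4: subproblem + c.g.
            y_new = improved_point(problem, y, candidate)    # Section 5: restricted moves
            if y_new is None:                      # no improving move of any size
                break
            y = y_new
        if converged(problem, k):                  # number of facilities has settled
            return y
        y, k = adjust_facility_count(problem, y, k)
```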
3 An Initial Integer Solution

The first step of the algorithm is to find a discrete feasible point. For FLP this is trivial, but it is not trivial to find a "good" discrete feasible solution. We describe a technique called global relaxation to obtain such a solution.

3.1 A Global Relaxation Method

The basic technique adopted is to (i) relax the discrete requirement, (ii) obtain a continuous solution of the problem, and (iii) round up the integer variables to obtain an integer solution (by rounding up, we ensure feasibility). If the discrete requirement is relaxed, the solution of the problem is transparent. Namely, a global minimizer is given by the placement of a facility at every location, with its size matching the demand at that particular node. Obviously this is a very poor guide to where to place the discrete facilities. Ironically, after rounding up we would have a solution that maximized the costs⁵. Typically demand at any given location is much smaller than what a facility can serve. Consequently, the continuous solution has few y_i variables, if any, close to their upper bounds. It is in part for this reason that some algorithms such as branch and bound work poorly on FLP.

The critical assumption being made is that the cost of a facility is linear in the size of the facility. The original discrete objective is a step function and in our global relaxation approach, we approximate this function more accurately than that given by a simple linear function. What we wish to do is introduce an objective term that would favor, say, two variables being set at (0,1) or (1,0) rather than (0.5,0.5). Recall that having solved the continuous problem, we will need to round up the solution obtained. There are many ways to try and induce the solution of the continuous problem to have discrete values. A distinction of the FLP is that identifying the variables that are 1 is hard. Most schemes for inducing a variable close to either bound (such as adding the term γ y_i(1 − y_i)) are neutral and we want one that is not.

In the global relaxation algorithm, we propose that the linear cost γeᵀy is replaced by a nonlinear (and nonconvex) cost Σ_i γ(1 − e^{−μy_i}). The motivation behind such a choice is that the original cost function associated with y_i is γ when y_i = 1 and zero otherwise. Moreover, by the integrality of y_i, the cost can only take on two values. If one introduces a relaxation without modifying the cost function, then partial additions of a facility are charged the corresponding fraction of the capital cost. However, in reality any fraction of the facility should be charged the same as the whole facility so as to simulate the discrete problem. This is achieved by a large enough μ in the function α(1 − e^{−μy_i}). This function looks like a smoothed step function as shown in figure 2. In particular, we have

⁵ Assuming, of course, that the relative transportation costs are sufficiently low.
Fig. 2. This figure shows how f(y_i, μ) tends to a step function as μ increases. We have shown μ going from 1 to 20. Also shown is the 45 degree line represented by the convex relaxation f_c(y_i).
as μ → ∞, (1 − e^{−μy_i}) → 1 for all y_i > 0.

While α should be the ratio of the capital cost to the operating cost denoted by γ, we may set α = β ≫ γ. This ensures that the solution favors fewer facilities. As a consequence, one needs to add rather than remove facilities to obtain the optimal number of facilities. This is an easier task since it is then trivial to get a very good initial point to the next problem that needs to be solved. The continuous problem solved is given by
FLP_cont    minimize_{x,y}   β Σ_i (1 − e^{−μy_i}) + f(x, y)
            subject to       b_l ≤ Ax + By ≤ b_u,
                             l ≤ x ≤ u,
                             0 ≤ y ≤ e.
The solution y* from FLP_cont is then rounded up to obtain an integral solution. Our algorithm keeps the number of new facilities fixed over each major iteration. However, once we have an "optimal" placement of facilities, we may now want to obtain an initial solution for a larger number of facilities⁶. This initial solution is obtained by solving the problem FLP(k):

⁶ It may also be that there are already installed facilities at the outset.
FLP(k)    minimize_{x,y}   β Σ_{i ∈ F_n} (1 − e^{−μy_i}) + f(x, y)
          subject to       b_l ≤ Ax + By ≤ b_u,
                           l ≤ x ≤ u,
                           y_i = 1,        i ∈ F_e,
                           0 ≤ y_i ≤ 1,    i ∈ F_n,
where k represents the number of new facilities. The set F_e represents the collection of existing facilities and F_n is the remaining set of nodes, defined by F_n = N \ F_e.
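As a small illustration of the smoothed capital-cost term and the round-up step, consider the following fragment. The parameter values and the sample relaxed solution are invented for illustration; only the functional form β(1 − e^{−μy}) comes from the text.

```python
import numpy as np

beta, mu = 100.0, 20.0            # illustrative values; beta >> gamma per the text

def capital_cost(y, beta=beta, mu=mu):
    """Smoothed step-function approximation of the capital cost."""
    return beta * np.sum(1.0 - np.exp(-mu * y))

for m in (1, 5, 20):
    y = np.linspace(0.0, 1.0, 6)
    print(m, np.round(1.0 - np.exp(-m * y), 3))   # approaches a 0/1 step as mu grows

y_relaxed = np.array([0.03, 0.4, 0.0, 0.97])       # hypothetical solution of FLP_cont
y_rounded = np.ceil(y_relaxed).astype(int)         # round up to preserve feasibility
print(y_rounded)                                   # -> [1 1 0 1]
```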
4 Finding a Good Neighbor

Given an integer solution, we wish to find a "good neighbor." A good neighbor is a discrete solution that is represented by a set of positions in an immediate proximity to the current set of locations. It should be emphasized that we do not ensure that this "good neighbor" does indeed have a lower cost. Let us return to our example concerning the placement of Apple stores.

Example 2. Suppose that the current estimate of locations had a store in Palo Alto; then the neighboring locations are Fremont, Redwood City and San Jose. Given stores in all four locations, what we would then determine is the profit associated with each location and from this information, an element of a good neighbor for Palo Alto is determined (it could, for instance, be Redwood City).

In section 4.1, we formally specify the notion of a neighbor for rectangular grids and general graphs. Such a definition allows us to formulate a continuous subproblem as defined in section 4.2. This continuous subproblem allows every facility to be spread around its local neighborhood. The solution of the subproblem gives us a distribution of capacity in the neighborhood of a current set of positions. If the continuous solution sites most of the facilities in a single position, it may be an indication that position is a good one. In the more likely eventuality that the continuous solution does not favor a specific position, we need to use the continuous solution to find a "good neighbor". This is achieved by using the notion of center-of-gravity as described in section 4.3.

4.1 Specification of the Facility Location Problem

An FLP requires the specification of an associated graph, cost function and constraints. The cost function of a nonlinear FLP is comprised of two parts:
• The linear cost associated with the installation decision of a new facility. We use the parameter γ as the proxy for the capital cost.
• The nonlinear cost f(x,y) associated with transporting the commodity from the facility to the demand node. This would be analogous to the cost of losses in an electrical network or the cost of transportation in a transportation network.
Normally, FLPs have a set of linear network constraints. These constraints prescribe the satisfaction of demand based on supply. In addition, there may be a set of extra linear or bound constraints that are a necessary ingredient of the formulation. We combine all these constraints into a set of equality constraints as specified in (FLP). A network problem is specified as G(W, A), where G represents a graph with a set of nodes W and an associated set of edges A. We may also require the length of an edge in specifying a local relaxation, implying a set V, where ℓ : A → V, and ℓ is the distance transformation. We define A_i as the set of nodes adjacent to node i.

Definition 1. A d-neighborhood of node i is defined as

N_{i,d} := {i} ∪ {j : j ∈ A_i, ℓ_{ij} ≤ d}.

Note that if d = ∞ then N_{i,∞} = A_i. The ideas of the above definition are sketched out in figure 3.

4.1.1 Grid-based Networks

A grid-based network is defined as a fully connected graph in which nodes are arranged in the form of a grid. In [MS04], the network was a regular grid but in transportation networks in certain cities, the network may be a rectangular grid. In this context, some possible neighborhoods would be a 9-point or a 5-point stencil centered at the current position.

Example 3. Suppose Apple were trying to decide where to place its stores in New York City; then it would be faced with a grid-based network.

While the 5-point stencil allows movement to 4 points (apart from staying at the same position), the 9-point stencil allows movement to 8 points. The benefit from the 5-point stencil is that finding a better improved solution has a lower complexity than the 9-point stencil (as shown in section 5). However, the 9-point stencil permits an immediate move to diagonally opposite positions while the 5-point stencil would require 2 moves. Another issue is that given the inherent nonconvexity of the cost function, the cost at the two neighboring points of the 5-point stencil may be higher but that at a diagonally opposite point (not on the stencil) may be lower. Such a situation would not even allow a move. In effect, the two choices for the first move would increase the cost.
Fig. 3. ∞- and ℓ-neighborhoods in an FLP. (For the graph shown: W = {1,2,3,4,5}, A = {12,13,14,15}, A_1 = {1,2,3,4,5}, N_{1,∞} = {1,2,3,4,5}, and N_{1,ℓ} = {1,2} with d_12 = ℓ.)
Fig. 4. Left: 5-point stencil. Center: Original set of positions. Right: 9-point stencil.
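For a rectangular grid, the two stencils of Figure 4 are straightforward to encode. The helper below uses our own indexing convention and is only an illustration; it returns the cells reachable in one move (including staying put), clipped to the grid.

```python
def stencil_neighborhood(i, j, n_rows, n_cols, nine_point=False):
    """Cells reachable in one move from (i, j): the 5-point stencil allows the
    4 axis neighbors, the 9-point stencil also allows the 4 diagonal ones.
    The cell itself is included (a facility may stay put)."""
    if nine_point:
        offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    else:
        offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    return [(i + di, j + dj) for di, dj in offsets
            if 0 <= i + di < n_rows and 0 <= j + dj < n_cols]

print(len(stencil_neighborhood(2, 2, 5, 5)))                    # 5
print(len(stencil_neighborhood(2, 2, 5, 5, nine_point=True)))   # 9
```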
4.1.2 General Graph-based Networks

Consider a portion of a graph-based network as shown in the middle section of figure 5. Assume that the central node houses the facility. If the ∞-neighborhood is used, then the neighborhood contains all the nodes connected to the facility node and the node itself. If one uses an ℓ-neighborhood, where only the smallest link is shorter than ℓ, then only the facility node and the node on the left are in N_{i,ℓ}.
Fig. 5. ∞- and ℓ-neighborhoods in an FLP.
Fig. 6. Left: Current set of positions of facilities. Center: Relaxation of facilities using 9-point stencil. Right: Solution of outflows from each facility by solving subproblem.
4.2 A Continuous Subproblem

The distribution of outflows in this neighborhood provides an indication of a new set of positions. To determine this distribution, we solve a continuous subproblem. Implicit in this undertaking is that the number of facilities is kept fixed. In essence, we find a sequence of improving discrete positions until we can no longer do so. The termination of such an endeavor hinges on the finiteness of the number of positions. Once we have an "optimal" set of positions for a particular number of facilities, we change the number of facilities and repeat the process.

The crucial step in such an algorithm is the determination of a new set of positions from a current set of positions. Suppose that the vector y^k is an
integer vector representing a current set of positions⁷, with the number of facilities n_k being given by n_k = eᵀy^k. We achieve this by proceeding through the following steps:

1. Set m = 1.
2. Using N_{j,d} neighborhoods, for all facility nodes j, we allow a facility j to be distributed on all the nodes in its neighborhood. The resulting y^k is modified to y_m^k. This has two consequences.
   a) First, the operating cost falls since demand nodes can be supplied by slightly closer facilities.
   b) Second, the discrete solution using y_m^k provides an immediate lower bound to the optimal cost with y^k since the transportation cost from facilities reduces and the revenues increase.
3. Based on y_m^k, we solve a subproblem (FLPsub(y_m^k))⁸ in which the decisions are the supplies from each facility in a neighborhood and the corresponding flows across the network. The by-product of this solution is a spatial distribution of supply across a neighborhood as provided by y*.
4. Using the supply distribution, we can construct a new set of positions as a candidate solution. We test to see if this candidate solution has lower cost and if so, we exit the procedure. If not, we increment m by 1, and reduce the neighborhood size d so that at least one facility is no longer distributed. This provides a new vector y_m^k and we proceed to step 2 (discussed in section 4.3).

We conclude this section by stating the NLP subproblem to be solved.
FLPsub(y_m^k)    minimize_{x,y}   f(x; y_m^k)
                 subject to       b_l ≤ Ax + By ≤ b_u,
                                  l ≤ x ≤ u.

The constraint system is obviously dependent on y_m^k but for notational simplicity, we do not show this relationship. Moreover, this subproblem is a smooth optimization problem with a nonlinear objective and linear constraints. The solution to this subproblem gives us an estimate of the relaxed integer variables, which indicate the distribution of a facility over its neighborhood. The next section discusses how one may use the solution of this subproblem in the prescription of a "good neighbor". The continuous subproblem also has an immediate relevance in our canonical example.

⁷ y_i = 0 or 1 based on whether, at iteration k, a node i houses a facility or not.
⁸ This implies that any nodes in the neighborhood of node i are also assumed to house a facility and the corresponding components of y_m^k are set to 1, but the total capacity of the neighborhood of facilities is restricted to unity.

Example 4. The solution to the continuous subproblem would indicate what portions of the facilities should be installed at Palo Alto, Fremont, Redwood
City and San Jose, respectively. If, for instance, the continuous solution placed most of the facility at Palo Alto, it may indicate that Palo Alto is a good candidate. However, often this may not be entirely clear from the distribution.

4.3 A Good Neighbor

Section 4.1 described a grid-based and a general graph-based network. We now consider the more general graph-based network with no assumption on symmetry or regularity. We begin by discussing the notion of a center-of-gravity of a facility distribution over a neighborhood. Using this definition, section 4.3.2 provides a technique for specifying a "good neighbor". There are alternatives to the center-of-gravity method. In particular, we consider an assignment method in [MS04]. Both approaches choose a new location based on the continuous solution. We shall refer to the center-of-gravity method as the CG method.
[Figure 7 panels: Initial Positions; Subproblem Solution; New Positions.]
Fig. 7. This schematic shows the general workings of the local relaxation approach for finding an improved integer solution.
[Figure 8 panels: Centers of Gravity; New Positions.]
Fig. 8. This schematic shows the process of using the center of gravity to make a simultaneous move of facilities.
4.3.1 The CG Method

The coordinates, say x_cg and y_cg, of the center of gravity associated with a stencil placed at i are defined by

    x_cg = ( Σ_{j ∈ N_i} d_j cos(θ_j) y_j ) / ( Σ_{j ∈ N_i} y_j ),
    y_cg = ( Σ_{j ∈ N_i} d_j sin(θ_j) y_j ) / ( Σ_{j ∈ N_i} y_j ),

where N_i denotes the neighborhood (stencil) of node i, d_j is the length of link (i, j) and θ_j is the angle between the link (i, j) and the positive x axis. If (x_cg, y_cg) is "close" to (i, j), the facility is not moved from (i, j). There are a number of measures to define "close", based on the type of network.
1. If the network is a rectangular grid, we may use a rectangle of size ρ·(a × b), where a and b represent the link lengths in the rectangular grid. Moreover, ρ ∈ [0, 1] allows control on the size of the rectangle. Note that the rectangle is centered at (i, j).
2. If the network is a general graph, the notion of closeness is best captured by the Euclidean distance from the center of gravity. This is geometrically represented by a circle of radius ρ. (A small computational sketch of this test is given below.)
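To make the test concrete, the following small sketch (our own illustration, not code from the authors) evaluates the center-of-gravity formulas of Section 4.3.1 for a facility distributed over a neighborhood and then applies the Euclidean closeness test; the names `lengths`, `angles`, `weights` and `rho` are ours.

```python
import numpy as np

def center_of_gravity(lengths, angles, weights):
    """Center of gravity of a facility distribution over a stencil centered at node i.

    lengths[j] is the link length d_j of link (i, j), angles[j] the angle theta_j of
    that link with the positive x axis, and weights[j] the relaxed value y_j returned
    by the continuous subproblem.  Offsets are measured from node i itself.
    """
    d = np.asarray(lengths, dtype=float)
    th = np.asarray(angles, dtype=float)
    w = np.asarray(weights, dtype=float)
    x_cg = np.sum(d * np.cos(th) * w) / np.sum(w)
    y_cg = np.sum(d * np.sin(th) * w) / np.sum(w)
    return x_cg, y_cg

def beyond_threshold(x_cg, y_cg, rho):
    """Euclidean closeness test: the facility moves only if the c.g. leaves the circle."""
    return np.hypot(x_cg, y_cg) > rho

# Toy example: most of the relaxed supply sits on one neighbor at unit distance.
xc, yc = center_of_gravity(lengths=[0.0, 1.0, 1.0],
                           angles=[0.0, 0.0, np.pi / 2],
                           weights=[0.2, 0.7, 0.1])
print(xc, yc, beyond_threshold(xc, yc, rho=0.3))
```

With these toy numbers the center of gravity lies well outside a threshold circle of radius 0.3, so the facility would be moved.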
Fig. 9. Left: Solution to continuous subproblem with threshold circle. Center: Center-of-gravity. Right: Leftmost node is "good neighbor"
Fig. 10. Left: Solution of outflows from each facility by solving the subproblem. Center: Determine the c.g. of the distribution and check if it is within the threshold (do not move) or beyond (move). Right: New set of positions.
4.3.2 A Good Neighbor

If the center of gravity is outside the threshold distance, we need to identify which node is closest to it. In the instance of a rectangular grid, we partition the stencil into orthants and choose a "good neighbor" as the node whose orthant contains the center of gravity, as shown in Figure 10. On the other hand, if the network is a general graph, then we merely check which node is closest to the center of gravity. Given that we have the spatial coordinates of all nodes, this is a straightforward procedure.
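On a general graph the selection rule reduces to a nearest-node search once the threshold test fails. The sketch below is our own illustration with hypothetical node coordinates; on a rectangular grid one would instead test which orthant of the stencil contains the center of gravity.

```python
import numpy as np

def good_neighbor(cg, node_coords, current_node, rho):
    """Pick the node to which the facility should move.

    cg           : (x, y) coordinates of the center of gravity
    node_coords  : dict mapping node id -> (x, y) for the nodes in the stencil
    current_node : node currently housing the facility
    rho          : threshold radius; inside it the facility is not moved
    """
    cg = np.asarray(cg, dtype=float)
    cur = np.asarray(node_coords[current_node], dtype=float)
    if np.linalg.norm(cg - cur) <= rho:
        return current_node                    # c.g. is "close": keep the facility put
    # Otherwise move to the stencil node closest to the center of gravity.
    return min(node_coords,
               key=lambda n: np.linalg.norm(np.asarray(node_coords[n], float) - cg))

coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0)}
print(good_neighbor(cg=(0.8, 0.1), node_coords=coords, current_node=0, rho=0.3))  # -> 1
```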
Example 5. To put this in the context of our example, we may have a center of gravity in the San Francisco bay toward the eastern end of the Dumbarton bridge, making Fremont the closest choice.
5 Determining an Improved Integer Solution

Section 4.3 provides a "good neighbor" from the solution of the continuous subproblem. However, we are not guaranteed that by moving the facilities to this good neighbor the objective will be reduced. Section 5.1 discusses how one may obtain a better integer solution under the assumption that the first "good neighbor" is not lower in cost. Once we can no longer obtain a better integer solution, we reach a set of positions that appear optimal for the given number of facilities. But we may now modify the number of facilities and endeavor to find an improved solution. Such a possibility is discussed in Section 5.2. Finally, we have always considered that facilities have a pre-specified capacity. In Section 5.3, we discuss how one may add facilities of different sizes. We suggest how this may affect the determination of the centers of gravity but do not provide a formal routine.

5.1 A Linesearch Approach

In linesearch-based nonlinear programming algorithms, given a search direction, we determine an appropriate step under the assumption that the full step does not give us an improved solution. Such a determination may be carried out by reducing the step-size by a fixed ratio and evaluating whether one has an improved cost or not (often one may use more elaborate interpolation techniques as well). A similar tack is adopted for obtaining an improved neighboring integer solution. The idea is a straightforward one and is captured by this set of steps.
1. Compute the set of centers of gravity associated with the current set of positions (see Section 4.3).
2. Simultaneously move all facilities (that are not fixed) whose centers of gravity are beyond the threshold distance. If the new integer cost is lower, then exit; else continue.
   a) Fix the facility whose center of gravity is closest (in distance) to its current position. If all facilities are now fixed, then exit; else continue.
   b) Solve the problem FLPsub(y_r^m) and calculate a new set of centers of gravity.
   c) Go to step 1.

Example 6. Suppose the current solution has an Apple store in Palo Alto and Oakland. The neighbors of the Oakland store are El Cerrito, Berkeley
and Hayward, and those for Palo Alto are Fremont, Redwood City and San Jose. Suppose the center of gravity for the Oakland store is marginally closer to Hayward and that for Palo Alto is closest to Fremont. However, placing the stores at these two locations gives a higher cost than the original. Our linesearch approach would then fix one of the facilities to its original position (say Oakland) and re-solve the problem to estimate if indeed Fremont still provides a better cost. If it does, we stay with (Fremont, Oakland). If not, we restore the Fremont facility back to Palo Alto and terminate with (Palo Alto, Oakland). This is akin to a linesearch idea from continuous nonlinear programming, as we discuss in the next section.

5.2 Modifying the Number of Facilities

Once the above process terminates, the conclusion is that we have a good set of positions for a particular number of facilities. We now modify the number of new facilities and obtain an optimal set of positions for this new set. Therefore, given the current set of positions, the question of the number and initial placement of new facilities may be answered by solving the global relaxation problem FLP(k'), where k' = k + p, p being the maximum number of facilities that can be added. Once we have an initial placement, we proceed to find a set of good neighbors and better integer solutions, until we have fixed all facilities.

5.3 Adding Facilities of Varying Sizes

An important contributor to the intractability of a nonlinear discrete optimization problem is the exponential growth in complexity that results from adding a new discrete variable. For instance, if the facilities were to take on two different sizes, then the number of possibilities for installing a single facility is squared. A similar problem is encountered if we are faced with an upgrade decision. This has an immediate impact on the performance of branch-and-bound type methods. We suggest an alternate approach for solving such a problem. Let y_ih represent the decision associated with placing a facility of size cap(h) at node i. Since a particular node may only accommodate a single facility, we have

    Σ_h y_ih ≤ 1,   ∀ i.
Moreover, we may have a different exponential function for each type of facility leading to the following problem.
FLP(k_h):    minimize_{x,y}   f(x, y)
             subject to       0 ≤ y_ih ≤ 1,   ∀ i, h,
                              (together with the flow, capacity and cardinality constraints, which depend on k_h),
where k_h represents a vector of integers giving the maximum number of facilities of each specific capacity that may be added. The constraints involving y_ih may be summarized as follows:
(a) y_ih ≤ 1, for all i and h: all facilities have associated binary variables bounded above by 1.
(b) 0 ≤ Σ_h y_jh ≤ 1, j ∈ T_n: at any new facility node, a facility of exactly one size may be installed.
(c) 1 ≤ Σ_h y_jh ≤ 1, j ∈ T_e: at any existing facility node, an upgrade may be made; however, exactly one facility has to remain.
The resulting solution has facilities of differing capacities. Obviously this requires handling these facilities differently when specifying the threshold distance. In effect, when specifying the center of gravity of a facility distribution, a larger facility may require a larger threshold distance. To be specific, a center of gravity needs to be further away from a larger facility to initiate a change.
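The node-wise restrictions (a)-(c) are easy to encode once the new and existing facility nodes are known. The sketch below is our own; the node sets are hypothetical, and only the bounds on Σ_h y_ih are produced.

```python
def node_sum_bounds(nodes_new, nodes_existing):
    """Lower/upper bounds on sum_h y_{ih} for each node, per restrictions (b) and (c).

    nodes_new      : nodes where a facility of at most one size may be installed
    nodes_existing : nodes that already house a facility; an upgrade is allowed,
                     but exactly one facility must remain
    """
    bounds = {i: (0, 1) for i in nodes_new}              # 0 <= sum_h y_{ih} <= 1
    bounds.update({i: (1, 1) for i in nodes_existing})   # exactly one facility remains
    return bounds

print(node_sum_bounds(nodes_new=[3, 4], nodes_existing=[7]))
```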
6 An Application: Substations Siting Optimization

To illustrate the algorithm, we show how it would apply to the problem of siting electrical substations on a regular electrical grid. Such a problem represents an instance of (FLP). The nodal voltages and currents are denoted by V and I, respectively, while Y represents the admittance matrix of the network. The cost of losses in the network is quadratic and is given by C_loss V^T Y V, where V^T Y V and C_loss represent the electrical losses and the cost per unit of losses, respectively. The constraints on voltage and current at a particular node are implicitly defined by whether a particular node houses a substation or not. The remainder of this section specializes the local relaxation algorithm to this problem. Section 6.1 discusses the three important parts of this algorithm: determination of an integer solution, specifying a set of good neighbors, and finding an improved integer solution. Section 6.2 provides a computational comparison between our algorithm and competing mixed-integer nonlinear programming codes such as cplex, sbb and dicopt. We also demonstrate the scalability of our algorithm in the number of discrete variables.
6.1 The SSO Algorithm
Let N_ss and N_L denote the sets of substation nodes and load nodes, respectively. These sets get modified as the algorithm progresses. The dependency of the constraints on whether i ∈ N_ss or not may be stated in terms of a binary decision variable y_i. In particular, y_i is defined as

    y_i = 1 if i ∈ N_ss,   and   y_i = 0 otherwise.
This allows us to specify the cost of new substations as C_cap e^T y, where C_cap and e^T y represent the unit cost of a new substation and the number of new substations, respectively. The constraint system comprises the load-flow system I = YV, the bounds on nodal voltages, and the specification that a nodal current may be either a variable or a fixed quantity based on whether the node houses a substation or not. The bounds on V and I can be stated in terms of the substation installation decisions y. Such a formulation may be written as follows:
SSO:    minimize_{V, I, y}   C_cap e^T y + C_loss V^T Y V
        subject to           I − YV = 0,
                             V_lo ≤ V_i ≤ V_hi,          ∀ i,
                             I_i ≤ Ī_i + y_i S_cap,       ∀ i,
                             y_i ∈ {0, 1},                ∀ i.
6.1.1 An Initial Feasible Solution

Section 3 discusses in some depth the process of obtaining an initial feasible solution. We merely show the globally relaxed problem of interest. It has a modified objective given by a smooth function C(y; μ), defined as

    C(y; μ) = (C_cap / μ) Σ_i (1 − e^{−μ y_i}).

The new formulation of the problem can then be stated as

SSO_cont:    minimize_{V, I, y}   C_mult Σ_i (1 − e^{−μ y_i}) / μ + V^T Y V
             subject to           I − YV = 0,
                                  V_lo ≤ V_i ≤ V_hi,          ∀ i,
                                  I_i ≤ Ī_i + y_i S_cap,       ∀ i,
                                  0 ≤ y_i ≤ 1,                 ∀ i,

where C_mult = C_cap / C_loss. Since C_cap and C_loss are given, C_mult is fixed. The resulting solution need not have all y_i ∈ {0, 1}. However, any nonzero y_i < 1
is rounded to 1. Often, we may prefer to begin with a configuration with fewer facilities, since adding facilities preserves feasibility. This is achieved by using γ·C_mult, where γ > 1. An illustrative example that demonstrates the impact of changing γ is shown in [MS04]. In practice, this method of obtaining an initial feasible solution has been highly effective.

6.1.2 A Local Relaxation

Given a feasible integer solution, we use the 9-point stencil to specify a neighborhood (see Figure 4). As discussed earlier, we solve a continuous problem given by
SSO_sub:    minimize_{V, I}   V^T Y V
            subject to        I − YV = 0,
                              (the voltage and current bounds implied by the relaxed neighborhood placement).
Fig. 11. Gaussian Load Distribution: 20 × 20 (left) and Sample Snohomish PUD Load Distribution: 24 × 46 (right)

6.2.1 Computational Results from Commercial Solvers

This subsection compares the performance of three MINLP solvers with our algorithm. We considered the following solvers for our calibration tests: dicopt, sbb and cplex. We restricted our load distribution to the Gaussian load distribution described in the earlier section. Results for grid sizes varying from 6 × 6 to 20 × 20 are given in Table 1. GAMS/DICOPT did not converge even for the smallest size and its performance is not reported. Table 1 reports the final costs (scaled; see the table note) and the final number of substations for the CG method as obtained for the non-uniform distribution. We also report the performance of sbb and cplex. It should be noted that both these solvers fail to solve problems to optimality for sizes larger than 9 × 9. We get suboptimal results for sbb up to 19 × 19. However, the results obtained by them are in some cases 10% worse than those of the CG algorithm. It is observed that the SSO algorithm produces optimal costs comparable to those produced by commercial solvers in the few cases where we can compare.
Size    | Initial (n0, z0) | CG (n_G, z_G) | SBB (n_s, z_s) | CPLEX (n_c, z_c)
6x6     | 1    2.2         | 1    2.2      | 1    2.1       | 1    2.1
7x7     | 1    2.0         | 1    2.0      | 1    2.1       | 1    2.0
8x8     | 2    4.9         | 2    4.2      | 2    4.3       | 2    4.1
9x9     | 2    5.0         | 2    4.3      | 2    4.5       | 2    4.2
10x10   | 3    7.4         | 4    6.5      | 3    6.4       | *    *
11x11   | 2    9.6         | 4    6.6      | 3    6.8       | *    *
12x12   | 3    11.8        | 5    8.7      | 4    9.2       | *    *
13x13   | 3    17.8        | 5    10.7     | 4    11.8      | *    *
14x14   | 3    17.0        | 6    11.0     | 4    12.4      | *    *
15x15   | 4    22.2        | 6    13.0     | 5    15.0      | *    *
16x16   | 5    20.8        | 8    15.5     | 5    15.4      | *    *
17x17   | 5    28.3        | 9    17.6     | 7    16.9      | *    *
18x18   | 6    26.4        | 10   19.6     | 8    22.2      | *    *
19x19   | 6    34.5        | 11   21.9     | 9    23.9      | *    *
20x20   | 6    38.7        | 13   24.2     | *    *         | *    *

Table 1. Comparison with commercial MINLP solvers. z0 and n0 represent the cost and the number of substations in the initial feasible solution; G: CG method, s: sbb, c: cplex. The costs z have been scaled. The results from sbb are early-termination results at 1000 branch-and-bound nodes. * implies failure. Note that dicopt fails even on the smallest example.
While it is difficult to compare the computational effort of our algorithm with commercial solvers on the basis of iterations, it is possible to construct ratios of every algorithm's performance relative to the effort taken to solve the 6 × 6 case. We show the scaled computational effort against the scaled number of integers in Figure 12. It can be seen that our algorithm shows modest growth in computational effort with the number of integers. However, the two commercial solvers show exponential growth in computational effort. In fact, for the 8 × 8 case, the computational effort of sbb and cplex grows by factors of 39 and 62, respectively. To exemplify the differences in time taken, the SSO algorithm takes less than a minute to solve the 15 × 15 case while cplex takes over 10 hours (on Solaris 8). Note that cplex took 3.47 CPU seconds to solve the 6 × 6 problem.

6.2.2 Computational Results on Large Problems

This section shows the workings for some large-scale cases. The commercial solvers were unable to solve problems larger than a few hundred integer variables. Our algorithm was able to solve problems as large as 2500 integer variables without difficulty. We show results for a Gaussian load distribution (Table 2) with sizes ranging from 10 × 10 to 50 × 50. The CG method consistently produced better results efficiently. Table 3 shows the lower bounds obtained for the SSO algorithm (as obtained by solving a related mixed-integer linear program). As a comparison, the solution of the convex quadratic program is also shown. Since the lower bounds require the solution of a large mixed-integer linear program, computation of such bounds becomes difficult for large problems.
[Plot: Scaled Number of Iterations vs. Scaled Number of Integers; x-axis: scaled number of integers (1 = 36, 1.36 = 49, and so on); curves: Assignment, CPLEX, SBB.]
Fig. 12. Comparison of scaled computational effort (cost) vs. scaled number of integers. This ratio is set to 1000 when a solver fails to converge or terminate gracefully. The y-axis is drawn with a logarithmic scale. Note that we also show results for an alternative to the CG approach (called the assignment method).
Size    | Initial (n0, z0) | CG (n, z)      | z_convex
10x10   | 3    0.0074      | 4    0.0065    | 0.0031
15x15   | 4    0.0222      | 6    0.0130    | 0.0031
20x20   | 6    0.0387      | 13   0.0242    | 0.0056
25x25   | 8    0.1085      | 20   0.0375    | 0.0087
30x30   | 13   0.1377      | 26   0.0565    | 0.0127
35x35   | 18   0.2254      | 37   0.0762    | 0.0173
40x40   | 20   0.2696      | 46   0.0993    | 0.0224
45x45   | 26   0.3532      | 56   0.1278    | 0.0285
50x50   | 34   0.4364      | 65   0.1605    | 0.0350

Table 2. Non-uniform load distribution. Note that z0 and n0 represent the cost and the number of substations in the initial feasible solution.

Size    | z_CG   | z_lb   | z_convex
6x6     | 0.0022 | 0.0010 | 0.0005
7x7     | 0.0020 | 0.0010 | 0.0005
8x8     | 0.0041 | 0.0022 | 0.0010
9x9     | 0.0043 | 0.0015 | 0.0010
10x10   | 0.0064 | 0.0020 | 0.0015
11x11   | 0.0066 | 0.0035 | 0.0015
12x12   | 0.0090 | 0.0042 | 0.0020
13x13   | 0.0107 | 0.0052 | 0.0026
14x14   | 0.0111 | 0.0072 | 0.0025
15x15   | 0.0129 | 0.0092 | 0.0031

Table 3. Lower bounds: non-uniform load distribution (z_CG: cost of the CG solution; z_lb: mixed-integer LP lower bound; z_convex: convex QP bound).
6.2.3 Total Cost vs. Number of Substations

We obtain an estimate of the optimal number of substations from the global relaxation procedure. However, once we have an optimal set of locations for this initial number, we add new substations. The resulting configuration may yield a set of locations with a lower system cost than the earlier number. If so, we continue to add substations until no further improvement in optimal cost can be recorded. Figure 13 shows the relationship of the total cost with the number of substations in the system for a 20 × 20 grid. Both the CG and assignment methods have similar relationships, though the former settles on 10 substations while the latter terminates at 11 substations.
Fig. 13. Comparison of total cost vs. number of substations. The cost is in per unit (pu) with C_mult = 0.001. Note that, in general, costs of 28 MVA substations are of the order of $4,000,000.
7 Theoretical Properties

This section develops some of the accompanying theory for this class of relaxation algorithms. General mixed-integer nonlinear programs do not possess polynomially verifiable optimality conditions, and algorithms that claim to reach an optimal solution need to find a sequence of upper and lower bounds that lie within a certain tolerance. In Section 7.1, we discuss bounds that may
be obtained for relaxation algorithms. Under assumptions of convexity on the nonlinear function, we may solve the problem as a sequence of mixed-integer linear programs. In fact, this sequence provides a set of increasing lower bounds, details of which are provided below.

7.1 Lower and Upper Bounds

A sequence of lower bounds is immediately available through the course of the algorithm.

Definition 2. Suppose the relaxation algorithm produces a sequence given by {x_k, y_k}, where y_k ∈ Z^n. Then the upper bound at major iteration k is given by the system cost at the previous iteration, viz.

    U_k = γ e^T y_{k−1} + f(x_{k−1}, y_{k−1}).
Lower bounds may be obtained in a series of ways and vary in their degree of tightness. The simplest lower bound is that obtained by relaxing the integrality constraints; it is denoted by L_r and defined as follows:

Definition 3. The lower bound L_r for the original FLP is obtained by solving the problem FLP_r:

FLP_r:    minimize_{x,y}   γ e^T y + f(x, y)
          subject to       bl ≤ Ax + By ≤ bu,   x ≥ 0,   0 ≤ y ≤ e.
Physically, this bound represents a solution in which a portion of each facility is placed at a node in accordance with demand. We may also obtain a bound using a global relaxation of the original FLP, as specified by the following lemma.

Lemma 1. Given a solution (x_k, y_k) with cost z_k = f(x_k, y_k) + γ e^T y_k to the original FLP, the corresponding cost of this solution in the globally relaxed problem (with a modified objective) is

    z_k^g = f(x_k, y_k) + (γ/μ) Σ_i (1 − e^{−μ [y_k]_i}).

Moreover, we have z_k^g ≤ z_k as long as μ ≥ 3.

Proof. This follows from a Taylor series expansion of z^g (we drop the subscript k for readability):
    (1/μ) Σ_i (1 − e^{−μ y_i}) = (1/μ) Σ_i ( 1 − (1 − μ y_i + ½ (μ y_i)² − ...) )
                               = Σ_i ( y_i + O(μ y_i) )
                               ≤ Σ_i y_i = e^T y.

The last inequality holds if O(μ y_i) ≤ 0. We prove this as follows:

    O(μ y_i) = −(μ/2) y_i² + (μ²/6) y_i³ − (μ³/24) y_i⁴ + (μ⁴/120) y_i⁵ − ...
             = −(μ/2) y_i² (1 − (μ/3) y_i) − (μ³/24) y_i⁴ (1 − (μ/5) y_i) − ... .

This implies that each grouped term is nonpositive provided

    1 − (1/3) μ y_i ≥ 0   ⟺   y_i ≤ 3/μ,
    1 − (1/5) μ y_i ≥ 0   ⟺   y_i ≤ 5/μ,
    ...
    1 − (1/(2n+1)) μ y_i ≥ 0   ⟺   y_i ≤ (2n+1)/μ.
But this always holds for μ ≥ 3 since y_i ≤ 1. □

Theorem 1. The global minimizer of the globally relaxed problem with a modified objective function
    f(x, y) + (γ/μ) Σ_i (1 − e^{−μ y_i})
provides a lower bound to the true problem FLP for μ ≥ 3:

FLP_g:    minimize_{x,y}   f(x, y) + (γ/μ) Σ_i (1 − e^{−μ [y]_i})
          subject to       bl ≤ Ax + By ≤ bu,   x ≥ 0,   0 ≤ y ≤ e.

Proof.
Lemma 1 implies that
    z*_{FLP_g} ≤ z*_{FLP}.

Moreover,

    FEA_{FLP} ⊆ FEA_{FLP_g},
where FEA refers to the feasible region, implying the result. □

The following lemma prescribes a distance measure between the solutions of the globally relaxed problem and the true problem for a particular set of integer decisions.

Lemma 2. The expression |O(μ y_i)| is bounded from above by ½ μ² e^μ.

Proof.
    O(μ y_i) = −(μ/2) y_i² ( 1 − (1/3) μ y_i + (1/12) (μ y_i)² − ... ),

so that

    |O(μ y_i)| ≤ (μ/2) y_i² ( 1 + μ y_i + ½ (μ y_i)² + ... )
              = (μ/2) y_i² e^{μ y_i}
              ≤ ½ μ² e^μ,

since y_i ≤ 1 (and μ ≥ 1). □
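A quick numerical check (our own sketch; the values of μ and y_i are arbitrary) confirms the two facts used above: the smoothed term (1/μ)(1 − e^{−μ y_i}) never exceeds y_i, and the remainder O(μ y_i) stays within the reconstructed bound of Lemma 2.

```python
import numpy as np

def smoothed(y, mu):
    """(1/mu) * (1 - exp(-mu * y)): the smoothed surrogate for y used in the relaxation."""
    return (1.0 - np.exp(-mu * y)) / mu

mu = 3.0
for y in [0.0, 0.1, 0.5, 1.0]:
    s = smoothed(y, mu)
    remainder = s - y                                   # this is O(mu * y) in the proof
    assert s <= y + 1e-12                               # smoothed value never exceeds y
    assert abs(remainder) <= 0.5 * mu**2 * np.exp(mu)   # Lemma 2-style bound (loose)
    print(f"y = {y:.1f}   smoothed = {s:.4f}   remainder = {remainder:+.4f}")
```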
Theorem 2. Given an integer solution (x_k, y_k) of FLP in which we have exactly m facilities, we may obtain a set of m lower bounds z_k^{L,j}, where j of the facilities have been relaxed and j = 1, ..., m. Note that j = 0 gives us the original integer solution. We obtain z_k^{L,j} by solving the appropriate continuous subproblem. Moreover, we have a monotonicity relationship given by

    z_k ≥ z_k^{L,1} ≥ z_k^{L,2} ≥ ... ≥ z_k^{L,m}.
Proof. This follows immediately by noticing that when any facility is relaxed, it results in a lower transportation cost at the same cost of capital. □

7.2 Convexity and MIP Bounds

Duran and Grossman [DG86] discuss an outer-approximation algorithm for solving mixed-integer nonlinear programs with a convex objective and convex constraints. This algorithm may also be used to provide a tighter set of bounds. However, as the problem sizes grow to the order of a thousand variables, it may be well nigh impossible to obtain such bounds.
7.3 Discrete Local Minima

In this section, we define the notion of a discrete local minimizer by extending the ideas from continuous optimization. It should be noted that such minimizers tend to be meaningful in situations when an integer variable has some locational significance on the graph. We begin by defining a δ-local neighborhood of a solution to (FLP). We should specify that the local neighborhood accounts only for a change in the position of the facilities and not in the actual number of facilities.

Definition 4. A δ-local neighborhood of (x_k, y_k) represents a set of points (x_j, y_j) such that if [y_k]_i = 1 then

    Σ_{l : dist(i,l) ≤ δ} [y_j]_l = 1,

and x_j represents the solution of the continuous problem FLP_sub(y_j). We denote such a neighborhood by N_{x_k, y_k}.

Essentially, the δ-local neighborhood allows the replacement of existing facilities with those within a distance δ from the current set of facilities. We may now define necessary conditions for a discrete local minimizer of this problem.

Definition 5. An integer solution (x*, y*) to the problem FLP with system cost z* is a discrete local minimizer if any point (x_j, y_j) belonging to N_{x*, y*} has system cost z_j with z_j ≥ z*.

Given such a definition, we now state the simple result that our relaxation algorithm does not move off a discrete local minimizer.

Lemma 3. Suppose the local relaxation algorithm begins at a discrete local minimizer. Then it terminates without any further progress.

Proof. This follows immediately by noticing that a full step of our algorithm moves to a point in the local neighborhood, as does every other point visited in the backtracking process. Since the local neighborhood contains no points with strictly better costs, we never move off the starting point. □
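Definition 5 can be checked directly once the δ-local neighborhood has been enumerated and each candidate placement has been priced by its continuous subproblem. The sketch below is our own simplification: the neighborhood and its costs are assumed to be given as data.

```python
def is_discrete_local_min(current_cost, neighborhood_costs):
    """Check the condition of Definition 5 on an enumerated delta-local neighborhood.

    neighborhood_costs maps each candidate placement (e.g. a tuple of node ids, one
    per facility, each within distance delta of its current node) to the system cost
    obtained by solving the corresponding continuous subproblem.
    """
    return all(cost >= current_cost for cost in neighborhood_costs.values())

# Hypothetical data: the current placement costs 12.0 and both neighbors cost more.
print(is_discrete_local_min(12.0, {(2, 7): 12.5, (3, 7): 13.1}))   # -> True
```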
8 Summary and Comments

We have presented a framework algorithm to solve nonlinear facility location problems and illustrated how it may be applied to the placement of substations in an electrical network. The algorithm is based on an approach commonly used to solve continuous problems, in which an improved point is identified from the neighborhood of the current best point. In general, the concept of
a neighborhood does not generalize to discrete problems. However, we show that the FLP is an exception. We define a neighborhood and show how an improved point may be found. The algorithm we propose has two important properties. First, the algorithm generates a sequence of feasible and improving estimates of the solution. Second, provided the density of facilities to locations does not significantly decrease as the dimension of the problem increases, the number of steps needed to move from the initial estimate to the solution does not, in general, increase. The key to the scalability of the algorithm is that at every step it is possible (and likely) that all facilities change locations. In one way, the discrete algorithm works better than its continuous counterparts. Typically, in a linesearch or trust-region algorithm for continuous problems, the initial step is usually only accepted when close to the solution (and not always then). In our observations, our discrete algorithm usually accepts the first step regardless of whether or not the current iterate is a good estimate. The reason for this is that the neighborhood of a continuous problem is difficult to define in terms of magnitude, but for a discrete problem it is not.

Acknowledgments

We extend our sincere thanks to Robert H. Fletcher, Public Utility District No. 1 of Snohomish County, Everett, Washington, and Patrick Gaffney, Bergen Software Services International (BSSI), for their guidance and support.
References

[AH89] H.M. Amir and T. Hasegawa. Nonlinear mixed-discrete structural optimization. Journal of Structural Engineering, 115:626-646, 1989.
[AMO93] R.K. Ahuja, T.L. Magnanti, and J.B. Orlin. Network Flows: Theory, Algorithms and Applications. Prentice Hall, Englewood Cliffs, NJ, 1993.
[CM89] J.Z. Cha and R.W. Mayne. Optimization with discrete variables via recursive quadratic programming: Part 2 - algorithms and results. Transactions of the ASME, Journal of Mechanisms, Transmissions and Automation in Design, 111:130-136, 1989.
[DG86] M.A. Duran and I.E. Grossman. An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Mathematical Programming, 36:307-339, 1986.
[Geo72] A.M. Geoffrion. Generalized Benders decomposition. J. Optim. Theory Appl., 10:237-260, 1972.
[GMS97a] P.E. Gill, W. Murray, and M.A. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization. Technical Report NA-97-2, San Diego, CA, 1997.
[GMS97b] P.E. Gill, W. Murray, and M.A. Saunders. User's guide for SQOPT 5.3: A Fortran package for large-scale linear and quadratic programming. Technical report, Systems Optimization Laboratory, Department of Operations Research, Stanford University, 94305-4022, 1997.
[Hoc97] D.S. Hochbaum. Approximation Algorithms for NP-hard Problems. PWS Publishing Co., 1997.
[Hol99] K. Holmstrom. The TOMLAB optimization environment in Matlab. Advanced Modeling and Optimization, 1(1):47-69, 1999.
[JV99] K. Jain and V.V. Vazirani. Primal-dual approximation algorithms for metric facility location and k-median problems. In IEEE Symposium on Foundations of Computer Science, pages 2-13, 1999.
[Ley93] S. Leyffer. Deterministic Methods for Mixed Integer Nonlinear Programming. PhD thesis, University of Dundee, Dundee, Scotland, UK, 1993.
[MS04] W. Murray and U.V. Shanbhag. A local relaxation approach for the siting of electrical substations. Computational Optimization and Applications (to appear), 2004.
[Ng02] K.-M. Ng. A Continuation Approach to Solving Continuous Problems with Discrete Variables. PhD thesis, Stanford University, Stanford, CA 94305, June 2002.
[NW88] G.L. Nemhauser and L.A. Wolsey. Integer and Combinatorial Optimization. John Wiley, New York, 1988.
[OV89] G.R. Olsen and G.N. Vanderplaats. Methods for nonlinear optimization with discrete design variables. AIAA Journal, 27:1584-1589, 1989.
[STA97] D.B. Shmoys, E. Tardos, and K. Aardal. Approximation algorithms for facility location problems (extended abstract). In 29th Annual ACM Symposium on Theory of Computing, pages 265-274, 1997.
Fluence Map Optimization in IMRT Cancer Treatment Planning and a Geometric Approach

Yin Zhang and Michael Merritt*

Department of Computational and Applied Mathematics, Rice University, Houston, TX 77005-4805, USA. {yzhang,mmerritt}@caam.rice.edu

* This author's work was supported in part by DOE/LANL Contract 03891-99-23 and NSF Grant No. DMS-0240058.

Summary. Intensity-modulated radiation therapy (IMRT) is a state-of-the-art technique for administering radiation to cancer patients. The goal of a treatment is to deliver a prescribed amount of radiation to the tumor, while limiting the amount absorbed by the surrounding healthy and critical organs. Planning an IMRT treatment requires determining fluence maps, each consisting of hundreds or more beamlet intensities. Since it is difficult or impossible to deliver a sufficient dose to a tumor without irradiating nearby critical organs, radiation oncologists have developed guidelines to allow tradeoffs by introducing so-called dose-volume constraints (DVCs), which specify a given percentage of volume for each critical organ that can be sacrificed if necessary. Such constraints, however, are of a combinatorial nature and pose significant challenges to the fluence map optimization problem. The purpose of this paper is two-fold. We try to introduce the IMRT fluence map optimization problem to a broad optimization audience, with the hope of attracting more interest in this promising application area. We also propose a geometric approach to the fluence map optimization problem. Contrary to the traditional view, we treat dose distributions as primary independent variables and beamlet intensities as secondary. We present theoretical and preliminary computational results for the proposed approach, while omitting excessive technical details to maintain an expository nature of the paper.

Key words: Cancer radiation therapy, Optimal treatment planning, Fluence map optimization, A geometric approach.
1 Introduction

Using radiation to treat cancer requires careful planning. Bombarding malignant tumors with high-energy X-rays can kill cancerous cells (or hinder their
growth), but it is usually impossible to deliver a terminal dose without damaging nearby healthy organs in the process. Serious patient complications can occur when the surrounding healthy tissues receive too much of this collateral radiation. On the other hand, sacrificing a modest number of healthy cells may be tolerable since many organs are resilient enough to sustain a certain degree of damage while still providing their anatomical function and can eventually recover. Therefore, research in radiation therapy seeks methods of delivering a sufficient dose to the tumor, while carefully controlling the dose received by neighboring critical organs and other healthy tissues.

1.1 IMRT

Intensity-modulated radiation therapy (IMRT) is a state-of-the-art method which delivers higher doses to tumors and allows more precise conformation than the conventional 3D conformal radiotherapy. The primary delivery tool for IMRT is a linear accelerator that rotates on a gantry around the patient, emitting "modulated" beams of X-rays. This modulation is accomplished by means of a device known as a multileaf collimator (MLC) which is attached to the accelerator. Its adjustable heavy-metal leaves act as a filter, blocking or allowing radiation through in a precise manner controlled by a computer, in order to tailor the beam shape to the shape of the tumor volume while minimizing exposure of the neighboring structures.

Several mathematical problems arise in order to optimally administer IMRT. Treatment proceeds by rotating the accelerator around the patient and coordinating the leaf movements in the MLC so that the radiation delivered conforms to some desirable dose distribution at each gantry (beam) angle. We will assume in this paper that treatments are administered by fixing the accelerator at a finite number of given gantry angles, rather than emitting radiation while rotating through a continuous arc. We note that determining the number and the values of the gantry angles constitutes a higher-level optimization problem of a combinatorial nature, often called the beam-angle optimization problem. Typically, increasing the number of gantry angles would increase the quality and the cost of the treatments.

In addition to knowing the beam angles, one must also know how intense the beams should be at each point (x, y) on the MLC aperture for all gantry angles. These intensity profiles, or fluence maps, are represented by two-dimensional, nonnegative functions I_a(x, y) for a = 1, 2, ..., k, where k is the number of gantry angles in use. The process of determining the functions I_a(x, y) is often called fluence map optimization.

Finally, once the fluence maps I_a(x, y) are determined, one must convert these into MLC leaf sequences that attempt to realize them. The longer an MLC leaf is open at a certain position (x, y), the more dose the tissue along a straight path from that position (plus some surrounding tissue) absorbs. The process of converting fluence maps into the opening and closing movements of leaves is called leaf-sequencing. There are many physical and mathematical
issues that affect how successful MLC leaf sequences are at approximating the desired fluence maps. In this paper, we will focus solely on the problem of computing the fluence maps I_a(x, y), such that the tumor, or target, structures receive the prescribed doses and the healthy critical structures receive as little as possible. These conflicting goals are the primary cause of difficulty in fluence map optimization.

1.2 Dose-Volume Constraints

Besides some scattering, the radiation travels primarily in a straight line, so it must typically pass next to or even directly go through critical organs in order to reach and irradiate intended tumor targets. Since the doses that kill most cancers are much larger than those that kill most healthy tissue in the body, even though multiple angles are used in an attempt to focus radiation on the targets, more often than not one has no choice but to sacrifice some healthy tissues. The next sensible objective is to control the volume of the healthy tissues to be sacrificed. Toward this end, in standard practice oncologists prescribe dose-volume constraints (DVCs) that allow a certain percentage of volume in healthy tissues to be sacrificed in order to make sufficient progress in treating the cancer.

A typical DVC has the form, for example, that no more than 30% of the volume of the right lung can exceed a radiation dose of 20Gy, where "Gy" is shorthand for "Gray" - the international unit for radiation dose absorption. In addition, oncologists may specify another level of constraint on the same organ, such as no more than 40% of the volume of the right lung can exceed 10Gy. These dose-volume constraints are natural for oncologists to specify and have become the de facto standard way to prescribe radiation therapy treatment in practice.

Clearly, dose-volume constraints provide the much needed flexibility necessary for the escalation of tumor doses. On the other hand, they also introduce a high degree of complexity to the underlying optimization problem. In the above example, for instance, which 30% of the right lung volume should be allowed to absorb more than 20Gy? This brings a combinatorial component to the optimization problem (once the problem is discretized). Mathematically, finding the globally optimal combination of critical organ cells to sacrifice in this way can be an extremely difficult problem.
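To fix ideas, a dose-volume constraint of this kind is straightforward to check once the voxel doses of a structure are known. The sketch below is our own illustration with made-up numbers, testing the "no more than 30% of the right lung above 20Gy" example.

```python
import numpy as np

def dvc_satisfied(doses, max_fraction, threshold_gy):
    """True if at most `max_fraction` of the structure's voxels exceed `threshold_gy`."""
    doses = np.asarray(doses, dtype=float)
    return float(np.mean(doses > threshold_gy)) <= max_fraction

# Hypothetical right-lung voxel doses in Gy: 2 of 10 voxels exceed 20 Gy.
rt_lung = [5, 8, 12, 14, 15, 17, 18, 19, 22, 25]
print(dvc_satisfied(rt_lung, max_fraction=0.30, threshold_gy=20.0))   # -> True
```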
2 Fluence Map Optimization

In this section, we provide details on the current practice of IMRT treatment planning, as they are relevant to the fluence map optimization.
2.1 Discretizations

To determine the fluence map functions I_a(x, y), we first discretize the MLC aperture for each angle by putting a rectangular grid {(x_i, y_j)} on it. As a result, the two-dimensional function I_a(x, y) will be approximated by a set of discrete values {I_a(x_i, y_j)}. The actual number of these small rectangular elements, or "bixels," will depend not only on the physical sizes of the MLC device (such as the width of the MLC leaves), but also on the gantry angles and the geometry of the region under treatment. For instance, if a beam emitting from a given grid point is determined not to have a significant intersection with or impact on the region of treatment, then this particular grid point will be omitted from consideration. With this discretization, each MLC aperture is broken into hundreds (or up to thousands) of discrete "bixels" and, correspondingly, each radiation beam is broken into as many discrete "beamlets." The total number of beamlets in a given fluence map optimization problem is the sum of the beamlets for all the beam angles. Let n be the total number of beamlets for all beam angles and let us index the bixels linearly. Instead of using the notation I_a(x_i, y_j) for the unknown beamlet intensities, we denote the unknowns by a vector x ∈ R^n. In addition, since the intensity values are nonnegative, we have x ∈ R^n_+, where R^n_+ denotes the nonnegative orthant of R^n.

Moreover, we also need to discretize the "region of interest" or "region of treatment." This is the three-dimensional volume of the patient's anatomy containing the target structures and any nearby critical structures that might be adversely affected by the radiation. Similarly, we will break this volume up into small three-dimensional rectangular elements known as "voxels", each of which is associated with a point (x_i, y_j, z_k) ∈ R^3. Let m be the total number of voxels in the region of interest and let us index the voxels linearly. During the treatment, each voxel will absorb a dose of radiation. We denote the dose values absorbed by the voxels in the region of interest by a vector d ∈ R^m. Furthermore, let m_t and m_h be the number of target and healthy voxels, respectively, so that m_t + m_h = m. Similarly, we will decompose the dose vector into two sub-vectors d_t and d_h, corresponding to dose values absorbed by the target and healthy voxels, respectively.

2.2 Dose Calculation

The standard IMRT model for the dose absorbed at the i-th voxel in the region of interest is

    d_i = Σ_{j=1}^{n} a_ij x_j,    (1)

where a_ij represents the amount of dose absorbed by the i-th voxel per unit intensity emission from the j-th beamlet. The values a_ij for all the voxels and bixels form a matrix A ∈ R^{m×n}, known as the "influence matrix" (or kernel matrix). In matrix notation, the dose calculation formula (1) becomes

    d = Ax.    (2)
Figure 1 shows how each element aij relates to a beamlet emitted from the MLC and a voxel in the discretized region of interest.
Fig. 1. Influence Matrix Element
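A minimal numerical sketch of the dose model (1)-(2): the sizes below are toy values, and the random matrix is only a stand-in for the (normally sparse, engine-supplied) influence matrix A.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 8                                    # toy numbers of voxels and beamlets
mask = rng.random((m, n)) < 0.3                 # sparsity pattern of the influence matrix
A = rng.random((m, n)) * mask                   # nonnegative entries a_ij
x = rng.random(n)                               # nonnegative beamlet intensities
d = A @ x                                       # equation (2): d = A x
print(d.shape, float(d.min()) >= 0.0)           # every voxel dose is nonnegative
```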
Assume A has no zero rows or columns. This means, respectively, that all voxels receive some nonzero amount of radiation and every beamlet influences at least one voxel's dose. These conditions can be easily met by pre-processing, if necessary. Typically, m >> n, with m on the order of 10^5 or larger and n on the order of 10^3 up to 10^4. Note that the entries a_ij are necessarily nonnegative. In fact, depending on how much scattering is included, the influence matrix A can be very sparse or fairly dense.

The dose calculation model using the influence matrix A can be considered a first-order approximation. Radiation absorption as a function of the beamlet intensities can be modeled with linear Boltzmann transport equations [Lar97]. Solving these equations can be complicated and computationally expensive, so many different approximation methods have been proposed for computing A. Monte Carlo sampling techniques are, for example, among the more popular methods because of their accuracy. However, Monte Carlo methods are also very slow and expensive. Some commercial planning systems include dose calculation engines with several levels of accuracy. In this way, dose calculations in early iterations, which are usually not required to be
highly accurate, can be made less expensive [WDL+03], while more accurate (and more expensive) schemes can be used in later iterations. Clearly, dose calculation is still an important research area in its own right. While acknowledging its importance, for now we will assume that a constant influence matrix A is provided to us a priori, and we will use it throughout our optimization process.

2.3 Prescriptions

A sample prescription for a lung cancer case is given below. As one can see, the prescription consists of a set of dose-volume constraints. Currently, this is the standard practice in prescribing radiation therapy treatments.
>= 95% of Tumor        receives >= 63 Gy
<=  1% of Tumor        receives >= 72 Gy
>= 95% of Ext_Tumor    receives >= 60 Gy
<=  1% of Ext_Tumor    receives >= 70 Gy

<=  1% of Cord         receives >= 43 Gy
<= 15% of Heart        receives >= 30 Gy
<= 20% of Esophagus    receives >= 10 Gy
<=  2% of Lt_Lung      receives >= 20 Gy
<=  8% of Lt_Lung      receives >= 10 Gy
<= 30% of Rt_Lung      receives >= 19 Gy
<= 40% of Rt_Lung      receives >= 10 Gy
<= 50% of Norm_Tissue  receives >= 54 Gy
Fig. 2. A sample prescription
For each structure, tumorous or healthy, there is at least one dose-volume constraint, specified by a percentage number on the left and a threshold value for dose on the right. The first four lines of the prescription are for a tumor and a so-called extended area around the tumor that is introduced to account for uncertainties about the boundary of the tumor. The first two lines state that the target tumor dose should be higher than a threshold value of 63Gy, although 5% of the target volume may be lower than that. On the other hand, the target dose should be below 72Gy except for a portion of 1% of the volume. We can similarly interpret the specifications on the other structures.

It should be pointed out that the dose-volume constraints for the target structures are very different in nature from those for the healthy structures. Obviously, they are not there for the purpose of sacrificing a part for the good of the whole. They are there because it is too difficult or impossible to achieve
a uniform target dose, so some imperfections are allowed. For example, we may very well regard the first two lines of the prescription as a perturbation to a target dose specification of 65Gy. In the rest of the paper, that is precisely the approach we will take; namely, we will assume that a single target dose value will be given for each target structure, knowing that it is unlikely to be exactly satisfiable. This assumption will significantly simplify our presentation, even though our formulation, to be introduced in the next section, can be extended to deal with dose-volume constraints for target structures.

2.4 Current Practice and Research

The IMRT fluence map optimization problem has been extensively studied for a number of years, mostly by medical physicists but more recently also by operations researchers and applied mathematicians. A survey on this subject from a mathematical viewpoint can be found in [SFOM99]. We now give a brief overview of the current practice and on-going research in IMRT fluence map optimization. Intended for a non-expert audience, this overview is by no means comprehensive. For a collection of recent survey papers on many aspects of IMRT treatment planning, including one on mathematical optimization by Censor [Cen03], we refer the reader to the book [PM03] and the references thereof.

The fluence map optimization problem can be viewed as an inverse problem where one designates a desirable dose distribution and attempts to determine a beamlet intensity vector that best realizes the given distribution. There are different ways to formulate this problem into optimization problems using different objective functions, some biological and some physical. Biological models attempt to represent statistical knowledge of various biological responses, such as tumor control probability (see [Bra99], for example). At present, however, the predominant formulation is the "weighted least squares" model as described below, which is being used in most commercial IMRT systems on the market.

If an objective function is associated with each anatomical structure, then this problem can be naturally viewed as a multi-objective optimization problem (for example, see [CLKB01, LSB03]). However, due to the difficulties in directly solving multi-objective optimization problems, the prevalent approach in IMRT fluence map optimization is to use a weighted least squares fitting strategy. Although many variations exist, a typical form of the weighted least squares formulation is the following. For each voxel i, one tries to fit the calculated dose value d_i to some "desirable" value b_i. For a target voxel, this "desirable" value is just the prescribed dose value for the tumor structure to which the voxel belongs. For a healthy voxel (for which there is really no "desirable" dose other than zero), it is usually set to the threshold value of the dose-volume constraint, though sometimes adaptive values are used. If a calculated dose for a healthy voxel is less than its "desirable" value, then the
corresponding error term is set to zero. This way, only those doses higher than their "desirable" values are penalized. Then to each target and critical structure, one attaches a weight parameter that represents the relative priority of fitting its calculated doses to the desired one. In fact, different tradeoffs between structures can be made by adjusting these weights. To illustrate, suppose there are four structures S_j, j = 0, 1, 2, 3, each consisting of a set of voxels, where S_0 is a target structure and the other three are healthy ones. Then the objective function in the weighted least squares formulation takes the form
    f(x) = Σ_{j=0}^{3} w_j f_j(d(x)),    (3)

where d(x) = Ax is the calculated dose vector corresponding to a beamlet intensity vector x ≥ 0 (see (2)), w_j are the weights,

    f_0(d) = Σ_{i ∈ S_0} (d_i − b_i)²,
    f_j(d) = Σ_{i ∈ S_j} max(0, d_i − b_i)²,   j = 1, 2, 3,
and b is the vector of "desirable values" for all voxels. In this case, the resulting error function f(x) in (3) is a convex, piecewise quadratic function of x. One seeks to minimize f(x) subject to the nonnegativity of the beamlet intensities in x. Obviously, solutions to this weighted least squares problem vary with the a priori choice of the weights.

The weighted least squares model in (3) does not directly enforce the dose-volume constraints in a given prescription, which represent the most challenging aspect of the fluence map optimization problem. One approach to enforcing the dose-volume constraints is to try out different choices of the weights while using the dose-volume constraints as evaluation criteria for solutions. The fundamental difficulty in this approach is that there does not seem to exist any transparent relationship between weights and prescriptions. Hence, weight selection basically reduces to a trial-and-error process, which too often becomes overly time-consuming in terms of both human and computer times. In addition to manipulating the weights, some formulations (e.g. [WM00]) add penalty terms to the weighted least squares objective to "encourage," but not impose, dose-volume constraint feasibility. These penalty terms are inevitably non-convex, thus introducing the complexity of having to deal with local minima. To address this problem, some systems include the option of using stochastic global optimization techniques such as simulated annealing and genetic algorithms to help escape from unsatisfactory local minima. Since the weighted least squares problems are usually quite large, gradient-type algorithms are often the methods of choice. Some implementations also employ conjugate gradient [SC98] or secant methods [LSB03]. The nonnegativity of beamlet intensities is enforced either by projection [WM00] or by some other means [LSB03].
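For concreteness, the sketch below (ours; the structures, weights and data are entirely hypothetical) evaluates the weighted least-squares objective (3) for a candidate intensity vector, penalizing deviation on the target structure and only overdose on the healthy ones.

```python
import numpy as np

def weighted_ls_objective(x, A, b, structures, weights, target_ids):
    """Evaluate f(x) = sum_j w_j f_j(A x) as in equation (3).

    structures : dict mapping structure id -> array of voxel indices
    target_ids : ids of target structures (two-sided penalty); others are healthy
    """
    d = A @ x
    total = 0.0
    for j, voxels in structures.items():
        err = d[voxels] - b[voxels]
        if j in target_ids:
            total += weights[j] * np.sum(err ** 2)                    # f_0: two-sided
        else:
            total += weights[j] * np.sum(np.maximum(0.0, err) ** 2)   # overdose only
    return total

rng = np.random.default_rng(1)
A, x = rng.random((12, 5)), rng.random(5)
b = np.full(12, 3.0)
structures = {0: np.arange(0, 4), 1: np.arange(4, 8), 2: np.arange(8, 12)}
weights = {0: 10.0, 1: 1.0, 2: 1.0}
print(weighted_ls_objective(x, A, b, structures, weights, target_ids={0}))
```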
With all its shortcomings, the conceptual and algorithmic simplicity of the weighted least squares approach is still attractive to practitioners. Indeed, the current state of IMRT treatment planning does represent remarkable progress in cancer radiotherapy. On the other hand, many issues remain, ample room for improvement exists, and intensive research activities are still on-going. One of the research directions in this field is the so-called weight optimization (see [XLD+99], for example), aimed at automating the weight selection process.

The IMRT fluence map optimization problem has attracted considerable attention from researchers in the mathematical programming community, who tend to formulate the problem into linear or mixed-integer linear programs (see [LFC03, Hol03, HADL03, RLPT04] for a sample of some recent works). Linear programming techniques for radiation therapy have been proposed and studied since the early days [BKH+68] and have also been considered for treating dose-volume constraints [LMRW91]. On the other hand, the introduction of mixed-integer programming techniques into radiotherapy planning for treating dose-volume constraints was a more recent event. Many contributions from the mathematical programming community seem encouraging and promising. Their impact on the clinical practice of radiotherapy, even though limited at this point, will hopefully be felt over time.

Finally, we reiterate that the above short overview is by no means comprehensive. Given the vast literature on this subject, many omissions have inevitably occurred, most notably works based on biological objective functions and works connecting the fluence map optimization to the beam angle optimization as well as to the multileaf sequencing.
3 A Proposed Approach

In our view, there are two levels of difficulty in IMRT fluence map optimization, as outlined below. Our focus in this paper will be on the first issue.
1. Given a prescription, one needs to find a beamlet intensity vector so that the calculated dose from it will satisfy the prescription as closely as possible. The difficulty for this problem lies in the fact that dose-volume constraints define a complicated non-convex feasibility set. This leads to a non-convex global optimization problem that is difficult to solve exactly. The traditional weighted least squares approach relies, to a large degree, on a trial-and-error weight-selection process to search for a good plan.
2. Due to variations from patient to patient even for the same kind of cancer, more often than not, oncologists themselves do not know a priori a "good and achievable" prescription for a particular patient. A highly desirable prescription could be too good to be achievable, while an achievable one might not be close enough to the best possible. Development of procedures to assist oncologists in their decision-making is of paramount importance.
In this section we propose a geometric approach that is entirely prescription-driven and does not require any artificial weights. At the same time, we will retain the least squares framework. As such, one can consider our formulation as a "weightless least squares" approach.

We consider two sets in the dose space: (i) the physical set consisting of physically realizable dose distributions, and (ii) the prescription set consisting of dose distributions meeting the prescribed tumor doses and satisfying the given dose-volume constraints. In the case where a prescription is given, we seek a suitable dose distribution by successively projecting between these two sets. A crucial observation is that the projection onto the prescription set, which is non-convex, can be properly defined and easily computed. The projection onto the physical set, on the other hand, requires solving a nonnegative least squares problem. We show that this alternating projection algorithm is actually equivalent to a greedy algorithm driven by local sensitivity information readily available in our formulation. Moreover, the availability of such local sensitivity information offers an opportunity to devise greedy algorithms to search for a desirable plan even when a "good and achievable" prescription is unknown.

To keep the expository flavor of the paper, we will not include long and overly technical proofs for some mathematical results stated. A more complete treatment, including extensive numerical results, will be presented in a subsequent paper in preparation.

3.1 Prescription and Physical Sets

We partition the rows of the influence matrix A into two groups: those for target voxels and those for healthy ones; that is,
    A = [ A_t ; A_h ],    (4)
where A_t is the submatrix consisting of the rows for target voxels and likewise A_h of those for healthy voxels. Recall that A ∈ R^{m×n}_+, where m = m_t + m_h is the number of voxels and n the number of bixels. Thus A_t ∈ R^{m_t×n}_+ and A_h ∈ R^{m_h×n}_+. With this notation, A_t x gives the calculated doses for the target voxels and A_h x those for the healthy ones. We start by defining two sets in the dose space.

Definition 1 (Prescription Set). Let b_t ∈ R^{m_t}_+ be the dose vector for target voxels in a given prescription, and D_y ⊂ R^{m_h}_+ be the set of dose vectors for healthy voxels that satisfy all the dose-volume constraints in the given prescription. We call the following set the prescription set:
    H = { [ b_t ; u ] : u ∈ D_y }  ⊂ R^m_+.    (5)
Clearly, any dose vector d ∈ H precisely meets the prescribed target doses and at the same time satisfies all the dose-volume constraints given in the prescription. If the healthy tissue doses calculated from a beamlet intensity vector x ∈ R^n_+ satisfy the dose-volume constraints, then we must have A_h x ≤ u; or equivalently, A_h x + s = u for some nonnegative slack variable s ∈ R^{m_h}.

Definition 2 (Physical Set). Let A be defined as in (4). We call the following set the physical set:
    K = { [ A_t x ; A_h x + s ] : (x, s) ≥ 0 }  ⊂ R^m_+.    (6)
Clearly, the physical set contains all the dose vectors that can be physically realized (disregarding the slack variable) under the standard dose calculation model.

Both H and K are closed sets in R^m, and K is a convex cone but H is non-convex. In fact, D_y is a non-convex union of convex "boxes." For example, suppose we have only two healthy tissue voxels in the region of interest and one dose-volume constraint: at least 50% of voxel doses must be less than or equal to 1Gy. Then D_y = {u ∈ R²_+ : u_1 ≤ 1} ∪ {u ∈ R²_+ : u_2 ≤ 1}; i.e., either u_1 can be greater than one or u_2, but not both. Clearly, this is the L-shaped (hence non-convex) region in the first quadrant along the two coordinate axes. Figure 3 shows the relation of D_y to H and K for this case when there is one target voxel. Note that the L-shaped region is elevated to a height corresponding to a given target dose value b_t. In this figure, the two sets H and K do not intersect. It is easy to imagine that with more voxels and more dose-volume constraints, the complexity of the geometry of D_y grows quickly out of hand.

However, D_y always enjoys a very nice geometric property. That is, despite its non-convexity, D_y permits a trivial projection onto it once the issue of non-uniqueness is resolved (see below). For example, suppose that D_y ⊂ R^10 specifies only one dose-volume constraint: at least 70% of the voxels must have doses of no more than 5Gy. Then

    Proj_{D_y}((1, 2, 3, 4, 5, 6, 7, 8, 9, 10)^T) = (1, 2, 3, 4, 5, 5, 5, 8, 9, 10)^T,

where Proj_{D_y} is the projection onto D_y. That is, we set the smallest two numbers greater than 5 equal to 5. Clearly, this is the closest point in D_y as it effects the least change on the original point in R^10. Since D_y is non-convex, such a projection will not always be unique, but this issue can be resolved by setting some priority rules. It is not difficult to see that dose-volume constraints (DVCs) for multiple structures, and multi-level DVCs for the same structure, can be treated in a similar fashion. Moreover, it is worth noting that projecting a point d ∈ R^m onto H is tantamount to setting the first m_t components of d (the target voxel dose values) to b_t and projecting the last m_h components of d (the healthy voxel dose values) onto D_y. On the other hand, projecting onto K is substantially more difficult and will be discussed next.
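The "lower the smallest offenders" projection described above is easy to code for a single dose-volume constraint of the form "at least a fraction q of the voxels must be at most t". The sketch below is our own and reproduces the ten-voxel example; multiple structures and multi-level DVCs would be handled analogously, structure by structure.

```python
import numpy as np

def project_single_dvc(u, frac_below, threshold):
    """Project u onto {v >= 0 : at least frac_below of the entries satisfy v_i <= threshold}.

    The closest point is obtained by lowering to the threshold the *smallest* entries
    that exceed it, until enough entries lie at or below the threshold.
    """
    v = np.asarray(u, dtype=float).copy()
    need_below = int(np.ceil(frac_below * v.size))
    already = int(np.sum(v <= threshold))
    if already >= need_below:
        return v                                   # u already satisfies the constraint
    over = np.where(v > threshold)[0]
    order = over[np.argsort(v[over])]              # smallest offenders first
    v[order[: need_below - already]] = threshold
    return v

u = np.arange(1.0, 11.0)                           # (1, 2, ..., 10)
print(project_single_dvc(u, frac_below=0.7, threshold=5.0))
# -> [ 1.  2.  3.  4.  5.  5.  5.  8.  9. 10.]
```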
216
Yin Zhang and Michael Merritt
besirable and beliverable Fig. 3. Prescription set 7-t and physical set K in dose space :!FI
3.2 O p t i m i z a t i o n F o r m u l a t i o n s
Given a prescription, ideally we would like to find x E iR7, s E R y ' h n d u E D, such that Atx = bt, Al,x s = u,
+
but this system is generally over-determined and does not permit a solution. To see this, it suffices to examine the first equation Atx = bt which has mt equations with n unknowns. In practice, there are usually more target voxels than the total number of bixels, i.e., mt > n. The reality of the IMRT fluence map problem is that there may be no physically achievable dose that both satisfies the DVCs and meets the target prescription. That is, 1-I n K = 0; or equivalently, di~t(1-I~ K) > 0 where dist(., .) is the Euclidean distance between two sets. Thus, we are motivated to find a prescription dose vector dT = [b: uT], u E D,, that is closest to the physical set K (or vice versa). In this view, we have an optimization problem with a variable u E D, (because bt is fixed):
The objective in the problem (7) describes the distance from a given prescription dose vector to the physical set K which can be written as
Fluence Map Optimization in IMRT Cancer Treatment Planning
217
where || • || is the Euchdean norm by default (though any other fixed, weighted norms can be used as well). Equivalently, we can replace the norm above by one half times the square of the norm and define the following objective function fiu) = min^ i WAx - btf + i \\A„x + s - uf . (8) Namely, f{u) is itself the optimal value of a linear least squares problem with nonnegativity constraints. Using this notation, we can rewrite the problem (7) into the following equivalent form min f(u).
(9)
It is not difficult to show that f{u) decreases monotonically as u increases. Let {x{u),s{u)) be the solution of the optimization problem defined in the right-hand side of (8) for a given u e Vy. Then under suitable conditions, it can be proved that f{u) is differentiable and Vf{u) = -ma.x{0, Ahx{u)-u)
<0,
(10)
where the maximum is taken component-wise. The derivation of this formula is rather long and we omit it here for the sake of space. Formulation (9) readily provides the sensitivity of a planning quality measure, f{u) in (8), with respect to the dose upper bounds u. This information can potentially be utilized to search the "prescription space" in the more realistic case where a definitive "optimal" prescription is not available, but a set of guideUnes are. 3.3 Alternating Projection Algorithm The method of alternating (or successive) projections is well known in convex optimization. It is simple and effective, though it can be quite slow. Given two closed convex sets E and F with a nonempty intersection, one can obtain a point in E f] F by successively projecting points in E and F onto each other. The convergence of this procedure when E Ci F j^ 9 was first proven by von Neumann for closed convex sets in Hilbert space satisfying certain properties [Neu50]. In finite dimensional cases, convergence is guaranteed (see [BB93, CG59]). One can further show that if the intersection is empty, the algorithm still converges to a pair of points {x,y) £ E x F such that \\x — y\\ = dist{E,F) > 0. For our problem, it is usually the case that dist{H,IC) > 0. Successive or simultaneous projection algorithms have been applied to different formulations of the IMRT flucence map optimization problem, see [CLM+98, WJLM04, XMCG04] for example, where the sets involved in projections are all convex sets (in some cases convex approximations to non-convex sets). To our best knowledge, projections have not been directly apphed to the non-convex DVC feasibility set Vy defined in (9). The set Vy in (9) consists
218
Yin Zhang and Michael Merritt
of a large number of "branches," where one or more of the voxels has a dose that exceeds its threshold dose. Obtaining or verifying a global minimum on such a set can be excessively difficult. We will instead seek a local minimum in one of the branches. We propose to apply an alternating projection algorithm to find a local minimum of (7) by successively projecting iterates in dose space back and forth between the prescription set Ti and the physical set IC. Specifically, given do &7i and for /e = 0,1,2,..., do 4+1/2 = Proj^cK),
4 + 1 = Proj„((ifc+i/2).
(H)
In this algorithm, the iterates are in the prescription set, while the intermediate iterates are in the physical set corresponding to a sequence of beamlet intensity vectors {xi;^i/2 : k = 0,1,2,...}. As mentioned earlier, the projection onto H is easy, and the projection onto IC requires solving a nonnegative hnear least squares problem as defined in the right-hand side of (8). We emphasize that the starting point do & Ti should be chosen to satisfy threshold values of all the dose-volume constraints; i.e., do should be in the intersection of all the "branches" (or "boxes"). For example, if a dosevolume constraint for a given structure is "no more than 30% of voxels can have dose values greater than or equal to 20Gy," then we should require that every component of do corresponding to a voxel of that structure to be set to the threshold value 20 (or possibly lower). As the iterations progress, the algorithm will then automatically select voxels where the threshold value of 20 will be exceeded. This way we avoid arbitrarily selecting which "branch" to enter at the outset. 3.4 Equivalence to a Greedy Algorithm We now consider a gradient projection algorithm directly appfied to the problem (9): given uo G !?„, •Ufc+i =Projj,^^(ufc-afcV/(ufc)), fc = 0 , 1 , 2 , . . . .
(12)
This is a steepest-descent type algorithm, or a greedy algorithm. At each step, the movement is based on the local sensitivity information ~ the gradient of f{u). Likewise, we select the initial iterate UQ to be at or below the threshold values of all the dose-volume constraints, ensuring u & T>y. Then the algorithm will automatically increase u (recall that Vf{u) < 0) in proportion to the sensitivity of the objective function at the current iterate. Moreover, the projection Proj-p^ that follows each move will keep the iterate within the feasibihty set Vy. Thus, this algorithm can be considered as a sensitivity-driven greedy algorithm for solving (9). We note that f{u) is monotone in the direction —Wf{u) > 0. Hence the step length selection in (12) seems not as critical as in general situations. We now show that the constant step length a/j s 1 will lead to an algorithm that is equivalent to the alternating projection algorithm (11).
Fluence Map Optimization in IMRT Cancer Treatment Planning
219
Theorem 1. Let {dk} and {uk} be generated by algorithms (11) and (12), respectively, where d]^ = \bj UQ\ and ai- = 1 in (12). Then dk =
bt
/c = l , 2 , 3 , . . . .
(13)
Proof. Let us drop the iteration subscript k. Define x{u) and s{u) as the solutions associated with the subproblem in the right-hand side of (8). By the definitions of the relevant projections, Proj„ (^Proj^ (^ J j j = P r o j „ (^^^(^^ + ,(^) j = (proj^^(yl,J(u) + ^(u)) Therefore, it suffices to show that u —V/(it) = Afix{u) + s{u). By the gradient formula (10) of V / ( u ) , u — Vf{u) = u + max(0, Ahx{u) — u) = max(u, Ahx{u)). So, it remains to show that Ahx{u) + s{u) = max(u, Ahx{u)) for all u E Vy. In the following, we use subscripts to denote components of vectors. If [A/jx(u)]j < Ui, then necessarily the slack variable s{u)i > 0 must satisfy [Ahx{u) + s{u)]i =Ui = max{ui, [Ahx{u)]i). On the other hand, if [i4/ia;(u)]i > m, then necessarily the slack variable s{u)i = 0 and [Ahx{u) + s{u)]i = [Ahx{u)]i = max{ui, [Ahx{u)]i). This completes the proof. D The equivalence between these two algorithms allows us to view the problem geometrically and apply the alternating projection algorithm (11) with the confidence that locally, reasonable choices are being made as to which dose bounds to relax to take advantage of the flexibility in the dose-volume constraints. 3.5 Convergence to Local Minimum Since the prescription set 7i is non-convex, the classic convergence theory for alternating projection algorithm is not directly apphcable. In our limited computational experience, the algorithm has never failed to converge so far. We observe that despite H being non-convex, it is the union of finitely many (simple) convex sets. This "local convexity" of 7i seems to likely allow a modified convergence proof for the alternating projection algorithm in our case, which remains to be a further research topic. In the meantime, if we introduce a minor modification to the alternating projection algorithm, then the convergence of the algorithm to a local minimum will be guaranteed. For simplicity, let us assume that there is only one healthy structure with a single dose-volume constraint that defines the feasibility set !?„:
220
Yin Zhang and Michael Merritt "No more than P-percent of the healthy voxels can receive doses exceeding a given threshold value 7 > 0."
Let us work with the simpler algorithmic form (12). Suppose that at iteration k, the dose vector u^ S T>y is such that (rfcP)-percent of the voxels have already exceeded the threshold value 7 for some r^ G [0>1]- Then we define 2?^ to be the set of dose vectors for the healthy structure that satisfy the following dose-volume constraint: "No more than (1 — Tfc) P-percent of the healthy voxels corresponding to [uk]i < 7 can receive doses exceeding the threshold value 7." In this setting, once a voxel has taken a dose value [uk]i > 7 at some iteration k, it will be allowed to take dose values greater than 7 for all the subsequent iterations. Moreover, once r/c = 1 at some iteration k, then in all subsequent iterations no more dose upper-bounds will be allowed to exceed 7 except those already having been allowed. The modified algorithm wiU take a projection at iteration k onto the set V^ instead of onto Vy; that is, Mfc+i =ProJx,fc(Mfc - V/(wfe)), /c = 0 , 1 , 2 , . . . .
(14)
In essence, this algorithm provides a greedy scheme to select a set of healthy voxels that are allowed to receive higher doses. We now state the following convergence result for this algorithm without a proof. Theorem 2. Let the iteration sequence {uk} be generated by the algorithm (14) withuo < 7. Then {uk} C Vy and satisfies that (i) w^+i > Uk componentwise, (a) f{uk+i) < /(wfc), and (Hi) {uk} converges to a local minimum w* of f{u) in Vy. We emphasize that the proposed algorithms in this paper are designed for quickly finding a good local optimum instead of locating a global optimum. A number of studies [WM02, LDB+03, JWM03] indicate that the existence of multiple local minima due to dose-volume constraints does not appear to notably affect the quality of treatment plans obtained by the weighted least squares approach. A plausible interpretation of this phenomenon is that there exist many easily reachable local minima with function values very close to the global minimum value. Given the presence of various errors in mathematical models (such as dose calculation models) and in data measurements, finding a global optimum for the underlying non-convex optimization problem does not seem necessary nor practically meaningful, as long as a good local minimum is found. Of course, it is still important to carefully assess the quahty of obtained solutions from a clinical viewpoint. Let us examine the dose values calculated at the solution u^. Clearly, w* € Vy and Ax{u^) e K. (corresponding to s = 0). However, in general one should not expect that Ahx{Ui,) G Vy. That is, the locally optimal physical dose calculated by the algorithm generally does not satisfy the dosevolume constraints, because such constraints are not explicitly imposed in
Fluence Map Optimization in IMRT Cancer Treatment Planning
221
our "weightless least-squares" formulation, just as in weighted least-squares formulations. While this lack of a direct control over the dose-volume constraint satisfaction could be viewed as a potential disadvantage on one hand, it does usually allow fast solution times on the other hand. 3.6 Preliminary Numerical Results In this section, we demonstrate the potential of our algorithm on some twodimensional phantom cases. The region of interest is a 101 x 101 voxel cross section of a simulated treatment area. Each of the three test cases have different geometries for the "tumor" and "critical organ", or organ at risk (OAR). The simplest involves a C-shaped tumor that has grown around a small OAR. More challenging is a small OAR completely surrounded by an "O"-shaped tumor. In the third case, we add further comphcation to the "O" configuration by having the OAR also include a rectangular region just outside the tumor. The geometries of these cases, as outhned in the top-side pictures of Figures 46, are nontrivial and, in our view, sufficient for prehminary proof-of-principle studies. In all the test cases, we specify the prescribed target dose to be 80Gy for all the tumors, and consider the dose-volume constraint: "at most 30% of the critical organ voxels can have doses greater than 25Gy." We label as "normal" all the tissue which is neither the tumor nor the organ at risk. Although not as concerning as injury to the critical organ, we would always hke to prevent this normal tissue from receiving too high a dose. Therefore, we also specify an upper bound of 75Gy for the normal tissue, equivalent to a dose-volume constraint: "0% of the normal tissue can have doses greater than 75Gy." Additionally, each plan uses 9 coplanar beams with dose absorption governed by an influence matrix (i.e., A in (2)) that we have obtained from The University of Wisconsin-Madison Tomotherapy Research Group. We implemented the algorithm (14) in Matlab. To perform the minimization in (8) (projection onto /C) at each iteration, we used an interior-point scaled gradient algorithm [MZ04]. In anticipation that least squares solutions will allow calculated doses to vary both above and below their desired values, to be on the safer side we adjust bt to be 5% higher than the desired tumor dose SOGy, and similarly the OAR threshold value to be 15% lower than the desired 25Gy. We stop the algorithm once the relative change from u^ to u^+i becomes less than 1%. In our experiments, we have observed that the algorithm took very few (usually two) iterations to terminate in all the tested cases. Our computational results are presented in Figures 4-6 corresponding to the three test cases. In each figure, we have included a dose distribution on the left, and a dose-volume histogram (DVH) on the right. The dose distribution indicates the level of calculated radiation intensity (in gray scale) deposited in the region of interest. As can be seen, the calculated doses are well focused on the tumors while more or less sparing the critical organs. The dose-volume
222
Yin Zhang and Michael Merritt
histograms show the relationship between a given dose value (in x-axis) and the volume percentage (in y-axis) of an anatomical structure receiving that level of radiation or higher. For instance, in Figure 4 or 5 the point (40,O.l) on the "Normal" curve means that 10% of the normal tissue has received a radiation 40Gy or higher.
Fig. 4. C-Shape Dose Distribution and DVH
We have previously performed computational experiments on the same set of phantom cases with a weighted least squares (WLS) approach and a successive linear programming (SLP) approach [MZL+04]. Given these experiences, drawing some comparison would be useful. The weighted least squares approach requires trying out multiple sets of weights and solving a non-negative least squares problem for each set of weights. As such, it generally requires considerably more computation than the new approach. In [MZL+04], we used an exhaustive search, designed only for cases with one critical organ and one tumor, to find a set of optimal
Fluence Map Optimization in IMRT Cancer Treatment Planning
223
Fig. 5. 0-Shape Dose Distribution and DVIl
weights. With such optimal weights, the WLS approach produced solutions of a quality similar to that of the new approach. The SLP approach enforces the exact satisfaction of the dose-volume constraints and solves a sequence of linear programs. It obtained slightly better quality solutioris than the new approach, but required far more computation than the new approach. In addition, the beamlet intensity distributions generated by the SLP approach are generally less smooth, creating difficulties for the later leaf-sequencing stage. For more details on the WLS and SLP approaches, we refer interested readers to [MZL+04]. These preliminary numerical results, as encouraging as they may appear, constitute only a first step towards validating the viability of the proposed approach.
224
Yin Zhang and Michael Merritt
Fig. 6. OA-Shape Dose Distribution and DVH
4 Final Remarks The IMRT fluence map optimization problem arises, along with a few other optimization problems, from the state-of-the-art technologies of radiation therapy for cancer treatment. The problem has been extensively studied by medical physicists, and has also attracted considerable on-going research from the operations research and optimization communities. Currently, the predominant methodology in practice is the "classic" weighted least squares (WLS) approach, which focuses on determining an optimal beamlet intensity vector. In this paper, we take a different view to treat dose distributions as the primary variables, resulting in a formulation based on the geometry in "dose space." It is our purpose to retain the popular "least squares" framework, while doing away with the burden of having to select weights in the classic WLS approach. The proposed formulation is free of weights, prescriptiondriven, sensitivity guided, and still shares basic characteristics of a leastsquares approach such as not having a precise control over the dose-volume
Fluence Map Optimization in IMRT Cancer Treatment Planning
225
constraint satisfaction and, at the same time, being much less computationally demanding. It is designed for quickly finding a good locally optimal plan associated with a given prescription. Prehminary computational results indicate t h a t the approach is potentially capable of producing solutions of a quality at least comparable to t h a t obtainable by the classic WLS approach. Encouraged by these proof-of-principle results, we are currently working towards more reahstic testings on three-dimensional chnical cases. T h e approach presented in this paper is only one of many on-going research efforts in helping optimize IMRT cancer t r e a t m e n t planning. It is hopeful t h a t an active participation of the operations research and optimization communities in this important application field will bring about an advancement to cancer t r e a t m e n t planning.
Acknowledgment T h e first author would like to thank his colleagues in the Optimization Collaborative Working Group, sponsored by the National Cancer Institute and the National Science Foundation, and Dr. Helen Liu of M. D. Anderson Cancer Center, from whom he has learned the basics about cancer radiation therapy.
References [BKH+68] G. K. Bahr, J. G. Kereiakes, H, Horwitz, R. Finney, J. Galvin, and K. Goode. "The method of linear programming applied to radiation treatment planning." Radiology, 91:686-693, 1968. [BB93] H. Bauschke and J. Borwein. "On the Convergence of von Neumann's Alternating Projection Algorithm for Two Sets," Set-Valued Analysis, 1: pp. 185-212 (1993). [Bra99] A. Brahme. "Optimized radiation therapy based on radiobiological objectives," Sem. in Rad. Oncol, Vol. 9, No. 1: pp. 35-47, (1999). [Cen03] Y. Censor. "Mathematical optimization for the inverse problem of intensity modulated radiation therapy." In: J.R. Palta and T.R. Mackie (Editors), Intensity-Modulated Radiation Therapy: The State of The Art. American Association of Physicists in Medicine, Medical Physics Monograph No. 29, Medical Physics Publishing, Madison, Wisconsin, USA, 2003, pp. 25-49. [CG59] W. Cheney and A. Goldstein. "Proximity maps for convex sets." Proceedings of the AMS, Vol. 10: pp. 448-450 (1959). [CLM+98] Cho PS, Lee S, Marks RJ II, Oh S, Sutlief SG and Phillips MH. "Optimization of intensity modulated beams with volume constraints using two methods: cost function minimization and projections onto convex sets". Medical Physics 25:435-443, 1998. [CLKBOl] C. Cotrutz, M. Lahanas, C. Kappas,and D. Baltas. "A multiobjective gradient-bcised dose optimization algorithm for external beam conformal radiotherapy," Phys. Med. Biol. 46: pp. 2161-2175 (2001).
226 [H0IO3]
Yin Zhang and Michael Merritt
A. Holder. "Designing Radiotherapy Plans with Elastic Constraints and Interior Point Methods," 2003, Health Care and Management Science, vol. 6, num. 1, pages 5-16. [JWM03] R. Jeraj, C. Wu and T. R Mackie. "Optimizer convergence and local minima errors and their clinical importance." Phys. Med. Biol. 48 (2003) 28092827. [LSB03] M. Lahanas, M. Schreibmann, and D. Baltas. "Multiobjective inverse planning for intensity modulated radiotherapy with constraintfree gradient-baised optimization algorithms," Phys. Med. Biol., 48(17): pp. 2843-71 (2003). [LMRW91] R. G. Lane, S. M, Morrill, I. I. Rosen, and J. A.Wong. "Dose volume considerations with linear programming optimization." Med. Phys., 18(6):1201-1210, 1991. [Lar97] E. W. Larsen. "Tutorial: The Nature of Transport Calculations Used in Radiation Oncology," Transport Theory Statist. Phys., 26: pp. 739 (1997). [LFC03] E. Lee, T. Fox, and L Crocker. "Integer Programming Applied to Intensity-Modulated Radiation Treatment Planning Optimization," Annals of Operations Research, Optimization in Medicine, 119: pp. 165-181 (2003). [LDB+03] J. Llacer, J. Deasy, T. Bortfeld, T. Solberg and C. Promberger. "Absence of multiple local minima effects in intensity modulated optimization with dose-volume constraints." Phys. Med. Biol. 48 (2003) 183210. [MZL+04] M. Merritt, Y. Zhang, Helen Liu, Xiaodong Zhang, Xiaochun Wang, Lei Dong, Radhe Mohan. "A successive linear programming approach to the IMRT fluence map optimization problem." Manuscript, 2004. [MZ04] M. Merritt and Y. Zhang. "An Interior-Point Gradient Method for LargeScale Totally Nonnegative Least Squares Problems." To appear in JOTA, Vol. 126, No. 1 (2005), pp. 191-202. [PM03] J. R. Palta and T. R. Mackie, eds. "Intensity-Modulated Radiation Therapy: The State of the Art," Medical Physics Publishing, 2003. [RLPT04] R. Rardin, M. Langer, F. Preciado-Walters, V. Thai. "A coupled column generation, mixed integer approaches to optimal planning of intensitymodulated Radiation therapy for cancer." To appear in Mathematical Programming, 2004. [HADL03] H. Romeijn, R. Ahuja, J. Dempsey, A. Kumar and J. Li. "A novel linear programming approach to fluence map optimization for intensity modulated radiation therapy treatment planning," Phys. Med. Biol., Vol. 48: pp. 3521-3542 (2003) [SFOM99] D. Shepard, M. Ferris, G. Olivera, and T. Mackie. "Optimizing the delivery of radiation therapy to cancer patients," SIAM Review, 41: pp. 721744 (1999). [SC98] S. V. Spirou and C. Chui. "A gradient inverse planning algorithm with dose-volume constraints," Med. Phys., 25(3): pp. 321-333 (1998). [Neu50] J. von Neumann. "The geometry of orthogonal spaces, Functional operators - vol. II." Annals of Math. Studies, no. 22, Princeton University Press, 1950. (This is a reprint of mimeographed lecture notes, first distributed in 1933.)
Fluence Map Optimization in IMRT Cancer Treatment Planning [WMOO]
227
Q. Wu and R. Mohan. "Algorithms and functionality of an intensity modulated radiotherapy optimization system," Med. Phys., 27(4): pp. 701-711 (2000). [WJLM04] C. Wu, R. Jeraj, W. Lu and T. Mackie. "Fast treatment plan modification with an over-relaxed Cimmino algorithm." Med. Phys. 31:191-200 (2004). [WM02] Q. Wu and R. Mohan. "Multiple local minima in IMRT optimization based on dose-volume criteria." Med. Phys. 29 (2002) 151427. [WDL+03] Q. Wu, D. Djajaputra, M. Lauterbach, Y. Wu, and R. Mohan. "A fast dose calculation method based on table lookup for IMRT optimization," Phys. Med. Biol, 48(12): pp. N159-N166 (2003). [XMCG04] Y. Xiao, D. Michalski, Y. Censor and J. Calvin. "Inherent smoothness of intensity patterns for intensity modulated radiation therapy generated by simultaneous projection algorithms." Phys. Med. Biol. 49 (2004) 32273245. [XLD+99] L. Xing, J. G. Li, S. Donaldson,Q. T. Le,and A. L. Boyer. "Optimization of importance factors in inverse planning," Phys. Med. Biol., 44(10): pp. 2525-2536 (1999).
Panoramic Image Processing using Non-Commutative Harmonic Analysis Part I: Investigation Amal Aafif^ and Robert Boyer^ ^ Department of Mathematics Drexel University, Philadelphia, PA 19104, USA. amalQdrexel.edu ^ Department of Mathematics Drexel University, Philadelphia, PA 19104, USA. rboyerSmcs.drexel.edu
Summary. Automated surveillance, navigation and other applications in computational vision have prompted the need for omnidirectional imaging devices and processing. Omnidirectional vision involves capturing and interpreting full 360° panoramic images using rotating cameras, multiple cameras or cameras coupled with mirrors. Due to the enlarged field of view and the type of sensors required, typical techniques in image analysis generally fail to provide sufficient results for feature extraction and identification. A non-commutative harmonic analysis approach takes advantage of the Fourier transform properties of certain groups. Past work in representation theory already provides the theoretical background to analyze 2-D images though extensive numerical work for applications is limited. We will investigate the implementation and computation of the Fourier transform over groups, such as the motion group. The Eucfidean motion group SE{2) is a solvable Lie group that requires a 2-D polar F F T and has symmetry properties that could be used as a tool in processing panoramic images.
1 Introduction Applications in computer vision have expanded as larger and larger images can be stored, processed and analyzed quickly and efficiently. Autonomous robot navigation, automated surveillance and medical imaging all benefit from expanded fields of view with minimal distortion. Rotating cameras, multiple cameras and mirror-camera systems are being developed to capture 360° panoramic images. An image taken from a camera aimed at a spherical mirror, for example, will give a very distorted view of the surroundings. While the image can be "unwrapped" through a cyhndrical transformation to create a panoramic image, the objects will appear distorted and blurry, making feature identification difficult. Several camera-mirror systems with various mirror shapes have been proposed to minimize distortion and eliminate the need for
230
Amal Aafif and Robert Boyer
pre-processing. However, efficient methods for feature identification and template matching for such images are still needed for automated image analysis. Conventional approaches to pattern recognition include moment invariants and Fourier descriptors however they are generally not suited for omnidirectional images. A group theoretical approach combined with Fourier analysis takes advantage of the symmetry properties of the Fourier transform over certain groups, particularly matrix Lie groups. By defining the general Fourier transform over a group, one can find invariants and descriptors similar to those mentioned above. This paper presents a preliminary investigation into a non-commutative harmonic analysis approach proposed in [GBS91] and [Fon96] and its possible application to panoramic images. Sections 2 and 3 provide the necessary background in Lie group theory and representation theory applied to the group of motions on the plane. Section 4 describes the motion descriptors and the invariants to be extracted from 2-D black & white and gray-scale images. The last two sections illustrate the results and effectiveness of the invariants as a tool for identifying objects under rotation, translation and scahng.
2 Basics of m a t r i x Lie groups and Representations An important class of groups is the matrix Lie groups over the real and complex numbers. The general linear group GL{n;R) or GL{n,C) over the real or complex numbers is the group of all n x n invertible matrices with real or complex entries. Regular noncommutative matrix multiplication is the group operation. A matrix Lie group is any subgroup G of GL{n,C) with the property that if any sequence of matrices in G converges to some matrix A, then either ^ G G or >1 is not invertible. This property holds if and only if a matrix Lie group is a closed subset of GL{n,C). Several matrix Lie groups wiU be taken into consideration in this paper. The set of all n x n orthogonal (ie. A^ = A~^) matrices with determinant 1 is the special orthogonal group SO{n), where both orthogonahty and having determinant 1 are preserved under hmits. SO{n) is then the matrix Lie group of rotations. Orthogonal groups generalized to complex entries are also matrix Lie groups. The set of all unitary matrices also form a subgroup of GL{n; C). A matrix is unitary if A* = A~^ where A* is the adjoint or conjugate-transpose of A. Unitary matrices have orthonormal column vectors and preserve the inner product. The Euchdean group E{n) is the group of aU one-to-one distance preserving maps from M" to itself. Rotations and translations are described by this group; so the orthogonal group is a subgroup of the Euclidean group. For X G M", the translation by x is defined by Tx{y) = x + y. Every element
Panoramic Image Processing T of E{n) can be expressed uniquely lation Tx where R G 0{n): T = T^R. {Ri,xi){R2,X2) = {RiR2,xi + -R1X2). is not a subgroup of the general linear matrices of the form:
231
as a rotation R followed by a transThe group operation is described by Since translations are not linear E{n) group but is isomorphic to a group of ^i\
/
R
X2
Voo... 1 / E{n) is not connected, ie. for A,B G E{n), we cannot always find a continuous path lying in the group from A to B. However, each non-connected matrix Lie group can be decomposed into connected components. If the component of G contains the identity then it is also a subgroup of G. Other important groups such as the positive reals under multiplication are usually not thought of as matrix groups; however they are isomorphic to matrix Lie groups. M* and C* are the groups of non-zero real numbers and complex numbers under multiplication that are isomorphic to the general linear group. S^ is a commutative group of complex numbers with modulus 1, ie. the unit circle. The orthogonal groups over M. and S^ satisfy the conditions for compactness: 1. If Am is any sequence of matrices in a matrix Lie group that converges to matrix A, then A is in the group. 2. There exists a constant C such that for aU A G G, \Aij\ < C for all The orthogonal groups over C, the Euchdean groups, M* and C* all violate property 2 and are therefore not compact. Before defining the Fourier transform over groups, some background in representation theory is required. If G and H are matrix Lie groups, we can define a continuous map 0 : G —> H caUed a Lie group homomorphism that satisfies 0{gh) = (!>{g)
GL{V)
where V is finite-dimensional real or complex vector space and 77 is a Lie group homomorphism. (If the homomorphism is one-to-one then the representation is faithful). A representation can be considered as a linear group action: for every g G G there is a corresponding operator n{g) that acts on a vector space V. A subspace W^ of F is invariant if n{A)w G W for ali w G W and all A G G. W is a trivial subspace ii W = V or W ^ $• An irreducible representation of a Lie group is a representation with no non-trivial invariant subspaces. If G is a matrix Lie group and ?i is a Hilbert space, then we can
232
Amal Aafif and Robert Boyer
define a homomorphism 77 from the G to tfie group of unitary operators on 7i, U{H). 77 is a unitary representation of G if strong continuity is satisfied: If An G G converges to A & G, then 77(A„)a; converges to IT{A)x for all x G 7i. Representation theory can be extended to more general infinite dimensional vector spaces, and representations of several types of Lie groups have been studied extensively [Mac89]. The purpose of using representations is to express the group in question as a group of matrices. Though all of the abovementioned groups are already groups of matrices, their representations further describe group structures and symmetries and simphfy calculations that arise.
3 Representations of SE{2) The Euclidean motion group SE{n), a connected component of E{n), is the semidirect product of K" with SO{n) and describes the motions on the Euclidean plane. In particular, SE{2) is a solvable, non-compact, noncommutative matrix Lie group that is given by 5^ Xs R^ (a closed subgroup and an abehan normal subgroup). An element of 575(2) is {d,x,y) where {x,y) G R^ and 6 G S^. The group operation is given by {Oi,Xi,yi){d2,X2,y2)
= (6'i-h6'2,a;2Cos6li-Fy2sin6li, -X2sin6'i+y2Cos6'i+j/i)
We would like to be able to take the Fourier transform over this group. However we cannot use the usual definition of the Fourier transform because it is defined on K, an additive abelian group where {e"*''*^}^ where A e M is the set of unitary irreducible representations of R, or the dual of R. In order to obtain an analogous definition of the Fourier transform on SE{2), we need to look its dual space. Since each motion group is non-abehan, the dual space (or the set of unitary irreducible representations) is not easily obtained. However, we can take advantage of the semidirect product subgroups and consider harmonic analysis on locally compact unimodular groups. A left Haar measure jj, on a matrix Lie group is locally finite and left translation invariant (ie. l^{gE) = n{E) for all 5 G G and all Borel sets E C G). A right Haar measure can be defined similarly and if the two measures coincide, then the group is unimodular. The left Haar measure is finite if and only if G is compact. Let A = {{V\,
D{f{X)) = f fig)4'x{9~')dK9)
(1)
For a unimodular locally compact group G, A is the set of unitary irreducible representations and defined as the dual of G. Equation (1) is the corresponding
Panoramic Image Processing
233
Fourier transform over a group G; it is an isometry from the Hilbert space LP'{G,ii) to the Hilbert space Ti of /(A), where A is the Plancherel measure. The inner product is defined as
< /(A),ff(A) >n= j tr[f{\)r{\)]dX A
where tr is the trace operator and * is the conjugate transpose. The inverse Fourier transform is I{9) = jtr[f{X)x{g)]dX A
For SE{2), Mackey's imprimitivity theorem is used to compute the representation as the union of two subsets SE{2)'
= {L'^{S\de),(^A, A G R+} U {C,4>l, A G Z}
(2)
The second representation, 0^, which will not be used here, maps an element of SE{2) into operators acting on C
where z £ C. The first representation, (p\ maps an element {9,x,y) into operators that act on L'^{S^,d9), the set of square summable functions on the unit circle with Lebesgue measure d9:
where z & S^ and F G L'^. It is typically more common to express the elements of the group and unitary representation in polar coordinates so that g G SE{2) and (f)\ are functions of 9,p and co. The unitary representation of SE{2) can generally be expressed as a matrix; let Um,n{9{9, p, w), A) be a matrix element of the representation 4>\{9, p,u!) then Um,n{9(.9,p,u;),X) = i"—e-^("^+('"-")-)j^_„(pA)
(3)
where Jk{x) is the fcth Bessel function. The matrix unitary representation of SE{2) allows us to see the group symmetry properties more clearly [CKOO]: Wm,n(ff,A) =
(-l)'"-"u__„,_„(5,A)
Um,n{9,X) = '«m!n(5>A) = U m , n ( 5 ~ \ A ) Um,n{9{9,-p,U!),X)
UrrUg{9,p,uj),X)
=
{-l)'"-"u„i,n{9{9,P,l^j),X)
= {-ir-"u^A9{-~9,p,uj^9),X)
(4)
where Um,n is the conjugate of Um,n- Since SE{2) is neither compact nor commutative, the matrix representation is infinite dimensional. However, the representation is still irreducible.
234
Amal Aafif and Robert Boyer
With the representation established, we can go back to defining the Fourier transform on SE{2) using the representation (f)\{9,x^y). The Fourier transform of a function / on L^{SE{2)) n L^{SE{2)) is /(A)=
I
f{9,x,y)
(5)
SE{2)
The Plancherel measure is X/An'^dX and the inner product is defined by
< /(A),ff(A) >H= I tr[fiXWiX)]{-^)dX
(6)
A>0
where H is the space of functions on SE{2)''. The inverse Fourier transform on SE{2) recovers / f{e,x,y)
= J tr[f{X)MO,x,y)]{^)dX
(7)
A>0
It is possible to define the inverse transform because (px is a complete set of irreducible representations. In addition, the fact that the representation is unitary avoids the problem of calculating the inverse of an infinite matrix
4 Motion Descriptors Prom the previous section, we see that the Fourier transform over SE{2) and other groups can be defined. In addition, the transform on K as a homogeneous space for SE{2) has also been defined. The Motion Transform on SE(2) of / G L^(]R), proposed in [GBS91J, is expressed in terms of the basis e'"^ where n e Z M ( / ) „ = 27re*"-/2 J f{p,u)e'^^M~Xp)pdpdu, (8) Rotations and translations of / are equivalent to right multiplying the motion transform by a unitary matrix (the representation of a particular group element). From the motion transform, Gauthier et al. define motion descriptors similar to Fourier descriptors oo
MD(/(A)= Y. M(/)„M(/)„ n=—oo
They are invariant with respect to motions on the plane. Each descriptor value is associated with a unitary irreducible representation of SE(2). Furthermore,
Panoramic Image Processing
235
we can directly compute the motion descriptors from the usual Fourier transform of / in polar coordinates:
MD{f)ix) = j f{\,e)fi\,e)dd
(9)
0
For a > 1, the invariants of order a are 27r
\fi\,e)\"de
(10)
A 2-D image is treated as the homogeneous space with the group acting on it (ie. G is Lie group and for any x, y in the image, there is a. g € G such that y = gx). For any image / G L^(]R^), we can compute the Fourier transform and its motion invariants. It is known that these invariants are not guaranteed to distinguish between any given images since they are not a complete set of invariants [GBS91]. In the foUowing section, we will apply equation (10) to a set of simple black & white images and similar gray-scale images.
5 Results Figure 1 contains the images for which the motion invariants were calculated. The invariants are calculated over a set of radii in the image and used to determine how "far apart" or different two images are. Figures 2-7 illustrate the results of the invariant difference comparison. In Figures 2 and 3, we can see that the motion descriptor is invariant to rotations. This was not the case for translations on the plane as shown in figures 4 and 5: two different images were perceived to be more "alike" than the same image and its translate. Figure 5 compares images with some gray added. We compare 2 grayscale images that differ by a rotation with the same grayscale image and a similar black & white image. The graph seems inconclusive in determining which set is more similar. However this is not necessarily a poor feature to have in terms of pattern recognition because we would like our method to still recognize a template that has undergone a small change in color or shape (due to image deformation, for example). Figure 6 displays very poor results for scaling. We compare an image with itself scaled by a factor of 1/4; the invariant is not suitable for identification here. Invariants for more realistic gray-scale images, such as figures 8 and 9, were also calculated but the approach failed to give any acceptable results.
236
Amal Aafif and Robert Boyer
Fig. 1. Set of Images; from top left to bottom right, they will be referred to as
circles 1, 2, 3, lg, 2g, 3g, lgsm
Fig. 2. Invariants comparing circles 1 and 2 with circles 1 and 3
6 Conclusion The motion descriptor invariants, though easily calculated, give relatively poor results outside of simple black & white images under rotation. Some scaling factors may be chosen to define appropriate invariants under translation and scaling as well. More theoretically, it will be necessary to develop a complete set of invariants for all motions on the plane. We believe this approach would be more suitable for omnidirectional images. Future work will consider noncommutative techniques for analyzing objects in such images. It would also
Panoramic Image Processing
237
Fig. 3. Invariants comparing circles 1 and 2 with circles 1 and 4
Fig. 4. Invariants comparing circle 3 to a right translated version of itself with circle 3 and circle 1
Fig. 5. Invariants comparing circle 3 to a right translated version of itself and circles 3 and 2 be interesting to perform a similar analysis with other Lie groups that have interesting properties; the theoretical background to work with other groups already exists (see [Mac89]) but extensive numerical work in engineering and computer vision applications has yet to be explored.
238
Amal Aafif and Robert Boyer I
!,
- Ele-C2n/
- - l2II.CII
Fig. 6. Invariants comparing circles l g and 2g with circles lg and 3
Fig. 7. Invariants comparing circles lg and lgsm with circles lg and 1
Fig. 8. Realistic gray-scale image 1
Furthermore, the 2-D polar FFT over S E ( 2 )still needs to be implemented. In this work, we calculated the standard discrete 2-D Fourier Transform of the image and then changed t o polar coordinates. Averbuck1 et al. recently proposed a new method for directly calculating an accurate polar F F T [Ave03] by initially choosing non-uniformly spaced points on concentric squares on
Panoramic Image Processing
239
Fig. 9. Realistic gray-scale image 2
t h e image and then correcting t o obtain points on concentric circles on t h e image. T h e implementation of this method could improve t h e accuracy of t h e invariants along each radius.
References [AveOS] A. Averbuch et al. (2003), "Accurate and Fast Discrete Polar Fourier Transform" (preprint). [CKOO] G.S. Chirikjian and A.B. Kyatkin (2000), Engineering Applications in Noncommutative Harmonic Analysis: With Emphasis on Rotation and Motion Groups, CRC Press Online. [Fon96] H. Fonga (1996), "Pattern recognition in grey-level images by Fourier analysis" Pattern Recognition Letters, Vol. 17, 1477-1489. [GBS91] J.-P. Gauthier, G. Bornard, and M. Silbermann (1991), "Motions and Pattern Analysis: Harmonic Analysis on Motion Groups and Their Homogeneous Spaces" IEEE Tk-ansactions on Systems, Man, and Cybernetics, Vol. 21, NO. 1, 159-172. [Ha1031 B. Hall (2003), Lie Groups, Lie Algebras, and Representations: An Elementary Introduction, Springer-Verlag, New York, New York. [Mac891 G.W. Mackey (1989)) Unitary Group Representations in Physics, Probability and Number Theory, Addison-Wesley Publishing, Redwood, CA.
Generating Geometric Models through Self-Organizing Maps Jung-ha An^, Yunmei C h e n \ Myron N. Chang^, David W i l s o n \ and Edward Geiser'^ ^ Department of Mathematics, University of Florida, Gainesville, FL 32611. {j ungha,yun,dcw}@math.uf1.edu ^ Department of Biostatistics, University of Florida, Gainesville, FL 32611. mchangOcog.uf1.edu ^ Department of Medicine, University of Florida, Gainesville, FL 32611. geiseeaQmedicine.uf1.edu
Summary. A new algorithm for generating shape models using a Self-Organizing map (SOM) is presented. The aim of the model is to develop an approach for shape representation and classification to detect differences in the shape of anatomical structures. The Self-Organizing map requires specification of the number of clusters in advance, but does not depend upon the choice of an initial contour. This technique has the advantage of generating shape representation of each cluster and classifying given contours simultaneously. To measure the closeness between two contours, the area difference method is used. The Self-Organizing map is combined with the area difference method and is applied to human heart cardiac borders. The experimental results show the effectiveness of the algorithm in generating shape representation and classification of given various human heart cardiac borders. K e y w o r d s : Geometric Models, Average, Clustering, Self-Organizing Maps
1 Introduction In computer vision, shape analysis has become one of the most critical areas and has a wide range of applications including pattern recognition, object recognition, and medical image analysis. Due to more advanced algorithmic models t h a t better encapsulate and analyze images, shape analysis is now a major modern focus of many researchers who then a t t e m p t to utilize mathematics, statistics, and many other computer aided techniques to further theoretical developments. Much work is devoted to overcoming imaging difficulties which include significant signal loss, noise, and non-uniformity of regional intensities. Therefore, prior information is often necessary to resolve these types of problems, which occur in diagnostic images. Shape representation has a significant application to image segmentation and registration
242
Jung-ha An et al.
problems. Shape representation of the given contours can be used as a prior information in [LGFOO, LFGOO, CTT02, CGH03, CHW03, CHT03, PRR02, PR02, SY02, YD03] to segment and register given images more efficiently. Central to the shape analysis problem is the notion of a random shape. Prechet [Pra61] initiated developments of this area for analyzing the random shapes, i.e. curves. Matheron [Mat75] and David Kendall [Ken73, Ken84, Ken89] and his colleagues continued after him for mostly further theoretical developments of the analysis of random shapes. A theory and practice for the statistical analysis for the shapes has been developed by Bookstein [B0086], Dryden and Mardia [DM98], Carne [Car90], Kent and Mardia [KMOl], Cootes Taylor and colleagues [CTC95]. Their ideas used in the statistical analysis are tied to the study of the point-wise representation of the shapes: objects are represented by a finite number of salient points or landmarks which differ from the previous theoretical developments by Prechet [Fra61], Matheron [Mat75], and David Kendall [Ken73, Ken84, Ken89]. The model in this paper follows the ideas similarly from Bookstein [B0086], Dryden and Mardia [DM98], Carne [Car90], Kent and Mardia [KMOl], and Cootes Taylor and colleagues [CTC95] by studying the point-wise representation of the shapes using statistical methods. The goal of this paper is to generate the shape representation and classification of the given contours using the Self-Organizing map combined with the area difference method. For basic pattern recognition, the Self-Organizing Map(SOM) which was developed by Professor Teuvo Kohonen in the early 1980s, is widely used and can be visualized as a sheet-like neural-network array, the cells (or nodes) of which become specifically turned to various input signal patterns or classes of patterns in an orderly fashion. The learning process is competitive and unsupervised, meaning that no teacher is needed to define the correct output(or actually the cell into which the input is mapped) for an input. In the basic version, only one map node(winner) at a time is activated corresponding to each input [Hon]. The main applications of the SOM are thus in the visualization of complex data in a two-dimensional display, and creation of abstractions like in many clustering techniques [KohOl]. In order to get the good alignments and distance between two contours, the measurements between contours are important. In this paper, area difference distance methods from Chen [CWHOl] is adopted for the shape alignments and the distance measurements. Area difference between two contours is minimized to optimize the alignments of the given two contours in Chen [CWHOl]. The paper is organized as follows: In section 2, the existing methods are briefly reviewed. The area difference method and the Self-Organizing maps are explained. The model is suggested in section 3. To generate the shape representation and classification of the given contours, the Self-Organizing map is combined with the area difference methods in the model. In section 4, the numerical results of the model with an application to the human heart cardiac borders are shown. Finally, the conclusion and the future work are stated in the section 5.
Generating Geometric Models through Self-Organizing Maps
243
2 Review of t h e Existing M e t h o d s In this section, the existing methods which are the area difference method and the Self-Organizing map are briefly reviewed. The area difference method is used to get the shape alignments and to measure the distance between two arbitrary contours. To generate the shape representation and classification, the Self-Organizing map is used. 2.1 Description of the Area Difference Distance for the shape alignments In this paper, the area difference distance from [CWHOl] is used to optimize the distance and to get the best alignments between the two given contours. In [CWHOl], area difference between two contours is minimized to optimize the alignments of the two given contours. The following is the brief review of the area difference method developed in [CWHOl]: The notion of shape in our model is assumed independent of translation, rotation, and scaling. Any two contours A and B will be regarded as having the same shape, if there exists a scale jj,, a rotation matrix R with respect to an angle 9, and a translation T such that A coincides with fiRB + T. Definition 1. Let A and B be two contours. Then area(A, B) = the area o{{AUB~AnB). Definition 2. The area distance between two contours A and B is defined as the minimum area between two contours after rotation, transfer, and scaUng. AD{A,B) = mm {area{nRA + T,B)) (2.1) a —b b a Theorem 1. Let A = (uij), B = (bij) and [b.i+1,2 bi,2) + {bi+i,i - ^<,i) > where i,j = 1
In this subsection, we will denote /ii? ~ R =
mm(area(RA
n. Then
T,B))
R,T
a —b , T b a CI 0 C2 C3 0 CI - C 3 C 2 C 2 - C 3 C4 0 * C3 C2 0 C4
achieves at R •
CI
HI i=l
=
'h
, wl
.*2.
a b
h t2
Bl B2 — B2, S4
4-0^2), C2 = ^ C i - a a ,
C3 = ^ C i - a i 2 ,
^74 =
^ ^ii
244
Jung-ha An et al. n
n
S I = ^ C j • {biiau +bi2ai2),
5 2 = ^ C j • (6j2aii - &aai2),
1=1
i=l n
B3 = ^Ci-bii, i=\
n
B4: =
Y^Ci'b,2i=\
Here, area{RA + T, B) was approximated by n
y ^ Ci((o • aa - 6 • ai2 +
2.2 Description of the Self-Organizing Map In this section, the Self-Organizing map is described. The following is the brief description of the Self-Organizing map from [Hon, KohOl]: The set of input samples is described by a real vector x(<) G i?", where t is the index of the sample, or the discrete-time coordinate. Each node i in the map contains a model vector mj(t) G i?", which has the same number of elements as the input vector x(t). The stochastic SOM algorithm performs a regression process. Thereby, the initial values of the components of the model vector, mj(t), may even be selected at random. Any input item is thought to be mapped into the location, the mj(i) of which matches best with x(t) in some metric. The self-organizing algorithm creates the ordered mapping as a repetition of the following basic tasks: First, compare an input vector x(i) with all the model vectors mj(t). The best-matching unit (node) on the map, i.e., the node where the model vector is most similar to the input vector in some metric (e.g. Euclidean) is identified. This best matching unit is often called the winner. Then, the model vectors of the winner and a number of its neighboring nodes in the array are changed towards the input vector according to the learning principle specified below. Adaptation of the model vectors in the learning process may take place according to the following equations: mi(i + 1) = mi(i) + a ( 0 [ x ( 0 - nii{t)],if xni{t + 1) = nij(i), otherwise,
i e N,{t)
where t is the discrete-time index of the variables, the factor a{t) G [0,1] is a scalar that defines the relative size of the learning step, and Nc{t) specifies the neighborhood around the winner in the map array. At the beginning of the learning process the radius of the neighborhood is fairly large, but it is made to shrink during learning. This ensures that the global order is obtained already at the beginning, whereas towards the end, as the radius gets smaller, the local corrections of the model vectors in the map will be more specific. The factor a{t) also decreases during learning.
Generating Geometric Models through Self-Organizing Maps
245
3 Description of t h e Suggested Model The purpose of the paper is to generate the shape representation and classification of the given contours using the Self-Organizing map combined with the area difference method. In this section, the suggested model is explained. The Self-Organizing map is combined with the area difference distance method [CWHOl] to get shape representation and classification of the given various human heart cardiac borders. The following is the description of the proposed model: STEP 1. First, normalize all the n training contours Q . If the n training contours Cj, i = l , . . . , n to be grouped into k clusters, take k arbitrary contours as the initial contours, denoted by mj(0) {j = 1, ...,k). STEP 2. At {t + 1) iteration, randomly select a contour denoted by X{t + l) from the training set, and compare the disparity in shape between X{t + 1) and each of rrijit) {j = 1,..., fc). STEP 3. To do this comparison, align X{t + 1) to each mj[t), and denote the aligned X{t + 1) by Xj{t + 1) = iijRjX{t + l)+Ty STEP 4. Then we compute Aj =: AD{Xj{t + l),mj) defined in (2,1). STEP 5. Suppose Ai is the smallest number in Aj, j = 1, ...,k. Keep m2{t) , ... , mk{t) unchanged, and update TOI(<) by mi{t + 1) = mi{t) + a{t)[Xi{t + 1) -
mi{t)],
where a{t) is a smooth function of t, and decreases to zero as i —» oo. Then normalize and re-parameterize mi{t). STEP 6. After a large number, say A^^, of iterations, k average shapes "^i(-^) (j — l,...,fc) are generated. STEP 7. Then k clusters are formed by the contours that are closest to the average shapes. The closeness is again measured by the measurement in (2.1).
4 Numerical Results When the heart is imaged from the parasternal short-axis view, it has a simple geometric shape that can be reasonably modelled by a continuous contour which can then be used in echocardiographic image analysis [CWHOl]. The suggested model is applied to human heart ultra sound cardiac borders. a{t) = 0.9(1 — t/N) is used in the experiment, where t is the time step and N is total number of iterations which is a smooth function and decreases to zero as t is getting very large. To calculate, first, all of contours should be read, then move the centers of those contours to origin. After that, same number of points on each contour should be chosen by same angle. Then the suggested algorithm which is described in Section 3 is applied to the patients hearts cardiac borders to cluster and to average each cluster.
246
Jung-ha An et al.
4.1 2 chamber ED(end diastole) EPI(epicardia1) normal 85 contours with 3 groups In this section, experimental results of the algorithms with an application to 2 chamber ED EPI normal 85 patients heart contours are showed. Those contours are divided into 3 groups. From the table which compare the results between 10000 iterations and 30000 iterations, the convergence is showed. In each cluster, contours are very close to the average contour.
Fig. 1. 2 chamber ED EPI normal 85 patients heart contours 10000 iterations. The blue color represents the average of each cluster and pink colors represent 85 patients heart contours. From left to right, Group 1: 34 contours, Group 2: 22 contours, Group 3: 29 contours
Iterations Group 1 Group 2 Group 3 22 34 29 10000 22 30000 29 34 Table 1: In the above table, we compare the results between 10000 iterations and 30000 iterations.
4.2 4 chamber ED(end diastole) ENDO(endocardia1) abnormal 44 contours with 3 groups From now on, since the convergence results in the above is showed from the experiments, the final result will be displayed in the table. In this section, experimental results of the method are showed with an application to 4 chamber ED E N D 0 abnormal 44 patients heart contours which are divided into 3 groups. Compared with the results from normal case, it can be observed that the contours are less close t o average in each cluster. But this is natural, since the given shapes are more irregular than the normal ones.
Table 2: In the above table, we show the results for 10000 iterations.
Generating Geometric Models through Self-organizing Maps
Fig. 2. 4 chamber ED ENDO abnormal 44 patients' heart contours, 10000 iterations. The red color represents the average of each cluster and the blue colors represent the 44 patients' heart contours. From left to right, Group 1: 34 contours, Group 2: 1 contour, Group 3: 9 contours.

4.3 4 chamber ED (end diastole) ENDO (endocardial) normal and abnormal 135 contours with 12 groups
In this section, experimental results are shown for 135 4-chamber ED ENDO heart contours, 91 normal and 44 abnormal. These contours are divided into 12 groups. As the table below shows, the method separated most of the mixed normal and abnormal contours very well.

Iterations: 60000. Each entry gives the group size and, in parentheses, the numbers of normal and abnormal contours (nor, ab).
Group 1: 24 (20, 4)    Group 2: 3 (0, 3)      Group 3: 1 (1, 0)
Group 4: 13 (12, 1)    Group 5: 1 (0, 1)      Group 6: 1 (0, 1)
Group 7: 28 (26, 2)    Group 8: 1 (1, 0)      Group 9: 4 (0, 4)
Group 10: 35 (22, 13)  Group 11: 19 (6, 13)   Group 12: 5 (3, 2)
Table 3: In the above table, we show the results for 60000 iterations. From the experiments, we can see that Group 1, Group 2, Group 4, Group 6, Group 7, and Group 8 have either all normal or all abnormal contours. Group 3, Group 5, and Group 9 have more normal contours.
Fig. 3. 4 chamber ED ENDO normal 91 and abnormal 44 patients' heart contours, 60000 iterations. The red color represents the average of each cluster and the blue colors represent the 135 patients' heart contours. From left to right and from top to bottom, Group 1: 1 contour, Group 2: 3 contours, Group 3: 24 contours, Group 4: 1 contour, Group 5: 13 contours, Group 6: 1 contour, Group 7: 1 contour, Group 8: 4 contours, Group 9: 28 contours, Group 10: 35 contours, Group 11: 19 contours, Group 12: 5 contours.

5 Conclusion
The purpose of this paper is to generate the shape representation and classification of the given contours using the Self-Organizing Map combined with the area difference measurement. The experiments above show that the method is more effective when applied to normal contours. Although it did not distinguish normal and abnormal contours completely, the results indicate that it separated most of the mixed normal and abnormal contours very well. The experiments also show that the method does not depend on the choice of the initial contours, although it requires the number of clusters to be specified in advance. A further advantage is that it finds the average of each cluster and the overall clustering simultaneously. Future work should investigate better distance measurements and better choices of the decreasing function α(t) to improve the current method. An error analysis of the method is also needed to validate the results.
References
[Boo86] F.L. Bookstein, "Size and shape spaces for landmark data in two dimensions", Statistical Science, Vol. 1, (1986), 181-242.
[CTT02] Y. Chen, H. Tagare, S.R. Thiruvenkadam, F. Huang, D. Wilson, A. Geiser, K. Gopinath, and R. Briggs, "Using prior shapes in geometric active contours in a variational framework", International Journal of Computer Vision, Vol. 50, No. 3, (2002), 315-328.
[CCC93] V. Caselles, F. Catte, T. Coll, and F. Dibos, "A geometric model for active contours in image processing", Numerische Mathematik, Vol. 66, (1993), 1-31.
[CHT03] Y. Chen, F. Huang, H. Tagare, M. Rao, D. Wilson, and A. Geiser, "Using prior shapes and intensity profiles in medical image segmentation", Proceedings of the International Conference on Computer Vision, Nice, France, (2003), 1117-1124.
[CHW03] Y. Chen, F. Huang, D. Wilson, and A. Geiser, "Segmentation with shape and intensity priors", Proceedings of the Second International Conference on Image and Graphics, Hefei, China, August 2002, (2003), 378-385.
[CGH03] Y. Chen, W. Guo, F. Huang, D. Wilson, and A. Geiser, "Using prior shapes and points in medical image segmentation", Proceedings of Energy Minimization Methods in Computer Vision and Pattern Recognition, Lisbon, Portugal, July 7-9, (2003), 291-305.
[Car90] T.K. Carne, "The geometry of shape spaces", Proc. of the London Math. Soc., Vol. 3, No. 61, (1990), 407-432.
[CTC95] T. Cootes, C. Taylor, D. Cooper, and J. Graham, "Active shape models - their training and application", Computer Vision and Image Understanding, Vol. 61, (1995), 38-59.
[CWH01] Y. Chen, D. Wilson, and F. Huang, "A new procrustes methods for generating geometric models", Proceedings of the World Multiconference on Systems, Cybernetics and Informatics, Orlando, July 22-25, (2001), 227-232.
[DM98] I.L. Dryden and K.V. Mardia, Statistical Shape Analysis, John Wiley & Sons, (1998).
[Fra61] M. Frechet, "Les courbes aleatoires", Bull. Inst. Internat. Statist., Vol. 38, (1961), 499-504.
[Hon] T. Honkela, "Description of Kohonen's Self-Organizing Map", http://www.mlab.uiah.fi/~timo/som/thesis-som.html
[Ken73] D.G. Kendall, "Foundations of a theory of random sets", in Stochastic Geometry, John Wiley & Sons, New York, (1973), 322-376.
[Ken84] D.G. Kendall, "Shape-manifolds, Procrustean metrics, and complex projective spaces", Bull. London Mathematical Society, (1984), 81-121.
[Ken89] D.G. Kendall, "A survey of the statistical theory of shape", Statist. Sci., Vol. 4, No. 2, (1989), 87-120.
[KM01] J.T. Kent and K.V. Mardia, "Shape, procrustes tangent projections and bilateral symmetry", Biometrika, Vol. 88, (2001), 469-485.
[Koh01] T. Kohonen, Self-Organizing Maps, Springer, (2001).
[LFG00] M.E. Leventon, O. Faugeras, E. Grimson, and W. Wells, "Level set based segmentation with intensity and curvature priors", Mathematical Methods in Biomedical Image Analysis, (2000).
[LGF00] M.E. Leventon, E. Grimson, and O. Faugeras, "Statistical shape influence in geodesic active contours", Proc. IEEE Conf. CVPR, (2000), 316-323.
[Mat75] G. Matheron, Random Sets and Integral Geometry, John Wiley & Sons, (1975).
[PR02] N. Paragios and M. Rousson, "Shape priors for level set representations", Computer Vision - ECCV 2002, the 7th European Conference on Computer Vision, Copenhagen, Denmark, May 2002, Proceedings.
[PRR02] N. Paragios, M. Rousson, and V. Ramesh, "Matching distance functions: a shape-to-area variational approach for global-to-local registration", Computer Vision - ECCV 2002, 775-789.
[SY02] S. Soatto and A. Yezzi, "Deformotion: deforming motion, shape average and joint registration and segmentation of images", Computer Vision - ECCV 2002.
[YD03] J. Yang and J.S. Duncan, "3D image segmentation of deformable objects with shape-appearance joint prior models", MICCAI, (2003), 573-580.
Self-similar Solution of Unsteady Mixed Convection Flow on a Rotating Cone in a Rotating Fluid
Devarapu Anilkumar* and Satyajit Roy
Department of Mathematics, Indian Institute of Technology Madras, Chennai 600 036, India. {anil,sjroy}@iitm.ac.in
* Thanks to CSIR, India for support to attend this conference.

Summary. This paper deals with a new self-similar solution of unsteady mixed convection flow in the stagnation point region of a rotating cone in a rotating fluid. The fluid and the body rotate either in the same direction or in opposite directions. It is shown that a self-similar solution is possible when the free stream angular velocity and the angular velocity of the cone vary inversely as linear functions of time. The non-linear coupled PDEs governing the flow are reduced to a set of coupled non-linear ordinary differential equations using a suitable similarity transformation. Finally, the set of non-linear coupled ODEs is solved numerically using an implicit finite difference scheme in combination with the quasi-linearization technique. Numerical results are obtained for the skin friction coefficients and Nusselt numbers. The effects of various parameters on the velocity and temperature profiles are also obtained.
Keywords: Mixed Convection, Rotating Cone, Self-similar Solution.
1 Introduction
Convective heat transfer in rotating flows over a stationary or rotating cone is important for the thermal design of various types of industrial equipment such as rotating heat exchangers, spin-stabilized missiles, canisters for nuclear waste disposal, nuclear reactor cooling systems and geothermal reservoirs. The cooling of the nose-cone of a re-entry vehicle by spinning the nose [OB58] may also be considered as another possible application of the present study. The system to be studied, shown schematically in Fig. 1, is a right circular cone rotating about its vertical axis of symmetry in a rotating viscous fluid. The rotational motion of the cone induces a circumferential velocity in the fluid through the action of viscosity. Further, due to the action of the centrifugal force field, the fluid is impelled along the cone surface parallel to a cone ray,
and to satisfy conservation of mass, fluid distant from the cone migrates towards it, replacing the fluid which has been centrifuged along the cone surface. If the cone surface and free stream fluid temperatures differ, not only is energy transferred to the flow but density differences also exist. In a gravitational field these density differences result in an additional (buoyancy) force besides that due to the viscous action or the centrifugal force field. In many practical circumstances of moderate flow velocities and large wall-fluid temperature differences, the magnitudes of the buoyancy force and the centrifugal force are of comparable order, and the convective heat transfer process is considered as mixed convection.
Fig. 1. Physical model and coordinate system.
The early works of Tien and Suji [Tie60] present a theoretical analysis of the forced flow and heat transfer past a rotating cone, and the influence of Prandtl number on the heat transfer from rotating non-isothermal disks and cones was described by Hartnett and Deland [HD61]. The similarity solution of the mixed convection from a rotating vertical cone in an ambient fluid was obtained by Hering and Grosh [HG62] for Prandtl number Pr = 0.7. In a further study, Himasekhar et al. [HSJ89] found the similarity solution of the mixed convection flow over a vertical rotating cone in an ambient fluid for a wide range of Prandtl numbers. Wang and Kleinstreur [WK90] have also obtained a similarity solution of boundary layer flows on rotating cones, discs and axisymmetric bodies with concentrated heat sources. All these studies pertain to steady flows. In many practical problems, the flow could be unsteady because the angular velocity of the spinning body varies with time, because of an impulsive change in the angular velocity of the body, or because the free stream angular velocity varies with time. Therefore, as a step towards the eventual development of studies on unsteady mixed convection flows, it is interesting as well as useful to investigate the combined effects of thermal and mass diffusion on a rotating cone in a rotating viscous fluid where the angular velocity of the cone and the free stream angular velocity vary arbitrarily with time. The aim of the present study is to develop a new self-similar solution for the unsteady mixed convection flow on a rotating cone in a rotating viscous incompressible fluid. It is observed that a similar solution is possible when the free stream angular velocity and the angular velocity of the cone vary inversely as linear functions of time. The system of ordinary differential equations governing the flow has been solved numerically using the method of quasi-linearization and an implicit finite difference scheme [IT74]. Particular cases of the present results are compared with those of Hering and Grosh [HG63], and Himasekhar et al. [HSJ89].
2 Analysis
Consider the unsteady laminar mixed convection flow over an infinite rotating cone in a rotating viscous fluid. The unsteadiness in the flow field is introduced by rotating the cone and the surrounding free stream fluid about the axis of the cone with time-dependent angular velocities, either in the same direction or in opposite directions. Figure 1 shows the coordinate system and the physical model. The buoyancy forces arise due to both the temperature and concentration variations in the fluid, and the flow is taken to be axisymmetric. Under the above assumptions and using the Boussinesq approximation, the governing boundary layer momentum and energy equations can be expressed as [HG63, HSJ89, GEB88]:
(xu)_x + (xw)_z = 0    (1)

u_t + u u_x + w u_z - v²/x = -v_e²/x + ν u_zz + g*β cos α* (T - T_∞)    (2)

v_t + u v_x + w v_z + uv/x = (v_e)_t + ν v_zz    (3)

T_t + u T_x + w T_z = α T_zz    (4)
The initial conditions are

u(0, x, z) = u_i(x, z),  v(0, x, z) = v_i(x, z),  w(0, x, z) = w_i(x, z),  T(0, x, z) = T_i(x, z)    (5)

and the boundary conditions are given by

u(t, x, 0) = w(t, x, 0) = 0;  v(t, x, 0) = Ω₁ x sin α* (1 - s t*)⁻¹;  T(t, x, 0) = T_w
u(t, x, ∞) = 0;  v(t, x, ∞) = v_e = Ω₂ x sin α* (1 - s t*)⁻¹;  T(t, x, ∞) = T_∞    (6)
Here α* is the semi-vertical angle of the cone; ν is the kinematic viscosity; ρ is the density; t and t* (= Ω sin α* t) are the dimensional and dimensionless times, respectively; Ω₁ and Ω₂ are the angular velocities of the cone and of the fluid far away from the surface, respectively; Ω (= Ω₁ + Ω₂) is the composite angular velocity; g* is the acceleration due to gravity; T is the temperature; β is the volumetric coefficient of thermal expansion; α is the thermal diffusivity. Subscripts t, x and z denote partial derivatives with respect to the corresponding variables, and the subscripts e, i, w and ∞ denote conditions at the edge of the boundary layer, initial conditions, conditions at the wall and free stream conditions, respectively. Equations (1)-(4) are a system of partial differential equations with three independent variables x, z and t. It has been found that these partial differential equations can be reduced to a system of ordinary differential equations if we take the velocity at the edge of the boundary layer v_e and the angular velocity of the cone to vary inversely as linear functions of time. Consequently, applying the following transformations:
v_e = Ω₂ x sin α* (1 - s t*)⁻¹;   η = (Ω sin α*/ν)^{1/2} (1 - s t*)^{-1/2} z;   t* = (Ω sin α*) t;
u(t, x, z) = -2⁻¹ Ω x sin α* (1 - s t*)⁻¹ f′(η);
v(t, x, z) = Ω x sin α* (1 - s t*)⁻¹ g(η);
w(t, x, z) = (ν Ω sin α*)^{1/2} (1 - s t*)^{-1/2} f(η);
T(t, x, z) - T_∞ = (T_w - T_∞) θ(η);   T_w - T_∞ = (T₀ - T_∞)(x/L)(1 - s t*)⁻²;
Gr_L = g*β cos α* (T₀ - T_∞) L³/ν²;   Re_L = Ω L² sin α*/ν;
λ₁ = Gr_L/Re_L²;   α₁ = Ω₁/Ω;   Pr = ν/α    (7)
to Eqs. (1)-(4), we find that Eq. (1) is identically satisfied, and Eqs. (2)-(4) reduce to

f′′′ - f f′′ + 2⁻¹ f′² - 2[g² - (1 - α₁)²] + 2λ₁θ - s(f′ + 2⁻¹ η f′′) = 0    (8)

g′′ - (f g′ - g f′) + s(1 - α₁ - g - 2⁻¹ η g′) = 0    (9)

Pr⁻¹ θ′′ - (f θ′ - f′ θ) - s(2θ + 2⁻¹ η θ′) = 0    (10)

The boundary conditions reduce to

f(0) = 0 = f′(0),   g(0) = α₁,   θ(0) = 1;   f′(∞) = 0,   g(∞) = 1 - α₁,   θ(∞) = 0    (11)
Here η is the similarity variable; f is the dimensionless stream function; f′ and g are, respectively, the dimensionless velocities along the x- and y-directions; θ is the dimensionless temperature; Re_L is the Reynolds number; Gr_L is the Grashof number; λ₁ is the buoyancy parameter; α₁ is the ratio of the angular velocity of the cone to the composite angular velocity; Pr is the Prandtl number; s is the parameter characterizing the unsteadiness in the free stream velocity v_e = Ω₂ x sin α* (1 - s t*)⁻¹. The flow is accelerating if s > 0 provided
st* < 1, and the flow is decelerating if s < 0. Further, α₁ = 0 implies that the cone is stationary and the fluid is rotating, α₁ = 1 represents the case where the cone is rotating in an ambient fluid, and for α₁ = 0.5 the cone and the fluid are rotating with equal angular velocity in the same direction. For α₁ < 0.5, Ω₁ < Ω₂, and for α₁ > 0.5, Ω₁ > Ω₂. It may be remarked that Eqs. (8)-(10) for α₁ = 1, s = 0 and λ₁ = Gr_L/Re_L² are the same as those of Hering and Grosh [HG63], and Himasekhar et al. [HSJ89].
The set of partial differential Eqs. (1)-(4) governing the flow has to be solved subject to the initial conditions (5) and boundary conditions (6). Since we are interested in self-similar solutions, we solve the ordinary differential equations (8)-(10) under the boundary conditions (11). The initial conditions are not used here, and they do not affect the solution of Eqs. (8)-(10) under the conditions (11). Self-similarity implies that the solution at different times may be reduced to a single solution, i.e., the solution at one value of the time t is similar to the solution at any other value of t. This similarity property permits a decrease in the number of independent variables from three to one (in the present case) and allows a treatment using ordinary differential equations instead of partial differential equations.
The quantities of physical interest are as follows [GEB88]. The surface skin friction coefficients in the x- and y-directions are, respectively, given by

C_fx = [2μ(u_z)]_{z=0} / (ρ[Ω x sin α* (1 - s t*)⁻¹]²),   C_fy = [2μ(v_z)]_{z=0} / (ρ[Ω x sin α* (1 - s t*)⁻¹]²).

Thus,

C_fx Re_x^{1/2} = -f′′(0),   2⁻¹ C_fy Re_x^{1/2} = -g′(0)    (12)

where Re_x = Ω x² sin α* (1 - s t*)⁻¹/ν. The Nusselt number can be expressed as

Nu Re_x^{-1/2} = -θ′(0)    (13)

where Nu_x = -x (T_z)_{z=0} / (T_w - T_∞).
3 Method of Solution
The non-linear coupled ordinary differential equations (8)-(10) were first linearized using the quasilinearization method [IT74]. The resulting linear ordinary
Fig. 2. Effects of α₁ and λ₁ on -f′ for s = 0.5 and Pr = 0.7.

differential equations were expressed in difference form using a finite-difference scheme. The difference equations were then reduced to a system of linear algebraic equations with a block tri-diagonal structure, which is solved using Varga's algorithm [Var00]. A convergence criterion based on the relative difference between the current and previous iterations is employed: when this difference falls below a small prescribed tolerance, the solution is assumed to have converged and the iterative process is terminated. The step size Δη and the edge of the boundary layer η∞ have been optimized to ensure the convergence of the numerical solution to the exact solution.
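The quasilinearization/finite-difference code itself is not reproduced in the paper. As a quick way to reproduce the similarity solution of (8)-(11), the sketch below instead hands the reduced system to a library boundary value solver (SciPy's solve_bvp); this is an alternative cross-check, not the authors' scheme. The parameter values are illustrative picks from the ranges used in the next section, and the far field is truncated at η∞ = 6.

```python
import numpy as np
from scipy.integrate import solve_bvp

# Illustrative parameter values (not tied to any particular table entry)
Pr, lam1, alpha1, s, eta_inf = 0.7, 1.0, 0.25, 0.5, 6.0

def odes(eta, y):
    # y = [f, f', f'', g, g', theta, theta'] ; equations (8)-(10)
    f, fp, fpp, g, gp, th, thp = y
    fppp = (f * fpp - 0.5 * fp**2 + 2.0 * (g**2 - (1 - alpha1)**2)
            - 2.0 * lam1 * th + s * (fp + 0.5 * eta * fpp))
    gpp = (f * gp - g * fp) - s * (1 - alpha1 - g - 0.5 * eta * gp)
    thpp = Pr * ((f * thp - fp * th) + s * (2.0 * th + 0.5 * eta * thp))
    return np.vstack([fp, fpp, fppp, gp, gpp, thp, thpp])

def bcs(y0, yinf):
    # boundary conditions (11)
    return np.array([y0[0], y0[1], y0[3] - alpha1, y0[5] - 1.0,
                     yinf[1], yinf[3] - (1 - alpha1), yinf[5]])

eta = np.linspace(0.0, eta_inf, 200)
y_guess = np.zeros((7, eta.size))
y_guess[3] = alpha1 + (1 - 2 * alpha1) * eta / eta_inf   # rough guess for g
y_guess[5] = np.exp(-eta)                                # rough guess for theta
sol = solve_bvp(odes, bcs, eta, y_guess)
print("f''(0) =", sol.sol(0.0)[2],
      "  -g'(0) =", -sol.sol(0.0)[4],
      "  -theta'(0) =", -sol.sol(0.0)[6])
```

For stronger buoyancy or deceleration a better initial guess or a finer mesh may be needed for the solver to converge.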
4 Results and Discussion
Equations (8)-(10) under the boundary conditions (11) have been solved using an implicit finite difference scheme in combination with the quasilinearization technique as described in the previous section. The computations have been carried out with Δη = 0.01 for various values of Pr (0.7 ≤ Pr ≤ 10), λ₁ (0 ≤ λ₁ ≤ 5), α₁ (-0.25 ≤ α₁ ≤ 1.0) and s (-1 ≤ s ≤ 1). The edge of the boundary layer η∞ is taken between 4 and 6 depending on the values of the parameters. In order to verify the correctness of our method, we have compared our results with those of Hering and Grosh [HG63] and Himasekhar et al. [HSJ89].
Fig. 3. Effects of α₁ and λ₁ on g for s = 0.5 and Pr = 0.7.

The results are found to be in excellent agreement, and some of the comparisons are shown in Table 1. Figures 2-3 show the effects of the buoyancy parameter λ₁ and the parameter α₁, which is the ratio of the angular velocity of the cone to the composite angular velocity, on the velocity profiles in the tangential and azimuthal directions (-f′(η), g(η)) for accelerating flows with s = 0.5 and Pr = 0.7. The effects of λ₁ and α₁ on the skin friction coefficients and the Nusselt number (C_fx Re_x^{1/2}, C_fy Re_x^{1/2}, Nu Re_x^{-1/2}) are also presented in Table 2. For α₁ = 0.5, the cone and the fluid are rotating with equal angular velocity in the same direction, and the non-zero velocities in the tangential and azimuthal directions (-f′(η), g(η)) are due only to the positive buoyancy parameter λ₁ = 1, which acts like a favorable pressure gradient. When α₁ > 0.5, the fluid is being dragged by the rotating cone, and due to the combined effects of the buoyancy force and the rotation parameter, the tangential velocity (-f′) increases in magnitude while the azimuthal velocity (g) decreases in magnitude within the boundary layer. On the other hand, when α₁ < 0.5, the cone is dragged by the fluid and the combined effect of the buoyancy force and the rotation parameter is just the opposite. For α₁ < 0 and λ₁ = 1, the velocity profiles -f′(η) and g(η) reach their asymptotic values at the edge of the boundary layer in an oscillatory manner. Physically, these oscillations are caused by surplus convection of
angular momentum present in the boundary layer. But for a higher buoyancy parameter (i.e., for λ₁ = 3 in Figs. 2-3), the oscillations are not observed. Thus, the buoyancy force, which acts like a favorable pressure gradient, suppresses the oscillations in the velocity profiles (-f′, g). Since a positive buoyancy force (λ₁ > 0) implies a favorable pressure gradient, the fluid is accelerated, which results in thinner momentum and thermal boundary layers. Consequently, the skin friction coefficients and the Nusselt number (C_fx Re_x^{1/2}, C_fy Re_x^{1/2}, Nu Re_x^{-1/2}) are increased, as shown in Table 2.

Fig. 4. Effects of Pr on θ for λ₁ = 1, α₁ = 0.25 and s = 0.5.

In Fig. 4, the effect of the Prandtl number Pr on the temperature profiles (θ) for λ₁ = 1, α₁ = 0.25 and s = 0.5 is presented. From the engineering viewpoint, the heat transfer rate should not be large. This can be achieved by (a) reducing the temperature difference between the surface and the free stream fluid, (b) using a low Prandtl number fluid (such as air, Pr = 0.7), and (c) imposing the buoyancy force in the direction opposing the forced flow.
5 Conclusions
A detailed numerical study to obtain new self-similar solutions for unsteady mixed convection flow on a rotating cone in a rotating fluid has been carried
out. The present results are compared with the available results in the literature and are found to be in good agreement. The results indicate that the skin friction coefficients and Nusselt numbers are enhanced by the positive buoyancy force. The buoyancy force (which acts like a favorable pressure gradient) suppresses the oscillations in the velocity profiles which appear due to surplus convection of angular momentum. An increase in Prandtl number causes a reduction in the thickness of the thermal boundary layer.
References
[GEB88] Gebhart, B., Jaluria, Y. and Mahajan, R.L. (1988), Buoyancy-induced Flows and Transport, Hemisphere.
[HD61] Hartnett, J.P. and Deland, E.G. (1961), "The influence of Prandtl number on the heat transfer from rotating non-isothermal disks and cones," ASME Journal of Heat Transfer, Vol. 83, 95-96.
[HG62] Hering, R.G. and Grosh, R.J. (1962), "Laminar free convection from a non-isothermal cone," Int. J. Heat Mass Transfer, Vol. 5, 1059-1068.
[HG63] Hering, R.G. and Grosh, R.J. (1963), "Laminar combined convection from a rotating cone," ASME Journal of Heat Transfer, Vol. 85, 29-34.
[HSJ89] Himasekhar, K., Sarma, P.K. and Janardhan, K. (1989), "Laminar mixed convection from a vertical rotating cone," Int. Comm. Heat Mass Transfer, Vol. 16, 99-106.
[IT74] Inouye, K. and Tate, A. (1974), "Finite difference version of quasilinearization applied to boundary layer equations," AIAA J., Vol. 12, 558-560.
[OB58] Ostrach, S. and Brown, W.H. (1958), "Natural convection inside a flat rotating container," NACA TN.
[Tie60] Tien, C.L. and Suji, I.J. (1960), "Heat transfer by laminar flow from a rotating cone," ASME Journal of Heat Transfer, Vol. 82, 252-253.
[Var00] Varga, R.S. (2000), Matrix Iterative Analysis, Prentice Hall.
[WK90] Wang, T.Y. and Kleinstreur, C. (1990), "Similarity solutions of combined convection heat transfer from a rotating cone or disk to non-Newtonian fluids," ASME Journal of Heat Transfer, Vol. 112, 939-944.
Table 1. Comparison of the results (f′′(0), -g′(0), -θ′(0)) with those of Himasekhar et al. [HSJ89] and Hering and Grosh [HG63]. For each Pr and λ₁, the six entries list the values of Himasekhar et al. [HSJ89] together with the present results; values in parentheses marked * are those of Hering and Grosh [HG63].

Pr = 0.7:  λ₁ = 0.0: 0.4305, 1.0199, 0.4299, 0.6158, 1.0256, 0.6160  (0.4285*, 0.6159*, 1.0205*)
           λ₁ = 1.0: 2.2012, 0.6127, 2.1757, 0.6120, 0.8496, 0.8499  (0.6120*, 0.8507*, 2.2078*)
           λ₁ = 10:  8.5041, 1.0175, 1.4061, 8.5029, 1.0097, 1.3990  (1.0173*, 1.4037*, 8.5246*)
Pr = 1:    λ₁ = 0.0: 0.51808, 1.0199, 0.5184, 1.0256, 0.6160, 0.6158
           λ₁ = 1.0: 2.0886, 0.7005, 0.8250, 2.0627, 0.7010, 0.8176
           λ₁ = 10:  1.1494, 1.3504, 7.9045, 1.1230, 1.3460, 7.9425
Pr = 10:   λ₁ = 0.0: 1.4072, 0.6154, 1.0175, 1.0256, 1.4110, 0.6158
           λ₁ = 1.0: 1.5885, 0.6894, 1.5458, 1.5662, 1.5636, 0.6837
           λ₁ = 10:  2.3582, 5.0821, 5.0531, 2.3580, 0.9903, 0.9840

* Values taken from Hering and Grosh [HG63].
Table 2. Skin friction coefficients and Nusselt number when Pr = 0.7 and s = 0.5.

λ₁   α₁     C_fx Re_x^{1/2}   C_fy Re_x^{1/2}   Nu Re_x^{-1/2}
1   -0.5      -1.27215          -1.33537           0.55580
1    0.0       0.63241          -0.63949           0.81922
1    0.25      1.31339          -0.22765           0.89011
1    0.5       1.84798           0.19806           0.93700
1    0.75      2.24659           0.62679           0.96563
3   -0.5       2.43934          -1.43105           0.91210
3    0.0       3.79522          -0.59651           1.02869
3    0.25      4.31854          -0.13691           1.06539
3    0.5       4.73958           0.33552           1.09111
3    0.75      5.05951           0.81201           1.10712
5   -0.5       5.18154          -1.55129           1.06503
5    0.0       6.36147          -0.60724           1.14323
5    0.25      6.82071          -0.10547           1.16887
5    0.5       7.19231           0.40602           1.18730
5    0.75      7.47647           0.92102           1.19926
Homogenization of a Nonlinear Elliptic Boundary Value Problem Modelling Galvanic Interactions on a Heterogeneous Surface
Y.S. Bhat
Department of Mathematics, University of Florida, Gainesville, FL 32611 USA. [email protected]

Summary. We study a nonlinear elliptic boundary value problem arising from electrochemistry. The boundary value problem occurs in the study of heterogeneous electrode surfaces. The boundary condition is of an exponential type and is normally associated with the names of Butler and Volmer and the notion of galvanic corrosion. We examine the questions of existence and uniqueness of solutions to this boundary value problem. We then treat the problem from the point of view of homogenization theory. The boundary condition has a periodic structure. We find a limiting or effective problem as the period approaches zero, along with a correction term and convergence estimates. We also carry out numerical experiments to investigate the behaviour of galvanic currents near the boundary as the period approaches zero.
Keywords: galvanic corrosion, homogenization, nonlinear boundary condition
1 Introduction
Galvanic corrosion is a phenomenon caused by electrochemical interaction between different parts of the same surface. We study this phenomenon. A galvanic interaction occurs when galvanic current flows either between an electrode surface and a counterelectrode or between different parts of the same heterogeneous surface. In Figure 1(a) the silver strip is cathodic, and reduction takes place (Ag gains electrons). Simultaneously, oxidation takes place at the zinc strip: zinc loses electrons and is said to be anodic. Zinc dissolves into the solution, the zinc electrode is corroded, and the electron flow is known as galvanic current. In Figure 1(b), a similar oxidation-reduction reaction is taking place between different parts of the same surface.

Fig. 1. (a) Zinc loses electrons to Silver. (b) A similar reaction occurs between different parts of the same surface.

Here, in this paper, we consider a cylindrically shaped domain and model the oxidation-reduction reaction occurring between different parts of a heterogeneous surface, namely the two-dimensional base of our cylindrically shaped domain Ω. The base, which we will refer to as Γ, contains a periodically regular arrangement of anodic islands in a cathodic plane. All the anodes are of the same uniform material. The cathodic plane is also uniform in material (see Figure 2).
Fig. 2. The base of the cylinder is a heterogeneous surface.
The electrolytic voltage potential φ satisfies the following nonlinear elliptic boundary value problem:

Δφ = 0 in Ω
-∂φ/∂n = J_A [e^{α_aa(φ - V_A)} - e^{-α_ac(φ - V_A)}] on ∂Ω_A
-∂φ/∂n = J_C [e^{α_ca(φ - V_C)} - e^{-α_cc(φ - V_C)}] on ∂Ω_C
-∂φ/∂n = 0 on ∂Ω \ {∂Ω_A ∪ ∂Ω_C}
The boundary condition is called the Butler-Volmer exponential boundary condition, where:
α_aa, α_ac = anodic transfer coefficients, α_aa + α_ac = 1
α_ca, α_cc = cathodic transfer coefficients, α_ca + α_cc = 1
J_A, J_C = anodic/cathodic polarization parameters
V_A, V_C = anodic/cathodic rest potentials
∇φ = galvanic current

In the electrochemistry community, Morris and Smyrl [MS88] have tried numerically to simulate the behaviour of corrosion current for fixed ratios of anodic to cathodic areas (3-D model). Morris and Smyrl concluded that corrosion current is determined by the ratio of anodic area to active perimeter. They claim current increases with active perimeter.
Fig. 3. Area remains constant as perimeter increases.
As a special case of increasing perimeter with fixed anodic area, we consider a periodic structure with period going to zero. Mathematically, we study

Δu_ε = 0 in Ω
-∂u_ε/∂n = f(x/ε, u_ε) on Γ
-∂u_ε/∂n = 0 on ∂Ω \ Γ

where f(y, v) = λ(y)[e^{α(y)(v - V(y))} - e^{-(1 - α(y))(v - V(y))}] for any v ∈ ℝ and y ∈ Y = [0, 1] × [0, 1]. Here λ, α, and V are smooth real Y-periodic functions; we also assume there exist constants λ₀, Λ₀, α₀, A₀ and V₀ such that 0 < λ₀ ≤ λ(y) ≤ Λ₀, 0 < α₀ ≤ α(y) ≤ A₀ < 1 and |V(y)| ≤ V₀. Recall that Ω is a bounded cylindrical domain in ℝ³. Here letting ε → 0 represents increasing perimeter with fixed anodic area.
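For concreteness, here is a small Python sketch of the Butler-Volmer nonlinearity f(y, v) and of its cell average f₀(v) = ∫_Y f(y, v) dy, which appears in the homogenized problem of the next section. The particular λ, α and V below are made-up Y-periodic profiles chosen only to satisfy the stated bounds.

```python
import numpy as np

# Made-up smooth Y-periodic coefficients satisfying 0 < lam0 <= lam <= Lam0,
# 0 < a0 <= alpha <= A0 < 1 and |V| <= V0 (purely illustrative choices).
lam   = lambda y1, y2: 1.0 + 0.5 * np.sin(2 * np.pi * y1) * np.sin(2 * np.pi * y2)
alpha = lambda y1, y2: 0.5 + 0.2 * np.cos(2 * np.pi * y1)
V     = lambda y1, y2: 0.1 * np.sin(2 * np.pi * y2)

def f(y1, y2, v):
    """Butler-Volmer nonlinearity lam*[exp(alpha*(v-V)) - exp(-(1-alpha)*(v-V))]."""
    a, w = alpha(y1, y2), v - V(y1, y2)
    return lam(y1, y2) * (np.exp(a * w) - np.exp(-(1.0 - a) * w))

def f0(v, n=64):
    """Cell average f0(v) over Y = [0,1]^2 by the midpoint rule."""
    y = (np.arange(n) + 0.5) / n
    y1, y2 = np.meshgrid(y, y, indexing="ij")
    return f(y1, y2, v).mean()

print([round(f0(v), 4) for v in (-0.5, 0.0, 0.5)])
```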
2 The Method of Homogenization
Engineering and scientific problems often deal with materials formed from multiple constituents (e.g. composite materials, fluid-filled porous solids). We try to find simpler equations that smooth out the substructure variations that arise with the spatially heterogeneous material. Beginning with a problem that includes the structural variations, we derive a simpler problem that serves as a first-term approximation. Here we begin by assuming that the solution φ_ε has an asymptotic expansion of the form
φ_ε = φ₀ + ε φ_ε^(1) + ⋯

Fig. 4. Example of 2D rescaling when ε = 1/10.

The general procedure is to substitute the above expansion back into the original boundary value problem to determine associated boundary value problems for the terms of the expansion. In this case we claim that φ₀ satisfies

Δφ₀ = 0 in Ω
-∂φ₀/∂n = f₀(φ₀) on Γ
-∂φ₀/∂n = 0 on ∂Ω \ Γ

where f₀(v) = ∫_Y f(y, v) dy for any v ∈ ℝ, and that φ_ε^(1) satisfies

Δφ_ε^(1) = 0 in Ω
-∂φ_ε^(1)/∂n = (f(x/ε, φ₀) - f₀(φ₀))/ε + c_ε on Γ
-∂φ_ε^(1)/∂n = 0 on ∂Ω \ Γ

where c_ε = (1/|Γ|) ∫_Γ (f₀(φ₀) - f(x/ε, φ₀)) dx. Note that it is not a priori obvious that these are the appropriate boundary conditions. Subsequent convergence estimates will show that these are the right choices of boundary functions. To wit, we can show (proof omitted) that there exist constants C₁, C₂, C₃, C₄ independent of ε such that:

‖φ_ε - φ₀ - ε φ_ε^(1)‖_{H¹(Ω)} ≤ C₁ ε
‖φ_ε - φ₀‖_{H¹(Ω)} ≤ C₂ √ε
‖φ_ε - φ₀‖_{L²(Ω)} ≤ C₃ ε
‖φ_ε^(1)‖_{L²(Γ)} ≤ C₄
3 Existence and Uniqueness
Consider the problem

Δu_ε = 0 in Ω
-∂u_ε/∂n = f(x/ε, u_ε) on Γ     (1)
-∂u_ε/∂n = 0 on ∂Ω \ Γ

where f(y, v) = λ(y)[e^{α(y)(v - V(y))} - e^{-(1 - α(y))(v - V(y))}]. We consider the 3D problem, i.e. let Ω ⊂ ℝ³, Γ ⊂ ℝ². Here Y = [0, 1]² and λ, α, and V are piecewise smooth real-valued Y-periodic functions; we also assume there exist constants λ₀, Λ₀, α₀, A₀ and V₀ such that 0 < λ₀ ≤ λ(y) ≤ Λ₀, 0 < α₀ ≤ α(y) ≤ A₀ < 1 and |V(y)| ≤ V₀. We show that the energy minimization form of problem (1) has a unique solution in H¹(Ω). For a given ε, define the following energy functional,

E_ε(v) = (1/2) ∫_Ω |∇v|² dx + ∫_Γ F(x/ε, v) dx

where

F(y, v) = (λ(y)/α(y)) e^{α(y)(v - V(y))} + (λ(y)/(1 - α(y))) e^{-(1 - α(y))(v - V(y))}.
Theorem 1 (Existence and Uniqueness of the Minimizer). There exists one function u^, G H^{f2) solving E^{u^) = min„g/^i(j7) E^{u). Proof: Note that -^F{y,v)
= A(y)a(y)e«(2/)(''-^(^» + A(y)(l - a(j/))e^(i^"(^»('^^^(^»,
since A > 0,a > 0, and 1 — a > 0 we have that ^F > 0. It is easy to see that the partial derivative is bounded below. That is, there exists a constant Co, independent of y and v such that, ^F{y,v)>co>0. Since F is smooth in the second variable, for any v,w & H^{n) and for any y, there exists some ^ between v + w and v — w such that F{y, v + w)+ F{y, v - w) ^ 2F{y, v) = -g^F{y, which from the lower bound yields
^w^
268
Y.S. Bhat F(x/e, V + w) + F{x/e, v — w) — 2F{x/e, v) > CQW'^
whence E^{v + w) + Ee{v - w) - 2E^{v) > / \S/w\'^dx + co / w'^dx Jn Jr >CQ\\w\\]ii(n)
(2)
where the last inequality follows by a variant of Poincare. Now let { u " } ^ i be a minimizing sequence, that is i?e(u") ^
inf
E^{u)
as
n - > oo.
note that clearly infug//i(^) E^{u) > —oo. Let
and tf =
2 then note that v + w = u" and v -^ w = u^ and so £;e(v + ^ ) + i?e(^ - ^ ) - 2E,{V) > | ! | < -
inf
£,(.;) > ^ | | < ~ < i | 2 .
Now if we let m,n —> oo, we see that {u"}n is a Cauchy sequence in the Hilbert Space H^{Q). Define u^ to be its limit in H^{Q). Then we have < ^ u ,
in
H\Q)
in
L2(J-)
which by the Trace Theorem implies, <
-> u,
which implies (Rudin [Rud66],p.68) there exists a subsequence {w"''}fc, such that u^ —> Mf a.e. in /^. So now we claim F(a;/e,Ue) = hminfF(x7e,u^)
a.e..
(3)
Since F is smooth in the second variable, and u^ —> u^ a.e. in F we have that F{-,Uc) = Umfc_,oo-P'(7,'"e) a.e which clearly implies (3). Now note that
Homogenization of a Nonlinear Elliptic Boundary Value Problem
269
clearly F(f,u,^) > 0 Vfc,fc = 1,2,.... So that by Fatou'sLemraa(Rudin [Rud66], p.23) we can claim, F{x/e,Uc)dx
/
F{x/e,u'^)da
Thus we can conclude from this and the fact the the first term of E^ is weakly lower semi continuous that, E,{u,)
(= lim E,{u';) = y
fc—•oo
inf
ue//M^)
E,{u)
t h a t is E^{ue) = i^^ueH^{n) E^{u). So we have shown the existence of a minimizer. T h e uniqueness of the minimizer follows trivially from (2). Suppose Ue and us are b o t h minimizers of the energy functional, i.e.
E,{u,) = mi^^HHn) Ec{u) = E,{us). Now if we let v = ^ i ^ and w = ^ ^ ^ ^ then V + w = Ue and v — w = us. Then substituting v and w into (2) yields, E,{u,) + E,{us) - 2E,C^^) but since ^ i ^ G H\f2) j\\ue
- Mnnn)
> ~^\\u, - usWl.^a^
we have inf„g^:(n) E,{u) < E,{^i^) < Ee{uc) + E,{us) -2^^mi
whence,
E,{u) = 0.
So Ue = Us a.e. in H^{Q). Thus we have shown the uniqueness of the minimizer. D Note that this argument can be generalized to address the n-dimensional problem, i.e. the case in which we have Q C iJJ",/^ C 3?"^^ with boundary period c e n y = [0, l ] " - i .
4 Numerical Experiments Finally we wish to numerically observe the behaviour of the homogenized boundary value problems as a way to describe the behaviour of the current near the boundary. We plan to use a finite element method approach to the 2-D problem. For the 2-D problem the domain J7 is a unit square and the boundary F is the left side of the unit square, that is F = {(xi,a;2) : Xi = 1}. In this case we impose a grid of points (called nodes) on the unit square and triangulate the domain, then introduce a finite set of piecewise continuous basis functions. Now we wish to minimize the energy functional with respect to these basis functions. In particular we assume that the minimizer can be written as a linear combination of basis functions, we write
Y^mh
270
Y.S. Bhat
where m is the number of nodes, and {bi}^^ is the set of basis functions. We a t t e m p t to minimize the energy over the set of coefBcients {?7i}™ i using a conjugate descent algorithm developed by Hager and Zhang. We are currently implementing this minimization and refer the reader to future publications for numerical results. Acknowledgments: All figures appear courtesy of Valerie Bhat.
References [BM05] Bhat, Y.S. and Moskow, S. (2005), "Homogenization of a nonlinear elliptic boundary value problem modeling galvanic currents",in preparation. [Eva98] Evans, L.C. (1998), Partial Differential Equations, American Mathematical Society, Providence, Rhode Island. [Hol95] Holmes, M.H. (1995), Introduction to Perturbation Methods, SpringerVerlag New York Inc., New York, New York. [MS88] Morris, R. and Smyrl, W. (1988), "Galvanic Interactions on periodically regular heterogeneous surfaces," AlChE Journal, Vol. 34, 723-732. [Rud66] Rudin, W. (1966), Real and Complex Analysis, McGraw-Hill, New York, New York. [VX98] Vogelius, M. and Xu, J.M. (1998), "A nonlinear elliptic boundary value problem related to corrosion modeling" Quart.Appl.Math., Vol. 56, No. 3, 479-505.
* www.math.ufl.edu/'^hager
A Simple Mathematical Approach for Determining Intersection of Quadratic Surfaces Ken Chan The Aerospace Corporation, 15049 Conference Center Drive, Chantilly, VA 20151, USA. [email protected]
Summary. This paper is primarily concerned with the mathematical formulation of the conditions for intersection of two surfaces described by general second degree polynomial (quadratic) equations. The term quadric surface is used to denote implicitly a surface described by a quadratic equation in three variables. Of special interest is the case of two ellipsoids in the three dimensional space for which the determination of intersection has practical applications. Even the simplest of traditional approaches to this intersection determination has been based on a constrained numerical optimization formulation in which a requisite combined rotational, translational and dilational transformation reduces one of the ellipsoids to a sphere, and then a numerical search procedure is performed to obtain the point on the other ellipsoid closest to the sphere's center. Intersection is then determined according to whether this shortest distance exceeds the radius of the sphere. An alternative novel technique, used by Alfano and Greer [AGOl] is based on formulating the problem in four dimensions and then determining the eigenvalues which yield a degenerate quadric surface. This method has strictly relied on many numerical observations of the eigenvalues to arrive at the conclusion whether these ellipsoids intersect. A rigorous mathematical formulation and solution was provided by Chan [ChaOlJto explain the myriads of numerical observations obtained through trial and error using eigenvalues. Moreover, it turns out that this mathematical analysis may also be extended in two ways. First, it is also valid for quadric surfaces in general: ellipsoids, hyperboloids of one or two sheets, elliptic paraboloids, hyperbolic paraboloids, cyUnders of the elliptic, hyperbolic and parabolic types, and double elliptic cones. (The term ellipsoids includes spheres, and elliptic includes circular.) The general problem of analytically determining the intersection of any pair of these surfaces is not simple. This formulation provides a much desired simple solution. The second way of generalization is to extend it to n dimensions in which we determine the intersection of higher dimensional surfaces described by quadratic equations in n variables. The analysis using direct substitution and voluminous algebraic simplification turns out to be very laborious and troublesome, if at all possible in the general case. However, by using abstract symbolism and invariant properties of the extended (n-l-1) by (n-l-1) matrix, the analysis is greatly simplified and its overall structure made comprehensive and comprehensible. These results are also included in this paper.
272
Ken Chan
They also serve as a starting point for further theoretical investigations in higher dimensional analytical geometry.
1 Introduction Nomenclature As a common starting point, we shall first introduce the nomenclature used in this paper in connection with surfaces described by second degree polynomial equations. The definitions are the same as those used in standard practice. This will also permit us to point out some of the differences when we go from the case of two dimensions to three dimensions and then to higher dimensions. In general, a second degree polynomial will contain first degree terms and a constant term. For brevity, we shall refer to this polynomial as "quadratic". This term is consistent with the usage in "quadratic equations". However, it is used in a different sense in "quadratic forms" which are homogeneous polynomials containing only second degree terms, but not first degree and constant terms. A surface described by a quadratic equation is referred to as a "quadratic surface". If there are two variables in the polynomial, we have a conic (curve). If there are three variables, we have a quadric surface. (Some authors refer to a surface described by a quadratic equation in n variables as a quadric surface in n-dimensional space, but we shall refer to it simply as a quadratic surface here unless we wish to use a term such as "n-dric" surface as a compression of n-dimensional quadric.) Classification of Conies A conic is described by the general second degree polynomial equation qnx'^ + q22y'^ + 2q^2^y + 2qi^x + 2q23y + qss = 0
.
(1)
If we next introduce homogeneous coordinates by defining a column vector r of 3 components and a symmetric 3x3 matrix Q by r = (x,2/,l) =
Q = hj]
(2)
,
(3)
then equation (1) may be written in an e x t e n d e d form as r^Qr = 0
.
(4)
We note that the LHS of equation (4) is in the form that we are famihar with when dealing with quadratic forms except that we now have an extended
Determining Intersection of Quadratic Surfaces
273
vector and an extended matrix. Note that not all quadratic polynomials can be reduced to homogeneous form by eliminating the linear terms. This is the case when a particular variable has no second degree terms but has only linear terms, e.g., a parabola. If we define A as the determinant of Q, i.e., A = \Q\
,
(5)
then we may classify the conic according to whether A is zero or otherwise. If A is non-zero, then we have a proper (or non-singular) conic which may be an elhpse, a hyperbola or a parabola. If A is zero, we have a singular (or improper) conic which may be a point, a pair of intersecting hues, a pair of parallel hnes or a coincident hne which actually comprises two lines. Here we are restricting ourselves to discussing only real conies. Because these singular cases have degenerated into very simple elements (a point or hne), a singular conic is also called a degenerate conic. All this has been known and may be found in a book by Pettofrezzo [Pet66]. It must be stated that for the case of a conic, it has not been customary (or necessary) to use the term singular if A vanishes. Rather, the term degenerate is the only one used for this description. However, for quadratic polynomials with more than two variables, such a distinction is necessary as we shall next discuss. Classification of Quadrics A quadric is described by the general second degree polynomial equation
9iia; +<5'22y +9332 +2q^2^y+2qi:iXZ+2q2zyz+2q^^x+2q2iy-\-2q.i^z^-qAA = 0 . (6) If we define a column vector r of 4 components and a symmetric 4x4 matrix Qby r=(x,2/,z,l)
(7)
Q = fei] ,
(8)
then equation (7) may be written in an extended form as r^Qr = 0
.
(9)
If we define A as the determinant of Q, i.e., Zi=|Q|
,
(10)
then we may classify the quadric according to whether A is zero or otherwise. If A is non-zero, then we have a proper (or non-singular) quadric which may be an ellipsoid, a hyperboloid of one or two sheets, elliptic paraboloid or a
274
Ken Chan
hyperbolic paraboloid. If A is zero, we have a singular (or improper) quadric which may be a double elliptic cone, a cylinder of the elliptic, hyperbolic or parabolic type, a point, a straight line, a pair of intersecting planes, a pair of parallel planes or a coincident plane. Here we are restricting ourselves to discussing only real quadrics. However, because not all of these singular cases have degenerated into very simple elements (a point, line or plane), a singular quadric is further classified as d e g e n e r a t e if the rank of the matrix Q is less than 3; otherwise, it is non-degenerate. Thus, a double elliptic cone, a cylinder of the elliptic, hyperbolic or paraboHc type and a point would be a singular non-degenerate quadric. (A point is a limiting case of an ellipsoid and has rank 3.) A straight line, a pair of intersecting planes, a pair of parallel planes and a coincident plane would be a singular degenerate quadric. (A straight line is the limiting case of an elliptic cylinder.) All this has been known and may be found in a book by Dresden [Dre64] from which Figure 1 is taken to summarize the main results. Figure 1 is taken from the handbook by Korn and Korn [GKT68] to provide an illustration of the five proper quadrics.
2 Illustrative example In this paper, our main concern is with the mathematical formulation of the conditions for intersection of two surfaces described by quadratic equations. Before we do this, we shall first study a simple case to illustrate the main ideas. Consider a circle A of radius a with center at the origin so that its equation is
^ +4 =1 •
(11)
From the previous discussion, we may write its extended 3x3 matrix A as ^ 0 0 0 ^ 0 0 0-1
(12)
and its equation in homogeneous coordinates is r^Ar = 0
.
(13)
Next, consider an ellipse B with semi-axes b and c with center on the X-axis at the point (XQ, 0) so that its equation is
fe^4
=l .
It may be easily shown that its extended 3x3 matrix B is
(14)
Determining Intersection of Quadratic Surfaces
275
CLASSIFICATION OF QUADRIC SURFACES
^ = ki/L hi = 1> 2, 3, 4; •4*4 = \aij\, i,j = 1, 2, 3; o,< O/i . i^3= «.y '
«y
2 i<j
au aij
dij o,ik ajj ajk
(Uk
3
an
a,j
oy «i;n = rank of matrix of A; fa = r a n k of m a t r i x of Au. 4 Non-singular quaclrica
A>0
or
Hyperboloid of one sheet
Hyperbolic
2
paraboloid
Hyperboloid of two sheets
Imaginary Cone Impossible
Impossible
Impossible
Ileal Cone
Imaginary EllipImagitic 'J\D,>0 elliptic nary cylinr2>o interder secting planes ImposTt>0; Ellipparabo- TiD, sible <0 tic cylloid inder
7',<0
1
2 1 Degenerate quadrics
A<0
Imaginary Ellipellipsoid soid
3
3 Singular non-degenerate quadrics
HyperInterbolic 'A<0 secting cylinplanes der
Parabolic cylinder
Imaginary I>2>0 parallel Coinplanes cident Paral- planes ft<0 lel planes
Pig. 1. Classification of Quadric Surfaces
276
Ken Chan
2
c'
(a)
Hyperboloid of two sheets; -•=• - ^ - ~ = 1 a'
b^
c'
(C)
2
Elliptic paraboloid:
J
—~-\-T7''Z
3
2
Hyperboloid of one sheet: ^ + -ij - i - = 1
Ellipsoid: -o^ ^ + Tf F + c^ % •" 1
Hyperbolic paraboloid: -V— r r
{d\ Fig. 2. The Five Proper Quadrics
a
h
6'
C
Determining Intersection of Quadratic Surfaces 1
W 0
B
277
0
1 71
0
(15)
and its equation in homogeneous coordinates is r^Br = 0 .
(16)
From the circle A and the eUipse B, it is obvious that we may form another conic Q described by any hnear combination of A and B such as r^Qr = r^Br - Ar^Ar = 0
(17)
If we choose Q to be singular, i.e., | Q |= Q = 0, then A must be an eigenvalue of the matrix A~^B since we have | A |= A 7^ 0 in the equation A(A~^B - AI)
(18)
0
There are three values of A and they are found from the characteristic polynomial equation |A"-^B-AI|=0
(19)
.
From equations (12), (15) and (19), we obtain
(f--A)
0
0
(fr-A) 0
a
XQ
= 0
0
(20)
(1-ff-A)
which yields -A)
'^^+(|i-S-^)^4
(21)
One eigenvalue A3 is given by (22)
Its corresponding eigenvector r^ is obtained from (A-iB-A3l)r3 = 0
.
(23)
The four parameters a, b, c and Xo are independent. Thus, in general, the cofactor of q22 in equation (20) is non-zero. In this case, rs is given by ra = (0,2/3,0) where y3 j^ 0. Since r3 is non-trivial, therefore it is an acceptable eigenvector from the standpoint of matrix theory. However, it is inadmissible for our purposes of using homogeneous coordinates since we require that it be of the form with a non-vanishing third component as dictated by
278
Ken Chan
equation (2). (Note that an eigenvector is determined only in direction but not in magnitude, but no scaling in magnitude of r^ will yield 1 in the third component.) The other two eigenvalues Ai and A2 are obtained by solving 1)A +
A^ + 62Q - 62
(24)
0
62
which yields Al,2 —
62 ) ±
'£0 _ ^
1N2 __ 4 ^
^62
')
62
(25)
^^52
We note that these two eigenvalues do not depend on the semi-axis c of the eUipse. If we let i denote 1 and 2, then the eigenvector r^ corresponding to Aj is obtained from (A-iB-Ail)ri
(26)
0
Again, the four parameters a, b, c and XQ are independent. Thus, in general, q22 in equation (20) is non-zero so that y, = 0. In this case, the cofactor of q22 in equation (20) is zero and r^ is given by Yi =
{Xi,0,l)
(27)
where
-w
62
Xi —
(f-A,)
(28)
(1-ff-A,)
Besides being an acceptable eigenvector from the standpoint of matrix theory, Ti is also admissible for our purposes of using homogeneous coordinates since we require that it be of the form with a non-vanishing third component as dictated by equation (2). If we substitute A and B from equations (12) and (15) and A^ from equation (25) into equation (17), we obtain the fohowing equation for the degenerate conic Q
[xyl]
1
0
-ff
(^-^)
0
0
(Ai + | | - l ) J
X y 1
=0
(29)
0 the form
ii^ - ^ ) ( a ; - X i ) 2 + ( - - - | ) y 2
+ fc = 0
(30)
Since | Q |= 0, we may verify that k = 0. Because Aj depends on a, b and Xo, therefore the coefficient of the first term in equation (30) depends on
Determining Intersection of Quadratic Surfaces
279
these three parameters. On the other hand, the coefRcient of the second term depends on all four parameters a, b, c and XQ. Thus, it is possible for these coefficients to be of the same sign or of opposite signs. In the first case, the degenerate conic is the point (xj, 0); and in the second case it is a pair of intersecting lines passing through that point. In equation (25), the discriminant determines when the two eigenvalues Ai and A2 are real, equal or complex conjugates. The general conditions (for Xo in terms of a and b) have been obtained but will not be given here because they would detract from the main theme. For simplicity of discussion which will achieve the same purpose, let us consider XQ to be positive. It is easily verified that Case I: If XQ = (a + b), then (^_^_1)2„4^^0
(31)
so that, by equation (25), the two eigenvalues are equal and negative and are given by Ai,2 = - ^
.
(32)
Thus, the two coefficients in equation (30) are of the same sign. Consequently, the two degenerate conies corresponding to Ai and A2 are the points (xj, 0) and (x2, 0) respectively. By substituting equation (32) into (28), we obtain x'1,2 = a
(33)
so that the two degenerate conies are the same point (a, 0). Moreover, by equation (27), the two eigenvectors ri and r2 corresponding to the two eigenvalues are also the same. In fact, they are the two degenerate conies expressed in homogeneous coordinates. We note that if XQ = (a + b), then the circle A and the ellipse B are tangent at the point (a, 0). Next, consider Case II: If XQ > (a + b), then
so that, by equation (25), the two eigenvalues are real, negative and not equal. Thus, the two coefficients in equation (30) are of the same sign. Consequently, the two degenerate conies corresponding to Ai and A2 are two different points (xi, 0) and (x2, 0). Because the eigenvalues are different, the two corresponding eigenvectors ri and r2 are also different, but they are still the two degenerate conies expressed in homogeneous coordinates. We note that if XQ > (a -I- b), then the circle A and the elhpse B are outside of each other, i.e., they share no common area. It is easily verified that Case III: If XQ = b = 2a, then
280
Ken Chan
so that, by equation (25), the two eigenvalues are complex conjugates. Consequently, there are no real degenerate conies and no real eigenvectors corresponding to the two complex eigenvalues. We note that if XQ = b = 2a, then the ellipse B intersects the circle A and passes through its center, i.e., they share common area. However, the three cases discussed above do not exhaustively describe all the scenarios. There are additional complications if we continue to move the center of the eUipse toward the origin. For instance, consider Case IV: If we let XQ = a = b/2, then the discriminant vanishes and we obtain equation (31) so that, by equation (25), the two eigenvalues are equal and positive and are given by Ai,2 = ^
.
(36)
However, only the semi-axes b and c effectively determine whether the two coefficients in equation (30) are of the same sign or of opposite signs. Consequently, by equations (28) and (30), both the degenerate conies corresponding to Ai and A2 are the same point (xj, 0) or the same pair of intersecting hnes passing through that point. By substituting equation (36) into (28), we obtain xi,2 = -a
(37)
so that the two degenerate conies have the common point (-a, 0). Moreover, by equation (27), the two eigenvectors rj and 12 corresponding to the two eigenvalues are also the same. We note that if x,, — a = b/2, then the circle A and the elUpse B are tangent at the point (-a, 0). For the parameters in this example, a little consideration reveals that both the degenerate conies comprise the same pair of intersecting lines passing through the point (-a, 0). This pair of lines also intersect the circle at the other two points where the ellipse intersects the circle. All these conclusions follow from equations (13), (16) and (17) which state that if there is a point common to any two of them, then it is also common to the third. However, not all common points when expressed in homogeneous coordinates will yield eigenvectors. But an eigenvector will always lie on the degenerate conic associated with the eigenvalue, even if that degenerate conic may not have any point in common with the circle and the elhpse. We have described many interesting and important manifestations associated with a circle intersecting with an ellipse. We have given four examples of these two conies having no, one, two and three points in common. A little consideration reveals that for one common point, we can only have the two conies tangent to each other. Our example of two common points considered the case of two intersecting points. However, we could also have two tangent points. Our example of three common points considered the case of one tangent and
Determining Intersection of Quadratic Surfaces
281
two intersecting points; and this is the only possible configuration. Finally, we could also have considered four common points, which must necessarily be intersecting points. Our Illustrative Example of the circle and the eUipse (spanning some 5 pages of concise description) is perhaps the simplest one which brings out the salient features of intersection between quadratic surfaces. Figure 3 illustrates the four cases just discussed.
Case II
(dD Cubic tQ.) = 0
Fig. 3. Relation Between Conic Intersections and Eigenvalues
At the very root of our methodology is the necessity to solve a polynomial equation to obtain explicit expressions for the eigenvalues so that we may then study their relation with each other. In our simple example, we had chosen the center of the ellipse to be on the x-axis thus making it possible to obtain easily the three eigenvalues. If we had chosen the center of the ellipse at a general point in the plane, then we would have to solve unnecessarily a much more comphcated cubic equation even though we have Cardan's method at our disposal. Moreover, if we had considered the case of intersection between two ellipsoids in three dimensions, our methodology would have required us to solve a quartic equation for the four eigenvalues associated with the singular quadric surface. While this is possible using Ferrari's method, it is hardly practical. For these more complex problems, it is doubtful if we will obtain new
282
revelations illustrating the nature of the intersections. If we were to proceed along the same lines for the case of intersection between higher dimensional quadratic surfaces, we would be confronted with an insurmountable task which is rather daunting. Thus, we are forced to seek a new general approach to formulate and solve the problem.
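Although the closed-form route does not scale, the eigenvalues themselves are cheap to compute numerically. The sketch below builds the extended matrices of the circle and the ellipse from equations (12) and (15), computes the eigenvalues of A⁻¹B as in (19), and prints them for the four cases of the illustrative example. The numerical values a = 1 and b = c = 2 are illustrative choices made here (the example fixes only the relations between a, b and x₀).

```python
import numpy as np

def extended_matrices(a, b, c, x0):
    """Extended 3x3 matrices of the circle A (radius a, centered at the origin)
    and the ellipse B (semi-axes b, c, center (x0, 0))."""
    A = np.diag([1.0 / a**2, 1.0 / a**2, -1.0])
    B = np.array([[1.0 / b**2, 0.0, -x0 / b**2],
                  [0.0, 1.0 / c**2, 0.0],
                  [-x0 / b**2, 0.0, x0**2 / b**2 - 1.0]])
    return A, B

def pencil_eigenvalues(A, B):
    """The values of lambda for which Q = B - lambda*A is singular,
    i.e. the eigenvalues of A^{-1} B."""
    return np.linalg.eigvals(np.linalg.solve(A, B))

if __name__ == "__main__":
    a = 1.0
    # The four cases of the example, with b = c = 2a and the ellipse center on the x-axis.
    for label, x0 in [("Case I   (x0 = a + b)", 3.0),
                      ("Case II  (x0 > a + b)", 4.0),
                      ("Case III (x0 = b = 2a)", 2.0),
                      ("Case IV  (x0 = a = b/2)", 1.0)]:
        A, B = extended_matrices(a, b=2.0, c=2.0, x0=x0)
        lam = np.sort_complex(pencil_eigenvalues(A, B))
        print(f"{label}: eigenvalues = {np.round(lam, 4)}")
```

Consistent with the example, the output shows a repeated negative eigenvalue for the external tangency of Case I, two distinct negative eigenvalues for the disjoint configuration of Case II, a complex conjugate pair for the intersecting configuration of Case III, and a repeated positive eigenvalue for the internal tangency of Case IV, together with the real eigenvalue a²/c² in every case.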
3 General theorems In this section, we shall consider the general case of two quadratic surfaces defined in n-dimensional space, denoted by A and B. We shall assume that one of them (say A) is non-singular so that its inverse exists. Again, we shall introduce homogeneous coordinates. Let r denote the extended column vector defined by r = (a;i,X2,...,a;„,l)
.
(38)
Let the equations in homogeneous coordinates for A and B be r'^Ar = 0 r^Br = 0
(39) .
(40)
From A and B, it is obvious that we may form another quadratic surface Q described by any linear combination of A and B such as

r^T Q r = r^T B r − λ r^T A r = 0 .   (41)
Since A and B are symmetric, Q is also symmetric. If we choose Q to be singular, i.e., |Q| = 0, then λ must be an eigenvalue of the matrix A^{-1}B, since we have |A| ≠ 0 in the equation

|A (A^{-1}B − λI)| = 0 .   (42)

There are (n+1) values of λ and they are found from the characteristic polynomial equation

|A^{-1}B − λI| = 0 .   (43)
Then, we may prove the following theorems.
Theorem 1. Common Point. If a point P lies on any two of the three surfaces A, B and Q (where Q need not be singular), then it lies on the third surface also.
Proof. The proof is obvious and follows directly from equations (39), (40) and (41). □
Remark: Even though it is trivial, this theorem is stated here because reference will be made to it repeatedly.
Theorem 2. Eigenvector. Let λ_i be an eigenvalue of A^{-1}B and let r_i be the eigenvector corresponding to λ_i. Then, the eigenvector r_i always lies on the singular quadratic surface Q associated with that eigenvalue λ_i.
Proof. By the definition of an eigenvector of A^{-1}B, r_i satisfies the equation

(A^{-1}B − λ_i I) r_i = 0 .   (44)

If we pre-multiply it by A, then we obtain

(B − λ_i A) r_i = 0 .   (45)

Therefore, we have

Q r_i = 0 .   (46)

Consequently, r_i also satisfies

r_i^T Q r_i = 0 .   (47)
Hence, the homogeneous coordinates of this eigenvector r_i yield a point which lies on the singular surface Q. This proves the theorem. □
Remark: Note that even though the eigenvector r_i lies on Q, this singular surface Q may not have any point in common with A and B.
Theorem 3. Tangency. If a point P common to A and B yields the homogeneous coordinates of an eigenvector of A^{-1}B, then the two quadratic surfaces are tangent at that point.
Proof. Associated with the surface Q given by equation (41) is the quadratic polynomial

Φ = r^T Q r = r^T B r − λ r^T A r .   (48)

Let ∇Φ denote the n-dimensional gradient of Φ, which is an n-dimensional vector. In keeping with current usage,

∇Φ = (∂Φ/∂x_1, ∂Φ/∂x_2, …, ∂Φ/∂x_n)^T ,   (49)

where, for consistency, both sides are expressed as column vectors. Let λ_i denote an eigenvalue of the matrix A^{-1}B, and let r_i denote the corresponding eigenvector. By equation (46), this eigenvector satisfies Q r_i = 0; therefore we have
∇Φ|_{r=r_i} = 0 .   (50)

Let the point P(x_1, x_2, …, x_n) in nonhomogeneous coordinates correspond to r_i(x_1, x_2, …, x_n, 1). Since P lies on the surfaces A and B, it also lies on the surface Q. At this point, we evaluate the n-dimensional gradient of Φ and obtain from equations (48) and (50)

∇(r^T B r)|_{r=r_i} = λ_i ∇(r^T A r)|_{r=r_i} .   (51)

That is, the two quadratic surfaces A and B are tangent at point P. This proves the theorem. □
Remarks: Note that this theorem is very general. If we are dealing with two n-dimensional surfaces other than hyperellipsoids, then the singular quadratic surface may not be a point (which is a degenerate hyperellipsoid). In general, we have a singular quadratic surface which may be a line, a plane, or any other surface Q as long as |Q| = 0. What this theorem says is that at the common point P the two surfaces A and B are tangent. This theorem does not say that this tangent point is necessarily an isolated point; it may be a point on a locus at which A and B are tangent. One example of this latter case is a sphere A inside a prolate ellipsoid of revolution B, so that the locus of tangent points is a circle which is also the intersection of the plane Q with A (or B). Another example is an ellipsoid A inside a hyperboloid of one sheet B (both being concentric and having the same principal axes), so that the locus of tangent points is an ellipse which is also the intersection of the plane Q with A (or B).
Corollary: If two quadratic surfaces A and B intersect (tangency excluded), then no point in the set of points comprising the intersection can correspond to an eigenvector of A^{-1}B.
Theorem 4. Converse Tangency. If two quadratic surfaces A and B are tangent at point P, then that point yields the homogeneous coordinates of an eigenvector r_i of A^{-1}B.
Proof. Let A and B be tangent at a point P(x_1, x_2, …, x_n) whose homogeneous coordinates are denoted by r'_i(x_1, x_2, …, x_n, 1). Associated with A and B are two corresponding quadratic polynomials r^T A r and r^T B r. It follows that their gradients are proportional at point P

∇(r^T B r)|_{r=r'_i} = λ'_i ∇(r^T A r)|_{r=r'_i} ,   (52)
where λ'_i is some number. Let another quadratic surface Q be defined by

r^T Q r = r^T B r − λ'_i r^T A r = 0 .   (53)

Since A and B are symmetric, Q is also symmetric. Associated with Q is the quadratic polynomial

Φ = r^T Q r = r^T B r − λ'_i r^T A r .   (54)

From equations (52) and (54), we obtain

∇Φ|_{r=r'_i} = 0 ,   (55)

where ∇Φ is an n-dimensional vector. Equation (55) states that it has n zero components. Since Q is symmetric, it may be shown that

∇Φ = 2 [I_{n×n}  0_{n×1}] Q r .   (56)

If we evaluate the n-dimensional gradient ∇Φ at r'_i, it follows from equations (55) and (56) that the column vector Q r'_i has zero for the first n components, i.e.,

Q r'_i = (0, 0, …, 0, a)^T .   (57)

Since P lies on the surfaces A and B, it also lies on the surface Q. Moreover, this point has nonhomogeneous coordinates obtained from r'_i. Hence, the following equation holds

r'^T_i Q r'_i = 0 .   (58)

This yields 0 + 1·a = 0, so that

a = 0 .   (59)

Consequently,

Q r'_i = 0 .   (60)

That is, r'_i is an eigenvector of A^{-1}B. This proves the converse theorem. □
Remarks: Let two quadratic surfaces A and B have tangent points P_1 and P_2. Then, the two vectors OP_1 and OP_2 can be collinear or non-collinear. The first case can be easily constructed by using the example of a circle concentric with an ellipse whose minor axis is equal to the radius of the circle. The second case can be seen from the Illustrative Example for the case of a
Let p and p' denote the nonhomogeneous coordinates of the same point P in the original system and in a system whose origin is translated by p_0; then

p = p' + p_0 .   (61)
However, this is not a convenient form for use in the extended system. Rather, let r and r' respectively denote the homogeneous coordinates of the same point P. Then, they are related by the following matrix multiplication equation

r = T r' ,   (62)

where the (n+1) × (n+1) translation matrix T is given by

T = T(p_0) = [ I_{n×n}  p_0 ; 0_{1×n}  1 ] .   (63)

By expanding the determinant of this matrix using the last row of elements and their cofactors, we obtain for all translations

|T(p_0)| = 1 .   (64)

From equation (62), it follows that

r' = T^{-1}(p_0) r .   (65)

Since we also have

r' = T(−p_0) r ,   (66)

it follows directly (without having to perform any matrix inversion) that

T^{-1}(p_0) = T(−p_0) .   (67)

Note that

T^{-1}(p_0) ≠ T^T(p_0) .   (68)

That is, the inverse of T is not equal to its transpose. However, by taking the inverse of the transpose of T given by equation (63), we obtain

[T^T(p_0)]^{-1} = [T^{-1}(p_0)]^T .   (69)
That is, these two operations commute. The above discussion leads us to state the following:
Theorem 5. Translation. The (n+1) × (n+1) translation matrix T given by equation (63) has the properties given by equations (64), (67) and (69).
Let there be a rotation of axes from (e'_1, e'_2, …, e'_n) to (e''_1, e''_2, …, e''_n), where e'_i and e''_i respectively denote the unit vectors associated with the positive x'_i and x''_i axes. Let γ_ij denote the direction cosine of the "angle" between e'_i and e''_j

γ_ij = e'_i · e''_j .   (70)
Let S denote the n × n rotation matrix comprising these direction cosines

S = [γ_ij] .   (71)

Let p' = (x'_1, x'_2, …, x'_n) and p'' = (x''_1, x''_2, …, x''_n) respectively denote the position of a point P in the ' system and the '' system, both with the origin at O'. Then, we obtain

p' = S p'' .   (72)

Let r' and r'' respectively denote the homogeneous coordinates of the same point P. Then, they are related by the following equation

r' = R r'' ,   (73)

where the (n+1) × (n+1) rotation matrix R is given by

R = R(S) = [ S_{n×n}  0_{n×1} ; 0_{1×n}  1 ] .   (74)

By expanding the determinant of this matrix using the last row or column of elements and noting that |S| = 1, we obtain for all rotations

|R(S)| = 1 .   (75)

Since S is an orthogonal transformation (in the general sense), it follows from equation (74) that the extended rotation matrix R is also an orthogonal transformation. Consequently, we have

R^{-1} = R^T .   (76)
That is, the inverse of the extended rotation matrix is precisely equal to its transpose. This is in direct contrast to the case of the extended translation matrix T. The above discussion leads us to state the following:
Theorem 6. Rotation. The (n+1) × (n+1) rotation matrix R given by equation (74) has the properties given by equations (75) and (76).
Remarks: For rotations, the usual and extended equations, respectively given by (72) and (73), have the same form involving matrix multiplication. On the other hand, for translations, the usual and extended equations, respectively given by (61) and (62), do not have the same form: one involves a vector sum while the other involves matrix multiplication. The older approaches in three dimensional analytic geometry (as in [Dre64]) have always used nonhomogeneous coordinates. As a result, they had to deal with translations and rotations in differing formulations. Thus, they encountered extensive algebraic manipulations which can be judiciously circumvented by using homogeneous coordinates. The advantages of this point are obvious in proving the following theorems on invariants.
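To make these properties concrete, the following short sketch (Python/NumPy; not part of the original text) builds the extended matrices T(p_0) and R(S) of equations (63) and (74) for n = 2 and checks the statements of Theorems 5 and 6 numerically. The function names are illustrative only.

import numpy as np

def extended_translation(p0):
    """T(p0) of equation (63): identity block with the translation p0 in the last column."""
    n = len(p0)
    T = np.eye(n + 1)
    T[:n, n] = p0
    return T

def extended_rotation(S):
    """R(S) of equation (74): rotation block S with a trailing 1."""
    n = S.shape[0]
    R = np.eye(n + 1)
    R[:n, :n] = S
    return R

p0 = np.array([2.0, -1.0])
theta = 0.3
S = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

T = extended_translation(p0)
R = extended_rotation(S)

print(np.isclose(np.linalg.det(T), 1.0))                         # |T(p0)| = 1, eq. (64)
print(np.allclose(np.linalg.inv(T), extended_translation(-p0)))  # T^{-1}(p0) = T(-p0), eq. (67)
print(np.allclose(np.linalg.inv(T), T.T))                        # prints False: inverse != transpose, eq. (68)
print(np.allclose(np.linalg.inv(R), R.T))                        # R^{-1} = R^T, eq. (76)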
Theorem 7. Invariants. Let C denote the extended matrix whose elements are coefficients of a second degree polynomial with n variables. We shall associate these n variables with the coordinates in a rectangular system of the same dimension. We shall perform any sequence of translations and rotations taken in any order, remembering that these operations do not commute. After such a sequence of transformations, we can replace the entire process by just a translation followed by a rotation, or a rotation followed by a translation. However, in conformance with the previously introduced notation in equations (62) and (73), we shall illustrate with a translation T and then a rotation R of axes taking the original system from the unprimed to the double primed variables. Then, the determinant of C is invariant under these transformations.
Proof. Thus, we have

r^T C r = (T R r'')^T C (T R r'') = r''^T R^T T^T C T R r'' = r''^T C'' r'' ,   (77)

so that the matrix C transforms according to

C'' = R^T T^T C T R .   (78)

If we take the determinant on both sides, it follows from Theorems 5 and 6 that

|C''| = |C| .   (79)
This proves the theorem. □
Remarks: For the case of three dimensions, Theorem 7 is proved with much labor by Dresden [Dre64], who used nonhomogeneous coordinates. If such an approach were taken for higher dimensions, it is conceivable that the methodology would soon break down. Suppose we wish to reduce C to its canonical form containing only diagonal second degree terms, only first degree terms that do not have a second degree term associated with them, and a constant term. This may be accomplished by first performing a rotation R* to eliminate the second degree off-diagonal terms, and then a translation T* to eliminate those first degree terms having a diagonal term associated with them. (Remember that the original coefficients of the linear terms have changed after performing the rotation R*.) Henceforth, we shall let C'' denote the canonical form of C. This problem of reducing C to its canonical form is one of two very important transformations involving quadratic surfaces. The other is almost the inverse process: given a canonical form C'', transform it to another set of
axes with orientation and displacement specified in the new coordinate system. A little consideration will reveal that, again, we first perform a rotation using the knowledge of direction cosines and then a translation to accomplish the required displacement in the new system. Interchanging these two steps will not yield the desired result.
Corollary 1. It follows from the above theorem that
(1) For the case of hyperellipsoids with semi-axes γ_i, we have

|C| = |C''| = −1 / (Π_{i=1}^n γ_i²) .   (80)

(2) For the case of hyper-hyperboloids, we have

|C| = |C''| = ± 1 / (Π_{i=1}^n γ_i²) .   (81)

Let us now return to the general case of two quadratic surfaces A and B defined in n-dimensional space, A being non-singular so that its inverse exists. Let D be defined as

D = A^{-1} B .   (82)

Even though A and B are symmetric (so that A^{-1} is also symmetric), it does not necessarily follow that A^{-1}B is also symmetric. As an example, see the matrix in equation (18). From equation (82), we obtain

|D| = |B| / |A| .   (83)
Let (A^{-1})'' and B'' respectively denote the canonical forms of A^{-1} and B. This leads us to
Theorem 8. Determinant. The determinant of the canonical form of the inverse of matrix A is equal to the determinant of the inverse of its canonical form, i.e.,

|(A^{-1})''| = |(A'')^{-1}| .   (84)

Proof. By Theorem 7, equation (79), we have

|A''| = |A| ,   (85)
|(A^{-1})''| = |A^{-1}| .   (86)

Since |A^{-1}| |A| = 1, it follows from equations (85) and (86) that

|(A^{-1})''| |A''| = 1 .   (87)

However, we also have

|(A'')^{-1}| |A''| = 1 .   (88)

Consequently, we obtain equation (84). This proves the theorem. □
Remarks: Note that we proved the relation (84) relating the determinants of (A^{-1})'' and (A'')^{-1} even though it is not true in general that (A^{-1})'' = (A'')^{-1}. This can be seen from a simple example like the matrix B given by equation (15), for which we observe that (B^{-1})'' ≠ (B'')^{-1}. It is easy to prove that (B^{-1})'' = (B'')^{-1} if and only if T = I (i.e., p_0 = 0).
Theorem 9. Combined Invariants. Let A and B be two hyperellipsoids in n dimensions. In this case, the inverse exists for both of them. Let D be defined by equation (82). Let α_i and β_i respectively denote the semi-axes associated with A and B. Then, we have

|D| = Π_{i=1}^n (α_i / β_i)² .   (89)

Proof. By Theorem 7, equation (79), and equation (83), we obtain

|D| = |(A^{-1})''| |B''| .   (90)

Note that in deriving this equation, we do not assume that the transformations which take A to A'' and B to B'' are necessarily the same. From equation (80), we have

|B''| = −1 / (Π_{i=1}^n β_i²) .   (91)

From equations (80) and (84), it follows that

|(A^{-1})''| = |(A'')^{-1}| = −Π_{i=1}^n α_i² .   (92)

By substituting equations (91) and (92) into (90), we obtain equation (89). This proves the theorem. □
We can now rewrite equation (43) as

|D − λI| = (−1)^{n+1} [λ^{n+1} + d_n λ^n + … + d_1 λ + d_0] = 0 ,   (93)

where, in terms of the determinant |D|, we have according to equation (89)
d_0 = (−1)^{n+1} |D| = (−1)^{n+1} Π_{i=1}^n (α_i² / β_i²) .   (94)

However, we can also write d_0 in terms of the eigenvalues λ_i as follows

d_0 = (−1)^{n+1} Π_{i=1}^{n+1} λ_i .   (95)

From these two equations, we obtain a relation between the product of the eigenvalues of A^{-1}B and the product of the semi-axes of the hyperellipsoids A and B. This leads us to the following:
Theorem 10. Eigenvalues Product. The (n+1) eigenvalues λ_i of A^{-1}B and the n semi-axes α_i and β_i respectively of the hyperellipsoids A and B are related by

Π_{i=1}^{n+1} λ_i = Π_{i=1}^n (α_i / β_i)² .   (96)

Remarks: If the two ellipsoids are translated or rotated, the matrix elements of A and B will change. Thus, the eigenvalues of A^{-1}B will change, i.e., they are not invariant under rotation and translation. However, the semi-axes will not change because they are the diagonal elements of the canonical forms. This theorem states that d_0, which is the product of the (n+1) eigenvalues up to sign, is invariant under any combination of rotation and translation of A and B independently. This is not the case with the other coefficients d_i in equation (93), which depend on the relative position and orientation of the two ellipsoids. This is evident even in the simple case studied in the Illustrative Example, as seen in equation (21). In these statements, it should be borne in mind that we are dealing with the eigenvalues {λ_i ; i = 1, 2, …, (n+1)} of the extended matrix A^{-1}B, in contrast to the eigenvalues {μ_i ; i = 1, 2, …, n} associated with a quadratic form, which are invariant under rotation. Needless to say, {λ_i} and {μ_i} are in general different.
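As a quick numerical check of Theorem 10 (a sketch in Python/NumPy, not part of the original text), one can place a circle of radius a and an arbitrarily rotated and translated ellipse with semi-axes b and c in the plane, and compare the product of the three eigenvalues of A^{-1}B with the right-hand side of equation (96). The construction below assumes the normalization implicit in equation (80), i.e., canonical forms of the type diag(1/γ_1², 1/γ_2², −1).

import numpy as np

def ellipse_matrix(semi_axes, center, angle):
    """Extended 3x3 matrix of an ellipse with the given semi-axes, rotated and translated.
    The canonical form has constant term -1, as assumed in equation (80)."""
    b, c = semi_axes
    C_canon = np.diag([1.0 / b**2, 1.0 / c**2, -1.0])
    S = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
    R = np.eye(3); R[:2, :2] = S
    T = np.eye(3); T[:2, 2] = center
    Minv = np.linalg.inv(T @ R)           # r'' = (T R)^{-1} r, cf. equations (62), (73)
    return Minv.T @ C_canon @ Minv

a, b, c = 1.0, 2.0, 0.5
A = ellipse_matrix((a, a), center=(0.0, 0.0), angle=0.0)    # circle of radius a
B = ellipse_matrix((b, c), center=(1.3, -0.7), angle=0.8)   # ellipse placed arbitrarily

eigvals = np.linalg.eigvals(np.linalg.inv(A) @ B)
print(np.prod(eigvals).real)          # product of the three eigenvalues
print((a / b)**2 * (a / c)**2)        # right-hand side of equation (96): same value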
4 Application to ellipsoids

The general theorems previously obtained for n-dimensional quadratic surfaces will now be applied specifically to the three dimensional physical space so familiar to us. Moreover, we shall first restrict the discussion to the case of two ellipsoids. (It helps to refer to Figure 3, which is really for ellipses in the two dimensional case.) Since n = 3, we have four eigenvalues of the matrix A^{-1}B. Let us first consider a tangent point P when the ellipsoids A and B are exterior to each other (referred to as exterior tangency). We note that ∇(r^T B r) is of opposite sign to ∇(r^T A r). Hence, one eigenvalue (say λ_1) is negative. Moreover, by Theorem 10, equation (96), the product of the four eigenvalues is positive.
Consequently, there must be a second eigenvalue (say λ_2) which must also be negative. (The other two eigenvalues λ_3 and λ_4 must have a product which is positive. It does not matter presently what they are.) We note that when any negative λ (not necessarily an eigenvalue) is substituted into equation (41), we obtain a quadratic surface Q which is an ellipsoid. However, when the eigenvalue λ_1 is substituted into this equation, Q becomes a singular ellipsoid which must therefore be a point. Let the eigenvalue λ_1 correspond to the eigenvector r_1. Then, by Theorem 4, this eigenvector yields the nonhomogeneous coordinates of the tangent point P. Similarly, the eigenvalue λ_2 corresponds to the eigenvector r_2 which yields the same tangent point P, since there can be only one exterior tangent point between two ellipsoids. It follows that λ_1 = λ_2. Hence, the characteristic polynomial equation (43) has a repeated negative root when the surfaces A and B are exteriorly tangent. We note that the coefficients of this characteristic polynomial are functions of the elements of the matrices A and B. These are alternative ways of expressing the size, shape, orientation and relative position of the ellipsoids A and B. Hence, by subjecting these ellipsoids to continuous variations in any fashion, the eigenvalues of equation (43) will also undergo continuous variations. This forms the basis of our approach in proving the ensuing general results without having to deal with the details of extremely complicated equations. If we move the two ellipsoids closer together or push them apart, then one of these configurations must correspond to the case in which the characteristic polynomial equation has two complex conjugate roots, while the other configuration must correspond to the case in which it has two distinct negative roots. Let us first consider the case of two intersecting ellipsoids. For this, we have a locus of points in common between the ellipsoids A and B. Suppose this case corresponds to two negative eigenvalues λ_1 and λ_2. The eigenvalue λ_1 corresponds to an eigenvector r_1 which yields the nonhomogeneous coordinates of a point P_1 on the singular surface Q, which is a point. However, by Theorem 1, the set of points comprising the intersection between A and B must also lie on Q. This is impossible. Hence, the case of two intersecting ellipsoids must correspond to the characteristic polynomial equation having two complex conjugate roots. Consequently, the case of two ellipsoids exterior to each other must correspond to the characteristic polynomial equation having two distinct negative roots. Let us push the ellipsoids until they become tangent once again (referred to as interior tangency). At this tangent point, we note that ∇(r^T B r) is of the same sign as ∇(r^T A r). Hence, one eigenvalue (say λ_1) is positive. Consequently, there must be a second eigenvalue (say λ_2) which must also be positive. In the configuration with interior tangency, there will be common volume between A and B. Depending on their size, shape, orientation and relative position, there are three possibilities: (i) they intersect again somewhere else; (ii) they become tangent again at another point; and (iii) they do not intersect again (one is contained entirely within the other). If we proceed as for exterior
tangency, we can easily prove that for Cases (i) and (iii) we must have λ_1 = λ_2. To prove that Case (ii) also has two equal positive eigenvalues, let us consider Case (i) continuously deforming into Case (iii). We know that the eigenvalues also undergo continuous variations. It follows that at the transition (for which we have two interior tangent points), we still have λ_1 = λ_2. Note that this result is true for any two interior tangent points in general. (In this process just discussed, the singular quadratic surface Q degenerates from a cone in Case (i) to a straight line or a plane in Case (ii) and then to a point in Case (iii).) We are able to prove this equality of eigenvalues without having to deal with the details of the ellipsoidal surfaces. All the preceding theorems and discussion lead us to the following extremely general theorem on relative configuration based solely on eigenvalues.
Theorem 11. Characteristic Equation. Let A and B denote two ellipsoids. If the extended 4 × 4 matrix A^{-1}B (or B^{-1}A) has
• two distinct negative eigenvalues, then A and B are exterior to each other;
• two equal negative eigenvalues, then A and B are exteriorly tangent;
• two complex conjugate eigenvalues, then A and B intersect;
• two equal positive eigenvalues, then A and B are interiorly tangent at one or two points;
• two distinct positive eigenvalues, then A and B intersect or one is entirely within the other with no points of tangency.
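A rough numerical illustration of Theorem 11 in two dimensions is sketched below (Python/NumPy, not from the original text). It assumes axis-aligned ellipses with the same canonical normalization as above (constant term −1), and it only distinguishes the negative-eigenvalue and complex-eigenvalue cases explicitly.

import numpy as np

def conic_matrix(ax, ay, center):
    """Extended 3x3 matrix of the axis-aligned ellipse ((x-x0)/ax)^2 + ((y-y0)/ay)^2 = 1."""
    x0, y0 = center
    T = np.eye(3); T[:2, 2] = [x0, y0]
    Tinv = np.linalg.inv(T)
    return Tinv.T @ np.diag([1.0 / ax**2, 1.0 / ay**2, -1.0]) @ Tinv

def classify(A, B, tol=1e-9):
    """Classify the relative configuration of two ellipses per Theorem 11 (two dimensions)."""
    lam = np.linalg.eigvals(np.linalg.inv(A) @ B)
    if np.max(np.abs(lam.imag)) > tol:
        return "intersecting (complex conjugate eigenvalues)"
    lam = np.sort(lam.real)
    neg = lam[lam < 0]
    if neg.size >= 2 and abs(neg[0] - neg[1]) > tol:
        return "exterior to each other (two distinct negative eigenvalues)"
    if neg.size >= 2:
        return "exteriorly tangent (two equal negative eigenvalues)"
    return "overlapping: intersecting, tangent, or one inside the other (positive eigenvalues)"

A = conic_matrix(1.0, 1.0, (0.0, 0.0))                     # unit circle
print(classify(A, conic_matrix(1.0, 0.5, (4.0, 0.0))))     # far away:    exterior
print(classify(A, conic_matrix(1.0, 0.5, (2.0, 0.0))))     # touching:    exteriorly tangent
print(classify(A, conic_matrix(1.0, 0.5, (1.0, 0.0))))     # overlapping: intersecting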
Corollary 2. In view of Cases (2) and (4), it is also impossible for the matrix A^{-1}B to have two pairs of equal eigenvalues of opposite signs (say λ_1 = λ_2 > 0 and λ_3 = λ_4 < 0).
We will now apply this theorem to study the three cases above. For simplicity, we shall consider the two dimensional case as already exemplified by the Illustrative Example. We wish to observe how the eigenvalues change as the ellipse B (with semi-axes b and c) passes through the circle A (with radius a). Case (i) is illustrated by Stage 4 in Figure 4, for which c < a < b.
For Case (ii), we consider a
[Figure 4 panels, stages (1)–(9): (1) two distinct negative roots; (2) two equal negative roots; (3) two complex conjugate roots; (4) two equal positive roots; (5) two distinct positive roots; (6) two equal positive roots; (7) two complex conjugate roots; (8) two equal negative roots; (9) two distinct negative roots.]
Fig. 4. Transition of Eigenvalues for the Case of c < a < b
[Figure 5 panels, stages (1)–(9): the same sequence of root types as in Figure 4.]
Fig. 5. Transition of Eigenvalues for the Case of c < b < a
stages are the same as before. If c is much larger than a, then Stages 4 through 6 (as well as Stages 8 through 10) collapse into one stage with only one tangent point. This critical condition depends on the curvature of the two conics. For example, see the Illustrative Example, equation (30). All these illustrations are included to clarify what could be an extremely difficult situation to describe analytically. They do not constitute a proof of the above theorem, whose proof has been given already.
[Figure 6 panels, stages (1)–(13): (1) two distinct negative roots; (2) two equal negative roots; (3) two complex conjugate roots; (4) two equal positive roots; (5) two distinct positive roots; (6) two equal positive roots; (7) two distinct positive roots; (8) two equal positive roots; (9) two distinct positive roots; (10) two equal positive roots; (11) two complex conjugate roots; (12) two equal negative roots; (13) two distinct negative roots.]
Fig. 6. Transition of Eigenvalues for the Case of a < c < b
Finally, consider Case (4) of Theorem 11, for which we have two equal positive eigenvalues such that A and B are interiorly tangent at two points P_1 and P_2. This is illustrated in Figure 7 for two ellipses. Note that Figure 7a depicts the case of two collinear vectors OP_1 and OP_2, each of which corresponds to two equal positive eigenvalues λ_1 = λ_2. Also note that these two collinear vectors are not collinear in the extended form, since they both have 1 in the last component. What is not so obvious is that in Figures 7b through 7d it is also true that each of the two vectors OP_1 and OP_2 has the same two equal positive eigenvalues λ_1 = λ_2, even though OP_1 and OP_2 are not collinear. This property follows from the fact that because P_1 and P_2 are common to A and B, they also lie on the singular quadratic surface Q defined by Q = B − λ_1 A.
[Figure 7 panels: (a) collinear vectors; (b)–(d) non-collinear vectors.]
Fig. 7. Illustrative Cases of Two Vectors with Equal Eigenvalues
Theorem 12. Multiple Interior Tangent Points. Let A and B denote two ellipsoids. Suppose the extended 4 × 4 matrix A^{-1}B (or B^{-1}A) has two equal positive eigenvalues (say λ_1 = λ_2) such that A and B are interiorly tangent at two points P_1 and P_2. Then, the two vectors OP_1 and OP_2 are each associated with the same two equal positive eigenvalues λ_1 and λ_2. These
tangent points may not be isolated points; they may be on a locus of points at which A and B are tangent.
Corollary 3. If the singular quadratic surface Q defined by Q = B − λ_1 A is a straight line, then there can be at most only two interior tangent points common to A and B.
Corollary 4. If the singular quadratic surface Q defined by Q = B − λ_1 A is a plane intersecting A (or B), then there are infinitely many interior tangent points common to A and B. These tangent points form a locus which is given by the intersection of Q and A (or B).
Remarks: A singular conic can only be a point, a pair of intersecting lines, a pair of parallel lines or a coincident line. Therefore, there can be only one or two tangent points between two conics in general. In contrast, a singular degenerate quadric (which is of rank less than 3) can only be a straight line, a pair of intersecting planes, a pair of parallel planes or a coincident plane. If we also include a point (which is actually a singular non-degenerate quadric of rank 3), then for these singular quadrics Q there can be only one, two or infinitely many tangent points between two quadrics A and B in general. However, the author has not analyzed the case of other singular non-degenerate quadrics (a double elliptic cone and a cylinder of the elliptic, hyperbolic or parabolic type) to arrive at any conclusion. It is possible that the above statement is still valid.
5 Discussion

It is obvious that the foregoing theorems are applicable to the case of n dimensions. All the results we obtained in the simple Illustrative Example on conics become obvious now. If we had attempted a more difficult two dimensional or a three dimensional problem, we would have been mired in excessively complicated mathematical details without being able to obtain the answer. We achieved our objective by using abstract symbolism and invariant properties of the extended A^{-1}B matrix, which circumvented substantial unnecessary algebraic manipulations. Moreover, we may apply the above process to analyze other configurations such as an ellipsoid and a hyperboloid of one or two sheets, an ellipsoid and an elliptic paraboloid, two hyperboloids of two sheets, etc. At a tangent point for each of these cases, we have to determine carefully whether ∇(r^T B r) is of the same sign as ∇(r^T A r) or otherwise. This is especially true when we deal with open surfaces such as hyperboloids of one or two sheets, elliptic paraboloids and hyperbolic paraboloids. The convention for this sign determination is intrinsically given by the polynomials associated with the quadratic surfaces. Rather than delve into these more difficult topics, it is instructive to note that all the classical analyses of lines and planes intersecting or tangential
to conics and quadrics can now be easily and compactly treated in a unified way. That is, all the equations previously derived for secants, tangents, normals and contact points can now be obtained and studied more qualitatively. Moreover, we can also extend this approach to study the intersection of other improper quadrics such as cylinders of the elliptic, hyperbolic or parabolic types with any of the five proper quadrics. Even these simpler topics will provide substantial material for further investigations.
6 Conclusion

The foregoing analysis presents a bridge between two seemingly disparate topics in mathematics. One deals with the intersection or tangency of surfaces in solid analytical geometry, while the other reveals the relationship of the eigenvalues of an associated problem in linear algebra.
References
[AG01] S. Alfano and M. L. Greer, "Determining If Two Ellipsoids Share the Same Volume," AAS/AIAA Astrodynamics Specialists Conference, Quebec, Canada, July 2001.
[Cha01] K. Chan, "A Simple Mathematical Approach for Determining Intersection of Quadratic Surfaces," AAS/AIAA Astrodynamics Specialists Conference, Quebec, Canada, July 2001.
[Pet66] A. J. Pettofrezzo, Matrices and Transformations, Dover Publications, New York, pp 103-110 (1966).
[Dre64] A. Dresden, Solid Analytical Geometry and Determinants, Dover Publications, New York, pp 197-205, 230 (1964).
[GKT68] G. A. Korn and T. M. Korn, Mathematical Handbook for Scientists and Engineers, McGraw-Hill, New York, p 75 (1968).
Applications of Shape-Distance Metric to Clustering Shape-Databases
Shantanu H. Joshi¹ and Anuj Srivastava²
¹ Department of Electrical Engineering, Florida State University, Tallahassee, FL 32310 USA. [email protected]
² Department of Statistics, Florida State University, Tallahassee, FL 32306 USA. anuj@stat.fsu.edu
Summary. Based upon the geometric approach to clustering outlined in [KSMJ04], this paper presents an application to hierarchical clustering of imaged objects according to the shapes of their boundaries. The shape-distance used in clustering is an intrinsic metric on a nonlinear, infinite-dimensional shape space, obtained using geodesic lengths defined on the manifold. This analysis is landmark free, does not require embedding shapes in ℝ^n, and uses ODEs for flows (as opposed to PDEs). Clustering is performed in a hierarchical fashion. At any level of hierarchy, clusters are generated using a minimum dispersion criterion, and an MCMC-type search algorithm is employed to ensure near-optimal configurations. The hierarchical clustering potentially forms an efficient (O(log n) searches) tool for retrieval from shape databases.
1 Introduction

Numerous problems in pattern recognition deal with classifying and labeling objects of interest present in observed images. Discriminating features of the objects can be characterized according to their location, pose, textures, colors and shapes. Shape is a feature that is receiving widespread attention in the analysis of images and in improving understanding of objects that are changing in a scene. Analysis of shapes, especially those of complex objects, is a challenging task and requires sophisticated mathematical tools. Applications of shape analysis include biomedical image analysis, automatic surveillance, biometrics, military target recognition and general computer vision. There have been various breakthroughs in the understanding of shape. A majority of this research has been restricted to landmark-based analysis [DM98], where shapes are represented by a coarse, discrete sampling of the object contours. Shape clustering and learning using the resulting Procrustean analysis are discussed in [DJD01, DSJ99]. Klassen et al. [KSMJ04, SMKJ03] propose a new metric on the space of continuous, closed curves in ℝ², without a need for defining landmarks. The computation of this metric obviates the use of
computationally expensive tools such as Euclidean embeddings and nonlinear PDEs, and uses ODEs instead. The idea involves identifying a space S of allowable shapes, imposing a Riemannian structure on it, and further utilizing its geometry to solve optimization and inference problems. For planar curves in ℝ², of length 2π and parameterized by arc length, the coordinate function α(s) relates to the direction function θ(s) according to α̇(s) = e^{iθ(s)}, i = √−1. The planar closed curves are represented by their angle functions. Considering closed curves, and making them invariant under rigid motions (rotations, translations) and uniform scaling, one obtains:

C = { θ ∈ L² : (1/2π) ∫_0^{2π} θ(s) ds = π,  ∫_0^{2π} e^{iθ(s)} ds = 0 } .   (1)

The paper [KSMJ04] describes algorithms for computing geodesic paths between any given shapes. It also defines an intrinsic mean on the shape space and describes an algorithm for computing the mean for a set of shapes. The length of the geodesic d(θ_i, θ_j) for any θ_i, θ_j ∈ S can be used to quantify shape differences. This metric, once computed, can be used to solve a host of challenging problems in image analysis, segmentation and feature extraction. In this paper in particular we focus on the clustering algorithm used to partition a group of shapes. Clustering can be further extended in a hierarchical fashion, leading to the organization of a large database of objects according to their shapes in a manner that allows for efficient searches and retrievals.
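As a rough illustration of this representation (a Python/NumPy sketch that is not part of the original paper), a discretized closed curve can be converted to an approximate direction function and the closure conditions of equation (1) checked numerically. The helper below is only a crude polygonal approximation; its name and conventions are illustrative.

import numpy as np

def direction_function(points):
    """Approximate direction function theta(s) of a closed polygonal curve,
    with the total arc length rescaled to 2*pi; returns (theta, ds)."""
    d = np.diff(np.vstack([points, points[:1]]), axis=0)   # edge vectors, closing the curve
    seglen = np.linalg.norm(d, axis=1)
    scale = 2 * np.pi / seglen.sum()
    theta = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))        # tangent angle on each edge
    return theta, seglen * scale

# Discretized unit circle, parameterized so the initial tangent angle is near 0.
t = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
pts = np.column_stack([np.sin(t), -np.cos(t)])
theta, ds = direction_function(pts)

print(np.sum(theta * ds) / (2 * np.pi))          # ~ pi: the mean-angle condition in (1)
print(np.abs(np.sum(np.exp(1j * theta) * ds)))   # ~ 0:  the closure condition in (1)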
2 Shape Clustering

An important need in shape studies is to classify and cluster previously observed shapes. In this section, we develop an algorithm for clustering of objects according to the shapes of their boundaries. It is assumed that the task of boundary extraction or segmentation of the shape from the image is already completed. Thus we are presented with a set of 2D planar closed contours in the form of their coordinate functions. The shape-distance metric is computed pair-wise between all the shapes in the set. In Euclidean spaces with the usual metric, there are several algorithms for clustering that can be adapted to the shape space S using its Riemannian structure. However, to keep the algorithm simple and efficient, we restrict ourselves to a simple modification of the k-means clustering idea. This modification is applied since the computation of means in S is an iterative and computationally costly procedure, and we try to minimize that calculation.

2.1 Minimum Pair-wise Dispersion Clustering

We view clustering as a problem of partitioning n shapes (in S) into k clusters. To motivate our algorithm, we begin with a discussion of a classical clustering
procedure for points in Euclidean spaces, which uses the minimization of the total variance of clusters as a clustering criterion. More precisely, consider a data set with n points {y_1, y_2, …, y_n} with each y_i ∈ ℝ^m. If a collection C = {C_i, 1 ≤ i ≤ k} of subsets of ℝ^m partitions the data into k clusters, the total variance of C is defined by Q(C) = Σ_{i=1}^k Σ_{y∈C_i} ||y − μ_i||², where μ_i is the mean of the data points in C_i. The term Σ_{y∈C_i} ||y − μ_i||² can be interpreted as the total variance of the cluster C_i. The total variance is used instead of the (average) variance to avoid placing a bias on large clusters, but when the data is fairly uniformly scattered, the difference is not significant and either term can be used. The widely used k-Means Clustering Algorithm is based on a similar clustering criterion (see e.g. [JD98]). The soft k-Means Algorithm is a variant that uses ideas of simulated annealing to improve convergence [BH91, R98]. These ideas can be extended to shape clustering using d(θ, μ_i)² instead of ||y − μ_i||², where d(·,·) is the geodesic length and μ_i is the Karcher mean of a cluster C_i on the shape space. The Karcher mean is an intrinsic mean on the shape space that is obtained by an iterative procedure minimizing the average variance over the set of shapes. However, the computation of Karcher means of large shape clusters is a computationally expensive operation. Thus, we propose a variation that replaces d(θ, μ_i)² with the average distance-square V_i(θ) from θ to the elements of C_i. If n_i is the size of C_i, then V_i(θ) = (2/n_i) Σ_{θ'∈C_i} d(θ, θ')². The cost Q associated with a partition C can be expressed as

Q(C) = Σ_{i=1}^k (2/n_i) ( Σ_{θ_a ∈ C_i} Σ_{b<a, θ_b ∈ C_i} d(θ_a, θ_b)² ) .   (2)
The scale factor in the above equation for Q(C) ensures that there is no additional preference given to clusters having a large number of shapes while moving shapes from one cluster to another. The algorithm is outlined next.

Algorithm 2.1 For n shapes and k clusters, initialize by randomly distributing the n shapes among the k clusters. Set a high initial temperature T.
1. Compute pairwise geodesic distances between all n shapes. This requires n(n − 1)/2 geodesic computations.
2. Pick a shape θ_j randomly. If it is not a singleton in its cluster, then compute Q_i^{(j)} for all i = 1, 2, …, k.
3. Compute the probability P_T(j, i) for all i = 1, …, k and re-assign θ_j to a cluster chosen according to the probability P_T(j, i).
4. Update the temperature using T = T/β and return to Step 2. We have used β = 1.0001 in our experiments.

The set of all configurations of n shapes into k clusters is finite, and the stochastic nature of Algorithm 2.1 guarantees the convergence to the optimal
configuration as long as T is reduced slowly. Instead of analyzing convergence theoretically, we present some experimental results. Algorithm 2.1 is applied to a collection of 25 shapes from the Kimia database [SKK03] shown in Figure 1. The left panel of Figure 1 shows the 25 shapes. The right panel of Figure 1 shows the cluster configuration for k = 6 clusters. The success of Algorithm 2.1 in clustering these diverse shapes is visible in these results; similar shapes have been clustered together.
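The following sketch (Python/NumPy, not from the original paper) illustrates Algorithm 2.1 operating on a precomputed matrix D of pairwise geodesic distances (the n(n − 1)/2 distances of Step 1). The Gibbs form of the re-assignment probability P_T(j, i) is an assumption made here for concreteness, since this excerpt does not spell it out.

import numpy as np

def cluster_cost(D, labels, k):
    """Pairwise-dispersion cost Q of equation (2) for a given assignment."""
    Q = 0.0
    for i in range(k):
        idx = np.flatnonzero(labels == i)
        if idx.size > 1:
            Q += (2.0 / idx.size) * np.sum(np.triu(D[np.ix_(idx, idx)], 1) ** 2)
    return Q

def anneal_clusters(D, k, T=1.0, beta=1.0001, sweeps=20000, rng=None):
    """Sketch of Algorithm 2.1: stochastic re-assignment with a slowly decreasing temperature."""
    rng = np.random.default_rng() if rng is None else rng
    n = D.shape[0]
    labels = rng.integers(k, size=n)
    for _ in range(sweeps):
        j = rng.integers(n)
        if np.sum(labels == labels[j]) == 1:       # Step 2: skip singletons
            T /= beta
            continue
        costs = np.empty(k)
        for i in range(k):                         # cost of moving shape j into cluster i
            trial = labels.copy(); trial[j] = i
            costs[i] = cluster_cost(D, trial, k)
        p = np.exp(-(costs - costs.min()) / T)     # assumed Gibbs re-assignment probability P_T(j, i)
        labels[j] = rng.choice(k, p=p / p.sum())
        T /= beta                                  # Step 4: cooling
    return labels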
Fig. 1. Left panel shows 25 objects whose shapes are analyzed here. They are numbered from top left to bottom right in increasing order. Right panel shows a clustering arrangement (row-wise) for the 25 objects into 6 clusters.
3 Hierarchical Organization

An important goal of this paper is to organize large databases of shapes in a fashion that allows for efficient searches. One way of accomplishing this is to organize shapes in a tree structure, such that shapes are refined regularly as we move down the tree. In other words, objects are organized (clustered) according to coarser differences (in their shapes) at top levels and finer differences at lower levels. This is accomplished in a bottom-to-top construction as follows: start with all the shapes at the bottom level and cluster them according to Algorithm 2.1 for a pre-determined k. Then, compute the mean shape for each cluster and, at the second level, cluster these means according to Algorithm 2.1. Applying this idea repeatedly, one obtains a tree organization of shapes in which shapes change from coarse to fine as we move down the tree. Critical to this organization is the notion of the intrinsic mean of shapes as discussed earlier. Shown in Figure 2 is an example of a tree structure obtained for the clustering result for the 25 shapes shown in Figure 1. At the bottom level, these 25 shapes are clustered in k = 6 clusters, with the clusters denoted by the indices of their element shapes. Computing the means of each of these six clusters, we obtain the
next-to-bottom row of shapes. Repeating the clustering for k = 4 clusters, we obtain the next level and subsequently their means. In this example, we have chosen to organize shapes in four levels with a single shape at the top level. The choice of parameters such as the number of levels, and the number of clusters at each level, depends on the required search speed and performance.
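A schematic of this bottom-to-top construction is sketched below (not part of the original paper). Here cluster_fn, mean_fn and dist_fn stand for Algorithm 2.1, the Karcher-mean computation and the geodesic shape distance respectively; none of them are implemented here, so the function is only an outline of the bookkeeping.

import numpy as np

def build_shape_tree(shapes, ks, cluster_fn, mean_fn, dist_fn):
    """Bottom-up tree of the kind shown in Figure 2, e.g. ks = (6, 4, 1)."""
    levels = []
    current = list(shapes)
    for k in ks:
        D = np.array([[dist_fn(a, b) for b in current] for a in current])
        labels = cluster_fn(D, k)
        levels.append((current, labels))
        current = [mean_fn([s for s, l in zip(current, labels) if l == i])
                   for i in range(k)]          # cluster means become the next level up
    return levels, current                     # 'current' holds the top-level shape(s)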
Fig. 2. Hierarchical Organization of 25 shapes
4 Conclusion

We have presented a hierarchical organization of shapes based upon the shape-distance metric which utilizes the Riemannian structure of the shape space. Clustering is performed efficiently by minimizing the pair-wise average variance within the clusters and can be used in clustering of shape databases of objects. Hierarchical clustering reduces the search and test times for shape
queries against large databases. This has enormous potential for systems which use shape based object retrieval.
References
[BH91] Brown, D.E. and Huntley, C.L. (1991), "A practical application of simulated annealing to clustering," Technical Report IPC-TR-91-003, Institute for Parallel Computing, University of Virginia, Charlottesville, VA.
[DM98] Dryden, I.L. and Mardia, K.V. (1998), Statistical Shape Analysis, John Wiley & Sons.
[DJD01] Duta, N., Jain, A.K. and Dubuisson-Jolly, M.-P. (2001), "Automatic construction of 2D shape models," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 5.
[DSJ99] Duta, N., Sonka, M., and Jain, A.K. (1999), "Learning shape models from examples using automatic shape clustering and Procrustes analysis," Proceedings of Information in Medical Image Processing, Vol. 1613 of Springer Lecture Notes in Computer Science, pp. 370-375.
[JD98] Jain, A.K. and Dubes, R.C. (1998), Algorithms for Clustering Data, Prentice-Hall.
[KSMJ04] Klassen, E., Srivastava, A., Mio, W., and Joshi, S.H. (2004), "Analysis of planar shapes using geodesic paths on shape spaces," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 3, pp. 372-383.
[R98] Rose, K. (1998), "Deterministic annealing for clustering, compression, classification, regression, and related optimization problems," Proceedings of the IEEE, 86(11):2210-2239.
[SKK03] Sebastian, T.B., Klein, P.N. and Kimia, B.B. (2003), "On aligning curves," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 1.
[SMKJ03] Srivastava, A., Mio, W., Klassen, E. and Joshi, S.H. (2003), "Geometric Analysis of Continuous, Planar Shapes," Proceedings of Energy Minimization Methods in Computer Vision and Pattern Recognition, Vol. 2683 of Springer Lecture Notes in Computer Science, pp. 341-356.
Accurately Computing the Shape of Sandpiles*
Christopher M. Kuster and Pierre A. Gremaud
Department of Mathematics and Center for Research in Scientific Computation, Raleigh, NC 27695, USA. {cmkuster,gremaud}@math.ncsu.edu
Summary. Using an Eikonal formulation, we model the surface of sandpiles formed in regions containing obstacles. The fast marching method is adapted to have the optimal rate of convergence. We also apply the fast marching method to an industrial problem.
Keywords: Fast Marching, upwind methods, Eikonal equation.
1 Introduction

There are many examples of granular materials in industrial applications. These range from ore in mining operations to fine powders in the pharmaceutical industry. It is common to store these materials in piles, either in bins or in the open. Often, the area in which these piles are formed contains solid obstacles which the grains must flow around in the forming of the pile. Our goal is to accurately measure the volume of such piles using only observations of the surface. In [ABG03], we created a mathematical model of the system, discretized the corresponding equations, and used the Fast Marching method to solve for the surface profile. This gave a method that converged to the proper solution upon refinement, but it did so more slowly than expected. Here, we modify the procedure to improve accuracy. The only physical parameter that we consider for our model is the angle of repose. This is the maximum angle of inclination the surface of the grain can have and still be at rest. It roughly corresponds to the angle of internal friction of the material. For the purposes of our model, we assume that the angle of inclination of the surface from vertical is equal to the angle of repose at every point. In one dimension, our assumption leads to the equation

* This research was supported by the National Science Foundation (NSF) through grant DMS-0244488.
|f′(x)| = tan δ ,

where f(x) defines the height of the material at a point x, and δ is the angle of repose. Requiring this equation to be satisfied nearly everywhere leads to an infinite number of solutions (see Figure 1 (left)). In order to obtain a unique solution, we add the condition that the surface decrease with distance from the source point (see Figure 1 (right)). This is commonly referred to as the viscosity solution. Generalizing this to multiple dimensions results in the Eikonal equation

|∇f(x)| = tan δ .
Fig. 1. Possible solutions to the 1-D equation Obstacles can be modeled by modifying the right hand side in an appropriate way, namely
\yf{x,y)\
=
S{x,yJ{x,y))
where S, the slowness function, is defined by f tan ^ if {x, y, f{x, y)) ^ Obstacle, loo if (a;,y,/(x,y)) S Obstacle Travel time problems are closely related to the above formulation. In this paper, we look at prismatic obstacles. That is, obstacles with a cross sectional area that does not change with height. In this case, the slowness function is only a function of x and y. It is possible that our methods can be adapted to non-prismatic obstacles.
2 Numerical Methods

One standard way of solving Eikonal equations is the Fast Marching Method (FMM) ([Tsi95, Set99]). The FMM is a wave propagation method based on upwind formulations. It starts with known values on a region and propagates the solution outward from there. The FMM is described by the following algorithm
    f = (∞, …, ∞)
    "front" = "source"
    while "front" ≠ ∅
        f_min = min f ∈ "front"
        remove f_min from "front"
        add neighbors of f_min to "front"
        for f_ℓ ∈ "front"
            solve F(f_1, …, f_{ℓ−1}, f_ℓ, f_{ℓ+1}, …, f_N) = 0 for f_ℓ
        end
    end
In order to find f_min, Fast Marching uses a heap sort [Wil64]. This is a data structure that requires O(log n) operations in order to add, remove, or update a node, where n is the number of nodes in the data structure. The Fast Marching Method makes the assumption that the slowness function is continuous. This is not the case in our problem, and in fact, our slowness function has a singularity inside every obstacle. When applying the FMM with a uniform mesh (see figure 2 (right)) to a region with a cylindrical obstacle, a loss of accuracy is observed. By adding a small number of extra mesh points around the boundary of the obstacle, the FMM can be made to work well in a region with singularities. In particular, new mesh points are added at the intersection of the lines in the uniform mesh with the obstacle boundary (see figure 2 (left)).
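For concreteness, the sketch below (Python, not part of the original paper) implements the marching loop with a binary heap on a plain uniform grid, using the standard first-order two-neighbor update. It ignores the extra boundary nodes discussed above and approximates the infinite slowness inside obstacles by a large finite value.

import heapq
import numpy as np

def fast_march(slowness, h, sources):
    """First-order Fast Marching on a uniform grid: |grad f| = slowness, f = 0 on 'sources'.
    'slowness' is a 2-D array (tan(delta), or a large value inside obstacles)."""
    ny, nx = slowness.shape
    f = np.full((ny, nx), np.inf)
    accepted = np.zeros((ny, nx), dtype=bool)
    heap = []
    for (i, j) in sources:
        f[i, j] = 0.0
        heapq.heappush(heap, (0.0, i, j))
    while heap:
        fij, i, j = heapq.heappop(heap)
        if accepted[i, j]:
            continue
        accepted[i, j] = True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < ny and 0 <= b < nx and not accepted[a, b]:
                fx = min(f[a, b - 1] if b > 0 else np.inf, f[a, b + 1] if b < nx - 1 else np.inf)
                fy = min(f[a - 1, b] if a > 0 else np.inf, f[a + 1, b] if a < ny - 1 else np.inf)
                rhs = slowness[a, b] * h
                lo, hi = sorted((fx, fy))
                if hi == np.inf or hi - lo >= rhs:            # one-sided update
                    new = lo + rhs
                else:                                         # two-sided quadratic update
                    new = 0.5 * (lo + hi + np.sqrt(2 * rhs**2 - (hi - lo)**2))
                if new < f[a, b]:
                    f[a, b] = new
                    heapq.heappush(heap, (new, a, b))
    return f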
Fig. 2. Uniform Grid (left), Modified Grid (right)
This addition of nodes creates a non-Cartesian mesh near the obstacle. We use the process described in [SV00, SV03], and consider a node x_0 at which the solution f is to be computed and two neighboring nodes x_1 and x_2 at which the values of f (f_ℓ, ℓ = 1, 2) and its derivatives (∂_x f_ℓ, ∂_y f_ℓ, ℓ = 1, 2) are known or have already been computed (see Figure 3). We define

N_ℓ = (x_0 − x_ℓ)/|x_0 − x_ℓ| ,  ℓ = 1, 2,   and   𝒩 = [N_1  N_2]^T ,
where 𝒩 is a 2 × 2 nonsingular matrix, assuming x_0, x_1 and x_2 are not co-linear. The directional derivatives in the directions N_1 and N_2 are approximated by

D_ℓ f = (f_0 − f_ℓ)/|x_0 − x_ℓ| ,  ℓ = 1, 2  (to 1st order),   (1)

D_ℓ f = 2 (f_0 − f_ℓ)/|x_0 − x_ℓ| − N_ℓ^T [∂_x f_ℓ, ∂_y f_ℓ]^T ,  ℓ = 1, 2  (to 2nd order).   (2)

Those approximate directional derivatives are linked to the gradient by

D_ℓ f = N_ℓ · ∇f + O(|x_0 − x_ℓ|^α)   or   Df = 𝒩 ∇f + O(h^α),   (3)

where Df = [D_1 f, D_2 f]^T, h = max{|x_0 − x_1|, |x_0 − x_2|}, and α = 1 or 2 depending on whether (1) or (2) is respectively used. Solving (3) for ∇f and plugging into the Eikonal equation results in the quadratic equation defining the unknown f_0

Df^T (𝒩 𝒩^T)^{−1} Df = (S(x_0))² .   (4)
Fig. 3. Direction vectors for non-Cartesian grid.
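A minimal sketch of this local solver (Python/NumPy, not part of the original paper) is given below. It uses the first-order approximation (1) for the directional derivatives and returns the upwind (largest) root of the quadratic (4), falling back to a one-sided update when the discriminant is negative.

import numpy as np

def local_update(x0, x1, x2, f1, f2, S0):
    """Solve the quadratic (4) for f0 at node x0, with neighbor values f1, f2 known."""
    x0, x1, x2 = map(np.asarray, (x0, x1, x2))
    h1, h2 = np.linalg.norm(x0 - x1), np.linalg.norm(x0 - x2)
    N = np.vstack([(x0 - x1) / h1, (x0 - x2) / h2])   # rows N_1^T, N_2^T
    M = np.linalg.inv(N @ N.T)                        # (N N^T)^{-1} in equation (4)
    w = np.array([1.0 / h1, 1.0 / h2])                # Df = f0 * w - v under formula (1)
    v = np.array([f1 / h1, f2 / h2])
    a = w @ M @ w
    b = -2.0 * w @ M @ v
    c = v @ M @ v - S0**2
    disc = b * b - 4.0 * a * c
    if disc < 0.0:                                    # fall back to a one-sided update
        return min(f1 + h1 * S0, f2 + h2 * S0)
    return (-b + np.sqrt(disc)) / (2.0 * a)           # upwind root

# Reduces to the usual Cartesian two-sided formula when the neighbors lie on grid lines:
print(local_update((0.0, 0.0), (-1.0, 0.0), (0.0, -1.0), 0.0, 0.0, 1.0))  # ~ 0.7071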
3 Results

3.1 Without Obstacle

With no obstacle, the solution is a cone, see Figure 4 (left). Since there is no obstacle, the slowness function is continuous and the results are identical to those obtained by the Fast Marching Method on a uniform grid. The order of convergence is first order or second order depending on the approximation used for the directional derivatives, (1) or (2).
Fig. 4. Contours of surface ( f ) with and without obstacles
3.2 With Cylindrical Obstacle

The exact solution is the generalized distance function from the source. This is consistent with observation (see Figure 4 (right)). Table 1 is a grid refinement study looking at the convergence rate of the L¹ and L∞ errors. With a uniform grid, accuracy is lost due to the singularity at the obstacle. By adding in nodes around the obstacle boundary, the second order accuracy expected from our second order discretization is preserved in the L¹ norm. The first order results obtained for the L∞ norm are expected due to the "shock" behind the obstacle. Figure 5 shows the absolute error in the solutions produced using the uniform grid (left), and the modified grid (right) with M = 1600.
Fig. 5. Error plots for: Uniform Grid (left), Modified Grid (right)
3.3 Application
An application of this process is the determination of the volume of material in a bin with inserts, see figure 6. From this picture, it can be seen that the insert is not a prismatic obstacle. Preserving optimal accuracy is much more challenging here as the (x, y)-projection of the line of contact between
M 25 50 100 200 400 800 1600 3200
L' 2.91(0) 1.71(0) 1.07(0) 6.42(-l) 3.76(-l) 2.21(-1) 1.31(-1) 7.80(-2)
M 25 50 100 200 400 800 1600 3200
L' 2.52(0) 1.45(0) 6.36(-l) 3.02(-l) 1.52(-1) 6.91(-2) 3.42(-2) 1.48(-2)
first order methods standard method rate L' rate L°° rate L' 9.00(-l) 6.11(-1) 1.13(0) .77 5.24(-l) .78 3.52(-l) .79 7.58(-l) .68 3.27(-l) .68 2.23(-l) .66 4.41(-1) .74 2.00(-l) .71 1.46(-1) .61 2.39(-l) .77 1.20(-1) .74 9.36(-2) .64 1.27(-1) .77 7.24(-2) .73 6.07(-2) .62 6.38(-2) .76 4.40(-2) .72 3.82(-2) .67 3.07(-2) .75 2.68(-2) .72 2.44(-2) .65 1.49(-2) second order methods standard method rate L ^ rate L°° rate L' 1.07(0) 8.99(-l) 1.17(0) .80 5.94(-l) .85 7.39(-l) .28 2.83(-l) 1.2 2.45(-l) 1.3 3.10(-1) 1.3 1.23(-1) 1.1 1.20(-1) 1.0 1.73(-1) .84 3.29(-2) .99 6.16(-2) .97 8.49(-2) 1.0 1.52(-2) 1.1 2.91(-2) 1.1 4.41(-2) .95 2.06(-3) 1.0 1.45(-2) 1.0 2.31(-2) .93 4.82(-4) 1.2 6.36(-3) 1.2 1.12(-2) 1.0 1.15(-4)
modified method rate L^ rate L°° 3.04(-l) 1.44(-1) .58 2.00(-l) .60 9.18(-2) .78 1.13(-1) .82 4.92(-2) .88 6.10(-2) .89 2.65(-2) .92 3.24(-2) .91 2.02(-2) .99 1.64(-2) .99 1.54(-2) 1.1 7.91(-3) 1.0 1.14(-2) 1.0 3.88(-3) 1.0 6.77(-3) modified method rate L°° rate 1/ 2.84(-l) 1.16(-1) 2.0 9.59(-2) 1.6 4.54(-2) 1.2 4.26(-2) 1.2 1.97(-2) 1.9 1.14(-2) 1.9 1.22(-2) 1.1 5.32(-3) 1.1 5.93(-3) 2.9 8.22(-4) 2.7 3.11(-3) 2.1 2.47(-4) 1.7 1.68(-3) 2.1 7.64(-5) 1.7 8.83(-4)
rate .65 .88 .91 .39 .39 .43 .75
rate 1.4 1.2 .70 1.0 .94 .89 .93
Table 1. Convergence study for formally first and second order methods in the presence of a circular obstacle (Example 2); M measures the number of nodes on one edge of the domain, i.e., total number of nodes N = O(M²).
material and obstacle is not known in advance. The results shown below were computed on a uniform grid and are thus not optimally accurate. By applying the fast marching process to several different source heights, a chart can be made of the volume of material in the bin as a function of the height of the surface at the source. Figure 7 shows such a chart for materials with several different angles of repose.
4 Conclusions

An Eikonal formulation of general sandpile formation is proposed. The problem is discretized and solved. It is shown how to maintain optimal accuracy with fast marching when obstacles are present. The feasibility of the approach is illustrated by the solution of an industrial problem. Future work will address how to maintain optimal accuracy for non-prismatic obstacles.
Fig. 6. Example of a bin with an insert
[Plot: volume in the hopper with bin insert as a function of the surface height at the source (feet), for materials with angles of repose of 30, 40 and 50 degrees.]
Fig. 7. Volume as a function of height for various materials
References
[Abg96] R. Abgrall, Numerical discretization of the first order Hamilton-Jacobi equations on triangular meshes, Comm. Pure Appl. Math., 49 (1996), pp. 1339-1377.
[ABG03] S.A. Ahmed, R. Buckingham, P.A. Gremaud, C.D. Hauck, C.M. Kuster, M. Prodanovic, T.A. Royal, V. Silantyev, Volume determination for bulk materials in bunkers, submitted to Int. J. for Num. Meth. in Eng., Center for Research in Scientific Computation, NCSU, Technical Report CRSC-TR03-24.
[HS99] C. Hu and C.-W. Shu, A discontinuous Galerkin method for Hamilton-Jacobi equations, SIAM J. Sci. Comput., 21 (1999), pp. 666-690.
[OS91] S. Osher and C.-W. Shu, High-order essentially nonoscillatory schemes for Hamilton-Jacobi equations, SIAM J. Numer. Anal., 28 (1991), pp. 907-922.
[QS02] J. Qian and W.W. Symes, An adaptive finite difference method for traveltimes and amplitudes, Geophysics, 67 (2002), pp. 167-176.
[Set99] J.A. Sethian, Fast marching methods, SIAM Review, 41 (1999), pp. 199-235.
[SV00] J.A. Sethian and A. Vladimirsky, Fast methods for the Eikonal and related Hamilton-Jacobi equations on unstructured meshes, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 5699-5703.
[SV03] J.A. Sethian and A. Vladimirsky, Ordered upwind methods for static Hamilton-Jacobi equations: theory and algorithms, SIAM J. Numer. Anal., 41 (2003), pp. 325-363.
[Tsi95] J.N. Tsitsiklis, Efficient algorithms for globally optimal trajectories, IEEE Trans. Automat. Control, 40 (1995), pp. 1528-1538.
[Wil64] J.W.J. Williams, Heapsort, Comm. ACM, 7 (1964), pp. 347-348.
Shape Optimization of Transfer Functions*
Jiawang Nie and James W. Demmel
Department of Mathematics, University of California, Berkeley, CA 94710, USA. {njw,demmel}@math.berkeley.edu
Summary. We show how to optimize the shape of the transfer function of a linear time invariant (LTI) single-input-single-output (SISO) system. Since any transfer function is rational, this can be formulated as an optimization problem for the coefficients of polynomials. After characterizing the cone of polynomials which are nonnegative on intervals, we formulate this problem using semidefinite programming (SDP), which can be solved efficiently. This work extends prior results for discrete LTI SISO systems to continuous LTI SISO systems.
Keywords: Linear system, transfer function, shape optimization, nonnegative polynomials, convex cone, semidefinite programming.
1 Introduction

Consider the following linear time invariant (LTI) single-input-single-output (SISO) system

ẋ = Ax + bu   (1)
y = c^T x + du   (2)

where A ∈ ℝ^{n×n}, b, c ∈ ℝ^n, d ∈ ℝ. The transfer function is H(s) = d + c^T (sI − A)^{−1} b, which can also be written as the rational function

H(s) = q_1(s)/q_2(s) .

Note that deg(q_1) ≤ deg(q_2) ≤ n. Conversely any function H(s) of this kind is the transfer function of some LTI system. Any such LTI system is called a realization of H(s). There are many such (algebraically equivalent) LTI systems [CD91, chap. 9].

* This research was supported by the National Science Foundation Grant No. EIA-0122599.
In many engineering applications, we want the transfer function to have certain attractive properties. For example, we may want the Bode plot (the graph of |H(s)| along the pure imaginary axis s = j·ω) to have a certain shape corresponding to some kind of filtering. In this paper we study the shape optimization problem of choosing the coefficients of the rational function H(s) so its Bode plot has some desired shape. Now consider a discrete LTI system, i.e. the governing differential equation (1)-(2) is replaced by the difference equation Σ_k α_k z(n − k) = Σ_ℓ β_ℓ u(n − ℓ), where {u(k)} is a sequence of discrete inputs and {z(k)} is the sequence of state variables. In this case there are several nice papers [AV02, GHN00, WBV97] that show how to formulate the filter design problem as the solution of the feasibility problem for certain convex sets. The main idea is to apply the spectral factorization of trigonometric polynomials, a characterization of nonnegative univariate polynomials, and semi-infinite programming. This approach can be used to design the transfer function to be a bandpass filter, piecewise constant or polynomial, or even have an arbitrary shape. Our contribution is to extend these results to continuous time LTI SISO systems (1)-(2). In this case the transfer function is not a trigonometric polynomial and hence we cannot directly apply spectral factorization. Fortunately our transfer function is a univariate rational function, which lets us apply certain characterizations of nonnegative univariate polynomials over the whole axis (−∞, ∞), the semi-axis (0, ∞), or some finite interval [a, b]; see Section 2. Using these characterizations, we show how to solve the shape optimization problem for the following shapes:
1. standard bandpass filter design;
2. arbitrary piecewise constant shape;
3. arbitrary piecewise polynomial shape;
4. general nonnegative function.
We will show that the first three shape optimization problems can be solved by testing the feasibility of certain convex sets, which are the intersections of certain hyperplanes and the cone of semidefinite matrices. This feasibility testing can be done efficiently using semidefinite programming (SDP) [VB96]. The fourth shape optimization problem can be solved by semi-infinite programming (SIP) [P0I97, WBV97]. We introduce some notation. For any m G N, denote by 5"* the vector space of m — by — m symmetric matrices, and let S^ be the intersection of S^ and the positive semidefinite matrices. A y B{A y B resp.) means that A — B is positive definite (semidefinite resp.). [r\ denotes the largest integer no greater than r. deg{p) is the degree of the polynomial p(-). Given a cone K C K^, y ^K 0 means that y G mtK, the interior of K. K* denotes the dual cone of K, i.e., /f* = {u e M^ : u^y > 0, Vy G K). The rest of this paper is organized as follows. In Section 2 we give a characterization of the cone of polynomials which are nonnegative on certain intervals. In Section 3, we reformulate the shape optimization problem for
Shape Optimization of Transfer Functions
315
transfer functions to be convex optimization, and also discuss related work. In Section 4 we show how to recover the transfer function from its absolute value. Section 5 draws conclusions.
2 Cone of nonnegative polynomials on intervals We characterize univariate polynomials which are nonnegative on certain intervals. For a survey paper see [PROO]. First, we characterize the nonnegative polynomials on the positive semiaxis [0, oo). The following result is due to Markov and Lukacs about one century ago. Theorem 1 (Markov, Lukacs [Lukl8, Mar48, PS76]). Let q{t) G be a real polynomial of degree n. Let rii = [^J and n2 = L^^J • //'?(*) ^ 0 for all f > 0, then q{t) = qiit^ + ^92(0^ where deg{qi) < n\ and deg{q2) < n2Now we apply this theorem to characterize the transfer function, which is similar to the spectral factorization for trigonometric polynomials. Observe that
l92(jt^)P
\q2,even{j(^) + q2,odd{joj)\'^
921(^2)2-I-w2g22(w2)2
___ Pljw) 2 = —-,—r where w = u P2[w) Here qi^even and qi^odd denotes the even and odd parts of the polynomial gj, and qij,i,j = 1,2 are defined accordingly. Note that pi{w) and P2{w) are nonnegative polynomials on w £ [0, oo). Conversely, by Theorem 1, given any such nonnegative pi{w) and P2{w), it is possible to reconstruct the qij{w), and so qi{ju)) and H{jio). In other words, p\{w) and P2{w) with deg{pi) < deg{p2) satisfy \H{ju)\'^ = Pi{w)/p2{w) where w = u'^ for some transfer function H{juj) if and only if they are nonnegative on [0, oo). The characterization of polynomials nonnegative on some finite interval [a, h] is analogous: Theorem 2 (Markov, Lukacs [Lukl8, Mar48, PS76]). Letq{t) € R[t] be a real polynomial. Suppose q{t) > 0 for all t G [o, b], then one of the following holds. 1. If deg[q) = n = 2m is even, then q{t) = gi(t)^ + (* •" o-){b — 092(0^ where deg{qi) < m and deg{q2) < m — 1. 2. If deg{q) = n = 2TO + 1 is odd, then q{t) = {t - a)qi{t)'^ + (6 - t)q2{tf where deg{q{) < m and deg{q2) < m.
316
Jiawang Nie and James W. Demmel
In our algorithms we will need to compute the polynomials qi{jio) from Pi{w), i.e. we need computationally effective versions of Theorems 1 and 2. These are given in section 4. To make the connection to semidefinite programming, we next characterize the polynomials nonnegative on an interval (either [0, oo) or [a,b]) by using certain convex cones. As introduced in [NesOO], define the vector of monomials v{t) = [1 t i^ • • • t""]"^ and the two convex cones of polynomials Ko,co = {P G K"+^ : P^v{t) > 0 V i > 0} Ka,b = {pe ]R"+i : p'^v{t) > 0 V ^ e [a, b]}. Let Hn,i G 5"+i be the i-th Hankel matrix, i.e., H^\=\^' ' lO,
if /e + / = i + 1, otherwise.
As introduced in [NesOO], define linear operators
by the following 2rn + l
^l{v) = Yl
2n2 + l
'^i^rii,i:
^2(-y) = Yl
i=l
^i+l^n2,i-
i=l
Another two operators A^, and A/^ are defined according to whether n is even or odd. When n = 2m, yl3 : ]R"+i-> S'™+\ Ai-.W+^^S"^ are defined as 2m+l M{v)
=
2m-1
Yl '"i^rn.i, i=l
Ai{v)
=
^ [(a + b)ViJ^i - Vi+2 " i=l
aK]^m-l,i-
When n = 2m + 1, yl3 :M»+i ^5-"^+!,
yl4 : ]R"+i-^ 5""+i
are defined as 2m+l
^3(w) = YZ [^»+l ~ "^il-^n^.i' i=l
2m + l
^4(^) = Yl, [^^» ~ •'^i+ll'f^m.ii=l
Let yli,yl2,yl3,^4 be their adjoint operators respectively, with respect to the inner product < A,B >= trace{A^B) for symmetric matrices of the same size. The following theorem is a compact characterization of cones ii'o.ooi -^o,b and their dual cones.
Shape Optimization of Transfer Functions
317
Theorem 3 (Nesterov [NesOO]). The cones Ko^(X), Ka^b can be chcLrcictevized as follows 1- -f^o.oo f^fid its dual KQ ^ are characterized as follows: i^o.oo = {P G R"+i : p = AUYi) + A*^iY2), Yi e 5!f>+\F2 G
Sl'^'},
^o.oo = {c G lR"+i : yli(c) h 0, A2{c) h 0}; 2. when n = 2m is even, Ka,b = {P G R"+i : p = AliYs) + yi:(F4),1^3 G 5!f'+\y4 G S!^}, Kl,
= {c G ]R"+i : yl3(c) ^ 0,yl4(c) >r 0};
when n — 2m + 1 is odd, K,,b = {P G R"+i : p = AUYs) + A^Y^), Y3 G 5!^+\y4 G 5!^+i}, i r ; ^ = {c G R"+i : .13(c) h 0 , ^ ( c ) ^ 0}; 3. Both Ko^aoiKa,b) and KQ^{K*I^) with non-empty interiors.
are convex, closed, and pointed cones
Now suppose we have L subintervals of [0,00): {[ai,bi]}i=,i- Let K = i^o.oo X Ka,,bi X • ••Ka^,bj^. Then its dual K* = K^^ x K*^f,^ x • • • K*^,,^. Given a matrix A of (L + l)(n + 1) rows and 2(n + 1) columns, consider the following problem: find a vector ( if it exists ) p e K^Cn+i) s.t. Ap G K. This can be done by solving a SDP feasibility problem by Theorem 3, say, using the SDP solver in [Stu99]. However it will introduce 2{L+ 1) symmetric matrices of size [n/2] or [n/2] + 1. In order to use interior-point methods to solve it, the complexity of one iteration will be at least 0(2(L + ^)n^) arithmetic operations. Fortunately, the dual cone K* (^ does not involve two symmetric matrices. A natural barrier function [NN94] for K* is given by L
F{c) = -lndetyli(co) -lndetyl2(co) - ^ ( l n d e t 4 ' ^ ( c i ) + Indetyl^'^(ci)), where 713(714) is the operator A3{A4) corresponding to Kai,bi in Theorem 3. Here the vector c = (CQ, • • • , cz,) G IR""*"-^ x • • • R""'"^ Now solve the following L+i times analytic center problem: min F{c) s.t. A'^c = 0,ceintK*.
(3) (4)
318
Jiawang Nie and James W. Demmel
The barrier function F{c) will tend to infinity as c approaches dK*. Hence the minimum will be attained in the interior of K*, which is not empty as guaranteed by Theorem 3. The optimality condition is that
VF{c)=AX,
ceintK*;
A'^c = 0. The optimal solution c* and its Lagrange multiplier A* can be found very efficiently using Newton's method. For any c G intK*, it can be shown [NN94] that VF(c) -
3 Shape optimization In this section, we will show how to design the transfer function of a LTI SISO system so that it has a desired Bode plot. Four kinds of shapes will be discussed: standard bandpass filter, piecewise constant, piecewise polynomial, and general shapes. 3.1 B a n d p a s s filter design The goal is to design a transfer function \H{ju)\'^ = ^ 4 ^ which is close to one on some squared frequency {w = w^) interval [u)^,u;''] and tiny in a neighborhood just outside this interval. The design rules can be formulated as Pi{w),P2{w) > 0, y w >0 P2{w) Pijw) <S, Piiw)
VwG[wlw'2]U[w{,wl2]
where the interval [wfjtOg] is to the left of \w^,w^], and [101,102] ^^ t° ^^'^ right. Here a and f3 are tiny tolerance parameters (say around .05). Let pi and p2 be the vectors of coefficients of pi{w) and P2{w) respectively. Then the constraints above can we restated as
Shape Optimization ofTransfer Functions
319
Pi,P2 e Xo.oo pi - (1 - a)p2 e K^e^^r
Using Theorem 3, we see that the above cone constraints can be expressed as Ap £ K where • In+i
0
0 In+l In+1 [a - l ) / „ + l A = -In+l (l+/3)/„+l —In+1 SIn+1 _—In+l ^In+1
P
and K ^ KQ^OO X ^O,OO X -f'^^^iu'- x -^-u^'.-iu x-f^w«,^ xii'^.^^r. Given (Q;,/3, 5), solve the analytic center problem (3)-(4) and then recover the coefficients p. As introduced in [GHNOO] for the discrete case, we can also consider the following objectives: • • •
minimize a + /9 for fixed S and n minimize S for fixed a, /3, and n minimize the degree n of pi and p2 for fixed a, /?, and 6.
These optimization problems with objectives are no longer convex, but quasiconvex. This means that we can use bisection to find the solution by solving a sequence of analytic center problems. A design example is given in figure 1. For the simplicity of programming, we used SeDuMi[Stu99] to solve the primal feasibility problem.
Fig. 1. The design filter shape for [w' w''] = [2 3], [w' w'2] = [0 1.8], [wl w^] = [3.2 5], a = /3 = 0.05, 5 = 0.05, n = 10.
320
Jiawang Nie and James W. Demmel
3.2 Piecewise constant shape design Here we extend the shape design technique from the last section to piecewise constant shapes. In other words we want the transfer function to be close to given constant values ci, ...,c„ in a set of m disjoint intervals ui"^ = w G \ak,bk], where ai < foi < a2 < 62 < • • • < Om < ^m- More precisely we want the transfer function to lie in the interval [{l—a)ck, (1 + /3)cfc] for w G [0^,6^]. By picking enough intervals (picking m large enough) we can approximate any continuous function as closely as we like. These constraints may be written Pii'w),P2{w) > 0, y w >0 pi{w) (1 - a)ck < < (1 + I3)ck, y w e [ak,bk], k = !,••• P2{w)
,m.
Using Theorem 3 as before, these constraints can be rewritten as the cone constraints Pliw),P2{w)
pi - (1 -a)ckP2,
e /tTo.oo
(l+/3)cfcP2 - P i e Ka^,bk,k = !,•
, m.
As before, find vector p such that Ap G K where 0 0 'n+1 /n+1 (a - l)ci/„+i (l+/?)ci/„+i -/„+i l)Cm-^n+l
(1 + f3)CmIn+l •X K? {, . Solve the analytic center problem and K ^ 0 , 0 0 ^ -^ai.bi ^ (3)-(4) again and recover the vector p. As in the preceding subsection, various design objectives can be considered by applying bisection. A design example for a step function with 3 steps is given in figure 2. 3.3 Piecewise polynomial shape design Here we extend the techniques of the last section to piecewise polynomials. Thus, on each interval [afc,&fc] we ask that Pi{w)/p2{w) be close to a given polynomial (j>k{w)-, in particular that it lie in an interval [(1 — a)<j)k{w)^ (1 + (5)(j)k{w)]. This leads to the constraints Pi{w),P2{w) > 0, V 10 > 0 {l-a)<j>k{w)k{w), P2\W)
Vu;G[afc,5fc],fc = l,--. ,m.
Shape Optimization of Transfer Functions
321
Fig. 2. The step function shape design for [ai bi] = [0 1.8], [02 62] = [2 3], [as 63] [3.2 5], ci = 1,C2 = 3,C3 = 2, a = /3 = 0.Q5, n = 10.
Using Theorem 3 once again, we transform these to the following cone constraints
Pi - (1 - a)4>k{w)p2{'w) e Kaf,,bk,k = !,••• ,m; (1 + (3)p2{w)(f)k{w) - p i G Ka^,bt,k = !,•••
,m.
which are again a set of linear equations and LMI's. As before, pi and P2 can be obtained by solving some appropriate analytic center problem (3)-(4), and bisection can be used to achieve certain design goals. 3.4 General shape design So far we have considered bandpass filter design, piecewise constant shape design, and piecewise polynomial shape design. Here we discuss general shape design. The goal is to design a transfer function \H{ju>)\'^ = ^^\^l so that it behaves like some general nonnegative function f{w) for w G [a, b] where 0 < a < 6. In other words we want: Piiw),p2{w) >0,yw& [0,00) Pijw) (l-a)/H< <{l+p)f{w), \/w€[a,b]. P2{w)
(5) (6)
Now we can not apply Theorem 3 directly, and must instead apply approximation methods. One obvious approach is to partition [a,b] into subintervals {[afc,&fc]}fcLi and approximate f{w) by a constant or more general polynomial in each subinterval. Then we can apply the method from the preceding sections. Another approach is to apply semi-infinite programming(SIP), as described in [WBV97]. The idea is to choose N sample points
322
Jiawang Nie and James W. Demmel a < Wi < W2 < • • • < wj^ < b
and replace the semi-infinite inequality constraints (5)-(6) by N simple inequality constraints. A standard rule of thumb is to choose A'' = 15n in practice [WBV97]. Then the approximate optimization problem is solved iteratively [Pol97]. 3.5 Related work There are several related papers [AV02, GHNOO, WBV97] on various filter design techniques. Most of them are for discrete systems, and some techniques can be applied to continuous systems with some modifications. Note that the mapping s = j ^ maps the pure imaginary axis onto the unit circle except (—1,0). Then our transfer function H{s) becomes a rational trigonometric function R{z). Each interval j[ai,bi] is mapped onto some arc {e^^ : u G [w/, wf ]} on the unit circle. The methods described in [AV02, GHNOO, WBV97] all can be applied. However, all of them will involve the constraints of the form that some trigonometric polynomial is nonnegative on some interval. The characterization of (trigonometric) polynomials nonnegative on some intervals will eventually need Theorem 3 or its equivalent form to transform to a LMI. In this paper, we transform our design problems using constraints of real polynomial nonnegativity on some intervals in the positive semi-axis. Then we may apply Theorem 3 directly to characterize these constraints using LMIs. As described at the end of Section 2, we solve an appropriate analytic center problem, instead of solving these LMIs directly. The structure of this problem can be exploited to use Newton's method to efficiently find the analytic center. There are also several good papers [Fab02, NesOO, GHY03] on polynomials on the real axis, unit circle, pure imaginary axis, and other curves. [NesOO] is the classical paper that characterizes polynomial nonnegativity constraints by LMIs; our paper is based mostly on it. In [GHY03] the authors characterized the cone of positive pseudopolynomial matrices and discussed optimization over this cone. The authors also discussed the conditioning of such optimizations, and proposed using the basis of Chebyshev polynomials to improve conditioning. [Fab02] gives an abstract version of [NesOO, GHY03], characterizing polynomials which are nonnegative on the disjoint union of several intervals.
4 Recovery of t h e transfer function In this section, we show how to use Theorems 1 and 2 effectively. First, given polynomials pi{w) and P2{w) {w = u"^) such that ^ desired shape, we need to find real polynomials qi and q2 so that Pijw) P2{w)
2
92 (jw)
Shape Optimization of Transfer Functions
323
To this end, given a polynomial p{w) that is nonnegative on [0, oo), we will provide an algorithm to find two polynomials qe{w) and qo{w) such that p{w) = q1{w) + w • Qoiw). Then qe contains the even coefficients and qo the odd coefficients (modulo signs) of the desired polynomial q as described in Section 2. Lemma 1. The following polynomial identities hold: 1. iff + wgJXfl + wgl) = {hh + wgxg2? +w{hg2 - /25i)^' 2. {w - r)2 + b^^{wv 9 2 T F ) 2 + w • 2{Vr^+lP - r). Proof. Verify directly. D Lemma 2. If a polynomial p{'w) is nonnegative on [0, oo), then its factorization must have the form
(
111
\
n2
J
i=l
nC-^ + Ci) i=l
Y[{.{w-rif
"3
+ b^)l[{w
+ ai)
i=l
where a >0, bi > 0, n\ + 2n2 + na = n, 0 < ai < • • • < a„3, Ci < 0. Proof. Write the factorization p(tf) = a YYk=i{'^~'Pk)• First consider the constant term a; it must clearly satisfy a > 0 for piw) to be nonnegative over [0, oo). Next consider the three classes of roots pk- real positive, complex, and real nonpositive. The positive roots must all have even multiplicity for p{w) to be nonnegative, so we can write the product of all their factors w—pk as 0^=1 (^"1" Cj)^ where Cj < 0. Next, the complex roots come in complex conjugate pairs Pk = Tk + j • bk and p^ = rk - j • bk, so we can write {w - Pk)iw - pk) = {w — rkY + b\ for all n2 complex conjugate pairs. Finally consider the na nonpositive real roots 0 > —ai > • • • > ~a„3 of p{w). Their corresponding factors u; — (—Oj) = u; + ttj are all nonnegative on [0, oo). D
Using the above two lemmas, we get the following algorithm. Algorithm 4.1 This algorithm will find qe{w) and qo{w) such that p{w) = q1{w) + « ; • q1{w) if p{w) is nonnegative on [0, oo). Let ge = l>9o = 0. Step 1 Find the factorization of where hi > 0, ni + 2n2 + ns = n, 0 < ai < • • • < a„3,0 > Cj £ K. Step 2 Find the {-f + w{-f form ofXXZ^iw + a^) for fc = 1 : n3 qe = y/alqe
+ W • qo
qo = \fa'iqo - qe qe •=qe,qo -^^qoend
324
Jiawang Nie and James W. Demmel
Step 3 Find the {-f + w{-f form ofY^^l^{w'^ + 6f) for k = 1 : n2 qe = qejw - ^rf + b'i) + w • g „ ^ 2 ( ^ K ' + 6f - r^) 9o = qeyj2{^rt^+yf^)qo{w - ^rf + b'j) Qe ••=qe,qo ••^'qoend Step 4 qe := aqe ]Xi=ii'^ + ^i), 9o := aQo n " = i ( ^ + ^i). Now apply algorithm 4.1 topi{w){i = 1,2) respectively, i.e., Und qij{w){i,j = 1, 2) such that Pi{w) = qfiiw) + w • ql2{'^) fo^ i = 1, 2. Then we obtain the desired transfer function H{s) 92,l(-'S^) + S 9 2 , 2 ( - S ^ ) '
Remark: If a polynomial p{w) is nonnegative on a finite interval [—1,1] (another finite interval can be changed to this one by a linear transformation), then we can also apply the above algorithm to find two polynomials pi and P2 such that p{w) = Pi{w) + (1 — w){w + l)p'2{w). ActuaUy, we only need to do the Goursat transform (see [PROO]) for piw), i.e.,
p(«;) = (^ + l ) " p ( i ^ ) , and then apply the above algorithm to find qe,qo such that
p{w)
=ql{w)+wqliw),
and then apply the inverse Goursat transform to get back p{w): p{w) = 2-"(w + l)'*''ff(P)p(i-^^).
5 Conclusions and discussion This paper discusses shape optimization for a transfer function for a LTI SISO system by formulating it via semidefinite programming. Given the shape (absolute value) of the transfer function, we show how to extract the transfer function itself. Since the optimization process uses semidefinite programming, it may be done efficiently. We do not consider any constraints on the components A, b, c, d of the LTI system. However in practice, these components may not be arbitrary, but instead have special structure and depend on certain design parameters. Thus an interesting question is finding those parameters to optimize the shape of the transfer function as we did in Section 3. This is in general not a convex
Shape Optimization of Transfer Functions
325
problem, and can be very hard to solve. But it is still a feasibility/optimization problem about polynomials, if {A, b, c, d) are polynomials in those parameters. Therefore, we may formulate t h e m using polynomial optimization, and then solve t h e m by techniques such as the sum of squares and positivstellensatz (see [ParOl]). B u t this is more difficult, and future work. A c k n o w l e d g e m e n t s : The authors would like to t h a n k Prof. El Ghaoui and the Referee for many useful comments and suggestions.
References [AV02] B. Alkire and L. Vandenberghe, Convex optimization problems involving finite autocorrelation sequences. Mathematical Programming Series A 93 (2002), 331-359. [CD91] Prank M. Callier, Charles A. Desoer, Linear System Theory, SpringerVerlag, New York, 1991. [Fab02] L. Faybusovich, On Nesterov's approach to semi-infinite programming. Acta Applicandae Mathematicae 74 (2002), 195-215. [GHNOO] Y. Genin, Y. Hachez, Yu. Nesterov, P. Van Dooren, " Convex Optimization over Positive Polynomials and filter design", Proceedings UKA CC Int. Conf. Control 2000, page SS41, 2000. [GHY03] Y. Genin, Y. Hachez, Yu. Nesterov, P. Van Dooren, Optimization problems over positive pseudopolynomial matrices, SIAM Journal on Matrix Analysis and Applications 25 (2003), 57-79. [KS95] T. Kailath and A.H. Sayed, "Displacement Structure: theory and applications", SIAM Rev. 37(1995), 297-386. [LuklS] Lukacs, "Verscharfung der ersten Mittelwersatzes der Integralrechnung fur rationale Polynome", Math. Zeitschrift, 2, 229-305, 1918. [Mar48] A.A. Markov, "Lecture notes on functions with the least deviation from zero", 1906. Reprinted in Markov A.A. Selected Papers (ed. N. Achiezer), GosTechlzdat, 244-291, 1948, Moscow(in Russian). [NN94] Yu. Nesterov and A. Nemirovsky, "interior-point polynomial algorithms in convex programming", SIAM Studies in Applied Mathematics, vol. 13, Society of Industrial and Applied Mathematics(SIAM), Philadelphia, PA, 1994. [NesOO] Yu. Nesterov, "Squared functional systems and optimization problems". High Performance Optimization(H.Frenk et al., eds), Kluwer Academic Publishers, 2000, pp.405-440. [Pol97] E. Polak, "Optimization: Algorithms and Consistent Approximations". Applied Mathematical Sciences, Vol. 124, Springer, New York, 1997. [PS76] G. Polya and G. Szego, Problems and Theorems in Analysis II, SpringerVerlag, New York, 1976 [PROO] V. Powers and B. Reznick, "Polynomials That are Positive on an Interval", Transactions of the American Mathematical Society, vol. 352, No. 10, pp. 4677-4692, 2000. [WBV97] S.-P. Wu, S. Boyd, and L. Vandenberghe, "FIR filter design via spectral factorization and convex optimization". Applied and Computational Control, Signals and Circuits, B. Datta, ed., Birkhauser, 1997, ch.2, pp.51-81.
326
Jiawang Nie and James W. Demmel
[ParOl] P.A. Parrilo. Semidefinite Programming relaxations for semialgebraic problems. Math. Prog., No. 2, Ser. B, 293-320, 96 (2003). [Stu99] J.F. Sturm, "SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones". Optimization Methods and Software, ll&£:12(1999)625-653. [VB96] L. Vandenberghe and S. Boyd, "Semidefinite Programming", SIAM Review, 38(l):49-95, 1996.
Achieving Wide Field of View Using Double-Mirror Catadioptric Sensors Ronald Perline^ and Emek Kose^ ^ Department of Mathematics, Drexel University, Philadelphia, PA 19103, USA. rperlineOmcs.drexel.edu ^ Department of Mathematics, Drexel University, Philadelphia, PA 19103, USA. ek58®drexel.edu
Summary. For many applications such as surveillance, medical imaging, photography and robot navigation, it is required that the camera have a wide field of view. Traditional approaches to solve this problem include using a rotating camera, stitching images, complex lenses or multiple cameras. We are proposing a catadioptric sensor with a camera-mirror pair for enhancing the field of view. Devices consisting of a reflective surface (catoptrics) and a camera (dioptrics) are called catadioptric sensors. Our single viewpoint double-mirror system formed of a conical mirror coupled with a proper secondary mirror arises as a solution to a nonlinear first order ordinary differential equation. The ordinary diS'erential equations are obtained as geometric solutions to the problem. Our system is designed to image a plane at infinity, without distortion, requiring no digital unwarping. In this work, we are analyzing the family of cones and corresponding secondary mirrors to obtain a correct image and a large field of view.
1 Introduction Using mirrors combined with lenses is an effective solution to the general problem of enhancing the field of view, which is necessary for various apphcations, such as surveillance, medical imaging, photography and robot navigation. In robotics, especially in autonomous mobile robots, omnidirectional sensors are the best for navigation, and in dynamic environments, catadioptric sensors work the best. T h e images produced by the sensor can easily be transformed by various software. Presenting such advantages, interest in catadioptric sensors has increased recently. Catadioptric sensors are oriented such t h a t the camera is fixed, with the rotationally symmetric mirror suspended above it. A camera, the dioptric component, realizes a mapping between the 3D world and 2D image plane. We employed perspective and orthographic camera models in our designs. In perspective projection, the world points are projected onto the image plane with perspective rays originating at the center of
328
Ronald Perline and Emek Kose
projection (COP), which would lie in physical camera. The camera reahzing perspective projection is often referred to as pinhole camera. Orthographic projection can be viewed as a special case of perspective projection, it has a large focal distance, in other words, the center of projection is at infinity. In cameras realizing orthographic projection, objects close to the optical axis are imaged, and parallel hnes are mapped to parallel lines. So the difference of orthographic projection from perspective projection is that light rays travel parallel to optical axis, as opposed to passing through the center of projection. Traditionally, imaging systems have been designed to maintain a single effective viewpoint. In other words, all the rays of light entering the camera intersect at a single point, called the effective pinhole, so single effective viewpoint acts as a virtual center of projection. By placing the center of perspective camera lens at the outer focus of the hyperbohc mirror, the inner focus then becomes the single effective viewpoint. Single viewpoint allows the unwarping of accurate perspective images. The single viewpoint has been studied and recommended by Baker and Nayar [BN98]. Our system is designed to image a plane at infinity, compared to the diameter of the system, without distortion, requiring no digital unwarping. The catoptric component of the system is a double-mirror consisting of a primary mirror and its dual, as we call them, and both mirrors employed are rotationally symmetric. This allows us to picture the problem in 2D. The primary mirror has been chosen a concave up cone of different slopes under perspective and orthographic projections. The corresponding secondary mirror, which is also called its "dual" was designed according to the transformation reahzed by the primary mirror. For the special cone whose slope is 1, we can easily obtain the analytical solution to the ordinary differential equation for its dual whereas for cones with slopes different than 1, we have to solve the resulting ordinary differential equation by numerical methods. The scheme we used in designing our catadioptric sensors, as suggested by Hicks and Perhne [HPOl] consists of: 1) Differential formulation: Primary mirror is fixed, the desired design characteristics of the dual mirror are restated in terms of a set of ordinary differential equations. 2) RealizabiUty: A systematic check is performed to determine whether it is theoretically possible to construct a catoptric which exactly realizes the specified design characteristics. 3) Optimization: If the mirrors are not realizable, then we perform optimization methods, to find the 'best' possible catoptric component.
2 Related Work Here, we review past work on catadioptric sensors, with focus on design rather than applications. There has been an increasing interest in investigating the design and appfications of catadioptric sensors. Much of this work focuses on
Achieving Wide Field of View
329
designing panoramic or wide field of view sensors (see [BN98, Hic99, Nay97, Ree70, YK90, CM99]). An early use of employing mirrors for panoramic imaging is a patent by Rees [Ree70], who proposes the use of a hyperbolic mirror for television viewing. In [HPOl], Hicks and Perline describe a general method for catadioptric sensor design for a prescribed projections, using geometric distributions in 3-dimensional space, generalizing vector fields. Nayar presents a new single viewpoint, double-mirror, double-camera catadioptric design which is truly an omnidirectional sensor. His design consists of two parabolic mirrors, paired with two orthographic camera [Nay97]. In 1996, Nalwa [Nal96] proposed a panoramic sensor that consists of 4 planar mirrors coupled with 4 imaging systems. The system possesses single viewpoint and it has a large field of view, approximately 360° x 50°. In [HBOO], Hicks and Bajcsy obtain a mirror shape that is wide-angle but approximates perspective views. This design, which is not single effective viewpoint, distorts the view plane minimally. The polar sensor derived by Hicks, Perline and CoUetta [HPCOl] is very close to our Right Angle Mirror design, as we call it. They suggest a sensor design under cylindrical projection, such that the sensor images the world linearly.
3 Conical Mirrors 3.1 Right Angle Mirror In any of our designs, we start with a choice of primary mirror and the projection. Depicting the problem for the chosen parameters, we derive ordinary differential equations that describe the secondary mirror. Solving the ordinary differential equations obtained, either analytically, which is only valid for Right Angle Mirror under Orthographic Projection, or numerically, gives us the graph of the cross section of the secondary mirror. 3.1.1 Right Angle Mirror U n d e r O r t h o g r a p h i c P r o j e c t i o n This is the most basic design in the family of cones. The conical primary mirror we chose has slope equal to 1 and our camera realizes orthograpliic projection. The profile curve of the system can be seen in Figure 1. Assumptions made are: the view plane is infinitely far from the camera, compared to the diameter of the system and is in +y direction. Our camera is located on —y axis, which at the same time is the optical axis of the entire system. So, to set the problem from an inverse direction, all light rays emanating from the image plane travel parallel to each other to shoot the primary mirror,reflecting off parallel to a-axis, they intersect the dual mirror and they reflect off to shoot a point in the view plane, which is y = K. a is a. scaling
330
Ronald Perline and Emek Kose
-0.
Incoming L i g h t Ray
Fig. 1. Cross-section of the Right Angle Mirror under Orthographic Projection. constant that lets us adjust the amount of scaling of the view plane onto the image plane. The primary mirror, which is a cone has equation y=x. Let us define the vectors u and v; u = < - l , a ; i > ; v ==< Kaxi,K
> - < x',f{x')
>
(1)
Normalizing u and v, we get: u=<
-1 l+xi'l
Xi
X
,
Xi
+ xj
Since the view plane y = K is infinitely far, the limit X
lim
O/i
< axi - -r;,l —77 > = < axi, 1 >
K—>oo
K
(2)
K
The vector sum nd = u + v is parallel to the normal vector of the secondary mirror. By making use of the relation between the normal to the surface and the tangent to the surface, we can say that slope of the tangent line to the surface at the point x = x' equals the negative reciprocal of the slope of the normal to the surface at the same point.
That is, - S i = fix') We then have: f'{x)
= 2x+\/4x2 + l
(3)
Integrating both sides, we obtain the analytical solution to our problem:
Achieving Wide Field of View 1
331
1
fix) = x^ + -xV4x2 + 1 + - sinh-^(2x)
(4)
The graph of /(x) is the cross section for our secondary mirror. In Figure 1 you can see how this system images the checker board, far above. We use checkerboards for our simulations, because we are interested in how the system distorts lines. 3.1.2 Right Angle Mirror Under Perspective Projection Our primary mirror is the same cone with slope equal to 1 and we are considering a pinhole camera. The assumptions made are; view plane is at an infinite distance, the optical axis is the y-axis and the center of projection of the camera is at the point ( 0 , - 1 ) . All hght rays that pass through the pinhole are going to reflect off the conic mirror and then hit the secondary mirror, yet unknown. Reflecting off the secondary mirror, they shoot the plane y = Kxi is the intersection point of the incoming ray with the x-axis. We approach this problem with the vector-method discussed above. Let us define the vectors, u , v and w; u = < xi, 1 >, V = < —1, —xi >, w = < axi, 1 > . Let us also define m^ = slope of the incoming light ray, mr= slope of the reflected light ray, di= the angle that incoming light ray makes with positive X-axis, 9r=the angle that reflected light ray makes with positive x-axis and 9m= angle between positive x-axis and the mirror. It is obvious from the Figure 2 that rrii = ^ . Prom optics, we know that 0j. + Oi = 26^ , since we have 0m = f i this imphes; 61^ = I - 6li =» tan(6l^) = tan(f - Oi) =^ rtir = tan(f - di) = cot(6ii). Hence, this gives us ; m^ = xi and m, = —. The point of intersection of the conic mirror with the incoming hght ray can be found simply by simultaneously solving both equations. This point of intersection is: ( j ^ ^ , i r ^ ) Any point (x2,y2) on the secondary mirror can be written as: {X2,y2) =
{--,——,z——)+tv. 1 - xi 1 - xi This implies xi = —^TT . Also we know that irTr + jrhi is in the normal ^
^
X2 + ^
||V||
||W||
direction of the secondary mirror. After necessary substitutions and simplifications we obtain a nonlinear implicit ordinary differential equation which is not possible to solve analytically. However, numerical solution gives us the cross section of the dual mirror, seen in the Figure 2. 3.2 30-Degree Conical Primary Mirror One of most efficient systems we obtained is the 30-degree conic primary mirror system. Efficiency, in the sense that, the system can view a large portion
332
Ronald Perline and Emek Kose
Fig. 2. Cross-section of Riglit Angle Mirror under Perspective Projection. of the scene without having a large diameter. Keeping the diameter of the entire system small is important. We could sacrifice having a longer system for a narrow one. The reason for this is the future applications may include medical imaging. The primary mirror is a cone with slope equal to t a n ( | ) . The camera is orthographic.
0.5
Fig. 3. Cross-section of a 30-degree Primary Mirror under Orthographic Projection.
Achieving Wide Field of View
333
Fig. 4. POV-Ray simulation of the view from 30-degree system in the test room.
We are going t o employ the vector method t o derive the secondary mirror. The primary mirror has equation: y = tan(;)x. The vectors v and w after normalization are then;
So the vector addition of v and w is parallel to normal vector of the dual mirror at (x, f (x)). If n d = v w then the slope of n d equals the negative reciprocal of the slope of the tangent line of the secondary mirror at the same point. That is:
+
ndl nd2
--
=
f I(.)
Expressing tan(!) geometrically, using the figure helps us engage f (x) in the slope equation. That is
Solution of equation (8) for
gives us this relation;
334
Ronald Perline and Emek Kose
By equations (7) and (9) we derive the nonlinear, implicit ordinary differential equation for secondary mirror corresponding to 30-degree conic primary mirror. The cross section of this mirror can be seen in Figure 3. The simulation of this system after it has been revolved about the y-axis is as in Figure 5. The 30-degree conical mirror system inside the test room with checkerboard walls views close t o 180°, Figure 4.
Fig. 5. Sideview of a 30-degree Primary Mirror under Orthographic Projection.
4 Conclusion In this paper, we have exhibited a catadioptric sensor design which enables a normal camera an ultra wide field of view with minimal distortion. These sensors are based on a family of double-mirrors with conical primary mirrors, derived as numerical or analytical solutions of non-linear ordinary differential equations which describe how a plane perpendicular to the optical axis of the system is imaged on the film. The images obtained require no further processing. In the future work, we would like t o investigate the general conics as primary mirrors, such as parabola, ellipse and hyperbola. We also want to implement an actual system and experiment on it, perform full error analysis.
References [BN98] Baker, S, and Nayar, S.K. (1998), "A Theory of Catadioptric Image Formation," Proc. International Conference on Computer Vzsion, pp. 35-42.
Achieving Wide Field of View
335
[CM99J Conroy, T. and Moore, J. (1999), "Resolution Invariant Surfaces for Panoramic Vision Systems," Proc. International Conference on Computer Vision, 392-397. [Hic99] Hicks, R.A. (1999), "Reflective Surfaces as Catadioptric sensors ," Proceedings of the 2nd Workshop on Perception ofr Mobile Agents, CVPR 99 , 82-86. [HBOO] Hicks, R.A. and Bajcsy, R. (2000), "Catadioptric Sensors that Approximate Wide-Angle Perspective Projections," Proc. Computer Vision and Pattern Recognition, 545-551. [HPOl] Hicks, R.A. and Perline, R. (2001), "Geometric Distributions for Catadioptric Sensor Design," Proc. Computer Vision and Pattern Recognition 2001. [HPCOl] Hicks, R.A., Perline, R. and Coletta, M.L. (2001), Catadioptric Sensors For Panoramic Viewing, World Scientific. [Nal96] Nalwa, V. (1996), "A True Omnidirectional Viewer," Technical Report, Bell Laboratories, Homdel, NJ 07733, USA. [Nay97] Nayar, S.K. (1997), "Catadioptric Omnidirectional Camera," Proc. Computer Vision and Pattern Recognition, 482-488. [Ree70] Rees, D. (1970), "Panoramic Television Viewing System," United States Patent, (3,505,465) [YK90] Yagi, Y. and Kawato, S. (1990), "Panoramic Scene Analysis with Conic Projection ," Proceedings of International Conference on Robots and Systems , 82-86.
Darcy Flow, Multigrid, and Upscaling James M. R a t h Institute for Computational Engineering and Sciences, University of Texas at Austin, TX 78712, USA. organismQices.utexas.edu
Summary. Simulation of flow through a heterogeneous porous medium with finescale features can be computationally expensive if the flow is fully resolved. Multigrid algorithms are often used to compute approximations to such problems because of storage considerations, the high condition number of the problem, and multigrid's resolution independent convergence rate. The proposed algorithm takes an ordinary multigrid method and replaces the coarsening procedure with an upscaling one. Upscaling aims for performing computations on a coarser scale but still retaining information about the fine-scale flow and problem data. That is, an upscaled computation is just as cheap as a coarse one, but much more accurate. By replacing coarsening with upscaling in multigrid, faster convergence ensues. Numerous upscaling techniques have been proposed; variational multiscale subgrid upscaling is used here. By keeping within the usual finite element variational framework, known multigrid analysis techniques can be applied. Like multigrid, the proposed scheme has a convergence factor dependent on both the permeability field (the ratio of maximum to minimum values) and the relative resolution of the fine and coarse grids.
1 Introduction Darcy's law describes fluid flow through a porous medium. It is an empirical law t h a t asserts bulk flow of a fluid through the medium is proportional to the gradient of the pressure across the medium (accounting for hydrostatic differences from gravity) [Dar56, Bea72, Sch74, FC79]: u = — (Vp-pg) M Darcy's law has found wide applicability in modeling subsurface flows, and has been generalized to model multicomponent and multiphase flows. (The above differential form is itself a generalization of the relation Darcy formulated.) Our primary interest is in using Darcy's law to model oil reservoir and groundwater contaminant flow.
338
James M. Rath
Darcy's law alone is insufficient to describe the physics: conservation of mass (the continuity equation) and equations of state (relating density, viscosity, and permeability to phase fraction and temperature) are necessary. In the appUcations being considered, there is often a need for the velocity to be very accurate and to strictly (locally) observe mass conservation; hence our focus on mixed methods [RW83, DEW83, DEW84, ERW84]. Although our presentation ignores aspects of multiphase flow, the proposed ideas and software can readily be adapted to model such flows. 1.1 Heterogeneity and why it is a problem Geostatistical modeling is used to generate the necessary data (porosity and permeability) to specify the problem to be approximated [DJ97]. This data is typically given at a very fine resolution [Dur02], but the goal is to predict long-range flow behavior; it is tempting, then, to approximate the problem at a very coarse scale. However, fine-scale features of the problem data can have very large effects on the coarse-scale flow behavior [Dur02]. Therein lies one big difficulty: it is necessary to resolve the flow at very fine scales resulting in computationally poor conditioned problems to solve [G+85]. Moreover, the resolution cannot be reduced to shrink the size of the system: (1) heterogeneity in the permeabihty (irregular, short spatial-scale jumps) means p-refinements (high-order approximations) will not help, and (2) spatially-limited resolution and spatially-uniform heterogeneity means /i-refinements (coarse scaling) will not help either. The fine-scale resolution necessary in simulations makes for poor conditioning, yet there is still another difficulty: the jumps in the permeability can sometimes be quite severe (several orders of magnitude changes) between nearby locations (see for example Figure 4). This makes our computation of an approximation even more poorly conditioned. The more heterogeneous and fine-scale the problem, the more computationally expensive it becomes as all known direct/iterative linear solvers have behavior which worsens with greater condition number [GV96]. We seek to broaden the range of problems that are computationally feasible. 1.2 Proposed solution technique One approach used to overcome the poor-conditioning in modeling multiscale phenomena is upscaling [Dur02]. In upscaling techniques, an averaging process is used to determine the influence of fine-scales on the simulation and to adjust the coarse-scale computations accordingly. Typically an upscaled model will more accurately predict the coarse flow, and will also correctly display flne-scale features of the flow. With mixed variational multiscale subgrid upscahng [Arb, HFMQ98], the scales are spht into subgrid and coarse parts before approximating, thereby keeping all the fine-scale information in the model. The mixed framework keeps strict conservation of mass.
Darcy Flow, Multigrid, and Upscaling
339
Any upscaling technique must ignore some fine-scale information, though. As noted above, fine-scale features of the problem data can have very large effects on the coarse-scale flow behavior — sometimes a full resolution of the problem is inescapable to correctly predict flow features like production rates or breakthrough times. Not all the benefit of upscaling need be lost: we can use upscaling to aid the computation of the fine-scale flow. Since one goal of upscahng is to obtain a better coarse-scale approximation of the flow, it is a natural idea to try substituting the upscaled approximation for an ordinary coarse-scale approximation in a multigrid scheme. The upscaHng substitution is also natural in that it is about as computationally expensive as the ordinary coarsening. This substitution forms the basis for the proposed algorithm. Although we cannot hope to reduce the computational complexity of multigrid, we can dramatically reduce the time necessary to accurately model a fine-scale flow. As repeated single-phase flow simulations are at the heart of multi-phase simulations, geostatistical sampling, and flow optimization studies, this is a significant achievement.
2 Mixed Variational Multiscale Subgrid Upscaling Because of the complexity of the proposed algorithm, it is useful to first describe the underlying upscaling method and introduce some terminology and notation. Further details of this technique can be found in [Arb, Arb02, AB02]. Use /? to denote the spatial extent of the porous medium and assume that it is a connected, convex, polygonal domain in E" (where n is 1, 2, or 3). Let p and u denote the unknown pressure and macroscopic velocity. Our model problem in the interior of the domain consists of Darcy's Law u = --(Vp-pg), M
(1)
V • u = /,
(2)
and the continuity equation where K, /i, p, and g are the (spatially-varying) permeability, viscosity, density, and gravity, and / is a source term. On one piece of the boundary rV, specify the flow u-v = gN, (3) and on another, r b , specify the pressure P = 9D-
(4)
The pieces of the boundary add up to the whole dQ = PIM U PD , the pieces are disjoint F^ C] To = 0, and v is the unit outer normal vector to dSl. A cell-centered finite difference method (CCFD) is used to approximate fiow equations for the pressure.
340
James M. Rath
The proposed methods, however, apply to the problem (l)-(4) in mixed variational form. Let V = {v G iJ(div, /?) I V • 1/ = 0 on FN} , W = L'^{Q), and Vg„ be an element of if (div, Q) such that Vg^ • v = g^ on Fff and V(,„ • v = 0 on FD. (An important special case occurs if F^ = dQ. In that case, use W = L^/K instead; a compatibility condition on / and g^ is also required.) The problem is then to find u G V and p &W such that (V • n,w) = {b,w) (ku, v) - (p, V • v) = (c, v) -(£(£,,. VV' i •^ )iy)ro rD
Ww G W, Vv G V.
(5) (6)
The substitutions k = f^K.~^, b = f — V rg^, and c == pg — kvg„ have been made to simphfy the notation. Note that the velocity u = u + Vg„ is the solution to the original problem (l)-(4). In order for either formulation of the problem to have a solution, the data needs to meet some regularity constraints. In the variational formulation, one of interest in this proposal is that k must be a symmetric rank-two tensor in (L°°(i7))"^" (denote its essential supremum by k*), and must be uniformly elliptic (denote its essential infimum by fc*). The ratio of these two values fc*/fc* will have an impact on the performance of the proposed preconditioners. 2.1 A mixed finite element approximation For a fine-scale problem, assume the permeability data k is given as piecewise constant on a grid with spacing h, and that measurement uncertainties prevent specifying it at any finer resolution. It then makes sense to find solutions at the same resolution, and no finer. That is, find u/j G Vh C V and ph G Wh C W such that (V • Uh, Wh) = {b, Wh) (ku,i,v^)-(p/j,V-v/i) = (c,v/i)-(to,v/,-J/)rD
Wwh G Wh, Vv/j G V^,
(7) (8)
where Wh^Vh are Raviart-Thomas-Nedelec elements of order zero (RTO) [RT77, BF91, RT91] on the given grid. See Figure 1 for an illustration of the degrees of freedom of these elements. Using quadrature — the trapezoidal rule — to compute the term (ku/j, Yh) is equivalent to solving the original problem (5)-(6) using cell-centered finite differences [RW83, AWY97]. If a coarse-scale approximation is desired instead, then find u/f G VH C V and PH G WH C W such that {S7 •UH,WH)
= {b,WH)
(ku/f,v/f) - (p//,V-Vi^) = (c,Vfl') - (firo,v/f • j^jro
WwHeWn, Vv//G Vf/.
(9) (10)
Darcy Flow, Multigrid, and Upscaling —**—
^ »
: •
,
'
.i
.
•
:
•»
:
: * '
•
;
#:
•
•
•ii
•:
'
*
ill
;;
'!!!(•
,
r
<$ :
•
a- , ;
* ,
;
•
9
:
• ::
:•
:; »
:^
• ,:
: «:
: •"
: » .
:'i
•
• ;:
• : • ; : « ^; . :
•It
:
1!
.
:
.ill
:
:
•
II
;
;
# ::
• )
:
» : : • ;:
•
<•
::
f
;:
—fi—
III
:
,
.; • : :
s -
»
;
5
:
:
•
;
.
* ;
:
« ,
; . :
^
*
• :: .:
• :
•
. :
• : • ;
'::
• :: • ; : • ::
• : • :.
: • : : • ;: • ;: • :
m :: :
:
:
.
•
:;
.•
: •^ -
)
;
:,
« :: • ^ ,
m ^ ::
: • :
'
••
:
'-
^ . : ffl
;
$i
;
; « :
:
"
:
:
: ^
341
„: —
« :; »
•
• ,'
e .:
• ;:
« :• ,. ?
«
ti'
:i
®
:
11
;
:'. @ ;
• -
—
I
—
•
Fig. 1. Degrees of freedom for 2-D RTO elements on a fine rectangular grid: normal velocity degrees of freedom, and • pressure degrees of freedom.
•
•
*
'!•
•
a
•
•
^1^
Fig. 2. Degrees of freedom for 2-D BDMl [BDM85] elements on a coarse rectangular grid: :'. (linear) normal velocity degrees of freedom, and « pressure degrees of freedom.
342
James M. Rath
where WH X V^ are RTO elements or, perhaps, Brezzi-Douglas-Duran-Fortin order one (BDDFl) [BDDF87] elements) on a grid with spacing H, an integer multiple of h. See Figure 2 for an illustration. For the RTO elements, some error estimates [BF91, DR85, RT91] for a grid size h are ||u-u„||o
^0{h),
(11)
\\P - Phh < C\\p\\2h
=0{h).
(12)
For BDMl/BDDFl elements the velocity is an order of h more accurate ||u-u,,||o
=0{h').
(13)
2.2 The upscaling technique The subgrid upscaling technique uses the mixed finite element method (MFEM) with a variational multiscale [HFMQ98] technique. Before any approximation, decompose V and W (using a chosen coarse grid) so that (i) no information is lost, V = Vc ® SV, W = Wc® 5W; (ii) mass is conserved, V • Vg = Wc, V • 5Y = SW; and (iii) the fine-scale velocities are locally supported over the coarse grid, 6Y • i^ = 0 on the boundary of each coarse cell. (Section 3.1 — in particular Theorem 3.3 — of [Arb] show that this decomposition is always possible and satisfies some additional necessary properties. There is also more than one way to choose the decomposition.) Note that Condition (iii) above allows us to disconnect the subgrid problems from one another, but not from the coarse grid problem. A Green's or influence function approach is employed to effect a forward elimination of the subgrid problems into the coarse problem, and — once the coarse problem has been solved — a back substitution to recover the subgrid solutions. The two can then be recombined to obtain a fine scale solution. Since V and W have been decomposed into direct sums, the problem (5)~ (6) can be described as follows. Find Ug e Vc and pc G Wc, and Su G SV and Sp € 6W such that on the coarse scale {V •{uc + 5u),Wc)^{b,Wc)
Vu;c G Wc, (14)
(k(uc4-(5u),Vc) - (Pc + (5p, V • Vc) = (c,Vc) - igD,yc-'^)rD
Vvc e Vc, (15)
and on the subgrid scale {V-iuc + 5u),Sw) = {b,Sw) (k(uc-F(5u),(5v)-(pc + <5p,V-5v) = (c,(5v)
ySw&SW, VSveSY.
(16) (17)
Darcy Flow, Multigrid, and Upscaling
343
Then u = Uc + (5u and p = Pc + Sp solve the original variational problem (5)(6). To define the (5-subgrid operator and perform the forward elimination of the coarse components, rewrite the subgrid scale equations with coarse components on the right-hand-side. That is, consider the coarse components as sums of basis elements Pc
yjajW*
and
u^ = 2_.Pj'^i-
(18)
The parts of the 5-operator are obtained by substituting these expressions in (16) and (17) and solving for the a^- and /3j-influence function coefficients. The constant part of the 5 operator is given by solving (V • 5u, 5w) = {b, 5w)
\/Sw € 5W,
(19)
(k^u, Sv) - {5p, V • (5v) = (c, 5v)
V(5v e 5Y.
(20)
\/6w€ 5W, VJv G 5\.
(21) (22)
The V1/c-linear part of the S operator is given by solving (V • (5ui, Sw) = 0 {kSui, 6v) - {Spi, V • (5v) = {wi, V • (5v) The Vc-linear part of the 6 operator is given by solving (V • 5uj,5w) = -{V-vlSw) (k(5u^-,<5v)-(<5pj,V-(5v) = - ( k v ^ c H
\/6w e 5W,
(23)
ydv G 5Y.
(24)
Then 5p = Y^ UiSpi + ^ = 5p{p,)
pjSpj + Sp
+5p{n,)
+5p
(25)
and (5u = V ] ctiSui + 2_] P-jSvLj + 5vL = 5VL{PC)
+(5u(uc)
+5u
(26)
are implicit expressions for 5p and 6u in terms of Pc and Uc. Note that 5p{w].) = 5pi, 5p{vl) = Spj, 6u{wl) = Sui, and 5u(v^) = 6uj. Return to the coarse scale equations (14)-(15). Rewrite them substituting in the influence-function expressions for 5p and Su, and gathering the coarse coefficients ccj and /3j out front. Equations for the coarse components' coefficients result
344
James M. Rath ^ a i ( V • 5ui,wc) + X!'^i(V • (vi + Suj),We) = (6 - V • Su,Wc) i
i
Vu;eGVFc, (27) ^ a i ( k < 5 a i , V,) + ^ / 3 , ( k ( v ^ , + 5u,),Ve) i
3
- Y^ ai{wi + 5pi, V • Ve) - ^ i
pj{5pj, V • Ve)
J
= (c-k<5u,v,) + (<5p,V-Ve)
VVCGVC,
(28)
or using the operator notation (V • (ue + 5u{pc) + <5u(ue)), We) = {b-V( k ( U c + Su{Pc)
5u, Wc)
'iw, G Wc,
(29)
+ (5u(Uc)),Vc)
-(Pc + Sp{pc) + Sp{uc), V • Vc) = (c - k(5u, Vc) + {5p, V • v^) -igD,^c-'^)ro
VVeGVe,
(30)
Solve these equations for aj and /3j (that is, Uc and Pc), and write the solution as p = Pc
+Sp
= ^ atwl + Y2 (^i^Pi + Yl ^^^Pi + ^^ i
i
("^^)
j
and u = Uc
+ (5u
= E ^i^c + E "'"^^^ + E ^i'^"i + "^^^^ j
»
(32)
i
Lastly, (29)-(30) can be written in symmetric form by rearranging terms and using some identities not identified here. See [Arb] for details, as well as proofs that each of the above problems are well-posed. Also note that the i5-subgrid operator has a natural definition. No information has been lost in the reformulation of the problem, and no ad-hoc assumptions have been reintroduced to simplify the action of the 5 operator. 2.3 Approximating the upscaled problem Only after deciding on the decomposition is the approximation then made. We are free to pick discretizations for each subgrid independent of each of the others. The discretization of the coarse spaces is only constrained by the already chosen coarse grid. Standard mixed finite element spaces such as Raviart-Thomas or BDDF elements will do.
Darcy Flow, Multigrid, and Upscaling
345
The subgrid upscaling technique has a number of advantages. Each subgrid problem is independent from the others because of the Neumann boundary conditions along coarse cell edges; the subgrid problems can be solved in paraUel with no communication of boundary data needed across coarse cell edges. Each subgrid problem also has many fewer degrees of freedom than the whole problem (by about a factor of the number of coarse grid cells); these problems are far better conditioned, and may be amenable to a direct solver. Each subgrid problem has the same left-hand side and many different righthand sides creating an opportunity for computational efficiency. The coarse problem has many fewer degrees of freedom than the whole problem (by about a factor of the number of fine cells chosen per coarse ceU) and so is also much better conditioned. If the coarse grid spacing is a small multiple of the fine grid spacing, the work in solving the upscaled problem is nearly all done in solving the coarse problem. Lastly, there is only one coarse grid (not many as in multigrid). Further, the number of degrees of freedom in the upscaling space is nearly as many as that in the fine space (compare Figure 1 with Figure 3 below). However, for small upscaling factors the upscaled solution is only about as computationally expensive as a coarse solution. This would be only mildly interesting if the upscaling solution was only as accurate as a coarse solution, but in fact it captures much more (see example below in Subsection 2.5; also see Figures 10-17 in Section 3 on the multigrid-hke preconditioner). There are some algorithm implementation issues to be considered. First, the elements of SWk may not be easy to compute with. For instance, suppose RTO elements are used for Wh and WH, and SWh = {WH) with the complement taken in M^/,. Then elements of SWh do not have local support on fine cells — they must have zero average on coarse cells. A computational trick explained in [Arb02] and [AB02] avoids this comphcation. Second, there is obvious parallelism in the subgrid problems since each one is independent of all the others, and the coarse-scale influence functions allow disconnecting the subgrid from the coarse scale. However, for coarsescale problem there is not any obvious parallelism. To overcome this lack of paraUelism, one could use a domain decomposition method [GW88] or another technique. Lastly, all of the problems being considered are saddle point problems because of the mixed formulation. For small upscaling factors, the subgrid problems may be solved directly. In other cases and for the coarse problem, iterative solvers may be necessary. The Uzawa algorithm [BF91, BraOl] may be used to solve the indefinite system. Also, the hybrid mixed method [AB85, BF91] or a Schur complement could be used to transform the indefinite problem into a positive definite one.
James M. Rath
346
2.4 S a m p l e a p p r o x i m a t i o n of t h e u p s c a l e d p r o b l e m As an example, use RTO elements to approximate 5W and (5V, denoted SWh and 5Yh, and B D D F l elements for Wc and Vc, denoted WH and V ^ . Let WH,h = WH (B SWh and VHJI = ^H © SVh- Elements of WH are functions which are piecewise constant on coarse cells. Elements of SWh are functions which are piecewise constant on fine cells, but must have a zero average over each coarse cell. Elements of V/f are vector-valued functions where the components normal to coarse cell faces are continuous across those faces, and vary linearly along those faces. Lastly, elements of SVh, are vector-valued functions where the components normal to fine cell faces are continuous across those faces, and are constant along those faces; the normal components across coarse cell faces are zero. See Figure 3 for an illustration.
-»•
-«-
-»
T8^
S)
(5
U
r
J
(
a
ii
r
() if::
;: • (J
•
-ii-
(
o
(
o
)
»
o d)
z •
X
•
i'
()
Fig. 3. Degrees of freedom for 2-D RTO/BDDFl elements on an "upscaled" rectangular grid: ® coarse velocity (linear) degrees of freedom, x subgrid velocity degrees of freedom, and » pressure degrees of freedom. [Arb02]
No closure assumption is made regarding the permeability. Accuracy is simply a question of approximation theory — how well does VHJI approxim a t e V ? — and not a question of how much the physics is changed. Defining effective coarse permeabilities is not necessary. T h e closure assumption can be said to be t h a t ah flux across a coarse element face is due to the coarse-scale functions. (The 5-subgrid operator is approximated using well-worn techniques; it is not altered in an ad-hoc fashion to facilitate computation.) In this example, to compensate for the coarseness
Darcy Flow, Multigrid, and Upscaling
347
of the restriction of velocities along coarse edges, higher order accurate basis functions have been employed. For the BDDFl/RTO elements, some error estimates are [Arb] ||u - UHAO
< C{\\phh^
+ (||u||2 + ||u • 42,ro)H^}
= 0[H^),
(33)
lb - PH,hh < C{\\p\\ih + 1|V • u||i/i2 + (||u||2 + ||u • ^ i | 2 , r j ^ ' } = 0{h + H') (34) However, the simplicity of the above expressions belies some error analysis complications; for instance, the 'irH,h operator for the upscaled elements is obtained not just from gluing together the operators from the coarse elements TTH and each of the subgrid elements STTh,,Ec: but also solving an auxiliary problem (except under additional assumptions). The error estimates above also do not capture all the subtleties of the subgrid upscaled solution (see, for instance, the graphs of the eigenstructure of the subgrid upscahng preconditioner in Section 3). One thing to note, though, is that if H/h is small, then 0{h + H^) = 0{h) for the pressure and 0{H'^) = OHH/hfK^) for the velocity. A comparison between the upscaling error estimates with those for BDMl/BDDFl elements on fine grid, (12) and (13), shows that they are the
2.5 Some practical results To show the ability of the subgrid upscaling to approximate problems almost as well as a full fine-scale solution (and far better than a coarse-scale one), some results from a simulated quarter five-spot oil reservoir water flood are presented. The above ideas were adapted to two-phase immiscible displacements, and run on the following example [ArbOO]. The domain is a 40 m by 40 m square with a uniform, rectangular, 40 x 40 grid. The base-10 logarithm of the permeability field is shown in Figure 4; the porosity is 25%. The field initially has a 20% saturation of water. There is an injection well in the lower-left corner, and a production well in the top-right corner. Each has a rate of 0.2m'^/day, and water is being injected. Figures 5, 6, and 7 speak for themselves. The upscaled and fine-scale solutions are certainly close in the eyeball norm; the coarsened solution is not close to either and fails qualitatively to capture some behavior. The upscaled solution requires only about as much time to compute as the coarse solution, but has nearly as many degrees of freedom as the fine solution; hence the "success." Throughout most of the rest of the paper, only RTO/RTO elements will be used. Although the ideas are applicable to others, the description, analysis, and implementation of higher order combinations gets much more complicated.
348
James M. Rath
Fig. 4. Logarithm of the permeability field (geostatistically generated).
Fig. 5. Water saturation contours at 100 and 500 days for the fine 40 x 40 solution.
2.6 Other upscaling techniques
Upscaling techniques are numerous; surveys of various ones currently under study can be found in [Dur02, Far02, RdM97, WGH961. A few descriptions follow. Chen et al. [CDGWOS] describe their technique as a local/global one: use a global coarse simulation t o determine appropriate boundary conditions for local simulations that then determine an effective coarse permeability. Other techniques are local; for instance, periodic boundary conditions can be imposed on the subgrids when computing effective permeabilities as in [BLP78, SP801. Oversampling is another possibility where a local prob-
Darcy Flow, Multigrid, and Upscaling
349
Fig. 6. Water saturation contours at 100 and 500 days for the upscaled 5 x 5 solution.
Fig. 7. Water saturation contours at 100 and 500 days for the coarse 5 x 5 solution.
lem (with various boundary conditions) that is slightly larger than a subgrid is used to determine the effective permeability for that subgrid; Chen and Hou [CH02] and Hou and Wu [HW97] use a multiscale finite element technique to implement this idea. Wu, Efendiev, and Hou also use this idea [EHWOO], and in [WEH02] apply a flow-based gridding dynamic approach to limit the necessary upscaling. Holden and Nielsen [HNOO] have developed a global upscaling method that uses a least-squares problem to determine effective coarse permeabilities. Some distinguishing features of the upscaling technique used here are the use of mixed methods (and the variational setting), the lack of ad-hoc assumptions about the 6-subgrid operator, and the lack of computed effective coarse permeabilities.
350
James M. Rath
3 A Two-level Multigrid-like Scheme In traditional multigrid, one pairs smoothings on fine and coarse grids together with the intent of each smoother reducing the error in different parts of the spectrum. The fine-grid smoother reduces the error on short scales, and the coarse-grid smoother reduces it on larger scales [BHMOO]. The action of each smoother can be computed using a number of operations and storage units proportional to the number of unknowns. The tolerance for the stopping criterion (accuracy) can be chosen independently of the number of unknowns (but not the ratio of largest to smallest permeability). For problems with a great number of unknowns, multigrid is therefore preferable to direct methods. In our modified multigrid, upscaling is substituted for coarsening. Upscaling produces a better approximation than coarsening (at both fine and coarse scales), and is just as cheap. By substituting upscaling for coarsening we keep all of multigrid's benefits, and an even faster scheme results. To diagram the multigrid algorithm and our variant, let us introduce some notation for the matrices involved. If we cast equations (5)-(6) in operator form, we get " 0 B'^' P 'b (35) -B D _ u = c Forming the Schur complement of the velocities gives Ap={B'^D-^B)p = f = b-B'^D--^c.
(36)
For the discrete versions of these equations, add a subscript h for the fine scale, H for the coarse scale, and H, h for the upscaled. A schematic of a twolevel V-cycle multigrid is shown in Figure 8. Our modified version is shown in Figure 9 — the only change is the substitution of upscaled for coarse scale. Multigrid works so well partly because smoothing the solution iterates on different levels reduces the error in different parts of the spectrum. Towards that end, we have developed code to compute and plot the eigenvectors (and corresponding eigenvalues) for the action of each smoother on the error. The matrices analyzed have the form I — M A from the error update equation Gj+i = (/ — M A)ei that describes the action of the preconditioner M on the error e. The plots are of eigenvalue/eigenvector pairs. The horizontal axis shows the norm of the gradient of the eigenvectors (considered as functions) — a proxy for "eigenfrequency." The absolute value of the corresponding eigenvalue is used as the vertical axis. The results of applying the eigenanalysis to several two dimensional examples can be seen in Figures 10-17; some convergence histories are shown in Figures 18-19. Each of the examples has the unit square as the domain with uniform porosity and Dirichlet boundary conditions. The first example has a very simple permeability: it takes one value on the left half of the domain, and another value on the right half (shown in Figure 10). In Figure 1 1 a "typical" result for the eigenstructure analysis as applied to the simple two-level permeability field is shown. On the horizontal axis
Darcy Flow, Wlultigrid, and Upscaling
351
S k100'1'FI c, =
$ (cliagAhf- 1 r,
P I + I= p, -+ CI ~ + i + l
rs = fh START
- Ahpi A T>OSE'! Is llrt11 siritxll?
i=O
po = 0 ro = ftL
- ALP{) COAItSEY 13, = A;' Rr, P ~ + I= P I
+ PEI
i+-i+l
FISISII Ph
Pa
rl = f i t - Ahpv
Fig. 8. Schematic of two-level V-cycle multigrid as used to solve the linear system
Ahph = fh. The sequence of iterates for the pressure pi are computed using the ith-stage residual r , = f h - Ahp,, the diagonal elements of Ah (diagAh used in the Jacobi smoothing step), arid the coarse corrector from solving a reduced problem AH. The restriction operator R : Wh --' W I ~x 0, and the prolongation operator P : W H x V H + Wh are the natural projection and inclusion operators. Also note that there may be many ( u ) applications of the smoother between upscaling steps. is measured the norm of the gradient of the eigenvector when interpreted as a 2-D pressure field (this measure is a stand-in for frequency). The vertical axis shows on a logarithmic scale the absolute value of the corresponding eigenvalue. The blue dots show the check-mark-like structure for a weighted Jacobi smoother. This picture is similar to ones usually used in multigrid analysis [BHMOO]. The cyan dots show the structure one obtains for the corresponding weighted Jacobi smoother on the coarse level. I t has the same shape, but the small-eigenvalue dip comes at lower frequencies (coarser modes) as expected. The magenta dots show the structure for the preconditioner that exactly solves the coarse problem. It has eigenvalues greater than one in the coarser modes because the upscaling factor H l h is equal t o three, and not two (the more usual result with eigenvalues bounded by one can be seen in the bottom right of Figure 13). Lastly, the red dots show the structure for the upscaling preconditioner. Some notable features are its complicated organization, that coarse modes (over a large range) have quite small eigenvalues, that there is a steady rise in eigenvalues versus frequency, and that the eigenvalues top out at greater than one. However, this maximum occurs near where the weighted Jacobi smoother is performing at its best.
352
James M. Rath
i t i + l rs = f,,
-
.At,p,
A
1.- l'SC1,2 121;
E; = : 4 ~ !Rr+ ,~ pi+l = PI $* PEi
FTSISlI
i-i+l
PI,
" PI
r, = f f , - A f , p (
Fig. 9. Schematic of the modified two-level V-cycle multigrid as used to solve the
linear system Ahph = fh. The sequence of iterates for the pressure pi are computed using the ith-stage residual ri = fh - Ahpi, the diagonal elements of Ah (diagAh used in the Jacobi smoothing step), and the upscale corrector from solving a reduced problem AH,^. The restriction operator R : Wh --t WH,hx 0, and the prolongation ~ Wh are the natural projection and inclusion operaoperator P : W H , x~ V H , --t tors. Also note that there may be many (v) applications of the smoother between upscaling steps. Figure 12 shows the effect of varying h, the grid size, while leaving the upscaling factor H l h and the permeability field alone. As can be seen, the overall structure remains the same independent of h, but the level of detail one sees in the diagrams increases, as does the scale of the frequencies plotted. Figure 13 shows the effect of varying H l h , the upscaling factor, while leaving the fine grid size h and the permeability field alone. As H l h increases, there is a shift t o lower frequencies for the coarse smoother and the upscaling preconditioner. (As noted above, the eigenvalues of coarse modes for the exact coarse preconditioner get larger with increasing H l h . ) There is a general worsening of performance with increasing H l h as would be expected. Figure 14 shows the Arcoperm permeability field [Arc97], and Figure 16 shows the Brent permeability field [Lin92]. Both show considerably more variation than the simple two-level permability field. They both also have a greater ratio between the greatest and least values of the permeability: the Arcoperm field about a half an order of magnitude, and the Brent field more than four orders of magnitude. For the Arcoperm field, Figure 15 shows the same rich structure as seen for the two-level permeability field, but surprisingly the up-
Darcy Flow, Multigrid, and Upscaling
353
Fig. 10. A simple permeability field.
scaling preconditioner has a similar performance: small eigenvalues for coarse modes with a general trend towards larger (but not too large) eigenvalues for fine modes. At the greatest upscaling factor Hlh of six, the middle modes begin to suffer, too. The results in Figure 17 show the same general trends. However, the middle modes become much more of a problem (note the vertical scale has changed from a high of 10' before to 10' now). One obvious conclusion to draw from the eigenstructure diagrams is that the performance of the upscaling level suffers in the mid-frequency eigenvectors (see also the convergence histories below). Perhaps Krylov space methods (a CG-like iteration) [Bru95, GV96, Dem971, a third level with an upscaler based on a grid staggered relative to the first upscaler, semi-coarsening in different levels in different directions [CWW92], or perhaps a traditional multigrid coarse smoother might be used to improve the method. The diagrams also lead to the reasonable inference that the behavior of the proposed two-level scheme depends on Hlh and k*/k,, but not the absolute level of h. This mirrors the behavior of multigrid, but we have not proven this yet for this scheme. It also seems a reasonable inference that upscaling modified multigrid will always perform better than traditional multigrid. In this proposal, we have been assuming that a single coarsening of the grid is enough to result in a problem that is computationally inexpensive to use. Sometimes this "coarse" grid just is not coarse enough, and the associated problem is still too computationally expensive to work with. In this case, a
James M. Rath
354
amplify attenuate
8"
0
0.5
1
1.5
2
2.5
3
3.5
4
lo4 IUTX!
Eigen-Frequency
h igl I
Fig. 11. A "typical" eigenstructure result for the smoothers and upscaler (RTO elements used at all levels). The fine grid used was 24 x 24, the coarse grid 6 x 6 (with H / h = 4), and Icz = 0.1. The colored dots are eigenpairs for the * coarse
smoothing preconditioner, fine smoothing preconditioner, and preconditioner.
subgrid upscaling
true variational multi-scale (and not just two-scale) method for the upscaling level would be appropriate. This might be accomplished say by decomposing V = V , $ S V = V , , @ A V @ S V and W = W , @ S W = W , , $ A W @ S W into three levels (or more) with a correspoding approximation with multiple levels. The details of this possibility have yet to be worked out. It might also be possible to use traditional algebraic multigrid on the coarsened system that comes from the upscaled problem (29)-(30). The eigenstructure diagrams also give us a way to estimate how often the Jacobi smoother is applied relative t o the number of times the upscaling preconditioner is applied. It is also possible to then estimate the overall error reduction factor for an iteration of our two-level method. This is accomplished by examining frequency by frequency the product of the error reduction factors (eigenvalues) of the smoother and upscaling preconditioner, and by computing the maximum such product. This allows us also to estimate the relative performance of multigrid with the two-level upscaling/smoother method for a given problem. Of course, this is assuming that eigenvectors with
Darcy Flow, Multigrid, and Upscaling
Vary h ; F i x H/h= 3 ; Fix ka = 0.1 . . . .
to-
.""...t..:-...'..".
1'
,,
.
.
.... .
,:a'
A
. ..
..
, I
1
.
.
I-.---.
A
1
''0
. .
10'
355
d.
&
- 9 n r E g W d b V 1
+
*
I
.
,A
m
,
. .. . '
.I
I
..
E ,L & &
K& wMqlPa,.4-#IY1
An A Lm
mm
12 x 12
6x6 i
ID'
...
..
-. .
.. **
-
L
--ma-(*.-*-
24 x 24
_ .
& I
em.
;
.
'
ID.Y--Y-*-
A
-
.
.I$
48 x 48
Fig. 12. Eigenstructure results for the preconditioners for problems with various fine grid sizes h, but fixed upscaling ratio H l h and fixed permeability. The colored dots are eigenpairs for the * coarse smoothing preconditioner, e coarse exact preconditioner, fine smoothing preconditioner, and subgrid upscaling preconditioner.
like "frequency" are alike; in reality, the eigenbases differ and the comparison for a given frequency is not apples-to-apples. Some examples of convergence histories using this analysis are shown in Figure 18. The logarithm of the norm of the residuals/errors is shown on the vertical axis; the number of V-cycle iterations is shown on the horizontal axis. The red and blue pairs show the history for the upscaling modified multigrid; the pink and cyan pairs for an ordinary two-level multigrid. Significantly faster convergence is obtained with the upscaling than with multigrid (although both display linear convergence and dependence on k*/k,). A zero initial guess for the solution is used along with a random source term. T h e diagrams in the top row show the eigenstructure diagram and convergence history for the simple permeability field (shown in Figure 10) on a 24x 24 grid with H l h = 4 and k2 = 0.1. An estimation from the eigenstructure diagram indicates that a single Jacobi smoothing step pre- and post-upscaling in a V-cycle will ensure convergence. Indeed this is the case: the upscaling scheme has a convergence factor of about 0.6 per V-cycle step. On the other hand,
James M . Rath
356
Fix h = 1/24 ; Vary H / h ; Fix k2 = 0.1 w/RTO .... . .i>.. .,. .
.
,.
.
,d
."'
.
.
.
.
.
.
*..4.. 5 _.-.-.
..
r.
.......
.. ...
Lo-.
' ?.
.. .. .
.
7m-.
.
I
d
*- U . . - - y . p n C -
Ph
;'u
a
n
.
.
c
.
IS'
H/h =6 Id
d
.* R . * . r '. m C . W n r * . n , ~k.W
.
.
H/h =4 .
.
;'u
.
.
4
.I#
.
. . . .. +
d
*-.
... ..-. 5 :
... ?
,ma.
-
...
-
I
8
I
'
d"
;
*;
;
-4.mme*.e*..n*-
H/h = 3
0;
; '&
4
td
0
0;
;
'
.~l)*mnr~.w-~-
;r
i
*L
4
I,.'
H/h=2
Fig. 13. Eigenstructure results for the preconditioners for problems with various upscaling ratios H l h , but fixed fine grid size h and fixed permeability. The colored
dots are eigenpairs for the .r coarse smoothing preconditioner, o coarse exact preconditioner, fine smoothing preconditioner, and * subgrid upscaling preconditioner. multigrid has a factor of about 0.94 for the same V-cycle with a smoother on the coarse level. The diagrams in the middle row show the the eigenstructure diagram and convergence history for the Arcoperm permeability field shown in Figure 14 for H l h = 5. An estimation from the eigenstructure diagram indicates that two Jacobi smoothing steps pre- and post-upscaling in a V-cycle again will ensure convergence. Again, the upscaling scheme clearly outperforms ordinary multigrid. The diagrams in the bottom row show the the eigenstructure diagram and convergence history for the Brent permeability field shown in Figure 16 also for H l h = 5. This time estimating the number of necessary smoothings per iteration is more difficult. Forty smoothings were used; note the most smoothings comes from the coarsest eigenmode - the red dot furthest to the left. This is most likely an overestimate of the number of smoothings necessary
Darcy Flow, Multigrid, and Upscaling
357
Fig. 14. The Arcoperm permeability field. The minimum base-10 logarithm of the
permeability is -0.2916 and the maximum 0.1477. for a convergent scheme, but smoothings are inexpensive relative t o upscalings (and probably have a beneficial effect - see below). Figure 19 shows some effects of underestimating the number of necessary smoothings. In the case of the Arcoperm permeability field, changing the number of smoothings from two t o one only slows convergence (and both ordinary and modified multigrid perform comparably). In the case of the Brent permeability field, the iterations diverge. One feature t o note in the top diagram for the Arcoperm permeability field is that there is a steep initial decline in the size of the error. This is probably attributable t o the initial guess for the solution (all zeros) having an error that has large components in the direction of the coarse eigenmodes of the upscaling preconditioner (this probably also explains the large jump in the first or first few steps in each of the other convergence histories - note that the blue/cyan and red/pink dots coincide for step "zero"). The "stalling" of the convergence is probably attributable t o the poor performance of the preconditioner on middle eigenmodes. Another feature t o note is that the doubling of smoothings roughly doubles the rate of convergence of ordinary multigrid, but the modified multigrid experiences a dramatic improvement in performance. It would seem that overestimating the number of necessary fine smoothings has a synergistic effect (and is not very computationally expensive).
James M. Rath
358
Fix h = 1/40 ; Vary H / h ; Fix k w/RTO ,d -
d -
-
to4*
L
A
;
i*
i s .a' 7:. rn!m*rnC.C.ldlcN
A ;
A
x
'd
0
A
b
&
H/h =2 ,d 1
. ... . .-.
..,I
.
; *;
; ,
rnW.LM'IIY-I*
a. -
A
:.
*
r ID'
H/h=5
,.--.
.rr.. I.
..... 1 ! & A d a ; a ;
~ L - C - ~ Y P A
I;;;;I:I n w b L L b . L ;
H/h= 8
r d
; z ~ c I z. I ;
ryr-yawru-asw
H
d
H / h = 20
Fig. 15. Eigenstructure results for the preconditioners for problems with various upscaling ratios H l h with the Arcoperm permeability field in Figure 14. The colored dots are eigenpairs for the .- coarse smoothing preconditioner, * coarse exact preconditioner, fine smoothing preconditioner, and subgrid upscaling preconditioner.
One last feature t o note is t h a t for all the convergent schemes, the error is monotonically decreasing (as is expected for a multigrid-like scheme). Contrast this with (preconditioned) conjugate gradients where the convergence is typically non-monotone (although theoretical bounds are monotone). Conjugate gradients is also more costly asymptotically. In the future we would like t o produce plots of the size of spectrum of the error versus iteration number. A break-down of inter-step frequency reduction (the spectrum before and after smoothing, and before and after upscaling) would also be useful. The ability t o visualize a given eigenmode (that is, make an n-dimensional plot of the pressure that corresponds t o an eigenvector) by clicking on the eigenstructure plot would be a handy tool. In future research, we hope t o be able t o prove t h a t our two-level scheme converges linearly (independent of h, but dependent on H l h and k*/k,), and does so faster than ordinary multigrid. Toward that end, the methods used in standard multigrid will certainly be useful as the upscaling scheme is
Darcy Flow, Multigrid, and Upscaling
359
Fig. 16. The Brent permeability field. The minimum base-10 logarithm of the
permeability is -0.6909 and the maximum 3.5230. posed in a variational setting. There are several monographs [Hac85, McC87, Wes92, Bra93, Hac93, Bru95, Saa95, Sha95, BHMOO, Bra011 and review articles [Xu92, Yse93, Xu97a, Xu97bj that provide useful information (several of the monographs on iterative methods generally). As well, Bramble, Pasciak, and Xu have developed a general framework for subspace correction methods [BPX91, BPWXgla, BPWXglb, Xu92, Bra93, Xu97a, Xu97bl. Their work will be useful in having developed a unified approach to multigrid-like methods, and also specifically because they treat the possibility of non-nested spaces for corrections ( V H , is~ not necessarily a subset of V h ) , and perturbed linear forms (e.g., using quadrature on the term (kuh,v h ) ) . Lastly, we might try to modify the upscaling preconditioner so that it itself is a smoother. Also, the upscaling preconditioner might be modified for use in discontinous Galerkin (DG) methods, or the expanded mixed method [AWY97].
360
M,Rath
James
. .
d
Fix h = 1/40 ; Vary H / h ; Fix k w/RTO .
,
,
.
,
,
,
.
lf
.
.
.
.
.
.
.
\.
.
d.
.
........
.
.
- ..
a .
"
I@',
A
.
.
.
.
.
.
..>
'
"
U
"
U
.
.
.
*;
.
;A
.
t
mid
'4 &
A
A
; *;
A
h
;A
.IIIC-.*cIp.dCI*
2
I lb
H/h =5
.
'*..,, ..-,
.
-
.
.
.
.
.
.
ih.
..
'
.
U
,:,
&
Hjh = 2
.. ..... 3:: . :. O
;
A &
mulbl-.Ips-dLW*
.
..
.Z
. U
'
!
%
1
.
< 1
II*(.IL-C~eLLUI
H/h=8
I
4
l
d
I
r d
A
I
~
U
D
I
U
I
I
B
rnlllnn..,.r*l.l...-N
U
t
A
. U
1
1
I
r d
H / h = 20
Fig. 17. Eigenstructure results for the preconditioners for problems with various upscaling ratios H l h with the Brent permeability field in Figure 16. The colored dots are eigenpairs for the coarse smoothing preconditioner, e coarse exact preconditioner, fine smoothing preconditioner, and 0 subgrid upscaling preconditioner.
Darcy Flow, Multigrid, and Upscaling
361
Fig. 18. Convergence histories (right column) of the two-level upscaling and multigrid schemes for the three sample permeability fields in Figures 10, 14, and 16. The vertical axis shows the logarithm of the residuals/errors; the horizontal axis the number of V-cycle iterations. The light and dark blue colored dots mark the sizes of the residual for the multigrid and upscaling schemes, respectively. The light and dark red dots mark the sizes of the error. For reference, the left column shows the eigenstructure diagrams for the corresponding permeability fields.
362
James M. Rath
0
10
20
90
40
-1
50
Brent (v = 5 )
I
80
70
..
10''
lo'*
BO
numbex
.
-
Resklual(UP) Ermr (UP) Residual (MQ) Ermr(MG)
.....
10'~-
......... ............ .-aL
-
...
........ . .....*..........-.
Fig. 19. More convergence histories of the two-level upscaling and multigrid schemes that illustrate the effect of and underestimating the number of necessary smoothings. The coloring scheme is the same as used in Figure 18. The diagram on the top is comparable to the one in the center-right of Figure 18; one smoothing per step are used instead of two. The diagram on the bottom is comparable to the one in the bottom-right of Figure 18; five smoothings per step are used instead of forty.
Darcy Flow, Multigrid, and Upscaling
363
References [AB85]
D. N. Arnold and F. Brezzi. Mixed and nonconforming finite element methods: implementation, postprocessing and error estimates. RAIRO Model. Math. Anal. Numer., 19:7-32, 1985. [AB02] T. Arbogast and S. L. Bryant. A two-scale numerical subgrid technique for waterflood simulations. SPE J., pages 446-457, Dec. 2002. [Arb] T. Arbogast. Analysis of a two-scale, locally conservative subgrid upscaling for elliptic problems. [ArbOO] T. Arbogast. Numerical subgrid upscaling of two-phase flow in porous media. In Z. Chen, R. E. Ewing, and Z.-C. Shi, editors. Numerical treatment of multiphase flows in porous media, volume 552 of Lecture Notes in Physics, pages 35-49. Springer, Berlin, 2000. [Arb02] T. Arbogast. Implementation of a locally conservative numerical subgrid upscaling scheme for two-phase Darcy flow. Computational Geosciences, 6:453-481, 2002. [Arc97] Arcoperm permeability field data. Personal correspondence, 1997. [AWY97] T. Arbogast, M. F. Wheeler, and I. Yotov. Mixed finite elements for elliptic problems with tensor coefficients as cell-centered finite differences. SIAM J. Numer. Anal., 34:828-852, 1997. [BDDF87] F. Brezzi, J. Douglas, Jr., R. Duran, and M. Fortin. Mixed finite elements for second order elliptic problems in three variables. Numer. Math., 51:237-250, 1987. [BDM85] F. Brezzi, J. Douglas, Jr., and L. D. Marini. Two families of mixed elements for second order elliptic problems. Numer. Math., 47:217-235, 1985. [Bea72] J. Bear. Dynamics of Fluids in Porous Media. Dover, New York, 1972. [BF91] F. Brezzi and M. Fortin. Mixed and hybrid finite element methods. Springer-Verlag, New York, 1991. [BHMOO] William L. Briggs, Van Emden Henson, and Stephen Fahrney McCormick. A Multigrid Tutorial. SIAM, 2000. [BLP78] A. Bensoussan, J. L. Lions, and G. Papanicolaou. Asymptotic Analysis for Periodic Structure. North Holland, Amsterdam, 1978. [BPWX91a] James H. Bramble, Joseph E. Pasciak, J. Wang, and Jinchao Xu. Convergence estimates for multigrid algorithms without regularity assumptions. Mathematics of Computation, 1991. [BPWXQlb] James H. Bramble, Joseph E. Pasciak, J. Wang, and Jinchao Xu. Convergence estimates for product iterative methods with applications to domain decomposition. Mathematics of Computation, 57:1-21, 1991. [BPX91] James H. Bramble, Joseph E. Pasciak, and Jinchao Xu. The analysis of multigrid algorithms with nonnested spaces or noninherited quadratic forms. Mathematics of Computation, 56:1-34, 1991. [Bra93] James H. Bramble. Multigrid Methods. Number 294 in Pitman Research Notes in Mathematics. Longman Scientific and Technical, New York, 1993. [BraOl] Dietrich Braess. Finite elements : theory, fast solvers, and applications in solid mechanics. Cambridge University Press, New York, second edition, 2001.
364 [Bru95]
James M. Rath
Are Magnus Bruaset. A Survey of Preconditioned Iterative Methods. Number 328 in Pitman Research Notes in Mathematics. Longman Scientific and Technical (John Wiley), New York, 1995. [CDGW03] Y. Chen, Louis J. Durlofsky, M. Gerritsen, and Xian-Huan Wen. A coupled local-global upscaling approach for simulating flow in highly heterogeneous formations. Advances in Water Resources, 2003. Submitted. [CH02] Z. Chen and T. Y. Hou. A mixed multiscale finite element method for elliptic problems with oscillating coefficients. Math. Camp., 2002. [CWW92] L. C. Cowsar, A. Weiser, and M. F. Wheeler. Parallel multigrid and domain decomposition algorithms for elliptic equations. In D. Keyes et al., editors. Fifth International Symposium on Domain Decomposition Methods for Partial Differential Equations, pages 376-385. SIAM, Philadelphia, 1992. [Dar56] Henry Darcy. The Public Fountains of the City of Dijon, chapter Appendix D. Victor Dalmont, Paris, 1856. [Dem97] James W. Demmel. Applied numerical linear algebra. SIAM, Philadelphia, 1997. [DEW83] J. Douglas, Jr., R. E. Ewing, and M. F. Wheeler. Approximation of the pressure by a mixed method in the simulation of miscible displacement. R.A.I.R.O. Model. Math. Anal Numer., 17:17-33, 1983. [DEW84] B. L. Darlow, R. E. Ewing, and M. F. Wheeler. Mixed finite element methods for miscible displacement problems in porous media, SPE 10501. Soc. Petrol Eng. J., 24:391-398, 1984. [DJ97] Clayton V. Deutsch and Andre G. Journel. GSLIB geostatistical software library and user's guide. Oxford University Press, New York, second edition, 1997. [DR85] J. Douglas, Jr. and J. E. Roberts. Global estimates for mixed methods for second order elliptic equations. Math. Comp., 44:39-52, 1985. [Dur02] Louis J. Durlofsky. Upscaling of geological models for reservoir simulation: Issues and approaches. Computational Geosciences, 6:1-4, 2002. [EHWOO] Y. R. Efendiev, T. Y. Hou, and X.-H. Wu. Convergence of a nonconforming multiscale finite element method. SIAM J. Numer. Anal, 37:888-910, 2000. [ERW84] R. E. Ewing, T. F. Russell, and M. F. Wheeler. Convergence analysis of an approximation of miscible displacement in porous media by mixed finite elements and a modified method of characteristics. Comp. Meth. in Appl. Mech. and Engng., 47:73-92, 1984. [Far02] C. L. Farmer. Upscaling: a review. International Journal for Numerical Methods in Fluids, 40(l-2):63-78, 2002. [FC79] R. A. Freeze and J. A. Cherry. Groundwater. Prentice-Hall, Englewood Cliff's, New Jersey, 1979. [G'*'85] James Glimm et al. Sharp and diffuse fronts in oil reservoirs: Front tracking and capillarity. In William E. Fitzgibbon, editor, Mathematical and Computational Methods in Seismic Exploration and Reservoir Modeling, pages 54-76, Philadelphia, 1985. SIAM. [GV96] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, third edition, 1996.
Darcy Flow, Multigrid, and Upscaling [GW88]
365
R, Glowinski and M. F. Wheeler. Domain decomposition and mixed finite element methods for elliptic problems. In R. Glowinski et al., editors, First International Symposium on Domain Decomposition Methods for Partial Differential Equations, pages 144-172. SIAM, Philadelphia, 1988. [Hac85] Wolfgang Hackbusch. Multi-grid methods and applications. Number 4 in Computational mathematics. Springer-Verlag, New York, 1985. [Hac93] Wolfgang Hackbusch. Iterative Solution of Large Sparse Systems of Equations. Number 95 in Applied Mathematical Sciences. SpringerVerlag, New York, 1993. [HFMQ98] T. J. R. Hughes, G. R. Feijoo, L. Mazzei, and J.-B, Quincy. The variational multiscale method—a paradigm for computational mechanics. Comp. Meth. in Appl. Mech. and Engng., 166:3-24, 1998. [HNOO] L. Holden and B. F. Nielsen. Global upscahng of permeability in heterogeneous reservoirs; the output least squares (OLS) method. Transport Porous Media, 40:115-143, 2000. [HW97] T. Y. Hou and X. H. Wu. A multiscale finite element method for elliptic problems in composite materials and porous media. J. Comput. Phys., 134:169-189, 1997. [Lin92] Brent Lindquist. Geostatistically generated permeability field data. Personal correspondence, October 1992. [McC87] Stephen Fahrney McCormick, editor. Multigrid methods. SIAM, 1987. [RdM97] Ph. Renard and G. de Marsily. Calculating equivalent permeability: a review. Advances in Water Resources, 20:253-278, 1997. [RT77] R. A. Raviart and J. M. Thomas. A mixed finite element method for 2nd order elliptic problems. In Mathematical Aspects of Finite Element Methods, number 606 in Lecture Notes in Mathematics, pages 292-315. Springer-Verlag, New York, 1977. [RT91] J. E. Roberts and Jean-Marie Thomas. Mixed and hybrid methods. In P. G. Ciarlet and J. L. Lions, editors, Handbook of Numerical Analysis, volume 2, pages 523-639. Elsevier Science Publishers B.V. (NorthHolland), Amsterdam, 1991. [RW83] T. F. Russell and M. F. Wheeler. Finite element and finite difference methods for continuous flows in porous media. In R. E. Ewing, editor, The Mathematics of Reservoir Simulation, number 1 in Frontiers in Applied Mathematics, pages 35-106, Chapter II. Society for Industrial and Applied Mathematics, Philadelphia, 1983. [Saa95] Yousef Saad. Iterative Methods for Sparse Linear Systems. Brooks/Cole, 1995. [Sch74] A. E. Scheidegger. The Physics of Flow Through Porous Media, 5rd ed. University of Toronto Press, Toronto, 1974. [Sha95] V. V. Shaidurov. Multigrid methods for finite elements, volume 318 of Mathematics and its applications. Kluwer Academic Publishers, Boston, 1995. [SP80] E. Sanchez-Palencia. Non-homogeneous Media and Vibration TheoryNumber 127 in Lecture Notes in Physics. Springer-Verlag, New York, 1980. [WEH02] X. H. Wu, Y. Efendiev, and T. Y. Hou. Analysis of upscaling absolute permeability. Discrete Contin. Dyn. Syst. Ser. B, 2:185-204, 2002.
366
James M. Rath
[Wes92] [WGH96]
[Xu92] [Xu97a]
[Xu97b]
[Yse93]
Pieter Wesseling. An Introduction to Multigrid Methods. John Wiley, New York, 1992. Xian-Huan Wen and Jaime J. Gomez-Hernandez. Upscaling hydrauhc conductivities in heterogeneous media: An overview. Journal of Hydrology, 1996. Jinchao Xu. Iterative methods by space decomposition and subspace correction: A unifying approach. SIAM Review, 34:581-613, 1992. Jinchao Xu. An introduction to multigrid convergence theory. In Raymond H. Chan, Tony F. Chan, and Gene H. Golub, editors, Iterative methods in scientific computing, Singapore, 1997. Springer. Jinchao Xu. An introduction to multilevel methods. In M. Ainsworth, J. Levesley, M. Marietta, and W. A. Light, editors. Wavelets, Multilevel Methods and Elliptic PDEs, Numerical Mathematics and Scientific Computation (Oxford science publications), pages 213-302, Oxford, 1997. Clarendon Press. Harry Yserentant. Old and new convergence proofs for multigrid methods. Acta Numerica, 3:285-326, 1993.
Iterated Adaptive Regularization for the Operator Equations of the First Kind * Yanfei Warig^ and Qinghua Ma^ ^ State Key Laboratory of Remote Sensing Science, Jointly Sponsored by the Institute of Remote Sensing Applications of Chinese Academy of Sciences and Beijing Normal University, P.O. BOX 9718, Beijing 100101, P.R.China, yf wang.ucf Qyahoo . com Department of Mathematics, University of Central Florida, P.O.BOX 161364, Orlando, FL 32816-1364. yfwaiig9math.ucf.edu ^ Department of Information Sciences, College of Arts and Science of Beijing Union University, Beijing, 100038, P.R.China, qiiighua9ygi.edu.cn S u m m a r y . The adaptive regularization method is first proposed by Ryzhikov et al in [RB98] for the deconvolution in elimination of multiples which appears in geoscience and remote sensing. They have done experiments to show this method is very effective. This method is stronger than the Tikhonov regularization in the sense that it is adaptive, i.e., it automatically eliminates the small singular values of the operator when which is nearly singular. In this paper, we will show that the adaptive regularization can be implemented in an iterated way. Particularly, we show that if some priori knowledge-based information (i.e., a priori strategy for choosing the regularization parameter) is known in advance, the order of the convergence rate can approach 0{5'>''+^) for some i/ > 0. A posteriori strategy for choosing the regularization parameter is also introduced, the regularity is proved.
1 Introduction Many inverse problems of mathematical physics and problems of remote sensing lead to the solution of the first kind operator equations Tx = y,
(1)
where T is a bounded linear operator from Hilbert space X into the Hilbert space Y and T is often known as a convolution operator. Assume t h a t we are interested in the MA'^^'-soIution (minimum-norm least-squares solution) x"*" of equation (1), then it is well-known t h a t The work is partly supported by Chinese national 973 research project G20000779.
368
Yanfei Wang and Qinghua Ma x+=T+2/,
(2)
where T+ denotes the Moore-Penrose generalized inverse of T'^. For nonclosed range R{T) of T, the MA'^S'-solution x'^ exists only for D{T+) = R{T) + B,{T)^
CY
and depends discontinuously on the right-hand side. A prototype for such ill-posed problems is the Predholm integral equations of the first kind {Tx){t):=
/ k{s,t)xit)dt Jo
= y{s),
SG[0,1],
where A; is a non-degenerate L^-kernel and X = Y = L^[0,1] (see [EHN96, Tik63]). In many applications, the right-hand side can not be obtained exactly, that is to say, only some ys GY is available, \\y-ys\\<5,
(3)
where 5 is a priori known error level. Since T+ is unbounded, T'^ys is not a reasonable approximation to T+y, even if it exists. Because of this problem, one has to use the regularization method for approximating T'^y. A well-known method for the solution of ill-posed problems is the Tikhonov regularization, which is considered by Tikhonov, 1963 (see [Tik63]) and independently by Philhps, 1962 (see [Phi62]). For a > 0, we denote x" as the unique solution of (T*T + aI)x = T*ys, (4) where a is called the regularization parameter. For the theory of Tikhonov regularization, we refer the well written book by Groetsch, 1984 (see [Gro84]). It is well-known that if the regularization parameter a is chosen in dependence of 5 such that (see [Tik63]) (52 hm —T— = 0, and lim aid) = 0,
5—^0 a{S)
5^0
^ '
then lim | | x f ) - T + y | | = 0 . 0
>U
If the exact solution fulfills the smoothness property T+y e R{{T*Ty) for some i/ £ (0,1], then for an priori choice of a such as a{5) = c 5 ^ , c > 0, one obtains the convergence rate (see [Gro84, Neu97])
(5)
Iterated Adaptive Regularization for the Operator Equations
\\xf^^
369
-T+y\\=0{5^^).
The standard Tikhonov regularization can be implemented in an iterated way, we refer [EHN96, HG98, KC79, Gfr87] for details. Schock in [Sch84] considered the regularization with adjoint operators. Recently Ryzhikov et al [RB98] and Wang [Wan02] have considered the adaptive regularization method (simply denote by AR) for solving ill-posed inverse problems (1) with adjoint operator. Ryzhikov et al have utihzed this kind of method for solving deconvolution problems in elimination of multiples successfully. Denote H = T*T, which is an adjoint operator, Ryzhikov et al's adaptive regularization is based on the minimization problem min J[a,a;] = \\Tx - y^f + a\\x\\l,
(6)
where \\X\\D := ^/{Dx,x), i? is a definite or semi-definite operator. Choosing D = H"^ and denoting x" the solution of (6), then for a > 0, Ryzhikov et al obtain (if 2 + al)x1 = Hzs, (7) where zs = T*ys. The filtering function of the AR is defined as i?'^^(A) = A(A^ + a)~^. The remarkable difference between the adaptive regularization and the Tikhonov regularization lies in that the former can simultaneously ehminate the null space elements of the operator when it approaches singular while the later needs the regularization parameter to suppress the singularity. The adaptive regularization (7) can be implemented in an iterated way: (F^ + al)xf, = axi_i
+Hzs,
a > 0.
(8)
We will analyze the convergence properties of the iterated AR in the section 2. In section 3, we will prove that the convergence rate of iterated AR can be up to 0(5'^"+^) for any z^ > 0 if a priori strategy is satisfied. In section 4, we show that if the discrepancy principle is chosen as a posteriori strategy, then we can only obtain the optimal convergence rate 0((52"'+i) for some v > ^•
2 Convergence of t h e Iterated AR
(lAR)
For noise-free data, the iterated version of the AR is generated as {H'^+ aI)xn = aXn-i+Hz
(9)
or Xn = a{H'^ + aI)-^Xn^\
+ {H'^ + aI)-^Hz,
(10)
where z = T*y. For simplicity, we choose the initial guess value XQ = 0. Note that the operators [H"^ -t- al)~^ and {H'^ + aI)~^H
are both every-
where defined and bounded with 11(^2-fa/)-i|| < - and WiH"^+ aI)-'^H\\ <
370
Yanfei Wang and Qinghua Ma
1 . Therefore, for each fixed n, the sequence {a;„} generated by (9) is stable with respect to perturbations in y. By induction, (10) is transformed to n-l
J2 a^{H^
+ ocI)-^-^Hz
:= RI'^^{H)Z,
(11)
fc=0
where XQ := 0 and R(/\^{X) is defined as
«r(A)-i:ix4^4(i-(^)"). fc=0 ^
(12)
'
We also have that \\R'''''{H)\\ < p | (13) It is easy to see that i?^-^^(A) —> A^^ as a —> 0 and Rn^^{X) —> 0 as A —> 0. This is a major difference from traditional Tikhonov regularization method. We have known that the Tikhonov filtering function F{F'^'^^{\) = (A + a)^^ —> a~i as A —> 0. This indicates that if the operator T*T is degenerated and has an eigenvalue being null, the /Ai?-inverse operator eliminates null-space components at all. In the following, we analyze some of the convergence properties of the lAR when the data are noise-free. Lemma 1. Let x he any solution of Tx = Qy, where Q is the orthogonal projection ofY onto R{T). If {xn} is given by (9) or (10) then for alln> 1, we have x~Xn==a''{H'^ + aI)-''x. (14) Proof Since Tx = Qy, hence Hx = z. Prom (10) we know x-xi
= X- {H'^ + aiy^Hz = [I~~{H^+aI)-^H^]x
= a{H'^ +aiy^x; X ~ Xn = X — a{H'^ + aI)~^Xn-i = a{H'^ + aiy^{x
— {H + al)"
Hz
~ x„_i).
By induction and notice that XQ := 0, we obtain the results immediately. D Denote r„(A) = ( A ^ + O ) " ' ^'^'^ "•°^® ^^^^ •"''^ ~ '^^V^ '^^ h&ve Lemma 2. x+ — x„ =
rn{H)x'^.
Clearly, a;„ —> x'^ for any a > 0 as n —> oo. If in addition some "smoothness" conditions are given, i.e., x+ = Wtu for some w S D{H'^) and u > Q, then we have the following results:
Iterated Adaptive Regularization for the Operator Equations
371
Lemma 3. a;+ — a;„ = Sn,u{H)u), where Sn,v{X) := A'^r„(A). The error analysis will hinge on an investigation of the function A
-T ex
where u > 0, a > 0 are given parameters. As we are interested in fixed f > 0 and n —> oo, we shall assume that n > i^/2 (note that for v > 2n, Sn,(^(A) is increasing in A). An easy calculation shows that s^ ^^(A) = 0 if and only if A* =
( ^ ) * , 2n - w
thus max s„,^(A) = s„,i/(A*) A€[0,oo) 2n-L' =
( ,2n
) — 1/.
<
. 2 (1
V )"
V
)—'
av
(15)
Theorem 1. If x'^ = H'^uj for some u > Q and lo € D{H'^), then
\\x+-xJ
Theorem 1 indicates that the approximation error can be expressed as 0{n~'^), thus the behavior of the iteration index k also serves as a regularization parameter.
3 Regularity under a Priori Strategy Assume that ys is an approximation to the data y with error level 5 such that \\y — ysW < S- Let {x^} be the sequence generated by (10) using the data ys, i.e., xi = a{H^ + aiy^xi_^ + {H^ + aiy^Hzs (16) or for simplicity <
= R'n'^aWzs,
(17)
where zs = T*ysWe call the sequence {.T^} is stable, if \\xf^ — Xn\\ —» 0 as 5 —» 0. From (12) one can easily calculate that
372
Yanfei Wang and Qinghua Ma r„(A) + Afl^^„«(A) = l.
Since 0 < r„(A) < 1, hence 0 < 1 - Ai?^^„^(A) < 1. We now derive a stability estimate for the sequence xf^. Since x„, x^ G D{T), and note that (13), we have by simple calculation that
=
{K'^^{H){z-zs),Ri^^{H){z~zs))
<5'^\\T\\-'^. Furthermore, by (12) and using triangular inequahty, we have ,iARn^
^ 1- .
"
^
A2 + 2a A(A2+a)'
The maximum value of the function ._ A^ + 2a ^^^' - A(A2 + a) is /max(A) — y ^ . Hence the error ||.T„ — a:;^||^ can be further estimated as 11 -^n ~ 2;„ 11 = [Xn ~ X^, Xn — X^ )
= ( r < ^ „ « ( F X „ « ( F ) r * ( y - ys),y~
ys)
<\\Ri^J^iH)Ri^J^iH)H\\\\y~ysf <\\R'^^^{H)\\\\y-ysf
< 2v^ and hence
x.'ll<
^ '
\/2^'
By (17), we also have \\Txr,-Txi\\
=
\\TRi'^J'{H){z-zs)\\
<\\T\\5\\T\\-'=S. With the above analysis, we write a lemma as follows: Lemma 4. For the sequence {xn}, {xf^}, we have the following stability results:
5„ . V^s and \\Tx„~-Tx'J<5.
Iterated Adaptive Regularization for the Operator Equations
373
Theorem 2. If a{5,ys) —> 0 and -—-z—r- —> 0 as 5 —> 0 and n —> oo, a{5,y5) then xi := R'„^^{H)zs -^ x+ = T+y. Proof. By Lemma 2 and Lemma 4, the proof of the result is straightforward. D Compared with the Tikhonov regularization, this is a much better result, which provides more stronger regularization. As long as n —> oo and 5 —> 0, then for any regularization parameter a{6) = 6'', k < 4, the iterated AR converges, while for Tikhonov regularization, a{6) cannot be exceeded to 6^. Lemma 5. Let x'^ ^ 0. Then for all 5 > 0 there exists a unique a{S) satisfies the equation \\xn-x+\\=6/^.
(18)
Proof. Since Tx^ = Qy, from (11) we have a:„ - a;+ = -a"(fl-2 ^
al)-''x+.
Denoting the spectral family of the operator H = T*T as E\, we have
Hence if we define
« ) - /o/
7i(^^d\\E,x+f-5'^
(A2 + a)2»
then by definition x"*" G N{T)'^ and since x'^ ^ 0, we know that 0(a) is continuous and strictly monotonically increasing with lim (/»(«) = —S'^ and a—^0
lim
(/)(Q;)
= oo,
a—>oo
this proves the lemma. D Now we prove the rate of convergence for the iterative method (10). Theorem 3. Let x+ G R{H") and let a[5) he as in Lemma 5, then \\xf^,g-. — x+\\ =
0{5^).
Proof. Let a = a{5) be as in Lemma 5 and define \\Xn C5
••=
X ;;
a" then by (18) we obtain and a can be expressed as
,
374
Yanfei Wang and Qinghua Ma
Now by Lemma 4, ||4(5) - ^'^ II < W^niS) -X+\\ + \\xi(5) - ^niS) \\
This completes the proof. D
4 A Posteriori Choice of the Iteration Index Prom the former section we know that the optimal order of convergence is obtained if the choice of a{5) is in an apriori way, i.e., [|x-„ — x+H = bj\foi. However this is not applicable in applications. Note that by Bernoulli's inequality, we also have
^A2 + a '
'
A2 + a '
"
\^ A-a
hence
<«^(^)=^A^^25S-
(^^)
In this way,
= {Ri^^{H)T*{ys = {Ri^^{TT*)TT*{y,
- y), i?^^„«(ff)T*(y, - y)) - y),Rl^^{H){ys
- y))
2V^ Thus X rj,
X,
5 I
n
,1
Therefore, if the natural numbers n = n[5) is chosen such that n{5) —> oo and (5(n(5))2 —^ 0 as (5 -^ 0, then we also have ||a;^ ^ 3;+|| ^ 0 as 5 —> 0. Numerically, we can not expect too many iteration steps. This means that the iteration index must be controUed in some way such that n{5) is a finite number. In practice, a posteriori way will be better. A popular posteriori way is the discrepancy principle: the iteration process should be stopped at the first occurrence of the index n{5) such that \\Tx'„^s)-y5\\
(21)
Iterated Adaptive Regularization for the Operator Equations
375
with T > 1 another parameter. Note that r„(A) = 1 — Ai?^^^(A) —> 0 as n —> oo, hence the discrepancy ^^n(5) ~ ys satisfies \\ys-Txi^s)\\
=
\\ys~TRi'^''{H)T*ys\\
<e, where e is an arbitrarily small number. This shows that the discrepancy principle (21) will terminate the iteration after n{S) < oo iterations. In the following we will analyze that the iteration with the discrepancy principle as the stopping rule is a regularization. Theorem 4. If n{5) is chosen by the above stopping rule, then l b - T x „ ( 5 ) ! | < ( r + l)(5, ||y-Ta;„(5)_i||>(r-l)5.
(22) (23)
Proof. Note that 1 - r„(A) < 1 and y - Txn-i
- !/5 - Txi^i
- Rltl^{TT*)TT*{ys
- y)
and y - Tx„ = 2/5 - Txi + Rl,^^{TT*)TT*{y
^ y,),
so let n = n{5) and by triangular inequalities, we have \\y - rx„(5)_i|| = \\ys - Txi^s)-i
" (-^ " ^n(5)^i(rr*))(y5 - y)||
> ys - Tx'n(s)-i\\ - W - r„(5)„i(TT*))(j/5 - 2^)11 >TS-5 = (r-l)(5; \\y - Tx„^s)\\ = \\y6 - Tx'^nis) + {!- r„^s){TT*)){y - 2/5)|| < \\ys - Txi^5)\\ + \\{I- rn(6){TT*)){y - ys) < T5 + (5 = (r + 1)5.
Theorem 5. Assume that 0;+ G R{H''), v > ^, x^,^. is the solution of (1) when ys instead of y is given and n{S) is chosen according to (21), then for fixed a, ||x^(^) - x+1| = 0((5 w ) . Proof. Suppose a;"*" = //"w, where LO is normahzed and to S R{H''), then Xn{,5) -X'^
=
rn{H)H''uj.
v > \,
376
Yanfei Wang and Qinghua Ma
By Holder's inequality and (22) of Theorem 4, we obtain Xn(S)-X+\\
<
\\rn{H)L0\\^^\\H-^rn{H)H''ij\\^^ 1
1
< ||w|h-Vl||ff2(a;'Cn(5) , . _a;+)||2'+l =
0{S^STT).
Now assume that n{S) > [ ^ 1 , by (23) of Theorem 4, (r-l)5<||y-Ta;„(5)_i|| = ||Tx+ -ra;„(5)_i|| = ||Tr„(a)_i(ff)ff-a;||. By (15), we find \\Tr^i5)-iiHWtof
= (l/r„(5)_i(F)/f'^a;,r„(5)_i(F)/f^a;) n(o)
'n(d)
n(d) where -Ei_jy and i?2,i/ are two constants with respect to v. Thus
where Ei, = ^JE^.
This shows n((5)
(25)
where Fj, = (:;^)^J^^. Now by (20), we obtain
= C)(55I:TT) + 0((5217TT)
(for fixed a)
= 0((5^). D
Remark. Theorem 5 shows that, using the discrepancy principle as a posteriori strategy, we can not expect to obtain the optimal convergence rate 0((5''''+i) as in section 3. In fact, we even can not obtain the optimal convergence rate Oib"^"^^). This indicates that, sometimes, if some priori knowledgebased information is known in advance, then we can get good results. This phenomenon is particular useful in remote sensing (see [LGWSOl]). But for the posteriori strategy, recalhng that in section 3, ||x„ — x* || can be bounded by f^Ar- and noting that (20), we conclude that even for the posteriori strategy, it does not need too many iterations. If we choose only n = 3 iterations to generate convergence.
> 0, then it needs
Iterated Adaptive Regularization for the Operator Equations
377
References [EHN96] Engl, H.W., Hanke, M. and Neubauer, A. (1996), Regularization of Inverse Problems, Kluwer, Dordrecht. [Gfr87] Gfrerer, H. (1987), "An A-Posteriori Parameter Choice for Ordinary and Iterated Tikhonov Regularization of Ill-posed Problems Leading to Optimal Convergence Rates," Mathematics of Computation, Vol. 49, 507-522. [Gro84] Groetsch, C.W. (1984), The Theory of Tikhonov Regularization for Fredholm Equations of The First Kind, MA:Pitman, Boston. [HG98] Hanke, M. and Groetsch, C.W. (1998), "Nonstationary Iterated Tikhonov Regularization," Journal of Optimization Theory and Applications, Vol. 98, 37-53. [KC79] King, J.T. and Chillingworth, D. (1979), "Approximation of Generalized Inverses by Iterated Regularization," Numer. Fund. Anal. Optim., Vol. 1, 499-513. [LGWSOl] Li, X.W., Gao, F., Wang, J.D. and Strahler, A. (2001), "A priori knowledge accumulation and its application to linear BRDF model inversion". Journal of Geophysical Research, Vol. 106, 11925-11935. [Neu97] Neubauer, A. (1997), "On Converse and Saturation Results for Tikhonov Regularization of Linear 111-Posed Problems," SIAM.J.Numer. Anal, Vol. 34, No.S, 517-527. [Phi62] Phillips, D.L. (1962), "A Technique for the Numerical Solution of Certain Integral Equations of the First Kind," J. Assoc. Comput. Mach., Vol. 9, 84-97. [RB98] Ryzhikov, G.A. and Biryulina, M.S. (1998), "Sharp Deconvolution in Elimination of Multiples," 68th Ann. Internat. Mtg. Soc. Expl. Geophys., 1983-1986. [Sch84] Schock, E. (1984), "Regularization of Ill-posed Equations with Selfadjoint Operators," in: German-Italian Symposium on the Applications of Mathematics in Technology (Eds: Boffi, V. and Neunzert, H.), Teubner, Stuttgart, 340-351. [Tik63] Tikhonov, A.N. (1963), Regularization of Incorrectly Posed Problems, Sov.Math.Doklady, Vol. ^ 1624-27. (translation from Dokl.Akad.Nauk SSSR 153, 49-52, 1963) [Wan02] Wang, Y.F., On the Optimization and Regularization Methods for Inverse Problems, Ph.D Thesis, Academy of Mathematics and System Sciences, Chinese Academy of Sciences, June, 2002.
Recover Multi-tensor Structure from H A R D MRI Under Bi-Gaussian Assumption* Qingguo Z e n g \ Yunmei C h e n \ Weihong Guo^, and Yijun Liu^ ^ Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. {qingguo, yun, guo }9niath. uf 1. edu ^ Department of Psychiatry, University of Florida, Gainesville, FL 32611,-USA. yijunliuQpsychiatry.uf1.edu Summary. We present a variational framework for determination of intra-voxel fiber orientations from High Angular Resolution Diffusion-Weighted (HARD) MRI under the assumption of biGaussian diffusion. The approach is simultaneously estimating and regularizing the two tensor fields and the field of the proportionality corresponding to the mixture of two Gaussians. The prior information on location of the voxels with strong isotropic or one-fiber diffusion is incorporated into the energy functional in order to increase the accuracy of the estimations. The prior information on voxel classification is obtained from the spherical harmonic (SH) representation of the apparent diffusion coefficient (ADC) profiles estimated and regularized simultaneously from noisy HARD data. The performance of the proposed model has been evaluated on the human HARD MR images, and the experimental results indicate the effectiveness of the model in recovering intra-voxel multi-fiber diffusion. K e y w o r d s : Multi-fiber, biGaussian diffusion, spherical harmonics, regularization.
1 Introduction Diffusion-weighted (DW) imaging adds to conventional MRI the capability of measuring the random motion of water molecules, referred as diffusion. T h e diffusion of water molecules in tissue over a time interval t can be described by a probability density function pt on the displacement r. Since Pt(r) is largest in the directions of least hindrance to diffusion and smaller in other directions, the information about Pf (r) reveals fiber orientations and leads to meaningful inferences about the microstructure of the tissues. Since pt(r) is related with D W signal s(q) through the equation: s{q) =^ so f Pt{r)e-'^''dT, * Thanks to NIH/DA016221 for funding.
(1.1)
380
Qingguo Zeng, Yunmei Chen, Weihong Guo, and Yijun Liu
where q is the direction of the diiTusion sensitizing gradients, and SQ is the signal in the absence of any gradient, it can be estimated from inverse Fourier transform (FT) of s(q)/so- However, this requires a large rmmber of measurements of s(q) over a wide range of q in order to perform a stable inverse FT. In ([TWBW99]) Tuch et al. introduced the method of high angular resolution diffusion-weighted (HARD) imaging. In [WRTOO] Wedeen et al. succeed in acquiring 500 measurements of s(q) in each scan to perform a fast FT inversion. One of the alternative for recovering the intra-voxel structure is using the information of ADC profiles d, that is defined for DW MRI by
d(q) = -hog'M, 0
(1.2)
So
If pt is a Gaussian, d{u) = bu^Du and s(u) = SQC^^ '-'" where D is the diffusion tensor, b = t | q p is the diffusion-weighting factor, and u = q/|q|, This model is useful for detecting single fiber diffusion. ([BML94a, BML94b, CBP90]) However, it has been recognized that the Gaussian model is inappropriate for assessing multiple fiber tract orientations([BML94b, AHLOl, FVaOlb, TWBW99, WRTOO, FraOla]). A simple alternative is assuming that pt is a mixture of n Gaussians. Then the diffusion is modeled by n
s(u) = s o E / i e - ' ' " " ^ ' " ,
(1.3)
where /j > 0 and Yl^=i fi = ^- The /j is considered as the apparent volume fraction of the voxel with diffusion tensor Di. Recently, Parker et al [PA03] and Tuch et al. ([TRW02]) used a mixture of two Gaussian densities to model the diffusion for the voxels where the Gaussian model fits the data poorly. Such voxels were identified by using the SH representation of d{u) in [PA], and the feature of multiple max/min of d{u) in u in [TRW02]. In this note we present a new variational method for recovering the intravoxel structure under the assumption that pt is a mixture of two Gaussians. Our approach differs from the existing methods in the following aspects. First, we recover each field Di{x) or /i(x) globaUy by simultaneous smoothing and data fitting, rather than estimating them from (1.3) in each isolated voxel, which leads to an ill-posed problem. Second, we recover the ADC profile d{x, 9,4>) in SH representation from the noisy HARD data before estimating -Di(x) and /i(x). The recovered d and the voxel classification on diffusion anisotropy from d are incorporated in our energy function to raise the accuracy of the estimations. Third, we applied the biGaussian model to all the voxels in the field, rather then the voxels where the Gaussian model fits poorly only. Since both the constraint of / i w 1 on the region of strong one-fiber diffusion, and the regularization for /j and Di are built in the model, the single fiber and multi-fiber diffusions can be separated automatically by the model solution. This approach will be less sensitive to the errors in the voxel classification.
2 Recovery of ADC Profiles

One of the promising approaches for identifying voxels containing crossing fibers uses the SH representation of the ADC profiles estimated from HARD data; it was initiated by Frank [Fra01b] and also studied in [ABA02]. In their work d(x, u) was computed from the HARD raw data via (1.2) and represented by a truncated SH series:

    d(x, θ, φ) = Σ_{l=0}^{l_max} Σ_{m=-l}^{l} A_{l,m}(x) Y_{l,m}(θ, φ),        (2.1)
where the Y_{l,m}(θ, φ) are the spherical harmonics. The odd-order terms in the SH series are zero, since d is real and antipodally symmetric. If l_max = 2, (2.1) provides a model equivalent to Gaussian diffusion. Their experimental results showed that profiles with significant 4th-order components arise consistently in various regions of the human brain where two-fiber crossings are known to exist. The A_{l,m}'s were computed from the inverse SH transform of d in [Fra01b], and as the least-squares solutions of (2.1) in [ABA02]. Recently, Chen et al. [CGZ04a] proposed a simultaneous estimation and smoothing model for recovering d(x, θ, φ) from the noisy HARD measurements s(q). In [CGZ04a] d was represented by (2.1) with l = 0, 2, 4, and the A_{l,m}(x)'s were estimated by solving the following constrained minimization problem:

    min ∫_Ω { Σ_{l=0,2,4} Σ_{m=-l}^{l} |∇A_{l,m}(x)|^{p(x)} + |∇s_0(x)|^{q(x)}
        + λ_1 ∫_0^{2π} ∫_0^{π} | s(x, θ, φ) - s_0(x) e^{-b d(x,θ,φ)} |^2 sin θ dθ dφ
        + λ_2 | s_0(x) - ŝ_0(x) |^2 } dx,                                      (2.2)

with the constraint

    d(x, θ, φ) = Σ_{l=0,2,4} Σ_{m=-l}^{l} A_{l,m}(x) Y_{l,m}(θ, φ) ≥ 0.         (2.3)

In (2.2) ŝ_0 denotes the measured signal in the absence of diffusion weighting, and p(x) and q(x) were chosen such that the smoothing for A_{l,m}(x) and s_0(x) is isotropic in homogeneous regions and anisotropic along their edges to preserve the relevant features. Their experiments demonstrated the effectiveness of this method in recovering d and in enhancing diffusion anisotropy. The characterization of non-Gaussian diffusion based on the recovered d showed that the results from their human HARD MRI data are consistent with the known neuroanatomy.
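As an illustration of the representation in (2.1) and the nonnegativity constraint (2.3), the following sketch (our own) evaluates an even-order SH series at sample directions; the real-valued basis is assembled from scipy's complex spherical harmonics, and the coefficient values are made up.

```python
import numpy as np
from scipy.special import sph_harm

def real_sph_harm(l, m, theta, phi):
    """Real spherical harmonic; theta = polar angle, phi = azimuth.

    Note scipy.special.sph_harm(m, l, az, pol) takes the azimuthal angle first.
    """
    if m > 0:
        return np.sqrt(2.0) * (-1.0) ** m * np.real(sph_harm(m, l, phi, theta))
    if m < 0:
        return np.sqrt(2.0) * (-1.0) ** m * np.imag(sph_harm(-m, l, phi, theta))
    return np.real(sph_harm(0, l, phi, theta))

def adc_from_sh(coeffs, theta, phi):
    """Evaluate the truncated even-order SH series of (2.1)/(2.3).

    coeffs: dict mapping (l, m) -> A_{l,m}, with l in {0, 2, 4}.
    """
    d = np.zeros_like(np.asarray(theta, dtype=float))
    for (l, m), a in coeffs.items():
        d = d + a * real_sph_harm(l, m, theta, phi)
    return d

# Assumed coefficients for one voxel (illustration only, not from the paper).
coeffs = {(0, 0): 1.0, (2, 0): 0.3, (4, 0): 0.05}
theta = np.linspace(0.0, np.pi, 30)          # polar angle samples
phi = np.zeros_like(theta)                   # fixed azimuth
d = adc_from_sh(coeffs, theta, phi)
print(bool(np.all(d >= 0)))                  # constraint (2.3) checked on samples
```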
3 Determination of Fiber Numbers

In our approach to determining fiber directions, the first step is to recover the ADC profiles d from the noisy HARD data by using model (2.2)-(2.3). Then,
from the SH representation of the recovered d we define

    R_0(x) = |A_{0,0}(x)| / A(x),    R_2(x) = Σ_{m=-2}^{2} |A_{2,m}(x)| / A(x),

where A(x) = Σ_{l=0,2,4} Σ_{m=-l}^{l} |A_{l,m}(x)|. The voxels with significant R_0 and R_2 are identified as strong isotropic diffusion and one-fiber diffusion, respectively. The union of all these voxels is denoted by Ω_1. On Ω_1, f_1 in (1.3) should be close to 1. Under the assumption that p_t is a mixture of two Gaussians, the diffusion is modelled by (1.3) with n = 2. To estimate D_i and f_i in (1.3) we minimize the following functional:
    min ∫_Ω ( Σ_{i=1}^{2} |∇L_i|^{p_i(x)} + |∇f|^{p_f(x)} ) dx + λ_1 ∫_{Ω_1} (f - 1)^2 dx
        + λ_2 ∫_Ω ∫_0^{2π} ∫_0^{π} | s_0(x) Σ_{i=1}^{2} f_i e^{-b u^T D_i u} - s(x, θ, φ) |^2 sin θ dθ dφ dx,      (3.1)

where f_1 = f and f_2 = 1 - f, with the constraint that the diagonal entries of L_i are positive. In (3.1), for i = 1, 2, λ_i > 0 is a parameter,

    p_i(x) = 1 + 1 / (1 + k |∇G * ∇L_i|^2),    p_f(x) = 1 + 1 / (1 + k |∇G * ∇f|^2),

L_i is a lower triangular matrix such that D_i = L_i L_i^T, that is, the Cholesky factorization of D_i, which enforces the positive-definiteness constraint on D_i (see [WVCM03]), and |∇L_i|^{p_i} = Σ_{1 ≤ m,n ≤ 3} |∇L_i^{mn}|^{p_i}.
The first two terms in (3.1) are the regularization terms. By the choice of p_i(x) (and similarly p_f), in homogeneous regions the image gradients are close to zero and p_i(x) ≈ 2, so the smoothing is isotropic. Along edges, the image gradients make p_i(x) ≈ 1, so the smoothing is total-variation based and acts only along the edges. At all other locations the image gradient forces 1 < p < 2, and the diffusion is between isotropic and total-variation based, varying with the local properties of the image. Therefore, the smoothing governed by this model preserves the relevant features in these images. The third term in (3.1) forces f ≈ 1 on Ω_1. The last term is the nonlinear data-fidelity term based on (1.3). The fiber orientations at each voxel are determined by the directions of the principal eigenvectors of D_1 and D_2. For the voxels where f (or 1 - f) is significantly large, we consider 1 - f (or f) as zero, and (1.3) reduces to the Gaussian diffusion model.
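As a concrete illustration of the Cholesky parametrization used in (3.1), the following sketch (our own, with made-up entries) assembles D = L L^T from a lower-triangular factor and extracts the principal eigenvector whose direction gives the fiber orientation.

```python
import numpy as np

def tensor_from_cholesky(l_entries):
    """Build D = L L^T from the 6 entries of a lower-triangular 3x3 factor L.

    Any L with positive diagonal gives a symmetric positive-definite D, which is
    how the positivity constraint on D_i is enforced in (3.1).
    """
    l11, l21, l22, l31, l32, l33 = l_entries
    L = np.array([[l11, 0.0, 0.0],
                  [l21, l22, 0.0],
                  [l31, l32, l33]])
    return L @ L.T

def principal_direction(D):
    """Unit eigenvector of D with the largest eigenvalue (the fiber direction)."""
    w, V = np.linalg.eigh(D)     # eigenvalues returned in ascending order
    return V[:, -1]

# Assumed entries for illustration only.
D = tensor_from_cholesky([0.04, 0.0, 0.015, 0.0, 0.0, 0.015])
print(principal_direction(D))    # close to the x-axis for this choice
```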
4 Experimental Results

We applied model (3.1) to a set of human HARD MRI data. The raw HARD MR images were obtained using a single-shot spin-echo EPI sequence.
The imaging parameters for the DW-MRI acquisition are repetition time (TR) = 1000 ms and echo time (TE) = 85 ms. Diffusion-sensitizing gradient encoding is applied in fifty-five directions with b = 1000 s/mm^2. Thus, a total of fifty-six DW images with matrix size 256 x 256 were obtained for each slice, and images through the entire brain were obtained with 24 slices. The slices are transversally oriented with a thickness of 3.8 mm and an intersection gap of 1.2 mm between two contiguous slices. The field of view (FOV) is 220 mm x 220 mm. The figures below show our experimental results from a particular subject in a brain slice through the external capsule. In this experiment we first recovered the ADC profiles d using (2.2)-(2.3), and defined Ω_1 as the set of voxels where R_0 > 0.8416 or R_2 > 0.1823. These thresholds were selected using the histograms of R_0 and R_2. Then we solved the minimization problem (3.1) by the energy descent method. The information f ≈ 1 on Ω_1 was also incorporated into the selection of the initial f. By solving (3.1) we obtained the solutions L_i and f, and consequently D_i = L_i L_i^T (i = 1, 2).

Fig. 1a-b shows the image of f before and after smoothing; f ≈ 1 on the dark red regions. The voxels in these regions are identified as isotropic or one-fiber diffusion, which is consistent with the known neuroanatomy. Fig. 1c compares the shape of d at three particular voxels (first, second, third columns) obtained by two models; first row: model (3.1); second row: the model in [CGZ04b]. The blue and red arrows indicate the first and second orientations, respectively. Fig. 2 shows the color representation of the directions of the principal eigenvectors of D_1(x) and D_2(x), respectively. By comparing the colors in Figs. 2a and 2b with the color pie shown in Fig. 2c, the fiber directions are uniquely determined. The representation in Figs. 2a and 2b is implemented by relating the azimuthal angle (φ) of the vector to the color hue (H) and the polar angle (θ ≥ π/2) to the color saturation (S). Slightly different from [PP], we define H = φ/(2π), S = 2(π - θ)/π, and value V = 1 in HSV. If the direction of the principal eigenvector is represented by (θ, φ), the fiber orientation can be described by either (θ, φ) or (π - θ, φ + π). We express the vectors in the lower hemisphere, i.e. θ ≥ π/2; the upper hemisphere is just an antipodally symmetric copy of the lower one, and the xy-plane is the plane of discontinuity.

To examine the accuracy of the model in recovering fiber directions, we selected a region inside the corpus callosum, where the diffusion is known to be one-fiber diffusion. For each voxel in this region we computed the direction in which d is maximized. This direction vector field is shown in Fig. 3a. On the other hand, we solved (3.1) and found that the model solution satisfies f ≈ 1 on this region. Moreover, the direction field generated from the principal eigenvector of D_1, as shown in Fig. 3b, not only preserves the vector field in Fig. 3a, but is also more regular due to the regularization terms in the model.
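The direction-to-color mapping just described can be sketched as follows (our own illustration; the angle conventions follow the text and the sample direction is arbitrary).

```python
import numpy as np
import colorsys

def direction_to_rgb(v):
    """Map a principal-eigenvector direction to RGB via the HSV scheme in the text:
    H = phi/(2*pi), S = 2*(pi - theta)/pi, V = 1, with the vector flipped into the
    lower hemisphere (theta >= pi/2) to respect antipodal symmetry."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    if v[2] > 0:                      # flip into the lower hemisphere
        v = -v
    theta = np.arccos(v[2])           # polar angle, in [pi/2, pi]
    phi = np.arctan2(v[1], v[0]) % (2 * np.pi)
    h = phi / (2 * np.pi)
    s = 2 * (np.pi - theta) / np.pi
    return colorsys.hsv_to_rgb(h, s, 1.0)

print(direction_to_rgb([1.0, 0.0, 0.0]))   # an in-plane fiber direction maps to red
```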
5 Conclusion

A novel model for determining fiber directions for bi-Gaussian diffusion from HARD MRI is presented. In this model two tensor fields are recovered by simultaneous smoothing and data fitting, as well as by incorporating information on the voxel classification of diffusion anisotropy. The numerical results indicate the effectiveness of the model in recovering intra-voxel structure. The choice of the parameters in (3.1) and the determination of the region Ω_1 influence the results. Our choice is made based on the principle that the one-fiber direction from the model agrees with the direction in which d is maximized. In future work we will study how to increase the accuracy of the estimated orientations of crossing fibers within a voxel.
Fig. 1. (a)-(b) Images of f before and after smoothing, respectively. The red region in (a) represents Ω_1. In (b), f ≈ 1 on the dark red regions; the voxels in these regions are identified as isotropic or one-fiber diffusion, which is consistent with the known neuroanatomy. The comparison of (a) and (b) indicates that the smoothing of f achieved by the constraint f ≈ 1 on Ω_1 is preserved. (c) Comparison of the shape of d at three particular voxels (first, second, third columns) obtained by two models; first row: model (3.1); second row: the model in [CGZ04b]. The blue and red arrows indicate the first and second orientations, respectively.
Fig. 2. (a)-(b) Color representation of the directions of the principal eigenvectors of D_1(x) and D_2(x), respectively. By comparing the color coding in (a) and (b) with the color pie shown in (c), the fiber directions are uniquely determined. We can also see that the fiber direction field is smooth.
Fig. 3. Vectors in field (a) are the directions corresponding to the maxima of d, and vectors in field (b) are the principal eigenvectors of D_1 (solution of (3.1)).
References

[ABA02] D.C. Alexander, G.J. Barker and S.R. Arridge. 2002, "Detection and Modeling of Non-Gaussian Apparent Diffusion Coefficient Profiles in Human Brain Data," Magn. Reson. Med., Vol. 48, 331-340.
[AHL01] A.L. Alexander, K.M. Hasan, M. Lazar, D.L. Parker. 2001, "Analysis of partial volume effects in diffusion-tensor MRI," Magn. Reson. Med., Vol. 45, 770-780.
[BML94a] P.J. Basser, J. Mattiello and D. LeBihan. 1994, "Estimation of the Effective Self-Diffusion Tensor from the NMR Spin Echo," J. Magn. Reson. B, Vol. 103, 247-254.
[BML94b] P.J. Basser, J. Mattiello, D. LeBihan. 1994, "MR diffusion tensor spectroscopy and imaging," Biophys. J., Vol. 66, 259-267.
[CBP90] T.L. Chenevert, J.A. Brunberg, J.G. Pipe. 1990, "Anisotropic diffusion in human white matter: demonstration with MR techniques in vivo," Radiology, Vol. 177, 401-405.
[CGZ04a] Y. Chen, W. Guo and Q. Zeng. 2004, "Estimation, smoothing and characterization of apparent diffusion coefficient profiles from high angular resolution DWI," Computer Vision and Pattern Recognition, to appear.
[CGZ04b] Y. Chen, W. Guo and Q. Zeng. 2004, "Recovery of intra-voxel structure from HARD MRI," Proc. of ISBI, to appear.
[Fra01a] L.R. Frank. 2001, "Characterization of anisotropy in high angular resolution diffusion weighted MRI," Proc. of the 9th Annual Meeting of ISMRM, 1531.
[Fra01b] L.R. Frank. 2001, "Anisotropy in high angular resolution diffusion-weighted MRI," Magn. Reson. Med., Vol. 45, 935-939.
[PA03] G.J.M. Parker and D.C. Alexander. 2003, "Probabilistic Monte Carlo based mapping of cerebral connections utilising whole-brain crossing fiber information," Proc. of Information Processing in Medical Imaging, 684-696.
[TWBW99] D.S. Tuch, R.M. Weisskoff, J.W. Belliveau, V.J. Wedeen. 1999, "High angular resolution diffusion imaging of the human brain," Proc. of the 7th Annual Meeting of ISMRM, 321.
[TRW02] D.S. Tuch, T.G. Reese, M.R. Wiegell, N. Makris, J.W. Belliveau, V.J. Wedeen. 2002, "High angular resolution diffusion imaging reveals intravoxel white matter fiber heterogeneity," Magn. Reson. Med., Vol. 48, 577-582.
[WRT00] V.J. Wedeen, T.G. Reese, D.S. Tuch, M.R. Wiegell, J.G. Dou, R.M. Weisskoff, D. Chesler. 2000, "Mapping fiber orientation spectra in cerebral white matter with Fourier transform diffusion MRI," Proc. of the 8th Annual Meeting of ISMRM, 82.
[WVCM03] Z. Wang, B.C. Vemuri, Y. Chen and T. Mareci. 2003, "A Constrained Variational Principle for Direct Estimation and Smoothing of the Diffusion Tensor Field from DWI," Proc. of IPMI, 660-671.
[PP] S. Pajevic and C. Pierpaoli. 1999, "Color schemes to represent the orientation of anisotropic tissues from diffusion tensor data: Application to white matter fiber tract mapping in the human brain," Magn. Reson. Med., Vol. 42, 526-540.
PACBB: A Projected Adaptive Cyclic Barzilai-Borwein Method for Box Constrained Optimization*

Hongchao Zhang and William W. Hager
Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. {hzhang,hager}@ufl.edu

Summary. The adaptive cyclic Barzilai-Borwein (BB) method [DZ05] for unconstrained optimization is extended to bound constrained optimization. Using test problems from the CUTE library [BCGT95], performance is compared with SPG2 (a BB method), GENCAN (a BB/conjugate gradient scheme), and L-BFGS-B (limited memory BFGS for bound constrained problems).

Keywords: box constrained optimization, cyclic Barzilai-Borwein stepsize method, nonmonotone line search
1 Introduction

Recently, we developed an adaptive cyclic Barzilai-Borwein (ACBB) method [DZ05] for solving unconstrained optimization problems. In this paper, we explain how the line search can be modified so as to solve bound constrained optimization problems of the form

    min { f(x) : x ∈ B },                                                     (1)

where f is a smooth function, B = {x ∈ R^n | L ≤ x ≤ U}, and L and U are lower and upper bounds, possibly infinite. For the bound constrained problem, the ACBB search directions are projected onto the feasible set. Hence, the new algorithm is denoted PACBB (projected adaptive cyclic Barzilai-Borwein method). A step in the BB method [BB88] is given by

    x_{k+1} = x_k - α_k g_k,    α_k = (s_{k-1}^T s_{k-1}) / (s_{k-1}^T y_{k-1}),     (2)

* This material is based upon work supported by the National Science Foundation under Grant No. 0203270.
Fig. 1. The projected line search.
where g_k = ∇f(x_k) is the gradient, viewed as a column vector, s_{k-1} = x_k - x_{k-1}, and y_{k-1} = g_k - g_{k-1}. Advantages of BB-type methods are their low memory requirements and their simplicity; in a neighborhood of a local minimizer, no line search is needed since the convergence is linear (see [DZ05]) when the Hessian is positive definite at the solution. In the cyclic BB method, the same stepsize α_k is used repeatedly for several iterations; we observe in [DZ05] that by reusing a step for several iterations, convergence can be accelerated. In the adaptive cyclic BB method, we adaptively adjust the cycle length as the iterations progress. In the projected cyclic Barzilai-Borwein method, we project the ACBB iterates onto the feasible set B and perform a nonmonotone line search between the current iterate and the projection point using the scheme in [DZ05].
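A minimal sketch of one projected BB step, assuming simple box bounds and the BB stepsize of (2); the safeguard interval and the test on s^T y are our assumptions, and the cyclic stepsize reuse and nonmonotone line search of the actual PACBB algorithm are omitted.

```python
import numpy as np

def project_box(x, lower, upper):
    """Projection onto B = {x : lower <= x <= upper}."""
    return np.minimum(np.maximum(x, lower), upper)

def projected_bb_step(x, x_prev, g, g_prev, lower, upper,
                      alpha_min=1e-10, alpha_max=1e10):
    """One projected step with the BB stepsize of (2), safeguarded to
    [alpha_min, alpha_max]. Returns the trial point P_B(x - alpha * g)."""
    s = x - x_prev
    y = g - g_prev
    sy = float(s @ y)
    alpha = float(s @ s) / sy if sy > 0 else alpha_max
    alpha = min(max(alpha, alpha_min), alpha_max)
    return project_box(x - alpha * g, lower, upper)

# Illustrative use on f(x) = 0.5 * x^T diag(1, 10) x with bounds [0.1, 2]^2.
H = np.diag([1.0, 10.0])
grad = lambda x: H @ x
lower, upper = np.full(2, 0.1), np.full(2, 2.0)
x_prev, x = np.array([1.5, 1.5]), np.array([1.0, 1.2])
print(projected_bb_step(x, x_prev, grad(x), grad(x_prev), lower, upper))
```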
2 Algorithm

The line search is illustrated in Figure 1. We first take a step along the negative gradient to a point x̄_k = x_k - α_k g_k, where the initial stepsize α_k is a safeguarded version of either the previous stepsize or the newly computed stepsize if the cycle length has been reached. If the point x̄_k lies outside B, we compute
the projection P_B(x̄_k) of x̄_k onto B. The search direction d_k = P_B(x̄_k) - x_k is a descent direction since B is convex. A nonmonotone line search is performed along the line segment connecting x_k and P_B(x̄_k). If possible, we accept the point P_B(x̄_k); otherwise, we backtrack towards x_k. When x̄_k lies outside of B, the next initial stepsize α_{k+1} is given by the BB formula in (2). In a forthcoming paper, we prove that when B is replaced by a closed, convex set Ω, we have

    lim inf_{k→∞} || P_Ω(x_k - g_k) - x_k ||_∞ = 0

for a gradient projection/nonmonotone line search scheme with the structure depicted in Figure 1.
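The backtracking between x_k and P_B(x̄_k) can be sketched as follows (our own sketch; the max-type reference value follows the nonmonotone rule of [GLL86], whereas the actual ACBB/PACBB line search in [DZ05] uses its own reference value, so the constants and memory length here are assumptions).

```python
import numpy as np

def nonmonotone_search(f, x, d, g, recent_f, delta=1e-4, rho=0.5, max_back=30):
    """Backtrack along d = P_B(x_bar) - x until the nonmonotone Armijo condition
    f(x + t*d) <= max(recent f-values) + delta * t * g^T d holds.

    recent_f: list of the last few objective values (the nonmonotone memory).
    """
    f_ref = max(recent_f)
    gd = float(g @ d)            # negative, since d is a descent direction
    t = 1.0                      # first try the full step, i.e. accept P_B(x_bar)
    for _ in range(max_back):
        if f(x + t * d) <= f_ref + delta * t * gd:
            return x + t * d
        t *= rho
    return x + t * d             # fall back to the last trial point
```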
3 The Numerical Results

In this section, we compare the performance of the projected adaptive cyclic BB algorithm (PACBB) to the SPG2 algorithm developed in [BMR00, BMR01], to the GENCAN algorithm developed in [BM02], and to the L-BFGS-B version of the limited memory BFGS method for box constrained optimization developed in [BLN95, ZBN97]. All codes were written in Fortran and compiled with f77 (default compiler settings) on a Sun workstation. The GENCAN codes were obtained from Jose Martinez's web page: http://www.ime.unicamp.br/~martinez/software.htm and the L-BFGS-B codes were obtained from Jorge Nocedal's web page: http://www.ece.northwestern.edu/~nocedal/software.html

For each code, we stopped the iterations if either

    || P_B(x_k - g_k) - x_k ||_∞ < 10^{-6}                                    (3)

or

    | f(x_k) - f(x_{k-1}) | / (1 + |f(x_k)|) < 10^{-12}.                      (4)
We also terminated a code if the number of function evaluations was more than 10^6. The test set consisted of all bound constrained problems from the (2002) CUTE library [BCGT95] with more than 50 variables. For all problems where more than one choice of the dimension is given, we use the largest dimension. The numerical results are posted at the following web page: http://www.math.ufl.edu/~hager/papers/CBB

Relative to the CPU time, the numerical comparison of PACBB with the other three routines can be summarized as follows:

• PACBB is faster than SPG2 in 36 problems, while SPG2 is faster in 6 problems.
• PACBB is faster than GENCAN in 33 problems, while GENCAN is faster in 9 problems.
• PACBB is faster than L-BFGS-B in 34 problems, while L-BFGS-B is faster in 11 problems.

Excluding the problems where the difference in CPU time was less than 10%, the numerical results can be summarized as follows:

• PACBB is faster than SPG2 in 33 problems, while SPG2 is faster in 4 problems.
• PACBB is faster than GENCAN in 30 problems, while GENCAN is faster in 8 problems.
• PACBB is faster than L-BFGS-B in 32 problems, while L-BFGS-B is faster in 9 problems.
Fig. 2. Performance profiles

Figure 2 shows the performance profiles, proposed by Dolan and Moré [DM02], for the four codes. That is, for the methods analyzed, we plot the fraction P of problems for which any given method is within a factor τ of the best time. In a performance profile plot, the top curve is the method that solved the most problems in a time that was within a factor τ of the best time. The percentage of the test problems for which a method is the fastest is given on the left axis of the plot. The right side of the plot gives the percentage of the test problems that were successfully solved by each of the methods. In essence, the right side is a measure of an algorithm's robustness.
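A small sketch of how a Dolan-Moré performance profile can be computed from a table of CPU times (our own illustration; the times below are made up and the three solvers are placeholders, not the codes compared in the paper).

```python
import numpy as np

def performance_profile(times, taus):
    """times: array of shape (n_problems, n_solvers); np.inf marks a failure.
    Returns P(tau) of shape (len(taus), n_solvers): the fraction of problems a
    solver solves within a factor tau of the best time on each problem."""
    times = np.asarray(times, dtype=float)
    best = times.min(axis=1, keepdims=True)
    ratios = times / best                      # performance ratios r_{p,s}
    taus = np.asarray(taus, dtype=float).reshape(-1, 1, 1)
    return (ratios[None, :, :] <= taus).mean(axis=1)

# Made-up CPU times (seconds) for 4 problems and 3 hypothetical solvers.
times = np.array([[1.0, 1.5, 2.0],
                  [0.5, 0.6, np.inf],
                  [3.0, 2.0, 2.5],
                  [1.2, 1.1, 1.0]])
print(performance_profile(times, taus=[1.0, 2.0, 4.0]))
```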
Since the top curve in Figure 2 corresponds to PACBB, this method yielded the best CPU time performance for this set of 48 test problems with dimensions ranging from 50 to 15,625. Similar to SPG2, the PACBB algorithm is suitable for large dimensional problems due to its low memory requirements. It is pointed out in [BMR00] that for ill-conditioned problems, SPG2 may converge slowly. Although PACBB seems to deal with ill-conditioning better than SPG2, both GENCAN and L-BFGS-B are more efficient for the very ill-conditioned problems. Finally, the PACBB algorithm is very easy to implement and it has many promising applications (see [BCM99, GHR93]).
References

[BB88] J. Barzilai and J. M. Borwein, Two point step size gradient methods, IMA J. Numer. Anal., 8 (1988), pp. 141-148.
[BCM99] E. G. Birgin, I. Chambouleyron, and J. M. Martinez, Estimation of the optical constants and the thickness of thin films using unconstrained optimization, J. Comput. Phys., 151 (1999), pp. 862-880.
[BM02] E. G. Birgin and J. M. Martinez, Large-scale active-set box-constrained optimization method with spectral projected gradients, Comput. Optim. Appl., 23 (2002), pp. 101-125.
[BMR00] E. G. Birgin, J. M. Martinez, and M. Raydan, Nonmonotone spectral projected gradient methods on convex sets, SIAM J. Optim., 10 (2000), pp. 1196-1211.
[BMR01] E. G. Birgin, J. M. Martinez, and M. Raydan, Algorithm 813: SPG - software for convex-constrained optimization, ACM Trans. Math. Software, 27 (2001), pp. 340-349.
[BLN95] R. H. Byrd, P. Lu, and J. Nocedal, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput., 16 (1995), pp. 1190-1208.
[BCGT95] I. Bongartz, A. R. Conn, N. I. M. Gould, and P. L. Toint, CUTE: constrained and unconstrained testing environments, ACM Trans. Math. Software, 21 (1995), pp. 123-160.
[DZ01] Y. H. Dai and H. Zhang, An adaptive two-point stepsize gradient algorithm, Numer. Algorithms, 27 (2001), pp. 377-385.
[DZ05] Y. H. Dai, W. W. Hager, K. Schittkowski, and H. Zhang, Cyclic Barzilai-Borwein stepsize method for unconstrained optimization, March, 2005 (see www.math.ufl.edu/~hager/papers/CBB).
[DM02] E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles, Math. Prog., 91 (2002), pp. 201-213.
[GHR93] W. Glunt, T. L. Hayden, and M. Raydan, Molecular conformations from distance matrices, J. Comput. Chem., 14 (1993), pp. 114-120.
[GLL86] L. Grippo, F. Lampariello, and S. Lucidi, A nonmonotone line search technique for Newton's method, SIAM J. Numer. Anal., 23 (1986), pp. 707-716.
[Ray01] M. Raydan, Nonmonotone spectral methods for large-scale nonlinear systems, report in the International Workshop on "Optimization and Control with Applications", Erice, Italy, July 9-17, 2001.
[Toi97] Ph. L. Toint, A non-monotone trust region algorithm for nonlinear optimization subject to convex constraints, Math. Prog., 77 (1997), pp. 69-94.
[ZBN97] C. Zhu, R. H. Byrd, and J. Nocedal, Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization, ACM Trans. Math. Software, 23 (1997), pp. 550-560.
Nonrigid Correspondence and Classification of Curves Based on More Desirable Properties

Xiqiang Zheng, Yunmei Chen, David Groisser, and David Wilson
Department of Mathematics, University of Florida, Gainesville, FL 32611, USA. {xzheng,yun,groisser,dcw}@math.ufl.edu

Summary. We present a few more desirable properties for finding the correspondence and dissimilarity of two plane curves in a nonrigid sense. A crossed scheme is used to define the dissimilarity metric, which ensures that an actual bi-morphism between the two curves is aligned. The optimal correspondence is found by a modified dynamic-programming method. From the optimal correspondence between two curves, we compute their dissimilarity from the overall difference of curvature, the local dissimilarity, and the local scale of stretching of their corresponding segments. Based on this dissimilarity, we can do pattern recognition among curves in a nonrigid sense, which is very important in many areas, such as recognition of handwritten letters and cardiac curves, where rigid transformations and scaling do not work well. Finally, the effect of the correspondence and the classification is shown in an application to cardiac curves.

Keywords: Curve alignment, recognition, dynamic programming, correspondence.
1 Introduction

This paper employs a few more desirable properties to find the correspondence and dissimilarity of two plane curves in a nonrigid sense. First, we review current approaches for curve alignment, discuss how we extend current techniques, and present an overview of our approach. Current curve alignment methods can be classified into two categories: methods based on rigid transformations [AF86], [Ume93], and those based on nonrigid deformations [CAS92], [Tag99], [BCGJ98], [You98], [TSG02], [SKK03], and [BMP01]. Methods based on rigid transformations rely on matching feature points by finding the optimal rotation, translation, and scaling parameters. Since these methods assume that the outlines can be aligned by a rigid transformation, they are sensitive to articulations, deformations of parts, occlusion, and other variations in the object form. Methods based on nonrigid deformations model articulation and other deformations by finding the mapping from one curve to another that minimizes a performance functional consisting of stretching and bending energies.
The minimization problem in the discrete domain is transformed into one of matching shape signatures with curvature, bending angle, or absolute orientation as attributes. These methods suffer from one or more of the following drawbacks: asymmetric treatment of the two curves, sensitivity to the sampling of the curves, lack of rotation and scaling invariance, and sensitivity to occlusion and articulations. As pointed out in [SKK03], Cohen et al. [CAS92] pioneered the deformation-based approach to curve matching. The basic premise of their approach was to match high-curvature points along the curves, while maintaining a smooth displacement field. Tagare, Shea, and Groisser in [Tag99] and [TSG02] pointed out the inherent asymmetry in the treatment of the two curves and proposed a bi-morphism that ensures a symmetric formulation; they search in the space of pairs of functions. Sebastian, Klein, and Kimia in [SKK03] defined a similarity metric based on the alignment curve using two intrinsic properties of the curve, namely length and curvature. Their optimal correspondence is found by an efficient dynamic-programming method, both for aligning pairs of curve segments and pairs of closed curves, and is effective in the presence of a variety of transformations of the curve. Belongie, Malik, and Puzicha in [BMP01] introduced the shape context, a rich shape descriptor that aids in judging shape similarity between similar shape points. Frenkel and Basri in [FB03] defined a similarity metric similar to that of [SKK03] and employed Sethian's Fast Marching Method to find the solution with sub-resolution accuracy and in consistency with the underlying continuous problem. Based on this previous work, we formulate a few more desirable properties, namely local dissimilarity and local scale of stretching, and use a crossed scheme to find the correspondence and dissimilarity of two plane curves in a nonrigid sense. What we implement are actual bi-morphisms in the discrete formulation, and the optimal solution is found by a modified dynamic-programming method.
2 The Mathematical Formulation

This section discusses the mathematical formulation of the problem of aligning two curves. We first review some previous approaches to aligning two curve segments and then formulate some more desirable properties.

2.1 Some Previous Approaches

Denote the two curves to be matched by C_1(s_1) = (x_1(s_1), y_1(s_1)), s_1 ∈ [0, L_1] and C_2(s_2) = (x_2(s_2), y_2(s_2)), s_2 ∈ [0, L_2], where s_i is the arc length, x_i and y_i are the coordinates of each point, and L_i is the arc length of C_i for i = 1 and 2. Cohen et al. [CAS92] compare the displacement velocities and bending energies. They search for a function f: [0, L_1] → [0, L_2] which minimizes

    E(f) = ∫ ( K_{C_2}(f(s_1)) - K_{C_1}(s_1) )^2 ds_1 + R ∫ | dC_2(f(s_1))/ds_1 - dC_1(s_1)/ds_1 |^2 ds_1,      (1)
where K_{C_1} and K_{C_2} are the curvatures along the curves C_1 and C_2, respectively, and R is a parameter. Tagare [Tag99] handles this issue by introducing a bi-morphism and searching in the space of pairs of functions. A bi-morphism treats the two curves C_1 and C_2 symmetrically and can be parameterized as μ(s): [0, L] → C_1 × C_2 such that it is regular and its projections on C_1 and C_2 are onto, where L is the arc length in the product space. So a bi-morphism consists of a pair of functions μ_1(s): [0, L_1] → C_1 and μ_2(s): [0, L_2] → C_2 such that μ_1(s) = p_1(μ(s)) and μ_2(s) = p_2(μ(s)), where p_i is the projection from C_1 × C_2 to C_i for i = 1, 2. He defines

    J(C_1, C_2; μ) = ∫ ( dθ_1(p_1(μ(s)))/ds - dθ_2(p_2(μ(s)))/ds )^2 ds,        (2)

which measures the overall dissimilarity between curves C_1 and C_2, and

    H(C_1, C_2; μ) = ∫ ( | 1/L - (1/L_1)(ds_1/ds) |^2 + | 1/L - (1/L_2)(ds_2/ds) |^2 ) ds,        (3)

which measures the closeness of the bi-morphism to a uniform mapping. He then searches for the bi-morphism μ which minimizes the functional

    Q(C_1, C_2; μ) = J(C_1, C_2; μ) + λ H(C_1, C_2; μ),        (4)
where λ is a parameter. Younes in [You98] considers the correspondence between two curves C_1 and C_2 as a group action. Actually we just need to consider all ordered permutations of C_1, and hence the search space is a lattice-ordered group consisting of all the ordered permutations of C_1. He obtained the solution for the case of polygons. If two curves are represented by c_1(s) and c_2(t), where s ∈ [0, L_1], t ∈ [0, L_2], and L_i is the arc length of curve C_i for i = 1 and 2, then a bi-morphism can be represented as a path c(p): [0, L] → [0, L_1] × [0, L_2], where L can be taken as the arc length in the product space. Frenkel and Basri in [FB03] search for a path c(p) which solves

    min ∫ F(c(p)) |c'(p)| dp,        (5)

where F(s, t) = |K_1(s) - K_2(t)| + λ for some parameter λ > 0; they then use shape contexts to improve the matching. They show that the ODE c'(p) = ∇T satisfies the Euler-Lagrange equation of (5), where T is a function satisfying |∇T| = F(s, t). Hence we can use the Fast Marching Method to solve for T first, and then obtain c(p) from the ODE c'(p) = ∇T.

2.2 Some Observations about the Energy Functions

We have the following example, which shows that there is a bi-morphism between two curves such that the bending energy is zero but the two curves are not
similar. Similarly, we can show that for any two convex smooth curves there is a bi-morphism such that the bending energy is zero.

Example 1. Let 0 < ε < 0.1 be a fixed constant, C_1: y = x^2/2, x ∈ [ε, 1], and C_2: y = x^3/3, x ∈ [√ε, 1]. Define μ: [ε, 1] → C_1 × C_2 with projections p_1(μ(t)) = (t, t^2/2) and p_2(μ(t)) = (√t, t^{3/2}/3). Then C_1'(t) = (1, t) and C_2'(t) = (1/(2√t), √t/2). Then

    θ_1(t) = arctan( y_1'(t) / x_1'(t) ) = arctan(t)        (6)

and

    θ_2(t) = arctan( y_2'(t) / x_2'(t) ) = arctan( (√t/2) / (1/(2√t)) ) = arctan(t).        (7)

Now ds_1/dt = √( (x_1'(t))^2 + (y_1'(t))^2 ) = √(1 + t^2) > 0 for any t ∈ [ε, 1], and ds_2/dt = √( (x_2'(t))^2 + (y_2'(t))^2 ) = √( 1/(4t) + t/4 ) = √(1 + t^2) / (2√t) > 0 for any t ∈ [ε, 1]. By Equation (2) on page 573 of [Tag99], ds/dt = √( (ds_1/dt)^2 + (ds_2/dt)^2 ) > 0 for any t ∈ [ε, 1]. Hence s_1(t), s_2(t), and s(t) are all strictly increasing functions for t ∈ [ε, 1], and hence their inverse functions exist. Then by Equations (6) and (7), we have

    dθ_i(p_i(μ(s)))/ds = ( dθ_i(p_i(μ(s(t))))/dt ) (dt/ds) = ( d arctan(t)/dt ) (dt/ds)        (8)

for i = 1 and 2. Thus

    dθ_1(p_1(μ(s)))/ds = dθ_2(p_2(μ(s)))/ds = ( d arctan(t)/dt ) (dt/ds)        (9)

for any s. Therefore

    J(C_1, C_2; μ) = ∫ ( dθ_1(p_1(μ(s)))/ds - dθ_2(p_2(μ(s)))/ds )^2 ds = 0.
However, in the following we can show that the curve C_1 is not similar to the curve C_2. Their graphs are shown in Fig. 1. Let C(a, b) be a point on the curve y = x^2/2 with 0 < a < 1 and b = a^2/2. The equation of the line OA is x - 2y = 0. The distance of the point C to the line OA is d = |a - 2b|/√5 = (a - a^2)/√5, since 0 < a < 1. The maximum is reached at a = 1/2 and b = 1/8, which gives d = √5/20; hence CE = √5/20. Similarly we can get DF = 2√30/90. The key point about C_1 and C_2 is that the total changes of the angles of the tangent lines are the same as x varies from 0 to 1, and these angles change monotonically. Although we have the second term, which measures the uniformity of arc length, it may still be difficult to find the global minimum.

Fig. 1. Matching between the curve y = x^2/2 (red dots) and the curve y = x^3/3 (blue dots). CE = √5/20 is the maximal height of the red curve over OA; DF = 2√30/90 is the maximal height of the blue curve over OB; CE/OA = 1/10 and DF/OB = √3/15 > CE/OA.
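A quick numerical check of Example 1 (our own sketch): under the stated parametrization both tangent angles equal arctan(t), so the bending term of (2) vanishes, yet the normalized heights CE/OA and DF/OB of the two arcs differ.

```python
import numpy as np

t = np.linspace(0.01, 1.0, 200)

# Tangent angles of C1: y = x^2/2 at x = t, and of C2: y = x^3/3 at x = sqrt(t).
theta1 = np.arctan2(t, np.ones_like(t))                          # slope y' = t
theta2 = np.arctan2(np.sqrt(t) / 2, 1.0 / (2 * np.sqrt(t)))
print(np.max(np.abs(theta1 - theta2)))                           # ~0: zero bending energy

# Normalized maximal heights of the arcs over their chords OA and OB.
a = np.linspace(0.0, 1.0, 10001)
ce_over_oa = np.max(a - a**2) / np.sqrt(5) / (np.sqrt(5) / 2)    # = 1/10
df_over_ob = np.max(a - a**3) / np.sqrt(10) / (np.sqrt(10) / 3)  # ~ sqrt(3)/15
print(ce_over_oa, df_over_ob)                                    # 0.1 vs ~0.1155
```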
3 Formulation of a Few More Desirable Properties

For a bi-morphism μ(s): [0, L] → C_1 × C_2, if we divide the interval into equal subintervals [0, a_1], [a_1, a_2], ..., and [a_{n-1}, a_n], and if we do not assume that a curve segment can be compressed to one point, then the curve C_i is divided into n segments C_{ij} for i = 1, 2 and j = 1, 2, ..., n. Actually, in an elastic model no points disappear, and hence we can assume the correspondence is one to one, although a big segment can be compressed to a small segment. If we assume C_1 and C_2 are smooth, then C_i is completely determined by its corresponding n segments. If one segment C_{ij_0} of curve C_i is deformed, then the whole curve is deformed. How much the whole curve is deformed depends on how much those segments are deformed and how well those segments are connected.
The two hexagons in Fig. 2 are very similar in a nonrigid sense but differ considerably under rotation, translation, and scaling.

Fig. 2. In Fig. 2(b), the x coordinates of the right 3 points are increased by 3 units.
Hence we do need to compare figures in a nonrigid sense in many situations, such as handwritten-character recognition and cardiac curves. In Fig. 3(b), both angles and edges are changed slightly, which results in a considerably different figure.

Fig. 3. In Fig. 3(b), 3 angles and 3 edges are changed.
In many applications, each corresponding segment does not deform much. A desirable correspondence should match the corresponding curve segments as closely as possible. For convenience, let us make the following definitions for curve segments.
Definition 1. For any curve segment C_{ij}, call the line segment connecting the beginning and ending points of the curve segment the base of the curve segment.

Definition 2. The maximal distance of the points of a curve segment to its base is called the height of the curve segment.

For two curve segments C_{1j} and C_{2j}, let us denote the lengths of their bases by d_i for i = 1, 2, and assume for the moment that d_1 ≤ d_2. If we enlarge C_{1j} and compress C_{2j} by a factor r so that their bases match after rotation and translation, then r d_1 = d_2 / r, and hence r = √(d_2 / d_1). After this alignment, we can compute the area A_i enclosed by the curve segment C_{ij} and its base for i = 1, 2. Also, we can define the symmetric difference ΔA_{12} between A_1 and A_2, which is the area of (A_1 - A_2) ∪ (A_2 - A_1). These will be used in the following definitions.

Definition 3. Define the scaling factor r between curve segments C_{1j} and C_{2j} as r = √(d_2 / d_1).

Definition 4. Define the dissimilarity between curve segments C_{1j} and C_{2j} as ρ(C_{1j}, C_{2j}) = ΔA_{12} / d^2, where ΔA_{12} is the symmetric difference between A_1 and A_2 and d is the length of their common base after scaling.

For example, the dissimilarity between an arc of degree 2a of a unit circle and its base is f(a) = (2a - sin 2a) / (8 sin^2 a). It is easy to show that f(a) approaches zero when a approaches zero. Since f'(a) > 0, f(a) is increasing for a ∈ [0, π]. So an arc segment of small degree behaves like a straight line, and when the degree 2a is bigger in the range [0, 2π], the arc is more different from a straight line. Obviously f(2a) ≠ 2 f(a), hence this dissimilarity is not additive. Usually f(2a) > 2 f(a), and hence the energy function will decrease when two curves are divided into more pairs. However, if the number of pairs is fixed, this effect disappears, because one scaling becoming bigger forces some other scaling to become smaller. Then it is additive based on the method of computation.

Definition 5. Define the angle difference between curve segments C_{1j} and C_{2j} as Δθ(C_{1j}, C_{2j}) = |Δθ_1 - Δθ_2|, where Δθ_i is the angle between the beginning and ending tangent lines of the curve segment C_{ij} for i = 1, 2.

Now we can formulate our energy function. Let us assume that curve C_i is well sampled into n_i segments C_{ij} for i = 1, 2 and j = 1, 2, ..., n_i. Up to a scaling, we can assume that curves C_1 and C_2 have the same arc length. Let the interval [0, 1] be divided into n subintervals I_i = [(i-1)/n, i/n] for i = 1, 2, ..., n. Then we need to find a bi-morphism μ(s): [0, 1] → C_1 × C_2 such that p_i ∘ μ is an ordered correspondence between [0, 1] and C_i for i = 1, 2, and such that
    min Σ_{i=1}^{n} [ ρ(P_i) + τ(Q_i) + k_1 Δθ(P_i) + k_2 |r(P_i) - 1| ],        (10)
where P_i = (p_1(μ(I_i)), p_2(μ(I_i))) is the ith pair in the matching, Q_i = ½(P_{i-1} ∪ P_i) is the union of the (i-1)th pair and the ith pair in the matching scaled by a factor ½, ρ(P_i) is the dissimilarity of the pair P_i as in Definition 4, τ(Q_i) is the dissimilarity of the pair Q_i, Δθ(P_i) is the angle difference of the pair P_i as defined in Definition 5, and r(P_i) is the scaling of the pair P_i as in Definition 3. In equation (10), the term τ(Q_i) is optional; it measures the local similarity of the connecting points between the (i-1)th pair and the ith pair.
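The dissimilarity of Definition 4 can be sketched numerically as follows (our own illustration): both segments are mapped by a similarity transform onto a common base of length √(d_1 d_2), and the symmetric area difference is computed with the shapely library, which is an implementation choice, not part of the paper; the sample point sets are made up.

```python
import numpy as np
from shapely.geometry import Polygon

def align_to_base(pts, d):
    """Similarity transform mapping a segment's endpoints to (0, 0) and (d, 0)."""
    pts = np.asarray(pts, dtype=float)
    v = pts[-1] - pts[0]
    scale = d / np.linalg.norm(v)
    c, s = np.cos(-np.arctan2(v[1], v[0])), np.sin(-np.arctan2(v[1], v[0]))
    R = np.array([[c, -s], [s, c]])
    return (pts - pts[0]) @ R.T * scale

def segment_dissimilarity(seg1, seg2):
    """Definition 4: symmetric area difference over d^2, after both segments are
    scaled so their bases have the common length d = sqrt(d1 * d2)."""
    d1 = np.linalg.norm(np.asarray(seg1)[-1] - np.asarray(seg1)[0])
    d2 = np.linalg.norm(np.asarray(seg2)[-1] - np.asarray(seg2)[0])
    d = np.sqrt(d1 * d2)
    p1 = Polygon(align_to_base(seg1, d))     # region between the segment and its base
    p2 = Polygon(align_to_base(seg2, d))
    return p1.symmetric_difference(p2).area / d**2

# Quarter-circle arc vs. a flatter arc over the same chord (illustrative only).
t = np.linspace(0.0, np.pi / 2, 50)
arc = np.stack([np.cos(t), np.sin(t)], axis=1)
flat = np.stack([np.cos(t), 0.5 * np.sin(t)], axis=1)
print(segment_dissimilarity(arc, flat))
```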
4 The Algorithms for Finding the Optimal Bi-Morphism and Correspondence

Let δ([i_1, i_2], [j_1, j_2]) denote the cost of matching the i_1th through the i_2th segments of curve C_1 to the j_1th through the j_2th segments of curve C_2, and let d(i, j) = δ([1, i], [1, j]). If C_i consists of n_i segments for i = 1, 2, then d(n_1, n_2) is the total cost of matching curves C_1 and C_2. In many applications, one segment cannot be matched to too many segments. Usually we only allow one segment to be matched to at most 3 segments. Then we have

    d(i, j) = min{ d(i - k, j - l) + δ([i - k, i], [j - l, j]) : 1 ≤ k, l ≤ 3 },        (11)

which gives a recipe for computing the total cost d(n_1, n_2) by a dynamic programming approach. Because the local dissimilarity is not additive, we need to fix the number of pairs between two curves; but based on the usual approach using equation (11), we only know the upper bound min{n_1, n_2} on the number of pairs. So we need to modify it a bit for our purpose. Without loss of generality, by a common refinement we may assume that both curves have the same number of segments. Actually, for a lot of data, such as the cardiac data, the numbers of points representing the curves are already the same. Suppose each curve C_i is almost uniformly divided into n segments C_{ij} for j = 1, 2, ..., n. Since we only allow the scaling of one segment to lie between 1/3 and 3, to find the matching cost between two curves C_1 and C_2, the second curve C_2 is interpolated into 3n segments C'_{2,j} for j = 1, 2, ..., 3n. Then C_{1,n} can be matched to 9 possible choices, which are S_1 = C'_{2,3n}, S_2 = C'_{2,3n} ∪ C'_{2,3n-1}, ..., and S_9 = C'_{2,3n} ∪ C'_{2,3n-1} ∪ ... ∪ C'_{2,3n-8}. Then we have

    d(n, 3n)' = min{ d(n - 1, 3n - k)' + δ([n - 1, n], [3n - k, 3n])' : 1 ≤ k ≤ 9 },        (12)

where d(i, j)' means the cost of matching the first i segments of C_1 to the first j segments of C_2, and δ([i_1, i_2], [j_1, j_2])' is defined similarly. Applying equation (12) once, we get one pair; hence finally we get n pairs. Before
we compute the total cost, we also need to estimate the two parameters k_1 and k_2 in (10). For a given application we can figure out suitable parameters. Now we can formulate the following algorithm:

Step 1. Smooth each set of data a little bit.
Step 2. Normalize each curve so that all curves have the same length.
Step 3. Apply a parametric cubic spline to fit each data set and obtain a continuously differentiable curve. We also need it to compute the overlapping area for the local dissimilarity.
Step 4. Use a cubic polynomial to fit each curve and refine the second curve to be matched into 3n points.
Step 5. Compute the tangent line and the corresponding angle at each point.
Step 6. Estimate the parameters k_1 and k_2.
Step 7. Start from curve 1 and apply equation (12) recursively to get the total cost d(n, 3n)'.
Step 8*. Start from curve 2 and proceed as in Step 7 to get the total cost d(3n, n)''.
Step 9. Compute the total cost d(C_1, C_2) = d(n, 3n)' + d(3n, n)''.

Note*: Step 8 is optional. By applying Step 8, the two curves are treated symmetrically, which might improve the cost a little, but more computation is needed to achieve it. We applied our algorithm to cardiac-curve data; the results of the correspondence are shown in Figure 4.
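A schematic dynamic-programming implementation of the recursion (11) follows (our own sketch); the function seg_cost is a placeholder standing in for the combination of dissimilarity, angle-difference, and scaling terms of (10), and the toy cost at the end is purely illustrative.

```python
import numpy as np

def match_cost(n1, n2, seg_cost, max_merge=3):
    """Dynamic program for d(i, j) as in (11): each step matches up to
    max_merge consecutive segments of one curve to up to max_merge of the other.
    seg_cost(i1, i2, j1, j2) returns the cost delta([i1, i2], [j1, j2])."""
    INF = np.inf
    d = np.full((n1 + 1, n2 + 1), INF)
    d[0, 0] = 0.0
    back = {}
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            for k in range(1, max_merge + 1):
                for l in range(1, max_merge + 1):
                    if i - k < 0 or j - l < 0 or d[i - k, j - l] == INF:
                        continue
                    c = d[i - k, j - l] + seg_cost(i - k + 1, i, j - l + 1, j)
                    if c < d[i, j]:
                        d[i, j] = c
                        back[(i, j)] = (i - k, j - l)
    return d[n1, n2], back

# Toy cost: penalize unequal numbers of merged segments (placeholder only).
toy_cost = lambda i1, i2, j1, j2: abs((i2 - i1) - (j2 - j1)) + 0.1
total, back = match_cost(6, 8, toy_cost)
print(total)
```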
5 Curve Classification Based on the Cost of Nonrigid Correspondence

Let us consider the set of all smooth curves, denote this set by Γ, and fix the number of pairs when we search for the best bi-morphism for any two curves.

Proposition 1. The matching cost d(C_1, C_2) between two curves C_1 and C_2 satisfies the following:
1) d(C_1, C_1) = 0;
1') d(C_1, C_2) > 0 if and only if C_1 is not similar to C_2;
2) d(C_1, C_2) = d(C_2, C_1).

Proof. 1) and 2) follow directly from the definitions and the algorithms used to compute the distance. To prove 1'), if C_1 is similar to C_2, then after a scaling they are the same in a rigid sense and hence d(C_1, C_2) = 0. On the other hand, if d(C_1, C_2) = 0, then after a common scaling, the corresponding n pairs of segments are the same in a rigid sense and all angle changes are the same. Hence C_1 and C_2 are the same in a rigid sense. □

Remark: In many previous papers, only 1) and 2) are shown. One direction of 1') has been shown in one previous paper.
Fig. 4. Matching based on Local Similarity and Uniformity of Arc Length.

Now suppose we have a set of m curves and we want to classify them into several groups based on their similarity in a nonrigid sense. Based on the algorithm in the last section, we can compute the cost of the matching between any two of these curves, which gives an m × m matrix. This matrix can then be used for pattern recognition among those curves. We tried 30 cardiac curves, among which 15 curves are normal. To contrast the effect of the different terms, we tested the classification with different settings of the parameters. The results are shown in Figures 5 through 10.
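A sketch of the classification stage (our own): given a symmetric m x m matrix of matching costs, the curves can be grouped, for example, by hierarchical clustering on the precomputed distances; pair_cost below is a toy stand-in for the full matching cost d(C_1, C_2) of Section 4, and the choice of average-linkage clustering is ours, not the paper's.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cost_matrix(curves, pair_cost):
    """Symmetric matrix of matching costs d(C_i, C_j) with zero diagonal."""
    m = len(curves)
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            D[i, j] = D[j, i] = pair_cost(curves[i], curves[j])
    return D

def classify(D, n_groups=2):
    """Group curves from a precomputed cost matrix via average-linkage clustering."""
    Z = linkage(squareform(D, checks=False), method='average')
    return fcluster(Z, t=n_groups, criterion='maxclust')

def total_turning(c):
    """Total absolute turning angle of a polyline (used only by the toy cost)."""
    d = np.diff(c, axis=0)
    ang = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))
    return np.abs(np.diff(ang)).sum()

def pair_cost(c1, c2):
    """Toy stand-in for the full matching cost d(C_1, C_2) of Section 4."""
    return abs(total_turning(c1) - total_turning(c2))

t = np.linspace(0, np.pi, 100)
arc = np.stack([np.cos(t), np.sin(t)], axis=1)
line = np.stack([t, 0.3 * t], axis=1)
print(classify(cost_matrix([arc, arc + 1.0, line], pair_cost)))
```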
6 Conclusion

We have presented a new, promising algorithm for nonrigid correspondence and curve classification. We compute local similarity based on the overlapping area between two corresponding curve segments, which is very stable; local similarity can capture the features of ordered curve segments very well. Our computation is based on an efficient modified dynamic-programming approach. The experimental results show that this algorithm is robust.
Fig. 5. Angle Coefficient = 5, Uniformity Coefficient = 5, Local Similarity not counted.

Fig. 6. Angle Coefficient = 5, Uniformity Coefficient = 5, Local Similarity not counted.

Fig. 7. Angle Coefficient = 0, Uniformity Coefficient = 10.

Fig. 8. Angle Coefficient = 0, Uniformity Coefficient = 10.

Fig. 9. Angle Coefficient = 10, Uniformity Coefficient = 10.

Fig. 10. Angle Coefficient = 10, Uniformity Coefficient = 10.
Figure 4 shows that the effect of the correspondence is already very good if we just use local similarity and uniformity of arc length. Figures 7 and 8 show that the effect of the classification is also very good if we just use the two terms just mentioned. We have shown several different results of the classification among 30 cardiac curves for contrast with a previous method. For different kinds of data, we can get better results if we choose suitable parameters to obtain an appropriate combined effect among local similarity, angle change, and uniformity of arc length. If we just use angle change and uniformity of arc length, we may get some undesirable classifications, such as in Figure 5(k). The computation of this algorithm is efficient. To classify 30 cardiac curves with 193 points representing each curve, we set the scaling between 1/2 and 2; the running time was then less than 3 hours on a 2.8 GHz machine. Experimental results show that a scaling between 1/2 and 2 is sufficient for the classification of these data.
7 Acknowledgments

The authors would like to thank Dr. Hemant for his talk given last year about nonrigid correspondence, and Dr. Frenkel for generously providing some help.
References

[AF86] Ayache, N.J. and Faugeras, O.D. (1986), HYPER: A New Approach for the Recognition and Positioning of Two-Dimensional Objects, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 8, no. 1, pp. 44-54.
[Ume93] Umeyama, S. (1993), Parameterized Point Pattern Matching and its Application to Recognition of Object Families, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, no. 2, pp. 136-144.
[CAS92] Cohen, I., Ayache, N., and Sulger, P. (1992), Tracking Points on Deformable Objects Using Curvature Information, Proc. European Conf. Computer Vision, pp. 458-466.
[Tag99] Tagare, H.D. (1999), Shape-Based Nonrigid Correspondence with Application to Heart Motion Analysis, IEEE Trans. Medical Imaging, Vol. 18, no. 7, pp. 570-578.
[BCGJ98] Basri, R., Costa, L., Geiger, D., and Jacobs, D. (1998), Determining the Similarity of Deformable Shapes, Vision Research, Vol. 38, pp. 2365-2385.
[You98] Younes, L. (1998), Computable Elastic Distance between Shapes, SIAM J. Applied Math., Vol. 58, pp. 565-586.
[TSG02] Tagare, H.D., Shea, D.O., and Groisser, D. (2002), Non-Rigid Shape Comparison of Plane Curves in Images, J. Mathematical Imaging and Vision, Vol. 16, no. 1, pp. 57-68.
[SKK03] Sebastian, T.B., Klein, P.N., and Kimia, B.B. (2003), On Aligning Curves, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 25, no. 1, pp. 116-124.
[BMP01] Belongie, S., Malik, J., and Puzicha, J. (2001), Shape Contexts: A New Descriptor for Shape Matching and Object Recognition, NIPS, Vol. 13, pp. 831-837.
[FB03] Frenkel, M. and Basri, R. (2003), Curve Matching Using the Fast Marching Method, Lecture Notes in Computer Science, Vol. 2683, pp. 35-51.
[SS87] Schwartz, J.T. and Sharir, M. (1987), Identification of Partially Obscured Objects in Two and Three Dimensions by Matching Noisy Characteristic Curves, Intl. J. Robotics Research, Vol. 6, no. 2, pp. 29-44.
[JWG00] Jacobs, D.W., Weinshall, D., and Gdalyahu, Y. (2000), Classification with Nonmetric Distances: Image Retrieval and Class Representation, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 22, no. 6, pp. 583-600.
[Uhl91] Uhlmann, J. (1991), Satisfying General Proximity/Similarity Queries with Metric Trees, Information Processing Letters, Vol. 40, pp. 175-179.
[MV93] Marzal, A. and Vidal, E. (1993), Computation of Normalized Edit Distances and Applications, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, pp. 926-932.
[LS90] Liu, H.C. and Srinath, M.D. (1990), Partial Shape Classification Using Contour Matching in Distance Transformation, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 12, no. 11, pp. 1072-1079.
[TY85] Tsai, W.H. and Yu, S.S. (1985), Attributed String Matching with Merging for Shape Recognition, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 7, no. 4, pp. 453-462.