Parallel Computational Geometry
Selim G. Akl Kelly A. Lyons Department of Computing and Information Science Queens University
PRENTICE HALL, Englewood Cliffs, NJ 07632
Library of Congress Cataloging-in-Publication Data
Akl, Selim G.
Parallel computational geometry / Selim G. Akl, Kelly A. Lyons.
p. cm.
Includes bibliographical references (p. ) and indexes.
ISBN 0-13-652017-0
1. Geometry--Data processing. 2. Parallel processing (Electronic computers) 3. Computer algorithms. I. Lyons, Kelly A. II. Title.
QA448.D38A55 1993
516'.00285'435--dc20 92-8940 CIP
Acquisitions editor: THOMAS McELWEE
Editorial/production supervision and interior design: RICHARD DeLORENZO
Copy editor: CAMIE GOFFI
Cover design: JOE DiDOMENICO
Prepress buyer: LINDA BEHRENS
Manufacturing buyer: DAVID DICKEY
Editorial assistant: PHYLLIS MORGAN
To Joseph
S.G. Akl

To Rainy Lake and the people on it
K.A. Lyons

© 1993 by Prentice-Hall, Inc.
A Simon & Schuster Company
Englewood Cliffs, New Jersey 07632
All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.
Printed in the United States of America 10 9 8 7 6 5 4 3 2 1
ISBN 0-13-652017-0

Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Simon & Schuster Asia Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
Contents

PREFACE vii

1 INTRODUCTION 1
1.1 Origins of Parallel Computational Geometry 1
1.2 Representative Problems 1
1.3 Organization of the Book 4
1.4 Problems 4
1.5 References 6

2 MODELS OF PARALLEL COMPUTATION 9
2.1 Early Models 9
    2.1.1 Perceptrons, 9
    2.1.2 Cellular Automata, 10
2.2 Processor Networks 10
    2.2.1 Linear Array, 11
    2.2.2 Mesh or Two-Dimensional Array, 11
    2.2.3 Tree, 13
    2.2.4 Mesh-of-Trees, 13
    2.2.5 Pyramid, 14
    2.2.6 Hypercube, 17
    2.2.7 Cube-Connected Cycles, 17
    2.2.8 Butterfly, 17
    2.2.9 AKS Sorting Network, 18
    2.2.10 Stars and Pancakes, 19
2.3 Shared-Memory Machines 20
    2.3.1 Parallel Random Access Machine, 21
    2.3.2 Scan Model, 23
    2.3.3 Broadcasting with Selective Reduction, 23
    2.3.4 Models for the Future, 23
2.4 Problems 24
2.5 References 25

3 CONVEX HULL 27
3.1 Shared-Memory Model Algorithms 28
3.2 Network Model Algorithms 33
3.3 Other Models 37
3.4 When the Input Is Sorted 38
3.5 Related Problems 38
    3.5.1 Three-Dimensional Convex Hulls, 38
    3.5.2 Digitized Images, 40
    3.5.3 Convex Hull of Disks, 41
    3.5.4 Computing Maximal Vectors, 41
3.6 Problems 44
3.7 References 45

4 INTERSECTION PROBLEMS 51
4.1 Line Segments 51
4.2 Polygons, Half-Planes, Rectangles, and Circles 56
4.3 Problems 61
4.4 References 62

5 GEOMETRIC SEARCHING 65
5.1 Point Location 65
5.2 Range Searching 70
5.3 Problems 71
5.4 References 72

6 VISIBILITY AND SEPARABILITY 75
6.1 Visibility 75
    6.1.1 Visibility Polygon from a Point Inside a Polygon, 75
    6.1.2 Region of a Polygon Visible in a Direction, 77
    6.1.3 Visibility of the Plane from a Point, 81
    6.1.4 Visibility Pairs of Line Segments, 83
6.2 Separability 84
6.3 Problems 85
6.4 References 87

7 NEAREST NEIGHBORS 89
7.1 Three Proximity Problems 89
7.2 Related Problems 93
7.3 Problems 94
7.4 References 95

8 VORONOI DIAGRAMS 99
8.1 Network Algorithms for Voronoi Diagrams 101
8.2 PRAM Algorithms for Voronoi Diagrams 103
8.3 Problems 105
8.4 References 107

9 GEOMETRIC OPTIMIZATION 111
9.1 Minimum Circle Cover 111
9.2 Euclidean Minimum Spanning Tree 115
9.3 Shortest Path 116
9.4 Minimum Matchings 117
    9.4.1 Graph Theoretic Formulation, 118
    9.4.2 Linear Programming Formulation, 119
    9.4.3 Geometric Formulation, 120
    9.4.4 Parallel Algorithm, 121
    9.4.5 Related Problems, 121
    9.4.6 Some Open Questions, 122
9.5 Problems 122
9.6 References 123

10 TRIANGULATION OF POLYGONS AND POINT SETS 127
10.1 Trapezoidal Decomposition and Triangulation of Polygons 127
10.2 Triangulation of Point Sets 131
10.3 Problems 134
10.4 References 135

11 CURRENT TRENDS 137
11.1 Parallel Computational Geometry on a Grid 137
    11.1.1 Geometric Search Problem, 138
    11.1.2 Shadow Problem, 141
    11.1.3 Path in a Maze Problem, 143
    11.1.4 Concluding Remarks, 145
11.2 General Prefix Computations and Their Applications 146
    11.2.1 Lower Bound for GPC, 147
    11.2.2 Computing GPC, 148
    11.2.3 Applying GPC to Geometric Problems, 149
    11.2.4 Concluding Remarks, 150
11.3 Parallel Computational Geometry on Stars and Pancakes 151
    11.3.1 Basic Definitions, 151
    11.3.2 Data Communication Algorithms, 154
    11.3.3 Convex Hull Algorithms on the Star and Pancake Networks, 164
    11.3.4 Solving Geometric Problems by the Merging Slopes Technique, 165
    11.3.5 General Prefix Computation, 168
    11.3.6 Concluding Remarks, 168
11.4 Broadcasting with Selective Reduction 169
    11.4.1 BSR Model, 171
    11.4.2 Sample BSR Algorithms, 172
    11.4.3 Optimal BSR Implementation, 175
    11.4.4 Concluding Remarks, 180
11.5 Problems 180
11.6 References 182

12 FUTURE DIRECTIONS 187
12.1 Implementing Data Structures on Network Models 187
12.2 Problems Related to Visibility 188
    12.2.1 Art Gallery and Illumination Problems, 188
    12.2.2 Stabbing, 188
12.3 Geometric Optimization Using Neural Nets 189
12.4 Parallel Algorithms for Arrangements 190
12.5 P-Complete Geometric Problems 191
12.6 Dynamic Computational Geometry 191
12.7 Problems 191
12.8 References 193

BIBLIOGRAPHY 195

INDEXES 211
Author 211
Subject 212
Preface

Programming computers to process pictorial data efficiently has been an activity of growing importance over the last 40 years. These pictorial data come from many sources; we distinguish two general classes:

1. Most often, the data are inherently pictorial; by this we mean the images arising in medical, scientific, and industrial applications, such as the weather maps received from satellites in outer space.
2. Alternatively, the data are obtained when a mathematical model is used to solve a problem and the model relies on pictorial data; examples here include computing the average of a set of data (represented as points in space) in the presence of outliers, computing the value of a function that satisfies a set of constraints, and so on.

Regardless of their source, there are many computations that one may want to perform on pictorial data; these include, among many others, identifying contours of objects, "noise" removal, feature enhancement, pattern recognition, detection of hidden lines, and obtaining intersections among various components. At the foundation of all these computations are problems of a geometric nature, that is, problems involving points, lines, polygons, and circles. Computational geometry is the branch of computer science concerned with designing efficient algorithms for solving geometric problems of inclusion, intersection, and proximity, to name but a few.

Until recently, these problems were solved using conventional sequential computers, computers whose design more or less follows the model proposed by John von Neumann and his team in the late 1940s. The model consists of a single processor capable of executing exactly one instruction of a program during each time unit. Computers built according to this paradigm have been able to perform at tremendous speeds, thanks to inherently fast electronic components.
However, it seems today that this approach has been pushed as far as it will go, and that the simple laws of physics will stand in the way of further progress. For example, the speed of light imposes a limit that cannot be surpassed by any electronic device. On the other hand, our appetite appears to grow continually for ever more powerful computers capable of processing large amounts of data at great speeds. One solution to this predicament that has recently gained credibility and popularity is parallel processing. Here a computational problem to be solved is broken into smaller parts that are solved simultaneously by the several processors of a parallel computer. The idea is a natural one, and the decreasing cost and size of electronic components have made it feasible. Lately, computer scientists have been busy building parallel computers and developing algorithms and software to solve problems on them. One area that has received its fair share of interest is the development of parallel algorithms for computational geometry.
This book reviews contributions made to the field of parallel computational geometry since its inception about a decade ago. Parallel algorithms are presented for each problem, or family of problems, in computational geometry. The models of parallel computation used to develop these algorithms cover a very wide range, and include the parallel random access machine (PRAM) as well as several networks for interconnecting processors on a parallel computer. Current trends and future directions for research in this field are also identified. Each chapter concludes with a set of problems and a list of references.

The book is addressed to graduate students in computer science, engineering, and mathematics, as well as to practitioners and researchers in these disciplines. We assume the reader to be generally familiar with the concepts of algorithm design and analysis, computational geometry, and parallelism. Textbook treatment of these concepts can be found in the following references:

1. Algorithm Design and Analysis
G. Brassard and P. Bratley, Algorithmics: Theory and Practice, Prentice Hall, Englewood Cliffs, New Jersey, 1988.
T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, McGraw-Hill, New York, 1990.
U. Manber, Introduction to Algorithms: A Creative Approach, Addison-Wesley, Reading, Massachusetts, 1989.

2. Computational Geometry
H. Edelsbrunner, Algorithms in Combinatorial Geometry, EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1987.
K. Mehlhorn, Data Structures and Algorithms 3: Multi-Dimensional Searching and Computational Geometry, EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1984.
F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.

3. Parallel Algorithms
S. G. Akl, Parallel Sorting Algorithms, Academic Press, Orlando, Florida, 1985.
S. G. Akl, The Design and Analysis of Parallel Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
A. Gibbons and W. Rytter, Efficient Parallel Algorithms, Cambridge University Press, Cambridge, 1988.

Finally, we wish to thank the staff of Prentice Hall for their help, the reviewers for their enthusiasm, and our families for their love and support.

Selim G. Akl
Kelly A. Lyons
1 Introduction
Computational geometry is a branch of computer science concerned with the design and analysis of algorithms to solve geometric problems. Applications where efficient solutions to such problems are needed include computer graphics, pattern recognition, robotics, statistics, database searching, and the design of very large scale integrated (VLSI) circuits. Typical problems involve a set of points in the plane for which it is required to compute the smallest convex polygon containing the set, to find a collection of edges connecting the points whose total length is minimum, or to determine the closest neighbor to each point, and so on. A survey of important results in this area can be found in [Lee84b], while textbook treatments of the subject are provided in [Mehl84, Prep85, Edel87].
1.1 Origins of Parallel Computational Geometry

Due to the nature of some applications in which geometric problems arise, fast and even real-time algorithms are often required. Here, as in many other areas, parallelism seems to hold the greatest promise for major reductions in computation time. The idea is to use several processors which cooperate to solve a given problem simultaneously in a fraction of the time taken by a single processor. Therefore, it is not surprising that interest in parallel algorithms for geometric problems has grown in recent years. While some early attempts date back to the late 1950s, the modern approach to parallel computational geometry was pioneered by A. Chow in her 1980 Ph.D. thesis [Chow80]. Other initial attempts are described in [Nath80] and in [Akl82]. Since then a number of significant results have been obtained, and important problems have been identified whose solutions are still outstanding. In this book we survey the first ten years of research in parallel computational geometry.
1.2 Representative Problems

Consider the following problems, each a classic in computational geometry.

Problem 1: Convex Hull. Given a finite set of points in the plane, it is required to find their convex hull (i.e., the convex polygon with smallest area that includes all the points, either as its vertices or as interior points).
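To make Problem 1 concrete, here is a standard sequential construction, Andrew's monotone chain scan; the function names and the tuple-based input format are our own choices for illustration, not notation from this book.

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o); positive means a left turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the convex hull vertices in counterclockwise order
    (Andrew's monotone chain; O(n log n) sequential time)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                      # build lower chain left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper chain right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # endpoints shared between chains

print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# → [(0, 0), (2, 0), (2, 2), (0, 2)]
```

The interior point (1, 1) is discarded, as the definition of the hull requires; parallel counterparts of this construction are the subject of Chapter 3.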
Problem 2: Line Segment Intersection. Given a finite set of line segments in the plane, it is required to find and report all pairwise intersections among line segments, if any exist.

Problem 3: Point Location in a Planar Subdivision. Given a convex planar subdivision (i.e., a convex polygon itself partitioned into convex polygons) and a finite set of data points, it is required to determine the polygon of the subdivision occupied by each data point.

Problem 4: Visibility Polygon from a Point Inside a Polygon. Given a simple polygon P and a point p inside P, it is required to determine the region of P that is visible from p (i.e., the region occupied by points q such that the line segment with endpoints p and q does not intersect any edge of P).

Problem 5: Closest Pair. Given a finite set of points in the plane, it is required to determine which two are closest to one another.

Problem 6: Voronoi Diagram. Given a finite set S of data points in the plane, it is required to find, for each point p of S, the region of the plane formed by points that are closer to p than to any other point of S.

Problem 7: Minimum-Distance Matching. Given 2n points in the plane, it is required to match each point with a single other point so that the sum of the Euclidean distances between matched points is as small as possible.

Problem 8: Polygon Triangulation. Given a simple polygon P, it is required to triangulate P (i.e., to connect the vertices of P with a set of chords such that every resulting polygonal region is a triangle).

Each of the problems above has been studied thoroughly in the literature, and often more than one efficient sequential algorithm exists for its solution. The list above is also illustrative of geometric problems with a significant degree of inherent parallelism. Take, for instance, Problem 2. It is obvious that one could check all pairs of segments simultaneously and determine all existing intersections.
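The brute-force scheme just described for Problem 2 can be simulated sequentially: with one processor per pair, each of the tests below could be performed in a single parallel step. The helper names are ours, and the pairwise test used is the standard orientation-based segment intersection predicate, sketched here only for illustration.

```python
from itertools import combinations

def orient(p, q, r):
    # sign of the turn p -> q -> r: +1 left, -1 right, 0 collinear
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_intersect(s, t):
    """True if the closed segments s and t share at least one point."""
    (p1, p2), (p3, p4) = s, t
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 != d2 and d3 != d4:          # proper crossing
        return True
    def on(a, b, c):                   # collinear point c on segment ab
        return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))
    return ((d1 == 0 and on(p3, p4, p1)) or (d2 == 0 and on(p3, p4, p2))
            or (d3 == 0 and on(p1, p2, p3)) or (d4 == 0 and on(p1, p2, p4)))

def all_intersecting_pairs(segs):
    # The n(n-1)/2 tests are independent of one another, so with that
    # many processors all pairs could be examined in one parallel step.
    return [(i, j) for (i, s), (j, t) in combinations(enumerate(segs), 2)
            if segments_intersect(s, t)]

segs = [((0, 0), (2, 2)), ((0, 2), (2, 0)), ((3, 3), (4, 4))]
print(all_intersecting_pairs(segs))  # → [(0, 1)]
```

As the text notes next, such a solution is fast but profligate: it uses a quadratic number of processors regardless of how many intersections actually exist.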
Similarly, in Problem 3, all polygons of the subdivision may be checked at the same time for inclusion of a given data point. The same is true of Problem 5, where the closest neighbor of each point can be computed in parallel for all points and the overall closest pair of points quickly determined afterward. These examples demonstrate that very fast solutions to geometric problems can be obtained through parallel computation. However, the solutions just outlined are rather crude and typically require a large number of resources. Our purpose in this book is to show that geometric problems can be, and indeed have been, solved by algorithms that are efficient both in terms of running time and computational resources.

We now introduce some terminology and notation used throughout the book. As customary in computational geometry, we refer to the number of relevant objects in the statement of a problem as the size of that problem. For example, the size of Problem 1
is the number of points, while the size of Problem 2 is the number of line segments. Let f(n) and g(n) be functions from the positive integers to the positive reals:

1. The function g(n) is said to be of order at least f(n), denoted Ω(f(n)), if there are positive constants c and n₀ such that g(n) ≥ cf(n) for all n ≥ n₀.
2. The function g(n) is said to be of order at most f(n), denoted O(f(n)), if there are positive constants c and n₀ such that g(n) ≤ cf(n) for all n ≥ n₀.
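The two definitions can be spot-checked numerically for candidate witnesses c and n₀; the helper below is our own illustration (a finite scan over a range of n, which supports but of course does not prove an asymptotic claim).

```python
import math

def bound_holds(g, f, c, n0, kind, n_max=10_000):
    """Spot-check g(n) >= c*f(n) (kind='omega') or g(n) <= c*f(n)
    (kind='big_o') for all integers n0 <= n <= n_max."""
    for n in range(n0, n_max + 1):
        if kind == "omega" and g(n) < c * f(n):
            return False
        if kind == "big_o" and g(n) > c * f(n):
            return False
    return True

# An operation count of the form 3n log n + 5n is both Omega(n log n)
# (witnesses c = 3, n0 = 2) and O(n log n) (witnesses c = 9, n0 = 2).
g = lambda n: 3 * n * math.log2(n) + 5 * n
print(bound_holds(g, lambda n: n * math.log2(n), 3, 2, "omega"))   # True
print(bound_holds(g, lambda n: n * math.log2(n), 9, 2, "big_o"))   # True
```

The witnesses in the example follow from 5n ≤ 6n log₂ n for all n ≥ 2, which is the kind of constant-juggling the asymptotic notation lets us ignore.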
The Ω( ) notation is used to express lower bounds on the computational complexity of problems. For example, to say that Ω(n log n) is a lower bound on the number of operations required to solve a certain problem of size n in the worst case means that the problem cannot be solved by any algorithm (whether known or yet to be discovered) in fewer than cn log n operations in the worst case, for some constant c. On the other hand, the O( ) notation is used to express upper bounds on the computational complexity of problems. For example, if there exists an algorithm that solves a certain problem of size n in cn² operations in the worst case, for some constant c, and furthermore, no other algorithm is known that requires asymptotically fewer operations in the worst case, then we say that O(n²) is an upper bound on the worst-case complexity of the problem at hand. Both the Ω( ) and the O( ) notations allow us to concentrate on the dominating term in an expression describing a lower or upper bound and to ignore any multiplicative constants.

In an algorithm, an elementary operation is either a computation assumed to
take constant time (such as adding or comparing two numbers) or a routing step (i.e., the sending of a datum from one processor to a neighboring processor in a parallel computer). The number of elementary operations used by a sequential algorithm is generally used synonymously with the running time of the algorithm. Thus if a sequential algorithm performs cn operations to solve a problem of size n, where c is some constant, we say that the algorithm runs in time t(n) = O(n). In a parallel algorithm, an upper bound on the total number of operations performed by all processors collectively in solving a problem (also known as the cost or work) is obtained by multiplying the number of processors by the running time of the algorithm (i.e., the maximum number of operations performed by any one processor).

When an algorithm solves a problem using a number of operations that matches, up to a constant multiplicative factor, the lower bound on the computational complexity of the problem, we say that the algorithm is optimal if it is a sequential algorithm, and cost optimal if it is a parallel algorithm. A randomized algorithm (whether sequential or parallel) is one that terminates within a prespecified running time with a given probability. The running time of such an algorithm is said to be probabilistic. A deterministic algorithm, on the other hand, has a guaranteed worst-case running time. In this book we refer to an algorithm as being deterministic only in those circumstances where it is to be contrasted with a randomized algorithm. When no qualifier is used explicitly, it should be understood that the algorithm in question (whether sequential or parallel) is deterministic. We adopt the notation used in [Reif90], where Õ( ) is used to express the running time
of a probabilistic algorithm, while O( ) is used in conjunction with deterministic time. Sometimes the expected running time of a deterministic algorithm is of interest. To this end, an average-case analysis is conducted assuming that the input obeys a certain probability distribution.
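As a small worked example of the cost measure defined earlier in this section (number of processors multiplied by running time), consider summing n numbers with p processors by having each processor add its block of about n/p values and then combining the p partial sums in a binary tree. The step-count formula below is a standard textbook estimate, and the function itself is our own sketch, not taken from this book.

```python
import math

def parallel_sum_cost(n, p):
    """Return (time, cost) for summing n numbers with p processors:
    each processor first adds its block of ceil(n/p) values
    sequentially, then the p partial sums are combined in a binary
    tree of ceil(log2 p) parallel steps; cost = p * time."""
    time = math.ceil(n / p) + math.ceil(math.log2(p))
    return time, p * time

n = 1 << 20
for p in (1, n // 20, n):          # 20 = log2(n)
    t, cost = parallel_sum_cost(n, p)
    print(f"p = {p:>7}: time = {t:>7}, cost = {cost}")
```

With p = n / log₂ n processors the cost remains O(n), matching the trivial sequential lower bound, so the algorithm is cost optimal in the sense just defined; with p = n processors the running time drops only to O(log n) while the cost grows to O(n log n).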
1.3 Organization of the Book

Unlike in sequential computation, where von Neumann's model prevails, several models of parallel computation have been proposed and used. In Chapter 2 we introduce the most common of these models, particularly those used to design parallel algorithms for computational geometry. Many of the more interesting results in parallel computational geometry are algorithms designed for the shared-memory PRAM model of computation (Section 2.3.1). Algorithms for the less powerful network models are often seen as more practical since actual machines based on these models can be constructed more readily. However, designing algorithms for the PRAM model results in complexities that reflect the inherent limits of solving a problem in parallel rather than limits due to data movement [Yap87]. For example, the lower bound for communicating data on a mesh-connected computer (Section 2.2.2) is Ω(n^{1/2}), and this lower bound holds for most interesting algorithms on the mesh. In this book we describe parallel algorithms that solve geometric problems on both network and shared-memory models.

The main body of the book, Chapters 3 to 10, is organized according to geometric problem. In each of these chapters we describe a problem and give some idea of the extent to which the problem has been studied in the sequential world and the best known time complexities for solving the problem sequentially. We then describe parallel solutions to the problem and discuss the significance of the parallel results. When appropriate, a table is provided summarizing the complexities of existing parallel algorithms for the problem. Finally, Chapters 11 and 12 cover current trends and future directions, respectively, in the design of parallel algorithms for geometric problems.
We stress that the parallel algorithms selected for treatment in this survey are for problems that are fundamental in nature, such as construction, proximity, intersection, search, visibility, separability, and optimization. By contrast, parallel algorithms for various applications of computational geometry are covered in [Uhr87, Kuma90, Kuma91]. We also draw the reader's attention to [Good92b], where another survey of parallel computational geometry can be found.
1.4 Problems

1.1. A parallel computer is a computer consisting of several processors that cooperate to solve a problem simultaneously. This definition leaves many details unspecified, particularly those pertaining to the structure and operation of a parallel computer. Several of the options available in designing a parallel computer, or more specifically, a parallel model of computation, are outlined in Chapter 2. Suppose that you had to design a parallel computer.
Before reading Chapter 2, and perhaps to better appreciate the issues therein, describe how your parallel computer will be organized and how the processors will function when solving a problem in parallel.

1.2. The convex hull of a finite set P of points in the plane is the smallest convex polygon that contains all the points of P. The convex hull is a cornerstone concept in computational geometry, and algorithms for computing it, both sequentially and in parallel, have provided many insights to the field's theory and practice. Use the parallel computer designed in solving Problem 1.1 to compute the convex hull of a set of points.

1.3. Given a finite set P of points in the plane, it is required to compute a triangulation of P (i.e., to connect the points of P repeatedly using nonintersecting straight-line segments until no more segments can be added without creating an intersection). The resulting structure is the convex hull of P and a collection of polygons inside it, each of which is a triangle. Suggest a way to solve this problem in parallel.

1.4. Assume that the triangulation T of a set of points P, as defined in Problem 1.3, is known. Given a point p not in P, it is required to determine the triangle (if any) of T in which p falls. Is there a fast way to solve this problem in parallel? How many processors will be needed? Express your answers in terms of the number of points in P.

1.5. In some applications it is required to determine, given a finite set of points in the plane, which two are closest (if several pairs satisfy the condition, one may be selected at random). Propose an efficient parallel solution to this problem (i.e., one that is fast and does not use an excessive number of processors). Can your solution be extended to points in d-dimensional space, where d > 2?

1.6. Another common problem in computational geometry is to determine intersections of a number of objects.
For simplicity, let us assume that all objects are equilateral triangles in the plane, all of the same size and all having one edge parallel to the x-axis. Design a parallel algorithm to solve this problem, and discuss its time and processor requirements.

1.7. An n-vertex convex polygon P is given in the plane such that its interior contains the origin of coordinates. It is required to identify P (i.e., determine its shape and location) using finger probes. For a chosen directed line L, a finger probe can be thought of as a point moving from infinity along and in the direction of L until it first touches P at some point p. The outcome of the probe is the pair (L, p), where p is ∞ if L does not intersect P. A sequence of such probes may be used to determine the exact shape and location of P. Design a parallel algorithm for identifying a set of probes sufficient to determine the shape and location of a given convex polygon.

1.8. General polygons are polygons in which two or more edges may cross. This class of polygons includes simple polygons as a subclass. (In simple polygons, no two edges may cross.)
(a) Give a definition of the interior of a general polygon.
(b) Design a test for point inclusion in a general polygon (i.e., a test to determine whether a given data point falls inside a given general polygon).
(c) Design a test for polygon inclusion in a general polygon (i.e., a test to determine whether a general polygon is included inside another general polygon).
(d) Design a test for polygon intersection [i.e., a test to determine whether two general polygons intersect (of which inclusion is a special case)].
(e) Develop parallel implementations of the tests in parts (b) through (d).
(f) Are there applications where general polygons arise?

1.9. You are given two simple polygons P and Q in the plane, where Q falls entirely inside
P. Now consider two points p and q in the annular region R = P − Q (i.e., p and q are in P but not in Q). It is required to find a polygonal path from p to q that falls entirely in R and satisfies one or both of the following conditions:
(a) The number of line segments on the path is minimized.
(b) The total length of the path is minimized.
Develop a parallel algorithm for solving this problem.

1.10. Given a point p inside a simple polygon P, it is required to compute a polygonal path from p to every vertex v of P such that the number of line segments on each path, also called the link distance from p to v, is minimized. Design a parallel algorithm for solving this problem.

1.11. Using the definition of link distance in Problem 1.10, develop an adequate definition for the concept of the link center of a simple polygon P, and design a parallel algorithm for locating such a center.

1.12. In sequential computation, data structures play a crucial role in the development of efficient algorithms, particularly for computational geometric problems. Discuss how data structures might be implemented in a parallel computational environment.
1.5 References

[Akl82] S. G. Akl, A constant-time parallel algorithm for computing convex hulls, BIT, Vol. 22, 1982, 130-134.
[Chow80] A. L. Chow, Parallel algorithms for geometric problems, Ph.D. thesis, University of Illinois at Urbana-Champaign, 1980.
[Edel87] H. Edelsbrunner, Algorithms in combinatorial geometry, in EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1987.
[Good92b] M. T. Goodrich and C. K. Yap, What can be parallelized in computational geometry: a survey, manuscript in preparation, 1992.
[Kuma90] V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal (Editors), Parallel Algorithms for Machine Intelligence and Vision, Springer-Verlag, New York, 1990.
[Kuma91] V. K. Prasanna Kumar, Parallel Architectures and Algorithms for Image Understanding, Academic Press, New York, 1991.
[Lee84b] D. T. Lee and F. P. Preparata, Computational geometry--a survey, IEEE Transactions on Computers, Vol. C-33, No. 12, 1984, 1072-1101.
[Mehl84] K. Mehlhorn, Data structures and algorithms 3: multi-dimensional searching and computational geometry, in EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1984.
[Nath80] D. Nath, S. N. Maheshwari, and P. C. P. Bhatt, Parallel Algorithms for the Convex Hull Problem in Two Dimensions, Technical Report EE 8005, Department of Electrical Engineering, Indian Institute of Technology, Delhi Hauz Khas, New Delhi, October 1980.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Reif90] J. H. Reif and S. Sen, Randomized algorithms for binary search and load balancing on fixed connection networks with geometric applications (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 327-337.
[Uhr87] L. Uhr (Editor), Parallel Computer Vision, Academic Press, New York, 1987.
[Yap87] C. K. Yap, What can be parallelized in computational geometry? Invited talk at the International Workshop on Parallel Algorithms and Architectures, Humboldt University, Berlin, May 1987, Lecture Notes in Computer Science, No. 269, Springer-Verlag, Berlin, 1988, 184-195.
2 Models of Parallel Computation
In this chapter we define existing models of parallel computation, with a particular emphasis on those used in the development of parallel computational geometric algorithms.
2.1 Early Models

We begin this review with two models that predate today's more popular ones (developed mostly since the late 1970s). Interestingly, the salient features of these two models were later rediscovered, and new names are now used for these models.

2.1.1 Perceptrons

The perceptron, proposed in the late 1950s [Rose62], was intended to model the visual pattern recognition ability in animals. A rectangular array of photocells (representing the eye's retina) receives as input from the outside world a binary pattern belonging to one of two classes. Inside the machine, the input bits are collected into n groups. Within a group, each bit is multiplied (at random) by +1 or −1, and the products are added: If the sum is larger than or equal to a certain threshold, a 1 is produced as input to the next stage of the computation; otherwise, a 0 is produced. Each of these n bits is then multiplied by an appropriate weight value wᵢ and the products are added: Again, if the sum is larger than or equal to a given threshold, a final output of 1 is produced (indicating that the original input pattern belongs to one class); otherwise, a 0 is produced (indicating the other class). Figure 2.1 shows a rectangular array and a perceptron.

A number of limitations of the perceptron model are uncovered in [Mins69]. It is important to point out that many of the ideas underlying today's neural net model
Figure 2.1 Perceptron.
of computation owe their origins to perceptrons. Neural nets, however, are more general in a number of ways; for example, they are not restricted to visual patterns, they can classify an input into one of several classes, and their computations may be iterative [Lipp87].

2.1.2 Cellular Automata

The cellular automaton consists of a collection of simple processors, all of which are identical. Each processor has a fixed amount of local memory and is connected to a finite set of neighboring processors. Figure 2.2 illustrates an example of a processor (a cell) in a cellular automaton and its neighboring processors. At each step of a computation, all processors operate simultaneously: Input is received from a processor's neighbors (and possibly the outside world), a small amount of local computation is performed, and the output is then sent to the processor's neighbors (and possibly the outside world).

Developed in the early to mid-1960s [Codd68], this model enjoyed a purely theoretical interest until the advent of very large scale integrated circuits. The dramatic reduction in processor size brought about by the new technology rendered the model feasible for real computers. Today's systolic arrays [Fost80] are nothing but finite cellular automata, often restricted to two-dimensional regular interconnection patterns, with various input and output limitations. Cellular automata are the theoretical foundation on which the more general processor network model of Section 2.2 rests.

2.2 Processor Networks

In a processor network, an interconnected set of processors, numbered 0 to N − 1, cooperate to solve a problem by performing local computations and exchanging
Figure 2.2 Example of a processor and its neighbors in a cellular automaton.

Figure 2.3 Linear array with N = 6 processors.
messages [Akl89a]. Although all identical, the processors may be simple or powerful, operate synchronously or asynchronously, and execute the same or different algorithms. The interconnection network may be regular or irregular, and the number of neighbors of each processor may be a constant or a function of the size of the network. Local computations as well as message exchanges are taken into consideration when analyzing the time taken by a processor network to solve a problem. Some of the most widely used networks are outlined below.

2.2.1 Linear Array

The simplest way to interconnect N processors is as a one-dimensional array. Here processor i is linked to its two neighbors i - 1 and i + 1 through a two-way communication line. Each of the end processors, 0 and N - 1, has only one neighbor. Figure 2.3 shows an example of a linear array of processors for N = 6.

2.2.2 Mesh or Two-Dimensional Array

A two-dimensional network is obtained by arranging the N processors into an m x m array, where m = N^(1/2). The processor in row j and column k is denoted by (j, k), where 0 ≤ j ≤ m - 1 and 0 ≤ k ≤ m - 1. A two-way communication line links (j, k) to its neighbors (j + 1, k), (j - 1, k), (j, k + 1), and (j, k - 1). Processors on the boundary rows and columns have fewer than four neighbors and hence fewer connections. This network is also known as the mesh or the mesh-connected computer (MCC) model.
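A processor's position in the mesh is a simple function of its index; anticipating the indexing schemes discussed below (row-major, snakelike row-major, and shuffled row-major), here is an illustrative sketch (Python; the function names are ours, not from the literature):

```python
def row_major(i, m):
    """Row-major order: processor i sits at row i // m, column i % m."""
    return i // m, i % m

def snakelike(i, m):
    """Snakelike row-major order: odd-numbered rows run right to left,
    so i = jm + k when j is even and i = jm + m - k - 1 when j is odd."""
    j = i // m
    k = i % m if j % 2 == 0 else m - 1 - (i % m)
    return j, k

def shuffled(i, q):
    """Return the index i_s whose q-bit representation interleaves the two
    halves of the q-bit representation b1 ... bq of i (b1 b_{q/2+1} b2 ...)."""
    bits = [(i >> (q - 1 - t)) & 1 for t in range(q)]   # b1, ..., bq
    half = q // 2
    interleaved = [b for pair in zip(bits[:half], bits[half:]) for b in pair]
    i_s = 0
    for b in interleaved:
        i_s = (i_s << 1) | b
    return i_s

# A 4 x 4 mesh (N = 16, q = 4 bits per index):
assert row_major(7, 4) == (1, 3)
assert snakelike(7, 4) == (1, 0)     # row 1 runs right to left
assert shuffled(2, 4) == 4           # 0010 -> 0100
```

Processor `shuffled(i, q)` then occupies the position that processor i would occupy in row-major order.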
Figure 2.4 Mesh with N = 16 processors.

When each of its processors is associated with a picture element (or pixel) of a digitized image (i.e., a rectangular grid representation of a picture), the mesh is sometimes referred to as a systolic screen. Figure 2.4 shows a mesh with N = 16 processors. A number of indexing schemes are used for the processors in a mesh [Akl85b]. For example, in row-major order, processor i is placed in row j and column k of the two-dimensional array such that i = jm + k, for 0 ≤ i ≤ N - 1, 0 ≤ j ≤ m - 1, and 0 ≤ k ≤ m - 1. In snakelike row-major order, processor i is placed in row j and column k of the processor array such that i = jm + k when j is even, and i = jm + m - k - 1 when j is odd, where i, j, and k are as before. Finally, shuffled row-major order is defined as follows. Let b_1 b_2 ... b_q and b_1 b_{q/2+1} b_2 b_{q/2+2} b_3 b_{q/2+3} ... b_{q/2} b_q be the binary representations of two indices i and i_s, respectively, where 0 ≤ i, i_s ≤ N - 1. Then processor i_s occupies in shuffled row-major order the position that would be occupied by processor i in row-major order.

The mesh model can be generalized to dimensions higher than two. In a d-dimensional mesh, each processor is connected to two neighbors in each dimension, with processors on the boundary having fewer connections [Akl85b, Hole90]. Several variations on the mesh have been proposed, including the mesh with broadcast buses [where the processors in each row (or column) are connected to a bus
Figure 2.5 Tree with N = 2^4 - 1 = 15 processors.
over which a processor can broadcast a datum to all other processors in the same row (or column)], and the mesh with reconfigurable buses (which is essentially a mesh with broadcast buses and four switches per processor, allowing several subbuses to be created as needed by the algorithm).

2.2.3 Tree

In a tree network, the processors form a complete binary tree with d levels. The levels are numbered from 0 to d - 1 and there are a total of N = 2^d - 1 nodes, each of which is a processor. A processor at level i is connected by a two-way line to its parent at level i + 1 and to its children at level i - 1. The root processor (at level d - 1) has no parent, and the leaves (all of which are at level 0) have no children. Figure 2.5 shows a tree with N = 2^4 - 1 = 15 nodes.
2.2.4 Mesh-of-Trees

In a mesh-of-trees (MOT) network, N processors are placed in a square array with N^(1/2) rows and N^(1/2) columns. The processors in each row are interconnected to form a binary tree, as are the processors in each column. The tree interconnections are the only links among the processors. Figure 2.6 shows a mesh-of-trees with N = 16 processors. Sometimes the mesh-of-trees architecture is described slightly differently. Here, N processors form an N^(1/2) x N^(1/2) base such that each base processor is a leaf of a column binary tree and a row binary tree. Additional processors form the row and column binary trees. Each base processor is connected to its parent processor in its column binary tree and its parent processor in its row binary tree. The total number of processors is O(N). In some cases, mesh connections between the base processors are allowed.
Figure 2.6 Mesh-of-trees with N = 16 processors.
This architecture can be nicely embedded in the plane, making it useful for VLSI implementation. Figure 2.7 shows this different mesh-of-trees architecture.
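To ground the O(N) count for this second formulation: a binary tree over N^(1/2) leaves adds N^(1/2) - 1 internal processors, and there are N^(1/2) row trees and N^(1/2) column trees, for a total of N + 2 N^(1/2)(N^(1/2) - 1) = 3N - 2N^(1/2) processors. A quick check (Python; the arithmetic is ours, assuming complete binary trees over the rows and columns):

```python
from math import isqrt

def mot_total(N):
    """Total processors in the second mesh-of-trees formulation:
    N base leaves plus the internal nodes of sqrt(N) row trees
    and sqrt(N) column trees."""
    m = isqrt(N)
    assert m * m == N, "N must be a perfect square"
    internal_per_tree = m - 1          # binary tree over m leaves
    return N + 2 * m * internal_per_tree

assert mot_total(16) == 40             # = 3*16 - 2*4, as in Figure 2.7
assert all(mot_total(m * m) <= 3 * m * m for m in range(2, 50))   # O(N)
```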
2.2.5 Pyramid A one-dimensional pyramid computer is obtained by adding two-way links connecting processors at the same level in a binary tree, thus forming a linear array at each level. This concept can be extended to higher dimensions. For example, a two-dimensional
Figure 2.7 Slightly different mesh-of-trees where N = 16. The N boxes are the base processors and the black circles are additional processors that form row and column binary trees.
Figure 2.8 Pyramid with d = 2.
pyramid consists of (4^(d+1) - 1)/3 processors distributed among d + 1 levels. All processors at the same level are connected to form a mesh. There are 4^d processors at level 0 (also called the base), arranged in a 2^d x 2^d mesh. There is only one processor at level d (also called the apex). In general, a processor at level i, in addition to being connected to its four neighbors at the same level, also has connections to four children at level i - 1 (provided that i > 0) and to one parent at level i + 1 (provided that i < d). Figure 2.8 shows a pyramid with d = 2.
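The processor count follows from summing the mesh sizes over the levels: the sum of 4^i for i = 0, ..., d equals (4^(d+1) - 1)/3. A quick check (an illustrative Python sketch):

```python
def pyramid_processors(d):
    """Sum the mesh sizes over the levels of a two-dimensional pyramid:
    level i holds 4^(d - i) processors, from a 4^d base (level 0)
    up to a single apex processor (level d)."""
    return sum(4 ** (d - i) for i in range(d + 1))

# Closed form from the text: (4^(d+1) - 1) / 3
assert all(pyramid_processors(d) == (4 ** (d + 1) - 1) // 3 for d in range(8))
assert pyramid_processors(2) == 21     # Figure 2.8: 16 + 4 + 1
```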
Figure 2.9 Hypercube with N = 2^3 processors.
2.2.6 Hypercube

Assume that N = 2^d for some d ≥ 1. A d-dimensional hypercube is obtained by connecting each processor to d neighbors. The d neighbors of processor i are those processors j such that the binary representations of the numbers i and j differ in exactly one bit. Figure 2.9 shows a hypercube with N = 2^3 processors.

2.2.7 Cube-Connected Cycles

To obtain a cube-connected cycles (CCC) network, we begin with a d-dimensional hypercube, then replace each of its 2^d corners with a cycle of d processors. Each processor in a cycle is connected to a processor in a neighboring cycle in the same dimension. See Figure 2.10 for an example of a CCC network with d = 3 and N = d · 2^d = 24 processors. In the figure, each processor has two indices i, j, where i is the processor's order in cycle j. A modified CCC network is a CCC network with additional links, guaranteeing that it can be partitioned into smaller CCC networks [Mill88].

2.2.8 Butterfly

A butterfly network consists of 2^d (d + 1) processors organized into d + 1 rows and 2^d columns. If (i, j) is the processor in row i and column j, then for i > 0, (i, j) is connected to (i - 1, j) and (i - 1, k), where the binary representations of the numbers j and k differ only in the ith most significant bit. A butterfly network with 2^3 (3 + 1) processors is illustrated in Figure 2.11. The butterfly is related to both the hypercube and the CCC architectures. A link in a hypercube between processors i and j such that the binary representations of the numbers i and j differ in the rth bit corresponds to a link in a butterfly between processors (r - 1, i) and (r, j). To see how the butterfly is related to the CCC model, we begin by identifying row 0 with row d. Now, consider each column of processors in the butterfly as a node of a d-dimensional hypercube such that the processors in a column are connected in a cycle at the node in the order in
Figure 2.10 Cube-connected cycles network with d = 3 and N = 24 processors.
which they appear in the column. Any algorithm that can be implemented in T(n) time on a butterfly can be implemented in T(n) time on a hypercube and on a CCC network [Ullm84].

2.2.9 AKS Sorting Network

An O(n)-processor network capable of sorting n numbers into nondecreasing order in O(log n) time¹ is exhibited in [Leig85], based on the earlier work of [Ajta83].

¹All logarithms in this book are to the base 2 unless stated otherwise.
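The hypercube's neighbor rule, and the stated correspondence between hypercube and butterfly links, can be sketched directly (an illustrative Python check, not an implementation of any algorithm from the text):

```python
def hypercube_neighbors(i, d):
    """Neighbors of processor i in a d-dimensional hypercube: flip one bit."""
    return [i ^ (1 << b) for b in range(d)]

assert sorted(hypercube_neighbors(0, 3)) == [1, 2, 4]
assert sorted(hypercube_neighbors(5, 3)) == [1, 4, 7]   # 101 -> 001, 100, 111

# A hypercube link {i, j} whose endpoints differ in the rth bit (counting
# from the most significant) corresponds to the butterfly link between
# processors (r - 1, i) and (r, j); here we just recover a valid r per link.
d = 3
for i in range(2 ** d):
    for j in hypercube_neighbors(i, d):
        r = d - (i ^ j).bit_length() + 1   # 1-based position of the differing
                                           # bit, most significant bit first
        assert 1 <= r <= d
```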
Figure 2.11 Butterfly network with 2^3 (3 + 1) processors.
This network, combined with a modified CCC network, is referred to in [Mill88] as a modified AKS network.

2.2.10 Stars and Pancakes
These are two interconnection networks with the property that, for a given integer η, each processor corresponds to a distinct permutation of η symbols, say {1, 2, ..., η}. In other words, both networks connect N = η! processors, and each processor is labeled with the permutation to which it corresponds. Thus, for η = 4, a processor may have the label 2134. In the star network, denoted by S_η, a processor v is connected to a processor u if and only if the label of u can be obtained from that of v by exchanging
the first symbol with the ith symbol, where 2 ≤ i ≤ η. Thus for η = 4, if v = 2134 and u = 3124, then u and v are connected by a two-way link in S_4, since 3124 and 2134 can be obtained from one another by exchanging the first and third symbols. Figure 2.12 shows S_4 (a 4-star).

In the pancake network, denoted by P_η, a processor v is connected to a processor u if and only if the label of u can be obtained from that of v by flipping the first i symbols, where 2 ≤ i ≤ η. Thus for η = 4, if v = 2134 and u = 4312, then u and v are connected by a two-way link in P_4, since 4312 can be obtained from 2134 by flipping all four symbols, and vice versa. Figure 2.13 shows P_4.

Both the star and pancake interconnection networks have been proposed as alternatives to the hypercube. They have recently been used to solve several problems in computational geometry. These two networks, their properties, and associated algorithms are studied in detail in Chapter 11.
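The two adjacency rules are easy to state in code (permutations as tuples; an illustrative sketch with function names of our choosing):

```python
def star_neighbors(v):
    """S_eta rule: exchange the first symbol with the ith symbol, 2 <= i <= eta."""
    out = []
    for i in range(1, len(v)):          # positions 1..eta-1, i.e., symbols 2..eta
        u = list(v)
        u[0], u[i] = u[i], u[0]
        out.append(tuple(u))
    return out

def pancake_neighbors(v):
    """P_eta rule: flip (reverse) the first i symbols, 2 <= i <= eta."""
    return [tuple(reversed(v[:i])) + v[i:] for i in range(2, len(v) + 1)]

v = (2, 1, 3, 4)
assert (3, 1, 2, 4) in star_neighbors(v)      # exchange first and third symbols
assert (4, 3, 1, 2) in pancake_neighbors(v)   # flip all four symbols
assert len(star_neighbors(v)) == len(pancake_neighbors(v)) == 3
```

Note that every processor in S_η or P_η has exactly η - 1 neighbors, which for η = 4 reproduces the degree-3 graphs of Figures 2.12 and 2.13.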
2.3 Shared-Memory Machines
One of the main challenges involved in designing an algorithm for a processor network follows from the fact that the routing of messages from one processor to another is the responsibility of the algorithm designer. This challenge is removed completely by the models described in this section.
Figure 2.13 A 4-pancake.
2.3.1 Parallel Random Access Machine

In a parallel random access machine (PRAM), the processors no longer communicate directly through a network. Instead, a common memory is used as a bulletin board, and all data exchanges are executed through it. Any pair of processors can communicate through this shared memory in constant time. As shown in Figure 2.14, an interconnection unit (IU) allows each processor to establish a path to each memory location for the purpose of reading or writing. The processors operate synchronously, and each step of a computation consists of three phases:

1. The read phase, in which the processors read data from memory
2. The compute phase, in which arithmetic and logic operations are performed
3. The write phase, in which the processors write data to memory

Depending on whether two or more processors are allowed to read from and/or write to the same memory location simultaneously, three submodels of the PRAM are identified:

1. The exclusive-read exclusive-write (EREW) PRAM, where both read and write accesses by more than one processor to the same memory location are disallowed
2. The concurrent-read exclusive-write (CREW) PRAM, where simultaneous reading from the same memory location is allowed, but not simultaneous writing
Figure 2.14 PRAM.
3. The concurrent-read concurrent-write (CRCW) PRAM, where both forms of simultaneous access are allowed

In the case of the CRCW PRAM, one must also specify how write conflicts are to be resolved (i.e., what value is stored in a memory location when two or more processors attempt to write potentially different values simultaneously to that location). Several conflict resolution policies have been proposed, such as the PRIORITY rule (where processors are assigned fixed priorities, and only the one with the highest priority is allowed to write in case of conflict), the COMMON rule (where, in case of conflict, the processors are allowed to write only if they are attempting to write the same value), the ARBITRARY rule (where any one of the processors attempting to write succeeds), the SMALLEST rule (where only the processor wishing to write the smallest datum succeeds), the AND rule (where the logical AND of the Boolean values to be written ends up being stored), the SUM rule (where the values to be stored are added up and the sum deposited in the memory location), the COLLISION rule (where a special symbol is stored in the memory location to indicate that a write conflict has occurred), and many other variants.

Computational geometry on a single processor uses the REAL RAM as the model of computation, which allows real arithmetic to arbitrary precision, as well as evaluation of square roots and analytic functions, such as "sin" or "cos", in O(1) time. In parallel, the model of computation is the REAL PRAM, sometimes denoted RPRAM. We make the assumption that the PRAM model used for computational geometry is the REAL PRAM and refer to it simply as the PRAM.

Two fundamental algorithms for the EREW PRAM are broadcasting and sorting. Broadcasting allows a datum d to be communicated to N processors in O(log N) time, by beginning with one processor reading d and then doubling the number of processors that have d at each iteration [Akl85b]. The second algorithm, sorting, allows N numbers to be sorted into nondecreasing order by N processors in O(log N) time
[Cole88b]. By using these two algorithms, it is possible to simulate any concurrent-read or concurrent-write step involving N processors in O(log N) time on a PRAM that disallows them [Akl89a, Corm90].
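The doubling idea behind EREW broadcasting can be simulated sequentially (a sketch, not the book's algorithm verbatim; each round models one parallel step in which every processor holding d copies it, via a distinct memory location, to one processor that does not yet have it):

```python
import math

def erew_broadcast(N):
    """Simulate broadcasting a datum to N processors with exclusive reads
    and writes: the number of holders doubles each round, so ceil(log2 N)
    rounds suffice. Returns the number of rounds used."""
    has_datum = [False] * N
    has_datum[0] = True                   # processor 0 reads d from memory
    rounds = 0
    while not all(has_datum):
        holders = [p for p in range(N) if has_datum[p]]
        empty = [p for p in range(N) if not has_datum[p]]
        # Each holder writes d to a distinct location, read by a distinct
        # processor -- no two processors touch the same cell in one step.
        for src, dst in zip(holders, empty):
            has_datum[dst] = True
        rounds += 1
    return rounds

assert erew_broadcast(16) == 4
assert erew_broadcast(100) == math.ceil(math.log2(100))   # 7 rounds
```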
2.3.2 Scan Model
Given n data items x_0, x_1, ..., x_(n-1) and an associative binary operation *, it is required to compute the n - 1 quantities x_0 * x_1, x_0 * x_1 * x_2, ..., x_0 * x_1 * ... * x_(n-1). It is well known that all required outputs can be computed in parallel on a processor network with n processors in O(log n) time. This is known as a parallel prefix computation [Krus85]. It is argued in [Blel89] that since O(log n) is the amount of time required to gain access to a shared memory of O(n) locations, the time for a parallel prefix computation can be absorbed by the time for memory access. Further, since the latter is assumed to take constant time, so should the former. The scan model is therefore the usual PRAM augmented with a special circuit to perform parallel prefix. As a result, many algorithms that use parallel prefix and run on the PRAM in T time units run on the scan model in T/log n time units.
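The parallel prefix (scan) computation has a simple O(log n)-step doubling structure; a sequential simulation of the doubling steps (an illustrative sketch of the well-known technique, computing the inclusive scan, which also returns x_0 itself as the first output):

```python
from itertools import accumulate
import operator

def parallel_prefix(x, op=operator.add):
    """Simulate the log2(n)-step doubling scan: in the step with offset 2^j,
    every position i >= 2^j combines in the value from position i - 2^j.
    Correct for any associative binary operation op."""
    s = list(x)
    offset = 1
    while offset < len(s):
        s = [s[i] if i < offset else op(s[i - offset], s[i])
             for i in range(len(s))]     # all positions update "simultaneously"
        offset *= 2
    return s

x = [3, 1, 4, 1, 5, 9, 2, 6]
assert parallel_prefix(x) == list(accumulate(x))   # [3, 4, 8, 9, 14, 23, 25, 31]
```

For n = 8 the loop runs 3 = log 8 times, each pass modeling one parallel step over n processors.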
2.3.3 Broadcasting with Selective Reduction Broadcasting with selective reduction, proposed in [Akl89c], extends the power of the CRCW PRAM while using only its existing resources: The interconnection unit connecting processors to memory locations is exploited to allow each processor to gain access to potentially all memory locations (broadcasting). At each step of an algorithm involving a concurrent write, the algorithm can specify, for each memory location, which processors are allowed to write in that location (selection) and the rule used to combine these write requests (reduction). This model is described in detail in Chapter 11 together with algorithms for solving geometric problems on it.
2.3.4 Models for the Future

All popular models of computation today are based largely on assumptions derived from digital electronics. It is believed, however, that new models may emerge from totally different approaches to building computers. There are already computers in existence today in which some devices are built using optical components [Feit88]. The day may not be far off when central processing units and memories are optical. There are also studies under way investigating the possibility of building biologically based computers [Conr86]. The effect these models may have on our approach to algorithm design in general, and computational geometry in particular, is still unknown.
2.4 Problems

2.1. How does your parallel computer, designed in solving Problem 1.1, compare with the parallel models of computation described in this chapter?

2.2. In Chapter 3, several parallel algorithms are described for computing the convex hull of a set of n points in the plane (defined in Problem 1.2). Before reading Chapter 3, attempt to compute the convex hull on one or more of the models of computation presented in this chapter. For simplicity, you may assume that no two points have the same x- or y-coordinate, and that no three points fall on the same straight line. Your algorithms may use one or more of the following properties of the convex hull:
(a) If a point p falls inside the triangle formed by any three of the other n - 1 points, p is not a vertex of the convex hull.
(b) If p_i and p_j are consecutive vertices of the convex hull, and p_i is viewed as the origin of coordinates, then among all the remaining n - 1 points of the set, p_j forms the smallest angle with p_i with respect to the positive (or negative) x-axis.
(c) A segment (p_i, p_j) is an edge of the convex hull if and only if all the n - 2 remaining points fall on the same side of an infinite line through (p_i, p_j).
(d) If all the rays from a point p to every other point in the set are constructed, and the largest angle between each pair of adjacent rays is smaller than π, then p is not on the hull (and conversely).

2.3. Analyze each algorithm designed in Problem 2.2 to obtain its running time t(n) and the number of processors it uses p(n), both of which are functions of the size of the problem n (i.e., the number of points given as input).

2.4. A set of 2n points in the plane consists of n blue points and n red points. It is required to connect every blue point to exactly one red point, and similarly, every red point to exactly one blue point, by straight lines whose total length is the minimum possible.
Derive parallel algorithms for solving this problem on at least two different models of parallel computation, and analyze their running times.

2.5. Two points p and q in a simple polygon P are said to be visible from one another if the line segment with endpoints p and q does not intersect any edge of P. The visibility polygon from a point p contained inside a polygon P is that region of P that is visible from p. Show how this problem can be solved in parallel on a hypercube parallel computer.

2.6. Given a set of circular arcs S on a circle C, it is required to find the minimum number of arcs in S that cover C. Design an efficient parallel algorithm for solving this problem on a two-dimensional array of processors.

2.7. The plus-minus 2^i (PM2I) interconnection network for an N-processor computer is defined as follows: Processor j is connected to processors r and s, where r = j + 2^i mod N and s = j - 2^i mod N, for 0 ≤ i < log N.
(a) Compare the PM2I processor network to the hypercube.
(b) Use the PM2I processor network to solve Problem 1.4.

2.8. Consider the following model of parallel computation. The model consists of n^2 processors arranged in an n x n array (n rows and n columns). The processors are interconnected as follows:
(a) The processors of each column are connected to form a ring (i.e., every processor is connected to its top and bottom neighbors, and the topmost and bottommost processors of the column are also connected).
(b) The processors of each row are connected to form a binary tree (i.e., if the processors
in the row are numbered 1, 2, ..., n, then processor i is connected to processors 2i and 2i + 1 if they exist).
Use this model to solve Problem 1.5.

2.9. Let N processors, numbered 0, 1, ..., N - 1, be available, where N is a power of 2. In the perfect shuffle interconnection network, a one-way line links processor i to processor j, where the binary representation of j is obtained by cyclically shifting that of i one position to the left. Thus for N = 8, processor 0 is connected to itself, processor 1 to processor 2, processor 2 to processor 4, processor 3 to processor 6, processor 4 to processor 1, processor 5 to processor 3, processor 6 to processor 5, and processor 7 to itself. In addition to these shuffle links, two-way links connecting every even-numbered processor to its successor are sometimes added to the network. These connections are called the exchange links. In this case, the network is known as the shuffle-exchange interconnection network. Use the shuffle-exchange interconnection network to solve Problem 1.6.

2.10. An Omega network is a multistage interconnection network with n inputs and n outputs. It consists of k = log n rows numbered 1, 2, ..., k, with n processors per row. The processors in row i are connected to those in row i + 1, for i = 1, 2, ..., k - 1, by a perfect shuffle interconnection.
(a) Discuss the relationship between the Omega network and a k-dimensional hypercube.
(b) Use the Omega network to solve Problem 2.5.

2.11. A satellite picture is represented as an n x n array of pixels, each taking an integer value between 0 and 9, thus providing various gray levels. The position of a pixel is given by its coordinates (i, j), where i and j are row and column numbers, respectively.
It is required to smooth the picture [i.e., the value of pixel (i, j) is to be replaced by the average of its value and those of its eight neighbors (i - 1, j), (i - 1, j - 1), (i, j - 1), (i + 1, j - 1), (i + 1, j), (i + 1, j + 1), (i, j + 1), and (i - 1, j + 1), with appropriate rounding].
(a) Design a special-purpose model of parallel computation to solve this problem. Assume that N, the number of processors available, is less than n^2, the number of pixels.
(b) Give two different implementations of the smoothing process, and analyze their running times.

2.12. As described in Problem 2.11, a picture can be viewed as a two-dimensional array of pixels. A set S of pixels is said to be convex if the convex hull of S does not contain any pixel not belonging to S. Design a parallel algorithm for the two-dimensional pyramid to determine whether a set of pixels is convex.
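The perfect shuffle mapping of Problem 2.9 (a cyclic left shift of the binary index) can be checked directly (an illustrative Python sketch):

```python
def shuffle(i, d):
    """Cyclically shift the d-bit representation of i one position left."""
    return ((i << 1) | (i >> (d - 1))) & ((1 << d) - 1)

# For N = 8 (d = 3), this reproduces the connections listed in Problem 2.9:
# 0->0, 1->2, 2->4, 3->6, 4->1, 5->3, 6->5, 7->7.
assert [shuffle(i, 3) for i in range(8)] == [0, 2, 4, 6, 1, 3, 5, 7]
```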
2.5 References

[Ajta83] M. Ajtai, J. Komlós, and E. Szemerédi, An O(n log n) sorting network, Combinatorica, Vol. 3, 1983, 1-19.
[Akl85b] S. G. Akl, Parallel Sorting Algorithms, Academic Press, Orlando, Florida, 1985.
[Akl89a] S. G. Akl, The Design and Analysis of Parallel Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[Akl89c] S. G. Akl and G. R. Guenther, Broadcasting with selective reduction, Proceedings of the Eleventh IFIP Congress, San Francisco, August 1989, 515-520.
[Blel89] G. E. Blelloch, Scans as primitive parallel operations, IEEE Transactions on Computers, Vol. C-38, No. 11, November 1989, 1526-1538.
[Codd68] E. F. Codd, Cellular Automata, Academic Press, New York, 1968.
[Cole88b] R. Cole, Parallel merge sort, SIAM Journal on Computing, Vol. 17, No. 4, August 1988, 770-785.
[Conr86] M. Conrad, The lure of molecular computing, IEEE Spectrum, Vol. 23, No. 10, October 1986, 55-60.
[Corm90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, McGraw-Hill, New York, 1990.
[Feit88] D. G. Feitelson, Optical Computing, MIT Press, Cambridge, Massachusetts, 1988.
[Fost80] M. J. Foster and H. T. Kung, The design of special-purpose VLSI chips, Computer, Vol. 13, No. 1, January 1980, 26-40.
[Hole90] J. A. Holey and O. H. Ibarra, Iterative algorithms for planar convex hull on mesh-connected arrays, Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 102-109.
[Krus85] C. P. Kruskal, L. Rudolph, and M. Snir, The power of parallel prefix, Proceedings of the 1985 International Conference on Parallel Processing, St. Charles, Illinois, August 1985, 180-185.
[Leig85] F. T. Leighton, Tight bounds on the complexity of parallel sorting, IEEE Transactions on Computers, Vol. C-34, No. 4, April 1985, 344-354.
[Lipp87] R. P. Lippmann, An introduction to computing with neural nets, IEEE ASSP Magazine, April 1987, 4-22.
[Mill88] R. Miller and Q. F. Stout, Efficient parallel convex hull algorithms, IEEE Transactions on Computers, Vol. C-37, No. 12, December 1988, 1605-1618.
[Mins69] M. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge, Massachusetts, 1969.
[Rose62] F. Rosenblatt, Principles of Neurodynamics, Spartan Books, New York, 1962.
[Ullm84] J. D. Ullman, Computational Aspects of VLSI, Computer Science Press, Rockville, Maryland, 1984.
3 Convex Hull
A set P = {p_0, p_1, ..., p_(n-1)} of points in the plane is given, where each point is represented by its Cartesian coordinates [i.e., p_i = (x_i, y_i)]. It is required to find the convex hull CH(P) of P (i.e., the smallest convex polygon that includes all the points of P). The vertices of CH(P) are points of P, such that every point of P is either a vertex of CH(P) or lies inside CH(P). Figure 3.1 shows a set P of points and the convex hull of P.
Without doubt the most popular problem among designers of sequential computational geometry algorithms, constructing convex hulls has enjoyed similar attention in parallel computing. In fact, it appears to be the first problem in computational geometry for which parallel algorithms were designed [Nath80, Chow81, Akl82]. To simplify our subsequent discussion, we make two assumptions:

1. No two points have the same x- or y-coordinate.
2. No three points fall on the same straight line.

These two assumptions can easily be lifted without affecting the behavior of the algorithms we present. Our statement of the convex hull problem requires a polygon to be computed. Any algorithm for determining CH(P) must then produce its vertices in the (clockwise or counterclockwise) order in which they appear on the convex hull. Consequently, any such algorithm can be used to sort n numbers [Akl89a]. Therefore, the running time of any algorithm for computing CH(P) on some model of computation is bounded below by the time required to sort on that model. For example, sorting n numbers on an O(n)-processor linear array requires Ω(n) time [Akl85b], and hence the same bound applies to computing the convex hull. In fact, as we will see below, many algorithms for computing the convex hull use sorting explicitly as a step in their computations. It should be noted that the problem of determining the vertices of the convex hull in any order is no easier asymptotically than that of producing them as a polygon. Indeed,
Figure 3.1 (a) Set P of points and (b) convex hull CH(P) of P.
it is known that the former problem requires Ω(n log n) algebraic operations [Yao81], and this coincides with the lower bound on sorting [Ben-O83].
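Several of the hull properties used in this chapter (in particular, deciding whether all remaining points fall on the same side of a line through two points) reduce to a constant-time orientation test per point; a minimal Python sketch (function names are ours, and general position is assumed, as in the text):

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): positive if r lies to
    the left of the directed line p -> q, negative if to the right,
    and zero if the three points are collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def is_hull_edge(P, i, j):
    """(P[i], P[j]) is a hull edge iff every other point lies strictly
    on one side of the line through them (general position assumed)."""
    sides = [orientation(P[i], P[j], P[k])
             for k in range(len(P)) if k not in (i, j)]
    return all(s > 0 for s in sides) or all(s < 0 for s in sides)

P = [(0, 0), (4, 0), (2, 3), (2, 1)]       # (2, 1) lies inside the triangle
assert is_hull_edge(P, 0, 1)               # bottom edge of the hull
assert not is_hull_edge(P, 0, 3)           # segment to an interior point
```

On a CRCW PRAM, the `all(...)` reduction over n - 2 sign tests is exactly the kind of step a single concurrent write with the AND rule resolves in constant time.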
3.1 Shared-Memory Model Algorithms

One of the first parallel algorithms for the convex hull problem appears in [Akl82]; it runs in constant time on the CRCW PRAM (with the AND rule for resolving write conflicts) and requires O(n^3) processors. An improved algorithm for the same model that also runs in constant time but with O(n^2) processors is described in [Akl89b]. The algorithm makes use of the following two properties of the convex hull:

Property 1. Let p_i and p_j be consecutive vertices of CH(P), and assume that p_i is taken as the origin of coordinates. Then among all points of P, p_j forms the smallest angle with p_i with respect to the positive (or negative) x-axis.
Property 2. A segment (p_i, p_j) is an edge of the convex hull if and only if all the n - 2 remaining points fall on the same side of an infinite line through (p_i, p_j).

For simplicity of notation, and when the distinction is clear from context, we henceforth use (p_i, p_j) to represent both the straight-line segment with endpoints p_i and p_j and the infinite straight line through points p_i and p_j, without qualification. Assume that O(n^2) processors are available on a CRCW PRAM. By assigning O(n) processors to each point p_i, it is possible to determine in constant time (using the SMALLEST write-conflict resolution rule) the points p_k and p_m such that the segments (p_i, p_k) and (p_i, p_m) form the smallest angles with respect to the positive x-axis and the negative x-axis, respectively. Now, by assigning O(n) processors to each segment (p_i, p_j) found in the preceding step, it is possible to determine in constant time (using the AND write-conflict resolution rule) whether all input points fall on the same side of an infinite straight line through (p_i, p_j), in which case p_i is declared to be a point of CH(P). Finally, by sorting them according to their polar angles (using a constant-time CRCW PRAM sorting algorithm [Akl89a]), the points identified as vertices of CH(P)
can be listed in the order in which they appear on the boundary of the convex hull (e.g., clockwise order). Thus the entire algorithm requires constant time. It is interesting to point out that this algorithm is essentially a parallelization of the algorithm due to Jarvis [Jarv73], long believed to be inherently sequential because of the incremental (point-by-point) way in which it constructs the convex hull. Note further that no algorithm is known for constructing the convex hull in constant time in the worst case while using asymptotically fewer than n^2 processors. By contrast, the CRCW PRAM
algorithm described in [Stou88], which requires O(n) processors and the COLLISION rule for resolving write conflicts, assumes that the data are chosen from a uniform distribution and runs in constant expected time. A different approach is taken in [Atal86a] and, independently, in [Agga88] for the weaker CREW PRAM with O(n) processors. It is based on the idea of multiway divide-and-conquer, which consists of dividing the problem into a number of subproblems whose solutions are obtained recursively in parallel, and then merging these solutions. The algorithm proceeds in three steps. In the first step, the n points are sorted by their x-coordinates. The set P is then partitioned into n^(1/2) sets P_1, P_2, ..., P_(n^(1/2)), divided by vertical lines such that P_i is to the left of P_j if i < j. This step is implemented in O(log n) time using the sorting algorithm of [Cole88b]. In the second step, the convex hull problem is solved recursively in parallel for all P_i to obtain CH(P_i). Finally, the union of the convex polygons CH(P_1), CH(P_2), ..., CH(P_(n^(1/2))) is computed, yielding CH(P). The merge step is implemented as follows. Let u and v be the points of P with smallest and largest x-coordinates, respectively. The convex polygon CH(P) consists of two parts: the upper hull (from u to v) and the lower hull (from v to u). We describe the merge step for the upper hull (the computation for the lower hull is symmetric). Each CH(P_i) is assigned O(n^(1/2)) processors. These processors find the n^(1/2) - 1 upper common tangents between CH(P_i) and the remaining n^(1/2) - 1 other convex polygons. Each tangent between two polygons is obtained by applying the sequential algorithm of [Over81], which runs in O(log n) time. Since O(n^(1/2)) processors are computing tangents to the same polygon simultaneously, concurrent reads are needed during this step. Among all tangents to polygons to the left of CH(P_i), let V_i be the one with the smallest slope, tangent to CH(P_i) at the point v_i.
Similarly, among all tangents to polygons to the right of CH(Pi), let Wi be the one with the largest slope, tangent to CH(Pi) at the point wi. As shown in Figure 3.2, if the angle formed by Vi and Wi is less than 180°, none of the points of CH(Pi) is on the upper hull; otherwise, all the points from vi to wi are on the upper hull. These computations are done simultaneously for all CH(Pi), each yielding a (possibly empty) list of points on the upper hull. A parallel prefix computation, as defined in Section 2.3.2, is then used to compress these lists into one. In this case the addition operation (+) is used as the binary associative operation, and the computation is referred to as computing prefix sums. Consider a shared-memory array of points z(1), z(2), ..., z(n), where for each point z(i) it is known whether or not z(i) is an upper hull point. To compact all upper hull points into adjacent positions of the array, we assign a label s(i) to z(i), such that s(i) = 1 if z(i) is an upper hull point; otherwise, s(i) = 0. The position of upper hull point z(k)
Convex Hull    30    Chap. 3

Figure 3.2  (a) The angle formed by Vi and Wi is less than 180°. (b) The angle formed by Vj and Wj is greater than 180°.
in the compacted array is then obtained from s(1) + s(2) + ... + s(k). This quantity is known as a prefix sum, and all n prefix sums can be computed in O(log n) parallel time using O(n) processors. Initially, processor i knows s(i), and when the following iterative computation terminates, s(i) has been replaced by s(1) + s(2) + ... + s(i): During iteration j, where 0 ≤ j ≤ log n - 1, and for all i, 2^j + 1 ≤ i ≤ n, processor i replaces s(i) with s(i - 2^j) + s(i). Thus the merge step requires O(n) processors and O(log n) time. The overall running time of the algorithm is given by t(n) = t(n^{1/2}) + b log n, for some constant b, which is O(log n). Since O(n) processors are used, the cost of the algorithm (i.e., the total number of operations performed) is O(n log n). This cost is optimal in view of the Ω(n log n) lower bound on the number of operations required to solve the convex hull problem [Prep85]. The same running time is obtained in [Cole88a], using the cascading merge technique described in detail in [Atal89c]. We outline the technique briefly here. Given an unsorted list of elements stored at the leaves of a tree T (where some leaves may be empty), the list U(v) at each node v of T is computed, which is the sorted list of all elements stored at descendant leaves of v. The algorithm proceeds in stages, computing a list U_s(v) at stage s for each node v ∈ T. At each stage s, a sample of the elements in U_{s-1}(x) and U_{s-1}(y) is merged to form the list U_s(v), where x and y are the left and right children of v, respectively. An internal node v is active at stage s if ⌊s/3⌋ ≤ alt(v) ≤ s, where alt(v), the altitude of a node v, is the height of the tree T minus the depth of v (the depth of the root is 0). Node v is full at stage s if ⌊s/3⌋ = alt(v). For each stage s up to and including the time a node v becomes full, a sample of every fourth element of U_{s-1}(x) is passed to v and merged with a sample of every fourth element of U_{s-1}(y) to create the list U_s(v). In stage s + 1, a sample of every second element of U_s(x) and a sample of every second element of U_s(y) are merged, and in stage s + 2, all the elements of U_{s+1}(x) and U_{s+1}(y) are merged. Therefore, there are 3 × height(T) stages in total. A sorted list L is a c-cover of a sorted list J if between every two elements of the list (-∞, L, +∞) there are at most c elements of J. It is shown in [Atal89c] how the two sample lists can be merged to create U_s(v) in constant time using O(|U_s(v)|) processors if the list of items passed to v from its child node x (y) in stage s is a c-cover of the list of items passed to v from x (y) in stage s + 1. In [Cole88a], n points are sorted by x-coordinate and the upper and lower hulls are computed using the cascading divide-and-conquer technique. Consider computing the upper hull. Initially, the points are paired and each pair represents an edge on the upper hull of that pair of points. The upper hulls are stored at the leaves of a tree.
Given edges on two upper hulls sorted by slope, the edges are merged using cascading merge, and the common tangent t is found. Those edges not on the union of the upper hulls are deleted from the sorted list, and t is added. Adding and deleting these edges does not change the fact that the list being passed up the tree at stage s is a c-cover of the list being passed up the tree at stage s + 1. The convex hull is thus computed in O(log n) time using O(n) processors on a CREW PRAM.
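The prefix-sum compaction used in the merge step above can be illustrated with a short sequential simulation (a sketch only: each iteration of the outer loop stands for one parallel step, and the function names are ours, not from the literature):

```python
import math

def prefix_compact(points, is_hull_point):
    """Compact the points labeled as upper hull points into adjacent
    positions, using the iterative doubling scheme described above:
    during iteration j, "processor" i replaces s(i) by s(i - 2^j) + s(i)."""
    n = len(points)
    s = [1 if flag else 0 for flag in is_hull_point]
    for j in range(math.ceil(math.log2(n)) if n > 1 else 0):
        step = 2 ** j
        old = s[:]                      # simulate simultaneous reads
        for i in range(step, n):
            s[i] = old[i - step] + old[i]
    # s[i] now holds the prefix sum s(1) + ... + s(i+1);
    # hull point i therefore goes to slot s[i] - 1.
    compacted = [None] * s[-1]
    for i in range(n):
        if is_hull_point[i]:
            compacted[s[i] - 1] = points[i]
    return compacted
```

For example, prefix_compact(['a', 'b', 'c', 'd', 'e'], [1, 0, 1, 1, 0]) yields ['a', 'c', 'd'].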
Two other algorithms for the CREW PRAM are described in the pioneering work of Chow [Chow81]: The first uses O(n) processors and runs in O(log^2 n) time, while the second uses O(n^{1+1/K}) processors and runs in O(K log n) time, with 1 ≤ K ≤ log n. Many efforts were directed toward obtaining efficient convex hull algorithms on the least powerful variant of the shared-memory model, namely the EREW PRAM. For example, an algorithm appearing in [Nath80] runs in O(K log n) time with O(n^{1+1/K}) processors, 1 ≤ K ≤ log n, thus duplicating the performance of the CREW PRAM algorithm in [Chow81]. Also described in [Nath80] is an O(N)-processor algorithm, 1 ≤ N ≤ n, which runs in O((n/N) log n + log n log N) time. An algorithm in [Akl84] uses O(n^{1-ε}) processors, 0 < ε < 1, and runs in O(n^ε log h) time, where h is the number of edges on the convex hull. It is shown in [Mill88] how the CREW PRAM multiway divide-and-conquer algorithm of [Atal86a] and [Agga88] can be modified to achieve the same performance on the weaker EREW PRAM. A judicious distribution of work among processors and
broadcasting the data needed to compute the tangents avoids the need for concurrent reads. The algorithm has essentially the same structure as the CREW algorithm (i.e., subdivision, recursive solution, and merging), with two differences:

1. In the first step, P is subdivided into n^{1/4} subsets, P1, P2, ..., P_{n^{1/4}}, each with n^{3/4} points.

2. In the third step, a different approach is used to compute the tangents. Let (pi, pj) be the upper tangent between CH(Pi) and CH(Pj), with pi in Pi and pj in Pj. The slope of this tangent lies between the slopes of (p_{i-1}, pi) and (pi, p_{i+1}), and also between the slopes of (p_{j-1}, pj) and (pj, p_{j+1}). This property is used to obtain the upper (lower) tangents between all pairs of convex polygons computed recursively during the second step of the algorithm. We now describe how this is done for the upper tangents.

Step 1. Let the upper hull of Pi contain ni points. Of these, n^{1/4} are marked, breaking the upper hull into convex chains of equal length. For each three consecutive marked points p_{k-1}, p_k, p_{k+1}, the processor containing p_k, say processor j, creates two slope records: [slope of the straight line through (p_k, p_{k+1}), p_k, p_{k+1}, j], and [slope of the straight line through (p_{k-1}, p_k), p_{k-1}, p_k, j].

Step 2. Every CH(Pi) sends its O(n^{1/4}) slope records to every other CH(Pj).
Step 3. Every CH(Pj) creates two slope records for each of its points [i.e., O(n^{3/4}) records in all]. These records are merged with the O(n^{1/2}) records received from the other CH(Pi).

Step 4. Through a parallel prefix (postfix) operation, each processor, upon receiving a record with slope s, can determine the largest (smallest) slope of a CH(Pj) edge that is smaller (larger) than s. Thus each processor in CH(Pj) that contains a received record representing a point p of CH(Pi) can determine (in constant time) the endpoints of the upper tangent to CH(Pj) passing through p, and whether p is on, to the left, or to the right of the upper common tangent between CH(Pi) and CH(Pj).

Step 5. Each CH(Pj) returns to CH(Pi) the records it received from the latter, appended with the information gathered in Step 4. This allows CH(Pi) to determine, for each CH(Pj), the points (between consecutive marked points) that contain an endpoint of an upper common tangent between CH(Pi) and CH(Pj).

Step 6. Thus far, the search within CH(Pi) for the endpoint of the upper tangent to CH(Pj) has been narrowed down to O(n^{1/2}) points. Steps 1 to 5 are now repeated two more times: In the first repetition, O(n^{1/4}) equally spaced points, among the O(n^{1/2}), are sent from CH(Pi) to CH(Pj), thus reducing the number of candidates further to O(n^{1/4});
finally, all O(n^{1/4}) leftover points are sent to CH(Pj), leading to the determination of the upper tangent endpoint. Computing the tangents therefore requires O(log n) time. Since all other steps are identical to the ones in the CREW PRAM algorithm, this algorithm runs in time t(n) = t(n^{3/4}) + c log n, for some constant c, which is O(log n).
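Recurrences of this form telescope into a geometric series; for instance, the EREW recurrence above unrolls as

```latex
t(n) = c\log n + c\log n^{3/4} + c\log n^{(3/4)^2} + \cdots
     = c\log n \left(1 + \tfrac{3}{4} + \left(\tfrac{3}{4}\right)^2 + \cdots\right)
     \le 4c\log n = O(\log n),
```

and the CREW recurrence t(n) = t(n^{1/2}) + b log n telescopes the same way with ratio 1/2, giving t(n) ≤ 2b log n = O(log n).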
3.2 Network Model Algorithms
Several parallel algorithms for the convex hull problem have been developed for network models. Among the first are two results by Chow presented in [Chow81] for the cube-connected cycles (CCC) network model. The first algorithm runs in O(log^2 n) time on a CCC with O(n) processors. It uses the divide-and-conquer technique, which splits the set P into two sets P1 and P2, recursively solves the problem on P1 and P2, and then merges the two hulls to find CH(P). The second algorithm in [Chow81] is also a divide-and-conquer algorithm; however, P is split into n^{1-1/K} subsets of n^{1/K} points each, where 1 ≤ K ≤ log n. The problem is solved on each of the n^{1-1/K} subsets simultaneously and the hulls are merged into one. This algorithm runs in O(K log n) time on a CCC with O(n^{1+1/K}) processors.
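The overall divide-and-conquer structure can be sketched sequentially as follows (a sketch with names of our own choosing: the merge here simply rebuilds a hull from the two sub-hulls' vertices by monotone chain, whereas [Chow81] merges via common tangents, and the two recursive calls would run in parallel on the CCC):

```python
def monotone_chain(points):
    """Convex hull of a point set (counterclockwise), used as the merge primitive."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def chain(seq):
        h = []
        for p in seq:
            # Pop while the last two kept points and p fail to turn counterclockwise.
            while len(h) >= 2 and ((h[-1][0] - h[-2][0]) * (p[1] - h[-2][1]) -
                                   (h[-1][1] - h[-2][1]) * (p[0] - h[-2][0])) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = chain(pts), chain(pts[::-1])
    return lower[:-1] + upper[:-1]

def hull_dc(points):
    """Split the x-sorted input in half, recurse on each half, merge the hulls."""
    pts = sorted(points)
    if len(pts) <= 3:
        return monotone_chain(pts)
    mid = len(pts) // 2
    left, right = hull_dc(pts[:mid]), hull_dc(pts[mid:])
    return monotone_chain(left + right)   # merge the two sub-hulls
```

Only hull vertices survive each recursive call, so the merge touches at most the two sub-hulls' vertices rather than all the points.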
Little work on convex hull solutions for the CCC model has since been reported; however, a number of results exist for the hypercube model of computation. Stojmenović presents two parallel convex hull algorithms for a hypercube with O(n) processors that both run in O(log^2 n) time [Stoj88a]. In both algorithms the input data are distributed one point per processor, and the points are sorted by x-coordinate in O(log^2 n) time [Akl85b]. The first algorithm is an adaptation for the hypercube of the CREW PRAM multiway divide-and-conquer algorithm of [Atal86a] and [Agga88]. The second algorithm is similar in spirit to the first CCC algorithm of [Chow81]: The set P is divided into two disjoint sets P1 and P2, each with approximately n/2 points and stored in a hypercube of O(n/2) processors; CH(P1) and CH(P2) are computed recursively, and CH(P) is formed by constructing the two common tangents between CH(P1) and CH(P2). We describe in some detail how this last step is performed. For each edge e of CH(P1) [similarly, for each edge of CH(P2)] it is possible to decide if e is an edge of CH(P) by applying Property 2; namely, edge e is in CH(P) if CH(P1) and CH(P2) are in the same half-plane defined by the infinite line through e. We describe how this test is done for edges in CH(P1). For edges in CH(P2), the test is symmetric. Rather than testing all the vertices of CH(P2) against an edge e of CH(P1), it suffices to test two points: the nearest and farthest points of CH(P2) to e. Assuming that these two points are known to the processor that stores e, that processor can determine if e belongs to CH(P) in constant time. Consequently, every processor containing a point of CH(P1) or CH(P2) can determine if that point is a vertex of CH(P). Among the vertices of CH(P) thus determined, exactly two from CH(P1) have two adjacent edges such that one edge is on CH(P) and the other is not.
Similarly, exactly two vertices from CH(P2) have the property of being adjacent to two edges, one on CH(P) and the other
Figure 3.3  Mesh of size 16 in proximity order:

     0    1   14   15
     3    2   13   12
     4    7    8   11
     5    6    9   10
not on CH(P). These four points define the upper and lower tangents of CH(P1) and CH(P2). It remains to be shown how, for each edge e of CH(P1), the points pi and pj of CH(P2) that are nearest to and farthest from e, respectively, can be found. The following property is used: pi belongs to an edge ei of CH(P2) such that |slope(e) - slope(ei)| is minimized; similarly, pj belongs to an edge ej of CH(P2) such that |slope(e) + π - slope(ej)| is minimized. Therefore, by merging the slopes of edges
in CH(P1) and CH(P2), the nearest and farthest points in CH(P2) to each edge in CH(P1) can be found. Since merging two lists of size n/2 each on an O(n)-processor hypercube can be done in O(log n) time, and since all other operations described require constant time, the merge step runs in O(log n) time. The running time of the algorithm is t(n) = 2t(n/2) + b log n for some constant b, which is O(log^2 n). Another algorithm for computing the convex hull of n planar points on an O(n)-processor hypercube is given by Miller and Stout in [Mill88]. There are two main steps in the algorithm: The first sorts the set of points in O(log^2 n) time; the second is a divide-and-conquer computation requiring O(log n) time. Therefore, the time to sort n points on a hypercube of size n dominates the performance of this convex hull algorithm. Recently, a faster algorithm to sort on the hypercube has been developed which runs in O(log n log log n) time. This sorting algorithm is presented in [Leig91] and is based on ideas first proposed in [Cyph90]. Therefore, the running time of the convex hull algorithm by Miller and Stout [Mill88] can be reduced to O(log n log log n). Miller and Stout also present an algorithm for computing the convex hull of a set of n points in the plane that runs in O(n^{1/2}) time on a mesh of size n [Mill89b]. The indexing used for the mesh is proximity ordering, which combines the advantages of snakelike ordering and shuffled row-major ordering: Adjacent processors have consecutive processor numbers and processors are organized by quadrant, which is useful for algorithms that employ the divide-and-conquer strategy. Figure 3.3 shows a mesh of size 16 with processors ordered in proximity order. To simplify the description of the algorithm, it is assumed that there is no more than one point per processor. The points are sorted by x-coordinate and divided into four subsets, P1, P2, P3,
Figure 3.4  CH(P1), CH(P2), and the lines (a, p) and (p, b). All points in CH(P2) lie below (a, p); some points in CH(P2) lie above (p, b).
and P4, by three vertical separating lines. Each subset is mapped to a consecutive quadrant on the mesh. Let the quadrant Ai contain the set Pi of points, i = 1, 2, 3, 4. The convex hull is found recursively for the points in each quadrant, and the resulting four convex hulls are merged in three steps. CH(P1) and CH(P2) are merged to form CH(P1 ∪ P2) = CH(B1), and CH(P3) and CH(P4) are merged to form CH(P3 ∪ P4) = CH(B2). Finally, CH(B1) and CH(B2) are merged to form CH(B1 ∪ B2) = CH(P). We now describe how merging two convex hulls is performed in O(n^{1/2}) time on a mesh of size n. We discuss merging CH(P1) and CH(P2) into CH(B1). The other two merges are similar. Merging CH(P1) and CH(P2) into CH(B1) requires finding the points p, t ∈ CH(P1) and q, s ∈ CH(P2) such that (p, q) is the upper common tangent to CH(P1) and CH(P2) and (s, t) is the lower common
tangent. We describe how to find the point p ∈ CH(P1). A similar approach is used to find q, s, and t. Note that the two hulls do not intersect, because they are separated by a vertical separating line. The coordinates of the point of CH(P1) with the smallest x-coordinate, x_min, and the point with the largest x-coordinate, x_max, together with their positions in counterclockwise order around CH(P1), are reported to all processors of quadrant A1 by two semigroup operations. A semigroup computation applies an operation such as minimum to all data items in a given quadrant in O(r^{1/2}) time, where r is the maximum number of processors in a quadrant, and broadcasts the resulting value to all processors in the quadrant. Note that p must lie on or above the line (x_min, x_max). Let a and b be the points immediately succeeding and preceding p, respectively, in the counterclockwise ordering of hull points. All the points in CH(P2) must be below the line (a, p) and some of the points in CH(P2) must be above the line (p, b). See Figure 3.4. Initially, p is chosen to be the point pi on CH(P1) halfway between points x_min
and x_max; pi is identified and reported to all processors in quadrant A1 by using a semigroup operation. Let ai and bi be the points immediately succeeding and preceding pi, respectively. The two processors that contain ai and bi compute (ai, pi) and (pi, bi), respectively, and pass these values to all processors in quadrant A2 [the quadrant storing the points of CH(P2)] by performing a concurrent read operation. Concurrent read, sometimes called random access read, allows any number of processors to read a value stored at another processor. Concurrent read takes O(n^{1/2}) time on a mesh of size n. The processors in A2 store a 1 (0) in a variable if they are below (above) (ai, pi) and store a 0 (1) in a different variable if they are above (below) (pi, bi). They then write both of these variables to the processor in A1 that contains pi using a concurrent write operation, where conflicts are resolved by writing the minimum value in the variables. Concurrent write, sometimes called random access write, allows any number of processors to write to a location in a different processor. Concurrent write is executed in O(n^{1/2}) time on a mesh of size n. The processor that stores pi determines:

1. If all points in CH(P2) are below (ai, pi) (1 written by processors in A2), or

2. If one or more points in CH(P2) are above (pi, bi) (0 written by processors in A2).
If both conditions are satisfied, the point p has been found. If the first condition is not satisfied, x_max is assigned to ai and pi is recomputed to be halfway between x_min and x_max. If the second condition is not satisfied, x_min is assigned to bi and pi is recomputed as above. The data are compressed to minimize communication cost and the merge algorithm is iterated. In data compression, m pieces of data distributed randomly on a mesh of size r, such that r > m, are moved to a submesh of size m in O(r^{1/2}) time. In this binary search manner, O(log n) iterations of the algorithm are executed to find each of the four tangent points. Steps in the first iteration operate on approximately n/2 pieces of data, and because of the data compression operation, at the ith iteration approximately n/2^i pieces of data are involved. Therefore, the number of steps over the O(log n) iterations is

    sum_{i=0}^{O(log n)} (n/2^i)^{1/2}  =  O(n^{1/2}).
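The two-condition test applied to each candidate pi can be sketched with a simple above-the-line predicate (a sequential sketch with hypothetical names; the semigroup, concurrent-read, and concurrent-write steps are abstracted away, and the lines are assumed non-vertical):

```python
def above(p1, p2, q):
    """True if q lies strictly above the (non-vertical) line through p1 and p2."""
    slope = (p2[1] - p1[1]) / (p2[0] - p1[0])
    return q[1] > p1[1] + slope * (q[0] - p1[0])

def is_upper_tangent_vertex(a, p, b, hull2):
    """The test from the text: p, with counterclockwise neighbours a
    (succeeding) and b (preceding) on CH(P1), is the upper-tangent endpoint
    iff every point of CH(P2) lies below line (a, p) and at least one point
    of CH(P2) lies above line (p, b)."""
    cond1 = all(not above(a, p, q) for q in hull2)   # all of CH(P2) below (a, p)
    cond2 = any(above(p, b, q) for q in hull2)       # some of CH(P2) above (p, b)
    return cond1 and cond2
```

For example, with CH(P1) the triangle (0,0), (4,0), (2,3) and CH(P2) the triangle (6,0), (7,1), (8,0) to its right, only the vertex (2,3) passes the test, so the upper tangent leaves CH(P1) there.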
To compute the points in CH(P1 ∪ P2) = CH(B1), all processors concurrently read the number of points in CH(P1), the number of points in CH(P2), and the counterclockwise positions of p, t ∈ CH(P1) and q, s ∈ CH(P2). Each processor computes the position in CH(B1) of its hull point (if it contains one). This final step in the merging takes O(n^{1/2}) time. The total time to merge is O(n^{1/2}), as shown above, and merging CH(P1) with CH(P2) can be done in parallel with merging CH(P3) and CH(P4). The following recurrence relation gives the total time of the convex hull algorithm on a mesh of size n: t(n) = t(n/4) + c n^{1/2}, for some constant c; therefore, the algorithm runs in O(n^{1/2}) time. In [Mill84a], Miller and Stout present a similar algorithm for a mesh with a
snakelike order indexing scheme. The running time is the same as for the proximity order algorithm. Both of these algorithms for the mesh are time optimal, since the lower bound for sorting on a mesh of size n is Ω(n^{1/2}) [Akl85b]. In [Wang90a], parallel algorithms are sketched for sorting n elements in constant time and computing the convex hull of n points using a three-dimensional n × n × n mesh with reconfigurable buses. The sorting algorithm, which appeared originally in [Wang90b], achieves its constant running time by exploiting the constant-time reconfigurability of the buses and the fact that transmission of a signal along a bus through O(n) processors takes constant time. In [Wang90a], a straightforward planar embedding of the n × n × n array is proposed. While the changes in the interconnection from that described in [Wang90b] are cosmetic (e.g., replacing a diagonal with a right angle), the number of processors remains the same, and the sorting algorithm is essentially unchanged. The convex hull algorithm described in [Wang90a] is also an immediate consequence of the result in [Wang90b]. Algorithms for computing the convex hull on a linear array, a d-dimensional mesh, and a hypercube are presented in [Hole90]. Three types of linear array are considered. The first allows input at one end and output at the other such that data travel in one direction only, the second allows input at all processors but data movement in one direction only, and the third allows input and output at all processors and data movement in either direction. The convex hull algorithm runs on all three types of linear array in O(n) time with O(n) processors, on a d-dimensional mesh in O(d^2 n^{1/d}) time with O(n) processors, and on a hypercube in O(log^2 n) time with O(n) processors. Dynamic convex hull algorithms are also given, in which deletions and insertions of points from and to the set are handled.
Finally, the following results regarding convex hull computation on processor networks deserve mention: O(n) time on an O(n)-processor linear array [Chaz84, Chen87]; O(log n) time on an O(n^2)-processor mesh-of-trees [Akl89b]; and O(log n) time on an O(n)-processor modified AKS network [Mill88]. In [Reif90] a randomized algorithm for determining the convex hull of a set of points in the plane on an O(n)-processor butterfly is given that runs in O(log n) probabilistic time. In Chapter 11 we describe an algorithm for computing the convex hull on the star and pancake networks.
3.3 Other Models

A number of models of computation that are less well known than the PRAM or processor network models have also been used to develop parallel convex hull algorithms. Three of these are particularly noteworthy. In [Blel88], two algorithms are given for the scan model. The first of these is based on the sequential algorithm of [Eddy77], dubbed Quickhull in [Prep85]. When the input points obey certain probability distributions, the algorithm runs on the scan model with O(n) processors in O(log h) expected time, where h is the number of points on the convex hull. In the worst case, however, the algorithm runs in O(n) time. The second algorithm in [Blel88] is an adaptation of
the multiway divide-and-conquer algorithm of [Atal86a] and [Agga88]: It also requires O(n) processors, but runs in O(log n) time in the worst case. An algorithm for the BSR model is described in [Akl89c] which uses the following property of convex hull points:

Property 3. Consider a point p. Construct all the rays from p to every other point in the set. These rays form a star centered at p. Measure the angles between each pair of adjacent rays. If the largest such angle is smaller than π, then p is not on the hull (and conversely).

The algorithm in [Akl89c] requires O(n^2) processors and runs in constant time. The details of this algorithm are provided in Chapter 11.
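Property 3 translates directly into a hull-membership test (a sequential sketch; on the BSR model the O(n) angle comparisons for each of the n points are what the O(n^2) processors carry out in constant time):

```python
import math

def on_hull_by_property3(p, points):
    """Property 3: sort the rays from p to every other point by angle and
    inspect the gaps between adjacent rays (including the wrap-around gap).
    p is a hull point iff some gap is at least pi, i.e. all the other
    points fit into a closed half-plane through p."""
    angles = sorted(math.atan2(q[1] - p[1], q[0] - p[0])
                    for q in points if q != p)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    return max(gaps) >= math.pi
```

For a unit square with its center added, each corner passes the test and the center fails it.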
Summary

The table in Figure 3.5 summarizes the results in the previous sections. Note that h is the number of hull edges, 0 < ε < 1, and 1 ≤ K ≤ log n.
3.4 When the Input is Sorted

Assume that the n points for which the convex hull is to be computed are already sorted (say, by their x-coordinates). This situation may be used advantageously by any algorithm that explicitly sorts its input and whose processor and time requirements are dominated by those for sorting. For example, O(log^3 n/(log log n)^2) time algorithms are described in [Mill88] for computing the convex hull of n sorted points on a tree, a pyramid, and a mesh-of-trees, each with O(n) processors. Also given in [Mill88], as stated in Section 3.2, is an O(n)-processor hypercube algorithm that runs in O(log n) time if the input points are sorted and is identical to the one described in detail in Section 3.1 for the EREW PRAM. A CREW PRAM algorithm is presented in [Good87b] which computes the convex hull for sorted inputs in O(log n) time using O(n/log n) processors. An algorithm in [Fjal90] computes the convex hull of a set of sorted points in O(log n/log log n) time using O(n log log n/log n) processors on a COMMON CRCW PRAM.
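The advantage of presorted input is easy to see even sequentially: once the points are sorted by x-coordinate, the upper hull follows from a single linear-time scan (a textbook monotone-chain sketch, not any of the parallel algorithms cited above):

```python
def upper_hull(points_sorted_by_x):
    """Scan left to right, keeping only points that make clockwise turns;
    the retained chain is the upper hull."""
    def cross(o, u, v):
        # Positive if o -> u -> v turns counterclockwise.
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    hull = []
    for p in points_sorted_by_x:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

Each point is pushed once and popped at most once, so the scan is O(n); sorting is the only superlinear step, which is exactly what presorted input removes.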
3.5 Related Problems

3.5.1 Three-Dimensional Convex Hulls

There has been some work in developing parallel algorithms for the three-dimensional convex hull problem. Some of the earliest algorithms for computing the convex hull of a set of points in three dimensions are presented in [Chow80]. One of these runs on a CREW PRAM in O(log^3 n) time using O(n) processors and an O(log n) time parallel sorting algorithm. [Note that the running time in [Chow80] is actually given as O(log^3 n log log n) because, at the time, O(log n log log n) was the running time of the
Reference                        Model                                Processors         Running time
[Chow81]                         CREW PRAM                            O(n)               O(log^2 n)
[Chow81]                         CREW PRAM                            O(n^{1+1/K})       O(K log n)
[Nath80]                         CREW PRAM                            O(N), 1 ≤ N ≤ n    O((n/N) log n + log n log N)
[Nath80]                         EREW PRAM                            O(N), 1 ≤ N ≤ n    O((n/N) log n + log n log N)
[Nath80]                         CREW PRAM                            O(n^{1+1/K})       O(K log n)
[Nath80]                         EREW PRAM                            O(n^{1+1/K})       O(K log n)
[Akl82]                          CRCW PRAM (AND)                      O(n^3)             O(1)
[Akl84]                          EREW PRAM                            O(n^{1-ε})         O(n^ε log h)
[Atal86a], [Agga88], [Cole88a]   CREW PRAM                            O(n)               O(log n)
[Stou88]                         CRCW PRAM (COLLISION)                O(n)               O(1) expected
[Mill88]                         EREW PRAM                            O(n)               O(log n)
[Akl89b]                         CRCW PRAM (AND, SMALLEST)            O(n^2)             O(1)
[Chow81]                         CCC                                  O(n)               O(log^2 n)
[Chow81]                         CCC                                  O(n^{1+1/K})       O(K log n)
[Chaz84], [Chen87], [Hole90]     Linear array                         O(n)               O(n)
[Mill84a], [Mill89b]             Mesh                                 O(n)               O(n^{1/2})
[Blel88]                         Scan                                 O(n)               O(log h) expected
[Blel88]                         Scan                                 O(n)               O(log n)
[Stoj88a], [Hole90]              Hypercube                            O(n)               O(log^2 n)
[Mill88] with [Leig91]           Hypercube                            O(n)               O(log n log log n)
[Mill88]                         Modified AKS network                 O(n)               O(log n)
[Akl89a]                         Mesh-of-trees                        O(n^2)             O(log n)
[Akl89c]                         BSR                                  O(n^2)             O(1)
[Wang90a]                        3-D mesh with reconfigurable buses   O(n^3)             O(1)
[Reif90]                         Butterfly                            O(n)               O(log n) probabilistic
[Hole90]                         d-Dimensional mesh                   O(n)               O(d^2 n^{1/d})

Figure 3.5  Performance comparison of parallel convex hull algorithms.
fastest known parallel sorting algorithm.] A three-dimensional convex hull algorithm that also runs in O(log^3 n) time on a CREW PRAM is given in [Agga88]. This time matches that in [Chow80], but optimal sorting is not necessary to achieve it: Any sorting algorithm that runs in O(log^3 n) time or better will suffice. In [Dado89], two algorithms for constructing the convex hull of a set of n points in three dimensions using O(n) processors on a CREW PRAM are presented:
1. A randomized algorithm whose probabilistic time is Õ(log^2 n).

2. A deterministic O(log^2 n log* n) time algorithm, where log* n is the number of times the log function needs to be applied to reduce n to a constant.
There are also parallel algorithms for computing the convex hull of a set of points in three dimensions on network models. Two algorithms are given in [Chow80] that run on a CCC network. The first runs in O(log^4 n) time using O(n) processors, and the second runs in O(K log^3 n) time using O(n^{1+1/K}) processors. A three-dimensional convex hull algorithm that runs in O(n^{1/2} log n) time on an n^{1/2} × n^{1/2} mesh is presented in [Dehn88b]. If each processor in the mesh is allowed O(log n) space, the algorithm runs in O(n^{1/2}) time. Recently, a solution to the multisearch problem on the mesh (see Chapter 12) has led to an O(n^{1/2})-time three-dimensional convex hull algorithm on a mesh of size n with constant memory per processor [Atal91a]. The convex hull of a set of points on a sphere can be used to find the Voronoi diagram of a set of points in the plane, due to an inversion method of Brown [Brow79a] (see Chapter 8). An algorithm to compute the convex hull of a set S of points on a sphere, which is used to compute the Voronoi diagram of S, is given in [Lu86a]. The algorithm runs in O(n^{1/2} log n) time on an O(n)-processor mesh. It has the same running time as the algorithm in [Dehn88b] but is for the more restricted case when the points are on a sphere.

3.5.2 Digitized Images

A special instance of some geometric problems occurs when the points form a digitized black-and-white picture [i.e., a two-dimensional binary array in which every entry is a picture element (or pixel)]: The entry in row i and column j is a 1 if and only if there is a black data point with coordinates (i, j). Algorithms that determine the convex hull of figures in a digitized image have been developed for a variety of networks. When one pixel is stored in each processor of an n^{1/2} × n^{1/2} mesh-connected computer or systolic screen, an O(n^{1/2}) algorithm for computing the convex hull of each component in a black-and-white picture is given in [Stou84] and [Mill85b]. An algorithm is given in [Dehn86b] for a slightly different problem on an n^{1/2} × n^{1/2} mesh or systolic screen. There, an image is defined to be rectilinear-convex if the intersection of the image and an arbitrary vertical or horizontal line in the digitized plane results in no more than one line segment. The image is "peeled" by finding the rectilinear convex hull of the image, removing its vertices from the image, and iterating that process. The algorithm finds all rectilinear convex hulls of an image in O(n^{1/2}) time. The number of these hulls provides an estimate of the depth of the image. If one pixel is stored in each base processor of a pyramid computer of size n, it is shown in [Mill84b, Mill85a] that the convex hull of the points in processors belonging to the same figure can be found in O(log^2 n/log log n) time. The convex hulls of each figure in the image can be found simultaneously in O(n^{1/4} log n) time. In [Kuma86] it is shown that on an n × n mesh-of-trees, one can enumerate the extreme points of the
convex hull of all figures in a digitized image simultaneously in O(log^4 n) time. Issues surrounding the implementation of a convex hull algorithm on the Intel iPSC hypercube of size n are discussed in [Mill87a, Mill89a].

3.5.3 Convex Hull of Disks

Given a set S of (possibly) overlapping disks of arbitrary size, the convex hull of S, CH(S), is the intersection of all half-planes containing S. Thus CH(S) consists of a sequence of arcs of the disks alternating with straight-line segments. A parallel algorithm for computing the convex hull of a set of overlapping disks of arbitrary size is given in [vanW90]. It runs in O(log^2 n) time on a CREW PRAM with O(n) processors. It uses divide-and-conquer and merging of slopes.

3.5.4 Computing Maximal Vectors

Two points p and q in the plane are given by their Cartesian coordinates. The point q is said to dominate p, written p < q, if and only if the x-coordinate of q is larger than the x-coordinate of p, and the y-coordinate of q is larger than the y-coordinate of p. Given a set S of points in the plane, a point q of S is a maximal vector (also called a maximal element) with respect to S if there is no other point q' ∈ S such that q < q'. For n points in the plane, the problem of computing the maximal vectors can be solved sequentially in optimal O(n log n) time [Prep85]. Computing the maximal vectors of S is also called finding the m-contour, m(S), of S. A related problem is to find for each point p ∈ S the number of points D(p, S) in S dominated by p. This problem is known as the ECDF searching problem [as D(p, S) is called the empirical cumulative distribution function]. An algorithm for computing the m-contour of a set of points in the plane that uses divide-and-conquer and runs in O(n^{1/2}) time on a mesh of size n is presented in [Dehn86a]. There it is also shown how to solve the ECDF searching problem in O(n^{1/2}) time on an n^{1/2} × n^{1/2} mesh.
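The m-contour itself is easy to obtain sequentially with a right-to-left sweep (a sketch assuming distinct x-coordinates; with ties the strict-dominance bookkeeping needs more care):

```python
def m_contour(points):
    """Maximal vectors of a planar point set: sweep by decreasing
    x-coordinate and keep each point whose y exceeds every y seen so far,
    since such a point is dominated by no point to its right."""
    contour, best_y = [], float("-inf")
    for p in sorted(points, key=lambda q: -q[0]):
        if p[1] > best_y:
            contour.append(p)
            best_y = p[1]
    return contour  # the maximal points, in order of decreasing x
```

For example, m_contour([(1, 4), (2, 3), (3, 5), (4, 1), (6, 2)]) yields [(6, 2), (3, 5)]; every other point is dominated by one of these two.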
An algorithm for finding all kth m-contours of an image S on a systolic screen is described in [Dehn86b]. Let m(S, 1) = m(S) denote the first m-contour of S [i.e., the set of all maximal pixels of S (sorted by x-coordinate)]. This notion is generalized to define the kth m-contour m(S, k), for k > 1, as the m-contour of the set obtained by removing the first k − 1 m-contours from S. The kth m-contours can be found in O(n^1/2) time on a systolic screen of size n [Dehn86b]. An algorithm for computing the m-contour of a set of points in the plane is given in [Stoj88b]. It runs in O(log n) time using O(n) processors on a CREW PRAM. A constant-time algorithm for computing the maximal vectors of a set of points in the plane is presented in [Akl89c]. The algorithm runs on a BSR machine with O(n) processors (see Chapter 11). The problem of finding dominating points and maximal vectors in the plane can be generalized to higher dimensions. Sequentially, the maxima of a set of n points in d dimensions can be obtained in O(n log^(d−2) n) time, d ≥ 3 [Prep85]. In [Dehn88d] it is shown how the ECDF searching problem can be solved in O(n^1/2) time for arbitrary constant dimension d on a mesh of size n. This result is used to show how the maximal vectors of a set of points in d dimensions can be found in the
42
Convex Hull
Chap. 3
Figure 3.6 Set S of points and extremal points of S. Points a, b, and c are extremal points, and a and c are convex hull points, but b is not.
same time and processor bounds. An algorithm for computing maximal vectors in three dimensions is given in [Atal86b] that runs in O(log n log log n) time with O(n) processors on a CREW PRAM. This running time is improved to O(log n) in [Atal89c]. In [Atal89c], the two-set dominance counting problem is also solved in O(log n) time with O(n) processors on a CREW PRAM: Given two sets of points P1 and P2 such that |P1| + |P2| ≤ n, two-set dominance counting determines for each point p ∈ P2 the number of points in P1 dominated by p. In [Atal89c], computing maximal vectors in three dimensions and two-set dominance counting are also solved in O(log n) time using O(n) processors on an EREW PRAM, with space requirements increased by a factor of O(log n). Algorithms on a hypercube of size n that solve the problems of ECDF searching, two-set dominance counting, and finding three-dimensional maximal vectors are given in [Stoj88a]. The algorithms use the divide-and-conquer technique such that the two half-size subproblems are solved in parallel and merging two solutions takes O(log n) time, resulting in a running time of t(n) = t(n/2) + c log n for some constant c, which is O(log^2 n). Algorithms on a hypercube that solve the ECDF searching problem and the two-set dominance counting problem in the plane and compute the maximal vectors of a set of n points in three dimensions are presented in [MacK90a]; they run in O(SORT(n) log log n) time, where SORT(n) is the time required to sort n values on a hypercube of size n.
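For reference, the ECDF quantity D(p, S) defined earlier has a one-line brute-force reading that works in any dimension d; this is our own O(dn)-per-query sketch of the definition, not one of the cited algorithms.

```python
def ecdf(p, S):
    """D(p, S): the number of points of S dominated by p, i.e. strictly
    smaller than p in every coordinate (any dimension d)."""
    return sum(all(qi < pi for qi, pi in zip(q, p)) for q in S)

print(ecdf((3, 4), [(1, 1), (2, 3), (0, 5)]))  # (1,1) and (2,3) -> 2
```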
Problem                      Reference                  Model              Processors   Running time
Maximal vectors (2-D)        [Dehn86a]                  Mesh               O(n)         O(n^1/2)
                             [Stoj88b]                  CREW PRAM          O(n)         O(log n)
                             [Akl89c]                   BSR                O(n)         O(1)
Maximal vectors (3-D)        [Atal86b]                  CREW PRAM          O(n)         O(log n log log n)
                             [Atal89c]                  CREW PRAM          O(n)         O(log n)
                             [Atal89c]                  EREW PRAM          O(n)         O(log n)
                             [Stoj88a]                  Hypercube          O(n)         O(log^2 n)
                             [MacK90a] with [Leig91]    Hypercube          O(n)         O(log n (log log n)^2)
Maximal vectors (d-D)        [Dehn88d]                  Mesh               O(n)         O(n^1/2)
All kth m-contours           [Dehn86b]                  Mesh               O(n)         O(n^1/2)
ECDF search (2-D)            [Dehn86a]                  Mesh               O(n)         O(n^1/2)
                             [Stoj88a]                  Hypercube          O(n)         O(log^2 n)
                             [MacK90a] with [Leig91]    Hypercube          O(n)         O(log n (log log n)^2)
ECDF search (d-D)            [Dehn88d]                  Mesh               O(n)         O(n^1/2)
Two-set dominance counting   [Atal89c]                  EREW, CREW PRAM    O(n)         O(log n)
                             [Stoj88a]                  Hypercube          O(n)         O(log^2 n)
                             [MacK90a] with [Leig91]    Hypercube          O(n)         O(log n (log log n)^2)

Figure 3.7 Performance comparison of parallel algorithms for computing maximal vectors and related problems.
Multiway divide-and-conquer is used twice, and the solutions are merged in O(SORT(n)) time. At the time of this writing, the fastest known sorting algorithm on a hypercube of size n has running time O(log n log log n) [Cyph90, Leig91]. Using this sorting algorithm, the algorithms in [MacK90a] run in O(log n (log log n)^2) time. In [MacK90b], exact communication time analyses are given for algorithms on a hypercube that solve the ECDF searching problem and the two-set dominance counting problem in the plane, and maximal vector computation in three dimensions. These problems are discussed further in Chapter 11. There is a close relationship between the maximal vectors and the convex hull of a set. We say that a point is extremal if it dominates all points in one of the orthogonal directions. For example, there are usually four extremal points for a set of n planar points, one for each of the positive or negative, horizontal or vertical directions. Axes through these points define four quadrants. With an appropriate assignment of + and − signs to the coordinates, the maximal points can be found in each quadrant. The union of the maximal points thus found defines a boundary of the point set. Convex hull points are a subset of boundary points. Figure 3.6 shows a set of points with extremal points
highlighted. Note that points a, b, and c are extremal points, and a and c are convex hull points, but b is not. Suppose that the convex hull of a point set has been computed. We now remove the hull points and compute the convex hull of the remaining set. This process, which is repeated until no points are left, is referred to as peeling. Sequentially, peeling can be performed optimally in O(n log n) time for a planar point set [Prep85]. Peeling can also be applied to the boundary points. No efficient parallel algorithm is known for peeling, in either of its forms, at the time of this writing [ElGi90].

3.5.5 Summary

The table in Figure 3.7 summarizes the results in the preceding section.
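The peeling process described above can be rendered directly in sequential code: recompute and delete the hull until no points remain, counting the iterations (the depth of the set). The sketch below is our own illustration, not the optimal O(n log n) method of [Prep85] and not a parallel algorithm; the helper names are ours.

```python
def convex_hull(pts):
    """Andrew's monotone-chain convex hull (sequential, for illustration).
    Returns the hull vertices in counterclockwise order."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # pop while the last turn is not a strict left turn
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower, upper = half(pts), half(pts[::-1])
    return lower[:-1] + upper[:-1]          # drop duplicated endpoints


def depth(pts):
    """Depth of a planar point set: number of peeling iterations until
    the set is exhausted."""
    pts, d = set(pts), 0
    while pts:
        pts -= set(convex_hull(list(pts)))  # delete this round's hull points
        d += 1
    return d


# A square with one interior point peels in two rounds:
print(depth([(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]))  # 2
```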
3.6 Problems

3.1. The CRCW PRAM algorithm of Section 3.1 uses O(n^2) processors to compute CH(P) and runs in constant time. Is there a CRCW PRAM algorithm that computes CH(P) in constant time while using fewer processors?

3.2. Design an algorithm that runs on the broadcasting with selective reduction model (defined in Section 2.3.3) and uses properties 1 and 2 of Section 3.1 to compute CH(P).

3.3. An approach that is often used in sequential computational geometry is to design algorithms whose running time is a function of the size of the output. Apply this approach to design a parallel algorithm for computing the convex hull whose running time is O(h), where h is the number of points of CH(P).

3.4. Consider the following algorithm for computing CH(P). Let ymin and ymax be the two points of P with minimum and maximum y-coordinate, respectively. The convex hull is viewed as consisting of two parts: the right hull (a convex polygonal chain that goes from ymin to ymax) and the left hull (a convex polygonal chain that goes from ymax to ymin). If we had a procedure to compute the right hull, we could trivially modify it to yield the left hull; we therefore concentrate on developing such a procedure. Assume that a horizontal line is drawn that goes through P. It intersects exactly one edge of the (yet-to-be-computed) right hull; let this edge be (pi, pj), with pi being the point with the smaller y-coordinate. If we could determine (pi, pj), the process could be applied recursively, first to obtain the chain from ymin to pi, and then to obtain the chain from pj to ymax. The right hull is then computed by concatenating each of these two chains to the appropriate endpoint of (pi, pj). This procedure is an example of the divide, merge, and then conquer algorithm design paradigm, which ought to be distinguished from the more common divide, conquer, and then merge paradigm.
(a) Show how (pi, pj) can be determined using only our knowledge of the equation of the horizontal line and the coordinates of the points of P.
(b) Use the procedure above for computing the right hull to develop a parallel algorithm for CH(P).

3.5. Given a set P of points in the plane, consider the following iterative process applied to the points of P:
while there are still points left in P do
1. Compute the convex hull of P, CH(P).
2. Update P to P − CH(P) [i.e., delete the vertices of CH(P)].
The number of times the iteration above is executed for a given set P is called the depth of the set P. As mentioned at the end of the preceding section, the process described above is itself referred to as peeling. Design an efficient parallel algorithm for computing the depth of a set.

3.6. The convex hull of a simple polygon with n vertices can be computed sequentially in O(n) time [Prep85]. How fast can this problem be solved in parallel?

3.7. Assume that the convex hull of a set of points in the plane has been computed. It is now required to maintain the hull so that when insertions to and deletions from the point set are made, the hull is updated dynamically. Discuss how this can be done in parallel.

3.8. The algorithm of [Cole88a] (described in Section 3.1) computes the convex hull of n points in O(log n) time with O(n) processors on a CREW PRAM. In this algorithm, the points are initially sorted by x-coordinate, then divided into sets and merged. Design an O(log n)-time, O(n)-processor CREW PRAM algorithm that computes the convex hull of a set of n points using a divide-and-conquer technique without initially sorting the points.

3.9. Design a parallel algorithm for computing the convex hull of a set of points in dimensions higher than three [Hole91] on a mesh-of-trees parallel computer.

3.10. Given a set S of parallel line segments in the plane, it is required to find a polygon that intersects every segment of S and whose perimeter is a minimum. Design a parallel algorithm to solve this problem.

3.11. Let S be a set of disjoint line segments in the plane. The set S is said to be extremally situated if each segment in S has at least one of its endpoints on the boundary of the convex hull of S. A circumscribing polygon of S is a simple polygon P such that the vertices of P are the endpoints of the segments in S and every segment in S is either an edge or an internal diagonal of P. Design a parallel algorithm that constructs a circumscribing polygon of an extremally situated set of line segments.

3.12. Given a simple polygon P and a point q, it is required to find the longest (equivalently, shortest) straight-line segment with endpoints on the boundary of P that goes through q (if q is interior to P or on its boundary), or whose extension goes through q (if q is exterior to P). Design a parallel algorithm to solve this problem.
3.7 References

[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó'Dúnlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327.
[Akl82] S. G. Akl, A constant-time parallel algorithm for computing convex hulls, BIT, Vol. 22, 1982, 130-134.
[Akl84] S. G. Akl, Optimal parallel algorithms for computing convex hulls and for sorting, Computing, Vol. 33, 1984, 1-11.
[Akl85b] S. G. Akl, Parallel Sorting Algorithms, Academic Press, Orlando, Florida, 1985.
[Akl89a] S. G. Akl, The Design and Analysis of Parallel Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[Akl89b] S. G. Akl, On the power of concurrent memory access, in Computing and Information, R. Janicki and W. W. Koczkodaj (Editors), Elsevier, New York, Proceedings of the International Conference on Computing and Information, ICCI '89, Toronto, 1989, 49-55.
[Akl89c] S. G. Akl and G. R. Guenther, Broadcasting with selective reduction, Proceedings of the Eleventh IFIP Congress, San Francisco, August 1989, 515-520.
[Atal86a] M. J. Atallah and M. T. Goodrich, Efficient parallel solutions to some geometric problems, Journal of Parallel and Distributed Computing, Vol. 3, 1986, 492-507.
[Atal86b] M. J. Atallah and M. T. Goodrich, Efficient plane sweeping in parallel (preliminary version), Proceedings of the Second Annual ACM Symposium on Computational Geometry, Yorktown Heights, New York, June 1986, 216-225.
[Atal89c] M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading divide-and-conquer: a technique for designing parallel algorithms, SIAM Journal on Computing, Vol. 18, No. 3, 1989, 499-532.
[Atal91a] M. J. Atallah, F. Dehne, R. Miller, A. Rau-Chaplin, and J.-J. Tsay, Multisearch techniques for implementing data structures on a mesh-connected computer, Proceedings of the Third ACM Symposium on Parallel Algorithms and Architectures, Hilton Head, South Carolina, July 1991, 204-214.
[Ben-O83] M. Ben-Or, Lower bounds for algebraic computation trees, Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, Boston, May 1983, 80-86.
[Blel88] G. E. Blelloch and J. J. Little, Parallel solutions to geometric problems on the scan model of computation, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 218-222.
[Brow79a] K. Q. Brown, Voronoi diagrams from convex hulls, Information Processing Letters, Vol. 9, 1979, 223-228.
[Chaz84] B. Chazelle, Computational geometry on a systolic chip, IEEE Transactions on Computers, Vol. C-33, No. 9, September 1984, 774-785.
[Chen87] G.-H. Chen, M.-S. Chern, and R. C. T. Lee, A new systolic architecture for convex hull and half-plane intersection problems, BIT, Vol. 27, 1987, 141-147.
[Chow80] A. L. Chow, Parallel algorithms for geometric problems, Ph.D. thesis, University of Illinois at Urbana-Champaign, 1980.
[Chow81] A. L. Chow, A parallel algorithm for determining convex hulls of sets of points in two dimensions, Proceedings of the Nineteenth Annual Allerton Conference on Communication, Control and Computing, Monticello, Illinois, September/October 1981, 214-223.
[Cole88a] R. Cole and M. T. Goodrich, Optimal parallel algorithms for polygon and point-set problems (preliminary version), Proceedings of the Fourth Annual ACM Symposium on Computational Geometry, Urbana-Champaign, Illinois, June 1988, 201-210.
[Cole88b] R. Cole, Parallel merge sort, SIAM Journal on Computing, Vol. 17, No. 4, August 1988, 770-785.
[Cyph90] R. Cypher and C. G. Plaxton, Deterministic sorting in nearly logarithmic time on the hypercube and related computers, Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, Baltimore, May 1990, 193-203.
[Dado89] N. Dadoun and D. G. Kirkpatrick, Parallel construction of subdivision hierarchies, Journal of Computer and System Sciences, Vol. 39, 1989, 153-165.
[Dehn86a] F. Dehne, O(n^1/2) algorithms for the maximal elements and ECDF searching problem on a mesh-connected parallel computer, Information Processing Letters, Vol. 22, 1986, 303-306.
[Dehn86b] F. Dehne, J.-R. Sack, and N. Santoro, Computing on a Systolic Screen: Hulls, Contours and Applications, Technical Report SCS-TR-102, School of Computer Science, Carleton University, Ottawa, Ontario, October 1986.
[Dehn88b] F. Dehne, J.-R. Sack, and I. Stojmenović, A note on determining the 3-dimensional convex hull of a set of points on a mesh of processors, Proceedings of the Scandinavian Workshop on Algorithm Theory (SWAT), Sweden, Lecture Notes in Computer Science, No. 318, Springer-Verlag, Berlin, 1988, 154-162.
[Dehn88d] F. Dehne and I. Stojmenović, An O(√n) time algorithm for the ECDF searching problem for arbitrary dimensions on a mesh-of-processors, Information Processing Letters, Vol. 28, 1988, 67-70.
[Eddy77] W. F. Eddy, A new convex hull algorithm for planar sets, ACM Transactions on Mathematical Software, Vol. 3, No. 4, 1977, 398-403.
[ElGi90] H. ElGindy, personal communication, 1990.
[Fjal90] P.-O. Fjällström, J. Katajainen, C. Levcopoulos, and O. Petersson, A sublogarithmic convex hull algorithm, BIT, Vol. 30, No. 3, 1990, 378-384.
[Good87b] M. T. Goodrich, Finding the convex hull of a sorted point set in parallel, Information Processing Letters, Vol. 26, December 1987, 173-179.
[Hole90] J. A. Holey and O. H. Ibarra, Iterative algorithms for planar convex hull on mesh-connected arrays, Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 102-109.
[Hole91] J. A. Holey and O. H. Ibarra, Triangulation, Voronoi diagram, and convex hull in k-space on mesh-connected arrays and hypercubes, Proceedings of the 1991 International Conference on Parallel Processing, St. Charles, Illinois, August 1991, Vol. III, Algorithms and Applications, 147-150.
[Jarv73] R. A. Jarvis, On the identification of the convex hull of a finite set of points in the plane, Information Processing Letters, Vol. 2, No. 1, 1973, 18-21.
[Kuma86] V. K. Prasanna Kumar and M. M. Eshaghian, Parallel geometric algorithms for digitized pictures on a mesh of trees, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 270-273.
[Leig91] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, California, 1991.
[Lu86a] M. Lu, Constructing the Voronoi diagram on a mesh-connected computer, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 806-811.
[MacK90a] P. D. MacKenzie and Q. F. Stout, Asymptotically efficient hypercube algorithms for computational geometry, Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, October 1990, 8-11.
[MacK90b] P. D. MacKenzie and Q. F. Stout, Practical hypercube algorithms for computational geometry, poster presentation at the Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, October 1990.
[Mill84a] R. Miller and Q. F. Stout, Computational geometry on a mesh-connected computer (preliminary version), Proceedings of the 1984 International Conference on Parallel Processing, Bellaire, Michigan, August 1984, 66-73.
[Mill84b] R. Miller and Q. F. Stout, Convexity algorithms for pyramid computers (preliminary version), Proceedings of the 1984 International Conference on Parallel Processing, Bellaire, Michigan, August 1984, 177-184.
[Mill85a] R. Miller and Q. F. Stout, Pyramid computer algorithms for determining geometric properties of images, Proceedings of the First Annual ACM Symposium on Computational Geometry, Baltimore, June 1985, 263-271.
[Mill85b] R. Miller and Q. F. Stout, Geometric algorithms for digitized pictures on a mesh-connected computer, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-7, No. 2, March 1985, 216-228.
[Mill87a] R. Miller and S. E. Miller, Using hypercube multiprocessors to determine geometric properties of digitized pictures, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 638-640.
[Mill88] R. Miller and Q. F. Stout, Efficient parallel convex hull algorithms, IEEE Transactions on Computers, Vol. C-37, No. 12, December 1988, 1605-1618.
[Mill89a] R. Miller and S. Miller, Convexity algorithms for digitized pictures on an Intel iPSC hypercube, Supercomputer, Vol. 31, May 1989, 45-51.
[Mill89b] R. Miller and Q. F. Stout, Mesh computer algorithms for computational geometry, IEEE Transactions on Computers, Vol. C-38, No. 3, March 1989, 321-340.
[Nath80] D. Nath, S. N. Maheshwari, and P. C. P. Bhatt, Parallel Algorithms for the Convex Hull Problem in Two Dimensions, Technical Report EE 8005, Department of Electrical Engineering, Indian Institute of Technology, Delhi Hauz Khas, New Delhi 110016, India, October 1980.
[Over81] M. H. Overmars and J. van Leeuwen, Maintenance of configurations in the plane, Journal of Computer and System Sciences, Vol. 23, 1981, 166-204.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Reif90] J. H. Reif and S. Sen, Randomized algorithms for binary search and load balancing on fixed connection networks with geometric applications (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 327-337.
[Stoj88a] I. Stojmenović, Computational geometry on a hypercube, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 100-103.
[Stoj88b] I. Stojmenović and M. Miyakawa, An optimal parallel algorithm for solving the maximal elements problem in the plane, Parallel Computing, Vol. 7, 1988, 249-251.
[Stou84] Q. F. Stout and R. Miller, Mesh-connected computer algorithms for determining geometric properties of figures, Proceedings of the 1984 International Conference on Pattern Recognition, 1984.
[Stou88] Q. F. Stout, Constant-time geometry on PRAMs, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 104-107.
[vanW90] K. van Weringh, Algorithms for the Voronoi diagram of a set of disks, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1990.
[Wang90a] B.-F. Wang and G.-H. Chen, Constant Time Algorithms for Sorting and Computing Convex Hulls, Technical Report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, 1990.
[Wang90b] B.-F. Wang, G.-H. Chen, and F.-C. Lin, Constant time sorting on a processor array with a reconfigurable bus system, Information Processing Letters, Vol. 34, No. 4, 1990, 187-192.
[Yao81] A. C. Yao, A lower bound to finding convex hulls, Journal of the ACM, Vol. 28, No. 4, 1981, 780-787.
4 Intersection Problems
The detection and reporting of intersections among geometric objects are needed in many applications, including computer graphics, pattern classification, and integrated circuit design. In this chapter we discuss parallel algorithms for intersection problems involving line segments, polygons, half-planes, rectangles, and circles.
4.1 Line Segments

Among all line segment intersection problems, the following two have received the most attention:

1. Given a set of n line segments in the plane, determine if any two line segments intersect.
2. Given a set of n line segments in the plane, find and report all pairwise intersections, if any exist.

The sequential lower bound for the first problem is Ω(n log n) and, for the second, Ω(n log n + I), where I is the number of intersections [Prep85]. A natural method for solving both problems sequentially is the plane sweep technique, in which the endpoints of the line segments are sorted by x-coordinate and a vertical line is swept across the set of segments [Prep85]. Data structures are maintained and updated at critical or event points, such as endpoints of the segments and intersection points between the segments. The plane sweep technique appears to be inherently sequential; however, a number of parallel algorithms on the PRAM model of computation for detecting line segment intersections exist that utilize a parallel plane sweep technique. A CREW PRAM algorithm is presented in [Agga88] that detects if two line segments in a set S of n planar line segments intersect. Its running time is O(log n) and it uses O(n log n) processors. The algorithm begins by sorting the 2n endpoints of the line segments by x-coordinate and projecting them onto the x-axis, resulting in at most 2n + 1 intervals. A complete balanced binary tree T called a segment tree [Bent80] is constructed in O(log n) time such that each of the intervals is stored at a leaf of T in sorted order. Let
Intersection Problems
52
Chap. 4
Figure 4.1 Set of line segments and segment tree. The node v covers the segment s.
the interval stored at the node v be [av, bv]. At an internal node v in T, the interval [av, bv] = [ax, bx] ∪ [ay, by] is stored, where x and y are the left and right children of v, respectively, and bx = ay. Figure 4.1 illustrates a set of line segments and a corresponding segment tree. A node v of T covers a line segment s if the interval stored at v is contained in the projection of the segment s on the x-axis, but the interval stored at the parent of v is not. Let Iv be the infinite vertical slab defined by [av, bv] × (−∞, +∞), and let Φ(v) be the set of segments covered by v. At each node v, the intersections of the segments in Φ(v) with the boundary of Iv are stored in a list called H(v). These lists are found using binary search in T. Note that there are at most two nodes vi and vj at each level in T that contain the same segment in their sets H(vi) and H(vj). Therefore, the total number of nodes that contain the same segment in their H(v) set is O(log n), and the segment tree is stored in O(n log n) space. For each node v ∈ T, the order in which the segments in Φ(v) intersect the left
Figure 4.2 Detecting (a) type 1 and (b) type 2 intersections.
and right boundaries of Iv is found by sorting the segments in H(v) in two ways: according to their left endpoints and according to their right endpoints. The left and right endpoints of the segments lie on the left and right boundaries of Iv, respectively. An intersection is detected if the two sorted orders are not the same. We call these intersections type 1 intersections. See Figure 4.2. If no intersection is detected, a list W(v) is created for each v ∈ T that contains those segments that have at least one endpoint in Iv. The total size of the lists W(v) over all v ∈ T is O(n log n). A second test for intersection is performed at each node by locating the two endpoints of each segment in W(v) in the sorted list of segments H(v); that is, the segments in H(v) above and below each endpoint are found. If the two endpoints are not in the same "cell" defined by Iv and two segments in H(v), an intersection is detected. These intersections are called type 2 intersections. Sorting the segments in each H(v) is done by assuming there is one list of size O(n log n) such that each item contains a pair (i, v), where i is the segment index and v is the node in T, and sorting with v as the key. The H(v) lists are then sublists of the large sorted list. The entire algorithm takes O(log n) time using O(n log n) processors on a CREW PRAM. It is also shown that the algorithm can be made to run in O(log^2 n) time using O(n) processors, using the emulation technique described in [Bren74]. A similar algorithm is given in [Atal86b] that runs in O(log n log log n) time and uses O(n) processors on a COMMON CRCW PRAM, or in O(log^2 n) time using O(n) processors on a CREW PRAM.
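The type 1 test can be pictured with a small sequential sketch (ours, not the PRAM code): segments that completely cross a vertical slab intersect inside it exactly when their bottom-to-top order on the left boundary differs from the order on the right boundary. The function name and slab parameters are our own.

```python
def type1_intersection(segments, xl, xr):
    """Each segment is ((x1, y1), (x2, y2)) and is assumed to span the
    whole slab xl..xr.  Returns True iff some pair crosses strictly
    inside the slab, i.e. the vertical orders at the two boundaries of
    the slab disagree."""
    def y_at(seg, x):
        (x1, y1), (x2, y2) = seg
        return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

    left  = sorted(range(len(segments)), key=lambda i: y_at(segments[i], xl))
    right = sorted(range(len(segments)), key=lambda i: y_at(segments[i], xr))
    return left != right

# Two crossing segments over the slab [0, 1]:
print(type1_intersection([((0, 0), (1, 1)), ((0, 1), (1, 0))], 0, 1))  # True
```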
This result is improved in [Atal89c], in which the idea of the segment tree used in [Agga88] is combined with the techniques of cascading divide-and-conquer (described in Chapter 3) and fractional cascading (see below) to obtain an algorithm for detecting intersections among line segments in the plane in O(log n) time on a CREW PRAM with O(n) processors. The algorithm presented in [Atal89c] is cost-optimal in light of the Ω(n log n) sequential lower bound for this problem [Prep85]. The technique of cascading divide-and-conquer is modified to allow
the merging of elements that belong to a partial order rather than a total order, and a data structure called a plane sweep tree is constructed, which is based on the idea of the segment tree used in [Agga88]. One main difference between the tree used in [Agga88] and that used in [Atal89c] is that the intervals stored at the leaves of the plane sweep tree each contain O(log n) segments. The H(v) lists (called Cover(v) in [Atal89c]) are constructed using cascading merge operations in T. The objects being merged at each stage are nonintersecting line segments instead of intervals on the x-axis, which means that they define a partial order instead of a total order: a line segment a is "above" another line segment b if there is a vertical line l that intersects both a and b such that l ∩ a is above l ∩ b. Thus line segments have to be deleted from some of the H(v) lists before merging. These segments simply change identity rather than being deleted, to preserve the rank information during the cascading merges. Tests for type 1 and type 2 intersections are carried out in a manner similar to that of the algorithm of [Agga88]; however, the test for type 1 intersections is executed during the construction of T rather than after it is built. It is shown how adding the steps required to do this test for intersection takes no longer, asymptotically, than creating T. The test for type 2 intersections is carried out after T has been built and made into a fractional cascading data structure. The problem of fractional cascading is described as follows: Let G = (V, E) be a directed graph such that every node v ∈ V contains a sorted list C(v) called a catalog; given a sequence of nodes (v1, v2, ..., vm) such that there is a path from v1 to vm in G through v2, v3, ..., vm−1, and an arbitrary element x, construct a data structure that facilitates the location of x in each C(vi) quickly.
A linear-time algorithm for building such a data structure is given in [Chaz86], allowing x to be located in each C(vi) in O(log n + m log d(G)) time, where d(G) is the maximum degree of any node in G, and n = |V| + |E| + Σ_{v∈V} |C(v)|. It is shown in [Atal89c] how to make a fractional cascading data structure from T in O(log n) time using O(n/log n) processors. The catalogs C(v) correspond to the H(v) lists at each node v ∈ T. The second test for intersection is carried out by multilocating each endpoint in T. Let p be the endpoint of some segment s. The point p is located in each H(v) in the tree, and a test is performed to see if s intersects the segments in H(v) directly above and below p. Multilocating a point p in the plane sweep tree with fractional cascading takes O(log n) time using one processor. An algorithm that solves problem 2 (i.e., finding and reporting all pairwise intersections) is given in [Chow80] for the case when the line segments are isothetic (i.e., parallel to the coordinate axes). This algorithm runs in O(log^2 n + Imax) time on a CREW PRAM with O(n) processors, where Imax is the maximum number of intersections of any line segment. An algorithm is presented in [Good88] that solves the same problem on a CREW PRAM and runs in O(log n) time using O(n + I/log n) processors, where I is the size of the output. The algorithm assumes a model where the number of processors grows dynamically; in other words, processors are included in the computation whenever they are needed. Using the same model, an algorithm for reporting all pairwise intersections between arbitrary line segments in the plane is presented in [Good88]. This algorithm runs in O(log^2 n) time and uses O(n + I/log n) processors on a CREW PRAM.
Sec. 4.2
Polygons, Half-Planes, Rectangles, and Circles
55
An algorithm for reporting all points of intersection between well-behaved curve segments (WCSs) in the plane is presented in [Rüb90]. Let A and B be two sets of WCSs in the plane such that no two segments in A intersect and no two segments in B intersect. A WCS s is the graph of a function fs(x) on an interval dom(fs) of the x-axis such that fs(x) is continuous on dom(fs) and any vertical line intersects s at most once. It is assumed that the function fs(x) can be evaluated in constant time by one processor, that the query x ∈ dom(fs) can be answered in constant time by one processor, and that it can be determined whether two such segments intersect in constant time by one processor. The algorithm in [Rüb90] runs in O(Imax log n) time using O(n + I/log n) processors on a CREW PRAM, where n is the total number of segments in A and B, Imax is the maximum number of intersections between two segments, and I is the number of reported intersections. There are a number of algorithms for solving the intersection detection problem for line segments in the plane that are designed for network architectures. In [Jeon90] it is shown how intersections can be detected among n line segments in the plane in O(n^1/2) time on an n^1/2 × n^1/2 mesh. This result is time-optimal in view of the time it takes to pass information from one processor to another on the mesh. An algorithm that achieves the same optimal running time on a mesh is presented in [Mill87b, Mill89b]. In [Shih87], a solution to the problem of detecting an intersection among n line segments in the plane is obtained by examining every pair of line segments in O(n) time on a linear array of size n. An algorithm is given in [MacK90b] that determines, for each segment in a set S of isothetic line segments, whether or not it is intersected by another segment in S. The algorithm runs in O(log^2 n) time on a hypercube of size n.
Two algorithms in [Chow80] report all pairwise intersections among n isothetic line segments on the CCC model of computation. The first algorithm runs in O(log^2 n + Imax) time on O(n) processors, where Imax is the maximum number of intersections per line segment. The second algorithm runs in O(K log n + Imax) time with n^{1+1/K} processors, where 1 < K < log n. Finally, an EREW PRAM algorithm is described in [Kim90] for the segment dragging problem: given a set L of n nonintersecting line segments in the plane, it is required to preprocess L so that for a query line segment s (such that s is vertical and intersects no segment in L), the first element of L intersected by s when s is dragged horizontally to the right can be found efficiently. It is shown in [Kim90] that the preprocessing can be performed using O(n) processors in O(log n) time so that a query can subsequently be answered in O(log n) time by a single processor.
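For intuition, a single segment-dragging query can be answered without any preprocessing by an O(n) scan; the sketch below (our own naming and band-clipping approach, not the [Kim90] data structure) returns the first member of L hit when the vertical query segment is dragged to the right:

```python
def drag_query(L, x0, ylo, yhi):
    """First segment of L hit when the vertical segment at x = x0 spanning
    [ylo, yhi] is dragged to the right; returns (index, contact_x) or None.
    Assumes, as in the problem statement, that the query segment intersects
    no member of L, so each segment's portion inside the horizontal band
    lies entirely on one side of x0."""
    best = None
    for i, ((ax, ay), (bx, by)) in enumerate(L):
        if ay > by:                                  # orient bottom-to-top
            (ax, ay), (bx, by) = (bx, by), (ax, ay)
        if by < ylo or ay > yhi:                     # misses the band entirely
            continue
        if ay == by:                                 # horizontal segment in band
            lo_x = min(ax, bx)
        else:                                        # clip to the band [ylo, yhi]
            t1 = max(0.0, (ylo - ay) / (by - ay))
            t2 = min(1.0, (yhi - ay) / (by - ay))
            lo_x = min(ax + t1 * (bx - ax), ax + t2 * (bx - ax))
        if lo_x > x0 and (best is None or lo_x < best[1]):
            best = (i, lo_x)                         # leftmost first contact
    return best
```

The parallel results above amortize this linear work: after O(log n)-time preprocessing with O(n) processors, each query costs only O(log n) on one processor.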
Summary

Figure 4.3 shows a table that summarizes the results for detecting and reporting line segment intersections in parallel. Note that Imax is the maximum number of intersections per line segment, I is the number of intersections reported, and 1 < K < log n.
Intersection Problems
56
Chap. 4
Problem                   Reference                       Model                Processors       Running time
Detection                 [Agga88]                        CREW PRAM            O(n log n)       O(log n)
Detection                 [Atal86b], [Agga88]             CREW PRAM            O(n)             O(log^2 n)
Detection                 [Atal86b]                       CRCW PRAM (COMMON)   O(n)             O(log n log log n)
Detection                 [Mill87b], [Mill89b], [Jeon90]  Mesh                 O(n)             O(n^{1/2})
Detection                 [Shih87]                        Linear array         O(n)             O(n)
Detection                 [Atal89c]                       CREW PRAM            O(n)             O(log n)
Reporting                 [Good88]                        CREW PRAM            O(n + I/log n)   O(log^2 n)
Detection (isothetic)     [MacK90b]                       Hypercube            O(n)             O(log^2 n)
Reporting (isothetic)     [Chow80]                        CREW PRAM            O(n)             O(log^2 n + Imax)
Reporting (isothetic)     [Chow80]                        CCC                  O(n)             O(log^2 n + Imax)
Reporting (isothetic)     [Chow80]                        CCC                  O(n^{1+1/K})     O(K log n + Imax)
Reporting (isothetic)     [Good88]                        CREW PRAM            O(n + I/log n)   O(log n)
Reporting (WCSs)          [Rüb90]                         CREW PRAM            O(n + I/log n)   O(Imax log n)
Segment dragging          [Kim90]                         EREW PRAM            O(n)             O(log n)
  (preprocessing)

Figure 4.3 Performance comparison of parallel line segment intersection algorithms.
4.2 Polygons, Half-Planes, Rectangles, and Circles

Two polygons P and Q are said to intersect if an edge of P crosses an edge of Q. In general, the two polygons need not be simple [i.e., two or more edges of P (or two or more edges of Q) may cross]. Without loss of generality, we assume that the two polygons have the same number of edges, n. A parallel algorithm is described in [Chaz84] which uses a linear array of O(n) processors to determine whether P and Q intersect. The edges of P (each given by the coordinates of its endpoints) are stored in the linear array, one edge per processor. The edges of Q, also given by the coordinates of their endpoints, are fed into the array, one at a time in a pipeline fashion, and travel left to right. This way, every edge of P "meets" every edge of Q, and a constant-time test for intersection is performed. Each processor keeps track of whether it has detected a pair of crossing edges. The last edge of Q carries along an extra variable, answer. Initially, answer is set to false; if P and Q intersect, at least one processor will set answer to true. The algorithm requires O(n) time and has a cost of O(n^2). An algorithm with the same cost but which runs on a mesh-of-trees is given in [Akl89a]. It uses O(n^2/log n) processors and runs in O(log n) time. It should be noted that the only known lower bound on the number of steps required to solve this problem is the trivial one of Omega(n) operations performed while reading the input. Furthermore, it is not known whether a sequential algorithm exists with a smaller than quadratic running time.
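The pipeline of [Chaz84] can be simulated sequentially: every edge of Q visits every cell of the array, so the simulation reduces to an all-pairs loop with a constant-time crossing test. A minimal sketch (our naming; the crossing test assumes general position, so touching or collinear edge pairs may be missed):

```python
def cross(a, b):
    """Sign of the 2D cross product a x b."""
    v = a[0] * b[1] - a[1] * b[0]
    return (v > 0) - (v < 0)

def edges_cross(e, f):
    """Proper-crossing test for edges e = (p, q) and f = (r, s)."""
    (p, q), (r, s) = e, f
    def orient(a, b, c):
        return cross((b[0] - a[0], b[1] - a[1]), (c[0] - a[0], c[1] - a[1]))
    return orient(p, q, r) != orient(p, q, s) and orient(r, s, p) != orient(r, s, q)

def polygons_boundaries_intersect(P, Q):
    """Simulation of the O(n)-time linear-array pipeline: edge i of P is
    held by cell i, and each edge of Q streams through every cell, so each
    (P-edge, Q-edge) pair meets exactly once."""
    def edges(V):
        return [(V[i], V[(i + 1) % len(V)]) for i in range(len(V))]
    answer = False                      # the flag carried by Q's last edge
    for eq in edges(Q):                 # each edge of Q enters the array...
        for ep in edges(P):             # ...and visits every cell in turn
            if edges_cross(ep, eq):
                answer = True
    return answer
```

On the actual array the outer loop is pipelined, so the n^2 pair tests take only O(n) parallel steps.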
The cost optimality of the algorithms of [Chaz84] and [Akl89a] is therefore an open question. By restricting the polygons to be simple, we can broaden the notion of intersection. Two simple polygons intersect if either their boundaries intersect or if one is entirely contained in the other. It is shown in [Chaz84] how the algorithm described above can be extended to test for the latter condition on a linear array. A simple polygon P is contained in a simple polygon Q if every one of its vertices lies inside Q. To test whether a point p lies inside a polygon Q, we draw a vertical line through p and find the intersection points between this line and the edges of Q: if the number of such points above p is odd, then p is inside Q. By storing a vertex of P in each processor of the linear array, and moving the edges of Q from left to right, we can determine whether all points of P lie inside Q. The same procedure is used to determine whether Q is contained in P. The new algorithm runs in O(n) time for a cost of O(n^2). This algorithm is not cost-optimal, however, in view of the well-known time-optimal O(n log n) sequential algorithm for detecting if two simple polygons intersect [Prep85]. Note that line segment intersection algorithms can be used to detect if edges of two polygons P1 and P2 intersect. If no intersection is detected among the edges, testing for inclusion of P1 in P2 and P2 in P1 involves testing if one vertex of P1 is inside P2 and if one vertex of P2 is inside P1. Point location algorithms are discussed in Section 5.1. The line segment intersection detection algorithm (Section 4.1) and point location algorithm (Section 5.1) in [Atal89c] can be combined to detect if two simple polygons intersect in O(log n) time using O(n) processors on a CREW PRAM. A simple polygon P is said to be star-shaped if there exists a point z not external to P such that for all points p of P, the line segment (z, p) lies entirely within P.
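The vertical-line parity test described above translates directly into code; a minimal sketch (ours), assuming p is not on the boundary and the vertical line through p passes through no vertex of Q:

```python
def inside(p, poly):
    """Parity test: count the edges of poly crossed by the upward vertical
    ray from p; an odd count means p is inside.  Assumes general position
    (p not on the boundary, the vertical line through p hits no vertex)."""
    px, py = p
    crossings = 0
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (x1 < px) != (x2 < px):                 # edge spans the vertical line
            y_at = y1 + (px - x1) * (y2 - y1) / (x2 - x1)
            if y_at > py:                           # crossing strictly above p
                crossings += 1
    return crossings % 2 == 1
```

On the linear array, each processor holds one vertex of P and accumulates its crossing count as the edges of Q stream past; the parity decides containment.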
An EREW PRAM algorithm is described in [Ghos91] which determines whether the boundaries of two star-shaped polygons intersect, or one contains the other, or they are disjoint. For two star-shaped polygons with a total of n vertices, the algorithm uses O(n/log n) processors and runs in O(log n) time. By restricting the polygons to be convex, the detection problem can be solved on an O(n)-processor CREW PRAM in constant time [Atal88a]. In [Dado89], two CREW PRAM algorithms are presented for detecting the separation of two convex polyhedra of n vertices in three dimensions using O(n) processors:

1. A randomized algorithm whose probabilistic time is O(log n).
2. A deterministic O(log n log* n)-time algorithm.

We have dealt so far with the problem of detecting whether two polygons intersect. A companion problem is that of reporting the common intersection of two polygons if it exists. The intersection of two convex polygons with a total of O(n) vertices is a convex polygon with O(n) vertices, and computing this intersection has a sequential lower bound of Omega(n) time [Prep85]. Computing the intersection of two convex polygons can be solved on an O(n)-processor hypercube in O(log n) time [Stoj88a] and on an O(n)-processor mesh in O(n^{1/2}) time [Mill89b, Jeon90]. Detecting and computing common intersections among more than two polygons have also been considered. In [Mill87b] the problem of testing for the existence of an
Figure 4.4 Two-variable linear program.
intersection among multiple simple polygons with a total of no more than n edges is solved in O(n^{1/2}) time on a mesh of size n. It is shown in [Boxe90] that restricting the polygons to be monotone with a total of n edges leads to efficient algorithms for detecting a common intersection among the polygons. A polygon P is monotone if there is a direction d such that any line perpendicular to d intersects P in at most two points. (Note that a convex polygon is monotone in all directions.) The algorithms in [Boxe90] assume that all polygons are monotone in the same direction; using O(n) processors, they run on a CREW PRAM in O(log^2 n) time, on a hypercube in O(log^2 n) time, and on a mesh in O(n^{1/2}) time. A single-processor algorithm is also given that runs in O(n log n) time. The optimality of these running times is an open problem. A different intersection problem is solved in [Chow80], where three algorithms are presented for reporting all intersecting pairs of rectangles in a set of n isothetic rectangles. Isothetic rectangles are rectangles whose sides are parallel to the coordinate axes. The first algorithm runs in O(log^2 n + Imax) time and uses O(n) processors on a CREW PRAM, where Imax is the maximum number of intersections per rectangle. The second and third algorithms are designed for the CCC model of computation. One runs in O(log^2 n + Imax) time and uses O(n) processors, and the other runs in O(K log n + Imax) time and uses O(n^{1+1/K}) processors, for 1 < K < log n.
In a related problem, it is required to determine, for each rectangle in a set of n isothetic rectangles, whether it is intersected by another rectangle in the set. This problem can be solved in O(n) time on an O(n)-processor linear array [Chaz84], in O(n^{1/2}) time on an O(n)-processor mesh [Lu86b, Mill89b], and in O(log^2 n) time on a hypercube of size n [MacK90b]. The problem of determining for each of n given circles whether it is intersected by another circle is solved in [Mill89b] in O(n^{1/2}) time on a mesh of size n. An approach similar to the one in [Chaz84] for the polygon intersection problem is used in [Chaz84] and [Chen87] to compute the intersection of n half-planes on an O(n)-processor linear array in O(n) time. This algorithm is not cost-optimal in light of the Omega(n log n) lower bound for computing half-plane intersections with a single processor [Prep85]. A problem that is solved by finding the intersection of half-planes is that of finding the kernel of a simple polygon P (i.e., the locus of points q = (x, y) such that for all points p inside P, the line segment (q, p) lies entirely inside P [Prep85]). This locus of points is found by intersecting the n half-planes defined by the edges of P. Let (e0, e1, ..., e(n-1)) be the edges of P such that e0 and e(n-1) share a common endpoint and each ei is oriented so that the interior of P is to its left. Let H(ei) be the half-plane to the left of the line through ei. The intersection of H(ei), i = 0, 1, ..., n-1, gives the kernel of P. An algorithm is presented in [Cole88a] that finds the kernel of a simple polygon of size n in O(log n) time using O(n/log n) processors on a CREW PRAM. This algorithm is cost-optimal since finding the kernel of a simple polygon has a lower bound of Omega(n) [Prep85]. Note that finding the intersection of half-planes defined by the edges of a simple polygon is easier than finding the intersection of arbitrary half-planes.
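The kernel construction just described can be sketched sequentially by clipping a large bounding box against each half-plane H(ei) in turn (our own naming; this naive version takes O(n^2) time in the worst case, unlike the cost-optimal algorithm of [Cole88a]):

```python
def clip_left(poly, a, b):
    """Clip polygon poly to the closed half-plane to the left of the
    directed line a -> b (one Sutherland-Hodgman clipping step)."""
    def s(p):  # signed area test: > 0 means p is strictly left of a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        sp, sq = s(p), s(q)
        if sp >= 0:
            out.append(p)                       # keep vertices in the half-plane
        if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
            t = sp / (sp - sq)                  # edge crosses the line: add point
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def kernel(P):
    """Kernel of simple polygon P (vertices listed counterclockwise):
    intersect the half-planes left of each directed edge, starting from a
    large box; an empty result means the kernel is empty."""
    K = [(-1e6, -1e6), (1e6, -1e6), (1e6, 1e6), (-1e6, 1e6)]
    for i in range(len(P)):
        K = clip_left(K, P[i], P[(i + 1) % len(P)])
        if not K:
            break
    return K
```

For a convex polygon the kernel is the polygon itself; for a star-shaped polygon it is the (nonempty) set of points that see the whole interior.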
Linear programming in two dimensions can be viewed as a type of intersection problem, since its solution can be found by solving the half-plane intersection problem. Two-variable (planar) linear programming solves the following problem: maximize ax + by (the objective function) subject to n constraints, ai x + bi y + ci <= 0, i = 1, 2, ..., n. The feasible region of the linear program above is the set of points (x, y) satisfying the n constraints, which is the intersection of the n half-planes defined by the constraints. Figure 4.4 shows a two-variable linear program. The solution to a linear program in two dimensions can be found by constructing the intersection of n half-planes in O(n log n) time with a single processor and then finding the vertex of the resulting polygon that maximizes the objective function [Prep85]. Since the entire intersection need not be computed to find the solution to a linear programming problem, the two problems are not equivalent. Note that half-plane intersection construction has an Omega(n log n) lower bound, while the lower bound for linear programming in fixed dimensions is Omega(n) [Prep85]. Optimal algorithms that solve linear programming in two and three dimensions (two- and three-variable linear programming, respectively) with a single processor exist [Megi83, Dyer84]. A cost-optimal parallel algorithm for solving linear programming in the plane based on the sequential algorithms in [Megi83, Dyer84] is given in [Deng90]. It runs in O(log n) time on an ARBITRARY CRCW PRAM with O(n/log n) processors. An algorithm for solving the linear programming problem in R^d is presented in [Wegr91] that runs in O(log^d n) time on a CREW PRAM of size n, and in O(log^3 n)
Problem                                            Reference            Model                           Processors      Running time
Detection (two polygons)                           [Chaz84]             Linear array                    O(n)            O(n)
Detection (two polygons)                           [Akl89a]             Mesh-of-trees                   O(n^2/log n)    O(log n)
Detection (two simple polygons)                    [Chaz84]             Linear array                    O(n)            O(n)
Detection (two star-shaped polygons)               [Ghos91]             EREW PRAM                       O(n/log n)      O(log n)
Detection (two convex polygons)                    [Atal88a]            CREW PRAM                       O(n)            O(1)
Detection (two convex polyhedra)                   [Dado89]             CREW PRAM                       O(n)            O(log n)
Detection (two convex polyhedra)                   [Dado89]             CREW PRAM                       O(n)            O(log n log* n)
Detection (many simple polygons)                   [Mill87b]            Mesh                            O(n)            O(n^{1/2})
Detection (many monotone polygons)                 [Boxe90]             CREW PRAM                       O(n)            O(log^2 n)
Detection (many monotone polygons)                 [Boxe90]             Hypercube                       O(n)            O(log^2 n)
Detection (many monotone polygons)                 [Boxe90]             Mesh                            O(n)            O(n^{1/2})
Detection (for each isothetic rectangle in a set)  [Chaz84]             Linear array                    O(n)            O(n)
Detection (for each isothetic rectangle in a set)  [Lu86b], [Mill89b]   Mesh                            O(n)            O(n^{1/2})
Detection (for each isothetic rectangle in a set)  [MacK90b]            Hypercube                       O(n)            O(log^2 n)
Detection (for each circle in a set)               [Mill89b]            Mesh                            O(n)            O(n^{1/2})
Reporting (two convex polygons)                    [Stoj88a]            Hypercube                       O(n)            O(log n)
Reporting (two convex polygons)                    [Mill89b], [Jeon90]  Mesh                            O(n)            O(n^{1/2})
Reporting (pairs of isothetic rectangles)          [Chow80]             CREW PRAM                       O(n)            O(log^2 n + Imax)
Reporting (pairs of isothetic rectangles)          [Chow80]             CCC                             O(n)            O(log^2 n + Imax)
Reporting (pairs of isothetic rectangles)          [Chow80]             CCC                             O(n^{1+1/K})    O(K log n + Imax)
Reporting (n half-planes)                          [Chaz84], [Chen87]   Linear array                    O(n)            O(n)
Computing a kernel                                 [Cole88a]            CREW PRAM                       O(n/log n)      O(log n)
Linear programming (R^2)                           [Deng90]             CRCW PRAM (ARBITRARY)           O(n/log n)      O(log n)
Linear programming (R^2)                           [Wegr91]             Mesh with reconfigurable buses  O(n)            O(log^3 n)
Linear programming (R^3)                           [Wegr91]             Mesh with reconfigurable buses  O(n)            O(n^{1/3} log^3 n)
Linear programming (R^d)                           [Wegr91]             CREW PRAM                       O(n)            O(log^d n)
Linear programming (R^d)                           [Wegr91]             Mesh with reconfigurable buses  O(n)            O(n^{1/2})

Figure 4.5 Performance comparison of parallel polygon, half-plane, rectangle, and circle intersection algorithms.
time in R^2, O(n^{1/3} log^3 n) time in R^3, and O(n^{1/2}) time in R^d on an O(n)-processor mesh with reconfigurable buses.
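The construct-then-maximize approach for planar linear programming described above can be sketched as follows (our naming; a naive method that builds the entire feasible polygon by repeated clipping, which the optimal algorithms of [Megi83, Dyer84, Deng90] deliberately avoid):

```python
def lp2(objective, constraints, box=1e6):
    """Maximize a*x + b*y subject to ai*x + bi*y + ci <= 0 by explicitly
    intersecting the half-planes (naive O(n^2) construction) and then
    taking the best vertex of the feasible polygon.
    Returns (value, (x, y)), or None if infeasible within the box."""
    a, b = objective
    poly = [(-box, -box), (box, -box), (box, box), (-box, box)]
    for (ai, bi, ci) in constraints:
        out = []
        for i in range(len(poly)):
            p, q = poly[i], poly[(i + 1) % len(poly)]
            fp = ai * p[0] + bi * p[1] + ci
            fq = ai * q[0] + bi * q[1] + ci
            if fp <= 0:
                out.append(p)                   # p satisfies this constraint
            if (fp < 0 < fq) or (fq < 0 < fp):  # edge crosses constraint line
                t = fp / (fp - fq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
        poly = out
        if not poly:
            return None                         # feasible region is empty
    # linear objectives attain their maximum at a vertex of the polygon
    return max((a * x + b * y, (x, y)) for (x, y) in poly)
```

As the text notes, this does strictly more work than necessary: the optimum can be found in Theta(n) sequential time without ever building the intersection.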
4.2.1 Summary
The table in Figure 4.5 summarizes the results in the preceding section. Note that Imax is the maximum number of intersections per rectangle and 1 < K < log n.
4.3 Problems

4.1. Suppose that a set S of n straight-line segments in the plane is given. The angle that each line segment forms with the horizontal is restricted to take one of c given values, where c is a constant independent of n. Is the problem of computing in parallel all intersections among the members of such a special set S any easier than the general case (i.e., when the angles are unrestricted)?
4.2. Two convex polygonal chains C1 and C2 are given. It is required to detect whether these chains intersect, and if they do, to compute their intersection. Assuming that the two chains have a total of n vertices, how fast can this problem be solved in parallel?
4.3. Let A and B be two intersecting convex polygons in the plane. The depth of collision D(A, B) between the two polygons is defined as the least distance by which B has to be translated so that it just separates from A. Develop a PRAM algorithm for computing D(A, B).
4.4. A polygon P in the plane is said to be vertically convex (or monotone in the horizontal direction) if for every pair of points a and b in P, such that a and b are on the same vertical line, the line segment from a to b is contained in P. Given a collection of vertically convex polygons P1, P2, ..., Pk, with a total of n vertices, design a parallel algorithm (for your chosen model of computation) which determines whether the polygons have a common intersection. Can you solve this problem in O(log n) time on an n-processor CREW PRAM?
4.5. Show that the intersection of two convex polygons with a total of n vertices can be computed in O(log n) time on a hypercube computer with n processors.
4.6. Design a parallel algorithm that finds the intersection of two star-shaped polygons on a hypercube parallel computer.
4.7. A mesh computer with n processors is given. Describe a parallel algorithm for this computer that determines the common intersection of n or fewer half-planes in O(n^{1/2}) time.
4.8. Given a set of isothetic rectangles and a query rectangle, it is required to report the number of rectangles that intersect the query rectangle. Describe and analyze algorithms for solving this problem on the following models of parallel computation:
(a) Linear array
(b) d-Dimensional mesh
(c) Modified CCC
(d) Scan
4.9. A set of n circles (of different centers and diameters) is given. It is required to determine for each circle whether or not it is intersected by another circle. Using n processors,
determine how fast this problem can be solved on each of the following models of parallel computation:
(a) Linear array
(b) Mesh
(c) Hypercube
(d) Mesh-of-trees
(e) Mesh with broadcast buses
(f) Mesh with reconfigurable buses
(g) EREW PRAM
4.10. Given a set S of n circles in the plane, construct a data structure on an O(n)-processor CREW PRAM that, given a line l, allows fast answers to queries of the form:
(a) Detect if any circle in S is intersected by l.
(b) Report all circles in S that are intersected by l.
Solve the same problem on a hypercube of size n.
4.11. Given a set S of n circular arcs in the plane, design and compare parallel algorithms that count the number of pairwise intersections in S. Note that two circular arcs can intersect at more than one point.
4.12. Let P1 be a simple polygon, and let P2 be a simple polygon with holes. It is required to identify all positions that P1 can occupy inside P2 without intersecting any hole. Design a parallel algorithm for this problem.
4.4 References

[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó'Dúnlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327.
[Akl89a] S. G. Akl, The Design and Analysis of Parallel Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[Atal86b] M. J. Atallah and M. T. Goodrich, Efficient plane sweeping in parallel (preliminary version), Proceedings of the Second Annual ACM Symposium on Computational Geometry, Yorktown Heights, New York, June 1986, 216-225.
[Atal88a] M. J. Atallah and M. T. Goodrich, Parallel algorithms for some functions of two convex polygons, Algorithmica, Vol. 3, 1988, 535-548.
[Atal89c] M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading divide-and-conquer: a technique for designing parallel algorithms, SIAM Journal on Computing, Vol. 18, No. 3, 1989, 499-532.
[Bent80] J. L. Bentley and D. Wood, An optimal worst case algorithm for reporting intersections of rectangles, IEEE Transactions on Computers, Vol. C-29, 1980, 571-576.
[Boxe90] L. Boxer and R. Miller, Common intersections of polygons, Information Processing Letters, Vol. 33, No. 5, 1990, 249-254; see also corrigenda in Vol. 35, 1990, 53.
[Bren74] R. P. Brent, The parallel evaluation of general arithmetic expressions, Journal of the ACM, Vol. 21, No. 2, 1974, 201-206.
[Chaz84] B. Chazelle, Computational geometry on a systolic chip, IEEE Transactions on Computers, Vol. C-33, No. 9, September 1984, 774-785.
[Chaz86] B. Chazelle and L. J. Guibas, Fractional cascading: I. A data structuring technique, Algorithmica, Vol. 1, 1986, 133-162.
[Chen87] G.-H. Chen, M.-S. Chern, and R. C. T. Lee, A new systolic architecture for convex hull and half-plane intersection problems, BIT, Vol. 27, 1987, 141-147.
[Chow80] A. L. Chow, Parallel Algorithms for Geometric Problems, Ph.D. thesis, University of Illinois at Urbana-Champaign, 1980.
[Cole88a] R. Cole and M. T. Goodrich, Optimal parallel algorithms for polygon and point-set problems (preliminary version), Proceedings of the Fourth Annual ACM Symposium on Computational Geometry, Urbana-Champaign, Illinois, June 1988, 201-210.
[Dado89] N. Dadoun and D. G. Kirkpatrick, Parallel construction of subdivision hierarchies, Journal of Computer and System Sciences, Vol. 39, 1989, 153-165.
[Deng90] X. Deng, An optimal parallel algorithm for linear programming in the plane, Information Processing Letters, Vol. 35, 1990, 213-217.
[Dyer84] M. E. Dyer, Linear time algorithms for two- and three-variable linear programs, SIAM Journal on Computing, Vol. 13, No. 1, 1984, 31-45.
[Ghos91] K. S. Ghosh and A. Maheshwari, An optimal parallel algorithm for determining the intersection type of two star-shaped polygons, Proceedings of the Third Canadian Conference on Computational Geometry, Vancouver, British Columbia, August 1991, 2-6.
[Good88] M. T. Goodrich, Intersecting Line Segments in Parallel with an Output-Sensitive Number of Processors, Technical Report 88-27, Department of Computer Science, The Johns Hopkins University, Baltimore, 1988.
[Jeon90] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on a mesh-connected computer, Algorithmica, Vol. 5, No. 2, 1990, 155-178.
[Kim90] S. K. Kim, Parallel algorithms for the segment dragging problem, Information Processing Letters, Vol. 36, No. 6, December 1990, 323-328.
[Lu86b] M. Lu and P. Varman, Mesh-connected computer algorithms for rectangle-intersection problems, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 301-307.
[MacK90b] P. D. MacKenzie and Q. F. Stout, Practical hypercube algorithms for computational geometry, poster presentation at the Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, October 1990.
[Megi83] N. Megiddo, Linear time algorithm for linear programming in R^3 and related problems, SIAM Journal on Computing, Vol. 12, No. 4, 1983, 759-776.
[Mill87b] R. Miller and Q. F. Stout, Mesh computer algorithms for line segments and simple polygons, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 282-285.
[Mill89b] R. Miller and Q. F. Stout, Mesh computer algorithms for computational geometry, IEEE Transactions on Computers, Vol. C-38, No. 3, March 1989, 321-340.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Rüb90] C. Rüb, Parallel algorithms for red-blue intersection problems, manuscript, FB 14, Informatik, Universität des Saarlandes, Saarbrücken, 1990.
[Shih87] Z.-C. Shih, G.-H. Chen, and R. C. T. Lee, Systolic algorithms to examine all pairs of elements, Communications of the ACM, Vol. 30, No. 2, February 1987, 161-167.
[Stoj88a] I. Stojmenović, Computational geometry on a hypercube, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 100-103.
[Wegr91] P. Węgrowicz, Linear programming on the reconfigurable mesh and the CREW PRAM, M.Sc. thesis, School of Computer Science, McGill University, Montreal, Quebec, 1991.
5 Geometric Searching
In the field of pattern recognition, an object is recognized by identifying, among several given classes of objects, one class to which it belongs. For example, a robot roaming the surface of Mars may wish to determine whether the object it faces is a rock, a spaceship, or a Martian. Each class is described by a region in some space, and the points inside the region represent objects in that class. Points are given by their coordinates in space, each coordinate being the value of an object feature. To classify a new object, it suffices to identify the region in which the point representing the object falls. In computational geometry, more generally, the problem of geometric searching is that of locating a given geometric object (such as a point) inside an existing geometric structure (such as a subdivision of the plane). In this chapter we review parallel algorithms for various geometric search problems.
5.1 Point Location

A planar subdivision is a partition of the plane into regions bounded by straight-line edges. Each region is therefore a polygon whose corners are the vertices of the subdivision. A subdivision is said to be convex if it is bounded by a convex polygon and each of its regions is convex. A convex subdivision is triangulated if each of its regions is a triangle. It is shown in [Good89a] how an arbitrary subdivision or even a simple polygon with holes can be triangulated in O(log n) time using O(n/log n) processors on a CREW PRAM. This result is discussed further in Chapter 10. The problem of point location in a planar subdivision calls for determining the region of the subdivision occupied by each of a given set of query points. Typically, a data structure is created that facilitates fast point location in the subdivision. This data structure is the result of an algorithm that performs some preprocessing on the subdivision. Given a subdivision S with n vertices, three measures are made of an algorithm for locating points in S: the time, p(n), to do the preprocessing; the amount of space, s(n), required to store the data structure; and the amount of time, q(n), required to search the data structure to locate a point. Algorithms exist that preprocess an arbitrary subdivision in p(n) = O(n log n)
Figure 5.1 Convex planar subdivision.
time with a single processor such that the data structure takes s(n) = O(n) space and queries take q(n) = O(log n) time with one processor [Prep85]. Figure 5.1 illustrates an example of a planar convex subdivision. Assume that the planar subdivision consists of n vertices and that it is triangulated. A randomized parallel algorithm is described in [Reif87] which locates n query points in such a subdivision using O(n) processors and O(n) space on the COMMON CRCW PRAM. The algorithm runs in O(log n) time with high probability (i.e., with probability larger than or equal to 1 - n^{-c}, for some constant c). The algorithm is essentially a parallelization of the sequential algorithm of [Kirk83]. As in its sequential counterpart, the algorithm first builds a data structure during a preprocessing stage. This requires O(log n) time with high probability and O(n) processors. Subsequently, each of the query points can be located by a different processor, with each processor implementing (simultaneously with all others) a sequential point location algorithm requiring O(log n) deterministic time [Kirk83]. The preprocessing stage consists of three steps whose purpose is to remove a number of vertices from the planar subdivision. These steps are repeated until the size of the remaining set is a small constant. The three steps are as follows:

Step 1. Select a set of vertices no two of which are connected by an edge and each of which has a degree of at most 12. This can be done in constant time by assigning a processor to each vertex and each edge of the planar subdivision. All vertices with a degree of at most 12 are then identified and given a label of 0 or 1 at random and with equal probability. The required set is formed by those vertices labeled 0 which are not connected to any other vertices also labeled 0.

Step 2. Remove the set of vertices identified in step 1 and triangulate the remaining subdivision.
Step 3. For each of the new triangles constructed in step 2, determine which of the old triangles it intersects.

It is shown in [Reif87] that the number of iterations is no larger than log_{1/r} n, for some constant r < 1, with high probability. Once the preprocessing stage is complete, a query point is located in the subdivision, again iteratively, beginning with the final subdivision. At each iteration, the query point is located with respect to those triangles in the next iteration that intersect the present subdivision, thus narrowing down the search. In the final iteration, the triangle of the original subdivision containing the query point is determined. It is also shown in [Reif87] how the algorithm can be extended to handle arbitrary subdivisions by exhibiting an O(n)-processor COMMON CRCW PRAM randomized triangulation algorithm which runs in O(log n) time with high probability. A more subtle approach, based on decomposing the subdivision into chains, is described in [Dado89]; it allows the preprocessing stage to be implemented on the weaker CREW PRAM in O(log n) probabilistic time with O(n) processors. An O(log n)-time deterministic triangulation algorithm is used, such as the one described in [Good89a]. Deterministic point location algorithms are also given in [Dado89] which use O(n) processors on the CREW PRAM and run in time O(log n log* n) for convex subdivisions and O(log^2 n) for arbitrary subdivisions. Essentially, the hierarchical subdivision search structure of [Kirk83] is constructed in parallel. The structure uses O(n) space, and a query point can be located in it sequentially in O(log n) time. In Chapter 4 an algorithm is described for detecting an intersection among a set of line segments in the plane that constructs a data structure called a segment tree [Agga88]. This data structure can be used for planar point location since it allows the segment immediately above a query point to be located in O(log^2 n) time with one processor.
If the face below each segment s is associated with s, locating the segment above a query point p gives the face of S that contains p. The data structure is constructed in O(log^2 n) time using O(n) processors on a CREW PRAM, and it takes O(n log n) space. This result is improved upon by the introduction of the plane sweep tree technique presented in [Atal86b]. The algorithm builds a data structure called an augmented plane sweep tree in O(log n log log n) time using O(n) processors and O(n log n) space on a CREW PRAM. A query point can be multilocated in O(log n) time using one processor. Surprisingly, a deterministic parallel algorithm exists whose performance matches that of the randomized algorithms of [Reif87]. In [Atal89c], the plane sweep tree with fractional cascading is used to detect line segment intersection (see Chapter 4). The plane sweep tree T is constructed in O(log n) time using O(n) processors on a CREW PRAM and can be made into a fractional cascading data structure T within the same time bound. The storage required for T is O(n log n) and a single processor can determine the face containing a query point p in O(log n) time if, for each segment s, the face directly above s is associated with s. Constructing the plane sweep tree and making it into a fractional cascading data structure can also be done on the EREW PRAM within the same time and space bounds but with larger constant multiplicative factors. An algorithm given in [Tama91] preprocesses a monotone subdivision in O(log n)
time using O(n/log n) processors on an EREW PRAM. A monotone subdivision is one in which all regions are monotone polygons. Figure 5.2 shows an example of a monotone subdivision.

Figure 5.2 Monotone subdivision of the plane.

The preprocessing results in an O(n)-size data structure that allows point location queries in O(log n) sequential time. The data structure is the bridged separator tree [Lee77, Edel86a]. For nonmonotone subdivisions, a triangulation algorithm can first be applied. In [Tama90] the bridged separator tree is modified to give optimal time and processor bounds. For monotone subdivisions, an O(n)-space data structure can be constructed in O(log n) time using O(n/log n) processors on an EREW PRAM model. This result makes use of fractional cascading as described above [Chaz86]. Subsequent cooperative searching for query points can be accomplished in O(log n/log N) time with O(N) processors on a CREW PRAM, 1 < N < n. In cooperative searching, the queries are answered using more than one processor rather than sequentially. This result is also extended to spatial point location, in which a data structure requiring O(n) space is constructed in O(log n) time on an EREW PRAM, and cooperative searching takes O(log^2 n/log^2 N) time with O(N) processors on a CREW PRAM, 1 < N < n. Given a triangulated planar n-vertex graph, an algorithm in [Cole90] constructs a data structure in O(log n) time using O(n/log n) processors on a CREW PRAM such that the triangulated region in which a point is located can be found in O(log n) time using a single processor. To this point we have discussed algorithms on the PRAM model of computation that construct a data structure in parallel to allow fast sequential location of query points. Once the data structure has been built, many queries can be performed efficiently on the same subdivision. A different approach has been taken in developing algorithms for network models.
A linear array of O(n) processors is used in [Chaz84], where the number of edges in the subdivision S is O(n), to locate a query point in O(n) time. Let each face of S be a polygon such that the edges corresponding to each polygon f of S are contained in contiguous processors (a subarray) in clockwise order around f. Note that each edge is stored in at most two processors. The name of the face associated with an edge e is also stored at the processor that stores e. Locating a query point p requires testing each polygon in S for inclusion of p. A point p is pumped through the array and, for each edge e in the subarray corresponding to a polygon, the vertical line l_p through p is tested for intersection with e. The two edges that intersect l_p and form the smallest segment containing p are tested to see if the edge above p is directed left to right and
Sec. 5.1
Point Location
69
the edge below p from right to left [since the edges belonging to a face (i.e., polygon) of S are in a clockwise order around that face]. This process takes O(n) time and the query points can be pipelined with period 1. This means that while the first query point is being passed from processor P1 to processor P2, a second query point can enter P1 and begin being processed. This way, k points can be located in O(k + n) time. A similar approach is presented in [Akl89a] for point location on a mesh-of-trees. An algorithm that runs on a single tree network with O(n) processors is presented that determines if a point p is inside a polygon Q with n edges. The test for inclusion of a point p in a polygon Q involves counting the number of edges of Q above p that intersect l_p. If this number is odd, p is inside Q; otherwise, p is not inside Q. The algorithm takes O(log n) time. The subdivision is considered to be the union of m polygons, each with at most n edges. For point location, an m x n mesh-of-trees is used such that there are m rows and n columns, each of which is connected as a binary tree. The root processors in each row receive the coordinates of the point p; then each row of processors simultaneously performs the inclusion test in O(log n) time. The time required to give each row of processors the coordinates of p and to report the results is O(log m). If we assume that m = O(n), the running time of this algorithm is O(log n) and the number of processors is O(n²). The query points can be pipelined such that k points require O(k + log n) time to be located in S. An algorithm for locating a set of query points P in a planar subdivision S is presented in [Jeon90] for a mesh of size n^{1/2} x n^{1/2} that runs in O(n^{1/2}) time, where |S| + |P| ≤ n.
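The inclusion test that each row of the mesh-of-trees performs is the standard even-odd rule: count the polygon edges crossing the vertical ray going up from p. A sequential sketch of the test (our illustration only; the cited algorithms distribute it across a row of processors), assuming p does not lie on an edge and shares no x-coordinate with a vertex:

```python
def inside(polygon, p):
    """Even-odd rule: p is inside iff an odd number of edges cross the
    vertical ray going up from p. polygon: vertices in boundary order.
    Assumes p is not on an edge and no vertex has p's x-coordinate."""
    px, py = p
    crossings = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (x1 < px) != (x2 < px):                    # edge spans the line x = px
            y_cross = y1 + (y2 - y1) * (px - x1) / (x2 - x1)
            if y_cross > py:                          # crossing lies above p
                crossings += 1
    return crossings % 2 == 1

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(inside(square, (2, 2)))   # True
print(inside(square, (5, 2)))   # False
```

In the parallel version, each processor of a row tests one edge and the odd/even count is accumulated up the row's tree in O(log n) time.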
Finally, two CCC algorithms for locating n query points in an n-vertex planar subdivision in O(log³ n) time [respectively, O(log² n) time] using O(n) processors [respectively, O(n log n) processors] are described in [Lee89]. These algorithms are based on the segment tree, as are the algorithms in [Atal86b], [Agga88], and [Atal89c]. A randomized algorithm for locating n points in an arrangement of n^γ lines, where γ < 1, that runs in O(log n) probabilistic time on a butterfly network of size n is given in [Reif90]. An algorithm in [Dehn90] solves a slightly different problem: Given a set S of n nonintersecting line segments and a direction d, the next-element search problem finds, for each point p_i in a set of query points p_1, p_2, ..., p_k, the segment in S first intersected by a ray starting at p_i in the direction d. The next-element search problem is solved in O(log² n) time on a hypercube of size O(n log n).

Summary
The table in Figure 5.3 summarizes the results in this section. The results in the table are for arbitrary subdivisions, except that marked with a †, which is for convex subdivisions; that marked with a *, which is for triangulated subdivisions; and that marked with a ‡, which is for monotone subdivisions. In [Jeon90] and [Lee89], n is the size of the subdivision and the number of query points, and in [Akl89a], n is the maximum number of edges around a face of the subdivision. In all other cases, n is the size of the planar subdivision (the number of edges or the number of vertices). In [Chaz84] and [Akl89a], two query-time results are given: The first is the time for one query and the second is
Reference     Model                 Processors    Space        Preprocessing time   Query time
[Agga88]      CREW PRAM             O(n)          O(n log n)   O(log² n)            O(log n)
[Atal86b]     CREW PRAM             O(n)          O(n log n)   O(log n log log n)   O(log n)
[Reif87]      CRCW PRAM (COMMON)    O(n)          O(n)         O(log n)             O(log n)
[Atal89c]     CREW PRAM             O(n)          O(n log n)   O(log n)             O(log n)
[Atal89c]     EREW PRAM             O(n)          O(n log n)   O(log n)             O(log n)
[Dado89]      CREW PRAM             O(n)          O(n)         O(log n)             O(log n)
[Dado89]†     CREW PRAM             O(n)          O(n)         O(log n log* n)      O(log n)
[Dado89]      CREW PRAM             O(n)          O(n)         O(log n)             O(log n)
[Tama90]      EREW PRAM             O(n/log n)    O(n)         O(log n)             -
[Tama90]      CREW PRAM             O(N)          -            -                    O(log n/log N)
[Tama91]‡     EREW PRAM             O(n/log n)    O(n)         O(log n)             O(log n)
[Cole90a]*    CREW PRAM             O(n/log n)    O(n)         O(log n)             O(log n)
[Chaz84]      Linear array          O(n)          -            -                    O(n + k)
[Jeon90]      Mesh                  O(n)          -            -                    O(n^{1/2})
[Akl89a]      Mesh-of-trees         O(n²)         -            -                    O(log n + k)
[Lee89]       CCC                   O(n)          -            -                    O(log³ n)
[Lee89]       CCC                   O(n log n)    -            -                    O(log² n)
[Reif90]      Butterfly             O(n)          -            -                    Õ(log n)

Figure 5.3 Performance comparison of parallel point location algorithms.
the time for k pipelined queries. For the result of [Tama90], preprocessing is done on an EREW PRAM and the queries are executed cooperatively on a CREW PRAM with O(N) processors, 1 ≤ N ≤ n.
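The cooperative-searching bound can be made concrete with a small sequential simulation (our own sketch, not the fractional-cascaded structure of [Tama90]): with N probes per round, a sorted array of n keys is narrowed by a factor of about N + 1 per round, giving the O(log n/log N) query time.

```python
def cooperative_search(arr, key, N):
    """Simulate N processors searching a sorted array cooperatively.
    Each round, N evenly spaced probes split the live interval into N + 1
    pieces, so O(log n / log N) rounds suffice. Returns (index of the first
    element >= key, number of parallel probe rounds used)."""
    lo, hi = 0, len(arr)
    rounds = 0
    while hi - lo > N:
        rounds += 1
        base = lo
        step = (hi - base) // (N + 1)
        for j in range(N):                 # the N "processors" probe in parallel
            q = base + (j + 1) * step
            if arr[q] < key:
                lo = q + 1                 # the answer lies right of probe q
            else:
                hi = q                     # first element >= key is at or before q
                break
    while lo < hi and arr[lo] < key:       # at most N elements remain; finish directly
        lo += 1
    return lo, rounds

print(cooperative_search(list(range(100)), 67, 9))   # (67, 1): one probe round
```

With N = 1 this degenerates to ordinary binary search; larger N trades processors for rounds exactly as in the cooperative-search results quoted above.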
5.2 Range Searching

Given a set of data points in space and a geometric shape (say, a sphere in 3-space) representing a query, it is required to identify those data points that are contained inside the query domain. A variant of this problem calls for simply counting the points contained in the query. This problem, known as range searching, is in some sense a dual to the point location problem. A technique that has been used in solving range searching problems involves constructing multidimensional binary trees (or k-D trees) [Prep85]. A k-D tree splits n points in k-dimensional space into n regions, each with a single point. Initially, a hyperplane is used to divide the space into two along one of the coordinates. The same approach is then used recursively for each of the two subspaces. With a single processor, k-D trees can be constructed in O(n log n) time and stored in O(kn) space such that queries can be performed in O(kn^{1-1/k} + I) time, where I is the size of the output [Prep85]. A parallel algorithm for constructing k-D trees on the scan model of parallel
computation is described in [Blel88]. During a preprocessing step, a rank vector is created for each of the k dimensions, which stores, for each data point, what its position would be had the points been sorted along that dimension. This is followed by O(log n) steps, each of which corresponds to a split of the set of n points. At the beginning of every step, a cutting hyperplane is selected, and for each point it is determined in which of the two resulting subspaces it belongs. This information is then used to split the k rank vectors and generate new ranks for each subspace. On the scan model, each step requires O(k) time, leading to an overall running time of O(k log n). Since n processors are used, the algorithm has optimal cost. A parallel algorithm for range searching is given in [Srid90]. Here n data points in d dimensions are stored in a range tree data structure [Prep85] that is distributed over several independent processor memories. A range tree stores intervals on the x-axis at the nodes of a segment tree [Bent80]. At each node in the segment tree, there is a pointer to a threaded binary tree that stores the points in that interval by increasing y-coordinate. The leaves of a threaded binary tree form a linked list ordered by increasing y-coordinate. The authors show that three-dimensional range searching on n points is possible on a hypercube with O(log² n) processors and O(log n) time. Given a set S of n points in the plane, orthogonal range searching reports all points in S that are inside a query range r, where r is a rectangle with sides parallel to the coordinate axes. It is shown in [Tama90] how an O(n)-size data structure can be constructed in O(log n) time on an EREW PRAM with n processors that allows retrieval of k points in a given orthogonal range. If the items to be retrieved are marked, direct retrieval is used and the items are reported in O(log n/log N + log log n + k/N) time using a CREW PRAM with 1 ≤ N ≤ n processors.
If indirect retrieval is used, in which pointers to a linked list of items to be reported are returned, a query takes O(log n/log N) time on a PRIORITY CRCW PRAM with 1 ≤ N ≤ log² n processors. If log² n ≤ N ≤ n processors are available, the retrieval of k points using indirect retrieval can be done in constant time on a PRIORITY CRCW PRAM.
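A sequential 2-D instance of the k-D tree described above (our own sketch; [Blel88] builds the same structure with parallel rank-vector splits) makes the median-split construction and an orthogonal range query concrete:

```python
def build_kdtree(points, depth=0):
    """Build a 2-D tree by median splits, alternating the splitting axis."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda q: q[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def range_query(node, lo, hi, out):
    """Report all points p with lo[i] <= p[i] <= hi[i] in each coordinate i."""
    if node is None:
        return
    p, a = node["point"], node["axis"]
    if all(lo[i] <= p[i] <= hi[i] for i in range(2)):
        out.append(p)
    if lo[a] <= p[a]:                 # the query box may extend into the left subtree
        range_query(node["left"], lo, hi, out)
    if p[a] <= hi[a]:                 # ... and into the right subtree
        range_query(node["right"], lo, hi, out)

tree = build_kdtree([(1, 1), (2, 5), (4, 2), (6, 6), (7, 3)])
hits = []
range_query(tree, (1, 0), (5, 4), hits)
print(sorted(hits))    # [(1, 1), (4, 2)]
```

The two pruning tests are what yield the O(kn^{1-1/k} + I) sequential query bound quoted above; the parallel constructions distribute exactly these splits over the processors.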
5.3 Problems

5.1. Given a planar subdivision S with n vertices, show how n query points can be located in S on the following models of computation:
(a) Binary tree
(b) Mesh
(c) Hypercube
(d) Pyramid

5.2. You are given a subdivision S of the plane, consisting of n polygons of O(n) edges each.
(a) Develop an EREW PRAM algorithm that runs in O(log n) time using O(n) processors and builds a data structure for point location in S.
(b) Show how the data structure constructed in part (a) can be used by an N-processor CREW PRAM algorithm to locate a point in S in O(log n/log N) time, where 1 ≤ N ≤ n.

5.3. A mesh-of-trees algorithm for point location is described in Section 5.1. Can this algorithm be extended to subdivisions of spaces in dimensions higher than 2?

5.4. Can the algorithm of Problem 5.2 be extended to locate points in subdivisions of spaces in dimensions higher than 2?

5.5. Show how k-D trees for range searching can be implemented efficiently on an EREW PRAM.

5.6. Describe an algorithm for performing d-dimensional range searching on each of the following models of parallel computation:
(a) Hypercube
(b) d-Dimensional mesh
(c) Mesh with reconfigurable buses
(d) Modified CCC

5.7. As defined in Section 5.2, orthogonal range searching reports all points in an n-point planar set that fall inside an isothetic query rectangle. Describe algorithms that perform orthogonal range searching on each of the following models of parallel computation:
(a) Linear array
(b) Mesh-of-trees
(c) Butterfly
(d) Cube-connected-cycles

5.8. Which parallel model of computation is most suitable for range searching through the use of range trees? Why?

5.9. Discuss issues that arise when storing data structures such as the segment tree, the k-D tree, and the range tree on parallel network models.

5.10. Given a set S of points in the plane, the circular range searching problem calls for reporting all the points of S that are inside a query range C, where C is a circle of given radius and center. Design a parallel algorithm that solves the circular range searching problem.

5.11. You are given a set S of nonintersecting line segments in the plane, a direction d, and a query point q. Design a parallel algorithm for finding the last segment (if any) intersected by a half-infinite line starting at q and parallel to the direction d.

5.12. Design a parallel algorithm that determines whether two polygons are similar with respect to affine transformations (i.e., dilatation, translation, and rotation).
5.4 References

[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. O'Dunlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327.
[Akl89a] S. G. Akl, The Design and Analysis of Parallel Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[Atal86b] M. J. Atallah and M. T. Goodrich, Efficient plane sweeping in parallel (preliminary version), Proceedings of the Second Annual ACM Symposium on Computational Geometry, Yorktown Heights, New York, June 1986, 216-225.
[Atal89c] M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading divide-and-conquer: a technique for designing parallel algorithms, SIAM Journal on Computing, Vol. 18, No. 3, 1989, 499-532.
[Bent80] J. L. Bentley and D. Wood, An optimal worst case algorithm for reporting intersections of rectangles, IEEE Transactions on Computers, Vol. C-29, 1980, 571-576.
[Blel88] G. E. Blelloch and J. J. Little, Parallel solutions to geometric problems on the scan model of computation, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 218-222.
[Chaz84] B. Chazelle, Computational geometry on a systolic chip, IEEE Transactions on Computers, Vol. C-33, No. 9, September 1984, 774-785.
[Chaz86] B. Chazelle and L. J. Guibas, Fractional cascading: I. A data structuring technique, Algorithmica, Vol. 1, 1986, 133-162.
[Cole90a] R. Cole and O. Zajicek, An optimal parallel algorithm for building a data structure for planar point location, Journal of Parallel and Distributed Computing, Vol. 8, 1990, 280-285.
[Dado89] N. Dadoun and D. G. Kirkpatrick, Parallel construction of subdivision hierarchies, Journal of Computer and System Sciences, Vol. 39, 1989, 153-165.
[Dehn90] F. Dehne and A. Rau-Chaplin, Implementing data structures on a hypercube multiprocessor, and applications in parallel computational geometry, Journal of Parallel and Distributed Computing, Vol. 8, 1990, 367-375.
[Edel86a] H. Edelsbrunner, L. J. Guibas, and J. Stolfi, Optimal point location in a monotone subdivision, SIAM Journal on Computing, Vol. 15, 1986, 317-340.
[Good89a] M. T. Goodrich, Triangulating a polygon in parallel, Journal of Algorithms, Vol. 10, September 1989, 327-351.
[Jeon90] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on a mesh-connected computer, Algorithmica, Vol. 5, No. 2, 1990, 155-178.
[Kirk83] D. G. Kirkpatrick, Optimal search in planar subdivisions, SIAM Journal on Computing, Vol. 12, No. 1, February 1983, 28-35.
[Lee77] D. T. Lee and F. P. Preparata, Location of a point in a planar subdivision and its applications, SIAM Journal on Computing, Vol. 6, 1977, 594-606.
[Lee89] D. T. Lee and F. P. Preparata, Parallel batched planar point location on the CCC, Information Processing Letters, Vol. 33, 1989, 175-179.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Reif87] J. H. Reif and S. Sen, Optimal randomized parallel algorithms for computational geometry, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 270-277.
[Reif90] J. H. Reif and S. Sen, Randomized algorithms for binary search and load balancing on fixed connection networks with geometric applications (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 327-337.
[Srid90] R. Sridhar, S. S. Iyengar, and S. Rajanarayanan, Range search in parallel using distributed data structures, Proceedings of the International Conference on Databases, Parallel Architectures, and Their Applications, Miami Beach, Florida, March 1990, 14-19.
[Tama90] R. Tamassia and J. S. Vitter, Optimal cooperative search in fractional cascaded data structures, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 307-316.
[Tama91] R. Tamassia and J. S. Vitter, Planar transitive closure and point location in planar structures, SIAM Journal on Computing, Vol. 20, No. 4, August 1991, 708-725.
6 Visibility and Separability

Given a set of geometric objects, a point p is said to be visible from a point q if the line segment with endpoints p and q is not intersected by any object. For example, two points p and q in a simple polygon P are visible from one another if the line segment with endpoints p and q does not intersect any edge of P. Two objects S1 and S2 (polygons, point sets, etc.) in the plane are separable in a direction d if S1 can be translated by an arbitrary distance in the direction d without colliding with S2, and there is a line l oriented such that the translation of S1 is to its left and S2 is to its right. We combine the results for parallel visibility algorithms and parallel separability algorithms in the same chapter because, in some instances, the two problems are related. Application areas of visibility and separability problems include computer graphics, motion planning, robotics, pattern recognition, computer vision, and computer-aided design (CAD)/computer-aided manufacturing (CAM). The results presented in this chapter vary. Given a point p and n nonintersecting line segments in the plane, several algorithms are discussed that find the portion of the plane that is visible from p. Algorithms are presented that find the portions of a polygon (or polygonal chain) P that are visible in a direction d. When d is perpendicular to the x-axis, this problem can be stated as finding the portions of P that are visible from the point p = (0, +∞). A solution to this problem can be used to determine if two polygons are separable in the direction d. In this way, a visibility algorithm is applied to determine separability. An algorithm that finds visibility pairs of line segments is also presented. Finally, an algorithm that solves a separability problem that is not directly related to visibility problems is described.
6.1 Visibility

6.1.1 Visibility Polygon from a Point Inside a Polygon

The visibility polygon from a point p contained inside an n-vertex simple polygon P with holes is that region of P that is visible from the point p. A polygon with holes
Figure 6.1 Polygon P and visibility polygon from point p inside P.
contains pairwise disjoint polygons P1, P2, ..., Pk in its interior such that the interior of Pi, i = 1, ..., k, is exterior to P and the boundaries of Pi, i = 1, ..., k, are part of the boundary of P. Figure 6.1 shows a polygon P, a point p inside P, and the visibility polygon K from p. An algorithm is given in [Asan88] that finds the visibility polygon from a point inside an n-vertex polygon with holes in O(n) time on a linear array of size n, where n is the number of vertices of P plus the number of vertices of each of P1, P2, ..., Pk. Consider the edges of the polygon P and each of Pi, i = 1, ..., k, oriented so that the interior of P is to the left of the edges. That is, the edges of P are oriented in a counterclockwise direction and the edges of Pi, i = 1, ..., k, are oriented in a clockwise direction. The algorithm uses a classification of each vertex v based on the orientation of the edges incident on v. The incoming edge (outgoing edge) of v is the edge oriented toward (away from) v. A vertex v is one of four types:

Type 1. The point p is to the right of the line through the incoming edge of v and to the left of the line through the outgoing edge of v.

Type 2. The point p is to the left of the line through the incoming edge of v and to the right of the line through the outgoing edge of v.
Type 3. The point p is to the left of both the line through the incoming edge and the line through the outgoing edge of v.

Type 4. The point p is to the right of both the line through the incoming edge and the line through the outgoing edge of v.

There are some things to note about the types of vertices. First, because of the orientation of edges, p cannot see a type 4 vertex. Further, p can see a type 1, 2, or 3 vertex v only if the line segment (p, v) does not intersect any edge of P or Pi, i = 1, ..., k. If a vertex v of type 1 or 2 is visible from p, the line segment defined by the ray from v in the direction from p to v, up to its first intersection with a polygonal edge, is an edge of the resulting visibility polygon. Figure 6.1 illustrates an example of each type of vertex. The vertices and the point p are loaded into the O(n) processors in linear time. Each processor can determine the type of its vertex v in constant time if given the directions of the edges incident on v. Edges of P and Pi, i = 1, ..., k, are pipelined into the linear array at the left side one at a time. Each processor that stores a vertex v of type 1, 2, or 3 tests if the line segment (p, v) intersects the given edge, and if so, changes the type of its vertex to 4. If a processor holds a type 1 or 2 vertex, it also tests if the ray r(v) emanating from v in the direction from p to v intersects the given edge and keeps track of the edge that intersects r(v) closest to v. In 2n steps every vertex is compared with every edge and the new edges of the visibility polygon from type 1 and 2 vertices are computed. The final step sorts the vertices in polar angle about p. Note that for the vertices v of type 1 and 2, there will be another point v' with the same polar angle. For type 1 vertices, the order v followed by v' is forced, and for type 2 vertices, the order v' followed by v is forced, to keep the proper orientation of the edges. An algorithm that solves the same problem is given in [Dehn88a].
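Each processor's constant-time type test reduces to two side-of-line evaluations of p against the incoming and outgoing edges. A sketch (our own illustration of the classification, with hypothetical names; general position assumed):

```python
def side(a, b, p):
    """Cross product test: > 0 if p is left of the directed line a->b,
    < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def classify(prev_v, v, next_v, p):
    """Vertex type 1-4 from the sides of p with respect to the incoming
    edge prev_v->v and the outgoing edge v->next_v."""
    left_in = side(prev_v, v, p) > 0      # p left of line through incoming edge
    left_out = side(v, next_v, p) > 0     # p left of line through outgoing edge
    if not left_in and left_out:
        return 1    # p right of incoming, left of outgoing
    if left_in and not left_out:
        return 2    # p left of incoming, right of outgoing
    if left_in and left_out:
        return 3    # p left of both
    return 4        # p right of both; v cannot be visible from p

print(classify((0, 0), (2, 0), (2, 2), (1, -1)))   # -> 1
```

With the boundary oriented so that the interior of P lies to the left, these two sign tests are exactly the information each processor needs before the edge-pipelining phase begins.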
The algorithm of [Dehn88a] uses divide-and-conquer on a mesh and runs in O(n^{1/2}) time using O(n) processors.

6.1.2 Region of a Polygon Visible in a Direction

An algorithm in [Dehn88a] solves the following visibility problem: Given a simple polygon P and a direction d, it is required to find the region of the boundary of P that is visible in the direction d. A point q on the boundary of P is visible in the direction d if the ray from q in the opposite direction of d does not share any point (except q) with the boundary of P. Figure 6.2 shows a polygon P and the part of the boundary of P that is visible in the direction d. The algorithm in [Dehn88a] runs in O(n^{1/2}) time on a mesh of size n. An algorithm for finding the visibility hull of an n-vertex polygon P in direction d that runs in O(n^{1/2}) time on a mesh of size n is also given in [Dehn88a]. The boundary and the interior of the visibility hull VH(P) of P comprise the set of all points that are in the boundary and interior of the polygon P, plus those points that are in the closed regions defined by any line segment (a, b) parallel to d such that a and b are contained in the boundary of P. Figure 6.2 shows the visibility hull of a polygon P. Finally, the algorithm of [Asan88] for finding the visibility polygon of a point inside a polygon can be used to find the portion of a set of polygons that is visible in a given direction.
Figure 6.2 Polygon P, the part of the boundary visible in direction d, and VH(P).
A related problem is to find that portion of a binary (black-and-white) image that is visible in a given direction d. In this case, a pixel x is visible if there are no black pixels obstructing the view of x in the direction d from infinity. An algorithm is given in [Dehn88c] that solves this problem in O(log n) time for an n x n binary image on a hypercube of size n². Given a point p in the plane and an n-vertex polygonal chain PC = v1, v2, ..., vn, where v1 may be the same point as vn, the portion of the chain that is visible from p can be found in O(log n) time using O(n/log n) processors on a CREW PRAM [Atal89a]. If p is the point (0, +∞), this problem is the same as that solved in [Dehn88a] and [Asan88] with d perpendicular to, and directed from above, the x-axis. In [Atal89a] it is assumed that p = (0, +∞), that no two consecutive line segments of PC are collinear, that v1 ≠ vn, and that no segment is vertical. The general case can be added to the solution with little difficulty. PC consists of n segments ordered in a walk along PC from v1 to vn. Let Vis(PC) be the visibility chain of PC from the point p = (0, +∞). Figure 6.3 shows PC and Vis(PC), which consists of the parts of PC visible from p and vertical line segments joining them. A procedure VisChain that computes Vis(PC) is recursive and has parameters (C, d), where C is a polygonal chain of length m, and there are max(1, m/d) processors available. It is called initially with parameters PC and log n. In each call to VisChain, m and d are compared, and if m ≤ d, a sequential
Figure 6.3 PC and Vis(PC).
algorithm is used to compute Vis(C) in O(m) time with one processor. If d < m ≤ d², then C is divided into C1 and C2 such that |C1| = |C2|, and VisChain is called recursively in parallel with VisChain(C1, d) and VisChain(C2, d). After returning from the recursive call, Vis(C) is computed from Vis(C1) and Vis(C2) in O(log² m) time using one processor. If m > d², then C is divided into g = (m/d)^{1/4} subchains of length m^{3/4}d^{1/4} each, and g recursive calls of the form VisChain(Ci, d), i = 1, 2, ..., g, are made in parallel. Then Vis(C) is computed from Vis(Ci), i = 1, ..., g, in O(log m) time using m/d processors. We give a brief description of how to compute Vis(C) from Vis(Ci), i = 1, ..., g, in O(log m) time using m/d processors. Since the Vis(Ci)'s are monotone with respect to the x-axis, each Vis(Ci) can be represented by a binary tree that can be searched by x-coordinate in time proportional to the height of the tree using one processor. Note that Ci is a subchain of C. It is shown that Vis(Ci) ∩ Vis(C) has at most three connected components; that is, at most three separate portions of Vis(Ci) appear in the chain Vis(C). If Ci is a subchain at the beginning or end of C, then Vis(Ci) ∩ Vis(C) has at most two connected components. Because of this fact, the tree representing Vis(Ci) can be split a constant number of times to remove those vertices of Vis(Ci) that do not appear in Vis(C). Over all Ci, the O(g) splits result in g' ≤ 3g = O(g) trees. These trees are then used to build Vis(C) such that the tree representing Vis(C) is log(3g) = O(log m) levels higher than the highest tree representing Vis(Ci), i = 1, ..., g. Figure 6.4 shows a case where Vis(Ci) ∩ Vis(C) has three connected components and Figure 6.5 shows a case where Vis(Ci) ∩ Vis(C) has two connected components. For each subchain Ci, the part of C that is "before" Ci in a walk along C is called Bi and the part "after" Ci is called Ai; that is, Ai is the concatenation of Ci+1, Ci+2,
..., Cg and Bi is the concatenation of C1, C2, ..., Ci-1. The portions of Vis(Ci) hidden by Ai and those portions hidden by Bi are found by exploiting the following facts: There can be at most two intersections between Vis(Ai) [similarly, Vis(Bi)] and Vis(Ci), and there can be at most three portions of Vis(Ci) hidden by
Figure 6.4 Vis(Ci) ∩ Vis(C) has three connected components.
the two chains Ai and Bi. Computing the portions of Vis(Ci) hidden by Ai and Bi in parallel for each 1 ≤ i ≤ g gives the portions of Vis(Ci) that belong to Vis(C). The portions of the Vis(Ci)'s that are visible in Vis(C) are then combined by sorting their O(g) endpoints. Sorting can be done in O(log g) = O(log m) time using O(g) processors. Therefore, the visible portions of a polygonal chain of size n can be found in O(log n) time using O(n/log n) processors. The total running time and the total number of processors are t(m, log m) = O(log m) and p(m, log m) = O(m/log m), respectively; thus the time to compute Vis(PC) is O(log n) using O(n/log n) processors. It is shown in [Atal91b] how the same result can be obtained on the weaker EREW PRAM model of computation.
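For contrast with the O(log n)-time, O(n/log n)-processor bound, the underlying visibility test itself is simple: a point of the chain is visible from p = (0, +∞) exactly when no chain segment passes strictly above it at its abscissa. A quadratic brute-force check of the vertices (our own illustration only, under the same no-vertical-segment assumption):

```python
def y_on_segment(a, b, x):
    """y-coordinate of segment ab at abscissa x, or None if x is outside
    the segment's x-range (vertical segments are excluded by assumption)."""
    (x1, y1), (x2, y2) = (a, b) if a[0] <= b[0] else (b, a)
    if x1 == x2 or not (x1 <= x <= x2):
        return None
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def visible_vertices(chain):
    """For each vertex v of the polygonal chain, report whether v is visible
    from p = (0, +infinity), i.e., whether no chain segment passes strictly
    above v at v's abscissa. O(n^2) brute force, for illustration only."""
    result = []
    for v in chain:
        blocked = False
        for j in range(len(chain) - 1):
            y = y_on_segment(chain[j], chain[j + 1], v[0])
            if y is not None and y > v[1] + 1e-9:
                blocked = True
                break
        result.append((v, not blocked))
    return result

chain = [(0, 0), (3, 0), (2, 2), (0, 2)]
print(visible_vertices(chain))
# (0, 0) is hidden below the segment from (2, 2) to (0, 2); the rest are visible
```

The VisChain procedure computes this same information, plus the connecting vertical edges of Vis(PC), exponentially faster by merging the monotone visibility chains of subchains.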
Figure 6.5 Vis(Ci) ∩ Vis(C) has two connected components.
6.1.3 Visibility of the Plane from a Point

Given a set of n opaque nonintersecting line segments, the problem is to determine all parts of the plane visible from a point p in the plane. The line segments may intersect at their endpoints. Figure 6.6 shows a set of line segments and the part of the plane visible from the point p = (0, +∞). A technique called critical-point merging is used in [Atal86b] to solve this problem in O(log n log log n) time using O(n) processors on a CREW PRAM. This result is improved in [Atal89c], where cascading divide-and-conquer is used to solve the problem in O(log n) time using O(n) processors on a CREW PRAM. It is also shown in [Atal89c] how the same time and processor bounds can be obtained on an EREW PRAM with space requirements increased by a factor of O(log n). A randomized algorithm is given in [Reif87] that finds the visibility region in O(log n) probabilistic time using O(n) processors on a CREW PRAM with high probability. An algorithm to solve the same problem on a linear array of size N is given in [Atal89d] and runs in O(n log n/log N) time using O(N) processors, where N ≤ n. Let S be the set of n line segments and let p be the point (0, +∞). Observe that the portion of S that is visible from p, Vis(S), is a chain monotone with respect to the x-axis, and it describes the region of the plane that is visible from p. Note that |Vis(S)| ≤ 2n. The set of line segments is partitioned arbitrarily into N subsets S1, S2, ..., SN of size n/N each, and the visibility problem is solved recursively for each set to obtain Vis(Si),
Figure 6.6 Line segments and part of the plane visible from p = (0, +∞).
i = 1, 2, ..., N. The termination step of the recursion checks if n ≤ N and, if so, finds the visibility region of the set in O(n) time. This can be accomplished by projecting the line segments on the x-axis and sorting the 2n - 1 intervals in O(n) time. The intervals are stored in the linear array such that, as the line segments are pipelined through, each processor tests whether its interval is visible from p. For the merge step, note that |Vis(Si)| ≤ 2n/N. The set ∪ Vis(Si), i = 1, ..., N, is then subdivided into m/N regions, each of size O(N), by m/N vertical lines, where m = Σ |Vis(Si)| ≤ 2n. In O(N) time, the portion of Vis(S) that lies between each pair of consecutive vertical lines can be found using a linear array with O(N) processors. An algorithm is presented in [MacK90a] that solves the same problem on a hypercube of size n in O(SORT(n)) time, where SORT(n) is the time needed to sort n numbers on an O(n)-processor hypercube. The algorithm uses multiway divide-and-conquer, where solutions are merged in O(SORT(n)) time. At the time of this writing, the fastest known sorting algorithm on a hypercube of size n has a running time of O(log n log log n) [Cyph90, Leig91]. Using the latter, the visibility algorithm of [MacK90a] runs in O(log n log log n) time. A randomized algorithm is given in [Reif90] that solves the problem of determining which of a set of nonintersecting line segments are visible from (0, +∞) by using trapezoidal decomposition (described in Chapter 10) in O(log n) probabilistic time on an O(n)-processor butterfly.

6.1.4 Visibility Pairs of Line Segments

An algorithm is given in [Lodi86] that finds visibility pairs of line segments in a set of vertical line segments and runs in O(log n) time on a mesh-of-trees of size n². A pair
Figure 6.7 Vertical line segments with visibility pairs joined by horizontal line segments, and the representation of these segments in a mesh-of-trees. Those cells with no label have label 0.

of vertical line segments s_i and s_j form a visibility pair if there exists a horizontal line that intersects s_i and s_j and does not intersect any other segment lying between s_i and s_j. Each segment s_i is described by three values, s_i(x), s_i(t), and s_i(b), where s_i(x) is the x-value of s_i, and s_i(t) and s_i(b) are the y-values of the top and bottom endpoints of s_i, respectively. Figure 6.7 shows a set of vertical line segments with visibility pairs marked. The first step of the algorithm is to sort the segments by s_i(x), then to sort the s_i(t)'s and the s_i(b)'s. A line segment s_i can be represented by three values: x_i, t_i, and b_i, where x_i ∈ {1, ..., n} is the index of s_i(x) in the sorted list of x-values, and t_i, b_i ∈ {1, ..., 2n} are the indices of s_i(t) and s_i(b), respectively, in the sorted list of y-values. Sorting can be done on a mesh-of-trees of size n² in O(log n) time [Akl85b]. Segments with the same x-values are assigned unique consecutive x-values, which, upon consideration, does not affect the visibility problem. Endpoints with identical y-values are unchanged. The base of the mesh-of-trees is a 2n x n mesh. For each segment s_i, the label T is stored at processor (t_i, x_i) of the base, the label B is stored at processor (b_i, x_i), and the label I is stored at the processors in column x_i between t_i and b_i. All other processors store the label 0. Figure 6.7 illustrates a set of line segments and the labels for the set stored in a 2n x n mesh. There are four phases in the algorithm. In the first phase, visibility pairs are
detected between the T's in each row and labels I, T, or B to their right by passing labels up the tree until pairs of labels meet at a lowest common ancestor. Information about which child node (right or left) the label came from is also passed up the tree. This is done for each row in parallel. Symmetrically, in the second phase, pairs are detected between T's and I's, T's, or B's to their left. There is one line segment per column and each is labeled with the binary representation of the column number. When a visibility pair si, sj is detected to the right of a T (where the label T is in the column of si), the value j is sent to column i by passing the least significant bits of the binary representation of j that are different from i using the tree links.

In the third and fourth phases, bottom endpoints are considered. Bottom endpoints can generate at most one visibility pair: the two line segments on either side of a bottom endpoint form a visibility pair. In the third phase, the nearest T or I to the right of each B is detected by passing labels up the tree in parallel for each row. The fourth phase symmetrically detects the nearest T or I to the left of each B. When a visibility pair si, sj is detected such that B is in column k, i < k < j, the least significant bits of the binary representation of j (in the third phase) and i (in the fourth phase) that are different from k are sent to column k. Only those columns that receive two distinct values report the two values as a visibility pair. In each phase of the algorithm, at most one visibility pair is generated per row and each line segment contributes to at most one pair; therefore, at most n pieces of information are exchanged by the columns in parallel. Details of the architecture of the processors (or nodes) and their connections are covered in [Lodi86].

Finally, the following two visibility problems are discussed in [Good90] and algorithms for their solutions on the CREW PRAM are provided:

1.
Given a simple polygon P with n vertices, it is shown how to compute the portion of P visible from a specified edge e of P in O(log n) time using O(n) processors.

2. Given a simple polygon P with n vertices, the visibility graph G of P is defined on the vertices of P such that node v is connected to node w in G if and only if vertex v is visible from vertex w in P. It is shown how G can be computed in O(log n) time using O(n log n + m) processors, where m is the number of arcs in G.
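The visibility-pair condition of Section 6.1.4 can also be checked directly by a brute-force sequential program, which makes a useful baseline against the O(log n) mesh-of-trees algorithm of [Lodi86]. A minimal Python sketch: segments are given as (x, bottom, top) triples with distinct x-values, single-point y-overlaps are ignored for simplicity, and all names are illustrative:

```python
# Brute-force check of the visibility-pair definition (a baseline, not the
# O(log n) mesh-of-trees algorithm of [Lodi86]). Segments are (x, bottom, top)
# triples with distinct x-values; all names here are illustrative.

def _covered(lo, hi, blockers):
    """Does the union of the blocker y-intervals cover [lo, hi] entirely?"""
    reach = lo
    for b_lo, b_hi in sorted(b for b in blockers if b[0] < hi and b[1] > lo):
        if b_lo > reach:
            return False          # a gap: some horizontal line passes through
        reach = max(reach, b_hi)
    return reach >= hi

def visibility_pairs(segs):
    order = sorted(range(len(segs)), key=lambda i: segs[i][0])
    pairs = []
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            i, j = order[a], order[b]
            lo = max(segs[i][1], segs[j][1])
            hi = min(segs[i][2], segs[j][2])
            if lo >= hi:
                continue          # y-ranges share at most a point
            # segments strictly between si and sj in x-order
            between = [(segs[k][1], segs[k][2]) for k in order[a + 1:b]]
            if not _covered(lo, hi, between):
                pairs.append((i, j))
    return pairs
```

For each pair taken in left-to-right order, the y-overlap survives unless the segments strictly between them cover it completely; any surviving gap admits a horizontal line of sight.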
6.2 Separability Two separation problems that apply results in visibility are: 1. Given two n-vertex simple polygons, determine if they are separable in a given direction d. 2. Given m n-vertex simple polygons, determine if they are sequentially separable in a given direction d. A set S of m nonintersecting polygons is sequentially separable in a direction d if S can be moved to infinity by translating each polygon in S in the direction d such that no collisions between the polygons occur. Algorithms to solve both of these problems on a mesh are presented in [Dehn88a].
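For intuition on problem 1, here is a sequential sketch for the special case of two convex polygons; it is not the mesh algorithm of [Dehn88a]. After rotating so that d points along the +x axis, P is separable from Q in direction d exactly when Q never lies to the right of P at a shared height. The sampling at vertex heights and midpoints, the tolerance, and all names are choices made for this sketch:

```python
# Sequential convex-case sketch of direction-d separability (not the mesh
# algorithm of [Dehn88a]). Polygons are lists of (x, y) vertices; for disjoint
# convex polygons, Q is either entirely left or entirely right of P at each
# shared height, so P escapes toward +x iff Q is never on the right.
import math

def _rotate(poly, d):
    ang = math.atan2(d[1], d[0])
    c, s = math.cos(-ang), math.sin(-ang)
    return [(x * c - y * s, x * s + y * c) for x, y in poly]

def _xrange_at(poly, y):
    """x-interval occupied by the convex polygon at height y, or None."""
    xs = []
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if min(y1, y2) <= y <= max(y1, y2) and y1 != y2:
            xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
        elif y1 == y2 == y:
            xs.extend([x1, x2])
    return (min(xs), max(xs)) if xs else None

def separable(P, Q, d):
    P, Q = _rotate(P, d), _rotate(Q, d)
    ys = sorted({y for _, y in P} | {y for _, y in Q})
    samples = ys + [(a + b) / 2 for a, b in zip(ys, ys[1:])]
    for y in samples:
        ip, iq = _xrange_at(P, y), _xrange_at(Q, y)
        if ip and iq and iq[1] > ip[0] + 1e-9:
            return False        # Q sticks out to the right of P at height y
    return True
```

For disjoint convex polygons the left/right order at a shared height cannot flip without the boundaries crossing, so checking the sampled heights suffices in this convex setting.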
The algorithm to solve the first problem runs in O(n^{1/2}) time on a mesh of size n, and the one for the second problem runs in O((mn)^{1/2}) time on a mesh of size mn. To determine if two n-vertex polygons P and Q are separable in a given direction d, the algorithm in [Dehn88a] first finds the visibility hulls of P and Q, VH(P) and VH(Q), respectively, in direction d. If VH(Q) ∩ VH(P) = ∅, then P is separable from Q in the direction d. Figure 6.2 shows a polygon and its visibility hull in a given direction.

An algorithm is given in [Sark89b] to solve the polygon separation problem that runs in O(log n) time on a CREW PRAM with O(n) processors. This result is not directly related to visibility problems. Given two sets of points S1 and S2 in the plane such that |S1| + |S2| = n, the polygon separation problem is to construct a convex k-gon that separates S1 from S2 such that k is minimized.

Summary

The table in Figure 6.8 summarizes the results in this chapter. The following symbols are used in the table: p for points, P or Pi for polygons, Si for sets of points, and d for a direction. Note that m is the number of arcs in the visibility graph. As the table shows, the visibility polygon from a point inside a simple n-vertex polygon with holes can be found on a mesh of size n in O(n^{1/2}) time and on a linear array of size n in O(n) time. Since there are sequential algorithms that solve the problem in O(n log h) time, where h is the number of holes in the simple polygon [Asan85], neither result gives a good speedup. The question therefore remains open as to whether this problem can be solved in O(log h) time using a linear number of processors.
6.3 Problems

6.1. Design a parallel algorithm for computing the visibility polygon from a point inside a polygon with holes on a hypercube computer.

6.2. Develop a mesh-of-trees algorithm for computing the region of the boundary of a simple polygon visible in a certain direction d.

6.3. Given a set of n opaque nonintersecting line segments and a point p in the plane, design a parallel algorithm for determining all parts of the plane visible from p on a tree computer.

6.4. Show how the visibility pairs in a set of vertical line segments can be computed on a mesh of processors.

6.5. Design a parallel algorithm for computing the visibility polygon of a simple n-vertex polygon P from a convex polygon inside P.

6.6. Given two simple polygons with n vertices each, it is required to determine if they are separable in a certain direction d. How fast can this problem be solved on a PRAM?

6.7. It is required to determine if m simple polygons with n vertices each are sequentially separable in a certain direction d. Is there an advantage to using the CREW PRAM over the EREW PRAM in solving this problem?

6.8. Assume that you are given a CRCW PRAM (with your chosen conflict resolution rule,
Description                                        Reference                Model                  Processors        Running time
Visibility polygon from a point p inside P         [Dehn88a]                Mesh                   O(n)              O(n^{1/2})
                                                   [Asan88]                 Linear array           O(n)              O(n)
Polygon visible from p in d                        [Asan88]                 Linear array           O(n)              O(n)
                                                   [Dehn88a]                Mesh                   O(n)              O(n^{1/2})
Visibility of a polygon from an edge               [Good90]                 CREW PRAM              O(n)              O(log n)
Visibility graph of a polygon                      [Good90]                 CREW PRAM              O(n log n + m)    O(log n)
Plane visible from p with n opaque line segments   [Atal86b]                CREW PRAM              O(n)              O(log n log log n)
                                                   [Atal89c]                CREW PRAM              O(n)              O(log n)
                                                   [Atal89c]                EREW PRAM              O(n)              O(log n)
                                                   [Reif87]                 CREW PRAM              O(n)              O(log n)
                                                   [Atal89d]                Linear array           O(N)              O(n log n / log N), N < n
                                                   [MacK90a] with [Leig91]  Hypercube              O(n)              O(log n log log n)
Polygonal chain visible from p                     [Atal89a]                CREW PRAM              O(n/log n)        O(log n)
Visibility pairs of vertical line segments         [Lodi86]                 Mesh-of-trees          O(n^2)            O(log n)
S1 and S2 separated by a convex k-gon              [Sark89b]                CREW PRAM              O(n)              O(log n)
P1 separated from P2                               [Dehn88a]                Mesh                   O(n)              O(n^{1/2})
One polygon separated from many                    [Dehn88a]                Mesh                   O(mn)             O((mn)^{1/2})

Figure 6.8 Performance comparison of parallel visibility and separability algorithms.

i.e., COMMON, PRIORITY, ARBITRARY, etc.). How fast can you solve the polygon separation problem?

6.9.
Given a set P of points in the plane, a set S of disjoint line segments, and a straight line L outside the convex hull of P ∪ S, it is required to find, for each point on the line L, the point of P, if any, which is visible from and closest to that point of L among all points of P. Design an algorithm for solving this problem on each of the following models of parallel computation:
(a) Linear array
(b) Cube-connected cycles
(c) EREW PRAM
(d) Scan
6.10. In this chapter we defined visibility in terms of straight lines (i.e., we said that two points p and q can see each other if the straight-line segment connecting them does not intersect any other object). The notion of visibility can also be extended to encompass the notion
of reachability. Here, point p can reach point q if there is a path from p to q that does not intersect any other object. One can make this definition more precise by specifying the kind of path allowed. For example, the path from p to q may only be a convex chain, or a staircase function, or a circular arc, or something similar. The reachability kernel of a simple polygon P is the set of points in P that can see all other points in P according to this new definition of visibility. Discuss parallel algorithms for computing the reachability kernel of a simple polygon.

6.11. Two robots, each represented by a point in the plane, are moving at the same speed inside a room, represented by a simple polygon. There are also stationary objects in the room, each represented by a simple polygon. These objects are considered as obstacles that may prevent one robot from being visible to the other. Given the initial and final positions of the two robots, from which they can see each other, it is required to find a pair of paths from the initial to the final positions that satisfies the following two conditions:
(a) The robots are visible to one another at all times.
(b) The sum of the two path lengths is a minimum.
Design a parallel algorithm for solving this problem.

6.12. Given two sets A and B of points in the plane, a ham sandwich cut is a line h with the property that at most half of the points in A and half of the points in B lie on the same side of h. Design a parallel algorithm for computing a ham sandwich cut.
6.4 References

[Akl85b] S. G. Akl, Parallel Sorting Algorithms, Academic Press, Orlando, Florida, 1985.
[Asan85] T. Asano, An efficient algorithm for finding the visibility polygon for a polygonal region with holes, Transactions of the IECE of Japan, Vol. E-68, No. 9, 1985, 557-559.
[Asan88] T. Asano and H. Umeo, Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region, Parallel Computing, Vol. 6, 1988, 209-216.
[Atal86b] M. J. Atallah and M. T. Goodrich, Efficient plane sweeping in parallel (preliminary version), Proceedings of the Second Annual ACM Symposium on Computational Geometry, Yorktown Heights, New York, June 1986, 216-225.
[Atal89a] M. J. Atallah and D. Z. Chen, An optimal parallel algorithm for the visibility of a simple polygon from a point (preliminary version), Proceedings of the Fifth Annual ACM Symposium on Computational Geometry, Saarbrücken, Germany, June 1989, 114-123.
[Atal89c] M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading divide-and-conquer: a technique for designing parallel algorithms, SIAM Journal on Computing, Vol. 18, No. 3, 1989, 499-532.
[Atal89d] M. J. Atallah and J.-J. Tsay, On the parallel-decomposability of geometric problems, Proceedings of the Fifth Annual ACM Symposium on Computational Geometry, Saarbrücken, Germany, June 1989, 104-113.
[Atal91b] M. J. Atallah, D. Z. Chen, and H. Wagener, An optimal parallel algorithm for the visibility of a simple polygon from a point, Journal of the ACM, Vol. 38, No. 3, July 1991, 516-533.
[Cyph90] R. Cypher and C. G. Plaxton, Deterministic sorting in nearly logarithmic time on the hypercube and related computers, Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, Baltimore, May 1990, 193-203.
[Dehn88a] F. Dehne, Solving visibility and separability problems on a mesh-of-processors, The Visual Computer, Vol. 3, 1988, 356-370.
[Dehn88c] F. Dehne, Q. T. Pham, and I. Stojmenović, Optimal Visibility Algorithms for Binary Images on the Hypercube, Technical Report TR-88-27, Computer Science Department, University of Ottawa, Ottawa, Ontario, October 1988.
[Good90] M. T. Goodrich, S. B. Shauck, and S. Guha, Parallel methods for visibility and shortest path problems in simple polygons, Proceedings of the Sixth Annual ACM Symposium on Computational Geometry, Berkeley, California, June 1990, 73-82.
[Leig91] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, California, 1991.
[Lodi86] E. Lodi and L. Pagli, A VLSI solution to the vertical segment visibility problem, IEEE Transactions on Computers, Vol. C-35, No. 10, October 1986, 923-928.
[MacK90a] P. D. MacKenzie and Q. F. Stout, Asymptotically efficient hypercube algorithms for computational geometry, Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, October 1990, 8-11.
[Reif87] J. H. Reif and S. Sen, Optimal randomized parallel algorithms for computational geometry, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 270-277.
[Reif90] J. H. Reif and S. Sen, Randomized algorithms for binary search and load balancing on fixed connection networks with geometric applications (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 327-337.
[Sark89b] D. Sarkar and I. Stojmenović, An Optimal Parallel Algorithm for Minimum Separation of Two Sets of Points, Technical Report TR-89-23, Computer Science Department, University of Ottawa, Ottawa, Ontario, July 1989.
7 Nearest Neighbors
In many applications, physical or mathematical objects are represented as points in multidimensional space, and an appropriate metric is used to measure the distance between pairs of points. Proximity problems arise naturally in these applications. For example, in cluster analysis, a number of entities are grouped together if they are sufficiently "close" to one another. Similarly, in classification theory, a pattern to be classified is assigned to the class of its "closest" (classified) neighbor. Computational geometric tools have been used successfully in this context. The purpose of this chapter is to provide an overview of parallel algorithms for proximity problems.
7.1 Three Proximity Problems

The three most studied proximity problems for a set S of n points in the plane are:

a. All nearest neighbors (ANN). Determine for each point of S its closest neighbor, also in S. A popular way to find all nearest neighbors of a point set S is first to compute the Voronoi diagram of S. Because of the importance of the Voronoi diagram, a separate chapter, Chapter 8, is dedicated to it. Without preprocessing, the ANN problem has a sequential lower bound of Ω(n log n) [Prep85]. Figure 7.1 shows a set of points and an arrowed edge connecting each point to its nearest neighbor in the set.

b. Query nearest neighbor (QNN). Determine which point of S is closest to a query point q. This problem has an Ω(n) sequential lower bound. Figure 7.2 illustrates a set of points S, a query point q, and the nearest point in S to q. Often, one is interested in preprocessing S to enable fast nearest-neighbor queries. For example, the Voronoi diagram of S can be computed and then searched to find the nearest neighbor in S of a set of query points. Algorithms for this problem that involve preprocessing are not discussed in this chapter.

c. Closest pair (CP). Determine which two points of S are closest to one another. This problem has a sequential lower bound of Ω(n log n) [Prep85]. Note that
Figure 7.1 For each point p, there is an arrow going from p to its nearest neighbor.

Figure 7.2 Set of points S, query point q, and nearest point in S to q.
algorithms that solve the ANN problem can be used to solve the CP problem by applying a minimum function. Figure 7.3 shows a set of points and the closest pair of points in the set.

Solutions to these problems for the linear array and mesh, each with O(n) processors, are described in [Chaz84] and [Mill89b], which require O(n) and O(n^{1/2}) time, respectively. Consider the points of S distributed one per processor on a mesh or linear array of size n. For the QNN problem on a linear array, the query point q travels from left to right in O(n) time, and the distance from q to the point stored at each processor is computed, keeping track of the smallest distance [Chaz84]. On a mesh, a copy of the query point q is broadcast in O(n^{1/2}) time to all processors and the distance from q to each point is computed. The minimum over all these results is then computed in O(n^{1/2}) time [Mill89b].
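As a point of comparison for the parallel solutions discussed in this chapter, all three problems have straightforward brute-force sequential baselines; a Python sketch (Euclidean distance, illustrative names):

```python
# Brute-force sequential baselines for the three proximity problems.
# qnn is the O(n) scan that the linear-array algorithm parallelizes; cp
# applies a minimum to the ANN output, as remarked in the text.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def qnn(S, q):                      # query nearest neighbor: O(n)
    return min(S, key=lambda p: dist(p, q))

def ann(S):                         # all nearest neighbors: O(n^2)
    return {p: min((r for r in S if r != p), key=lambda r: dist(p, r))
            for p in S}

def cp(S):                          # closest pair: ANN plus a minimum
    nn = ann(S)
    return min(nn.items(), key=lambda pr: dist(pr[0], pr[1]))
```

These costs, O(n) for QNN and O(n^2) for ANN and CP, are what the parallel algorithms below improve upon.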
In [Chaz84] the ANN problem on a linear array of size n is solved by using
Figure 7.3 Set of points and closest pair.
a foldover operation. Assume that the processors are numbered from left to right as 1, 2, ..., n and that the point pi is stored at processor i, i = 1, 2, ..., n. The point stored in processor 1 is sent to the right, and when it is being passed to processor 4, the point in processor 2 is sent to the right. At processor i, the moving point pj computes its distance to pi and keeps track of the point whose distance to pj is smallest, and that distance. In this way, the ANN problem is solved in O(n) time.

In [Mill89b], the ANN problem is solved on a mesh of size n^{1/2} x n^{1/2}. The points are sorted and the plane is divided into five vertical slabs such that there are n/5 points in each slab. The slab boundaries are defined by lines through every (n/5)th point. The problem is solved recursively in each slab, then the process is repeated for the case where the slabs are horizontal. Each point knows its nearest neighbor within the slabs. It is shown that at most 128 extra tests must be performed to find the nearest point to every point in the set. The running time of the algorithm is O(n^{1/2}). The CP problem is solved in O(n^{1/2}) time by finding all nearest neighbors, then performing a minimum operation on those values in O(n^{1/2}) time.

It is shown in [Atal89d] how a linear array with O(N) processors can solve the ANN problem in O(n log n / log N) time when N < n. Let d(p, q) denote the Euclidean distance between two points p and q of S. Initially, the set of points is sorted by increasing x-coordinate in O(n log n / log N) time, as described in [Lee81]. The algorithm itself is recursive and proceeds as follows:

Step 1. If |S| < 5N, the problem is solved directly by the algorithm of [Chaz84].

Step 2. Otherwise, vertical lines are used to split S into N subsets S1, S2, ..., SN. The problem is then solved recursively for each Si, to obtain:
1. For every p in Si, the point in Si closest to p, call it N1(p).
2. The points in Si sorted by decreasing y-coordinates.

Step 3. For each p in S, a point N2(p) is found such that if p is in Si, then N2(p) is the point closest to p among all the points that are:
1. In S - Si.
2. Below p.
3. Closer to p than d(p, N1(p)).
Step 4. For each p in S, a point N3(p) is found such that if p is in Si, then N3(p) is the point closest to p among all the points that are:
1. In S - Si.
2. Above p.
3. Closer to p than d(p, N1(p)).

Step 5. For each p in S, the closest point to p is one of the points N1(p), N2(p), and N3(p).

Step 6. The elements in S are sorted by y-coordinate by merging the N sorted lists returned in step 2 in O(n) time, using the algorithm of [Atal88b].

The sorting in step 6 is needed to perform a "sweep" of the points of S during steps 3 and 4. Since steps 3, 4, and 6 can be implemented in O(n) time, the algorithm has a running time of t(n) = N t(n/N) + cn for some constant c, implying that t(n) = O(n log n / log N). This result is shown in [Atal89d] to hold for processor arrays of higher dimensions.

An O(n)-processor hypercube is used in [Stoj88a] to solve the CP problem in O(log^2 n) time. In [MacK90a] it is shown how the ANN and the CP problems can be solved in O(SORT(n) log log n) time on a hypercube of size n, where SORT(n) is the time needed to sort n numbers on an O(n)-processor hypercube. At the time of this writing, the fastest known sorting algorithm on a hypercube of size n has running time O(log n log log n) [Cyph90, Leig91]. Using this sorting algorithm, the algorithms of [MacK90a] that solve the ANN and CP problems run in O(log n (log log n)^2) time.

Naturally, algorithms for solving proximity problems on PRAMs with O(n) processors have also been developed. An O(log n log log n) time CREW PRAM algorithm for the CP problem is described in [Atal86a]. There the set of points is divided into n^{1/2} sets of size n^{1/2} each and the CP problem is solved recursively in each set. This multiway divide-and-conquer idea was also used in [Atal86a] to find the convex hull of a set of points as described in Chapter 3.
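The O(n log n / log N) bound for the [Atal89d] linear-array algorithm above follows by unrolling its recurrence; a short derivation, with c the constant from the text:

\[ t(n) = N\,t(n/N) + cn. \]

Unrolling k levels gives subproblems of size $n/N^k$ and a cost of $cn$ per level, so

\[ t(n) = cnk + N^k\,t(n/N^k). \]

The recursion bottoms out after $k = \log_N n = \log n / \log N$ levels, where each subproblem is solved directly in time linear in its size, for a total of

\[ t(n) = O\!\left(\frac{n \log n}{\log N}\right). \]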
By using the powerful method of cascading divide-and-conquer, it is shown in [Cole88a] how the ANN problem can be solved in O(log n) time on an O(n)-processor CREW PRAM. In fact, the same running time is achieved in [Cole88a] on the EREW PRAM, although the memory requirements are increased by a factor of O(log n). The set S of n points is sorted by x-coordinate and stored at the leaves of a binary tree T. At each node v in T, the set of points stored at descendant leaves of v is kept in an array Y(v) sorted by y-coordinate. For each point q ∈ S, a label b(q) holds the name of the point that is closest to q of all the points that q has "encountered" during a cascading merge procedure. While cascading up the tree, q encounters points that have x-values near the x-value of q since the points are stored at the leaves in increasing order of x-value, and of those points, the ones with y-value close to the y-value of q are found in the lists Y(v) for each v ∈ T. The nearest-neighbor ball B(q) is defined for each q ∈ S to be the disk centered at q with radius equal to the distance from q to the point named in b(q). Then, using the cascading divide-and-conquer technique, for each point q a list C(q) is constructed that
Problem   Reference                 Model                    Processors     Running time
QNN       [Chaz84]                  Linear array             O(n)           O(n)
          [Mill84a], [Mill89b]      Mesh                     O(n)           O(n^{1/2})
ANN       [Chaz84]                  Linear array             O(n)           O(n)
          [Mill84a], [Mill89b]      Mesh                     O(n)           O(n^{1/2})
          [Atal89d]                 Linear array             O(N)           O(n log n / log N), N < n
          [MacK90a] with [Leig91]   Hypercube                O(n)           O(log n (log log n)^2)
          [Cole88a]                 CREW PRAM                O(n)           O(log n)
          [Cole88a]                 EREW PRAM                O(n)           O(log n)
          [Cole88a]†                EREW PRAM                O(n/log n)     O(log n)
CP        [Mill84a], [Mill89b]      Mesh                     O(n)           O(n^{1/2})
          [Stoj88a]                 Hypercube                O(n)           O(log^2 n)
          [MacK90a] with [Leig91]   Hypercube                O(n)           O(log n (log log n)^2)
          [Atal86a]                 CREW PRAM                O(n)           O(log n log log n)
          [Stou88]                  CRCW PRAM (COLLISION)    O(n)           O(1) expected

Figure 7.4 Performance comparison of parallel QNN, ANN, and CP algorithms.
stores those points of S that may have q as their nearest neighbor. It is shown that C(q) contains no more than six points; thus the lists can be searched to find which point has q as its nearest neighbor in O(1) time with O(n) processors, and the entire algorithm takes O(log n) time. An algorithm in [Cole88a] solves the ANN problem for the case when the points are vertices of a convex polygon in O(log n) time using O(n/log n) processors on an EREW PRAM model. Finally, an O(1) expected time COLLISION CRCW PRAM algorithm for the CP problem is derived in [Stou88] under the assumption that the n points are chosen uniformly and independently from the unit square.

Summary

The table in Figure 7.4 summarizes the results in this section. The result marked with a † is for the case when the points are vertices of a convex polygon. All the times are for the worst case unless otherwise specified.
7.2 Related Problems

As with convex hull problems, a special instance of proximity problems occurs when the points form a digitized black-and-white picture. An O(n)-processor pyramid is used in [Dyer80] to solve the CP problem for a digitized picture in O(n^{1/2}) time. This is
improved to O(log n) time in [Mill85a]. An extension to the ANN problem for digitized pictures is obtained as follows: Two black pixels are said to be connected if they are vertical or horizontal neighbors; they are in the same component if and only if there is a connected path between them. Mesh, mesh-of-trees, and pyramid algorithms are described in [Stou84, Mill85b], [Kuma86], and [Mill85a], respectively, which determine for each black component the distance to its nearest black component. In each of the algorithms, distance is measured using the L1-metric. Recall that the distance between two points p1 = (x1, y1) and p2 = (x2, y2) in the plane measured under the Lk-metric is (|y2 - y1|^k + |x2 - x1|^k)^{1/k}. The mesh algorithm runs in O(n^{1/2}) time with O(n) processors [Stou84, Mill85b], the mesh-of-trees algorithm runs in O(log n) time with O(n^2) processors [Kuma86], and the pyramid algorithm runs in O(n^{1/4}) time with O(n) processors [Mill85a]. The CP problem for a digitized picture can be solved on a mesh-of-trees in O(log n) time with O(n^2) processors [Kuma86] and in O(log n) time on a pyramid of size n^2 [Stou85]. While all the algorithms above are restricted to a set of points in the plane, a mesh-of-trees with O(n^2 / log n) processors is used in [Akl89a] to solve the CP problem for points in d-dimensional space, where d > 2. Two extensions are described in [Mill87b], where it is shown how the nearest-neighbor problem for n line segments, or n simple polygons, can be solved on an O(n)-processor mesh in O(n^{1/2}) time.

A related problem to the ANN problem is that of finding for each point the point that is farthest from it in the set; this is called the all farthest neighbors (AFN) problem. An algorithm for solving the AFN problem for the case when the points form a digitized black-and-white picture is given in [Mill85b]. For each black component, the distance to its farthest neighbor is found in O(n^{1/2}) time on a mesh of size n.
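The Lk distance just recalled is a direct one-line computation; a transcription in Python (the function name is illustrative), where k = 1 gives the L1 metric used by the digitized-picture algorithms and k = 2 the Euclidean metric:

```python
# The L_k distance between two points in the plane, transcribed from the
# definition: (|y2 - y1|^k + |x2 - x1|^k)^(1/k).

def lk_dist(p1, p2, k):
    (x1, y1), (x2, y2) = p1, p2
    return (abs(y2 - y1) ** k + abs(x2 - x1) ** k) ** (1.0 / k)
```

For example, lk_dist((0, 0), (3, 4), 1) gives 7.0 while lk_dist((0, 0), (3, 4), 2) gives 5.0.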
In [Nand88] it is shown how to encode a quad tree on a linear array, and an analysis of two ways for embedding a quad tree on a hypercube is given. A quad tree [Fink74] is a data structure that can be viewed as a two-dimensional segment tree [Prep85]. Where the segment tree organizes contiguous intervals, the quad tree organizes a grid of cells. (The segment tree is discussed in Chapter 4.) In [Nand88], algorithms are provided that use a quad tree data structure to find neighbor information in a digitized image.
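A minimal sequential quad tree in the spirit of [Fink74] can make the structure concrete: a region that is uniformly 0 or 1 becomes a leaf, and a mixed region splits into four quadrants. The sketch below assumes a 2^k x 2^k binary image; it is not the linear encoding of [Nand88], and the names are illustrative:

```python
# Minimal quad tree over a 2^k x 2^k binary image, in the spirit of [Fink74]:
# a uniform region becomes a leaf (its value), a mixed region splits into
# four quadrant subtrees [NW, NE, SW, SE].

def build(img, r=0, c=0, size=None):
    if size is None:
        size = len(img)
    vals = {img[i][j] for i in range(r, r + size) for j in range(c, c + size)}
    if len(vals) == 1:
        return vals.pop()                 # uniform leaf: 0 or 1
    h = size // 2
    return [build(img, r, c, h),          # NW quadrant
            build(img, r, c + h, h),      # NE quadrant
            build(img, r + h, c, h),      # SW quadrant
            build(img, r + h, c + h, h)]  # SE quadrant

img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]]
tree = build(img)   # -> [1, 0, 0, [0, 0, 0, 1]]
```

Uniform quadrants collapse to single values, which is what makes the quad tree a compact grid organization compared with the segment tree's contiguous intervals.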
7.3 Problems

7.1. Given a set S of n points in the plane, it is required to determine for each point of S its closest neighbor, also in S. Design an algorithm for solving this problem on a mesh-of-trees parallel computer.

7.2. Given a set S of n points in the plane and a query point q, it is required to determine which point of S is closest to q. Design an algorithm for solving this problem on a tree of processors.

7.3. Given a set S of n points in the plane, it is required to determine which two points of S are closest to one another. Design an algorithm for solving this problem on a butterfly parallel computer.

7.4. Given a convex polygon P, it is required to determine for each vertex of P its closest
neighbor, also a vertex of P. Design an algorithm for solving this problem on a mesh of processors.

7.5. Assume that a set L of n line segments in the plane is given.
(a) Develop a definition for the concept of distance between two line segments.
(b) For each line segment in L, it is required to find its nearest neighbor in L according to the definition of distance in part (a). Design an algorithm for solving this problem on a hypercube parallel computer.

7.6. Assume that a set P of n simple polygons in the plane is given.
(a) Develop a definition for the concept of distance between two simple polygons.
(b) It is required to find the closest pair of polygons in P. Design an algorithm for solving this problem on a pyramid parallel computer.

7.7. Given a set S of n points in the plane, it is required to determine for each point in S its farthest neighbor, also in S. Design an algorithm for solving this problem on each of the following models of parallel computation:
(a) Linear array
(b) Mesh with broadcast buses
(c) Scan

7.8. The diameter of a finite set of points in the plane is the distance between two points that are farthest apart. Given a set S of n points in the plane, it is required to partition S into two subsets so that the sum of the diameters of the subsets is minimized. Derive an EREW PRAM algorithm for solving this problem.

7.9. Given a set S of n points in the plane, it is required to partition S into k clusters C1, C2, ..., Ck, so that the maximum cluster diameter is minimized. Design an algorithm for solving this problem on your chosen model of parallel computation.

7.10. Repeat Problem 7.9 for the case where the set S is dynamically changing [i.e., points are inserted into and deleted from S (one at a time) and it is desired to efficiently preserve the property that the maximum cluster diameter is minimized].
7.11. Let S be a dynamically changing set of points in d-dimensional space. Design a parallel algorithm for dynamically maintaining (i.e., keeping track of) the closest pair of points in S.

7.12. Let S be a set of n points in three-dimensional space. The center of S is defined as a subset Q of S with the property that any plane M containing a point q ∈ Q divides the points of S in such a way that each of the two half-spaces bounded by M contains at least n/4 points of S. Design and compare parallel algorithms for computing the center of a set of points in three dimensions.
7.4 References [Akl89a] [Atal86a] [Atal88b] [Atal89d]
S. G. Akl, The Design and Analysis of ParallelAlgorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989. M. J. Atallah and M. T. Goodrich, Efficient parallel solutions to some geometric problems, Journal of Parallel and Distributed Computing, Vol. 3, 1986, 492-507. M. J. Atallah, G. N. Frederickson, and S. R. Kosaraju, Sorting with efficient use of special-purpose sorters, Information Processing Letters, 1988, 13-15. M. J. Atallah and J.-J. Tsay, On the parallel-decomposability of geometric problems,
96
Nearest Neighbors
Chap. 7
Proceedings of the Fifth Annual ACM Symposium on Computational Geometry, Saarbruchen, Germany, June 1989, 104-113. [Chaz84] B. Chazelle, Computational geometry on a systolic chip, IEEE Transactions on Computers, Vol. C-33, No. 9, September 1984, 774-785. [Cole88al R. Cole and M. T. Goodrich, Optimal parallel algorithms for polygon and point-set problems (preliminary version), Proceedingsof the Fourth Annual ACM Symposium on Computational Geometry, Urbana-Champaign, Illinois, June 1988, 201-210. [Cyph90] R. Cypher and C. G. Plaxton, Deterministic sorting in nearly logarithmic time on the hypercube and related computers, Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, Baltimore, May 1990, 193-203. [Dyer8O] C. R. Dyer, A fast parallel algorithm for the closest pair problem, Information Processing Letters, Vol. 11, No. 1, 1980, 49-52. [Fink741 R. A. Finkel and J. L. Bentley, Quad-trees; a data structure for retrieval on composite keys, Acta Informatica, Vol. 4, 1974, 1-9. [Kuma86I V. K. Prasanna Kumar and M. M. Eshaghian, Parallel geometric algorithms for digitized pictures on a mesh of trees, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 270-273. [Lee8l] D. T. Lee, H. Chang, and C. K. Wong, An on-chip compare steer bubble sorter, IEEE Transactions on Computers, Vol. C-30, 1981, 396-405. [Leig9l] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays Trees . Hypercubes, Morgan Kaufman, San Mateo, California, 1991. [MacK9Oa] P. D. MacKenzie and Q. F. Stout, Asymptotically efficient hypercube algorithms for computational geometry, Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, October 1990, 8-11. [Mill84a] R. Miller and Q. F. 
Stout, Computational geometry on a mesh-connected computer (preliminary version), Proceedings of the 1984 International Conference on Parallel Processing, Bellaire, Michigan, August 1984, 66-73. [Mill85a] R. Miller and Q. F. Stout, Pyramid computer algorithms for determining geometric properties of images, Proceedings of the First Annual ACM Symposium on Computational Geometry, Baltimore, June 1985, 263-271. [MiIl85b] R. Miller and Q. F. Stout, Geometric algorithms for digitized pictures on a mesh-connected computer, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-7, No. 2, March 1985, 216-228. [Mill87b] R. Miller and Q. F. Stout, Mesh computer algorithms for line segments and simple polygons, Proceedings of the 1987 International Conference on ParallelProcessing, St. Charles, Illinois, August 1987, 282-285. [Mill89b] R. Miller and Q. F. Stout, Mesh computer algorithms for computational geometry, IEEE Transactions on Computers, Vol. C-38, No. 3, March 1989, 321-340. [Nand881 S. K. Nandy, R. Moona, and S. Rajagopalan, Linear quadtree algorithms on the hypercube, Proceedings of the 1988 InternationalConference on ParallelProcessing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 227-229. [Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985. [Stoj88a] I. Stojmenovi6, Computational geometry on a hypercube, Proceedings of the 1988
International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 100-103. [Stou84] Q. F. Stout and R. Miller, Mesh-connected computer algorithms for determining geometric properties of figures, Proceedings of the 1984 International Conference on Pattern Recognition, 1984. [Stou85] Q. F. Stout, Pyramid computer solutions of the closest pair problem, Journal of Algorithms, Vol. 6, 1985, 200-212. [Stou88] Q. F. Stout, Constant-time geometry on PRAMs, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 104-107.
8 Voronoi Diagrams
In Chapter 7 we described algorithms that solve proximity problems such as: "QNN: Determine which point of a set S is closest to a query point q." One way to solve this problem is to consider a solution to the following problem, posed in [Prep85] as the "loci of proximity" problem: Given a set S of n points in the plane, for each point p ∈ S, find the locus of points (x, y) that are closer to p than to any other point of S. A mathematical construct called the Voronoi diagram, attributed to the mathematician G. Voronoi [Voro08] (and sometimes called a Dirichlet or Thiessen tessellation), is a partition of the plane into such loci. More formally, given two points pi ∈ S and pj ∈ S, define H(pi, pj) as the half-plane containing pi and bounded by the perpendicular bisector of pi and pj. The intersection of these n − 1 half-planes, V(i) = ∩_{j≠i} H(pi, pj), j = 1, ..., n, is the Voronoi polygon of the point pi. V(i) is a convex polygon with at most n − 1 sides such that any point in V(i) is closer to pi than to any other point in S. The set of n Voronoi polygons defines the Voronoi diagram of S, Vor(S). Note that some Voronoi polygons may be unbounded. Vertices of the Voronoi polygons are called Voronoi vertices, and edges of the polygons are called Voronoi edges. Figure 8.1 shows the Voronoi diagram of a set of points. Voronoi diagrams have been well studied in computational geometry since the work of [Sham75], partly because of their applications in solving geometric problems such as proximity problems and the Euclidean minimum spanning tree problem, and partly because of their applications in such diverse areas as biology, visual perception, physics, and archaeology [Saxe90]. Computing the Voronoi diagram of a set of points in the plane with a single processor has an Ω(n log n) lower bound [Prep85]. Some important properties of the planar Voronoi diagram are given in what follows. (We assume that no four points are cocircular.) 1.
Voronoi vertices are the centers of circles defined by three points of S. These circles contain no other point of S. 2. A Voronoi polygon V(i) is unbounded if and only if pi is a point on the convex hull of S.
Figure 8.1 Voronoi diagram of a set of points. 3. The straight-line dual of the Voronoi diagram is a triangulation of S called the Delaunay triangulation. This triangulation of S has the property that the minimum angle of its triangles is maximum over all triangulations of S. See Chapter 10 for a discussion of parallel algorithms for computing arbitrary triangulations of point sets. 4. The Voronoi diagram of a set S of n points has O(n) edges and O(n) vertices, by Euler's relation v − e + f = 2, where v, e, and f denote the number of vertices, edges, and regions of a planar subdivision, respectively. For more properties, and proofs of the properties above, see [Prep85]. Different types of Voronoi diagrams have been studied. There are Voronoi diagrams of point sets, line segments, arcs, disks, and combinations of these objects. Voronoi diagrams can also be constructed for point sets in higher dimensions. A slightly different construct, the farthest-site Voronoi diagram of a set of points S, partitions the plane into cells such that the farthest-site Voronoi polygon V'(i) of a point pi is the locus of points that are farther from pi than from any other point of S. For a more comprehensive discussion of generalizations of Voronoi diagrams, see [vanW90]. Except for three results, parallel algorithms that have been designed for Voronoi diagrams concentrate on finding the Voronoi diagram of a set of points in the plane.
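Property 1 lends itself to a direct check. The following sequential Python sketch (our own helper names, not taken from any algorithm in the text) computes the circumcenter of three sites and tests the empty-circle condition that makes it a Voronoi vertex:

```python
import math

def circumcenter(a, b, c):
    """Center of the circle through three non-collinear points (x, y)."""
    ax, ay = a; bx, by = b; cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy)

def is_voronoi_vertex(a, b, c, points, eps=1e-9):
    """Property 1: the circumcenter of a, b, c is a Voronoi vertex of the
    set iff the circle through a, b, c contains no other point of the set."""
    u = circumcenter(a, b, c)
    r = math.dist(u, a)
    return all(math.dist(u, p) >= r - eps
               for p in points if p not in (a, b, c))
```

Testing all O(n^3) triples this way is exactly the brute-force strategy that Chow's CCC algorithm, discussed in Section 8.1, accelerates via inversion and three-dimensional convex hulls.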
The three exceptions are a parallel algorithm for finding the Voronoi diagram of a set of line segments in the plane [Good89b], a parallel algorithm for finding the Delaunay triangulation of a set of points in three dimensions [Saxe90], and a parallel algorithm for computing the farthest-site Voronoi diagram of a set of (possibly overlapping) disks [vanW90]. All of the results presented here compute the Voronoi diagram directly, except the algorithm in [Saxe90], which computes the Delaunay triangulation directly. In the following discussion we use the general term "Voronoi diagram" to refer to algorithms that compute the Voronoi diagram of planar point sets. When describing algorithms for sets of other objects, or for points in higher dimensions, we state explicitly the specific type of Voronoi diagram.
8.1 Network Algorithms for Voronoi Diagrams

Parallel algorithms for Voronoi diagram construction have been designed for the mesh, the mesh-of-trees, the CCC, and the hypercube models of computation. In Chow's thesis [Chow80], an algorithm on the CCC is given for finding the Voronoi diagram that uses a method due to Brown [Brow79a]. Inversion is used to transform a plane that does not pass through the center of inversion, P0, to a sphere that has P0 at its apex. Given P0, which is not on the xy-plane, and an inversion radius r > 0, a point p in the xy-plane is transformed to the point p' such that P0p' is in the same direction as P0p and |P0p'| = r^2/|P0p|. If the inversion is applied twice, the original point results. The exterior of the sphere corresponds to one half-space bounded by the plane, and the interior of the sphere corresponds to the other half-space. Let S' be the set of inversion points of S. By property 1 of Voronoi diagrams, one way to find the Voronoi diagram of S is to test each set of three points pi, pj, pk ∈ S to determine whether the circle through pi, pj, pk contains any other point of S. This test corresponds to checking whether the convex hull of S', CH(S'), and P0 are in the same half-space bounded by the face of CH(S') defined by the inversions of pi, pj, pk. If this test succeeds, the center of the circle through pi, pj, pk is a Voronoi vertex. Algorithms are given in [Chow80] to compute the convex hull of a set of points in three dimensions (see Chapter 3) on a CCC model. The first runs in O(log^4 n) time and uses O(n) processors, and the second runs in O(K log^3 n) time and uses O(n^{1+1/K}) processors, 1 ≤ K ≤ log n. These algorithms and the inversion method described are used in the design of two algorithms that compute the Voronoi diagram of a set of points in the plane on a CCC model within the same time and processor bounds.
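The inversion map is easy to state in coordinates: the image point lies on the ray from P0 through p, scaled so its distance from P0 is r^2/|P0p|. A small sketch (the helper name is ours, and points are plain coordinate tuples):

```python
def invert(p0, p, r=1.0):
    """Inversion with center p0 and radius r: the image p' lies on the ray
    from p0 through p at distance r**2 / |p0 p| from p0.  Applying the map
    twice returns the original point.  Requires p != p0."""
    v = tuple(pi - p0i for pi, p0i in zip(p, p0))
    d2 = sum(c * c for c in v)        # |p0 p|^2
    s = r * r / d2                    # scale factor giving |p0 p'| = r^2/|p0 p|
    return tuple(p0i + s * c for p0i, c in zip(p0, v))
```

With P0 chosen off the xy-plane, applying this map to each site of S (embedded with z = 0) places the images S' on a sphere through P0, which is what reduces the empty-circle test to a three-dimensional convex hull computation.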
Mi Lu presents a Voronoi diagram algorithm, also based on Brown's method, that runs in O(n^{1/2} log n) time on an O(n)-processor mesh [Lu86a]. An algorithm for computing the convex hull of a set of points on a sphere is used in the Voronoi diagram algorithm; it also runs on an O(n)-processor mesh in O(n^{1/2} log n) time. A time-optimal algorithm is given in [Jeon90] that runs in O(n^{1/2}) time on an n^{1/2} × n^{1/2} mesh. The algorithm is based on the divide-and-conquer approach used in [Sham75]. The set of points is sorted by x-coordinate and divided in half into two
sets L and R by a vertical separating line l such that points in L are to the left of l and points in R are to the right of l. Sorting takes O(n^{1/2}) time on a mesh of size n [Akl85b]. Recursively, the Voronoi diagrams Vor(L) and Vor(R) are computed for the sets L and R, respectively. The two diagrams are then merged, resulting in Vor(L ∪ R). The merge step finds C, the collection of edges in Vor(L ∪ R) that are shared by polygons of points in L and polygons of points in R. This dividing chain C is monotone with respect to the y-axis, and all points to the left of C are closer to a point in L than to any point in R. Similarly, all points to the right of C are closer to a point in R than to any point in L. The merge step works by identifying those Voronoi edges in Vor(L) and Vor(R) that are intersected by C. Planar point location is used to determine which Voronoi vertices of Vor(L) [respectively, Vor(R)] are closer to R [respectively, L], and the Voronoi edges of Vor(L) and Vor(R) are subdivided into groups depending on whether one, both, or none of their endpoints are closer to L or to R (special action is taken for unbounded Voronoi edges). It is shown how to determine, from this information, which edges intersect C. Let Bl be the set of edges of Vor(L) that intersect C, and let Br be the set of edges of Vor(R) that intersect C. Then B = Bl ∪ Br is the set of edges in both Vor(L) and Vor(R) that intersect C. The edges in B are sorted according to the order in which they intersect C. The chain C is directed from bottom to top, and the edges in B are directed from the endpoint closer to L to the endpoint closer to R. Two edges ei, ej ∈ B are y-disjoint if the minimum y-value of ei is no less than the maximum y-value of ej. The minimum (maximum) y-value of an edge is the minimum (maximum) y-value of its two endpoints. If two edges are y-disjoint, the order in which they cross C is easily determined, since C is monotone with respect to the y-axis.
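The y-disjointness test is a one-line predicate. A sketch (the edge representation as a pair of endpoint tuples is our assumption, not the mesh data layout of [Jeon90]):

```python
def y_disjoint(e_i, e_j):
    """Edges are pairs of endpoints ((x1, y1), (x2, y2)).  e_i and e_j are
    y-disjoint when the minimum y-value of e_i is no less than the maximum
    y-value of e_j.  Because the dividing chain C is y-monotone, e_i must
    then cross C after (above) e_j."""
    min_y_i = min(e_i[0][1], e_i[1][1])
    max_y_j = max(e_j[0][1], e_j[1][1])
    return min_y_i >= max_y_j
```

When the predicate fails (the edges' y-ranges overlap), the algorithm falls back on the three-case analysis described next.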
If two edges are not y-disjoint, three cases are considered to determine the order in which they cross C. To find the actual edges of C, the points pi and pj whose bisector defines an edge of C are found using a precede operation. The precede operation finds, for each edge el ∈ Bl, the greatest edge in Br (sorted by the order in which the edges intersect C) that is less than el. The precede operation takes O(n^{1/2}) time [Jeon90]. Finally, the edges of Vor(L ∪ R), together with their vertices and the bisector points that define them, are distributed so that each processor contains a constant number of Voronoi edges. Merging the two Voronoi diagrams takes O(n^{1/2}) time on a mesh, and since the two recursive calls proceed in parallel on the two halves of the mesh, the total time t(n) for the algorithm satisfies t(n) = t(n/2) + O(n^{1/2}), which is O(n^{1/2}). Several improvements to the algorithm in [Jeon90] are described in [Jeon91a]. An algorithm in [Stoj88a] that computes the Voronoi diagram of a set of n points in the plane in O(log^3 n) time on an O(n)-processor hypercube can be obtained by using the algorithm in [Jeon90] and a planar point location algorithm. Two algorithms are presented in [Saxe90] (see also [Saxe91]). The first finds the Delaunay triangulation of a set of n points in the plane in O(log^2 n) time on a mesh-of-trees of size n^2. The second constructs the Delaunay triangulation of a set of points in three dimensions and runs in O(m^{1/2} log n) time on an n × n mesh-of-trees, where m is the number of tetrahedra in the triangulation. The algorithm for a set of points in the plane is based on the fact that if (pi, pj) is a Delaunay edge, and if pk is
the point such that cos ∠pi pk pj is a minimum among all points in S on the same side of the line through (pi, pj) on which pk lies, then triangle pi pj pk is a Delaunay triangle. Let p1, p2, ..., pn be the set of points S. Each processor in row i is loaded with the coordinates of the point pi, and each processor in column j is loaded with the coordinates of the point pj, so that the processor in position (i, j) in the mesh-of-trees contains two points, pi and pj. Each processor computes the square of the distance between its two points (i ≠ j), and the minimum function is computed from leaf processors to the root in each column in O(log n) time. The resulting edge, which joins pj to a closest point, is stored in each processor in column j. This edge is a Delaunay edge [Prep85]. An O(log n)-time compacting procedure is used to remove duplicate edges. Each processor containing edge (pi, pj) computes cos ∠pi pk pj for k ≠ i, j and pk on one side of the line through (pi, pj), and cos ∠pi pl pj for l ≠ i, j and pl on the other side of the line through (pi, pj). The minimum cosine on each side of the line through (pi, pj) is found by passing values to the column root in O(log n) time. The four new edges (pi, pk), (pj, pk), (pi, pl), and (pj, pl) are stored in processors in the
same column. Now, for each newly created edge (pi, pj), a point pk that belongs to the triangulation is found, as was done in the previous step. In this way, two new edges are created for each existing edge. This last step is iterated O(log n) times since, at each iteration, the number of edges remaining to be examined is halved. As new edges are added, a compaction algorithm is executed to remove duplicate edges and to place edges in the mesh-of-trees in a form suitable for the next iteration. Thus the total time taken is O(log^2 n). A similar algorithm is given for the case where the points are in three dimensions. An algorithm is given in [Jeon91b] that computes the Voronoi diagram of n points in the plane on an n-processor mesh in O(n^{1/2}) time, where distances are measured using the L1-metric. Finally, it is shown in [Schw89] how the discrete Voronoi diagram of an n × n digitized image can be obtained under the L1-metric in O(log n) time on a mesh-of-trees with O(n^2) processors.
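The angle-minimization fact underlying the planar algorithm of [Saxe90] can be made concrete with a small sequential sketch (function names, the side convention, and the brute-force search are our assumptions; the mesh-of-trees algorithm performs the minimization with column trees instead):

```python
import math

def cos_angle(pi, pk, pj):
    """Cosine of the angle at pk in triangle pi pk pj."""
    ax, ay = pi[0] - pk[0], pi[1] - pk[1]
    bx, by = pj[0] - pk[0], pj[1] - pk[1]
    return (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))

def side(pi, pj, p):
    """Sign of the cross product: which side of the line pi pj the point p is on."""
    return (pj[0] - pi[0]) * (p[1] - pi[1]) - (pj[1] - pi[1]) * (p[0] - pi[0])

def delaunay_third_vertex(pi, pj, points, s=1):
    """Among points strictly on side s of the line through (pi, pj), return
    the pk minimizing cos angle(pi, pk, pj); if (pi, pj) is a Delaunay edge,
    triangle pi pj pk is then a Delaunay triangle."""
    cand = [p for p in points
            if p not in (pi, pj) and side(pi, pj, p) * s > 0]
    return min(cand, key=lambda pk: cos_angle(pi, pk, pj)) if cand else None
```

Minimizing the cosine maximizes the angle at pk, which by the inscribed-angle theorem yields the smallest (and hence empty) circumcircle on that side.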
8.2 PRAM Algorithms for Voronoi Diagrams

Several algorithms exist for computing the Voronoi diagram of n planar points on the PRAM model of computation. Chow gives an algorithm that uses inversion and computes the convex hull of a set of points in three dimensions. It runs in O(log^3 n) time on a CREW PRAM with O(n) processors [Chow80]. A time-optimal algorithm is described in [Prei88] in which each point computes its own Voronoi polygon by determining its neighbors. The algorithm runs in O(log n) time using O(n^3) processors on a CREW PRAM. The authors show how their algorithm can run on a (SMALLEST, LARGEST) CRCW PRAM in O(1) time with O(n^4) processors.
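The per-point strategy of [Prei88] rests on the locus definition of V(i). A brute-force membership test (a sequential sketch of our own; the PRAM algorithm distributes this work rather than looping) makes the definition concrete:

```python
def in_voronoi_polygon(q, i, points):
    """q lies in V(i) iff q is at least as close to points[i] as to every
    other site -- equivalently, q lies in every half-plane H(p_i, p_j)."""
    d2 = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    di = d2(q, points[i])
    return all(di <= d2(q, p) for j, p in enumerate(points) if j != i)
```

Squared distances are used so the test involves no square roots; points on a bisector satisfy the test for both neighboring sites, matching the fact that Voronoi polygons share their bounding edges.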
Algorithms for computing the Voronoi diagram of a set of points under the L1-metric are given in [Wee90] and [Guha90] for the CREW PRAM model. The first runs in O(log n) time and uses O(n) processors, and the second runs in O(log^2 n) time and uses O(n/log n) processors. Both algorithms are cost optimal in view of the Ω(n log n) sequential lower bound for this problem [Prep85]. In [Agga88], an algorithm is given that uses divide-and-conquer and runs in O(log^2 n) time on an O(n)-processor CREW PRAM. An algorithm is given in [Evan89] for computing the Voronoi diagram of points in the plane that runs in O(log^3 n) time and uses O(n) processors on a CREW PRAM. It is also shown how the algorithm runs in O(log^2 n) time on a CRCW PRAM with the SMALLEST write-conflict resolution rule. Their algorithm uses the divide-and-conquer technique of Shamos [Sham75] and is similar to that in [Jeon90]. List ranking is used to compute the edges of C instead of the precede function. Given a linked list L of n elements represented as an array of pointers, list ranking computes, for each element e ∈ L, its distance from the tail of the list, that is, the number of elements in L that follow e. List ranking can be performed in O(log n) time on an EREW PRAM of size O(n/log n) [Cole88c]. The authors point out that by using the optimal point location algorithm of Jeong and Lee [Jeon90] (see Chapter 5), their algorithm runs in O(n^{1/2}) time on a mesh of size n. In [Levc88], a parallel algorithm is given for computing the Voronoi diagram of a planar point set within a square window W. It runs in O(log n) average time on a (PRIORITY) CRCW PRAM with O(n/log n) processors when the points are drawn independently from a uniform distribution. The algorithm uses multilevel bucketing. The square window W is divided into equal-size cells in (log n)/2 + 1 ways, creating (log n)/2 + 1 grids. Each grid G_l, l = 0, 1, ..., (log n)/2, partitions W into 2^{-2l} n equal-size squares.
The grid with the most squares, G_0, has n squares, and the grid with the fewest squares is W itself. The points of S are sorted into these buckets in parallel. This is done in O(log n) expected time using O(n/log n) processors on a (PRIORITY) CRCW PRAM by converting a randomized algorithm due to Reif [Reif85] that runs in O(log n) probabilistic time on a CRCW PRAM of size n/log n. This is the only step of the Voronoi diagram algorithm that requires concurrent writing. It is shown how several Voronoi polygons can be computed in parallel by first computing, for each point in S, its rough rectangle of influence. Given a point p in a square C_l(i, j) of grid l, define R_l(p) as the region around C_l(i, j) extending from C_l(i − 3, j − 3) to C_l(i + 3, j + 3). If R_l(p) extends outside W, only the part of it inside W is considered. The rough rectangle of influence RI(p) for a point p is the rectangle R_l(p) such that every square in R_l(p) contains at least one point of S and, for each R_k(p), 0 ≤ k < l, it is not true that every square in R_k(p) contains at least one point of S. For a point p ∈ S, the intersection of the Voronoi polygon of p in Vor(S) with W is equal to the intersection with W of the Voronoi polygon of p in the Voronoi diagram of the points in RI(p). After computing the number of points in each square of each grid and the rough rectangle of influence of each point pi ∈ S, an upper bound on the cost of computing V(i) for pi is obtained. By cost we mean the product of the time and the number of processors. The processors can be assigned to points for computing their Voronoi
polygons by computing the partial sums of the costs. Each point whose upper-bound cost is greater than the upper bound of the total cost divided by the number of processors is allocated a proportional number of processors to execute the algorithm in [Agga88]. Other points receive a single processor when that processor has finished its preceding tasks. A divide-and-conquer approach for computing the Voronoi diagram is described in [Cole90b], where the "marry" step is implemented by merging forests of free trees. This leads to two algorithms for the CREW PRAM: the first runs in O(log n log log n) time using O(n log n/log log n) processors, thus improving on the running time of the algorithm in [Agga88] while maintaining the same cost; the second runs in O(log^2 n) time using O(n/log n) processors, thus improving on the cost of the algorithm in [Agga88] while maintaining the same running time. A parallel algorithm for computing the Voronoi diagram of a set of line segments in the plane is given in [Good89b]. It runs in O(log^2 n) time on an O(n)-processor CREW PRAM. Vertical dividing lines through the endpoints of each segment are used to separate the plane into 2n + 1 regions called slabs. A binary tree is defined such that a slab is represented at each leaf. Each slab is further divided, by the segments that span it, into regions called quads. The data structure and point location algorithm from [Atal89c] are used so that, given a point p, the two segments that bound the quad containing p can be found in O(log n) sequential time. The algorithm recurses down the tree, vertically merging the Voronoi diagrams in a slab by concatenating the Voronoi diagram for each quad in that slab, then horizontally merging the Voronoi diagrams in two consecutive slabs. Van Weringh gives a parallel algorithm for computing the farthest-site Voronoi diagram of a set of possibly overlapping disks. It runs in O(log^2 n) time on an O(n)-processor CREW PRAM [vanW90].
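List ranking, used by [Evan89] above to order the edges of the dividing chain C, is classically performed by pointer jumping. The following sketch (ours) simulates the synchronous PRAM rounds sequentially, reading the old arrays wholesale at each round:

```python
def list_rank(succ):
    """List ranking by pointer jumping: succ[i] is the index of the next
    element, with succ[t] == t at the tail t.  Returns, for each element,
    the number of elements that follow it in the list."""
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    for _ in range(n.bit_length()):           # ceil(log2 n) rounds suffice
        # All reads use the previous round's arrays, as on a PRAM.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]  # tail contributes 0
        nxt = [nxt[nxt[i]] for i in range(n)]              # pointer jump
    return rank
```

Each round doubles the distance spanned by every pointer, so O(log n) rounds with n processors match the EREW bound cited from [Cole88c] up to the work-optimality refinement.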
Summary

The table in Figure 8.2 summarizes the results in this chapter. Note that m is the number of tetrahedra in the triangulation for the three-dimensional Delaunay triangulation algorithm, and 1 ≤ K ≤ log n in Chow's CCC algorithm. All running times are for the worst case unless otherwise indicated. As the table shows, it remains an open problem to compute the Voronoi diagram of a set of points in the Euclidean plane in O(log n) time while using O(n) processors.
8.3 Problems

8.1. Design an n-processor CREW PRAM algorithm for computing the Voronoi diagram of a set of n planar points in O(log n) time.
8.2. Describe an algorithm for computing the Voronoi diagram of a set of line segments on each of the following models of parallel computation:
Problem                  Reference             Model                           Processors              Running time
Planar point sets        [Chow80]              CCC                             O(n)                    O(log^4 n)
                         [Chow80]              CCC                             O(n^{1+1/K})            O(K log^3 n)
                         [Lu86a]               Mesh                            O(n)                    O(n^{1/2} log n)
                         [Jeon90], [Jeon91a]   Mesh                            O(n)                    O(n^{1/2})
                         [Stoj88a]             Hypercube                       O(n)                    O(log^3 n)
                         [Chow80]              CREW PRAM                       O(n)                    O(log^3 n)
                         [Agga88]              CREW PRAM                       O(n)                    O(log^2 n)
                         [Cole90b]             CREW PRAM                       O(n log n/log log n)    O(log n log log n)
                         [Cole90b]             CREW PRAM                       O(n/log n)              O(log^2 n)
                         [Levc88]              CRCW PRAM (PRIORITY)            O(n/log n)              O(log n) expected
                         [Prei88]              CREW PRAM                       O(n^3)                  O(log n)
                         [Prei88]              CRCW PRAM (SMALLEST, LARGEST)   O(n^4)                  O(1)
                         [Evan89]              CREW PRAM                       O(n)                    O(log^3 n)
                         [Evan89]              CRCW PRAM (SMALLEST)            O(n)                    O(log^2 n)
L1-metric                [Jeon91b]             Mesh                            O(n)                    O(n^{1/2})
                         [Schw89]              Mesh-of-trees                   O(n^2)                  O(log n)
                         [Wee90]               CREW PRAM                       O(n)                    O(log n)
                         [Guha90]              CREW PRAM                       O(n/log n)              O(log^2 n)
Disks (farthest site)    [vanW90]              CREW PRAM                       O(n)                    O(log^2 n)
Delaunay (2-D)           [Saxe90], [Saxe91]    Mesh-of-trees                   O(n^2)                  O(log^2 n)
Delaunay (3-D)           [Saxe90], [Saxe91]    Mesh-of-trees                   O(n^2)                  O(m^{1/2} log n)
Line segments            [Good89b]             CREW PRAM                       O(n)                    O(log^2 n)

Figure 8.2 Performance comparison of parallel Voronoi diagram algorithms.
(a) Mesh
(b) Modified CCC
(c) Broadcasting with selective reduction (BSR)
8.3. Show how the Delaunay triangulation of a set of points in three dimensions can be computed on a hypercube parallel computer.
8.4. Repeat Problem 8.3 for the case where the points fall on a set of k planes.
8.5. Develop a mesh-of-trees algorithm for computing the farthest-site Voronoi diagram for a set of (possibly overlapping) disks.
8.6. One way to generalize the concept of the Voronoi diagram of a point set S is to let the space in which the points fall be d-dimensional, where d > 2 [Hole91]. Design a
parallel algorithm to compute the Voronoi diagram of a set of points in d dimensions on a mesh-of-trees parallel computer.
8.7. Another generalization of the Voronoi diagram calls for defining the locus of planar points closer to a given subset of k members of S than to any other subset of the same size. The Voronoi diagram of order k, denoted Vork(S), is the collection of all such loci for all k-subsets of S. Given S and k, design a parallel algorithm to compute Vork(S).
8.8. Like the Voronoi diagram, the Delaunay triangulation can also be generalized.
(a) Develop a definition for Delk(S), the dual of Vork(S).
(b) Design a parallel algorithm for computing Delk(S).
(c) Show how Vork(S) can be derived from Delk(S).
8.9. Design a parallel algorithm for computing the Voronoi diagram of a set of points in the plane, when the distances are measured using the L1-metric, on a mesh-of-trees parallel computer.
8.10. Design a parallel algorithm for computing the Voronoi diagram for a set of points in each of the following two cases:
(a) The points fall on the surface of a sphere.
(b) The points fall on the surface of a cone.
8.11. Consider the following variant of the Voronoi diagram. A set P of points and a set S of straight-line segments are given in the plane. For each point p ∈ P, it is required to find the locus of points q of the plane that are closer to p than to any other point of P and, furthermore, such that the straight-line segment (p, q) does not intersect any segment of S (in other words, p must be able to see q). Design a parallel algorithm that computes this variant of the Voronoi diagram.
8.12. Repeat Problem 8.11 for the case where P and S are the set of vertices and the set of edges, respectively, of a simple polygon.
8.4 References

[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó'Dúnlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327. [Akl85b] S. G. Akl, Parallel Sorting Algorithms, Academic Press, Orlando, Florida, 1985. [Atal89c] M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading divide-and-conquer: a technique for designing parallel algorithms, SIAM Journal on Computing, Vol. 18, No. 3, 1989, 499-532. [Brow79a] K. Q. Brown, Voronoi diagrams from convex hulls, Information Processing Letters, Vol. 9, 1979, 223-228. [Chow80] A. L. Chow, Parallel algorithms for geometric problems, Ph.D. thesis, University of Illinois at Urbana-Champaign, 1980. [Cole88c] R. Cole and U. Vishkin, Approximate parallel scheduling: Part I. The basic technique with applications to optimal parallel list ranking in logarithmic time, SIAM Journal on Computing, Vol. 17, 1988, 128-142. [Cole90b] R. Cole, M. T. Goodrich, and C. Ó'Dúnlaing, Merging free trees in parallel for efficient Voronoi diagram construction, in Automata, Languages and Programming,
M. S. Paterson (Editor), Lecture Notes in Computer Science, No. 443, Springer-Verlag, Berlin, 1990, 432-445. [Evan89] D. J. Evans and I. Stojmenović, On parallel computation of Voronoi diagrams, Parallel Computing, Vol. 12, 1989, 121-125. [Good89b] M. T. Goodrich, C. Ó'Dúnlaing, and C. K. Yap, Constructing the Voronoi diagram of a set of line segments in parallel, Proceedings of the 1989 Workshop on Algorithms and Data Structures (WADS '89), Lecture Notes in Computer Science, No. 382, F. Dehne, J.-R. Sack, and N. Santoro (Editors), Springer-Verlag, Berlin, 1989, 12-23. [Guha90] S. Guha, An optimal parallel algorithm for the rectilinear Voronoi diagram, Proceedings of the Twenty-Eighth Annual Allerton Conference on Communication, Control and Computing, Monticello, Illinois, October 1990, 798-807. [Hole91] J. A. Holey and O. H. Ibarra, Triangulation, Voronoi diagram, and convex hull in k-space on mesh-connected arrays and hypercubes, Proceedings of the 1991 International Conference on Parallel Processing, St. Charles, Illinois, August 1991, Vol. III, Algorithms and Applications, 147-150. [Jeon90] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on a mesh-connected computer, Algorithmica, Vol. 5, No. 2, 1990, 155-178. [Jeon91a] C.-S. Jeong, An improved parallel algorithm for constructing Voronoi diagram on a mesh-connected computer, Parallel Computing, Vol. 17, No. 4/5, July 1991, 505-514. [Jeon91b] C.-S. Jeong, Parallel Voronoi diagram in L1 (L∞) metric on a mesh-connected computer, Parallel Computing, Vol. 17, No. 2/3, June 1991, 241-252. [Levc88] C. Levcopoulos, J. Katajainen, and A. Lingas, An optimal expected-time parallel algorithm for Voronoi diagrams, Proceedings of the Scandinavian Workshop on Algorithm Theory (SWAT), Sweden, Lecture Notes in Computer Science, No. 318, Springer-Verlag, Berlin, 1988, 190-198. [Lu86a] M.
Lu, Constructing the Voronoi diagram on a mesh-connected computer, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 806-811. [Prei88] W. Preilowski and W. Mumbeck, A time-optimal parallel algorithm for the computing of Voronoi-diagrams, Proceedings of Graph-Theoretic Concepts in Computer Science, Amsterdam, Lecture Notes in Computer Science, No. 344, J. van Leeuwen (Editor), Springer-Verlag, Berlin, June 1988, 424-433. [Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985. [Reif85] J. H. Reif, An optimal parallel algorithm for integer sorting, Proceedings of the Twenty-Sixth Annual Symposium on Foundations of Computer Science, Portland, Oregon, October 1985, 496-503. [Saxe90] S. Saxena, P. C. P. Bhatt, and V. C. Prasad, Efficient VLSI parallel algorithm for Delaunay triangulation on orthogonal tree network in two and three dimensions, IEEE Transactions on Computers, Vol. C-39, No. 3, March 1990, 400-404. [Saxe91] S. Saxena, P. C. P. Bhatt, and V. C. Prasad, Correction to: "Parallel algorithm for Delaunay triangulation on orthogonal tree network in two and three dimensions," IEEE Transactions on Computers, Vol. C-40, No. 1, January 1991, 122. [Schw89] O. Schwarzkopf, Parallel computation of discrete Voronoi diagrams, Proceedings of
the Sixth Annual Symposium on Theoretical Aspects of Computer Science, Paderborn, Germany, February 1989, 193-204. [Sham75] M. I. Shamos, Geometric complexity, Proceedings of the Seventh Annual ACM Symposium on Theory of Computing, Albuquerque, New Mexico, May 1975, 224-233. [Stoj88a] I. Stojmenović, Computational geometry on a hypercube, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 100-103. [vanW90] K. van Weringh, Algorithms for the Voronoi diagram of a set of disks, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1990. [Voro08] G. Voronoi, Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Deuxième mémoire: Recherches sur les paralléloèdres primitifs, Journal für die Reine und Angewandte Mathematik, Vol. 134, 1908, 198-287. [Wee90] Y. C. Wee and S. Chaiken, An optimal parallel L1-metric Voronoi diagram algorithm, Proceedings of the Second Canadian Conference on Computational Geometry, Ottawa, Ontario, August 1990, 60-65.
9 Geometric Optimization
In this chapter we discuss problems that ask for the construction of a geometric object minimizing (or equivalently, maximizing) a certain objective function. We select for discussion the problems of computing minimum circle covers, Euclidean minimum spanning trees, shortest paths, and minimum matchings.
9.1 Minimum Circle Cover

The minimum-cardinality circle cover problem is as follows: Given a set of n circular arcs S = {A1, A2, ..., An} on a circle C, find the minimum number of arcs in S that cover C. A related problem is that of finding a minimum-weight circle cover for a circle C. In this case the arcs are assigned weights, and the requirement is to find a set of arcs that covers C and for which the sum of the weights of the arcs is a minimum. Figure 9.1 shows a circle and a set of n = 8 arcs. An optimal sequential algorithm is given in [Lee84a] that finds a minimum circle cover in O(n log n) time. If the endpoints of the set of arcs are sorted by polar angle, a second algorithm given in [Lee84a] runs in O(n) time, which is also optimal. The first parallel algorithm for the minimum-cardinality problem appeared in [Bert88]. The algorithm runs in O(log n) time using O(n^2/log n + qn) processors on a CREW PRAM model, where q − 1 is the minimum number of arcs crossing any point of the circle. Note that in the worst case, O(n^2) processors are needed. It is assumed that all of the endpoints of the arcs are distinct and that no arc covers the entire circle. Let the origin R be some arbitrary point on the circle C. An arc Ai ∈ S is given by its counterclockwise and clockwise endpoints xi and yi, respectively, where each of xi and yi is a polar angle. The set x1, x2, ..., xn is sorted about the center of C starting at R using a parallel sorting algorithm [Cole88b]. In the case where xi is larger than yi, Ai extends across R. An arc Ai strictly intersects another arc Aj if Ai and Aj intersect and xi is before xj in the clockwise direction around C. See Figure 9.1. We say that (Ai, Aj) is a strictly intersecting pair. Those arcs properly contained within another arc are removed in O(log n) time using O(n^2/log n) processors.
Figure 9.1 A circle and n = 8 arcs. Ai strictly intersects Aj.
The algorithm is essentially a parallelization of the sequential algorithm in [Lee84a], in which a "greedy" method is used. It is shown in [Lee84a] that if a minimum cover is known to contain an arc Ai, then starting with Ai and proceeding in a clockwise direction, choosing as the next arc the longest one that has a nonempty intersection with the previous arc results in a minimum circle cover. The longest arc Aj that has a nonempty intersection with the previous arc Ai is called the successor arc of Ai, SUCC(Ai) = Aj. More formally, SUCC(Ai) is the arc Aj such that yj extends around C the farthest over all yk, where (Ai, Ak) is a strictly intersecting pair. The first step of the algorithm computes the minimum number of arcs q that include any endpoint e ∈ {x1, x2, ..., xn, y1, y2, ..., yn} in O(log n) time using O(n^2/log n) processors on a CREW PRAM. The minimum number of arcs crossing any point of C is q − 1. If q = 1, there is no cover of C. Otherwise, the values of SUCC(Ai) are computed for each Ai ∈ S by computing, for each Ai, the set of arcs Aj such that (Ai, Aj) form a strictly intersecting pair. For one arc, these pairs are found in O(log n) time using O(n/log n) processors, and the maximum distance yj in the set is found within the same time. For n arcs, O(n^2/log n) processors are used to achieve the same time bound. The next part of the algorithm applies the greedy algorithm in parallel over all arcs Ai that include e and chooses the smallest-length cover as the minimum cover. Note that the SUCC values define a linked list. One processor is assigned to each arc that includes e (there are q such arcs), and for each of these, O(n) processors are assigned to the n arcs. Processors move along the successor linked list creating a cover list in parallel so that at each step of the algorithm, the number of arcs in each cover list doubles; there are, therefore, O(log n) steps. An algorithm to find the
Sec. 9.1
Minimum Circle Cover
113
minimum-weighted cover is also presented in [Bert88]. This algorithm runs in 0(log 2 n) time and uses 0(n 3 / logn) processors on a CREW PRAM. Three subsequent results appeared in the literature within three months of each other. The algorithms in [Sark89a] and [Boxe89c] achieve optimal cost: 0(logn) time, 0(n) space, and 0(n) processors on a CREW PRAM. The third algorithm, presented in [Atal89b], matches these running time and processor and space bounds but has the advantage of running on the less powerful EREW PRAM model. In the algorithm of [Sark89a] it is shown how to remove those arcs properly contained in another arc by mapping the problem to the maximal elements problem that can be solved in 0(logn) time using 0(n) processors on a CREW PRAM [Stoj88b]. SUCC(Aj) for each arc Aj is found by first sorting the endpoints xi and yi, i = 1, 2, .. ., n, then merging the two lists in 0(logn) time on a CREW PRAM with 0(n) processors [Cole88b]. Those arcs Ai that contain the origin R such that yi > 27r and (y- 27r) > xi are handled by merging the list of counterclockwise endpoints with yj - 27r, j = 1, 2, . . - n, and finding new successors. To check if no cover exists, a test if SUCC(Ai) = Ai for any arc Ai suffices. If a cover exists, it is found using the following idea from [Lee84a]: If the size ml of the minimum circle cover starting from an arbitrary arc is known, the size of the minimum circle cover for the set is either ml or ml -1. In other words, the greedy algorithm starting at an arbitrary arc always produces a cover whose size is at most one more than the size of the minimum circle cover. Once an arbitrary arc has been chosen, 0(n) processors cooperate to find ml in 0(logn) time. Each processor j starts at arc Aj and doubles the number of successor arcs in the cover at each step in parallel until the x endpoint of the arbitrary arc is included in the cover. Processor 1 then finds ml by summing the number of arcs included by each processor. 
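The doubling over the SUCC linked list is an instance of standard pointer jumping (Wyllie's technique). A minimal sequential simulation, under the assumed encoding that the list is an index array whose last element points to itself:

```python
import math

def list_ranks(succ):
    """Wyllie's pointer jumping (the doubling used by the circle-cover
    algorithms).  succ[i] is the element after i in the linked list; the
    last element points to itself.  Returns rank[i] = number of links
    from i to the end.  Each round squares the pointers, so ceil(log2 n)
    rounds suffice; a PRAM runs each round in O(1) with n processors."""
    n = len(succ)
    nxt = list(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    for _ in range(math.ceil(math.log2(n)) if n > 1 else 0):
        # new rank/pointer arrays are built from the old ones, as the
        # synchronous PRAM rounds would do
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

After round k each pointer spans 2^k links, which is exactly why the cover lists above double in length at every step.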
After m1 has been determined, each processor j checks in O(log n) time whether there is a cover of size m1 - 1 starting at Aj.

A slightly different approach is presented in [Boxe89c], but the same running time with the same number of processors is achieved. The 2n endpoints are stored in records with four fields such that the initial values are xi or yi, yi, i, and xc in fields 1, 2, 3, and 4, respectively, for record i. The records are sorted on their first field in O(log n) time using O(n) processors on a CREW PRAM [Cole88b]. To find SUCC(Ai) for each arc Ai, a parallel prefix "max" operation is used on the second field of the records. This operation takes O(log n) time with O(n) processors [Krus85]. For the yi records, this operation gives the maximum yk such that xk ≤ yi < yk, and the index k is placed in the fourth field of each record. Since the arcs can cross R, an additional 2n records with 2π subtracted from the yi's are created and the steps above are repeated on these records. The 4n records resulting from the union of the two sets of records are then sorted by the third field, the index of the endpoint. Each processor i examines four records at sorted positions 4i, 4i + 1, 4i + 2, and 4i + 3 in constant time in parallel to find the index of SUCC(Ai). If there is no cover, there is an arc that is its own successor, which can be determined in O(log n) time with O(n) processors. If there is a solution, then for all arcs Ai that include the origin R, the minimum number of arcs, count_i, required for
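The prefix "max" step can be pictured with a sequential stand-in that, for each position of the sorted record list, yields the largest second-field value seen so far together with the index where it occurs (the parallel scan computes the same pairs in O(log n) time with O(n) processors):

```python
def prefix_max_with_index(vals):
    """Running maximum with its index: out[p] = (max(vals[0..p]), argmax).
    A PRAM parallel-prefix 'max' produces these pairs in O(log n) time;
    here it is a plain left-to-right scan."""
    out, best, best_idx = [], float("-inf"), -1
    for idx, v in enumerate(vals):
        if v > best:
            best, best_idx = v, idx
        out.append((best, best_idx))
    return out
```

The recorded index is what ends up in the fourth field of each record, identifying the successor arc.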
Ai to wrap around on itself is found. A linked list of the successor arcs is a partially ordered list of n elements represented by an array. A modified list ranking procedure is used to find count_i for each arc Ai that includes R in O(log n) time. The minimum count_i is then computed in O(log n) time to find the minimum circle cover.

Finally, an algorithm is given in [Atal89b] that computes a minimum circle cover in O(log n) time using O(n) processors on an EREW PRAM model. If the endpoints of the arcs are sorted, only O(n/log n) processors are needed. This algorithm is cost-optimal based on the Ω(n log n) and Ω(n) lower bounds for unsorted and sorted endpoints, respectively, given in [Lee84a]. For unsorted endpoints, sorting is performed first in O(log n) time using O(n) processors [Cole88b]. The endpoints are labeled such that for indices i and j, i < j means that xi is before xj in a clockwise walk around the circle. This relabeling can be done using parallel prefix in O(log n) time and O(n/log n) processors on an EREW PRAM [Krus85]. In [Atal89b], as well as defining SUCC(Ai) for each arc Ai, the inverse function SUCC^-1(Ai) is defined: SUCC^-1(Ai) = {Aj ∈ S | SUCC(Aj) = Ai}. Note that |SUCC^-1(Ai)| may be greater than 1. The first two steps of the algorithm eliminate arcs properly contained in other arcs and compute SUCC and SUCC^-1 for each arc in S. It is shown that eliminating contained arcs can be accomplished in O(log n) time using O(n/log n) processors using parallel prefix. A method similar to that in [Boxe89c] is used for computing SUCC(Ai) for each arc Ai. A test [requiring O(log n) time and O(n/log n) processors] is then made to see if there is no solution, that is, if for some Ai, SUCC(Ai) = Ai. It is shown how to compute SUCC^-1(Ai) by first proving that for every arc Ai, the arc(s) in SUCC^-1(Ai) occur around the circle C consecutively.
Therefore, the arcs can be "marked" with the indices j for which SUCC(Aj) ≠ SUCC(Aj+1), and SUCC^-1(Ai) can be computed for each Ai ∈ S in O(log n) time using O(n/log n) processors. With the successor function and its inverse computed for each arc and properly contained arcs removed, the minimum circle cover is computed by using a parallel version of the greedy algorithm given in [Lee84a], as was done in [Bert88]. Recall that W is the set of arcs that contain the origin R. A new copy of each Ai ∈ W, New(Ai), is created and the successor function is modified so that every SUCC(Aj) = Ai with Ai ∈ W is changed to SUCC(Aj) = New(Ai), and every SUCC(New(Ai)) is set to null. The result is a forest of |W| trees such that the roots of the trees are the elements of New(W) and the children of a node Aj in a tree are the arcs in SUCC^-1(Aj). The arcs Ai ∈ W are among the leaves of the trees. Since the inverse of the successor function is available, the trees can be computed in O(log n) time with O(n/log n) processors using the Euler tour technique [Tarj85] and list ranking. The trees are then used to find a minimum circle cover by finding the minimum depth over all leaves Ak such that Ak ∈ W.
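The final minimum-depth step can be pictured with a tiny sequential stand-in (the parallel version uses the Euler tour technique and list ranking; the array encoding with None marking a root New(Ai) is an assumption of this sketch):

```python
def min_cover_size(succ, W):
    """succ[a] is the (modified) successor of arc a, or None at a root
    New(Ai).  The depth of a leaf Ak in W, counting Ak itself and the
    root copy, is the size of the greedy cover starting at Ak; the
    minimum circle cover is the smallest such depth."""
    def depth(a):
        d = 1
        while succ[a] is not None:  # walk toward the root of a's tree
            a = succ[a]
            d += 1
        return d
    return min(depth(a) for a in W)
```

A parallel implementation replaces the walk by tree depths computed once for all nodes, which is why the inverse function SUCC^-1 (giving each node's children) is needed.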
Summary

The table in Figure 9.2 summarizes the results in this section. Note that q - 1 is the minimum number of arcs crossing any point of the circle.
Problem              Reference   Model       Processors           Running time
-------------------  ----------  ----------  -------------------  ------------
Minimum cardinality  [Bert88]    CREW PRAM   O(n^2/log n + qn)    O(log n)
                     [Sark89a]   CREW PRAM   O(n)                 O(log n)
                     [Boxe89c]   CREW PRAM   O(n)                 O(log n)
                     [Atal89b]   EREW PRAM   O(n)                 O(log n)
Minimum weight       [Bert88]    CREW PRAM   O(n^3/log n)         O(log^2 n)

Figure 9.2 Performance comparison of parallel minimum circle cover algorithms.
Figure 9.3 Set of n = 16 points and EMST of set.
9.2 Euclidean Minimum Spanning Tree

Given a set S of n points in the plane, a Euclidean spanning tree of S is a tree linking the points of S with straight-line edges. A Euclidean minimum spanning tree (EMST) of S is one for which the total (Euclidean) length of the edges is smallest among all such trees. The lower bound for computing the EMST of a set of points on a single processor is Ω(n log n) [Prep85]. Figure 9.3 shows a set of n = 16 points and the EMST of the set. It is shown in [Mill89b] how the EMST can be computed in parallel using the algorithm of Sollin described in [Good77] on a mesh with O(n) processors. Initially, each point is viewed as a connected component. At each iteration, every component is connected by an edge to its nearest (component) neighbor, thus forming a new component for the next iteration. Since each iteration reduces the number of components by at least a factor of 2, the EMST is found after at most log n iterations. The algorithm uses the procedure for finding ANN on a mesh (see Section 7.1) and runs in O(n^1/2 log n) time. It operates on the implicit complete graph connecting the points of S. By starting with a sparser graph guaranteed to contain the EMST, such as the Delaunay triangulation, more efficient algorithms may be obtained. Indeed, it is pointed out in [Mill89b] that an O(n^1/2)-time (and hence optimal) algorithm for the mesh can be obtained based on an algorithm for computing the Voronoi diagram [Jeon90]. Another EMST algorithm that runs on the mesh and is based on first finding the Voronoi diagram for S appears in [Lu86a]. The running time of this algorithm is O(n^1/2 log n) on an O(n)-processor mesh. (Voronoi diagram construction is discussed in Chapter 8.) We note in passing that the EMST is only a special case of the minimum spanning tree (MST) problem defined for general connected and weighted graphs. Many algorithms for computing the MST in parallel exist [Akl89a], which are of course applicable to the EMST problem. However, the algorithms discussed in the preceding paragraph were singled out because they exploit the geometric properties of the EMST.
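Sollin's iteration (also known as Borůvka's algorithm) can be sketched sequentially as follows; each round of the loop is what the mesh algorithm performs in parallel. The union-find bookkeeping is an implementation convenience of this sketch, not part of the mesh algorithm, and distinct pairwise distances are assumed (the final find check guards against ties):

```python
import math

def emst_boruvka(points):
    """Sollin/Boruvka EMST: each round, every component adds its shortest
    outgoing edge; the component count at least halves per round, so
    there are at most ceil(log2 n) rounds."""
    n = len(points)
    parent = list(range(n))

    def find(x):                       # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    while len(edges) < n - 1:
        best = {}  # component root -> (weight, i, j): cheapest outgoing edge
        for i in range(n):
            for j in range(i + 1, n):
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue           # same component; not an outgoing edge
                w = math.dist(points[i], points[j])
                for r in (ri, rj):
                    if r not in best or w < best[r][0]:
                        best[r] = (w, i, j)
        for w, i, j in best.values():
            if find(i) != find(j):     # avoid adding the same edge twice
                parent[find(i)] = find(j)
                edges.append((i, j))
    return edges
```

The O(n^2) inner search over all pairs is exactly the implicit complete graph mentioned above; seeding the search with Delaunay edges instead is the sparser-graph optimization.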
9.3 Shortest Path

In addition to the minimum spanning tree problem, other graph theoretic problems have been studied in the special case of a geometric setting. One such problem, which we study in this section, is that of computing the shortest path between two points. A third problem, computing perfect matchings, is discussed in the following section. Given a simple polygon P with n vertices and two points s and d in P, the interior shortest path (ISP) problem asks for the shortest path from s to d that lies completely inside P. Figure 9.4 shows a polygon P, two points s and d inside P, and the shortest path from s to d. This problem is solved in [ElGi86b] on an O(n)-processor CREW PRAM in O(log^2 n) time, provided that P is monotone. This result is strengthened in [ElGi88], where it is shown how the same model can solve the ISP problem in O(log n) time for arbitrary simple polygons. The algorithm consists of two major steps, each requiring O(log n) time and O(n) processors. A triangulation TP of P is constructed in the first step using the parallel algorithm of [Good89a]; then its dual TP' (a tree) is obtained, where each edge connects two nodes whose corresponding triangles in TP share an edge. Denote by s' (d') the node in TP' corresponding to the triangle containing s (d). The algorithm of [Tarj85] is then applied to obtain a simple path from s' to d', which corresponds to a simple polygon S contained in P, called a triangulated sleeve. The edges of this triangulated sleeve are arranged in order of increasing distance from s. In the second step, a divide-and-conquer approach is used to compute a shortest path from s to d in the triangulated sleeve.

An algorithm is also given in [ElGi88] for computing the shortest paths from a point inside an n-vertex simple polygon P to the vertices of P. It runs in O(log^2 n) time on an O(n)-processor CREW PRAM. This running time is improved to O(log n) in [Good90a].

Figure 9.4 Polygon P, two points s and d inside P, and shortest path from s to d.

It is also shown in [Good90a] that the farthest neighbor for each vertex
in P (where distance is measured by the shortest path inside P) can be determined in O(log^2 n) time on a CREW PRAM with O(n) processors.

A simple polygon is said to be rectilinear if all of its edges are either horizontal or vertical. Let P be a simple rectilinear convex polygon with n vertices inside of which lie n pairwise disjoint rectangles. The latter are called obstacles. CREW PRAM algorithms are given in [Atal90c] for computing shortest paths inside P that avoid the set of obstacles. Descriptions of shortest paths are obtained in O(log^2 n) time using O(n^2/log^2 n) processors if the source and destination are on the boundary of P, O(n^2/log n) processors if the source is an obstacle vertex and the destination a vertex of P, and O(n^2) processors if both source and destination are obstacle vertices. Using these descriptions, a single processor can obtain the path length in constant time if the source and destination are vertices, and in O(log n) time if they are arbitrary points. The shortest path itself can be retrieved from its description by O(n/log n) processors in O(log n) time.

9.4 Minimum Matchings

Let 2n points in the plane be given, of which n are colored red and n are colored blue. It is required to associate every blue point with exactly one red point such that the sum of the distances between the pairs thus formed is the smallest possible.

Figure 9.5 Two sets of points and minimum-weight perfect matching of sets.

This is a special case (for points in the plane) of the more general minimum-weight perfect matching problem on bipartite graphs, also known as the assignment problem. Figure 9.5 shows two sets of points and a minimum-weight perfect matching of the sets. Two efficient sequential algorithms for the assignment problem in the plane are known [Vaid89]. The first runs in O(n^2.5 log n) time when the distances are measured using either the Euclidean (i.e., L2) or Manhattan (i.e., L1) metric. The second runs in O(n^2 log^3 n) time when the Manhattan metric is used. Parallel algorithms for the assignment problem in the plane that achieve an optimal speedup with respect to the algorithms of [Vaid89] are presented in [Osia90]. The algorithm for Euclidean distances runs in O(n^3/p^2 + n^2.5 log n / p) time, using p processors, where p ≤ n^1/2. It achieves an optimal speedup with respect to the algorithm of [Vaid89] when p ≥ n^1/2/log n. When the distances are measured using the Manhattan metric, an algorithm is given in [Osia90] that solves the assignment problem in the plane on an EREW PRAM with O(log^2 n) processors in O(n^2 log n) time.
In what follows we provide some theoretical background to the assignment problem in the plane, a summary of the algorithm of [Osia90] for the case where the Euclidean metric is used to measure distances, a description of related matching problems, and a number of open questions.
9.4.1 Graph Theoretic Formulation

A bipartite graph is a graph whose set of nodes is the union of two disjoint sets of nodes U and V. No two nodes in either U or V have an arc between them. A complete bipartite graph is a bipartite graph in which there is an arc between every node in U and every node in V. Let G = (U, V) be a complete bipartite graph on the plane induced by two sets of points U and V, with |U| = |V| = n. A matching M of G is a pairing of the points in U with those in V such that every point in U is paired with no more than one point in V, and vice versa. The weight of M is the sum of the distances between the pairs of points in the matching. A perfect matching is a matching M of G such that every point in G is paired with another point. A minimum-weight perfect matching is a matching M of G such that the weight of M is a minimum over all perfect matchings. The assignment problem in the plane is to determine a minimum-weight perfect matching of G.

9.4.2 Linear Programming Formulation

Let d(ui, vj) be the distance between ui ∈ U and vj ∈ V. A linear programming formulation of the assignment problem on the plane is:

    Minimize    Σ_(ui,vj) d(ui, vj) x_ij
    subject to  Σ_j x_ij = 1,   i = 1, 2, ..., n
                Σ_i x_ij = 1,   j = 1, 2, ..., n
                x_ij ≥ 0

with the understanding that the pairing (ui, vj) is in the matching M if and only if x_ij = 1. The constraints of the linear program mean that when a solution is obtained, each point must be paired with exactly one other point. To solve this linear program, a dual to the linear program, which is generally easier to solve, is formulated as:

    Maximize    Σ_i a_i + Σ_j b_j
    subject to  a_i + b_j ≤ d(ui, vj)
                a_i, b_j unconstrained otherwise

where a_i and b_j are the dual variables associated with ui and vj, respectively. These dual variables can be thought of as weights associated with the points. The first constraint means that the weights on the points of a pairing should be no larger than the distance between the points. Orthogonality conditions that are necessary and sufficient for optimality of the primal and dual solutions are:

    x_ij > 0  =>  a_i + b_j = d(ui, vj)
    a_i ≠ 0   =>  Σ_j x_ij = 1,   i = 1, 2, ..., n
    b_j ≠ 0   =>  Σ_i x_ij = 1,   j = 1, 2, ..., n
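To make the primal objective concrete, here is a brute-force reference solver over all n! pairings; this is exponential-time and is only meant to make the problem statement executable, not to stand in for any algorithm from the text:

```python
import itertools
import math

def assignment_brute_force(U, V):
    """Minimum-weight perfect matching between equal-size point sets U
    and V by exhaustive search.  Returns (weight, pairing), where the
    pairing lists index pairs (i, j) meaning ui is matched with vj."""
    n = len(U)
    best_w, best_pairs = float("inf"), None
    for perm in itertools.permutations(range(n)):
        w = sum(math.dist(U[i], V[perm[i]]) for i in range(n))
        if w < best_w:
            best_w = w
            best_pairs = [(i, perm[i]) for i in range(n)]
    return best_w, best_pairs
```

In LP terms, the returned pairing is the set of x_ij equal to 1; every row and column constraint holds because each index appears exactly once.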
An algorithm based on this formulation maintains primal and dual feasibility at all times and, in addition, maintains satisfaction of all orthogonality conditions except the second. The number of points for which the second condition is not satisfied decreases during the course of the computation. An alternating path in G with respect to a matching M is a simple path such that only one of any two consecutive edges e_i, e_i+1 on the path is in the matching. With respect to a matching M, a point is said to be exposed if it is not paired with any other point; otherwise, it is said to be matched (or paired). An alternating tree relative to a matching M is a tree whose root is exposed and the path between the root and any other point in the tree is an alternating path. An alternating path in G joining two distinct exposed points in U and V is called an augmenting path. A matching M that does not have a maximum number of edges can be augmented using an augmenting path P by including in M the edges on P not in M and removing from M the edges on P that are in M. The algorithm searches for a series of augmenting paths. Each time an augmenting path is found, the matching is augmented. A search for an augmenting path is done by growing alternating trees. When alternating trees cannot be grown, dual variables are revised to permit further growth of alternating trees. However, when an augmenting path cannot be found, a solution to the problem has been obtained.
9.4.3 Geometric Formulation

The operation of determining which edge is the next to include in an alternating tree can be reduced to a geometric query problem as follows. The slack s_ij on an edge (ui, vj) is the distance between ui and vj, minus the sum of the dual variables associated with ui and vj [i.e., s_ij = d(ui, vj) - a_i - b_j]. There are two types of points in V: those in an alternating tree and those that are not in any alternating tree. For the latter, we need to determine the edge (ui, vj), where ui is in an alternating tree and vj is not in an alternating tree, such that s_ij is minimum. To do this, weights w(ui), ui ∈ U, and w(vj), vj ∈ V, related to the dual variables, are associated with the points. Now, determination of the edge (ui, vj), not in M and with minimum slack, to be included in an alternating tree, is reduced to a geometric query problem involving the weights. Let F be a forest of alternating trees, and let H represent the sum of the amounts h by which the dual variables change during a phase. The relationships between the weights and the dual variables are given as a_i = w(ui) + H and b_j = w(vj) - H. At the beginning of a phase, H is initialized to 0. When the dual variables are to be revised,
h is added to H instead of revising the dual variables. When a point is included in F, the associated weight is initialized to the dual variable. At the end of a phase, when an augmenting path has been discovered, the matching is augmented and the dual variables are revised using H and the weights associated with the points. To determine the next edge to include in F efficiently, a solution to the following geometric query problem is required: Given a set of points Q and a weight w(p) for each point p ∈ Q, preprocess Q so that for a given query point q, a point in Q nearest to q can be found quickly, where the distance between the points for this query problem is s_ij. A solution to this problem for points on the Euclidean plane can be obtained through the use of the weighted Voronoi diagram (WVD) of the points in Q. A weighted Voronoi diagram partitions the plane into O(|Q|) regions. Each point p ∈ Q has a region Vor(p) associated with it, defined as follows:

    Vor(p) = {p'' | d(p'', p) - w(p) ≤ d(p'', p') - w(p'), for all p' ∈ Q}.
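The query that Vor(p) encodes, stated directly as a linear scan (the whole point of building the WVD is to answer this in O(log n) per query after preprocessing, which the scan below does not attempt):

```python
import math

def weighted_nearest(Q, w, q):
    """Additively weighted nearest neighbor: the point p in Q minimizing
    d(q, p) - w(p).  The query point q lies in Vor(p) exactly when p is
    the returned point; here the answer is found by a plain O(|Q|) scan."""
    return min(Q, key=lambda p: math.dist(q, p) - w[p])
```

A large weight w(p) enlarges p's region, which is how the dual variables bias the search toward points whose slack is small.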
Sequentially, the WVD of a set Q of n points can be computed in O(n log n) time [Fort87], and a query can be answered in O(log n) time [Edel86a].

9.4.4 Parallel Algorithm

The parallel algorithm of [Osia90] for solving the assignment problem on the Euclidean plane is summarized below.

Step 1. Initialize a matching M to an empty set.

Step 2. In parallel, root alternating trees at exposed points in U.

Step 3. If F is empty, then stop; else
  (3.1) In parallel, determine a point vj not in F such that vj is nearest to a point ui in F, using the distance s_ij = d(ui, vj) - w(ui) - w(vj).
  (3.2) If s_ij = 0 and vj is exposed, all processors augment the matching, update dual variables, and go to step 2.
  (3.3) If s_ij = 0, grow an alternating tree by adding (ui, vj) and (vj, uk) ∈ M to F, initialize the weights of ui, vj, and uk to their respective dual variables, and go to step (3.1).
  (3.4) If s_ij > 0, in parallel revise the dual variables using h = s_ij and go to step (3.1).

9.4.5 Related Problems

A different minimum matching problem is considered in each of [Osia91] and [He91]. In [Osia91] we are given 2n points in the plane, all of which are of the same color. It is required to match each point with a single other point so that the sum of the distances
between matched points is a minimum. It is shown in [Osia91] that this problem can be solved on a p-processor EREW PRAM in O(n^2.5 log^4 n / p) time, where p ≤ n^1/2. A restriction of this problem to the case where the 2n points fall on the boundary of a convex polygon is described in [He91], where an algorithm is given for solving the problem in O(log^2 n) time on an O(n)-processor CREW PRAM.

9.4.6 Some Open Questions

Several problems are left open in [Osia90, Osia91]; for example:

1. Can smaller running times be obtained using more processors?
2. What performance can be obtained when using a set of interconnected processors (instead of the PRAM)?
3. Are there efficient parallel algorithms for computing maximum matchings in the plane?

Parallel algorithms for optimization problems other than the ones discussed in this chapter are described in [Agga88] and [Ferr91b].
9.5 Problems

9.1. Design and compare parallel algorithms for solving the minimum cardinality circle cover problem on the following models of parallel computation:
  (a) Mesh
  (b) Mesh with broadcast buses
  (c) Mesh with reconfigurable buses

9.2. Show how the minimum-weight circle cover problem can be solved on a hypercube parallel computer.

9.3. Can the Euclidean minimum spanning tree problem be solved in constant time on the model of computation known as broadcasting with selective reduction?

9.4. For two points p and q inside a rectilinear polygon P, define a smallest path from p to q as a rectilinear path that minimizes both the distance and the number of line segments in the path. Given P, p, and q, design a parallel algorithm for computing the smallest path from p to q on a mesh of processors.

9.5. Given a simple polygon P, an external shortest path between two vertices p and q of P, denoted SP(p, q), is a polygonal chain of vertices of minimum length that avoids the interior of P. The external diameter of P is the SP(p, q) of maximum length over all pairs of vertices p and q of P. Design an algorithm for computing the external diameter of a simple polygon P on the EREW PRAM model of parallel computation.

9.6. Investigate various approaches to computing in parallel a maximum-weight perfect matching of a set of 2n points in the plane.

9.7. Given a convex polygon P with n vertices and an integer k ≥ 3, it is required to compute the minimum area k-gon that circumscribes P.
  (a) Design a parallel algorithm for solving this problem for the case k = 3.
  (b) What can be said about parallel solutions when k ≥ 4?

9.8. The following problem is known as the maximum empty rectangle (MER) problem: Given an isothetic rectangle R1 and a set of points P inside R1, it is required to find an isothetic rectangle R2 of maximum area such that R2 is completely contained in R1 and does not contain any points from P. Design a hypercube algorithm for solving the MER problem.

9.9. You are given a collection P of planar points, an integer C, and a radius R. The elements of P may be viewed as customers of a set F of facilities to be located in the plane such that each facility has capacity C. It is required to find a set F of planar points so that each customer in P can be assigned to some facility in F with distance at most R, and so that no facility has more than C customers assigned to it. Design a parallel algorithm for solving this problem.

9.10. Design a parallel algorithm for the model of your choice that computes a smallest radius disk that intersects every line segment in a set of n line segments in the plane.

9.11. Given a simple rectilinear polygon P, it is required to cover P with a minimum number of squares, possibly overlapping, all interior to P. Design a parallel algorithm for solving this problem.

9.12. A set P of points is given in the plane. The Euclidean traveling salesperson problem calls for finding a simple polygon whose vertices are the points of P and whose perimeter is the shortest possible. This problem is believed to be very hard to solve sequentially in time polynomial in the number of points of P [Prep85]. Design a parallel algorithm that combines the minimum spanning tree and the minimum-weight perfect matching to obtain a solution to the Euclidean traveling salesperson problem that is no worse than 1.5 times the optimal.
9.6 References

[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. O'Dunlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327.
[Akl89a] S. G. Akl, The Design and Analysis of Parallel Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[Atal89b] M. J. Atallah and D. Z. Chen, An optimal parallel algorithm for the minimum circle-cover problem, Information Processing Letters, Vol. 32, 1989, 159-165.
[Atal90c] M. J. Atallah and D. Z. Chen, Parallel rectilinear shortest paths with rectangular obstacles, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 270-279.
[Bert88] A. A. Bertossi, Parallel circle-cover algorithms, Information Processing Letters, Vol. 27, 1988, 133-139.
[Boxe89c] L. Boxer and R. Miller, A parallel circle-cover minimization algorithm, Information Processing Letters, Vol. 32, 1989, 57-60.
[Cole88b] R. Cole, Parallel merge sort, SIAM Journal on Computing, Vol. 17, No. 4, August 1988, 770-785.
[Edel86a] H. Edelsbrunner, L. J. Guibas, and J. Stolfi, Optimal point location in a monotone subdivision, SIAM Journal on Computing, Vol. 15, 1986, 317-340.
[ElGi86b] H. ElGindy, A Parallel Algorithm for the Shortest Path Problem in Monotone Polygons, Technical Report MS-CIS-86-49, Department of Computer and Information Science, Faculty of Engineering and Applied Science, University of Pennsylvania, Philadelphia, May 1986.
[ElGi88] H. ElGindy and M. T. Goodrich, Parallel algorithms for shortest path problems in polygons, The Visual Computer, Vol. 3, 1988, 371-378.
[Ferr91b] A. G. Ferreira and J. G. Peters, Finding smallest paths in rectilinear polygons on a hypercube multiprocessor, Proceedings of the Third Canadian Conference on Computational Geometry, Vancouver, British Columbia, August 1991, 162-165.
[Fort87] S. Fortune, A sweepline algorithm for Voronoi diagrams, Algorithmica, Vol. 2, 1987, 153-174.
[Good77] S. E. Goodman and S. T. Hedetniemi, Introduction to the Design and Analysis of Algorithms, McGraw-Hill, New York, 1977, Section 5.5.
[Good89a] M. T. Goodrich, Triangulating a polygon in parallel, Journal of Algorithms, Vol. 10, September 1989, 327-351.
[Good90a] M. T. Goodrich, S. B. Shauck, and S. Guha, Parallel methods for visibility and shortest path problems in simple polygons, Proceedings of the Sixth Annual Symposium on Computational Geometry, Berkeley, California, June 1990, 73-82.
[He91] X. He, An efficient parallel algorithm for finding minimum weight matching for points on a convex polygon, Information Processing Letters, Vol. 37, No. 2, January 1991, 111-116.
[Jeon90] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on a mesh-connected computer, Algorithmica, Vol. 5, No. 2, 1990, 155-178.
[Krus85] C. P. Kruskal, L. Rudolf, and M. Snir, The power of parallel prefix, Proceedings of the 1985 International Conference on Parallel Processing, St. Charles, Illinois, August 1985, 180-185.
[Lee84a] C. C. Lee and D. T. Lee, On a circle-cover minimization problem, Information Processing Letters, Vol. 18, 1984, 109-115.
[Lu86a] M. Lu, Constructing the Voronoi diagram on a mesh-connected computer, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 806-811.
[Mill89b] R. Miller and Q. F. Stout, Mesh computer algorithms for computational geometry, IEEE Transactions on Computers, Vol. C-38, No. 3, March 1989, 321-340.
[Osia90] C. N. K. Osiakwan and S. G. Akl, Efficient Parallel Algorithms for the Assignment Problem on the Plane, Technical Report 90-284, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1990.
[Osia91] C. N. K. Osiakwan, Parallel computation of weighted matchings in graphs, Ph.D. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1991.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Sark89a] D. Sarkar and I. Stojmenović, An optimal parallel circle-cover algorithm, Information Processing Letters, Vol. 32, July 1989, 3-6.
[Stoj88b] I. Stojmenović and M. Miyakawa, An optimal parallel algorithm for solving the maximal elements problem in the plane, Parallel Computing, Vol. 7, 1988, 249-251.
[Tarj85] R. E. Tarjan and U. Vishkin, An efficient parallel biconnectivity algorithm, SIAM Journal on Computing, Vol. 14, 1985, 862-874.
[Vaid89] P. M. Vaidya, Geometry helps in matching, SIAM Journal on Computing, Vol. 18, No. 6, December 1989, 1201-1225.
10  Triangulation of Polygons and Point Sets

In this chapter we review parallel algorithms for the following two problems:

1. Decomposing simple polygons into trapezoids; such decompositions are used in planar point location (see Chapter 5), as well as the triangulation of simple polygons (also discussed in this chapter).
2. Triangulating point sets; this problem has practical applications in the finite-element method and in numerical analysis.

10.1 Trapezoidal Decomposition and Triangulation of Polygons

The trapezoidal decomposition or trapezoidal map of a polygon P is the decomposition of P into trapezoids. Given a simple n-vertex polygon P, (vertical) trapezoidal edge(s) are determined for each vertex. A trapezoidal edge for vertex v is an edge e of P that is directly above or below v such that the vertical line segment from v to e is inside P. Figure 10.1 shows a polygon P and a trapezoidal decomposition of P. A triangulation of a simple n-vertex polygon P is the augmentation of P with diagonal edges (or chords) connecting vertices of P such that in the resulting decomposition, every face is a triangle. Figure 10.2 shows a triangulation of the polygon P shown in Figure 10.1. In this section we present algorithms that compute the trapezoidal map of a simple polygon and algorithms that, given a trapezoidal map of a simple polygon P, triangulate P. An algorithm that decomposes an n-vertex simple polygon P (possibly with holes) into trapezoids is given in [Asan88] with running time O(n) on a linear array of size n. Using the trapezoidal decomposition of P, P is decomposed into monotone polygons in O(n) time. Each monotone polygon is then triangulated sequentially by one processor in O(n) time.
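The trapezoidal edges just defined can be made concrete with a brute-force O(n^2) sequential sketch. Note an assumed simplification: the inside-P condition from the definition is approximated by taking the nearest non-incident edge hit by the vertical ray, which agrees with it for rays shot from the polygon's interior side:

```python
def trapezoid_edges(poly):
    """For each vertex i of a simple polygon poly (list of (x, y) tuples,
    counterclockwise), return (above, below): the indices of the nearest
    non-incident edges hit by vertical rays shot up and down from the
    vertex, or None if no edge is hit.  Edge j runs from poly[j] to
    poly[(j + 1) % n]."""
    n = len(poly)
    result = []
    for i, (vx, vy) in enumerate(poly):
        above = below = None            # (distance, edge index)
        for j in range(n):
            if j == i or (j + 1) % n == i:
                continue                # skip the two edges incident to i
            (ax, ay), (bx, by) = poly[j], poly[(j + 1) % n]
            if not (min(ax, bx) <= vx <= max(ax, bx)) or ax == bx:
                continue                # edge does not cross the line x = vx
            y = ay + (by - ay) * (vx - ax) / (bx - ax)  # edge height at vx
            if y > vy and (above is None or y - vy < above[0]):
                above = (y - vy, j)
            elif y < vy and (below is None or vy - y < below[0]):
                below = (vy - y, j)
        result.append((above[1] if above else None,
                       below[1] if below else None))
    return result
```

For a reflex vertex such as the dip of a polygon whose top boundary descends toward it, only the downward ray yields a trapezoidal edge, which matches the definition above.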
Figure 10.1 Polygon P and trapezoidal decomposition of P.
In [Jeon90] an algorithm for multilocating points in a set of nonintersecting line segments is described (see Chapter 5). It is shown how the trapezoidal decomposition of a simple polygon P and a triangulation of P can be constructed by direct application of this algorithm. The multilocation algorithm runs on a mesh of size n in O(n^(1/2)) time. An algorithm for point location on a hypercube presented in [Dehn90] implies O(log^2 n)-time solutions to trapezoidal decomposition and triangulation of a simple polygon on a hypercube of size O(n log n). Randomized algorithms for computing the trapezoidal decomposition of a set of nonintersecting line segments and the triangulation of a simple polygon are given in [Reif90]. Each algorithm runs in O(log n) probabilistic time on an O(n)-processor butterfly network.

Several algorithms on the CREW PRAM model of computation also exist for computing the trapezoidal map of a simple polygon and triangulating a simple polygon. The algorithm of [Agga88] described in Chapter 4 decides whether any two line segments in a set of line segments intersect and, if not, computes the vertical trapezoidal decomposition for the set of line segments. The trapezoidal decomposition of a simple polygon P is found in O(log^2 n) time using O(n) processors or in O(log n) time using O(n log n) processors. From the trapezoidal decomposition G, the polygon is partitioned into monotone polygons: let t be a trapezoid in G and let c be a reflex corner of P such that c lies in the relative interior of a vertical edge of t; a diagonal edge is added that joins c to the corner of P on the opposite vertical edge of t. This can be done in constant time using O(n) processors, resulting in the partition of P into a
Figure 10.2 Triangulation of P.

set of horizontally monotone polygons. Horizontally monotone polygons consist of an upper chain and a lower chain of edges that meet at two extremal points, a leftmost and a rightmost point. From the horizontally monotone partition, P is partitioned into one-sided monotone polygons. A polygon Q is a one-sided monotone polygon if it is monotone and it has one distinguished edge s such that the vertices of Q are all above or below s except for the endpoints of s. The endpoints of s are the extremal points of a horizontally monotone polygon. Let Q be a one-sided monotone polygon with q vertices. Without loss of generality, assume that the distinguished edge s is below the vertices of Q. To triangulate Q, divide Q into q^(1/2) sections using q^(1/2) vertical lines and find the lower hull of the part of Q above s in each section. The parts of Q above each hull are recursively triangulated by utilizing a multiway divide-and-conquer technique. The common tangent lines between each pair of lower hulls are computed iteratively until there is only a single lower hull left. This process results in a partial triangulation with the remaining parts to be triangulated having a similar shape. This shape is called a funnel polygon by Goodrich in [Good89a]. A funnel polygon is a one-sided monotone polygon that consists of a single edge followed by a convex chain followed by a single edge (or a vertex) followed by another convex chain. Figure 10.3 shows examples of funnel polygons. A funnel polygon K is triangulated in O(log k) time using O(k) processors, where k is the number of vertices of K. The remaining section to be triangulated is bounded by the
distinguished edge s with vertices l and r (left and right, respectively) on the bottom, and the final lower hull boundary LH on the top, with two edges connecting l to the left extreme of LH and r to the right extreme of LH. This portion is triangulated easily since each point on the lower hull is visible from either l or r.

Figure 10.3 Funnel polygons.

The triangulation algorithm given in [Agga88] runs in O(log n) time using O(n) processors if the trapezoidal map for the polygon to be triangulated is given. Otherwise, it runs in O(log^2 n) time using O(n) processors or in O(log n) time using O(n log n) processors. A similar algorithm is presented in [Atal86b]. There, a simple polygon is decomposed into trapezoids in O(log n log log n) time using O(n) processors and is triangulated, given the trapezoidal decomposition, in O(log n) time using O(n) processors.

In [Atal89c], cascading divide-and-conquer is used to construct a trapezoidal decomposition of a simple polygon P in O(log n) time using O(n) processors. A plane sweep tree T is constructed for the line segments that make up P, and T is made into a fractional cascading data structure. Both operations take O(log n) time with O(n) processors. Each vertex p of P is multilocated in T; this yields the edge of P that is directly above (or below) p. Given this trapezoidal decomposition or trapezoidal map of P, the algorithm in [Good89a] finds a triangulation of P in O(log n) time using O(n/log n) processors, which improves on the result of [Atal86b] and [Agga88] by a factor of log n processors. The result of Goodrich uses a method similar to those of [Atal86b] and [Agga88].

Given a trapezoidal decomposition of a simple polygon P, [Yap88] presents an algorithm that triangulates P in O(log n) time using O(n) processors by making two calls to a trapezoidal map algorithm such as the one given in [Atal89c]. This algorithm avoids the steps in [Atal86b], [Agga88], and [Good89a] that perform heterogeneous operations such as calls to the trapezoidal map algorithm, convex hull construction, and multiway divide-and-conquer. Instead, it performs just two calls to the trapezoidal map algorithm and, as such, is more elegant.

Finally, randomized algorithms are given in [Reif87] that find the trapezoidal decomposition of a simple polygon and triangulate a simple polygon in O(log n) probabilistic time using O(n) processors.
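The lower hulls on which the one-sided monotone and funnel constructions above rely can be computed sequentially with Andrew's monotone chain; a minimal sketch (function names are our own):

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(points):
    """Lower convex hull of a point set (Andrew's monotone chain).

    Points are processed in x-sorted order; a point is popped from the
    tentative hull while it fails to make a strict left turn, so the
    surviving chain is the lower boundary of the convex hull.
    """
    hull = []
    for p in sorted(points):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull
```

For example, the lower hull of {(0,0), (1,1), (2,0)} is just the two bottom points, since (1,1) lies above the segment between them.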
Reference                Model          Processors    TD time              T time
[Asan88]                 Linear array   O(n)          O(n)                 O(n)†
[Jeon90]                 Mesh           O(n)          O(n^(1/2))           O(n^(1/2))
[Reif90]                 Butterfly      O(n)          O(log n)             O(log n)
[Dehn90]                 Hypercube      O(n log n)    O(log^2 n)           O(log^2 n)
[Atal86b]                CREW PRAM      O(n)          O(log n log log n)   O(log n)
[Agga88]                 CREW PRAM      O(n)          Given a TD           O(log n)
[Agga88]                 CREW PRAM      O(n)          O(log^2 n)           O(log n)
[Atal89c]                CREW PRAM      O(n)          O(log n)             O(log n)
[Good89a]                CREW PRAM      O(n/log n)    Given a TD           O(log n)
[Yap88]                  CREW PRAM      O(n)          O(log n)             O(log n)
[Reif87]                 CREW PRAM      O(n)          O(log n)             O(log n)

Figure 10.4 Performance comparison of parallel polygon triangulation algorithms.
Summary

The table in Figure 10.4 summarizes the results for polygon triangulation in parallel. There are two running times given:

TD time: the time to construct the trapezoidal decomposition.
T time: the time to construct a triangulation given a trapezoidal decomposition.

In [Asan88], the polygon is partitioned into monotone polygons in parallel, and the time marked with a † is the time to triangulate each monotone polygon with one processor.
10.2 Triangulation of Point Sets

Triangulating a set S of n points requires partitioning the convex hull of S into triangles such that the vertex set of the partition is the set of points. The problem of triangulating a set of points in the plane is more difficult than triangulating a simple polygon, since the sequential lower bound for triangulating a set of points is Ω(n log n) [Prep85], whereas a linear-time algorithm exists for triangulating a simple polygon [Chaz90]. Conceptually, one can see that if there exists an algorithm to triangulate a simple polygon, it can be used to triangulate a point set S by first constructing a simple polygon P from S, then finding the convex hull of P. Each of P and the polygons formed between P and the convex hull edges can then be triangulated using a polygon triangulation algorithm. The parallel algorithms described in this section compute arbitrary triangulations of point sets. On the other hand, the Delaunay triangulation is the triangulation of a set of points
Figure 10.5 Arbitrary triangulation of a set of points.
S such that the minimum angle of its triangles is a maximum over all triangulations [Saxe90]. Since the Delaunay triangulation of S is the dual of the Voronoi diagram of S, algorithms for computing Delaunay triangulations are discussed in Chapter 8. Figure 10.5 illustrates an arbitrary triangulation of a set of points.

An algorithm is given in [Chaz84] that triangulates a set S of n points in the plane on a linear array of size n in O(n) time. The convex hull of S, CH(S), is found in O(n) time using an algorithm also given in [Chaz84]. CH(S) is then partitioned into h triangles, where h is the number of points on CH(S), by adding an edge from one point of CH(S) to every other point of CH(S). The rest of the points in S are then triangulated inside each of the triangles. Processors store either the edges of CH(S) or the edges of a triangle in clockwise order. When a point p is added to the triangulation, it is passed through the array to test if it is inside the face of a triangle R and, if so, R is replaced by three triangles made up by joining p to the vertices of R. This causes the rest of the information stored in the linear array of processors to "ripple" down the array in linear time. Points can also be added to the triangulation that are outside CH(S) by using an algorithm similar to Chazelle's convex hull algorithm [Chaz84].

Three more algorithms are considered for triangulating point sets in parallel. They all run on the CREW PRAM. The first two triangulate sets of points in the plane and the third one triangulates points in d-dimensional space. The algorithm in [Merk86] triangulates a set of points S in the plane and runs in O(log n) time using O(n) processors. It reduces the problem of triangulating points inside the convex hull of S to triangulating points inside triangles. The convex hull of S is found in O(log n) time using O(n) processors by an algorithm such as the one in [Atal86a] described in Chapter 3.
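The local update step in the linear-array scheme of [Chaz84] — replacing a triangle R by three triangles when the new point p falls inside it — can be sketched sequentially as follows (a toy model; the systolic "ripple" through the array is not represented, and the function names are ours):

```python
def point_in_triangle(p, tri):
    """True if p lies inside (or on the boundary of) triangle tri = (a, b, c)."""
    (ax, ay), (bx, by), (cx, cy) = tri
    px, py = p
    def side(x1, y1, x2, y2):
        # sign of the cross product: which side of edge (x1,y1)->(x2,y2) p is on
        return (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    d1, d2, d3 = side(ax, ay, bx, by), side(bx, by, cx, cy), side(cx, cy, ax, ay)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def insert_point(triangles, p):
    """Replace the first triangle containing p by three triangles joining
    p to its corners; all other triangles are left unchanged."""
    out, done = [], False
    for tri in triangles:
        if not done and point_in_triangle(p, tri):
            a, b, c = tri
            out.extend([(a, b, p), (b, c, p), (c, a, p)])
            done = True
        else:
            out.append(tri)
    return out
```

Inserting one interior point thus grows the triangulation by exactly two triangles, which is the invariant the array exploits.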
The lowest rightmost point X is found and the rest of the points pi are sorted by the angle θi that pi makes with X in the positive x-direction. The sorted sequence
is split by lines through the extreme points of S and X. These splitting lines partition a convex polygon into triangles. Consider a subsequence of points pl, pl+1, ..., pr-1, pr in one of these partitions, where pl and pr are the left and right extreme points that bound the subsequence, respectively, in a clockwise direction around CH(S). The height δi from the line through X parallel to the line through (pl, pr) is calculated for each point pi in [pl, pr]. An algorithm called simple triangulation, which takes a subsequence such as the one defined above and triangulates it, runs in O(log k) time using O(k) processors, where k is the number of points in the subsequence. Since a point can be in at most two subsequences, the processors can be divided among the subsequences so that O(k) processors are used for each subsequence. The simple triangulation algorithm splits a subsequence of size k into k^(1/2) subsequences of size k^(1/2) and recursively triangulates each in a multiway divide-and-conquer process. The δi's are used to connect points to their left higher and right higher neighbors in the subsequence and down to the point X. Merging the triangulated subsequences in O(log n) time with O(n) processors is accomplished using a data structure similar to a segment tree [Bent80]. The entire algorithm takes O(log n) time with O(n) processors on a CREW PRAM.
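The first step of [Merk86] — sorting the points by angle about the lowest rightmost point X — also yields a simple (star-shaped) polygon on the point set, which is exactly the reduction mentioned at the start of this section. A sequential sketch, ignoring tie-breaking subtleties for points collinear with X (the helper name is ours):

```python
import math

def sort_about_lowest(points):
    """Sort points by angle about the lowest rightmost point X.

    Returns (X, rest), where rest is ordered by the angle each point
    makes with X measured from the positive x-direction (ties broken
    by squared distance). Visiting X followed by rest traces a
    star-shaped, hence simple, polygon on the point set.
    """
    x_pt = min(points, key=lambda p: (p[1], -p[0]))  # lowest, then rightmost
    rest = [p for p in points if p != x_pt]
    rest.sort(key=lambda p: (math.atan2(p[1] - x_pt[1], p[0] - x_pt[0]),
                             (p[0] - x_pt[0]) ** 2 + (p[1] - x_pt[1]) ** 2))
    return x_pt, rest
```

Because every point of the ordered sequence is visible from X, any two consecutive points and X bound an empty wedge, which is what makes the angular order safe to triangulate against.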
An algorithm in [Wang87] achieves the same running time and also uses multiway divide-and-conquer, but does not first reduce the problem of triangulating points in a convex hull to triangulating points in a triangle. The set of points is partitioned into n^(1/2) subsets of size n^(1/2) each. The problem is solved recursively on each subset and, during the algorithm, the convex hull of each of the subsets is created. The upper hulls of the n^(1/2) convex hulls, n^(1/2) - 1 of their pairwise common upper supporting lines, and n^(1/2) - 1 "middle" lines connecting every two adjacent sets of points, form n^(1/2) - 1 funnel polygons. (Funnel polygons were described in the preceding section.) An algorithm is presented that triangulates funnel polygons in O(log m) time using O(m) processors, where m is the number of vertices in the funnel polygon. A supporting line lij is chosen for a pair of convex hulls CHi and CHj (CHj to the left of CHi), where the slope of lij is smaller than the slope of all supporting lines between CHj and hulls to the left of CHi. The supporting lines can be found in O(log n) time with one processor [Over81] and, since there are at most n supporting lines, in O(log n) time using O(n) processors. The algorithm to triangulate each funnel polygon is called, and the entire process is repeated for the lower hulls. It is shown that allocating O(n) processors to n^(1/2) - 1 funnel polygons, so that each funnel polygon P of size m is allocated m - 2 processors, can be done in O(log n) time using O(n) processors on the CREW PRAM.

The algorithm in [Wang87] is adapted to run on an O(n)-processor hypercube by MacKenzie and Stout [MacK90a] by dividing the set of points into n^(1/4) subsets of size n^(3/4) each. At each stage of the recursion, O(SORT(n)) time is used, where SORT(n) is the time needed to sort n numbers on an O(n)-processor hypercube. The total time required is t(n) = t(n^(3/4)) + O(SORT(n)), which is O(SORT(n)). At the time of this writing, the fastest known sorting algorithm on a hypercube of size n has running time O(log n log log n) [Cyph90, Leig91]. Using this sorting algorithm, the triangulation algorithm of [MacK90a] runs in O(log n log log n) time.
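The claim that t(n) = t(n^(3/4)) + O(SORT(n)) solves to O(SORT(n)) follows because log(n^((3/4)^i)) = (3/4)^i log n shrinks geometrically with the level i, so the per-level sort costs form a nearly geometric series dominated by the top level. A quick numerical check, modeling SORT(n) as log n · log log n per [Cyph90, Leig91] (the cutoff constant and cost model here are illustrative assumptions):

```python
import math

def sort_cost(n):
    """Model SORT(n) = log n * log log n (hypercube sorting bound)."""
    return math.log2(n) * math.log2(max(2.0, math.log2(n)))

def total_cost(n):
    """Sum SORT(m) over recursion levels m = n, n^(3/4), n^((3/4)^2), ...,
    stopping once the subproblem reaches constant size."""
    total, m = 0.0, float(n)
    while m > 16:
        total += sort_cost(m)
        m = m ** 0.75
    return total

# The ratio total_cost(n) / sort_cost(n) stays bounded by a small
# constant as n grows, illustrating t(n) = O(SORT(n)).
```

Evaluating the ratio at, say, n = 2^40 and n = 2^60 shows it hovering near 3 rather than growing, consistent with the geometric-series argument.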
Finally, an algorithm given in [ElGi86a] triangulates a point set in arbitrary dimensions in O(log^2 n) time using O(n/log n) processors on a CREW PRAM.

Reference                 Model          Processors    Running time
[Chaz84]                  Linear array   O(n)          O(n)
[Merk86], [Wang87]        CREW PRAM      O(n)          O(log n)
[MacK90a] with [Leig91]   Hypercube      O(n)          O(log n log log n)
[ElGi86a]†                CREW PRAM      O(n/log n)    O(log^2 n)

Figure 10.6 Performance comparison of parallel algorithms for triangulating point sets.
Summary

The table in Figure 10.6 summarizes the results in this section. All of the results are for triangulating n points in the plane except for the reference marked with a †, which triangulates n points in arbitrary dimensions.
10.3 Problems

10.1. Show how an n-vertex simple polygon can be triangulated on the following processor networks:
(a) Tree
(b) Butterfly
(c) Pyramid

10.2. Design a hypercube algorithm for trapezoidal decomposition and triangulation of a simple polygon with n vertices, whose cost is O(n log^2 n).

10.3. Design a CREW PRAM algorithm that decomposes a simple polygon with n vertices into trapezoids and whose cost is O(n). Can this performance be obtained on an EREW PRAM?

10.4. Show how a triangulation of a set of points in the plane can be computed on each of the following models of computation:
(a) Mesh
(b) Tree
(c) Modified AKS network
(d) Broadcasting with selective reduction (BSR)

10.5. Design a mesh-of-trees algorithm for computing a triangulation of a set of points in d-dimensional space.

10.6. Given a set S of points in the plane, a minimum-weight triangulation of S is a triangulation T such that the sum of the Euclidean lengths of its edges is a minimum over all triangulations of S. Design a parallel algorithm for computing T on each of the following models of computation:
(a) Mesh
(b) Hypercube
(c) CRCW PRAM
(d) Broadcasting with selective reduction (BSR)

10.7. Repeat Problem 10.6 for the case where the points of S form a simple polygon.

10.8. Repeat Problem 10.6 for the case where the points of S lie on a set L of m straight nonvertical lines, numbered 1 to m, and satisfying the following two properties:
(i) No two lines intersect inside CH(S).
(ii) All the points of S on line i are above line i + 1 and below line i - 1.

10.9. Design a parallel algorithm for triangulating a simple polygon with holes.

10.10. Design parallel algorithms for computing the following two geometric structures for a set P of points in the plane [Prep85]:
(a) The Gabriel graph of P has an edge between points pi and pj of P if and only if the disk with the segment (pi, pj) as diameter contains no point of P in its interior.
(b) The relative neighborhood graph of P has an edge between points pi and pj of P if and only if d(pi, pj), the distance from pi to pj, satisfies d(pi, pj) <= min over all pk in P, pk distinct from pi and pj, of max(d(pi, pk), d(pj, pk)).
10.11. A set of points P in d-dimensional space, where d > 2, is given.
(a) Define the concept of a triangulation T of P.
(b) Design a parallel algorithm for computing T on a hypercube computer.

10.12. A set of points P in d-dimensional space, where d > 2, is given.
(a) Provide a definition for the concept of the Delaunay triangulation T of the convex hull of P.
(b) Design a parallel algorithm for computing T [Beic90].
10.4 References

[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó'Dúnlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327.
[Asan88] T. Asano and H. Umeo, Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region, Parallel Computing, Vol. 6, 1988, 209-216.
[Atal86a] M. J. Atallah and M. T. Goodrich, Efficient parallel solutions to some geometric problems, Journal of Parallel and Distributed Computing, Vol. 3, 1986, 492-507.
[Atal86b] M. J. Atallah and M. T. Goodrich, Efficient plane sweeping in parallel (preliminary version), Proceedings of the Second Annual ACM Symposium on Computational Geometry, Yorktown Heights, New York, June 1986, 216-225.
[Atal89c] M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading divide-and-conquer: a technique for designing parallel algorithms, SIAM Journal on Computing, Vol. 18, No. 3, 1989, 499-532.
[Beic90] I. Beichl and F. Sullivan, A robust parallel triangulation and shelling algorithm, Proceedings of the Second Canadian Conference in Computational Geometry, Ottawa, Ontario, August 1990, 107-111.
[Bent80] J. L. Bentley and D. Wood, An optimal worst case algorithm for reporting intersections of rectangles, IEEE Transactions on Computers, Vol. C-29, 1980, 571-576.
[Chaz84] B. Chazelle, Computational geometry on a systolic chip, IEEE Transactions on Computers, Vol. C-33, No. 9, September 1984, 774-785.
[Chaz90] B. Chazelle, Triangulating a simple polygon in linear time, Proceedings of the Thirty-First Annual Symposium on Foundations of Computer Science, St. Louis, October 1990, Vol. I, 220-230.
[Cyph90] R. Cypher and C. G. Plaxton, Deterministic sorting in nearly logarithmic time on the hypercube and related computers, Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, Baltimore, May 1990, 193-203.
[Dehn90] F. Dehne and A. Rau-Chaplin, Implementing data structures on a hypercube multiprocessor, and applications in parallel computational geometry, Journal of Parallel and Distributed Computing, Vol. 8, 1990, 367-375.
[ElGi86a] H. ElGindy, An optimal speed-up parallel algorithm for triangulating simplicial point sets in space, International Journal of Parallel Programming, Vol. 15, No. 5, 1986, 389-398.
[Good89a] M. T. Goodrich, Triangulating a polygon in parallel, Journal of Algorithms, Vol. 10, September 1989, 327-351.
[Jeon90] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on a mesh-connected computer, Algorithmica, Vol. 5, No. 2, 1990, 155-178.
[Leig91] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, California, 1991.
[MacK90a] P. D. MacKenzie and Q. F. Stout, Asymptotically efficient hypercube algorithms for computational geometry, Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, October 1990, 8-11.
[Merk86] E. Merks, An optimal parallel algorithm for triangulating a set of points in the plane, International Journal of Parallel Programming, Vol. 15, No. 5, 1986, 399-411.
[Over81] M. H. Overmars and J. van Leeuwen, Maintenance of configurations in the plane, Journal of Computer and System Sciences, Vol. 23, 1981, 166-204.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Reif87] J. H. Reif and S. Sen, Optimal randomized parallel algorithms for computational geometry, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 270-277.
[Reif90] J. H. Reif and S. Sen, Randomized algorithms for binary search and load balancing on fixed connection networks with geometric applications (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 327-337.
[Saxe90] S. Saxena, P. C. P. Bhatt, and V. C. Prasad, Efficient VLSI parallel algorithm for Delaunay triangulation on orthogonal tree network in two and three dimensions, IEEE Transactions on Computers, Vol. C-39, No. 3, March 1990, 400-404.
[Wang87] C. A. Wang and Y. H. Tsin, An O(log n) time parallel algorithm for triangulating a set of points in the plane, Information Processing Letters, Vol. 25, 1987, 55-60.
[Yap88] C. K. Yap, Parallel triangulation of a polygon in two calls to the trapezoidal map, Algorithmica, Vol. 3, 1988, 279-288.
11
Current Trends
The purpose of this chapter is to expose some of the trends in parallel computational geometry that are developing at the time of this writing. We begin by describing a number of algorithms that run on systolic screens and solve problems defined on two-dimensional pictures, such as insertion, deletion, computing shadows, and finding shortest paths. We then present a generalization of the prefix sums problem and show how it leads to the derivation of efficient parallel algorithms for several problems in computational geometry, including finding maximal vectors and ECDF searching. This is followed by a study of the properties of the star and pancake interconnection networks and their use in solving a family of computational geometric problems. Finally, we conclude with a detailed discussion of the model of computation known as broadcasting with selective reduction and its applications. For other expositions of current trends in parallel computational geometry, the reader is referred to [Agga92] and [Good92a].
11.1 Parallel Computational Geometry on a Grid

Current graphics technology uses raster scan devices to display images. These devices can be modeled by a two-dimensional array of picture elements or pixels (see Section 2.2.2). This suggests that it would be useful to consider a form of geometry where objects are composed of pixels. Operations on objects simply deal with collections of pixels. Indeed, a language is proposed in [Guib82] to manipulate pixels in which general graphics operations can be performed. An idea that blends in naturally with the use of pixel operations is the use of massive parallelism. A natural way to use parallelism in this framework is to assign a processor to each pixel with a suitable underlying connection network. Algorithms are described in [Four88] that use the frame buffer or the array of pixels in such a way. This concept was also used in our discussion of systolic screens and their associated algorithms (see, for example, Chapters 3 and 7). In [Akl90b] several common geometric problems are examined and algorithms are provided for their solution incorporating some of the ideas mentioned above.
Figure 11.1 Grid and three regions. Regions can have holes and disconnected components.
These include a geometric search problem, a shadow problem, and a path-in-a-maze problem. This section is devoted to a discussion of these problems and the corresponding algorithms designed to solve them in parallel. Other algorithms for the problems addressed in this section, as well as algorithms for related problems and additional references, can be found in [Beye69], [Nass80], [Won87], [Prea88], [Ayka91], [Dehn91a], and [Dehn91b].

11.1.1 Geometric Search Problem

Consider a square finite subset of the plane denoted by U, partitioned into an N^(1/2) by N^(1/2) square grid. Following the terminology used in the computer graphics literature, we will call each grid square a pixel. We will identify each pixel by its row and column position in U. A region r is defined by a collection of pixels from U and identified with a unique label. Let R = (r1, r2, ..., rn) represent a well-ordered sequence of regions as shown in Figure 11.1. We would like to process operations in the following form:
1. INSERT: Given the description of a new region r and a location in the sequence R, insert r into the sequence.

2. DELETE: Given a region r, delete it from R.

3. RETURN TOP REGION: Given a pixel (i, j), return the region r containing (i, j), such that of all regions containing (i, j), r is the one that appears first in the sequence R. If there is no region in R containing (i, j), return NIL.

These operations are intended to be a formalism of the operations one performs when using a raster graphics display with inputs from an electronic selecting device (e.g., a mouse, light pen, or touch screen). Each region represents an object of the graphics display. A RETURN TOP REGION operation represents the ability to choose an object by selecting a pixel the object contains. The INSERT and DELETE operations reflect the ability to insert and delete objects and move objects around. (Using our formalism, moving an object from one position to another results in a DELETE followed by an INSERT.) The sequence of the regions denotes that objects are layered, and in the case where several objects contain a selected pixel, the one that is on top is the one that is chosen. It is shown in [Akl90] that by using an array of processors (one processor per pixel) this problem can be solved in a simple and efficient manner. In what follows we begin by describing a sequential solution to the problem, designed to run on a single processor. This will serve as our paradigm in presenting the parallel solution of [Akl90], which runs on an array of processors.

Sequential solution. One can view the problem we have described as a combination of two search problems. One search requires us to determine all the regions that contain a given pixel. The second is a search among these regions for the one that appears first in our ordering. A simple solution maintains a priority queue Tij for each pixel (i, j). All regions in R that contain (i, j) will be stored in Tij.
We must update w priority queues to INSERT a region r containing w pixels. Similarly, an update of w priority queues is required to DELETE a region r. A RETURN TOP REGION operation simply examines one priority queue and returns an answer. We must now address the problem of maintaining these priority queues. If we could assign a value to the priority of a region, an update of a priority queue would be straightforward. Using a suitable balanced search tree implementation, we can insert into and delete from the priority queue in time that is proportional to log |Tij|. However, inserting into these priority queues requires knowledge of R. A region's position in R cannot be represented by a fixed rank; rather, the position in R only has meaning relative to other regions in R. We therefore require one additional priority queue, called TR, to maintain the entire sequence R. The operations INSERT and DELETE use TR when updating individual priority queues. Let us first examine the INSERT operation. We are given a region r to be inserted into the sequence R, after a region q. We want to update each Tij corresponding to pixels (i, j) that are contained in r. The correct position in which to insert r into Tij is after the region r*, such that r* is the last region in Tij that precedes q in R. We can identify r* by performing a binary search in Tij. At each comparison we examine the
region s to determine whether it precedes or succeeds q. If s precedes q, we can search in the upper part of the sequence, and if s succeeds q, we search in the lower part of the sequence. After O(log |Tij|) comparisons we can find r*. Each comparison to determine whether s precedes or succeeds q is done by locating s in TR. This computation requires O(log n) operations. Observe that |Tij| <= n. Thus to INSERT a region r consisting of w pixels, we use O(w log^2 n) operations. Using a similar method, we can DELETE a region consisting of w pixels in O(w log^2 n) operations. Given the pixel (i, j), the operation RETURN TOP REGION can be performed in constant time, by returning the first region found in Tij. This scheme uses one memory location for each pixel in the union of all of the regions in R, plus storage for the structure TR. On the average, each pixel is shared by a small number of regions, yielding a storage requirement of O(n). In the worst case, however, each of the N pixels is part of all n regions, and the storage requirement is O(N x n). Similarly, the time to perform an INSERT or DELETE operation is O(N log^2 n).

The analysis above suggests a time-versus-storage trade-off. Instead of using Tij, assume that a copy of the priority queue TR is maintained for each pixel (i, j). In that copy of TR, those regions containing (i, j) are identified by a special label. In this case, the storage requirement is always O(N x n), while the running time of an INSERT or DELETE operation is O(N log n). This reduction of the running time by a factor of O(log n) is due to the fact that all the information required to insert or delete a region is available at each pixel. Operation RETURN TOP REGION can be performed in constant time as before by maintaining, for each pixel (i, j), a pointer to the top region in TR containing (i, j).
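A toy sequential version of this scheme — an ordered list standing in for TR, and a plain set per pixel standing in for the priority queues Tij (so the asymptotics are worse than in the text, but the three operations behave as specified) — might look like:

```python
class RegionDisplay:
    """Toy sequential model of INSERT / DELETE / RETURN TOP REGION.

    The sequence R is an ordered list of labels (front = top); each
    pixel keeps the set of region labels covering it. A real
    implementation would use balanced priority queues, as in the text.
    """
    def __init__(self):
        self.order = []    # the sequence R, front of list = top region
        self.pixels = {}   # region label -> set of pixels it covers
        self.cover = {}    # pixel -> set of region labels covering it

    def insert(self, label, pix, after=None):
        # insert the new region just after region `after` (or on top)
        pos = 0 if after is None else self.order.index(after) + 1
        self.order.insert(pos, label)
        self.pixels[label] = set(pix)
        for p in pix:
            self.cover.setdefault(p, set()).add(label)

    def delete(self, label):
        self.order.remove(label)
        for p in self.pixels.pop(label):
            self.cover[p].discard(label)

    def top_region(self, p):
        # first region in R covering p, or None (the NIL of the text)
        for label in self.order:
            if label in self.cover.get(p, ()):
                return label
        return None
```

Moving an object is then, as in the text, a delete followed by an insert at the new position in the order.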
Assume that N processors are available. The processors are arranged in a two-dimensional array of N 1/2 rows and N"/2 columns. There are no links whatsoever connecting the processors. Each processor is associated with a pixel in U. We assume that each processor has enough memory to store a copy of TR. Within that copy, those regions containing the associated pixel are identified by a special label. All processors are capable of communicating with the outside world. Suppose that a region r with w pixels is to be inserted into the sequence R, after region q. Each of the w pixels in the region receives r and q and a "1" bit indicating that it belongs to this region. Each of the remaining N - w pixels also receives r and q and a "O" bit indicating that it lies outside the region. All processors would have thus received in constant time the data needed to perform the insertion. This is done locally by each processor in an additional 0(logn) steps, with all processors operating in parallel to update their copy of TR. The processors included in the new region also label that region in their copy of TR. Consequently, INSERT requires 0(logn) time. The same analysis applies to DELETE. Finally, operation RETURN TOP REGION [containing pixel (ij)] is performed by querying processor ij, associated with pixel (ij). If a pointer is maintained at all times to the top region in TR containing (ij), then processor ij is capable of answering the query in constant time.
A similar problem is examined in [Bern88]. A scene consisting of rectangular windows is maintained to allow insertions, deletions, and mouse click queries in much the same way as described above. However, the approach in [Bern88] is object oriented; that is, the algorithm is independent of the resolution of the display medium. Inserting or deleting a rectangle can be performed in O(log² n log log n + k log² n) time using O(n log² n + a log n) space, where k is the number of visible line segments that change, n is the number of rectangles, and a is the number of visible line segments at the time of the update. A mouse click query is performed in O(log n log log n) time.
11.1.2 Shadow Problem

Assume that an N^{1/2} by N^{1/2} mesh of processors represents an N^{1/2} by N^{1/2} grid of pixels. We will call this mesh the screen. The pixels are assumed to be unit squares, and pixel (i,j) covers the square with corners (i ± 0.5, j ± 0.5). We assume that the rows and columns are numbered from 0 to N^{1/2} − 1. Each processor in the interior of the screen can communicate with four neighbors, and the exterior ones with two or three. An image on the screen is defined as a set of squares. Given a collection of images and a light source, as shown in Figure 11.2, it is required to compute the shadows that the images throw onto each other. In Figure 11.2, the parts of the objects that are not highlighted are shadows. We assume that each processor ij knows whether or not pixel (i,j) is in the image. It is shown in [Dehn88e] that the shadows can be computed in O(N^{1/2}) time for a light source that is infinitely far away. The idea there is to solve the problem in strips that run parallel to the rays of light. The strip width needs to be chosen carefully to achieve the O(N^{1/2}) running time. A simpler algorithm that also computes the shadows in O(N^{1/2}) time is given in [Akl90]. Moreover, the algorithm works for any light source that is either inside or outside the screen. This algorithm is now described in some detail.
Computing shadows. Suppose that processor ij is the processor closest to the light source. We assume that this processor knows the location of the light source. Processor ij starts the computation, and in at most 2N^{1/2} steps all other processors can compute the shadows on their squares. Since all squares are equal in size, it can easily be seen that on each side of a square there can be at most one interval of light and one interval of shadow. The following algorithm computes all contours of images that receive light from the light source.
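The geometric core of each local step is intersecting the line through the light source and a shadow-interval endpoint with another edge of the unit square. A minimal sketch (the function name and conventions are illustrative, not from the text):

```python
def shadow_point_on_vertical_edge(light, endpoint, x_edge):
    """Intersect the ray from the light source through a shadow-interval
    endpoint with the vertical line x = x_edge (one edge of a pixel square).
    Returns the y-coordinate of the crossing, or None if the ray is vertical
    or the edge lies behind the light source."""
    (lx, ly), (ex, ey) = light, endpoint
    if ex == lx:
        return None                       # ray parallel to the edge
    t = (x_edge - lx) / (ex - lx)         # ray parameter at the edge
    if t <= 0:
        return None                       # edge is behind the light source
    return ly + t * (ey - ly)
```

The analogous computation for horizontal edges swaps the roles of x and y; each processor performs a constant number of such intersections per step.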
Algorithm Shadows
for all processors st (except processor ij) do in parallel
1. Wait for a message from neighboring processors; this message specifies the coordinates of the light source and the shadow interval on the edge that separates the square of processor st and the square of the sending processor.
142
Current Trends
Chap. 11
Figure 11.2 Point light source and some objects. The illuminated parts of the objects are highlighted.

2. Using the coordinates of the light source, processor st computes the number of messages it should receive.
3. The shadow intervals on the remaining edges are computed from the coordinates of the light source, the received shadow intervals of one or more edges, and the knowledge that square (s,t) is or is not part of the image.

The number of received messages in step 2 is either 0, 1, or 2. For example, if the light source is inside the screen, the coordinates of the light source are in a square (a,b), for some a and b such that 0 ≤ a, b < N^{1/2}. Processor ij, where ij = ab, will send the initial messages. All processors st with s = a or t = b will receive one message. All other processors will receive two messages. On the other hand, suppose that the light source is outside the screen in a square with coordinates (a,b) for some a < 0 and 0 ≤ b < N^{1/2}. Processor ij, where ij = 0b, will start the computation. All processors st with t = b will receive one message. All other processors will receive two messages. Similarly, it can be seen that the number of messages sent is at most 3, except possibly for the starting processor. To obtain the shadowing information of the remaining edges in step 3, it is
required only to compute the straight lines through the light source and the endpoints of the received shadow intervals, and to intersect these with the remaining edges. From the two observations above it can be concluded that steps 2 and 3 can be executed in constant time. Consequently, the overall running time of the algorithm is O(N^{1/2}).

11.1.3 Path in a Maze Problem

Figure 11.3 Maze.

We are given an N^{1/2} by N^{1/2} grid consisting of some white squares and some black squares, as shown in Figure 11.3, for N = 81. Two white squares, A (the origin) and B (the destination), are designated. It is required to find a shortest path from A to B along white squares, which avoids the black squares (the obstacles). Of course, there may be cases where no path exists from A to B (for example, if A were A' in Figure 11.3). Applications that require finding the shortest path in a maze include circuit design, robot motion planning, and computer graphics. It should be noted that in the restricted maze problem [Lee61], the shortest path can go from one square to another only through a common horizontal or vertical side. By contrast, our definition also allows the path to go diagonally from one square to the next through a common corner. In what follows we begin by presenting a sequential solution to the problem. We then describe a parallel algorithm for a systolic screen, first proposed in [Akl90].

Sequential solution. To solve the problem sequentially, we express it in graph theoretic terms. A weighted graph is used to represent the maze as follows:

1. A white (black) node is associated with a white (black) square.
2. Each black node is connected to its immediate neighbors (at most eight) by arcs of infinite weight.
3. Each white node is connected to every white immediate neighbor (at most eight) by an arc of finite weight.
Figure 11.4 Graph corresponding to maze in Figure 11.3.

4. The nodes associated with the origin and destination are marked A and B, respectively.

The resulting graph for the maze in Figure 11.3 is shown in Figure 11.4. For simplicity only arcs of finite weight are shown. The problem of finding a shortest path in a maze is now reduced to that of finding a shortest path in a graph. The graph has N nodes and e arcs, where e < 4N^{1/2}(N^{1/2} − 1) = O(N). It is shown in [John77] that a shortest path in such a graph can be found in O(N log N) time. If it turns out that this path has infinite weight, we know that no path exists from origin to destination in the given maze.

Parallel solution. Let the squares in the maze be indexed in row-major order (i.e., left to right and top to bottom) from 1 (top left square) to N (bottom right square). Our parallel solution associates a processor i with square i. Strictly speaking, only the processors associated with white squares will be needed. However, we usually do not know in advance which squares will be white and which will be black. Thus in a computer graphics application, for example, each pixel is assigned a processor: the same pixel is white in some scenes, black in others. The parallel algorithm may be thought of as a wave that originates at A and sweeps the maze; if B is reached, a stream flows back from B to A, yielding the required shortest path. Thus the algorithm consists of two phases: the forward phase and the backward phase. Each phase requires at most N steps. This means that a shortest path is found (if one exists) in at most 2N steps. If the wave never reaches B (i.e., if
there is no path from origin to destination), the backward phase is never executed. In most cases, the two phases will overlap in time, with the second phase beginning before the first phase has ended. The algorithm is given below.

Algorithm Maze
Step 1. The origin square is assigned the label (t,a,0), where t stands for temporary, a is the origin's index, and 0 is the distance from the origin to itself.
Step 2. for k = 1 to 2N do
   for i = 1 to N do in parallel
   (2.1) Once square i is assigned a label (t,j,d), for some j, it labels all its immediate unlabeled white neighbors, k, with (t, i, d + wik), where wik is the distance from i to k.
   (2.2) A square receiving more than one label simultaneously in (2.1) retains only one, the one with the smallest distance. If two or more labels are received with the same distance, the label with the smallest index is chosen.
   (2.3) Once the destination square is labeled (t,j,d), for some j, it changes its label to (f,j,d), where f stands for final.
   (2.4) Once a square is assigned a label (f,j,d), for some j, it changes the label of its neighbor j from (t,m,d) to (f,m,d), for some m.

When the algorithm terminates, the squares labeled final define the path from origin to destination (if one such path exists). Assuming that the squares represent pixels, a line can be drawn along those pixels labeled final. The algorithm runs in O(N) parallel time. Any distance function can be used for assigning weight values. For example, if we simply want to minimize the number of squares traveled, weights of 1 are assigned to all arcs. On the other hand, if Euclidean distance is to be minimized, we can assign weights of 1 to vertical and horizontal arcs, and weights of √2 to diagonal arcs. It should be clear that the algorithm's speed can be nearly doubled by initializing two waves simultaneously, one originating at A and the other at B.
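The forward phase can be simulated sequentially, one synchronous step at a time. The sketch below models only the distance component of the labels (not the backtracking pointers); the grid layout and function name are illustrative.

```python
import math

def maze_shortest_path(grid, A, B):
    """Sequential simulation of the forward phase: starting from A, each
    synchronous step lets every labelled square relax its (at most eight)
    white neighbours -- weight 1 for horizontal/vertical moves, sqrt(2)
    for diagonal ones.  grid is a square array of 0 (white) and 1 (black);
    returns the shortest A-to-B distance, or None if B is unreachable."""
    n = len(grid)
    INF = float("inf")
    dist = {(i, j): INF for i in range(n) for j in range(n)}
    dist[A] = 0.0
    for _ in range(2 * n * n):                  # at most 2N steps, N = n*n
        new = dict(dist)
        for (i, j), d in dist.items():
            if d == INF:
                continue
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k = (i + di, j + dj)
                    if (di, dj) != (0, 0) and k in dist and grid[k[0]][k[1]] == 0:
                        w = math.sqrt(2) if di != 0 and dj != 0 else 1.0
                        new[k] = min(new[k], d + w)
        if new == dist:                         # the wave has settled
            break
        dist = new
    return dist[B] if dist[B] < INF else None
```

Note the diagonal move through a common corner, which distinguishes this formulation from the restricted maze problem of [Lee61].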
11.1.4 Concluding Remarks

In this section, parallel algorithms were described for three geometric problems defined on a two-dimensional array of pixels. The algorithms presented are simple, efficient, and easy to implement. In each algorithm one processor is associated with every pixel. The algorithms differ, however, in the way the processors are interconnected. In the first algorithm, the processors conduct a geometric search independently of one another without the need to communicate among themselves. Consequently, the processors are not connected in any way. In the second algorithm, in order to compute the shadows created by images and a light source, each processor must communicate with its horizontal and vertical neighbors. Finally, in the third algorithm, a shortest path is discovered between two given pixels, by allowing each processor to communicate
Figure 11.5 Instance of range searching problem.
with its diagonal neighbors (in addition to its horizontal and vertical neighbors). The discussion in this section suggests that using a grid of processors to solve raster geometric problems is an expedient strategy.
11.2 General Prefix Computations and their Applications

Consider the following problem defined in [Spri89], where it is called general prefix computation (GPC): Let f(1), f(2), ..., f(n) and y(1), y(2), ..., y(n) be two sequences of elements with a binary associative operator "*" defined on the f-elements, and a linear order "<" defined on the y-elements. It is required to compute the sequence D(m) = f(j1) * f(j2) * ... * f(jk), for m = 1, 2, ..., n, where j1 < j2 < ... < jk, and {j1, j2, ..., jk} is the set of indices j < m for which y(j) < y(m). This problem is a generalization of the basic prefix computation defined in
Section 2.3.2. It can also be considered as a special formulation of the following range searching problem. A database T is given which consists of ordered pairs of real numbers. Each entry of T can thus be viewed as the Cartesian coordinates of a point in the plane. The database is subjected to queries of the form R = [x1, x2] × [y1, y2]. Here R defines a rectangle in the plane, and the answer to the query is a real-valued commutative group function "*" of the point values satisfying the query. In other words, the answer to the query is *f(j) for all j belonging to both T and R. This is illustrated in Figure 11.5, where the answer to the query shown is f(3) * f(5) * f(6) * f(8).
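The GPC definition can be restated directly as a quadratic-time reference implementation (illustrative only; the algorithms of [Spri89] are far more efficient). Indices are 0-based here, whereas the text numbers them from 1.

```python
def gpc(f, y, op):
    """Brute-force general prefix computation: D[m] is the op-product of
    f[j] over all j < m with y[j] < y[m].  Returns None for positions with
    no qualifying index (the product is empty there)."""
    n = len(f)
    D = []
    for m in range(n):
        acc = None
        for j in range(m):
            if y[j] < y[m]:
                acc = f[j] if acc is None else op(acc, f[j])
        D.append(acc)
    return D
```

For example, with f(j) = 1 for all j and * = +, D(m) counts the earlier elements whose y-value is smaller — the counting use made of GPC in the lower-bound argument below.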
Sec. 11.2
General Prefix Computations and their Applications
147
Figure 11.6 Instance of general prefix computation problem.
From this general formulation of range searching, the GPC problem can be obtained by viewing the database T as consisting of the points (m, y(m)), for 1 ≤ m ≤ n, and setting R = [−∞, m] × [−∞, y(m)], for every m from 1 to n. An instance of a GPC problem is shown in Figure 11.6; taking m = 8, we get D(8) = f(1) * f(5) * f(6), as illustrated in the figure. Parallel algorithms for GPC on various models of computation are given in [Spri89]. For example, using O(n) processors the problem can be solved on the CREW PRAM in O(log n) time. These algorithms are used in turn to obtain efficient solutions to a number of computational geometric problems, including ECDF searching, two-set dominance counting, computing maximal vectors in two and three dimensions, and triangulating a set of points in the plane. Following [Spri89], we begin by deriving a lower bound on GPC. We then show how GPC can be computed, sequentially and in parallel, and illustrate three of its applications in computational geometry.
11.2.1 Lower Bound for GPC

An Ω(n log n) lower bound on the number of steps required to compute GPC is obtained by showing that any algorithm for GPC must be able to sort a sequence of numbers into nondecreasing numerical order. Let such a sequence be Z = {z(1), z(2), ..., z(n)}. If z(i) = z(j), we say that z(i) < z(j) if and only if i < j. To express the problem of sorting Z as a GPC problem, we let f(j) = 1 for 1 ≤ j ≤ n, and let * be the usual addition operation +.
By first setting y(j) = z(j) and computing D(m), we obtain the number of elements j such that z(j) < z(m) and j < m. Now we set y(j) = z(n − j + 1), for 1 ≤ j ≤ n, and compute D(n − m + 1) [i.e., the number of elements j such that z(j) < z(m) and j > m]. Finally, by computing D(m) + D(n − m + 1) + 1 for a given m, we obtain the position of z(m) in Z had the sequence been sorted. This transformation of one problem to the other requires O(n) time. In view of the Ω(n log n) lower bound on the number of steps required to sort a sequence of numbers [Knut73], the lower bound on GPC follows. It should be noted that this proof is valid only for our choice of f, *, and <; other choices will require different proofs.
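The reduction can be checked with a brute-force GPC. The sketch below assumes distinct keys (so the tie-breaking rule above is not needed) and uses illustrative names; positions are 1-based, as in the text.

```python
def sort_positions_via_gpc(z):
    """Reduction from sorting to GPC: two passes with f = 1 and * = + count,
    for each m, the elements smaller than z(m) before and after position m;
    their sum plus one is z(m)'s position in the sorted order."""
    n = len(z)

    def D(y):  # brute-force GPC with f(j) = 1 and * = +
        return [sum(1 for j in range(m) if y[j] < y[m]) for m in range(n)]

    before = D(z)              # counts j < m with z(j) < z(m)
    after_rev = D(z[::-1])     # same computation on the reversed sequence:
                               # after_rev[n-1-m] counts j > m with z(j) < z(m)
    return [before[m] + after_rev[n - 1 - m] + 1 for m in range(n)]
```

Since the two GPC passes dominate the O(n) bookkeeping, any o(n log n) GPC algorithm would contradict the sorting lower bound.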
11.2.2 Computing GPC

It is shown in [Spri89] that GPC can be computed sequentially in optimal [i.e., O(n log n)] time. Parallel algorithms for computing GPC are also given in [Spri89]. They all use O(n) processors and run in O(log n) time on the CREW PRAM, in O(log² n) time on the hypercube, and in O(n^{1/2}) time on the mesh. Although implementation details will necessarily differ from one model to the other, the sequential and parallel algorithms for GPC are all based on the same fundamental idea that we now describe. We begin by introducing some notation. Let:

D(m, S) be the function D(m) restricted to a set of indices S (i.e., the prefix product of points in S which are to the left and below point m)
Y(S) be the sorted array of elements y(j) for j in S
B(m, S) be the position of element y(m) in the array Y(S)
{j1, j2, ..., js} be the set of all indices j from S for which y(j) < y(m) is satisfied (for convenience the index m is itself included in the set)
P(m, S) = f(j1) * f(j2) * ... * f(js) [i.e., the prefix product of points in S which are below point m (including point m itself)]

The basic algorithm for computing the GPC uses a recursive divide-and-conquer approach and goes as follows. Let S be divided into two subsets L (for left) and R (for right) of equal size such that all the points of L are to the left of all the points of R. Apply the algorithm recursively to L and R, with an appropriate termination condition, thus obtaining Y(L), Y(R), D(l,L), D(r,R), P(l,L), P(r,R), B(l,L), and B(r,R) for all l in L and r in R. Now, Y(L) and Y(R) are merged to form Y(S), and the rank B(m,S) in Y(S) of each point m in S is obtained. The latter is used to compute, for each point r in R, the index c_r of the point in L with the largest y-coordinate such that y(c_r) < y(r). Thus B(c_r,L) = B(r,S) − B(r,R). Similarly, for each l in L, we compute the index c_l of the point in R with the largest y-coordinate such that y(c_l) < y(l). Thus B(c_l,R) = B(l,S) − B(l,L). It should be clear that the final result can now be obtained directly from the following relations, for each l in L and each r in R:
Figure 11.7 Divide-and-conquer algorithm for the GPC.
D(l, S) = D(l, L)
D(r, S) = P(c_r, L) * D(r, R)
P(l, S) = P(l, L) * P(c_l, R)
P(r, S) = P(c_r, L) * P(r, R)
For example, in Figure 11.7, suppose that D(8, S) is to be computed. Once the algorithm has been applied to L and R, we would have obtained the values of the functions P and D for all points r in R and l in L. In particular, we would have P(5, L) = f(1) * f(5), and D(8, R) = f(6). After merging, we get c_8 = 5. Consequently, D(8, S) = P(5, L) * D(8, R) = f(1) * f(5) * f(6).

11.2.3 Applying GPC to Geometric Problems

The generality and power of GPC is demonstrated in [Spri89] by applying it to derive efficient algorithms for several problems, including a number of problems in computational geometry. We illustrate this here by showing how GPC can be used to solve the following problems for sets of points in the plane: ECDF searching, two-set dominance counting, and maximal vectors. Let S = {p1, p2, ..., pn} be a set of n points in the plane. A point pi is said to dominate a point pj if and only if pi[1] > pj[1] and pi[2] > pj[2], where p[k] denotes the kth coordinate of a point p (see Chapter 3).
Problem 1: ECDF Searching. Compute for each p in S the number of points of S that are dominated by p.

Problem 2: Two-Set Dominance Counting. Given two disjoint subsets A and B of S, count for each point p in B the number of points in A that p dominates.

Problem 3: Maximal Vectors. Determine those points of S that are not dominated by any other point.
In solving these problems we begin by sorting the points on their first coordinate, and denote the lists sorted in ascending and descending order by a1, a2, ..., an and d1, d2, ..., dn, respectively. We then apply GPC to each problem with an appropriate choice for f, y, and *, as follows:

Problem 1: ECDF Searching. f(m) = 1, y(m) = am[2], * = +; the answer for am is D(m).

Problem 2: Two-Set Dominance Counting. f(m) = 1 for am in A, f(m) = 0 for am in B; y(m) = am[2], * = +; the answer for am is D(m).

Problem 3: Maximal Vectors. f(m) = dm[2], y(m) = m, * = max; dm is maximal if and only if dm[2] > D(m).

Each of the problems is thus solved in two steps, sorting and application of GPC. Sorting n elements on the CREW PRAM, the hypercube, and the mesh can be performed in O(log n) time [Cole88b], O(log n log log n) time [Cyph90, Leig91], and O(n^{1/2}) time [Thom77], respectively, using O(n) processors. It follows that the time to solve each of the three problems, on each of the three parallel models, is asymptotically equal to the time it takes to compute the GPC for n inputs on that model. With O(n) processors, the latter is O(log n) on the CREW PRAM, O(log² n) on the hypercube, and O(n^{1/2}) on the mesh (see Chapter 3).

11.2.4 Concluding Remarks

As stated in [Spri89], GPC is a generic computation that captures the most common difficult component of many problems. It is shown therein to be an elegant and powerful tool in tackling various problems in computational geometry, graph theory, and combinatorics. In particular, GPC provides a unifying approach for solving computational
Sec. 11.3
Parallel Computational Geometry on Stars and Pancakes
151
geometric problems in parallel. Two possible directions for future research are suggested in [Spri89]:

1. To investigate the applicability of GPC to the solution of additional problems in the areas listed above, as well as in other areas.
2. To seek alternative generalizations of the basic prefix computation problem that might lead to the efficient solution of different kinds of problems.
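As a concrete instance of the reductions in Section 11.2.3, Problem 3 (maximal vectors) collapses to a running prefix maximum once the points are sorted by first coordinate, because with y(m) = m every earlier index qualifies. A sketch (distinct x-coordinates and illustrative names assumed):

```python
def maximal_vectors(points):
    """Maximal-vector reduction: sort the points by first coordinate in
    descending order (d1, ..., dn), take f(m) = dm[2], y(m) = m, * = max;
    then D(m) is simply the running maximum of the second coordinates of
    all earlier (larger-x) points, and dm is maximal iff dm[2] > D(m)."""
    d = sorted(points, reverse=True)      # descending on first coordinate
    maximal, best = [], float("-inf")     # best plays the role of D(m)
    for p in d:
        if p[1] > best:                   # dm[2] > D(m)
            maximal.append(p)
        best = max(best, p[1])
    return maximal
```

The other two reductions differ only in the choice of f, y, and * (counting with + instead of max).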
11.3 Parallel Computational Geometry on Stars and Pancakes

The star and pancake networks were recently proposed as attractive alternatives to the hypercube topology for interconnecting processors in a parallel computer [Aker87a, Aker87b, Aker89]. Properties of the two networks, as well as parallel algorithms for solving various problems on them, are provided in [Jwo90, Menn90, Niga90, Qiu91a, Qiu91b, Qiu91c]. In [Akl91d, Akl91e], several data communication algorithms that are fundamental to these two networks are presented: broadcasting, prefix sums computation, merging, unmerging, finding cousins, reversing, cyclic shifting, concentration, distribution, set difference computation, interval broadcasting, and many-to-one routing. These algorithms are then used to develop parallel solutions to various computational geometric problems on both networks, in particular:

1. An algorithm for finding the convex hull of a set of n = q! planar points on a star or pancake network with q! nodes in O(q³ log q) time, a performance which matches that of the best known sorting algorithm on each of the two networks.
2. Algorithms for finding: critical support lines and the vector sum of two convex polygons, a smallest enclosing box, the diameter, width, and minimax linear fit of a set of points, and the maximum distance between two convex polygons.
3. Algorithms for implementing the general prefix computation (GPC) and its applications.

This section reviews some of this work, and is organized as follows. Basic definitions are given in Section 11.3.1. The data communication algorithms are the subject of Section 11.3.2. These algorithms are then used in Section 11.3.3 to develop efficient solutions to the convex hull problem. Section 11.3.4 describes algorithms for solving other geometric problems by the merging slopes technique. GPC and its applications are discussed in Section 11.3.5. We conclude in Section 11.3.6 with some final remarks.
11.3.1 Basic Definitions

Given a set of generators for a finite group G, the Cayley graph with respect to G is defined as follows. The nodes of the graph correspond to the elements of the group G, and there is an arc (a,b), for a, b ∈ G, if and only if there is a generator g such that ag = b [Aker89].
Figure 11.8 A 4-star.
Let G be a permutation group and Vq be the set of all q! permutations of the symbols 1, 2, ..., q. A star interconnection network on q symbols, Sq = (Vq, ES), is a Cayley graph with generators gi = i 2 3 ... (i−1) 1 (i+1) ... q, 2 ≤ i ≤ q. Figure 11.8 shows S4. Each node in Sq is connected to q − 1 nodes, which can be obtained by interchanging the first symbol of the node with the ith symbol, 2 ≤ i ≤ q. We call these q − 1 connections dimensions. Sq is also called a q-star. A pancake interconnection network on q symbols, Pq = (Vq, EP), is a Cayley graph with generators hi = i (i−1) ... 3 2 1 (i+1) (i+2) ... q, 2 ≤ i ≤ q. Figure 11.9 shows P4. Each node in Pq is connected to q − 1 nodes which can be obtained by flipping the first i symbols, 2 ≤ i ≤ q (thus the name pancake). Each such connection is called a dimension. Pq is also called a q-pancake. Clearly, hi = gi for i ≤ 3, and Sq = Pq for q ≤ 3. Also, both Sq and Pq have O(q) diameters, and for any two arbitrary nodes u and v, it is easy to find a path from u to v of length less than 2q [Aker87a, Aker89]. The star and pancake networks compare favorably with the hypercube in several aspects. Each has a rich structure with a number of attractive symmetry properties, a degree (number of links per processor) and a diameter (maximum shortest distance between two processors) that are sublogarithmic in the number of processors, as well as many desirable fault tolerance characteristics [Aker87a, Aker87b, Aker89]. Since most of our discussion applies to both the star and the pancake networks, we henceforth use Xq to denote either Sq or Pq.

Definition 1. Let Xk(ak+1 ak+2 ... aq) be the subgraph of Xq induced by all the nodes with the same q − k last symbols ak+1 ak+2 ... aq, 1 ≤ k ≤ q, where ak+1 ak+2 ... aq is a permutation of q − k distinct symbols in {1, 2, ..., q}.
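The two generator families can be written down directly; a small sketch (function names are illustrative; nodes are tuples of symbols):

```python
def star_neighbors(node):
    """Neighbors of a node in the star graph S_q: generator g_i exchanges
    the first symbol with the ith symbol, for 2 <= i <= q."""
    out = []
    for i in range(1, len(node)):        # 0-based position i <-> dimension i+1
        s = list(node)
        s[0], s[i] = s[i], s[0]
        out.append(tuple(s))
    return out

def pancake_neighbors(node):
    """Neighbors in the pancake graph P_q: generator h_i reverses (flips)
    the first i symbols, for 2 <= i <= q."""
    return [node[:i][::-1] + node[i:] for i in range(2, len(node) + 1)]
```

Both lists have q − 1 entries, and their first two entries coincide, illustrating that h_i = g_i for i ≤ 3.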
Figure 11.9 A 4-pancake.
It can be shown that Sk(ak+1 ak+2 ... aq) is a k-star and Pk(ak+1 ak+2 ... aq) is a k-pancake [Aker87a, Aker89]. In particular, Xq can be decomposed into q Xq−1's: Xq−1(i), 1 ≤ i ≤ q. For example, S4 in Figure 11.8 contains four 3-stars, S3(1), S3(2), S3(3), and S3(4), obtained by fixing the last symbol at 1, 2, 3, and 4, respectively. This property that Xq can be decomposed into q Xq−1's turns out to be very important to our subsequent analysis, and we will take advantage of it in developing algorithms. In what follows we assume that in Xq each node is associated with a processor. Two processors are neighbors if they are connected by an arc.

Definition 2. Let u denote the processor in Xq associated with the node a1 a2 ... aq, and v denote the processor in Xq associated with the node b1 b2 ... bq. The ordering, <, on the processors is defined as follows: u < v if there exists an i, 1 ≤ i ≤ q, such that aj = bj for j > i, and ai < bi. If u < v, we say that u precedes v.
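This ordering, and the rank of Definition 3 below, are easy to realize in code; a brute-force sketch with illustrative names:

```python
from itertools import permutations

def precedes(a, b):
    """a precedes b iff, scanning from the rightmost symbol, the first
    position where they differ holds a smaller symbol in a -- that is,
    lexicographic comparison of the reversed permutations."""
    return a[::-1] < b[::-1]

def rank(u, q):
    """Rank of node u among all q! nodes of X_q: the number of nodes that
    precede it (so ranks run from 0 to q! - 1)."""
    return sum(1 for v in permutations(range(1, q + 1)) if precedes(v, u))
```

For example, 4321 has rank 0 in X4, and 2431 has rank 3, consistent with its being ranked third within the substar of nodes ending in 1.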
In other words, the processors are ordered in reverse lexicographic order, or in lexicographic order if we read from right to left. Later it will become obvious that the results presented remain valid if the processor ordering is defined as the usual lexicographic order.

Definition 3. In Xq, the rank r(u) of a node u is the number of nodes v such that v < u. Clearly, 0 ≤ r(u) ≤ q! − 1.

Note that in some cases only a subset of the set of nodes Vq is used to compute the rank of a given node (we call the nodes in the subset marked or active). It is also assumed that in one step requiring unit time, each processor may send or receive a constant number of data items to or from one of its neighbors, and that each processor
holds a constant number of such items (in a geometry problem, an item is a point, an edge, etc.). Let m1 and m2 be two distinct symbols from {1, 2, ..., q}. Further, let σ represent any permutation of the q − 2 symbols {1, 2, ..., q} − {m1, m2}. We use the notation m1 σ m2 to represent a permutation of {1, 2, ..., q}. For example, if q = 9, m1 is 7, m2 is 3, and σ is 8416952, then m1 σ m2 is 784169523. The notation can also be used when σ stands for a permutation of q − 1 symbols; thus, each of m1 σ and σ m2 is a permutation of q symbols.

11.3.2 Data Communication Algorithms

Parallel routing on the star and pancake networks. The following routing schemes on Sq and Pq are important to the data communication algorithms, and therefore are described first.

Constant Time Routing: Case I. Consider the following problem: Given Xq−1(i) and Xq−1(j), with i ≠ j, it is required to send the contents of the processors in Xq−1(i) to the processors in Xq−1(j). By sending the contents of Xq−1(i) to Xq−1(j) we mean that the content of each processor in Xq−1(i) is routed to a processor in Xq−1(j) such that no two processors in Xq−1(i) send their contents to the same processor in Xq−1(j). We can view this problem as copying the contents of Xq−1(i) to Xq−1(j) in arbitrary order. This can be accomplished in three steps as follows. In the first step, those (q−2)! nodes of the form j σ i in Xq−1(i) send their contents to the (q−2)! nodes of the form i σ j in Xq−1(j) through dimension q (note that the σ in i σ j is different from the one in j σ i). At the same time, the remaining (q−1)! − (q−2)! nodes in Xq−1(i) of the form k σ i, k ≠ i, j, send their contents (also through dimension q) to the nodes of the form i σ k in Xq−1(k). In one more step, the latter send their contents to the nodes of the form j σ k, and from there, in another step through dimension q, the contents are sent to k σ j in Xq−1(j). This algorithm is given below as procedure COPY.

Procedure COPY (i, j)
1.
for all nodes σ i do in parallel
   send contents to neighbors along dimension q
2. for all nodes i σ k, k ≠ j, do in parallel
   send contents to neighbors v with v(1) = j
3. for all nodes j σ k, k ≠ i, do in parallel
   send contents to neighbors along dimension q

Lemma 1. The mapping defined by procedure COPY is a bijection between the nodes of Xq−1(i) and Xq−1(j).

Proof. For the star network, it is easy to see from procedure COPY that node a j b i in Sq−1(i) is mapped to node a i b j in Sq−1(j), where a and b are permutations of symbols in {1, 2, ..., q} − {i, j}, such that the symbols in a are different from the
symbols in b, and |a| + |b| = q − 2, where |a| and |b| are the numbers of symbols in a and b, respectively. Note that either a or b could be empty, but not both. Therefore, the mapping is a bijection. The proof for the pancake network is similar, except that node a j b i in Pq−1(i) is mapped to node rev(a) i rev(b) j in Pq−1(j), where rev(b) = bk bk−1 ... b1 if b = b1 b2 ... bk [i.e., rev(b) denotes the reversal of b]. From Lemma 1 and procedure COPY, we immediately have:

Lemma 2. The contents of Xq−1(i) can be copied to the processors in Xq−1(j), i ≠ j, in a bijective way in O(1) time.

Constant Time Routing: Case II. We now extend this result as follows. Let I: i1, i2, ..., il and J: j1, j2, ..., jl be two sequences from {1, 2, ..., q} such that no two elements of I are equal, no two elements of J are equal, and {i1, i2, ..., il} and {j1, j2, ..., jl} have no element in common. It is desired to send the contents of Xq−1(i1), Xq−1(i2), ..., Xq−1(il) to Xq−1(j1), Xq−1(j2), ..., Xq−1(jl) such that the contents of Xq−1(im) are sent to Xq−1(jm), for 1 ≤ m ≤ l. This task can also be achieved in O(1) time by the following algorithm:

Procedure GROUP COPY (I, J)
for m = 1 to l do in parallel
   Modified COPY (im, jm)

where Modified COPY is the same as COPY except that all "send contents" are replaced by "exchange contents." The latter operation means that the contents of Xq−1(im) are sent to Xq−1(jm) at the same time as the contents of Xq−1(jm) are being sent to Xq−1(im), 1 ≤ m ≤ l. It can easily be shown that the given conditions imply that no
conflict will occur.

Constant Time Routing: Case III. If we arrange all the nodes in Xq into a q by (q−1)! array in row-major order (in terms of the processor ordering), then row i becomes Xq−1(i) [Menn90]. The nodes in X4 are given in Figure 11.10. From Definitions 2 and 3 we can see that all the nodes in the same column of the q by (q−1)! array have the same rank in their respective Xq−1's. For example, nodes 2431, 1432, 1423, and 1324 are all ranked third in X3(1), X3(2), X3(3), and X3(4), respectively.

4321  3421  4231  2431  3241  2341
4312  3412  4132  1432  3142  1342
4213  2413  4123  1423  2143  1243
3214  2314  3124  1324  2134  1234

Figure 11.10 Nodes in X4.

In Sq, if we exchange the first symbol with the qth one in each node, we get another q by (q−1)! array (see Figure 11.11) in which, by the definition of Sq, each column is connected to form a simple path (i.e., a linear array of processors [Menn90]).
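This column-connectivity claim for the star can be checked by brute force for small q (an illustrative sketch; names are not from the text):

```python
from itertools import permutations

def columns_are_paths(q=4):
    """Arrange the nodes of S_q in a q x (q-1)! array in row-major order
    (reverse lexicographic), swap the first and qth symbols of every node,
    and check that consecutive entries of each column are joined by a star
    generator (exchange of the first symbol with some ith symbol)."""
    nodes = sorted(permutations(range(1, q + 1)), key=lambda p: p[::-1])
    swapped = [(p[-1],) + p[1:-1] + (p[0],) for p in nodes]
    ncols = len(swapped) // q

    def adjacent(u, v):
        return any(u[0] == v[i] and u[i] == v[0]
                   and u[1:i] == v[1:i] and u[i + 1:] == v[i + 1:]
                   for i in range(1, q))

    return all(
        adjacent(swapped[r * ncols + c], swapped[(r + 1) * ncols + c])
        for c in range(ncols) for r in range(q - 1))
```

For q = 4 the first column of the transformed array is 1324, 2314, 3214, 4213, and each consecutive pair indeed differs by a single first-symbol exchange.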
(Note that this transformation corresponds to applying GROUP COPY on Sq with I: 1, 3, ..., J: 2, 4, ..., and then I: 2, 4, ..., J: 3, 5, ....) Therefore, we may consider the nodes in each column of Sq [whose nodes are arranged in a q by (q−1)! array in row-major order] as "connected." Unfortunately, this is not true in the case of Pq for q ≥ 4. In order for the nodes in each column of Pq (whose nodes are also arranged in a q by (q−1)! array in row-major order) to have a similar property, we need a special routing scheme, which we now describe.

1324  1423  1234  1432  1243  1342
2314  2413  2134  2431  2143  2341
3214  3412  3124  3421  3142  3241
4213  4312  4123  4321  4132  4231

Figure 11.11 Nodes in S4.
This routing scheme allows us to exchange in constant time the contents of X_{η-1}(i) (row i) with the contents of X_{η-1}(i+1) (row i+1) in an order-preserving way; that is, the content of a node u ranked rth in X_{η-1}(i) is exchanged with the content of a node v in X_{η-1}(i+1) that is also ranked rth [namely, u and v are in the same column of the η by (η-1)! array], for all 1 ≤ i ≤ η-1. Obviously, node a(i+1)bi in P_{η-1}(i) has the same rank as node aib(i+1) in P_{η-1}(i+1). From Lemma 1, the content of node a(i+1)bi is sent to node aib(i+1) by COPY or GROUP COPY in S_η in O(1) time. In P_η, however, it is sent to a different node, obtained from aib(i+1) by reversing the substrings a and b, and the rank of the latter in P_{η-1}(i+1) is not the same as that of a(i+1)bi in P_{η-1}(i), unless |b| ≤ 1. But it can be seen easily that this node and aib(i+1) lie on a common cycle in P_{η-1}(i+1), formed by the nodes that differ from aib(i+1) only in the order and orientation of the substrings a and b. The length of the cycle is 8 if |a| > 1 and |b| > 1, and less than 8 otherwise. Thus if we first apply COPY (i, i+1) to copy the contents of P_{η-1}(i) to P_{η-1}(i+1), the content of node a(i+1)bi can be routed to the correct node in P_{η-1}(i+1) in constant time, for all a and b, using the cycle. The discussion above is summarized in the following lemma:

Lemma 3. The contents of P_{η-1}(i) can be copied to the nodes in P_{η-1}(i+1), or vice versa, in an order-preserving way, in O(1) time, 1 ≤ i ≤ η-1.

Thus a new routing scheme that first applies procedure GROUP COPY (I, J) on P_η with the two sequences I: 1, 3, 5, ..., J: 2, 4, 6, ..., and I: 2, 4, 6, ..., J: 3, 5, 7, ..., and then applies the routing of Lemma 3 on a cycle of constant size, will "simulate" a linear array. Therefore, the nodes of each column in P_η, when arranged as an η by (η-1)! array, can also be viewed as connected, since the routing takes constant time.
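As a minimal illustration of why pancake routing needs the extra cycle step (a sketch, with permutations as strings; the function name is ours): a pancake move reverses a prefix, so carrying a node's content into a different substructure by flips scrambles the rest of the label along the way.

```python
def flip(perm, k):
    # Pancake-graph edge: reverse the prefix of length k (2 <= k <= n).
    return perm[:k][::-1] + perm[k:]

# Moving the content of node 2431 (last symbol 1) into a substructure with a
# different last symbol takes two flips, and the intermediate prefix
# reversals reorder the remaining symbols:
u = "2431"
v = flip(flip(u, 3), 4)   # flip the first three symbols, then all four
```

Here v ends in a different symbol than u, but its leading symbols are no longer in the order a simple rank-preserving copy would require, which is what the constant-size cycle corrects.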
The properties derived in this section allow us to make two assumptions that greatly simplify our presentation of subsequent algorithms:
Sec. 11.3
Parallel Computational Geometry on Stars and Pancakes
157
Assumption 1. In X_η, whenever an X_k(a_{k+1} a_{k+2} ... a_η), 2 ≤ k ≤ η, is considered, its nodes are always viewed as a k by (k-1)! array where nodes are listed in increasing row-major order (i.e., each row of the array is an X_{k-1}, and all the nodes in row i precede, in terms of the processor ordering, any node in row j, 1 ≤ i < j ≤ k).

Assumption 2. In the arrangement of Assumption 1, the nodes in each column are "connected" directly into a linear array of length k, either in S_η or in P_η, without explicitly performing the constant-time transformation (exchanging the first and kth symbols, in the star case) or routing (in the pancake case).

For example, in X_4, the nodes of X_3(2) are viewed as a 3 by 2! array as in Figure 11.12, in which each row is an X_2 and each column of 3 nodes is "connected." In what follows we show that the routing schemes that led to the foregoing assumptions for X_η enable us to view the two networks as one, and consequently to develop parallel algorithms that work on both.

4312 3412
4132 1432
3142 1342

Figure 11.12 Nodes in X_3(2).
Broadcasting.
Suppose that a node of the form ωi [i.e., a node in X_{η-1}(i)] wishes to broadcast a piece of information to all the processors in X_η. It can do so by recursively broadcasting in X_{η-1}(i); the contents of X_{η-1}(i) are then copied to the rest of the X_{η-1}'s by the technique of recursive doubling [Akl89a] and procedure GROUP COPY. In other words, the message is first broadcast recursively to all the nodes in X_{η-1}(i); the contents of X_{η-1}(i) are then copied to the nodes in X_{η-1}(i+1); the contents of X_{η-1}(i) and X_{η-1}(i+1) are subsequently copied to the nodes in X_{η-1}(i+2) and X_{η-1}(i+3); then the contents of X_{η-1}(i), X_{η-1}(i+1), X_{η-1}(i+2), and X_{η-1}(i+3) are copied to the nodes in X_{η-1}(i+4), X_{η-1}(i+5), X_{η-1}(i+6), and X_{η-1}(i+7); and so on, until the message is broadcast to all the nodes in X_η. It should be noted that in this particular application of GROUP COPY, all the processors contain the same datum, namely the message to be broadcast. Let t(η) be the time complexity of the broadcasting algorithm; then

t(η) = t(η-1) + 3⌈log η⌉ = O(η log η).

This algorithm works for both S_η and P_η and is different from the ones given in [Aker87a] and [Aker89], in which separate algorithms, also requiring O(η log η) time, are given for S_η and P_η. In [Aker87a] and [Aker89] it is shown that there is an O(η log η)-length sequence of dimensions s_1, s_2, s_3, ..., for S_η and another O(η log η)-length sequence of dimensions p_1, p_2, p_3, ..., for P_η, 2 ≤ s_i ≤ η, 2 ≤ p_j ≤ η, such that broadcasting on S_η or P_η can be done by letting each processor send its message along the dimensions s_1, s_2, ..., or p_1, p_2, .... Since Ω(log(η!)) = Ω(η log η) is the lower bound for broadcasting on any network with η! nodes [Aker87a], assuming that each processor can communicate with only one neighbor in one time unit, all these broadcasting algorithms are optimal.
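The recurrence above can be tabulated with a short sketch (the base case t(2) = 1, a single exchange in an X_2, is an assumption made here for concreteness):

```python
import math

def broadcast_time(eta):
    # t(eta) = t(eta - 1) + 3 * ceil(log2 eta); base case assumed t(2) = 1.
    if eta == 2:
        return 1
    return broadcast_time(eta - 1) + 3 * math.ceil(math.log2(eta))
```

Each of the η - 2 recursive levels contributes O(log η) routing steps, giving the O(η log η) total.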
Computing prefix sums, ranks, maxima, and minima. Given elements x_0, x_1, ..., x_{N-1}, stored in processors 0, 1, ..., N-1 in a network with processors ordered such that processor i < processor j if and only if i < j, and an associative binary operation *, the parallel prefix computation (defined in Chapter 2) is to compute all the quantities s_j = x_0 * x_1 * ... * x_j, j = 0, 1, ..., N-1. At the end of the computation we require that processor j contain s_j. Here we refer to the problem as the prefix sums problem, since + is one possible binary associative operation. An O(η log η)-time algorithm for computing all prefix sums on X_η with respect to the processor ordering (Definition 2), using a constant-time routing scheme, is given below. The prefix sums computation on X_η is done using the procedure GROUP COPY. Suppose that we have computed prefix sums for two groups of substructures as follows:

Group 1: X_{η-1}(i), ..., X_{η-1}(i+k)
Group 2: X_{η-1}(i+k+1), ..., X_{η-1}(i+2k+1)
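A sequential sketch of this doubling scheme (each position carries s, its prefix within its current group, and t, its group's total; the variable names and the pairwise combining order are assumptions of this sketch):

```python
def group_prefix_sums(x, op=lambda a, b: a + b):
    # Pairs of adjacent groups are combined, with group sizes doubling each
    # round, mirroring the GROUP COPY scheme: group 2's prefixes gain group
    # 1's total, and both groups learn the combined total.
    n = len(x)
    s, t = list(x), list(x)
    g = 1                                  # current group size
    while g < n:
        for i in range(0, n, 2 * g):       # i: start of group 1 of a pair
            j0 = i + g                     # start of group 2
            if j0 >= n:
                continue                   # lone group; nothing to combine
            j1 = min(i + 2 * g, n)
            t1, t2 = t[i], t[j0]
            for j in range(j0, j1):        # prefixes in group 2 gain t1
                s[j] = op(t1, s[j])
            for j in range(i, j1):         # both groups learn the new total
                t[j] = op(t1, t2)
        g *= 2
    return s
```

Combining two groups costs O(1) routing steps on X_η and there are ⌈log η⌉ combining rounds per level of recursion, which yields the O(η log η) bound.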
and that each processor holds two variables, s and t, for storing the partial prefix sum so far and the total sum of values in the group it is in, respectively. Let the total sum in group 1 be t_1 and the total sum in group 2 be t_2. We first use GROUP COPY to send t_1 to every processor in group 2, and t_2 to every processor in group 1; then the prefix sums in the processors of group 1 remain the same, while the prefix sum s in a processor of group 2 becomes t_1 * s. The total sum for all the processors in both groups becomes t_1 * t_2. All these steps can be accomplished in O(1) time. When a group contains only one X_{η-1}, the algorithm is called recursively. This leads to a running time of O(η log η). It is straightforward to state the algorithm formally; however, care must be taken since η is not necessarily a power of 2.

Assume now that some nodes in X_η are marked. The rank of a marked node u is the number of marked nodes that precede u. The ranks of all the marked nodes can be computed in O(η log η) time by applying the prefix sums algorithm, with * being the usual addition +, each marked node having value 1, and the others having value 0 (the rank of a marked node is its prefix sum minus one). The maximum and minimum of η! values stored one per node in X_η can also be found in O(η log η) time by letting the binary associative operation in the prefix sums algorithm be max and min, respectively. The final result (either the maximum or the minimum) is reported in all processors. It is easy to see that the idea for broadcasting can also be used to find the maximum or minimum of η! elements in O(η log η) time on S_η or P_η.

Sorting, merging, unmerging, and finding cousins. Given a sequence of elements stored in a set of processors, with each processor holding one element, we say that the sequence is sorted in the F (forward) direction if for any two elements x and y held by processors p and q, respectively, p < q if and only if x < y.
The R (reverse) direction is defined similarly. The sequential lower bound of Ω(η! log(η!)) on the number of steps required for sorting η! numbers [Knut73] implies a lower bound of Ω(log(η!)) = Ω(η log η) on the number of parallel steps needed to sort on both S_η and P_η. Sorting on S_η has been studied in [Menn90], in which an O(η³ log η)-time algorithm
is given. This algorithm is based on a sorting algorithm for the mesh-connected computer given in [Sche89] and is outlined below as procedure η-Star Sort. We denote by D the direction of the final sorted sequence, where D can be either F or R. We also use D̄ to denote the direction opposite to D. Each iteration of step 2 in the procedure implements a merging algorithm.

Procedure η-Star Sort (D)

1. in parallel, sort all the odd-numbered rows in the forward direction and all the even-numbered rows in the reverse direction, recursively.
2. for j = 1 to ⌈log η⌉ do
   a. Starting with row 1, arrange all rows into groups of 2^j consecutively numbered rows (the last group may not have all 2^j rows).
   b. in parallel, sort the columns within each group of rows in the direction D.
   c. in parallel,
      1. sort the rows in odd-numbered groups by calling FTG (D);
      2. sort the rows in even-numbered groups by calling FTG (D̄).
Procedure FTG ("fixing the gap," as it is called in [Menn90]) is defined as follows:

Procedure FTG (D)

if the row is not a 1-star do
1. in parallel, sort all columns in the direction D.
2. in parallel, sort all rows with FTG (D).

It is important to note that if procedure FTG is called with a k-star, then, by Assumption 1, each row in step 2 is a (k-1)-star. Also note that in step 2, FTG is applied to each row, with all rows being sorted in parallel.

From the algorithms above we can see that sorting or merging on X_η is reduced to sorting on the columns. Since each column is connected as a linear array, odd-even transposition sort [Akl89a] can be applied. This means that given two sorted sequences stored in two groups of X_{η-1}'s:

A: X_{η-1}(i), X_{η-1}(i+1), ..., X_{η-1}(j),
B: X_{η-1}(k), X_{η-1}(k+1), ..., X_{η-1}(l),

i ≤ j < k ≤ l (A and B do not necessarily contain the same number of X_{η-1}'s), such that A and B are sorted in opposite directions, they can be merged into a sorted sequence stored in

C: X_{η-1}(i), X_{η-1}(i+1), ..., X_{η-1}(j), X_{η-1}(k), X_{η-1}(k+1), ..., X_{η-1}(l),

in either direction in O(η²) time. Let t(η) be the time to sort η! elements on S_η; then

t(η) = t(η-1) + ⌈log η⌉ × O(η²) = O(η³ log η).
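The column sorts reduce to odd-even transposition sort on a linear array; a minimal sequential sketch (the direction argument mirrors the F/R convention above):

```python
def odd_even_transposition_sort(a, direction="F"):
    # n rounds of compare-exchange on alternating (even, odd) neighbor pairs
    # suffice to sort n items held one per processor on a linear array.
    a = list(a)
    n = len(a)
    for step in range(n):
        for i in range(step % 2, n - 1, 2):
            wrong = a[i] > a[i + 1] if direction == "F" else a[i] < a[i + 1]
            if wrong:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

On a column of length η this costs O(η) parallel steps, which is where the O(η²) merging time comes from once every address coordinate has been processed.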
It is not hard to see that the same sorting and merging algorithms also apply to P_η,
since the nodes in the columns of P_η, when arranged in an η by (η-1)! array, can also be considered as connected (Assumption 2).

Now let A, B, and C as defined above be given, such that each element in C knows the rank of the node in which it was before the merging. The problem of unmerging is to permute the list so as to return each element in C to its original node in A or B. This operation is the inverse of merging. The problem can be solved by running the merging algorithm in reverse order, using the given rank information (the rank information is used to compute address coordinates, as defined below; the unmerging procedure is basically an ASCEND-type algorithm). The problem of unmerging can also be solved by applying the operations of concentration and translation to be described later (concentrate the elements of A, then those of B, then translate the elements of B). Both approaches take O(η²) time.

Let A and B be two sorted lists stored in two groups of X_{η-1}'s, and let a be an element of A. The cousins of a in B are two consecutive elements b_1 and b_2 in B such that a lies between b_1 and b_2 in the sorted list resulting from merging A and B (we assume that B has two dummy elements, -∞ and +∞, for obvious reasons). The cousins in B of each element of A can be determined in O(η²) time by merging and interval broadcasting (the latter is described further below).

Two classes of parallel algorithms. Suppose that all the nodes u_0, u_1, ..., u_{η!-1} in X_η have been ordered such that u_k < u_j if k < j, and that each node is identified by its address coordinates x_2 x_3 ... x_η, where 0 ≤ x_i ≤ i-1. In the class ASCEND, the main loop of the algorithm is:

for i = 2 to η do
  in parallel, for all x_2, ..., x_{i-1}, x_{i+1}, ..., x_η, do
    OPER(x_2 ... x_{i-1} 0 x_{i+1} ... x_η, ..., x_2 ... x_{i-1} (i-1) x_{i+1} ... x_η)

where the nodes x_2 ... x_{i-1} x_i x_{i+1} ... x_η, 0 ≤ x_i ≤ i-1, form a linear array of length i, and OPER could be any computation on i elements stored on a linear array of length i. In the dual class DESCEND, the main loop of the algorithm is changed to: for i = η downto 2 do. These two types of algorithms are similar to the usual ASCEND-DESCEND
algorithms for the hypercube and other networks [Lang76, Nass81, Nass82, Prep81, Rank90]. Using the ASCEND and/or DESCEND algorithms with appropriate OPERs, a number of useful data permutation algorithms can be obtained easily. They are: translation (cyclic shift), reversing, concentration, and distribution. These operations on X_η are similar to the ones on the hypercube, and their correctness is established in the same way. In all of these algorithms, each node has a record that includes a destination address, and this record is to be permuted to the destination. The permuting is done for all the nodes with records and nonempty destinations. As in the hypercube case, the computations (comparisons) during the permutation are performed on keys of records, and the keys are destination address coordinates: at step i, if the ith coordinate of the destination address of a record is not the same as the ith coordinate of the node that this record is currently in, the record is sent to the node with the correct ith coordinate along an array of length i. By doing this, the destination address is matched coordinate by coordinate. The OPER for each of these algorithms is basically the sorting algorithm or one of its variations, with the keys to be sorted being destination address coordinates. When a key needs to be moved during the sorting, the record that contains the key is also moved. Each OPER can be done in O(η) time, since the longest linear array that OPER works on has length η. Therefore, all these data permutations can be done in O(η²) time.

Translation. Suppose that all the nodes u_0, u_1, ..., u_{η!-1} in X_η have been ordered such that u_k < u_j if k < j. In the operation translation, given some integer s, node u_k has to send its datum to node u_{(k+s) mod η!}, simultaneously for all k, 0 ≤ k ≤ η!-1. Translation is also called cyclic shift. Node u_k with rank r(u_k) = k and address x_2 x_3 ... x_η is to be shifted to a node with rank (r(u_k) + s) mod η!. The address y_2 y_3 ... y_η of the new node can be computed accordingly. Using the new address, a translation can be accomplished by running the ASCEND or DESCEND algorithms, where OPER is sorting in the forward direction and the numbers to be sorted are the coordinates of the rank (r(u_k) + s) mod η!. If we run the DESCEND algorithm, the odd-even transposition sort will be performed on linear arrays of length η, ..., 3, 2, for iterations i = η, η-1, ..., 2, and the values in the comparisons will be y_i, i = η, ..., 3, 2. On the other hand, if the ASCEND algorithm is run, the values used in the comparisons are y_2, y_3, ..., y_η, respectively, for iterations i = 2, 3, ..., η. At iteration i, the ith coordinates, x_i of u_k and y_i of the node ranked (r(u_k) + s) mod η!, are matched. Translations for some s, for example, where s is a multiple of (η-1)!, can be done in O(η) time.

Reversing. In X_η, the element in the node ranked r(u), 0 ≤ r(u) ≤ η!-1, needs to go to the node ranked η!-1-r(u). Let y_2 y_3 ... y_η be the address of the node ranked η!-1-r(u) (i.e., the new address after the reversing). Reversing is needed when two sorted sequences in the same direction are to be merged; one of them is reversed first, then the regular merging is carried out. The reversing can also be done by running the ASCEND or DESCEND algorithms, where at iteration i the OPER is the sorting algorithm in the forward direction and the values (keys) used in the sorting algorithm are the y_i's, i = 2, 3, ..., η, for ASCEND, and i = η, ..., 3, 2, for DESCEND.
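The address coordinates can be read off from a rank via the factorial number system; this sketch assumes that correspondence (rank = Σ x_i · (i-1)!, 0 ≤ x_i ≤ i-1), which matches the coordinate ranges used above and the addresses in the concentration example below:

```python
from math import factorial

def rank_to_address(r, eta):
    # Address coordinates x_2 ... x_eta with 0 <= x_i <= i-1, assuming
    # r = sum over i of x_i * (i-1)!  (factorial number system).
    xs = {}
    for i in range(eta, 1, -1):
        xs[i], r = divmod(r, factorial(i - 1))
    return [xs[i] for i in range(2, eta + 1)]   # [x_2, x_3, ..., x_eta]

def address_to_rank(xs):
    return sum(x * factorial(i - 1) for i, x in enumerate(xs, start=2))
```

For example, in X_4 the node of rank 14 has coordinates x_2 x_3 x_4 = 012, as in the concentration example below.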
Concentration and distribution. Some nodes of X_η contain "active" elements. The rank r(u) of an active node u is the number of active nodes preceding u. The active elements are to be compressed (concentrated) so that they are stored in nodes 0, 1, 2, ..., with the active element originally in node u ending up in node r(u). Consider the following problem first. Given a linear array of length i, with nodes 0, 1, 2, ..., i-1, some of the nodes are marked as "active." Each active node has a record with an integer key k, 0 ≤ k ≤ i-1, and any two keys contained in two different active nodes are different. The task is to route the record in an active node having key k to the kth node in the linear array, with this being done for all the active nodes. This problem can be solved as follows. We first sort the records by their keys in the forward direction (assuming that inactive nodes have keys valued at +∞); that is, node 0 has the smallest key, node 1 has the second smallest key, and so on. Now the problem becomes a distribution problem (see below for the definition) on the linear array, which can be solved easily in O(i) time. A concentration can be done in O(η²) time by running the ASCEND algorithm. Let an active node u have address x_2 x_3 ... x_η, and let the address of the node to which the record of u is to move be y_2 y_3 ... y_η [computed from r(u)]. During iteration i, 2 ≤ i ≤ η, the OPER is the same as described in the preceding paragraph, with the keys being the y_i's. A distribution is simply the inverse of a concentration, and can be done by applying the concentration operation in reverse order (i.e., running the DESCEND algorithm with the same OPER). An example is in order. Initially, in X_4, the data are stored in nodes 1, 7, 13, 14, 18, 21, and 23 (a node is in the range from 0 to 23 in X_4), and their addresses x_2 x_3 x_4 are 100, 101, 102, 012, 003, 113, and 123. They are to be compressed to nodes 0, 1, 2, 3, 4, 5, and 6, with y_2 y_3 y_4 addresses 000, 100, 010, 110, 020, 120, and 001. The execution of the algorithm is illustrated in Figure 11.13, in which (a) is the original configuration, and (b), (c), and (d) are the configurations after iterations i = 2, 3, 4.

Interval broadcasting. In X_η, certain k nodes are marked as leaders l_1, l_2, ..., l_k, with l_i < l_j if i < j, and k ≤ η!; they possess data that they must share with all the higher-numbered nodes (in terms of the processor ordering defined before) up to, but not including, the next leader. That is, each marked node l_i has to broadcast its message to the interval of nodes between l_i and l_{i+1}. Interval broadcasting can be done in O(η log η) time by running the prefix sums algorithm once, where each leader holds an index (processor rank) as well as its message, and each nonleader initially has an index of -1 and a dummy message. Given two messages and their associated indices, the result of applying the binary associative operation of the prefix sums algorithm is as follows: the processor with the smaller index is assigned the larger of the two indices and the message associated with that index. The interval broadcasting algorithm can be used to accomplish a translation by +1 or -1 positions. We can do so by applying the interval broadcasting algorithm twice. For the case s = +1, we first let the leaders be the even-numbered nodes 0, 2, 4, ..., η!-2, then do the interval broadcasting so that their data are shifted to nodes 1, 3, 5, ..., η!-1;
[Figure 11.13 Concentration on X_4: (a) the original configuration; (b), (c), and (d) the configurations after iterations i = 2, 3, 4.]
the second time, the leaders are 1, 3, 5, ..., η!-3, and after the interval broadcasting their data are in nodes 2, 4, 6, ..., η!-2; finally, the content of node η!-1 can be routed to node 0 in O(η) time, so the total time for the translation is still O(η log η). For the case s = -1, the translation can be done in a similar way, except that the ordering of the nodes is reversed (i.e., node i becomes node η!-1-i). This idea can be used to do translations by +c or -c positions in O(η log η) time for any constant positive integer c.

Set difference. The set difference operation is defined as follows: Given two sorted arrays A and B, find the set difference A - B (B is not necessarily a subset of A), where A and B are stored in two groups of X_{η-1}'s. This operation can be performed in O(η²) time. By merging A and B, each element of A finds its cousins in B and checks whether or not one of them is equal to it. Those elements that are not equal to any of their cousins are then compressed to give the final result. If there are repeated elements in A, repetitions are avoided by compressing only those elements of A that are not equal to their right neighbors.
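Sequentially, the cousin test and the final compression amount to the following sketch (bisect plays the role of the merge step here; the function name is ours):

```python
import bisect

def set_difference_sorted(a, b):
    # a, b sorted. Each element of a checks its "cousins" in b (its
    # neighbors in the merged order); survivors are compressed, and repeats
    # in a are dropped by keeping only elements unequal to their right
    # neighbor, as in the text.
    out = []
    for i, x in enumerate(a):
        if i + 1 < len(a) and a[i + 1] == x:
            continue                       # repeated element of A
        j = bisect.bisect_left(b, x)
        if j < len(b) and b[j] == x:
            continue                       # x equals one of its cousins in B
        out.append(x)
    return out
```

On the network, the merge and the compression (concentration) each take O(η²) time, which dominates the cost.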
Many-to-one routing. In a many-to-one routing problem, both origin and destination nodes have keys, with the keys of origin nodes being different from one another. Each destination node is to receive data from the origin with the same key. This problem can be solved in O(η³ log η) time on X_η by applying the sorting, interval broadcasting, and concentration/distribution algorithms, as described in [Ullm84].

Summary. The operations that can be done in O(η log η) time on S_η and P_η are: broadcasting, interval broadcasting, prefix sums, max, min, ranking, and translation by a constant number of positions. Sorting and many-to-one routing can be done in O(η³ log η) time. Merging, unmerging, finding cousins, reversing, set difference, translation, concentration, and distribution can be done in O(η²) time, while translation by k × (η-1)! positions, for any k, can be done in O(η) time. It is easy to see that all the algorithms of this section apply not only to X_η, but also to a group of consecutively numbered X_{η-1}'s.
11.3.3 Convex Hull Algorithms on the Star and Pancake Networks

Divide-and-conquer is a common strategy for finding the convex hull CH(S) of a set of points S. Given n = η! planar points stored in X_η, we first sort the points by their x-coordinates. Then, in procedure CONVEX HULL below, η disjoint convex hulls of (η-1)! points each are found recursively in parallel in X_{η-1}(i), 1 ≤ i ≤ η. These convex hulls are then merged repeatedly until the final convex hull is obtained.

Procedure CONVEX HULL (X_η)

for i = 1 to η do in parallel
  CONVEX HULL (X_{η-1}(i))
for j = 1 to ⌈log η⌉ do
  1. Starting with row 1, arrange all rows (X_{η-1}'s) into groups of 2^j consecutively numbered rows (the last group may not have all 2^j rows).
  2. for all the groups do in parallel
     merge the two convex hulls within the group.

We now describe the merge procedure, to which we refer henceforth as the merging slopes technique. This result is similar to the convex hull algorithm in [Stoj88a] that runs in O(log² n) time using n processors on a hypercube (see Chapter 3). Here, we give more detail regarding its implementation on star and pancake networks. Let CH(P) and CH(Q) be two disjoint convex hulls of two sets of points P and Q. CH(P) and CH(Q) are stored in two groups of X_{η-1}'s; they are merged by computing the two tangents common to CH(P) and CH(Q). In what follows, all angles are measured with respect to the x-axis.
Definition 4. The distance of a point to an oriented edge e is the distance from the point to the line containing e; if the point is to the left (alternatively, right) of e, the
distance is said to be positive (alternatively, negative). The α-distance of a point to e is its distance to the edge e′ obtained by rotating e by the angle α in a counterclockwise direction around a point (the results of the queries discussed below do not depend on the choice of this point).

Let A and B be two convex polygons in the plane, each containing O(k(η-1)!) edges stored in two groups of k X_{η-1}'s, 1 ≤ k ≤ η-1, given in counterclockwise order. Given an angle α, consider the following problem [we call it the extremal search problem ES(A, B, α)]: For each edge e in A, find a vertex v(e) in B with the smallest α-distance to e among the vertices of B [v(e) is called an associated point of e in direction α]. It is easy to see that for α = 0 (α = π), v(e) is the vertex with the smallest (greatest) distance from e among the vertices of B. For α = π/2 (α = 3π/2), v(e) is the rightmost (leftmost) point with respect to e. Let s(e) denote the angle of an edge e. We use the following property of associated points: the associated point v(e) in B (in direction α) of an edge e in A belongs to an edge e′ in B such that |s(e) + α - s(e′)| is minimized. In other words, the associated point of e belongs to an edge that is a cousin of e in B. We now describe the procedure ES(A, B, α). We first increase the angles of the edges of A by α. The edges with minimal angles in A and B are recognized, and by some translations they are moved to the first vertices of the corresponding groups of X_{η-1}'s. Since the angles of the edges of both convex polygons are then given in increasing order, the sets A and B can be merged by their angles in O(η²) time. Now the sets A, B, and A ∪ B are sorted, and each edge e of A can find its cousins in B by interval broadcasting (the last leader broadcasts its data to all the nodes preceding the first leader). We use the unmerging technique to return all edges to their initial positions in X_η.
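A brute-force sequential sketch of ES(A, B, α) (the rotation is taken about an edge endpoint, which by the definition above does not affect which vertex attains the minimum; the names are ours):

```python
import math

def alpha_distance(pt, e, alpha):
    # Signed distance from pt to the line through edge e rotated by alpha
    # (counterclockwise, about e's first endpoint); left of the line is
    # positive, matching Definition 4.
    (px, py), (qx, qy) = e
    th = math.atan2(qy - py, qx - px) + alpha
    nx, ny = -math.sin(th), math.cos(th)   # left-pointing normal
    return (pt[0] - px) * nx + (pt[1] - py) * ny

def extremal_search(A_edges, B_vertices, alpha):
    # For each edge of A, the vertex of B with the smallest alpha-distance
    # (its "associated point" in direction alpha).
    return [min(B_vertices, key=lambda v: alpha_distance(v, e, alpha))
            for e in A_edges]
```

For α = 0 this picks the vertex with the smallest signed distance to e, and for α = π the farthest one, as stated above; the network procedure obtains the same answers via merging by edge angles rather than exhaustive search.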
To merge CH(P) and CH(Q), we decide for each of their edges whether it is an external or an internal edge [i.e., whether or not it is an edge of CH(S)]. To judge whether an edge e is external, we need to test whether CH(P) and CH(Q) are in the same half-plane bounded by e. However, instead of testing all the vertices of CH(Q) against an edge e of CH(P), we test only two representatives (associated points of e), such that if they are in the same half-plane bounded by e as CH(P), then so is every point of CH(Q). These two representatives of e in CH(P) [e in CH(Q)] are the nearest and farthest extreme points from CH(Q) [CH(P)] and are obtained by calling the procedures ES(CH(P), CH(Q), 0), ES(CH(P), CH(Q), π), ES(CH(Q), CH(P), 0), and ES(CH(Q), CH(P), π). Now each edge can decide in constant time whether it is external. Then each extreme point of CH(P) or CH(Q) can learn whether it is an extreme point of CH(S) (a translation by 1 can be used to find the necessary data). Two extreme points in both CH(P) and CH(Q) share an external and an internal edge. These four points determine the two common tangents of CH(P) and CH(Q). The computation of the circular edge list of CH(S) can then be done in O(η²) time by some translations. The merging procedure takes O(η²) time and is repeated O(log η) times. Let t(η) be the time to find the convex hull of n = η! planar points on X_η; then

t(η) = t(η-1) + O(η² log η) = O(η³ log η).

Thus the convex hull of n = η! planar points can be computed in O(η³ log η) time on S_η or P_η with η! processors. This performance matches that of the currently fastest known sorting algorithms on S_η and P_η, described
earlier. It is known that any convex hull algorithm can be used to sort a set of real numbers [Prep85]. Consequently, the convex hull algorithms for S_η and P_η that we have presented appear to be the best currently possible, and any faster convex hull algorithm would immediately imply a faster sorting algorithm. There is reason to believe that this could be achieved: for an input of size n, both problems can be solved sequentially in O(n log n) time, which is optimal. This means that for n = η!, we should aim to solve the two problems on S_η and P_η in O(log n) [i.e., O(η log η)] time.
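The sorting reduction cited from [Prep85] can be sketched as follows (gift wrapping stands in for an arbitrary black-box hull algorithm; all names are ours):

```python
def sort_via_hull(xs):
    # Lift each x to (x, x*x); the lifted points are in convex position on a
    # parabola, so walking the hull boundary counterclockwise from the
    # leftmost vertex visits them in increasing x order.
    pts = list({(x, x * x) for x in xs})
    if not pts:
        return []
    start = min(pts)                       # leftmost lifted point
    hull, cur = [], start
    while True:
        hull.append(cur)
        nxt = cur
        for cand in pts:                   # gift-wrapping step
            if cand == cur:
                continue
            if nxt == cur:
                nxt = cand
                continue
            cross = ((nxt[0] - cur[0]) * (cand[1] - cur[1])
                     - (nxt[1] - cur[1]) * (cand[0] - cur[0]))
            if cross < 0:                  # cand is clockwise of cur -> nxt
                nxt = cand
        cur = nxt
        if cur == start:
            break
    return [p[0] for p in hull]
```

Any faster hull algorithm dropped into the traversal step would immediately sort faster, which is the content of the lower-bound argument above.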
11.3.4 Solving Geometric Problems by the Merging Slopes Technique

Using the merging slopes technique, several other geometric problems can also be solved. They are the problems of finding the critical support lines of two convex polygons, the smallest enclosing box of a set of points, the diameter, width, and minimax linear fit of a set of points, the maximum distance between two convex polygons, and the vector sum of two convex polygons. Let the size of every problem be n = O(η!). The running time of the algorithms for these problems is either O(η²) or O(η³ log η), depending on the problem. These algorithms are presented in what follows.
Critical support lines of two convex polygons. Given two disjoint convex polygons P and Q, a critical support line is a line L(p, q) such that it is a line of support for P and Q at p and q, respectively, and such that P and Q lie on opposite sides of L(p, q). Critical support lines have applications in a variety of problems, such as visibility and collision avoidance [Tous83]. The algorithm for finding the critical support lines of two convex polygons P and Q is similar to that for merging two convex hulls, and thus runs in O(η²) time. We find associated points for each edge of P (respectively, Q) as described in the preceding section. Then each edge of P (respectively, Q) can decide whether the polygon Q (respectively, P) lies completely on one side of a straight line passing through that edge, with P (respectively, Q) on the other side. Now p is determined as the point common to two edges of P, one satisfying the latter property and the other not satisfying it. The point q is obtained in Q similarly. This defines L(p, q). Two other points, one in P and one in Q, obtained in the same fashion, define the second critical support line.

Smallest enclosing box. For certain packing and layout problems it is useful to find a minimum-area rectangle (smallest box) that encloses a set S of n points. This problem has been studied in [Free75] and [Mill89b], and a lower bound of Ω(n log n) on the number of sequential steps required to solve it was established in [Lee86a]. Clearly, any enclosing rectangle of S must enclose CH(S). It is shown in [Free75] that a smallest enclosing box of S must have one side collinear with an edge of CH(S), and that each of the other three sides must pass through an extreme point of S. We determine for each edge e of CH(S) the minimum-area enclosing rectangle that has a side collinear with e. The smallest enclosing box of S is the minimal over all the enclosing rectangles so obtained. To determine the minimum-area rectangle
for each edge e of CH(S), each processor containing an edge e needs to know three additional extreme points of CH(S): T, L, and R, which are the topmost, leftmost, and rightmost points, respectively, with respect to the edge e. First we find the convex hull CH(S) of S. Then the points L, T, and R are obtained by calling the procedures ES(CH(S), CH(S), -π/2), ES(CH(S), CH(S), π), and ES(CH(S), CH(S), π/2), respectively. It is clear that the parallel running time is bounded by the time needed for merging and for computing convex hulls and the minima of some data. Thus the algorithm runs in O(η³ log η) time on X_η for a set S of size O(η!).

Diameter of a set. The diameter of a set S of n points in the plane is the distance between two points of S that are farthest apart. Optimal sequential solutions to the problem, requiring O(n log n) time, are presented in [Sham78, Brow79b, Prep85]. The diameter of a set S is equal to the diameter of its convex hull CH(S), and the two points that determine the diameter belong to two parallel lines of support. A pair of points that admits parallel supporting lines is called antipodal. First we find CH(S). The number of extreme points of the convex hull is h ≤ η!. Let p_1, p_2, ..., p_h be the points of CH(S) in counterclockwise order. We assign a processor to each convex hull edge (p_{i-1}, p_i) (1 ≤ i ≤ h, p_0 = p_h). By calling the parallel procedure ES(CH(S), CH(S), π) we find for each edge the extreme point that is farthest from it. There are one or two such points. If there is one point p_j, then p_{i-1}, p_j and p_i, p_j are the antipodal pairs of points assigned to the edge. If there are two such points p_{j-1} and p_j [in this case the edges (p_{i-1}, p_i) and (p_{j-1}, p_j) are parallel], then p_{i-1}, p_{j-1} and p_i, p_j are the antipodal pairs assigned to the edge (the other two pairs are not candidates for the farthest pair). It is easy to show that all antipodal pairs are found during this process.
Next, in parallel, each edge calculates the distances between the antipodal pairs assigned to it and stores the greater one. Then the greatest among the computed distances is the diameter of the given point set. This algorithm runs in O(η³ log η) time on X_η with η! nodes for a set S of size O(η!). For a set S of size n, it runs in O(log n) time on a CREW PRAM with n processors and in O(n^{1/2}) time on a mesh computer, also with n processors. In [Boxe89a] an O(log n) time CREW PRAM algorithm is proposed by using binary search rather than the merging slopes technique. A different CREW PRAM solution is suggested in [Agga88]. Finally, an optimal solution for mesh computers is given in [Jeon90] based on finding the Voronoi diagram of the set.

Width and minimax linear fit. The width of a set S of n planar points is the smallest distance between two parallel lines of support. This problem has been studied in [Houl85] and [Lee86b], where optimal O(n log n) time algorithms are presented. Given the data points (x_i, y_i), i = 1, 2, ..., n, and a line described by ax + by + c = 0 (where a² + b² = 1), define the distance from a point (x_i, y_i) to the line as |ax_i + by_i + c|, and define the "error" of the line fit as ε(a, b) = max_i |ax_i + by_i + c|. The line above is called a minimax linear fit if it minimizes ε(a, b). There are a number of solutions to the minimax linear fit problem [Sham78, Houl85, Lee86b, Rey87].
Shamos [Sham78] was the first to propose an O(n log n) time solution to the problem, and this solution was shown to be optimal in [Lee86b]. In what follows we give a parallelization of the sequential method described in [Houl85] for solving the width and minimax linear fit problems. Finding the width of a set S of points reduces to finding a farthest point for each edge of CH(S) [i.e., an associated point, by calling the procedure ES(CH(S), CH(S), π)] and computing the smallest distance from an edge to its associated point (i.e., computing the hull edge with the least error). Since the minimax linear fit is at maximum distance from at least three points of S [Sham78, Houl85, Lee86b], the minimax linear fit is the middle line between the least error hull edge and the point farthest away from it. (The middle line is the line that lies between, and is equidistant to, the convex hull edge with the least error and its farthest associated point. That way, the error of the middle line is minimized among all lines parallel to the edge. On the other hand, the minimax linear fit must be parallel to a convex hull edge.) Both algorithms run in O(η³ log η) time.
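The reduction above can be checked with a short sequential sketch (a reference implementation of the same edge/farthest-point idea, not the parallel procedure; all function names are ours): compute the convex hull, and for each hull edge take the distance to its farthest hull vertex; the width is the smallest such distance.

```python
def convex_hull(points):
    # Andrew's monotone chain; returns hull vertices in counterclockwise order.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def width(points):
    # For each hull edge, find the distance to its farthest hull vertex
    # (its "associated point"); the width is the smallest such distance.
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    hull = convex_hull(points)
    h = len(hull)
    best = float('inf')
    for i in range(h):
        a, b = hull[i], hull[(i + 1) % h]
        edge_len = ((b[0]-a[0])**2 + (b[1]-a[1])**2) ** 0.5
        farthest = max(abs(cross(a, b, p)) for p in hull) / edge_len
        best = min(best, farthest)
    return best
```

For a 4-by-1 rectangle (with an extra point on its boundary) the sketch returns a width of 1, as expected.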
Maximum distance between two convex polygons.
Let P = (p_1, p_2, ..., p_n) and Q = (q_1, q_2, ..., q_n) be two convex polygons. The maximum distance between P and Q, denoted by d_max(P, Q), is defined as d_max(P, Q) = max{d(p_i, q_j)}, i, j = 1, 2, ..., n, where d(p_i, q_j) is the Euclidean distance between p_i and q_j. There is an O(n) time optimal sequential solution to the problem [Tous83], which is similar to that for the diameter problem. The maximum distance between P and Q is chosen among antipodal pairs between the polygons P and Q. Procedure ES(P, Q, π) is called and the maximum distance is then found as for the diameter problem. This algorithm runs on X_η in O(η²) time for n = η! points.
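As a correctness check, d_max admits a straightforward O(n²) reference implementation over all vertex pairs (the optimal methods merely restrict attention to antipodal pairs between P and Q; this brute-force sketch and its name are ours):

```python
from math import dist  # Euclidean distance, available since Python 3.8

def d_max(P, Q):
    # Brute-force reference: d_max(P, Q) is attained at a pair of vertices,
    # one from each polygon, so checking all vertex pairs suffices.
    return max(dist(p, q) for p in P for q in Q)
```

For a unit square and a copy of it translated three units to the right, the maximum distance is the diagonal between opposite far corners, sqrt(17).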
Vector sum of two convex polygons.
Consider two convex polygons P and Q (where a convex polygon is viewed as an infinite set of points that contains all the interior points as well as the boundary points). Given a point r = (x_r, y_r) in P and a point s = (x_s, y_s) in Q, the vector sum of r and s, denoted by r ⊕ s, is a planar point t given by t = (x_r + x_s, y_r + y_s). The vector sum of the two sets P and Q, denoted by P ⊕ Q, is the set consisting of all the elements obtained by adding every point in Q to every point in P. Vector sums of polygons and polyhedra have applications in collision avoidance problems [Tous83]. There is an O(n) time optimal sequential solution to the problem [Tous83]. It is based on the fact that P ⊕ Q is a convex polygon with no more than 2n vertices, and the vertices of P ⊕ Q are vector sums of the vertices of P and Q. Two vertices p_i in P and q_j in Q that admit parallel lines of support in the same direction, as illustrated in Figure 11.14, will be referred to as a co-podal pair. The following property allows us to search only for the co-podal pairs of P and Q in constructing P ⊕ Q: the vertices of P ⊕ Q are vector sums of co-podal pairs of vertices [Tous83]. Thus, to obtain a parallel solution we call procedure ES(P, Q, 0), which finds the co-podal pairs (p_i, q_j). The vertices of P ⊕ Q are given by computing p_i ⊕ q_j for each co-podal pair. A convex hull algorithm can then be used to
Figure 11.14 Co-podal pair.
find the desired polygon. Note that if no two edges in P and Q are parallel, the number of co-podal pairs is n = η!. This algorithm runs in O(η³ log η) time.

11.3.5 General Prefix Computation

The GPC algorithm of [Spri89] for the hypercube can easily be modified to run in O(η³ log η) time on X_η, where n = η!, using the merging and interval broadcasting algorithms described earlier. Using the GPC algorithm, several problems can be solved on X_η by setting the f's and y's appropriately. They include ECDF searching, two-set dominance counting, and 2-D and 3-D maximal elements computation. These problems can all be solved in O(η³ log η) time. Another problem that can be solved in O(η³ log η) time on X_η using the GPC is triangulating a set of points in the plane. Other applications of the GPC on X_η are reconstruction of trees from their in-order and preorder traversals, parenthesis matching, and counting inversions in a permutation [Spri89].

11.3.6 Concluding Remarks

In this section we have presented several basic algorithms for data communication in the star and pancake interconnection networks. These algorithms were then used to obtain efficient solutions to many problems in computational geometry. One of these is an algorithm that computes the convex hull of η! planar points on star and pancake
networks with η! processors in O(η³ log η) time. This time matches that of the best known sorting algorithms on these networks. The merging slopes technique, on which the convex hull algorithm is based, together with the data communication algorithms, are then applied to obtain solutions to a number of other geometric problems. It was also shown how the GPC can be implemented on the star and pancake networks. Other problems that can be solved using the data communication algorithms are the problems of finding the closest pair among a set of n = η! points, and computing the intersection of two convex polygons. They can be solved in O(η³ log η) and O(η²) time, respectively. The algorithms are not given explicitly since they can be obtained directly from the ones described in [Stoj88a] for the hypercube model.
11.4 Broadcasting with Selective Reduction

An examination of the literature on parallel algorithms reveals that the PRAM is, without doubt, the most popular theoretical model of parallel computation [Gibb88, Akl89a, Karp90]. Figure 2.14, showing the PRAM, is repeated here as Figure 11.15 for easy reference. As stated in Chapter 2, the model consists of a number of processors sharing a common memory. The processors solve a computational problem in parallel by executing the steps of an algorithm simultaneously. The shared memory stores data and results, and also serves as the communication medium for the processors. An interconnection unit (IU) allows the processors to gain access to the memory locations for the purpose of reading or writing. The model is further specified by defining the mode of memory access. The three most commonly used variants are the EREW PRAM, the CREW PRAM, and the CRCW PRAM. One important property of the PRAM is that in all of its variants, memory access is assumed to take constant time. Curiously, and perhaps deliberately, the vast majority of definitions of the model appearing in the literature do not take into consideration the mechanism necessary to provide such powerful memory access capabilities. Consequently, virtually all analyses of PRAM algorithms ignore both the size of the IU linking processors to memory locations, and the time required to gain access to an arbitrary memory location (both necessarily functions of the size of the memory). Note, however, that the cost of such an IU often dominates the cost of the computation [Snyd86, Akl91a]. This state of affairs is unfortunate since the usefulness of a model that is not fully defined is significantly limited. As a result, there have been two approaches to remedy this situation. In the first, effort is directed toward showing how the PRAM can be simulated by a set of processors communicating through a network.
There is no longer a global shared memory; instead, memory is distributed among the processors. Typical examples of this work appear in [Alt87, Rana87] and the references therein. In the second and more direct approach a construction is given that provides a full description of the IU. Several representatives of this approach appear in the literature; they include binary trees [Akl89b], memory buses [Kuce82, Akl89c], bus automata [Roth76], sorting and merging circuits [Vish84], and scan operations [Blel89]. These works led to two insights. The first is that all rules for resolving write conflicts in
Figure 11.15 PRAM: processors, interconnection unit (IU), shared memory locations.
the CRCW PRAM are essentially equivalent in the sense that any concrete device with
constant size components that implements one of them can implement them all. The second is that the number of components required to build an IU for a CRCW PRAM is of the same order as that required to build an IU for an EREW PRAM. The observation that any shared-memory model of parallel computation (e.g., the PRAM) must include as part of its definition the IU linking processors to memory locations leads in turn to another realization. Once such an IU is included in the definition, it may be exploited to provide additional useful instructions. One such PRAM extension, called broadcasting with selective reduction (BSR), is proposed in [Akl89c, Akl89d]. The model is of the shared memory family. It possesses all the features of the CRCW PRAM, with one additional instruction making it more powerful. Specifically, the IU is exploited to allow all processors to broadcast data (one datum per processor) to all memory locations, and each memory location to select a subset of the received data and reduce it to one value eventually stored in that location. This instruction is implemented in O(T(N, M)) time, where T(N, M) is the time required for memory access in a PRAM having N processors and M memory locations. A variety of algorithms for a number of fundamental computational problems are described in [Akl89c, Akl89d, Fava90, Akl91b, Akl91c, Shi91], which use this powerful BROADCAST primitive, and whose running time is O(T(N, M)). If T(N, M) = O(1), as is typically assumed in the PRAM, these algorithms run in constant time. The remainder of this section is organized as follows. In Section 11.4.1 we describe the BSR model formally and introduce a mathematical notation for the broadcast instruction. Sample algorithms are presented in Section 11.4.2. Finally, an efficient implementation of BSR is provided in Section 11.4.3. Section 11.4.4 describes an open problem.
11.4.1 BSR Model

BSR is an extension of the CRCW PRAM, permitting an additional form of concurrent access to shared memory, namely the BROADCAST instruction. This instruction allows
all processors to write to all shared-memory locations simultaneously. Each processor produces a tag and a datum, according to expressions given in the BROADCAST instruction. Each shared variable, having previously been assigned a selection operation, σ, selects among the incoming data by examining the accompanying tag values and testing the condition tag σ limit. The limit parameter must be set up before the BROADCAST and is (potentially) unique for each shared variable. For all tags satisfying this condition, the accompanying data are "accepted" by the shared variable and combined under the reduction operation, ℛ. The BROADCAST instruction executes in one cycle, just as any shared-memory access.

Notation for the BROADCAST. A mathematical notation is used to describe the BROADCAST instruction. Let us define the following symbols:

d_i: datum broadcast by processor i
t_i: tag broadcast by processor i
s_j: jth element of shared array s
l_j: limit value associated with the jth shared array element
m: size of a shared array
σ: selection operation, where σ ∈ {<, ≤, =, >, ≥, ≠}
ℛ: a binary associative reduction operation, where ℛ ∈ {Σ, Π, ∧, ∨, ⊕, max, min}, denoting sum, product, AND, OR, exclusive OR, maximum, and minimum, respectively
N: number of processors
M: number of shared memory locations, where M = cm, for some c ≥ 1

The BSR BROADCAST instruction is denoted by
s_j := ℛ_{i=1}^{N} { d_i : t_i σ l_j },  for 1 ≤ j ≤ m.

When the ranges of the variables are understood, this can be abbreviated as

s_j := ℛ_i { d_i : t_i σ l_j }.
The notation above can be interpreted as follows. For each memory location s_j (with an associated limit value l_j), the proposition (t_i σ l_j) is tested over all broadcast pairs t_i, d_i. In every case for which t_i satisfies the proposition, d_i is "accepted" by location s_j. The set of all data accepted by s_j is reduced to a single value by means of the binary associative operation ℛ, and stored in shared variable s_j. If no data are accepted by a given memory location, the value of that shared variable is not affected by the BROADCAST. If only one datum is accepted, s_j is assigned the value of that datum. It is important to note that on an N-processor, M-memory location BSR model, the BROADCAST instruction above takes O(T(N, M)) time. On the other hand, on
a CRCW PRAM with the same number of processors and memory locations, we do not know how to perform the same computations in an amount of time smaller than M × T(N, M).
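The semantics of BROADCAST can be captured by a short sequential simulation (a sketch of the model's behavior, not of the O(T(N, M)) circuit implementation; all function and parameter names below are ours):

```python
import operator

# Map each BSR selection operator sigma to a Python comparison.
SELECT = {'<': operator.lt, '<=': operator.le, '=': operator.eq,
          '>': operator.gt, '>=': operator.ge, '!=': operator.ne}

def broadcast(tags, data, limits, sigma, reduce_op, s):
    # Each processor i broadcasts (tags[i], data[i]); memory location j accepts
    # data[i] whenever tags[i] sigma limits[j] holds, and reduces all accepted
    # data with reduce_op. Locations accepting no datum keep their old value,
    # exactly as in the BSR definition; s holds the current shared array.
    test = SELECT[sigma]
    out = list(s)
    for j, limit in enumerate(limits):
        accepted = [d for t, d in zip(tags, data) if test(t, limit)]
        if accepted:
            acc = accepted[0]
            for d in accepted[1:]:
                acc = reduce_op(acc, d)
            out[j] = acc
    return out
```

Calling broadcast([4, 2, 6, 9], [6, -4, -2, 15], [20, 8, 11, 5], '<', operator.add, [0, 0, 0, 0]) yields [15, 0, 15, 2], which matches the values v_1 = 15, v_2 = 0, v_3 = 15, v_4 = 2 of the worked example in Section 11.4.3 (those values are consistent with selection < and reduction Σ).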
11.4.2 Sample BSR Algorithms

To demonstrate the power and elegance of the BROADCAST instruction, we describe BSR algorithms that compute, for a given set of n = N planar points, their maxima and their convex hull. Both algorithms run in O(T(N, M)) time, where (as defined earlier) T(N, M) is the time required for memory access in the PRAM. It is customary in analyses of PRAM algorithms to assume that T(N, M) = O(1). Consequently, the two BSR algorithms described below run in constant time. Note that in both algorithms, each step involving the index i (respectively, j) is executed for all i, 1 ≤ i ≤ N (respectively, for all j, 1 ≤ j ≤ N).
Maximal Vectors. Two points p and q in the plane are given by their Cartesian coordinates. The point q is said to dominate p if and only if the x-coordinate of q is larger than the x-coordinate of p, and the y-coordinate of q is larger than the y-coordinate of p. A point is said to be maximal with respect to a collection of points if and only if it is not dominated by any other point. Given a set of points in the plane, the maximal vectors problem requires that we identify those input points that are dominated by no others. (See Chapter 3 and Section 11.2.3.) The following algorithm requires that the input points be stored in arrays x_1, x_2, ..., x_N and y_1, y_2, ..., y_N, where the ith point is given by (x_i, y_i). The result is stored in array m_1, m_2, ..., m_N. When the algorithm terminates, point i is maximal if and only if m_i = 1. Initially, m_i = y_i.
Algorithm Maximal Vectors
Step 1. {For each input point, find the maximum y-coordinate among all points to its right.}
m_j := max { y_i : x_i > x_j }
Step 2. {If the point found in step 1 does not lie above the input point, then the input point is maximal.}
if m_i > y_i then m_i := 0 else m_i := 1.
During the BROADCAST, variable m_j (associated with point p_j) accepts the y-coordinate of every point that lies strictly to the right of p_j. After the reduction, m_j stores the largest y-coordinate among all points to the right of p_j; say, it is the coordinate of point q. Obviously, if q lies above p_j, then q dominates p_j (since it also lies to the right of p_j). Conversely, if q does not lie above p_j, then q does not dominate p_j. It follows that no point in the input set dominates p_j, and hence p_j is maximal. The algorithm uses N processors and M = O(N) memory locations.
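The two steps can be mirrored sequentially (a sketch of the same logic; the BSR version performs step 1 for all points in a single BROADCAST with selection > and reduction max):

```python
def maximal_points(points):
    # Step 1 analogue: for each point, the largest y among points strictly to
    # its right. Step 2 analogue: the point is maximal iff no such point lies
    # strictly above it (or nothing lies to its right at all).
    result = []
    for xj, yj in points:
        right_ys = [yi for xi, yi in points if xi > xj]
        if not right_ys or max(right_ys) <= yj:
            result.append((xj, yj))
    return result
```

For the staircase (1, 3), (2, 2), (3, 1) every point is maximal; for the chain (1, 1), (2, 2), (3, 3) only (3, 3) is.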
Convex Hull. Our second example is a BSR algorithm for computing the convex hull of a set of points in the plane. Since a key step in the algorithm sorts a set of points, we begin by showing how sorting can be done on the BSR model. The problem is stated as follows: Given a sequence of N distinct numbers v_1, v_2, ..., v_N, it is required to permute them so that v_i < v_{i+1} for 1 ≤ i ≤ N - 1. The BSR algorithm below is based on counting, for each element, all the lesser elements in the sequence and using this result to place the element in its correct position.

Algorithm Sorting
Step 1. {Compute the rank of each element.}
c_j := Σ { 1 : v_i < v_j }
Step 2. {Permute the elements.}
v_{1+c_j} := v_j.
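A sequential rendering of the two steps (zero-based indexing; assumes distinct inputs, as in the problem statement; the function name is ours):

```python
def rank_sort(v):
    # Step 1: c = number of elements smaller than v[j] (the rank of v[j]).
    # Step 2: route v[j] to position c (i.e., 1 + c_j in one-based indexing).
    out = [None] * len(v)
    for j in range(len(v)):
        c = sum(1 for x in v if x < v[j])
        out[c] = v[j]
    return out
```

With duplicate inputs, the perturbation described next (v_i := v_i × N + i × δ) would first make the keys distinct while preserving their order.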
The algorithm uses N processors, M = O(N) memory locations, and runs in O(T(N, M)) time. It can be extended to handle duplicate data by modifying each v_i before sorting as follows: v_i := (v_i × N) + (i × δ), where δ is the smallest positive difference between two inputs and can be computed in O(T(N, M)) time during a preprocessing step. After sorting, the data are returned to their original values. Furthermore, if v_j is associated with another datum d_j, the latter can also be permuted by adding the statement d_{1+c_j} := d_j. If both of these changes are used, the resulting sorting algorithm will be stable. We use this stable sorting algorithm in the convex hull algorithm below. We are now ready to present our BSR algorithm for computing the convex hull. It is based on the following characterization of hull points. Consider a point p. Construct all the rays from p to every other point in the set. These rays form a star centered at p. Now measure the angles between each pair of adjacent rays. If the largest such angle is smaller than π, then p is not on the hull (and conversely). (See property 3 of Chapter 3.) Given N points in the plane, p_1, p_2, ..., p_N, the following algorithm computes their convex hull in O(T(N, M)) time using N² processors, numbered 1 to N², and M = O(N) memory locations.

Algorithm Convex Hull
Step 1. Each processor i computes the angle made by the ray p_r p_s with the positive x-axis, where r and s are computed as follows: r := 1 + ⌊(i - 1)/N⌋ and, if i mod N > 0 then s := i mod N else s := N. Store the angle in a[i] and store r in b[i].
Step 2. Stably sort the two arrays a and b together, first using a as the key, and then using b as the key. Now a contains, in order, all the angles from p_1, followed by all those from p_2, and so on.
Step 3. if s = N then a[i] := a[i - N + 1] - a[i] + 2π else a[i] := a[i + 1] - a[i]. Now a contains all the angles between adjacent rays.
Step 4. Use a maximizing concurrent-write in m[r] := a[i]. Now p_r is a hull point if and only if m[r] > π.
It is interesting to note that this algorithm can be viewed as a parallelization of Graham's sequential algorithm for computing the convex hull [Grah72]. An open problem is to design a BSR algorithm that computes the convex hull of N points using asymptotically fewer than N² processors in O(T(N, M)) time and M = O(N) memory locations.
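The underlying characterization — p is a hull point iff the largest angular gap between successive rays from p is at least π — can be tested sequentially (a sketch assuming distinct points in general position; the function name is ours):

```python
from math import atan2, pi

def hull_points(points):
    # For each point p, sort the angles of the rays to all other points and
    # find the largest gap between adjacent rays (including the wrap-around
    # gap); p is on the convex hull iff that gap is at least pi.
    hull = []
    for i, p in enumerate(points):
        angles = sorted(atan2(q[1] - p[1], q[0] - p[0])
                        for j, q in enumerate(points) if j != i)
        gaps = [b - a for a, b in zip(angles, angles[1:])]
        gaps.append(angles[0] + 2 * pi - angles[-1])  # wrap-around gap
        if max(gaps) >= pi:
            hull.append(p)
    return hull
```

For the corners of a square plus its center, the four corners pass the test and the center (whose largest gap is π/2) does not.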
11.4.3 Optimal BSR Implementation

In this section we show how the BSR model can be implemented (i.e., we give a full description of the IU required in BSR to execute the BROADCAST instruction, as well as all operations of the CRCW PRAM). The main result of this section is that BSR, while extending the strongest PRAM variant (i.e., the CRCW PRAM), can be implemented with no more resources than required for the EREW PRAM (the weakest of the PRAM variants). As the implementation is based on combinational circuits, we use a definition of these as our starting point.
Combinational Circuits. For our purposes, a combinational circuit is a device taking a number of inputs at one end and producing a number of outputs at the other end [Parb87]. The circuit is made up of a number of interconnected components arranged in columns called stages. Each component has a constant number of input lines and a constant number of output lines; we say that the circuit has bounded degree. Given this fixed number of inputs (coming from the previous stage or from the outside world), each component computes a certain function of these inputs in one time unit and produces the result as output (to the next stage or to the outside world). An important feature of a combinational circuit is that it has no feedback: no component can be used more than once while computing the circuit's output for a given input. The size of a combinational circuit is defined as the number of components it uses. In Figure 11.16 a circuit is represented by a rectangular box. The figure shows what is meant by the depth and width of a combinational circuit. The depth is the number of stages in the circuit (i.e., the maximum number of components on a path from input to output). The width is the maximum number of components in a stage. Note that the product of the depth and
Figure 11.16 Circuit depth and width.
width provides an upper bound on the size of a circuit. We illustrate these definitions with two examples central to our subsequent development.

Computing Prefix Sums. In its broadest definition, a prefix computation operates on N inputs and produces N outputs, such that the ith output item is a function of the first i input items. For example, the prefix sums problem requires that the ith output be the sum of the first i inputs. A circuit for computing prefix sums of a set of eight integers is shown in Figure 11.17 [Akl89a]. Each component either adds its two inputs or leaves its single input unchanged. Here the depth is 4 and the width is 8. In general, for N inputs the depth of the prefix sums circuit is 1 + log N, and the width is N.

Sorting. A sorting circuit receives as input a set of N data values and produces at the output these same data values arranged in nondecreasing order. Here the components are comparators. Each comparator receives two values as input: it produces the smaller of the two on its top output line, and the larger on the bottom line. (If the two inputs are equal, their order is unchanged.) In [Batc68], two sorting circuits with similar properties are described, the odd-even and bitonic sorting circuits. Both circuits are simple extensions of merging circuits, also described in [Batc68]. Figure 11.18 shows the odd-even sorting circuit when N = 8. In general, Batcher's circuits have depth O(log² N) and width O(N). A sorting circuit considerably more complicated than Batcher's, but with asymptotically smaller depth, is the AKS circuit [Ajta83, Pate90]. It has depth O(log N) and width O(N). We conclude this section by pointing out that our implementation of BSR views the IU as a combinational circuit, two building blocks of which are a prefix sums circuit and a sorting circuit.
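A prefix-sums network of this flavor can be simulated stage by stage (this sketch uses recursive-doubling, Kogge–Stone-style wiring — one of several constructions with O(log N) depth and O(N) width; the circuit of Figure 11.17 may differ in its exact wiring):

```python
def prefix_sums(x):
    # Simulate the network one stage at a time: in the stage with offset
    # `shift`, component i (for i >= shift) adds the value from position
    # i - shift of the previous stage; others pass their value through.
    # After ceil(log2 N) stages, position i holds the sum of inputs 0..i.
    vals = list(x)
    n = len(vals)
    shift = 1
    while shift < n:
        # A new list is built from the old one: no feedback, as in a
        # combinational circuit.
        vals = [vals[i] + (vals[i - shift] if i >= shift else 0)
                for i in range(n)]
        shift *= 2
    return vals
```

For eight inputs the simulation uses three stages of eight components each, consistent with logarithmic depth and linear width.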
Lower bounds and an optimal PRAM circuit. It is not difficult to see that an IU connecting M processors to M memory locations, and implemented as a combinational circuit, must have a width of Ω(M), a depth of Ω(log M), and a size of Ω(M log M) [Shan50]. These lower bounds apply naturally to all variants of the PRAM [from the weakest (i.e., the EREW PRAM) to the strongest (i.e., the CRCW PRAM)], as well as to BSR. An implementation of the PRAM is known whose width, depth, and size match these lower bounds up to a constant factor and is therefore optimal [Vish84]. It should be noted here that the O(log M) depth of the IU implies a memory access
Figure 11.17 Prefix sums circuit.
time of O(log M). It is important to keep in mind, however, that the time required for memory access is usually assumed to be a constant on the PRAM.

The BSR circuit. We now show that the BSR model can be implemented by a circuit whose width, depth, and size match the lower bounds described above (again up to a constant factor). At the execution of a BROADCAST instruction, processor i produces a record of the form (i, t_i, d_i), where i is the processor index, t_i its tag, and d_i its datum. Also, memory location j produces a record of the form (j + N, l_j, v_j), where j is the index of the memory location, l_j its assigned limit value, and v_j a variable that holds the data value to be stored in that memory location. Processors that are not active at the point of execution of a BROADCAST instruction produce records (i, t_i, d_i) with d_i equal to the identity of the reduction operation (e.g., 0 for sum and 1 for product). In what follows we present the basic building blocks of the circuit and illustrate its operation through an example. Further details pertaining to the functioning of the various subcircuits and the way in which each of the BROADCAST cases is implemented can be found in [Fava91]. The full BSR circuit is illustrated in Figure 11.19. The circuit is divided into six
Figure 11.18 Batcher's odd-even sorting circuit.
boxes, of which boxes A, C, D, and F are sorting and merging circuits. We employ the AKS sorting circuit [Ajta83] in these four boxes to achieve logarithmic depth. Boxes B and E are modified prefix sums circuits. In both of these boxes, the standard prefix sums circuit is augmented with additional lines that provide connections to preceding as well as following rows of components. Box B performs prefix computations in accordance with the selection and reduction operations in the BROADCAST instruction. Box E, on the other hand, serves to distribute data from processor records to memory records, and vice versa. The circuit in box E is an adaptation of a well-known algorithm for solving the following problem (see, e.g., [Nass81, Ullm84]): an ordered sequence of records is given, some of which hold a datum and are referred to as leaders; it is required to copy the datum held by each leader to the records separating it from the next and preceding leaders in the sequence (see the section "Interval Broadcasting" in Section 11.3.2).

An Example. We illustrate the behavior of this IU through an example. Assume that a BROADCAST instruction is to be executed on a BSR model with four processors and four memory locations. Let the four processor records of the form (i, t_i, d_i) be (1, 4, 6), (2, 2, -4), (3, 6, -2), and (4, 9, 15), and the four memory records of the form (j + 4, l_j, v_j) be (5, 20, v_1), (6, 8, v_2), (7, 11, v_3), and (8, 5, v_4). When the BROADCAST instruction completes, we want the v_j to have the following values: v_1 = 15, v_2 = 0, v_3 = 15, and v_4 = 2 (these are the values obtained with selection < and reduction Σ). Figure 11.20 shows the progress of the processor and memory records through the IU. The figure does not show that boxes B and E each take as input the values of the selection and reduction operations which they need in order to implement a particular BROADCAST instruction.
This information is distributed throughout each box without affecting the progress of the overall operation. Box A sorts the processor records by t_i, while box C sorts the memory records by l_j. The prefix sums
Figure 11.19 BSR circuit (boxes A through F: sort, prefix computation, merge, distribute, sort, sort; N processors, M shared memory locations).
of the d_i are computed in box B. Box D merges the processor and memory records on the t_i and l_j. In box E, the v_j are assigned their values; for example, v_4 takes the value 2 since (1, 4, 2) is the processor record with largest tag (namely, 4) smaller than the limit value (namely, 5) in the memory record (8, 5, v_4). Finally, box F sorts all records on their first field (i.e., the index field), in order to return each record to its source. The BROADCAST instruction is complete. The interested reader is referred to [Fava91] for a description of how the BROADCAST is executed for other combinations of the selection and reduction operations and how all PRAM instructions are executed, as well as how multiple shared arrays are handled correctly.

Size of the BSR Circuit. Assume a BSR model with N processors and M memory locations. The IU circuit described contains six subcircuits, referred to as boxes A through F. Boxes A, C, and F are sorting circuits, and box D is a merging circuit. By using the AKS sorting circuit for each of these, we achieve O(N) width and O(log N) depth for box A, O(M) width and O(log M) depth for box C, and O(M + N) width and O(log(M + N)) depth for boxes D and F. Boxes B and E are based on the standard prefix computation circuit. Box B has O(N) width and O(log N) depth. Box E is larger, having O(M + N) width and O(log(M + N)) depth. Thus the BSR circuit has, in total, O(M + N) width and O(log(M + N)) depth. Assuming that N = O(M), we obtain an IU for BSR of width O(M), depth O(log M), and size O(M log M), all of which are optimal. We emphasize again that this implies an optimal memory access time of O(log M) for BSR (exactly as for the PRAM, although as mentioned earlier this is always assumed to be a constant on the PRAM). It is important to note here that although the nodes we employ in the BSR circuit are more complex than comparators, their cost (in terms of depth and number of internal
Figure 11.20 Implementing the BROADCAST instruction.
components such as registers) is only a constant factor greater than the cost of a comparator.
11.4.4 Concluding Remarks

We conclude our discussion of BSR by mentioning the following open problem. For many applications, we would like to be able to utilize simultaneously two tags, two selectors, and two limit values at the BROADCAST. In other words, we want a memory location to "accept" data fulfilling two conditions instead of one. A good example is the GPC, which captures the core of many important problems, particularly in computational geometry. To solve this problem in constant time on the BSR model, we require "double selection": the first to select j < m and the second to select y(j) < y(m). The implementation of [Fava91] does not allow this operation, and it is not clear that it can be obtained by an extension of that implementation. Efficiently implementing double selection, and more generally multiple selection, remains a major open problem.
11.5 Problems

11.1. In the geometric search problem of Section 11.1.1, we assumed that regions in a grid can have holes and disconnected components. Does lifting one or both of these assumptions result in a problem for which more efficient parallel algorithms exist?
11.2. Develop an alternative solution to the geometric search problem of Section 11.1.1 which requires O(n + w log n) time per INSERT or DELETE operation and O(1) time per RETURN TOP REGION operation, where n is the number of regions and w is the number of pixels in a region. How much storage does your algorithm require?
11.3. Design solutions to the shadow problem of Section 11.1.2 for each of the following models of computation:
(a) Mesh with broadcast buses
(b) Mesh with reconfigurable buses
(c) Modified CCC
11.4. It is pointed out at the end of Section 11.1.3 that the speed of Algorithm Maze can be nearly doubled by initializing two waves simultaneously, one originating at A and the other at B. Give a formal statement of an algorithm that makes use of this idea and runs on the same model of computation as Algorithm Maze.
11.5. Develop algorithms for solving the path in a maze problem on models of parallel computation other than the systolic screen. For example, how fast can the problem be solved on a hypercube?
11.6. On both the CREW PRAM and the mesh, the time to compute the GPC by the algorithm of Section 11.2.2 matches (up to a constant factor) the time required to sort. This is optimal since any GPC algorithm can be used to sort. On the hypercube, however, this is not the case. To date, the fastest algorithm for sorting n elements on an n-processor hypercube runs in O(log n log log n) time. On the other hand, the algorithm of Section 11.2.2 for computing the GPC on the hypercube takes O(log² n) time. Can you develop a faster algorithm for computing the GPC on the hypercube? Alternatively, can you show that the algorithm of Section 11.2.2 is optimal?
11.7. Suggest other problems where GPC might lead to efficient parallel algorithms.
11.8. It is stated in [Spri89] that the GPC can be computed in constant time on the BSR model using n processors. The algorithm given therein is as follows. Each processor i, 1 ≤ i ≤ n, broadcasts a tag y(i) and a datum f(i). Each memory location m, 1 ≤ m ≤ n, applies the selection operator < and the limit parameter y(m) to select those data f(j) for which j < m and y(j) < y(m). All selected data f(j) are then reduced, using the reduction operator *, to one value D_m that is finally stored in memory location m.
It would appear, however, that this algorithm requires double selection. Indeed, we need to know that j < m and that y(j) < y(m) before including f(j) in the computation of D.. The circuit implementing BSR and described in Section 11.4.3 does not allow double selection. (a) Can the circuit of Section 11.4.3 be extended to allow double selection? (b) Alternatively, can the GPC be computed on the existing implementation of BSR without the need for double selection? (c) Regardless of what the answers to (a) and (b) might be, are there geometric problems where double (or, more generally, multiple) selection in BSR might be useful? 11.9. The algorithms described in Section 11.3.2 for sorting on the star and pancake networks run in 0(q 3 log a) time. As suggested at the end of Section 11.3.3, however, there may be room for improvement since the lower bound on the time required for sorting q! elements using q! processors is Q (log q!) [i.e., Q(q log ?)]. Faster algorithms for sorting on these networks would imply faster algorithms for solving numerous other problems, including several in computational geometry. Can you find such algorithms? Alternatively, can you show that they do not exist? 11.10. Are there other problems in computational geometry (besides the ones described in Section 11.3) that can be solved efficiently on star and pancake networks? 11.11. Given n points in the plane, it is required to find the two that are closest to each other (i.e., the closest pair). Can this problem be solved in constant time using n processors on the BSR model?
    Hint: An extension of the BSR notation may be required. If the reduction operator is max or min, we may want the value stored to be the index of the maximum or minimum of a set of values rather than that value itself.
11.12. A set P of n points in the plane is given, where it is assumed for simplicity that no two points have the same x- or y-coordinate. Let xmin and xmax be the two points with minimum and maximum x-coordinate, respectively. The convex hull of P can be regarded as consisting of two convex polygonal chains: the upper hull, which goes from xmin to xmax (above the line segment (xmin, xmax)), and the lower hull, which goes from xmax to xmin (below the line segment (xmin, xmax)). Many convex hull algorithms are based on the idea of computing the upper hull and the lower hull separately. In fact, an algorithm for the upper hull requires only minor modifications to produce the lower hull. Now consider the following algorithm for computing the upper hull of P [Ferr91a].
Algorithm Upper Hull

Step 1. Sort the points of P by their x-coordinates.
Step 2. For each point p of P do the following:
    1. Among all points to the right of p, find the point q such that the line through the segment (p, q) forms the largest angle with the horizontal.
    2. Label all points of P that fall below (p, q).
Step 3. All unlabeled points form the upper hull.

(a) Prove that this algorithm correctly finds the upper hull of P.
(b) Can this algorithm be implemented to run in constant time, using n processors, on the BSR model?
    Hint: The hint in Problem 11.11 may be helpful here also.
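The labeling rule of Problem 11.12 can be checked on small inputs with a direct sequential rendering. The sketch below is our own illustration (the name `upper_hull` and the strict inequalities are ours, not from [Ferr91a]); "falls below (p, q)" is read here as lying strictly below the chord (p, q), within its x-range.

```python
def upper_hull(points):
    """Sequential rendering of Algorithm Upper Hull.

    points: iterable of (x, y) pairs, no two sharing an x-coordinate.
    Returns the upper-hull vertices in left-to-right order.
    """
    pts = sorted(points)                      # Step 1: sort by x-coordinate
    labeled = set()
    for i, p in enumerate(pts):               # Step 2, one point p at a time
        rest = pts[i + 1:]
        if not rest:
            break
        # 2.1: among the points to the right of p, the largest angle with
        # the horizontal is attained by the point of maximum slope from p.
        q = max(rest, key=lambda r: (r[1] - p[1]) / (r[0] - p[0]))
        # 2.2: label the points that fall below the segment (p, q), i.e.,
        # points within its x-range lying strictly under the chord.
        for r in rest:
            if p[0] < r[0] < q[0]:
                cross = ((q[0] - p[0]) * (r[1] - p[1])
                         - (q[1] - p[1]) * (r[0] - p[0]))
                if cross < 0:                 # r lies strictly below (p, q)
                    labeled.add(r)
    # Step 3: the unlabeled points form the upper hull.
    return [p for p in pts if p not in labeled]
```

With Step 2 executed for all points p simultaneously, this is precisely the computation that part (b) asks to realize in constant time on BSR.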
11.6 References

[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó Dúnlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327.
[Agga92] A. Aggarwal, Ed., Special Issue: Parallel Computational Geometry, Algorithmica, Vol. 7, No. 1, 1992.
[Ajta83] M. Ajtai, J. Komlós, and E. Szemerédi, An O(n log n) sorting network, Combinatorica, Vol. 3, 1983, 1-19.
[Aker87a] S. B. Akers, D. Harel, and B. Krishnamurthy, The star graph: an attractive alternative to the n-cube, Proceedings of the International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 393-400.
[Aker87b] S. B. Akers and B. Krishnamurthy, The fault tolerance of star graphs, Proceedings of the Second International Conference on Supercomputing, San Francisco, May 1987.
[Aker89] S. B. Akers and B. Krishnamurthy, A group theoretic model for symmetric interconnection networks, IEEE Transactions on Computers, Vol. C-38, No. 4, 1989, 555-566.
[Akl89a] S. G. Akl, The Design and Analysis of Parallel Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[Akl89b] S. G. Akl, On the power of concurrent memory access, in Computing and Information, R. Janicki and W. W. Koczkodaj (Editors), Elsevier, New York, Proceedings of the International Conference on Computing and Information, ICCI '89, Toronto, 1989, 49-55.
[Akl89c] S. G. Akl and G. R. Guenther, Broadcasting with selective reduction, Proceedings of the Eleventh IFIP Congress, San Francisco, August 1989, 515-520.
[Akl89d] S. G. Akl, Reflections on a parallel model of computation, Invited talk, First Great Lakes Computer Science Conference, Kalamazoo, Michigan, October 1989.
[Akl90] S. G. Akl, H. Meijer, and D. Rappaport, Parallel computational geometry on a grid, Computers and Artificial Intelligence, Vol. 9, No. 5, 1990, 461-470.
[Akl91a] S. G. Akl, Parallel synergy: can a parallel computer be more efficient than the sum of its parts? Proceedings of the Thirteenth IMACS World Congress on Computation and Applied Mathematics, Dublin, July 1991.
[Akl91b] S. G. Akl, Memory access in models of parallel computation: from folklore to synergy and beyond, in Algorithms and Data Structures, F. Dehne, J.-R. Sack, and N. Santoro (Editors), Springer-Verlag, Berlin, 1991, 92-104.
[Akl91c] S. G. Akl and G. R. Guenther, Application of BSR to the maximal sum subsegment problem, International Journal of High Speed Computing, Vol. 3, No. 2, June 1991, 107-119.
[Akl91d] S. G. Akl, K. Qiu, and I. Stojmenović, Computational geometry on the star and pancake networks, Proceedings of the Third Annual Canadian Conference on Computational Geometry, Vancouver, British Columbia, August 1991, 252-255.
[Akl91e] S. G. Akl, K. Qiu, and I. Stojmenović, Data communication and computational geometry on the star and pancake interconnection networks, Proceedings of the Third Symposium on Parallel and Distributed Processing, Dallas, December 1991, 415-422.
[Alt87] H. Alt, T. Hagerup, K. Mehlhorn, and F. P. Preparata, Deterministic simulation of idealized parallel computers on more realistic ones, SIAM Journal on Computing, Vol. 16, No. 5, October 1987, 808-835.
[Ayka91] C. Aykanat and T. M. Kurç, Efficient parallel maze routing algorithms on a hypercube multicomputer, Proceedings of the 1991 International Conference on Parallel Processing, St. Charles, Illinois, August 1991, Vol. III, Algorithms and Architectures, 224-227.
[Batc68] K. E. Batcher, Sorting networks and their applications, Proceedings of the AFIPS 1968 Spring Joint Computer Conference, Atlantic City, New Jersey, April/May 1968, 307-314.
[Bern88] M. Bern, Hidden surface removal for rectangles, Proceedings of the Fourth Annual ACM Symposium on Computational Geometry, Urbana-Champaign, Illinois, June 1988, 183-192.
[Beye69] W. T. Beyer, Recognition of topological invariants by iterative arrays, Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1969.
[Blel89] G. E. Blelloch, Scans as primitive parallel operations, IEEE Transactions on Computers, Vol. C-38, No. 11, November 1989, 1526-1538.
[Boxe89a] L. Boxer and R. Miller, Dynamic computational geometry on meshes and hypercubes, Journal of Supercomputing, Vol. 3, 1989, 161-191.
[Brow79b] K. Q. Brown, Geometric transforms for fast geometric algorithms, Ph.D. thesis, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania, 1979.
[Cole88b] R. Cole, Parallel merge sort, SIAM Journal on Computing, Vol. 17, No. 4, August 1988, 770-785.
[Cyph90] R. Cypher and C. G. Plaxton, Deterministic sorting in nearly logarithmic time on the hypercube and related computers, Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, Baltimore, May 1990, 193-203.
[Dehn88e] F. Dehne, A. Hassenklover, J.-R. Sack, and N. Santoro, Parallel visibility on a mesh connected parallel computer, in Parallel Processing and Applications, E. Chiricozzi and A. D'Amico (Editors), North-Holland, Amsterdam, 1988, 203-210.
[Dehn91a] F. Dehne and S. E. Hambrusch, Parallel algorithms for determining k-width connectivity in binary images, Journal of Parallel and Distributed Computing, Vol. 12, No. 1, May 1991, 12-23.
[Dehn91b] F. Dehne, Ed., Special Issue: Parallel Algorithms for Geometric Problems on Digitized Pictures, Algorithmica, Vol. 6, No. 5, 1991.
[Fava90] L. Fava, The design of an efficient BSR network, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, September 1990.
[Fava91] L. Fava Lindon and S. G. Akl, An Optimal Implementation of Broadcasting with Selective Reduction, Technical Report 91-298, Department of Computing and Information Science, Queen's University, Kingston, Ontario, March 1991.
[Ferr91a] A. G. Ferreira, personal communication, 1991.
[Four88] A. Fournier and D. Fussel, On the power of the frame buffer, ACM Transactions on Computer Graphics, Vol. 7, 1988, 103-128.
[Free75] H. Freeman and R. Shapira, Determining the minimal area rectangle for an arbitrary closed curve, Communications of the ACM, Vol. 18, 1975, 409-413.
[Gibb88] A. Gibbons and W. Rytter, Efficient Parallel Algorithms, Cambridge University Press, Cambridge, England, 1988.
[Good92a] M. T. Goodrich, Ed., Special Issue: Parallel Computational Geometry, International Journal on Computational Geometry and Applications, 1992.
[Grah72] R. L. Graham, An efficient algorithm for determining the convex hull of a finite planar set, Information Processing Letters, Vol. 1, 1972, 132-133.
[Guib82] L. J. Guibas and J. Stolfi, A language for bitmap manipulation, ACM Transactions on Computer Graphics, Vol. 1, 1982, 191-214.
[Houl85] M. E. Houle and G. T. Toussaint, Computing the width of a set, Proceedings of the First Annual ACM Symposium on Computational Geometry, Baltimore, 1985, 1-7.
[Jeon90] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on a mesh-connected computer, Algorithmica, Vol. 5, No. 2, 1990, 155-178.
[John77] D. B. Johnson, Efficient algorithms for shortest paths in sparse networks, Journal of the ACM, Vol. 24, No. 1, 1977, 1-13.
[Jwo90] J. S. Jwo, S. Lakshmivarahan, and S. K. Dhall, Embedding of cycles and grids in star graphs, Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing, Dallas, December 1990, 540-547.
[Karp90] R. M. Karp and V. Ramachandran, A survey of parallel algorithms for shared memory machines, in Handbook of Theoretical Computer Science, J. van Leeuwen (Editor), North-Holland, Amsterdam, 1990, 869-941.
[Knut73] D. E. Knuth, The Art of Computer Programming, Vol. 3, Addison-Wesley, Reading, Massachusetts, 1973.
[Kuce82] L. Kučera, Parallel computation and conflict in memory access, Information Processing Letters, Vol. 14, No. 2, April 1982, 93-96.
[Lang76] T. Lang and H. S. Stone, A shuffle-exchange network with simplified control, IEEE Transactions on Computers, Vol. C-25, No. 1, January 1976, 55-65.
[Lee61] C. Y. Lee, An algorithm for path connections and its applications, IRE Transactions on Electronic Computers, Vol. EC-10, No. 3, 1961, 346-365.
[Lee86a] D. T. Lee, Geometric location problems and their complexity, Proceedings of the Symposium on Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, No. 233, Springer-Verlag, Berlin, 1986, 154-167.
[Lee86b] D. T. Lee and Y. F. Wu, Geometric complexity of some location problems, Algorithmica, Vol. 1, 1986, 193-211.
[Leig91] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays · Trees · Hypercubes, Morgan Kaufmann, San Mateo, California, 1991.
[Menn90] A. Menn and A. K. Somani, An efficient sorting algorithm for the star graph interconnection network, Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 1-8.
[Mill89b] R. Miller and Q. F. Stout, Mesh computer algorithms for computational geometry, IEEE Transactions on Computers, Vol. C-38, No. 3, March 1989, 321-340.
[Nass80] D. Nassimi and S. Sahni, Finding connected components and connected ones on a mesh-connected parallel computer, SIAM Journal on Computing, Vol. 9, No. 4, November 1980, 744-757.
[Nass81] D. Nassimi and S. Sahni, Data broadcasting in SIMD computers, IEEE Transactions on Computers, Vol. C-30, No. 2, February 1981, 101-106.
[Nass82] D. Nassimi and S. Sahni, Parallel permutation and sorting algorithms and a new generalized connection network, Journal of the ACM, Vol. 29, No. 3, 1982, 642-667.
[Niga90] M. Nigam, S. Sahni, and B. Krishnamurthy, Embedding hamiltonians and hypercubes in star interconnection graphs, Proceedings of the International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 340-343.
[Parb87] I. Parberry, Parallel Complexity Theory, Research Notes in Theoretical Computer Science, Pitman Publishing, London, 1987.
[Pate90] M. S. Paterson, Improved sorting networks with O(log N) depth, Algorithmica, Vol. 5, 1990, 75-92.
[Prea88] B. T. Preas, M. J. Lorenzetti, and B. D. Ackland (Editors), Physical Design Automation of Electronic Systems, Benjamin-Cummings, Menlo Park, California, 1988.
[Prep81] F. P. Preparata and J. Vuillemin, The cube-connected-cycles: a versatile network for parallel computation, Communications of the ACM, Vol. 24, No. 5, 1981, 300-309.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Qiu91a] K. Qiu, H. Meijer, and S. G. Akl, Parallel routing and sorting on the pancake network, Proceedings of the International Conference on Computing and Information, Ottawa, May 1991, Lecture Notes in Computer Science, No. 497, Springer-Verlag, Berlin, 360-371.
[Qiu91b] K. Qiu, H. Meijer, and S. G. Akl, Decomposing a star graph into disjoint cycles, Information Processing Letters, Vol. 39, No. 3, August 1991, 125-129.
[Qiu91c] K. Qiu, S. G. Akl, and H. Meijer, The Star and Pancake Interconnection Networks: Properties and Algorithms, Technical Report 91-297, Department of Computing and Information Science, Queen's University, Kingston, Ontario, March 1991.
[Rana87] A. G. Ranade, How to emulate shared memory, Proceedings of the Twenty-Eighth Annual Symposium on Foundations of Computer Science, Los Angeles, October 1987, 185-194.
[Rank90] S. Ranka and S. Sahni, Hypercube Algorithms with Application to Image Processing and Pattern Recognition, Springer-Verlag, New York, 1990.
[Rey87] C. Rey and R. Ward, On determining the on-line minimax linear fit to a discrete point set in the plane, Information Processing Letters, Vol. 24, No. 2, 1987, 97-101.
[Roth76] J. Rothstein, On the ultimate limitations of parallel processing, Proceedings of the 1976 International Conference on Parallel Processing, Detroit, August 1976, 206-212.
[Sche89] I. D. Scherson and S. Sen, Parallel sorting in two-dimensional VLSI models of computation, IEEE Transactions on Computers, Vol. C-38, No. 2, 1989, 238-249.
[Sham78] M. I. Shamos, Computational geometry, Ph.D. thesis, Department of Computer Science, Yale University, New Haven, Connecticut, 1978.
[Shan50] C. E. Shannon, Memory requirements in a telephone exchange, Bell Systems Technical Journal, Vol. 29, 1950, 343-349.
[Shi91] X. Shi, Contributions to sequence problems, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, September 1991.
[Snyd86] L. Snyder, Type architectures, shared memory and the corollary of modest potential, Annual Review of Computer Science, Vol. 1, 1986, 289-317.
[Spri89] F. Springsteel and I. Stojmenović, Parallel general prefix computations with geometric, algebraic, and other applications, International Journal of Parallel Programming, Vol. 18, No. 6, December 1989, 485-503.
[Stoj88a] I. Stojmenović, Computational geometry on a hypercube, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 100-103.
[Thom77] C. D. Thompson and H. T. Kung, Sorting on a mesh-connected parallel computer, Communications of the ACM, Vol. 20, No. 4, 1977, 263-271.
[Tous83] G. T. Toussaint, Solving geometric problems with the "rotating calipers", Proceedings of IEEE MELECON '83, Athens, May 1983.
[Ullm84] J. D. Ullman, Computational Aspects of VLSI, Computer Science Press, Rockville, Maryland, 1984.
[Vish84] U. Vishkin, A parallel-design distributed-implementation (PDDI) general-purpose computer, Theoretical Computer Science, Vol. 32, 1984, 157-172.
[Won87] Y. Won and S. Sahni, Maze routing on a hypercube multiprocessor computer, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 630-637.
12 Future Directions
As its title indicates, this final chapter aspires to point toward some directions for future research in parallel computational geometry. We discuss implementing data structures on network models, problems related to visibility (such as art gallery, illumination and stabbing problems), geometric optimization using neural nets, arrangements, P-complete problems, and dynamic computational geometry.
12.1 Implementing Data Structures on Network Models

The most important feature of the PRAM, and the reason for its power, is the common memory shared by the processors. Not only does the shared memory serve as a communication medium for the processors, but it allows a direct implementation of complex data structures, in a manner very similar to the way they are implemented on the memory of a sequential computer. The situation is considerably different for processor networks, where the memory is no longer shared but is instead distributed among the processors. Implementing data structures in the latter situation poses a more serious challenge.

The problem of implementing data structures on a hypercube is addressed in [Dehn90]. A class of graphs called ordered h-level graphs is defined, which includes most of the standard data structures. It is shown in [Dehn90] that for such a graph with n nodes stored on a hypercube, O(n) search processes can be executed in parallel. This allows efficient solutions to be obtained for the following two problems:

1. Given a set of n linear segments in the plane, consider one endpoint of some segment p in the set. Let two rays emanate from that endpoint in the direction of the positive and negative y-axis, respectively. The rays intersect at most two other segments, called the trapezoidal segments for that endpoint. The trapezoidal map of the set consists in defining, for each endpoint, its trapezoidal segments.
2. Given an n-vertex simple polygon P, it is required to triangulate P.
It is shown in [Dehn90] that both of these problems can be solved on a hypercube of size O(n log n) in time O(log^2 n). (Recall that faster algorithms to solve these problems exist for the more powerful CREW PRAM model, as shown in Chapter 10.)

Consider a data structure modeled as a graph G with n nodes of constant degree. The multisearch problem calls for efficiently performing O(n) search processes on such a data structure. An additional condition on the problem is that each search path is defined on line (i.e., once a search query reaches some node v of G, it then determines which node of G it should visit next, using information stored at v). In [Atal91a] the multisearch problem is solved for numerous classes of data structures in O(n^(1/2) + rn^(1/2)/log n) time on a mesh-connected computer of size n, where n is the size of the data structure and r is the length of the longest search path. This result leads to an optimal O(n^(1/2))-time algorithm for the three-dimensional convex hull problem on a mesh of size n with constant storage per processor (see Section 3.5.1). These results suggest that implementing data structures on processor networks is a worthwhile endeavor that deserves to be pursued for other models besides the hypercube and the mesh.
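The on-line condition is easy to miss, so a small sequential simulation may help fix ideas. The sketch below is our own illustration (the names `build` and `multisearch` are ours, and this is not the mesh algorithm of [Atal91a]): it runs many independent searches over a balanced binary search tree in lockstep, one tree level per round, just as a parallel machine would assign one query per processor and advance all queries simultaneously.

```python
def build(sorted_keys):
    """Balanced BST over sorted keys, as nested dicts (a stand-in for an
    ordered h-level graph distributed over the processors)."""
    if not sorted_keys:
        return None
    m = len(sorted_keys) // 2
    return {'key': sorted_keys[m],
            'left': build(sorted_keys[:m]),
            'right': build(sorted_keys[m + 1:])}

def multisearch(root, queries):
    """Advance all queries through the tree in lockstep, one level per
    round; returns the search path (list of keys) of each query."""
    paths = [[] for _ in queries]
    frontier = [root] * len(queries)
    while any(node is not None for node in frontier):   # one parallel round
        for i, node in enumerate(frontier):
            if node is None:
                continue
            paths[i].append(node['key'])
            # On-line condition: the next node to visit is decided only
            # now, from information stored at the current node.
            if queries[i] == node['key']:
                frontier[i] = None
            elif queries[i] < node['key']:
                frontier[i] = node['left']
            else:
                frontier[i] = node['right']
    return paths
```

On a balanced tree over the keys 1 through 7, the three queries 3, 6, and 0 trace the paths [4, 2, 3], [4, 6], and [4, 2, 1], one level per round.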
12.2 Problems Related to Visibility

Illumination and stabbing are two problems related to visibility (see Chapter 6) which are the subject of intense study in computational geometry, but for which parallel algorithms are yet to be developed.
12.2.1 Art Gallery and Illumination Problems

The fundamental art gallery problem, now a classic in computational geometry, asks for the minimum number of guards sufficient to cover the interior of an n-wall art gallery room. The origins of this problem are traced in [O'Rou87], where many interesting variations and results are also presented. One version (giving rise to a family of problems) asks for the minimum number of lights sufficient to illuminate a set of objects (lines, triangles, rectangles, circles, convex sets, etc.). Here a light source is stationary and can illuminate 360 degrees about its position. For example, as shown in Figure 12.1, three light sources suffice to illuminate six straight-line segments. Although the topic is presently receiving considerable attention [Czyz89a, Czyz89b, Czyz89c, Urru89, Sher90], to our knowledge only one parallel algorithm has been developed to solve art gallery and illumination problems: it is shown in [Agga88] how an optimal placement of guards in an art gallery (in the shape of an n-vertex simple polygon) can be obtained in O(log n) time on an O(n log n)-processor CREW PRAM.
12.2.2 Stabbing

The following problems, studied in [Pell90] for a set T of triangles in 3-space, are representative of a wide range of stabbing problems:
Figure 12.1  Three light sources suffice to illuminate six straight-line segments.
1. Query problem: Given a line, does it stab T (i.e., does the line intersect each triangle in the set)?
2. Existence problem: Does a stabbing line exist for T?
3. Ray shooting problem: Which is the first triangle in T hit by a given ray?

Typically, the objects being stabbed range from lines to polygons to polyhedra, while ray shooting (also called ray tracing) is used to enumerate pairs of visible faces of a polyhedron and to determine whether certain complex objects (such as nonconvex polyhedra) intersect. We are not aware of any effort toward developing parallel algorithms for these problems.
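To make the query problem concrete, here is a short sequential test, our own sketch rather than anything from [Pell90]. Triangles are given as vertex triples, the line is given parametrically as p + t·d, and the hit test solves for the intersection point in barycentric coordinates, with the restriction t ≥ 0 used in ray shooting dropped, since a stabbing line extends in both directions. A line lying in a triangle's plane is reported as a miss by this sketch.

```python
def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def line_hits_triangle(p, d, tri, eps=1e-12):
    """Does the (infinite) line p + t*d meet triangle tri = (a, b, c)?
    Solves p + t*d = a + u*(b - a) + v*(c - a) for (t, u, v) and checks
    that (u, v) is a barycentric point inside the triangle."""
    a, b, c = tri
    e1, e2 = sub(b, a), sub(c, a)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:
        return False          # line parallel to (or inside) the plane
    s = sub(p, a)
    u = dot(s, h) / det
    if u < 0 or u > 1:
        return False
    q = cross(s, e1)
    v = dot(d, q) / det
    return v >= 0 and u + v <= 1

def stabs(p, d, triangles):
    """Query problem: the line stabs T iff it meets every triangle in T."""
    return all(line_hits_triangle(p, d, t) for t in triangles)
```

The existence problem is much harder: it asks whether any line passing through all of T exists, not merely whether a given one does.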
12.3 Geometric Optimization Using Neural Nets

Investigations into the use of neural nets for solving optimization problems in computational geometry are currently under way [Dehn92]. In a neural net, each node (or neuron) is a processor connected to other neurons. A threshold d(i) is associated with each neuron N(i). A weight w(i, j) is associated with the edge leaving neuron N(i) and entering neuron N(j). A value v(i, t + 1) is N(i)'s output at time t + 1: it is a function of d(i), w(j, i), and v(j, t) for all N(j) connected to N(i) by an edge directed from N(j) to N(i). Usually, v(i, t) is a "step" function whose value is 0 unless its input exceeds a certain threshold, in which case its value is 1. A near-optimal solution to a problem is found by minimizing an energy function E. The neurons operate simultaneously, and after each iteration it is determined whether E has reached a (local) minimum, or whether a new iteration is to be performed [Hopf85, Rama88]. As an example, consider the following problem: Given an n-vertex simple polygon, it is required to triangulate the polygon such that the sum of the lengths of the edges
forming the triangulation is a minimum. We associate one neuron with each of the n(n - 1)(n - 2)/6 triangles. Triangle T(i) will be said to belong to the optimal triangulation if and only if v(i, t), the output of neuron N(i), is 1. The triangulation sought must satisfy the following conditions:

1. Each boundary edge (i.e., each edge of the given polygon) belongs to exactly one triangle.
2. Each interior edge (i.e., each edge added to create a triangulation) is shared by exactly two triangles.
3. The area of the given polygon is equal to the sum of the areas of the triangles forming the triangulation.
4. The sum of the circumferences of the triangles forming the triangulation is minimal.

A function E of the outputs v(i, t) is thus derived using conditions 1 through 4, whose minimum corresponds to the optimal triangulation. One of the main difficulties of this approach is in choosing appropriate values for the many parameters used. These parameters include the thresholds d(i) and the weights w(i, j) used in computing the v(i, t) at each iteration. They also include the various multiplicative and additive constants used in the expression for E. Another difficulty is in the choice of the initial values for the v(i, t). It should also be emphasized that there is no guarantee that the solution obtained is indeed optimal or that the process converges quickly. However, this field is still in its infancy, and it is clear that a lot of work and new insights are still needed.
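A generic version of this iteration is short enough to state in code. The sketch below is our own minimal Hopfield-style loop; it does not encode the triangulation conditions 1 through 4, which would require the full penalty terms in E. With a symmetric weight matrix w and zero diagonal, each threshold update cannot increase the energy E = -(1/2)·ΣΣ w(i, j) v(i) v(j) + Σ d(i) v(i), so the loop settles into a local minimum.

```python
def energy(w, d, v):
    """E = -(1/2) * sum_{i,j} w[i][j] v[i] v[j] + sum_i d[i] v[i]."""
    n = len(v)
    quad = sum(w[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
    return -0.5 * quad + sum(d[i] * v[i] for i in range(n))

def hopfield(w, d, v, sweeps=100):
    """Asynchronous 0/1 threshold updates; w must be symmetric with a
    zero diagonal so that no single update can increase the energy."""
    n = len(v)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            # step function: fire iff the weighted input exceeds d[i]
            new = 1 if sum(w[i][j] * v[j] for j in range(n)) > d[i] else 0
            if new != v[i]:
                v[i], changed = new, True
        if not changed:
            break              # a (local) minimum of E has been reached
    return v
```

In the triangulation application, v would have one entry per candidate triangle, and w and d would be derived from the penalty terms; the two stable states of the tiny two-neuron example in the test below illustrate how the net can settle into different local minima depending on its initial values, exactly the difficulty noted above.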
12.4 Parallel Algorithms for Arrangements

A problem that has gained recent attention by designers of parallel algorithms is that of constructing arrangements. The problem is to determine the geometric structure of the intersections of objects in space and, in particular, the structure of the intersections of lines in the plane. Given a set L of lines in the plane, their arrangement A(L) is a subdivision of the plane. Algorithms for a single processor can be found in [Edel86b] and [Edel90]. Arrangements of n lines in two dimensions can be computed in O(n^2) time, and arrangements of n hyperplanes in d dimensions can be computed in O(n^d) time with a single processor [Edel86b]. In [Ande90], an algorithm for computing the arrangement of n lines in two dimensions on a CREW PRAM is given that runs in O(log n log* n) time and uses O(n^2/log n) processors. This algorithm is generalized in [Ande90] to compute the arrangement of n hyperplanes in d dimensions on a CREW PRAM in O(log n log* n) time using O(n^d/log n) processors. It is shown in [Hage90] how the latter problem can be solved in O(log n) time by a randomized algorithm that runs on an ARBITRARY CRCW PRAM and uses O(n^d/log n) processors. An EREW PRAM algorithm for constructing an arrangement of n lines on-line is also given in [Ande90], where each insertion is done optimally in O(log n) time using O(n/log n) processors. Finally, several CREW PRAM algorithms are given in [Good90b] that use
generalized versions of parallel plane sweeping to solve a number of problems including hidden surface elimination and constructing boundary representations for collections of objects.
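The skeleton of A(L) along each line is simple to compute sequentially and makes a convenient correctness baseline for the parallel algorithms above. The sketch below is our own illustration; it assumes each line is given in slope-intercept form (m, b), with no two lines parallel and none vertical, and it costs O(n^2 log n) time because of the per-line sorts (compare the O(n^2) single-processor bound of [Edel86b]).

```python
def arrangement_lists(lines):
    """For each line y = m*x + b in `lines`, return its intersections
    with all other lines, sorted by x-coordinate: the vertices of the
    arrangement A(L) encountered while walking along that line.
    Assumes general position: no two lines parallel, none vertical."""
    result = []
    for i, (m1, b1) in enumerate(lines):
        pts = []
        for j, (m2, b2) in enumerate(lines):
            if i == j:
                continue
            x = (b2 - b1) / (m1 - m2)        # solve m1*x + b1 = m2*x + b2
            pts.append((x, m1 * x + b1))
        pts.sort()                            # order along the line
        result.append(pts)
    return result
```

For n lines in general position this produces exactly the n(n - 1)/2 vertices of A(L), each listed on the two lines that define it.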
12.5 P-Complete Geometric Problems

A problem of size n is said to belong to the class NC if there exists a parallel algorithm for its solution which uses O(n^p) processors and runs in O(log^q n) time, where p and q are nonnegative constants. Let P be the class of problems solvable sequentially in time polynomial in the size of the input. A problem Π in the class of P-complete problems has the following two properties:

1. It is not known whether Π is in NC.
2. If Π is shown to be in NC, then P = NC [Cook85].

It is shown in [Atal90b] that a number of geometric problems in the plane belong to the class of P-complete problems. This work suggests that there may exist other natural two-dimensional geometric problems for which there is no algorithm that runs in polylogarithmic time while using a polynomial number of processors.
12.6 Dynamic Computational Geometry

In applications such as robotics, graphics, and air traffic control, it is often required to determine geometric properties of systems of moving objects. As these applications usually occur in real-time environments, the value of a parallel (and hence fast) solution is great. In an abstract setting, we are given a number of points (objects) that are moving in Euclidean space, with the added condition that for each point (object) every coordinate of its motion is a polynomial of bounded degree in the time variable. This formulation is used in [Boxe89b] to derive CREW PRAM algorithms for several problems, including the nearest neighbor, closest pair, collision, convex hull, and containment problems. For n moving objects, the algorithms typically run in O(log^2 n) time, using a number of processors that is only a little worse than linear in n. Mesh and hypercube algorithms for the same problems are described in [Boxe89a]. We feel that these important contributions have barely scratched the surface, however, and that many well-known computational geometric problems with applications to dynamic systems await parallel solutions.
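The formulation is easy to instantiate. The sketch below (our own, purely sequential) represents each coordinate of each point as a polynomial in the time variable t, evaluates all positions at a query time by Horner's rule, and finds the closest pair by an O(n^2) scan. It serves only as a correctness baseline against which parallel algorithms such as those of [Boxe89b] could be checked.

```python
def poly(coeffs, t):
    """Evaluate a polynomial (coefficients low to high) by Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc

def closest_pair_at(motions, t):
    """motions: list of (x_coeffs, y_coeffs); each coordinate of each
    moving point is a polynomial of bounded degree in t.
    Returns (squared distance, i, j) for the closest pair at time t,
    found by a brute-force O(n^2) scan over all pairs."""
    pos = [(poly(xc, t), poly(yc, t)) for xc, yc in motions]
    best = None
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d2 = (pos[i][0] - pos[j][0]) ** 2 + (pos[i][1] - pos[j][1]) ** 2
            if best is None or d2 < best[0]:
                best = (d2, i, j)
    return best
```

For example, two points moving toward each other along the x-axis, x = t and x = 10 - t, are a closest pair of squared distance 100 at t = 0 and collide (squared distance 0) at t = 5.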
12.7 Problems

12.1. The standard data structures of sequential computation include linked lists, queues, stacks, trees, and so on.
    (a) Show how the standard data structures of sequential computation can be implemented on a linear array of processors.
    (b) Investigate problems in computational geometry where the implementations of part (a) lead to efficient parallel solutions.
12.2. Repeat Problem 12.1 for the following models of parallel computation:
    (a) Tree
    (b) Mesh-of-trees
    (c) Modified AKS network
    (d) Pyramid
12.3. A straight line that intersects each member of a set S of geometric objects is called a stabbing line, or a transversal, for S. Assume that S consists of n isothetic unit squares in the plane and that a transversal for S exists. Design a parallel algorithm for finding a placement of the minimum number of lights sufficient to illuminate the squares.
12.4. Let P be a simple polygon with n vertices, and let k be a fixed integer, 1 ≤ k ≤ n. It is required to place k guards inside P so that the area of P visible to the guards is maximized. Design algorithms for solving this problem on the following models of parallel computation:
    (a) Hypercube
    (b) Modified CCC
    (c) Scan
12.5. Given a simple polygon P, it is required to find the shortest closed path in P such that every point of P is visible from some point on the path. Develop a parallel algorithm for solving this problem.
12.6. Assume that a set S of straight-line segments is given. It is required to compute a transversal of S of minimum length. Discuss various parallel solutions to this problem on different models of computation.
12.7. Repeat Problem 12.6 for the case where S is a set of convex polygons.
12.8. Develop neural net solutions to the following geometric optimization problems defined on a set of planar points:
    (a) Minimum spanning tree
    (b) Minimum-weight perfect matching
    (c) Minimum-weight triangulation
12.9. Two sets of points in the plane are said to be linearly separable if a straight line can be found such that the two sets are on different sides of the line. Design a neural net algorithm for testing linear separability of two sets of points in the plane.
12.10. Given a set L of n straight lines in the plane, it is required to compute the arrangement A(L) of L on a hypercube computer. Your algorithm should produce, for each line in L, a sorted list of its intersections with other lines in L.
12.11. Are there problems in computational geometry that you suspect are not in NC?
12.12. Design and compare algorithms for solving the nearest-neighbor problem for a set of moving points in the plane on the following models of parallel computation:
    (a) Linear array
    (b) Mesh
    (c) d-Dimensional mesh
    (d) Mesh with broadcast buses
    (e) Mesh with reconfigurable buses
12.8 References

[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó Dúnlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327.
[Ande90] R. Anderson, P. Beame, and E. Brisson, Parallel algorithms for arrangements, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 298-306.
[Atal90b] M. J. Atallah, P. Callahan, and M. T. Goodrich, P-complete geometric problems (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 317-326.
[Atal91a] M. J. Atallah, F. Dehne, R. Miller, A. Rau-Chaplin, and J.-J. Tsay, Multisearch techniques for implementing data structures on a mesh-connected computer, Proceedings of the Third ACM Symposium on Parallel Algorithms and Architectures, Hilton Head, South Carolina, July 1991, 204-214.
[Boxe89a] L. Boxer and R. Miller, Dynamic computational geometry on meshes and hypercubes, Journal of Supercomputing, Vol. 3, 1989, 161-191.
[Boxe89b] L. Boxer and R. Miller, Parallel dynamic computational geometry, Journal of New Generation Computer Systems, Vol. 2, No. 3, 1989, 227-246.
[Cook85] S. A. Cook, A taxonomy of problems with fast parallel algorithms, Information and Control, Vol. 64, 1985, 2-22.
[Czyz89a] J. Czyzowicz, E. Rivera-Campo, N. Santoro, J. Urrutia, and J. Zaks, Guarding Rectangular Art Galleries, Technical Report TR-89-27, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Czyz89b] J. Czyzowicz, E. Rivera-Campo, and J. Urrutia, Illuminating Rectangles and Triangles on the Plane, Technical Report TR-89-50, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Czyz89c] J. Czyzowicz, E. Rivera-Campo, J. Urrutia, and J. Zaks, Illuminating Lines and Circles on the Plane, Technical Report TR-89-49, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Dehn90] F. Dehne and A. Rau-Chaplin, Implementing data structures on a hypercube multiprocessor, and applications in parallel computational geometry, Journal of Parallel and Distributed Computing, Vol. 8, 1990, 367-375.
[Dehn92] F. Dehne, B. Flach, M. Gastaldo, D. Graf, R. Merker, R. Sack, and N. Valiveti, Computational geometry on Hopfield networks, manuscript in preparation, 1992.
[Edel86b] H. Edelsbrunner, J. O'Rourke, and R. Seidel, Constructing arrangements of lines and hyperplanes with applications, SIAM Journal on Computing, Vol. 15, No. 2, 1986, 341-363.
[Edel90] H. Edelsbrunner, L. J. Guibas, and M. Sharir, The complexity and construction of many faces in arrangements of lines and of segments, Discrete and Computational Geometry, Vol. 5, 1990, 161-196.
[Good90b] M. T. Goodrich, M. R. Ghouse, and J. Bright, Generalized sweep methods for parallel computational geometry (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 280-289.
[Hage90] T. Hagerup, H. Jung, and E. Welzl, Efficient parallel computation of arrangements of hyperplanes in d dimensions, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 290-297.
[Hopf85] J. J. Hopfield and D. W. Tank, "Neural" computation of decisions in optimization problems, Biological Cybernetics, Vol. 52, 1985, 141-152.
[O'Rou87] J. O'Rourke, Art Gallery Theorems and Algorithms, Oxford University Press, New York, 1987.
[Pell90] M. Pellegrini, Stabbing and ray shooting in 3 dimensional space, Proceedings of the Sixth Annual ACM Symposium on Computational Geometry, Berkeley, California, June 1990, 177-186.
[Rama88] J. Ramanujam and P. Sadayappan, Optimization by neural networks, Proceedings of the IEEE International Conference on Neural Networks, San Diego, 1988, Vol. II, 325-332.
[Sher90] T. Shermer, Recent Results in Art Galleries, Technical Report CMPT TR 90-10, Department of Computing Science, Simon Fraser University, Burnaby, British Columbia, 1990.
[Urru89] J. Urrutia and J. Zaks, Illuminating Convex Sets, Technical Report TR-89-31, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
Bibliography
[Agga85] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó'Dúnlaing, and C. K. Yap, Parallel computational geometry, Proceedings of the Twenty-Sixth Annual Symposium on Foundations of Computer Science, Portland, Oregon, October 1985, 468-477.
[Agga88] A. Aggarwal, B. Chazelle, L. J. Guibas, C. Ó'Dúnlaing, and C. K. Yap, Parallel computational geometry, Algorithmica, Vol. 3, 1988, 293-327.
[Agga92] A. Aggarwal, Ed., Special Issue: Parallel Computational Geometry, Algorithmica, Vol. 7, No. 1, 1992.
[Ajta83] M. Ajtai, J. Komlós, and E. Szemerédi, An O(n log n) sorting network, Combinatorica, Vol. 3, 1983, 1-19.
[Aker87a] S. B. Akers, D. Harel, and B. Krishnamurthy, The star graph: an attractive alternative to the n-cube, Proceedings of the International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 393-400.
[Aker87b] S. B. Akers and B. Krishnamurthy, The fault tolerance of star graphs, Proceedings of the Second International Conference on Supercomputing, San Francisco, May 1987.
[Aker89] S. B. Akers and B. Krishnamurthy, A group theoretic model for symmetric interconnection networks, IEEE Transactions on Computers, Vol. C-38, No. 4, 1989, 555-566.
[Akl82] S. G. Akl, A constant-time parallel algorithm for computing convex hulls, BIT, Vol. 22, 1982, 130-134.
[Akl84] S. G. Akl, Optimal parallel algorithms for computing convex hulls and for sorting, Computing, Vol. 33, 1984, 1-11.
[Akl85a] S. G. Akl, Optimal parallel algorithms for selection, sorting and computing convex hulls, in Computational Geometry, G. T. Toussaint (Editor), Elsevier, Amsterdam, 1985, 1-22.
[Akl85b] S. G. Akl, Parallel Sorting Algorithms, Academic Press, Orlando, Florida, 1985.
[Akl89a] S. G. Akl, The Design and Analysis of Parallel Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1989.
[Akl89b] S. G. Akl, On the power of concurrent memory access, in Computing and Information, R. Janicki and W. W. Koczkodaj (Editors), Elsevier, New York, Proceedings of the International Conference on Computing and Information, ICCI '89, Toronto, 1989, 49-55.
[Akl89c] S. G. Akl and G. R. Guenther, Broadcasting with selective reduction, Proceedings of the Eleventh IFIP Congress, San Francisco, August 1989, 515-520.
[Akl89d] S. G. Akl, Reflections on a parallel model of computation, Invited talk, First Great Lakes Computer Science Conference, Kalamazoo, Michigan, October 1989.
[Akl90] S. G. Akl, H. Meijer, and D. Rappaport, Parallel computational geometry on a grid, Computers and Artificial Intelligence, Vol. 9, No. 5, 1990, 461-470.
[Akl91a] S. G. Akl, Parallel synergy: can a parallel computer be more efficient than the sum of its parts? Proceedings of the Thirteenth IMACS World Congress on Computation and Applied Mathematics, Dublin, July 1991.
[Akl91b] S. G. Akl, Memory access in models of parallel computation: from folklore to synergy and beyond, in Algorithms and Data Structures, F. Dehne, J.-R. Sack, and N. Santoro (Editors), Springer-Verlag, Berlin, 1991, 92-104.
[Akl91c] S. G. Akl and G. R. Guenther, Application of BSR to the maximal sum subsegment problem, International Journal of High Speed Computing, Vol. 3, No. 2, June 1991, 107-119.
[Akl91d] S. G. Akl, K. Qiu, and I. Stojmenović, Computational geometry on the star and pancake networks, Proceedings of the Third Annual Canadian Conference on Computational Geometry, Vancouver, British Columbia, August 1991, 252-255.
[Akl91e] S. G. Akl, K. Qiu, and I. Stojmenović, Data communication and computational geometry on the star and pancake interconnection networks, Proceedings of the Third Symposium on Parallel and Distributed Processing, Dallas, December 1991, 415-422.
[Alnu89] H. M. Alnuweiri and V. K. Prasanna Kumar, An efficient VLSI architecture with applications to geometric problems, Parallel Computing, Vol. 12, 1989, 71-93.
[Alt87] H. Alt, T. Hagerup, K. Mehlhorn, and F. P. Preparata, Deterministic simulation of idealized parallel computers on more realistic ones, SIAM Journal on Computing, Vol. 16, No. 5, October 1987, 808-835.
[Ande89] R. Anderson, P. Beame, and E. Brisson, Parallel Algorithms for Arrangements, Technical Report 89-12-08, Department of Computer Science, University of Washington, Seattle, 1989.
[Ande90] R. Anderson, P. Beame, and E. Brisson, Parallel algorithms for arrangements, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 298-306.
[Asan85] T. Asano, An efficient algorithm for finding the visibility polygon for a polygonal region with holes, Transactions of the IECE Japan E-68, Vol. 9, 1985, 557-559.
[Asan86] T. Asano and H. Umeo, Systolic Algorithms for Computing the Visibility Polygon and Triangulation of a Polygonal Region, Technical Report of IECE of Japan, COMP86-7, 1986, 53-60.
[Asan88] T. Asano and H. Umeo, Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region, Parallel Computing, Vol. 6, 1988, 209-216.
[Atal85] M. J. Atallah and M. T. Goodrich, Efficient parallel solutions to some geometric problems, Proceedings of the 1985 International Conference on Parallel Processing, St. Charles, Illinois, August 1985, 411-417.
[Atal86a] M. J. Atallah and M. T. Goodrich, Efficient parallel solutions to some geometric problems, Journal of Parallel and Distributed Computing, Vol. 3, 1986, 492-507.
[Atal86b] M. J. Atallah and M. T. Goodrich, Efficient plane sweeping in parallel (preliminary version), Proceedings of the Second Annual ACM Symposium on Computational Geometry, Yorktown Heights, New York, June 1986, 216-225.
[Atal87] M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading divide-and-conquer: a technique for designing parallel algorithms, Proceedings of the Twenty-Eighth Annual Symposium on Foundations of Computer Science, Los Angeles, October 1987, 151-160.
[Atal88a] M. J. Atallah and M. T. Goodrich, Parallel algorithms for some functions of two convex polygons, Algorithmica, Vol. 3, 1988, 535-548.
[Atal88b] M. J. Atallah, G. N. Frederickson, and S. R. Kosaraju, Sorting with efficient use of special-purpose sorters, Information Processing Letters, 1988, 13-15.
[Atal89a] M. J. Atallah and D. Z. Chen, An optimal parallel algorithm for the visibility of a simple polygon from a point (preliminary version), Proceedings of the Fifth Annual ACM Symposium on Computational Geometry, Saarbrücken, Germany, June 1989, 114-123.
[Atal89b] M. J. Atallah and D. Z. Chen, An optimal parallel algorithm for the minimum circle-cover problem, Information Processing Letters, Vol. 32, 1989, 159-165.
[Atal89c] M. J. Atallah, R. Cole, and M. T. Goodrich, Cascading divide-and-conquer: a technique for designing parallel algorithms, SIAM Journal on Computing, Vol. 18, No. 3, 1989, 499-532.
[Atal89d] M. J. Atallah and J.-J. Tsay, On the parallel-decomposability of geometric problems, Proceedings of the Fifth Annual ACM Symposium on Computational Geometry, Saarbrücken, Germany, June 1989, 104-113.
[Atal90a] M. J. Atallah, P. Callahan, and M. T. Goodrich, P-Complete Geometric Problems, Technical Report, Department of Computer Science, Johns Hopkins University, Baltimore, 1990.
[Atal90b] M. J. Atallah, P. Callahan, and M. T. Goodrich, P-complete geometric problems (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 317-326.
[Atal90c] M. J. Atallah and D. Z. Chen, Parallel rectilinear shortest paths with rectangular obstacles, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 270-279.
[Atal91a] M. J. Atallah, F. Dehne, R. Miller, A. Rau-Chaplin, and J.-J. Tsay, Multisearch techniques for implementing data structures on a mesh-connected computer, Proceedings of the Third ACM Symposium on Parallel Algorithms and Architectures, Hilton Head, South Carolina, July 1991, 204-214.
[Atal91b] M. J. Atallah, D. Z. Chen, and H. Wagener, An optimal parallel algorithm for the visibility of a simple polygon from a point, Journal of the ACM, Vol. 38, No. 3, July 1991, 516-533.
[Ayka91] C. Aykanat and T. M. Kurç, Efficient parallel maze routing algorithms on a hypercube multicomputer, Proceedings of the 1991 International Conference on Parallel Processing, St. Charles, Illinois, August 1991, Vol. III, Algorithms and Architectures, 224-227.
[Batc68] K. E. Batcher, Sorting networks and their applications, Proceedings of the AFIPS 1968 Spring Joint Computer Conference, Atlantic City, New Jersey, April/May 1968, 307-314.
[Beic90] I. Beichl and F. Sullivan, A robust parallel triangulation and shelling algorithm, Proceedings of the Second Canadian Conference in Computational Geometry, Ottawa, Ontario, August 1990, 107-111.
[Ben-O83] M. Ben-Or, Lower bounds for algebraic computation trees, Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, Boston, May 1983, 80-86.
[Bent80] J. L. Bentley and D. Wood, An optimal worst case algorithm for reporting intersections of rectangles, IEEE Transactions on Computers, Vol. C-29, 1980, 571-576.
[Berg89] B. Berger, J. Rompel, and P. W. Shor, Efficient NC algorithms for set cover with applications to learning and geometry, Proceedings of the Thirtieth Annual Symposium on Foundations of Computer Science, Research Triangle Park, North Carolina, October/November 1989, 54-59.
[Bern88] M. Bern, Hidden surface removal for rectangles, Proceedings of the Fourth Annual ACM Symposium on Computational Geometry, Urbana-Champaign, Illinois, June 1988, 183-192.
[Bert88] A. A. Bertossi, Parallel circle-cover algorithms, Information Processing Letters, Vol. 27, 1988, 133-139.
[Beye69] W. T. Beyer, Recognition of topological invariants by iterative arrays, Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1969.
[Blel88] G. E. Blelloch and J. J. Little, Parallel solutions to geometric problems on the scan model of computation, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 218-222.
[Blel89] G. E. Blelloch, Scans as primitive parallel operations, IEEE Transactions on Computers, Vol. C-38, No. 11, November 1989, 1526-1538.
[Blel90] G. E. Blelloch, Vector Models for Data-Parallel Computing, MIT Press, Cambridge, Massachusetts, 1990.
[Boxe87a] L. Boxer and R. Miller, Parallel Dynamic Computational Geometry, Technical Report 87-11, Department of Computer Science, State University of New York at Buffalo, 1987.
[Boxe87b] L. Boxer and R. Miller, Parallel algorithms for dynamic systems with known trajectories, Proceedings of the IEEE 1987 Workshop on Computer Architecture for Pattern Analysis and Machine Intelligence, 1987.
[Boxe88] L. Boxer and R. Miller, Dynamic computational geometry on meshes and hypercubes, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. I, Architecture, 323-330.
[Boxe89a] L. Boxer and R. Miller, Dynamic computational geometry on meshes and hypercubes, Journal of Supercomputing, Vol. 3, 1989, 161-191.
[Boxe89b] L. Boxer and R. Miller, Parallel dynamic computational geometry, Journal of New Generation Computer Systems, Vol. 2, No. 3, 1989, 227-246.
[Boxe89c] L. Boxer and R. Miller, A parallel circle-cover minimization algorithm, Information Processing Letters, Vol. 32, 1989, 57-60.
[Boxe90] L. Boxer and R. Miller, Common intersections of polygons, Information Processing Letters, Vol. 33, No. 5, 1990, 249-254; see also corrigenda in Vol. 35, 1990, 53.
[Bren74] R. P. Brent, The parallel evaluation of general arithmetic expressions, Journal of the ACM, Vol. 21, No. 2, 1974, 201-206.
[Brow79a] K. Q. Brown, Voronoi diagrams from convex hulls, Information Processing Letters, Vol. 9, 1979, 223-228.
[Brow79b] K. Q. Brown, Geometric transforms for fast geometric algorithms, Ph.D. thesis, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania, 1979.
[Chaz84] B. Chazelle, Computational geometry on a systolic chip, IEEE Transactions on Computers, Vol. C-33, No. 9, September 1984, 774-785.
[Chaz86] B. Chazelle and L. J. Guibas, Fractional cascading: I. A data structuring technique, Algorithmica, Vol. 1, 1986, 133-162.
[Chaz90] B. Chazelle, Triangulating a simple polygon in linear time, Proceedings of the Thirty-First Annual Symposium on Foundations of Computer Science, St. Louis, October 1990, Vol. 1, 220-230.
[Chen87] G.-H. Chen, M.-S. Chern, and R. C. T. Lee, A new systolic architecture for convex hull and half-plane intersection problems, BIT, Vol. 27, 1987, 141-147.
[Chow80] A. L. Chow, Parallel algorithms for geometric problems, Ph.D. thesis, University of Illinois at Urbana-Champaign, 1980.
[Chow81] A. L. Chow, A parallel algorithm for determining convex hulls of sets of points in two dimensions, Proceedings of the Nineteenth Annual Allerton Conference on Communication, Control and Computing, Monticello, Illinois, September/October 1981, 214-223.
[Codd68] E. F. Codd, Cellular Automata, Academic Press, New York, 1968.
[Cole88a] R. Cole and M. T. Goodrich, Optimal parallel algorithms for polygon and point-set problems (preliminary version), Proceedings of the Fourth Annual ACM Symposium on Computational Geometry, Urbana-Champaign, Illinois, June 1988, 201-210.
[Cole88b] R. Cole, Parallel merge sort, SIAM Journal on Computing, Vol. 17, No. 4, August 1988, 770-785.
[Cole88c] R. Cole and U. Vishkin, Approximate parallel scheduling. I. The basic technique with applications to optimal parallel list ranking in logarithmic time, SIAM Journal on Computing, Vol. 17, 1988, 128-142.
[Cole90a] R. Cole and O. Zajicek, An optimal parallel algorithm for building a data structure for planar point location, Journal of Parallel and Distributed Computing, Vol. 8, 1990, 280-285.
[Cole90b] R. Cole, M. T. Goodrich, and C. Ó'Dúnlaing, Merging free trees in parallel for efficient Voronoi diagram construction, in Automata, Languages and Programming, M. S. Paterson (Editor), Lecture Notes in Computer Science, No. 443, Springer-Verlag, Berlin, 1990, 432-445.
[Conr86] M. Conrad, The lure of molecular computing, IEEE Spectrum, Vol. 23, No. 10, October 1986, 55-60.
[Cook82] S. Cook and C. Dwork, Bounds on the time for parallel RAM's to compute simple functions, Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, San Francisco, May 1982, 231-233.
[Cook85] S. A. Cook, A taxonomy of problems with fast parallel algorithms, Information and Control, Vol. 64, 1985, 2-22.
[Corm90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, McGraw-Hill, New York, 1990.
[Cyph90] R. Cypher and C. G. Plaxton, Deterministic sorting in nearly logarithmic time on the hypercube and related computers, Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, Baltimore, May 1990, 193-203.
[Czyz89a] J. Czyzowicz, E. Rivera-Campo, N. Santoro, J. Urrutia, and J. Zaks, Guarding Rectangular Art Galleries, Technical Report TR-89-27, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Czyz89b] J. Czyzowicz, E. Rivera-Campo, and J. Urrutia, Illuminating Rectangles and Triangles on the Plane, Technical Report TR-89-50, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Czyz89c] J. Czyzowicz, E. Rivera-Campo, J. Urrutia, and J. Zaks, Illuminating Lines and Circles on the Plane, Technical Report TR-89-49, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Dado87] N. Dadoun and D. G. Kirkpatrick, Parallel processing for efficient subdivision search, Proceedings of the Third Annual ACM Symposium on Computational Geometry, Waterloo, Ontario, June 1987, 205-214.
[Dado89] N. Dadoun and D. G. Kirkpatrick, Parallel construction of subdivision hierarchies, Journal of Computer and System Sciences, Vol. 39, 1989, 153-165.
[Dehn86a] F. Dehne, O(n^{1/2}) algorithms for the maximal elements and ECDF searching problem on a mesh-connected parallel computer, Information Processing Letters, Vol. 22, 1986, 303-306.
[Dehn86b] F. Dehne, J.-R. Sack, and N. Santoro, Computing on a Systolic Screen: Hulls, Contours and Applications, Technical Report SCS-TR-102, School of Computer Science, Carleton University, Ottawa, Ontario, October 1986.
[Dehn88a] F. Dehne, Solving visibility and separability problems on a mesh-of-processors, The Visual Computer, Vol. 3, 1988, 356-370.
[Dehn88b] F. Dehne, J.-R. Sack, and I. Stojmenović, A note on determining the 3-dimensional convex hull of a set of points on a mesh of processors, Proceedings of the Scandinavian Workshop on Algorithm Theory (SWAT), Sweden, Lecture Notes in Computer Science, No. 318, Springer-Verlag, Berlin, 1988, 154-162.
[Dehn88c] F. Dehne, Q. T. Pham, and I. Stojmenović, Optimal Visibility Algorithms for Binary Images on the Hypercube, Technical Report TR-88-27, Computer Science Department, University of Ottawa, Ottawa, Ontario, October 1988.
[Dehn88d] F. Dehne and I. Stojmenović, An O(√n) time algorithm for the ECDF searching problem for arbitrary dimensions on a mesh-of-processors, Information Processing Letters, Vol. 28, 1988, 67-70.
[Dehn88e] F. Dehne, A. Hassenklover, J.-R. Sack, and N. Santoro, Parallel visibility on a mesh connected parallel computer, in Parallel Processing and Applications, E. Chiricozzi and A. D'Amico (Editors), North-Holland, Amsterdam, 1988, 203-210.
[Dehn89] F. Dehne and A. Rau-Chaplin, Implementing data structures on a hypercube multiprocessor, and applications in parallel computational geometry, Proceedings of the Fifteenth International Workshop on Graph-Theoretic Concepts in Computer Science, June 1989.
[Dehn90] F. Dehne and A. Rau-Chaplin, Implementing data structures on a hypercube multiprocessor, and applications in parallel computational geometry, Journal of Parallel and Distributed Computing, Vol. 8, 1990, 367-375.
[Dehn91a] F. Dehne and S. E. Hambrusch, Parallel algorithms for determining k-width connectivity in binary images, Journal of Parallel and Distributed Computing, Vol. 12, No. 1, May 1991, 12-23.
[Dehn91b] F. Dehne, Ed., Special Issue: Parallel Algorithms for Geometric Problems on Digitized Pictures, Algorithmica, Vol. 6, No. 5, 1991.
[Dehn92] F. Dehne, B. Flach, M. Gastaldo, D. Graf, R. Merker, R. Sack, and N. Valiveti, Computational geometry on Hopfield networks, manuscript in preparation, 1992.
[Deng90] X. Deng, An optimal parallel algorithm for linear programming in the plane, Information Processing Letters, Vol. 35, 1990, 213-217.
[Dyer80] C. R. Dyer, A fast parallel algorithm for the closest pair problem, Information Processing Letters, Vol. 11, No. 1, 1980, 49-52.
[Dyer84] M. E. Dyer, Linear time algorithms for two- and three-variable linear programs, SIAM Journal on Computing, Vol. 13, No. 1, 1984, 31-45.
[Eddy77] W. F. Eddy, A new convex hull algorithm for planar sets, ACM Transactions on Mathematical Software, Vol. 3, No. 4, 1977, 398-403.
[Edel86a] H. Edelsbrunner, L. J. Guibas, and J. Stolfi, Optimal point location in a monotone subdivision, SIAM Journal on Computing, Vol. 15, 1986, 317-340.
[Edel86b] H. Edelsbrunner, J. O'Rourke, and R. Seidel, Constructing arrangements of lines and hyperplanes with applications, SIAM Journal on Computing, Vol. 15, No. 2, 1986, 341-363.
[Edel87] H. Edelsbrunner, Algorithms in combinatorial geometry, in EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1987.
[Edel90] H. Edelsbrunner, L. J. Guibas, and M. Sharir, The complexity and construction of many faces in arrangements of lines and of segments, Discrete and Computational Geometry, Vol. 5, 1990, 161-196.
[ElGi86a] H. ElGindy, An optimal speed-up parallel algorithm for triangulating simplicial point sets in space, International Journal of Parallel Programming, Vol. 15, No. 5, 1986, 389-398.
[ElGi86b] H. ElGindy, A Parallel Algorithm for the Shortest Path Problem in Monotone Polygons, Technical Report MS-CIS-86-49, Department of Computer and Information Science, Faculty of Engineering and Applied Science, University of Pennsylvania, Philadelphia, May 1986.
[ElGi88] H. ElGindy and M. T. Goodrich, Parallel algorithms for shortest path problems in polygons, The Visual Computer, Vol. 3, 1988, 371-378.
[ElGi90] H. ElGindy, personal communication, 1990.
[Evan89] D. J. Evans and I. Stojmenović, On parallel computation of Voronoi diagrams, Parallel Computing, Vol. 12, 1989, 121-125.
[Fava90] L. Fava, The design of an efficient BSR network, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, September 1990.
[Fava91] L. Fava Lindon and S. G. Akl, An Optimal Implementation of Broadcasting with Selective Reduction, Technical Report 91-298, Department of Computing and Information Science, Queen's University, Kingston, Ontario, March 1991.
[Feit88] D. G. Feitelson, Optical Computing, MIT Press, Cambridge, Massachusetts, 1988.
[Ferr91a] A. G. Ferreira, personal communication, 1991.
[Ferr91b] A. G. Ferreira and J. G. Peters, Finding smallest paths in rectilinear polygons on a hypercube multiprocessor, Proceedings of the Third Canadian Conference on Computational Geometry, Vancouver, British Columbia, August 1991, 162-165.
[Fink74] R. A. Finkel and J. L. Bentley, Quad trees: a data structure for retrieval on composite keys, Acta Informatica, Vol. 4, 1974, 1-9.
[Fjal90] P.-O. Fjällström, J. Katajainen, C. Levcopoulos, and O. Petersson, A sublogarithmic convex hull algorithm, BIT, Vol. 30, No. 3, 1990, 378-384.
[Fort78] S. Fortune and J. Wyllie, Parallelism in random access machines, Proceedings of the Tenth Annual ACM Symposium on Theory of Computing, San Diego, May 1978, 114-118.
[Fort87] S. Fortune, A sweepline algorithm for Voronoi diagrams, Algorithmica, Vol. 2, 1987, 153-174.
[Fost80] M. J. Foster and H. T. Kung, The design of special purpose VLSI chips, Computer, Vol. 13, No. 1, January 1980, 26-40.
[Four88] A. Fournier and D. Fussel, On the power of the frame buffer, ACM Transactions on Computer Graphics, Vol. 7, 1988, 103-128.
[Free75] H. Freeman and R. Shapira, Determining the minimal area rectangle for an arbitrary closed curve, Communications of the ACM, Vol. 18, 1975, 409-413.
[Ghos91] K. S. Ghosh and A. Maheshwari, An optimal parallel algorithm for determining the intersection type of two star-shaped polygons, Proceedings of the Third Canadian Conference on Computational Geometry, Vancouver, British Columbia, August 1991, 2-6.
[Gibb88] A. Gibbons and W. Rytter, Efficient Parallel Algorithms, Cambridge University Press, Cambridge, England, 1988.
[Good77] S. E. Goodman and S. T. Hedetniemi, Introduction to the Design and Analysis of Algorithms, McGraw-Hill, New York, 1977, section 5.5.
[Good87a] M. T. Goodrich, Efficient parallel techniques for computational geometry, Ph.D. thesis, Purdue University, West Lafayette, Indiana, 1987.
[Good87b] M. T. Goodrich, Finding the convex hull of a sorted point set in parallel, Information Processing Letters, Vol. 26, December 1987, 173-179.
[Good88] M. T. Goodrich, Intersecting Line Segments in Parallel with an Output-Sensitive Number of Processors, Technical Report 88-27, Department of Computer Science, Johns Hopkins University, Baltimore, 1988.
[Good89a] M. T. Goodrich, Triangulating a polygon in parallel, Journal of Algorithms, Vol. 10, September 1989, 327-351.
[Good89b] M. T. Goodrich, C. Ó'Dúnlaing, and C. K. Yap, Constructing the Voronoi diagram of a set of line segments in parallel, Proceedings of the 1989 Workshop on Algorithms and Data Structures (WADS '89), Lecture Notes in Computer Science, No. 382, F. Dehne, J.-R. Sack, and N. Santoro (Editors), Springer-Verlag, Berlin, 1989, 12-23.
[Good90a] M. T. Goodrich, S. B. Shauck, and S. Guha, Parallel methods for visibility and shortest path problems in simple polygons, Proceedings of the Sixth Annual Symposium on Computational Geometry, Berkeley, California, June 1990, 73-82.
[Good90b] M. T. Goodrich, M. R. Ghouse, and J. Bright, Generalized sweep methods for parallel computational geometry (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 280-289.
[Good92a] M. T. Goodrich, Ed., Special Issue: Parallel Computational Geometry, International Journal on Computational Geometry and Applications, 1992.
[Good92b] M. T. Goodrich and C. K. Yap, What can be parallelized in computational geometry: a survey, manuscript in preparation, 1992.
[Gott87] A. Gottlieb, An overview of the NYU Ultracomputer project, in Special Topics in Supercomputing, Vol. 1, Experimental Parallel Computing Architectures, J. J. Dongarra (Editor), Elsevier, Amsterdam, 1987, 25-96.
[Grah72] R. L. Graham, An efficient algorithm for determining the convex hull of a finite planar set, Information Processing Letters, Vol. 1, 1972, 132-133.
[Guha90] S. Guha, An optimal parallel algorithm for the rectilinear Voronoi diagram, Proceedings of the Twenty-Eighth Annual Allerton Conference on Communication, Control and Computing, Monticello, Illinois, October 1990, 798-807.
[Guib82] L. J. Guibas and J. Stolfi, A language for bitmap manipulation, ACM Transactions on Computer Graphics, Vol. 1, 1982, 191-214.
[Hage90] T. Hagerup, H. Jung, and E. Welzl, Efficient parallel computation of arrangements of hyperplanes in d dimensions, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 290-297.
[He91] X. He, An efficient parallel algorithm for finding minimum weight matching for points on a convex polygon, Information Processing Letters, Vol. 37, No. 2, January 1991, 111-116.
[Hill85] W. D. Hillis, The Connection Machine, MIT Press, Cambridge, Massachusetts, 1985.
[Hole90] J. A. Holey and O. H. Ibarra, Iterative algorithms for planar convex hull on mesh-connected arrays, Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 102-109.
[Hole91] J. A. Holey and O. H. Ibarra, Triangulation, Voronoi diagram, and convex hull in k-space on mesh-connected arrays and hypercubes, Proceedings of the 1991 International Conference on Parallel Processing, St. Charles, Illinois, August 1991, Vol. III, Algorithms and Applications, 147-150.
[Hopf85] J. J. Hopfield and D. W. Tank, "Neural" computation of decisions in optimization problems, Biological Cybernetics, Vol. 52, 1985, 141-152.
[Houl85] M. E. Houle and G. T. Toussaint, Computing the width of a set, Proceedings of the First Annual ACM Symposium on Computational Geometry, Baltimore, 1985, 1-7.
[Jarv73] R. A. Jarvis, On the identification of the convex hull of a finite set of points in the plane, Information Processing Letters, Vol. 2, No. 1, 1973, 18-21.
[Jeon87] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on mesh-connected computers, Proceedings of the 1987 Fall Joint Computer Conference, Exploiting Technology Today and Tomorrow, October 1987, 311-318.
[Jeon90] C.-S. Jeong and D. T. Lee, Parallel geometric algorithms on a mesh-connected computer, Algorithmica, Vol. 5, No. 2, 1990, 155-178.
[Jeon91a] C.-S. Jeong, An improved parallel algorithm for constructing Voronoi diagram on a mesh-connected computer, Parallel Computing, Vol. 17, July 1991, 505-514.
[Jeon91b] C.-S. Jeong, Parallel Voronoi diagram in L1 (L∞) metric on a mesh-connected computer, Parallel Computing, Vol. 17, No. 2/3, June 1991, 241-252.
[John77] D. B. Johnson, Efficient algorithms for shortest paths in sparse networks, Journal of the ACM, Vol. 24, No. 1, 1977, 1-13.
[Jwo90] J. S. Jwo, S. Lakshmivarahan, and S. K. Dhall, Embedding of cycles and grids in star graphs, Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing, Dallas, December 1990, 540-547.
[Karp90] R. M. Karp and V. Ramachandran, A survey of parallel algorithms for shared memory machines, in Handbook of Theoretical Computer Science, J. van Leeuwen (Editor), North-Holland, Amsterdam, 1990, 869-941.
[Kim90] S. K. Kim, Parallel algorithms for the segment dragging problem, Information Processing Letters, Vol. 36, No. 6, December 1990, 323-328.
[Kirk83] D. G. Kirkpatrick, Optimal search in planar subdivisions, SIAM Journal on Computing, Vol. 12, No. 1, February 1983, 28-35.
[Knut73] D. E. Knuth, The Art of Computer Programming, Vol. 3, Addison-Wesley, Reading, Massachusetts, 1973.
[Krus85] C. P. Kruskal, L. Rudolph, and M. Snir, The power of parallel prefix, Proceedings of the 1985 International Conference on Parallel Processing, St. Charles, Illinois, August 1985, 180-185.
[Kuce82] L. Kucera, Parallel computation and conflict in memory access, Information Processing Letters, Vol. 14, No. 2, April 1982, 93-96.
[Kuma86] V. K. Prasanna Kumar and M. M. Eshaghian, Parallel geometric algorithms for digitized pictures on a mesh of trees, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 270-273.
[Kuma90] V. Kumar, P. S. Gopalakrishnan, and L. N. Kanal (Editors), Parallel Algorithms for Machine Intelligence and Vision, Springer-Verlag, New York, 1990.
[Kuma91] V. K. Prasanna Kumar, Parallel Architectures and Algorithms for Image Understanding, Academic Press, New York, 1991.
[Lang76] T. Lang and H. S. Stone, A shuffle-exchange network with simplified control, IEEE Transactions on Computers, Vol. C-25, No. 1, January 1976, 55-65.
[Lee61] C. Y. Lee, An algorithm for path connections and its applications, IRE Transactions on Electronic Computers, Vol. EC-10, No. 3, 1961, 346-365.
[Lee77] D. T. Lee and F. P. Preparata, Location of a point in a planar subdivision and its applications, SIAM Journal on Computing, Vol. 6, 1977, 594-606.
[Lee81] D. T. Lee, H. Chang, and C. K. Wong, An on-chip compare/steer bubble sorter, IEEE Transactions on Computers, Vol. C-30, 1981, 396-405.
[Lee84a] C. C. Lee and D. T. Lee, On a circle-cover minimization problem, Information Processing Letters, Vol. 18, 1984, 109-115.
[Lee84b] D. T. Lee and F. P. Preparata, Computational geometry: a survey, IEEE Transactions on Computers, Vol. C-33, No. 12, 1984, 1072-1101.
[Lee86a] D. T. Lee, Geometric location problems and their complexity, Proceedings of the Symposium on Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, No. 233, Springer-Verlag, Berlin, 1986, 154-167.
[Lee86b] D. T. Lee and Y. F. Wu, Geometric complexity of some location problems, Algorithmica, Vol. 1, 1986, 193-211.
[Lee89] D. T. Lee and F. P. Preparata, Parallel batched planar point location on the CCC, Information Processing Letters, Vol. 33, 1989, 175-179.
[Leig81] F. T. Leighton, New lower bound techniques for VLSI, Proceedings of the Twenty-Second Annual Symposium on Foundations of Computer Science, Nashville, Tennessee, October 1981, 1-12.
[Leig85] F. T. Leighton, Tight bounds on the complexity of parallel sorting, IEEE Transactions on Computers, Vol. C-34, No. 4, April 1985, 344-354.
205
[Leig91] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, California, 1991.
[Levc88] C. Levcopoulos, J. Katajainen, and A. Lingas, An optimal expected-time parallel algorithm for Voronoi diagrams, Proceedings of the Scandinavian Workshop on Algorithm Theory (SWAT), Sweden, Lecture Notes in Computer Science, No. 318, Springer-Verlag, Berlin, 1988, 190-198.
[Lipp87] R. P. Lippmann, An introduction to computing with neural nets, IEEE ASSP Magazine, April 1987, 4-22.
[Lodi86] E. Lodi and L. Pagli, A VLSI solution to the vertical segment visibility problem, IEEE Transactions on Computers, Vol. C-35, No. 10, October 1986, 923-928.
[Lu86a] M. Lu, Constructing the Voronoi diagram on a mesh-connected computer, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 806-811.
[Lu86b] M. Lu and P. Varman, Mesh-connected computer algorithms for rectangle-intersection problems, Proceedings of the 1986 International Conference on Parallel Processing, St. Charles, Illinois, August 1986, 301-307.
[MacK90a] P. D. MacKenzie and Q. F. Stout, Asymptotically efficient hypercube algorithms for computational geometry, Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, October 1990, 8-11.
[MacK90b] P. D. MacKenzie and Q. F. Stout, Practical hypercube algorithms for computational geometry, poster presentation at the Third Symposium on the Frontiers of Massively Parallel Computation, College Park, Maryland, October 1990.
[Megi83] N. Megiddo, Linear time algorithm for linear programming in R^3 and related problems, SIAM Journal on Computing, Vol. 12, No. 4, 1983, 759-776.
[Mehl84] K. Mehlhorn, Data Structures and Algorithms 3: Multi-dimensional Searching and Computational Geometry, EATCS Monographs on Theoretical Computer Science, W. Brauer, G. Rozenberg, and A. Salomaa (Editors), Springer-Verlag, Berlin, 1984.
[Menn90] A. Menn and A. K. Somani, An efficient sorting algorithm for the star graph interconnection network, Proceedings of the 1990 International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 1-8.
[Merk86] E. Merks, An optimal parallel algorithm for triangulating a set of points in the plane, International Journal of Parallel Programming, Vol. 15, No. 5, 1986, 399-411.
[Mill84a] R. Miller and Q. F. Stout, Computational geometry on a mesh-connected computer (preliminary version), Proceedings of the 1984 International Conference on Parallel Processing, Bellaire, Michigan, August 1984, 66-73.
[Mill84b] R. Miller and Q. F. Stout, Convexity algorithms for pyramid computers (preliminary version), Proceedings of the 1984 International Conference on Parallel Processing, Bellaire, Michigan, August 1984, 177-184.
[Mill85a] R. Miller and Q. F. Stout, Pyramid computer algorithms for determining geometric properties of images, Proceedings of the First Annual ACM Symposium on Computational Geometry, Baltimore, June 1985, 263-271.
[Mill85b] R. Miller and Q. F. Stout, Geometric algorithms for digitized pictures on a mesh-connected computer, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-7, No. 2, March 1985, 216-228.
[Mill87a] R. Miller and S. E. Miller, Using hypercube multiprocessors to determine geometric properties of digitized pictures, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 638-640.
[Mill87b] R. Miller and Q. F. Stout, Mesh computer algorithms for line segments and simple polygons, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 282-285.
[Mill88] R. Miller and Q. F. Stout, Efficient parallel convex hull algorithms, IEEE Transactions on Computers, Vol. C-37, No. 12, December 1988, 1605-1618.
[Mill89a] R. Miller and S. Miller, Convexity algorithms for digitized pictures on an Intel iPSC hypercube, Supercomputer, Vol. 31, May 1989, 45-51.
[Mill89b] R. Miller and Q. F. Stout, Mesh computer algorithms for computational geometry, IEEE Transactions on Computers, Vol. C-38, No. 3, March 1989, 321-340.
[Mins69] M. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge, Massachusetts, 1969.
[Nand88] S. K. Nandy, R. Moona, and S. Rajagopalan, Linear quadtree algorithms on the hypercube, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 227-229.
[Nass80] D. Nassimi and S. Sahni, Finding connected components and connected ones on a mesh-connected parallel computer, SIAM Journal on Computing, Vol. 9, No. 4, November 1980, 744-757.
[Nass81] D. Nassimi and S. Sahni, Data broadcasting in SIMD computers, IEEE Transactions on Computers, Vol. C-30, No. 2, February 1981, 101-106.
[Nass82] D. Nassimi and S. Sahni, Parallel permutation and sorting algorithms and a new generalized connection network, Journal of the ACM, Vol. 29, No. 3, 1982, 642-667.
[Nath80] D. Nath, S. N. Maheshwari, and P. C. P. Bhatt, Parallel Algorithms for the Convex Hull Problem in Two Dimensions, Technical Report EE 8005, Department of Electrical Engineering, Indian Institute of Technology, Delhi Hauz Khas, New Delhi, October 1980.
[Niga90] M. Nigam, S. Sahni, and B. Krishnamurthy, Embedding hamiltonians and hypercubes in star interconnection graphs, Proceedings of the International Conference on Parallel Processing, St. Charles, Illinois, August 1990, 340-343.
[O'Rou87] J. O'Rourke, Art Gallery Theorems and Algorithms, Oxford University Press, New York, 1987.
[Osia90] C. N. K. Osiakwan and S. G. Akl, Efficient Parallel Algorithms for the Assignment Problem on the Plane, Technical Report 90-284, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1990.
[Osia91] C. N. K. Osiakwan, Parallel computation of weighted matchings in graphs, Ph.D. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1991.
[Over81] M. H. Overmars and J. van Leeuwen, Maintenance of configurations in the plane, Journal of Computer and System Sciences, Vol. 23, 1981, 166-204.
[Parb87] I. Parberry, Parallel Complexity Theory, Research Notes in Theoretical Computer Science, Pitman Publishing, London, 1987.
[Pate90] M. S. Paterson, Improved sorting networks with O(log N) depth, Algorithmica, Vol. 5, 1990, 75-92.
[Pell90] M. Pellegrini, Stabbing and ray shooting in 3 dimensional space, Proceedings of the
Sixth Annual ACM Symposium on Computational Geometry, Berkeley, California, June 1990, 177-186.
[Prea88] B. T. Preas, M. J. Lorenzetti, and B. D. Ackland (Editors), Physical Design Automation of Electronic Systems, Benjamin-Cummings, Menlo Park, California, 1988.
[Prei88] W. Preilowski and W. Mumbeck, A time-optimal parallel algorithm for the computing of Voronoi-diagrams, Proceedings of the Fourteenth International Workshop on Graph-Theoretic Concepts in Computer Science, Amsterdam, Lecture Notes in Computer Science, No. 344, J. van Leeuwen (Editor), Springer-Verlag, Berlin, June 1988, 424-433.
[Prep81] F. P. Preparata and J. Vuillemin, The cube-connected cycles: a versatile network for parallel computation, Communications of the ACM, Vol. 24, No. 5, 1981, 300-309.
[Prep85] F. P. Preparata and M. I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, New York, 1985.
[Qiu91a] K. Qiu, H. Meijer, and S. G. Akl, Parallel routing and sorting on the pancake network, Proceedings of the International Conference on Computing and Information, Ottawa, May 1991, Lecture Notes in Computer Science, No. 497, Springer-Verlag, Berlin, 360-371.
[Qiu91b] K. Qiu, H. Meijer, and S. G. Akl, Decomposing a star graph into disjoint cycles, Information Processing Letters, Vol. 39, No. 3, August 1991, 125-129.
[Qiu91c] K. Qiu, S. G. Akl, and H. Meijer, The Star and Pancake Interconnection Networks: Properties and Algorithms, Technical Report 91-297, Department of Computing and Information Science, Queen's University, Kingston, Ontario, March 1991.
[Rama88] J. Ramanujam and P. Sadayappan, Optimization by neural networks, Proceedings of the IEEE International Conference on Neural Networks, San Diego, 1988, Vol. II, 325-332.
[Rana87] A. G. Ranade, How to emulate shared memory, Proceedings of the Twenty-Eighth Annual Symposium on Foundations of Computer Science, Los Angeles, October 1987, 185-194.
[Rank90] S. Ranka and S. Sahni, Hypercube Algorithms with Application to Image Processing and Pattern Recognition, Springer-Verlag, New York, 1990.
[Reif85] J. H. Reif, An optimal parallel algorithm for integer sorting, Proceedings of the Twenty-Sixth Annual Symposium on Foundations of Computer Science, Portland, Oregon, October 1985, 496-503.
[Reif87] J. H. Reif and S. Sen, Optimal randomized parallel algorithms for computational geometry, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 270-277.
[Reif90] J. H. Reif and S. Sen, Randomized algorithms for binary search and load balancing on fixed connection networks with geometric applications (preliminary version), Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 327-337.
[Rey87] C. Rey and R. Ward, On determining the on-line minimax linear fit to a discrete point set in the plane, Information Processing Letters, Vol. 24, No. 2, 1987, 97-101.
[Rose62] F. Rosenblatt, Principles of Neurodynamics, Spartan Books, New York, 1962.
[Roth76] J. Rothstein, On the ultimate limitations of parallel processing, Proceedings of
the 1976 International Conference on Parallel Processing, Detroit, August 1976, 206-212.
[Rub90] C. Rüb, Parallel algorithms for red-blue intersection problems, manuscript, FB 14, Informatik, Universität des Saarlandes, Saarbrücken, 1990.
[Sark89a] D. Sarkar and I. Stojmenović, An optimal parallel circle-cover algorithm, Information Processing Letters, Vol. 32, July 1989, 3-6.
[Sark89b] D. Sarkar and I. Stojmenović, An Optimal Parallel Algorithm for Minimum Separation of Two Sets of Points, Technical Report TR-89-23, Computer Science Department, University of Ottawa, Ottawa, Ontario, July 1989.
[Saxe90] S. Saxena, P. C. P. Bhatt, and V. C. Prasad, Efficient VLSI parallel algorithm for Delaunay triangulation on orthogonal tree network in two and three dimensions, IEEE Transactions on Computers, Vol. C-39, No. 3, March 1990, 400-404.
[Saxe91] S. Saxena, P. C. P. Bhatt, and V. C. Prasad, Correction to: "Parallel algorithm for Delaunay triangulation on orthogonal tree network in two and three dimensions," IEEE Transactions on Computers, Vol. C-40, No. 1, January 1991, 122.
[Sche89] I. D. Scherson and S. Sen, Parallel sorting in two-dimensional VLSI models of computation, IEEE Transactions on Computers, Vol. C-38, No. 2, 1989, 238-249.
[Schw80] J. T. Schwartz, Ultracomputers, ACM Transactions on Programming Languages and Systems, Vol. 2, No. 4, October 1980, 484-521.
[Schw89] O. Schwarzkopf, Parallel computation of discrete Voronoi diagrams, Proceedings of the Sixth Annual Symposium on Theoretical Aspects of Computer Science, Paderborn, Germany, February 1989, 193-204.
[Sham75] M. I. Shamos, Geometric complexity, Proceedings of the Seventh ACM Symposium on Theory of Computing, Albuquerque, New Mexico, May 1975, 224-233.
[Sham78] M. I. Shamos, Computational geometry, Ph.D. thesis, Department of Computer Science, Yale University, New Haven, Connecticut, 1978.
[Shan50] C. E. Shannon, Memory requirements in a telephone exchange, Bell System Technical Journal, Vol. 29, 1950, 343-349.
[Sher90] T. Shermer, Recent Results in Art Galleries, Technical Report CMPT TR 90-10, Department of Computing Science, Simon Fraser University, Burnaby, British Columbia, 1990.
[Shi91] X. Shi, Contributions to sequence problems, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, September 1991.
[Shih87] Z.-C. Shih, G.-H. Chen, and R. C. T. Lee, Systolic algorithms to examine all pairs of elements, Communications of the ACM, Vol. 30, No. 2, February 1987, 161-167.
[Snir85] M. Snir, On parallel searching, SIAM Journal on Computing, Vol. 12, No. 3, August 1985, 688-708.
[Snyd86] L. Snyder, Type architectures, shared memory and the corollary of modest potential, Annual Review of Computer Science, Vol. 1, 1986, 289-317.
[Spri89] F. Springsteel and I. Stojmenović, Parallel general prefix computations with geometric, algebraic, and other applications, International Journal of Parallel Programming, Vol. 18, No. 6, December 1989, 485-503.
[Srid90] R. Sridhar, S. S. Iyengar, and S. Rajanarayanan, Range search in parallel using distributed data structures, Proceedings of the International Conference on
Databases, Parallel Architectures, and Their Applications, Miami Beach, Florida, March 1990, 14-19.
[Stoj87] I. Stojmenović, Parallel Computational Geometry, Technical Report CS-87-176, Computer Science Department, Washington State University, Pullman, Washington, November 1987.
[Stoj88a] I. Stojmenović, Computational geometry on a hypercube, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 100-103.
[Stoj88b] I. Stojmenović and M. Miyakawa, An optimal parallel algorithm for solving the maximal elements problem in the plane, Parallel Computing, Vol. 7, 1988, 249-251.
[Stou84] Q. F. Stout and R. Miller, Mesh-connected computer algorithms for determining geometric properties of figures, Proceedings of the 1984 International Conference on Pattern Recognition, 1984.
[Stou85] Q. F. Stout, Pyramid computer solutions of the closest pair problem, Journal of Algorithms, Vol. 6, 1985, 200-212.
[Stou88] Q. F. Stout, Constant-time geometry on PRAMs, Proceedings of the 1988 International Conference on Parallel Processing, St. Charles, Illinois, August 1988, Vol. III, Algorithms and Applications, 104-107.
[Tama89a] R. Tamassia and J. S. Vitter, Parallel Transitive Closure and Point Location in Planar Structures, Technical Report CS-89-45, Department of Computer Science, Brown University, Providence, Rhode Island, October 1989.
[Tama89b] R. Tamassia and J. S. Vitter, Optimal parallel algorithms for transitive closure and point location in planar structures, Proceedings of the 1989 Symposium on Parallel Algorithms and Architectures, Santa Fe, New Mexico, June 1989, 399-408.
[Tama90] R. Tamassia and J. S. Vitter, Optimal cooperative search in fractional cascaded data structures, Proceedings of the Second ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990, 307-316.
[Tama91] R. Tamassia and J. S. Vitter, Planar transitive closure and point location in planar structures, SIAM Journal on Computing, Vol. 20, No. 4, August 1991, 708-725.
[Tarj85] R. E. Tarjan and U. Vishkin, An efficient parallel biconnectivity algorithm, SIAM Journal on Computing, Vol. 14, 1985, 862-874.
[Thin87] Thinking Machines Corporation, Connection Machine Model CM-2 Technical Summary, Thinking Machines Technical Report HA87-4, April 1987.
[Thom77] C. D. Thompson and H. T. Kung, Sorting on a mesh-connected parallel computer, Communications of the ACM, Vol. 20, No. 4, 1977, 263-271.
[Tous83] G. T. Toussaint, Solving geometric problems with the "rotating calipers", Proceedings of IEEE MELECON'83, Athens, May 1983.
[Uhr87] L. Uhr (Editor), Parallel Computer Vision, Academic Press, New York, 1987.
[Ullm84] J. D. Ullman, Computational Aspects of VLSI, Computer Science Press, Rockville, Maryland, 1984.
[Umeo89] H. Umeo and T. Asano, Systolic algorithms for computational geometry problems - a survey, Computing, Vol. 41, 1989, 19-40.
[Urru89] J. Urrutia and J. Zaks, Illuminating Convex Sets, Technical Report TR-89-31, Department of Computer Science, University of Ottawa, Ottawa, Ontario, 1989.
[Vaid89] P. M. Vaidya, Geometry helps in matching, SIAM Journal on Computing, Vol. 18, No. 6, December 1989, 1201-1225.
[vanW90] K. van Weringh, Algorithms for the Voronoi diagram of a set of disks, M.Sc. thesis, Department of Computing and Information Science, Queen's University, Kingston, Ontario, 1990.
[Vish84] U. Vishkin, A parallel-design distributed-implementation (PDDI) general-purpose computer, Theoretical Computer Science, Vol. 32, 1984, 157-172.
[Voro08] G. Voronoi, Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Deuxième mémoire: Recherches sur les parallélloèdres primitifs, Journal für die Reine und Angewandte Mathematik, Vol. 134, 1908, 198-287.
[Wang87] C. A. Wang and Y. H. Tsin, An O(log n) time parallel algorithm for triangulating a set of points in the plane, Information Processing Letters, Vol. 25, 1987, 55-60.
[Wang90a] B.-F. Wang and G.-H. Chen, Constant Time Algorithms for Sorting and Computing Convex Hulls, Technical Report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, 1990.
[Wang90b] B.-F. Wang, G.-H. Chen, and F.-C. Lin, Constant time sorting on a processor array with a reconfigurable bus system, Information Processing Letters, Vol. 34, No. 4, 1990, 187-192.
[Wee90] Y. C. Wee and S. Chaiken, An optimal parallel L1-metric Voronoi diagram algorithm, Proceedings of the Second Canadian Conference on Computational Geometry, Ottawa, Ontario, August 1990, 60-65.
[Wegr91] P. Wegrowicz, Linear programming on the reconfigurable mesh and the CREW PRAM, M.Sc. thesis, School of Computer Science, McGill University, Montreal, Quebec, 1991.
[Won87] Y. Won and S. Sahni, Maze routing on a hypercube multiprocessor computer, Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August 1987, 630-637.
[Yao81] A. C. Yao, A lower bound to finding convex hulls, Journal of the ACM, Vol. 28, No. 4, 1981, 780-787.
[Yap87] C. K. Yap, What can be parallelized in computational geometry? Invited talk at the International Workshop on Parallel Algorithms and Architectures, Humboldt University, Berlin, May 1987, Lecture Notes in Computer Science, No. 269, Springer-Verlag, Berlin, 1988, 184-195.
[Yap88] C. K. Yap, Parallel triangulation of a polygon in two calls to the trapezoidal map, Algorithmica, Vol. 3, 1988, 279-288.
Index

Author Index

Ackland, B. D., 185
Aggarwal, A., 45, 62, 72, 107, 123, 135, 182, 193
Ajtai, M., 25, 182
Akers, S. B., 182
Akl, S. G., 6, 25, 45, 46, 62, 72, 87, 95, 107, 123, 124, 182, 183, 185
Alt, H., 183
Anderson, R., 193
Asano, T., 87, 135
Atallah, M. J., 46, 62, 72, 87, 88, 95, 107, 123, 135, 193
Aykanat, C., 183
Batcher, K. E., 183
Beame, P., 193
Beichl, I., 135
Ben-Or, M., 46
Bentley, J. L., 62, 72, 96, 135
Bern, M., 183
Bertossi, A. A., 123
Beyer, W. T., 183
Bhatt, P. C. P., 6, 48, 108, 136
Blelloch, G. E., 25, 46, 72, 183
Boxer, L., 62, 123, 183, 193
Brent, R. P., 62
Bright, J., 193
Brisson, E., 193
Brown, K. Q., 46, 107, 183
Callahan, P., 193
Chaiken, S., 109
Chang, H., 96
Chazelle, B., 45, 46, 62, 63, 72, 73, 96, 107, 123, 135, 182, 193
Chen, D. Z., 87, 88, 123
Chen, G. H., 46, 49, 63
Chern, M. S., 46, 63
Chow, A. L., 6, 46, 63, 107
Codd, E. F., 26
Cole, R., 26, 46, 62, 63, 72, 73, 87, 96, 107, 123, 135, 183
Conrad, M., 26
Cook, S. A., 193
Cormen, T. H., 26
Cypher, R., 46, 88, 96, 136, 183
Czyzowicz, J., 193
Dadoun, N., 46, 63, 73
Dehne, F., 46, 47, 73, 88, 136, 183, 193
Deng, X., 63
Dhall, S. K., 184
Dyer, C. R., 96
Dyer, M. E., 63
Eddy, W. F., 47
Edelsbrunner, H., 6, 73, 123, 193
ElGindy, H., 47, 123, 124, 136
Eshaghian, M. M., 47, 96
Evans, D. J., 107
Fava, L. (Fava Lindon, L.), 183
Feitelson, D. G., 26
Ferreira, A. G., 124, 184
Finkel, R. A., 96
Fjällström, P. O., 47
Flach, B., 193
Fortune, S., 124
Foster, M. J., 26
Fournier, A., 184
Frederickson, G. N., 95
Freeman, H., 184
Fussell, D., 184
Johnson, D. B., 184
Jung, H., 193
Jwo, J. S., 184
Kanal, L. N., 6
Karp, R. M., 184
Katajainen, J., 47, 108
Kim, S. K., 63
Kirkpatrick, D. G., 46, 63, 73
Knuth, D. E., 184
Komlós, J., 25, 182
Kosaraju, S. R., 95
Krishnamurthy, B., 182, 185
Kruskal, C. P., 26, 124
Kučera, L., 184
Kumar, V., 6
Kung, H. T., 26, 186
Kurc, T. M., 183
Lakshmivarahan, S., 184
Lang, T., 184
Lee, C. C., 124
Lee, C. Y., 184
Lee, D. T., 6, 63, 73, 96, 108, 124, 136, 184
Lee, R. C. T., 46, 63
Leighton, F. T., 26, 47, 88, 96, 136, 184
Leiserson, C. E., 26
Levcopoulos, C., 47, 108
Lin, F. C., 49
Lingas, A., 108
Lippmann, R. P., 26
Little, J. J., 46, 72
Lodi, E., 88
Lorenzetti, M. J., 185
Lu, M., 47, 63, 108, 124
Gastaldo, M., 193
Ghosh, K. S., 63
Ghouse, M. P., 193
Gibbons, A., 184
Goodman, S. E., 124
Goodrich, M. T., 6, 46, 47, 62, 63, 72, 73, 87, 88, 95, 96, 107, 108, 124, 135, 136, 184, 193
Graf, D., 193
Graham, R. L., 184
Guenther, G. R., 25, 46, 182
Guha, S., 88, 108, 124
Guibas, L. J., 45, 62, 63, 72, 73, 107, 123, 135, 182, 184, 193
Gopalakrishnan, P. S., 6
Hagerup, T., 183, 193
Hambrusch, S. E., 183
Harel, D., 182
Hassenklover, A., 183
He, X., 124
Hedetniemi, S. T., 124
Holey, J. A., 26, 47, 108
Hopfield, J. J., 194
Houle, M. E., 184
Ibarra, O. H., 26, 47, 108
Iyengar, S. S., 73
Jarvis, R. A., 47
Jeong, C. S., 63, 73, 108, 124, 136, 184
MacKenzie, P. D., 47, 63, 88, 96, 136
Maheshwari, A., 63
Maheshwari, S. N., 6, 48
Megiddo, N., 63
Mehlhorn, K., 6, 183
Meijer, H., 182, 185
Menn, A., 184
Merker, R., 193
Merks, E., 136
Miller, R., 26, 47, 48, 62, 63, 96, 97, 124, 183, 185, 193
Miller, S., 48
Miller, S. E., 48
Minsky, M., 26
Miyakawa, M., 48, 124
Moona, R., 96
Mumbeck, W., 108
Nandy, S. K., 96
Nassimi, D., 185
Nath, D., 6, 48
Nigam, M., 185
O'Dunlaing, C., 45, 62, 72, 107, 108, 123, 135, 182, 193
O'Rourke, J., 193, 194
Osiakwan, C. N., 124
Overmars, M. H., 48, 136
Pagli, L., 88
Papert, S., 26
Parberry, I., 185
Paterson, M. S., 185
Pellegrini, M., 194
Peters, J. G., 124
Petersson, O., 47
Pham, Q. T., 88
Plaxton, C. G., 46, 88, 96, 183
Prasad, V. C., 108, 136
Prasanna Kumar, V. K., 6, 47, 96, 136
Preas, B. T., 185
Preilowski, W., 108
Preparata, F. P., 6, 48, 63, 73, 96, 108, 124, 136, 183, 185
Qiu, K., 182, 183, 185
Rajagopalan, S., 96
Rajanarayanan, S., 73
Ramachandran, V., 184
Ramanujam, J., 194
Ranade, A. G., 185
Ranka, S., 185
Rappaport, D., 182
Rau-Chaplin, A., 46, 73, 136, 193
Reif, J. H., 6, 48, 73, 88, 108, 136
Rey, C., 185
Rivera-Campo, E., 193
Rivest, R. L., 26
Rosenblatt, F., 26
Rothstein, J., 185
Rüb, C., 63
Rudolph, L., 26, 124
Rytter, W., 184
Sack, J. R., 47, 183, 193
Sadayappan, P., 194
Sahni, S., 185, 186
Santoro, N., 47, 183, 193
Sarkar, D., 88, 124
Saxena, S., 108, 136
Scherson, I. D., 185
Schwarzkopf, O., 108
Seidel, R., 193
Sen, S., 6, 48, 73, 88, 136, 185
Shamos, M. I., 6, 48, 63, 73, 96, 108, 124, 136, 185, 186
Shannon, C. E., 186
Shapira, R., 184
Sharir, M., 193
Shauck, S. B., 88, 124
Shermer, T., 194
Shi, X., 186
Shih, Z. C., 63
Snir, M., 26, 124
Snyder, L., 186
Somani, A. K., 184
Springsteel, F., 186
Sridhar, R., 73
Stojmenović, I., 47, 48, 64, 88, 96, 107, 109, 124, 182, 183, 186
Stolfi, J., 73, 123, 184
Stone, H. S., 184
Stout, Q. F., 26, 47, 48, 63, 88, 96, 97, 124, 136, 185
Sullivan, F., 135
Szemerédi, E., 25, 182
Tamassia, R., 73
Tank, D. W., 194
Tarjan, R. E., 124
Thompson, C. D., 186
Toussaint, G. T., 184, 186
Tsay, J. J., 46, 87, 95, 193
Tsin, Y. H., 136
Uhr, L., 7
Ullman, J. D., 26, 186
Umeo, H., 87, 135
Urrutia, J., 193, 194
Vaidya, P. M., 125
Valiveti, N., 193
van Leeuwen, J., 48, 136
van Weringh, K., 48, 109
Varman, P., 63
Vishkin, U., 107, 124, 186
Vitter, J. S., 73
Voronoi, G., 109
Vuillemin, J., 185
Wagener, H., 88
Wang, B. F., 49
Wang, C. A., 136
Ward, R., 185
Wee, Y. C., 109
Wegrowicz, P., 64
Welzl, E., 193
Won, Y., 186
Wong, C. K., 96
Wood, D., 62, 72, 135
Wu, Y. F., 184
Yao, A. C., 49
Yap, C. K., 6, 7, 45, 62, 72, 107, 108, 123, 135, 136, 182, 193
Zajicek, O., 73
Zaks, J., 193, 194

Subject Index

AKS sorting: circuit, 176; network, 18
Algorithm: cost optimal, 3; deterministic, 3; optimal, 3; parallel, 1; randomized, 3; sequential, 3
Alternating: path, 120; tree, 120
Antipodal, 166
Arrangement, 90
Art gallery, 88
Ascend, 160
Assignment problem, 118, 119
Associated point, 164
Augmented plane sweep tree, 67
Augmenting path, 120
Average case analysis, 4
Balanced search tree, 139
Biological computer, 23
Bipartite graph, 118
Boundary of a point set, 43
Bridged separator tree, 68
Broadcasting, 22, 156: interval, 162; with selective reduction, 23, 169
Bucketing, 104
Butterfly network, 17
Cascading: divide and conquer, 53, 81, 92, 130; fractional, 53, 67, 130; merge technique, 30
Cayley graph, 151
Cellular automata, 10
Center of a set, 95
Circle cover: minimum cardinality, 111; minimum weight, 111
Circular: arcs, 24, 62, 111; range searching, 72
Circumscribing polygon, 45, 122
Classification theory, 89
Closest pair (CP), 89, 169, 181
Cluster analysis, 89
Combinational circuit, 175
Computational geometry, 1
Concentration, 161
Conflict resolution policies, 22
Convex: hull, 2, 5, 24, 27, 164, 173, 181; polygon, 2; polygonal chain, 44, 61, 181; subdivision, 2
Cooperative searching, 68
Co-podal pair, 168
Cost, 3: optimal, 3
Cousins, 158
Critical: point merging, 81; support line, 166
Cube-connected cycles (CCC) network, 17
Data structure, 187
Delaunay triangulation, 100, 102, 116, 131, 135
Depth: of a combinational circuit, 175; of a point set, 45; of an image, 40; of collision, 61
Descend, 160
Deterministic algorithm, 3
Diameter: of a point set, 95, 166; of a polygon, 122
Digitized image, 12, 40, 93, 103
Dirichlet tessellation, 99
Disk, 105, 123
Distance: between a point and an oriented edge, 164; between line segments, 95; between polygons, 95, 167, 168
Distribution, 161
Divide and conquer, 33, 101, 104, 105, 116: cascading, 53, 81, 92, 130; multiway, 29, 31, 33, 38, 43, 82, 129, 133
Dividing chain, 102
Dominate, 41, 149, 172
Dynamic computational geometry, 191
Dynamically changing set, 95
ECDF (Empirical Cumulative Distribution Function) searching, 41, 147, 150, 168
Elementary operation, 3
Euclidean: minimum spanning tree, 99, 115, 123; minimum weight perfect matching, 118, 123; minimum weight triangulation, 134; traveling salesperson problem, 123
Euler tour technique, 114
Euler's relation, 100
Expected running time, 4
Extremal: point, 43; search, 164
Facility location problem, 123
Farthest neighbor, 94: all (AFN), 94
Finger probe, 5
Fold-over operation, 91
Fractional cascading, 53, 67, 130
Frame buffer, 137
Free tree, 105
Funnel polygon, 129, 133
Gabriel graph, 135
General: polygon, 5; prefix computation, 146, 168, 180
Geometric optimization, 111, 122, 189
Graphics, 137
Greedy algorithm, 112, 114
Grid, 137
Half-plane, 41, 51, 59
Ham-sandwich cut, 87
Hull: convex, 2, 5, 24, 27, 164, 173, 181; lower, 29, 181; upper, 29, 133, 181
Hypercube network, 17
Hyperplane, 190
Illumination, 188
Indexing scheme, 12: proximity ordering, 34; row-major order, 12; shuffled row-major order, 12; snakelike row-major order, 12
Interconnection network, 11: AKS sorting, 18, 176; butterfly, 17; cube-connected cycles (CCC), 17; hypercube, 17; linear array, 11; mesh, 11; mesh of trees (MOT), 13; omega, 25; pancake, 19, 151, 152; perfect shuffle, 25; plus minus 2^i (PM2I), 24; pyramid, 14; star, 19, 151; tree, 13
Interconnection unit (IU), 21, 170
Intersection, 2, 51: of two convex polygons, 169; strict, 111
Inversion, 101
Isothetic: line segment, 54; rectangle, 58; unit square, 192
k-D (or multidimensional binary) tree, 70, 72
Kernel of a simple polygon, 59: reachability, 87
Leader, 162, 178
Light source, 141
Line segment, 51, 81, 82, 83
Linear: array network, 11; programming, 59, 119
Linearly separable, 192
Link: center, 6; distance, 6
List ranking, 104, 114
Loci of proximity, 99
Lower: bound, 3, 147, 177; hull, 29, 181
Maintenance, 45
Many-to-one routing, 163
Matching, 2, 117: Euclidean, 118; Manhattan, 118; maximum, 122; minimum weight perfect, 119, 123
Maximal vectors (maximal elements), 41, 137, 147, 150, 168, 172
Maximum empty rectangle, 122
Maze, 143
m-contour, 41
Merging, 158: circuit, 176; slopes technique, 164
Mesh: network, 11; of trees (MOT) network, 13; with broadcast buses, 12, 13; with reconfigurable buses, 13
Metric: L1 (Manhattan), 94, 103, 118; L2 (Euclidean), 118; L-infinity, 94
Minimax linear fit, 167
Model of computation, 4
Monotone polygon, 58
Multidimensional binary (or k-D) tree, 70, 72
Multi-level bucketing, 104
Multilocation, 54, 128, 130
Multisearch, 40, 188
Multiway divide and conquer, 29, 31, 33, 38, 43, 82, 129, 133
NC, 191
Nearest neighbor, 89: all (ANN), 89; ball, 92; query (QNN), 89
Neighbor: nearest, 89; farthest, 94
Network model, 4, 10
Neural net, 9, 189
Next element search, 69
Omega network, 25
One-sided monotone polygon, 129
Optical computer, 23
Optimal algorithm, 3
Optimization, 111, 122, 189
Order: at least, 3; at most, 3; proximity, 34; row-major, 12; shuffled row-major, 12; snakelike row-major, 12
Orthogonal range searching, 71
Pancake network, 19, 151, 152
Parallel: algorithm, 1; computer, 4; prefix, 23, 29, 113, 157, 175
Parallel random access machine (PRAM), 4, 21:
CRCW, 22; CREW, 21; EREW, 21; Real (RPRAM), 22
Parallelism, 1, 2
Path: external shortest, 122; in a maze, 143; polygonal, 6; shortest, 116; smallest, 122
Pattern recognition, 9, 65
P-complete problem, 191
Peeling, 40, 44, 45
Perceptron, 9
Perfect shuffle network, 25
Performance comparisons of: parallel algorithms for computing maximal vectors and related problems, 43; parallel algorithms for triangulating point sets, 134; parallel convex hull algorithms, 39; parallel line segment intersection algorithms, 56; parallel minimum circle cover algorithms, 115; parallel point location algorithms, 70; parallel polygon triangulation algorithms, 131; parallel polygon, half-plane, rectangle and circle intersection algorithms, 60; parallel QNN, ANN and CP algorithms, 93; parallel visibility and separability algorithms, 86; parallel Voronoi diagram algorithms, 106
Pipelining, 68, 77, 82
Pixel, 12, 40, 94, 137, 141
Plane sweep, 51: tree, 54, 67, 130
Plus minus 2^i network (PM2I), 24
Point location, 2, 57, 65, 104
Polygon: circumscribing, 45, 122; convex, 2; general, 5; horizontally monotone, 129; inclusion, 5; intersection, 5; monotone, 58; one-sided monotone, 129; rectilinear, 117, 122, 123; separation, 85; simple, 5; star shaped, 57
vertically convex, 61; visibility, 76, 77; with holes, 62, 76, 135
Polygonal chain, 44, 78, 122: convex, 44, 61, 181
Prefix sums, 29, 157, 175
Priority queue, 139
Probabilistic time, 3
Processor, 4: network, 10
Proximity problems, 89, 99
Pyramid network, 14
Quad, 105: tree, 94
Random access machine (RAM), 22: Real, 22
Range searching, 70, 146: tree, 71, 72
Rank, 158
Ray shooting, 189
Reachability, 87
Rectangle: isothetic, 72, 123; maximum empty, 122; minimum-area, 166; of influence, 104; query, 61, 72
Rectilinear: convex hull, 40; convex image, 40; polygon, 117, 122, 123
Recursive doubling, 157
Relative neighborhood graph, 135
Retrieval: direct, 71; indirect, 71
Reversing, 161
Routing, 3, 154: many-to-one, 163
Running time, 3
Scan model, 23
Screen, 12, 40, 41, 137, 141
Searching: circular range, 72; cooperative, 68; ECDF (Empirical Cumulative Distribution Function), 41, 137, 147, 150, 168; extremal, 164; geometric, 138; multi, 188; next element, 69; orthogonal range, 71; range, 70, 146
Segment: dragging, 55; tree, 51, 72, 94, 133
Semigroup operation, 35
Separability, 75, 84
Sequential algorithm, 3
Sequentially separable simple polygons, 85
Set difference, 163
Shadow, 141
Shared memory model, 4, 20, 170
Similar polygons, 72
Simple polygon, 5
Size: of a combinational circuit, 175; of a problem, 2
Sorting, 22, 27, 158, 159, 173, 176
Stabbing, 188
Star network, 19, 151
Star shaped polygon, 57
Step: computation, 3; routing, 3
Subdivision: arbitrary, 65; convex, 65; hierarchical, 67; monotone, 67; planar, 65; triangulated, 65
Supporting line, 133
Systolic: array, 10; screen, 12, 40, 41, 137, 141
Tangent, 29, 32
Thiessen tessellation, 99
Threaded binary tree, 71
Translation, 161
Transversal, 192
Trapezoidal: decomposition (map), 82, 127; segment, 187
Tree: alternating, 120; augmented plane sweep, 67; balanced search, 139; bridged separator, 68; Euclidean minimum spanning, 99, 115, 123; free, 105; multidimensional binary (or k-D), 70, 72; network, 13; plane sweep, 54, 67, 130; quad, 94; range, 71, 72; segment, 51, 72, 94, 133; threaded binary, 71
Triangulated sleeve, 116
Triangulation, 2, 5, 116, 127, 135: Delaunay, 100, 102, 116, 131, 135; minimum weight, 134; of a point set, 131, 147, 168; of a polygon, 127, 189
Two-dimensional array, 11
Two-set dominance counting, 42, 147, 150, 168
Unmerging, 158
Upper: bound, 3; hull, 29, 133, 181
Vector sum of two convex polygons, 168
Vertically convex polygon, 61
Visibility, 2, 75, 188: chain, 78; graph, 84; hull, 77, 85; pair of line segments, 83; polygon, 24, 76, 77
Voronoi diagram, 2, 40, 89, 99, 116, 132: discrete, 103; furthest site, 100; of disks, 105; of order k, 107; weighted, 121
Well balanced curve segment, 55
Width: of a combinational circuit, 175; of a point set, 167
Work, 3
Worst case analysis, 3
Y-disjoint edges, 102