PARALLEL METAHEURISTICS
WILEY SERIES ON PARALLEL AND DISTRIBUTED COMPUTING
Series Editor: Albert Y. Zomaya

Parallel and Distributed Simulation Systems / Richard Fujimoto
Mobile Processing in Distributed and Open Environments / Peter Sapaty
Introduction to Parallel Algorithms / C. Xavier and S. S. Iyengar
Solutions to Parallel and Distributed Computing Problems: Lessons from Biological Sciences / Albert Y. Zomaya, Fikret Ercal, and Stephan Olariu (Editors)
Parallel and Distributed Computing: A Survey of Models, Paradigms, and Approaches / Claudia Leopold
Fundamentals of Distributed Object Systems: A CORBA Perspective / Zahir Tari and Omran Bukhres
Pipelined Processor Farms: Structured Design for Embedded Parallel Systems / Martin Fleury and Andrew Downton
Handbook of Wireless Networks and Mobile Computing / Ivan Stojmenovic (Editor)
Internet-Based Workflow Management: Toward a Semantic Web / Dan C. Marinescu
Parallel Computing on Heterogeneous Networks / Alexey L. Lastovetsky
Performance Evaluation and Characterization of Parallel and Distributed Computing Tools / Salim Hariri and Manish Parashar
Distributed Computing: Fundamentals, Simulations and Advanced Topics, Second Edition / Hagit Attiya and Jennifer Welch
Smart Environments: Technology, Protocols, and Applications / Diane Cook and Sajal Das
Fundamentals of Computer Organization and Architecture / Mostafa Abd-El-Barr and Hesham El-Rewini
Advanced Computer Architecture and Parallel Processing / Hesham El-Rewini and Mostafa Abd-El-Barr
UPC: Distributed Shared Memory Programming / Tarek El-Ghazawi, William Carlson, Thomas Sterling, and Katherine Yelick
Handbook of Sensor Networks: Algorithms and Architectures / Ivan Stojmenovic (Editor)
Parallel Metaheuristics: A New Class of Algorithms / Enrique Alba (Editor)
PARALLEL METAHEURISTICS A New Class of Algorithms
Edited by
Enrique Alba
WILEY-INTERSCIENCE
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2005 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:
Parallel metaheuristics : a new class of algorithms / edited by Enrique Alba.
    p. cm.
  ISBN-13 978-0-471-67806-9
  ISBN-10 0-471-67806-6 (cloth)
  1. Mathematical optimization.  2. Parallel algorithms.  3. Operations research.  I. Alba, Enrique.
  T57.P37 2005
  519.6-dc22                                                         2005001251

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
Contents

Foreword  xi
Preface  xiii
Contributors  xv

Part I  INTRODUCTION TO METAHEURISTICS AND PARALLELISM  1

1  An Introduction to Metaheuristic Techniques  3
   Christian Blum, Andrea Roli, Enrique Alba
   1.1 Introduction  3
   1.2 Trajectory Methods  8
   1.3 Population-Based Methods  19
   1.4 Decentralized Metaheuristics  28
   1.5 Hybridization of Metaheuristics  29
   1.6 Conclusions  31
   References  31

2  Measuring the Performance of Parallel Metaheuristics  43
   Enrique Alba, Gabriel Luque
   2.1 Introduction  43
   2.2 Parallel Performance Measures  44
   2.3 How to Report Results  48
   2.4 Illustrating the Influence of Measures  54
   2.5 Conclusions  60
   References  60

3  New Technologies in Parallelism  63
   Enrique Alba, Antonio J. Nebro
   3.1 Introduction  63
   3.2 Parallel Computer Architectures: An Overview  63
   3.3 Shared-Memory and Distributed-Memory Programming  65
   3.4 Shared-Memory Tools  68
   3.5 Distributed-Memory Tools  70
   3.6 Which of Them?  74
   3.7 Summary  75
   References  76

4  Metaheuristics and Parallelism  79
   Enrique Alba, El-Ghazali Talbi, Gabriel Luque, Nouredine Melab
   4.1 Introduction  79
   4.2 Parallel LSMs  80
   4.3 Case Studies of Parallel LSMs  81
   4.4 Parallel Evolutionary Algorithms  85
   4.5 Case Studies of Parallel EAs  87
   4.6 Other Models  93
   4.7 Conclusions  95
   References  96

Part II  PARALLEL METAHEURISTIC MODELS  105

5  Parallel Genetic Algorithms  107
   Gabriel Luque, Enrique Alba, Bernabé Dorronsoro
   5.1 Introduction  107
   5.2 Panmictic Genetic Algorithms  108
   5.3 Structured Genetic Algorithms  110
   5.4 Parallel Genetic Algorithms  112
   5.5 Experimental Results  118
   5.6 Summary  121
   References  122

6  Parallel Genetic Programming  127
   F. Fernández, G. Spezzano, M. Tomassini, L. Vanneschi
   6.1 Introduction to GP  127
   6.2 Models of Parallel and Distributed GP  130
   6.3 Problems  134
   6.4 Real-Life Applications  137
   6.5 Placement and Routing in FPGA  139
   6.6 Data Classification Using Cellular Genetic Programming  144
   6.7 Concluding Discussion  150
   References  150

7  Parallel Evolution Strategies  155
   Günter Rudolph
   7.1 Introduction  155
   7.2 Deployment Scenarios of Parallel Evolutionary Algorithms  156
   7.3 Sequential Evolutionary Algorithms  159
   7.4 Parallel Evolutionary Algorithms  159
   7.5 Conclusions  165
   References  165

8  Parallel Ant Colony Algorithms  171
   Stefan Janson, Daniel Merkle, Martin Middendorf
   8.1 Introduction  171
   8.2 Ant Colony Optimization  172
   8.3 Parallel ACO  175
   8.4 Hardware Parallelization of ACO  190
   8.5 Other Ant Colony Approaches  195
   References  197

9  Parallel Estimation of Distribution Algorithms  203
   Julio Madera, Enrique Alba, Alberto Ochoa
   9.1 Introduction  203
   9.2 Levels of Parallelism in EDA  204
   9.3 Parallel Models for EDAs  206
   9.4 A Classification of Parallel EDAs  216
   9.5 Conclusions  219
   References  220

10  Parallel Scatter Search  223
    F. García, M. García, B. Melián, J. A. Moreno-Pérez, J. M. Moreno-Vega
    10.1 Introduction  223
    10.2 Scatter Search  224
    10.3 Parallel Scatter Search  225
    10.4 Application of Scatter Search to the p-Median Problem  229
    10.5 Application of Scatter Search to Feature Subset Selection  232
    10.6 Computational Experiments  239
    10.7 Conclusions  243
    References  244

11  Parallel Variable Neighborhood Search  247
    José A. Moreno-Pérez, Pierre Hansen, Nenad Mladenović
    11.1 Introduction  247
    11.2 The VNS Metaheuristic  248
    11.3 The Parallelizations  251
    11.4 Application of VNS for the p-Median  258
    11.5 Computational Experiments  262
    11.6 Conclusions  263
    References  264

12  Parallel Simulated Annealing  267
    M. Emin Aydin, Vecihi Yigit
    12.1 Introduction  267
    12.2 Simulated Annealing  268
    12.3 Parallel Simulated Annealing  269
    12.4 A Case Study  275
    12.5 Summary  283
    References  284

13  Parallel Tabu Search  289
    Teodor Gabriel Crainic, Michel Gendreau, Jean-Yves Potvin
    13.1 Introduction  289
    13.2 Tabu Search  290
    13.3 Parallelization Strategies for Tabu Search  291
    13.4 Literature Review  294
    13.5 Two Parallel Tabu Search Heuristics for Real-Time Fleet Management  302
    13.6 Perspectives and Research Directions  305
    References  306

14  Parallel Greedy Randomized Adaptive Search Procedures  315
    Mauricio G. C. Resende, Celso C. Ribeiro
    14.1 Introduction  315
    14.2 Multiple-Walk Independent-Thread Strategies  317
    14.3 Multiple-Walk Cooperative-Thread Strategies  323
    14.4 Some Parallel GRASP Implementations  327
    14.5 Conclusion  340
    References  341

15  Parallel Hybrid Metaheuristics  347
    Carlos Cotta, El-Ghazali Talbi, Enrique Alba
    15.1 Introduction  347
    15.2 Historical Notes on Hybrid Metaheuristics  348
    15.3 Classifying Hybrid Metaheuristics  350
    15.4 Implementing Parallel Hybrid Metaheuristics  355
    15.5 Applications of Parallel Hybrid Metaheuristics  358
    15.6 Conclusions  359
    References  359

16  Parallel Multiobjective Optimization  371
    Antonio J. Nebro, Francisco Luna, El-Ghazali Talbi, Enrique Alba
    16.1 Introduction  371
    16.2 Parallel Metaheuristics for Multiobjective Optimization  372
    16.3 Two Parallel Multiobjective Metaheuristics  377
    16.4 Experimentation  379
    16.5 Conclusions and Future Work  386
    References  387

17  Parallel Heterogeneous Metaheuristics  395
    Francisco Luna, Enrique Alba, Antonio J. Nebro
    17.1 Introduction  395
    17.2 Heterogeneous Metaheuristics Survey  397
    17.3 Taxonomy of Parallel Heterogeneous Metaheuristics  400
    17.4 Frameworks for Heterogeneous Metaheuristics  404
    17.5 Concluding Remarks  406
    17.6 Annotated Bibliography  407
    References  412

Part III  THEORY AND APPLICATIONS  423

18  Theory of Parallel Genetic Algorithms  425
    Erick Cantú-Paz
    18.1 Introduction  425
    18.2 Master-Slave Parallel GAs  428
    18.3 Multipopulation Parallel GAs  430
    18.4 Cellular Parallel GAs  437
    18.5 Conclusions  438
    References  439

19  Parallel Metaheuristics Applications  441
    Teodor Gabriel Crainic, Nourredine Hail
    19.1 Introduction  447
    19.2 Parallel Metaheuristics  448
    19.3 Graph Coloring  451
    19.4 Graph Partitioning  452
    19.5 Steiner Tree Problem  456
    19.6 Set Partitioning and Covering  457
    19.7 Satisfiability Problems  459
    19.8 Quadratic Assignment  462
    19.9 Location Problems  464
    19.10 Network Design  468
    19.11 The Traveling Salesman Problem  471
    19.12 Vehicle Routing Problems  476
    19.13 Summary  479
    References  480

20  Parallel Metaheuristics in Telecommunications  495
    Sergio Nesmachnow, Héctor Cancela, Enrique Alba, Francisco Chicano
    20.1 Introduction  495
    20.2 Network Design  496
    20.3 Network Routing  502
    20.4 Network Assignment and Dimensioning  504
    20.5 Conclusions  510
    References  510

21  Bioinformatics and Parallel Metaheuristics  517
    Oswaldo Trelles, Andrés Rodríguez
    21.1 Introduction  517
    21.2 Bioinformatics at a Glance  519
    21.3 Parallel Computers  522
    21.4 Bioinformatic Applications  526
    21.5 Parallel Metaheuristics in Bioinformatics  534
    21.6 Conclusions  543
    References  543

Index  551
Foreword
Metaheuristics are powerful classes of optimization techniques that have gained a lot of popularity in recent years. These techniques can provide useful and practical solutions for a wide range of problems and application domains. The power of metaheuristics lies in their capability of dealing with complex problems with no or little knowledge of the search space, and thus they are particularly well suited to deal with a wide range of computationally intractable optimization and decision-making applications.

Rather simplistically, one can view metaheuristics as algorithms that perform directed random searches of possible solutions, optimal or near optimal, to a problem until a particular termination condition is met or after a predefined number of iterations. At the first instance, this can be seen as a drawback because the search for a solution may take too much time to an extent that renders the solution impractical. Fortunately, many classes of metaheuristics are inherently parallelizable and this led researchers to develop parallelization techniques and efficient implementations.

Of course, in some metaheuristics, parallelization is much easier to achieve than in others, and with that comes issues of implementation on actual parallel platforms. In earlier implementations the master-slave paradigm was the preferred model used to run metaheuristics and still is a valid approach for many classes of these algorithms. However, due to the great variety of computer architectures (shared memory processors, clusters, grids, etc.) other approaches have been developed and more concerted work is needed in this direction. Moreover, another important issue is that of the development of parallelization tools and environments that ease the use of metaheuristics and extend their applicability range.

Professor Alba's new book, Parallel Metaheuristics, is a well-timed and worthy effort that provides a comprehensive and balanced blend of topics, implementations, and case studies. This volume will prove to be a very valuable resource for researchers and practitioners interested in using metaheuristics to solve problems in their respective disciplines. The book also serves as a repository of significant reference material, as the list of references that each chapter provides will serve as a useful source of further study.
Professor Albert Y. Zomaya
CISCO Systems Chair, Professor of Internetworking
The University of Sydney, Australia
May 2005
Preface

The present book is the result of an ambitious project to bring together the various visions of researchers in both the parallelism and metaheuristic fields, with a main focus on optimization. In recent years, devising parallel models of algorithms has been a healthy field for developing more efficient optimization procedures. What most people using these algorithms usually miss is the important idea that parallel models that run on multiple computers are quite modified versions of the sequential solvers they have in mind. This of course means not only that the resulting algorithm is faster in wall-clock time, but also that the underlying algorithm performing the actual search is a new one. These new techniques have their own dynamics and properties, many of them coming from the kind of separate decentralized search that they perform, while many others come from their parallel execution.

Creating parallel metaheuristics is just one way of improving an algorithm. Other approaches include designing hybrid algorithms (merging ideas from existing techniques), creating specialized operations for the problem at hand, and a plethora of fruitful research lines in the international arena. However, designing parallel metaheuristics carries an additional load of complexity, since doing it appropriately implies that the researcher must have background knowledge from the two combined fields: parallelism and metaheuristics. Clearly, this is difficult, since specialization is a must nowadays, and these two fields are naturally populated by often separate groups of people. Thus, many researchers in mathematics, engineering, business, physics, and pure computer science deal quite appropriately with the algorithms, but have no skills in parallelism. Complementarily, many researchers in the field of parallelism are quite skilled with parallel software tools, distributed systems, parallel languages, parallel hardware, and many other issues of high importance in complex applications; but the problem arises because these researchers often do not have deep knowledge of metaheuristics. In addition, there are also researchers who are application-driven in their daily work; they only want to apply the techniques efficiently, and do not have the time or resources (nor maybe the interest) in the algorithms themselves nor in parallelism, just in the application.

This book is intended to serve all of them, and this is why I initially said that it tries to fulfill an ambitious goal. The reader will have to judge to what extent this goal is met in the contents provided in the different chapters. Most chapters contain a methodological first part dealing with the technique, in order to settle its expected behavior and the main lines that could lead to its parallelization. In a second part, chapters discuss how parallel models can be derived for the technique to become more efficient and what the implications are for the resulting algorithms.
Finally, some experimental analysis is included in each chapter in order to help understand the advantages and limits of each proposal from a practical point of view. In this way, researchers whose specialities are in either domain can profit from the contents of each chapter. This is the way in which the central part of the book, entitled Parallel Metaheuristic Models (Chapters 5 to 17), was conceived.

There are of course some exceptions to this general chapter structure to make the book more complete. I added four initial chapters introducing the two fields (Chapters 1 to 4) and four trailing chapters dealing with theory and applications (Chapters 18 to 21). The resulting structure has three building blocks that offer the reader an opportunity to select the parts or chapters he/she is more interested in. The four initial chapters are targeted to a broad sector of readers who want to know in a short time what the most important topics and issues in metaheuristics and in parallelism are, dealt with together or separately. In the third part, an invited chapter on theoretical issues for Parallel Genetic Algorithms (a widely used metaheuristic) is also included, plus three more chapters dealing with applications of these algorithms. Since the spectrum of potential applications is daunting, I decided to devote a chapter to complex applications in general to reach a large audience, plus two additional ones on interesting, influential, and internationally funded research lines, namely telecommunications and bioinformatics.

The whole work is targeted to a wide set of readers, ranging from specialists in parallelism, optimization, and application-driven research, to graduate courses and beginners with some curiosity about the advances and latest techniques in parallel metaheuristics. Since it is an edited volume, I was able to profit from well-known international researchers as well as from new research lines on related topics started recently; this is an important added value that a non-edited book could not show.

I would like to end this introduction with my profound acknowledgment to all the authors contributing a chapter to this book, since any merit this work could deserve must be credited to them. Also, I thank the research group in my university in Malaga for all their effort and help in this project. I also appreciate the support received from Wiley during the whole editing process, as well as the decisive endorsement of Professor A. Zomaya to make this idea come true. To all of them, thank you very much. My final words are of course for my family: my wife, Ana, and my children, Enrique and Ana, the three lights that are always guiding my life, anytime, anywhere.

Malaga, Spain
May 2005
ENRIQUE ALBA
Contributors

E. ALBA — University of Malaga, Spain. Departamento de Lenguajes y Ciencias de la Computacion, E.T.S.I. Informatica (3-2-12), Campus de Teatinos, 29071 Malaga (Spain).

M. E. AYDIN — London South Bank University, UK. BCIM, 103 Borough Rd., London SE1 0AA (UK).

C. BLUM — Polytechnic University of Catalunya, Spain. Dept. de Llenguatges i Sistemes Informatics, Universitat Politecnica de Catalunya, Jordi Girona 1-3, C6119 Campus Nord, E-08034 Barcelona (Spain).

H. CANCELA — University of La Republica, Uruguay. Facultad de Ingenieria, J. Herrera y Reissig 565, Montevideo 11300 (Uruguay).

E. CANTU-PAZ — Lawrence Livermore National Laboratory, USA. Center for Applied Scientific Computing, 7000 East Avenue, L-561, Livermore, CA 94550 (USA). cantupaz@llnl.gov

F. CHICANO — University of Malaga, Spain. Departamento de Lenguajes y Ciencias de la Computacion, E.T.S.I. Informatica (3-3-4), Campus de Teatinos, 29071 Malaga (Spain).

C. COTTA — University of Malaga, Spain. Departamento de Lenguajes y Ciencias de la Computacion, E.T.S.I. Informatica (3-2-49), Campus de Teatinos, 29071 Malaga (Spain).

T. CRAINIC — Transport Research Center and University of Quebec at Montreal, Canada. Departement Management et Technologie, Universite du Quebec a Montreal, 315, rue Sainte-Catherine est, local R-2380, Montreal QC H2X 3X2 (Canada). theo@crt.umontreal.ca

B. DORRONSORO — University of Malaga, Spain. Departamento de Lenguajes y Ciencias de la Computacion, E.T.S.I. Informatica (3-3-4), Campus de Teatinos, 29071 Malaga (Spain).

F. FERNANDEZ — University of Extremadura, Spain. Centro Universitario de Merida, Universidad de Extremadura, C/ Sta. Teresa de Jornet 38, 06800 Merida (Spain).

F. GARCIA — University of La Laguna, Spain. Departamento de Estadistica, I.O. y Computacion, Universidad de La Laguna, 38271 La Laguna (Spain).

M. GARCIA — University of La Laguna, Spain. Departamento de Estadistica, I.O. y Computacion, Universidad de La Laguna, 38271 La Laguna (Spain).

M. GENDREAU — Transport Research Center and University of Montreal, Canada. Centre de Recherche sur les Transports, Universite de Montreal, C.P. 6128, succ. Centre-ville, Montreal, Quebec (Canada).

N. HAIL — Transport Research Center and University of Montreal, Canada. Centre de Recherche sur les Transports, Universite de Montreal, C.P. 6128, succ. Centre-ville, Montreal, Quebec (Canada). hail@crt.umontreal.ca

P. HANSEN — GERAD and HEC Montreal, Canada. 3000, ch. de la Cote-Sainte-Catherine, Montreal, Quebec H3T 2A7 (Canada).

S. JANSON — University of Leipzig, Germany. Parallel Computing and Complex Systems Group, Faculty of Mathematics and Computer Science, Augustusplatz 10/11, D-04109 Leipzig (Germany). janson@informatik.uni-leipzig.de

F. LUNA — University of Malaga, Spain. Departamento de Lenguajes y Ciencias de la Computacion, E.T.S.I. Informatica (3-3-4), Campus de Teatinos, 29071 Malaga (Spain).

G. LUQUE — University of Malaga, Spain. Departamento de Lenguajes y Ciencias de la Computacion, E.T.S.I. Informatica (3-3-4), Campus de Teatinos, 29071 Malaga (Spain).

J. MADERA — University of Camaguey, Cuba. Department of Computing, Circunvalacion Norte km. 5 1/2, Camaguey (Cuba).

N. MELAB — University of Lille, France. Laboratoire d'Informatique Fondamentale de Lille, UMR CNRS 8022, Cite scientifique, 59655 Villeneuve d'Ascq cedex (France).

B. MELIAN — University of La Laguna, Spain. Departamento de Estadistica, I.O. y Computacion, Universidad de La Laguna, 38271 La Laguna (Spain).

D. MERKLE — University of Leipzig, Germany. Parallel Computing and Complex Systems Group, Faculty of Mathematics and Computer Science, Augustusplatz 10/11, D-04109 Leipzig (Germany).

M. MIDDENDORF — University of Leipzig, Germany. Parallel Computing and Complex Systems Group, Faculty of Mathematics and Computer Science, Augustusplatz 10/11, D-04109 Leipzig (Germany). middendorf@informatik.uni-leipzig.de

N. MLADENOVIC — Mathematical Institute (SANU), Belgrade. Knez Mihajlova 32, 11000 Belgrade, Serbia and Montenegro.

J. A. MORENO-PEREZ — University of La Laguna, Spain. Departamento de Estadistica, I.O. y Computacion, Universidad de La Laguna, 38271 La Laguna (Spain). jamoreno@ull.es

J. M. MORENO-VEGA — University of La Laguna, Spain. Departamento de Estadistica, I.O. y Computacion, Universidad de La Laguna, 38271 La Laguna (Spain).

A. J. NEBRO — University of Malaga, Spain. Departamento de Lenguajes y Ciencias de la Computacion, E.T.S.I. Informatica (3-2-15), Campus de Teatinos, 29071 Malaga (Spain).

S. NESMACHNOW — University of La Republica, Uruguay. Facultad de Ingenieria, J. Herrera y Reissig 565, Montevideo 11300 (Uruguay).

A. OCHOA — ICIMAF, Cuba. Institute of Cybernetics, Mathematics and Physics, Calle 15 No. 551 e/ C y D, 10400 La Habana (Cuba).

J.-Y. POTVIN — Transport Research Center and University of Montreal, Canada. Dept. Informatique et Recherche Operationnelle, Bureau 3383, Pavillon Andre-Aisenstadt, CP 6128, succ. Centre-Ville, Montreal, Quebec H3C 3J7 (Canada).

M. G. C. RESENDE — AT&T Labs Research, Shannon Laboratory, USA. Algorithms and Optimization Research Department, 180 Park Avenue, Room C-241, Florham Park, NJ 07932-0971 (USA).

C. RIBEIRO — Universidade Federal Fluminense, Brazil. Department of Computer Science, Rua Passo da Patria 156, 24210-240 Niteroi, RJ (Brazil). celso@inf.puc-rio.br

A. RODRIGUEZ — University of Malaga, Spain. Dpto. de Arquitectura de Computadores, E.T.S. Ingenieria Informatica, Campus de Teatinos, 29071 Malaga (Spain).

A. ROLI — University G. D'Annunzio, Italy. Dipartimento di Scienze, Universita degli Studi "G. D'Annunzio", Viale Pindaro 42, 65127 Pescara (Italy).

G. RUDOLPH — Parsytec GmbH, Germany. Parsytec AG, Auf der Huls 183, 52068 Aachen (Germany).

G. SPEZZANO — University of Calabria, Italy. ICAR-CNR c/o DEIS, Universita della Calabria, Via Pietro Bucci cubo 41C, 87036 Rende, CS (Italy).

E. G. TALBI — University of Lille, France. Laboratoire d'Informatique Fondamentale de Lille, UMR CNRS 8022, Cite scientifique, 59655 Villeneuve d'Ascq cedex (France). El-ghazali.Talbi@lifl.fr

M. TOMASSINI — University of Lausanne, Switzerland. Information Systems Department, University of Lausanne, 1015 Dorigny-Lausanne (Switzerland).

O. TRELLES — University of Malaga, Spain. Dpto. de Arquitectura de Computadores, E.T.S. Ingenieria Informatica, Campus de Teatinos, 29071 Malaga (Spain).

L. VANNESCHI — University of Milano-Bicocca, Italy. Dipartimento di Informatica, Sistemistica e Comunicazione, Universita di Milano-Bicocca, Via Bicocca degli Arcimboldi 1, Milano (Italy).

V. YIGIT — Ataturk University, Turkey. Faculty of Engineering, Dept. of Industrial Engineering, Erzurum (Turkey). vyigit@atauni.edu.tr
Part I Introduction to Metaheuristics and Parallelism
1
An Introduction to Metaheuristic Techniques

CHRISTIAN BLUM¹, ANDREA ROLI², ENRIQUE ALBA³
¹Universitat Politecnica de Catalunya, Spain
²Universita degli Studi "G. D'Annunzio", Italy
³Universidad de Malaga, Spain
1.1 INTRODUCTION
In optimization we generally deal with finding among many alternatives a best (or good enough) solution to a given problem. Optimization problems occur everywhere in our daily life. Each one of us is constantly solving optimization problems, such as finding the shortest way from home to the workplace subject to traffic constraints, or organizing our agenda. (Most) human brains are pretty good at efficiently finding solutions to these daily problems. The reason is that they are still tractable, which means that their dimension is small enough to process them. However, these types of problems also arise at much bigger scales, such as, for example, making the most beneficial use of the airplane fleet of an airline with the aim of saving fuel and parking costs. These kinds of problems are usually so high-dimensional and complex that computer algorithms are needed for tackling them. Optimization problems can be modelled by means of a set of decision variables with their domains and constraints concerning the variable settings. They naturally divide into three categories: (i) the ones with exclusively discrete variables (i.e., the domain of each variable consists of a finite set of discrete values), (ii) the ones with exclusively continuous variables (i.e., continuous variable domains), and (iii) the ones with discrete as well as continuous variables. As metaheuristics were originally developed for optimization problems from class (i), we restrict ourselves in this introduction to this class of problems, which is also called the class of combinatorial optimization problems, or CO problems. However, much can be said and extended to continuous and other similar domains. According to Papadimitriou and Steiglitz [114], a CO problem P = (S, f) is an optimization problem in which we are given a finite set of objects S and an objective function f : S → R⁺ that assigns a positive cost value to each of the objects s ∈ S.
The goal is to find an object of minimal cost value.¹ The objects are typically integer numbers, subsets of a set of items, permutations of a set of items, or graph structures. An example is the well-known travelling salesman problem (TSP [92]). Other examples of CO problems are assignment problems, timetabling, and scheduling problems. Due to the practical importance of CO problems, many algorithms to tackle them have been developed. These algorithms can be classified as either complete or approximate algorithms. Complete algorithms are guaranteed to find for every finite-size instance of a CO problem an optimal solution in bounded time (see [114, 111]). Yet, for CO problems that are NP-hard [63], no polynomial time algorithm exists, assuming that P ≠ NP. Therefore, complete methods need exponential computation time in the worst case. This often leads to computation times too high for practical purposes. Thus, the use of approximate methods to solve CO problems has received more and more attention in the last 30 years. In approximate methods we sacrifice the guarantee of finding optimal solutions for the sake of getting good solutions in a significantly reduced amount of time. Among the basic approximate methods we usually distinguish between constructive heuristics and local search methods.
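As a concrete (and deliberately tiny) illustration of the formal definition P = (S, f), the following Python sketch enumerates a toy TSP-like search space and evaluates its cost function; the four-city instance and the variable names are invented for illustration only and are not taken from the chapter:

import itertools
import math

# Hypothetical 4-city instance: S is the set of tours, f their total length.
coords = [(0, 0), (0, 3), (4, 0), (4, 3)]

def f(tour):
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

S = list(itertools.permutations(range(len(coords))))  # the finite set of objects
best = min(S, key=f)                                   # complete enumeration
print(best, round(f(best), 2))

Complete enumeration as in this sketch is exactly what stops being feasible as instances grow, which is what motivates the approximate methods discussed next.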
1.1.1 Constructive Heuristics

Constructive heuristics are typically the fastest approximate methods. They generate solutions from scratch by adding opportunely defined solution components to an initially empty partial solution. This is done until a solution is complete or other stopping criteria are satisfied. For the sake of simplicity, we henceforth assume that a solution construction stops in case the current (partial) solution cannot be further extended. This happens when no completion exists such that the completed solution is feasible, i.e., it satisfies the problem constraints. In the context of constructive heuristics, solutions and partial solutions are sequences (c_1, ..., c_k) composed of solution components c_j from a finite set of solution components C (where |C| = n). This kind of solution is throughout the chapter denoted by s, respectively s^p in the case of partial solutions. Constructive heuristics first have to specify the set of possible extensions for each feasible (partial) solution s^p. This set, henceforth denoted by 𝔑(s^p), is a subset of C \ {c | c ∈ s^p}.² At each construction step one of the possible extensions is chosen until 𝔑(s^p) = ∅, which means either that s^p is a solution or that s^p is a partial solution that cannot be extended to a feasible solution. The algorithmic framework of a constructive heuristic is shown in Algorithm 1. A notable example of a constructive heuristic is a greedy heuristic, which implements procedure ChooseFrom(𝔑(s^p)) by applying a weighting function.
¹Note that minimizing over an objective function f is the same as maximizing over −f. Therefore, every CO problem can be described as a minimization problem.
²Note that constructive heuristics exist that may add several solution components at the same time to a partial solution. However, for the sake of simplicity, we restrict our description of constructive heuristics to the ones that add exactly one solution component at a time.
Algorithm 1 Constructive heuristic
  s^p ← ()
  Determine 𝔑(s^p)
  while 𝔑(s^p) ≠ ∅ do
    c ← ChooseFrom(𝔑(s^p))
    s^p ← extend s^p by appending solution component c
    Determine 𝔑(s^p)
  end while
  output: constructed solution
A weighting function is a function that, sometimes depending on the current (partial) solution, assigns at each construction step a heuristic value η(c) to each solution component c ∈ 𝔑(s^p). Greedy heuristics choose at each step one of the extensions with the highest value. For example, a greedy heuristic for the TSP is the Nearest Neighbor Heuristic. The set of solution components is the set of nodes (cities) in G = (V, E). The algorithm starts by selecting a city i at random. Then, the current partial solution s^p is extended at each of the n − 1 construction steps by adding the closest city j ∈ 𝔑(s^p) = V \ s^p. Note that in the case of the Nearest Neighbor Heuristic the heuristic values, which are chosen as the inverse of the distances between the cities, do not depend on the current partial solution. Therefore, the weighting function that assigns the heuristic values is called static. In cases in which the heuristic values depend on the current partial solution, the weighting function is called dynamic.
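The following minimal Python sketch instantiates Algorithm 1 with the static weighting function of the Nearest Neighbor Heuristic just described; the coordinate list and the starting city are hypothetical placeholders chosen only for illustration:

import math

def nearest_neighbor_tour(coords, start=0):
    """Greedy construction: repeatedly append the closest unvisited city."""
    n = len(coords)
    unvisited = set(range(n)) - {start}   # plays the role of the extension set
    tour = [start]                        # the partial solution s^p
    while unvisited:                      # stop when no extension is left
        last = tour[-1]
        # static weighting function: inverse distance to the last added city
        nxt = min(unvisited, key=lambda j: math.dist(coords[last], coords[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Hypothetical 5-city instance (coordinates chosen only for illustration).
cities = [(0, 0), (1, 5), (2, 2), (6, 1), (5, 4)]
print(nearest_neighbor_tour(cities, start=0))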
1.1.2 Local Search Methods

As mentioned above, constructive heuristics are often very fast, yet they often return solutions of inferior quality when compared to local search algorithms. Local search algorithms start from some initial solution and iteratively try to replace the current solution by a better one in an appropriately defined neighborhood of the current solution, where the neighborhood is formally defined as follows:

Definition 1  A neighborhood structure is a function N : S → 2^S that assigns to every s ∈ S a set of neighbors N(s) ⊆ S. N(s) is called the neighborhood of s.

Often, neighborhood structures are implicitly defined by specifying the changes that must be applied to a solution s in order to generate all its neighbors. The application of such an operator that produces a neighbor s' ∈ N(s) of a solution s is commonly called a move. A neighborhood structure together with a problem instance defines the topology of a so-called search (or fitness) landscape [134, 84, 61, 123]. A search landscape can be visualized as a labelled graph in which the nodes are solutions (labels indicate their objective function value) and arcs represent the neighborhood relation between solutions. A solution s* ∈ S is called a globally minimal solution (or global minimum) if for all s ∈ S it holds that f(s*) ≤ f(s). The set of all globally minimal solutions is henceforth denoted by S*. The introduction of a neighborhood structure enables us to additionally define the concept of locally minimal solutions.
Algorithm 2 Iterative improvement local search
  s ← GenerateInitialSolution()
  while ∃ s' ∈ N(s) such that f(s') < f(s) do
    s ← ChooseImprovingNeighbor(N(s))
  end while
  output: s
Definition 2  A locally minimal solution (or local minimum) with respect to a neighborhood structure N is a solution ŝ such that ∀ s ∈ N(ŝ): f(ŝ) ≤ f(s). We call ŝ a strict locally minimal solution if ∀ s ∈ N(ŝ): f(ŝ) < f(s).

The most basic local search method is usually called iterative improvement local search, since each move is only performed if the resulting solution is better than the current solution. The algorithm stops as soon as it finds a local minimum. The high level algorithm is sketched in Algorithm 2. There are two major ways of implementing the function ChooseImprovingNeighbor(N(s)). The first way is called first-improvement. A first-improvement function scans the neighborhood N(s) and returns the first solution that is better than s. In contrast, a best-improvement function exhaustively explores the neighborhood and returns one of the solutions with the lowest objective function value. An iterative improvement procedure that uses a first-improvement function is called first-improvement local search, respectively best-improvement local search (or steepest descent local search) in the case of a best-improvement function. Both methods stop at local minima. Therefore, their performance strongly depends on the definition of the neighborhood structure N.
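To make the best-improvement variant of Algorithm 2 concrete, here is a small Python sketch over a generic neighborhood function; the toy objective and the unit-step neighborhood used in the demo are invented purely for illustration:

def best_improvement_local_search(s, objective, neighbors):
    """Iterative improvement: move to the best neighbor while it improves f."""
    while True:
        candidates = [(objective(n), n) for n in neighbors(s)]
        if not candidates:
            return s                      # no neighbors at all
        best_value, best_neighbor = min(candidates, key=lambda t: t[0])
        if best_value >= objective(s):    # local minimum reached
            return s
        s = best_neighbor                 # accept the improving move

# Illustrative toy problem: minimize f(x) = (x - 7)^2 over the integers,
# with the neighborhood N(x) = {x - 1, x + 1}.
f = lambda x: (x - 7) ** 2
N = lambda x: [x - 1, x + 1]
print(best_improvement_local_search(0, f, N))   # -> 7

A first-improvement variant would simply return the first neighbor found with a lower objective value instead of scanning the whole neighborhood.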
1.1.3 Metaheuristics

In the 1970s, a new kind of approximate algorithm emerged which basically tries to combine basic heuristic methods in higher level frameworks aimed at efficiently and effectively exploring a search space. These methods are nowadays commonly called metaheuristics. The term metaheuristic, first introduced in [66], derives from the composition of two Greek words. Heuristic derives from the verb heuriskein (ευρίσκειν), which means "to find", while the suffix meta means "beyond, in an upper level". Before this term was widely adopted, metaheuristics were often called modern heuristics [122]. The class of metaheuristic algorithms includes³, but is not restricted to, ant colony optimization (ACO), evolutionary computation (EC) including genetic algorithms (GAs), iterated local search (ILS), simulated annealing (SA), and tabu search (TS). For books and surveys on metaheuristics see [19, 69, 148].

³In alphabetical order.
The different descriptions of metaheuristics found in the literature allow us to extract some fundamental properties by which metaheuristics are characterized:

• Metaheuristics are strategies that "guide" the search process.
• The goal is to efficiently explore the search space in order to find (near-)optimal solutions.
• Techniques which constitute metaheuristic algorithms range from simple local search procedures to complex learning processes.
• Metaheuristic algorithms are approximate and usually non-deterministic.
• They may incorporate mechanisms to avoid getting trapped in confined areas of the search space.
• The basic concepts of metaheuristics can be described on an abstract level (i.e., not tied to a specific problem).
• Metaheuristics are not problem-specific.
• Metaheuristics may make use of domain-specific knowledge in the form of heuristics that are controlled by the upper level strategy.
• Today's more advanced metaheuristics use search experience (embodied in some form of memory) to guide the search.
In short, we may characterize metaheuristics as high level strategies for exploring search spaces by using different methods. Of great importance hereby is that a dynamic balance is given between diversification and intensification. The term diversification generally refers to the exploration of the search space, whereas the term intensification refers to the exploitation of the accumulated search experience. These terms stem from the tabu search field [70], and it is important to clarify that the terms exploration and exploitation are sometimes used instead, for example in the evolutionary computation field [51]. The balance between diversification and intensification is important, on one side to quickly identify regions in the search space with high quality solutions and on the other side not to waste too much time in regions of the search space which either are already explored or do not provide high quality solutions. Blum and Roli elaborated on the importance of the two concepts in their recent survey on metaheuristics [19]. The search strategies of different metaheuristics are highly dependent on the philosophy of the metaheuristic itself. There are several different philosophies apparent in the existing metaheuristics. Some of them can be seen as "intelligent" extensions of local search algorithms. The goal of this kind of metaheuristic is to escape from local minima in order to proceed in the exploration of the search space and to move on to find other, hopefully better, local minima. This is for example the case in tabu search, iterated local search, variable neighborhood search, and simulated annealing. These metaheuristics (also called trajectory methods) work on one or several neighborhood structure(s) imposed on the search space. We can find a different philosophy in algorithms such as ant colony optimization and evolutionary computation. They incorporate a learning component in the sense that they implicitly or explicitly try to learn correlations between decision variables to identify high quality areas in the search space. This kind of metaheuristic performs, in a sense, a biased sampling of the search space.
For instance, in evolutionary computation this is achieved by recombination of solutions and in ant colony optimization by sampling the search space at each iteration according to a probability distribution. There are different ways to classify and describe metaheuristic algorithms. Depending on the characteristics selected to differentiate among them, several classifications are possible, each of them being the result of a specific viewpoint (see, for example, [136]). The classification into nature-inspired vs. non nature-inspired metaheuristics, into memory-based vs. memory-less methods, or into methods that either use a dynamic or a static objective function, is possible. In this chapter we describe the most important metaheuristics according to the single-point vs. population-based search classification, which divides metaheuristics into trajectory methods and population-based methods. This choice is motivated by the fact that this categorization permits a clearer description of the algorithms. Moreover, a successful hybridization is obtained by the integration of single-point search algorithms in population-based ones. As mentioned at the beginning of this section, metaheuristic algorithms were originally developed for solving CO problems. However, in the meanwhile they have also been successfully applied to continuous optimization problems. Examples are simulated annealing algorithms such as [128] or differential evolution [135] and [4, 25, 27] from the evolutionary computation field. Tabu search algorithms such as [13, 26] were among the first metaheuristic algorithms to be applied to continuous problems. Among the most recent metaheuristic approaches are ant colony optimization algorithms such as [46, 99, 131]. Some of the above mentioned algorithms are based on the well-known Nelder-Mead simplex algorithm for continuous optimization [110], while others are developed after new ideas on real parameter management coming from the mathematical programming field. However, for the rest of this introduction we will focus on metaheuristic approaches for CO problems, since including a discussion of real optimization in each section would make the chapter difficult to organize and read.
The structure of this chapter is as follows. Section 1.2 and Section 1.3 are devoted to a description of today's most important metaheuristics. Section 1.2 describes the most relevant trajectory methods and in Section 1.3 we outline population-based methods. In Section 1.4 we give an overview of the different decentralized methods, which are metaheuristics without a central control, and we conclude in Section 1.5 with an overview of metaheuristic hybridizations.
1.2 TRAJECTORY METHODS

In this section we outline metaheuristics referred to as trajectory methods. The term trajectory methods is used because the search process performed by these methods is characterized by a trajectory in the search space. Most of these methods are extensions of simple iterative improvement procedures (see Section 1.1.2), whose performance is usually quite unsatisfactory.
9
of simple iterative improvement procedures (see Section 1.1.2), whose performance is usually quite unsatisfactory. They incorporate techniques that enable the algorithm to escape from local minima. This implies the necessity of termination criteria other than simply reaching a local minimum. Commonly used termination criteria are a maximum CPU time, a maximum number of iterations, a solution s of sufficient quality, or reaching the maximum number of iterations without improvement. 1.2.1 Simulated Annealing SimulatedAnnealing (SA) is commonly said to be the oldest among the metaheuristics and surely one of the first algorithms that had an explicit strategy to escape from local minima. The origins of the algorithm are in statistical mechanics (see the Metropolis algorithm [loll). The idea of SA was provided by the annealing process of metal and glass, which assume a low energy configurationwhen cooled with an appropriate cooling schedule. SA was first presented as a search algorithm for CO problems in [87] and [23]. In order to avoid getting trapped in local minima, the fundamental idea is to allow moves to solutions with objective function values that are worse than the objective function value of the current solution. Such a move is often called an uphill move. At each iteration a solution s' E N ( s )is randomly chosen. If s' is better than s (i.e., has a lower objective function value), then s' is accepted as new current solution. Otherwise, s' is accepted with a probability which is a function of a temperature parameter Tk and f(s') - f(s). Usually this probability is computed following the Boltzmann distribution:
The dynamic process described by SA is a Murkov chain [52],as it follows a trajectory in the state space in which the successor state is chosen depending only on the incumbent one. This means that basic SA is memory-less. However, the use of memory can be beneficial for SA approaches (see for example [24]). The algorithmic framework of SA is described in Algorithm 3. The components are explained in more detail in the following.
GeneratelnitialSolution(): The algorithm starts by generating an initial solution that may be randomly or heuristically constructed. SetlnitialTemperature(): The initial temperature is chosen such that the probability for an uphill move is quite high at the start of the algorithm. AdaptTemperature(Tk): The temperature Tk is adapted at each iteration according to a cooling schedule (or cooling scheme). The cooling schedule defines the value of Tk at each iteration k. The choice of an appropriate cooling schedule is crucial for the performance of the algorithm. At the beginning of the search the probability of accepting uphill moves should be high. Then, this probability should be gradually
10
AN INTRODUCTION TO METAHEURISTIC TECHNIQUES
Algorithm 3 Simulated Annealing (SA) s t GeneratelnitialSolution() k+O Tk t SetlnitialTemperature() while termination conditions not met do s’ t PickNeighborAtRandom(N(s ) ) if ( f ( s ’ ) < f ( s ) )then s + s’ { s’ replaces s} else Accept s’ as new solution with probability p(s’ 1 Tk,S ) end if AdaptTem perature(Tk) k c k + l end while output: best solution found
03.(1.1))
decreased during the search. Note that this is not necessarily done in a monotonic fashion. Theoretical results on non-homogeneous Markov chains [ 11 state that under particular conditions on the cooling schedule, the algorithm converges in probability to a global minimum for k + oa. More precisely:
3 T E R+
s.t.
lim p(globa1 minimum found after k steps) = 1
k-oo
k=l
A particular cooling schedule that fulfills the hypothesis for the convergence is the one that follows a logarithmic law. Hereby, Tk is determined as T k + (where c is a constant). Unfortunately, cooling schedules which guarantee the convergence to a global optimum are not feasible in applications, because they are too slow for practical purposes. Therefore, faster cooling schedules are adopted in applications. One of the most popular ones follows a geometric law: Tk c Q . T k - 1 , where Q E (0, l),which corresponds to an exponential decay of the temperature. The cooling schedule can be used for balancing between diversification and intensification. For example, at the beginning of the search, T , might be constant or linearly decreasing in order to sample the search space; then, T k might follow a rule such as the geometric one in order to make the algorithm converge to a local minimum at the end of the search. More successhl variants are non-monotonic cooling schedules (e.g., see [94, 1 131). Non-monotonic cooling schedules are characterized by alternating phases of cooling and reheating, thus providing an oscillating balance between diversification and intensification. The cooling schedule and the initial temperature should be adapted to the particular problem instance considered, since the cost of escaping from local minima depends
&
TRAJECTORY METHODS
11
on the structure of the search landscape. A simple way of empirically determining the starting temperature TOis to initially sample the search space with a random walk to roughly evaluate the average and the variance of objective function values. Based on the samples the starting temperature can be determined such that uphill moves have a high probability. But also more elaborate schemes can be implemented [82].
SA has been applied to many CO problems, such as the quadratic assignment problem (QAP) [30] and the job shop scheduling (JSS) problem [ 1441. References to other applications can be found in [2, 55, 821. SA is nowadays more used as a component in metaheuristics, rather than applied as a stand-alone search algorithm. Variants of SA called Threshold Accepting and The Great Deluge Algorithm were presented in [48] and [47], respectively. 1.2.2 Tabu Search Tabu Search (TS) is one of the most successful metaheuristics for the application to CO problems. The basic ideas of TS were introduced in [66], based on earlier ideas formulated in [65].4 A description of the method and its concepts can be found in [70]. The basic idea of TS is the explicit use of search history, both to escape from local minima and to implement an explorative strategy. A simple TS algorithm is based on a best-improvement local search (see Section 1.1.2) and uses a short term memory to escape from local minima and to avoid cycles5 The short term memory is implemented as a tabu list TL that keeps track of the most recently visited solutions and excludes them from the neighborhood of the current solution. At each iteration, the best solution among the allowed ones is chosen as the new current solution. Furthermore, this solution is added to the tabu list. The implementation of short term memory in terms of a list that contains complete solutions is not practical, because managing a list of complete solutions is highly inefficient. Therefore, instead of the solutions themselves, those solution components are stored in the tabu lists that are involved in moves. Since different kinds of moves that work on different types of solution components can be considered, a tabu list is usually introduced for each type of solution component. The different types of solution components and the correspondingtabu lists define the tabu conditions which are used to filter the neighborhood of a solution and generate the allowed set n/,( 9 ) . Storing solution components instead of complete solutions is much more efficient, but it introduces a loss of information, as forbidding, for example, the introduction of a certain solution component in a solution means assigning the tabu status to probably more than one solution. Thus, it is possible that unvisited solutions of high quality are excluded from the allowed set. To overcome this problem, aspiration criteria are defined which allow to include a solution in the allowed set even if it is forbidden by
'Related ideas were labelled steepest ascent/mildest descent method in [76] ' A cycle is a sequence of moves that constantly repeats itself.
12
AN INTRODUCTION TO METAHEURlSTlC TECHNIQUES
Algorithm 4 Tabu Search (TS) s c GeneratelnitialSolution() InitializeTabuLists(TLI, . . . , TL,) while termination conditions not met do Na(s)c {s’ E N ( s ) I s’ does not violate a tabu condition, or it satisfies at least one aspiration condition} s’ +- argmin{f(s’l) I s” E N,(s)} UpdateTabuLists( TLI , . . . , TLp,s,s‘) s +- sf {i.e., s’ replaces s} end while output: best solution found tabu conditions. The most commonly used aspiration criterion applies to solutions which are better than the best solution found so far. This tabu search algorithm is shown in Algorithm 4. The use of a tabu list prevents from returning to recently visited solutions; therefore it prevents from endless cycling6 and forces the search to accept even uphill moves. The length I of the tabu list-known in the literature as the tabu tenure-controls the memory of the search process. With small tabu tenures the search will concentrate on small areas of the search space. In contrast, a large tabu tenure forces the search process to explore larger regions, because it forbids revisiting a higher number of solutions. The tabu tenure can be varied during the search, leading to more robust algorithms. An example can be found in [ 1391, where the tabu tenure is periodically reinitialized at random from the interval [Imln, lmaz]. A more advanced use of a dynamic tabu tenure is presented in [ 131, where the tabu tenure is increased if there is evidence for repetitions of solutions (thus a higher diversification is needed), while it is decreased if there are no improvements (thus intensification should be boosted). More advanced ways of applying dynamic tabu tenures are described in [67]. Tabu lists, which are usually identified with the use of short term memory, are only one of the possible ways of taking advantage of the history of the search. Information collected during the whole search process can be used in many other ways, especially for a strategic guidance of the algorithm. This kind of long term memory is usually added to TS by referring to four principles: recency,frequency, quality, and influence. Recency-based memory records for solutions (or solution components)the most recent iteration they were involved in. Orthogonally, frequency-basedmemory keeps track of how many times a solution has been visited, respectively how many times a solution component was part of a visited solution. This information identifies the regions of the search space in which the search was confined or in which it stayed for a high number of iterations. This kind of information about the past is usually exploited to diversify the search. The third principle (i.e., quality) refers to the accumulation and extraction of information from the search history in order 6Cycles of higher period are possible. since the tabu list has a finite length 1 which is smaller than the cardinality of the search space.
TRAJECTORY METHODS
13
Algorithm 5 Greedy Randomized Adaptive Search Procedure (GRASP) while termination conditions not met do {see Algorithm 6) s t ConstructGreedyRandomizedSolution() ApplyLocalSearch(s) end while output: best solution found
to identify solution components that contribute to good solutions. This information can be usefully integrated in solution constructions or in the evaluation of moves. Other metaheuristics (e.g., ant colony optimization) explicitly use this principle to learn about good combinations of solution components. Finally, influence concerns certain choices that were made during the search process. Sometimes it can be beneficial to know which choices were the most critical ones. In general, the TS field is a rich source of ideas. Many of these ideas and strategies have been and are currently adopted by other metaheuristics. TS has been applied to most CO problems; examples of successhl applications are the Robust Tabu Search to the QAF' [ 1391, the Reactive Tabu Search to the maximum satisfiability (MAXSAT) problem [12, 1301, and to assignment problems [34]. TS approaches dominate the job shop scheduling (JSS) problem area (see, for example, [ 1121) and the vehicle routing (VR) area [64]. Further references of applications can be found in [70]. 1.2.3 Explorative Local Search Methods
In this section we present more recently proposed trajectory methods. These are the greedy randomized adaptive search procedure (GRASP), variable neighborhood search (VNS), guided local search (GLS), and iterated local search (ILS). 1.2.3.I Greedy Randomized Adaptive Search Procedure. The greedy randomized adaptive search procedure (GRASP), see [53, 1171, is a simple metaheuristic that combines constructive heuristics and local search. Its structure is sketched in Algorithm 5 . GRASP is an iterative procedure, composed of two phases: solution construction and solution improvement. The best found solution is returned upon termination of the search process. The solution construction mechanism (see Algorithm 6) is a randomized constructive heuristic. As outlined in Section 1.1.1, a constructive heuristic generates a solution step-by-step by adding one new solution component from a finite set C (where JCl = n) of solution components to the current partial solution 57'. The solution component that is added at each step is chosen at random from a list that is called the restricted candidate list. This list is a subset of %(sP), the set of allowed solution components, and is denoted by RCL. In order to generate this list, the solu-
Algorithm 6 Greedy Randomized Solution Construction
{Remember that s^p denotes a partial solution}
s^p ← ()
α ← DetermineLengthOfRestrictedCandidateList()
while N(s^p) ≠ ∅ do
  RCL ← GenerateRestrictedCandidateList(α, N(s^p))
  c ← PickAtRandom(RCL)
  s^p ← extend s^p by adding solution component c
end while

In order to generate this list, the solution components in N(s^p) are ranked by means of a weighting function. Then, RCL is composed of the α highest ranked solution components. The length α of the restricted candidate list determines the strength of the heuristic bias that is introduced by the weighting function. In the extreme case of α = 1, the highest weighted solution component is added deterministically; thus the construction would be equivalent to a deterministic greedy heuristic. In contrast, the setting α = |N(s^p)| at each construction step leads to the construction of a random solution. Therefore, α is a critical parameter which influences the sampling of the search space. In [117] the most important schemes to define α are listed. The simplest scheme is, trivially, to keep α constant; alternatively it can also be changed at each iteration, either randomly or by means of an adaptive scheme. The second phase of the algorithm is a local search method, which may be a basic local search algorithm such as iterative improvement or a more advanced technique such as SA or TS. GRASP can be effective if two conditions are satisfied:
• the solution construction mechanism samples the most promising regions of the search space;
• the solutions constructed by the constructive heuristic enable local search to reach different local minima.
The first condition can be met by the choice of an effective constructive heuristic and an appropriate length of the candidate list, whereas the second condition can be met by choosing the constructive heuristic and the local search in a way such that they fit well. The description of GRASP as given above indicates that a basic GRASP does not use the history of the search process.7 The only memory requirement is for storing the problem instance and for keeping the best-so-far solution (i.e., the best solution found since the start of the algorithm). This is one of the reasons why GRASP is often outperformed by other metaheuristics. However, due to its simplicity, it is generally very fast and it is able to produce quite good solutions in a very short amount of computation time. Furthermore, it can be easily integrated into other search techniques.

7 However, some extensions in this direction are cited in [117]; an example of a metaheuristic method using an adaptive greedy procedure depending on the search history is Squeaky Wheel Optimization (SWO).
Algorithm 7 Variable Neighborhood Search (VNS)
Select a set of neighborhood structures N_k, k = 1, ..., k_max
s ← GenerateInitialSolution()
while termination conditions not met do
  k ← 1
  while k < k_max do
    s' ← PickAtRandom(N_k(s)) {also called the shaking phase}
    s'' ← LocalSearch(s')
    if f(s'') < f(s) then
      s ← s''
      k ← 1
    else
      k ← k + 1
    end if
  end while
end while
output: best solution found
Among the applications of GRASP we mention the JSS problem [16], the graph planarization problem [125], and assignment problems [118]. A detailed and annotated bibliography references many more applications [54].
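As an illustration of the construction-plus-improvement cycle of Algorithms 5 and 6, the sketch below applies a GRASP-like procedure to a small symmetric TSP instance; the nearest-city weighting function, the fixed value of α, and the 2-opt improvement step are illustrative assumptions, not prescriptions from the GRASP literature.

import random

def grasp_tsp(dist, alpha=3, iterations=50, seed=0):
    # GRASP sketch: randomized greedy construction + local search, keep the best tour.
    rng = random.Random(seed)
    n = len(dist)
    tour_len = lambda t: sum(dist[t[i]][t[(i + 1) % n]] for i in range(n))
    best = None
    for _ in range(iterations):
        # Construction: repeatedly pick at random among the alpha closest unvisited cities (the RCL).
        tour = [rng.randrange(n)]
        while len(tour) < n:
            remaining = [c for c in range(n) if c not in tour]
            rcl = sorted(remaining, key=lambda c: dist[tour[-1]][c])[:alpha]
            tour.append(rng.choice(rcl))
        # Improvement: simple 2-opt iterative improvement.
        improved = True
        while improved:
            improved = False
            for i in range(1, n - 1):
                for j in range(i + 1, n):
                    cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                    if tour_len(cand) < tour_len(tour):
                        tour, improved = cand, True
        if best is None or tour_len(tour) < tour_len(best):
            best = tour
    return best

# Tiny example instance (the distances are illustrative).
d = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
print(grasp_tsp(d))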
1.2.3.2 Variable Neighborhood Search. Variable Neighborhood Search (VNS) is a metaheuristic proposed in [78], which explicitly applies strategies for swapping between different neighborhood structures from a predefined finite set. The algorithm is very general and many degrees of freedom exist for designing variants and particular instantiations. The framework of the VNS algorithm is shown in Algorithm 7. At the initialization of the algorithm, a set of neighborhood structures has to be defined. These neighborhood structures can be arbitrarily chosen. Then, an initial solution is generated, the neighborhood index is initialized, and the algorithm iterates until a termination condition is met. Each iteration consists of three phases: shaking, local search, and move. In the shaking phase a solution s' in the k-th neighborhood of the current solution s is randomly selected. Solution s' is then used as the starting point for a local search procedure, which may use any neighborhood structure and is not restricted to the set of neighborhood structures N_k, k = 1, ..., k_max. At the termination of the local search process, the new solution s'' is compared with s and, if it is better, it replaces s and the algorithm proceeds with k = 1. Otherwise, k is incremented and a new shaking phase starts using a different neighborhood. The objective of the shaking phase is to select a solution from some neighborhood of the current local minimum that is a good starting point for the local search. The starting point should enable local search to reach a different local minimum than the current one, but should not be "too far" from s, otherwise the algorithm would degenerate into a simple multi-start local search with random starting solutions.
Therefore, choosing s’ in the neighborhood of the current solution is likely to produce a solution that maintains some features of the current one. The process of changing neighborhoods in case of no improvements corresponds to a diversification of the search. The effectiveness of this dynamic strategy for swapping between neighborhood structures can be explained by the fact that a “bad” place on the search landscape given by a certain neighborhood structure could be a “good” place on the search landscape given by another neighborhood structure.8 Moreover, a solution that is locally optimal with respect to a neighborhood is probably not locally optimal with respect to another neighborhood.
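The following Python sketch instantiates Algorithm 7 for bit strings, using the k-flip neighborhoods N_k for shaking and single-bit flips for the inner local search; the choice of neighborhood structures, the objective function, and all parameter values are illustrative assumptions.

import random

def vns(f, s, n_bits, k_max=4, max_iters=100, seed=0):
    # Basic VNS sketch: shake in N_k, run local search, move only on improvement.
    rng = random.Random(seed)

    def shake(sol, k):
        # N_k: flip k randomly chosen positions of the current solution.
        sol = list(sol)
        for i in rng.sample(range(n_bits), k):
            sol[i] = 1 - sol[i]
        return tuple(sol)

    def local_search(sol):
        # Iterative improvement over single-bit flips.
        improved = True
        while improved:
            improved = False
            for i in range(n_bits):
                cand = sol[:i] + (1 - sol[i],) + sol[i+1:]
                if f(cand) < f(sol):
                    sol, improved = cand, True
        return sol

    for _ in range(max_iters):
        k = 1
        while k <= k_max:
            s2 = local_search(shake(s, k))
            if f(s2) < f(s):
                s, k = s2, 1      # improvement: recentre the search and restart with N_1
            else:
                k += 1            # no improvement: try the next, larger neighborhood
    return s

random.seed(0)
start = tuple(random.randint(0, 1) for _ in range(16))
print(vns(lambda x: sum(x), start, 16))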
VNS and its variants have been successfully applied to graph-based CO problems such as the p-median problem [77], the degree constrained minimum spanning tree problem [126], the Steiner tree problem [147], and the k-cardinality tree (KCT) problem [105, 143]. References to more applications can be found in [78]. 1.2.3.3 Guided Local Search. Guided Local Search (GLS) [146] applies a strategy for escaping from local minima that is very different from the strategies employed by tabu search or variable neighborhood search. This strategy consists in dynamically changing the objective function, which results in a change of the search landscape. The aim is to make the current local minimum gradually "less desirable" over time. The dynamic change of the objective function in GLS is based on a set of m solution features sf_i, i = 1, ..., m. A solution feature may be any kind of property or characteristic that can be used to discriminate between solutions. For example, solution features in the TSP could be the edges between the cities. To each solution feature sf_i is assigned a cost value c_i, which gives a measure of the contribution of solution feature sf_i to the objective function f(·). In the TSP example, the cost of a solution feature could be the length of the corresponding edge. An indicator function I(i, s) indicates whether the solution feature sf_i is present in a solution s:
I(i, s) = 1 if feature sf_i is present in solution s, and 0 otherwise.
During the whole run of the algorithm, the original objective function f(·) is replaced by a new objective function f'(·) that is obtained by adding to f(·) a term that depends on the m solution features:

f'(s) = f(s) + λ · Σ_{i=1,...,m} p_i · I(i, s),
where p_i, i = 1, ..., m, are the penalty values and λ > 0 is a constant that determines the influence of the penalty term. The penalty values are weights of the solution features: the higher p_i, the higher is the cost of having that feature in a solution.

8 A "good" place in the search space is an area from which a good local minimum can be reached.
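A minimal Python sketch of the two GLS ingredients, the penalized objective f'(·) and the penalty update, is given below; the feature set (tour edges), the costs, and the value of λ are illustrative assumptions, and the update function anticipates the utility-based rule that is described after Algorithm 8 below.

def augmented_objective(f_value, features_present, penalties, lam):
    # f'(s) = f(s) + lambda * sum of the penalties of the features present in s.
    return f_value + lam * sum(penalties[i] for i in features_present)

def update_penalties(features_present, costs, penalties):
    # Increase by one the penalty of the features of maximal utility c_i / (1 + p_i).
    utilities = {i: costs[i] / (1 + penalties[i]) for i in features_present}
    u_max = max(utilities.values())
    for i, u in utilities.items():
        if u == u_max:
            penalties[i] += 1

# Illustrative TSP-like example: features are edges, costs are edge lengths.
costs = {("a", "b"): 5.0, ("b", "c"): 12.0, ("c", "a"): 7.0}
penalties = {e: 0 for e in costs}
tour_edges = [("a", "b"), ("b", "c"), ("c", "a")]
update_penalties(tour_edges, costs, penalties)        # penalizes the longest edge first
print(augmented_objective(24.0, tour_edges, penalties, lam=0.3))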
Algorithm 8 Guided Local Search (GLS)
s ← GenerateInitialSolution()
p ← (0, ..., 0)
while termination conditions not met do
  ŝ ← LocalSearch(s, f')
  UpdatePenaltyVector(p, ŝ)
  s ← ŝ
end while
output: best solution found
The algorithm (see Algorithm 8) works as follows. First, an initial solution s is generated. Then, at each iteration a local search procedure is applied to the current solution until a local minimum ŝ is reached. Note that this local search procedure uses the modified objective function. The function UpdatePenaltyVector(p, ŝ) modifies the penalty vector p = (p_1, ..., p_m) depending on ŝ. First, the so-called utility Util(i, ŝ) of each solution feature is determined:
Util(i, ŝ) = I(i, ŝ) · c_i / (1 + p_i).
This equation shows that the higher the cost, the higher the utility of a feature. Nevertheless, the cost is scaled by the penalty value to prevent the algorithm from being totally biased toward the cost and to make it sensitive to the search history. Then the penalty values of the solution features with maximum utility are updated as follows: p_i ← p_i + 1. (1.4) The penalty value update procedure can be supplemented by a multiplicative rule of the form p_i ← p_i · α, where α ∈ (0, 1). Such an update rule is generally applied with a lower frequency than the one of Equation 1.4 (e.g., every few hundred iterations). The aim of this update is the smoothing of the weights of penalized features so as to prevent the search landscape from becoming too rugged. It is important to note that the penalty value update rules are often very sensitive to the problem instance under consideration. GLS has been successfully applied to the weighted MAXSAT [103], the VR problem [86], the TSP, and the QAP [146]. 1.2.3.4 Iterated Local Search. Iterated Local Search (ILS) [136, 93, 98] is a metaheuristic that is based on a simple but powerful concept. At each iteration the current solution (which is a local minimum) is perturbed and a local search method is applied to the perturbed solution. Then, the local minimum that is obtained by applying the local search method is either accepted as the new current solution or not. Intuitively, ILS performs a trajectory along local minima ŝ_1, ŝ_2, ..., ŝ_t without
Algorithm 9 Iterated Local Search (ILS)
s ← GenerateInitialSolution()
ŝ ← LocalSearch(s)
while termination conditions not met do
  s' ← Perturbation(ŝ, history)
  ŝ' ← LocalSearch(s')
  ŝ ← ApplyAcceptanceCriterion(ŝ', ŝ, history)
end while
output: best solution found
explicitly introducing a neighborhood structure on S, by applying the scheme that is shown in Algorithm 9. The importance of the perturbation is obvious: too small a perturbation might not enable the system to escape from the local minimum just found. On the other side, too strong a perturbation would make the algorithm similar to a random restart local search. Therefore, the requirement on the perturbation method is to produce a starting point for local search such that a local minimum different from the current solution is reached. However, this new local minimum should be closer to the current solution than a local minimum produced by the application of the local search to a randomly generated solution. The acceptance criterion acts as a counterbalance, as it filters the accepted solutions depending on the search history and the characteristics of the new local minimum. The design of ILS algorithms has several degrees of freedom in the generation of the initial solution, the choice of the perturbation method, and the acceptance criterion. Furthermore, the history of the search process can be exploited in the form of both short and long term memory. In the following we describe the three main algorithmic components of ILS.
GenerateInitialSolution(): The construction of initial solutions should be fast (computationally not expensive), and initial solutions should be a good starting point for local search. Any kind of solution construction procedure can be used.
Perturbation(ŝ, history): The perturbation is usually non-deterministic in order to avoid cycling. Its most important characteristic is the strength, roughly defined as the amount of changes inflicted on the current solution. The strength can be either fixed or variable. In the first case, the distance between ŝ and s' is kept constant, independently of the problem size. However, a variable strength is in general more effective, since it has been experimentally found that, in most of the problems, the bigger the problem instance size, the larger should be the strength. A more sophisticated mechanism consists of adaptively changing the strength. For example, the strength might be increased when more diversification is needed or decreased when intensification seems preferable. A second choice is the mechanism to perform perturbations: random or semi-deterministic.
ApplyAcceptanceCriterion(ŝ', ŝ, history): The third important component is the acceptance criterion. Two extreme examples are (1) accepting the new local minimum only in case of improvement and (2) always accepting the new solution. In between, there are several possibilities. For example, it is possible to adopt an acceptance criterion that is similar to the one of simulated annealing. Non-monotonic cooling schedules might be particularly effective if they exploit the history of the search process. For example, when the recent history of the search process indicates that intensification seems no longer effective, a diversification phase is needed and the temperature is increased. Examples of successful applications of ILS are to the TSP [83, 97], to the QAP [93], and to the single-machine total weighted tardiness (SMTWT) problem [35]. References to other applications can be found in [93]. 1.3 POPULATION-BASED METHODS Population-based methods deal in every iteration of the algorithm with a set (i.e., a population) of solutions rather than with a single solution.9 In this way, population-based algorithms provide a natural, intrinsic way for the exploration of the search space. Yet, the final performance strongly depends on the way the population is manipulated. The most studied population-based methods in combinatorial optimization are evolutionary computation (EC) and ant colony optimization (ACO). In EC algorithms, a population of individuals is modified by recombination and mutation operators, and in ACO a colony of artificial ants is used to construct solutions guided by the pheromone trails and heuristic information. 1.3.1 Evolutionary Computation
Evolutionary Computation (EC) algorithms are inspired by nature's capability to evolve living beings well adapted to their environment. EC algorithms can be characterized as computational models of evolutionary processes. At each iteration a number of operators are applied to the individuals of the current population to generate the individuals of the population of the next generation (iteration). Usually, EC algorithms use operators called recombination or crossover to recombine two or more individuals to produce new individuals. They also use mutation or modification operators which cause a self-adaptation of individuals. The driving force in evolutionary algorithms is the selection of individuals based on their fitness (which can be based on the objective function, the result of a simulation experiment, or some other kind of quality measure). Individuals with a higher fitness have a higher probability to be chosen as members of the population of the next iteration (or as parents for the generation of new individuals). This corresponds to the principle of survival of the
9 In general, especially in EC algorithms, we talk about a population of individuals rather than solutions.
Algorithm 10 Evolutionary Computation (EC)
P ← GenerateInitialPopulation()
Evaluate(P)
while termination conditions not met do
  P' ← Recombine(P)
  P'' ← Mutate(P')
  Evaluate(P'')
  P ← Select(P, P'')
end while
output: best solution found
fittest in natural evolution. It is this capability of nature to adapt itself to a changing environment that gave the inspiration for EC algorithms.
There has been a variety of slightly different EC algorithms proposed over the years. Basically they fall into three different categories which have been developed independently of each other. These are Evolutionary Programming (EP) as introduced by Fogel in [59] and Fogel et al. in [60], Evolution Strategies (ES) proposed by Rechenberg in [121], and Genetic Algorithms initiated by Holland in [81] (see [73], [104], [124], and [145] for further references). EP arose from the desire to generate machine intelligence. While EP originally was proposed to operate on discrete representations of finite state machines, most of the present variants are used for continuous optimization problems. The latter also holds for most present variants of ES, whereas GAs are mainly applied to solve combinatorial optimization problems. Over the years there have been quite a few overviews and surveys about EC methods. Among those are the ones by Bäck [8], by Fogel [57], by Hertz and Kobler [79], by Spears et al. [133], and by Michalewicz and Michalewicz [102]. In [20] a taxonomy of EC algorithms is proposed. Algorithm 10 contains the basic structure of EC algorithms. In this algorithm, P denotes the population of individuals. At each iteration a set of offspring individuals P' is generated by the application of the function Recombine(P), whose members may then be mutated in function Mutate(P'), producing a set of mutated offspring individuals P''. The individuals for the next population are then selected in function Select(P, P'') from the union of the old population P and the set of mutated offspring individuals P''. Individuals of EC algorithms are not necessarily solutions to the considered problem. They may be partial solutions, or sets of solutions, or any object which can be transformed into one or more solutions in a structured way. Most commonly used in combinatorial optimization is the representation of solutions as bit-strings or as permutations of n integer numbers. Tree structures or other complex structures are also possible. In the context of genetic algorithms, individuals are called genotypes, whereas the solutions that are encoded by individuals are called phenotypes. This is to differentiate between the representation of solutions and solutions themselves. The choice of an appropriate representation is crucial for the success of an EC algorithm.
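For readers who prefer working code to the schematic Algorithm 10, the following Python sketch implements one possible instantiation for bit-string individuals, with one-point crossover, bit-flip mutation, and fitness-based survivor selection; all parameter values and the OneMax fitness function are illustrative assumptions rather than canonical choices.

import random

def ec_algorithm(fitness, n_bits=20, pop_size=30, generations=50, p_mut=0.05, seed=0):
    # Bit-string EC sketch following Algorithm 10: recombine, mutate, select.
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def recombine(population):
        # Two-parent one-point crossover between randomly paired individuals.
        offspring = []
        for _ in range(len(population) // 2):
            p1, p2 = rng.sample(population, 2)
            cut = rng.randrange(1, n_bits)
            offspring.append(p1[:cut] + p2[cut:])
            offspring.append(p2[:cut] + p1[cut:])
        return offspring

    def mutate(population):
        # Independent bit flips with probability p_mut.
        return [[1 - b if rng.random() < p_mut else b for b in ind] for ind in population]

    for _ in range(generations):
        offspring = mutate(recombine(pop))
        # Fitness-based survivor selection from the union of parents and offspring.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

print(ec_algorithm(sum))  # OneMax: fitness is the number of ones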
Holland's schema analysis [81] and Radcliffe's generalization to formae [120] are examples of how theory can help to guide representation choices. In the following the components of Algorithm 10 are outlined in more detail.
GenerateInitialPopulation(): The initial population may be a population of randomly generated individuals, or individuals obtained from other sources such as constructive heuristics.
Recombine(P): The most common recombination operator is two-parent crossover. But there are also recombination operators that operate on more than two individuals to create a new individual (multi-parent crossover), see [15, 49]. More recent developments even use population statistics for generating the individuals of the next population. Examples are the recombination operators called Gene Pool Recombination [109] and Bit-Simulated Crossover [138], which make use of a probability distribution over the search space given by the current population to generate the next population.10 The question of which individuals can be recombined can be expressed in the form of a neighborhood function N_EC : I → 2^I, which assigns to each individual i ∈ I a set of individuals N_EC(i) ⊆ I whose members are permitted to act as recombination partners for i to create offspring. If an individual can be recombined with any other individual (as, for example, in the simple GA [145]), we talk about unstructured populations; otherwise we talk about structured populations. An example of an EC algorithm that works on structured populations is the Parallel Genetic Algorithm (PGA) proposed by Mühlenbein [107]. Like in this case, many structured algorithms are run in parallel, but many others are not. To explore this distinction in more depth, the interested reader can consult [7] or the more recent and complete study found in [5]. Mutate(P'): The most basic form of a mutation operator applies small random changes to some of the offspring individuals in P'. This is done in order to introduce some noise in the search process for avoiding premature convergence. Instead of using random mutations, in many applications it proved to be quite beneficial to use improvement mechanisms to increase the fitness of individuals. EC algorithms that apply a local search algorithm to each individual of a population are often called Memetic Algorithms [106]. While the use of a population ensures an exploration of the search space, the use of local search techniques helps to quickly identify "good" areas in the search space. Select(P, P''): At each iteration it has to be decided which individuals will enter the population of the next iteration. This is done by a selection scheme. To choose the individuals for the next population exclusively from the offspring is called generational replacement. In some schemes, such as elitist strategies, successive
10 Both methods can be regarded as members of the class of Estimation of Distribution Algorithms (EDAs); see Section 1.3.1.2.
generations overlap to some degree, i.e., some portion of the previous generation is retained in the new population. The fraction of new individuals at each generation is called the generational gap [32]. In a steady state selection, only a few individuals are replaced in each iteration: usually a small number of the least fit individuals are replaced by offspring. Most EC algorithms deal with populations of constant size. However, it is also possible to have a variable population size. In case of a continuously shrinking population size, the situation in which only one individual is left in the population (or no crossover partners can be found for any member of the population) might be one of the stopping conditions of the algorithm. One of the major difficulties of EC algorithms (especially when applying local search) is the premature convergence toward sub-optimal solutions. The simplest mechanism to diversify the search process is the use of a random mutation operator. In order to avoid premature convergence there are also a number of other ways of maintaining the population diversity. Probably the oldest strategies are crowding [32] and its close relative, preselection [22]. Newer strategies are fitness sharing [74] and niching [95], in which the reproductive fitness allocated to an individual in a population is reduced proportionally to the number of other individuals that share the same region of the search space. An important characteristic of an EC algorithm is the way it deals with infeasible individuals which might be produced by the genetic operators. There are basically three different ways to handle such a situation. The simplest action is to reject infeasible individuals. Nevertheless, for some highly constrained problems (e.g., for timetabling problems) it might be very difficult to find feasible individuals. Therefore, the strategy of penalizing infeasible individuals in the function that measures the quality of an individual is sometimes more appropriate (or even unavoidable); see, for example, [29]. The third possibility consists in trying to repair an infeasible solution (see [50] for an example). Finally, we must point out that all the mentioned methods are usually developed within some research line driving the creation of such operators. Two such important philosophies to deal with the intensification/diversification trade-off are hybridization of algorithms [140] and parallel and structured algorithms [5]. This concludes our description of EC algorithms. EC algorithms have been applied to most CO problems and optimization problems in general. Recent successes were obtained in the rapidly growing bioinformatics area (see, for example, [58]), but also in multiobjective optimization [28] and evolvable hardware [129]. For an extensive collection of references to EC applications we refer to [9]. In the following two subsections we are going to introduce two other population-based methods which are sometimes also considered as EC algorithms. 1.3.1.1 Scatter Search and Path Relinking.
Scatter Search (SS) and its generalized form called Path Relinking (PR) [68, 71] differ from EC algorithms mainly by providing unifying principles for joining (or recombining) solutions based on generalized path constructions in Euclidean or neighborhood spaces. These principles are
Algorithm 11 Scatter Search (SS) and Path Relinking (PR)
S_seed ← SeedGeneration()
S_div ← DiversificationGenerator(S_seed)
S_ref ← ChooseReferenceSet(S_div)
while termination conditions not met do
  while stopping conditions for inner loop not met do
    S_sub ← SubsetGeneration(S_ref)
    S_trial ← SolutionCombination(S_sub)
    S_disp ← Improvement(S_trial)
    S_ref ← ReferenceSetUpdate(S_ref, S_disp)
  end while
  S_elite ← ChooseBestFrom(S_ref)
  S_div ← DiversificationGenerator(S_elite)
  S_ref ← ChooseReferenceSet(S_div)
end while
output: best solution found
based on strategies originally proposed for combining decision rules and constraints in the context of integer programming. The template for scatter search and path relinking is shown in Algorithm 11. Scatter search and path relinking are search strategies that operate on a set of reference solutions (denoted by S_ref in Algorithm 11) that are feasible solutions to the problem under consideration. The set of reference solutions corresponds to the population of individuals in EC algorithms. The algorithm starts by generating a set S_seed of so-called seed solutions. This is done by means of some heuristic method. Then, in function DiversificationGenerator(S_seed), a method is applied that iteratively chooses one of the seed solutions and generates a new solution with the aim of creating a solution as different as possible from the existing seed solutions. The newly generated solutions are added to the set of seed solutions if they do not already exist in there. From S_div, the first set of reference solutions is then chosen such that it contains high quality as well as diverse solutions. Then the main loop of the algorithm starts. At each iteration the following cycle is repeated a number of times (which is a parameter of the algorithm). First, a subset of the reference solutions S_sub is chosen in function SubsetGeneration(S_ref). Second, the solutions from S_sub are recombined in function SolutionCombination(S_sub) to yield one or more trial solutions S_trial. These trial solutions may be infeasible solutions and are therefore usually modified by means of a repair procedure that transforms them into feasible solutions. An improvement mechanism Improvement(S_trial) is then applied in order to try to improve the set of trial solutions (usually this improvement procedure is a local search). These improved solutions form the set of dispersed solutions, denoted by S_disp. Finally, the set of reference solutions is updated with the solutions from S_disp in function ReferenceSetUpdate(S_ref, S_disp), again with respect to criteria such as quality and diversity. After a number of these cycles, a set of elite solutions S_elite is chosen from the set of reference solutions, the diversification generator is
applied, and the new set of reference solutions is chosen from the resulting set of solutions.
SolutionCombination(S_sub): In scatter search, which was introduced for solutions encoded as points in the Euclidean space, new solutions are created by building linear combinations of reference solutions using both positive and negative weights. This means that trial solutions can be both inside and outside the convex region spanned by the reference solutions. In path relinking the concept of combining solutions by making linear combinations of reference points is generalized to neighborhood spaces. Linear combinations of points in the Euclidean space can be re-interpreted as paths between and beyond solutions in a neighborhood space. To generate the desired paths, it is only necessary to select moves that satisfy the following condition: upon starting from an initiating solution chosen from S_sub, the moves must progressively introduce attributes contributed by a guiding solution that is equally chosen from S_sub. Scatter search has enjoyed increasing interest in recent years. Among other problems it has been applied to multiobjective assignment problems [88] and to the linear ordering problem (LOP) [21]. For further references we refer to [72]. Path relinking is often used as a component in metaheuristics such as TS [90] and GRASP [3, 89].
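The following sketch illustrates, under simplifying assumptions, the path relinking idea of progressively introducing attributes of a guiding solution into an initiating solution, here for bit-string solutions; the greedy choice of which attribute to introduce next and the toy objective are assumptions made purely for illustration.

def path_relinking(initiating, guiding, f):
    # Walk from the initiating to the guiding solution; at each step copy the
    # attribute (bit value) of the guiding solution that improves the objective most.
    current = list(initiating)
    best = tuple(current)
    while True:
        diff = [i for i in range(len(current)) if current[i] != guiding[i]]
        if not diff:
            break
        i = min(diff, key=lambda j: f(tuple(current[:j] + [guiding[j]] + current[j+1:])))
        current[i] = guiding[i]
        if f(tuple(current)) < f(best):
            best = tuple(current)
    return best

# Illustrative run on a toy objective (minimize the number of ones).
print(path_relinking((1, 1, 0, 1, 0), (0, 1, 1, 0, 0), f=sum))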
1.3.1.2 Estimation of Distribution Algorithms. In the last decade more and more researchers tried to overcome the drawbacks of usual recombination operators of EC algorithms, which are likely to break good building blocks.11 With this aim, a number of algorithms that are sometimes called estimation of distribution algorithms (EDAs) [108] have been developed (see Algorithm 12 for the algorithmic framework). These algorithms, which have a theoretical foundation in probability theory, are based on populations that evolve as the search progresses, like EC algorithms. They work as follows. First, an initial population P of solutions is randomly or heuristically generated. Then the following cycle is repeated until the termination criteria are met. A fraction of the best solutions of the current population (denoted by P_sel) are selected in function ChooseFrom(P). Then from the solutions in P_sel a probability distribution over the search space is derived in function EstimateProbabilityDistribution(P_sel). This probability distribution is then sampled in function SampleProbabilityDistribution(p(x)) to produce the population of the next iteration. For a survey of EDAs we refer the interested reader to [116]. One of the first EDAs that was proposed for the application to CO problems is called Population-Based Incremental Learning (PBIL) [11]. The method works on a real valued probability vector (i.e., the probability distribution over the search space) where each position corresponds to a binary decision variable. The objective is to change this probability vector over time such that high quality solutions are generated
11 Roughly speaking, a good building block is a subset of the set of solution components which results in a high average quality of the solutions that contain this subset.
Algorithm 12 Estimation of Distribution Algorithm (EDA)
P ← GenerateInitialPopulation()
while termination conditions not met do
  P_sel ← ChooseFrom(P) {P_sel ⊆ P}
  p(x) = p(x | P_sel) ← EstimateProbabilityDistribution(P_sel)
  P ← SampleProbabilityDistribution(p(x))
end while
output: best solution found
from it with a high probability. In contrast to PBIL, which estimates a distribution of promising solutions assuming that the decision variables are independent, various other approaches try to estimate distributions taking into account dependencies between decision variables. An example of EDAs regarding such pairwise dependencies is MIMIC [31], while an example of multivariate dependencies is the Bayesian Optimization Algorithm (BOA) [115]. The field of EDAs is still quite young and much of the research effort is focused on methodology rather than high performance applications. Applications to the knapsack problem, the job shop scheduling (JSS) problem, and other CO problems can be found in [91].
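The PBIL scheme just described can be sketched in a few lines of Python; the learning rate, the population size, and the OneMax fitness used here are illustrative assumptions.

import random

def pbil(fitness, n_bits=20, pop_size=40, generations=60, lr=0.1, seed=0):
    # PBIL sketch: sample a population from a probability vector, then shift the
    # vector toward the best sampled solution (maximization).
    rng = random.Random(seed)
    prob = [0.5] * n_bits                     # one probability per binary decision variable
    best = None
    for _ in range(generations):
        pop = [[1 if rng.random() < p else 0 for p in prob] for _ in range(pop_size)]
        leader = max(pop, key=fitness)
        if best is None or fitness(leader) > fitness(best):
            best = leader
        # Move each component of the probability vector toward the leader's bit value.
        prob = [(1 - lr) * p + lr * b for p, b in zip(prob, leader)]
    return best

print(pbil(sum))  # OneMax: the probability vector converges toward all ones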
1.3.2 Ant Colony Optimization Ant colony optimization (ACO) [42, 40, 45] is a metaheuristic approach that was inspired by the foraging behavior of real ants. This behavior - as described by Deneubourg et al. in [36] - enables ants to find shortest paths between food sources and their nest. Initially, ants explore the area surrounding their nest in a random manner. As soon as an ant finds a source of food, it carries some of this food to the nest. While walking, the ant deposits a chemical pheromone trail on the ground. The quantity of pheromone deposited, which may depend on the quantity and quality of the food, will guide other ants to the food source. The indirect communication between the ants via the pheromone trails enables them to find shortest paths between their nest and food sources. This functionality of real ant colonies is exploited in artificial ant colonies in order to solve CO problems. In ACO algorithms the chemical pheromone trails are simulated via a parametrized probabilistic model that is called the pheromone model. The pheromone model consists of a set of model parameters whose values are called the pheromone values. The basic ingredient of ACO algorithms is a constructive heuristic that is used for probabilistically constructing solutions using the pheromone values. In general, the ACO approach attempts to solve a CO problem by iterating the following two steps:
• Solutions are constructed using a pheromone model, that is, a parametrized probability distribution over the solution space.
Algorithm 13 Ant Colony Optimization (ACO)
while termination conditions not met do
  ScheduleActivities
    AntBasedSolutionConstruction()
    PheromoneUpdate()
    DaemonActions() {optional}
  end ScheduleActivities
end while
output: best solution found
• The constructed solutions, and possibly solutions that were constructed in earlier iterations, are used to modify the pheromone values in a way that is deemed to bias future sampling toward high quality solutions.
The ACO metaheuristic framework is shown in Algorithm 13. It consists of three algorithmic components that are gathered in the ScheduleActivities construct. The ScheduleActivities construct does not specify how these three activities are scheduled and synchronized. This is up to the algorithm designer. In the following we explain these three algorithm components in more detail.
AntBasedSolutionConstruction(): As mentioned above, the basic ingredient of ACO algorithms is a constructive heuristic for probabilistically constructing solutions. As outlined in Section 1.1.1, a constructive heuristic assembles solutions as sequences of solution components taken from a finite set of solution components C = {c_1, ..., c_n}. A solution construction starts with an empty partial solution s^p = (). Then, at each construction step the current partial solution s^p is extended by adding a feasible solution component from the set N(s^p) ⊆ C \ s^p, which is defined by the solution construction mechanism. The process of constructing solutions can be regarded as a walk (or a path) on the so-called construction graph G_C = (C, L), whose vertices are the solution components in C and whose edge set L consists of the connections. The allowed walks on G_C are hereby implicitly defined by the solution construction mechanism that defines the set N(s^p) with respect to a partial solution s^p. At each construction step, the choice of a solution component from N(s^p) is done probabilistically with respect to the pheromone model, which consists of pheromone trail parameters T_i that are associated with components c_i ∈ C.12 The set of all pheromone trail parameters is denoted by T. The values of these parameters - the pheromone values - are denoted by τ_i. In most ACO algorithms the probabilities for choosing the next solution component - also called the transition probabilities - are defined
"Note that the description of the ACO inetaheuristic as given for example in 1401 allows also connections of the construction graph to be associated a pheromone trail parameter. However. for the purpose ofthis introduction it is sufficient to assume pheromone trail parainetas associated to componcnts.
as follows:

p(c_i | s^p) = ( [τ_i]^α · [η(c_i)]^β ) / ( Σ_{c_j ∈ N(s^p)} [τ_j]^α · [η(c_j)]^β ),  for all c_i ∈ N(s^p),
where η is a weighting function, that is, a function that, sometimes depending on the current partial solution, assigns at each construction step a heuristic value η(c) to each feasible solution component c ∈ N(s^p). The values that are given by the weighting function are commonly called the heuristic information. Furthermore, α and β are positive parameters whose values determine the relation between pheromone information and heuristic information. PheromoneUpdate(): In ACO algorithms we can find different types of pheromone updates. First, we outline a pheromone update that is used by most ACO algorithms. This pheromone update consists of two parts. First, a pheromone evaporation, which uniformly decreases all the pheromone values, is performed. From a practical point of view, pheromone evaporation is needed to avoid a too rapid convergence toward a sub-optimal region. It implements a useful form of forgetting, favoring the exploration of new areas in the search space. Then, one or more solutions from the current and/or from earlier iterations are used to increase the values of pheromone trail parameters on solution components that are part of these solutions. As a prominent example, we outline in the following the pheromone update rule that was used in Ant System (AS) [39, 42], which was the first ACO algorithm proposed. This update rule, which we henceforth call AS-update, is defined by

τ_i ← (1 − ρ) · τ_i + Σ_{s ∈ S_iter : c_i ∈ s} F(s)
for i = 1, ..., n, where S_iter is the set of solutions that were generated in the current iteration. Furthermore, ρ ∈ (0, 1] is a parameter called the evaporation rate, and F : S → R+ is a function such that f(s) < f(s') ⇒ F(s) ≥ F(s') for all s ≠ s' ∈ S. F(·) is commonly called the quality function. Other types of pheromone update are rather optional and mostly aim at the intensification or the diversification of the search process. An example is a pheromone update in which, during the solution construction, when adding a solution component c_i to the current partial solution s^p, the pheromone value τ_i is immediately decreased. This kind of pheromone update aims at a diversification of the search process.
DaemonActions(): Daemon actions can be used to implement centralized actions which cannot be performed by single ants. Examples are the application of local search methods to the constructed solutions or the collection of global information that can be used to decide whether it is useful or not to deposit additional pheromone to bias the search process from a non-local perspective. As a practical example, the daemon may decide to deposit extra pheromone on the solution components that belong to the best solution found so far.
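To tie the components together, the following Python sketch implements a basic Ant System for the TSP; here the pheromone values are associated with the connections (edges) rather than with single components, which, as noted in the footnote above, is also allowed by the ACO framework, and all parameter values are illustrative assumptions.

import random

def ant_system_tsp(dist, n_ants=10, iterations=50, alpha=1.0, beta=2.0, rho=0.1, seed=0):
    # Ant System sketch: probabilistic tour construction from pheromone and heuristic
    # information, followed by pheromone evaporation and deposit.
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]                     # pheromone values
    eta = [[0 if i == j else 1.0 / dist[i][j] for j in range(n)] for i in range(n)]
    tour_len = lambda t: sum(dist[t[i]][t[(i + 1) % n]] for i in range(n))
    best = None
    for _ in range(iterations):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            while len(tour) < n:
                i = tour[-1]
                allowed = [j for j in range(n) if j not in tour]
                weights = [(tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed]
                tour.append(rng.choices(allowed, weights=weights)[0])   # transition probabilities
            tours.append(tour)
            if best is None or tour_len(tour) < tour_len(best):
                best = tour
        # AS-style update: evaporate, then deposit an amount proportional to tour quality.
        tau = [[(1 - rho) * tau[i][j] for j in range(n)] for i in range(n)]
        for tour in tours:
            deposit = 1.0 / tour_len(tour)
            for k in range(n):
                i, j = tour[k], tour[(k + 1) % n]
                tau[i][j] += deposit
                tau[j][i] += deposit
    return best, tour_len(best)

d = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
print(ant_system_tsp(d))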
In general, different versions of ACO algorithms differ in the way they update the pheromone values. This also holds for the currently best-performing ACO variants in practice, which are Ant Colony System (ACS) [41], MAX-MIN Ant System (MMAS) [137], and ACO algorithms that are implemented in the hyper-cube framework (HCF) [18]. Successful applications of ACO include the application to routing in communication networks [38], to the sequential ordering problem (SOP) [62], to resource constrained project scheduling (RCPS) [100], and to the open shop scheduling (OSS) problem [17]. Further references to applications of ACO can be found in [44, 45]. 1.4 DECENTRALIZED METAHEURISTICS
There exists a long tradition in using structured populations in EC, especially associated to parallel implementations. Among the most widely known types of structured EAs, distributed (dEA) [141] and cellular (cEA) [96] algorithms are very popular optimization procedures [7]. Decentralizing a single population can be achieved by partitioning it into several subpopulations, where island EAs are run performing sparse exchanges of individuals (distributed EAs), or in the form of neighborhoods (cellular EAs). In distributed EAs, additional parameters controlling when migration occurs and how migrants are selected/incorporated from/into the source/target islands are needed [14, 141]. In cellular EAs, the existence of overlapped small neighborhoods helps in exploring the search space [10]. These two kinds of EAs seem to provide a better sampling of the search space and improve the numerical and run time behavior of the basic algorithm in many cases [6, 75]. The main difference of a cEA with respect to a panmictic EA is its decentralized selection and variation. In cEAs, the reproductive loop is performed inside every one of the numerous individual pools. In a cEA, one given individual has its own pool of potential mates defined by neighboring individuals; at the same time, one individual belongs to many pools. This 1D or 2D structure with overlapped neighborhoods is used to provide a smooth diffusion of good solutions across the grid. A distributed EA is a multi-population (island) model performing sparse exchanges of individuals among the elementary populations. This model can be readily implemented in distributed memory MIMD computers, which provides one main reason for its popularity. A migration policy controls the kind of distributed EA being used. The migration policy must define the island topology, when migration occurs, which individuals are being exchanged, the synchronization among the subpopulations, and the kind of integration of exchanged individuals within the target subpopulations. The advantage of a distributed model (whether running on separate processors or not) is that it is usually faster than a panmictic EA. The reason for this is that the run time and the number of evaluations are potentially reduced due to its separate search in several regions of the problem space. A high diversity and species formation are two of their well-reported features.
As a summary, while a distributed EA has large subpopulations, usually much larger than one individual, a cEA typically has one single individual in every subpopulation. In a dEA, the subpopulations are loosely coupled, while for a cEA they are tightly coupled. Additionally, in a dEA there exist only a few subpopulations, while in a cEA there is a large number of them. To end this subsection, we must point out that there exists a large number of structured algorithms lying in between the distributed and the cellular classes, and much can be said on the heterogeneity and synchronicity of the cooperating algorithms. The present book deals in depth with these issues in the forthcoming chapters.
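As a simple illustration of a distributed EA of the kind discussed above, the following Python sketch runs independent subpopulations and periodically migrates the best individual along a unidirectional ring; the migration frequency, the replace-worst integration policy, and the underlying EA are illustrative assumptions rather than a recommended configuration.

import random

def distributed_ea(fitness, n_islands=4, island_size=20, n_bits=30,
                   generations=100, migration_freq=10, seed=0):
    # Island-model sketch: subpopulations evolve in isolation and periodically send
    # their best individual to the next island on a ring topology.
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(island_size)]
               for _ in range(n_islands)]

    def evolve(pop):
        # One generation of a very simple EA: crossover, mutation, truncation selection.
        offspring = []
        for _ in range(len(pop)):
            p1, p2 = rng.sample(pop, 2)
            cut = rng.randrange(1, n_bits)
            child = [b if rng.random() > 0.01 else 1 - b for b in p1[:cut] + p2[cut:]]
            offspring.append(child)
        return sorted(pop + offspring, key=fitness, reverse=True)[:len(pop)]

    for gen in range(1, generations + 1):
        islands = [evolve(pop) for pop in islands]
        if gen % migration_freq == 0:
            # Migration policy: the best individual replaces the worst of the next island.
            migrants = [max(pop, key=fitness) for pop in islands]
            for k, m in enumerate(migrants):
                target = islands[(k + 1) % n_islands]
                target[-1] = list(m)          # populations are kept sorted by evolve()
    return max((max(pop, key=fitness) for pop in islands), key=fitness)

print(sum(distributed_ea(sum)))  # OneMax solved cooperatively by four islands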
1.5 HYBRIDIZATION OF METAHEURISTICS We conclude this introduction by discussing a very promising research direction, namely the hybridization of metaheuristics. In fact, many of the successful applications that we have cited in previous sections are hybridizations. In the following we distinguish different forms of hybridization. The first one consists of including components from one metaheuristic into another one. The second form concerns systems that are sometimes labelled as cooperative search. They consist of various algorithms exchanging information in some way. The third form is the integration of metaheuristics with more conventional artificial intelligence (AI) and operations research (OR) methods. For a taxonomy of hybrid metaheuristics see [140]. 1.5.1 Component Exchange Among Metaheuristics One of the most popular ways of hybridization concerns the use of trajectory methods in population-based methods. Most of the successful applications of EC and ACO make use of local search procedures. The power of population-based methods is certainly based on the concept of recombining solutions to obtain new ones. In EC algorithms and scatter search explicit recombinations are performed by one or more recombination operators. In ACO and EDAs recombination is implicit, because new solutions are generated by using a probability distribution over the search space which is a function of earlier populations. This enables the algorithm to make guided steps in the search space which are usually "larger" than the steps done by trajectory methods. In contrast, the strength of trajectory methods is rather to be found in the way in which they explore a promising region in the search space. Since in those methods local search is the driving component, a promising area in the search space is searched in a more structured way than in population-based methods. In this way the danger of being close to good solutions but "missing" them is not as high as in population-based methods. Thus, metaheuristic hybrids that in some way manage to combine the advantage of population-based methods with the strength of trajectory methods are often very successful.
1.5.2 Cooperative Search A loose form of hybridization is provided by cooperative search [37, 80, 132, 142], which consists of a search performed by possibly different algorithms that exchange information about states, models, entire sub-problems, solutions, or search space characteristics. Typically, cooperative search algorithms consist of the parallel execution of search algorithms with a varying level of communication. The algorithms can be different or they can be instances of the same algorithm working on different models or running with different parameter settings. The algorithms composing a cooperative search system can be all approximate, all complete, or a mix of approximate and complete approaches. 1.5.3 Integration with Tree Search Methods and Constraint Programming
One of the most promising recent research directions is the integration of metaheuristics with more classical artificial intelligence and operations research methods, such as constraint programming (CP) and branch & bound or other tree search techniques. In the following we outline some of the possible ways of integration. Metaheuristics and tree search methods can be sequentially applied or they can also be interleaved. For instance, a tree search method can be applied to generate a partial solution which will then be completed by a metaheuristic approach. Alternatively, metaheuristics can be applied to improve a solution generated by a complete method. CP techniques can be used to reduce the search space of the problem under consideration (see, for example, [56]). In CP, CO problems are modelled by means of variables, domains,13 and constraints, which can be mathematical (as, for example, in linear programming) or symbolic. Constraints encapsulate well-defined parts of the problem into sub-problems. Every constraint is associated with a filtering algorithm that deletes those values from a variable domain that do not contribute to feasible solutions. Metaheuristics (especially trajectory methods) may use CP to efficiently explore the neighborhood of the current solution, instead of simply enumerating the neighbors or randomly sampling the neighborhood. A prominent example of such an integration is Large Neighborhood Search [127] and related techniques. These approaches are effective mainly when the neighborhood to explore is very large or when problems (such as many real-world problems) have additional constraints (called side constraints). Another possible combination consists of introducing concepts or strategies from either class of algorithms into the other. For example, the concepts of tabu list and aspiration criteria, known from tabu search, can be used to manage the list of open nodes (i.e., the ones whose child nodes are not yet explored) in a tree search algorithm. Examples of these approaches can be found in [33, 119].
13 We restrict the discussion to finite domains.
1.6 CONCLUSIONS This chapter has offered a detailed description of the different kinds of metaheuristics as well as provided a clear structure in order to be accessible for readers interested only in parts of it. The main goal is to make this book self-contained for readers not familiar with some of the techniques later parallelized in the forthcoming chapters. We tried to arrive at a trade-off between a detailed description of the metaheuristic working principles and a fast survey of techniques. Besides pure canonical techniques we also reinforced some promising lines of research for improving their behavior, such as hybridization, as well as some lines leading to direct parallelization of the algorithms, such as decentralized algorithms.
Acknowledgments The first and third authors acknowledge funding from the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project). The first author also acknowledges a Juan de la Cierva post-doctoral fellowship from the Spanish Ministry of Science and Technology.
REFERENCES
1. E. H. L. Aarts, J. H. M. Korst, and P. J. M. van Laarhoven. Simulated annealing. In E. H. L. Aarts and J. K. Lenstra, editors, Local Search in Combinatorial Optimization, pages 91-120. John Wiley & Sons, Chichester, UK, 1997.
2. E. H. L. Aarts and J. K. Lenstra, editors. Local Search in Combinatorial Optimization. John Wiley & Sons, Chichester, UK, 1997.
3. R. M. Aiex, S. Binato, and M. G. C. Resende. Parallel GRASP with path-relinking for job shop scheduling. Parallel Computing, 29(4):393-430, 2003.
4. E. Alba, F. Luna, A. J. Nebro, and J. M. Troya. Parallel Heterogeneous Genetic Algorithms for Continuous Optimization. Parallel Computing, 30(5-6):699-719, 2004.
5. E. Alba and M. Tomassini. Parallelism and evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 6(5):443-462, October 2002.
6. E. Alba and J. M. Troya. Improving flexibility and efficiency by adding parallelism to genetic algorithms. Statistics and Computing, 12(2):91-114, 2002.
7. E. Alba and J. M. Troya. A survey of parallel distributed genetic algorithms. Complexity, 4(4):31-52, 1999.
8. T. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.
9. T. Bäck, D. B. Fogel, and Z. Michalewicz, editors. Handbook of Evolutionary Computation. Institute of Physics Publishing Ltd, Bristol, UK, 1997.
10. S. Baluja. Structure and performance of fine-grain parallelism in genetic search. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 155-162. Morgan Kaufmann, 1993.
11. S. Baluja and R. Caruana. Removing the Genetics from the Standard Genetic Algorithm. In A. Prieditis and S. Russel, editors, The International Conference on Machine Learning 1995, pages 38-46, San Mateo, California, 1995. Morgan Kaufmann Publishers.
12. R. Battiti and M. Protasi. Reactive Search, a history-based heuristic for MAXSAT. ACM Journal of Experimental Algorithmics, 2:Article 2, 1997.
13. R. Battiti and G. Tecchiolli. The Reactive Tabu Search. ORSA Journal on Computing, 6(2):126-140, 1994.
14. T. C. Belding. The distributed genetic algorithm revisited. In L. J. Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 114-121. Morgan Kaufmann, 1995.
15. H. Bersini and G. Seront. In search of a good evolution-optimization crossover. In R. Männer and B. Manderick, editors, Proceedings of PPSN-II, Second International Conference on Parallel Problem Solving from Nature, pages 479-488. Elsevier, Amsterdam, The Netherlands, 1992.
16. S. Binato, W. J. Hery, D. Loewenstern, and M. G. C. Resende. A greedy randomized adaptive search procedure for job shop scheduling. In P. Hansen and C. C. Ribeiro, editors, Essays and Surveys on Metaheuristics. Kluwer Academic Publishers, 2001.
17. C. Blum. Beam-ACO - Hybridizing ant colony optimization with beam search: An application to open shop scheduling. Computers & Operations Research, 32(6):1565-1591, 2005.
18. C. Blum and M. Dorigo. The hyper-cube framework for ant colony optimization. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 34(2):1161-1172, 2004.
19. C. Blum and A. Roli. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys, 35(3):268-308, 2003.
20. P. Calégari, G. Coray, A. Hertz, D. Kobler, and P. Kuonen. A taxonomy of evolutionary algorithms in combinatorial optimization. Journal of Heuristics, 5:145-158, 1999.
21. V. Campos, F. Glover, M. Laguna, and R. Martí. An Experimental Evaluation of a Scatter Search for the Linear Ordering Problem. Journal of Global Optimization, 21:397-414, 2001.
22. D. J. Cavicchio. Adaptive search using simulated evolution. PhD thesis, University of Michigan, Ann Arbor, MI, 1970.
23. V. Černý. A thermodynamical approach to the travelling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 45:41-51, 1985.
24. P. Chardaire, J. L. Lutton, and A. Sutter. Thermostatistical persistency: A powerful improving concept for simulated annealing algorithms. European Journal of Operational Research, 86:565-579, 1995.
25. R. Chelouah and P. Siarry. A continuous genetic algorithm designed for the global optimization of multimodal functions. Journal of Heuristics, 6:191-213, 2000.
26. R. Chelouah and P. Siarry. Tabu search applied to global optimization. European Journal of Operational Research, 123:256-270, 2000.
27. R. Chelouah and P. Siarry. Genetic and Nelder-Mead algorithms hybridized for a more accurate global optimization of continuous multiminima functions. European Journal of Operational Research, 148:335-348, 2003.
28. C. A. Coello Coello. An Updated Survey of GA-Based Multiobjective Optimization Techniques. ACM Computing Surveys, 32(2):109-143, 2000.
29. A. Colorni, M. Dorigo, and V. Maniezzo. Metaheuristics for high school timetabling. Computational Optimization and Applications, 9(3):275-298, 1998.
30. D. T. Connolly. An improved annealing scheme for the QAP. European Journal of Operational Research, 46:93-100, 1990.
31. J. S. de Bonet, C. L. Isbell Jr., and P. Viola. MIMIC: Finding optima by estimating probability densities. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 7 (NIPS7), pages 424-431. MIT Press, Cambridge, MA, 1997.
32. K. A. DeJong. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis, University of Michigan, Ann Arbor, MI, 1975. Dissertation Abstracts International 36(10), 5140B, University Microfilms Number 76-9381.
33. F. Della Croce and V. T'kindt. A Recovering Beam Search algorithm for the one machine dynamic total completion time scheduling problem. Journal of the Operational Research Society, 53(11):1275-1280, 2002.
34. M. Dell'Amico, A. Lodi, and F. Maffioli. Solution of the Cumulative Assignment Problem with a well-structured Tabu Search method. Journal of Heuristics, 5:123-143, 1999.
35. M. L. den Besten, T. Stützle, and M. Dorigo. Design of iterated local search algorithms: An example application to the single machine total weighted tardiness problem. In E. J. W. Boers, J. Gottlieb, P. L. Lanzi, R. E. Smith, S. Cagnoni, E. Hart, G. R. Raidl, and H. Tijink, editors, Applications of Evolutionary Computing: Proceedings of EvoWorkshops 2001, volume 2037 of Lecture Notes in Computer Science, pages 441-452. Springer Verlag, Berlin, Germany, 2001.
36. J.-L. Deneubourg, S. Aron, S. Goss, and J.-M. Pasteels. The self-organizing exploratory pattern of the Argentine ant. Journal of Insect Behaviour, 3:159-168, 1990.
37. J. Denzinger and T. Offerman. On cooperation between evolutionary algorithms and other search paradigms. In Proceedings of the Congress on Evolutionary Computation - CEC'1999, pages 2317-2324, 1999.
38. G. Di Caro and M. Dorigo. AntNet: Distributed stigmergetic control for communications networks. Journal of Artificial Intelligence Research, 9:317-365, 1998.
39. M. Dorigo. Optimization, Learning and Natural Algorithms (in Italian). PhD thesis, Dipartimento di Elettronica, Politecnico di Milano, Italy, 1992.
40. M. Dorigo, G. Di Caro, and L. M. Gambardella. Ant algorithms for discrete optimization. Artificial Life, 5(2):137-172, 1999.
41. M. Dorigo and L. M. Gambardella. Ant Colony System: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53-66, 1997.
42. M. Dorigo, V. Maniezzo, and A. Colorni. Ant System: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 26(1):29-41, 1996.
43. M. Dorigo and T. Stützle. http://www.metaheuristics.net/, 2000. Visited in January 2003.
44. M. Dorigo and T. Stützle. The ant colony optimization metaheuristic: Algorithms, applications and advances. In F. Glover and G. Kochenberger, editors, Handbook of Metaheuristics, volume 57 of International Series in Operations Research & Management Science, pages 251-285. Kluwer Academic Publishers, Norwell, MA, 2002.
45. M. Dorigo and T. Stützle. Ant Colony Optimization. MIT Press, Cambridge, MA, 2004.
46. J. Dréo and P. Siarry. A new ant colony algorithm using the heterarchical concept aimed at optimization of multiminima continuous functions. In M. Dorigo, G. Di Caro, and M. Sampels, editors, Proceedings of ANTS 2002 - From Ant Colonies to Artificial Ants: Third International Workshop on Ant Algorithms, volume 2463
REFERENCES
35
of Lecture Notes in Computer Science, pages 2 16-221. Springer Verlag, Berlin, Germany, 2002. 47. G. Dueck. New Optimization Heuristics. Journal of Computational Physics, 104:86-92, 1993. 48. G. Dueck and T. Scheuer. Threshold Accepting: A General Purpose Optimization Algorithm Appearing Superior to Simulated Annealing. Journal of Computational Physics, 90:161-175, 1990. 49. A. E. Eiben, P.-E. RauC, and Z. Ruttkay. Genetic algorithms with multi-parent recombination. In Y.Davidor, H.-P. Schwefel, and R. Manner, editors, Proceedings of the 3rd Conference on Parallel Problem Solvingfrom Nature, volume 866 of Lecture Notes in Computer Science, pages 78-87, Berlin, 1994. Springer. 50. A. E. Eiben and Z. Ruttkay. Constraint satisfaction problems. In T. Back, D. Fogel, and M. Michalewicz, editors, Handbook ofEvolutionary Computation. Institute of Physics Publishing Ltd, Bristol, UK, 1997. 5 1. A. E. Eiben and C. A. Schippers. On evolutionary exploration and exploitation. Fundamenta Informaticae, 35: 1-16, 1998. 52. W. Feller. An Introduction to Probability Theory and its Applications. John Whiley, 1968. 53. T. A. Feo and M.G. C. Resende. Greedy randomized adaptive search procedures. Journal of Global optimization, 6~109-133, 1995. 54. P. Festa and M. G. C. Resende. GRASP: An annotated bibliography. In C. C. Ribeiro and P. Hansen, editors, Essays and Surveys on Metaheuristics, pages 325-367. Kluwer Academic Publishers, 2002. 55. M. Fleischer. Simulated Annealing: past, present and future. In C. Alexopoulos, K. Kang, W. R. Lilegdon, and G . Goldsman, editors, Proceedings ofthe 1995 Winter Simulation Conference, pages 155-161, 1995. 56. F. Focacci, F. Laburthe, and A. Lodi. Local Search and Constraint Programming. In F. Glover and G. Kochenberger, editors, Handbook OfMetaheuristics, volume 57 of International Series in Operations Research & Management Science. Kluwer Academic Publishers, Norwell, MA, 2002. 57. D. B. Fogel. An introduction to simulated evolutionary optimization. ZEEE Transactions on Neural Networks, 5( 1):3-14, January 1994. 58. G. B. Fogel, V. W. Porto, L).G. Weekes, D. B. Fogel, R. H. Griffey, J. A. McNeil, E. Lesnik, D. J. Ecker, and R. Sampath. Discovery of RNA structural elements using evolutionary computation. Nucleic Acids Research, 30(23):53 10-53 17, 2002.
36
AN INTRODUCTION T O METAHEURISTIC TECHNIQUES
59. L. J. Fogel. Toward inductive inference automata. In Proceedings of the International Federationfor Information Processing Congress, pages 395-399, Munich, 1962. 60. L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence through Simulated Evolution. Wiley, 1966. 61. C. Fonlupt, D. Robilliard, P. Preux, and E. G. Talbi. Fitness landscapes and performance of meta-heuristics. In S. Vol3, S. Martello, 1. Osman, and C. Roucairol, editors, Meta-heuristics: udvunces and trends in local search paradigms for optimization. Kluwer Academic, 1999. 62. L. M. Gambardella and M. Dorigo. Ant Colony System hybridized with a new local search for the sequential ordering problem. INFORMS Journal on Computing, 12(3):237-255,2000, 63. M. R. Garey and D. S. Johnson. Computers and intractability; u guide to the theory of NP-completeness. W. H. Freeman, 1979. 64. M. Gendreau, G. Laporte, and J.-Y. Potvin. Metaheuristics for the capacitated VRP. In I? Toth and D. Vigo, editors, The Vehicle Routing Problem, volume 9 of SIAM Monographs on Discrete Mathematics and Applications, pages 129-1 54. SIAM, Philadelphia, 2002. 65. F. Glover. Heuristics for Integer Programming Using Surrogate Constraints. Decision Sciences, 8: 156166, 1977. 66. F. Glover. Future paths for integer programming and links to artificial intelligence. Computers & Operations Research, 131533-549,1986. 67. F. Glover. Tabu Search Part 11. ORSA Journal on Computing, 2(1):4-32, 1990. 68. F. Glover. Scatter search and path relinking. In D. Corne, M. Dorigo, and F. Glover, editors, New Ideas in Optimization, Advanced topics in computer science series. McGraw-Hill, 1999. 69. F. Glover and G. Kochenberger, editors. Handbook of Metaheuristics. Kluwer Academic Publishers, Nonvell, MA, 2002. 70. F. Glover and M. Laguna. Tabu Search. Kluwer Academic Publishers, 1997. 7 1. F. Glover, M. Laguna, and R. Marti. Fundamentals of scatter search and path relinking. Control and Cybernetics, 29(3):653-684,2000. 72. F. Glover, M. Laguna, and R. Marti. Scatter Search and Path Relinking: Advances and Applications. In F. Glover and G. Kochenberger, editors, Handbook of Metaheuristics, volume 57 of International Series in Operations Research & Management Science. Kluwer Academic Publishers, Nonvell, MA, 2002. 73. D. E. Goldberg. Genetic algorithms in search, optimization and machine learning. Addison Wesley, Reading, MA, 1989.
REFERENCES
37
74. D. E. Goldberg and J. Richardson. Genetic algorithms with sharing for multimodal function optimization. In J. J. Grefenstette, editor, Genetic Algorithms and their Applications, pages 4 1-139. Lawrence Erlbaum Associates, Hillsdale, NJ. 1987. 75. V. S. Gordon and D. Whitley. Serial and parallel genetic algorithms as function optimizers. In S. Forrest, editor, Proceedings of the F$h International Conference on Genetic Algorithms, pages 177-1 83. Morgan Kaufmann, 1993. 76. P. Hansen. The steepest ascent mildest descent heuristic for combinatorial programming. In Congress on Numerical Methods in Combinatorial Optimization, Capri, Italy, 1986. 77. P. Hansen and N. MladenoviC. Variable Neighborhood Search for the pMedian. Location Science, 5:207-226, 1997. 78. P. Hansen and N. Mladenovid. Variable neighborhood search: Principles and applications. European Journal of Operational Research, 130:449467,200 1. 79. A. Hertz and D. Kobler. A framework for the description of evolutionary algorithms. European Journal of Operational Research, 126:1-12,2000.
80. T. Hogg and C. P. Williams. Solving the really hard problems with cooperative search. In Proceedings ofAAAI93, pages 213-235. AAAI Press, 1993. 81. J. H. Holland. Adaption in natural and artijicial systems. The University of Michigan Press, Ann Harbor, MI, 1975. 82. L. Ingber. Adaptive simulated annealing (ASA): Lessons learned. Control and Cybernetics - Special Issue on Simulated Annealing Applied to Combinatorial Optimization, 25( 1):33-54, 1996. 83. D. S. Johnson and L. A. McGeoch. The traveling salesman problem: a case study. In E. H. L. Aarts and J. K. Lenstra, editors, Local Search in Combinatorial Optimization, pages 21 5-3 10. John Wiley & Sons, Chichester, UK, 1997. 84. T. Jones. Evolutionary Algorithms, Fitness Landscapes and Search. PhD thesis, Univ. of New Mexico, Albuquerque, NM, 1995. 85. D. E. Joslin and D. P. Clements. "Squeaky Wheel" Optimization. Journal qf Art!iicial Intelligence Research, 10:353-373, 1999. 86. P. Kilby, P. Prosser, and P. Shaw. Guided Local Search for the Vehicle Routing Problem with time windows. In S. Vo13, S. Martello, I. Osman, and C. Roucairol, editors, Meta-heuristics: advances and trends in local search paradigms for optimization, pages 473486. Kluwer Academic, 1999. 87. S. Kirkpatrick, C . D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671-680, 1983.
38
AN INTRODUCTION TO METAHEURISTIC TECHNIQUES
88. M. Laguna, H. R. LourenGo, and R. Marti. Assigning Proctors to Exams with Scatter Search. In M. Laguna and J. L. Gonzalez-Velarde, editors, Computing Toolsfor Modeling, Optimization and Simulation: Interfaces in Computer Science and Operations Research, pages 2 15-227. Kluwer Academic Publishers, Boston, MA, 2000. 89. M. Laguna and R. Marti. GRASP and Path Relinking for 2-Layer Straight Line Crossing Minimization. INFORMS Journal on Computing, 11(1):44-52, 1999.
90. M. Laguna, R. Marti, and V. Campos. Intensification and Diversification with Elite Tabu Search Solutions for the Linear Ordering Problem. Computers and Operations Research, 26:1217-1230, 1999. 9 1. P. Larraiiaga and J. A. Lozano, editors. Estimation OfDistribution Algorithms: A New, Too1,forEvolutionary Cornputation. Kluwer Academic Publishers, Boston, MA, 2002. 92. E. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys. The Travelling Salesman Problem. John Wiley & Sons, New York, NY, 1985. 93. H. R. Lourenqo, 0. Martin, and T. Stiitzle. Iterated local search. In F. Glover and G. Kochenberger, editors, Handbook of Metaheuristics, volume 57 of International Series in Operations Research & Management Science, pages 32 1-353. Kluwer Academic Publishers, Nonvell, MA, 2002. 94. M. Lundy and A. Mees. Convergence of an annealing algorithm. Mathematical Programming, 34(1):111-124, 1986. 95. S.W. Mahfoud. NichingMethods.for Genetic Algorithms. PhD thesis, University of Illinois at Urbana-Champaign, Urbana, IL, 1995. 96. B. Manderick and P. Spiessens. Fine-grained parallel genetic algorithms. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 428433. Morgan Kaufmann, 1989. 97. 0. Martin and S. W. Otto. Combining Simulated Annealing with Local Search Heuristics. Annals of Operations Research, 63:57-75, 1996. 98. 0. Martin, S. W. Otto, and E. W. Felten. Large-step markov chains for the traveling salesman problem. Complex Systems, 5(3):299-326, 1991. 99. M. Mathur, S. B. Karale, S. Priye, V. K. Jyaraman, and B. D. Kulkami. Ant colony approach to continuous hnction optimization. Industrial & Engineering Chemistry Research, 39:38 14-3822, 2000. 100. D. Merkle, M. Middendorf, and H. Schmeck. Ant Colony Optimization for Resource-Constrained Project Scheduling. IEEE Trunsuctions on Evolutionary Computation, 6(4):333-346,2002.
REFERENCES
39
10 1. N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21: 1087-1092, 1953. 102. Z. Michalewicz and M. Michalewicz. Evolutionary computation techniques and their applications. In Proceedings of the IEEE International Conference on Intelligent Processing Systems, pages 14-24, Beijing, China, 1997. Institute of Electrical & Electronics Engineers, Incorporated. 103. P. Mills and E. Tsang. Guided Local Search for solving SAT and weighted MAX-SAT Problems. In Ian Gent, Hans van Maaren, and Toby Walsh, editors, SAT2000, pages 89-106. 10s Press, 2000. 104. M. Mitchell. A n introduction to genetic algorithms. MIT press, Cambridge, MA, 1998. 105. N. Mladenovid and D. UroSevic. Variable Neighborhood Search for the kCardinality Tree. In Proceedings of the Fourth Metaheuristics International Conference, volume 2, pages 743-747,2001. 106. P. Moscato. Memetic algorithms: A Short Introduction. In F. Glover, D. Corne and M. Dorigo, editors, New Ideas in Optimization. McGraw-Hill, 1999.
107. H. Miihlenbein. Evolution in time and space - the parallel genetic algorithm. In G. J. E. Rawlins, editor, Foundations of Genetic Algorithms. Morgan Kaufmann, San Mateo, USA, 1991. 108. H. Miihlenbein and G. Paal3. From recombination of genes to the estimation of distributions. In H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, editors, Proceedings of the 4th Conference on Parallel Problem Solving .from Nature - PPSN IV, volume 1411 of Lecture Notes in Computer Science, pages 178-187, Berlin, 1996. Springer. 109. H. Miihlenbein and H.-M. Voigt. Gene Pool Recombination in Genetic Algorithms. In I. H. Osman and J. P. Kelly, editors, Proc. ofthe Metaheuristics Co&nnce, Nonvell, USA, 1995. Kluwer Academic Publishers. 110. J. A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7:308-3 13, 1965. 111. G. L. Nemhauser and A. L. Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, New York, 1988. 112. E. Nowicki and C. Smutnicki. A fast taboo search algorithm for the job-shop problem. Management Science, 42(2):797-8 13, 1996. 113. I. H. Osman. Metastrategy simulated annealing and tabu search algorithms for the vehicle routing problem. Annals of Operations Research, 41 :421-45 1, 1993. 114. C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization -Algorithms and Complexity. Dover Publications, Inc., New York, 1982.
40
AN INTRODUCTION TO METAHEURISTIC TECHNIQUES
115. M. Pelikan, D. E. Goldberg, and E. Cantu-Paz. BOA: The Bayesian optimization algorithm. In W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, editors, Proceedings ofthe Genetic and Evolutionmy Computation Conference (GECCO-1999), volume I, pages 525-532. Morgan Kaufmann Publishers, San Fransisco, CA, 1999. 116. M. Pelikan, D. E. Goldberg, and F. Lobo. A survey of optimization by building and using probabilistic models. Technical Report No. 99018, IlIiGAL, University of Illinois, 1999. 117. L. S. Pitsoulis and M. G. C. Resende. Greedy Randomized Adaptive Search procedure. In P. M. Pardalos and M. G. C. Resende, editors, HandbookofApplied Optimization, pages 168-183. Oxford University Press, 2002. 118. M. Prais and C. C. Ribeiro. Reactive GRASP: An application to a matrix decomposition problem in TDMA traffic assignment. INFORMS Journal on Computing, 12:164-176, 2000. 119. S. Prestwich. Combining the Scalability of Local Search with the Pruning Techniques of Systematic Search. Annals of Operations Research, 1 15:51-72, 2002. 120. N. J. Radcliffe. Forma Analysis and Random Respectful Recombination. In Proceedings of the Fourth International Conference on Genetic Algorithms, ICGA 1991, pages 222-229. Morgan Kaufmann Publishers, San Mateo, California, 1991. 121. I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nnch Prinzipien der biologischen Evolution. Frommann-Holzboog, 1973. 122. C. R. Reeves, editor. Modern Heuristic Techniquesfor Combinatorial Problems. Blackwell Scientific Publishing, Oxford, England, 1993. 123. C. R. Reeves. Landscapes, operators and heuristic search. Annals of Operations Research, 86:473-490, 1999. 124. C. R. Reeves and J. E. Rowe. Genetic Algorithms: Principles and Perspectives. A Guide to GA Theory. Kluwer Academic Publishers, Boston (USA), 2002. 125. M. G. C. Resende and C. C. Ribeiro. A GRASP for graph planarization. Networks, 29:173-189,1997. 126. C. C. Ribeiro and M. C. Souza. Variable neighborhood search for the degree constrained minimum spanning tree problem. Discrete Applied Mathematics, 118~43-54,2002. 127. P. Shaw. Using Constraint Programming and Local Search Methods to Solve Vehicle Routing Problems. In M. Maher and J.-F. Puget, editors, Principle and Practice o f Constraint Programming - CP98, volume 1520 of Lecture Notes in Computer Science. Springer, 1998.
REFERENCES
41
128. P. Siany, G. Berthiau, F. Durbin, and J. Haussy. Enhanced simulated annealing for globally minimizing functions of many-continuous variables. ACM Transactions on Mathematical Sofmare, 23(2):209-228, 1997. 129. M. Sipper, E. Sanchez, D. Mange, M. Tomassini, A. Perez-Uribe, and A. Stauffer. A Phylogenetic, Ontogenetic, and Epigenetic View of Bio-Inspired Hardware Systems. IEEE Transactions on Evolutionary Computation, 1(1):83-97, 1997. 130. K. Smyth, H. H. Hoos, and T. Stiitzle. Iterated robust tabu search for MAX-SAT. In Proc. of the 16th Canadian Conference on Artijicial Intelligence (AI’2003), volume 2671 of Lecture Notes in Computer Science, pages 129-144. Springer Verlag, 2003. 131. K. Socha. Extended ACO for continuous and mixed-variable optimization. In M. Dorigo, M. Birattari, C. Blum, L. M. Gambardella, F. Mondada, and T. Stiitzle, editors, Proceedings of ANTS 2004 - Fourth International Workshop on Ant Algorithms and Swarm Intelligence, Lecture Notes in Computer Science 3 172, pages 25-36. Springer Verlag, Berlin, Germany, 2004. 132. L. Sondergeld and S. Vo13. Cooperativeintelligent search using adaptive memory techniques. In S. Vo13, S. Martello, I. Osman, and C. Roucairol, editors, Metaheuristics: advances and trends in local search paradigms .for optimization, chapter 21, pages 297-3 12. Kluwer Academic Publishers, 1999. 133. W. M. Spears, K. A. De Jong, T. Back, D. B. Fogel, and H. de Garis. An overview of evolutionary computation. In P. B. Brazdil, editor, Proceedings of the European Conference on Machine Learning (ECML-93), volume 667, pages 442-459, Vienna, Austria, 1993. Springer Verlag. 134. P. F. Stadler. Landscapes and their correlation functions. Journal of Mathematical Chemistry, 20:145, 1996. Also available as SFI preprint 95-07-067. 135. R. Storn and K. Price. Differential evolution -A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11:341-359, 1997. 136. T. Stiitzle. Local Search Algorithms for Combinatorid Problems - Analysis, Algorithms and New Applications. DISK1 - Dissertationen zur Kiinstliken Intelligenz. infix, Sankt Augustin, Germany, 1999. 137. T. Stiitzle and H. H. Hoos. M A X - M Z N Ant System. Future Generation Computer Systems, 16(8):889-914,2000. 138. G. Syswerda. Simulated Crossover in Genetic Algorithms. In L. D. Whitley, editor, Proceedings of the second workshop on Foundations of Genetic Algorithms, pages 239-255, San Mateo, California, 1993. Morgan Kaufmann Publishers. 139. E. D. Taillard. Robust Taboo Search for the Quadratic Assignment Problem. Parallel Computing, 17:443-455, 1991.
42
AN INTRODUCTION T O METAHEURISTIC TECHNIQUES
140. E-G. Talbi. A Taxonomy of Hybrid Metaheuristics. Journal of Heuristics, 8( 5):54 1-564,2002. 141. R. Tanese. Distributed genetic algorithms. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 434439. Morgan Kaufmann, 1989. 142. M. Toulouse, T. G. Crainic, and B. Sa nd. An experimental study of the systemic behavior of cooperative search algorithms. In S. VoR, S. Martello, I. Osman, and C. Roucairol, editors, Meta-heuristics: adimnces and trends in local search paradigms for optimization, chapter 26, pages 373-392. Kluwer Academic Publishers. 1999. 143. D. UroSeviC, J. Brimberg, and N. MladenoviC. Variable neighborhood decomposition search for the edge weighted k-cardinality tree problem. Computers & Operations Research, 3 l(8): 1205-1213,2004. 144. P. J. M. Van Laarhoven, E. H. L. Aarts, and J. K. Lenstra. Job Shop Scheduling by Simulated Annealing. Operations Research, 40: 113-125, 1992. 145. M. D. Vose. The simple genetic algorithm: foundations and theory. MIT Press, Cambridge, MA, 1999. 146. C . Voudouris and E. Tsang. Guided Local Search. European Journal of Operational Research, 113(2):469-499, 1999. 147. A. S. Wade and V. J. Rayward-Smith. Effective local search for the steiner tree problem. Studies in Locational Analysis, 11:219-24 1, 1997. Also in Advances in Steiner Trees, ed. by Ding-Zhu Du, J. M.Smith and J. H. Rubinstein, Kluwer, 2000. 148. M. Yagiura and T. Ibaraki. On metaheuristic algorithms for combinatorial optimization problems. Systems and Computers in Japan, 32(3):33-55,2001.
2
Measuring the Performance of Parallel Metaheuristics
ENRIQUE ALBA, GABRIEL LUQUE
Universidad de Malaga, Spain
2.1 INTRODUCTION
Most optimization tasks found in real-world applications impose several constraints that usually do not allow the use of exact methods. The complexity of these problems (they are often NP-hard [13]) or the limited resources available to solve them (time, memory) have made metaheuristics a major field in operations research. In these cases, metaheuristics provide optimal or suboptimal feasible solutions in a reasonable time. Although the use of metaheuristics significantly reduces the time of the search process, the high dimensionality of many tasks will always pose problems and result in time-consuming scenarios for industrial problems. Therefore, parallelism is an approach not only to reduce the resolution time but also to improve the quality of the solutions provided. The latter holds because parallel algorithms usually run a different search model with respect to sequential ones [4]. Unlike exact methods, where time efficiency is the main measure of success, there are two chief issues in evaluating parallel metaheuristics: how fast solutions can be obtained, and how far they are from the optimum. We can distinguish between two different approaches for analyzing metaheuristics: a theoretical analysis (worst-case analysis, average-case analysis, ...) or an experimental analysis. Several authors [16, 20] have developed theoretical analyses of some importance for a number of heuristics and problems. But their difficulty, which makes it hard to obtain results for most realistic problems and algorithms, severely limits their range of application. As a consequence, most metaheuristics are evaluated empirically in an ad hoc manner. An experimental analysis usually consists in applying the developed algorithms to a collection of problem instances and comparatively reporting the observed solution quality and consumed computational resources (usually time). Other researchers [5, 26] have tried to offer a kind of methodological framework to deal with the experimental
evaluations of heuristics, which mainly motivates this chapter. Important aspects of an evaluation are the experimental design, finding good sources of test instances, measuring the algorithmic performance in a meaningful way, sound analysis, and a clear presentation of results. Due to the great difficulty of doing all this correctly, the main issues of the experimental evaluation are simplified here to highlight some guidelines for designing experiments and reporting on the experimentation results. An excellent algorithmic survey about simulations and statistical analysis is given in [24]. In that paper, McGeoch includes an extensive set of basic references on statistical methods and a general guide for designing experiments. In this chapter, we focus on how the experiments should be performed and how the results must be reported in order to make fair comparisons between parallel metaheuristics. In particular, we are interested in revising, proposing, and applying parallel performance metrics and statistical analysis guidelines to ensure that our conclusions are correct. This chapter is organized as follows. The next section briefly summarizes some parallel metrics such as the speedup and related performance measures. Section 2.3 discusses how to report results on parallel metaheuristics. Then, in the next section, we perform several practical experiments to illustrate the importance of the metric in the conclusions. Finally, some concluding remarks are outlined in Section 2.5.
2.2 PARALLEL PERFORMANCE MEASURES
There are different metrics to measure the performance of parallel algorithms. In the first subsection we discuss in detail the most common measure, i.e., the speedup, and address its meaningful utilization in parallel metaheuristics. Later, in a second subsection, we summarize some other metrics also found in the literature.

2.2.1 Speedup
The most important measure of a parallel algorithm is the speedup, the ratio between sequential and parallel execution times. Therefore, the definition of time is the first aspect that we must face. In a uni-processor system, a common performance measure is the CPU time to solve the problem; this is the time the processor spends executing algorithm instructions, typically excluding the time for input of problem data, output of results, and system overhead activities. In the parallel case, time is neither the sum of CPU times on each processor nor the largest among them. Since the objective of parallelism is the reduction of the real time, time should definitely include any overhead activity time, because it is the price of using a parallel algorithm. Hence the most prudent choice for measuring the performance of a parallel code is the wall-clock time to solve the problem at hand. This means using the time between starting and finishing the whole algorithm. The speedup compares the serial time against the parallel time to solve a particular problem. If we denote by T_m the execution time for an algorithm using m processors,
the speedup is the ratio between the execution time on a uni-processor, T_1, and the execution time on m processors, T_m:

$$ s_m = \frac{T_1}{T_m} . \qquad (2.1) $$

For non-deterministic algorithms we cannot use this metric directly. For this kind of method, the speedup should instead compare the mean serial execution time against the mean parallel execution time:

$$ s_m = \frac{E[T_1]}{E[T_m]} . \qquad (2.2) $$
With this definition we can distinguish among sublinear speedup (s_m < m), linear speedup (s_m = m), and superlinear speedup (s_m > m). The main difficulty with this measure is that researchers do not agree on the meaning of T_1 and T_m. In his study, Alba [1] distinguishes between several definitions of speedup depending on the meaning of these values (see Table 2.1).

Table 2.1 Taxonomy of speedup measures proposed by Alba [1]
I. Strong speedup
II. Weak speedup
  A. Speedup with solution stop
    1. Versus panmixia
    2. Orthodox
  B. Speedup with predefined effort

Strong speedup (type I) compares the parallel run time against the best-so-far sequential algorithm. This is the most exact definition of speedup but, owing to the difficulty of finding the current most efficient algorithm, most designers of parallel algorithms do not use it. Weak speedup (type II) compares the parallel algorithm developed by a researcher against his/her own serial version. In this case, two stopping criteria for the algorithms exist: solution quality or maximum effort. The author discards the latter because it is against the aim of speedup to compare algorithms that do not yield results of equal accuracy. He proposes two variants of the weak speedup with solution stop: to compare the parallel algorithm against the canonical sequential version (type II.A.1), or to compare the run time of the parallel algorithm on one processor against the run time of the same algorithm on m processors (type II.A.2). In the first case we are comparing two clearly different algorithms. Barr and Hickman [6] proposed a different taxonomy: speedup, relative speedup, and absolute speedup. The speedup measures the ratio between the time of the fastest serial code on a parallel machine and the time of the parallel code using m processors on the same machine. The relative speedup is the ratio of the serial execution time of the parallel code on one processor to the execution time of that code on m processors. This definition is similar to type II.A.2 shown above.
The absolute speedup compares the fastest serial time on any computer with the parallel time on m processors. This metric is the same as the strong speedup defined in [1]. As a conclusion, it is clear that parallel metaheuristics should reach an accuracy similar to that of the sequential ones; only in this case are we allowed to compare times. The times used are mean times: the parallel code on one machine versus the parallel code on m machines. All this defines a sound way of making comparisons, both practical (no best algorithm needed) and orthodox (same codes, same accuracy).
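As a minimal sketch of this recommended comparison, assuming repeated wall-clock measurements of the same parallel code are available, the orthodox weak speedup (type II.A.2) can be computed from mean run times; the function name and timing data below are hypothetical.

from statistics import mean

def weak_speedup(times_on_1, times_on_m):
    # Orthodox weak speedup (type II.A.2): the same parallel code, stopped at the
    # same target solution quality, run on 1 and on m processors.
    # Mean wall-clock times are used because the algorithm is non-deterministic.
    return mean(times_on_1) / mean(times_on_m)

# Hypothetical wall-clock times (in seconds) of five independent runs.
t1 = [120.4, 118.9, 123.0, 119.7, 121.5]
t8 = [16.2, 15.8, 16.5, 15.9, 16.1]
print("s_8 = %.2f" % weak_speedup(t1, t8))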
2.2.1.1 Superlinear Speedup. Although several authors have reported superlinear speedup [8, 23], its existence is still controversial. Anyway, based on their experiences, we can expect to obtain superlinear speedup sometimes. In fact, we can point out several sources of superlinear speedup:

- Implementation source. The algorithm being run on one processor is "inefficient" in some way. For example, if the algorithm uses lists of data, the parallel one can be faster because it manages shorter lists. In addition, parallelism can simplify several operations of the algorithm.

- Numerical source. Since the search space is usually very large, the sequential program may have to search a large portion of it before finding the required solution. The parallel version, on the other hand, may find the solution more quickly due to the change in the order in which the space is searched.

- Physical source. When moving from a sequential to a parallel machine, one often gets more than an increase in CPU power (see Figure 2.1). Other resources, such as memory, cache, etc., may also increase linearly with the number of processors. A parallel metaheuristic may achieve superlinear speedup by taking advantage of these additional resources.
We therefore conclude that superlinear speedup is possible both theoretically and, as a result of empirical tests, in practice, for homogeneous [31] as well as heterogeneous [3, 11] computing networks.

2.2.2 Other Parallel Metrics
Although the speedup is a widely used metric, there exist other measures of the performance of a parallel metaheuristic. The efficiency (Equation 2.3) is a normalization of the speedup and allows different algorithms to be compared (e_m = 1 means linear speedup):

$$ e_m = \frac{s_m}{m} . \qquad (2.3) $$

There exist several variants of the efficiency metric. For example, the incremental efficiency (Equation 2.4) shows the fraction of time improvement obtained by adding another processor, and it is also often used when the uni-processor times are unknown.
Fig. 2.1 Physical source for superlinear speedup. The population does not fit into a single cache, but when run in parallel, the resulting chunks do fit, providing superlinear values of speedup.
This metric has been later generalized (Equation 2.5) to measure the improvement attained by increasing the number of processors from n to m:

$$ ie_m = \frac{(m-1) \cdot T_{m-1}}{m \cdot T_m} , \qquad (2.4) $$

$$ gie_{n,m} = \frac{n \cdot T_n}{m \cdot T_m} . \qquad (2.5) $$
The previous metrics indicate the improvement coming from using additional processing elements, but they do not measure the utilization of the available memory. The scaled speedup (Equation 2.6) addresses this issue and makes it possible to measure the full utilization of the machine's resources:

$$ ss_m = \frac{\text{Estimated time to solve a problem of size } nm \text{ on 1 processor}}{\text{Actual time to solve a problem of size } nm \text{ on } m \text{ processors}} , \qquad (2.6) $$
where n is the size of the largest problem that may be stored in the memory associated with one processor. Its major disadvantage is that performing an accurate estimation of the serial time is difficult, which makes it impractical for many problems. Closely related to the scaled speedup is the scaleup, which is not based on an estimation of the uni-processor time:
$$ su_{m,n} = \frac{\text{Time to solve } k \text{ problems on } m \text{ processors}}{\text{Time to solve } nk \text{ problems on } nm \text{ processors}} . \qquad (2.7) $$
This metric measures the ability of the algorithm to solve an n-times larger job on an n-times larger system in the same time as the original system. Therefore, linear speedup occurs when su_{m,n} = 1. Finally, Karp and Flatt [19] have devised an interesting metric for measuring the performance of any parallel algorithm that can help us to identify much more subtle effects than using the speedup alone. They call it the serial fraction of the algorithm (Equation 2.8):

$$ f_m = \frac{1/s_m - 1/m}{1 - 1/m} . \qquad (2.8) $$
Ideally, the serial fraction should stay constant for an algorithm. If a speedup value is small because the loss of efficiency is due to the limited parallelism of the program, we can still say that the result is good if f_m remains constant for different values of m. On the other hand, a smoothly increasing f_m is a warning that the granularity of the parallel tasks is too fine. A third scenario is possible in which a significant reduction in f_m occurs, indicating something akin to superlinear speedup. If superlinear speedup occurs, then f_m takes a negative value.
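These related metrics are direct to compute once the speedups are known. The following sketch, with invented timing data, derives the efficiency of Equation 2.3 and the serial fraction of Equation 2.8 for several processor counts.

def efficiency(speedup, m):
    # Equation 2.3: e_m = s_m / m.
    return speedup / m

def serial_fraction(speedup, m):
    # Equation 2.8 (Karp-Flatt): f_m = (1/s_m - 1/m) / (1 - 1/m).
    return (1.0 / speedup - 1.0 / m) / (1.0 - 1.0 / m)

# Hypothetical mean wall-clock times (seconds) for 1, 2, 4, 8, and 16 processors.
times = {1: 100.0, 2: 52.0, 4: 27.0, 8: 14.5, 16: 8.0}
for m in (2, 4, 8, 16):
    s_m = times[1] / times[m]
    print("m=%2d  s_m=%5.2f  e_m=%.3f  f_m=%.3f"
          % (m, s_m, efficiency(s_m, m), serial_fraction(s_m, m)))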
2.3 HOW TO REPORT RESULTS
In general, the goal of a publication is to present a new approach or algorithm that works better, in some sense, than existing algorithms. This requires experimental tests to compare the new algorithm with the rest. It is, in general, hard to make fair comparisons between algorithms. The reason is that we can infer different conclusions from the same results depending on the metrics we use. This is especially important for non-deterministic methods. In this section we address the main issues in experimental testing for reporting numerical effort results and the statistical analysis that must be performed to ensure that the conclusions are meaningful. The main steps are shown in Figure 2.2.
Fig. 2.2 Main steps for an experimental design: design the experiment (define goals, choose instances, select factors), measure, and report.
2.3.1 Experimentation
The first choice a researcher must make is the problem domain and the problem instances used to test his/her algorithm. That decision depends on the goals of the experimentation. We can distinguish between two clearly different objectives: (1) optimization and (2) understanding of the algorithms. Optimizing is a commonly practiced sport in designing a metaheuristic that beats others on a given problem or set of problems. This kind of experimental research finishes by establishing the superiority of a given heuristic over others. Researchers should not limit themselves to establishing that one metaheuristic is better than another in some way, but should also investigate why. A very good study of this latter subject can be found, for instance, in [18]. One important decision is the set of instances used. The set of instances must be complex enough to obtain interesting results and must cover a sufficient variety of scenarios to allow the generalization of the conclusions. Problem generators [10] are especially good for a varied and wide analysis. In the next paragraphs we describe the main classes of instances (a more comprehensive classification can be found in [12, 26]).

Real-World Instances. The instances taken from real applications represent a hard testbed for algorithms. Sadly, it is rarely possible to obtain more than a few real data sets for any computational experiment due to proprietary considerations. An alternative is to use random variants of real instances, i.e., the structure of the problem class is preserved, but details are randomly changed to produce new instances. Another approach is to use natural instances [12], which emerge from a specific real-life situation, such as timetabling in a school. This class of instances has the advantage of being freely available. In particular, academic instances must be checked against the existing literature so as not to reinvent the wheel and to avoid using straightforward benchmarks [33].

Standard Instances. This class includes the instances, benchmarks, and problem instance generators that, due to their wide use in experimentation, have become standard in the specialized literature. For example, Reinelt [27] offers TSPLIB, a set of traveling salesman problem test instances, and Uzsoy et al. [32] offer something similar for job scheduling problems. Such libraries allow us to test specific issues of algorithms and also to compare our results against other methods. The OR-Library [7] is a final excellent example, collecting contributions from academia and industry for a large set of problem classes.

Random Instances. Finally, when none of the mentioned sources provides an adequate supply of tests, the remaining alternative is pure random generation. This method is the quickest way to obtain a diverse group of test instances but is also the most controversial.
After having selected a problem or a group of instances, we must design the computational experiments. Generally, the design starts by analyzing the effects of several factors on the algorithm performance. These factors include problem factors, such as problem size, number of constraints, etc., plus algorithmic factors, such as the parameters or components that are used. If the cost of the computer experiments is low, we can do a full factorial design, but in general this is not possible due to the large number of experiments, so we usually need to reduce the number of factor combinations. There is a wide literature on fractional factorial design in statistics, which seeks to assess the same effects as a full analysis without running all the combinations (see, for example, [25]). The next steps in an experimental project are to execute the experiments, choose the measures of performance, and analyze the data. These steps are addressed in the next sections.
2.3.2 Measuring Performance
Once we have chosen the instances that we are going to use and the factors that we are going to analyze, we must select the appropriate measures for the goal of our study. The objective of a metaheuristic is to find a good solution in a reasonable time. Therefore, the choice of performance measures for experiments with heuristics necessarily involves both solution quality and computational effort. Because of the stochastic nature of metaheuristics, a number of independent experiments need to be conducted to gain sufficient experimental data. The performance measures for these heuristics are based on some kind of statistics.

2.3.3 Quality of the Solutions
This is one of the most important issues in evaluating the performance of an algorithm. For instances where the optimal solution is known, one can easily define a measure: the success rate or number of hits. This measure can be defined as the percentage of runs terminating with success (% hits). But this metric cannot be used in all cases. For example, there are problems where the optimal solution is not known at all and a lower/upper bound is also unavailable. In other cases, although the optimum is known, computing it takes too long, and the researcher settles for finding a good approximation in a reasonable time. It is a common practice in metaheuristics for the experiments to have a specific bound on the computational effort (a given number of search space points visited or a maximum execution time). In these cases, when the optimum is not known or not reached, statistical metrics are used instead. The most popular metrics include the mean and the median of the fitness (a measure of the quality of the solution) of the best solutions found over all executions. These values can be calculated for any problem. For each run of a given metaheuristic the best fitness can be defined as the fitness of the best solution at termination. For
parallel metaheuristics it is defined as the fitness of the best solution found by the set of cooperating algorithms. In a problem where the optimum is known, nothing prevents us from using both the % hits and the median/mean of the final quality (or of the effort). Furthermore, all combinations of low/high values can occur for these measures. We can obtain a low number of hits and a high mean/median accuracy; this, for example, indicates a robust method that seldom achieves the optimal solution. The opposite combination is also possible, but it is not common; in that case the algorithm achieves the optimum in several runs, but the remaining runs end with very bad fitness values. In practice, a simple comparison between two averages or medians might not give the same result as a comparison between two statistical distributions. In general, it is necessary to offer additional statistical values such as the variance and to perform a global statistical analysis to ensure that the conclusions are meaningful and not just random noise. We discuss this issue in Section 2.3.5.
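A small sketch of how these quality measures could be gathered over a set of independent runs; the run data and function name are invented for illustration, and the hit rate is only computed when the optimum is known.

from statistics import mean, median

def summarize_runs(best_fitness_per_run, optimum=None):
    # Quality summary over independent runs: best, mean, and median of the best
    # fitness reached in each run, plus the hit rate when the optimum is known.
    summary = {
        "best": max(best_fitness_per_run),
        "mean": mean(best_fitness_per_run),
        "median": median(best_fitness_per_run),
    }
    if optimum is not None:
        hits = sum(1 for f in best_fitness_per_run if f >= optimum)
        summary["% hits"] = 100.0 * hits / len(best_fitness_per_run)
    return summary

# Hypothetical best fitness values of 10 independent runs (maximization).
runs = [430, 428, 430, 426, 429, 430, 427, 430, 425, 428]
print(summarize_runs(runs, optimum=430))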
2.3.4 Computational Effort
While heuristics that produce superior solutions are important, the speed of computation is a key factor. Within metaheuristics, the computational effort is typically measured by the number of evaluations and/or the execution time. In general, the number of evaluations is defined in terms of the number of points of the search space visited. Many researchers prefer the number of evaluations as a way to measure the computational effort since it eliminates the effects of particular implementations, software, and hardware, thus making comparisons independent of such details. But this measure can be misleading in several cases in the field of parallel methods. For example, if some evaluations take longer than others (as in parallel genetic programming [22]) or if an evaluation can be done very quickly, then the number of evaluations does not reflect the algorithm's speed correctly. Also, the traditional goal of parallelism is not the reduction of the number of evaluations but the reduction of time. Therefore, a researcher must usually use both metrics to measure the computational effort. It is very typical to use the average evaluations/time to a solution, defined over those runs that end in a solution (with a predefined quality, maybe different from the optimal one). Sometimes the average evaluations/time to termination is used instead of the average evaluations/time to a solution of a given accuracy. This practice has clear disadvantages: for runs finding solutions of different accuracy, using the total execution time/effort to compare algorithms becomes hard to interpret from the point of view of parallelism. On the contrary, imposing a predefined time/effort and then comparing the solution quality of the algorithms represents an interesting and correct metric; what is incorrect is to also use the run times to compare algorithms, i.e., to measure speedup or efficiency (although works using this kind of metric can be found in the literature).
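In the same spirit, the average effort to a solution should be computed only over the runs that actually reached the required accuracy, as the following sketch (with hypothetical per-run records and field names) illustrates.

from statistics import mean

def average_effort_to_solution(runs, target_fitness):
    # Keep only the runs whose best fitness reached the target accuracy and
    # average their evaluations and wall-clock times to that solution.
    hits = [r for r in runs if r["best"] >= target_fitness]
    if not hits:
        return None  # no run reached the target: effort cannot be reported
    return mean(r["evals"] for r in hits), mean(r["time"] for r in hits)

# Hypothetical records of four independent runs.
runs = [
    {"best": 430, "evals": 61000, "time": 2.1},
    {"best": 418, "evals": 90000, "time": 3.0},   # did not reach the target
    {"best": 425, "evals": 58500, "time": 1.9},
    {"best": 430, "evals": 63200, "time": 2.2},
]
print(average_effort_to_solution(runs, target_fitness=420))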
2.3.5 Statistical Analysis
In most papers, the objective is to prove that a particular heuristic outperforms another one. But, as we said before, the comparison between two average values might be different from the comparison between two distributions. Therefore, statistical methods should be employed wherever possible to indicate the strength of the relations between the different factors and the performance measures [17]. Usually, researchers use t-tests or an analysis of variance (ANOVA) to ensure the statistical significance of the results, i.e., to determine whether an observed effect is likely to be due to sampling errors or not. Several statistical methods and the conditions under which to apply them are shown in Figure 2.3. The first step, in theory, should be to decide between non-parametric and parametric tests; when the data set is non-normally distributed and the number of experiments is below 30, we should use non-parametric methods; otherwise, parametric tests can be used. In fact, most researchers go for parametric tests from the beginning, which in most cases is a good idea if the number of independent experiments is high. Applying a Kolmogorov-Smirnov test is a powerful, accurate, and low cost method to check data normality. However, we must also say that most studies assume normality for data sets of more than 30 or 50 values (an assumption that is formally grounded). The Student t-test is widely used to compare the means of normal data. This method can only be used when there exist two populations or sets of results (e.g., two sets of independent runs of two metaheuristics). In the case of many data sets we must use an ANOVA test, plus a later analysis to compare and sort the means. For a non-normal data set, a wide range of methods exists (see Figure 2.3). All the mentioned methods assume several hypotheses to obtain a correct conclusion, a linear relation between causes and effects being the most common one. The t-test is based on Student's t distribution. It allows us to calculate the statistical significance of two sampled populations with a given confidence level, typically between 95% (p-value < 0.05) and 99% (p-value < 0.01). The underlying notion of ANOVA is to assume that every non-random variation in the experimental observations is due to differences in mean performance at alternative levels of the experimental factors. ANOVA proceeds by estimating each of the various means and partitioning the total sum of squares, or squared deviation from the sample global mean, into separate parts due to each experimental factor and to error. As a general piece of advice, you should include either one or both kinds of tests (if possible) in your scientific communications, since the whole metaheuristic community is slowly moving toward asking for the assessment of the claims made in the experimental phase of your work. The two kinds of analyses, t-test and ANOVA, can only be applied if the source distribution is normal. In metaheuristics, the resulting distribution could well be non-normal. For this case, there is a theorem that can help us. The Central Limit Theorem states that the sum of many identically distributed random variables tends to a Gaussian, so the mean of any set of samples tends to a normal distribution. But in several cases the Central Limit Theorem is not useful. In these cases, there is a host of non-parametric techniques (for example, the sign test) that can and should be employed to sustain the author's arguments, even if the results show no statistical difference between the quality of the solutions produced by the metaheuristics [15].
Fig. 2.3 Application scheme of statistical methods: normal data are analyzed with an analysis of variance plus the Levene test for equality of variances and post hoc mean comparison tests (Student-Newman-Keuls, Bonferroni, or Tamhane), while non-normal data are compared through median-based non-parametric tests such as the Mann-Whitney, Wilcoxon, sign, and Friedman tests, depending on the number and dependence of the data sets.
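The decision scheme of Figure 2.3 can be followed mechanically. The sketch below, assuming SciPy is available and using invented sample data, first checks normality and then chooses between a Student t-test and the non-parametric Mann-Whitney U test.

from scipy import stats

def compare_two_algorithms(a, b, alpha=0.05):
    # Shapiro-Wilk normality check on each sample; if both look normal we
    # compare means with a Student t-test, otherwise we fall back to the
    # non-parametric Mann-Whitney U test. For more than two normal samples,
    # stats.f_oneway(...) would provide the ANOVA test instead.
    normal = all(stats.shapiro(x)[1] > alpha for x in (a, b))
    if normal:
        name, p_value = "t-test", stats.ttest_ind(a, b)[1]
    else:
        name, p_value = "Mann-Whitney U", stats.mannwhitneyu(a, b)[1]
    return name, p_value, p_value < alpha

# Hypothetical final fitness values of two algorithms over 10 runs each.
alg1 = [418, 420, 417, 419, 421, 418, 420, 419, 417, 422]
alg2 = [416, 415, 418, 414, 417, 416, 415, 417, 416, 414]
print(compare_two_algorithms(alg1, alg2))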
2.3.6 Reporting Results
The final step in an experimental design is to document the experimental details and findings and to communicate them to the international community. In the next paragraphs, we present the most important points that should be met.

Reproducibility. One necessary part of every presentation should be background on how the experiment was conducted. Reproducibility is an essential part of scientific research, and experimental results that cannot be independently verified are given little credence in the scientific community. Hence, the algorithm and its implementation should be described in sufficient detail to allow replication, including any parameters (probabilities, constants, ...), the problem encoding, the pseudo-random number generation, etc. The source and characteristics of the problem instances should also be documented. In addition, the many computing environment factors that can influence the empirical performance of a method should be documented as well: the number, types, and
speeds of processors, the size and configuration of memories, the communication network, the operating system, etc.

Presenting Results. A final important detail is the presentation of the results. The best way to support your conclusion is to display your data in such a way as to highlight the trends they exhibit, the distinctions made, and so forth. There are many good display techniques depending on the types of points one wants to make (for example, see [9] or [30]). Tables by themselves are usually a very inefficient way of showing results. Hence, if there is any graphical way to summarize the data and reveal a message or lesson, it is almost always to be preferred to a table alone. On the other hand, although pictures can often tell your story more quickly, they are usually a poor way of presenting the details of your results. Therefore, a scientific paper should contain both pictures and tables.

2.4 ILLUSTRATING THE INFLUENCE OF MEASURES
In this section we perform several experimental tests to show the importance of the reported performance measures in the conclusions. We use several parallel genetic algorithms and one parallel simulated annealing to solve the well-known MAXSAT problem. Before beginning with the examples, we give a brief description of the algorithms, the problem, and the configuration.

The Algorithms. Genetic algorithms (GAs) [14] make use of a randomly generated population of solutions. The initial population is iteratively enhanced through a natural evolution process. At each generation of this process, the whole population, or a part of it, is replaced by newly generated individuals (often the best ones). In the experiments, we use three different parallel models of GA: independent runs (IR), a distributed GA (dGA), and a cellular GA (cGA) (see Figure 2.4). In the first model, a pool of processors is used to speed up the execution of separate copies of a sequential algorithm, just because independent runs can be made more rapidly by using several processors than by using a single one. In dGAs [29], the population is structured into smaller subpopulations relatively isolated from one another. The key feature of this kind of algorithm is that individuals within a particular subpopulation (or island) can occasionally migrate to another one. The parallel cGA [28] paradigm normally deals with a single conceptual population, where each processor holds just a few individuals. The main characteristic of this model is the structuring of the population into neighborhoods, so that individuals may only interact with their neighbors. Also, we consider a local search method, simulated annealing (SA). An SA [21] is a stochastic technique that can be seen as a hill-climber with an internal mechanism to escape from local optima. To this end, moves that increase the energy function being minimized are accepted with a decreasing probability.
Fig. 2.4 GA models: (a) a cellular GA and (b) a distributed GA.
In our parallel SA there exist multiple asynchronous component SAs. Each component SA periodically exchanges the best solution found (cooperation phase) with its neighbor SA in the ring.
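A tiny sketch of the kind of acceptance rule described above for a minimized energy function; the Metropolis-style criterion and the cooling loop shown here are illustrative and not the exact implementation used in the experiments.

import math
import random

def accept(delta_energy, temperature):
    # Always accept improving moves; accept worsening moves with a probability
    # that decreases as the temperature drops.
    if delta_energy <= 0:
        return True
    return random.random() < math.exp(-delta_energy / temperature)

# Illustrative proportional cooling: T <- 0.9 * T after each cooling step.
temperature = 10.0
for step in range(3):
    print(step, temperature, accept(delta_energy=1.5, temperature=temperature))
    temperature *= 0.9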
The Problem. The satisfiability (SAT) problem is commonly recognized as a fundamental problem in artificial intelligence applications, automated reasoning, mathematical logic, and related fields. MAXSAT [13] is a variant of this general problem. Formally, this problem can be formulated as follows: given a formula f of the propositional calculus in conjunctive normal form (CNF) with m clauses and n variables, the goal is to determine whether or not there exists an assignment t of truth values to the variables such that all clauses are satisfied. In the experiments we use several instances generated by De Jong et al. [10]. These instances are composed of 100 variables and 430 clauses (f* (optimum) = 430).

Configuration. No special analysis has been made to determine the optimum parameter values for each algorithm. We use a simple representation for this problem: a binary string of length n (the number of variables) where each digit corresponds to a variable. A value of 1 means that the corresponding variable is true, and a value of 0 means that it is false. In our GA methods, the whole population is composed of 800 individuals, and each processor has a population of 800/m individuals, where m is the number of processors. All the GAs use the one-point crossover operator (with probability 0.7) and the bit-flip mutation operator (with probability 0.2). In distributed GAs, the migration occurs in a unidirectional ring manner, sending one single randomly chosen individual to the neighbor subpopulation. The target population incorporates this individual only if it is better than its current worst solution. The migration step is performed every 20 iterations in every island in an asynchronous way. For the SA method, we use a proportional update of the temperature, and the cooling factor is set to 0.9. The cooperation phase is performed every 10,000 evaluations. All experiments are performed on Pentium 4 machines at 2.8 GHz linked by a Fast Ethernet communication network. We performed 100 independent runs of each experiment to ensure statistical significance.
All the algorithms have been implemented using the MALLBA Library [2]. This library is publicly available at http://neo.lcc.uma.es/mallba/easy-mallba/index.html.
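To make the encoding and operators concrete, the following sketch shows a MAXSAT fitness function (number of satisfied clauses) together with one-point crossover and bit-flip mutation on the binary string representation; the clause format and helper names are our own and not taken from the MALLBA library.

import random

def maxsat_fitness(assignment, clauses):
    # assignment: list of 0/1 values, one per variable.
    # clauses: list of clauses, each a list of literals; literal +i means
    # variable i must be true, -i means variable i must be false (1-based).
    satisfied = 0
    for clause in clauses:
        if any((assignment[abs(l) - 1] == 1) == (l > 0) for l in clause):
            satisfied += 1
    return satisfied

def one_point_crossover(parent1, parent2):
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:]

def bit_flip_mutation(individual, probability=0.2):
    return [1 - bit if random.random() < probability else bit for bit in individual]

# Tiny illustrative instance: 3 variables, 3 clauses.
clauses = [[1, -2], [2, 3], [-1, 3]]
individual = [1, 0, 1]
print(maxsat_fitness(individual, clauses))  # -> 3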
In the following subsections we present several examples of the utilization of the performance measures.

2.4.1 Example 1: On the Absence of Information
We begin our tests by showing the results of an SA with different numbers of processors to solve an instance of MAXSAT. The results can be seen in Table 2.2. The values shown are the percentage of executions that found the optimal value (% hit column), the fitness of the best solution (best column), the average fitness (avg column), the number of evaluations (# evals column), and the running time (time column).
Table 2.2 Results of Example 1

Algorithm   % hit   best   avg     # evals   time
SA8         0%      426    418.3   -         -
SA16        0%      428    416.1   -         -
In this example, the algorithms did not find an optimal solution in any execution. Hence we cannot use the percentage of hits to compare them: we must resort to another metric to compare the quality of the solutions. We could use the best solution found, but that single value does not represent the actual behavior of the algorithms. In this case the best measure to compare the quality of the results is the mean fitness. We might then conclude that the SA with 8 processors is better than the same one with 16 processors. But before stating such a conclusion we need to perform a statistical test to ensure the significance of this claim. To assess the statistical significance of the results we performed 100 independent runs (30 independent runs is usually regarded as a minimum in heuristics). Also, we computed a Student t-test analysis so that we would be able to distinguish meaningful differences in the average values. A p-value below 0.05 is considered significant, indicating a 95% confidence level in the results. In this case, the resulting p-value is 0.143, i.e., there is no significant difference between the results. This comes as no surprise, since they are the same algorithm, and the behavior should be similar, while only the time should be affected by the change in the number of processors. Thus, if we had elaborated on the superiority of one of them, we would have been mistaken. In this example, it is not fair to measure the computational effort, since the algorithms do not achieve an optimum. In the next example, we will use a different stopping criterion that allows us to compare the computational effort.
2.4.2 Example 2: Relaxing the Optimum Helps
Again, we show in Table 2.3 the results of the same SA as in the previous section, but for this test we consider as an optimum any solution with a fitness value greater than 420 (the global optimum has a fitness value of 430).

Table 2.3 Results of Example 2

Algorithm   % hit   best   avg     # evals   time
SA8         60%     426    418.3   60154     2.01
SA16        58%     428    416.1   67123     1.06
For this example, we do not compare the quality of the solutions, since there is no statistical difference, and therefore we focus on the computational effort. The algorithm with 8 processors, SA8, performs a slightly smaller number of evaluations than SA16, but the difference is not significant (the p-value is larger than 0.05). On the other hand, the reduction in the execution time is significant (p-value = 1.4e-5). Thus we could have stated at first glance that SA8 is numerically more efficient than SA16, but the statistics tell us that no significant improvement can be claimed. However, we can state that SA16 is better than SA8 from a time efficiency point of view. These results are somewhat expected: the behavior (quality of solutions and number of evaluations) of both methods is similar, but the execution time is reduced when the number of processors is increased. Wrongly concluding on SA8's numerical superiority is thus avoided thanks to the use of such statistical tests.
2.4.3 Example 3: Clear Conclusions Do Exist
Now, let us compare two different algorithms: a parallel GA using the independent runs model and the SA of the previous examples. Both the GA and the SA are distributed on 16 machines. As in the first example, neither of the methods reaches the optimum solution in any independent run. Therefore, we consider the same relaxed optimum as in the second example.

Table 2.4 Results of Example 3

Algorithm   % hit   best   avg     # evals   time
IR16        37%     424    408.3   85724     1.53
SA16        58%     428    416.1   67123     1.06
Table 2.4 shows a summary of the results for this experiment. From this table we can infer that SA16 is better than IR16 in all aspects (solution quality, number of evaluations, and time). And, this time, these conclusions are all supported by statistical tests, i.e., their p-values are all smaller than 0.05. Although SA16 is better than IR16, neither of them is adequate for this problem, since they are both quite far from the optimum.
2.4.4 Example 4: Meaningfulness Does Not Mean Clear Superiority
Now, we compare the results obtained with the same parallel GA (independent runs model) using two, four, and eight processors. The overall results of this example are shown in Table 2.5.
Table 2.5 Results of Example 4

Algorithm   % hit   best   avg     # evals   time    speedup
Seq.        60%     430    419.8   97671     19.12   -
IR2         41%     430    417.7   92133     9.46    1.98
IR4         20%     430    412.2   89730     5.17    3.43
IR8         7%      430    410.5   91264     2.49    7.61
The statistical tests are always positive, i.e., all results are significantly different from the others. We can therefore conclude that the IR paradigm allows the search time to be reduced and obtains a very good, nearly linear speedup, but its results are worse than those of the serial algorithm, since its percentage of hits is lower. This might be surprising, since the algorithm is the same in all cases and the expected behavior should be similar. The reason is that, as we increase the number of processors, the population size decreases and the algorithm is not able to keep enough diversity to find the global solution.
2.4.5 Example 5: Speedup, Do Not Compare Apples Against Oranges

In this case, we show an example of speedup. In Table 2.6 we show the results for a sequential GA against a distributed cellular GA (cGA) with different numbers of processors. In this example we focus on the time column. The ANOVA test for this column is always significant (p-value = 0.0092).

Table 2.6 Results of Example 5
Algorithm   %hit   best   avg     # evals   time
Seq.        60%    430    421.4   97671     19.12
cGA2        85%    430    427.4   92286     10.40
cGA4        83%    430    426.7   94187     5.79
cGA8        83%    430    427.1   92488     2.94
cGA16       84%    430    427.0   91280     1.64
As we do not know the best algorithm for this MAXSAT instance, we cannot use the strong speedup (see Table 2.1). Then, we must use the weak definition of speedup. With the data of Table 2.6, we can measure the speedup with respect to the canonical serial version (panmixia columns of Table 2.7). But it is not fair to compute speedup against a sequential GA, since we would be comparing different algorithms (the parallel code is that of a cGA). Hence, we turn to comparing the same algorithm (the cGA) both in sequential and in parallel (the cGA on 1 versus n processors). This speedup is known
as orthodox speedup. The speedup, the efficiency, and the serial fraction using the orthodox definition are shown in the orthodox columns of Table 2.7. The orthodox values are slightly better than the panmictic ones, but the trend in both cases is similar (in other cases the trend could even be different); the speedup is quite high in this case, but it is always sublinear, and it slightly moves away from linear speedup as the number of CPUs increases. That is, when we increment the number of CPUs we have a moderate loss of efficiency. The serial fraction is quite stable, as one can expect in a well-parallelized algorithm, although we can notice a slight reduction of this value as the number of CPUs increases, indicating that the granularity of the parallel task is too fine, and the loss of efficiency is mainly due to the limited parallelism of the program behavior itself.
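For reference, the three quantities reported in Table 2.7 can be computed as follows; the serial fraction is written here as the Karp-Flatt metric [19], which is consistent with the values in the table, and the worked numbers simply re-derive the orthodox cGA16 row:

    \[
      s_m = \frac{T_1}{T_m}, \qquad
      e_m = \frac{s_m}{m}, \qquad
      f_m = \frac{1/s_m - 1/m}{1 - 1/m}
    \]
    % Orthodox cGA16 row (m = 16 processors, s_16 = 12.01):
    %   e_16 = 12.01 / 16                          = 0.751
    %   f_16 = (1/12.01 - 1/16) / (1 - 1/16)       = 0.022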
Table 2.7 Speedup and Efficiency
                      panmixia                                 orthodox
Algorithm   speedup   efficiency   serial fract.    speedup   efficiency   serial fract.
cGA2        1.83      0.915        0.093            1.91      0.955        0.047
cGA4        3.30      0.825        0.070            3.43      0.857        0.055
cGA8        6.50      0.812        0.032            6.77      0.846        0.026
cGA16       11.65     0.728        0.025            12.01     0.751        0.022
2.4.6 Example 6: Predefined Effort Could Hinder Clear Conclusions
In the previous examples the stopping criterion was based on the quality of the final solution. In this experiment, the termination condition is based on a predefined effort (60,000 evaluations). Previously, we used the fitness of the best solution and the average fitness to measure the quality of the solutions. We now turn to different metrics, namely the median of the final best fitness and the average of the final population mean fitness of each independent run (mm). In Table 2.8 we list all these metrics for a sequential GA and a distributed GA (dGA) using four processors.

Table 2.8 Results of Example 6
Algorithm   %hit   best   avg     median   mm
Seq.        0%     418    406.4   401      385.8
dGA4        0%     410    402.3   405      379.1
Using a predefined effort as a stopping criterion is not always a good idea in parallel metaheuristics if one wishes to measure speedup: in this case, for example, the algorithms could not find an optimal solution in any execution. If we analyze the best solution found or the two averages (average of the best fitness, avg column, and average of the mean fitness, mm column), we can conclude that the sequential version is more accurate than the parallel GA. But the median value of the dGA is larger than that of the serial algorithm, indicating that the sequential algorithm obtained several very good solutions while the rest were of moderate quality, whereas the parallel GA had a more stable behavior. With this stopping criterion, it is hard to obtain a
clear conclusion if the algorithm is not stable. In fact, a normal distribution of the resulting fitness is hardly found in many practical applications, a disadvantage for simplistic statistical claims. Also, we can notice that the avg data are always better than the mm values. This is common sense, since the final best fitness is always larger than the final mean fitness (or equal when all the individuals converge). Finally, a statistical analysis verified the significance of these data.

2.5 CONCLUSIONS

This chapter considered the issue of reporting experimental research with parallel metaheuristics. Since this is a difficult task, the main issues of an experimental design have been highlighted. We do not enter the complex and deep field of pure statistics in this chapter (for space reasons), but just present some important ideas to guide researchers in their work. As could be expected, we have focused on parallel performance metrics that allow parallel approaches to be compared against other techniques of the literature. Besides, we have shown the importance of statistical analysis to support our conclusions, also in the parallel metaheuristics field. Finally, we have performed several experimental tests to illustrate the influence and use of the many metrics described in the chapter.

Acknowledgments
The authors acknowledge partial funding by the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project).
REFERENCES
1. E. Alba. Parallel evolutionary algorithms can achieve super-linear performance. Information Processing Letters, 82:7-13, 2002.
2. E. Alba and the MALLBA Group. MALLBA: A Library of Skeletons for Combinatorial Optimisation. In R. Feldmann and B. Monien, editors, Proceedings of the Euro-Par, pages 927-932, Paderborn, Germany, 2002. Springer-Verlag.
3. E. Alba, A.J. Nebro, and J.M. Troya. Heterogeneous Computing and Parallel Genetic Algorithms. Journal of Parallel and Distributed Computing, 62:1362-1385, 2002.
4. E. Alba and J.M. Troya. A survey of parallel distributed genetic algorithms. Complexity, 4(4):31-52, 1999.
5. R.S. Barr, B.L. Golden, J.P. Kelly, M.G.C. Resende, and W.R. Stewart. Designing and Reporting on Computational Experiments with Heuristic Methods. Journal of Heuristics, 1(1):9-32, 1995.
6. R.S. Barr and B.L. Hickman. Reporting Computational Experiments with Parallel Algorithms: Issues, Measures, and Experts' Opinions. ORSA Journal on Computing, 5(1):2-18, 1993.
7. J.E. Beasley. OR-Library: distributing test problems by electronic mail. Journal of the Operational Research Society, 41(11):1069-1072, 1990.
8. T.C. Belding. The distributed genetic algorithm revisited. In L.J. Eshelman, editor, 6th International Conference on Genetic Algorithms, pages 114-121, Los Altos, CA, 1995. Morgan Kaufmann.
9. W.S. Cleveland. Elements of Graphing Data. Wadsworth, Monterey, CA, 1985.
10. K.A. De Jong, M.A. Potter, and W.M. Spears. Using Problem Generators to Explore the Effects of Epistasis. In T. Back, editor, Proceedings of the Seventh International Conference on Genetic Algorithms (ICGA), pages 338-345. Morgan Kaufmann, 1997.
11. V. Donaldson, F. Berman, and R. Paturi. Program speedup in a heterogeneous computing network. Journal of Parallel and Distributed Computing, 21:316-322, 1994.
12. A.E. Eiben and M. Jelasity. A critical note on experimental research methodology in EC. In Congress on Evolutionary Computation 2002, pages 582-587. IEEE Press, 2002.
13. M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, San Francisco, 1979.
14. D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
15. B. Golden and W. Stewart. Empirical Analysis of Heuristics. In E. Lawler, J. Lenstra, A. Rinnooy Kan, and D. Shmoys, editors, The Traveling Salesman Problem, a Guided Tour of Combinatorial Optimization, pages 207-249, Chichester, UK, 1985. Wiley.
16. R.L. Graham. Bounds on multiprocessor timing anomalies. SIAM Journal of Applied Mathematics, 17:416-429, 1969.
17. C. Hervás. Análisis estadístico de comparación de algoritmos o heurísticas. Personal communication, University of Córdoba, Spain, 2004.
18. J.N. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1(1):33-42, 1995.
19. A.H. Karp and H.P. Flatt. Measuring parallel processor performance. Communications of the ACM, 33(5):539-543, 1990.
20. R.M. Karp. Probabilistic analysis of partitioning algorithms for the traveling salesman problem in the plane. Mathematics of Operations Research, 2:209-224, 1977.
21. S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by Simulated Annealing. Science, 220(4598):671-680, 1983.
22. J.R. Koza. Genetic Programming. The MIT Press, Massachusetts, 1992.
23. S.C. Lin, W.F. Punch, and E.D. Goodman. Coarse-grain parallel genetic algorithms: Categorization and a new approach. In Sixth IEEE Parallel and Distributed Processing, pages 28-37, 1994.
24. C. McGeoch. Towards an experimental method for algorithm simulation. INFORMS Journal on Computing, 8(1):1-15, 1996.
25. D.C. Montgomery. Design and Analysis of Experiments. John Wiley, New York, 3rd edition, 1991.
26. R.L. Rardin and R. Uzsoy. Experimental Evaluation of Heuristic Optimization Algorithms: A Tutorial. Journal of Heuristics, 7(3):261-304, 2001.
27. G. Reinelt. TSPLIB - A travelling salesman problem library. ORSA Journal on Computing, 3:376-384, 1991.
28. P. Spiessens and B. Manderick. A massively parallel genetic algorithm. In R.K. Belew and L.B. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms (ICGA), pages 279-286. Morgan Kaufmann, 1991.
29. R. Tanese. Distributed genetic algorithms. In J.D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms (ICGA), pages 434-439. Morgan Kaufmann, 1989.
30. E.R. Tufte. The Visual Display of Quantitative Information. Graphics Press, 1993.
31. UEA CALMA Group. CALMA project report 2.4: Parallelism in combinatorial optimisation. Technical report, School of Information Systems, University of East Anglia, Norwich, UK, September 18, 1995.
32. R. Uzsoy, E. Demirkol, and S.V. Mehta. Benchmarks for shop scheduling problems. European Journal of Operational Research, 109:137-141, 1998.
33. D. Whitley. An overview of evolutionary algorithms: practical issues and common pitfalls. Information and Software Technology, 43:817-831, 2001.
3
New Technologies in Parallelism
ENRIQUE ALBA, ANTONIO J. NEBRO
Universidad de Málaga, Spain
3.1 INTRODUCTION

Parallel computing is a continuously evolving discipline, and this evolution involves both hardware and software. In this chapter, we intend to provide a review of software issues, focusing mainly on tools, languages, and systems, in order to find out how they can match the requirements for implementing parallel optimization heuristic algorithms. Nevertheless, it is also necessary to understand some hardware concepts related to parallel computer architectures (Section 3.2). Then, in Section 3.3, we discuss issues related to parallel programming in shared-memory and distributed-memory systems. An analysis of tools for programming both kinds of systems is included in Sections 3.4 and 3.5. Finally, Section 3.6 discusses the global features of the presented tools and Section 3.7 summarizes the contents of the chapter.
3.2 PARALLEL COMPUTER ARCHITECTURES: AN OVERVIEW

Various classification schemes for parallel computers have been defined over the years. The most commonly used taxonomy is the proposal of Flynn made in 1972, but nowadays it is not accurate enough to describe all the possible parallel architectures. Consequently, other classifications have been presented, although many of them are extensions of Flynn's classification. The model of Flynn is based on the notion of instruction and data streams. There are four possible combinations, conventionally called SISD (Single Instruction, Single Data stream), SIMD (Single Instruction, Multiple Data stream), MISD (Multiple Instruction, Single Data stream), and MIMD (Multiple Instruction, Multiple Data stream). This scheme is illustrated in Figure 3.1. We describe each model next:
• The SISD architecture corresponds to the classical mono-processor personal computer or workstation.
Fig. 3.1 Flynn's taxonomy: the instruction stream (single or multiple) crossed with the data stream (single or multiple) yields the SISD, SIMD, MISD, and MIMD classes.
• In the SIMD architecture the same instruction is executed by all the processors, at each computing step or clock cycle, over different data. A typical SIMD computer is composed of hundreds or even thousands of simple processors, each with a small local memory. This kind of machine exploits the spatial parallelism that may be present in a given problem which uses large and regular data structures. If the problem domain is spatially or temporally irregular, many processors must remain idle at a given time step, thus producing a loss in the amount of parallelism that can be exploited. This architecture was promising at the beginning of the 1990s, but the complexity and often inflexibility of SIMD machines, strongly dependent on the synchronization requirements, have restricted their use mostly to special-purpose applications.
• In the MISD class, the machines execute multiple instructions on the same piece of data. Computers of this type are hard to find in practice, although some people regard pipelined machines as MISD.
• In the MIMD class, different data and programs can be loaded into different processors, and each processor can execute different instructions at any given point in time. This class is in general the most useful one, and most parallel computers belong to it.
Although Flynn’s taxonomy has become a standard model, it is a coarse-grain classification. For example, most of today’s processors are parallel in the way in which they execute the instructions, but they are considered as SISD. What it is more important, this taxonomy does not consider in the MIMD class whether the system memory spans a single address space or it is distributed among several modules. A more exhaustive taxonomy that extends Flynn’s classification is depicted in Figure 3.2. Here, MIMD systems are subdivided into multiprocessors, in which all the processors have direct access to all the memory, and multicomputers (also called distributed systems), where each processor has it own local memory module and remote memory modules accessing requires the use of a message-passingmechanism. Multiprocessors are classified depending on whether the access time to every memory address is constant (uniform memory access, or UMA) or not (non-uniform
SHARED-MEMORY A N D DISTRIBUTED-MEMORY PROGRAMMING
65
1
Shared memory
Message Passing
Fig. 3.2 Extension of Flynn’s Taxonomy.
memory access,.orNUMA). In the former, the interconnection among processors can be a bus or a switch; in the latter there is a distinction if the caches are kept coherent (coherent-cache or CC-NUMA) or not (NC-NUMA). Although multi-processors are widely used, they have the drawback of being limited in terms of the maximum number of processors that can be part of them, and their price tends to increase exponentially with the number of processors. Distributed systems are composed of collections of interconnected computers, each having its own processor, memory, and a network adaptor. Compared to multiprocessors, they have a number of significant advantages, namely: easier to build and extend, better ratio price/performance,more scalability,more flexibility, and they are the only choice to execute inherently distributed applications [25]. A typical distributed system is a cluster of workstations (COW), composed of PCs or workstations interconnected by a communication network such as Fast Ethernet (general purpose, low cost) or Myrinet (high performance, medium cost). The number of computers in a COW is limited to a few hundred because of the limits imposed by the network technology. The systems belonging to the MPP model (MassivelyParallel Processor) are composed of thousands of processors. If the MPP system is tightly-coupled (i.e, it is a unique computer), then we have the MIMD systems based on topologies such as hypercube, fat tree, or torus. On the other hand, a MPP can be composed of machines belonging to multiple organizations and administrative domains, leading to the so-called grid systems [4, 91, also known as metacomputers, which are built around the infrastructure provided by the Internet. 3.3
SHARED-MEMORY AND DISTRIBUTED-MEMORY PROGRAMMING
Regardless of the kind of parallel computer we consider, they all share a common feature: they are difficult to program. Compared to sequential programming, which is based on a single process having a unique flow of control, the programs of parallel computers adhere to concurrent programming. A concurrent program contains two or more processes that work together to perform a task by communicating and synchronizing among themselves [2], so the programmer has to deal with well-known issues such as mutual exclusion, condition synchronization, or deadlock. Although concurrent programming includes both the programming of multiprocessors and of distributed systems, the former adheres to shared-memory programming, while the latter is known as distributed programming. Shared-memory programming is based on the fact that the whole memory is directly accessible by all the processes, which use read and write operations to access it; thus, this model is a natural extension of sequential programming. Furthermore, its foundations were established in the 1960s/1970s, so they are well known. Distributed programming is based on message-passing mechanisms, which introduce a number of difficulties not encountered in shared-memory programming. To start with, there exist several ways to interchange messages between processes (synchronous/asynchronous, buffered/unbuffered, reliable/not reliable) [25], each of them with distinct semantics; the program also has to solve issues related to heterogeneity (the sender and the receiver machines may have different architectures or operating systems), load balancing (to keep the processors as busy as possible), security, etc. Because of the advantages that shared-memory programming offers compared to distributed programming, in the last decade a large amount of research was carried out to try to offer a shared-memory view over distributed systems, leading to the so-called distributed shared-memory (DSM) systems. A survey can be found in [14]. In fact, some multiprocessors belonging to the NUMA category (Figure 3.2) are distributed systems implementing DSM by hardware. DSM can also be implemented at the software level, including here extensions of operating systems, libraries to be used by sequential languages, new languages, and extensions of existing languages. Given that this chapter is devoted to new technologies, we summarize next a representative set of tools, languages, and systems that can be considered for implementing parallel heuristic algorithms. However, for completeness, we will also discuss those tools that are not strictly new but can be considered for use in our applications. We will analyze some of these tools in greater detail in Sections 3.4 and 3.5. The easiest way to deal with parallelism in the shared-memory model is to use a parallelizing compiler that automatically converts a sequential program into a parallel one. However, most of these compilers are oriented to programs written in Fortran, although there are some tools for C and C++, such as SUIF and the Portland Group compilers (www.pgroup.com). The second approach is to use operating system resources such as processes, threads, semaphores, or even files. A third group is composed of parallel libraries that can be used from sequential languages, such as OpenMP (www.openmp.org), which provides bindings for Fortran, C, and C++. Finally, we can use parallel languages. Here we can find a plethora of programming models (shared variables, coordination, data parallelism, functional) and languages (Ada, Cilk, HPF, NESL), although it is worth mentioning that modern, general-purpose languages such as Java and C# are prepared to deal with shared-
memory (and distributed-memory) parallelism, normally based on the use of thread libraries. Concerning distributed-memory programming, we can also consider the use of operating system resources (sockets), parallel libraries (PVM, MPI), and parallel languages (again Java and C#, but also Ada, Linda, or Orca). However, while that classification is enough to characterize shared-memory programming tools, it is insufficient to properly identify all the possibilities we can find in distributed programming. A more accurate taxonomy is described next:

• Message-passing libraries. These libraries are mainly indicated for developing parallel applications in COWs. We can include here the sockets offered by the operating systems (Unix and Windows) and the sockets provided by Java and Microsoft's .NET platform (which supports the languages C#, Visual Basic, or C++). Nevertheless, the socket API is a low level interface, while libraries such as PVM and MPI offer a rich set of primitives for interprocess communication and synchronization.
• Object-based systems. If our application requirements involve coping with heterogeneous hardware and/or software, then object-based systems are to be considered. They are an evolution of remote procedure call (RPC) systems, and they are based on the idea of having remote objects that can be accessed by clients by invoking the methods they define in their interfaces. Examples of these systems are CORBA and Java RMI. To manage heterogeneity, object-based systems are structured in a three-layer scheme, where between the clients and objects (the higher level) and the operating systems and hardware (the lower level) there is an intermediate level called middleware, which hides all the details of the lower level from the applications.
• Grid computing systems. The Internet provides an infrastructure that allows computers around the world to be interconnected. Thus, it is possible to envision a metacomputer composed of thousands of machines belonging to organizations residing in different countries. As a result, a new discipline known as grid computing has emerged in the last years, and there is currently a large number of projects and systems focused on this area [4, 9]. The computing power that can be obtained with a grid computing system allows us to attack problems that could not be solved with COWs [3], but the development of grid applications is difficult. Among the reasons, we can mention [7] large scalability, heterogeneity at multiple levels, the unpredictable structure of the system (which is constructed dynamically from available resources), dynamic and unpredictable behavior (concerning network failures, as well as the high latency and low bandwidth of communications), and multiple administrative domains (each one having its own administrative policies, which must be preserved by the grid system). The de facto standard grid system is Globus [7], but there are many other systems, such as Condor, Sun Grid Engine (SGE), Legion, NetSolve, etc. On the other hand, some authors argue that the features of Java make it a candidate language for grid computing [11].
• Web-based computing. Like grid computing, another approach based on the Internet has appeared: Web services. These provide a method for applications to communicate with each other over the Internet. However, compared to grid computing systems, Web services are built on existing Web protocols and open XML standards [6] regulated by the W3C (www.w3.org). Thus, communication uses the Simple Object Access Protocol (SOAP), Web services are described with the Web Services Description Language (WSDL), and the Universal Description, Discovery, and Integration (UDDI) directory allows Web service descriptions to be registered. Another interesting aspect of Web services is that the hosting environments and runtime systems do not depend on a specific platform, such as Windows 2000/XP, some flavor of Unix, Java 2 Enterprise Edition (J2EE), or Microsoft .NET. Currently there is a trend to combine Web services and the Grid. An example is the Open Grid Services Architecture (OGSA) [10], which is presented as an evolution of Globus towards a grid system architecture based on an integration of Grid and Web services concepts and technologies.
3.4 SHARED-MEMORY TOOLS

In this section we describe a set of tools that we consider interesting to take into account when facing the construction of parallel algorithms in shared-memory systems. The tools are summarized in Table 3.1.

Table 3.1 Tools for shared-memory programming
System         Category                     Language Bindings
Pthreads       Operating system resource    C
Java threads   Programming language         Java
OpenMP         Compiler directives          Fortran, C, C++
3.4.1 Pthreads
The concept of a thread as an independent flow of control inside a process has been adopted by virtually all modern operating systems since the 1990s, leading to the so-called multithreaded processes and a discipline known as multithreaded programming. Although it has always been possible to write parallel programs using processes and other resources provided by the operating system (semaphores, shared memory, files), multithreaded processes are themselves concurrent programs, which brings a number of advantages over multiple processes: faster context switching between threads, lower resource usage, simultaneous computation/communication, and the fact that some parallel applications fit well in the thread model.
In the last decade several Unix-based operating systems began to include their own proprietary thread libraries (e.g., Solaris), leading to nonportable multithreaded code. In this context, a standardized library for multithreaded programming, known as Pthreads (or POSIX threads), was defined in the mid-1990s as an effort to provide a unified set of C library routines in order to make multithreaded programs portable. The Pthreads library offers functions for thread management (thread creation, scheduling, and destruction) and synchronization (mutexes, condition variables, semaphores, and read-write locks), and it is available mainly on the various variants of the UNIX operating system. The operating systems belonging to Microsoft's Windows family, Windows 2000 and Windows XP, also provide multithreaded processes. Although their thread functions are syntactically different from the Pthreads interface, the functionality regarding thread management and synchronization is equivalent to some extent.
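To make the description concrete, the following minimal C sketch creates two threads that each evaluate one half of a cost function and accumulate the result under a mutex; the partial_cost() routine and the two-way split are illustrative placeholders, not part of the Pthreads API.

    #include <pthread.h>
    #include <stdio.h>

    static double total = 0.0;                        /* shared accumulator */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Placeholder for an application-specific partial evaluation. */
    static double partial_cost(int half) { return half ? 2.5 : 1.5; }

    static void *worker(void *arg) {
        int half = *(int *)arg;
        double c = partial_cost(half);                /* independent work */
        pthread_mutex_lock(&lock);                    /* mutual exclusion */
        total += c;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t t[2];
        int id[2] = {0, 1};
        int i;
        for (i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, &id[i]);  /* thread creation */
        for (i = 0; i < 2; i++)
            pthread_join(t[i], NULL);                     /* wait for both threads */
        printf("total cost = %f\n", total);
        return 0;
    }

Compiled with a command such as cc -pthread example.c, the sketch prints the aggregated value once both threads have finished.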
3.4.2 Java Threads

The benefits of multithreaded programming modified not only the classical view of single-threaded processes in operating systems but also that of modern programming languages. An example of such a language is Java (java.sun.com). Threads are programmed in Java by extending the Thread class or by implementing the Runnable interface. Thread creation involves creating an instance of these classes and invoking on the new object a method called start(). Synchronization is carried out by providing synchronized methods that ensure mutual exclusion, while condition synchronization is achieved by means of a number of methods (wait, notify, and notifyAll). Compared to Pthreads, Java threads offer the advantages of the portability inherent in Java programs and a multithreaded programming model adapted to the object-oriented features of Java.
3.4.3 OpenMP
OpenMP is a set of compiler directives and library routines that are used to express shared-memory parallelism (www.openmp.org). The OpenMP Application Program Interface (API) was developed by a group representing the major vendors of high-performance computing hardware and software. Fortran and C/C++ interfaces have been designed, with some efforts to standardize them. The majority of the OpenMP interface is a set of compiler directives. The programmer adds these to a sequential program to tell the compiler which parts of the program must be executed concurrently and to specify synchronization points. The directives can be added incrementally, so OpenMP provides a path for parallelizing existing software.
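A minimal C sketch of this directive-based style follows; the loop body and the array are illustrative placeholders rather than anything mandated by the OpenMP specification.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double fitness[N];
        double sum = 0.0;
        int i;

        /* The directive asks the compiler to split the loop iterations among
           the available threads and to combine the partial sums at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < N; i++) {
            fitness[i] = (double)i * 0.5;   /* placeholder per-element work */
            sum += fitness[i];
        }
        printf("threads: %d, sum = %f\n", omp_get_max_threads(), sum);
        return 0;
    }

The same source still compiles as a sequential program when the directive is ignored, which is precisely the incremental parallelization path mentioned above.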
3.5 DISTRIBUTED-MEMORY TOOLS

As in the previous section, we describe here a set of languages and systems of interest for the construction of parallel algorithms in distributed-memory systems. These systems are summarized in Table 3.2.
Table 3.2 Tools for distributed-memory programming
Message-Passing Library   Object-Based System   Internet Computing System
Sockets                   Java RMI              Globus
PVM                       CORBA                 Condor
MPI
3.5.1 Sockets

The BSD socket interface (see, e.g., [5]) is a de facto standard message-passing system. A set of data structures and C functions allows the programmer to establish full-duplex channels between two computers for implementing general-purpose distributed applications. If the chosen underlying protocol is TCP, sockets offer a connection-oriented service, ensuring reliable communications and guaranteeing that the messages are received in the order they were issued. Also, a connectionless service over UDP is available for applications not needing the facilities of TCP. Parallel programs can be developed with the socket API, with the added benefits of wide applicability, high standardization, and complete control over the communication primitives. Despite their advantages, programming with sockets has many drawbacks for applications involving a large number of computers, with different operating systems, and belonging to differently owned networks. First, programming with sockets is error-prone and requires understanding low level characteristics of the network. Also, it does not include any process management, fault tolerance, task migration, security options, or other attributes usually requested in modern parallel applications. As in the case of threads, modern languages such as Java incorporate a socket library. Thus, portability is enhanced, and the socket functions are simplified compared to the C socket interface. Furthermore, Java allows objects to be sent via sockets by using a mechanism known as serialization. This is a powerful mechanism that can be used to send complex data structures, such as lists and trees of objects. The price to pay is an overhead that may not be acceptable for communication-intensive programs.
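As a small illustration, the following C sketch opens a TCP connection and sends a single message to a farmer process; the server address (192.168.0.1), the port (5000), and the message format are hypothetical placeholders.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);   /* connection-oriented (TCP) */
        struct sockaddr_in srv;
        const char *msg = "fitness 427.0\n";

        memset(&srv, 0, sizeof(srv));
        srv.sin_family = AF_INET;
        srv.sin_port = htons(5000);                 /* example port */
        inet_pton(AF_INET, "192.168.0.1", &srv.sin_addr);

        if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
            perror("connect");
            return 1;
        }
        send(fd, msg, strlen(msg), 0);              /* reliable, ordered delivery */
        close(fd);
        return 0;
    }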
3.5.2 PVM
The Parallel Virtual Machine (PVM) [23] is a software system that permits the utilization of a heterogeneous network of parallel and serial computers as a unified, general, and flexible concurrent computational resource. The PVM system supports the message-passing paradigm, with implementations for distributed-memory, shared-memory, and hybrid computers. These features allow applications to use the most appropriate computing model either for the entire application or for individual subalgorithms. The PVM system is composed of a suite of user interface primitives and supporting software that together enable concurrent computing on loosely coupled networks of processing elements. PVM may be implemented on heterogeneous architectures and networks. These computing elements are accessed by applications via a standard interface that supports common concurrent processing paradigms in the form of well-defined primitives that are embedded in procedural languages such as C and Fortran. The advantages of PVM are its wide acceptance and its heterogeneous computing facilities, including fault tolerance and interoperability [24]. Despite these advantages, the PVM standard has recently stopped being supported (no further releases); also, many PVM users are shifting to MPI.
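A rough sketch of the master side of a PVM program is given below; the task name "worker", the message tags, and the integer payload are hypothetical, while the calls themselves are the classic enroll/pack/send primitives of the PVM 3 C interface.

    #include <stdio.h>
    #include "pvm3.h"

    int main(void) {
        int mytid = pvm_mytid();           /* enroll in the virtual machine */
        int tid, n = 42, result;

        if (pvm_spawn("worker", (char **)0, PvmTaskDefault, "", 1, &tid) != 1) {
            fprintf(stderr, "could not spawn worker\n");
            pvm_exit();
            return 1;
        }
        pvm_initsend(PvmDataDefault);      /* portable (XDR) encoding */
        pvm_pkint(&n, 1, 1);
        pvm_send(tid, 1);                  /* message tag 1: work item */

        pvm_recv(tid, 2);                  /* message tag 2: the answer */
        pvm_upkint(&result, 1, 1);
        printf("task %x answered %d to master %x\n", tid, result, mytid);
        pvm_exit();
        return 0;
    }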
3.5.3 MPI
The Message-Passing Interface (MPI) is a library of message-passing routines [20]. When MPI is used, the processes in a distributed program are written in a sequential language (C, Fortran), and they communicate and synchronize by calling functions in the MPI library. The MPI API was defined in the mid-1990s by a large group of people from academia, government, and industry. The interface reflects people's experiences with earlier message-passing libraries, such as PVM. The goal of the group was to develop a single library that could be implemented efficiently on the variety of multiple-processor machines. MPI has now become a de facto standard, and several implementations exist, such as MPICH (www.mcs.anl.gov/mpi/mpich) and LAM/MPI (www.mpi.nd.edu/lam). The motivation for developing MPI was that each massively parallel processor (MPP) vendor was creating its own proprietary message-passing API. In this scenario it was not possible to write a portable parallel application. MPI is intended to be a standard for message-passing specifications that each MPP vendor would implement on its system. The MPP vendors need to be able to deliver high performance, and this became the focus of the MPI design. Given this design focus, MPI is expected to always be faster than PVM on MPP hosts [24]. MPI programs follow an SPMD style (single program, multiple data), that is, every processor executes a copy of the same program. Each instance of the program can determine its own identity and hence take different actions. The instances interact by calling MPI library functions. MPI provides a rich set of 128 functions for
process-to-process communication, group communication, setting up and managing communication groups, and interacting with the environment. The first standard, named MPI-1, had the inconvenience that applications were not portable across a network of workstations because there was no standard method to start MPI tasks on separate hosts: different MPI implementations used different methods. In 1995 the MPI committee began meeting to design the MPI-2 specification to correct this problem and to add additional communication functions to MPI, including language bindings for C++. The MPI-2 specification was finished in June 1997. The MPI-2 document adds 200 functions to the 128 original functions specified in MPI-1. All the mentioned advantages have made MPI the standard for future applications using message-passing services. The drawbacks concerning dynamic process creation and interoperability are being successfully solved although, to date, full implementations of MPI-2 are not widely available. It is worth mentioning the many nonstandard extensions of MPI that have been developed to allow MPI programs to run in grid computing systems. An example is MPICH-G2, a complete implementation of the MPI-1 standard that extends the MPICH implementation of MPI for grid execution on Globus [15].
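As a minimal illustration of the SPMD style described above, the following C sketch lets every process compute a placeholder fitness value and gathers the best one on process 0 with a single collective call; the fitness values themselves are invented for the example.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        double local, best;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* each instance learns its identity */
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        local = 400.0 + rank;                  /* placeholder per-process result */
        MPI_Reduce(&local, &best, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("best fitness among %d processes: %f\n", size, best);
        MPI_Finalize();
        return 0;
    }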
3.5.4 Java RMI
The implementation of remote procedure calls (RPC) in Java is called Java RMI. Remote Method Invocation in Java allows an application running in one Java virtual machine (JVM) to invoke methods of objects residing in different JVMs, with the added advantages of being object-oriented, platform-independent, and providing distributed garbage collection. The features of Java RMI come at a cost: one remote method invocation can take several milliseconds, depending on the number and the types of arguments [11]. This latency is too high if we are dealing with fine-grained, communication-intensive applications running on COWs. However, some projects are trying to improve the performance of RMI. For example, in the Manta project [19], Java code is compiled to native code and a runtime system written in C is used, leading to a significant reduction of the communication time.
3.5.5 CORBA

Distributed systems are typically heterogeneous, and this heterogeneity refers to computer architectures, operating systems, and programming languages. The origin of the heterogeneity can be found in several issues, including the continuous advances in both hardware and software (the new technologies must coexist with the old ones), applications that are inherently heterogeneous, and the need to use legacy systems. In this context, at the beginning of the 1990s CORBA (Common Object Request Broker Architecture) was defined by the Object Management Group (www.omg.org), with the goals in mind of defining models and abstractions that
were platform independent and of hiding as much as possible the complexity of the underlying systems while trying to keep high performance. CORBA is a middleware, providing a distributed-object-based platform to develop distributed applications in which components can be written in different languages and can be executed on different machines. Bindings are defined for several languages, including C, C++, Java, Smalltalk, Cobol, and Ada. The architecture offers a rich set of services to applications, including naming, notification, concurrency, security, and transactions. The first version of CORBA, appearing in 1991, did not permit interoperability among different implementations; this was achieved with the definition of the Internet Inter-ORB Protocol (IIOP) in CORBA-2, in 1996. Currently there are several CORBA implementations, including both proprietary and free systems. Examples of the former are Orbix and Visibroker (offering bindings to C++ and Java), while TAO (C++) and JacORB (Java) are examples of the latter.
3.5.6 Globus
The Globus Toolkit is a community-based, open-architecture, open-source set of services and software libraries that support grids and grid applications [8]. Globus has become a de facto standard, and it provides support for security, information discovery, resource management, data management, communication, fault detection, and portability. It is constructed as a layered architecture, in which high level global services are built upon essential low level core local services. The components of Globus can be used either independently or together to develop applications. For each component of the toolkit, both protocols and application programming interfaces (APIs) are defined. Furthermore, it provides open-source reference implementations in C (for client-side APIs). Some of these components are the following:
Grid Security Infrastructure (GSI).
0
GridFTP.
0
Globus Resource Allocation Manager (GRAM). Metacomputing Directory Service (MDS-2).
0
Global Access to Secondary Storage (GASS).
0
Data catalogue and replica management.
0
Advanced Resource Reservation and Allocation (GARA).
The latest version of the Globus Toolkit (GT3) is based on a core infrastructure component compliant with the Open Grid Services Architecture (OGSA). This architecture defines the concept of Grid service as a Web service that provides a set of well-defined interfaces and that follow specific conventions.
3.5.7 Condor

Condor (www.cs.wisc.edu/condor) is a resource management system (RMS) for grid computing. An RMS is responsible for detecting available processors, matching job requests to available processors, executing jobs on available machines, and determining when a processor leaves the computation. Compared to other RMSs [16], Condor is an open-source project based on three features: remote system calls, classified advertisements, and checkpointing [17]. These features are implemented without modifying the underlying UNIX kernel. With Condor, a set of machines can be grouped into a Condor pool, which is managed according to an opportunistic computing strategy. For example, Condor can be configured to run jobs on workstations when they are idle, thus using CPU time which otherwise would be wasted. If a job is running on a machine when its owner returns, the job can be stopped and resumed later, or it can be migrated to another available idle machine. Several Condor pools can be merged by using two mechanisms, flocking and Condor-G. The latter is based on combining Condor with Globus, and it allows large collections of resources spanning multiple domains to be utilized. Condor is easy to install and manage, and to take advantage of Condor features we do not need to modify our C, C++, or Fortran programs; only relinking is required. Furthermore, once we have a binary for each architecture type in our Condor pool, Condor automatically transfers the executables to the target machines. If our processes need to communicate, Condor includes versions of PVM and MPI, although using shared files is also possible.
3.6 WHICH OF THEM?

We have made a somewhat detailed description of many systems, with the aim of giving an overview of those that can be considered most interesting for implementing parallel heuristic algorithms. Choosing one of them depends basically on our programming skills and preferences and on the application requirements. In the context of shared-memory programming, a system programmer will be comfortable using C or C++ with Pthreads, or even with plain processes instead, a Java programmer will choose Java threads, and a Fortran programmer will probably prefer using OpenMP. On the other hand, if we require that our application be portable to UNIX and Windows operating systems, then Java is an option to be considered. However, if we prefer using C or C++ instead of Java, there are libraries that offer wrappers to Pthreads and Windows threads, thus ensuring portability. An example of such libraries is ACE (Adaptive Communication Environment, www.cs.wustl.edu/~schmidt/ACE.html). Besides, using OpenMP is adequate if our application follows a data-parallel model, but it can be difficult to use in task-parallel applications.
We analyze now the tools for distributed programming. If we intend to develop a distributed algorithm to be executed on a cluster of workstations, sockets can be an adequate tool if we have experience using them and the complexity of the communications is low; otherwise, using PVM or MPI can be preferable. If the machines in the cluster are heterogeneous, we can avoid some problems of sockets by using, again, libraries such as ACE or by using Java sockets. If our favorite language is Java and the latency of communications is not critical, Java RMI offers a high level distributed object-oriented programming model, but we must be aware that this model does not fit well with one-to-many communication or barrier synchronization; in this case, we should look again at MPI. The programming model of CORBA is similar to that of Java RMI, but its learning curve is steeper. CORBA is the choice if we need to combine program modules written in different programming languages. A common usage of CORBA is to implement the computation modules in C++ (more efficiency) while the graphical user interface is written in Java. Finally, we discuss the adequacy of grid computing systems for developing parallel heuristic algorithms. The two analyzed systems, Condor and Globus, are suited if we need to use hundreds or thousands of machines, but they present different features that have to be taken into account. Condor is simple to install (a full installation can take a few minutes) and manage, and several Condor pools can be merged easily if they belong to the same administrative domain. Furthermore, its ability to use idle processor cycles, the use of checkpointing and process migration (thus achieving fault tolerance), and the remote system-call facility make Condor very attractive. However, our programs should use shared files if interprocess communication is required; otherwise, Condor includes versions of PVM and MPI, but then some of the above features are not available. Although it is possible to use Java in Condor, its full power can only be obtained by programs written in C, C++, and Fortran. Compared to Condor, Globus is significantly more complex to install and administer, and it presents advantages for building grid systems spread among different administrative organizations. As a security mechanism, Globus uses X.509 certificates, and its C API provides full access to the Globus services, which must be used explicitly by the programmer. Furthermore, Globus does not incorporate any resource management system (RMS), which should be implemented by hand or, alternatively, an existing RMS can be used, for example, Condor. An option for MPI programmers is to use MPICH-G2, the grid-enabled implementation of MPICH included with Globus.
3.7 SUMMARY
Parallel computing involves both hardware and software issues. The former are directly related to parallel architectures, while the latter have to do with parallel programming models. Nowadays there are two main kinds of parallel computers, multiprocessors and distributed systems, and the programs running on them require the use of, respectively, shared-memory programming and distributed programming. This chapter has been dedicated to new technologies that can be used to implement parallel heuristic algorithms, although we have also included some tools that are not
strictly new, for completeness. Considering shared-memory programming, recent advances include parallel libraries such as OpenMP and the generalized use of threads, either using the thread API offered by the operating system or with modern languages that already incorporate them, such as Java. Distributed systems are continuously evolving and, after the popularization of COWs, grid computing is gaining interest as a discipline which permits using thousands of machines as a single parallel computer. However, programming grid computing systems presents a number of difficulties not found when programming COWs, so the popular message-passing tools (sockets, PVM, or MPI) used in COWs are not powerful enough in a grid environment. We have described Globus, the de facto standard grid system, and Condor, although this is an open research area and there are many ongoing projects that may fit our requirements.

Acknowledgments
The authors acknowledge partial funding by the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project).
REFERENCES
1. E. Alba, A. J. Nebro, and J. M. Troya. Heterogeneous computing and parallel genetic algorithms. Journal of Parallel and Distributed Computing, 62(9):1362-1385, 2002.
2. G. R. Andrews. Foundations of Multithreaded, Parallel and Distributed Programming. Addison Wesley, 2000.
3. K. Anstreicher, N. Brixius, J.-P. Goux, and J. Linderoth. Solving large quadratic assignment problems on computational grids. Mathematical Programming, 91:563-588, 2002.
4. F. Berman, G. C. Fox, and A. J. G. Hey. Grid Computing: Making the Global Infrastructure a Reality. Wiley, 2003.
5. D. E. Comer and D. L. Stevens. Internetworking with TCP/IP, volume III. Prentice Hall, 1993.
6. F. Curbera, M. Duftler, R. Khalaf, W. Nagy, N. Mukhi, and S. Weerawarana. Unraveling the Web services web: an introduction to SOAP, WSDL, and UDDI. IEEE Internet Computing, 6(2):86-93, March-April 2002.
7. I. Foster and C. Kesselman. Globus: a metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 11(2):115-128, 1997.
8. I. Foster and C. Kesselman. Globus: A toolkit-based grid architecture. In Ian Foster and Carl Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, pages 259-278. Morgan Kaufmann, 1999.
9. I. Foster and C. Kesselman. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1999.
10. I. Foster, C. Kesselman, J. M. Nick, and S. Tuecke. Grid services for distributed system integration. IEEE Computer, pages 37-46, June 2002.
11. V. Getov, G. von Laszewski, M. Philippsen, and I. Foster. Multiparadigm communications in Java for grid computing. Communications of the ACM, 44(10):118-125, 2001.
12. A. Globus, E. Langhirt, M. Livny, R. Ramamurthy, M. Solomon, and S. Traugott. JavaGenes and Condor: cycle-scavenging genetic algorithms. In ACM Java Grande 2000 Conference, San Francisco, CA, June 2000.
13. E. D. Goodman. An introduction to GALOPPS - the Genetic Algorithm Optimized for Portability and Parallelism System, release 3.2. Technical Report 96-07-01, Intelligent Systems Laboratory and Case Center for Computer-Aided Engineering and Manufacturing, Michigan State University, 1996.
14. J. Protić, M. Tomašević, and V. Milutinović. Distributed shared-memory: Concepts and systems. IEEE Parallel and Distributed Technology, 4(2):63-79, 1996.
15. N. T. Karonis, B. Toonen, and I. Foster. MPICH-G2: A grid-enabled implementation of the message-passing interface. Journal of Parallel and Distributed Computing, 63:551-563, 2003.
16. K. Krauter, R. Buyya, and M. Maheswaran. A taxonomy and survey of grid resource management systems for distributed computing. Software - Practice and Experience, 32:135-164, 2001.
17. M. Livny, J. Basney, R. Raman, and T. Tannenbaum. Mechanisms for high throughput computing. SPEEDUP Journal, 11(1):36-40, June 1997.
18. N. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.
19. J. Maassen, R. van Nieuwpoort, R. Veldema, H. E. Bal, and A. Plaat. An efficient implementation of Java's remote method invocation. In 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 173-182, Atlanta, GA, May 1999.
20. Message Passing Interface Forum. MPI: A message-passing interface standard. International Journal of Supercomputer Applications, 8(3/4):165-414, 1994.
21. M. Raynal. Distributed Algorithms and Protocols. John Wiley, 1988.
22. K. C. Sarma and H. Adeli. Bilevel parallel genetic algorithms for optimization of structures. Computer-Aided Civil and Infrastructure Engineering, 16:296-304, 2001.
23. V. S. Sunderam. PVM: a framework for parallel distributed computing. Concurrency: Practice and Experience, 2(4):315-339, 1990.
24. V. S. Sunderam and G. A. Geist. Heterogeneous parallel and distributed computing. Parallel Computing, 25:1699-1721, 1999.
25. A. S. Tanenbaum. Distributed Operating Systems. Prentice-Hall, 1995.
26. Y. Tanimura, T. Hiroyasu, M. Miki, and K. Aoi. The system for evolutionary computing on the computational grid. In 14th International Conference on Parallel and Distributed Computing and Systems, pages 39-44, November 2002.
27. M. Tomassini, L. Vanneschi, L. Bucher, and F. Fernandez. An MPI-based tool for distributed genetic programming. In Proceedings of the IEEE International Conference on Cluster Computing 2000, pages 209-216, Chemnitz, Germany, November/December 2000.
4
Metaheuristics and Parallelism
E. ALBA¹, E.-G. TALBI², G. LUQUE¹, N. MELAB²
¹Universidad de Málaga, Spain
²Laboratoire d'Informatique Fondamentale de Lille, France
4.1 INTRODUCTION
In practice, optimization problems are often NP-hard, complex, and CPU time-consuming. Two major approaches are traditionally used to tackle these problems: exact methods and metaheuristics. Exact methods make it possible to find exact solutions, but they are impractical as they are extremely time-consuming. Conversely, metaheuristics provide suboptimal solutions in a reasonable time and thus allow the resolution delays often imposed in the industrial field to be met. Metaheuristics fall into two categories: local search metaheuristics (LSMs) and evolutionary algorithms (EAs). A local search starts with a single initial solution. At each step of the search, the current solution is replaced by another (often the best) solution found in its neighborhood. Very often, LSMs only find a locally optimal solution and so are called exploitation-oriented methods. On the other hand, EAs make use of a randomly generated population of solutions. The initial population is enhanced through a natural evolution process. At each generation of the process, the whole population, or a part of it, is replaced by newly generated individuals (often the best ones). EAs are often called exploration-oriented methods. Although the use of metaheuristics significantly reduces the temporal complexity of the search process, the latter remains time-consuming for industrial problems. Therefore, parallelism is necessary not only to reduce the resolution time but also to improve the quality of the provided solutions. For each of the two families of metaheuristics, different parallel models have been proposed in the literature. Each of them illustrates an alternative approach to handle and deploy the parallelization. According to their granularity, parallel LSMs follow three major models (from coarse-grained to fine-grained): parallel multistart, parallel moves, and move acceleration. As EAs are population-based algorithms, the following approaches are commonly used: parallelization of computation and parallelization of population. In the first model, the operations commonly applied to each of the individuals are performed in parallel. In the other model, the population is split into different subpopulations that
can be simply exchanged or can evolve separately and be joined later. In this chapter, we propose a study of these models through the major solution methods: LSMs (Simulated Annealing [49], Tabu Search [39], Greedy Randomized Adaptive Search Procedure [30], Variable Neighborhood Search [65]) and EAs (Genetic Algorithms [13], Evolutionary Strategies [13], Genetic Programming [13], Ant Colonies [26], Estimation of Distribution Algorithms [66], Scatter Search [38]). Other models, covering parallel heterogeneous metaheuristics and parallel multiobjective optimization, are also sketched. The rest of the chapter is organized as follows. In Section 4.2, we highlight some principles of LSMs and their major parallel models. In Section 4.3, we show how these models are instantiated and applied to the LSM case studies cited above. In Section 4.4, we give some principles of EAs and present their main parallel models. In Section 4.5, we study some cases of EAs as previously quoted. In Section 4.6, we analyze other models such as parallel heterogeneous metaheuristics or parallel multiobjective optimization. In Section 4.7, a conclusion is drawn and some future directions are proposed.
4.2 PARALLEL LSMs
4.2.1 Principles of LSMs

Metaheuristics for solving optimization problems can be viewed as "walks through neighborhoods", meaning search trajectories through the solution domains of the problems at hand [23].

Algorithm 1. LSM skeleton pseudocode
    Generate(s(0));
    t := 0;
    while not Termination-Criterion(s(t)) do
        s'(t) := SelectMove(s(t));
        if AcceptableMove(s'(t)) then
            s(t) := ApplyMove(s'(t));
        t := t + 1;
    endwhile

The walks are performed by iterative procedures that allow moving from one solution to another in the solution space (see Algorithm 1). LSMs, in particular, perform moves in the neighborhood of the current solution. The walks start from a solution randomly generated or obtained from another optimization algorithm. At each iteration, the current solution is replaced by another one selected from the set of its neighboring candidates. The search process is stopped when a given condition is satisfied (stopping criterion). A powerful way to achieve high performance with
LSMs is the use of parallelism. Different parallel models have been proposed, and these are summarized in the next section.

4.2.2 Parallel Models of LSMs

Three parallel models are commonly used in the literature: the parallel multistart model, the parallel exploration and evaluation of the neighborhood (or parallel moves) model, and the parallel evaluation of a single solution (or move acceleration) model.
• Parallel multistart model. It consists in simultaneously launching several LSMs for computing better and more robust solutions. They may be heterogeneous or homogeneous, independent or cooperative, may start from the same or different solution(s), and may be configured with the same or different parameters.
• Parallel moves model. It is a low level farmer-worker model that does not alter the behavior of the heuristic: a sequential search computes the same results, only more slowly. At the beginning of each iteration, the farmer duplicates the current solution on the distributed nodes. Each one manages some candidate moves, and the results are returned to the farmer (a rough sketch of this scheme is given after this list).
• Move acceleration model. The quality of each move is evaluated in a parallel, centralized way. This model is particularly interesting when the evaluation function can itself be parallelized because it is CPU time-consuming and/or input/output (I/O) intensive. In that case, the function can be viewed as an aggregation of a certain number of partial functions.
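To make the farmer-worker idea of the parallel moves model more concrete, the following C sketch uses OpenMP threads as the workers: all candidate moves of the current solution are evaluated concurrently and the farmer then keeps the best one. The neighborhood size and the evaluate() routine are placeholders, and a distributed-memory implementation would follow the same structure with messages instead of a shared array.

    #include <stdio.h>

    #define NEIGHBORS 64

    /* Placeholder: in a real LSM this would apply candidate move m to the
       current solution and return the resulting objective value. */
    static double evaluate(int m) { return (double)((m * 37) % 101); }

    /* One iteration of the parallel moves model. */
    static int best_move(void) {
        double cost[NEIGHBORS];
        int m, best = 0;

        #pragma omp parallel for           /* workers: one candidate per iteration */
        for (m = 0; m < NEIGHBORS; m++)
            cost[m] = evaluate(m);

        for (m = 1; m < NEIGHBORS; m++)    /* farmer: select the best candidate */
            if (cost[m] < cost[best])
                best = m;
        return best;
    }

    int main(void) { printf("best move: %d\n", best_move()); return 0; }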
4.3 CASE STUDIES OF PARALLEL LSMs

4.3.1 Parallel Simulated Annealing
Simulated Annealing (SA) [49] is a stochastic search method in which, at each step, the current solution is replaced by another one randomly selected from its neighborhood, accepting it immediately if it improves the objective function. SA uses a control parameter, called temperature, to determine the probability of accepting nonimproving solutions. The objective is to escape from local optima and so to delay the convergence. The temperature is gradually decreased according to a cooling schedule so that few nonimproving solutions are accepted at the end of the search. To our knowledge, SA was the first LSM to be parallelized. Several parallel implementations were proposed between 1980 and 1990, most of them focusing on cell placement problems in VLSI layout [43, 50, 52, 64, 77]. The majority of parallelization approaches can be classified into two categories: move acceleration and parallel moves [43]. Table 4.1 shows some representative parallel SAs. In [17], these models are implemented as skeletons and can be instantiated by the user. The fine-granularity nature of the move acceleration model makes it unsuitable
for distributed-memory systems. Indeed, its implementation is often restricted to shared-memory machines [50].

Table 4.1 A quick survey of several parallel SAs

Article | Parallel Model
(1987) | Move acceleration
(1987) | Noninteracting parallel moves
(1990) | Noninteracting parallel moves
(1996) | Parallel multistart (synchronous and asynchronous)
(2000) | Noninteracting and interacting parallel moves
(2000) | Parallel multistart (synchronous and asynchronous)
(2000) | Parallel asynchronous multistart
(2004) | A general framework for parallel algorithms (e.g., parallel SA)
The parallel moves model is more widely investigated [19, 77]. In this model, different moves are evaluated in a concurrent way: each processor generates and evaluates moves independently. The model suffers from inconsistency: due to the moves made by other processors, the cost function computations may be incorrect. Two major approaches are used to manage this inconsistency. (1) The evaluation of moves is performed in parallel and only noninteracting moves are accepted [43, 77]. This can be viewed as a domain decomposition approach. It preserves the convergence property of the sequential algorithm and permits good speedups [50, 43]. The difficulty of the approach is how to determine the noninteracting moves. (2) The second approach consists in evaluating and accepting multiple interacting moves in parallel. Some errors in the calculation of the cost functions are allowed; they are corrected after a certain number of moves (after each temperature step in [43]) by synchronization between processors. However, this affects the convergence of the parallel algorithm compared to the sequential one. In addition, negative speedups may be obtained due to the synchronization cost, as reported in [43]. While most parallel implementations of SA are based on the parallel moves model, several other parallelizations follow the parallel multistart model. These parallelizations use multiple Markov chains [43, 52, 64], and many of them are applied to the cell placement problem. Each chain performs cell moves on the whole set of cells rather than only on a subset of cells. This approach overcomes the performance problems of the parallel moves strategy caused by the use of restricted moves and tolerated errors in the cost function evaluation. In the parallel multiple Markov chains approach, each processor carries out SA on a local copy of the whole problem data. The processors dynamically combine their solutions by exchanging their best ones in a synchronous or asynchronous way.
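For reference, the sketch below shows the sequential SA core that all these parallelizations build on: a Metropolis-like acceptance of nonimproving moves whose probability decays with the temperature, itself following a geometric cooling schedule. It is an illustrative, generic implementation (the objective and neighbor functions are placeholders), not code from any of the surveyed papers.

# Minimal Simulated Annealing core (minimization); problem-specific parts are placeholders.
import math
import random

def simulated_annealing(init, objective, random_neighbor,
                        t0=10.0, alpha=0.95, steps_per_temp=100, t_min=1e-3):
    current, f_current = init, objective(init)
    best, f_best = current, f_current
    temp = t0
    while temp > t_min:
        for _ in range(steps_per_temp):
            cand = random_neighbor(current)
            delta = objective(cand) - f_current
            # Accept improving moves always; worse moves with probability exp(-delta/temp).
            if delta <= 0 or random.random() < math.exp(-delta / temp):
                current, f_current = cand, f_current + delta
                if f_current < f_best:
                    best, f_best = current, f_current
        temp *= alpha          # geometric cooling schedule
    return best, f_best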
4.3.2 Parallel Tabu Search
Tabu Search (TS) [39] manages a memory of solutions or moves recently applied, called the tabu list. When a local optimum is reached, the search carries on by
selecting a candidate worse than the current solution. To avoid choosing previously visited solutions again, and so to avoid cycles, TS discards the neighboring candidates that have been previously applied. Several parallel implementations of TS are briefly summarized in Table 4.2. Most of them are based on the multistart model and/or a neighborhood decomposition, and follow the parallel moves model [32, 72]. In [9, 14, 17], the different models are implemented as general skeletons and can be instantiated by the user.

Table 4.2 A quick survey of several parallel TS

Article | Parallel Model
[32] (1994) | Parallel moves, large TSP
[72] (1995) | Parallel moves, task scheduling
[89] (1996) | Parallel independent multistart with adaptive load balancing, QAP
[4] (1998) | Parallel cooperative multistart, circuit partitioning
[14] (2001) | Parallel skeletons for Tabu Search: independent parallel multistart, master-slave with neighborhood partition
[21] (2002) | Cooperative parallel multistart, capacitated network design
[9] (2002) | Parallel skeletons for Tabu Search: independent parallel multistart (with search strategies), master-slave (with neighborhood partition)
[17] (2004) | Parallel skeletons for Tabu Search (all models)
In [32], it is claimed that, due to its heavy synchronization, such a model is only worth applying to problems in which the calculations required at each iteration are time-consuming. In [72], a parallel implementation of TS based on the parallel moves model is applied to such a problem, and linear speedups are attained for large problem instances. A wide range of parallel implementations based on the parallel multistart model have also been proposed [4, 21, 89]. In most of them, a sequential TS performs its search on each processor. The different TS algorithms may use different initial solutions and different parameter values. These TS algorithms may be completely independent [89]. They can also cooperate through a central pool of elite solutions held by a dedicated master processor [4]. An elite solution is a local optimum that improves the best solution already visited locally. The cooperation may be performed as a post-optimization intensification procedure based on path-relinking. A parallel cooperative multistart approach has also been developed in [21]. The results show that it outperforms both the sequential algorithm and the independent parallel multistart approach.
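A minimal sketch of the TS ingredients discussed above (a tabu list plus a central pool of elite solutions shared by cooperating searches) might look as follows. It is only an illustration under simplifying assumptions (whole solutions stored in the tabu list, minimization, no aspiration criterion), not the method of any cited work.

# Sketch of one Tabu Search iteration and a shared elite pool for cooperative multistart.
from collections import deque

def tabu_step(current, objective, neighborhood, tabu, tabu_tenure=7):
    # Pick the best neighbor that is not tabu.
    candidates = [s for s in neighborhood(current) if tuple(s) not in tabu]
    if not candidates:
        return current
    best = min(candidates, key=objective)
    tabu.append(tuple(current))           # forbid revisiting the current solution
    while len(tabu) > tabu_tenure:
        tabu.popleft()
    return best

class ElitePool:
    """Central pool of elite solutions kept by a master process (cooperative model)."""
    def __init__(self, size=10):
        self.size = size
        self.pool = []                    # list of (cost, solution)
    def offer(self, cost, solution):
        self.pool.append((cost, solution))
        self.pool.sort(key=lambda t: t[0])
        del self.pool[self.size:]
    def best(self):
        return self.pool[0] if self.pool else None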
4.3.3 Parallel GRASP

The Greedy Randomized Adaptive Search Procedure (GRASP) [30] is a multistart algorithm. Each iteration of the algorithm is composed of two phases: a construction phase and a local search phase. The construction phase iteratively builds a feasible solution by a greedy randomized procedure. The local search
phase then provides a local optimum in the neighborhood of the constructed solution. The resulting solution of the problem at hand is the best solution found over all iterations. Table 4.3 shows that most parallel implementations of GRASP [53, 57, 70] are based on the parallel multistart model. Many of these implementations were proposed by Resende and his collaborators. Parallelism consists in distributing the iterations over the processors. Each processor receives a copy of the sequential algorithm and a copy of the problem data. Since the iterations are independent and very little information is exchanged between processors, linear speedups are often obtained. For instance, in [53] almost linear speedups are reported for a parallel GRASP applied to the quadratic assignment problem; in particular, a speedup of 62 is obtained on 64 processors. In more recent work on parallel GRASP, the parallel iterations are followed by a path-relinking intensification process to improve the quality of the obtained solutions [3, 6]. In [5], a methodology is proposed for the analysis of parallel GRASP approaches.

Table 4.3 A quick survey of several parallel GRASP

Article | Parallel Model
[53] (1994) | Parallel multistart, QAP
[70] (1995) | Parallel multistart, QAP
[57] (1998) | Cycle stealing, parallel multistart with adaptive load balancing, Steiner problem in graphs
[6] (2000) | Parallel independent multistart with path-relinking, three-index assignment problem
[3] (2003) | Parallel independent multistart with path-relinking, job shop scheduling problem
[5] (2003) | A methodology for the analysis of parallel GRASP
Load balancing may be easily achieved by evenly distributing the iterations among the processors. However, in a heterogeneous multiuser execution environment a static distribution may be less efficient. In [57], a dynamic adaptive distribution approach is presented. The approach is based on the farmer-worker cycle stealing strategy. Each worker processor is initially allocated a small number of iterations. Once it has performed its iterations, it requests additional iterations from the farmer processor. All the workers are stopped once the final result is returned. Faster and less loaded processors perform more iterations than the others. This approach reduces the execution time compared to the static one.
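The following sketch (illustrative only; the construction and local search routines are hypothetical placeholders) distributes independent GRASP iterations over a pool of workers. Handing out the iterations with a small chunk size approximates the dynamic, cycle-stealing distribution just described, since faster workers simply request more iterations.

# Sketch of a parallel multistart GRASP: independent iterations spread over workers.
from multiprocessing import Pool

def greedy_randomized_construction(seed):
    # Placeholder: build a feasible solution with a randomized greedy rule.
    import random
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(20)]

def local_search(solution):
    # Placeholder: return a locally improved solution and its cost.
    cost = sum(solution)
    return solution, cost

def grasp_iteration(seed):
    return local_search(greedy_randomized_construction(seed))

def parallel_grasp(iterations=1000, workers=8):
    with Pool(workers) as pool:
        # chunksize=1 lets idle workers keep requesting further iterations dynamically.
        results = pool.imap_unordered(grasp_iteration, range(iterations), chunksize=1)
        return min(results, key=lambda t: t[1])

if __name__ == "__main__":
    best_solution, best_cost = parallel_grasp()
    print(best_cost)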
4.3.4 Parallel Variable Neighborhood Search

The basic idea of Variable Neighborhood Search (VNS) [65] is to successively explore a set of predefined neighborhoods to provide a better solution. It uses a descent method to reach a local minimum, and then explores, either at random or systematically, the set of neighborhoods. At each step, an initial solution is shaken
from the current neighborhood. The current solution is replaced by a new one if and only if a better solution has been found, and the exploration is then restarted from that solution in the first neighborhood. If no better solution is found, the algorithm moves to the next neighborhood, randomly generates a new solution, and attempts to improve it. Since VNS is a relatively recent metaheuristic, it has not yet been investigated much from a parallelization point of view. The two major research works reported in the literature on the parallelization of VNS are [37] and [22]. In [37], three approaches are proposed and compared: the first one follows a low-level parallel model and attempts to speed up the execution by parallelizing the local search phase; the second one is based on the parallel independent multistart model; and the third strategy implements the parallel synchronous cooperative multistart model. They were evaluated on TSPLIB problem instances with 1400 customers. The reported results show that the multistart models obtained the best solutions.

Table 4.4 A quick survey of parallel VNS

Article | Parallel Model
[37] (2002) | Parallel local search, TSPLIB
[37] (2002) | Parallel independent multistart, TSPLIB
[37] (2002) | Parallel cooperative synchronous multistart, TSPLIB
[22] (2004) | Parallel cooperative asynchronous multistart, p-median problem
In [22], an asynchronous cooperative variant of the parallel multistart model, called Cooperative Neighborhood VNS (CNVNS), is proposed. The farmer keeps, updates, and communicates the current overall best solution; it also initiates and terminates the procedure. Unlike the parallel cooperative synchronous variant of the multistart model presented above, the communications are initiated by the workers in an asynchronous way. When a worker cannot improve its solution, it communicates it to the farmer if it is better than the one sent at the last communication. The overall best solution is then requested from the farmer and serves as the initial solution from which the search is restarted in the current neighborhood. The approach has been evaluated on p-median problem instances with up to 1,000 medians and 11,948 customers. The results show that the strategy reduces the computation time without losing solution quality compared to the sequential VNS; given more time, it also finds better solutions.
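A compact sequential VNS loop corresponding to this description is sketched below (illustrative; the shake, local_search, and objective functions as well as the neighborhood structures are problem-specific placeholders). Parallel variants such as CNVNS wrap this loop in worker processes that report improvements to a farmer.

# Sketch of the basic Variable Neighborhood Search loop (minimization).
def vns(initial, objective, shake, local_search, k_max=5, max_iters=1000):
    best, f_best = initial, objective(initial)
    iters = 0
    while iters < max_iters:
        k = 1
        while k <= k_max and iters < max_iters:
            candidate = shake(best, k)            # random point in the k-th neighborhood
            improved = local_search(candidate)    # descent to a local optimum
            f_improved = objective(improved)
            if f_improved < f_best:
                best, f_best = improved, f_improved
                k = 1                             # restart from the first neighborhood
            else:
                k += 1                            # move to the next neighborhood
            iters += 1
    return best, f_best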
4.4 PARALLEL EVOLUTIONARY ALGORITHMS

4.4.1 Principles of EAs

Evolutionary Algorithms (broadly called EAs) are stochastic search techniques that have been successfully applied in many real and complex applications (epistatic, multimodal, multiobjective, and highly constrained problems). Their success in
solving difficult optimization tasks has promoted the research in the field known as evolutionary computing (EC) [13]. An EA is an iterative technique that applies stochastic operators on a pool of individuals (the population) (see Algorithm 2). Every individual in the population is the encoded version of a tentative solution. Initially, this population is generated randomly. An evaluation function associates a fitness value to every individual, indicating its suitability to the problem.

Algorithm 2. EA pseudocode
  Generate(P(0));
  t := 0;
  while not Termination-Criterion(P(t)) do
    Evaluate(P(t));
    P'(t) := Selection(P(t));
    P'(t) := Apply-Reproduction-Ops(P'(t));
    P(t+1) := Replace(P(t), P'(t));
    t := t + 1;
  endwhile
The above pseudocode shows the generic components of any EA. There exist several well-accepted subclasses of EAs, depending on the representation of the individuals or on the applied evolution step. The main subclasses of EAs are the Genetic Algorithm (GA), Evolutionary Programming (EP), the Evolution Strategy (ES), and some others not discussed here.

4.4.2 Parallel Models of EAs
For nontrivial problems, executing the reproductive cycle of a simple EA on long individuals and/or large populations requires high computational resources. In general, evaluating a fitness function for every individual is frequently the most costly operation of the EA. Consequently, a variety of algorithmic issues are being studied to design efficient EAs. These issues usually consist of defining new operators, hybrid algorithms, parallel models, and so on. We now analyze the parallel models used in the field of EC. Parallelism arises naturally when dealing with a population, since each of the individuals belonging to it is an independent unit. Due to this, the performance of population-based algorithms is especially improved when running in parallel. Two parallelizing strategies are specially focused on population-based algorithms: (1) parallelization of computation, in which the operations commonly applied to each of the individuals are performed in parallel, and (2) parallelization of population, in which the population is split in different parts that can be simply exchanged or evolve separately and be joined later. In the beginning of the parallelization of these algorithms, the well-known master-slave (also known as global parallelization) method was used. In this way, a central
processor performs the selection operations while the associated slave processors perform the recombination, mutation, and evaluation of the fitness function. This algorithm is the same as the sequential one, although it is faster, especially for time-consuming objective functions. Also, many researchers use a pool of processors to speed up the execution of a sequential algorithm, just because independent runs can be made more rapidly by using several processors than by using a single one; in this case, no interaction at all exists between the independent runs. However, most parallel EAs (PEAs) found in the literature utilize some kind of spatial disposition for the individuals and then parallelize the resulting chunks in a pool of processors. Among the most widely known types of structured EAs, the distributed (dEA) (or coarse-grain) and cellular (cEA) (or fine-grain) algorithms are very popular optimization procedures [10]. In the case of distributed EAs, the population is partitioned into a set of islands in which isolated EAs are executed. Sparse individual exchanges are performed among these islands with the goal of introducing some diversity into the subpopulations, thus preventing them from falling into local optima. In the case of a cellular EA, the concept of neighborhood is introduced, so that an individual may only interact with its nearby neighbors in the breeding loop. The overlapped small neighborhoods in cEAs help in exploring the search space, because the slow diffusion of solutions through the population provides a kind of exploration, while exploitation takes place inside each neighborhood. Hybrid models have also been proposed in which a two-level approach of parallelization is undertaken: in general, the higher level of parallelization is a coarse-grain implementation, and each basic island performs a cEA, a master-slave method, or even another distributed one.
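To make the distributed (island) model concrete, the sketch below (a simplified illustration, not the implementation of any system surveyed later) evolves several subpopulations independently and periodically migrates the best individual of each island to its neighbor on a ring, replacing the neighbor's worst individual. The fitness and mutation functions are assumed to be supplied by the caller.

# Sketch of a distributed (island) EA: isolated subpopulations with ring migration.
import random

def evolve_one_generation(island, fitness, mutate):
    # Placeholder generational step: binary tournament selection plus mutation.
    new_island = []
    for _ in island:
        a, b = random.sample(island, 2)
        parent = a if fitness(a) >= fitness(b) else b
        new_island.append(mutate(parent))
    return new_island

def migrate_ring(islands, fitness):
    # Best individual of island i replaces the worst individual of island i+1.
    bests = [max(isl, key=fitness) for isl in islands]
    for i in range(len(islands)):
        target = islands[(i + 1) % len(islands)]
        worst = min(range(len(target)), key=lambda j: fitness(target[j]))
        target[worst] = bests[i]

def distributed_ea(islands, fitness, mutate, generations=100, migration_gap=10):
    for g in range(1, generations + 1):
        islands = [evolve_one_generation(isl, fitness, mutate) for isl in islands]
        if g % migration_gap == 0:
            migrate_ring(islands, fitness)
    return max((ind for isl in islands for ind in isl), key=fitness)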
4.5 CASE STUDIES OF PARALLEL EAs

4.5.1 Parallel Genetic Algorithms
Genetic Algorithms (GAs) [13] are a very popular class of EAs. Traditionally, GAs are associated with the use of a binary representation, but nowadays one can find GAs that use other types of representations. A GA usually applies a recombination operator on two solutions, plus a mutation operator that randomly modifies the individual contents to promote diversity. In Table 4.5 we show some of the most important and representative works on parallel GAs. The distributed model is the most common parallelization in PGAs, since it can be implemented on distributed-memory MIMD computers. Some coarse-grain algorithms like dGA [90], DGENESIS [58], and GALOPPS [40] are relatively close to the general distributed model of migration islands. They often include many features to improve efficiency. Some other coarse-grain models like GDGA [44] have been designed for specific goals, such as providing explicit exploration/exploitation by applying different operators on each island. Some other PGAs execute nonorthodox models of coarse-grain evolution, such as GENITOR II [94], which is based on a steady-state reproduction.
Table 4.5 A quick survey of several parallel GAs

Algorithm | Article | Parallel Model
ASPARAGOS | [41] (1989) | Fine-grain. Applies hill-climbing if no improvement
dGA | [90] (1989) | Distributed populations
GENITOR II | [94] (1990) | Coarse grain
ECO-GA | [95] (1991) | Fine-grain
EnGENEer | [76] (1992) | Global parallelization
GAME | [82] (1993) | Object-oriented set of general programming tools
DGENESIS | [58] (1994) | Coarse grain with migration among subpopulations
GALOPPS | [40] (1996) | Coarse grain
GDGA | [44] (2000) | Coarse grain. Hypercube topology
ParadisEO | [17] (2004) | A general framework for parallel algorithms
On the other hand, parallel implementations of the cellular model have been strongly associated with the machines on which they run, such as ASPARAGOS [41] and ECO-GA [95]. As to the master-slave model, some implementations, such as EnGENEer [76], are available. Finally, some efforts to construct general frameworks for PGAs are GAME [82] and ParadisEO [17]. These systems are endowed with "general" programming structures intended to ease the implementation of any model of PGA.

4.5.2 Parallel Evolution Strategies
Evolution Strategies (ESs) [13] are another subclass of EAs, like GAs or GP. This algorithm is suited for continuous optimization, usually with an elitist selection and a specific mutation (crossover is rarely used). In an ES, the individual is composed of the objective (float) variables plus some other parameters guiding the search. Thus, an ES facilitates a kind of self-adaptation by evolving the problem variables as well as the strategy parameters at the same time. Hence, the parameterization of an ES is highly customizable.
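As an illustration of this self-adaptation, the sketch below mutates an ES individual that carries its own step sizes (one per variable) using the standard log-normal rule. It is a generic textbook-style example, not code from any of the works surveyed in Table 4.6.

# Sketch of self-adaptive ES mutation: strategy parameters evolve with the variables.
import math
import random

def mutate_es(individual):
    """individual = (x, sigma): objective variables and per-variable step sizes."""
    x, sigma = individual
    n = len(x)
    tau_global = 1.0 / math.sqrt(2.0 * n)
    tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    common = tau_global * random.gauss(0.0, 1.0)
    new_sigma = [s * math.exp(common + tau_local * random.gauss(0.0, 1.0)) for s in sigma]
    new_x = [xi + si * random.gauss(0.0, 1.0) for xi, si in zip(x, new_sigma)]
    return new_x, new_sigma

# Usage: start with unit step sizes and call mutate_es inside a (mu + lambda) loop.
parent = ([0.0] * 10, [1.0] * 10)
offspring = mutate_es(parent)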
Table 4.6 A quick survey of several parallel ESs

Article | Parallel Model
[79] (1991) | Distributed
[24] (1993) | Distributed
[83] (1994) | Cellular
[81] (1996) | Distributed and cellular
[42] (1999) | Cellular
[93] (2004) | Cellular with dynamic neighborhood structures
CASE STUDIES OF PARALLEL EAs
89
Table 4.6 shows some representative parallel implementations of ESs. Several of these works [81, 83, 93] follow the cellular approach, where the individuals are structured and may only interact with their nearby neighbors. They show that a cellular model applied to complex problems can have a higher convergence probability than panmictic algorithms. Other studies have analyzed the convergence properties of this model, such as [42]. The classic distributed model has also been extensively used to implement parallel versions of ES [24, 79, 81], and it obtains very competitive results for continuous optimization problems.

4.5.3 Parallel Genetic Programming

Genetic Programming (GP) [13] is a more recent EA that extends the generic model of learning to the space of programs. Its major variation with respect to other evolutionary families is that the evolving individuals are themselves programs instead of fixed-length strings from a finite alphabet of symbols. GP is a form of program induction that allows one to automatically discover programs that solve, or approximately solve, a given task. See Table 4.7 for a summary of parallel implementations of GP.
Table 4.7 A quick survey of parallel GP

Article | Parallel Model
[48] (1996) | Fine-grain
[11] (1996) | Distributed
[27] (1996) | Master-slave
[12] (1996) | Distributed
[73] (1998) | Distributed
[31] (2000) | Distributed
[33] (2001) | Fine-grain
GP is not in general suitable for massively cellular implementations, since individuals may vary widely in size and complexity. This makes cellular implementations of GP difficult, both because of the amount of local memory needed to store individuals and for efficiency reasons. Despite these difficulties, several fine-grain parallel GPs have been implemented, such as [48]. Several implementations of this cellular model on distributed-memory computers can also be found in the literature, such as [33], where the authors show that their parallel cellular GP has a nearly linear speedup and a good scaleup behavior. For coarse-grain, island-based parallel genetic programming the situation is somewhat controversial. Several works [11, 12, 31] reported excellent results, but another one [73] found that the multiple-population approach did not help in solving some problems. Other implementations can also be found, such as [27], where the authors used a master-slave approach.
4.5.4 Parallel Ant Colony Optimization
The ant colony optimization technique (ACO) [26] is a recent metaheuristic for hard combinatorial optimization problems. Ant algorithms have been inspired by colonies of real ants, which deposit a chemical substance (called pheromone) on the ground. This substance influences the choices they make: the larger the amount of pheromone on a particular path, the larger the probability that the ants select the path. Artificial ants are stochastic construction procedures that probabilistically build a solution by iteratively adding solution components to partial ones, taking into account (1) heuristic information on the problem and (2) pheromone trails, which change dynamically at runtime to reflect the acquired search experience. Ant algorithms are good candidates for parallelization, but not much research has been done on parallel ant algorithms so far. In this section we briefly describe the most important parallel implementations and parallel models of ant colony algorithms that have been reported in the literature.
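The sketch below (an illustrative, simplified Ant System step for a generic graph-based construction problem such as the TSP; it is not the implementation of any surveyed work) shows the two mechanisms just described: probabilistic construction biased by pheromone and heuristic information, followed by pheromone evaporation and deposit. In master-slave parallelizations, the construction loop is what the slaves execute.

# Sketch of an Ant System step: biased construction plus pheromone update.
import random

def construct_tour(n, pheromone, heuristic, alpha=1.0, beta=2.0):
    start = random.randrange(n)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        i = tour[-1]
        js = list(unvisited)
        weights = [(pheromone[i][j] ** alpha) * (heuristic[i][j] ** beta) for j in js]
        nxt = random.choices(js, weights=weights, k=1)[0]   # roulette-wheel choice
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def update_pheromone(pheromone, tours, costs, rho=0.1, q=1.0):
    n = len(pheromone)
    for i in range(n):
        for j in range(n):
            pheromone[i][j] *= (1.0 - rho)                  # evaporation
    for tour, cost in zip(tours, costs):
        deposit = q / cost
        for i, j in zip(tour, tour[1:] + [tour[0]]):
            pheromone[i][j] += deposit                       # reinforce used edges
            pheromone[j][i] += deposit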
Table 4.8 A quick survey of several parallel ACO

Article | Parallel Model
(1993) | Fine-grained and a coarser grained variant
(1997) | Information exchange every k generations
(1998) | Independent runs
(1998) | Distributed
(1999) | Master-slave
(2001) | Distributed
(2002) | Distributed
(2002) | Master-slave and independent runs
(2002) | A very coupled master-slave
(2004) | Master-slave
In Table 4.8, we can observe that many parallel algorithms follow a master-slave model [25, 74, 75, 88]. In this model, a master process sends some solutions to each slave processor; after each generation these colonies exchange information, and the master computes the pheromone matrix. The distributed or island model has also been used in multiple implementations of parallel versions of ACO [59, 62, 63]. For this last model, different methods for information exchange in multicolony ant algorithms have been studied, and these studies conclude that it is better to exchange the local best solution only with the neighbor in a directed ring, and not too often (loose coupling). Stützle [87] studied the easiest way to parallelize the algorithm (run several independent executions and return the best solution over all executions) and obtained more accurate results than the serial version in many cases.
4.5.5 Parallel Estimation of Distribution Algorithms

Estimation of Distribution Algorithms (EDAs) are a recent type of optimization and learning technique based on the concept of using a population of tentative solutions to improve the best-so-far optimum for a problem [66]. The general EDA can be sketched as follows:
Algorithm 3. EDA pseudocode
  Set t := 1;
  Generate N >> 0 points randomly;
  while termination criteria are not met do
    Select M ≤ N points according to a selection method;
    Estimate the distribution p*(x, t) of the selected set;
    Generate N new points according to the distribution p*(x, t);
    Set t := t + 1;
  endwhile
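The simplest instantiation of Algorithm 3 is a univariate EDA (UMDA-style) over binary strings, where the estimated distribution is just a vector of independent bit probabilities. The minimal sketch below is offered only as an illustration; the ONEMAX objective is a placeholder.

# Sketch of a univariate EDA (UMDA-like) on binary strings.
import random

def umda(n_bits=50, pop_size=100, selected=50, generations=100):
    def onemax(ind):
        return sum(ind)
    def sample(probs):
        return [1 if random.random() < p else 0 for p in probs]
    probs = [0.5] * n_bits
    population = [sample(probs) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=onemax, reverse=True)
        best = population[:selected]
        # Estimate the distribution: marginal frequency of a 1 at each position.
        probs = [sum(ind[i] for ind in best) / selected for i in range(n_bits)]
        population = [sample(probs) for _ in range(pop_size)]
    return max(population, key=onemax)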
The chief step in this algorithm is to estimate p*(x, t) and to generate new points according to this distribution. This represents a clear difference with respect to other EAs, which use recombination and/or mutation operators to compute a new population of tentative solutions. This algorithm requires considerable CPU and memory utilization; therefore, it is important to find techniques, such as parallelism, that improve its execution. There are several possible levels at which an EDA can be parallelized: (1) the estimation of the probability distribution level (or learning level), (2) the sampling of new individuals level (or simulation level), (3) the population level, (4) the fitness evaluation level, and (5) any combination of the mentioned levels.

Table 4.9 A quick survey of several parallel EDAs

Article | Parallel Model
(2000) | Hybrid, learning level and simulation
(2001) | Learning level
(2001) | Hybrid, learning level and simulation
(2003) | Population level
(2003) | Learning level
(2003) | Learning level
(2003) | Learning level
(2004) | Learning level
(2004) | Hybrid, learning level and simulation
Table 4.9 shows a survey of the most important works on parallel EDAs. Notice that in most cases the algorithms try to reduce the time required to learn the probability distribution [55, 56, 60, 68]. In general, these algorithms for learning the probability
distribution use a score+search procedure, mainly defining a metric that measures the goodness of every candidate Bayesian network with respect to a database of cases. Also, several approaches use the parallelization of the sampling of new individuals in order to improve the behavior of the algorithms [61, 68, 69]. Only a few works use the other levels of parallelism, e.g., [2], in which the authors propose a skeleton for the construction of distributed EDAs simulating migration through vectors of probabilities.

4.5.6 Parallel Scatter Search
Scatter Search (SS) [38] is a population-based metaheuristic that combines solutions selected from a reference set to build others. The method starts by generating an initial population of disperse and good solutions. The reference set is then constructed by selecting good representative solutions from the population. The selected solutions are combined to provide starting solutions for an improvement procedure. According to the result of such a procedure, the reference set and even the population of solutions can be updated. The process is iterated until a stopping criterion is satisfied. The SS approach involves different procedures to generate the initial population, to build and update the reference set, to combine the solutions of such a set, to improve the constructed solutions, etc. The major parallel implementations of SS are summarized in Table 4.10. Parallelism can be used at three levels of the SS process: the improvement procedure level, the combination level, and the whole process level. The first level is a low-level one and consists in parallelizing the improvement procedure by using the different parallel models of LSMs quoted above. In [36], the coarse-grain parallelism of the local searches (parallel moves model) is exploited; the approach is called Synchronous Parallel Scatter Search (SPSS). The model is applied to the p-median problem, and the experimental results demonstrate its efficiency in terms of the quality of the provided solutions. Furthermore, it properly reduces the computational time.
Table 4.10 A quick survey of the major parallel SS

Article | Parallel Model
[36] (2003) | Parallel local search with parallel moves (SPSS)
[36] (2003) | Master-slave parallel combinations (RCSS)
[36] (2003) | Independent runs (RPSS), p-median problem
[38] (2004) | RCSS with different combination methods and different parameter settings, feature selection in data mining
The second level of parallelism is obtained by parallelizing the combinations of solutions. The set of possible combinations is divided among the available processors and solved in parallel. Such a model is presented in [36] and called Replicated Combination Scatter Search (RCSS). The experiments with its application to the
OTHER MODELS
93
p-median problem show that the model obtains the best known results in a reduced computational time. Another variant of the model, with different combination methods and different parameter settings, has been proposed in [38] and evaluated in the data mining area. The objective of the approach is to improve the precision of the SS metaheuristic without increasing the computational time. At the third level of parallelism, the whole SS process is parallelized, i.e., each processor runs an SS procedure. The model is a multistart one, and its objective is to increase the diversification of the solutions. The intensification can also be increased by sharing the best found solution. In [38], the model is called Replicated Parallel Scatter Search (RPSS) and has been applied to the p-median problem. The reported experimental results show that the model finds the best objective values.
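As a small illustration of the second level (parallel combinations), the sketch below distributes all pairwise combinations of the reference set over a process pool. The combination and improvement operators are hypothetical placeholders; this is not the RCSS implementation of [36].

# Sketch of parallelizing the solution-combination step of Scatter Search.
from itertools import combinations
from multiprocessing import Pool

def combine(pair):
    a, b = pair
    # Placeholder combination operator: uniform mix of two parent solutions.
    import random
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def improve(solution):
    # Placeholder improvement (local search); returns (cost, solution).
    return sum(solution), solution

def combine_and_improve(pair):
    return improve(combine(pair))

def parallel_combinations(ref_set, workers=4):
    pairs = list(combinations(ref_set, 2))
    with Pool(workers) as pool:
        results = pool.map(combine_and_improve, pairs)
    return sorted(results, key=lambda t: t[0])   # candidates for updating the reference set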
4.6 OTHER MODELS
In this section we include a set of models whose classification is unclear because they could belong to some of the existing types of algorithms.
4.6.1 Parallel Heterogeneous Metaheuristics

A heterogeneous algorithm is a method whose components either are executed on different computing platforms or have different search features. The first class, hardware heterogeneity, is an influential issue because of the current relevance of Internet and grid computing. Several works have found that this kind of heterogeneous platform allows the performance to be improved [8]. As to the other class, software/search heterogeneity, we can define additional levels of heterogeneity with regard to the kind of search that the components are making. At this software level, we can distinguish various sublevels according to the source of the heterogeneity: (1) Parameter level: we use the same metaheuristic in each component but vary the parameter configuration. (2) Operator level: at this level the heterogeneity is introduced by using different search mechanisms (for example, different genetic operators in GAs, or different neighborhood definitions in SAs). (3) Solution level: at this level each component stores locally encoded solutions represented with different encoding schemata. (4) Algorithm level: this is the most general class of software heterogeneity, in which each component can potentially be a different algorithm. In Table 4.11 we show several representative algorithms of each level of the software heterogeneous class. We can observe that in most cases the heterogeneity is introduced by using a different configuration in each component that composes the heterogeneous algorithm [1, 7, 21, 45, 91, 92]. In general, these algorithms use the distributed parallel model, and each island uses its own configuration (different rates of mutation or crossover), although each implementation has its own distinguishing features. For example, in PGA-PA [91] the parameters are also migrated and evolved, and in Hy4 the 16 subpopulations are arranged in a hypercube
Table 4.11 A quick survey of several parallel heterogeneous metaheuristics

Algorithm | Article | Heterogeneity level
iiGA | (1994) | Solution level (distributed model)
GCPSA | (1996) | Algorithm level with heuristic relay (master-slave model)
CoPDEB | (1996) | Operator and parameter levels (distributed model)
CGA | (1996) | Parameter level (master-slave model)
MACS-VRPTW | (1999) | Operator level (distributed model)
DGNnnr | (1999) | Parameter level (distributed model)
PGA-PA | (2002) | Parameter level (distributed model)
CPTS | (2002) | Parameter level (distributed model)
CPM-VRPTW | (2004) | Algorithm level with heuristic teamwork
HY4 | (2004) | Operator and parameter levels (distributed model)
topology of four dimensions. Another common form of heterogeneity is to put different algorithms to work together; for example, CPM-VRPTW [51] is composed of two TS, two EAs, and a local search method, and GCPSA [46] is composed of several SAs and one GA.
4.6.2 Parallel Multiobjective Optimization

Optimization problems in real applications often have to consider many objectives, and we thus have a multiobjective (MO) problem. A trade-off between the objectives exists, and we never have a situation in which all the objectives can be satisfied in the best possible way simultaneously. MO optimization provides the information on all the alternative solutions we can have for a given set of objectives. By analyzing the spectrum of solutions we then have to decide which of these solutions is the most appropriate. The two steps are thus to solve the MO problem and to decide what the optimal solution is. Parallelization may be especially productive in certain MO applications, due primarily to the fact that identifying a set of solutions, perhaps a very large one, is often the primary goal driving the search. Table 4.12 summarizes some of the most important works on parallel MO algorithms. The fitness evaluation in MO problems is the most time-consuming process of the algorithm; therefore, several algorithms try to reduce this time by parallelizing the calculation of the fitness evaluation [84, 85, 86]. The other most used model is the distributed model, and several algorithms follow it [47, 71, 80]. Other models are also implemented; for example, Nebro et al. [67] proposed an exhaustive algorithm that uses a grid computing technique to parallelize the search, and Conti et al. [20] use an SA with parallel exploration of the neighborhood.
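Since the fitness evaluation dominates the cost, a common pattern is a master-slave evaluation of the objective vectors combined with a Pareto-dominance filter on the master. The sketch below is a generic illustration (the objective functions are placeholders), not code from any work in Table 4.12.

# Sketch: parallel evaluation of objective vectors plus a simple nondominated filter.
from multiprocessing import Pool

def objectives(solution):
    # Placeholder multiobjective evaluation (both objectives to be minimized).
    f1 = sum(solution)
    f2 = len(solution) - sum(solution)
    return (f1, f2)

def dominates(u, v):
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def nondominated(evaluated):
    front = []
    for sol, f in evaluated:
        if not any(dominates(g, f) for _, g in evaluated if g != f):
            front.append((sol, f))
    return front

def evaluate_population(population, workers=4):
    with Pool(workers) as pool:                      # slaves compute the objective vectors
        values = pool.map(objectives, population)
    return list(zip(population, values))

if __name__ == "__main__":
    pop = [[i % 2 for i in range(8)], [1] * 8, [0] * 8]
    print(nondominated(evaluate_population(pop)))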
Table 4.12 A quick survey of several parallel MO algorithms

Article | Parallel Model
(1994) | SA with parallel exploration of the neighborhood
(1995) | Master-slave model with slaves computing the fitness evaluation
(1995) | Master-slave model with slaves computing the fitness evaluation
(1996) | Cellular model
(1997) | Heterogeneous distributed algorithm
(2000) | Two master-slave algorithms
(2002) | Distributed hybrid algorithm
(2004) | Enumerative deterministic algorithm
(2004) | Distributed particle swarm optimization
4.7 CONCLUSIONS

Parallelism is a powerful and necessary way to reduce the computation time of metaheuristics and/or improve the quality of the provided solutions. Different models have been proposed to exploit the parallelism of metaheuristics. These models have been, and are still being, largely experimented with on a wide range of metaheuristics and applied to a large variety of problems in different areas. The results reported in the literature demonstrate that the efficiency of these models depends both on the kind of metaheuristic at hand and on the characteristics of the problem being tackled. The survey presented in this chapter shows the following. (1) The parallel multistart model of LSMs is straightforward to use in its independent variant and, in this case, improves the robustness of the execution. The cooperative variant is more complex but often provides better solutions. Synchronous information exchange guarantees more reliability but is often less efficient than asynchronous exchange. (2) This statement is also true for the parallel moves model. The efficiency of this model may be greater if the evaluation of each move is time-consuming and/or there are a great number of candidate neighbors to evaluate. Another parameter that influences the performance of such a model is the interactivity of the moves. If only noninteracting moves are accepted, the convergence property of the sequential algorithm can be preserved and good speedups can be obtained; however, determining noninteracting moves is not an obvious task. On the other hand, the acceptance of interacting moves affects the convergence of the algorithm. In addition, negative speedups may be obtained due to the synchronization problem. (3) The move acceleration model may be particularly interesting if the evaluation function can itself be parallelized because it is CPU time-consuming and/or I/O intensive. In general, due to its fine-grained nature, it is less exploited than the other models. The master-slave model has been and is still very popular in the area of parallel EAs. It speeds up the execution, especially for time-consuming objective functions. Independent runs, with no interaction between them, are also easy to use. Nevertheless, most parallel EAs are nowadays based on the distributed coarse-grained model and the cellular fine-grained model. The former
introduces some diversity and provides better solutions. Its efficiency depends on some parameters, such as the exchange topology, the exchange mode (synchronous/asynchronous), etc. The latter model allows at the same time better exploration and exploitation of the search space; therefore, it provides better and more diverse solutions. In the last decade, grid computing [34] and Peer-to-Peer (P2P) computing [28] have become real alternatives to traditional supercomputing for the development of parallel applications that harness massive computational resources. In the future, the focus in the area of parallel and distributed metaheuristics will be on the gridification of the parallel models presented in this chapter. This is a great challenge, as grid- and P2P-enabled frameworks for metaheuristics are just emerging [18, 29, 67].

Acknowledgments

The first and third authors acknowledge funding by the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project).
REFERENCES

1. P. Adamidis and V. Petridis. Co-operating Populations with Different Evolution Behaviors. In Proc. of the Third IEEE Conf. on Evolutionary Computation, pages 188-191, New York, 1996. IEEE Press.
2. C.W. Ahn, D.E. Goldberg, and R.S. Ramakrishna. Multiple-deme parallel estimation of distribution algorithms: Basic framework and application. Technical Report 2003016, University of Illinois, 2003.
3. R.M. Aiex, S. Binato, and M.G.C. Resende. Parallel GRASP with path-relinking for job shop scheduling. Parallel Computing, 29:393-430, 2003.
4. R.M. Aiex, S.L. Martins, C.C. Ribeiro, and N.R. Rodriguez. Cooperative multi-thread parallel tabu search with an application to circuit partitioning. LNCS 1457, pages 310-331, 1998.
5. R.M. Aiex and M.G.C. Resende. A methodology for the analysis of parallel GRASP strategies. AT&T Labs Research TR, Apr. 2003.
6. R.M. Aiex, M.G.C. Resende, P.M. Pardalos, and G. Toraldo. GRASP with path relinking for the three-index assignment problem. TR, AT&T Labs Research, Florham Park, NJ 07932, USA, 2000.
7. E. Alba, F. Luna, A.J. Nebro, and J.M. Troya. Parallel heterogeneous GAs for continuous optimization. Parallel Computing, 30:699-719, 2004.
8. E. Alba, A.J. Nebro, and J.M. Troya. Heterogeneous Computing and Parallel Genetic Algorithms. Journal of Parallel and Distributed Computing, 62:1362-1385, 2002.
9. E. Alba and the MALLBA Group. MALLBA: A library of skeletons for combinatorial optimization. LNCS 2400, pages 927-932, 2002.
10. E. Alba and M. Tomassini. Parallelism and Evolutionary Algorithms. IEEE Transactions on Evolutionary Computation, 6(5):443-462, 2002.
11. D. Andre and J.R. Koza. Parallel genetic programming: a scalable implementation using the transputer network architecture. In Advances in Genetic Programming: Volume 2, pages 317-337. MIT Press, 1996.
12. D. Andre and J.R. Koza. A parallel implementation of genetic programming that achieves super-linear performance. In H.R. Arabnia, editor, Proceedings of the International Conf. on Parallel and Distributed Processing Techniques and Applications, volume III, pages 1163-1174, 1996.
13. T. Bäck, D.B. Fogel, and Z. Michalewicz, editors. Handbook of Evolutionary Computation. Oxford University Press, 1997.
14. M.J. Blesa, Ll. Hernández, and F. Xhafa. Parallel Skeletons for Tabu Search Method. In the 8th Intl. Conf. on Parallel and Distributed Systems, Korea, IEEE Computer Society Press, pages 23-28, 2001.
15. M. Bolondi and M. Bondaza. Parallelizzazione di un Algoritmo per la Risoluzione del Problema del Commesso Viaggiatore. Master's thesis, Politecnico di Milano, Dipartimento di Elettronica e Informazione, 1993.
16. B. Bullnheimer, G. Kotsis, and C. Strauss. Parallelization Strategies for the Ant System. Technical Report 8, University of Vienna, October 1997.
17. S. Cahon, N. Melab, and E.-G. Talbi. ParadisEO: A Framework for the Reusable Design of Parallel and Distributed Metaheuristics. Journal of Heuristics, 10(3):357-380, May 2004.
18. S. Cahon, N. Melab, and E.-G. Talbi. ParadisEO on Condor-MW for optimization on computational grids. http://www.lifl.fr/~cahon/cmw/index.html, 2004.
19. J.A. Chandy and P. Banerjee. Parallel Simulated Annealing Strategies for VLSI Cell Placement. Proc. of the 1996 Intl. Conf. on VLSI Design, Bangalore, India, page 37, Jan. 1996.
20. M. Conti, S. Orcioni, and C. Turchetti. Parametric Yield Optimisation of MOS VLSI Circuits Based on SA and its Parallel Implementation. IEE Proc.-Circuits Devices Syst., 141(5):387-398, 1994.
21. T.G. Crainic and M. Gendreau. Cooperative Parallel Tabu Search for Capacitated Network Design. Journal of Heuristics, 8:601-627, 2002.
22. T.G. Crainic, M. Gendreau, P. Hansen, and N. Mladenovic. Cooperative Parallel Variable Neighborhood Search for the p-Median. Journal of Heuristics, 10(3), 2004.
23. T.G. Crainic and M. Toulouse. Parallel Strategies for Meta-heuristics. In F. Glover and G. Kochenberger, eds., Handbook of Metaheuristics, Kluwer Academic Publishers, Norwell, MA, pages 475-514, 2003.
24. I. de Falco, R. del Balio, and E. Tarantino. Testing parallel evolution strategies on the quadratic assignment problem. In Proc. IEEE Int. Conf. on Systems, Man and Cybernetics, volume 5, pages 254-259, 1993.
25. K.F. Doerner, R.F. Hartl, G. Kiechle, M. Lucka, and M. Reimann. Parallel Ant Systems for the Capacitated VRP. In J. Gottlieb and G.R. Raidl, editors, EvoCOP'04, pages 72-83. Springer-Verlag, 2004.
26. M. Dorigo. Optimization, Learning and Natural Algorithms. PhD thesis, Dipartimento di Elettronica, Politecnico di Milano, 1992.
27. D.C. Dracopoulos and S. Kent. Bulk synchronous parallelisation of genetic programming. Technical report, Brunel University, 1996.
28. A. Oram (Ed.). Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O'Reilly & Associates, 2001.
29. M.G. Arenas, P. Collet, A.E. Eiben, M. Jelasity, J.J. Merelo, B. Paechter, M. Preuss, and M. Schoenauer. A framework for distributed evolutionary algorithms. Proceedings of PPSN VII, Granada, Spain, LNCS 2439, pages 665-675, 2002.
30. T.A. Feo and M.G.C. Resende. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6:109-133, 1999.
31. F. Fernández, M. Tomassini, W.F. Punch III, and J.M. Sánchez-Pérez. Experimental study of multipopulation parallel genetic programming. In Proc. of the European Conf. on GP, pages 283-293. Springer, 2000.
32. C.N. Fiechter. A parallel tabu search algorithm for large traveling salesman problems. Discrete Applied Mathematics, 51:243-267, 1994.
33. G. Folino, C. Pizzuti, and G. Spezzano. CAGE: A tool for parallel genetic programming applications. In J.F. Miller et al., editors, Proceedings of EuroGP'2001, LNCS 2038, pages 64-73, Italy, 2001. Springer-Verlag.
34. I. Foster and C. Kesselman (eds.). The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Francisco, 1999.
35. L.M. Gambardella, E.D. Taillard, and G. Agazzi. MACS-VRPTW: A Multiple Ant Colony System for Vehicle Routing Problems with Time Windows. In D. Corne, M. Dorigo, and F. Glover, editors, New Ideas in Optimization, pages 63-76. McGraw-Hill, 1999.
36. F. García-López, B. Melián-Batista, J.A. Moreno-Pérez, and J.M. Moreno-Vega. Parallelization of the Scatter Search. Parallel Computing, 29:575-589, 2003.
37. F. García-López, B. Melián-Batista, J.A. Moreno-Pérez, and J.M. Moreno-Vega. The Parallel Variable Neighborhood Search for the p-Median Problem. Journal of Heuristics, 8(3):375-388, 2002.
38. F. García-López, M. García Torres, B. Melián-Batista, J.A. Moreno-Pérez, and J.M. Moreno-Vega. Solving the Feature Subset Selection Problem by a Parallel Scatter Search. European Journal of Operational Research, 2006. To appear.
39. F. Glover. Tabu Search, Part I. ORSA Journal on Computing, 1:190-206, 1989.
40. E.D. Goodman. An Introduction to GALOPPS v3.2. Technical Report 96-07-01, GARAGe, I.S. Lab., Dpt. of C.S. and C.C.C.A.E.M., Michigan State Univ., East Lansing, MI, 1996.
41. M. Gorges-Schleuter. ASPARAGOS: an asynchronous parallel genetic optimization strategy. In Proceedings of the Third International Conference on Genetic Algorithms, pages 422-427. Morgan Kaufmann, 1989.
42. M. Gorges-Schleuter. An Analysis of Local Selection in Evolution Strategies. In W. Banzhaf et al., editors, Proc. of the Genetic and Evolutionary Computation Conference, volume 1, pages 847-854, 1999.
43. A.M. Haldar, A. Nayak, A. Choudhary, and P. Banerjee. Parallel Algorithms for FPGA Placement. Proc. of the Great Lakes Symposium on VLSI (GVLSI 2000), Chicago, IL, 2000.
44. F. Herrera and M. Lozano. Gradual distributed real-coded genetic algorithms. IEEE Transactions on Evolutionary Computation, 4:43-63, 2000.
45. T. Hiroyasu, M. Miki, and M. Negami. Distributed Genetic Algorithms with Randomized Migration Rate. In Proc. of the IEEE Conf. on Systems, Man and Cybernetics, volume 1, pages 689-694. IEEE Press, 1999.
46. D. Janaki Ram, T.H. Sreenivas, and K.G. Subramaniam. Parallel Simulated Annealing Algorithms. Journal of Parallel and Distributed Computing, 37:207-212, 1996.
47. N. Jozefowiez, F. Semet, and E.-G. Talbi. Parallel and Hybrid Models for Multi-Objective Optimization: Application to the VRP. In Parallel Problem Solving from Nature VII, pages 271-280, 2002.
48. H. Juillé and J.B. Pollack. Massively parallel genetic programming. In P.J. Angeline and K.E. Kinnear, Jr., editors, Advances in Genetic Programming 2, pages 339-358. MIT Press, Cambridge, MA, USA, 1996.
49. S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by Simulated Annealing. Science, 220(4598):671-680, 1983.
50. S.A. Kravitz and R.A. Rutenbar. Placement by simulated annealing on a multiprocessor. IEEE Trans. on Computer-Aided Design, 6:534-549, 1987.
51. A. Le Bouthillier and T.G. Crainic. Co-Operative Parallel Method for Vehicle Routing Problems with Time Windows. Computers & Operations Research, 32(7):1685-1708, 2005.
52. S.Y. Lee and K.G. Lee. Synchronous and asynchronous parallel simulated annealing with multiple Markov chains. IEEE Transactions on Parallel and Distributed Systems, 7:993-1008, 1996.
53. Y. Li, P.M. Pardalos, and M.G.C. Resende. A greedy randomized adaptive search procedure for the QAP. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 16:237-261, 1994.
54. S.-L. Lin, W.F. Punch, and E.D. Goodman. Coarse-Grain Parallel Genetic Algorithms: Categorization and New Approach. In Sixth IEEE Symp. on Parallel and Distributed Processing, pages 28-37, 1994.
55. F.G. Lobo, C.F. Lima, and H. Mártires. An architecture for massive parallelization of the compact genetic algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2004), LNCS 3103, pages 412-413. Springer-Verlag, 2004.
56. J.A. Lozano, R. Sagarna, and P. Larrañaga. Parallel Estimation of Distribution Algorithms. In P. Larrañaga and J.A. Lozano, editors, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, Kluwer Academic Publishers, pages 129-145, 2001.
57. S.L. Martins, C.C. Ribeiro, and M.C. Souza. A parallel GRASP for the Steiner problem in graphs. LNCS 1457, pages 285-297, 1998.
58. M. Mejía-Olvera and E. Cantú-Paz. DGENESIS - software for the execution of distributed genetic algorithms. In Proceedings XX Conf. Latinoamericana de Informática, pages 935-946, 1994.
59. R. Mendes, J.R. Pereira, and J. Neves. A Parallel Architecture for Solving Constraint Satisfaction Problems. In Proceedings of the Metaheuristics Int. Conf. 2001, volume 2, pages 109-114, Porto, Portugal, 2001.
60. A. Mendiburu, J.A. Lozano, and J. Miguel-Alonso. Parallel estimation of distribution algorithms: New approaches. Technical Report EHU-KAT-IK-1-3, Department of Computer Architecture and Technology, The University of the Basque Country, 2003.
61. A. Mendiburu, J. Miguel-Alonso, and J.A. Lozano. Implementation and performance evaluation of a parallelization of estimation of Bayesian networks algorithms. Technical Report EHU-KAT-IK-XX-04, Computer Architecture and Technology, 2004. Submitted to Parallel Computing.
62. R. Michel and M. Middendorf. An Island Model Based Ant System with Lookahead for the Shortest Supersequence Problem. In A.E. Eiben et al., editors, Fifth Int. Conf. on Parallel Problem Solving from Nature, LNCS 1498, pages 692-701. Springer-Verlag, 1998.
63. M. Middendorf, F. Reischle, and H. Schmeck. Multi Colony Ant Algorithms. Journal of Heuristics, 8:305-320, 2002.
64. M. Miki, T. Hiroyasu, and M. Kasai. Application of the temperature parallel simulated annealing to continuous optimization problems. IPSJ Transactions, 41:1607-1616, 2000.
65. N. Mladenovic and P. Hansen. Variable neighborhood search. Computers and Operations Research, 24:1097-1100, 1997.
66. H. Mühlenbein, T. Mahnig, and A. Ochoa. Schemata, distributions and graphical models in evolutionary optimization. Journal of Heuristics, 5(2):215-247, 1999.
67. A.J. Nebro, E. Alba, and F. Luna. Multi-Objective Optimization Using Grid Computing. Soft Computing Journal, 2005. To appear.
68. J. Ocenasek and J. Schwarz. The parallel Bayesian optimization algorithm. In European Symp. on Comp. Intelligence, pages 61-67, 2000.
69. J. Ocenasek and J. Schwarz. The distributed Bayesian optimization algorithm for combinatorial optimization. In Evolutionary Methods for Design, Optimisation and Control, pages 115-120, 2001.
70. P. Pardalos, L. Pitsoulis, and M.G.C. Resende. A parallel GRASP implementation for the quadratic assignment problem. In Parallel Algorithms for Irregular Problems: State of the Art, A. Ferreira and J. Rolim, eds., Kluwer, pages 115-133, 1995.
71. K.E. Parsopoulos, D.K. Tasoulis, N.G. Pavlidis, V.P. Plagianakos, and M.N. Vrahatis. Vector Evaluated Differential Evolution for Multiobjective Optimization. In Proceedings of the IEEE Congress on Evolutionary Computation, pages 204-211, 2004.
72. S.C. Porto and C.C. Ribeiro. Parallel tabu search message-passing synchronous strategies for task scheduling under precedence constraints. Journal of Heuristics, 1:207-223, 1995.
73. W. Punch. How effective are multiple populations in genetic programming. In J.R. Koza et al., editors, Genetic Programming 1998: Proc. of the Third Annual Conference, pages 308-313. Morgan Kaufmann, 1998.
74. M. Rahoual, R. Hadji, and V. Bachelet. Parallel Ant System for the Set Covering Problem. In M. Dorigo et al., editors, 3rd Intl. Workshop on Ant Algorithms, LNCS 2463, pages 262-267. Springer-Verlag, 2002.
75. M. Randall and A. Lewis. A Parallel Implementation of Ant Colony Optimization. Journal of Parallel and Distributed Computing, 62(9):1421-1432, 2002.
76. G. Robbins. EnGENEer - The evolution of solutions. In Proc. of the Fifth Annual Seminar on Neural Networks and Genetic Algorithms, 1992.
77. P. Roussel-Ragot and G. Dreyfus. A problem-independent parallel implementation of simulated annealing: Models and experiments. IEEE Transactions on Computer-Aided Design, 9:827-835, 1990.
78. J. Rowe, K. Vinsen, and N. Marvin. Parallel GAs for Multiobjective Functions. In Proc. of the 2nd Nordic Workshop on Genetic Algorithms and Their Applications (2NWGA), pages 61-70, 1996.
79. G. Rudolph. Global optimization by means of distributed evolution strategies. In H.P. Schwefel and R. Männer, editors, Parallel Problem Solving from Nature, volume 496, pages 209-213, 1991.
80. V. Schnecke and O. Vornberger. Hybrid Genetic Algorithms for Constrained Placement Problems. IEEE Transactions on Evolutionary Computation, 1(4):266-277, 1997.
81. M. Schütz and J. Sprave. Application of Parallel Mixed-Integer Evolution Strategies with Mutation Rate Pooling. In L.J. Fogel, P.J. Angeline, and T. Bäck, editors, Proc. Fifth Annual Conf. on Evolutionary Programming (EP'96), pages 345-354. The MIT Press, 1996.
82. J. Stender, editor. Parallel Genetic Algorithms: Theory and Applications. IOS Press, Amsterdam, The Netherlands, 1993.
83. J. Sprave. Linear Neighborhood Evolution Strategies. In A.V. Sebald and L.J. Fogel, editors, Proc. Third Annual Conf. on Evolutionary Programming (EP'94), pages 42-51. World Scientific, Singapore, 1994.
84. N. Srinivas and K. Deb. Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms. Evolutionary Computation, 2(3):221-248, 1995.
85. T.J. Stanley and T. Mudge. A Parallel Genetic Algorithm for Multiobjective Microprocessor Design. In Proc. of the Sixth Int. Conf. on Genetic Algorithms, pages 597-604, 1995.
86. R. Szmit and A. Barak. Evolution strategies for a parallel multi-objective genetic algorithm. In D. Whitley et al., editors, GECCO'00, pages 227-234. Morgan Kaufmann, 2000.
87. T. Stützle. Parallelization Strategies for Ant Colony Optimization. In R. De Leone, A. Murli, P. Pardalos, and G. Toraldo, editors, High Performance Algorithms and Software in Nonlinear Optimization, volume 24 of Applied Optimization, pages 87-100. Kluwer, 1998.
88. E.-G. Talbi, O. Roux, C. Fonlupt, and D. Robillard. Parallel Ant Colonies for Combinatorial Optimization Problems. In Feitelson and Rudolph, editors, Job Scheduling Strategies for Parallel Processing: IPPS'95 Workshop, Springer LNCS 949, volume 11, 1999.
89. E.-G. Talbi, Z. Hafidi, and J.M. Geib. A parallel adaptive tabu search approach. Parallel Computing, 24:2003-2019, 1996.
90. R. Tanese. Distributed genetic algorithms. In J.D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 434-439. Morgan Kaufmann, 1989.
91. S. Tongchim and P. Chongstitvatana. Parallel Genetic Algorithm with Parameter Adaptation. Information Processing Letters, 82(1):47-54, 2002.
92. R. Venkateswaran, Z. Obradović, and C.S. Raghavendra. Cooperative Genetic Algorithm for Optimization Problems in Distributed Computer Systems. In Proc. of the Second Online Workshop on Evolutionary Computation, pages 49-52, 1996.
93. K. Weinert, J. Mehnen, and G. Rudolph. Dynamic neighborhood structures in parallel evolution strategies. Complex Systems, 13(3):227-244, 2002.
94. D. Whitley and T. Starkweather. GENITOR II: A distributed genetic algorithm. Journal of Experimental and Theoretical Artificial Intelligence, 2:189-214, 1990.
95. Y. Davidor. A naturally occurring niche and species phenomenon: The model and first results. In R.K. Belew and L.B. Booker, editors, Proc. of the Fourth Intl. Conf. on Genetic Algorithms, pages 257-263, 1991.
Part II Parallel Metaheuristic Models
5
Parallel Genetic Algorithms
GABRIEL LUQUE, ENRIQUE ALBA, BERNABÉ DORRONSORO
Universidad de Málaga, Spain
5.1 INTRODUCTION

Genetic Algorithms (GAs) [22,28] are a subfamily of Evolutionary Algorithms (EAs) [11], which are stochastic search methods designed for exploring complex problem spaces in order to find optimal solutions using minimal information on the problem to guide the search. Unlike other optimization techniques, GAs are characterized by using a population of multiple structures (individuals) to perform the search through many different areas of the problem space at the same time. The individuals encode tentative solutions, which are manipulated competitively by applying some stochastic operators to them in order to find a satisfactory, if not globally optimal, solution. An outline of a classical GA is described in Algorithm 1.

Algorithm 1. Pseudocode of a canonical GA
Generate(P(0));
Evaluate(P(0));
t := 0;
while not Termination-Criterion(P(t)) do
    P'(t) := Selection(P(t));
    P''(t) := Recombination(P'(t));
    P'''(t) := Mutation(P''(t));
    Evaluate(P'''(t));
    P(t+1) := Replace(P(t), P'''(t));
    t := t + 1;
endwhile
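As a concrete illustration of Algorithm 1, the following minimal Python sketch implements the canonical loop for a bit-string GA. The operator choices (tournament selection, one-point crossover, bit-flip mutation) and all parameter values are illustrative assumptions, not prescriptions from the text.

```python
# Minimal canonical GA sketch (illustrative assumptions: bit-string encoding,
# tournament selection, one-point crossover, bit-flip mutation).
import random

def canonical_ga(fitness, n_bits, pop_size=100, pc=0.7, pm=0.01, generations=200):
    # Generate(P(0)) and Evaluate(P(0))
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fit = [fitness(ind) for ind in pop]

    def tournament():                      # Selection
        a, b = random.randrange(pop_size), random.randrange(pop_size)
        return pop[a] if fit[a] >= fit[b] else pop[b]

    for _ in range(generations):           # Termination-Criterion
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            if random.random() < pc:       # Recombination (one-point)
                cut = random.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):         # Mutation (bit flip)
                offspring.append([b ^ 1 if random.random() < pm else b for b in child])
        pop = offspring[:pop_size]         # Replace (pure generational scheme)
        fit = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]

# Example usage: maximize the number of ones in a 50-bit string.
if __name__ == "__main__":
    sol, val = canonical_ga(sum, n_bits=50)
    print(val, sol)
```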
A GA proceeds in an iterative way by successively generating a new population P(t) of individuals from a population P(t-1) (t = 1, 2, 3, ...). The initial population P(0) is generated randomly. A fitness function associates a value to every
individual, which indicates its suitability for the problem at hand. The canonical algorithm applies stochastic operators such as selection, crossover, and mutation on a population in order to compute a whole generation of new individuals. In a general formulation, we apply variation operators to create a temporary population P'(t), whose individuals are evaluated; then, a new population P(t+1) is obtained by using P'(t) and, optionally, P(t). The stopping criterion is usually set as reaching a preprogrammed number of iterations of the algorithm and/or finding an individual with a given error if the optimum, or an approximation to it, is known beforehand.

For nontrivial problems, the execution of the reproductive cycle of a simple GA may require high computational resources (e.g., large memory and very long search times), and thus a variety of algorithmic issues have been studied to design efficient GAs. For this goal, numerous advances are continuously being achieved by designing new operators, hybrid algorithms, termination criteria, and so on [11]. In this chapter, we address one such improvement, consisting in adding parallelism to GAs.

Parallel GAs (PGAs) are becoming very popular [16], and there exists a large number of implementations and algorithms. The reasons for this success have to do, first, with the fact that GAs are naturally prone to parallelism, since most variation operators can easily be undertaken in parallel. However, the truly interesting observation is that the use of a structured population, that is, a spatial distribution of individuals, in the form of either a set of islands [46] or a diffusion grid [41], is responsible for such benefits. As a consequence, many authors do not use a parallel machine at all to run PGAs, and still get better results than with serial traditional GAs.

The main goal of this chapter is to provide a survey of the different models and implementations concerning PGAs. In addition, in order to illustrate the working principles of PGAs, we test the behavior of some of the most important proposed models on the same problem; we therefore intend to provide what we hope is a useful comparison among the main models for parallelizing GAs existing in the literature.

This chapter is organized as follows. First, a description of the standard model of GA, in which the whole population is considered as a single pool of individuals, is given. In the next section, we address the structured models, in which the population is somehow decentralized. Next, some different implementations of PGAs are presented, and a PGA classification is given. In Section 5.5, we test and compare the behavior of several parallel models when solving an instance of the well-known MAXSAT problem. Finally, we summarize the most important conclusions.
5.2 PANMICTIC GENETIC ALGORITHMS
In the field of GAs, it is customary to find algorithms in which the population structure is panmictic. Thus, selection takes place globally and any individual can potentially mate with any other one. The same holds for the replacement operator, where any individual can potentially be removed from the pool and replaced by a new one. In contrast, there exists a different (decentralized) selection model, in which individuals
are arranged spatially, therefore giving place to structured GAs (see Section 5.3). Most other operators, such as recombination or mutation, can be readily applied to these two models (i.e., to panmictic and structured populations).

There exist two popular classes of panmictic GAs, having different granularity at the reproductive step [43]. In the first one, called a "generational" model, a whole new population of λ individuals replaces the old one (right part of Fig. 5.1, where μ is the population size). The second type is called "steady state" since usually one (λ = 1) or two (λ = 2) new individuals are created at every step of the algorithm and are then inserted back into the population, consequently coexisting with their parents (left part of Fig. 5.1). We relate these two kinds of panmictic GAs in terms of the number of new individuals being inserted into the population of the next generation in Fig. 5.1. As can be seen in the figure, in the region in between (where 1 < λ < μ), there exists a plethora of selection models generically termed "generation gap" algorithms, in which a given number of individuals (the λ value) are replaced with the new ones. Clearly, generational and steady-state selection are two special subclasses of generation gap algorithms.
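The generation gap idea can be condensed into a few lines. The following sketch is an illustration under assumed operator names; λ = 1 gives a steady-state GA and λ equal to the population size a generational one.

```python
# Illustrative generation gap sketch: lambda_ new individuals are inserted per
# step (lambda_ = 1 -> steady state, lambda_ = len(pop) -> generational).
# make_offspring() is a hypothetical helper that breeds one child from pop.
def generation_gap_step(pop, fitness, lambda_, make_offspring):
    offspring = [make_offspring(pop) for _ in range(lambda_)]
    # Keep the best len(pop) - lambda_ current individuals and add the offspring.
    survivors = sorted(pop, key=fitness, reverse=True)[:len(pop) - lambda_]
    return survivors + offspring
```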
Fig. 5.1 Panmictic GAs, from steady-state (λ = 1) to generational (λ = μ) algorithms.
Centralized versions of selection are typically found in serial GAs, although some parallel implementations have also used them. For example, the global parallelism approach evaluates the individuals of the population in parallel while still using a centralized selection performed sequentially in the main processor guiding the base algorithm [29]. This algorithm is conceptually the same as the sequential one, although it is faster, especially for time-consuming objective functions. Usually, the other parts of the algorithm are not worth parallelizing, unless some population structuring principle is used (see Section 5.3).

Most PGAs found in the literature implement some kind of spatial disposition for the individuals and then parallelize the resulting chunks in a pool of processors. We must stress at this point in the discussion that parallelization is achieved by first structuring the panmictic algorithm and then parallelizing it. This is why we distinguish throughout this chapter between structuring populations and making parallel implementations, since the same structured GA can admit many different implementations. The next section is devoted to explaining different ways of structuring the populations. The resulting models can be executed in parallel or not, although some structured models suggest a straightforward parallel implementation.
5.3 STRUCTURED GENETIC ALGORITHMS

There exists a long tradition in using structured populations in GAs, especially associated with parallel implementations. Among the most widely known types of structured GAs, the distributed (dGA) and cellular (cGA) ones are very popular optimization procedures [6]. Decentralizing a single population can be achieved by partitioning it into several subpopulations (islands or demes), where island GAs are run performing sparse exchanges (migrations) of individuals (dGAs), or in the form of neighborhoods (cGAs). These two GA types, along with a panmictic GA, are schematically depicted in Figure 5.2.
Fig. 5.2 A panmictic GA (a) has all its individuals (black points) in the same population and, thus, each of them can potentially mate with any other. Structuring the population usually leads to a distinction between (b) dGAs and (c) cGAs.
In dGAs, additional parameters controlling when migration occurs and how migrants are selected/incorporated from/to the source/target islands are needed [13, 46]. In cGAs, the existence of overlapped small neighborhoods helps in exploring the search space [12]. These two kinds of GAs seem to provide a better sampling of the search space, and improve the numerical and run time behavior of the basic algorithm in many cases [9, 24].

The main difference of a cGA with respect to a panmictic GA is its decentralized selection and variation. In cGAs, the reproductive loop is performed inside every one of the numerous individual pools. In a cGA, one given individual has its own pool of potential mates, defined by its neighboring individuals; at the same time, one individual belongs to many pools. This structure with overlapped neighborhoods (usually a 1D or 2D lattice) is used to provide a smooth diffusion of good solutions across the grid. A cGA can be implemented in a distributed-memory MIMD computer [31], although its more direct implementation is on a SIMD computer.

A dGA is a multipopulation (island) model performing sparse exchanges of individuals among the elementary populations. This model can be readily implemented in distributed-memory MIMD computers, which provides one main reason for its popularity. A migration policy controls the kind of dGA being used. The migration policy must define the island topology, when migration occurs, which individuals are being exchanged, the synchronization among the subpopulations, and the kind
of integration of exchanged individuals within the target subpopulations. The main advantage of a distributed model (whether running on separate processors or not) is that it is usually faster than a panmictic GA, and not only from a run time point of view. The reason for this is that the run time and the number of evaluations are potentially reduced due to its separate search in several regions of the problem space. A high diversity and species formation are two of their well-reported features.
Fig. 5.3 Structured-population GA cube (axes: number of sub-algorithms, from few to many; their coupling; and sub-population size).
In Figure 5.3, we plot a three-dimensional (3D) representation of structured algorithms based on the number of subpopulations, the number of individuals in each one, and the degree of interaction among them. While a dGA has a large subpopulation, usually much larger than one individual, a cGA typically has one single individual in every subpopulation. In a dGA, the subpopulations are loosely coupled, while in a cGA they are tightly coupled. Additionally, in a dGA there exist only a few subpopulations, while in a cGA there is a large number of them. We use this cube to provide a generalized way of classifying structured GAs. However, the points in the cube indicating dGA and cGA are only "centroids"; this means that we could find or design an algorithm that can hardly be classified as belonging to one of these two classes of structured GAs, because the classification depends strongly on the values selected on each axis for the algorithm.

So far, we have made the implicit hypothesis that the genetic material, as well as the evolutionary conditions, such as selection and recombination methods, are the same for all the individuals and all the populations of a structured GA. Let us call these algorithm types uniform. If one gives up some of these constraints and allows different subpopulations to evolve with different parameters and/or with different individual representations for the same problem, then new distributed algorithms may arise. We will name these algorithms nonuniform PGAs. Tanese did some original work in this field and was the first to study the use of different mutation and crossover rates in different populations [45]. A more recent example of a nonuniform algorithm is the injection island GA (iiGA) of Lin et al. [30]. In an iiGA, there are multiple populations that encode the same problem using a different representation size and thus different resolutions in different islands. The migration rules are also special in the sense that migration is only one-way, going from a low- to a high-resolution node. According to Lin et al., such a hierarchy has a number
of advantages with respect to a standard island algorithm. A similar hierarchical topology approach has recently been used in [40], with some differences such as real-coded GAs and two-way migration. The purported advantages are: no need for representation conversion, better precision, and better exploration of the search space using a nonuniform mutation scheme.

A related proposal has been offered by Herrera et al. [27]. Their hybrid distributed real-coded GA involves a hierarchical structure in which a higher level nonuniform distributed GA joins a number of uniform distributed GAs that are connected among themselves. The uniform distributed GAs differ in their exploration and exploitation properties due to different crossover methods and selection pressures. The proposed topology is the cube-connected cycle, in which the lower level distributed GAs are rings at the corners of the cube, and the rings are connected at the higher level along the edges of the cube. There are two types of migration: local migration among subpopulations in the same lower level distributed GA, and global migration between subpopulations belonging to different lower level distributed GAs. According to [26], the proposed scheme outperforms other distributed GAs on the set of test functions that were used in the paper.
5.4 PARALLEL GENETIC ALGORITHMS

In this section, our goal is to present a structured vision of the parallel models and parallel implementations of GAs. Therefore, Section 5.4.1 is devoted to the parallel models used to parallelize a GA, Section 5.4.2 presents a classification of the parallel implementations, and Section 5.4.3 focuses on some of the most promising research lines in the field of PGAs.
5.4.1 Parallel Models

This section briefly describes the primary conceptual models of the major parallel GA paradigms found in the literature.

5.4.1.1 Independent Runs Model. This model merely consists in executing the same sequential algorithm several times in parallel, with no interaction among the independent runs. This extremely simple way of doing simultaneous work can be very useful. For example, it can be used to run several versions of the same problem with different initial conditions, thus allowing statistics on the problem to be gathered. Since GAs are stochastic in nature, the availability of this kind of statistics is very important. The independent runs model can also be considered as a special case of the distributed model (see Section 5.4.1.3), where there is no migration at all. In this case, the result of the distributed computation is the best solution obtained by any of the independent executions.
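A minimal sketch of this model follows. It reuses the hypothetical canonical_ga() helper defined in the earlier sketch; the process count and seeds are illustrative assumptions only.

```python
# Illustrative independent runs sketch: the same sequential GA is launched
# several times in parallel and the best result is kept. canonical_ga() is the
# hypothetical helper from the previous sketch.
from multiprocessing import Pool
import random

def one_run(seed):
    random.seed(seed)                       # different initial conditions per run
    return canonical_ga(sum, n_bits=50)     # returns (best_individual, best_fitness)

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(one_run, range(4))   # 4 fully independent executions
    best_ind, best_fit = max(results, key=lambda r: r[1])
    print("best fitness over all runs:", best_fit)
```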
5.4.1.2 Master-Slave Model. The master-slave model is easy to visualize. It consists of distributing the objective function evaluations among several slave processors while the main loop of the GA is executed in a master processor. This parallel paradigm is quite simple to implement and its search space exploration is conceptually identical to that of a GA executing on a serial processor. In other words, the number of processors being used does not affect which solutions are evaluated, only the time needed to evaluate them. This paradigm is illustrated in Figure 5.4, where the master processor sends parameters (those necessary for the objective function evaluations) to the slaves; objective function values are then returned when computed. A minimal sketch of this scheme is given below.
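The following sketch is only an illustration of the idea: the master keeps the GA loop and farms out the objective function evaluations to a pool of worker processes. The expensive_fitness() function, the toy replacement rule, and all parameters are hypothetical placeholders, not part of the chapter's experiments.

```python
# Illustrative master-slave sketch: only fitness evaluation is parallelized.
from multiprocessing import Pool
import random

def expensive_fitness(individual):
    # Stand-in for a costly objective function evaluation.
    return sum(individual)

def evaluate_population(pool, population):
    # The only parallel step: slaves compute fitness values, the master collects them.
    return pool.map(expensive_fitness, population)

if __name__ == "__main__":
    pop = [[random.randint(0, 1) for _ in range(100)] for _ in range(200)]
    with Pool(processes=8) as pool:
        for generation in range(50):
            fitness = evaluate_population(pool, pop)   # distributed evaluation
            # Selection, recombination, mutation, and replacement stay sequential
            # on the master; here we simply keep the better half and clone it (toy rule).
            ranked = [ind for _, ind in
                      sorted(zip(fitness, pop), key=lambda t: t[0], reverse=True)]
            pop = ranked[:100] + [row[:] for row in ranked[:100]]
```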
Fig. 5.4 Master-slave model.
The master processor controls the parallelization of the objective function evaluation tasks (and possibly the fitness assignment and/or transformation) performed by the slaves. This model becomes more efficient as the objective function becomes more expensive to compute, since the communication overhead then becomes negligible with respect to the fitness evaluation time.

5.4.1.3 Distributed Model. In this model, the population is structured into smaller subpopulations that are relatively isolated from one another, so it is well suited for implementing dGAs. Parallel GAs based on this paradigm are sometimes called multipopulation or multi-deme GAs. Whatever the name, the key characteristic of this kind of algorithm is that individuals within a particular subpopulation (or island) can occasionally migrate to another one. This paradigm is illustrated in Figure 5.5. Note that the communication channels shown are notional; specific assignments are made as part of the GA's migration strategy and are mapped onto some physical network.
Fig. 5.5 Distributed model.
Conceptually, the overall GA population is partitioned into a number of independent, separate subpopulations (or demes). An alternative view is that of several small, separate GAs executing simultaneously. Regardless, individuals occasionally migrate between one particular island and its neighbors, although these islands usually evolve in isolation for the majority of the GA run. Here, genetic operators (selection, mutation, and recombination) take place within each island, which means that each island can search in very different regions of the whole search space with respect to the others. Each island could also have different parameter values (heterogeneous GAs [2]).

The distributed model requires the identification of a suitable migration policy. Its main parameters, illustrated in the sketch after this list, include the following ones:

- Migration Gap. Since a dGA usually makes sparse exchanges of individuals among the subpopulations, we must define the migration gap, which is the number of steps in every subpopulation between two successive exchanges (steps of isolated evolution). Migration can be triggered in every subpopulation either periodically or by using a given probability to decide in every step whether migration will take place or not.

- Migration Rate. This parameter determines the number of individuals that undergo migration in every exchange. Its value can be given as a percentage of the population size or else as an absolute value.

- Selection/Replacement of Migrants. This parameter decides how to select emigrant solutions and which solutions have to be replaced by the immigrants. It is very common in parallel distributed GAs to use the same selection/replacement operators for dealing with migrants.

- Topology. This parameter defines the neighborhood of each island, i.e., the islands that a concrete subpopulation can send individuals to (or receive individuals from). The traditional nomenclature divides parallel GAs into island and stepping-stone models, depending on whether individuals can freely migrate to any subpopulation or are restricted to migrate to geographically nearby islands, respectively.
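The sketch below puts these parameters together for a unidirectional ring of islands simulated in a single process. The GA step itself is abstracted away; operator choices and parameter values are illustrative assumptions only.

```python
# Illustrative migration-policy sketch on a unidirectional ring of islands.
import random

def migrate(islands, fitness, migration_rate=1):
    """Best individuals of each island travel to the next island in the ring;
    immigrants replace the worst individuals of the target island."""
    n = len(islands)
    emigrants = []
    for isl in islands:
        best = sorted(isl, key=fitness, reverse=True)[:migration_rate]
        emigrants.append([ind[:] for ind in best])       # copies travel, originals stay
    for i, migrants in enumerate(emigrants):
        target = islands[(i + 1) % n]                     # ring topology
        target.sort(key=fitness)                          # worst first
        target[:migration_rate] = migrants                # replacement of migrants

def run_dga(num_islands=4, island_size=50, n_bits=30, steps=100, migration_gap=20):
    islands = [[[random.randint(0, 1) for _ in range(n_bits)] for _ in range(island_size)]
               for _ in range(num_islands)]
    for step in range(steps):
        for isl in islands:
            pass  # one (or a few) generations of an ordinary GA on each island go here
        if step % migration_gap == 0:                     # migration gap
            migrate(islands, fitness=sum, migration_rate=1)
    return islands
```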
5.4.1.4 Cellular Model. The parallel cellular (or diffusion) GA paradigm normally deals with a single conceptual population, where each processor holds just a few individuals (usually one or two). This is why this model is sometimes called fine-grained parallelism. The main characteristic of this model is the structuring of the population into neighborhoods, so that individuals may only interact with their neighbors. Thus, since good solutions (possibly) arise in different areas of the overall topology, they are slowly spread (or diffused) throughout the whole structure (population). This model is illustrated in Figure 5.6, and cGA implementations fit naturally into it.

The cGAs were initially designed for working on massively parallel machines, although the model itself has also been adopted for distributed systems [31] and
Fig. 5.6 Cellular model.
monoprocessor machines [7, 24]. This issue must be clearly stated, since many researchers still hold in their minds a link between massively parallel GAs and cellular GAs, which nowadays is an incorrect association. A minimal sketch of the cellular reproductive loop is given below.
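The following sketch illustrates one synchronous cGA step on a toroidal 2D grid with a von Neumann (North/South/East/West) neighborhood. The encoding, operators, replacement rule, and parameter values are assumptions for illustration only.

```python
# Illustrative cellular GA step on a toroidal 2D grid, von Neumann neighborhood.
import random

def cga_step(grid, fitness, pm=0.02):
    """One synchronous cGA generation: every cell mates with its best neighbor."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            neighbors = [grid[(i - 1) % rows][j], grid[(i + 1) % rows][j],
                         grid[i][(j - 1) % cols], grid[i][(j + 1) % cols]]
            mate = max(neighbors, key=fitness)            # local selection
            cut = random.randrange(1, len(grid[i][j]))    # one-point crossover
            child = grid[i][j][:cut] + mate[cut:]
            child = [b ^ 1 if random.random() < pm else b for b in child]   # mutation
            # Greedy replacement: the child enters only if it is not worse.
            new_grid[i][j] = child if fitness(child) >= fitness(grid[i][j]) else grid[i][j]
    return new_grid

# Example usage: 10x10 grid of 20-bit individuals, maximizing the number of ones.
grid = [[[random.randint(0, 1) for _ in range(20)] for _ in range(10)] for _ in range(10)]
for _ in range(50):
    grid = cga_step(grid, fitness=sum)
```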
5.4.1.5 Other Models. It is possible to find in the literature many implementations that are difficult to classify. In general, they are called hybrid algorithms, since they combine characteristics of different models.
Fig. 5.7 Hybrid models.
For example, Figure 5.7 shows three hybrid architectures in which a two-level approach to parallelization is undertaken. In the three cases the highest level of parallelization is a dGA. In Figure 5.7a, the basic islands perform a cGA, thus trying to get the combined advantages of the two models. In Figure 5.7b, we have many global parallelization farms connected in a distributed fashion, thus exploiting parallelism for making fast evolutions and for obtaining separate population evolutions at the same time. Finally, Figure 5.7c presents several farms of distributed algorithms with a still higher level of distribution, allowing migration among connected farms. Although these combinations may give rise to interesting and efficient new algorithms, they have the drawback of needing some additional new parameters to account for a more complex topology structure.
5.4.2 A Brief Survey of Parallel GAs

In this section we briefly discuss the main features of some of the most important PGAs by presenting a structured classification, organized by the model of parallelization (other classifications can be found in [5]). In Table 5.1, we provide a quick overview of different PGAs to point out important milestones in parallel computing with GAs. These "implementations" have rarely been studied as "parallel models"; instead, usually only the implementation itself is evaluated.

Table 5.1 A quick survey of several parallel GAs

Algorithm     Article   Parallel Model
ASPARAGOS     (1989)    Fine grain. Applies hill-climbing if no improvement
dGA           (1989)    Distributed populations
GENITOR II    (1990)    Coarse grain
ECO-GA        (1991)    Fine grain
PGA           (1991)    Subpopulations, migrate the best, local hill-climbing
SGA-Cube      (1991)    Coarse grain. Implemented on the nCUBE 2
EnGENEer      (1992)    Global parallelization
PARAGENESIS   (1993)    Coarse grain. Made for the CM-200
GAME          (1993)    Object-oriented set of general programming tools
PEGAsuS       (1993)    Coarse or fine grain. High-level programming on MIMD
DGENESIS      (1994)    Coarse grain with migration among sub-populations
GAMAS         (1994)    Coarse grain. Uses 4 species of strings (nodes)
iiGA          (1994)    Injection island GA, heterogeneous and asynchronous
PGAPack       (1995)    Global parallelization (parallel evaluations)
CoPDEB        (1996)    Coarse grain. Every subpop. applies different operators
GALOPPS       (1996)    Coarse grain
MARS          (1999)    Parallel environment with fault tolerance
RPL2          (1999)    Coarse grain. Very flexible to define new GA models
GDGA          (2000)    Coarse grain. Hypercube topology
DREAM         (2002)    Framework for distributed EAs
MALLBA        (2002)    A general framework for parallel algorithms
Hy4           (2004)    Coarse grain. Heterogeneous and hypercube topology
ParadisEO     (2004)    A general framework for parallel algorithms
Some coarse-grain algorithms like dGA [46], DGENESIS [32], GALOPPS [23], PARAGENESIS [42], and PGA [33] are relatively close to the general model of migration islands. They often include many features to improve efficiency. Some other coarse-grain models like CoPDEB [1], GDGA [26], and Hy4 [3] have been designed for specific goals, such as providing an explicit exploration/exploitation tradeoff by applying different operators on each island. Another example of this class is the iiGA [30], which promotes coding and operator heterogeneity (see Section 5.3). A further parallel environment providing adaptation with respect to the dynamic behavior of the computer pool and fault tolerance is MARS, described by Talbi et al. in [44]. Some other PGAs execute nonorthodox models of coarse-grain evolution. This is the case of GAMAS [36], based on using different alphabets in every island, and GENITOR II [47], based on steady-state reproduction. In contrast, massive PGAs have been strongly associated with the machines on which they run, such as ASPARAGOS [25] and ECO-GA [17]. This is also the case for models of difficult
classification like PEGAsuS [38] or SGA-Cube [19]. Concerning the global parallelization model, we find implementations such as EnGENEer [39] or PGAPack [29]. Finally, it is worth emphasizing some efforts to construct general frameworks for PEAs, like GAME [42], PEGAsuS, RPL2 [37], DREAM [10], MALLBA [4], and ParadisEO [14]. These systems are endowed with "general" programming structures intended to ease the implementation of any model of PEA for the user, who must particularize these general structures to define his/her own algorithm.

Nowadays, many researchers are using object-oriented programming (OOP) to create higher quality software for PGAs, but unfortunately some of the most important issues typical of OOP are continuously being ignored in the resulting implementations. The reader can find some general guidelines for designing object-oriented PEAs in [8].

All these models and implementations offer different levels of flexibility, ranging from a single PGA to the specification of general PGA models. This list is not complete, of course, but it helps in understanding the current "state of the art".

5.4.3 New Trends in PGAs
In this section, we focus on some of the most promising research lines in the field of PGAs. Future achievements should take note of these issues.

Tackling Dynamic Function Optimization Problems (DOP). PGAs will have an important role in optimizing complex functions whose optima vary in time (a learning-like process). Such problems consist in optimizing a successive set of fitness functions, each one usually being a (high/small) perturbation of the preceding one. Industrial processes, like real task scheduling, and daily life tasks such as controlling an elevator or a traffic light system, can be dealt with using dynamic models. Some PGAs, like cGAs and dGAs, can successfully deal with such DOP environments due to their natural diversity enhancements and speciation-like features.

Developing Theoretical Issues. Improving the formal explanations of the influence of parameters on the convergence and search of PGAs will endow the research community with tools for analyzing, understanding, and customizing a GA family for a given problem.

Running PGAs on Geographically Separated Clusters. This will allow the user to utilize sparsely located computational resources in a metacomputing fashion in order to solve his/her optimization problem. A distinguished example of such a system is to use the Web as a pool of processors to run PGAs for solving the same problem. In particular, grid computing [20] and Peer-to-Peer (P2P) computing [35] have become real alternatives to traditional supercomputing for the development of parallel applications that harness massive computational resources. This is a great challenge, since nowadays grid- and P2P-enabled frameworks for metaheuristics are just emerging [10, 15, 34].
Benchmarking Soft Computing Techniques. At present, it is clear that a widely available and large set of problems is needed to assess the quality of existing and new GAs. Problem instances of different difficulty, specially targeted at testing the behavior of GAs and related techniques, can greatly help practitioners in choosing the most suitable GA or hybrid algorithm for the task at hand.

5.5 EXPERIMENTAL RESULTS
In this section, we perform several experimental tests to study the behavior of the different parallel models described in the previous sections. Concretely, we use a distributed GA (dGA), a cellular GA (cGA), a master-slave GA (MS), and a distributed GA where there is no interaction among the islands, i.e., the islands are completely independent (idGA, isolated dGA). All these models are implemented on a distributed-memory system.

Let us give some details about the implementation of the parallel cGA on distributed systems (the implementation of the other three algorithms is simple and poses no special complications in such systems). In this algorithm the whole population is divided among the processors, but the global behavior of this parallel cGA is the same as that of a sequential (cellular) one. At the beginning of each iteration, all the processors send the individuals of their first/last column/row to their neighbor islands (see Figure 5.8; a minimal sketch of this border exchange is given after the figure). After receiving the individuals from the neighbors, a sequential cGA is executed on each subpopulation. The remaining algorithms have a canonical implementation, and no special issues arise.
Fig. 5.8 Migration scheme of the distributed cGA: island (i, j) sends border individuals to islands (i±1 mod M, j) and (i, j±1 mod N), where M is the number of columns and N is the number of rows.
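The following sketch illustrates the border exchange just described, assuming mpi4py and a simplified 1D decomposition (each process keeps a band of rows of the toroidal grid and exchanges its first and last rows with its two neighbor processes). All names and the data layout are illustrative assumptions, not the chapter's actual implementation.

```python
# Minimal border-exchange sketch for a distributed cGA (assumes mpi4py).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
up, down = (rank - 1) % size, (rank + 1) % size

def exchange_borders(band):
    """band is the list of grid rows (each row a list of individuals) owned here."""
    # Send my first row up and receive the ghost row coming from below, and vice versa.
    ghost_from_down = comm.sendrecv(band[0], dest=up, source=down)
    ghost_from_up = comm.sendrecv(band[-1], dest=down, source=up)
    # The local sequential cGA step can now use ghost_from_up / ghost_from_down
    # as the missing neighbors of its border rows.
    return ghost_from_up, ghost_from_down
```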
For testing the parallel algorithms we have used the well-known MAXSAT problem. The next section briefly describes the MAXSAT problem and then we discuss the results.
5.5.1 MAXSAT Problem
The satisfiability (SAT) problem is commonly recognized as a fundamental problem in artificial intelligence applications, automated reasoning, mathematical logic, and related fields. MAXSAT is a variant of this general problem. Formally, the SAT problem can be formulated as follows. Let U = {u1, ..., un} be a set of n Boolean variables. A truth assignment for U is a function t : U → {true, false}. Two literals, u and ¬u, correspond to each variable. A literal u (respectively ¬u) is true under t if and only if t(u) = true (respectively t(u) = false). A set C of literals is called a clause and represents their disjunction (the logical or connective). A set of clauses is called a formula. A formula f is interpreted as a formula of the propositional calculus in conjunctive normal form (CNF), so that a truth assignment t satisfies a clause C iff at least one literal u ∈ C is true under t. Finally, t satisfies f iff it satisfies every clause in f.

The SAT problem consists of a set of n variables {u1, ..., un} and a set of m clauses C1, ..., Cm. The goal is to determine whether or not there exists an assignment of truth values to the variables that makes the CNF formula f = C1 ∧ ... ∧ Cm satisfiable.

Among the extensions of SAT, MAXSAT [21] is the most widely known one. In this case, a parameter K is given and the problem is to determine whether there exists an assignment t of truth values to the variables such that at least K clauses are satisfied. SAT can be considered as a special case of MAXSAT when K equals the number m of clauses. In the experiments we use the first instance of De Jong et al. [18]. This instance is composed of 100 variables and 430 clauses (f* (optimum) = 430). A minimal fitness-evaluation sketch for this problem is shown below.
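The sketch below counts satisfied clauses for a bit-string individual, which is the natural GA fitness for MAXSAT (430 at the optimum of the instance used in the experiments). The tiny example formula is an assumption for illustration only, not the De Jong instance.

```python
# Illustrative MAXSAT fitness sketch.
def maxsat_fitness(assignment, clauses):
    """assignment: list of 0/1 values, one per variable.
    clauses: list of clauses; each clause is a list of signed 1-based literals,
    e.g. -3 means "variable u3 negated"."""
    satisfied = 0
    for clause in clauses:
        for lit in clause:
            value = assignment[abs(lit) - 1]
            if (lit > 0 and value == 1) or (lit < 0 and value == 0):
                satisfied += 1
                break                      # one true literal satisfies the clause
    return satisfied

# Example usage with a toy 3-variable formula: (u1 or not u2) and (u2 or u3).
clauses = [[1, -2], [2, 3]]
print(maxsat_fitness([1, 1, 0], clauses))   # -> 2 (both clauses satisfied)
```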
5.5.2 Analysis of Results

In this section, we study the behavior of different parallel implementations of a GA when solving the MAXSAT problem. We begin with a description of the parameters of each algorithm. No special configuration analysis has been made for determining the optimum parameter values for each algorithm. The whole population is composed of 800 individuals. In parallel implementations, each processor has a population of 800/n individuals, where n is the number of processors. All the algorithms use the one-point crossover operator (with probability 0.7) and the bit-flip mutation operator (with probability 0.2). In distributed GAs, migration occurs in a unidirectional ring manner, sending one single randomly chosen individual to the neighbor subpopulation. The target population incorporates this individual only if it is better than its current worst solution. The migration step is performed every 20 iterations in every island in an asynchronous way. All the experiments are performed on 16 Pentium 4 PCs at 2.8 GHz linked by a Fast Ethernet communication network. Because of the stochastic nature of GAs, we perform 100 independent runs of each test to gain sufficient experimental data.

Now, let us begin the analysis by presenting in Table 5.2 the number of executions that found the optimal value (% hit column), the number of evaluations (# evals column), and the running time in seconds (time column) for all the algorithms: the sequential GA, the master-slave GA, the isolated distributed GA, the (cooperative)
distributed GA, and the cellular GA (the parallel algorithms running on 2, 4, 8, and 16 processors). We also show in Table 5.3 the speedup of these algorithms. We use the weak definition of speedup [5], i.e., we compare the parallel implementation run time with respect to the serial one. Each algorithm is named by appending the number of processors to its name (e.g., dGA4 means the distributed GA executed on four processors).

Table 5.2 Average results for all the parallel GAs

Algorithm   % hit   # evals   time
Seq.        60%      97671    19.12
MS2         61%      95832    16.63
MS4         60%     101821    14.17
MS8         58%      99124    13.64
MS16        62%      96875    12.15
dGA2        72%      86133     9.87
dGA4        73%      88200     5.22
dGA8        72%      85993     2.58
dGA16       68%      93180     1.30
idGA2       41%      92133     9.46
idGA4       20%      89730     5.17
idGA8        7%      91264     2.49
idGA16       0%      -         -
cGA2        85%      92286    10.40
cGA4        83%      94187     5.79
cGA8        83%      92488     2.94
cGA16       84%      91280     1.64
If we interpret the results in Table 5.2, we can notice several facts; we analyze each parallel model in turn. As expected, the behavior of the master-slave algorithms is similar to that of the sequential version, since they obtain the same number of hits and sample a similar number of points of the search space. The master-slave methods also spend less time to find the optimum, but the profit is very low (see Table 5.3) for any number of processors. This is because the execution time of the fitness function does not compensate for the overhead of the communications.
Table 5.3 Weak speedup

Algorithm   n = 2   n = 4   n = 8   n = 16
MSn          1.14    1.34    1.40    1.57
idGAn        2.02    3.69    7.67    -
dGAn         1.93    3.66    7.41   14.7
cGAn         1.83    3.30    6.50   11.65
The idGA model allows a reduction in the search time and obtains a quasi-linear speedup (see Table 5.3), but the results are worse than those of the serial algorithm (lower number of hits), and even with 16 processors it cannot find the optimal solution in any execution. This is not surprising, since as we increase the number of processors the resulting subpopulation size decreases, and the algorithm is not able to maintain a high enough diversity to find the global solution (local optimum stagnation).

The distributed GAs perform better than the sequential algorithm both numerically and in terms of search time, since they obtain a higher number of hits with a lower number of evaluations, and they also reduce the search time. This is a widely
supported result in most of the applications of parallel GAs. The speedup is quite good, but it is always sublinear and it moves slightly away from the linear speedup when the number of CPUs increases. That is, when incrementing the number of CPUs, a small loss of efficiency is observed.

Numerically speaking, the cellular GAs are the best ones, since they obtain the highest number of hits. Surprisingly, they also obtain very low execution times, which are only slightly worse than those of the dGAs, which perform a meaningfully lower number of exchanges. Besides, the speedup of our distributed implementation of a cGA is not as low as one could suspect a priori for an algorithm with such large communication needs. In general, these results all point to a more in-depth future research on such implementations of cGAs.
5.6 SUMMARY
This chapter contains a modern survey of parallel models and implementations of GAs. By summarizing the parallel algorithms, their applications, classes, and theoretical foundations, we intend to offer valuable information not only for beginners, but also for researchers working with GAs or heuristics in general. As we have seen throughout this chapter, the robustness and advantages of sequential GAs are enhanced when a PGA is used. The drawbacks are the more complex analysis and design, and also the need for some kind of parallel machine to run it. A classified overview of the most important up-to-date PGA systems has been discussed. In this chapter, not only a survey of existing models is outlined, but also possible variants apart from the basic operations and future trends are considered, yielding what we hope is a unified overview and a useful text. The reference list has been elaborated to serve as a directory for granting the reader access to the valuable results that parallel GAs are offering to the research community.

Finally, we have performed an experimental test with the most common parallel models used in the literature. We used distributed, cellular, master-slave, and independent runs models to solve the well-known MAXSAT problem. For this problem, we notice that the master-slave model is not suitable, since the overhead provoked by the communications is not compensated by the execution time of the objective function. The isolated dGA (or independent runs) model obtains very low execution times, but the solution quality gets worse. The use of distributed and cellular models managed to improve both the number of hits and the search time. The distributed version spends a shorter time than the cellular one, since it performs a low number of exchanges, but the cellular GA obtains the best number of hits of all the studied algorithms.

Acknowledgments

The authors acknowledge partial funding by the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project).
REFERENCES

1. P. Adamidis and V. Petridis. Co-operating populations with different evolution behavior. In Proceedings of the Second IEEE Conference on Evolutionary Computation, pages 188-191. IEEE Press, 1996.
2. E. Alba, F. Luna, and A.J. Nebro. Advances in Parallel Heterogeneous Genetic Algorithms for Continuous Optimization. International Journal of Applied Mathematics and Computer Science, 14(3):317-333, 2004.
3. E. Alba, F. Luna, A.J. Nebro, and J.M. Troya. Parallel heterogeneous GAs for continuous optimization. Parallel Computing, 30:699-719, 2004.
4. E. Alba and the MALLBA Group. MALLBA: A library of skeletons for combinatorial optimization. In Proceedings of the Euro-Par, volume 2400 of Lecture Notes in Computer Science, pages 927-932. Springer-Verlag, Heidelberg, 2002.
5. E. Alba and M. Tomassini. Parallelism and Evolutionary Algorithms. IEEE Transactions on Evolutionary Computation, 6(5):443-462, October 2002.
6. E. Alba and J.M. Troya. A survey of parallel distributed genetic algorithms. Complexity, 4(4):31-52, 1999.
7. E. Alba and J.M. Troya. Cellular evolutionary algorithms: Evaluating the influence of ratio. In M. Schoenauer et al., editors, Parallel Problem Solving from Nature (PPSN VI), volume 1917 of Lecture Notes in Computer Science, pages 29-38. Springer-Verlag, Heidelberg, 2000.
8. E. Alba and J.M. Troya. Gaining new fields of application for OOP: the parallel evolutionary algorithm case. Journal of Object Oriented Programming, December (web version only) 2001.
9. E. Alba and J.M. Troya. Improving flexibility and efficiency by adding parallelism to genetic algorithms. Statistics and Computing, 12(2):91-114, 2002.
10. M.G. Arenas, P. Collet, A.E. Eiben, M. Jelasity, J.J. Merelo, B. Paechter, M. Preuss, and M. Schoenauer. Framework for Distributed Evolutionary Algorithms. In J.J. Merelo, P. Adamidis, H.G. Beyer, J.L. Fernandez-Villacanas, and H.-P. Schwefel, editors, Seventh International Conference on Parallel Problem Solving from Nature (PPSN), pages 665-675. Springer-Verlag, 2002.
11. T. Back, D.B. Fogel, and Z. Michalewicz, editors. Handbook of Evolutionary Computation. Oxford University Press, 1997.
12. S. Baluja. Structure and performance of fine-grain parallelism in genetic search. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms (ICGA), pages 155-162. Morgan Kaufmann, 1993.
13. T.C. Belding. The distributed genetic algorithm revisited. In L.J. Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms (ICGA), pages 114-121. Morgan Kaufmann, 1995.
14. S. Cahon, N. Melab, and E-G. Talbi. ParadisEO: A Framework for the Reusable Design of Parallel and Distributed Metaheuristics. Journal of Heuristics, 10(3):357-380, May 2004.
15. S. Cahon, N. Melab, and E-G. Talbi. ParadisEO on Condor-MW for optimization on computational grids. http://www.lifl.fr/~cahon/cmw/index.html, 2004.
16. E. Cantú-Paz. Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic Press, 2000.
17. Y. Davidor. A naturally occurring niche and species phenomenon: The model and first results. In R.K. Belew and L.B. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms (ICGA), pages 257-263, 1991.
18. K.A. De Jong, M.A. Potter, and W.M. Spears. Using Problem Generators to Explore the Effects of Epistasis. In T. Back, editor, Proceedings of the Seventh International Conference on Genetic Algorithms (ICGA), pages 338-345. Morgan Kaufmann, 1997.
19. J.A. Erickson, R.E. Smith, and D.E. Goldberg. SGA-Cube, a simple genetic algorithm for nCUBE 2 hypercube parallel computers. Technical Report 91005, The University of Alabama, 1991.
20. I. Foster and C. Kesselman (eds.). The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Francisco, 1999.
21. M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco, CA, 1979.
22. D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
23. E.D. Goodman. An Introduction to GALOPPS v3.2. Technical Report 96-07-01, GARAGE, I.S. Lab., Dpt. of C.S. and C.C.C.A.E.M., Michigan State Univ., East Lansing, MI, 1996.
24. V.S. Gordon and D. Whitley. Serial and parallel genetic algorithms as function optimizers. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms (ICGA), pages 177-183. Morgan Kaufmann, 1993.
25. M. Gorges-Schleuter. ASPARAGOS: an asynchronous parallel genetic optimization strategy. In Proceedings of the Third International Conference on Genetic Algorithms (ICGA), pages 422-427. Morgan Kaufmann Publishers Inc., 1989.
26. F. Herrera and M. Lozano. Gradual distributed real-coded genetic algorithms. IEEE Transactions on Evolutionary Computation, 4(1):43-63, 2000.
27. F. Herrera, M. Lozano, and C. Moraga. Hybrid distributed real-coded genetic algorithms. In A. Eiben, T. Back, M. Schoenauer, and H.-P. Schwefel, editors,
Parallel Problem Solving from Nature, PPSN IV, volume 1498 of Lecture Notes in Computer Science, pages 603-612. Springer-Verlag, Heidelberg, 1998.
28. J.H. Holland. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, Michigan, 1975.
29. D. Levine. Users guide to the PGAPack parallel genetic algorithm library. Technical Report ANL-95/18, Argonne National Laboratory, Mathematics and Computer Science Division, January 31, 1995.
30. S.C. Lin, W.F. Punch, and E.D. Goodman. Coarse-grain parallel genetic algorithms: Categorization and a new approach. In Sixth IEEE Symposium on Parallel and Distributed Processing (SPDP), pages 28-37, 1994.
31. T. Maruyama, T. Hirose, and A. Konagaya. A fine-grained parallel genetic algorithm for distributed parallel systems. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms (ICGA), pages 184-190. Morgan Kaufmann, 1993.
32. M. Mejia-Olvera and E. Cantú-Paz. DGENESIS: software for the execution of distributed genetic algorithms. In Proceedings of the XX Conf. Latinoamericana de Informatica, pages 935-946, 1994.
33. H. Mühlenbein, M. Schomisch, and J. Born. The parallel genetic algorithm as a function optimizer. Parallel Computing, 17:619-632, 1991.
34. A.J. Nebro, E. Alba, and F. Luna. Multi-Objective Optimization Using Grid Computing. Soft Computing Journal, 2005. To appear.
35. A. Oram (Ed.). Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O'Reilly & Associates, 2001.
36. J.C. Potts, T.D. Giddens, and S.B. Yadav. The development and evaluation of an improved genetic algorithm based on migration and artificial selection. IEEE Transactions on Systems, Man, and Cybernetics, 24(1):73-86, 1994.
37. N.J. Radcliffe and P.D. Surry. The reproductive plan language RPL2: Motivation, architecture and applications. In J. Stender, E. Hillebrand, and J. Kingdon, editors, Genetic Algorithms in Optimisation, Simulation and Modelling. IOS Press, 1999.
38. J.L. Ribeiro-Filho, C. Alippi, and P. Treleaven. Genetic algorithm programming environments. In J. Stender, editor, Parallel Genetic Algorithms: Theory and Applications, pages 65-83. IOS Press, 1993.
39. G. Robbins. EnGENEer - The evolution of solutions. In Proceedings of the Fifth Annual Seminar "Neural Networks and Genetic Algorithms", London, UK, 1992.
40. M. Sefrioui and J. Periaux. A hierarchical genetic algorithm using multiple models for optimization. In M. Schoenauer, K. Deb, G. Rudolph, X. Yao,
E. Lutton, J.J. Merelo, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature (PPSN VI), volume 1917 of Lecture Notes in Computer Science, pages 879-888. Springer-Verlag, Heidelberg, 2000.
41. P. Spiessens and B. Manderick. A massively parallel genetic algorithm. In R.K. Belew and L.B. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms (ICGA), pages 279-286. Morgan Kaufmann, 1991.
42. J. Stender, editor. Parallel Genetic Algorithms: Theory and Applications. IOS Press, Amsterdam, The Netherlands, 1993.
43. G. Syswerda. A study of reproduction in generational and steady-state genetic algorithms. In G.J.E. Rawlins, editor, Foundations of Genetic Algorithms, pages 94-101. Morgan Kaufmann, 1991.
44. E.-G. Talbi, Z. Hafidi, D. Kebbal, and J-M. Geib. MARS: An adaptive parallel programming environment. In B. Rajkumar, editor, High Performance Cluster Computing, Vol. 1, pages 722-739. Prentice-Hall, 1999.
45. R. Tanese. Parallel genetic algorithms for a hypercube. In J.J. Grefenstette, editor, Proceedings of the Second International Conference on Genetic Algorithms (ICGA), page 177. Lawrence Erlbaum Associates, 1987.
46. R. Tanese. Distributed genetic algorithms. In J.D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms (ICGA), pages 434-439. Morgan Kaufmann, 1989.
47. D. Whitley and T. Starkweather. GENITOR II: A distributed genetic algorithm. Journal of Experimental and Theoretical Artificial Intelligence, 2:189-214, 1990.
6
Parallel Genetic Programming
FRANCISCO FERNANDEZ¹, GIANDOMENICO SPEZZANO², MARCO TOMASSINI³, LEONARDO VANNESCHI⁴
¹Universidad de Extremadura, Spain
²Universita della Calabria, Italy
³University of Lausanne, Switzerland
⁴Universita di Milano-Bicocca, Italy
6.1 INTRODUCTION TO GP

A few decades ago, some researchers began to explore how ideas taken from nature could be adapted and harnessed for solving well-known difficult problems. Among the concepts borrowed from nature, natural evolution demonstrated how simple ideas can be helpful for devising new ways of solving difficult problems. Among the techniques that arose under the umbrella of natural evolution, Genetic Algorithms [16], Evolutionary Programming [10], and Evolution Strategies [27, 28] have pioneered, matured, and demonstrated their usefulness. But researchers have gone further by formulating new uses for Evolutionary Algorithms, sometimes proposing new techniques that feature different aims and scopes. One of these recent successful techniques is Genetic Programming (GP).

GP was presented a decade ago by John Koza [17] as a technique, inspired by natural evolution, aimed at helping computers to program themselves. During the last few years, GP not only has demonstrated its capability of automatically developing software modules, but has also been employed for designing industrial products of outstanding quality, such as electronic circuits that have recently been patented [19].

According to [17], GP is a machine learning technique for inducing computer programs by evolutionary means. Koza employed Lisp expressions to represent the programs to be evolved, and this has favored the use of tree-like data structures in GP, although some researchers have sometimes employed different alternatives, such as linear genomes with dynamic size, or graphs.
6.1.1 How GP Works
The GP Algorithm. Basically, any EA (including GP) can be described by means of the following algorithm:

1. Initialize the population of candidate solutions (individuals) for the problem to be solved.
2. Evaluate all of the individuals in the population and assign them a fitness value.
3. Select individuals in the population using a selection algorithm.
4. Apply genetic operations to the selected individuals in order to create new ones.
5. Insert these new individuals into the new population.
6. If the population is not fully populated, go to step 3.
7. If the termination criterion is reached, then present the best individual as the output. Otherwise, replace the existing population with the new population and go to step 3.

In the following sections, we will describe the different steps of this algorithm; but before that, an introduction to program representation is offered.
Terminal and Function Sets. One of the differences between GP and the other EAs is that candidate solutions are computer programs. If we consider that individuals are encoded by means of tree-like structures, each program is made up of functions (the internal nodes) and terminals (the leaves of the tree); see Figure 6.1. The definition of both the function and terminal sets is decided by the GP designer and usually depends on the nature of the problem faced. For instance, if GP is employed for solving a symbolic regression problem, we may choose arithmetic primitives for the function set, while if GP is applied to programming a robot, some primitives that allow the robot to move along several directions could make up the function set. The terminal set is usually made up of variables and constant values that are significant for the problem at hand.
Fig. 6.1 Individuals in GP are usually encoded by means of tree structures.
Thus, the first concern for GP designers is to appropriately define the function and terminal sets: even when the solution to the problem at hand is not known, one should take care that a solution can be expressed using the functions and terminals selected. For instance, let us assume that we want to solve this problem: find an algorithm capable of retrieving the nth element of the series S = {1, 3, 5, 7, ...}. If one decides to define the function set F = {+} and the terminal set T = {n}, then no matter which function we construct (n, n + n, n + n + n, ...) using those sets, we will never find the optimal function g(n) = 2 * n - 1. Nevertheless, if we use instead the sets F = {+, -} and T = {n, 1}, we could build the function y(n) = (n + n) - 1, which is equivalent to g(n). Even when the designer does not know the optimal solution, he/she should be able to extract some information from the high-level specification of the problem that helps to define appropriate function and terminal sets.
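As a concrete illustration of such a tree-encoded program, the following sketch represents GP trees as nested tuples: internal nodes carry function symbols from F = {'+', '-'} and leaves carry terminals from T = {'n', 1}. The encoding and helper names are assumptions for illustration only.

```python
# Illustrative GP-tree encoding and evaluation for the series example above.
def evaluate(tree, n):
    """Recursively evaluate a GP tree for a given value of the terminal n."""
    if tree == 'n':
        return n
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    a, b = evaluate(left, n), evaluate(right, n)
    return a + b if op == '+' else a - b

# y(n) = (n + n) - 1, equivalent to the target g(n) = 2*n - 1.
y = ('-', ('+', 'n', 'n'), 1)
print([evaluate(y, n) for n in range(1, 5)])   # -> [1, 3, 5, 7]
```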
Fitness Function. After making up the sets of primitive functions and terminal symbols, one should define the fitness function. This function is in charge of evaluating each of the candidate solutions that are generated during the evolution, according to their capability of solving the problem. If we go back to the example presented above, the fitness function would simply take each of the individuals from the population (each of the evolved programs) and compute its fitness over a set of n test cases. For instance, the fitness function could be the sum of the squared errors between each of the values returned by the individual over the n test cases and the corresponding values in the series. Equation 6.1 computes the fitness value for the kth individual in the population, where P_k(i) denotes the value returned by the kth individual on the ith test case and S(i) the corresponding value of the series:

    f_k = Σ_{i=1}^{n} (P_k(i) - S(i))^2                                  (6.1)

Of course, low fitness values are preferred in this case; when this happens, the problem is called a minimization problem. Once the fitness function is defined, selection chooses individuals depending on their fitness value. But it would not make sense to just create clones (or copies) of the best individuals and remove the bad ones, because this would lead to the generation of a new population composed of many identical individuals. Thus, some variation mechanism is needed in order to generate new candidate solutions for the problem.
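For completeness, the fitness of Equation 6.1 can be computed for the series example by reusing the hypothetical evaluate() helper from the previous sketch; the number of test cases is an arbitrary assumption.

```python
# Illustrative fitness of Equation 6.1 (sum of squared errors over n test cases).
def fitness(tree, n_cases=10):
    target = lambda n: 2 * n - 1                     # the known series S
    return sum((evaluate(tree, n) - target(n)) ** 2 for n in range(1, n_cases + 1))

print(fitness(('-', ('+', 'n', 'n'), 1)))            # -> 0 (perfect individual)
```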
Genetic Operations. Genetic operators are the variation mechanisms that generate new candidate solutions. The idea is borrowed from nature: if parents are good, they are allowed to breed new individuals that share some features with them but that are not completely identical to them. Possibly some of these offspring can have a better fitness than their parents. Two operators are usually employed for this purpose: crossover and mutation. Crossover takes a set of (normally two) parents and mixes them up with a given probability, so that new individuals are generated. Mutation takes an individual and randomly changes a part of it with a certain probability. Typically, if we are dealing with trees, crossover exchanges two randomly selected subtrees (see Figure 6.2) of
the parents, while mutation randomly modifies a subtree from an individual (see Figure 6.3).
Fig. 6.2 A graphical representation of tree-based GP crossover. The two parents exchange one of their subtrees in order to generate two new offspring.
Fig. 6.3 A graphical representation of tree-based GP mutation. A subtree is randomly selected from the parent and replaced with a randomly generated tree.
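The operations depicted in Figures 6.2 and 6.3 can be sketched on the nested-tuple encoding used earlier. Helper names and the random-tree generator below are assumptions for illustration only.

```python
# Illustrative subtree crossover and mutation on nested-tuple GP trees.
import random

def random_tree(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice(['n', 1])                      # terminal
    return (random.choice(['+', '-']), random_tree(depth - 1), random_tree(depth - 1))

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a sequence of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace(children[path[0]], path[1:], new)
    return tuple(children)

def crossover(parent1, parent2):
    p1, s1 = random.choice(list(subtrees(parent1)))
    p2, s2 = random.choice(list(subtrees(parent2)))
    return replace(parent1, p1, s2), replace(parent2, p2, s1)   # swap the subtrees

def mutate(parent):
    p, _ = random.choice(list(subtrees(parent)))
    return replace(parent, p, random_tree())                    # fresh random subtree

# Example usage on two hand-written parents.
a, b = ('-', ('+', 'n', 'n'), 1), ('+', 'n', 1)
print(crossover(a, b))
print(mutate(a))
```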
Termination Criterion. The GP algorithm may be stopped when a satisfactory solution has been found or after a given number of generations have been computed.

6.2 MODELS OF PARALLEL AND DISTRIBUTED GP

Managing large sets of individuals for many generations requires significant amounts of computational resources. There are thus two main reasons for parallelizing an EA: one is to achieve time savings by distributing the computational effort over a set of calculating agents, and the second is to benefit from a parallel setting from the algorithmic point of view, in analogy with the natural parallel evolution of spatially distributed populations. There are several levels at which an evolutionary algorithm
can be parallelized, the most important being the level of the population and the level of the fitness evaluation. In the following sections we briefly describe these models, concentrating on GP; for a more detailed discussion see [1].
6.2.1 Parallelizing at Fitness Level
In many real-life problems, the calculation of the individuals’ fitness is by far the most time-consuming step of the GP algorithm. In such cases an obvious approach is to share fitness evaluation among several processors. For example, a master process can manage the population, send individuals to be evaluated to different processors, collect results, and apply the genetic operators to create a new population. Since individuals in GP feature different sizes and complexities, some experiments have dealt with the resulting problem of load imbalance, which decreases the utilization of processors [24]. Load balancing can automatically be obtained if steady-state reproduction is used instead of generational reproduction. Moreover, since several fitness cases must often be considered to calculate the fitness of each individual, it is also possible to evaluate each fitness case on a different processor for the same individual program. Although these techniques are interesting for speeding up the computation, we will not explore them any further since they do not imply any change of the EA itself and thus don’t offer any new challenge to improve the quality of the solutions found by the evolutionary process. Instead, we choose to concentrate our efforts on structured population GP models.
6.2.2 Parallelizing at Population Level

Usually, natural populations tend to possess a spatial structure and to be organized in so-called demes. Demes are semi-independent groups of individuals or subpopulations which have only a loose coupling to other neighboring demes. This coupling takes the form of the migration of individuals between demes. Several models based on this idea have been proposed. The two most important are the island and the grid models. These will be described in the following sections. For a recent review dealing with many aspects of distributed EAs see [1].

6.2.2.1 The Island Model. The intuitive idea of dividing a large population into several smaller ones and distributing them over a set of processors is interesting because it allows us to perform several tasks in parallel. This model is usually called the island model. It was used early in evolutionary computation (see, for instance, [5, 31]). In GP, the island model was first used by Andre et al. [2] on a parallel message-passing computer. This work has been followed by others ([8, 25] and references therein) who have also empirically studied a number of the model's parameters in detail.

In the island model the individuals are allowed to migrate among populations with a given frequency (see Figure 6.4). Two ideas lie behind this model: exploring as far as possible different search areas via different populations, and maintaining diversity within populations thanks to the exchange of individuals with other populations. Several
patterns of exchange have been traditionally used. The most common ones are rings, 2D and 3D meshes, stars, and hypercubes. A random communication topology can also be defined in which a given subpopulation sends its emigrants to another randomly chosen subpopulation. The most common replacement policy, and the one used here, consists in replacing the worst k individuals in the receiving population with k immigrants which are the best k individuals of their original island.
Fig. 6.4 General island topology.
Apart from the usual GP parameters, the model needs a few additional ones: the number of subpopulations, subpopulation size, frequency of exchange, number of exchanged individuals, and communication topology. In the first part of the chapter we focus on two important parameters: the number of subpopulations and the number of individuals per subpopulation. Once satisfactory values for these have been found, one can also study communication topologies and the migration policies. The island model is interesting in itself and could very well be run on a standard sequential machine. Thus, there is a clear separation between the model and its implementation. However, from the computer architecture point of view, island GP models can naturally be implemented on both distributed memory multiprocessor machines and clusters of networked workstations. In these architectures the address spaces of each processor are separated and communication between processors must be implemented through some form of message passing. Networked architectures are more interesting because of their low cost and ubiquity. The drawbacks are that the performance is limited by high communication latencies and by the fact that the machines have different workloads at any given time and are possibly heterogeneous. Nevertheless, problems that do not need frequent communication, such as EAs, are suitable for this architecture. Moreover, some of the drawbacks can be overcome by using networked computers in dedicated mode with a high-performance communication switch (see for instance [4]). The migrations between the different demes can be implemented, for example, using the Message Passing Interface (MPI) standard with synchronous communication operations, i.e., each island runs a standard generational GP and individuals are exchanged at fixed synchronization points between generations. Implementation details can be found in [9].
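As a rough illustration of such a synchronous, MPI-based island model (and not the implementation of [9]), the following Python sketch uses mpi4py with one MPI process per island. The GP machinery is reduced to a placeholder that perturbs plain fitness values, and a ring topology is used instead of the random topology employed later in the experiments; all names and parameter values are illustrative assumptions.

```python
from mpi4py import MPI
import random

def evolve_one_generation(pop):
    # Placeholder for one generation of standard generational GP
    # (selection, crossover, mutation); lower values mean better fitness.
    return sorted(random.gauss(x, 0.1) for x in pop)

def run_island(pop_size=500, generations=100, mig_interval=10, migrants=50):
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    pop = sorted(random.uniform(0, 100) for _ in range(pop_size))
    for gen in range(1, generations + 1):
        pop = evolve_one_generation(pop)
        if gen % mig_interval == 0 and size > 1:
            dest, src = (rank + 1) % size, (rank - 1) % size   # ring topology
            # Synchronous exchange at a fixed point between generations:
            # the best `migrants` individuals emigrate, and the received
            # immigrants replace the worst individuals of this island.
            immigrants = comm.sendrecv(pop[:migrants], dest=dest, source=src)
            pop = sorted(pop[:-migrants] + list(immigrants))
    return pop[0]

if __name__ == "__main__":
    print("best fitness on this island:", run_island())
```

Launched with, e.g., mpirun -np 5, this corresponds to 5 islands that all block at the sendrecv call, realizing the fixed synchronization points mentioned above.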
6.2.2.2 The Grid or Cellular Model. The cellular model was introduced in evolutionary computation at the end of the 1980s [21]. Its use in GP is more recent [11, 30]. In this model each individual is associated with a spatial location on a low-dimensional grid. The population is considered as a system of active individuals that interact only with their direct neighbors. Different neighborhoods can be defined for the cells. The most common neighborhoods in the 2D case are the five-neighbor (von Neumann) neighborhood, consisting of the cell itself plus the North, South, East, and West neighbors (see Figure 6.5), and the nine-neighbor (Moore) neighborhood, consisting of the same neighbors augmented with the diagonal neighbors.
Fig. 6.5 A grid cellular structure.
Fitness evaluation is done simultaneously for all the individuals, and selection, reproduction, and mating take place locally within the neighborhood. Good individuals slowly diffuse across the grid, giving rise to the formation of semi-isolated niches of individuals having similar characteristics. The choice of the individual to mate with the central individual and the replacement of the latter with one of the offspring can be done in several ways. Parallel implementation of cellular GP on clusters of workstations can be done by attributing a different piece of the grid to each processor. However, due to the different sizes of the individuals in the grid, different processors may need different times to process their chunk. To get good performance, some load-balancing technique is needed. A scalable implementation of the cellular GP model, called CAGE, is described in detail in [14]. CAGE is fully distributed with no need of any global control structure and is naturally suited for implementation on parallel computers. It introduces fundamental changes in the way GP works. In the model, the individuals of the population are located at specific positions in a toroidal 2D grid, and the selection and mating operations are performed, cell by cell, only among the individual assigned to a cell and its neighbors. Three replacement policies have been implemented: direct (the best of the offspring always replaces the current individual), greedy (the replacement occurs only if the offspring is fitter), and probabilistic (the replacement happens according to the fitness difference between parent and offspring). Experimental results on a variety of benchmark problems have substantiated the validity of the cellular model over both the island and panmictic GP models.
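To make the grid model concrete, here is a minimal illustrative sketch of one cellular step on a toroidal grid, using the von Neumann neighborhood and the greedy replacement policy; the crossover, mutate, and fitness arguments stand in for the actual GP operators and are not part of CAGE.

```python
import random

def von_neumann(i, j, rows, cols):
    # The cell itself plus its North, South, West, and East neighbors on a torus.
    return [(i, j), ((i - 1) % rows, j), ((i + 1) % rows, j),
            (i, (j - 1) % cols), (i, (j + 1) % cols)]

def cellular_step(grid, fitness, crossover, mutate):
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]
    for i in range(rows):
        for j in range(cols):
            neigh = von_neumann(i, j, rows, cols)
            # Mate: the best individual in the neighborhood (lower fitness = better).
            mate = min((grid[r][c] for r, c in neigh), key=fitness)
            child = mutate(crossover(grid[i][j], mate))
            # Greedy replacement: the offspring replaces the cell only if fitter.
            if fitness(child) < fitness(grid[i][j]):
                new_grid[i][j] = child
    return new_grid

if __name__ == "__main__":
    grid = [[random.uniform(0, 10) for _ in range(8)] for _ in range(8)]
    grid = cellular_step(grid, fitness=abs,
                         crossover=lambda a, b: (a + b) / 2,
                         mutate=lambda x: x + random.gauss(0, 0.1))
```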
6.3 PROBLEMS

6.3.1 Typical GP Problems

In the absence of theoretical guidance and with limited time and resources available it is difficult to decide which problems should be considered. If only GP theory had the same degree of maturity as GA theory, one could choose a mix of synthetic benchmarks, standard test problems, and provably difficult problems. But unfortunately this is not the case yet, although progress is being made (see [20]). Thus, we decided to address a set of problems that have been classically used for testing GP: the even-parity problem, the symbolic regression problem, and the artificial ant on the Santa Fe trail problem, since there is a large amount of accumulated knowledge on those in the GP community ([17, 20]). It is to be noted that these problems are of the optimization type with known solutions; thus, solutions are not required to generalize to out-of-sample cases. While this is an advantage in the present study, it is a simplification, for most real-world problems of interest do not have known solutions and machine-learned inferences are required to be of general value. Therefore, the set is far from complete or even representative of a wide range of typical GP problems. On the other hand, to extend and reinforce the conclusions, we also tackled two difficult real-life problems that will be described in Sections 6.5 and 6.6. The standard benchmarks will be solved with the multi-population (island) model. The two real-world problems will be tackled using island GP in the first case and cellular GP in the second. The following is a brief description of the test problems. More detailed explanations can be found in [17].
Even-Parity-4 Problem. The Boolean even-parity-k function of k Boolean arguments returns true if an even number of its Boolean arguments evaluates to true; otherwise it returns false. If k = 4, then 16 fitness cases must be checked to evaluate the fitness of an individual. The fitness is computed as 16 minus the number of hits over the 16 cases. Thus a perfect individual has fitness 0, while the worst individual has fitness 16. The set of functions we employed for GP individuals is the following: F = {NAND, NOR}. The terminal set in this problem is composed of four different Boolean variables: T = {a, b, c, d}.

Artificial Ant Problem on the Santa Fe Trail. In this problem, an artificial ant is placed on a 32 x 32 toroidal grid. Some of the cells of the grid contain food pellets. The goal is to find a navigation strategy for the ant that maximizes its food intake. We use the same set of functions and terminals as in [17]. As fitness function, we use the total number of food pellets lying on the trail (89) minus the amount of food eaten by the ant during its path. This turns the problem into a minimization one, like the previous one.

Symbolic Regression Problem. The problem aims to find a program which matches a given equation. We employ the classic polynomial equation f(x) = x^4 + x^3 + x^2 + x, and the input set is composed of the values 0 to 999 (1000 fitness
cases). For this problem, the set of functions used for GP individuals is the following: F = {*, //, +, -}, where // is like / but returns 0 instead of an error when the divisor is equal to 0, thus allowing syntactic closure. We define the fitness as the sum of the squared errors at each test point. Again, lower fitness means a better solution.

GP Parameters. In all the experiments performed we used the same set of GP parameters: generational GP, crossover rate equal to 95%, mutation rate equal to 0.1%, tournament selection of size 10, ramped half-and-half initialization, maximum depth of individuals for the creation phase equal to 6, maximum depth of individuals for the variation phase equal to 17, and elitism (i.e., survival of the best individual into the newly generated population for panmictic populations; the same was done for each subpopulation in the distributed case). Furthermore, to avoid complicating the issue, we refrained from using advanced techniques such as ADFs (Automatically Defined Functions), tailored function sets, and so on. Only plain GP was used in the experiments.
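For concreteness, the following Python sketch shows how two of these fitness functions could look: the protected division primitive // and the error measures for the symbolic regression and even-parity-4 benchmarks. The GP interpreter itself is omitted; `program` is assumed to be any callable obtained from an evolved tree, so this is an illustration of the definitions above rather than the original code.

```python
from itertools import product

def protected_div(a, b):
    # The "//" primitive: like /, but returns 0 on division by zero (syntactic closure).
    return a / b if b != 0 else 0

def regression_fitness(program, points=range(1000)):
    target = lambda x: x**4 + x**3 + x**2 + x
    # Sum of squared errors over the 1000 fitness cases; lower is better.
    return sum((program(x) - target(x)) ** 2 for x in points)

def parity4_fitness(program):
    # 16 minus the number of hits over all 16 truth assignments; 0 is a perfect individual.
    hits = sum(program(a, b, c, d) == ((a + b + c + d) % 2 == 0)
               for a, b, c, d in product([False, True], repeat=4))
    return 16 - hits

if __name__ == "__main__":
    print(parity4_fitness(lambda a, b, c, d: (a + b + c + d) % 2 == 0))  # 0
    print(regression_fitness(lambda x: x**4 + x**3 + x**2 + x))          # 0.0
```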
6.3.1.1 Performance Measures and Comparisons. Reporting on performance comparisons in EAs is a notoriously difficult issue. For problems in which the solution is not known, such as hard real-life optimization problems, a useful figure of merit is the Mean Best Fitness (MBF), that is, the average over all runs of the best fitness values at termination [6]. The very concept of termination is a fuzzy one. Indeed, in the above situation, one does not know in advance whether the global optimum has been reached. Consequently, one common attitude is to take the measure under a specified amount of computational effort. Since we are interested in comparing island GP with standard GP, simply comparing fitness values during generations or, as is usually done with GAs, comparing in terms of the number of fitness evaluations performed, is inadequate. While this is often acceptable in EAs with fixed-length representations, the assumption can be misleading in GP, where individuals change their size dynamically. We thus analyzed the data by means of the effort of computation, which is defined as the total number of nodes GP has evaluated in a population over a given number of generations. For problems with known solutions, such as those that are studied here, MBF is not entirely adequate because a sizable part of the runs are unsuccessful under the prescribed effort (using a larger effort would help in some cases but would become prohibitively expensive). This prevents us from knowing whether increasing the length of the runs would have been useful and in which cases. Thus, the measured MBF says little about the problem-solving capabilities of the methods. Instead, when the solution is known, the success rate (SR), defined as the ratio of successful runs with respect to the total number of runs under a specified computational effort, is a good indicator of algorithmic effectiveness [3, 6]. Thus, our main performance indicator for GP benchmarks is the SR. However, to get a more complete picture of the whole evolutionary process, we also report mean fitness curves against computational effort and fitness histograms giving the relative frequency of the solutions found at a given effort value. The mean fitness curves are useful for getting a visual feeling of the workings of the different algorithms with respect to each other over time. The
histograms allow us to know not only the number of perfect solutions but also how many solutions were close to optimal, which gives an idea of the "dispersion" of the solutions at the end of the runs. Overall, these figures should give a faithful and rather complete picture of the evolutionary process. We also remark that the number of runs we performed per experiment (100) is unusually high. From the statistical point of view, our runs may be considered as a series of independent Bernoulli trials of the same experiment having only two possible outcomes: success or failure. In this case, the number of successes (or of failures) is binomially distributed. The maximum likelihood estimator p̂ for the mean of a series of Bernoulli trials, and hence for the probability of success, is simply the number of successes divided by the sample size (the number of runs n). With this information at hand, one can calculate the sample standard deviation σ = √(n · p̂ · (1 − p̂)), which is also given in the success rate tables.
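A two-line computation makes these statistics explicit. The values in the example below are illustrative but correspond to entries of the success-rate tables in the figures that follow (for instance, 50 successes out of 100 runs give σ = 5.000).

```python
from math import sqrt

def success_stats(successes, runs=100):
    # Estimated success probability and the standard deviation of the
    # binomially distributed number of successes out of `runs` trials.
    p_hat = successes / runs
    sigma = sqrt(runs * p_hat * (1 - p_hat))
    return p_hat, sigma

if __name__ == "__main__":
    print(success_stats(50))   # (0.5, 5.0)
    print(success_stats(61))   # (0.61, 4.877...)
```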
We will compare island GP with standard GP and, to limit the number of free parameters, we decided to fix the rest of them as follows:

- Communication topology: random.
- Frequency of exchange: every 10 generations.
- Number of individuals exchanged: 10% of the population size.
These values are not arbitrary: they have been used before by ourselves and by others with good results and should constitute an acceptable first approximation [8].
The Artificial Ant Problem. Figure 6.6 clearly shows that 5 communicating populations of 500 individuals each are more effective than a single population of 2500 individuals. In fact, both the curves of average fitness vs. computational effort and the table of the number of hits show better performance for the distributed case, as confirmed by the standard deviations, while the relative frequency histograms show that the distributed case finds more solutions of better quality. The case of a total population size of 1000 (not shown for reasons of space) confirms the results, with five populations being more effective than a single panmictic population on this problem [8].

The Even-Parity-4 Problem. In Figure 6.7 we observe that PADGP with random topology performs significantly better than standard GP for all the population sizes reported. In particular, distributing the individuals into 10 populations gives the best results, as is evident from both the average fitness curves and the number-of-hits table. The relative frequency histograms confirm the trend, with a higher number of perfect solutions found and a clustering of good solutions that is shifted towards better fitness values in the distributed case. The analogous experiment with a total number of individuals equal to 500 (not reported here) confirms the trend.
[Figure 6.6, panel (b) data: successes out of 100 runs at three increasing effort values were 43, 47, and 50 (σ = 5.000) for the single population of 2500 individuals, and 53, 60, and 61 (σ = 4.877) for the 5 populations of 500 individuals each.]
Fig. 6.6 The artificial ant problem. Standard GP vs island GP with random communication topology. (a): average fitness against effort for 1 and 5 populations and a total number of 2500 individuals over 100 executions. (b): number of successes in 100 runs for three values of the total effort. (c): histogram of the relative frequency of solutions found by standard GP for an effort E = 3 × 10^8. (d): histogram of the relative frequency of the solutions in the case of 5 populations of 500 individuals each for the same value of the effort.
The Symbolic Regression Problem. Figure 6.8 depicts the results obtained on the symbolic regression problem with 250 individuals. Again, we see that distributing the individuals into 5 or 10 populations is beneficial, as shown by the number of successes and the standard deviations. We also ran 100 experiments with a total population size of 500 (not reported here) that confirm the trend.

6.4 REAL-LIFE APPLICATIONS

The previous sections have shown convincingly that distributed GP is an efficient problem-solving technique. However, we have used typical GP benchmark problems. These benchmarks can be intrinsically difficult, but in general they lack the flavor that
Fig. 6.7 The even-parity-4 problem. Standard GP vs PADGP with random communication topology (see text for the other communication parameters). (a): average fitness against effort for a total number of 1000 individuals over 100 executions. (b): number of successes in 100 runs for three values of the total effort. (c): histogram of the relative frequency of solutions found by standard GP for an effort E = 2.5 × 10^8. (d): histogram of the relative frequency of the solutions in the case of 10 populations of 100 individuals each for the same value of the effort.
one finds when confronted with real-life problems. In order to show that distributed GP is a generally useful technique, we discuss here two applications that are much closer to the problems practitioners face in their daily activity. The first problem belongs to the hardware design domain, while the second one deals with data classification. In the first case we tackle the problem with the help of island GP, while the latter is treated with cellular GP.
Fig. 6.8 The symbolic regression problem. Standard GP vs PADGP with random communication topology (see text for the other communication parameters). (a): average fitness against effort for a total number of 250 individuals over 100 executions. (b): number of successes in 100 runs for three values of the total effort. (c): histogram of the relative frequency of solutions found by standard GP for an effort E = 4 × 10^5. (d): histogram of the relative frequency of the solutions in the case of 10 populations of 25 individuals each for the same value of the effort. In the histograms, the height of the bar marked "> 50" is proportional to the number of solutions that are at least 50 units of fitness worse than the perfect solution.
6.5 PLACEMENT AND ROUTING IN FPGA
Field Programmable Gate Arrays (FPGAs) are integrated devices used for the implementation of digital circuits by means of a configuration or programming process. There are different manufacturers, and several kinds of FPGAs are available. We will focus on those called island-based FPGAs. This model includes three main components: Configurable Logic Blocks (CLBs), Input-Output Blocks (IOBs), and connection blocks (see Figure 6.9). CLBs are used to implement all the logic circuitry. They are arranged in a matrix in the device, and they have different configuration possibilities. IOBs are responsible for connecting the circuit
implemented by the CLBs with any external system. The third class of components comprises the connection blocks (switch-boxes and interconnection lines). They are the elements available to the designer for making the internal routing of the circuit. On many occasions one needs to use some of the CLBs to accomplish the routing.
Fig. 6.9 Island-based FPGAs.
FPGA design flow has two major tasks: placement and routing. In this section we present a methodology based on GP for the automation of the placement and routing steps. This methodology has also been employed for tackling multi-FPGA systems synthesis [7]. There are several ways of encoding graphs, i.e., circuits, when working with GAs and GP. Sometimes new techniques have been developed to do so. For instance, Cartesian Genetic Programming is a variation of GP which was developed for representing graphs and shows some similarities to other graph-based forms of GP. The aim of Miller et al. [23] is to find complete circuits capable of implementing a given Boolean function. Nevertheless, we are more interested in the physical layout. Our optimization problem begins with a given circuit description, and the goal is to find out how to place components and wires in FPGAs. Meanwhile we have also developed a new methodology for representing circuits by means of GP with individuals represented as trees. Several researchers have applied EAs to evolving circuits. For instance, Koza has employed GP for designing and discovering analog circuits [18] which have eventually been patented. Thompson's research scope was the physical design and implementation of circuits in FPGAs [32]. However, all of them try to evolve analog circuits, while we are addressing digital ones. Other studies have addressed the implementation of GP using reconfigurable hardware. For instance, in [22] the author describes how trees can be implemented and evaluated on FPGAs. However, the aim here is not to implement a GP tool on an FPGA but rather to use GP for physically placing and routing circuits. The main reason behind this choice is the similarity between the data structures that GP uses (trees) and the way of describing circuits (graphs). A tree is more convenient than a fixed-size string for describing graphs of any size. In the following we describe how graphs are encoded by means of trees.
6.5.1 Circuit Encoding Using Trees

The main goal is to implement a circuit into an FPGA. Each circuit component has to be implemented in a CLB, and all the CLBs have to be connected according to the circuit topology. Circuits have to be encoded as trees, and any of the trees that GP will generate should also have an equivalent circuit; the fitness function will later decide whether the circuit is correct or not and its degree of resemblance to the correct circuit. A given circuit is made up of components and connections. If we forget the name and function of each of the simple components (considering each of them as a black box), a circuit can be represented in a similar way to the example depicted in Figure 6.10. Given that components compute very simple logic functions, any of them can be implemented by using any of the CLBs available within each FPGA. This means that we only have to connect CLBs of the FPGA according to the interconnection model that a given circuit implements, and then we can configure each CLB with the function that each component performs in the circuit. After these simple steps the circuit is in the FPGA. To encode the circuit as a tree we can first label each component of the circuit with a number and then assign component labels to the ends of the wires connected to them (Figure 6.10). Wires could now be disconnected without losing any information. We could even rebuild the circuit by using the labels as a guide.
Fig. 6.10 Representing a circuit with black boxes and labelling connections.
We may now describe all the wires by means of a tree by connecting each of the wires as a branch of the tree and keeping them all together in the same tree (in a similar way as depicted in Figure 6.11a). By labelling both ends of the branches, we will have all the information required for reconstructing the circuits. This way of representing circuits allows us to go back and construct the real graph. Moreover, any given tree, randomly generated, will always correspond to a particular graph, regardless of the usefulness of the associated circuit. In this proposal, each node of the tree represents a connection, and each branch represents a wire.
Fig. 6.11 Mapping an individual into a circuit. (a) Mapping a circuit into a tree. (b) Mapping a branch into an FPGA connection.
6.5.2 GP Sets

The function set for our problem contains only one element: F = {SW}. Similarly, the terminal set contains only one element: T = {CLB}. But SW and CLB may be interpreted differently depending on the position of the node within a tree. Sometimes a terminal node corresponds to an IOB connection, while sometimes it corresponds to a CLB connection in the FPGA (see Figure 6.11a). Similarly, an internal node (SW node) sometimes corresponds to a CLB connection (the first node in the branch), while in other cases it affects switch connections in the FPGA (an internal node in a branch; see Figure 6.11b). Each of the nodes in the tree will thus contain different information (a possible representation is sketched after the list):

1. If we are dealing with a terminal node, it will have information about the position of the CLB, the number of the pin selected, the number of wires to which it is connected, and the direction we are taking when placing the wire.

2. If we are instead in a function node, it will have information about the direction we are taking. This information enables us to establish the switch connection or, in the case of the first node of the branch, the number of the pin where the connection ends.
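The following dataclass sketch illustrates what such node records might look like in Python; the field names are hypothetical and only mirror the description above, not the data structures of the GP tool used for the experiments.

```python
from dataclasses import dataclass

@dataclass
class SWNode:
    # Internal (function) node: for the first node of a branch it selects the pin
    # where the wire ends; for deeper nodes it selects a switch direction.
    direction: str        # e.g. "N", "S", "E", "W"  (assumed encoding)
    children: list        # sub-branches (SWNode or CLBNode instances)

@dataclass
class CLBNode:
    # Terminal node: one end of a wire, attached to a CLB (or IOB) of the FPGA.
    row: int
    col: int
    pin: int
    n_wires: int          # number of wires connected at this block
    direction: str        # direction taken when placing the wire

# One branch of a GP tree then describes one wire: a pin choice, a sequence of
# switch directions, and a terminal CLB/IOB end.
example_wire = SWNode("E", [SWNode("N", [CLBNode(row=2, col=3, pin=1, n_wires=1, direction="N")])])
```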
6.5.3 Evaluating Individuals To evaluate an individual we must convert the genotype (tree structure) to the phenotype (circuit in the FPGA) and then compare it to the circuit provided by the partitioning algorithm. We developed an FPGA simulator for this task. This software allows us to simulate any circuit and to check its resemblance to other circuits. Therefore, this software tool is in charge of taking an individual from the population and evaluating every branch from the tree, in a sequential way, establishing the connections that each branch specifies. Circuits are thus mapped by visiting each
of the useful nodes of the trees and making connections on the virtual FPGA, thus obtaining the phenotype.

6.5.4 Results
Figure 6.12 graphically depicts the series of test circuits of increasing complexity that has been used for validating the methodology.
Fig. 6.12 Circuits to be tested.
As we may notice, the numbers of inputs/outputs, connections, and CLBs required for each of the circuits increase from circuit 1 to circuit 4. The idea is to check whether the methodology can tackle circuits of different sizes and complexities. The main parameters employed were the following: maximum number of generations equal to 500, maximum tree depth equal to 30, steady state, tournament selection of size 10, crossover probability equal to 98%, mutation probability equal to 2%, ramped half-and-half initialization, and elitism (i.e., the best individual has been added to the new population at each generation). The population size was modified for each circuit using the following values: 500, 1000, 10,000, and 200 individuals, respectively, for circuits 1, 2, 3, and 4. The smallest population was used for the largest circuit, because each of the individual trees required larger resources, and we could not maintain a large population of individuals in memory. The GP tool we used is described in [9]. Figures 6.13 and 6.14 show one of the proposed solutions among those obtained with GP for each circuit. A very important fact is that each of the solutions that GP found possesses different features, such as the area of the FPGA used and the position of the input/output terminals. This means that the methodology could easily be adapted for managing typical constraints in FPGA placement and routing. The methodology was later employed for multi-FPGA system synthesis. Figure 6.15 shows the board that has been employed for experimenting.
Fig. 6.13 Solutions found for circuits 1, 2, and 3.
Fig. 6.14 One of the solutions found for circuit 4.
Fig. 6.15 Multi-FPGA board designed for testing the methodology.
6.6 DATA CLASSIFICATION USING CELLULAR GENETIC PROGRAMMING
This section describes the application of cellular GP to the data classification problem, which is probably the most studied data mining task [15]. Data mining refers to the entire process of extracting interesting knowledge from real-world, large, and complex data sets. Data classification is a typical discovery-driven task. It tries to identify
common characteristics in a set of N objects (tuples or examples) contained in large data sets and to categorize them into different groups (classes). In a classification task, a training set (also called a learning set) is identified for the construction of a classifier. Each record in the learning set has several attributes, one of which, the goal or class label attribute, indicates the class to which each record belongs. The classifier, once built and tested, is used to predict the class label of new records that do not yet have a class label attribute value. A test set is used to test the accuracy of the classifier. The classifier, once certified, is used to predict the class label of future unclassified data. Different models have been proposed for classification, such as decision trees, neural networks, Bayesian belief networks, fuzzy sets, and genetic models. Among these models, decision trees are widely used for classification. We focus on decision tree induction by GP. A decision tree is a tree where the leaf nodes are labelled with the classes C_i, i = 1,...,k, and the nonleaf nodes (decision nodes) with the attributes A_i, i = 1,...,m, of the training set. The branches leaving a node represent a test on the attribute values of the form A_i = v_j, where v_j is one of the possible values A_i can assume. The path from the root to a leaf represents a set of conditions (attribute-value pairs) which describe the class C_i labelling that leaf. In Figure 6.16 the well-known decision tree built for the small training set given in [26], consisting of four attributes (outlook, temperature, humidity, and wind) and two classes (play, don't play), is shown. The tree gives the weather conditions under which it is better to play tennis or not. For example, the path from the root to the leftmost leaf, expressed as outlook = sunny and humidity = high then Don't play, means that if the weather outlook is sunny and the humidity is high, then don't play.
Fig. 6.16 Decision tree for modelling play and don't play tennis.
GP can be used to inductively generate decision trees for the task of data classification. Decision trees can be interpreted as compositions of functions where the function set is the set of attribute tests and the terminal set comprises the classes. The function set can be obtained by converting each attribute into an attribute-test function. Thus there are as many functions as there are attributes. For each attribute A, if A_1, ..., A_n are the possible values A can assume, the corresponding attribute-test function f_A has arity n and, if the value of A is A_i, then f_A(A_1, ..., A_n) = A_i. When a
tuple has to be evaluated, the function at the root of the tree tests the corresponding attribute and then executes the argument coming from the test. If the argument is a terminal, then the class name for that tuple is returned; otherwise the new function is executed. The function set is f_outlook(sunny, overcast, rain), f_humidity(high, normal), f_windy(true, false), and the terminal set is {play, don't play}. The fitness of an individual measures the goodness of a tree in classifying the data set. The measure can be done in several ways. The most common approach is to define it as the number of training examples in the correct class. This definition comes in a natural way from the classification problem, and in GP it is called raw fitness. However, the raw fitness is a very simple measure that does not take into account the information content of a tree with respect to another. Since the number of trees that can be built is very high, the GP algorithm needs to be endowed with an inductive bias to prefer a particular tree over the offspring after crossover has taken place. The only way to provide an inductive bias to the GP classification algorithm is through the fitness function. In [12] we use a fitness function based on the J-measure [29], which determines the information content of a tree and can give a preference criterion to find the decision tree that classifies a set of tuples in the best way.

6.6.1 A CGP-Based Classifier

Cellular genetic programming (CGP) for data classification was proposed in [11]. It enables a fine-grained parallel implementation of GP through the diffusion model. The main advantages of parallel GP for classification problems consist in handling large populations in a reasonable time, enabling fast convergence by reducing the number of iterations and the execution time, and favoring cooperation in the search for good solutions, thus improving the accuracy of the method. The algorithm, in the following referred to as CGPC (Cellular Genetic Programming Classifier), is described in Figure 6.17. At the beginning, for each cell, an individual is randomly generated and its fitness is evaluated. Then, at each generation, every tree undergoes one of the genetic operators (reproduction, crossover, mutation) depending on the probability test. If crossover is applied, the mate of the current individual is selected as the neighbor having the best fitness, and the offspring are generated. The current string is then replaced by the best of the two offspring if the fitness of the latter is better than that of the former. The evaluation of the fitness of each classifier is calculated on the entire training data. After the execution of the number of generations defined by the user, the individual with the best fitness represents the classifier.
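To make the representation of a decision tree as a composition of attribute-test functions concrete, the sketch below encodes the play / don't play tree of Figure 6.16 as ordinary Python functions. It is purely illustrative; neither the function names nor the eager argument evaluation correspond to the CGPC implementation.

```python
def f_outlook(record, sunny, overcast, rain):
    return {"sunny": sunny, "overcast": overcast, "rain": rain}[record["outlook"]]

def f_humidity(record, high, normal):
    return {"high": high, "normal": normal}[record["humidity"]]

def f_windy(record, true_branch, false_branch):
    return true_branch if record["windy"] else false_branch

def classify(record):
    # outlook = sunny  -> test humidity; overcast -> Play; rain -> test wind.
    return f_outlook(record,
                     sunny=f_humidity(record, high="Don't play", normal="Play"),
                     overcast="Play",
                     rain=f_windy(record, true_branch="Don't play", false_branch="Play"))

if __name__ == "__main__":
    print(classify({"outlook": "sunny", "humidity": "high", "windy": False}))  # Don't play
```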
6.6.2 Ensemble of Classifiers

Although CGPC allows the construction of accurate decision trees, the performance of the algorithm is strongly dependent on the size of the training set. In fact, in this model, one of the most expensive operations is the evaluation of the fitness of each decision tree: the entire data set is needed to compute the number of examples that are
Let p_c, p_m be the crossover and mutation probabilities
for each point i in the grid do in parallel
    generate a random individual t_i
    evaluate the fitness of t_i
end parallel for
while not MaxNumberOfGenerations do
    for each point i in the grid do in parallel
        generate a random probability p
        if (p < p_c)
            select the cell j, in the neighborhood of i, such that t_j has the best fitness
            produce the offspring by crossing t_i and t_j
            evaluate the fitness of the offspring
            replace t_i with the best of the two offspring
            evaluate the fitness of the new t_i
        else if (p < p_c + p_m) then
            mutate the individual
            evaluate the fitness of the new t_i
        else
            copy the current individual in the population
        end if
    end parallel for
end while
Fig. 6.17 The algorithm CGPC.
correctly classified; thus it must be replicated for each subpopulation. One approach to improving the performance of the model is to build an ensemble of classifiers, each working on a different subset of the original data set, and then combine them to classify the test set. We proposed an extension of CGPC to generate an ensemble of classifiers, each trained on a different subset of the overall data, and then to use them together to classify new tuples by applying a simple majority voting algorithm, like bagging [13]. The main feature of the new model, in the following referred to as BagCGPC, is that each subpopulation generates a classifier working on a sample of the training set instead of using the whole training set. Figure 6.18 illustrates the basic framework for a parallel and distributed implementation of BagCGPC. We assume that each training sample S_i, i = 1,...,P, resides on a different processor of the distributed computer. We use the CGPC algorithm to parallelize the implementation of BagCGPC in a natural way. The size of each subpopulation Q_i, i = 1,...,P, present on a node must be greater than a threshold determined by the granularity supported by the processor. Each processor, using a training sample
Fig. 6.18 Framework for a parallel and distributed implementation of BagCGPC.
S_i and a subpopulation Q_i, implements a classifier process CGPC_i as a learning algorithm and generates a classifier. The single classifier is always represented by the tree with the best fitness in the subpopulation. With K subpopulations we obtain K classifiers that constitute our ensemble. To take advantage of the cellular model of GP, the subpopulations are not evolved independently, but they exchange their outermost individuals in an asynchronous way. Experimental results show that communication among the subpopulations produces an interesting positive result since the diffusion effect, which allows classifiers to be transferred from one subpopulation to another, reduces the average size of the trees and consequently improves the performance of the method, since the evaluation time of the fitness is reduced. Many data mining applications use massive data sets that are too large to be handled in the memory of a typical computer, and in many cases the data sets may be located at multiple distributed sites. Our objective was to develop a general framework for data classification applicable to both parallel and distributed environments. BagCGPC is designed for learning when disjoint data sets from multiple sites cannot be merged together, and it can also be applied to parallel learning, where the huge training set is split into several sets that reside on a parallel computer with parallel processors.
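The final combination step is the simple majority vote mentioned above. The sketch below shows one way it could look in Python; the classifier interface is hypothetical (each classifier is assumed to be a callable obtained from one CGPC_i subpopulation), so this is an illustration rather than the actual BagCGPC code.

```python
from collections import Counter

def ensemble_predict(classifiers, record):
    # Each classifier votes for a class label; the most frequent label wins.
    votes = Counter(clf(record) for clf in classifiers)
    return votes.most_common(1)[0][0]

def ensemble_error(classifiers, test_set):
    # test_set is a list of (record, true_label) pairs; result is the error in percent.
    wrong = sum(ensemble_predict(classifiers, rec) != label for rec, label in test_set)
    return 100.0 * wrong / len(test_set)

if __name__ == "__main__":
    clfs = [lambda r: "A", lambda r: "B", lambda r: "A"]
    print(ensemble_predict(clfs, record=None))   # "A"
```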
6.6.3 Experimental Results

We present preliminary experiments and results of BagCGPC on a large data set taken from the UCI Machine Learning Repository, the Cens data set, and compare them with CGPC. The parameters used for the experiments are shown in Table 6.1. Both algorithms run for 100 generations with a population size depending on the number of classifiers. We experimented with BagCGPC using 2, 3, 4, 5, 10, 15, and 20 classifiers. Every classifier, with its subpopulation, runs on a single processor of the
Table 6.1 Main parameters used in the experiments

Name                            Value
max_depth_for_new_trees         6
max_depth_after_crossover       6
max_mutant_depth                2
initialisation_method           Ramped
selection_method                Tournament
crossover_func_pt_fraction      0.7
crossover_any_pt_fraction       0.1
fitness_prop_repro_fraction     0.1
parsimony_factor                0
parallel machine. The size of a subpopulation was fixed at 100; thus CGPC used a population size of 100 × the number of classifiers. For example, if an ensemble of 5 classifiers is considered, the population size for CGPC is 500 (and the number of processors on which CGPC is executed is 5), while if the number of classifiers is 20, CGPC used a population of 2000 elements (executed on 20 processors). All results were obtained by averaging 10-fold cross-validation runs. The experiments were performed on a Linux cluster with 16 dual-processor 1.133 GHz Pentium III nodes having 2 Gbytes of memory, connected by Myrinet and running Red Hat v7.2. In our experiments we wanted to investigate the influence of the sample sizes on the accuracy of the method. To this end we used the Cens data set, a large real data set containing weighted census data extracted from the 1994 and 1995 current population surveys conducted by the U.S. Census Bureau. The data set consists of 299,285 tuples, 42 attributes, and 2 classes. Table 6.2 shows the effect of different sample sizes on accuracy as the number of classifiers increases. For each ensemble, the errors of CGPC and BagCGPC with sample sizes of 6000, 15,000, 30,000, and 50,000 are shown. From the table we can note that when the sample size is 6000, BagCGPC is not able to outperform the single classifier working on the entire data set. The same effect is obtained when the number of classifiers is less than 3. But as the sample size or the number of classifiers increases, BagCGPC is able to obtain an error lower than CGPC. Another positive result regards the computation time. In fact, BagCGPC is much more efficient than CGPC. Table 6.2 shows the execution times of CGPC and BagCGPC for each sample. For example, CGPC required 6053 seconds to run on the Cens data set for 100 generations with a population size of 500 elements. When five classifiers are employed, each using 50,000 tuples and a population size of 100 elements, BagCGPC needed 1117 seconds of computation time. As already stated in the previous section, communication among the subpopulations has a positive impact on the average size of the trees and consequently improves the performance of the method, since the evaluation time of the fitness is reduced. As an example, we compare the average length of the trees when the algorithm ran with 5 classifiers and 50,000 tuples, in the cases of communication of the border trees among subpopulations and no communication, respectively. After 100 iterations, in
Table 6.2 Comparing execution times (in seconds) for BagCGPC and CGPC

             BagCGPC (sample size)                        CGPC
Num. proc    6000    15,000   30,000   50,000             All data set
1            588     633      783      841                3760
2            596     654      823      976                4051
3            612     719      980      1022               4233
4            635     725      965      1056               5598
5            668     843      1064     1117               6053
10           799     902      1116     1278               6385
15           823     922      1226     1316               8026
20           964     1055     1456     1621               9161
the former case the average size is 900 and the computation time is 1117 seconds, while in the latter the average size is about 10,000 and the time needed by the method is 4081 seconds. In the lack-of-communication case accuracy also worsened, going from 5.07 to 5.56.

6.7 CONCLUDING DISCUSSION
GP is a powerful machine-learning technique, as it can be used to induce a variety of learning models. However, it is relatively slow. Spatially structured populations have been used for a number of years in GP with good empirical results in terms of efficiency. In this chapter, we have presented two distributed forms of GP: the island model and the cellular model. We have shown that giving a spatial or "geographical" structure to the population is beneficial in terms of the solution quality that can be achieved. This has been shown empirically on standard GP benchmark problems and, more importantly for the applicative reader, also on some typical real-life cases: a circuit synthesis problem and a data classification problem. In addition, spatially structured populations can be run on parallel or distributed hardware, with a significant gain in processing time due to the relatively low communication requirements of the algorithms.
REFERENCES

1. E. Alba and M. Tomassini. Parallelism and evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 6(5):443-462, 2002.
2. D. Andre and J. R. Koza. Parallel genetic programming: A scalable implementation using the transputer network architecture. In P. Angeline and K. Kinnear, editors, Advances in Genetic Programming 2, pages 317-337, Cambridge, MA, 1996. The MIT Press.
3. T. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, Oxford, 1996.

4. F. H. Bennett III, J. Koza, J. Shipman, and O. Stiffelman. Building a parallel computer system for $18,000 that performs a half peta-flop per day. In W. Banzhaf et al., editor, Proceedings of the Genetic and Evolutionary Computation Conference GECCO'99, pages 1484-1490, San Francisco, CA, 1999. Morgan Kaufmann.
5. J. P. Cohoon, S. U. Hegde, W. N. Martin, and D. Richards. Punctuated equilibria: A parallel genetic algorithm. In J. J. Grefenstette, editor, Proceedings of the Second International Conference on Genetic Algorithms, page 148. Lawrence Erlbaum Associates, 1987.

6. A. E. Eiben and M. Jelasity. A critical note on experimental research methodology in EC. In Proceedings of the 2002 Congress on Evolutionary Computation (CEC'2002), pages 582-587. IEEE Press, Piscataway, NJ, 2002.

7. F. Fernandez, I. Hidalgo, J. Lanchares, and J. M. Sanchez. A methodology for reconfigurable hardware design based upon evolutionary computation. Microprocessors and Microsystems, to appear.
8. F. Fernández, M. Tomassini, and L. Vanneschi. An empirical study of multipopulation genetic programming. Genetic Programming and Evolvable Machines, 4(1):21-52, 2003.

9. F. Fernandez, M. Tomassini, L. Vanneschi, and L. Bucher. A distributed computing environment for genetic programming using MPI. In J. Dongarra, P. Kaksuk, and N. Podhorszki, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, volume 1908 of Lecture Notes in Computer Science, pages 322-329. Springer-Verlag, Heidelberg, 2000.

10. L. J. Fogel, Alvin J. Owens, and Michael J. Walsh. Artificial intelligence through a simulation of evolution. In M. Maxfield, A. Callahan, and L. J. Fogel, editors, Biophysics and Cybernetic Systems: Proc. of the 2nd Cybernetic Sciences Symposium, pages 131-155, Washington, D.C., 1965. Spartan Books.

11. G. Folino, C. Pizzuti, and G. Spezzano. A cellular genetic programming approach to classification. In W. Banzhaf et al., editor, Proceedings of the Genetic and Evolutionary Computation Conference GECCO'99, pages 1015-1020, San Francisco, CA, 1999. Morgan Kaufmann.

12. G. Folino, C. Pizzuti, and G. Spezzano. Parallel genetic programming for decision tree induction. In 13th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'01), pages 129-135. IEEE Press, Piscataway, NJ, 2001.

13. G. Folino, C. Pizzuti, and G. Spezzano. Ensemble techniques for parallel genetic programming based classifiers. In C. Ryan et al., editor, Genetic Programming, Proceedings of EuroGP'2003, volume 2610 of LNCS, pages 59-69, Essex, 2003. Springer-Verlag.
14. G. Folino, C. Pizzuti, and G. Spezzano. A scalable cellular implementation of parallel genetic programming. IEEE Transactions on Evolutionary Computation, 7(1):37-53, 2003.
15. D. J. Hand. Construction and Assessment of Classification Rules. John Wiley, New York, 1997.

16. J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, 1975.

17. J. R. Koza. Genetic Programming. The MIT Press, Cambridge, Massachusetts, 1992.
18. J. R. Koza, D. Andre, F. H. Bennett, and M. A. Keane. Genetic Programming III: Darwinian Invention & Problem Solving. Morgan Kaufmann Publishers Inc., 1999.

19. J. R. Koza, M. A. Keane, and M. J. Streeter. Evolving inventions. Scientific American, February 2003.

20. W. B. Langdon and R. Poli. Foundations of Genetic Programming. Springer, Berlin, 2002.

21. B. Manderick and P. Spiessens. Fine-grained parallel genetic algorithms. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 428-433. Morgan Kaufmann, 1989.

22. P. Martin. A hardware implementation of a genetic programming system using FPGAs and Handel-C. Genetic Programming and Evolvable Machines, 2(4):317-343, December 2001.

23. J. F. Miller, D. Job, and V. K. Vassilev. Principles in the evolutionary design of digital circuits - part I. Genetic Programming and Evolvable Machines, 1(1/2):7-35, April 2000.

24. M. Oussaidene, B. Chopard, O. Pictet, and M. Tomassini. Parallel genetic programming and its application to trading model induction. Parallel Computing, 23:1183-1198, 1997.

25. B. Punch, D. Zongker, and E. Goodman. The royal tree problem, a benchmark for single and multiple population genetic programming. In P. Angeline and K. Kinnear, editors, Advances in Genetic Programming 2, pages 299-316, Cambridge, MA, 1996. The MIT Press.

26. J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.

27. I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart, 1973. German.
28. H. P. Schwefel. Evolutionsstrategie und numerische Optimierung. PhD thesis, Technische Universität Berlin, Berlin, May 1975.

29. P. Smyth and R. M. Goodman. An information theoretic approach to rule induction from databases. IEEE Transactions on Knowledge and Data Engineering, 4(4):301-316, August 1992.

30. G. Spezzano, G. Folino, and C. Pizzuti. CAGE: A tool for parallel genetic programming applications. In J. Miller et al., editor, Genetic Programming, Proceedings of EuroGP'2001, volume 2038 of LNCS, pages 51-63. Springer, Berlin, 2001.
31. R. Tanese. Parallel genetic algorithms for a hypercube. In J. J. Grefenstette, editor, Proceedings of the Second International Conference on Genetic Algorithms, pages 177-183. Lawrence Erlbaum Associates, 1987.

32. A. Thompson, P. Layzell, and R. S. Zebulum. Explorations in design space: Unconventional electronics design through artificial evolution. IEEE Trans. Evol. Comp., 3(3):167-196, 1999.
7 Parallel Evolution Strategies

GÜNTER RUDOLPH†
Parsytec Solutions GmbH, Erkelenz, Germany
7.1 INTRODUCTION

The term 'Evolution Strategy' (ES) denotes a branch of Evolutionary Algorithms (EAs) that emerged and initially developed independently [22, 23, 34] from other branches like Evolutionary Programming (EP) [11] and Genetic Algorithms (GAs) [16]. In their contemporary form the typical field of application of ES and EP focuses on parameter optimization with real variables [9, 10, 25, 35]. Therefore the scope of this chapter comprises parallel EAs with individuals encoded by vectors of floating-point numbers. Not surprisingly, the architectural design of parallel ES/EP hardly differs from parallel GAs. But due to the characteristic feature of a self-adaptive control of the mutation strength, there are parallel designs working fine for GAs (with fixed mutation strength) for which parallel ES/EP fail. Such cases and their cure will be addressed in the course of this chapter. Historically, probably the first non-sequential computer running an ES was the vector computer CYBER 205 [3] in 1984. This may be seen as the beginning of the era of parallel ES/EP. With the advent of affordable parallel hardware in the late 1980s the dream of exploiting the inherent parallelism of EAs came true. First parallel versions of ES were realized in a local area network [17], on Parsytec transputer systems under OCCAM [4] or under C/Helios [27, 28], and on the Connection Machine 2 (CM2) under C* [30]. Parallel versions of EP were implemented on an Intel Paragon iPSC/860 [6] and a MasPar MP-1 [20]. Remarkably, none of these early parallel versions was a pure parallelization of the sequential version, despite the parallelism inherent to a population of individuals. Rather, the new parallel designs were closely aligned to the parallel target hardware. As a result, the availability of parallel hardware with different architectures had led to entirely new EAs!
†The views expressed in this chapter are those of the author and do not reflect the official policy or position of Parsytec Solutions GmbH.
According to Flynn's well-known classification of parallel hardware [8], the designs of parallel ES/EP were tailored to SIMD (MP-1, CM2) and MIMD (Paragon, transputer) computers. This is why the taxonomy of parallel EAs proposed in [15, 14] distinguished between parallelism at the level of subpopulations (MIMD) and at the level of individuals (SIMD) and between synchronous and asynchronous communication. Viewed from the biological angle, the parallel models can be categorized into migration (MIMD) and pollination (SIMD) models. Other terms commonly used for the latter model are neighborhood or diffusion model. The presentation of parallel ES/EP given in Section 7.4 follows this classification, too. Section 7.2 is devoted to the question of under which circumstances a parallelization of randomized optimization algorithms is useful at all.
7.2 DEPLOYMENT SCENARIOS OF PARALLEL EVOLUTIONARY ALGORITHMS

The utility of a parallel deterministic optimization is evident: since the deterministic algorithm is run only once, the parallel version delivers the solution more rapidly. In the case of randomized optimization algorithms like EAs, the situation changes. Moreover, our typical measures for speedup and efficiency of parallel versions must be changed [1]. Since randomized algorithms are usually run multiple times, it is advisable to verify that the burden of developing a parallel randomized algorithm is worth the effort. This will be exemplified for two typical deployment scenarios.

7.2.1 Scenario: Run EA Multiple Times, Choose Best Solution Found
In practice, nobody runs a randomized algorithm like an EA only once. Rather, the EA is run multiple times and the best solution found within some time limit is used.

7.2.1.1 Fixed Generation Number. Let t be the running time of the sequential algorithm and t_p = ct/p the running time of the parallelized sequential algorithm, where c > 1 aggregates the communication and other overhead costs of the parallelized version. Let n be the maximum number of times we can run the EA before we must use the best solution found, and assume that n = p, where p is the number of processors. Then T = t is the total running time of running the sequential algorithm on p processors in parallel. Since the total running time of p successive runs of the parallelized version is T_p = p · t_p = ct, we can see easily that nothing is gained by a parallelization. Even worse, every effort invested in this task is a waste of resources.

7.2.1.2 Random Generation Number. The situation changes if the running time of the EA is a random variable. Let T be the random running time of the sequential algorithm and T_p = cT/p the running time of the parallelized sequential algorithm with c > 1. Again, assume n = p. Then the random total running time R of running
the sequential algorithm on p processors in parallel is
R = max{ T(1), T(2), ..., T(p) } = T_{p:p},

where T(i) is the running time at processor i. Clearly, the T(i) are independent and identically distributed. Assume that T(i) is normally distributed with mean t > 0 and variance σ². The expectation of R can be approximated [5] via

E[R] = E[T_{p:p}] ≈ E[T] + D[T] · √(2 ln p).

The random total running time R_p of p successive runs of the parallelized version is given by

R_p = Σ_{i=1}^{p} T_p(i) = (c/p) · Σ_{i=1}^{p} T(i)

with expectation

E[R_p] = c · E[T].

Thus, the parallelized version is faster if

c < 1 + (D[T]/E[T]) · √(2 ln p).

In other words, the larger the coefficient of variation v = D[T]/E[T], the larger the benefit achieved by the parallelization of the sequential algorithm! As seen from this analysis, this scenario can be an appropriate field of deployment of parallelized EAs.
7.2.2 Run Until Satisfactory Solution

One might argue that the previous scenario is not always the case. For example, if we need only a satisfactory solution, then we can stop the EA as soon as such a solution has been detected. In principle, this can happen in a single run of the EA.
7.2.2.1 Fixed Generation Number. As in the previous scenario, let t be the running time of the sequential algorithm and t_p = ct/p the running time of the parallelized sequential algorithm with c > 1. Suppose there exists a success probability s ∈ (0,1) for each run of the EA such that the random variable G represents the number of runs until a successful run occurs. The random variable G has a geometrical distribution with probability function

P{ G = k } = s · (1 − s)^(k−1)

for k = 1, 2, ... and s ∈ (0,1) with

E[G] = 1/s   and   D²[G] = (1 − s)/s².
The time until a successful run occurs on a single processor is S = t G. Therefore, the random total running time R of running the sequential algorithm on p processors in parallel is
R = min{ S(1), S(2), ..., S(p) } = S_{1:p} = t · G_{1:p},

where G_{1:p} denotes the minimum of p independent and identically distributed geometrical random variables. According to [45] we have

E[G_{1:p}] = 1/(1 − (1 − s)^p)   and   D²[G_{1:p}] = (1 − s)^p / [1 − (1 − s)^p]²

such that

E[R] = t · E[G_{1:p}] = t/(1 − (1 − s)^p).

The random total running time R_p of p successive runs of the parallelized version is given by
R_p = t_p · G = (c/p) · t · G = (c/p) · S

with expectation

E[R_p] = (c/p) · E[S] = ct/(sp).

Since

E[R_p] < E[R]  ⟺  c < sp/(1 − (1 − s)^p),

there are constellations in which a parallelized version is useful.
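A short numerical illustration of these expectations may be helpful: for a per-run time t, per-run success probability s, p processors, and overhead factor c, one can compare the expected time of the sequential algorithm replicated on p processors (stopping at the first success) with the expected time of repeated parallelized runs. The sketch below is illustrative only; the function names are not from the chapter.

```python
def expected_replicated(t, s, p):
    # E[R] = t * E[G_{1:p}] = t / (1 - (1 - s)^p)
    return t / (1 - (1 - s) ** p)

def expected_parallelized(t, s, p, c):
    # E[R_p] = (c/p) * t * E[G] = c * t / (s * p)
    return c * t / (s * p)

if __name__ == "__main__":
    t, s, p = 100.0, 0.2, 8
    for c in (1.2, 2.0, 4.0):
        print(c, expected_replicated(t, s, p), expected_parallelized(t, s, p, c))
```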
7.2.2.2 Random Generation Number. Let T(i) be the random running time of run i. Then

S = Σ_{i=1}^{G} T(i)

is the random time until the first successful run on a single processor. Since G is a stopping time, we have E[S] = E[G] · E[T]. As a consequence, the random total running time R of running the sequential algorithm on p processors in parallel is
R = min{ S(1), S(2), ..., S(p) } = S_{1:p}

with

E[R] = E[S_{1:p}] < E[S] = E[T] · E[G].
The random total running time R_p of p contiguous runs of the parallelized version is given by

R_p = Σ_{i=1}^{G} T_p(i) = (c/p) · Σ_{i=1}^{G} T(i)

with

E[R_p] = (c/p) · E[T] · E[G] = ct/(sp).

Since certainly E[R] ≥ E[S]/p, a parallelization under this scenario is worth considering as well.
7.3 SEQUENTIAL EVOLUTIONARY ALGORITHMS

The similarities and 'historical' differences between ES and EP are described in [2]. Contemporary versions of ES and EP deploy the same self-adaptive control of the mutation strength, and they differ only in the selection method and in the renunciation of recombination operators in the case of EP. The general skeleton of ES/EP is given below:
Algorithm 1. ES/EP Algorithm
initialize population of μ individuals and evaluate them
repeat
    generate λ offspring from μ parents
    evaluate offspring
    select μ new parents from offspring (and possibly parents)
until stopping criterion fulfilled
output: best individual found
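As a concrete illustration of this skeleton, the following minimal Python sketch implements a (μ,λ)-ES with comma selection, a single self-adaptively mutated step size, and the sphere function as a stand-in objective; all parameter values are illustrative and the code is not meant to reproduce any particular ES variant from the literature.

import numpy as np

def es_comma(f, dim, mu=5, lam=35, generations=200, seed=0):
    """Minimal (mu, lambda)-ES with log-normal self-adaptation of one step size."""
    rng = np.random.default_rng(seed)
    tau = 1.0 / np.sqrt(2.0 * dim)
    # each individual: (search point x, step size sigma, fitness)
    parents = [(x, 1.0, f(x)) for x in (rng.standard_normal(dim) for _ in range(mu))]
    best = min(parents, key=lambda ind: ind[2])
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, s, _ = parents[rng.integers(mu)]               # pick a parent
            s_new = s * np.exp(tau * rng.standard_normal())   # mutate the step size
            x_new = x + s_new * rng.standard_normal(dim)      # mutate the object variables
            offspring.append((x_new, s_new, f(x_new)))
        offspring.sort(key=lambda ind: ind[2])
        parents = offspring[:mu]                              # comma selection
        best = min(best, parents[0], key=lambda ind: ind[2])
    return best

x_best, sigma_best, f_best = es_comma(lambda x: float(np.sum(x * x)), dim=10)
print(f_best)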
7.4 PARALLEL EVOLUTIONARY ALGORITHMS

7.4.1 Parallelization of Sequential Algorithm

The parallelism inherent in the sequential ES/EP can be found in the generation and evaluation of offspring. The nature of generational selection procedures requires that all offspring have been generated and evaluated before selection can begin. Therefore, an obvious approach to parallelize the sequential version on MIMD computers is given by the farmer/worker model: One distinguished processor is the farmer that runs the main logic of the algorithm and distributes the work packages to the worker processors. In the case of ES/EP the farmer generates the initial population and evaluates it by sending each individual to an idle worker. The worker evaluates the individual and returns its fitness value to the farmer. After the initialization the farmer sequentially generates a new offspring by random variation (mutation and possibly recombination) and sends it to an idle worker. A concurrent thread of the farmer process is responsible for collecting the fitness values delivered by the workers. If
the farmer has sent all offspring to the workers, it waits until the concurrent thread signals that all offspring have been evaluated. Now the farmer applies a selection method in ES or EP fashion to determine the parents of the next iteration. Needless to say, this approach leads to an efficient parallel ES/EP if the evaluation of the fitness function requires much time. In light of Section 7.2 a parallelization is useful if the evaluation of the objective/fitness function requires non-constant time. This may be the case if the evaluation requires an adaptive numerical integration procedure. In [7] a (μ,λ)-ES has been parallelized for optimizing a multibody system requiring numerical time integration of the equations of motion. The PEPNet system [26] uses a parallel version of EP to optimize artificial neural networks. The farmer generates the initial population before sending groups of individuals to the workers. Now each worker runs a sequential EP for a prescribed number of generations and returns its group of evolved individuals. After all workers have delivered their evolved group of individuals the farmer selects the best solution for output and halts. Essentially, this is nothing more than running p instances of the sequential EP on p processors in parallel. Since the number of iterations is fixed and fitness evaluation needs almost constant time, this approach is actually the most efficient one (as can be seen from Section 7.2).
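The farmer/worker decomposition described above can be sketched with Python's multiprocessing module. Here only the fitness evaluations are farmed out to the workers while the farmer keeps the main ES/EP loop; expensive_fitness is a hypothetical placeholder for a costly objective such as a numerical time integration.

import numpy as np
from multiprocessing import Pool

def expensive_fitness(x):
    # Placeholder for a costly evaluation (e.g., numerical time integration).
    return float(np.sum(x * x))

def farmer_generation(pool, parents, lam, rng):
    """Farmer side of one generation: generate offspring, let the workers evaluate them."""
    mu = len(parents)
    offspring = [parents[rng.integers(mu)] + 0.1 * rng.standard_normal(parents[0].shape)
                 for _ in range(lam)]
    fitnesses = pool.map(expensive_fitness, offspring)   # workers evaluate in parallel
    ranked = sorted(zip(fitnesses, offspring), key=lambda pair: pair[0])
    return [x for _, x in ranked[:mu]]                   # comma selection by the farmer

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parents = [rng.standard_normal(10) for _ in range(5)]
    with Pool(processes=4) as pool:                      # the worker processes
        for _ in range(50):
            parents = farmer_generation(pool, parents, lam=35, rng=rng)
    print(expensive_fitness(parents[0]))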
7.4.2 Migration Model

In the migration model the population is divided into p subpopulations, where p is the number of processors available. Each subpopulation runs a standard ES/EP and exchanges individuals from time to time. In the design phase one must make decisions about the

1. migration paths (which subpopulations exchange individuals?),
2. migration frequency (how often does migration take place?),
3. number of migrants (how many individuals are exchanged?),
4. selection policy for emigrants (which individuals leave the subpopulation?),
5. integration policy for immigrants (how are the arriving individuals integrated?).

A minimal sketch of such an island model is given below.
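The following island-model sketch illustrates these design decisions for a unidirectional ring (in the spirit of the model in [12]: every kth generation a copy of the best individual replaces the worst individual of the successor subpopulation); all names and parameter values are illustrative, not those of any published system.

import numpy as np

def migrate_ring(islands, k_best=1):
    """One migration step on a unidirectional ring: each island sends copies of its
    k_best individuals to its successor; the immigrants replace the worst there."""
    p = len(islands)
    emigrants = [sorted(isl, key=lambda ind: ind[1])[:k_best] for isl in islands]
    for i in range(p):
        target = islands[(i + 1) % p]
        target.sort(key=lambda ind: ind[1])
        target[-k_best:] = [(x.copy(), fit) for x, fit in emigrants[i]]
    return islands

def evolve(island, f, rng, lam=20):
    """One (mu, lambda)-ES generation inside a single island (subpopulation)."""
    mu = len(island)
    offspring = []
    for _ in range(lam):
        x, _ = island[rng.integers(mu)]
        y = x + 0.1 * rng.standard_normal(x.shape)
        offspring.append((y, f(y)))
    return sorted(offspring, key=lambda ind: ind[1])[:mu]

if __name__ == "__main__":
    f = lambda x: float(np.sum(x * x))
    rng = np.random.default_rng(0)
    islands = [[(x, f(x)) for x in (rng.standard_normal(10) for _ in range(5))]
               for _ in range(4)]
    for gen in range(1, 101):
        islands = [evolve(isl, f, rng) for isl in islands]
        if gen % 10 == 0:                   # migration frequency: every 10th generation
            islands = migrate_ring(islands)
    print(min(fit for isl in islands for _, fit in isl))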
In [28] the subpopulations were arranged in a bidirectional ring. Migrations took place every kth generation and a prefixed number of individuals were sent to each neighboring subpopulation. The individuals with rank 1, 3, 5, . . . were sent to the left whereas the individuals with rank 2, 4, 6, . . . were sent to the right neighboring subpopulation. The immigrants took the empty slots of the emigrants and the next generation began with recombination and mutation. A variant of this ES was proposed in [33]: the immigrants remain unchanged until new immigrants arrive. The idea was to give them a chance to establish their genes in the new environment. Apparently, this idea proved useful for the application (mixed-integer optimization of the coating of optical multi-layer filters).
In the parallel EP described in [6] the subpopulations were placed at the vertices of a d-dimensional hypercube with d = 6. Migration took place after each generation as follows: Each subpopulation was divided into d + 1 groups of individuals by random selection. One group stayed at the processor whereas the other d groups were sent to the neighboring subpopulations. The immigrants took the place of the emigrants. The subpopulations of the parallel ES proposed in [12] were arranged in a ring, but the migration paths were unidirectional. Every kth generation a copy of the best individual emigrates and the immigrant replaces the worst individual in the subpopulation. Notice that the emigrants in the previous versions actually left the processors. Here, only copies are sent. The parallel ES presented in [21] divided the population into p subpopulations on p processors. One distinguished processor is the ‘master’ with additional responsibilities. Every kth generation each subpopulation selects its γ best individuals and sends copies of them to the master before starting the next iteration. A concurrent thread at the master asynchronously receives (p − 1)·γ individuals. As soon as all individuals have been collected the master selects the γ best of them and broadcasts them to all subpopulations. A concurrent thread at each subpopulation receives the γ individuals from the master and replaces the γ worst by those individuals just received. A similar approach was realized in the parallel ES/GP hybrid developed in [44]. The blackboard architecture was used to ‘publish’ about 10% of the best individuals of each subpopulation. The topology of the migration paths of such approaches is the fully connected graph with p nodes. It seems plausible that the topology of the migration network has an important impact on the diversity of the gene pool: It is expected that smaller diameters increase the risk of premature convergence, but systematic studies are apparently not available. The most recent migration model was given in [40]. As in [12] the population was arranged in a unidirectional ring. Each subpopulation ran a non-standard EP: Each parent generates two intermediate offspring. The first has normally distributed mutations whereas the second has Cauchy-distributed mutations. Both intermediate offspring are evaluated and the better of them becomes the offspring. Thus, the variation step is already one generation of a (1,2)-ES. Every kth generation the γ best or randomly selected individuals are sent to the other node. The immigrants either replace the worst or randomly selected individuals. The EVA toolbox [41] also contains a parallel ES according to the migration model, but no details are given. Migration models were also simulated on sequential computers. For example, experiments with 100 communicating (1,10)-ES are reported in [18].
7.4.3 Pollination Model

The pollination model assumes that the individuals do not move but that the genetic information is spread by means of pollination. This situation is closely matched by a SIMD computer. Each individual is placed on a processor and the interaction takes place in its neighborhood defined by an underlying graph structure. The most obvious neighborhood graph on a SIMD computer is the 2D torus. But the tight coupling between parallel ES/EP design and target hardware was broken in [37] by mapping a pollination model to a MIMD computer. The advantages are twofold: First, the population size on a SIMD computer is limited to multiples of the number of processors. Second, the population size can be scaled easily to achieve high efficiency on a MIMD computer. This idea was adopted in the parallel pollination ES described in [29], where each individual selects a mate for recombination from its neighborhood. After recombination the offspring is mutated and evaluated. The offspring replaces its parent at the current location if it is better.
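A compact sketch of such neighborhood-based interaction on a ring of cells is given below (each cell recombines with the best individual within radius r and keeps the offspring only if it improves on the resident); it is an illustrative skeleton rather than the algorithm of [29] or [38].

import numpy as np

def pollination_step(cells, f, rng, r=2):
    """One synchronous generation of a cellular ES on a ring: each cell mates with the
    best individual in its radius-r neighborhood; the offspring replaces the cell's
    resident only if it is better."""
    n = len(cells)
    new_cells = []
    for i, (x, fit) in enumerate(cells):
        neigh = [cells[(i + d) % n] for d in range(-r, r + 1) if d != 0]
        mate = min(neigh, key=lambda ind: ind[1])[0]
        child = 0.5 * (x + mate) + 0.1 * rng.standard_normal(x.shape)  # recombine + mutate
        child_fit = f(child)
        new_cells.append((child, child_fit) if child_fit < fit else (x, fit))
    return new_cells

if __name__ == "__main__":
    f = lambda x: float(np.sum(x * x))
    rng = np.random.default_rng(0)
    xs = [rng.standard_normal(10) for _ in range(32)]
    cells = [(x, f(x)) for x in xs]
    for _ in range(200):
        cells = pollination_step(cells, f, rng)
    print(min(fit for _, fit in cells))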
Fig. 7.1 Implementing a pollination model on a MIMD computer.
Figure 7.1 is intended to illustrate how the pollination model algorithm has been mapped to the MIMD machine. After initialization of the core and border population cells the border cells are sent to the neighboring subpopulations. Now all relevant information for one iteration is known on each processor and the generation and evaluation of new sample points for each cell placed on one processor can be computed
sequentially. If this is done, the border cells are sent to the neighboring processors again and the next iteration can be computed. This parallel ES was also realized on a true SIMD computer as reported in [30]. Another example of running the pollination model on MIMD computers is given in [38]. Here, the underlying topology is a ring, but the neighborhood of each individual has a certain radius, i.e., r individuals to the right and left. At each iteration and for each processor the two best individuals are taken from the processor’s neighborhood. The selected individuals are recombined to an offspring that is mutated and evaluated. If it is better than the individual at the processor, it replaces that individual. Another version generates multiple offspring per processor and accepts only the best of them. A parallel EP was realized on a MasPar MP-1 [20], which is also the target machine of the pollination model realized in the EVA toolbox [41]; unfortunately, no details are given. An extension of the pollination model was presented in [43]. Here, the static neighborhood structure was replaced by a dynamic one. Now one needs rules for establishing (coupling) and cutting (decoupling) a connection. A connection to the neighbor is detached if recombination with that neighbor yields an offspring that is worse than the best offspring generated at the processor during the current generation. If the number of connections of an individual falls below a certain threshold, a certain number of new connections to randomly chosen individuals are established. Notice that individuals cannot become isolated, in contrast to groups of connected individuals. But this kind of isolation ends as soon as a randomly generated connection is established by the coupling rule. Needless to say, due to its lack of locality this algorithm is not well suited for SIMD machines. Rather, this approach is tailored for multiprocessor machines with shared memory.
7.4.4 Non-generational Models

Both the migration and pollination models represent generational ES/EP. All offspring must be generated and evaluated before selection can take place. Non-generational or steady-state or (μ + 1)-ES do not have this potential synchronization bottleneck. Moreover, this model is well suited for multi-processor computers with shared memory. But there is a drawback: The self-adaptive control of mutation strength does not work as desired. This problem was addressed by [42] and [31]. Here, we focus on the latter solution. After the generation and evaluation of the initial population of size μ, each of the p processors selects the best and worst unblocked individual and blocks them. The best individual generates an offspring by mutation and is unblocked. The offspring is evaluated and replaces the worst individual, whose slot is then unblocked. The special handling of the mutation control was developed in [32]. A reproduction counter is added to each individual. If an individual did not generate a better offspring in R/2 reproduction events, then the step size is considered wrong and it is adjusted by taking a weighted average with the mutation strength of the grandparent: Something that was good in the past cannot be completely wrong in the near future. If this adjustment also does not lead to a better offspring within R/2 further reproduction events, then the individual will be discarded from the population. This kind of control of mutation strength seems to work for non-generational ES.

7.4.5 Nested Populations
The shorthand notation (μ +, λ)-ES was extended in [24] to the expression

[ μ′ +, λ′ ( μ +, λ )^γ ]^{γ′}-ES

with the following meaning: There are μ′ populations of μ parents each. These are used to generate (e.g., by merging) λ′ initial populations of μ individuals each. For each of these λ′ populations a (μ +, λ)-ES is run for γ generations. The criterion to rank the λ′ populations after termination might be the average fitness of the individuals in each population. This scheme is repeated γ′ times. The obvious generalization to higher levels of nesting is described in [25], where it is also attempted to develop a shorthand notation to specify the parametrization completely. This nesting technique is of course not limited to evolution strategies: other evolutionary algorithms and even mixtures of them can be used instead. In fact, the somewhat artificial distinction between ES, GA, and EP becomes more and more blurred when higher concepts enter the scene. It is reported in [41] that the EVA toolbox provides a parallel implementation of nested populations. This opens the door to several fields of application:
1. Alternative method to control internal parameters. Herdy (1992) [13] used λ′ subpopulations, each of them possessing its own different and fixed step size σ. Thus, there is no step size control at the level of individuals. After γ generations the improvements (in terms of fitness) achieved by the subpopulations are compared to each other and the best μ′ subpopulations are selected. Then the process repeats with slightly modified values of σ. Since subpopulations with a near-optimal step size will achieve larger improvements, they will be selected (i.e., better step sizes will survive), resulting in an alternative method to control the step size.

2. Mixed-integer optimization. Lohmann (1992) [19] considered optimization problems in which the decision variables are partially discrete and partially continuous. The nested approach worked as follows: The evolution strategies in the inner loop optimized over the continuous variables while the discrete variables were held fixed. After termination of the inner loop, the EA in the outer loop compared the fitness values achieved in the subpopulations, selected the best ones, mutated the discrete variables, and passed them as fixed parameters to the subpopulations in the inner loop. It should be noted that this approach to mixed-integer optimization may cause some problems: In essence, a Gauss-Seidel-like optimization strategy
is realized, because the search alternates between the subspace of discrete variables and the subspace of continuous variables. Such a strategy must fail whenever simultaneous changes in discrete and continuous variables are necessary to achieve further improvements.
3. Minimax optimization. Sebald and Schlenzig (1994) [36] used nested optimization to tackle minimax problems of the type

min{ max{ f(x, y) : y ∈ Y } : x ∈ X },

where X ⊆ R^n and Y ⊆ R^m. Equivalently, one may state the problem as

min{ g(x) : x ∈ X },

where

g(x) = max{ f(x, y) : y ∈ Y }.

The evolutionary algorithm in the inner loop maximizes f(x, y) with x held fixed, while the outer loop is responsible for minimizing g(x) over the set X. A toy sketch of this nesting is given below.
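The following nested-loop sketch illustrates the minimax idea; the objective f, the search ranges, and the simple hill-climbing loops are illustrative stand-ins and not the setup used in [36].

import numpy as np

rng = np.random.default_rng(0)

def f(x, y):                       # illustrative coupled objective
    return float(np.sum((x - y) ** 2) - 0.5 * np.sum(y ** 2))

def inner_max(x, dim=3, iters=100):
    """Inner loop: evolve y to (approximately) maximize f(x, y) for fixed x."""
    y = rng.uniform(-1, 1, dim)
    for _ in range(iters):
        cand = np.clip(y + 0.1 * rng.standard_normal(dim), -1, 1)
        if f(x, cand) > f(x, y):
            y = cand
    return f(x, y)                 # approximation of g(x)

def outer_min(dim=3, iters=100):
    """Outer loop: evolve x to minimize g(x) = max_y f(x, y)."""
    x = rng.uniform(-1, 1, dim)
    gx = inner_max(x)
    for _ in range(iters):
        cand = np.clip(x + 0.1 * rng.standard_normal(dim), -1, 1)
        g_cand = inner_max(cand)
        if g_cand < gx:
            x, gx = cand, g_cand
    return x, gx

print(outer_min())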
7.5 CONCLUSIONS

The most popular parallelization model for ES/EP in the past was the migration model. Today, migration and pollination models may be seen as extreme cases: as shown in [39] both models can be integrated into a general population model. More recent parallel realizations of ES/EP move in that direction. An important task for the future is the systematic test of the different parallel versions of ES and EP. Even if an analysis of the problem and deployment scenario supports the use or development of a parallel ES/EP, one is unable to recommend a certain class. Summing up: there is a need to extend and consolidate our knowledge about parallel ES/EP.
REFERENCES

1. J. Aczél and W. Ertel. A new formula for speedup and its characterization. Acta Informatica, 34:637-652, 1997.
2. T. Bäck, G. Rudolph, and H.-P. Schwefel. Evolutionary programming and evolution strategies: Similarities and differences. In D. B. Fogel and W. Atmar, editors, Proceedings of the 2nd Annual Conference on Evolutionary Programming, pages 11-22. Evolutionary Programming Society, La Jolla (CA), 1993.
3. U. Bernutat-Buchmann and J. Krieger. Evolution strategies in numerical optimization on vector computers. In Feilmeier, Joubert, and Schendel, editors, Proceedings of the Int'l Conf. on Parallel Computing '83, pages 42-51. Elsevier, Amsterdam, 1984.
4. A. Bormann. Parallelisierungsmöglichkeiten für direkte Optimierungsverfahren auf Transputersystemen. Diplomarbeit, University of Dortmund, Department of Computer Science, 1989.
5. H. A. David. Order Statistics. Wiley, New York, 1970.
6. B. S. Duncan. Parallel evolutionary programming. In D. B. Fogel and W. Atmar, editors, Proceedings of the 2nd Annual Conference on Evolutionary Programming, pages 202-209. Evolutionary Programming Society, La Jolla (CA), 1993.
7. P. Eberhard, F. Dignath, and L. Kübler. Parallel evolutionary optimization of multibody systems with application to railway dynamics. Multibody System Dynamics, 9(2):143-164, 2003.
8. M. J. Flynn. Very high-speed computing systems. Proceedings of the IEEE, 54:1901-1909, 1966.
9. D. B. Fogel. Evolving Artificial Intelligence. PhD thesis, University of California, San Diego, 1992.
10. D. B. Fogel. Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, New York, 1995.
11. L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence through Simulated Evolution. Wiley, New York, 1966.
12. H. Füger, G. Stein, and V. Stilla. Multi-population evolution strategies for structural image analysis. In Proceedings of the First IEEE Conference on Evolutionary Computation (ICEC '94), pages 229-234. IEEE Press, Piscataway (NJ), 1994.
13. M. Herdy. Reproductive isolation as strategy parameter in hierarchically organized evolution strategies. In R. Männer and B. Manderick, editors, Parallel Problem Solving from Nature, 2, pages 207-217. North Holland, Amsterdam, 1992.
14. F. Hoffmeister. Scalable parallelism by evolutionary algorithms. In M. Grauer and D. B. Pressmar, editors, Applied Parallel and Distributed Optimization, pages 175-198. Springer, Berlin, 1991.
15. F. Hoffmeister and H.-P. Schwefel. A taxonomy of evolutionary algorithms. In G. Wolf, T. Legendi, and U. Schendel, editors, Proceedings of the International Workshop on Parallel Processing by Cellular Automata and Arrays (Parcella '90), pages 97-107. Akademie-Verlag, Berlin, 1990.
16. J. H. Holland. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, 1975.
17. R. Kottkamp. Nichtlineare Optimierung unter Verwendung verteilter, paralleler Prozesse in einem Local Area Network (LAN). Diploma thesis, University of Dortmund, Department of Computer Science, February 1989.
18. R. Lohmann. Application of evolution strategies in parallel populations. In H.-P. Schwefel and R. Männer, editors, Parallel Problem Solving from Nature, pages 198-208. Springer, Berlin and Heidelberg, 1991.
19. R. Lohmann. Structure evolution and incomplete induction. In R. Männer and B. Manderick, editors, Parallel Problem Solving from Nature, 2, pages 175-185. North Holland, Amsterdam, 1992.
20. K. M. Nelson. Function optimization and parallel evolutionary programming on the MasPar MP-1. In A. V. Sebald and L. J. Fogel, editors, Proceedings of the 3rd Annual Conference on Evolutionary Programming. World Scientific, River Edge (NJ), 1994.
21. R. S. Pereira, O. R. Saavedra, and O. A. Carmona. Parallel distributed evolution strategies. In Proceedings of the 5th Brazilian Symposium on Intelligent Automation, pages 1034-1040. Porto Alegre, 2001.
22. I. Rechenberg. Cybernetic solution path of an experimental problem. Royal Aircraft Establishment, Library Translation No. 1122, Farnborough, Hants., UK, 1965.
23. I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart, 1973.
24. I. Rechenberg. Evolutionsstrategien. In B. Schneider and U. Ranft, editors, Simulationsmethoden in der Medizin und Biologie, pages 83-114. Springer, Berlin, 1978.
25. I. Rechenberg. Evolutionsstrategie '94. Frommann-Holzboog Verlag, Stuttgart, 1994.
26. G. A. Riessen, G. J. Williams, and X. Yao. PEPNet: Parallel evolutionary programming for constructing artificial neural networks. In Proceedings of the Sixth Annual Conference on Evolutionary Programming (EP97), pages 35-45. Springer, Berlin, 1997.
27. G. Rudolph. Globale Optimierung mit parallelen Evolutionsstrategien. Diplomarbeit, University of Dortmund, Department of Computer Science, July 1990.
28. G. Rudolph. Global optimization by means of distributed evolution strategies. In H.-P. Schwefel and R. Männer, editors, Parallel Problem Solving from Nature, pages 209-213. Springer, Berlin and Heidelberg, 1991.
29. G. Rudolph. Parallel approaches to stochastic global optimization. In W. Joosen and E. Milgrom, editors, Parallel Computing: From Theory to Sound Practice, Proceedings of the European Workshop on Parallel Computing (EWPC 92), pages 256-267. IOS Press, Amsterdam, 1992.
30. G. Rudolph. Massively parallel simulated annealing and its relation to evolutionary algorithms. Evolutionary Computation, 1(4):361-382, 1994.
31. T. P. Runarsson. An asynchronous parallel evolution strategy. International Journal of Computational Intelligence & Applications, 3(4):381-394, 2003.
32. T. P. Runarsson and X. Yao. Continuous selection and self-adaptive evolution strategies. In Proceedings of the 2002 Congress on Evolutionary Computation (CEC 2002), pages 279-284. IEEE Press, Piscataway (NJ), 2002.
33. M. Schütz and J. Sprave. Application of parallel mixed-integer evolution strategies with mutation rate pooling. In L. J. Fogel, P. J. Angeline, and T. Bäck, editors, Proceedings of the 5th Annual Conference on Evolutionary Programming, pages 345-354. MIT Press, Cambridge (MA), 1996.
34. H.-P. Schwefel. Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie. Birkhäuser, Basel, 1977.
35. H.-P. Schwefel. Evolution and Optimum Seeking. Wiley, New York, 1995.
36. A. V. Sebald and J. Schlenzig. Minimax design of neural net controllers for highly uncertain plants. IEEE Transactions on Neural Networks, 5(1):73-82, 1994.
37. J. Sprave. Parallelisierung Genetischer Algorithmen zur Suche und Optimierung. Diplomarbeit, University of Dortmund, Department of Computer Science, 1990.
38. J. Sprave. Linear neighborhood evolution strategies. In A. V. Sebald and L. J. Fogel, editors, Proceedings of the 3rd Annual Conference on Evolutionary Programming, pages 42-51. World Scientific, River Edge (NJ), 1994.
39. J. Sprave. A unified model of non-panmictic population structures in evolutionary algorithms. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC 1999), volume 2, pages 1384-1391. IEEE Press, Piscataway (NJ), 1999.
40. S. Tongchim and X. Yao. Parallel evolutionary programming. In Proceedings of the 2004 Congress on Evolutionary Computation (CEC 2004), pages 1362-1367. IEEE Press, Piscataway (NJ), 2004.
41. J. Wakunda and A. Zell. EVA - A tool for optimization with evolutionary algorithms. In Proceedings of the 23rd EUROMICRO Conference, pages 644-651, 1997.
42. J. Wakunda and A. Zell. Median-selection for parallel steady-state evolution strategies. In M. Schoenauer, J. J. Merelo, K. Deb, G. Rudolph, X. Yao, E. Lutton, and H.-P. Schwefel, editors, Proceedings of the 6th International Conference on Parallel Problem Solving from Nature (PPSN 2000), pages 405-414. Springer, Berlin, 2000.
43. K. Weinert, J. Mehnen, and G. Rudolph. Dynamic neighborhood structures in parallel evolution strategies. Complex Systems, 13(3):227-243, 2002.
44. K. Weinert, T. Surmann, and J. Mehnen. Parallel surface reconstruction. In J. A. Foster, E. Lutton, J. Miller, C. Ryan, and A. G. B. Tettamanzi, editors, Proceedings of the Fifth European Conference on Genetic Programming (EuroGP 2002), pages 113-122. Springer, Berlin and Heidelberg, 2002.
45. D. H. Young. The order statistics of the negative binomial distribution. Biometrika, 57(1):181-186, 1970.
Parallel Ant Colony Algorithms
STEFAN JANSON, DANIEL MERKLE, MARTIN MIDDENDORF
University of Leipzig, Germany
8.1 INTRODUCTION

Ant Colony Algorithms are computational methods for solving problems that are inspired by the behavior of real ant colonies. One particularly interesting aspect of the behavior of ant colonies is that relatively simple individuals perform complicated tasks. Examples for such collective behavior are: i) the foraging behavior that guides ants on short paths to their food sources, ii) the collective transport of food where a group of ants can transport food particles that are heavier than the sum of what all members of the group can transport individually, and iii) the brood sorting behavior of ants to place larvae and eggs into brood chambers of the nest that have the best environmental conditions. In this chapter we concentrate on the Ant Colony Optimization (ACO) metaheuristic for solving combinatorial optimization problems. ACO is inspired by the foraging behavior of ants. An essential aspect thereby is the indirect communication of the ants via pheromones, i.e., chemical substances which are released into the environment and influence the behavior or the development of other individuals of the same species. In a famous biological experiment called the double-bridge experiment ([8, 18]) it was shown how trail pheromone leads ants along short paths to their food sources. In this experiment a double bridge with two branches of different lengths connected a nest of the Argentine ant with a food source. It was found that after a few minutes nearly all ants use the shorter branch. This is interesting because Argentine ants cannot see very well. The explanation of this behavior has to do with the fact that the ants lay pheromone along their path. It is likely that ants which randomly choose the shorter branch arrive earlier at the food source. When they go back to the nest they smell some pheromone on the shorter branch and therefore prefer this branch. The pheromone on the shorter branch will accumulate faster than on the longer branch so that after some time the concentration of pheromone on the former is much higher and nearly all ants take the shorter branch. Inspired by this experiment Dorigo and colleagues designed a heuristic for solving the Traveling Salesperson Problem (TSP) [11, 13] and initiated the field of ACO.
Meanwhile ACO algorithms have been designed for various application problems and different types of combinatorial optimization problems like vehicle routing problems, scheduling problems, and assignment problems (see [6, 14, 32] for an overview). The ACO metaheuristic is described in Section 8.2. Different strategies for parallelization of Ant Colony algorithms are described in Section 8.3. This section also contains an experimental part where we investigate a non-centralized parallel ACO for a network of workstations that offer different computational power for the ACO. Hardware parallelization of ACO is considered in Section 8.4. In Section 8.5 some other ant colony inspired parallel algorithms are considered.

8.2 ANT COLONY OPTIMIZATION
The general idea of the ACO metaheuristic is to let artificial ants construct solutions for a given combinatorial optimization problem (see [14] for detailed information about ACO). Typically for ACO an ant constructs a solution by a sequence of probabilistic decisions where every decision extends a partial solution by adding a new solution component until a complete solution is derived. The sequence of decisions for constructing a solution can be viewed as a path through a corresponding decision graph. The aim is to let the artificial ants find paths through the decision graph that correspond to good solutions. This is done in an iterative process where the good solutions found by the ants of an iteration should guide the ants of following iterations. Therefore, ants that have found good solutions are allowed to mark the edges of the corresponding path in the decision graph with artificial pheromone. This pheromone guides following ants of the next iteration so that they search near the paths to good solutions. In order that pheromone from older iterations does not influence the following iterations for too long, during an update of the pheromone values some percentage of the pheromone evaporates. Thus, an ACO algorithm is an iterative process where pheromone information is transferred from one iteration to the next one. The process continues until some stopping criterion is met, e.g., a certain number of iterations has been done or a solution of a given quality has been found. A scheme of an ACO algorithm is given in the following pseudocode. Now we illustrate how the general ACO scheme can be applied to the TSP. The TSP problem is to find for a given set of n cities with distances d_ij between each pair of cities i, j ∈ [1 : n] a shortest closed tour that contains every city exactly once. Every such tour together with a start city can be characterized by the permutation of all cities as they are visited along the tour. Vice versa, each permutation of all cities corresponds to a valid solution, i.e., a closed tour. The three most important elements of the ACO scheme that constitute an ACO algorithm are described in the following, namely the use of pheromone information, the solution construction process, and the pheromone update method (a scheme for an ACO for the TSP is given at the end of this section). The pheromone information should reflect the most relevant information for the solution construction. For the TSP the pheromone information can be encoded in an
ACO scheme:
Initialize pheromone values
repeat
    for ant k ∈ {1, . . . , m}
        construct a solution
    endfor
    forall pheromone values do
        decrease the value by a certain percentage {evaporation}
    endfor
    forall pheromone values corresponding to good solutions do
        increase the value {intensification}
    endfor
until stopping criterion is met
n × n pheromone matrix [τ_ij], i, j ∈ [1 : n], where pheromone value τ_ij expresses the desirability to have city j as successor of city i in a tour. The pheromone matrix is typically initialized so that all values τ_ij with i ≠ j are the same. Note that the values τ_ii are not needed because each city is selected only once. For solution construction an ant starts with a random city and then always chooses the next city from the set S of selectable cities that have not been visited so far. This is done until no city is left. Initially, the set of selectable cities S contains all cities and after each decision the selected city is removed from S. Every decision is made randomly where the probability equals the amount of pheromone relative to the sum of all pheromone values of cities in the selection set S. Thus, the probability to select city j after city i is

p_ij = τ_ij / Σ_{h∈S} τ_ih.
For many optimization problems additional problem-dependent heuristic information can be used to give the ants additional hints for their decisions. To each pheromone value τ_ij there is defined a corresponding heuristic value η_ij. For the TSP a suitable heuristic is to prefer a next city j that is near to the current city i, e.g., by setting η_ij := 1/d_ij. The probability p_ij when using a heuristic is then

p_ij = (τ_ij^α · η_ij^β) / Σ_{h∈S} (τ_ih^α · η_ih^β),
where parameters α and β are used to determine the relative influence of pheromone values and heuristic values.
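In code, this decision rule is a weighted random choice over the not-yet-visited cities. A small sketch follows (NumPy assumed; names and parameter values are illustrative); the same rule reappears inside the fuller ACO sketch after the next scheme.

import numpy as np

def choose_next_city(i, selectable, tau, eta, alpha, beta, rng):
    """Pick the successor of city i from the selectable set with probability
    proportional to tau[i, j]**alpha * eta[i, j]**beta."""
    S = np.fromiter(selectable, dtype=int)
    weights = tau[i, S] ** alpha * eta[i, S] ** beta
    return int(rng.choice(S, p=weights / weights.sum()))

# Usage with a tiny random instance:
rng = np.random.default_rng(0)
n = 5
dist = rng.uniform(1, 10, (n, n))
tau = np.ones((n, n))          # pheromone values, initially all equal
eta = 1.0 / dist               # heuristic values eta_ij = 1 / d_ij
j = choose_next_city(0, {1, 2, 3, 4}, tau, eta, alpha=1.0, beta=2.0, rng=rng)
print(j)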
All m tours that are constructed by the ants in one iteration are evaluated according to their length and the shortest tour π* of the current iteration is determined. Then the pheromone matrix is updated in two steps:

1. Evaporation. All pheromone values are reduced by a fixed proportion ρ ∈ (0,1):

τ_ij := (1 − ρ) · τ_ij   for all i, j ∈ [1 : n]

2. Intensification. All pheromone values corresponding to the best solution π* are increased by an absolute amount Δ > 0:

τ_{i,π*(i)} := τ_{i,π*(i)} + Δ   for all i ∈ [1 : n]
It should be mentioned that several variants of pheromone updates exist. One variant is to add pheromone according to every solution in a generation but so that the amount of pheromone added depends on the quality of the solution (the better a solution is, the more pheromone is added). For this variant, called Ant System (AS), it is possible that the pheromone matrix is updated immediately when a solution or update vector is received by a master process (see [16]). A scheme for ACO for the TSP is given in the following.
ACO for TSP:
Initialize pheromone values
repeat
    for ant k ∈ {1, . . . , m} {solution construction}
        S := {1, . . . , n} {set of selectable cities}
        choose a start city i at random
        repeat
            choose city j ∈ S with probability p_ij
            S := S − {j}
            i := j
        until S = ∅
    endfor
    forall i, j do
        τ_ij := (1 − ρ) · τ_ij {evaporation}
    endfor
    forall i, j in iteration-best solution do
        τ_ij := τ_ij + Δ {intensification}
    endfor
until stopping criterion is met
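The scheme can be turned into a compact runnable sketch as follows (random Euclidean instance, illustrative parameter values, and only the iteration-best tour deposits pheromone, as in the scheme above).

import numpy as np

def aco_tsp(dist, m=20, iterations=200, alpha=1.0, beta=2.0, rho=0.1, delta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(dist)
    tau = np.ones((n, n))
    eta = 1.0 / (dist + np.eye(n))            # avoid division by zero on the diagonal
    best_tour, best_len = None, np.inf
    for _ in range(iterations):
        tours = []
        for _ in range(m):                     # each ant constructs a tour
            tour = [int(rng.integers(n))]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                S = np.fromiter(unvisited, dtype=int)
                w = tau[tour[-1], S] ** alpha * eta[tour[-1], S] ** beta
                j = int(rng.choice(S, p=w / w.sum()))
                tour.append(j)
                unvisited.remove(j)
            length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            tours.append((length, tour))
        it_len, it_tour = min(tours, key=lambda t: t[0])
        tau *= 1.0 - rho                       # evaporation
        for k in range(n):                     # intensification along the iteration-best tour
            tau[it_tour[k], it_tour[(k + 1) % n]] += delta
        if it_len < best_len:
            best_tour, best_len = it_tour, it_len
    return best_tour, best_len

rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, (15, 2))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(aco_tsp(dist)[1])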
8.2.1 Population Based ACO
For standard ACO algorithms the information that is transferred from one iteration to the next is the pheromone matrix. An alternative approach that was proposed by Guntsch and Middendorf ([19]) is to transfer a small population of good solutions to the following iteration. This approach is called population based ACO (P-ACO). The differences between P-ACO and standard ACO are described in this section for the TSP. It was shown that standard ACO and P-ACO obtain results of similar quality on the TSP ([19]). As for the standard ACO we describe some main elements of P-ACO for the TSP in more detail, namely the information transfer and the population matrix, and the construction of a pheromone matrix (see the scheme for P-ACO at the end of this section). P-ACO maintains a small population P of the k best tours that have been found in past iterations. The population can be stored in an n × k matrix P = [p_ij], where each column of P contains one tour and p_ij is the number of the ith city in tour j. This matrix is called the population matrix. The population (matrix) is updated when all ants of an iteration have constructed their tour. The best tour of the current iteration is added to P. If afterward P contains k + 1 tours, the oldest tour is removed from P. The initial population is empty and after the first k iterations the population size remains k. Hence, for an update only one column in the population matrix has to be changed. Other schemes for deciding which solutions should enter/leave the population are discussed in [20]. In P-ACO a pheromone matrix (τ_ij) is used by the ants for solution construction in the same way as in standard ACO. This matrix is derived anew in P-ACO in every iteration from the population matrix as follows. Each pheromone value is set to an initial value τ_init > 0 and is increased if there are corresponding solutions in the population:

τ_ij := τ_init + ξ_ij · Δ    (8.2)

with ξ_ij denoting the number of solutions π ∈ P with π(i) = j, i.e., ξ_ij = |{h : p_ih = j}|. Thus, the only possible pheromone values in P-ACO are τ_init, τ_init + Δ, . . . , τ_init + k · Δ. An update of the pheromone values is done implicitly by a population update, so that a solution π entering the population corresponds to a positive update τ_{i,π(i)} := τ_{i,π(i)} + Δ and a solution σ leaving the population corresponds to a negative update τ_{i,σ(i)} := τ_{i,σ(i)} − Δ. Note that a difference to standard ACO is that no evaporation is used to reduce the pheromone values at the end of an iteration. A scheme for P-ACO is given in the following.
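Deriving the pheromone matrix from the population, and the FIFO population update, can be sketched directly from Equation (8.2). The code below follows the successor interpretation of τ_ij used in Section 8.2 and simply rebuilds the matrix every iteration, which is equivalent to the implicit ±Δ updates described above; tau_init, delta, k, and the example tours are illustrative.

import numpy as np
from collections import deque

def pheromone_from_population(population, n, tau_init=1.0, delta=1.0):
    """Equation (8.2): tau_ij = tau_init + xi_ij * Delta, where xi_ij counts how many
    tours in the population use city j as the successor of city i."""
    tau = np.full((n, n), tau_init)
    for tour in population:                        # each tour: cities in visiting order
        for k in range(n):
            tau[tour[k], tour[(k + 1) % n]] += delta
    return tau

# Population update of P-ACO: the iteration-best tour enters, the oldest leaves once
# k tours are stored (a FIFO of length k); afterwards the pheromone matrix is rebuilt.
n, k = 6, 3
population = deque(maxlen=k)                       # the oldest tour drops out automatically
for iteration_best in ([0, 1, 2, 3, 4, 5], [0, 2, 1, 3, 5, 4],
                       [1, 0, 2, 4, 3, 5], [5, 4, 3, 2, 1, 0]):
    population.append(iteration_best)
    tau = pheromone_from_population(population, n)
print(tau)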
P-ACO for TSP:
P := ∅
Initialize pheromone values
repeat
    for ant k ∈ {1, . . . , m} {solution construction}
        S := {1, . . . , n} {set of selectable cities}
        choose a start city i at random
        repeat
            choose city j ∈ S with probability p_ij
            S := S − {j}
            i := j
        until S = ∅
    endfor
    if |P| = k then remove the oldest solution π̄ from the population: P := P − π̄
    determine the best solution π* of the iteration and add it to the population: P := P + π*
    compute the new pheromone matrix from P
until stopping criterion is met

8.3 PARALLEL ACO

In this section we describe and compare different approaches for parallel ACO algorithms that are suitable for networks of workstations or parallel computers. So far parallel ACO algorithms have been designed mainly for homogenous parallel systems where all processors or workstations are similar, or they follow the master-slave
paradigm where the central master distributes the work to the other processors. In the last case the master can cope with differences in speed between the workstations/processors by sending the slower processor less work. Unfortunately, not much work has been done so far on non-centralized approaches to ACO for heterogenous parallel systems, although this is very relevant for practical applications. Therefore, we concentrate on this aspect in the experimental part of this section, where we investigate a non-centralized ACO on a network of workstations which offer different computational power for the ACO due to other load. Hardware parallelization and parallelization for processor arrays which consist of many simple processing elements are considered in Section 8.4. It is common to all parallel ACO approaches studied so far in the literature that the construction of a single solution (including the evaluation of the quality of this solution) by an ant is not split between several processors but is always done on one processor. One reason is that the solution construction process in ACO is typically a sequential process and it is difficult to split it into several parts that can be done more or less independently. Clearly, it can be advantageous to parallelize the determination of the quality of a solution in cases where this is a very time-consuming process. But this has more to do with the specific application problem than with ACO in general. Thus, the minimum (suitable) grain size of parallel ACO is the construction of a single solution (however, this is different for hardware-oriented ACO algorithms that are considered in Section 8.4). This implies that a corresponding processor has to know the pheromone information and possibly the heuristic information. Most parallel ACO algorithms put more than one ant on each processor. When several ants
are placed on one processor and these ants work more closely together than with ants on other processors, this group of ants is often called a colony. In order to evaluate the quality of a parallel ACO algorithm it should be compared with a multistart ACO. This means that several runs of a sequential ACO algorithm are executed on different processors independently. Some additional profit may be gained for the multistart ACO when the sequential ACO algorithms are started with different parameter values.

Parallel ACO algorithms can be classified with respect to several criteria that are described in the following. Two of the most basic criteria are: i) Is the algorithm a parallelization of standard ACO or a specially designed parallel ACO algorithm? The aim of a parallelization of a standard ACO algorithm is to reduce the run time without changing the optimization behavior of the algorithm. In contrast, the specially designed parallel ACO algorithms try to change the standard ACO algorithm so that the parallel version works more efficiently. One approach is to not do information exchange between the processors at every iteration. This can also have a positive effect on the optimization behavior because the colonies of the processors can specialize to different regions of the search space. ii) Does the algorithm use a centralized approach or a decentralized one? Typically, in a centralized approach there is one processor that collects the solutions or the pheromone information from all other processors. Then it does the pheromone update and computes the new pheromone matrix, which is then sent to the other processors. This process on the central processor is often called the master process and the other processes are the slave processes. In a decentralized approach every processor has to compute the pheromone update by itself using information that it has received from other processors.

ACO algorithms which have several colonies of ants that use their own pheromone matrix and where the pheromone matrices of different colonies are not necessarily equal are called multicolony ACO algorithms. Originally, multicolony ACO algorithms have been designed to improve the behavior of ACO algorithms or to be used for multi-objective optimization (e.g., [17, 26, 33]). However, multicolony ACO algorithms are also well suited for parallelization because a processor can host a colony of ants and typically there is less information exchange between the colonies than there would have been between groups of ants in standard ACO. Most parallel multicolony ACO algorithms are decentralized algorithms. The parallel multicolony ACO algorithms have some similarities with island genetic algorithms. In the following we list some aspects that are relevant for the design of multicolony ACO algorithms; a small code sketch of these design choices follows the list, and a scheme for a parallel multicolony ACO is given at the end of this section:
i) Communication structure and neighborhood topology: the communication structure defines a neighborhood relation between the colonies. For example, during an information exchange step every colony sends information to all its neighboring colonies. Topologies that have been used for multicolony ACO are:
- All-to-all topology: In this topology every colony is a neighbor to every other colony.
- (Directed or undirected) ring topology: In a directed ring, colony i + 1 is the neighbor of colony i for all i ∈ [1 : p] (indices taken cyclically, so colony 1 is the neighbor of colony p). Additionally, in an undirected ring, colony i − 1 is also a neighbor of colony i for all i ∈ [1 : p].
- Hypercube topology: This topology requires that there are p = 2^k colonies and each colony i is a neighbor of colony j iff the binary representations of i and j differ by one bit. Thus, each colony has exactly log p = k neighbors.
- Random topology: In this topology the neighbors of each colony are defined randomly for each communication step. Different methods for determining random neighbors are possible, e.g., each colony has exactly one neighbor colony that is chosen with equal probability from all other colonies.

ii) Type of information that is exchanged between the colonies:
to other colonies. It should be noted that it is often advantageous to exchange also the quality of the corresponding solutions so that it has not to be recomputed in the neighboring colonies. There are different possibilities to define which solutions are exchanged:
*
Migrants: This means that the solution of a single ant from the current iteration is sent to another colony. Usually the solution of the best ant of the current iteration is sent as a migrant. Global best solution (GBest): In this strategy the global best solution is determined and sent to all the colonies. Neighborhood best solution (NBest): In this strategy the neighborhood best solution is determined and sent to all the colonies of the neighborhood. Local best solution (LBest): In this strategy the local best solution of a colony is sent to its neighbors.
- Pheromone vectors (PVector): An alternative for sending solutions, es-
pecially, when the update for a solution depends only on its quality is to exchange only the corresponding pheromone update vectors (see e.g., [16]). In the extreme case a whole pheromone update matrix is sent which corresponds to the total pheromone update of several solutions.
- Pheromone matrix (PMatrix): In this strategy the actual pheromone matrix of a colony is sent to its neighbors. iii) Usage of information that has been received from other colonies:
is compared to the current elitist solution and becomes a new elitist solution
- Compare with elitist solution: The best of the received solutions
PARALLEL ACO
179
when it is better than the old elitist solution. An alternative is to use the best of the received solutions as an elitist solution even when it is worse than the old elitist solution. - Add to pheromone matrix: When a solution has been received, the phero-
mone matrix is updated with this solution. When a pheromone matrix has been received, the new pheromone matrix of the colony is a weighted average of the received pheromone matrices and the old pheromone matrix of the colony. - Add to current generation: A solution that has been received is added
to the solutions that have been found by the ants of the current iteration. Then this whole set of solutions is used for pheromone update. Hence, when only the best solution of a colony in an iteration is allowed to update, a received solution can have an influence only when it is better than all solutions that have been found by the ants in the same iteration. iv) Communication times: - Every iteration: Information is exchanged at the end of every iteration
(but eventually before pheromone update).
- Only every fixed number of k iterations: This type of communication can be done synchronously so that all colonies exchange information at the same iteration or asynchronously so that at each iteration only n i / k colonies send their information to their neighbors.
- Solution quality dependent: For example, only when a local best solution
has been found in a colony or when its pheromone matrix has been changed significantly, does it sends information to its neighbors. v) Homogenous approach vs. heterogenous approach: In the heterogenous approach the colonies can use different ACO parameters or different heuristics. - Heterogenous within an iteration: Here, the colonies that are used within
the same iteration use different methods for solution construction and/or pheromone update. A problem with this approach is that the best solutions within an iteration are often found only in one of the colonies. When only these best solutions are allowed to update in the colonies, only one colony has an influence (whether this is the case depends of the information exchange and update strategies).
- Heterogenous between iterations: In this approach the colonies use different strategies in different iterations (e.g., the used heuristic is periodically changed, see [30]for more details).
8.3.1 Overview of the Literature In this section we give an overview on the parallel ACO algorithms that have been described in the literature. So far the research on parallel ACO algorithms is not
180
PARALLEL ANT COLONY ALGORITHMS
Parallel Multicolony ACO scheme: pardo for processor i E { 1,. . . ,p } Initialize pheromone values repeat for ant k E { 1,. . . , m/p} in colony i construct a solution endfor if communication step then exchange information with neighboring colonies evaluate received information endif update elitist ant {if elitist ant is used} forall pheromone values do decrease the value by a certain percentage {evaporation} endfor forall pheromone values corresponding to good solutions do increase the value {intensification} endfor until stopping criterion is met endpardo
very extensive. One reason might be that it seems quite obvious how to parallelize ACO. Most parallel implementations of ant algorithms in the literature are just parallelizations of standard ACO. They differ only in granularity and whether the computations for the new pheromone matrix are done locally in all colonies or centrally by a master processor which distributes the new matrix to the colonies. Some exceptions are [4,28,33,34], which consider multicolony ACO algorithms. A very fine-grained and early parallelization of ACO where every processor holds only a single ant was implemented by Bolondi and Bondaza [2]. Due to the high overhead for communication this implementation did not scale very well with a growing number of processors (better results have been obtained by [2, 121 with a more coarse grained variant). Also Talbi et al. [44] implemented such a finegrained parallel ACO algorithm for the Quadratic Assignment Problem. They used a master-worker approach, where every worker holds a single ant. Every worker sends its solution to the master. The master then computes the new pheromone matrix and updates the best found solution. Then it sends the new pheromone matrix to every worker. The algorithm was tested on a network of 10 Silicon Graphics Indy workstations using the programming environment CIPVM. The authors compare their parallel ACO algorithm to other metaheuristics, but the effect of parallelization on the results was not studied in further detail. An application of the fine-grained masterslave ACO to the reconstruction of chlorophyll concentration profile in offshore ocean water was presented by Souto et al. [42]. The obtained efficiency with a parallel implementation using MPI for 3, 5 , and 15 processors was 0.88, 075, and
PARALLEL ACO
181
0.43, respectively. Unfortunately, the authors do not mention which parallel machine was used. Another fine-grained parallel ACO was studied by Randall and Lewis [38]. Here, the ants do some pheromone update after every decision which requires communication with the master. Clearly, this approach is not very suitable for parallel execution. The obtained speedup values for different TSP instances on an IBM SP2 using 8 processors ranges from 0.06 for an instance with 24 cities (which means that parallel execution is much worse than serial execution) up to 3.3 for an instance with 657 cities. Stiitzle [43] compares the solution quality obtained by a multi start ACO algorithm, i.e. the execution of several independent short runs that can be run in parallel, with the solution quality of a single long run whose running time equals the sum of the running times of the short runs. Under some conditions the short runs proved to give better results. Bullnheimer et al. [4] propose a parallelization where an information exchange between several colonies of ants is done only every k generations for some fixed k. They show by using simulations how much the running time of the algorithm decreases with an increasing interval between the information exchange. But it is not discussed how this influences the quality of the solutions. Piriyakumar and Levi [37] studied also an ACO approach where information exchange is not done at every iteration for the TSP. The algorithm was implemented in a Cray T3e computer using the MPI library and synchronous communication is used between the nodes. Tests were done with a TSP instance with 5 1 nodes. The communication time when using 4 processors and information exchange every 10 iterations was 1.12% of the total computation time. The idle times are mainly due to the synchronous mode of communication. Since the results in the paper are not described very clear it is difficult to draw many conclusions. The authors concluded: “if more number of parallel processors are used, it is shown that the communication times and idle times can be contained” and “this encourages the use of synchronous communication”. One of the first real multicolony approaches for ACO was proposed in Michels and Middendorf [33]. Here, every processor holds a colony of ants exchanging the locally best solution after every fixed number of iterations. When a colony receives a solution that is better than the best solution found so far by this colony, the received solution becomes the new best found solution. It influences the colony because during trail update some pheromone is always put on the trail that corresponds to the best found solution. The results of Kriiger et al. [28] indicate that it is better to exchange only the best solutions found so far than to exchange whole pheromone matrices and add the received matrices -multiplied by some small factor- to the local pheromone matrix. Middendorf et al. [34] studied several information exchange strategies and topologies (e.g., ring topology) for a parallel multicolony ACO (the results are discussed in the next section). In two very similar papers Tsai et al. [45,46] studied a multicolony ACO approach for the TSP problem. Communication between the colonies is done via the ring topology with the PMatrix strategy for pheromone update so that in each colony the new pheromone matrix is the sum of the received matrix with weight 0.4 and the old matrix with weight 0.6. Test results are given for a
182
PARALLEL ANT COLONY ALGORITHMS
sequential implementation with 5 colonies and 4 ants in each colony. The conclusion is that the multicolony approach performs better than a single-colony algorithm. Most parallel implementations of ACO have been done with respect to the message-passing model. The aim of [7]was to show that a shared-memorymodel is useful to reduce the cost for parallelization although the synchronization procedure cannot be avoided. The ACO algorithm itself is not changed and each processor gets the same number of ants. Since each ant was allowed to update the pheromone matrix, each processor computes an update matrix and the update matrices of all processors are merged at the end of an iteration. The authors applied the parallel ACO to an industrial scheduling problem in an aluminium casting center. The obtained efficiency with 4 processors and 1000 ants per iteration was 0.86. For larger number of processors the efficiency decreased significantly -it was only 0.64 for 8 processors and 0.29 for 6 processors-. The authors intend to perform further studies in order to investigate the reasons for the surprisingly low efficiency for 8 and more processors. A shared-memory system was also used for a parallel ACO for a set covering problem that was studied by Fiorenzo Catalan0 and Malucelli [16]. They studied a synchronous and an asynchronous version. In the asynchronous version every processor has one ant. When each ant has computed a solution, the corresponding pheromone update vector is sent to all other processor. In a preceding work the authors have shown that an implementation of the same algorithm with a Shared Memory Access library CRAY MPP is slightly more efficient than a message-passing implementation. Clearly such a scheme is critical because it leads to a large number of messages that are sent at the same time among the processors. The asynchronous scheme uses a master-slave approach where each slave that is idle asks the master for the pheromone infixmation. When it has received this information, it computes a solution and sends the corresponding pheromone back to the master. The master then updates the pheromone information immediately. Note that this scheme is only possible when every ant updates the pheromone information. The run times of the algorithms were compared on a CRAY T3D with up to 64 processors where each algorithm has to compute a total number of 320 solutions and the number of ants was equal to the number of processors. For the ACO algorithm with greedy heuristic the run time on 4 processors was 30.5 sec for the synchronous version and 24 sec for the asynchronous version. The relative advantage for the asynchronous version was smaller for a larger number of processors (for 64 processors the synchronous version had run time 3.1 sec and the asynchronous version 3 sec). The efficiencies obtained when computed with respect to an execution on a single processor were 0.80, 0.75, 0.69,0.60, and 0.49 for the synchronous version with 4, 8, 16,32, and 64 processors, respectively. For the asynchronous version the corresponding efficiency values were 1.02, 0.94, 0.68, 0.6 1, and 0.5 1, respectively. 8.3.1.1
Information Exchange and Topology. In this section we consider more closely the influence of the topology and the information exchange on the optimization behavior of parallel multicolony ACO. Several strategies for information exchange that differ in the degree of coupling which is enforced between the colonies through
PARALLEL ACO
183
this exchange were investigated in [34]. Since the results of [28] indicate that the exchange of complete pheromone matrices is not advantageous, all the following methods that were studied are based on the exchange of single solutions:
1. Exchange of globally best solution (Gbest): In every information exchange step the globally best solution is computed and sent to all colonies, where it becomes the new locally best solution.
2. Circular exchange of locally best solutions (Lbest-Ring): A virtual neighborhood is established between the colonies so that they form a directed ring. In every information exchange step every colony sends its locally best solution to its successor colony in the ring. The variable that stores the best found solution is updated accordingly.
3. Circular exchange of migrants (Migrant-Ring): As in (2), the processors form a virtual directed ring. In an information exchange step every colony compares its mbest > 0 best ants with the mbest best ants of its successor colony in the ring. The mbest best of these 2·mbest ants are allowed to update the pheromone matrix.
4. Circular exchange of locally best solutions plus migrants: Combination of (2)
and (3) (Lbest+Migrant-Ring).
Tests with an instance of the TSP (eil101) have shown that the Lbest-Ring method performed best. Interesting results have been obtained with respect to the expected solution quality and the total number of solution evaluations that were allowed. Concerning the probability to find a solution that has at least a certain given quality, it was found that the more solutions are allowed to be constructed, the more colonies of ants are best. This is illustrated in Figure 8.1, which shows for the TSP instance eil101 that a single colony is best up to about 6375 (5350) evaluations, for more evaluations 2 colonies are best up to about 11,175 (10,825) evaluations, and for more evaluations 3 colonies are best. It can also be seen that for higher solution qualities a multicolony approach becomes profitable earlier. It was also found that, depending on the target solution quality, the information exchange is an advantage or not for a multicolony approach. When the requested minimum solution quality is low, the highest probability to find such solutions is obtained when no information exchange is done. If, on the other hand, the requested minimum solution quality is high and the allowed number of solution evaluations is not too low, it is better to do information exchange. This is illustrated in Figure 8.2, which shows for the TSP instance eil101 that for a required minimum tour length of 655 no information exchange is better for the 2- and 3-colony ACO. In contrast, when the required minimum tour length is 645 and the number of allowed solution evaluations is at least 5000 (10,000), it is better to do information exchange for the 2- (respectively 3-) colony ACO. Thus, the conclusion is that the question of whether a multicolony approach is good and whether information exchange is profitable has no simple answer. For the TSP, it was shown that the more total run time is available (high number of
solution constructions) and the higher the minimum expected solution quality, the more advantageous it is to use a multicolony ACO with information exchange.
Fig. 8.1 Single-colony ACO vs. multicolony ACO: Probability to find a tour of length at least 655 (3 upper curves) or at least 645 (3 lower curves) within a given number of solution evaluations for TSP instance eil101; each colony has 10 ants (the number of evaluations per iteration is 10 · p), for p = 2 and p = 3 colonies information exchange according to the LBest-Ring strategy was done every 10 iterations; a black square (black circle) marks the point where 2 colonies become better than 1 colony (respectively 3 become better than 2).
Fig. 8.2 Multicolony ACO with and without information exchange: Probability to find a tour of length at least 655 (4 upper curves) or at least 645 (4 lower curves) within a given number of solution evaluations for TSP instance eil101; each colony has 10 ants (the number of evaluations per iteration is 10 · p); when information exchange was done, it followed the LBest-Ring strategy every 10 iterations.
It should be mentioned that in a later study Chu et al. [5] also compared different information exchange strategies for a multicolony ACO and the TSP. They also used the Gbest and Lbest-Ring strategies. In addition, they tried Lbest with i) a hypercube neighborhood (Lbest-Hypercube), where two colonies are neighbors when the binary representations of their colony numbers differ by one bit, and ii) a pairwise neighborhood (Lbest-Pairwise), where two colonies are neighbors when the binary representations of their colony numbers differ only in the least significant bit. Moreover, combinations of Gbest with one of the Lbest strategies were tested. Unfortunately, not too much can be seen from the experimental data that were given in the paper. The main conclusion is that the multicolony approach (with 4 colonies of 20 ants each or 8 colonies of 10 ants each) performs better than the single-colony approach (with 80 ants per iteration) and the relative difference might increase with the size of the problem instance.
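To make the ring-based exchange concrete, the following minimal sketch shows one possible way to realize the Lbest-Ring strategy described above with message passing; the assignment of one colony per MPI rank and the use of the mpi4py communicator are illustrative assumptions, not the implementations used in the studies cited in this section.

```python
# Sketch of the Lbest-Ring information exchange (assumption: one colony per MPI rank).
# Every few iterations each colony sends its locally best tour to its successor
# on a directed ring and receives the locally best tour of its predecessor.
from mpi4py import MPI

def lbest_ring_exchange(comm, best_tour, best_length):
    rank, size = comm.Get_rank(), comm.Get_size()
    succ = (rank + 1) % size          # successor colony on the directed ring
    pred = (rank - 1) % size          # predecessor colony
    # Exchange (tour, length) pairs; sendrecv avoids deadlocks on the ring.
    recv_tour, recv_length = comm.sendrecv((best_tour, best_length),
                                           dest=succ, source=pred)
    # The received solution replaces the locally best one only if it is better.
    if recv_length < best_length:
        return recv_tour, recv_length
    return best_tour, best_length
```

In a complete multicolony ACO such a routine would be called after a fixed number of iterations (every 10 iterations in the experiments reported above), and the locally best solution would additionally receive pheromone reinforcement during trail update.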
8.3.2 Empirical Results
In this section we investigate a multicolony ACO for a heterogeneous network of workstations. Although heterogeneous networks of workstations are common in practice, they are mostly neglected in the literature on (non-centralized) parallel ACO. The main aim of this experimental part is to show that heterogeneous workstations can be used successfully for multicolony ACO and also offer interesting questions for research. It is not possible within this chapter to cover this aspect in full detail. A multicolony ACO for the TSP is studied which uses the LBest-Ring strategy for information exchange. For the tests we study the algorithm on a network of workstations where the nodes do not work exclusively for the ACO, but also do some other work in between. The standard multicolony ACO is not designed for such a situation because all colonies have the same size and the pheromone update is synchronized between them. Note that this means that the information exchange between neighboring processors is synchronized, but not between all processors. Hence the colonies on the faster nodes have to wait for the other colonies. A way out of this problem would be to make the information exchange asynchronous, so that a processor that has done enough iterations for an information exchange sends its information to its neighbors no matter how many iterations they have done so far. Unfortunately, there is a danger with this strategy because when the fast colonies have done too many iterations there is basically no chance for the slow ones to do any valuable work. But this means that it would have been better to simply not use the slower colonies. Here we propose a strategy in between the extreme cases of synchronous and asynchronous information exchange. This strategy is to let the number of iterations that each colony can execute before the next information exchange be at least equal to a minimum value but not larger than some given maximum value. Thus, the number of iterations between two information exchanges lies in an interval I = [imin, imax], imin ≤ imax. Each colony sends its neighbors a message after imin iterations that it is ready for information exchange and then possibly executes more iterations until the neighbor is also ready or until imax iterations have been done (in the last case the colony has to wait until it receives the ready message from its neighbor).
Table 8.1 Time used for MPI routines and for the application code within the first 400 iterations for different artificial load strategies and different information exchange intervals

                         Load All          Load One          Load Skewed
Exchange interval I    10    [10,40]     10    [10,40]     10    [10,40]
MPI                  11.8       4.6    54.4       1.5    64.9       6.0
Application          24.9      24.9    24.8      24.8    24.8      24.8
In the concrete implementation that was used for the tests it was allowed to initiate the solution exchange in iteration imin − 1. But it is not used before iteration imin. The test runs were done on a Linux cluster with 8 nodes, each with 4 Intel Pentium-III Xeon 550 MHz processors. The nodes are connected to a Myrinet switch. The peak performance of the cluster is 19.8 GHz. The maximal intra-node (respectively inter-node) communication bandwidth is 200 MB/s (respectively 120 MB/s). For communication the mpich implementation of the message-passing standard MPI was used. The test problem was the TSP instance eil101 from the TSPLIB [48]. The optimal solution has length 629. The parameter values were α = 1, β = 5, ρ = 0.95, Q = 100. Ten elitist ants (e = 10) were used. We used a so-called MaxMin approach for the ACO where the pheromone values were restricted to the interval [0.1, 5.0]. Every run was stopped after all colonies had reached at least 400 iterations. All 32 processors of the cluster were used for each run. All given results are averaged over 30 runs. For the analysis of the algorithms we used Vampir [23], which is a commercial performance analysis tool for MPI parallel programs. To study the influence of different loads on the processors we increased the computation time per iteration by adding some artificial computation time. After all ants of a colony constructed their solutions, an active waiting period (artificial load) followed. We used the following three scenarios where we added such artificial load: i) Load All: the artificial load period for each colony after each iteration was chosen randomly between 0 and 250 ms; ii) Load One: only colony 0 was delayed by an artificial load between 0 and 250 ms per iteration; iii) Load Skewed: the artificial load period in milliseconds was calculated as T + 10 · j, where T is a random value between 0 and 50 and j ∈ [0 : 31] is the colony number. Table 8.1 shows the amount of time that was used for MPI routines compared to the time used for the application code. It can be seen that the MPI overhead (which is mainly due to waiting) can be reduced dramatically if an information exchange interval of I = [10, 40] is used instead of the synchronized information exchange (with information exchange every I = 10 iterations). In particular, when the artificial load is distributed very non-uniformly over the colonies/processors (which is the case for Load One and Load Skewed), the reduction is more than 90%. But also for Load All the reduction is more than 50%. Clearly, the reduced MPI waiting times are an advantage. However, it has to be investigated how this influences the solution quality because different times for information exchange can change the relative influence of the colonies.
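The following sketch illustrates the bounded exchange interval I = [imin, imax] described above; the "ready" handshake via non-blocking MPI messages and the colony object are assumptions made for illustration and do not reproduce the code used in the experiments.

```python
# Sketch of the partially asynchronous exchange interval I = [imin, imax]
# (assumption: one colony per MPI rank, neighbors on a directed ring).
from mpi4py import MPI

READY_TAG = 1

def run_between_exchanges(comm, colony, imin=10, imax=40):
    rank, size = comm.Get_rank(), comm.Get_size()
    succ, pred = (rank + 1) % size, (rank - 1) % size
    ready_req = None
    for it in range(imax):
        colony.iterate()                          # one ACO iteration of this colony
        if it + 1 == imin:
            # Tell the successor that this colony is ready to exchange information.
            ready_req = comm.isend(True, dest=succ, tag=READY_TAG)
        if it + 1 >= imin and comm.iprobe(source=pred, tag=READY_TAG):
            break                                 # predecessor is ready: stop early
    # After at most imax iterations, block until the predecessor's ready message arrives.
    comm.recv(source=pred, tag=READY_TAG)
    if ready_req is not None:
        ready_req.wait()
    # ... the actual solution exchange (e.g., lbest_ring_exchange) follows here.
```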
Fig. 8.3 Quality of best solutions in selected colony numbers 0, 10, 13, 31 for information exchange interval I = 10 (top) and I = [10, 40] (bottom); results given over time in seconds (left) and number of iterations (right); the horizontal and vertical lines indicate when solution qualities 645 and 635 are achieved first.
Figure 8.3 shows the quality of the best solutions for selected colonies over time and over iterations for Load Skewed when information exchange is done every I = 10 iterations (top) and for I = [10, 40] (bottom). The quality is given over time (left) and over the number of iterations (right). It can be clearly seen that a fixed quality is achieved after less time if I = [10, 40] is used. For example, a tour of length 645 is found after about 39 seconds for I = 10 and after less than 20 seconds for I = [10, 40]. Similarly, quality 635 was achieved after about 80 seconds for I = 10, whereas only about 45 seconds were needed for I = [10, 40]. It can also be seen that for both I = 10 and I = [10, 40] processor 10 has found the best solution quality when measured over time (this holds also with respect to the colonies that are not considered in the figure). For an explanation, recall that for Load Skewed colony 0 has the smallest additional artificial load whereas colony 31 has the most artificial load. But colony 0 is not the fastest colony because it has colony 31 as its neighbor and therefore has to wait for it. This can be seen in the Vampir traces in Figure 8.4, which show that colony 10 finished its 400 iterations first for I = 10. For I = [10, 40] the Vampir trace shows that colony 1 is the fastest. However, it is not the best with respect to time. The reason is that for I = [10, 40] the faster colonies make more iterations and therefore their neighbors can profit from their good solutions. Hence there is a trade-off: the colonies with small numbers are faster, but the colonies with larger numbers receive better solutions from their neighbors. It happens here accidentally that colony 10 is also the best for I = [10, 40]. The picture changes when the solution qualities are compared over the number of iterations. For I = [10, 40] colony 0 is allowed to perform up to 40 iterations in the same time span in which processor 31 can only perform 10 iterations. The consequence is that if the quality is measured over the number of iterations, colony 31 can profit most from the information exchange because it has the highest artificial load and will receive solutions that its neighbors have found over more iterations than it has done by itself so far. In contrast, colonies 0 and 1 are the fastest and cannot profit much from the other colonies. Therefore, colony 31 is best when measured over the number of iterations for I = [10, 40] and colony 0 is worst. For I = 10 colony 10 is the fastest, as can be seen in the Vampir trace in Figure 8.4. Usually one would expect all colonies to have the same average solution quality. The reason that this is not the case lies in the concrete implementation, where a solution exchange can be done at iteration imin − 1. Thus, colonies 1 to 10 typically receive a solution after I = 10 iterations, while colonies 11 to 31 and colony 0 receive a solution after 9 iterations. Therefore, the quality of the solutions that colonies 1 to 10 receive will be slightly better on average, and colony 10 profits most from this. There are still many interesting questions for research on parallel multicolony ACO on heterogeneous workstations left. For example, more complicated load-balancing situations can be considered. Also, the information exchange between the colonies can be dynamically varied dependent on the actual load situation.
Fig. 8.4 Vampir timeline of multicolony ACO with Load Skewed and information exchange intervals I = 10 (top) and I = [10, 40] (bottom); shown are the approximately 30 seconds before the first processor had finished 400 iterations; for each processor, the white area hides what happens after 400 iterations.
8.4 HARDWARE PARALLELIZATION OF ACO
Hardware parallelization of ACO is a promising field because considerable speedup can possibly be gained by the parallelism and pipelining features of the hardware. Of particular interest are modern dynamically reconfigurable hardware architectures that are able to reconfigure their function and/or structure to suit the changing needs of a computation during run time. Two different approaches for the implementation of ACO on reconfigurable architectures have been studied in the literature so far. In [31] an ACO algorithm for large processor arrays with a dynamically reconfigurable bus system (R-Mesh model) was proposed. The first implementation of ACO in real hardware was presented by mapping an ACO variant onto Field Programmable Gate Arrays (FPGAs) [21, 40]. Both approaches are described in more detail in the next sections.
8.4.1 ACO for the Reconfigurable Mesh
In this section we describe the ACO algorithm (R-Mesh-ACO) of [31] for large dynamically reconfigurable processor arrays (R-Mesh model). The R-Mesh is a standard model for reconfigurable processor arrays (see [47] for a short overview). An R-Mesh consists of a set of processing elements (PEs) arranged on a k × n grid. Every PE contains four ports (named the north, east, south, and west port) enabling it to connect to its neighboring PEs. The linked ports form the static topology. Every PE is furthermore equipped with a number of switched lines which link its ports internally. The configuration of each PE is set by these switches, which essentially form the dynamic topology through buses as depicted in Figure 8.5.
Fig. 8.5 Reconfigurable mesh with 16 PEs (left); possible connections of ports within a PE (right).
Every PE can read from and write to the bus lines it is connected to, that is, we have concurrent read, concurrent write buses (CRCW buses). When several PEs write to the same bus, the result is the bitwise OR. Every PE has only a constant number of
registers and it knows its row and column indices. The PEs work synchronously, so that within one time step every PE can locally configure the bus, write to and/or read from one of the buses it is connected to, and perform some local computation. Signal propagation on buses is assumed to take constant time regardless of the number of switches on the bus. This is the standard assumption for this model of computation (e.g., [35]). The general principle of the R-Mesh-ACO is to embed the n × n pheromone matrix M into an n × n R-Mesh so that PE P_ij contains only the pheromone value τ_ij, i, j ∈ [1 : n]. The ants are then pipelined through the R-Mesh so that one ant occupies one row of the mesh. This is done as follows. The first ant starts in row 1 of the R-Mesh and selects the first item. Then it moves to row 2 and selects the second item. When an ant moves, it takes with it the information of which items have already been selected, that is, items which are not in the set of selectable items S. This process continues until the ant reaches row n, where it has determined its permutation. The next ant always follows its predecessor ant one row behind. In more detail, the selection of an item in a row i by an ant is done as follows. Every PE P_{i-1,j} in row i − 1 knows whether item j has been selected in one of the rows 1, ..., i − 1. PE P_{i-1,j} sends this information to P_ij when the ant moves to row i. The prefix-sums of the τ_ij · η_ij values from all PEs in S are determined. Formally, for S = {j_1, j_2, ..., j_{n-(i-1)}}, j_1 < j_2 < ... < j_{n-(i-1)}, the prefix-sums of the τ_ij · η_ij values are all sums τ_{i,j_1} · η_{i,j_1} + τ_{i,j_2} · η_{i,j_2} + ... + τ_{i,j_k} · η_{i,j_k} with k ∈ [1 : n − (i − 1)]. The first PE in the row chooses a random number z from the interval [0, Σ_{l∈S} τ_il · η_il). The random number z is then sent to all PEs in the row. PE P_ij is selected when z is in the interval [Σ_{l<j, l∈S} τ_il · η_il, Σ_{l≤j, l∈S} τ_il · η_il). Clearly,
every PE Pij can determine in time O(1) whether it is selected or not by using its own prefix-sum and the prefix-sum of the next PE to the left with a column index in
S. When all ants in a generation have found their solution, it is determined which ants are allowed to update the pheromone information. Then pheromone evaporation and update are done and the next generation of ants starts. Since the prefix-sums can be determined in time O(log n) in every row, it can be seen that the algorithm will run on an n × n R-Mesh in time O(x · (m + n) · log n), where x is the number of generations. In order to find a faster implementation that is more suitable for the R-Mesh, the following two main further changes have been proposed in [31].
1. One change is to give up the principle of iterations of ants that are divided by the update of the pheromone matrix. Without the use of explicit iterations a new selection criterion to decide which ants are allowed to update the pheromone information is needed. The criterion that is applied after an ant has found a solution is whether the solution found by the ant is better than the m′ − 1 solutions of the preceding m′ − 1 ants, m′ ≤ m. If an ant is allowed to update, the update is done immediately while other ants are still being pipelined through the mesh constructing a solution. This principle - immediate update of the pheromone matrix when new information is available - might also be useful
for other parallel/distributed implementations of ACO algorithms. Evaporation is done every time an ant is allowed to update the pheromone information. Pipelining the ants using this noniterative approach gives a running time of O((z + n) · log n) instead of O(x · (m + n) · log n) for the generational approach, where z = x · m is the total number of solutions that have been constructed.
2. The second major change is that an ant no longer uses the pheromone values directly to determine the probabilities of the possible outcomes of a decision. Instead, it uses a threshold function that assigns to every possible outcome of the next decision either a high or a low probability, depending on whether the pheromone value is above or below the threshold. Then a fast bit-summation algorithm for the R-Mesh can be used for realizing the decisions of an ant. It was shown that this R-Mesh-ACO variant can be implemented in quasi-linear time (with respect to the total number of solutions that are constructed and the problem size) on an R-Mesh with n² processors. This has to be compared to a sequential running time of O(z · n²), where n is the problem size and z is the total number of ants. Experiments on instances of the Quadratic Assignment Problem (QAP) have shown that R-Mesh-ACO has a good optimization behavior which is only slightly worse than that of a standard ACO algorithm with respect to the number of iterations.
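To make the row-wise decision of the original R-Mesh-ACO more tangible, the following sequential sketch reproduces the roulette-wheel selection that the R-Mesh realizes with prefix-sums; the variable names and the use of plain Python lists are assumptions made for readability, not part of the hardware design in [31].

```python
# Sequential sketch of one row decision of R-Mesh-ACO: PE P_ij holds tau[i][j]
# and eta[i][j]; 'selectable' plays the role of the set S of items not yet chosen.
import random

def select_item_in_row(i, tau, eta, selectable):
    # Prefix-sums of tau_ij * eta_ij over the selectable columns
    # (on the R-Mesh these are computed in O(log n) time per row).
    items, prefix, total = [], [], 0.0
    for j in sorted(selectable):
        total += tau[i][j] * eta[i][j]
        items.append(j)
        prefix.append(total)
    # The first PE draws z uniformly from [0, total) and broadcasts it;
    # the PE whose prefix-sum interval contains z declares itself selected.
    z = random.uniform(0.0, total)
    for j, upper in zip(items, prefix):
        if z < upper:
            return j
    return items[-1]  # guard against floating-point rounding at the boundary
```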
8.4.2 P-ACO for FPGAs
An ACO implementation for Field Programmable Gate Arrays (FPGAs) that was proposed in [21, 40] is described in this section. This FPGA-ACO is based on the population-based ACO (see Section 8.2.1), where the pheromone information is replaced by a small set (population) of good solutions discovered during the preceding iterations [19]. A sketch of the FPGA-ACO is given here on a functional level without considering several technical details. To ease the description, no heuristic information is used in the described algorithm (on the use of heuristic information for FPGA-ACO see [39]). FPGAs provide fast configuration, i.e., a programmable selection of alternate logic structures and routing structures, of circuits on a chip. Thus, an FPGA can be switched between the configuration mode and the operational mode. An FPGA contains the following three major types of configurable elements (see Figure 8.6):
• Configurable Logic Blocks (CLBs) provide the basic functional components for implementing logic and registers.
• Input/Output Blocks (IOBs) form interfaces between the routing network and package pins.
• The routing network consists of horizontal and vertical multitrack channels and configurable switches to interconnect CLBs.
Fig. 8.6 Scheme of an FPGA with IOBs, CLBs, and switch boxes.
Generally, CLBs consist of two or three look-up tables (LUTs) and two flip-flops. Any LUT can typically either be configured to compute an arbitrary Boolean function of three or four input signals, or be used as a small RAM providing storage for up to 16 bits. Depending on the respective device, additional dedicated storage elements, carry logic, multiplication circuits, or even complete RISC microprocessors might be embedded into the FPGA chip. Horizontal and vertical communication resources provide configurable connections among CLBs or between CLBs and IOBs. The design of the FPGA-ACO for permutation problems consists of three main hardware modules: the Population Module, the Generator Module, and the Evaluation Module (see Figure 8.7). The Population Module contains the k solutions in the population of the underlying P-ACO in the form of a population matrix Q. The population matrix is organized as a First-In-First-Out (FIFO) queue. Every entry q_ij ∈ {1, ..., n} is the number of the item located at place i of the j-th solution, j ∈ [1, ..., k]. For example, for the TSP, q_ij is the next city to be visited after city i in the j-th tour stored in the population. The Generator Module and the Evaluation Module each contain m main blocks (solution generators, respectively evaluation blocks), one for each ant. The Population Module is responsible for broadcasting the content of the k items q_ij from the i-th row of the population matrix to the Generator Module for the i-th decisions of the ants. In addition, the Population Module receives the best solution of the current iteration from the Evaluation Module and inserts it into the population matrix. In the Generator Module every ant constructs its solution, e.g., a tour through n cities for the TSP. After all m solutions have been constructed concurrently, they are transferred to the Evaluation Module. The evaluation results (e.g., tour lengths) of these m solutions are computed and collected in a comparison block. It determines the best solution of the current iteration and sends it to the Population Module.
Fig. 8.7 FPGA-ACO design with Population Module, Generator Module, and Evaluation Module.
Fig. 8.8 Decision making in solution generator of FPGA-ACO.
The hardware implementation of P-ACO does not explicitly use any pheromone information. Instead, ants make their decisions solely based on the solutions stored inside the population, as depicted in Figure 8.8. Consider a decision in the solution generator corresponding to row number i of the population matrix. Then (q_i1, ..., q_ik) in matrix Q denotes the current population vector and {s_1, ..., s_N} the selection set of N ≤ n yet available items. All k items in the population vector are broadcast to the selection set. If the respective item is still contained in S, it is called a matched item. All matched items are stored in the match buffer. After all broadcasts are finished, the match buffer keeps M_P ≤ k matched items. Each of these items is associated with a weight Δ_P. Now the next decision can be made according to the probability distribution p_ij, j ∈ S:

    p_ij = (τ_init + c_ij · Δ_P) / (N · τ_init + M_P · Δ_P)     (8.3)

with c_ij denoting the number of solutions in the population matrix with q_ih = j, i.e., c_ij = |{h : q_ih = j}|. Note that in the P-ACO hardware implementation τ_init := 1. A decision is made by drawing an integer random number r from the range r ∈ {1, ..., N + M_P · Δ_P}. If r ≤ N, then the r-th item s_r in selection set S is selected. Otherwise a match buffer item is selected, where the index of the chosen item is determined by r − N and the weight Δ_P. Building up the probability distribution according to Equation 8.3 requires O(k) steps. Drawing a random number and selecting an item in hardware can be accomplished in O(1). Hence, a complete ant decision can be computed in O(k) time, compared to O(n) for the standard ACO algorithm.
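To clarify how such a decision can be drawn with a single integer random number, the following sketch mirrors the selection rule described above in software; the names (selection set, match buffer, delta) and the ceiling-based indexing of the match buffer are illustrative assumptions and not the actual FPGA circuitry.

```python
# Software sketch of one P-ACO decision (assumption: integer weight delta >= 1).
# 'selection_set' lists the N still-available items s_1..s_N; 'match_buffer'
# lists the M_P population items that are still available (with repetitions).
import math
import random

def p_aco_decision(selection_set, match_buffer, delta=1):
    n_avail = len(selection_set)              # N
    m_matched = len(match_buffer)             # M_P
    # Draw r uniformly from {1, ..., N + M_P * delta}.
    r = random.randint(1, n_avail + m_matched * delta)
    if r <= n_avail:
        return selection_set[r - 1]           # base weight 1 for every item in S
    # Each match buffer entry occupies a block of 'delta' consecutive numbers,
    # so matched items are delta times more likely to be chosen than unmatched ones.
    x = math.ceil((r - n_avail) / delta)
    return match_buffer[x - 1]
```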
Test results of an FPGA-ACO for the Single Machine Total Tardiness Problem (SMTTP) are presented in [21, 40]. The hardware implementation was done on a Virtex-II Pro X2VP125-7 FPGA. The number of solution generators (which equals the number of ants per iteration) ranged from m = 2 to m = 32. The run time on the FPGA was compared to a software implementation on a 1540 MHz AMD Athlon single-processor machine. The time for placing and routing the hardware design is not contained in the run time measurements. Since the software and hardware versions produce the same solutions, their performances can be compared by means of run time per iteration. The obtained speedup values for the hardware implementation range from a minimum of 2.04 for a problem size of n = 320 to a maximum value of 4.07 for n = 48, both for a fixed number of m = 9 solution generators. It was shown that for an increasing number of solution generators up to m = 32 the time per iteration for the software version grows approximately linearly, whereas for the hardware implementation the run time remains almost constant at 28 microseconds. The obtained speedup values for the hardware version range from 1.59 for m = 2 solution generators up to 10.22 for m = 32. This shows that (within certain constraints) the degree of parallelism can easily be increased (in terms of concurrently working ants) with only a marginal decline in execution speed.

8.5 OTHER ANT COLONY APPROACHES
The behavior of ant colonies has not only inspired the design of the ACO metaheuristic, but has also influenced other (parallel) problem-solving approaches that are related to optimization. Two of these approaches apply the indirect communication model of ants via pheromones and rely on a graph model where the nodes are active units that work in parallel and the edges are transport units that are used by simple agents called ants to move through the graph. The main focus of these applications is more on how to distribute information in the network and not so much on parallel computing. Therefore, these approaches are only briefly discussed here. Routing in telecommunication networks is one application where the idea is to use ants to find good paths from sender nodes to receiver nodes in a network. On their way through the network the ants collect information about the traffic (e.g., the number of packets in the router queues, the time it took to move from one router to the next). This information is then used in the routers to change their routing tables, which they use to decide in which direction they forward the arriving packets. Different types of networks (e.g., networks with fixed communication structures or mobile ad hoc networks) and different models for communication (e.g., packet routing, connection-oriented communication) have been investigated in this area (e.g., [10, 22, 24, 41]).
Another direction of research in ant colony inspired parallel systems is the design of distributed control processes for manufacturing systems (see [25]). Here the main focus is to use ant colony coordination to obtain global information from locally available information. A production system is seen as a directed acyclic graph (transport network) where the nodes are processing units and the edges are transport units. The pheromone information is distributed through a network of local pheromone locations that is modelled as a separate network laid in parallel over the transport network. The pheromone locations are seen as a distributed blackboard. The ants have a specified propagation direction and can move upstream or downstream in the network. Different types of ants are considered. As an example, consider processes that are responsible for products selecting proper routes that lead them to the required processing nodes in a proper sequence. These processes send out specific ants that retrieve subnet capability information. The sorting behavior of ants that collect dead corpses or larvae into clusters has also inspired the design of heuristics (see [3]), some of which have also been parallelized. Nugala et al. [36] compare two types of parallel implementations of systems for the simulation of a collection of ants that move in a two-dimensional space. Examples of problems that have been solved with such systems are clustering problems and mesh partitioning problems (see [27, 29]). In the system a nest of ants is placed within a two-dimensional grid environment. The ants are searching for seeds that are placed within the grid environment. The behavior of each ant depends on whether it is within a nest or not and whether it carries a seed or not. The movements of each individual ant are described by a finite-state machine. Each move depends on random decisions but also on data that the ant collects from its environment. Each ant can detect pheromone trails that are laid by seed-carrying ants that move back to the nest. Moreover, an ant can detect the direction of the pheromone gradient in order to decide which is the direction of the nest with respect to the putative food source. An ant can also detect when a seed is located in its neighborhood. The simulation model is event driven. In the bulletin-board model the ants and their activities are placed within the tuple space of the server and represent tasks that need to be completed. Each client picks up a task from the tuple space and updates the pheromone information. Then it performs the task and, when it is completed, the tuple space is updated and the client picks up another task. In the non-bulletin-board model the ants are divided statically between the processors. The server acts as a synchronizing agent to maintain the global pheromone information. The clients send time-stamped packets and the server increments the time step only when it has received pheromone information from all clients with the actual time stamp. When the server has incremented the time stamp, it returns a packet to all clients, which then continue their work. Test runs were done by Nugala et al. on a set of workstations that were connected with a bus topology. The two test problems had 525 and 875 ants, respectively. The execution times on a single processor for both problems were about 20% larger for the non-bulletin-board model compared to the bulletin-board model.
For two processors an efficiency (with respect to the one processor version of the corresponding model) of 0.93 was obtained for both problems with the non-bulletin-board model. For the
bulletin-board model it was slightly more than 0.8. For 7 processors the efficiency decreased for the smaller problem to 0.59 and 0.51 for the non-bulletin-board model and the bulletin-board model, respectively. Slightly larger efficiencies of 0.69 and 0.60 were observed for the larger problem. The relative amount of time that was spent for communication was less for the bulletin-board model (about 14%) than for the non-bulletin-board model (about 22%) when only 2 processors were used and the task granularity was 25 ants per task. But the relative amount of communication time increased for the bulletin-board model when the number of processors was enlarged (for 7 processors it was about 47% for the bulletin-board model and 30% for the non-bulletin-board model). The authors conclude that the bulletin-board method can be faster when the best granularity of tuples is chosen. Thus, the granularity should be relatively coarse so that the communication cost is not too high. For very large granularity the execution times increased to the point where the bulletin-board model effectively becomes the non-bulletin-board model. Albuquerque and Dupuis [1] studied a parallel implementation of a variant of the ant sorting model of Deneubourg et al. [9]. The implementation was done using PELABS (Parallel Environment for LAttice Based Simulations) [15]. In this implementation the behavior of each ant was influenced by the items that lay on the same grid cell and on the eight neighboring cells. For parallel execution the grid was partitioned equally between the processors. Every processor knows the states of cells from the other parts of the grid when they are on the boundary of its part. After each step the boundary information is exchanged by the processors. The partitions are chosen by the PELABS environment so that the boundary to other sets of the partition is small in order to reduce the amount of communication. Unfortunately, not many details are given about the performance of the parallelization. The authors only claim that a typical run took 320 s on a network of 8 PCs compared to 1600 s on a single node, which is a speedup of 5.

Acknowledgments

Support by the Deutsche Forschungsgemeinschaft within the project "Methods of Swarm Intelligence on Reconfigurable Architectures" is gratefully acknowledged.
REFERENCES

1. P. Albuquerque and A. Dupuis: A Parallel Cellular Ant Colony Algorithm for Clustering and Sorting. Proc. of ACRI 2002, LNCS 2493, Springer, 220-230 (2002).
2. M. Bolondi and M. Bondanza: Parallelizzazione di un algoritmo per la risoluzione del problema del commesso viaggiatore. Master's thesis, Politecnico di Milano (1993).
3. E. Bonabeau, M. Dorigo, and G. Theraulaz: Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press (1999).

4. B. Bullnheimer, G. Kotsis, and C. Strauss: Parallelization Strategies for the Ant System. In: R. De Leone et al. (Eds.), High Performance Algorithms and Software in Nonlinear Optimization, series: Applied Optimization, Vol. 24, Kluwer, 87-100 (1998).

5. S.-C. Chu, J. F. Roddick, and J.-S. Pan: Ant colony system with communication strategies. Information Sciences, 167(1-4): 63-76 (2004).
6. O. Cordón, F. Herrera, and T. Stützle: A Review on the Ant Colony Optimization Metaheuristic: Basis, Models and New Trends. Mathware and Soft Computing, 9(2-3): 141-175 (2002).

7. P. Delisle, M. Krajecki, M. Gravel, and C. Gagné: Parallel implementation of an ant colony optimization metaheuristic with OpenMP. International Conference on Parallel Architectures and Compilation Techniques, Proceedings of the 3rd European Workshop on OpenMP (EWOMP'01) (2001).

8. J. L. Deneubourg, S. Aron, S. Goss, and J. M. Pasteels: The self-organizing exploratory pattern of the Argentine ant. Journal of Insect Behavior, 3(2): 159-168 (1990).
9. J. L. Deneubourg, S. Goss, N. Franks, A. Sendova-Franks, C. Detrain, and L. Chrétien: The Dynamics of Collective Sorting. From Animals to Animats: International Conference on Simulation of Adaptive Behavior, The MIT Press, 356-363 (1990).

10. G. Di Caro and M. Dorigo: AntNet: Distributed Stigmergetic Control for Communications Networks. Journal of Artificial Intelligence Research (JAIR), 9: 317-365 (1998).
11. M. Dorigo: Optimization, Learning and Natural Algorithms (in Italian). PhD thesis, Dipartimento di Elettronica, Politecnico di Milano (1992).

12. M. Dorigo: Parallel ant system: An experimental study. Unpublished manuscript (1993).

13. M. Dorigo, V. Maniezzo, and A. Colorni: Positive feedback as a search strategy. Tech. Rep. 91-016, Politecnico di Milano, Italy (1991).

14. M. Dorigo and T. Stützle: Ant Colony Optimization. MIT Press (2004).
15. A. Dupuis and B. Chopard: An object oriented approach to lattice gas modeling. Future Generation Computer Systems, 16(5): 523-532 (2000).

16. M. S. Fiorenzo Catalano and F. Malucelli: Parallel randomized heuristics for the set covering problem. International Journal of Practical Parallel Computing, 10(4): 113-132 (2001).
17. L. M. Gambardella, E. Taillard, and G. Agazzi: MACS-VRPTW: A Multiple Ant Colony System for Vehicle Routing Problems with Time Windows. In: D. Corne, M. Dorigo, and F. Glover (Eds.), New Ideas in Optimization, McGraw-Hill, 63-76 (1999).

18. S. Goss, S. Aron, J. L. Deneubourg, and J. M. Pasteels: Self-organized shortcuts in the Argentine ant. Naturwissenschaften, 76: 579-581 (1989).

19. M. Guntsch and M. Middendorf: A population based approach for ACO. In: Applications of Evolutionary Computing - Proc. EvoWorkshops 2002, Springer, LNCS 2279, 72-81 (2002).

20. M. Guntsch and M. Middendorf: Applying Population Based ACO to Dynamic Optimization Problems. Proceedings of the Third International Workshop ANTS 2002, Brussels, Springer, LNCS 2463, 111-122 (2002).

21. M. Guntsch, M. Middendorf, B. Scheuermann, O. Diessel, H. ElGindy, H. Schmeck, and K. So: Population based Ant Colony Optimization on FPGA. Proceedings 2002 IEEE International Conference on Field-Programmable Technology (FPT'02), IEEE, Hong Kong, 125-132 (2002).

22. M. Günes, U. Sorges, and I. Bouazizi: ARA - The Ant-Colony Based Routing Algorithm for MANETs. Proc. International Conference on Parallel Processing Workshops (ICPPW'02), 79-85 (2002).

23. J. Hoeflinger, B. Kuhn, P. Petersen, H. Rajic, S. Shah, J. Vetter, M. Voss, and R. Woo: An integrated performance visualizer for OpenMP/MPI programs. Proc. International Workshop on OpenMP Applications and Tools, Springer, LNCS 2104, 40-52 (2001).

24. O. Hussein and T. Saadawi: Ant routing algorithm for mobile ad-hoc networks (ARAMA). Proceedings of the 2003 IEEE International Performance, Computing, and Communications Conference, 281-290 (2003).

25. Hadeli, P. Valckenaers, M. Kollingbaum, and H. Van Brussel: Multi-agent coordination and control using stigmergy. Computers in Industry, 53(1): 75-96 (2004).

26. H. Kawamura, M. Yamamoto, K. Suzuki, and A. Ohuchi: Multiple ant colonies algorithm based on colony level interactions. IEICE Transactions on Fundamentals, E83-A(2): 371-379 (2000).

27. P. Korosec, J. Silc, and B. Robic: Solving the mesh-partitioning problem with an ant-colony algorithm. Parallel Computing, 30(5-6): 785-801 (2004).

28. F. Krüger, D. Merkle, and M. Middendorf: Studies on a Parallel Ant System for the BSP Model. Unpublished manuscript (1998).

29. A. E. Langham and P. W. Grant: Using competing ant colonies to solve k-way partitioning problems with foraging and raiding strategies. In: Proc. 5th European Conference on Artificial Life, ECAL'99, Springer, LNCS 1674, 621-625 (1999).
30. D. Merkle and M. Middendorf: A New Approach to Solve Permutation Scheduling Problems with Ant Colony Optimization. In: E. J. W. Boers et al. (Eds.), Applications of Evolutionary Computing: Proceedings of EvoWorkshops 2001, Springer, LNCS 2037, 484-493 (2001).

31. D. Merkle and M. Middendorf: Fast Ant Colony Optimization on Runtime Reconfigurable Processor Arrays. Genetic Programming and Evolvable Machines, 3(4): 345-361 (2002).

32. D. Merkle and M. Middendorf: Swarm Intelligence. Appears in: E. Burke and G. Kendall (Eds.), Introductory Tutorials in Optimisation, Search and Decision Support Methodology, Kluwer (2004).

33. R. Michels and M. Middendorf: An Ant System for the Shortest Common Supersequence Problem. In: D. Corne, M. Dorigo, and F. Glover (Eds.), New Ideas in Optimization, McGraw-Hill, 51-61 (1999).

34. M. Middendorf, F. Reischle, and H. Schmeck: Multicolony Ant Algorithms. Journal of Heuristics, 8(3): 305-320 (2002). Preliminary version in: J. Rolim (Ed.), Parallel and Distributed Computing, Proc. of the 15 IPDPS 2000 Workshops, Springer-Verlag, LNCS 1800, 645-652 (2000).

35. R. Miller, V. K. Prasanna, D. I. Reisis, and Q. F. Stout: Parallel Computations on Reconfigurable Meshes. IEEE Transactions on Computers, 42(6): 678-692 (1993).

36. V. Nugala, S. J. Allan, and J. W. Haefner: Parallel implementations of individual-based models in biology: bulletin- and non-bulletin-board approaches. Biosystems, 45(2): 87-97 (1998).

37. D. A. L. Piriyakumar and P. Levi: A New Approach to Exploiting Parallelism in Ant Colony Optimization. International Symposium on Micromechatronics and Human Science (MHS), 7 pp. (2002).

38. M. Randall and A. Lewis: A parallel implementation of ant colony optimization. Journal of Parallel and Distributed Computing, 62(9): 1421-1432 (2002).

39. B. Scheuermann, M. Guntsch, M. Middendorf, and H. Schmeck: Time-Scattered Heuristic Guidance for a Hardware Implementation of ACO. Proc. Fourth International Workshop ANTS, LNCS 3172, 250-261 (2004).

40. B. Scheuermann, K. So, M. Guntsch, M. Middendorf, O. Diessel, H. ElGindy, and H. Schmeck: FPGA Implementation of Population-based Ant Colony Optimization. Applied Soft Computing, 4: 303-322 (2004).

41. R. Schoonderwoerd, O. E. Holland, J. L. Bruten, and L. J. M. Rothkrantz: Ant-based load balancing in telecommunications networks. Adaptive Behavior, 5(2): 169-207 (1996).
42. R. P. Souto, H. F. de Campos Velho, and S. Stephany: Reconstruction of Chlorophyll Concentration Profile in Offshore Ocean Water using a Parallel Ant Colony Code. In: Proc. 16th European Conference on Artificial Intelligence (ECAI-2004), Hybrid Metaheuristics (HM-2004), 19-24 (2004).

43. T. Stützle: Parallelization strategies for ant colony optimization. In: A. E. Eiben, T. Bäck, M. Schoenauer, and H.-P. Schwefel (Eds.), Parallel Problem Solving from Nature - PPSN V, Springer-Verlag, LNCS 1498, 722-731 (1998).

44. E.-G. Talbi, O. Roux, C. Fonlupt, and D. Robilliard: Parallel ant colonies for the quadratic assignment problem. Future Generation Computer Systems, 17(4): 441-449 (2001). Preliminary version in: J. Rolim et al. (Eds.), Parallel and Distributed Processing, 11 IPPS/SPDP'99 Workshops, LNCS 1586, Springer, 239-247 (1999).

45. C.-F. Tsai, C.-W. Tsai, and C.-C. Tseng: A new and efficient ant-based heuristic method for solving the traveling salesman problem. Expert Systems, 20: 179-185 (2003).

46. C.-F. Tsai, C.-W. Tsai, and C.-C. Tseng: A new hybrid heuristic approach for solving large traveling salesman problem. Information Sciences, 166(1-4): 67-81 (2004).

47. R. Vaidyanathan and J. L. Trahan: Dynamic Reconfiguration: Architectures and Algorithms. Kluwer (2004).

48. http://www.iwr.uni-heidelberg.de/iwr/comopt/soft/TSPLIB/TSPLIB.html
9
Parallel Estimation of Distribution Algorithms

JULIO MADERA¹, ENRIQUE ALBA², ALBERTO OCHOA³
¹Universidad de Camagüey, Cuba
²Universidad de Málaga, Spain
³Instituto de Cibernética, Matemática y Física, Cuba
9.1 INTRODUCTION

Estimation of Distribution Algorithms (EDAs) [26, 25] are gaining popularity among evolutionary computation researchers, who are introducing proposals of new and more complex EDAs. An EDA uses a population of individuals (as other evolutionary algorithms - EAs - do) to estimate the probability distribution of each variable present in them. Later, this distribution is used (instead of crossover or mutation) in order to generate a new set of solutions that are hopefully closer to the optimum of the problem. In this sense, many techniques to select and replace individuals are used (truncation being a well-known one), and the estimated distribution holds inside many of the advantages and drawbacks of this type of algorithm. A major (and usual) issue with such complex algorithms is the computational cost, which in this case is largely determined by the estimation of the probability model. One obvious approach to deal with this problem is the parallelization and/or a kind of decentralization of the algorithms. This chapter presents a review of the main current parallel approaches for EDAs. We discuss the master/slave model, the distributed model (island-based model), a new model (called cellular) introduced here for the first time, and also different existing approaches to parallelize the components of the algorithms as well as how parallelism can be applied to EDAs. The aim of the work is to offer a starting point on this topic to the newcomers and to promote the application of these algorithms to practical and complex problems. The main lesson here is that the advantages of sequential EDAs with respect to other traditional EAs are amplified when a parallel model is used.
Algorithm 1. Pseudocode of an EDA
1: Set t ← 1;
2: Generate N >> 0 points randomly;
3: Select M ≤ N points according to a selection method;
4: Estimate the distribution p^s(x, t) of the selected set;
5: Generate N new points according to the distribution p^s(x, t);
6: Set t ← t + 1. If termination criteria are not met, go to Step 3.
The focus of our attention has been concentrated on two main parallel approaches: evolution of parallel populations and different kinds of distributions of search procedures acting on sequentially evolving populations. The outline of the chapter is as follows. To begin with, we analyze the different opportunities for parallelization of EDAs in Section 9.2. Then, three parallel models suitable for EDAs are presented in Section 9.3, together with sample simulations for illustration purposes. The chapter ends with a summary of the main work that has been done in the field.

9.2 LEVELS OF PARALLELISM IN EDA
The class of Estimation of Distribution Algorithms (EDAs) was initially proposed by Mühlenbein and Paaß in [26]. The reader is referred to [18] for a good revision of early works on EDAs as well as for new applications and comparisons to other algorithms. An EDA can be conceptually described according to Algorithm 1. The most costly steps of an EDA are the estimation of p^s(x, t) and the generation of new points according to this distribution. These steps play the same role as the recombination operator in other EAs. Examples of EDAs for discrete domains can be found in [31, 30, 19], where Bayesian networks are used to represent the probability distribution. Gaussian networks [34] are usually employed when EDAs are targeted to continuous domains [20]. An EDA can be parallelized at different levels, namely:
• (L) - Learning (or estimation) level.
• (S) - Sampling level.
• (P) - Population level.
• (F) - Fitness evaluation level.
In general, Bayesian network learning is an NP-hard problem [9], because it requires an exponentially growing computational effort: most learning algorithms are exponential in the maximum number of parents of the network. The problem gets still worse in the presence of incomplete data, when it is necessary to resort to
approximate algorithms like the EM [11]. Therefore, this is the most time-consuming step of an EDA and represents a major challenge for parallelization. Sampling is another EDA step suitable for parallelization; the generation of new individuals could be accomplished in a parallel fashion. The most popular sampling method in current discrete EDA implementations is the PLS algorithm [17]. For small problems it is not an expensive method; however, for large populations and large numbers of variables it is. Moreover, there are cases when the representation of the problem is quite complex, and hence sampling from the corresponding distributions becomes a hard task (indeed, expensive Monte Carlo simulation tools like Gibbs sampling are usually needed). Decentralizing the search at the population level to make algorithmic components is an additional issue, allowing us to later parallelize the resulting chunks or neighborhoods. Adding a spatial structure to the population could lead to more efficient algorithms from a numerical point of view, with the later reward of running faster than sequential algorithms when these components are executed on (e.g.) a cluster of machines. In this case, a global population is defined virtually on a set of interacting local populations. The strength and frequency of the interactions and the sizes of the subpopulations define the scheme of parallelization. One popular and well-studied scheme is the island framework. Islands are groups of semi-independent individuals, or subpopulations with weak bindings among neighbor islands. This cooperation occurs in the form of a migration of some individuals from one island to another. This technique admits an easy parallelization, which has been widely investigated in the field of EAs [2, 8]. The existing results can be easily extended to the EDA domain, either purely or merged with other levels of parallelization [1, 22, 23, 27, 28]. Recently, a new spatially structured class of algorithms (cellular EDAs), with a high coupling among neighbor subpopulations, has been introduced by Ochoa et al. [29]. In Section 9.3.4 we will see that it is a development based on both an EDA and a cellular GA [2, 4]. Fitness evaluation is another important component of the cost of an EDA, particularly for real-world applications. However, at this level, there are no new issues in EDAs that were not found previously in the parallelization of other EAs. The above list of levels is not exhaustive. There are other procedures that can be parallelized in an EDA; for example, selection (see Section 9.3.4). Moreover, EDA research, which is just at its beginning, started [26, 25] as a combination of evolutionary theory with graphical models [34]. Therefore, it is expected that new and more complex methods that will require the application of parallel techniques will be introduced in the field. It is also worth noting that parallelizations at different levels can coexist in a single EDA. For instance, a spatially structured algorithm can be combined with a master-slave model for fitness function evaluation and/or local computation of marginals in a kind of hybrid model or search. Also, running different EDAs in parallel in a cooperative manner could yield more accurate solutions where other algorithms find difficulties (in fact, this is an ongoing research line of the authors).
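As a concrete illustration of the steps in Algorithm 1 (and of the estimation and sampling levels discussed above), the following sketch shows a simple univariate EDA of the UMDA type for binary strings; the truncation selection and the univariate product model are choices made only for the example and do not represent the more complex Bayesian-network-based EDAs mentioned in this section.

```python
# Minimal univariate EDA (UMDA-style) for binary strings; assumption: maximize
# a user-supplied fitness function over {0,1}^n.
import random

def umda(fitness, n, pop_size=100, truncation=0.5, generations=50):
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the best M individuals (truncation selection).
        pop.sort(key=fitness, reverse=True)
        selected = pop[: int(truncation * pop_size)]
        # Estimation: univariate marginal probabilities p(x_i = 1 | selected set).
        probs = [sum(ind[i] for ind in selected) / len(selected) for i in range(n)]
        # Sampling: generate a new population from the estimated distribution.
        pop = [[1 if random.random() < probs[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
    return max(pop, key=fitness)

# Example usage: OneMax, whose optimum is the all-ones string.
best = umda(fitness=sum, n=30)
```

Both the estimation of the marginals and the sampling loop in this sketch are trivially data-parallel, which is precisely what the learning (L) and sampling (S) levels above exploit.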
9.3 PARALLEL MODELS FOR EDAS

In this section we review three models of parallel EAs that have been applied to EDAs, namely the master/slave, the distributed, and the cellular models.

9.3.1 Master/Slave in EDA
Probably the easiest way to parallelize an EA is to distribute the evaluation of the fitness function among several slave processors while one master executes the other parts of the algorithm (selection, generation of new points, etc.). In EDAs this model can also be applied to the estimation of the probability distribution. For example, in addition to computing fitness in parallel, the slave processors compute marginal distributions and the master collects the results. In either case, the algorithm performs like a sequential one from the point of view of its numerical results, but considerably faster. Figure 9.1 shows the master/slave scheme.
Fig. 9.1 Master/slave parallelization scheme.
9.3.1.1 Parallel Fitness Evaluation in EDAs. The master/slave model for fitness evaluation, also known as the global model, has an easy implementation. It has no influence at all on the search itself but can drastically reduce the computational time when the fitness function is costly. The execution time of master/slave models has two components: the time used in local computations and the time used to communicate information among processors. In a parallel EDA, the former time is largely determined by the size of the population. Small population sizes often are not compatible with successful searches. In those cases, the only possible way to reduce time is the parallelization of the fitness evaluation process.
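A minimal sketch of the global (master/slave) fitness evaluation follows; the use of a process pool and the even split of the population are illustrative choices made under the assumption of homogeneous workers, not a prescription from this chapter.

```python
# Sketch of master/slave (global) fitness evaluation: the master keeps the EDA
# loop, the slaves only evaluate individuals in parallel.
from multiprocessing import Pool

def expensive_fitness(individual):
    # Placeholder for a costly objective function.
    return sum(individual)

def evaluate_population(population, n_slaves=4):
    with Pool(processes=n_slaves) as pool:
        # Static load balancing: the population is split evenly among the slaves.
        return pool.map(expensive_fitness, population)

# fitnesses = evaluate_population(pop)  # used inside the EDA loop instead of
#                                       # evaluating individuals one by one
```

With heterogeneous processors, a demand-driven assignment (e.g., handing out small chunks of individuals as slaves become idle) corresponds to the dynamic load balancing discussed next.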
One critical issue is the load balance between the processors that execute the evaluation of the fitness (the slaves). If the processors have the same amount of available computational power, the load can be distributed statically, assigning the same amount of work to each processor. Alternatively, if the available computational power differs, the load has to be assigned dynamically, according to demand. It is worth noting that it is advisable to keep the frequency of communication between the processors low, to reduce an overhead that can become considerable as the number of computers grows.

9.3.1.2 Learning the Probability Distribution in Parallel. For EDAs, the master/slave model is also suitable for the estimation step, which is one of the most time-consuming phases of the method. The reason is that the cost of learning a probability distribution associated to a Bayesian network is exponential in the maximum number of parents. The application of the master/slave scheme to this step takes advantage of the decomposability properties of most learning algorithms. Both the BDe and the BIC score metrics are decomposable [16]. The computation of the metric components is accomplished in different slave processors, and the master collects the partial results to calculate the complete score. Another possible related parallelization involves the computation of marginal distributions. In this case, slave processors are responsible for computing and sending the marginals to the master.
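The decomposability mentioned above can be exploited as in the sketch below, where each slave scores the families of a subset of variables and the master adds up the per-variable terms; the local_score function and the per-variable task split are assumptions made for illustration (any decomposable metric such as BIC or BDe fits this pattern).

```python
# Sketch of master/slave scoring of a candidate Bayesian network structure.
# 'structure' maps each variable to its parent set; for a decomposable metric the
# total score is the sum of independent per-variable (family) terms.
from multiprocessing import Pool

def local_score(args):
    variable, parents, data = args
    # Placeholder for a decomposable family score, e.g., a BIC or BDe term
    # computed from the counts of (variable, parents) configurations in 'data'.
    return 0.0

def network_score(structure, data, n_slaves=4):
    tasks = [(v, parents, data) for v, parents in structure.items()]
    with Pool(processes=n_slaves) as pool:
        partial_scores = pool.map(local_score, tasks)   # slaves work independently
    return sum(partial_scores)                          # master combines the terms
```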
9.3.2 Distributed Estimation of Distribution Algorithms

The coarse-grain computational model has been largely studied in the EA community as well as in other branches of optimization and learning. In this field the resulting algorithm could be called distributed EDA or dEDA. The island model [10] features geographically separated subalgorithms, each one having its own subpopulation of relatively large size. These subalgorithms may exchange information with a given frequency, e.g., by allowing some individuals to migrate from one island to another. The main idea of this approach is to periodically re-inject diversity into subpopulations which would otherwise converge prematurely. As could be expected, different islands will tend to explore different portions of the search space in parallel and to provide independent solutions to the same problem [33]. Within each subpopulation, a standard sequential (single-population or panmictic) EA is usually executed between migration phases, although nothing prevents using other classes of algorithms. The behavior of distributed EDAs is controlled by several parameters that affect their efficiency and precision. Among others, we have to choose the number of subpopulations and their sizes, the connection topology, the number of migrants (alternatively it can be defined as a migration rate), the frequency of the migrations, and the criteria for selecting the migrants and the replaced individuals. The importance of these parameters in the quality of the search and the induced efficiency has been largely studied [15, 32, 4], although the optimal values clearly depend on the problem being solved.
9.3.2.1 Migration Policy in a Parallel Distributed Evolutionary Algorithm. The working principles of any distributed EA include a communication phase, which is governed by a migration policy. The migration policy determines how communication is carried out by the islands of the distributed EA, and it is defined by five parameters:

• Number of migrants (m). The number of individuals exchanged among the islands, m ∈ {0, 1, 2, ...}. The value 0 means in this case no interaction at all among the subpopulations (idle search). Alternatively, this parameter can be expressed as a subpopulation percentage or rate.

• Migration frequency (r). The number of generations in isolation, r ∈ {0, 1, 2, ...}. Alternatively, it can be measured as the number of function evaluations before migration, which is more appropriate when comparing algorithms having a different step grain (in terms of the number of evaluations).

• Policy for selecting migrants (S). The migration selection can be made according to any of the selection operators available in the literature (fitness proportional, tournament, etc.), e.g., S = {best, random}. The most used are truncation (select the best) and random selection.

• Policy for migration replacement (R). It is used for integrating the incoming individuals in the target subpopulation, e.g., R = {worst, random}. It decides which individuals will be replaced by the incoming migrants.

• Synchronization. A flag indicating whether the islands perform regular blocking input/output from/to another island or whether individuals are integrated whenever they arrive, at any moment during the search.
In practice, combinations of the above techniques are also possible. In fact, the exchanged information does not need to consist of individuals; it could also consist of statistics or parameters meaningful for guiding the distributed search.
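To make the five parameters concrete, the following sketch groups them in a small configuration object and applies one synchronous migration step between two islands. The data layout (a list of (fitness, individual) pairs per island) and the helper names are assumptions made for this example.

```python
# Sketch of one migration step under a simple policy (best migrants replace worst).
from dataclasses import dataclass
import random

@dataclass
class MigrationPolicy:
    m: int = 1                  # number of migrants
    r: int = 1                  # migration frequency (generations in isolation)
    select: str = "best"        # policy for selecting migrants: "best" or "random"
    replace: str = "worst"      # policy for replacement in the target island
    synchronous: bool = True    # blocking exchange vs. integrate-on-arrival

def migrate(source, target, policy):
    """source/target: lists of (fitness, individual); higher fitness is better."""
    if policy.select == "best":
        migrants = sorted(source, key=lambda s: s[0], reverse=True)[:policy.m]
    else:
        migrants = random.sample(source, policy.m)
    if policy.replace == "worst":
        target.sort(key=lambda s: s[0])          # worst individuals first
        target[:policy.m] = migrants             # newcomers overwrite the worst
    else:
        for i in random.sample(range(len(target)), policy.m):
            target[i] = migrants.pop()
    return target
```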
9.3.2.2 The EDA Domain. After discussing the different parameters affecting the parallelization of EAs, we now move to a discussion from the point of view of the EDA domain. The asynchronous/synchronous dEDA algorithm can be seen as the combination of d islands, each one executing an EDA algorithm.
This is graphically depicted in Figure 9.2. The main idea of the distributed algorithm is to execute an EDA in each island and to verify periodically (e.g., after the generation of each new individual) whether the migration step has been reached. In that case, there is an exchange of individuals with the neighbors according to the selected topology and the rest of the migration parameters.
Fig. 9.2 Distributed Estimation of Distribution Algorithm (dEDA).
The arriving individuals replace the selected individuals; e.g., worst or random individuals are replaced by the newcomers. In our case, the best individuals of the source island can be selected to replace the worst individuals of the target neighboring island. This choice is expected to induce a larger selection pressure that will hopefully accelerate the convergence of the algorithm as a whole [7]. In a more general conception, each island of a dEDA could execute a different EDA, resulting in a heterogeneous dEDA, which also represents a very interesting open research line. For example, we could have a dEDA of four islands where the first one executes UMDA, the second one LFDA, the third one EBNA, and the last one PADA (see Figure 9.3), all of them different flavors of EDA algorithms. Each algorithm, depending on the problem, has potential advantages and weaknesses that could be combined with the features of the other algorithms. To deal with the differences in the execution times of each algorithm, we suggest that the distributed algorithm be implemented asynchronously, in order to better exploit the available computing power. In a different heterogeneous scenario, each island could execute the same base algorithm, but with different parameters.

9.3.3 A Numerical Example
The simulations presented in this section are aimed at illustrating what was said in the previous two sections. A simple binary function, OneMax, that returns the number of variables set to 1 in a vector is used. It has n + 1 different fitness values, multinomially distributed, and its global optimum is located at the point (1, 1, ..., 1). All tests were executed on four Pentium 4 machines at 2.4 GHz with 512 MB of RAM running Linux, interconnected with a Gigabit Ethernet network and using the LAM implementation of the MPI standard.
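For reference, the benchmark function is trivial to express in code. The short sketch below also shows how an artificial delay can be injected to emulate a costly fitness function, a device used later for the master/slave experiments; the delay value chosen here is only illustrative.

```python
import time

def onemax(x):
    """Number of variables set to 1; the global optimum is the all-ones vector."""
    return sum(x)

def costly_onemax(x, delay=0.01):
    """OneMax made artificially expensive, as done for the master/slave tests."""
    time.sleep(delay)            # emulate an expensive evaluation
    return sum(x)
```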
Fig. 9.3 Conceptual dEDA running different EDAs in each island (heterogeneity).
We test one master/slave and one island-based distributed model with UMDA [26]. The master/slave model is only tested for fitness evaluation, because there is no need to parallelize the estimation of the probability distribution: the model is known in advance to be a fully independent network. All the algorithms tested use truncation selection with ratio 0.3 (30% of the population) and no elitism (the new generation completely replaces the old one). The population size N is 400 and the individual size is 1000. The algorithms stop after finding the optimum (hit) or after reaching 40,000 evaluations. The number of evaluations is averaged over 100 independent runs. For the distributed versions, the number of function evaluations is the sum of the evaluations carried out by each island. All the results (fitness evaluations and speedup) are average values over the successful runs. To test the master/slave model, we make the OneMax function costly by introducing some delay in its evaluation. We show how an asynchronous distributed UMDA (dUMDA) and a master/slave UMDA (gUMDA) improve the execution times of the sequential UMDA. The speedup can be defined as follows:
Speedup = sequential time / parallel time
To evaluate the speedup we use the taxonomy proposed in [3]. This taxonomy divides the speedup analysis into two types: strong speedup (type I) and weak speedup (type II). Strong speedup compares the parallel algorithm with the best-so-far sequential algorithm. The second measure compares the parallel algorithm against its own sequential version, which is more pragmatic in most works. Before the algorithm runs, the stop condition is set (finding the optimum) and then the speedup is measured (type II.A). In this work we select a weak speedup (type II.A.2), called Orthodox. This type of analysis runs the same algorithm (dUMDA) with four islands over 1, 2, and 4 processors to provide a meaningful comparison, i.e., the algorithm run on one processor is not the panmictic UMDA but the dUMDA itself (in parallel studies it is not fair to compare times against a different algorithm, since an arbitrary result could be obtained).
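Before turning to the results, a sequential UMDA with the settings just described (population 400, individuals of length 1000, truncation ratio 0.3, stop at the optimum or at 40,000 evaluations) can be sketched as follows. This is an illustrative reconstruction of the baseline algorithm, not the exact code used in the original experiments.

```python
import numpy as np

def umda(fitness, n=1000, pop_size=400, trunc=0.3, max_evals=40000, seed=None):
    """Minimal sequential UMDA: sample, select by truncation, re-estimate marginals."""
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)                               # univariate marginal probabilities
    best, best_fit, evals = None, float("-inf"), 0
    while evals < max_evals:
        pop = (rng.random((pop_size, n)) < p).astype(np.uint8)
        fits = np.array([fitness(ind) for ind in pop])
        evals += pop_size
        i = int(np.argmax(fits))
        if fits[i] > best_fit:
            best, best_fit = pop[i].copy(), float(fits[i])
        if best_fit >= n:                             # OneMax optimum found (a "hit")
            break
        selected = pop[np.argsort(fits)[-int(trunc * pop_size):]]
        p = selected.mean(axis=0)                     # new model: marginal frequencies
    return best, best_fit, evals

# Hypothetical usage: best, best_fit, evals = umda(lambda x: int(x.sum()))
```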
Figure 9.4 shows the results with the master/slave gUMDA. In this case the communications are frequent, and the system can only be expected to yield good results if the fitness function takes a considerable time to evaluate.
Fig. 9.4 Speedup for the master/slave model based on the gUMDA algorithm (OneMax function; speedup versus number of processors).
The results for the distributed dUMDA are presented in Figure 9.5. In this case each deme is connected to its neighbor in a ring topology. The migration strategy was to migrate only one individual after each generation. Note that the speedup is not significant. This happens because, in this case, the savings in computational time brought by the use of multiple demes are not enough to overcome the increase in communications. We could have obtained clearer results in the two parallel cases (global and distributed), but our goal here is just to put these two algorithms to work. In a really difficult problem, the global approach should yield good speedup values up to a given number of computers, while the distributed approach could be especially good beyond this point by taking advantage of its decentralized numerical search. This is of course not mandatory, but it is the behavior that many researchers usually find in their daily experiments with similar EAs.

9.3.4 Cellular Estimation of Distribution Algorithms
In this section we review a recent development in the design of parallel and distributed EDA algorithms: Cellular Estimation of Distribution Algorithms (cEDAs) [29]. This is a new computational model that can be considered an extension of both EDAs [26, 25] and cEAs [4, 3, 2].
Fig. 9.5 Speedup for the distributed model based on the dUMDA algorithm.
Fig. 9.6 Example of cellular organization in a cEDA. A global population of 12 x 20 individuals (small squares) is organized in a 3 x 5 toroidal grid of cells (large squares) containing 16 individuals each. The so-called von Neumann neighborhood of the central cell is also shown.
A cEDA is a collection of collaborating EDAs, also called member algorithms, that evolve overlapping populations. One distinctive feature of this class of algorithms is that selection is decentralized at the level of the member algorithms. We recall that
selection in other cEAs usually occurs at the recombination level. More precisely, in a traditional cEA the reproductive loop is performed inside each of the numerous string pools: a given string has its own pool defined by neighboring strings, and at the same time, one string belongs to many pools. In a cEDA, by contrast, the reproductive loop is performed inside each of the numerous subpopulation pools, also called local populations. A given subpopulation, usually called a cell, has its own pool defined by neighboring subpopulations, and at the same time, one subpopulation belongs to many pools. The set of all subpopulations defines a partition of the global population. The organization of cEDAs is based on the traditional 2D structure of overlapped neighborhoods. That structure is better understood in terms of two grids: one consisting of strings and another consisting of disjoint sets of strings (cells). Figure 9.6 shows a global population of 12 x 20 strings (small squares) partitioned into a 3 x 5 toroidal grid of cells (large squares) containing 16 strings each. A short notation for cellular grids is often used; for example, the grid of Figure 9.6 is labelled 4 x 4 - 3 x 5. Each iteration of a cEDA consists of exactly one iteration of the member algorithms, each of which is responsible for updating exactly one subpopulation. The computation of the local EDA model, and therefore also selection, is carried out on the basis of the local population defined by the neighborhood of the cell. Figure 9.6 shows a local population of 80 strings defined by the so-called von Neumann neighborhood of the central cell. Cellular EDAs are a kind of combination of the distributed (dEA) and cellular (cEA) models [2]. Usually, a dEA has a small number of nodes, many tens of individuals in each node, and performs sporadic communications. On the other hand, a cEA has a large number of nodes, usually a single individual per node, and performs tight and frequent communication with neighboring nodes. Both dEAs and cEAs directly suggest parallel execution on MIMD and SIMD computers, respectively [3], although the model itself can run sequentially, just as a new kind of search in EDAs. In contrast, a cEDA may have a small or large number of nodes, and the number of individuals in each node ranges from a few tens to many hundreds. Besides, cEDAs perform tight and frequent communication with neighboring nodes and are suitable for execution on MIMD computers and for some distributed implementations (clusters of computers). Cellular EDAs have the same updating strategies found in cGAs, namely synchronous and asynchronous update [14]. In the former strategy all cells are updated at the same time, whereas asynchronous update means that the updates are independent events. A critical issue in a cEDA is the computation of the probabilistic model, because it is usually time consuming. However, the computational overhead can be alleviated if the structural part of the model is computed only once. In fact, there are several alternative learning schemes for cEDAs:
• Learning the structure from the global population and the parameters from the local populations. In this case the computation of the structure is carried out only once. A selection step is performed globally to infer the structure of the problem. Then selection is applied locally and the probabilities of the model are computed.

• Learning the structure from the global population but computing the necessary statistics locally and then integrating the results with an appropriate statistical method. This seems to be a promising method that could help reduce the population size requirements of a cEDA significantly.

• Learning the structure and the parameters from the local populations. This is the most time-consuming method and the one that is expected to give the best results. However, this is a subtle matter, since it is not clear with which accuracy it is necessary to learn the structure of the local populations. This remains an interesting open question.
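To make the grid organization of Figure 9.6 concrete, the sketch below shows one way to enumerate the cells of a von Neumann neighborhood on a toroidal cell grid and to gather the corresponding local population. The data layout (a dictionary mapping cell coordinates to lists of strings) is an assumption made for this example.

```python
# Sketch: von Neumann neighborhood of cells on a toroidal grid (see Figure 9.6).
def von_neumann_cells(cx, cy, grid_w, grid_h):
    """Coordinates of a cell and its four orthogonal neighbors (toroidal wrap-around)."""
    offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    return [((cx + dx) % grid_w, (cy + dy) % grid_h) for dx, dy in offsets]

def local_population(cell, cells, grid_w, grid_h):
    """cells: dict mapping (x, y) -> list of strings; returns the local population."""
    pool = []
    for coord in von_neumann_cells(*cell, grid_w, grid_h):
        pool.extend(cells[coord])
    return pool

# With 4 x 4 cells on a 3 x 5 grid this yields 5 * 16 = 80 strings, as in the text.
```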
Finally, we illustrate the communication process in a cEDA with a simple example. Let us suppose that the search distributions of all member algorithms have the same structure. This can be the case when optimizing an additive function, because its structure is often a good approximation of the probabilistic model of its search distribution [25]. Besides, we assume that each cell is allocated to a separate processor for execution. As a first step, the distributed selection is applied on the local population: for example, distributed truncation selection finds, in each cell, which individuals belong to the selected set of the local population. Then each processor computes the marginals on the selected subset of its cell. Finally, all marginals are sent to the processor of the central cell of the neighborhood, where they are added to compute the local EDA model. Potential parallelizations of a cEDA could lead to fast performance, since its intrinsic fine grain is somewhat hidden inside the separated populations and thus the communication overhead can be managed in a cluster of computers.

9.3.4.1 A Sample Simulation with cEDA. To illustrate the power of a cEDA we present the following result borrowed from [29]. The OneMax function with 1000 variables is once again the benchmark. The behaviors of two algorithms are compared: a synchronous cellular UMDA (cUMDA) and a sequential UMDA. The two algorithms are run 1000 times for at most 70 generations, using a truncation selection of 0.3 and neither elitism nor mutation. The cUMDA uses the grid 2 x 2 - 10 x 10 and the compact C41 neighborhood, which is depicted in Figure 9.7. Therefore, it has a local population of size 164, a global population of size 400, and a cell size of 4 strings. Note that there are 100 member algorithms. The sequential UMDA uses a population of 400 strings. Table 9.1 shows the results of the experiment. The columns of the table stand for the success rate, the generation where the optimum is found, and the number of function evaluations, respectively. The obvious observation that can be drawn from the table is that the algorithms have a similar behavior. However, what can we say about the computational time?
Fig. 9.7 In a square template of 9 x 9 cells the compact C41 neighborhood is shown. The cells marked with a small black square belong to the neighborhood and define the local population of the central cell. Note that each cell represents a subpopulation.

Table 9.1 Comparing a synchronous cellular and a sequential UMDA for the OneMax function with 1000 variables. The cellular algorithm uses the compact neighborhood C41 and the grid 2 x 2 - 10 x 10. The sequential algorithm uses a population of 400 strings.
Algorithm            Success    Generation        Evaluations
Sequential UMDA      99%        40.38 ± 0.62      16553.5 ± 247.152
Synchronous cUMDA    98%        40.62 ± 0.61      16374.0 ± 219.845
Let us assume that each member algorithm computes the fitness function for the strings of its central cell. In other words, it is possible to set up a master/slave scheme for the computation of the fitness function. Due to the synchronous update strategy used, this means that, theoretically (if we had 100 processors), we could evaluate the whole population in a time close to the time spent by the sequential algorithm in evaluating just 4 strings! This is a tremendous gain, whose importance is proportional to the cost of the fitness function. For the computation of the marginals we can use a master/slave scheme as well: every slave computes the counts associated with its central cell and sends the result to a master processor. At the same time, the master provides every slave with the counts of its neighboring cells, which are needed to compute the marginals of the local population. It is worth noting that decreasing the population size of the sequential algorithm also decreases its success rate. However, the cellular variant is able to work with a much smaller local population (164 strings) without a reduction in the success rate.
Although we do not present numerical results on the timing, the above should be enough to give the reader an idea of the power of the method. A detailed discussion will be publicly available soon in [29].

9.4 A CLASSIFICATION OF PARALLEL EDAS
Tables 9.2 and 9.3 review the most important parallel EDAs developed so far (sorted by publication date). The tables give a short description of the main features of the algorithms and also report the level of the implemented parallelism (column Level) and whether they are able to deal with continuous and/or discrete nodes (column Type). Most parallel implementations concentrate on algorithms where the complexity of model learning is high.

Table 9.2 Works in Parallel EDAs before 2003
Algorithm         Ref.   Year   Main Feature                                                       Level   Type
pBOA              [27]   2000   Parallel learning of the Bayesian network. Each processor inserts  L,S     D
                                edges independently thanks to the generation of variable
                                permutations. The sampling step uses a pipeline of processors.
PA1BIC, PA2BIC    [22]   2001   Parallel learning of the Bayesian network using multi-thread       L       D
                                programming. This algorithm profits from the decomposability
                                of the BIC metric.
dBOA              [28]   2001   Coarse-grain parallelism used to learn the Bayesian network in     L,S     D
                                a distributed environment. The sampling of the new solutions
                                is also distributed.
Although it is a new field, parallel EDAs have attracted interest in the recent past. Lozano et al. [22] proposed two parallel versions of an algorithm that uses probabilistic graphical models in combinatorial optimization (EBNA_BIC). This approach requires that the communication be established by means of shared data structures accessed in a multithread environment, a major disadvantage on the popular kind of distributed-memory computers such as clusters.
Table 9.3 Works in Parallel EDAs since 2003
Algorithm     Ref.   Year   Main Feature                                                         Level   Type
P2BIL         [1]    2003   Framework to develop distributed EDAs. It simulates the migration    P       D
                            using two vectors of probabilities.
pEBNA_BIC     [23]   2003   Extension of the algorithms PA1BIC, PA2BIC using multithread and     L       D
                            distributed programming. It allows using multi-computers and
                            workstation clusters at the same time.
pEBNA_PC      [23]   2003   Identical to the previous one, but uses the algorithm EBNA_PC [12]   L       D
                            to learn the Bayesian network.
pEGNA_EE      [23]   2003   Parallel version of the algorithm EGNA_EE [19, 20] for continuous    L       C
                            domains.
pcGA          [21]   2004   Massively parallel implementation of the compact Genetic             L       D
                            Algorithm. This algorithm propagates the probability vectors
                            among the computers.
pEBNA_BIC     [24]   2004   An extension of the algorithm pEBNA_BIC using distributed            S       D
                            sampling of the new solutions.
cEDA          [29]   2004   Cellular EDAs.                                                        All     D,C
The works developed by Ocenasek and Schwarz [27, 28] are of great importance. They proposed two methods of parallelization for an algorithm belonging to the EBNA family: the Bayesian Optimization Algorithm (BOA) [30]. The central idea of the pBOA [27] is to parallelize the creation of the Bayesian network, so that each processor introduces arcs independently of the other processors; to guarantee that the resulting graph is acyclic, a permutation of the problem variables is generated in each generation. This algorithm uses a pipeline process organization for the generation of the new individuals, thus allowing each processor to generate portions of the individuals. In the case of the dBOA algorithm [28], the learning of the Bayesian
network is carried out in a distributed environment in which each processor keeps a copy of the population at each generation. The new generation is also created in a distributed way, allowing processors to generate portions of the population separately. Mendiburu et al. [23] proposed several parallel implementations of sequential EDA algorithms, not only for the discrete domain but also for the continuous one. All the implementations use two parallel programming interfaces (APIs): MPI [13] and POSIX threads [6]. This permits combining a multi-thread environment with message exchange in a distributed platform. The first implemented algorithm is pEBNA_BIC, where the sequential behavior is preserved and parallel processing is used to accelerate those portions of the algorithm that consume the largest share of the execution time. In particular, it is centered on the EBNA_BIC algorithm; in this way the works [22, 27, 28] are extended, allowing the learning of the Bayesian network without imposing restrictions on the order of the problem variables, and using not only multi-thread environments but also distributed ones. pEBNA_PC is the second algorithm implemented in this work; its functionality is similar to the previous one, but the learning algorithm is replaced by one based on EBNA_PC [23]. Then, the pEGNA_EE algorithm is developed, parallelizing the learning of EGNA_EE for continuous domains. This constitutes an important result, because it is a leading work in parallelism with EDAs for continuous domains. A manager/worker model is the main model in [23]: one of the processes acts as the manager and performs the centralized search, sending and gathering information to/from the slave processes. The need to send the population to all the slaves for learning is an important shortcoming of these parallel algorithms, reducing their efficiency because of the high communication overheads. Ahn et al. [1] proposed a general framework to develop distributed EDAs. Unlike classic distributed algorithms, where migration exchanges individuals among the islands, this algorithm simulates the process by using two probability vectors, one to keep the resident individuals of the island (TPV) and another for the individuals that arrive at the island (IPV). With these two vectors the search proceeds in three phases: generation, selection, and update (learning). The framework has a great advantage, since it requires little communication of data. In order to show the operation of the proposed framework, the algorithm P2BIL is implemented, which uses the PBIL algorithm [5] in the learning step. Lobo et al. [21] presented an architecture for massive parallelization of the compact Genetic Algorithm (cGA). The architecture uses the manager/worker model, in such a way that the manager is in charge of keeping, in a centralized way, the probability vector that represents the individuals, and each worker executes a cGA. At the end of each generation the workers communicate with the manager in order to update the centralized probability vector. The proposed scheme presents three fundamental advantages: first, its low synchronization costs; second, it is fault tolerant; and third, it is scalable. Finally, Mendiburu et al. [24] implement an extension of the pEBNA_BIC algorithm that consists in performing the generation of the new individuals in a distributed form.
Each slave receives from the manager process a variable ordering and the probabilities needed to generate a portion of the population, and sends the generated portion back to the master. Like its predecessors, it suffers from high communication costs.
It is interesting to note that many current implementations are a kind of iterative improvement on previous algorithms, such as pEBNA. This algorithm evolved from an implementation that only used multithreading to a new implementation that combines threading with distributed processing, and that not only distributes the estimation step but also generates the new individuals in a distributed way. Another important issue for future advances is to clarify and classify the parallel EDAs according to the implemented level of parallelism. The main efforts have been focused on approaches that improve the time of the learning process, although there are also proposals that parallelize the simulation step [27, 28]. On the other hand, the population level of parallelism has been applied to EDAs whose learning complexity is rather low. The only exception is the yet-unpublished work on cEDAs [29], where all the levels of parallelism are combined harmoniously with the population level. Observe that most of the implementations have been developed for discrete domains. This seems to be a consequence of the current state of the art of EDA research, where an adequate balance between the discrete and continuous domains is still missing. Fortunately, this situation is changing; thus it is expected that new parallel implementations for continuous domains will appear in the near future. Nonetheless, some exceptions exist, like [23] and the mentioned cEDA strategy, which can deal with continuous and discrete nodes in the same way. In summary, much work remains to be done to unify parallel EDAs and to provide a clear list of advantages/drawbacks for each of them. In this sense, existing works on other parallel metaheuristics will be of capital importance to guide the research in this field. We conclude with the following observation: there are not many reported applications of parallel or sequential EDAs to real-world problems of large dimensions in academic and industrial domains. Hopefully, this scenario will change, and most probably parallel EDAs will help achieve this goal.

9.5 CONCLUSIONS
In this work we have reviewed the main parallel models for EDAs. We have discussed the master/slave model, the distributed (island-based) model, and the (new) cellular model, and how they are applied to EDAs. The advantages of sequential EDAs with respect to other traditional EAs (where such advantages hold) are amplified when a parallel EDA is used. Finally, a summary of existing works has been presented with the aim of giving the reader a unifying landscape. One of the objectives has been to offer newcomers a starting point on this topic and to promote the application of these algorithms to practical and complex problems.
Acknowledgments

E. Alba acknowledges partial funding by the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project).
REFERENCES

1. C.W. Ahn, D. E. Goldberg, and R.S. Ramakrishna. Multiple-deme parallel estimation of distribution algorithms: Basic framework and application. Technical Report 2003016, University of Illinois at Urbana-Champaign, 2003.

2. E. Alba. A survey of parallel distributed genetic algorithms. Complexity, 4(4):31-52, 1999.

3. E. Alba. Parallel evolutionary algorithms can achieve superlinear performance. Information Processing Letters, 82(1):7-13, April 2002.

4. E. Alba and J. M. Troya. Influence of the migration policy in parallel distributed GAs with structured and panmictic populations. Applied Intelligence, 12(3):163-181, 2000.

5. S. Baluja. Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning. Technical Report CMU-CS-94-163, Carnegie Mellon University, Pittsburgh, PA, 1994.
6. D. R. Butenhof. Programming with POSIX Threads. Addison-Wesley Professional Computing Series, 1997.
7. E. Cantú-Paz. Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic Press, 2000.

8. E. Cantú-Paz and D. E. Goldberg. Predicting speedups of idealized bounding cases of parallel genetic algorithms. In T. Bäck, editor, Proceedings of the Seventh International Conference on GAs, pages 113-120. Morgan Kaufmann, San Francisco, 1997.

9. D. Chickering, D. Geiger, and D. Heckerman. Learning Bayesian networks: Search methods and experimental results. In Proceedings of the 5th Conference on Artificial Intelligence and Statistics, pages 112-128, 1995.

10. J. P. Cohoon, S. U. Hedge, W. N. Martin, and D. Richards. Punctuated equilibria: A parallel genetic algorithm. In J. J. Grefenstette, editor, Proceedings of the Second International Conference on Genetic Algorithms, pages 148-154, 1987.

11. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B(39):1-38, 1977.
12. R. Etxeberria and P. Larrañaga. Global optimization with Bayesian networks. In II Symposium on Artificial Intelligence, CIMAF99, Special Session on Distributions and Evolutionary Optimization, pages 332-339, 1999.

13. M.P.I. Forum. MPI: A message-passing interface standard. The International Journal of Supercomputer Applications and High Performance Computing, 8(3/4):119-416, 1994.

14. M. Giacobini, E. Alba, and M. Tomassini. Selection intensity in asynchronous cellular evolutionary algorithms. In E. Cantú-Paz et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference GECCO'03, pages 955-966, 2003.

15. J. J. Grefenstette. Parallel adaptive algorithms for function optimization. Technical Report CS-81-19, Vanderbilt University, 1981.

16. D. Heckerman. A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06, Redmond, WA, USA, 1995.
17. M. Henrion. Propagating uncertainty in Bayesian networks by probabilistic logic sampling. Uncertainty in Artificial Intelligence, 2:317-324, 1988.

18. P. Larrañaga. A review on estimation of distribution algorithms. In P. Larrañaga and J. A. Lozano, editors, Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation. Kluwer Academic Publishers, 2001.

19. P. Larrañaga, R. Etxeberria, J. A. Lozano, and J. M. Peña. Optimization by learning and simulation of Bayesian and Gaussian networks. Technical Report KZZA-IK-4-99, Department of Computer Science and Artificial Intelligence, University of the Basque Country, 1999.

20. P. Larrañaga, R. Etxeberria, J. A. Lozano, and J. M. Peña. Optimization in continuous domains by learning and simulation of Gaussian networks. In A. S. Wu, editor, Proceedings of the 2000 Genetic and Evolutionary Computation Conference Workshop Program, pages 201-204, 2000.

21. F. G. Lobo, C. F. Lima, and H. Martires. An architecture for massive parallelization of the compact genetic algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2004), 2004.

22. J. A. Lozano, R. Sagarna, and P. Larrañaga. Parallel estimation of distribution algorithms. In P. Larrañaga and J. A. Lozano, editors, Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation, pages 129-145. Kluwer Academic Publishers, 2001.

23. A. Mendiburu, J.A. Lozano, and J. Miguel-Alonso. Parallel estimation of distribution algorithms: New approaches. Technical Report EHU-KAT-IK-1-3, Department of Computer Architecture and Technology, The University of the Basque Country, 2003. Submitted to IEEE Transactions on Evolutionary Computation.
24. A. Mendiburu, J. Miguel-Alonso, and J.A. Lozano. Implementation and performance evaluation of a parallelization of estimation of Bayesian networks algorithms. Technical Report EHU-KAT-IK-XX-04, Department of Computer Architecture and Technology, The University of the Basque Country, 2004. Submitted to Parallel Computing.

25. H. Mühlenbein, T. Mahnig, and A. Ochoa. Schemata, distributions and graphical models in evolutionary optimization. Journal of Heuristics, 5(2):213-247, 1999.

26. H. Mühlenbein and G. Paas. From recombination of genes to the estimation of distributions I. Binary parameters. Lecture Notes in Computer Science, Parallel Problem Solving from Nature PPSN IV, 1141:178-187, 1996.

27. J. Ocenasek and J. Schwarz. The parallel Bayesian optimization algorithm. In Proceedings of the European Symposium on Computational Intelligence, pages 61-67, Slovak Republic, 2000.

28. J. Ocenasek and J. Schwarz. The distributed Bayesian optimization algorithm for combinatorial optimization. In EUROGEN 2001 - Evolutionary Methods for Design, Optimization and Control, pages 115-120, Athens, Greece, 2001.

29. A. Ochoa, M. Soto, and E. Alba. Cellular estimation of distribution algorithms. In preparation, 2005.

30. M. Pelikan, D. E. Goldberg, and E. Cantú-Paz. BOA: The Bayesian Optimization Algorithm. In W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference GECCO-99, volume 1, pages 525-532. Morgan Kaufmann Publishers, San Francisco, CA, 1999. Orlando, FL.

31. M. Soto, A. Ochoa, S. Acid, and L. M. de Campos. Introducing the polytree approximation of distribution algorithm. In Second Symposium on Artificial Intelligence, Adaptive Systems, CIMAF 99, pages 360-367, La Habana, 1999.

32. R. Tanese. Parallel genetic algorithm for a hypercube. In J. J. Grefenstette, editor, Proceedings of the Second International Conference on Genetic Algorithms, pages 177-183. Hillsdale, NJ: Lawrence Erlbaum Associates, 1987.

33. D. L. Whitley. An executable model of a simple genetic algorithm. In D. L. Whitley, editor, Proceedings of the Second Workshop on Foundations of Genetic Algorithms, pages 45-62, 1992.
30. M. Pelikan, D. E. Goldberg, and E. Cant&Paz. BOA: The Bayesian Optimization Algorithm. In W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference GECCO-99, volume 1, pages 525-532. Morgan Kaufmann Publishers, San Francisco, CA, 1999. Orlando, FL. 31. M. Soto, A. Ochoa, S. Acid, and L. M. de Campos. Introducing the polytree aproximation of distribution algorithm. In Second Symposium on Artijicial Intelligence. Adaptive Systems. CIMAF 99, pages 360-367, 1999. La Habana. 32. R. Tanese. Parallel genetic algorithm for hypercube. In J. J. Grefenstette, editor, In Proceedings of the Second International Conference on Genetic Algorithms, pages 177-183, 1987. Hillsdale, NJ: Lawrence Erlbaum Associates. 33. D. L. Whitley. An executable model of a simple genetic algorithm. In D. L. Whitley, editor, In Proceedings of the Second Workshop on Foundations of Genetic Algorithms, pages 45-62, 1992. 34. J. Whittaker. In Graphical Models in Applied Multivariate Statistics. John Wiley & Sons, Inc., 1990.
10 Parallel Scatter Search

FELIX GARCIA LOPEZ, MIGUEL GARCIA TORRES, BELEN MELIAN BATISTA, JOSE A. MORENO PEREZ, J. MARCOS MORENO VEGA

Universidad de La Laguna, Spain
10.1 INTRODUCTION

Scatter Search (SS) [26] is an evolutionary algorithm (EA) in which a moderate-size set of solutions evolves through mechanisms of intelligent combination between solutions. Unlike other combination strategies, like Genetic Algorithms, the search for a local optimum is a guided task. In order to carry out this strategy, a moderate-size reference set (RefSet) is selected from a wide population of solutions. This RefSet is generated and iteratively updated, attempting to intensify and diversify the search. After combining the solutions in the reference set, a local search procedure is applied to improve the resulting solution, and the RefSet is updated to incorporate both good and disperse solutions. These steps are repeated until a stopping condition is met. The method provides not only a single heuristic solution, like other metaheuristics, but also a reduced set of disperse high-quality solutions. Parallel implementations of metaheuristics appear quite naturally as an effective alternative to speed up the search for approximate solutions of combinatorial optimization problems. They not only allow solving larger problems or finding improved solutions with respect to their sequential counterparts, but they also lead to more precise random algorithms. We say that a random algorithm Alg_A is more precise than a random algorithm Alg_B if, after running both algorithms the same number of times, Alg_A reaches objective function values with a smaller standard deviation. Several parallelization strategies can be applied to solve selection problems with a Scatter Search procedure. The parallelism can be used to run several sequential Scatter Searches in different processors or to accelerate the local search involved in the Scatter Search. In addition, another possible parallelization consists of running several combination strategies simultaneously by using different processors. The aim of this chapter is to describe the design and implementation of the Scatter Search metaheuristic parallelized to solve selection problems. The selection
problems are those problems that can be solved by choosing an optimal set of items from a universe. A generic selection problem consists in choosing the set of items that minimizes a cost function subject to some constraints. The objective functions for these problems range from very simple functions to functions whose evaluation requires solving another hard problem or even performing a simulation process. Likewise, the set of constraints can be so restrictive that only one selection is feasible, or so loose that all selections of items are feasible. Some of the most relevant selection problems that appear in combinatorial optimization and machine learning are the Travelling Salesman Problem, the Knapsack Problem, the Median Problem, the Spanning Tree Problem, the Shortest Path Problem, the Network Flow Problem, the Matching Problem, the Steiner Problem, the Subset Selection Problem, the Clustering Problem, and the Classification Problem. A fixed-size selection problem of size p, named a p-selection problem, consists in choosing the set of p items that minimizes a cost function subject to some constraints; i.e., it is a selection problem where all the feasible solutions have size p. Among them are the p-Median Problem and the p-Clustering Problem. The p-Median Problem is a discrete location-allocation problem, whose objective is to select, from a discrete candidate set of facility points, p locations such that the sum of the distances from a set of users to the chosen facility points is minimized. We consider the application of Parallel Scatter Search to the p-Median Problem and to the Feature Subset Selection Problem. In the Feature Subset Selection Problem, used in a classification task, the size of the solutions is not fixed. Given a set of instances characterized by several features, the classification task consists of assigning a class to each instance. The Feature Subset Selection Problem selects a relevant subset of features from the initial set in order to classify future instances. We show in this chapter several ways of parallelizing the Scatter Search that were applied to the mentioned problems. The next section describes the Scatter Search metaheuristic, and its parallelizations are described in Section 10.3. Sections 10.4 and 10.5 define the p-Median Problem and the Feature Subset Selection Problem and summarize the main characteristics of the components of the Scatter Search for these problems. The computational experiences on these problems are analyzed in Section 10.6. Finally, the main conclusions are reported in Section 10.7.
10.2 SCATTER SEARCH
Scatter Search (SS) [26] is an EA that combines good solutions of a reference set (RefSet) to construct others, exploiting the knowledge of the problem at hand. Genetic Algorithms [17], [21] are also EAs, in which a population of solutions evolves by using the mutation and crossover operators; these operators rely significantly on randomization to create new solutions. Unlike the population in GAs, the RefSet of solutions in SS is relatively small. The principles of the Scatter Search metaheuristic were first introduced in the 1970s as an extension of formulations for combining decision rules and problem constraints. This initial proposal generates solutions taking into account characteristics of several parts of the solution space [11]. An important feature of Scatter Search is
its association with the Tabu Search metaheuristic and the fact that the search can be improved by including particular forms of adaptive memory and associated memory-exploiting mechanisms [12]. Scatter Search has an implicit form of memory, which can be considered an inheritance memory, since it keeps track of the best solutions found during the search and selects their good features to create new solutions. Since this inheritance memory is not sufficient, the adaptive memory principles of Tabu Search can improve the effectiveness of Scatter Search. Consequently, the Scatter-Tabu Search hybrid has been widely applied to solve optimization problems. The Scatter Search template, proposed by Glover in 1998 [13], summarizes the general description of Scatter Search given in [12]. Scatter Search consists of five component processes: the Diversification Generation Method, which generates a set of diverse solutions; the Improvement Method, which improves a solution to reach a better solution; the Reference Set Update Method, which builds and updates the reference set consisting of RefSetSize good solutions; the Subset Generation Method, which produces subsets of solutions of the reference set; and the Solution Combination Method, which combines the solutions in the produced subsets. A comprehensive description of the elements of Scatter Search can be found in [13], [14], [15], and [16]. The basic Scatter Search procedure (see Figure 10.1) starts by generating a large set of diverse solutions Pop using the Diversification Generation Method. This procedure creates the initial population (Pop), which must be a wide set consisting of diverse and good solutions. Several strategies can be applied to get a population with these properties; the solutions to be included in the population can be created, for instance, by using a random procedure to achieve a certain level of diversity. An Improvement Method is applied to each solution obtained by the previous method, reaching a better solution, which is added to Pop. A set of good representative solutions of the population is chosen to generate the reference set (RefSet). The good solutions are not limited to those with the best objective function values. The reference set considered here consists of RefSetSize1 solutions with the best objective function values and RefSetSize2 diverse solutions, so that RefSetSize = RefSetSize1 + RefSetSize2. The reference set is generated by first selecting the RefSetSize1 best solutions in the population and then adding, RefSetSize2 times, the most diverse solution in the population. Several subsets of solutions from the RefSet are then selected by the Subset Generation Method. The Solution Combination Method combines the solutions in each subset, taking into account their good features. Then, the Improvement Method is applied to the result of the combination to get an improved solution. Finally, the Reference Set Update Method uses the obtained solution to update the reference set.

10.3 PARALLEL SCATTER SEARCH

Although metaheuristics provide quite effective strategies for finding approximate solutions to combinatorial optimization problems, the computational times associated with the exploration of the solution space may be very large. With the proliferation of
procedure Sequential Scatter Search
begin
    Create Population;
    Generate Reference Set;
    repeat
        repeat
            Subset Generation Method;
            Solution Combination Method;
            Improvement Method;
        until (StoppingCriterion1);
        Reference Set Update Method;
    until (StoppingCriterion2);
end.

Fig. 10.1 Sequential Scatter Search Metaheuristic Pseudocode.
parallel computers, parallel implementations of metaheuristics appear quite naturally as an alternative to speed up the search for approximate solutions. Moreover, parallel implementations also allow solving larger problems or finding improved solutions, with respect to their sequential counterparts, due to the partitioning of the search space. Therefore, parallelism is a way not only to reduce the running time of local search algorithms and metaheuristics but also to improve their effectiveness and precision. Three parallelizations of Scatter Search to solve the p-Median Problem were proposed in [9]. The aim of these strategies was to reduce the running time of the algorithm and increase the exploration of the solution space; moreover, some of these strategies improved the quality of the solutions. The first two strategies reduce the running time of the procedure: the first parallelization reduces the running time of every local search, and the second performs the local searches in parallel from the results of the combinations. The last parallelization increases the exploration of the solution space by running the sequential Scatter Search in parallel for several populations. More precise implementations of Scatter Search, using different combination methods and parameter settings at each processor, can also be obtained, leading to high quality solutions for different classes of instances of the same problem, without too much effort in parameter tuning and with the same execution time as the sequential algorithm. With this purpose, two different combination methods designed for the Feature Subset Selection Problem were run simultaneously in different processors in [10].
1. Synchronous Parallel Scatter Search. In the sequential Scatter Search algorithm, the most time-consuming part of every iteration is the local search. Therefore, a synchronous algorithm that solves the local searches in parallel was proposed. We denote the Synchronous Parallel Scatter Search
algorithm that parallelizes the local search by SPSS. Figure 10.2 shows the pseudocode of the SPSS with n_pr processors.

procedure SPSS
begin
    Create Population;
    repeat
        Generate Reference Set;
        repeat
            Subset Generation Method;
            Solution Combination Method;
            (* Improvement Method *)
            Take N ← N(CurSol); (* the neighborhood of CurSol *)
            Divide N into n_pr subsets N_r, r = 1, ..., n_pr;
            repeat
                Set ImpSol ← CurSol;
                for each processor r = 1, ..., n_pr, do in parallel
                    Set X_r ← CurSol;
                    for each X ∈ N_r do
                        if Cost(X) < Cost(X_r) then X_r ← X;
                for r = 1, ..., n_pr do
                    if Cost(X_r) < Cost(CurSol) then CurSol ← X_r;
            until Cost(ImpSol) = Cost(CurSol);
        until (StoppingCriterion1);
        Reference Set Update Method;
    until (StoppingCriterion2);
end.

Fig. 10.2 Synchronous Parallel Scatter Search.
Note that this pseudocode is obtained from the Sequential Scatter Search by replacing the local search with its parallel version. In SPSS, the neighborhood N of the solution is divided into n_pr subsets N_r. These subsets are assigned to the processors, and each processor returns an improving neighbor in its subset of the neighborhood. The best solution among the neighbors provided by the processors is chosen as the current solution. This strategy is a low-level parallelism.
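As an informal illustration of the SPSS idea, the sketch below evaluates a partitioned neighborhood with a pool of worker processes and keeps the best improving neighbor. The shared-memory multiprocessing approach and the `cost` and `neighborhood` placeholders are assumptions made for this example only.

```python
# Sketch: synchronous parallel exploration of a partitioned neighborhood.
from multiprocessing import Pool

def cost(solution):
    """Placeholder objective function (to be replaced by the real one)."""
    return sum(solution)

def best_in_chunk(chunk):
    """Best neighbor found in one subset N_r of the neighborhood."""
    return min(chunk, key=cost)

def parallel_local_search(cur_sol, neighborhood, n_pr=4):
    """neighborhood(sol) is assumed to return a list of candidate neighbors."""
    with Pool(n_pr) as pool:
        while True:
            neighbors = neighborhood(cur_sol)
            chunks = [neighbors[r::n_pr] for r in range(n_pr)]   # split N into N_1..N_n_pr
            candidates = pool.map(best_in_chunk, [c for c in chunks if c])
            best = min(candidates, key=cost)
            if cost(best) < cost(cur_sol):
                cur_sol = best                                    # move to the improving neighbor
            else:
                return cur_sol                                    # local optimum reached
```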
2. Replicated Combination Scatter Search. Another parallel algorithm that also reduces the running time of the procedure was proposed. The procedure is parallelized by selecting several subsets from the reference set, which are combined and improved by the processors. These steps are replicated as many times as the number of available processors. The local optima found by the processors are used to update the reference set. This method is the Replicated Combination Scatter Search (RCSS), described in Figure 10.3.

procedure RCSS
begin
    Create Population;
    Generate Reference Set;
    repeat
        repeat
            for each processor r = 1, ..., n_pr, do in parallel
                Subset Generation Method;
                Solution Combination Method;
                Improvement Method;
        until (StoppingCriterion1);
        Reference Set Update Method;
    until (StoppingCriterion2);
end.

Fig. 10.3 Replicated Combination Parallel Scatter Search Pseudocode.
3. Replicated Parallel Scatter Search. The Replicated Parallel Scatter Search (RPSS) consists of a multistart search in which the local searches are replaced by Scatter Search methods using different populations that run on the parallel processors (see Figure 10.4). The RPSS parallelization corresponds to a natural parallelization of a hybrid between Scatter Search and Multistart Search.
procedure RPSS
begin
    for each processor r = 1, ..., n_pr, do in parallel
        Create Population;
        Generate Reference Set;
        repeat
            repeat
                Subset Generation Method;
                Solution Combination Method;
                Improvement Method;
            until (StoppingCriterion1);
            Reference Set Update Method;
        until (StoppingCriterion2);
end.

Fig. 10.4 Replicated Parallel Scatter Search Pseudocode.
4. Multiple Combinations Scatter Search. This Parallel Scatter Search applies a different combination method at each processor (r = 1, ..., n_pr). Figure 10.5 shows the pseudocode of the Multiple Combinations Scatter Search (MCSS).
SolutionCombinationMethod_r denotes the combination method applied by processor r. The development of several combination methods for the Scatter Search metaheuristic has been explored in previous works. For example, Campos et al. [4] designed different combination methods for a sequential implementation of Scatter Search for the linear ordering problem. They also assessed the relative contribution of each method to the quality of the final solution and, based on the results obtained, used the combination method that presented the best performance. In our computational experiments, however, we run two combination methods simultaneously by using two processors. Scatter Search requires high computational times, which makes the sequential execution of several consecutive combination methods impractical. The goal of the proposed parallelization is to improve the quality of the solution using the same computational time as the sequential algorithm.
procedure MCSS
begin
    Create Population;
    Generate Reference Set;
    repeat
        repeat
            Subset Generation Method;
            for each processor r = 1, ..., n_pr, do in parallel
                Solution Combination Method_r;
                Improvement Method;
        until (StoppingCriterion1);
        Reference Set Update Method;
    until (StoppingCriterion2);
end.
10.4
Multiple Combinations Scatter Search Pseudocode.
APPLICATION OF SCATTER SEARCH TO THE p-MEDIAN PROBLEM
10.4.1 The p-Median Problem
The p-facility location-allocation problems constitute a wide class of logistic problems. Consider a set L of m potential locations for p facilities and a set U of locations of 71 given users. A y-facility location-allocationproblem consists of locating simultaneously p facilities at locations of L and allocating every user u to a facility point in order to minimize a cost function that represents the resource used for serving the demand of every user from the corresponding facility point. One of the most relevant
230
PARALLEL SCATTER SEARCH
p-facility location problems is the p-Median Problem. The p-Median Problem consists of locating p facilities in order to minimize the total distance between the users and its closest facility. Given the set L = (211, v2,...,v,} of potential locations for the facilities (or location points) and the set U = { u l ,u2, ..., u,} of users (or customers, or demand points), theentriesofann xmmatrix D = ( d , j ) n x m = ( D Z S ~ ( U , , U givethe ,)),~~~ distances travelled (or costs incurred) for satisfying the demand of the user located at u,from the facility located at vJ for all uJ E L and u,E U . The objective of the p-Median Problem is to minimize the sum of these distances (or transportation costs). i.e..
1x1
where X C L and = p. The p-Median Problem is NP-hard [23]. Many heuristics and exact methods have been proposed to solve it. Exact algorithms are provided by Beasley [2] and Hanjoul and Peeters [18], among others. Classical heuristics for this problem often cited in the literature are Greedy [25], Alternate [29], and Interchange [35]. In addition, several hybrids of these heuristics have been suggested. Another type of heuristics suggested in the literature is based on the relaxed dual of the integer programming formulation of thep-Median Problem and uses the well-known Dual Ascent heuristic, DUALOC [6]. In [31], a 1-interchange move is extended into a so-called I-chainsubstitution move, which is applied to the p-Median Problem. Another Tabu Search heuristic is suggested by Voss [37], where some variants of the so-called reverse elimination method are discussed. The Variable Neighborhood Search heuristic and its variants have also been applied to the p-Median Problems [ 191 [20]. In [8], several parallelization methods of the Variable Neihborhood Search are considered for the p-Median Problem. For most instances, the set of potential locations for the facilities and the set of locations of the users coincide, so that L = U and in = n. In this case, a solution for the p-Median Problem consists of selecting a set X of p points from U to locate the facilities. The solution is evaluated by a cost function which is the sum of the distances from the users to the points in the solution. This cost function is given by
Regardless of the search technique employed, it is necessary to specify a solution coding, which encodes alternative candidate solutions for manipulation. The choice of the coding that provides an efficient way of implementing the moves and evaluating the solutions is essential for the success of the heuristic. Every solution X is encoded by arranging all the points of U in an array [u, : i = 1,...,n],where w, is a point in X for i 5 p , and it is a point out of X for i > p . The neighborhood of every solution with the interchange moves is identified by the set I = { ( i , j ) : 1 5 i 5 p , p < j 5 n}. For each ( i , j )E I the corresponding neighbor X L jof X is obtained, = - {vz} {Vj}.
x,, x
+
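To fix ideas, the cost function and the interchange neighborhood just described can be written compactly as follows. The distance matrix is assumed to be given as a list of lists indexed by user and location, and the solution is encoded as a plain list of location indices; both are assumptions made for this illustration.

```python
# Sketch: p-median cost and 1-interchange neighborhood (solution = list of p indices).
def cost(X, dist, users):
    """Sum over all users of the distance to the closest open facility in X."""
    return sum(min(dist[u][v] for v in X) for u in users)

def interchange_neighbors(X, locations):
    """All solutions obtained by swapping one point of X with one point outside X."""
    outside = [v for v in locations if v not in X]
    for i, _ in enumerate(X):
        for v_out in outside:
            yield X[:i] + [v_out] + X[i + 1:]
```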
10.4.2 Scatter Search Components
This section summarizes an implementation of the components of the Scatter Search for the p-Median Problem, explaining the key methods mentioned above.

1. Create Population. In the first place, the set L of points is divided into
several disjoint subsets. In each subset L_i, the following constructive method is applied. The method selects an arbitrary initial point u of L_i and performs the following operation p − 1 times: select the point farthest from the already selected points. Since the constructed solution depends on the initial point, the method is applied from several starting points to obtain different solutions for each subset L_i. The Improvement Method is then applied to each solution obtained by this method.
With the purpose of evaluating the dispersion among solutions, a distance between them is considered. This distance is defined using the same objective function. Let f_Y(X) be the objective function restricted to the set of users in Y, that is,

f_Y(X) = Σ_{u ∈ Y} min_{v ∈ X} Dist(u, v).

The distance between two solutions X and Y is then Dist(X, Y) = f_Y(X) + f_X(Y). Given a previously fixed size PopSize for the population Pop, α is the proportion of these PopSize solutions that are selected according to the objective function value. The remaining solutions up to PopSize are obtained by a scoring procedure. For each solution X let the score be defined by
h(X) = Cost(X) − β · Dist(X, Pop), where Dist(X, Pop) = min_{Y ∈ Pop} Dist(X, Y) and β is a fixed factor. The [(1 − α) PopSize] best solutions according to h(·) are included in the population Pop (a sketch of this scoring is given after the list of components).
2. Generate Reference Set. The reference set is generated by selecting RefSetSize1 of the best solutions and RefSetSize2 disperse solutions with respect to those RefSetSize1 solutions (RefSetSize = RefSetSize1 + RefSetSize2). After including the best RefSetSize1 solutions in RefSet, the algorithm iteratively includes in RefSet the solution farthest from the solutions already in RefSet, repeating this procedure RefSetSize2 times.

3. Subset Generation Method. The selection of subsets for the combination usually consists in selecting all the subsets of a fixed size r (r = 2 was used in the computational experience). However, in order to avoid repeating combinations when some solutions of the reference set do not vary, information on the combinations already performed is kept.
4. Solution Combination Method. Given two solutions, in the first place this method selects the points common to both solutions. Let X be the set of these points. For every point u E L \ X let
L(u) = {v ∈ L : Dist(u, v) ≤ γ · Dist_max}, where Dist_max = max_{u,v ∈ L} Dist(u, v).
Choose the point u* ∈ L such that Dist(X, u*) = max_{u ∈ L} Dist(X, u) and select at random a point u ∈ L(u*) to be included in X. This step is applied iteratively until |X| = p.
5. Improvement Method. Given a solution, the Improvement Method performs a local search with interchange moves, which replace a point in the solution by a point outside the solution.

6. Reference Set Update Method. Let ImpSolSet be the set of all the solutions reached by the Improvement Method. Apply Generate Reference Set to the set RefSet ∪ ImpSolSet.
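A hedged C sketch of the cost evaluation underlying the Improvement Method follows: each user is assigned to its closest selected point and the distances are summed, so that a current solution and one interchange neighbor can be compared. The matrix and sizes are illustrative placeholders.

#include <stdio.h>

#define NUSERS 4
#define NLOC   5
#define P      2

static const double d[NUSERS][NLOC] = {
    {1, 4, 6, 3, 8}, {5, 2, 7, 4, 1}, {3, 6, 2, 8, 5}, {7, 1, 4, 2, 6}
};

static double cost(const int sol[P]) {
    double total = 0.0;
    for (int u = 0; u < NUSERS; u++) {
        double best = d[u][sol[0]];
        for (int k = 1; k < P; k++)
            if (d[u][sol[k]] < best) best = d[u][sol[k]];
        total += best;                 /* user u served by its closest median */
    }
    return total;
}

int main(void) {
    int sol[P] = {0, 1};               /* current solution X             */
    int neighbor[P] = {0, 4};          /* one interchange neighbor X_ij  */
    printf("Cost(X)    = %.1f\n", cost(sol));
    printf("Cost(X_ij) = %.1f\n", cost(neighbor));
    return 0;
}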
10.5 APPLICATION OF SCATTER SEARCH TO FEATURE SUBSET SELECTION

10.5.1 The Feature Subset Selection Problem

The feature subset selection problem consists of finding a subset of the original set of features such that an induction algorithm using only these features provides the best performance. Selecting the optimal feature subset is an NP-hard optimization problem [24]. Therefore exact algorithms should not be used, due to the complexity of the problem. For example, determining the optimal binary decision tree is an NP-hard problem [22]. Several heuristic algorithms have been proposed for solving the feature subset selection problem. Among the most widely used metaheuristics are Genetic Algorithms, which have been proposed and analyzed for the feature subset selection problem in [7], [34], [36], and [40]. Let A be a set of given instances, which are characterized by d features X = {X_j : j = 1, ..., d}. Each feature is either a nominal or a linear attribute. An attribute is linear if the difference between two of its values is meaningful (the attribute being discrete or continuous); otherwise it is nominal. Furthermore, each instance has a label that indicates the class to which it belongs. In order to carry out the task of classifying by means of supervised learning, a subset of instances T ⊂ A whose labels are known and can be used as training examples, and the subset V = A \ T of instances to be classified (validation instances), are considered. The labels of V will only be used to measure the performance of the classifier. In the feature subset selection problem, the set of features with the best performance must be obtained. The accuracy percentage is often used to measure the performance
of a classifier. Then, the associated optimization problem consists of finding the subset S ⊆ {X_j : j = 1, ..., d} with the highest accuracy percentage. However, this percentage can only be estimated using the validation instances, since V is only a subset of the set of instances to be classified. The k-fold cross-validation method is widely used to estimate the accuracy percentage of a subset of features S on a given set of instances B. The method proceeds in the following way. The set of instances B is randomly divided into k disjoint subsets of equal size B_1, B_2, ..., B_k, and k trials are carried out. In trial i, B_i is the test set and the training set is the union of the other subsets, T_i = B \ B_i. In each trial, the test instances are classified using the learning algorithm. The estimated accuracy percentage of the classifier is the average of the accuracy percentages over all the trials. The estimated accuracy percentage of a subset of features S on a given set of instances B using cross-validation is stated as follows:

f_B(S) = 100 · |{a ∈ B : ĉ_a = c_a}| / |B|,     (10.1)

where c_a is the class of each instance a and ĉ_a is the class assigned by the classifier. In the computational experience the IB1, Naive Bayes, and C4.5 decision tree classifiers (see [30]) provided by the Weka Machine Learning Project [39] were used as inductive classifiers. If IB1 is used, for each instance v in the test set its nearest example t in the training set is calculated and both are considered to belong to the same class and to have the same label (i.e., ĉ_v = c_t, with c_t the label of t and ĉ_v the label of the class assigned to v). The distance function considered was the heterogeneous Euclidean overlap metric (HEOM), which can handle both nominal and linear attributes [38]. The overlap metric is applied to nominal attributes, and the normalized Euclidean distance is used for linear attributes. Let t = (t_1, t_2, ..., t_d, c_t) be an example with value t_j for the j-th feature and label c_t, and let v = (v_1, v_2, ..., v_d, c_v) be an instance, with similar notation. Let S ⊆ {X_j : j = 1, ..., d} be the feature subset considered. The distance between t and v is defined as
HEOM(t, v) = sqrt( Σ_{X_j ∈ S} dist(t_j, v_j)^2 ),

with

dist(t_j, v_j) = 1, if t_j or v_j is unknown;
dist(t_j, v_j) = dist_o(t_j, v_j), if X_j is nominal;
dist(t_j, v_j) = dist_l(t_j, v_j), if X_j is linear;

where dist_o is the overlap metric and dist_l is the normalized Euclidean distance. That is,

dist_o(t_j, v_j) = 0 if t_j = v_j, and 1 otherwise,

and

dist_l(t_j, v_j) = |t_j − v_j| / (max_j − min_j),

where max_j and min_j are respectively the maximum and minimum values of the feature X_j in the training set. Note that, since the validation set is assumed unknown, the normalized distance can be greater than 1 for instances outside the training set.
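The following hedged C sketch (not the Weka implementation) illustrates the HEOM distance restricted to a feature subset S: unknown values contribute 1, nominal features use the overlap metric, and linear features use the range-normalized difference. Feature types, ranges, and the two instances are illustrative.

#include <math.h>
#include <stdio.h>

#define D 3                      /* illustrative number of features */

typedef struct { double value; int known; } Attr;

static const int    is_nominal[D] = {1, 0, 0};
static const double fmin[D] = {0, 0.0, 10.0};   /* per-feature min in training set */
static const double fmax[D] = {0, 5.0, 50.0};   /* per-feature max in training set */

static double heom(const Attr t[D], const Attr v[D], const int in_S[D]) {
    double sum = 0.0;
    for (int j = 0; j < D; j++) {
        if (!in_S[j]) continue;                    /* only features in S count */
        double dj;
        if (!t[j].known || !v[j].known)
            dj = 1.0;                              /* unknown value            */
        else if (is_nominal[j])
            dj = (t[j].value == v[j].value) ? 0.0 : 1.0;   /* overlap metric  */
        else
            dj = fabs(t[j].value - v[j].value) / (fmax[j] - fmin[j]);
        sum += dj * dj;
    }
    return sqrt(sum);
}

int main(void) {
    Attr t[D] = {{1, 1}, {2.0, 1}, {20.0, 1}};
    Attr v[D] = {{1, 1}, {4.5, 1}, {35.0, 0}};
    int  S[D] = {1, 1, 1};
    printf("HEOM(t, v) = %.3f\n", heom(t, v, S));
    return 0;
}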
The Naive Bayes classifier is a practical method that is very appropriate when the attributes describing the instances are conditionally independent given the classification. Given the attributes t = (t_1, t_2, ..., t_d) that describe an instance, the most probable class is
c_t = argmax_{c ∈ Class} P(c | X_1 = t_1, X_2 = t_2, ..., X_d = t_d).

By Bayes' theorem,

c_t = argmax_{c ∈ Class} P(X_1 = t_1, X_2 = t_2, ..., X_d = t_d | c) · P(c).

Then, assuming the conditional independence, the Naive Bayes classifier is stated as

c_t = argmax_{c ∈ Class} P(c) · Π_{j=1}^{d} P(X_j = t_j | c).
In practical applications the theoretical probabilities are replaced by their estimations. Each probability is estimated by the corresponding frequencies in the training set. One of the two major objections to this method is the case where none of the training instances of a given class has a given attribute value: if P(X_j = t_j | c) = 0, then every instance with this value cannot be classified in class c. Therefore, modified estimations of these probabilities are used. The other major objection is that the conditional independence assumption is often violated in real applications. However, the method works well even in that case, because it is only needed that

argmax_{c ∈ Class} P(X_1 = t_1, ..., X_d = t_d | c) · P(c) = argmax_{c ∈ Class} P(c) · Π_{j=1}^{d} P(X_j = t_j | c),

and the feature selection procedure helps to choose those attributes that are conditionally independent given the classification.
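A minimal C sketch of frequency-based Naive Bayes estimation with a Laplace correction follows; it is one common way to avoid the zero-probability problem mentioned above, and the chapter does not say which correction the Weka classifier actually uses. Counts and the tiny two-class, one-feature setup are illustrative.

#include <stdio.h>

#define NCLASS 2
#define NVALS  3   /* possible values of the single nominal feature */

int main(void) {
    /* count[c][v] = training instances of class c with feature value v */
    int count[NCLASS][NVALS] = {{3, 0, 2}, {1, 4, 0}};
    int class_total[NCLASS]  = {5, 5};
    int n_train = 10;

    int t = 1;                          /* observed feature value of the instance */
    int best_c = -1;
    double best_post = -1.0;
    for (int c = 0; c < NCLASS; c++) {
        double prior = (double)class_total[c] / n_train;
        /* Laplace correction: add 1 to each count, NVALS to the denominator */
        double cond = (count[c][t] + 1.0) / (class_total[c] + NVALS);
        double post = prior * cond;     /* proportional to P(c | X = t) */
        printf("class %d: score = %.4f\n", c, post);
        if (post > best_post) { best_post = post; best_c = c; }
    }
    printf("predicted class: %d\n", best_c);
    return 0;
}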
The C4.5 algorithm is an improvement of the classical ID3 (Iterative Dichotomiser 3) method for constructing a decision tree. The improvement includes methods for dealing with numeric attributes, missing values, and noisy data, and for generating rules from trees. The basic ID3 is a "divide and conquer" method that works as follows. First, it selects an attribute test to place at the root node and makes a branch for each possible result of the test. Usually, each test involves only one attribute and one branch is made for each possible value of the attribute. Then, this splits the training set into subsets and the process is repeated recursively with each branch, using only those instances that actually reach the branch. When all the instances at a node have the same classification, that part of the tree stops being developed.
The algorithms must determine the test to place at each node. ID3 uses the information gain criterion to construct the decision tree. The information gain is measured by the purity of the set of instances corresponding to each branch, and the purity of a set of instances is measured by the amount of information required to specify the class of one instance that reaches the branch. The use of the gain ratio is one of the improvements made to ID3 to obtain the C4.5 algorithm. The gain ratio is a modification of the information gain measure that compensates for its tendency to prefer attributes with a large number of possible values: it takes into account the number and size of the daughter nodes into which a test splits the training set. For the purpose of guiding the search for the best subset of features (training) and measuring the effectiveness of a particular subset of features after the search algorithm has chosen it as the solution of the problem (validation), the function of Equation 10.1 for 2-fold cross-validation is used. To guide the search, f_T(·) is considered, and to measure the effectiveness, f_V(·) is used. In validation, 5 x 2 cross-validation (5 x 2cv) [5], which consists of dividing the set V into two folds and then conducting two trials, was considered. This is done for 5 random arrangements of V. However, in training, 1 x 2cv, where only one arrangement is used, was considered.
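The following hedged C sketch illustrates the bookkeeping of a single 2-fold estimate behind Equation 10.1: each half is classified by a model built on the other half and the percentage of correct labels is reported. The "classifier" is a trivial majority-vote stub, used only so the estimate is runnable; it is not one of the classifiers used in the chapter.

#include <stdio.h>

#define N 8

static int majority(const int labels[], const int idx[], int m) {
    int ones = 0;
    for (int i = 0; i < m; i++) ones += labels[idx[i]];
    return (2 * ones >= m) ? 1 : 0;     /* stub: predict the majority class */
}

int main(void) {
    int label[N] = {0, 1, 1, 0, 1, 1, 1, 0};
    int fold1[N / 2] = {0, 1, 2, 3};
    int fold2[N / 2] = {4, 5, 6, 7};
    int correct = 0;

    /* trial 1: train on fold2, test on fold1; trial 2: the reverse */
    int pred1 = majority(label, fold2, N / 2);
    for (int i = 0; i < N / 2; i++) correct += (pred1 == label[fold1[i]]);
    int pred2 = majority(label, fold1, N / 2);
    for (int i = 0; i < N / 2; i++) correct += (pred2 == label[fold2[i]]);

    double f_B = 100.0 * correct / N;   /* accuracy estimate as in Equation 10.1 */
    printf("estimated accuracy = %.1f%%\n", f_B);
    return 0;
}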
10.5.2 Scatter Search Components

This section summarizes an implementation of the components of Scatter Search for the Feature Subset Selection Problem, explaining its key methods.

1. Create Population. For the feature subset selection problem, the size of the solution space depends on the number of features of the problem. Therefore, the size of the initial population is fixed depending on the number of features. PopSize = d2, where d is the number of features, was used.
In order to build a solution using the Diversification Generation Method, the vector of weights of the features P(X) = (P(X_1), ..., P(X_d)), given by P(X_j) = f_T({X_j}), is considered. These weights indicate the quality of each feature for classifying by itself. Let L be the set of features X_j with the highest weights P(X_j). The Diversification Generation Method, stated in Figure 10.6, consists of iteratively selecting at random one of the |L| best possible features (according to P) while its inclusion improves the set.

2. Generate Reference Set. In general, the Reference Set Update Method is applied to generate and update the reference set. However, for this problem two different methods, Generate Reference Set and Diversification Generation Method, were developed to carry out these tasks. Let C be the set of features that belong to any solution already in the reference set, i.e.,

C = ∪_{S ∈ RefSet} S.
(a) Set S = ∅.
(b) Repeat
    i. Select at random a feature from L. Let X_j* be the selected feature.
    ii. If f_T({X_j*} ∪ S) ≥ f_T(S) then S ← S ∪ {X_j*}; let X_j ∉ L be the feature with the highest P(X_j), and set L ← (L \ {X_j*}) ∪ {X_j}.
    until no improvement is reached.
Fig. 10.6 Diversification Generation Method.
The diversity of each solution S, Div(S), is given by the symmetric difference between S and C, Div(S, C), defined as follows:

Div(S) = Div(S, C) = |(S ∪ C) \ (S ∩ C)|.

The algorithm proposed to generate the reference set is described in Figure 10.7.

(a) Initialize:
    i. Let RefSet be the empty set.
    ii. Add to RefSet the RefSetSize1 best solutions in Pop.
    iii. Obtain the initial set of features: C = ∪_{S ∈ RefSet} S.
(b) Repeat:
    i. For each S ∉ RefSet, calculate Div(S, C).
    ii. Set S* = argmax{ Div(S, C) : S ∉ RefSet }.
    iii. RefSet ← RefSet ∪ {S*}.
    iv. RefSetSize ← RefSetSize + 1.
    v. Update C.
    until RefSetSize = RefSetSize1 + RefSetSize2.

Fig. 10.7 Generate Reference Set Method.
3. Subset Generation Method. As is usual in the application of Scatter Search, all the subsets of two solutions in the current reference set were considered. The solutions in each subset are then combined to construct other solutions.

4. Solution Combination Method. This procedure combines good characteristics of the selected solutions to get new current solutions. Two combination methods, both greedy strategies, were considered.
Let 5’1and S2 be the solutions in the subset. Each combination method generates two new solutions, S; and Sh. We will refer to the first strategy as greedy combination (GC) and to the second as reduced greedy combination (RGC). They both start by adding to the new solutions S; and S i the features common to S1 and SZ. Then, at each iteration, one of the remaining features in S1or S2 is added to S; or Sa.The reduced version only considers those features that have appeared in good solutions. The description of the greedy combination (GC) is stated in Figure 10.8. (a) Initialize new solutions:
S; S,
+
+-
s1n s2.
s1n s,.
Let C = (S1u S2) \ (S1 n S2).
(b) Repeat
i. for each feature X , E C evaluate f~ ( S ; U { X , }) and f~ (Sb U { X , }) . ii. Let j ; and j2+be the features such that
fT(s;u {X,: 1) = my{fT(s;u { X , ) ) } and
f&
u { X I ; 1) = max{fr(S; u { X , 111 ,
respectively. iii. 1 f f ~ ( S U ; { X , ; } ) > ~ T ( S or ; )f~(Sh U { X J ; } )> f~(S,i) then
A. I f f d S ; IJ { X , ; } ) > ~ T ( S and ; ) f~(Sa U {X,;}) I fds;),then set k = 1. B. I f f T V ; u { X , ; } ) L fT(s;)and fT($ u { X , ; } ) > fT(Sh),then set k = 2.
> f~(s;) and f~(Sa U { X I ; } ) > f~(Sa), then set k = a r g m a q fT(s;u {X,; )), fT(Siu { X J ; } ) } . If fT($ u { X , ; } ) = f~(Sa U { X , ; } ) then k is the index (1 or 2) corresponding
C. I f f d S ;
U {X,;})
to the solution with the smallest number of features. If both solutions have the same number of features then choose k randomly. Add X J ; to the solution S i , set C = C \ X,; and go to 2. until there is no improvement; i.e., ~ T ( Su;{ X J ; } ) 5 fr(S;) and
f~(Sh U
{X,;}) 5 fT(Sh Fig. 10.8 The GC combination.
The reduced greedy combination strategy (RGC) differs from the first one in that, instead of considering the whole set of features in C = (S1 ∪ S2) \ (S1 ∩ S2), it only uses the features with the highest accuracy percentages. The initial
set C is reduced by applying the following procedure. Let Q be a weight vector defined in the following way: for each feature X_j ∈ C, Q(X_j) is the average estimated accuracy percentage over all the analyzed solutions containing the feature X_j. Let Q̄ be the average of the values Q(X_j) such that X_j ∈ C,

Q̄ = (1/|C|) Σ_{X_j ∈ C} Q(X_j).

The RGC strategy uses the features X_j ∈ C such that Q(X_j) ≥ Q̄.
5. Improvement Method. The Improvement Method is applied to every solution S generated by the combination method explained above. Let C_A be the set of features that do not belong to the solution S. The features X_j ∈ C_A are ordered according to their weights P(X_j). The improvement method is described in Figure 10.9.

(a) Let X_(1), ..., X_(|C_A|) be the features of C_A ordered such that P(X_(1)) ≥ P(X_(2)) ≥ ... ≥ P(X_(|C_A|)).
(b) j ← 0.
(c) Repeat:
    i. j ← j + 1.
    ii. If f_T(S ∪ {X_(j)}) ≥ f_T(S), then S ← S ∪ {X_(j)}.
    until j = |C_A|.
Fig. 10.9 The Improvement Method.
The aim of the method is to add to the solution those features that improve it. All the solutions reached by the Improvement Method are recorded in a pool, ImpSolSet, which is then used to update the reference set.

6. Reference Set Update Method. Finally, after obtaining all the improved solutions, RefSet is updated according to intensity and diversity criteria. First, the method selects the RefSetSize1 best solutions from RefSet ∪ ImpSolSet. Then, RefSet is updated according to the diversity criterion by applying the procedure explained in step 2. This strategy is called the Static Update of the reference set. If a Dynamic Update strategy were used, the combination method would be applied to a new solution sooner than in the static strategy. That is,
instead of waiting until all the combinations have been performed to update the reference set, if a new solution is added to the reference set, the set is updated before the next combination is carried out (see [26]).
10.6 COMPUTATIONAL EXPERIMENTS

10.6.1 Parallelization Strategies with a Single Combination

The parallelization strategies named Synchronous Parallel Scatter Search, Replicated Combination Scatter Search, and Replicated Parallel Scatter Search use a single combination method. The algorithms were coded in C and OdinMP [3] (OdinMP is a free and portable implementation of OpenMP [33] for ANSI C) and tested with large instances of the p-Median Problem. The distance matrix was taken from the TSPLIB instance RL1400, which includes 1400 points. The instances are characterized by the number n of points (1400) and the number p of facility points or medians, which is reported in the first column of Tables 10.1, 10.2, and 10.3 and ranges from 40 to 100. Some computational results on these instances have been reported in [8] [20], where several heuristics were compared. The algorithms were run on the machine TEIDE (8 ALPHA processors at 466 MHz, with 2 Gbytes of memory and the DIGITAL UNIX 4.0C operating system) at the University of La Laguna.

Table 10.1 Synchronous Parallel Scatter Search
      1 processor           2 processors          4 processors          8 processors
 p    Objective  Time  S-u  Objective  Time  S-u  Objective  Time  S-u  Objective  Time  S-u
 40   35002.02   1858  1    35002.02   1047  1.77 35002.02    461  4.03 35002.02    283  6.57
 50   29089.71   1443  1    29089.71    847  1.70 29089.71    612  2.36 29089.71    311  4.64
 60   25185.79   1930  1    25185.79   1119  1.72 25186.24    634  3.04 25167.84    628  3.07
 70   22125.46   1864  1    22125.46   1088  1.71 22125.46    649  2.87 22125.46    416  4.48
 80   19884.51   1794  1    19884.51   1049  1.71 19870.51   1085  1.65 19870.51    471  3.81
 90   17987.91   4168  1    17987.91   2322  1.80 18006.23    780  5.34 18002.35    488  8.54
100   16563.93   3341  1    16563.93   1912  1.75 16551.68   1266  2.64 16554.08    520  6.64
Table 10.1 reports the results for the SPSS algorithm, which runs as many times as the number of available processors ( 1 , 2 , 4 and 8, respectively). The objective value, real time in seconds, and speedups are presented for each number of processors. The real time is used to obtain the time-reduction provided by the parallel algorithm SPSS. Speed-up is the ratio of sequential time to parallel time to solve a particular problem on a given machine. It is given by
Speedup = (time to solve a problem with the parallel code on one processor) / (time to solve the same problem with the parallel code on n_pr processors).
Table 10.2 Replicated Combination Scatter Search

      1 processor           2 processors          4 processors          8 processors
 p    Objective  Time  S-u  Objective  Time  S-u  Objective  Time  S-u  Objective  Time  S-u
 40   35002.02   1858  1    35002.02   1173  1.58 35002.02    578  3.21 35002.02    557  3.34
 50   29089.71   1443  1    29089.71    965  1.50 29089.71    953  1.51 29089.71    410  3.52
 60   25185.79   1930  1    25185.79   1240  1.56 25175.01    760  2.54 25167.82    433  4.46
 70   22125.46   1864  1    22125.46   1263  1.48 22125.46    828  2.25 22125.46    486  3.84
 80   19884.51   1794  1    19884.51   1214  1.48 19872.28   1015  1.77 19884.52    519  3.46
 90   17987.91   4168  1    17987.91   2871  1.45 17987.91   1389  3.00 17987.91    537  7.76
100   16563.93   3341  1    16563.93   2285  1.46 16561.58   1113  3.00 16553.70    827  4.04
Table 10.3 Replicated Parallel Scatter Search

                 2 processors        4 processors        8 processors
 p   Best known  Objective   Time    Objective   Time    Objective   Time
 40   35002.02   35002.02    2048    35002.02    2058    35002.02    1750
 50   29089.71   29089.71    1667    29089.71    1672    29089.71    1605
 60   25160.40   25160.40    2067    25160.40    2124    25160.40    2382
 70   22125.46   22125.46    2035    22125.46    2044    22125.46    3040
 80   19870.29   19870.29    3557    19870.29    3591    19870.29    2796
 90   17987.94   17987.91   22359    17987.91    4730    17987.91   12742
100   16551.20   16556.58    3477    16552.48    4076    16552.48    3838
The results in Table 10.1 show that, in general, the speedup increases with the number of processors. For two processors, the speedup is almost linear. For 4 processors and p = 80 we observed a detrimental anomaly [27] [28]; i.e., the real time is greater than the real time for 2 processors. We also observed that, in some cases, using 4 and 8 processing elements, an acceleration anomaly manifests itself in the form of a speedup greater than 4 and 8, respectively. Table 10.2 summarizes the results for the RCSS algorithm, where the headings are the same as in Table 10.1. The speedup reached is smaller than the speedup for the SPSS algorithm, but the speedup increases with the number of processors for the RCSS algorithm. The objective values obtained with both algorithms are similar. The real times for the SPSS are better than the real times for the RCSS. Table 10.3 shows the results for the RPSS algorithm. For 2, 4, and 8 processors it reports the best known objectives and the objective values and real times for the RPSS. The best objective values are found by this algorithm. The real times are similar for different numbers of processors. The real time reported in this table is the maximum of the times of the processors.
The objective values found with these algorithms are comparable with the best obtained in the literature [8] [20]. The SPSS and RCSS algorithms reduced the computational time properly. RPSS increases the diversification. Note that it is possible to increase the intensification by sharing the best solution reached by the processors. Another possible strategy for increasing the diversification is to apply several combination methods in parallel.
10.6.2 Parallelization Strategies with Multiple Combinations

The Multiple Combinations Scatter Search parallelization of the Scatter Search for the Feature Subset Selection was analyzed in [10]. The computational experiments showed the good performance of the MCSS in searching for a reduced set of features with high accuracy compared to the two Sequential Scatter Searches, each using one of the single combination methods. The datasets considered in the computational experiments were obtained from the UCI repository [32], from which full documentation about all datasets can be obtained. We chose them taking into account their size and use in machine-learning research. Table 10.4 summarizes the characteristics of the chosen datasets. The first two columns correspond to the name of the datasets as they appear in the UCI repository and the identifier (Id) used in forthcoming tables. The intermediate three columns show the total number of features, the number of nominal features, and the number of (numerical) linear features. Finally, the last two columns summarize the number of instances and classes in the dataset.

Table 10.4 Summary of general characteristics of datasets
Database                   Id    All  Nom  Lin   Instances  Classes
Heart(Cleveland)           HC     13    7    6        303       2
SoybeanLarge               SbL    35   29    6        307      19
Vowel                      Vw     10    0   10        528      11
CreditScreening            CX     15    9    6        690       2
PimaIndianDiabetes         Pm      8    0    8        768       2
Anneal                     An     38   29    9        798       5
Thyroid(Allbp)             TAb    28   22    6       2800       2
Thyroid(Sick-Euthyroid)    TSE    25   18    7       3163       2
BreastCancer               BC      9    0    9        699       2
Ionosphere                 Io     34    0   34        351       2
HorseColic                 HoC    21   14    7        368       2
WisconsinBreastCancer      WBC    30    0   30        569       2
Both Sequential Scatter Searches were first compared in [10] with a Genetic Algorithm using the three standard classifiers (IB1, Naive Bayes, and C4.5). The data showed a superiority of the Sequential Scatter Searches over the Genetic Algorithm. The computational experience also corroborates that there are no significant
differences between the three classifiers. Therefore, only the IB1 classifier was used to carry out the remaining experiments. The 5 x 2 cross-validation method [1] was used to measure the accuracy percentage of the resulting subset of features selected by the algorithms. However, in order to increase the efficiency of the search, during the learning process only 1 x 2 cross-validation was considered. Taking into account the analysis of the influence of some key parameters on the sequential procedures carried out in [10], |L| = d/2 and RefSetSize = d2, where d is the number of features, were considered. Table 10.5 shows a comparison between the accuracy of the Sequential Scatter Searches (SSS-GC and SSS-RGC) and the MCSS using the IB1 classifier. For each method, it provides average accuracy percentages and standard deviations over 10 runs for each dataset. The first average and standard deviation for each dataset show the results obtained using all the features. It also reports, at the bottom, average results over all the datasets considered. From the results obtained, the following observations can be made. First of all, IB1 with all the features provides the best accuracy percentages, but the difference is not significant for most datasets. Second, MCSS has higher precision than both SSS-GC and SSS-RGC.

Table 10.5 Accuracy percentage and standard deviation in validation
Id        All           SSS-GC        SSS-RGC       MCSS
HC        75.98±3.10    74.99±5.31    74.99±5.68    74.91±2.85
SbL       85.02±4.18    82.41±3.49    83.65±3.65    80.53±1.92
Vw        95.29±1.66    93.58±1.39    93.58±1.39    93.64±1.34
CX        81.54±2.31    83.28±3.12    83.91±3.27    83.39±2.74
Pm        69.71±2.68    67.92±2.35    67.66±2.42    68.10±2.43
An        93.59±1.09    94.14±3.16    92.98±2.68    91.49±2.19
TAb       95.86±0.24    95.53±0.33    95.44±0.44    95.44±0.43
TSE       92.67±0.68    95.09±2.76    95.12±2.78    93.58±2.20
BC        95.48±0.70    95.22±1.07    94.88±1.45    95.11±0.90
Io        85.75±1.30    87.75±1.37    87.12±1.24    87.35±1.56
HoC       75.60±1.99    76.69±3.49    77.94±2.96    76.96±3.79
WBC       95.61±0.82    94.66±1.51    93.57±2.23    93.67±2.36
Average   86.84±1.73    86.77±2.45    86.74±2.52    86.18±2.06
Finally, the number of features selected by the algorithms was analyzed. Table 10.6 shows the total number of features of each dataset and the average number of features selected by each algorithm, with their standard deviations. At the bottom, it gives the average of these numbers over all the datasets considered and the reduction percentages. Both sequential SS procedures show a similar behavior. However, MCSS uses a smaller number of features and its standard deviation is the lowest of all the considered algorithms. Moreover, for some datasets MCSS significantly reduces the set of features selected by each Sequential Scatter Search. For example, for the TSE dataset, the number of features is reduced from 5.10 to 1.90 on average. Considering the number of features of the best solution obtained by each algorithm, we conclude that the parallel SS is the algorithm that performs best.
Table 10.6 Number of features selected for each algorithm

Id          All       SSS-GC        SSS-RGC       MCSS
HC           13       6.30±1.64     6.20±2.10     5.56±1.60
SbL          35       15.0±2.71     16.5±2.22     12.80±1.81
Vw           10       7.70±0.68     7.70±0.68     8.00±0.94
CX           15       3.40±1.43     4.50±2.27     2.80±2.62
Pm            8       4.10±0.99     4.00±0.94     4.20±1.14
An           38       8.90±2.89     8.20±2.66     6.30±2.06
TAb          29       2.80±1.48     2.70±1.83     2.00±1.05
TSE          25       5.10±2.47     5.10±2.42     1.90±1.20
BC            9       5.20±1.62     4.78±1.48     5.40±1.71
Io           34       6.10±1.37     5.70±1.06     3.90±0.88
HoC          21       7.40±2.41     6.30±2.16     4.50±1.51
WBC          30       6.80±2.53     5.50±1.43     6.00±2.63
Average      22.25    6.57±1.85     6.43±1.77     5.28±1.60
Reduction    100.00%  70.47%        71.10%        76.27%
The obtained computational results corroborate the effectiveness of this parallelization. MCSS achieves accuracy percentages similar to both Sequential Scatter Search algorithms but uses a smaller subset of features. Moreover, the parallel algorithm is more precise than the sequential algorithms.

10.7 CONCLUSIONS
In this chapter, we describe several parallelizations of the Scatter Search metaheuristic that have been designed with the purpose of achieving better results than those obtained by the sequential procedures. The proposed parallel strategies are the Synchronous Parallel Scatter Search, Replicated Combination Scatter Search, Replicated Parallel Scatter Search, and Multiple Combinations Scatter Search. The first three strategies, which respectively parallelize the local search, select in parallel several subsets from the reference set, and run in parallel several Sequential Scatter Searches, were applied to the p-Median Problem. The last parallel strategy, which runs different combination methods in parallel, was designed and implemented for the Feature Subset Selection Problem. The computational experience corroborates the effectiveness of the parallel metaheuristics, since they achieve either an increase of the efficiency or an increase of the exploration. Note that the parallel strategies applied to the p-Median Problem can easily be adapted to solve other p-selection problems, such as the clustering and classification problems that appear in machine learning.
Acknowledgments This research has been partially supported by the Spanish Ministry of Science and Technology through the project TIC2002-04242-C03-01, 70% of which are FEDER funds. Also, the second author acknowledges partial funding from a CajaCanarias grant.
REFERENCES

1. Alpaydin, E. Combined 5 x 2cv F test for comparing supervised classification learning algorithms, Neural Computation 11 (1999) 1885-1892.
2. Beasley, J.E. A note on solving large p-median problems, European Journal of Operational Research 21 (1985) 270-273.

3. Brunschen, C., Brorsson, M. OdinMP/CCp - a portable implementation of OpenMP for C, Proceedings of the First European Workshop on OpenMP (EWOMP'99), (1999) 21-26.

4. Campos, V., Glover, F., Laguna, M., Marti, R. An experimental evaluation of a scatter search for the linear ordering problem, Journal of Global Optimization 21 (2001) 397-414.
5. Dietterich, T.G. Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation 10 (1998) 1895-1923.

6. Erlenkotter, D. A dual-based procedure for uncapacitated facility location, Operations Research 26 (1978) 992-1009.

7. Ferri, F., Kadirkamanathan, V., Kittler, J. Feature subset search using genetic algorithms, in: IEE/IEEE Workshop on Natural Algorithms in Signal Processing, IEE Press, (1993).

8. Garcia-Lopez, F., Melian-Batista, B., Moreno-Perez, J.A., Moreno-Vega, J.M. The parallel variable neighborhood search for the p-median problem, Journal of Heuristics 8 (2002) 375-388.

9. Garcia-Lopez, F., Melian-Batista, B., Moreno-Perez, J.A., Moreno-Vega, J.M. Parallelization of the scatter search for the p-median problem, Parallel Computing 29 (2003) 575-589.

10. Garcia-Lopez, F., Garcia-Torres, M., Melian-Batista, B., Moreno-Perez, J.A., Moreno-Vega, J.M. Solving the feature subset selection problem by a parallel scatter search, European Journal of Operational Research (2005, to appear).

11. Glover, F. Heuristics for integer programming using surrogate constraints, Decision Sciences 8 (1977) 156-166.
12. Glover, F. Tabu search for nonlinear and parametric optimization (with links to genetic algorithms), Discrete Applied Mathematics 49 (1994) 231-255.

13. Glover, F. A template for scatter search and path relinking, in Lecture Notes in Computer Science 1363, J.K. Hao, E. Lutton, E. Ronald, M. Schoenauer, D. Snyers (Eds.), (1998) 13-54.

14. Glover, F. Scatter search and path relinking, in D. Corne, M. Dorigo, F. Glover (Eds.), New Ideas in Optimisation, Wiley, (1999).

15. Glover, F., Laguna, M., Marti, R. Fundamentals of scatter search and path relinking, Control and Cybernetics 39 (2000) 653-684.

16. Glover, F., Laguna, M., Marti, R. Scatter search, in Theory and Applications of Evolutionary Computation: Recent Trends, A. Ghosh, S. Tsutsui (Eds.), Springer-Verlag, (2003) 519-537.

17. Goldberg, D.E. Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, (1989).

18. Hanjoul, P., Peeters, D. A comparison of two dual-based procedures for solving the p-median problem, European Journal of Operational Research 20 (1985) 387-396.

19. Hansen, P., Mladenović, N. Variable neighborhood search for the p-median, Location Science 5 (1997) 207-226.

20. Hansen, P., Mladenović, N., Perez-Brito, D. Variable neighborhood decomposition search, Journal of Heuristics 7 (2001) 335-350.

21. Holland, J. Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, (1975).

22. Hyafil, L., Rivest, R.L. Constructing optimal binary decision trees is NP-complete, Information Processing Letters 5 (1976) 15-17.

23. Kariv, O., Hakimi, S.L. An algorithmic approach to network location problems, part 2: the p-medians, SIAM Journal on Applied Mathematics 37 (1979) 539-560.

24. Kohavi, R., John, G.H. Wrappers for feature subset selection, Artificial Intelligence 97 (1997) 273-324.

25. Kuehn, A.A., Hamburger, M.J. A heuristic program for locating warehouses, Management Science 9 (1963) 643-666.

26. Laguna, M., Marti, R. Scatter Search: Methodology and Implementations in C, Kluwer Academic Press, (2003).

27. Lai, T., Sahni, S. Anomalies in parallel branch-and-bound algorithms, Communications of the ACM 27 (1984) 594-602.
28. Lai, T., Sprague, A. Performance of parallel branch-and-bound algorithms, IEEE Transactions on Computers 34 (1985) 962-964.

29. Maranzana, F.E. On the location of supply points to minimize transportation costs, Operations Research Quarterly 12 (1964) 138-139.
30. Mitchell, T. Machine Learning, Series in Computer Science, McGraw-Hill, (1997).

31. Mladenović, N., Moreno-Pérez, J.A., Moreno-Vega, J.M. A chain-interchange heuristic method, Yugoslav Journal of Operational Research 6 (1996) 41-54.

32. Murphy, P.M., Aha, D.W. UCI repository of machine learning, http://www.ics.uci.edu/~mlearn/MLRepository.html
33. OpenMP: A Proposed Industry Standard API for Shared Memory Programming, White Paper, (1997). (http://www.openmp.org/openmp/mp-documents/paper/paper.html)

34. Siedlecki, W., Sklansky, J. A note on genetic algorithms for large-scale feature selection, Pattern Recognition Letters 10 (1989) 335-347.

35. Teitz, M.B., Bart, P. Heuristic methods for estimating the generalized vertex median of a weighted graph, Operations Research 16 (1968) 955-961.

36. Vafaie, H., Jong, K.D. Robust feature selection algorithms, in: Proceedings of the 5th IEEE International Conference on Tools for Artificial Intelligence, IEEE Press, (1993) 356-363.

37. Voss, S. A reverse elimination approach for the p-median problem, Studies in Locational Analysis 8 (1996) 49-58.

38. Wilson, D.R., Martinez, T.R. Improved heterogeneous distance functions, Journal of Artificial Intelligence Research 6 (1997) 1-34.
39. Witten, I., Frank, E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, (2000).

40. Yang, J., Honavar, V. Feature subset selection using a genetic algorithm, in Proceedings of the Second Annual Conference on Genetic Programming, Morgan Kaufmann, (1997) 380-385.
11 Parallel Variable Neighborhood Search

JOSÉ A. MORENO PÉREZ, PIERRE HANSEN, NENAD MLADENOVIĆ
Universidad de La Laguna, Spain; GERAD and HEC Montreal, Canada; Mathematical Institute (SANU), Belgrade, and GERAD, Montreal, Canada
11.1 INTRODUCTION

A combinatorial optimization problem consists of finding the minimum or maximum of a real-valued function f defined on a discrete or partially discrete set. If it is a minimization problem, it can be formulated as follows:
min { f(x) : x ∈ X }.     (11.1)
Here X is called the solution space, x represents a feasible solution, and f the objective function of the problem. An optimal solution x* (or a global minimum) of the problem is a feasible solution where the minimum of (11.1) is reached. That is, x* ∈ X has the property that f(x*) ≤ f(x), ∀x ∈ X. A local minimum x' of problem (11.1), with respect to (w.r.t. for short) the neighborhood structure N, is a feasible solution x' ∈ X that satisfies f(x') ≤ f(x), ∀x ∈ N(x'). Therefore any local or neighborhood search method (i.e., a method that only moves to a better neighbor of the current solution) is trapped when it reaches a local minimum. Several metaheuristics, or frameworks for building heuristics, extend this scheme to avoid being trapped in a local optimum. The best known of them are Genetic Search, Simulated Annealing, and Tabu Search (for a discussion of these and other metaheuristics, the reader is referred to the books of surveys edited by Reeves [26] and Glover and Kochenberger [9]). Variable Neighborhood Search (VNS) ([23, 24, 12, 13, 14, 15]) is a recent metaheuristic which systematically exploits the idea of neighborhood change, both in the descent to local minima and in the escape from the valleys which contain them. Hence, VNS proceeds by a descent method to a local minimum, then explores a series of different predefined neighborhoods of this solution. Each time, one or
several points of the current neighborhood are used as a starting point for a local descent method that stops at a local minimum. The search jumps to the new local minimum if and only if it is better than the incumbent. In this sense, VNS is not a trajectory-following method (one that allows nonimproving moves within the same neighborhood) such as Simulated Annealing or Tabu Search. The application of parallelism to a metaheuristic can and must make it possible to reduce the computational time (by partitioning the sequential program) or to increase the exploration of the search space (by applying independent search threads). In this survey, several strategies for parallelizing a VNS are considered. We analyze and test them on large instances of the p-median problem. The next section describes the basics of the VNS metaheuristic. Several parallelization strategies for VNS are analyzed in Section 11.3. Section 11.4 analyzes the characteristics of the VNS applied to solve the p-median problem. Computational experiments and conclusions are presented in Sections 11.5 and 11.6, respectively.

11.2 THE VNS METAHEURISTIC

The basic idea of VNS is a change of neighborhoods in the search for a better solution. VNS proceeds by a descent method to a local minimum, then explores, systematically or at random, increasingly distant neighborhoods of this solution. Each time, one or several points within the current neighborhood are used as the initial solution for a local descent. One jumps from the current solution to a new one if and only if a better solution has been found. So VNS is not a trajectory-following method (as Simulated Annealing or Tabu Search are) and does not specify forbidden moves. Despite its simplicity, it proves to be effective. VNS systematically exploits the following observations:
Fact 1. A local minimum with respect to one neighborhood structure is not necessarily a local minimum for another.
Fact 2. A global minimum is a local minimum with respect to all possible neighborhood structures.
Fact 3. For many problems, local minima with respect to one or several neighborhoods are relatively close to each other.
This last observation, which is empirical, implies that a local optimum often provides some information about the global one. This may for instance be several variables with the same value in both. However, it is usually not known which ones are such. An organized study of the neighborhood of this local optimum is therefore in order until a better one is found. Unlike many other metaheuristics, the basic schemes of VNS and its extensions are simple and require few and sometimes no parameters. Therefore, in addition to providing very good solutions, often in simpler ways than other methods, VNS gives insight into the reasons for such a performance, which in turn can lead to more efficient and sophisticated implementations.
Variable Neighborhood Descent (VND) is a deterministic version of VNS. It is based on Fact 1 above, i.e., a local optimum for a first type of move x → x' (or heuristic, or within the neighborhood N_1(x)) is not necessarily one for another type of move x → x'' (within the neighborhood N_2(x)). It may thus be advantageous to combine descent heuristics. This leads to the basic VND scheme presented in Figure 11.1.
VND method
1. Find an initial solution x.
2. Repeat the following sequence until no improvement is obtained:
   (i) Set ℓ ← 1;
   (ii) Repeat the following steps until ℓ = ℓ_max:
       (a) Find the best neighbor x' of x (x' ∈ N_ℓ(x));
       (b) If the solution x' thus obtained is better than x, set x ← x' and ℓ ← 1; otherwise, set ℓ ← ℓ + 1;
Another simple application of the VNS principle is reduced VNS. It is a pure stochastic search method: solutions from the pre-selected neighborhoods are chosen at random. Its efficiency is mostly based on Fact 3 described above. A set of neighborhoods N1 (z),N ~ ( z .) .,. ,Nk,,<,,(x)will be considered around the current point z (which may be or not a local optimum). Usually, these neighborhoods will be nested, i.e., each one contains the previous. Then a point is chosen at random in the first neighborhood. If its value is better than that of the incumbent (i.e., f(x’) < f(x)),the search is recentered there (x t x’). Otherwise, one proceeds to the next neighborhood. After all neighborhoods have been considered, one begins again with the first, until a stopping condition is satisfied (usually it will be the maximum computing time since the last improvement, or the maximum number of iterations). The description of the steps of reduced VNS is as shown in Figure 11.2. In the previous two methods, we examined how to use variable neighborhoods in descent to a local optimum and in finding promising regions for near-optimal solutions. Merging the tools for both tasks leads to the general VNS scheme (GVNS). We first discuss how to combine a local search with systematic changes of neighborhoods around the local optimum found. We then obtain the basic VNS scheme (BVNS) of Figure 11.3. The simplest BVNS is sometimes called Iterated Local Search [20]. The method gets by a perturbation a neighbor of the current solution, makes a local search from it to a local optimum, and moves to it if there has been an improvement. The steps of the simplest VNS are obtained taking only one neighborhood (see Figure 11.4).
250
PARALLEL VARIABLE NEIGHBORHOOD SEARCH
RVNS method
1. Find an initial solution 2; choose a stopping condition;
2. Repeat the following until a stopping condition is met: (i) k +- 1; (ii) Repeat the following steps until k = kmaz: (a) Shake. Take (at random) a solution z’ from Nk(r). (b) If this point is better than the incumbent, move there (z continue the search with NI ( k t 1); otherwise, set k t k
Fig. 11.2
+- XI),
+ 1.
and
Reduced Variable Neighborhood Search.
BVNS method
1. Find an initial solution x;choose a stopping condition; 2. Repeat until the stopping condition is met: (1) Set k t 1; (2) Repeat the following steps until k = k,,
:
(a) Shaking. Generate a point x’at random from the kth neighborhood of z
(x’E Ndx)); (b) LocaZ search. Apply some local search method with z’ as initial solution; denote with 2’’the so obtained local optimum; (c) Move ornot. If the local optimum z” is better than the incumbent z, move there (z t z”), and continue the search with Jdl ( k +- 1);otherwise, set k+k+l;
Fig. 11.3
Basic Variable Neighborhood Search.
If instead of simple local search, one uses VND and if one improves the initial solution found by Reduced VNS, one obtains the GVNS scheme shown in Figure 11.5. Then a C code for the simple version of sequential VNS is shown in Figure 1 1.6. This code of the VNS can be applied to any problem if the user provides the initialization procedure i n i t i a l i z e , the shake shake, the local search l o c a l s e a r c h and the function improved to test if the solution is improved or not. For those problems consisting in selecting a fixed number p of items from an universe U = (2~1,..., un},the local search and the shake based on the interchange moves can also be implemented using a function exchange also provided for the
THE PARALLELIZATIONS
251
VNS algorithm 1. Initialization: Find an initial solution x. Set x*
-+
x.
2. Repeat the following until a stopping condition is met. (a) Shake: Take (at random) a solution x’ in / d ( x ) . (b) Local Search: Apply the local search method with x’as initial; denote 2’’the so obtained local optimum. (c) Improve or not: If 2’‘ is better than x*,do x* + x”.
Fig. 11.4
Simple Variable Neighborhood Search.
problem. A solution S is represented by an array S = [uz : i = 1, ..., 111 where u,is the i-th element of the solution, for i = 1,2, ...,p , and the (i - p)-th element outside the solution, for i = p + 1, ...,n. The usual greedy local search is implemented by choosing iteratively the best possible move among all interchange moves. Let S,, denote the solution obtained from S by interchanging 21% and u,,for i = 1,...,p and j = p 1, ..., n. The pseudocode of the local search is in Figure 11.7. The code C for the Sequential Local Search (SLS) is shown in Figure 11.8. The shake procedure consists of, given the size k for the shake, choosing k times two points at random, u,and u,; ut in the solution and uIoutside the solution, and performing the corresponding interchange move (see Figure 11.9). The C code for the shake procedure is in figure 11.10.
+
11.3 THE PARALLELIZATIONS
The application of parallelism to a metaheuristic can and must allow to reduce the computational time (by the partition of the sequential program) or to increase the exploration in the search space (by the application of independent search threads). In order to do it, we need to know the parts of the code of an appropriate size and that can be partitioned to be solved simultaneously. Several strategies for parallelizing a VNS algorithm have been proposed and analyzed in the literature (see [8,4]). The parallel VNS heuristics reported were coded in C (using the OpenMP, a model for parallel programming portable across shared memory architectures) in [ 81 and in Fortran 90 (using MPL) in [4].
252
PARALLEL VARIABLE NEIGHBORHOOD SEARCH
GVNS algorithm 1. Initialization.
Select the set of neighborhood structures Nk, for k = 1,.. . , k,,, that will be used in the shaking phase, and the set of neighborhood structures Ne for I = 1, . . . , that will be used in the local search; find an initial solution x and improve it by using RVNS; choose a stopping condition;
2. Main step. Repeat the following sequence until the stopping condition is met: (1) Set k +- 1; (2)Repeat the following steps until k = kmaz:
(a) Shaking. Generate a point x’ at random from the kth neighborhood Nk (x)of x; (b) Local search by VND. (bl) Set l! +- 1; (b2) Repeat the following steps until I = I,,; . Find the best neighbor 2” of x‘ in N e ( Z ’ ) ; . If f(x”) < f(z’)set x’ +- 2’’ and k. +- 1;otherwise set .t c .t
+ 1;
(c) Move or not. If this local optimum is better than the incumbent, move there (z c x”),and continue the search withN1 ( k +- 1);otherwise, set k+-k+l;
Fig. 11.5 General Variable Neighborhood Search.
Sequential VNS Algorithm 1: initialize(best-sol) ; 2 : k = O ; 3: while (k < k a a x ) { 4: k++ ; 5: cur-sol = shake(best-so1,k) ; 6: local-search ( cur-sol) ; 7: if improved(cur-sol ,best-sol) 8: best-sol = cur-sol ; 9: k = O ; 10: } /* if */ 11: 1 /* while */ Fig. 11.6 Sequential Variable Neighborhood Search.
Using the OpenMP, pseudocodes very similar to C programs were obtained as an adaptation of codes originally written for serial machines that implement the
THE PARALLELIZATIONS
253
Local Search Initialize S’ Repeat
s
+
S’
S’ + argmin{Cost(S,j) : i = 1, . . , p , j = p Until Cost(S’) = Cost(S)
+ 1,...,n }
Fig. 11.7 Local Search.
Sequential Local Search void seq-locaLsearch(so1 cur-sol)
{
1: i n i t - s o l = cur-sol ; 2: while improved(cur-sol, i n i t - s o l ) ) { 3: f o r ( i = p ; i < n ;i++) 4: f o r (j=O;j
Fig. 11.8 Sequential Local Search Pseudocode.
Shake Repeat k times Choose1 < i < p a n d p < j < n a t r a n d o m Let Slj
DO S
+-
+
S - {ui}
Si,
+ {wJ}
Fig. 11.9 Shake Procedure. sequential VNS. The OpenMP is based on a combination of compiler directives, library routines, and environment variables that can be used to specify shared-memory parallelism in Fortran and C/C++ programs (see [35]). Only a few lines of the sequential code had to be replaced by specific directives of the OpenMP compiler in the pseudocode to get the code of the parallel programs used in the computational experiments.
254
PARALLEL VARIABLE NEIGHBORHOOD SEARCH
Sequential Shake void seq-shake(so1 c ur-sol)
{
1: i n i t - s o l = cur-sol ; 2: f o r (r=O;r
Four different parallelization strategies have been reported in the literature; two are simple parallelization and the other two are more complex strategies. The two simple parallelizations of the VNS consist of parallelizing the local search and of replicating the whole VNS in the processors. The other two additional parallelization strategies are proposed in [8] [4] and have been tested with known large instances of the p-median problem. The first of the two simple parallelization strategies analyzed in [8] attempts to reduce computation time by parallelizing the local search in the sequential VNS and is denoted SPVNS (Synchronous Parallel WS).The second one implements an independent search strategy that runs an independent VNS procedure on each processor and is denoted RPVNS (Replicated Parallel VNS). The parallel local search is implemented trying to get a balanced load among the processors. The procedure divides the set of p ( n - p ) solutions of the neighborhood of the current solution among the available num-proc processors to look for the best one. Figure 11.11 shows the code of the procedure par-local-search that implements parallel local search. Then the pseudocode of the Synchronous Parallel VariableNeighborhood Search SPVNS is shown in Figure 11.12. The second simple parallelization of the VNS is the Replicated Parallel VNS (RPVNS) that tries to search for a better solution by means of the exploration of a wider zone of the solution space, using multistart strategies. It is done by increasing the number of neighbor solutions to start a local search (several starting solutions in the same neighborhood or in different ones). This method is like a multistart procedure where each local search is replaced by the VNS. The pseudocode of the RPVNS is described in Figure 11.13. The two additional parallelization strategies use cooperation mechanisms to improve the performance. The Replicated-Shaking VNS (RSVNS) parallelization of the VNS proposed in [8] applies a synchronous cooperation mechanism through a classical master-slave approach. The Cooperative Neighborhood VNS (CNVNS)
THE PARALLELIZATIONS
255
Algorithm PLS void par-locaLsearch(so1 cur-sol)
{
1: i n i t - s o l
=
cur-sol
2: while '(improved()> { 3: l o a d = (n-p) d i v bum-proc) ;
p a r a l l e l ( p r = 0 ; p r < nun-proc; pr++) { 4: 5: tmp-sol(pr) = i n i t - s o l ; 6: low = p r * l o a d ; high = low + load ; 7: f o r (i = low; i < high; i++) 8: f o r (j = 0 ; j < p ; j++> { exchange ( i n i t - s o l ,new-sol, i ,j 9: i f improve ( n e w s o l , tmp-sol ( p r ) 1 10: 11: tmp-sol ( p r ) = new-sol ; 12: } /* f o r */ 13: critical 14: if improve (tmp-sol (pr ) , cur-sol) cur-sol = tmp-sol (pr) ; 15: 16: } /* p a r a l l e l */ 17: } /* while */ ] /* par-local-search */ Fig. 11.11 Parallel Local Search Pseudocode.
Algorithm SPVNS 1: i n i t i a l i z e ( b e s t - s o l ) ; 2 : k = O ; 3: while (k < k a a x ) { 4: k++ ; 5: cur-sol = shake(best-sol, k ) ; 6: par-locaLsearch(curso1) ; 7: i f improved(cur-sol , b e s t - s o l ) { 8: b e s t - s o l = cur-sol ; 9: k = O ; 10: } /* i f */ 11: ] /* while */ Fig. 11.12 Synchronous Parallel Variable Neighborhood Search Pseudocode.
parallelization proposed in [4] applies a cooperative multisearch method based on a central-memory mechanism. In RSVNS, the master processor runs a sequential VNS but the current solution is sent to each slave processor that shakes it to obtain an initial solution from which the local search is started. The solutions obtained by the slaves are passed on to the
256
PARALLEL VARIABLE NEIGHBORHOOD SEARCH
Algorithm RPVNS 1: initialize(joint-best-sol); 2: p a r a l l e l p r = 0, nun-proc-1 { 3: initialize(best-sol (pr)) ; 4: k(pr) = 0 ; while ( k ( p r ) < k a a x ) { 5: 6: k ( p r ) ++ ; 7: cur-sol(pr) = shake(best-sol(pr) , k(pr)) ; 8: local-search(cur-so1 ( p r ) ; 9: i f improved(cur-sol(pr), b e s t - s o l ( p r ) ) { 10: best-sol(pr) = cur-sol(pr) ; 11: k ( p r ) = 0; 12: } /* i f */ 13: } /* while */ 14: c r i t i c a l 15: i f improve(best-sol(pr) , j o i n t - b e s t - s o l ) 16: joint-best-sol = best-sol(pr) ; 17: } /* p a r a l l e l */ Fig. 11.13 Replicated Parallel Variable Neighborhood Search Pseudocode.
master which selects the best and continues the algorithm. The independence between the local searches in the VNS allows their execution in independent processors and updating the information about the joint best solution found. This information must be available for all the processors in order to improve the intensification of the search. The RSVNS pseudocode is described in Figure 11.14. The CNVNS proposed by [4] is obtained by applying the cooperative multisearch method to the VNS metaheuristic. This parallelization method is based on the centralmemory mechanism that has been successfully applied to a number of different combinatorial problem. In this approach, several independent VNSs cooperate by asynchronously exchanging information about the best solution identified so far, thus conserving the simplicity of the original, sequential VNS ideas. The asynchronous cooperative multisearch parallel VNS proposed allows a broader exploration of the solution space by several VNSs. The controlled random search nature of the shaking in the VNS and its efficiency are altered significantly by the cooperation mechanism that implement frequent solution exchanges. However, the CNVNS implements a cooperation mechanism that allows each individual access to the current overall best solution without disturbing its normal proceedings. Individual VNS processes communicate exclusively with a central memory or master. There are no communications among individual VNS processes. The master keeps, updates, and communicates the current overall best solution. Solution updates and communications are performed following messages from the individual VNS processes. The master initiates the algorithms by executing
T H E PARALLELIZATIONS
257
Algorithm RSVNS 1: initialize(joint-best-sol) ; 2 : k = O ; 3: while (k < k a a x ) { 4: k++; 5: joint-cur-sol = joint-best-sol ; p a r a l l e l p r = 0 , num-proc-1 { 6: cur-sol(pr) = s h a k e ( j o i n t - b e s t - s o l , k ) ; 7: l o c a1-sear ch (cur -s o 1(p r ) ) ; 8: 9: critical 10: i f improved(cur-sol ( p r ) , j o i n t - c u r s o l ) j oint-cur-sol = cur-sol ( p r ) ; 11: 12: barrier ; 13: master ; 14: i f improved(joint-cur-sol, j o i n t - b e s t - s o l ) { 15: joint-best-sol = joint-cur-sol; 16: k = 0; 17: } /* i f */ 18: barrier 19: } /* p a r a l l e l */ 20: ) /* while */ Fig. 11.14
Replicated-ShakingParallel Variable Neighborhood Search Pseudocode.
a parallel RVNS (without local search) and terminates the whole search by applying a stopping rule. Each processor implements the same VNS algorithm. It proceeds with the VNS exploration for as long as it improves the solution. When the solution is not improved any more, it is communicated to the master if better than the last communication, and the overall best solution is requested from the master. The search is the continued starting from the best overall solution in the current neighborhood. The CNVNS procedure is summarized as follows: Cooperative Neighborhood VNS 0
Master process: - Executes parallel RVNS. - Sends initial solutions to
...e individual VNS processes.
- After each communication from an individual VNS process, updates the best overall and communicates it back to the requesting VNS process.
- Verifies the stopping condition. 0
Each VNS process:
258
PARALLEL VARIABLE NEIGHBORHOOD SEARCH
- Receives the initial solution, selects randomly a neighborhood, and explores it by shaking and local search. - If the solution is improved, the search proceeds from the first neighbor-
hood: shake and local search. - If the solution cannot be improved, the process
*
* *
Communicates its solution to the master; Requests the overall best solution from the master; Continues the search from the current neighborhood.
The pseudocodes, similar to the above parallelizations, of the master and workers procedures of the CNVNS are shown in Figures 1 1.15 and 1 1.16.
Algorithm CNVNS Master process 1 : parallelRVNS(init-sol(pr) : pr=1. .nm-proc) ; 2: initialize(joint-best-sol); 3: for ( p r = i ,pr
11.4
APPLICATION OF VNS FOR THE p-MEDIAN
The p-median problem has been chosen as a test problem for a wide set of basic VNS algorithms and extensions appeared in literature.
11.4.1 The pMedian Problem The p-median problem is a locatiodallocation problem consisting of selecting the p locations for the facilities that minimize the sum of the distances from a set of users to the set of facility points. It belongs to a wide class of hard combinatonal problems where the solutions consist in the selection of p items from an universe. The evaluation of the objective function of the locatiodallocation problems ranges from the simplest one to that needing to solve another hard problem or to perform a
Algorithm CNVNS Worker(pr) process
 1: get(best-sol-pr);
 2: k = 0;
 3: while (k < kmax) {
 4:   k++;
 5:   cur-sol = shake(best-sol-pr, k);
 6:   local-search(cur-sol);
 7:   if improved(cur-sol, best-sol-pr) {
 8:     best-sol-pr = cur-sol;
 9:     k = 0;
10:   } /* if */
11: } /* while */
12: return best-sol-pr

Fig. 11.16  Worker CNVNS pseudocode.
The standard moves for this class of problems are the interchange moves. An interchange move consists of replacing an item in the solution by another one out of the solution. Consider a space S that includes a set of potential location points for facilities or facility centers and a given set of users with their corresponding demand for the facility. Consider also a real-valued function D : S × S → R whose values D(s, t) represent, for all s, t ∈ S, the distance travelled (or costs incurred) for satisfying a unit of demand of a user located at s from a facility located at t. The distance from a finite set of facility points X ⊂ S to a user located at s is
    D(X, s) = min_{x ∈ X} D(x, s).
The total transportation cost for satisfying the demand of all the users located at a finite set of points U ⊂ S from the facilities located at the points of X ⊂ S is
    T(X, U) = Σ_{u ∈ U} D(X, u) · w(u),
where w(u) is the total amount of demand of all the users located at u. The p-median problem consists of locating p facility centers (or medians) in S in order to minimize the total transportation cost for satisfying the demand of all users. Several formulations and extensions of this optimization problem are useful to model many real-world situations, such as the location of industrial plants, warehouses, and public facilities. When the set of potential locations and the set of users are finite, the problem admits a combinatorial formulation. This formulation is as follows.
Let L = {v_1, v_2, ..., v_m} be the finite set of potential locations for the p medians and U = {u_1, u_2, ..., u_n} be the set of demand points for the facility. Consider also a weight vector w = [w_i : i = 1, ..., n] representing the amount of demand of the users located at demand point u_i. Let D be the n × m matrix whose entries contain the distances Dist(u_i, v_j) = d_ij between the demand point u_i and the potential location v_j, for i = 1, ..., n, j = 1, ..., m; i.e.,
    D = [d_ij : i = 1, ..., n, j = 1, ..., m] = [Dist(u_i, v_j) : i = 1, ..., n, j = 1, ..., m].
The p-median problem consists of choosing the location of p medians from L, minimizing the total transportation cost for satisfying the whole demand. The objective of the combinatorial problem is to minimize the sum of the weighted distances (or transportation costs), i.e.,

    minimize  Σ_{i=1}^{n} w_i · min_{v_j ∈ X} d_ij,

where X ⊆ L and |X| = p. The capacitated version (see [6, 19]) includes a fixed capacity for the facility center located at each point of L that bounds the amount of demand served by it, but usually the problem is uncapacitated and each customer is supplied from its closest facility. Besides this combinatorial formulation, the p-median problem has a formulation in terms of integer programming with matrix D and vector w as data. The formulation includes two sets of decision variables: the location variables y = [y_j : j = 1, ..., m] and the allocation variables x = [x_ij : i = 1, ..., n, j = 1, ..., m]. The meaning of these variables is as follows:

• y_j = 1 if a facility is located at v_j and y_j = 0 otherwise.
• x_ij = 1 if all the users at demand point u_i are served from a facility located at v_j and x_ij = 0 otherwise.
The integer linear programming formulation of the p-median problem is then

    minimize  Σ_{i=1}^{n} Σ_{j=1}^{m} w_i d_ij x_ij

subject to

    Σ_{j=1}^{m} x_ij = 1,        i = 1, ..., n,
    x_ij ≤ y_j,                  i = 1, ..., n, j = 1, ..., m,
    Σ_{j=1}^{m} y_j = p,
    x_ij, y_j ∈ {0, 1},          i = 1, ..., n, j = 1, ..., m.
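For illustration, the formulation above can be stated almost verbatim with a generic modelling library. The sketch below uses the PuLP package, purely as an assumption of ours (any MIP modeller would do); it mirrors the formulation and is not meant to compete with the specialized methods cited in this chapter.

import pulp

def solve_p_median(d, w, p):
    """d: n x m distance matrix, w: demand weights of the n users, p: number of medians."""
    n, m = len(d), len(d[0])
    prob = pulp.LpProblem("p_median", pulp.LpMinimize)
    y = [pulp.LpVariable(f"y_{j}", cat="Binary") for j in range(m)]
    x = [[pulp.LpVariable(f"x_{i}_{j}", cat="Binary") for j in range(m)] for i in range(n)]
    # objective: sum of weighted allocation distances
    prob += pulp.lpSum(w[i] * d[i][j] * x[i][j] for i in range(n) for j in range(m))
    for i in range(n):
        prob += pulp.lpSum(x[i][j] for j in range(m)) == 1      # every user is allocated
        for j in range(m):
            prob += x[i][j] <= y[j]                              # only to open medians
    prob += pulp.lpSum(y) == p                                   # exactly p medians
    prob.solve()
    return [j for j in range(m) if y[j].value() == 1]

if __name__ == "__main__":
    d = [[0, 4, 7], [4, 0, 3], [7, 3, 0]]    # tiny illustrative instance (L = U, three points)
    print(solve_p_median(d, [1, 1, 1], p=1))  # expected: the middle point, [1]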
However, the most common version of the p-median problem is the unweighted case where all the weights are equal and can be eliminated from the formulation. The unweighted and uncapacitated p-median problem is NP-hard [17]. Extensive references to works on this and related problems are contained in the main books, surveys, and reviews [3, 2, 22, 7]. Many heuristics and exact methods have been proposed for solving it. Exact algorithms were developed in [1], [10], and others. Classical heuristics for the p-median problem often cited in the literature are Greedy [18], Alternate [21], and Interchange [33]. The basic Greedy heuristic starts with an empty set and repeats the following greedy step: the facility point that least increases the objective of the resulting set is added. The Alternate heuristic, from an arbitrary set of p locations, "alternates" the following allocation and location steps. In an allocation step, all the users are allocated to the nearest facility point. In a location step, the best facility point for the set of users allocated to a single facility point is chosen. These steps are iterated while some improvement is obtained. Finally, the Interchange heuristic, from an arbitrary initial set of p facility points chosen as medians, iteratively interchanges a facility point in the median set with another facility point out of the median set. Among other metaheuristics (see [29, 34, 25, 28, 30, 31, 32, 27]), the VNS and its variants have been applied to the p-median problem (see [11, 16, 13, 15]).

11.4.2 Application of VNS to the p-Median Problem

In this section we describe some details of the application of VNS to the standard p-median problem. In the standard instances of the p-median problem, there are no weights associated with the users, and the set of potential locations for the facilities and the set of locations of the users coincide. Then m = n, L = U is the universe of points, and w_i = 1, for i = 1, ..., n. A solution of the p-median problem consists of a set S of p points from the universe U to hold the facilities. The objective function is usually named the cost function due to the economic origin of the formulation of the problem. Therefore, the solutions are evaluated by the cost function computed as the sum of the distances to the points in the solution:

    Cost(S) = Σ_{u ∈ U} min_{v ∈ S} Dist[u, v].
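As an illustration of the classical heuristics just described, a minimal Python sketch of the Greedy constructive heuristic for this unweighted case (L = U, w_i = 1) might look as follows; dist is an n × n distance matrix and the function names are our own.

def cost(dist, medians):
    # cost of a solution: each user is served by its closest open facility
    return sum(min(dist[u][v] for v in medians) for u in range(len(dist)))

def greedy_p_median(dist, p):
    medians = []
    while len(medians) < p:
        candidates = (v for v in range(len(dist)) if v not in medians)
        # add the facility point that least increases the objective
        medians.append(min(candidates, key=lambda v: cost(dist, medians + [v])))
    return medians

# example: four points on a line with coordinates 0, 1, 5, 6
dist = [[abs(a - b) for b in (0, 1, 5, 6)] for a in (0, 1, 5, 6)]
print(greedy_p_median(dist, p=2))   # opens one facility in each of the two clusters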
Most of the heuristic searches use a constructive method and then apply a good strategy to select the base moves to apply. Basic constructive heuristics are used to generate an initial solution for the searches. They consist of adding elements to an empty solution until a set of p points is obtained. The base moves for this problem are the interchange moves. They are used in most of the heuristic searches. Given a solution S, an element v_i in the solution, and an element v_j not in the solution, an interchange move consists of dropping v_i from S and adding v_j to S. The selection of a solution coding that provides an efficient way of implementing the moves and evaluating the solutions is essential for the success of any search
method. For this purpose the following coding of the solutions is often applied. A solution S is represented by an array S = [v_i : i = 1, ..., n], where v_i is the i-th element of the solution, for i = 1, 2, ..., p, and the (i - p)-th element outside the solution, for i = p + 1, ..., n. The computation of the cost of the new solution after each move can be simplified by storing the best allocation costs in a vector named Cost1[·], defined as
    Cost1[i] = min_{j=1,...,p} Dist[u_i, v_j],    i = 1, ..., n,
and the second best allocation cost of u_i, i = 1, ..., n, in
    Cost2[i] = min_{j=1,...,p, j≠r(i)} Dist[u_i, v_j],
where r(i) is such that Cost1[i] = Dist[u_i, v_r(i)]. The first and second best allocation costs have been used in a Variable Neighborhood Decomposition Search (VNDS) in [16]. For 1 ≤ i ≤ p and p < j ≤ n, let S_ij be the new solution consisting in interchanging v_i and v_j. Then the cost of the new solution is given by
    Cost(S_ij) = Σ_{k=1}^{n} min{ Dist[u_k, v_j], min_{l=1,...,p, l≠i} Dist[u_k, v_l] },
which would imply O(pn) operations. However, using the values in Cost1 and Cost2, an improved procedure takes O(p·n_i + n) time, n_i being the number of points assigned to point v_i. Note that if p is large, then n_i must be small, and the difference between pn and p·n_i + n is important in shaking and local searches.
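A sketch of how the Cost1 and Cost2 vectors are exploited is given below: with them, the cost of the interchange solution S_ij can be accumulated in a single pass over the users instead of recomputing the inner minimum from scratch. The variable names and the plain single-loop form are our own illustration of the idea, not the exact O(p·n_i + n) routine of the original code.

def interchange_cost(dist, cost1, cost2, nearest, v_out, v_in):
    """Cost of the solution obtained by closing facility v_out and opening v_in.
    cost1[u] and cost2[u] are the best and second-best allocation costs of user u,
    and nearest[u] is the open facility currently serving u (i.e., r(u))."""
    total = 0.0
    for u in range(len(cost1)):
        if nearest[u] == v_out:
            # users of the closed facility fall back to their second-best
            # allocation unless the newly opened facility is even closer
            total += min(cost2[u], dist[u][v_in])
        else:
            # the other users keep their current facility unless v_in is closer
            total += min(cost1[u], dist[u][v_in])
    return total

After an accepted move, Cost1, Cost2, and r(·) must of course be updated; it is this bookkeeping, largely restricted to the users of the facility that changes, that leads to the improved complexity quoted above.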
11.5 COMPUTATIONAL EXPERIMENTS
The algorithms of [8] were coded in C and tested with instances of the p-median problem where the distance matrix was taken from the TSPLIB instance RL1400, which includes 1400 points. The values for p were from 10 to 100 in steps of 10; i.e., 10, 20, ..., 100. The algorithms were run on the Origin 2000 (64 R10000 processors at 250 MHz, with 8 Gbytes of memory and IRIX 6.5 O.S.) of the European Center for Parallelism of Barcelona. The algorithm was run four times with four different numbers of processors (1, 2, 4, and 8, respectively). The results show that the algorithm SPVNS finds the same objective value using different numbers of processors. The objective values obtained with SPVNS were worse than the best known objective values. The comparison between the results of the four methods (the sequential VNS and the three parallelizations) showed that the speedup increased with the number of processors. However, linearity was not reached due to the concurrent access to the data in the shared memory. Some
computational results on this instance of the problem have been reported in [11], where several heuristics (including a basic VNS) were compared. They reported results of the algorithms RPVNS and RSVNS using 2, 4, and 8 processors. The best results were obtained for the algorithm RPVNS, which gets CPU times near to those obtained by the sequential algorithm and in most cases obtains better objective values. Moreover, the objective values improve as the number of processors increases. The algorithm RSVNS provides worse results than RPVNS both in CPU time and in objective values. Crainic et al. [4] extended to the VNS the success of the cooperative multisearch method that had been applied to a number of difficult combinatorial optimization problems [5]. They carried out extensive experimentation on the classical TSPLIB benchmark problem instances with up to 11,948 demand points and 1000 medians. Their results indicate that the cooperative strategy yields, compared with the sequential VNS, significant gains in terms of computation time without a loss in solution quality. The CNVNS and, for comparison purposes, the sequential VNS were coded in Fortran 77. The cooperative parallel strategy was implemented using MPI. Computational experiments were performed on a 64-processor SUN Enterprise 10000 with a 400 MHz clock and 64 Gbytes of RAM. Tests were run using 5, 10, and 15 processors, and 1 processor for the sequential version. Note that, since VNS includes a random element in the shake, solving the same problem repeatedly may yield different solutions. Therefore, the comparisons have to be based on average values, taking into account the standard deviations. However, standard deviations were extremely low or very low in all cases. This fact indicates that both the sequential and the cooperative multisearch parallel methods are robust with respect to the random move in the shake step. VNS is based on the idea of aggressive exploration of neighborhoods, that is, on the generation and evaluation of many different solutions. Consequently, when the evaluation of the moves is not expensive in computing time (as is the case for the p-median instances using the fast interchange), the communication overhead associated with parallel computation results in less search being performed, and generally in somewhat lower quality solutions for the same total search time.

11.6 CONCLUSIONS
The combination of the VNS and parallelism provides a useful tool to solve hard problems. The VNS, as a combination of series of random and local searches, is parallelizable in several ways. Two simple parallelization strategies are the Synchronous Parallel VNS (SPVNS), which is obtained by parallelizing the local search, and the Replicated Parallel VNS (RPVNS), which is obtained by parallelizing the whole procedure so that each processor runs one VNS in parallel. These parallelizations provide the basic advantages of the parallel procedures. However, using cooperative mechanisms, the performance is improved by the Replicated-Shaking VNS (RSVNS) proposed in
[8], which applies a synchronous cooperation mechanism through a classical master-slave approach, and additionally improved by the Cooperative Neighborhood VNS (CNVNS) proposed in [4], which applies a cooperative multisearch method based on a central-memory mechanism.
Acknowledgments

The research of the first author has been partially supported by the Spanish Ministry of Science and Technology through the project TIC2002-04242-C03-01, 70% of which are FEDER funds.
REFERENCES

1. J.E. Beasley. A note on solving large p-median problems, European Journal of Operational Research, 21 (1985) 270-273.

2. M.L. Brandeau and S.S. Chiu. An overview of representative problems in location research. Management Science 35 (1989) 645-674.

3. G. Cornuejols, M.L. Fisher and G.L. Nemhauser. Location of bank accounts to optimize float: An analytic study of exact and approximate algorithms, Management Science 23 (1977) 789-810.

4. T.G. Crainic, M. Gendreau, P. Hansen and N. Mladenović, Cooperative parallel variable neighbourhood search for the p-median. Journal of Heuristics 10 (2004) 293-314.

5. T.G. Crainic, M. Gendreau, Cooperative parallel tabu search for the capacitated network design. Journal of Heuristics 8 (2002) 601-627.
6. J.A. Diaz, E. Fernandez, Scatter search and path relinking for the capacitated p-median problem, European Journal of Operational Research (2005, to appear).

7. Z. Drezner (ed.), Facility Location: A Survey of Applications and Methods, Springer, 1995.

8. F. Garcia Lopez, B. Melian Batista, J.A. Moreno Perez and J.M. Moreno Vega, The parallel variable neighbourhood search for the p-median problem. Journal of Heuristics, 8 (2002) 375-388.

9. F. Glover and G. Kochenberger (eds.), Handbook of Metaheuristics, Kluwer, 2003.
10. P. Hanjoul, D. Peeters. A comparison of two dual-based procedures for solving the p-median problem. European Journal of Operational Research, 20 (1985) 387-396.
11. P. Hansen and N. Mladenović. Variable neighborhood search for the p-median. Location Science, 5 (1997) 207-226.

12. P. Hansen and N. Mladenović, An introduction to variable neighborhood search, in: S. Voss et al., eds., Metaheuristics, Advances and Trends in Local Search Paradigms for Optimization, Kluwer, (1999) 433-458.

13. P. Hansen and N. Mladenović. Developments of variable neighborhood search, in: C. Ribeiro, P. Hansen (eds.), Essays and Surveys in Metaheuristics, Kluwer Academic Publishers, Boston/Dordrecht/London, (2001) 415-440.

14. P. Hansen and N. Mladenović, Variable neighborhood search: principles and applications, European Journal of Operational Research 130 (2001) 449-467.

15. P. Hansen, N. Mladenović. Variable neighborhood search. In F. Glover and G. Kochenberger (eds.), Handbook of Metaheuristics, Kluwer (2003) 145-184.

16. P. Hansen, N. Mladenović and D. Perez-Brito. Variable neighborhood decomposition search. Journal of Heuristics 7 (2001) 335-350.

17. O. Kariv, S.L. Hakimi. An algorithmic approach to network location problems; part 2. The p-medians. SIAM Journal on Applied Mathematics, 37 (1979) 539-560.
18. A.A. Kuehn, M.J. Hamburger. A heuristic program for locating warehouses. Management Science, 9 (1963) 643-666.

19. L.A.N. Lorena, E.L.F. Senne, A column generation approach to capacitated p-median problems. Computers and Operations Research 31 (2004) 863-876.

20. H.R. Lourenco, O. Martin, and T. Stuetzle. Iterated local search. In F. Glover and G. Kochenberger (eds.), Handbook of Metaheuristics, Kluwer, (2003) 321-353.

21. F.E. Maranzana. On the location of supply points to minimize transportation costs. Operations Research Quarterly, 12 (1964) 138-139.

22. P. Mirchandani and R. Francis (eds.), Discrete Location Theory, Wiley-Interscience, (1990).
23. N. Mladenović. A variable neighborhood algorithm - a new metaheuristic for combinatorial optimization. Presented at Optimization Days, Montreal (1995) p. 112.

24. N. Mladenović and P. Hansen. Variable neighborhood search. Computers & Operations Research 24 (1997) 1097-1100.

25. N. Mladenović, J.A. Moreno-Pérez, and J. Marcos Moreno-Vega. A chain-interchange heuristic method, Yugoslav J. Oper. Res. 6 (1996) 41-54.
26. C.R. Reeves, Modern Heuristic Techniques for Combinatorial Problems. Blackwell Scientific Press, (1993).

27. M.G.C. Resende and R.F. Werneck, A hybrid heuristic for the p-median problem, Journal of Heuristics 10 (2004) 59-88.

28. E. Rolland, D.A. Schilling and J.R. Current, An efficient tabu search procedure for the p-median problem, European Journal of Operational Research, 96 (1996) 329-342.

29. K.E. Rosing, C.S. ReVelle and H. Rosing-Vogelaar, The p-median and its linear programming relaxation: An approach to large problems, Journal of the Operational Research Society, 30 (1979) 815-823.

30. K.E. Rosing and C.S. ReVelle, Heuristic concentration: Two stage solution construction, European Journal of Operational Research 97 (1997) 75-86.

31. E.L.F. Senne and L.A.N. Lorena, Lagrangean/surrogate heuristics for p-median problems, in: M. Laguna and J.L. Gonzalez-Velarde, eds., Computing Tools for Modeling, Optimization and Simulation: Interfaces in Computer Science and Operations Research, Kluwer (2000) 115-130.

32. E.L.F. Senne and L.A.N. Lorena, Stabilizing column generation using Lagrangean/surrogate relaxation: an application to p-median location problems, EURO 2001 - The European Operational Research Conference, (2001).

33. M.B. Teitz, P. Bart. Heuristic methods for estimating the generalized vertex median of a weighted graph. Operations Research, 16 (1968) 955-961.

34. S. Voss. A reverse elimination approach for the p-median problem. Studies in Locational Analysis, 8 (1996) 49-58.

35. The OpenMP Architecture Review Board. OpenMP: A proposed industry standard API for shared memory programming. White Paper, October 1997. (http://www.openmp.org/openmp/mp-documents/paper/paper.html)
12 Parallel Simulated Annealing

M. EMIN AYDIN¹, VECIHI YIGIT²

¹London South Bank University, BCIM, UK
²Ataturk University, Erzurum, Turkey
12.1 INTRODUCTION
This chapter overviews Parallel Simulated Annealing (PSA) algorithms and discusses an illustrative case study. It is not a review of PSA, but rather an introduction in the light of recent advances. We try to give the main taxonomy and reveal the main categorical approaches and methods. The notion of PSA is not new, as it has been discussed since the late 1980s. Computational power was very limited then, and therefore serial algorithms like SA took a very long time even for medium-size problems. Since the technology of parallel computation has been very promising, the majority of PSA studies have considered multiple-processor high-performance computers to overcome long computational times. In this chapter, we briefly introduce PSA in general and present a PSA implementation for computer organizations as well. PSAs have been reviewed and widely introduced by Azencott [1] and briefly introduced by Greening [2] and Leite and Topping [3]. On the other hand, Meise [4] has studied the convergence of various PSA algorithms parallelized in the way of parallel moves. Considering the whole literature on PSA, one can conclude that there is no widely agreed upon classification of PSA. The aim of this chapter is to close that gap and to discuss an instance of PSA implemented for computer organizations rather than multiple-processor computers. Logically, SA algorithms can be parallelized in two main ways. One is based on partitioning the algorithm appropriately [5, 6]. The other is based on parallelizing the data. The infrastructure of parallelization is mainly the technology of parallel machines: SIMD, MIMD, etc. Those resources are utilized to process the tasks in parallel by either of the methodologies mentioned above [7, 8, 9]. The communications needed between parallel units are handled by using Message Passing Interfaces (MPI), which are developed within various programming environments. The rest of the chapter is organized as follows. A brief introduction to SA is given in Section 12.2, while PSA approaches are presented more widely in Section 12.3.
Section 12.4 is about a case study of PSA with a relevant discussion on parallelization and scalability, and finally Section 12.5 summarizes this chapter.

12.2 SIMULATED ANNEALING
Simulated annealing (SA) is a stochastic heuristic algorithm which explores the solution space using a stochastic hill-climbing process [10]. Because of its ease of use, SA has become a popular and practical method for solving large and complex problems such as scheduling, timetabling, and the travelling salesman problem. However, SA has two drawbacks: one is being trapped in local minima and the other is taking too long to find a reasonable solution. In order to overcome these drawbacks, researchers have either hybridized SA with other heuristics such as genetic algorithms or parallelized it. The main aim is to avoid local minima traps and/or to have faster convergence. On the one hand, many hybrid approaches combining the genetic algorithm (GA) with SA have been implemented to take advantage of the diverse population provided by the GA and the hill climbing provided by SA [11, 12, 13, 14, 15]. On the other hand, a number of PSA algorithms have been developed to overcome its speed problem; since the late 1980s attempts have been made to parallelize SA through parallel computing. References [16, 17, 18, 19, 20] have all discussed PSA algorithms for various optimization problems.
Begin
  Randomly initialize energy and temperature (E_0, T_0)
  E_old = E_0
  While (not frozen) do
    Repeat:
      - Generate a new state with E_new
      - ΔE = E_new - E_old
      - if (ΔE < 0) E_old = E_new
      - Else E_old = E_new with probability e^(-ΔE/T)
    Until (at equilibrium)
    T = T_next
  End while
End

Fig. 12.1  The Metropolis algorithm [21].
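Before discussing the algorithm in detail, a minimal Python rendering of this loop is given below for concreteness; the toy objective, the neighbourhood move, and the geometric cooling schedule are our own illustrative choices, not a reference implementation.

import math
import random

def anneal(energy, neighbour, state, t=100.0, t_min=0.01, alpha=0.95, moves_per_level=20):
    e = energy(state)
    best, best_e = state, e
    while t > t_min:                           # "not frozen"
        for _ in range(moves_per_level):       # explore towards equilibrium at this level
            cand = neighbour(state)
            delta = energy(cand) - e           # ΔE = E_new - E_old
            if delta < 0 or random.random() < math.exp(-delta / t):
                state, e = cand, energy(cand)
                if e < best_e:
                    best, best_e = state, e
        t *= alpha                             # cool the temperature down
    return best, best_e

if __name__ == "__main__":
    # toy usage: minimize x^2 starting from x = 10
    print(anneal(lambda x: x * x, lambda x: x + random.uniform(-1.0, 1.0), 10.0))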
SA is inspired by the Metropolis algorithm, which is presented in Figure 12.1. As can be seen there, it starts working with an initial energy E_0 to be cooled down until a certain level (frozen). At each step, the energy is cooled down once an equilibrium level is reached. The search looks for a good and acceptable energy transition (E_old → E_new), where that transition takes place with a Boltzmann probability distribution. SA can be viewed as a probabilistic decision-making process in which a control parameter called temperature, instead of energy, is employed to evaluate the probability of accepting an uphill move (in the case of minimization). Suppose that s_n, s'_n, and s_{n+1} denote the solution held (moved from) in the nth iteration, the solution moved to in the nth iteration, and the qualified solution for the (n+1)th
iteration, respectively. s'_n is yielded by moving from s_n by way of a neighborhood function. We adopt f(s) to be the cost of state s. The new qualified solution is determined as follows:
    s_{n+1} = s'_n,  if f(s'_n) ≤ f(s_n) or e^(Δs/t_n) > p;
    s_{n+1} = s_n,   otherwise,
where Δs = f(s_n) - f(s'_n), p is the random number generated to make a stochastic decision for the new solution, and t_n is the level of temperature (at the nth iteration), which is cooled by a particular cooling function F(t_n). This means that, in order to make the new solution (s'_n) qualified for the next iteration, either it has to be better than the old one (s_n) or, at least, the stochastic rule has to be satisfied. The stochastic rule, in which the probabilistic decision is made to prevent the optimization process from sticking in possible local minima, is the main idea behind SA. The probability of making such a decision under the circumstances of a Δs at temperature t_n is denoted by e^(Δs/t_n). Clearly, as the temperature is decreased by F(t_n), the probability of accepting a large decrease in solution quality decays exponentially towards zero according to the Boltzmann distribution. Therefore, the final solution is near optimal when the temperature approaches zero.

12.3 PARALLEL SIMULATED ANNEALING
Before discussing PSA, it will be useful to summarize the possible parallel resources (computers), which can mainly be classified into three categories: SIMD (Single Instruction Multiple Data), MIMD (Multiple Instruction Multiple Data), and computer networks, such as LAN (Local Area Network) and WAN (Wide Area Network). SIMD and MIMD are the two main architectures of multiple-processor computers, which are coupled either loosely or tightly. On the other hand, computer networks are organized in various fashions. The communication among the organized computers is more expensive than for coupled parallel processors. There are mainly two possible ways of parallelism: (i) physical parallelism, which implies parallelization through decomposing and distributing the data in hand or the search space, and (ii) functional or algorithmic parallelism, which considers the algorithm itself for that purpose. This classification is inspired by the classification of distributed programming [22]. Physical parallelism is done through decomposing the search space or the data set in hand according to the computational resources. On the other hand, functional parallelism is concerned with building the algorithm itself regarding the computational facilities. Functional parallelism is applied to simulated annealing in two ways. One is by launching multiple independent runs (MIR) concurrently with different initial conditions, and the other is by exploring for
the equilibrium state concurrently at each level of temperature. Various methods are built based on either of these approaches in different fashions. Parallel Simulated Annealing (PSA) is not a new notion with respect to empowering SA, as it has been studied since the late 1980s [23]. There is no major consensus available on the classification of PSAs either. Azencott [1] compiled the parallel approaches of SA into his book early in the 1990s. He edited his book into 13 chapters, where the main approaches are discussed with various applications. The earlier authors classified PSAs in terms of the ways to decompose the algorithm and dispatch parts over the parallel processors, rather than considering local or wide-area machine networks, since the main hardware in use was parallel machines. For instance, Greening [2] has categorized PSA in the sense of the synchronization level of the algorithm. The main taxonomy given is classified into synchronous and asynchronous categories and then goes deeper. The majority of other authors recognize different PSAs if they differ with respect to the method of parallelization, such as MIR or parallelism by data. Bevilacqua [9] classifies PSA methods into four groups: data parallel, speculative, parallel moves, and MIR. On the other hand, Leite and Topping [3] do not give substantial categories but emphasize both parallel moves and MIR. Crainic and Toulouse [24] classify PSAs with respect to the coding level of the system. Our conclusion is that the best point of view for the classification of PSA is the methodology applied. From this point of view, PSAs can be classified into four groups: parallelism by data, MIR, parallel moves, and hybrid parallelism. Parallelism by data covers all methods built based on the decomposition of data to parallelize, while the others are related to handling the algorithm itself regardless of data. The speculative [9] and systolic [2] methods can be recognized as parallel moves in particular (special) fashions. In the following sections, we will introduce the possible ways to parallelize SA. Table 12.1 presents a list of works, each of which can be counted as a milestone of parallelism for SA. They are presented with publication year and method of parallelism. As can also be seen there, PSA has been considered since the mid-1980s until now, and recent works are parallelized in a hybrid (evolutionary) way.
Table 12.1  Some contributions made to the field of PSA

Authors                   | Year Published | Method of Parallelism                | Explanations
Aarts et al. [23]         | 1986           | Parallel moves                       | One of the first examples of parallelism
Kim et al. [19]           | 1991           | MIR*                                 |
Gaudron and Trouve [32]   | 1992           | Massive parallelism                  | With parallel moves
Boissin and Lutton [16]   | 1993           | Massive parallelism                  | With parallel moves
Bongiovanni et al. [5]    | 1995           | Parallel moves & massive parallelism | With parallel moves
Yong et al. [35]          | 1995           | Hybrid parallelism                   | Evolution with SA
Chau [50]                 | 1996           | MIR*                                 | Multiple Independent Monte Carlo method
Jeong and Lee [12]        | 1996           | Hybrid parallelism                   | Hybrid with GA†
Premont et al. [30]       | 1996           | Data parallelism                     | Parallelizing cost function
Bouhmala and Pahud [25]   | 1998           | Data parallelism                     | Graph partitioning
Bhandarkar et al. [7]     | 1998           | Parallel moves                       |
Chu et al. [8]            | 1999           | MIR*                                 | Intermediate state mixing
Kolonko [13]              | 1999           | Hybrid parallelism                   | Hybrid with GA†
Czech [51]                | 2000           | MIR*                                 | Interactive MIR
Durand and White [29]     | 2000           | Parallel moves                       |
Huang et al. [11]         | 2001           | Hybrid parallelism                   | Hybrid with GA†
Steinhofel et al. [28]    | 2002           | MIR*                                 |
Vales-Alonso et al. [6]   | 2003           | Hybrid parallelism                   | Hybrid with GA†
Aydin and Fogarty [33]    | 2004           | Hybrid parallelism                   | SA operator to evolve a population

* Multiple Independent Run.  † Genetic Algorithm + Simulated Annealing.
12.3.1 Parallelism by Data

The parallelism achieved in this way is very basic. The idea is to decompose a data set into subparts and assign each part to a different processor. If a search space is provided instead of a data set, then the idea is applicable to the search space. Data sets and/or search spaces can be decomposed and parallelized as long as they are separable. That is why this way of parallelism is problem dependent. If the data set in hand or the search space is not separable, then data parallelism will not work. Bouhmala and Pahud [25] have applied data parallelism to SA for mesh partition optimization, which is an easily decomposable problem. Greening [2] has considered a couple of methods to parallelize systems under this category: functional parallelism, simple serializable set, spatial decomposition, etc. Functional parallelism implies that the cost function is calculated in parallel, which yields very limited parallel performance. A simple serializable set suggests partitioning the whole algorithm and putting each part on a parallel process by means of the operating system's facilities while strictly taking care of its serial property. In addition, there are synchronous and asynchronous methods to split the data/search space into parts, each to be assigned to a particular processor.

12.3.2 Multiple Independent Runs (MIR)

This type of parallelism is one of the most popular methods because of its ease of use. The idea is to launch multiple runs of the same algorithm simultaneously, each to be executed on a different process with a different initial solution. At the end of the runs the best of the system is picked among all outputs, as sketched in Figure 12.2. This has been done by many researchers and is the easiest way. The approaches may be altered by building communications between processes to let the parallelized units interact in particular fashions. Some may exchange information during run time, but others only present the result produced at the end of the process. The latter applications work like client/server systems, where the administrative tasks, such as assigning the tasks over the processes, are performed by a central unit, which can be a server. On the other hand, each independent run works like a client. The main difficulty here is to decide whether or not to build the interaction among the independent runs and how to manage the communication. That may give different solution quality depending on the problem undertaken. The communication is built among the processors either by message-passing systems (MPSs) or by sharing memory. Since the communication may cost more than the computation [3], a balance must be built between communication and computation depending on the problem and the properties of the facility available [8]. Chu et al. [8] have studied a PSA as an MIR-style parallelism with periodical interactions in order to mix the intermediate states, where they are mixed in a pool after each period has finished and then each independent run picks another solution to run on. This repeats until a certain amount of time is reached. Ram et al. [26] have implemented an MIR-style parallelism for job shop and traveling salesman
Fig. 12.2  An instance of parallelization by Multiple Independent Runs.
combinatorial problems by running independent runs on a number of distributed workstations, where the independent runs periodically exchange the solutions. Steinhofel et al. [28] have studied parallelization for job shop scheduling problems in which they run several independent SAs with different initial solutions, letting each run exchange intermediate results with other peers in a predetermined time pattern.

12.3.3 Parallel Moves

The critical part of a SA algorithm with respect to parallelism is the core of the algorithm, where the solution space is explored to reach the equilibrium state. In that stage, the algorithm tries many states of the space at a particular level of temperature. Once an equilibrium state is reached, it cools the temperature down to the next level. Actually, reaching an equilibrium state does not mean anything more than having a part of the space explored for a better state under certain restrictions. Since the moves taking place at this stage form a unique Markov chain, the approaches built based on this stage differ from one another in the method by which the dispatch of the altered states is handled (see Leite and Topping [3] for more details on the various fashions in which parallelism can be configured in this way). In this type of parallelization, the Markov chain formed across the process of exploration to reach the equilibrium state is divided appropriately and parallelized accordingly. Obviously, it is not easy to parallelize SA due to its quite
Fig. 12.3  An instance of parallelization by Parallel Moves.
strict sequential process. There is a strong need to determine the operations/tasks along one process that can be performed concurrently in order to have a consistent and reasonable parallelism. In the case of SA (see the Metropolis algorithm in Figure 12.1), obviously, every step is rigorously placed and the whole algorithm is strictly serial. The only critical part to be considered for algorithmic parallelism is the repeated step where the solutions are permuted to reach the state of equilibrium. That loop is called the inner loop. The solutions explored through the inner loop do not have to be successive, theoretically. However, since the algorithm works in a serial fashion, they become a sequence that fits in a Markov chain. The main idea here is to divide this chain (sequence) into subchains and explore through each on a particular parallel process. A typical instance of parallelization by parallel moves is sketched in Figure 12.3, where several copies of the second step are created and each is assigned to a particular process. Bhandarkar et al. [7] have implemented some parallel-moves-based parallel techniques for SA for the chromosome reconstruction problem in DNA studies. Several ways of breaking that chain for parallelization are explained by Leite and Topping [3]. Some sort of error might appear on breaking the chain. Durand and White [29] have discussed the impact of errors likely to occur during parallel moves and tried to reveal the worst-case situation of the parameters that identify the algorithm.

12.3.4 Massive Parallelization
Massive parallelization is one of the most studied methods in the context of parallel algorithms. In the case of PSA, there is no specific way to implement massive parallelization: it is done via either MIR or parallel moves or any other method. For image processing problems, Premont et al. [30] have studied the difficulty
of energy functions used in SA and their possible local minima. They suggested a massively parallel SA to calculate the sigmoid energy function on a special computer. Parallelization of SA is studied and discussed alongside the other approaches, but it is not found to be efficient due to the sequential nature of SA. Since massive parallelization uses a huge number of processors, the communication time and/or the interaction among the parallel processors takes longer and makes this parallelization disadvantageous. This fact has been studied and discussed by Trouve [31] and Gaudron and Trouve [32], concluding that massive parallelization behaves in a parallel fashion in the short term but is serial-like in the longer term. Also, Bongiovanni et al. [5] have implemented a massively parallel SA for shape detection. Finally, some specific discussion is presented in Section 12.4.6 with respect to the viability of massive parallelization for an evolutionary SA (ESA) algorithm, which is a particular PSA [33].

12.3.5 Evolutionary Parallelism
This is a special and prominent way of hybrid parallelism. For that reason, we prefer to call this method evolutionary parallelism. In this case, SA algorithms are somehow merged with evolutionary approaches to form hybrid algorithms. In fact, EAs are inherently parallel [34]; therefore, combining a SA with an EA is directly parallel and ready to run on a parallel environment. The ways to parallelize EAs change case by case, and a variety of MIR schemes are utilized for that purpose. The majority of the algorithms combined with GAs are to be counted as evolutionary parallelism, where each individual in the population is picked by a separate SA to be manipulated in parallel. References [11, 12, 13, 14, 15, 26, 27] have tailored their hybrid algorithms from SA and GA in various fashions, while [33] and [35] implemented their SAs in an evolutionary way. In the following section, Evolutionary SA (ESA), a particular PSA in an evolutionary way, is presented with an application to uncapacitated facility location problems. A similar one has been introduced and discussed by Vales-Alonso et al. [6] for minimization of the number of alleles lost across generations in the control of the allelic diversity of a population. They used typical MIR with a communication facility of MPI. Since the application is based on a population, and there exists an interaction among the MIRs, it can be considered as an evolutionary approach as well.

12.4 A CASE STUDY
In this section, we present an illustrative application of distributed Evolutionary SA (dESA), which is a sort of PSA parallelized in an evolutionary way. dESA has previously been implemented based on the Distributed Resource Machine (DRM) and applied to job shop scheduling problems [33]. Here, we present an application of dESA to the Uncapacitated Facility Location Problem (UFL), which is another very well known combinatorial problem. UFL problems have been solved by classical OR methods [36, 37, 38, 39, 40] as well as heuristics [41, 42]. However, we
have not come across any method that uses parallel algorithms for the advantage of fast computation. Therefore, this application gives very important insight into PSA being used for UFL and/or similar problems. In the following subsections, DRM, dESA, and UFL are introduced and then the application is discussed.

12.4.1 Distributed Resource Machine (DRM)
In order to run dESA, a parallel computing environment is required. Since it is not common to have specific parallel machines such as SIMD and MIMD easily available, we need to consider the most accessible parallel and distributed environment, which is the common wide/local area network of computers (WAN, LAN). The Distributed Resource Machine (DRM) is an infrastructure that provides a distributed problem-solving environment based on mobile agents. It is the distributed infrastructure of the DREAM software¹ [43], which was developed to solve problems through distributed EAs spread over a massive network of nodes on the Internet. The main aim of this system is to solve problems with multiagent systems, which run EAs. The system consists of a scalable network of resources, which works as a peer-to-peer network of nodes on physically distributed computers. Each node has incomplete knowledge about the rest of the network and works as the container of all the agents running on a particular computer. The environment has very good functionalities to develop applications such that the agents have good communication and limited mobility. (See [44] for more information on DRM.) The way of distributing the evolutionary processes over the resources throughout DREAM is to implement the island model. Islands are designed and furnished with various properties, data, and algorithms, and then distributed over the DRM network. The DRM environment is developed based on multi-thread programming in Java. It runs the islands as MIR technology and provides them with a message passing system (MPS) using sockets on TCP/IP. It is required to partition the problem into subparts to be solved through multi-island models. Since ESA has an evolutionary nature and is inherently parallel, we easily used the DRM software to develop our distributed ESA (dESA) as a model of multiple-island applications. Each island is furnished with an ESA algorithm identical to the others and a small part of the main population, where the ESA algorithm evolves that population towards an optimum state.
12.4.2 Distributed Evolutionary Simulated Annealing (dESA)

The presented dESA is a distributed version of the ESA algorithm. It has been designed based on the island approach [45], where islands equipped with ESA work in parallel. The identical islands are in communication with one another in a particular fashion, where each adopts a SA as an evolutionary operator together with selection and migration. A SA is embedded into an EA; thereby, no other reproduction operators such as crossover and/or mutation have been considered.
¹ http://www.world-wide-dream.org
ESA can mainly be parallelized in two ways. One is exploiting identical islands in parallel with different populations. In this case, the parallelism can be considered as MIR-style parallelism, since the same process works on every island. In that sense, the islands do not communicate while operating and present their results at the end to be compared for the final output. This approach is the case for the application presented here. There is another implementation of ESA for job shop scheduling (JSS) problems that allows communication and migration of solutions among the islands [33]. The other parallelism to be considered for ESA is to parallelize it for parallel machines. dESA is developed for computer clusters such as LAN and WAN, whereas it could be implemented for parallel machines as well. Since ESA is inherently parallel because of its evolutionary character, it can be parallelized on parallel machines. This time, it would be like the parallelism done by Chu [21] with intermediate mixing of the states, as the SA operators would operate in parallel on the population from different processors, and each time they would mix the solutions. It would be computationally very expensive if one preferred such a parallelism on computer clusters. Because of that fact, we preferred parallelism as done in dESA.

Begin
  Initialise the population
  Repeat:
    pick one solution (old), set the highest temperature (t = 100)
    repeat:
      conduct a move by the neighbourhood function and adopt the new solution (new)
      if (new < old) then replace old with new
      else if (e^((old - new)/t) > r) then replace old with new
      endif endif
      t = t * 0.955
    until t < 0.01
    put the solution back into the population
  Until pre-defined number of iterations
End

Fig. 12.4  Evolutionary Simulated Annealing algorithm [33].
The algorithm sketched in Figure 12.4 works in the following way. After initialization and parameter setting, the algorithm repeats the following steps: (i) it selects one individual subject to the running selection rule, (ii) operates on it with a SA operator, and (iii) evaluates whether to put it back into the population or not by a particular replacement rule. All the usual elements of a typical simulated annealing algorithm are contained, except that there are no inner repetitions, which are usually implemented during the acceptance stage before decreasing the temperature level. In this case, the neighborhood function works once per cooling iteration.
The total number of moves per SA operation becomes 200, as the highest temperature of 100 (t) is decayed by the factor 0.955 (f(t)) per iteration until it reaches 0.01. The work most relevant to our approach is that of Yong et al. [35], suggesting an annealing EA for function optimization. A SA process is embedded into a generational evolutionary algorithm where the SA takes a very long time to finalize a single generation. On the other hand, ESA evaluates the individuals just after each separate SA operation, so that one more genetic operator can easily be added to the algorithm. This provides our algorithm with more modularity and scalability over others, but since they only applied their algorithm to function optimization, it is difficult to do a comparison.
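To make this island-style parallelism concrete, the following minimal Python sketch runs several non-communicating annealing islands in parallel and keeps the best result at the end. The standard multiprocessing module is used here only as a stand-in for the Java/DRM islands of dESA; the toy objective and all parameter values are illustrative assumptions.

import math
import random
from multiprocessing import Pool

def island(seed, t=100.0, t_min=0.01, alpha=0.955):
    """One island: an independent annealing run from its own random start."""
    rng = random.Random(seed)
    x = rng.uniform(-10.0, 10.0)          # illustrative individual: a single real number
    fx = x * x
    while t > t_min:                      # about 200 cooling steps, as in the text
        y = x + rng.uniform(-1.0, 1.0)    # neighbourhood move
        fy = y * y
        if fy < fx or rng.random() < math.exp((fx - fy) / t):
            x, fx = y, fy
        t *= alpha
    return fx, x

if __name__ == "__main__":
    with Pool(processes=12) as pool:      # 12 islands, as in the dESA set-up
        results = pool.map(island, range(12))
    print(min(results))                   # best objective value over all islands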
12.4.3 The Uncapacitated Facility Location Problem (UFL)

The mathematical formulation of these problems as mixed integer programming models has proven very fruitful in the derivation of solution methods. To formulate the UFL problems, consider a set of candidate sites for facility location, J = {1, ..., m}, and a set of customers, I = {1, ..., n}. Each facility j ∈ J has a fixed cost f_j. Every customer i ∈ I has a demand, b_i, and c_ij is the unit transportation cost from facility j to customer i. Without loss of generality we can normalize the customer demands to b_i = 1. The problem is formulated in the following way:

    minimize  Σ_{i=1}^{n} Σ_{j=1}^{m} c_ij x_ij + Σ_{j=1}^{m} f_j y_j,        (12.1)

subject to

    Σ_{j=1}^{m} x_ij = 1,    ∀i ∈ I,                                          (12.2)

    0 ≤ x_ij ≤ y_j  and  y_j ∈ {0, 1},    ∀i ∈ I and ∀j ∈ J,                  (12.3)
where x_ij represents the quantity supplied from facility j to customer i, and y_j indicates whether facility j is established (y_j = 1) or not (y_j = 0). Constraint (12.2) makes sure that all demands are met by the open sites, and constraint (12.3) is to keep x_ij and y_j integer. Since it is assumed that there is no capacity limit for any facility, the demand size of each customer is ignored, and therefore constraint (12.2) is established without considering the demand variable (b_i = 1). With this model, it is mainly the number of facility sites to be established that is decided. It would be possible to determine the quantities to be supplied from facility j to customer i such that the total cost (including fixed and variable costs) is minimized. However, this is not considered in this case since each candidate site is assumed to have unlimited capacity. The benchmark problems considered in this study are given in Table 12.2 and are taken from the OR Library [46]. The first column is the benchmark name, and the following two are the benchmark size and the known optimum value, respectively.
Table 12.2  Benchmark problems with relevant information

Benchmark | Size (m × n) | Optimum
Cap 71    | 16 × 50      | 932615.75
Cap 72    | 16 × 50      | 977799.40
Cap 73    | 16 × 50      | 1010641.45
Cap 74    | 16 × 50      | 1034976.98
Cap 101   | 25 × 50      | 796648.44
Cap 102   | 25 × 50      | 854704.20
Cap 103   | 25 × 50      | 893782.11
Cap 104   | 25 × 50      | 928941.75
Cap 131   | 50 × 50      | 793439.56
Cap 132   | 50 × 50      | 851495.33
Cap 133   | 50 × 50      | 893076.71
Cap 134   | 50 × 50      | 928941.75
Cap A     | 100 × 1000   | 17156454.48
Cap B     | 100 × 1000   | 12979071.58
Cap C     | 100 × 1000   | 11505594.33
12.4.4 A dESA Implementation for UFL Problems
The dESA implementation for UFL problems has been developed based on the implementation done for JSS problems [33] with some minor differences. The main differences are the size of the subpopulations and the intercommunication of the islands. In the JSS case, each island has a subpopulation to evolve; on the other hand, the islands here take individuals instead of subpopulations. In addition, we do not let the individuals migrate/circulate among the islands. That is because the problems are not as hard as the JSS benchmarks, and they are very time sensitive due to their sizes. We handle transitions between states by the neighborhood function. We represent the state of solutions by a list of integers, say L, denoting the enumerated open facilities (l_i). For instance, L = {0, 1, 4, 12} means the open facilities are 0, 1, 4, and 12; the rest are considered closed. As we denote the particular solution state at time t by s_t, one state of solution will be s_t = {L} = {l_i | 0 < i ≤ m}, where m is the maximum number of facilities. The neighborhood function employed allows modifying the states by three different alternative operations: Exchange() exchanges one integer on the list with another possible one not on the list; Add() adds a new integer to the list; Remove() removes one integer from the list. That means that Exchange() closes an open facility and opens another, Add() opens a new facility, and Remove() closes an open one. Only one of these operations is applied at a time. We select one operation randomly according to the following rule: Exchange() iff Condition 1; Add() iff Condition 2; Remove() iff Condition 3, where
Condition 1: (|L| = 1 ∩ 0 ≤ p < 0.7) ∪ (|L| > 1 ∩ 0 ≤ p ≤ 0.5)
Condition 2: (|L| = 1 ∩ 0.7 ≤ p < 1) ∪ (|L| > 1 ∩ 0.5 < p ≤ 0.7)
Condition 3: (|L| = m) ∪ (|L| > 1 ∩ 0.7 < p ≤ 1)

and |L| is the length of the list, p is a uniformly generated random number, and m is the maximum number of facilities as usual. By applying this function, we move to a neighboring state. This is a preventive neighborhood function that keeps the solutions feasible by letting the operators manipulate only when convenient. The convenience of one situation is determined by both the length of the list and the random number (|L|, p). For instance, if |L| > 1, then any of the operations can be selected according to p; on the other hand, if |L| = m, then only the Remove() operator is allowed to operate.
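A direct Python transcription of this selection rule is sketched below; the function name and the handling of the open-facility list are our own illustrative choices, while the thresholds follow the conditions stated above.

import random

def neighbour(L, m, rng=random):
    """Apply one of Exchange()/Add()/Remove() to the list L of open facilities,
    choosing the operation from the conditions on |L| and a uniform random p."""
    L = list(L)
    closed = [j for j in range(m) if j not in L]
    p = rng.random()
    size = len(L)
    if size == m or (size > 1 and p > 0.7):                     # Condition 3: Remove()
        L.remove(rng.choice(L))
    elif (size == 1 and p >= 0.7) or (size > 1 and p > 0.5):    # Condition 2: Add()
        L.append(rng.choice(closed))
    else:                                                       # Condition 1: Exchange()
        L[rng.randrange(size)] = rng.choice(closed)
    return L

# example: 16 candidate facilities, 3 currently open
print(neighbour([0, 1, 4], m=16))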
12.4.5 Experimental Results and Discussion

The experimental study for this application has been carried out as a comparative work. First, the results of two genetic algorithms (GA) introduced by Kratica et al. [47] and Jaramillo et al. [48] are considered. Then one sequential (nondistributed) ESA and one dESA implementation have been developed as described before. Both GAs are presented in Table 12.3 as shown in their original work. The ESA and dESA implementations have the highest temperature of 100, decayed by 0.955 at each cooling iteration until the temperature becomes 0.01. That takes 200 iterations per run of the SA operator. The experimental study for the sequential algorithms has been done on a Pentium III 700 MHz double-processor computer running Windows 2000. All of the software was developed using Sun Java JDK 1.3.1. The tackled problems are very well known benchmarks that are accessible in the OR Library [46]. The dESA implementation is developed as a 12-asynchronous-island model; each island evolves a single individual for a while. More details are given in the following paragraphs. The benchmarks are introduced in Table 12.2 with their sizes and the optimum values.

Table 12.3  Two genetic algorithms that solve UFL problems

Benchmarks (Cap) | GA-Jaram [48] Deviation from the Optimum | GA-Jaram [48] CPU Time | GA-Krat [47] Deviation from the Optimum | GA-Krat [47] CPU Time
71-74            | N/A   | 0.030 | N/A   | 0.86
101-104          | N/A   | 0.096 | N/A   | 1.26
131-134          | N/A   | 0.495 | 0.175 | 2.97
A-C              | N/A   | 53.22 | 0.300 | 83.1
The two applications of GA given in Table 12.3 are very similar, as both have exploited a diffusion-like crossover operator, which tailors the new states very rigorously. As indicated in Table 12.3, the success of Kratica et al. [47] is slightly
worse than that of [48], possibly because of the computational facilities used. The reason for indicating those results is to give an insight into the benchmarks used and to create a baseline for the success of the ESA implementations. Both algorithms are somehow inspired by the GA reported in [49], where the algorithm works for slightly different types of assignment problems. The ESA implementation has, in the first instance, been set up with a population of 5 individuals to be evaluated 300 times so that the total number of evaluations becomes 60,000 (300 × 200), where each individual is evaluated 200 times per run of the SA operator. Therefore, the evaluation per individual takes place approximately 12,000 times. However, the results were slightly worse, as some benchmarks were not hit 100% of the time. Then, the parameter set was adjusted accordingly. The population of 10 individuals has evolved for 2000 iterations so that the total number of evaluations becomes 400,000 and the evaluations per individual become approximately 40,000. The results became more impressive than before. The dESA implementation consists of 12 identical ESA islands, where each works independently and communicates autonomously, tackling a single individual rather than a population. The idea here is to spread a population of solutions over the distributed islands to have alternative operations in parallel. The results of both applications are summarized in Table 12.4, showing the superiority of the dESA with respect to both the quality of the solution and the CPU time. The results are tabulated in two groups of columns: one is for the quality of solutions, and the other is for the CPU time spent. The quality of solutions is given as a percentage of error that is calculated as follows:

    Error % = (Result - Optimum) / Optimum.                                   (12.4)
The results of the ESA implementation before adjustment hit the optimum values in 11 of the 15 benchmarks, but show deviations in 4 of the 15, each with tougher optimum values than the others. Experiments were done 50 times for each benchmark, where the CPU time consumed is measured as the time of the last best result found. The results of ESA after the adjustment are much better with respect to the quality of solutions, but benchmark Cap C is still not met 100%, as the optimum is not hit in a few runs out of 50. The results of dESA are very impressive and the best of all of the algorithms examined. All benchmarks have been solved hitting the optimal values within a shorter time than the others. The main difficulty with the methods that consider individuals is the trap of getting stuck in a local optimum. This is caused by the initial state of the individuals. On the other hand, the population-based methods provide more diversity, which enables the algorithm to search through various paths. In the case of dESA, a diverse initial population is spread over the islands to run all in parallel, and at the end, the best solution can be found with respect to both time and quality. This makes the method more powerful than many others.
Table 12.4  Experimental results obtained from different algorithms

          | Error %          | CPU Time
Benchmark | ESA      | dESA  | ESA      | dESA
Cap 71    | 0.0      | 0.0   | 0.040    | 0.013
Cap 72    | 0.0      | 0.0   | 0.020    | 0.014
Cap 73    | 0.0      | 0.0   | 0.022    | 0.011
Cap 74    | 0.0      | 0.0   | 0.013    | 0.008
Cap 101   | 0.0      | 0.0   | 0.250    | 0.046
Cap 102   | 0.0      | 0.0   | 0.085    | 0.035
Cap 103   | 0.0      | 0.0   | 0.156    | 0.115
Cap 104   | 0.0      | 0.0   | 0.031    | 0.010
Cap 131   | 0.0      | 0.0   | 1.960    | 0.207
Cap 132   | 0.0      | 0.0   | 0.828    | 0.160
Cap 133   | 0.0      | 0.0   | 0.991    | 0.103
Cap 134   | 0.0      | 0.0   | 0.111    | 0.019
Cap A     | 0.0      | 0.0   | 29.699   | 1.392
Cap B     | 0.0      | 0.0   | 98.563   | 7.995
Cap C     | 0.00011  | 0.0   | 184.263  | 18.017

Fig. 12.5  The change in the speed of convergence of dESA by changing the number of islands (scalability).
12.4.6 Parallelism and Scalability
The parallelism applied to dESA is categorically an evolutionary parallelism, which has been achieved through a multiagent (island) system. As discussed earlier, EAs are inherently parallel. Specifically, ESA is an evolutionary algorithm and can be
parallelized in various MIR. In this case, it is parallelized as MIR in which islands are independent runs of ESA with different initial solutions. If we were able to run the algorithm on parallel machines such as MIMD, then we would try it with other parallelization ways. One of the main benefits of multiagent systems running on parallel computers is the scalability. That is a strength of dESA inherited from DRM, as one can enlarge the system by increasing the numbers of the islands. As the number of islands grows, the population is getting larger and more diverse while the time spent does not change by the means of parallelism. Figure 12.5 indicates the decrease in CPU time gained by increasing the number of islands, where we examined the three hardest benchmarks (Cap A, Cap B, and Cap C) with four different numbers of islands: 5,10,15, and 20. On vertical and horizontal axes, the average CPU time and the numbers of the islands are presented, respectively. The lowest graph presents the change of CPU time for Cap A, while the middle is for Cap B, and the top one is for Cap C. As Cap A is not as hard as the other two, there is no significant change in its CPU time. On the other hand, the change is substantial and significant in the case of Cap C as the CPU time is about 15,000 seconds when the number of islands is 5 and it goes down to 6000 by employing 20 islands. In the case of Cap B, the CPU time decreases from 5920 to 3560 seconds by employing 5 more islands earlier but does not change as such later. Figure 12.5 clearly shows that the larger the numbers of island, the shorter the CPU time spent to reach the optimum. The other significant conclusion is about the upper limit of parallelism. As discussed by other authors [ 1,3], massively parallelism does not help SA in full. SA needs some minimum time to catch the optimum and that time is necessary even if we apply a massive parallelism. That can be seen from the case of Cap A in Figure 12.5 as the lowest graph does not show a significant decrease while the number of islands increases. This is the case for the latter part of middle graph of Cap B. If we increase the number of islands a little more, then we can realize the same tendency for Cap C as well. Thus, we can derive such a fact that the parallelism will help to speed up the annealing process but will stop when the number of processors reaches a certain level. 12.5
12.5 SUMMARY
In this chapter, we have presented an introduction to advances in the field of PSA, together with a case study on uncapacitated facility location problems. As mentioned earlier, PSA algorithms can be classified taxonomically into four categories: parallelism by data, Multiple Independent Runs, parallel moves, and hybrid methods (such as evolutionary approaches). We have explained each category in the light of previous and recent work. As noted by many authors, parallelism by data may or may not be easy, depending on the nature of the problem. MIR is quite easy to apply and provides good performance, while parallel moves can provide very high performance with respect to both speedup and solution quality, although the method is not as easy to implement and run. The hybrid methods yield very good solution quality by combining the advantages of several methods. The overhead of parallelization cannot be avoided either.
One of the most relevant issues is the balance between communication and computation: some problems involve heavy computation, so communication costs hardly matter, whereas others involve lighter computation and the communication overhead becomes significant. A proper balance must therefore be sought when parallelizing with any method. The case study in this chapter has shown the performance of a particular evolutionary SA algorithm distributed and run in parallel. Since it works in an evolutionary way, we can count it as a hybrid parallelization method. The performance of the algorithm varies from problem to problem due to differences in hardness. The last three benchmarks were examined with further experiments in order to reveal these differences in performance. The experiments show that increasing the number of parallel units does not always pay off in speed: as indicated in the literature, massive parallelization does not always help as much as expected.
REFERENCES

1. Azencott, R., Simulated Annealing: Parallelization Techniques, John Wiley and Sons, 1992.

2. Greening, D.R., Simulated Annealing with Errors, PhD Dissertation, University of California, Los Angeles, 1995.

3. Leite, J.P.B., and Topping, B.H.V., "Parallel simulated annealing for structural optimization", Computers and Structures, 73, 545-564, (1999).
4. Meise, C., "On the convergence of parallel simulated annealing", Stochastic Processes and their Applications, 76, 99-115, (1998).
5. Bongiovanni, G., Crescenzi, P., and Guerra, C., "Parallel simulated annealing for shape detection", Computer Vision and Image Understanding, 61(1), 60-69, (1995).
6. Vales-Alonso, J., Fernandez, J., Gonzalez-Castano, F.J., and Cabarello, A., "A parallel optimization approach for controlling allele diversity in conservation schemes", Mathematical Biosciences, 183(2), 161-173, (2003).

7. Bhandarkar, S.M., Machaka, S., Chirravuri, S., and Arnold, J., "Parallel computing for chromosome reconstruction via ordering of DNA sequences", Parallel Computing, 24(12-13), 1177-1204, (1998).

8. Chu, K.W., Deng, Y., and Reinitz, J., "Parallel simulated annealing by mixing of states", Journal of Computational Physics, 148(2), 646-662, (1999).

9. Bevilacqua, A., "A methodological approach to parallel simulated annealing on an SMP system", Journal of Parallel and Distributed Computing, 62(10), 1548-1570, (2002).
10. Kirkpatrick, S., Gelatt, C.D. Jr., and Vecchi, M.P., "Optimisation by simulated annealing", Science, 220(4598), 671-679, (1983).

11. Huang, H.C., Pan, J.S., Lu, Z.M., Sun, S.H., and Hang, H.M., "Vector quantization based on genetic simulated annealing", Signal Processing, 81, 1513-1523, (2001).

12. Jeong, I.K., and Lee, J.J., "Adaptive simulated annealing genetic algorithm for system identification", Engineering Applications of Artificial Intelligence, 9(5), 523-532, (1996).

13. Kolonko, M., "Some new results on simulated annealing applied to the job shop scheduling problem", European Journal of Operational Research, 113, 123-136, (1999).

14. Wang, L., and Zheng, D.Z., "An effective hybrid optimisation strategy for job-shop scheduling problems", Computers and Operations Research, 28, 585-596, (2001).

15. Wong, S.Y.W., "Hybrid simulated annealing/genetic algorithm approach to short term hydro-thermal scheduling with multiple thermal plants", Electrical Power and Energy Systems, 23, 565-575, (2001).

16. Boissin, N., and Lutton, J.L., "A parallel simulated annealing algorithm", Parallel Computing, 19(8), 859-872, (1993).

17. Ferreira, A.G., and Zerovnik, J., "Bounding the probability of success of stochastic methods for global optimisation", Computers and Mathematics with Applications, 25(10-11), 1-8, (1993).

18. Greening, D.R., "Parallel simulated annealing techniques", Physica D: Nonlinear Phenomena, 42(1-3), 293-306, (1990).

19. Kim, Y., Jang, Y., and Kim, M., "Stepwise-overlapped parallel simulated annealing and its application to floorplan designs", Computer-Aided Design, 23(2), 133-144, (1991).

20. Voogd, J.M., Sloot, P.M.A., and Dantzig, R., "Crystallization on a shape", Future Generation Computer Systems, 10(2-3), 359-361, (1994).

21. Chu, K.W., Optimal Parallelisation of Simulated Annealing by State Mixing, PhD Dissertation, State University of New York at Stony Brook, 2001.

22. Brown, C., Unix Distributed Computing, Prentice Hall, London, 1994.

23. Aarts, E.H.L., de Bont, F.M.J., Habers, J.H.A., and van Laarhoven, P.J.M., "Parallel implementation of the statistical cooling algorithm", Integration VLSI Journal, 4, 209-238, (1986).
24. Crainic, T.G., and Toulouse, M., "Parallel metaheuristics", in Fleet Management and Logistics, ed: T.G. Crainic and G. Laporte, Kluwer Academic, Norwell, MA, 209-251, (1998).

25. Bouhmala, N., and Pahud, M., "A parallel variant of simulated annealing for optimizing mesh partitions on workstations", Advances in Engineering Software, 29(3-6), 481-485, (1998).

26. Ram, D.J., Sreenivas, T.H., and Subramaniam, K.G., "Parallel simulated annealing algorithms", Journal of Parallel and Distributed Computing, 37, 207-212, (1996).

27. Satake, T., Morikawa, K., Takahashi, K., and Nakamura, N., "Simulated annealing approach for minimising the makespan of the general job-shop", International Journal of Production Economics, 60-61, 515-522, (1999).

28. Steinhofel, K., Albrecht, A., and Wong, C.K., "Fast parallel heuristics for the job shop scheduling problem", Computers and Operations Research, 29, 151-169, (2002).

29. Durand, M.D., and White, S.R., "Trading accuracy for speed in parallel simulated annealing with simultaneous moves", Parallel Computing, 26, 135-150, (2000).

30. Premont, G., Lalanne, P., Chavel, P., Kuijk, M., and Heremans, P., "Generation of sigmoid probability functions by clipped differential speckle detection", Optical Communications, 129, 347-356, (1996).

31. Trouve, A., "Massive parallelisation of simulated annealing: A mathematical study", in Simulated Annealing: Parallelization Techniques, ed: R. Azencott, John Wiley and Sons, 144-162, (1992).

32. Gaudron, I., and Trouve, A., "Massive parallelisation of simulated annealing: An experimental and theoretical approach for spin glass models", in Simulated Annealing: Parallelization Techniques, ed: R. Azencott, John Wiley and Sons, 162-186, (1992).

33. Aydin, M.E., and Fogarty, T.C., "A simulated annealing algorithm for multi-agent systems: A job shop scheduling application", Journal of Intelligent Manufacturing, 15(6), 805-814, (2004).

34. Alba, E., and Tomassini, M., "Parallelism and evolutionary algorithms", IEEE Transactions on Evolutionary Computation, 6(5), 443-462, (2002).

35. Yong, L., Lishan, K., and Evans, D.J., "The annealing evolution algorithm as function optimizer", Parallel Computing, 21(3), 389-400, (1995).
36. Beasley, J.E., "Lagrangean heuristics for location problems", European Journal of Operational Research, 65, 383-399, (1993).

37. Conn, A.R., and Cornuejols, G., "A projection method for the uncapacitated facility location problem", Mathematical Programming, 46, 273-298, (1990).
38. Erlenkotter, D., "A dual-based procedure for uncapacitated facility location", Operations Research, 26, 992-1009, (1978).

39. Guignard, M., "A Lagrangean dual ascent algorithm for simple plant location problems", European Journal of Operational Research, 35, 193-200, (1988).

40. Holmberg, K., and Jornsten, K., "Dual search procedures for the exact formulation of the simple plant location problem with spatial interaction", Location Science, 4, 83-100, (1996).

41. Alves, M.L., and Almeida, M.T., "Simulated annealing algorithm for the simple plant location problem: A computational study", Rev. Invest., 12, (1992).

42. Yigit, V., Aydin, M.E., and Turkbey, O., "Evolutionary simulated annealing algorithms for uncapacitated facility location problems", in Adaptive Computing in Design and Manufacture VI, ed: I.C. Parmee, 185-196, Springer, (2004).

43. Jelasity, M., Preuss, M., and Paechter, B., "A scalable and robust framework for distributed applications", CEC'02: The 2002 World Congress on Evolutionary Computing, 12-17 May 2002, Honolulu, HI, USA.

44. Paechter, B., Back, T., Schoenauer, M., Sebag, M., Eiben, A.E., Merelo, J.J., and Fogarty, T.C., "A distributed resource evolutionary algorithm machine (DREAM)", in Proc. of the Congress on Evolutionary Computation 2000 (CEC2000), IEEE Press, 951-958, (2000).

45. Schmeck, H., Branke, J., and Kohlmorgen, U., "Parallel implementations of evolutionary algorithms", in Solutions to Parallel and Distributed Computing Problems, ed: A. Zomaya, F. Ercal, and S. Olariu, John Wiley and Sons, (2001).

46. Beasley, J.E., "Obtaining test problems via Internet", Journal of Global Optimisation, 8, 429-433, (1996).

47. Kratica, J., Tosic, D., Filipovic, V., and Ljubic, I., "Solving the simple plant location problem by genetic algorithms", RAIRO - Operations Research, 35(1), 127-142, (2001).

48. Jaramillo, J.H., Bhadury, J., and Batta, R., "On the use of genetic algorithms to solve location problems", Computers and Operations Research, 29, 761-779, (2002).

49. Beasley, J.E., and Chu, P.C., "A genetic algorithm for the set covering problem", European Journal of Operational Research, 94, 392-404, (1996).

50. Chou, C-C., Parallel Simulated Annealing and Applications, PhD Dissertation, State University of New York at Stony Brook, 1996.
51. Czech, Z.J., "Parallel simulated annealing for the set-partitioning problem", Proc. of the 8th Euromicro Workshop on Parallel and Distributed Processing, Rhodos, Greece, 19-21 January 2000, 343-350, (2000).
13 Parallel Tabu Search

TEODOR GABRIEL CRAINIC¹, MICHEL GENDREAU², JEAN-YVES POTVIN²

¹ Centre de Recherche sur les Transports and Université du Québec à Montréal, Canada
² Centre de Recherche sur les Transports and Université de Montréal, Canada
13.1 INTRODUCTION
Like other metaheuristics, Tabu Search (TS) has been the object over the last 15 years or so of several efforts aimed at taking advantage of the benefits offered by parallel computing (see the surveys of Crainic and Toulouse [23, 24], Cung et al. [28], Holmqvist, Migdalas, and Pardalos [54], and Pardalos et al. [64]). As is the case with other metaheuristics, the main goal pursued when resorting to parallel implementations of TS is to reduce the overall (“wallclock”) time required to solve a problem instance. This is a particularly important objective when tackling problems that must be solved in real time, as we shall see in a later section of this chapter. There are, however, other possible objectives. Among these, we must mention enhancing the robustness of the algorithm at hand by performing a broader search of the solution space. In some cases, this may even lead to a more efficient search scheme, i.e., a search scheme capable of finding better solutions than the corresponding sequential search approach for the same overall computational effort. This may also significantly reduce the calibration effort required to achieve good results consistently over diverse sets of problem instances. The purpose of this chapter is two-fold: first, to describe, discuss, and illustrate the main strategies that have been used over time to parallelize TS heuristics; second, to provide an updated survey of the literature in the rapidly moving field of parallel TS. The remainder of the chapter is organized as follows. In Section 13.2, we briefly recall the main features and components of TS. Section 13.3 describes parallelization strategies for TS. Parallel TS implementations from the literature are then reviewed in Section 13.4. This is followed in Section 13.5 by a description of two applications of parallel TS schemes for the real-time management of fleets of vehicles. Section 13.6 summarizes the main conclusions of the chapter and identifies some interesting research directions for parallel TS.
13.2 TABU SEARCH
Tabu Search was introduced by Glover in 1986 in a seminal paper [45] in which he also coined the term metaheuristics and defined these as strategies designed to guide inner heuristics aimed at specific problems. TS is an extension of classical local search methods typically used to find approximate solutions to difficult combinatorial optimization problems. As in other local search techniques, TS explores the solution or search space by moving at each iteration from the current solution to a neighboring one, where the neighborhood of the current solution is defined by the transformations of the solution allowed by the specific inner heuristic (also called the neighborhood operator). Usually, the next solution is the one that improves the objective function the most (“best improvement” rule), or the first improving one that is encountered when exploring the neighborhood (“first improvement” rule). Traditional local search techniques rely on the monotonic improvement of the objective function to guide and control the search and, therefore, they typically end up trapped in local optima of the neighborhood operator. The main distinctive feature of TS is that it can overcome local optima and keep the search going. When a local optimum is encountered, the search moves instead to the best deteriorating solution in the neighborhood. To prevent cycling, a history of the search is maintained in a short-term memory, and moves that would bring the search trajectory back toward recently visited solutions are said to be tabu and are disallowed, unless they meet some conditions (aspiration criteria). In theory, the search could continue forever. Stopping criteria must thus be specified in actual implementations. The most commonly used are a fixed CPU time allotment, the total number of iterations performed, the number of iterations performed without observing an improvement in the best objective value recorded, etc.

The previous description relates to what could be termed basic Tabu Search. In reality, TS in its full implementation goes much further than that and can be interpreted globally as the combination of local search principles with the exploitation of information stored in various types of memories. Two key concepts in TS are those of search intensification and search diversification. The idea behind search intensification is that regions of the search space that appear “promising” (usually because good solutions have been encountered close by) should be explored more thoroughly in order to make sure that the best solutions in these regions are found. Intensification is usually based on some type of intermediate-term memory, such as a recency memory, in which one would record the number of consecutive iterations that various “solution components” have been present without interruption in the current solution. Intensification is typically performed by periodically restarting the search from the best currently known solution and by “freezing” (fixing) in this solution the components that seem more attractive. It also often involves switching to a more powerful neighborhood operator for a short period of time. Search diversification addresses the complementary need to perform a broad exploration of the search space to make sure that the search trajectory has not been confined to regions containing only mediocre solutions. It is thus a mechanism that tries to force the search trajectory into previously unexplored regions of the search space. Diversification is usually based
on some form of long-term memory, such as a frequency memory, in which one would record the total number of iterations (since the beginning of the search) that various “solution components” have been present in the current solution or have been involved in the selected moves. There are three major diversification techniques. The first one, called restart diversification, involves introducing some rarely used components into the current or the best known solution and restarting the search from this point. The second diversification technique, called continuous diversification, integrates diversification considerations directly into the regular search process by biasing the evaluation of possible moves to account for component frequencies. A third method for achieving diversification is through so-called strategic oscillation, which is a systematic technique for driving the search trajectory into infeasible space (i.e., to solutions that do not satisfy all the constraints of the problem at hand) and then back into feasible space, in the hope that it will end up in a different region of the search space. Readers wishing to learn more about TS are referred to the several introductory chapters that have been written on the topic (e.g., [38, 50, 53]) and to the fundamental papers of Glover [46, 47, 48]. The book by Glover and Laguna [51] is the most comprehensive reference on the topic.
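To make the basic scheme concrete, the following minimal sketch implements best-improvement Tabu Search with a short-term memory and a simple aspiration criterion; the `evaluate`, `neighborhood`, and `attribute` arguments are problem-specific placeholders (they stand for the inner heuristic and its move attributes) and are not taken from any particular implementation discussed in this chapter.

```python
import math

def tabu_search(initial, evaluate, neighborhood, attribute, tenure=7, max_iters=1000):
    """Basic best-improvement tabu search (an illustrative sketch).

    evaluate(s)      -> objective value to minimize
    neighborhood(s)  -> iterable of (move, neighbor) pairs
    attribute(move)  -> hashable move attribute stored in the short-term memory
    """
    current, best = initial, initial
    best_value = evaluate(initial)
    tabu = {}  # move attribute -> iteration index until which it remains tabu

    for it in range(max_iters):
        chosen_move, chosen_neighbor, chosen_value = None, None, math.inf
        for move, neighbor in neighborhood(current):
            value = evaluate(neighbor)
            is_tabu = tabu.get(attribute(move), -1) >= it
            aspiration = value < best_value            # tabu status overridden if it improves the best
            if (not is_tabu or aspiration) and value < chosen_value:
                chosen_move, chosen_neighbor, chosen_value = move, neighbor, value
        if chosen_neighbor is None:                    # the whole neighborhood is tabu
            break
        current = chosen_neighbor                      # the move may deteriorate the objective
        tabu[attribute(chosen_move)] = it + tenure     # forbid reversing this move for a while
        if chosen_value < best_value:
            best, best_value = chosen_neighbor, chosen_value
    return best, best_value
```

A full implementation would add the stopping criteria, intensification, and diversification mechanisms described above; the sketch only captures the short-term memory mechanics.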
13.3 PARALLELIZATION STRATEGIES FOR TABU SEARCH

Because of its potentially heavy computational requirements, TS is a natural candidate for the application of parallel computing techniques. In fact, fairly early after the introduction of the method in 1986, researchers began to use such techniques in the development of TS heuristics. Most of these early efforts focused on the parallelization of the most computationally intensive step of the method, namely the neighborhood evaluation, using rather straightforward master-slave schemes (see, e.g., [18, 60, 73]). It soon became apparent, however, that one could go much further in the parallelization of TS than suggested by these low-level parallelization schemes, which turn out to be only faster versions of sequential TS implementations with the same overall behavior (i.e., they produce the same search trajectories as their sequential counterparts, but in lower wallclock time). In fact, one could easily envision high-level parallelization approaches that would display a completely different algorithmic behavior; this would be the case, in particular, of algorithmic schemes relying on several search threads exploring simultaneously the search space in a coordinated and purposeful fashion.
13.3.1 A Taxonomy

In 1993, Crainic, Toulouse, and Gendreau introduced a taxonomy of parallel TS approaches (later published in 1997 [27]) that had several objectives: first, to provide a comprehensive picture of the then existing parallelization strategies for TS; second, to contribute to a more meaningful analysis and comparison of these methods; third, to foster a better understanding of the relationships between TS and parallel computing;
and fourth, to identify new potential parallelization strategies and suggest interesting research avenues. To this date, this taxonomy remains the most comprehensive one on the topic. We now summarize this taxonomy and the parallelization strategies it sought to classify, before reviewing the literature using it in the next section.

The taxonomy is based on a three-dimensional classification of algorithmic features. The first dimension is called Control cardinality; it defines whether the search is controlled by a single process (as in master-slave implementations) or collegially by several processes that may collaborate or not. In the latter case, each process is in charge of its own search, as well as of establishing communications with the other processes, and the global search terminates once each individual search stops. These two alternatives are respectively identified as 1-control (1C) and p-control (pC).

The second dimension (Control and communication type) relates to the type and flexibility of the search control. As we shall see, it is probably the most important dimension on which to differentiate parallel TS implementations, and it deserves a detailed discussion. This control-type dimension accounts for the communication organization, synchronization, and hierarchy, as well as the way in which processes handle and share information. There are four degrees or levels along this dimension; they correspond to progressively more complex and sophisticated control schemes. The first two levels cover parallelization schemes that rely on synchronized communications, i.e., where all processes have to stop and engage in some form of communication and information exchange at set moments (number of iterations, time intervals, specified algorithmic stages, etc.) that are exogenously determined, being either hard-coded or determined by a control (master) process. The first level (Rigid Synchronization (RS)) refers to situations where little, if any, information exchange takes place between processes that are at the same level of the communication hierarchy. This would typically be the case in classical 1-control master-slave schemes in which the master process dispatches fixed computing-intensive tasks to the slave processes, waits for all these tasks to be completed, and then proceeds with the remainder of the search (calling again upon the slave processes whenever needed). Another example of rigid synchronization, but with a p-control cardinality, is that of the direct parallelization of independent search processes. In such a case, each individual process executes its own search without communicating with the others, except at the very end, when the best solutions found by each process are compared to determine the best overall. The second level along this control-type dimension (Knowledge Synchronization (KS)) corresponds to more sophisticated synchronous parallelization schemes. In 1-control settings, slaves thus perform on their own more complex tasks, such as executing a limited number of TS steps on a subset of the search space. In a p-control environment, the knowledge synchronization mode covers situations where processes follow independent search trajectories for some time but stop at predetermined moments (e.g., after performing a set number of iterations) to engage in intensive communication to share information between themselves.
The third level along the control-type dimension (Collegial (C)) covers situations where asynchronous communications are used; it only makes sense in a p-control context. In these situations, each search process explores the search space (or sometimes, part of it) according to its own logic, storing and processing local information as it goes
along. It also communicates with the other processes (all or just a subset of them) or with a central memory at times dictated by the results of its own search (albeit in the context of the overall global search): for instance, if a process finds a globally improving solution, it might broadcast it to all other processes or deposit it in the central memory; if it finds that its own search is stagnating, it may request a new solution from the other processes or from the central memory. The fourth and final level on the control-type dimension is called Knowledge Collegial (KC). It refers to more advanced asynchronous communication schemes in which the contents of communications are analyzed to infer additional information concerning the global search pattern performed so far and/or the characteristics of good solutions. This may be implemented using global memory structures that can be accessed by the processes while they conduct their own search. The main difference between the collegial and the knowledge collegial organizations is that in the former the information recovered by a process from another is identical to the information sent by that process, while in the latter case the information received is richer, thus helping to build a picture of the overall dynamics of the asynchronous exploration of the search space.

The third dimension of the taxonomy pertains to Search differentiation: do search threads start from the same or from different initial solutions? Do they use the same or different search strategies (parameter settings, memory management rules, neighborhood operators, etc.)? These two questions lead to a four-way classification along this dimension: Same initial Point, Same search Strategy (SPSS); Same initial Point, Different search Strategies (SPDS); Multiple initial Points, Same search Strategy (MPSS); Multiple initial Points, Different search Strategies (MPDS). It should be noted that this dimension had, in fact, been introduced earlier by Voß in his own attempt to classify parallelization schemes for TS [89]. It should be pointed out that, although it was originally developed for TS, this taxonomy could apply equally well to several other classes of metaheuristics and thus possibly constitutes a valuable basis for a comprehensive taxonomy of parallel metaheuristics.
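For readers who find a compact notation helpful, the three dimensions of the taxonomy can be written down explicitly as follows; this is only an illustrative sketch (the Python names are ours), with two classical strategies mentioned in this chapter classified as examples.

```python
from dataclasses import dataclass
from enum import Enum

class ControlCardinality(Enum):
    ONE_C = "1C"   # the search is controlled by a single (master) process
    P_C = "pC"     # several collegial processes, each in charge of its own search

class ControlType(Enum):
    RS = "Rigid Synchronization"
    KS = "Knowledge Synchronization"
    C = "Collegial"
    KC = "Knowledge Collegial"

class SearchDifferentiation(Enum):
    SPSS = "Same initial Point, Same search Strategy"
    SPDS = "Same initial Point, Different search Strategies"
    MPSS = "Multiple initial Points, Same search Strategy"
    MPDS = "Multiple initial Points, Different search Strategies"

@dataclass
class ParallelTSStrategy:
    name: str
    cardinality: ControlCardinality
    control: ControlType
    differentiation: SearchDifferentiation

# Two classical strategies discussed in this chapter, expressed in the taxonomy:
master_slave_evaluation = ParallelTSStrategy(
    "master-slave neighborhood evaluation",
    ControlCardinality.ONE_C, ControlType.RS, SearchDifferentiation.SPSS)
independent_multi_search = ParallelTSStrategy(
    "independent multi-search",
    ControlCardinality.P_C, ControlType.RS, SearchDifferentiation.MPSS)
```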
13.3.2 More on Cooperative Search
As we shall see in the next section, the trend in parallel TS has been to move from the low-level parallelism (e.g., the 1C/RS methods) of the early implementations toward more and more complex high-level parallelism schemes. In fact, most recent parallel TS heuristics implement some form of cooperative search. While cooperation seems to offer the most promising avenue for superior performance, it also involves the greatest challenges in terms of algorithm design, and it is worth discussing somewhat further before moving on to the literature review. Cooperative multithread TS methods launch several independent search threads (each defining a trajectory in the search space) and implement information exchange mechanisms among these threads. The key challenge in such a context is to ensure that meaningful information is exchanged in a timely manner between the threads, to allow the global parallel search to achieve a better performance than the simple
concatenation of the results of the individual threads, where performance is measured in terms of computing time and solution quality [5, 24]. Toulouse, Crainic, and Gendreau [81] have proposed a list of fundamental issues to be addressed when designing cooperative parallel strategies for metaheuristics:

- What information is exchanged?
- Between what processes is it exchanged?
- When is information exchanged?
- How is it exchanged?
- How are the imported data used?

It is not our intention to address these issues at this point, but it is useful to bear them in mind as we survey the literature in the next section.
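These design questions can be made concrete by recording the answers chosen by a given implementation as an explicit cooperation policy; the sketch below (the field names are ours) fills them in for the simple broadcast-and-replace scheme discussed later in Section 13.4.3.

```python
from dataclasses import dataclass

@dataclass
class CooperationPolicy:
    what: str      # What information is exchanged?
    who: str       # Between what processes is it exchanged?
    when: str      # When is information exchanged?
    how: str       # How is it exchanged?
    usage: str     # How are the imported data used?

broadcast_and_replace = CooperationPolicy(
    what="the new globally best solution",
    who="the improving thread and all other threads",
    when="whenever a thread improves the global best",
    how="asynchronous broadcast, with no synchronization barrier",
    usage="each receiving thread restarts its search from the imported solution",
)
```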
13.4 LITERATURE REVIEW

This literature review is divided into four sections that cover, respectively, 1-control heuristics, p-control synchronized methods, asynchronous search approaches, and hybrid metaheuristics involving parallel TS. This division also roughly corresponds to a chronological arrangement of the literature, starting with the earliest efforts and finishing with the most recent ones.
13.4.1 1-Control Parallel Heuristics

As we already mentioned earlier in this chapter, early parallel implementations of TS were based on the classical master-slave approach and aimed solely at accelerating the search, thus lowering computing times. This allowed researchers to tackle more effectively difficult problems such as the quadratic assignment problem (QAP) [16, 18, 19, 73, 75], the traveling salesman problem (TSP) [17], vehicle routing problems (VRP) [33], and the task scheduling problem on heterogeneous systems [65, 66, 67]. As explained in the previous section, in this type of implementation, a “master” process executes a regular sequential TS procedure but dispatches computing-intensive tasks to be executed in parallel by “slave” processes. The master receives and processes the information resulting from the slaves’ computations, selects and implements moves, updates the search memories, and makes all decisions pertaining to the activation of search strategies (e.g., deciding when to perform intensification or diversification) and to the termination of the search. The search step usually parallelized and assigned to slave processes is the neighborhood evaluation. At each iteration, the moves that make up the neighborhood of the current solution are partitioned into as many sets as the number of available slave processors, and the evaluation is carried out in parallel by slave processes.
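A minimal sketch of this master-slave neighborhood evaluation is given below; it assumes the same generic, problem-specific `evaluate` and `neighborhood` placeholders as the sketch in Section 13.2 and uses a Python multiprocessing pool to play the role of the slave processes (the arguments must be picklable for this to run, and a real implementation would also let the master screen tabu moves before accepting the returned candidate).

```python
from multiprocessing import Pool

def evaluate_chunk(args):
    """Slave task: evaluate one subset of the neighborhood and return its best candidate."""
    evaluate, moves_and_neighbors = args
    return min(((evaluate(n), m, n) for m, n in moves_and_neighbors),
               key=lambda t: t[0], default=None)

def parallel_best_neighbor(current, evaluate, neighborhood, pool, n_slaves):
    """One 1C/RS/SPSS step: the master partitions the neighborhood of the current
    solution among the slaves, each slave returns its best candidate, and the master
    selects the overall best (tabu status and memory updates stay with the master)."""
    candidates = list(neighborhood(current))                  # (move, neighbor) pairs
    chunks = [candidates[i::n_slaves] for i in range(n_slaves)]
    results = pool.map(evaluate_chunk, [(evaluate, chunk) for chunk in chunks])
    results = [r for r in results if r is not None]
    return min(results, key=lambda t: t[0]) if results else None  # (value, move, neighbor)
```

Because every slave evaluates a disjoint share of the same neighborhood, the search trajectory is identical to that of the sequential method; only the wallclock time per iteration changes.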
This 1C/RS/SPSS strategy proved quite successful for problems that display large neighborhoods and relatively low computing requirements to evaluate and perform a given move, such as the ones listed above. In implementations with a relatively small number of processors, near-linear speedups were reported for the same quality of solutions. This approach also permitted, at the time, to improve the best-known solutions for several problem instances proposed in the literature. In fact, the approach has not been totally abandoned: in 1999, Randall and Abramson [68] proposed a general framework for applying it to a variety of problems, and recently Blazewicz, Moret-Salvador, and Walkowiak [8] used it to tackle two-dimensional cutting problems.

In 1995, Crainic, Toulouse, and Gendreau [25] carried out a comparative study of several synchronous TS parallelizations for the location-allocation problem with balancing requirements. Apart from a straightforward 1C/RS/SPSS approach and some p-control ones, they also implemented a 1C/KS/SPSS heuristic based on the sequential fan candidate list strategy, also known as the look-ahead or probing approach [53, 51]. In this approach, slave processes perform a small number of (look-ahead) TS iterations before synchronization, and the selection of the best neighboring solution from which the next iteration is initiated is based on the value of the objective after the look-ahead iterations. Both the 1C/KS/SPSS and the 1C/RS/SPSS heuristics yielded better solutions than sequential TS on the tested instances, with the KS one being consistently superior to the RS one. To the best of our knowledge, this paper is the only one ever to report results on a parallel implementation of the sequential fan candidate list strategy.

Another major parallelization strategy that has been implemented using 1-control schemes is search space decomposition. The basic idea behind this approach is to divide the search space of the problem into several (usually disjoint, but not necessarily exhaustive) sets and to run TS (or any other heuristic or metaheuristic) on each subset, thus accelerating the global search. The approach can be implemented in two fairly different fashions: in the first, all search threads consider complete solutions to the problem, while in the second, they handle partial ones, in which case a complete solution has to be reconstructed at some point. It must be stressed, however, that in both cases each search process has access only to a restricted portion of the search space. Furthermore, the decomposition of the search space is often non-exhaustive, i.e., the union of the search space subsets considered by the slave processes at a given point in time may be significantly smaller than the complete search space. Therefore, to increase the thoroughness of the search and allow all potential solutions to be examined, the decomposition is modified at regular intervals and the search is then restarted using this new decomposition. This strategy is naturally implemented using 1C/KS master-slave schemes (with either an MPSS or MPDS search differentiation strategy): the master process determines the partition, synchronizes slave processes, reconstructs solutions (if required), and determines stopping conditions, while slave processes perform the search on their assigned search space subset.
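The decomposition loop just described can be sketched as follows, under the assumption of hypothetical problem-specific helpers `partition`, `solve_subset` (for instance, a tabu search restricted to one subset), and `reconstruct`; none of these names come from a specific paper, and the sketch only illustrates the 1C/KS control flow.

```python
def decomposition_search(problem, partition, solve_subset, reconstruct,
                         n_subsets, n_rounds, pool):
    """1C/KS search-space decomposition (a sketch): the master repeatedly partitions
    the search space, slaves search their assigned subsets in parallel, and the master
    reconstructs a complete solution before re-partitioning and restarting."""
    best_solution, best_value = None, float("inf")
    for round_id in range(n_rounds):
        # A new (possibly different) decomposition each round, so that solutions
        # unreachable under one partition can still be reached under a later one.
        subsets = partition(problem, n_subsets, seed=round_id)
        partial_solutions = pool.map(solve_subset, subsets)
        solution, value = reconstruct(problem, partial_solutions)
        if value < best_value:
            best_solution, best_value = solution, value
    return best_solution, best_value
```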
This approach has proved quite successful for problems for which a large number of iterations can be performed in a relatively short time and restarting the method with a new decomposition does not require an unreasonable computational
effort (see, e.g., Fiechter [31] for the TSP and Laganiere and Mitiche [57] for image filtering). An extension of this strategy was used by Gendreau, Laporte, and Semet [43] to solve efficiently in real time several variants of the same problem instance; their method is described in detail in Section 13.5.1.

13.4.2 p-Control Synchronized Parallel Heuristics

Independent multi-searches were also among the earliest parallel TS strategies implemented. Most implementations launch several independent search processes from different, often randomly generated, initial solutions. No communications take place between the multiple search threads running in parallel, except once all processes have stopped, when the best overall solution is identified. As mentioned in Subsection 13.3.1, these approaches clearly belong to the pC/RS class of the taxonomy. Note, however, that, in most implementations, a designated processor verifies that the others have completed their search and collects the information. While these procedures, like 1C/RS ones, essentially amount to repeated sequential TS heuristics, they turn out to be effective, simply because of the sheer quantity of computing power they allow one to apply to a given problem. This was indeed established empirically by several papers, including those of Battiti and Tecchiolli [7] for the QAP and Taillard [76] for job shop scheduling problems, in which excellent results were obtained when compared to the best existing heuristics at the time. This parallelization of the classic sequential multi-start heuristic is also very easy to implement and remains popular for this reason [10, 79].

As was mentioned in the taxonomy, pC/KS strategies attempt to take advantage of the parallel exploration of the search space by synchronizing search processes at predetermined intervals (a minimal sketch of this pattern is given at the end of this subsection). They are also generally implemented in a master-slave configuration, in which the designated master process collects information from the other processes at synchronization instants and usually restarts the search from the best solution found so far (see Malek et al. [60] for the TSP, and Rego and Roucairol [70] and Rego [69] for the VRP using ejection chains). De Falco et al. [30] and De Falco, Del Balio, and Tarantino [29] attempted to overcome the limitations of the master-slave setting by allowing processes, when they terminate their local search phase, to synchronize and exchange information (best solutions) with processes running on neighboring processors. A more sophisticated pC/KS approach was proposed in 1997 by Niar and Freville [61]. In this pC/KS/MPDS scheme, a master process controls p slave processors executing synchronous parallel TS threads by dynamically adjusting their respective search strategy parameters according to the results they have obtained so far. Computational results reported for the 0-1 Multidimensional Knapsack Problem show that this dynamic adjustment of search parameters is indeed beneficial.

Authors generally report good performance and results, but synchronous cooperative implementations tend to show a poorer performance when compared to asynchronous and even independent searches (see Crainic, Toulouse, and Gendreau [25, 26, 27]), especially when synchronization points are predetermined (as in most existing implementations). This is mainly due to the large computation overheads
that must be incurred at synchronization instants, when all processes need to wait for the slowest of them. Furthermore, the predefinition of synchronization points makes these strategies less reactive to the progress of the search on any given problem instance than asynchronous approaches. This being said, synchronous cooperative implementations are quite simple to implement and they have yielded good results on several problems.

As pointed out by Crainic [20], there are two other issues that merit further discussion in connection with these pC/KS strategies. The first has to do with the way in which memories are handled. In the implementations reported in the literature, memories are emptied at synchronization instants before restarting the search. Considering the central role played by memories in TS, one may wonder whether any precious information is lost when doing so. It might thus be interesting to conduct an investigation to determine whether any useful information could be passed from one search phase to the next, how it could be used, and how it might impact the overall performance of these methods. The second issue concerns specifically pC/KS/SPDS strategies. It has to do with the fact that in those strategies the new searches that are launched after synchronization usually all restart from the same best known solution, thus concentrating the search in the same region of the search space. It is well known, however, that the main weakness of TS is its tendency to explore a too limited region of the search space, i.e., the search lacks breadth, unless systematic and effective diversification schemes are used. pC/KS/SPDS strategies may therefore end up exaggerating this weakness. Because they use different solutions to restart the search, pC/KS/MPSS and pC/KS/MPDS strategies may be less prone to this problem, but it is not obvious that they do not also suffer from it in an attenuated fashion. This issue would certainly be worth investigating too.

Search space decomposition (see Section 13.4.1) may also be implemented in a pC/KS framework, as in Taillard’s early TS heuristic for the VRP [74, 75]. In this implementation, customers are partitioned on a geographical basis and vehicles are allocated to the resulting regions to create smaller VRP instances. Each subproblem is then solved by an independent TS procedure. These independent search processes stop after a number of iterations that varies according to the total number of iterations already performed. The partition is then modified by an information exchange phase, during which tours, undelivered cities, and empty vehicles are exchanged between processes handling adjacent regions. Taillard’s results at the time were excellent, but his approach did require considerable computing time (in fact, he did not even report computing times, but these are known to be quite substantial). This is not surprising considering the fact that, as the other pC/KS strategies described above, it had to incur the inherent overhead stemming from synchronization.
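As announced above, here is a minimal sketch of the pC/KS pattern: several searches run independently between predetermined synchronization points and then all restart from the best solution found so far. The `run_tabu_segment` callable is a hypothetical placeholder that performs a fixed number of TS iterations and returns a (solution, value) pair; the restart rule is the generic one described in the text, not any specific published scheme.

```python
def pc_ks_search(initial_solutions, run_tabu_segment, n_phases, pool):
    """pC/KS synchronized multi-search (a sketch): each process runs its own tabu
    search segment independently, then all processes synchronize and restart from
    the best solution found so far (as in the implementations discussed above,
    short-term memories are re-initialized at every synchronization point)."""
    currents = list(initial_solutions)
    best_solution, best_value = None, float("inf")
    for phase in range(n_phases):
        results = pool.map(run_tabu_segment, currents)   # synchronization point
        for solution, value in results:
            if value < best_value:
                best_solution, best_value = solution, value
        currents = [best_solution] * len(currents)       # restart every thread from the best
    return best_solution, best_value
```

With a single phase and no restart, this degenerates into the independent multi-search (pC/RS) strategy discussed at the beginning of this subsection.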
13.4.3 Asynchronous Methods

Historically, independent and synchronous cooperative methods were the first multithread search approaches to be developed. However, because of the shortcomings of these methods, which we have discussed at length in the previous sections, researchers have increasingly turned their attention to asynchronous procedures, which
now largely define the “state-of-the-art” in parallel multithread search. These asynchronous procedures all follow the same general pattern: starting from possibly different initial solutions and using possibly different tabu (or other) search strategies, p threads explore simultaneously the search space. As indicated in Section 13.3.1, they belong either to the pC/C or to the pC/KC class of the taxonomy, the main difference between the two being whether or not any “new” knowledge is inferred on the basis of the information exchanged between the search threads.

A key issue in the development of asynchronous procedures is the definition of effective mechanisms to allow the asynchronous exchange of information between the search threads. The simplest scheme for achieving this is simply to have threads engage in communication when some triggering events occur, such as the discovery of a globally improving solution. One may implement, for instance, a “broadcast and replace” strategy: when a search thread improves the global best solution, this solution is broadcast to all other processes, which then stop their own exploration and restart it from this new solution (alternatively, processes might broadcast every locally improving solution, but this clearly increases significantly the communication overhead). This type of approach was applied successfully to complex vehicle routing problems by Attanasio et al. [2] and Caricato et al. [14]. Crainic [20] observes that these methods, in which the “local” exploration phase of processes can be interrupted, do not always yield good results. In fact, as he correctly points out, cooperative metaheuristics with unrestricted access to shared knowledge may experience serious premature “convergence”, especially when the shared knowledge reduces to one solution only (the overall best or the new best from a given thread): eventually, all threads end up exploring the same restricted region of the search space, thus forfeiting one of the main potentials of parallel cooperative search, that is, search breadth. For a more detailed discussion of these issues, see [84, 82, 83, 85].

Most asynchronous implementations of parallel TS do, however, handle information exchange rather differently in order to avoid the pitfall mentioned above. In these approaches, communications are controlled, i.e., they occur rather infrequently and only at well-specified stages of the exploration conducted by the individual threads. Exchanges of information take place through some form of memory (or blackboard) that is used to store various information on solutions and/or solution components. In most cases, the main information stored in the memory is a list of best known or elite solutions. The memory is then often referred to as the central memory, the solution pool, the solution warehouse or, even, the reference set. In other cases, only partial solutions are recorded and the memory is then referred to as the adaptive memory. This terminology was coined in 1995 by Rochat and Taillard in a seminal paper [71], in which they proposed (sequential) TS heuristics for the classical Vehicle Routing Problem and the Vehicle Routing Problem with Time Windows (VRPTW) that are still among the most effective ones for both problems. The main idea in the adaptive memory approach is to record in a structure the individual components (in routing problems, the vehicle routes) making up elite solutions as they are found.
These components are kept sorted in the adaptive memory with respect to the objective function value of the solution to which they belong. When the search stagnates and needs to
be restarted from a new solution, this solution is constructed by combining randomly selected routes from the adaptive memory. In almost all cases, the new solution will be made up of routes from different elite solutions (in what could be interpreted, in genetic algorithm terminology, as a multiparent crossover operation), thus inducing a powerful diversification effect. For more on adaptive memory concepts, see Glover [48] and Taillard et al. [78].

The adaptive memory approach is eminently amenable to parallelization, since the search threads can all feed a single, common adaptive memory from which new solutions that naturally combine information from different search threads can be constructed. It has been applied very successfully to the VRPTW by Badeau et al. [3] and to real-time vehicle routing and dispatching by Gendreau et al. [40]; this latter application is described in more detail in the next section. A similar approach was used with good results by Schulze and Fahle [72] to solve the VRPTW: all the routes generated by the TS threads are collected in a pool, but they are recombined by solving a set-covering heuristic whenever a new solution is needed.

Badeau et al. [3] also report a number of other interesting findings. First, the performance of their method with respect to the quality of the solution is almost independent of the number of search processes (as long as this number remains within reasonable bounds) for a fixed computational effort (measured in terms of the overall number of calls to the adaptive memory by all search threads). Second, while traditional parallelization schemes rely on a one-to-one relationship between actual processors and search processes, it turned out that their method ran significantly faster when using more search processes than the number of available processors, because this made it possible to overcome the bottlenecks created when several threads were trying to access simultaneously the processor on which the adaptive memory was located. Furthermore, computational evidence showed that it is not, in general, a good idea to run a search thread concurrently with the adaptive memory management procedure on the same processor. These are lessons that should be kept in mind when developing asynchronous multithread procedures, whether they use adaptive memory or not.

To the best of our knowledge, Crainic, Toulouse, and Gendreau were the first, in 1995, to propose a central memory approach for asynchronous TS in their heuristics for multicommodity location with balancing requirements [26]. In their method, individual TS threads record their current solution into the central memory whenever it improves on their local best solution (i.e., the best solution found up to that point by the same thread), but they only import a solution from the central memory when they are about to undertake a diversification phase. If the imported solution is better than the current local best one, it replaces it. Diversification then proceeds from the (possibly modified) local best solution. It is important to note that the memories local to search threads are never re-initialized. Five strategies for retrieving a solution from the pool when requested by an individual thread were tested. When few (four) processors were used, the strategy that returns the overall best solution produced the best results. When the number of processors was increased, the best performance was achieved by a probabilistic procedure that selects solutions on the basis of their rank in the pool.
The parallel procedure improves the quality of the solution and also requires
less (wallclock) computing time compared to the sequential version, particularly for large problems with many commodities. The same approach was applied to the fixed-cost, capacitated, multicommodity network design problem with similar results [22]. Over the last few years, several other authors have implemented fairly similar approaches for a variety of problems, including the partitioning of integrated circuits for logical testing [1], two-dimensional cutting [8], the loading of containers [11], and labor-constrained scheduling [15].

Another attempt to overcome the problems stemming from the uncontrolled sharing of information is the so-called multi-level cooperative search proposed by Toulouse, Thulasiraman, and Glover [86, 87]. This approach is based on the principle of the controlled diffusion of information, which is achieved by having search processes work at different aggregation levels of the original problem and by allowing communications only between processes at immediately adjacent aggregation levels. These communications consist in asynchronous exchanges of improving solutions at various moments dynamically determined by each process according to its own logic, status, and search history. Communications are further limited by the fact that an imported solution will not be transmitted further until a number of iterations have been performed. The approach has proved very successful for graph partitioning problems [62, 63].
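The control logic shared by these central memory approaches can be sketched as follows. This is an illustrative skeleton only: it uses threads and a lock-protected in-process pool for brevity, whereas the implementations cited above typically exchange information between separate processes, and the deposit and fetch rules are simplified versions of the strategies described in the text (such as rank-biased retrieval); `run_tabu_segment` is again a hypothetical placeholder.

```python
import heapq
import random
import threading

class CentralMemory:
    """Shared pool of elite solutions (the 'central memory' of asynchronous pC/C schemes)."""
    def __init__(self, capacity=20):
        self._pool = []              # heap of (-value, tiebreak, solution); worst elite on top
        self._capacity = capacity
        self._lock = threading.Lock()

    def deposit(self, solution, value):
        with self._lock:
            heapq.heappush(self._pool, (-value, random.random(), solution))
            if len(self._pool) > self._capacity:
                heapq.heappop(self._pool)               # drop the worst elite solution

    def fetch(self):
        """Return an elite solution, better solutions being more likely (rank-based bias)."""
        with self._lock:
            ranked = sorted(self._pool, reverse=True)   # best (lowest value) first
        if not ranked:
            return None
        weights = [len(ranked) - i for i in range(len(ranked))]
        _, _, solution = random.choices(ranked, weights=weights, k=1)[0]
        return solution

def cooperative_thread(memory, run_tabu_segment, initial, n_cycles):
    """One asynchronous search thread: it deposits its improving solutions into the
    central memory and imports an elite solution only when its own search stagnates
    and it is about to diversify."""
    current = initial
    local_best_value = float("inf")
    for _ in range(n_cycles):
        solution, value = run_tabu_segment(current)
        if value < local_best_value:                    # locally improving solution
            local_best_value = value
            current = solution
            memory.deposit(solution, value)             # share it asynchronously
        else:                                           # stagnation: diversify
            imported = memory.fetch()
            current = imported if imported is not None else solution
    return local_best_value
```

Because communications are infrequent and the pool holds many elite solutions rather than a single global best, this scheme avoids the premature convergence of the broadcast-and-replace strategy discussed earlier.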
13.4.4 Hybrids Involving Parallel Tabu Search

In his 2001 review of recent advances in TS, Gendreau [37] mentioned that hybrid metaheuristics, which combine principles from two or more families of metaheuristics, were probably one of the most exciting developments in the field. The main motivation for developing such hybrid methods is the hope that they will display the desirable properties of all the original “pure” methods that they draw upon. For instance, one may replace the mutation operator of a Genetic Algorithm (GA) by a TS procedure, thus creating a GA-TS hybrid, with the objective of achieving simultaneously search aggressiveness (the TS component) and breadth (the GA component) in the exploration of the search space. Hybrids involving parallel TS and other metaheuristics are not as recent as our earlier statement may suggest. In fact, a method combining the principles of TS and of Hopfield-Tank neural networks to solve simple plant location problems on massively parallel architectures was proposed by Vaithyanathan, Burke, and Magent [88] as early as 1996, and there may have been earlier parallel TS hybrids.

Broadly speaking, hybrid metaheuristics can be divided into three main classes. The first class is made up of methods that sequentially apply metaheuristics of different families to a given problem. The two-phase approach of Gehring and Homberger for the VRPTW [34, 35, 36] is a typical example of such a method: it first applies an evolution strategy to reduce the number of vehicles required by a solution and then TS to minimize total travel distance. The parallelization is a multithread cooperative one in which each process executes the two-phase heuristic (possibly with different parameters) and information exchanges take place according to the asynchronous central memory strategy of the previous section. Bastos and Ribeiro
[6] describe a somewhat different two-phase hybrid for the Steiner problem: in their approach, a parallel multithread reactive TS phase (using again the asynchronous central memory strategy) is followed by a distributed Path Relinking (PR) [49, 51, 52] phase, i.e., all processes switch from TS to PR simultaneously.

In the second class of hybrid metaheuristics, the algorithmic elements of the original methods are assembled into a single (monolithic), complex algorithmic design, as in the example provided at the beginning of this section. An illustration of this class of methods is provided by the TS-PR hybrid of Gallego, Romero, and Monticelli [32] for transmission network expansion planning. In this heuristic, the diversification step of a parallel multithread TS is implemented, in some situations, by applying PR instead of rather simple modifications of solutions. Other hybrids of this class include those of Talbi et al. [80], which applies Simulated Annealing (SA) principles in the intensification step of a multi-start TS procedure for the QAP, of Jozefowiez, Semet, and Talbi [56], which integrates evolutionary algorithms and parallel TS to solve multi-objective vehicle routing problems, and of Baños et al. [4], which combines TS, SA, and multi-level cooperative search to tackle graph partitioning problems.

The third class of hybrids exploits the hybridization concept differently by relying on parallel multiagent search architectures in which individual agents run “pure” methods but exchange information among themselves. Crainic and Gendreau [21] proposed such a hybrid search strategy by adding a GA thread to their asynchronous multithread TS heuristic for multicommodity location-allocation with balancing requirements [26]. This distinct GA thread is launched when a certain number of elite solutions have been recorded in the central memory of the parallel TS, using these solutions as its initial population. Asynchronous migration subsequently transfers the best solution of the genetic pool to the parallel TS central memory, as well as solutions of the central memory toward the genetic population. This strategy did perform well, especially on larger instances. An interesting observation of this study was that the best overall solution was never found by the GA thread, but its inclusion allowed the TS threads to find better solutions. It is hypothesized that this superior performance stemmed from a more effective diversification of the search. This scheme was recently further refined by Le Bouthillier and Crainic [59], who combined several different GA threads (i.e., with different parent selection and crossover operators) and TS threads in their hybrid metaheuristic for the VRPTW. In this implementation, there is one central memory that is common to all threads. Computational results show that, without any particular calibration, the parallel metaheuristic is competitive with the best metaheuristics available and demonstrates almost linear speedups.

The hybridization of parallel TS approaches with population-based methods such as GA, PR, and Scatter Search [49, 51, 52, 58] appears very promising, because it addresses what is probably the greatest weakness of TS, namely its tendency to remain confined in a too small area of the search space. The third class of hybrids seems particularly attractive with respect to the benefits it has to offer for a relatively low implementation effort.
13.5 TWO PARALLEL TABU SEARCH HEURISTICS FOR REAL-TIME FLEET MANAGEMENT

Real-time fleet management problems are found in numerous transportation and logistics applications. In the following, we describe two parallel TS heuristics for dispatching fleets of vehicles in the context of emergency services and courier services, respectively. Given that fast response times are required in these cases, parallel implementations are particularly indicated, since they allow more optimization work to be performed within the allotted time.
13.5.1 Real-Time Ambulance Relocation

In the work of Gendreau, Laporte, and Semet [43], a real-time redeployment problem for a fleet of ambulances is addressed. Basically, when a call is received, an ambulance is first assigned to it. This assignment is done through the application of relatively simple dispatching rules. Then, the remaining available ambulances can be relocated to other waiting sites to provide a better coverage of the demand. The latter problem is tackled with a TS heuristic that moves ambulances between potential waiting sites. The objective is to maximize the proportion of demand covered by at least two vehicles within a given (time) radius, minus relocation penalties. The problem-solving approach exploits a TS heuristic previously developed for a static ambulance location problem by the same authors [42]. As life-or-death dispatching and relocation decisions must be taken under considerable time pressure in the real-time context, a parallel implementation is proposed to speed up the decision process. Basically, the available time between two emergency calls is exploited to precompute possible scenarios. More precisely, for each site currently occupied by an available ambulance, a relocation plan is computed with TS by assuming that an ambulance from this site will be assigned to the next incoming call. When a new call occurs, an ambulance is assigned according to a given dispatching rule and the precomputed redeployment scenario associated with the site of this ambulance is then directly applied. If there is not enough time to compute a complete solution before the next call, then no redeployment takes place after the ambulance assignment.

The parallel algorithm is based on a pure master-slave scheme. The master manages global data structures with precalculated information on each ambulance and sends the relocation problems associated with each occupied site to the slaves. The CPU time allotted to each slave for solving a problem is strictly controlled by fixing the number of iterations in TS. This number is based on the frequency of calls (e.g., a low frequency implies that more CPU time can be allocated to the search). When every problem has been solved once, a new attempt to improve the solutions is performed using a larger number of iterations. This algorithm has been implemented on a network of SUN UltraSparc workstations, using simulated data based on real-life call distributions on the island of Montreal, Canada. The results that were obtained demonstrate the suitability of this
algorithm. In the simulations, every call was serviced within the required time range of 15 minutes and 98% of urgent calls were serviced within 7 minutes, with an average of 3.5 minutes (the current practice in Montreal requires that 90% of the urgent calls should be responded to within 7 minutes). Furthermore, in 95% of the cases, the algorithm succeeded in precomputing a complete redeployment strategy before the occurrence of the next call.

13.5.2 Real-Time Vehicle Routing and Dispatching for Courier Services

The problem considered by Gendreau et al. [40] is motivated by courier services, where customer requests for the transportation of small items (e.g., letters, small parcels) must be accommodated in real time and incorporated into the current planned routes of a fleet of vehicles. A planned route here corresponds to the sequence of requests that have been assigned to a given vehicle but have not been serviced yet. Due to the presence of soft time constraints for servicing a customer, the problem is modeled as an uncapacitated Vehicle Routing Problem with Soft Time Windows (VRPSTW). The objective function to be minimized relates to the total distance travelled (or total travel time) for servicing the customers plus penalties for lateness at customer locations. As in Section 13.5.1, the problem-solving approach exploits a TS heuristic previously developed for a static version of the problem, where all customer requests are known in advance [77]. Thus, a series of static problems are solved over time, based on the current planned routes. A new static problem is defined each time an input update occurs, due to the arrival of a new request, and TS is applied to this problem until the next input update. The general problem-solving framework, within which the TS heuristic is embedded, is described below. In this description, it is assumed that a certain number of “static” requests are known at the start of the day (those that have been received the previous day, but too late to be accommodated the same day). These requests are incorporated into initial planned routes that are used to start the search process.
** Initialization **
1. Generate different initial solutions with the static requests using a constructive (insertion) heuristic.
2. Apply the TS heuristic to these solutions and store the resulting routes in the adaptive memory.

** Search algorithm **
3. For a number of iterations do:
   3.1 Construct a starting solution by combining routes in the adaptive memory.
   3.2 For a number of iterations do:
       - Decompose the set of planned routes in the current solution into disjoint subsets of routes.
       - Apply TS to each subset of routes.
       - Merge the resulting routes to create the new current solution.
   3.3 Apply a post-optimization procedure to each individual route.
   3.4 Add the resulting routes to the adaptive memory (if the solution is good enough).

As we can see, an adaptive memory stores the best solutions found during the search. The routes in these solutions are then used to feed the TS with new starting points. The optimization is performed by the latter by moving customers between routes. It should be noted that search space decomposition takes place and that a distinct TS is applied to each subset of routes in the current decomposition. These subsets are then merged together to form a complete solution. After a number of decompositions, each individual route in the final solution is further improved with a specialized heuristic for the Traveling Salesman Problem with Time Windows [41]. A two-level parallelization scheme is proposed to implement this problem-solving framework.
1. The so-called "search algorithm" is launched in parallel on multiple processors, thus implementing a multithread search. At this upper level, we have asynchronous cooperative threads that communicate through a common adaptive memory, as they all feed and fetch solutions from this memory. In the taxonomy, this approach corresponds to a pC/KC/MPSS parallelization scheme.

2. Within each search thread, search space decomposition is realized in parallel through a master-slave implementation, where each slave runs a TS on a different subset of routes. After a given number of search iterations, each slave returns its best (partial) solution. The partial solutions are then merged together to obtain a complete solution. At the end, the best solution found is sent for possible inclusion in the adaptive memory. We thus have a 1C/KS/MPSS parallelization scheme at this level.

This algorithm has been run in a coarse-grained parallel environment, namely a network of SUN Sparc workstations. The results on simulated data have shown that the TS-based optimization procedure provides substantial benefits over simpler dispatching approaches. Through an appropriate adaptation of the basic neighborhood operator that moves customers between routes, it is possible to exploit this framework to solve problems where a customer request involves either a single location or both a pick-up and a delivery location [39]. In the work reported above, the current destination of each vehicle was not included in the planned routes and could not be moved by the neighborhood operator. Hence, a vehicle in movement had to reach its planned destination. In a more recent development, Ichoua, Gendreau, and Potvin [55] included the current destination of each vehicle in planned routes. Upon request arrival, it is thus possible for TS to
insert the new request between the current position of a given vehicle and its planned destination, thus redirecting (or diverting) the vehicle to serve the new request. As the current destination of every vehicle can now be freely moved around by TS, the solution obtained can also modify the current destination of other vehicles, if it leads to a better solution. The proposed algorithm thus implements a generalized form of diversion that might involve more than one vehicle. As vehicles are moving fast, exploiting redirection opportunities when new requests occur must be realized under considerable time pressure. A parallel implementation is thus particularly appropriate in this context.

The interested reader will find in Attanasio et al. [2] the description of another parallel TS heuristic for a real-time vehicle routing and dispatching problem. The algorithm is developed in the context of a dial-a-ride system where people are transported. Different issues related to real-time vehicle routing and parallel computing can also be found in the recent paper of Ghiani, Guerriero, and Laporte [44].
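To make the time-constrained precomputation idea of Section 13.5.1 more concrete, the C sketch below (a hypothetical illustration, not code from [43]) launches one POSIX thread per occupied waiting site, each running a search limited to a fixed iteration budget; the tabu search core is replaced by a random placeholder, and all names and constants are invented for the example. When the next call arrives, the plan attached to the dispatched site is simply looked up.

    /* sketch: one thread per occupied waiting site precomputes a relocation
       plan under a fixed iteration budget; the tabu search itself is a stub */
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    #define SITES 6
    #define ITERATION_BUDGET 200   /* would be derived from the expected call frequency */

    typedef struct { int site; double plan_cost; } plan_t;
    static plan_t plans[SITES];

    /* placeholder for a budgeted tabu search on one relocation subproblem */
    static void *precompute_plan(void *arg) {
        plan_t *p = (plan_t *)arg;
        unsigned seed = 17u * (unsigned)p->site + 1u;
        double best = 1e30;
        for (int it = 0; it < ITERATION_BUDGET; it++) {
            double cost = (double)(rand_r(&seed) % 1000);  /* stand-in objective */
            if (cost < best) best = cost;
        }
        p->plan_cost = best;
        return NULL;
    }

    int main(void) {
        pthread_t t[SITES];
        for (int s = 0; s < SITES; s++) {
            plans[s].site = s;
            pthread_create(&t[s], NULL, precompute_plan, &plans[s]);
        }
        for (int s = 0; s < SITES; s++) pthread_join(t[s], NULL);

        int dispatched_site = 2;  /* site of the ambulance assigned to the next call */
        printf("apply precomputed plan for site %d (cost %g)\n",
               dispatched_site, plans[dispatched_site].plan_cost);
        return 0;
    }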
13.6 PERSPECTIVES AND RESEARCH DIRECTIONS

Parallel TS has come a long way since its early beginnings 15 years ago, when parallel implementations of TS almost exclusively focused on the parallelization of the neighborhood evaluation step. The main trend is now clearly toward asynchronous cooperative multithread methods and hybrids, which attempt to bring to bear on a given problem all the algorithmic machinery that is at hand. Throughout this evolution, our understanding of the key features that make a particular implementation successful has significantly deepened (for instance, most experienced researchers in the area are now keenly aware of the efficiency loss inherent in synchronous search models), but a lot of research is still required in order to fully understand the subtle interactions that occur when using complex cooperative strategies, whether in a "pure" TS scheme or in a hybrid one.

One step in that direction would be the definition and collection of relevant performance measures and statistics. Such measures are acutely needed to track down what really takes place in complex cooperative schemes. Furthermore, if used at execution time, they could help improve their efficiency. For instance, unproductive search threads could be identified and then terminated or redirected. Statistics could also be attached to the elite solutions in the pool (e.g., measures related to the quality of the solutions visited by search threads starting from these elite solutions) to better target the most promising regions of the search space.

Apart from the questions related to the design of effective search strategies, the evolution toward complex parallel TS schemes has created formidable challenges with respect to the actual implementation of the methods. To be quite honest, implementing from scratch any of the heuristics described in the latter parts of our survey requires considerable programming skills. Fortunately, over the last few years, one has witnessed the emergence of dedicated environments for the development of parallel TS algorithms and hybrids (e.g., [9, 13]). These environments provide skeletons (templates) or frameworks that one may instantiate to obtain an algorithm
for tackling a specific problem with a given search strategy. In all cases, a strict separation between the generic and the problem-specific parts of the algorithm is enforced. We believe that these environments are an essential step in the further development of parallel TS approaches and we expect to see more of them being proposed in the coming years. As a final remark, we would like to recall that parallel TS has proved over time to be an extremely effective metaheuristic for tackling a large variety of very difficult combinatorial optimization problems, especially in a real-time context. With the exciting recent developments in the field, it should remain so for many years.
Acknowledgments

Funding for this project has been provided by the Natural Sciences and Engineering Research Council of Canada and by the Fonds FQRNT of the Province of Quebec.
REFERENCES

1. R. M. Aiex, S. L. Martins, C. C. Ribeiro, and N. R. Rodriguez. Cooperative Multi-Thread Parallel Tabu Search with an Application to Circuit Partitioning. In Proceedings of IRREGULAR'98 - 5th International Symposium on Solving Irregularly Structured Problems in Parallel, Lecture Notes in Computer Science, volume 1457, pages 310-331, 1998. Springer-Verlag.
2. A. Attanasio, J.-F. Cordeau, G. Ghiani, and G. Laporte. Parallel Tabu Search Heuristics for the Dynamic Multi-Vehicle Dial-a-Ride Problem. Parallel Computing, 30:377-387, 2004.
3. P. Badeau, F. Guertin, M. Gendreau, J.-Y. Potvin, and E. D. Taillard. A Parallel Tabu Search Heuristic for the Vehicle Routing Problem with Time Windows. Transportation Research C: Emerging Technologies, 5(2):109-122, 1997.
4. R. Baños, C. Gil, J. Ortega, and F. G. Montoya. Cooperative Parallel Tabu Search for Capacitated Network Design. Journal of Heuristics, 10(3):315-336, 2004.
5. R. S. Barr and B. L. Hickman. Reporting Computational Experiments with Parallel Algorithms: Issues, Measures, and Experts' Opinions. ORSA Journal on Computing, 5(1):2-18, 1993.
6. M. P. Bastos and C. C. Ribeiro. Reactive tabu search with path-relinking for the Steiner problem in graphs. In C. C. Ribeiro and P. Hansen, editors, Essays and Surveys in Metaheuristics, pages 39-58, 2001. Kluwer Academic Publishers.
7. R. Battiti and G. Tecchiolli. Parallel Biased Search for Combinatorial Optimization: Genetic Algorithms and TABU. Microprocessors and Microsystems, 16(7):351-367, 1992.
8. J. Blazewicz, A. Moret-Salvador, and R. Walkowiak. Parallel tabu search approaches for two-dimensional cutting. Parallel Processing Letters, 14(1):23-32, 2004.
9. M. J. Blesa, L. Hernandez, and F. Xhafa. Parallel Skeletons for Tabu Search Method Based on Search Strategies and Neighborhood Partition. In R. Wyrzykowski, J. Dongarra, M. Paprzycki, and J. Waniewski, editors, Parallel Processing and Applied Mathematics: 4th International Conference (PPAM 2001), Lecture Notes in Computer Science, volume 2328, pages 185-193, Naleczow, Poland, 2002. Springer-Verlag.
10. S. Bock and O. Rosenberg. A New Parallel Breadth First Tabu Search Technique for Solving Production Planning Problems. International Transactions in Operational Research, 7(6):625-635, 2000.
11. A. Bortfeldt, H. Gehring, and D. Mack. A parallel tabu search algorithm for solving the container loading problem. Parallel Computing, 29:641-662, 2003.
12. W. Bozejko and M. Wodecki. Solving the flow shop problem by parallel tabu search. In Proceedings, International Conference on Parallel Computing in Electrical Engineering, PARELEC '02, pages 189-194, 2002.
13. S. Cahon, N. Melab, and E.-G. Talbi. ParadisEO: A Framework for the Reusable Design of Parallel and Distributed Metaheuristics. Journal of Heuristics, 10(3):357-380, 2004.
14. P. Caricato, G. Ghiani, A. Grieco, and E. Guerriero. Parallel tabu search for a pickup and delivery problem under track contention. Parallel Computing, 29:631-639, 2003.
15. C. C. B. Cavalcante, V. C. Cavalcante, C. C. Ribeiro, and C. C. de Souza. Parallel Cooperative Approaches for the Labor Constrained Scheduling Problem. In C. C. Ribeiro and P. Hansen, editors, Essays and Surveys in Metaheuristics, pages 201-225, 2001. Kluwer Academic Publishers.
16. J. Chakrapani and J. Skorin-Kapov. A Connectionist Approach to the Quadratic Assignment Problem. Computers & Operations Research, 19(3/4):287-295, 1992.
17. J. Chakrapani and J. Skorin-Kapov. Connection Machine Implementation of a Tabu Search Algorithm for the Traveling Salesman Problem. Journal of Computing and Information Technology, 1(1):29-36, 1993.
18. J. Chakrapani and J. Skorin-Kapov. Massively Parallel Tabu Search for the Quadratic Assignment Problem. Annals of Operations Research, 41:327-341, 1993.
19. J. Chakrapani and J. Skorin-Kapov. Mapping Tasks to Processors to Minimize Communication Time in a Multiprocessor System. In The Impact of Emerging Technologies of Computer Science and Operations Research, pages 45-64, 1995. Kluwer Academic Publishers.
20. T. G. Crainic. Parallel Computation, Co-operation, Tabu Search. In C. Rego and B. Alidaee, editors, Metaheuristic Optimization via Memory and Evolution: Tabu Search and Scatter Search, 2004 (forthcoming). Kluwer Academic Publishers.
21. T. G. Crainic and M. Gendreau. Towards an Evolutionary Method - Cooperating Multi-Thread Parallel Tabu Search Hybrid. In S. Voß, S. Martello, C. Roucairol, and I. H. Osman, editors, Meta-Heuristics 98: Theory & Applications, pages 331-344, 1999. Kluwer Academic Publishers.
22. T. G. Crainic and M. Gendreau. Cooperative Parallel Tabu Search for Capacitated Network Design. Journal of Heuristics, 8(6):601-627, 2002.
23. T. G. Crainic and M. Toulouse. Parallel Metaheuristics. In T. G. Crainic and G. Laporte, editors, Fleet Management and Logistics, pages 205-251, 1998. Kluwer Academic Publishers.
24. T. G. Crainic and M. Toulouse. Parallel Strategies for Meta-heuristics. In F. Glover and G. Kochenberger, editors, Handbook of Metaheuristics, pages 475-513, 2003. Kluwer Academic Publishers.
25. T. G. Crainic, M. Toulouse, and M. Gendreau. Synchronous Tabu Search Parallelization Strategies for Multicommodity Location-Allocation with Balancing Requirements. OR Spektrum, 17(2/3):113-123, 1995.
26. T. G. Crainic, M. Toulouse, and M. Gendreau. Parallel Asynchronous Tabu Search for Multicommodity Location-Allocation with Balancing Requirements. Annals of Operations Research, 63:277-299, 1995.
27. T. G. Crainic, M. Toulouse, and M. Gendreau. Towards a Taxonomy of Parallel Tabu Search Algorithms. INFORMS Journal on Computing, 9(1):61-72, 1997.
28. V.-D. Cung, S. L. Martins, C. C. Ribeiro, and C. Roucairol. Strategies for the Parallel Implementations of Metaheuristics. In C. C. Ribeiro and P. Hansen, editors, Essays and Surveys in Metaheuristics, pages 263-308, 2001. Kluwer Academic Publishers.
29. I. De Falco, R. Del Balio, and E. Tarantino. Solving the Mapping Problem by Parallel Tabu Search. Report, Istituto per la Ricerca sui Sistemi Informatici Paralleli-CNR, 1995.
30. I. De Falco, R. Del Balio, E. Tarantino, and R. Vaccaro. Improving Search by Incorporating Evolution Principles in Parallel Tabu Search. In Proceedings International Conference on Machine Learning, pages 823-828, 1994.
31. C.-N. Fiechter. A Parallel Tabu Search Algorithm for Large Travelling Salesman Problems. Discrete Applied Mathematics, 51(3):243-267, 1994.
32. R. A. Gallego, R. Romero, and A. J. Monticelli. Tabu Search Algorithm for Network Synthesis. IEEE Transactions on Power Systems, 15(15):490-495, 2000.
33. B. L. Garcia, J.-Y. Potvin, and J.-M. Rousseau. A Parallel Implementation of the Tabu Search Heuristic for Vehicle Routing Problems with Time Window Constraints. Computers & Operations Research, 21(9):1025-1033, 1994.
34. H. Gehring and J. Homberger. A Parallel Hybrid Evolutionary Metaheuristic for the Vehicle Routing Problem with Time Windows. In K. Miettinen, M. M. Makela, and J. Toivanen, editors, Proceedings of EUROGEN99 - Short Course on Evolutionary Algorithms in Engineering and Computer Science, pages 57-64, Jyvaskyla, Finland, 2002.
35. H. Gehring and J. Homberger. Parallelization of a Two-Phase Metaheuristic for Routing Problems with Time Windows. Asia-Pacific Journal of Operational Research, 18:35-47, 2001.
36. H. Gehring and J. Homberger. Parallelization of a Two-Phase Metaheuristic for Routing Problems with Time Windows. Journal of Heuristics, 8(3):251-276, 2002.
37. M. Gendreau. Recent Advances in Tabu Search. In C. C. Ribeiro and P. Hansen, editors, Essays and Surveys in Metaheuristics, pages 369-377, 2001. Kluwer Academic Publishers.
38. M. Gendreau. An Introduction to Tabu Search. In F. Glover and G. A. Kochenberger, editors, Handbook of Metaheuristics, pages 37-54, 2003. Kluwer Academic Publishers.
39. M. Gendreau, F. Guertin, J.-Y. Potvin, and R. Seguin. Neighborhood Search Heuristics for a Dynamic Vehicle Dispatching Problem with Pick-ups and Deliveries. Technical Report CRT-98-10, Centre de recherche sur les transports, Universite de Montreal, 1998.
40. M. Gendreau, F. Guertin, J.-Y. Potvin, and E. D. Taillard. Tabu Search for Real-Time Vehicle Routing and Dispatching. Transportation Science, 33(4):381-390, 1999.
41. M. Gendreau, A. Hertz, G. Laporte, and M. Stan. A Generalized Insertion Heuristic for the Traveling Salesman Problem with Time Windows. Operations Research, 46:330-335, 1998.
42. M. Gendreau, G. Laporte, and F. Semet. Solving an Ambulance Location Model by Tabu Search. Location Science, 5:75-88, 1997.
43. M. Gendreau, G. Laporte, and F. Semet. A Dynamic Model and Parallel Tabu Search Heuristic for Real-Time Ambulance Relocation. Parallel Computing, 27:1641-1653, 2001.
44. G. Ghiani, G. Guerriero, G. Laporte, and R. Musmanno. Real-Time Vehicle Routing: Solution Concepts, Algorithms and Parallel Computing Strategies. European Journal of Operational Research, 151:1-11, 2003.
45. F. Glover. Future Paths for Integer Programming and Links to Artificial Intelligence. Computers & Operations Research, 13(5):533-549, 1986.
46. F. Glover. Tabu Search - Part I. ORSA Journal on Computing, 1(3):190-206, 1989.
47. F. Glover. Tabu Search - Part II. ORSA Journal on Computing, 2(1):4-32, 1990.
48. F. Glover. Tabu Search and Adaptive Memory Programming - Advances, Applications and Challenges. In R. Barr, R. Helgason, and J. Kennington, editors, Interfaces in Computer Science and Operations Research, pages 1-75, 1996. Kluwer Academic Publishers.
49. F. Glover. A Template for Scatter Search and Path Relinking. In J. K. Hao, E. Lutton, E. Ronald, M. Schoenauer, and D. Snyers, editors, Artificial Evolution, Lecture Notes in Computer Science, volume 1363, pages 13-54, 1997. Springer-Verlag.
50. F. Glover and M. Laguna. Tabu Search. In C. Reeves, editor, Modern Heuristic Techniques for Combinatorial Problems, pages 70-150, 1993. Blackwell Scientific Publications.
51. F. Glover and M. Laguna. Tabu Search, 1997. Kluwer Academic Publishers.
52. F. Glover, M. Laguna, and R. Marti. Fundamentals of scatter search and path relinking. Control and Cybernetics, 39(3):653-684, 2000.
53. F. Glover, E. D. Taillard, and D. de Werra. A User's Guide to Tabu Search. Annals of Operations Research, 41:3-28, 1993.
54. K. Holmqvist, A. Migdalas, and P. M. Pardalos. Parallelized Heuristics for Combinatorial Search. In A. Migdalas, P. Pardalos, and S. Storoy, editors, Parallel Computing in Optimization, pages 269-294, 1997. Kluwer Academic Publishers.
55. S. Ichoua, M. Gendreau, and J.-Y. Potvin. Diversion Issues in Real-Time Vehicle Dispatching. Transportation Science, 34:426-438, 2000.
56. N. Jozefowiez, F. Semet, and E.-G. Talbi. Parallel and Hybrid Models for Multiobjective Optimization: Application to the Vehicle Routing Problem. In J. J. Merelo Guervós, P. Adamidis, H.-G. Beyer, J.-L. Fernández-Villacañas, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature - PPSN VII: 7th International Conference, Lecture Notes in Computer Science, volume 2439, pages 271-280, Granada, Spain, 2002. Springer-Verlag.
57. R. Laganiere and A. Mitiche. Parallel Tabu Search for Robust Image Filtering. In Proceedings of the IEEE Workshop on Nonlinear Signal and Image Processing, volume 2, pages 603-605, Greece, 1995.
58. M. Laguna and R. Marti. Scatter Search: Methodology and Implementations in C, 2003. Kluwer Academic Publishers.
59. A. Le Bouthillier and T. G. Crainic. A Cooperative Parallel Meta-Heuristic for the Vehicle Routing Problem with Time Windows. Computers & Operations Research, 32(7):1685-1708, 2004.
60. M. Malek, M. Guruswamy, M. Pandya, and H. Owens. Serial and Parallel Simulated Annealing and Tabu Search Algorithms for the Traveling Salesman Problem. Annals of Operations Research, 21:59-84, 1989.
61. S. Niar and A. Freville. A Parallel Tabu Search Algorithm For The 0-1 Multidimensional Knapsack Problem. In 11th International Parallel Processing Symposium (IPPS '97), Geneva, Switzerland, pages 512-516, 1997.
62. M. Ouyang, M. Toulouse, K. Thulasiraman, F. Glover, and J. S. Deogun. Multi-Level Cooperative Search: Application to the Netlist/Hypergraph Partitioning Problem. In Proceedings of the International Symposium on Physical Design, pages 192-198, 2000. ACM Press.
63. M. Ouyang, M. Toulouse, K. Thulasiraman, F. Glover, and J. S. Deogun. Multilevel Cooperative Search for the Circuit/Hypergraph Partitioning Problem. IEEE Transactions on Computer-Aided Design, 21(6):685-693, 2002.
64. P. M. Pardalos, L. Pitsoulis, T. Mavridou, and M. G. C. Resende. Parallel Search for Combinatorial Optimization: Genetic Algorithms, Simulated Annealing, Tabu Search and GRASP. In A. Ferreira and J. Rolim, editors, Proceedings of the Workshop on Parallel Algorithms for Irregularly Structured Problems, Lecture Notes in Computer Science, volume 980, pages 317-331, 1995. Springer-Verlag.
65. S. C. S. Porto, J. P. F. W. Kitajima, and C. C. Ribeiro. Performance Evaluation of a Parallel Tabu Search Task Scheduling Algorithm. Parallel Computing, 26:73-90, 2000.
66. S. C. S. Porto and C. C. Ribeiro. Parallel Tabu Search Message-Passing Synchronous Strategies for Task Scheduling Under Precedence Constraints. Journal of Heuristics, 1(2):207-223, 1996.
67. S. C. S. Porto and C. C. Ribeiro. A Case Study on Parallel Synchronous Implementations of Tabu Search Based on Neighborhood Decomposition. Investigacion Operativa, 5:233-259, 1996.
68. M. Randall and D. Abramson. A General Parallel Tabu Search Algorithm for Combinatorial Optimisation Problems. In W. Cheng and A. Sajeev, editors, PART99: Proceedings of the 6th Australasian Conference on Parallel and Real-Time Systems, pages 68-79, Singapore, 1999. Springer-Verlag.
69. C. Rego. Node Ejection Chains for the Vehicle Routing Problem: Sequential and Parallel Algorithms. Parallel Computing, 27:201-222, 2001.
70. C. Rego and C. Roucairol. A Parallel Tabu Search Algorithm Using Ejection Chains for the VRP. In I. H. Osman and J. P. Kelly, editors, Meta-Heuristics: Theory & Applications, pages 253-295, 1996. Kluwer Academic Publishers.
71. Y. Rochat and E. D. Taillard. Probabilistic Diversification and Intensification in Local Search for Vehicle Routing. Journal of Heuristics, 1(1):147-167, 1995.
72. J. Schulze and T. Fahle. A Parallel Algorithm for the Vehicle Routing Problem with Time Window Constraints. Annals of Operations Research, 86:585-607, 1999.
73. E. D. Taillard. Robust Taboo Search for the Quadratic Assignment Problem. Parallel Computing, 17:443-455, 1991.
74. E. D. Taillard. Parallel Iterative Search Methods for Vehicle Routing Problems. Networks, 23:661-673, 1993.
75. E. D. Taillard. Recherches itératives dirigées parallèles. Ph.D. dissertation, École Polytechnique Fédérale de Lausanne, 1993.
76. E. D. Taillard. Parallel Taboo Search Techniques for the Job Shop Scheduling Problem. ORSA Journal on Computing, 6(2):108-117, 1994.
77. E. D. Taillard, P. Badeau, M. Gendreau, and J.-Y. Potvin. A Tabu Search Heuristic for the Vehicle Routing Problem with Soft Time Windows. Transportation Science, 31(2):170-186, 1997.
78. E. D. Taillard, L. M. Gambardella, M. Gendreau, and J.-Y. Potvin. Adaptive memory programming: a unified view of metaheuristics. European Journal of Operational Research, 135:1-16, 1997.
79. E.-G. Talbi, Z. Hafidi, and J.-M. Geib. Parallel adaptive tabu search approach. Parallel Computing, 24:2003-2019, 1998.
80. E.-G. Talbi, Z. Hafidi, D. Kebbal, and J.-M. Geib. A fault-tolerant parallel heuristic for assignment problems. Future Generation Computer Systems, 14:425-438, 1998.
81. M. Toulouse, T. G. Crainic, and M. Gendreau. Communication Issues in Designing Cooperative Multi-Thread Parallel Searches. In I. H. Osman and J. P. Kelly, editors, Meta-Heuristics: Theory & Applications, pages 501-522, 1996. Kluwer Academic Publishers.
82. M. Toulouse, T. G. Crainic, and B. Sansó. An Experimental Study of Systemic Behavior of Cooperative Search Algorithms. In S. Voß, S. Martello, C. Roucairol, and I. H. Osman, editors, Meta-Heuristics 98: Theory & Applications, pages 373-392, 1999. Kluwer Academic Publishers.
83. M. Toulouse, T. G. Crainic, and B. Sansó. Systemic Behavior of Cooperative Search Algorithms. Parallel Computing, 21(1):57-79, 2004.
84. M. Toulouse, T. G. Crainic, B. Sansó, and K. Thulasiraman. Self-organization in Cooperative Search Algorithms. In Proceedings of the 1998 IEEE International Conference on Systems, Man, and Cybernetics, pages 2379-2385, Madison, Wisconsin, 1998. Omnipress.
85. M. Toulouse, T. G. Crainic, and K. Thulasiraman. Global Optimization Properties of Parallel Cooperative Search Algorithms: A Simulation Study. Parallel Computing, 26(1):91-112, 2000.
86. M. Toulouse, F. Glover, and K. Thulasiraman. A Multi-Scale Cooperative Search with an Application to Graph Partitioning. Report, School of Computer Science, University of Oklahoma, Norman, OK, 1998.
87. M. Toulouse, K. Thulasiraman, and F. Glover. Multi-Level Cooperative Search: A New Paradigm for Combinatorial Optimization and an Application to Graph Partitioning. In P. Amestoy, P. Berger, M. Dayde, I. Duff, V. Frayssé, L. Giraud, and D. Ruiz, editors, 5th International Euro-Par Parallel Processing Conference, Lecture Notes in Computer Science, volume 1685, pages 533-542, 1999. Springer-Verlag.
88. S. Vaithyanathan, L. I. Burke, and M. A. Magent. Massively parallel analog tabu search using neural networks applied to simple plant location problems. European Journal of Operational Research, 93(2):317-330, 1996.
89. S. Voß. Tabu Search: Applications and Prospects. In D.-Z. Du and P. Pardalos, editors, Network Optimization Problems, pages 333-353, 1993. World Scientific Publishing.
14 Parallel Greedy Randomized Adaptive Search Procedures

MAURICIO G. C. RESENDE¹ and CELSO C. RIBEIRO²
¹ Internet and Network Systems Research Center, AT&T Labs Research, USA
² Universidade Federal Fluminense, Brazil
14.1 INTRODUCTION
Metaheuristics are high-level procedures that coordinate simple heuristics, such as local search, to find solutions that are of better quality than those found by the simple heuristics alone. One such metaheuristic is GRASP (Greedy Randomized Adaptive Search Procedure) [23, 24, 26, 55]. A GRASP is a multi-start procedure, where each iteration usually consists of two phases: construction and local search. The construction phase produces a feasible solution that is used as the starting point for local search. The multi-start procedure returns the best local optimum found.

In the GRASP construction phase, a feasible solution is built, one element at a time. For example, a spanning tree is built one edge at a time; a schedule is built one operation at a time; and a clique is built one vertex at a time. The set of candidate elements is made up of those elements that can be added to the current solution under construction without causing infeasibilities. When building a spanning tree, for example, the candidate elements are those yet unselected edges whose inclusion in the solution does not result in a cycle. A candidate element is evaluated by a greedy function that measures the local benefit of including that element in the partially constructed solution. The value-based restricted candidate list (RCL) is made up of candidate elements having a greedy function value at least as good as a specified threshold. The next element to be included in the solution is selected at random from the RCL. Its inclusion in the solution alters the greedy function and the set of candidate elements used to determine the next RCL. The construction procedure terminates when the set of candidate elements is empty, obtaining a feasible solution.

Algorithm 1 shows a GRASP in pseudocode form, where the objective function f(x) is minimized over the set X. The GRASP runs for MaxIterations iterations. The best solution returned is x*, with f(x*) = f*.
Data: Number of iterations MaxIterations
Result: Solution x* ∈ X
f* ← ∞;
for i = 1, ..., MaxIterations do
    x ← GreedyRandomizedConstruction();
    x ← LocalSearch(x);
    if f(x) < f* then
        x* ← x;
        f* ← f(x);
    end
end
Algorithm 1: Pseudocode of a basic GRASP for minimization.
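The control flow of Algorithm 1 can be written down directly in C. The sketch below is only illustrative: it instantiates the construction and local search phases for a toy problem (select the K cheapest of N elements), with an RCL threshold governed by a fixed parameter ALPHA; the problem, constants, and function names are invented for the example and do not come from the chapter.

    #include <stdio.h>
    #include <stdlib.h>

    #define N 40               /* number of candidate elements           */
    #define K 10               /* a feasible solution selects K elements */
    #define ALPHA 0.3          /* RCL parameter                          */
    #define MAX_ITERATIONS 100

    static double cost[N];

    /* objective: total cost of the K selected elements (to be minimized) */
    static double f(const int sel[K]) {
        double s = 0.0;
        for (int i = 0; i < K; i++) s += cost[sel[i]];
        return s;
    }

    /* greedy randomized construction: repeatedly pick a random element from the
       RCL, i.e., among unselected elements whose cost is within ALPHA of the
       range [cmin, cmax] of the remaining candidates */
    static void construct(int sel[K]) {
        int used[N] = {0};
        for (int k = 0; k < K; k++) {
            double cmin = 1e30, cmax = -1e30;
            for (int i = 0; i < N; i++)
                if (!used[i]) {
                    if (cost[i] < cmin) cmin = cost[i];
                    if (cost[i] > cmax) cmax = cost[i];
                }
            int rcl[N], r = 0;
            for (int i = 0; i < N; i++)
                if (!used[i] && cost[i] <= cmin + ALPHA * (cmax - cmin)) rcl[r++] = i;
            sel[k] = rcl[rand() % r];
            used[sel[k]] = 1;
        }
    }

    /* local search: swap a selected element with a cheaper unselected one */
    static void local_search(int sel[K]) {
        int improved = 1;
        while (improved) {
            improved = 0;
            for (int k = 0; k < K && !improved; k++)
                for (int i = 0; i < N && !improved; i++) {
                    int in_sol = 0;
                    for (int j = 0; j < K; j++) if (sel[j] == i) in_sol = 1;
                    if (!in_sol && cost[i] < cost[sel[k]]) { sel[k] = i; improved = 1; }
                }
        }
    }

    int main(void) {
        srand(1);
        for (int i = 0; i < N; i++) cost[i] = (double)(rand() % 1000);
        int best[K], x[K];
        double fbest = 1e30;
        for (int it = 1; it <= MAX_ITERATIONS; it++) {   /* the loop of Algorithm 1 */
            construct(x);
            local_search(x);
            if (f(x) < fbest) { fbest = f(x); for (int i = 0; i < K; i++) best[i] = x[i]; }
        }
        printf("best value found: %g\n", fbest);
        return 0;
    }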
Local search makes use of the concept of solution neighborhood. A local search algorithm successively replaces the current solution by a better solution in its neighborhood, if one exists. It terminates with a locally optimal solution when there is no better solution in the neighborhood. Since the solutions generated by a GRASP construction phase are usually sub-optimal, local search almost always improves the constructed solution.

GRASP has been used to find quality solutions for a wide range of combinatorial optimization problems [26, 27]. Many extensions and improvements with respect to the GRASP introduced in [23, 24] have been proposed. Many of these extensions consist in the hybridization of the method with other metaheuristics.

Parallel computers have increasingly found their way into metaheuristics [16, 20]. Most of the parallel implementations of GRASP found in the literature consist in either partitioning the search space or partitioning the GRASP iterations and assigning each partition to a processor [6, 7, 19, 25, 39, 40, 41, 43, 44, 46, 47, 51]. GRASP is applied to each partition in parallel. These implementations can be categorized as multiple-walk independent-thread [16, 67], where the communication among processors during GRASP iterations is limited to the detection of program termination. Recently, there has been much work on the hybridization of GRASP and path-relinking [57]. Parallel approaches for GRASP with path-relinking can be categorized as multiple-walk independent-thread or multiple-walk cooperative-thread [16, 67], where processors share information on elite solutions visited during previous GRASP iterations. Examples of parallel GRASP with path-relinking can be found in [2, 4, 14, 42, 60].

In this chapter, we present a survey of parallel GRASP heuristics. In Section 14.2, we consider multiple-walk independent-thread strategies. Multiple-walk cooperative-thread strategies are examined in Section 14.3. Some applications of parallel GRASP and parallel GRASP with path-relinking are surveyed in Section 14.4. In Section 14.5, we make some concluding remarks.
14.2 MULTIPLE-WALK INDEPENDENT-THREAD STRATEGIES

Most parallel implementations of GRASP follow the multiple-walk independent-thread strategy, based on the distribution of the iterations over the processors. In general, each search thread has to perform MaxIterations/p iterations, where p and MaxIterations are, respectively, the number of processors and the total number of iterations. Each processor has a copy of the sequential algorithm, a copy of the problem data, and an independent seed to generate its own pseudorandom number sequence. To avoid that the processors find the same solutions, each of them must use a different sequence of pseudorandom numbers. A single global variable is required to store the best solution found over all processors. One of the processors acts as the master, reading and distributing problem data, generating the seeds which will be used by the pseudorandom number generators at each processor, distributing the iterations, and collecting the best solution found by each processor. Since the iterations are completely independent and very little information is exchanged, linear speedups are easily obtained provided that no major load imbalance problems occur. The iterations may be evenly distributed over the processors or according with their demands, to improve load balancing.

Pardalos, Pitsoulis, and Resende [46] reported on results of a parallel GRASP for the quadratic assignment problem on a Kendall Square Research KSR-1 parallel computer with 128 processors. The implementation used the pthread mechanism, a lightweight process that is the fundamental unit of concurrency on the KSR-1 [36]. Each pthread executes on a separate processor and has its own memory. Twenty instances from the QAPLIB [13] were run for 1000 GRASP iterations on each of 64 single processors. For each instance, the best solution found over all processors was used as the stopping criterion for solving the instance on 54, 44, 34, 24, 14, 4, and 1 processors. Speedups were computed by averaging the running times of all instances.

Pardalos, Pitsoulis, and Resende [47] implemented a parallel GRASP for the MAX-SAT problem on a cluster of SUN-SPARC 10 workstations, sharing the same file system, with communication done using the Parallel Virtual Machine (PVM) [30] software package. Each instance was run on a parallel GRASP using 1, 5, 10, and 15 processors, with a maximum number of iterations of 1000, 200, 100, and 66, respectively. The amount of CPU time required to perform the specified number of iterations and the best solution found were recorded. Since communication was kept to a minimum, linear speedups were expected. Figure 14.1 shows individual speedups as well as average speedups for these runs. Figure 14.2 shows that the average quality of the solution found was not greatly affected by the number of processors used.

Martins et al. [43] implemented a parallel GRASP for the Steiner problem in graphs. Parallelization is achieved by the distribution of 512 iterations over the processors, with the value of the RCL parameter α randomly chosen in the interval [0.0, 0.3] at each iteration. The algorithm was tested on an IBM SP-2 machine with 32 processors, using the Message-Passing Interface (MPI) library [65] for communication. The 60 problems from series C, D, and E of the OR-Library [10] were used for the computational experiments.
Fig. 14.1 Average speedups on 5, 10, and 15 processors for maximum satisfiability problems.
Fig. 14.2 Percentage error on 1, 5, 10, and 15 processors for maximum satisfiability problems.
The parallel implementation obtained 45 optimal solutions over the 60 test instances. The relative deviation with respect to the optimal value was never larger than 4%.
Almost-linear speedups observed for 2, 4, 8, and 16 processors with respect to the sequential implementation are illustrated in Figure 14.3.
Fig. 14.3 Average speedups on 2, 4, 8, and 16 processors on the Steiner tree problem in graphs.
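A minimal sketch of the multiple-walk independent-thread scheme in C with MPI is given below, assuming a working MPI installation. Each process runs its own walk with a distinct seed, and the best value is collected with a reduction; the run_grasp routine is only a placeholder standing in for a full sequential GRASP, and the seed formula and iteration count are arbitrary illustrative choices.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    /* stand-in for a full sequential GRASP run; each process would call the
       real heuristic here with its own random seed */
    static double run_grasp(unsigned seed, int iterations) {
        srand(seed);
        double best = 1e30;
        for (int i = 0; i < iterations; i++) {
            double value = (double)(rand() % 100000);   /* placeholder objective */
            if (value < best) best = value;
        }
        return best;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int total_iterations = 1000;
        /* distribute the iterations evenly and give each process its own seed */
        double local_best = run_grasp(270001u * (unsigned)(rank + 1),
                                      total_iterations / nprocs);

        double global_best = 0.0;
        MPI_Reduce(&local_best, &global_best, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("best value over %d processes: %g\n", nprocs, global_best);
        MPI_Finalize();
        return 0;
    }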
Path-relinking may also be used in conjunction with parallel implementations of GRASP. In the case of the multiple-walk independent-thread implementation described by Aiex et al. [4] for the 3-index assignment problem and Aiex, Binato, and Resende [2] for the job shop scheduling problem, each processor applies path-relinking to pairs of elite solutions stored in a local pool. Computational results using MPI on an SGI Challenge computer with 28 R10000 processors showed linear speedups for the 3-index assignment problem, but sub-linear speedups for the job shop scheduling problem.

Alvim and Ribeiro [6, 7] showed that multiple-walk independent-thread approaches for the parallelization of GRASP may benefit much from load balancing techniques, whenever heterogeneous processors are used or if the parallel machine is simultaneously shared by several users. In this case, almost-linear speedups may be obtained with a heterogeneous distribution of the iterations over the p processors in q ≥ p packets. Each processor starts performing one packet of ⌈MaxIterations/q⌉ iterations and informs the master when it finishes its packet of iterations. The master stops the execution of each slave processor when there are no more iterations to be performed and collects the best solution found. Faster or less loaded processors will perform more iterations than the others. In the case of the parallel GRASP implemented for the problem of traffic assignment described in [49], this dynamic load balancing strategy allowed reductions in the elapsed times of up to 15% with respect to the times observed for the static strategy, in which the iterations were uniformly distributed over the processors.
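The packet-based load balancing just described can be sketched with a shared counter of remaining packets: each thread repeatedly claims the next packet of iterations, so faster or less loaded threads naturally process more packets. The fragment below is a simplification (POSIX threads and a shared counter rather than the master-slave message exchange of [6, 7]), with a placeholder objective and arbitrary packet and thread counts.

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    #define PACKETS 16          /* q packets, q >= number of threads */
    #define ITER_PER_PACKET 64  /* roughly MaxIterations / q         */
    #define THREADS 4

    static int next_packet = 0;
    static double best_value = 1e30;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* placeholder for one GRASP iteration */
    static double grasp_iteration(unsigned *seed) {
        return (double)(rand_r(seed) % 100000);
    }

    static void *worker(void *arg) {
        unsigned seed = 1234u * (unsigned)(size_t)arg + 1u;
        for (;;) {
            pthread_mutex_lock(&lock);
            int packet = next_packet < PACKETS ? next_packet++ : -1;
            pthread_mutex_unlock(&lock);
            if (packet < 0) break;     /* no packets left; faster threads grabbed more */
            for (int i = 0; i < ITER_PER_PACKET; i++) {
                double v = grasp_iteration(&seed);
                pthread_mutex_lock(&lock);
                if (v < best_value) best_value = v;
                pthread_mutex_unlock(&lock);
            }
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[THREADS];
        for (long i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
        printf("best value: %g\n", best_value);
        return 0;
    }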
The efficiency of multiple-walk independent-thread parallel implementations of metaheuristics, based on running multiple copies of the same sequential algorithm, has been addressed by some authors. A given target value τ for the objective function is broadcast to all processors, which independently execute the sequential algorithm. All processors halt immediately after one of them finds a solution with value at least as good as τ. The speedup is given by the ratio between the times needed to find a solution with value at least as good as τ, using respectively the sequential algorithm and the parallel implementation with p processors. These speedups are linear for a number of metaheuristics, including simulated annealing [18, 45]; iterated local search algorithms for the traveling salesman problem [21]; tabu search, provided that the search starts from a local optimum [9, 66]; and WalkSAT [64] on hard random 3-SAT problems [35]. This observation can be explained if the random variable time to find a solution within some target value is exponentially distributed, as indicated by the following proposition [67]:
Proposition 1: Let P_p(t) be the probability of not having found a given target solution value in t time units with p independent processes. If P_1(t) = e^(-t/λ) with λ ∈ R+, corresponding to an exponential distribution, then P_p(t) = e^(-pt/λ).

This proposition follows from the definition of the exponential distribution. It implies that the probability 1 - e^(-pt/λ) of finding a solution within a given target value in time pt with a sequential algorithm is equal to the probability of finding a solution at least as good as that in time t using p independent parallel processors. Hence, it is possible to achieve linear speedups in the time to find a solution within a target value by multiple independent processors. An analogous proposition can be stated for a two-parameter (shifted) exponential distribution:

Proposition 2: Let P_p(t) be the probability of not having found a given target solution value in t time units with p independent processors. If P_1(t) = e^(-(t-μ)/λ) with λ ∈ R+ and μ ∈ R+, corresponding to a two-parameter exponential distribution, then P_p(t) = e^(-p(t-μ)/λ).

Analogously, this proposition follows from the definition of the two-parameter exponential distribution. It implies that the probability of finding a solution within a given target value in time pt with a sequential algorithm is equal to 1 - e^(-(pt-μ)/λ), while the probability of finding a solution at least as good as that in time t using p independent parallel processors is 1 - e^(-p(t-μ)/λ). If μ = 0, then both probabilities are equal and correspond to the non-shifted exponential distribution. Furthermore, if pμ ≪ λ, then the two probabilities are approximately equal and it is possible to approximately achieve linear speedups in the time to find a solution within a target value using multiple independent processors.

Aiex, Resende, and Ribeiro [5] showed experimentally that the solution times for GRASP also have this property, i.e., that they fit a two-parameter exponential distribution. Figure 14.4 illustrates this result, depicting the superimposed empirical and theoretical distributions observed for one of the cases studied along the computational experiments reported by the authors, which involved 2400 runs of GRASP procedures for each of five different problems: maximum independent set [25, 51], quadratic assignment [39, 52], graph planarization [54, 59], maximum weighted satisfiability [53], and maximum covering [50].
Fig. 14.4 Superimposed empirical and theoretical distributions (times to target values measured in seconds on an SGI Challenge computer with 28 processors).
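Proposition 1 can also be checked numerically: if single-walk times to target are exponential with mean λ, the minimum over p independent walks is again exponential, with mean λ/p, which is exactly a linear speedup. The small C program below is an illustrative simulation with invented parameters, not part of the experiments reported in this chapter.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* draw an exponential variate with mean lambda */
    static double exp_sample(double lambda) {
        double u = (rand() + 1.0) / (RAND_MAX + 2.0);   /* u in (0,1) */
        return -lambda * log(u);
    }

    int main(void) {
        srand(7);
        const double lambda = 100.0;
        const int trials = 100000;
        for (int p = 1; p <= 16; p *= 2) {
            double sum = 0.0;
            for (int t = 0; t < trials; t++) {
                double tmin = exp_sample(lambda);       /* time of the first walk */
                for (int k = 1; k < p; k++) {           /* minimum over p walks   */
                    double x = exp_sample(lambda);
                    if (x < tmin) tmin = x;
                }
                sum += tmin;
            }
            printf("p=%2d  mean time to target = %6.2f  (lambda/p = %6.2f)\n",
                   p, sum / trials, lambda / p);
        }
        return 0;
    }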
We observe that the empirical distribution plots illustrating these conclusions were originally introduced by Feo, Resende, and Smith [25]. Empirical distributions are produced from experimental data and corresponding theoretical distributions are estimated from the empirical distributions. The same result still holds when GRASP is implemented in conjunction with a post-optimization path-relinking procedure [4]. A quantile-quantile plot (Q-Q plot) and a plot showing the empirical and the theoretical distributions of the random variable time to target value for the sequential GRASP and GRASP with path-relinking for the three-index assignment problem [4] are shown in Figures 14.5 and 14.6, respectively. Analogously, Figures 14.7 and 14.8 show the same plots for the job shop scheduling problem [2]. These plots are computed by running the algorithms for 200 independent runs. Each run ends when the algorithm finds a solution with value less than or equal to a specified target value. Each running time is recorded and the times are sorted in increasing order. We associate with the i-th sorted running time t_i a probability p_i = (i - 1/2)/200, and plot the points z_i = (t_i, p_i), for i = 1, ..., 200, as the empirical distribution.

Following Chambers et al. [15], one determines the theoretical Q-Q plot for the data to estimate the parameters of the two-parameter exponential distribution.
To describe Q-Q plots, recall that the cumulative distribution function of the two-parameter exponential distribution is given by

F(t) = 1 - e^(-(t-μ)/λ),

where λ is the mean and standard deviation of the distribution data and μ is the shift of the distribution with respect to the ordinate axis. For each value p_i, i = 1, ..., 200, we associate a p_i-quantile Qt(p_i) of the theoretical distribution. For each p_i-quantile we have, by definition, that F(Qt(p_i)) = p_i. Hence, Qt(p_i) = F^(-1)(p_i) and, therefore, for the two-parameter exponential distribution, we have

Qt(p_i) = -λ ln(1 - p_i) + μ.

The quantiles of the data of an empirical distribution are simply the (sorted) raw data. A theoretical Q-Q plot is obtained by plotting the quantiles of the data of an empirical distribution against the quantiles of a theoretical distribution. This involves three steps. First, the data (in this case, the measured times) are sorted in ascending order. Second, the quantiles of the theoretical exponential distribution are obtained. Finally, a plot of the data against the theoretical quantiles is made. When the theoretical distribution is a close approximation of the empirical distribution, the points in the Q-Q plot will have a nearly straight configuration. If the parameters λ and μ of the theoretical distribution that best fits the measured data could be estimated a priori, the points in a Q-Q plot would tend to follow the line x = y. Alternatively, in a plot of the data against a two-parameter exponential distribution with λ' = 1 and μ' = 0, the points would tend to follow the line y = λx + μ. Consequently, parameters λ and μ of the two-parameter exponential distribution can be estimated, respectively, by the slope and the intercept of the line depicted in the Q-Q plot.

To avoid possible distortions caused by outliers, one does not estimate the distribution mean by linear regression on the points of the Q-Q plot. Instead, one estimates the slope of the line y = λx + μ using the upper quartile q_u and lower quartile q_l of the data. The upper and lower quartiles are, respectively, the Qt(3/4) and Qt(1/4) quantiles. Let

λ̂ = (z_u - z_l)/(q_u - q_l)

be an estimate of the slope, where z_u and z_l are the u-th and l-th points of the ordered measured times, respectively. These estimates are used to plot the theoretical distributions on the plots on the right side of the figures. The lines above and below the estimated line on the Q-Q plots correspond to plus and minus one standard deviation in the vertical direction from the line fitted to the plot. This superimposed variability information is used to analyze the straightness of the Q-Q plots.

Aiex and Resende [3] proposed a test using a sequential implementation to determine whether it is likely that a parallel implementation using multiple independent processors will be efficient. A parallel implementation is said to be efficient if it achieves linear speedup (with respect to wall time) to find a solution at least as good as a given target value.
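The quartile-based estimates of λ and μ, and the p|μ| ≪ λ check of Aiex and Resende [3], can be sketched as below. The array of measured times is filled here with synthetic sorted values only to make the fragment self-contained, and the quartile positions are a simple approximation; all names and constants are illustrative.

    #include <stdio.h>
    #include <math.h>

    #define RUNS 200

    int main(void) {
        double measured_times[RUNS];
        /* synthetic sorted sample standing in for the 200 recorded run times;
           by construction mu = 5 and lambda = 120 */
        for (int i = 0; i < RUNS; i++) {
            double p = (i + 1 - 0.5) / RUNS;
            measured_times[i] = 5.0 - 120.0 * log(1.0 - p);
        }

        int l = RUNS / 4 - 1, u = (3 * RUNS) / 4 - 1;        /* quartile positions   */
        double pl = (l + 1 - 0.5) / RUNS, pu = (u + 1 - 0.5) / RUNS;
        double ql = -log(1.0 - pl), qu = -log(1.0 - pu);     /* unit-exponential quantiles */
        double lambda_hat = (measured_times[u] - measured_times[l]) / (qu - ql);
        double mu_hat = measured_times[l] - lambda_hat * ql; /* intercept of y = lambda*x + mu */

        int p = 16;                                          /* intended number of walks */
        printf("lambda ~ %.2f  mu ~ %.2f  p*|mu|/lambda = %.3f\n",
               lambda_hat, mu_hat, p * fabs(mu_hat) / lambda_hat);
        /* a small ratio suggests near-linear speedup with p independent walks */
        return 0;
    }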
Fig. 14.5 Q-Q plot and exponential distribution for GRASP for the three-index assignment problem: instance B-S 26.1 with target value of 17.
Fig. 14.6 Q-Q plot and exponential distribution for GRASP with path-relinking for the three-index assignment problem: instance B-S 26.1 with target value of 17.
The test consists in running K (200, for example) independent trials of the sequential program to build a Q-Q plot and estimate the parameters μ and λ of the shifted exponential distribution. If p|μ| ≪ λ, then we predict that the parallel implementation will be efficient.

14.3 MULTIPLE-WALK COOPERATIVE-THREAD STRATEGIES

Path-relinking has been implemented with GRASP in multiple-walk independent-thread strategies [4]. In this section, however, we focus on the use of path-relinking as a mechanism for implementing GRASP in the multiple-walk cooperative-thread strategies framework. We first briefly outline path-relinking and its hybridization with GRASP. Then, we discuss how cooperation among the threads can be achieved by using path-relinking.

Path-relinking was originally proposed by Glover [31] as a strategy to explore trajectories connecting elite solutions obtained by tabu search or scatter search [32, 33, 34]. Paths in the solution space connecting pairs of elite solutions are explored in the search for better solutions.
Fig. 14.7 Q-Q plot and exponential distribution for GRASP for the job shop scheduling problem: instance orb5 with target value of 910.
Fig. 14.8 Q-Q plot and exponential distribution for GRASP with path-relinking for the job shop scheduling problem: instance orb5 with target value of 895.
Each pair consists of a starting solution and a guiding solution. Paths emanating from the starting solution are generated by applying moves that introduce in the current solution attributes that are present in the guiding solution. Algorithm 2 shows the pseudocode of the path-relinking procedure applied between the starting and guiding solutions. The procedure first computes the symmetric difference Δ(x_s, x_t) between the two solutions, which defines the moves needed to reach the guiding solution (x_t) from the initial solution (x_s). A path of neighboring solutions is generated linking x_s and x_t. The best solution x* in this path is returned. At each step, all moves m ∈ Δ(x, x_t) from the current solution x are examined and the one which results in the least cost solution is selected, i.e., the move that minimizes f(x ⊕ m), where x ⊕ m is the solution resulting from applying move m to solution x. The best move m* is made, producing solution x ⊕ m*. This move is taken out of the set of available moves. If necessary, the best solution x* is updated. The procedure terminates when x_t is reached, i.e., when Δ(x, x_t) = ∅. The use of path-relinking within a GRASP procedure was first proposed by Laguna and Marti [37]. It was followed by several extensions, improvements, and successful applications [1, 2, 3, 4, 11, 14, 22, 56, 57, 58, 60, 61, 62].
Data: Starting solution x_s and guiding solution x_t
Result: Best solution x* in path from x_s to x_t
Compute symmetric difference Δ(x_s, x_t);
f* ← min{f(x_s), f(x_t)};
x* ← argmin{f(x_s), f(x_t)};
x ← x_s;
while Δ(x, x_t) ≠ ∅ do
    m* ← argmin{f(x ⊕ m) : m ∈ Δ(x, x_t)};
    Δ(x ⊕ m*, x_t) ← Δ(x, x_t) \ {m*};
    x ← x ⊕ m*;
    if f(x) < f* then
        f* ← f(x);
        x* ← x;
    end
end
Algorithm 2: Pseudocode of path-relinking from starting solution x_s to guiding solution x_t.

In its hybridization with GRASP, path-relinking is usually applied to pairs (x, y) of solutions, where x is a locally optimal solution produced by each GRASP iteration after local search and y is an elite solution randomly chosen from a pool with a limited number MaxElite of elite solutions found along the search. Since the symmetric difference is a measure of the length of the path explored during relinking, a strategy biased toward pool elements y with high symmetric difference with respect to x is often better than one using uniform random selection [58]. The pool is originally empty. To maintain a pool of good but diverse solutions, each locally optimal solution obtained by local search is considered as a candidate to be inserted into the pool if it is sufficiently different from every solution in the pool. If the pool already has MaxElite solutions and the candidate is better than the worst of them, then a simple strategy is to have the former replace the latter. Another strategy, which tends to increase the diversity of the pool, is to replace the pool element most similar to the candidate among all pool elements with cost worse than the candidate's. If the pool is not full, the candidate is simply inserted.

Algorithm 3 shows the pseudocode for a hybrid GRASP with path-relinking. Each GRASP iteration now has three main steps. In the construction phase, a greedy randomized construction procedure is used to build a feasible solution. The local search phase takes the solution built in the first phase and progressively improves it using a neighborhood search strategy, until a local minimum is found. In the path-relinking phase, path-relinking is applied to the solution obtained by local search and to a randomly selected solution from the pool. The best solution found along this trajectory is also considered as a candidate for insertion in the pool and the incumbent is updated.
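Before turning to the hybrid procedure in Algorithm 3, the following sketch illustrates the relinking step of Algorithm 2 on a simple vector encoding: each position at which the starting and guiding solutions differ defines one available move (copy the guiding value into that position). The cost matrix, the instance, and the encoding are invented for the example and are not tied to any application in this chapter.

    #include <stdio.h>
    #include <string.h>

    #define N 6
    #define V 3

    static double c[N][V] = {
        {3,1,2},{2,2,1},{1,3,2},{2,1,3},{3,2,1},{1,2,3}
    };

    /* toy separable objective: sum of the costs of the chosen values */
    static double f(const int x[N]) {
        double s = 0; for (int i = 0; i < N; i++) s += c[i][x[i]]; return s;
    }

    int main(void) {
        int xs[N] = {0,0,0,0,0,0};          /* starting solution */
        int xt[N] = {1,2,1,1,2,0};          /* guiding solution  */
        int x[N], xbest[N];
        memcpy(x, xs, sizeof x);
        memcpy(xbest, f(xs) < f(xt) ? xs : xt, sizeof xbest);
        double fbest = f(xbest);

        int remaining[N], nrem = 0;          /* symmetric difference as positions */
        for (int i = 0; i < N; i++) if (xs[i] != xt[i]) remaining[nrem++] = i;

        while (nrem > 0) {                   /* walk until the guiding solution is reached */
            int besti = -1; double bestf = 1e30;
            for (int k = 0; k < nrem; k++) { /* evaluate every remaining move */
                int i = remaining[k], old = x[i];
                x[i] = xt[i];
                if (f(x) < bestf) { bestf = f(x); besti = k; }
                x[i] = old;
            }
            x[remaining[besti]] = xt[remaining[besti]];   /* apply the best move */
            remaining[besti] = remaining[--nrem];
            if (bestf < fbest) { fbest = bestf; memcpy(xbest, x, sizeof xbest); }
        }
        printf("best value on the path: %g\n", fbest);
        return 0;
    }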
Data: Number of iterations MaxIterations
Result: Solution x* ∈ X
P ← ∅;
f* ← ∞;
for i = 1, ..., MaxIterations do
    x ← GreedyRandomizedConstruction();
    x ← LocalSearch(x);
    if i ≥ 2 then
        Randomly select an elite subset Y ⊆ P to relink with x;
        for y ∈ Y do
            Set one of solutions x and y as the starting solution x_s;
            Set the other as the guiding solution x_t;
            x_p ← PathRelinking(x_s, x_t);
            Update the elite set P with x_p;
            if f(x_p) < f* then
                x* ← x_p;
                f* ← f(x_p);
            end
        end
    end
end
x* = argmin{f(x), x ∈ P};
Algorithm 3: A basic GRASP with path-relinking heuristic for minimization.
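One simple variant of the pool update used in Algorithm 3 (the "Update the elite set P" step) can be sketched as follows: a candidate is discarded if it is too similar to a pool member, inserted directly while the pool is not full, and otherwise replaces the worst member when it improves on it. The solution encoding, the difference measure, and all constants below are illustrative assumptions rather than the policy of any specific implementation cited here.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_ELITE 10
    #define N 20                 /* solution size (illustrative)           */
    #define MIN_DIFF 3           /* minimum difference to enter the pool   */

    typedef struct { int x[N]; double value; } solution_t;
    static solution_t pool[MAX_ELITE];
    static int pool_size = 0;

    static int difference(const int a[N], const int b[N]) {
        int d = 0; for (int i = 0; i < N; i++) if (a[i] != b[i]) d++; return d;
    }

    static void try_insert(const solution_t *cand) {
        int worst = 0;
        for (int i = 1; i < pool_size; i++)
            if (pool[i].value > pool[worst].value) worst = i;
        for (int i = 0; i < pool_size; i++)
            if (difference(cand->x, pool[i].x) < MIN_DIFF) return;  /* too similar */
        if (pool_size < MAX_ELITE) { pool[pool_size++] = *cand; return; }
        if (cand->value < pool[worst].value) pool[worst] = *cand;   /* replace worst */
    }

    int main(void) {
        srand(3);
        for (int k = 0; k < 100; k++) {          /* feed random candidates */
            solution_t s;
            for (int i = 0; i < N; i++) s.x[i] = rand() % 4;
            s.value = (double)(rand() % 1000);
            try_insert(&s);
        }
        printf("pool holds %d elite solutions\n", pool_size);
        return 0;
    }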
Two basic mechanisms may be used to implement a multiple-walk cooperative-thread GRASP with path-relinking heuristic. In distributed strategies [2, 3], each thread maintains its own pool of elite solutions. Each iteration of each thread consists initially of a GRASP construction, followed by local search. Then, the local optimum is combined with a randomly selected element of the thread's pool using path-relinking. The output of path-relinking is finally tested for insertion into the pool. If accepted for insertion, the solution is sent to the other threads, where it is tested for insertion into the other pools. Collaboration takes place at this point. Though there may be some communication overhead in the early iterations, this tends to ease up as pool insertions become less frequent.

The second mechanism is that used in centralized strategies [42, 60], in which a single pool of elite solutions is used. As before, each GRASP iteration performed at each thread starts by the construction and local search phases. Next, an elite solution is requested and received from the centralized pool. Once path-relinking has been performed, the solution obtained as the output is sent to the pool and tested for insertion. Collaboration takes place when elite solutions are sent from the pool to other processors different from the one that originally computed it.

We notice that, in both the distributed and the centralized strategies, each processor has a copy of the sequential algorithm and a copy of the data. One processor acts as the master, reading and distributing the problem data, generating the seeds which will be used by the pseudo-random number generators at each processor, distributing the iterations, and collecting the best solution found by each processor.
In the case of a distributed strategy, each processor has its own pool of elite solutions and all available processors perform GRASP iterations. In the case of a centralized strategy, on the contrary, one particular processor does not perform GRASP iterations and is used exclusively to store the pool and to handle all operations involving communication requests between the pool and the slaves. In the next section, we describe three examples of parallel implementations of GRASP with path-relinking.

14.4 SOME PARALLEL GRASP IMPLEMENTATIONS
In this section, we describe a comparison of multiple-walk independent-thread and multiple-walk cooperative-thread strategies for GRASP with path-relinking for the three-index assignment problem [4], the job shop scheduling problem [2], and the 2-path network design problem [42, 60]. For each problem, we first state the problem and describe the construction, local search, and path-relinking procedures. We then show numerical results comparing the different parallel implementations.

The experiments described in Sections 14.4.1 and 14.4.2 were done on an SGI Challenge computer (16 196-MHz MIPS R10000 processors and 12 194-MHz MIPS R10000 processors) with 7.6 Gb of memory. The algorithms were coded in Fortran and were compiled with the SGI MIPSpro F77 compiler using flags -O3 -static -u. The parallel codes used SGI's Message-Passing Toolkit 1.4, which contains a fully compliant implementation of version 1.2 of the Message-Passing Interface (MPI) [65] specification. In the parallel experiments, wall clock times were measured with the MPI function MPI_WTIME. This was also the case for runs with a single processor that are compared to multiple-processor runs. Timing in the parallel runs excludes the time to read the problem data, to initialize the random number generator seeds, and to output the solution.

In the experiments described in Section 14.4.3, both variants of the parallel GRASP with path-relinking heuristic were implemented in C (version egcs-2.91.66 of the gcc compiler) with the LAM 6.3.2 implementation of MPI. Computational experiments were performed on a cluster of 32 Pentium II 400-MHz processors with 32 Mbytes of RAM each, running under the Red Hat 6.2 implementation of Linux. Processors are connected by a 10 Mbits/s IBM 8274 switch.
14.4.1 Three-Index Assignment
14.4.1.1 Problem Formulation. The NP-hard [28, 29] three-index assignment problem (AP3) [48] is a straightforward extension of the classical two-dimensional assignment problem and can be formulated as follows. Given three disjoint sets I, J, and K with |I| = |J| = |K| = n and a weight c_ijk associated with each ordered triplet (i, j, k) ∈ I × J × K, find a minimum weight collection of n disjoint triplets (i, j, k) ∈ I × J × K. Another way to formulate the AP3 is with permutations. There are n³ cost elements. The optimal solution consists of the n smallest cost elements such that the constraints are not violated. The constraints are enforced if one assigns to each set I, J, and K the numbers 1, 2, ..., n and none of the chosen triplets (i, j, k) is allowed to have the same value for indices i, j, and k as another. The permutation-based formulation for the AP3 is

min over p, q ∈ π_N of Σ_{i=1}^{n} c_{i p(i) q(i)},

where π_N denotes the set of all permutations of the set of integers N = {1, 2, ..., n}.

14.4.1.2 GRASP Construction. The construction phase selects n triplets, one at a time, to form a three-index assignment S. The usual random choice in the interval [0, 1] for the RCL parameter α is made at each iteration. The value remains constant during the entire construction phase. Construction begins with an empty solution S. The initial set C of candidate triplets consists of the set of all triplets. Let c̲ and c̄ denote, respectively, the values of the smallest and largest cost triplets in C. All triplets (i, j, k) in the candidate set C having cost c_ijk ≤ c̲ + α(c̄ - c̲) are placed in the RCL C'. Triplet (i_p, j_p, k_p) ∈ C' is chosen at random and is added to the solution, i.e., S = S ∪ {(i_p, j_p, k_p)}. Once (i_p, j_p, k_p) is selected, any triplet (i, j, k) ∈ C such that i = i_p or j = j_p or k = k_p is removed from C. After n - 1 triplets have been selected, the set C of candidate triplets contains one last triplet, which is added to S, thus completing the construction phase.

14.4.1.3 Local Search. If the solution of the AP3 is represented by a pair of permutations (p, q), then the solution space consists of all (n!)² possible combinations of permutations. If p is a permutation vector, then a 2-exchange permutation of p is a permutation vector that results from swapping two elements in p. In the 2-exchange neighborhood scheme used in this local search, the neighborhood of a solution (p, q) consists of all 2-exchange permutations of p plus all 2-exchange permutations of q. In the local search, the cost of each neighbor solution is compared with the cost of the current solution. If the cost of the neighbor is lower, then the solution is updated, the search is halted, and a search in the new neighborhood is initialized. The local search ends when no neighbor of the current solution has a lower cost than the current solution.

14.4.1.4 Path-Relinking. A solution of the AP3 can be represented by two permutation arrays of the numbers 1, 2, ..., n in sets J and K, respectively, so that the triplets of the assignment are (i, p_i, q_i), for i = 1, ..., n.
14.4.1.4 Path-Relinking. A solution of the AP3 can be represented by two permutation arrays of the numbers 1, 2, ..., n, one for set J and one for set K. Path-relinking is done between an initial solution
$$S = \{(p^S_1, p^S_2, \ldots, p^S_n),\; (q^S_1, q^S_2, \ldots, q^S_n)\}$$
and a guiding solution
$$T = \{(p^T_1, p^T_2, \ldots, p^T_n),\; (q^T_1, q^T_2, \ldots, q^T_n)\}.$$
Let the difference between S and T be defined by the two sets of indices
$$\delta^{S,T}_p = \{\, i = 1, \ldots, n \mid p^S_i \neq p^T_i \,\}, \qquad \delta^{S,T}_q = \{\, i = 1, \ldots, n \mid q^S_i \neq q^T_i \,\}.$$
During a path-relinking move, a permutation (p or q) array $\pi$ in S, given by $(\pi^S_1, \ldots, \pi^S_i, \ldots, \pi^S_j, \ldots, \pi^S_n)$, is replaced by the permutation array $(\pi^S_1, \ldots, \pi^S_j, \ldots, \pi^S_i, \ldots, \pi^S_n)$ by exchanging the permutation elements $\pi^S_i$ and $\pi^S_j$, where $i \in \delta^{S,T}_\pi$ and $j \in \{1, 2, \ldots, n\}$ are such that $\pi^T_j = \pi^S_i$.
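As an illustration, the following Python sketch applies one such relinking move to a single permutation array; the function name and the 0-indexed list representation are assumptions of the sketch, and cost evaluation of the resulting move is omitted.

```python
def relink_move(pi_s, pi_t, i):
    """Apply one AP3 path-relinking move to permutation pi_s (a list).

    Position i is one where pi_s and pi_t disagree.  Find the position j
    whose value in the guiding permutation pi_t equals pi_s[i] and swap the
    elements at positions i and j; after the swap, position j agrees with
    the guiding solution, so pi_s moves one step closer to pi_t.
    """
    assert pi_s[i] != pi_t[i], "i must belong to the difference set"
    j = pi_t.index(pi_s[i])          # position j such that pi_t[j] == pi_s[i]
    pi_s[i], pi_s[j] = pi_s[j], pi_s[i]
    return pi_s
```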
14.4.1.5 Independent-Thread GRASP with Path-Relinking for AP3. We study the parallel efficiency of multiple-walk independent-thread GRASP with path-relinking on AP3 instances B-S 20.1, B-S 22.1, B-S 24.1, and B-S 26.1 of Balas and Saltzman [8] using 7, 8, 7, and 8 as target solution values, respectively. Table 14.1 shows the estimated exponential distribution parameters for the multiple-walk independent-thread GRASP with path-relinking strategy obtained from 200 independent runs of a sequential variant of the algorithm. In addition to the sequential variant, 60 independent runs of 2-, 4-, 8-, and 16-thread variants were run on the four test problems. Average speedups were computed by dividing the sum of the execution times of the independent parallel program executing on one processor by the sum of the execution times of the parallel program on 2, 4, 8, and 16 processors over the 60 runs. The execution times of the independent parallel program executing on one processor and the execution times of the sequential program are approximately the same. The average speedups can be seen in Table 14.2 and Figure 14.9.

Table 14.1 Estimated exponential distribution parameters μ and λ obtained with 200 independent runs of a sequential GRASP with path-relinking on AP3 instances B-S 20.1, B-S 22.1, B-S 24.1, and B-S 26.1, with target values 7, 8, 7, and 8, respectively
Problem         μ          λ        |μ|/λ
B-S 20.1      -26.46    1223.80     .021
B-S 22.1     -135.12    3085.32     .043
B-S 24.1      -16.76    4004.11     .004
B-S 26.1       32.12    2255.55     .014
average                             .020
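The parameters in Table 14.1 (and in Table 14.4 below) refer to the two-parameter (shifted) exponential model of the time-to-target-value random variable used throughout this chapter. The following LaTeX restatement of that model is included for reference; μ is the shift and λ the scale, which for this family equals the standard deviation, so a small |μ|/λ ratio corresponds to the regime in which near-linear speedups are expected from multiple-walk independent-thread parallelization.

```latex
% Shifted (two-parameter) exponential distribution of time-to-target:
%   mu = shift, lambda = scale (equal to the standard deviation).
\Pr[\, T \le t \,] \;=\; 1 - e^{-(t-\mu)/\lambda}, \qquad t \ge \mu .
```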
14.4.1.6 Cooperative-Thread GRASP with Path-Relinking for AP3. We now study the multiple-walk cooperative-thread strategy for GRASP with path-relinking
Table 14.2 Speedups for multiple-walk independent-thread implementations of GRASP with path-relinking on instances B-S 20.1, B-S 22.1, B-S 24.1, and B-S 26.1, with target values 7, 8, 7, and 8, respectively. Speedups are computed with the average of 60 runs

                2 processors    4 processors    8 processors    16 processors
Problem         spdup  effic    spdup  effic    spdup  effic    spdup  effic
B-S 20.1         1.67   0.84     3.34   0.84     6.22   0.78    10.82   0.68
B-S 22.1         2.25   1.13     4.57   1.14     9.01   1.13    14.37   0.90
B-S 24.1         1.71   0.86     4.00   1.00     7.87   0.98    12.19   0.76
B-S 26.1         2.11   1.06     3.89   0.97     6.10   0.76    11.49   0.72
average          1.94   0.97     3.95   0.99     7.30   0.91    12.21   0.77
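The speedup and efficiency entries in Tables 14.2, 14.3, 14.5, and 14.6 follow directly from the run-time sums described above. The short Python helper below is a sketch of that bookkeeping; the function name and argument layout are ours, not part of the original Fortran code.

```python
def speedup_and_efficiency(times_1proc, times_pproc, p):
    """Average speedup and efficiency over a batch of runs.

    times_1proc: execution times of the (independent) parallel program on 1 processor.
    times_pproc: execution times of the parallel program on p processors,
                 for the same batch of runs and target values.
    """
    speedup = sum(times_1proc) / sum(times_pproc)
    efficiency = speedup / p
    return speedup, efficiency
```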
Fig. 14.9 Average speedups on 2, 4, 8, and 16 processors for multiple-walk independent-thread parallel GRASP with path-relinking on AP3 instances B-S 20.1, B-S 22.1, B-S 24.1, and B-S 26.1.
on the AP3. As with the independent-thread GRASP with path-relinking strategy, the target solution values 7, 8, 7, and 8 were used for instances B-S 20.1, B-S 22.1, B-S 24.1, and B-S 26.1, respectively. Table 14.3 and Figure 14.10 show super-linear speedups on instances B-S 22.1, B-S 24.1, and B-S 26.1 and about 90% efficiency for B-S 20.1. Super-linear speedups are possible because good elite solutions are shared among the threads and are combined with GRASP solutions, whereas they would not be combined in an independent-thread implementation.
Fig. 14.10 Average speedups on 2, 4, 8, and 16 processors for multiple-walk cooperative-thread parallel GRASP with path-relinking on AP3 instances B-S 20.1, B-S 22.1, B-S 24.1, and B-S 26.1.
Figure 14.11 compares the average speedups of the two implementations tested in this section, namely the multiple-walk independent-thread and multiple-walk cooperative-thread GRASP with path-relinking implementations, using target solution values 7, 8, 7, and 8 on the same instances. The figure shows that the cooperative variant of GRASP with path-relinking achieves the best parallelization.

14.4.2 Job Shop Scheduling

14.4.2.1 Problem Formulation. The job shop scheduling problem (JSP) is an NP-hard [38] combinatorial optimization problem that has long challenged researchers.
Table 14.3 Speedups for multiple-walk cooperative-thread implementations of GRASP with path-relinking on instances B-S 20.1, B-S 22.1, B-S 24.1, and B-S 26.1, with target values 7, 8, 7, and 8, respectively. Average speedups were computed over 60 runs

                2 processors    4 processors    8 processors    16 processors
Problem         spdup  effic    spdup  effic    spdup  effic    spdup  effic
B-S 20.1         1.56   0.78     3.47   0.88     7.37   0.92    14.36   0.90
B-S 22.1         1.64   0.82     4.22   1.06     8.83   1.10    18.78   1.04
B-S 24.1         2.16   1.10     4.00   1.00     9.38   1.17    19.29   1.21
B-S 26.1         2.16   1.08     5.30   1.33     9.55   1.19    16.00   1.00
average          1.88   0.95     4.24   1.07     8.78   1.10    17.10   1.04
Fig. 14.11 Average speedups on 2, 4, 8, and 16 processors for the parallel algorithms tested on instances of AP3: multiple-walk independent-thread GRASP with path-relinking and multiple-walk cooperative-thread GRASP with path-relinking.
It consists in processing a finite set of jobs on a finite set of machines. Each job is required to complete a set of operations in a fixed order. Each operation is processed on a specific machine for a fixed duration. Each machine can process at most one job at a time, and once a job initiates processing on a given machine, it must complete processing on that machine without interruption. A schedule is a mapping of operations to time slots on the machines. The makespan is the maximum completion time of the jobs. The objective of the JSP is to find a schedule that minimizes the makespan.

A feasible solution of the JSP can be built from a permutation of the set of jobs J on each of the machines in the set M, observing the precedence constraints, the restriction that a machine can process only one operation at a time, and requiring that once started, processing of an operation cannot be interrupted until its completion. Since each set of feasible permutations has a corresponding schedule, the objective of the JSP is to find, among the feasible permutations, the one with the smallest makespan.

14.4.2.2 GRASP Construction. Consider the GRASP construction phase for the JSP, proposed in Binato et al. [12] and Aiex, Binato, and Resende [2], where a single operation is the building block of the construction phase. A feasible schedule is built by scheduling individual operations, one at a time, until all operations have been scheduled.
While constructing a feasible schedule, not all operations can be selected at a given stage of the construction. An operation σ^j_k (the k-th operation of job j) can only be scheduled if all prior operations of job j have already been scheduled. Therefore, at each construction phase iteration, at most |J| operations are candidates to be scheduled. Let this set of candidate operations be denoted by O_c and the set of already scheduled operations by O_s. Denote the value of the greedy function for candidate operation σ^j_k by h(σ^j_k).

The greedy choice is to next schedule the operation $\underline{\sigma} = \operatorname{argmin}\{h(\sigma^j_k) \mid \sigma^j_k \in O_c\}$. Let $\bar{\sigma} = \operatorname{argmax}\{h(\sigma^j_k) \mid \sigma^j_k \in O_c\}$, $\underline{h} = h(\underline{\sigma})$, and $\bar{h} = h(\bar{\sigma})$. Then, the GRASP restricted candidate list (RCL) is defined as
$$\mathrm{RCL} = \{\, \sigma^j_k \in O_c \mid \underline{h} \le h(\sigma^j_k) \le \underline{h} + \alpha(\bar{h} - \underline{h}) \,\},$$
where α is a parameter such that 0 ≤ α ≤ 1.

A typical iteration of the GRASP construction is summarized as follows: a partial schedule (which is initially empty) is on hand, and the next operation to be scheduled is selected from the RCL and is added to the partial schedule, resulting in a new partial schedule. The selected operation is inserted into the earliest available feasible time slot on its machine. Construction ends when the partial schedule is complete, i.e., all operations have been scheduled.

The algorithm uses two greedy functions. Even-numbered iterations use a greedy function based on the makespan resulting from the inclusion of operation σ^j_k in the set of already-scheduled operations, i.e., h(σ^j_k) is the makespan of the partial schedule for O_s ∪ {σ^j_k}. On odd-numbered iterations, solutions are constructed by favoring operations from jobs having long remaining processing times; the greedy function used measures the remaining processing time of job j, i.e., the sum of the durations of its not-yet-scheduled operations. The use of two different greedy functions produces a greater diversity of initial solutions to be used by the local search.
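A minimal Python sketch of the RCL selection step shared by both greedy functions is given below. The data representation (operations as hashable identifiers) and the convention that the greedy function is oriented so that smaller values are preferred (e.g., by passing the negated remaining processing time on odd iterations) are assumptions of the sketch, not details of the original implementation.

```python
import random

def select_next_operation(candidates, h, alpha):
    """One RCL selection step of the JSP GRASP construction.

    candidates: operations whose job predecessors are already scheduled.
    h:          greedy function mapping an operation to a value, oriented so
                that smaller is better.
    alpha:      RCL parameter in [0, 1].
    """
    values = {op: h(op) for op in candidates}
    h_min, h_max = min(values.values()), max(values.values())
    threshold = h_min + alpha * (h_max - h_min)
    rcl = [op for op in candidates if values[op] <= threshold]
    return random.choice(rcl)
```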
14.4.2.3 Local Search. To attempt to decrease the makespan of the solution produced in the construction phase, we employ the 2-exchange local search used in [2, 12, 66], based on the disjunctive graph model of Roy and Sussmann [63]. We refer the reader to [2, 12] for a description of the implementation of the local search procedure.

14.4.2.4 Path-Relinking. Path-relinking for job shop scheduling is similar to path-relinking for three-index assignment. Whereas in the case of three-index assignment each solution is represented by two permutation arrays, in the JSP each solution is made up of |M| permutation arrays of the numbers 1, 2, ..., |J|.

14.4.2.5 Independent-Thread GRASP with Path-Relinking for JSP. We study the efficiency of the multiple-walk independent-thread GRASP with path-relinking on JSP instances abz6, mt10, orb5, and la21 of OR-Library [10] using 943, 938, 895, and 1100 as target solution values, respectively. Table 14.4 shows the estimated exponential distribution parameters for the multiple-walk independent-thread GRASP
with path-relinking strategy obtained from 200 independent runs of a sequential variant of the algorithm. In addition to the sequential variant, 60 independent runs of 2-, 4-, 8-, and 16-thread variants were run on the four test problems. As before, average speedups were computed by dividing the sum of the execution times of the independent parallel program executing on one processor by the sum of the execution times of the parallel program on 2, 4, 8, and 16 processors over the 60 runs. The average speedups can be seen in Table 14.5 and Figure 14.12.

Table 14.4 Estimated exponential distribution parameters μ and λ obtained with 200 independent runs of a sequential GRASP with path-relinking on JSP instances abz6, mt10, orb5, and la21, with target values 943, 938, 895, and 1100, respectively

Problem        μ         λ        |μ|/λ
abz6          47.67    756.56     .06
mt10         305.27    524.23     .58
orb5         130.12    395.41     .32
la21         175.20    407.73     .42
average                           .34
Table 14.5 Speedups for multiple-walk independent-thread implementations of GRASP with path-relinking on instances abz6, mt10, orb5, and la21, with target values 943, 938, 895, and 1100, respectively. Speedups are computed with the average of 60 runs

                2 processors    4 processors    8 processors    16 processors
Problem         spdup  effic    spdup  effic    spdup  effic    spdup  effic
abz6             2.00   1.00     3.36   0.84     6.44   0.81    10.51   0.66
mt10             1.57   0.79     2.12   0.53     3.03   0.39     4.05   0.25
orb5             1.95   0.98     2.91   0.74     3.99   0.50     5.36   0.34
la21             1.64   0.82     2.25   0.56     3.14   0.39     3.72   0.23
average          1.79   0.90     2.67   0.67     4.15   0.52     5.91   0.37
Compared to the efficiencies observed on the AP3 instances, those for these instances of the JSP were much worse. While with 16 processors average speedups of 12.2 were computed for the AP3, average speedups of only 5.9 were computed for the JSP. This is consistent with the |μ|/λ values, which were on average 0.34 for the JSP and 0.02 for the AP3.

14.4.2.6 Cooperative-Thread GRASP with Path-Relinking for JSP. We now study the multiple-walk cooperative-thread strategy for GRASP with path-relinking on the JSP. As with the independent-thread GRASP with path-relinking strategy, the target solution values 943, 938, 895, and 1100 were used for instances abz6, mt10, orb5, and la21, respectively. Table 14.6 and Figure 14.13 show super-linear speedups on instances abz6 and mt10, linear speedup on orb5, and about 70% efficiency for la21.
Fig. 14.12 Average speedups on 2, 4, 8, and 16 processors for multiple-walk independent-thread parallel GRASP with path-relinking on JSP instances abz6, mt10, orb5, and la21.
As before, super-linear speedups are possible because good elite solutions are shared among the threads and these elite solutions are combined with GRASP solutions, whereas they would not be combined in an independent-thread implementation.

Table 14.6 Speedups for multiple-walk cooperative-thread implementations of GRASP with path-relinking on instances abz6, mt10, orb5, and la21, with target values 943, 938, 895, and 1100, respectively. Average speedups were computed over 60 runs
                2 processors    4 processors    8 processors    16 processors
Problem         spdup  effic    spdup  effic    spdup  effic    spdup  effic
abz6             2.40   1.20     4.21   1.05    11.43   1.43    23.58   1.47
mt10             1.75   0.88     4.58   1.15     8.36   1.05    16.97   1.06
orb5             2.10   1.05     4.91   1.23     8.89   1.11    15.76   0.99
la21             2.23   1.12     4.47   1.12     7.54   0.94    11.41   0.71
average          2.12   1.06     4.54   1.14     9.05   1.13    16.93   1.06
Figure 14.14 compares the average speedups of the two implementations tested in this section, namely the multiple-walk independent-thread and multiple-walk cooperative-thread implementations of GRASP with path-relinking, using target solution values 943, 938, 895, and 1100 on instances abz6, mt10, orb5, and la21, respectively. The figure shows that the cooperative variant of GRASP with path-relinking achieves the best parallelization.
Fig. 14.13 Average speedups on 2, 4, 8, and 16 processors for multiple-walk cooperative-thread parallel GRASP with path-relinking on JSP instances abz6, mt10, orb5, and la21.
Fig. 14.14 Average speedups on 2, 4, 8, and 16 processors for the parallel algorithms tested on instances of JSP: multiple-walk independent-thread GRASP with path-relinking and multiple-walk cooperative-thread GRASP with path-relinking.
14.4.3 2-Path Network Design Problem

14.4.3.1 Problem Formulation. Let G = (V, E) be a connected graph, where V is the set of nodes and E is the set of edges. A k-path between nodes s, t ∈ V is a sequence of at most k edges connecting them. Given a non-negative weight function w : E → R+ associated with the edges of G and a set D of pairs of origin-destination nodes, the 2-path network design problem (2PNDP) consists of finding a minimum weighted subset of edges E' ⊆ E containing a 2-path between every origin-destination pair. Applications of 2PNDP can be found in the design of communication networks, in which paths with few edges are sought to enforce high reliability and small delays. 2PNDP was shown to be NP-hard by Dahl and Johannessen [17].

14.4.3.2 GRASP Construction. The construction of a new solution begins with the initialization of modified edge weights with the original edge weights. Each iteration of the construction phase starts with the random selection of an origin-destination pair still in D. A shortest 2-path between the extremities of this pair is computed, using the modified edge weights. The weights of the edges in this 2-path are set to zero until the end of the construction procedure, the origin-destination pair is removed from D, and a new iteration resumes. The construction phase stops when 2-paths have been computed for all origin-destination pairs.

14.4.3.3 Local Search. The local search phase seeks to improve each solution built in the construction phase. Each solution may be viewed as a set of 2-paths, one for each origin-destination pair in D. To introduce some diversity by driving different applications of the local search to different local optima, the origin-destination pairs are investigated at each GRASP iteration in a circular order defined by a different random permutation of their original indices. Each 2-path in the current solution is tentatively eliminated. The weights of the edges used by other 2-paths are temporarily set to zero, while those which are not used by other 2-paths in the current solution are restored to their original values. A new shortest 2-path between the extremities of the origin-destination pair under investigation is computed, using the modified weights. If the new 2-path improves the current solution, then the latter is modified; otherwise the previous 2-path is restored. The search stops if the current solution was not improved after a sequence of |D| iterations along which all 2-paths have been investigated. Otherwise, the next 2-path in the current solution is investigated for substitution and a new iteration resumes.

14.4.3.4 Path-Relinking. A solution to 2PNDP is represented as a set of 2-paths connecting each origin-destination pair. Path-relinking starts by determining all origin-destination pairs whose associated 2-paths are different in the starting and guiding solutions. These computations amount to determining a set of moves which should be applied to the initial solution to reach the guiding one. Each move is characterized by a pair of 2-paths, one to be inserted and the other to be eliminated from the current solution.
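Since a 2-path has at most two edges, the shortest 2-path computation used in the construction and local search phases above can be done directly, as in the following Python sketch; the dense weight-matrix representation and the use of infinity for absent edges are assumptions made for the example.

```python
import math

def shortest_2path(weight, s, t):
    """Shortest 2-path (at most two edges) between nodes s and t.

    weight[u][v] is the (possibly modified) weight of edge {u, v},
    or math.inf if the edge is absent.  Returns (cost, path).
    """
    n = len(weight)
    best_cost, best_path = weight[s][t], [s, t]      # direct edge, if any
    for v in range(n):
        if v in (s, t):
            continue
        cost = weight[s][v] + weight[v][t]           # two-edge path s-v-t
        if cost < best_cost:
            best_cost, best_path = cost, [s, v, t]
    return best_cost, best_path
```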
14.4.3.5 Parallel GRASP with Path-Relinking for 2PNDP. As for AP3 and JSP, in the case of the independent-thread parallel implementation of GRASP with path-relinking for 2PNDP, each processor has a copy of the sequential algorithm, a copy of the data, and its own pool of elite solutions. One processor acts as the master, reading and distributing the problem data, generating the seeds which will be used by the pseudo-random number generators at each processor, distributing the iterations, and collecting the best solution found by each processor. All the p available processors perform GRASP iterations.

However, in the case of the cooperative-thread parallel implementation of GRASP with path-relinking for 2PNDP, the master handles a centralized pool of elite solutions, collecting and distributing them upon request (recall that in the case of AP3 and JSP each processor had its own pool of elite solutions). The p − 1 slaves exchange the elite solutions found along their search trajectories. In the proposed implementation for 2PNDP, each slave may send up to three different solutions to the master at each iteration: the solution obtained by local search and the solutions w1 and w2 obtained by forward and backward path-relinking [57] between the same pair of starting and guiding solutions, respectively.

14.4.3.6 Computational Results. The results illustrated in this section concern an instance with 100 nodes, 4950 edges, and 1000 origin-destination pairs. We use the methodology proposed in [5] to assess experimentally the behavior of randomized algorithms. This approach is based on plots showing empirical distributions of the random variable time to target solution value. To plot the empirical distribution, we fix a solution target value and run each algorithm 200 times, recording the running time when a solution with cost at least as good as the target value is found. For each algorithm, we associate with the i-th sorted running time t_i a probability p_i = (i − 1/2)/200 and plot the points z_i = (t_i, p_i), for i = 1, ..., 200.

Results obtained for both the independent-thread and the cooperative-thread parallel implementations of GRASP with path-relinking on the above instance, with the target value set at 683, are reported in Figure 14.15. The cooperative implementation is already faster than the independent one for eight processors. For fewer processors the independent implementation is naturally faster, since it employs all p processors in the search (while only p − 1 slave processors take part effectively in the computations performed by the cooperative implementation).

Three different strategies were investigated to further improve the performance of the cooperative-thread implementation, by reducing the cost of the communication between the master and the slaves when the number of processors increases (a sketch of the first strategy is given after this list):
(1) Each send operation is broken into two parts. First, the slave sends only the cost of the solution to the master. If this solution is better than the worst solution in the pool, then the full solution is sent. The number of messages increases, but most of them will be very small ones with light memory requirements.

(2) Only one solution is sent to the pool at each GRASP iteration.
Fig. 14.15 Running times for 200 runs of (a) the multiple-walk independent-thread and (b) the multiple-walk cooperative-thread implementations of GRASP with path-relinking using two processors and with the target solution value set at 683.
(3) A distributed implementation, in which each slave handles its own pool of elite solutions. Every time a processor finds a new elite solution, the latter is broadcast to the others.

Comparative results for these three strategies on the same problem instance are plotted in Figure 14.16. The first strategy outperformed all others.
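The following Python sketch illustrates the pool-side logic behind the two-part send of strategy (1); the class, its names, and the fixed pool size are assumptions of the sketch (details such as duplicate filtering and the actual MPI message exchange of the original C code are omitted).

```python
class ElitePool:
    """Centralized pool of elite solutions kept by the master (strategy 1).

    A slave first announces only the cost of a candidate solution; the full
    solution is requested and stored only if that cost beats the worst
    elite solution currently in the pool (or if the pool is not yet full).
    """
    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = []                    # list of (cost, solution)

    def worth_receiving(self, cost):
        """Answer to the small 'cost only' message sent by a slave."""
        if len(self.entries) < self.max_size:
            return True
        worst_cost = max(c for c, _ in self.entries)
        return cost < worst_cost

    def insert(self, cost, solution):
        """Store the full solution after the slave has sent it."""
        self.entries.append((cost, solution))
        if len(self.entries) > self.max_size:
            # Drop the worst solution to keep the pool bounded.
            self.entries.remove(max(self.entries, key=lambda e: e[0]))
```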
Fig. 14.16 Strategies for improving the performance of the centralized multiple-walk cooperative-thread implementation on eight processors.
Table 14.7 shows the average computation times and the best solutions found over 10 runs of each strategy when the total number of GRASP iterations is set at 3200. There is a clear degradation in solution quality for the independent-thread strategy when the number of processors increases. As fewer iterations are performed by each processor, the pool of elite solutions gets poorer with the increase in the number of processors. Since the processors do not communicate, the overall solution quality is worse. In the case of the cooperative strategy, the information shared by the processors guarantees the high quality of the solutions in the pool. The cooperative implementation is more robust: very good solutions are obtained with no degradation in quality and significant speedups.

14.5 CONCLUSION
Metaheuristics, such as GRASP, have found their way into the standard toolkit of combinatorial optimization methods. Parallel computers have increasingly found their way into metaheuristics. In this chapter, we surveyed work on the parallelization of GRASP. We first showed that the random variable time to target solution value for GRASP heuristics fits a two-parameter (shifted) exponential distribution.
Table 14.7 Average times and best solutions over 10 runs for 2PNDP

              independent-thread            cooperative-thread
Processors   best value   avg. time (s)    best value   avg. time (s)
     1          673         1310.1            -              -
     2          676          686.8           676          1380.9
     4          680          332.7           673           464.1
     8          687          164.1           676           200.9
    16          692           81.7           674            97.5
    32          702           41.3           678            74.6
Under the mild assumption that the product of the number of processors and the shift in the distribution is small compared to the standard deviation of the distribution, linear speedups can be expected in parallel multiple-walk independent-thread implementations. We illustrated, with an application to the maximum satisfiability problem, a case where this occurs.

Path-relinking has been increasingly used to introduce memory into the otherwise memoryless original GRASP procedure. The hybridization of GRASP and path-relinking has led to some effective multiple-walk cooperative-thread implementations. Collaboration between the threads is usually achieved by sharing elite solutions, either in a single centralized pool or in distributed pools. In some of these implementations, super-linear speedups are achieved even for cases where little speedup occurs in multiple-walk independent-thread variants.

Parallel cooperative implementations of metaheuristics lead to significant speedups, smaller computation times, and more robust algorithms. However, they demand more programming effort and implementation skills. The three applications described in this survey illustrate the strategies and programming skills involved in the development of robust and efficient parallel cooperative implementations of GRASP.
REFERENCES

1. R.M. Aiex. Uma investigação experimental da distribuição de probabilidade de tempo de solução em heurísticas GRASP e sua aplicação na análise de implementações paralelas. PhD thesis, Department of Computer Science, Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil, 2002.

2. R.M. Aiex, S. Binato, and M.G.C. Resende. Parallel GRASP with path-relinking for job shop scheduling. Parallel Computing, 29:393-430, 2003.

3. R.M. Aiex and M.G.C. Resende. Parallel strategies for GRASP with path-relinking. In T. Ibaraki, K. Nonobe, and M. Yagiura, editors, Metaheuristics: Progress as Real Problem Solvers, pages 301-331. Springer, 2005.
4. R.M. Aiex, M.G.C. Resende, P.M. Pardalos, and G. Toraldo. GRASP with path relinking for three-index assignment. INFORMS Journal on Computing, 17(2):224-247, 2005.

5. R.M. Aiex, M.G.C. Resende, and C.C. Ribeiro. Probability distribution of solution time in GRASP: An experimental investigation. Journal of Heuristics, 8:343-373, 2002.

6. A. Alvim and C.C. Ribeiro. Balanceamento de carga na paralelização da metaheurística GRASP. In X Simpósio Brasileiro de Arquiteturas de Computadores, pages 279-282. Sociedade Brasileira de Computação, 1998.

7. A.C.F. Alvim. Estratégias de paralelização da metaheurística GRASP. Master's thesis, Departamento de Informática, PUC-Rio, Rio de Janeiro, RJ 22453-900 Brazil, April 1998.

8. E. Balas and M.J. Saltzman. An algorithm for the three-index assignment problem. Operations Research, 39:150-161, 1991.

9. R. Battiti and G. Tecchiolli. Parallel biased search for combinatorial optimization: Genetic algorithms and TABU. Microprocessors and Microsystems, 16:351-367, 1992.

10. J.E. Beasley. OR-Library: Distributing test problems by electronic mail. Journal of the Operational Research Society, 41:1069-1072, 1990.

11. S. Binato, H. Faria Jr., and M.G.C. Resende. Greedy randomized adaptive path relinking. In J.P. Sousa, editor, Proceedings of the IV Metaheuristics International Conference, pages 393-397, 2001.

12. S. Binato, W.J. Hery, D.M. Loewenstern, and M.G.C. Resende. A GRASP for job shop scheduling. In C.C. Ribeiro and P. Hansen, editors, Essays and Surveys on Metaheuristics, pages 58-79. Kluwer Academic Publishers, 2002.

13. R. Burkard, S. Karisch, and F. Rendl. QAPLIB - A quadratic assignment problem library. European Journal of Operational Research, 55:115-119, 1991.

14. S.A. Canuto, M.G.C. Resende, and C.C. Ribeiro. Local search with perturbations for the prize-collecting Steiner tree problem in graphs. Networks, 38:50-58, 2001.

15. J.M. Chambers, W.S. Cleveland, B. Kleiner, and P.A. Tukey. Graphical Methods for Data Analysis. Chapman & Hall, 1983.

16. V.-D. Cung, S.L. Martins, C.C. Ribeiro, and C. Roucairol. Strategies for the parallel implementation of metaheuristics. In C.C. Ribeiro and P. Hansen, editors, Essays and Surveys in Metaheuristics, pages 263-308. Kluwer Academic Publishers, 2002.

17. G. Dahl and B. Johannessen. The 2-path network design problem. Networks, 43:190-199, 2004.
18. N. Dodd. Slow annealing versus multiple fast annealing runs: An empirical investigation. Parallel Computing, 16:269-272, 1990.

19. L.M.A. Drummond, L.S. Vianna, M.B. Silva, and L.S. Ochi. Distributed parallel metaheuristics based on GRASP and VNS for solving the traveling purchaser problem. In Proceedings of the Ninth International Conference on Parallel and Distributed Systems - ICPADS'02, pages 1-7. IEEE, 2002.

20. S. Duni, P.M. Pardalos, and M.G.C. Resende. Parallel metaheuristics for combinatorial optimization. In R. Corrêa, I. Dutra, M. Fiallos, and F. Gomes, editors, Models for Parallel and Distributed Computation - Theory, Algorithmic Techniques and Applications, pages 179-206. Kluwer Academic Publishers, 2002.

21. H.M.M. Ten Eikelder, M.G.A. Verhoeven, T.W.M. Vossen, and E.H.L. Aarts. A probabilistic analysis of local search. In I.H. Osman and J.P. Kelly, editors, Metaheuristics: Theory & Applications, pages 605-618. Kluwer Academic Publishers, 1996.

22. H. Faria Jr., S. Binato, M.G.C. Resende, and D.J. Falcão. Transmission network design by a greedy randomized adaptive path relinking approach. IEEE Transactions on Power Systems, 20(1):43-49, 2005.

23. T.A. Feo and M.G.C. Resende. A probabilistic heuristic for a computationally difficult set covering problem. Operations Research Letters, 8:67-71, 1989.

24. T.A. Feo and M.G.C. Resende. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6:109-133, 1995.

25. T.A. Feo, M.G.C. Resende, and S.H. Smith. A greedy randomized adaptive search procedure for maximum independent set. Operations Research, 42:860-878, 1994.

26. P. Festa and M.G.C. Resende. GRASP: An annotated bibliography. In C.C. Ribeiro and P. Hansen, editors, Essays and Surveys in Metaheuristics, pages 325-367. Kluwer Academic Publishers, 2002.

27. P. Festa and M.G.C. Resende. An annotated bibliography of GRASP. Technical Report TD-SWYSEW, AT&T Labs Research, Florham Park, NJ 07932, February 2004.

28. A.M. Frieze. Complexity of a 3-dimensional assignment problem. European Journal of Operational Research, 13:161-164, 1983.

29. M.R. Garey and D.S. Johnson. Computers and Intractability - A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979.

30. A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Mancheck, and V. Sunderam. PVM: Parallel Virtual Machine, A User's Guide and Tutorial for Networked Parallel Computing. Scientific and Engineering Computation. MIT Press, Cambridge, MA, 1994.
31. F. Glover. Tabu search and adaptive memory programming - Advances, applications and challenges. In R.S. Barr, R.V. Helgason, and J.L. Kennington, editors, Interfaces in Computer Science and Operations Research, pages 1-75. Kluwer, 1996.

32. F. Glover. Multi-start and strategic oscillation methods - Principles to exploit adaptive memory. In M. Laguna and J.L. Gonzales-Velarde, editors, Computing Tools for Modeling, Optimization and Simulation: Interfaces in Computer Science and Operations Research, pages 1-24. Kluwer, 2000.

33. F. Glover and M. Laguna. Tabu Search. Kluwer Academic Publishers, 1997.

34. F. Glover, M. Laguna, and R. Marti. Fundamentals of scatter search and path relinking. Technical report, Graduate School of Business and Administration, University of Colorado, Boulder, CO 80309-0419, 2000.

35. H. Hoos and T. Stützle. Towards a characterisation of the behaviour of stochastic local search algorithms for SAT. Artificial Intelligence, 112:213-232, 1999.

36. Kendall Square Research. KSR Parallel Programming. 170 Tracer Lane, Waltham, MA, February 1992.
37. M. Laguna and R. Marti. GRASP and path relinking for 2-layer straight line crossing minimization. INFORMS Journal on Computing, 11:44-52, 1999.

38. J.K. Lenstra and A.H.G. Rinnooy Kan. Computational complexity of discrete optimization problems. Annals of Discrete Mathematics, 4:121-140, 1979.

39. Y. Li, P.M. Pardalos, and M.G.C. Resende. A greedy randomized adaptive search procedure for the quadratic assignment problem. In P.M. Pardalos and H. Wolkowicz, editors, Quadratic Assignment and Related Problems, volume 16 of DIMACS Series on Discrete Mathematics and Theoretical Computer Science, pages 237-261. American Mathematical Society, 1994.

40. S.L. Martins, P.M. Pardalos, M.G.C. Resende, and C.C. Ribeiro. Greedy randomized adaptive search procedures for the Steiner problem in graphs. In P.M. Pardalos, S. Rajasekaran, and J. Rolim, editors, Randomization Methods in Algorithmic Design, volume 43 of DIMACS Series on Discrete Mathematics and Theoretical Computer Science, pages 133-145. American Mathematical Society, 1999.

41. S.L. Martins, M.G.C. Resende, C.C. Ribeiro, and P.M. Pardalos. A parallel GRASP for the Steiner tree problem in graphs using a hybrid local search strategy. Journal of Global Optimization, 17:267-283, 2000.

42. S.L. Martins, C.C. Ribeiro, and I. Rosseti. Applications and parallel implementations of metaheuristics in network design and routing. Lecture Notes in Computer Science, 3285:205-213, 2004.
43. S.L. Martins, C.C. Ribeiro, and M.C. Souza. A parallel GRASP for the Steiner problem in graphs. In A. Ferreira and J. Rolim, editors, Proceedings of IRREGULAR'98 - 5th International Symposium on Solving Irregularly Structured Problems in Parallel, volume 1457 of Lecture Notes in Computer Science, pages 285-297. Springer-Verlag, 1998.

44. R.A. Murphey, P.M. Pardalos, and L.S. Pitsoulis. A parallel GRASP for the data association multidimensional assignment problem. In P.M. Pardalos, editor, Parallel Processing of Discrete Problems, volume 106 of The IMA Volumes in Mathematics and Its Applications, pages 159-180. Springer-Verlag, 1998.

45. L.J. Osborne and B.E. Gillett. A comparison of two simulated annealing algorithms applied to the directed Steiner problem on networks. ORSA Journal on Computing, 3:213-225, 1991.

46. P.M. Pardalos, L.S. Pitsoulis, and M.G.C. Resende. A parallel GRASP implementation for the quadratic assignment problem. In A. Ferreira and J. Rolim, editors, Parallel Algorithms for Irregularly Structured Problems - Irregular'94, pages 115-133. Kluwer Academic Publishers, 1995.

47. P.M. Pardalos, L.S. Pitsoulis, and M.G.C. Resende. A parallel GRASP for MAX-SAT problems. Lecture Notes in Computer Science, 1184:575-585, 1996.

48. W.P. Pierskalla. The tri-substitution method for the three-dimensional assignment problem. CORS Journal, 5:71-81, 1967.

49. M. Prais and C.C. Ribeiro. Reactive GRASP: An application to a matrix decomposition problem in TDMA traffic assignment. INFORMS Journal on Computing, 12:164-176, 2000.

50. M.G.C. Resende. Computing approximate solutions of the maximum covering problem using GRASP. Journal of Heuristics, 4:161-171, 1998.

51. M.G.C. Resende, T.A. Feo, and S.H. Smith. Algorithm 787: Fortran subroutines for approximate solution of maximum independent set problems using GRASP. ACM Transactions on Mathematical Software, 24:386-394, 1998.

52. M.G.C. Resende, P.M. Pardalos, and Y. Li. Algorithm 754: Fortran subroutines for approximate solution of dense quadratic assignment problems using GRASP. ACM Transactions on Mathematical Software, 22:104-118, 1996.

53. M.G.C. Resende, L.S. Pitsoulis, and P.M. Pardalos. Fortran subroutines for computing approximate solutions of MAX-SAT problems using GRASP. Discrete Applied Mathematics, 100:95-113, 2000.

54. M.G.C. Resende and C.C. Ribeiro. A GRASP for graph planarization. Networks, 29:173-189, 1997.

55. M.G.C. Resende and C.C. Ribeiro. Greedy randomized adaptive search procedures. In F. Glover and G. Kochenberger, editors, Handbook of Metaheuristics, pages 219-249. Kluwer Academic Publishers, 2002.
56. M.G.C. Resende and C.C. Ribeiro. A GRASP with path-relinking for private virtual circuit routing. Networks, 41:104-114, 2003.

57. M.G.C. Resende and C.C. Ribeiro. GRASP with path-relinking: Recent advances and applications. In T. Ibaraki, K. Nonobe, and M. Yagiura, editors, Metaheuristics: Progress as Real Problem Solvers, pages 29-63. Springer, 2005.
58. M.G.C. Resende and R.F. Werneck. A hybrid heuristic for the p-median problem. Journal of Heuristics, 10:59-88, 2004.

59. C.C. Ribeiro and M.G.C. Resende. Algorithm 797: Fortran subroutines for approximate solution of graph planarization problems using GRASP. ACM Transactions on Mathematical Software, 25:341-352, 1999.

60. C.C. Ribeiro and I. Rosseti. A parallel GRASP for the 2-path network design problem. Lecture Notes in Computer Science, 2004:922-926, 2002.

61. C.C. Ribeiro, E. Uchoa, and R.F. Werneck. A hybrid GRASP with perturbations for the Steiner problem in graphs. INFORMS Journal on Computing, 14:228-246, 2002.

62. I. Rosseti. Heurísticas para o problema de síntese de redes a 2-caminhos. PhD thesis, Department of Computer Science, Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil, July 2003.

63. B. Roy and B. Sussmann. Les problèmes d'ordonnancement avec contraintes disjonctives. Note D.S. no. 9 bis, SEMA, Paris, France, 1964.

64. B. Selman, H. Kautz, and B. Cohen. Noise strategies for improving local search. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 337-343, Seattle, 1994. MIT Press.
65. M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra. MPI - The Complete Reference, Volume 1 - The MPI Core. The MIT Press, 1998.

66. E.D. Taillard. Robust taboo search for the quadratic assignment problem. Parallel Computing, 17:443-455, 1991.

67. M.G.A. Verhoeven and E.H.L. Aarts. Parallel local search. Journal of Heuristics, 1:43-65, 1995.
15 Parallel Hybrid Metaheuristics

CARLOS COTTA¹, EL-GHAZALI TALBI², ENRIQUE ALBA¹

¹Universidad de Málaga, Spain
²Laboratoire d'Informatique Fondamentale de Lille, France
15.1 INTRODUCTION

Finding optimal solutions is in general computationally intractable for many combinatorial optimization problems, e.g., those known as NP-hard [54]. The classical approach has been the use of approximation algorithms, i.e., relaxing the goal from finding the optimal solution to obtaining solutions within some bounded distance from the former [61]. Unfortunately, it turns out that the bounds attainable in practice (that is, at a tenable computational cost) are in general too far from the optimum to be useful in many problems. The days in which researchers struggled to slightly tighten worst-case bounds that were anyway far from practical, or in which finding a PTAS (let alone an FPTAS) for a certain problem was considered a whole success, are thus swiftly coming to an end. Indeed, two new alternative lines of attack are currently being used to treat these difficulties. On one hand, a new corpus of theory is being built around the notion of fixed-parameter tractability that emanates from the field of parameterized complexity [40][41]. On the other hand, metaheuristics are being increasingly used nowadays. Quoting [27], the philosophy of these latter techniques is "try to obtain probably optimal solutions to your problem, for provably good solutions are overwhelmingly hard to obtain". See also [27][97] for some prospects on the intersection of parameterized complexity and metaheuristics.

Focusing on the latter techniques, metaheuristic approaches can be broadly categorized into two major classes: single-solution search algorithms (also known as trajectory-based or local-search-based algorithms) and multiple-solution search algorithms (also known as population-based or, arguably stretching the term, evolutionary algorithms). Examples of the former class are descent local search (LS) [109], greedy heuristics (GH) [81], simulated annealing (SA) [72], and tabu search (TS) [56]. Among the latter class, one can cite genetic algorithms (GA) [63], evolution strategies (ES) [115], genetic programming (GP) [74], ant colony optimization (ACO) [22], scatter search (SS) [55], estimation of distribution algorithms (EDAs) [79], etc. We refer the reader to [23][58][108][116] for good overviews of metaheuristics.
Over the years, interest in metaheuristics has risen considerably among researchers in combinatorial optimization. The flexibility of these techniques makes them prime candidates for tackling both new problems and variants of existing problems. This fact, together with their remarkable effectiveness (not to mention the algorithmic beauty of the paradigm), acted as a major attractor of the research community's attention. In this expanding scenario, the communal confidence in the potential of these techniques had to endure a crucial test in the middle of the 1990s, when some important theoretical results were released. To be precise, the formulation of the so-called No-Free-Lunch Theorem (NFL) by Wolpert and Macready [139] made it definitely clear that a search algorithm strictly performs in accordance with the amount and quality of the problem knowledge it incorporates. However, far from undermining the confidence in the usefulness of metaheuristics, this result contributed to make clear the need for adopting problem-dependent strategies within these techniques. Following the terminology of L. Davis [38], who together with other researchers pioneered this line of work long before these results became public, we use the term hybrid metaheuristics to denote these techniques. Here, the term "hybrid" refers to the fact that metaheuristics are typically endowed with problem-dependent knowledge by combining them with other techniques (not necessarily metaheuristic).

The best results found for many practical or academic optimization problems are obtained by hybrid algorithms. Combinations of algorithms such as descent LS, SA, TS, and EAs have provided very powerful search algorithms. The introduction of parallelism plays a major role in these hybrid algorithms. The reason is twofold: (i) the use of parallel programming techniques grants access to highly powerful computational platforms (multiprocessor systems, or distributed networks of workstations), and (ii) parallelism opens a plethora of possibilities for attaining new hybrid algorithms of increased search capability. We will here present an overview of these parallel hybrid metaheuristics, focusing both on algorithmic and computational aspects. Previously, and for the sake of self-containedness, sequential hybrid metaheuristics will be briefly surveyed.
15.2 HISTORICAL NOTES ON HYBRID METAHEURISTICS

The hybridization of metaheuristics in sequential scenarios dates back to the origins of the paradigm itself, although it was initially considered a minor issue (at least within the majority of the evolutionary computing community). Before the popularization of the NFL Theorem, general-purpose (evolutionary) metaheuristics were believed to be quasi-optimal searchers, globally better than other optimization techniques. All the theoretical nuances that our current understanding of the issue has brought (see, e.g., [42, 138]) notwithstanding, these results stress the fact that hybridizing (in its broad sense of problem augmentation) is determinant for achieving top performance. With hindsight, the group of researchers that advocated for the centrality of this hybridization philosophy were on the right track. Let us start by providing some historical background on the development of the topic.
One of the keystones in the development of hybrid metaheuristics was the conception of memetic algorithms (MAs) [106]. This term was given birth in the late 1980s to denote a family of metaheuristics that tried to blend several concepts from families that were tightly separated at that time, such as EAs and SA. The adjective memetic comes from the term meme, coined by R. Dawkins [39] to denote an analogy to gene in the context of cultural evolution. One of the first algorithms to which the MA label was assigned dates from 1988 [106] and was regarded by many as a hybrid of traditional GAs and SA. Part of the initial motivation was to find a way out of the limitations of both techniques on a well-studied combinatorial optimization problem, the Euclidean Traveling Salesman Problem (ETSP). Less than a year later, in 1989, Moscato and Norman identified several authors who were also pioneering the introduction of heuristics to improve the solutions before recombining them [59][100] (see other references and the discussion in [95] and [97]). Particularly coming from the GA field, several authors were introducing problem-domain knowledge in a variety of ways. In [95] the denomination of MAs was introduced for the first time. It was also suggested that cultural evolution can be a better working metaphor for these metaheuristics, avoiding the biologically constrained thinking that was restricting progress at that time. See also [96][97][98].

L. Davis was another of the champions of this hybrid approach to optimization. He advocated for the inclusion of problem-dependent knowledge in GAs by means of ad hoc representations or by embedding specialized algorithms within the metaheuristic. He followed this latter approach in [94], where backpropagation [120] was used as a local searcher within a GA aimed at optimizing the weights of a neural network in the learning phase. This algorithm qualifies as a MA and constitutes an approach that has been used by many other researchers in this context. We refer to [25] for a general survey of these previous works and an empirical comparison of GA, ES, and EDAs hybridized with backpropagation. Davis also studied the combination of GAs with the conspicuous k-means algorithm for classification purposes [69][70]. Regarding ad hoc representations, he provided strong empirical evidence on the usefulness of utilizing problem-specific non-binary representations within GAs, e.g., real-coded representations. These guidelines are well illustrated in [38].

It must be noted that an important part of the metaheuristic community (traditionally associated with the field of operations research, OR) grew in these years in relative isolation from the Evolutionary Computation (EC) community. They were thus largely unconstrained by disputable theoretical disquisitions, or by the dominant orthodoxy. This allowed a more pragmatic view of the optimization process, free of biologically oriented thinking. One of the most distinctive and powerful fruits of this line of research is SS. The foundations of this population-based metaheuristic can be traced back to the 1970s in the context of combining decision rules and problem constraints [55]. Despite some methodological differences with other population-based metaheuristics (e.g., SS relies more on deterministic strategies rather than on randomization), SS shares some crucial elements with MAs: both techniques are explicitly concerned with using all the problem knowledge available.
This typically results in the use of problem-dependent combination procedures and local improvement strategies: see [57][78].
The same pragmatic view that gave rise to SS can be found in the origins of asynchronous teams (A-Teams), back in the 1980s [131]. These techniques generalized the notion of population from a set of solutions to a set of any structures relevant for the problem at hand (e.g., solutions and partial solutions may coexist). This set of structures acts as a shared memory, much like it is done in blackboard systems [104]. A set of agents operate on this shared memory following a predefined dataflow in order to produce new structures. These agents are assimilable to the operators of population-based metaheuristics and, as in MAs and EAs, may comprise local improvers, constructive heuristics, and selection and replacement procedures. See [130] for more details.
15.3 CLASSIFYING HYBRID METAHEURISTICS

It is possible to fit the vast majority of hybrid metaheuristics into some of the major hybrid templates described in the previous section. Nevertheless, it is worth trying to provide a more systematic characterization of hybrid metaheuristics, so that the structure of the algorithm can be easily grasped. Several authors have attempted such classification schemes. We will here address two of these.

E.-G. Talbi [128] proposed a mixed hierarchical-flat classification scheme. The hierarchical component captures the structure of the hybrid, whereas the flat component specifies the features of the algorithms involved in the hybrid. More precisely, the structure of the hierarchical portion of the taxonomy is shown in the upper part of Figure 15.1. At the first level, we may distinguish between low-level and high-level hybridizations. The low-level hybridization addresses the functional composition of a single optimization method. In this hybrid class, a given function of a metaheuristic is replaced by another metaheuristic. On the contrary, in high-level hybrid algorithms, the different metaheuristics are self-contained: we have no direct relationship to the internal workings of a metaheuristic. As to relay hybridization, a set of metaheuristics are applied one after another, each using the output of the previous one as its input, acting in a pipeline fashion. On the other hand, teamwork hybridization represents cooperative optimization models, in which we have many parallel cooperating agents, where each agent carries out a search in a solution space. Four classes are thus derived from this hierarchical taxonomy:
• LRH (Low-level Relay Hybrid). Typical examples belonging to this class are MAs or SS where local improvement is performed by some non-general-purpose mechanism. For example, in [2], the TSP is tackled using a MA endowed with 2-opt optimization. Also for this problem, [91] defines a LRH hybrid combining SA with LS. In both cases, the main idea is to embed deterministic LS techniques into the metaheuristic so that the latter explores only local optima.

• LTH (Low-level Teamwork Hybrid). This class typically comprises combinations of metaheuristics with strong exploring capability (e.g., most EAs) with exploitation-oriented metaheuristics (e.g., most single-solution metaheuristics).
Fig. 15.1 Talbi's classification of hybrid metaheuristics.
In such a combination, the former algorithms will try to optimize locally, while the population-based algorithms will try to optimize globally. For example, when a GA is used as a global optimizer, it can be augmented with HC or SA to perform local search (a typical local-search-based MA; a minimal sketch of this scheme is given after this list). This can be done in a variety of ways. First of all, it is possible to use the local search algorithm as a mutation operator. There are numerous examples of this strategy, using HC [125][134][68], TS [51][50][71][133], or SA [16][19][137]. This kind of operator is usually qualified as Lamarckian, referring to the fact that individuals are replaced by the local optima found after applying local search (contrary to the Baldwin model, where the local optimum is just used to evaluate the individual). Another possibility is using greedy procedures and/or local search within crossover, e.g., [86][107]. A similar strategy has been used in non-crossover-based metaheuristics such as ACO [127][124], where local search has been introduced to intensify the search.
• HRH (High-level Relay Hybrid). In a HRH hybrid, self-contained metaheuristics are executed in a sequence. For example, it is well known that EAs are not well suited for fine-tuning structures which are very close to optimal solutions. Instead, the strength of EAs is in quickly locating the high-performance regions of vast and complex search spaces. Once those regions are located, it may
be useful to apply local search heuristics to the high-performance structures evolved by the EA. Many authors have used the idea of HRH hybridization for EAs. In [87][129], the authors introduce SA and TS, respectively, to improve the population obtained by a GA. In [105], the author introduces HC to improve the results obtained by an ES. In [85], the algorithm proposed starts from SA and uses GAs to enrich the solutions found. Experiments performed on the graph partitioning problem using the TS algorithm to exploit the result found by a GA give better results than a search performed either by the GA or the TS alone [129].
• HTH (High-level Teamwork Hybrid). The HTH scheme involves several self-contained algorithms performing a search in parallel and cooperating to find an optimum. Ideally, HTH would perform at least as well as one algorithm alone, although there might be undesired detrimental interactions (i.e., one algorithm could mislead another one, or let it fall within a fitness trap). The HTH hybrid model has been applied to SA [44], GP [73], ES [135], ACO [89], SS [37], and TS [45], among others.
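As an illustration of the LTH scheme referred to above, the following Python sketch embeds a hill-climbing local searcher into a GA as a Lamarckian mutation-like operator. The bit-flip neighborhood, the operator names, and the population machinery they would plug into (selection, crossover) are assumptions of the sketch, not a description of any specific algorithm cited in this chapter.

```python
import random

def hill_climb(solution, fitness):
    """First-improvement bit-flip hill climbing (the embedded local searcher)."""
    improved = True
    while improved:
        improved = False
        for i in range(len(solution)):
            neighbor = solution[:]
            neighbor[i] = 1 - neighbor[i]
            if fitness(neighbor) > fitness(solution):
                solution, improved = neighbor, True
                break
    return solution

def lamarckian_mutation(individual, fitness, rate=0.1):
    """Low-level teamwork hybrid: local search used in place of plain mutation.

    With probability `rate`, the individual is replaced by the local optimum
    found (Lamarckian model) rather than merely re-evaluated at it (Baldwin model).
    """
    if random.random() < rate:
        individual = hill_climb(individual, fitness)
    return individual
```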
As to the flat classification, several dichotomies are defined:

• Homogeneous versus heterogeneous. In homogeneous hybrids, all the combined algorithms use the same metaheuristic. Hybrid algorithms such as the island model for GAs [111] belong to this class of hybrids. Arguably, the term 'hybrid' is somewhat forced with homogeneous algorithms, unless different parameters are used in each of them. For example, in HTH hybrids based on tabu search, the algorithms may be initialized with different initial solutions, tabu list sizes, etc. [136].
In heterogeneous algorithms, different metaheuristics are used. A heterogeneous HTH algorithm based on genetic algorithms and tabu search has been proposed in [34] to solve a network design problem. The population of the GA is asynchronously updated by multiple tabu search algorithms. The best solutions found by the tabu search algorithms build an elite population for the GA. The GRASP method (Greedy Randomized Adaptive Search Procedure) may be seen as an iterated heterogeneous HRH hybrid, in which local search is repeated from a number of initial solutions generated by a randomized greedy heuristic [48][47]. The method is called adaptive because the greedy heuristic takes into account the decisions of the preceding iterations [46].

• Global versus partial. In global hybrids, all the algorithms search in the whole search space. The goal here is to explore the space more thoroughly. All the above-mentioned hybrids are global hybrids, in the sense that all the algorithms solve the whole optimization problem. A global HTH algorithm based on tabu search has been proposed in [32], where each tabu search task performs a given number of iterations and then broadcasts the best solution. The best of all solutions becomes the initial solution for the next phase.
In partial hybrids, the problem to be solved is decomposed into subproblems, each one having its own search space. Then, each algorithm is dedicated to the search in one of these subspaces. Generally speaking, the subproblems are all linked with each other, thus involving constraints between the optima found by each algorithm. Hence, the algorithms communicate in order to respect these constraints and build a globally viable solution to the problem. This approach has been applied for GAs [66] and for SA with TS algorithms [126].
• Specialist versus general. All the above-mentioned hybrids are general hybrids, in the sense that all the algorithms solve the same target optimization problem. Specialist hybrids combine algorithms which solve different problems. An example of such a HTH approach has been developed in [10] to solve the quadratic assignment problem (QAP). A parallel TS is used to solve the QAP, while a GA performs a diversification task, which is formulated as another optimization problem. The frequency memory stores information relative to all the solutions visited by TS. The GA refers to the frequency memory to generate solutions lying in unexplored regions. Another approach of specialist hybrid HRH heuristics is to use a heuristic to optimize another heuristic, i.e., to find the optimal values of the parameters of the heuristic. This approach has been used to optimize SA and noisy methods by a GA [77], ACO by a GA [3], and a GA by a GA [121].
C. Cotta [24] proposed another taxonomy for hybrid metaheuristics. This taxonomy has the dichotomy strong vs. weak as its root. First of all, strong hybridization refers to the placement of problem knowledge into the core of the algorithm, affecting its internal components. Specifically, a representation of problem solutions allowing the extraction of their most important features, and reproductive operators working on that representation, are required. These two aspects are closely related and must be carefully selected to obtain adequate performance. The term 'strong' reflects the tight coupling that exists between the basic model and the included knowledge. Examples of strong hybrid algorithms include metaheuristics using ad hoc operators tailored to deal with the particulars of the problem, e.g., the Edge Assembly Crossover operator defined in [102] for the TSP (involving a greedy repair algorithm), or the different recombination operators defined in [29] for flowshop scheduling (involving the manipulation of non-trivial features of solutions) or in [28] for Bayesian network inference (involving the manipulation of phenotypic information). Strong hybridization also comprises algorithms with nontrivial genotype-to-phenotype mappings, for example, the use of decoders [93] in order to produce non-homogeneous representations [24][119] (representations that do not cover the search space uniformly, but give preference to promising regions), as in, e.g., [20] for the multi-constraint knapsack problem and the set covering problem. Problem-space search [30][123] (the use of a metaheuristic with an embedded construction heuristic that is guided through problem space) also falls within this class.

On the other hand, it is possible to combine algorithms performing different searches, thus resulting in a weak hybrid algorithm. This term tries to capture the fact that hybridization takes place at a higher level and through a well-defined interface, therefore allowing the algorithms to retain their identities.
Fig. 15.2  Taxonomy of weak hybrid algorithms (control axis: cooperative vs. coercive; temporal axis: synchronous vs. asynchronous).
This term tries to capture the fact that hybridization takes place at a higher level and through a well-defined interface, therefore allowing the algorithms to retain their identities. This terminology is consistent with the classification of problem-solving strategies in artificial intelligence as strong and weak methods [92]. It also relates closely to the low-level vs. high-level dichotomy in Talbi's taxonomy and, as in the former, it allows further refinement. To be precise, a three-dimensional basis is defined (see Figure 15.2), each of whose axes supports a certain dichotomy, as described below:
Control axis. This axis determines whether the two algorithms interact at the same level or there exists a hierarchical relationship between them. The first case corresponds to cooperative systems, in which each algorithm performs its own search, with eventual information exchanges. The second case describes coercive models, in which one of the algorithms assumes the role of master, imposing the way the second algorithm has to perform its search (either by means of external control of the parameterization or by setting exploration constraints). Usually, cooperative models are associated with techniques having similar computational requirements, so that a useful information exchange can be performed. For example, the homogeneous hybrids mentioned before (e.g., [111][136]) and, in general, go-with-the-winners algorithms [8][110] are comprised here. On the other hand, coercive models are usually associated with embedded metaheuristics, i.e., high level relay hybrids. Besides the HRH mentioned before, one can cite the hybridizations of exact and heuristic techniques defined in [31], where branch-and-bound (BnB) is used as a crossover operator.

Temporal axis. This axis captures those aspects related to when the interaction between algorithms takes place and what the behavior of the algorithm between
interactions is. In principle, there are two major possibilities, synchronism and asynchronism. The first case comprises those models in which one of the algorithms waits at a synchronization point. Examples of such synchronous algorithms are relay hybrids as described before, as well as teamwork hybrids with synchronous interaction (e.g., synchronous migration-based EAs [7]). The second case comprises asynchronous teamwork hybrids as well as some mixtures of teamwork and relay hybrids (e.g., in [26] a coercive HRH model is asynchronously parallelized, resulting in the simultaneous execution of a GA master and several BnB slaves; recall that the sequential version of this hybrid would be a synchronous coercive model).
Spatial axis. This axis allows classifying weak hybrids from the point of view of the search space explored by each of the involved algorithms. To be precise, open and closed models can be distinguished. In the former, there are no constraints on the search space considered by each of the algorithms, i.e., any point in the search space is potentially reachable. In closed models, exploration is restricted to a certain subspace. This can be due to external coercion by one of the algorithms or to an internal feature of the algorithm behavior (e.g., think of decomposition approaches, in which each algorithm is assigned a portion of the search space). Local-search-based MAs are typical examples of open models, since the local searcher can potentially visit any point of the search space (obviously, the features of the search landscape are determinant for this purpose; anyway, this does not follow from internal algorithmic reasons, but depends on the input to the algorithm). Another open (synchronous coercive) model can be found in [53], where a GA is used to explore the queue of open problems in a BnB algorithm. On the other hand, the globally optimal forma completion approach defined in [114] is a closed model (during recombination, some features of the solution are fixed, and the subspace of solutions with those features is exhaustively explored to determine the best solution). A related closed model is the dynastically optimal recombination defined in [31], where the subspace comprising all solutions that can be built using the information contained in a set of parents is exhaustively explored during recombination.
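To make the last closed model more concrete, the following sketch enumerates, for binary strings and two parents, the subspace of offspring whose genes are taken position by position from one of the parents, and keeps the best one. It is a minimal illustration of the idea of dynastically optimal recombination under an assumed toy fitness function (OneMax); it is not the operator of [31] itself.

#include <iostream>
#include <vector>

// Toy fitness: number of ones (OneMax); any problem-specific evaluation could be used.
static int fitness(const std::vector<int>& x) {
    int s = 0;
    for (int g : x) s += g;
    return s;
}

// Exhaustively explore the "dynastic potential" of two parents: every offspring
// whose genes come, position by position, from one of the two parents.
static std::vector<int> dynasticallyOptimalCrossover(const std::vector<int>& p1,
                                                     const std::vector<int>& p2) {
    std::vector<size_t> diff;                        // positions where the parents differ
    for (size_t i = 0; i < p1.size(); ++i)
        if (p1[i] != p2[i]) diff.push_back(i);

    std::vector<int> best = p1;
    int bestFit = fitness(best);
    // Enumerate all 2^|diff| assignments of the differing positions.
    for (unsigned long mask = 0; mask < (1UL << diff.size()); ++mask) {
        std::vector<int> child = p1;
        for (size_t b = 0; b < diff.size(); ++b)
            if (mask & (1UL << b)) child[diff[b]] = p2[diff[b]];
        int f = fitness(child);
        if (f > bestFit) { bestFit = f; best = child; }
    }
    return best;                                     // best solution of the explored subspace
}

int main() {
    std::vector<int> p1{1, 0, 1, 0, 0, 1};
    std::vector<int> p2{0, 1, 1, 1, 0, 0};
    std::vector<int> c = dynasticallyOptimalCrossover(p1, p2);
    for (int g : c) std::cout << g;
    std::cout << "  fitness = " << fitness(c) << "\n";
}

Note that the cost of the enumeration grows exponentially with the number of genes in which the parents differ, which is why this kind of closed recombination is normally applied to parents that are already similar or is combined with pruning.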
The taxonomies described in this section are aimed at specifying the algorithmic and functional details of a hybrid metaheuristic. The next section is devoted to some computational details affecting the implementation of these hybrids.
15.4 IMPLEMENTING PARALLEL HYBRID METAHEURISTICS

Parallelism can be brought into hybrid metaheuristics in many ways. A taxonomy for this purpose has been proposed in [128]. We concentrate here on the parallelization of hybrid metaheuristics on general-purpose computers, since this is the most widespread computational platform. Parallel hybrids may first be classified according to the characteristics of the target parallel architecture:
SIMD versus MIMD. In SIMD (Single Instruction stream, Multiple Data stream) parallel machines, the processors are restricted to execute the same program. They are very efficient in executing synchronized parallel algorithms that contain regular computations and regular data transfers. Thus, SIMD machines have been used for some parallel hybrid algorithms, such as an HTH based on TS arranged in a two-dimensional cyclic mesh on a MasPar MPP-1 [43]. When the computations or the data transfers become irregular or asynchronous, SIMD machines become much less efficient. In MIMD (Multiple Instruction stream, Multiple Data stream) parallel machines, the processors are allowed to perform different types of instructions on different data. HTH hybrids based on TS [49][45], SA, and GAs [101] have been implemented on networks of transputers.
Shared-memory versus Distributed-memory. The main advantage of parallel hybrids implemented on shared-memory parallel architectures is their simplicity. However, parallel distributed-memory architectures offer a more flexible and fault-tolerant programming platform.

Homogeneous versus Heterogeneous. Most massively parallel machines (MPPs) and clusters of workstations (COWs), such as the IBM SP/2, the Cray T3D, and DEC Alpha farms, are composed of homogeneous processors. The proliferation of powerful workstations and fast communication networks has fostered the emergence of heterogeneous networks of workstations (NOWs) as platforms for high-performance computing (see Figure 15.3).
Fig. 15.3  Parallel implementation of heterogeneous HTH algorithms.
Parallel hybrid metaheuristics can also be classified according to whether the number and/or the location of work (tasks, data) depend on the load state of the target parallel machine:
Static. This category represents parallel heuristics in which both the number of tasks of the application and the location of work (tasks or data) are fixed at compile time (static scheduling). The allocation of processors to tasks (or data) remains unchanged during the execution of the application, regardless of the current state of the parallel machine. Most of the proposed parallel heuristics belong to this class. An example of such an approach for TS is presented in [112]: the neighborhood is partitioned into equal-size partitions depending on the number of workers, which is equal to the number of processors of the parallel machine. In [18], the number of tasks generated depends on the size of the problem and is equal to n², where n is the problem size. In [11], a parallel GA is proposed in which the number of tasks generated is equal to the population size, which is fixed at compile time. When there are noticeable load or power differences between processors, the search time of such a static approach is determined by the maximum execution time over all processors (presumably on the most heavily loaded or the least powerful processor). A significant number of tasks are then often idle, waiting for other tasks to complete their work.
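As a small illustration of the static scheme just described, the following sketch splits a neighborhood of candidate moves into equal-size chunks, one per worker thread, with the assignment fixed before the search starts. It is a generic C++ sketch in which the move evaluation is a stand-in; it is not the implementation of [112].

#include <algorithm>
#include <iostream>
#include <thread>
#include <vector>

// Stand-in for evaluating the cost of applying move i to the current solution.
static double evaluateMove(int i) { return (i * 37 % 101) - 50.0; }

int main() {
    const int neighborhoodSize = 1000;   // number of candidate moves
    const int numWorkers = 4;            // fixed number of workers (static scheduling)
    std::vector<double> bestOfWorker(numWorkers, 1e9);
    std::vector<std::thread> workers;

    // Each worker is statically assigned one equal-size slice of the neighborhood.
    const int chunk = (neighborhoodSize + numWorkers - 1) / numWorkers;
    for (int w = 0; w < numWorkers; ++w) {
        workers.emplace_back([w, chunk, neighborhoodSize, &bestOfWorker]() {
            int begin = w * chunk;
            int end = std::min(begin + chunk, neighborhoodSize);
            for (int i = begin; i < end; ++i)
                bestOfWorker[w] = std::min(bestOfWorker[w], evaluateMove(i));
        });
    }
    for (auto& t : workers) t.join();

    // The master keeps the best move found over all partitions.
    double best = *std::min_element(bestOfWorker.begin(), bestOfWorker.end());
    std::cout << "best move cost: " << best << "\n";
}

If the per-move evaluation cost or the processor speeds differ, the slowest slice dictates the iteration time, which is precisely the weakness that motivates the dynamic class below.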
Dynamic. To improve the performance of parallel static heuristics, dynamic load balancing must be introduced [12][112]. This class represents heuristics for which the number of tasks is fixed at compile time, but the location of work (tasks, data) is determined and/or changed at run time. Load-balancing requirements are met in [112] by a dynamic redistribution of work between processors: during the search, each time a task finishes its work, it issues a work demand. However, the degree of parallelism in this class of algorithms is not related to load variation in the target machine: when the number of tasks exceeds the number of idle nodes, multiple tasks are assigned to the same node; moreover, when there are more idle nodes than tasks, some of them will not be used.
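The work-demand mechanism described above can be sketched with a shared cursor from which workers pull chunks of the neighborhood as soon as they become idle. Again, this is a hedged, generic C++ illustration (the chunk size and the move evaluation are placeholders), not the scheme of [112].

#include <algorithm>
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

static double evaluateMove(int i) { return (i * 57 % 211) - 100.0; }

int main() {
    const int neighborhoodSize = 1000;
    const int chunkSize = 20;            // granularity of one "work demand"
    const int numWorkers = 4;
    std::atomic<int> nextIndex{0};       // shared cursor playing the role of a work queue
    std::vector<double> bestOfWorker(numWorkers, 1e9);
    std::vector<std::thread> workers;

    for (int w = 0; w < numWorkers; ++w) {
        workers.emplace_back([w, chunkSize, neighborhoodSize, &nextIndex, &bestOfWorker]() {
            while (true) {
                // A finished worker "demands" the next chunk of moves.
                int begin = nextIndex.fetch_add(chunkSize);
                if (begin >= neighborhoodSize) break;
                int end = std::min(begin + chunkSize, neighborhoodSize);
                for (int i = begin; i < end; ++i)
                    bestOfWorker[w] = std::min(bestOfWorker[w], evaluateMove(i));
            }
        });
    }
    for (auto& t : workers) t.join();
    std::cout << "best move cost: "
              << *std::min_element(bestOfWorker.begin(), bestOfWorker.end()) << "\n";
}

With this pull model, fast or lightly loaded workers simply take more chunks, which is the behavior the dynamic class aims for.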
Adaptive. Parallel adaptive programs are parallel computations with a dynamically changing set of tasks. Tasks may be created or killed as a function of the load state of the parallel machine: a task is created automatically when a node becomes idle, and when a node becomes busy, the task is killed. In [10], a parallel adaptive implementation has been proposed for an HTH combining TS and GAs.
15.5 APPLICATIONS OF PARALLEL HYBRID METAHEURISTICS
This section provides an overview of the numerous applications of parallel hybrid metaheuristics. Of course, this overview is far from exhaustive, since new applications are being developed continuously; however, it is intended to be illustrative of the practical impact of these optimization techniques. Table 15.1 shows such an illustrative sample of applications. It must be noted that we have focused on those applications involving a truly parallel implementation of a hybrid metaheuristic. For further information about applications of parallel hybrid metaheuristics, we suggest querying bibliographical databases or web search engines for the keywords "parallel hybrid metaheuristic" or "parallel hybrid evolutionary algorithm".
Table 15.1  Some applications of parallel hybrid metaheuristics (domain: problems; references)

Combinatorial Optimization: 0-1 Integer Linear Programming, Boolean Function, Capacitated Network Design, Graph Coloring, Graph Partitioning, Independent Set, Maximum Cut, Multicommodity Allocation, Packing, Quadratic Assignment, Set Partitioning, Traveling Salesman Problem, Vehicle Routing ([5][9][13][15][16][18][26][30][33][35][60][62][64][65][73][75][76][77][80][82][83][84][88][90][99][100][103][117][122][126])

Engineering and Electronics: Cell Placement, Floorplan Design, VLSI Design

Functional Optimization ([101][111][132][135])

Machine Learning: Neural Networks ([4][67])

Physics and Chemistry: Molecular Structure ([113][140])

Scheduling: Resource Allocation, Task Allocation, Task Scheduling ([52][112])

Telecommunications: Antenna Placement, Error Correcting Codes, Frequency Assignment, Mapping ([6][43])
15.6 CONCLUSIONS
This chapter has presented a general overview of hybrid metaheuristics, with special emphasis on their utilization in parallel environments. For this purpose, we have surveyed different taxonomies of these techniques, covering both algorithmic and computational aspects of parallel metaheuristics. As mentioned before, pure (i.e., general-purpose, not augmented with problem knowledge) population-based heuristics such as GAs, GP, ES, and ACO are in general not well suited to searching high-dimensional combinatorial spaces; hence the need for hybridization with other techniques in order to achieve practical results. A very common strategy consists in hybridizing population-based metaheuristics with local search heuristics (open synchronous coercive HRH models), as is done in MAs/SS. Very often this is done in sequential settings, although the authors indicate the parallelization of their algorithms as future work, which is an indication of the growing interest in developing parallel hybrid algorithms. Parallel schemes provide novel ways to parallelize hybrid algorithms by offering new algorithmic models. Furthermore, parallel computing systems offer a way to provide the large computational power needed for tackling the increasing complexity of hard combinatorial optimization problems. Clusters of existing commodity workstations are a low-cost hardware alternative for running parallel metaheuristics. Obviously, issues of heterogeneity and work load arise; nevertheless, the flexibility of metaheuristics (either parallel or sequential) makes them much less sensitive to these issues than other classical optimization techniques. For this reason, they constitute excellent candidates for approaching the hardest optimization tasks in the years to come.

Acknowledgments

The first and third authors acknowledge partial funding by the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project).
REFERENCES

1. E. H. L. Aarts, F. M. I. De Bont, J. H. A. Habers, and P. J. M. Van Laarhoven. Parallel implementations of the statistical cooling algorithms. Integration, 4:209-238, 1986.
2. E. H. L. Aarts and M. G. A. Verhoeven. Genetic local search for the traveling salesman problem. In T. Back, D.B. Fogel, and Z. Michalewicz, editors, Handbook of Evolutionary Computation, pages G9.5:1-7. Institute of Physics Publishing and Oxford University Press, Bristol, New York, 1997.
3. F. Abbattista, N. Abbattista, and L. Caponetti. An evolutionary and cooperative agent model for optimization. In IEEE Int. Conf. on Evolutionary Computation ICEC'95, pages 668-671, Perth, Australia, Dec 1995.

4. P. Adamidis and V. Petridis. Co-operating populations with different evolution behaviours. In IEEE Int. Conf. on Evolutionary Computation, ICEC'96, pages 188-191, Nagoya, Japan, May 1996.

5. E. Alba, F. Almeida, M. Blesa, C. Cotta, M. Diaz, I. Dorta, J. Gabarro, J. Gonzalez, C. Leon, L. Moreno, J. Petit, J. Roda, A. Rojas, and F. Xhafa. Mallba: A library of skeletons for combinatorial optimisation. In B. Monien and R. Feldman, editors, Euro-Par 2002 Parallel Processing, volume 2400 of Lecture Notes in Computer Science, pages 927-932. Springer-Verlag, Berlin Heidelberg, 2002.

6. E. Alba, C. Cotta, F. Chicano, and A.J. Nebro. Parallel evolutionary algorithms in telecommunications: Two case studies. In Proceedings of the CACIC'02, Buenos Aires, Argentina, 2002.
7. E. Alba and J.M. Troya. Influence of the migration policy in parallel distributed GAs with structured and panmictic populations. Applied Intelligence, 12(3):163-181, 2000.

8. D. Aldous and U. Vazirani. "Go with the winners" algorithms. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, pages 492-501, Los Alamitos, CA, 1994. IEEE.

9. T. Asveren and P. Molitor. New crossover methods for sequencing problems. In H-M. Voigt, W. Ebeling, I. Rechenberg, and H-P. Schwefel, editors, Parallel Problem Solving from Nature PPSN 4, volume 1141 of LNCS, pages 290-299, Dortmund, Germany, Sept 1996. Springer-Verlag.

10. V. Bachelet, Z. Hafidi, P. Preux, and E-G. Talbi. Diversifying tabu search by genetic algorithms. In INFORMS'98 on Operations Research and Management Sciences meeting, Montreal, Canada, Apr 1998.

11. V. Bachelet, P. Preux, and E-G. Talbi. Parallel hybrid meta-heuristics: application to the quadratic assignment problem. In Parallel Optimization Colloquium POC96, pages 233-242, Versailles, France, Mar 1996.

12. P. Badeau, M. Gendreau, F. Guertin, J-Y. Potvin, and E. Taillard. A parallel tabu search heuristic for the vehicle routing problem with time windows. RR CRT-95-84, Centre de Recherche sur les Transports, Université de Montréal, Canada, Dec 1995.

13. P. Badeau, F. Guertin, M. Gendreau, J-Y. Potvin, and E. D. Taillard. A parallel tabu search heuristic for the vehicle routing problem with time windows. Transportation Research, 5(2):109-122, 1997.
14. P. Banerjee and M. Jones. A parallel simulated annealing algorithm for standard cell placement on a hypercube computer. In IEEE Int. Conf. on Computer-Aided Design, pages 34-37, Santa Clara, California, USA, Nov 1986.

15. R. Bianchini and C. Brown. Parallel genetic algorithms on distributed-memory architectures. Technical Report 436, University of Rochester, Rochester, NY, USA, Aug 1992.

16. D. E. Brown, C. L. Huntley, and A. R. Spillane. A parallel genetic heuristic for the quadratic assignment problem. In Third Int. Conf. on Genetic Algorithms ICGA'89, pages 406-415. Morgan Kaufmann, San Mateo, California, USA, July 1989.

17. A. Casotto, F. Romeo, and A. L. Sangiovanni-Vincentelli. A parallel simulated annealing algorithm for the placement of macro-cells. In IEEE Int. Conf. on Computer-Aided Design, pages 30-33, Santa Clara, California, USA, Nov 1986.

18. J. Chakrapani and J. Skorin-Kapov. Massively parallel tabu search for the quadratic assignment problem. Annals of Operations Research, 41:327-341, 1993.

19. H. Chen and N. S. Flann. Parallel simulated annealing and genetic algorithms: A space of hybrid methods. In Y. Davidor, H-P. Schwefel, and R. Manner, editors, Third Conf. on Parallel Problem Solving from Nature, pages 428-436, Jerusalem, Israel, Oct 1994. Springer-Verlag.

20. P. C. Chu. A genetic algorithm approach for combinatorial optimization problems. PhD thesis, University of London, London, UK, 1997.

21. J. Cohoon, S. Hedge, W. Martin, and D. Richards. Distributed genetic algorithms for the floorplan design problem. IEEE Trans. on Computer-Aided Design, 10(4):483-492, Apr 1991.

22. A. Colorni, M. Dorigo, and V. Maniezzo. Distributed optimization by ant colonies. In European Conf. on Artificial Life, pages 134-142. Elsevier Publishing, 1991.

23. D. Corne, M. Dorigo, and F. Glover. New Ideas in Optimization. McGraw-Hill, 1999.

24. C. Cotta. A study of hybridisation techniques and their application to the design of evolutionary algorithms. AI Communications, 11(3-4):223-224, 1998.

25. C. Cotta, E. Alba, R. Sagarna, and P. Larrañaga. Adjusting weights in artificial neural networks using evolutionary algorithms. In P. Larrañaga and J.A. Lozano, editors, Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation, pages 350-373. Kluwer Academic Publishers, Boston MA, 2001.

26. C. Cotta, J.F. Aldana, A.J. Nebro, and J.M. Troya. Hybridizing genetic algorithms with branch and bound techniques for the resolution of the TSP. In
D.W. Pearson, N.C. Steele, and R.F. Albrecht, editors, Artificial Neural Nets and Genetic Algorithms 2, pages 277-280, Wien New York, 1995. Springer-Verlag.

27. C. Cotta and P. Moscato. Evolutionary computation: Challenges and duties. In A. Menon, editor, Frontiers of Evolutionary Computation, pages 53-72. Kluwer Academic Publishers, Boston MA, 2004.

28. C. Cotta and J. Muruzabal. Towards a more efficient evolutionary induction of Bayesian networks. In J.J. Merelo et al., editors, Parallel Problem Solving From Nature VII, volume 2439 of Lecture Notes in Computer Science, pages 730-739. Springer-Verlag, Berlin, 2002.

29. C. Cotta and J.M. Troya. Genetic forma recombination in permutation flowshop problems. Evolutionary Computation, 6(1):25-44, 1998.

30. C. Cotta and J.M. Troya. A hybrid genetic algorithm for the 0-1 multiple knapsack problem. In G.D. Smith, N.C. Steele, and R.F. Albrecht, editors, Artificial Neural Nets and Genetic Algorithms 3, pages 251-255, Wien New York, 1998. Springer-Verlag.

31. C. Cotta and J.M. Troya. Embedding branch and bound within evolutionary algorithms. Applied Intelligence, 18(2):137-153, 2003.

32. T. G. Crainic, M. Toulouse, and M. Gendreau. Towards a taxonomy of parallel tabu search algorithms. Technical Report CRT-933, Centre de Recherche sur les Transports, Université de Montréal, Montreal, Canada, Sep 1993.

33. T. G. Crainic and M. Gendreau. A cooperative parallel tabu search for capacitated network design. Technical Report CRT-97-27, Centre de Recherche sur les Transports, Université de Montréal, Montreal, Canada, 1997.

34. T. G. Crainic, A. T. Nguyen, and M. Gendreau. Cooperative multi-thread parallel tabu search with evolutionary adaptive memory. 2nd Int. Conf. on Metaheuristics, Sophia Antipolis, France, July 1997.

35. T. G. Crainic, M. Toulouse, and M. Gendreau. Synchronous tabu search parallelization strategies for multi-commodity location-allocation with balancing requirements. OR Spektrum, 17:113-123, 1995.

36. W. Crompton, S. Hurley, and N.M. Stephen. A parallel genetic algorithm for frequency assignment problems. In Proceedings of IMACS SPRANN'94, pages 81-84, 1994.

37. V-D. Cung, T. Mautor, P. Michelon, and A. Tavares. Recherche dispersée parallèle. In Deuxième Congrès de la Société Française de Recherche Opérationnelle et d'Aide à la Décision ROADEF'99, Autrans, France, Jan 1999.

38. L.D. Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York NY, 1991.
39. R. Dawkins. The Selfish Gene. Clarendon Press, Oxford, 1976.

40. R. Downey and M. Fellows. Parameterized Complexity. Springer-Verlag, 1998.

41. R.G. Downey and M.R. Fellows. Fixed-parameter tractability and completeness I: Basic results. SIAM Journal on Computing, 24(4):873-921, August 1995.

42. S. Droste, T. Jansen, and I. Wegener. Perhaps not a free lunch but at least a free appetizer. In W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, editors, Proceedings of the First Genetic and Evolutionary Computation Conference (GECCO '99), pages 833-839, San Francisco CA, 13-17 July 1999. Morgan Kaufmann Publishers, Inc.

43. I. De Falco, R. Del Balio, and E. Tarantino. Solving the mapping problem by parallel tabu search. In IASTED Conf., Paris, France, 1994.

44. I. De Falco, R. Del Balio, and E. Tarantino. An analysis of parallel heuristics for task allocation in multicomputers. Computing, 59(3), 1995.
45. I. De Falco, R. Del Balio, E. Tarantino, and R. Vaccaro. Improving search by incorporating evolution principles in parallel tabu search. In Int. Conf. on Machine Learning, pages 823-828, 1994.

46. T. A. Feo and M. G. C. Resende. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6:109-133, 1995.

47. T. A. Feo, M. G. C. Resende, and S. H. Smith. A greedy randomized adaptive search procedure for maximum independent set. Operations Research, 42:860-878, 1994.

48. T. A. Feo, K. Venkatraman, and J.F. Bard. A GRASP for a difficult single machine scheduling problem. Computers and Operations Research, 18:635-643, 1991.

49. C-N. Fiechter. A parallel tabu search algorithm for large travelling salesman problems. Discrete Applied Mathematics, 51:243-267, 1994.
50. C. Fleurent and J. A. Ferland. Genetic and hybrid algorithms for graph coloring. Annals of Operations Research, 63:437-461, 1996.

51. C. Fleurent and J. A. Ferland. Genetic hybrids for the quadratic assignment problem. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 16:173-188, 1994.

52. S-M. Foo. An approach of combining simulated annealing and genetic algorithm. Master's thesis, University of Illinois, Urbana-Champaign, 1991.

53. A. P. French, A. C. Robinson, and J. M. Wilson. Using a hybrid genetic-algorithm/branch and bound approach to solve feasibility and optimization integer programming problems. Journal of Heuristics, 7(6):551-564, 2001.
54. M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman and Co., San Francisco CA, 1979.

55. F. Glover. Heuristics for integer programming using surrogate constraints. Decision Sciences, 8:156-166, 1977.
56. F. Glover. Tabu search - part I. ORSA Journal of Computing, 1(3):190-206, 1989.

57. F. Glover. Scatter search and path relinking. In D. Corne, M. Dorigo, and F. Glover, editors, New Ideas in Optimization, pages 291-316. McGraw-Hill, London, 1999.
58. F. Glover and G. Kochenberger. Handbook of Metaheuristics. Kluwer Academic Publishers, Boston MA, 2003.

59. M. Gorges-Schleuter. ASPARAGOS: An asynchronous parallel genetic optimization strategy. In J. David Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 422-427. Morgan Kaufmann Publishers, 1989.

60. M. Hifi. A genetic algorithm-based heuristic for solving the weighted maximum independent set and some equivalent problems. Journal of the Operational Research Society, 48(6):612-622, 1997.

61. D.S. Hochbaum. Approximation algorithms for NP-hard problems. International Thomson Publishing, 1996.

62. T. Hogg and C. P. Williams. Solving the really hard problems with cooperative search. In Proc. of AAAI93, pages 231-236, Menlo Park, CA, 1993. AAAI Press.

63. J. H. Holland. Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor, MI, USA, 1975.

64. C. L. Huntley and D. E. Brown. Parallel genetic algorithms with local search. Technical Report IPC-TR-90-006, University of Virginia, Charlottesville, VA, USA, July 1991.

65. C. L. Huntley and D. E. Brown. A parallel heuristic for quadratic assignment problems. Computers and Operations Research, 18:275-289, 1991.

66. P. Husbands, F. Mill, and S. Warrington. Genetic algorithms, production plan optimisation and scheduling. In H-P. Schwefel and R. Manner, editors, Parallel Problem Solving From Nature, volume 496 of LNCS, pages 80-84, Dortmund, Germany, Oct 1990. Springer-Verlag.

67. T. Ichimura and Y. Kuriyama. Learning of neural networks with parallel hybrid GA using a royal road function. In 1998 IEEE International Joint Conference on Neural Networks, volume 2, pages 1131-1136, New York, NY, 1998. IEEE.
68. P. Jog, J. Y. Suh, and D. Van Gucht. The effects of population size, heuristic crossover and local improvement on a genetic algorithm for the traveling salesman problem. In 3rd Int. Conf. on Genetic Algorithms. Morgan Kaufmann, USA, 1989.

69. J.D. Kelly and L. Davis. A hybrid genetic algorithm for classification. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, pages 645-650, Sydney, Australia, 1991. Morgan Kaufmann.

70. J.D. Kelly and L. Davis. Hybridizing the genetic algorithm and the k-nearest neighbors classification algorithm. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 377-383, San Diego, CA, 1991. Morgan Kaufmann.

71. H. Kim, Y. Hayashi, and K. Nara. The performance of hybridized algorithm of genetic algorithm, simulated annealing and tabu search for thermal unit maintenance scheduling. In 2nd IEEE Conf. on Evolutionary Computation ICEC'95, pages 114-119, Perth, Australia, Dec 1995.

72. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671-680, May 1983.

73. J. Koza and D. Andre. Parallel genetic programming on a network of transputers. Technical Report CS-TR-95-1542, Stanford University, 1995.

74. J. R. Koza. Genetic programming. MIT Press, Cambridge, USA, 1992.

75. B. Kroger, P. Schwenderling, and O. Vornberger. Parallel genetic packing of rectangles. In H-P. Schwefel and R. Manner, editors, Parallel Problem Solving from Nature, volume 496 of LNCS, pages 160-164, Dortmund, Germany, Oct 1990. Springer-Verlag.

76. B. Kroger, P. Schwenderling, and O. Vornberger. Genetic packing of rectangles on transputers. In P. Welch et al., editor, Transputing 91. IOS Press, 1991.

77. M. Krueger. Méthodes d'analyse d'algorithmes d'optimisation stochastiques à l'aide d'algorithmes génétiques. PhD thesis, Ecole Nationale Supérieure des Télécommunications, Paris, France, Dec 1993.

78. M. Laguna and R. Marti. Scatter Search. Methodology and Implementations in C. Kluwer Academic Publishers, Boston MA, 2003.

79. P. Larrañaga and J.A. Lozano. Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Boston MA, 2001.

80. G. Von Laszewski and H. Muhlenbein. Partitioning a graph with parallel genetic algorithm. In H-P. Schwefel and R. Manner, editors, Parallel Problem Solving from Nature, volume 496 of LNCS, pages 165-169, Dortmund, Germany, Oct 1990. Springer-Verlag.
81. E. L. Lawler. Combinatorial optimization: Networks and matroids. Holt, Rinehart and Winston, USA, 1976.

82. K-G. Lee and S-Y. Lee. Efficient parallelization of simulated annealing using multiple Markov chains: An application to graph partitioning. In T. N. Mudge, editor, Int. Conf. on Parallel Processing, pages 177-180. CRC Press, 1992.

83. D. Levine. A parallel genetic algorithm for the set partitioning problem. PhD thesis, Argonne National Laboratory, Illinois Institute of Technology, Argonne, USA, May 1994.

84. D. Levine. A parallel genetic algorithm for the set partitioning problem. In I.H. Osman and J.P. Kelly, editors, Meta-Heuristics: Theory & Applications, pages 23-35. Kluwer Academic Publishers, 1996.

85. F. T. Lin, C. Y. Kao, and C. C. Hsu. Incorporating genetic algorithms into simulated annealing. Proc. of the Fourth Int. Symp. on AI, pages 290-297, 1991.

86. M. Lozano, F. Herrera, N. Krasnogor, and D. Molina. Real-coded memetic algorithms with crossover hill-climbing. Evolutionary Computation, 12(3):273-302, 2004.

87. S. W. Mahfoud and D. E. Goldberg. Parallel recombinative simulated annealing: A genetic algorithm. Parallel Computing, 21:1-28, 1995.

88. M. Malek, M. Guruswamy, M. Pandya, and H. Owens. Serial and parallel simulated annealing and tabu search algorithms for the traveling salesman problem. Annals of Operations Research, 21:59-84, 1989.

89. C. E. Mariano and E. Morales. A multiple objective Ant-Q algorithm for the design of water distribution irrigation networks. In First International Workshop on Ant Colony Optimization ANTS'98, Bruxelles, Belgique, Oct 1998.

90. O. C. Martin and S. W. Otto. Combining simulated annealing with local search heuristics. Annals of Operations Research, 63:57-75, 1996.

91. O. C. Martin, S. W. Otto, and E. W. Felten. Large-step Markov chains for the TSP: Incorporating local search heuristics. Operations Research Letters, 11:219-224, 1992.

92. Z. Michalewicz. A hierarchy of evolution programs: An experimental study. Evolutionary Computation, 1(1):51-76, 1993.

93. Z. Michalewicz. Decoders. In T. Back, D.B. Fogel, and Z. Michalewicz, editors, Handbook of Evolutionary Computation, pages C5.3:1-3. Institute of Physics Publishing and Oxford University Press, Bristol, New York, 1997.

94. D. Montana and L. Davis. Training feedforward neural networks using genetic algorithms. In Proceedings of the Eleventh International Joint Conference on
Artificial Intelligence, pages 762-767, San Mateo, CA, 1989. Morgan Kaufmann.
95. P. Moscato. On Evolution, Search, Optimization, Genetic Algorithms and Martial Arts: Towards Memetic Algorithms. Technical Report Caltech Concurrent Computation Program, Report 826, California Institute of Technology, Pasadena, California, USA, 1989.

96. P. Moscato. Memetic algorithms: A short introduction. In D. Corne, M. Dorigo, and F. Glover, editors, New Ideas in Optimization, pages 219-234. McGraw-Hill, 1999.

97. P. Moscato and C. Cotta. A gentle introduction to memetic algorithms. In F. Glover and G. Kochenberger, editors, Handbook of Metaheuristics, pages 105-144. Kluwer Academic Publishers, Boston MA, 2003.

98. P. Moscato, C. Cotta, and A. Mendes. Memetic algorithms. In G.C. Onwubolu and B.V. Babu, editors, New Optimization Techniques in Engineering, pages 53-85. Springer-Verlag, Berlin Heidelberg, 2004.

99. P. Moscato and M.G. Norman. A memetic approach for the traveling salesman problem: Implementation of a computational ecology for combinatorial optimization on message-passing systems. In M. Valero, E. Onate, M. Jane, J.L. Lamba, and B. Suarez, editors, Parallel Computing and Transputer Applications, pages 187-194. IOS Press, Amsterdam, 1992.

100. H. Muhlenbein, M. Gorges-Schleuter, and O. Kramer. Evolution Algorithms in Combinatorial Optimization. Parallel Computing, 7:65-88, 1988.

101. H. Muhlenbein, M. Schomisch, and J. Born. The parallel genetic algorithm as function optimizer. Parallel Computing, 17:619-632, 1991.

102. Y. Nagata and S. Kobayashi. Edge assembly crossover: A high-power genetic algorithm for the traveling salesman problem. In T. Back, editor, Proceedings of the Seventh International Conference on Genetic Algorithms, East Lansing, USA, pages 450-457, San Mateo, CA, 1997. Morgan Kaufmann.

103. S. Niar and A. Freville. A parallel tabu search algorithm for the 0-1 multidimensional knapsack problem. In Int. Parallel Processing Symposium, Geneva, Switzerland, 1997. IEEE Society.

104. H.P. Nii. Blackboard systems, part one: The blackboard model of problem solving and the evolution of blackboard architectures. AI Magazine, 7(2):38-53, 1986.
105. V. Nissen. Solving the quadratic assignment problem with clues from nature. IEEE Transactions on Neural Networks, 5(1):66-72, Jan 1994. 106. M.G. Norman and P. Moscato. A competitive and cooperative approach to complex combinatorial search. Technical Report Caltech Concurrent Computation
Program, Report 790, California Institute of Technology, Pasadena, California, USA, 1989. Expanded version published in the Proceedings of the 20th Informatics and Operations Research Meeting, Buenos Aires (20th JAIIO), Aug. 1991, pp. 3.15-3.29.

107. U-M. O'Reilly and F. Oppacher. Hybridized crossover-based techniques for program discovery. In IEEE Int. Conf. on Evolutionary Computation ICEC'95, pages 573-578, Perth, Australia, Dec 1995.

108. I. H. Osman and G. Laporte. Metaheuristics: A bibliography. Annals of Operations Research, 63:513-628, 1996.

109. C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, 1982.

110. M. Peinado and T. Lengauer. Parallel "go with the winners" algorithms in the LogP model. In Proceedings of the 11th International Parallel Processing Symposium, pages 656-664, Los Alamitos, California, 1997. IEEE Computer Society Press.

111. C. B. Pettey, M. R. Leuze, and J. J. Grefenstette. A parallel genetic algorithm. Proc. of the Second Int. Conf. on Genetic Algorithms, MIT, Cambridge, pages 155-161, July 1987.
112. S. C. S. Porto and C. Ribeiro. Parallel tabu search message-passing synchronous strategies for task scheduling under precedence constraints. Journal of Heuristics, 1(2):207-223, 1996.

113. W.J. Pullan. Structure prediction of benzene clusters using a genetic algorithm. Journal of Chemical Information and Computer Sciences, 37(6):1189-1193, 1997.

114. N.J. Radcliffe and P.D. Surry. Fitness variance of formae and performance prediction. In L.D. Whitley and M.D. Vose, editors, Proceedings of the Third Workshop on Foundations of Genetic Algorithms, pages 51-72, San Francisco, 1994. Morgan Kaufmann.

115. I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart, Germany, 1973.

116. C. R. Reeves. Modern heuristic techniques for combinatorial problems. Blackwell Scientific Publications, Oxford, UK, 1993.

117. C. Rego and C. Roucairol. A parallel tabu search algorithm for the vehicle routing problem. In I. H. Osman and J. P. Kelly, editors, Meta-Heuristics: Theory and Applications, pages 253-295. Kluwer, Norwell, MA, USA, 1996.

118. J. S. Rose, D. R. Blythe, W. M. Snelgrove, and Z. G. Vranesic. Fast, high quality VLSI placement on a MIMD multiprocessor. In IEEE Int. Conf. on Computer-Aided Design, pages 42-45, Santa Clara, Nov 1986.
119. F. Rothlauf. Representations for Genetic and Evolutionary Algorithms. Studies in Fuzziness and Soft Computing. Physica-Verlag, Heidelberg New York, 2002.

120. D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning representations by back-propagating errors. Nature, 323:533-536, 1986.

121. K. Shahookar and P. Mazumder. A genetic approach to standard cell placement using meta-genetic parameter optimization. IEEE Trans. on Computer-Aided Design, 9(5):500-511, May 1990.

122. P. M. Sloot, J. A. Kandorp, and A. Schoneveld. Dynamic complex systems: A new approach to parallel computing in computational physics. Technical Report TR-CS-95-08, University of Amsterdam, Netherlands, 1995.

123. R.H. Storer, S.D. Wu, and R. Vaccari. New search spaces for sequencing problems with application to job-shop scheduling. Management Science, 38:1495-1509, 1992.
124. T. Stutzle and H. H. Hoos. The MAX-MIN ant system and local search for combinatorial optimization problems: Towards adaptive tools for global optimization. In 2nd Int. Conf. on Metaheuristics, pages 191-193, Sophia Antipolis, France, July 1997. INRIA.

125. J. Y. Suh and D. Van Gucht. Incorporating heuristic information into genetic search. In 2nd Int. Conf. on Genetic Algorithms, pages 100-107. Lawrence Erlbaum Associates, USA, 1987.

126. E. Taillard. Parallel iterative search methods for vehicle routing problems. Networks, 23:661-673, 1993.

127. E. D. Taillard and L. Gambardella. Adaptive memories for the quadratic assignment problem. Technical Report 87-97, IDSIA, Lugano, Switzerland, 1997.

128. E.-G. Talbi. A taxonomy of hybrid metaheuristics. Journal of Heuristics, 8(5):541-564, 2002.

129. E. G. Talbi, T. Muntean, and I. Samarandache. Hybridation des algorithmes génétiques avec la recherche tabou. In Evolution Artificielle EA94, Toulouse, France, Sep 1994.

130. S. Talukdar, L. Baerentzen, A. Gove, and P. de Souza. Asynchronous teams: Cooperation schemes for autonomous agents. Journal of Heuristics, 4(4):295-321, 1998.

131. S.N. Talukdar, S.S. Pyo, and T. Giras. Asynchronous procedures for parallel processing. IEEE Transactions on PAS, PAS-102(11), 1983.

132. R. Tanese. Parallel genetic algorithms for a hypercube. Proc. of the Second Int. Conf. on Genetic Algorithms, MIT, Cambridge, MA, USA, pages 177-183, July 1987.
133. J. Thiel and S. Voss. Some experiences on solving multiconstraint zero-one knapsack problems with genetic algorithms. INFOR, 32(4):226-242, Nov 1994.

134. N. L. J. Ulder, E. H. L. Aarts, H-J. Bandelt, P. J. M. Van Laarhoven, and E. Pesch. Genetic local search algorithms for the traveling salesman problem. In H-P. Schwefel and R. Manner, editors, Parallel Problem Solving From Nature, volume 496 of LNCS, pages 109-116, Dortmund, Germany, Oct 1990. Springer-Verlag.

135. H-M. Voigt, J. Born, and I. Santibanez-Koref. Modelling and simulation of distributed evolutionary search processes for function optimization. In H-P. Schwefel and R. Manner, editors, Parallel Problem Solving from Nature, volume 496 of LNCS, pages 373-380, Dortmund, Germany, Oct 1990. Springer-Verlag.

136. S. Voss. Network optimization problems, chapter Tabu search: Applications and prospects, pages 333-353. World Scientific, USA, 1993.

137. L-H. Wang, C-Y. Kao, M. Ouh-young, and W-C. Chen. Molecular binding: A case study of the population-based annealing genetic algorithms. In IEEE Int. Conf. on Evolutionary Computation ICEC'95, pages 50-55, Perth, Australia, Dec 1995.

138. D. Whitley. Functions as permutations: Implications for no free lunch, Walsh analysis and summary statistics. In Marc Schoenauer, Kalyanmoy Deb, Gunter Rudolph, Xin Yao, Evelyne Lutton, Juan Julian Merelo, and Hans-Paul Schwefel, editors, Parallel Problem Solving from Nature - PPSN VI, pages 169-178, Berlin, 2000. Springer.

139. D.H. Wolpert and W.G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67-82, 1997.

140. C.R. Zacharias, M.R. Lemes, and A.D. Pino. Combining genetic algorithm and simulated annealing: a molecular geometry optimization study. THEOCHEM - Journal of Molecular Structure, 430:29-39, 1998.
16 Parallel Multiobjective Optimization
A.J. NEBRO¹, F. LUNA¹, E.-G. TALBI², E. ALBA¹
¹Universidad de Málaga, Spain
²Laboratoire d'Informatique Fondamentale de Lille, France
16.1 INTRODUCTION
In recent years, much attention has been paid to the optimization of problems that involve more than one objective function [14, 23, 66], this interest being mainly motivated by the multiobjective nature of most real-world problems. The task of finding solutions for such a kind of problems is known as multiobjective optimization. Multiobjective optimization problems (MOPs) therefore have a number of objective functions to be minimized or maximized, which form a mathematical description of performance criteria that are usually in conflict with each other. More formally, solving a MOP means finding a vector $\vec{x}^* = [x_1^*, x_2^*, \ldots, x_n^*]^T$ which satisfies the $m$ inequality constraints $g_i(\vec{x}) \geq 0$, $i = 1, 2, \ldots, m$, and the $p$ equality constraints $h_i(\vec{x}) = 0$, $i = 1, 2, \ldots, p$, and optimizes the vector function $\vec{f}(\vec{x}) = [f_1(\vec{x}), f_2(\vec{x}), \ldots, f_k(\vec{x})]^T$, where $\vec{x} = [x_1, x_2, \ldots, x_n]^T$ is the vector of decision variables. Generally, multiobjective optimization is restricted not to finding a unique single solution but a set of solutions called nondominated solutions. Each solution in this set is said to be a Pareto optimum, and when they are plotted in the objective space they are collectively known as the Pareto front. Obtaining the Pareto front of a given MOP is the main goal of multiobjective optimization. The techniques used to compute a Pareto front can be classified into exact and heuristic ones. Exact methods are able to find the Pareto optimal set of a MOP, but they might need exponential computation times in the worst case, thus performing very poorly in most practical settings. On the other hand, heuristic techniques do not guarantee an optimal solution, but they provide near optimal solutions to a wide range of optimization problems in a significantly reduced amount of time. Within heuristic optimization methods, metaheuristics [9] are a subclass that try to combine basic
heuristic methods in a higher-level framework aimed at efficiently and effectively exploring the search space. Since algorithms for solving MOPs have usually been used to deal with complex nonlinear objective functions coming from real-world problems, whose computation involves highly time-consuming tasks, even heuristic methods are impractical in certain applications. In this context, parallelism arises as a possible choice in order to obtain the results in a reasonable amount of time. In fact, parallel algorithms have been widely used in the field of mono-objective optimization [36, 59]. In the case of exact techniques, a typical example is the solution of optimization problems by means of parallel branch and bound algorithms [33]. The idea is, in general, to solve the problems more rapidly or to solve more complex problems. As to the heuristic methods, parallelism is a way not only of solving problems more rapidly but of developing more efficient models of search: a parallel heuristic algorithm can be more effective than a sequential one, even when executed on a single processor. For example, the reader may refer to [5, 6] for surveys concerning this effect in parallel Evolutionary Algorithms (EAs). The advantages that parallelism offers to mono-objective optimization should also hold in multiobjective optimization. Even though few efforts were devoted to parallel implementations in this field until recent years [14], the combination of parallelism and multiobjective optimization is a growing field [79]. The first contribution of this chapter is to provide a wide review of works related to parallel metaheuristics for multiobjective optimization. We perform a classification attending to the level of parallelism that these algorithms show. Second, we explore the use of parallelism by applying it to two heuristic methods for solving MOPs. In particular, we have developed pPAES (Parallel Pareto Archived Evolution Strategy), a parallel search model based on PAES [49], and a parallel multiobjective steady-state Genetic Algorithm (ssGA) [58]. The remainder of the chapter is as follows. In the next section, we present a survey of parallel algorithms for multiobjective optimization. Next, in Section 16.3, we introduce both the pPAES algorithm and the multiobjective ssGA. Section 16.4 is devoted to the presentation and analysis of the obtained results. Finally, we summarize the conclusions and discuss several lines of future research in Section 16.5.

16.2 PARALLEL METAHEURISTICS FOR MULTIOBJECTIVE OPTIMIZATION

Several taxonomies have been proposed in order to classify the parallel implementations of metaheuristics [16, 17]. Existing works review and discuss the general design and the main strategies used in their parallelization. A widely accepted classification mainly distinguishes between strategies whose goal is basically to speed up the sequential algorithm (Type 1 parallelism in [16] or the Single-walk parallelization in [17]), and those which modify the behavior of the sequential implementation not only to search for higher speedup but also to hopefully improve the solution quality (Type 2 and Type 3 parallel strategies in [16] or the Multiple-walk parallelization in [17]).
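Before turning to the taxonomy, the notion of Pareto dominance defined in the introduction can be illustrated with a minimal sketch: a dominance test for minimization problems and a routine that filters a set of objective vectors down to its nondominated solutions. The code below is a generic C++ illustration written for this chapter, not part of any of the cited implementations.

#include <iostream>
#include <vector>

// For minimization: a dominates b if a is no worse in every objective
// and strictly better in at least one.
static bool dominates(const std::vector<double>& a, const std::vector<double>& b) {
    bool strictlyBetter = false;
    for (size_t k = 0; k < a.size(); ++k) {
        if (a[k] > b[k]) return false;
        if (a[k] < b[k]) strictlyBetter = true;
    }
    return strictlyBetter;
}

// Keep only the nondominated vectors (an approximation of the Pareto front).
static std::vector<std::vector<double>> nondominated(const std::vector<std::vector<double>>& pts) {
    std::vector<std::vector<double>> front;
    for (size_t i = 0; i < pts.size(); ++i) {
        bool dominated = false;
        for (size_t j = 0; j < pts.size() && !dominated; ++j)
            if (j != i && dominates(pts[j], pts[i])) dominated = true;
        if (!dominated) front.push_back(pts[i]);
    }
    return front;
}

int main() {
    std::vector<std::vector<double>> pts = {{1, 5}, {2, 2}, {3, 1}, {4, 4}};
    for (const auto& p : nondominated(pts))
        std::cout << "(" << p[0] << ", " << p[1] << ") ";
    std::cout << "\n";   // (1,5), (2,2), and (3,1) remain; (4,4) is dominated by (2,2)
}

Maintaining exactly this kind of nondominated set, either centrally or per search thread, is what distinguishes the CPF and DPF categories discussed below.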
Fig. 16.1  Classification of parallel metaheuristics for multiobjective optimization: single-walk parallelization (Parallel Function Evaluation, PFE; Parallel Operators, PO) and multiple-walk parallelization (Centralized Pareto Front, CPF; Distributed Pareto Front, DPF).
These taxonomies hold for multiobjective optimization algorithms, but they need a further specification for two reasons. First, real-world MOPs have to deal with the utilization of complex solvers and simulators. Among the strategies aimed solely at speeding up the computations, we therefore differentiate those that parallelize the function evaluations of the problem to optimize from those that parallelize one or more operators of the search technique. Second, the result of a multiobjective optimization procedure is not restricted to a single solution but is the set of nondominated solutions (Pareto front). This should be taken into account in the parallelization strategy, because several threads are exploring new potential solutions at the same time, and their Pareto optimality must be checked. We here distinguish between two approaches: the Pareto front is distributed and locally managed by each search thread during the computation (local nondominated solutions), or it is a centralized element of the algorithm (global nondominated solutions). An outline of this hierarchical classification is drawn in Figure 16.1. Hence, we define the following categories:

1. Single-walk parallelization. This kind of parallelism is aimed at speeding up the computations, and the basic behavior of the underlying algorithms is not changed. It is the easiest and the most widely used parallelization scheme in multiobjective optimization because of the MOPs that are usually solved in this field, e.g., real-world problems involving highly time-consuming tasks. Parallelism is applied in two ways:

(a) Parallel Function Evaluation (PFE): the evaluations of the objective functions of the MOP are performed in parallel.

(b) Parallel Operator (PO): the search operators of the method are run in parallel.
2. Multiple-walk parallelization. Besides the search for speedup, improvements in the solution quality should also be sought in parallel implementations. Although the latter is likely to be the most important contribution of parallelism to metaheuristics [17], few of such parallel search models had been especially designed for multiobjective optimization until recently [14]. A main issue in the development of this kind of algorithms is how the Pareto front is built during the optimization process. Two different approaches can be considered:
(a) Centralized Pareto Front (CPF). The front is a centralized data structure of the algorithm that is built by the search threads during the whole computation. This way, the new nondominated solutions added to the Pareto optimal set are global Pareto optima.

(b) Distributed Pareto Front (DPF). The Pareto front is distributed among the search threads, so that the algorithm works with local nondominated solutions that must be somehow combined at the end of their work.

Among the works reviewed for this chapter, no pure CPF implementation has been found that is clearly motivated by efficiency issues. All the CPF parallelizations in the literature are combined with DPF phases where local nondominated solutions are considered. After each DPF phase, a single optimal Pareto front is built by using these local Pareto optima. Then, the new Pareto front is again distributed for local computation, and so on.

In Table 16.1 we show a list of works (sorted by year) on parallel algorithms for multiobjective optimization taken from the literature. In this table, the following information is presented (in the order it appears):
- the reference(s) of the work,
- the year of the first publication related to the work,
- the parallelization strategy used (PS), where SW and MW stand for Single-Walk and Multiple-Walk, respectively,
- whether or not the algorithm considers Pareto optimality explicitly (PE),
- the membership of the algorithm to the four categories defined here,
- the implementation language and/or communication library used,
- the communication topology,
- a brief description of the algorithm, and
- its application domain.
We have decided to include the PE (Pareto Explicit) column showing whether the parallel multiobjective algorithm considers Pareto optimality directly for solving MOPs. The reason is that other MOP-like approaches have been developed, the aggregative weighted sum being the most popular [44, 76]. However, advanced approaches such as the use of fuzzy techniques [12] or multiobjective optimization via Nash equilibrium [81] can also be found. Note that when Pareto optimality is not addressed by the algorithm (see the "*" marks in the PE column of Table 16.1), the further categorization of MW parallelizations does not make sense ("-" marks in the CPF and DPF columns), because there exist no Pareto fronts to be considered. Apart from this last kind of works, Table 16.1 shows that the most recent parallel multiobjective metaheuristic algorithms are based on the Pareto optimality approach.
Of course, the four categories presented in the above taxonomy are not disjoint sets, and many of the algorithms fall into more than one (see Table 16.1, where several references have more than one of the columns PFE, PO, CPF, and DPF marked at the same time). For example, in [55] an island model parallel EA to identify constrained De Novo peptides is presented. At a first level, it follows a MW parallelization strategy considering a distributed Pareto front (DPF strategy), but, at a second level, each island also operates using a SW parallelization where the function evaluation is performed in parallel (PFE strategy). It is clear from Table 16.1 that, due to the complex objective functions involved in multiobjective optimization, when the algorithm follows a SW parallelization, a PFE strategy is the most widely used. Similarly, in MW parallelizations most works utilize a DPF (distributed) strategy instead of a CPF (centralized) one. The column "Language" in Table 16.1 shows that C/C++ is the preferred programming language for this kind of algorithms, mainly because of its efficiency. FORTRAN [69] and Java [85] implementations have also been developed. They are typically combined with the communication libraries PVM or MPI in order to obtain parallel programs, but there already exist recent works using modern Grid technologies to parallelize these algorithms [7]. Concerning the connection topology of the search threads in parallel metaheuristics, centralized approaches such as master/slave or star have usually been adopted in SW parallelizations. They are based on a master thread running the algorithm and several slaves performing the complex function evaluations. Among the MW parallelizations, fully connected topologies have also been widely used. In this approach, one or more nondominated solutions are periodically broadcast among the threads to enhance the diversity of the search. A wide spectrum of applications can be observed in the last column of Table 16.1. As stated before, most of them are real-world engineering problems. In fact, they are mainly design problems: VLSI design, aerodynamic airfoil design, or radio network design. A set of mathematical test functions [14, 78] is also typically used for the evaluation and comparison of new multiobjective algorithms. Although Table 16.1 only shows parallel metaheuristic algorithms, parallel exact techniques for solving MOPs also appear in the literature. In [60], an enumerative search algorithm based on Condor to distribute the computation is presented. In this work, the search space of various mathematical test MOPs is divided into several subspaces, each one explored by a sequential enumerative algorithm. Other examples of exact techniques are [51, 77, 78, 84]. Hybrid strategies between exact and heuristic methods can also be found [8]. In general, the most widely used metaheuristics in multiobjective optimization are EAs [14, 23] because of two features. First, EAs are able to search for many Pareto optimal solutions in parallel by typically maintaining a population of tentative solutions that are iteratively manipulated, and, hence, they are naturally well suited for multiobjective optimization. Second, EAs are naturally prone to parallelism by themselves [5].
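A hedged sketch of the master/slave parallel function evaluation scheme mentioned above is given below: the master holds the population and hands out individuals to slave processes, which return the objective vectors. It uses only basic point-to-point MPI calls and a placeholder two-objective evaluation function; it is a generic illustration written for this chapter, not the code of any work listed in Table 16.1.

#include <mpi.h>
#include <cstdio>
#include <vector>

// Placeholder for an expensive multiobjective evaluation (2 objectives, 2 variables).
static void evaluate(const double x[2], double f[2]) {
    f[0] = x[0] * x[0] + x[1] * x[1];
    f[1] = (x[0] - 1.0) * (x[0] - 1.0) + x[1] * x[1];
}

int main(int argc, char** argv) {       // run with at least two processes, e.g. mpirun -np 4
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    const int popSize = 20;              // the tag popSize is used as the "stop" signal

    if (rank == 0) {                     // master: owns the population
        std::vector<double> pop(2 * popSize), obj(2 * popSize);
        for (int i = 0; i < 2 * popSize; ++i) pop[i] = 0.1 * i;
        int sent = 0, received = 0;
        // Prime every slave with one individual, then keep them busy.
        for (int s = 1; s < size && sent < popSize; ++s, ++sent)
            MPI_Send(&pop[2 * sent], 2, MPI_DOUBLE, s, sent, MPI_COMM_WORLD);
        while (received < popSize) {
            double f[2];
            MPI_Status st;
            MPI_Recv(f, 2, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            obj[2 * st.MPI_TAG] = f[0];
            obj[2 * st.MPI_TAG + 1] = f[1];
            ++received;
            if (sent < popSize) {        // hand the now-idle slave a new individual
                MPI_Send(&pop[2 * sent], 2, MPI_DOUBLE, st.MPI_SOURCE, sent, MPI_COMM_WORLD);
                ++sent;
            }
        }
        double dummy = 0.0;
        for (int s = 1; s < size; ++s)   // tell every slave to stop
            MPI_Send(&dummy, 0, MPI_DOUBLE, s, popSize, MPI_COMM_WORLD);
        std::printf("first objective vector: (%.2f, %.2f)\n", obj[0], obj[1]);
    } else {                             // slaves: evaluate until told to stop
        while (true) {
            double x[2], f[2];
            MPI_Status st;
            MPI_Recv(x, 2, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == popSize) break;
            evaluate(x, f);
            MPI_Send(f, 2, MPI_DOUBLE, 0, st.MPI_TAG, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
}

Since the search logic stays entirely in the master, this is a pure SW/PFE parallelization: the algorithmic behavior is unchanged, only the evaluations are distributed.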
Table 16.1  Parallel metaheuristics for multiobjective optimization. For each of the surveyed works (published between 1993 and 2004) the table reports the reference(s), the year of the first publication, the parallelization strategy (SW or MW), whether Pareto optimality is explicit (PE), the membership in the PFE, PO, CPF, and DPF categories, the implementation language and/or communication library, the communication topology, a brief description of the algorithm, and its application domain.
1   generate initial random solution c and add it to the archive
2   while termination criterion has not been reached
3     mutate c to produce m and evaluate m
4     if (c dominates m)
5       discard m
6     else if (m dominates c)
7       replace c with m, and add m to the archive
8     else if (m is dominated by any member of the archive)
9       discard m
10    else apply test(c, m, archive) to determine which becomes the new current solution
11      and whether to add m to the archive
12  end while

Fig. 16.2  Pseudocode for (1 + 1)-PAES.
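To make the acceptance criterion of Figure 16.2 concrete, the following C++ fragment is a minimal sketch of lines 4 to 11. The names Solution, dominates, and grid_test are ours, and the adaptive-grid test is only stubbed (the real procedure, described below, compares the crowding of the grid locations of c and m and also bounds the archive size).

#include <cstddef>
#include <vector>

// A solution is represented here only by its objective vector (minimization assumed).
struct Solution { std::vector<double> objectives; };

// Pareto dominance for minimization: a dominates b if it is no worse in every
// objective and strictly better in at least one of them.
bool dominates(const Solution& a, const Solution& b) {
    bool strictly_better = false;
    for (std::size_t i = 0; i < a.objectives.size(); ++i) {
        if (a.objectives[i] > b.objectives[i]) return false;
        if (a.objectives[i] < b.objectives[i]) strictly_better = true;
    }
    return strictly_better;
}

// Stand-in for test(c, m, archive) in Figure 16.2. The real procedure uses the
// adaptive grid crowding described in the text and also bounds the archive size;
// this stub simply accepts and archives m so that the sketch stays self-contained.
struct TestResult { bool accept_as_current; bool add_to_archive; };
TestResult grid_test(const Solution&, const Solution&, const std::vector<Solution>&) {
    return {true, true};
}

// One acceptance step of (1 + 1)-PAES (lines 4 to 11 of Figure 16.2).
void accept_step(Solution& c, const Solution& m, std::vector<Solution>& archive) {
    if (dominates(c, m)) return;                    // line 5: discard m
    if (dominates(m, c)) {                          // lines 6-7: replace c, archive m
        c = m;
        archive.push_back(m);
        return;
    }
    for (const Solution& a : archive)               // line 8: is m dominated by the archive?
        if (dominates(a, m)) return;                // line 9: discard m
    TestResult t = grid_test(c, m, archive);        // lines 10-11: crowding decides
    if (t.accept_as_current) c = m;
    if (t.add_to_archive) archive.push_back(m);
}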
16.3 TWO PARALLEL MULTIOBJECTIVE METAHEURISTICS

To offer the reader two example algorithms, we include an experimental study in this section. For this goal we have developed two parallel multiobjective metaheuristics. The first one, named pPAES, is based on the Pareto Archived Evolution Strategy (PAES) algorithm [49]. PAES is a well-known sequential multiobjective algorithm against which new proposals are compared. Second, we propose three parallel models of a specialized ssGA [58] for solving a real-world radio network design problem (see Section 16.4.2) coming from the telecommunication industry.

16.3.1 Parallelizing PAES: pPAES

The PAES version used here is a (1 + 1) evolution strategy (Section 16.3.1.1) employing local search and a reference archive of previously found solutions in order to identify the approximate dominance ranking of the current and candidate solution vectors. We will detail pPAES later in Section 16.3.1.2.

16.3.1.1 (1 + 1)-PAES. We have implemented a C++ version of the (1 + 1)-PAES algorithm based on the description presented in [49]. The (1 + 1)-PAES represents the simplest nontrivial approach to a multiobjective local search procedure. An outline of this algorithm is presented in Figure 16.2. PAES is based on maintaining a single solution that, at each iteration, is mutated to generate a new candidate solution (line 3 in Figure 16.2). After that, the algorithm determines whether to accept or reject the mutant solution and whether to archive it in a list of nondominated solutions (the archive) by means of an acceptance criterion. Since we intend to solve real-world problems whose evaluation requires high computational resources, a basic improvement has been introduced: if the mutation operator does not modify the current individual, c, then neither the evaluation nor the archiving procedure is carried out. In this case, the termination criterion is to reach a preprogrammed number of function evaluations. This implies that the number of iterations is strongly dependent on the mutation probability used in the ES. This way, two executions of this version of PAES with the
same mutation probability will probably finish with a different number of iterations of the algorithm, although they perform the same number of function evaluations, so that the numerical effort spent is the same. Since the aim of the multiobjective search is to find a set of equally spread nondominated solutions, PAES uses a crowding procedure based on an adaptive numerical grid that recursively divides up the objective space [49]. When each solution is generated, its grid location in the objective space is determined by a recursive subdivision of the objective space. The number of solutions currently residing in each grid location is also maintained in order to make important decisions in the selection and archiving procedure when a location is too crowded. 16.3.1.2 pPAES. We have developed a MW parallelization of PAES, called pPAES, using a DPF strategy where each search thread executes a (1 + 1)-PAES. To the best of our knowledge, this is the first parallel model of PAES. The pPAES algorithm works as follows. Each process computes the PAES algorithm during a predefined number of function evaluations and maintains its own local archive of nondominated solutions (DPF strategy). This Pareto optimal set has the same maximum size, N , in all the threads. Periodically, a synchronous migration operation produces an exchange of solutions in a ring topology. This migration process requires further explanations. First, we must set how many and what individuals should be migrated. The pPAES algorithm migrates one single individual each time, which is randomly chosen from the local Pareto front of the thread. Then, the new individual is included as a normal mutated individual in the local PAES (line 3 in Fig. 16.2). Second, the frequency must be defined. As the number of iterations of each PAES could be different (as we explained before), exchanging solutions in pPAES is carried out after a fixed step number, called migration frequency, measured in terms of the number of function evaluations (all the threads migrate the same number of individuals even when each PAES uses a different number of iterations). The last step in pPAES consists in building the final Pareto front that will be presented as the result. This front will have the same number of nondominated solutions as a single PAES thread. So, if pPAES uses p parallel processes, then, at most, p . N Pareto optima could have been calculated during the optimization procedure because it could happen that not all the threads will calculate N solutions. This way, pPAES gathers all this information to present a single front. All pPAES processes send their nondominated solutions to a distinguished process that includes them (along with its already locally stored solutions) in the single front by using the adaptive grid algorithm presented in the previous section.
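The ring exchange just described can be sketched with MPI as follows. This is not the original pPAES code, only an illustration under some assumptions: solutions are encoded as fixed-length vectors of doubles, and the helpers run_paes() and inject_as_mutant() (which advance the local (1 + 1)-PAES and feed the immigrant in as a normal mutated individual) are hypothetical stubs.

#include <mpi.h>
#include <cstdlib>
#include <vector>

const int GENES = 32;                 // assumed length of an encoded solution
const int MIGRATION_FREQUENCY = 500;  // function evaluations between two exchanges
const int TOTAL_EVALUATIONS = 10000;

// Hypothetical stubs: advance the local (1 + 1)-PAES by a number of evaluations,
// and feed an immigrant into it as a normal mutated individual (line 3, Fig. 16.2).
void run_paes(std::vector<std::vector<double> >& archive, int evaluations) {}
void inject_as_mutant(const std::vector<double>& individual) {}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<std::vector<double> > archive;   // local Pareto front (DPF strategy)
    std::vector<double> outgoing(GENES), incoming(GENES);

    for (int done = 0; done < TOTAL_EVALUATIONS; done += MIGRATION_FREQUENCY) {
        run_paes(archive, MIGRATION_FREQUENCY);

        // One randomly chosen nondominated solution (assumed to hold GENES values)
        // is migrated each time.
        if (!archive.empty()) outgoing = archive[std::rand() % archive.size()];

        // Synchronous exchange in a unidirectional ring of processes.
        int next = (rank + 1) % size;
        int prev = (rank + size - 1) % size;
        MPI_Sendrecv(outgoing.data(), GENES, MPI_DOUBLE, next, 0,
                     incoming.data(), GENES, MPI_DOUBLE, prev, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // The immigrant enters the local PAES as a normal mutated individual.
        inject_as_mutant(incoming);
    }
    // A distinguished process would now gather all the local fronts and merge
    // them with the adaptive grid procedure to build the final Pareto front.
    MPI_Finalize();
    return 0;
}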
16.3.2 A Parallel Multiobjective Steady State GA

This section is devoted to describing a specialized ssGA [58] for solving a radio network design problem (Section 16.4.2). We first show the basics of the algorithm. Next, three parallel models of the ssGA are presented.
16.3.2.1 Multiobjective ssGA. The basic line of the algorithm is derived from a steady state GA (ssGA), where only one replacement occurs per generation (see [58] for the details). The classical crossover and mutation genetic operators have been used, but modified for the radio network design problem. They are named geographical crossover and multilevel mutation. In order to deal with the several objectives of the problem, the ssGA uses ranking to sort the population according to the definition of Pareto dominance, sharing to handle the diversity of the nondominated solutions, and elitism to speed up the convergence process.

16.3.2.2 Parallel Models for the ssGA. In addition to the huge number of solutions in the search space, the other difficulty of the real-world network design problem we are dealing with is the high computational cost required to evaluate the objective functions and the constraints. The memory requirements are also high. Therefore, we have proposed three hierarchical parallel models [11] of the ssGA algorithm to improve the quality and the robustness of the obtained Pareto front, to speed up the search, and to solve large instances of the problem. These models are:

- A parallel cooperative model following a DPF MW parallelization scheme, which is based on the insular (migration) model of EAs adapted to multiobjective optimization.

- A parallel asynchronous evaluation model (PFE single walk parallelization strategy), in which the evaluation phase of the evolutionary algorithm is done in parallel (a shared-memory sketch of this idea is given after this list). These first two parallel models are independent of the network design problem.

- A parallel synchronous decomposition model, in which the evaluation of a single solution is carried out in parallel by partitioning the geographical domain. This PO single walk parallel model is specific to the network design problem.
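The original models were written in C/MPI for the clusters of Section 16.4.1; as a language-level illustration only, the fragment below sketches the idea behind the asynchronous evaluation model on shared memory, with a hypothetical evaluate() standing for the costly objective-function computation.

#include <cstddef>
#include <functional>
#include <future>
#include <vector>

struct Individual {
    std::vector<double> genes;
    std::vector<double> objectives;   // filled in by the evaluation
};

// Hypothetical placeholder for the expensive evaluation of one network design
// (radio coverage, traffic, and interference computations in the CRND problem).
std::vector<double> evaluate(const Individual& ind) {
    (void)ind;   // a real implementation would simulate the network here
    return std::vector<double>(3, 0.0);
}

// Launch the evaluations of a set of offspring concurrently and gather the
// results; the ssGA thread is freed from the costly objective computations.
void evaluate_in_parallel(std::vector<Individual>& offspring) {
    std::vector<std::future<std::vector<double> > > pending;
    pending.reserve(offspring.size());
    for (const Individual& ind : offspring)
        pending.push_back(std::async(std::launch::async, evaluate, std::cref(ind)));
    for (std::size_t i = 0; i < offspring.size(); ++i)
        offspring[i].objectives = pending[i].get();
}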
16.4 EXPERIMENTATION

Here, we first present the parallel systems used and the benchmark with which the two previous algorithms have been evaluated, in Sections 16.4.1 and 16.4.2, respectively. Next, in Section 16.4.3, the performance metrics used for measuring the results are described. The analysis of the results is performed in Section 16.4.4.
16.4.1 Parallel Systems

All the pPAES experiments have been carried out on a cluster of 16 PCs equipped with Pentium 4 processors at 2.8 GHz and 512 MB of RAM, the interconnection network being a Fast-Ethernet at 100 Mbps. All the programs have been compiled with GNU gcc v3.2 using the option -O3. The parallel versions use MPICH 1.2.5, an implementation of the MPI standard.
The implementation of the parallel models of the multiobjective ssGA has been carried out on different parallel and distributed architectures:

- A network of heterogeneous workstations (NOWs), composed of 25 PCs running the Linux operating system; the interconnection network is a Fast-Ethernet at 100 Mbps.

- A cluster of clusters of PCs composed of 128 PCs (COWs). In this hardware platform, we have 3 clusters of 40 PCs and 2 clusters of 48 PCs. Each node of the cluster is a Pentium III at 733 MHz with 256 MB of RAM. The communication network is a crossbar with 5 switches.

- A cluster of SMPs (CLUMPS): it is an IBM SP3 parallel machine with 64 processors. This platform is composed of 4 clusters of 16 processors. Each processor is a Power3 at 375 MHz. In each cluster, the processors share a memory of 16 GB. The communication between clusters is based on the Colony network: an Omega switch with 800 Mbps.
In these cases, the programming environment used is C/MPI (Message Passing Interface).

16.4.2 Benchmark
Whereas the multiobjective ssGA has been specialized just to solve the network design problem, two additional mathematical MOPs have been used to test pPAES: Fonseca [32] and Kursawe [50]. These two MOPs have been selected from the specialized literature to evaluate the search model of pPAES. Their definition is shown in Table 16.2.
Both pPAES and the ssGA have been used to solve a cellular radio network design (CRND) problem coming from the telecommunication industry. This problem may be reduced to the placement and the configuration of base stations (BS) on candidate sites. The main decision variable of the problem deals with the mapping of the BS on the potential sites. BS are of three types: omnidirectional, small directive, or large directive. Each site may be equipped with either one BS with a single omnidirectional antenna or with one to three BS, each having a directive antenna. In addition to the mapping decision variable and the type of antenna, each BS is
configured by some engineering parameters: the azimuth (that is, the direction the BS is pointing to), the transmitter power, and the vertical tilt. We define three main objectives for the problem: minimize the number of sites used (reducing the number of sites reduces the cost of the design), maximize the amount of traffic held by the network, and minimize the interferences. Two main constraints have to be satisfied to design a cellular network:

- Cover of the area. All the geographical area must be covered with a minimal radio field value that must be greater than the receiver sensitivity threshold of the mobile.

- Handover. By definition, a mobile moves at any time. The cellular network must be able to ensure the communication continuity from the starting cell to the target cell when a mobile is moving toward a new cell.
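A possible encoding of a candidate network, only meant to make the decision variables above concrete (it is not the representation used in [58], and all field names are illustrative), could look as follows in C++:

#include <vector>

// Type of base station that can be installed on a candidate site.
enum class BSType { Omnidirectional, SmallDirective, LargeDirective };

// Engineering parameters of one installed base station.
struct BaseStation {
    BSType type;
    double azimuth_deg;   // direction the antenna is pointing to
    double power;         // transmitter power
    double tilt_deg;      // vertical tilt
};

// A candidate network design: each candidate site holds either nothing, one
// omnidirectional BS, or one to three directive BS.
struct NetworkDesign {
    std::vector<std::vector<BaseStation> > sites;
};

// The three objectives of the CRND problem.
struct Objectives {
    int    used_sites;     // to be minimized
    double held_traffic;   // to be maximized
    double interference;   // to be minimized
};

int count_used_sites(const NetworkDesign& design) {
    int used = 0;
    for (const std::vector<BaseStation>& bs : design.sites)
        if (!bs.empty()) ++used;
    return used;
}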
16.4.3 Performance Metrics
Since we are dealing with parallel multiobjective algorithms, we have used two kinds of metrics. On the one hand, the temporal behavior has been measured with the classical speedup, S_N, and parallel efficiency, eta. They can be defined as

    S_N = T_1 / T_N    (16.1)

    eta = S_N / N    (16.2)

where T_1 is the execution time of the algorithm on one processor, T_N is the execution time on N processors, and N is the number of processors involved in the parallel computations. On the other hand, two complementary performance indicators have been used to evaluate the quality of the obtained nondominated set of solutions: the entropy and the contribution. The entropy indicator gives an idea about the diversity of the solutions found; the contribution indicator compares two fronts in terms of dominance. Let PO1 and PO2 be the Pareto fronts of a given MOP respectively calculated with two algorithms A and B, and let PO = ND(PO1 U PO2), with ND representing the nondominated set. Then, the N-dimensional space, where N is the number of objective functions to optimize, is clustered. For each space unit with at least one element of PO, the number of present solutions of PO1 is calculated. This way, the relative entropy, E(PO1, PO2), of a set of nondominated solutions PO1 with regard to the Pareto frontier PO is defined as (16.3), where C is the cardinality of the non-empty space units of PO, and n_i is the number of solutions of set PO1 inside the corresponding space unit. The more diversified the solution set PO1 on the frontier, the higher the entropy value (0 <= E <= 1).
The other performance metric (contribution) quantifies the dominance between two Pareto optimal sets. The contribution of PO1 relative to PO2 is roughly the ratio of nondominated solutions produced by PO1. This way, let C be the set of solutions in PO1 intersected with PO2. Let W1 (resp. W2) be the set of solutions in PO1 (resp. PO2) that dominate solutions of PO2 (resp. PO1). Similarly, let L1 (resp. L2) be the set of solutions in PO1 (resp. PO2) that are dominated by solutions of PO2 (resp. PO1). The set of solutions in PO1 (resp. PO2) that are not comparable to solutions in PO2 (resp. PO1) is N1 = PO1 \ (C U W1 U L1) (resp. N2 = PO2 \ (C U W2 U L2)). This way, the contribution CONT(PO1/PO2) is stated as

    CONT(PO1/PO2) = (|C|/2 + |W1| + |N1|) / (|C| + |W1| + |N1| + |W2| + |N2|)    (16.4)
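Transcribing this definition directly into C++ gives a small sketch like the following, where solutions are plain objective vectors, dominance is the usual Pareto test for minimization, and ties between the equal/dominating/dominated cases are resolved in that order:

#include <cstddef>
#include <vector>

typedef std::vector<double> Point;   // objective vector of one nondominated solution

// Pareto dominance for minimization.
bool dominates(const Point& a, const Point& b) {
    bool strict = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] > b[i]) return false;
        if (a[i] < b[i]) strict = true;
    }
    return strict;
}

// Contribution of front po1 relative to front po2 (the contribution formula above):
// C counts the common solutions, W1/L1 the solutions of po1 that dominate/are
// dominated by solutions of po2, N1 the remaining ones, and symmetrically for po2.
double contribution(const std::vector<Point>& po1, const std::vector<Point>& po2) {
    std::size_t c = 0, w1 = 0, l1 = 0, w2 = 0, l2 = 0;
    for (const Point& s : po1) {
        bool equal = false, dom = false, is_dom = false;
        for (const Point& t : po2) {
            if (s == t) equal = true;
            else if (dominates(s, t)) dom = true;
            else if (dominates(t, s)) is_dom = true;
        }
        if (equal) ++c; else if (dom) ++w1; else if (is_dom) ++l1;
    }
    for (const Point& t : po2) {
        bool equal = false, dom = false, is_dom = false;
        for (const Point& s : po1) {
            if (t == s) equal = true;
            else if (dominates(t, s)) dom = true;
            else if (dominates(s, t)) is_dom = true;
        }
        if (!equal) { if (dom) ++w2; else if (is_dom) ++l2; }   // c already counted above
    }
    std::size_t n1 = po1.size() - c - w1 - l1;
    std::size_t n2 = po2.size() - c - w2 - l2;
    double denom = static_cast<double>(c + w1 + n1 + w2 + n2);
    return denom > 0.0 ? (0.5 * c + w1 + n1) / denom : 0.5;
}

With this definition, the contributions of two fronts to each other sum to one, which is consistent with the paired rows of Table 16.4 below.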
These two metrics allow us to perform measurements between exactly two Pareto fronts. However, when one deals with heuristic algorithms, K independent runs should be executed in order to get statistical confidence for the results (where K is typically greater than 30). In this case, for each multiobjective heuristic algorithm K Pareto fronts are obtained, so we have extended the definitions of entropy and contribution to address this fact. Let PO_A be a Pareto front obtained with the heuristic algorithm A, and let {PO_B^i}, i in {1, ..., K}, be the K fronts resulting from the K independent runs of algorithm B. The extended entropy, E_ext, and the extended contribution, CONT_ext, are defined as

    E_ext(PO_A, {PO_B^i}) = (1/K) * sum_{i=1..K} E(PO_A, PO_B^i)    (16.5)

    CONT_ext(PO_A / {PO_B^i}) = (1/K) * sum_{i=1..K} CONT(PO_A / PO_B^i)    (16.6)

That is, these extended metrics are the average values of E and CONT of a given nondominated set over the whole set of fronts passed as the second argument.

16.4.4 Results
This section is devoted to the presentation and analysis of the results, first for the pPAES algorithm and, second, for the parallel multiobjective ssGA.

16.4.4.1 pPAES. We have tested three configurations of pPAES: pPAES4, pPAES8, and pPAES16, which use 4, 8, and 16 processes, respectively. Each process runs on a dedicated processor, so 4, 8, and 16 machines have been used for the pPAES experiments. The three versions of pPAES, as well as the PAES algorithm, perform 10,000 function evaluations. This way, each parallel process of pPAES4 evaluates 2500 individuals, 1250 function evaluations are computed by each pPAES8 process, and a process of pPAES16 does 625. All of them use a mutation rate of
Table 16.3 Execution times (in seconds) and parallel efficiency of the algorithms for the CRND problem

Algorithm   Time (s)   eta
PAES        19777      -
pPAES4      5153       95.95%
pPAES8      3277       75.44%
pPAES16     2160       57.23%
0.05 and a maximum archive size of 100. All the values shown in the tables of this section are the average values over 30 independent runs. Let us begin with the temporal behavior of pPAES. In Table 16.3, we show the execution times (in seconds) of PAES and the three versions of pPAES for the CRND problem. The parallel efficiency (Equation 16.2) of these last three algorithms is also presented. The two mathematical test MOPs, Fonseca and Kursawe, have not been considered for this temporal study because their execution times are lower than a second on a single-processor machine of our cluster (see its specifications above). This way, their parallel execution times will not give us relevant information, but they have been included to evaluate the new model of search proposed by pPAES. If we analyze the results depicted in Table 16.3, we can observe that the execution times decrease when a higher number of processors is used. A near optimal parallel efficiency of 95.95% is obtained with pPAES4. The eta value of pPAES8 is 75.44% (which is lower but acceptable). This reduction in the parallel efficiency, which is accentuated in pPAES16 (57.23%), is expected due to the synchronization constraints imposed by pPAES. That is, pPAES uses a synchronous ring of processes, thus the larger the number of processes, the higher the synchronization constraints among these processes. An additional reason that justifies this result is the size of the problem considered. Each network instance of the CRND problem needs, on average, 300 Kbytes of memory. This way, if we consider that, first, each pPAES process at the end of its computation sends its nondominated individuals (i.e., networks) to a distinguished process and, second, this number of nondominated individuals is typically 100 (i.e., the maximum archive size), then, for example, in pPAES16, 15 processes x 100 individuals x 300 Kbytes (roughly 425 Mbytes) of information must be transmitted through the network before pPAES finishes. (Note that we have used 15 processes because the distinguished node that collects the nondominated individuals to build the final Pareto front does not have to send its own front to itself.) This requires a significant amount of time over our Fast-Ethernet at 100 Mbps, penalizing the parallel efficiency of pPAES when the number of processes forming the ring grows.
Table 16.4 presents the resulting values for the extended entropy and extended contribution metrics of the Pareto fronts obtained by pPAES. The E_ext column of this table shows the extended entropy indicator (Equation 16.5). Since E_ext is a relative and nonsymmetric measure between nondominated sets, the first two columns in the table contain the arguments (and order) used to calculate this metric (this is
Table 16.4 Extended entropy and extended contribution values for the Fonseca, Kursawe, and CRND problems

Arguments                E_ext                            CONT_ext
PO_A      {PO_B^i}       Fonseca   Kursawe   CRND         Fonseca   Kursawe   CRND
PAES      pPAES4         0.3620    0.3731    0.4359       0.4251    0.4325    0.5537
PAES      pPAES8         0.3617    0.3706    0.4251       0.4137    0.4237    0.5570
PAES      pPAES16        0.3591    0.3694    0.4550       0.3674    0.4214    0.6168
pPAES4    PAES           0.3957    0.4091    0.4280       0.5749    0.5675    0.4463
pPAES8    PAES           0.4050    0.4116    0.4160       0.5864    0.5763    0.4431
pPAES16   PAES           0.4206    0.4115    0.4143       0.6326    0.5786    0.3832
also applicable to the contribution indicator). This way, for the problems Fonseca and Kursawe, the relative entropies between PAES and the three pPAES versions (rows 1 to 3) are lower than the relative entropies between the pPAES versions and PAES (rows 4 to 6), which indicates that the new search model is able to find a more diverse Pareto front than the PAES algorithm for these two mathematical MOPs. This diversity may be introduced by the migration operator, which exchanges nondominated solutions among the pPAES processes. However, if we focus on the CRND problem, the algorithms yield a similar spread of the nondominated solutions on the frontiers (the relative entropy is around 0.42). The contribution indicator (see the CONT_ext column of Table 16.4) behaves slightly differently. For the problems Fonseca and Kursawe, contribution values show that the three pPAES versions bring more Pareto solutions than PAES, i.e., rows 4 to 6 have larger values for this metric than rows 1 to 3 (especially pPAES16 for the problem Fonseca, its contribution value to PAES being 0.6326). As a matter of fact, the contribution of pPAES to PAES increases with the number of processes for these two MOPs: CONT_ext(pPAES16/PAES) >= CONT_ext(pPAES8/PAES) >= CONT_ext(pPAES4/PAES). This allows us to conclude that starting the search from different points of the search space (each pPAES process begins with a random initial solution) allows more accurate solutions for these mathematical problems, i.e., we have designed an enhanced search model. Despite the improvements yielded for the problems Fonseca and Kursawe, PAES clearly overcomes any pPAES configuration in the CRND problem. This fact can be explained by the search capabilities of pPAES. It is clear that, performing the same number of function evaluations, PAES does a more exploitative search than any pPAES version (which are more explorative, since they consider different initial search points, one per pPAES process, each searching for a smaller number of steps). This way, even though all the algorithms are able to find accurate solutions for the two mathematical test MOPs, the CRND problem is quite difficult and the exploitative features of PAES allow it to find nondominated solutions that are closer to the optimum Pareto front than the pPAES versions do. This way, when the contribution metric is calculated between PAES and any pPAES version, it shows that a higher number of points found by the former are in the front.
16.4.4.2 Parallel ssGA. For the parallel models of the multiobjective ssGA, just the temporal behavior has been considered. The obtained speedups (Equation 16.1) for the parallel evaluation and the decomposition models, when running on the COW and CLUMPS parallel systems, are displayed in Figures 16.3 and 16.4, respectively.
Fig. 16.3  Obtained speedups of the ssGA parallel models on a cluster of PCs (COW).
The parallel evaluation model is efficient on the two types of parallel machines: COWs and CLUMPS (Section 16.4.1). This is due to the coarse granularity of this model. Indeed, the complete evaluation of a single solution is time-consuming relative to the communication cost. Efficient results have also been obtained on a network of heterogeneous workstations based on an Ethernet network. The obtained speedup of the parallel decomposition model depends on the number of processors used. For a small number of processors (1 to 16) this model is more efficient than the parallel evaluation model. A relative degradation of performance has been observed from 16 processors on the COW architecture and from 16 to 32 processors on the CLUMPS architecture. Indeed, as the number of partitions depends on the number of processors, the granularity of the model decreases with the number of processors. Hence, this model is not scalable, but it can be used in conjunction with the other parallel models.
Fig. 16.4  Obtained speedups of the ssGA parallel models on a cluster of SMP machines (CLUMPS, IBM SP3).
16.5 CONCLUSIONS AND FUTURE WORK

In this chapter we have performed a wide review of the literature concerning parallel algorithms and multiobjective optimization. A classification of the algorithms found, according to their level of parallelism, has been presented. Two example parallel metaheuristics have been included: we have developed a parallel multiobjective algorithm named pPAES, which is based on PAES, and a parallel multiobjective ssGA. We have tested pPAES with two mathematical MOPs (Fonseca and Kursawe) and a real problem coming from the telecommunication industry (the CRND problem). The results show that the new algorithm has a high parallel efficiency when few machines are involved in parallel computations of complex problems like the CRND. For the mathematical MOPs, the larger the number of processes used in the pPAES configuration, the more accurate the Pareto front obtained. This allows us to conclude that pPAES improves the basic search model of PAES. As to the multiobjective ssGA, only the temporal behavior of its parallel versions has been analyzed. In fact, the obtained speedups show that the parallel evaluation model is efficient when running on the two tested parallel systems: COWs and CLUMPS. The efficiency of the ssGA parallel decomposition model depends on the number of processors used.
As future research lines, we plan to improve the pPAES search model to deal with difficult problems like the CRND. We also intend to develop new parallel models of search possibly based on other multiobjective EAs.
Acknowledgments

This work has been partially funded by the Ministry of Science and Technology and FEDER under contracts TIC2002-04498-C05-02 (the TRACER project) and TIC2002-04309-C02-02, and by the Région Nord-Pas-de-Calais in France under the TAT MOST project.
REFERENCES
1. D. K. Agrafiotis. Multiobjective Optimization of Combinatorial Libraries. ZBM J. RES.& DEY, 45(3/4):545-566, 2001. 2. A. Al-Yamani, S. Sait, and H. Youssef. Parallelizing Tabu Search on a Cluster of Heterogeneous Workstations. Journal of Heuristics, 8(3):277-304,2002. 3. A. Al-Yamani, S.M. Sait, H. Barada, and H. Youssef. Parallel Tabu Search in a Heterogeneous Environment. In Proc. of the Int. Parallel and Distributed Processing Symp., pages 5663,2003. 4. A. Al-Yamani, S.M. Sait, and H.R. Barada. HPTS: Heterogeneous Parallel Tabu Search for VLSI Placement. In Proc. of the 2002 Congress on Evolutionary Computation, pages 351-355,2002. 5 . E. Alba and M. Tomassini. Parallelism and Evolutionary Algorithms. ZEEE
Transactions on Evolutionaly Computation, 6(5):443462,2002. 6. E. Alba and J.M. Troya. A Survey of Parallel Distributed Genetic Algorithms. Complexity, 4(4):3 1-52, 1999. 7. G. Aloisio, E. Blasi, M. Cafaro, I. Epicoco, S. Fiore, and S. Mocavero. A Grid Environment for Diesel Engine Chamber Optimization. In Proc. ofParCo2003, pages 599-608,2003. 8. M. Basseur, J. Lemesre, C. Dhaenens, and E.-G. Talbi. Cooperation Between Branch and Bound and Evolutionary Approaches to Solve a Biobjective Flow Shop Problem. In Workshop on Evolutionary Algorithms (WEA '04), pages 7286,2004. 9. C. Blum and A. Roli. Metaheuristics in Combinatorial Optimization: Overview and Conceptual Comparison. ACM Computing Surveys,35(3):268-308,2003. 10. C.P. Bottura and J.V. da Fonseca Neto. Rule-Based Decision-Makmg Unit for Eigenstruture Assignment via Parallel Genetic Algorithm and LQR Designs. In Proc. of the American Control Conference, pages 467471,2000.
1 1. S. Cahon, N. Melab, and E.-G. Talbi. Building with ParadisEO Reusable Parallel
and Distributed Evolutionary Algorithms. Parallel Computing, 30(5-6):677697,2004. 12. C.S. Chang and J.S. Huang. Optimal Multiobjective SVC Planning for Voltage Stability Enhancement. IEE Proc.-Generation, Transmission, and Distribution, 145(2):203-209, 1998.
13. C.A. Coello and M. Reyes. A Study of the Parallelization of a Coevolutionary Multiobjective Evolutionary Algorithm. In MICAZ 2004, LNAI 2972, pages 688-697,2004.
14. C.A. Coello, D.A. Van Veldhuizen, and G.B. Lamont. Evolutionary Algorithms for Solving Multiobjective Problems. Kluwer Academic Publishers, 2002.
IS. M. Conti, S. Orcioni, and C. Turchetti. Parametric Yield Optimisation of MOS
VLSI Circuits Based on Simulate Annealing and its Parallel Implementation. IEE Proc.-Circuits Devices Syst., 141(5):387-398, 1994.
16. T.G. Crainic and M. Toulouse. Parallel Strategies for Metaheuristics. In F.W. Glover and G.A. Kochenberger, editors, Handbook of Metaheuristics, pages 475-514,2003. 17. V.-D. Cung, S.L. Martins, C.C. Ribeiro, and C. Roucairol. Strategies for the Parallel Implementation of Metaheuristics. In C.C. Ribeiro and P. Hansen, editors, Essays and Surveys in Metaheuristics, pages 263-308. Kluwer, 2003. 18. J.V. da Fonseca Net0 and C.P. Bottura. Parallel Genetic Algorithm Fitness Function Team for Eigenstructure Assignment via LQR Designs. In Proc. ofthe I999 Congress on Evolutionary Computation, pages 1035-1042, 1999.
19. A. de Risi, T. Donateo, D. Laforgia, G. Aloisio, E. Blasi, and S. Mocavero. An Evolutionary Methodology for the Design of a D.I. Combustion Chamber for Diesel Engines. In Con$ on Thermo- and Fluid Dynamics Processes in Diesel Engines (THIESEL 2004), vol. 1, pages 545-559,2004.
20. F. de Toro, J. Ortega, J. Fernandez, and A. Diaz. PSFGA: A Parallel Genetic Algorithm for Multiobjective Optimization. In Proc. of the 10th Euromicro Workshop on Parallel, Distributed and Network-Based Processing, pages 38439 1,2002. 2 1. F. de Toro, J. Ortega, and B. Paechter. Parallel Single Front Genetic Algorithm: Performance Analysis in a Cluster System. In Proc. of the Int. Parallel and Distributed Processing Symp. (IPDPS’O3),page 143,2003. 22. F. de Toro, J. Ortega, E. Ros, S. Mota, B. Paechter, and J.M. Martin. PSFGA: Parallel Processing and Evolutionary Computation for Multiobjective Optimisation. Parallel Computing, 30(5-6):72 1-739,2004.
23. K. Deb. Multiobjective Optimization Using Evolutionary Algorithms. Wiley, 2001. 24. K. Deb, P. Zope, and A. Jain. Distributed Computing of Pareto-Optimal Solutions Using Multiobjective Evolutionary Algorithms. KanGAL 2002008, Indian Institute of Technology Kampur, 2002. 25. K. Deb, P. Zope, and A. Jain. Distributed Computing of Pareto-Optimal Solutions Using Multiobjective Evolutionary Algorithms. In E M 0 2003, LNCS 2632, pages 534549,2003. 26. P. Delisle, M. Krajecki, M. Gravel, and C. Gagn. Parallel Implementation of an Ant Colony Optimization Metaheuristic with OpenMP. In Proc. of the 3rd European Workshop on OpenMP (EWOMPOI), pages 8-12,2001. 27. D.J. Doorly and J. Peiro. Supervised Parallel Genetic Algorithms in Aerodynamic Optimisation. In Proc. ofthe 13th AIM CFD Conference, pages 2 10-2 16, 1997. 28. D.J. Doorly, J. Peiro, and J.-P. Oesterle. Optimisation of Aerodynamic and Coupled Aerodynamic-Structural Design Using Parallel Genetic Algorithms. In Proc. of the Sixth AIAA/NASA/USAF Multidiscipliary Analysis and Optimization Symp., pages 40 1 4 0 9 , 1996. 29. D.J. Doorly, S . Spooner, and J. Peiro. Supervised Parallel Genetic Algorithms in Aerodynamic Optimisation. In EvoWorkshops 2000, LNCS 1803, pages 357366,2000. 30. S. Duarte and B. Barb. Multiobjective Network Design Optimisation Using Parallel Evolutionary Algorithms. In XYVII Conferencia Latinoamericana de Informatica CLEI’2OOI, 2001. 31. P. Eberhard, F. Dignath, and L. Kubler. Parallel Evolutionary Optimization of Multibody Systems with Application to Railway Dynamics. Multibody Systems Dynamics, 9: 143-164,2003. 32. C.M. Fonseca and P.J. Flemming. Multiobjective Optimization and Multiple Constraint Handling with Evolutionary Algorithms - Part 11: Application Example. IEEE Transactions on System, Man, and Cybernetics, 28( 1):3847, 1998. 33. B. Gendron and T. G. Crainic. Parallel Branch and Bound Algorithms: Survey and Synthesis. Operations Research, 42: 1042-1066, 1994. 34. I.E. Golovkin, S.J. Louis, and R.C. Mancini. Parallel Implementation of Niched Pareto Genetic Algorithm Code for X-Ray Plasma Spectroscopy. In Proc. ofthe 2002 Congress on Evolutionary Computation, pages 1820-1824, 2002. 35. I.E. Golovkin, R.C. Mancini, and S.J. Louis. Parallel Implementation of Niched Pareto Genetic Algorithm Code for X-Ray Plasma Spectroscopy. In LateBreaking Papers at the 2000 Genetic and Evolutionary Computation Conference, 2000.
36. A. Grama and V. Kumar. State of the Art in Parallel Search Techniques for Discrete Optimization. IEEE Transactions on Knowledge and Data Engineering, 11(1):28-35, 1999. 37. T. Hiroyasu. Diesel Engine Design Using Multiobjective Genetic Algorithm. In Japan/US Workshop on Design Emlironment, 2004. 38. T. Hiroyasu, M. Miki, M. Kim, S. Watanabe, H. Hiroyasu, and H. Miao. Reduction of Heavy Duty Diesel Engine Emission and Fuel Economy with Multiobjective Genetic Algorithm and Phenomenological Model. SAE Paper SP-I 824, 2004. 39. T. Hiroyasu, M. Miki, and S. Watanabe. Divided Range Genetic Algorithms in Multiobjective Optimization Problems. In Proc. of Int. Workshop on Emergent Synthesis, IWES’ 99, pages 57-66, 1999. 40. T. Hiroyasu, M. Miki, and S. Watanabe. The New Model of Parallel Genetic Algorithm in Multiobjective Optimization Problems - Divided Range Multiobjective Genetic Algorithm -. In 2000 IEEE Congress on Evolutionary Computation, pages 333-340,2000. 41. H. Horii, M. Miki, T. Koizumi, and N. Tsujiuchi. Asynchronous Migration of Island Parallel GA For Multiobjective Optimization Problem. In Proc. of the 4th Asia-Pacijic ConJ on Simulated E\,olution and Learning (SEAL ’OZ), pages 86-90,2002. 42. B.R. Jones, W.A. Crossley, and A.S. Lyrintzis. Aerodynamic and Aeroacoustic Optimization of Airfoils Via a Parallel Genetic Algorithm. Journal of Aircraft, 37(6): 1088-1098,2000. 43. N. Jozefowiez, F. Semet, and E.-G. Talbi. Parallel and Hybrid Models for Multiobjective Optimization: Application to the Vehicle Routing Problem. In Parallel Problem Solving from Nature (PPSN VII), pages 271-280, 2002. 44. J. Kamiura, T. Hiroyasu, M. Miki, and S. Watanabe. MOGADES: Multiobjective Genetic Algorithm with Distributed Environment Scheme. In Proc. @the 2nd Int. Workshop on Intelligent Systems Design and Applications (ISDA’O2),pages 143-148.2002. 45. M. Kanazaki, M. Morikawa, S. Obayashi, and K. Nakahashi. Multiobjective Design Optimization of Merging Configuration for an Exhaust Manifold of a Car Engine. In Parallel Problem Solving from Nature (PPSN VII), pages 28 1-287, 2002. 46. N. Keerativuttitumrong, C. Chaiyaratana, and V. Varavithya. Multiobjective Co-Operative Co-Evolutionary Genetic Algorithm. In Parallel Problem Solving from Nature (PPSN VII), pages 288-297,2002.
47. M.P. Kleeman, R.O. Day, and G.B. Lamont. Analysis of a Parallel MOEA Solving the Multiobjective Quadratic Assignment Problem. In GECCO 2004, LNCS 3103, pages 402403,2004. 48. M.R. Knarr, M.N. Goltz, G.B. Lamont, and J. Huang. In Situ Bioremediation of Perchlorate-Contaminated Groundwater Using a Multiobjective Parallel Evolutionary Algorithm. In Proc. of the 2003 Congress on Evolutionary Computation (CEC’2003), pages 1604-161 1,2003. 49. J.D. Knowles and D.W. Corne. Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy. Evolutionary Computation, 8(2): 149172,2000.
50. F. Kursawe. A Variant of Evolution Strategies for Vector Optimization. In H.P. Schwefel and R. Manner, editors, Parallel Problem Solving for Nature, pages 193-197, Berlin, Germany, 1990. Springer-Verlag. 5 1. J. Lemesre, C. Dhaenens, and E.-G. Talbi. A Parallel Exact Method for a Bicriteria Permutation Flow-Shop Problem. In Project Management and Scheduling (PMS’O4), pages 359-362,2004.
52. J. Lienig. A Parallel Genetic Algorithm for Performance-Driven VLSI Routing. IEEE Trans. on Evolutionary Computation, 1( 1):29-39, 1997.
53. D.A. Linkens and H. Okola Nyongesa. A Distributed Genetic Algorithm for Multivariable Fuzzy Control. In IEE Colloquium on Genetic Algorithms for Control Systems Engineering, pages 911-913, 1993. 54. R.A.E. Miikinen, P. Neittaanmaki, J. Periaux, M. Sefrioui, and J. Toivanen. Parallel Genetic Solution for Multobjective MDO. In Parallel CFD ’96 Conference, pages 352-359, 1996.
55. J.M. Malard, A. Heredia-Langner, D.J. Baxter, K.H. Jarman, and W.R. Cannon. Constrained De Novo Peptide Identification via Multiobjective Optimization. In Online Proc. of the Third IEEE Int. Workshop on High Performance Computational Biology (HiCOMB 2004), IPDPS-04, page 191,2004. 56. S. Manos and L. Poladian. Novel Fibre Bragg Grating Design Using Multiobjective Evolutionary Algorithms. In Proc. ofthe 2003 Congress on Evolutionary Computation (CEC’2003), pages 2089-2095,2003. 57. N. Marco and S. Lanteri. A Two-Level Parallelization Strategy for Genetic
Algorithms Applied to Optimum Shape Design. Parallel Computing, 26:377397,2000. 58. H. Meunier, E.-G. Talbi, and P. Reininger. A Multiobjective Genetic Algorithm for Radio Network Design. In Proc. of the 2000 Congress on Evolutionary Computation, pages 3 17-324,2000,
59. A. Migdalas, P.M. Pardalos, and S. Story. Parallel Computing in Optimization, volume 7 of Applied Optimization. Kluwer, 1997,
60. A.J. Nebro, E. Alba, and F. Luna. Multiobjective Optimization Using Grid Computing. Soft Computing Journal, 2005. To appear. 61. S. Obayashi, D. Sasalu, Y. Takeguchi, and N. Hirose. Multiobjective Evolutionary Computation for Supersonic Wing-Shape Optimization. IEEE Trans. on Evolutionary Computation, 4(2): 182-187,2000, 62. S. Obayashi, T. Tsukahara, and T. Nakamura. Cascade Airfoil Design by Multiobjective Genetic Algorithms. In Second Int. Conf: on Genetic Algorithms in Engineering Systems: Innovations and Applications, pages 24-29, 1997. 63. S. Obayash, T. Tsukahara, and T. Nakamura. Multiobjective Genetic Algorithm Applied to Aerodynamic Design of Cascade M o i l s . IEEE Trans. on Industrial Electronics, 47( 1):211-216, 2000. 64. C.K. Oh and G.J. Barlow. Autonomous Controller Design for Unmanned Aerial Vehicles using Multiobjective Genetic Programming. In Proceedings ofthe 2004 IEEE Congress on Evolutionary Computation, pages 1538-1 545,2004. 65. L.S. Oliveira, R. Sabourin, F. Bortolozzi, and C.Y. Suen. A Methodology for Feature Selection Using Multiobjective Genetic Algorithms for Handwritten Digit String Recognition. Int. Journal of Pattern Recognition and Artijicial Intelligence, 17(6):903-929, 2003. 66. A. Osyczka. Multicriteria Optimization for Engineering Design. Academic Press, 1985. 67. K.E. Parsopoulos, D.K. Tasoulis, N.G. Pavlidis, V.P. Plagianakos, andM.N. Vrahatis. Vector Evaluated Differential Evolution for Multiobjective Optimization. In Proc. of the IEEE 2004 Congress on Evolutionary Computation (CEC 2004), pages 204-2 1 1,2004. 68. K.E. Parsopoulos, D.K. Tasoulis, and M.N. Vrahatis. Multiobjective Optimization Using Parallel Vector Evaluated Particle Swarm Optimization. In Proc. of the IASTED International Conference on ArtiJicialIntelligence and Applications (AZA 2004), Innsbruck, Austria, pages 823-828,2004, 69. C. Poloni, A. Giurgevich, L. Onesti, and V. Pediroda. Hybridization of a Multiobjective Genetic Algorithm, a Neural Network and a Classical Optimizer for a Complex Design Problem in Fluid Dynamics. Computer Methods in Applied Mechanics and Engineering, 186:403420, 2000. 70. D. Quagliarella and A. Vicini. Sub-Population Policies for a Parallel Multiobjective Genetic Algorithm with Application to Wing Design. In 1998 IEEE Int. Con$ On Systems, Man, And Cybernetics, pages 3 142-3 147, 1998.
71. P.W.W. Radtke, L.S. Oliveira, R. Sabouring, and T. Wong. Intelligent Zoning Design Using Multiobjective Evolutionary Algorithms. In Proc. of the Seventh Int. Con$ on Document Analysis and Recognition (ICDAR 2003), pages 824828,2003. 72. J.L. Rogers. A Parallel Approach to Optimum Actuator Selection with a Genetic Algorithm. In A I M Guidance, Navigation, and Control Confi, page 10,2000. 73. J. Rowe, K. Vinsen, and N. Marvin. Parallel GAS for Multiobjective Functions. In Proc. of the 2nd Nordic Workshop on Genetic Algorithms and Their Applications (2hWGA), pages 61-70, 1996. 74. S.M. Sait, H. Youssef, H.R. Barada, and A. Al-Yamani. A Parallel Tabu Search Algorithm for VLSI Standard-Cell Placement. In ISCAS’OO, pages 58 1-584, 2000. 75. D. Sasaki, M. Morikawa, S. Obayashi, and K. Nkahashi. Aerodynamic Shape Optimization of Supersonic Wings by Adaptive Range Multiobjective Genetic Algorithms. In First Int. Con$ on Evolutionaiy Multi-Criterion Optimization (EM0 2001), pages 639-652,2001. 76. T.J. Stanley and T. Mudge. A Parallel Genetic Algorithm for Multiobjetive Microprocessor Design. In Proc. of the Sixth Int. Con$ on Genetic Algorithms, pages 597-604,1995. 77. G. Stehr, H. Graeb, and K. Antreich. Performance Trade-off Analysis of Analog Circuits by Normal-Boundary Intersection. In Proc. of the 40th Conference on Design automation, pages 958-963, 2003. 78. D.A. Van Veldhuizen and G.B. Lamont. Multiobjective Evolutionary Algorithm Test Suites. In Proc. of the 1999 ACM Symp. on Applied Computing, pages 351-357, 1999. 79. D.A. Van Veldhuizen, J.B. Zydallis, and G.B. Lamont. Considerations in Engineering Parallel Multiobjective Evolutionary Algorithms. IEEE Trans. on Evolutionary Computation, 87(2):144 -173,2003. 80. A. Vicini and D. Quagliarella. A Multiobjective Approach to Transonic Wing Design by Means of Genetic Algorithms. In NATO RTO AVT Symposium on Aerodynamic Design and Optimization, 1999. 81. J.F. Wang, J. Periaux, and M. Sefrioui. Parallel Evolutionary Algorithms for Optimization Problems in Aerospace Engineering. Journal of Computational and Applied Mathematics, 149:155-169,2002. 82. S. Watanabe, T. Hiroyasu, and M. Milu. Parallel Evolutionary Multi-Criterion Optimization for Block Layout Problems. In 2000 Int. Con$ on Parallel and Distributed Processing Techniques and Applications (PDPTA’2000), pages 667-673,2000,
83. S. Watanabe, T. Hiroyasu, and M. Mih. Parallel Evolutionary Multi-Criterion Optimization for Mobile Telecommunication Networks Optimization. In Proc. ofthe EUROGEN'2001, pages 167-172,2001. 84. M.M. Wiecek and H. Zhang. A Scalable Parallel Algorithm for Multiple Objective Linear Programs. ICASE 94-38, NASA, 1994. 85. S. Xiong and F. Li. Parallel Strength Pareto Multiobjective Evolutionary Algorithm. In Proc. of the 2003 Congress on Evolutionary Computation (CEC'2003), pages 681-683,2003. 86. D. Zaharie and D. Petcu. Adaptive Pareto Differential Evolution and Its Parallelization. In PPAM2003, LNCS 3019, pages 261-268,2004.
17 Parallel Heterogeneous Metaheuristics
FRANCISCO LUNA, ENRIQUE ALBA, ANTONIO J. NEBRO
Universidad de Málaga, Spain
17.1 INTRODUCTION

Optimization techniques can be classified, in a first approximation, into exact and heuristic methods. Exact techniques are guaranteed to find the optimal solution of a given optimization problem. Due to the difficulty of solving many of these optimization problems, particularly those that exhibit exponential complexity, exact algorithms often perform very poorly. As a result, the use of heuristic techniques has received much attention in the last 30 years. In heuristic methods, we sacrifice the guarantee of finding optimal solutions for the sake of (hopefully) getting acceptable solutions in a significantly reduced amount of time. Among the basic heuristic methods, we usually distinguish between constructive methods and local search methods. Constructive algorithms generate solutions from scratch by adding components to an initially empty partial solution until a solution is complete. Local search algorithms start from some initial solution and iteratively try to replace the current solution by a better one in an appropriately defined neighborhood of the current solution. The main drawback of these approaches, their inability to continue the search upon becoming trapped in a local optimum, leads to the consideration of techniques for guiding the search to overcome local optimality. In the last 20 years, new kinds of algorithms have emerged, based on combining basic heuristic methods in a higher level framework aimed at efficiently and effectively exploring the search space. These methods are commonly called metaheuristics. Up to now there does not exist a commonly accepted definition for the term metaheuristic. Several definitions have been proposed by researchers in the area [21] based on the work of Glover [45]. A metaheuristic is a top-level strategy that guides an underlying heuristic to solve a given problem. This class of algorithms [46] includes, but is not restricted to, Ant Colony Optimization (ACO), Evolutionary Computation (EC), Iterated Local Search (ILS), Simulated Annealing (SA), and Tabu Search (TS).
An issue of great importance in metaheuristics is that a dynamic balance exists between diversification and intensification. The term diversification generally refers to the exploration of the search space, whereas the term intensification refers to the exploitation of the accumulated search experience. These terms stem from the TS field and are becoming commonly accepted by the whole field of metaheuristics. The balance between diversification and intensification is important, on one side, to quickly identify regions in the search space with high quality solutions and, on the other side, to avoid wasting too much time in regions of the search space which are already explored or which do not provide high quality solutions. The exploration of the search space performed by a metaheuristic having an imbalance between diversification and intensification may get trapped in a region that does not contain the global optimum. This problem has been widely studied in the field of Evolutionary Algorithms (EAs) [57, 88], where it is called premature convergence. On the other hand, with the proliferation of parallel computers, powerful workstations, and fast communication networks, parallel implementations of metaheuristics appear quite naturally as an alternative to speed up the search for solutions [30, 32]. Using a parallel metaheuristic often leads not only to a faster algorithm but also to a more effective one. However, the truly interesting observation is that the new search model of the parallel metaheuristic, based on improved mechanisms for search diversification and intensification, is responsible for such benefits [32]. As a consequence, many authors do not use parallel machines to run parallel metaheuristic models, and they still get better results than with traditional serial ones. Several parallelization strategies can be applied to metaheuristics [29, 30, 32]. Among them, there exist parallel models where multiple search threads concurrently explore the solution space. These models are called Type 3 parallel strategies in [29, 30] and multiple walk approaches in [32]. If each thread uses a different search procedure (a different method or different parameter settings, for example), we obtain a Parallel Heterogeneous Metaheuristic (PHM). The utilization of multiple search threads using different strategies and parameter values allows for a larger diversity and deeper exploration of the search space, which could hopefully lead to more accurate solutions. Heterogeneous metaheuristics could be considered as hybrid algorithms [94], but there is a basic conceptual reason for studying PHMs: while hybridization consists in adding particular problem knowledge into the algorithm in order to solve specific problems, heterogeneity helps to develop more robust metaheuristics trying to offer a high level of performance over a wide variety of problem settings and characteristics. That is, they are different ways of improving metaheuristics. Some works accounting for this fact are, for example, [26] for TS, [10, 52, 99] for EC, [79] for SA, or [97] for ACO. In this chapter, we present a state-of-the-art survey of heterogeneous metaheuristics and a taxonomy for this kind of algorithm. Our goals are, first, to provide a classification mechanism and, second, to show by means of the given references how heterogeneity could help to design powerful and robust optimization algorithms. The taxonomy we present here combines two classification schemes: a hierarchical
one that determines the number of classes and a flat one used when the descriptors of the algorithms may be chosen in an arbitrary order. To the best of our knowledge, this is the first taxonomy of PHMs. The work is organized as follows. In the next section, we include a survey of PHMs from a historical and application-oriented point of view. In Section 17.3, we present a taxonomy that tries to encompass the most important heterogeneous algorithms. Then, some frameworks that allow the development of PHMs are shown in Section 17.4. In Section 17.5, we draw some concluding remarks. Finally, an annotated bibliography which classifies more than 100 references according to the taxonomy is given in the last section.
17.2 HETEROGENEOUS METAHEURISTICS SURVEY
As we stated before, we consider as PHMs those parallel models (even running on sequential machines) composed of multiple search threads, each one following a different search technique. Note that this is a high level description of this kind of optimization algorithm: several agents exploring a search space for global optima, whatever the search method they use (TS, SA, EC, or even exact algorithms). We present here a survey of heterogeneous metaheuristics from a historical and application-oriented perspective. Let us begin with the temporal view of PHMs. The most well-known models and their main characteristics, sorted by date of published material, are shown in Table 17.1. Of course, this is not a complete list, but it gives us a feeling of the evolution of PHMs over the last years. For example, heterogeneous metaheuristics involving ACO algorithms do not appear until the late 1990s, when they became widely known. We can observe that, at the beginning, PHMs only involved search threads using the same metaheuristic differently configured, e.g., several GA islands [99] or several TS processes [31]. However, the cooperation of different metaheuristics is the current research trend [69, 82]. From a different point of view, we present in Table 17.2 a set of applications in order to show the wide spectrum of successful studies. It can be seen that the set of applications is quite diverse, thus stating the relative importance of PHMs. Many of these applications are real-world problems which involve computationally expensive tasks, the function evaluation being typically the most time-consuming component. In this context, grid computing systems [20] have appeared as a powerful solution for complex tasks demanding high computational resources that cannot be addressed in normal clusters, so they are being utilized to parallelize optimization algorithms [74, 80]. A promising research line in the PHM field, as it occurs in parallel metaheuristics [30, 32], is based on the cooperation of the search threads, which exchange and share information collected along the trajectories they investigate. For this reason, we show in Tables 17.3 to 17.5, for each algorithm, whether it is parallel or not, whether the search threads communicate with each other during the exploration, and the connec-
Table 17.1 Well-known heterogeneous metaheuristics by year

Year  Metaheuristic  Main Features
1987  PGA            Distributed GA arranged in a hypercube topology. Different mutation rates used.
1989  dGA            Distributed GA with a hypercube topology. Different crossover and mutation rates.
1993  fGA            GA in which an explorer population divides the search space into exploiter subpopulations at every schema detection.
1994  GAMAS          Uses four heterogeneous species (islands) and quite specialized migrations and genotypes.
1994  iiGA           Distributed GA where each subpopulation stores solutions coded with different representations.
1994  CS             GA with several subpopulations using a different strategy and competing with each other.
1995  TPSA           Several SA algorithms running with different temperatures. Cooperation through solution interchange.
1996  CoPDEB         Distributed GA where each island uses its own rates for mutation, crossover, and specialized operators.
1996  PHMH           A cellular GA and a TS cooperating. The cGA acts like a diversification mechanism for the TS.
1996  PATS           Asynchronous parallelization of TS processes using different search strategies.
1996  SAGA           Self-adaptive GA. Three GAs self-adapting their population sizes and the mutation strength.
1996  ACS            Extends the CS model [91] by adapting the size of the population.
1996  APGA           Distributed GA adapting the search strategies of each subpopulation.
1996  nGA (DAGA2)    General recursive model of multilevel GAs. A two-level GA, DAGA2, is evaluated.
1997  CGA            Distributed GA. Each island solves the same problem using a different representation.
1997  HDGA           Distributed GA applying different crossover operators to each subpopulation. Synchronous.
1998  CMPTS          Several TS processes collaborate exchanging elite solutions which are used as initial solutions.
1998  AGA            Adaptive GA where subpopulations compete by using different crossover operators.
1998  FTPH           Different initial solutions and different parameter settings for TS processes. Fault-tolerant.
1999  TECHS          Cooperation between GA and BB agents. Highly application-dependent.
1999  MACS-VRPTW     Two ant colonies running different ACO searches, one for each objective of the VRPTW.
1999  ANTabu         Parallel AC combined with TS. Master/worker paradigm. Central memory. Different tabu list sizes.
2000  COSEARCH       Cooperation between TS, GA, and local search. Communication via adaptive memory.
2000  ATeam          Asynchronous Team combining different heuristics. Cooperation by means of two shared memories.
2000  PTS            TS processes cooperating through a central process. It uses different initial solutions and search strategies.
2001  CAC            Cooperative AC. Two ant colonies using different parameter settings and different heuristics.
2001  HM4C           Two-phase metaheuristic combining ES and multiple TS threads with different parameter settings.
2001  MCAA           Multicolony Ant Algorithm for bicriterion optimization. Each ant and each colony uses a different weight.
2002  GAACO          A GA and an ACO, running and cooperating in parallel.
2002  CPTS           Several TS threads with different search strategies. Asynchronous cooperation through a pool of solutions.
2002  DPM            Distributed Parallel Metaheuristic executing different configurations of GRASP and VNS.
2002  HFC-EA         Hierarchical model. Subpopulations can only accommodate individuals within a specified fitness range.
2003  Hy3            Physical parallelization of the GD-RCGA [52]. Synchronous vs. asynchronous versions of the model.
2003  PMSATS         Parallel multilevel algorithm. Mix of SA and TS. Several annealing parameters used in each process.
2004  HY4            Extension of the Hy3 [9, 11] model. 16 subpopulations arranged in a hypercube topology of 4 dimensions.
2004  CPM-VRPTW      Cooperative framework composed of two TS, two EAs, and local search. Central-memory communication.
Table 17.2 Some applications of heterogeneous metaheuristics

[Table 17.2 pairs each application domain with the references that address it. The domains covered are: training of artificial neural networks, the travelling salesman problem, the frequency modulation sounds problem, job shop scheduling, structural optimum design, graph partitioning, aeronautic engineering problems, artificial evolution of fuzzy controllers, optimization of material distribution, software engineering problems, the file allocation problem, data mining, the knapsack problem, Walsh polynomials, the quadratic assignment problem, dynamic optimization problems, assignment of frequencies in cellular networks, transport lot sizing and transporter allocation, digital and analog circuit optimization, labor constrained scheduling, the travelling purchaser problem, single machine total tardiness, the vehicle routing problem with time windows, capacitated multicommodity network design, and transportation problems.]
Table 17.3 Trajectory-based heterogeneous metaheuristics

[Table 17.3 lists, for each trajectory-based heterogeneous metaheuristic surveyed (CMPTS, CPSA, CPTS, DPM, DSA, FTPH, HTPS, MMC, PSA, PATS, PMSATS, PTSA, PTS, TPSA, TPSA/AN, TTS), its reference, whether it is parallel, whether the search threads communicate, and the connection topology (star, ring, ladder, or fully connected).]
tion topology among threads. We have presented this information in several tables because we also want to differentiate whether threads use one or more solutions at the same time [21]. Metaheuristics working on single solutions are called trajectory-based metaheuristics (Table 17.3). On the contrary, population-based metaheuristics (Table 17.4) perform search procedures which describe the evolution of a set of points in the search space. As can be seen, the latter ones are more numerous because of the intensive research in the EC field. Of course, mixed models can also be found (Table 17.5).

17.3 TAXONOMY OF PARALLEL HETEROGENEOUS METAHEURISTICS

In this section, we describe our taxonomy of PHMs. We present the hierarchical and flat classification schemes in Sections 17.3.1 and 17.3.2, respectively.

17.3.1 Hierarchical Classification

The structure of the hierarchical portion of the taxonomy is shown in Figure 17.1. A discussion about each class follows.
Fig. 17.1 Classification of Parallel Heterogeneous Metaheuristics.
17.3.1.1 Different Search Method. At the first level, the search threads of a PHM can use different search methods or a single method. In the first case, each search thread is guided by a different metaheuristic. Collaborations with exact techniques also appear in the literature [33]. This kind of heterogeneity allows us to design the most robust search methods, since different metaheuristics, each with its own balance between intensification and diversification, are combined. For example, A. Le Bouthillier and T.G. Crainic present in [69] a parallel cooperative multisearch method based on a solution warehouse, applied to the Vehicle Routing Problem with Time Windows (VRPTW). Cooperation is achieved through asynchronous exchanges of information, shared by means of a solution warehouse or pool of solutions. When a thread improves a solution or identifies a new best solution, it sends that solution to the warehouse.
Table 17.4 Population-based heterogeneous metaheuristics. The algorithms surveyed are ACS, ADS-EEA, AGA, AHFC-EA, AIPGA, ANTabu, APGA, CAC, CE, CGATTO, CGA, COMPETants, CoPDEB, PcGA, CS-EEA, CS, DCMOGADES, DDGA, DEA, DEGA, DGA/rmr, dGA, DiiGA, EDGA, fGA, GAACO, GAMAS, GDRCGA, HD-RCGA, HDEA, HDGA, HFC-EA, Hy3, Hy4, iiGA, MACA, MACS-VRPTW, MCAA, MEA, MGA, MOGADES, mp-fGA, nGA (DAGA2), o-fGA, p-fGA, PACO, PGA-DCP, PGA-PA, PGA, PLGA, RDGA, RHGA, SAGA, SIM, Type I bGA, and Type II bGA; for each one the table reports its reference, whether it is parallel, whether the threads communicate, and the communication topology (star, ring, random ring, mesh, torus, hypercube, hierarchy, master/slave, or fully connected).
Table 17.5 Heterogeneous metaheuristics composed of population- and trajectory-based algorithms. The algorithms surveyed are ATeam, COSEARCH, CPM-VRPTW, GCPSA, HPMH, PACO, PHMH, PSA/ANGA, and TECHS; for each one the table reports its reference, whether it is parallel, whether the threads communicate, and the communication topology (directed graph, star, or fully connected).
The cooperative framework is composed of two different TSs and two EAs (in this case the EAs are the same but use different crossover mechanisms). These EAs use the solution warehouse as their populations. Construction and improvement heuristics are also included to generate an initial population and to perform post-optimization. The solution warehouse keeps good, feasible solutions and is dynamically updated by the independent search processes. It is divided into two subpopulations: in-training and adult. All solutions received from the independent processes are placed in the in-training part. The post-optimization procedure is then applied and the resulting solution is moved to the adult subpopulation. Duplicate solutions are eliminated. All requests for solutions initiated by the independent processes are satisfied by the adult subpopulation. Solutions are randomly selected according to probabilities biased toward the best, based on the same function used to order solutions in the warehouse. The solution warehouse provides starting and diversification solutions to the TS procedures and parents to the EAs.

17.3.1.2 Same Search Method. Heterogeneous metaheuristics can also be composed of several search threads that use the same optimization technique but configured differently. We want to note that, from the point of view of the metaheuristics considered in this work, parallel homogeneous metaheuristics are a subtype of PHMs (Fig. 17.1). The PHMs using differently configured search methods can in turn be classified according to the level at which changes have been introduced in each configuration. As depicted in Fig. 17.1, we distinguish between different parameter settings, different operators, and different representations. Of course, these are not disjoint sets, so we can find a PHM whose threads have different search parameter settings and also use different search operators [5, 10, 34].

Parameters. This is the most widespread class of PHM, since it represents a direct extension of the canonical homogeneous models. It consists in using different parameters for the underlying metaheuristic that guides the search threads. We refer, for example, to the utilization of different crossover and/or mutation rates in each subpopulation of a GA [98], or different initial temperatures in a parallel SA [66, 71]. In addition, we
can consider that these parameters could be initially preprogrammed [3, 4], randomly chosen during the evolution [55, 78], or could follow an adaptive strategy [54, 91, 93]. This kind of heterogeneity has also been used to enhance the effectiveness of TS algorithms [5, 28] and ACO systems [47, 60].
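To make the idea concrete, the following minimal sketch shows parameter-level heterogeneity in a distributed GA; the island count, the particular rate settings, and the one-max objective are illustrative assumptions and are not taken from any of the systems cited above.

```python
import random

def one_max(bits):
    # Toy fitness: number of ones in the bit string.
    return sum(bits)

def evolve_island(pop, crossover_rate, mutation_rate, generations=50):
    """Run a minimal generational GA on one island with its own rates."""
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < len(pop):
            # Binary tournament selection.
            p1 = max(random.sample(pop, 2), key=one_max)
            p2 = max(random.sample(pop, 2), key=one_max)
            child = p1[:]
            if random.random() < crossover_rate:            # one-point crossover
                cut = random.randrange(1, len(p1))
                child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if random.random() < mutation_rate else b
                     for b in child]                         # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=one_max)

random.seed(1)
length, pop_size = 40, 20
# Heterogeneity at the parameter level: every island gets different rates.
island_settings = [(0.9, 0.01), (0.6, 0.05), (0.3, 0.10)]
islands = [[[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
           for _ in island_settings]

for (cx, mut), pop in zip(island_settings, islands):
    best = evolve_island(pop, cx, mut)
    print(f"island(cx={cx}, mut={mut}) best fitness = {one_max(best)}")
```

In a physically parallel implementation each call to evolve_island would run on its own processor; as written the islands do not communicate, which corresponds to the independent run model discussed in Section 17.3.2.2, and migration or adaptive rate control could be layered on top.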
Operators. In general, metaheuristics use heuristic operators to perform the exploration of the search space. Examples are the crossover and mutation operators in GAs or the solution construction procedure of the ants in ACO. Many variants of each operator have appeared in each research field. Thus, it is possible to use a different procedure for the same heuristic operation in each search thread of a parallel metaheuristic. This kind of heterogeneity is very popular in the EC field, since there exists a large number of different crossover and mutation operators [11, 39, 52, 106]. Some references can also be found in the TS field [5], in ACO systems [34, 42], and with GRASP (Greedy Randomized Adaptive Search Procedure) and VNS (Variable Neighborhood Search) [37].

Representation. This is a more subtle kind of heterogeneity in which each search thread uses a different solution representation to explore the search space. A well-known example is the iiGA [72], although additional works exist [40, 116, 103]. The iiGA (Injection Island GA) is a heterogeneous distributed GA where each subpopulation stores search space solutions coded with different resolutions. Subpopulations inject their best individual into higher resolution subpopulations for fine-grained modifications. This allows the search to occur in multiple codings, each focusing on different areas of the search space. An important advantage is that the search space in subpopulations with lower resolution is proportionally smaller; in this way, fit solutions are found quickly and are then injected into higher resolution subpopulations for refinement. The migration rules of the iiGA are the following. An iiGA may have a number of different block sizes in use in its subpopulations. To allow interchange of individuals, only a one-way exchange of information is allowed, where the direction is from a low resolution to a high resolution node. Solution exchange from one node type to another requires translation to the appropriate block size, which is done without loss of information from low to high resolution. All nodes inject their best individual into a higher resolution node for "fine-grained" modification. This establishes a hierarchy of exchange that allows us to design different migration topologies.
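The one-way, coarse-to-fine exchange just described can be sketched as follows; the bit-string encoding, the block sizes, and the worst-replacement rule are assumptions made only for illustration, since the original iiGA uses problem-specific codings and migration topologies.

```python
import random

def refine(bits, factor):
    """Translate a coarse individual to a finer resolution by repeating
    each bit 'factor' times (no information is lost going low -> high)."""
    return [b for b in bits for _ in range(factor)]

def fitness(bits):
    # Toy objective evaluated at the finest resolution: number of ones.
    return sum(bits)

random.seed(0)
# Islands ordered from coarse to fine resolution (genotype lengths 4, 8, 16
# over a 16-bit phenotype, i.e., block sizes 4, 2, 1).
resolutions = [4, 8, 16]
islands = {r: [[random.randint(0, 1) for _ in range(r)] for _ in range(10)]
           for r in resolutions}

def evaluate(ind, resolution, target_len=16):
    # Every individual is scored after translation to the finest resolution.
    return fitness(refine(ind, target_len // resolution))

# Injection migration: each island sends its best individual one way,
# to the next higher-resolution island only.
for coarse, fine in zip(resolutions, resolutions[1:]):
    best = max(islands[coarse], key=lambda ind: evaluate(ind, coarse))
    migrant = refine(best, fine // coarse)       # translate to finer block size
    worst = min(range(len(islands[fine])),
                key=lambda i: evaluate(islands[fine][i], fine))
    islands[fine][worst] = migrant               # inject, replacing the worst
    print(f"injected best of {coarse}-bit island into {fine}-bit island")
```

The essential property the sketch reproduces is that the translation from a low to a high resolution loses no information, so injected migrants can be refined further by the receiving subpopulation.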
17.3.2 Flat Classification

17.3.2.1 Hardware versus Software. All the above mentioned PHMs are software heterogeneous: the difference between the exploration methods of the search threads is defined by the software design. However, the homo/heterogeneity could first be understood as a term referring to the execution platform, where each search thread
executes on different hardware and/or operating systems. The behavior of heuristic techniques, in which stochastic processes are usually involved, is strongly determined by the platform-specific random number generator. In this case, we consider a software homogeneous metaheuristic running on heterogeneous hardware to be a PHM [7, 12]. Of course, there also exist software heterogeneous metaheuristics where the executing platform is also heterogeneous [31, 95].
17.3.2.2 Independent Run versus Collaborative Search Thread. According to whether the threads communicate with each other, we distinguish two execution models of PHMs: independent run and collaborative search thread. In the independent run model there is no communication between the search threads, i.e., they do not interact during the exploration, and therefore the search is performed independently by each thread. Typically, at the end of the computation the best of the solutions is presented as the output result and the others are discarded [17, 71]. If the search threads share information collected along the trajectories they investigate, we obtain the collaborative search thread model. In the EC field this interaction, called migration, is widely used and consists in exchanging one or more individuals between the subpopulations of the algorithm [99]. However, PHMs involving trajectory-based algorithms usually use a pool of solutions as a central point of communication among the different threads [70].

17.3.2.3 Competition versus Cooperation. Another orthogonal level of heterogeneity can be defined with respect to the relationship maintained among the elementary metaheuristics in the algorithm. Basically, if the amount of resources a thread uses to perform the search (number of individuals, sizes of tabu lists, etc.) is not constant during the evolution, i.e., it depends on the previous success of its search strategy, then the threads can be said to be competing. Otherwise, the subpopulations collaborate to find the optimum. Hence, we differentiate between competition-based heterogeneity [57, 83, 120] and collaboration-based heterogeneity [52, 28].
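The collaborative search thread model of Section 17.3.2.2 can be reduced to a few lines: several differently configured threads improve their own solutions and interact only through a central pool, in the spirit of the solution warehouse of [69]. The toy objective, the step sizes, and the acceptance rule below are assumptions of this sketch, not of any particular published algorithm.

```python
import random
import threading

def objective(x):
    # Toy minimization problem.
    return (x - 3.0) ** 2

class SolutionPool:
    """Central pool: threads deposit improvements and draw starting points."""
    def __init__(self):
        self.best = random.uniform(-10, 10)
        self.lock = threading.Lock()

    def deposit(self, x):
        with self.lock:
            if objective(x) < objective(self.best):
                self.best = x

    def draw(self):
        with self.lock:
            return self.best

def search_thread(pool, step_size, iterations=2000):
    # Heterogeneity: each thread perturbs with a different step size.
    x = pool.draw()
    for _ in range(iterations):
        candidate = x + random.uniform(-step_size, step_size)
        if objective(candidate) < objective(x):
            x = candidate
            pool.deposit(x)          # cooperate through the central pool
        elif random.random() < 0.01:
            x = pool.draw()          # occasionally restart from the pool's best

pool = SolutionPool()
threads = [threading.Thread(target=search_thread, args=(pool, s))
           for s in (0.1, 1.0, 5.0)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("best solution found:", pool.best)
```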
17.4 FRAMEWORKS FOR HETEROGENEOUS METAHEURISTICS

In this section we present some frameworks that enable the development of PHMs. We distinguish two kinds of frameworks for PHMs: implementation-oriented and model-oriented.
Implementation-Oriented Frameworks. These are focused on the software implementation of metaheuristics. In this sense, a framework may be defined as a set of classes that embody an abstract design for solutions to a family of related problems [63]. For a further review of the frameworks and software class libraries for metaheuristics that have been proposed in the literature, the reader is referred to [115].
Model-Oriented Frameworks. These frameworks are intended for modelling metaheuristics. They are conceptual formalisms allowing the definition of new search models where the heterogeneity is a key factor. In general, they do not impose any constraint with respect to the subsequent implementation.
In the following sections we review, from a heterogeneity-enabled point of view, some of the most important frameworks for metaheuristics in the literature.
17.4.1 Implementation-Oriented Frameworks

ParadisEO. ParadisEO (Parallel and Distributed Evolving Objects) [23] is an extension of the Evolving Objects (EO) library (http://eodev.sourceforge.net), which is devoted to the design of serial EAs. The extensions include local search, parallel and distributed computing, and hybridization. ParadisEO has an architecture with three layers identifying three major categories of classes: solvers, runners, and helpers. Helpers and runners are dedicated to the implementation of the metaheuristics, whereas the solvers control the evolution process and/or the search. Two types of solvers can be distinguished: Single Metaheuristic Solvers and Multiple Metaheuristic Solvers. The latter are the more complex ones, as they control and sequence several metaheuristics that can be heterogeneous [23].
DREAM. The Distributed Resource Evolutionary Algorithm Machine (DREAM) [13] is a framework for the production of distributed EAs and systems of evolving agents that use the Internet to allow distributed processing. It is based on JEO (Java Evolving Objects, the Java version of EO), and only the island model is implemented. The first source of heterogeneity in this framework comes from the Internet resources, which usually are heterogeneous, so hardware heterogeneous EAs (see Section 17.3.2) can easily be built. On the other hand, heterogeneous distributed EAs can also be defined with JEO, because different search strategies can be defined in each island.
17.4.2 Model-Oriented Frameworks

COP. This is a framework for modelling simple and parallel EA implementations as Cooperating Populations (COP) [4]. It is a recursive multilevel model that allows the implementation of any type of EA on any kind of hardware layout. COP can be described as follows. The initial level consists of populations of individuals living in islands that are able to communicate with each other. At a second level, populations of islands can form a larger entity, with each island regarded as an individual. At a higher level, these larger entities can also form a population, each one regarded as an individual. The procedure can continue to the required level. The
initial level of COP (COP0) can describe both cellular EAs and simple (sequential) EA implementations, so each individual of COP0 corresponds to a gene. CoPDEB [3] is at level 1, i.e., COP1. In COP1 each individual is a COP0 and corresponds to an island. Each island has its own population, which evolves in parallel with the populations of the other islands.
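The recursive structure can be captured by a very small data type; this is only a conceptual sketch, since COP is a modelling framework and does not prescribe an implementation.

```python
class Cop:
    """Level-0 COP: a plain population of individuals.
    Level-k COP: a population whose 'individuals' are level-(k-1) COPs."""
    def __init__(self, members, level=0):
        self.members = members
        self.level = level

    def flatten(self):
        # Collect the base-level individuals of the whole hierarchy.
        if self.level == 0:
            return list(self.members)
        return [ind for cop in self.members for ind in cop.flatten()]

# COP^0: a simple (or cellular) EA population of individuals.
cop0_a = Cop(members=["ind1", "ind2", "ind3"])
cop0_b = Cop(members=["ind4", "ind5", "ind6"])
# COP^1: an island model such as CoPDEB; each "individual" is an island.
cop1 = Cop(members=[cop0_a, cop0_b], level=1)
# COP^2: a population of island models, and so on recursively.
cop2 = Cop(members=[cop1], level=2)
print(cop2.flatten())
```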
MAS-DGA. MAS-DGA [81] is a framework that models and implements dGAs as multiple genetic search agents. In a MAS-DGA instance, each basic GA is encapsulated into an agent, an autonomous entity that must keep knowledge of the search, learning, or optimization problem on which it should operate. Agents are coordinated through a set of rules stipulating the topological and communication (migration) aspects, and these rules may be fixed a priori or set at runtime via a coordination entity (meta-agent). Agents in MAS-DGA allow heterogeneous dGAs to be built, since they can be homogeneous or heterogeneous in accordance with the tasks they tackle, the settings of the GAs they represent, or the embedded knowledge with which they are endowed.

MAGMA. MAGMA [90] is a multiagent architecture conceived as a conceptual and practical framework for metaheuristic algorithms (MAGMA stands for MultiAGent Metaheuristics Architecture). The architecture contains different kinds of agents, with different functionalities, perspectives, and goals. Agents can be either homogeneous or heterogeneous, and they act in a multilevel architecture in which each level corresponds to a different level of abstraction. At each level there are one or more specialized agents, each implementing an algorithm. LEVEL-0 provides a feasible solution (or a set of solutions) for the upper level: it can be considered the solution level. LEVEL-1 deals with solution improvement, and its agents perform local search until a termination condition is verified: this can be defined as the level dealing with the neighborhood structure. LEVEL-2 agents have a global view of the search space or, at least, their task is to guide the search toward promising regions; therefore, this can be defined as the landscape level. A fourth level, LEVEL-3, can be introduced to describe higher level strategies such as cooperative search and any other combination of metaheuristics; therefore, this can be defined as the coordination level, and it deals with different landscapes and strategies. Communications between any two levels are possible; therefore an algorithm can be described as the result of the interaction of several agents (algorithmic components), each specialized for a specific task. The most effective metaheuristic algorithms can be described in terms of MAGMA agents: GRASP, ACO, ILS, or EAs.

17.5 CONCLUDING REMARKS

In this work we have presented a classification of PHMs. These techniques are composed of several search threads, each following a different strategy for exploring the search space. In general, PHMs not only allow us to solve larger problems
due to their parallelism, but are also able to find improved solutions with respect to their homogeneous counterparts (even when running on sequential computing platforms), thus leading to more robust algorithms. The use of multiple threads with different search strategies in a broader exploration of the solution space allows for greater diversity and a deeper investigation of this space, owing to improved mechanisms for search intensification and diversification. From the classification, it can be observed that the thread explorations of the earlier PHMs were usually guided by the same metaheuristic but configured differently. With the growing number of new metaheuristics, collaborative searches involving different methods have become more usual. In general, if trajectory-based metaheuristics are used in this collaboration, the communication is performed through a central pool of solutions. Population-based PHMs typically collaborate by transferring solutions between subpopulations (called migration in the EC field). We are especially concerned with offering useful and rigorous material that can help new and expert practitioners with PHMs. An extensive review of the literature has also been carried out. There are many papers to be considered, so the exclusion of any particular result has neither been intentional nor should it be considered a judgment of that work.

17.6 ANNOTATED BIBLIOGRAPHY

In this section, examples are taken from the published literature to show many different heterogeneous algorithms that fit into our classification. Although considerable effort has been devoted to collecting many references, the list is certainly not complete. The references are ordered by year and, for each PHM, we present its kind of heterogeneity (hierarchical and flat classification attributes), some additional information such as whether or not it has been physically parallelized, the implementation language, its communication topology, and, finally, a brief description.
The annotated bibliography is presented as a table spanning several pages. For each PHM, ordered by year from 1987 to 2004, it lists the reference, the algorithm name, its homogeneous/heterogeneous classification attributes, whether it was physically parallelized and whether its threads communicate, the implementation language (typically C, C++, Java, Matlab, FORTRAN, or OCCAM, with PVM or MPI), the communication topology (star, ring, random ring, ladder, mesh, torus, hypercube, hierarchy, master/slave, or fully connected), and a brief description of the approach; many of these descriptions also appear as the main features column of Table 17.1.
Acknowledgments

This work has been partially funded by the Ministry of Science and Technology and FEDER under contracts TIC2002-04498-C05-02 (the TRACER project) and TIC2002-04309-C02-02.
REFERENCES

1. A. Acan. GAACO: A GA + ACO Hybrid for Faster and Better Search Capability. In ANTS 2002, LNCS 2463, pages 300-301, 2002.
2. P. Adamidis, S. Karzalis, and V. Petridis. Advanced Method for Evolutionary Optimization. In Symposium on Large Scale Systems: Theory and Applications, Patras, Greece, 1998.
3. P. Adamidis and V. Petridis. Co-operating Populations with Different Evolution Behaviors. In Proc. of the Third IEEE Conf. on Evolutionary Computation, pages 188-191, New York, 1996. IEEE Press. 4. P. Adamidis and V. Petridis. On Modelling Evolutionary Algorithm Implementations through Co-operating Populations. In Parallel Problem Solving from Nature (PPSN VII), pages 321-330. Springer-Verlag, 2002. 5. R.M. Aiex, S.L. Martins, C.C. Ribeiro, and N.R. Rodriguez. Cooperative Multi-Thread Parallel Tabu Search with an Application to Circuit Partitioning. LNCS 1447, pages 310-331, 1998.
6. A. Al-Yamani, S.M. Sait, H. Barada, and H. Youssef. Parallel Tabu Search in a Heterogeneous Environment. In Proc. of the Int. Parallel and Distributed Processing Symp., pages 56-63, 2003. 7. A. Al-Yamani, S.M. Sait, and H.R. Barada. HPTS: Heterogeneous Parallel Tabu Search for VLSI Placement. In Proc. of the 2002 Congress on Evolutionary Computation, pages 351-355, 2002.
8. E. Alba, J.F. Chicano, F. Luna, G. Luque, and A.J. Nebro. Advanced Evolutionary Algorithms for Training Neural Networks. In S. Olariu and A.Y. Zomaya, editors, Handbook of Bioinspired Algorithms and Applications, chapter 28, 2005. 9. E. Alba, F. Luna, and A.J. Nebro. Parallel Heterogeneous Genetic Algorithms for Continuous Optimization. In IPDPS-NIDISC'03, page 147, Nice (France), 2003. 10. E. Alba, F. Luna, and A.J. Nebro. Advances in Parallel Heterogeneous Genetic Algorithms for Continuous Optimization. Int. Journal of Applied Mathematics and Computer Science, 14(3):101-117, 2004.
11. E. Alba, F. Luna, A.J. Nebro, and J.M. Troya. Parallel Heterogeneous Genetic Algorithms for Continuous Optimization. Parallel Computing, 30(5-6):699-719, 2004.
12. E. Alba, A.J. Nebro, and J.M. Troya. Heterogeneous Computing and Parallel Genetic Algorithms. Journal of Parallel and Distributed Computing, 62: 13621385,2002. 13. M.G. Arenas, P. Collet, A.E. Eiben, M. Jelasity, J.J. Merelo, B. Paechter, M. Preussand, and M. Schoenauer. A Framework for Distributed Evolutionary Algorithms. In Proc. of the PPSN VII, LNCS 2439, pages 665-675,2002. 14. R. Baiios, C. Gil, J. Ortega, and F.G. Montoya. Optimising Graphs Partitions using Parallel Evolution. In 6th International Conference on Artijicial Evolution, EA’O3, LNCS 2936, pages 119-130,2003. 15. R. Baiios, C. Gil, J. Ortega, and F.G. Montoya. A Parallel Multilevel Metaheuristic for Graph Partitioning. Journal ofHeuristics, 10(3):315-336,2004. 16. R. Baiios, C. Gil, J. Ortega, and F.G. Montoya. Parallel Heuristic Search in Multilevel Graph Partitioning. In 12th Euromicro Conference on Parallel, Distributed and Network-based Processing, pages 88-95. IEEE Press, 2004. 17. V. Bachelet, P. Preux, and E.-G. Talbi. Parallel Hybrid Meta-Heuristics: Application to the Quadratic Assignment Problem. In Proceedings of the Parallel Optimization Colloquium, 1996. 18. V. Bachelet and E.-G. Talbi. COSEARCH: a Co-Evolutionary Metaheuristic. In Proc. of the 2000 Congress on Evolutionary Computation, volume 2, pages 1550-1557, July 2000. 19. J. Berger and M. Barkaoui. A Parallel Hybrid Genetic Algorithm for the Vehicle Routing Problem with Time Windows. Computer & Operation Research, 31(12):2037-2053,2004. 20. F. Berman, G.C. Fox, and A.J.G. Hey. Grid Comptuing. Making the Global Infrastructure a RealiQ. Communications Networlung and Distributed Systems. Wiley, 2003. 2 1. C. Blum and A. Roli. Metaheuristics in Combinatorial Optimization: Overview and Conceptual Comparison. ACM Computing Surveys,35(3):268-308,2003. 22. A. Bortfeldt, H. Gehnng, and D. Mack. A Parallel Tabu Search Algorithm for Solving the Container Loading Problem. Parallel Computing, 29:64 1 4 6 2 , 2003. 23. S. Cahon, N. Melab, and E.-G. Talbi. ParadisEO: A Framework for the Reusable Design of Parallel and Distributed Metaheuristics. Journal ofHeuristics, 10:357380,2004.
24. P. Caricato, G. Ghiani, A. Grieco, and E. Guerriero. Parallel TS for a Pickup and Delivery Problem Under Track Contention. Parallel Computing, 29:63 1-639, 2003. 25. B. Carse, A.G. Pipe, and 0. Davies. Parallel Evolutionary Learning of Fuzzy Rule Bases Using The Island Injection Genetic Algorithm. In Proc. of the 1997 IEEE Int. Con$ on System, Man, and Cybernetics, volume 4, pages 3692-3697, October 1997. 26. C.B. Cavalcante, V.C. Cavalcante, C.C. Ribeiro, and C.C. De Souza. Parallel Cooperative Approaches for the Labor Constrained Scheduling Problem. In Essays and Surveys in Metaheuristics, Kluwer, pages 20 1-225,2000. 27. F. Corno, P. Prinetto, M. Rebaudengo, and M. Sonza Reorda. Exploiting Competing Subpopulations for Automatic Generation of Test Sequences for Digital Circuits. In Parallel Problem Solving from Nature (PPSN IV), pages 792-800, Berlin, Germany, September 1996. 28. T.G. Crainic and M. Gendreau. Cooperative Parallel Tabu Search for Capacitated Network Design. Journal of Heuristics, 8:601427, 2002. 29. T.G. Crainic and M. Toulouse. Parallel Metaheuristics. In T.G. Crainic and G. Laporte, editors, Fleet Management and Logistics, pages 205-25 1. Kluwer Academic Publisher, 2003. 30. T.G. Crainic and M. Toulouse. Parallel Strategies for Metaheuristics. In F.W. Glover and G.A. Kochenberger, editors, Handbook of Metaheuristics, pages 475-514,2003. 31. T.G. Crainic, M. Toulouse, and M. Gendreau. Parallel Asynchronous Tabu Search for Multicommodity Location-Allocation with Balancing Requirements. Annals of Operations Research, 63 1277-299, 1996. 32. V.-D. Cung, S.L. Martins, C.C. Ribeiro, and C. Roucairol. Strategies for the Parallel Implementation of Metaheuristics. In C.C. Ribeiro and P. Hansen, editors, Essays and Surveys in Metaheuristics, pages 263-308. Kluwer, 2003. 33. J. Denzinger and T. Offermann. On Cooperation Between Evolutionary Algorithms and Other Search Paradigms. In Proc. of the 1999 Congress on Evolutionary Computation, pages 23 17-2324, Washington, DC, USA, July 1999. 34. K. Doerner, R.F. Hartl, and M. Reimann. Are COMPETants more competent for problem solving? - The Case of a Multiple Objective Transportation Problem. In Proc. of the Genetic and Evolutionary Computation Conference GECCO '01, page 802,2001. 35. K. Doerner, R.F. Hartl, and M. Reimann. Cooperative Ant Colonies for Optimizing Resource Allocation in Transportation. In Applications of Evolutionary Computing, LNCS 2037, pages 70-79,2001.
36. D.J. Doorly and J. Peiro. Supervised Parallel Genetic Algorithms in Aerodynamic Optimisation. In Proc. of the 13th AIM CFD Conference,pages 2 10-2 16, 1997. 37. L.M.A. Drummond, L.S. Vianna, M.B. Silva, andL.S. Ochi. DistributedParallel Metaheuristics Based on GRASP and VNS for Solving the Travelling Purchaser Problem. In Proc. of the Ninth Int. Con$ on Parallel and Distributed Systems, pages 257-263,2002. 38. D. Eby, R.C. Averill, B. Gelfand, W.F. Punch, 0. Mathews, and E.D. Goodman. An Injection Island GA for Flywheel Design Optimization. In Proc. EUFIT '97, - 5th European Congress on Intelligent Techniques and Soft Computing, September 1997. Competing 39. A.E. Eiben, I.G. Sprinkhuizen-Kuyper, and B.A. Thijssen. Crossovers in an Adaptive GA Framework. In Proc. of the 5th IEEE Conference on Evolutionary Computation, pages 787-792. IEEE Press, 1998. 40. C. Fonlupt, P. Preux, and D. Robilliard. Preventing Premature Convergence via Cooperating Genetic Algorithms. In Proc. of the 3rd Int. Mendel Conference on Genetic Algorithms, Optimization problems, Fuzzy Logic, Neural networks, Rough Sets (MENDEL'97), pages 50-55,1997. 4 1. J. Frey, R. Gras, P. Hernandez, and R. Appel. A Hierarchical Model of Parallel Genetic Programming Applied to Bioinformatic Problems. In PPAM 2003, LNCS 3019, pages 1 1 4 6 1153,2004. 42. L.M. Gambardella, E.D. Taillard, and G. Agazzi. MACS-VRPTW: A Multiple Ant Colony System for Vehicle Routing Problems with Time Windows. In D. Corne, M. Dorigo, and F. Glover, editors, New Ideas in Optimization, pages 63-76. McGraw-Hill, 1999. 43. H. Gehring and J. Homberger. A Parallel Two-Phase Metaheuristic for Routing Problems with Time Windows. Asia-Pa@ Journal of Operational Research, 18:3547,2001. 44. H. Gehring and J. Homberger. Parallelization of a Two-Phase Metaheuristic for Routing Problems with Time Windows. Journal of Heuristics, 8(3):25 1-276, 2002. 45. F.W. Glover. Future Paths for Integer Programming and Links to Artificial Intelligence. Computers & Operations Research, 13533-549, 1986. 46. F.W. Glover and G.A. Kochenberger. Handbook ofktetaheuristics. Int. Series in Operations Research and Management Sciences. Kluwer Academic Publisher, 2003. 47. M. Guntsch and M. Middendorf. Solving Multi-Criteria Optimization Problems with Population-Based ACO. In Second Int. Con$ on Evolutionary MultiCriterion optimization, LNCS 2632, pages 464478,2003.
48. G.R. Harik and F.G. Lobo. A Parameter-Less Genetic Algorithm. In Proc. of the Genetic and Evolutionary Computation Conference (GECCO’99), pages 258-267. Morgan Kaufmann, 1999. 49. M. Herdy. Reproductive Isolation as Strategy Parameter in Hierarchical Organized Evolution Strategies. In PPSN 2, pages 207-217, 1992. 50. F. Herrera and M. Lozano. Gradual Distributed Real-Coded Genetic Algorithms. Technical Report 97-01-03, DECSAI, 1997.
5 1. F. Herrera and M. Lozano. Heterogeneous Distributed Genetic Algorithm Based on the Crossover Operator. In Second IEEIIEEE Int. Con$ on Genetic Algorithms in Engineering Systems: Innoilations and Applications, pages 203-208. IEEE Press, 1997. 52. F. Herrera and M. Lozano. Gradual Distributed Real-Coded Genetic Algorithms. IEEE Transactions on Evolutionary Computation, 4( 1):43-63, 2000. 53. F. Herrera, M. Lozano, and C. Moraga. Hybrid Distributed Real-Coded Genetic Algorithms. In Parallel Problem Solving.from Nature (PPSN v),pages 879-888, 1998. 54. R. Hinterding, Z. Michalewicz, and T.C. Peachey. Self-Adaptive Genetic Algorithm for Numeric Functions. In Parallel Problem Solving from Nature (PPSN Iv), pages 420429, 1996. 55. T. Hiroyasu, M. Milu, and M. Negami. Distributed Genetic Algorithms with Randomized Migration Rate. In Proc. of the IEEE Con$ of Systems, Man and Cybernetics, volume 1, pages 689-694. IEEE Press, 1999. 56. H. Horii, S. Kunifuji, and T. Matsuzawa. Asynchronous Island Parallel GA Using Multiform Subpopulations. In Simulated Evolution and Learning: Second AsiaPacific Con$ on Simulated Evolution and Learning (SEAL’98), LNCS, pages 122-129, Camberra, Australia, November 1998. 57. J.J. Hu and E.D. Goodman. The Hierarchical Fair Competition (HFC) Model for Parallel Evolutionary Algorithms. In Proc. ofthe 2002 Congress on Evolutionary Computation CEC2002, pages 49-54. IEEE Press, 2002. 58. J.J. Hu, E.D. Goodman, K. Seo, and M. Pei. Adaptive Hierarchical Fair Competition (AHFC) Model for Parallel Evolutionary Algorithms. In Proc. of the Genetic and Evolutionaly Computation Conference GECCO’02, pages 772-779, 2002. 59. J.J. Hu, K. Seo, Z. Fan, R.C. Rosenberg, and E.D. Goodman. HEMO: A Sustainable Multi-Objective Evolutionary Optimization Framework. In Proc. of the 2003 Genetic and Evolutionary Computing Conference, pages 1764-1775, Chicago, July 2003. Springer, Lecture Notes in Computer Science.
60. S. Iredi, D. Merkle, and M. Middendorf. Bi-Criterion Optimization with Multi Colony Ant Algorithms. In First Int. Con$ on Evolutionary Multi-Criterion Optimization, pages 359-372. LNCS 1993. Springer Verlag, 200 1. 6 1. R.D. Janaki and K. Ganeshan. DTSS: A System for Implementing and Analyzing Distributed Algorithms. Technical Report IITM-CSE-93-001, Indian Institute of Technology, Madras, 1993. 62. D. Janaki Ram, T.H. Sreenivas, and K.G. Subramaniam. Parallel Simulated Annealing Algorithms. Journal of Parallel and Distributed Computing, 37:207212, 1996. 63. P. Kall and H.-J. Luthi, editors. Building Reusable Software Components for Heuristic Search, 1999. 64. J. Kamiura, T. Hiroyasu, M. Miki, and S. Watanabe. MOGADES: MultiObjective Genetic Algorithm with Distributed Environment Scheme. In Second International Workshop on Intelligent Systems Design and Applications, pages 143-148, Atlanta, Georgia, August 2002. 65. H. Kawamura, M. Yamamoto, K. Suzuh, and A. Ohuchi. Multiple Ant Colonies Algorithm Based on Colony Level Interactions. IEICE Transactions on Fundamentals of Electronics, E83-A(2):371-379,2000. 66. K. Konishi, K. Taki, and K. Kimura. Temperature Parallel Simulated Annealing Algorithm and Its Evaluation. Trans. on Information Processing Society of Japan, 36(4):797-807, 1995. 67. K. Krishna, K. Ganeshan, and D. Janaki Ram. Distributed Simulated Annealing Algorithms for Job Shop Scheduling. IEEE Trans. on System, Man, and Cybernetics, 25(7):1102-1109, 1995. 68. A. Kumar, A. Srivastava, A. Singru, and R.K. Ghosh. Robust and Distributed Genetic Algorithm for Ordering Problems. In Proc. of the Fi$h IEEE International Symposium on High Performance Distributed Computing,pages 253-262, 1996. 69. A. Le Bouthillier and T.G. Crainic. A Cooperative Parallel Meta-Heuristic for the Vehicle Routing Problem with Time Windows. Computers & Operations Research, 32(7): 1685-1708,2004. 70. A. Le Bouthillier, T.G. Crainic, and R.K. Keller. Co-Operative Parallel Method for Vehicle Routing Problems with Time Windows. In 4th Metaheuristics International Conference MIC'2001, pages 277-279,200 1, 7 1. S.-Y. Lee and K.-G. Lee. Synchronous and Asynchronous Parallel Simulated Annealing with Multiple Markov Chains. IEEE Trans. on Parallel and Distributed Systems, 7( 10):993-1008, 1996.
72. S.-L. Lin, W.F. Punch, and E.D. Goodman. Coarse-Grain Parallel Genetic Algorithms: Categorization and New Approach. In Sixth IEEE Symp. on Parallel and Distributed Processing, pages 28-37. IEEE Press, 1994. 73. J. Lis. Parallel Genetic Algorithm with the Dynamic Control Parameter. In Proc. of the IEEE Int. Conf: on Evolutionary Computation, pages 324-329. IEEE Press. 1996. 74. G. Lo Presti, G. Lo Re, P. Stomiolo, and A. Urso. A Grid Enabled Parallel Hybrid Genetic Algorithm for SPN. In ICCS 2004, LNCS 3036, pages 156-163, 2004. 75. B. Malott, R.C. Averill, S.-C. Ding, W.F. Punch, and E.D. Goodman. Use of Genetic Algorithms for Optimal Design of Laminated Composite Sandwich Panels with Bending-Twisting Coupling. In AIAA SDM (Structures, Dynamics and Materials), 1996. 76. T. Matsumura, M. Nakamura, S. Tamaki, and K. Onaga. A Parallel Tabu Search and Its Hybridization with Genetic Algorithms. In Int. Symp. on Parallel Architectures, Algorithms and Networks, pages 18-22. IEEE, 2000. 77. M. Mih, T. Hiroyasu, and T. Fushimi. Parallel Simulated Annealing with Adaptive Neighborhood Determined by GA. In IEEE Int. Con$ on System, Man, and Cybernetics, pages 2631,2003. 78. M. Miki, T. Hiroyasu, M. Kaneko, and K. Hatanaka. A Parallel Genetic Algorithm with Distributed Environment Scheme. In Proc. of the 1999 IEEE Conf: of Systems, Man and Cybernetics, pages 695-700. IEEE Press, 1999. 79. M. Miki, T. Hiroyasu, M. Kasai, K. Ono, and T. Jitta. Temperature Parallel Simulated Annealing with Adaptive Neighborhood for Continuous Optimization Problem. Computational Intelligence and Applications, pages 149-1 54,2002. 80. A.J. Nebro, F. Luna, and E. Alba. Multi-Objective Optimization Using Grid Computing. Soft Computing Journal. To appear, 2005. 81. E. Noda, A.L.V. Coelho, I.L.M. Ricarte, A. Yamakami, and A.A. Freitas. Devising Adaptive Migration Policies for Cooperative Distributed Genetic Algorithms. In Proc. of the 2002 IEEE Int. ConJ: on System, Man and Cybernetics. IEEE Press. (Published in CD-ROM), 2002. 82. V. Nwana, K. Darby-Dowman, and G. Mitra. A Co-operative Parallel Heuristic for Mixed Zero-One Linear Programming: Combining Simulated Annealing with Branch and Bound. European Journal of Operational Research, 164( 1):223,2004. 83. S.-K Oh, C.-Y. Lee, and J.-J. Lee. A New Distributed Evolutionary Algorithm for Optimization in Nonstationary Environments. In Proc. ofthe 2002 Congress on Evolutionary Computation, pages 378-383. IEEE Press, 2002.
84. T. Okuda, T. Hiroyasu, M. Miki, J. Kamiura, and S. Watanabe. DCMOGADES: Distributed Cooperation model Multi-Objective Genetic Algorithm with Distributed Scheme. In Second International Workshop on Intelligent Systems Design and Applications, pages 155-1 60, Atlanta, Georgia, August 2002. 85. H. Pierreval and J.-L. Paris. Distributed Evolutionary Algorithms for Simulation Optimization. IEEE Transactions on System, Man and Cybernetics, 30( 1):1524,2000. 86. H. Pohlheim. Competition and Cooperation in Extended Evolutionary Algorithms. In L. Spector, editor, Proc. of the Genetic and Evolutionary Computation Conference GECCO’OI, pages 33 1-338. Morgan Kaufmann, 2001. 87. H. Pohlheim, J. Wegener, and H. Sthamer. Testing the Temporal Behavior of Real-Time Engine Control Software Modules using Extended Evolutionary Algorithms. In Computational Intelligence im industriellen Einsatz, pages 6 166. VDI-Verlag, 2000. 88. J.C. Potts, T.D. Giddens, and S.B. Yadav. The Development and Evaluation of an Improved Genetic Algorithm Based on Migration and Artificial Selection. IEEE Transactions on Systems, Man And Cybernetics, 24( 1):73-86, January 1994. 89. W.F. Punch, R.C. Averill, E.D. Goodman, S.-C. Lin, Y. Ding, and Y.C. Yip. Optimal Design of Laminated Composite Structures Using Coarse-Grain Parallel Genetic Algorithms. Computing System in Engineering, 5:415-423, 1994. 90. A. Roli and M. Milano. MAGMA: A Multiagent Architecture for Metaheuristics. IEEE Transactions on System, Man, and Cybernetics - Part B, 34(2):925-94 1, 2004. 9 1. D. Schlierkamp-Voosen and H. Muhlenbein. Strategy Adaptation by Competing Subpopulations. In Proc. ofthe third Int. Con$ on Parallel Problem Solving from Nature (PPSN 111), pages 199-208, Jerusalem, Israel, October 1994. 92. D. Schlierkamp-Voosen and H. Muhlenbein. Adaptation of Population Sizes by Competing Subpopulations. In Proc. of the Int. Con$ on Evolutionary Computation, pages 33Ck335, Nagoya, Japan, 1996. 93. V. Schnecke and 0. Vornberger. An Adaptive Parallel Genetic Algorithm for VLSI-Layout Optimization. In Proc. of thefourth Int. Conf on Parallel Problem Solvingfrom Nature (PPSNIV),pages 22-27, Berlin, Germany, September 1996. 94. E.-G. Talbi. A Taxonomy of Hybrid Metaheuristics. Journal of Heuristics, 8~541-564,2002. 95. E.-G. Talbi, J.-M. Geib, Z. Hafidi, and D. Kebbal. A Fault-Tolerant Parallel Heuristic for Assignment Problems. Future Generutions Computer Systems, 14:425438, 1998.
96. E.-G. Talbi, 0. Roux, C. Fonlupt, and D. Robillard. Parallel Ant Colonies for Combinatorial Optimization Problems. In IEEE IPPS/SPDP '99, pages 239-247, 1999. 97. E.-G. Talbi, 0. Roux, C. Fonlupt, and D. Robillard. Parallel Ant Colonies for the Quadratic Assignment Problem. Future Generation Computer Systems, 17:441-449,2001. 98. R. Tanese. Parallel Genetic Algorithm for a Hypercube. In J.J. Grefenstette, editor, Proc. of the Second Int. Conj: on Genetic Algorithms, pages 177-183, 1987. 99. R. Tanese. Distributed Genetic Algorithms. In J.D. Schaffer, editor, Proc. qfthe Third Int. Conj: on Genetic Algorithms, pages 434439. Morgan Kaufmann, 1989. 100. Y. Tanimura, T. Hiroyasu, and M. Miki. Discussion on Distributed Genetic Algorithms for Designing Truss Structures. In CD-ROM of HPC Asia 2001, 2001. 101. S. Tongchim and P. Chongstitvatana. Adaptive Parameter Control in Parallel Genetic Algorithm. Proc. of Int. Con$ on Intelligent Technologies, 2000. 102. S . Tongchim and P. Chongstitvatana. Parallel Genetic Algorithm with Parameter Adaptation. Information Processing Letters, 82( 1):47-54,2002. 103. S. Tsutsui and Y. Fujimoto. Forking Genetic Algorithm with Blocking and Shnnlung Modes (fGA). In S. Forrest, editor, Proc. of the Fifth Int. Conf on Genetic Algorithms, pages 206-2 13. Morgan Kaufmann, 1993. 104. S. Tsutsui and Y. Fujimoto. Phenotypic Forking Genetic Algorithm (p-fGA). In D. Fogel, editor, Proc. of the IEEE Int. Con$ on Evolutionary Computation, pages 566572, Piscataway, NJ, 1995. IEEE Press. 105. S. Tsutsui, Y. Fujimoto, and I. Hayashi. Extended Forking Genetic Algorithm for Order Representation (0-fGA). In Proc. of the 1st IEEE Conf on Evolutionary Computation, pages 170-175, 1994. 106. S. Tsutsui, A. Ghosh, A. Come, and Y. Fujimoto. A Real Coded Genetic Algorithm with an Explorer and an Exploiter Populations. In Proc. of the 7th Int. Con$ on Genetic Algorithms, pages 238-245, 1997. 107. S. Tsutsui, A. Ghosh, Y. Fujimoto, and D. Come. A Bi-population Scheme for Real-Coded GAS: the Basic Concept. In Proc. of the 1st Int. Workshop on Frontiers in Evolutionary Algorithms, volume 1, pages 39-42, 1997. 108. S. Tsutsui and Y. Ghosh. On Convergence Measures for Order-based Forking Genetic Algorithms. In Proc. of the 1996 Australian New Zealand Con$ on Intelligent Information Systems, pages 280-283, 1996.
109. S. Tsutsui, Y. Ghosh, and M. Takiguchi. Phenotypic Forking GA with Moving Windows. In Proc. of the 1996 Int. Con$ on Neural Information Processing, pages 1335-1340. Springer Verlag, 1996. 110. Q. Tuan Pham. Competitive Evolution: a Natural Approach to Operator Selection. In X. Yao, editor, Progress in Evolutionary Computation, Lecture Notes in Artificial Intelligence, pages 49-60, November 1994. 11 1. R.K. Ursem. Multinational Evolutionary Algorithms. In Proc. of the 1999 Congress on Evolutionary Computation, volume 3, pages 1633-1640. IEEE Press. 1999. 112. R.K. Ursem. Multinational GAS: Multimodal Optimization Techniques in Dynamic Environments. In Proc. of the Second Genetic and Evolutionary Computation Conference (GECCO ’00),volume 1, pages 19-26,2000. 113. H.D. Vekeria and I.C. Parmee. Reducing Computational Expense Associated with Evolutionary Detailed Design. In IEEE Int. ConJ: on Evolutionary Computation, pages 391-396. IEEE Press, April 1997. 114. R. Venkateswaran, Z. Obradovic, and C.S. Raghavendra. Cooperative Genetic Algorithm for Optimization Problems in Distributed Computer Systems. In Proc. of the Second Online Workshop on Evolutionary Computation, pages 49-52.1996. 115. S. Voss and D.L. Woodruff. Optimization Sofmare Class Libraries. Kluwer, 2002. 116. G. Wang, E.D. Goodman, and W.F. Punch. Simultaneous Multi-Level Evolution. GARAGe Technical Report, Department of Computer Science and Case Center for Computer-Aided Engineering & Manufacturing, Michigan State University, 1996. 117. B. Weinberg, V. Bachelet, and E.-G. Talbi. A Co-evolutionnist Meta-heuristic for the Assignment of the Frequencies in Cellular Networks. In First European Workshop on Evolutionary Computation in Combinatorial Optimization (EvoCOP), volume 2037 of LNCS, pages 140-149,2001, 118. N. Xiao and M.P. Armstrong. A Specialized Island Model and Its Applications in Multiobjective Optimization. In Genetic and Evolutionary Computation Conference (GECCO ’03),LNCS 2724, pages 1530-1 540,2003. 119. S. Xiong and F. Li. Parallel Strength Pareto Multi-Objective Evolutionary Algorithm. In Proc. of the 2003 Congress on Evolutionary Computation (CEC’2003), pages 681-683,2003. 120. W. Yi, Q. Liu, and Y. He. Dynamic Distributed Genetic Algorithms. In Proc. of the 2000 Congress on Evolutionary Computation, pages 1132-1 136. IEEE Press, 2000.
121. D. Zaharie and D. Petcu. Adaptive Pareto Differential Evolution and Its Parallelization. In PPAM 2003, LNCS 3019, pages 261-268, 2004.
Part III
Theory and Applications
18 Theory of Parallel Genetic Algorithms

ERICK CANTU-PAZ
Lawrence Livermore National Laboratory, United States
18.1 INTRODUCTION

One of the most promising alternatives to improve the efficiency of Genetic Algorithms (GAs) is to use parallel implementations. The most time-consuming operation in a GA is the evaluation of the fitness of each individual in the population. Since the fitness evaluations are independent of one another, GAs are "embarrassingly parallel," and efficient parallel versions of GAs are relatively easy to implement. However, despite their algorithmic simplicity, parallel GAs are complex nonlinear algorithms that are controlled by many parameters that affect their efficiency and the quality of their search. This chapter reviews the existing theory that explains the effect of the numerous parameters of these algorithms on their accuracy and efficiency. The design of parallel GAs involves numerous choices. For instance, one must decide whether to use a single population or multiple populations. In either case, the size of the populations must be determined carefully, and for multiple populations, one must decide how many to use. In addition, the populations may remain isolated or they may communicate during the run by exchanging individuals or some other information. Communication involves extra costs and additional decisions on the pattern of communications, on the number of individuals to be exchanged, and on the frequency of communications. The chapter is organized following a common categorization of parallel GAs into four main classes [1, 40, 11, 4, 60]:
Single-population master-slave GAs have a master node that stores the population and executes the GA operations (selection, crossover, and mutation). The evaluation of fitness is distributed among several slave processors (see Figure 18.1). Despite being very simple algorithms, master-slave implementations can be very efficient, as will be shown in Section 18.2; a short code sketch of the scheme is also given at the end of this section.
Fig. 18.1 A schematic of a master-slave parallel GA. The master stores the population, executes the GA operations, and distributes individuals to the slaves. The slaves evaluate the fitness of the individuals.
Multiple-population GAs are the most sophisticated and popular type of parallel GAs. They consist of several subpopulations that exchange individuals (Figure 18.2 has a schematic). This exchange of individuals is called migration, and it is controlled by several parameters such as the frequency of migration, the number and destination of migrants, and the method used to select which individuals migrate. The effect of these parameters on the search is complex, and these algorithms are the focus of most of the research on parallel GAs. We will review the theory relevant to these algorithms in Section 18.3.

Cellular GAs consist of a single spatially structured population (see Figure 18.3). The population structure is usually a two-dimensional rectangular grid, with one individual in each vertex. Selection and mating are restricted to a small neighborhood around each individual. The neighborhoods overlap, so that eventually the good traits of a superior individual can spread to the entire population. The theory pertinent to these algorithms will be reviewed in Section 18.4.

Hierarchical hybrids combine multiple demes with master-slave or cellular GAs. We call this class of algorithms hierarchical parallel GAs, because at a higher level they are multiple-deme algorithms with single-population parallel GAs (either master-slave or cellular) at the lower level. A hierarchical parallel GA combines the benefits of its components, and it has the potential of better performance than any of them alone. There have been no theoretical analyses of this type of parallel GA, but Cantú-Paz [13] recommends using the existing theory for multiple populations to configure the higher level and existing master-slave models to configure the lower level.
In general, the communication requirements of all the algorithms are low, and inexpensive Beowulf clusters [56, 9, 33] or Web-based computations can be practical [44, 21]. In fact, the behavior of GAs with spatially distributed populations is interesting, regardless of their implementation on serial or parallel computers [29, 35, 46]. Having a spatially distributed population may have some algorithmic benefits that are independent of the efficiency gains obtained from using multiple processors (e.g., [42, 62, 22, 10]).
Fig. 18.2 A schematic of a multiple-population parallel GA. Each circle represents a simple GA, and there is (infrequent) communication between the populations. In this example, the populations are arranged in a ring, but many other communication topologies have been used.
Fig. 18.3 A schematic of a cellular parallel GA. This class of parallel GA has one spatially distributed population, and it can be implemented very efficiently on massively parallel computers as well as on coarse-grained multiprocessors.
Fig. 18.4 One generation in a master-slave parallel GA when the master evaluates a fraction of the population.
18.2 MASTER-SLAVE PARALLEL GAs
Probably the easiest way to implement GAs on parallel computers is to distribute the evaluation of fitness among several slave processors while one master executes the GA operations (selection, crossover, and mutation). Master-slave parallel GAs are interesting and important for several reasons: (1) they explore the search space in exactly the same manner as serial GAs, and therefore the existing design guidelines for simple GAs are directly applicable; (2) they are very easy to implement, which makes them popular with practitioners; and (3) in many cases master-slave GAs result in significant improvements in performance. The execution time of master-slave GAs has two basic components: the time used in computations and the time used to communicate information among processors. Centering our attention on the master processor, Figure 18.4 depicts the sequence of events in every generation. First, the master sends a fraction of the population to each of the slaves, using time T_c to communicate with each. Next, the master evaluates a fraction of the population using time n T_f / P, where T_f is the time required to evaluate one individual, n is the size of the population, and P is the number of processors used. The slaves start evaluating their portion of the population as soon as they receive it and return the evaluations to the master as soon as they finish. We ignore the time consumed by selection, crossover, and mutation because it is usually much shorter than the time used to evaluate and to communicate individuals. In addition, we assume that the same number of individuals is assigned to each slave and that the evaluation time is the same for all individuals. With these assumptions, the elapsed time for one generation of the parallel GA may be estimated as
T_p = P T_c + n T_f / P.    (18.1)

As more slaves are used, the computation time decreases as desired, but the communication time increases. Setting dT_p/dP = 0 and solving for P, we obtain the
Fig. 18.5 Theoretical speedups of a master-slave GA varying the value of γ. The thinnest line corresponds to γ = 1, the intermediate to γ = 10, and the thickest to γ = 100. The dashed line is the ideal (linear) speedup.
optimal number of processors that minimizes the execution time:
P* = √(n T_f / T_c).    (18.2)
Defining the parallel speedup as
S_p = n T_f / (P T_c + n T_f / P),    (18.3)

and substituting P* gives the maximum speedup possible as

S*_p = P* / 2.    (18.4)
Figure 18.5 shows the theoretical speedups of a master-slave GA varying the ratio γ = T_f/T_c. As γ increases, more processors may be used effectively to reduce the execution time. The figure considers a master-slave GA with a population of n = 1000 individuals, and the speedups are plotted for γ = 1, 10, 100. In many practical problems, the function evaluation time is much greater than the time of communications, T_f >> T_c (γ >> 1), and master-slave parallel GAs can deliver near-linear speedups for a large range of processors. However, Equation 18.4 clearly points out that at their optimal configuration, the simple master-slave GAs have an efficiency of 50%: half of the time the processors are idle (or communicating). A major cause of this inefficiency is that the master waits for all the slaves to finish the fitness evaluations before selecting the parents of the next generation. This synchronization ensures that the master-slave produces exactly the same results as a simple serial GA, but it is easy to avoid if we are willing to accept a different behavior of the algorithm. In asynchronous master-slave GAs, the master generates individuals and sends them to the slaves to be evaluated. As soon as a slave finishes its share of the
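As a quick numerical check of these expressions, the short sketch below evaluates the reconstructed Equations 18.1-18.4 for the setting of Figure 18.5; the function names and printed values are purely illustrative.

```python
import math

def optimal_slaves(n, tf, tc):
    """Optimal processor count P* = sqrt(n*Tf/Tc) of Equation 18.2."""
    return math.sqrt(n * tf / tc)

def speedup(n, tf, tc, p):
    """Parallel speedup of Equation 18.3: serial time over parallel time."""
    serial_time = n * tf
    parallel_time = p * tc + n * tf / p   # Equation 18.1
    return serial_time / parallel_time

n = 1000                       # population size used in Figure 18.5
for gamma in (1, 10, 100):     # gamma = Tf / Tc
    tf, tc = float(gamma), 1.0
    p_star = optimal_slaves(n, tf, tc)
    # the maximum speedup equals P*/2 (Equation 18.4), i.e., 50% efficiency
    print(gamma, round(p_star, 1), round(speedup(n, tf, tc, p_star), 1))
```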
evaluations, the master inserts the evaluated individual(s) into the population. Then, the master generates new individuals and sends them to any available slaves. Asynchronous master-slaves have the potential to be more efficient than the synchronous algorithm (e.g., [63, 55]), especially if the evaluation times are not constant for all individuals, but they introduce additional parameters. In particular, there are several options about how many individuals are generated at a time and about how they are incorporated into the population. Some of these choices may increase the selection pressure and require larger population sizes [13, 15]. In addition, in asynchronous master-slaves, individuals may return from the slaves in a different order than that in which they were created, because some slaves will finish their evaluations faster than others. Davison and Rasheed [23] identified this problem and studied experimentally the effect of allowing random returns while varying the number of slaves. They found that by replacing a low-fit individual that is similar to the new one, the effect of accepting individuals in random order was very small. Asynchronous master-slave GAs are only slightly more difficult to implement than the synchronous ones, and the gains in performance may easily offset the extra cost of development. However, their mathematical analysis is significantly more difficult. Until this point, the calculations have assumed that T_c is constant, and that each slave received a single message from the master with all the individuals that the slave must evaluate. In reality, a better expression for T_c may be T_c = Bx + L, where B is the inverse of the bandwidth of the network, x is the amount of information transmitted, and L is the latency of communications. The latency is the overhead per message that depends on the operating system, the programming environment, and the particular hardware, and can easily dominate the communication time. Cantú-Paz [13] contains calculations that indicate that Equation 18.2 is a good estimator of the optimal number of slaves. Gagné et al. [24] perform similar calculations and include a term to account for possible failures of the slave nodes. Their calculations also take into account that the master can use multiple messages to send the individuals that each slave will evaluate. Of course, the master-slave method has been used to parallelize other EAs, such as Evolution Strategies [5], Evolutionary Programming [37], and Genetic Programming [45]. The models presented in this section are applicable to those algorithms as well as to GAs.
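The following is a minimal sketch of the asynchronous master-slave scheme described above, written with Python's standard concurrent.futures; the fitness function, the variation operator, and the steady-state insertion rule are placeholders, not part of the chapter.

```python
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait
import random

def evaluate(individual):
    """Placeholder for the expensive fitness evaluation (here: OneMax)."""
    return individual, sum(individual)

def new_individual(length=50):
    """Placeholder variation operator: a random bit string."""
    return [random.randint(0, 1) for _ in range(length)]

def asynchronous_master(n_slaves=4, total_evaluations=200):
    evaluated = []   # (individual, fitness) pairs, in order of arrival
    with ProcessPoolExecutor(max_workers=n_slaves) as pool:
        pending = {pool.submit(evaluate, new_individual()) for _ in range(n_slaves)}
        while len(evaluated) < total_evaluations:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                evaluated.append(future.result())                     # insert as soon as it returns
                pending.add(pool.submit(evaluate, new_individual()))  # keep that slave busy
    return max(evaluated, key=lambda pair: pair[1])

if __name__ == "__main__":
    print(asynchronous_master()[1])
```

Because individuals are inserted in arrival order, slower evaluations return later, which is exactly the out-of-order effect studied by Davison and Rasheed [23].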
18.3 MULTIPOPULATION PARALLEL GAs
The existing theory of multipopulation parallel GAs deals mainly with (1) the sizing of each population and (2) the effects of migration. These two factors are related, but we will look first at the effects of population sizing, since this is extremely important for the quality and efficiency of the algorithm.
Fig. 18.6 The bounded one-dimensional space of the gambler’s ruin problem. The absorbing barrier on the right represents a population full of representatives of the correct building block. The probability of reaching that barrier depends on the starting point and on the probability of adding one building block in each step, given by p.
18.3.1 Models of Population Sizing

The existing population sizing models for multipopulation parallel GAs are derived from the gambler's ruin model of population sizing for simple GAs [34]. This is a very simple model that originates from considering the number of correct building blocks in a partition of the population as a random quantity (see Figure 18.6). A partition is given by a template of k fixed symbols F and "don't care" symbols * that match any character in the alphabet chosen to represent solutions. Assuming binary alphabets, a partition specified by k F symbols contains 2^k schemata. A schema represents the class of individuals that match each of the fixed positions. The schema with the highest average fitness and that matches the global optimum in the fixed positions is called the correct building block. The gambler's ruin model views the GA search as a series of competitions that end either when the entire population contains the building block or when the building block has completely disappeared from the population. For some problems, we can calculate the probability p that selection will add one copy of the correct building block to the population [27]. Given p, the probability that the population will eventually be full of copies of the correct building block can be approximated as [34]

P_bb ≈ 1 - ((1 - p)/p)^(n / 2^k),    (18.5)
where n is the population size and k is the order of the building blocks we are considering. The exponent of the equation above, n/2^k, is the number of correct building blocks of order k we expect in a randomly initialized population of size n if we are using binary encodings. For a given desired probability of success P_1, we can solve the equation above for n and obtain a population-sizing equation:

n = 2^k ln(1 - P_1) / ln((1 - p)/p).    (18.6)
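A small numerical sketch of the two expressions as reconstructed above (Equations 18.5 and 18.6); the values of p, k, and the target success probability are arbitrary illustrations.

```python
import math

def success_probability(n, p, k):
    """Gambler's ruin approximation of Equation 18.5 (requires p > 0.5)."""
    x0 = n / 2 ** k                      # expected initial copies of the correct BB
    return 1.0 - ((1.0 - p) / p) ** x0

def population_size(p1, p, k):
    """Population-sizing rule of Equation 18.6 for a target success probability p1."""
    return 2 ** k * math.log(1.0 - p1) / math.log((1.0 - p) / p)

p, k = 0.6, 4
n = population_size(0.95, p, k)
print(round(n), round(success_probability(n, p, k), 3))   # recovers ~0.95
```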
An extremely simple case of a multipopulation parallel GA is to run r independent GAs and report the answer of the best of the runs to the user. Intuitively, this strategy makes sense: since GAs are stochastic and there will be some variance in the quality of the results, reporting the best of r runs should result in a better solution than running the GA only once. An alternative view is that if we keep the desired quality constant, we could reduce the population size, execute the independent (reduced-size) GAs in parallel, and reach the desired solution faster than with a single GA. However, at least for linearly decomposable functions, this strategy is not very efficient [17, 18]. To reduce the population size and still reach the same solution quality, we would need to substitute the success probability P_1 in our population-sizing equation by
(18.7)

where m is the number of partitions in the problem. The point is that this new probability does not differ much from P_1 as more independent runs r are used. Therefore, the population size required by multiple independent runs is not much smaller than the size required by a single GA to reach the same solution, and there are almost no time savings. Empirical studies [58, 18] confirm that using isolated populations is not a very effective or efficient approach. Most multipopulation parallel GAs exchange a number of individuals between the populations. Empirically, this exchange has been known for a long time to improve the effectiveness and efficiency of the algorithms, but the theoretical modeling of algorithms with migration is more difficult than modeling isolated populations. One approach to study the effect of migration is to bound the parameters that control migration and calculate the expected improvements in performance. This is the approach taken by Cantú-Paz and Goldberg [17, 19], who bounded the migration rate (the number of individuals communicated by each population) to its maximum value, and considered that migration occurred between all the populations (maximum connectivity), but only after the populations had converged to a unique solution (minimum migration frequency). These extreme settings of the algorithm are unlikely to be used in practice, but the corresponding calculations are simple and clearly illustrate the performance improvements of using migration. Essentially, the modeling of this extreme case of multipopulation GAs concentrates on one population. Since there is no communication until after the populations converge to a unique value, the gambler's ruin model (Equation 18.5) can be used to determine the average quality of solutions reached by the populations. After convergence, the r populations communicate, each sending n/r individuals to each of the other populations (this is the maximum migration rate and complete connectivity). The populations incorporate the migrants, and the gambler's ruin model can be used again to predict the quality of solutions after the populations restart, simply by changing the exponent of Equation 18.5 to reflect the expected number of correct building blocks after the first migration:

(18.8)
This probability can be set to the desired success probability P_1 and solved (approximately) to find that the required population size would be much smaller than in a serial GA [17, 18]. Actually, the required populations were so small that relatively good speedups were observed even with extremely simple objective functions [18]. Following the idea of changing the initial point for the gambler's ruin model, Cantú-Paz and Goldberg [19] analyzed the effect of arbitrary migration rates and more sparse topologies. Essentially, their calculations assumed that to reach a particular solution quality, the populations must start with some initial number of correct building blocks x_1 (replacing this number in the exponent of the gambler's ruin model gives the required success probability). For a given migration rate, it is easy to compute the probability that a population receives at least x_1 building blocks from its neighbors. Actually, there are several combinations of migration rate, number of neighbors, and population size that result in the same probability of success: large populations with few neighbors have the same chance of contributing x_1 copies of the BB as small populations with many neighbors. Since the size of the populations mainly determines the computation time and the number of neighbors largely determines the communication time, there is a trade-off between communications and computation similar to the one for the master-slaves. This trade-off can be optimized to find the optimal number of neighbors and the corresponding population size that minimize the parallel execution time. Interestingly, the optimal number of neighbors is O(√(n T_f / T_c)), which is the same as the optimal number of processors in master-slaves. All the calculations so far have assumed that the populations communicate after converging to a unique solution, then restart and converge again. The obvious extension is to consider what happens if the populations are restarted multiple times. With multiple restarts, the populations are influenced not only by their direct neighbors, but by the neighbors of their neighbors, and so on. The multiple restarts in effect create an "extended neighborhood" composed of all the populations that contribute migrants to a population. The gambler's ruin model again gives us a way to calculate the probability of success. For some communication topologies where the populations have δ neighbors, after τ epochs the extended neighborhood grows linearly, reaching δ(τ - 1) + 1 = δτ' + 1 populations. The number of individuals in this extended neighborhood is

n_e = (δτ' + 1) n_d,    (18.9)
where n_d is the number of individuals in each population. Simply substituting this extended population size into the gambler's ruin model (Equation 18.5) gives very accurate predictions of the probability of success, as evidenced in Figure 18.7. Empirical evidence suggests that even when the extended neighborhood does not grow linearly, using n_e in the gambler's ruin model is very accurate [13]. These results imply that the degree of the connectivity is a crucial factor in the solution quality and that the specific details of how the populations are connected are not as important. This observation is corroborated by other empirical studies [20] as well as by a more exact analysis with Markov chains [14].
Fig. 18.7 Theoretical predictions (line) and experimental results (dots) of the average quality per deme after 1, 2, 3, and 4 epochs (from right to left) using eight demes connected by a +1+2 topology.
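Under the same reconstructed model, the prediction plotted in Figure 18.7 amounts to evaluating Equation 18.5 with the extended population size of Equation 18.9; the sketch below does exactly that, with illustrative values for the deme size, the degree δ, and the per-step probability p.

```python
def extended_success(n_d, delta, epochs, p, k):
    """P_bb of Equation 18.5 evaluated with the extended size n_e of Equation 18.9."""
    n_e = (delta * (epochs - 1) + 1) * n_d      # tau' = epochs - 1
    return 1.0 - ((1.0 - p) / p) ** (n_e / 2 ** k)

for epochs in (1, 2, 3, 4):
    print(epochs, round(extended_success(n_d=40, delta=2, epochs=epochs, p=0.6, k=4), 3))
```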
18.3.2 Other Effects of Migration
The previous section showed that migration greatly affects the solution quality and that extensions of the gambler's ruin model can predict the quality very accurately, given certain assumptions. In particular, in the previous section we assumed that migration occurred after the populations had converged to a unique solution. In that case, there is no question about which individuals should migrate, because all the individuals are identical. This section reviews theory that considers other effects of migration and pays particular attention to the choice of migrants and how they are incorporated into the receiving population. All EAs have a selection mechanism that identifies which individuals in the population will survive to generate new solutions. The choice of selection method influences the speed of convergence of the algorithm, and it has been demonstrated that an excessively slow or fast convergence may cause the algorithm to fail [28, 59]. In addition, the selection pressure is related to the optimal mutation rate and to the population size [43, 6]. This is relevant to us because migration can cause changes in the selection pressure. The speed of convergence of different selection methods was first studied by Goldberg and Deb [26], who introduced the concept of takeover time. This approach considers a simplified population model with two classes of individuals: good and bad. The takeover time is the number of generations that selection requires to replicate a single individual of the good class until the population is full. Goldberg and Deb used difference equations to model the proportion of individuals of the best class at each time step (generation). For example, for tournaments of size s, the difference equation is P_{t+1} = 1 - (1 - P_t)^s. We consider that individuals are selected to migrate either randomly or based on their fitness. Likewise, existing individuals can be replaced by incoming migrants randomly or based on their fitness. We can easily extend the takeover time equations to consider the four migration policies defined by the choice of emigrants and
Fig. 18.8 Takeover times using different migration policies and varying the migration rate.
replacements. For example, for tournament selection, when good migrants replace bad individuals, the difference equation becomes [12, 13]
P_{t+1} = 1 - (1 - P_t)^s + p,    (18.10)
where p is the proportion of individuals that migrate. Similar equations can be found for the other migration policies [12, 13]. Figure 18.8 shows the results of iterating these equations to obtain the takeover times in demes with n = 10,000 individuals and pairwise tournament selection (s = 2). The figure illustrates how the convergence is faster as the migration rate increases and that the fastest convergence occurs when good migrants replace bad individuals, which is a frequently used migration policy (e.g., [32, 57, 42, 39]). The slowest convergence occurs when both migrants and replacements are chosen randomly; a GA with a single population would converge in exactly the same time as this case. These simple takeover time calculations suggest that the difference between the fastest and slowest convergence times is quite large and that the choice of migrants is a greater factor in the convergence speed than the choice of replacements.
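The sketch below iterates the two recurrences used in this discussion (plain tournament selection, and good migrants replacing bad individuals as in Equation 18.10) to obtain takeover times of the kind plotted in Figure 18.8; the convergence threshold is an assumption.

```python
def takeover_time(s=2, migration_rate=0.0, n=10_000):
    """Generations until the best class essentially fills a deme of n individuals.

    With migration_rate = 0 this iterates P_{t+1} = 1 - (1 - P_t)^s;
    a positive rate adds the migration term of Equation 18.10.
    """
    threshold = 1.0 - 1.0 / n          # "full" up to one individual (an assumption)
    proportion = 1.0 / n               # start from a single good individual
    generations = 0
    while proportion < threshold:
        proportion = min(1.0, 1.0 - (1.0 - proportion) ** s + migration_rate)
        generations += 1
    return generations

for rate in (0.0, 0.1, 0.2, 0.5):
    print(rate, takeover_time(migration_rate=rate))
```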
Takeover times are very rough approximations, but they are useful to investigate how fast a good solution dominates the population and to compare different selection methods. A more sophisticated way to study selection methods (and therefore migration policies) is to examine the selection intensity, which is the difference between the average fitness of the selected individuals and that of the original population, normalized by the standard deviation of the fitnesses of the original population: I = (f̄_s - f̄) / σ_f. Cantú-Paz [16] derived equations for the additional selection intensity caused by selecting migrants and their replacements based on their fitnesses. The calculations work for arbitrary migration rates and any regular communication topology where each population communicates with δ neighbors. If the fitness is distributed normally, we can derive closed-form equations for the increases in selection intensity caused by migration. When a fraction p of the top individuals in a population migrate, the
Fig. 18.9 Selection intensity for different migration policies, varying the migration rate and the number of neighbors (δ = 1, 2, ..., 5, from bottom to top in each graph): (a) best replace worst; (b) best replace random; (c) random replace worst.
added selection intensity is
where φ(z) = exp(-z²/2)/√(2π) is the density of the standard normal distribution and Φ⁻¹(x) is the value of z for which the cumulative probability up to z equals x. If the migrants are chosen randomly, there is no increase in the selection pressure and I_s = 0. Replacing the worst individuals by incoming migrants causes an increase in the selection pressure given by
I_r ≈ φ(Φ⁻¹(1 - δp)).    (18.12)
The maximum of I_r is φ(0) = 1/√(2π) ≈ 0.3989, which is a fairly low value, but it is not negligible (consider that the intensity of pairwise tournament selection is 0.5642). When individuals are replaced randomly in the receiving population, I_r = 0. The overall selection intensity caused by migration is simply I_m = I_s + I_r. Figure 18.9 presents plots of I_m for different migration policies, considering topologies with different degrees and varying the migration rate. The plots show that the migration policy with the highest intensity is when the best individuals migrate and replace the worst. Using a very different method, Whitley et al. [61, 62] studied the dynamics of multiple populations using an exact model of GAs under the assumption of infinite populations. However, to study the effects of multiple finite populations, they initialized multiple models using different finite population samples. The models consider arbitrary migration rates and migration intervals. They observed that selecting the top individuals and replacing the worst causes an increase in the selection pressure and that even with this additional selection pressure the multiple populations maintain diversity. In the same paper, Whitley et al. present an intuitive explanation of why linearly separable problems might be solved faster by multiple populations. In essence, the argument is that subsolutions of the problem can be found by the different populations and these subsolutions are brought together by migration. A higher selection pressure causes the algorithms to converge faster. Faster convergence can be desirable, but converging too quickly without allowing enough
time for crossover to explore and combine useful building blocks may cause the algorithm to converge prematurely to a solution of poor quality. The additional selection pressure can also be the cause of some controversial claims of superlinear parallel speedups. Despite the attempts of careful experimenters to set up "fair" experiments (using the same parameters for mutation and crossover rates, keeping the total population sizes equal, using the same selection method, comparing algorithms when they reach the same solutions, etc.), the migration and replacement of individuals is usually done according to their fitness. As we have seen, these choices increase the selection pressure and cause the parallel algorithm to converge faster than it would if migrants and replacements were chosen randomly. Clearly, comparing parallel and serial algorithms with different selection intensities is not fair. Superlinear speedups can also be caused by other reasons. For example, the smaller populations might fit completely in cache memory [8], or the parallel code that executes unnecessary communications is used to measure the serial performance [7]. More details and discussions about fair comparisons can be found elsewhere [13, 3].

18.4 CELLULAR PARALLEL GAs
There is very little theory and relatively few applications of cellular parallel GAs. Perhaps the reason is that, since they are well suited for massively parallel SIMD computers [47, 41], there might be a perception that they can be executed only on specialized hardware that is not easily accessible. However, it is also possible to implement them very efficiently on coarse-grain MIMD computers [35, 36], and because their behavior is interesting regardless of their implementation, simulating them on single-processor machines is also a good alternative. Since in cellular GAs an individual is restricted to interact with its neighbors, the most salient concern is the effect of the neighborhood on the quality and efficiency of the algorithm. In particular, the size of the neighborhood has received considerable attention. Early on, Manderick and Spiessens [41] observed that the solution quality degraded with larger neighborhoods and suggested the use of moderately sized neighborhoods. Spiessens [53] showed that for a neighborhood size s and a string length l, the execution time of the algorithm was O(s + l) or O(s log s + l) time steps, depending on the selection scheme used. More recently, Sarma and De Jong [50, 51] determined that the ratio of the radius of the neighborhood to the radius of the entire population is a critical parameter that determines the selection pressure. Their analysis is concerned with quantifying the time for a single good solution to take over the entire population, varying the sizes and shapes of neighborhoods. They observed that the empirical growth curves followed a logistic law and fitted them to the following equation:
P_t = 1 / (1 + (1/P_0 - 1) e^{-at}),    (18.13)
where P_t is the proportion of the best individual in the population at time t and a is a growth coefficient that Sarma and De Jong determine with a least-squares fit of the takeover growth curves. Their experiments showed that a has an inversely exponential dependence on the ratio of the neighborhood radius and the grid radius. Later, they extended their observations to consider dynamic environments [52] and observed that cellular GAs seem to perform better than simple GAs without requiring modifications (although some modifications help; see Kirley's work, for example [38]). Others have extended Sarma and De Jong's basic model to consider different mating strategies and population structures [31]. Rudolph [48] investigated takeover times in spatially distributed populations. He calculated lower bounds of the takeover time for arbitrary structures, lower and upper bounds for grid-like structures, and an exact closed-form expression for rings. The lower bound of the takeover time is the diameter of the graph (i.e., the largest number of edges that separate any two vertices in the graph). This suggests that population structures with short diameters have a stronger selection pressure, which is consistent with the empirical observations of Gorges-Schleuter [30] and Sarma and De Jong [50, 51]. Rudolph conjectured that the takeover time depends more strongly on the diameter of the graph than on the particular selection method used. Sprave [54] presented a unified view of cellular and island population structures using hypergraphs (a generalization of graphs where an edge can represent relations between arbitrary subsets of vertices). The model can represent arbitrary topologies, migration policies, and migration intervals. It is "intended as a base for further theoretical work in the field of nonpanmictic population structures," and Sprave demonstrated the model's utility by calculating takeover times. Most work in cellular GAs assumes that the cells are updated in parallel and synchronously, meaning that all cells are updated simultaneously in each generation. However, the cells can also be updated serially and asynchronously. Alba et al. [2] identify three update possibilities: (1) a canonical ordering given by the position of the cells in the grid (line by line, left to right), (2) a fixed random permutation of the cells, and (3) a random permutation of cells in each generation. With experiments using benchmarking functions, they show that asynchronous updates reach the optimum solution faster, but less often, than synchronous updates. In more recent work, Giacobini et al. [25] present quadratic recurrence equations that predict the growth curves very accurately when populations are structured as a grid. Other interesting theoretical work focuses on the convergence to the global optimum. Rudolph and Sprave [49] proved that by augmenting proportionate selection with a self-adapting threshold to accept offspring, a cellular GA is guaranteed to converge to the global optimum.
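To make the takeover discussion concrete, here is a small simulation of takeover in a toroidal grid under local selection of the kind studied by Sarma and De Jong; the grid size, the von Neumann neighborhood, and the copy-the-best update rule are illustrative choices, not taken from any of the cited papers. The resulting saturating growth curve is the kind of curve fitted with Equation 18.13.

```python
import random

def cellular_takeover(side=32, generations=60):
    """Proportion of the grid occupied by the best value over time.

    Each cell holds a scalar fitness; in every (synchronous) generation a cell
    copies the best value found in its von Neumann neighborhood (itself + 4 neighbors).
    """
    grid = [[random.random() for _ in range(side)] for _ in range(side)]
    best = max(max(row) for row in grid)
    curve = []
    for _ in range(generations):
        new_grid = [[0.0] * side for _ in range(side)]
        for i in range(side):
            for j in range(side):
                new_grid[i][j] = max(grid[i][j],
                                     grid[(i - 1) % side][j], grid[(i + 1) % side][j],
                                     grid[i][(j - 1) % side], grid[i][(j + 1) % side])
        grid = new_grid
        curve.append(sum(cell == best for row in grid for cell in row) / side ** 2)
    return curve

print([round(p, 3) for p in cellular_takeover()[:10]])
```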
18.5 CONCLUSIONS

Parallel implementations of genetic and other EAs have long been recognized as a promising alternative to improve the efficiency of these algorithms. Most of the research in this field has been empirical, and while important observations were made
even in very early studies, it is difficult to assess their generality or limitations. The existing theory summarized in this chapter explains many of the observations and provides guidelines that should be useful to practitioners. There is, however, much that remains to be done. All theoretical models make assumptions about the problem, the algorithm, or both. Relaxing those assumptions suggests opportunities to extend the theory in several directions; for example, extending models for multiple populations to consider frequent migrations or communication behavior that adapts to the progress of the search. In addition, there is still relatively little theory related to cellular GAs. Continuing the study of these algorithms seems to be a promising avenue of research.

Acknowledgments

UCRL-MI-206366. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.
REFERENCES

1. P. Adamidis. Review of parallel genetic algorithms bibliography. Tech. rep. version 1, Aristotle University of Thessaloniki, Thessaloniki, Greece, 1994.
2. E. Alba, M. Giacobini, M. Tomassini, and S. Romero. Comparing synchronous and asynchronous cellular genetic algorithms. In J. J. Merelo et al., editor, Parallel Problem Solving from Nature VII, pages 601-610, Berlin Heidelberg, 2002. Springer-Verlag.

3. E. Alba and M. Tomassini. Parallelism and evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 6(5):443-462, 2002.
4. E. Alba and J. M. Troya. A survey of parallel distributed genetic algorithms. Complexity, 4(4):31-52, 1999.

5. T. Bäck. Parallel optimization of evolutionary algorithms. In Y. Davidor, H.-P. Schwefel, and R. Männer, editors, Parallel Problem Solving from Nature, PPSN III, pages 418-427, Berlin, 1994. Springer-Verlag.

6. T. Bäck. Evolutionary algorithms in theory and practice. Oxford University Press, New York, 1996.

7. D. H. Bailey. Misleading performance reporting in the supercomputing field. Scientific Programming, 1(2):141-151, 1992.

8. T. C. Belding. The distributed genetic algorithm revisited. In L. Eschelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 114-121, San Francisco, CA, 1995. Morgan Kaufmann.
9. F. H. Bennett III, J. R. Koza, J. Shipman, and O. Stiffelman. Building a parallel computer system for $18,000 that performs a half peta-flop per day. In W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference 1999: Volume 2, pages 1484-1490, San Francisco, CA, 1999. Morgan Kaufmann Publishers.

10. P. R. Calegari. Parallelization of Population-Based Evolutionary Algorithms for Combinatorial Optimization Problems. Unpublished doctoral dissertation, École Polytechnique Fédérale de Lausanne (EPFL), 1999.

11. E. Cantú-Paz. Using Markov chains to analyze a bounding case of parallel genetic algorithms. Genetic Programming 98, pages 456-462, 1998.

12. E. Cantú-Paz. Migration policies and takeover times in parallel genetic algorithms. In W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, editors, GECCO-99: Proceedings of the 1999 Genetic and Evolutionary Computation Conference, page 775, San Francisco, CA, 1999. Morgan Kaufmann Publishers.

13. E. Cantú-Paz. Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic Publishers, Boston, MA, 2000.

14. E. Cantú-Paz. Markov chain models of parallel genetic algorithms. IEEE Transactions on Evolutionary Computation, 4(3):216-226, 2000. Also available as IlliGAL Report No. 98010.

15. E. Cantú-Paz. Selection intensity in genetic algorithms with generation gaps. In D. Whitley, D. E. Goldberg, E. Cantú-Paz, L. Spector, I. Parmee, and H.-G. Beyer, editors, GECCO-2000: Proceedings of the Genetic and Evolutionary Computation Conference, pages 911-918, San Francisco, CA, 2000. Morgan Kaufmann.

16. E. Cantú-Paz. Migration policies, selection pressure, and parallel evolutionary algorithms. Journal of Heuristics, 7(4):311-334, 2001.

17. E. Cantú-Paz and D. E. Goldberg. Modeling idealized bounding cases of parallel genetic algorithms. In J. Koza, K. Deb, M. Dorigo, D. Fogel, M. Garzon, H. Iba, and R. Riolo, editors, Genetic Programming 1997: Proceedings of the Second Annual Conference, pages 353-361, San Francisco, CA, 1997. Morgan Kaufmann Publishers.

18. E. Cantú-Paz and D. E. Goldberg. Predicting speedups of idealized bounding cases of parallel genetic algorithms. In T. Bäck, editor, Proceedings of the Seventh International Conference on Genetic Algorithms, pages 113-121, San Francisco, 1997. Morgan Kaufmann.

19. E. Cantú-Paz and D. E. Goldberg. Efficient parallel genetic algorithms: theory and practice. Computer Methods in Applied Mechanics and Engineering, 186:221-238, 1999.
20. E. Cantú-Paz and M. Mejía-Olvera. Experimental results in distributed genetic algorithms. In International Symposium on Applied Corporate Computing, pages 99-108, Monterrey, Mexico, 1994.

21. Fuey-Sian Chong. Java-based distributed genetic programming on the internet. In A. Wu, editor, Evolutionary Computation and Parallel Processing Workshop, Proceedings of the 1999 GECCO Workshops, pages 163-166, July 1999.

22. Y. Davidor. The ECOlogical framework II: Improving GA performance at virtually zero cost. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 171-176, San Mateo, CA, 1993. Morgan Kaufmann.

23. B. D. Davison and K. Rasheed. Effect of global parallelism on a steady state GA. In A. Wu, editor, Evolutionary Computation and Parallel Processing Workshop, Proceedings of the 1999 GECCO Workshops, pages 167-170, July 1999.

24. C. Gagné, M. Parizeau, and M. Dubreuil. The master-slave architecture for evolutionary computations revisited. In E. Cantú-Paz et al., editor, Genetic and Evolutionary Computation-GECCO 2003, pages 1578-1579, Berlin, 2003. Springer-Verlag.

25. M. Giacobini, E. Alba, A. Tettamanzi, and M. Tomassini. Modeling selection intensity for toroidal cellular evolutionary algorithms. In K. Deb et al., editor, Genetic and Evolutionary Computation-GECCO 2004, pages 1138-1149, Berlin Heidelberg, 2004. Springer Verlag.

26. D. E. Goldberg and K. Deb. A comparative analysis of selection schemes used in genetic algorithms. Foundations of Genetic Algorithms, 1:69-93, 1991. (Also TCGA Report 90007.)

27. D. E. Goldberg, K. Deb, and J. H. Clark. Genetic algorithms, noise, and the sizing of populations. Complex Systems, 6:333-362, 1992.

28. D. E. Goldberg, K. Deb, and D. Thierens. Toward a better understanding of mixing in genetic algorithms. Journal of the Society of Instrument and Control Engineers, 32(1):10-16, 1993.

29. V. S. Gordon and D. Whitley. Serial and parallel genetic algorithms as function optimizers. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 177-183, San Mateo, CA, 1993. Morgan Kaufmann.

30. M. Gorges-Schleuter. Explicit parallelism of genetic algorithms through population structures. In H.-P. Schwefel and R. Männer, editors, Parallel Problem Solving from Nature, pages 150-159, Berlin, 1991. Springer-Verlag.

31. M. Gorges-Schleuter. An analysis of local selection in evolution strategies. In W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation
Conference 1999: Volume 1, pages 847-854, San Francisco, CA, 1999. Morgan Kaufmann Publishers.

32. J. J. Grefenstette. Parallel adaptive algorithms for function optimization. Tech. Rep. No. CS-81-19, Vanderbilt University, Computer Science Department, Nashville, TN, 1981.

33. J.-P. Gwo, F. M. Hoffman, and W. W. Hargrove. Mechanistic-based genetic algorithm search on a Beowulf cluster of Linux PCs. In Proceedings of the High Performance Computing Conference, Washington, DC, pages 148-153, 2000.

34. G. Harik, E. Cantú-Paz, D. E. Goldberg, and B. L. Miller. The gambler's ruin problem, genetic algorithms, and the sizing of populations. In Proceedings of 1997 IEEE International Conference on Evolutionary Computation, pages 7-12, Piscataway, NJ, 1997. IEEE.

35. W. E. Hart. Adaptive Global Optimization with Local Search. PhD thesis, University of California, San Diego, 1994.

36. W. E. Hart, S. Baden, R. K. Belew, and S. Kohn. Analysis of the numerical effects of parallelism on a parallel genetic algorithm. In Proceedings of the 10th International Parallel Processing Symposium, pages 606-612. IEEE Press, 1996.

37. J. E. Hirsh and D. K. Young. Evolutionary programming strategies with self adaptation applied to the design of rotorcraft using parallel processing. In V. W. Porto, N. Saravanan, D. Waagen, and A. E. Eiben, editors, Proceedings of the Seventh Annual Conference on Evolutionary Programming, pages 147-156, Berlin, 1998. Springer Verlag.

38. M. Kirley. An empirical investigation of optimisation in dynamic environments using the cellular genetic algorithm. In D. Whitley, D. E. Goldberg, E. Cantú-Paz, L. Spector, I. Parmee, and H.-G. Beyer, editors, GECCO-2000: Proceedings of the Genetic and Evolutionary Computation Conference, pages 11-18, San Francisco, CA, 2000. Morgan Kaufmann.

39. S.-C. Lin, E. D. Goodman, and W. F. Punch III. Investigating parallel genetic algorithms on job shop scheduling problems. In P. J. Angeline, R. G. Reynolds, J. R. McDonnell, and R. Eberhart, editors, Evolutionary Programming VI, pages 383-393, Berlin, 1997. Springer.

40. S.-C. Lin, W. Punch, and E. Goodman. Coarse-grain parallel genetic algorithms: Categorization and new approach. In Sixth IEEE Symposium on Parallel and Distributed Processing, pages 28-37, Los Alamitos, CA, October 1994. IEEE Computer Society Press.

41. B. Manderick and P. Spiessens. Fine-grained parallel genetic algorithms. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 428-433, San Mateo, CA, 1989. Morgan Kaufmann.
42. H. Mühlenbein. Evolution in time and space - the parallel genetic algorithm. In G. J. E. Rawlins, editor, Foundations of Genetic Algorithms, pages 316-337, San Mateo, CA, 1991. Morgan Kaufmann.

43. H. Mühlenbein and D. Schlierkamp-Voosen. The science of breeding and its application to the breeder genetic algorithm (BGA). Evolutionary Computation, 1(4):335-360, 1994.

44. P. Nangsue and S. E. Conry. An agent-oriented, massively distributed parallelization model of evolutionary algorithms. In J. R. Koza, editor, Late Breaking Papers at the Genetic Programming 1998 Conference, pages 160-168, Madison, WI, 1998. Omni Press.

45. M. Oussaidene. Genetic programming methodology, parallelization and applications. Unpublished doctoral dissertation, Université de Genève, Genève, 1997.

46. W. F. Punch. How effective are multiple programs in genetic programming. In J. R. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D. B. Fogel, M. H. Garzon, D. E. Goldberg, H. Iba, and R. L. Riolo, editors, Genetic Programming 98, pages 308-313, San Francisco, 1998. Morgan Kaufmann Publishers.

47. G. G. Robertson. Parallel implementation of genetic algorithms in a classifier system. In J. J. Grefenstette, editor, Proceedings of the Second International Conference on Genetic Algorithms, pages 140-147, Hillsdale, NJ, 1987. Lawrence Erlbaum Associates.

48. G. Rudolph. On takeover times in spatially structured populations: Array and ring. In K. K. Lai, O. Katai, M. Gen, and B. Lin, editors, Proceedings of the Second Asia-Pacific Conference on Genetic Algorithms and Applications, pages 144-151, Hong Kong, 2000. Global Link Publishing Company.

49. G. Rudolph and J. Sprave. A cellular genetic algorithm with self-adjusting acceptance threshold. In Proceedings of the First IEE/IEEE International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications, pages 365-372, London, 1995. Institution of Electrical Engineers and Institute for Electrical and Electronics Engineers.

50. J. Sarma and K. De Jong. An analysis of the effects of neighborhood size and shape on local selection algorithms. In H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature, PPSN IV, pages 230-244, Berlin, 1996. Springer-Verlag.

51. J. Sarma and K. De Jong. An analysis of local selection algorithms in a spatially structured evolutionary algorithm. In T. Bäck, editor, Proceedings of the Seventh International Conference on Genetic Algorithms, pages 181-187, San Francisco, 1997. Morgan Kaufmann.

52. J. Sarma and K. De Jong. The behavior of spatially distributed evolutionary algorithms in non-stationary environments. In W. Banzhaf, J. Daida, A. E. Eiben,
M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference 1999: Volume 1, pages 572-578, San Francisco, CA, 1999. Morgan Kaufmann Publishers.

53. P. Spiessens and B. Manderick. A massively parallel genetic algorithm: Implementation and first analysis. In R. K. Belew and L. B. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 279-286, San Mateo, CA, 1991. Morgan Kaufmann.

54. J. Sprave. A unified model of non-panmictic population structures in evolutionary algorithms. In Peter J. Angeline, Zbyszek Michalewicz, Marc Schoenauer, Xin Yao, and Ali Zalzala, editors, Proceedings of the Congress on Evolutionary Computation, volume 2, pages 1386-1391. IEEE Press, 1999.

55. T. J. Stanley and T. Mudge. A parallel genetic algorithm for multiobjective microprocessor design. In L. Eschelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 597-604, San Francisco, CA, 1995. Morgan Kaufmann.

56. T. Sterling. Beowulf-class clustered computing: Harnessing the power of parallelism in a pile of PCs. In J. R. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D. B. Fogel, M. H. Garzon, D. E. Goldberg, H. Iba, and R. L. Riolo, editors, Genetic Programming 98, pages 883-887, San Francisco, 1998. Morgan Kaufmann Publishers.

57. R. Tanese. Parallel genetic algorithm for a hypercube. In J. J. Grefenstette, editor, Proceedings of the Second International Conference on Genetic Algorithms, pages 177-183, Hillsdale, NJ, 1987. Lawrence Erlbaum Associates.

58. R. Tanese. Distributed genetic algorithms. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 434-439, San Mateo, CA, 1989. Morgan Kaufmann.

59. D. Thierens and D. E. Goldberg. Mixing in genetic algorithms. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 38-45, San Mateo, CA, 1993. Morgan Kaufmann.

60. M. Tomassini. Parallel and distributed evolutionary algorithms: A review. In K. Miettinen, M. Mäkelä, P. Neittaanmäki, and J. Périaux, editors, Evolutionary Algorithms in Engineering and Computer Science, pages 113-133. J. Wiley and Sons, Chichester, UK, 1999.

61. D. Whitley. A free lunch proof for gray versus binary encodings. Proceedings of the Genetic and Evolutionary Computation Conference 1999: Volume 1, pages 726-733, 1999.

62. D. Whitley and T. Starkweather. Genitor II: A distributed genetic algorithm. Journal of Experimental and Theoretical Artificial Intelligence, 2:189-214, 1990.
63. B. P. Zeigler and J. Kim. Asynchronous genetic algorithms on parallel computers. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, page 660, San Mateo, CA, 1993. Morgan Kaufmann.
19 PARALLEL METAHEURISTICS APPLICATIONS

TEODOR GABRIEL CRAINIC¹, NOURREDINE HAIL²

¹École des sciences de la gestion, Université du Québec à Montréal, and Centre de recherche sur les transports, Université de Montréal, Montréal, Canada

²Centre de recherche sur les transports, Université de Montréal, Montréal, Canada
19.1 INTRODUCTION

Parallel metaheuristics have been applied to a multitude of fields by people with different backgrounds and active in different scientific communities. This illustrates the importance of parallel metaheuristics. This phenomenon has also resulted, however, in what may be called a lack of dissemination of results across fields. This is unfortunate but not exclusive to the parallel metaheuristics field. This chapter aims to contribute toward filling this gap. The parallel metaheuristics field is a very broad one. Parallel metaheuristics may potentially be applied to any decision problem in whichever field, as is indeed the case with sequential metaheuristics. Yet, the space available for this chapter imposes hard choices and limits the presentation. We have therefore selected a number of topics that we believe are representative due to their significant methodological impact and broad practical interest: graph coloring and partitioning, Steiner tree problems, set covering and partitioning, satisfiability and MAX-SAT problems, quadratic assignment, location and network design, traveling salesman and vehicle routing problems. We do not pretend to be exhaustive. We have also restricted to a minimum the presentation of general parallel computation issues as well as that of the parallel metaheuristic strategies. The reader may consult a number of surveys, taxonomies, and syntheses of parallel metaheuristics, of which quite a few address the "classical" metaheuristics, Simulated Annealing (SA), Genetic Algorithms (GAs), and Tabu Search (TS), while some others address the field in more comprehensive terms:
PSA: Azencott [4], Greening [71, 70], Laursen [84], Ram, Sreenivas, and Subramaniam [119];

PGA: Cantú-Paz [21], Lin, Punch, and Goodman [91], Mühlenbein [104], Shonkwiler [133];

PTS: Crainic, Toulouse, and Gendreau [41], Glover and Laguna [69], Voß [162];

General: Barr and Hickman [9], Crainic [33], Crainic and Toulouse [37, 38], Cung et al. [43], Holmqvist, Migdalas, and Pardalos [74], Pardalos et al. [114], Verhoeven and Aarts [160].

Surveying the literature, one notices an uneven distribution of work among the various areas, both in number of papers and in the variety of methods used. Indeed, in several fields the number of different metaheuristics that are applied appears quite limited, as is the number of parallelization strategies. As a result, general trends and conclusions are difficult to identify, even within a given area. It appears that several areas would benefit from a broader investigation and critical comparison of metaheuristics and parallel strategies. This chapter has been conceived with the aim of offering both a starting point and an incentive to undertake the research required to address these challenges. The chapter is organized as follows. Section 19.2 introduces the elements we use to describe and classify each contribution. Sections 19.3 to 19.12 present parallel metaheuristic contributions to the areas identified above. Section 19.13 concludes the chapter.
19.2 PARALLEL METAHEURISTICS

To survey applications of parallel metaheuristics requires that one define some criteria to classify the parallel metaheuristic strategies. Our approach is one of conceptual, algorithmic design, the computer implementation playing second violin. We adopt for this chapter a classification that is sufficiently general to encompass all metaheuristic classes without, on the one side, erasing the specifics of each while, on the other side, avoiding a level of detail incompatible with the scope and dimension limits of the chapter. The classification is based on the work of Crainic, Toulouse, and Gendreau [41] and Crainic [33], with considerations from Crainic and Toulouse [37, 38]. Note that Verhoeven and Aarts [160] and Cung et al. [43] present classifications that proceed in the same spirit as Crainic, Toulouse, and Gendreau [41]. Parallel/distributed computing applied to problem solving means that several processes work simultaneously on several processors with the common goal of solving a given problem instance. One then has to determine how the global problem-solving process is controlled and how information is exchanged among the various processes. More than one solution method may be used to perform the same task, and thus a taxonomy has also to specify this characteristic. The first dimension, Search Control
Cardinality, thus explicitly examines how the global search is controlled: either by a single process (as in master-slave implementations) or collegially by several processes that may collaborate or not. The two alternatives are identified as 1-control (1C) and p-control (pC), respectively. The dimension relative to the type of Search Control and Communications addresses the issue of how information is exchanged. In parallel computing, one generally refers to synchronous and asynchronous communications. In the former case, all concerned processes have to stop and engage in some form of communication and information exchange at moments (number of iterations, time intervals, specified algorithmic stages, etc.) exogenously determined, either hard-coded or determined by a control (master) process. In the latter case, each process is in charge of its own search, as well as of establishing communications with other processes, and the global search terminates once each individual search stops. To reflect more adequately the quantity and quality of the information exchanged and shared, as well as the additional knowledge derived from these exchanges (if any), we refine these notions and define four classes of Search Control and Communication strategies: Rigid (RS) and Knowledge Synchronization (KS) and, symmetrically, Collegial (C) and Knowledge Collegial (KC). The third dimension indicates the Search Differentiation: do search threads start from the same or different solutions and do they make use of the same or different search strategies? The four cases considered are: SPSS, Same initial Point/Population, Same search Strategy; SPDS, Same initial Point/Population, Different search Strategies; MPSS, Multiple initial Points/Populations, Same search Strategies; MPDS, Multiple initial Points/Populations, Different search Strategies. Obviously, one uses "point" for neighborhood-based methods such as Simulated Annealing, Tabu Search, Variable Neighborhood Search, GRASP, Guided Local Search, etc., while "population" is used for evolutionary methods (e.g., Genetic Algorithms), Scatter Search, and colony methods (e.g., ant colonies). When the initial population is a singleton (e.g., an ant), the term single evolutionary point is also used, as well as, in the PGA context, fine-grained, massive, or global parallelism. Typically, 1-control strategies implement a classical master-slave approach that aims solely to accelerate the search. Here, a "master" processor executes a sequential metaheuristic but dispatches computing-intensive tasks to be executed in parallel by "slave" processes. The master receives and processes the information resulting from the slave operations, selects and implements moves or, for population-based methods, selects parents and generates children, updates the memories (if any) or the population, and decides whether to activate different search strategies or stop the search. In the context of neighborhood-based search, the operation most widely targeted in such approaches is the neighborhood evaluation. At each iteration, the possible moves in the neighborhood of the current solution are partitioned into as many sets as the number of available processors and the evaluation is carried out in parallel by slave processes. For population-based methods, it is the fitness evaluation that is most often targeted in such 1C/RS/SPSS strategies.
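A minimal sketch of the 1C/RS/SPSS pattern for a neighborhood-based method: the master partitions the neighborhood of the current solution and the slaves evaluate the candidate moves in parallel. The objective function and the bit-flip move are placeholders chosen only to make the sketch runnable.

```python
from concurrent.futures import ProcessPoolExecutor
import random

def objective(solution):
    """Placeholder objective to minimize: the sum of a 0/1 vector."""
    return sum(solution)

def evaluate_batch(task):
    """Slave task: evaluate a batch of bit-flip moves and return the best one."""
    solution, positions = task
    best = None
    for pos in positions:
        neighbor = list(solution)
        neighbor[pos] = 1 - neighbor[pos]
        candidate = (objective(neighbor), pos)
        best = candidate if best is None or candidate < best else best
    return best

def parallel_best_move(solution, n_slaves=4):
    positions = list(range(len(solution)))
    batches = [positions[i::n_slaves] for i in range(n_slaves)]     # partition the neighborhood
    with ProcessPoolExecutor(max_workers=n_slaves) as pool:
        results = pool.map(evaluate_batch, [(solution, b) for b in batches])
    return min(results)                                             # master keeps the best move

if __name__ == "__main__":
    current = [random.randint(0, 1) for _ in range(40)]
    print(parallel_best_move(current))
```

With P processors and a neighborhood of |N| moves, each slave evaluates roughly |N|/P moves per iteration, which is the source of the acceleration.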
Probing or look-ahead strategies belong to the 1C/KS class with any of the search differentiation models identified previously. For neighborhood-based methods, such an approach may allow slaves to perform a number of iterations before synchronization and the selection of the best neighboring solution from which the next iteration is initiated (one may move directly to the last solution identified by the slave process or not). For population-based methods, the method may allow each slave process to generate child solutions, "educate" them through a hill climbing or local search procedure, and play out a tournament to decide who of the parents and children survive and are passed back to the master. Multisearch or multithread parallel strategies for metaheuristics have generally offered better performance, in terms of solution quality and computing times, than the methods introduced above. Historically, independent and synchronous cooperative multisearch methods were proposed first. The emphasis currently is on asynchronous communications and cooperation. Most applications of such strategies generally fall into the pC category. Independent multisearch methods belong to the pC/RS class of the taxonomy. Most implementations start several independent search processes, all using the same search strategy, from different, randomly generated, initial configurations. No attempt is made to take advantage of the multiple threads running in parallel other than to identify the best overall solution once all processes stop. This definitively earns independent search strategies their Rigid Synchronization classification. (Note that, in general, the implementations designate a processor to collect the information and verify stopping criteria.) This parallelization of the classic sequential multistart heuristic is easy to implement and may offer satisfactory results. Cooperative strategies often offer superior performance, however. pC/KS cooperative strategies adopt the same general approach as in the independent search case but attempt to take advantage of the parallel exploration by synchronizing processors at predetermined intervals. In a master-slave implementation, the master process then collects the information and usually restarts the search from the best solution. Note that one can overcome the limitations of the master-slave architecture by, for example, empowering each search process to initiate synchronization of all other searches (e.g., using a broadcast) or a prespecified subset (e.g., processes that run on neighboring processors). Here, as in the more advanced cooperation mechanisms indicated below, migration is the term used to identify information exchanges in PGA. Asynchronous cooperative multithread search methods belong to the pC/C or pC/KC classes of the taxonomy according to the quantity and quality of the information exchanged and, eventually, to the "new" knowledge inferred from these exchanges. Most such developments use some form of memory for interthread communications (the terms pool, blackboard, and data warehouse are also used sometimes). Each individual search thread starts from (usually) a different initial solution and generally follows a different search strategy. Exchanges are performed asynchronously and through the pool. The information exchanged may be simply a "good" solution, a solution and its context (e.g., memories recording the recent behavior of solution attributes), or a comprehensive search history.
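One simple way to realize such a pool is a small thread-safe store that search threads post promising solutions to and draw restart or guidance points from; the sketch below is generic and does not reproduce the design of any specific method cited in this chapter.

```python
import random
import threading

class SolutionPool:
    """Central memory for asynchronous cooperation: keeps the best few solutions."""

    def __init__(self, capacity=20):
        self.capacity = capacity
        self.entries = []                      # (value, solution) pairs, best first
        self.lock = threading.Lock()

    def post(self, value, solution):
        """Called by a search thread whenever it finds a good solution."""
        with self.lock:
            self.entries.append((value, solution))
            self.entries.sort(key=lambda entry: entry[0])   # assumes minimization
            del self.entries[self.capacity:]

    def draw(self):
        """Return a stored solution (or None) to restart or guide a search thread."""
        with self.lock:
            return random.choice(self.entries)[1] if self.entries else None
```

Each search thread would call post whenever it improves its incumbent and draw when it decides to restart or intensify from shared information.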
mechanisms, most migration-based population methods for example, do not keep any trace of the information exchanges. Most others keep at least the solutions exchanged. Memories recording the performance of individual solutions, solution components, or even search threads may be added to the pool, and statistics may be gradually built. Historically, adaptive memory mechanisms relied on building a set of "good" partial solutions extracted from "good" solutions, while central-memory ones kept all solutions exchanged. The differences between the two approaches tend to become more and more blurred. Various procedures may also be added to the pool to attempt to extract information or to create new information and solutions based on the solutions exchanged. Cooperative multithread strategies implementing such methods belong to the pC/KC class.

One of the goals of memory-based cooperation strategies is to increase control over the complex information diffusion process generated by chains of interprocess information exchanges. (It is indeed well known that unrestricted access to shared information may have quite a negative impact on the global behavior of cooperative searches.) A different approach to cooperation, with the same goals, has been proposed recently. The mechanism is called diffusion and, in its original presentation, was based on a multilevel paradigm. In this setting, each search works on a version of the problem instance "aggregated" at a given level. Exchanges are then restricted to the processes working one aggregation level up and down. Exchanges involve solution values and context information that is used by various operators to guide the search at the receiving level. This pC/KC approach is presented in somewhat more detail in Section 19.4.4.

We complete this section with a note on hybrid methods. The term is much used but its meaning varies widely. In a strict sense, all metaheuristics are hybrids since they involve at least two methods. Closer to most applications, a hybrid involves at least two methods that belong to different methodological approaches. Thus, for example, using genetic operators to control the temperature evolution in a PSA method yields a hybrid. Notice, however, that by this definition, all evolutionary methods that include an "educational" component, that is, an enhancement of new solutions through a hill climbing, a local search, or even a full-blown metaheuristic, are hybrids. Most cooperative parallel strategies could be qualified as hybrids as well. Since, other than "more than one method is used", the term does not offer any fundamental insight into the design of parallel strategies for metaheuristics, we do not use it to qualify the contributions reviewed in this chapter.
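The following is a minimal sketch, under simplifying assumptions, of the asynchronous central-memory (pool/blackboard) cooperation just discussed: independent search threads post improved solutions to a shared pool and occasionally import the best solution found so far. The hill-climbing search and the bit-string problem are illustrative placeholders only.

```python
# Sketch of pC/C cooperation through a central memory (assumed toy problem).
import random
import threading

class SolutionPool:
    """Central memory: keeps the best solutions exchanged by the threads."""
    def __init__(self, capacity=10):
        self.lock = threading.Lock()
        self.entries = []          # list of (value, solution), lower is better
        self.capacity = capacity

    def post(self, value, solution):
        with self.lock:
            self.entries.append((value, list(solution)))
            self.entries.sort(key=lambda e: e[0])
            del self.entries[self.capacity:]

    def best(self):
        with self.lock:
            return self.entries[0] if self.entries else None

def cost(bits):
    return sum(bits)               # toy objective: minimize the number of ones

def search_thread(pool, n=40, iterations=2000, exchange_every=200):
    rng = random.Random()
    current = [rng.randint(0, 1) for _ in range(n)]
    for it in range(iterations):
        candidate = list(current)
        candidate[rng.randrange(n)] ^= 1          # flip one random bit
        if cost(candidate) <= cost(current):
            current = candidate
            pool.post(cost(current), current)     # share the improvement
        if it % exchange_every == 0:
            incumbent = pool.best()               # import the overall best
            if incumbent and incumbent[0] < cost(current):
                current = list(incumbent[1])

if __name__ == "__main__":
    pool = SolutionPool()
    threads = [threading.Thread(target=search_thread, args=(pool,)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("best cost found:", pool.best()[0])
```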
19.3 GRAPH COLORING
Graph coloring is a well-studied problem with many applications, including the testing of printed circuit boards for unintended short circuits, frequency assignment in wireless networks, and timetabling. Parallel metaheuristic developments for the problem are, however, very limited.
Given a graph, the problem consists in finding the minimum number of colors such that an assignment of colors to the vertices yields a coloring in which adjacent vertices display different colors. The problem is known to be NP-hard.

The unique parallel metaheuristic contribution we are aware of was proposed by Kokosinski, Kolodziej, and Kwarciany [82]. The authors proposed a cooperative coarse-grained PGA, often identified in the PGA literature as the island framework: each process in the cooperation ran the same GA, and all subpopulations were of the same dimension. Cooperation was performed by migrating, at fixed intervals, a fixed number of individuals from one island to all the others. The arriving individuals randomly replaced the same number of individuals in the receiving subpopulation. Two migration policies were proposed: random and elitist. In the former, the migrating individuals were randomly selected, while in the latter the best individuals migrated. Experimental results on benchmark test problems indicated that the elitist migration approach performed better than the random one. Optimal solutions were obtained rapidly when the elitist migration strategy was applied between a small number of subpopulations (3 or 5) of size 60.
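A minimal sketch of the elitist island migration policy just described follows: each island sends a copy of its best individual to every other island, where it replaces a randomly chosen member. The "evolve" step is a mutation-only placeholder rather than the full GA of [82], and the conflict-counting fitness is one common choice for graph coloring.

```python
# Sketch of island-model migration with an elitist policy (assumed toy GA).
import random

def conflicts(coloring, edges):
    # Graph coloring fitness: number of edges whose endpoints share a color.
    return sum(1 for u, v in edges if coloring[u] == coloring[v])

def evolve(island, edges, colors, rng):
    # Placeholder generation: mutate each individual, keep the better variant.
    new_island = []
    for ind in island:
        mutant = list(ind)
        mutant[rng.randrange(len(mutant))] = rng.randrange(colors)
        new_island.append(min(ind, mutant, key=lambda c: conflicts(c, edges)))
    return new_island

def elitist_migration(islands, edges, rng):
    # Copy the best individual of each island to every other island.
    bests = [min(isl, key=lambda c: conflicts(c, edges)) for isl in islands]
    for i, island in enumerate(islands):
        for j, best in enumerate(bests):
            if i != j:
                island[rng.randrange(len(island))] = list(best)

if __name__ == "__main__":
    rng = random.Random(0)
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 0)]
    n, colors = 5, 3
    islands = [[[rng.randrange(colors) for _ in range(n)] for _ in range(6)]
               for _ in range(3)]
    for generation in range(50):
        islands = [evolve(isl, edges, colors, rng) for isl in islands]
        if generation % 10 == 0:                      # fixed migration interval
            elitist_migration(islands, edges, rng)
    best = min((ind for isl in islands for ind in isl),
               key=lambda c: conflicts(c, edges))
    print("conflicts in best coloring:", conflicts(best, edges))
```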
19.4 GRAPH PARTITIONING

The graph (and hypergraph) partitioning problem may be stated as follows. Given a graph G = (V, E) with vertex set V and edge set E, find a partition of the graph into a number of disjoint subsets of vertices such that some conditions are satisfied. Thus, for example, find a p-set partition such that the cardinalities of the p subsets of vertices are (nearly) equal. Graph partitioning is a fundamental problem in combinatorial optimization and graph theory. It appears in many application areas, especially in computer science (e.g., VLSI design). A significant number of parallel metaheuristics, involving a wide array of metaheuristic methodologies, have been proposed for the problem.
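One common way to formalize the balanced variant mentioned above is to minimize the number of cut edges subject to a balance constraint; this particular objective is an assumption for illustration, since the text leaves the conditions generic.

```latex
% Balanced p-way graph partitioning (one common formalization, assumed here).
\min_{V_1,\dots,V_p}\;
\bigl|\{\, (u,v)\in E : u\in V_k,\ v\in V_l,\ k\neq l \,\}\bigr|
\quad\text{s.t.}\quad
\bigcup_{k=1}^{p} V_k = V,\qquad
V_k\cap V_l=\emptyset\ (k\neq l),\qquad
|V_k|\le\Bigl\lceil\tfrac{|V|}{p}\Bigr\rceil .
```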
19.4.1 Fine-Grained PGA

Mühlenbein [103] proposed an asynchronous fine-grained PGA approach where the individuals were placed on a particular topology (a grid) and a small fixed-size neighborhood was assigned to each individual. This neighborhood usually consisted of the neighboring individuals on the grid, and the application of selection and crossover operators was restricted to this neighborhood. The author suggested using different hill-climbing strategies applied concurrently, but did not implement this strategy.

Collins and Jefferson [31] developed a fine-grained PGA for the multilevel graph partitioning problem using the massively parallel Connection Machine. The main purpose of this work was to characterize the difference between local and panmictic (i.e., at the level of the entire population) selection/crossover schemes, the latter being known to converge on a single peak of multimodal functions, even when several solutions of equal quality exist. Tests were performed on a function with two optimal
solutions, and the panmictic approach never found both solutions. The method using local selection and crossover consistently found both optimal solutions and appeared more robust. The authors noted that behavior-changing modifications to the panmictic selection and crossover operators required access to global information, which they claimed was not suitable for parallelization. One may question this conclusion, given the new cooperation mechanisms proposed since and the particular architecture used (which is no longer available).

Talbi and Bessiere [146] studied the problem of mapping parallel programs onto parallel architectures. The authors proposed a mathematical model related to the graph partitioning problem and a fine-grained PGA. The initial population was arranged on a torus of transputers such that every transputer held exactly one individual. Each individual thus had four neighboring individuals, and each of them was selected in turn as the second parent for the reproduction operation. Each reproduction produced two offspring, but only one of them (randomly selected) survived to participate in the operation that replaced the individual with the best of its surviving offspring. The proposed parallel algorithm was tested on a pipeline of 32 vertices partitioned into eight subsets. Near-linear speedup was observed. Moreover, the solution quality increased with the population size. According to the authors, the fine-grained algorithm outperformed SA and hill-climbing methods.

Maruyama, Hirose, and Konagaya [97] implemented an asynchronous fine-grained PGA on a cluster of workstations and a Sequent Symmetry computer in an attempt to adapt the fine-grained strategy to coarse-grained parallel computers. Each processor had an active individual and a buffer of several suspended individuals. Active individuals were sent to all other processors, which randomly selected one individual among those received to replace one of the suspended individuals according to the fitness function. Crossover consisted of replacing part of the active individual by a part of one of the suspended individuals. Only one offspring was produced, the other parts of the active and suspended individuals being rejected. Mutation was then applied and the modified active individual was compared to the suspended individuals. If it could not survive, it was replaced by one of the suspended individuals according to the fitness function. Tests were performed with 15 processors for the Sequent Symmetry and 6 processors for the cluster of workstations. The authors reported near-linear speedups for the same solution quality on both types of coarse-grained architecture.
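The structure shared by the fine-grained (cellular) PGAs reviewed in this subsection can be sketched as follows: one individual per grid cell, with selection and crossover restricted to the cell's immediate neighbors on a torus. The one-max style objective and the uniform crossover are illustrative placeholders, not the operators of any of the cited works.

```python
# Sketch of a fine-grained (cellular) GA step on a torus grid (toy problem).
import random

def fitness(bits):
    return sum(bits)                       # toy objective to maximize

def neighbors(i, j, rows, cols):
    # Four-neighborhood on a torus, as in grid/transputer layouts.
    return [((i - 1) % rows, j), ((i + 1) % rows, j),
            (i, (j - 1) % cols), (i, (j + 1) % cols)]

def cellular_step(grid, rng):
    rows, cols = len(grid), len(grid[0])
    new_grid = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            nbrs = [grid[a][b] for a, b in neighbors(i, j, rows, cols)]
            mate = max(nbrs, key=fitness)          # local selection only
            child = [a if rng.random() < 0.5 else b
                     for a, b in zip(grid[i][j], mate)]
            if rng.random() < 0.05:                # light mutation
                child[rng.randrange(len(child))] ^= 1
            # Replace the resident individual only if the child is not worse.
            new_grid[i][j] = max(grid[i][j], child, key=fitness)
    return new_grid

if __name__ == "__main__":
    rng = random.Random(1)
    rows, cols, n = 6, 6, 24
    grid = [[[rng.randint(0, 1) for _ in range(n)] for _ in range(cols)]
            for _ in range(rows)]
    for _ in range(30):
        grid = cellular_step(grid, rng)
    best = max((ind for row in grid for ind in row), key=fitness)
    print("best fitness:", fitness(best))
```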
19.4.2 Coarse-Grained PGA

Cohoon et al. [28] and Cohoon, Martin, and Richards [29, 30] compared the independent multisearch PGA (i.e., without migration) and a pC/KS cooperative strategy where migration operators were applied at regular intervals. The latter strategy outperformed the independent search approach.

Diekmann et al. [49] proposed a 1C/RS PGA for the p-partitioning problem implemented as a master-slave model. Numerical experimentation on a MIMD machine using up to 64 processors showed sublinear speedups. Yet, the solution quality of the parallel algorithm outperformed that of the sequential version. This
performance improved with the number of processors. The authors also observed a strong dependence of the solution quality on the value of p: the larger, the better.

Lin, Punch, and Goodman [91] presented several coarse-grained GAs based on different cooperation schemes, obtained by varying the Control and Communication and the Search Differentiation strategies. Static and dynamic communications were considered. In the static model, communications are defined by the physical topology: rings, meshes, etc. The dynamic model allows several degrees of freedom in the choice of the process to communicate with by, for example, taking into account the Hamming distance between the two processes in the given computer architecture. The migration may be synchronous, after a fixed number of generations or when a convergence threshold is attained, or asynchronous (elitist). Two Search Differentiation strategies were implemented: either the same strategy for all processes or a different strategy for each, obtained by varying the genetic operators, the encoding method, and so on. Computational experiments were conducted for eight instantiations of the previous PGA model. Subpopulations were of equal size, randomly generated initially, and placed in a ring. Communications were triggered at fixed intervals. Numerical results indicated that the pC/C/MPDS strategy, i.e., different GA search strategies with asynchronous migrations toward a subpopulation dynamically selected based on Hamming distance, offered the best performance. Superlinear speedup was observed for 5, 10, and 25 subpopulations.

Hidalgo et al. [73] studied a graph partitioning problem arising from a particular circuit design application. The authors proposed a two-level hierarchical parallelization. A pC/KS/MPDS multithread cooperative method, which synchronized at regular intervals to improve the best individual, was run at the first level. The fitness computation was performed at the second level in a master-slave implementation. Experimental results showed good performance for up to eight processors. The overhead due to the parallelization of the fitness evaluation became significant for larger numbers of processors.
19.4.3 Parallel Simulated Annealing

Durand [54] addressed the issue of error tolerance for PSA methods that implement 1C/KS strategies with domain decomposition. In such strategies, the problem variables (i.e., the vertices of the graph partitioning problem) are partitioned into subsets and distributed among a number of processors. A master-slave approach was used on a shared-memory system. To initiate the search, the master processor partitioned the vertices into a number of initial sets and sent each set to a processor, together with the initial values for the temperature and the number of iterations to be executed. Each slave processor then executed the SA search at the received temperature on its allocated set of variables, and sent its partial configuration of the entire solution to the master processor. Once the information from all slaves was received, the master processor merged the partial solutions into a complete solution and verified the stopping criterion. If the search continued, it generated a new partition of the variables such that each set differed from the one in the previous partition, and sent them to the slave processors together with new values for the number of iterations and the
temperature. The author tested different levels of synchronization to measure the impact on the errors generated. As expected, the conclusions were that (1) the error is small at frequent synchronization levels but the cost in computational efficiency is high, and (2) the error increases as the frequency of synchronization decreases and the number of processors increases.

An alternative to the decomposition-based strategies is to move to multisearch approaches, where each processor runs its own cooling schedule. Lee and Lee [86, 88, 87] examined a pC/RS independent search model as well as several cooperation variants where the SA threads interact synchronously and asynchronously at fixed or dynamic intervals. For the graph partitioning problem, dynamic interval exchange strategies generally performed best. Asynchronous and synchronous cooperative multithread SA outperformed the other parallelizations in terms of solution quality and running time. The pC/C strategy, where threads exchange asynchronously through a central memory, obtained solutions of equal or better quality compared to the synchronous parallel schemes.

Laursen [83] proposed a different cooperation mechanism. Noting that several SA threads form a population, he proposed a scheme based on the selection and migration operators of parallel GAs. Each processor concurrently runs k SA procedures for a given number of iterations. Processors are then paired and each processor migrates (copies) its solutions to its paired processor. Thus, after the migration phase, each processor has 2k initial solutions, and this number is reduced to k by selection. These new k solutions become the initial configurations of the k concurrent SA threads, and the search restarts on each processor. Pairing is dynamic and depends on the topology of the parallel machine. For example, in a grid topology, processors can pair with any of their corner neighbors. Three cooperation strategies were tested: no migration, global, and local (stepping-stone). Global migration corresponded to a Knowledge Synchronization strategy: the best states were brought to a given processor, which then chose the best among the best and broadcast them to all processors. This strategy suffered a 10% to 20% overhead in communication cost and produced very poor solutions. As expected, the independent search (no migration) approach was the fastest strategy but produced lower quality solutions than the local migration strategy, which incurred only a 2% overhead. Because processors are dynamically paired and neighborhoods overlap, in the local cooperation scheme information propagates in the network of processors similarly to the stepping-stone coarse-grained model for parallel genetic methods.
19.4.4 Multilevel Cooperation

Toulouse, Thulasiraman, and Glover [158] (see also Toulouse, Glover, and Thulasiraman [157]) proposed a new cooperation mechanism and applied it to the graph and hypergraph partitioning problems with great success (Ouyang et al. [111, 112]). Their approach is currently the best available for this problem. The mechanism is called multilevel cooperative search, belongs to the pC/KC class with potentially any Search Differentiation strategy (the authors used MPSS), and is based on the principle of controlled diffusion of information. Each search process
works at a different level of aggregation of the original problem (one processor works on the original problem). The aggregation scheme ensures that a feasible solution at any level is feasible at the more disaggregated levels. Ouyang et al. [111, 112] analyzed various aggregation operators for the graph partitioning problem. Each search communicates exclusively with the processes working at the immediately higher and lower aggregation levels. Improved solutions are exchanged asynchronously at various moments dynamically determined by each process according to its own logic, status, and search history. Received solutions are used to modify the search at the receiving level. An incoming solution will not be transmitted further until a number of iterations have been performed, thus avoiding the uncontrolled diffusion of information. The approach is very successful for graph partitioning problems and is now starting to be applied to other fields.

Banos et al. [7, 6] also used the multilevel approach as a basis for constructing a cooperative search, but used a pC/RS/MPSS strategy implemented in a master-slave configuration. Each search thread consisted of a SA algorithm, enhanced with a simple TS to avoid SA cycling, and worked at a given aggregation level. Periodically, each search thread sent its best solution to the master, which selected the overall best and broadcast it back to the threads, which then continued their search. Computational results reveal that the parallel algorithm obtained solutions as good as or better than those of the sequential version in shorter computing times.
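The level-restricted exchange pattern that distinguishes multilevel cooperative search can be sketched very schematically as below: each search works at its own aggregation level and only exchanges incumbents with the levels immediately above and below. The "local_search" and the problem itself are crude placeholders; the real mechanism of [158] is asynchronous and uses problem-specific aggregation and guiding operators.

```python
# Highly simplified sketch of level-restricted exchanges (assumed toy search).
import random

def local_search(value, rng, steps=20):
    # Placeholder search: random perturbations, keep improvements (minimize |x|).
    for _ in range(steps):
        candidate = value + rng.uniform(-1.0, 1.0)
        if abs(candidate) < abs(value):
            value = candidate
    return value

def multilevel_search(levels=4, rounds=10, seed=0):
    rng = random.Random(seed)
    incumbent = [rng.uniform(-10, 10) for _ in range(levels)]
    for _ in range(rounds):
        for lvl in range(levels):
            incumbent[lvl] = local_search(incumbent[lvl], rng)
        # Exchange phase: a level may only import from its direct neighbors,
        # which controls how information diffuses across the levels.
        for lvl in range(levels):
            for nbr in (lvl - 1, lvl + 1):
                if 0 <= nbr < levels and abs(incumbent[nbr]) < abs(incumbent[lvl]):
                    incumbent[lvl] = incumbent[nbr]
    return incumbent

if __name__ == "__main__":
    print(multilevel_search())
```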
19.5 STEINER TREE PROBLEM

The Steiner tree problem, also sometimes identified as the Steiner problem (Verhoeven and Severens [161]), or as the Steiner problem in graphs or the Steiner minimal tree (Martins, Ribeiro, and Souza [96]), has many applications, including VLSI design and telecommunication network design (e.g., multicast routing). Consider a graph G = (V, E) with vertex set V and edge set E, and a nonnegative weight function w that associates a weight w(e) with every edge e in E. Let X be a subset of V such that V \ X is nonempty. The Steiner tree problem then consists in finding a minimum-weight subtree of G spanning all terminal vertices in X. The set of non-terminal vertices of the minimum tree is called the Steiner set. The Steiner tree problem is NP-hard. Many metaheuristics of various types, e.g., TS, SA, GRASP, GA, and local search, have therefore been proposed. A few of these methods have been parallelized.
19.5.1 Parallel GRASP

Martins, Ribeiro, and Souza [96] and Martins et al. [95] proposed several GRASP procedures for the Steiner problem, as well as parallel versions of these methods. The authors used a pC/RS/MPSS parallelization strategy implemented according to a master-slave model. Each thread ran the same GRASP procedure with a different initial seed for a number of iterations equal to those of the
sequential version divided by the number of available processors. The best solution was collected at the end. The strategy achieved high solution quality for the problem instances tested, as well as good speedups. The Martins et al. [95] implementation achieved results comparable, in terms of solution quality, to those of the best known TS with path-relinking method of Bastos and Ribeiro [10].
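A minimal sketch of this pC/RS/MPSS scheme follows: the sequential iteration budget is split evenly across processes, each process runs the same GRASP with a different seed, and the best solution is collected at the end. The construction and local search phases below are generic placeholders, not the Steiner-tree procedures of [95, 96].

```python
# Sketch of independent parallel GRASP with an evenly split iteration budget.
import random
from multiprocessing import Pool

def grasp_iteration(rng, n=20, alpha=0.3, target=7):
    # Randomized construction on a toy problem (select about `target` items),
    # followed by a simple first-improvement local search.
    solution = [1 if rng.random() < alpha else 0 for _ in range(n)]
    def cost(sol):
        return abs(sum(sol) - target)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            flipped = solution[:i] + [1 - solution[i]] + solution[i + 1:]
            if cost(flipped) < cost(solution):
                solution, improved = flipped, True
    return cost(solution), solution

def grasp_worker(args):
    seed, iterations = args
    rng = random.Random(seed)         # a different seed per thread
    return min(grasp_iteration(rng) for _ in range(iterations))

def parallel_grasp(total_iterations=400, workers=4):
    share = total_iterations // workers   # sequential budget split evenly
    tasks = [(seed, share) for seed in range(workers)]
    with Pool(workers) as pool:
        return min(pool.map(grasp_worker, tasks))

if __name__ == "__main__":
    best_value, best_solution = parallel_grasp()
    print("best value:", best_value)
```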
19.5.2 Parallel Local Search

Verhoeven and Severens [161] proposed sequential and parallel local search methods for the Steiner tree problem based on a novel neighborhood, which the authors claimed was "better" than those known in the literature. The parallel strategy followed a 1C/KS model. Computational results indicated that good speedups could be obtained without loss in solution quality. In fact, the proposed parallel algorithm outperformed, in terms of speedup, the parallel GRASP of Martins, Ribeiro, and Souza [96].
19.6 SET PARTITIONING AND COVERING

Consider a set S of cardinality n and a collection C = (S1, S2, ..., Sm) of m subsets of S. A weight cj is associated with each subset Sj. A partition of S is a subset of C such that all elements of S are included and each of them belongs to exactly one set of the partition. A cover of S is a subset of C such that each element of S belongs to at least one subset in the cover. The weight of a partition or of a cover is the sum of the weights of the included sets. The set partitioning problem consists of finding a partition of S with minimum total weight, while a cover of S of minimum total weight is the goal of the set covering problem.

The set partitioning and set covering problems may also be cast as 0-1 linear optimization problems: min Σ_{j=1,...,m} cj xj subject to Ax ≥ (=) 1 and xj ∈ {0, 1}, where a row of the matrix A corresponds to an element of S, each column to a subset Sj, j = 1, ..., m, of C, and each decision variable xj, j = 1, ..., m, indicates whether the corresponding subset is part of the optimal partition (cover) or not.

The literature on the set partitioning and set covering problems is very rich. The two problems appear in many application domains: routing and scheduling, location and design, production, capital investment, image coding, and so on. Exact, heuristic, and metaheuristic solution methods have been proposed, including a number of parallel metaheuristics summarized in the following.
19.6.1 Set Partitioning Applications

We identified only two contributions aimed at parallel metaheuristics for the set partitioning problem. Levine [89] addressed the set partitioning problem applied to the airline crew scheduling problem and proposed two versions of a coarse-grained, island PGA:
with (cooperative search) and without (independent search) migration. A simple cooperative mechanism was proposed: the best chromosome in a subpopulation migrates to a neighboring subpopulation at fixed intervals, while the chromosome to delete is randomly selected. An MPSS Search Differentiation strategy was selected, the initial populations for the islands being randomly and independently generated. Numerical results indicated that the two parallel versions generally offered similar solution quality, with a slight advantage to the method integrating migration. Both methods outperformed the sequential approach.

Czech [45] studied a single-depot vehicle routing problem where each route cannot serve more than a fixed, small number of customers. The author formulated the problem as a set partitioning problem and proposed two PSA algorithms, based on the independent and cooperative multisearch paradigms, respectively. The cooperation is of the pC/KS type: the SA processes transmitted their best solutions every n steps, and each thread then restarted the search after updating its best solution. The reported results show that both parallel methods obtain better results than the sequential version in terms of solution quality. Moreover, the cooperative method outperformed the independent multithread search.
19.6.2 Set Covering Applications

We identified four contributions to the parallel metaheuristic literature for the set covering problem: two island-based cooperative PGA methods, one ant colony approach, and one proposing randomized approaches.

Calegari et al. [20] considered the problem of selecting the best set of radio-transmitter locations such that a maximum coverage area is achieved at an optimal cost (or with a minimum number of radio transmitters), and stated it as a variant of the set covering problem. The authors proposed a pC/KS cooperative approach according to an island model with a small population (two or four individuals) associated with each island (and processor). The islands were arranged on an oriented ring, and migrations were only allowed between neighboring islands. The authors chose this arrangement to minimize the amount of migration and the communication overhead due to migrations between remote islands. Once a new generation was computed, a copy of the best individual of each island was sent to the next island on the ring. Each island thus received a new individual that replaced a randomly selected local individual. The proposed cooperative multithread search performed well in terms of computational time compared to the sequential GA. The speedup was almost linear and the efficiency reached 93%.

Solar, Parada, and Urrutia [137] presented a 1C/KS parallel method based on a coarse-grained GA. The authors used an island model, where each island contained one subpopulation. Initial subpopulations were independently and randomly generated. A master-slave scheme was selected to implement the proposed approach. Each slave ran a standard GA on its island. After performing the computations of each generation, each slave sent its best individual to the master. Once the master received all the best individuals from the slaves, it selected the overall best and broadcast it. Each slave replaced its worst local solution with the individual sent by the master
and relaunched its GA. The authors reported numerical results showing that the parallel approach could reach near-optimal solutions; errors ranged from 3.3% to 10% of the optimal solution value for almost all problems tested. They also concluded that the parallel approach is more efficient than the corresponding sequential one in the number of generations required to obtain a given solution quality. The authors claimed that the solution quality of the proposed algorithm is better than that of both TS and SA. Unfortunately, this comparison is not very helpful because no details were given regarding solution quality versus computational time for each of these approaches.

Rahoual, Hadji, and Bachelet [118] presented two parallel approaches based on a combination of an ant colony system and a local search heuristic. The sequential method, called AntsLS, consisted in launching the ant searches and applying a local search to each solution found. This sequence was repeated until a stopping criterion based on a convergence threshold was reached. The first parallel approach was a pC/RS independent multithread search, where each thread ran AntsLS and the best solution was collected at the end. The second method was a direct parallelization of AntsLS according to a 1C/KS strategy in a master-slave implementation. Each slave held one ant process, and the master synchronized the searches of all ants. Once a solution was found by every slave, it was sent to the master, which performed the pheromone update and updated the best solution. The numerical results show a good performance of the independent search approach in both solution quality and speedup. Not surprisingly, the 1C/KS approach did not perform well due to high communication times. It is noteworthy that the two methods do not parallelize the same algorithm and cannot therefore be compared.

Fiorenzo Catalano and Malucelli [57] discussed several general schemes that lead to approximate approaches for the set covering problem. These schemes embedded two constructive heuristics: a greedy algorithm, which at each iteration added a set to the partial solution according to associated probabilities, and a randomized primal-dual approach (Beasley [13]). The authors proposed synchronous and asynchronous parallelizations of these schemes. The first synchronous approach, a variant of the 1C/KS/MPSS strategy, was used for the randomized primal-dual method. It was implemented according to a master-slave scheme. The master held the reduced cost information and updated the Lagrangian multipliers. It also updated the best solution following synchronization, at regular intervals, when slaves sent their best solutions. Slaves received from the master the information required to run the procedure, but also exchanged and updated information regarding recent set operations. The second synchronous approach followed a pC/KS scheme where independent searches regularly exchange information. It was used with both randomized procedures. The parallel algorithms performed well.

19.7 SATISFIABILITY PROBLEMS
Satisfiability problems are Boolean problems of central importance in various fields such as artificial intelligence, automated reasoning, computer design, database search,
computational complexity, and so on. Loosely speaking, the problem consists in finding an assignment of variables that evaluates a given formula to TRUE. More precisely, consider a set of variables x1, ..., xn, a set of clauses C1, ..., Cm, and the operators ∧ (AND), ∨ (OR), and NOT. A variable may be either TRUE or FALSE. A literal is defined as a variable or its negation. A clause is a finite disjunction of one or more literals. Thus, for example, a clause with three literals, C = x1 ∨ x2 ∨ NOT x3, will be true if at least one of the literals is true. The satisfiability problem SAT consists in determining whether there is an assignment of variables that evaluates the formula C1 ∧ C2 ∧ ... ∧ Cm to TRUE. The maximum satisfiability problem, MAX-SAT, refers to finding a truth assignment of variables such that the number of TRUE clauses is maximized.
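The MAX-SAT objective just defined can be made concrete with a small helper that counts satisfied clauses. The clause encoding used here (a positive integer i for xi, a negative one for NOT xi) is a common convention assumed for illustration, not one prescribed by the chapter.

```python
# Counting satisfied clauses: the MAX-SAT fitness of a truth assignment.
def satisfied_clauses(clauses, assignment):
    """assignment[i] is the truth value of variable x_(i+1)."""
    count = 0
    for clause in clauses:
        for literal in clause:
            value = assignment[abs(literal) - 1]
            if (literal > 0 and value) or (literal < 0 and not value):
                count += 1          # clause satisfied by this literal
                break
    return count

if __name__ == "__main__":
    # (x1 OR x2 OR NOT x3) AND (NOT x1 OR x3) AND (x2)
    clauses = [[1, 2, -3], [-1, 3], [2]]
    print(satisfied_clauses(clauses, [True, False, True]))   # -> 2
```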
19.7.1 Parallel Genetic Algorithms

Wilkerson and Nemer-Preece [163] proposed two coarse-grained PGAs, an independent and a cooperative multisearch. An initial sequential phase reduced the search space by assigning a TRUE or FALSE value to each variable that appeared in the same literal (positive or negative form) in all the clauses. The remaining variables were used to generate the initial populations for an island model with 2^p processors, where p ∈ {2, 3, 4, 5} is the number of highest-ranking variables according to the Jeroslow-Wang rule. A different assignment of TRUE or FALSE values to these p variables was performed for each processor. Then, an initial population was randomly generated for each processor by considering the rest of the variables. The same GA was used for all islands. A pC/KS cooperation mechanism was implemented: at fixed intervals (numbers of iterations), each processor broadcast its best individual to all other processors, the best new individual replacing the worst individual of the receiving population. Experimental results showed that the cooperative model outperformed the independent strategy, achieving superlinear speedups on some problem instances.

Folino, Pizzuti, and Spezzano [60, 59] proposed a fine-grained PGA based on a diffusion model in a Cellular Automata framework. Every cell contained one individual and interacted with the neighbor displaying the best fitness. The offspring survived to replace the parent, and was enhanced through a local search (Selman, Levesque, and Mitchell [131], Selman, Kautz, and Cohen [130]), if it had a better fitness. Comparative experimentation on hard 3-SAT problems (Mitchell, Selman, and Levesque [100]) showed that the proposed method outperformed the parallel version of the local search used to enhance surviving offspring. Additional results reported later (Folino, Pizzuti, and Spezzano [61]) displayed almost linear speedup and high quality solutions. Moreover, the parallel GA outperformed an SA (Spears [139]) and a GA (Marchiori and Rossi [94]) developed for the SAT problem.
19.7.2 Parallel Simulated Annealing

Sohn and Biswas [136] and Sohn [135] proposed a 1C/RS PSA method for the L-SAT problem, a variant of the SAT problem where L is the ratio of the number of clauses to the number of variables. The algorithm was implemented according to a master-slave scheme. At each temperature and step, p iterations were distributed by the master among p slaves, one iteration per slave. The master received all the "accepted" solutions of the slaves and, in order to avoid errors associated with the simultaneous evaluation of solutions, it kept the solution of the slave with the smallest index in the list of processors. The algorithm was implemented on a large-scale distributed-memory multiprocessor machine. The authors reported high quality solutions, the quality increasing with the number of processors. Almost linear speedup was reported, particularly when the number of processors was below 100.

19.7.3 Parallel GRASP

Pitsoulis, Pardalos, and Resende [116] proposed an independent multisearch parallel GRASP method for the MAX-SAT problem. Different seeds were used for each thread in the construction phase to favor the exploration of different search spaces by the independent threads. The maximum number of iterations (Resende and Feo [123]) was divided among the processors. The authors reported high quality solutions with almost linear speedups.
19.7.4 Parallel Ant Colony Systems Assume a positive weight w iis associated with each clause i. The weighted MAXSAT problem then refers to finding a truth assignment of variables such that the total weight of true clauses is maximized. Drias and Ibri [50] proposed a sequential ant colony system for MAX-SAT,as well as two parallelizations. Similar to coarsegrained strategies for population methods, both parallelizations are based on dividing the colony into several subcolonies, each assigned to a different processor. In the lC/KS/MPSS strategy, implemented according to the master-slave model, each slave sent to the master its best solution once its ants finished searching. The master then updated the pheromone and launched a new search phase. The second parallel method followed a pC/KS/MPSS strategy. Each process executes the same ant colony method on its subcolony. When the search is over, it broadcasts its best solution to all other subcolonies and requests their current best solutions. A “local” synchronization is thus generated within an asynchronouscommunication environment. The process then selects the overall best solution, updates the pheromone, and restarts the search. Numerical tests showed the superiority of the cooperative strategy over the first approach. Both parallel methods outperformed the scatter search proposed by Drias and Khabzaoui [51] and the sequential GRASP presented in Resende and Feo [ 1241.
462
PARALLEL METAHEURISTICS APPLICATIONS
19.8 QUADRATIC ASSIGNMENT
One of the most difficult problems in combinatorial optimization, the quadratic assignment problem (QAP),may be simply stated as follows. Given A and B, two square matrices of dimension n, find a permutation p to minimize
Σ_{i=1,...,n} Σ_{j=1,...,n} a_{ij} b_{p(i)p(j)}.

The QAP has many applications, facility location problems in particular. Neighborhoods based on swapping elements in the permutation have been particularly popular, even though their dimensions grow very fast.
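To make the notation concrete, the objective above can be evaluated directly as follows; p is given as a list where p[i] is the index assigned to i (0-based indexing, a common convention assumed here).

```python
# Direct evaluation of the QAP objective written out above.
def qap_cost(A, B, p):
    n = len(p)
    return sum(A[i][j] * B[p[i]][p[j]] for i in range(n) for j in range(n))

if __name__ == "__main__":
    A = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]   # e.g., flows between facilities
    B = [[0, 4, 5], [4, 0, 6], [5, 6, 0]]   # e.g., distances between locations
    print(qap_cost(A, B, [2, 0, 1]))
```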
19.8.1 Tabu Search

Among the first parallelizations of TS for the QAP, as for other applications with large neighborhoods and relatively small computing efforts required to evaluate and perform a given move, one finds the 1C/RS/SPSS strategy that targets the neighborhood evaluation. At each iteration, the possible moves in the neighborhood of the current solution are partitioned into as many sets as the number of available processors and the evaluation is carried out in parallel by slave processes. Chakrapani and Skorin-Kapov [22, 24, 25] proposed and studied such algorithms for the Connection Machine CM-2, a massively parallel SIMD machine. At the time, the authors reported that they either attained or improved the best known solutions for benchmark problems in a significantly smaller number of iterations.

Taillard [142, 144] used a different implementation and a ring of 10 transputers. There was no specific master processor. Following the initial partition of the set of possible moves and their assignment to different processors, each processor evaluated the pair-wise interchange moves and identified the best one. It then broadcast it to all other processors, which then performed all the normal tasks of a "master": selecting and implementing the move, making the necessary adjustments and updates, partitioning the neighborhood, etc. Load balancing through the partition of the neighborhood was acknowledged as critical, but no indication was given on how it was performed. On several problem sets proposed in the literature (essentially, the same set used by Chakrapani and Skorin-Kapov, with problem instances up to size 100), Taillard reported very good solutions, improving the best known values of many of the problems tested and obtaining suboptimal solutions (conjectured but not proven to be optimal) for problems up to size 64.

Battiti and Tecchiolli [11, 12] proposed an independent multithread PTS, where the independent TSs started the exploration of the solution domain from different, randomly generated initial solutions. Each TS included a hashing procedure to dynamically modify the length of the tabu lists and thus react to the behavior of the search. The authors then proceeded to derive probability formulas for the success of this pC/RS/MPDS global search, which tended to show that the independent search parallelization scheme was efficient - the probability of success increased, while the average success time decreased, with the number of processors - provided the tabu procedure did not cycle.
De Falco et al. [47] proposed a pC/KS cooperative approach. At each iteration, each search thread performed a local search from its best solution. Best solutions were then exchanged between searches that ran on neighboring processors. Local best solutions were replaced with imported ones only if the latter were better. The authors experimented on several architectures and reported that they obtained better solutions when cooperation was included than with an independent thread strategy. Superlinear speedups were reported.

Talbi, Hafidi, and Geib [147] (see also Talbi et al. [149] and Talbi, Hafidi, and Geib [148]) presented independent multithread strategies based on TS hybridized with SA for the intensification phase. An interesting feature of this contribution was the dynamic-loading mechanism, which made it possible to use the available resources of a heterogeneous network of single-processor and multiprocessor computers. The computational results showed very promising performance of the proposed dynamic loading mechanism in terms of scheduling overhead, which was low (0.09%) compared to the total execution time. Moreover, the authors claimed very high solution quality compared to the literature.
19.8.2 Genetic Algorithm

Mühlenbein [101] proposed a fine-grained PGA for the QAP. Individuals were arranged into two equal subsets placed on two rings such that each individual had two neighbors on each ring. A hill-climbing heuristic was applied to individuals and GA operators were applied to the resulting local optima. It is interesting to recall that this work was part of a larger body of contributions in which hill-climbing heuristics were embedded into GAs to improve ("educate") individuals and the impact of this hybridization on GA behavior and performance was addressed (e.g., Muhlenbein, Gorges-Schleuter, and Kramer [106, 107], Muhlenbein [102, 104, 105]).

19.8.3 Simulated Annealing

Boissin and Lutton [17] developed a parallel SA that used a domain decomposition strategy on a massively parallel computer. Experiments performed on a 16K Connection Machine produced interesting performances. Laursen [83] applied to the QAP the same cooperation mechanism described in Section 19.4.3. The strategy suffered a 10% to 20% communication overhead and produced very poor solutions.
19.8.4 Parallel GRASP

Li, Pardalos, and Resende [90] and Pardalos, Pitsoulis, and Resende [115] proposed for the large-scale QAP the same pC/RS/MPSS parallelization strategy that was later applied to the Steiner problem (i.e., Martins, Ribeiro, and Souza [96] and Martins et al. [95]; see Section 19.5.1). Recall that the strategy calls for the equal distribution of a maximum number of iterations among the available GRASP threads. Numerical experiments revealed almost linear speedups, around 62 for 64 processors.
Pardalos, Li, and Murthy [113] proposed a pC/RS/MPSS independent multisearch method. Computational results on benchmark problem instances showed that the best known solutions were generally found. The speedup results were more varied.
19.8.5 Parallel Scatter Search

Cung et al. [44] introduced the first parallel scatter search method for the QAP. The authors proposed an independent multithread SS, where the different searches used different parameter settings. The computational results showed encouraging speedups, but no improvement in solution quality compared to the sequential algorithm. According to the authors, the main explanation is the rapid convergence of the small subpopulation of each search.
19.8.6 Parallel Ant Colony Systems

To the best of our knowledge, the first parallel implementation of an Ant Colony System (ACS) for the QAP was proposed by Talbi et al. [150, 151]. The method consisted in a 1C/KS parallelization of ANTabu, which is a combination of an ACS and TS. In a master-slave implementation, the master kept and updated the pheromone matrix and the best solution. At each iteration, the master spread the current pheromone matrix among all the ants. Each ant performed its search to construct a complete solution, launched a TS to improve that solution, and sent the final solution to the master. Computational tests conducted with 10 ants indicated that the method offered good performance, in terms of solution quality, compared to the independent multithread TS proposed by Talbi, Hafidi, and Geib [147].
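A minimal sketch of the 1C/KS master-slave ant colony pattern just described follows: the master keeps the pheromone matrix, makes it available to the ants at every iteration, collects their solutions, and reinforces the best one. The toy assignment problem and the update rule are illustrative placeholders and do not reproduce the ANTabu method of [150, 151] (in particular, the TS improvement step is omitted).

```python
# Sketch of a master-slave ant colony loop (assumed toy assignment problem).
import random

def construct_solution(pheromone, rng):
    # Each ant assigns every item to a position, biased by pheromone values.
    n = len(pheromone)
    free = list(range(n))
    solution = []
    for i in range(n):
        weights = [pheromone[i][j] for j in free]
        solution.append(free.pop(rng.choices(range(len(free)), weights)[0]))
    return solution

def cost(solution):
    # Toy objective: how far the permutation is from the identity.
    return sum(abs(pos - item) for pos, item in enumerate(solution))

def master_loop(n=8, ants=10, iterations=50, evaporation=0.1, seed=0):
    rng = random.Random(seed)
    pheromone = [[1.0] * n for _ in range(n)]
    best = None
    for _ in range(iterations):
        # "Slaves": each ant builds a solution from the current pheromone.
        solutions = [construct_solution(pheromone, rng) for _ in range(ants)]
        iteration_best = min(solutions, key=cost)
        if best is None or cost(iteration_best) < cost(best):
            best = iteration_best
        # Master: evaporate, then reinforce the entries used by the best ant.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= (1.0 - evaporation)
        for i, j in enumerate(best):
            pheromone[i][j] += 1.0
    return best, cost(best)

if __name__ == "__main__":
    print(master_loop())
```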
19.9 LOCATION PROBLEMS

Location problems may be defined in continuous or discrete space (two dimensions are generally used, but not always) or on graphs. The objective is to select and locate a number of points to optimally cover given zones or points while satisfying various constraints (topological, demand, capacity, and so on). Location problems are broadly applied, from marketing and economic research to political districting, and from transportation and logistics to telecommunications and production. We present in this section parallel metaheuristics proposed for three particular classes of location problems.
19.9.1 Simple Plant Location Problem

The simple plant location problem is a classical discrete location problem. We are given a set of customer points with known demands and a set of potential plant (or warehouse) locations. A fixed cost for opening the facility is associated with each potential plant site. Transportation costs are associated with each customer-plant site
pair. The objective is to select the plant locations such that customer demand is satisfied at minimum total system cost, that is, opening plus transportation cost.

Kohlmorgen, Schmeck, and Haase [81] proposed two fine-grained parallel versions, based on the island and the diffusion models, for the uncapacitated warehouse location problem. The initial population was divided into 1, 4, 16, 64, 256, and 1024 islands on a parallel machine with 16K processors, each processor holding one individual. Individuals were arranged in a two-dimensional grid and had eight neighbors. In the first version, a sequential GA was launched in each island (for each subpopulation). Individuals were exchanged between neighboring islands after a predetermined number of iterations, and the migration rate was fixed as well. The second model was based on selecting partners at given distances: each processor selected a neighboring partner at the given distance in one of the eight possible directions, using a wheel selection rule. Experimental results indicated that the performance of the second model increased with the number of neighbors. Best results were found when an elitist selection procedure was used. No results were given for the first island model.

19.9.2 Location with Balancing Requirements

The multicommodity location problem with balancing requirements emerged in the context of the design of intermodal transportation systems. We are given a set of vehicles/commodities, a set of potential locations for vehicle depots, and two sets of "customers", one that supplies and another that requests known quantities of given vehicle types. Commodity-specific transportation costs are associated with the customer-to-depot, depot-to-customer, and depot-to-depot movements (there are no customer-to-customer movements). The latter are more efficient (bulk shipments) and cost less than the other two. Fixed opening costs characterize the depot locations. The objective is to design the system such that the total cost is minimized while satisfying requests at supply and demand points. The problem is NP-hard and is represented as a fixed-cost, mixed-integer formulation with a multicommodity network structure.

The problem served to illustrate the parallel TS taxonomy introduced by Crainic, Toulouse, and Gendreau [41], as well as a thorough comparison of various parallelization strategies based on this taxonomy (Crainic, Toulouse, and Gendreau [40, 39]). The authors implemented and compared a 1C/RS/SPSS and a 1C/KS/SPSS method. In a master-slave implementation, the first method had slaves evaluate candidate moves only, while in the second, called probing, slaves also performed a few local search iterations. The second strategy performed marginally better. However, both methods were outperformed by p-control implementations that attempt a more thorough exploration of the solution space. A decomposition approach, which partitioned the vector of decision variables and performed a search on each subset, was also part of the study (Crainic, Toulouse, and Gendreau [40]). It performed poorly, mainly because of the nature of the class of problems considered: multicommodity location with balancing requirements requires a significant computational effort to evaluate and implement moves, resulting in a limited number of moves that may be performed during the search.
As far as we can tell, Crainic, Toulouse, and Gendreau [39, 41] proposed the first cooperative central memory strategy for TS as part of their taxonomy. Other than this pC/KC/MPDS strategy, the authors also implemented and compared an independent multithread pC/RS/MPDS approach, pC/KS synchronous cooperations (varying the synchronization mechanisms and the Search Differentiation strategies), and broadcast-based asynchronous pC/C cooperative strategies. The authors report that the parallel versions achieved better quality solutions than the sequential ones and that, in general, asynchronous methods outperformed synchronous strategies. The independent multisearch and the asynchronous cooperative approaches offered the best performance.

Crainic and Gendreau [34] report the development of a hybrid search strategy combining the cooperative multithread parallel TS method of Crainic, Toulouse, and Gendreau [39] with a genetic engine. The GA initializes its population with the first elements from the central memory of the parallel TS. Asynchronous migration (migration rate = 1) subsequently transfers the best solution of the genetic pool to the parallel TS central memory, as well as solutions of the central memory toward the genetic population. The hybrid appears to perform well, especially on larger problems where the best known solutions are improved. It is noteworthy that the GA alone was not performing well and that it was the parallel TS procedure that identified the best results once the genetic method had contributed to the quality of the central memory.

The multicommodity location-allocation problem has also been used to study the impact of cooperation on the global behavior of the search (Toulouse et al. [155], Toulouse, Crainic, and Sanso [153, 154], Toulouse, Crainic, and Thulasiraman [156]). The authors showed that cooperative metaheuristics with unrestricted access to shared knowledge may experience serious premature "convergence" difficulties, especially when the shared knowledge reduces to one solution only (the overall best or the new best from a given thread). This is due to a combination of factors: there is no global history or knowledge relative to the trajectory of the parallel search and thus each process has a (very) partial view of the entire search; threads often use similar criteria to access the (same) "best" solution; and each process may broadcast a new best solution after only a few moves and thus disseminate information in which only part of the solution has been properly evaluated. The contents of the shared data then tend to become stable and biased by the best moves of the best-performing threads, and one observes a premature convergence of the dynamic search process. Moreover, the phenomenon may be observed whether or not one initializes the local memories following the import of an external solution. This study explains several of the anomalies reported in the literature. It has also been the motivation for the development of more advanced cooperation concepts, in particular the central memory and the multilevel diffusion cooperation mechanisms.

Gendron, Potvin, and Soriano [67] proposed a cooperative multithread parallel metaheuristic for the capacitated version of the multicommodity location problem with balancing requirements. The method combined Variable Neighborhood Descent (VND), a version of VNS, and Slope Scaling (SS). The central-memory-based cooperation was implemented via a master-slave architecture and consisted of two phases.
In the first phase, the slaves performed SS procedures in parallel and sent their best solutions to the SS and VND memories located in the central memory. In the second phase, a certain proportion of the slaves ran a VND search, while the others ran SS heuristics. VND processes started from the SS memory and fed the VND memory, while SS processes started from the VND memory and fed the SS memory. The computational tests revealed that the proposed parallel search was more diversified and thus performed better as the number of slaves increased.
19.9.3 The p-Median Problem

Given a set of potential locations for p facilities and a set of locations for their many users, the p-median problem aims to locate the p facilities simultaneously in order to minimize the total transportation cost of satisfying the demand of the users, each supplied from its closest facility. The p-median problem is one of the fundamental models in (discrete) location theory and a classical combinatorial optimization formulation with a broad application range, including cluster analysis and data mining. Despite the apparent simplicity of its mathematical expression, the p-median problem is difficult to solve. It belongs to the NP-hard class of problems, and exact solution methods cannot address realistically sized problem instances in most cases of interest.

19.9.3.1 Parallel Variable Neighborhood Search. Garcia-Lopez et al. [62] proposed the first parallelizations for the p-median problem and among the first for VNS in general. The authors introduced and compared three strategies. The first approach was a 1C/RS parallelization that attempted to reduce computation time by parallelizing the local search phase within a sequential VNS. The second implemented an independent multisearch strategy, pC/RS/MPSS, which ran an independent VNS procedure on each processor, the best solution being collected at the end. The third method applied a pC/RS synchronous cooperation mechanism through a classical master-slave approach: the master processor ran a sequential VNS, and the current solution was sent to each slave processor, which modified it randomly to obtain an initial solution from which a local search was started. The solutions were passed on to the master, which selected the best and continued the algorithm. The authors tested their methods using TSPLIB problem instances with 1400 customers only. Not surprisingly, the last two strategies found better solutions, the third approach using marginally fewer iterations than the second one.

Crainic et al. [36] presented a pC/C/MPSS cooperation mechanism that allowed each individual search access to the current overall best solution without disturbing its normal proceedings. The parallel procedure was implemented in an asynchronous master-slave scheme and may be summarized as follows. Individual VNS processes communicated exclusively with a central process called the central memory or master. There were no communications among individual VNS processes. The master kept, updated, and communicated the current overall best solution. Solution updates and communications were performed following messages from the individual VNS processes. The master also initiated and terminated the algorithm. To initiate the
process, a parallel reduced VNS (with no local search phase) was executed until a number of unsuccessful trials was observed. Each search process implemented the same VNS metaheuristic (the local search used First Improvement, Fast Interchange, and k_max = p). It proceeded with the "normal" VNS exploration for as long as it improved the solution. When the solution was not improved, it was communicated to the master (if better than the one at the last communication) and the overall best solution was requested from the master. The search was then continued starting from the best overall solution in the current neighborhood. Computational results on TSPLIB problem instances with up to 11,849 customers showed that the cooperative strategy yielded significant gains in terms of computation time without losing solution quality. The quality of the solutions obtained was, in fact, comparable to that of the best results in the literature.
19.9.3.2 Parallel Scatter Search. Garcia-Lopez et al. [63] proposed three parallel SS methods for the p-median problem. The first parallelization was a 1C/RS synchronous parallel SS, where the neighborhood was divided into several disjoint subsets that were assigned to processors, one subset per processor. Each processor ran a local search on its subset and returned the result to the master running the sequential SS. The second parallelization, called replicated combination SS (RCSS), also led to run-time reductions. It consisted of partitioning the reference set and running separate SSs on each of the parts. The third parallelization followed an independent multisearch strategy by running the SS in parallel for several populations. According to the reported computational results, the simple 1C/RS parallelization achieved superlinear speedup, while the best values were found by the independent multisearch procedure.

19.10 NETWORK DESIGN

Network design problems are a generalization of location formulations. They are defined on graphs containing nodes, connected by links, which may be either undirected edges or directed arcs. Links may have various characteristics, such as length, capacity, and cost. In particular, fixed costs may be associated with some or all links, signaling that as soon as one chooses to use a particular link, one has to incur the fixed cost, in excess of the utilization cost, which in most cases is related to the volume of traffic on the link. Recall that when the fixed costs are associated with nodes, one obtains the formulation of a location problem. Such representations are generally used to model the cost of constructing new facilities, offering new services, or adding capacity to existing facilities. In network design problems, the aim is to choose links in a network, possibly along with capacities, in order to enable demand to flow between origins and destinations at the lowest possible system cost, i.e., the total fixed cost of selecting the links plus the total variable cost of using the network. Network design has a wide variety of applications in transportation, logistics, telecommunications, production, and so on.
We found few parallel metaheuristic developments for the fundamental network design formulations. Most contributions are dedicated to various aspects of telecommunication network design, as illustrated in this section.

19.10.1 Multicommodity Network Design

Crainic and Gendreau [35] proposed a pC/KC/MPDS cooperative multithread parallel TS for the fixed-cost, capacitated, multicommodity network design (CMND) problem. In their study, the individual TS threads differed in their initial solutions and parameter settings. Communications were performed asynchronously through a central-memory device. The authors compared five strategies for retrieving a solution from the pool when requested by an individual thread. The strategy that always returns the overall best solution displayed the best performance when few (4) processors were used. When the number of processors was increased, a probabilistic procedure, based on the rank of the solution in the pool, appeared to offer the best performance. The parallel procedure improved the quality of the solution and also required less (wall-clock) computing time than the sequential version, particularly for large problems with many commodities (results for problems with up to 700 design arcs and 400 commodities are reported). The experimental results also emphasized the need for the individual threads to proceed unhindered for some time (e.g., until the first diversification move) before initiating exchanges of solutions. This ensures that local search histories can be established and good solutions can be found, establishing the central memory as an elite candidate set. By contrast, early and frequent communications yielded a totally random search that was ineffective. The cooperative multithread procedure also outperformed an independent search strategy that used the same search parameters and started from the same initial points.

Crainic, Toulouse, and Li [42] proposed a multilevel diffusion parallel algorithm for the CMND problem. Each individual search thread involved in the cooperation made use of the cycle-based TS proposed by Ghamlouche, Crainic, and Gendreau [68] and explored a given level of aggregation of the graph. Aggregation was performed by variable fixing. Sets of elite solutions were built at each level, and local memories recorded the frequency of variables in elite solutions. Exchanges occurred at regular intervals and involved solutions as well as context information (e.g., memories). Computational results on benchmark problem instances indicated that this approach yielded good quality solutions, comparable to those obtained by the current best metaheuristics for the problem.

19.10.2 Telecommunication Network Design
19.10.2 Telecommunication Network Design

Sleem et al. [134] studied the design of cost-effective capacitated ATM networks providing low end-to-end delays. A 1C/RS/MPSS parallelization strategy was implemented on a distributed network of workstations and on a multiprocessor computer. A first GA was applied to an initial population. The resulting population was divided into subpopulations, one per processor, and a second GA was applied to each subpopulation. The resulting subpopulations were then returned to the master process that
reconstituted the whole population and restarted the process. Surprisingly, numerical results indicated better performance on the distributed network than on the parallel machine. The general performance of the parallel algorithm, compared to the serial method, was mediocre due to the overhead associated with exchanging large quantities of data (subpopulations).

Flores, Cegla, and Caceres [58] proposed parallel GA methods for multiobjective telecommunication network design. The authors proposed a pC/RS/MPSS cooperative coarse-grained island strategy. Each subpopulation performed its own evolutionary algorithm. Following each generation, an elite migration was triggered from each subpopulation to all other subpopulations (a schematic sketch of this migration pattern closes this subsection). The numerical experimentation was used exclusively to compare this parallel strategy applied to two GAs, SPEA (Strength Pareto Evolutionary Algorithm) and NSGA (Nondominated Sorting Genetic Algorithm), which differed in the fitness evaluation procedure. The former dominated the latter.

Ribeiro and Rosseti [125, 126, 127] proposed pC/RS/MPSS parallelization strategies for a GRASP procedure for the two-path network design problem. This problem appears often in telecommunication network design. It aims to identify a minimum-cost design that includes a path with at most two edges for each origin-destination pair in the network. The authors made use of the same pC/RS/MPSS parallelization strategy that was applied to the quadratic assignment, Steiner, and other problems (e.g., Sections 19.5.1 and 19.8.4). The authors compared several parallel versions of the sequential GRASP on a set of small-sized problem instances. They concluded that the version integrating a two-way path-relinking improvement phase performed best. The results also showed that versions including the path-relinking improvement phase outperformed a "pure" GRASP.
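The coarse-grained island pattern with elite broadcast used by Flores, Cegla, and Caceres can be summarized by the following sketch. It is a deliberately simplified, single-process simulation written for this chapter's discussion: each island would in reality evolve on its own processor, the OneMax fitness and all parameter values are our own choices, and the genetic operators are generic rather than those of [58].

import random

GENS, ISLANDS, POP, LENGTH = 30, 4, 20, 40

def fitness(ind):                              # toy objective: maximize the number of ones
    return sum(ind)

def evolve(pop):                               # one generation of a very plain GA
    new = []
    for _ in range(len(pop)):
        a, b = random.sample(pop, 2)
        cut = random.randrange(1, LENGTH)      # one-point crossover
        child = a[:cut] + b[cut:]
        i = random.randrange(LENGTH)           # single-bit mutation
        child[i] ^= 1
        new.append(child)
    return sorted(pop + new, key=fitness, reverse=True)[:len(pop)]   # elitist replacement

islands = [[[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
           for _ in range(ISLANDS)]

for g in range(GENS):
    islands = [evolve(pop) for pop in islands]           # independent evolution on each island
    elites = [max(pop, key=fitness) for pop in islands]  # elite migration: broadcast each best
    for i, pop in enumerate(islands):
        for j, elite in enumerate(elites):
            if i != j:
                pop.sort(key=fitness)                    # incoming elite replaces the worst
                pop[0] = list(elite)

print("best fitness:", max(fitness(ind) for pop in islands for ind in pop))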
19.10.3 Mobile Network Design
A core problem in the design of wireless networks is the selection of the best set of radio transmitter locations to maximize the covered area at minimum cost (or with a minimum number of transmitters). The problem may be formulated as a set covering problem. Calegari et al. [20] proposed a cooperative PGA based on the island framework with small subpopulation sizes. Numerical results on data from a real-life problem showed that the parallel algorithm performed well, in terms of computational time, in comparison with the sequential GA (the speedup was almost linear and the efficiency was high). Tongcheng and Chundi [152] proposed a cooperative coarse-grained parallel GA for the same problem and implemented it on three different topologies: ring, bioriented ring, and torus. Each thread ran a GA enhanced with a local search applied to 20% of the population after each generation. Migration involved neighboring nodes, which exchanged their best chromosomes: each process sent its best chromosome to all its neighboring nodes, where it replaced the worst individual. The proposed strategy performed well on the torus topology in terms of solution quality and number of generations needed to reach a good solution. The authors performed a limited
comparison between their algorithm implemented with 16 subpopulations of size 10 and the method proposed by Calegari et al. [20] applied to 40 subpopulations of size 4. The results were close, with a slight advantage to the latter method. The authors concluded that their approach performed well even when the number of processors and individuals was small.

Towhidul Islam, Thulasiraman, and Thulasiram [159] presented a parallel ant colony algorithm for the all-pair routing problem in a wireless network with mobile nodes (MANET). In a MANET, nodes are mobile and can instantaneously and dynamically form a network as needed. The topology of the nodes can thus change substantially during a short period of time. This makes the problem of determining the best route for data between all pairs of nodes very difficult. The most significant applications of this type of network appear in the military domain and in major disaster situations. To our knowledge, this is the first time a parallel metaheuristic was applied to the all-pair routing problem (the recent study of Gunes, Sorges, and Bouazizi [72] does not address the all-pair routing problem and does not use parallel computing). The parallel ACS proposed by Towhidul Islam, Thulasiraman, and Thulasiram applied a domain-decomposition strategy. The network was decomposed into subgraphs. Each ant was assigned to "its" processor and received a subgraph on which to search for the shortest paths between all pairs of nodes. The ants communicated during their searches. Unfortunately, no details were given on how this communication was realized. It was reported, however, that it accounted for between 93% and 97% of the total run time. A speedup of 7 could nevertheless be reached when 10 computers were used.
19.11 THE TRAVELING SALESMAN PROBLEM

A well-known, even outside scientific circles, and fundamental problem in graph theory and combinatorial optimization, the Traveling Salesman Problem (TSP) may be summarized as follows. Given a set of points and distances between all pairs of points, find a minimum-length Hamiltonian tour through all points. The TSP appears prominently in many applications: transportation, distribution, logistics, telecommunications, production planning, and so on. The TSP is NP-Hard. In recent years, however, advances in mixed-integer programming have resulted in codes that may address very large problem instances. Yet, it is still interesting to survey the parallel metaheuristic field, since the TSP has offered a rich environment for developments, many of which were later applied to other problems. Moreover, it often appears as a subproblem in many applications where metaheuristics are the solution method of choice. The next subsections examine parallel metaheuristics for the TSP according to the basic methodology used.

19.11.1 Parallel Simulated Annealing

Felten, Karlin, and Otto [55] proposed a 1C/KS/MPSS strategy based on the domain decomposition idea and experimented with 64 cities using up to 64 processors of a
hypercube computer. An initial tour was randomly generated and partitioned into p subsets of adjacent cities. Each subset was then assigned to a processor, which performed local swaps on adjacent cities for a number of iterations. This was followed by a synchronization phase where cities were rotated among processors. Parallel moves did not interact due to the spatial decomposition of the decision variables. Moreover, each synchronization ensured the integrity of the global state. Hence, there was no error and almost linear speedups were observed. Similar developments were proposed by Allwright and Carpenter [2], but communication overhead impaired performance. Jeong and Kim [76, 77] improved the performance by working on the implementation to reduce this overhead. Numerical results showed their method to be 10 times faster than the previous one. A different approach, based on parallelizing the speculative computation method of Witte, Chamberlain, and Franklin [164, 165], was presented by Nabhan and Zomaya [109]. The authors managed to reduce the overhead by communicating only the moves actually performed rather than whole solutions. The method produced modest results, however.

Bevilacqua [16] proposed an adaptive strategy to parallelize SA methods and used the TSP as a benchmark. According to the status of the search, the adaptive strategy executed either k iterations of the sequential SA or 2k iterations of an independent multisearch SA, with k a parameter to be calibrated. Computational results indicated that the proposed method may yield high quality solutions. For the instances tested, deviations on the order of 1% to 2.5% from the optimal solution were observed.

Sanvicente-Sanchez and Frausto-Solis [129] proposed a pC/C/MPSS strategy that decomposed the sequential SA along temperature lines. An SA process was launched at each temperature. Once its iterations were finished, an SA process broadcast its best solution to all processes working at lower temperatures. On receiving a solution, a process verified whether it was better than its own best; if so, it restarted its search from the imported solution, and otherwise it continued. The algorithm was tested on a network of workstations with few machines and on a few test problem instances. Computation times were reduced, for the same solution quality, compared to the sequential version. It would be surprising if such an approach could scale to a larger number of processors.

Diekmann, Luling, and Simon [48] proposed a synchronous pC/KS/MPSS cooperative multisearch. The parallel method performed as well as the sequential SA in terms of solution quality, while achieving a speedup of 85 on 121 processors of a parallel (OCCAM-2) machine with 320 transputers. The authors noticed that the solution quality was independent of the number of processors.

Ram, Sreenivas, and Subramaniam [119] proposed two variants of a pC/RS/MPDS synchronous multisearch strategy and applied them to the TSP and the Job-Shop problem. The first parallel approach was a direct implementation of the strategy according to a master-slave model. Slaves ran different SA threads and synchronized at predefined moments to exchange information. The second parallel approach consisted in executing a sequential GA first and then starting the previous PSA, each thread starting from a different individual (solution) of the last generation of the GA. Experimental tests were conducted on a limited number of processors and the two
parallel procedures performed well compared to the sequential SA, in terms of both solution quality and computation time. Using the GA to initiate the parallel search seemed to be beneficial. Miki et al. [99] presented a different SA-GA combination within a parallel scheme. The authors proposed a pC/RS independent multisearch strategy, where each thread corresponded to an SA with an adaptive temperature schedule controlled by a GA. Computational results showed good speedups (from 20 to 26.6 on 32 processors).
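Independent multisearch remains the simplest of the strategies reviewed above: several sequential SA searches are run with different seeds and the overall best solution is kept. The following sketch illustrates this pC/RS pattern on a toy one-dimensional objective; the objective, the cooling parameters, and the process-pool implementation are our own illustrative choices and do not reproduce any of the surveyed implementations.

import math
import random
from multiprocessing import Pool

def sa_run(seed, iters=20000, t0=10.0, alpha=0.9995):
    # One sequential SA thread on the toy objective f(x) = (x - 3)^2.
    rng = random.Random(seed)
    x = rng.uniform(-50, 50)
    best = x
    t = t0
    for _ in range(iters):
        cand = x + rng.uniform(-1, 1)                      # neighbor move
        delta = (cand - 3) ** 2 - (x - 3) ** 2
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = cand
            if (x - 3) ** 2 < (best - 3) ** 2:
                best = x
        t *= alpha                                         # geometric cooling schedule
    return best

if __name__ == "__main__":
    with Pool(4) as pool:                                  # four independent SA searches
        results = pool.map(sa_run, range(4))               # one seed per process
    best = min(results, key=lambda x: (x - 3) ** 2)
    print("best x found:", best)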
19.11.2 Parallel Genetic Algorithms

19.11.2.1 Fine-Grained Methods. Mühlenbein, Gorges-Schleuter, and Krämer [106, 107] proposed a fine-grained parallel GA for the TSP. The individuals of the population were placed on a planar grid, one per processor. Consequently, each individual had 4 neighbors, and neighborhoods overlapped, allowing diffusion of information (a schematic update rule of this kind is sketched at the end of Section 19.11.2.2). This work was part of the effort to study the introduction of individual enhancement procedures into the GA framework (a fast version of the 2-opt local search procedure in this case).

Knight and Wainwright [80] and Mutalik et al. [108] proposed a pC/KS/MPSS coarse-grained cooperative GA, called HYPERGEN, where an initial population was evenly distributed among the processors of a hypercube. Migration occurred at predetermined moments. The proposed algorithm was tested on a few (three) TSP test problems but with different parameter settings: population size, migration rate, and migration interval. The authors observed that, on their test problems, implementations with large subpopulations needed less interaction (migration exchanges) than those with smaller subpopulations for equivalent solution quality. The achieved solution quality was comparable to the best known solutions at the time.

Logar, Corwin, and English [92] developed a massively parallel GA for the TSP on a MasPar MP-1 parallel computer. The 2048 processors of the SIMD machine were arranged in a toroidal grid. Each processor had 16 KB of memory and 8 neighboring processors. The authors used all 2048 processors to create an equal number of islands with 10 individuals each. They compared different migration strategies for their pC/MPSS method. The variant that triggered migration at each generation offered the best performance, with a not-so-impressive speedup of 145.

Chen, Flann, and Watson [26] proposed a massively parallel GA that included an SA method to determine the surviving offspring. The method is similar to a cellular GA, except that mating is not limited to neighboring individuals. According to the authors, one of the most striking observations was the inversely proportional relationship between the number of processors and the time needed to reach a near-optimal solution.

Kohlmorgen, Schmeck, and Haase [81] focused on the capability of different processor topologies to prevent demes of fine-grained parallel GAs from being dominated by the genotype of strong individuals. The population was distributed, one individual per processor, on the processors of a massively parallel computer with 16K processing elements. Subpopulations of size 1, 4, 16, 64, 256, and 1024 were defined for the purpose of the study. The authors noticed that solution quality
increased with the number of subpopulations. They also reported good speedups but slow convergence.

19.11.2.2 Coarse-Grained Methods. Sena, Megherbi, and Isern [132] discussed a parallel implementation of a GA that combined strict synchronization and population decomposition ideas. A master-slave platform was used to realize this implementation. Slaves ran the same GA on subpopulations that were randomly generated initially. After every generation, slaves sent their best solutions to the master. The master synchronized and controlled the operations. It stopped the search when the best solution reached a certain threshold. The master could also stop the slaves or let them continue evolving their populations, depending on the quality of the solutions received. Experimental results pointed to relations between the population size and the number of slave GA processes. Indeed, an optimum combination of these two parameters seemed to exist for best performance. The authors also noticed that, as the size of the population increased, the performance of the proposed PGA improved with an increase in the number of slaves. Performance was poor for small population sizes due to communication overhead.

Baraglia, Hidalgo, and Perego [8] started from a GA enhanced with a Lin-Kernighan heuristic for "educating" new individuals and proposed a coarse-grained parallelization. The population was partitioned into a few subpopulations, each of which evolved in parallel on a different processor. An elitist migration occurred at predefined periods between all subpopulations. According to the authors, the proposed algorithm performed well on a number of TSP benchmarks. It required fewer iterations to reach the optimal solution than the sequential version.

Katayama, Hirabayashi, and Narihisa [78, 79] studied the impact of crossover and selection operators on island-based parallel GAs. The pC/KS/MPSS strategy had the initial population partitioned into subpopulations, and the same local search (2-opt) enhanced GA was run starting from each. Cooperation was performed by elite migration at fixed intervals defined as a number of iterations. Numerical results (first paper) showed that the proposed PGA was robust relative to the three crossover operators MPX, ERX, and CSEX, but that the PGA using CSEX dominated the others. The results (second paper) also showed that the performance of the parallel approach was more sensitive to the choice of crossover operator than to that of the selection operator.
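The fine-grained model of Section 19.11.2.1 reduces, in its simplest form, to a local update rule applied at every cell of the grid. The sketch below simulates synchronous generations of such a cellular GA on a toroidal grid with 4-neighborhoods in a single process (one individual would sit on each processor in a true fine-grained implementation); the OneMax objective, grid size, and operators are illustrative assumptions rather than a reconstruction of any surveyed code.

import random

SIDE, LENGTH = 8, 32                                       # 8 x 8 toroidal grid

def fitness(ind):
    return sum(ind)                                        # toy objective (OneMax)

grid = [[[random.randint(0, 1) for _ in range(LENGTH)]
         for _ in range(SIDE)] for _ in range(SIDE)]

def neighbors(r, c):
    # North, south, east, and west neighbors on the torus.
    return [grid[(r - 1) % SIDE][c], grid[(r + 1) % SIDE][c],
            grid[r][(c - 1) % SIDE], grid[r][(c + 1) % SIDE]]

def step():
    new_grid = [[None] * SIDE for _ in range(SIDE)]
    for r in range(SIDE):
        for c in range(SIDE):
            me = grid[r][c]
            mate = max(neighbors(r, c), key=fitness)       # mate with the best of the 4 neighbors
            cut = random.randrange(1, LENGTH)
            child = me[:cut] + mate[cut:]                  # one-point crossover
            i = random.randrange(LENGTH)
            child[i] ^= 1                                  # single-bit mutation
            new_grid[r][c] = child if fitness(child) >= fitness(me) else me   # local replacement
    return new_grid

for _ in range(50):
    grid = step()
print("best fitness:", max(fitness(ind) for row in grid for ind in row))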
19.11.3 Tabu Search

Malek et al. [93] proposed a pC/KS/MPDS cooperative parallel strategy where the individual search threads were tightly synchronized. The authors implemented a master-slave model. Their method proceeded with one master that controlled the cooperation and four slaves that ran serial TS algorithms with different tabu conditions and parameters. Slaves were stopped after a specified time interval, solutions were compared, bad areas of the solution space were eliminated, and the searches were restarted with a good solution and an empty tabu list. Note that long-term diversification memories were disabled in order to strictly implement this strategy. This
implementation was part of a comparative study of serial and parallel SA and TS algorithms for the TSP. The authors reported that the parallel TS implementation outperformed the serial one and consistently produced comparable or better results than sequential or parallel SA.

Chakrapani and Skorin-Kapov [23] proposed the same general 1C/RS/SPSS strategy used for the QAP (Chakrapani and Skorin-Kapov [22, 24, 25]; Section 19.8.1), based on distributing the neighborhood evaluation. The Connection Machine was again used for experimentation. The authors reported that, for the same solution quality as the sequential version, near-linear speedups were achieved using a relatively small number of processors.

De Falco et al. [47] also applied to the TSP their pC/KS cooperative approach used for the QAP. Recall that the authors implemented a multithread strategy, where each process performed a local search from its best solution. When done, processes synchronized to exchange best solutions with processes running on neighboring processors. Local best solutions were replaced with imported ones only if the latter were better. The authors indicated that better solutions were obtained when cooperation was included, compared to an independent thread strategy. Superlinear speedups were reported.

Fiechter [56] proposed a pC/KS/MPSS cooperative method for the TSP that included an intensification phase during which each process optimized a specific slice of the tour. At the end of the intensification phase, processes synchronized to recombine the tour and modify the partition (shifting part of the tour to a predetermined neighboring process). To diversify, each process determined from among its subset of cities a candidate list of the most promising moves. The processes then synchronized to exchange these lists, so that all built the same final candidate list and applied the same moves. A master-slave model was used for the implementation. Fiechter reported near-optimal solutions to large problems (500, 3000, and 10,000 vertices) and almost linear speedups (less so for the 10,000-vertex problems).

19.11.4 Parallel Ant Colony Systems

Stutzle [140] implemented an independent multisearch pC/RS/MPSS approach, where each thread was an enhanced ACS (Stutzle and Hoos [141]). For a similar total run time, i.e., the parallel method ran for the time of the sequential method divided by the number of processors, this approach achieved high performance on a number of TSP instances.

Bullnheimer, Kotsis, and Strauß [19] presented two parallel ACSs for the TSP: a synchronous and a partially asynchronous algorithm. The 1C/RS parallel synchronous method was implemented on a master-slave platform, where each slave held one ant. After each iteration, each slave sent its tour and the trace trail to the master. After updating the trails, the master sent the new information back to the slaves, which restarted the search. In order to reduce the communication overhead, a 1C/KS partially asynchronous strategy was also proposed. In this approach, a certain number of ants were assigned to each slave, which performed their searches independently of
other slaves. The master triggered the trail updating through a global synchronization of all slaves at regular intervals (a given number of iterations).

In 2000, Middendorf, Reischle, and Schmeck [98] discussed information exchange strategies for multicolony ant algorithms. They focused on the impact of information exchange on running time and solution quality. Colonies were of different sizes, and each was assigned to one processor; the processors were arranged in a ring. Synchronization took place after several generations, and the best ants were exchanged. This is a very simple cooperation mechanism and, not surprisingly, the authors reported that the solution quality was better when the colonies did not exchange too much information.

Randall and Lewis [120] examined several parallelization strategies, including multicolony cooperation: Parallel Independent Ant Colonies (pC/RS/MPDS), Parallel Interacting Ant Colonies (pC/KS/MPDS), Parallel Ants (1C/KS), Parallel Evaluation of Solution Elements (1C/RS), and Parallel Combination of Ants and Evaluation of Solution Elements (1C/KS). A simple synchronization with broadcasting was used as the cooperation mechanism in the second strategy. The authors selected Parallel Ants for the experiments dedicated to the TSP and implemented this approach according to a master-slave model. Each slave launched an ant search separately from the other ants. The master received information from the slaves (pheromone, solutions), updated the pheromones, and restarted the search for each ant. The computational tests revealed that the performance, in terms of speedup and efficiency, was acceptable for larger problems (number of cities > 200). However, one of the disadvantages of the approach is the large amount of communication required for maintaining and updating the pheromone matrix.
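The Parallel Ants pattern selected in [120] boils down to a master that owns the pheromone matrix and slaves that each build one tour per iteration. The sketch below follows that division of work on a small random instance; the instance, the parameter values, and the use of a Python process pool are our own illustrative choices and not a reconstruction of the authors' code.

import random
from concurrent.futures import ProcessPoolExecutor

N, ANTS, ITERS, RHO, Q = 12, 6, 30, 0.5, 100.0

def build_tour(args):
    # Slave: construct one tour from the current pheromone matrix.
    pher, dist, seed = args
    rng = random.Random(seed)
    tour, unvisited = [0], set(range(1, N))
    while unvisited:
        i = tour[-1]
        cands = list(unvisited)
        weights = [pher[i][j] / dist[i][j] for j in cands]       # pheromone times visibility
        tour.append(rng.choices(cands, weights=weights, k=1)[0])
        unvisited.remove(tour[-1])
    length = sum(dist[tour[k]][tour[(k + 1) % N]] for k in range(N))
    return tour, length

if __name__ == "__main__":
    rng = random.Random(1)
    dist = [[rng.uniform(1, 10) if i != j else 1.0 for j in range(N)] for i in range(N)]
    pher = [[1.0] * N for _ in range(N)]
    best_len = float("inf")
    with ProcessPoolExecutor(max_workers=ANTS) as pool:
        for it in range(ITERS):
            args = [(pher, dist, it * ANTS + a) for a in range(ANTS)]
            tours = list(pool.map(build_tour, args))                # slaves build tours in parallel
            pher = [[(1 - RHO) * p for p in row] for row in pher]   # master: evaporation
            for tour, length in tours:                              # master: pheromone deposit
                for k in range(N):
                    i, j = tour[k], tour[(k + 1) % N]
                    pher[i][j] += Q / length
                    pher[j][i] += Q / length
                best_len = min(best_len, length)
    print("best tour length:", best_len)

The communication cost the authors point to is visible in the sketch: the pheromone matrix and the tours travel between master and slaves at every iteration.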
19.12 VEHICLE ROUTING PROBLEMS

The vehicle routing problem (VRP) is one of the central problems in operations research and combinatorial optimization, with numerous applications in transportation, telecommunications, production planning, etc. The VRP may be briefly described as follows. Given one or more depots, a fleet of vehicles, homogeneous or not, and a set of customers with known or forecast demands, find a set of closed routes, originating and ending at one of the depots, to service all customers at minimum cost, while satisfying vehicle and depot capacity constraints. Other constraints may be added to this core problem, e.g., time restrictions, yielding a rich set of problem variants. Most VRP variants are NP-hard, and exact solution methods address limited-size problem instances only.
19.12.1 Parallel Metaheuristics for the VRP

Rego and Roucairol [121] proposed a TS approach for the VRP based on ejection chains, along with an independent multithread parallel version in which each thread used a different set of parameter settings but started from the same solution. The method was implemented in a master-slave setting, where each slave executed a complete sequential TS. The master gathered the solutions found by the threads, selected the
overall best, and reinitialized the threads for a new search. Low-level parallelism was used to accelerate the move evaluations of the individual searches, as well as in a post-optimization phase. Experiments showed the method to be competitive on a set of standard VRP instances (Christofides, Mingozzi, and Toth [27]).

Ochi et al. [110] (see also Drummond, Ochi, and Vianna [52, 53]) proposed a pC/C/MPSS coarse-grained PGA based on the island model for the vehicle routing problem with heterogeneous fleet. A petal decomposition procedure was used to build the initial population. The population was then divided into several disjoint subpopulations. Each GA thread evolved a subpopulation and triggered migration when subpopulation renewal was necessary. An island in this case would broadcast its need and receive the best individual of every other island. The incoming individuals would replace the worst individuals of the receiving population. Computational tests showed encouraging results in terms of solution quality and computing effort.

Alba and Dorronsoro [1] addressed the VRP in which routes are limited by a predefined travel time and proposed a fine-grained, cellular PGA. The population was arranged in a two-dimensional toroidal grid, each individual having 4 neighbors. Binary tournament selection was applied when selecting the mate for the first parent. Crossover was applied to these parents, then mutation and local search to the offspring. Two local search procedures were tested, 2-opt and 2-opt + λ-Interchange, with λ ∈ {1, 2}. Elitist replacement was used. The authors compared their algorithm to classical heuristics, the TS of Rochat and Taillard [128], the GAs of Prins and Taillard [117] and Berger and Barkaoui [15], and the ant algorithms of Bullnheimer, Hartl, and Strauß [18] and Reimann, Doerner, and Hartl [122]. Computational results on benchmark problem instances showed high solution quality for both local search versions. The best performance (solution quality and rapidity) was observed for 2-opt + 1-Interchange.

19.12.2 Vehicle Routing with Time Constraints

Also known as the Vehicle Routing Problem with Time Windows (VRPTW), this problem specifies that service at customer sites must take place within given time intervals. Most time constraints specify that service cannot begin before a certain moment (but vehicles may wait "outside", in most cases) and must be over by a given deadline. In soft-constrained versions, the time limits may be violated at a penalty.

Czech and Czarnas [46] proposed a pC/KS/MPSS cooperative multithread PSA implemented on a master-slave platform. The master sent the initial solution to the slaves. It was also in charge of controlling the temperature schedule of the annealing procedure, collecting the best local solution from each slave after n² iterations at each temperature level (n being the number of customers), and updating the global best solution. Each slave ran an SA algorithm with the same parameters. Each slave j cooperated with slaves j - 1 and j + 1 (slave 1 cooperated with slave 2 only) by exchanging best solutions. Cooperation was triggered every n iterations. Computational tests with few (five) processes showed good performance, in terms of solution quality, compared to the best-known solutions of the Solomon benchmarks.
Berger and Barkaoui [14] presented a low-level parallel hybrid GA that used two populations. The first one aimed to minimize the total traveled distance, while the second aimed to minimize the violation of the time window constraints. A different fitness function was associated with each population. A master-slave platform was applied, where the master controlled the execution of the algorithm and coordinated the genetic operations. The slaves concurrently executed the reproduction and mutation operators. Computational tests were conducted on a cluster of heterogeneous machines (19 computers). The authors compared their algorithm to the best-known methods in the literature for Solomon's benchmark. Their results showed that the proposed technique was competitive.

Taillard [143] proposed a pC/KS/MPSS parallel TS based on domain decomposition. The domain was partitioned and vehicles were allocated to the resulting regions. Once the initial partition was performed, each subproblem was solved by an independent TS. All processors stopped after a number of iterations that varied according to the total number of iterations already performed. The partition was then modified by an information exchange phase, during which tours, undelivered cities, and empty vehicles were exchanged between adjacent processors (corresponding to neighboring regions). This approach successfully addressed a number of problem instances. The synchronization inherent in the design of the strategy hindered its performance, however.

Rochat and Taillard [128] proposed what may be considered the first fully developed adaptive memory-based approach for the VRPTW. The adaptive memory contained tours of good solutions identified by the TS threads. The tours were ranked according to attribute values, including the objective values of their respective solutions. Each TS process then probabilistically selected tours in the memory, constructed an initial solution, improved it, and returned the corresponding tours to the adaptive memory (a small sketch of this construction step is given below). Despite the fact that it used a rather simple TS, this method produced many new best results at publication time. Taillard et al. [145] and Badeau et al. [5] refined this method by enriching the neighborhood and the intensification phase and by adding a post-optimization procedure. The authors reported 14 new best solutions for the standard Solomon data set [138].

Gehring and Homberger [64] (see also Homberger and Gehring [75]) proposed a pC/C/MPSS cooperative parallel strategy where concurrent searches were performed with differently configured two-phase metaheuristics. The first phase tried to minimize the number of vehicles by using an evolutionary metaheuristic, while the second phase aimed to minimize the total traveled distance by means of a TS. The parallel metaheuristic was initiated on different threads with different starting points and different values for the search time available for the first and second search phases. Threads cooperated by exchanging solutions asynchronously through a master process. To date, this approach has produced, on average, the best solutions for the Solomon problems with respect to the number of vehicles and the total distance. Results were also presented for larger instances, generated similarly to the original Solomon problems but varying in size from 200 to 1000 customers. It is worth mentioning, however, that this method is rather time-consuming compared to other metaheuristics, TS in particular.
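The adaptive memory mechanism behind these methods can be pictured as a pool of routes taken from good solutions, from which each thread probabilistically assembles a starting solution. The sketch below shows only this construction step; the rank-biased sampling rule, the data layout, and the toy data are our own illustrative assumptions and not the actual procedure of [128].

import random

def construct_from_memory(memory, customers, bias=2.0, rng=random):
    # memory   : list of (value, route) pairs, one route = tuple of customers;
    #            a lower value means the route came from a better solution.
    # customers: set of all customers that must be covered.
    ranked = sorted(memory, key=lambda entry: entry[0])
    solution, covered = [], set()
    while ranked:
        n = len(ranked)
        weights = [(n - r) ** bias for r in range(n)]          # rank-biased sampling
        idx = rng.choices(range(n), weights=weights, k=1)[0]
        value, route = ranked.pop(idx)
        if covered.isdisjoint(route):                          # keep only non-overlapping routes
            solution.append(route)
            covered |= set(route)
    leftovers = customers - covered                            # left for the improvement phase
    return solution, leftovers

# toy usage: six routes remembered from earlier search threads
memory = [(10, (1, 2, 3)), (12, (4, 5)), (11, (2, 6)), (15, (7, 8)),
          (9, (3, 4)), (14, (6, 9))]
solution, leftovers = construct_from_memory(memory, set(range(1, 10)))
print(solution, leftovers)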
Le Bouthillier and Crainic [85] proposed a central memory pC/C/MPDS parallel metaheuristic in which several TS and GA threads cooperate. In this model, the central memory constituted the population common to all genetic threads. Each GA had its own parent selection and crossover operators. The offspring were returned to the pool to be enhanced by two TS procedures. The central memory followed the same rules as in the work of Crainic and Gendreau [35]. Experimental results show that, without any particular calibration, the parallel metaheuristic obtained solutions whose quality is comparable to that of the best metaheuristics available, with almost linear speedups.
19.12.3 Dynamic Problems
Gendreau, Laporte, and Semet [66] addressed the deployment problem for a fleet of emergency vehicles and proposed a parallel TS based on domain decomposition. A master-slave implementation was used, where each slave addressed a subproblem associated with a vehicle. Computational tests showed high solution quality, as indicated by territory coverage measures.

Attanasio et al. [3] addressed the multivehicle dial-a-ride problem and proposed two parallel strategies based on a multithread tabu search: a pC/C/SPDS and a pC/C/MPSS strategy. In the pC/C/SPDS approach, each processor ran a different tabu search strategy from the same initial solution. Once a processor found a new best solution, it broadcast it, and the searches were then reinitialized. Every K iterations, a diversification procedure was applied to the first half of the processors, while an intensification was run on the remaining ones. The pC/C/MPSS strategy consisted in running various tabu search algorithms from different starting points. Each processor ran the same tabu search algorithm with the best known parameter settings. Moreover, every 77 iterations, processors exchanged information in order to perform a diversification procedure. According to the computational results, both the pC/C/SPDS and pC/C/MPSS strategies outperformed the sequential tabu search of Cordeau and Laporte [32].

Gendreau et al. [65] proposed a cooperative multithread parallel TS method for real-time routing and vehicle dispatching problems. The authors followed an adaptive memory approach. In an interesting development, the authors also exploited parallelism within each search thread by decomposing the set of routes along the same principles proposed in Taillard's work [143]. Very good results were obtained.

19.13 SUMMARY
We have presented a survey of parallel metaheuristic methods applied to a rather broad set of problems: graph coloring and partitioning, Steiner tree problems, set covering and set partitioning, satisfiability and MAX-SAT problems, quadratic assignment, location and network design, traveling salesman, and vehicle routing problems. This survey is certainly not comprehensive. Important topics could not be covered and not all published contributions in the topics covered could be surveyed. The scope
of the chapter is sufficiently broad, however, to allow us to draw some conclusions and share a number of thoughts on the subject of parallel metaheuristic applications.

The survey illustrates the richness of contributions to the development of parallel metaheuristics, as well as that of their applications to many important problems in science and practice. It also illustrates the fact that, this richness notwithstanding, one finds only a somewhat limited number of fundamental principles regarding how to design parallel metaheuristic procedures. We summarized these principles in the taxonomy section presented at the beginning of the chapter. To sum up, it appears that asynchronous cooperation enhances the performance of parallel metaheuristics independently of the methodology used in the initial sequential method. This conclusion is strongly supported by the results obtained by multithread cooperative strategies.

The survey also illustrates that not all application fields have been studied with comparable fervor. Indeed, many important topics have seen only a few contributions. Even for topics with a larger number of contributions, these are not evenly distributed among metaheuristic classes. Without trying to completely explain the phenomenon, one may observe correlations between the methodologies selected and the scientific field of most of the researchers that have addressed them. Interesting research avenues and promising developments may thus go unexplored, and appropriate tools may be missing in some areas. It should be a challenge for the profession to explore as many problem types as comprehensively as possible. While taking up this challenge, one should make sure that methods are compared across methodological approaches and that such comparisons are performed fairly, that is, with all algorithmic developments at the same level of sophistication.

To conclude, parallel metaheuristics offer versatile and powerful tools to address large and complex problems. Many fascinating research avenues are open. Some address issues related to the design of parallel metaheuristics. Others concern the application of these designs to specific problems and the selection of the most appropriate one. We hope that this chapter has helped illustrate these opportunities and challenges.
Acknowledgments

Funding for this project has been provided by the Natural Sciences and Engineering Research Council of Canada and by the Fonds FQRNT of the Province of Québec.
REFERENCES
1. Alba, E. and Dorronsoro, B. Solving the Vehicle Routing Problem by Using Cellular Genetic Algorithms. In EvoCOP, pages 11-20, 2004.
2. Allwright, J.R.A. and Carpenter, D.B. A Distributed Implementation of Simulated Annealing for the Travelling Salesman Problem. Parallel Computing, 10:335-338, 1989.
3. Attanasio, A., Cordeau, J.-F., Ghiani, G., and Laporte, G. Parallel Tabu Search Heuristics for the Dynamic Multivehicle Dial-a-Ride Problem. Parallel Computing, 30:377-387, 2004.

4. Azencott, R. Simulated Annealing Parallelization Techniques. John Wiley & Sons, New York, NY, 1992.
5. Badeau, P., Guertin, F., Gendreau, M., Potvin, J.-Y., and Taillard, E.D. A Parallel Tabu Search Heuristic for the Vehicle Routing Problem with Time Windows. Transportation Research C: Emerging Technologies, 5(2):109-122, 1997.

6. Banos, R., Gil, C., Ortega, J., and Montoya, F.G. A Parallel Multilevel Metaheuristic for Graph Partitioning. Journal of Heuristics, 10(4):315-336, 2004.

7. Banos, R., Gil, C., Ortega, J., and Montoya, F.G. Parallel Heuristic Search in Multilevel Graph Partitioning. In Proceedings of the 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing, pages 88-95, 2004.

8. Baraglia, R., Hidalgo, J.I., and Perego, R. A Parallel Hybrid Heuristic for the TSP. In Boers, E.J.W., Cagnoni, S., Gottlieb, J., Hart, E., Lanzi, P.L., Gunther, R., Smith, R., and Tijink, H., editors, Applications of Evolutionary Computing. Proceedings of EvoCOP, EvoFlight, EvoIASP, EvoLearn, and EvoSTIM, volume 2037 of Lecture Notes in Computer Science, pages 193-202. Springer-Verlag, Heidelberg, 2001.
9. Barr, R.S. and Hickman, B.L. Reporting Computational Experiments with Parallel Algorithms: Issues, Measures, and Experts' Opinions. ORSA Journal on Computing, 5(1):2-18, 1993.

10. Bastos, M.P. and Ribeiro, C.C. Reactive Tabu Search with Path-Relinking for the Steiner Problem in Graphs. In S. Voß, S. Martello, C. Roucairol, and Osman, I.H., editors, Meta-Heuristics 98: Theory & Applications, pages 31-36. Kluwer Academic Publishers, Norwell, MA, 1999.

11. Battiti, R. and Tecchiolli, G. Parallel Biased Search for Combinatorial Optimization: Genetic Algorithms and TABU. Microprocessors and Microsystems, 16(7):351-367, 1992.
12. Battiti, R. and Tecchiolli, G. The Reactive Tabu Search. ORSA Journal on Computing, 6(2):126-140, 1994.

13. Beasley, J.E. Randomized Heuristic Schemes for the Set Covering Problem. Naval Research Logistics, 37:151-164, 1990.

14. Berger, J. and Barkaoui, M. A Parallel Hybrid Genetic Algorithm for the Vehicle Routing Problem with Time Windows. Computers & Operations Research, 31(12):2037-2053, 2004.
15. Berger, J. and Barkaoui, M. A Hybrid Genetic Algorithm for the Capacitated Vehicle Routing Problem. In E. Cantú-Paz, editor, GECCO 2003, pages 646-656. Springer-Verlag, 2003.

16. Bevilacqua, A. A Methodological Approach to Parallel Simulated Annealing. Journal of Parallel and Distributed Computing, 62:1548-1570, 2002.

17. Boissin, N. and Lutton, J.L. A Parallel Simulated Annealing Algorithm. Parallel Computing, 19(8):859-872, 1993.

18. Bullnheimer, B., Hartl, R., and Strauß, C. An Improved Ant System Algorithm for the Vehicle Routing Problem. Annals of Operations Research, 89:319-328, 1999.

19. Bullnheimer, B., Kotsis, G., and Strauß, C. Parallelization Strategies for the Ant System. Applied Optimization, 24:87-100, 1998.

20. Calegari, P., Guidec, F., Kuonen, P., and Kuonen, D. Parallel Island-Based Genetic Algorithm for Radio Network Design. Journal of Parallel and Distributed Computing, 47(1):86-90, 1997.

21. Cantú-Paz, E. A Survey of Parallel Genetic Algorithms. Calculateurs Parallèles, Réseaux et Systèmes Répartis, 10(2):141-170, 1998.

22. Chakrapani, J. and Skorin-Kapov, J. A Connectionist Approach to the Quadratic Assignment Problem. Computers & Operations Research, 19(3/4):287-295, 1992.

23. Chakrapani, J. and Skorin-Kapov, J. Connection Machine Implementation of a Tabu Search Algorithm for the Traveling Salesman Problem. Journal of Computing and Information Technology, 1(1):29-36, 1993.

24. Chakrapani, J. and Skorin-Kapov, J. Massively Parallel Tabu Search for the Quadratic Assignment Problem. Annals of Operations Research, 41:327-341, 1993.

25. Chakrapani, J. and Skorin-Kapov, J. Mapping Tasks to Processors to Minimize Communication Time in a Multiprocessor System. In The Impact of Emerging Technologies of Computer Science and Operations Research, pages 45-64. Kluwer Academic Publishers, Norwell, MA, 1995.

26. Chen, H., Flann, N.S., and Watson, D.W. Parallel Genetic Simulated Annealing: A Massively Parallel SIMD Algorithm. IEEE Transactions on Parallel and Distributed Systems, 9(2):126-136, 1998.

27. Christofides, N., Mingozzi, A., and Toth, P. The Vehicle Routing Problem. In N. Christofides, A. Mingozzi, P. Toth, and C. Sandi, editors, Combinatorial Optimization, pages 315-338. John Wiley, New York, 1979.

28. Cohoon, J., Hedge, S., Martin, W., and Richards, D. Punctuated Equilibria: A Parallel Genetic Algorithm. In J.J. Grefenstette, editor, Proceedings of the
Second International Conference on Genetic Algorithms and their Applications, pages 148-154. Lawrence Erlbaum Associates, Hillsdale, NJ, 1987.

29. Cohoon, J., Martin, W., and Richards, D. Genetic Algorithm and Punctuated Equilibria in VLSI. In Schwefel, H.-P. and Manner, R., editors, Parallel Problem Solving from Nature, volume 496 of Lecture Notes in Computer Science, pages 134-144. Springer-Verlag, Berlin, 1991a.

30. Cohoon, J., Martin, W., and Richards, D. A Multipopulation Genetic Algorithm for Solving the k-Partition Problem on Hyper-Cubes. In R.K. Belew and L.B. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 134-144. Morgan Kaufmann, San Mateo, CA, 1991b.

31. Collins, R.J. and Jefferson, D.R. Selection in Massively Parallel Genetic Algorithms. In R.K. Belew and L.B. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 249-256. Morgan Kaufmann, San Mateo, CA, 1991.

32. Cordeau, J.-F. and Laporte, G. A Tabu Search Heuristic for the Static Multivehicle Dial-a-Ride Problem. Transportation Research Part B, pages 579-594, 2003.

33. Crainic, T.G. Parallel Computation, Cooperation, Tabu Search. In C. Rego and B. Alidaee, editors, Adaptive Memory and Evolution: Tabu Search and Scatter Search, pages 283-302. Kluwer Academic Publishers, Norwell, MA, 2005.

34. Crainic, T.G. and Gendreau, M. Towards an Evolutionary Method - Cooperating Multithread Parallel Tabu Search Hybrid. In S. Voß, S. Martello, C. Roucairol, and Osman, I.H., editors, Meta-Heuristics 98: Theory & Applications, pages 331-344. Kluwer Academic Publishers, Norwell, MA, 1999.

35. Crainic, T.G. and Gendreau, M. Cooperative Parallel Tabu Search for Capacitated Network Design. Journal of Heuristics, 8(6):601-627, 2002.

36. Crainic, T.G., Gendreau, M., Hansen, P., and Mladenović, N. Cooperative Parallel Variable Neighborhood Search for the p-Median. Journal of Heuristics, 10(3):293-314, 2004.

37. Crainic, T.G. and Toulouse, M. Parallel Metaheuristics. In T.G. Crainic and G. Laporte, editors, Fleet Management and Logistics, pages 205-251. Kluwer Academic Publishers, Norwell, MA, 1998.

38. Crainic, T.G. and Toulouse, M. Parallel Strategies for Metaheuristics. In F. Glover and G. Kochenberger, editors, Handbook in Metaheuristics, pages 475-513. Kluwer Academic Publishers, Norwell, MA, 2003.

39. Crainic, T.G., Toulouse, M., and Gendreau, M. Parallel Asynchronous Tabu Search for Multicommodity Location-Allocation with Balancing Requirements. Annals of Operations Research, 63:277-299, 1995.
40. Crainic, T.G., Toulouse, M., and Gendreau, M. Synchronous Tabu Search Parallelization Strategies for Multicommodity Location-Allocation with Balancing Requirements. OR Spektrum, 17(2/3):113-123, 1995.

41. Crainic, T.G., Toulouse, M., and Gendreau, M. Towards a Taxonomy of Parallel Tabu Search Algorithms. INFORMS Journal on Computing, 9(1):61-72, 1997.

42. Crainic, T.G., Toulouse, M., and Li, Y. A Simple Cooperative Multilevel Algorithm for the Capacitated Multicommodity Network Design. Computers & Operations Research, 2005.

43. Cung, V.-D., Martins, S.L., Ribeiro, C.C., and Roucairol, C. Strategies for the Parallel Implementations of Metaheuristics. In C.C. Ribeiro and P. Hansen, editors, Essays and Surveys in Metaheuristics, pages 263-308. Kluwer Academic Publishers, Norwell, MA, 2002.

44. Cung, V.-D., Mautor, T., Michelon, P., and Tavares, A. A Scatter Search Based Approach for the Quadratic Assignment Problem on Evolutionary Computation and Evolutionary Programming. In T. Baeck, Z. Michalewicz, and X. Yao, editors, Proceedings of the IEEE International Conference on Evolutionary Computation, pages 165-170. IEEE Press, 1997.

45. Czech, Z.J. A Parallel Genetic Algorithm for the Set Partitioning Problem. In 8th Euromicro Workshop on Parallel and Distributed Processing, pages 343-350, 2000.

46. Czech, Z.J. and Czarnas, P. Parallel Simulated Annealing for the Vehicle Routing Problem with Time Windows. In 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing, pages 376-383, 2002.

47. De Falco, I., Del Balio, R., Tarantino, E., and Vaccaro, R. Improving Search by Incorporating Evolution Principles in Parallel Tabu Search. In Proceedings of the International Conference on Machine Learning, pages 823-828, 1994.

48. Diekmann, R., Luling, R., and Simon, J. Problem Independent Distributed Simulated Annealing and its Applications. In R.V.V. Vidal, editor, Lecture Notes in Economics and Mathematical Systems, volume 396, pages 17-44. Springer-Verlag, Berlin, 1993.

49. Diekmann, R., Luling, R., Monien, B., and Sprüner, C. Combining Helpful Sets and Parallel Simulated Annealing for the Graph-Partitioning Problem. International Journal of Parallel Programming, 8:61-84, 1996.

50. Drias, H. and Ibri, A. Parallel ACS for Weighted MAX-SAT. In Mira, J. and Alvarez, J., editors, Artificial Neural Nets Problem Solving Methods - Proceedings of the 7th International Work-Conference on Artificial and Natural Neural Networks, volume 2686 of Lecture Notes in Computer Science, pages 414-421. Springer-Verlag, Heidelberg, 2003.
51. Drias, H. and Khabzaoui, M. Scatter Search with Random Walk Strategy for SAT and Max-SAT Problems. In L. Monostori, J. Vancza, and A. Moonis, editors, Proceedings of the 14th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE 2001, pages 35-44. Springer-Verlag, 2001.

52. Drummond, L.M.A., Ochi, L.S., and Vianna, D.S. A Parallel Hybrid Evolutionary Metaheuristic for the Period Vehicle Routing Problem. Volume 1586 of Lecture Notes in Computer Science, pages 183-191, 1999.

53. Drummond, L.M.A., Ochi, L.S., and Vianna, D.S. An Asynchronous Parallel Metaheuristic for the Period Vehicle Routing Problem. Future Generation Computer Systems, 17:379-386, 2001.

54. Durand, M.D. Parallel Simulated Annealing: Accuracy vs. Speed in Placement. IEEE Design & Test of Computers, 6(3):8-34, 1989.
55. Felten, E., Karlin, S., and Otto, S.W. The Traveling Salesman Problem on a Hypercube, MIMD Computer. In Proceedings of the 1985 Int. Conf. on Parallel Processing, pages 6-10, 1985.

56. Fiechter, C.-N. A Parallel Tabu Search Algorithm for Large Travelling Salesman Problems. Discrete Applied Mathematics, 51(3):243-267, 1994.

57. Fiorenzo Catalano, M.S. and Malucelli, F. Randomized Heuristic Schemes for the Set Covering Problem. In M. Paprzyky, L. Tarricone, and T. Yang, editors, Practical Applications of Parallel Computing, pages 23-38. Nova Science, 2003.

58. Flores, S.D., Cegla, B.B., and Caceres, D.B. Telecommunication Network Design with Parallel Multiobjective Evolutionary Algorithms. In IFIP/ACM Latin America Networking Conference 2003, pages 1-11, 2003.

59. Folino, G., Pizzuti, C., and Spezzano, G. Combining Cellular Genetic Algorithms and Local Search for Solving Satisfiability Problems. In Proceedings of the Tenth IEEE International Conference on Tools with Artificial Intelligence, pages 192-198. IEEE Computer Society Press, 1998.

60. Folino, G., Pizzuti, C., and Spezzano, G. Solving the Satisfiability Problem by a Parallel Cellular Genetic Algorithm. In Proceedings of the 24th EUROMICRO Conference, pages 715-722. IEEE Computer Society Press, 1998.

61. Folino, G., Pizzuti, C., and Spezzano, G. Parallel Hybrid Method for SAT that Couples Genetic Algorithms and Local Search. IEEE Transactions on Evolutionary Computation, 5(4):323-334, 2001.

62. García-López, F., Melián-Batista, B., Moreno-Pérez, J.A., and Moreno-Vega, J.M. The Parallel Variable Neighborhood Search for the p-Median Problem. Journal of Heuristics, 8(3):375-388, 2002.
63. García-López, F., Melián-Batista, B., Moreno-Pérez, J.A., and Moreno-Vega, J.M. Parallelization of the Scatter Search for the p-Median Problem. Parallel Computing, 29:575-589, 2003.

64. Gehring, H. and Homberger, J. A Parallel Two-Phase Metaheuristic for Routing Problems with Time Windows. Asia-Pacific Journal of Operational Research, 18(1):35-47, 2001.
65. Gendreau, M., Guertin, F., Potvin, J.-Y., and Taillard, E.D. Tabu Search for Real-Time Vehicle Routing and Dispatching. Transportation Science, 33(4):381-390, 1999.

66. Gendreau, M., Laporte, G., and Semet, F. A Dynamic Model and Parallel Tabu Search Heuristic for Real-Time Ambulance Relocation. Parallel Computing, 27(12):1641-1653, 2001.

67. Gendron, B., Potvin, J.-Y., and Soriano, P. A Parallel Hybrid Heuristic for the Multicommodity Capacitated Location Problem with Balancing Requirements. Parallel Computing, 29:591-606, 2003.

68. Ghamlouche, I., Crainic, T.G., and Gendreau, M. Cycle-based Neighborhoods for Fixed-Charge Capacitated Multicommodity Network Design. Operations Research, 51(4):655-667, 2003.

69. Glover, F. and Laguna, M. Tabu Search. Kluwer Academic Publishers, Norwell, MA, 1997.

70. Greening, D.R. Asynchronous Parallel Simulated Annealing. Lectures in Complex Systems, 3:497-505, 1990.

71. Greening, D.R. Parallel Simulated Annealing Techniques. Physica D, 42:293-306, 1990.

72. Gunes, M., Sorges, U., and Bouazizi, I. ARA - The Ant Colony Based Routing Algorithm for MANETs. In Proceedings of the International Conference on Parallel Processing, pages 79-85, 2002.

73. Hidalgo, J.I., Prieto, M., Lanchares, J., Baraglia, R., Tirado, F., and Garnica, O. Hybrid Parallelization of a Compact Genetic Algorithm. In Proceedings of the 11th Euromicro Conference on Parallel, Distributed and Network-Based Processing, pages 449-455, 2003.

74. Holmqvist, K., Migdalas, A., and Pardalos, P.M. Parallelized Heuristics for Combinatorial Search. In A. Migdalas, P.M. Pardalos, and S. Storoy, editors, Parallel Computing in Optimization, pages 269-294. Kluwer Academic Publishers, Norwell, MA, 1997.

75. Homberger, J. and Gehring, H. Two Evolutionary Metaheuristics for the Vehicle Routing Problem with Time Windows. INFOR, 37:297-318, 1999.
76. Jeong, C.-S. and Kim, M.-H. Parallel Algorithm for the TSP on SIMD Machines Using Simulated Annealing. In Proceedings of the International Conference on Application Specific Array Processors, pages 712-721, 1990.

77. Jeong, C.-S. and Kim, M.-H. Fast Parallel Simulated Annealing Algorithm for TSP on SIMD Machines with Linear Interconnections. Parallel Computing, 17:221-228, 1991.

78. Katayama, K., Hirabayashi, H., and Narihisa, H. Performance Analysis for Crossover Operators of Genetic Algorithm. Systems and Computers in Japan, 30:20-30, 1999.

79. Katayama, K., Hirabayashi, H., and Narihisa, H. Analysis of Crossovers and Selections in a Coarse-grained Parallel Genetic Algorithm. Mathematical and Computer Modelling, 38:1275-1282, 2003.

80. Knight, R.L. and Wainwright, R.L. HYPERGEN - A Distributed Genetic Algorithm on a Hypercube. In Proceedings of the 1992 IEEE Scalable High Performance Computing Conference, pages 232-235. IEEE Computer Society Press, Los Alamitos, CA, 1992.

81. Kohlmorgen, U., Schmeck, H., and Haase, K. Experiences with Fine-grained Parallel Genetic Algorithms. Annals of Operations Research, 90:203-219, 1999.

82. Kokosinski, Z., Kolodziej, M., and Kwarciany, K. Parallel Genetic Algorithm for Graph Coloring Problem. In Bubak, M., van Albada, G.D., and Sloot, P.M.A., editors, International Conference on Computational Science, volume 3036 of Lecture Notes in Computer Science, pages 215-222. Springer-Verlag, Heidelberg, 2004.

83. Laursen, P.S. Problem-Independent Parallel Simulated Annealing Using Selection and Migration. In Davidor, Y., Schwefel, H.-P., and Manner, R., editors, Parallel Problem Solving from Nature III, Lecture Notes in Computer Science 866, pages 408-417. Springer-Verlag, Berlin, 1994.

84. Laursen, P.S. Parallel Heuristic Search - Introductions and a New Approach. In A. Ferreira and P.M. Pardalos, editors, Solving Combinatorial Optimization Problems in Parallel, Lecture Notes in Computer Science 1054, pages 248-274. Springer-Verlag, Berlin, 1996.

85. Le Bouthillier, A. and Crainic, T.G. A Cooperative Parallel Meta-Heuristic for the Vehicle Routing Problem with Time Windows. Computers & Operations Research, 32(7):1685-1708, 2005.

86. Lee, K.-G. and Lee, S.-Y. Efficient Parallelization of Simulated Annealing Using Multiple Markov Chains: An Application to Graph Partitioning. In Mudge, T.N., editor, Proceedings of the International Conference on Parallel Processing, volume III: Algorithms and Applications, pages 177-180. CRC Press, 1992a.
87. Lee, K.-G. and Lee, S.-Y. Synchronous and Asynchronous Parallel Simulated Annealing with Multiple Markov Chains. Volume 1027 of Lecture Notes in Computer Science, pages 396-408. Springer-Verlag, Berlin, 1995.

88. Lee, S.-Y. and Lee, K.-G. Asynchronous Communication of Multiple Markov Chains in Parallel Simulated Annealing. In Mudge, T.N., editor, Proceedings of the International Conference on Parallel Processing, volume III: Algorithms and Applications, pages 169-176. CRC Press, Boca Raton, FL, 1992b.

89. Levine, D. A Parallel Genetic Algorithm for the Set Partitioning Problem. In I.H. Osman and J.P. Kelly, editors, Meta-Heuristics: Theory & Applications, pages 23-35. Kluwer Academic Publishers, Norwell, MA, 1996.

90. Li, Y., Pardalos, P.M., and Resende, M.G.C. A Greedy Randomized Adaptive Search Procedure for the Quadratic Assignment Problem. In DIMACS Implementation Challenge, DIMACS Series on Discrete Mathematics and Theoretical Computer Science, volume 16, pages 237-261. American Mathematical Society, 1994.

91. Lin, S.-C., Punch, W., and Goodman, E. Coarse-Grain Parallel Genetic Algorithms: Categorization and New Approach. In Sixth IEEE Symposium on Parallel and Distributed Processing, pages 28-37. IEEE Computer Society Press, 1994.

92. Logar, A.M., Corwin, E.M., and English, T.M. Implementation of Massively Parallel Genetic Algorithms on the MasPar MP-1. In Proceedings of the 1992 IEEE ACM/SIGAPP Symposium on Applied Computing: Technological Challenges of the 1990's, pages 1015-1020. ACM Press, Kansas City, Missouri, 1992.

93. Malek, M., Guruswamy, M., Pandya, M., and Owens, H. Serial and Parallel Simulated Annealing and Tabu Search Algorithms for the Traveling Salesman Problem. Annals of Operations Research, 21:59-84, 1989.

94. Marchiori, E. and Rossi, C. A Flipping Genetic Algorithm for Hard 3-SAT Problems. In Banzhaf, W., Daida, J., Eiben, A.E., Garzon, M.H., Honavar, V., Jakiela, M., and Smith, R.E., editors, Proceedings of the Genetic and Evolutionary Computation Conference, pages 393-400. Morgan Kaufmann, San Mateo, CA, 1999.

95. Martins, S.L., Resende, M.G.C., Ribeiro, C.C., and Pardalos, P.M. A Parallel GRASP for the Steiner Tree Problem in Graphs Using a Hybrid Local Search Strategy. Journal of Global Optimization, 17:267-283, 2000.

96. Martins, S.L., Ribeiro, C.C., and Souza, M.C. A Parallel GRASP for the Steiner Problem in Graphs. In A. Ferreira and J. Rolim, editors, Proceedings of IRREGULAR'98 - 5th International Symposium on Solving Irregularly Structured Problems in Parallel, volume 1457 of Lecture Notes in Computer Science, pages 285-297. Springer-Verlag, 1998.

97. Maruyama, T., Hirose, T., and Konagaya, A. A Fine-Grained Parallel Genetic Algorithm for Distributed Parallel Systems. In S. Forrest, editor, Proceedings
of the Fifth International Conference on Genetic Algorithms, pages 184-190. Morgan Kaufmann, San Mateo, CA, 1993.

98. Middendorf, M., Reischle, F., and Schmeck, H. Information Exchange in Multi Colony Ant Algorithms. Volume 1800 of Lecture Notes in Computer Science, pages 645-652. Springer-Verlag, Heidelberg, 2000.

99. Miki, M., Hiroyasu, T., Wako, J., and Yoshida, T. Adaptive Temperature Schedule Determined by Genetic Algorithm for Parallel Simulated Annealing. In CEC'03 - The 2003 Congress on Evolutionary Computation, volume 1, pages 459-466, 2003.

100. Mitchell, D., Selman, B., and Levesque, H. Hard and Easy Distribution of SAT Problems. In Rosenbloom, P. and Szolovits, P., editors, Proceedings of the Tenth National Conference on Artificial Intelligence, pages 459-465. AAAI Press, Menlo Park, CA, 1992.

101. Mühlenbein, H. Parallel Genetic Algorithms, Population Genetics and Combinatorial Optimization. In J.D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 416-421. Morgan Kaufmann, San Mateo, CA, 1989.

102. Mühlenbein, H. Evolution in Time and Space - The Parallel Genetic Algorithm. In G.J.E. Rawlins, editor, Foundations of Genetic Algorithms & Classifier Systems, pages 316-338. Morgan Kaufmann, San Mateo, CA, 1991.

103. Mühlenbein, H. Asynchronous Parallel Search by the Parallel Genetic Algorithm. In V. Ramachandran, editor, Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing, pages 526-533. IEEE Computer Society Press, Los Alamitos, CA, 1991c.

104. Mühlenbein, H. Parallel Genetic Algorithms in Combinatorial Optimization. In O. Balci, R. Sharda, and S. Zenios, editors, Computer Science and Operations Research: New Developments in their Interface, pages 441-456. Pergamon Press, New York, NY, 1992.

105. Mühlenbein, H. How Genetic Algorithms Really Work: Mutation and Hill-Climbing. In R. Manner and B. Manderick, editors, Parallel Problem Solving from Nature, 2, pages 15-26. North-Holland, Amsterdam, 1992a.

106. Mühlenbein, H., Gorges-Schleuter, M., and Krämer, O. New Solutions to the Mapping Problem of Parallel Systems - the Evolution Approach. Parallel Computing, 6:269-279, 1987.

107. Mühlenbein, H., Gorges-Schleuter, M., and Krämer, O. Evolution Algorithms in Combinatorial Optimization. Parallel Computing, 7(1):65-85, 1988.

108. Mutalik, P.P., Knight, L.R., Blanton, J.L., and Wainwright, R.L. Solving Combinatorial Optimization Problems Using Parallel Simulated Annealing and Parallel Genetic Algorithms. In Proceedings of the 1992 IEEE ACM/SIGAPP
490
PARALLEL METAHEURISTICS APPLICATIONS
Symposium on Applied Computing: Technological Challenges of the I990 's, pages 1031-1038. ACM Press, Kansas City, Missouri, 1992. 109. Nabhan, T.M. and Zomaya, A.Y. A Parallel Simulated Annealing Algorithm with Low Communication Overhead. IEEE Transactions on Parallel and Distributed Systems, 6( 12):1226-1233, 1995. 110. Ochi, L.S., Vianna, D.S., Drummond, L.M.A., and Victor, A.O. A Parallel Evolutionary Algorithm for the Vehicle Routing Problem with Heterogeneous Fleet. Future Generation Computer Systems, 14(3):285-292, 1998. 111. Ouyang, M., Toulouse, M., Thulasiraman, K., Glover, F., and Deogun, J.S. Multilevel Cooperative Search: Application to the NetlistiHypergraph Partitioning Problem. In Proceedings of International Symposium on Physical Design, pages 192-198. ACM Press, 2000. 112. Ouyang, M., Toulouse, M., Thulasiraman, K., Glover, F., and Deogun, J.S. Multilevel Cooperative Search for the CircuiVHypergraph Partitioning Problem. IEEE Transactions on Computer-Aided Design, 2 1(6):685-693,2002. 113. Pardalos, P.M., Li, Y., and Murthy, K.A. Computational Experience with Parallel Algorithms for Solving the Quadratic Assignment Problem. In 0. Balci, R. Sharda, and S. Zenios, editors, Computer Science and Operations Research: New Developments in their Interface, pages 267-278. Pergamon Press, New York, NY, 1992. 114. Pardalos, P.M., L. Pitsoulis, T. Mavridou, and Resende, M.G.C. Parallel Search for Combinatorial Optimization: Genetic Algorithms, Simulated Annealing, Tabu Search and GRASP. In A. Ferreira and J. Rolim, editors, Proceedings of Workshop on Parallel Algorithms for Irregularly Structured Problems, Lecture Notes in Computer Science, volume 980, pages 3 17-33 1. Springer-Verlag, Berlin, 1995. 115. Pardalos, P.M., Pitsoulis, L., and Resende, M.G.C. A Parallel GRASP Implementation for the Quadratic Assignment Problem. In A. Ferreira and J. Rolim, editors, Solving Irregular Problems in Parallel: State ofthe Art, pages 115-130. Kluwer Academic Publishers, Norwell, MA, 1995. 1 16. L. Pitsoulis, Pardalos, P.M., and Resende, M.G.C. A Parallel GRASP for MAXSAT. In Wasniewski J., Dongarra, J., Madsen, K., and Olesen, D., editors, Third International Workshop on Applied Parallel Computing, Industrial Computation and Optimization, volume 1180 of Lecture Notes in Computer Science, pages 575-585. Springer-Verlag,Berlin, 1996.
117. Prins, C. and Taillard, E.D. A Simple and Effective Evolutionary Algorithm for the Vehicle Routing Problem. Computers & Operations Research, 3 l(12): 19852002,2004.
REFERENCES
491
1 18. Rahoual, M., Hadji, R., and Bachelet, V. Parallel Ant System for the Set Covering Problem Source. In Proceedings of the Third International Workshop on Ant Algorithms, pages 262-267. Springer-Verlag, London, UK, 2002.
119. Ram, D.J., Sreenivas,T.H., and Subramaniam, K.G. Parallel Simulated Annealing Algorithms. Journal of Parallel and Distributed Computing, 37:207-2 12, 1996. 120. Randall, M. and Lewis, A. A Parallel Implementation of Ant Colony Optimisation. Journal of Parallel and Distributed Computing, 62: 1421-1432, 2002. 121. Rego, C. and Roucairol, C. A Parallel Tabu Search Algorithm Using Ejection Chains for the VRP. In I.H. Osman and J.P. Kelly, editors, Metu-Heuristics: Theory & Applications, pages 253-295. Kluwer Academic Publishers, Norwell, MA, 1996. 122. Reimann, M., Doerner, K., and Hartl, R. D-ants: Savings Based Ants Divide and Conquer the Vehicle Routing Problem. Computers & Operations Research, 311563-591,2004. 123. Resende, M.G.C. and Feo, T.A. A GRASP for Satisfiability. In Trick, M.A. and Johnson, D.S., editors, The Second DIMACS Implementation Challenge, DIMACS Series on Discrete Mathematics and Theoretical Computer Science, volume 26, pages 499-520. American Mathematical Society, 1996. 124. Resende, M.G.C. and Feo, T.A. Approximative Solution of Weighted MAXSAT Problems Using GRASP. Discrete Mathematics and Theoretical Computer Science, 35:393405, 1997. 125. Ribeiro, C.C. and Rosseti, I. A Parallel GRASP Heuristic for the 2-path Network Design Problem. 4 journee ROADEF, Paris, February 20-22,2002. 126. Ribeiro C.C. and Rosseti, I. A Parallel GRASP Heuristic for the 2-path Network Design Problem. Third Meeting of the PARE0 Euro Working Group, Guadeloupe (France), May, 2002. 127. Ribeiro C.C. and Rosseti, I. Parallel GRASP with Path-Relinking Heuristic for the 2-Path Network Design Problem. AIR0’2002, L‘Aquila, Italy, September, 2002. 128. Rochat, Y. and Taillard, E.D. Probabilistic Diversification and Intensification in Local Search for Vehicle Routing. Journal of Heuristics, l(1): 147-167, 1995. 129. Sanvicente-Sanchez, H. and Frausto-Solis, J. . MPSA: A Methodology to Parallelize Simulated Annealing and its Application to the Traveling Salesman Problem. volume 23 13 of Lecture Notes in Computer Science. Springer-Verlag Heidelberg, pages 89-97, 2002. 130. Selman, B., Kautz, H. A., and Cohen, B. Noise Strategies for Improving Local Search. In Proceedings of the Twerfth National Conference on Artificial Intelligence, pages 337-343, 1994.
492
PARALLEL METAHEURISTICS APPLICATIONS
131. Selman, B., Levesque, H., and Mitchell, D. A New Method for Solving Hard SatisfiabilityProblems. In Rosenbloom, P. and Szolovits, P., editors, Proceedings of the Tenth National Conference on Artificial Intelligence, pages 440-446. AAAI Press, Menlo Park, CA, 1992. 132. Sena, G.A., Megherbi, D., and Isern, G. Implementation of a Parallel Genetic Algorithm on a Cluster of Workstations: Traveling Salesman Problem, a Case Study. Future Generation Computer Systems, 17:477488, 200 1. 133. Shonkwiler, R. Parallel Genetic Algorithms. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 199-205. Morgan Kaufmann, San Mateo, CA, 1993. 134. Sleem, A., Ahmed, M., Kumar, A., and Kamel, K. Comparative Study of Parallel vs. Distributed Genetic Algorithm Implementation for ATM Networking Environment. In Fifth IEEE Symposium on Computers and Communications, pages 152-157,2000. 135. Sohn, A. Parallel Satisfiability Test with Synchronous Simulated Annealing on Distributed Memory Multiprocessor. Journal of Parallel and Distributed Computing, 36: 195-204, 1996. 136. Sohn, A. and Biswas, R. Satisfiability Tests with Synchronous Simulated Annealing on the Fujitsu A p 1000 Massively-parallel Multiprocessor. In Proceedings of the International Conference on Supercomputing, pages 2 13-220, 1996. 137. Solar, M., Parada, V., and Urmtia, R. A Parallel Genetic Algorithm to Solve the Set-Covering Problem. Computers & Operations Research, 29(9): 1221-1235, 2002. 138. Solomon, M.M. Time Window Constrained Routing and Scheduling Problems. Operations Research, 35:254-265, 1987. 139. Spears, W.M. Simulated Annealing for Hard Satisfiability Problems, in Clique, Coloring and Satisfiability. In Johnson, D.S. and Trick, M.A., editors, Cliques, Coloring, and Satisfiability, volume 26, pages 533-558. American Mathematical Society, 1996. 140. Stutzle, T. Parallelization Strategies for Ant Colony Optimization. In Eiben, A.E., Back, T., Schoenauer, M., and Schwefel, H.-P., editors, Proceedings of Parallel Problem Solvingfi-om Nature V, volume 1498 of Lecture Notes in Computer Science, pages 722-73 1. Springer-Verlag, Heidelberg, 1998. 141. Stutzle, T. and Hoos, H. Improvements on the Ant System: Introducing the MAX-MIN Ant System. In Smith, G.D., Steele, N.C., and Albrecht, R.F., editors, Proceedings of Artijicial Neural Nets and Genetic Algorithms, Lecture Notes in Computer Science, pages 245-249. Springer-Verlag,Heidelberg, 1997. 142. Taillard, E.D. Robust Taboo Search for the Quadratic Assignment Problem. Parallel Computing, 17:443455, 1991.
REFERENCES
493
143. Taillard, E.D. Parallel Iterative Search Methods for Vehicle Routing Problems. Networks, 23:661-673, 1993. 144. Taillard, E.D. Recherches iteratives dirigkes paralliles. Polytechnique FedCrale de Lausanne, 1993.
PhD thesis, Ecole
145. Taillard, E.D., Badeau, P., Gendreau, M., Guertin, F., and Potvin, J.-Y. A Tabu Search Heuristic for the Vehicle Routing Problem with Soft Time Windows. Transportation Science, 31(2):170-186, 1997. 146. Talbi, E-G. and Bessibre, P. A Parallel Genetic Algorithm for the Graph Partitioning Problem. In Proceedings of the ACM International Conference on Supercomputing ICS91, pages 312-320, 1991a. 147. Talbi, E.-G., Hafidi, Z., andGeib, J.-M. Parallel Adaptive Tabu Search Approach. Parallel Computing, 24:2003-2019, 1998. 148. Talbi, E.-G., Hafidi, Z., and Geib, J.-M. Parallel Tabu Search for Large Optimization Problems. In S. VoB, S. Martello, C. Roucairol, and Osman, I.H., editors, Meta-Heuristics 98: Theory & Applications, pages 345-358. Kluwer Academic Publishers, Norwell, MA, 1999. 149. Talbi, E.-G., Hafidi, Z., Kebbal, D., and Geib, J.-M. A Fault-Tolerant Parallel Heuristic for Assignment Problems. Future Generation Computer Systems, 14:425438. 1998.
150. Talbi, E.-G., Roux, O., Fonlupt, C., and Robillard, D. Parallel Ant Colonies for Combinatorial Optimization Problems. In J.D.P. Rolim et al., editor, 11th IPPS/SPDP’99 Workshops,volume 1586 of Lecture Notes in Computer Science, pages 239-247. 1999. 151. Talbi, E.-G., Roux, O., Fonlupt, C . , and Robillard, D. Parallel Ant Colonies for the Quadratic Assignment Problem. Future Generation Computer Systems, 17:441-449,2001. 152. Tongcheng, G. and Chundi, M. Radio Network Design Using Coarse-Grained Parallel Genetic Algorithms with Different Neighbor Topology. In Proceedings of the 4th World Congress on Intelligent Control and Automation, volume 3, pages 1840-1843,2002. 153. Toulouse, M., Crainic, T.G., and Sans6, B. An Experimental Study of Systemic Behavior ofcooperative Search Algorithms. In S. VoB, S. Martello, C. Roucairol, and Osman, I.H., editors, Meta-Heuristics 98: Theory & Applications, pages 373-392. Kluwer Academic Publishers, Norwell, MA, 1999. 154. Toulouse, M., Crainic, T.G., and Sansb, B. Systemic Behavior of Cooperative Search Algorithms. Parallel Computing, 2 1( 1):57-79,2004.
494
PARALLEL METAHEURISTICS APPLICATIONS
155. Toulouse, M., Crainic, T.G., Sanso, B., and Thulasiraman, K. Self-organization in Cooperative Search Algorithms. In Proceedings of the 1998 IEEE International Conference on Systems, Man, and Cybernetics, pages 2379-2385. Omnipress, Madisson, Wisconsin, 1998. 156. Toulouse, M., Crainic, T.G., and Thulasiraman, K. Global Optimization Properties of Parallel Cooperative Search Algorithms: A Simulation Study. Parallel Computing, 26(1):91-112, 2000. 157. Toulouse, M., Glover, F., and Thulasiraman, K. A Multiscale Cooperative Search with an Application to Graph Partitioning. Report, School of Computer Science, University of Oklahoma, Norman, OK, 1998. 158. Toulouse, M., Thulasiraman, K., and Glover, F. Multilevel Cooperative Search: A New Paradigm for Combinatorial Optimization and an Application to Graph Partitioning. In P. h e s t o y , P. Berger, M. DaydC, I. Duff, V. Frays&, L. Giraud, and D. Ruiz, editors, 5th International Euro-Par Parallel Processing Conference, volume 1685 of Lecture Notes in Computer Science, pages 533-542. SpringerVerlag, Heidelberg, 1999. 159. Towhidul Islam, M., Thulasiraman, P., and Thalasiram, R.K. A Parallel Ant Colony Optimization Algorithm for All-Pair Routing in MANETs. In Proceedings of the International Parallel and Distributed Processing Symposium, IEEE, page 259,2003. 160. Verhoeven, M.G.A. and Aarts, E.H.L. Parallel Local Search. Journal ofHeuristics, 1(1):43-65, 1995. 161. Verhoeven, M.G.A. and Severens, M.M.M. Parallel Local Search for Steiner Trees in Graphs. Annals of Operations Research, 90:185-202, 1999. 162. VolJ, S. Tabu Search: Applications and Prospects. In D.-Z. Du and P.M. Pardalos, editors, Network Optimization Problems, pages 333-353. World Scientific Publishing Co., Singapore, 1993. 163. Wilkerson, R. and Nemer-Preece, N. Parallel Genetic Algorithm to Solve the Satisfiability Problem. In Proceedings of the 1998 ACM symposium on Applied Computing, pages 23-28. ACM Press, 1998. 164. Witte, E.E., Chamberlain, R.D., and Franklin, M.A. Parallel Simulated Annealing using Speculative Computation. In Proceedings of the 19th International Conference on Parallel Processing, pages 286-290, 1990. 165. Witte, E.E., Chamberlain, R.D., and Franklin, M.A. Parallel Simulated Annealing using Speculative Computation. IEEE Transactions on Parallel & Distributed Systems, 2(4):483494, 1991.
20
Parallel Metaheuristics in Telecommunications

SERGIO NESMACHNOW¹, HÉCTOR CANCELA¹, ENRIQUE ALBA², FRANCISCO CHICANO²

¹Universidad de la República, Uruguay
²Universidad de Málaga, Spain
20.1 INTRODUCTION
The fast development of network infrastructures, software, and Internet services has been driven by the growing demand for data communications over the last 20 years. At the present time, emergent technologies such as cellular mobile radio systems, optical fibers, and high-speed networks, which enable fast data communications and new services and applications, are in widespread use around the globe. In this situation, there is renewed interest in related technology and communication network problems, such as the optimal allocation of antennas, frequency assignment for cellular phones, and structural design problems related to routing information through the net. Since existing networks keep growing in size, the underlying instances of the related optimization problems frequently pose a challenge to existing algorithms. In consequence, the research community has been searching for new algorithms able to replace and improve the traditional exact ones, whose low efficiency often makes them useless for solving large real-life problems in reasonable time. In this context, metaheuristic algorithms have been frequently applied to telecommunication problems over the last 15 years. Parallel implementations became popular in the last decade as an effort to make metaheuristics more efficient. As this volume eloquently demonstrates, there exists a wide range of choices for designing efficient parallel implementations of metaheuristic algorithms, but the common idea consists in splitting the work among several processing elements. In this way, parallel metaheuristic algorithms allow us to reach high quality results in a reasonable execution time even for hard-to-solve optimization problems arising in telecommunication. In addition, parallel implementations usually provide a pattern for the search space exploration that is
different from the sequential one and has often been shown to be useful for obtaining superior results [8, 24, 47]. This chapter provides a summary of articles related to the application of parallel metaheuristics to telecommunication problems. The survey focuses mainly on application areas, considering only those problems inherently related to the telecommunication field and therefore disregarding a whole class of optimization problems not directly connected with the area. For example, limiting the problem domain to applications unique to telecommunication prevents us from analyzing electronic circuit design problems and VLSI channel and switchbox routing problems, even though several parallel metaheuristic proposals exist for these kinds of problems and their approaches are usually related to telecommunication problems. Our main classification divides the applications into three categories: network design problems, network routing problems, and network assignment and dimensioning problems. Under the network design category, we include those problems related to finding a network topology that satisfies certain properties associated with reliability, Quality of Service (QoS), and other features important for source-to-destination communication. Besides pure topology design problems, we also include node and transmitter positioning problems and the construction of trees, typical combinatorial optimization problems strongly related to network topology design. The network routing group comprises the applications concerning the transmission of information and routing protocol problems, together with their solution using parallel metaheuristic approaches. Finally, the network assignment and dimensioning collection involves those problems related to assigning available resources to a given network, such as frequency assignment and wavelength allocation problems. We also include dynamic dimensioning problems, considering that network planning usually must satisfy expected levels of demand for new services, upgrades, and improvements on existing designs.
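Before turning to the individual application areas, the common work-splitting idea mentioned above can be made concrete with a minimal, purely illustrative sketch: a master process runs the metaheuristic loop and farms out the costly solution evaluations to a pool of worker processes. The objective function and population model below are hypothetical placeholders, not taken from any of the surveyed papers.

    # Illustrative master-slave evaluation: the master holds the population and
    # the workers evaluate solutions in parallel. The cost function is a toy.
    from multiprocessing import Pool
    import random

    def cost(solution):
        # Placeholder objective: number of links selected in a binary design vector.
        return sum(solution)

    def random_solution(n_links=100):
        return [random.randint(0, 1) for _ in range(n_links)]

    if __name__ == "__main__":
        population = [random_solution() for _ in range(64)]
        with Pool(processes=8) as workers:            # eight slave processes
            costs = workers.map(cost, population)     # evaluations run in parallel
        best_cost, best_solution = min(zip(costs, population))
        print("best cost found:", best_cost)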
20.2 NETWORK DESIGN

Network design is the focus of numerous papers in which the authors propose several parallel and distributed metaheuristic techniques. Table 20.1 summarizes these papers. To organize the section we have grouped the works according to the aspect of network design they tackle. We identify five groups: reliability, the Steiner tree problem, antennae placement, topological design, and other network design problems.
20.2.1 Reliability and Connectivity Problems

Reliability refers to the ability of the network to keep working when some of the nodes or links fail. The evaluation of reliability metrics is a difficult problem in itself; an alternative is to impose connectivity constraints on the network topology. One usual
Table 20.1  Parallel metaheuristics applied to network design problems

Author(s) | Year | Related optimization problem | Metaheuristic
Huang et al. | 1997 | 2-connectivity problem with diameter constraints | Genetic Algorithm
Martins et al. | 1998 | Steiner tree problem | GRASP
Baran and Laufer | 1999 | Reliable network design | Asynchronous Team
Martins et al. | 2000 | Steiner tree problem | Hybrid: GRASP + Local Search
Cruz et al. | 2000 | Topological design, dimensioning, facility location | Genetic Algorithm
Meunier and Talbi | 2000 | Position and configuration of mobile base stations | Multiobjective Evolutionary Algorithm
Calegari et al. | 2001 | Antenna placement, hitting set problem | Genetic Algorithm
Canuto et al. | 2001 | Prize-collecting Steiner tree | Multi-start Local Search
Duarte and Baran | 2001 | Reliable network design | Multiobjective Evolutionary Algorithm
Watanabe et al. | 2001 | Antenna arrangement problem | Multiobjective Evolutionary Algorithm
Ribeiro and Rosseti | 2002 | 2-path problem | GRASP
Cruz and Mateus | 2003 | Topological design, dimensioning, facility location | Genetic Algorithm
Di Fatta et al. | 2003 | Steiner tree problem | Genetic Algorithm
Duarte et al. | 2003 | Reliable network design | Multiobjective Evolutionary Algorithm
Nesmachnow et al. | 2004 | Generalized Steiner Problem | GA, CHC, SA, Hybrid GA + SA
Lo Re et al. | 2004 | Steiner tree problem | Hybrid: GA + Local Search
Alba and Chicano | 2005 | Antenna placement | Genetic Algorithm
objective is to ensure that in the presence of a single node or link failure the data flows may be re-routed to reach the destination. To ensure that the data can still arrive at the final node, two disjoint paths between any pair of nodes are necessary. This is the so-called 2-connectivity constraint. Huang et al. [26] presented a Parallel Genetic Algorithm (PGA) for solving the 2-connectivity network design problem with diameter constraints. The algorithm uses a node-based codification, encoding the diameter and 2-connectivity constraints in the chromosome and thus avoiding nonfeasible solutions. The authors apply two parallel approaches: a domain decomposition based on partitioning the connectivity requirements and a distributed PGA model. They analyze the influence of several virtual topologies for both strategies and conclude that the double-ring topology gives the best performance for the PGA working on partitioned requirements, while the torus topology is the most suitable for the PGA that divides the population. For this last model, they also verified that the best results are obtained with the most frequent exchange of solutions with neighbors, but the communication overhead then increases significantly. Setting the most frequent communication interval (one generation) and limiting the interactions to only one neighbor produces an appropriate balance between the quality of the results and the computational effort required. Another approach to achieve reliability in a telecommunication network consists in limiting the number of edges in a path between two nodes; that is, there is a distinguished set of nodes D and the paths between them have at most k edges (k-path network design problem). Ribeiro and Rosseti [44] developed a parallel GRASP algorithm applied to the 2-path network design problem. The GRASP construction phase uses an iterated shortest 2-path algorithm on random source-destination pairs of nodes
until each pair is considered, while the local search phase tries to improve the solutions by tentatively eliminating 2-paths and recalculating the paths using modified edge weights. The local search is complemented with a path-relinking mechanism applied to pairs of solutions. The parallel approach uses a multiple-walk independent-thread strategy, distributing the iterations over the available processors. The parallel algorithm obtains linear speedup and allows reaching high quality solutions, even though they deteriorate as the number of threads (and processors) increases. Baran and Laufer [4] utilize a model of the network that assigns a reliability value to each link and propose a parallel implementation of an Asynchronous Team Algorithm (A-Team) to solve the problem of finding a reliable communication network. The A-Team is a hybrid technique which combines distinct algorithms interacting in the solution of the same global problem. In Baran and Laufer's proposal the A-Team combines a PGA with different reliability calculation approaches for the topological optimization of telecommunication networks subject to reliability constraints. The proposed PGA corresponds to a distributed island model with broadcast migration. It employs a bit-string codification and specialized initialization, crossover, and mutation operators. A repair mechanism is included to keep the solutions under the 2-connectivity constraint. Two approaches are used to estimate network reliability: an upper bound over all the candidates included in the population is efficiently calculated, and after that, a Monte Carlo simulation is used to get good approximations of the all-terminal reliability. The empirical results show good values for medium-size networks and sublinear speedup. In [17] a multiobjective version of the previous problem is addressed by Duarte and Baran. The authors designed a parallel asynchronous version of the SPEA multiobjective evolutionary algorithm [57] to find optimal topologies for a network. The algorithm presented is made up of two kinds of processes: several parallel SPEA processes, which perform the real optimization work, and one organizer process, which creates the workers, collects the results, and applies the Pareto dominance test over them. The results of the parallel version outperform the sequential ones, considering standard metrics in the multiobjective algorithms domain. In addition, the parallel version is fully scalable, showing almost linear speedup values and the ability to obtain better solutions when the number of processors increases. Later, Duarte, Baran, and Benitez [18] published a comparison of parallel versions of several multiobjective EAs for solving the same reliable network design problem. The authors present experimental results for asynchronous parallel versions of SPEA and NSGA [46] using external populations. The experiments confirm the previous findings, indicating that the quality of the results improves with more processors for all the implemented algorithms. They also illustrate that SPEA is able to obtain better results than NSGA in smaller execution times.
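As a purely illustrative aside, the kind of Monte Carlo estimation mentioned above, sampling link failures and checking whether the surviving network remains connected, can be sketched in a few lines. The graph representation, the sample size, and the toy instance are assumptions made for the example; they are not details of Baran and Laufer's implementation.

    # Crude Monte Carlo estimate of all-terminal reliability: the probability
    # that the links that survive keep every node connected (illustrative only).
    import random

    def connected(n_nodes, links):
        # Depth-first search over the surviving links, starting from node 0.
        adjacency = {v: [] for v in range(n_nodes)}
        for u, v in links:
            adjacency[u].append(v)
            adjacency[v].append(u)
        seen, stack = {0}, [0]
        while stack:
            for w in adjacency[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == n_nodes

    def all_terminal_reliability(n_nodes, links, link_reliability, samples=10000):
        hits = 0
        for _ in range(samples):
            surviving = [e for e in links if random.random() < link_reliability[e]]
            hits += connected(n_nodes, surviving)
        return hits / samples

    # Toy instance: a 4-node ring whose links each work with probability 0.9.
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(all_terminal_reliability(4, ring, {e: 0.9 for e in ring}))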
20.2.2 Steiner Tree Problem

In telecommunication network design there is a problem that frequently appears: the Steiner Tree Problem (STP). The STP consists in finding a minimum-weight subtree of a given graph spanning a set of distinguished terminal nodes. Martins,
Ribeiro, and Souza [35] developed a parallel GRASP heuristic for this problem. The proposed parallelization scheme consists in a master-slave model, distributing the GRASP iterations among several slave processes running on different processors. The GRASP construction phase is based on a randomized version of Kruskal's algorithm for finding the minimum spanning tree of a given graph, while the local search phase utilizes a node-based neighborhood built using a nonterminal node insertion or deletion procedure. Results obtained on a set of experiments over series C, D, and E of the OR-Library STP instances present the approach as a useful way to solve the problem. The parallel algorithm allows tackling high dimension problems. Later, the same authors, working with Resende and Pardalos, presented an improved version of the former algorithm exploring a hybrid local search strategy [36]. This approach incorporates a local search using a path-based neighborhood, replacing paths between terminal nodes. A sublinear speedup behavior is observed on the three classes of problems solved. Lo Re and Lo Presti studied the application of PGAs to the problem. In a first article with Di Fatta [16], these researchers developed a master-slave PGA obtaining promising speedup values when solving Beasley's OR-Library standard test problems. Recently, the same authors, working with Storniolo and Urso, extended their proposal [34], presenting a parallel hybrid method that combines a distributed GA and a local search strategy using a specific STP heuristic. The computational efficiency analysis shows that the distributed model achieves significantly better speedup values than the master-slave approach, since it employs few synchronization points, and thus it can be executed over a wide-area grid-computing environment. These results encourage the authors to face high dimension problems, with sizes ranging from 1000 to 2000 nodes: 400 randomly created problems and 50 subnetworks with real Internet data extracted from the description produced by the Mercator project. The grid PGA is able to obtain the best-known solutions on about 70% of the instances. Canuto, Resende, and Ribeiro [9] proposed a parallel multistart local search algorithm for solving the prize-collecting Steiner tree problem, which has important applications in telecommunication LAN design. The authors put forward a method based on the generation of initial solutions by a primal-dual algorithm with perturbations. They utilize path-relinking to improve the solutions obtained by the local search and variable neighborhood search (VNS) as a postoptimization procedure. Nesmachnow, Cancela, and Alba [39] tackled the Generalized Steiner Problem (GSP). The objective is to find a minimum cost topology such that for each pair of nodes (i, j) there exist at least r_ij disjoint (or edge-disjoint) paths. The authors present a comparative study of sequential and parallel versions of different metaheuristics applied to a number of medium-sized test cases. The heuristics were implemented over the MALLBA library [1], and comprise a standard Genetic Algorithm (GA), a Simulated Annealing (SA) method, two GA+SA hybrid algorithms, and another evolutionary method called CHC (Cross generational elitist selection, Heterogeneous recombination, and Cataclysmic mutation). All problems used the same binary codification, where the presence or absence of each edge was mapped to a different bit.
Standard mutation and recombination operators were applied; the resulting individuals were accepted only when they corresponded to feasible solutions.
For the parallel versions of the evolutionary methods, the population was split into 8 demes, applying a migration operator working on a unidirectional ring topology. The results for the sequential methods showed CHC as the best alternative in terms of solution quality. For the parallel methods, the experiments over an 8-machine cluster showed that both the standard GA and one of the GA+SA hybrids obtained the best performances in solution quality and speedup.
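The distributed island scheme used in this and several of the works above can be illustrated with a schematic sketch. The demes are simulated sequentially here for brevity (each would run on its own processor in a real implementation), and the OneMax-style objective, operators, and parameters are placeholders rather than the actual GSP setup.

    # Schematic island GA with unidirectional ring migration (simulated sequentially).
    import random

    N_DEMES, DEME_SIZE, GENERATIONS, MIGRATE_EVERY, BITS = 8, 20, 50, 5, 40

    def fitness(ind):                          # placeholder objective (OneMax)
        return sum(ind)

    def tournament(deme):
        a, b = random.sample(deme, 2)
        return a if fitness(a) >= fitness(b) else b

    def offspring(deme, p_mut=0.02):
        p1, p2 = tournament(deme), tournament(deme)
        cut = random.randrange(1, BITS)
        child = p1[:cut] + p2[cut:]                                  # one-point crossover
        return [bit ^ (random.random() < p_mut) for bit in child]    # bit-flip mutation

    demes = [[[random.randint(0, 1) for _ in range(BITS)] for _ in range(DEME_SIZE)]
             for _ in range(N_DEMES)]
    for generation in range(1, GENERATIONS + 1):
        demes = [[offspring(d) for _ in range(DEME_SIZE)] for d in demes]
        if generation % MIGRATE_EVERY == 0:        # migration on a unidirectional ring
            migrants = [max(d, key=fitness) for d in demes]
            for i, d in enumerate(demes):
                d[random.randrange(DEME_SIZE)] = migrants[(i - 1) % N_DEMES]
    print("best fitness:", max(fitness(ind) for d in demes for ind in d))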
20.2.3 Antennae Placement and Configuration

The localization and parameters of the antennae in a radio network influence the quality and cost of the service. This problem is especially important in cellular networks where, in addition to cost and quality requirements, we find coverage and handover constraints. Watanabe, Hiroyasu, and Miki [50] worked out a parallel evolutionary multiobjective approach for deciding the antennae placement and configuration in cellular networks. The authors presented two parallel models for multiobjective GAs applied to the problem: the Master-Slave with Local Cultivation Genetic Algorithm (MSLC) and the Divided Range Multi-Objective Genetic Algorithm (DRMOGA). The MSLC algorithm is based on the standard master-slave approach, but the evolutionary operators are carried out on the slaves using a two-individual population and the evolution follows the minimal generation gap model. DRMOGA is a standard distributed island model that uses domain decomposition. The empirical analysis compares both proposed models with MOGA [21] and a standard distributed GA. They show that MSLC gets the best results in terms of Pareto front covering and nondominated individuals, while establishing that the DRMOGA results are affected by the number of subpopulations: the number of nondominated individuals decreases when the number of subpopulations grows. In the same line of work, Meunier, Talbi, and Reininger [38] presented a parallel implementation of a GA with a multilevel encoding deciding the activation of sites, the number and type of antennae, and the parameters of each base station. Two modified versions of the classical genetic operators, named geographical crossover and multilevel mutation, are introduced. The fitness evaluation utilizes a ranking function, similar to Fonseca and Fleming's MOGA algorithm [21], and a sharing technique is employed to preserve diversity among solutions. In addition, a linear penalization model is used to handle the constraint considered (a minimal value for the covered area). A master-slave parallel implementation is presented for solving high dimension problems in reasonable times, with each slave processing a part of the geographical working area. The algorithm is evaluated with a large and realistic highway area generated by France Telecom. The authors analyze the convenience of using the proposed sharing strategy instead of concentrating on a small part of the Pareto front, showing that a better Pareto front sampling is obtained in the first case. Calegari et al. [7] developed a distributed GA to find the optimal placement of antennae. The authors compare a greedy technique, a Darwinian algorithm, and a PGA. The PGA uses a bit string representation for codifying the whole set of possible antenna locations and a parametric fitness function evaluating the covered area as a function of a parameter that can be tuned in order to obtain acceptable
service ratio values. Experiments were performed on two real-life cases: Vosges (rural scenario) and Geneva (urban scenario). On average, the PGA and the greedy technique show the same solution quality, but when an optimal solution is known, it can be found using the PGA, whereas the greedy approach usually falls into bad attractive local optima. Alba and Chicano [2] tackled the same problem with sequential and parallel GAs over an artificial instance. They performed a deep study of the parallel approach, evaluating the influence of the number of possible locations, the number of processors, and the migration rates. They found a sublinear speedup and concluded that the isolation of the subpopulations is beneficial for the search.

20.2.4 Other Network Design Problems

The works of Cruz et al. [14, 15] study the multilevel network design problem, which arises in many industrial contexts, including telecommunications. This problem integrates several optimization features such as topological design, dimensioning, and facility location in several hierarchical levels. The multilevel network optimization problem generalizes some specific telecommunication-related problems, such as tree construction problems or uncapacitated location problems. The authors focused on several master-slave parallel implementations of the classical branch & bound algorithm, suitable for execution on MIMD parallel computer systems, like the nowadays popular clusters and networks of workstations. They proposed a parallel-centralized version and a parallel-distributed version using different load-balancing policies. The evaluation of the algorithms was performed with both OR-Library instances and randomly generated test problems. The results obtained show promising computational efficiency, achieving an improvement over sequential execution times. The centralized version attains almost linear speedup values, while the distributed approach shows sublinear speedup behavior. The results are similar with all the load-balancing strategies employed. A network design benchmark built using data from France Telecom has been studied by Le Pape, Perron, and other researchers [5, 10, 32, 41, 42]. The benchmark consists in dimensioning the arcs of a telecommunication network so that a number of commodities can be simultaneously routed at minimum cost. It includes networks of several sizes and different constraints to take into account real-life aspects, such as security demands, installing multiple optical fibers on a single link, symmetric routing, number of hop/port constraints, and total node traffic constraints. The problems are presented in detail in [5, 32]; there are 21 base problems that are translated into 1344 different problem instances when taking into account the different combinations of active constraints. Different methods, such as constraint programming, standard mixed-integer programming, column generation approaches, and GAs, are compared. The main objective of these works is to study how to design robust, industry-quality methods, able to find a good quality solution for any problem in the benchmark in a short time (10 minutes on either a 1- or 4-processor computer). The publications focus on the first three approaches, which were identified as the most promising at an early stage [32]. In [42], a Large Neighborhood Search (LNS) schema is employed to complement the constraint programming approach.
The main idea is to "freeze" a large part of the solution and re-optimize the unfrozen part using constraint programming. This schema is iterated repeatedly with a new neighborhood each time. Different alternatives for introducing parallelism were compared: parallelism within the constraint programming algorithm, parallelism using a portfolio of algorithms running in parallel, and parallelism at the Large Neighborhood Search level (multipoint LNS), where different algorithms optimize different randomly chosen parts of the problem. The results showed that this last method obtained the best quality results. At the efficiency level, the experiments on a 4-processor machine showed a quasi-constant load over 90%, in contrast with the two previous alternatives, which used under 60% of the processing power available.
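The freeze-and-reoptimize loop behind LNS can be written compactly; in the sketch below a toy exhaustive search over the unfrozen positions stands in for the constraint programming subproblem solver used in those works, and the binary objective is an illustrative assumption.

    # Schematic Large Neighborhood Search: repeatedly free a small random fragment
    # of the incumbent solution and re-optimize it while the rest stays frozen.
    import itertools, random

    def cost(x):                               # placeholder objective to minimize
        return sum(x)

    def lns(x, iterations=200, fragment=4):
        best = list(x)
        for _ in range(iterations):
            free = random.sample(range(len(best)), fragment)    # unfrozen positions
            candidate = list(best)
            for values in itertools.product([0, 1], repeat=fragment):
                trial = list(best)
                for position, value in zip(free, values):
                    trial[position] = value
                if cost(trial) < cost(candidate):
                    candidate = trial
            best = candidate                   # keep the re-optimized solution
        return best

    print("final cost:", cost(lns([1] * 20)))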
20.3 NETWORK ROUTING

Several researchers have proposed parallel and distributed metaheuristics for solving different variants of network routing problems. We focus our review on the explicitly parallel and multiagent distributed approaches proposed. Agent-based approaches frequently solve the routing problem using implicit parallelism or distributed agent-based search without proposing a clear parallel schema. However, there are some agent-based proposals with an explicit distribution or parallelism. One example is presented by Islam, Thulasiraman, and Thulasiram [28, 49]. The paper presents a parallel version of the Ant Colony Optimization (ACO) algorithm for solving the all-pair routing problem in Mobile Ad Hoc Networks (MANETs). This kind of network builds connections on the fly when they are needed, using noncentralized communication with distributed administration. The network topology changes dynamically and this nondeterministic behavior makes the routing procedure difficult. The parallel ACO is employed to solve the problem of finding the shortest path from a given source to the destination using an exploration technique complemented with a search-ahead mechanism based on the pheromone concentration. The authors present a distributed-memory implementation with MPI. Based on the domain decomposition technique, the best configuration of the algorithm runs each ant on a different processor to determine the best route for all pairs of nodes in its subgraph. The results show a sublinear speedup with 10 processors. Even though the communication cost is much larger than the computation done by each ant, the algorithm shows a promising scalability factor as the problem dimension increases. Continuing with multiagent approaches, Sim and Sun [45] have proposed a multiple ant colony scheme for solving network routing problems. The approach adds a new aspect to the implicit distribution induced by the network itself: each node of the network employs multiple ant colonies in parallel. The claim is that this feature may help to mitigate stagnation or premature convergence of the ant colony schemes. Xingwei et al. [53] presented a parallel algorithm combining an evolutionary approach and SA for solving routing and wavelength assignment for multicast in Dense Wavelength-Division Multiplexing (DWDM) networks. The hybrid approach consists in a synchronous distributed island PGA, incorporating an SA technique to decide
whether or not to accept offspring and mutated individuals produced by the evolutionary operators when their fitness values are worse than their parents' values. The PGA-SA method is employed to solve the routing problem (construction of a multicast tree), while the wavelength assignment is done via a deterministic procedure based on Dijkstra's shortest path algorithm. The PGA uses a node-based binary coding to represent networks and a fitness function that considers the concept of user QoS satisfaction degree. The reported results show that the algorithm is able to improve the QoS values of the multicast tree. Eren and Ersoy [20] study the static establishment of virtual paths in ATM networks, where the network topology, link capacities, and traffic requirements (in the form of a list of demands from source to terminal nodes) are given. The problem consists in assigning a virtual path to each demand, while minimizing the maximum utilization among all links. The solution proposed is a hybrid PGA with an annealing mechanism in the selection stage (PAGA, Parallel Annealed Genetic Algorithm). The parallelism is implemented by means of an island model, where each processor runs the same algorithm but with different mutation and crossover parameters, and some selected individuals are migrated among the processors. The migration occurs synchronously, which can explain in part the seemingly poor running times. The method is compared with both sequential SA and GA over four different networks having between 26 and 50 nodes and three different traffic requirement patterns. Substantial improvements in quality were obtained at the cost of much longer running times (up to 12 times the running time of the GA). The total running times (about 30 seconds) make the method suitable for static problems but not for real-time optimization if the demands are varying. The algorithm was more robust than the GA and SA algorithms: the dispersion of the quality of the results over 100 runs was much smaller. Zappala [54] presented a distributed local search algorithm for building multicast routing trees for alternate path computation in large networks. In the alternate path routing approach, the network nodes mainly communicate using shortest paths, but alternative longer paths are available for the case in which the shortest paths are overloaded. In addition to topics related to the routing architecture and protocol, Zappala evaluates a distributed local path searching heuristic that utilizes only partial topology information (each receiver computes its own alternate paths) to find feasible alternate paths for the whole network. In Zappala's fully distributed approach, each receiver conducts a limited search by collecting paths from routers, developing a partial map of the network which is used to compute alternate paths. All communications occur between the receiver and individual routers, explicitly avoiding any message passing between routers, with the goal of providing a simple, purely local path computation algorithm. The author shows the efficacy of the distributed heuristic approach over a wide range of topologies and loads, proving that the local search algorithm can approximate the effectiveness of a global routing protocol with much lower overhead. The approach scales to large networks, where an exhaustive search is not viable, and its performance improves as the multicast group grows in size.
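The acceptance rule that the hybrid PGA-SA and annealed-selection schemes above borrow from simulated annealing, accepting worse offspring with a temperature-dependent probability, reduces to a few lines. The cooling schedule and cost values below are illustrative assumptions, not parameters of the cited algorithms.

    # Metropolis-style acceptance of an offspring that is worse than its parent:
    # improvements are always kept, deteriorations are kept with a probability
    # that shrinks as the temperature decreases.
    import math, random

    def accept(parent_cost, child_cost, temperature):
        if child_cost <= parent_cost:
            return True
        return random.random() < math.exp((parent_cost - child_cost) / temperature)

    temperature = 10.0
    for step in range(5):
        kept = accept(parent_cost=50.0, child_cost=52.0, temperature=temperature)
        print(step, temperature, kept)
        temperature *= 0.5                     # geometric cooling (illustrative)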
Table 20.2 summarizes the articles that have proposed parallel metaheuristics for network routing problems.
Table 20.2  Parallel metaheuristics applied to network routing problems

Author(s) | Year | Related optimization problem | Metaheuristic
Eren and Ersoy | 2001 | Virtual path routing | Annealing Genetic Algorithm
Sim and Sun | 2002 | Network routing | Ant Colony Optimization
Islam et al. | 2003 | All-pair routing problem in mobile ad hoc networks | Ant Colony Optimization
Xingwei et al. | 2003 | Routing and wavelength assignment for multicast in Dense Wavelength Division Multiplexing networks | Hybrid: Evolutionary Algorithm + Simulated Annealing
Zappala | 2004 | Multicast trees | Local Search
20.4 NETWORK ASSIGNMENT AND DIMENSIONING

In this section we summarize the works proposing parallel and distributed metaheuristics for problems related to assigning resources to a given network, dynamic dimensioning problems, and other miscellaneous applications. We have grouped the works into four categories: radio link frequency assignment, cellular networks, multicommodity network design, and other works. Table 20.3 summarizes the articles included in this section.
Table 20.3  Parallel metaheuristics applied to network assignment and dimensioning problems

Author(s) | Year | Related optimization problem | Metaheuristic
Eckstein | 1994 | Multicommodity network design | Branch & Bound
Hurley et al. | 1994 | Frequency assignment problem | Genetic Algorithm
Hurley et al. | 1996 | Radio link frequency assignment | Genetic Algorithm
Bienstock and Gunluk | 1996 | Capacitated network design | Branch & Bound
Gunluk | 1997 | Capacitated network design | Branch & Bound
Gendron et al. | 1997 | Uncapacitated network design | Branch & Bound
Kwok | 1999, 2000 | Dynamic channel assignment in mobile networks | Genetic Algorithm
Zhou et al. | 2000 | Media mapping in video on demand server network | Simulated Annealing
Lee and Kang | 2000 | Cell planning with capacity expansion in wireless networks | Genetic Algorithm
Weinberg et al. | 2000, 2001 | Frequency assignment | Hybrid: Genetic Algorithm + Tabu Search + Random Walk
Crainic and Gendreau | 2002 | Fixed charge multicommodity network design | Tabu Search
Quintero and Pierre | 2003 | Assigning cells to switches in mobile networks | Memetic Algorithm
Gendron et al. | 2003 | Multicommodity capacitated location problem | Hybrid: Variable Neighbourhood Descent + Slope Scaling
Thompson and Anwar | 2003 | Lightwave assignment in WDMs | Parallel Recombinative Simulated Annealing
Oliveira and Pardalos | 2004 | Power control in wireless ad hoc networks | Variable Neighbourhood Search
20.4.1 Radio Link Frequency Assignment

The Radio Link Frequency Assignment Problem (RLFAP) consists in assigning frequencies to a number of radio links in order to simultaneously satisfy a large number of constraints and minimize the number of different frequencies employed. This problem appears in radio networks and is known to be NP-hard. In an earlier proposal, Hurley, Crompton, and Stephens [13] solved the problem with a PGA. The authors compare the results obtained using two different chromosome representations: a "simple representation" codifying legal frequency values for each node in the network and an "alternative representation" grouping together those sites with the same frequency assigned. The fitness function is formulated to minimize several parameters related to the electromagnetic interference due to the use of similar frequency values for nearby transmitters. Computational results obtained when solving the problem on a simulated but realistic military scenario showed that the improved ordered representation proposed yields superior numerical results, with fewer constraint violations. Based on the previous work, Hurley, Thiel, and Smith [27] presented a comparative study of several metaheuristics applied to the RLFAP. The authors studied the assignment problem subject to several management constraints and proposed an SA, a distributed island PGA, and a TS procedure to solve it. The experimental results show that SA obtains the assignments with the lowest number of constraint violations. In addition, the SA and TS algorithms perform a more efficient search, taking advantage of specialized neighborhood search operators and generating more assignments than the GA, which employs nonspecialized operators and codification. The works of Weinberg, Bachelet, and Talbi [51, 52] propose to solve the problem with the COSEARCH parallel hybrid metaheuristic [3], which is based on the cooperation of three complementary agents, balancing the exploration of the search space and the exploitation of good solutions previously found. An adaptive memory acts as coordinator for exchanging information related to the search procedure among the searching agent (which implements a simple TS algorithm), the diversifying agent (a GA), and the intensifying agent (using a random walk). The parallel implementation follows the master-slave paradigm, where the master process manages the workers and hosts the adaptive memory, the GA, and the intensifying agent. The slaves consist of several sequential TS algorithms. The authors tested the COSEARCH metaheuristic on several benchmark problems provided by France Telecom. Using a parallel hybrid algorithm, Kwok [30, 31] tackled the Dynamic Channel Assignment (DCA) problem. This is a variant of the Frequency Assignment Problem where channels must be allocated to cells dynamically, depending on the traffic demands. To take advantage of both static and dynamic assignment models, Kwok proposed a quasi-static dynamic approach combining two modules: an off-line module that employs a PGA to generate a set of allocation patterns and an on-line module using a parallel local search method based on table-lookup and reassignment strategies. The hybrid parallel model is executed on a Linux-based cluster of PCs and reports better results than other DCA algorithms, in terms of both solution quality and efficiency.
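The interference-driven objectives used in these frequency assignment studies can be illustrated by a simple violation count over pairwise separation constraints. The constraint format and the toy instance are assumptions made for the example, not data from the France Telecom or military benchmarks.

    # Toy evaluation of a frequency assignment: count violated constraints of the
    # form |f_i - f_j| >= separation for pairs of potentially interfering links.
    def violations(assignment, constraints):
        return sum(1 for (i, j, separation) in constraints
                   if abs(assignment[i] - assignment[j]) < separation)

    constraints = [(0, 1, 3), (1, 2, 2), (0, 2, 5)]   # (link i, link j, min separation)
    assignment = {0: 10, 1: 12, 2: 14}                # frequency assigned to each link
    print("violated constraints:", violations(assignment, constraints))
    print("distinct frequencies used:", len(set(assignment.values())))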
20.4.2 Cellular Networks
With the growth of cellular telephony, interest in cellular networks has increased. Many aspects related to this kind of network are not new in the telecommunications domain, e.g., the radio link frequency assignment problem tackled in the previous section. However, some other aspects appear only in this context, such as assigning cells to switches and cell planning. Quintero and Pierre [43] proposed a parallel multipopulation memetic algorithm applied to the problem of assigning cells to switches in mobile networks. The objective of this problem consists in providing a cell assignment that minimizes the number and cost of required facilities when a mobile user changes the switch serving as relay and the mobile network needs to transfer a call in progress from one cell to another. The authors proposed a memetic algorithm combining the evolutionary search of GAs with local refinement strategies based on TS and SA. The parallel algorithm follows a distributed island model with subpopulations arranged in a fully meshed topology. It employs a nonbinary representation codifying the cells and the switch to which each is assigned, and the local search operator (TS or SA) is applied after the recombination and mutation stages. The experiments, performed on a network of 10 Pentium workstations at 500 MHz connected by a 100 Mbps LAN, show that the two local search strategies yield better result quality, improving by 30% over the best sequential GA results and by 20% over the best PGA results. The memetic version that incorporates TS has the best computational performance, being between 10 and 20 times faster than the SA version. The sequential GA is the slowest method. The authors also compare their memetic algorithms with a specific heuristic from Merchant and Sengupta [37] and a pure TS algorithm, showing that, although the memetic approaches are slower, they yield slight improvements in the cost function, which represent important fund savings over a 10-year period. Lee and Kang [33] studied the cell planning problem with capacity expansion in wireless communications. This problem consists in finding optimal locations and capacities for new base stations in order to cover the expanded and increased traffic demand of mobile phones, minimizing the installation cost of new base stations. They propose a TS algorithm and compare it with a Grouping PGA, whose main difference from the standard PGA is the use of group-oriented operators, suitable for grouping problems, including set covering. The PGA gives near-optimal solutions in Advanced Mobile Phone Service (AMPS) problems with up to 100 Time Division Accesses (TDAs), but the quality of the results degrades as the problem size increases, reporting gaps from the optimal solution of 20% in problems with 900 TDAs. Similar results are obtained when solving Code Division Multiple Access (CDMA) problems, reaching gap values between 25% and 30% for 2500 TDAs, while failing to meet the desired coverage factor. In both cases, the PGA did not achieve accurate results for large-size problems and was not competitive with the TS approach. The authors argued that this is due to the inaccurate penalty method used to handle the problem constraints.
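The memetic pattern just described, evolutionary variation followed by a local refinement of each offspring, reduces to a short loop. In the sketch below a simple hill climber stands in for the tabu search or simulated annealing refinements used by Quintero and Pierre, and the binary encoding and objective are illustrative placeholders.

    # Schematic memetic step: recombine, mutate, then locally refine each offspring.
    import random

    BITS = 30

    def fitness(ind):                          # placeholder objective
        return sum(ind)

    def local_refinement(ind, tries=20):
        best = list(ind)
        for _ in range(tries):                 # simple hill climbing on single bit flips
            neighbor = list(best)
            neighbor[random.randrange(BITS)] ^= 1
            if fitness(neighbor) > fitness(best):
                best = neighbor
        return best

    def memetic_offspring(p1, p2, p_mut=0.05):
        cut = random.randrange(1, BITS)
        child = p1[:cut] + p2[cut:]                                # recombination
        child = [b ^ (random.random() < p_mut) for b in child]     # mutation
        return local_refinement(child)                             # local search step

    population = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(10)]
    population = [memetic_offspring(*random.sample(population, 2)) for _ in range(10)]
    print("best fitness:", max(map(fitness, population)))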
20.4.3 Multicommodity Network Design

Several authors have presented parallel approaches for solving different variants of the multicommodity network design problem. The problem consists in deciding how to transport several commodities from sources to destinations over the links of a network with limited capacities. There are fixed construction costs related to the use of a link plus variable transportation costs related to the flow volumes. These kinds of problems are frequent in vehicle routing and logistic applications, but they also have several applications in telecommunications when addressing planning and operations management issues. A well-known problem of this kind looks for an optimal dimensioning of reserved capacities on the links of an existing telecommunication network to face "catastrophic" link failures. In a pioneering article, Eckstein [19] proposed a parallel branch & bound algorithm for solving several mixed-integer programming problems, among them optical fiber network design and multiperiod facility location problems. Eckstein studied several centralized and distributed parallelization approaches and load-balancing strategies, presenting comparative results on a set of 16 problems. The parallel branch & bound algorithm shows almost linear speedup values (with efficiency values near 0.8) even for the simplest parallel approaches. Experiments performed with up to 64 processors reveal that fully decentralized schemes scale correctly and are able to achieve overall performance similar to centralized approaches, while promising good prospects for larger configurations, where the centralized schemes might suffer bottleneck problems. Extending the previous approach, Bienstock and Gunluk [6], and Gunluk [25], analyzed parallel branch & bound algorithms for solving mixed-integer programming problems arising in capacitated network design. Considering a capacitated network and point-to-point traffic demands, these problems propose to install additional capacity on the edges of the network and route the traffic simultaneously, minimizing the overall cost. The authors presented a branch & cut algorithm and studied alternative parallel implementations to confront large real-life problem instances. In a first proposal, Gendron and Crainic [22] introduced a parallel branch & bound algorithm for solving the uncapacitated version of the problem when balancing requirements are involved. The algorithm consists of a synchronous initialization phase and an asynchronous exploration phase. It utilizes bounding procedures based on Lagrangian relaxation and known nondifferentiable optimization techniques. The parallel approach follows a master-slave model that mainly accelerates the bounding procedure, performing operations on several subproblems simultaneously without changing the tree search procedure. The parallel asynchronous exploration shows sublinear speedup on a set of 10 representative test problems. However, it achieves significant speedup values on larger instances, allowing a huge real-life planning problem to be solved in acceptable times. Later, Gendron, Potvin, and Soriano [23] designed a parallel hybrid heuristic for the multicommodity capacitated location problem with balancing requirements, combining a Variable Neighborhood Descent (VND) and a Slope Scaling (SS) method. The parallel implementation is based on a coarse-grained master-slave approach,
where the master process manages the memories while the slave processes perform the computations. In addition, it employs adaptive memories to provide new starting points for both the SS and VND methods. Experiments illustrate that adding a large number of processors leads to a diversification of the search procedure, improving the quality of the results. Crainic and Gendreau [11] studied in detail different alternatives for designing a cooperative parallel TS method for solving the Fixed-charge Capacitated Multicommodity Network Design Problem with linear costs (PCMND). The parallel TS method is based on several TS threads which communicate by means of a central memory or pool of solutions. Each individual thread works like the sequential TS by Crainic, Gendreau, and Farvolden [12], with a different set of parameter values (chosen from the pool of good parameters found for the sequential TS). The authors discuss in detail five different pool data selection strategies (regulating which solution from the central pool will be sent to a search thread that asks for a solution) and five external solution import strategies (determining when search threads will ask for a pool solution). These different strategies are compared on 10 problem instances, using 4, 8, and 16 search threads. The results indicate that, independently of the strategies, the parallel algorithm improves the quality of the results of the sequential algorithm (from the literature). Other conclusions are that the cooperative parallel search gives better results than the independent parallel search, and that the performance of the selection strategies depends on the number of processors. Regarding the solution import strategies, the basic communication criterion (importing a solution before a diversification operation and accepting it if the imported solution is better than the current best) outperforms the alternative, more sophisticated criterion. Based on these results, the authors fix the strategies and perform more experiments. The results indicate that the parallel implementations consistently reduce the optimality gap over the entire range of problems considered. They require longer times than the sequential algorithm but find good quality solutions faster.
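The central-memory cooperation described above can be sketched as a lock-protected pool that search threads consult before diversifying and update whenever they improve on its content. The random bit-flip walk below is a trivial placeholder for the tabu search threads, and all parameters are illustrative assumptions.

    # Schematic pool-based cooperation: several search threads occasionally import
    # the pool's best solution and export their own improvements to it.
    import random, threading

    BITS, STEPS = 40, 500
    pool = {"best": [0] * BITS}
    lock = threading.Lock()

    def cost(x):                               # placeholder objective to minimize
        return -sum(x)

    def search_thread(seed):
        rng = random.Random(seed)
        current = [rng.randint(0, 1) for _ in range(BITS)]
        for step in range(STEPS):
            if step % 50 == 0:                 # ask the pool before "diversifying"
                with lock:
                    if cost(pool["best"]) < cost(current):
                        current = list(pool["best"])
            current[rng.randrange(BITS)] ^= 1  # stand-in for one tabu search move
            with lock:                         # export improvements to the pool
                if cost(current) < cost(pool["best"]):
                    pool["best"] = list(current)

    threads = [threading.Thread(target=search_thread, args=(s,)) for s in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("best cost in the pool:", cost(pool["best"]))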
20.4.4 Other Works

The articles by Zhou, Lueling, and Xie [55, 56] describe a media-mapping problem in a video-on-demand server network. Given a server network architecture (a binary tree with a ring), the problem consists in deciding which media assets (actually television recordings) to store in each server, the encoding quality of these assets, and the routing of the client requests to the servers. The overall objective is the maximization of the QoS level, measured in terms of provided user coverage and media bit rates. The solution must satisfy constraints on server storage capacity and link bandwidth. This problem can be seen as an extension of the File Allocation Problem with the special property that the same video can be stored in files of different sizes, depending on the encoding quality chosen. The authors use the parSA library [29] to develop a parallel SA method for solving this problem. The initial solution is constructed in two phases: first a feasible solution is built and then a greedy strategy is applied to improve the QoS objective. The algorithm utilizes different neighborhood structures based on asset migration plus re-routing of demands (which
in one of the neighborhoods is optimized using backtracking). The articles report experimental results on 10 benchmark instances designed "ad hoc", with between 7 and 63 servers, 256 and 1024 different media assets, and different storage capacities, communication bandwidths, and access patterns. The proposed method obtained good quality solutions: less than 2% average gap from the upper bound for the best neighborhood structure.

Recently, Oliveira and Pardalos [40] faced the problem of making optimal use of mobile clients, such as cellular phones and personal digital assistants, over wireless ad hoc networks. The problem proposes to minimize resource consumption (such as battery power) by employing algorithmic techniques, and it is directly related to the definition of optimal strategies for data routing. The authors present mathematical programming models to compute the amount of power required by network users in a specific time period. Since those models are difficult to solve using exact algorithms, they suggest a distributed VNS to find accurate solutions in reasonable execution times. The distributed algorithm is executed over the mobile agents, distributing the computational effort among the different nodes in the wireless network, which cooperate to find a solution that determines the power level of each node. The algorithm divides the whole network topology into small parts and assigns them to different mobiles in the network. Each one computes the solution for its part of the network and communicates with its neighbors. The results obtained on a single-machine environment show that the distributed VNS algorithm is able to find accurate solutions, similar to or even better than those of the serial VNS and the relaxed integer programming method. In addition, the distributed algorithm shows very good computational times.

Thompson and Anwar [48] study the static lightwave assignment problem in a Wavelength Division Multiplexing (WDM) network equipped only with selective cross-connects (i.e., cross-connects which cannot apply wavelength conversions). The problem consists in assigning wavelengths to the different source-destination pairs in such a way that, on each link, all routed flows have different wavelengths (there is no conflict). To solve this problem, Thompson and Anwar employ a hybrid technique called Parallel Recombinative Simulated Annealing (PRSA), with features of both the GA and the SA. Like a GA, it is a population-based method with crossover and mutation operators, and like SA it employs the Metropolis criterion for deciding whether to accept the newly generated individuals. Parallelism is implemented following an island model with asynchronous migration. In the case study, the islands are organized into a ring topology, receiving individuals from one neighbor and sending them to another one. A small case study (25 nodes, 600 traffic parcels) was generated with a random network generator. The final conclusions claim that lower levels of interaction between the different populations (i.e., fewer migrants) lead to better convergence rates and solution quality.
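The Metropolis criterion mentioned above can be written in a few lines. The sketch below is our own illustration (the cost values and temperature are hypothetical), showing how a PRSA-like method would decide whether a newly generated individual is accepted:

    import math, random

    def metropolis_accept(parent_cost, child_cost, temperature):
        # Always accept an improving child; otherwise accept with
        # probability exp(-(child_cost - parent_cost) / temperature), as in SA.
        if child_cost <= parent_cost:
            return True
        return random.random() < math.exp(-(child_cost - parent_cost) / temperature)

    # Example: a child that is 0.5 units worse at temperature 2.0 is still
    # accepted with probability exp(-0.25), roughly 0.78.
    print(metropolis_accept(10.0, 10.5, temperature=2.0))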
20.5 CONCLUSIONS

In this chapter we have presented a summary of works dealing with the application of parallel metaheuristics in the field of telecommunications. We grouped the works into three categories: network design, network routing, and network assignment and dimensioning. As the preceding sections show, there are many problems in the telecommunication domain that are computationally intractable with classic exact techniques. For these problems, metaheuristic algorithms have been applied in the literature not only in sequential form but also as parallel methods, as this chapter clearly demonstrates. The advantages of the parallel metaheuristic techniques come from their computational efficiency as well as from the different exploration schemes used by the parallel search methods.
Acknowledgments

The third and fourth authors acknowledge partial funding by the Ministry of Science and Technology and FEDER under contract TIC2002-04498-C05-02 (the TRACER project).
REFERENCES

1. E. Alba, F. Almeida, M. Blesa, C. Cotta, M. Diaz, I. Dorta, J. Gabarro, J. Gonzalez, C. Leon, L. Moreno, J. Petit, J. Roda, A. Rojas, and F. Xhafa. MALLBA: A Library of Skeletons for Combinatorial Optimisation. In Proceedings of Euro-Par, pages 927-932, 2002.
2. E. Alba and F. Chicano. On the Behavior of Parallel Genetic Algorithms for Optimal Placement of Antennae in Telecommunications. International Journal of Foundations of Computer Science, 16(2):343-359, 2005.
3. V. Bachelet and E.-G. Talbi. COSEARCH: A Co-evolutionary Metaheuristic. In Proceedings of the Congress on Evolutionary Computation (CEC'2000), pages 1550-1557, San Diego, USA, 2000.
4. B. Baran and F. Laufer. Topological Optimization of Reliable Networks Using A-Teams. In Proceedings of the World Multiconference on Systemics, Cybernetics and Informatics (SCI'99). IEEE Computer Society, 1999.
5. R. Bernhard, J. Chambon, C. Le Pape, L. Perron, and J.-C. Regin. Resolution d'un Probleme de Conception de Reseau avec Parallel Solver. In Proceedings of JFPLC, page 151, 2002. (text in French).
6. D. Bienstock and O. Gunluk. A Parallel Branch & Cut Algorithm for Network Design Problems. In INFORMS, Atlanta, USA, November 1996.
7. P. Calegari, F. Guidec, P. Kuonen, and F. Nielsen. Combinatorial Optimization Algorithms for Radio Network Planning. Theoretical Computer Science, 263(1-2):235-265, 2001.
8. E. Canhi and M. Mejia. Experimental Results in Distributed Genetic Algorithms. In Proceedings of the Second International Symposium on Applied Corporate Computing, Monterrey, México, pages 99-108, 1994.
9. S. Canuto, M. Resende, and C. Ribeiro. Local Search with Perturbations for the Prize Collecting Steiner Tree Problem in Graphs. Networks, 38:50-58, 2001.
10. A. Chabrier, E. Danna, C. Le Pape, and L. Perron. Solving a Network Design Problem. Annals of Operations Research, 130:217-239, 2004.
11. T. Crainic and M. Gendreau. Cooperative Parallel Tabu Search for Capacitated Network Design. Journal of Heuristics, 8(6):601-627, 2002.
12. T.G. Crainic, M. Gendreau, and J.M. Farvolden. A Simplex-based Tabu Search Method for Capacitated Network Design. INFORMS Journal on Computing, 12(3):223-236, 2000.
13. S. Hurley, W. Crompton, and N.M. Stephens. A Parallel Genetic Algorithm for Frequency Assignment Problems. In Proceedings of the IMACS/IEEE Int. Symp. on Signal Processing, Robotics and Neural Networks, pages 81-84, Lille, France, 1994.
14. F. Cruz and G.R. Mateus. Parallel Algorithms for a Multi-level Network Optimization Problem. Parallel Algorithms and Applications, 18(3):121-137, 2003.
15. F. Cruz, G.R. Mateus, and J.M. Smith. Randomized Load-balancing for Parallel Branch-and-bound Algorithms for Multi-level Network Design. In Proceedings of the 12th Symposium on Computer Architecture and High Performance Computing, pages 83-90, Sao Pedro, Brazil, 2000.
16. G. Di Fatta, G. Lo Presti, and G. Lo Re. A Parallel Genetic Algorithm for the Steiner Problem in Networks. In The 15th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2003), Marina del Rey, CA, USA, pages 569-573, 2003.
17. S. Duarte and B. Baran. Multiobjective Network Design Optimisation Using Parallel Evolutionary Algorithms. In XXVII Conferencia Latinoamericana de Informática (CLEI'2001), Mérida, Venezuela, 2001. (text in Spanish).
18. S. Duarte, B. Baran, and D. Benitez. Telecommunication Network Design with Parallel Multiobjective Evolutionary Algorithms. In Proceedings of the IFIP/ACM Latin America Networking Conference, pages 1-11, 2003.
19. J. Eckstein. Parallel Branch-and-bound for Mixed Integer Programming. SIAM News, 27:12-15, 1994.
20. M. Eren and C. Ersoy. Optimal Virtual Path Routing Using a Parallel Annealed Genetic Algorithm. In Proceedings of the IEEE International Conference on Telecommunications, volume 1, pages 336-341, Bucharest, June 2001.
21. C.M. Fonseca and P.J. Fleming. Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization. In Genetic Algorithms: Proceedings of the Fifth International Conference, pages 416-423. Morgan Kaufmann, 1993.
22. B. Gendron and T.G. Crainic. A Parallel Branch-and-bound Algorithm for Multicommodity Location with Balancing Requirements. Computers & Operations Research, 24(9):829-847, 1997.
23. B. Gendron, J.-Y. Potvin, and P. Soriano. A Parallel Hybrid Heuristic for the Multicommodity Capacitated Location Problem with Balancing Requirements. Parallel Computing, 29(5):591-606, 2003.
24. S. Gordon and D. Whitley. Serial and Parallel Genetic Algorithms as Function Optimizers. In Proceedings of the Fifth International Conference on Genetic Algorithms, pages 177-183. Morgan Kaufmann, 1993.
25. O. Gunluk. Parallel Branch-and-cut: A Comparative Study. In International Symposium on Mathematical Programming, Lausanne, Switzerland, 1997.
26. R. Huang, J. Ma, T.L. Kunii, and E. Tsuboi. Parallel Genetic Algorithms for Communication Network Design. In Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms/Architecture Synthesis (PAS'97), pages 370-377. IEEE Computer Society, 1997.
27. S. Hurley, S.U. Thiel, and D.H. Smith. A Comparison of Local Search Algorithms for Radio Link Frequency Assignment Problems. In Official Program of the 1996 ACM Symposium on Applied Computing, pages 251-257. ACM Press, 1996.
28. M.T. Islam, P. Thulasiraman, and R.K. Thulasiram. A Parallel Ant Colony Optimization Algorithm for All-pair Routing in MANETs. In IEEE Computer Society Fourth IPDPS Workshop on Parallel and Distributed Scientific and Engineering Computing with Applications (PDSECA-2003), Nice, France, page 259, April 2003.
29. G. Kliewer and S. Tschoke. A General Parallel Simulated Annealing Library and its Application in Airline Industry. In Proceedings of the 14th International Parallel and Distributed Processing Symposium (IPDPS 2000), pages 55-61, Cancun, Mexico, 2000.
30. Y.K. Kwok. A Quasi-Static Cluster-Computing Approach for Dynamic Channel Assignment in Cellular Mobile Communication Systems. In Proceedings of the 1999 IEEE Vehicular Technology Conference (VTC'99-Fall), volume 4, pages 2343-2347, Amsterdam, Netherlands, 1999. IEEE Press.
31. Y.K. Kwok. Quasi-Static Dynamic Channel Assignment Using a Linux PC Cluster. In Proceedings of the 4th IEEE International Conference on High Performance Computing in the Asia-Pacific Region - Volume I, pages 170-175, Beijing, China, May 14-17, 2000.
32. C. Le Pape, L. Perron, J.-C. Regin, and P. Shaw. Robust and Parallel Solving of a Network Design Problem. In Pascal Van Hentenryck, editor, Proceedings of CP 2002, pages 633-648, Ithaca, NY, USA, September 2002.
33. C.Y. Lee and H.G. Kang. Cell Planning with Capacity Expansion in Mobile Communications: A Tabu Search Approach. IEEE Transactions on Vehicular Technology, 49(5):1678-1690, 2000.
34. G. Lo Re, G. Lo Presti, P. Storniolo, and A. Urso. A Grid Enabled Parallel Hybrid Genetic Algorithm for SPN. In M. Bubak, P.M.A. Slot, G.D. Van Albada, and J. Dongarra, editors, Proceedings of the 2004 International Conference on Computational Science (ICCS'04), Lecture Notes in Computer Science, Vol. 3039, pages 156-163, Krakow, Poland, June 6-9, 2004. Springer.
35. S.L. Martins, C.C. Ribeiro, and M.C. Souza. A Parallel GRASP for the Steiner Problem in Graphs. In Workshop on Parallel Algorithms for Irregularly Structured Problems, pages 285-297, 1998.
36. S.L. Martins, M.G.C. Resende, C.C. Ribeiro, and P.M. Pardalos. A Parallel GRASP for the Steiner Tree Problem in Graphs Using a Hybrid Local Search Strategy. Journal of Global Optimization, 17:267-283, 2000.
37. A. Merchant and B. Sengupta. Assignment of Cells to Switches in PCS Networks. IEEE/ACM Transactions on Networking, 3(5):521-526, October 1995.
38. H. Meunier, E.-G. Talbi, and P. Reininger. A Multiobjective Genetic Algorithm for Radio Network Optimization. In Proceedings of the 2000 Congress on Evolutionary Computation (CEC'00), pages 317-324, California, USA, 2000. IEEE Press.
39. S. Nesmachnow, H. Cancela, and E. Alba. Evolutive Techniques Applied to Reliable Communication Network Design. In Tercer Congreso Español de Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB'04), pages 388-395, Córdoba, Spain, 2004. (text in Spanish).
40. C.A.S. Oliveira and P.M. Pardalos. A Distributed Optimization Algorithm for Power Control in Wireless Ad Hoc Networks. In Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS'04), page 177, Santa Fe, New Mexico, April 26-30, 2004.
41. L. Perron. Parallel and Random Solving of a Network Design Problem. In AAAI 2002 Workshop on Probabilistic Approaches in Search, pages 35-39, 2002.
42. L. Perron. Fast Restart Policies and Large Neighborhood Search. In Proceedings of CPAIOR'03, Montreal, Canada, May 8-10, 2003.
43. A. Quintero and S. Pierre. Sequential and Multi-population Memetic Algorithms for Assigning Cells to Switches in Mobile Networks. Computer Networks: The International Journal of Computer and Telecommunications Networking, 43(3):247-261, October 2003.
44. C.C. Ribeiro and I. Rosseti. A Parallel GRASP for the 2-path Network Design Problem. In Burkhard Monien and Rainer Feldmann, editors, Parallel Processing: 8th International Euro-Par Conference, Lecture Notes in Computer Science, Vol. 2400, pages 922-926, Paderborn, Germany, 2002.
45. K.M. Sim and W.H. Sun. Multiple Ant-Colony Optimization for Network Routing. In Proceedings of the First International Symposium on Cyber Worlds, pages 277-281, 2002.
46. N. Srinivas and K. Deb. Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms. Evolutionary Computation, 2(3):221-248, 1994.
47. R. Tanese. Distributed Genetic Algorithms. In Proceedings of the Third International Conference on Genetic Algorithms, pages 434-439. Morgan Kaufmann, 1989.
48. D.R. Thompson and M.T. Anwar. Parallel Recombinative Simulated Annealing for Wavelength Division Multiplexing. In Proceedings of the 2003 International Conference on Communications in Computing, pages 212-217, Las Vegas, NV, June 23-26, 2003.
49. P. Thulasiraman, R.K. Thulasiram, and M.T. Islam. An Ant Colony Optimization Based Routing Algorithm in Mobile Ad Hoc Networks and its Parallel Implementation. In Laurence Tianruo Yang and Yi Pan, editors, High Performance Scientific and Engineering Computing: Hardware/Software Support, chapter 18, pages 267-284. Kluwer Academic Publishers, 2004.
50. S. Watanabe, T. Hiroyasu, and M. Miki. Parallel Evolutionary Multi-criterion Optimization for Mobile Telecommunication Networks Optimization. In Proceedings of the EUROGEN 2001 Conference, pages 167-172, Athens, Greece, September 19-21, 2001.
51. B. Weinberg, V. Bachelet, and E.-G. Talbi. A Coevolutionary Metaheuristic for the Frequency Assignment Problem. In Frequency Assignment Workshop, London, England, July 2000.
52. B. Weinberg, V. Bachelet, and E.-G. Talbi. A Co-evolutionist Meta-heuristic for the Assignment of the Frequencies in Cellular Networks. In First European Workshop on Evolutionary Computation in Combinatorial Optimization (EvoCOP'2001), Lecture Notes in Computer Science, Vol. 2037, pages 140-149, Lake Como, Italy, April 2001.
53. W. Xingwei. A Multi-population-parallel-genetic-simulated-annealing-based QoS Routing and Wavelength Assignment Integration Algorithm for Multicast in DWDM Networks. In 16th APAN Meetings / Advanced Network Conference, Busan, August 2003.
54. D. Zappala. Alternate Path Routing for Multicast. IEEE/ACM Transactions on Networking, 12(1):30-43, 2004.
55. X. Zhou, R. Lueling, and L. Xie. Heuristic Solutions for a Mapping Problem in a TV-Anytime Server Network. In Workshops of the 14th IEEE International Parallel and Distributed Processing Symposium (IPDPS'00), Lecture Notes in Computer Science, Vol. 1800, pages 210-217, Cancun, Mexico, 2000. Springer Verlag.
56. X. Zhou, R. Lueling, and L. Xie. Solving a Media Mapping Problem in a Hierarchical Server Network with Parallel Simulated Annealing. In Proceedings of the 29th International Conference on Parallel Processing (ICPP'00), pages 115-124, Toronto, 2000. IEEE Computer Society.
57. E. Zitzler. Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications. PhD thesis, Swiss Federal Institute of Technology, Zurich, 1999.
21 Bioinformatics and Parallel Metaheuristics

OSWALDO TRELLES, ANDRÉS RODRÍGUEZ
Universidad de Málaga, Spain
21.1 INTRODUCTION
Many bioinformatics applications involve hard combinatorial searches over a large solution space. This scenario seems to be a natural application domain for parallel metaheuristics. This chapter surveys the computational strategies followed to parallelize the most used software in the bioinformatics arena, laying special emphasis on metaheuristic approaches. It closely follows the document "On the Parallelization of Bioinformatics Applications" [71], but extends both the set of applications, to include metaheuristic methods, and the computational details, with in-depth comments on efficiency and implementation issues and a special focus on information technology. The studied algorithms are computationally expensive and their computational patterns range from regular, such as database-searching applications, to very irregularly structured patterns (phylogenetic trees). Fine- and coarse-grained parallel strategies are discussed for these very diverse sets of applications. This overview outlines computational issues related to parallelism, physical machine models, parallel programming approaches, and scheduling strategies for a broad range of computer architectures. In particular, it deals with shared, distributed, and shared-distributed memory architectures.

21.1.1 Inter-Disciplinary Work
Born almost at the same time 50 years ago, molecular biology and computer science have grown explosively as separate disciplines. However, just as two complementary DNA strands bind together in a double helix to better transmit genetic information, an evolving convergence has created an interrelationship between these two branches of science. In several areas, the presence of one without the other is unthinkable. Not only has traditional sequential Von Neumann-based computing been fertilized through this interchange of programs, sequences, and structures, but the biology
field has also challenged high performance computing with a broad spectrum of demanding applications (for CPU, main memory, storage capacity, and I/O response time). Strategies using parallel computers are driving new solutions that seemed unaffordable only a few years ago.

21.1.2 Information Overload

With the growth of the information culture, efficient digital searches are needed to extract and abstract useful information from massive data. In the biological and biomedical fields, massive data take the form of bio-sequence flat files, 3D structures, motifs, 3D microscopic image files, and more recently, videos, movies, animations, etc. However, while genome projects and DNA array technology are constantly and exponentially increasing the amount of data available (for statistics see http://www3.ebi.ac.uk/Services/DBStats), our ability to absorb and process this information remains nearly constant.

It was only a few years ago when we were confident that the evolution of computer processing speed, increasing exponentially like some areas of knowledge in molecular biology, could handle the growing demand posed by bioinformatic applications. Processing power has jumped from the once-impressive 4.77 MHz of the early Intel 8088 to more than 3.5 GHz in the current Pentium IV and AMD Athlon 64 gallery. Most probably, commercial processors with up to 5 GHz will be available in the course of this decade; moreover, Intel estimates that 10 GHz CPUs will be available by 2010. This exponential growth rate can also be observed in the development of practically every computer component, such as the number of CPU transistors, memory access time, cache size, etc.

However, contemporary genome projects have delivered a blow to this early confidence. Since the completion of the first whole organism's genome (Saccharomyces, mid-1998), the growth rates for biological data have outstripped sequential computing processing capability. At this point, sequential (one-processor) computing can address only a small part of the massive, multidimensional biological information to be processed. Under this scenario, comprehension of the data and understanding of the data-described biological processes could remain incomplete, causing us to lose vast quantities of valuable information because CPU power and time constraints could fail to follow critical events and trends.

21.1.3 Computational Resources
From a computational point of view, there are several ways to address the lack of computing power for bioinformatics. The first is developing new, faster heuristic algorithms that reduce the computational space for the most time-consuming tasks [3][54]. The second is incorporating these algorithms into the ROM of a specialized chip (e.g., the bio-accelerator at the Weizmann Institute, http://sgbcd.weizmann.ac.il/). The third and most promising option, however, is parallel computing. Two or more microprocessors can be used simultaneously, in parallel processing, to divide
and conquer tasks that would overwhelm a single, sequential processor. However promising, parallel computing still requires new paradigms in order to harness the additional processing power for bioinformatics. Before this document embarks on a detailed overview of the parallel computing software currently available to biologists, it is useful to explore a few general concepts about the biological concerns and about computer architectures, as well as the parallel programming approaches that have been used for addressing bioinformatic applications.
21.2 BIOINFORMATICS AT A GLANCE

One of the major challenges for computer scientists who wish to work in the domain of computational biology is becoming fluent in the basics of biological knowledge and its large technical vocabulary. Of course we do not pretend to fully introduce the reader to such specialized vocabulary (see, for example, [1, 42]); nothing could be further from a realistic objective for this short document. However, a minimal understanding of the relationships and internals of biological data is important in order to design coherent and useful bioinformatic software. Helping to answer questions that have been pursued by nearly all cultures is by itself of profound interest, and computers have long been at the borderline of this endeavour.

All living organisms are endowed with genetic material that carries the instructions to build all the other constituents of the cell. This information is mainly stored in long strands of DNA grouped into X-shaped structures called chromosomes. DNA instructions are written in a four-letter alphabet represented by A, C, G, T, which correspond to the Adenine, Cytosine, Guanine, and Thymine nucleotides. All of the genetic information of an organism is referred to as its genome. The genome size varies from a few hundred thousand nucleotides in some bacteria to more than 10^11 nucleotides in the salamander (the human genome is approximately 3,100 million nucleotides long). But not all DNA in eukaryotes codes for proteins. Only around 5% of the human genome is formed by coding regions called exons, separated by long strands of noncoding regions named introns. Moreover, the instructions for producing a particular protein -called genes- are normally composed of several exons separated by introns inserted into them. These introns are spliced out before the sequence is translated into amino acids (the constituents of proteins).

The cell machinery is able to interpret the gene instructions following the rules of the genetic code. Each non-overlapping triplet of nucleotides, called a codon, codes for one of the 20 different amino acids. Observe that four nucleotides can code 4^3 = 64 possible triplets, which is more than the 20 needed to code for each amino acid. Three of these codons designate the end of a protein sequence (stop codons). This means that most amino acids are encoded by more than one codon, which is known as the degeneracy of the code.
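As a small, concrete illustration of the genetic code just described, the following Python fragment translates one reading frame of a DNA string; only a handful of the 64 codons are listed in this partial table, so it is a didactic sketch rather than a complete translator:

    # Partial standard codon table (DNA codons); '*' marks a stop codon.
    CODON_TABLE = {
        "ATG": "M", "TGG": "W", "TTT": "F", "TTC": "F",
        "AAA": "K", "AAG": "K", "GGT": "G", "GGC": "G",
        "TAA": "*", "TAG": "*", "TGA": "*",
    }

    def translate(dna, frame=0):
        # Translate one reading frame, stopping at the first stop codon.
        protein = []
        for i in range(frame, len(dna) - 2, 3):
            aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X': codon missing from this partial table
            if aa == "*":
                break
            protein.append(aa)
        return "".join(protein)

    print(translate("ATGAAAGGTTGGTAA"))  # -> "MKGW" (Met-Lys-Gly-Trp, then stop)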
Many technological breakthroughs have made it possible to obtain the DNA sequence of whole organisms. A genomic project is the effort to obtain the sequence of the DNA of a given organism. Being able to divide the genome into moderate-sized chunks is a prerequisite to determining its sequence. Sequencing is performed at a resolution of a few thousand base-pairs at a time. Thus, in order to determine the sequence of large pieces of DNA, many different overlapping chunks must be sequenced, and then these sequences must be assembled. A first draft of the human genome was obtained at the end of the last century, and nearly 100 higher eukaryote genomes are now available (see http://www.ebi.ac.uk/genomes).

Although knowing the genome composition of a given organism is an important achievement, it is only the first step in understanding the biological processes that underlie life. The second step is to identify the genes hidden in the mountains of DNA information. But the process is not as easy as it seems. The basic process of synthesizing proteins maps from a sequence of codons to a sequence of amino acids. However, there are some important complications: since codons come in triplets, there are three possible places to start parsing a segment of DNA; it is also possible to read off either strand of the double helix; and finally, there are well-known examples of DNA sequences that code for proteins in both directions with several overlapping reading frames.

The central feature of living organisms is their ability to reproduce and become different through an accumulative process named evolution. In order to evolve, there must be a source of variation, such as random changes or mutations (inserting a new nucleotide, deleting an existing one, or changing one nucleotide into another), sexual recombination, and various other kinds of genetic rearrangements. These changes modify the genetic information passed from parent to offspring. The analysis of such variations allows us to determine the way in which organisms have diverged (phylogenetic analysis). This is mostly performed by similarity comparison of molecular sequences. The similarities and differences among closely related molecules provide important information about the structure and function of those molecules.

Once a gene has been identified, and thus the protein it encodes, the next step is to disclose the role of the protein in the organism. The sequence of amino acid residues that make up a protein is called the primary structure of the protein, but the function the protein performs is more related to its three-dimensional conformation, and one of the major unsolved problems in molecular biology is being able to predict the structure and function of a protein from its amino acid sequence. In raw terms, the folding problem involves finding the mapping from the primary sequence (a sequence of from dozens to several thousand symbols, drawn from a 20-letter alphabet) to the real-numbered locations of the thousands of constituent atoms in 3D space. An approach to the protein folding problem starts with the prediction of the secondary structure of the protein, which refers to local arrangements of a few to a few dozen amino acid residues that take on particular conformations seen repeatedly in many different proteins: corkscrew-shaped conformations called α-helices, long flat sheets called β-strands, and a variety of small structures that link other structures: turns.
The next problem is to determine the position of the atoms in a folded protein - known as its tertiary structure. Finally, some proteins only become functional when assembled with other molecules; this association is termed the quaternary structure of the protein.
Although every cell has the same DNA, at any particular time a given cell is producing only a small fraction of the proteins coded for in its DNA. The amount of each protein is precisely regulated for the cell to function properly in a given environment. Thus, the cell machinery modifies the level of proteins in response to changes in the environmental conditions or other changes. Although the amount of protein produced is also important, genes are generally said to be expressed or inhibited. Recent advances in gene-monitoring microarray technology [58] have enabled the simultaneous analysis of thousands of gene transcriptions in different developmental stages, tissue types, clinical conditions, organisms, etc. The availability of such expression data affords insight into the functions of genes as well as their interactions, assisting in the diagnosis of disease conditions and in monitoring the effects of medical treatments.

The revolution in biology comes from the knowledge of the basic transformations of intermediary metabolism, which can involve dozens or hundreds of catalyzed reactions. These combinations of reactions, which accomplish tasks like turning food into usable energy or compounds, are called metabolic pathways. Because of the many steps in these pathways and the widespread presence of direct and indirect feedback loops, they can exhibit much counterintuitive behavior.

When Doolittle et al. (1983) [14] used the nascent genetic sequence database to prove that a cancer-causing gene was a close relative of a normal growth factor, molecular biology labs all over the world began installing computers or linking up to networks to do database searches. Since then, a bewildering variety of computational resources for biology have arisen. Technological breakthroughs such as high-throughput sequencing and gene-expression monitoring technology have nurtured the "omics" revolution, enabling the massive production of data. Unfortunately, this valuable information is often dumped into proprietary data models, and specific services are developed for data access and analysis without forethought to the potential external exploitation and integration of such data. Dealing with the exponential growth rates of biological data was a simple problem when compared with the problem posed by the diversity, heterogeneity, and dispersion of data [55]. Nowadays, the accumulated biological knowledge needed to produce a more complete view of any biological process is disseminated around the world in the form of molecular sequence and structure databases, frequently as flat files, as well as image/scheme-based libraries, web-based information with particular and specific query systems, etc. Under these conditions, parallel computers and grid technology are a clear alternative to help in exploiting this plethora of interrelated information, pointing to the integration of these information sources as a clear and important technological priority.
21.3 PARALLEL COMPUTERS

21.3.1 Parallel Computer Architectures: Taxonomy

A parallel computer uses a set of processors that are able to cooperate in solving computational problems [22]. This cooperation is made possible, first, by splitting the computational load of the problem (tasks or data) into parts and, second, by reconnecting the partial computations in order to create an accurate outcome. The way in which load distribution and reconnection (communications) are managed is heavily influenced by the system that will support the execution of a parallel application program.

Parallel computer systems are broadly classified into two main models based on Flynn's (1972) [21] specifications: single-instruction multiple-data (SIMD) machines and multiple-instruction multiple-data (MIMD) machines. SIMD machines are the dinosaurs of the parallel computing world, once powerful, but now facing extinction. A typical SIMD machine consists of many simple processors (hundreds or even thousands), each with a small local memory. Every processor must execute, at each computing or "clock" cycle, the same instruction over different data. When a processor needs data stored on another processor, an explicit communication must pass between them to bring it to local memory. The complexity and often the inflexibility of SIMD machines, strongly dependent on synchronization requirements, have restricted their use mostly to special-purpose applications.

MIMD machines are more amenable to bioinformatics. In MIMD machines, each computational process executes at its own rhythm in an asynchronous fashion with complete independence of the other computational processes [34]. Memory architecture has a strong influence on the global architecture of MIMD machines, becoming a key issue for parallel execution, and it frequently determines the optimal programming model. It is not difficult to distinguish between shared and distributed memory. A system is said to have shared-memory architecture if any process, running on any processor, has direct access to any local or remote memory in the whole system. Otherwise, the system has distributed-memory architecture.

Shared-memory architecture brings several advantages to bioinformatic applications. For instance, a single address map simplifies the design of parallel programs. In addition, there is no "time penalty" for communication between processes, because every byte of memory is accessible in the same amount of time from any CPU (uniform memory access: UMA architecture). However, nothing is perfect, and shared memory does not scale well as the number of processors in the computer increases. Distributed-memory systems scale very well, on the other hand, but the lack of a single physical address map for memory incurs a time penalty for interprocess communication (nonuniform memory access: NUMA architecture).

Current trends in multiprocessor design try to achieve the best of both memory architectures. A certain amount of memory is physically attached to each node (distributed architecture), but the hardware creates the image of a single memory for the
whole system (shared architecture). In this way, the memory installed in any node can be accessed from any other node as if all memory were local, with only a slight time penalty.

A few years ago, two technological breakthroughs made possible another exciting approach to parallel computing. The availability of very fast processors in workstations, together with the widespread deployment of networks, led to the notion of a "virtual parallel computer" that connects several fast microcomputers by means of a Local Area Network (LAN). This distributed-memory system was called multicomputer architecture. Multicomputer configurations are constructed mainly with clusters of workstations (COWs), although one emerging multicomputer architecture is the Beowulf cluster (http://www.beowulf.org), which is composed of ordinary hardware components (like any PC) together with public domain software (like Linux, PVM, or MPI). A server node controls the whole cluster, serving files to the client nodes.
Fig. 21.1 Summarized parallel computer architecture taxonomy and memory models. Many forms of parallelism exist today. Some architectures bring together a relatively small number of very tightly coupled processors. In other designs, the coupling of processors is relatively loose, but the number of processors can scale up to the thousands. A diagram of the parallel architecture taxonomy is presented on the left. On the right, we show the most used memory models available for these architectural designs.
Multicomputers bring several advantages to parallel computing: cost (on average, one order of magnitude cheaper for the same computational power), maintenance (replacing faulty nodes), scalability (adding new nodes), and code portability. Some drawbacks also exist, such as the lack of available software that enables management of the cluster as one integrated machine. In addition to this, current network technology has high latency and insufficient bandwidth to handle fast parallel processing.
These factors limit the effectiveness of this architecture at the present time, although it looks promising given the expected capabilities of future technologies.

21.3.2 Parallel Programming Models

In simple terms, parallel software enables a massive computational task to be divided into several separate processes that execute concurrently through different processors to solve a common task. The method used to divide tasks and rejoin the end result can be used as a point of reference to compare different alternative models for parallel programs. In particular, two key features can be used to compare models:

1. Granularity: the relative size of the units of computation that execute in parallel (coarseness or fineness of task division), and
2. Communication: the way that separate units of computation exchange data and synchronize their activity.

Most of today's advanced single-microprocessor architectures are based on the superscalar and multiple-issue paradigms (MIPS R10000, PowerPC, UltraSPARC, Alpha 21264, Pentium III, etc.). These paradigms have been developed to exploit Instruction Level Parallelism (ILP): the hardware level of granularity.

The finest level of software granularity is intended to run individual statements over different subsets of a whole data structure. This concept is called data-parallelism, and it is mainly achieved through the use of compiler directives that generate library calls to create lightweight processes called threads and distribute loop iterations among them.

A second level of granularity can be formulated as a "block of instructions". At this level, the programmer (or an automatic analyzer) identifies sections of the program that can safely be executed in parallel and inserts the directives that begin to separate tasks. When the parallel program starts, the runtime support creates a pool of threads which are unblocked by the runtime library as soon as a parallel section is reached. At the end of the parallel section, all extra processes are suspended and the original process continues to execute.

Ideally, if we have n processors, the program should run n times faster in terms of wall clock time. In real implementations, however, the performance of a parallel program is decreased by synchronization between processes, interaction (information interchanges), and load imbalance (idle processors while others are busy). Coordination between processes represents a source of overhead, in the sense that it requires some time added to the pure computational workload. Much of the effort that goes into parallel programming involves increasing efficiency. The first attempt to reduce parallelization penalties is to minimize the interactions between parallel processes. The simplest way, when possible, is to reduce the number of task divisions; in other words, to create coarsely grained applications.
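For reference, the speedup on p processors is usually defined as S_p = T_1 / T_p (T_1 being the sequential run time and T_p the parallel run time), and the parallel efficiency as E_p = S_p / p; the overheads just listed are what keep E_p below 1 in practice. As a minimal, purely illustrative sketch of how independent loop iterations can be farmed out to a pool of workers (the toy score() function and the pool size are our own choices), consider:

    from concurrent.futures import ProcessPoolExecutor

    def score(item):
        # Stand-in for the real per-iteration work of an independent loop.
        return sum(i * i for i in range(item))

    if __name__ == "__main__":
        data = list(range(1000, 1016))
        # The iterations are independent, so they can be dispatched to p workers;
        # ideally the wall clock time drops by a factor of p, minus the overheads above.
        with ProcessPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(score, data))
        print(results[:3])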
Once the granularity has been decided, a crucial question arises: how will the parallel processes interact to coordinate their behavior? Communications are needed to enforce correct behavior and create an accurate outcome.

21.3.3 Communications

When shared memory is available, interprocess communication is usually performed through shared variables. When several processes are working over the same logical address space, locks, semaphores, or critical sections (blocks of code that only one process can execute at a time) are required for safe access to shared variables. When the processors use distributed memory, all interprocess communication must be performed by sending messages over the network. With this message-passing paradigm, the programmer needs to keep in mind where the data are, what to communicate, and when to communicate to whom. Library subroutines are available to facilitate message-passing constructions: PVM [63], MPI (http://www.mpi-forum.org/index.html), etc. As one might imagine, writing parallel code for a disjoint memory address space is a difficult task, especially for applications with irregular data access patterns. To facilitate this programming task, software distributed shared memory provides the illusion of shared memory on top of the underlying message-passing system (e.g., TreadMarks, http://www.cs.rice.edu/~willy/TreadMarks/overview.html).
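The guarded access to a shared variable mentioned above can be sketched in a few lines; the fragment below is our own illustration (not tied to any of the libraries cited) of a critical section protecting a shared pointer, of the kind later used to distribute workload from a shared-memory counter:

    import threading

    next_block = 0                   # shared pointer (e.g., into a sequence database)
    pointer_lock = threading.Lock()  # guards the critical section

    def get_next_block(block_size):
        # Critical section: read and advance the shared pointer atomically.
        global next_block
        with pointer_lock:
            start = next_block
            next_block += block_size
        return start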
21.3.4 Task Scheduling Strategies

Common knowledge gained from working on parallel applications suggests that, to obtain an efficient parallel implementation, it is fundamental to achieve a good distribution of both data and computations. In general, any parallel strategy represents a trade-off between reducing communication time and improving the computational load balance.

The simplest task scheduling strategy is based on a master/slave approach. In essence, one of the processors acts as a master, scheduling and dispatching blocks of tasks (e.g., pairwise sequence alignments) to the slaves, which, in turn, perform the typical calculations specified by the algorithm. When a slave completes one block, the master schedules a new block of tasks and repeats this process until all tasks have been computed. Efficiency can be improved by slaves prefetching tasks from the master so as to overlap computations and communications. Efficiency is further improved by caching problems in the slaves, so that slaves communicate with the master only when no problems are available locally. As the number of slaves scales upward, slaves can be divided into sets, each with a submaster, in a hierarchical fashion. Finally, in a fully decentralized model, each processor manages its own pool of tasks, and idle slave processors request tasks from other processors. One can easily see how bioinformatics applications, with their massive data calculation loads, would be amenable to parallel processing.
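The master/slave scheme with on-demand dispatching just described can be outlined as follows; this is a generic, illustrative sketch using Python's multiprocessing queues, with a placeholder work() function and an arbitrary number of slaves:

    from multiprocessing import Process, Queue

    def work(task):
        return task * task            # placeholder for the real computation

    def slave(task_queue, result_queue):
        while True:
            task = task_queue.get()
            if task is None:          # sentinel: no more work
                break
            result_queue.put(work(task))

    if __name__ == "__main__":
        tasks, n_slaves = list(range(100)), 4
        task_queue, result_queue = Queue(), Queue()
        slaves = [Process(target=slave, args=(task_queue, result_queue))
                  for _ in range(n_slaves)]
        for p in slaves:
            p.start()
        for t in tasks:               # the master dispatches tasks on demand;
            task_queue.put(t)         # idle slaves simply pick up the next one
        for _ in slaves:
            task_queue.put(None)      # one sentinel per slave
        results = [result_queue.get() for _ in tasks]
        for p in slaves:
            p.join()
        print(len(results), "results collected")

Hierarchical or fully decentralized variants would replace the single task queue with per-group or per-processor pools of tasks.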
At this point, a very schematic and abbreviated description of parallel architectures has been presented for easier comprehension. A more academic, up-to-date, and detailed description can be found, for example, in Tanenbaum (1999) ([64], chapter 8: Parallel Computer Architectures).
21.4 BIOINFORMATIC APPLICATIONS

In this section, different and routinely used algorithms are presented to describe the strategies followed to parallelize bioinformatic software. The discussion is organized by the task-level computational pattern observed in such algorithms, from regular to irregularly structured [56]. Traditionally, a regular/irregular classification, also named synchronous/asynchronous (with their respective semi-regular and loosely synchronous intermediate levels), has been used in a way closely related to whether computations were performed over dense or sparse matrices. However, when working with non-numerical applications, as is the case for most bioinformatic applications, the proportion of dependency-free tasks, the data access pattern, and the task homogeneity are more appropriate indices with which to classify applications.

21.4.1 Regular Computational Pattern: Database Searching

Database searching (DBsrch) is the most heavily used bioinformatic application. It is also one of the most familiar applications with which to begin a discussion about parallelization in bioinformatics: DBsrch has a very simple form as far as data flow is concerned, and a broad range of strategies have been proposed to apply parallel computing. The primary influx of information for bioinformatics applications is in the form of raw DNA and protein sequences. Therefore, one of the first steps towards obtaining information from a new biological sequence is to compare it with the set of known sequences contained in the sequence databases. Results often suggest functional, structural, or evolutionary analogies between the sequences. Two main sets of algorithms are used for pairwise comparison (the individual task in a DBsrch application):

1. Exhaustive algorithms based on dynamic programming methodology [48][61].

2. Heuristic (faster and most used) approaches, such as the FASTA [73][43][54] and BLAST [2][3] families.

DBsrch applications allow two different granularity alternatives to be considered: fine- and coarse-grained parallelism. Early approaches focused on data-parallelism over SIMD machines (notably the ICL-DAP massively parallel computer), starting with the pioneering work of Coulson et al. (1987) [10]. Deshpande et al. (1991) [13] and Jones (1992) [36] presented work on hypercubes and CM-2 computers. Soon after, Sturrock and Collins (1993) [62] implemented the exhaustive dynamic programming algorithm of Smith and Waterman (1981) [61] in the MasPar family of parallel
machines (from the minimum 1024-processor configuration of MP-1 systems up to a 16,384-processor MP-2 system). They also set up one of the first remote servers running over parallel machines (the BLITZ server at the EMBL, http://www.embl-heidelberg.de), which is still active at the EBI (http://www.ebi.ac.uk/MPsrch/).

Simple and elegant dynamic programming-based algorithms compute an N x M matrix S (N and M being the sequence lengths). The S_{i,j} cell is defined by the expression

    S_{i,j} = max{ S_{i-1,j-1} + w(x_i, y_j),  max_g [ S_{i-g,j} + α_g ],  max_g [ S_{i,j-g} + α_g ] },

where w represents a scoring scheme for every pair of residues x_i, y_j, and α_g is a negative value representing the penalty for introducing or extending a gap of length g. To compute the S_{i,j} cell, data dependencies exist with the value of the previous cell in the same diagonal, with the best value to the left in the same row, and with the best value above in the same column.
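For reference, a direct sequential transcription of this recurrence is shown below for the particular case of linear gap penalties (α_g = g · α_1), in which the row and column maxima reduce to the immediately adjacent cells; the scoring values are arbitrary and the fragment is only meant to make the data dependencies explicit, not to reproduce the full algorithm of [61]:

    def dp_matrix(x, y, match=1, mismatch=-1, gap=-2):
        # Fill S[i][j] following the recurrence above (with a zero-initialized border).
        n, m = len(x), len(y)
        S = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                w = match if x[i - 1] == y[j - 1] else mismatch
                S[i][j] = max(S[i - 1][j - 1] + w,   # diagonal dependency
                              S[i - 1][j] + gap,     # best value above (same column)
                              S[i][j - 1] + gap)     # best value to the left (same row)
        return S

    print(dp_matrix("ACGT", "ACGA")[-1][-1])  # -> 2 for these toy sequences

Note that cells on the same anti-diagonal depend only on earlier anti-diagonals, which is precisely what the diagonal-sweep distribution of Figure 21.2 exploits.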
Fig. 21.2 Diagonal-sweep fine-grained workload distribution for SIMD machines to avoid data dependencies. Rows are distributed among processors (residue x_i of the query sequence is assigned to processor P_i) and processor P_i starts its computations with a delay of i columns. There will be (P x (P - 1)) idle processors at the beginning and at the end of the computations.
Fine-grain means, in this case, that processors work together in computing the S matrix, cell by cell. Edmiston and Wagner (1987) [16] and Lander et al. (1988) [41] organized the CM-2 machine as an array of processors to compute the matrix S in diagonal-sweep fashion (see Figure 21.2). An advantage is that this strategy only requires local communications (in each step, P_i sends S_{i,j} to P_{i+1} to allow it to compute S_{i+1,j} in the next step, while P_i computes S_{i,j+1}). The query sequence length determines the maximum number of processors that can be assigned, and processors remain idle at the beginning and end steps. Both inconveniences are important due to the high number of processors usually present in SIMD architectures. Around this time, Collins et al. (1987) [9] proposed a row-sweep workload distribution, splitting the sequence database into groups of 4096 residues to be assigned to a 64 x 64 array of processors. This modification was significant because solving both problems (number of processors greater than the sequence length, and idle processors) addresses an important issue: data dependencies. In fact, in step j, P_i computes
only partially the cell S_{i,j} (S_{i-1,j-1} is received in a message from P_{i-1} in step j-1, and the best column value is already held by the same processor). At this point, the best horizontal value is needed to complete the final value of the cell. To broadcast this value, only log2(P) messages are used, with processor p sending a message in iteration i to processor p + 2^i (i = 1, ..., log2(P)). Note that a given processor needs to send a message only when it changes its best row value (a rather unlikely event); thus, in practical terms, the number of messages is much lower.

It might seem unnecessary to have spent the last two paragraphs discussing parallel strategies for computers that, to use a colloquial expression, are in danger of extinction. However, apart from their historical interest, there are other good reasons. As we will see right away, a coarse-grained approach is the best option when there is a large number of tasks (as in most of today's parallel bioinformatic problems). However, several other applications exist for which there are not enough independent tasks to be solved concurrently. It is still possible to learn from these early approaches and obtain fruitful conclusions that improve new parallel solutions.

There are several proposed strategies for achieving coarse-grained parallelism in DBsrch applications. Most of them can be explained on the basis of the following general pseudocode:
Algorithm 1. DBsrch Algorithm

    get parameters;
    get query sequence;
    perform initializations();
    for each sequence in {Database}
    {
        score = Algorithm(query sequence, sequence, parameters);
        maintain a trace of best results(sequence, score);
    }
    results optimization();
    report best results();
In this general sequential pseudocode for a DBsrch application, the first step sets up the initial state of the algorithm, and the loop manages the bulk of the work, running until the database sequences are exhausted. Inside the loop, the next sequence is compared against the query sequence. The resulting score is often used to rank the best results, and finally, after the main loop, specific implementations can incorporate a last optimization step (e.g., assessing the statistical significance of the results) and report them. As should be noted, the algorithm has a very simple form as far as data flow is concerned. The database corresponds to the data set to be searched which, we need to keep in mind, is a set of sequences of different lengths. In essence, in a typical coarse-grained parallel implementation, one of the processors acts as a "master", dispatching blocks of sequences to the "slaves" which, in turn, perform the
algorithm calculations. When a slave reports the results for one block, the master sends it a new block. This strategy is possible because the results of the comparison between two sequences (query and database sequence) are independent of the previous results deriving from the comparison of the query with other sequences. However, the time required to process any given sequence depends not only on the length of the sequence but also on its composition. Therefore, the use of a dynamic load balancing strategy is necessary. The simplest way is to modify the way in which the master processor distributes the load, serving it on demand from the slaves. Obviously, sending one-sequence messages introduces an expensive additional time overhead due to the high number of messages interchanged. Thus, rather than distributing messages sequence by sequence, better results are achieved by dispatching blocks of sequences [13]. Additional improvements are obtained by applying buffering strategies that reduce or eliminate slave inactivity while waiting for a new message (server data starvation). The master processor can send, at the outset, more than one block of sequences to each slave, so that a slave has a new block ready to continue working on as soon as each block is completed [67].

Several methods have been used to determine the size of the blocks of sequences to be distributed. The simplest way is to divide the database into n chunks (n being the number of slave processes) and obviously assign one chunk to each slave [44]. The data chunks can even reside in local disk storage. To minimize load unbalancing, sequences are ordered by size and assigned in round-robin fashion to the chunks. The strategy is simple, inexpensive, and effective. Unfortunately, it also presents at least two difficult problems:
1. To perform the distribution, it is necessary to know in advance the number of processors (n).

2. When working in heterogeneous environments, such as multicomputer clusters of workstations, the CPU time needed to process each chunk can be quite different, depending on the CPU power and the CPU availability in each node.

A direct solution divides the database into m blocks of sequences (m >> n) of fixed length (with block sizes around 4 to 16 Kbytes, aiming to maximize the network bandwidth) and assigns blocks to slaves on demand. In this way, the maximum imbalance at the end of the computation is proportional to the block size, and the scheduling cost (including message passing) is proportional to m. The main scheduling/distribution cost is normally hidden by using buffering strategies, as explained above. An additional specialization can be obtained by using blocks of variable size [68]. This last approach allows a pattern of growing-size/decreasing-size messages with a minimal scheduling cost. It is especially suitable for clusters of workstations because it avoids server data starvation due to scheduling latencies. If the first blocks are short, the first servers may finish computing before a new block of data is available to them. If the first blocks are large, the last slaves must wait a substantial amount of time for their first block of data to be dispatched. Moreover, large blocks in the last
steps of the data distribution may increase the overall processing time due to poor load balancing.

For distributed-memory parallel machines, the blocks of sequences arrive at the slaves via message passing from a master that deals with the file system. It is also possible for the master to send the slaves only a pointer into the database, with the slaves loading the sequences by themselves through NFS (the Network File System) or another facility, e.g., the CFS (Concurrent File System). When shared memory is available, a counter variable, which serves as a pointer into the database, manages the workload distribution. Since the counter is located in shared memory, each processor can access it in a guarded region, obtain its value, and move the pointer to the next block. This type of system has been implemented for the Cray Y-MP [37].

Two simple notes complete this section:

1. The Achilles heel of message passing is the relatively limited data transmission bandwidth of the communication pathway. In these architectures, the communication/computation ratio must be low to port algorithms efficiently. It will always be harder to parallelize FASTA or BLAST than a dynamic programming algorithm.
2. When there are several query sequences for database searching (e.g., in the case of a DBsrch server), a process level of granularity can be applied (in fact, this approach is used at the NCBI, http://www.ncbi.nlm.nih.gov/BLAST).

However, there is a more important thing to be learned at this point. When more tasks than processors are available, the simplest and most effective strategy is coarse-grained parallelization. This is so fundamental that presenting a new algorithm with this feature usually goes together with its parallel coarse-grained implementation. Some good examples are:
- Structural biology (electron microscopy). Determines viral assembly mechanisms and identifies individual proteins. The computationally intensive task in this algorithm is associated with imaging the 3D structure of viruses from electron micrographs (2D projections). The number of tasks is related to the set of candidate orientations for each particle, such calculations being, at the different orientations, completely independent of each other.

- Protein structure prediction. This task involves searching through a large number of possible structures representing different energy states. One of the most computationally intensive tasks calculates the solvent-accessible surface area, which can be measured on individual atoms if the location of the neighboring atoms is known.

- Searching 3D structure databases. As the number of protein structures known in atomic detail increases, the demand for searching by similar structures also grows. A new generation of computer algorithms has been developed for searching by:
  1. extending dynamic programming algorithms [52];

  2. importing strategies from computer vision areas [20];

  3. using intra-molecular geometrical information, such as distances, to describe protein structures [32][33]; and

  4. finding potential alignments based on octameric C-alpha structure fragments and determining the best path between these fragments using a final dynamic programming step followed by least-squares superposition [59].
• Linkage analysis. Genetic linkage analysis is a statistical technique used for the rapid and largely automated construction of genetic maps from gene linkage data. One key application of linkage analysis is to map human genes and locate disease genes. The basic computational goal in genetic linkage analysis is to compute the probability that a recombination occurs between two loci L1 and L2. The most frequently used programs estimate this recombination function by using a maximum likelihood approach [53].
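All of these cases reduce to the same task-farm skeleton already described for database searching: a master hands out work units (database blocks, candidate orientations, pedigrees, and so on) to slaves on demand. The following minimal Python sketch illustrates the pattern; it is not taken from any of the cited programs, the function names are illustrative placeholders, and blocks are counted here in numbers of sequences rather than in Kbytes for simplicity.

# Minimal task-farm sketch: a master enqueues work units (e.g., database
# blocks) and a pool of workers processes them on demand.  score_block is
# a hypothetical stand-in for the real per-block computation.
from multiprocessing import Pool

def score_block(block):
    # Placeholder for the real work (e.g., comparing a query against
    # every sequence contained in this block of the database).
    return sum(len(seq) for seq in block)

def split_into_blocks(database, block_size):
    # m blocks with m >> number of workers, so the load balances on demand.
    return [database[i:i + block_size]
            for i in range(0, len(database), block_size)]

if __name__ == "__main__":
    database = ["ACGT" * 100] * 1000          # toy stand-in for a sequence DB
    blocks = split_into_blocks(database, 32)  # block size is an assumption
    with Pool(processes=4) as pool:
        # imap_unordered hands blocks to idle workers as they become free,
        # which is exactly the on-demand scheduling described above.
        partial_results = list(pool.imap_unordered(score_block, blocks))
    print(len(partial_results), "blocks processed")

With blocks much more numerous than workers, the maximum imbalance at the end of the run is bounded by the cost of a single block, as discussed earlier.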
All of the previous examples fit perfectly into coarse-grained parallel applications, due to the large number of independent tasks and the regular computational pattern they exhibit, together with the low communication/computation ratio they present. All these features make them suitable for parallelization with high efficiency rates. However, several other interesting examples have non-regular computational patterns and need particular strategies to better exploit parallelism.

Let us take a deeper look into the last example. In the parallelization of LINKMAP, Miller et al. (1992) [46] first used a machine-independent parallel programming language known as Linda. It was compared against machine-specific calls on a hypercube computer and on a network of workstations, and the study concluded that machine-independent code could be developed with this tool at only a modest sacrifice in efficiency. One particular approach assumes that there are many pedigrees and/or many candidate θ vectors, and treats each likelihood evaluation for one pedigree as a separate task. If there are enough tasks, good load balancing can be obtained. Godia et al. (1992) [26] use a similar strategy for the MENDEL program. However, Gupta et al. (1995) [29] observe that typical optimization problems have a dimension of only two or three; thus, there is no need for a large number of processors. In conclusion, it is important to integrate parallelization strategies for the individual function evaluations (coarse-grained) with a strategy to parallelize the gradient estimation (fine-grained).

21.4.2 Semi-Regular Computational Patterns

A similar problem arises in the parallelization of hierarchical multiple sequence alignments (MSA) [11][28][47][66]. The first steps in solving an MSA include calculating a cross-similarity matrix between each pair of sequences, followed by determining the alignment topology, and finally solving the alignment of the sequences or clusters themselves.
Pairwise calculation provides a natural target for parallelization because all elements of the distance matrix are independent (for a set of n sequences, n(n-1)/2 pairwise comparisons are required). Computing the topology of the alignment (the order in which the sequences will be grouped) is a relatively inexpensive task, but solving the clustering (guided by the topology) is not as amenable to parallelism. This is due to the fact that, at this stage, many tasks must be solved (for a set of n sequences it is necessary to solve n-1 alignments), but only those tasks corresponding to the external nodes of the topology can be solved concurrently.

Certainly, parallel strategies for the cross-matrix calculation have been proposed [27][12][60], all of them following a coarse-grained approach. In addition, when the MSA is embedded in a more general clustering procedure [69], combining a dynamic planning strategy with the assignment of priorities to the different types of active tasks, using the principles of data locality, has allowed us both to exploit the inherent parallelism of the complete application and to obtain performance that is very close to optimal. However, at present and strictly speaking, the last step in MSA remains unsolved for parallel machines. When the work is carried out following a coarse-grained parallelization scheme on distributed-memory architectures, it is necessary to exchange the sequences and/or clusters that are being modified due to the insertion of gaps during their alignment, which is extremely expensive. Here we should look back and learn from the earliest fine-grained parallel solutions applied to sequence comparison. Today, when mixed shared/distributed memory architectures are available, this could be an excellent exercise that, it should be stressed, is far from being an academic one. A full solution should probably combine a coarse-grained approach for computing the cross-similarity matrix with a fine-grained approach for the alignment stage guided by the topology. Many challenges are yet to be overcome.
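As a concrete illustration of the embarrassingly parallel first stage, the following Python sketch farms the n(n-1)/2 independent pairwise comparisons of the cross-similarity matrix out to a pool of processes. The function pair_score is a hypothetical placeholder (a toy fraction of matching positions), not a real alignment-based similarity measure.

# Coarse-grained parallel computation of the cross-similarity matrix:
# every pair (i, j) is an independent task, so pairs are simply farmed
# out to a pool of workers.
from itertools import combinations
from multiprocessing import Pool

def pair_score(task):
    i, j, a, b = task
    # Toy similarity: fraction of matching positions over the shorter length.
    n = min(len(a), len(b))
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return i, j, matches / n if n else 0.0

def cross_similarity_matrix(seqs, processes=4):
    tasks = [(i, j, seqs[i], seqs[j])
             for i, j in combinations(range(len(seqs)), 2)]
    matrix = [[0.0] * len(seqs) for _ in seqs]
    with Pool(processes) as pool:
        for i, j, s in pool.imap_unordered(pair_score, tasks):
            matrix[i][j] = matrix[j][i] = s
    return matrix

if __name__ == "__main__":
    print(cross_similarity_matrix(["ACGTAC", "ACGTTT", "TTGTAC"]))

The subsequent clustering stage, by contrast, offers only as many concurrent tasks as there are external nodes in the topology, which is exactly the difficulty discussed above.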
21.4.3 Irregular Computational Patterns

Applications with irregular computational patterns are the hardest to deal with in the parallel arena. In numeric computation, irregularity is mostly related to sparse computational spaces, which pose hard problems for data-parallel distributions (fine-grained approaches), and to data dependencies. The latter reduce the number of independent tasks, leaving little room to develop efficient coarse-grained parallel implementations.

A good example of this comes from another routine task in biological sequence analysis, that of building phylogenetic trees [25][7][17][18]. Earlier approaches to applying maximum likelihood methods to very large sets of sequences centered on the development of new, simpler algorithms, such as fastDNAml [51], which has been ported to parallel architectures using the P4 package [6]. A parallel version of fastDNAml implemented in C with communications under MPI is available at http://www.santafe.edu/btWscience-paper/bette.html and, under TreadMarks, at the Pasteur Institute (http://www.cs.rice.edu/~willy/TreadMarks/overview.html). Even these simplified approaches are known to be quite computationally intensive.
In fact, they were reported to have consumed most of the CPU time of the first IBM SP1 installation at the Argonne National Laboratory (1993) [4]. Let us center our attention on the original Felsenstein version of the method (implemented in the PHYLIP package, available at evolution.genetics.washington.edu). In very simple terms, the maximum likelihood method searches for the tree and branch lengths that have the greatest probability of having produced the current set of sequences. The algorithm proceeds by adding sequences into a given tree topology in such a way as to maximize the likelihood of the topology (suitable for coarse-grained parallelism). Once the new sequence is inserted, a local-optimization step is performed to look for minor rearrangements that could lead to a higher likelihood. These rearrangements can move any subtree to a neighboring branch. Given a current tree Tk with likelihood Lk, one of its k nodes is removed and rearranged into its two neighboring branches, which produces two new trees, Tk1 and Tk2, with likelihoods Lk1 and Lk2, respectively. The tree with the greatest likelihood value (including Lk) is chosen as the new best tree and replaces the current tree. This procedure is repeated until the set of nodes to rearrange is exhausted, as can be observed in the following pseudocode:

Algorithm 2. Local Optimization in the DNAml Algorithm
Current-best-tree Tk (Lk);            { from the insertion step }
for i = 1 to n-tasks
    Remove sub-tree a from Tk and produce Tk1 and Tk2;
    Likelihood evaluation for Tk1 and Tk2 (Lk1 and Lk2);
    Current-best-tree Tk = tree with greatest likelihood (Tk, Tk1, Tk2);
Strictly speaking, only those nodes without leaves and with a depth of at least 2 (not hanging from the root node) can be reorganized, which represents 2k - 6 tasks (k being the number of species). For a large number of sequences, this could be addressed with a coarse-grained parallel solution by distributing these tasks among different processors. Unfortunately, the reorganization task for one node depends on the reorganization of the previous nodes, due to the replacement of the current best tree. In fact, each new optimization task must be performed over the last best tree found, and not over the initial topology. This leaves only two tasks (the likelihood evaluations for the Tk1 and Tk2 topologies) that can be solved in parallel. In other words, the maximum theoretical speedup of this step is limited to this value (2), independent of the number of processors used in the computation. There is no generic procedure for addressing this type of irregular problem; hence, a good initial approach includes a detailed analysis of the computational behavior of the algorithm. In this specific case, a careful runtime analysis shows that the number of times a tree with a likelihood better than the current one is obtained is extremely low (see Figure 21.3). From this behavior, it is possible
to conclude that the probability of a new current-best-tree event is rather low; or, conversely, that in most cases there is a high probability that a better tree will not be produced. The most important implication of this runtime observation is that, while the rearrangement of a node is being evaluated, the next likelihood evaluation can be started with the same tree used to evaluate the previous one. In this way, the task dependencies in the local optimization step are avoided [8][70].
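A minimal sketch of this speculative scheme follows: rearrangement evaluations are launched in parallel against the same reference tree, and the rare event of finding a better tree simply invalidates (and re-runs) the few evaluations that were started against the now-outdated reference; the wasted evaluations correspond to the "parallel penalty" of Figure 21.3. All names (evaluate_rearrangement, local_optimization, the toy likelihood) are illustrative placeholders, not the actual DNAml code.

# Speculative parallel local optimization: evaluate many rearrangements
# against the SAME current best tree, accepting a small amount of wasted
# work in the (rare) case where a better tree is found mid-batch.
from concurrent.futures import ProcessPoolExecutor

def evaluate_rearrangement(args):
    tree, node = args
    # Placeholder: build the two alternative topologies for `node` and
    # return the best of their likelihoods; here just a toy number.
    return node, hash((tuple(tree), node)) % 1000 / 1000.0

def local_optimization(tree, likelihood, nodes, workers=4, batch=8):
    with ProcessPoolExecutor(max_workers=workers) as ex:
        pending = list(nodes)
        while pending:
            chunk, pending = pending[:batch], pending[batch:]
            results = ex.map(evaluate_rearrangement, [(tree, n) for n in chunk])
            for node, lk in results:
                if lk > likelihood:
                    # New best tree found: later evaluations in this batch were
                    # computed against the old tree and are re-enqueued (the
                    # "parallel penalty").
                    likelihood = lk
                    tree = tree + [node]     # toy stand-in for the updated tree
                    pending = chunk[chunk.index(node) + 1:] + pending
                    break
    return tree, likelihood

if __name__ == "__main__":
    print(local_optimization(["root"], 0.5, list(range(20))))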
Fig. 21.3 DNAml: algorithm run-time behaviour. Example of the runtime behavior of the DNAml algorithm in the optimization step (4) using 50 sequences. The number of circles in each horizontal line represents the number of optimization tasks successively performed as a function of the number of sequences already incorporated into the topology (on the left). Filled circles show the points at which a new maximum value was detected. At the top right-hand corner of the figure, the total number of tree likelihood evaluations performed by the Ceron et al. algorithm is presented, together with the number of extra evaluations (parallel penalty) incurred by the algorithm and the resulting very low penalty percentage.
21.5 PARALLEL METAHEURISTICS IN BIOINFORMATICS

As is well known, metaheuristics are generic methods for the non-exact solution of difficult (NP-hard) combinatorial problems [45]; on the other hand, many bioinformatics applications involve hard combinatorial searches over a large solution space. Therefore,
it seems reasonable to think that bioinformatics problems could benefit from appropriate metaheuristic approaches. In fact, some of these problems are so hard, and the amount of raw data to be processed so vast, that a parallel exploration of the search space is a must if a reasonably "good" solution is to be found. With the advent of powerful distributed and parallel computers, new bioinformatics algorithms making use of metaheuristics will hopefully be able to produce quality results within a reasonable amount of time.

There is a growing number of works in the literature applying metaheuristics and parallel or cooperative metaheuristics to classical but still unsolved challenges in the automation of bioinformatics data analysis [24][65]. Most of these analyses can be regarded as difficult combinatorial optimization problems such as, for example, efficient learning of properties from data, classification of complex sets of information, or extraction of grammatical structure from sequences. For example, multiple sequence alignment (MSA), the semi-regular bioinformatics application discussed in the previous section, is a combinatorial optimization problem whose aim is to find the optimal alignment of a group of nucleotide or protein sequences. This task plays a key role in a wide range of applications that include finding characteristic motifs among biological sequences, tracing evolutionary paths through sequence similarity, identifying consensus sequences, predicting secondary and tertiary structures, and sequence clustering.

Throughout the previous sections of this chapter we have surveyed a variety of parallel implementations to depict general solutions in the bioinformatics arena. Now we introduce the parallelization of a particular heuristic approach for the clustering of very large sequence data sets, in which a multiple alignment of the whole input data set is initially an unaffordable prerequisite. Current classification and clustering methods for biological sequences are mostly based on pre-aligned sequences (using MSA), which is CPU-time bound and becomes unaffordable for large sets of sequences. Numerous techniques have been used to align multiple sequences [50], and heuristic approaches have many representatives among them. Simulated annealing [38], genetic algorithms [49][77], and tabu search [65] are heuristic iterative optimization techniques that have been applied to align multiple sequences. There are also parallel solutions using simulated annealing [35] and hierarchical cooperative implementations of evolutionary algorithms [24] for MSA. This is still an open problem and, as we stated in the previous section devoted to applications with semi-regular computational patterns, at present the last step of MSA has no feasible solution for parallel machines.

Thus, in this section we describe a two-step strategy for the hierarchical classification of very large data sets. The power of this heuristic solution comes from the multiresolution decomposition performed by a self-organizing algorithm. We use a first, coarse classification step to break the input data set into affordable independent subsets, which scales down the complexity of the MSA and at the same time enables the parallel computation of these independent MSA tasks.
The first step of this approach is aimed at identifying coarse but homogeneous groups based on the dipeptide composition of the sequences, and the second step uses each coarse group as input to the original SOTA classification algorithm, enabling large groups of sequences to be dealt with.
21.5.1 The Problem
There have been various attempts at grouping sequences systematically with different objectives; e.g., UniGene builds a non-redundant set of gene-oriented clusters from GenBank [72]; ClusTr aims to produce clusters of protein families [40]; and iProClass [76] establishes comprehensive family relationships and structural/functional features of proteins. In all cases, the strategy for clustering involves some form of "all-against-all" comparison, which is computationally expensive: O(N^2) at least, N being the number of sequences in the comparison. This is certainly a major concern in view of the spectacular growth in the number of biological sequences in the databases. In addition, each comparison step involves sequence alignments whose complexity is O(L^2), L being the sequence length. Both the high computational cost and the non-linear behavior of the algorithmic complexity with respect to the number of sequences are behind the interest in developing high performance computing approaches to this problem.

Unsupervised neural networks, and in particular self-organizing maps (SOMs) [39], provide a robust and accurate approach to the clustering of large amounts of data that replaces the "all-against-all" strategy with "all-against-the-nodes", which can be performed in virtually linear run times [30]. In fact, SOMs have been used for classifying large data sets of proteins [19]. Nevertheless, since the SOM is a topology-preserving neural network [23] and the number of clusters is arbitrarily fixed from the beginning, it tends to be strongly influenced by the number of items. Thus, if some particular family of proteins is overrepresented, SOMs will produce an output in which this type of data populates the vast majority of clusters. The Self-Organizing Tree Algorithm (SOTA) [15], a divisive clustering method, was proposed to organize prealigned sequences [30], showing a good recovery of the natural cluster structure of the data set. However, the requirement of prealigned sequences demands all-against-all multiple alignments, which is unfeasible for large data sets.

To solve this problem a two-step approach can be devised: pre-identifying coarse sequence groups prior to the use of SOTA. Thus, a coarse classification first divides the complete set into well defined groups; once the coarse groups are determined, a multiple sequence alignment of each group is performed and used as input to the original SOTA algorithm, enabling large groups of sequences to be dealt with. Thus, for a test case of 1000 sequences, using the original SOTA procedure almost 0.5 x 10^6 sequence alignments need to be computed; assuming that ten coarse groups of one hundred sequences on average are formed, the computational cost drops to around 0.5 x 10^5 pairwise sequence alignments, a reduction of one order of magnitude with respect to the original case. Moreover, when working over a whole database, such as SWISS-PROT with 80,000 sequences (version 38), and assuming around 2000 groups/families with an average of 40 sequences per group, the original case would require more than 3 x 10^9 sequence comparisons, whereas with a preselection of coarse groups only 1.5 x 10^6 comparisons are needed. It is worth observing that, while the number of new sequences grows exponentially, the number of new families is expected to grow linearly. This means that the proposed
strategy will obtain even better results as long as sequence databases continue to grow in the predicted way. However, to be effective, the initial step must necessarily be based on a fast method for estimating distances between sequences, such as dipeptide-frequency-based distances [74][75][31]. Additionally, the point at which the coarse classification should be stopped, to give way to the second, fine-classification procedure, was estimated from the distribution of random dipeptide distances as the threshold that produces homogeneous clusters of sequences, i.e., groups of sequences with similar dipeptide distributions.
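The savings quoted above follow directly from the quadratic growth of the all-against-all comparison. The small script below simply reproduces those estimates for the 1000-sequence test case and for the SWISS-PROT scenario; the group counts and sizes are the same assumptions used in the text.

# Number of pairwise alignments: all-against-all versus per-group
# after a coarse pre-classification into k groups of roughly n/k sequences.
def all_against_all(n):
    return n * (n - 1) // 2

def per_group(num_groups, group_size):
    return num_groups * all_against_all(group_size)

# 1000-sequence test case: ~0.5e6 versus ~0.5e5 alignments.
print(all_against_all(1000))        # 499500
print(per_group(10, 100))           # 49500

# SWISS-PROT-like scenario: 80,000 sequences, ~2000 families of ~40 sequences.
print(all_against_all(80_000))      # 3199960000  (more than 3e9)
print(per_group(2000, 40))          # 1560000     (about 1.5e6)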
21.5.2 The Procedure
In summary, the procedure is composed of three steps (see Figure 21.4): first, a group pre-selection strategy, named SOTAdp, based on dipeptide coding, with a cluster homogeneity criterion used for the definition of groups; second, the coarse groups of sequences obtained in the first step are aligned; and third, the aligned groups are used as input to the final step, in which the original SOTA is applied.
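A compact way to view the overall flow is as a three-stage pipeline over the input set. In the Python sketch below, sotadp_coarse_groups, multiple_alignment, and sota_refine are hypothetical placeholders for the three steps just described, not the actual implementations; each coarse group is an independent unit that could be processed in parallel.

# Three-step strategy: coarse grouping (SOTAdp), per-group MSA, SOTA refinement.
def sotadp_coarse_groups(sequences):
    # Placeholder: split by length parity just to yield some "coarse groups".
    groups = {}
    for s in sequences:
        groups.setdefault(len(s) % 2, []).append(s)
    return list(groups.values())

def multiple_alignment(group):
    # Placeholder: pad to equal length (a real MSA inserts gaps optimally).
    width = max(len(s) for s in group)
    return [s.ljust(width, "-") for s in group]

def sota_refine(aligned_group):
    # Placeholder for the SOTA fine classification of one aligned group.
    return sorted(aligned_group)

def classify(sequences):
    # Each coarse group is an independent task: MSA + SOTA can run in parallel.
    return [sota_refine(multiple_alignment(g))
            for g in sotadp_coarse_groups(sequences)]

print(classify(["ACGT", "ACG", "TTGA", "GG"]))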
Fig. 21.4 General strategy. In step [A] sequences are represented by their dipeptide frequencies, and a modification of the Euclidean distance is used as a similarity measure, without any previous treatment of the sequences. In step [B] MSA is applied over reduced sets of sequences, producing an important reduction of the computational space. Finally, in step [C] the classic SOTA algorithm is applied to refine the classification.
SOTA is based on both the SOM [39] and the growing cell structures of Fritzke (1994) [23]. It is an unsupervised neural network with a hierarchical topology.
The final result is a map in the shape of a binary tree in which sequences are mapped to terminal nodes that contain their average values. The initial system is composed of two external elements, denoted cells, connected by an internal element that we will call a node. Each node is a vector with the same size as the data used (400 in the case of dipeptide frequencies, or the length of the aligned sequences, depending on the step). The algorithm proceeds by expanding the binary tree topology starting from the terminal node with the most heterogeneous population of associated input data. Two new descendants are generated from this heterogeneous cell, which then changes state to an internal node. The series of steps performed until a terminal node generates two descendants is called a cycle. During a cycle, nodes are repeatedly adapted by the input profiles. This process of successive cycles generating descendant nodes can be stopped at the desired level of heterogeneity, thus producing a classification of the data down to a given hierarchical level. If no stop criterion is used, a complete hierarchical classification of the whole data set is obtained.

SOTA is used in both the initial and the final steps. In the first case, SOTAdp uses k-peptide frequencies (k = 2, i.e., k consecutive residues) to encode the sequence information [74]. In the final step each sequence position is related to the probability of finding a given residue in that position. Homogeneity is used in both cases as the stopping criterion. For SOTAdp, a cluster is homogeneous when it is formed of sequences whose dipeptide distances are below the homogeneity threshold; in the final step the homogeneity is evaluated through the silhouette index [57].

In the second step, an MSA is required in order to produce a set of sequences of the same length (by introducing gaps in the sequences) while maximizing the similarity between residues in the vertical composition of the group. As described previously, the MSA follows a three-step approach:

1. Determination of the cross-similarity matrix, which demands n · (n + 1)/2 calculations of pairwise similarity values.

2. Calculation of the alignment topology.

3. Solution of the n - 1 sequence and/or cluster alignments, following the order given by the topology.
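The dipeptide encoding used by SOTAdp is easy to state concretely: each sequence is mapped to a 400-dimensional vector of dipeptide (k = 2) frequencies over the 20 amino acid alphabet, which allows fast distance estimation without any prior alignment. The sketch below is a minimal illustration of that encoding, not the SOTAdp implementation itself.

# Dipeptide (k = 2) frequency encoding: each protein sequence becomes a
# 400-dimensional vector (20 x 20 possible residue pairs).
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 entries
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}

def dipeptide_vector(seq):
    counts = [0.0] * len(DIPEPTIDES)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in INDEX:                 # skip ambiguous residues
            counts[INDEX[pair]] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts

v = dipeptide_vector("MKVLAAGLLL")
print(len(v), round(sum(v), 6))           # 400, 1.0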
21.5.3 The Parallel Solution

Although the identification of coarse groups of sequences avoids an MSA over the whole data set, the number of sequences involved in this type of study still demands computational resources that are high enough to justify a high performance computing approach. A careful runtime analysis of the three-step approach is shown in Figure 21.5. From this analysis it is concluded that most of the CPU time is demanded by the MSA step, with SOTA a distant second and only a small requirement for SOTAdp. This suggests focusing the parallelization effort on the MSA step. Moreover, SOTAdp runs only once, whereas MSA and SOTA run for each of the groups produced by SOTAdp.
It is also worth noting that an MSA is performed for each of the coarse but homogeneous groups formed by SOTAdp. Of the three steps in which the MSA proceeds (producing the similarity matrix, computing the alignment topology, and solving the alignments), the intermediate step (the topology) has no appreciable effect on the computing demand. The natural parallel solution considers each coarse cluster (produced by SOTAdp) as a distribution unit, or task (see Figure 21.6). Thus, the first step, SOTAdp, runs sequentially and the parallel processing starts after the coarse groups have been formed. Each branch of the topology is fully assigned to a given processor, in which the MSA and then SOTA are applied, thus amounting to a coarse-grained parallel solution.
Fig. 21.5 Application profiling. On the left, we have drawn the general procedure. SOTAdp produces coarse groups. For each one of these groups an MSA is performed in three steps (computing the per-sequence similarity matrix, determining the alignment topology, and performing the alignment). Finally, the classical SOTA is applied over each set of prealigned sequences to refine the classification. The computational cost (right-hand side) arises from MSA and includes the computation of the similarity matrix.
Because a high number of sequences is expected, enough tasks are available for distribution in this coarse-grained parallel strategy. However, the computational cost of solving a coarse cluster is a function of the number of sequences belonging to that cluster, which can differ significantly between clusters. For this reason a priority level is assigned to each task to ensure that large tasks are not left until last for launching in the parallel system; the priority is determined by the number of sequences in the cluster and the average length of those sequences, so that the largest tasks are dispatched first. The main drawback of the simple initial solution proposed in the above paragraphs is the large size of the distribution unit (parallel task). This makes the strategy quite sensitive to the number of tasks, and can produce an unbalanced load distribution with very poor results. As an effective way of reducing this problem, the MSA subtask can be divided into two smaller tasks: computing the similarity matrix and
solving the sequence alignments (the latter includes computing the topology). At the same time, the SOTA run applied over each branch of aligned sequences (produced by the SOTAdp topology) is also a distribution unit. Thus, for this second approach, we may differentiate up to four types of tasks: [T1] SOTAdp to obtain the coarse clusters; [T2] completion of the similarity matrix of a given cluster; [T3] solution of the sequence alignments, including determination of the alignment topology; and [T4] SOTA using a set of aligned sequences as input. It is worth observing that in the first coarse-grained solution each cluster produced by SOTAdp was completely assigned to one processor (i.e., solved depth-first). In the second approach, the different subtasks needed for solving a cluster can be computed on different processors (i.e., solved breadth-first). The parallel processing proceeds as follows. First, [T1] SOTAdp runs to obtain a given number m of clusters, which produce, from the parallel perspective, m new tasks of type T2 for inclusion in the queue of unsolved parallel tasks. The parallel threads pick tasks from this queue. When a T2 task is completed a new T3 task is produced and included in the queue, and when a T3 task ends, a T4 task is produced. The pending-tasks queue is also managed with a priority scheme that assigns higher priority to longer tasks. However, to avoid long coarse groups monopolizing the processors, we introduce an additional priority criterion, the task type, which has the effect of launching first those tasks that will produce new tasks to be solved, thus increasing the number of tasks available for distribution, avoiding processor inactivity, and ensuring better performance.

Fig. 21.6 Alternative parallel approaches. On the left, the initial coarse-grained approach distributes a complete branch of the topology to processors (distribution units are identified by filled rectangles). In the middle, a medium-grained solution proposes three different tasks for each branch. Finally, on the right, small distribution tasks are proposed in a fine-grained solution with good parallel results.

SOTA and SOTAdp have strong data dependencies that prevent their parallelization. However, tasks T2 and T3, accounting for between 80% and 90% of the computational load, are still expensive as individual tasks, and they can be reduced in size. Task T2 can be divided into subtasks that partially calculate the similarity matrix (a row-block distribution is used). Task T3 poses a more serious challenge: the interdependence of the tasks, which means that, despite there being g - 1 single-alignment tasks in each cluster with g sequences, only a small number of them may be solved simultaneously due to data dependencies. To solve this problem we use a distribution governed by a task graph (see Figure 21.6), which delays the launch of the alignment for a given node until the previous nodes have been solved.

First, the strategy was evaluated for its ability to reproduce previously well-accepted biological knowledge, using a short data set composed of 48 protein sequences belonging to five different protein families. These groups, Catalase, Citrate, Globin, Histone, and G-proteins, have typical dipeptide distributions, which makes it difficult to separate the groups. It is interesting to study the behavior of the algorithm under these conditions, since they represent one of the least favorable cases for our strategy of coarse classification based on dipeptide distances. Figure 21.7 (left) depicts the results of running SOTAdp over this data set using different thresholds for the first step. As expected, the number of clusters and the cluster sizes depend on the threshold stringency. As the coverage value increases, more (and smaller) clusters are formed because a smaller distance is required to belong to the same cluster. Because the method looks for coarse clusters, good results are obtained for coverage values of 90% and even of 70%, with a correct separation of sequences. Second, a massive test over the SwissProt database [5] was used to assess the efficiency and usefulness of the strategy, and to illustrate not only how the method works on large sets of sequences, but also its ability to produce results with affordable CPU resources and in a time that allows the routine use of the procedure. Finally, a synthetic test set was used to evaluate the performance of the parallel strategy in terms of speedup. This test set was formed by 10,000 synthetic sequences organized in 50 groups. The sequences were generated from a set of 50 random sequences of 400 residues in length, over which point mutations were applied to produce closely related partner sequences. A number of different workload tests (formed by partial collections of 1000, 5000, and 10,000 sequences, named Test-S1, Test-S5, and Test-S10) were used to explore the behavior of the parallel strategy under different workloads, i.e., different numbers of available tasks and computational costs per task.

The parallelization of the code was achieved by using a thread-based approach that results in very portable code. In fact, in the course of this work three specific and quite different parallel platforms were used: an SGI Origin 2000 (http://www.scai.uma.es) with up to 16 processors, an IBM RS/6000 SP, and a LAN-based cluster of PCs. Figure 21.7 (right) shows the speedup achieved for the three synthetic subsets in this test (Test-S1, Test-S5, and Test-S10).
The most important question to ask of this test is whether the strategy retains good performance levels when additional processors are added. This is positively confirmed by the results shown in Figure 21.7. It is evident that reducing the number of sequences to very low levels (e.g., Test-S1 on 8 PEs) substantially affects the results (a speedup of 4.6). On the other hand, increasing the number of sequences for the same number of PEs always has a positive effect: using 8 PEs, the efficiency rises to 72.5% with 5000 sequences and to 80% with 10,000 sequences, and in this latter case the efficiency remains at the 72.5% level.

In this example we have described both the use of metaheuristics to solve a classical bioinformatics problem and the strategy for its parallel implementation. As in the previously discussed cases, a detailed runtime analysis and profiling of the application has allowed an efficient parallel solution to be implemented. Efficiency has been obtained by combining advanced dynamic scheduling algorithms, priority-based load distribution, and a task-graph organization for resolving data dependencies. The result is a parallel implementation that is very efficient when applied to large sets of sequences and that scales well when the number of processors increases.
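To make the scheduling machinery concrete, the sketch below models the typed tasks (T2, T3, T4) flowing through a shared priority queue, with the completion of one task type spawning the next; all names are illustrative assumptions, and the real implementation relied on native threads on the platforms cited above.

# Typed-task scheduling sketch: T2 (similarity matrix) -> T3 (alignments)
# -> T4 (SOTA refinement).  Larger clusters and "productive" task types get
# higher priority so that new work keeps appearing in the queue.
import heapq
import threading

NEXT_TYPE = {"T2": "T3", "T3": "T4", "T4": None}
TYPE_RANK = {"T2": 0, "T3": 1, "T4": 2}     # tasks that spawn work go first

queue, lock = [], threading.Lock()

def push(task_type, cluster_size):
    # heapq is a min-heap, so smaller tuples are popped first.
    with lock:
        heapq.heappush(queue,
                       (TYPE_RANK[task_type], -cluster_size, task_type, cluster_size))

def worker():
    while True:
        with lock:
            if not queue:
                return
            _, _, task_type, size = heapq.heappop(queue)
        # ... here the real T2/T3/T4 computation for this cluster would run ...
        if NEXT_TYPE[task_type]:
            push(NEXT_TYPE[task_type], size)

if __name__ == "__main__":
    for size in (120, 40, 300, 75):          # clusters produced by SOTAdp (T1)
        push("T2", size)
    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("all tasks completed")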
Fig. 21.7 On the left, coarse cluster composition for the set of 48 protein sequences. Three different thresholds have been used: on the left a restrictive value with k = 0.7, in the middle the normal case with k = 0.9, and on the right a permissive k = 0.99, k being a factor that multiplies the threshold surface. As can be observed, the number of coarse clusters and the number of sequences in each cluster (cluster size) depend on the value of k. For normal values, a correct separation of sequences is obtained. On the right ("Parallel Results"), parallel speedup curves are shown for the three data sets formed by 1000, 5000, and 10,000 synthetic sequences running on an SGI Origin 2000 with up to 16 processors (similar results were obtained when using the IBM RS/6000 SP described in the text). The shortest group maintains a reasonable efficiency up to 8 processors; its coarse groups have on average only 20 sequences, which are easily managed by the task pool. Better results are observed for the larger sets, and, when necessary, a more powerful machine can be used to full capacity.
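For reference, the efficiencies quoted in the preceding discussion are simply the measured speedups divided by the number of processors; the small illustrative computation below (not part of the original experiments) checks the reported values.

# Parallel efficiency = speedup / number of processors.
def efficiency(speedup, processors):
    return speedup / processors

# Values reported for the synthetic tests on 8 processors.
print(round(efficiency(4.6, 8), 3))   # Test-S1:  0.575  (57.5%)
print(round(8 * 0.725, 1))            # Test-S5:  speedup ~5.8 at 72.5% efficiency
print(round(8 * 0.80, 1))             # Test-S10: speedup ~6.4 at 80% efficiency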
21.6 CONCLUSIONS

21.6.1 Reusing Available Software

Parallel computing has shown itself to be an effective way to deal with some of the hardest problems in bioinformatics. The use of parallel computing schemes expands the size of the problems that can be tackled, and there is already a broad gallery of parallel examples from which we can learn and import strategies, allowing the development of new approaches to challenges awaiting solution without the need to 're-invent the wheel'. Today, it should be natural to "think in parallel" when writing software, and to exploit the implicit parallelism of most applications when more than one processor is available. In most bioinformatic applications, owing to the high number of independent tasks, the simplest approaches are often the most effective. Such applications scale better in parallel, are the least expensive to develop, and are the most portable among different parallel architectures.
21.6.2 New Challenges

However, several other problems in bioinformatics remain unsolved as far as parallel computing is concerned. Parallel metaheuristic approaches appear as a promising alternative for addressing hard computational problems in the field, and they represent attractive challenges for biologists and computer scientists in the years ahead.
Acknowledgments

We would like to thank Dr. Jacek Bardowsky from the IBB-PAN in Warsaw, Poland, for valuable comments on the biological aspects of this document, and Dr. Eladio Gutierrez from the Computer Architecture Department of the University of Malaga for sharing his LaTeX know-how. This work has been partially supported by project GNV-5 Integrated Bioinformatics UMA, from Genoma España.
REFERENCES

1. Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K. and Watson, J. (1989), "The Molecular Biology of the Cell" (2nd ed.). New York, NY: Garland Publishing.
2. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990), "Basic local alignment search tool", J. Mol. Biol. 215:403-410.
3. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Research 25(17):3389-3402.
4. Argonne National Laboratory (1993), "Early experiences with the IBM SP1 and the High Performance Switch", Internal report ANL-93141.
5. Bairoch, A. and Apweiler, R. (2000), "The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000". Nucleic Acids Res. 28(1):45-48.

6. Butler, R. and Lusk, E. (1992), "User's guide to the P4 programming system", Argonne National Laboratory Technical Report TM-ANL-92/17.

7. Cavalli-Sforza, L.L. and Edwards, A.W.F. (1967), "Phylogenetic analysis: models and estimation procedures", Am. J. Hum. Genet. 19: 233-257.
8. Ceron, C., Dopazo, J., Zapata, E.L., Carazo, J.M. and Trelles, O. (1998), "Parallel Implementation for DNAml Program on Message-Passing Architectures", Parallel Computing and Applications, 24(5-6), 701-716.

9. Collins, J.F. and Coulson, A.F.W. (1987), "Nucleic Acid and Protein Sequence Analysis: A Practical Approach", IRL Press, Oxford, 327-358.
10. Coulson, A.F.W., Collins, J.F. and Lyall, A. (1987), "Protein and nucleic acid sequence database searching: a suitable case for parallel processing", Computer J., 39, 420-424.

11. Corpet, F. (1988), "Multiple sequence alignments with hierarchical clustering", Nucleic Acids Research, 16, 10881-10890.

12. Date, S., Kulkarni, R., Kulkarni, B., Kulkarni-Kale, U. and Kolaskar, A. (1993), "Multiple alignment of sequences on parallel computers". CABIOS 9(4), 397-402.

13. Deshpande, A.S., Richards, D.S. and Pearson, W.R. (1991), "A platform for biological sequence comparison on parallel computers", CABIOS 7, 237-247.

14. Doolittle, R.F., Hunkapiller, M.W., Hood, L.E., Devare, S.G., Robbins, K.C., Aaronson, S.A. and Antoniades, H.N. (1983), "Simian Sarcoma Onc Gene, v-sis, Is Derived from the Gene (or Genes) Encoding Platelet Derived Growth Factor". Science, 221, 275-277.
15. Dopazo, J. and Carazo, J.M. (1997), "Phylogenetic reconstruction using a growing neural network that adopts the topology of a phylogenetic tree", J. Mol. Evol. 44: 226-233.

16. Edmiston, E. and Wagner, R.A. (1987), "Parallelization of the dynamic programming algorithm for comparison of sequences", Proc. of the 1987 International Conference on Parallel Processing, pp. 78-80.
17. Felsenstein, J. (1973), "Maximum-likelihood estimation of evolutionary trees from continuous characters", Am. J. Hum. Genet. 25: 471-492.

18. Felsenstein, J. (1988), "Phylogenies from molecular sequences: inference and reliability", Annu. Rev. Genet. 22: 521-565.

19. Ferran, E.A. and Ferrara, P. (1992), "Clustering proteins into families using artificial neural networks", Comput. Appl. Biosci. 8:39-44.

20. Fisher, D., Bachar, O., Nussinov, R. and Wolfson, H. (1992), "An efficient automated computer vision based technique for detection of three-dimensional structural motifs in proteins", J. Biomol. Struct. Dyn. 9:769-789.

21. Flynn, M.J. (1972), "Some Computer Organizations and their Effectiveness", IEEE Trans. on Computers, vol. C-21, 948-960.

22. Foster, I. (1994), "Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering", Addison-Wesley Publishing Company, Inc. (On-line version: http://wotug.ukc.ac.uk/parallel/books/addison-wesley/dbpp/).

23. Fritzke, B. (1994), "Growing cell structures - a self-organizing network for unsupervised and supervised learning", Neural Networks 7: 1141-1160.

24. Gras, R., Hernandez, D., Hernandez, P., Zangger, N., Mescam, Y., Frey, J., Martin, O., Nicolas, J. and Appel, R.D. (2003), "Cooperative Metaheuristics for Exploring Proteomic Data", Artificial Intelligence Review, 20:95-120.

25. Gribskov, M. and Devereux, J. (1991), "Sequence Analysis Primer", UWBC Biotechnical Resource Series.

26. Godia, T.M., Lange, K., Miller, P.L. and Nadkarni, P.M. (1992), "Fast computation of genetic likelihoods on human pedigree data", Human Heredity, 42:42-62.

27. Gonnet, G.H., Cohen, M.A. and Benner, S.A. (1992), "Exhaustive matching of the entire protein sequence database", Science, 256, 1443-1445.

28. Gotoh, O. (1993), "Optimal alignment between groups of sequences and its application to multiple sequence alignment", CABIOS 9(2), 361-370.

29. Gupta, S.K., Schaffer, A.A., Cox, A.L., Dwarkadas, S. and Zwaenepoel, W. (1995), "Integrating parallelization strategies for linkage analysis", Computers and Biomedical Research, 28, 116-139.

30. Herrero, J., Valencia, A. and Dopazo, J. (2001), "A hierarchical unsupervised growing neural network for clustering gene expression patterns", Bioinformatics, 17:126-136.

31. Hobohm, U. and Sander, C. (1995), "A sequence property approach to searching protein databases", J. Mol. Biol. 251, 390-399.
32. Holm, L. and Sander, C. (1993), "Protein structure comparison by alignment of distance matrices", J. Mol. Biol. 233: 123-138.

33. Holm, L. and Sander, C. (1994), "Searching protein structure databases has come of age", Proteins 19:165-173.

34. Hwang, K. and Xu, Z. (1998), "Scalable Parallel Computing: Technology, Architecture, Programming", McGraw-Hill Series in Computer Engineering.

35. Ishikawa, M., Toya, T., Hoshida, M., Nitta, K., Ogiwara, A. and Kanehisa, M. (1993), "Multiple sequence alignment by parallel simulated annealing", Comput. Appl. Biosci., 9(3): 267-273.

36. Jones, R. (1992), "Sequence pattern matching on a massively parallel computer", CABIOS 8, 377-383.

37. Jülich, A. (1995), "Implementations of BLAST for parallel computers", CABIOS 11(1), 3-6.

38. Kim, J., Pramanik, S. and Chung, M.J. (1994), "Multiple sequence alignment using simulated annealing", Comput. Appl. Biosci., 10(4): 419-426.

39. Kohonen, T. (1997), "Self-Organizing Maps". Berlin, Springer.

40. Kriventseva, E.V., Fleischmann, W., Zdobnov, E.M. and Apweiler, R. (2001), "CluSTr: a database of clusters of SWISS-PROT+TrEMBL proteins", Nucleic Acids Res. 29:33-36.

41. Lander, E., Mesirov, J.P. and Taylor, W. (1988), "Protein sequence comparison on a data parallel computer", Proc. of the 1988 International Conference on Parallel Processing, pp. 257-263.

42. Li, W.-H. and Graur, D. (1991), "Fundamentals of Molecular Evolution". Sunderland, MA: Sinauer Associates, Inc.

43. Lipman, D.J. and Pearson, W.R. (1985), "Rapid and sensitive protein similarity searches", Science, 227, 1435-1441.

44. Martino, R.L., Johnson, C.A., Suh, E.B., Trus, B.L. and Yap, T.K. (1994), "Parallel computing in biomedical research", Science, 256, 902-908.

45. Michalewicz, Z. and Fogel, D. (2000), "How to Solve It: Modern Heuristics". Springer-Verlag.

46. Miller, P.L., Nadkarni, P.M. and Bercovitz, P.A. (1992), "Harnessing networked workstations as a powerful parallel computer: a general paradigm illustrated using three programs for genetic linkage analysis", Comput. Applic. Biosciences, 8, 141-147.

47. Miller, W. (1993), "Building multiple alignments from pairwise alignments", CABIOS 9(2), 169-176.
48. Needleman, S.B. and Wunsch, C.D. (1970), "A general method applicable to the search for similarities in the amino acid sequence of two proteins", J. Mol. Biol., 48, 443-453.

49. Notredame, C. and Higgins, D.G. (1996), "SAGA: sequence alignment by genetic algorithm", Nucleic Acids Res., 24(8): 1515-1524.
50. Notredame, C. (2002), "Recent progress in multiple sequence alignment: a survey", Pharmacogenomics, 3(1): 131-144.

51. Olsen, G.J., Matsuda, H., Hagstrom, R. and Overbeek, R. (1994), "fastDNAml: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood", CABIOS 10: 41-48.

52. Orengo, C.A., Brown, N.P. and Taylor, W.T. (1992), "Fast structure alignment for protein databank searching", Proteins, 14:139-167.

53. Ott, J. (1991), "Analysis of Human Genetic Linkage", The Johns Hopkins University Press, Baltimore and London (Revised Edition).

54. Pearson, W.R. and Lipman, D.J. (1988), "Improved tools for biological sequence comparison", Proc. Natl. Acad. Sci. USA, 85, 2444-2448.
55. Rechenmann, F. (2000), "From data to knowledge", Bioinformatics, 16(5), 411.

56. Rodriguez, A., Fraga, L.G. de la, Zapata, E.L., Carazo, J.M. and Trelles, O. (1998), "Biological sequence analysis on distributed-shared memory multiprocessors", 6th Euromicro Workshop on Parallel and Distributed Processing, Madrid, Spain.

57. Rousseeuw, P.J. (1987), "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis", Journal of Computational and Applied Mathematics, 20:53-65.
58. Schena, M., Shalon, D., Davis, R.W. and Brown, P.O. (1995), "Quantitative monitoring of gene expression patterns with a complementary DNA microarray", Science, 270, 467-470.
59. Shindyalov, I.N. and Bourne, P.E. (1998), "Protein structure alignment by incremental combinatorial extension (CE) of the optimal path", Protein Engineering 11(9), 739-747.

60. SGI (1999), "SGI Bioinformatics performance report", at http://www.sgi.com/chembio.
61. Smith, T.F. and Waterman, M.S. (1981), "Identification of common molecular subsequences", J. Mol. Biol., 147, 195-197.

62. Sturrock, S.S. and Collins, J. (1993), "MPsrch version 1.3", BioComputing Research Unit, University of Edinburgh, UK.
63. Sunderam, V., Manchek, R., Dongarra, J., Geist, A., Beguelin, A. and Jiang, W. (1993), "PVM 3.0 User's Guide and Reference Manual". Oak Ridge National Laboratory.

64. Tanenbaum, A. (1999), "Structured Computer Organization", Prentice-Hall, Fourth Edition.

65. Tariq, R., Yi, W. and Li, K.B. (2004), "Multiple sequence alignment using tabu search", Proceedings of the Second Asia-Pacific Bioinformatics Conference, Dunedin, New Zealand.

66. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994), "CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice", Nucleic Acids Research 22:4673-4680.

67. Trelles, O., Zapata, E.L. and Carazo, J.M. (1994a), "Mapping strategies for sequential sequence comparison algorithms on LAN-based message passing architectures", in Lecture Notes in Computer Science, vol. 796, High Performance Computing and Networking, Springer-Verlag, Berlin, 197-202.

68. Trelles, O., Zapata, E.L. and Carazo, J.M. (1994b), "On an efficient parallelization of exhaustive sequence comparison algorithms on message passing architectures", CABIOS 10(5), 509-511.

69. Trelles, O., Andrade, M.A., Valencia, A., Zapata, E.L. and Carazo, J.M. (1998a), "Computational space reduction and parallelization of a new clustering approach for large groups of sequences", Bioinformatics 14(5), 439-451.

70. Trelles, O., Ceron, C., Wang, H.C., Dopazo, J. and Carazo, J.M. (1998b), "New phylogenetic venues opened by a novel implementation of the DNAml algorithm", Bioinformatics 14(6), 544-545.

71. Trelles, O. (2001), "On the parallelization of bioinformatic applications", Briefings in Bioinformatics (May 2001), 2(2), 181-194.

72. Wheeler, D.L., Church, D.M., Lash, A.E., Leipe, D.D., Madden, T.L., Pontius, J.U., Schuler, G.D., Schriml, L.M., Tatusova, T.A., Wagner, L. and Rapp, B.A. (2002), "Database resources of the National Center for Biotechnology Information: update", Nucleic Acids Res. 30(1):13-16.

73. Wilbur, W.J. and Lipman, D.J. (1983), "Rapid similarity searches in nucleic acid and protein databanks", Proc. Natl. Acad. Sci. USA, 80, 726-730.

74. Wu, C., Berry, M., Shivakumar, S. and McLarty, J. (1995), "Neural networks for full-scale sequence classification: sequence encoding with singular value decomposition", Machine Learning, 21, 177-193.
75. Wu, C. (1997), "Artificial neural networks for molecular sequence analysis", Computers Chem. 21(4), 237-256.

76. Wu, C., Xiao, C., Hou, Z., Huang, H. and Barker, W.C. (2001), "iProClass: an integrated, comprehensive and annotated protein classification database", Nucleic Acids Res. 29: 52-54.

77. Zhang, C. and Wong, A.K. (1997), "A genetic algorithm for multiple molecular sequence alignment", Comput. Appl. Biosci., 13(6): 565-581.
Index
A
2-path network design problem, 337 3-index assignment problem, 3 19 ACO for the Reconfigurable Mesh, 190 ACS, 398 Adaptive memory, 298,304 AGA, 398 Algorithmic Parallelism in PSA, 269 Aminoacids, 5 I9 ANOVA, 52 Ant Colony Optimization, 25 ANTabu. 398 Antennae Placement and Configuration, 500 APGA, 398 Applications of Parallel Hybrid Metaheuristics, 358 Artificial Ant Problem and PGP, 134 ASPARAGOS, 116 ATeam, 398 B
Bioinfonnatic Applications, 526 Bioinformaticsat a Glance, 5 I9
C CAC, 398 CEDA at Work, 214 Cellular Estimation of Distribution Algorithms, 21 1 Cellular Model in PGP, 132 Cellular Networks, 506 Central memory, 293,298-30 I Centralized strategy. 326 CGA, 398 CGP-Based Classifier, 146 Circuit Encoding Using Trees, 141 Classification of Parallel EDAs, 216 Classificationwith Cellular GP, 143 Classifying Hybrid Metaheuristics, 350 CMPTS, 398 Coarse-grained.304 CommunicationTopology in PGP, 132
Component Exchange Among Metaheuristics. 29 Computational Effort of PGP, 135 Condor, 67,74,375 Condor-G, 74 Constructive Heuristics, 4 Control cardinality, 292 Cooperative-threadparallel implementation. 338 Cooperative Search, 30 COP,405 CoPDEB, 116,398 CORBA. 72 COSEARCH, 398 CPM-VRPTW, 398 CPTS, 398 Crossover Operator in GP, 129 CS, 398
D Data Mining and Cellular GP, 143 DBsrch, 526 Decision Trees in PGP, 145 DGA, 116,398 DGENESIS, 116 Distributed Estimation of Distribution Algorithms, 207 Distributed Resource Machine (DRM), 276 Distributed strategy, 325 Diversification. 290.297,299,30 1 DNA, 519 DNA arrays, 5 18 DNA strands, 5 I7 DPM, 398 DREAM, 116,276,405 island model, 276 Distributed Resource Machine, 276 Dynamic load balancing, 3 19 Dynamic Problems, 479
E ECO-GA. 116 Efficiency, 46 Elite solutions, 299.301, 323 Empirical distribution, 320 Empirical distribution plot, 32 1
EnGENEer, 116 Evaluating the Computational Effort, 5 1 Even Parity Problems and PGP, 134 Evolutionary Computation, 19 Evolutionary Parallelism, 275 Exons, 5 19 Explorative Local Search Methods. I3 Exponential distribution, 320
F FGA, 398 Fitness-Level Parallelism in GP, I3 I Flynn’s Taxonomy, 64 FPGA Circuit Encoding, 14 1 Fractional Factorial Design, 50 Frameworks, 305 Frameworks for Heterogeneous Metaheuristics, 404 FTPH, 398 Full Factorial Design. 50 Function Set in GP, 128
c
GAACO. 398 GALOPPS, I16 GAMAS, 116,398 GAME, 116 GDGA. 116 Genetic algorithms, 280 Genetic Programming, 127 GENITOR 11, 116 Globus, 67,73 GP, 128 GP Sets, 142 Graph Coloring, 451 Graph partitioning, 300-30 1 Graph partitioning, 452 Graph planarization, 321 GRASP, 3 15 Greedy Randomized Adaptive Search Procedure, 3 15 construction for 2-path network design, 337 construction for 3-index assignment. 328 construction for job shop scheduling, 332 construction phase, 3 15 greedy function, 3 15 local search for 2-path network design, 337 local search for 3-index assignment. 328 local search for job shop scheduling, 333 local search phase, 3 15 multiple-walk cooperative-thread with path-relinking, 325 restricted candidate list, 3 I5 with path-relinking, 325 Grid, 375 H
HDGA, 398 Heterogeneous Metaheuristics Survey, 397 HFC-EA, 398 HM4C. 398 Hy3.398 Hy4, 116,398 Hybrid GRASP with path-relinking, 325 Hybrid parallelism, 270,275 Hybridization with GRASP, 325 GRASP with path-relinking, 323
I IiGA. 1 16,398GAindependent-thread parallel implementation Independent-thread parallel implementation, 338 Independent search, 296 Information exchange, 293,298 Integration with Tree Search Methods and Constraint Programming, 30 Intensification, 290,301 Irregular Computational Patterns in Bioinformatics, 532 Island Models of PGP. 131 J Java RMI, 72 Java Threads, 69 Job shop scheduling. 277 Job shop scheduling problem, 3 19. 321, 33 1
K Kendall Square Research, 3 17
L Learning the Probability Distribution in Parallel EDAs, 207 Levels of Parallelism in EDA, 204 Local search, 3 16 Local Search Methods, 5 Location Problems, 464 LSM, 80
M MACS-VRPTW. 398 MAGMA, 406 MALLBA, 56, 116,499 MARS, 1I6 MAS-DGA, 406 Massive parallelization, 274 Master-slave, 294295.304 MastedSlave in EDA, 206 MAX-SAT, 3 17 Maximum covering, 32 1 Maximum independent set. 321 Maximum weighted satisfiability, 321 MAXSAT Problem. I 19 MCAA, 398
Message-Passing Interface, 3 17,327 Metaheuristics, 6 Metropolis Algorithm, 268 Microarray, 521 Middleware, 67 Migration Model of PES. 160 Migration Parameters in PGP, 132 Migration Policy in a dEDA, 208 MIMD, 63 MISD, 63 Mobile Network Design, 470 MOP, 371 MPI, 71,317, 327 Message Passing Toolkit, 327 MSA, 538 Multi-level cooperative search, 300-301 Multi-Population Models of PGP, 13I Multicommodity Network Design. 469, 507 Multiple-walk cooperative-thread, 316,323,334 Multiple-walk independent-thread, 316-317, 329 Multiple-walk independent-thread GRASP with path-relinking, 333 Multiple independent runs, 269,273 Multiple Independent Runs in PSA, 272 Multithread search, 297,301,304 Mutation Operator in GP, 129
N Neighborhood function, 277,279 Neighborhood operator, 290 Nested Populations of PES, 164 Network Assignment and Dimensioning, 504 Network Design, 300,468,496 Network Routing, 502 New Challenges in Bioinfonnatics. 543 New Trends in PGAs, 117 NGA (DAGA2), 398 No-Free-Lunch Theorem, 348 Non-generational Models of PES, 163 OpenMP, 69 OR-Library, 3 17 OR Library, 278 ORLib, 333
P P-ACO for FPGAs, 192 PAES, 372 ParadisEO, 116,405 PARAGENESIS, 116 Parallel Ant Colony Optimization, 90 Parallel Computer Architectures: Taxonomy, 522 Parallel Estimated Distribution Algorithms, 91 Parallel Evolution Strategies, 88 Parallel Evolutionary Algorithms, 159 Parallel Fitness Evaluation in EDAs, 206
Parallel Genetic Algorithms. 87, 112 Parallel Genetic Programming, 89 Parallel GRASP, 83 Parallel GRASP, 317 Parallel Heterogeneous Metaheuristic, 396 Parallel Heterogeneous Metaheuristics, 93 Parallel Hybrid Metaheuristics, 355 Parallel Metaheuristics for the VRP, 476 Parallel Metaheuristics in Bioinformatics, 534 Parallel Metrics, 46 Parallel Models for EDAs, 206 Parallel Models of EAs, 86 Parallel Models of LSMs, 81 Parallel moves, 270, 274 Parallel Moves in PSA, 273 Parallel Multiobjective Models, 379 Parallel Multiobjective Optimization, 94 Parallel Multiobjective Steady State GA, 378 Parallel Programming Models, 524 Parallel Scatter Search, 92,225 with Multiple Combinations, 241 with Single Combination, 239 Parallel Simulated Annealing, 8 I, 269-270 Parallel Tabu Search, 82 Parallel Virtual Machine, 3 17 Parallel VNS, 84,251 Parallelism by data, 270 Parallelism by Data in PSA, 272 Parallelization test, 322 Parallelizing PAES, 377 Pareto front, 371 Pareto optimum, 371 Path-relinking, 323 for 2-path network design, 337 for 3-index assignment, 328 for job shop scheduling, 333 symmetric difference, 324 PATS, 398 PEGAsuS, 116 Performance Measures for Parallel Metaheuristics. 44 CPU time, 44 Speedup, 44 Wall-clock time, 44 Performance Metrics for Multiobjective Optimization, 38 I PGA-Cellular Model, 114 PGA-Distributed Model, 113 PGA-Independent Runs Model, 112 PGA-Master-Slave Model, 112 PGA, 116,398 PGA Models of Population Sizing, 43 1 PGAPack, I 1 6 PGP Benchmark Problems, 134 PHM, 396 PHMH. 398 PHYLIP package, 533
Phylogenetic analysis, 520 Phylogenetic trees, 5 17 Physical Parallelism in PSA, 269 Placement and Routing in FPGAs. 138 PMSATS, 398 Pollination Model of PES, 161 Population Based ACO. 175 Principles of EAs, 85 Protein Folding, 520 Pthreads, 68 PTS, 398 PVM, 71,317
Q Q-Q plot, 32 1 quantile-quantile plot, 321 QAP. 3 17 QAPLIB, 3 17 quadratic assignment, 32 I quadratic assignment problem, 3 I7 Quadratic assignment, 294 Quadratic Assignment, 462
R Radio Link Frequency Assignment, 505 RCL, 315 Real Life Applications of PGP, 137 Regular Computational Pattern: Database Searching, 526 Reliability and Connectivity Problems, 496 Reporting Results in Parallel Metaheuristics, 53 Reusing Availat le Software in Bioinfonnatics, 543 RPL2, 1 I6 S
SAGA, 398 Satisfiability Problems, 459 Scalability, 282 Scaled Speedup. 47 Scaleup, 47 Scatter Search Components, 23 I , 235 Search control, 292 Search differentiation, 293 Search space decomposition, 295 Semi-Regular Computational Patterns, 53 I Sequential fan candidate list. 295 Serial Fraction, 48 Set Covering Applications, 458 Set Partitioning and Covering, 457 Set Partitioning Applications, 457 SGA-Cube, 116 SGI Challenge, 319, 327 Short Introduction to Parallel Metaheuristics, 448 SIMD. 63 Simulated Annealing, 9,267-268 Boltzmann Distribution, 269
distributed Evolutionary Simulated Annealing, 276 Evolutionary Simulated Annealing, 275,277 SISD, 63 Skeletons. 305 Sockets, 70 SOTA, 538 SOTAdp, 538 Speedup, 44 Statistical Analysis of Metaheuristics, 52 Steiner problem, 3 I7 Steiner Tree Problem, 456,498 Structured Genetic Algorithms, 110 SUN-SPARC 10,317 Survey of Hybrid Algorithms, 348 Survey of Parallel GAS, 116 Symbolic Regression Problem and PGP, 134 Synchronous, 295 Synchronous parallelization, 292 T
t-tests, 52 Tabu Search, 11,289 Task Scheduling Strategies, 525 Taxonomy, 291 Taxonomy of Parallel Heterogeneous Metaheuristics, 400 Taxonomy of Speedup Measures, 45 TECHS, 398 Telecommunication Network Design, 469 Terminal Set in GP, 128 The p-Median Problem, 229,258,467 The Feature Subset Selection Problem, 232 The Genetic Programming Algorithm, 128 The Traveling Salesman Problem, 471 The VNS Metaheuristic, 248 Theoretical distribution, 320 Theoretical Effects of Migration in PGAs. 434 Theory of Master-Slave Parallel GAS,428 Theory of Multipopulation Parallel GAS, 430 Theory on Cellular Parallel GAS, 437 Three-index assignment problem, 32 I , 327 TPSA, 398 Traffic assignment, 3 19 Trajectory versus population based methods, 8 Tree-Structured Individuals in GP, 128 Two-parameter exponential distribution, 320 Typical GP Problems, 134
U Uncapacited Facility Location Problem. 275,278 V
Various Applications on Telecoms, 508 VNS for the p-Median, 258 VRP, 294,298 vehicle routing, 294 .Jehicle routing probiems. 476 with time constraints, 477