Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen TU Dortmund University, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max Planck Institute for Informatics, Saarbruecken, Germany
6622
Peter Merz Jin-Kao Hao (Eds.)
Evolutionary Computation in Combinatorial Optimization 11th European Conference, EvoCOP 2011 Torino, Italy, April 27-29, 2011 Proceedings
Volume Editors

Peter Merz
University of Applied Sciences and Arts
Department of Business Administration and Computer Science
Ricklinger Stadtweg 120, 30459 Hannover, Germany
E-mail: [email protected]

Jin-Kao Hao
University of Angers, Faculty of Sciences
2, Boulevard Lavoisier, 49045 Angers Cedex 01, France
E-mail: [email protected]
Cover illustration: "Globosphere" by Miguel Nicolau and Dan Costelloe (2010), University of Dublin, Ireland
ISSN 0302-9743 e-ISSN 1611-3349 ISBN 978-3-642-20363-3 e-ISBN 978-3-642-20364-0 DOI 10.1007/978-3-642-20364-0 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2011924397 CR Subject Classification (1998): F.1, C.2, H.4, I.5, I.4, F.2 LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
Metaheuristics continue to demonstrate their effectiveness for an ever-broadening range of difficult combinatorial optimization problems appearing in a wide variety of industrial, economic, and scientific domains. Prominent examples of metaheuristics are evolutionary algorithms, tabu search, simulated annealing, scatter search, memetic algorithms, variable neighborhood search, iterated local search, greedy randomized adaptive search procedures, ant colony optimization and estimation of distribution algorithms. Problems solved successfully include scheduling, timetabling, network design, transportation and distribution, vehicle routing, the travelling salesman problem, packing and cutting, satisfiability and general mixed integer programming. EvoCOP began in 2001 and has been held annually since then. It was the first event specifically dedicated to the application of evolutionary computation and related methods to combinatorial optimization problems. Originally held as a workshop, EvoCOP became a conference in 2004. The events gave researchers an excellent opportunity to present their latest research and to discuss current developments and applications. Following the general trend of hybrid metaheuristics and diminishing boundaries between the different classes of metaheuristics, EvoCOP has broadened its scope in recent years and invited submissions on any kind of metaheuristic for combinatorial optimization. This volume contains the proceedings of EvoCOP 2011, the 11th European Conference on Evolutionary Computation in Combinatorial Optimization.
It was held in Torino, Italy, during April 27–29, 2011, jointly with EuroGP 2011, the 14th European Conference on Genetic Programming, EvoBIO 2011, the 9th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, and EvoApplications 2011 (formerly EvoWorkshops), which consisted of the following 12 individual events: 8th European Event on the Application of Nature-Inspired Techniques for Telecommunication Networks and Other Parallel and Distributed Systems (EvoCOMNET), Second European Event on Evolutionary Algorithms and Complex Systems (EvoCOMPLEX), 5th European Event on Evolutionary and Natural Computation in Finance and Economics (EvoFIN), Third European Event on Bio-inspired Algorithms in Games (EvoGAMES), 6th European Event on Bio-Inspired Heuristics for Design Automation (EvoHOT), 13th European Event on Evolutionary Computation in Image Analysis and Signal Processing (EvoIASP), Second European Event on Nature-Inspired Methods for Intelligent Systems (EvoINTELLIGENCE), 9th European Event on Evolutionary and Biologically Inspired Music, Sound, Art and Design (EvoMUSART), 4th European Event on Bioinspired algorithms for Continuous Parameter Optimization (EvoNUM), 6th European Event on Nature-inspired Techniques in Scheduling, Planning and Timetabling (EvoSTIM), 8th European Event on Evolutionary Algorithms in
Stochastic and Dynamic Environments (EvoSTOC), and 5th European Event on Evolutionary Computation in Transportation and Logistics (EvoTRANSLOG). Since 2007, all these events have been grouped under the collective name EvoStar, and constitute Europe's premier co-located meetings on evolutionary computation. Accepted papers of previous EvoCOP editions were published by Springer in the series Lecture Notes in Computer Science (LNCS Volumes 2037, 2279, 2611, 3004, 3448, 3906, 4446, 4972, 5482, 6022). Below we report statistics for each conference:

EvoCOP            2001   2002   2003   2004   2005   2006   2007   2008   2009   2010   2011
Submitted           31     32     39     86     66     77     81     69     53     69     42
Accepted            23     18     19     23     24     24     21     24     21     24     22
Acceptance ratio  74.2%  56.3%  48.7%  26.7%  36.4%  31.2%  25.9%  34.8%  39.6%  34.8%  52.4%
The rigorous, double-blind reviewing process of EvoCOP 2011 resulted in the selection of 22 out of 42 submitted papers; the acceptance rate was 52.4%. The number of submissions was considerably lower compared to the previous event. The submission numbers dropped for most other events of EvoStar, too. However, the same quality level as that of EvoCOP 2010 could be maintained by carefully selecting the papers for the event. Each paper was reviewed by at least three members of the international Program Committee. All accepted papers were presented orally at the conference and are included in this proceedings volume. We would like to acknowledge the members of our Program Committee: we are very grateful for their thorough work. EvoCOP 2011 contributions consist of new algorithms together with important new insights into how well these algorithms can solve prominent test problems from the literature or real-world problems. We would like to express our sincere gratitude to the two internationally renowned invited speakers, who gave keynote talks at the conference: Craig Reynolds from Sony Computer Entertainment, USA, and Jean-Pierre Changeux, neuroscientist and professor emeritus from the Pasteur Institute, France. The success of the conference resulted from the input of many people, to whom we would like to express our appreciation. First of all, we would like to thank the Local Chair of EvoStar 2011, Mario Giacobini from the University of Torino. He and his team did an extraordinary job, for which we are very grateful. We thank Marc Schoenauer from INRIA in France for his support with the MyReview conference management system. We thank Penousal Machado of the University of Coimbra for an excellent website and publicity material. Thanks are also due to Jennifer Willies and the Centre for Emergent Computing at Napier University in Edinburgh, Scotland, for administrative support and event coordination. We
gratefully acknowledge sponsorship from the Human Genetics Foundation of Torino (HuGeF) as well as the University of Torino. Last, but not least, we would like to thank Carlos Cotta, Peter Cowling, Jens Gottlieb, Jano van Hemert, and Günther Raidl for their hard work and dedication in past editions of EvoCOP, which contributed to making this conference one of the reference events in evolutionary computation and metaheuristics. April 2011
Peter Merz Jin-Kao Hao
Organization
EvoCOP 2011 was organized jointly with EuroGP 2011, EvoBIO 2011, and EvoApplications 2011.
Organizing Committee

Chairs: Peter Merz, University of Applied Sciences and Arts, Hannover, Germany
        Jin-Kao Hao, University of Angers, Angers, France
Local Chair: Mario Giacobini, University of Torino, Italy
Publicity Chair: Penousal Machado, University of Coimbra, Portugal
EvoCOP Steering Committee

Carlos Cotta, Universidad de Málaga, Spain
Peter Cowling, University of Bradford, UK
Jens Gottlieb, SAP AG, Germany
Jin-Kao Hao, University of Angers, France
Jano van Hemert, University of Edinburgh, UK
Peter Merz, University of Applied Sciences and Arts, Hannover, Germany
Günther Raidl, Vienna University of Technology, Austria
Program Committee

Adnan Acan, Middle East Technical University, Ankara, Turkey
Hernán Aguirre, Shinshu University, Nagano, Japan
Enrique Alba, Universidad de Málaga, Spain
Mehmet Emin Aydin, University of Bedfordshire, UK
Thomas Bartz-Beielstein, Cologne University of Applied Sciences, Germany
Maria Blesa, Universitat Politècnica de Catalunya, Spain
Christian Blum, Universitat Politècnica de Catalunya, Spain
Rafael Caballero, University of Málaga, Spain
Pedro Castillo, Universidad de Granada, Spain
Carlos Coello Coello, National Polytechnic Institute, Mexico
Carlos Cotta, Universidad de Málaga, Spain
Peter Cowling, University of Bradford, UK
Keshav Dahal, University of Bradford, UK
Karl Doerner, Universität Wien, Austria
Benjamin Doerr, Max-Planck-Institut für Informatik, Germany
Jeroen Eggermont, Leiden University Medical Center, The Netherlands
Richard F. Hartl, University of Vienna, Austria
Antonio J. Fernández, Universidad de Málaga, Spain
Francisco Fernández de Vega, University of Extremadura, Spain
Bernd Freisleben, University of Marburg, Germany
Philippe Galinier, Ecole Polytechnique de Montreal, Canada
Jens Gottlieb, SAP, Germany
Walter Gutjahr, University of Vienna, Austria
Jin-Kao Hao, University of Angers, France
Geir Hasle, SINTEF Applied Mathematics, Norway
Juhos István, University of Szeged, Hungary
Graham Kendall, University of Nottingham, UK
Joshua Knowles, University of Manchester, UK
Mario Köppen, Kyushu Institute of Technology, Japan
Jozef Kratica, University of Belgrade, Serbia
Rhyd Lewis, Cardiff University, UK
Arne Løkketangen, Molde College, Norway
José Antonio Lozano, University of the Basque Country, Spain
Dirk C. Mattfeld, University of Braunschweig, Germany
Juan Julián Merelo, University of Granada, Spain
Peter Merz, University of Applied Sciences and Arts, Hannover, Germany
Martin Middendorf, Universität Leipzig, Germany
Julian Molina, University of Málaga, Spain
Jose Marcos Moreno, University of La Laguna, Spain
Christine L. Mumford, Cardiff University, UK
Yuichi Nagata, Tokyo Institute of Technology, Japan
Volker Nissen, Technical University of Ilmenau, Germany
Francisco J.B. Pereira, Universidade de Coimbra, Portugal
Jakob Puchinger, Arsenal Research, Vienna, Austria
Günther Raidl, Vienna University of Technology, Austria
Marcus Randall, Bond University, Queensland, Australia
Marc Reimann, Warwick Business School, UK
Andrea Roli, Università degli Studi di Bologna, Italy
Franz Rothlauf, University of Mainz, Germany
Michael Sampels, Université Libre de Bruxelles, Belgium
Marc Schoenauer, INRIA, France
Patrick Siarry, Université Paris-Est Créteil Val-de-Marne, France
Jim Smith, University of the West of England, UK
Christine Solnon, University Lyon 1, France
Giovanni Squillero, Politecnico di Torino, Italy
Thomas Stützle, Université Libre de Bruxelles, Belgium
El-ghazali Talbi, Université des Sciences et Technologies de Lille, France
Kay Chen Tan, National University of Singapore, Singapore
Jorge Tavares, MIT, USA
Jano van Hemert, University of Edinburgh, UK
Jean-Paul Watson, Sandia National Laboratories, USA
Fatos Xhafa, Universitat Politecnica de Catalunya, Spain
Takeshi Yamada, NTT Communication Science Laboratories, Kyoto
Table of Contents

A Guided Search Non-dominated Sorting Genetic Algorithm for the
Multi-Objective University Course Timetabling Problem . . . . . . . . . . . . . . 1
   Sadaf Naseem Jat and Shengxiang Yang

A Hybrid Dual-Population Genetic Algorithm for the Single Machine
Maximum Lateness Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
   Veronique Sels and Mario Vanhoucke

A Kolmogorov-Type Stability Measure for Evolutionary Algorithms . . . . 26
   Matthew J. Craven and Henri C. Jimbo

A Matheuristic Approach for the Total Completion Time Two-Machines
Permutation Flow Shop Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
   Federico Della Croce, Andrea Grosso, and Fabio Salassa

Connectedness and Local Search for Bicriteria Knapsack Problems . . . . . 48
   Arnaud Liefooghe, Luís Paquete, Marco Simões, and José R. Figueira

Cutting Graphs Using Competing Ant Colonies and an Edge Clustering
Heuristic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
   Max Hinne and Elena Marchiori

Effective Variable Fixing and Scoring Strategies for Binary Quadratic
Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
   Yang Wang, Zhipeng Lü, Fred Glover, and Jin-Kao Hao

Evolutionary Multiobjective Route Planning in Dynamic Multi-hop
Ridesharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
   Wesam Herbawi and Michael Weber

Experiments in Parallel Constraint-Based Local Search . . . . . . . . . . . . . . . 96
   Yves Caniou, Philippe Codognet, Daniel Diaz, and Salvador Abreu

Fitness-Probability Cloud and a Measure of Problem Hardness for
Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
   Guanzhou Lu, Jinlong Li, and Xin Yao

Frequency Distribution Based Hyper-Heuristic for the Bin-Packing
Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
   He Jiang, Shuyan Zhang, Jifeng Xuan, and Youxi Wu

From Adaptive to More Dynamic Control in Evolutionary Algorithms . . . 130
   Giacomo di Tollo, Frédéric Lardeux, Jorge Maturana, and
   Frédéric Saubion

Geometric Generalisation of Surrogate Model Based Optimisation to
Combinatorial Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
   Alberto Moraglio and Ahmed Kattan

GPU-Based Approaches for Multiobjective Local Search Algorithms.
A Case Study: The Flowshop Scheduling Problem . . . . . . . . . . . . . . . . . . . . 155
   Thé Van Luong, Nouredine Melab, and El-Ghazali Talbi

Local Search for Mixed-Integer Nonlinear Optimization: A Methodology
and an Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
   Frédéric Gardi and Karim Nouioua

Multi-start Heuristics for the Two-Echelon Vehicle Routing Problem . . . . 179
   Teodor Gabriel Crainic, Simona Mancini, Guido Perboli, and
   Roberto Tadei

NILS: A Neutrality-Based Iterated Local Search and Its Application to
Flowshop Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
   Marie-Eléonore Marmion, Clarisse Dhaenens, Laetitia Jourdan,
   Arnaud Liefooghe, and Sébastien Verel

Off-line and On-line Tuning: A Study on Operator Selection for a
Memetic Algorithm Applied to the QAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
   Gianpiero Francesca, Paola Pellegrini, Thomas Stützle, and
   Mauro Birattari

On Complexity of the Optimal Recombination for the Travelling
Salesman Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
   Anton V. Eremeev

Pareto Local Optima of Multiobjective NK-Landscapes with Correlated
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
   Sébastien Verel, Arnaud Liefooghe, Laetitia Jourdan, and
   Clarisse Dhaenens

Quick-ACO: Accelerating Ant Decisions and Pheromone Updates in
ACO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
   Wei Cheng, Bernd Scheuermann, and Martin Middendorf

Two Iterative Metaheuristic Approaches to Dynamic Memory
Allocation for Embedded Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
   María Soto, André Rossi, and Marc Sevaux

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
A Guided Search Non-dominated Sorting Genetic Algorithm for the Multi-Objective University Course Timetabling Problem

Sadaf Naseem Jat1 and Shengxiang Yang2

1 Department of Computer Science, University of Leicester, University Road, Leicester LE1 7RH, United Kingdom
[email protected]
2 Department of Information Systems and Computing, Brunel University, Uxbridge, Middlesex UB8 3PH, United Kingdom
[email protected]
Abstract. The university course timetabling problem is a typical combinatorial optimization problem. This paper tackles the multi-objective university course timetabling problem (MOUCTP) and proposes a guided search non-dominated sorting genetic algorithm to solve the MOUCTP. The proposed algorithm integrates a guided search technique, which uses a memory to store useful information extracted from previous good solutions to guide the generation of new solutions, and two local search schemes to enhance its performance for the MOUCTP. The experimental results based on a set of test problems show that the proposed algorithm is efficient for solving the MOUCTP.
1 Introduction
In the university course timetabling problem (UCTP), events (subjects, courses) have to be set into a number of time slots and located in suitable rooms while satisfying various constraints. The UCTP is one of the most challenging scheduling problems due to its complexity and highly constrained nature. Usually, the UCTP is NP-hard. It is very difficult to find a general and effective solver for the UCTP due to the diversity of the problem and the variance of constraints from institute to institute. Researchers have proposed various approaches, e.g., constraint-based methods, population-based methods, meta-heuristic, and hyper-heuristic approaches, for timetabling. Most research has taken timetabling as a single-objective problem by combining multiple criteria into a single scalar value and then minimising the weighted sum of constraint violations as the only objective function. Little work has tackled the multi-objective UCTP (MOUCTP). Burke et al. [3] proposed a hyper-heuristic approach for MOUCTPs. Carrasco and Pato [5] applied a bi-objective genetic algorithm (GA) to the class teacher timetabling problem. Datta et al. [6] used the non-dominated sorting GA (NSGA-II) [7] as a university class timetable optimizer. They used a bi-objective model to minimize the soft-constraint violations. A comprehensive review on multi-objective evolutionary algorithms (MOEAs) can be found in [2]. P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 1–13, 2011. © Springer-Verlag Berlin Heidelberg 2011
This paper proposes a guided search non-dominated sorting GA (GSNSGA) to solve the MOUCTP. GSNSGA integrates a guided search technique [9] and local search (LS) techniques into NSGA-II to solve the MOUCTP. NSGA-II [7] is chosen since it has been successfully used for multi-objective problems in different fields, including timetabling [5]. The guided search technique is used to create offspring, increasing the rate of highly fit individuals in the population and leading NSGA-II toward the non-dominated set of solutions, while the LS techniques enhance the performance of GSNSGA by encouraging better convergence and discovering any missing trade-off space. Experimental results on a set of MOUCTP instances show that GSNSGA is a good solver for the MOUCTP.
2 Description of the MOUCTP
The real-world UCTP consists of different constraints: some are hard constraints and some are soft constraints. Hard constraints must not be violated under any circumstances, e.g., a student cannot attend two events at the same time. Soft constraints should preferably be satisfied, e.g., a student should not attend more than two events in a row. It is very tough or even impossible to satisfy all the soft constraints [6]. This requires us to treat the scheduling of the timetable as finding solutions over hard constraints, and optimize them over soft constraints [14]. In this paper, we will test algorithms on the problem instances discussed in [13]. These instances are dealt with as the MOUCTP due to the lack of MOUCTP benchmarks in the literature. We deal with the following hard constraints:
– No student attends more than one event at the same time;
– The room is big enough for all the attending students;
– The room satisfies all the features required by the event;
– Only one event is in a room at any time slot.
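For illustration, the four hard constraints above can be counted for a candidate assignment as in the following sketch. The data layout (dictionaries `assignment`, `attends`, `room_cap`, and the feature sets) is an assumption of this sketch, not the paper's actual encoding.

```python
# Illustrative hard-constraint counter for a candidate timetable.
# Assumed (hypothetical) structures:
#   assignment:     event -> (room, time_slot)
#   attends:        event -> set of student ids
#   room_cap:       room  -> capacity
#   room_features:  room  -> set of provided features
#   event_features: event -> set of required features

def hard_violations(assignment, attends, room_cap, room_features, event_features):
    violations = 0
    by_slot = {}  # time_slot -> list of (event, room)
    for ev, (room, slot) in assignment.items():
        by_slot.setdefault(slot, []).append((ev, room))
        # The room is big enough for all the attending students
        if len(attends[ev]) > room_cap[room]:
            violations += 1
        # The room satisfies all the features required by the event
        if not event_features[ev] <= room_features[room]:
            violations += 1
    for slot, evs in by_slot.items():
        # Only one event is in a room at any time slot
        rooms = [r for _, r in evs]
        violations += len(rooms) - len(set(rooms))
        # No student attends more than one event at the same time
        for i in range(len(evs)):
            for j in range(i + 1, len(evs)):
                if attends[evs[i][0]] & attends[evs[j][0]]:
                    violations += 1
    return violations
```

A feasible timetable is then simply one with `hard_violations(...) == 0`.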
There are also soft constraints, which are equally penalized by the number of their violations and are described as follows:
– A student has an event in the last time slot of a day;
– A student attends more than two events consecutively;
– A student has a single event on a day.
The number of violations of each of the above three kinds of soft constraints can be taken as one objective function to be minimized. Hence, we have three objective functions, f1(x), f2(x), and f3(x), which are associated with the above three kinds of soft constraints, respectively, in the MOUCTP in this paper. In a UCTP, we assign an event (course, lecture) into a time slot and also assign a number of resources (students and rooms) such that there is no conflict between the rooms, time slots, and events. The UCTP consists of a set of n events E = {e1, e2, ..., en} to be scheduled into a set of 45 time slots T = {t1, t2, ..., t45} (9 for each day in a five-day week), a set of m rooms R = {r1, r2, ..., rm} in which events can take place, a set of k students S = {s1, s2, ..., sk} who attend the events, and a set of l available features F = {f1, f2, ..., fl} that are satisfied
Algorithm 1. Guided Search Non-dominated Sorting Genetic Algorithm
1. input: A problem instance I
2. initialise a population P of N solutions
3. apply local search schemes LS1 and LS2 for individuals in P
4. evaluate individuals in P
5. assign rank and crowding distance for individuals in P
6. create data structures
7. set the generation counter g := 0
8. while the termination condition is not reached do
9.   if (g mod τ) == 0 then
10.    apply ConstructMEM() to construct data structures
11.  create a child population Q using GuidedSearch() or Crossover() with a probability γ
12.  apply mutation on individuals in Q with a probability Pm
13.  apply local search schemes LS1 and LS2 for individuals in Q
14.  evaluate the child individuals in Q
15.  merge P and Q and assign rank and crowding distance for individuals
16.  select a new population from the merge of P and Q based on rank and crowding distance
17.  g := g + 1
18. output: A non-dominated set of solutions
by rooms and required by events [13]. In addition, the inter-relationships between these sets are given by five matrices; see [9,13] for details. Usually, a matrix is used for assigning each event to a room ri and a time slot ti. Each pair (ri, ti) is assigned a particular number which corresponds to an event. If a room ri in a time slot ti is free, i.e., no event is placed in it, then "-1" is assigned to that pair. This way, we ensure that no more than one event is assigned to the same pair, so that one of the hard constraints will always be satisfied. For room assignment, we use a matching algorithm described in [13]. For every time slot, there is a list of events taking place in it and a preprocessed list of possible rooms to which the placement of events can occur. The matching algorithm uses a deterministic network flow algorithm and gives the maximum cardinality matching between rooms and events. A solution to a UCTP can be represented as an ordered list of pairs (ri, ti), of which the index of each pair is the identification number of an event ei ∈ E (i = 1, 2, ..., n). For example, the time slots and rooms are allocated to events in an ordered list of pairs like: (2, 4), (3, 30), ..., (2, 7), where room 2 and time slot 4 are allocated to event 1, room 3 and time slot 30 are allocated to event 2, and so on.
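The encoding just described, an ordered list of (room, time slot) pairs indexed by event together with the inverse room-by-slot grid marked with -1 for free cells, can be sketched as follows. The helper name and zero-based event numbering are choices of this sketch, not the paper's implementation.

```python
# Sketch of the solution encoding: `pairs[event] = (room, time_slot)`,
# plus the inverse mapping grid[room][slot] = event id, with -1 marking
# a free (room, slot) cell. Events are 0-based here for convenience
# (the paper numbers them from 1).

NUM_SLOTS = 45  # 9 slots per day over a five-day week

def build_grid(pairs, num_rooms):
    """Build the room x time-slot grid from the ordered pair list."""
    grid = [[-1] * NUM_SLOTS for _ in range(num_rooms)]
    for event, (room, slot) in enumerate(pairs):
        # at most one event per (room, slot) pair, as the text requires
        assert grid[room][slot] == -1, "two events share a (room, slot) pair"
        grid[room][slot] = event
    return grid

# Example from the text: room 2 / slot 4 for the first event,
# room 3 / slot 30 for the second.
pairs = [(2, 4), (3, 30)]
grid = build_grid(pairs, num_rooms=4)
```

The grid view makes the "only one event per room and time slot" hard constraint hold by construction, while the pair list stays the compact representation that the search operators manipulate.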
3 The Proposed GSNSGA for the MOUCTP
The framework of GSNSGA, as shown in Algorithm 1, is based on NSGA-II [7]. Initially, a population P of N individuals is randomly generated. For each individual, each event is assigned a random time slot and a room via the matching algorithm. As random solutions have a low chance of being feasible, two LS methods, denoted LS1 and LS2, are used to convert them into feasible or near-feasible solutions. Then, the individuals in P are ranked by the non-dominated sorting described in [7]. For each individual Ii ∈ P, we calculate the domination count ni (the number of solutions in P which dominate Ii) and the set Si of solutions that Ii dominates. Then, we construct the Pareto fronts from the population round by round as follows. All solutions with ni = 0 form the first Pareto front.
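A minimal sketch of this fast non-dominated sorting from NSGA-II [7], using the domination counts ni and dominated sets Si to peel off the fronts round by round. Objective vectors are minimised; all names are illustrative, not the paper's code.

```python
# Fast non-dominated sorting sketch (NSGA-II style), all objectives minimised.

def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(objectives):
    n = len(objectives)
    dom_count = [0] * n                # n_i: how many solutions dominate i
    dom_set = [[] for _ in range(n)]   # S_i: solutions that i dominates
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objectives[i], objectives[j]):
                dom_set[i].append(j)
            elif dominates(objectives[j], objectives[i]):
                dom_count[i] += 1
    # First front: all solutions dominated by nobody
    fronts = [[i for i in range(n) if dom_count[i] == 0]]
    # Subsequent fronts: remove the current front and collect the
    # solutions whose domination count drops to zero
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dom_set[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front
```

For the MOUCTP each objective vector would hold the three soft-constraint violation counts (f1, f2, f3) of one timetable.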
For each solution Ii in the first Pareto front, we check each member in the set Si and reduce its domination count value by one. If the domination count of a member becomes zero, we put it into a list L. After this round of checking, all the members in L form the second Pareto front. The above checking and ranking procedure continues until all Pareto fronts are identified. After ranking, the crowding distance [7] of each front is calculated, which is used for the density estimation for each individual. The crowding distance of a solution Ii is the average side-length of the cube that encloses the solution without including any other individuals in the population. After assigning ranks and crowding distances, GSNSGA constructs three data structures MEMi (i = 1, 2, 3) to store useful information from the best individuals of the population, which are used to guide the generation of offspring for the following generations. In each generation, a child population Q is first generated using the data structures MEMi (i = 1, 2, 3) or crossover, depending on a probability γ. If crossover is applied, we select two parents according to the rank and crowding distance from the parent population P and apply crossover on them. After that, we perform mutation with a probability Pm. Mutation applies a randomly selected neighbourhood structure N1, N2, N3, or N4 to make a move. After mutation, we merge populations Q and P, assign rank and crowding distances for individuals as above, and select the best N solutions based on the ranks and crowding distances to form the population of the next generation. The iteration continues until a stop condition is reached, e.g., a time limit tmax is reached. The key components of GSNSGA, including the LS schemes, the data structures, and the guided search strategy, are described respectively as follows.

3.1 The LS Schemes (LS1 and LS2)
In GSNSGA, two LS schemes (LS1 and LS2) are applied in order to each individual in the initial population, as well as after a child is created through crossover or the MEM data structure and mutation. The first scheme (LS1), as shown in Algorithm 2, is based on the LS scheme used in [13] with the extension of an additional neighbourhood. LS1 works in two steps based on four neighbourhood structures, denoted N1, N2, N3, and N4, respectively, where N1 is defined by an operator that moves one event from a time slot to a different one, N2 is defined by an operator that swaps the time slots of two events, N3 is defined by an operator that permutes three events in three distinct time slots in one of the two possible ways other than the existing permutation of the three events, and N4 is defined by an operator that swaps the time slots of two consecutive events with the time slots of another two consecutive events. In the first step (lines 2-9 in Algorithm 2), LS1 checks the hard-constraint violations of each event while ignoring its soft-constraint violations. If there are hard-constraint violations for an event, LS1 tries to resolve them by applying moves in the neighbourhood structures N1, N2, N3, and N4 in order, until an improvement is reached or the maximum number of steps smax is reached, which is set to different values for different problem instances. After each move, we apply the matching algorithm to the time slots affected by the move and try to
Algorithm 2. Local Search Scheme 1 (LS1)
1. input: Individual I from the population
2. for each event ei ∈ E do
3.   if event ei is infeasible then
4.     if there is an untried move left then
5.       calculate the moves: first N1, then N2 if N1 fails, then N3 if N2 also fails, and finally N4 if N3 also fails
6.       apply the matching algorithm to the time slots affected by the move to allocate rooms for events
7.       delta evaluate the result of the move
8.       if moves reduce hard-constraint violations then
9.         make the moves and go to line 4
10. if no hard-constraint violations remain then
11.   for each event ei ∈ E do
12.     if event ei has soft-constraint violations then
13.       if there is an untried move left then
14.         calculate the moves: first N1, then N2 if N1 fails, then N3 if N2 also fails, and finally N4 if N3 also fails
15.         apply the matching algorithm to the time slots affected by the move to allocate rooms for events
16.         delta evaluate the result of the move
17.         if moves reduce soft-constraint violations then
18.           make the moves and go to line 13
19. output: A possibly improved individual I

Algorithm 3. Local Search Scheme 2 (LS2)
1. input: Individual I after LS1 is applied
2. while the termination condition is not reached do
3.   S := randomly select a preset percentage of time slots from the total time slots of T
4.   for each time slot ti ∈ S do
5.     for each event j in time slot ti do
6.       calculate the penalty value of event j
7.     sum the total penalty value of events in time slot ti
8.   select the time slot wt with the biggest penalty value from S
9.   for each event i in wt do
10.    calculate a move of event i in the neighbourhood structure N1
11.    apply the matching algorithm to the time slots affected by the move
12.    compute the penalty of event i and delta evaluate the result
13.  if all the moves together reduce hard- or soft-constraint violations then
14.    apply the moves
15.  else
16.    delete the moves
17. output: A possibly improved individual I
resolve the room allocation disturbance and delta-evaluate the result of the move (i.e., calculate the hard- and soft-constraint violations before and after the move). If there is no untried move left in the neighbourhood for an event, LS1 continues to the next event. After applying all neighbourhood moves on each event, LS1 performs the second step (lines 10-18 in Algorithm 2). In the second step, after reaching a feasible solution, LS1 performs a similar process as in the first step on each event to reduce its soft-constraint violations without violating hard constraints. When LS1 finishes, we obtain a possibly improved feasible individual. LS2, as shown in Algorithm 3, is applied immediately after LS1 on an individual. The basic idea of LS2 is to choose a high-penalty time slot, which is likely to contain a large number of events involved in hard- and soft-constraint violations, and to try to reduce the penalty values of the involved events. LS2 first randomly selects a preset
Fig. 1. Illustration of the data structure MEMi (i = 1, 2, 3): for each event ek (e.g., e2, e6, ..., eh) a list lek of Nk room and time slot pairs (r, t) is stored. [Figure content omitted.]
percentage of time slots¹ (e.g., 30% as used in this paper) from the total time slots of T. Then, it calculates the penalty of each selected time slot² and chooses the worst time slot wt, i.e., the one with the biggest penalty value, for local search as follows: LS2 tries a move in the neighbourhood N1 for each event of wt and checks the penalty value of each event before and after applying the move. If all the moves in wt together reduce the hard- and/or soft-constraint violations, then we apply the moves; otherwise, we do not apply them. This way, LS2 not only checks the worst time slot but also reduces the penalty values of some events by moving them to other time slots.
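The slot-sampling step of LS2 can be sketched as follows. The `slots` mapping and `event_penalty` table are hypothetical stand-ins for the timetable representation; the 30% sampling rate follows the paper:

```python
import random

def worst_time_slot(slots, event_penalty, sample_rate=0.3, rng=random):
    """Sample a preset fraction of the time slots, score each sampled slot
    as the sum of the penalties of the events it contains, and return the
    worst (highest-penalty) sampled slot."""
    sample = rng.sample(list(slots), max(1, int(sample_rate * len(slots))))
    return max(sample, key=lambda t: sum(event_penalty[e] for e in slots[t]))
```

Sampling a subset instead of scoring every slot trades a little selection quality for the per-slot penalty computation time, as footnote 1 explains.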
3.2 Data Structures MEMi (i = 1, 2, 3)
Usually, it is assumed that elitism and diversity preservation mechanisms improve the performance of MOEAs [2]. In GSNSGA, we also create extra data structures (memories) to preserve the best parts of individuals in order to guide the generation of offspring. We create three data structures, each of which stores useful information according to one of the three objectives. Figure 1 shows the data structure MEMi (i = 1, 2, 3), associated with the i-th objective. In MEMi, there is a list of events, and each event ek in turn has a list lek of room and time slot pairs. In Fig. 1, Nk represents the total number of pairs in the list lek. The data structures are regularly reconstructed, e.g., every τ generations. Algorithm 4 shows the outline of the (re-)construction of the data structures. When the data structures are due to be (re-)constructed, we first select the α best individuals from the population P to form a set Q. After that, for each individual Ij ∈ Q, we check its objective values. If any of its objectives, say fi(Ij), has a
¹ Rather than choosing the worst time slot out of all the time slots, we randomly select a set of time slots and then choose the worst time slot among them. This is because for each selected time slot we need to calculate its penalty value, which is time-consuming. By selecting a set of time slots instead of all time slots, we try to balance the computational time and the quality of the algorithm.
² The penalty of a time slot is the sum of the penalty values of all the events that occur in the time slot.
Algorithm 4. ConstructMEM() – Constructing the data structures
1.  input: The whole population P
2.  Q ← select the best α individuals in P
3.  for each individual Ij in Q do
4.    for each objective i do
5.      if fi(Ij) = 0 then
6.        for each event ek in Ij do
7.          calculate the penalty value of event ek from Ij
8.          if ek is feasible (i.e., ek has zero constraint violation) then
9.            add the pair of room and time slot (rek, tek) assigned to ek into the list lek in MEMi
10. output: The updated data structures MEMi (i = 1, 2, 3)
zero value, then each event of Ij is checked by its penalty value (the hard and soft constraints associated with this event). If an event has a zero penalty value, then we store the information corresponding to this event into the corresponding data structure MEMi. For example, for an individual Ij ∈ Q, assume f1(Ij) = 0, which means that no student has a class in the last time slot of a day in the solution Ij. If the event e2 of Ij is assigned room 2 at time slot 13 and has a zero penalty value, then we add the pair (2, 13) into the list le2 in MEM1. Similarly, the events of the next individual Ij+1 ∈ Q are checked by their penalty values. If f1(Ij+1) = 0 and the event e2 in Ij+1 has a zero penalty, then we add the pair of room and time slot assigned to e2 in Ij+1 into the existing list le2 in MEM1. If an event em in an individual Ik ∈ Q with f1(Ik) = 0 has a zero penalty and no list lem exists in MEM1 yet, then the list lem is added into MEM1. A similar process is carried out for each individual in Q. Finally, MEMi stores a list of pairs of room and time slot for each event with a zero penalty corresponding to the best individuals of the population regarding the i-th objective. The data structures are then used to generate offspring for the next τ generations before being re-constructed. We update the data structures every τ generations instead of every generation in order to strike a balance between the solution quality and the computational time cost.
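The construction of one memory can be sketched as follows. Individuals are modelled as dictionaries mapping events to (room, time slot) pairs, and `obj_is_zero` and `event_penalty` are illustrative helpers standing in for the objective and penalty evaluation of the paper:

```python
def construct_mem(best_individuals, obj_is_zero, event_penalty):
    """Build MEM_i for one objective (cf. Algorithm 4): for every selected
    individual whose i-th objective equals zero, record the (room, time
    slot) pair of each zero-penalty event."""
    mem = {}   # MEM_i: event -> list of (room, time slot) pairs
    for ind in best_individuals:          # the alpha best individuals Q
        if obj_is_zero(ind):
            for event, pair in ind.items():
                if event_penalty(ind, event) == 0:
                    mem.setdefault(event, []).append(pair)
    return mem
```

An event thus accumulates one stored pair per good individual in which it is penalty-free, matching the running e2 example in the text.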
3.3 Generating a Child by the Guided Search Strategy
In GSNSGA, a child population is created by the guided search strategy or by crossover, controlled by a probability γ. That is, when a child is to be generated, a random number ρ ∈ [0.0, 1.0] is first generated. If ρ < γ, the guided search strategy is used to generate the child; otherwise, a crossover operation is used. If a child is to be created by the guided search strategy, we first randomly select one data structure MEMi and then apply Algorithm 5. In Algorithm 5, we first select a set Es of β ∗ n random events to be generated from MEMi, where β is the fraction of the total number of events n. After that, for each event ek in Es, we randomly select a pair (rek, tek) from the list lek in MEMi that corresponds to the event ek and assign the selected pair to ek for the child. If an event ek in Es has no list lek in MEMi, then we randomly assign a room and a time slot from the possible rooms and time slots to ek for the child. This process is carried out for all events in Es. For the remaining events not present in Es, available rooms and time slots are randomly assigned.
Algorithm 5. GuidedSearch(MEMi) – Generating a child from MEMi
1. input: The MEMi data structure
2. Es := randomly select β ∗ n events
3. for each event ei in Es do
4.   randomly select a pair of room and time slot from the list lei
5.   assign the selected pair to event ei for the child
6. for each remaining event ei not in Es do
7.   assign a random time slot and room to event ei
8. output: A child generated using the MEMi data structure
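The guided generation of a child can be sketched as follows; the dictionary-based representation and the `rng` parameter are illustrative choices, not from the paper:

```python
import random

def guided_child(events, mem, beta, rooms, slots, rng=random):
    """Generate a child as in Algorithm 5: a random beta-fraction of the
    events (the set E_s) reuse a stored (room, time slot) pair from MEM_i
    when one exists; all other events get a random room and time slot."""
    selected = set(rng.sample(events, int(beta * len(events))))
    child = {}
    for e in events:
        if e in selected and mem.get(e):
            child[e] = rng.choice(mem[e])              # reuse a stored pair
        else:
            child[e] = (rng.choice(rooms), rng.choice(slots))
    return child
```

The room-allocation matching and delta evaluation described earlier would still run on the resulting assignment before the child enters the population.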
Table 1. Three groups of problem instances

                                    Small   Medium   Large
Number of events                    100     400      400
Number of rooms                     5       10       10
Number of features                  5       5        10
Approximate features per room       3       3        5
Percentage (%) of features used     70      80       90
Number of students                  80      200      400
Maximum events per student          20      20       20
Maximum students per event          20      50       100

4 Experimental Study
In this section, we experimentally investigate the performance of GSNSGA and NSGA-II [7] for the MOUCTP. The program was coded in GNU C++ version 4.1 and run on a 3.20 GHz PC. We use a set of benchmark problem instances to test the algorithms, which were proposed for the timetabling competition; see [8]. Table 1 presents the data of the UCTP instances of three different groups: 5 small instances, 5 medium instances, and 1 large instance. According to our preliminary experiments, the parameters for GSNSGA and NSGA-II were set as follows: N = 50, α = 0.2 ∗ N = 10, β = 0.4, γ = 0.6, τ = 30, and Pm = 0.6. In the initialisation of the population, the maximum number of steps per LS operation smax was set to 300 for small instances, 1500 for medium instances, and 2500 for the large instance, respectively. There were 20 runs of each algorithm on each problem instance. For each run, the maximum run time tmax was set to 100 seconds for small instances, 1000 seconds for medium instances, and 10000 seconds for the large instance. Firstly, we compare the performance of GSNSGA and NSGA-II regarding the three objective values. The experimental results are shown in Table 2, where S1 to S5 denote small instances 1 to 5, M1 to M5 denote medium instances 1 to 5, and L denotes the large instance. In Table 2, "Best", "Average", and "Std" denote the best, average, and standard deviation of the three objective values over 20 runs, respectively, and "ln" means that over 50% of the results are infeasible. The objective function values of GSNSGA on all problem instances are much smaller than the values for NSGA-II. This shows that local and guided search help the algorithm to find different or unexplored regions of the search space and lead it toward the global optimum. Figure 2 shows the 3-D and 2-D projections of the objective functions
Table 2. Results of NSGA-II and GSNSGA regarding the three objective values

                      Best               Average                    Std
Algo     MOUCTP   f1   f2   f3      f1     f2      f3        f1    f2     f3
NSGA-II  s1       4    5    59      9.43   23.22   87.61     2.23  11.82  11.72
         s2       5    3    72      9.17   21.83   90.94     2.09  9.01   10.01
         s3       5    3    72      8.52   24.42   85.79     2.48  12.31  8.75
         s4       4    1    91      8.88   7.54    111.74    2.07  4.6    13.11
         s5       6    22   52      9.71   48.54   72.06     1.96  15.01  10.82
         m1       38   249  61      44.24  312.78  81.94     2.82  28.29  8.27
         m2       38   277  61      44.67  317.25  82.46     2.93  19.93  8.36
         m3       38   277  58      44.06  346.92  80.94     3.5   26.95  8.47
         m4       38   234  66      44.23  309.79  80.73     2.87  30.09  8.54
         m5       38   234  66      44.23  309.79  80.73     2.87  30.09  8.54
         l        ln   ln   ln      ln     ln      ln        ln    ln     ln
GSNSGA   s1       0    0    0       1.33   6.74    9.91      0.94  3.9    4.66
         s2       0    0    0       1.63   5.75    5.9       1.35  4.12   3.51
         s3       0    0    0       0.65   2.06    7.38      0.76  2.17   5.51
         s4       0    0    0       1.02   1.14    20.46     0.93  1.48   8.86
         s5       0    0    0       1.52   2.04    15.1      1.52  2.22   6.84
         m1       3    95   15      8.74   138.76  32.52     3.75  23.25  11.55
         m2       7    94   3       13     176.6   21        2.73  24.68  6.73
         m3       1    95   5       6.9    145.81  16.69     2.33  20.29  7.88
         m4       0    38   2       7.15   88.81   22.38     5.36  20.29  15.44
         m5       5    94   15      23.15  150.4   43.15     9.72  25.71  9.72
         l        30   221  89      39.72  345.94  124.64    5.46  73.62  21.1
of the Pareto fronts of NSGA-II and GSNSGA on S1 and M1, respectively. The scale of Fig. 2 is based on the objective values of the non-dominated solutions. From Fig. 2, it can be seen that there is a huge difference between the objective values of the two algorithms. For example, on M1, the minimal f3(x) value of NSGA-II is greater than the maximal f3(x) value of GSNSGA. Secondly, we compare the performance of NSGA-II and GSNSGA regarding other performance measures used for MOEAs. As the true Pareto front of the problems is unknown, we use two performance measures, the hypervolume [15] and the D metric [15], which are not based on the true Pareto front. The first measure concerns the size of the objective space that is covered by a set of non-dominated solutions: the higher the value, the larger the dominated volume in the objective space and hence the better an algorithm's performance. The D metric between two non-dominated sets A and B gives the relative size of the region in the objective space that is dominated by A but not by B, and vice versa. It also indicates whether either set totally dominates the other, e.g., D(A, B) = 0 and D(B, A) > 0 means that A is totally dominated by B. Since in this paper the focus is on finding the Pareto optimal set rather than obtaining a uniform distribution over a trade-off surface, we do not consider the online performance of MOEAs but the offline performance. Hence, the Pareto optimal set regarding all individuals generated over all generations is taken as the output of an MOEA. The performance of a particular algorithm on a test problem was calculated by averaging over all 20 runs. Table 3 shows the values of the hypervolume and D metric of NSGA-II and GSNSGA on the test instances. It can be seen that GSNSGA covers a larger objective value space compared with NSGA-II on all problem instances. It is also
Fig. 2. Comparison of NSGA-II and GSNSGA regarding the three objective values (3-D and 2-D projections of the Pareto fronts on S1 and M1; plot data omitted)

Table 3. Comparison of algorithms regarding the hypervolume and D metric measures

                  Hypervolume                  D metric
MOUCTP   NSGA-II     GSNSGA       NSGA-II vs GSNSGA   GSNSGA vs NSGA-II
S1       5.3 × 10^6  8.0 × 10^6   0                   7.8 × 10^6
S2       4.8 × 10^6  7.9 × 10^6   0                   7.5 × 10^6
S3       4.8 × 10^6  8.0 × 10^6   0                   7.4 × 10^6
S4       4.2 × 10^6  7.0 × 10^6   0                   6.8 × 10^6
S5       5.0 × 10^6  7.9 × 10^6   0                   7.6 × 10^6
M1       5.0 × 10^7  9.7 × 10^7   0                   9.5 × 10^7
M2       4.4 × 10^7  9.4 × 10^7   0                   9.3 × 10^7
M3       4.5 × 10^7  9.9 × 10^7   0                   9.4 × 10^7
M4       5.3 × 10^7  1.1 × 10^8   0                   9.2 × 10^7
M5       5.1 × 10^7  9.4 × 10^7   0                   9.3 × 10^7
L        −           3.1 × 10^8   −                   −
Table 4. Comparison of GSNSGA and other algorithms on the test problem instances

MOUCTP   GSNSGA   GSGA   TSRW   VNS    GBHH   EGD    NLGD
         Best     Best   Best   Best   Best   Best   Best
S1       0        0      0      0      6      0      3
S2       0        0      –      0      7      0      4
S3       0        0      –      0      3      0      6
S4       0        0      –      0      3      0      6
S5       0        0      –      0      4      0      0
M1       113      240    173    317    372    80     140
M2       104      160    224    313    419    105    130
M3       101      242    160    357    359    139    189
M4       42       158    –      247    348    88     112
M5       114      124    –      292    171    88     141
L        340      801    –      ln     1068   730    873
evident from the D metric values that there is no region of the objective space in which GSNSGA is dominated by NSGA-II. On all test problems, GSNSGA outperformed NSGA-II regarding the two performance measures. Thirdly, a comparison of GSNSGA with other published results was also conducted in order to assess the effectiveness of GSNSGA against other optimisation methods. Since most published results are based on the single-objective UCTP, we also compare the results of GSNSGA by aggregating the three objective values into one objective. Table 4 shows the results, where "−" means no result is available in the literature, "ln" means no feasible solution was found for the problem instance, and the best results among all approaches are shown in bold. In Table 4, GBHH [4] denotes a graph-based hyper-heuristic with tabu search for the UCTP, GSGA [9] denotes the guided search GA with LS for the UCTP, VNS [1] denotes a variable neighbourhood search, EGD [11] denotes an extended great deluge method, NLGD [10] denotes a non-linear great deluge algorithm, and TSRW [3] denotes the tabu search roulette wheel hyper-heuristic for the three-objective UCTP. From Table 4, it can be seen that GSNSGA obtained the best results for 9 out of the 11 problem instances and the second best results for the other 2. In summary, GSNSGA is able to produce high quality solutions, regardless of how many objectives the problem has, in comparison to other methods.
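The Pareto dominance relation underlying the D metric, and a Monte Carlo estimate of the hypervolume for minimised objectives with a fixed reference point, can be sketched as follows. This is an illustrative approximation, not the exact computation of [15]:

```python
import random

def dominates(a, b):
    """a dominates b (minimisation): no worse in every objective and
    strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def hypervolume_mc(points, ref, n_samples=20000, seed=0):
    """Monte Carlo estimate of the volume inside the box [0, ref] that is
    weakly dominated by at least one point of the non-dominated set."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        s = [rng.uniform(0, r) for r in ref]
        if any(all(pd <= sd for pd, sd in zip(p, s)) for p in points):
            hits += 1
    box_volume = 1.0
    for r in ref:
        box_volume *= r
    return box_volume * hits / n_samples
```

A larger estimate means a larger dominated region and hence, by the discussion above, a better approximation set.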
5 Conclusions and Future Work
This paper presents an MOEA that combines guided search and LS techniques with NSGA-II to solve the MOUCTP. NSGA-II gives good results on MOUCTPs, but when it is integrated with the guided search and LS techniques, the improvement is noticeable. The data structures introduced in GSNSGA improve the quality of individuals by storing parts of former good solutions, which would otherwise have been lost in the selection process, and reusing the stored information to guide the generation of offspring. This enables GSNSGA to quickly retrieve the best assignments found in previous populations. The experimental results show that GSNSGA is competitive across all test problems. It gives good results by producing a set of non-dominated solutions for the user to choose the
most appropriate one, rather than restricting the user to a single solution. It can also be seen that the addition or deletion of constraints or objectives does not affect the performance of GSNSGA much, because each objective function is treated separately. Hence, GSNSGA is appropriate for the MOUCTP. There are several relevant directions for future work. One is to examine the performance of guided and local search with other MOEAs for the MOUCTP. We also intend to test our approach on other problem instances and to devise new genetic operators and neighbourhood techniques based on different problem constraints.
References

1. Abdullah, S., Burke, E.K., McCollum, B.: An investigation of variable neighbourhood search for university course timetabling. In: Proc. of the 2nd Multidisciplinary Int. Conf. on Scheduling: Theory and Appl., pp. 413–427 (2005)
2. Konak, A., Coit, D.W., Smith, A.E.: Multi-objective optimisation using genetic algorithms: A tutorial. Reliability Engineering and System Safety 91(9), 992–1007 (2006)
3. Burke, E.K., Silva, J.D.L., Soubeiga, E.: Multi-objective hyper-heuristic approaches for space allocation and timetabling. In: Ibaraki, T., Nonobe, K., Yagiura, M. (eds.) Meta-heuristics: Progress as Real Problem Solvers, ch. 6, pp. 129–158. Springer, Heidelberg (2003)
4. Burke, E.K., McCollum, B., Meisels, A., Petrovic, S., Qu, R.: A graph-based hyper-heuristic for educational timetabling problems. Europ. J. of Oper. Res. 176(1), 177–192 (2007)
5. Carrasco, M.P., Pato, M.V.: A multiobjective genetic algorithm for the class/teacher timetabling problem. In: Burke, E., Erben, W. (eds.) PATAT III. LNCS, vol. 2079, pp. 3–17. Springer, Heidelberg (2001)
6. Datta, D., Deb, K., Fonseca, C.M.: Multi-objective evolutionary algorithm for university class timetabling problem. In: Dahal, K.P., Tan, K.C., Cowling, P.I. (eds.) Evolutionary Scheduling, pp. 197–236. Springer, Heidelberg (2007)
7. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 849–858. Springer, Heidelberg (2000)
8. http://iridia.ulb.ac.be/supp/IridiaSupp2002-001/index.html
9. Jat, S.N., Yang, S.: A guided search genetic algorithm for the university course timetabling problem. In: Proc. of the 4th Multidisciplinary Int. Conf. on Scheduling: Theory and Appl., pp. 180–191 (2009)
10. Landa-Silva, D., Obit, J.H.: Great deluge with non-linear decay rate for solving course timetabling problem. In: Proceedings of the 2008 IEEE Conference on Intelligent Systems (IS 2008), pp. 8.11–8.18. IEEE Press, Los Alamitos (2008)
11. McMullan, P.: An extended implementation of the great deluge algorithm for course timetabling. In: Shi, Y., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) ICCS 2007. LNCS, vol. 4487, pp. 538–545. Springer, Heidelberg (2007)
12. Paquete, L.F., Fonseca, C.M.: A study of examination timetabling with multiobjective evolutionary algorithms. In: Proc. of the 4th Metaheuristics Int. Conf., pp. 149–154 (2001)
13. Rossi-Doria, O., Sampels, M., Birattari, M., Chiarandini, M., Dorigo, M., Gambardella, L., Knowles, J., Manfrin, M., Mastrolilli, M., Paechter, B., Paquete, L., Stützle, T.: A comparison of the performance of different metaheuristics on the timetabling problem. In: Burke, E.K., De Causmaecker, P. (eds.) PATAT IV. LNCS, vol. 2740, pp. 329–351. Springer, Heidelberg (2003)
14. Rudová, H., Murray, K.: University course timetabling with soft constraints. In: Burke, E.K., De Causmaecker, P. (eds.) PATAT IV. LNCS, vol. 2740, pp. 310–328. Springer, Heidelberg (2003)
15. Zitzler, E.: Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications. Shaker, Ithaca (1999)
A Hybrid Dual-Population Genetic Algorithm for the Single Machine Maximum Lateness Problem

Veronique Sels¹ and Mario Vanhoucke¹,²

¹ Faculty of Economics and Business Administration, Ghent University, Tweekerkenstraat 2, 9000 Gent, Belgium
[email protected]
² Operations and Technology Management Centre, Vlerick Leuven Gent Management School, Reep 1, 9000 Gent, Belgium
[email protected]
Abstract. We consider the problem of scheduling a number of jobs, each with a release time, a processing time and a due date, on a single machine with the objective of minimizing the maximum lateness. We developed a hybrid dual-population genetic algorithm and compared its performance with alternative methods on a new diverse data set. The extension from a single to a dual population, which takes problem-specific characteristics into account, acts as a stimulator to add diversity to the search process, which has a positive influence on the important balance between intensification and diversification. Based on a comprehensive literature study on genetic algorithms in single machine scheduling, a fair comparison of genetic operators was made.

Keywords: Single machine scheduling, maximum lateness, genetic algorithm, dual-population structure.
1 Introduction
The single machine scheduling (SMS) problem addressed in this paper often occurs as a subproblem in solving other scheduling environments such as flow or job shops. The problem can be described as follows: there is a set N of n jobs (index j, j = 1, 2, . . . , n) that have to be scheduled on a single machine. The machine is assumed to be continuously available and can process at most one job at a time. The jobs may not be preempted, and each job j is characterized by its processing time pj and its due date dj. Due to the dynamic nature of the subproblem, jobs arrive over time at the single machine, and therefore each job is further characterized by a release time rj. The objective is to find a schedule that minimizes the maximum lateness, Lmax. Based on the α|β|γ classification scheme of [6], the problem under study can be written as 1|rj|Lmax. This scheduling problem is known to be NP-hard [10]. The problem without release times, 1||Lmax, can be optimally solved in polynomial time by sequencing the jobs in non-decreasing order of their due dates [9].

P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 14–25, 2011. © Springer-Verlag Berlin Heidelberg 2011

However, the addition of arbitrary release times makes
the problem of minimizing the maximum lateness on a single machine much more complex. Only a few special cases have been shown to be solvable in polynomial time. For the other cases, numerous enumerative as well as approximation approaches have been proposed in the literature. A complete overview of the literature on the SMS maximum lateness problem can be found in [15]. Some noteworthy enumerative approaches are the branch-and-bound procedures of [11] and [2]. The algorithm of Carlier is considered to be one of the most efficient algorithms for the problem under study and has proven to be especially useful for solving large problem sets. The algorithm has been discussed and improved in later work by, among others, [13]. Our work is, to some extent, based on the findings of the papers mentioned above. More precisely, the conclusions of [2] and [13] concerning the difficulty of their data instances motivated us to perform a critical analysis of the instances used in our computational experiments. The results are described in section 3.2 and further discussed in section 4.3. The contribution of this paper is threefold. First, an in-depth comparison of genetic operators used in the SMS literature is performed. Second, the effect of using a dual-population structure is investigated. Third, a new data generation method is described and the obtained data instances are carefully examined and analyzed.
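The polynomial-time result for 1||Lmax mentioned above (the earliest-due-date rule of [9]) can be illustrated with a short sketch; the (processing_time, due_date) tuple representation is an illustrative choice:

```python
def edd_max_lateness(jobs):
    """EDD rule for 1||Lmax: with no release times, sequencing jobs in
    non-decreasing due-date order minimizes the maximum lateness.
    Each job is a (processing_time, due_date) pair."""
    t = 0
    lmax = float("-inf")
    for p, d in sorted(jobs, key=lambda job: job[1]):  # non-decreasing d_j
        t += p                   # completion time C_j on the single machine
        lmax = max(lmax, t - d)  # lateness L_j = C_j - d_j
    return lmax
```

With arbitrary release times rj this simple rule is no longer optimal, which is exactly what makes 1|rj|Lmax hard.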
2 Genetic Algorithm
The genetic algorithm (GA) is a well-known search technique for solving optimization problems based on the principles of genetics and natural selection. The method was initially developed by John Holland in the 1970s [8], who was inspired by Charles Darwin's theory of evolution. Genetic algorithms have been applied to a wide variety of scheduling problems, including the single machine scheduling problem. An overview of these GA applications in the SMS literature is given in [15]. In this paper, we present a hybrid genetic algorithm for the single machine maximum lateness problem with ready times and distinct due dates. This GA was developed by comparing the different genetic operators described in the literature. As such, the experience gained on other SMS problems is used to build an effective GA for the problem under study. Moreover, we borrowed some elements from the scatter search technique, such as the dual-population structure and the diversification generation method of [4], to enhance the important balance between diversification and intensification. As such, we focus not only on the quality of a solution, but also on its diversity. The alternative genetic operators were thoroughly tested to find the best combination for the SMS maximum lateness problem. The test results are given in section 4.2. For more information on the different genetic operators and their corresponding references in the literature, we refer to [15]. In the following paragraphs, we give an overview of the hybrid dual-population genetic algorithm we implemented; a general outline is given in figure 1, followed by a detailed description of the different genetic operators.
Fig. 1. Overview of the GA (flowchart showing the population's two subsets with quality and diversity checks, selection, crossover with cross_rate and comb_rate, mutation with mut_rate, local search, evaluation, and replacement, iterated until the best solution is returned)
Solution Representation. The most natural way to represent a schedule in single machine scheduling is permutation encoding, in which every chromosome is a permutation of the n jobs where each job appears exactly once. As such, it is ensured that each chromosome corresponds to a unique schedule and each schedule corresponds to a unique chromosome. The permutation of jobs is translated into a feasible schedule using a list engine, which simply keeps the jobs in the order of the chromosome and assigns the earliest possible starting time to each. As such, semi-active schedules are built, in which no job can be completed earlier without changing the order of processing.

Population. In order to ensure a certain degree of diversity, a dual-population structure borrowed from the scatter search framework is used in the GA. The population is split into a high-quality subset (subset 1) and a highly diverse subset (subset 2). The elements of the first subset are generated randomly and seeded with some good constructive heuristic solutions, such as the Schrage heuristic [11]. The elements of the second subset are generated according to the diversification generator for permutation problems described in [4]. The diversity in both sets is guaranteed by a distance measure calculated between every pair of solutions. This A-distance measure of [1] is equal to the sum of all absolute differences between the positions of all items in strings p and q:

d(p, q) = Σ_i |p_i − q_i|    (1)
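Equation (1) can be computed directly from the two job sequences; a minimal sketch:

```python
def a_distance(p, q):
    """A-distance between permutations p and q: the sum of the absolute
    differences between the positions of each job in the two sequences."""
    pos_in_q = {job: i for i, job in enumerate(q)}
    return sum(abs(i - pos_in_q[job]) for i, job in enumerate(p))
```

Identical permutations have distance 0, and the distance grows as jobs sit farther from their counterpart positions.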
For the first subset, a moderate diversity threshold value ensures that the x unique solutions with the highest fitness values are used in the initial population. The elements of the second subset only enter the initial population when a much more severe diversity threshold with respect to all of the solutions in the first subset is exceeded.

Selection. When the initial population is constructed, the algorithm has to select parent solutions that will generate new children for the next generation.
A Dual-Population GA for the SM Maximum Lateness Problem
17
Parents can be selected from the first as well as the second subset. Solutions from the high-quality subset are selected according to their fitness value: the better the fitness value, the higher the chance that the parent is selected. Solutions from the second subset are selected based on their distance measure: their chance of being selected increases with increasing distance. The selection methods used in the single machine scheduling literature can be broadly classified into two classes. Proportionate-based selection selects parents according to their fitness value (distance measure) relative to the fitness values (distance measures) of the other solutions in the population; an example of such a selection method is roulette wheel selection (RWS). Ordinal-based selection, on the other hand, selects parents according to their rank within the population; examples are tournament selection (TS) and ranking selection (RS).

Crossover. In a next step, the selected parents are recombined to create a new child or offspring. This is done by the crossover operator, which is executed with a certain probability, the cross_rate. Crossover can occur between solutions within the first subset or between solutions of the first and second subset. For this, we introduce a probability measure, the comb_rate, that controls the combination of high-quality with highly diverse solutions. Crossover operators for permutation encoding can be roughly classified into three classes: one that preserves the relative order of the jobs, one that respects the absolute position of the jobs, and one that tends to preserve the adjacency information of the jobs. As we want to minimize the maximum lateness, the relative order and/or the absolute position of each job is more relevant to the total fitness of the schedule than the adjacency information. For that reason, the crossover operators implemented in our algorithm belong to one of the first two classes.
These include (linear) order crossover (OX), (uniform) order-based crossover (OBX), cycle crossover (CX), position-based crossover (PBX) and partially mapped crossover (PMX).

Mutation. After a number of generations, the chromosomes become more homogeneous and the population starts to converge. Together with the dual-population structure of our GA, the mutation operation serves as a tool to introduce diversification into the population. Mutation also occurs with a particular probability, the mut_rate. The mutation operators we tested include commonly used mutation operators for permutation encoding, such as swap mutation (SM), insertion mutation (InsM) and inversion mutation (InvM).

Local Searches. In order to intensify the search process, we hybridize our GA with a local search algorithm. Before introducing the offspring solutions into the population, a local search technique is used to improve them. A local search algorithm iteratively searches through the solution space by moving from one solution to another. It replaces the current solution by a better neighboring solution until no more improvements can be found or until some stopping criterion is met. With respect to this stopping criterion, we define a maximum number of schedules that each local search may explore during every generation of the GA. Since the risk of converging to a poor local optimum exists, the local searches are sometimes allowed to accept non-improving or neutral moves. An important
18
V. Sels and M. Vanhoucke
feature in the local search algorithm is the neighborhood (NH) function. This function specifies the possible neighbors of a solution. As a solution is represented by a permutation of jobs, the most common neighborhood functions used are the insertion and the swap NH [12]. An insertion NH function generates a neighbor by deleting a job from the sequence and inserting it at another position in that sequence. In a swap NH, a neighboring solution is generated by interchanging two (or more) arbitrary or adjacent jobs of the sequence. Common local search techniques from SMS literature, based on either a swap or an insertion neighborhood, include the randomized pairwise swap (RPS), the adjacent pairwise swap (APS), the 3-swap (3S), the insertion (INS) and the largest cost insertion (LCI). Replacement and Termination Condition. Once the new offspring solutions are created, they have to be inserted in the population. We use an incremental replacement strategy, where the offspring solutions replace the least fit solutions of the current population. If the solutions are not allowed into the new population, they are checked on their diversity using equation 1. If their distance is greater than the smallest distance in the current diverse subset, the solution is accepted in the new population. The algorithm is stopped within the time limit of one second or when the optimal solution is found. A solution is optimal when the job with the largest lateness starts at its release time or when the solution equals the lower bound.
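To make the operator classes concrete, the following is an illustrative sketch of one order-preserving crossover (a common OX variant) and the swap mutation (SM) for permutation-encoded schedules. This is our own reconstruction of textbook operators named in the text, not the authors' implementation.

```python
import random

def order_crossover(p1, p2):
    """One common OX variant: copy a random slice from parent 1, then
    fill the remaining positions with the missing jobs in the relative
    order in which they appear in parent 2."""
    n = len(p1)
    i, j = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    taken = set(child[i:j + 1])
    fill = iter(job for job in p2 if job not in taken)
    for k in range(n):
        if child[k] is None:
            child[k] = next(fill)
    return child

def swap_mutation(perm):
    """Swap mutation (SM): interchange two arbitrary jobs."""
    s = list(perm)
    a, b = random.sample(range(len(s)), 2)
    s[a], s[b] = s[b], s[a]
    return s
```

Both operators return a valid permutation of the same jobs, which is the defining requirement for permutation encodings.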
3 Data Generation

3.1 Generation of Instances
In the literature on the 1|rj|Lmax problem, no standard data set can be found. This motivated us to analyze the methods described in the literature and translate them into two simple generation methods. A distinction was made based on how the due dates of the jobs are generated.

The first method generates due dates that depend on the release and/or processing times of the jobs. The method is based on the due date assignment methods described in the paper of [5]. These dependent due dates are uniformly distributed between rj + k·pj and rj + k·pj + q. The parameter k represents the due date tightness, while parameter q defines the slack allowance. The larger k and q, the more slack a job has to be scheduled between its release time and its due date. The second generation method generates due dates that do not depend on the jobs' processing times and/or ready times, but only on the sum of the processing times of all jobs. In general, the independent generation method is based on the widely used techniques of [14] and [7]. These due dates were generated from the independent uniform distribution U[a·Σpj, b·Σpj], where a and b are (wide-ranging) parameters that define the range and location of due dates relative to the period that the machine processes the jobs (a ≤ b).

These two generation methods result in two different data sets, set I and set II, for dependent and independent due dates respectively. The problem size is set to 100 jobs. The job processing times pj are uniformly distributed integers between 1 and 100. The release times rj are integers generated from the uniform distribution U[0, l·Σpj], where l defines the range of distribution of rj. The smaller l, the more frequently a job arrives in the system. An overview of the parameter settings can be found in Table 1.

Table 1. Instance parameters

Parameter          | Set I                                 | Set II
processing time pj | U[1,100]                              | U[1,100]
release time rj    | U[0, l·Σpj], l = 0, 0.25, 0.5, ..., 4 | U[0, l·Σpj], l = 0, 0.25, 0.5, ..., 2
due date dj        | U[rj + k·pj, rj + k·pj + q],          | U[a·Σpj, b·Σpj],
                   | k = -1, 0, 5, 10; q = 0, n, 5n, 10n   | a = 0, 0.25, ..., 1; b = 0, 0.25, ..., 1.5

In both sets, every combination of release and due date parameter values leads to a certain problem class. This results in 272 problem classes for set I and 225 classes for set II. For each problem class, ten instances were generated, resulting in a total of 4,970 instances¹.
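The Set I (dependent due date) generation method described above can be sketched as follows. The function name and default values are ours, and the interpretation of the release time range as U[0, l·Σpj] is an assumption drawn from the surrounding text.

```python
import random

def generate_instance_set1(n=100, l=1.0, k=5, q=None):
    """Illustrative sketch of the dependent (Set I) generation method:
    pj ~ U[1,100], rj ~ U[0, l * sum(pj)],
    dj ~ U[rj + k*pj, rj + k*pj + q]."""
    if q is None:
        q = 5 * n  # one of the paper's slack allowance settings
    p = [random.randint(1, 100) for _ in range(n)]
    total = sum(p)
    r = [random.randint(0, int(l * total)) for _ in range(n)]
    d = [random.randint(r[j] + k * p[j], r[j] + k * p[j] + q)
         for j in range(n)]
    return p, r, d
```

With k = -1, due dates may fall before a job's earliest completion time, which is consistent with the tightness interpretation of k.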
3.2 Data Analysis
An analysis of the data obtained is performed to make a distinction among the difficulty levels of the instances and thus to provide more insight into the performance of the genetic algorithm. Figure 2 illustrates this analysis. On the left axes, the due date ranges are given by the different parameter values (i.e., k and q for set I and a and b for set II). The release time ranges, with parameter l, are shown on the bottom axes. The cells in the figures represent the different problem classes, each cell containing ten problem instances. The shaded areas designate the different difficulty levels of the corresponding instances. The darker the shade, the less difficult the class of instances is.

Fig. 2. Data analysis for both data sets based on the due date and release time parameter values. (Legend shades: TO, 100%, 80%, 60%, 40%, 20%, 0%.)

These difficulty levels are based on computational experience obtained by analyzing the solution produced by the Schrage heuristic [11] for every instance. From this analysis, it could be observed that instances of some problem classes were already solved optimally. As a consequence, we could presume that some instances were rather 'easy' and others rather 'hard' to solve. We conjectured that the more instances per problem class could be solved with the Schrage heuristic, the less difficult that class of instances was. As a result, the difficulty of the problem classes can be subdivided into different levels.

The first level contains problem classes in which every instance was solved optimally by the Schrage heuristic. The theoretical optimal instances, denoted by TO, have release times equal to zero (l = 0), due dates equal to their release times (k = q = 0) or common due dates (a = b), and are known to be solved optimally by the Schrage heuristic. The empirical optimal instances were observed to be solved optimally by the EDD rule, but no theoretically founded explanation can be given. We presume that their optimality results from the fact that their release time and/or due date ranges are closely related to those of the theoretical optimal instances. These classes are represented by the 100% regions in figure 2, denoting that all instances were solved optimally. The next level of difficulty includes problem classes with instances for which the Schrage heuristic was mostly able to find the optimal solution. These are called the empirical easy instances; for 60% to 90% of the instances, the optimal solution was found. The third level comprises the empirical moderate problem classes. Some instances, 20% to 50%, were solved optimally, but in general an optimal solution could not be found. The last level contains the empirical hard problem instances, for which the Schrage heuristic was not able to find the optimal solution. At first sight, when looking at the shaded areas, it seems that the instances of set I are less difficult than the instances of set II. This conjecture will be assessed in a computational experiment described in section 4.3.

¹ The problem instances can be downloaded from the website www.projectmanagement.ugent.be/machinescheduling.html.
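The Schrage heuristic used to grade difficulty above schedules, whenever the machine becomes free, the already-released job with the earliest due date, jumping forward in time when no job is available. The following is our own sketch of this well-known heuristic, not the authors' implementation.

```python
import heapq

def schrage(p, r, d):
    """Schrage heuristic sketch for 1|rj|Lmax: at each decision point,
    schedule the released job with the earliest due date; if no job is
    released, advance time to the next release."""
    n = len(p)
    by_release = sorted(range(n), key=lambda j: r[j])
    ready, seq = [], []
    t, i, lmax = 0, 0, float("-inf")
    while len(seq) < n:
        # release all jobs whose release time has passed
        while i < n and r[by_release[i]] <= t:
            j = by_release[i]
            heapq.heappush(ready, (d[j], j))
            i += 1
        if not ready:
            t = r[by_release[i]]  # machine idles until next release
            continue
        _, j = heapq.heappop(ready)
        t += p[j]
        seq.append(j)
        lmax = max(lmax, t - d[j])
    return seq, lmax
```

Running this over each generated instance and counting how often the returned Lmax matches the optimum is the kind of analysis the difficulty levels are based on.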
4 Computational Experiments

4.1 Parameter Fine-Tuning
In order to refine our hybrid dual-population genetic algorithm, decisions with respect to the parameter values have to be made. Examples are the population size and the combination, crossover and mutation rates. These parameters were fine-tuned by performing tests in cycles. In each cycle, a single parameter was chosen to be fine-tuned, while the other parameters were set to a certain value. All possible values for that parameter were tested and its best value was fixed before going to the next parameter. This process was repeated for all parameters until no more improvement was found.

In Table 2, the different test values of all parameters are given, together with their best value for both data sets.

Table 2. Parameter values used in the GA

Parameter  | Test values         | Best value Set I | Best value Set II
pop size   | n, 2n, 5n, n/2, n/5 | n/5              | n/2
comb rate  | 0, 0.1, ..., 1      | 0.85             | 0.85
cross rate | 0, 0.1, ..., 1      | 0.5              | 0.6
mut rate   | 0, 0.1, ..., 1      | 0.1              | 0.3

The table shows that, for both sets, the population size is relatively small thanks to the intelligence of the algorithm and the dual-population structure. The combination rate, which determines the probability of combining a high quality with a highly diverse population element, is set to 85%. This means that there is an 85% chance of combining elements within the first subset of the population and a 15% chance of combining elements from the first with the second subset of the population. The optimal crossover and mutation rates are equal to 50% and 60%, and 10% and 30%, respectively.
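The cyclic fine-tuning procedure described above amounts to a coordinate-wise search over the parameter grid. A hypothetical sketch, where `evaluate` is a placeholder for running the GA with a candidate setting and returning its generation count (lower is better):

```python
def tune_in_cycles(param_space, evaluate):
    """Coordinate-wise tuning sketch: optimise one parameter at a time,
    holding the others fixed, and repeat the cycle until a full pass
    yields no improvement. `param_space` maps parameter name to its
    list of test values; `evaluate` maps a setting dict to a score."""
    current = {name: values[0] for name, values in param_space.items()}
    best = evaluate(current)
    improved = True
    while improved:
        improved = False
        for name, values in param_space.items():
            for v in values:
                trial = dict(current, **{name: v})
                score = evaluate(trial)
                if score < best:
                    best, current, improved = score, trial, True
    return current, best
```

Like any coordinate search, this can stall when parameters interact strongly, which is consistent with the correlations between operators noted later in the text.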
4.2 Operator Selection
In this section, the search for the best combination of genetic operators is discussed, given the ideal parameters of the previous section. We performed a full factorial design where every possible combination was tested in cycles. For each genetic operator, the best alternative was obtained by fixing the other operators to their best alternative. In Table 3, the results of the design are given. The values in the table are obtained by calculating the relative performance of the genetic algorithm. The relative performance (RP) of the genetic algorithm is measured by the deviation of the objective values found by the GA from a lower bound and is equal to

RP = (GA − LB) / LB,    (2)

where GA is the sum of the objective values over some set of instances obtained by the GA and LB is the sum of the lower bounds over the same set of instances. This lower bound was obtained by combining four lower bound calculations from the literature, as described in [11], [2] and [3].

For set I, the best combination is to use the tournament selection method together with the position-based crossover, the single swap mutation and the largest cost insertion local search algorithm. The combination of the tournament selection method with the cycle crossover, the inversion mutation and the randomized pairwise swap improvement heuristic turned out to be the best for set II. However, when looking at the table, it can be noticed that, in comparison with set I, the differences between the operators of set II are relatively small. The choice of genetic operators seems to be of little importance for set II (i.e., with independent due dates). Moreover, the deviations from the lower bound are in general very small. This leads us to presume that the instances of set II have a lower difficulty level than figure 2 of section 3.2 would imply. In the next section, the difficulty of these instances will therefore be further examined.
Table 3. Operator selection - Relative performance (%)

Operator            | Set I  | Set II
Selection:    TS    | 0.1348 | 0.0058
              RWS   | 0.1493 | 0.0073
              RS    | 0.2966 | 0.0073
Crossover:    PMX   | 0.1952 | 0.0073
              OX    | 0.2010 | 0.0073
              PBX   | 0.1348 | 0.0073
              OBX   | 0.1362 | 0.0073
              CX    | 0.2787 | 0.0058
Mutation:     SM    | 0.1348 | 0.0073
              InsM  | 0.1488 | 0.0073
              InvM  | 0.1411 | 0.0058
Local search: RPS   | 0.1884 | 0.0058
              APS   | 0.1493 | 0.0073
              INS   | 1.4150 | 0.2948
              3S    | 0.4995 | 0.0073
              LCI   | 0.1348 | 0.0073

4.3 Computational Results
In this section, we compare the performance of our algorithm with the performance of the Schrage heuristic with (Schrage LS) and without local search (Schrage) and the Multi-Start algorithm (MultiS), in order to obtain benchmark results². The Multi-Start algorithm starts with 1,000 randomly chosen solutions followed by a hill-climbing technique, which seeks to improve each initial schedule to its local optimum. Three versions of our algorithm were tested: the genetic algorithm with a single-population structure (only high quality solutions) (GA), the dual-population GA (2PGA) and the hybrid dual-population GA with the best performing local search (2PGA LS).

² The algorithms described were coded in Visual Studio C++ and run on a 2.6 GHz Intel Pentium Processor.

The computational results are listed in Table 4, which summarizes the relative performances of all heuristics. We make a distinction according to the difficulty levels described before. In doing so, certain effects become clearer when only 'hard' instances (e.g., the -20% instances are assumed to be the hardest problem instances) are considered. The results reveal the contribution of the various solution approaches, the local search procedures and the various operators embedded in the genetic algorithm. They can be summarized along the following lines:

– The contribution of an intensive multi-pass search versus a quick single-pass search leads to obvious conclusions. The single-pass Schrage algorithm performs worst compared to the more time-consuming multi-pass algorithms (GA, 2PGA, 2PGA LS and MultiS). However, the results show that the efficient Schrage LS algorithm is better than the time-consuming MultiS algorithm for the sets with empirical and/or theoretical optimal instances (Full set and -TO columns). These sets contain instances for which the Schrage algorithm is proven to generate optimal solutions, which could often not be found by the intensive MultiS search. It is for this very reason that the size of the data sets is further decreased to only the more difficult instances. In doing so, we avoid that many instances from the sets can be quickly solved by single-pass heuristics or straightforward extensions of these heuristics.
– The contribution of local search algorithms is clearly shown in the table, since all solution procedures with LS always outperform their non-LS version: the hybrid 2PGA LS outperforms 2PGA, Schrage LS outperforms Schrage, and MultiS (which also contains the LS procedures) outperforms both Schrage and 2PGA without local search.
– The contribution of the dual-population structure can be seen by comparing the performance of the GA with the 2PGA. These results show that the incorporation of a dual-population structure is a crucial tool that allows the GA to explore the solution space more efficiently.
– The contribution of the genetic algorithm operators is best shown when comparing the 2PGA LS with the MultiS algorithm. The 2PGA LS outperforms the MultiS performance, even though the 2PGA and 2PGA LS algorithms were truncated after 1 second, while the MultiS search algorithm needed on average 1 minute to evaluate the 1,000 random solutions. Moreover, the hybrid dual-population genetic algorithm (2PGA LS) always outperforms all other solution procedures, both for the dependent (set I) and independent (set II) data sets.

It should be noted that the differences between the relative performances of Table 4 are less clear for the data of set II compared to set I. This is in line with the findings of the previous section. This could possibly lead to the conjecture that most instances with independent due dates of set II are relatively easy to solve, which supports the findings of [2] and [13]. In the paper of [2], 999 out of the 1,000 instances described were solved optimally.
Seemingly an outstanding result, but Carlier noted that "in nine cases out of ten, either the Schrage solution or the schedule when the critical job was scheduled after the critical set J was optimal". Moreover, he tested instances up to 10,000 jobs, but even those instances were solved optimally with only one node in the tree. This was confirmed by the study of [13], who stated that "Carlier reported the remarkable performance of its algorithm on a test set of 1,000 instances, but he also pointed out that the large majority of those instances was easy". They extended the set of instances, with the same parameters, to 500,000 and they concluded that with Carlier's algorithm, only 101 instances remained unsolved. This also indicates that the instances studied are not that challenging.

Table 4. Comparison of methods - Relative performance (%)

Set I      | Full set | -TO    | -100%   | -80%    | -60%    | -40%     | -20%
GA         | 1.4895   | 2.4968 | 25.4976 | 46.7115 | 65.1439 | 133.7975 | 339.6232
2PGA       | 0.9763   | 1.2515 | 12.1321 | 25.8566 | 47.5332 | 99.4174  | 224.8983
2PGA LS    | 0.2538   | 0.4255 | 4.3452  | 7.8404  | 11.0952 | 20.3734  | 46.3563
Schrage    | 1.6620   | 2.7859 | 28.4500 | 52.1908 | 73.0150 | 149.0884 | 376.5309
Schrage LS | 0.6282   | 1.0531 | 10.7543 | 19.6902 | 27.9287 | 55.5257  | 147.4370
MultiS     | 1.0771   | 1.7020 | 8.3532  | 13.5248 | 19.2784 | 31.0752  | 76.5863
#inst      | 2720     | 2400   | 1650    | 1320    | 1070    | 730      | 360

Set II     | Full set | -TO    | -100%   | -80%    | -60%    | -40%     | -20%
GA         | 0.1391   | 0.1836 | 0.1882  | 0.1832  | 0.2159  | 0.3174   | 0.5241
2PGA       | 0.0951   | 0.1186 | 0.1323  | 0.1544  | 0.1795  | 0.2004   | 0.3941
2PGA LS    | 0.0272   | 0.0358 | 0.0368  | 0.0282  | 0.0322  | 0.0458   | 0.0703
Schrage    | 0.4915   | 0.6485 | 0.6649  | 0.6768  | 0.7616  | 1.0123   | 1.4161
Schrage LS | 0.0488   | 0.0643 | 0.0660  | 0.0565  | 0.0651  | 0.0879   | 0.1225
MultiS     | 3.8533   | 0.4263 | 0.1779  | 0.0701  | 0.0828  | 0.1257   | 0.2139
#inst      | 2250     | 1600   | 1400    | 1290    | 1140    | 860      | 630

To confirm these conjectures, a further thorough analysis of the instances of both data sets was done. Through empirical testing, we noticed that the instances of set II often had the characteristic that the lateness of the job j that was responsible for the Lmax was equal to the difference between dj and rj + pj. As a consequence, in order to obtain an optimal schedule, this job has to be fixed at its release time, while the order of the other jobs is less important. Hence, many permutations will lead to the optimal solution, as long as this single job starts at its release time. This makes the problem instances somewhat easier to solve, because any greedy sequence of simple moves (swap and/or insertion) performed on any non-optimal schedule can easily improve the schedule quality.

Comparing this with the findings of section 3.2, there seems to be no direct link between the Schrage heuristic performance and the actual difficulty of the instances of the independent data set, as the previous section presumed. Because of the independence of the due dates, there is no relation between release times, processing times and due dates. This makes solving these instances harder for a single-pass heuristic (i.e., the Schrage heuristic) that is more or less based on these relationships. This explains the rather poor performance of the heuristic on the relatively 'easy' instances of set II.
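The objective evaluated throughout these comparisons, the maximum lateness of a job sequence under release times, can be sketched as follows (our illustration of the standard 1|rj|Lmax evaluation, not the authors' code): each job starts at the later of its release time and the previous completion time, and its lateness is completion time minus due date.

```python
def max_lateness(seq, p, r, d):
    """Evaluate a job sequence for 1|rj|Lmax: lateness of job j is
    its completion time minus its due date; return the maximum."""
    t, lmax = 0, float("-inf")
    for j in seq:
        t = max(t, r[j]) + p[j]  # wait for release, then process
        lmax = max(lmax, t - d[j])
    return lmax
```

In particular, a job that starts exactly at its release time has lateness rj + pj − dj, which is the characteristic of the critical set II job discussed above.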
5 Conclusions
In this paper we have examined the single machine maximum lateness problem with distinct release times and due dates. A hybrid dual-population genetic algorithm was developed by comparing different genetic operators from the SMS literature. Various alternatives for the operators were tested in a full factorial design to find the best combination for the problem under study. The computational experiments were performed on two diverse data sets, with dependent and independent due dates, which summarize the generation methods described in the literature. The instances in both sets were analyzed to examine their difficulty levels. The results indicated that the instances with independent due dates were not that challenging. This was justified by the literature and our own experiments. A comparison was made with the Schrage heuristic and a Multi-Start algorithm. The test results illustrate the contribution of the various solution approaches, the local search procedures, the various operators and the dual-population structure embedded in the genetic algorithm. Possible directions for future research include employing other metaheuristics and extending the problem with setup considerations and batching. Moreover, embedding the problem in more complex environments such as job shop or flow shop scheduling is an interesting field of study.
References

1. Campos, V., Laguna, M., Marti, R.: Context-independent scatter and tabu search for permutation problems. Journal on Computing 17, 111–122 (2005)
2. Carlier, J.: The one-machine sequencing problem. European Journal of Operational Research 11, 42–47 (1982)
3. Chang, P.-C., Su, L.-H.: Scheduling n jobs on one machine to minimize the maximum lateness with a minimum number of tardy jobs. Computers & Industrial Engineering 40, 349–360 (2001)
4. Glover, F.: A template for scatter search and path relinking. In: Hao, J.-K., Lutton, E., Ronald, E., Schoenauer, M., Snyers, D. (eds.) AE 1997. LNCS, vol. 1363, pp. 13–54. Springer, Heidelberg (1998)
5. Gordon, V.S., Proth, J.-M., Chu, C.: Due date assignment and scheduling: SLK, TWK and other due date assignment models. Production Planning & Control 13(2), 117–132 (2002)
6. Graham, R.L., Lawler, E.L., Lenstra, J.K., Rinnooy-Kan, A.H.G.: Optimization and approximation in deterministic sequencing and scheduling: A survey. Annals of Discrete Mathematics 5, 287–326 (1979)
7. Hariri, A.M.A., Potts, C.N.: Single machine scheduling with batch set-up times to minimize maximum lateness. Annals of Operations Research 70, 75–92 (1997)
8. Holland, J.: Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor (1975)
9. Jackson, J.R.: Scheduling a production line to minimize maximum tardiness. Management Science Research Report 43, University of California, Los Angeles (1955)
10. Lenstra, J.K., Rinnooy-Kan, A.H.G., Brucker, P.: Complexity of machine scheduling problems. Annals of Discrete Mathematics 1, 343–362 (1977)
11. McMahon, G., Florian, M.: On scheduling with ready times and due dates to minimize maximum lateness. Operations Research 23, 475–482 (1975)
12. Michiels, W., Aarts, E., Korst, J.: Theoretical Aspects of Local Search. Monographs in Theoretical Computer Science, An EATCS Series. Springer (2007)
13. Pan, Y., Shi, L.: Branch-and-bound algorithms for solving hard instances of the one-machine sequencing problem. European Journal of Operational Research 168, 1030–1039 (2006)
14. Potts, C.N., Van Wassenhove, L.N.: A branch and bound algorithm for the total weighted tardiness problem. Operations Research 33(2), 363–377 (1985)
15. Sels, V., Vanhoucke, M.: A genetic algorithm for the single machine maximum lateness problem. Technical report, Ghent University (2009)
A Kolmogorov-Type Stability Measure for Evolutionary Algorithms

Matthew J. Craven¹ and Henri C. Jimbo²

¹ School of Comp. and Eng. Systems, University of Abertay Dundee, Dundee, UK
[email protected]
² Nara Institute of Science and Technology, Ikoma, 630-0192 Nara, Japan
[email protected]
Abstract. In previous work, EAs were shown to efficiently solve certain equations over partially commutative groups. The EAs depend on the values of several control parameters for success. Generally these values must be tuned to the structure of the equation or problem to be solved. Supposing suitable values are found, a natural concern is stability of the EA under random perturbation of its parameters. This work considers such a model of EA stability by defining neighbourhoods over EA parameter space and examining their properties. We define stability based upon Kolmogorov distance and analyse that distance between repeated random perturbations of parameters, forming a statistical indication of EA stability under parameter perturbation. We then analyse the model for the wider class of general EAs, meaning our model may serve as a framework for parameter optimisation and stability analysis.
1 Introduction
An experimental Evolutionary Algorithm (EA) approach to solving the Double Coset Search Problem (a problem arising in group-theoretic cryptography [4]) over certain subclasses of partially commutative groups was discussed in previous work [2], and shown to be effective and efficient. In [1], we also showed the EA to be sensitive to changes in its control parameters, a typical EA characteristic, and described a coevolutionary method for determining optimal sets of said control parameters. The method was shown to be effective for control parameter search, proving much faster than traditional search methods. However, the above method does not give a picture of local stability of the EA around given sets of such parameters. We may be able to depict an overall picture of EA performance for a wide variety of parameters, but it proves difficult to measure in a meaningful way the differences in EA performance for a selection of sets of parameters that are "close" to each other (in some sense). The differences may be used as a measure of stability of the EA under variation of its parameters. In this paper we consider the EA as a dynamical system; that is, the EA is a black box, and the EA control parameters are the inputs of the system (the problem instance is assumed fixed). We present our approach as a general framework, using our previous work as a case study. We then investigate system stability under perturbation of initial conditions, formulating statistical notions of the behaviour of the system near "near optimal" sets of control parameters (however produced). Our approach will be probabilistic and experimental.

This paper is organised as follows. In the next section, we introduce the EA, the typical kinds of problems the EA has previously been used to solve, and an example of the sensitivity of the EA in response to changes in control parameters. In Section 3, we view the EA as a black box and introduce the EA control parameters as a space, giving results about enumeration of parameter sets within the space. In Section 4 we introduce a variation on Kolmogorov distance as a metric between two EA parameter sets given their respective outputs, and formulate notions of EA stability based on this distance. We argue our work may be generalised to other EAs, providing a framework for analysing stability and performance of such algorithms. The final section concludes the paper.

P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 26–37, 2011. © Springer-Verlag Berlin Heidelberg 2011
2 An Example of an EA, Problem and Parameters
We give our EA structure in this section, presenting it as an example we later generalise. We shall use it to provide a baseline experimental result. For brevity, we summarise only the pertinent aspects of the EA and the group-theoretic problem here, omitting a full description. Full details may be found in [2]. Recall that an EA is an algorithm acting on a population of initially random strings representing solutions to a given problem, evolving those strings through successive populations by performing string operations on them. The operations imitate genetic processes according to the neo-Darwinian model of nature. A cost function associates to each string a crude indication (cost) of how far the string is from an "optimal" solution. The cost function guides the EA in its selection procedure, encouraging convergence (or not), through successive approximation, to an (approximation of an) optimal solution.

2.1 The Problem and EA Parameters
The EA attacks the Double Coset Search Problem (a cryptographic problem denoted by the acronym DCSP; see [4]) in the Vershik groups, a type of partially commutative group [10] which we denote Vn. This problem may be broadly stated as a decomposition problem as follows (see [2] for a full explanation). Given a pair of elements (words) a, b in Vn, with b expressed in the form b = xay, find words x, y that are contained in given subgroups of Vn. A word in Vn is a string of symbols from the set {x1, x1^-1, ..., xn, xn^-1}, where we may swap any two such symbols in the string if their subscript values differ by at least two, or cancel a symbol xi with its inverse xi^-1. For example, two strings that differ only by such swaps and by the cancellation of a symbol x1 with its inverse x1^-1 denote the same word in Vn. Hence the relation in the DCSP is one of equivalence under the above group operations. An instance of the DCSP is specified by the pair of words (a, b).
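The rewriting rules just described (swap two symbols whose subscripts differ by at least two, cancel xi against xi^-1) can be sketched as a word reduction routine. The encoding, x_i as +i and x_i^-1 as -i, and the function itself are our illustration, not the authors' representation: a symbol cancels a later inverse precisely when every symbol between them commutes with it.

```python
def reduce_word(word):
    """Sketch of reduction in a partially commutative group:
    symbol x_i is encoded as +i and x_i^-1 as -i. A new symbol s
    cancels an earlier inverse if every symbol between them has a
    subscript differing from |s| by at least two (i.e., commutes)."""
    out = []
    for s in word:
        # scan back over symbols that commute with s
        k = len(out) - 1
        while k >= 0 and abs(abs(out[k]) - abs(s)) >= 2:
            k -= 1
        if k >= 0 and out[k] == -s:
            del out[k]       # cancelling pair found
        else:
            out.append(s)    # blocked or no inverse: keep s
    return out
```

For instance, in [x1, x3, x1^-1] the middle symbol commutes with x1, so the outer pair cancels, whereas x2 between them would block the cancellation.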
A solution to the DCSP is a pair (x, y) of words that satisfies b = xay. In terms of cryptographic schemes, x, y are the secret words (the key) [4]. The EA represents solutions as pairs of 2n-ary variable-length strings which function as approximations to the desired words x and y. The population size p is constant. Strings in the current population are manipulated by genetic operators to produce subsequent populations. The operators are: crossover by tournament selection, roulette wheel selection, generation of random strings, and also the string operations of insertion, deletion and substitution of symbols at "recommended" positions with other symbols. We take generation count as a base measure of EA performance. This is the mean number of generations over a specified number of trials taken to solve a given DCSP instance.

2.2 EA Sensitivity
Conventional EA literature (for example, [5]) shows that, in general, EAs are especially sensitive to changes in their control parameters. These parameters must be carefully selected in order for the EA to be successful. Indeed, input of "incorrect" parameters may cause the EA to fail on most non-trivial instances. Our EA depends on a set of six interacting parameters given by the proportion of each population created by each of the above genetic operators. The parameters sum to the EA population size, p, and are fixed throughout an EA run. For the purposes of this paper we define an "optimal" set of parameters to be one for which the EA solves a selection of instances with "minimal" generation count¹. Of course, there may be more than one such set. Further, as parameters are constant through an EA run, these parameters should (in some sense) serve the needs of the population at each evolution stage [7,9].

We now define the parameter set concept more precisely. The parameter space is the set S = {(p_i)_{i=1}^6 : Σ_i p_i = p, p_i ≥ 0} of sextuples (vectors) of non-negative integers which sum to p. A parameter set is an ordered vector p ∈ S, with each component p_i the number of executions of each operator in the order: crossover, substitution, deletion, insertion, selection and random generation. Hence, for example, the parameter sets (5, 33, 0, 2, 10, 0) and (33, 5, 0, 2, 10, 0) are distinct.

To exhibit the sensitivity of our EA to parameter change, we present Table 1. For each parameter set shown in Table 1, the corresponding generation count g is the mean generation count excluding "outliers" from ten runs of the EA using that parameter set, with σ(g) the standard deviation.

Table 1. An illustration of EA sensitivity to parameter change for p = 200

Parameter set p               | g    | σ(g)  | Mean time (sec)
p1 = (5, 33, 4, 128, 30, 0)   | 179  | 51.8  | 20
p2 = (4, 34, 3, 129, 29, 1)   | 198  | 69.9  | 22
p3 = (5, 3, 4, 158, 5, 25)    | 641  | 363.3 | 72
p4 = (22, 40, 21, 54, 39, 24) | 1231 | 480.9 | 130

¹ The term minimal is used in an experimental rather than mathematical sense.
The final column records the corresponding mean runtime in seconds. In all runs we used the same instance, chosen to exhibit straightforward solution by the EA. The first parameter set in Table 1 was produced by deterministic search, and has an associated generation count of 179. The second, p2, represents a small change of one in each parameter, and has the effect of increasing the generation count by 19. In comparison, the third parameter set features a decrease in selections (to 5) and an increase in insertions (to 158) and random inclusions (to 25). The generation count increases to 641 as a result. Finally, the parameter set p4 bears little outward resemblance to the original p1, and its generation count is over six times that of p1. Note that Table 1 utilises only one instance; by experiment we found this behaviour to be representative of a wide selection of instances. We also found many examples of parameter sets that do not work at all, giving EA non-convergence. By analysing the EA outputs, we found that runs associated with increased generation counts consisted mostly of stagnation caused by likely local minima of the EA cost function. Many of the EA parameters are correlated; for example, a small number of string operation sequences have an identical effect on the words involved. Thus simply changing the value of one parameter (and another parameter to compensate) may not produce a lower generation count. Next, to measure EA stability under random perturbation of its parameters, we consider our EA as an abstract system and construct a distance function over the parameter space.
3 The EA as a Black Box
We now choose to view the EA as an abstract function acting upon a population of strings to produce a new population: effectively, the EA is a black box. Again, fix an instance of the DCSP; we call this the initial condition of the system.

3.1 Setup
The input of the system is the parameter set of the EA. The outcome of the system is defined to be some measurement of "time" (that is, the generation count) that the EA took to successfully find a solution to the instance. Note that by definition of the DCSP, a specified instance always has a solution. The output of the system is then the outcome together with the solution of the instance. That is, the EA may be seen as a function

    q : S −→ N0 × Vn.    (1)
The stability of the system is, practically speaking, the degree of change in outcome resulting from a small change in input (relative to some base input and base outcome). The generation count gives a clear indication of whether the EA has converged and thus found a solution to a particular DCSP instance, so it is sensible to measure the stability of the EA as a function of this measurement. In the next subsection we define a distance function over the parameter space, and use it to induce neighbourhood structures.
M.J. Craven and H.C. Jimbo

3.2 The Distance and Neighbourhood Structure on Parameter Sets
Consider the Manhattan distance function

    d(p, p′) = Σ_{i=1}^{6} |pi − p′i|,    (2)

defined as the sum of the absolute differences between the respective parameters in any two parameter sets p, p′ ∈ S (considered as vectors). For example, we have d((1, 1, 1, 0, 1, 6), (2, 3, 1, 1, 1, 2)) = 8. Clearly, (S, d) is a metric space under the function d. A parameter set p′ is contained in the β−neighbourhood of a parameter set p if and only if d(p, p′) ≤ β for a given bound β ≥ 2. We write this as p′ ∈ Nβ(p). So for the above example, we have p′ ∈ N8(p). Observe that

Remark 1. For any two distinct parameter sets p, p′, the value of the distance function d(p, p′) is always even.

Given a parameter set p ∈ S and a bound β, we may estimate how many parameter sets are contained in the β−neighbourhood. This gives a conception of the size of the local search space. In the next subsection, we give an example of the enumeration of parameter sets in a neighbourhood with bound β = 6.
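A minimal sketch of distance (2) and the induced neighbourhood test, using the worked example from the text (illustrative helper names, not the authors' code):

```python
def manhattan(p, q):
    """Distance (2): the sum of absolute differences of the six parameters."""
    return sum(abs(a - b) for a, b in zip(p, q))

def in_neighbourhood(q, p, beta):
    """q lies in the beta-neighbourhood N_beta(p) iff d(p, q) <= beta."""
    return manhattan(p, q) <= beta

# The worked example from the text:
assert manhattan((1, 1, 1, 0, 1, 6), (2, 3, 1, 1, 1, 2)) == 8
assert in_neighbourhood((2, 3, 1, 1, 1, 2), (1, 1, 1, 0, 1, 6), 8)

# Remark 1: between parameter sets with equal sums the distance is even.
assert manhattan((1, 1, 1, 0, 1, 6), (2, 3, 1, 1, 1, 2)) % 2 == 0
```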
3.3 An Example of Neighbourhood Enumeration for β = 6
Consider the number of possible parameter sets in the 6−neighbourhood of a parameter set p such that, for simplicity, all parameters have values at least β/2. We wish to find the number of parameter sets p′ a distance of at most six away from p. Firstly, we give a form of parameter set that lies on the boundary of the neighbourhood (so d = 6). One situation where this occurs is when the values of each respective parameter of p and p′ differ by one; so, for example, the first parameter p′1 of p′ has the value p′1 = p1 ± 1. Another parameter must compensate for this difference, and so, without loss of generality, assume the compensating parameter is the second, p′2. Then p′2 = p2 ∓ 1. Continuing in this way produces twenty parameter sets of the form p′ = (p1 ± 1, p2 ∓ 1, p3 ± 1, p4 ∓ 1, p5 ± 1, p6 ∓ 1). We also include in N6(p) the other forms of parameter sets on the boundary, those strictly inside the boundary of the neighbourhood (d < 6), and p itself. Now, to each parameter set p′ associate a tuple of signed integers (t1, . . . , t6), where each integer ti may be zero, such that in p′ each parameter is of the form p′i = pi + ti. We omit all occurrences of zero, writing each tuple in descending order of absolute value. For example, the tuples (3, −2, −1, 0, 0, 0) and (0, −2, 0, 3, −1, 0) are both written (3, −2, −1). We also take the negative of a tuple to be the same as the tuple; in this case, (−3, 2, 1) is considered the same tuple as above. Hence each tuple corresponds to many parameter sets, but the tuple is uniquely specified. A summary of the forms of such tuples for an arbitrary parameter set p is given in Table 2, where we also give the number (#) of parameter sets induced by each form and the distance of each induced parameter set p′ from p.

Table 2. All non-trivial forms of tuple for which d(p, p′) ≤ 6

  Tuple inducing p′           d(p, p′)     #
  (±1, ±1, ±1, ∓1, ∓1, ∓1)       6        20
  (±3, ∓3)                       6        30
  (±3, ∓1, ∓1, ∓1)               6       120
  (±2, ∓2, ±1, ∓1)               6       240
  (±2, ±1, ∓1, ∓1, ∓1)           6       240
  (±3, ∓2, ∓1)                   6       360
  (±2, ∓2)                       4        30
  (±1, ±1, ∓1, ∓1)               4        90
  (±2, ∓1, ∓1)                   4       120
  (±1, ∓1)                       2        30
  Total                                 1280

To measure growth in the number of parameter sets as the bound β increases, we performed computer-based enumeration (Table 3). The first row of Table 3 gives the number of parameter sets at distance exactly β from p, and the second row the number of non-trivial parameter sets contained in the β-neighbourhood.

Table 3. The numbers of non-trivial parameter sets for varying bounds β

  Bound β                    2     4      6      8     10     12
  #{p′ : d(p, p′) = β}      30   240   1010   2970   7002  14240
  #Nβ(p)                    30   270   1280   4250  11252  25492

If we negate our original assumption and allow pi < β/2 for at least one i, then the above calculations do not hold. In this case, the offending parameters have a reduced ability to compensate for other parameters, reducing the total number of parameter sets available. Next, we detail how to compute a measure of stability of the EA via computation of Kolmogorov distances between parameter sets.
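The counts in Tables 2 and 3 can be reproduced by a short brute-force enumeration of the delta vectors, assuming (as in the text) that every parameter is at least β/2 so non-negativity never removes a candidate. This is an illustrative sketch, not the authors' enumeration code.

```python
def neighbourhood_counts(beta, dims=6):
    """Count parameter sets at each distance d <= beta from a set p whose
    components are all at least beta/2. Each such set corresponds to a
    delta vector t in Z^dims with sum(t) = 0 and 0 < ||t||_1 <= beta."""
    counts = {}

    def rec(pos, total, l1):
        if pos == dims:
            # sum zero forces the L1 norm to be even (cf. Remark 1)
            if total == 0 and l1 > 0:
                counts[l1] = counts.get(l1, 0) + 1
            return
        for v in range(-(beta - l1), beta - l1 + 1):
            rec(pos + 1, total + v, l1 + abs(v))

    rec(0, 0, 0)
    return counts

counts = neighbourhood_counts(6)
# Matches Table 3: 30 sets at distance 2, 240 at distance 4, 1010 at
# distance 6; cumulatively #N6(p) = 1280, the total of Table 2.
assert counts == {2: 30, 4: 240, 6: 1010}
assert sum(counts.values()) == 1280
```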
4 EA Kolmogorov Stability
Recall that we view the EA as a function q : S −→ N0 × Vn from the parameter space S to the product of the possible cost values and the group Vn (containing the solution of the instance). Let h = q|N0 be the standard projection of q onto N0 (the generation count). For a parameter set p ∈ S, let Eh(p) be the expectation (mean) of h over t ≥ 1 trials of the EA. Note that if the EA fails we take its generation count to be s (set by experimentation).

4.1 Kolmogorov Distance
For any parameter set p′ ∈ Nβ(p) ⊂ S in some β−neighbourhood of p, define the following variation on the Kolmogorov distance [3]:
    dK(p, p′) = (Eh(p) − Eh(p′))².    (3)
Observe that (3) is a distance function over the output space (and not over the parameter space S). We now give an approach to estimating the distribution of Kolmogorov distances over neighbourhoods of any parameter set p satisfying the conditions of Section 3.3, given a bound β.

4.2 Estimation of the Distribution of Kolmogorov Distances over Nβ(p)
Let us choose a random parameter set p0 as above (which may be identified as "good" or not) and specify its neighbourhood by fixing a bound β. Let N be a positive integer, chosen to be sufficiently smaller than #Nβ(p0). Now take a sample R = {p1, . . . , pN} ⊆ Nβ(p0) (without replacement, and uniformly at random) of size N from the β−neighbourhood of p0. Let dij denote the Kolmogorov distance between the pair of parameter sets (pi, pj) in R, and compute dij for all 0 < |i − j| ≤ N. This approach computes (3) over the sample distributions of p and p′, enabling identification of parameter sets which have small Kolmogorov distance from our chosen parameter set. In all experiments we use t = 1 trial for speed, and the same instance of the problem for every EA run (to ensure results are comparable). All experiments were performed on a Centrino Core Duo 1.8 GHz computer with 1 GB RAM, running GNU C++.
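The sampling procedure can be sketched as follows. The group-theoretic EA itself is not reproduced here: `mock_expected_count` is a toy stand-in for Eh(p), and all helper names are illustrative assumptions rather than the authors' code.

```python
import random
import statistics

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def enumerate_neighbourhood(p0, beta):
    """All parameter sets within Manhattan distance beta of p0 (excluding
    p0 itself), keeping components non-negative and the sum unchanged."""
    dims = len(p0)
    out = []

    def rec(pos, deltas, total, l1):
        if pos == dims:
            if total == 0 and l1 > 0:
                out.append(tuple(a + b for a, b in zip(p0, deltas)))
            return
        lo = max(-(beta - l1), -p0[pos])  # keep parameters non-negative
        for v in range(lo, beta - l1 + 1):
            rec(pos + 1, deltas + [v], total + v, l1 + abs(v))

    rec(0, [], 0, 0)
    return out

def kolmogorov_distance(e1, e2):
    """Equation (3): squared difference of expected generation counts."""
    return (e1 - e2) ** 2

def mock_expected_count(ps, t=1):
    """Stand-in for E_h(p): the paper runs the real EA over t trials; a
    deterministic toy surrogate keeps this sketch self-contained."""
    rng = random.Random(sum(v * 31 ** i for i, v in enumerate(ps)))
    return statistics.mean(rng.randint(150, 1500) for _ in range(t))

def neighbourhood_statistics(p0, beta, N, seed=0):
    """Sample N sets from N_beta(p0) without replacement and return the
    pairwise statistics (mean d, sigma(d), mean dK, sigma(dK))."""
    sample = random.Random(seed).sample(enumerate_neighbourhood(p0, beta), N)
    exp = [mock_expected_count(ps) for ps in sample]
    dists, kdists = [], []
    for i in range(N):
        for j in range(i + 1, N):
            dists.append(manhattan(sample[i], sample[j]))
            kdists.append(kolmogorov_distance(exp[i], exp[j]))
    return (statistics.mean(dists), statistics.pstdev(dists),
            statistics.mean(kdists), statistics.pstdev(kdists))

p0 = (5, 33, 4, 128, 30, 0)   # the "good" set p1 of Table 1
stats = neighbourhood_statistics(p0, beta=4, N=10)
```

For small β the neighbourhood can be enumerated outright, so sampling without replacement is a one-liner; for large β a rejection or random-walk sampler would be needed instead.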
4.3 Experiment 1
Firstly, we chose p0 to be a "good" parameter set; that is, one for which the EA performs well. We chose the parameter set p1 from Table 1. We took the sample sizes N = 10 and N = 100 (including the original parameter set p0, ensuring all sample points distinct), s = 1500, and computed the following statistics:

– d, the mean parameter distance over all pairs (pi, pj) (i ≠ j) of parameter sets from the sample R;
– σ(d), the standard deviation of the above parameter distances;
– dK, the mean Kolmogorov distance;
– σ(dK), the standard deviation of the Kolmogorov distance.

Table 4 shows the statistics for bounds β ∈ {4, 6, 8, 10, 20, 30, 40, 50}. The first two statistics give key information about the distribution of the sample drawn from the neighbourhood. Note that the average distance d is greater than the bound β in many cases. This was expected, because we compared the distances of pairs of parameter sets from each other (rather than from the parameter set p0). For example, if we take p1 = (5, 33, 1, 127, 34, 0), p2 = (2, 33, 5, 131, 29, 0) ∈ N8(p0) then we have d(p1, p0) = d(p2, p0) = 8 but d(p2, p1) = 16. We observe that for N = 10, the mean Kolmogorov distance dK generally increases as the bound β increases; this is also true for the standard deviation σ(dK). This is because when the bound β increases there is an increased number of possible parameter sets in the β−neighbourhood, and so more possibilities to choose parameter sets so contained that have widely disparate generation counts (and hence larger Kolmogorov distance). For the same reason, the mean Kolmogorov distance is generally larger for sample size N = 100 than for N = 10. This indicates that, in a small neighbourhood of a "good" parameter set, the EA is relatively stable because the mean Kolmogorov distance is relatively low. Next, we repeat the above experiment with a "bad" parameter set.

Table 4. Presenting the above statistics for a "good" parameter set (N = 10, N = 100)

                  N = 10                          N = 100
  β      d    σ(d)     dK    σ(dK)       d    σ(d)     dK     σ(dK)
  4    5.24   1.82    7633    8174     5.72   1.80   14487    25286
  6    8.00   2.80    6318    8731     7.88   2.63   33546    80462
  8    8.27   3.26   23888   24697    10.14   3.37   22412    43527
  10  12.12   3.61   19900   21831    12.43   4.17   17125    25961
  20  20.36   5.47    9535   10291    21.03   7.31   30420    51929
  30  27.42   9.87    4674    6770    30.23  11.14   31265    48157
  40  37.96  15.51   23766   26399    39.62  14.59   49868   121264
  50  46.31  15.93   53193   60513    54.20  18.86   58920   137044

4.4 Experiment 2
Now we choose p0 to be a "bad" parameter set. By this we mean that the EA should still produce a solution, but not have good performance. We chose the parameter set p4 from Table 1. Analogously to Experiment 1, we took sample sizes of N = 10 and N = 100 and sampled N parameter sets from the β−neighbourhood of the "bad" parameter set. Intuition suggests that the EA will exhibit a greater measure of "sensitivity" in this neighbourhood. Table 5 shows the statistics produced (as in Experiment 1) for bounds β ∈ {4, 6, 8, 10, 20, 30, 40, 50}.

Table 5. Presenting the above statistics for a "bad" parameter set (N = 10, N = 100)

                  N = 10                             N = 100
  β      d    σ(d)      dK     σ(dK)       d     σ(d)      dK     σ(dK)
  4    5.16   1.57   503736   521974     5.90    1.74   388324   429884
  6    7.60   2.24   330015   332931     8.23    2.41   363054   398837
  8   10.62   2.89   397017   424063    10.54    3.04   416616   459149
  10  11.87   3.34   433137   426221    13.00    3.68   394552   432645
  20  23.33   6.61   325410   358591    24.90    7.05   398669   439422
  30  36.36  10.10   519482   499276    36.00   10.00   448576   483483
  40  41.78  13.77   631867   636145    47.75   13.42   422592   468304
  50  64.44  17.44   320600   358747    61.76   17.50   438785   486685
Observe that the mean and standard deviation of the Kolmogorov distance are greater than those of Experiment 1 in every case. For example, with β = 20 and N = 100 we have a mean Kolmogorov distance of 398669 for Experiment 2 against 30420 for Experiment 1. This implies a large mean variation in EA performance for small perturbations of a "bad" parameter set. Contrary to Experiment 1, the mean and standard deviation of the Kolmogorov distance are not generally higher for sample size N = 100 than for sample size N = 10. The final experiment takes the sample from Experiment 1 and measures the Kolmogorov distance between pairs (p0, pi) for i ∈ {1, . . . , N}.

4.5 Experiment 3
In this experiment we take the data generated in Experiment 1 and compare only the parameter sets in the sample R to the "good" parameter set p0. That is, we do not compare all pairs of parameter sets (pi, pj); instead we compare all pairs (p0, pi) for i ∈ {1, . . . , N}, for sample sizes N ∈ {10, 100, 1000} (Table 6). As we expect, the mean parameter distance d is always at most β. This is because we only compare distances between sample points and the initial parameter set p0. For N = 10 and N = 100, we observe that the mean and standard deviation of the Kolmogorov distance are lower for this "local" comparison than for Experiment 1. This is expected because, for example, when N = 100 we compare 9900 pairs of parameter sets to find the Kolmogorov distances in Experiment 1, but only 99 pairs in Experiment 3. Further, we pose the question: "Does there exist a correlation between the mean distance d of parameter sets from the 'good' parameter set and their mean Kolmogorov distance dK for each bound β?" In Table 7 we calculate the standard correlation between the parameter distance d and the Kolmogorov distance dK over the N = 1000 sample points from Table 6. Table 7 indicates there is no discernible correlation between the distances d and dK for any of our tested bounds β, and hence we observe that, according to the statistical evidence, there is no (linear) relationship between parameter and Kolmogorov distance. In the next subsection, we discuss our results.

Table 6. Presenting the above statistics for a "good" parameter set. "Local" comparison only, for sample sizes N ∈ {10, 100, 1000}.

               N = 10                     N = 100                     N = 1000
  β      d   σ(d)     dK   σ(dK)     d    σ(d)     dK    σ(dK)      d    σ(d)     dK    σ(dK)
  4   3.56  0.88   4534    5336   3.74   0.68   11245   10784      –      –       –       –
  6   5.78  0.67   5185    6482   5.45   1.10   18375   44893      –      –       –       –
  8   7.33  1.00  21064   19678   7.27   1.19   11327   25015    7.14   1.38   10043   23907
  10  8.89  1.45  34552   29105   9.01   1.52   24462   19399    8.86   1.62   36443   39356
  20 18.22  2.33  10968    9704  17.54   2.44   19855   43369   16.78   3.08   21093   67665
  30 24.00  5.57   2380    4311  24.53   4.66   15634   26379   24.26   4.99   14978   34189
  40 30.67  5.92  30259   32127  31.19   7.12   25885   85745   31.35   7.26   39232  113541
  50 35.33  8.19  39822   60403  40.34   8.33   43981  124311   38.46   9.76   30479   91676
Table 7. The correlation between parameter distance and Kolmogorov distance for β ∈ {8, 10, 20, 30, 40, 50}: local comparison for a "good" parameter set

  β              8         10        20        30        40        50
  corr(d, dK)   0.00165  −0.10312   0.04007  −0.01502   0.04356   0.06301
4.6 Discussion
Observe from Tables 4–6 that the results do not follow a constant pattern; there is natural variation arising from the stochasticity of the EA. For a "good" parameter set, we observe in many cases (for example, in Table 4 with N = 100, β ≥ 4, and Table 6 with N ≥ 100, β ≥ 40) that the standard deviation of the Kolmogorov distance is much greater than its mean. Hence the distribution of Kolmogorov distances is positively skewed, indicating that we may perturb the "good" parameter set by a reasonable amount without overly affecting the EA generation count. As the bound increases, the mean Kolmogorov distance increases (but not significantly), meaning that in a small neighbourhood of that parameter set the EA is locally stable. On the contrary, for a "bad" parameter set (Table 5) we observe no case where the standard deviation of the Kolmogorov distance is demonstrably larger than the mean. Hence the distribution of Kolmogorov distances is not as positively skewed as the above. In addition, the means and standard deviations of the Kolmogorov distance are much larger than those of Table 4. Taking the sample size N = 100 and computing the ratio of the mean Kolmogorov distance of the "bad" to the "good" parameter set for each bound β, we produce Table 8. The ratios in the table may be viewed as a measure of the relative stability of the EA around "bad" parameter sets compared to "good" ones, and indicate that the EA exhibits a high level of instability in the proximity of a "bad" parameter set. This is especially clear close to the "bad" parameter set, where we have a ratio of 26.1. As the bound β increases, this relative instability decreases; intuitively, this should be expected.
The above observations imply there are parameter space regions containing concentrations of "good" parameter sets (that is, we may consider those in close proximity to "good" parameter sets to also be good), rather than sparse concentrations of such parameter sets over the whole space. Table 4 with N = 100 suggests that around a "good" parameter set we may safely perturb parameters within a distance of 20 to 30 before larger mean Kolmogorov distances occur. At the lower bound β = 20, the square root of the mean Kolmogorov distance indicates a mean difference in generation count of 176.8 from that of the "good" parameter set (approximately the generation count of the "good" parameter set itself (Table 1)). Hence this bound corresponds to a region around the parameter set where the EA is locally stable. In contrast, around a "bad" parameter set, Table 5 shows far smaller increases in mean Kolmogorov distance as we increase the bound β. Hence there does not seem to exist a region where the EA is locally stable around the "bad" parameter set.

Table 8. The ratios of the mean Kolmogorov distance of "bad" parameter sets to that of "good" parameter sets (N = 100)

  β                4     6     8    10    20    30    40   50
  Ratio of Means  26.1  10.8  18.6  23.0  13.1  14.3  8.5  7.4

Given some bound β, we define the Kolmogorov sensitivity as the ratio dK/d of the mean Kolmogorov distance to the mean parameter distance for that value of β. We computed this sensitivity for the "good" and the "bad" parameter sets and values N = 10, N = 100 (Table 9). Observe that for Experiment 1 the sensitivity ratio displays a small general decrease as β increases. For Experiment 2, however, the sensitivity ratio shows a marked decrease as β increases. This may mean that, even though the mean Kolmogorov distance remains relatively constant for Experiment 2 (cf. Table 5), its importance dramatically decreases the further we move away from the "bad" parameter set within its neighbourhood.

Table 9. Kolmogorov sensitivity dK/d for the "good" (first two rows) and "bad" (second two rows) parameter sets (N = 10, N = 100)

  β                 4      6      8     10     20     30     40     50
  Good, N = 10    1457    790   2889   1642    468    170    626   1149
  Good, N = 100   2353   4257   2210   1378   1447   1034   1259   1087
  Bad, N = 10    97623  43423  37384  36490  13948  14287  15122   4975
  Bad, N = 100   65818  44113  39527  30350  16011  12460   8850   7105
5 Conclusion
We have introduced the concept of local stability of an EA around a parameter set using the notion of Kolmogorov distance. This measures the effect on the EA generation count when its parameter set is perturbed, and also answers in the affirmative a question posed in previous work [1]: "Do there exist regions of parameter space in which the EA is locally stable?" Indeed, in the previous section we gave an example of such a region. We have also detailed a framework for analysing the stability and the performance of EAs. Using the EA of [2] as an example, we have shown that we may analyse the performance of an EA relative to its control parameters. To generalise our work to EAs at large, we propose that a prospective EA meet the following criteria:

1. The EA has a well-defined parameter structure;
2. There is a selection of instances (at least one) of the problem that gives reasonable baseline performance;
3. There exists a "good" initial guess at reasonable parameters. This can be determined by parameter search, logic, experimentation or otherwise;
4. The EA has relatively consistent performance. This may be ensured by increasing the number of trials, t.
We believe our model of the stability of EAs with respect to their parameters may serve as a framework for parameter optimisation, since by sampling parameter sets within a neighbourhood we may extract a measure of its local stability and, hence, an indication of whether isolated parameter sets (or clusters of them) are likely to exist in that neighbourhood. In addition, our model admits further research contributions towards EA stability and performance. It is acknowledged that the stability of an EA also depends upon the test instance(s) used. A fuller statistical treatment of stability may be obtained by running our EA example over a wider selection of instances and larger samples for each bound tested; this is something we wish to pursue in future work.
References

1. Craven, M.J.: Coevolution-Based Parameter Optimization in Group-Theoretic Evolutionary Algorithms. In: Proc. Fourth International Conference on Neural, Parallel and Scientific Computations, Atlanta, USA, pp. 108–113 (2010)
2. Craven, M.J.: Genetic Algorithms for Word Problems in Partially Commutative Groups. In: Cotta, C., van Hemert, J. (eds.) EvoCOP 2007. LNCS, vol. 4446, pp. 48–59. Springer, Heidelberg (2007)
3. Györfi, L., Vajda, I., van der Meulen, E.: Minimum Kolmogorov Distance Estimates of Parameters and Parametrized Distributions. Metrika 43, 237–255 (1996)
4. Ko, K.: Braid Group and Cryptography. 19th SECANTS, Oxford, UK (2002)
5. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs, 3rd edn. Springer, Heidelberg (1996)
6. Reidys, C., Stadler, P.: Combinatorial Landscapes. SIAM Review 44, 3–54 (2002)
7. Stadler, P.: Fitness Landscapes. In: Lässig, M., Valleriani, A. (eds.) Biological Evolution and Statistical Physics, pp. 187–207. Springer, Heidelberg (2002)
8. Stanley, R.: Enumerative Combinatorics 1. Cambr. Stud. Adv. Math. 49 (1999)
9. van Nimwegen, E., Crutchfield, J., Mitchell, M.: Finite Populations Induce Metastability in Evolutionary Search. Phys. Lett. A 229, 144–150 (1997)
10. Wrathall, C.: The Word Problem for Free Partially Commutative Groups. J. Symbolic Comp. 6, 99–104 (1988)
A Matheuristic Approach for the Total Completion Time Two-Machines Permutation Flow Shop Problem

Federico Della Croce, Andrea Grosso, and Fabio Salassa

D.A.I., Politecnico di Torino, Italy
Dip. Informatica, Università di Torino, Italy
Abstract. This paper deals with the total completion time 2-machines flow shop problem. We present a so-called matheuristic post-processing procedure that improves the objective function value with respect to the solutions provided by state-of-the-art procedures. The proposed procedure is based on the positional completion times integer programming formulation of the problem, with O(n²) variables and O(n) constraints.
1 Introduction
In the present work a matheuristic solution approach is proposed for minimizing the total (or average) completion time in a 2-machines flow shop problem (F2| |ΣCi in the three-fields notation of Graham et al. [10]). In a 2-machines flow-shop environment a set of jobs N = {1, 2, . . . , n} is to be scheduled on two machines, and each job i ∈ N is made up of two operations to be executed in order one after the other, the first (respectively, the second) operation requiring to run continuously for p1i (resp. p2i) units of time on the first (resp. second) machine. The completion time Ci of a job i ∈ N in a schedule S is defined as the completion time of its second operation. The F2| |ΣCi problem calls for finding a schedule S that minimizes

    f(S) = Σ_{i∈N} Ci(S).
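Given a permutation of the jobs, the completion times follow a simple recursion: machine one processes the first operations back to back, while each second operation waits both for machine two to become free and for its own first operation to finish. A minimal sketch (the tiny instance is illustrative and not taken from the paper):

```python
def total_completion_time(seq, p1, p2):
    """Sum of completion times C_i of a permutation schedule on a
    2-machine flow shop; p1[i] and p2[i] are job i's processing times
    on machines one and two."""
    t1 = t2 = 0   # times at which machines 1 and 2 become free
    total = 0
    for i in seq:
        t1 += p1[i]                # first operation: machine 1 is a queue
        t2 = max(t2, t1) + p2[i]   # second op waits for both machines
        total += t2
    return total

# Tiny hand-checkable instance:
p1, p2 = [2, 1], [1, 2]
assert total_completion_time([0, 1], p1, p2) == 8   # C = (3, 5)
assert total_completion_time([1, 0], p1, p2) == 7   # C = (3, 4)
```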
The problem is known to be NP-complete; also, at least one optimal solution is known to be a permutation schedule, where the (operations of the) jobs share the same sequence on both machines. Thus we deal equivalently with the permutation flow shop problem F2|perm|ΣCi. The flow shop problem is one of the oldest and best-known production scheduling models and the available literature is extensive. We refer to [7,14,17,18] for contributions related to the objective function tackled in this work. Exact algorithms (mainly "ad-hoc" branch and bound [1,2,3,5,13,21]) and MILP-based approaches [20] have also been proposed but, due to their substantial computational times, these methods are mainly suitable for relatively small instances. To the authors' knowledge, the best results obtained for the F2| |ΣCi problem have been achieved by the Recovering Beam Search method (RBS) of [4].

P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 38–47, 2011.
© Springer-Verlag Berlin Heidelberg 2011
Matheuristics for F2| |ΣCj
Also, Dong et al. [7] proposed a very effective Iterated Local Search method for the more general m-machines permutation flow shop (Fm|perm|ΣCi), but computational experience is not reported for the 2-machines case. Matheuristics are methods that have recently attracted the attention of the research community, giving rise to an impressive amount of work in a few years. Matheuristics rely on the general idea of exploiting the strengths of both metaheuristic algorithms and exact methods, leading to a "hybrid" approach (see [16]); but, because of their novelty, there is no unique classification nor a consolidated working framework in the field, and hence it is hard to state a pure and sharp definition of these methods. A distinguishing feature is often the exploitation of nontrivial mathematical programming tools as part of the solution process. For example, in [8] a sophisticated Mixed-Integer Linear Programming (MILP) solver is used for analyzing very large neighborhoods in the solution space. Hence matheuristics are seen by many as a combination of metaheuristics and mathematical programming, even though this does not cover the complete range of possibilities. A crucial issue, also underlined in [16], is that the structure of these methods is not defined a priori, and in fact a solution approach can be built in many different ways. As a general example, one can construct a matheuristic algorithm based on an overarching well-known metaheuristic, for example a Variable Neighborhood Search [6,11], with search phases realized by an exact algorithm as well as by a MILP solver.
A different, more loosely coupled approach could be a two-stage procedure: a first heuristic procedure is applied to the problem to generate a starting solution, and then a post-processing "refinement" procedure is applied, exploiting, for example, some peculiar properties of the mathematical formulation of the problem under analysis; this second example is the core idea of the present work. Pursuing the above-sketched idea of a two-stage procedure, we couple a heuristic algorithm like RBS with a neighborhood search based on a MILP formulation solved by means of a commercial tool. The two-stage approach is appealing because of its simplicity (allowing one to tinker with building blocks plus some glue code) and for the possibility of concentrating more on modeling the neighborhood instead of building up the search procedure. Exploiting this idea we obtain very good results, improving solution quality over the state-of-the-art heuristics. The paper is organized as follows: in Section 2 a MILP model for the problem is recalled and the proposed matheuristic procedure is described. In Section 3 computational results are reported, and final remarks are given in Section 4.
2 Basic Model and Matheuristic Approach
Following the two-stage scheme, we execute in the second stage an intensive neighborhood search starting from the solution delivered in the first stage. We define a neighborhood structure relying on a MILP model of the F2|perm|ΣCj problem; we build on the model with positional variables (see [12,15]), since
it offered better performances with respect to other classical models based on disjunctive variables and constraints. Let Ckj be variables representing the completion time of the j-th job processed by machine k = 1, 2, and let xij be 0/1 decision variables, where i, j ∈ {1, . . . , n}. A variable xij is equal to 1 if job i is in position j of the sequence, and zero otherwise. The problem can be formulated as follows.

    min Σ_{j=1}^{n} C2j                                              (1)

subject to

    Σ_{i=1}^{n} xij = 1                        ∀ j = 1, . . . , n    (2)
    Σ_{j=1}^{n} xij = 1                        ∀ i = 1, . . . , n    (3)
    C11 = Σ_{i=1}^{n} p1i xi1                                        (4)
    C21 = C11 + Σ_{i=1}^{n} p2i xi1                                  (5)
    C1j = C1,j−1 + Σ_{i=1}^{n} p1i xij         ∀ j = 2, . . . , n    (6)
    C2j ≥ C1j + Σ_{i=1}^{n} p2i xij            ∀ j = 2, . . . , n    (7)
    C2j ≥ C2,j−1 + Σ_{i=1}^{n} p2i xij         ∀ j = 2, . . . , n    (8)
    xij ∈ {0, 1}                               ∀ i, j = 1, . . . , n (9)
where constraints (2)–(3) state that a job is chosen for each position in the sequence and each job is processed exactly once. Constraints (4)–(5) set the completion times of the first sequenced job on both machines, and constraints (6) chain the completion times on machine one. Constraints (7)–(8) forbid the second operation of the job in position j from starting on machine two before its first operation has completed on machine one (7) and before the preceding job has left machine two (8). The heuristic algorithm considered for the first stage is RBS from [4]; RBS is a beam search technique combined with a limited neighborhood search, typically based on job extraction and reinsertion. For the F2| |ΣCi problem it offers high execution speed combined with good solution quality. In designing a neighborhood concept for the second-stage search, a crucial issue is that the structure of the neighborhood should be as "orthogonal" as possible to the structure of the neighborhoods used by the first-stage
heuristic. That is, we do not want the solution delivered by the first stage to be (close to) a local optimum for the second stage. Hence we tried to design a neighborhood with many more degrees of freedom, still keeping in mind that the perturbation of the current solution should not fully disrupt its structure. Consider a working sequence S̄; in model (1)–(9) this obviously corresponds to a valid configuration x̄ = (x̄ij : i, j = 1, . . . , n) satisfying constraints (2)–(3), with x̄ij = 1 iff job i appears in the j-th position of S̄. We define a neighborhood N(S̄, r, h) by choosing a position r in the sequence and a "size" parameter h; let S(r; h) = {[r], [r + 1], . . . , [r + h − 1]} be the index set of the jobs located in the consecutive positions r, . . . , r + h − 1 of sequence S̄ (we call such a run a "job-window"). The choice of the best solution in the neighborhood N(S̄, r, h) is accomplished by minimizing (1) subject to (2)–(9) and

    xij = x̄ij    ∀ i ∉ S(r; h), j ∉ {r, . . . , r + h − 1}.    (W)
The resulting minimization program, which we call the window reoptimization problem, is solved by means of an off-the-shelf MILP solver. The additional constraints (W) state that in the new solution all jobs but those in the window are fixed in the position they have in the current solution, while the window gets reoptimized; the idea is sketched in Figure 1. If no improved solution is found, a new job-window is selected to be optimized, until all possible O(n) windows have been selected. The search stops because of local optimality (no window reoptimization offers any improved solution) or because a predefined time limit expires. It is known that exact methods are usually not suited to this kind of problem because of the CPU time they need; this, however, holds for larger instances, while they can perform well on relatively small ones. Exploiting this fact in our approach, each subproblem is generated with few variables, and in such cases we know that commercial, open-source or custom exact methods, used almost "out of the box", can perform well at analyzing large-scale (exponential) neighborhoods of a given solution. With respect to the choice of the windows, a first-improvement strategy has been implemented: as soon as an improved solution is found by solving a window reoptimization problem, that solution becomes the new current one. The choice of the windows (the r index) is randomized, keeping track of the already examined windows.

Fig. 1. Example of jobs window reoptimized

The algorithm can be schematically described as follows.

    x̄ := heuristic solution from 1st stage
    repeat
        Set improved := false
        repeat
            Pick r ∈ {1, . . . , n − h + 1} randomly
            Compute S(r; h)
            Minimize (1) subject to (2)–(9) and (W); let x̂ be the optimal solution
            if f(x̄) > f(x̂) then
                x̄ := x̂
                Set improved := true
            end if
        until improved or all r values have been tried
    until not improved or time limit expired

In order to limit the time spent searching a window, we stop the window reoptimization after a time limit Tw, concluding with the best incumbent available and the neighborhood only partially searched. We note anyway that, for reasonable values of Tw, the window reoptimization can most of the time be fully performed. Our design choices rest on a preliminary computational study, as we outline below. Tests performed to compare models based on disjunctive constraints against models with positional variables pointed out that with the former the window reoptimization becomes substantially less efficient, requiring higher computation time. This phenomenon was expected, since disjunctive models are popularly considered weaker than positional models because of the substantial number of "big-M" constraints involved in such formulations. The decision to generate neighbors based on windows is justified by preliminary tests conducted on a pool of instances of various sizes. In principle, an even simpler neighborhood definition could require the reoptimization of a completely general subset of jobs, not necessarily consecutively sequenced.
It turned out, however, that the first-stage solutions delivered by RBS were often nearly local minima for neighborhoods based on rescheduling non-consecutive jobs, thus missing the desired "orthogonality" between first- and second-stage neighborhoods. This phenomenon is much less common when using windows. The window size h is the key parameter of our approach; its choice is dictated by the trade-off between the chances of improving a given solution and the CPU time the solver needs to actually perform the reoptimization. A small window size makes reoptimization faster, but of course it restricts the size of the
Matheuristics for F2||ΣCj
neighborhood; on the other hand, the neighborhood should be as large as possible in order to have more chances of improving the current solution. After tests on the same pool of instances, we fixed h = 12; this value proved to be a robust choice, giving good results in extensive computational tests (see Section 3) on instances with 100, 300 and 500 jobs. The value h = 12 should be considered only an indication of the order of magnitude for the parameter: the choice may also depend on the technology of the underlying solver, which is used in a "black-box" fashion and whose internals may not be fully known.
3 Computational Results
We ran tests on a PC equipped with a Xeon processor at 2.33 GHz and 8 GB of RAM; CPLEX 12.1 was used as the MILP solver, with default parameters and no tuning. In order to generate the first-stage solution of each instance we ran RBS (from [4]) with beam size 10; computational experience showed that enlarging this parameter does not significantly improve the performance of RBS. Tables 1–3 report the performances achieved by the two-stage procedure on batches of twenty instances generated as in [4], for n = 100, 300, 500, with integer processing times drawn uniformly at random from [1, 100]. The first stage never consumed more than 15 seconds of CPU time, and we allowed the second stage to run with a time limit of 60, 600, and 3600 seconds, respectively, for the three problem sizes. The time limit Tw for the window reoptimization was set to 10, 60 and 100 seconds for n = 100, 300, 500, respectively, but in all cases the window reoptimization completed well before the limit. The tables compare the quality of the solutions delivered by the matheuristic approach against those delivered by pure RBS, ILS and, for n = 100, Ant Colony Optimization (from [19], kindly provided by the authors). ILS ran for the same time limit given to the two-stage procedure; the ILS code was kindly provided by the authors of [7]. Table 1 refers to the tests for n = 100. Columns RBS, ILS and ACO report the objective function values of the solutions delivered by those approaches. Column MATHEUR reports the objective values delivered by our matheuristic approach, and column MATHEUR* reports the results of our approach within a time limit of 300 seconds. Testing our procedure against wider time limits allows us to verify whether a local minimum had been reached within the benchmark time limit, or whether the time limit stopped the approach while improvements were still to be found.
Column LB reports a lower bound on the optimal value of each instance, derived from the LP relaxation of (1)–(9); in order to obtain a stronger bound, LB is defined as the minimum lower bound among the open nodes after 60 seconds of CPLEX branch and bound. The best feasible solution obtained by CPLEX after the same amount of work is reported in the column labeled UB.
All heuristics reach a narrow optimality gap; in all cases the two-stage search strongly dominates all the other compared approaches. We note that CPLEX delivered, after one minute of branch and bound, a solution whose quality is in most cases dominated by that of ILS or RBS. We also note that ACO is dominated by ILS and/or RBS on all instances; for this reason we do not consider ACO a competitor on larger instances, focusing on the comparison with ILS and CPLEX. The same considerations apply to Table 2, where we consider n = 300. In this case the benchmark time limit is 600 seconds while the extended time limit is 1800 seconds. The matheuristic approach gave better results in all cases except one, where CPLEX gets a better solution, while within the extended time limit our procedure was always the winner. Moreover, 600 seconds were never sufficient to certify a local minimum. The results presented in Table 3 for n = 500 jobs are slightly weaker than the others: CPLEX performed better than our procedure on 6 instances out of 20 within the time limit of 1 hour. We note that even on such instances, ILS usually does not outperform the matheuristic approach (this happens only on instance #15). The time limit for MATHEUR* was fixed to 7200 seconds. The performance of the matheuristic on these largest instances could be improved by further calibration of the window size (only a mild effort has been devoted to it in this work) and/or by incorporating some diversification technique in the second-stage search, which up to now consists of pure intensification.

Table 1. Results for n = 100, 60 secs

Inst  RBS     ILS     ACO     MATHEUR  MATHEUR*  LB         UB
 0    190580  190599  191075  190254   190223    189843.64  190637
 1    192250  192346  192888  192159   192159    191741.95  192541
 2    175496  175586  176073  175350   175327    174909.20  175694
 3    190894  190822  191311  190545   190540    190100.11  191192
 4    173260  172787  173198  172708   172597    171923.31  173041
 5    187234  187226  187633  186968   186943    186623.70  187228
 6    166930  166154  166490  166107   165956    165449.77  166294
 7    193122  193106  193796  192881   192876    192529.62  193112
 8    171435  171374  171974  171261   171182    170762.77  171575
 9    173666  173470  174072  173360   173353    172822.78  173619
10    194496  194164  195153  194090   194042    193602.12  194378
11    178703  178713  179516  178416   178399    178016.53  178747
12    188090  187590  187582  187214   187191    186786.80  187516
13    204725  205101  205020  204441   204405    203950.38  204856
14    186049  186173  186455  185689   185650    185166.21  186182
15    194925  194619  195058  194433   194381    193773.17  194977
16    192079  192382  192437  191935   191848    191282.61  192466
17    202725  202021  202247  201996   201818    201291.25  202273
18    176443  176216  176957  176144   176112    175570.47  176514
19    178313  178349  178796  178014   177993    177651.31  178210
Table 2. Results for n = 300, 600 secs

Inst  RBS      ILS      MATHEUR  MATHEUR*  LB          UB
 0    1703211  1702256  1700127  1699479   1697849.40  1702034
 1    1632209  1636048  1630997  1630817   1629384.60  1632681
 2    1737151  1738890  1736218  1736003   1733880.94  1738066
 3    1761128  1762538  1759344  1759239   1757575.31  1761260
 4    1836372  1840062  1834379  1834243   1832929.82  1836067
 5    1765979  1768072  1764601  1764305   1762980.85  1766312
 6    1797299  1798523  1794095  1793340   1791780.67  1795096
 7    1803321  1806213  1802100  1801972   1800512.03  1804189
 8    1784953  1780052  1778751  1776085   1773943.34  1778085
 9    1835515  1840146  1832610  1831981   1830603.61  1833678
10    1739093  1740728  1736557  1735953   1734497.71  1737675
11    1755873  1749832  1748247  1746327   1744370.65  1748522
12    1820591  1823739  1817335  1816771   1815519.06  1818330
13    1691438  1692350  1689231  1688356   1686405.61  1690935
14    1792010  1795283  1789677  1789435   1788031.32  1791391
15    1830889  1832292  1829700  1829539   1828170.46  1832264
16    1825747  1826510  1822444  1821510   1819986.79  1823464
17    1756024  1757753  1754160  1753666   1752198.10  1756183
18    1756944  1758224  1754890  1754334   1752940.80  1756736
19    1827704  1828647  1825659  1825375   1823857.26  1828020
Table 3. Results for n = 500, 3600 secs

Inst  RBS      ILS      MATHEUR  MATHEUR*  LB          UB
 0    5144540  5152875  5140704  5140035   5136952.93  5143635
 1    4994590  5002745  4990250  4987900   4983253.08  4988865
 2    5040922  5034092  5032714  5028593   5020723.19  5029116
 3    4904534  4914064  4901632  4900607   4897858.55  4904571
 4    5005902  5009590  5002635  5001433   4997890.36  5004911
 5    4842945  4853143  4838979  4837693   4833972.24  4839728
 6    5011513  5016199  5007430  5005994   5002655.02  5009555
 7    5104328  5076191  5091775  5084128   5064833.20  5071688
 8    5063940  5065326  5060261  5059070   5056099.72  5062880
 9    4838541  4846072  4835943  4835043   4832246.66  4839079
10    5092817  5097629  5089093  5087501   5083138.52  5090212
11    4772735  4766853  4764130  4760492   4754100.99  4761279
12    4908764  4922216  4905943  4905135   4902325.86  4911806
13    4814208  4808437  4803011  4797422   4789764.69  4795061
14    5038041  5042309  5035192  5033977   5029369.95  5038432
15    5285359  5265594  5274060  5269667   5254046.68  5261292
16    4799251  4801812  4795606  4794169   4788121.47  4797003
17    4957518  4968088  4955545  4954798   4952415.71  4959358
18    4860580  4870398  4854128  4852002   4848152.27  4855376
19    5059799  5065701  5056070  5054660   5051654.98  5058377
4 Concluding Remarks
A matheuristic two-stage approach for minimizing total flowtime in a two-machine (permutation) flow shop has been developed and tested. The results confirm that, even if pure MILP approaches apparently still cannot compete with ad-hoc state-of-the-art heuristics for this problem, a hybrid approach implementing a post-optimization refinement procedure by means of a MILP solver can achieve valuable results, dominating the current state-of-the-art heuristics. We also stress the simplicity of the approach: the first stage is implemented using a state-of-the-art heuristic taken off the shelf, and the second stage is implemented with a black-box solver at the only cost of writing a well-known MILP model plus about 100 additional lines of code.
References

1. Akkan, C., Karabati, S.: The two-machine flowshop total completion time problem: improved lower bounds and a branch-and-bound algorithm. European Journal of Operational Research 159, 420–429 (2004)
2. Bansal, S.P.: Minimizing the sum of completion times of n jobs over M machines in a flowshop: a branch and bound approach. AIIE Transactions 9, 306–311 (1977)
3. Della Croce, F., Ghirardi, M., Tadei, R.: An improved branch-and-bound algorithm for the two machine total completion time flow shop problem. European Journal of Operational Research 139, 293–301 (2002)
4. Della Croce, F., Ghirardi, M., Tadei, R.: Recovering Beam Search: enhancing the beam search approach for combinatorial optimization problems. Journal of Heuristics 10, 89–104 (2004)
5. Della Croce, F., Narayan, V., Tadei, R.: The two-machine total completion time flow shop problem. European Journal of Operational Research 90, 227–237 (1996)
6. Della Croce, F., Salassa, F.: A variable neighborhood search based matheuristic for nurse rostering problems. In: Proceedings of the 8th International Conference on the Practice and Theory of Automated Timetabling, PATAT 2010, Belfast, UK, August 10–13 (2010)
7. Dong, X., Huang, H., Chen, P.: An iterated local search algorithm for the permutation flowshop problem with total flowtime criterion. Computers & Operations Research 36, 1664–1669 (2009)
8. Fischetti, M., Lodi, A.: Local branching. Mathematical Programming B 98, 23–47 (2003)
9. Garey, M.R., Johnson, D.S., Sethi, R.: The complexity of flowshop and jobshop scheduling. Mathematics of Operations Research 1, 117–129 (1976)
10. Graham, R.L., Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G.: Optimization and approximation in deterministic sequencing and scheduling: a survey. Annals of Discrete Mathematics 5, 287–326 (1979)
11. Hansen, P., Mladenović, N.: Variable neighborhood search: principles and applications. European Journal of Operational Research 130, 449–467 (2001)
12. Hoogeveen, H., van Norden, L., van de Velde, S.: Lower bounds for minimizing total completion time in a two-machine flow shop. Journal of Scheduling 9, 559–568 (2006)
13. Hoogeveen, J.A., van de Velde, S.L.: Stronger Lagrangian bounds by use of slack variables: applications to machine scheduling problems. Mathematical Programming 70, 173–190 (1995)
14. Ladhari, T., Rakrouki, M.A.: Heuristics and lower bounds for minimizing the total completion time in a two-machine flowshop. International Journal of Production Economics 122, 678–691 (2009)
15. Lasserre, J.B., Queyranne, M.: Generic scheduling polyhedra and a new mixed integer formulation for single machine scheduling. In: Proceedings of the IPCO Conference (1992)
16. Maniezzo, V., Stützle, T., Voss, S.: Matheuristics: Hybridizing Metaheuristics and Mathematical Programming. Annals of Information Systems 10 (2009)
17. Ruiz, R., Maroto, C.: A comprehensive review and evaluation of permutation flowshop heuristics. European Journal of Operational Research 165, 479–494 (2005)
18. Taillard, E.: Some efficient heuristic methods for the flow shop sequencing problem. European Journal of Operational Research 47, 65–74 (1990)
19. T'kindt, V., Monmarché, N., Laugt, D., Tercinet, F.: An ant colony optimization algorithm to solve a 2-machine bicriteria flowshop scheduling problem. European Journal of Operational Research 142, 250–257 (2002)
20. Stafford, E.F.: On the development of a mixed integer linear programming model for the flowshop sequencing problem. Journal of the Operational Research Society 39, 1163–1174 (1988)
21. Van de Velde, S.: Minimizing the sum of job completion times in the two-machine flowshop by Lagrangean relaxation. Annals of Operations Research 26, 257–268 (1990)
Connectedness and Local Search for Bicriteria Knapsack Problems

Arnaud Liefooghe¹, Luís Paquete², Marco Simões², and José R. Figueira³

¹ Université Lille 1, LIFL – CNRS – INRIA Lille-Nord Europe, France
  [email protected]
² CISUC, Department of Informatics Engineering, University of Coimbra, Portugal
  [email protected], [email protected]
³ INPL, École des Mines de Nancy, Laboratoire LORIA, France
  [email protected]
Abstract. This article reports an experimental study on a given structural property of connectedness of optimal solutions for two variants of the bicriteria knapsack problem. A local search algorithm that explores this property is then proposed and its performance is compared against exact algorithms in terms of running time and number of optimal solutions found. The experimental results indicate that this simple local search algorithm is able to find a representative set of optimal solutions in most of the cases, and in much less time than exact approaches.
1 Introduction
Stochastic local search algorithms have been applied successfully to many multicriteria combinatorial optimization (MCO) problems. It is widely accepted that their sound performance is related to structural properties of the solution space that allow local search procedures to find reasonably good quality solutions in an effective manner. However, little is known about which properties these are and how they affect the performance of this class of algorithms. In this article, the notion of connectedness of the set of optimal solutions for MCO problems (a.k.a. the efficient set) [3] is analyzed from an experimental point of view, and related to the performance of a particular class of multicriteria local search algorithms [9]. For a given efficient set of an MCO problem instance, a graph can be constructed such that each node represents an optimal solution and an edge connects two nodes if the corresponding optimal solutions are neighbors for a given neighborhood structure. The efficient set is connected with respect to that neighborhood structure if the underlying graph is connected, that is, if there is a path between any pair of nodes. If the efficient set is connected and the neighborhood structure is computationally tractable, local search algorithms are able to find the efficient set in a very effective manner [9] by starting from at least one efficient solution. However, worst-case results have shown that the efficient set for many MCO problems is not connected in general with respect to different neighborhood structures [3,5], except for very few particular cases [6,11]. Moreover, some recent results indicate

P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 48–59, 2011.
© Springer-Verlag Berlin Heidelberg 2011
that approximate solutions obtained from independent metaheuristic runs on the bicriteria traveling salesman problem are strongly clustered with respect to small-sized neighborhood structures [10]. The degree of connectedness is investigated here experimentally for two variants of the bicriteria knapsack problem: the bicriteria unconstrained optimization problem (BUOP) and the bicriteria knapsack problem with bounded cardinality (BKP-BC). Both problems are NP-hard and intractable in the general case [2]. The experimental results suggest that the efficient set for the two problems above is very often connected with respect to elementary neighborhood structures, despite the negative results reported in the literature [5]. Based on these positive findings, a local search algorithm is proposed and its performance is compared with that of multicriteria dynamic programming algorithms in terms of running time and number of optimal solutions found. A special technique is introduced that allows the early termination of the complete neighborhood exploration without harming algorithmic performance in terms of solution quality. The article is organized as follows. Sections 2 and 3 give the connectedness results for BUOP and BKP-BC, respectively; in each of these sections, the local search algorithm and the exact approach for the corresponding problem are also introduced, and numerical results are shown on a large set of random instances of different structure and size. Finally, Section 4 presents conclusions and further work.
2 The Bicriteria Unconstrained Optimization Problem
This section introduces a variant of the classical knapsack problem where the capacity constraint is transformed into an additional criterion to be optimized. Then, an experimental analysis of the connectedness property of the corresponding efficient set is reported, as well as the performance of a local search algorithm on several instances of the problem.

2.1 Problem Definition
The original (single-criterion) 0/1 knapsack problem is formulated as follows:

    max  Σ_{i=1}^n p_i x_i
    s.t. Σ_{i=1}^n w_i x_i ≤ W        (1)
where p = (p1 , p2 , . . . , pj , . . . , pn ) is the profit vector, pj representing the amount of profit on item j, j = 1, . . . , n, and x = (x1 , x2 , . . . , xj , . . . , xn ) with xj = 1 if the item j is included in the subset of selected items (knapsack) and xj = 0 otherwise; w = (w1 , w2 , . . . , wj , . . . , wn ) is the weight vector, wj representing the amount of investment on item j, j = 1, . . . , n; and W is the overall amount
available or budget. The sum of the profits and the sum of the weights of a given solution x are denoted by p(x) and w(x), respectively. By transforming the capacity constraint of Problem (1) into a criterion, the following bicriteria unconstrained optimization problem [2] is obtained:

    max (p(x), −w(x))        (2)
A proper meaning for the operator "max" above is given as follows. Let X denote the set of feasible solutions of Problem (2). The image of the feasible solutions under the vector maximizing function of Problem (2) defines the feasible region in the criteria space, denoted here by Z ⊆ N². A feasible solution x ∈ X is efficient if there does not exist another feasible solution x′ ∈ X such that p(x′) ≥ p(x) and w(x′) ≤ w(x), with at least one strict inequality (equivalently, (p(x′), −w(x′)) ≥ (p(x), −w(x)) with the two vectors distinct). A vector z ∈ Z is nondominated if there is some efficient solution x such that z = (p(x), −w(x)). A vector z′ ∈ Z dominates a vector z ∈ Z (or z is dominated by z′) if z′ ≥ z holds; if neither z′ ≥ z nor z ≥ z′ holds, then both are (mutually) nondominated. The set of all efficient solutions and the set of nondominated vectors are called the efficient set and the nondominated set, respectively. The usual goal of MCO is to find a minimal complete set, that is, a smallest subset of the efficient set whose image coincides with the nondominated set; this subset may not be unique.

2.2 Connectedness Analysis
This section describes an experimental analysis investigating the influence of problem size and of the degree of conflict between the two criteria on the connectedness property of the efficient set for BUOP. A multicriteria dynamic programming (MDP-BUOP) algorithm is implemented to compute the efficient set. This algorithm consists of the first phase of the Nemhauser–Ullmann algorithm for the single-criterion 0/1 knapsack problem [8], which has been shown to be theoretically efficient for several input data distributions [1]. The MDP-BUOP sequential process consists of n stages. At any stage i, the algorithm generates a set Si of states, which represents a set of promising feasible solutions made up of the first i items, i = 1, . . . , n. A state s = (sp, sw) ∈ Si represents a feasible solution of profit sp and weight sw. The MDP-BUOP algorithm follows the recursion Si := vmax{Si−1 ∪ {(sp + pi, sw − wi) : (sp, sw) ∈ Si−1}} for i = 1, . . . , n, with the basis case S0 := {(0, 0)}. The operator "vmax" returns the states that are nondominated in Si. At the last stage n, the set Sn corresponds to the nondominated set. In order to obtain the efficient set with the MDP-BUOP algorithm, a binary string is generated with each new state and updated accordingly during the sequential process; for this reason, the implementation keeps states with the same component values. The removal of dominated states at each stage is performed by the algorithm of Kung et al. [7]. Only two sets of states are maintained during the overall sequential process since, at any stage i > 0, only the set Si−1 is required.
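A minimal Python sketch of the recursion above, under two simplifying assumptions: the binary string attached to each state is represented as an item set, and a simple sort-and-sweep filter stands in for the algorithm of Kung et al. [7]. Unlike the paper's implementation, only one representative per nondominated value is kept, i.e., a minimal complete set; the function names are ours.

```python
def vmax_states(states):
    """Filter dominated states, keeping one representative per
    nondominated (profit, negweight) value."""
    # sort by profit desc, then negweight desc; a state survives iff its
    # negweight strictly exceeds every negweight seen so far
    states = sorted(states, key=lambda s: (-s[0], -s[1]))
    front, best_nw = [], None
    for s in states:
        if best_nw is None or s[1] > best_nw:
            front.append(s)
            best_nw = s[1]
    return front

def mdp_buop(p, w):
    """MDP-BUOP sketch: states are (profit, negweight, item set)."""
    S = [(0, 0, frozenset())]                        # basis case S0
    for i in range(len(p)):
        ext = [(sp + p[i], nw - w[i], items | {i})   # extend each state by item i
               for (sp, nw, items) in S]
        S = vmax_states(S + ext)                     # vmax over S_{i-1} and extensions
    return S                                         # one solution per nondominated value
```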
Table 1. Percentage of BUOP instances connected with respect to the 1-flip neighborhood (all instances are connected with respect to the 1-flip-exchange neighborhood) and average size of the efficient set, for each instance size and data correlation (%con and avgs, respectively; a dash marks entries for which the analysis could not be completed)

        ρ = −0.8         ρ = −0.4         ρ = 0.0          ρ = 0.4          ρ = 0.8
size    %con    avgs     %con    avgs     %con    avgs     %con    avgs     %con    avgs
 300   100.0    8965     96.7   10838    100.0   14501    100.0   24702     93.3    55159
 600    96.7   32242    100.0   39056    100.0   52675     96.7   92717     90.0   228427
 900    90.0   69208     93.3   83357     86.7  112598     76.7  201642     60.0   495154
1200    96.7  118483     90.0  144259     86.7  195396     83.3  338903      –     823612
1500    90.0  179932     93.3  217230     86.7  301845      –    526029      –    1352817
1800    83.3  252972     93.3  308373      –    423450      –    716598      –    1818608
2100    76.7  337443      –    409771      –    563840      –    969069      –    2431715
These two sets are implemented as height-balanced binary search trees in order to allow logarithmic-time operations. If the implementation does not terminate within one hour of CPU time, or if the available RAM is exceeded, the run is cancelled and the output is omitted. The code is written in C++. BUOP instances are defined by two parameters: the problem size (n) and the correlation between the profit and weight vectors (ρ). Both parameters affect the size of the efficient set: a positive (resp. negative) data correlation increases (resp. decreases) the degree of conflict between the two criteria. The size of the instances ranges from 300 to 3000 and the correlations are ρ ∈ {−0.8, −0.4, 0.0, 0.4, 0.8}. Profit and weight integer values are generated randomly according to a uniform distribution in [1, M/n], where M denotes the maximum possible integer value. The generation of correlated data follows the procedure given by Verel et al. [12]. For each problem size and each correlation degree, 30 different and independent instances are randomly generated. Two neighborhood structures are considered for the experimental analysis: the 1-flip and the 1-flip-exchange neighborhoods. Two feasible solutions are 1-flip neighbors if they differ in exactly one assignment; in other words, a neighbor can be reached by adding or removing one item from a given solution. Hence, this neighborhood structure is directly related to the Hamming distance between binary strings. The 1-flip-exchange neighborhood is an extension of the neighborhood above: two feasible solutions are 1-flip-exchange neighbors if one can be obtained from the other by exchanging two items, adding one item, or removing one item. The size of the neighborhood is linear in n for the 1-flip neighborhood structure, while it is quadratic for 1-flip-exchange. Both neighborhoods coincide with the neighborhood structures used by Gorski et al. [5] for a similar class of problems.
For the connectedness analysis, MDP-BUOP outputs the efficient set for every instance. For each neighborhood structure, an adjacency matrix is built, indicating whether each pair of efficient solutions are neighbors or not. Based on this matrix, the connectedness of the corresponding graph is tested. Since this analysis involves a large amount of memory (more than 2 GB for large-size
Algorithm 1. Pareto Local Search
Input: n ∈ N; p, w ∈ Nⁿ; x0 := 0
Output: set VT
 1. VF := {x0}
 2. VT := ∅
 3. while VF ≠ ∅ do
 4.     select x′ from VF
 5.     VF := VF \ {x′}
 6.     VT := VT ∪ {x′}
 7.     for all x ∈ N(x′) do
 8.         if {x″ ∈ VF ∪ VT : (p(x″), −w(x″)) ≥ (p(x), −w(x))} = ∅ then
 9.             VF := {x″ ∈ VF : (p(x), −w(x)) ≱ (p(x″), −w(x″))}
10.             VT := {x″ ∈ VT : (p(x), −w(x)) ≱ (p(x″), −w(x″))}
11.             VF := VF ∪ {x}
12. return VT
instances), only results for a limited number of instance sizes and correlation values are presented. Table 1 gives the percentage of instances whose efficient set is connected with respect to the 1-flip neighborhood, as well as the average size of the efficient set, rounded to the nearest integer. Although the correlation in the input data influences the size of the efficient set, it does not seem to affect the connectedness results. However, the proportion of instances with a connected efficient set slightly decreases as the instance size increases. Finally, for those sets of instances, the efficient set is always connected with respect to the 1-flip-exchange neighborhood.
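The connectedness test described above amounts to building the neighborhood graph over the efficient set and checking it with a graph traversal. A minimal sketch, with efficient solutions represented as item sets and the two neighborhood predicates written out explicitly (function names are ours):

```python
from collections import deque

def is_connected(solutions, neighbors):
    """solutions: list of frozensets (efficient solutions as item sets).
    neighbors(a, b): True iff a and b are adjacent for the chosen
    neighborhood structure. Builds the graph and runs a BFS from node 0."""
    m = len(solutions)
    if m <= 1:
        return True
    adj = [[] for _ in range(m)]
    for a in range(m):
        for b in range(a + 1, m):
            if neighbors(solutions[a], solutions[b]):
                adj[a].append(b)
                adj[b].append(a)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == m              # connected iff BFS reaches every node

def one_flip(a, b):
    """1-flip adjacency: the solutions differ in exactly one assignment."""
    return len(a ^ b) == 1

def one_flip_exchange(a, b):
    """1-flip-exchange adjacency: add one, remove one, or swap two items."""
    d = a ^ b
    return len(d) == 1 or (len(d) == 2 and len(a - b) == 1)
```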
2.3 Local Search for BUOP
The local search algorithm for BUOP (PLS-BUOP) is based on Pareto Local Search [9]; the pseudo-code is given in Algorithm 1. For simplicity, it is assumed that all feasible solutions have a distinct image in the criteria space. Two archives of nondominated solutions are maintained, VT and VF. Archive VT contains the set of solutions whose neighborhood has already been explored, while VF contains the remaining ones. PLS-BUOP starts with an efficient solution that initializes the archive. Then, at each iteration, a solution is chosen from VF and its neighborhood is explored (N(x) denotes the neighborhood of a given solution x ∈ X). All the nondominated neighboring solutions are used to update VF, and dominated solutions are discarded from VF and VT. The algorithm terminates once VF is empty. This algorithm stops naturally when a Pareto local optimum set is found, it does not cycle, and if connectedness of the efficient set holds, it is able to identify the efficient set by starting from at least one efficient solution (with a proper change of the conditions in Algorithm 1) [9]. In the following, some particular details of the implementation are described.
Fig. 1. Initial dominance-depth ranking of items for the exploration of the neighborhood (left: ranking for adding, right: ranking for deleting)
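The dominance-depth ranking illustrated in Fig. 1 can be computed by repeatedly peeling off the current nondominated layer. A naive quadratic-per-layer sketch (maximization in both coordinates, so for the adding case items are passed as (p, −w) pairs; the function name is ours):

```python
def dominance_depth_layers(points):
    """Partition 2-D points into layers L1, L2, ..., where Li is the
    nondominated set once layers L1..L(i-1) are removed (maximization
    in both coordinates)."""
    remaining = list(points)
    layers = []
    while remaining:
        # a point stays in the current layer iff nothing else dominates it
        layer = [a for a in remaining
                 if not any(b != a and b[0] >= a[0] and b[1] >= a[1]
                            for b in remaining)]
        layers.append(sorted(layer))
        remaining = [a for a in remaining if a not in layer]
    return layers
```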
– Initialization. The algorithm starts with the efficient solution x0 = 0. This solution maps, in the criteria space, to an extreme point of the nondominated set, with null profit and weight values.
– Selection. At each iteration of the algorithm, the solution from VF with the smallest profit value is selected (Algorithm 1, line 4).
– 1-flip neighborhood exploration. In order to perform the 1-flip neighborhood exploration in an efficient manner, a preliminary step ranks the items into different layers with respect to the dominance-depth ranking [4]. Two different rankings are required to cover the cases of adding and deleting an item; this step is illustrated in Fig. 1. Let Li denote the set of items in the i-th layer and let u ∈ Li and v ∈ Lj, j > i. Then, for the case of addition, (pv, −wv) ≱ (pu, −wu): an item in a deeper layer never dominates an item in a shallower one. Therefore, the neighborhood exploration starts by examining the items in the first layer and proceeds with the items in the subsequent layers; within the same layer, the exploration follows the nondecreasing order of the weights. The exploration stops after verifying that no item of a given layer belongs to the current solution: indeed, any neighboring solution constructed from the subsequent layers is dominated by at least one neighboring solution built from this layer. For the case of deletion, a similar reasoning applies with the corresponding changes.
– 1-flip-exchange neighborhood exploration. For this neighborhood, the same reasoning as above applies when adding or removing an item. In order to exchange pairs of items more efficiently, the following pre-processing procedure is applied: first, for each item i ∈ {1, . . . , n}, a tuple records the profit and weight difference with respect to each different item j; then, the n − 1 tuples are sorted in terms of dominance-depth ranking. When considering the exchange of item i with another item, the exploration follows the order given by the dominance-depth ranking, with the nondecreasing order of the weights for tuples within the same layer; the exploration stops once the items corresponding to all tuples of a given layer can be exchanged with i. The exploration is iterated for every item in the knapsack.
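Algorithm 1 with the 1-flip neighborhood can be transcribed almost literally. The sketch below uses plain lists instead of height-balanced trees and a full neighborhood scan instead of the layered early-stopping rule, so it illustrates the logic of PLS-BUOP rather than the efficient implementation; the function names are ours.

```python
def weakly_dominates(za, zb):
    """za weakly dominates zb in the (profit, negweight) space."""
    return za[0] >= zb[0] and za[1] >= zb[1]

def pls_buop(p, w):
    """Pareto Local Search for BUOP, 1-flip neighborhood, started from x0 = 0."""
    n = len(p)
    val = lambda items: (sum(p[i] for i in items), -sum(w[i] for i in items))
    VF, VT = [frozenset()], []              # unexplored / explored archives
    while VF:
        # selection: the unexplored solution with the smallest profit
        x_cur = min(VF, key=lambda s: val(s)[0])
        VF.remove(x_cur)
        VT.append(x_cur)
        for i in range(n):                  # 1-flip: add or remove item i
            x = x_cur ^ {i}
            zx = val(x)
            # line 8: accept x only if no archived solution weakly dominates it
            if not any(weakly_dominates(val(y), zx) for y in VF + VT):
                # lines 9-10: drop archived solutions dominated by x
                VF = [y for y in VF if not weakly_dominates(zx, val(y))]
                VT = [y for y in VT if not weakly_dominates(zx, val(y))]
                VF.append(x)
    return VT
```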
Fig. 2. CPU time in seconds (average value and standard deviation, given in log scale) of MDP-BUOP and PLS-BUOP using the 1-flip neighborhood for instances with data correlation ρ = −0.4 (left) and ρ = 0.4 (right)
– Data structures. Archives VF and VT are implemented as height-balanced binary search trees. The removal of dominated solutions follows the algorithm of Kung et al. [7].

MDP-BUOP and two versions of PLS-BUOP, one for each neighborhood structure, were run on the instances described in Section 2.2. The implementation of MDP-BUOP was modified in order to output a minimal complete set. All the algorithms share the same programming language, data structures and compilation options, and were run on the same machine with a time bound of one hour. The PC used for the experiments was an iMac with Mac OS X v10.5.5 (Leopard), a 2.4 GHz Intel Core with 4 MB L2 cache and 2 GB SDRAM. All codes were compiled with g++ version 4.0.1 using the -O3 flag. The CPU time taken by MDP-BUOP and by PLS-BUOP with the 1-flip neighborhood on instances with correlation ρ = −0.4 and ρ = 0.4 is reported in Fig. 2; in order to highlight the differences in performance, the CPU time is presented in log scale. A similar relationship between the performances of the two implementations was obtained for the remaining correlation values. Table 2 presents the percentage of efficient solutions and the total number of solutions returned by PLS-BUOP using the 1-flip neighborhood, averaged over the results obtained on the 30 instances of each size and correlation. Since PLS-BUOP using the 1-flip-exchange neighborhood always took more time than MDP-BUOP, the results of the former are omitted. Although the experimental analysis indicated the existence of instances with an unconnected efficient set with respect to the 1-flip neighborhood (see Table 1), the results show that the local search approach using the same notion of neighborhood is able to identify a minimal complete set in many cases; for the other instances, it leads to the identification of more than 99.9% of the efficient set. Furthermore, the local search performs very efficiently in comparison to the dynamic programming approach in terms of computational time.
Indeed, the larger the instance size, the larger the gap between PLS-BUOP and MDP-BUOP in terms of CPU time. However, PLS-BUOP appears to be much more efficient
Connectedness and Local Search for Bicriteria Knapsack Problems
Table 2. Average percentage of efficient solutions and average number of solutions found by PLS-BUOP using the 1-flip neighborhood (%ef and avgs, respectively)
            ρ = −0.8           ρ = −0.4            ρ = 0.0              ρ = 0.4              ρ = 0.8
size     %ef      avgs      %ef      avgs      %ef        avgs      %ef        avgs      %ef        avgs
 300    100.0     8 965     99.9    10 838    100.0     14 501.2   100.0     24 702     99.9      55 159
 600     99.9    32 242    100.0    39 056    100.0     52 675.4   100.0     92 717    100.0     228 427
 900     99.9    69 208     99.9    83 357     99.9    112 597.3    99.9    201 642     99.9     495 153
1200    100.0   118 483     99.9   144 259     99.9    195 395.6   100.0    338 903     99.9     823 611
1500     99.9   179 932    100.0   217 230    100.0    301 844.6    99.9    526 029     99.9   1 352 816
1800     99.9   252 972    100.0   308 373     99.9    423 450.0    99.9    716 598     99.9   1 818 607
2100     99.9   337 443     99.9   409 770     99.9    563 839.5    99.9    969 068     99.9   2 431 713
2400    100.0   434 333     99.9   530 843    100.0    719 388.3    99.9  1 300 909      -     3 358 473
2700     99.9   541 046    100.0   661 676    100.0    900 284.3   100.0  1 560 123      -     4 158 499
3000     99.9   658 680    100.0   806 236    100.0  1 100 882.2    99.9  2 022 463      -     4 846 109
for negatively correlated data, while MDP-BUOP was not able to solve all the instances for positively correlated profit and weight values.
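The dominated-solution removal mentioned in the data-structure notes above follows Kung et al. [7]; in the bicriteria case this reduces to a simple sort-and-sweep over objective vectors. The sketch below is purely illustrative (function and variable names are ours, not taken from the authors' implementation):

```python
# Illustrative sketch of 2D nondominated filtering (the bicriteria case of
# Kung et al.'s algorithm): sort by the first objective in decreasing order
# and keep a point only if it strictly improves the best second objective
# seen so far. Both objectives are maximized, e.g. (profit, -weight).
def nondominated(points):
    best_second = float("-inf")
    kept = []
    for p in sorted(points, key=lambda t: (-t[0], -t[1])):
        if p[1] > best_second:      # not dominated by any point kept so far
            kept.append(p)
            best_second = p[1]
    return kept

print(nondominated([(3, -5), (2, -1), (3, -2), (1, -4)]))  # [(3, -2), (2, -1)]
```

For more than two objectives, the divide-and-conquer scheme of Kung et al. is needed; the sweep above only covers the bicriteria setting used here.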
3 The Bicriteria Knapsack Problem with Bounded Cardinality
This section reports a similar analysis on a variant of the previous problem, where an additional cardinality constraint is considered.

3.1 Problem Definition
The bicriteria knapsack problem with bounded cardinality is a variant of the BUOP that is obtained by limiting the number of chosen items by a cardinality bound (k). The formulation of BKP-BC is as follows:

max (p(x), −w(x))
s.t. ∑_{i=1}^{n} x_i ≤ k .   (3)
In the problem above, the operator “max” has the same meaning as in Problem (2). The same terminology and notation will be used for this problem.

3.2 Connectedness Analysis
The efficient set is computed by means of a multicriteria dynamic programming (MDP-BKP-BC) algorithm that extends the MDP-BUOP algorithm given in Section 2.2. This algorithm has k × n stages. At each stage, the algorithm generates the set T(i,j) of states, which represents a set of potential efficient solutions made up of the first i items, i = 1, . . . , n, with cardinality j, j = 1, . . . , k.
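The staged state-set computation can be sketched as follows. This is an illustrative re-implementation, not the authors' code: names are ours, states are stored as (profit, weight) pairs, and vmax is written as maximizing profit while minimizing weight, which is equivalent to maximizing (p, −w):

```python
# Sketch of the MDP-BKP-BC scheme described above: T[j] holds the
# nondominated (profit, weight) states of solutions with cardinality j.
def vmax(states):
    """Keep states maximizing profit and minimizing weight (sort-and-sweep)."""
    best_w = float("inf")
    kept = []
    for p, w in sorted(states, key=lambda s: (-s[0], s[1])):
        if w < best_w:
            kept.append((p, w))
            best_w = w
    return kept

def mdp_bkp_bc(profits, weights, k):
    n = len(profits)
    T = [[(0, 0)]] + [[] for _ in range(k)]      # T[j]: states of cardinality j
    for i in range(n):
        # Decreasing j, so T[j - 1] still holds the previous stage's states.
        for j in range(min(k, i + 1), 0, -1):
            extended = [(p + profits[i], w + weights[i]) for p, w in T[j - 1]]
            T[j] = vmax(T[j] + extended)
    # Nondominated states over all cardinalities 0, ..., k.
    return vmax([s for T_j in T for s in T_j])

print(mdp_bkp_bc([3, 2, 4], [2, 1, 3], 2))
```

Iterating j in decreasing order lets a single rolling array of k + 1 sets stand in for the pairs of per-cardinality sets that the full implementation keeps.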
A. Liefooghe et al.
A state t = (tp, tw) ∈ T(i,j) represents a feasible solution of profit tp and weight tw. This approach follows the recurrence relation

T(i,j) := vmax{ T(i−1,j) ∪ { (tp + pi, tw − wi) : t ∈ T(i−1,j−1) } }

for i = 1, . . . , n and j = 1, . . . , k, with the basis cases T(i,0) := {(0, 0)} for i = 0, . . . , n and T(0,j) := {(0, 0)} for j = 0, . . . , k. The nondominated set of states is given by vmax{ T(n,0) ∪ · · · ∪ T(n,k) }. The implementation follows the same principles presented in Section 2.2 for the MDP-BUOP algorithm. However, the implementation of MDP-BKP-BC has to keep 2(k + 1) sets during the overall process, since at each stage i, the sets T(i,j) and T(i−1,j), for j = 0, . . . , k, are required. For this reason, MDP-BKP-BC should take more time than MDP-BUOP for the same instance size. A set of instances similar to those used for the previous problem is generated. In addition to the instance size (n) and the data correlation (ρ), several values for the cardinality bound (k) are considered: n/10, n/5, and n/2. For each problem size, correlation and cardinality bound, 30 different and independent instances are generated randomly, as explained in Section 2.2. The two neighborhood structures described in Section 2.2 were also considered for this problem. The use of the 1-flip-exchange neighborhood for this problem is motivated by the conjecture that, due to the additional cardinality constraint, many efficient solutions may have the same cardinality. Due to limited memory resources, results were only obtained for instances of limited size (these instances can be inferred from Table 3). In contrast to the connectedness results obtained for the first problem (see Table 1), no instance whose efficient set is connected with respect to the 1-flip neighborhood was found. However, the efficient sets of all instances were connected with respect to the 1-flip-exchange neighborhood. These results corroborate those of Gorski et al.
[5] for much smaller instances (up to 100 items).

3.3 Local Search for BKP-BC
Given the positive results reported in the previous section, a local search (PLS-BKP-BC) was developed following the same reasoning as PLS-BUOP (see Section 2.3). The only difference lies in the neighborhood exploration, since the bound on the maximum number of items has to be respected when considering the addition of an item to a solution. MDP-BKP-BC and two versions of PLS-BKP-BC were run on the same instances defined in the previous section. The experiments were performed on a computer cluster with 6 nodes, each with an AMD Phenom II X6 processor at 3.2 GHz, with 3 MB L2 and 6 MB L3 cache, and 12 GB DDR3 SDRAM. The operating system was Ubuntu 8.04 LTS. All codes were compiled with g++ version 4.2.4 using the -O3 flag. The CPU time taken by the three approaches is plotted in Fig. 3 for four different instance parameter settings. In order to make the differences in performance visible, the CPU time is presented in log scale. Clearly, both PLS-BKP-BC
Fig. 3. CPU time in seconds (average value and standard deviation, given in log scale) of the three approaches for instances with parameters k = n/10 (top) and k = n/5 (bottom), ρ = −0.4 (left) and ρ = 0.4 (right)
versions take much less time than MDP-BKP-BC to terminate. Similar results hold for the remaining instances. Table 3 reports the percentage of efficient solutions found by the two versions of PLS-BKP-BC. As expected from the connectedness analysis reported in the previous section, PLS-BKP-BC using the 1-flip-exchange neighborhood was always able to find a minimal complete set for all instances up to 1800 items in less than one hour of CPU time. PLS-BKP-BC using the 1-flip neighborhood is, in many cases, able to find more than half of a minimal complete set in less than one second.
4 Concluding Remarks
This article describes an experimental analysis of the structure of the efficient set, in terms of connectedness, for two MCO problems. Despite the negative results reported in the literature for similar problems [3,5], the experimental analysis for the problems investigated in this paper is quite promising. For both the bicriteria unconstrained optimization problem and the bicriteria knapsack problem with a bounded cardinality constraint, the experiments suggest that small-sized neighborhood structures give rise to connected efficient sets quite frequently, independently of the size and structure of the input data. In fact, it is not yet clear what structure of the input data may generate, in
Table 3. Average percentage of efficient solutions found by PLS-BKP-BC using the 1-flip and the 1-flip-exchange neighborhood structures (%ef1 and %ef2 , respectively)
                ρ = −0.8        ρ = −0.4        ρ = 0.0         ρ = 0.4         ρ = 0.8
size   card.   %ef1   %ef2    %ef1   %ef2    %ef1   %ef2    %ef1   %ef2    %ef1   %ef2
 300   n/10    72.6  100.0    54.1  100.0    42.0  100.0    44.2  100.0    47.7  100.0
       n/5     83.9  100.0    63.0  100.0    53.5  100.0    54.4  100.0    56.3  100.0
       n/2     95.6  100.0    88.6  100.0    85.3  100.0    83.1  100.0    84.7  100.0
 600   n/10    75.3  100.0    50.9  100.0    41.6  100.0    41.6  100.0    47.6  100.0
       n/5     84.2  100.0    63.3  100.0    51.3  100.0    50.0  100.0    55.3  100.0
       n/2     95.4  100.0    88.3  100.0    83.1  100.0    83.4  100.0    47.0  100.0
 900   n/10    75.6  100.0    50.1  100.0    39.6  100.0    40.5  100.0     -      -
       n/5     83.1  100.0    61.0  100.0    50.4  100.0    51.5  100.0     -      -
1200   n/10    75.6  100.0    50.5  100.0    40.2  100.0    39.9  100.0     -      -
       n/5     82.3  100.0    60.0  100.0    50.9  100.0     -      -       -      -
1500   n/10    75.4  100.0    50.5  100.0    40.3  100.0     -      -       -      -
       n/5     83.1  100.0     -      -       -      -       -      -       -      -
1800   n/10    75.4  100.0     -      -       -      -       -      -       -      -
general, an unconnected efficient set under the 1-flip-exchange neighborhood used in this article. Although the large number of connected instances motivates the use of local search algorithms, it is still an open question whether those approaches are efficient enough compared to exact algorithms. The experimental analysis reported in this article gives a clearly positive answer for the second problem. For the first problem, without the cardinality constraint, some preliminary results indicated that the local search proposed in this article, under the same neighborhood that (empirically) provides connectedness, would not be worthwhile in terms of running time. Still, using a smaller neighborhood structure allows the same algorithm to find more than 99.9% of the efficient set in significantly less time than the exact approach. The simplicity of the local search approach proposed in this article is very appealing for implementation purposes: it needs no parameter definitions and requires minimal modifications in order to be applied to other types of knapsack problems. For instance, the same principles can be applied to the multicriteria knapsack problem (with several profit criteria to maximize and one capacity constraint) by ignoring (or penalizing) infeasible neighboring solutions. However, finding appropriate definitions of neighborhoods that give rise to a large number of connected efficient sets for knapsack problems with capacity constraints is still under investigation. A natural question is whether it is possible to derive analytical results for MCO problems that would prove connectedness by assuming some structure or distribution on the input data. Connections with neighborhood structures arising in the context of the linear programming formulation of the MCO problem may provide further insights [5].
Moreover, the derivation of bounds on the run-time of multiobjective evolutionary algorithms that start from efficient solutions is also of interest. For instance, the size of the efficient set for the first problem is
polynomially bounded for many input data distributions [1]. This suggests that a polynomial run-time bound may be achieved by such algorithms.

Acknowledgements. The authors thank Jochen Gorski for the discussion on the main topic of this article. This work was partially supported by the Portuguese Foundation for Science and Technology (PTDC/EIA-CCO/098674/2008) and the project “Connectedness and Local Search for Multiobjective Combinatorial Optimization” funded by the Deutscher Akademischer Austausch Dienst and the Conselho de Reitores das Universidades Portuguesas. The third author acknowledges a CEG-IST grant from Instituto Superior Técnico (PTDC/GES/73853/2006).
References
1. Beier, R., Vöcking, B.: Probabilistic analysis of knapsack core algorithms. In: Proc. of the 15th ACM-SIAM Symposium on Discrete Algorithms (SODA 2004), pp. 468–477 (2004)
2. Ehrgott, M.: Multicriteria Optimization. Lecture Notes in Economics and Mathematical Systems, vol. 491. Springer, Heidelberg (2000)
3. Ehrgott, M., Klamroth, K.: Connectedness of efficient solutions in multiple criteria combinatorial optimization. European Journal of Operational Research 97(1), 159–166 (1997)
4. Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Boston (1989)
5. Gorski, J., Klamroth, K., Ruzika, S.: Connectedness of efficient solutions in multiple objective combinatorial optimization. Tech. Rep. 102/2006, University of Kaiserslautern, Department of Mathematics (2006)
6. Gorski, J., Paquete, L.: On a particular case of the multi-criteria unconstrained optimization problem. Electronic Notes in Discrete Mathematics 36, 135–142 (2010)
7. Kung, H., Luccio, F., Preparata, F.: On finding the maxima of a set of vectors. Journal of the ACM 22(4), 469–476 (1975)
8. Nemhauser, G., Ullman, Z.: Discrete dynamic programming and capital allocation. Management Science 15(9), 494–505 (1969)
9. Paquete, L., Schiavinotto, T., Stützle, T.: On local optima in multiobjective combinatorial optimization problems. Annals of Operations Research 156(1), 83–97 (2007)
10. Paquete, L., Stützle, T.: Clusters of non-dominated solutions in multiobjective combinatorial optimization: An experimental analysis. In: Multiobjective Programming and Goal Programming, Lecture Notes in Economics and Mathematical Systems, vol. 618, pp. 69–77. Springer, Heidelberg (2009)
11. da Silva, C.G., Clímaco, J., Figueira, J.R.: Geometrical configuration of the Pareto frontier of the bi-criteria 0-1-knapsack problem. Tech. Rep. 16/2004, INESC, Coimbra, Portugal (2004)
12. Verel, S., Liefooghe, A., Jourdan, L., Dhaenens, C.: Analyzing the effect of objective correlation on the efficient set of MNK-landscapes. In: Proc. of the 5th Conference on Learning and Intelligent OptimizatioN (LION 5), LNCS. Springer, Heidelberg (2011) (to appear)
Cutting Graphs Using Competing Ant Colonies and an Edge Clustering Heuristic

Max Hinne and Elena Marchiori

Radboud University Nijmegen
Abstract. We investigate the use of Ant Colony Optimization to detect balanced graph cuts. To do so, we develop an algorithm based on competing ant colonies. We use a heuristic from social network analysis called the edge clustering coefficient, which greatly helps our colonies in local search. The algorithm is able to detect cuts that correspond very well to known cuts on small real-world networks. Moreover, with the correct parameter balance, our algorithm often outperforms the traditional Kernighan-Lin algorithm for graph partitioning at equal running time complexity. On larger networks, our algorithm is able to obtain low cut sizes, but at the cost of partition balance.
1 Introduction
Networks are quickly becoming one of the most well-studied data structures. Many interesting phenomena can intuitively be seen as networks. Some examples include the World Wide Web, social networks, neural networks, traffic networks and gene regulatory networks. Unfortunately, many of the most interesting data sets are extremely large; several billion nodes are no exception. Even with the advances in modern hardware, this severely hampers the analysis of such huge networks. One of the consequences is that research on these networks must employ efficient algorithms, since even algorithms with polynomial complexity can be prohibitively expensive. We consider the network as a graph consisting of a set of vertices and edges. In this paper we study one particular well-known problem, which is finding a minimum cut, i.e. the division of the set of vertices into two disjoint subsets with the smallest possible number of edges between the subsets. This problem has many applications, for example in computer vision and network design, and also forms the basis of several network clustering techniques that operate by repeatedly bisecting a graph [1]. In particular in the latter example, the minimum cuts are subject to an additional requirement that captures the relative size of the subsets. In practice, it is often desirable that these are balanced, i.e. that they are of roughly the same size. This is not reflected in the minimum cut concept itself. Several measures have been suggested that do take this balance into account. We will consider conductance [2]. Minimizing the conductance of a network has been shown to be an NP-complete problem [3]. Instead of calculating the minimum conductance exactly, we will apply a heuristic approach to finding an optimal graph cut based on the Ant Colony

P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 60–71, 2011.
© Springer-Verlag Berlin Heidelberg 2011
Optimization [4] meta-heuristic. We will show that this method is able to find graph cuts on several real-world data sets. The problem of finding the minimum cut of a graph is well studied. Originally the problem was solved by reducing it to an easier problem, namely finding a minimum s–t cut, i.e. a cut such that the vertices s and t become disconnected. Repeated application of a minimum s–t cut algorithm to all pairs of vertices (s, t) ∈ V × V yields the minimum cut. Several approaches have been suggested that can reduce the running time of such algorithms. For example, Hao and Orlin [5] devised an algorithm with a running time of O(m(n − λ) log(1/)) on a graph with n vertices, m edges and node connectivity¹ λ. The problem of finding a minimum cut has been shown to be equivalent to the problem of finding the maximum flow of a network. Consequently, algorithms that solve the latter can also be used to solve the former. An extensive overview of available algorithms is given in [6]. Most of these algorithms have running time complexities of O(nm²), O(n³) or O(mn²), which makes them all computationally quite expensive on large graphs. When the problem is not just finding a minimum cut, but finding a balanced minimum cut – as captured by conductance – the task becomes harder. Essentially, all possible cuts should be considered, but as this number increases exponentially with n, this leads to an intractable problem [3]. In order to be able to use conductance, poly-logarithmic approximation algorithms are used, based on spectral analysis of the adjacency matrix that corresponds to the graph [7], or heuristics [8]. Although such algorithms have a poly-logarithmic running time, calculating the spectrum of a matrix can still be computationally expensive. This leaves room for improvement, which we attempt through the Ant Colony Optimization (ACO) meta-heuristic.
A number of interesting methods for graph partitioning problems based on ACO have been introduced, e.g., [9,10,11]. In this paper we adopt an approach based on competing ant colonies [10]: two colonies compete for resources and reconstruct a global environment corresponding to a good graph partition. Our algorithm differs from previous methods based on competing ACO mainly in the type of heuristic it uses, which is used in social network analysis.
2 Theoretical Background
2.1 Graph Cuts
Let G = (V, E), E ⊆ V × V, be a graph with n = |V| vertices and m = |E| edges. In this paper we consider only undirected, unweighted graphs, but our methodology can easily be extended to graphs with edge directions and weights. A cut is defined as a partition of the (vertices of the) graph into two disjoint subsets, S and V \ S = S̄. The associated cut-set is the set of edges with one end point in one element of the partition and the other end point in the other element. The size of the cut is the number of edges in the cut-set, defined as

¹ The smallest number of nodes that must be removed in order to leave the graph disconnected. This can be calculated a priori in O(mn) time.
M. Hinne and E. Marchiori
cG(S) ≡ |{{u, v} ∈ E | u ∈ S, v ∈ S̄}| .   (1)
We omit the subscript G when no confusion is likely to occur. The problem we consider is that of finding a balanced cut, i.e. a cut whose cut-set is smaller than that of any other possible cut on the graph, while at the same time balancing the sizes of the two sets of the partition. This is reflected in the conductance [7,12] measure, which is defined as:

φ(S) ≡ c(S) / min(vol(S), vol(S̄)) ,   (2)

where c(S) is the size of the cut-set, and vol(S) is the volume of S, i.e. the number of edges that have both end points within S:
vol(S) ≡ |{{u, v} ∈ E | u ∈ S, v ∈ S}| .   (3)
Note that a lower conductance indicates a better balance, given the same size of the cut-set. The conductance of the whole graph G is the minimum conductance over all possible cuts:

Φ(G) ≡ min_{S ⊆ V} φ(S) .   (4)
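As an illustration of Eqs. (1)–(3), the conductance of a given vertex subset can be computed in one pass over the edge list. This is a sketch with our own naming, not code from the paper:

```python
# Compute cut size (Eq. 1), volumes (Eq. 3) and conductance (Eq. 2)
# for a vertex subset S of an undirected, unweighted graph.
# (Assumes both sides of the cut contain at least one internal edge.)
def conductance(edges, S):
    cut = vol_in = vol_out = 0
    for u, v in edges:
        if (u in S) != (v in S):
            cut += 1            # edge crosses the cut: counted in c(S)
        elif u in S:
            vol_in += 1         # both end points in S: counted in vol(S)
        else:
            vol_out += 1        # both end points in the complement of S
    return cut / min(vol_in, vol_out)

# Two triangles joined by a single edge; cutting that edge gives phi = 1/3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(conductance(edges, {0, 1, 2}))
```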
Finding the minimum conductance of a graph has been shown to be NP-complete [3], which is why we resort to a probabilistic meta-heuristic.

2.2 Ant Colony Optimization
Ant Colony Optimization (ACO) is an approach to tackle combinatorial optimization problems (e.g. dividing the vertex set into an optimal partition according to some quality function) [4]. ACO considers solution components C and a pheromone model T. Several independent agents construct a solution s to the problem by combining elements from the solution components. These components ci ∈ C have pheromone values τi ∈ T assigned to them that are used to obtain the probability that the ant moves to the next component, e.g.

Pr(ci | s) ∝ τi / Σ_{cj ∈ N(s)} τj ,   (5)
with N(s) the options available to the ant. In Section 3 we go deeper into this probability distribution. The general way to solve a combinatorial optimization problem using ACO is to:
1. assemble a candidate solution from components based on the probabilities of the pheromone model,
2. update the pheromone model based on the quality of the candidate solution, and
3. repeat the first two steps until a satisfying solution is obtained [13].
In the next section we describe how we translate the cutting of graphs into the ACO paradigm.
Table 1. The outline of the ACO graph cutting algorithm

Algorithm 1: Cutting graphs through ACO.
  Cut GraphCut()
    C = LoadGraph();
    T = InitializePheromones();
    H = InitializeHeuristics();
    Cut s = (V, ∅);
    while no convergence do
      s′ = GenerateSolution(C, T, H);
      if φ(s′) < φ(s) then s = s′; fi;
      UpdatePheromones(T, H, s);
    od;
    return s;
3 The Algorithm
The general ACO graph cutting algorithm is provided in Table 1 [4]. Its implementation depends on the two subroutines GenerateSolution(C, T, H) and UpdatePheromones(T, H, s). In the case of a graph cut algorithm, a solution is obviously a partition of the graph. To enable the construction of candidate solutions, we deviate from traditional ACO based on the single ant colony metaphor. Instead, we use two competing ant colonies. The intuition behind this idea is that both colonies of ants will try to obtain a densely connected subset of the graph that is only sparsely connected to the subset belonging to the other ant colony. Together, the sets of vertices controlled by the ant colonies correspond to a graph cut. In this scenario, the solution components are the edges in the graph. Whenever an ant colony sends an ant to traverse an edge and marks the target vertex, that edge becomes part of one of the two components of the cut. Based on the outcome of several such ‘ant colony competitions’, the cut with the lowest conductance is selected as the candidate solution s (see Table 2). Afterwards, pheromones are deposited on edges within S and S̄, so that in subsequent rounds the ants will favor walking along these edges. Eventually the process ends when a fixed number of iterations is completed. The CutGraph(C, T, H) function, where the actual ant colony competition takes place, is described separately in Table 3. In the initialization function, the ant colonies are given random starting vertices. The ACO paradigm comes into play in the SelectEdge(C, T, H) function, where a colony must decide to which neighboring vertex it sends an ant next. Out of the possible edges N(s), the colony makes its selection in a probabilistic manner. The probability of an edge ei ∈ N(s) is given by
Table 2. The GenerateSolution(C, T, H) subroutine

Algorithm 2: Generating a solution.
  Cut GenerateSolution(C, T, H)
    S = ∅; i = 0;
    while i < k do
      s = CutGraph(C, T, H);
      S = S ∪ {s};
      i = i + 1;
    od;
    return argmin_{s ∈ S} φ(s);
Table 3. The CutGraph(C, T, H) subroutine

Algorithm 3: Cutting a graph.
  Cut CutGraph(C, T, H)
    A = InitializeColonies(Colony 1, Colony 2);
    while not all vertices flagged do
      foreach colony a ∈ A do
        e = SelectEdge(C, T, H);
        a.MoveAntAcross(e);
        a.FlagVertex(target vertex of e);
      od;
    od;
Pr(ei | s) ≡ τi^α η(ei)^β / Σ_{ej ∈ N(s)} τj^α η(ej)^β ,   (6)
with τi the pheromone associated with ei, and η(ei) representing optional prior knowledge, or heuristic information, that the colonies have about the attractiveness of ei (more on this later). The parameters α ≥ 0 and β ≥ 0 determine the relative weight of pheromone information and heuristic information, respectively. The final part of the ACO approach is the updating of the pheromone distribution, so that solution components from successful solution candidates are more likely to occur in future solution candidates. This is accomplished in UpdatePheromones(T, H, s). In this procedure, pheromones also slowly evaporate over time; this is done to prevent the colonies from getting stuck in local optima. For each τi ∈ T, τi is updated according to

τi = ρ τi + δi(s) f(s) ,   (7)
with ρ the parameter representing the evaporation rate (ρ = 0.1 in our experiments), δi(s) equal to 1 if ei ∈ s (i.e. its end points are either both in S or both in S̄) and 0 otherwise, and finally f : S → ℝ a function that maps a solution to a
score, using f(s) ≡ c/φ(s), where φ(s) is the conductance corresponding to this cut and c a scaling constant. For now, we simply choose c = 1.
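Eqs. (6) and (7) can be sketched as follows. This is an illustrative rendering with our own names (the authors' implementation may differ); `tau` and `eta` map edges to pheromone and heuristic values, and edge selection is plain roulette-wheel sampling:

```python
import random

def select_edge(candidates, tau, eta, alpha=1.0, beta=1.0, rng=random):
    """Eq. (6): pick an edge with probability proportional to
    tau[e]**alpha * eta[e]**beta (roulette-wheel selection)."""
    weights = [tau[e] ** alpha * eta[e] ** beta for e in candidates]
    r = rng.uniform(0.0, sum(weights))
    acc = 0.0
    for e, w in zip(candidates, weights):
        acc += w
        if acc >= r:
            return e
    return candidates[-1]

def update_pheromones(tau, edges_in_solution, phi, rho=0.1, c=1.0):
    """Eq. (7): evaporate every pheromone, then reward the edges whose end
    points lie on the same side of the candidate cut with f(s) = c / phi(s)."""
    for e in tau:
        tau[e] = rho * tau[e] + (c / phi if e in edges_in_solution else 0.0)
```

Note that, as Eq. (7) is stated, only a fraction ρ of each pheromone survives an update, so the reward term c/φ(s) quickly dominates on the edges of good cuts.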
3.1 Heuristic Information
Ants are not generally considered bright individuals, but in the ACO paradigm there is room for heuristic information. This represents the idea that ants (or their collectives) do not base their decisions on pheromone information only, but also on some local characteristic of the network connectivity. In Eq. (6) this is reflected in the term η(ei). In the case of finding a balanced graph cut, the heuristic information should reward edges that are likely to be intra-set edges. We borrow a measure from social network analysis that is used in community detection, namely the edge clustering coefficient. The coefficient C(e) of an edge [14], which counts the number of triangles that an edge is part of, is expressed as:

η(ei) ≡ C(ei) = (|Nb(s) ∩ Nb(t)| + 1) / min(d(s) − 1, d(t) − 1) ,   (8)

with s and t the end points of ei, Nb(s) the neighbors of a vertex s, and d(s) = |Nb(s)| the degree of a vertex. The +1 in the numerator prevents the heuristic value from becoming 0, which would make it impossible for the edge to be selected at all.

3.2 Stop Conditions
Although, by the nature of the ACO paradigm, the algorithm converges, it is out of the scope of this study to derive the exact time complexity in terms of the approximation of the true minimal conductance (e.g. [15]). Therefore we settle for one of two possible stop conditions for the outer loop of the algorithm (see Table 1). The first alternative is to simply execute a fixed number of iterations of GenerateSolution(C, T, H). For small graphs, this gives satisfactory results (see Section 4). For larger graphs, a threshold γ can be used to indicate the minimum improvement of Φ required between iterations. If ΔΦ drops below γ, the algorithm terminates.
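The second stop condition can be sketched as follows (illustrative only; γ and the conductance sequence are hypothetical values, not taken from the experiments):

```python
# Terminate once the improvement in conductance between consecutive
# iterations drops below the threshold gamma.
def run_until_converged(phis, gamma):
    it = iter(phis)
    best = next(it)
    for phi in it:
        if best - phi < gamma:      # improvement fell below gamma: stop
            return min(best, phi)
        best = phi
    return best

print(run_until_converged([1.0, 0.5, 0.49, 0.1], 0.05))  # stops at 0.49
```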
4 Validation
To test the performance of our algorithm, we conducted a series of experiments on different types of networks. In the first series, we executed the ant colony graph cutting algorithm on two real-world networks for which a ground truth is known. The first network we consider is the famous karate club network [16]. The network has become a staple example in the literature on clustering/cutting. It consists of the social network of 34 members of a US university karate club. The
Fig. 1. The karate club network. The shape of the nodes indicate the partitioning in the actual network. The cut obtained by our algorithm (for parameters, see text) is given by the bold line.
78 edges in the network represent friendships between the members. The network was followed by Zachary for a period of two years, during which the club split after a dispute. The split caused the club to break into two new clubs, one centered around the former instructor and one around one of the members. The separation into these groups is often taken as the ground truth division (i.e. cut) of this network [17]. The conductance of the actual cut on the karate club network is Φ = 0.30. We obtained a cut with our ACO algorithm as well, using the following settings: 25 iterations, 10 trials per iteration, α = 1 and β = 0 (i.e. heuristic information was not used). The result of this cut is shown in Figure 1. As can be seen, our cut corresponds very well to the actual division, with only node 10 being classified on the wrong side of the cut (this corresponds to an accuracy of 0.97). The minimum conductance we obtained is actually lower than that of the ground truth, Φ = 0.29. The second data set is a network of books about US politics sold by Amazon.com, published around the presidential elections of 2004. The 105 nodes in this network are the individual books; the 374 edges represent frequent co-purchasing by customers of Amazon. The data was collected by Krebs [18]. Later, the ground truth labels of the books were determined by Newman by looking at the book descriptions². The books were classified into ‘liberal’, ‘neutral’ and ‘conservative’. Since we wanted to test our algorithm on a binary data set, we removed the five books that were labeled ‘neutral’. With the remaining sets of liberal and conservative books, the cut has a conductance of Φ = 0.07. Using the same settings as for the karate club network, our algorithm identified a cut with conductance Φ = 0.04. The accuracy of this cut was 0.96, indicating that the obtained cut nearly coincides with the actual division.

4.1 Parameter Tuning
For many other networks, a ground truth is generally not available. To gain insight into the performance of our algorithm nonetheless, we executed it on the

² See http://www-personal.umich.edu/~mejn/netdata/
neural network of the roundworm C. Elegans [19], which consists of 297 vertices and 2148 edges. We considered the conductance of the optimal cut after an increasing number of iterations of the GenerateSolution(C, T, H) procedure. We used 10 solutions per iteration and up to 25 iterations. The first experiment considers the impact of using just pheromones (α = 1, β = 0), just heuristic information (α = 0, β = 1), both in equal proportions (α = 1, β = 1), and none (α = 0, β = 0) as a baseline against which to compare the performance. In the baseline, the edge selection process is essentially random. The experiment was repeated 20 times. Figure 2a shows the mean conductance as a function of the number of iterations. The chart shows that each parameter setting outperforms the baseline, though only by a small margin when only heuristic information is used. Using pheromones and heuristic information combined, or just pheromones alone, leads to significantly better cuts than the baseline settings. The second experiment considers different ratios between α and β. Again, we used 10 solutions per iteration and up to 25 iterations, and repeated each experiment 20 times. This time, however, we consider a strong favor for pheromones (α = 10, β = 1), a strong favor for heuristic information (α = 1, β = 10), strong emphasis on both (α = 10, β = 10) and again the baseline (α = 0, β = 0). It is important to note that for a pheromone τi, τi ∈ [1, ∞), while for heuristic information η, η ∈ [1, 2]. Consequently, the results for parameter setting α = 1, β = 1 behave differently from α = 10, β = 10. The results from this experiment are shown in Figure 2b. They show that too much emphasis on the pheromone information may lead to bad cuts, which we attribute to getting stuck in local optima. The best performance is obtained when heuristic information is weighed heavily, in combination with a little guidance by the pheromone distribution.
The experiments show that the two defining features of the ACO paradigm, pheromone-based and heuristics-based selection of solution components, indeed lead to better graph cuts. However, the heuristic information is far more crucial than the pheromones, although the latter are needed to improve performance in subsequent iterations.

4.2 Comparison to a Baseline Algorithm
In order to further analyze the performance of our algorithm, we compare it to the well-known graph partitioning algorithm proposed by Kernighan and Lin [8]. That algorithm tries to minimize the cut set while balancing the partition elements by repeatedly swapping vertices between the sets. It has a running time complexity of O(n^2 log n) [20]. We compared the cuts obtained by the Kernighan-Lin algorithm to those of our own approach, using 25 iterations of 10 solutions, with α = 1 and β = 10. The datasets we used were all taken from the online collection of Mark Newman (http://www-personal.umich.edu/~mejn/netdata/). The mean results of 20 runs of this experiment are shown in Table 4. As the conductance scores show, our algorithm obtains better cuts in 5 of 6 datasets, and
M. Hinne and E. Marchiori
[Figure 2 appears here: two panels, (a) and (b), plotting conductance φ (vertical axis) against the number of iterations (horizontal axis) for the α, β settings listed in the text.]
Fig. 2. The mean conductance φ of the optimal solution after I iterations (repeated 20 times), for different settings of α and β (see text). Error bars have been omitted for clarity.
obtains the same cut in one network. Only once is our algorithm outperformed by the baseline algorithm. The table also shows the sizes of the cut-sets, which serve as an alternative to conductance when both elements of the partition are required to be of equal size.

4.3 Complexity
At each step in the cutting of a graph, our algorithm considers at most m edges and selects the optimal edge to traverse according to Eq. (6). This calculation is done in O(1). The selection of edges is repeated until all n vertices in the network have been flagged. As this process is repeated for each iteration (of a total of I) and each trial (a total of S per iteration), the total running time complexity of the algorithm is O(ISnm). In general, we assume that I ≪ n and S ≪ n, so that the complexity may be considered O(nm). If we furthermore assume that the algorithm is run on sparse networks, i.e. m = O(n), our algorithm has a running time complexity comparable to that of the Kernighan-Lin algorithm.

Table 4. Conductance and cut size for the Kernighan-Lin algorithm (KL) and the mean conductance for the ant colony optimization algorithm (ACO), for several network datasets. n and m denote the number of vertices and edges in the networks, respectively. The numbers in parentheses are standard deviations.

Network                          n    m     Φ_KL  c_KL  Φ_ACO        c_ACO
Zachary karate club [16]         34   78    0.64  36    0.29 (0)     10 (0)
Bottlenose dolphins [21]         62   159   0.36  46    0.21 (0.04)  11.15 (3.54)
C. Elegans neural network [19]   297  2345  0.49  716   0.49 (0.02)  402 (20.37)
Football players [22]            115  613   0.37  190   0.25 (0.03)  67.75 (7.81)
Political books [18]             105  441   0.04  16    0.09 (0)     19 (0)
Les Miserables [23]              77   254   0.35  56    0.29 (0.01)  23 (12)
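For reference, the conductance values compared above follow the standard definition φ(S) = cut(S, S̄) / min(vol(S), vol(S̄)); the sketch below assumes this definition (the paper's formal definition appears in an earlier section, not reproduced in this chunk), and the function name and adjacency representation are our own.

```python
def conductance(adj, S):
    """Conductance of the cut (S, V \\ S) for an undirected graph given
    as an adjacency dict {vertex: set_of_neighbours}.  Standard definition:
    phi(S) = cut(S, S_bar) / min(vol(S), vol(S_bar)), where vol(X) is the
    sum of the degrees of the vertices in X."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_S
    if min(vol_S, vol_rest) == 0:
        return float("inf")  # degenerate cut: one side carries no edges
    return cut / min(vol_S, vol_rest)
```

A low φ thus means few edges cross the cut relative to the smaller side's total degree, which is why minimizing conductance implicitly favors balanced cuts.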
Cutting Graphs Using Competing Ant Colonies

4.4 Comparison to State-of-the-Art Cuts
As our final experiment, we compared the cuts of our algorithm to the best known partitions of five larger datasets. The networks and the best known cuts were taken from the University of Greenwich Graph Partitioning Archive (GPA, http://staffweb.cms.gre.ac.uk/~c.walshaw/partition/). Unfortunately, these cuts do not list the corresponding conductance values. Instead, they list the cut size and the size of the largest element of the partition. A further difference with our approach is that the cuts from the archive are all accompanied by a known balance score B, defined as

B = max(|S|, |S̄|) / (n/2),   (9)

with n the number of vertices in the graph. Although the optimization of conductance favors balanced partitions, it is possible that the ACO algorithm yields an imbalanced partition. Therefore, we compare the performance of our algorithm to the most lenient cuts known from the archive, those with balance up to B ≤ 1.05. Table 5 shows the results for the first five datasets in the archive. As the numbers indicate, the ACO algorithm is not (yet) up to par with state-of-the-art algorithms on larger graphs. Although the algorithm is sometimes able to detect a cut with a cut size lower than the best known, it does so with a very imbalanced partition.

Table 5. Mean cut size for 20 repetitions of the Ant Colony Optimization (ACO) algorithm, the best known cut from the Graph Partitioning Archive (GPA), and the best cut and corresponding balance obtained by ACO. The numbers within parentheses indicate one standard deviation.

Network  n     m      c_ACO            B_ACO        c_GPA  c_ACObest  B_ACObest
add20    2395  7462   611.24 (308.31)  1.26 (0.29)  550    141        1.74
data     2851  15093  78.12 (13.19)    1.57 (0.02)  181    61         1.54
3elt     4720  13722  131.50 (12.60)   1.03 (0.05)  87     113        1.02
uk       4824  6837   7.00 (1.49)      1.56 (0.04)  18     4          1.56
add32    4960  9462   7.10 (0.31)      1.12 (0.02)  10     7          1.12
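The balance score of Eq. (9) is straightforward to compute; the helper below is a hypothetical illustration using our own names.

```python
def balance(n_S, n_Sbar):
    """Balance score of a bipartition, per Eq. (9):
    B = max(|S|, |S_bar|) / (n / 2).
    B = 1.0 for a perfectly balanced cut; larger values mean more
    imbalance (e.g. the B_ACObest = 1.74 cut of add20 above)."""
    n = n_S + n_Sbar
    return max(n_S, n_Sbar) / (n / 2.0)
```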
5 Conclusion and Suggestions for Further Research
In this paper we have explored the use of the Ant Colony Optimization paradigm to find balanced minimal graph cuts. We developed an algorithm based on the general ACO outline that uses ant systems to build a solution to the problem from individual components (graph edges). Crucial to our approach, and different from most conventional ACO algorithms, is that we use two competing ant colonies. Both colonies try to claim as much of the graph as possible.
http://staffweb.cms.gre.ac.uk/~c.walshaw/partition/
The resulting front line corresponds to the graph cut of a single iteration of the algorithm. By using the traditional method of updating pheromones along solution components that are part of successful solutions, together with a heuristic based on edge clustering, our algorithm is able to obtain cuts with low conductance. Our experiments show that strong emphasis on the heuristic information speeds up the detection of the optimal cut. Furthermore, we have shown that cuts on small networks correspond very well to the true cut. This suggests that our algorithm can be used as a feasible approximation of an otherwise intractable problem. When comparing our algorithm to benchmark and state-of-the-art algorithms, we observed that the ACO algorithm is able to obtain cuts with lower conductance than the Kernighan-Lin algorithm. However, the ACO algorithm delivers rather unbalanced cuts on larger networks. This shows that although the approach certainly has potential, more work is needed to enforce more balanced solutions. In further research, we intend to experiment with the optimization of cut size alone. Also, more sophisticated techniques may be used to initialize the ant colonies on the networks, so that local optima can be avoided more easily. Lastly, ACO uses several heuristic constants, such as the evaporation rate of the pheromones, the relation between conductance and pheromone updates, and the factors α and β that tune the ratio between pheromone and heuristic information. For each of these parameters, optimal values should be identified. Acknowledgments. We wish to thank Mart Gerrits and Bob van der Linden for their contributions to the experimentation software and the anonymous reviewers for their constructive comments on Section 4.
References

1. Flake, G.W., Tarjan, R.E., Tsioutsiouliklis, K.: Graph clustering and minimum cut trees. Internet Mathematics 1, 385–408 (2004)
2. Chung, F.R.K.: Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, vol. 92. American Mathematical Society, Providence (February 1997)
3. Šíma, J., Schaeffer, S.E.: On the NP-completeness of some graph cluster measures. CoRR, abs/cs/0506100 (2005)
4. Dorigo, M., Maniezzo, V., Colorni, A.: The Ant System: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics–Part B 26, 29–41 (1996)
5. Hao, J., Orlin, J.B.: A faster algorithm for finding the minimum cut in a directed graph. J. Algorithms 17(3), 424–446 (1994)
6. Goldberg, A.V., Tarjan, R.E.: A new approach to the maximum flow problem. Journal of the ACM 35, 921–940 (1988)
7. Brandes, U., Gaertler, M., Wagner, D.: Experiments on graph clustering algorithms. In: Di Battista, G., Zwick, U. (eds.) ESA 2003. LNCS, vol. 2832, pp. 568–579. Springer, Heidelberg (2003)
8. Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal 49(1), 291–307 (1970)
9. Korošec, P., Šilc, J.: Multilevel optimization of graph bisection with pheromones. In: Filipič, B., Šilc, J. (eds.) Proceedings of the International Conference on Bioinspired Optimization Methods and their Applications (BIOMA 2004), Ljubljana, Slovenia, October 11–12, pp. 73–80. Jožef Stefan Institute (2004)
10. Langham, A.E., Grant, P.W.: Using competing ant colonies to solve k-way partitioning problems with foraging and raiding strategies. In: Floreano, D., Nicoud, J.D., Mondada, F. (eds.) ECAL 1999. LNCS, vol. 1674, pp. 621–625. Springer, Heidelberg (1999)
11. Leng, M., Yu, S.: An effective multi-level algorithm based on ant colony optimization for bisecting graph. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 138–149. Springer, Heidelberg (2007)
12. Cheng, D., Kannan, R., Vempala, S., Wang, G.: A divide-and-merge methodology for clustering. ACM Trans. Database Syst. 31(4), 1499–1525 (2006)
13. Blum, C.: Ant colony optimization: Introduction and recent trends. Physics of Life Reviews 2(4), 353–373 (2005)
14. Papadopoulos, S., Skusa, A., Vakali, A., Kompatsiaris, Y., Wagner, N.: Bridge bounding: A local approach for efficient community discovery in complex networks. Technical report, Informatics & Telematics Institute (CERTH) (2009)
15. Gutjahr, W.J.: First steps to the runtime complexity analysis of ant colony optimization. Comput. Oper. Res. 35(9), 2711–2727 (2008)
16. Zachary, W.W.: An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 452–473 (1977)
17. Newman, M.E.J.: Fast algorithm for detecting community structure in networks (September 2003)
18. Krebs, V.: Political polarization during the 2008 US presidential campaign (2008)
19. Watts, D.J., Strogatz, S.H.: Collective dynamics of 'small-world' networks. Nature 393(6684), 440–442 (1998)
20. Ravikumar, C.P.: Parallel Methods for VLSI Layout Design. Greenwood Publishing Group Inc., Westport (1995)
21. Lusseau, D., Schneider, K., Boisseau, O.J., Haase, P., Slooten, E., Dawson, S.M.: The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Can geographic isolation explain this unique trait? Behavioral Ecology and Sociobiology 54(4), 396–405 (2003)
22. Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. PNAS 99(12), 7821–7826 (2002)
23. Knuth, D.E.: The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, Reading (1993)
Effective Variable Fixing and Scoring Strategies for Binary Quadratic Programming

Yang Wang1, Zhipeng Lü1, Fred Glover2, and Jin-Kao Hao1

1 LERIA, Université d'Angers, 2 Boulevard Lavoisier, 49045 Angers Cedex 01, France
2 OptTek Systems, Inc., 2241 17th Street, Boulder, CO 80302, USA
{yangw,lu,hao}@info.univ-angers.fr,
[email protected]
Abstract. We investigate two variable fixing strategies and two variable scoring strategies within a tabu search algorithm, using the unconstrained binary quadratic programming (UBQP) problem as a case study. In particular, we provide insights as to why one particular variable fixing and scoring strategy leads to better computational results than another one. For this purpose, we perform two investigations, the first analyzing deviations from the best known solution and the second analyzing the correlations between the fitness distances of high-quality solutions. We find that one of our strategies obtains the best solutions in the literature for all of the test problems examined.
1 Introduction
The strategy of fixing variables within optimization algorithms (also sometimes called backbone guided search) often proves useful for enhancing the performance of methods for solving constraint satisfaction and optimization problems [10,11]. Such a strategy was proposed in early literature [1] as a means for exploiting critical variables identified as strongly determined and consistent, and has come to be one of the basic strategies associated with tabu search. Two of the most important features of this strategy are to decide how to score the variables (variable scoring) and which variables should be fixed (variable fixing). In this paper, we provide a case study of variable fixing and scoring strategies within a tabu search variable fixing and scoring (TS/VFS) algorithm designed to solve the Unconstrained Binary Quadratic Programming problem UBQP:

Maximize f(x) = xᵀQx, x binary,

where Q is an n-by-n matrix of constants and x is an n-vector of binary variables. The formulation UBQP is notable for its ability to represent a wide range of important problems, as noted in the survey of [6]. Motivated by this extensive range of applications, a number of advanced heuristic and metaheuristic algorithms have been devised for solving the UBQP problem [2,7,8,9]. However, to date there exist no studies of methods that employ variable fixing and scoring strategies within these algorithms. In this work, we investigate different variable fixing and scoring strategies within our TS/VFS algorithm and undertake to answer related questions such as: why does one particular variable fixing and

P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 72–83, 2011.
© Springer-Verlag Berlin Heidelberg 2011
scoring strategy lead to better computational results than another? Which aspects are more important in designing effective optimization algorithms using variable fixing and scoring? To this end, we present an experimental analysis of two variable fixing strategies and two variable scoring strategies within the TS/VFS algorithm. The analysis shows that the computational results strongly depend on the variable fixing strategies employed but are not very sensitive to the variable scoring methods. Moreover, the analysis sheds light on how different fixing and scoring strategies are related to the search behavior and the search space characteristics.
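For concreteness, the UBQP objective f(x) = xᵀQx can be evaluated as in the sketch below. This is an illustrative rendering with our own function name and matrix layout, not code from the paper.

```python
def ubqp_value(Q, x):
    """Objective f(x) = x^T Q x of the UBQP formulation above, with Q an
    n-by-n list of lists and x a 0/1 list.  Because x is binary, only the
    entries q_ij with x_i = x_j = 1 contribute to the sum."""
    n = len(x)
    return sum(Q[i][j] for i in range(n) for j in range(n)
               if x[i] and x[j])
```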
2 The TS Variable Fixing and Scoring Algorithm
Algorithm 1 describes the framework of our TS/VFS algorithm. It begins with a randomly constructed initial solution xs and repeatedly alternates between a tabu search procedure and a phase that either fixes or frees variables until a stop criterion is satisfied. The TS procedure is employed for maxIter iterations to improve the input solution and to obtain p best solutions cached in P as the reference solutions, which are used to score and fix variables.
Algorithm 1. Pseudo-code of the TS/VFS algorithm for UBQP

1.  Input: matrix Q
2.  Output: the best binary n-vector x* found so far
3.  x* = ∅; f(x*) = −∞; f_p = −∞; C = ∅
4.  repeat
5.    Construct an initial solution x^s (x^s_j = x^0_j, j ∈ C) (Section 3.2)
6.    x' ← TabuSearch(x^s, maxIter) (Section 2.2)
7.    Keep the p best solutions found during TabuSearch in population P, |P| = p
8.    if f(x') > f(x*) then
9.      x* = x'
10.   end if
11.   if f(x') > f_p then
12.     VarScore ← FixingScoringStrategy(P) (Sections 2.1 & 3.1)
13.     VarSorted ← FixingScoreSorting(VarScore)
14.     FixedVar ← FixingStrategy(VarSorted, FixedNum) (Section 3.2)
15.   else
16.     VarScore ← FreeingScoringStrategy(P) (Sections 2.1 & 3.1)
17.     VarSorted ← FreeingScoreSorting(VarScore)
18.     FixedVar ← FreeingStrategy(VarSorted, DroppedNum) (Section 3.2)
19.   end if
20.   f_p = f(x*)
21. until a stop criterion is satisfied
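The control flow of Algorithm 1 can be condensed into the following sketch. All helper signatures (tabu_search, fix_phase, free_phase) are hypothetical stand-ins for the procedures named in the pseudo-code; only the alternation logic — fix after an improving round, free otherwise — is taken from the algorithm.

```python
def ts_vfs(Q, tabu_search, fix_phase, free_phase, max_rounds=50):
    """Skeleton of the TS/VFS outer loop (hypothetical helper signatures):
    alternate tabu search with a fixing phase when the last round improved
    on f_p, and a freeing phase otherwise."""
    best_x, best_f = None, float("-inf")
    f_p = float("-inf")
    fixed = {}  # index -> fixed 0/1 value
    for _ in range(max_rounds):
        # tabu_search returns its best solution, its value, and the
        # population P of the p best solutions seen this round
        x, f, pop = tabu_search(Q, fixed)
        if f > best_f:
            best_x, best_f = x, f
        if f > f_p:
            fixed = fix_phase(pop, fixed)    # score, sort, fix FixedNum vars
        else:
            fixed = free_phase(pop, fixed)   # score, sort, free DroppedNum vars
        f_p = best_f
    return best_x, best_f
```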
If the objective value f(x') obtained by the current round of TS is better than the previous value f_p, a variable fixing phase is launched. Specifically, the fixing phase consists of three steps: FixingScoringStrategy, FixingScoreSorting
and FixingStrategy. FixingScoringStrategy is used to assign score values to variables, and FixingScoreSorting then sorts the variables according to these values. FixingStrategy determines a number FixedNum of variables that go into a set FixedVar of variables to be fixed, whose index set F is referenced in Algorithm 1. Consequently, the set of variables FixedVar is not allowed to change its composition during the next round of TS, although conditionally changing the value of a fixed variable is another interesting strategy worthy of further investigation. It is understood that the values of the variables x^s_j in the starting solution x^s are selected randomly except for j ∈ F. On the contrary, if the TS procedure fails to find an improved solution relative to f_p, the algorithm performs the freeing phase to release some of the fixed variables, permitting them to change their values during the next round of TS. Similar to the fixing phase, the freeing phase also consists of three steps. To describe these steps we make use of the following definitions.

2.1 Definitions
Definition 1. Relative to a given solution x' = {x'_1, x'_2, ..., x'_n} and a variable x_i, the (objective function) contribution of x_i in relation to x' is defined as:

VC_i(x') = (1 − 2x'_i)(q_ii + Σ_{j ∈ N\{i}} q_ij x'_j)   (1)

As noted in [2] and in a more general context in [4], VC_i(x') identifies the change in f(x) that results from changing the value of x'_i to 1 − x'_i; i.e.,

VC_i(x') = f(x'') − f(x')   (2)

where x''_j = x'_j for j ∈ N \ {i} and x''_i = 1 − x'_i. We observe that under a maximization objective, if x' is a locally optimal solution, as will typically be the case when we select x' to be a high quality solution, then VC_i(x') ≤ 0 for all i ∈ N, and the current assignment x_i = x'_i will be more strongly determined as VC_i(x') is "more negative".

Definition 2. Relative to a given population of solutions P = {x^1, ..., x^p} and their corresponding objective values FV = {f(x^1), ..., f(x^p)} indexed by I = {1, ..., p}, and relative to a chosen variable x_i, let P_i(0) = {k ∈ I : x^k_i = 0} and P_i(1) = {k ∈ I : x^k_i = 1}. The (objective function) contribution of x_i in relation to P is defined as follows.

Contribution for x_i = 0:

VC_i(P : 0) = Σ_{k ∈ P_i(0)} (β · VC_i(x^k) + (1 − β) · Ã(f(x^k)) · VC_i(x^k))   (3)

Contribution for x_i = 1:

VC_i(P : 1) = Σ_{k ∈ P_i(1)} (β · VC_i(x^k) + (1 − β) · Ã(f(x^k)) · VC_i(x^k))   (4)
where f_min and f_max are respectively the minimum and maximum objective values of the set FV, and Ã(·) represents the normalization function:

Ã(f(x^k)) = (f(x^k) − f_min)/(f_max − f_min + 1)   (5)

Notice that this scoring function not only considers the contributions of the variables but also the relative quality of each solution with respect to the other reference solutions in the population P. Relative to variable x_i and a population P, the score of x_i is then defined as:

Score(i) = max{VC_i(P : 0), VC_i(P : 1)}   (6)
Our TS procedure begins from a starting solution xs as indicated in Algorithm 1 and uses a neighborhood defined by the simple one-flip move. Once the method is launched, the variables in F ixedV ar are held fixed during the execution of the TS procedure. The method incorporates a tabu list as a “recency-based” memory structure to assure that solutions visited within a certain span of iterations, called the tabu tenure, will not be revisited [3]. In our implementation, we elected to set the tabu tenure by the assignment T abuT enure(i) = tt + rand(10), where tt is a given constant (n/100) and rand(10) takes a random value from 1 to 10. Interested readers are referred to [4,7] for more details. 2.3
Reference Solutions
Reference solutions are used for fixing or freeing variables. We conjecture that there exists a subset of variables, of non-negligible size, whose optimal values are often also assigned to these same variables in high quality solutions. Thus, our goal is to identify such a critical set of variables and infer their optimal values from the assignments they receive in high quality solutions. Our expectation is that this will reduce the search space sufficiently to enable optimal values for the remaining variables to be found more readily. On the basis of this conjecture, we maintain a set of reference solutions consisting of good solutions obtained by TS. Specifically, we take a given number p of the best solutions from the current round of TS (subject to requiring that these solutions differ in a minimal way), which then constitute a solution population P for the purpose of fixing or freeing variables. In our implementation, we empirically set p = 20. (A more refined analysis is possible by a strategy of creating clusters of the solutions in the reference set and of considering interactions and clusterings among subsets of variables, as suggested in [1].)

2.4 Variable Fixing Procedure
Given the reference solutions, our variable fixing procedure consists of three steps: FixingScoringStrategy, FixingScoreSorting and FixingStrategy. Variables
are scored using Eq. (6) and sorted according to their scores in non-decreasing order. We then decide how many variables with the smallest contribution scores should be fixed at their associated values x'_i at the current level, and fix them so that these variables are compelled to retain their indicated values upon launching the next round of TS. Let Fix(h) denote the number of new variables (na) that are assigned fixed values and added to the fixed variables at fixing phase h. We begin with a chosen value Fix1 for Fix(1), referring to the number of variables fixed at the first fixing phase, and then generate values for later fixing phases by making use of an "attenuation fraction" g, as follows. We select the value Fix1 = 0.25n and the fraction g = 0.4.

Fix(1) = Fix1
Fix(h) = Fix(h − 1) · g   for h > 1
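A small helper makes the geometric attenuation of this fixing schedule concrete, using the paper's values Fix1 = 0.25n and g = 0.4 (the function name and the rounding to integers are our own assumptions):

```python
def fix_counts(n, fix1_frac=0.25, g=0.4, phases=4):
    """Number of newly fixed variables per fixing phase:
    Fix(1) = fix1_frac * n, Fix(h) = Fix(h-1) * g for h > 1,
    truncated to integers for illustration."""
    counts = [int(fix1_frac * n)]
    for _ in range(phases - 1):
        counts.append(int(counts[-1] * g))
    return counts
```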
2.5 Variable Freeing Procedure
Experiments demonstrate that in most cases, the fixed variables match well with the putative optimal solution. Nevertheless, it is possible that some of these variables are wrongly fixed, resulting in a loss of effectiveness of the algorithm. In order to cope with this problem, it is imperative to free the improperly fixed variables so that the search procedure can be put on the right track. Like the fixing procedure, the freeing procedure also consists of three steps: FreeingScoringStrategy, FreeingScoreSorting and FreeingStrategy. Contrary to the fixing phase, the number of the variables freed from their assignments at each freeing phase is not adjusted, due to the fact that at each phase only a small number of variables are wrongly fixed and need to be freed. Specifically, we set the number nd of fixed variables to free to 60. Then, these selected fixed variables are free to receive new values when initiating the next round of TS.
3 Variable Scoring and Fixing Strategies

3.1 Variable Scoring Strategy
We introduce two variable scoring strategies: the first one only considers the contribution of the variables (Definition 1) in the reference solutions while the second one simultaneously considers this contribution and the quality of reference solutions. By Definition 2 in Section 2.1, the part of the equation multiplied by 1 − β is obviously equal to 0 if β is 1.0, which implies that the objective values of the reference solutions are neglected. This constitutes our first variable scoring strategy. We introduce the second variable scoring strategy by simultaneously considering the solution quality of the reference solutions, implemented by assigning a value to β from the interval [0, 1), selecting β = 0.4 in our experiment.
3.2 Variable Fixing Strategy
In order to describe the variable fixing and freeing strategies more precisely, the following notation is useful. Let F denote the index set for the fixed variables and U the index set for the free (unfixed) variables. Note that F and U partition the index set N = {1, ..., n}, i.e., F ∪ U = N, F ∩ U = ∅. Let na be the number of variables to be added when new variables are fixed, and nd the number of variables to be dropped when fixed variables are freed. In addition, let x^F_i, for i ∈ F, denote the current values assigned to the fixed variables, and let x^s denote the starting solution at each run of TS. Then, at each iteration our algorithm begins by setting x^s_i = x^F_i for i ∈ F and x^s_i = Rand[0, 1] for i ∈ U, and the tabu search procedure is launched to optimize this constructed initial solution. The two variable fixing strategies are described as follows:

Variable Fixing Strategy 1 (FIX1):
  Order the elements of i ∈ U such that score(i_1) ≤ ... ≤ score(i_|U|)
  Let F(+) = {i_1, ..., i_na}
  F := F ∪ F(+)  (|F| := |F| + na)
  U := U − F(+)  (|U| := |U| − na)
  x^F_i := x^0_i for i ∈ F(+)  (x^F_i is already determined for i ∈ F − F(+), the "previous" F, and x^0_i represents the value that x_i should be assigned according to Eq. (6), i.e., x^0_i = 0 if VC_i(P : 0) < VC_i(P : 1) and x^0_i = 1 otherwise.)

Variable Freeing Strategy 1 (FREE1):
  Order the elements of i ∈ F such that score(i_1) ≥ ... ≥ score(i_|F|)
  Let F(−) = {i_1, ..., i_nd}
  F := F − F(−)  (|F| := |F| − nd)
  U := U ∪ F(−)  (|U| := |U| + nd)

Variable Fixing Strategy 2 (FIX2):
  Set |F| := |F| + na
  Order the elements of i ∈ N such that score(i_1) ≤ ... ≤ score(i_n) (only the first |F| elements of this sorted order need to be determined)
  Let F = {i_1, ..., i_|F|}
  U := N − F  (|U| := |U| − na)
  x^F_i = x^0_i for i ∈ F

Variable Freeing Strategy 2 (FREE2):
  Set |F| := |F| − nd
  Order the elements of i ∈ N such that score(i_1) ≤ ... ≤ score(i_n) (only the first |F| elements of this sorted order need to be determined)
  Let F = {i_1, ..., i_|F|}
  U := N − F  (|U| := |U| + nd)
  x^F_i = x^0_i for i ∈ F
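The difference between FIX1 and FIX2 can be illustrated with a toy sketch (our own function names; scores are assumed precomputed via Eq. (6)): FIX1 extends the existing fixed set, while FIX2 rebuilds it from scratch, so FIX2 may drop a previously fixed variable whose score is no longer among the smallest.

```python
def fix1(scores, F, na):
    """FIX1: keep the already-fixed set F and add the na free variables
    with the smallest scores."""
    U = [i for i in range(len(scores)) if i not in F]
    new = sorted(U, key=lambda i: scores[i])[:na]
    return set(F) | set(new)

def fix2(scores, F, na):
    """FIX2: rebuild the fixed set from scratch, taking the |F| + na
    smallest-score variables over all of N."""
    k = len(F) + na
    return set(sorted(range(len(scores)), key=lambda i: scores[i])[:k])
```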
The strategy FIX1 differs from FIX2 in two ways. At each fixing phase, FIX2 fixes |F| variables, while FIX1 only fixes na new variables, since |F| − na variables are already fixed. In other words, once a variable is fixed by the strategy FIX1, its value cannot be changed unless a freeing phase frees this variable. Instead of inheriting the previously fixed variable assignments as in FIX1, FIX2 selects all |F| variables to be fixed anew at each fixing phase. In the freeing phase, the strategy FREE1 only needs to score the variables belonging to F and then select those with the highest scores to be freed, while FREE2 redetermines the variables to be freed each time.

3.3 Four Derived Algorithms
Our four key variants of the TS/VFS algorithm consist of the combination of the two variable fixing strategies and the two variable scoring strategies. Specifically, using β = 1.0 as our scoring strategy, we employ the variable fixing strategies FIX1 and FIX2 to get the first two algorithms, respectively. Likewise, the third and fourth algorithms are derived by combining the scoring strategy β = 0.4 with FIX1 and FIX2, respectively.
4 Experimental Results

4.1 Instances and Experimental Protocol
To evaluate the variable scoring and fixing strategies, we test the four variants of the TS/VFS algorithm on a set of 21 large and difficult random instances with 3000 to 7000 variables from the literature [9]. These instances are known to be much more challenging than those from ORLIB. Our algorithm is programmed in C and compiled using GNU GCC on a PC running Windows XP with a Pentium 2.66 GHz CPU and 512 MB RAM. Given the stochastic nature of the algorithm, each problem instance is independently solved 20 times. The stop condition for a single run is set to 5, 10, 30, 30 and 50 minutes for instances with 3000, 4000, 5000, 6000 and 7000 variables, respectively.

4.2 Computational Results
We present in Tables 1 and 2 the computational results with β equal to 1.0 and 0.4, respectively. Each table reports the results of both the FIX1 and FIX2 variable fixing strategies. Column 2 gives the density (dens) and Column 3 gives the best known objective values (f*) obtained by all previous methods applied to these problems, as reported in [4,7]. The remaining columns give the results of one of the two versions (FIX1 and FIX2) according to four criteria: (1) the best solution gap, g_best, to the previous best known objective values (i.e., g_best = f* − f_best, where f_best denotes the best objective value obtained by our algorithm), (2) the average solution gap, g_avr, to the previous best known objective values (i.e., g_avr = f* − f_avr, where f_avr represents the average objective value), (3) the success rate, suc, for reaching the best result f*, and (4) the CPU
Table 1. Results of TS/VFS algorithms with variable fixing strategies FIX1 and FIX2 (β = 1.0)

                                 FIX1                                FIX2
Instance  dens  f*         g_best  g_avr  suc  t_avr  t_best   g_best  g_avr  suc  t_avr  t_best   Sd
p3000.1   0.5   3931583    0       413    15   172    40       0       3193   5    54     63       Y
p3000.2   0.8   5193073    0       0      20   62     2        0       397    12   26     5        Y
p3000.3   0.8   5111533    0       71     18   115    6        0       1144   2    43     4        Y
p3000.4   1.0   5761822    0       114    16   93     5        0       3119   7    61     7        Y
p3000.5   1.0   5675625    0       372    8    86     5        0       1770   2    147    16       Y
p4000.1   0.5   6181830    0       0      20   65     14       0       319    19   74     16       N
p4000.2   0.8   7801355    0       1020   11   295    64       0       2379   5    81     59       Y
p4000.3   0.8   7741685    0       181    18   201    17       0       1529   9    58     20       Y
p4000.4   1.0   8711822    0       114    18   171    56       0       1609   9    209    39       Y
p4000.5   1.0   8908979    0       1376   9    231    58       0       2949   2    231    134      Y
p5000.1   0.5   8559680    0       670    1    999    999      368     2429   0    1800   1800     Y
p5000.2   0.8   10836019   0       1155   6    740    47       582     2528   0    1800   1800     Y
p5000.3   0.8   10489137   0       865    3    1037   279      354     4599   0    1800   1800     Y
p5000.4   1.0   12252318   0       1172   3    1405   1020     608     4126   0    1800   1800     Y
p5000.5   1.0   12731803   0       268    13   1003   192      0       2941   3    588    279      Y
p6000.1   0.5   11384976   0       914    6    451    68       0       4694   4    550    209      Y
p6000.2   0.8   14333855   0       1246   1    739    739      88      3332   0    1800   1800     Y
p6000.3   1.0   16132915   0       2077   2    1346   1267     2184    8407   0    1800   1800     Y
p7000.1   0.5   14478676   0       2315   1    2470   2470     744     4155   0    3000   3000     Y
p7000.2   0.8   18249948   716     2340   0    3000   3000     2604    6164   0    3000   3000     Y
p7000.3   1.0   20446407   0       2151   7    981    478      0       8150   5    1836   149      Y
Average                    34      897    9.3  746    516      359     3330   4.0  988    848      Y
time, consisting of the average time and the best time, t_avr and t_best (in seconds), for reaching the best result f*. The last column, Sd, indicates the superiority of FIX1 over FIX2 when a 95% confidence t-test is performed on the objective values. Furthermore, the last row, "Average", summarizes the algorithm's average performance. Table 1 shows the computational results of variable fixing strategies FIX1 and FIX2 where β = 1.0. One observes that for all the considered criteria, FIX1 outperforms FIX2 on almost all the instances. Specifically, FIX1 is able to reach the previous best known objective values for all instances except one (p7000.2), while FIX2 fails in 8 cases. Moreover, FIX1 has an average success rate of 9.3 over 20 runs, more than twice FIX2's 4.0. FIX1 is also superior to FIX2 with respect to the average gap to the best known objective values. In addition, FIX1 performs slightly better than FIX2 in terms of the CPU time needed to reach the best values. The t-test also demonstrates that FIX1 is significantly better than FIX2 except in one case (p4000.1). Table 2 gives the computational results of variable fixing strategies FIX1 and FIX2 when β is set to 0.4 instead of 1.0. From Table 2, we observe that FIX1 outperforms FIX2 in terms of all the considered criteria, including g_best, g_avr, suc, t_avr and t_best. One also notices that this is quite similar to the case of β = 1.0. Therefore, we can conclude that the variable fixing strategy FIX1 is generally superior to FIX2 under both of the variable scoring strategies considered in this paper. In other words, the two variable scoring strategies have a similar influence on the computational results. The ability of the tabu search method using FIX1 to obtain all of the best known solutions in the literature places this method on a par with the best methods such as [4,7], while its solution times are better than those obtained in [4].
80
Y. Wang et al.
Table 2. Results of TS/VFS algorithms with variable fixing strategies FIX1 and FIX2 (β = 0.4)

                               |        FIX1                     |        FIX2                     |
Instance  dens  f*             | gbest  gavr  suc  tavr   tbest  | gbest  gavr   suc  tavr   tbest | Sd
p3000.1   0.5   3931583        | 0      308   16   98     4      | 0      3315   5    75     2     | Y
p3000.2   0.8   5193073        | 0      0     20   59     9      | 0      488    13   50     3     | Y
p3000.3   0.8   5111533        | 0      166   17   108    2      | 0      1355   4    28     5     | Y
p3000.4   1.0   5761822        | 0      19    19   109    24     | 0      1684   10   74     2     | Y
p3000.5   1.0   5675625        | 0      275   11   147    14     | 0      1796   3    154    40    | Y
p4000.1   0.5   6181830        | 0      0     20   61     13     | 0      354    19   78     3     | N
p4000.2   0.8   7801355        | 0      783   11   369    44     | 0      2722   3    382    106   | Y
p4000.3   0.8   7741685        | 0      254   17   234    29     | 0      1474   8    75     29    | Y
p4000.4   1.0   8711822        | 0      75    19   250    13     | 0      2537   7    158    12    | Y
p4000.5   1.0   8908979        | 0      1769  8    361    275    | 0      3112   3    101    41    | Y
p5000.1   0.5   8559680        | 0      791   2    721    228    | 325    2798   0    1800   1800  | Y
p5000.2   0.8   10836019       | 0      860   4    540    37     | 0      2397   1    45     45    | Y
p5000.3   0.8   10489137       | 0      1698  5    702    292    | 354    4939   0    1800   1800  | Y
p5000.4   1.0   12252318       | 0      1123  2    103    76     | 444    3668   0    1800   1800  | Y
p5000.5   1.0   12731803       | 0      455   12   747    261    | 0      3250   3    145    114   | Y
p6000.1   0.5   11384976       | 0      1450  9    1014   432    | 0      5405   2    1178   768   | Y
p6000.2   0.8   14333855       | 0      1079  3    911    515    | 0      4923   1    192    192   | Y
p6000.3   1.0   16132915       | 0      2320  3    1000   642    | 0      6137   1    147    147   | Y
p7000.1   0.5   14478676       | 0      1784  2    1519   785    | 1546   4556   0    3000   3000  | Y
p7000.2   0.8   18249948       | 0      2743  1    2238   2238   | 1710   5986   0    3000   3000  | Y
p7000.3   1.0   20446407       | 0      3971  3    1457   870    | 0      11604  1    1113   1113  | Y
Average                        | 0      1044  9.7  607    324    | 209    3548   4.0  733    668   | Y

5 Discussion and Analysis
We now turn our attention to discussing and analyzing some key factors which may explain the performance difference of the algorithms when using different variable fixing and scoring strategies. For this purpose, we examine the variable fixing errors (the number of wrongly fixed variables) relative to the putative optimal solution and show a fitness landscape analysis of high-quality solutions.

Fig. 1. Comparison of variable fixing errors between the two fixing strategies (one panel per scoring strategy, β = 1.0 and β = 0.4; abscissa: fixing or freeing phases, 0-30; ordinate: variable fixing errors)
5.1 Variable Fixing Errors
As previously demonstrated, the variable fixing strategy FIX1 dominates FIX2 under both scoring strategies (β = 1.0 and β = 0.4). In order to ascertain why this is the case, we conduct an experiment to compare the total number of wrongly fixed variables during the search under the two variable fixing strategies. For this, we carry out our experiment on instance p5000.5 and repeat it 20 times. For each run, we count, after each fixing or freeing phase, the number of mismatched variables of the current (possibly partial) solution with respect to the best known solution¹. Figure 1, where each point represents the accumulated variable fixing errors over 20 runs, shows how the variable fixing strategies affect the variable fixing errors at each fixing or freeing phase under the two variable scoring strategies: the left panel is for β = 0.4 and the right for β = 1.0.

From Figure 1, one observes that the number of variable fixing errors induced by FIX1 and FIX2 (with both scoring strategies) increases rapidly at the beginning of the search and then decreases gradually as the search progresses. However, the number of variable fixing errors of FIX1 is much smaller than that of FIX2 throughout the search process. This observation, together with the results in Tables 1 and 2, demonstrates that the variable fixing strategy plays a vital role in our TS/VFS algorithm for both β = 1.0 and β = 0.4.

Effective Variable Fixing and Scoring Strategies    81

5.2 Fitness Distance Correlation Analysis
In this section, we show a search landscape analysis using the fitness distance correlation [5], which estimates how closely fitness is related to the distance to the nearest optimum in the search space. For this purpose, we collect a large number of high-quality solutions by performing 20 independent runs of our TS/VFS algorithm, each run being allowed 30 fixing and freeing phases, where each phase has 20 elite solutions recorded in the population P. Thus, 20 × 30 × 20 = 12,000 solutions are collected and plotted. Figures 2 and 3 show the Hamming distance from these solutions to the best known solution against the fitness difference Δf = f* − f(xk) of these high-quality solutions, for instances p5000.1 and p5000.5 respectively.

Fig. 2. Fitness distance correlation: instance p5000.1 (four panels: FIX1 and FIX2 under β = 0.4 and β = 1.0; abscissa: distance to the best known solution; ordinate: fitness difference, ×10^4)

Fig. 3. Fitness distance correlation: instance p5000.5 (same panel layout and axes as Fig. 2)

Figure 2 discloses that the majority of the high-quality solutions produced by variable fixing strategy FIX1 (two upper sub-figures) have a much wider distance range than the solutions produced by strategy FIX2 (two bottom sub-figures), which indicates that the search space of FIX1 is more dispersed than that of FIX2. Moreover, the high-quality solutions of FIX1 are much closer to the x-axis than those of FIX2, implying that FIX1 obtains better objective values than FIX2. In sum, this indicates the higher performance of the FIX1 strategy.

Figure 3 presents a trend quite similar to that of Figure 2 in terms of the solutions' distance range and the percentage of high-quality solutions when comparing the two variable fixing strategies FIX1 (two upper sub-figures) and FIX2 (two bottom sub-figures). However, a clear difference from Figure 2 is that the high-quality solutions are distributed over a wider range. In particular, the distribution of solutions is more continuous and does not produce the "isolated cluster effect" shown in Figure 2. This indicates that instance p5000.5 is much easier than p5000.1 to solve, as shown in Tables 1 and 2. Indeed, for instance p5000.5, the search space seems smoother, enabling the search to traverse easily from solutions that are far from optimal to the best known solution.

¹ The best known solutions are obtained by different algorithms and share exactly the same assignment; we therefore assume that this assignment is very likely the optimal solution.
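The fitness distance correlation of [5] underlying this analysis is the Pearson correlation between the fitness differences and the Hamming distances to the best known solution. A minimal sketch, with function names of our own choosing (the paper plots the raw scatter rather than reporting the coefficient):

```python
import math

def hamming(x, y):
    """Hamming distance between two equal-length 0/1 assignments."""
    return sum(a != b for a, b in zip(x, y))

def fdc(fitness_diffs, distances):
    """Fitness distance correlation (Jones & Forrest [5]): Pearson
    correlation between fitness differences and distances to the optimum."""
    n = len(fitness_diffs)
    mf = sum(fitness_diffs) / n
    md = sum(distances) / n
    cov = sum((f - mf) * (d - md) for f, d in zip(fitness_diffs, distances)) / n
    sf = math.sqrt(sum((f - mf) ** 2 for f in fitness_diffs) / n)
    sd = math.sqrt(sum((d - md) ** 2 for d in distances) / n)
    return cov / (sf * sd)
```

A coefficient near 1 indicates that fitness degrades smoothly with distance from the optimum, i.e. a landscape that is easy for local search.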
6 Conclusions
To build better algorithms that make use of variable fixing and scoring, it is important to understand and explain the performance variations produced by
different variable scoring and fixing strategies. We undertook to analyze the intrinsic characteristics of two variable fixing strategies and two variable scoring strategies for UBQP. To this end, we compared the Variable Fixing Errors produced in the course of obtaining a (near) optimal solution and identified the correlations between fitness distances of high quality solutions to characterize the search behavior of the variable fixing and scoring strategies. Our experimentation discloses that our TS method indeed performs differently according to the variable fixing strategy employed, but is much less sensitive to the variable scoring strategy. The finding that the fixing strategy FIX1 obtains the best solutions in the literature to the challenging test problems examined underscores the relevance of variable fixing strategies and the value of analyzing their impacts.
Acknowledgement. We are grateful to the referees for their comments and questions, which helped us to improve the paper. The work is partially supported by the "Pays de la Loire" Region (France) through the RaDaPop and LigeRO projects (2009-2013).
References

1. Glover, F.: Heuristics for Integer Programming Using Surrogate Constraints. Decision Sciences 8(1), 156–166 (1977)
2. Glover, F., Kochenberger, G.A., Alidaee, B.: Adaptive memory tabu search for binary quadratic programs. Management Science 44, 336–345 (1998)
3. Glover, F., Laguna, M.: Tabu Search. Kluwer Academic Publishers, Boston (1997)
4. Glover, F., Lü, Z., Hao, J.K.: Diversification-driven tabu search for unconstrained binary quadratic problems. 4OR 8(3), 239–253 (2010)
5. Jones, T., Forrest, S.: Fitness Distance Correlation as a Measure of Problem Difficulty for Genetic Algorithms. In: Proceedings of the 6th International Conference on Genetic Algorithms, pp. 184–192. Morgan Kaufmann, San Francisco (1995)
6. Kochenberger, G.A., Glover, F., Alidaee, B., Rego, C.: A unified modeling and solution framework for combinatorial optimization problems. OR Spectrum 26, 237–250 (2004)
7. Lü, Z., Glover, F., Hao, J.K.: A Hybrid Metaheuristic Approach to Solving the UBQP Problem. European Journal of Operational Research 207(3), 1254–1262 (2010)
8. Merz, P., Katayama, K.: Memetic algorithms for the unconstrained binary quadratic programming problem. BioSystems 78, 99–118 (2004)
9. Palubeckis, G.: Multistart tabu search strategies for the unconstrained binary quadratic optimization problem. Annals of Operations Research 131, 259–282 (2004)
10. Wilbaut, C., Salhi, S., Hanafi, S.: An iterative variable-based fixation heuristic for the 0-1 multidimensional knapsack problem. European Journal of Operational Research 199, 339–348 (2009)
11. Zhang, W.: Configuration landscape analysis and backbone guided local search: Satisfiability and maximum satisfiability. Artificial Intelligence 158, 1–26 (2004)
Evolutionary Multiobjective Route Planning in Dynamic Multi-hop Ridesharing

Wesam Herbawi and Michael Weber

Institute of Media Informatics, Ulm University, Germany
{wesam.herbawi,michael.weber}@uni-ulm.de
http://www.uni-ulm.de
Abstract. Ridesharing is considered one of the promising solutions for reducing fuel consumption and congestion in urban cities, and hence environmental pollution. In this work, we present an evolutionary multiobjective route planning algorithm for solving the route planning problem in dynamic multi-hop ridesharing. The experiments indicate that the evolutionary approach is able to provide a good-quality set of route plans and outperforms the generalized label correcting algorithm in terms of runtime.

Keywords: Multiobjective Optimization, Evolutionary Algorithms, Genetic Algorithms, Route Planning, Ridesharing.
1 Introduction
In recent decades, global warming has taken the global temperature to its highest level in the past millennium, and there is a growing consensus that increasing anthropogenic greenhouse gases contribute to global warming [14]. According to [25], in 2006, 51 percent of the liquids made from petroleum and biomass was used by transportation, and this share is expected to rise to 56 percent in 2030. In addition, as of 2006, transportation accounted for 29 percent of total U.S. greenhouse gas emissions [9]. According to [8], the average car occupancy in Europe ranges from 2 for leisure trips to 1.1 for commuters. The USA has no better occupancy rates [4], and this increases congestion: the estimated cost of lost hours and wasted fuel resulting from congestion in the USA was 78 billion dollars in 2007 [22]. In their work, Jacobson and King [13] have shown that adding 1 passenger to every 10 traveling cars would potentially save 7.54-7.74 billion gallons of fuel per year.

A straightforward solution is ridesharing, the shared use of a car by its driver and one or more passengers, called riders. These days, there are many ridesharing services, and the demand for these services has increased sharply in recent years. The growing ubiquity of internet-enabled mobile devices makes dynamic ridesharing practical [21]. By dynamic ridesharing, we refer to a system where
This work was partially supported by the German Academic Exchange Service (DAAD), Grant no. A/08/99166.
P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 84–95, 2011. c Springer-Verlag Berlin Heidelberg 2011
Evolutionary Multiobjective Route Planning in DMR
85
the process of matching the riders and drivers to form a rideshare is done on very short notice or even en route [2]. In the USA, the number of persons who use their mobile devices to access the internet doubled from 2008 to 2009, reaching 63 million people in January 2009, a third of them on a daily basis¹.

A dynamic ridesharing system in which the rider is matched with one or more drivers is called a Dynamic Multi-hop Ridesharing (DMR) system. DMR is more flexible than other forms of ridesharing and offers more choices for the rider [12]. For the rider, a set of drivers' offers that constitutes a trip is called a route plan. Route plans in DMR are subject to multiple objectives, such as the minimization of time and cost, which leads to multiobjective route planning. Usually a single route plan does not minimize all objectives at once, especially in the presence of conflicting objectives [26]. Multiobjective route planning is categorized as NP-complete [11]. The Generalized Label Correcting algorithm (GLC) [24] is a deterministic algorithm that can be used to find the optimal (known as Pareto-optimal) set of route plans, but its exponential worst-case complexity is prohibitive.

Multiobjective evolutionary algorithms (MOEAs) have received growing interest for solving multiobjective optimization problems, especially for their scalability compared with deterministic algorithms [1]. Many multiobjective evolutionary algorithms have been proposed, and NSGA-II [6] is considered one of the most popular [10]. In this work, we investigate the feasibility of applying MOEAs to solve the route planning problem in DMR systems. Specifically, we implement the NSGA-II algorithm to solve the multiobjective route planning problem and compare it with the deterministic GLC algorithm in terms of solution quality and runtime. The objectives to be minimized are the cost, the time and the number of drivers in the route.
The rest of this paper is organized as follows. We introduce some basic concepts in Section 2. In Section 3 we describe the modeling of drivers' offers for route planning, and in Section 4 we review some related work. The multiobjective evolutionary algorithm and our customized genetic operators and evaluation function are described in Section 5. Finally, we discuss the results in Section 6.
2 Preliminaries
In this section, we define some concepts used throughout this work. We suppose that the database has a set of drivers D = {d1, d2, ..., dn}, a set of stations S = {s1, s2, ..., sm} and a set of drivers' offers O = {o1, o2, ..., ok}. Each offer consists of a starting station s1, an ending station s2, a departure time t1, an arrival time t2, a driver d and a cost c such that ∀o ∈ O : o = {s1, s2, t1, t2, d, c : s1, s2 ∈ S ∧ d ∈ D ∧ t1 < t2 ∧ s1 ≠ s2}. A request q for sharing a ride contains a source station s1, a destination station s2 and a departure time t such that ∀q : q = {s1, s2, t : s1, s2 ∈ S ∧ s1 ≠ s2}.
¹ comscore: Mobile Internet Becoming A Daily Activity, http://www.comscore.com
86
W. Herbawi and M. Weber
Definition 1 (Route Plan in DMR). A route plan is a set p ∈ P of drivers' offers that is sufficient to serve a request q, where
P = {{o1, o2, ..., on} ⊆ O : s2(oi) = s1(oi+1) ∧ t2(oi) ≤ t1(oi+1) ∀i ∈ {1, ..., n−1} ∧ s1(o1) = s1(q) ∧ s2(on) = s2(q) ∧ t(q) ≥ t1(o1)}.
The cost function, also called evaluation function, C : P → [R+]^k assigns the k-dimensional cost vector (ci,1, ci,2, ..., ci,k) to route plan pi ∈ P.

Definition 2 (Pareto-dominance). For two route plans p1, p2 ∈ P, we say that p1 dominates p2, written p1 ≺ p2, iff ∀j ∈ {1, 2, ..., k} : c1,j ≤ c2,j and ∃j ∈ {1, 2, ..., k} : c1,j < c2,j. In other words, p1 ≺ p2 iff the cost vector of p1 is partially less than that of p2.

Definition 3 (Pareto-optimal). A route plan p is called Pareto-optimal if ∀x ∈ P : x ⊀ p.

Definition 4 (Pareto-optimal Set). The set of all Pareto-optimal route plans. Formally, the Pareto-optimal set of route plans = {p ∈ P : ∀x ∈ P : x ⊀ p}.

Definition 5 (Multiobjective Route Planning in DMR). The process of finding the Pareto-optimal set of route plans with respect to a request q.
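Definitions 2-4 translate directly into code. The following sketch over cost vectors to be minimized is illustrative; the plan names and cost vectors are hypothetical:

```python
def dominates(c1, c2):
    """True iff cost vector c1 Pareto-dominates c2 (Definition 2):
    no worse in every objective and strictly better in at least one."""
    return all(a <= b for a, b in zip(c1, c2)) and any(a < b for a, b in zip(c1, c2))

def pareto_optimal_set(costed_plans):
    """Keep only the Pareto-optimal (plan, cost vector) pairs (Definition 4)."""
    return [(p, c) for p, c in costed_plans
            if not any(dominates(c2, c) for _, c2 in costed_plans)]

# Hypothetical route plans with (cost, time, drivers) vectors
plans = [("p1", (10, 5, 2)), ("p2", (12, 4, 2)), ("p3", (11, 6, 3))]
front = pareto_optimal_set(plans)  # p3 is dominated by p1 and drops out
```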
3 Modeling Drivers' Offers for Route Planning in Multi-hop Dynamic Ridesharing
In this section we discuss how we model the drivers' offers to solve the route planning problem. The drivers' offers are represented as a simple time-expanded graph [23] G = (V, E). The vertices in the graph refer to departure and arrival events. For each offer o, two vertices u, v ∈ V are introduced: the first represents the departure event from station s1 at time t1 and the second represents the arrival event at station s2 at time t2. Each vertex stores its associated station s and event time t. An edge e = (u, v) ∈ E is inserted between each pair of departure and arrival vertices to represent the travel from s1 to s2, with weight vector w(e) = (time, cost, driver) = (Δ(t1(o), t2(o)), c(o), d(o)). Note that we have considered the driver as part of the edge weight, as it will be used to compute the number of drivers of the route plan. For each station s, the vertices that represent either departure or arrival events are sorted in ascending order of their timestamps t. An additional edge e = (vi, vi+1) is inserted between each two consecutive vertices belonging to the same station s to represent waiting at that station, with w(e) = (Δ(t(vi), t(vi+1)), 0, null).

Fig. 1. Simple time-expanded graph. White vertices represent arrival events and gray vertices represent departure events.

Figure 1 shows an example of the simple time-expanded graph model of drivers' offers, where s1 and s2 are two stations. At each station, the gray vertices represent departure events, connected to their corresponding arrival events (the white vertices) at the other station. The gray vertex at station s1 represents a departure event from s1 at time t6 to station s2 at time t10, with the cost vector associated with the edge that connects them. Given that the drivers' offers are modeled as a time-expanded graph, the multiobjective route planning problem reduces to the well-known multiobjective shortest path problem (MSPP) on this graph. A route plan is then equivalent to a path from the source vertex to the destination vertex.
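The construction of this section can be sketched in a few lines. This is an illustrative simplification, not the authors' implementation: offers are encoded as (s1, s2, t1, t2, driver, cost) tuples, and events are identified by (station, time) pairs rather than by distinct departure and arrival vertex objects.

```python
from collections import defaultdict

def build_time_expanded_graph(offers):
    """Build the simple time-expanded graph from offers given as
    (s1, s2, t1, t2, driver, cost) tuples. Returns an adjacency dict
    mapping a (station, time) vertex to a list of (successor, weight)
    pairs, where weight = (time, cost, driver)."""
    adj = defaultdict(list)
    events = defaultdict(set)  # station -> set of event vertices
    for s1, s2, t1, t2, d, c in offers:
        u, v = (s1, t1), (s2, t2)            # departure and arrival events
        adj[u].append((v, (t2 - t1, c, d)))  # travel edge
        events[s1].add(u)
        events[s2].add(v)
    for station, vs in events.items():
        ordered = sorted(vs, key=lambda e: e[1])  # ascending timestamps
        for a, b in zip(ordered, ordered[1:]):
            adj[a].append((b, (b[1] - a[1], 0, None)))  # waiting edge
    return adj

# Two hypothetical offers: A->B by driver d1, then B->C by driver d2
g = build_time_expanded_graph([("A", "B", 0, 10, "d1", 5),
                               ("B", "C", 12, 20, "d2", 4)])
```

The waiting edge inserted at station B (from time 10 to time 12, weight (2, 0, None)) is what lets a rider arriving with one driver wait for the next departing offer.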
4 Related Work
In this section, we give a brief overview of some efforts directed towards solving the MSPP. They fall into two broad categories: deterministic (globally optimal) approaches, which are generalizations of the traditional single-objective shortest path label-setting and label-correcting algorithms, and approximation schemes, mainly evolutionary algorithms.

We begin with some efforts directed towards the deterministic approach. Brumbaugh-Smith and Shier [3] have discussed several implementations of label correcting methods to solve the bicriteria MSPP. Mainly, they have tested different node selection methods on different network sizes. Their findings indicate that it is to some extent possible to solve the bicriteria MSPP in practice. In an independent work, Martins and Santos [15] have provided a detailed theoretical study of different types of labeling algorithms for the MSPP. They have proved that the MSPP is bounded if and only if absorbent cycles are not present in the network. Müller-Hannemann and Weihe [17] have identified a set of key characteristics that exist in many applications of MSPP and make the number of Pareto-optimal paths at each visited node limited by a small constant.

In an effort to test the suitability of different classes of algorithms for solving the MSPP on road networks of different sizes, Shad et al. [20] have analyzed three shortest path algorithms and a genetic algorithm on Iranian road networks. They have concluded that the genetic algorithm performs poorly on large networks. However, Mooney and Winstanley [16] have developed an evolutionary algorithm for the MSPP and showed its feasibility through a set of experiments on real road networks. They concluded that evolutionary algorithms are a competitive alternative to the deterministic algorithms, especially on large networks.
Pangilinan and Janssens [19] have explored the behavior of a multiobjective evolutionary algorithm when applied to the MSPP. The behavior of the algorithm is described in terms of the diversity and optimality of the solutions, in addition to its computational complexity. Their experiments on random networks indicated that the algorithm can find a diverse set of solutions for the MSPP in polynomial time. However, their test networks were relatively small, with a maximum of 200 nodes.

Given the two broad approaches for solving the MSPP and the sometimes conflicting results regarding the feasibility of MOEAs for this problem, in this work we investigate the feasibility of applying MOEAs to the route planning problem in DMR systems and compare the results with the deterministic GLC algorithm. We decided to utilize the NSGA-II algorithm as it is one of the more popular MOEAs, in addition to being frequently used as a reference for evaluating other algorithms [10].
5 Evolutionary Multiobjective Route Planning
In this section, we provide the algorithmic details of the evolutionary multiobjective route planning used in this work. To approximate the Pareto-optimal set, we use the Nondominated Sorting Genetic Algorithm (NSGA-II) and provide problem-specific crossover and mutation operators plus an evaluation function. NSGA-II is a fast and elitist multiobjective genetic algorithm characterized by its fast nondomination sorting and diversity preservation without the need for parameter sharing. The basic operations of the algorithm are shown in Algorithm 1; for more details see [6]. We utilize the jMetal framework [7] in the implementation of our solution. jMetal provides a rich set of classes of multiobjective metaheuristics, in addition to some experimentation tools and quality indicators.

5.1 Solution Representation and Population Initialization
A solution is represented as a list L of vertices v forming a path from the source vertex to the destination vertex. Given that different paths have different lengths, we follow a variable-length solution representation. For population initialization, we use Random Walks (RW) [5]. This method is characterized by generating more diverse paths compared with other approaches such as A*, depth-first and breadth-first search, and enables the MOEA to explore the search space well [16]. During population initialization and offspring creation, a station might be visited twice. We suppose that a path with a duplicate station will have higher values for the objective functions, and we simply treat this as a penalty.

5.2 Genetic Operators
In this section, we describe our crossover and mutation operators.
Algorithm 1. NSGA-II Basic Operations
begin
    P ← initial population
    N ← |P|
    while !Stop_Condition do
        Q ← Create_Offsprings(P)
        Evaluate_Elements(Q)
        R ← P ∪ Q
        F ← Nondomination_Sort(R)
        P ← ∅
        i ← 1
        while |P| + |F_i| ≤ N do
            Crowding_Distance_Assignment(F_i)
            P ← P ∪ F_i
            i ← i + 1
        Crowding_Distance_Sorting(F_i)
        P ← P ∪ F_i[1 : (N − |P|)]
We perform mutation by means of local search to create an alternative local path in the selected solution. We randomly select two vertices from L, and then RW is utilized to find an alternative path connecting them. Figure 2 illustrates the proposed mutation operator.
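The random-walk initialization and the mutation operator just described can be sketched as follows. This is an illustrative sketch with names of our own choosing; the paper's jMetal-based implementation will differ:

```python
import random

def random_walk(graph, src, dst, max_steps=100, rng=random):
    """Random walk from src to dst on the time-expanded graph
    (adjacency dict: vertex -> list of (successor, weight) pairs).
    Returns the vertex list of the walk, or None if dst is not reached."""
    path = [src]
    while path[-1] != dst and len(path) < max_steps:
        successors = [v for v, _ in graph.get(path[-1], [])]
        if not successors:
            return None
        path.append(rng.choice(successors))
    return path if path[-1] == dst else None

def mutate(graph, path, rng=random):
    """Path mutation: select two positions at random and splice in an
    alternative random-walk sub-path between the selected vertices."""
    if len(path) < 3:
        return path
    i, j = sorted(rng.sample(range(len(path)), 2))
    sub = random_walk(graph, path[i], path[j], rng=rng)
    return path[:i] + sub + path[j + 1:] if sub else path
```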
Fig. 2. Path Mutation: dark gray vertices represent the local source and destination vertices and the light gray vertices represent a different path connecting them
Fig. 3. Crossover: gray vertices represent crossover vertices
For crossover, we perform a single-point crossover between each two selected solutions. A vertex from one solution is randomly selected, and a search for a matching vertex in the second solution is made. If a matching vertex is found, the solutions are crossed after the matching vertices, as shown in Figure 3.
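A compact sketch of this single-point crossover, as a slight variant that enumerates all matching interior vertices first and then picks one at random (rather than searching sequentially):

```python
import random

def crossover(parent1, parent2, rng=random):
    """Single-point crossover on two vertex paths: find a vertex common
    to both (excluding the shared endpoints) and exchange the tails
    after it; return the parents unchanged if no match exists."""
    candidates = [v for v in parent1[1:-1] if v in parent2[1:-1]]
    if not candidates:
        return parent1, parent2
    v = rng.choice(candidates)
    i, j = parent1.index(v), parent2.index(v)
    return parent1[:i] + parent2[j:], parent2[:j] + parent1[i:]
```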
We noticed that the match rate is low because of the time-expanded nature of the graph, so if a match is not found, we keep searching for another match until all match possibilities are explored.

5.3 Evaluation Function
We evaluate a solution as follows. For the first two cost elements (time and cost) we simply take the vector sum of all costs associated with the edges between the vertices of L. For the third cost element, we count the number of drivers included in the solution, such that successive offers of the same driver count as one driver, as in Equation (1):

(time, cost, drivers) = ( ∑_{i=1}^{|L|−1} Time(L_i, L_{i+1}),  ∑_{i=1}^{|L|−1} Cost(L_i, L_{i+1}),  |H| )    (1)

where H is the list obtained by scanning i = 1, ..., |L|−1 and setting H ← H ++ {d(L_i, L_{i+1})} whenever H_{|H|} ≠ d(L_i, L_{i+1}), and leaving H unchanged otherwise; here d denotes the driver and ++ is the list concatenation operator.
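Equation (1) can be sketched directly. In this illustrative version, edge_weight stands in for a lookup of the edge weight vectors of Section 3, and, as an assumption of ours, waiting edges (driver null/None) are skipped in the driver count:

```python
def evaluate(path, edge_weight):
    """Cost vector (time, cost, drivers) of Equation (1).
    edge_weight(u, v) returns the (time, cost, driver) weight vector;
    consecutive edges with the same driver count as one driver, and
    waiting edges (driver None) do not contribute a driver."""
    total_time = 0
    total_cost = 0
    drivers = []  # the list H of Equation (1)
    for u, v in zip(path, path[1:]):
        t, c, d = edge_weight(u, v)
        total_time += t
        total_cost += c
        if d is not None and (not drivers or drivers[-1] != d):
            drivers.append(d)  # H ++ {d} only when the driver changes
    return (total_time, total_cost, len(drivers))

# Hypothetical 3-edge route served by two drivers
w = {("a", "b"): (5, 2, "d1"), ("b", "c"): (3, 0, "d1"), ("c", "d"): (4, 1, "d2")}
vec = evaluate(["a", "b", "c", "d"], lambda u, v: w[(u, v)])
```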
6 Experiments
To test the feasibility of NSGA-II for solving the multiobjective route planning problem in a DMR system, a set of experiments has been carried out, and the results of NSGA-II are compared with those of the GLC algorithm in terms of quality and runtime. All experiments were done on a computer with 2 GB RAM and a 2.00 GHz CPU running Windows 7; JRE 1.6.0_21 was the runtime environment.

6.1 Experiment Description
First, we constructed a network of 41 nodes with a density of 0.038 to represent an example road network, with nodes number 0 and 40 at the opposite ends of the network and all other nodes in between. Edges are introduced between nodes such that alternative paths between nodes are possible. Then, different instances of the time-expanded graph are created by generating different numbers of random trips of length x, 1 ≤ x ≤ 13, on this network. A random trip is generated by randomly selecting a trip start node, trip length and trip starting time; the trip length represents how many nodes to visit. The trip time and cost between each two nodes are set randomly to represent different behaviors of different drivers, and each trip is associated with a unique driver. A trip of length x generates 2 × x vertices in the time-expanded graph, as described in Section 3. The numbers of generated trips were 100, 150, 200, 250, 300, 350, 400, 450 and 500. For each number of trips listed above, the experiment is repeated 10 times. Each time, the GLC algorithm is used to find the Pareto-optimal set of solutions
followed by 40 independent runs of NSGA-II to find an approximation of the Pareto-optimal solution set. The results of the 40 NSGA-II runs are averaged to be the result of the i-th run of the 10 runs for the j-th number of trips. We have followed this approach, i.e. repeating the experiment for the same number of trips, to simulate the randomness of such a system, where different instances with the same number of trips will behave differently under both GLC and NSGA-II. To compare the runtime, another experiment is done: for each number of trips listed above, the experiment is repeated 40 times, and the result for each number of trips is the average of the 40 running times for each algorithm.

The NSGA-II settings were as follows: crossover and mutation probabilities were 1.0 and 0.4 per solution (chromosome), respectively. The population size was 250 and the maximum number of generations 100, with a total of 25,000 function evaluations. We noticed that the proposed algorithm is not very sensitive to the values of the genetic operators; however, a value below 0.4 for the mutation operator results in a drop in solution quality.

6.2 Quality Indicators
In the following, we describe the metrics used to assess the quality of the approximate Pareto-optimal set Sa generated by NSGA-II w.r.t. the true Pareto-optimal set So generated by GLC. Distances and volumes are measured in the objective space after normalization.

1. Hypervolume: this quality indicator calculates the volume covered by the members of Sa with regard to a reference point consisting of the highest values of the objective functions of the elements in So; for more information see [18]. In our experiments, we measure the ratio (HVr) of the hypervolume of Sa to the hypervolume of So.

2. Generational Distance (GD): this quality indicator measures how far the elements in Sa are from those in So, as in Equation (2):

GD = √( ∑_{i=1}^{n} d_i² ) / n    (2)

where n is the number of elements in Sa and d_i is the Euclidean distance between each element of Sa and the nearest element in So. The best value of GD is 0; however, this does not necessarily mean Sa = So, but rather Sa ⊆ So. Therefore, we also calculate the inverted GD.

3. Inverted GD (IGD): measures how far the elements of So are from the elements of Sa, as in Equation (2).

4. Runtime: we compare the runtime of NSGA-II with that of the GLC algorithm. Although the worst-case runtime complexity of the GLC algorithm is exponential in the number of vertices, some research [17] indicates that this worst case is not realistic in many applications. Therefore, we decided to compare both algorithms using the runtime as a fourth metric.
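GD and IGD as defined here can be computed in a few lines. The sketch below omits the normalization of the objective space that the text applies beforehand, and the function names are ours:

```python
import math

def _nearest(point, front):
    """Euclidean distance from point to its nearest neighbor in front."""
    return min(math.dist(point, q) for q in front)

def generational_distance(approx, true_front):
    """GD of Equation (2): square root of the summed squared nearest
    distances, divided by the number of approximate solutions."""
    return math.sqrt(sum(_nearest(p, true_front) ** 2 for p in approx)) / len(approx)

def inverted_generational_distance(approx, true_front):
    """IGD: the same measure with the roles of the two sets swapped."""
    return generational_distance(true_front, approx)
```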
6.3 Experimentation Results and Discussion
Table 1 shows the mean and standard deviation of the results obtained for the first three quality indicators; boxplots for the same data sets are given in Figure 4. We notice that the ratio of Sa's hypervolume to So's hypervolume is more than 90% most of the time. This means that the elements of Sa dominate most of the solutions dominated by So, and Sa is a good approximation.

Table 1. Quality indicators for different numbers of trips (mean ± standard deviation)

Trips    HVr                    GD                     IGD
100      9.57e−01 ± 3.3e−02     1.67e−02 ± 2.2e−02     5.52e−02 ± 1.9e−02
150      9.43e−01 ± 6.6e−02     6.37e−03 ± 7.6e−03     2.79e−02 ± 8.1e−03
200      9.61e−01 ± 2.5e−02     6.93e−03 ± 5.4e−03     3.04e−02 ± 2.5e−02
250      9.12e−01 ± 5.3e−02     1.00e−02 ± 5.0e−03     3.04e−02 ± 8.6e−03
300      9.48e−01 ± 2.5e−02     1.07e−02 ± 9.4e−03     3.35e−02 ± 1.5e−02
350      9.22e−01 ± 3.0e−02     9.28e−03 ± 4.7e−03     2.04e−02 ± 4.7e−03
400      9.33e−01 ± 2.4e−02     9.69e−03 ± 3.9e−03     2.66e−02 ± 9.1e−03
450      9.17e−01 ± 3.0e−02     1.20e−02 ± 8.6e−03     2.89e−02 ± 1.1e−02
500      9.23e−01 ± 2.4e−02     8.13e−03 ± 2.7e−03     2.58e−02 ± 1.1e−02
In addition to the hypervolume ratio, the values of GD and IGD indicate that the elements of Sa are close to their counterparts in So, which means that NSGA-II produces a high-quality Sa. However, we notice that the IGD metric has higher values than GD, which means that some elements of So are not well approximated in Sa. After investigating the results, we noticed that the solutions with a high value of the first cost function (time) are not well approximated. This is due to the time-expanded nature of the graph and the fact that we use the random-walk approach for both population initialization and mutation, which gives solutions with smaller time a higher chance of being found.

In all experiments, the maximum number of generations was the only stopping condition and was constant for all numbers of trips. Therefore, we notice that as the search space (number of trips) increases, the trend is that the quality of Sa decreases, which requires more generations to converge better.

Figure 5 shows the result of the runtime comparison between NSGA-II and GLC. It is clear that NSGA-II outperforms GLC from a runtime point of view, especially for large problem instances. The GLC algorithm suffers from bad scalability in solving the route planning problem on simple time-expanded graphs, especially in the presence of a cost function, related to the number of drivers or number of car changes, to be optimized. This bad scalability is due to the fact that this cost function is not directly modeled in the time-expanded graph but is computed on the fly, which causes many labels to be dominance-incomparable at each vertex between the source and destination vertices. In other words, if labels b1 and b2 at vertex v have different drivers, then they are dominance-incomparable: at vertex v it might be that b1 ≺ b2, but this domination does not hold after v. From Figure 5 we also notice that GLC outperforms NSGA-II on smaller problem instances from a runtime point of view.
This is due to the fixed evolutionary startup (population initialization) overhead and to our fixed stopping condition. For small problem instances, a smaller population size and fewer maximum generations could help to improve the runtime.
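The dominance-incomparability of labels described above can be illustrated with a small sketch. The label layout and helper names below are hypothetical (assuming a two-objective cost vector); only the rule itself, that labels with different drivers must both be kept, comes from the text:

```python
def dominates(a, b):
    """Cost vector a Pareto-dominates b: no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def comparable(l1, l2):
    # Per the text: labels arriving at a vertex with different drivers are
    # dominance-incomparable, since the driver-related cost can evolve
    # differently after that vertex.
    return l1["driver"] == l2["driver"]

def prune(labels):
    """Keep a label unless a comparable label dominates it."""
    return [l for l in labels
            if not any(o is not l and comparable(l, o) and
                       dominates(o["cost"], l["cost"]) for o in labels)]

# Labels at an intermediate vertex v: cost = (time, car changes).
labels = [{"driver": "A", "cost": (30, 1)},
          {"driver": "A", "cost": (40, 2)},   # dominated by a same-driver label
          {"driver": "B", "cost": (50, 3)}]   # dominated, but different driver
kept = prune(labels)
```

The second label is pruned, but the third survives despite being dominated, which is exactly why many labels must be carried along at every vertex.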
Evolutionary Multiobjective Route Planning in DMR
Fig. 4. Boxplots for the quality indicators (ordinate) for different numbers of trips (abscissa, 100 to 500): (a) Hypervolume Ratio HVr, (b) Generational Distance GD, (c) Inverted Generational Distance IGD
Fig. 5. Runtime (ordinate, CPU time in seconds) comparison under different numbers of trips

Trips     100    150    200    250    300    350    400    450    500
GLC       0.09   0.169  1.293  2.633  8.026  13.36  29.89  56.39  98.25
NSGA-II   1.6    1.462  1.475  1.476  1.489  1.501  1.522  1.554  1.556
A somewhat surprising result is that NSGA-II has its worst runtime at 100 trips, contrary to expectations. We explain this as follows: to provide a good approximation, we take care to have a diverse initial population.
W. Herbawi and M. Weber
Therefore, at the initialization phase, we try to find undiscovered solutions and reject already discovered ones, for a constant number of attempts. For small problem instances, it was thus time-consuming to search for unique solutions.
7
Conclusions and Future Work
In this work we have modeled the route planning problem in dynamic multi-hop ridesharing systems and provided a customized version of the NSGA-II algorithm to solve the resulting multiobjective route planning problem. The experiments indicate that the evolutionary approach, represented by NSGA-II, is able to provide a good-quality approximation of the optimal solutions and outperforms the GLC algorithm in terms of runtime. In future work, we consider incorporating more criteria, such as the social closeness and shared interests between riders and drivers inferred from social networks. Incorporating more criteria will potentially increase the size of the Pareto-optimal solution set, which challenges the evolutionary approach to find a good approximation. We also consider, as future work, exploiting the spatial properties of the vertices as heuristics to direct the search. This might help in reducing the search space and in directing the search towards the destination. Acknowledgments. The authors would like to thank Antonio J. Nebro, Hashem Tamimi and Boto Bako for all the fruitful discussions, and the anonymous reviewers for their valuable comments.
Experiments in Parallel Constraint-Based Local Search

Yves Caniou1, Philippe Codognet2, Daniel Diaz3, and Salvador Abreu4

1 JFLI, CNRS / NII, Japan
2 JFLI, CNRS / UPMC / University of Tokyo, Japan
3 University of Paris 1-Sorbonne, France
4 Universidade de Évora and CENTRIA FCT/UNL, Portugal

[email protected], [email protected], [email protected], [email protected]
Abstract. We present a parallel implementation of a constraint-based local search algorithm and investigate its performance on hardware with several hundreds of processors. As the basic constraint solving algorithm for these experiments we choose the "adaptive search" method, an efficient sequential local search method for Constraint Satisfaction Problems. The implemented algorithm is a parallel version of adaptive search in a multiple independent-walk manner, that is, each process is an independent search engine and there is no communication between the simultaneous computations. A preliminary performance evaluation on a variety of classical CSP benchmarks shows that speedups are very good for a few tens of processors, and good up to a few hundreds of processors.
1
Introduction
Constraint Programming emerged in the late 1980s as a successful paradigm to tackle complex combinatorial problems in a declarative manner [21]. It is somehow at the crossroads of combinatorial optimization, constraint satisfaction problems (CSP), declarative programming languages and SAT problems (boolean constraint solvers and verification tools). Experiments to parallelize constraint solving started in the early days of the Constraint Programming paradigm, by exploiting the search parallelism of the host logic language [22]. Parallel implementation of search algorithms has indeed a long history, especially in the context of Logic Programming [13]. In the field of constraint satisfaction problems (CSP), early work was done in the context of Distributed Artificial Intelligence and multi-agent systems [38], but these methods, even if interesting from a theoretical point of view, did not lead to efficient algorithms. In the last decade, with desktop computers turning into parallel machines with 2, 4 or even 8 core CPUs, implementing efficient parallel constraint solvers has become an increasingly active research field. Most of the proposed implementations are based on so-called OR-parallelism, splitting the search space between different processors and relying on the Shared
P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 96–107, 2011.
© Springer-Verlag Berlin Heidelberg 2011
Memory Multiprocessor architecture, as the different processors work on shared data structures representing a global environment in which the subcomputations take place. Only very few implementations of efficient constraint solvers on such machines have been reported, for instance [34] for a shared-memory architecture with 8-core CPUs. The Comet system [23] has been parallelized for small clusters of PCs, both for its local search solver [28] and its propagation-based constraint solver [29]. Recent experiments have been done with up to 12 processors [30], and speedups tend to level off after 10 processors. For SAT solvers, several multi-core parallel implementations have also been developed [20,8,35], and similarly for Model Checkers, e.g., the SPIN software [5,24]. More recently [32], a SAT solver has been implemented on a larger PC cluster, using a hierarchical shared-memory model and trying to minimize communication between nodes. However, performance tends to level off after a few tens of processors, e.g., with a speedup of 16 for 31 processors, 21 for 37 processors and 25 for 61 processors. In this paper we address the issue of parallelizing constraint solvers for massively parallel architectures, involving several thousands of CPUs. A design principle implied by this goal is to abandon the classical model of shared data structures, developed for shared-memory architectures, or the tightly controlled master-slave communication of cluster-based architectures, and to consider either purely independent parallelism or very limited communication between parallel processes. Up to now, the only parallel method for solving optimization problems that has been deployed at large scale is the classical branch and bound, because it does not require much information to be communicated between parallel processes (basically: the current bound, see [17]).
Branch and bound has thus recently been a method of choice for experiments in solving optimization problems with Grid computing, because little data has to be exchanged between nodes [1]. Another implementation, described in [7], uses several hundreds of nodes of the Grid'5000 platform. Good speedups are achieved up to a few hundreds of processors but, interestingly, the authors' conclusion is that the execution time tends to stabilize afterwards. In [14], the authors proposed to parallelize a constraint solver based on local search using a simple multi-start approach requiring no communication between processes. Experiments done on an IBM BladeCenter with 16 Cell/BE cores show nearly ideal linear speedups for a variety of classical CSP benchmarks (magic squares, all-interval series, perfect square packing, etc.). We wanted to investigate whether this method could scale up to a larger number of processors, e.g., a few hundreds or a few thousands. We therefore developed a parallel OpenMPI-based implementation from the existing sequential Adaptive Search C-based implementation. This parallel version can run on any system based on OpenMPI, i.e., a supercomputer, a PC cluster or a Grid system. We performed experiments with classical CSP benchmarks from CSPLib on two systems:
– the HA8000 machine, a Hitachi supercomputer with a maximum of nearly 16000 cores, installed at the University of Tokyo,
Y. Caniou et al.
– the Grid'5000 infrastructure, the French national Grid for research, which contains 5934 cores deployed on 9 sites distributed across France.

The rest of this paper is organized as follows. Section 2 gives some context and background on parallel local search, while Section 3 presents the Adaptive Search algorithm, a constraint-based local search method based on the CSP formalism. Section 4 details the performance analysis on the parallel hardware. A short conclusion and perspectives end the paper.
2
Local Search and Parallelism
Local Search methods and metaheuristics [25,19] can be applied to solve CSPs, as Constraint Satisfaction can be seen as a branch of Combinatorial Optimization in which the objective function to minimize is the number of violated constraints: a solution is therefore obtained when the function has value zero. For nearly two decades Local Search methods have been used in SAT solvers for checking the satisfaction of boolean constraints. Since pioneering algorithms such as GSAT and WalkSAT in the mid 90s, there has been a trend to incorporate more and more local search and stochastic aspects in SAT solvers, in order to cope with ever larger problems [27]. Recently, algorithms such as the ASAT heuristics or Focused Metropolis Search, which incorporate even more stochastic aspects, seem to be among the most effective methods for solving random 3-SAT problems [3]. Parallel implementation of local search metaheuristics has been studied since the early 90s, when multiprocessor machines started to become widely available, see [37,33]. With the increasing availability of PC clusters in the early 2000s, this domain became active again [11,4]. Apart from domain-decomposition methods and population-based methods (such as genetic algorithms), [37] distinguishes between single-walk and multiple-walk methods for Local Search. Single-walk methods use parallelism inside a single search process, e.g., for parallelizing the exploration of the neighborhood (see for instance [36] for such a method making use of GPUs for the parallel phase). Multiple-walk methods (parallel execution of multi-start methods) develop concurrent explorations of the search space, either independently or cooperatively with some communication between concurrent processes.
Sophisticated cooperative strategies for multiple-walk methods can be devised by using solution pools [12], but this requires shared memory or an emulation of central memory in distributed clusters, thus impacting performance. A key point is that independent multiple-walk methods are the easiest to implement on parallel computers without shared memory and can in theory lead to linear speedup if solutions are uniformly distributed in the search space and if the method diversifies correctly [37]. Interestingly, [2] showed pragmatically that this is the case for the GRASP local search method on a few classical optimization problems such as quadratic assignment, graph planarization, MAX-SAT and maximum covering, but this experiment was done with a limited number of processors (28 at most).
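The linear-speedup argument for independent multiple walks can be checked with a small Monte-Carlo sketch. The assumption below, that the sequential solution time is exponentially distributed, is ours (it is one standard model under which the minimum of k independent runtimes has mean 1/k), not a claim from the papers cited:

```python
import random

def mean_wall_clock(k, samples=20000, rng=random.Random(42)):
    """Mean completion time of k independent walks (the first finisher wins),
    assuming sequential solution time is exponentially distributed with mean 1."""
    return sum(min(rng.expovariate(1.0) for _ in range(k))
               for _ in range(samples)) / samples

t1, t16 = mean_wall_clock(1), mean_wall_clock(16)
speedup = t1 / t16
# The minimum of k i.i.d. exponentials has mean 1/k, so the measured
# speedup of k independent walks comes out close to k (here, 16).
```

When solution times are not exponentially distributed, as for the structured instances discussed later in the paper, this ideal linearity no longer follows.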
3
The Adaptive Search Algorithm
Adaptive Search was proposed by [9,10] as a generic, domain-independent constraint-based local search method. This metaheuristic takes advantage of the structure of the problem in terms of constraints and variables and can guide the search more precisely than a single global cost function to optimize, such as, for instance, the number of violated constraints. The algorithm also uses a short-term adaptive memory in the spirit of Tabu Search in order to prevent stagnation in local minima and loops. This method is generic, can be applied to a large class of constraints (e.g., linear and non-linear arithmetic constraints, symbolic constraints, etc.) and naturally copes with over-constrained problems. The input of the method is a problem in CSP format, that is, a set of variables with their (finite) domains of possible values and a set of constraints over these variables. For each constraint, an "error function" needs to be defined; it gives, for each tuple of variable values, an indication of how much the constraint is violated. This idea was also proposed independently by [16], where it is called "penalty functions", and then reused by the Comet system [23], where it is called "violations". For example, the error function associated with an arithmetic constraint |X − Y| < c, for a given constant c ≥ 0, can be max(0, |X − Y| − c). Adaptive Search relies on iterative repair, based on variable and constraint error information, seeking to reduce the error on the worst variable so far. The basic idea is to compute the error function for each constraint, then combine for each variable the errors of all constraints in which it appears, thereby projecting constraint errors onto the relevant variables. This combination of errors is problem-dependent (see [9] for details and examples), but it is usually a simple sum or a sum of absolute values, although it might also be a weighted sum if constraints are given different priorities.
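The error function just described, and its projection onto variables by a simple sum, can be sketched in a few lines. The helper names (`abs_diff_lt`, `project`) are ours; the example constraint |X − Y| < c and the plain-sum combination come from the text:

```python
def abs_diff_lt(c):
    """Error function for the constraint |X - Y| < c: zero when satisfied,
    otherwise the amount by which the constraint is violated."""
    return lambda x, y: max(0, abs(x - y) - c)

def project(constraints, assignment):
    """Combine constraint errors on each variable (here, a simple sum),
    projecting constraint errors onto the variables they involve."""
    errors = {v: 0 for v in assignment}
    for vars_, err in constraints:
        e = err(*(assignment[v] for v in vars_))
        for v in vars_:
            errors[v] += e
    return errors

constraints = [(("X", "Y"), abs_diff_lt(2)),
               (("Y", "Z"), abs_diff_lt(1))]
errors = project(constraints, {"X": 1, "Y": 6, "Z": 6})
# "Y" accumulates the errors of both constraints it appears in.
```

With this assignment, |1 − 6| − 2 = 3 flows to both X and Y, while the satisfied Y–Z constraint contributes nothing, so Y and X are the worst variables.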
Finally, the variable with the highest error is designated as the "culprit" and its value is modified. In this second step, the well-known min-conflict heuristic [31] is used to select the value in the variable's domain which is the most promising, that is, the value for which the total error in the next configuration is minimal. In order to prevent being trapped in local minima, the Adaptive Search method also includes a short-term memory mechanism to store configurations to avoid (variables can be marked Tabu and "frozen" for a number of iterations). It also integrates reset transitions to escape stagnation around local minima. A reset consists in assigning fresh random values to some variables (also randomly chosen). A reset is guided by the number of variables marked Tabu. It is also possible to restart from scratch when the number of iterations becomes too large (this can be viewed as a reset of all variables, but one guided by the number of iterations). The core ideas of Adaptive Search can be summarized as follows:
– to consider for each constraint a heuristic function that is able to compute an approximated degree of satisfaction of the goals (the current "error" on the constraint);
– to aggregate the constraints on each variable and project the error onto the variables, thus trying to repair the "worst" variable with the most promising value;
– to keep a short-term memory of bad configurations to avoid looping (i.e., some sort of "tabu list") together with a reset mechanism.

Adaptive Search is a simple algorithm but it turns out to be quite efficient in practice. Table 1 compares its performance with the Comet 2.1.1 system on a few benchmarks from CSPLib [18] included in the distribution of Comet. Timings are in seconds, taken for both solvers on a PC with a Core2 Duo E7300 processor at 2.66 GHz, and are the average of 100 executions for AS and of 50 executions for Comet. Of course, it should be noticed that Comet is a complete and very versatile system while Adaptive Search is just a C-based library, but one can see that Adaptive Search is about two orders of magnitude faster than Comet. Also note that [26] compares a new metaheuristic named Dialectic Search with the older (2001) version of Adaptive Search [9], showing that both methods have similar results. However, when using the timings from [10], the newer (2003) version of Adaptive Search is about 15 to 40 times faster than Dialectic Search on the same reference machine.

Table 1. Execution times and speedups of Adaptive Search vs Comet

Benchmark            Comet   Adaptive Search   Speedup
Queens n=10000       24.5    0.52              47
Queens n=20000       96.2    2.16              44.5
Queens n=50000       599     13.88             43.2
Magic Square 30x30   56.5    0.34              166
Magic Square 40x40   199     0.53              375
Magic Square 50x50   609     1.18              516
We can thus state the overall Adaptive Search algorithm as follows:

Input: A problem given in CSP format:
- a set of variables V = {V1, V2, ..., Vn} with associated domains
- a set of constraints C = {C1, C2, ..., Ck} with associated error functions
- a combination function to project constraint errors onto variables
- a (positive) cost function to minimize
And some tuning parameters:
- T: Tabu tenure (number of iterations a variable is frozen)
- RL: reset limit (number of frozen variables that triggers a reset)
- RP: reset percentage (percentage of variables to reset)
- Max_I: maximal number of iterations before restart
- Max_R: maximal number of restarts
Output: A solution (a configuration where all constraints are satisfied) if the CSP is satisfiable, or a quasi-solution of minimal cost otherwise.
Algorithm:
Restart = 0
Repeat
    Restart = Restart + 1 ; Iteration = 0 ; Tabu_Nb = 0
    compute a random assignment A of variables in V
    Opt_Sol = A ; Opt_Cost = cost(A)
    Repeat
        Iteration = Iteration + 1
        compute errors of all constraints in C and combine errors on each variable
            (by considering only the constraints in which a variable appears)
        select the variable X (not marked Tabu) with highest error
        evaluate costs of possible moves from X
        if no improving move exists then
            mark X as Tabu until Iteration + T
            Tabu_Nb = Tabu_Nb + 1
            if Tabu_Nb ≥ RL then
                randomly reset RP variables in V (and unmark those which are Tabu)
        else
            select the best move and change the value of X accordingly
            to produce the next configuration A'
            if cost(A') < Opt_Cost then Opt_Sol = A = A' ; Opt_Cost = cost(A')
    until a solution is found or Iteration ≥ Max_I
until a solution is found or Restart ≥ Max_R
output(Opt_Sol, Opt_Cost)
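The loop above can be sketched as a small runnable program. This is a simplified illustration, not the authors' C implementation: errors are combined by a plain sum, a move only changes the culprit variable, and the toy CSP at the end is our own:

```python
import random

def adaptive_search(variables, domains, constraints, cost,
                    T=5, RL=2, RP=0.5, max_i=1000, max_r=10, seed=0):
    """Minimal sketch of the Adaptive Search loop; parameter names follow
    the pseudocode (T, RL, RP, Max_I, Max_R)."""
    rng = random.Random(seed)
    A = {}
    for _ in range(max_r):                       # restart loop
        A = {v: rng.choice(domains[v]) for v in variables}
        tabu, tabu_nb = {}, 0                    # variable -> frozen-until iter
        for it in range(max_i):
            if cost(A) == 0:
                return A                         # all constraints satisfied
            # project constraint errors onto variables (simple sum)
            errors = {v: 0 for v in variables}
            for vars_, err in constraints:
                e = err(*(A[v] for v in vars_))
                for v in vars_:
                    errors[v] += e
            free = [v for v in variables if tabu.get(v, -1) < it] or variables
            X = max(free, key=lambda v: errors[v])        # the "culprit"
            # min-conflict: value of X minimizing the next configuration's cost
            best = min(domains[X], key=lambda val: cost({**A, X: val}))
            if cost({**A, X: best}) >= cost(A):           # no improving move
                tabu[X] = it + T
                tabu_nb += 1
                if tabu_nb >= RL:                         # reset transition
                    for v in rng.sample(variables, int(RP * len(variables)) or 1):
                        A[v] = rng.choice(domains[v])
                        tabu.pop(v, None)
                    tabu_nb = 0
            else:
                A[X] = best
    return A                                     # quasi-solution (best effort)

# Toy CSP: X + Y = 10 and |X - Y| < 3, with domains 0..10.
cons = [(("X", "Y"), lambda x, y: abs(x + y - 10)),
        (("X", "Y"), lambda x, y: max(0, abs(x - y) - 2))]
total_cost = lambda A: sum(f(*(A[v] for v in vs)) for vs, f in cons)
sol = adaptive_search(["X", "Y"], {"X": range(11), "Y": range(11)},
                      cons, total_cost)
```

On this toy instance the repair loop converges in a handful of moves; the real implementation discussed next handles problems with tens of thousands of variables.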
4
Parallel Performance Analysis
We used the implementation of the Adaptive Search method consisting of a C-based framework library, available as freeware at http://contraintes.inria.fr/~diaz/adaptive/ . The parallelization of the Adaptive Search method was done with OpenMPI, an implementation of the MPI standard [15]. The idea of the parallelization is straightforward and based on multi-start and independent multiple walks: fork a sequential Adaptive Search process on every available core. However, unlike the classical fork-join paradigm, parallel Adaptive Search shall terminate as soon as a solution is found, instead of waiting until all processes have finished (since some searches initialized with "bad" initial configurations can take some time). Thus, a non-blocking test is performed every c iterations to check whether there is a message indicating that some other process has found a solution, in which case the process terminates its execution properly. Note, however, that several processes can find a solution "at the same time", i.e., during the same c-block of iterations. Thus, those processes send their statistics (among which the execution time) to process 0, which then determines which of them was actually the fastest.
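The termination protocol can be illustrated with threads standing in for MPI processes. Everything below is a sketch: the actual implementation uses OpenMPI messages, and the random "solution found" test is a stand-in for a full Adaptive Search walk:

```python
import random
import threading

def walk(pid, stop, results, c=100):
    """One independent search engine: every c iterations it performs a
    non-blocking check for termination, mirroring the test for a
    'solution found' message in the OpenMPI version."""
    rng, iters = random.Random(pid), 0
    while not stop.is_set():                  # non-blocking check, every c iters
        for _ in range(c):                    # one c-block of iterations
            iters += 1
            if rng.randrange(5000) == 0:      # stand-in for "solution found"
                results.append((pid, iters))  # report statistics
                stop.set()                    # tell the other walks to stop
                return

stop, results = threading.Event(), []
walks = [threading.Thread(target=walk, args=(pid, stop, results))
         for pid in range(8)]
for w in walks:
    w.start()
for w in walks:
    w.join()
# Several walks may finish within the same c-block; as in the paper, the
# fastest one is determined afterwards from the reported statistics.
winner = min(results, key=lambda r: r[1])
```

The parameter c trades off termination latency against the overhead of checking for messages, which is why it is performed only once per block.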
Three testbeds were used to perform our experiments:
– HA8000, the Hitachi HA8000 supercomputer of the University of Tokyo, with a total number of 15232 cores. This machine is composed of 952 nodes, each of which comprises 4 AMD Opteron 8356 processors (quad-core, 2.3 GHz) with 32 GB of memory. Nodes are interconnected with a Myrinet-10G network with a full bisection connection, attaining 5 GB/s in both directions. HA8000 can theoretically achieve a performance of 147 Tflops, but we only had access to a subset of its nodes, as users can only use a maximum of 64 nodes (1,024 cores) in normal service.
– Grid'5000 [6], the French national Grid for research, which contains 5934 cores deployed on 9 sites distributed across France. We used two subsets of the computing resources of the Sophia-Antipolis node: Suno, composed of 45 Dell PowerEdge R410 with 8 cores each, thus a total of 360 cores, and Helios, composed of 56 Sun Fire X4100 with 4 cores each, thus a total of 224 cores.

We use a series of classical benchmarks from CSPLib [18] consisting of:
– all-interval: the All Interval Series problem (prob007 in CSPLib),
– perfect-square: the Perfect Square placement problem (prob009 in CSPLib),
– magic-square: the Magic Square problem (prob019 in CSPLib).

Although these benchmarks are academic, they are abstractions of real-world problems and can involve very large combinatorial search spaces; e.g., the 400x400 magic square problem requires 160000 variables whose domains range over 160000 values, and the time to find a solution on a single processor by local search is nearly 2 hours on average. Classical propagation-based constraint solvers cannot solve this problem for instances larger than 10x10.
Also note that we are tackling constraint satisfaction problems as optimization problems, that is, we want to minimize the global error (representing the violation of constraints) to the value zero; therefore, finding a solution means that we actually reach the bound (zero) of the objective function to minimize.

Table 2. Speedups on HA8000, Suno and Helios

Platform  Problem    Time on 1 core   Speedup on k cores
                                      16     32     64     128    256
HA8000    MS 400     6282             10.6   20.6   31.7   41.3   54.1
          Perfect 5  42.7             15.0   29.5   44.6   49.1   57.0
          A-I 700    638              8.19   14.8   17.8   23.4   27.7
Suno      MS 400     5362             8.4    22.8   32.6   41.3   52.8
          Perfect 5  106              15.1   23     46.1   70.7   106
          A-I 700    662              10.1   15.8   19.9   23.9   28.3
Helios    MS 400     6565             13.2   20.6   31     44     -
          Perfect 5  139.7            15.8   24.5   46.6   77.2   -
          A-I 700    865.8            9.1    14.9   23.5   27.3   -
Fig. 1. Speedups on HA8000
Table 2 presents the execution times and speedups for executions on up to 256 cores on HA8000 and on the Grid'5000 platform. The same code was ported and executed on both; timings are given in seconds and are the average of 50 runs, except for MS 400 on HA8000, where they are the average of 20 runs. We can see that the speedups are more or less equivalent on both platforms. Only in the case of perfect-square are the results significantly different between the two platforms, for 128 and 256 cores; in those cases Grid'5000 has much better speedups than HA8000. This may be because the execution time becomes too small (less than one second) and some other mechanisms start to interfere. The stabilization point is not yet reached at 256 cores, even if the speedups do not increase as fast as the number of cores, i.e., they get further away from linear speedup. This is visually depicted in Fig. 1 and Fig. 2. As the speedups on the two Grid'5000 platforms (Helios and Suno nodes) are nearly identical, we only depict the speedups for Suno, as we could experiment with up to 256 cores on this platform.

4.1
Discussion
As the results show, the parallelization of the method brings good benefits on both the HA8000 and the Grid'5000 platforms, achieving speedups of about 30 with 64 cores, 40 with 128 cores and more than 50 with 256 cores. Of course, the speedups depend on the benchmarks: the bigger the benchmark, the better the speedup.
Fig. 2. Speedups on Grid'5000 (Suno)
To see the impact of the problem size on performance, let us detail a single benchmark, magic square, on three instances of increasing difficulty. Table 3 details the performances on HA8000 for the following instances: 100x100, 120x120 and 200x200. The three plots in Fig. 3 show a similar shape, but the bigger the benchmark, the better the parallel speedup; for these smaller instances the speedup curve starts to flatten after 64 processors. As these experiments show that every speedup curve tends to flatten at some point, they suggest that there may be an intrinsically sequential aspect in local search methods and that the improvement given by the multi-start aspect might reach some limit as the number of parallel processors increases. This might be explained theoretically by the fact that, as we use structured problem instances and not random instances, solutions may not be uniformly distributed in the search space.

Table 3. Performances for magic square on HA8000

# cores   MS 100           MS 120           MS 200
          time    speedup  time    speedup  time    speedup
1         18.2    1.0      53.4    1.0      338     1.0
8         2.16    8.41     5.84    9.14     42.3    8.0
16        1.69    10.8     3.99    13.4     22.4    15.1
32        1.43    12.7     3.03    17.7     14.8    22.9
64        1.20    15.1     2.26    23.6     12.2    27.8
128       1.16    15.5     2.24    23.9     12.1    28.0
Fig. 3. Speedups for 3 instances of magic square on HA8000
5
Conclusion and Future Work
We presented a parallel implementation of a constraint-based local search algorithm, the "Adaptive Search" method, in a multiple independent-walk manner. Each process is an independent search engine and there is no communication between the simultaneous computations except upon completion. A performance evaluation on a variety of classical CSP benchmarks and on two different parallel architectures (a supercomputer and a Grid platform) shows that the method achieves speedups of about 30 with 64 cores, 40 with 128 cores and more than 50 with 256 cores. Of course, speedups depend on the benchmarks: the bigger the benchmark, the better the speedup. In order to take full advantage of the execution power at hand (i.e., hundreds or thousands of processors), we have to seek new ways to further increase the benefit of parallelization. We are currently working on a more complex algorithm, with communication between the parallel processes, in order to reach better performance. The basic idea is as follows. Every c iterations, a process sends the value of its current best total configuration cost to the other processes. Every c iterations, each process also checks messages from other processes, and if it has received a message with a cost lower than its own, which means that it is further away from a solution, then it can decide to stop its current computation and make a random restart. This is done with a given probability p. Therefore, the two key parameters are c, the number of iterations between messages, and p, the probability of making a restart. We are currently experimenting with this algorithm, with various values of these parameters, for the benchmarks described in this paper.
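The proposed decision rule can be sketched in a few lines. The function and parameter names below are hypothetical; only the rule itself, restarting with probability p when another process reports a strictly lower cost, comes from the text:

```python
import random

def should_restart(my_cost, received_costs, p, rng):
    """If some other process reports a strictly lower cost (i.e., this
    process is further from a solution), restart with probability p."""
    behind = min(received_costs, default=my_cost) < my_cost
    return behind and rng.random() < p

rng = random.Random(1)
# Behind the best process (cost 12 vs. 5): restart about p of the time.
rate = sum(should_restart(12, [5, 9], 0.5, rng) for _ in range(10000)) / 10000
# Ahead of every other process: never restart.
leader = should_restart(3, [5, 9], 1.0, rng)
```

With p = 1 the scheme degenerates into killing every lagging walk, while p = 0 recovers the fully independent version evaluated in this paper; the interesting regime lies in between.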
References
1. Aida, K., Osumi, T.: A case study in running a parallel branch and bound application on the grid. In: SAINT 2005: Proceedings of the 2005 Symposium on Applications and the Internet, pp. 164–173. IEEE Computer Society, Washington, DC, USA (2005)
2. Aiex, R.M., Resende, M.G.C., Ribeiro, C.C.: Probability distribution of solution time in GRASP: An experimental investigation. Journal of Heuristics 8(3), 343–373 (2002)
3. Alava, M., Ardelius, J., Aurell, E., Kaski, P., Orponen, P., Krishnamurthy, S., Seitz, S.: Circumspect descent prevails in solving random constraint satisfaction problems. PNAS 105(40), 15253–15257 (2007)
4. Alba, E.: Special issue on new advances on parallel meta-heuristics for complex problems. Journal of Heuristics 10(3), 239–380 (2004)
5. Barnat, J., Brim, L., Ročkai, P.: Scalable multi-core LTL model-checking. In: Bošnački, D., Edelkamp, S. (eds.) SPIN 2007. LNCS, vol. 4595, pp. 187–203. Springer, Heidelberg (2007)
6. Bolze, R., et al.: Grid'5000: A large scale and highly reconfigurable experimental grid testbed. Int. J. High Perform. Comput. Appl. 20(4), 481–494 (2006)
7. Caromel, D., di Costanzo, A., Baduel, L., Matsuoka, S.: Grid'BnB: a parallel branch and bound framework for grids. In: Aluru, S., Parashar, M., Badrinath, R., Prasanna, V.K. (eds.) HiPC 2007. LNCS, vol. 4873, pp. 566–579. Springer, Heidelberg (2007)
8. Chu, G., Stuckey, P.: A parallelization of MiniSAT 2.0. In: Proceedings of SAT Race (2008)
9. Codognet, P., Diaz, D.: Yet another local search method for constraint solving. In: Steinhöfel, K. (ed.) SAGA 2001. LNCS, vol. 2264, pp. 73–90. Springer, Heidelberg (2001)
10. Codognet, P., Diaz, D.: An efficient library for solving CSP with local search. In: Ibaraki, T. (ed.) MIC 2003, 5th International Conference on Metaheuristics (2003)
11. Crainic, T., Toulouse, M.: Special issue on parallel meta-heuristics. Journal of Heuristics 8(3), 247–388 (2002)
12. Crainic, T.G., Gendreau, M., Hansen, P., Mladenovic, N.: Cooperative parallel variable neighborhood search for the p-median. Journal of Heuristics 10(3), 293–314 (2004)
13. de Kergommeaux, J.C., Codognet, P.: Parallel logic programming systems. ACM Computing Surveys 26(3), 295–336 (1994)
14. Diaz, D., Abreu, S., Codognet, P.: Parallel constraint-based local search on the Cell/BE multicore architecture. In: Essaaidi, M., Malgeri, M., Badica, C. (eds.) Intelligent Distributed Computing IV. Studies in Computational Intelligence, vol. 315, pp. 265–274. Springer, Heidelberg (2010)
15. Gabriel, E., et al.: Open MPI: Goals, concept, and design of a next generation MPI implementation. In: Kranzlmüller, D., Kacsuk, P., Dongarra, J. (eds.) EuroPVM/MPI 2004. LNCS, vol. 3241, pp. 97–104. Springer, Heidelberg (2004)
16. Galinier, P., Hao, J.-K.: A general approach for constraint solving by local search. In: 2nd Workshop CP-AI-OR 2000, Paderborn, Germany (2000)
17. Gendron, B., Crainic, T.: Parallel branch-and-bound algorithms: Survey and synthesis. Operations Research 42(6), 1042–1066 (1994)
18. Gent, I.P., Walsh, T.: CSPLib: A benchmark library for constraints. In: Jaffar, J. (ed.) CP 1999. LNCS, vol. 1713, pp. 480–481. Springer, Heidelberg (1999)
Experiments in Parallel Constraint-Based Local Search
107
Fitness-Probability Cloud and a Measure of Problem Hardness for Evolutionary Algorithms

Guanzhou Lu¹, Jinlong Li², and Xin Yao¹

¹ University of Birmingham, UK
{G.Lu,X.Yao}@cs.bham.ac.uk
² University of Science and Technology of China, China
[email protected]
Abstract. Evolvability is an important feature directly related to problem hardness for Evolutionary Algorithms (EAs). A general relationship holds between evolvability and problem hardness: the higher the degree of evolvability, the easier the problem is for EAs. This paper presents, for the first time, the concept of Fitness-Probability Cloud (f pc) to characterise evolvability from the point of view of escape probability and fitness correlation. Furthermore, a numerical measure called Accumulated Escape Probability (aep) based on f pc is proposed to quantify this feature, and therefore problem difficulty. To illustrate the effectiveness of our approach, we apply it to four test problems: OneMax, Trap, OneMix and Subset Sum. We then contrast the predictions made by the aep with the actual performance measured using the number of fitness evaluations. The results suggest that the new measure can reliably indicate problem hardness for EAs.
1 Introduction
Evolutionary Algorithms (EAs) are a class of randomised algorithms widely applied in various domains. Hence it would be very useful to classify problems according to their difficulty for EAs. The difficulty of problems for an EA has been described using concepts of ruggedness, neutrality [1] and information landscapes [2]. The notion of fitness landscapes, originally proposed in [3], underlies a large body of work in problem hardness studies for EAs. It is generally agreed that properties of fitness landscapes can indicate problem hardness, e.g. whether the landscape is rugged or smooth, the number of peaks and valleys, the distance between them, etc. Nevertheless, the huge size of the search space makes it generally infeasible to plot a fitness landscape. It is therefore desirable to have one or more algebraic measures that capture key characteristics of fitness landscapes. Along this line, a significant contribution was made by Jones [4] through his introduction of a measure called Fitness Distance Correlation (f dc), which has been tested empirically on a large number of GA and GP benchmarks, showing considerable effectiveness. However, it still has some drawbacks, the most severe being that the global optima have to be known beforehand, which apparently prevents one from applying f dc to real-world problems.

P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 108–117, 2011. © Springer-Verlag Berlin Heidelberg 2011

This limitation has been overcome by the
introduction of Fitness Cloud (f c) [5] and the Negative Slope Coefficient (nsc) [6] as a measure based on f c. Unfortunately, f c also has its own weakness, which will be identified and described in Section 2. Evolvability [7] is a concept directly linked with problem hardness for EAs; it is defined as the capacity of genetic operators to improve fitness quality for a given problem. In essence, both f dc and nsc are characterisations of evolvability, from the point of view of fitness-distance correlation and fitness-fitness correlation, respectively. This paper presents a new concept called Fitness-Probability Cloud (f pc). It is argued that f pc overcomes the limitations of the fitness cloud and serves as a more accurate characterisation of evolvability. Furthermore, a measure called Accumulated Escape Probability (aep) based on f pc is proposed to quantify this feature, and therefore problem hardness for EAs. The main contributions of this paper include:

– We identify a limitation of Fitness Cloud (f c): it is drastically influenced by the neighbourhood sample size K used to generate it. In practice, f c cannot be a reliable characterisation unless an appropriate K is selected.

– We develop a novel approach to characterising evolvability and problem hardness, from the point of view of escape probability and fitness correlation. Our approach overcomes limitations of previous work and serves as a reliable indicator of problem hardness for EAs.

The remainder of the paper is organised as follows. Section 2 briefly introduces the concept of evolvability and its characterisations. In Section 3, we propose the concept of Fitness-Probability Cloud and a measure of problem hardness based on it. The test problems and evaluation criteria are defined in Section 4. Section 5 presents the results from the experiments. Finally, we conclude the paper in Section 6.
2 Evolvability and Its Characterisations
Generally defined as the capacity of genetic operators to improve fitness quality, the concept of evolvability is closely linked with, albeit not exactly identical to, problem hardness for EAs. Since the concept of evolvability is defined in a very abstract way and lacks formality, it is useful to characterise or even quantify evolvability in some way. So far there have been two approaches, Fitness Distance Correlation (f dc) [4] and Fitness Cloud (f c) [5], attempting to characterise evolvability from the point of view of fitness-distance correlation and fitness-fitness correlation, respectively. Fitness Distance Correlation (f dc) is a first attempt at characterising evolvability as a measure of problem hardness for EAs, using the joint variation of distances and fitness values [4]. Given a set F = {f1, . . . , fn} of n fitness values and a corresponding set D = {d1, . . . , dn} of the n distances to the nearest global optimum, let SF and SD denote the standard deviations of F and D. The correlation coefficient r is defined as:
110
G. Lu, J. Li, and X. Yao
r = C_FD / (S_F · S_D),  where  C_FD = (1/n) Σ_{i=1}^{n} (f_i − f̄)(d_i − d̄).
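For concreteness, r can be computed directly for a small maximisation problem; the sketch below assumes OneMax with fitness u(s) and Hamming distance to the all-ones optimum (an illustrative choice, not part of the f dc definition), for which the correlation is exactly −1:

```python
import random

def fdc(fitnesses, distances):
    """Sample fitness-distance correlation r = C_FD / (S_F * S_D)."""
    n = len(fitnesses)
    fbar = sum(fitnesses) / n
    dbar = sum(distances) / n
    cfd = sum((f - fbar) * (d - dbar) for f, d in zip(fitnesses, distances)) / n
    sf = (sum((f - fbar) ** 2 for f in fitnesses) / n) ** 0.5
    sd = (sum((d - dbar) ** 2 for d in distances) / n) ** 0.5
    return cfd / (sf * sd)

random.seed(1)
bits = 20
sample = [[random.randint(0, 1) for _ in range(bits)] for _ in range(100)]
f = [sum(s) for s in sample]         # OneMax fitness = unitation
d = [bits - sum(s) for s in sample]  # Hamming distance to the all-ones optimum
r = fdc(f, d)                        # equals -1 up to floating-point rounding
```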
Ideally we will have r = 1 when minimising (the fitness decreases while approaching a global optimum), and r = −1 when maximising (the fitness increases while approaching a global optimum). The f dc definition involves the notion of a global optimum, which restricts its applicability to problems with known optima, an apparently unrealistic assumption in the real world. Fitness Cloud (f c) was recently introduced as another characterisation of evolvability, one that does not require known optima [5]. Basically, f c is a plot of the fitness values of individuals against the fitness values of their neighbours, where the neighbours are obtained by one application of a genetic operator. Nevertheless, the size of the search space does not allow consideration of all individuals; therefore, sampling is required. The method for generating f c is presented in [8]. Let Γ be a set of individuals sampled from the search space and let f_i = f(γ_i), where f is the fitness function. Then for each γ_i ∈ Γ, generate K neighbours by applying a genetic operator to γ_i, and let f'_i denote the maximum fitness value among the K neighbours of γ_i. Finally, the set of points {(f_1, f'_1), . . . , (f_n, f'_n)} is taken as the fitness cloud. The fitness cloud itself can help in determining the evolvability to some extent, but it is clearer to quantify this feature with a numerical value. In general, there can be many potential ways to compress the information in a fitness cloud into a single measure. The Negative Slope Coefficient (nsc) has been proposed as the current state of the art. Let us partition the fitness cloud C into a certain number of separate ordered "bins" C_1, . . . , C_m. Let f̄_{C_1}, . . . , f̄_{C_m} denote the average of the f_i in the corresponding bins, and f̄'_{C_1}, . . . , f̄'_{C_m} the average of the f'_i. nsc is defined as [6]:

nsc = Σ_{j=1}^{m−1} min(S_j, 0),  where  S_j = (f̄'_{C_{j+1}} − f̄'_{C_j}) / (f̄_{C_{j+1}} − f̄_{C_j}).
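A sketch of computing nsc from a sampled cloud; the equal-width binning over f and the two toy clouds below are illustrative assumptions rather than choices prescribed by [6]:

```python
def nsc(cloud, m=10):
    """cloud: list of (f, f_prime) points; partition by f into m equal-width bins,
    average f and f' per bin, then sum the negative inter-bin slopes."""
    fs = [f for f, _ in cloud]
    lo, hi = min(fs), max(fs)
    width = (hi - lo) / m or 1.0  # guard against a degenerate cloud
    bins = [[] for _ in range(m)]
    for f, fp in cloud:
        j = min(int((f - lo) / width), m - 1)
        bins[j].append((f, fp))
    # bin averages of f and f', in increasing order of f (empty bins skipped)
    means = [(sum(f for f, _ in b) / len(b), sum(fp for _, fp in b) / len(b))
             for b in bins if b]
    slopes = []
    for (f1, fp1), (f2, fp2) in zip(means, means[1:]):
        if f2 != f1:
            slopes.append((fp2 - fp1) / (f2 - f1))
    return sum(min(s, 0) for s in slopes)

val = nsc([(x, x + 1.0) for x in range(100)])  # every slope positive, so nsc is 0
```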
Although conceptually the fitness cloud seems to be an appropriate characterisation of evolvability, we have identified a flaw: the neighbourhood sample size K has a drastic influence on the fitness cloud generated, and there is no proper method for tuning this parameter. Naturally, when a global genetic operator is applied to generate neighbours, as the value of K increases, it becomes more likely to obtain a similar "best fitness" among the neighbours generated for individuals of different fitness values. As a result, the capacity of the obtained fitness cloud to characterise evolvability degenerates; it is no longer reliable, since entirely different fitness clouds can be obtained for exactly the same problem. To confirm this, using bitwise mutation with probability 1/n, we generated the fitness cloud for the OneMix function of problem size 20 using 13 different values of K varying from 100 to 1000000. As we can see from Figure 1, 13 different fitness clouds are generated, giving different indications of problem hardness for the same problem instance.
Fitness-Probability Cloud and a Measure of Problem Hardness
111
[Plot: average fitness value of neighbors against fitness value, one cloud per neighbourhood sample size K ∈ {100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000}.]
Fig. 1. Plot of Fitness Cloud for OneMix function (Problem Size = 20) with varying Neighbourhood Sample Size K
As for the measure nsc induced from the fitness cloud, Vanneschi et al. [9] pointed out that nsc is dramatically influenced by the minimum number of points contained in a bin; that is, nsc cannot be a reliable indicator of evolvability unless an appropriate value for this minimum is chosen. However, no formal method for choosing this parameter has been given.
3 Fitness-Probability Cloud
In order to accurately characterise evolvability in determining problem hardness for EAs, we propose the concept of Fitness-Probability Cloud (f pc). In general, f pc studies evolvability based on the correlation between the fitness values of individuals and their escape probabilities.

3.1 Escape Probability
One of the factors that may influence problem hardness for EAs is the number of steps required to escape a particular set of individuals, e.g. local optima. The notion of Escape Probability (Escape Rate) was introduced by Merz [10] to quantify this factor. In the theoretical runtime analysis of EAs, He and Yao [11] proposed an analytic way to estimate the mean first hitting time of an absorbing Markov chain, in which the transition probabilities between states are used. To make the study of Escape Probability applicable in practice, we adopt the idea of transition probabilities in a Markov chain. Let us partition the search space into L + 1 sets according to fitness values; F = {f0, f1, . . . , fL | f0 < f1 < · · · < fL} denotes all possible fitness values of the entire search space. Si denotes the average number of steps required to find an improving move starting from an individual of fitness value fi. The escape probability P(fi) is defined as follows:
112
G. Lu, J. Li, and X. Yao
P(fi) = 1 / Si.
The greater the escape probability for a particular fitness value fi, the easier it is to improve the fitness quality. From this perspective, the escape probability P(fi) is a good indication of the degree of evolvability of individuals of fitness value fi.

3.2 Fitness-Probability Cloud
We can extend the definition of escape probability to a set of fitness values. Pi denotes the average escape probability for individuals of fitness value equal to or above fi and is defined as:

Pi = ( Σ_{fj ∈ Ci} P(fj) ) / |Ci|,  where  Ci = {fj | j ≥ i}.

If we take into account all the Pi for a given problem, this gives a good indication of the degree of evolvability of the problem. For this reason, the Fitness-Probability Cloud (f pc) is defined as:

f pc = {(f0, P0), . . . , (fL, PL)}.

3.3 Accumulated Escape Probability
It is clear by definition that the Fitness-Probability Cloud (f pc) can demonstrate certain properties related to evolvability and problem hardness; however, mere observation is not sufficient to quantify these properties. Hence we define a numerical measure called Accumulated Escape Probability (aep) based on the concept of f pc:

aep = ( Σ_{fi ∈ F} Pi ) / |F|,  where  F = {f0, f1, . . . , fL | f0 < f1 < · · · < fL}.

In general, aep should classify problem hardness in the following way: the larger the value of aep, the higher the evolvability, and therefore the easier the problem should be for an EA.

3.4 Methodology for Generating f pc
Here we describe the methodology for generating the f pc for a given problem and operator. The size of the search space does not allow consideration of all individuals; therefore, sampling is required. Since not all points are equally important, it is preferable to sample the space according to a distribution that gives higher weight to individuals of higher fitness values. In fact, this can be achieved by using the Metropolis method or any equivalent method [12]. In our case, we chose the Metropolis-Hastings sampling method described in [8].
For each sampled point, the escape probability is estimated by computing the proportion of potentially better moves in the entire neighbour set generated by one application of the genetic operator. The larger the number of neighbours sampled, the more accurate the estimated escape probability. Hereinafter we refer to F as the set of fitness values of the sampled individuals obtained with the Metropolis-Hastings method; for each fi ∈ F, Pi is the estimated average escape probability computed from the sampled neighbourhood set.
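A simplified sketch of this estimation pipeline, assuming uniform sampling in place of Metropolis-Hastings, OneMax as the fitness function, and arbitrary sample sizes (all simplifications of the procedure described above):

```python
import random

def estimate_aep(fitness, n_bits, n_samples=200, n_neigh=100, seed=0):
    """Estimate aep: sample individuals, estimate escape probability per
    fitness value via bitwise-mutation neighbours, average over C_i, then
    average the resulting P_i values."""
    rng = random.Random(seed)
    esc = {}  # fitness value -> list of estimated escape probabilities
    for _ in range(n_samples):
        s = [rng.randint(0, 1) for _ in range(n_bits)]
        f = fitness(s)
        better = 0
        for _ in range(n_neigh):  # one bitwise-mutation application per neighbour
            t = [b ^ (rng.random() < 1 / n_bits) for b in s]
            if fitness(t) > f:
                better += 1
        esc.setdefault(f, []).append(better / n_neigh)
    fvals = sorted(esc)
    p = {f: sum(esc[f]) / len(esc[f]) for f in fvals}        # P(f_i)
    pbar = [sum(p[g] for g in fvals[i:]) / len(fvals[i:])    # P_i over C_i
            for i in range(len(fvals))]
    return sum(pbar) / len(pbar)                             # aep

onemax = sum                # unitation of a 0/1 list
aep = estimate_aep(onemax, 20)
```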
4 Test Problems

4.1 Invertible Functions of Unitation
Three invertible functions of unitation, i.e., OneMax, Trap and OneMix [13], are used.

Definition 1. Let s be a bit string of length l. The unitation u(s) of s is a function defined as: u(s) = u(s1 . . . sl) = Σ_{i=1}^{l} si.
In other words, unitation represents the number of ones in the bit string. OneMax functions are generalisations of the unitation u(s) of a bit string s: f(s) = d · u(s), where d is 1. The Trap function [14] is defined as follows:

f(s) = { (a/z) (z − u(s))        if u(s) ≤ z
       { (b/(l − z)) (u(s) − z)  otherwise
where a represents a local optimum, b a global optimum, and z a slope-change location. The OneMix function is a mixture of the OneMax and ZeroMax functions. It is a OneMax function where unitation values are higher than l/2. For lower unitation values, it is OneMax when u is odd; otherwise it is a scaled version of ZeroMax. OneMix is formally defined as:

f(s) = { (1 + a)(l/2 − u(s)) + l/2   if g(s)
       { u(s)                        otherwise

where a represents a constant above zero and g(s) is equal to 1 when u(s) is even and u(s) < l/2.

4.2 Subset Sum Problem
The Subset Sum problem [15] is a constrained optimisation problem. Given a set of n items, each with an associated weight wi, the problem is to select a subset
of the n items such that the weight sum is maximised and does not exceed the budget W. Mathematically, this problem is formulated as follows:

Maximise    Σ_{i=1}^{n} wi xi,

Subject to  Σ_{i=1}^{n} wi xi ≤ W,   xi ∈ {0, 1},   W = ( Σ_{i=1}^{n} wi ) / 2.

5 Experimental Results
Once a measure of problem hardness and the way to compute it have been chosen, the problem remains of finding a means to validate the prediction of the measure with respect to the problem instance and the algorithm. The easiest way is to use a performance measure [16]. Since the optimal solutions of practical problems are unknown, we use the number of fitness evaluations until the stopping criterion is satisfied as the performance measure.
[Four panels: (a) Problem Size = 20, (b) Problem Size = 40, (c) Problem Size = 80, (d) Problem Size = 200; each plots Average Escape Probability Pi against Fitness Value for trap, onemax, onemix and subset.]
Fig. 2. Plot of Fitness-Probability Cloud for Four Test Problems of Problem Sizes 20, 40, 80 and 200
We then evaluate the effectiveness of the Fitness-Probability Cloud (f pc) and the problem hardness measure Accumulated Escape Probability (aep) on four different test problems: OneMax, Trap, OneMix and Subset Sum. For each test problem, four problem instances are generated with problem sizes varying from 20 to 200. To experimentally confirm the predictions given by our measure aep, we use the mutation-based (μ + λ) EA (μ denotes the number of parents, λ the number of offspring) with the following characteristics: a mutation operator with flip probability 1/n for each bit (bitwise mutation), and the algorithm stops after 500 fitness evaluations without finding a better solution. For each problem instance, 100 independent executions were performed. We applied the approach described in Section 3 to generate f pc and calculate the value of the corresponding aep. For each problem instance, 1000 samples were obtained using the Metropolis-Hastings sampling method. For each sampled point, the bitwise mutation operator was used to generate 10000 neighbours in order to estimate the escape probability. Under the above parameter settings, we generate the Fitness-Probability Cloud for the four test problems of problem sizes 20, 40, 80 and 200. The results are illustrated in Figure 2(a) - Figure 2(d), respectively. With the Fitness-Probability Cloud generated, we can then apply the method defined in Section 3.3 to compute the Accumulated Escape Probability (aep). We then contrast its predictions with the performance from the actual runs defined above. For the sake of comparison, we also compute the values of nsc [6]. The experimental results are summarised in Table 1.
Table 1. aep Predictions vs. Actual Performance for Four Problems of Sizes 20, 40, 80 and 200. Columns 3 to 5 give the number of fitness evaluations taken by three different (μ+λ) EAs.

Problem     Size  (1+1) EA  (3+7) EA  (7+3) EA  aep    nsc
OneMax      20    641       1166      1110      0.135  0
Trap        20    627       1158      1105      0.135  0
OneMix      20    745       1375      1330      0.09   -8.1932
Subset Sum  20    548       1009      928       0.22   -1.1572
OneMax      40    821       1434      1430      0.175  -0.333
Trap        40    829       1422      1438      0.182  0
OneMix      40    1028      1776      1728      0.105  -16.3114
Subset Sum  40    533       1009      928       0.239  -6.818
OneMax      80    1267      2002      2134      0.202  -0.5
Trap        80    1273      2004      2115      0.209  -0.25
OneMix      80    1609      2608      2678      0.121  -20.4879
Subset Sum  80    547       1015      936       0.246  -7.5286
OneMax      200   2640      3848      4221      0.225  -3
Trap        200   2590      3860      4242      0.222  0
OneMix      200   3070      4724      4952      0.121  -30.175
Subset Sum  200   534       1021      945       0.252  -8.6169
In terms of the defined performance measure, the relative problem hardness of the four test problems remains the same across problem sizes 20, 40, 80 and 200. Among the problems, the order of relative hardness is: Subset Sum < OneMax ≈ Trap < OneMix. By the definition of aep, the smaller the aep value, the more difficult the problem. In this case, aep consistently predicts the relative problem hardness of the four test problems at the given problem sizes, with results in qualitative agreement with the actual performance. However, aep is unable to quantify the magnitude of the difference in problem hardness among these problems. In contrast, as we can see from Table 1, the results given by nsc do not correspond to the actual performance; consequently, nsc fails to correctly predict the relative problem hardness among OneMax, Trap, OneMix and Subset Sum.
6 Conclusion
Evolvability is an important feature directly related to problem hardness for Evolutionary Algorithms (EAs). Previous attempts to characterise and quantify this feature, including Fitness Distance Correlation (f dc) and Fitness Cloud (f c), suffer from several deficiencies. This paper presents the concept of Fitness-Probability Cloud (f pc) to characterise evolvability from the point of view of escape probability and fitness correlation, which overcomes the limitations of previous approaches and serves as an improved characterisation of evolvability. Furthermore, we propose a new measure of problem hardness for EAs called Accumulated Escape Probability (aep). We compute the values of aep and validate its effectiveness by contrasting its predictions with the actual performance, measured by the number of fitness evaluations, on four test problems (OneMax, Trap, OneMix and Subset Sum) under various problem sizes. Experimental results suggest that aep can reliably discriminate the relative hardness of different problems of the same problem size with respect to the mutation-based (μ+λ) EAs, whilst it seems unable to quantify the magnitude of the difference in problem hardness. Future work includes a more exhaustive study of aep and other measures based on the Fitness-Probability Cloud over a wide range of problems; this can potentially identify the applicable problem domains of aep and the Fitness-Probability Cloud. Furthermore, a more in-depth study of empirical problem hardness measures for EAs should be carried out.
Acknowledgement This work was partially supported by an EPSRC grant (No. EP/D052785/1) on “SEBASE: Software Engineering By Automated SEarch”.
References

1. Vassilev, V.K., Fogarty, T.C., Miller, J.F.: Smoothness, ruggedness and neutrality of fitness landscapes: from theory to application, pp. 3–44. Springer, Inc., New York (2003)
2. Borenstein, Y., Poli, R.: Information landscapes and problem hardness. In: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, GECCO 2005, pp. 1425–1431. ACM, New York (2005)
3. Wright, S.: The roles of mutation, inbreeding, crossbreeding, and selection in evolution. In: Proc. 6th Congr. Genetics, vol. 1, p. 365 (1932)
4. Jones, T., Forrest, S.: Fitness distance correlation as a measure of problem difficulty for genetic algorithms. In: Proceedings of the 6th International Conference on Genetic Algorithms, pp. 184–192. Morgan Kaufmann Publishers Inc., San Francisco (1995)
5. Collard, P., Vérel, S., Clergue, M.: Local search heuristics: Fitness cloud versus fitness landscape. CoRR abs/0709.4010 (2007)
6. Vanneschi, L., Tomassini, M., Collard, P., Vérel, S.: Negative slope coefficient: A measure to characterize genetic programming fitness landscapes. In: Collet, P., Tomassini, M., Ebner, M., Gustafson, S., Ekárt, A. (eds.) EuroGP 2006. LNCS, vol. 3905, pp. 178–189. Springer, Heidelberg (2006)
7. Smith, T., Husbands, P., Layzell, P., O'Shea, M.: Fitness landscapes and evolvability. Evol. Comput. 10, 1–34 (2002)
8. Vanneschi, L., Clergue, M., Collard, P., Tomassini, M., Vérel, S.: Fitness clouds and problem hardness in genetic programming. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3103, pp. 690–701. Springer, Heidelberg (2004)
9. Vanneschi, L., Vérel, S., Tomassini, M., Collard, P.: NK landscapes difficulty and negative slope coefficient: How sampling influences the results. In: Giacobini, M., Brabazon, A., Cagnoni, S., Di Caro, G.A., Ekárt, A., Esparcia-Alcázar, A.I., Farooq, M., Fink, A., Machado, P. (eds.) EvoWorkshops 2009. LNCS, vol. 5484, pp. 645–654. Springer, Heidelberg (2009)
10. Merz, P.: Advanced fitness landscape analysis and the performance of memetic algorithms. Evol. Comput. 12, 303–325 (2004)
11. He, J., Yao, X.: Towards an analytic framework for analysing the computation time of evolutionary algorithms. Artificial Intelligence 145, 59–97 (2003)
12. Madras, N.: Lectures on Monte Carlo Methods. American Mathematical Society, Rhode Island (2002)
13. Mengshoel, O.J., Goldberg, D.E., Wilkins, D.C.: Deceptive and other functions of unitation as Bayesian networks (1998)
14. Deb, K., Goldberg, D.E.: Analyzing deception in trap functions. In: Foundations of Genetic Algorithms, vol. 2, pp. 93–108 (1993)
15. Rivest, R., Stein, C., Cormen, T., Leiserson, C.: Introduction to Algorithms. MIT Press and McGraw-Hill (1990)
16. Naudts, B., Kallel, L.: A comparison of predictive measures of problem difficulty in evolutionary algorithms. IEEE Transactions on Evolutionary Computation 4(1), 1–15 (2000)
Frequency Distribution Based Hyper-Heuristic for the Bin-Packing Problem

He Jiang¹, Shuyan Zhang², Jifeng Xuan³, and Youxi Wu⁴

¹ School of Software, Dalian University of Technology
² School of Software Technology, Zhengzhou University
³ School of Mathematical Sciences, Dalian University of Technology
⁴ School of Computer Science and Software, Hebei University of Technology
[email protected], [email protected], [email protected], [email protected]
Abstract. In this paper, we investigate the pair frequency of low-level heuristics for the bin-packing problem and propose a Frequency Distribution based Hyper-Heuristic (FDHH). FDHH generates heuristic sequences based on pairs of low-level heuristics rather than individual low-level heuristics. An existing Simulated Annealing Hyper-Heuristic (SAHH) is employed to form the pair frequencies and is extended to guide the further selection of low-level heuristics. To represent the frequency distribution, a frequency matrix is built to collect the pair frequencies, while a reverse-frequency matrix is generated to avoid getting trapped in local optima. The experimental results on the bin-packing problem show that FDHH can obtain optimal solutions on more instances than the original hyper-heuristic.

Keywords: hyper-heuristic, frequency distribution, bin-packing, pair frequency.
1 Introduction
In recent years, hyper-heuristics have been proposed to overcome the problem-specific drawbacks of existing heuristics. By definition, hyper-heuristics, termed 'heuristics to choose heuristics' [1], are heuristics utilizing a high-level heuristic to choose and assign a set of simple low-level heuristics (LLHs). The main difference between hyper-heuristics and other heuristics is that hyper-heuristics raise the level of generality [2]. Hyper-heuristics work on an LLH space rather than directly on the problem space. The goal of a hyper-heuristic is to generate an LLH sequence which can achieve a final solution to the problem at hand.
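The 'heuristics to choose heuristics' loop can be sketched as follows; the uniform selection rule, the accept-if-no-worse criterion, and the toy LLHs are placeholder assumptions for illustration, not a specific published hyper-heuristic:

```python
import random

def hyper_heuristic(initial, llhs, evaluate, iters=1000, seed=0):
    """Generic perturbative loop: repeatedly pick a low-level heuristic,
    apply it to the current solution, and accept the move only if the
    solution does not worsen (minimisation)."""
    rng = random.Random(seed)
    best = initial
    sequence = []  # the generated LLH sequence
    for _ in range(iters):
        h = rng.choice(llhs)                        # high-level selection (uniform)
        candidate = h(best, rng)
        if evaluate(candidate) <= evaluate(best):   # simple acceptance rule
            best = candidate
            sequence.append(h.__name__)
    return best, sequence

# toy use: minimise the sum of a 0/1 vector with two trivial LLHs
def flip_random(x, rng):
    i = rng.randrange(len(x)); y = x[:]; y[i] ^= 1; return y

def clear_random(x, rng):
    i = rng.randrange(len(x)); y = x[:]; y[i] = 0; return y

best, seq = hyper_heuristic([1] * 10, [flip_random, clear_random], sum)
```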
Our work is partially supported by the Natural Science Foundation of China under Grant No. 60805024, 60903049, 61033012, the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No. 20070141020, CAS Innovation Program under Grant No. ISCAS2009-DR01, and the Natural Science Foundation of Dalian under Grant No. 201000117.
P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 118–129, 2011. c Springer-Verlag Berlin Heidelberg 2011
According to their search characteristics, hyper-heuristics can be classified into two categories, namely constructive hyper-heuristics and perturbative hyper-heuristics [3]. Constructive hyper-heuristics apply LLHs to gradually construct a complete solution from an empty initial solution, while perturbative hyper-heuristics improve solution quality starting from a complete solution. Both constructive and perturbative hyper-heuristics have been applied to problems such as bin-packing [4], [5], timetabling [6], [7], [8], production scheduling [9], and personnel scheduling [10]. Among these, bin-packing is a typical problem attracting much attention. As an NP-hard problem [11], bin-packing is to pack all the given pieces into as few bins as possible [5]. We focus on the selection strategy for perturbative hyper-heuristics in this paper. As a significant component, the selection strategy helps decide the next LLH in a perturbative hyper-heuristic [3]. Most hyper-heuristics select LLHs individually. However, a combination of LLHs may provide more improvement than individual LLHs. For example, Thabtah & Cowling [12] propose an associative classification approach to predict which LLH to combine with the given LLH sequence. In this paper, we investigate the frequency distribution for combinations of LLHs and propose a Frequency Distribution based Hyper-Heuristic (FDHH). First, we present the pair frequency of LLHs and employ the distribution of pair frequencies to guide the further selection of LLHs. Then, we design FDHH to solve the bin-packing problem. In FDHH, the frequency distribution is incorporated into an existing algorithm, a Simulated Annealing Hyper-Heuristic (SAHH) [5]. FDHH consists of two phases: one for generating the frequency distribution and the other for guiding the hyper-heuristic.
Moreover, a frequency matrix is built to collect the pair frequencies while a reverse-frequency matrix is utilized to avoid getting trapped in the local optimal LLH sequence. Finally, experimental results on the bin-packing problem demonstrate that our FDHH can obtain optimal solutions on more instances than the original hyper-heuristic, SAHH. The paper is organized as follows. Section 2 gives the related work. Section 3 analyses the frequency distribution and Section 4 describes FDHH. Section 5 reports the experimental results on the bin-packing problem. Finally, conclusion and future work are given in Section 6.
2 Related Work

2.1 Combination of LLHs
The combination of LLHs is a new technique for enlarging the granularity of the LLHs when selecting heuristics in hyper-heuristics. To our knowledge, the most closely related work is an associative classification based hyper-heuristic for combining an LLH with existing ones [12]. This algorithm can be viewed as a trade-off between the degree of greediness and the degree of randomness for LLHs. Moreover, some other approaches have been proposed to explore the characteristics of LLHs. For example, Chakhlevitch & Cowling [13] investigate learning strategies for
choosing the subset of the fittest LLHs for hyper-heuristic design; Ren et al. [14] propose a bipartite-graph based approach to distinguish intensification and diversification sets of LLHs in order to reduce the search space. In this paper, our approach attempts to analyse the distribution of pair frequencies of LLHs and then to further guide the selection of LLHs.

2.2 Bin-Packing Problem
The bin-packing problem is a well-known NP-hard problem in real-world applications [11]. Given a set of pieces P = {p1, p2, . . . , pk}, an unlimited set of bins, a weight ai for each piece pi, and an identical capacity c for each bin, the bin-packing problem is to pack all pieces into bins so as to minimize the number of used bins. A solution x to the bin-packing problem is a set of used bins with all pieces packed in the bins. We give the formal definition of the bin-packing problem in Equation (1). In the definition, both yj and xi,j are binary values: yj takes the value 1 if the jth bin is used and 0 otherwise, and xi,j takes the value 1 if the piece pi is assigned to bin j and 0 otherwise. The constraints state that the total weight of the pieces in a bin cannot exceed the capacity c and that each piece must be packed in exactly one bin. The objective function is the number of used bins.

    min f(x) = Σ_{j=1}^{k} yj                                      (1)
    s.t.  Σ_{i=1}^{k} ai xi,j ≤ c yj,   j ∈ {1, . . . , k},
          Σ_{j=1}^{k} xi,j = 1,         i ∈ {1, . . . , k}.
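The objective and constraints of Equation (1) can be checked mechanically for a candidate packing. The following Python sketch (the function name and the list-of-bins representation are ours, not from the paper) returns f(x) when the packing is feasible:

```python
def evaluate_packing(bins, weights, capacity):
    """Check the constraints of Eq. (1) for a packing given as a list of
    bins, each bin being a list of piece indices; return the objective
    f(x) = number of used bins, or raise ValueError if infeasible."""
    seen = []
    for b in bins:
        # capacity constraint: total weight in a bin must not exceed c
        if sum(weights[i] for i in b) > capacity:
            raise ValueError("bin over capacity")
        seen.extend(b)
    # assignment constraint: every piece packed in exactly one bin
    if sorted(seen) != list(range(len(weights))):
        raise ValueError("each piece must be packed in exactly one bin")
    return len(bins)
```

For instance, with weights (60, 50, 40, 30, 20) and capacity 100, the packing {p1, p3}, {p2, p4, p5} is feasible and uses 2 bins.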
2.3 LLHs
In this part, we introduce the LLHs for the bin-packing problem. These LLHs can be combined into a sequence in a hyper-heuristic, and each LLH in this sequence is used either to enhance the solution quality or to help the solution escape from local optima. We choose 6 simple LLHs from the LLH list of the CHeSC competition [15] for the bin-packing problem. Due to the paper length limit, we only briefly list their functions as follows: h1, swap from the lowest bin; h2, split a bin; h3, swap; h4, repack the lowest filled bin; h5, destroy the 3 highest bins; h6, destroy the 3 lowest bins.
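As an illustration, one plausible reading of h4 (repack the lowest filled bin) can be sketched as follows. The exact CHeSC implementations are not given in the paper, so this is only an assumed variant that empties the least-filled bin and reinserts its pieces first-fit in decreasing weight order:

```python
def repack_lowest_filled_bin(bins, weights, capacity):
    """A hypothetical h4: empty the least-filled bin and reinsert its
    pieces into the remaining bins by first fit, opening new bins only
    when no existing bin can hold a piece."""
    if len(bins) <= 1:
        return [list(b) for b in bins]
    fill = lambda b: sum(weights[i] for i in b)
    lowest = min(range(len(bins)), key=lambda j: fill(bins[j]))
    pieces = bins[lowest]
    rest = [list(b) for j, b in enumerate(bins) if j != lowest]
    for p in sorted(pieces, key=lambda i: -weights[i]):
        for b in rest:
            if fill(b) + weights[p] <= capacity:
                b.append(p)
                break
        else:
            rest.append([p])  # no bin fits: open a new one
    return rest
```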
2.4 Simulated Annealing Hyper-Heuristic
The Simulated Annealing Hyper-Heuristic (SAHH) proposed by Bai et al. [5] is a typical hyper-heuristic for the bin-packing problem. We briefly introduce this algorithm as follows. Besides a framework similar to that of other hyper-heuristics, SAHH adopts three special strategies: stochastic heuristic selection, simulated annealing acceptance,
Table 1. Framework of the SAHH algorithm

Algorithm: SAHH
Input: pro, H, W, t, total_iter, LP, β
Output: the LLH sequence
(1) s = Generate_Initial_Solution(pro, H);
(2) while current_iter < total_iter do
    (2.1) hi = Stochastic_Heuristic_Selection(H, W);
    (2.2) s' = Heuristic_Application(s, hi);
    (2.3) s = Simulated_Annealing_Acceptance(s', s, t);
    (2.4) t = Temperature_Resetting(t, β);
    (2.5) // short term learning
          if mod(current_iter, LP) = 0 then
        (2.5.1) C = Performance_Calculation(LP);
        (2.5.2) W = Weight_Resetting(C);
    Endwhile
and short term learning. The details are shown in Table 1. Given a test instance pro and a set of perturbative LLHs H = {h1, h2, . . . , hn}, SAHH generates an initial solution s and retains s as the current solution (step (1)). Thereafter, an iteration consisting of five steps, namely selection, application, acceptance, resetting, and learning, begins to iteratively update the current solution (step (2)). In step (2.1), the stochastic heuristic selection strategy selects a heuristic hi according to its weight wi (wi ∈ W). Then SAHH applies hi to the current solution s and generates a new solution s' (step (2.2)). In step (2.3), the simulated annealing acceptance decides whether s' is accepted as the current solution, depending on the current temperature t. Next, step (2.4) resets the temperature. In this step, SAHH checks whether the current temperature t should be decreased, increased, or left unchanged. The current temperature is changed according to t = t/(1 + βt) when decreasing, or t = t/(1 − βt) when increasing. Since the performance of the LLHs varies during different periods, SAHH introduces short term learning with LP as the length of one learning period (step (2.5)). In this step, SAHH calculates a performance value ci (ci ∈ C) for each LLH hi and resets the weights wi based on the last LP iterations. Finally, SAHH returns an LLH sequence used to generate the final solution to the problem.
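The control flow of Table 1 can be paraphrased in Python. The solution representation, the LLHs, and the learning rule below are placeholders (only the loop structure follows the description above; for simplicity the temperature is always decreased, and the weight-resetting rule is our own guess, not the one in [5]):

```python
import math
import random

def sahh(initial, heuristics, weights, t, total_iter, LP, beta, f):
    """Skeleton of SAHH (Table 1): weighted heuristic selection,
    simulated-annealing acceptance, temperature decrease, and
    short-term learning every LP iterations. f is the objective
    (to be minimised). Returns the final solution and the LLH sequence."""
    s = initial
    sequence = []
    improvement = [0.0] * len(heuristics)
    for it in range(1, total_iter + 1):
        # (2.1) stochastic selection proportional to the weights W
        i = random.choices(range(len(heuristics)), weights=weights)[0]
        # (2.2) apply the chosen LLH
        s_new = heuristics[i](s)
        # (2.3) simulated annealing acceptance
        delta = f(s_new) - f(s)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            if delta < 0:
                improvement[i] += -delta
            s = s_new
        # (2.4) temperature resetting (always decreasing here)
        t = t / (1 + beta * t)
        sequence.append(i)
        # (2.5) short-term learning: reset weights from recent performance
        if it % LP == 0:
            total = sum(improvement) or 1.0
            weights = [0.1 + c / total for c in improvement]
            improvement = [0.0] * len(heuristics)
    return s, sequence
```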
3 Frequency Distribution
In this section, we introduce the notion of pair frequency and analyse the frequency distribution using the LLH sequences of SAHH for the bin-packing problem.

3.1 Pair Frequency and Frequency Matrix
Given a set of LLHs H = {h1, h2, . . . , hn} and a fixed sequence length m, the size of the heuristic space is n^m. Hence it is intractable to obtain the optimal sequence in polynomial time for large n and m. However, we can obtain approximately optimal sequences instead. From the viewpoint of graph theory, each
sequence in the heuristic space is obtained by traversing a fully connected graph [14] whose vertices are the n LLHs. When a hyper-heuristic traverses the graph, the information guiding the search from one heuristic to another is significant. We employ the pairs of LLHs (the edges of the graph) to express this association through their frequencies. In this paper, hi,j denotes the pair that starts with hi and ends with hj (i, j ∈ {1, . . . , n}). Therefore, instead of analysing heuristics individually, we analyse the pairs of LLHs. We define a frequency matrix F for a given heuristic sequence, in which an element Fi,j denotes the pair frequency of hi,j. Considering a set of n LLHs, H = {h1, h2, . . . , hn}, there are n × n different pairs of LLHs in total, i.e., h1,1, . . . , h1,n, h2,1, . . . , h2,n, . . . , hn,1, . . . , hn,n. Hence, the size of the frequency matrix F is n × n. Let L = (l1, l2, . . . , lm) be a sequence obtained by SAHH, where m is the length of L. One can conclude that a set of pairs LC = {l1,2, l2,3, . . . , lm−1,m} lies in L, where li,i+1 is the pair combined by the ith and (i+1)th LLHs in L. Here, |LC| = m − 1. We let ti,j be the number of occurrences of hi,j in LC. Accordingly, the element Fi,j is formally defined as

    Fi,j = ti,j / |LC|,   i, j ∈ {1, . . . , n}.                    (2)
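Equation (2) can be computed directly from a sequence of LLH indices; a minimal Python sketch (our own helper, not from the paper):

```python
def frequency_matrix(sequence, n):
    """Build the n x n frequency matrix F of Eq. (2): F[i][j] counts the
    adjacent pairs (h_i, h_j) in the sequence, divided by the total
    number of pairs |LC| = m - 1."""
    m = len(sequence)
    F = [[0.0] * n for _ in range(n)]
    for a, b in zip(sequence, sequence[1:]):
        F[a][b] += 1
    for i in range(n):
        for j in range(n):
            F[i][j] /= (m - 1)
    return F
```

By construction, all entries of F sum to 1, so F describes a probability distribution over the pairs.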
3.2 Frequency Distribution Analysis
In this section, we visualise the frequency distributions based on the pair frequencies. According to Section 3.1, computing the pair frequencies requires existing sequences to build the frequency matrix. For a given instance, SAHH is run for ten rounds independently and ten distinct sequences are obtained. These sequences are then used to produce the respective frequency matrices. We thus illustrate the frequency distributions from these ten matrices in one figure. We set all the parameters of SAHH according to [5], except for choosing the 6 LLHs of Section 2.3. The instances in our experiments come from three widely used classes of bin-packing problem instances: Uniform (80 instances), Triplet (80 instances), and Sch_set (1210 instances), which have been investigated in [5], [16] and [17]. Detailed descriptions of these classes are given in Section 5. Due to the paper length limit, we only illustrate the frequency distributions for four selected instances: U120_00, T249_02, N3C3W4_M, and N4W3B1R4. Similar tendencies can be found for the other instances. The selected instances are described in Table 2.

Table 2. Characteristics of four selected bin-packing problem instances

Instance    Instance class  Piece number  Weight range    Capacity
U120_00     Uniform         120           [20, 100]       150
T249_02     Triplet         249           [250, 500]      1000
N3C3W4_M    Sch_set         200           [30, 100]       150
N4W3B1R4    Sch_set         500           [114, 168]      1000

Fig. 1. Illustrations of frequency distributions. Sub-figures: (a) U120_00, (b) T249_02, (c) N3C3W4_M, (d) N4W3B1R4; each plots the frequency (vertical axis, 0 to 0.15) against the pair index 1 to 36 (horizontal axis) for ten sequences L1 to L10.

Fig. 1 illustrates the frequency distributions of U120_00, T249_02, N3C3W4_M, and N4W3B1R4. Each sub-figure plots ten frequency distributions obtained from ten different sequences {L1, L2, . . . , L10} of the corresponding instance. The horizontal axis shows the pairs (the index is set as (i − 1) × 6 + j for hi,j; e.g., index 9 is for h2,3 and index 25 is for h5,1), while the vertical axis indicates the pair frequencies. From Fig. 1, we make four observations. First, for one instance, the ten frequency distributions generated by ten distinct LLH sequences are quite similar, especially for T249_02 and N4W3B1R4. Second, for one instance, different pairs have different frequency values. Third, across instances, the frequency distributions are somewhat similar but not identical: all sub-figures show that the average frequencies of pairs starting with h1, h2, h3, and h5 are larger than those of pairs starting with h4 and h6, yet the frequencies of the same pair (e.g., h2,1) differ across the four instances. Fourth, the pairs of LLHs can be classified into two categories according to their frequency values, pairs with large frequencies and pairs with small frequencies; e.g., the frequency 0.05 can be viewed as one boundary between them. According to the sequences generated by SAHH, pairs with large frequencies are likely to improve the solution quality, while the remaining pairs tend to decrease it.
4 Frequency Distribution Based Hyper-Heuristic
Based on the frequency matrices of Section 3, we propose a Frequency Distribution based Hyper-Heuristic (FDHH). In FDHH, an existing hyper-heuristic, SAHH, is used to generate the pair frequencies, and SAHH is then extended into a Frequency based Simulated Annealing Hyper-Heuristic (FSAHH) to guide the LLHs. In FSAHH, a reverse-frequency matrix is proposed to promote the search process. In this section, we first give a reverse version of the frequency matrix. Then, we propose FSAHH based on the frequency matrix and the reverse-frequency matrix. Finally, we present the framework of FDHH, which employs both SAHH and FSAHH as sub-algorithms.
4.1 Reverse-Frequency Matrix
After choosing one heuristic hlast, the next LLH hi is selected according to the pair frequencies of hlast,1, hlast,2, . . . , hlast,n. If these frequencies are taken from the frequency matrix F then, as shown in Section 3, pairs with larger frequency values are more likely to be chosen, which may lead to getting trapped in a locally optimal LLH sequence. Therefore, in order to escape from local optima, a reverse-frequency matrix is sometimes utilized to increase the small pair frequencies and decrease the large ones. For the reverse-frequency matrix R, an element Ri,j is the reverse-frequency of the pair hi,j (i, j ∈ {1, . . . , n}). Each row Ri is created by exchanging the qth largest element with the qth smallest element in Fi, q ∈ {1, . . . , n/2}; Ri and Fi contain, respectively, the reverse-frequencies and the frequencies of pairs starting with the LLH hi. For instance, with n = 6, suppose that in the first row of F, F1,1 has the largest frequency value and F1,4 the smallest; then R1,1 is set to F1,4 and R1,4 to F1,1. If F1,2 has the second largest value and F1,6 the second smallest, we set R1,2 to F1,6 and R1,6 to F1,2. Finally, if F1,3 has the third largest value and F1,5 the third smallest, we set R1,3 to F1,5 and R1,5 to F1,3. The first row R1 is thus generated, and the other rows of R are generated in the same way.
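The row-wise exchange described above can be written compactly: within each row, the qth largest entry trades places with the qth smallest. A Python sketch of this rule (our own formulation, not the paper's code):

```python
def reverse_frequency_matrix(F):
    """Build R from F by exchanging, within each row, the q-th largest
    value with the q-th smallest value (q = 1, ..., n/2)."""
    R = []
    for row in F:
        n = len(row)
        # column indices ordered by ascending frequency
        order = sorted(range(n), key=lambda j: row[j])
        new_row = row[:]
        for q in range(n // 2):
            small, large = order[q], order[n - 1 - q]
            new_row[small], new_row[large] = row[large], row[small]
        R.append(new_row)
    return R
```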
4.2 Frequency Based Simulated Annealing Hyper-Heuristic
Based on the frequency matrix and the reverse-frequency matrix, we propose FSAHH. FSAHH employs three new strategies: Frequency Based Selection, Count Value Updating, and Interval Disturb. Their details are as follows.

Frequency Based Selection: Given a set of LLHs H = {h1, h2, . . . , hn} and a frequency matrix F (or a reverse-frequency matrix R), this strategy selects an LLH hi according to matrix F (or R). To start the procedure, the first hi is chosen randomly; thereafter, given the last used LLH hlast, hi is selected according to the frequency Flast,i (or Rlast,i).

Count Value Updating: Given a counter v which records the number of consecutive non-improving solutions, this strategy updates v by comparing the objective values of the current solution s and the new solution s'. There are four cases.
Table 3. Framework of the FSAHH algorithm

Algorithm: FSAHH
Input: pro, H, t, total_iter, LP, β, maxV, F, R
Output: the best solution
(1) initialize v = 0 and ts = t;
(2) s = Generate_Initial_Solution(pro, H);
(3) while current_iter < total_iter do
    (3.1) hi = Frequency_Based_Selection(H, F);
    (3.2) s' = Heuristic_Application(s, hi);
    (3.3) v = Count_Value_Updating(s', s);
    (3.4) s = Simulated_Annealing_Acceptance(s', s, t);
    (3.5) t = Temperature_Resetting(t, β);
    (3.6) if v equals maxV then
            s = Interval_Disturb(s, R, ts);
            current_iter = current_iter + LP;
    Endwhile
In case 1, f(s') < f(s): set v = 0. In case 2, f(s') > f(s): set v = v + 1. In case 3, f(s') = f(s) and s' differs from s: set v = 0. In case 4, f(s') = f(s) and s' = s: set v = v + 1.

Interval Disturb: This strategy is a standard simulated annealing method using a reverse-frequency matrix R and a high starting temperature t to escape from local optima. Given a current solution s, the matrix R, and the temperature t, Interval Disturb consists of 4 steps. Step (1) selects a heuristic hi by the Frequency Based Selection strategy with R as the underlying matrix. Step (2) applies hi to s and generates a new solution s'. Step (3), with t as the current temperature, accepts s' as the current solution by the Simulated Annealing Acceptance strategy (the same as step (2.3) in Table 1). Step (4) decreases the temperature t (sets t = t/(1 + βt)). Interval Disturb repeats steps (1)-(4) LP times in total and returns the best solution. β and LP are the same as in Table 1.

We present the details of FSAHH in Table 3. It works as follows. After generating an initial solution s in the same manner as SAHH, FSAHH runs 6 steps iteratively (step (3)). In step (3.1), we utilize the Frequency Based Selection strategy and a frequency matrix F of the given test instance to select an LLH hi. Then, we apply hi to the current solution s to generate a new solution s' in step (3.2). Next, in step (3.3), the counter v is updated by the Count Value Updating method with s' and s as input parameters. After that, in steps (3.4) and (3.5), we adopt Simulated Annealing Acceptance to decide whether s' should be accepted as the current solution, and the Temperature Resetting method to reset the current temperature t. These correspond to steps (2.3) and (2.4) of Table 1, respectively. The last step attempts to escape from local optima (step (3.6)).
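The four cases of Count Value Updating reduce to two comparisons plus an equality test; a direct Python transcription (the function name is ours):

```python
def count_value_updating(v, s, s_new, f):
    """Update the stagnation counter v (cases 1-4 above): reset it when
    the new solution improves the objective, or keeps the objective while
    changing the solution; increment it otherwise."""
    if f(s_new) < f(s):
        return 0                       # case 1: improvement
    if f(s_new) > f(s):
        return v + 1                   # case 2: deterioration
    return 0 if s_new != s else v + 1  # cases 3 and 4: equal objective
```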
When v = maxV (maxV is set to LP/5 in the experiments), the algorithm is considered to be trapped in a local optimum. Interval Disturb is then triggered with the current solution s, the reverse-frequency matrix R, and the starting temperature ts as input parameters. Note that Interval Disturb contains a loop which executes LP times; thus, after running Interval Disturb, LP is added to current_iter. The iteration runs until current_iter equals total_iter. At last, the best solution is returned.

4.3 Framework
In this section, we present the framework of FDHH, which employs SAHH and FSAHH as subroutines; it is shown in Table 4. For a given test instance, SAHH is first run learn_iter times (learn_iter is set to 10 in the experiments) to obtain a set of distinct LLH sequences, denoted LS = {L1, L2, . . . , Llearn_iter}, which are used for learning in the next steps. Then, in step (2), the average frequency matrix aveF is generated from the sequence set LS. After that, in step (3), the average reverse-frequency matrix aveR is derived from aveF. Once aveF and aveR have been established, FSAHH is run learn_iter times in step (4) with aveF and aveR as two of its parameters. Finally, the best solution is returned.

Table 4. Framework of the FDHH algorithm

Algorithm: FDHH
Input: pro, H, W, t, total_iter, LP, β, maxV, learn_iter
Output: the best solution s*
    initialise s* as a random solution
(1) for i = 1 to learn_iter
        Li = SAHH(pro, H, W, t, total_iter, LP, β);
    End for
(2) aveF = Getting_Average_Frequency_Matrix(LS);
(3) aveR = Getting_Average_Reverse_Frequency_Matrix(aveF);
(4) for i = 1 to learn_iter
        si = FSAHH(pro, H, t, total_iter, LP, β, maxV, aveF, aveR);
        if f(si) < f(s*) then s* = si;
    End for
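Putting the two phases together, the control flow of FDHH can be sketched as a generic driver. The routines passed in (sahh_run, fsahh_run, freq_matrix, reverse_matrix) are hypothetical stand-ins for the components described above:

```python
def fdhh(sahh_run, fsahh_run, freq_matrix, reverse_matrix, learn_iter, f):
    """Two-phase FDHH skeleton (Table 4). sahh_run() returns one LLH
    sequence; freq_matrix(seq) returns its frequency matrix;
    reverse_matrix(F) derives the reverse matrix; fsahh_run(F, R)
    returns a solution; f is the objective to minimise."""
    # phase 1: learn an averaged frequency matrix from SAHH sequences
    mats = [freq_matrix(sahh_run()) for _ in range(learn_iter)]
    n = len(mats[0])
    aveF = [[sum(M[i][j] for M in mats) / learn_iter for j in range(n)]
            for i in range(n)]
    aveR = reverse_matrix(aveF)
    # phase 2: run FSAHH guided by aveF and aveR, keep the best solution
    best = None
    for _ in range(learn_iter):
        s = fsahh_run(aveF, aveR)
        if best is None or f(s) < f(best):
            best = s
    return best
```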
5 Experimental Results
In this section, we apply our FDHH to the bin-packing problem. To evaluate the algorithm, experiments are conducted on three classes of bin-packing instances (1370 instances in total). The first class (Uniform class) has 80 instances, with c = 150 and ai ∈ [20, 100]. There are 4 sub-classes in the Uniform class: Fal_U120, Fal_U250, Fal_U500, and Fal_U1000, each of which has 20 test instances with k = 120, 250, 500, and 1000, respectively. The second class (Triplet class) also has 80 instances, with c = 1000 and ai ∈ [250, 500]. There are 4 sub-classes in the Triplet class: Fal_T60, Fal_T120, Fal_T249, and Fal_T501, each of which has 20 test instances with k = 60, 120, 249, and 501,
respectively. In the more complex Triplet class, each bin of the optimal solution must be fully filled with exactly 3 pieces. The third class (Sch_set) contains 1210 instances in 3 sub-classes: Sch_set1 (720 instances), Sch_set2 (480 instances), and Sch_set3 (10 instances). In Sch_set1, c ∈ {100, 120, 150}, and ai is distributed in [1, 100], [20, 100], or [30, 100]. In Sch_set2, c = 1000, and ai is such that three to nine pieces fit in one bin. For both Sch_set1 and Sch_set2, k ∈ {50, 100, 200, 500}. Sch_set3 is harder than Sch_set1 and Sch_set2, with c = 100000, ai ∈ [20000, 35000], and k = 200. The Uniform and Triplet classes were introduced by [18] and the Sch_set class was generated by [19]. For all instances in the Uniform and Triplet classes, the optimal solution is known [20]. These instances and the corresponding optimal objective values can be downloaded from http://people.brunel.ac.uk/~mastjjb/jeb/orlib/binpackinfo.html. In Sch_set, 1184 instances have been solved optimally. The instances and optimal objective values (or the best known lower bounds) can be downloaded from http://www.wiwi.uni-jena.de/Entscheidung/binpp/index.htm. Experiments are performed under Windows XP on a Pentium Dual Core 2.8 GHz PC with 4 GB of memory. All the source code is implemented in Java and compiled using JDK 6.20. We run FDHH on the 1370 test instances and, for comparison, we run SAHH on the same instances independently. The source code of SAHH is implemented according to [5]. All parameters of SAHH are taken from [5] except the set of LLHs. Table 5 presents the comparative results of our proposed FDHH (run once with learn_iter = 10) and SAHH (run 2 × learn_iter times). The 'Instance class' column shows the names of the test instance classes. 'Num' indicates the number of test instances in each sub-class. 'Hits' denotes the number of instances for which the optimal objective value (or the best known lower bound) is reached.
'Max dev.' is the maximum absolute deviation of the worst-case objective value from the optimum over all instances in a class. From the 'Hits' columns of Table 5, we can see that FDHH achieves 1094 optimal solutions (out of 1370) while SAHH obtains 1085 optimal solutions. Besides, as shown in the 'Max dev.' columns, the worst solutions generated by FDHH are closer to the optima than those generated by SAHH. Note that the numerical results of SAHH differ from those reported in [5] due to the difference in the chosen LLHs.

Table 5. Comparative results of SAHH and FDHH

Instance class  Sub-class   Num   SAHH Hits  SAHH Max dev.  FDHH Hits  FDHH Max dev.
Uniform         Fal_U120    20    17         1              18         1
                Fal_U250    20    20         0              20         0
                Fal_U500    20    20         0              20         0
                Fal_U1000   20    16         3              17         2
Triplet         Fal_T60     20    0          1              0          1
                Fal_T120    20    0          1              0          1
                Fal_T249    20    0          3              0          1
                Fal_T501    20    0          3              0          2
Sch_Set         Sch_Set1    720   669        2              672        1
                Sch_Set2    480   340        2              343        1
                Sch_Set3    10    3          2              4          1
All                         1370  1085       3              1094       2
6 Conclusion and Future Work
In this paper, we study the frequency distributions of LLH sequences and design the frequency distribution based hyper-heuristic (FDHH). To build the frequency distribution for the algorithm design, we propose the notion of pair frequency to investigate the characteristics of combinations of LLHs. Experimental results on the bin-packing problem indicate that FDHH can obtain optimal solutions on more instances than the original hyper-heuristic, SAHH. The insights into frequency distributions gained in this paper can be drawn on to design other hyper-heuristics. In future work, we plan to design a selection strategy based on the combination of multiple LLHs. Moreover, combining LLHs at multiple granularities may be more effective than using a single granularity. Since multiple-granularity combination of LLHs enlarges the complexity of our algorithm, it is necessary to design a new strategy to dynamically decide the size of the combination of LLHs. It would also be useful to conduct a large empirical study, or to develop a theoretical analysis, to provide a more exact approach for deciding the size of the combination.
References

1. Burke, E.K., Hart, E., Kendall, G., Newall, J., Ross, P., Schulenburg, S.: Hyper-heuristics: An Emerging Direction in Modern Search Technology. In: Glover, F., Kochenberger, G. (eds.) Handbook of Metaheuristics, pp. 457-474. Kluwer, Dordrecht (2003)
2. Ochoa, G., Vázquez-Rodríguez, J.A., Petrovic, S., Burke, E.K.: Dispatching Rules for Production Scheduling: A Hyper-heuristic Landscape Analysis. In: Proceedings of the IEEE CEC, Trondheim, Norway, pp. 1873-1880 (2009)
3. Burke, E.K., Hyde, M., Kendall, G., Ochoa, G., Ozcan, E., Qu, R.: A Survey of Hyper-heuristics. Technical Report, School of Computer Science and Information Technology, University of Nottingham (2009)
4. Ross, P., Marin-Blazquez, J.G., Schulenburg, S., Hart, E.: Learning a Procedure that Can Solve Hard Bin-packing Problems: A New GA-based Approach to Hyper-heuristics. In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., O'Reilly, U.-M., Beyer, H.-G., Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter, M.A., Schultz, A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO 2003. LNCS, vol. 2724, pp. 1295-1306. Springer, Heidelberg (2003)
5. Bai, R., Blazewicz, J., Burke, E.K., Kendall, G., McCollum, B.: A Simulated Annealing Hyper-heuristic Methodology for Flexible Decision Support. Technical Report, School of CSiT, University of Nottingham (2007)
6. Qu, R., Burke, E.K.: Hybridisations within a Graph Based Hyper-heuristic Framework for University Timetabling Problems. JORS 60, 1273-1285 (2008)
7. Qu, R., Burke, E.K., McCollum, B.: Adaptive Automated Construction of Hybrid Heuristics for Exam Timetabling and Graph Colouring Problems. EJOR 198, 392-404 (2008)
8. Bilgin, B., Ozcan, E., Korkmaz, E.E.: An Experimental Study on Hyper-heuristics and Final Exam Scheduling. In: PATAT 2006, pp. 394-412. Springer, Berlin (2007)
9. Vázquez-Rodríguez, J.A., Petrovic, S., Salhi, A.: A Combined Meta-heuristic with Hyper-heuristic Approach to the Scheduling of the Hybrid Flow Shop with Sequence Dependent Setup Times and Uniform Machines. In: Proceedings of the 3rd Multidisciplinary International Scheduling Conference, Paris, France, pp. 506-513 (2007)
10. Han, L., Kendall, G.: Guided Operators for a Hyper-heuristic Genetic Algorithm. In: Gedeon, T.D., Fung, L.C.C. (eds.) AI 2003. LNCS (LNAI), vol. 2903, pp. 807-820. Springer, Heidelberg (2003)
11. Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons, Chichester (1990)
12. Thabtah, F., Cowling, P.: Mining the Data from a Hyperheuristic Approach Using Associative Classification. Expert Systems with Applications 34(2), 1093-1101 (2008)
13. Chakhlevitch, K., Cowling, P.: Choosing the Fittest Subset of Low Level Heuristics in a Hyperheuristic Framework. In: Raidl, G.R., Gottlieb, J. (eds.) EvoCOP 2005. LNCS, vol. 3448, pp. 23-33. Springer, Heidelberg (2005)
14. Ren, Z., Jiang, H., Xuan, J., Luo, Z.: Ant Based Hyper Heuristics with Space Reduction: A Case Study of the p-Median Problem. In: Schaefer, R., Cotta, C., Kolodziej, J., Rudolph, G. (eds.) PPSN XI. LNCS, vol. 6238, pp. 546-555. Springer, Heidelberg (2010)
15. Cross-domain Heuristic Search Challenge, http://www.asap.cs.nott.ac.uk/chesc2011/index.html
16. Fleszar, K., Hindi, K.S.: New Heuristics for One-dimensional Bin-packing. Computers and Operations Research 29(7), 821-839 (2002)
17. Alvim, A.C.F., Ribeiro, C.C., Glover, F., Aloise, D.J.: A Hybrid Improvement Heuristic for the One Dimensional Bin Packing Problem. Journal of Heuristics 10, 205-229 (2004)
18. Falkenauer, E.: A Hybrid Grouping Genetic Algorithm for Bin Packing. Journal of Heuristics 2, 5-30 (1996)
19. Scholl, A., Klein, R., Jurgens, C.: BISON: A Fast Hybrid Procedure for Exactly Solving the One Dimensional Bin Packing Problem. Computers & Operations Research 24(7), 627-645 (1997)
20. Valerio de Carvalho, J.M.: Exact Solution of Bin-packing Problems Using Column Generation and Branch-and-Bound. Annals of Operations Research 86, 629-659 (1999)
From Adaptive to More Dynamic Control in Evolutionary Algorithms

Giacomo di Tollo¹, Frédéric Lardeux¹, Jorge Maturana², and Frédéric Saubion¹

¹ LERIA, University of Angers, France
  [email protected]
² Instituto de Informática, Universidad Austral de Chile, Chile
  [email protected]
Abstract. Adaptive evolutionary algorithms have been widely developed to improve the management of the balance between intensification and diversification during the search. Nevertheless, this balance may need to be dynamically adjusted over time. Based on previous works on adaptive operator selection, we investigate in this paper how an adaptive controller can be used to achieve more dynamic search scenarios and what is the real impact of possible combinations of control components. This study may be helpful for the development of more autonomous and efficient evolutionary algorithms.
1 Introduction
From a high-level point of view, Evolutionary Algorithms (EAs) [5] manage a set (population) of possible configurations of the problem solutions (individuals), which are progressively modified by variation operators in order to converge to an optimal solution or, at least, to a sub-optimum of good quality. EAs have been successfully applied to various combinatorial and continuous optimization problems. It is now clearly established that the successful use of an EA mainly relies on the suitable combination of several components: the choice of an appropriate encoding of the problem, the definition of efficient operators, and the adjustment of the behaviour of the algorithm by means of parameters. One may indeed identify two general classes of parameters: behavioural parameters, mainly operator application rates or population size, and structural parameters, which define the main features of the algorithm and could eventually transform it radically, e.g., those related to the encoding and the choice of operators. Even if significant progress has been achieved in parameter setting [10], this setting often relies on empirical rules and/or problem-domain knowledge and often involves time-consuming experiments. Following the classification proposed in [4], we usually distinguish between tuning techniques, which aim at adjusting the algorithm's parameters before the run, and control techniques, which modify the algorithm's behaviour along the search process. Efficient tuning methods are now available using either statistical tools [2] or meta-algorithms [8], [14]. Turning back to the performance of EAs, the management of the balance between the exploration and the exploitation of the search space (also known as diversification and intensification) is largely recognized as a key feature for the overall performance of the search process. Therefore, there is a need for a suitable criterion that can be used for this management. Several control techniques have been proposed, especially to tackle the selection problem, i.e., which variation operator should be applied at a given step of the search. Recent works on adaptive pursuit [17] or Adaptive Operator Selection (AOS) [7,6] have provided promising results on the adaptive management of the search process but focused on a single criterion: the quality of the population. [12] proposed a control method for EAs whose purpose is to manage simultaneously the mean quality and the diversity of the population, which are clearly related to intensification and diversification. The main idea is to consider the impact of an operator on the search and to keep a trace of its successes and failures. This knowledge is then used to choose the next operator to apply. In [11], combinations of various components of the controller were tested in order to achieve the best possible results by combining ideas from [6,3,12]. In [13], the adaptive management of the operators themselves is addressed. In these works, the search is guided by a fixed scheme to control the trade-off between intensification and diversification. The described controller is adaptive since it adapts its behaviour to the current state of the search and the algorithm's environment (including its components and some fixed parameters). However, this control is not really dynamic, in the sense that the balance between intensification and diversification is kept fixed. Since the search may need different emphasis on these trends at different moments of the search in order to obtain better results [9], we want to explore ways to adjust this trade-off dynamically.

P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 130-141, 2011. © Springer-Verlag Berlin Heidelberg 2011
In this work, our purpose is twofold:
– to provide a fair and clear experimental analysis of the controller's behaviour: different levels of control were interleaved in [13] and we need to assess the control ability of the controller itself;
– to provide an analysis of the dynamic control ability of the controller and of its potential ability to manage more dynamic strategies.
Such a study will help algorithm designers to better understand the actual effects of such adaptive control techniques and thus to select the suitable components for achieving more autonomous algorithms. This paper is organized as follows. The controller is presented in Sec. 2, an experimental analysis is offered in Secs. 3 and 4, before drawing conclusions and outlining future work in Sec. 5.
2 Adaptive Operator Selection
Parameter tuning implies a high cost, which is why we focus on parameter control and, more specifically, on the adaptive control of the application rates of the operators that can be used in an EA. Considering a basic steady-state EA, these parameters are used to select, at each generation, which operator will be applied. The concept of AOS has been widely studied for EAs and we may decompose it into the basic actions depicted in Fig. 1 and described below.
G. di Tollo et al.
Fig. 1. AOS general scheme: the EA feeds an Impact Computation module, followed by Performance Evaluation, Credit Assignment and Operator Selection
Impact Computation. The first task consists in assessing the performance of the operator that has been applied. Following the ideas developed in [12], we evaluate an operator according to its impact on both quality and diversity, which can be seen as criteria that reflect the balance between intensification and diversification. Since a single value may not reflect the properties of an operator, it is more representative to record several values over a sliding window [11] of size T. Different functions can be used to compute the performance of the operator, such as the mean value or, as suggested in [6], the extreme value (i.e., the max). We will show that the choice of this function may fully change the operator's behaviour. Once the impacts in terms of quality and diversity have been computed, they can be used to evaluate the operator.
Performance Evaluation. The method presented in [12] plots ΔD (diversity variation) vs ΔQ (quality variation). In this framework, a user-defined angle θ is introduced to set the desired compromise between quality and diversity. Its value varies between 0 (maximum reachable diversity) and π/2 (maximum reachable quality). The similarity between the direction induced by the operator and the desired search direction (defined by θ) is computed as (sin(θ) · ΔD + cos(θ) · ΔQ) / sqrt(sin(θ)² + cos(θ)²). Since this is a multi-criteria measure, other evaluation functions could be used (e.g., Pareto dominance or Pareto rank [13]); however, they will not be considered in this study.
Credit Assignment. Since we may have a wide variety of operator profiles, we are interested in comparing (and rewarding) them in a fair way. Performance is normalized according to the highest value and to the execution time of the operator (to set the trade-off between obtaining good results and the time invested in doing so). The credit register is then used by the operator selection process. Note that, as for impact computation, the rewards can be computed over a given period of time using either the maximum reward or the mean reward. Nevertheless, it has been shown that this choice has a negligible effect on the overall behaviour of the controller [11].
Operator Selection. The operator selection has been considered from two basic points of view. On the one hand, Probability Matching (PM) is the most common method, based on a roulette-wheel routine: the application rate is proportional to the reward. On the other hand, operator selection is related to reinforcement learning problems, since it basically consists in discovering the optimal application policy, i.e., applying the best possible operators without neglecting
the potential offered by formerly-bad and not-used ones. Following ideas stemming from Multi-Armed Bandit methods (MAB), we used an Operator Selection scheme that was first investigated in [3]. In MAB, the next operator is chosen according to the following formula:
MAB_o,t = r_o,t + C · sqrt( log(Σ_k n_k,t) / n_o,t )
where r_o,t is the cumulated reward obtained by operator o at time t, and n_o,t is the number of times operator o has been applied so far. The scaling factor C1 is used to properly balance the trade-off between both terms. Furthermore, r_o,t and n_o,t are reset when a change in the operators' behaviour is detected (according to a Page-Hinkley test). This formula relies on the fact that all operators are available from the beginning of the search.
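To make the two ingredients above concrete, the following Python sketch implements the angle-based performance measure and a UCB-style bandit selection rule of the kind described here. The function names and the default scaling factor are illustrative assumptions, not the authors' implementation.

```python
import math

def compass_reward(delta_q, delta_d, theta):
    # Projection of the operator's (quality, diversity) impact onto the
    # search direction defined by the angle theta, as in the measure above.
    return (math.sin(theta) * delta_d + math.cos(theta) * delta_q) / \
           math.sqrt(math.sin(theta) ** 2 + math.cos(theta) ** 2)

def mab_select(rewards, counts, C=1.0):
    # UCB-style bandit selection: maximise r_o + C * sqrt(log(sum_k n_k) / n_o).
    # Operators never applied so far are tried first, so all operators
    # available from the beginning of the search get explored.
    for o, n in enumerate(counts):
        if n == 0:
            return o
    total = sum(counts)
    scores = [rewards[o] + C * math.sqrt(math.log(total) / counts[o])
              for o in range(len(counts))]
    return max(range(len(scores)), key=scores.__getitem__)
```

In practice `rewards` and `counts` would be reset whenever the Page-Hinkley test signals a change in the operators' behaviour.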
3 Quality and Diversity Management
Experiments have been performed on the satisfiability problem (SAT) [1] for two reasons: first, because SAT can encode a variety of problems with different fitness landscapes; second, because we use a SAT-specialized algorithm [18] that has several crossover operators whose performance is already known from previous studies [13]. The population size has been set to 30 for all experiments and a set of 20 crossover operators was used (detailed in Sec. 4). To show that the controller's behaviour does not depend on the category of instances, we use three SAT instances coming from both the random (F500) and handmade categories (SIMON, 3BIT). These instances have been chosen because they are representative of different kinds of instances and show different landscapes2. In the following experiments we will focus on the behaviour of the population along the execution time, in terms of mean quality and diversity, rather than on the quality of the solution. Please notice that the execution time is constant for each operator, so it will be assessed as the number of crossovers performed during the search. We have observed that, with regard to the issues we want to highlight, the behaviour of the controller is relatively equivalent on the considered instances, which is why we will focus on the most representative figures. About 100 full experiments were executed for all the different combinations of the algorithm's components and of the strategies to change the angle's value.
3.1 The Trade-off Hypothesis
Previous works claimed that the angle value π/4 would lead to a good compromise between intensification and diversification. Our purpose is first to check this hypothesis. We will present experiments made to compare PM and MAB, considering how the mean and max impact computation criteria influence their behaviour (see Sec. 2, § 'Impact Computation').
1 Let us note that some parameters are involved in MAB but, as studied in [11], their influence is much less significant than that of the initial operator application rates, and they have relatively stable values over a wide set of benchmarks.
2 For more details, we refer the interested reader to the SAT competition's website http://www.satcompetition.org/.
To start, we fixed θ = π/4 on the three considered instances. The following pictures show the normalized population's entropy and the cost of all individuals at each step (i.e., at each crossover, shown on the x-axis). When using PM, we can see that the search behaviour shows great differences between the mean and max criteria. As shown in the pictures, the mean approach can converge, depending on the instance, either to intensification (Fig. 2a) or to diversification behaviours (high entropy, Fig. 2b).
Fig. 2. Experiments with fixed angle π/4: (a) SIMON, PM mean; (b) F500, PM mean
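The paper does not spell out its entropy measure. A common choice for binary-string populations, shown here as an assumption, is the average per-locus Shannon entropy, normalized to [0, 1]:

```python
import math

def normalized_entropy(population):
    # Normalized entropy of a population of equal-length binary strings
    # (lists of 0/1): the per-locus Shannon entropy averaged over loci.
    # A fully converged population scores 0; a maximally mixed one scores 1.
    n, length = len(population), len(population[0])
    total = 0.0
    for locus in range(length):
        ones = sum(ind[locus] for ind in population) / n
        for p in (ones, 1.0 - ones):
            if p > 0.0:
                total -= p * math.log2(p)
    return total / length
```

Any diversity measure with the same extremes would serve the controller equally well; only the relative variations matter for the analysis.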
The same uncertainty holds when introducing the MAB approach. This can lead either to a strong instability (Fig. 3a), without evident differences between the mean and max approaches, or to a behaviour favouring diversification (Fig. 3b)3.
Fig. 3. Experiments with fixed angle π/4: (a) SIMON, MAB max; (b) F500, MAB mean
We have seen that setting the value of θ to π/4 produces a variety of different behaviours (and not necessarily an intermediate compromise between diversification and intensification). Apparently, the instance's features (fitness
3 In this case, the controller shows a strong use of operator 6011, namely the only one able to diversify; see Sec. 4.
landscape) or other settings of the controller (the operator selection scheme used) are stronger factors than a fixed "universal" value for θ. In order to elucidate the real effect of the angle on the search, we will observe how the search behaviour changes when the value of θ is modified during the search.
3.2 Towards a More Dynamic Control
We have defined different strategies in order to dynamically change the angle value during the search:
– splitting the execution time into several epochs and alternating the angle value between 0 and π/2 (ALWAYSMOVING);
– splitting the execution time into several epochs and distributing the angle values over equally spaced levels in [0, π/2], one per epoch; the angle can either increase (ANGLEINCREASE) or decrease (ANGLEDECREASE);
– changing the angle linearly over the execution time in [0, π/2], either increasing (LINEARINCREASE) or decreasing (LINEARDECREASE).
In the following pictures we plot, along with the population's cost and entropy, the value of the angle θ during the execution, in the range [0, π/2]. We have observed that the EA behaviour does not change progressively as a product of small angle variations, either when using the max or the mean criterion. In some cases there seems to exist a threshold beyond which the behaviour changes radically, supporting the "positive feedback" hypothesis proposed in [11], except that this threshold is not always found at θ = π/4. Notice, for instance, how MAB with the max criterion and the ANGLEINCREASE strategy switches near π/2 (Fig. 4b), while the switch occurs near π/4 for ANGLEINCREASE with mean (Fig. 4a) and for ANGLEDECREASE with both mean and max (Fig. 4d). In some other cases, particularly those mixing PM with decreasing strategies (with both mean and max), the behaviour seems to be unaffected by the value of θ (Fig. 4c). When increasing the angle, the mean criterion shows itself more reactive than max. We can say that defining a dynamic strategy which accounts for progressive angle variations does not represent an advantage in terms of control, since the search behaviour does not react to progressive angle changes, but rather to the crossing of a threshold. This is not satisfactory from a dynamic-behaviour point of view.
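The strategies above amount to simple angle schedules mapping the current step to a value of θ. The sketch below is an illustration; the number of epochs and the function names are assumptions, not the exact settings used in the experiments:

```python
import math

HALF_PI = math.pi / 2

def always_moving(step, total, epochs=6):
    # Alternate theta between 0 and pi/2 at each epoch (ALWAYSMOVING).
    epoch = min(step * epochs // total, epochs - 1)
    return HALF_PI if epoch % 2 else 0.0

def angle_increase(step, total, epochs=6):
    # Stepwise increase of theta over equally spaced levels in [0, pi/2]
    # (ANGLEINCREASE); ANGLEDECREASE would be the mirror image.
    epoch = min(step * epochs // total, epochs - 1)
    return HALF_PI * epoch / (epochs - 1)

def linear_increase(step, total):
    # Linear increase of theta from 0 to pi/2 (LINEARINCREASE);
    # LINEARDECREASE would return HALF_PI * (1 - step / total).
    return HALF_PI * step / total
```

Here `step` counts crossovers performed so far and `total` is the crossover budget of the run.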
The hypothesis of a threshold value for θ that depends on the instance being solved is supported by the observed behaviour of LINEARDECREASE and LINEARINCREASE on different instances (Fig. 5): when using MAB, there clearly exists an angle value that triggers a different search behaviour. Notice that the threshold value differs depending on whether we are increasing (Figs. 5c and d) or decreasing (Figs. 5a and b) θ. This could make it difficult to find a compromise value, since it would depend on the initial state of diversity. Given that θ produces basically two extreme behaviours, we could devise a control strategy that simply alternates between the minimum and maximum values in order to control both quality and diversity. This can be seen in the experiments carried out using the ALWAYSMOVING strategy, in which MAB clearly reacts to angle changes with both the mean and max impact computation criteria (Fig. 6).
4 Operators Management
Another relevant goal of this study is to investigate and understand how the control affects the operator selection during the search process. Some operators lead to the concentration of the population in specific search-space areas (thus favouring intensification), while others spread it out, thus favouring diversification. To this end, we have used a selection of 20 crossover operators from the set of more than 300 crossover operators defined in [13], combining four basic features. We have grouped them according to their expected effect on the population: diversification: 0035, 0015, 4455, 6011; intensification: 0011, 1111, 1122, 5011, 3332, 1134, 0022, 2352, 4454, 1224, 0013; neutral: 2455, 4335, 1125, 5035, 1335. Preliminary experiments have shown that, despite the above-mentioned expected effects, the behaviour induced by the operators may be rather unclear at some point, except for 6011, which has shown excellent diversification capabilities under every condition. We will focus on the behaviour of the operator selection module w.r.t. its skill in dealing with those operators during diverse search epochs. As a starting point, we show the operators' application frequency when using a fixed angle. When favouring diversification (θ = 0), both PM and MAB are able to identify operator 6011 as the one able to ensure the utmost diversification, using both max and mean. Notice that the probabilistic nature of PM can incur
Fig. 4. Experiments with ANGLEINCREASE and ANGLEDECREASE: (a) SIMON, MAB mean, ANGLEINCREASE; (b) SIMON, MAB max, ANGLEINCREASE; (c) 3BIT, PM max, ANGLEDECREASE; (d) 3BIT, MAB max, ANGLEDECREASE
Fig. 5. Experiments with LINEARINCREASE and LINEARDECREASE: (a) SIMON, MAB max, LINEARINCREASE; (b) SIMON, MAB mean, LINEARINCREASE; (c) SIMON, MAB max, LINEARDECREASE; (d) SIMON, MAB mean, LINEARDECREASE
the risk of applying a "wrong" operator if this has been applied in the early stages of the search. This is the case of operator 1111, which favours intensification but has been used by PM mean (Fig. 7a). MAB instead (Fig. 7b) does not show this shortcoming. At this point, we also have to remark that it is obviously difficult to intensify the search when the population has already reached a good mean quality; intensification operators may then exhibit a different behaviour when applied. In the following pictures, we display, along with cost, entropy and angle, the operators' relative application rates. When imposing the maximum desired quality (θ = π/2), MAB mean is able to identify the intensifying operator 1111 (Fig. 9a). The same happens with MAB max, even with a higher application magnitude (Fig. 9b). Note that, when applying MAB, the operator selection process causes all operators to be selected at least a minimum number of times throughout the search: for this reason, operators which are known not to provide any intensification are still deemed worth a try. This is the case of the diversifying operator 6011, which has been detected and used with both mean and max. This is different with PM: when 6011 is not randomly selected in the early search stage, it is not likely to be selected later, either using mean (Fig. 9c) or max (Fig. 9d). Notice that with PM operator 1111 has also been detected and used, but its magnitude is smaller than in the MAB case.
Fig. 6. SIMON, ALWAYSMOVING: (a) MAB max; (b) MAB mean
Fig. 7. Operator frequency, angle 0: (a) 3BIT, PM mean; (b) 3BIT, MAB mean
When dynamically changing the angle, the advantages of using MAB rather than PM become evident by analysing the operator frequencies: with ANGLEDECREASE, PM turns out to be utterly unable to detect operator 6011 as a diversifying agent, with both mean and max. This is due to the fact that in the first search stages, when intensification is advocated, high rewards are assigned to other operators. These high rewards cause them to be applied also in further search stages (Fig. 8e; only the mean case is reported). For each operator, several bars are plotted, each of which corresponds to a different search epoch, identified by a different angle value. MAB does not show this shortcoming, being able to correctly identify operator 6011 as the best diversifier, in both ANGLEINCREASE and ANGLEDECREASE (Fig. 8f). When applying ANGLEINCREASE, PM max soon identifies operator 6011 as the one apt to diversify, applying it less and less as the angle value increases (Fig. 8g). PM mean also discriminates among search epochs, but operator 6011 is uniformly applied over the first two epochs (Fig. 8h). Similar conclusions can be drawn when considering extreme and consecutive angle changes (ALWAYSMOVING): PM max is able to understand when to use operator 6011, starting from both θ = π/2 (Fig. 8a) and θ = 0 (Fig. 8b). PM mean is instead incapable of self-adapting to these angle changes, and furthermore shows a huge application of operator 2455, which is a neutral one (Fig. 8c).
Fig. 8. Operator frequency (SIMON): (a) PM, ALWAYSMOVING from π/2, max reward; (b) PM, ALWAYSMOVING from 0, max reward; (c) PM, ALWAYSMOVING from 0, mean reward; (d) MAB, ALWAYSMOVING, max reward; (e) PM, ANGLEDECREASE, mean reward; (f) MAB, ANGLEDECREASE, max reward; (g) PM, ANGLEINCREASE, max reward; (h) PM, ANGLEINCREASE, mean reward
Fig. 9. Operator frequency, angle π/2: (a) F500, MAB mean; (b) F500, MAB max; (c) F500, PM mean; (d) F500, PM max
MAB, instead, has no problem in self-adapting to extreme angle changes: using both mean and max, and in both ANGLEINCREASE and ANGLEDECREASE, its behaviour can be summarised by Fig. 8d.
5 Conclusion
The general parameter control paradigm is related to the algorithm selection problem [16], which consists in selecting the most efficient algorithm for solving the problem at hand. Here, this selection is adaptive or dynamic, and this general paradigm may open a new perspective for finding Free Lunch theorems [15]. At the least, there is a need for more formalized tools and criteria to compare algorithms' performance in terms of reliability, adaptability and autonomy. We have shown that changing the angle value during the search allows the controller to control the desired features. Since diversity depends not only on the value of θ but also on its previous state, extreme changes in θ seem to be the most straightforward way to control diversity. A finer diversity control could be achieved using two approaches: either defining how long these extreme values should be maintained, or closely monitoring the obtained diversity values and adjusting θ according to the strategy schedule. Additional experiments are being carried out, which embed a mechanism to insert new operators on the fly. Further work will address the dynamic adjustment of the angle value based on the difference between the desired and actual levels of the intensification–diversification trade-off.
Acknowledgement. This work was supported by the French Chilean ECOS program C10E07.
References
1. Biere, A., Heule, M., Van Maaren, H., Walsh, T. (eds.): Handbook of Satisfiability. Frontiers in Artificial Intelligence and Applications, vol. 185. IOS Press, Amsterdam (2009)
2. Birattari, M., Stützle, T., Paquete, L., Varrentrapp, K.: A racing algorithm for configuring metaheuristics. In: Proc. GECCO 2002, pp. 11–18. Morgan Kaufmann, San Francisco (2002)
3. Da Costa, L., Fialho, Á., Schoenauer, M., Sebag, M.: Adaptive operator selection with dynamic multi-armed bandits. In: Proc. GECCO 2008, pp. 913–920. ACM Press, New York (2008)
4. Eiben, A.E., Hinterding, R., Michalewicz, Z.: Parameter control in evolutionary algorithms. IEEE Trans. Evolutionary Computation 3(2), 124–141 (1999)
5. Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Natural Computing Series. Springer, Heidelberg (2003)
6. Fialho, Á., Da Costa, L., Schoenauer, M., Sebag, M.: Extreme value based adaptive operator selection. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN X. LNCS, vol. 5199, pp. 175–184. Springer, Heidelberg (2008)
7. Gong, W., Fialho, Á., Cai, Z.: Adaptive strategy selection in differential evolution. In: Proc. GECCO 2010, pp. 409–416. ACM Press, New York (2010)
8. Hutter, F., Hoos, H.H., Stützle, T.: Automatic algorithm configuration based on local search. In: Proc. AAAI 2007, pp. 1152–1157 (2007)
9. Linhares, A., Yanasse, H.: Search intensity versus search diversity: a false trade off? Applied Intelligence 32(3), 279–291 (2010)
10. Lobo, F., Lima, C., Michalewicz, Z. (eds.): Parameter Setting in Evolutionary Algorithms. Studies in Computational Intelligence, vol. 54. Springer, Heidelberg (2007)
11. Maturana, J., Fialho, Á., Saubion, F., Schoenauer, M., Sebag, M.: Compass and dynamic multi-armed bandits for adaptive operator selection. In: Proc. CEC 2009, pp. 365–372. IEEE, Los Alamitos (2009)
12. Maturana, J., Saubion, F.: A compass to guide genetic algorithms. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN X. LNCS, vol. 5199, pp. 256–265. Springer, Heidelberg (2008)
13. Maturana, J., Lardeux, F., Saubion, F.: Autonomous operator management for evolutionary algorithms. Journal of Heuristics 16(6), 881–909 (2010)
14. Nannen, V., Smit, S.K., Eiben, A.E.: Costs and benefits of tuning parameters of evolutionary algorithms. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN X. LNCS, vol. 5199, pp. 528–538. Springer, Heidelberg (2008)
15. Poli, R., Graff, M.: There is a free lunch for hyper-heuristics, genetic programming and computer scientists. In: Vanneschi, L., Gustafson, S., Moraglio, A., De Falco, I., Ebner, M. (eds.) EuroGP 2009. LNCS, vol. 5481, pp. 195–207. Springer, Heidelberg (2009)
16. Rice, J.R.: The algorithm selection problem. Advances in Computers 15, 65–118 (1976)
17. Thierens, D.: Adaptive strategies for operator allocation. In: Parameter Setting in Evolutionary Algorithms, pp. 77–90. Springer, Heidelberg (2007)
18. Lardeux, F., Saubion, F., Hao, J.K.: GASAT: a genetic local search algorithm for the satisfiability problem. Evolutionary Computation 14(2), 223–253 (2006)
Geometric Generalisation of Surrogate Model Based Optimisation to Combinatorial Spaces Alberto Moraglio and Ahmed Kattan School of Computing and Centre for Reasoning, University of Kent, Canterbury, UK College of Computer and Information Systems, Um Alqura University, Saudi Arabia
[email protected],
[email protected]
Abstract. In continuous optimisation, Surrogate Models (SMs) are often indispensable components of optimisation algorithms aimed at tackling real-world problems whose candidate solutions are very expensive to evaluate. Because of the inherent spatial intuition behind these models, they are naturally suited to continuous problems but they do not seem applicable to combinatorial problems except for the special case when solutions are naturally encoded as integer vectors. In this paper, we show that SMs can be naturally generalised to encompass combinatorial spaces based in principle on any arbitrarily complex underlying solution representation by generalising their geometric interpretation from continuous to general metric spaces. As an initial illustrative example, we show how Radial Basis Function Networks (RBFNs) can be used successfully as surrogate models to optimise combinatorial problems defined on the Hamming space associated with binary strings.
1 Introduction
Some types of tasks, when cast as optimisation problems, give rise to objective functions which are prohibitively expensive to evaluate. Furthermore, oftentimes these problems are black-box problems, i.e., their problem class is unknown, and they are possibly mathematically ill-behaved (e.g., discontinuous, non-linear, non-convex). For example, most engineering design problems are of this type (see e.g., [12]). They require experiments and/or simulations to evaluate to what extent the design objective has been met as a function of the parameters controlling the design. In evolutionary computation parlance, the controlling parameters are the genotype that encodes the design solution (i.e., the phenotype), which needs to be expressed via an expensive simulation (i.e., the growth function) to be evaluated (i.e., fitness evaluation). The simulation can take many minutes, hours, or even days to complete. Optimisation methods based on surrogate models, also known as response surface models, have been successfully employed to tackle expensive objective functions. For a survey on surrogate model based optimisation methods refer to [6]. A surrogate model is a mathematical model that approximates as precisely as possible the expensive objective function of the problem at hand, and that is computationally much cheaper to evaluate. The objective function is
P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 142–154, 2011.
© Springer-Verlag Berlin Heidelberg 2011
considered unknown. The surrogate model is built solely from the available known values of the expensive objective function evaluated on a set of solutions. We refer to the pair (solution, known objective function value) as a data-point. When the search space is the real line (i.e., solutions are real numbers), perhaps the simplest example of a surrogate model is linear interpolation, which determines a function from the graph of the available data-points by linking each data-point to the closest one with a straight line. A class of more natural-looking surrogate models for the real line is the class of polynomial curves of suitable degree, which can be used to interpolate all data-points and which are everywhere differentiable. These and other methods to build surrogate models on the real line naturally extend to higher-dimensional spaces, giving rise to various forms of spatial interpolation and spatial regression. The traditional procedure of surrogate model based optimisation (SMBO) [6] is outlined in Algorithm 1. An initial surrogate model is constructed using the objective values of a small set of solutions evaluated using the expensive objective function. The remaining expensive objective function evaluations out of a limited budget are applied to candidate solutions which the surrogate model predicts to have promising performance. The process interleaves searching the surrogate model to obtain its optimum, evaluating the optimum solution of the model using the expensive objective function, and updating the surrogate model with the new data-point. Note that the role of the evolutionary algorithm in the SMBO procedure is to infer the location of a promising solution of the problem using the surrogate model; it is not directly applied to the original problem with the expensive objective function.
This is feasible because the computational cost of a complete run of the evolutionary algorithm on the surrogate model is negligible (in the order of a few seconds) compared with the cost of evaluating a solution using the expensive objective function of the problem (in the order of minutes, hours or even days, depending on the problem).
Algorithm 1. Surrogate Model Based Optimisation
1. Sample uniformly at random a small set of candidate solutions and evaluate them using the expensive objective function (initial set of data-points)
2. while limit number of expensive function evaluations not reached do
3.   Construct a new surrogate model using all data-points available
4.   Determine the optimum of the surrogate model by search, e.g., using an evolutionary algorithm (this is feasible as the model is cheap to evaluate)
5.   Evaluate the solution which optimises the surrogate model in the problem with the expensive objective function (additional data-point available)
6. end while
7. Return the best solution found (the best in the set of data-points)
Virtually all surrogate models are implicitly or explicitly spatial models as their predictions involve exploiting some assumed spatial relations (e.g., smoothness) between the values of the objective function at a query point whose value
is unknown and has to be predicted, and the known data-points. This makes SMBOs naturally suited to continuous optimisation. However, they do not seem to be applicable to combinatorial optimisation problems, except in those cases in which solutions are naturally represented as vectors of integers, in which case adequately discretized versions of the surrogate model may be used. Furthermore, when solutions are vectors, integer or real, a host of techniques to build functions from data-points can be borrowed from statistics (e.g., multi-variate regression [3]) and machine learning (e.g., supervised learning with neural networks and support vector machines [9]), which can be used to build surrogate models. There is an increasing number of optimisation problems naturally associated with complex solution representations which also have very expensive objective functions. For example, permutations and related representations are natural representations for solutions of many types of scheduling problems. In real-world problems, candidate scheduling solutions may need to be tested in a complex setting by running a computationally expensive simulation of, for example, an entire production process, in order to be evaluated. Variable-length sequences are natural representations for biological sequences in bio-informatics problems. In this context, surrogate model based optimisation may be used to determine which particular biological sequences to study in detail or to simulate at an atomic level because, for example, they are more likely to correspond to proteins with desired target properties/functions. Genetic Programming, which normally uses a tree representation, has a number of application domains with expensive objective functions.
For example, one of them is when genetic programs encode behavioural controllers of robots that may need to be tested in a virtual or real environment a number of times to assess how good the controller is at controlling the robot for certain target tasks (e.g., wall-following or obstacle avoidance). The current situation of surrogate models with regard to solution representations is as follows. There is much existing literature on surrogate model based optimisation using evolutionary algorithms or other search algorithms to optimise the surrogate model for continuous spaces; see for example the survey [5]. Some recent work covers the case when the underlying solution representation is discrete vectors (e.g., [1]). There are works focusing on specific real-world applications with expensive objective functions which are inherently combinatorial problems with structured solutions (e.g., graphs), which are approached by encoding solutions in vectorial form in order to use standard surrogate models (e.g., [13][2]). There are also approaches in which evolutionary algorithms are not used to search the surrogate model but to train the surrogate model on the known data-points, see e.g. [8], in which Genetic Programming is used to do symbolic regression to determine the best-fitting vector-input function for the data-points. To the authors' best knowledge, there are no works in the literature on surrogate models defined directly on more complex representations than vectors. Therefore, for search problems naturally based on structured representations, surrogate models can be used only after shoe-horning the original representation into a vector form. This introduces extra non-linearity in the target expensive objective function, thus making it harder to learn and consequently requiring more expensive samples to
approximate it well enough to locate its optimum. The aim of this paper is to introduce a framework that allows us to define surrogate models systematically, directly on any underlying solution representation, allowing us to choose the most natural representation for the problem at hand. A geometric framework of recent introduction [10] has been successfully used to generalise various search algorithms from continuous spaces to combinatorial spaces in a principled way. This paper shows that the geometric methodology extends naturally to the generalisation of machine learning algorithms. The method is conceptually simple. Firstly, (i) the original algorithm on the continuous space is rewritten only as a function of the Euclidean distance between points. Algorithms that are inherently spatial can often be rewritten in these terms. Then (ii) the generalisation is done by replacing the Euclidean distance with a generic distance (i.e., formally a metric), obtaining a general, formally well-defined algorithm. Finally, (iii) the formal algorithm can be specified to any solution representation by specifying it to an edit distance directly defined on the target representation. The algorithms generalised using the geometric methodology can be naturally specified to complex representations because many types of structured objects admit natural notions of distance/similarity between them. In particular, edit distances are well suited to structured objects. The edit distance between two configurations is the minimum number of edit operations required to transform one of them into the other. Edit operations are unitary modifications that change one configuration into a similar configuration. For example, the Hamming distance is an edit distance between binary strings based on the bit-flip edit move. On permutations, the swap distance is the minimum number of swaps of two elements needed to sort one permutation into the other.
On variable-length sequences, the Levenshtein distance is the minimum number of insertions, deletions, or substitutions of characters needed to transform one sequence into the other. There are also various types of edit distances defined on trees and graphs, based on moves editing edges and nodes. In the remainder of the paper, we show how a supervised machine learning algorithm for learning functions from data, namely Radial Basis Function Networks (RBFNs) (see, e.g., [4]), can be successfully generalised to encompass any solution representation using the geometric methodology. The generalised algorithm obtained is then formally instantiated to the binary string representation endowed with the Hamming distance. As a preliminary experimental analysis, we use the resulting learning model within an SMBO algorithm to optimise a well-known test-bed of problems on binary strings, the NK-landscapes [7], which we treat as having a costly objective function¹. The generalised RBFN model derived in this paper is very general and can in principle be used with any solution representation.

¹ The aim of the present paper is not to show that the generalised SMBO can be competitive on real-world problems with expensive objective functions and structured representations. It is to show that the generalised SMBO can in principle be applied to such cases, and that it provides meaningful results when applied to well-studied toy problems on a simple discrete space. This preliminary step is necessary because the transition from continuous to discrete spaces is a large conceptual leap, and there is no guarantee that such an approach would work on even simple discrete spaces.
A. Moraglio and A. Kattan
Furthermore, other learning algorithms naturally defined in spatial terms, e.g., spatial regression algorithms such as Gaussian Process Regression [11], can be generalised analogously.
2  Radial Basis Function Networks
In the machine learning literature, there are a number of approaches to “learning” a function belonging to a certain class of functions from data-points (i.e., finding a function in that class that interpolates and best fits the data-points according to some criterion), which can be naturally cast purely in terms of (Euclidean) distances between points, and hence readily generalised to metric spaces by replacing the Euclidean distance with a general metric. These include Nearest-Neighbors Regression, Inverse Distance Weighting Interpolation, Radial Basis Function Network Interpolation, and Gaussian Process Regression (also known as Kriging). The first two methods are simpler, but they are not adequate as surrogate models because the global optimum of the function learnt from the data-points coincides with a data-point used in its construction. Consequently, these methods cannot be used to suggest a solution that improves over the known data-points (i.e., they cannot extrapolate from the data-points). Gaussian Process Regression is a very powerful method with a solid theoretical foundation, which not only can make a rational extrapolation about the location of the global optimum, but also gives an interval of confidence about the prediction made. Radial Basis Function Network Interpolation is conceptually simpler than Gaussian Process Regression and can extrapolate the global optimum from the known data-points. In this paper, we focus on RBFNs, and leave the generalisation of Gaussian Process Regression as future work.

2.1  Classic RBFNs
A radial basis function (RBF) is a real-valued function φ : R^n → R whose value depends only on the distance from some point c, called a center, so that φ(x) = φ(‖x − c‖). The point c is a parameter of the function. The norm is usually Euclidean, so ‖x − c‖ is the Euclidean distance between c and x. Other norms are also possible and have been used. Commonly used types of radial basis functions include Gaussian functions, multi-quadrics, poly-harmonic splines, and thin plate splines. The most frequently used are Gaussian functions of the form

    φ(x) = exp(−β ‖x − c‖²),

where β > 0 is the width parameter. Radial basis functions are typically used to build function approximations of the form

    y(x) = w_0 + Σ_{i=1}^{N} w_i φ(‖x − c_i‖).
Therefore the approximating function y(x) is represented as a sum of N radial basis functions, each associated with a different center c_i, a different width β_i, and
weighted by an appropriate coefficient w_i, plus a bias term w_0. Any continuous function can in principle be approximated with arbitrary accuracy by a sum of this form, if a sufficiently large number N of radial basis functions is used. In an RBF network there are three types of parameters that need to be determined to optimise the fit between y(x) and the data: the weights w_i, the centers c_i, and the RBF width parameters β_i. The most common method to find these parameters has two phases. Firstly, using unsupervised learning (i.e., clustering), the positions of the centers and the widths of the RBFs are determined. Then an optimal choice of the weights w_i is made by least squares minimisation. A widely applied simplified procedure to fit RBF networks to the data, which skips the unsupervised learning phase, consists of choosing the centers c_i to coincide with the known points x_i, and choosing the widths β_i according to some heuristic based on the distance to the nearest neighbors of the center c_i (local model), or fixing all widths to the same value, taken proportional to the maximum distance between the chosen centers (global model). The bias w_0 can be set to the mean of the function values b_i at the known data-points (i.e., the function values of the points in the training set), or set to 0. Under these conditions, the weights w_i can be determined by solving the system of N simultaneous linear equations in w_i obtained by requiring that the unknown function interpolates exactly the known data-points: y(x_i) = b_i, i = 1 … N. Setting g_ij = φ(‖x_j − x_i‖), the system can be written in matrix form as Gw = b. The matrix G is non-singular if the points x_i are distinct and the family of functions φ is positive definite (which is the case for Gaussian functions), and thus the weights w can be found by simple linear algebra: w = G⁻¹ b.
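The simplified fitting procedure above can be sketched as follows (an illustrative reading of the text, not the authors' code; the function names and the global-width heuristic β = 1/(2D²) are our choices):

```python
import numpy as np

def fit_rbfn(X, b):
    """Interpolating Gaussian RBFN: centers chosen to coincide with the data-points."""
    X, b = np.asarray(X, float), np.asarray(b, float)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    beta = 1.0 / (2.0 * dists.max() ** 2)   # single global width (our heuristic)
    w0 = b.mean()                           # bias set to the mean of the known values
    G = np.exp(-beta * dists ** 2)          # g_ij = phi(||x_j - x_i||)
    w = np.linalg.solve(G, b - w0)          # interpolation condition y(x_i) = b_i
    return X, w, w0, beta

def rbf_predict(model, x):
    """Evaluate y(x) = w_0 + sum_i w_i phi(||x - c_i||)."""
    X, w, w0, beta = model
    d = np.linalg.norm(X - np.asarray(x, float), axis=1)
    return w0 + w @ np.exp(-beta * d ** 2)

model = fit_rbfn([[0.0], [1.0], [2.0]], [0.0, 1.0, 0.0])
print(rbf_predict(model, [1.0]))  # interpolates the training value: ~1.0
```

Since the Gaussian family is positive definite, G is invertible for distinct points and the fitted model passes exactly through the training data.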
2.2  Generalisation of RBFNs to Arbitrary Representations
To generalise RBFNs we need to generalise: (i) the class of functions used as approximants of the unknown function; (ii) the training procedure that determines the function within that class that best fits the data-points; (iii) the model query procedure that, given a query point whose value is unknown, returns its predicted value. Following the geometric methodology for the generalisation, we first need to rewrite each of the above three elements only as a function of the Euclidean distance, then substitute the Euclidean distance with a generic metric, obtaining a formal generalisation of the original algorithm, and finally specify the formal algorithm to a distance rooted in the target representation, obtaining the specific instance of the algorithm for that representation. If all these steps are possible, then the generalisation of the algorithm has been successful. In the following, we show that the geometric methodology can indeed be applied successfully to generalise RBFNs.
Let M be a metric space endowed with a distance function d. A radial basis function φ : R^n → R whose value depends only on the distance from some point c ∈ R^n, so that φ(x) = φ(‖x − c‖), can be naturally generalised to a function φ : M → R whose value depends only on the distance from some point c ∈ M in the metric space, so that φ(x) = φ(d(x, c)). For example, the generalised Gaussian functions are obtained by replacing the Euclidean distance with the generic metric d in the original definition: φ(x) = exp(−β d(x, c)²). A set of configurations endowed with a notion of edit distance is a metric space, as all edit distances meet the metric axioms. Consequently, a generalised radial basis function is well-defined on any set of configurations; in other words, it is a representation-independent function. For example, the set of binary strings H endowed with the Hamming distance hd forms a metric space. Therefore the generalised Gaussian functions, when the Hamming distance hd is specified as the metric d, become well-defined functions φ : H → R, which map binary strings to reals (note that both c and x are binary strings). The same generalised Gaussian functions are well-defined functions mapping permutations to reals when the swap distance on permutations is specified as the metric d. The approximating model y(x), which is a linear combination of radial basis functions, can be generalised by considering a linear combination of generalised radial basis functions: y(x) = w_0 + Σ_{i=1}^{N} w_i φ(d(x, c_i)). Like its components, the generalised approximating model is representation-independent, and it can be specified to any underlying solution representation by specifying as the underlying metric d a distance function rooted in the target representation.
Interestingly, a generalised approximating model is a way of representing a very large family of functions on general metric spaces, parameterised by the center locations c_i, the weights w_i, the widths β_i, and the specific underlying metric d. When the underlying metric space is finite (as it is in combinatorial optimisation problems), any function can in principle be approximated with arbitrary accuracy by a sum of this form, if a sufficiently large number N of radial basis functions is used². The method to fit the model to the known data-points does not refer explicitly to their underlying representation but depends solely on the distances between the known data-points, taking g_ij = φ(d(x_j, x_i)), and on the known objective values b_i. Therefore, in effect, model-fitting is also representation-independent. In particular, the simplified model-fitting procedure, which fixes the centers and the widths and determines the weights w_i by least squares minimisation, can be carried out by solving the system Gw = b, regardless of the underlying representation. Notice, however, that when the distance function d is not embeddable in the Euclidean space, the radial basis functions which are positive definite on the Euclidean space are not necessarily positive definite with regard to the distance function d. In turn, the matrix G is not necessarily a positive definite matrix, hence the existence of the inverse matrix G⁻¹ needed to determine the weights w_i is not guaranteed. This difficulty can be overcome by considering the pseudo-inverse of the matrix G, which always exists, is unique, and corresponds to the traditional inverse when the latter exists. It can be shown that the weights w_i determined by solving the system Gw = b using the pseudo-inverse are exactly those obtained by least squares minimisation. As for querying the model with a point to obtain the predicted value, clearly the query point is represented using the underlying representation (e.g., the query point is a binary string when the model is specified to the Hamming distance on binary strings). However, importantly, the way of calculating the predicted value is representation-independent, as it depends not on the underlying representation but only on the distances between the query point and the center points. In summary, the definition, learning and querying of RBFNs naturally generalise from Euclidean spaces to general metric spaces. The generalised model applies to any underlying representation once a distance function rooted in that representation is provided. In particular, this method can be used as-is to learn, in principle, any function mapping complex structured representations to reals. It is important to note that it is the generalised RBFN learning that adapts to the target representation, rather than the other way around. In particular, there is no requirement that the target representation be shoehorned into a vector of features. This allows us to choose the most natural representation and distance for the task at hand, which, as discussed earlier, is likely to make the learning easier. A further point to note is that the adaptation of the general model to the specific representation is done by formal instantiation of a generic metric to a distance function associated with the target representation; in particular, the model adapts naturally to the target representation without introducing any arbitrary ad-hoc element to deal with more complex representations.

² To see this, consider the extreme case in which every point in the space is associated with a radial basis function. In this case, it is always possible to choose the weights of the bases to fit the function values at each location.
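A minimal sketch of the generalised model instantiated to binary strings with the Hamming distance, using the pseudo-inverse to compute the weights as discussed above (illustrative; the function names are ours):

```python
import numpy as np

def hamming(x, y):
    """Hamming distance between two equal-length binary strings."""
    return sum(a != b for a, b in zip(x, y))

def fit_generalised_rbfn(points, b, d=hamming):
    """Fit the generalised RBFN over an arbitrary metric d via the pseudo-inverse."""
    b = np.asarray(b, float)
    n = len(points)
    D = np.array([[d(points[i], points[j]) for j in range(n)]
                  for i in range(n)], float)
    beta = 1.0 / (2.0 * D.max() ** 2)     # global width heuristic (our choice)
    w0 = b.mean()                         # bias = mean of known values
    G = np.exp(-beta * D ** 2)            # g_ij = phi(d(x_j, x_i))
    w = np.linalg.pinv(G) @ (b - w0)      # least-squares weights; G may be singular
    return points, w, w0, beta, d

def gen_rbf_predict(model, x):
    """y(x) = w_0 + sum_i w_i phi(d(x, c_i)); only distances to centers are used."""
    points, w, w0, beta, d = model
    phi = np.exp(-beta * np.array([d(x, c) for c in points], float) ** 2)
    return w0 + w @ phi

model = fit_generalised_rbfn(["000", "011", "110"], [0.0, 1.0, 2.0])
print(gen_rbf_predict(model, "011"))  # interpolates the training value: ~1.0
```

Swapping `hamming` for any other metric (e.g., a swap distance on permutations) changes nothing else in the code, which is the representation-independence the text describes.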
Finally, this way of looking at learning algorithms is both formal and general, and naturally bridges continuous and combinatorial spaces. Naturally, the fact that this generalised model is well-defined on any representation is logically independent of whether it works well on all representations and on problems occurring in practice. Therefore, it is important to test the framework experimentally for different representations and types of problems. In the following section, we present initial experiments in which the framework is applied to binary strings³ on a standard test-bed. In future work, we will investigate how the framework performs when specified to more complex representations and to problems with expensive evaluation functions arising in practice.
3  Experiments
Experiments have been carried out using the well-known NK-landscape problem [7], which provides a tunable set of rugged, epistatic landscapes over a space of binary strings, and which we treat as having a costly objective function³.

³ Binary strings are of course a special type of vector. However, they are a valid representation to use as an illustrative example of the application of the generalised SMBO to combinatorial spaces, because their property of being vectors is not utilised.

In our
experiments, in order to evaluate the performance of the SMBO algorithm under different conditions of problem size and ruggedness, we use landscapes of size n = 10, 15, 20, 25 for k = 2, 3, 4, 5, for a total of 16 combinations. We use a standard surrogate model based optimisation algorithm (see Algorithm 1). The surrogate model used is an RBFN model fitted to the available data-points using the simplified learning procedure presented in the previous section. The centers c_i of the radial basis functions are chosen to coincide with all available data-points. The widths of the radial basis functions are all assigned the same value β = 1/(2D²), where D is the maximum distance between all centers. With this setting of β, each radial basis function is “spread on the space” to cover all other centers, so that each known function value at a center can potentially contribute significantly to the prediction of the function value of any point in space, and not only locally to the function values of points near the given center (i.e., we force the surrogate model to be a global approximating function). The value of the bias term w_0 is set to the average function value of the known data-points, i.e., the average of the vector b. In this way, the predicted function value of a point which is out of reach of the influence of all centers is by default set to the average of their function values. The coefficients w_i of the radial basis functions in the linear model are determined by least squares minimisation as described in the previous section. The other settings of surrogate model based optimisation are as follows. We set the parameters as a function of the problem size n. The total number of available expensive function evaluations is set to n². So, essentially, our aim is to find the best solution the algorithm can produce in quadratic time out of an exponential number of candidate solutions (i.e., 2^n).
This setting is just a term of reference, as for different problems one may have a different number of evaluations available relative to the problem size. We set the size of the initial sample of data-points to two, and the number of sample points suggested by the surrogate model to n² − 2. This setting is consistent with the working hypothesis that the surrogate model is better than random sampling at suggesting promising solutions that improve over the known data-points, as it uses the surrogate model to make predictions as much as possible. To search the surrogate model we use a standard generational evolutionary algorithm with tournament selection of tournament size two, uniform crossover with crossover rate 0.5, and bitwise mutation with mutation rate 1/n. The population size and the number of generations are both set to 10n, which provides the evolutionary algorithm with an abundant number of trials to locate the optimum, or a near-optimal solution, of the surrogate model. If the predicted objective value of the best solution found on the surrogate model is better than the best known objective value of the known data-points, then the model was able to extrapolate from the data, and that solution is evaluated with the expensive objective function. Otherwise, the surrogate model has failed to suggest a promising solution that improves over the known best, and a solution sampled uniformly at random is evaluated with the expensive objective function, in an attempt to gather more data about under-sampled regions of the problem and improve the accuracy of the surrogate model for subsequent searches on the model.
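The SMBO loop just described can be sketched at a high level as follows (an illustrative reconstruction, not the authors' Algorithm 1; the fit/search/predict components are passed in as parameters so that, e.g., the generalised RBFN and an inner evolutionary algorithm can be plugged in):

```python
import random

def smbo(expensive_f, n, budget, fit, search_model, predict):
    """Maximise expensive_f over binary strings of length n using `budget` evaluations."""
    # Initial sample of two random points, evaluated with the expensive function.
    sample = [[random.randint(0, 1) for _ in range(n)] for _ in range(2)]
    values = [expensive_f(x) for x in sample]
    while len(sample) < budget:
        model = fit(sample, values)             # e.g. the generalised RBFN
        cand = search_model(model, n)           # e.g. an inner evolutionary algorithm
        if predict(model, cand) > max(values):  # model predicts an improvement
            x = cand
        else:                                   # model failed: sample at random to
            x = [random.randint(0, 1) for _ in range(n)]  # improve model accuracy
        sample.append(x)
        values.append(expensive_f(x))
    best = max(range(len(values)), key=values.__getitem__)
    return sample[best], values[best]
```

Only the fallback-to-random rule and the two-point initial sample are taken from the text; everything else (signatures, names) is our framing.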
To have a reference for the performance of the SMBO algorithm, we compared it with Random Search (RS), a standard (1+1) Evolutionary Algorithm ((1+1) EA), and a generational Evolutionary Algorithm (EA) applied directly to the problem with the expensive objective function. The reason we included random search in the set of algorithms is that, whereas it is safe to assume that in practice evolutionary algorithms are better than random search in normal circumstances, with small samples random search can do relatively well. We gave all algorithms in the comparison exactly the same number of expensive objective function evaluations, namely n² trials, and report the best solution found. The (1+1) EA has a population of a single individual and uses bitwise mutation with bit-flip probability 1/n. The EA has a population of n individuals, runs for n generations, and uses tournament selection of size two, bitwise mutation with bit-flip probability 1/n, and uniform crossover with crossover rate 0.5. For each considered combination of the parameters n and k, we generated a single fitness landscape and did 10 independent runs of the above algorithms on it. For each problem, we also estimated the global optimum by running an evolutionary algorithm with a very large population (1000 individuals) and a very large number of generations (1000 generations).

Table 1. Results on the NK-landscape benchmark. Mean and max over 10 independent runs of the best solution found by each algorithm, for all combinations of k = 2, 3, 4, 5 and n = 10, 15, 20, 25. (In the original, bold numbers are the highest max and underlined numbers the second-highest max.)

                SMBO            EA              (1+1) EA        RS
    K (opt)     mean   max      mean   max      mean   max      mean   max
    N=10
    2 (0.704)   0.702  0.704    0.686  0.704    0.675  0.698    0.649  0.704
    3 (0.794)   0.775  0.794    0.724  0.794    0.705  0.745    0.724  0.794
    4 (0.787)   0.755  0.787    0.725  0.787    0.714  0.787    0.727  0.787
    5 (0.810)   0.762  0.810    0.706  0.727    0.729  0.810    0.718  0.810
    N=15
    2 (0.743)   0.742  0.743    0.693  0.714    0.628  0.681    0.674  0.714
    3 (0.738)   0.718  0.738    0.678  0.706    0.622  0.706    0.677  0.717
    4 (0.747)   0.721  0.747    0.685  0.711    0.646  0.705    0.680  0.710
    5 (0.760)   0.737  0.758    0.711  0.749    0.672  0.728    0.700  0.757
    N=20
    2 (0.729)   0.726  0.729    0.689  0.718    0.613  0.668    0.673  0.711
    3 (0.777)   0.767  0.777    0.718  0.761    0.606  0.639    0.706  0.777
    4 (0.775)   0.747  0.775    0.708  0.731    0.640  0.676    0.684  0.707
    5 (0.766)   0.744  0.761    0.710  0.745    0.637  0.709    0.684  0.721
    N=25
    2 (0.753)   0.747  0.753    0.698  0.727    0.590  0.679    0.673  0.701
    3 (0.798)   0.781  0.798    0.727  0.742    0.607  0.666    0.698  0.749
    4 (0.775)   0.743  0.762    0.714  0.750    0.595  0.639    0.679  0.695
    5 (0.774)   0.736  0.756    0.713  0.751    0.622  0.705    0.676  0.722
Table 1 reports the results of the comparison. SMBO is consistently the best, both in terms of the average and the max of the best solution found, for all combinations of n and k considered. Furthermore, in 12 out of 16 cases, SMBO was able to find the (estimated) true optimum. As the problem size (n) increases, SMBO's advantage over the other algorithms grows. As expected, as the ruggedness (k) of the problem increases, the performance of SMBO and of the other search algorithms degrades relative to the estimated optimum. As for the other algorithms in the comparison, the population-based EA generally does better than the (1+1) EA and RS. In particular, the population seems to help on larger instances of the problem. Perhaps surprisingly, random search often does better than the (1+1) EA. This is because the (1+1) EA can easily get trapped in local optima, whereas the solutions found by random search exhibit a large variance in quality, so the max best solution found can be competitive “by a stroke of luck”, especially with small sample sizes and on small problems. In summary, analogously to the continuous case, the surrogate model on the Hamming space really helps in finding better solutions than standard search algorithms do. This makes the application of this framework to more complex solution representations associated with combinatorial spaces very promising.
4  Conclusions and Future Work
There are many potentially interesting applications of a surrogate model based optimisation framework that can naturally encompass more complex representations beyond the traditional vector-based one. We advocate that allowing a natural representation for the problem at hand makes the task of learning the underlying surrogate model easier, hence more efficient, as no extra non-linearity due to shoehorning the solutions into vectors is introduced. Also, a direct approach to representations greatly enlarges the scope of SMBO to complex representations (e.g., Genetic Programming trees) which cannot be naturally mapped to vectors of features. In this paper, we have outlined a conceptually simple, formal, general and systematic approach to adapting an SMBO algorithm to any target representation. This approach has been derived using a geometric methodology for generalising search algorithms from continuous to combinatorial spaces, which has been successfully applied in the past to generalise other types of search algorithms. This methodology requires writing the original continuous algorithm only in terms of Euclidean distances between candidate solutions, which can then be generalised by replacing the Euclidean distance function with a generic metric. The formal algorithm obtained can then be formally specified to any target representation by employing as underlying metric a distance rooted in the target representation (e.g., an edit distance). Multivariate interpolation and regression methods for building surrogate models are well suited to this methodology, as they are inherently based on spatial notions. We showed how radial basis function networks can be naturally generalised to encompass any representation. This is possible because
both the approximating model and the learning of the parameters of the model can be cast in a completely representation-independent way, relying only on distance relations between training instances and query instances. As a preliminary experimental validation of the framework, we considered the binary string representation endowed with the Hamming distance and tested the SMBO on the NK-landscapes, consistently finding that, with the same budget of expensive function evaluations, the SMBO performs best in a comparison with other search algorithms. This shows that the framework has the potential to work well on other, more complex representations associated with combinatorial spaces. Much work remains to be done. Firstly, we will test the framework on standard problems for more complex representations, such as permutations, variable-length sequences, and trees. Then we will test how the system performs on a number of challenging real-world problems. We will also experiment with different types of radial basis functions besides the Gaussian, and with more complex learning settings (i.e., learning the centers and the widths of the radial basis functions). Lastly, we will attempt the generalisation of more sophisticated interpolation and regression methods, including Gaussian process regression, which is the state of the art in machine learning.
References

1. Bajer, L., Holeňa, M.: Surrogate model for continuous and discrete genetic optimization based on RBF networks. In: Fyfe, C., Tino, P., Charles, D., Garcia-Osorio, C., Yin, H. (eds.) IDEAL 2010. LNCS, vol. 6283, pp. 251–258. Springer, Heidelberg (2010)
2. Castillo, P., Mora, A., Merelo, J., Laredo, J., Moreto, M., Cazorla, F., Valero, M., McKee, S.: Architecture performance prediction using evolutionary artificial neural networks. In: Giacobini, M., Brabazon, A., Cagnoni, S., Di Caro, G.A., Drechsler, R., Ekárt, A., Esparcia-Alcázar, A.I., Farooq, M., Fink, A., McCormack, J., O'Neill, M., Romero, J., Rothlauf, F., Squillero, G., Uyar, A.Ş., Yang, S. (eds.) EvoWorkshops 2008. LNCS, vol. 4974, pp. 175–183. Springer, Heidelberg (2008)
3. Cressie, N.A.C.: Statistics for Spatial Data, revised edn. Wiley, Chichester (1993)
4. Jain, L.C.: Radial Basis Function Networks. Springer, Heidelberg (2001)
5. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing Journal 9(1), 3–12 (2005)
6. Jones, D.R.: A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization 21(4), 345–383 (2001)
7. Kauffman, S.: Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, Oxford (1993)
8. Lew, T.L., Spencer, A.B., Scarpa, F., Worden, K., Rutherford, A., Hemez, F.: Identification of response surface models using genetic programming. Mechanical Systems and Signal Processing 20, 1819–1831 (2006)
9. Mitchell, T.: Machine Learning. McGraw Hill, New York (1997)
10. Moraglio, A.: Towards a geometric unification of evolutionary algorithms. PhD thesis, University of Essex (2007)
11. Rasmussen, C.E., Williams, C.: Gaussian Processes for Machine Learning. The MIT Press, Cambridge (2006)
12. Tong, S., Gregory, B.: Turbine preliminary design using artificial intelligence and numerical optimization techniques. Journal of Turbomachinery 114, 1–10 (1992)
13. Voutchkov, I., Keane, A., Bhaskar, A., Olsen, T.M.: Weld sequence optimization: the use of surrogate models for solving sequential combinatorial problems. Computer Methods in Applied Mechanics and Engineering 194, 3535–3551 (2005)
GPU-Based Approaches for Multiobjective Local Search Algorithms. A Case Study: The Flowshop Scheduling Problem

Thé Van Luong, Nouredine Melab, and El-Ghazali Talbi

INRIA Dolphin Project / Opac LIFL CNRS, 40 avenue Halley, 59650 Villeneuve d'Ascq Cedex, France
[email protected], {Nouredine.Melab,El-Ghazali.Talbi}@lifl.fr

Abstract. Multiobjective local search algorithms are efficient methods for solving complex problems in science and industry. Even though these heuristics significantly reduce the computational time of exploring the solution search space, this cost remains exorbitant when very large problem instances are to be solved. As a result, the use of graphics processing units (GPUs) has recently emerged as an efficient way to accelerate the search process. This paper presents a new methodology for efficiently designing and implementing GPU-based multiobjective local search algorithms. The experimental results show that the approach is promising, especially for large problem instances.
1  Introduction
Real-world optimization problems are often complex and NP-hard, their modeling is continuously evolving in terms of constraints and objectives, and their resolution is time-consuming. Although approximate algorithms such as metaheuristics reduce the temporal complexity of their resolution, they remain unsatisfactory for tackling very large problems. Nowadays, GPU computing is recognized as a powerful way to achieve high performance on long-running scientific applications [1]. Designing multiobjective local search (MLS) algorithms for solving real-world optimization problems is an important challenge for GPU computing. However, only a few research works on evolutionary algorithms on GPU, for mono-objective optimization, exist [2, 3, 4, 5]. Indeed, a fine-grained model such as the parallel exploration of the neighborhood on GPU is not immediate, and several challenges persist that are particularly related to the characteristics and underlying issues of the GPU architecture and of MLS algorithms. The major issues are the efficient distribution of data processing between CPU and GPU, thread synchronization, the optimization of data transfers between the different memories, the capacity constraints of these memories, etc. The main objective of this paper is to deal with such issues in the re-design of parallel MLS algorithms, to allow the solving of large-scale optimization problems on GPU architectures.

P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 155–166, 2011. © Springer-Verlag Berlin Heidelberg 2011

In this paper, we contribute with the first results of
multiobjective local search algorithms on GPU. More precisely, we propose new GPU-based approaches for performing the parallel exploration of the neighborhood on GPU in a multiobjective context. These approaches are based on a decomposition of the GPU hierarchy, allowing a clear separation between generic and problem-dependent LS features. Several challenges are dealt with: (1) the distribution of the search process between the CPU and the GPU, minimizing the data transfer between them; (2) finding an efficient mapping of the hierarchical LS parallel models onto the hierarchical GPU; (3) efficiently using coalescing and the texture memory in the context of MLS algorithms. To validate the approaches presented in this paper, the flowshop scheduling problem (FSP) [6] has been considered and implemented on GPU. The remainder of the paper is organized as follows: Section 2 highlights the principles of MLS methods and their parallel models. In Section 3, for a better understanding of the difficulties of using the GPU architecture, its characteristics are described according to a decomposition of the GPU hierarchy. Section 4 presents generic concepts for designing parallel MLS methods on GPU. In Section 5, on the one hand, efficient mappings between state-of-the-art LS structures and the GPU thread model are given; on the other hand, an in-depth look at GPU memory management adapted to MLS heuristics is provided. Section 6 reports the performance results obtained for the problem mentioned above. Finally, a discussion and some conclusions of this work are drawn in Section 7.
2 2.1
Parallel Multiobjective Local Search Algorithms Multiobjective Local Search Algorithms
The existing LS methods that intend to find an approximation of the Pareto optimal set of a multiobjective optimization problem fall into two categories: scalar approaches and Pareto approaches. Approaches belonging to the first class contains the approaches that transform a multiobjective problem into a monoobjective one or a set of such problems. Many proposed algorithms in the literature are scalar approaches. Among these methods one can find the aggregation methods, the weighted metrics, the constraint methods . . . A review of these methods is given in [7]. The second class consists in defining the acceptance of the LS according to a dominance relationship such as the Pareto dominance. The idea of Pareto approaches is to maintain an archive of non-dominated solutions, to explore the neighborhood of the solutions contained in the archive and to update the archive with the visited solutions. A complete description of the different algorithms can be found in [8]. 2.2
2.2 Parallel Models of Local Search Algorithms
Parallelism arises naturally when dealing with a neighborhood, since each of the solutions belonging to it is an independent unit. The parallel design and implementation of metaheuristics have also been studied on various architectures [9,10].
GPU-Based Approaches for Multiobjective Local Search Algorithms
Basically, three major parallel models for LS heuristics can be distinguished: solution-level, iteration-level and algorithmic-level.

• Solution-level Parallel Model. The focus is on the parallel evaluation of a single solution. In this case, the evaluation function can be viewed as an aggregation of a given number of partial functions.
• Iteration-level Parallel Model. This model is a low-level master-worker model that does not alter the behavior of the heuristic. Exploration and evaluation of the neighborhood are made in parallel. Each worker manages some candidates and the results are returned to the master.
• Algorithmic-level Parallel Model. Several LS algorithms are launched simultaneously to compute better and more robust solutions. They may be heterogeneous or homogeneous, independent or cooperative, start from the same or different solution(s), and be configured with the same or different parameters.

The solution-level model is problem-dependent and does not present many generic concepts. In this paper, we focus only on the fine-grained, problem-independent iteration-level model. Indeed, unlike the algorithmic-level model, the iteration-level model can be seen as an acceleration model which does not change the semantics of the algorithm.
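The iteration-level master-worker scheme can be sketched on CPU, with a thread pool standing in for the GPU workers (function name, pool size and move encoding are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_neighborhood_parallel(solution, moves, evaluate, workers=4):
    """Iteration-level model: the master generates the moves, the workers
    evaluate the resulting neighbors in parallel, and all fitnesses are
    returned to the master, which keeps the selection step sequential."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda m: evaluate(m(solution)), moves))
```

Because only the evaluation is parallelized and the fitnesses come back in move order, the semantics of the sequential heuristic are preserved.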
3 GPU Computing for Metaheuristics
Driven by the demand for high-definition 3D graphics on personal computers, GPUs have evolved into highly parallel, multithreaded, many-core devices. Indeed, since more transistors are devoted to data processing rather than to data caching and flow control, the GPU is specialized for compute-intensive and highly parallel computation. A complete review of GPU architectures can be found in [1]. Adapting MLS algorithms to GPU requires taking into account both the characteristics and underlying issues of the GPU architecture and the LS parallel models. In this section, we propose a decomposition of the GPU adapted to the iteration-level parallel model, allowing us to identify the different challenges that must be dealt with.

3.1 General GPU Model
In general-purpose computing on graphics processing units, the CPU is considered as a host and the GPU is used as a device coprocessor. Each GPU thus has its own memory and processing elements, separate from those of the host computer. Each device supports the single program multiple data (SPMD) model, i.e., multiple autonomous processors simultaneously execute the same program on different data. To achieve this, the concept of a kernel is defined: a function callable from the host and executed on the specified device simultaneously by several processors in parallel.
T. Van Luong, N. Melab, and E.-G. Talbi
Regarding the iteration-level parallel model, since the evaluation of neighboring candidates is often the most time-consuming part of MLSs, it must be done in parallel on the GPU. Therefore, a kernel is associated with the evaluation of the neighborhood, and the CPU controls the whole sequential part of the LS process. However, memory transfer from the CPU to the device memory is a synchronous, time-consuming operation. Indeed, bus bandwidth and latency between the CPU and the GPU can significantly decrease the performance of the search. As a result, one of the challenges is to optimize the data transfer between the CPU and the GPU.

3.2 Parallelism Control: GPU Threads Model
Kernel handling is dependent on the general-purpose language. For instance, CUDA and OpenCL are parallel computing environments which provide an application programming interface for GPU architectures [11,12]. These toolkits introduce a model of threads which provides an easy abstraction for the SIMD architecture. Regarding their spatial organization, threads are organized into so-called thread blocks. A kernel is executed by multiple equally-sized thread blocks. All the threads belonging to the same thread block are assigned as a group to a single multiprocessor. Thereby, a unique id can be deduced for each thread to perform computation on different data. Regarding MLS algorithms, a move which represents a particular neighbor candidate solution can likewise be associated with a unique id. However, depending on the solution representation of the problem, finding a corresponding id for each move is not straightforward. As a consequence, another challenging issue is to find an efficient mapping between GPU threads and LS moves.

3.3 Memory Management: Kernel Management
From a hardware point of view, graphics cards consist of streaming multiprocessors, each with processing units, registers and on-chip memory. Since multiprocessors are used according to the SPMD model, threads share the same code and have access to different memory areas. Communication between the CPU host and its device is done through the global memory. Since this memory is not cached and its access is slow, one needs to minimize accesses to global memory (read/write operations) and reuse data within the local multiprocessor memories. Graphics cards also provide read-only texture memory to accelerate operations such as 2D or 3D mapping. Constant memory is read-only from kernels and is hardware-optimized for the case where all threads read the same location. Shared memory is a fast memory located on the multiprocessors and shared by the threads of each thread block; it provides a way for threads to communicate within the same block. Registers among streaming processors are private to an individual thread; they constitute fast-access memory.
A final challenge consists in finding the best association between the different LS structures and the different available memories in order to obtain the best performance.
4 The Proposed GPU-Based Algorithm
In this section, the focus is on the re-design of the iteration-level parallel model. Designing a parallel LS model is a great challenge since, to the best of our knowledge, no generic GPU-based MLS algorithm exists to date. Adapting traditional MLS methods to GPU is not a straightforward task because the hierarchical memory management on GPU has to be handled. We propose (see Algorithm 1) a methodology to adapt MLS methods to GPU in a generic way. The given template is applicable to most scalar and Pareto approaches.
Algorithm 1. Multiobjective Local Search Template on GPU
1. Choose an initial solution
2. Evaluate the solution
3. Specific MLS initializations
4. Allocate problem data inputs on GPU device memory
5. Allocate a solution on GPU device memory
6. Allocate a neighborhood fitnesses structure on GPU device memory
7. Allocate additional solution structures on GPU device memory
8. Copy problem data inputs on GPU device memory
9. Copy the solution on GPU device memory
10. Copy additional solution structures on GPU device memory
11. repeat
12.   for each neighbor in parallel on the GPU kernel do
13.     Complete or delta evaluation of the candidate solution
14.     Insert the resulting fitness into the neighborhood fitnesses structure
15.   end for
16.   Copy the neighborhood fitnesses structure on CPU host memory
17.   Specific MLS solution selection strategy on the neighborhood fitnesses structure
18.   Specific MLS post-treatment
19.   Copy the chosen solution on GPU device memory
20.   Copy additional solution structures on GPU device memory
21. until a stopping criterion satisfied
First of all, memory allocations on GPU are made: data inputs and the candidate solution of the problem must be allocated (lines 4 and 5). Since GPUs require massive computations with predictable memory accesses, a structure has to be allocated for storing the results of the evaluation of each neighbor (neighborhood fitnesses structure) at distinct addresses (line 6). In the case of Pareto approaches, this structure can represent different objective vectors. Additional solution structures which are problem-dependent can also be allocated to facilitate the computation of the evaluation function (line 7). Second, problem data inputs, the initial candidate solution and the additional structures associated with this solution have to be copied onto the GPU (lines 8 to 10). It is important to notice that problem data inputs (e.g. a matrix in TSP) are a read-only structure and never change during the execution of the LS algorithm; therefore, their associated memory is copied only once. Third comes the parallel iteration-level, in which each neighboring solution is generated, evaluated and copied into the neighborhood fitnesses structure (lines 12 to 15). Fourth, since the order in which candidate neighbors are evaluated is undefined, the neighborhood fitnesses structure has to be copied back to the host CPU (line 16). Then a specific LS solution selection strategy is applied to this structure (lines 17 and 18); for instance, the archiving of non-dominated solutions might be done in the case of Pareto approaches. Finally, after a new candidate has been selected, the latter and its additional structures are copied to the GPU (lines 19 and 20). The process is repeated until a stopping criterion is satisfied.
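A CPU-only mock of this template can help fix the control flow; the function names are ours, and the GPU transfers of lines 8-10, 16 and 19-20 are implicit here:

```python
def mls_template(initial, neighbors, evaluate, select, stop_after):
    """CPU mock of Algorithm 1: the 'kernel' evaluates every neighbor,
    the fitnesses come back to the host, and a problem-specific selection
    strategy picks the next solution."""
    current = initial
    best = (evaluate(initial), initial)
    for _ in range(stop_after):
        hood = neighbors(current)                # neighborhood generation
        fitnesses = [evaluate(n) for n in hood]  # parallel region on a real GPU
        current = select(hood, fitnesses)        # host-side selection strategy
        best = min(best, (evaluate(current), current))
    return best
```

For example, minimizing f(x) = x² over the integers with the neighborhood {x − 1, x + 1} and best-neighbor selection, starting from 5, reaches the optimum 0 within 10 iterations.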
5 Neighborhood Generation and Memory Management

5.1 Efficient Mappings of Neighborhood Structures on GPU
Since the generation of the neighborhood is done on GPU to reduce multiple data transfers, the issue is to determine which neighbor must be handled by which thread. The answer depends on the representation of the target optimization problem. In the following, we provide a methodology to deal with the different structures of the literature.

Binary Representation. A solution is coded as a vector of bits. The neighborhood representation for binary problems is based on the Hamming distance of one: a given neighbor is obtained by flipping one bit of the solution. The mapping between the LS neighborhood encoding and GPU threads is quite straightforward. Indeed, on the one hand, for a binary vector of size n, the size of the neighborhood is exactly n; on the other hand, threads are provided with a unique id. As a result, an ℕ → ℕ mapping is immediate.

Discrete Vector Representation. This is an extension of binary encoding using a given alphabet Σ: each variable takes its value over the alphabet Σ. Assuming that the cardinality of the alphabet Σ is k, the size of the neighborhood is (k − 1) × n for a discrete vector of size n. Let id be the identity of the thread corresponding to a given solution of the neighborhood. Compared to the initial candidate solution, id/(k − 1) represents the position which differs from the initial solution and id%(k − 1) the index of the replacement value in the ordered alphabet Σ (both using zero-based numbering). As a consequence, an ℕ → ℕ mapping is possible.
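The discrete-vector mapping, which covers the binary case with k = 2, can be written as follows (the function name is ours):

```python
def discrete_move_from_id(tid, n, k):
    """Map a thread id to a move for a discrete vector of size n over an
    alphabet of cardinality k: the position to change and the zero-based
    index of the new value among the k-1 values different from the
    current one."""
    assert 0 <= tid < (k - 1) * n  # the neighborhood size is (k-1)*n
    return tid // (k - 1), tid % (k - 1)
```

With k = 2 the second component is always 0, so thread tid simply flips bit tid of the binary vector.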
Permutation Representation. Building a neighborhood by pairwise exchange operations is a standard way for permutation problems. Unlike the previous representations, the mapping between a neighbor and a GPU thread is not straightforward: a neighbor is identified by two element indexes, whereas a thread is identified by a single id. As a result, an ℕ → ℕ × ℕ mapping has to be considered to transform one index into two, and, conversely, an ℕ × ℕ → ℕ mapping to transform two indexes into one.

Proposition 1 (two-to-one index transformation). Given i and j the indexes of the two elements to be exchanged in the permutation representation, the corresponding index in the neighborhood representation is f(i, j) = i × (n − 1) + (j − 1) − i × (i + 1)/2, where n is the permutation size.

Proposition 2 (one-to-two index transformation). Given f(i, j) the index of the element in the neighborhood representation, the corresponding indexes in the permutation representation are i = n − 2 − ⌊(√(8 × (m − f(i, j) − 1) + 1) − 1)/2⌋ and j = f(i, j) − i × (n − 1) + i × (i + 1)/2 + 1, where n is the permutation size and m the neighborhood size.

The proofs of these two index transformations can be found in [13]. Notice that for other neighborhood operators such as 2-opt or the insertion operator, a slight variation of these mappings is easily applicable.
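Propositions 1 and 2 can be checked with a short Python implementation (a sketch, not the authors' code; the integer square root avoids floating-point issues):

```python
import math

def pair_to_index(i, j, n):
    """Two-to-one: map the exchange of elements i < j of a permutation
    of size n to a linear neighborhood index (Proposition 1)."""
    return i * (n - 1) + (j - 1) - i * (i + 1) // 2

def index_to_pair(f, n):
    """One-to-two: recover (i, j) from the neighborhood index f
    (Proposition 2), with m = n*(n-1)/2 the neighborhood size."""
    m = n * (n - 1) // 2
    i = n - 2 - (math.isqrt(8 * (m - f - 1) + 1) - 1) // 2
    j = f - i * (n - 1) + i * (i + 1) // 2 + 1
    return i, j
```

Enumerating all pairs i < j for a small n and checking the round trip confirms that the two transformations are mutually inverse.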
5.2 Memory Management of Local Search Algorithms on GPU
In this section, the focus is on memory management. Understanding the GPU memory organization and its issues is useful to provide an efficient implementation of parallel MLS heuristics.

Texture Memory. Optimizing the performance of GPU applications often involves optimizing data accesses, which includes the appropriate use of the various GPU memory spaces. The use of texture memory is a solution for reducing memory transactions due to non-aligned accesses by consecutive threads. Texture memory provides a surprising aggregation of capabilities, including the ability to cache global memory. Therefore, texture memory can be seen as a relaxed mechanism for the thread processors to access global memory, because the alignment requirements do not apply to texture memory accesses. The use of texture memory is well adapted to LS algorithms since cached texture data is laid out to give the best performance for 1D/2D access patterns: the best performance is achieved when the threads of a warp read locations that are close together from a spatial-locality perspective. Since optimization problem inputs are generally 2D matrices or 1D solution vectors, LS structures can be bound to texture memory. The use of textures in place of global memory accesses is a completely mechanical transformation. Details of texture coordinate clamping and filtering are given in [11].
Table 1. Summary of the different memories used in the evaluation function

Type of memory    LS structure
Texture memory    data inputs, solution representation
Global memory     fitnesses structure
Registers         additional variables
Local memory      additional structures
Kernel management. Table 1 summarizes the kernel memory management in accordance with the different LS structures. The inputs of the problem (e.g. the matrix in TSP) and the solution which generates the neighborhood are associated with the texture memory. The fitnesses structure, which stores the results obtained for each neighbor, is declared in global memory; indeed, since only one write operation per thread is performed at each iteration, this structure is not involved in intensive computation. Variables declared for the computation of the evaluation function of each neighbor are automatically associated with registers by the compiler. Additional complex structures which are private to a neighbor reside in local memory.
6 Application to the Flowshop Scheduling Problem

6.1 Configuration
To validate the approaches presented in this paper, the FSP has been implemented on GPU. This problem is one of the most well-known scheduling problems. It can be presented as a set of n jobs J_1, J_2, ..., J_n to be scheduled on m machines. Each job J_i is composed of m consecutive tasks t_i1, ..., t_im, where t_ij represents the jth task of job J_i, requiring machine M_j. To each task t_ij is associated a processing time p_ij, and to each job J_i a release time r_i and a due date d_i (deadline of the job) are given. For the following experiments, three objectives are used in scheduling tasks on the different machines:

– Makespan (total completion time): max{C_i | i ∈ [1..n]}
– Total tardiness: Σ_{i=1}^{n} max(0, C_i − d_i)
– Number of jobs delayed with regard to their due date d_i

where s_ij represents the time at which task t_ij is scheduled and C_i = s_im + p_im represents the completion time of job J_i. The problem has been implemented using a permutation representation, and the neighborhood is based on a standard insertion operator. The considered instances are the Taillard instances extended by Liefooghe et al. in a multiobjective context [14]. As the iteration-level parallel model does not change the semantics of the sequential algorithm, the effectiveness in terms of quality of solutions is not addressed here. Only average execution times and acceleration factors are reported, in comparison with a single-core CPU: the objective is to evaluate the impact of a GPU-based implementation in terms of efficiency. Experiments have been carried out on two different configurations. The number of global iterations of each LS is set to 10000, which corresponds to a realistic scenario in accordance with the algorithm convergence. For each algorithm, a single-core CPU implementation, a CPU-GPU version, and a CPU-GPU version using texture memory (GPU_tex) are considered on each configuration. The average time has been measured in seconds over 50 runs.
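The three objectives above can be evaluated from a job permutation with the usual flowshop completion-time recurrence (a sketch; the p[i][j] encoding and the function name are ours):

```python
def flowshop_objectives(permutation, p, due_dates):
    """Evaluate a job permutation on the three objectives used here:
    makespan, total tardiness and number of late jobs.  p[i][j] is the
    processing time of job i on machine j."""
    n_machines = len(p[0])
    completion = [0.0] * n_machines  # completion time of the last job per machine
    C = {}                           # completion time C_i of each job
    for job in permutation:
        for j in range(n_machines):
            previous = completion[j - 1] if j > 0 else 0.0
            completion[j] = max(completion[j], previous) + p[job][j]
        C[job] = completion[-1]
    makespan = max(C.values())
    tardiness = sum(max(0.0, C[i] - due_dates[i]) for i in C)
    late = sum(1 for i in C if C[i] > due_dates[i])
    return makespan, tardiness, late
```

For instance, with two jobs and two machines, p = [[3, 2], [2, 4]] and due dates [5, 8], the permutation (0, 1) yields a makespan of 9, a total tardiness of 1 and one late job.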
6.2 Aggregated Tabu Search
For the first experiment, a tabu search based on an aggregation (weighted-sum) method is used for the generation of Pareto solutions: the FSP is transformed into a monoobjective problem by combining the various objective functions linearly into a single one. The results are shown in Table 2.

Table 2. Time measurements for the tabu search

                Xeon 3GHz / GTX 285 (240 cores)    Core i7 3.2GHz / GTX 480 (480 cores)
Instance   CPU      GPU         GPUtex        CPU      GPU          GPUtex
20-10      1.1      3.9×0.3     3.7×0.4       0.9      1.9×0.5      1.7×0.6
20-20      2.3      7.1×0.3     6.6×0.4       1.9      3.3×0.6      6×0.7
50-10      19.8     9.4×2.1     8.9×2.2       16.5     5.0×3.3      4.3×3.8
50-20      38.0     17.4×2.2    16.3×2.3      31.9     9.1×3.5      7.8×4.1
100-10     170.8    23.6×7.2    20.9×8.2      144.5    12.6×11.5    11.5×12.6
100-20     321.1    44.1×7.3    38.1×8.4      270.7    23.3×11.6    20.9×12.9
200-10     1417.4   159.4×8.9   144.3×9.8     1189.7   81.88×14.5   77.2×15.4
200-20     2644.1   284.4×9.3   263.9×10.0    2220.7   147.8×15.0   139.0×16.0

(Times in seconds; ×k denotes the speed-up over the corresponding CPU time.)
From the instance 50-10, GPU versions start to give positive accelerations on both configurations (from ×2.1 to ×3.8). The slow speed-ups for small instances can be explained by the fact that, since the neighborhood is relatively small, the number of threads per block is not sufficient to fully cover the memory access latency. As the instance size increases, the acceleration factor grows accordingly: for a larger instance such as 100-10, the provided speed-ups are much better (from ×7.2 to ×12.6). For each instance, in a general manner, the use of texture memory provides additional acceleration. However, the memory alignment constraints of the latest G200 and G400 series are relaxed in comparison with the previous cards (e.g. the G80 and G90 series); as a consequence, programs running on these cards get better global memory performance and the benefits of texture memory are less pronounced. Finally, efficient speed-ups are obtained for the instance 200-20, varying between ×9.3 and ×16. As a consequence, parallelization on GPU provides an efficient way of handling large neighborhoods.
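The reported acceleration factors are simply the ratio of CPU time to GPU time, which can be checked from the 200-20 row of Table 2:

```python
# Speed-up factors are CPU time divided by GPU time (instance 200-20,
# Xeon / GTX 285 configuration, values taken from Table 2).
cpu, gpu, gpu_tex = 2644.1, 284.4, 263.9
assert round(cpu / gpu, 1) == 9.3       # matches the reported x9.3
assert round(cpu / gpu_tex, 1) == 10.0  # matches the reported x10.0
```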
6.3 Pareto Local Search Algorithms
For the next experiments, PLS-1 (the Pareto local search proposed by Paquete et al. [8]) has been considered. At each iteration, PLS-1 selects a non-dominated solution from the unbounded archive and explores its neighborhood in an exhaustive way. The algorithm terminates when all the solutions contained in the archive have been visited. For the experiments, a restart mechanism is applied as long as the number of iterations has not reached 10000. A first algorithm, PLS, has been considered, in which only the dominating neighbors are added to the archive. The results of the experiments are provided in Table 3.

Table 3. Time measurements for PLS

                Xeon 3GHz / GTX 285 (240 cores)   Core i7 3.2GHz / GTX 480 (480 cores)
Instance   CPU      GPU         GPUtex       CPU      GPU          GPUtex       # solutions
20-10      1.1      5.3×0.2     3.7×0.3      1.0      1.9×0.5      1.7×0.6      11
20-20      2.5      8.4×0.3     6.6×0.4      1.9      3.4×0.6      3.0×0.6      13
50-10      19.6     10.9×1.8    8.9×2.2      16.6     4.9×3.4      4.3×3.8      19
50-20      39.7     18.9×2.1    16.5×2.4     32.7     9.1×3.6      7.8×4.2      20
100-10     167.0    25.3×6.6    21.2×7.9     139.5    12.8×10.9    11.7×12.0    31
100-20     329.8    45.8×7.2    40.4×8.1     274.9    23.5×11.7    21.1×13.1    34
200-10     1391.5   161.8×8.6   145.1×9.6    1171.5   82.5×14.2    77.9×15.1    61
200-20     2707.8   307.7×8.8   285.9×9.4    2196.3   148.4×14.8   139.7×15.7   71

(# solutions: average number of non-dominated solutions obtained by the algorithm.)
In comparison with Table 2, similar observations can be made regarding the performance results, where the maximal speed-up reaches ×15.7 for the biggest instance. A look at the average number of non-dominated solutions obtained by the algorithm shows that this number is rather low whatever the instance size. This may explain why the performance results of the Pareto algorithm are similar to those of the aggregated tabu search. To emphasize this point, a second algorithm, PLS≺, has been considered, in which all the non-dominated neighbors are added to the archive. Table 4 reports the different measurements. For the instance 100-10, in comparison with the previous table, one can clearly start to see the impact of the number of non-dominated solutions on performance: the acceleration factors vary between ×3.1 and ×5.1, significantly reduced in comparison with the analogous instance for PLS. Finally, the speed-up still grows with the instance size, reaching ×7.1 for the last instance. This global performance loss can be explained by the larger number of non-dominated solutions: PLS≺ is more time-consuming than its counterpart PLS, so the time spent on the archiving of solutions may be significant. Furthermore, this step is performed exclusively on the CPU.
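The CPU-side archiving step mentioned above can be sketched as follows (a sketch with our own names; Pareto dominance as described in Section 2.1, minimization assumed):

```python
def update_archive(archive, candidate):
    """Insert a candidate objective vector into an archive of
    non-dominated vectors: reject it if some archived vector dominates
    it, otherwise add it and discard any archived vector it dominates."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and a != b
    if any(dominates(s, candidate) for s in archive):
        return archive
    return [s for s in archive if not dominates(candidate, s)] + [candidate]
```

Each visited neighbor triggers such an update, which is why a large number of non-dominated solutions makes this sequential step costly.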
Table 4. Time measurements for PLS≺

                Xeon 3GHz / GTX 285 (240 cores)   Core i7 3.2GHz / GTX 480 (480 cores)
Instance   CPU      GPU         GPUtex       CPU      GPU          GPUtex       # solutions
20-10      1.1      5.4×0.2     3.9×0.3      1.2      2.0×0.6      1.8×0.7      77
20-20      2.6      8.5×0.3     6.7×0.4      2.0      3.4×0.6      3.0×0.7      83
50-10      25.7     15.1×1.7    13.1×1.9     20.5     8.9×2.3      8.2×2.5      396
50-20      44.5     24.7×1.8    22.3×2.2     39.5     14.1×2.8     13.2×3.0     596
100-10     220.1    71.0×3.1    66.9×3.3     253.3    51.7×4.9     49.2×5.1     1350
100-20     386.8    96.7×4.0    91.3×4.2     312.5    62.5×5.0     59.0×5.3     1530
200-10     1631.2   362.5×4.5   346.3×4.7    1361.5   223.2×6.1    217.7×6.4    1597
200-20     2998.8   588.0×5.1   566.1×5.3    2672.6   398.9×6.7    378.0×7.1    2061

7 Discussion and Conclusion
High-performance computing based on GPUs has recently been revealed to be a good way to harness such computational power. However, the exploitation of parallel models is not trivial, and many issues related to the hierarchical memory management of this architecture have to be considered. To the best of our knowledge, GPU-based parallel LS approaches have never been widely investigated. In this paper, an efficient mapping of the iteration-level parallel model onto the hierarchical GPU has been proposed. First, the CPU manages the whole MLS process and the GPU is used as a coprocessor dedicated to intensive calculations. Then, efficient mappings between neighborhood candidate solutions and GPU threads are established. Finally, memory management is applied to the evaluation function kernel. Apart from being generic, we demonstrated the effectiveness of our methodology through extensive experiments: applying such a mechanism with an efficient memory management provides promising speed-ups (up to ×16). We strongly believe that the overall performance could be even better for other multiobjective optimization problems requiring (1) more computation and (2) fewer resources in terms of memory. The re-design of the parallel MLS iteration-level model on GPU fits deterministic scalar and Pareto approaches well. However, some MLS algorithms only partially explore the neighborhood and take the first improving neighbor that is detected; applied to LS algorithms such as multiobjective simulated annealing, this model needs to be re-thought. Furthermore, in the case of Pareto approaches, the experimental results have shown a global performance loss as the number of non-dominated solutions increases. For other multiobjective optimization problems whose objectives are uncorrelated, this number could be huge, leading to a serious performance decrease.
Even if some archiving techniques could be applied to bound the archive size, this would not completely solve the issue. As a consequence, to be complete, the next step of our method is to provide a SIMD parallel archiving on GPU, which would significantly enhance the performance of the provided algorithms. However, performing such a step is challenging, since it requires ensuring additional synchronizations and non-concurrent writes, and managing dynamic allocations on the GPU.
References
1. Ryoo, S., Rodrigues, C.I., Stone, S.S., Stratton, J.A., Ueng, S.Z., Baghsorkhi, S.S., Hwu, W.-m.W.: Program optimization carving for GPU computing. J. Parallel and Distributed Computing 68(10), 1389–1401 (2008)
2. Li, J.M., Wang, X.J., He, R.S., Chi, Z.X.: An efficient fine-grained parallel genetic algorithm based on GPU-accelerated. In: IFIP International Conference on Network and Parallel Computing Workshops, NPC Workshops, pp. 855–862 (2007)
3. Chitty, D.M.: A data parallel approach to genetic programming using programmable graphics hardware. In: GECCO, pp. 1566–1573 (2007)
4. Wong, T.T., Wong, M.L.: Parallel evolutionary algorithms on consumer-level graphics processing unit. In: Parallel Evolutionary Computations, pp. 133–155 (2006)
5. Fok, K.L., Wong, T.T., Wong, M.L.: Evolutionary computing on consumer graphics hardware. IEEE Intelligent Systems 22(2), 69–78 (2007)
6. Taillard, E.: Benchmarks for basic scheduling problems (1989)
7. Talbi, E.G.: Metaheuristics: From Design to Implementation. Wiley, Chichester (2009)
8. Paquete, L.: Stochastic Local Search Algorithms for Multiobjective Combinatorial Optimization: Methods and Analysis. IOS Press, Amsterdam (2006)
9. Alba, E., Talbi, E.G., Luque, G., Melab, N.: Metaheuristics and parallelism. In: Parallel Metaheuristics: A New Class of Algorithms. Wiley Series on Parallel and Distributed Computing, pp. 79–104. Wiley, Chichester (2005)
10. Melab, N., Cahon, S., Talbi, E.G.: Grid computing for parallel bioinspired algorithms. J. Parallel and Distributed Computing 66(8), 1052–1061 (2006)
11. NVIDIA: CUDA Programming Guide Version 3.2 (2010)
12. Khronos Group: OpenCL 1.0 Quick Reference Card (2010)
13. Luong, T.V., Melab, N., Talbi, E.G.: Large neighborhood for local search algorithms. In: International Parallel and Distributed Processing Symposium. IEEE Computer Society, Los Alamitos (2010)
14. Liefooghe, A., Basseur, M., Jourdan, L., Talbi, E.G.: Combinatorial optimization of stochastic multi-objective problems: An application to the flow-shop scheduling problem. In: Obayashi, S., Deb, K., Poloni, C., Hiroyasu, T., Murata, T. (eds.) EMO 2007. LNCS, vol. 4403, pp. 457–471. Springer, Heidelberg (2007)
Local Search for Mixed-Integer Nonlinear Optimization: A Methodology and an Application

Frédéric Gardi¹ and Karim Nouioua²

¹ Bouygues e-lab, Paris, France
² Laboratoire d'Informatique Fondamentale – CNRS UMR 6166, Université Aix-Marseille II – Faculté des Sciences de Luminy, Marseille, France
[email protected], [email protected]
Abstract. A methodology is presented for tackling mixed-integer nonlinear optimization problems by local search, in particular large-scale real-life problems. This methodology is illustrated through the local-search heuristic implemented for solving an energy management problem posed by the EDF company in the context of the ROADEF/EURO Challenge 2010, an international competition of applied optimization. Our local-search approach is pure and direct: the problem is tackled frontally, without decomposition or hybridization. In this way, both combinatorial and continuous decisions can be modified by a move during the search. Our work then focuses on diversification by the moves and on the performance of the incremental evaluation machinery. Exploring millions of feasible solutions within one hour of running time, the resulting local search allowed us to obtain among the best results of the competition, in which 44 teams from 25 countries were engaged.
1 Presentation of the Problem
Électricité de France (EDF) is the historical French energy producer and supplier, whose operations and participations span the world today. The EDF power generation facilities in France stand for a total installed capacity of nearly 100 GW. Most of the French electricity is produced by thermal power plants: 90 % in 2009, among which 82 % by nuclear power plants. The subject of the ROADEF/EURO Challenge 2010, an international competition organized by the French Operational Research and Decision Support Society (ROADEF) and the Association of European Operational Research Societies (EURO), was focused on the medium-term (5 years) management of the EDF French thermal power park, and especially of the nuclear plants, which have to be repeatedly shut down for refueling and maintenance. Before describing our contributions (methodology and application), the main characteristics of this problem (decision variables, objectives, constraints) are outlined. For the sake of concision and readability, the presentation remains voluntarily informal. The interested reader is referred to the detailed technical specification provided by EDF in the context of the
ROADEF/EURO Challenge 2010 [9], which can be downloaded from the ROADEF website (http://challenge.roadef.org/2010/index.en.htm). The thermal power park is composed of two kinds of plants. Type-1 (shortly T1) plants can be supplied with fuel continuously. They correspond to coal, fuel oil, or gas power facilities, or even to virtual power stations allowing energy to be imported. On the other hand, Type-2 (shortly T2) plants have to be shut down regularly for refueling and maintenance. They correspond to nuclear power plants: the fuel stock of these facilities is consumed as power is produced. Whenever a T2 plant is supplied with new fuel, it has to be offline and cannot produce during this outage period, which generally lasts several weeks. Thus, the operation of a T2 plant is organized as a succession of cycles, namely an offline period (outage) followed by an online period (production campaign).
Ultimately, the objective is to minimize the expected cost of production over the given horizon. More precisely, the decision variables of the problem are: the starting dates of outages for all T2 plants, the refuel quantities for all outages of all T2 plants, the production levels for all T1 and T2 plants, all time steps, and all scenarios. Note that the first kind of decision variables are discrete, whereas the second and third ones are continuous. The objective function to minimize is composed of two terms: the expected production costs for all T1 plants (that is, the production costs over all scenarios divided by the number of scenarios), and the refuel costs for all T2 plants minus the expected stock values at the end of the period (to avoid end-of-side effects). The constraints can be classified into three categories; all these constraints are listed and numbered from CT1 to CT21 in the EDF specification. The first category (CT1) corresponds to constraints for coupling the production of the plants: for each scenario and each time step, the sum of productions for all T1 and T2 plants must be equal to the demand. The second category (CT2-CT12) concerns how plants can produce. The power of T1 plants must remain between minimum and maximum values depending on time steps and scenarios. When a T2 plant is offline, its power is equal to zero. When a T2 plant is online, its power must be non negative, lower than a maximum 1
1 http://challenge.roadef.org/2010/index.en.htm
Local Search for Mixed-Integer Nonlinear Optimization
value depending on the time step. During the production campaign, the fuel level dynamics couple fuel quantity and production output over the time horizon: the fuel stock at time step t + 1 is equal to the fuel stock at t minus the energy produced at t, namely the product of the power delivered at t and the duration of a time step. However, the T2 power must respect an imposed profile when the fuel stock drops below a given limit; this profile follows a decreasing piecewise affine function (with only a few pieces). If there is no longer enough fuel stock to produce, the power production is equal to zero. A nuclear plant generally produces at maximal power; otherwise, the T2 plant is said to be in modulation. Because the difference between the maximum power of the plant and the actual production leads to undesirable wear of the equipment involved, the quantity of energy modulated (that is, not produced at maximal power) between two outages cannot exceed a given value. For T2 plants, additional constraints are set on refueling operations. The refuel quantity must lie in a given interval, and the stocks before and after refueling must be lower than given limits. Note that at each refueling operation, a given percentage of the stock is lost. Finally, the third category (CT13-CT21) corresponds to constraints on the outage scheduling of T2 plants: earliest and latest starting dates of outages, minimum spacing or maximum overlapping of two outages, minimum spacing between the starting or ending dates of two outages, resource constraints (the number of maintenance teams able to perform refueling operations is limited), maximum number of outages overlapping a given week, and maximum cumulated offline power during a given period. This optimization problem can be classified as mixed-integer nonlinear. One can observe that it includes two dependent subproblems, nested according to a master/slave scheme.
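To make the fuel-stock dynamics concrete, here is a minimal sketch in Python. This is an illustrative model, not the exact EDF specification: the breakpoint representation of the imposed profile and all names (`allowed_power`, `simulate`, `p_max`, `limit`) are our assumptions.

```python
def allowed_power(stock, p_max, limit, profile):
    """Maximum power of a T2 plant as a function of its fuel stock:
    p_max above `limit`; below, an imposed decreasing piecewise-affine
    profile given as (stock, power) breakpoints sorted by stock.
    Illustrative model, not the exact CT6 formulation."""
    if stock >= limit:
        return p_max
    if stock <= profile[0][0]:
        return profile[0][1]
    for (s1, p1), (s2, p2) in zip(profile, profile[1:]):
        if s1 <= stock <= s2:
            t = (stock - s1) / (s2 - s1)
            return p1 + t * (p2 - p1)
    return p_max

def simulate(stock, powers, p_max, limit, profile, dt=1.0):
    """Fuel dynamics: stock(t+1) = stock(t) - power(t) * dt, with the
    requested power clipped to the allowed maximum and to the stock."""
    out = []
    for p in powers:
        p = min(p, allowed_power(stock, p_max, limit, profile), stock / dt)
        out.append(p)
        stock -= p * dt
    return stock, out

# Usage: 3 steps at full power, stock stays above the profile limit
final_stock, produced = simulate(50.0, [10.0, 10.0, 10.0], 10.0, 20.0,
                                 [(0.0, 0.0), (20.0, 10.0)])
```

Below the limit, the plant is forced onto the decreasing profile, so the producible power shrinks with the remaining stock; at zero stock the power is zero.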
The master subproblem consists of determining a schedule of outages of T2 plants satisfying the constraints induced by limitations on the resources necessary to perform refueling and maintenance operations (CT13-CT21). In summary, this subproblem involves combinatorial decision variables, subject to constraints related to intervals on the integer line. Once outages are scheduled, the slave subproblem consists of planning production to satisfy the demand at minimum cost, that is, determining the stock refuels for each T2 plant and each outage, and the quantity of energy to be produced by each plant (T1 and T2) at each time step for each scenario. This subproblem involves continuous decision variables subject to classical flow conservation and capacity constraints, but also to nonlinear constraints activated under certain logical conditions (CT1-CT12). The master subproblem of outage scheduling is theoretically NP-complete, because it corresponds to scheduling tasks with date and resource constraints [5]. On the other hand, the slave subproblem of production planning seems to be NP-hard too because of the nonlinear constraints (CT6, also called imposition constraints). Moreover, the instances to tackle may be very large: 8 outages to schedule over 300 weeks for 70 T2 plants, and production levels to determine for 170 plants (T1 and T2) over 10 000 time steps for 500 scenarios. The execution time of the algorithm is limited to 1 hour on a standard computer.
F. Gardi and K. Nouioua
We emphasize that the economic stakes underlying this problem are considerable. The economic function to minimize (obfuscated in the EDF data) has nine digits, representing nearly one billion euros per year of operating costs for EDF [6]. Therefore, a gap of only 0.1 % between two solutions represents savings of the order of one million euros a year.
2
Methodology and Outcomes
To the best of our knowledge, no published work addresses the energy management model exposed above. Nevertheless, several software solutions have been implemented by EDF researchers over the last ten years for solving this problem. These solutions consist essentially in decomposing and reducing the problem so as to approach it with mixed-integer programming solvers [7, pp. 116–118]. The software currently in use at EDF is based on a decomposition of the problem site by site, each subproblem being solved heuristically by integer linear programming according to the master/slave decomposition. Unfortunately, this solution does not guarantee the satisfaction of all the constraints of the problem. A large neighborhood search approach based on constraint programming (for scheduling outages) and integer linear programming (for planning production) was recently proposed by [8] (see also [7]), but the proposed solution takes only one scenario as input rather than a set of scenarios. The solution approach that we have implemented in the context of the ROADEF/EURO Challenge 2010 is a randomized local search, a technique rarely used in mixed-integer nonlinear optimization. The design and engineering of our local-search algorithm follow a precise methodology, inspired by previous works of the authors [1,2,3,4]. This methodology is outlined below. The first particularity of our local search is that it is pure and direct. Indeed, no decomposition is done; the problem is tackled head-on. The search space explored by our algorithm is close to the original solution space. In particular, the combinatorial and continuous parts of the problem are treated together: combinatorial and continuous decisions can be modified simultaneously by a move during the search. By avoiding decompositions or reductions, no solution is lost and the probability of finding good-quality ones is increased.
Moreover, no hybridization is done: no particular metaheuristic and no tree-search technique is used. The diversification of the search is obtained by exploring a large variety of randomized neighborhoods. The second specificity of our local search is that it is very aggressive: millions of feasible solutions are visited within the time limit. Indeed, randomized local search is a nondeterministic, incomplete exploration of the search space. Therefore, exploring a huge number of (feasible) solutions during the allocated time increases the probability of finding good-quality solutions. Our local-search heuristic is composed of three layers: general heuristic, moves, and evaluation machinery. The evaluation machinery forms the engine of the local search; it computes the impacts of moves on constraints and objectives during the search. The time spent engineering each layer during the project was distributed as follows: 10 % on the general heuristic, 20 % on moves, 70 % on
evaluation machinery. In summary, our work focused on: designing randomized moves allowing a diversified exploration of the search space despite strong constraints, and speeding up the evaluation of these moves, notably by implementing an incremental randomized combinatorial algorithm for solving the continuous subproblem approximately but very efficiently. Numerical experiments show that this combinatorial algorithm is 10 000 times faster than state-of-the-art linear programming solvers (without imposition constraints), while providing near-optimal production plans. Note that the same approach was recently applied by one of the authors to solve a real-life inventory routing problem [1], which can be viewed as a mixed-integer linear optimization problem. Benchmarks are divided into three categories A, B, X, each containing 5 instances. The instances A were communicated at the beginning of the qualification phase of the ROADEF/EURO Challenge 2010. Teams were selected for the final stage based on the results obtained on these instances. Then, instances B, much larger and very realistic, were given as the test bed for the final stage. Ultimately, finalists were ranked according to their results on instances B (known to competitors) and instances X (unknown to competitors, communicated after the announcement of the final ranking). The final results were announced during EURO 2010, the 24th European Conference on Operational Research. Our algorithm was ranked 1st on instances A and B (among 44 engaged teams, 16 finalists), before falling to 8th place due to a bug, introduced during late working hours, appearing on some instances X (note that only 4 teams among the 16 finalists were able to provide all the solutions to instances X). Once corrected, our algorithm provides state-of-the-art results on instances X under conditions similar to those of the Challenge. The results on instances B as computed by the Challenge's organizers2 show an average gap greater than 1 % (resp. 10 %) between our solutions and those of the team ranked 3rd (resp. 6th). As noted above, such gaps are significant, corresponding to tens or hundreds of millions of euros in savings. One can observe that the majority of the approaches proposed by the other competitors correspond to MIP/CP-based decomposition heuristics. The local-search algorithm is presented in three sections, each one corresponding to one layer of the local search. The last section is devoted to numerical experiments.
3
General Heuristic
The general heuristic is decomposed into three phases. First (phase 1), starting from a random scheduling of outages, we try to find an admissible scheduling of outages, that is, a scheduling of outages which respects all the combinatorial constraints of the problem (CT13-CT21). In this phase, fuel reloading and production levels are ignored. Then (phase 2), starting from the scheduling of outages previously found, we try to determine dates of outages in such a way that an admissible production plan exists. In particular, the spacing between
2 http://challenge.roadef.org/2010/finalResults.pdf
outages of each plant must be sufficient to ensure that stocks do not exceed maximum levels before and after refueling operations (CT11). Once such a solution is found, an admissible solution of the original optimization problem is found. Finally (phase 3), this solution is optimized according to the original objective function. Each phase is performed by local search. The heuristic employed in each phase is a first-improvement descent with randomized selection of moves; each phase follows the same design, allowing many components of their implementation to be factorized. In practice, the first two phases are completed quickly (a few seconds). For the sake of efficiency, phase 3 has been subdivided into several steps too. Indeed, the complexity of evaluating the cost of a solution depends on the number of scenarios. Since the number of scenarios may be large (up to 500), one way to reduce this complexity is to work on subsets of scenarios (possibly aggregated). This is done as follows. At step s, the solution is optimized by local search for a subset S of scenarios, but with T1 completion costs computed over all scenarios (thanks to a special data structure). At step s + 1, the previous solution is repaired so as to become feasible for the new subset S′ ⊃ S of scenarios, and then is optimized over S′. Repairing a solution consists in adjusting the production of T2 plants so that the demand constraints (CT1) are not violated for the new scenarios; this is done by minimizing the amount of power exceeding the demand over all time steps and all scenarios. This sequence of steps can be parameterized in different ways to obtain the best ratio between efficiency and robustness, depending on the allowed time limit. After experiments on the instances A and B provided by EDF, we have chosen to simply proceed as follows: first, we optimize on an "average demand" scenario, and then we refine the solution over all scenarios.
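The first-improvement descent shared by all three phases can be sketched as follows. This is an illustrative Python sketch with a toy phase-1-style objective; `descent`, `violations`, and `move` are our names, not those of the actual implementation.

```python
import random

def descent(solution, objective, propose, iters=2000, seed=0):
    """First-improvement descent with randomized move selection: the
    scheme shared by all three phases (objective = combinatorial
    violations in phases 1-2, production cost in phase 3). Sketch only."""
    rng = random.Random(seed)
    best, best_obj = list(solution), objective(solution)
    for _ in range(iters):
        cand = propose(best, rng)       # randomized move
        obj = objective(cand)
        if obj < best_obj:              # improving: commit, else discard
            best, best_obj = cand, obj
    return best, best_obj

# Toy phase-1-like use: find outage start weeks spaced at least 10 apart
def violations(weeks, min_gap=10):
    s = sorted(weeks)
    return sum(max(0, min_gap - (b - a)) for a, b in zip(s, s[1:]))

def move(weeks, rng):
    cand = list(weeks)
    cand[rng.randrange(len(cand))] = rng.randrange(100)
    return cand

sched, v = descent([0, 1, 2, 3], violations, move)
```

The same skeleton is reused with a different `objective` in each phase, which is what allows the implementation to be factorized.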
4
Moves
The combinatorial structure of the problem naturally induces the following move: select k outages in the current solution and shift them over the time line. The neighborhood induced by such a move is of size O(H^k), with H the number of weeks given in input. In the preliminary phase of the Challenge, these moves were applied totally randomly by the heuristic with small values of k (between 1 and 3) in a first-improvement fashion (if the new solution reached via the move satisfies all the constraints and has a better cost, then commit the move, otherwise roll it back). Unfortunately, the combinatorial part of the problem being tightly constrained, such "simple" moves have a low success rate, which limits the diversification and may prevent convergence toward high-quality solutions. To overcome this difficulty, larger moves have to be designed so as to better explore the combinatorial part of the problem. This has been done by designing compound moves based on the simple moves described above. The underlying principle of these compound moves is to allow (better) admissible solutions to be reached by passing through infeasible solutions. This is particularly useful when the constraints of the problem are hard, as is the case here. A compound move
is designed as follows. First, apply one move which may destroy the feasibility of the current solution. Then, apply iteratively some moves in order to repair the solution. In fact, repairing the solution using simple moves is equivalent to applying local search to solve a satisfaction problem. In this way, an appropriate objective must be defined to guide the local search toward feasible solutions. This can be done simply by defining an objective based on the number and amplitude of the violations of each type of constraint. These compound moves can be viewed as a generalization (and simplification) of ejection chains and destroy-repair methods (see [10] for more details on these search techniques). In order to speed up the convergence toward feasible solutions, the "repairing" moves, applied following a first-improvement descent, target outages inducing violations. Moreover, the number of simple moves to attempt during reparation is limited: a compromise must be found between the time spent attempting the compound move and its success rate (both depending on its length). On the other hand, in order to refine the search, the "destroying" move is chosen randomly (following a nonuniform distribution) from a pool composed of the following moves:
– k-MoveOutagesRandom: select k outages among T2 plants randomly and move them randomly in "feasible" time intervals, that is, ensuring the respect of earliest and latest starting dates (CT13 constraints) and maximum stocks before and after refueling (CT11 constraints);
– k-MoveOutagesConstrained: randomly select T2 plants which are involved in combinatorial constraints related to outage scheduling (CT14-CT21), select k outages of these plants randomly, and move these outages randomly in feasible time intervals;
– k-MoveOutagesConsecutive: select a T2 plant randomly, select k consecutive outages of this plant randomly, and move these outages randomly in feasible time intervals.
Note that more than half of the destroying moves attempted are of type 1-MoveOutagesRandom, but specific and larger moves help to diversify the search. Thus, whereas the admissibility rate of simple moves (that is, the number of moves leading to a new feasible solution, divided by the number of attempted moves) is about 20 % on average, the admissibility rate of compound moves is about 75 %. The computational experiments presented in the last section show that a first-improvement descent with randomized compound moves converges reliably toward high-quality solutions (no local optimum is encountered after hours of computations).
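The destroy-repair scheme can be sketched as follows. This is a hedged Python illustration with toy moves; all names are ours, and the accept rule during repair is simplified to "non-worsening".

```python
import random

def compound_move(solution, destroy, repair_moves, violations, cost,
                  max_repairs=50, rng=random):
    """Destroy-repair compound move (sketch): apply one randomized
    destroying move, then repair with moves guided by a violation-based
    objective; commit only if feasibility is restored and cost improves,
    otherwise roll back to the incumbent."""
    candidate = destroy(solution, rng)
    for _ in range(max_repairs):
        if violations(candidate) == 0:
            break
        move = rng.choice(repair_moves)
        repaired = move(candidate, rng)
        if violations(repaired) <= violations(candidate):  # non-worsening
            candidate = repaired
    if violations(candidate) == 0 and cost(candidate) < cost(solution):
        return candidate        # commit
    return solution             # rollback

# Toy usage: three outage dates, minimum gap of 5 weeks, cost = sum of dates
def toy_violations(dates):
    s = sorted(dates)
    return sum(max(0, 5 - (b - a)) for a, b in zip(s, s[1:]))

def toy_destroy(dates, rng):
    return [rng.randrange(50) for _ in dates]

def toy_shift(dates, rng):
    cand = list(dates)
    i = rng.randrange(len(cand))
    cand[i] = max(0, cand[i] + rng.choice([-3, 3]))
    return cand

result = compound_move([0, 5, 10], toy_destroy, [toy_shift],
                       toy_violations, sum, rng=random.Random(1))
```

By construction, the returned solution is always feasible and never worse than the incumbent, which is exactly the commit-or-rollback discipline described above.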
5
Evaluation Machinery
As suggested in the introduction, the effectiveness of an incomplete search depends crucially on the number of (feasible) solutions explored. Hence, an important task arising in engineering a local-search approach is to make the evaluation fast, especially when tackling such large-scale real-life problems. Methodologically, our work on this point is driven by a simple goal: to lower as much as possible the
practical time complexity of the evaluation of all moves. For this, an incremental evaluation is necessary, exploiting the invariants between the current solution and the one induced by the move. In the present case, the evaluation of the compound move is realized in two steps, each one corresponding to the resolution of a hard subproblem: first (re)scheduling outages, then (re)planning production (refuels and levels). Here is the general scheme of the evaluation layer, for a set S of scenarios:

Combinatorial part:
    perform a destroying move;
    while combinatorial violations remain do
        perform a repairing move;
Continuous part:
    set refueling amounts of impacted outages;
    for each scenario in S do
        set production levels of impacted T2 plants;
    compute global cost of new solution;
5.1
Combinatorial Part
When a simple move is applied for satisfying combinatorial constraints (CT14-CT21), the violations of these constraints are maintained incrementally through routines related to the arithmetic of integer intervals (distance, intersection, inclusion). For each constraint, the evaluation returns not only whether the constraint is violated following the move, but also the "distance" to feasibility (which can be interpreted as the cost of infeasibility). For example, for a minimum spacing constraint of 10 weeks between two outages, this cost is 8 if the spacing is of 2 weeks after the move. For constraints concerning minimum spacing/maximum overlapping between outages (CT14-CT18), the evaluation for k′ outages impacted by the move is done in O(k′k) time, with k the number of outages involved in the constraint. For resource constraints (CT19), the evaluation for k′ impacted outages takes O(Σ wi) time, with wi the number of weeks during which a resource is immobilized by outage i, the sum ranging over the k′ impacted outages. For constraints bounding the number of overlapping outages during a given week (CT20), the evaluation for k′ impacted outages is done in O(k′) time. For constraints limiting the offline power capacity of T2 plants during a time period (CT21), the evaluation for k′ impacted outages takes O(Σ ti) time, with ti the number of time steps during outage i. Note that in practice our moves are such that the number k′ of impacted outages is small relative to the total number of outages scheduled (k′ = O(1)). As explained in the previous section, the convergence toward combinatorially feasible solutions is sped up by attempting moves on outages inducing violations. The randomized selection of outages inducing violations is made in constant time by maintaining a bipartition of outages, with those inducing violations on the left side and the others on the right side, updated after each evaluation of combinatorial constraints.
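The distance-to-feasibility idea can be illustrated for a minimum-spacing constraint between two outages. This is a sketch in Python; representing outages as (start_week, end_week) intervals is an assumption on our part, not the EDF routine.

```python
def spacing_violation(out_a, out_b, min_gap):
    """Distance to feasibility for a minimum-spacing constraint between
    two outages given as (start_week, end_week) intervals: 0 if
    satisfied, otherwise the number of missing weeks. Illustrative
    integer-interval arithmetic."""
    a0, a1 = out_a
    b0, b1 = out_b
    if a1 < b0:                 # a entirely before b
        gap = b0 - a1 - 1
    elif b1 < a0:               # b entirely before a
        gap = a0 - b1 - 1
    else:                       # overlapping outages: zero spacing
        gap = 0
    return max(0, min_gap - gap)
```

With a minimum spacing of 10 weeks and an actual spacing of 2 weeks, the returned cost is 8, matching the example in the text.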
In addition to the CT14-CT21 combinatorial constraints, we add an implicit constraint, called the "cut of minimum distance between outages", induced by the continuous subproblem: the distance between two consecutive outages k and k + 1 must be large enough to ensure that stocks at outage k + 1 do not exceed maximum levels before and after refueling operations (CT11), even if the fuel reload at outage k is minimal and the production during cycle k is maximal. When an outage is moved, this minimal distance can be reevaluated in O(log d) worst-case time, with d the number of time steps between the outages. For each T2 plant, we compute the cumulated maximum powers of the plant from the beginning to the end of the horizon, for all time steps. Using this data structure, the maximum amount of fuel which can be consumed between any pair t1 ≤ t2 of time steps is obtained in constant time. Then, starting a production cycle with the minimum fuel level (derived from the minimum fuel reload), the resulting fuel level after t time steps of production at maximum power is computed in constant time. Therefore, the minimum number of time steps such that the resulting fuel level does not exceed a given level (CT11) is obtained by dichotomy in O(log d) time in the worst case. In practice, this evaluation is made even faster by caching the output minimum distance with the key (starting time step, starting fuel level) in a hash map. Ultimately, this allows the desired minimum distance to be computed in amortized constant time. Since 80 % of the evaluation time is spent in the continuous part, cutting the evaluation early using this property of minimum distance between outages is crucial for the efficiency of the whole local search.
5.2
Continuous Part
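Before setting refuels and production levels, the evaluation is pruned using the minimum-distance cut described above. A hedged sketch of that computation (a memoized dichotomy over hypothetical prefix-sum data; names and data are ours) is:

```python
from functools import lru_cache

# Hypothetical per-plant data: CUM[t] = maximum energy producible in time
# steps 0..t-1 at maximum power (prefix sums, precomputed once).
CUM = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

def max_consumed(t1, t2):
    """Maximum fuel consumable between steps t1 <= t2, in O(1)."""
    return CUM[t2] - CUM[t1]

@lru_cache(maxsize=None)
def min_distance(start, start_fuel, max_stock):
    """Smallest number of production steps after which the fuel level,
    starting from start_fuel and producing at maximum power, drops to at
    most max_stock: dichotomy in O(log d), memoized on its arguments as
    with the hash-map cache described in the text. Illustrative sketch."""
    lo, hi = 0, len(CUM) - 1 - start
    while lo < hi:
        mid = (lo + hi) // 2
        if start_fuel - max_consumed(start, start + mid) <= max_stock:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

The `lru_cache` decorator plays the role of the (starting time step, starting fuel level) hash map, turning the O(log d) dichotomy into an amortized constant-time lookup on repeated queries.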
Now, assume that the compound move yields a solution which is combinatorially feasible (CT13-CT21 constraints + cuts of minimum distance between outages). The continuous decision variables impacted by the move must be updated so as to ensure feasibility on the continuous part; then, the global cost of the new solution induced by the move must be evaluated. This continuous subproblem, whose objective is (roughly speaking) to minimize T1 completion costs, is solved by an incremental randomized combinatorial algorithm. For each impacted T2 plant, the refuel at outage k is determined randomly between the minimum given in input (CT7) and a maximum reinforced to avoid exceeding the maximum levels before and after refueling operations at outage k + 1 (CT11). The computation of this maximum value is based on the maximal consumable energy emax(s) over the production cycle and the resulting minimal ending fuel level lmin(s) for each scenario s. These values are computed for each scenario through a first pass over the time steps from outage k to outage k + 1. A nuclear plant generally produces at maximal power; otherwise, the plant is said to modulate. Modulation is sometimes necessary when the demand is too low compared to the T2 power capacities. But it can also be used to reduce production costs by transferring T2 power from time steps where T1 costs are cheap to time steps where T1 costs are expensive. The quantity of energy which can be modulated (that is, not produced at maximal power) for each scenario, denoted by emod(s), is obtained as the minimum of the maximum bound
given in input (CT12) and the maximum stock before (resp. after) the refuel at k + 1 minus the minimal ending fuel stock lmin(s). The energy e(s) to consume during the cycle is then determined randomly between emax(s) − emod(s) and emax(s). Finally, through another pass, the power levels between outages k and k + 1 are set so as to consume e(s). This is done randomly from left to right, or guided by the lowest T1 completion costs. In summary, the randomized algorithm for planning production runs in O(T′) time for each scenario, with T′ the number of time steps between outages k and k + 1. Thus, the refuels and the production levels which are impacted by the move are recomputed in O(P′2 T S′) time, with P′2 the number of impacted T2 plants, T the number of time steps over the planning horizon, and S′ the number of considered scenarios. Finally, the global cost of the new solution is evaluated. The most time-consuming task is the evaluation of the T1 completion costs (when T2 plants cannot provide enough power to satisfy demands). For each impacted T2 plant and each time step, the cheapest T1 completion can be computed in O(log(P1 S′)) time, with P1 the number of T1 plants and S′ the number of considered scenarios. Indeed, for a given scenario, one can observe that the cheapest completion cost can be computed by dichotomy in O(log P1) time, using a data structure containing the powers and costs of T1 plants, cumulated according to nondecreasing costs. By merging the S′ structures into a single one, it is possible to perform the computation in O(log(P1 S′)) time for S′ scenarios. In conclusion, the time complexity for evaluating the continuous part (refuels, production levels, global cost) is almost linear in the number of modifications induced by the move on the current solution. This time complexity is critical to the global performance of the local search. With the imposition constraints relaxed, the continuous subproblem can be solved by linear programming.
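The left-to-right pass that sets power levels so as to consume e(s) can be sketched as follows (illustrative only; the variant guided by T1 completion costs and the per-step profile bounds are omitted, and all names are ours):

```python
def spread_energy(energy, p_max_per_step, dt=1.0):
    """Left-to-right pass setting power levels so that exactly `energy`
    is consumed over the cycle: produce at the per-step maximum until
    the remaining energy budget is exhausted. Sketch of one scenario."""
    levels = []
    for p_max in p_max_per_step:
        p = min(p_max, energy / dt)   # never exceed the remaining budget
        levels.append(p)
        energy -= p * dt
    return levels
```

For example, an energy budget of 25 spread over three steps of maximum power 10 yields two full-power steps and one modulated step.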
Some numerical experiments have shown that our combinatorial approximation algorithm is 10 000 times faster than state-of-the-art linear programming solvers (Gurobi Optimizer and IBM ILOG CPLEX), while providing production plans with an optimality gap lower than 0.1 %.
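The merit-order data structure behind the completion-cost dichotomy can be sketched as follows (illustrative Python for a single scenario; `build_merit_order` and `completion_cost` are our names, and T1 plants are reduced to (unit cost, max power) pairs):

```python
from bisect import bisect_left
from itertools import accumulate

def build_merit_order(plants):
    """plants: list of (unit_cost, max_power) pairs for T1 plants.
    Precompute powers and costs cumulated by nondecreasing unit cost,
    enabling O(log P1) completion-cost queries. Illustrative sketch."""
    plants = sorted(plants)                               # by unit cost
    cum_power = list(accumulate(p for _, p in plants))
    cum_cost = list(accumulate(c * p for c, p in plants))
    return plants, cum_power, cum_cost

def completion_cost(structure, demand):
    """Cheapest cost for T1 plants to supply `demand`: fill plants in
    cost order; locate the marginal plant by dichotomy."""
    plants, cum_power, cum_cost = structure
    if demand <= 0:
        return 0.0
    i = bisect_left(cum_power, demand)                    # marginal plant
    if i == len(plants):
        raise ValueError("demand exceeds total T1 capacity")
    full = cum_cost[i - 1] if i > 0 else 0.0
    used = demand - (cum_power[i - 1] if i > 0 else 0.0)
    return full + plants[i][0] * used

# Usage: three plants with unit costs 1, 3, 2 and capacity 10 each
s = build_merit_order([(1, 10), (3, 10), (2, 10)])
```

Merging the per-scenario structures into a single one, as described in the text, is what brings the query down to O(log(P1 S′)) for all considered scenarios at once.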
6
Numerical Experiments
The whole algorithm was implemented in the ISO C++ programming language (C++0x). The resulting program includes nearly 12 000 lines of code, of which 15 % are dedicated to checking the validity of all incremental data structures at each iteration (only active in debug mode). Note that the continuous part of the problem was handled with exact precision using 64-bit integers. The executable was statically compiled on a Linux x64 platform using GCC 4 with full optimization (-O3). Gprof and Valgrind were used for CPU and memory profiling. All statistics and results presented here have been obtained (without parallelization) on a computer equipped with a Linux x64 operating system and an Intel Xeon X5365 chipset (CPU 3 GHz, RAM 4 GB, L2 cache 4 MB, L1 cache 64 kB). Benchmarks are divided into three categories A, B, X, each containing 5 instances. Note that the scale of instances B and X is much larger than that of instances
Table 1. Average results over 10 runs of our local-search heuristic for 10 minutes, 1 hour, and 10 hours of running time, and comparison with the best results found by the other competitors of the Challenge (1 hour)

instance   10 mn          1 h            10 h           best           gap
A01        1.694 804 e11  1.694 780 e11  1.694 748 e11  1.695 383 e11  −0.036 %
A02        1.459 699 e11  1.459 600 e11  1.459 568 e11  1.460 484 e11  −0.061 %
A03        1.543 227 e11  1.543 212 e11  1.543 160 e11  1.544 298 e11  −0.070 %
A04        1.115 163 e11  1.114 966 e11  1.114 940 e11  1.115 913 e11  −0.085 %
A05        1.245 784 e11  1.245 599 e11  1.245 439 e11  1.258 222 e11  −0.989 %
B06        8.413 041 e10  8.387 786 e10  8.379 878 e10  8.342 471 e10  +0.543 %
B07        8.118 554 e10  8.117 563 e10  8.109 972 e10  8.129 041 e10  −0.129 %
B08        8.231 156 e10  8.196 477 e10  8.189 974 e10  8.192 620 e10  +0.047 %
B09        8.219 982 e10  8.175 367 e10  8.168 956 e10  8.261 495 e10  −1.043 %
B10        7.805 363 e10  7.803 998 e10  7.791 096 e10  7.776 702 e10  +0.351 %
X11        7.919 385 e10  7.910 063 e10  7.900 765 e10  7.911 677 e10  −0.020 %
X12        7.760 939 e10  7.760 090 e10  7.756 399 e10  7.763 413 e10  −0.043 %
X13        7.652 986 e10  7.637 339 e10  7.628 852 e10  7.644 920 e10  −0.099 %
X14        7.631 402 e10  7.615 823 e10  7.614 948 e10  7.617 299 e10  −0.019 %
X15        7.444 765 e10  7.439 302 e10  7.438 837 e10  7.510 139 e10  −0.943 %
A. The running time limit was fixed to 1 hour by the organizers of the Challenge. Note that 1.7 GB of RAM are allocated for tackling the largest instances (B8, B9, B10, X13, X14, X15). Since our local-search heuristic is randomized, the results given correspond to averages over 10 runs with different seeds. The variance of the results is low (less than 0.1 %), except for instances B8 and B9 (0.5 %). All the solutions obtained passed EDF's checker. Table 1 contains the results obtained for each instance with 10 minutes, 1 hour, and 10 hours of running time respectively. The convergence of the local search is very fast: on average, 99.9 % of the improvement is achieved in less than 10 minutes. On average, our algorithm attempts more than 20 000 compound moves per minute on the largest instances (which corresponds to more than 100 000 attempted simple moves); 75 % of these compound moves lead to new feasible solutions. The success rate of simple moves during the compound move (improving the number of violations of combinatorial constraints during the reparation) is greater than 20 %, while the success rate of the compound move (improving the cost of the current solution) is nearly 1 %. Consequently, the number of admissible solutions visited within 1 hour is at least one million, while the number of improving solutions is a few thousand. One can observe that despite using a standard descent heuristic, the diversified compound moves allow a large diversification, so that improving solutions are still found after several hours of running time. In Table 1, a comparison is made between the solutions found by our local-search algorithm and the best solutions found among the competitors within 1 hour of running time (computed by the Challenge's organizers). The latter were obtained on a comparable platform: a Linux x64 operating system with an Intel Xeon 5420 chipset (2.67 GHz, RAM 8 GB, L2 cache 6 MB, L1 cache 64 kB). Over the 15 instances, we obtain 12 best solutions.
In fact, the solutions provided by
the team Kuipers-Peekstok, also obtained with a direct local-search heuristic, are very close to ours (average gap lower than 0.1 %). But the average gaps of the solution approaches ranked after the 3rd (resp. 6th) position, mainly based on MIP/CP decompositions, are greater than 1 % (resp. 10 %). Such gaps are considerable here since the economic function (obfuscated in the EDF data) has nine digits [6]. Acknowledgements. We warmly thank our colleague Dr. Bertrand Estellon (LIF Marseille, France), who worked on several topics related to this paper and helped us to perform some numerical experiments during the Challenge.
References

1. Benoist, T., Estellon, B., Gardi, F., Jeanjean, A.: Randomized local search for real-life inventory routing. Transportation Science (2011) (to appear)
2. Estellon, B., Gardi, F., Nouioua, K.: Large neighborhood improvements for solving car sequencing problems. RAIRO Operations Research 40(4), 355–379 (2006)
3. Estellon, B., Gardi, F., Nouioua, K.: Two local search approaches for solving real-life car sequencing problems. European Journal of Operational Research 191(3), 928–944 (2008)
4. Estellon, B., Gardi, F., Nouioua, K.: High-performance local search for task scheduling with human resource allocation. In: Stützle, T., Birattari, M., Hoos, H.H. (eds.) SLS 2009. LNCS, vol. 5752, pp. 1–15. Springer, Heidelberg (2009)
5. Garey, M., Johnson, D. (eds.): Computers and Intractability: A Guide to the Theory of NP-Completeness. Series of Books in the Mathematical Sciences. W.H. Freeman and Company, New York (1979)
6. Gorge, A.: Planification des arrêts pour rechargement des centrales nucléaires. In: JOR 2008, la 3ème Journée de Recherche Opérationnelle et Optimisation dans les Réseaux, Paris, France (May 2008), oral communication
7. Khemmoudj, M.: Modélisation et résolution de systèmes de contraintes : application au problème de placement des arrêts et de la production des réacteurs nucléaires d'EDF. Ph.D. thesis, Université Paris 13 (Paris-Nord), France (2007)
8. Khemmoudj, M., Porcheron, M., Bennaceur, H.: When constraint programming and local search solve the scheduling problem of Électricité de France nuclear power plant outages. In: Benhamou, F. (ed.) CP 2006. LNCS, vol. 4204, pp. 271–283. Springer, Heidelberg (2006)
9. Porcheron, M., Gorge, A., Juan, O., Simovic, T., Dereu, G.: Challenge ROADEF/EURO 2010: a large-scale energy management problem with varied constraints. EDF R&D, Clamart, France, 27 pages (February 2010)
10. Rego, C., Glover, F.: Local search and metaheuristics. In: Gutin, G., Punnen, A. (eds.) The Traveling Salesman Problem and Its Variations. SIAM Monographs on Discrete Mathematics and Applications, vol. 9, pp. 105–109. Kluwer Academic Publishers, Dordrecht (2002)
Multi-start Heuristics for the Two-Echelon Vehicle Routing Problem

Teodor Gabriel Crainic2,3, Simona Mancini1,2, Guido Perboli1,2, and Roberto Tadei1

1 Politecnico di Torino, Turin, Italy
2 CIRRELT, Montreal, Canada
3 School of Management, UQAM, Montreal, Canada
Abstract. In this paper we address the Two-Echelon Vehicle Routing Problem (2E-VRP), an extension of the classical Capacitated VRP in which the delivery from a single depot to the customers is managed by routing and consolidating the freight through intermediate depots called satellites. We present a family of Multi-Start heuristics based on separating the depot-to-satellite transfer from the satellite-to-customer delivery and iteratively solving the two resulting routing subproblems, while adjusting the satellite workloads that link them. All the heuristics follow a common scheme: after an initial solution is found, a local search phase is applied, followed by a diversification; if the newly obtained solution is feasible, local search is applied again, otherwise a feasibility search procedure is applied and, if it is successful, local search is applied to the newfound solution. Different diversification strategies and feasibility search rules are proposed. We present computational results on a wide set of instances with up to 50 customers and 5 satellites and compare them with results from the literature, showing how the new methods outperform previously existing methods in both efficiency and accuracy.
1 Introduction
The transportation of goods plays a crucial role in most economic and social activities taking place in urban areas. For the city inhabitants, it supplies stores and places of work and leisure, delivers goods at home, and so on. For firms established within city limits, it forms a vital link with suppliers and customers. In the past decade, researchers, practitioners, and institutions became aware, alongside the research on green vehicles, of the need to develop new methods and technologies for optimizing how we use the presently available resources, in order to reduce the impact of the different sources of nuisance (traffic congestion, pollution, reduction of the quality of life) without slowing down the economic, social, and cultural development of urban areas. The implementation of this view is known as City Logistics, which introduces a multidisciplinary approach to urban logistics, as well as all the research projects aiming to build sustainable logistic systems that take into account the impact

P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 179–190, 2011.
© Springer-Verlag Berlin Heidelberg 2011
of freight operations on the environment [6]. In this context, in this paper we address the Two-Echelon Vehicle Routing Problem (2E-VRP) [12], which is characterized by a single depot and a set of customers. The delivery of the freight to the customers is not managed by direct shipping from the depot, but by consolidating the freight in intermediate depots, called satellites. The first-level routing problem addresses the depot-to-satellite delivery, while the satellite-to-customer delivery routes are built at the second level. The goal is to ensure an efficient and low-cost operation of the system, where the demand is delivered on time and the total cost of the traffic on the overall transportation network is minimized. This problem is frequently faced in real-life applications, both at the strategic level (long-term planning) and at the operational one (real-time optimization). Methods applied at both levels must be accurate and, at the same time, fast. In fact, in long-term planning the 2E-VRP is often part of a larger simulation framework, which means it must be solved several times during the optimization process. The computational times should therefore be short, while maintaining high accuracy. On the other hand, at the operational level one often faces real-time optimization problems for which a feasible solution is needed with limited computational effort. Our goal is to introduce new methods able to guarantee good accuracy while maintaining high efficiency. In this paper we introduce and compare different heuristics for the 2E-VRP, which are based on separating the first- and second-level routing problems and applying an iterative procedure in which the two resulting subproblems are sequentially solved. We also report extensive computational tests on instances of various sizes and layouts, comparing the newly defined heuristics with the other heuristics available in the literature. More in detail, the paper is organized as follows.
We define the problem in Section 2, while in Section 3 we give a literature review. The methods are presented in Section 4 and we report the computational results and their analysis in Section 5. Conclusions and perspectives are presented in Section 6.
2 Problem Definition
The Two-Echelon Vehicle Routing Problem (2E-VRP) is the two-echelon extension of the Capacitated Vehicle Routing Problem (CVRP); it aims to deliver the freight from the depot to the customers by consolidating the freight through the satellites while minimizing the overall transportation cost [12]. In our model we do not consider the fixed costs of the vehicles, since we suppose they are available in fixed number. Thus, the travel costs are given by the sum of the costs of the arcs, connecting depot, satellites, and customers, used by the vehicles. These costs are of two types:

– costs of the arcs traveled by 1st-level vehicles, i.e., arcs connecting the depot to the satellites and the satellites among themselves;
– costs of the arcs traveled by 2nd-level vehicles, i.e., arcs connecting the satellites to the customers and the customers among themselves.
Let us denote the depot by v0, the set of satellites by Vs, and the set of customers by Vc. Let ns be the number of satellites and nc the number of customers. The depot is the starting point of the freight and the satellites are capacitated. Define the arc (i, j) as the direct route connecting node i to node j, and cij as its associated traveling cost. If both nodes are satellites, or one is the depot and the other is a satellite, we say the arc belongs to the 1st-level network, while if both nodes are customers, or one is a satellite and the other is a customer, the arc belongs to the 2nd-level network. We define a 1st-level route as a route made by a 1st-level vehicle which starts from the depot, serves one or more satellites, and ends at the depot. A 2nd-level route is a route made by a 2nd-level vehicle which starts from a satellite, serves one or more customers, and ends at the same satellite. The freight must be delivered from the depot v0 to the customer set Vc. Let di be the demand of customer i; the demand of each customer cannot be split among different vehicles at the 2nd level. For the first level, we consider that each satellite can be served by more than one 1st-level vehicle, therefore the aggregated freight assigned to each satellite can be split among two or more vehicles. Each 1st-level vehicle can deliver the freight of one or several customers, as well as serve more than one satellite in the same route. The number of 1st-level vehicles available at the depot is m1. These vehicles have the same given capacity K1. The total number of 2nd-level vehicles available is m2. Moreover, each satellite k has a maximum capacity msk, expressed in terms of number of vehicles. The 2nd-level vehicles have the same given capacity K2. No additional limitation on the route size, either in length or in number of visited customers, is introduced.
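The notation above can be collected into a small data container. The following Python sketch is ours, not from the paper; the field and method names are hypothetical, chosen to mirror the symbols v0, Vs, Vc, di, m1, K1, m2, K2, and msk.

```python
from dataclasses import dataclass

@dataclass
class TwoEchelonVRP:
    """Hypothetical container for a 2E-VRP instance, mirroring the text's notation."""
    depot: int          # node index of the depot v0
    satellites: list    # Vs, node indices
    customers: list     # Vc, node indices
    cost: dict          # cost[(i, j)] = c_ij for arcs of both levels
    demand: dict        # demand[i] = d_i for each customer i
    m1: int             # number of 1st-level vehicles
    K1: float           # capacity of each 1st-level vehicle
    m2: int             # number of 2nd-level vehicles
    K2: float           # capacity of each 2nd-level vehicle
    ms: dict            # ms[k] = max number of vehicles at satellite k

    def satellite_load(self, assignment, k):
        """Aggregated freight assigned to satellite k under a
        customer-to-satellite assignment (dict: customer -> satellite)."""
        return sum(self.demand[i] for i in self.customers if assignment[i] == k)
```

The aggregated load of a satellite, computed by `satellite_load`, is exactly the demand that the 1st-level vehicles must deliver to it.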
3 Literature Review
The literature on multi-echelon systems is vast, but it mainly focuses on flow distribution, while routing costs are usually simplified or not explicitly considered at all levels. The problem we address is related to, but different from, the Multi-Echelon Capacitated Location Distribution Problem, in which location and flow assignment are handled, but no first-level depot exists and thus no first-level routing costs are considered. For a complete survey of this problem, the reader can refer to [15]. Concerning exact methods, different formulations and relaxations have been presented in [9], while a compact model and tight bounds have been provided in [1]. A Branch-and-Cut method has been proposed in [3]. For heuristic methods, reference can be made to [2], where the authors developed several heuristics based on hierarchical and non-hierarchical clustering algorithms, while, for metaheuristic methods, we refer to the following papers. In [13], the authors present a two-phase metaheuristic, whose first phase executes a Greedy Randomized Adaptive Search Procedure (GRASP) based on an extended and randomized version of the Clarke and Wright algorithm. This phase is implemented with a learning process on the choice of depots. In a second phase, new solutions are generated by a post-optimization
using path relinking. In [17], the authors propose a simulated annealing with a special solution encoding scheme that integrates location and routing decisions in order to enlarge the search space so that better solutions can be found. In [4], a hybrid heuristic based on a column generation scheme, where the subproblems are solved using a tabu search algorithm, is presented.
4 Heuristics for the 2E-VRP
The customer-to-satellite assignment problem plays a crucial role in solving the 2E-VRP, as remarked by the results in [12] and [11]. In fact, if the optimal customer-satellite assignments were known, the 2E-VRP would be partitioned into at most ns + 1 Capacitated VRP (CVRP) instances: one for the 1st level and one for each satellite with at least one customer assigned. Thus, as in the math-heuristics in [12], we focus directly on the customer-satellite assignments, searching for the optimal assignments and delegating the solution of the corresponding CVRPs to state-of-the-art methods. Even if the literature on the CVRP is vast and efficient methods have been developed for this problem, the computational time spent solving CVRPs can be quite large. Thus, methods involving large neighborhood exploration of the assignments between customers and satellites are not suitable for this problem, because of the computational time needed to analyze each customer-satellite assignment change and its impact on the routing. For this reason, the core of our heuristic is a Multi-Start procedure that iteratively perturbs the solution, combined with a simple local search heuristic able to improve the initial assignment. Moreover, additional rules to prune unpromising assignments, and their corresponding CVRP instances, are adopted. The main steps of our Multi-Start heuristic are the following:

1. First Clustering. An initial solution is computed by assigning each customer to a satellite according to a distance-based greedy rule. A complete solution is then computed by solving the resulting first- and second-level CVRPs.
2. Clustering Improvement. A local search based on a neighborhood which changes one customer-satellite assignment at a time is applied to the solution found by the First Clustering.
3. While the maximum number of iterations is not reached:
   3.1 Multi-Start. Given the best solution found so far, the customer-satellite assignments are perturbed according to rules taking into account the cost of the reassignment.
   3.1.1 If the new solution is not feasible, we try to restore feasibility by means of the Feasibility Search algorithm.
   3.1.2 If the solution is feasible and it is considered promising, i.e., its objective function is within a given percentage threshold of the best solution, the Clustering Improvement phase is applied to it.

In the following, we give a detailed description of the different procedures involved.
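The scheme in steps 1-3 can be sketched as the following Python skeleton. This is our reading of the text, not the authors' implementation: the problem-specific pieces (the perturbation, the Clustering Improvement local search, the Feasibility Search repair, and the cost function) are passed in as callables with names of our choosing.

```python
import random

def multi_start(initial, perturb, improve, is_feasible, repair, cost,
                iterations=100, delta=0.1, rng=None):
    """Skeleton of the Multi-Start scheme (steps 1-3 above); a sketch only."""
    rng = rng or random.Random(0)
    best = improve(initial)                    # steps 1-2: initial solution + local search
    for _ in range(iterations):                # step 3
        cand = perturb(best, rng)              # step 3.1: diversification
        if not is_feasible(cand):              # step 3.1.1: feasibility search
            cand = repair(cand)
            if cand is None or not is_feasible(cand):
                continue                       # discard unrepairable solutions
        if cost(cand) <= (1 + delta) * cost(best):
            cand = improve(cand)               # step 3.1.2: improve promising solutions
        if cost(cand) < cost(best):
            best = cand
    return best
```

The threshold delta plays the role of the percentage threshold of step 3.1.2: local search is spent only on candidates whose cost is within delta of the incumbent.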
4.1 First Clustering
In order to find an initial solution, we developed a clustering-based heuristic, from now on called First Clustering (FC). FC is based on a greedy cost criterion. More in detail, after ordering the customers by non-increasing demand di, the procedure assigns each customer to the satellite with the smallest Euclidean distance. If assigning the customer to a satellite requires an additional vehicle, the procedure checks whether the constraints on the capacity of the satellite or on the overall fleet are violated. If so, the assignment is considered infeasible and the customer is assigned to the second nearest satellite, and so on, until a feasible assignment is found. At the end of this clustering procedure, the customers are assigned to the satellites and the full solution can be computed by solving the first-level CVRP and the second-level CVRPs, one for each satellite with at least one customer assigned to it.
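The FC greedy can be sketched as follows. This is our reconstruction under stated assumptions: a satellite k with load load_k needs ceil(load_k / K2) second-level vehicles, bounded by the satellite capacity sat_cap[k] and the global fleet size m2; all helper names are ours.

```python
from math import ceil

def first_clustering(customers, demand, dist, satellites, sat_cap, K2, m2):
    """Sketch of FC: assign customers, by non-increasing demand, to the
    nearest satellite that keeps the vehicle-count constraints feasible."""
    load = {k: 0 for k in satellites}
    assignment = {}
    for i in sorted(customers, key=lambda i: -demand[i]):
        # try satellites from nearest to farthest
        for k in sorted(satellites, key=lambda k: dist[i][k]):
            new_load = load[k] + demand[i]
            vehicles = {s: ceil(l / K2) for s, l in load.items()}
            vehicles[k] = ceil(new_load / K2)
            # satellite vehicle capacity and global 2nd-level fleet checks
            if vehicles[k] <= sat_cap[k] and sum(vehicles.values()) <= m2:
                assignment[i] = k
                load[k] = new_load
                break
        else:
            raise ValueError(f"no feasible satellite for customer {i}")
    return assignment
```

Given the resulting assignment, the full solution is then obtained by solving one CVRP per used satellite plus the first-level CVRP, as the text describes.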
4.2 Clustering Improvement
Clustering Improvement (CI) aims to improve the customer-satellite assignments by means of a local search approach. The local search is a first-improvement method where the neighborhood solutions are defined by reassigning one customer from its original satellite to another one according to a cost-based rule. More in detail, the rule consists in moving customers from their current satellite to the nearest available one. This simple idea is reasonable because, in the optimal solution, a customer is much more frequently assigned to the nearest or second nearest satellite. Furthermore, this observation holds for every customer distribution and does not depend on the satellite location strategy. Define the current solution as the solution given as input to CI if we are at the first iteration, or the best solution found at the previous iteration otherwise. Then, the neighborhood works as follows. Given the current solution, the customers are sorted in non-decreasing order of the reassignment cost, defined as RCi = cij − cik, where i is a customer, j is the satellite to which i is assigned in the current solution, and k ≠ j is the satellite such that, moving i from satellite j to satellite k, the capacity constraints on the global second-level vehicle fleet and on satellite k are satisfied and the cost cik is minimum among the satellites k ≠ j. This is equivalent to ordering the customers in non-decreasing order of the estimated change in solution quality due to reassigning each customer from its present satellite to its second-best choice. Let CL be the ordered list of the customers.

repeat
  Consider the first customer i in CL;
  if k exists then
    remove i from CL;
  else
    terminate the CI algorithm and return the best solution;
  end if
  Solve the CVRPs of satellites j and k;
  Update the demand of each satellite according to the new assignment and solve the first-level CVRP;
  Compute the objective function of the new solution and compare it to the cost of the current solution;
  if the new solution is better then
    Keep it as the new current solution and exit from the neighborhood;
  else
    if the objective function of the new solution is worse than that of the current solution by more than a fixed percentage threshold γ then
      Terminate the CI algorithm and return the best solution;
    else
      Consider the next customer in the list;
    end if
  end if
until CL is empty

Even if the neighborhood has size O(nc), the computational time can grow due to the need to recompute the CVRPs after a change in the customer-satellite assignments. This is the rationale for the additional heuristic stopping criterion triggered when a reassignment yields an objective function significantly worse than that of the current solution. The worsening of the solution quality is measured by the γ parameter. In fact, since the customers are ordered in non-decreasing order of RCi, and RCi is related to the change in the objective function when a customer is assigned to another satellite, if the objective function of a neighbor deteriorates too much, it is unlikely that the following neighbors may yield an improving solution.
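The ordering step of CI, which drives the pseudocode above, can be sketched as follows. This is our illustration: the capacity checks are abstracted into an assumed `feasible_move(i, k)` callback, and the names are not from the paper.

```python
def reassignment_order(assignment, customers, satellites, dist, feasible_move):
    """Sketch of CI's ordering: for each customer i currently at satellite j,
    RC_i = c_ij - c_ik with k != j the cheapest feasible alternative satellite.
    Returns (customer, target satellite) pairs in non-decreasing order of RC_i."""
    ordered = []
    for i in customers:
        j = assignment[i]
        candidates = [k for k in satellites if k != j and feasible_move(i, k)]
        if not candidates:
            continue  # customer i has no feasible alternative satellite
        k = min(candidates, key=lambda k: dist[i][k])
        ordered.append((dist[i][j] - dist[i][k], i, k))
    ordered.sort()  # non-decreasing RC_i: most promising reassignments first
    return [(i, k) for _, i, k in ordered]
```

A negative RC_i means customer i is already at its cheapest feasible satellite, so such moves come first only to be quickly rejected or to trigger the γ-based early stop.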
4.3 Multi-start Heuristic
Search methods based on local optimization that aspire to find global optima usually require some type of perturbation to overcome local optimality. Without a perturbation phase, such methods can become confined to a small area of the solution space, with very limited chances of finding a global optimum. In recent years, many techniques have been proposed to escape local optima, and Multi-Start strategies are a promising one. They are able to explore different regions of the search space by means of a restart mechanism, and are thus used to guide the construction of new solutions over a long-term horizon of the search process. The general framework, after generating an initial solution, uses a perturbation mechanism to iteratively build new solutions, which are usually improved by a local search approach (but even a more complex heuristic or metaheuristic could be used). For a complete overview of Multi-Start methods, we refer the reader to [10]. In the following, we present our Multi-Start heuristic. The perturbation is performed in the Perturbed Solution Generation procedure by a cost-driven randomized rule, which changes the customer-to-satellite assignments. This perturbation method does not guarantee the feasibility of the obtained
solution, because the satellite capacity or global fleet size constraints may be violated. In this case, a Feasibility Search (FS) procedure is applied to bring the solution back into the feasible region. If the solution is feasible, the Clustering Improvement (CI) procedure presented in Section 4.2 is applied to it to improve the solution quality. In order to limit the computational effort, the local search phase is applied only to the most promising solutions, i.e., the ones whose objective value is better than the current best or within a fixed percentage threshold δ of the objective function of the best solution. The procedure is repeated until a maximum number of iterations has been reached. In the following, we give more detail about the different rules we tested in Perturbed Solution Generation and Feasibility Search.

Perturbed Solution Generation. We present the different rules used to generate perturbed solutions. Both are random-based, where the probability of a change is proportional to an estimate of the cost of reassigning the customer to another satellite. Generally speaking, for each customer i and satellite j, we define a reassignment probability Pij, with Σj Pij = 1. The perturbation is then obtained by considering the customers one after the other and computing the new satellite to which each customer is assigned by a roulette-wheel algorithm based on the probabilities Pij. The two definitions of the probabilities Pij are the following:

– Linear Randomized (LR). The probability Pij is computed as Pij = (1 − cij / Σl cil) / (n − 1). The rule assigns each customer's probabilities in inverse relation to its distance from the satellites. The rationale of this rule, in particular when the number of satellites increases, is to strengthen the effect of the random component. In fact, when the number of satellites n grows, the probabilities tend to become similar. This implies that we find perturbed solutions very far from the initial one, but potentially infeasible or with a very high objective function.
– Majority Prize (MP). The idea of MP is to give a prize, in terms of assignment probability, to the best customer-satellite assignments of each customer, while penalizing the worst ones. For each customer, the probabilities Pij are computed according to LR and the satellites are ordered, for each customer, in non-decreasing order of Pij. Let ji1 and ji2 be the first and the second satellites in the ordered list of customer i. Then, given two constants r ∈ (0, 1) and p ∈ (0.5, 1), the assignment probabilities are: rPij, if j ≠ ji1, ji2; rPij + (1 − r)p, if j = ji1; rPij + (1 − r)(1 − p), if j = ji2.
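The LR rule and the roulette-wheel selection can be sketched in a few lines. This is our reconstruction of the formula as stated in the text (P_ij = (1 − c_ij / Σ_l c_il) / (n − 1), which sums to 1 over the n satellites); the function names are ours.

```python
import random

def lr_probabilities(i, satellites, dist):
    """Linear Randomized rule: probabilities inversely related to distance."""
    n = len(satellites)
    total = sum(dist[i][l] for l in satellites)
    return {j: (1 - dist[i][j] / total) / (n - 1) for j in satellites}

def roulette_wheel(probs, rng):
    """Pick a key of probs with probability probs[key]."""
    r = rng.random()
    acc = 0.0
    for j, p in probs.items():
        acc += p
        if r <= acc:
            return j
    return j  # guard against floating-point round-off
```

With two satellites at distances 1 and 3, for example, the nearer one receives probability 0.75 and the farther one 0.25, and the probabilities flatten toward 1/n as n grows, which is exactly the diversification effect the text describes.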
Feasibility Search. Suppose that after the Perturbed Solution Generation phase we obtain a solution which is infeasible. The aim of the Feasibility Search procedure is to guide the solution back into the feasible region. Thus, if the global fleet size constraint has been violated, we try to move customers from the satellite to which the least-filled vehicle belongs to another satellite, randomly chosen,
in order to free that vehicle. In case of a violation of the satellite capacity, we remove customers from a satellite whose capacity has been exceeded and assign them to another satellite, randomly chosen, until the capacity constraint is fulfilled again. We repeat this for all the satellites in which the constraint has been violated. If the resulting solution is still infeasible, it is discarded. In the following, we present the six different strategies we developed to choose the customers to be moved in order to achieve feasibility:

1. COST. We first move the customers with the highest cost from the satellite;
2. MAX DEM. We first move the customer with the highest demand. This allows us to free a vehicle moving the minimum number of customers;
3. MIN DEM. We move the customers with the lowest demand. The rationale is that the lower the demand of the customer being moved, the easier it is to assign it to another satellite without violating capacity constraints;
4. The remaining three strategies use both the cost-based and demand-based rules, by maximizing the expression α scosti + β di, where α and β are the weights given to the two criteria, scosti indicates the cost between customer i and the satellite to which it has been assigned, and di represents the demand of customer i. According to our tests, the best rules are the following:
   (a) 25C 75D. The parameters are set to α = 0.25 and β = 0.75;
   (b) 50C 50D. The parameters are set to α = 0.5 and β = 0.5;
   (c) 75C 25D. The parameters are set to α = 0.75 and β = 0.25.

These strategies are not applied sequentially. Tests for determining the most performing one among them are presented in Section 5.
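The combined criterion of the last three strategies reduces to a one-line scoring rule. The sketch below is ours; it shows how the weighted expression α scost_i + β d_i selects the next customer to move, and how the pure COST and MAX DEM rules are its extreme cases.

```python
def choose_customer_to_move(candidates, scost, demand, alpha, beta):
    """Pick the customer maximizing alpha * scost_i + beta * d_i (a sketch).
    alpha=1, beta=0 reduces to the COST rule; alpha=0, beta=1 to MAX DEM."""
    return max(candidates, key=lambda i: alpha * scost[i] + beta * demand[i])
```

For instance, with α = 0.5 and β = 0.5 (the 50C 50D rule) a distant low-demand customer can outrank a nearby high-demand one, whereas 25C 75D shifts the choice toward high-demand customers.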
5 Computational Tests
In this section we analyze the behavior of the heuristics proposed above in terms of solution quality and computational efficiency. The computational tests are based on instances with different sizes and layouts. We compare our heuristics, in their best setting, with the other heuristics from the literature, namely the math-heuristics proposed in [12], as well as with the best lower bounds from the literature [11]. We do not explicitly report a comparison with MIP solvers, because they solve exactly only small instances (up to 32 customers and 2 satellites), while the quality of their solutions becomes very poor when instances grow to 50 customers, making them no longer competitive. More details can be found in [12]. All the methods presented in this paper are implemented in C++ and tested on a 2.5 GHz Intel Centrino Duo, while the CVRP instances built by the different procedures are heuristically solved by the Branch-and-Cut method developed in [14], an exact method based on an implicit enumeration of solutions with additional constraints, run with a time limit of 5 seconds. The instances we used cover up to 50 customers and 5 satellites and can be grouped into two sets:
– S1. It contains all the instances of Set 2 in [12]. The set contains 21 instances obtained as extensions of the data sets E-n22-k4, E-n33-k4, and E-n51-k5 for the CVRP introduced in [5]. The cost matrix of each instance is given by the corresponding CVRP instance. The capacity of the 1st-level vehicles is 2.5 times the capacity of the 2nd-level vehicles, to represent cases in which the 1st level is served by trucks and the 2nd level by smaller vehicles (e.g., vehicles with a maximum weight less than 3.5 t). The capacity and the number of the 2nd-level vehicles are equal to those of the vehicles of the corresponding CVRP instance. The satellites are located at the same positions as some randomly chosen customers. The instances range between 21 and 50 customers and consider 2 or 4 satellites.
– S2. Instances taken from [7]. We consider the instances with 50 customers, combining three customer distributions and three satellite location patterns, with 2, 3, and 5 satellites.

Preliminary computational tests on a small subset of S2 were carried out in order to determine the behavior of the different rules used in the Perturbed Solution Generation and Feasibility Search procedures (for the detailed results, see [8]). From the point of view of the perturbation, the Majority Prize rule showed a better behavior, while from the point of view of the Feasibility Search, the best results are given by the rules which linearly combine cost and demand (25C 75D, 50C 50D, 75C 25D). We also performed a tuning of the parameters involved in the different procedures. We do not report the detailed results, but, according to our tests, the best values are the following: δ = 0.1, γ = 0.1, r = 0.5, p = 0.8, ITER = 100.
5.1 Comparison with State-of-the-Art Algorithms
In this section, we compare the results of the First Clustering, Clustering Improvement, and Multi-Start heuristics with the two math-heuristics of [12], the Diving and the Semi-Relaxed heuristics, as well as with the best lower bounds taken from the literature ([12], [11]). Due to the different workstations used, in order to make the computational times comparable we scale the results for the math-heuristics, as well as the lower bounds, to a 2.5 GHz Intel Centrino Duo by means of the SPECINT benchmarks [16]. The results obtained on set S1 are reported in Table 1, which is organized as follows:

– INSTANCE, Cust, Sat. Instance name (E-nx-ky-sa-b-c-d, where x indicates the number of customers, y the maximum number of vehicles, and the letters from a to d the customers at which the satellites are located), number of customers, and number of satellites.
– FC. Objective function and computational time in seconds obtained by the First Clustering.
– CI. Objective function and computational time in seconds obtained by the Clustering Improvement.
Table 1. Computational results for set S1
INSTANCE | Cust | Sat | FC OF / t | CI OF / t | MP/50C_50D OF / t | DIVING OF / t | SEMI OF / t | LIT OF / t | BEST LB
E-n22-k4-s6-17 | 21 | 2 | 424.89 / 0.13 | 424.89 / 1.590 | 417.07 / 16 | 417.07 / 7 | 417.07 / 14 | 417.07 / 21 | 417.07
E-n22-k4-s8-14 | 21 | 2 | 386.36 / 0.14 | 384.96 / 0.456 | 384.96 / 9 | 441.41 / 9 | 408.14 / 7 | 408.14 / 16 | 384.96
E-n22-k4-s9-19 | 21 | 2 | 485.12 / 0.48 | 485.12 / 0.763 | 472.23 / 20 | 472.23 / 6 | 470.60 / 10 | 470.60 / 16 | 470.60
E-n22-k4-s10-14 | 21 | 2 | 375.91 / 0.14 | 375.91 / 0.703 | 375.91 / 7 | 435.92 / 8 | 440.85 / 0.1 | 435.92 / 9 | 371.50
E-n22-k4-s11-12 | 21 | 2 | 453.77 / 0.34 | 453.77 / 1.180 | 444.83 / 15 | 487.45 / 6 | 429.39 / 10 | 429.39 / 16 | 427.22
E-n22-k4-s12-16 | 21 | 2 | 425.65 / 0.16 | 425.65 / 0.887 | 403.79 / 26 | 425.65 / 7 | 439.19 / 8 | 425.65 / 15 | 392.78
E-n33-k4-s1-9 | 32 | 2 | 774.54 / 0.11 | 774.54 / 3.420 | 757.56 / 20 | 772.57 / 29 | 736.92 / 2 | 736.92 / 31 | 730.16
E-n33-k4-s2-13 | 32 | 2 | 745.39 / 0.11 | 745.39 / 2.730 | 733.18 / 25 | 749.94 / 28 | 736.37 / 6 | 736.37 / 34 | 714.63
E-n33-k4-s3-17 | 32 | 2 | 810.83 / 0.25 | 801.21 / 3.400 | 754.65 / 28 | 801.19 / 68 | 739.47 / 5 | 739.47 / 73 | 707.41
E-n33-k4-s4-5 | 32 | 2 | 796.50 / 2.10 | 796.50 / 8.830 | 792.89 / 19 | 838.31 / 18 | 816.59 / 12 | 816.59 / 31 | 778.73
E-n33-k4-s7-25 | 32 | 2 | 775.85 / 0.12 | 756.88 / 1.880 | 756.88 / 15 | 756.88 / 18 | 756.88 / 42 | 756.88 / 59 | 756.84
E-n33-k4-s14-22 | 32 | 2 | 833.30 / 0.17 | 825.06 / 2.600 | 824.60 / 16 | 779.06 / 13 | 779.06 / 4 | 779.06 / 17 | 779.05
E-n51-k5-s2-17 | 50 | 2 | 614.17 / 0.24 | 614.17 / 0.566 | 614.17 / 12 | 666.83 / 75 | 628.53 / 567 | 628.53 / 641 | 576.97
E-n51-k5-s4-46 | 50 | 2 | 544.70 / 2.60 | 533.83 / 2.790 | 533.83 / 46 | 543.24 / 72 | 534.04 / 257 | 534.04 / 329 | 529.34
E-n51-k5-s6-12 | 50 | 2 | 562.21 / 0.27 | 559.00 / 0.586 | 564.92 / 32 | 560.22 / 69 | 554.80 / 60 | 554.80 / 130 | 541.17
E-n51-k5-s11-19 | 50 | 2 | 612.14 / 0.27 | 597.90 / 0.567 | 597.90 / 19 | 584.09 / 49 | 592.06 / 247 | 584.09 / 296 | 558.27
E-n51-k5-s27-47 | 50 | 2 | 553.77 / 0.23 | 553.77 / 0.352 | 553.77 / 17 | 538.20 / 85 | 538.20 / 224 | 538.20 / 310 | 535.04
E-n51-k5-s32-37 | 50 | 2 | 558.48 / 0.15 | 558.48 / 0.404 | 555.05 / 33 | 584.59 / 84 | 587.12 / 557 | 584.59 / 640 | 552.27
E-n51-k5-s2-4-17-46 | 50 | 4 | 566.60 / 0.12 | 565.00 / 0.138 | 565.00 / 5 | 590.63 / 280 | 542.37 / 1057 | 542.37 / 1338 | 515.75
E-n51-k5-s6-12-32-37 | 50 | 4 | 573.01 / 0.28 | 567.00 / 0.560 | 567.00 / 6 | 571.80 / 112 | 584.88 / 936 | 571.80 / 1048 | 516.02
E-n51-k5-s11-19-27-47 | 50 | 4 | 618.52 / 0.20 | 600.00 / 0.640 | 600.00 / 3 | 724.09 / 118 | 724.09 / 555 | 724.09 / 673 | 511.09
SUM / AVG TIME | | | 12491.71 / 0.41 | 12399.03 / 1.67 | 12270.19 / 18.52 | 12741.37 / 55.30 | 12456.62 / 218.16 | 12414.57 / 273.46 | 11766.86
IMPROVEMENT(CI) | | | | | 1.04% | | | |
GAP(LIT) | | | 0.62% | -0.13% | -1.16% | | | |
– MP/50C 50D. For the best version of the Multi-Start (50C 50D), we give the objective function and computational time.
– DIVING and SEMI. The results of the state-of-the-art algorithms, namely the Diving and the Semi-Relaxed heuristics of [12].
– LIT. Objective function and computational time of the composite heuristic obtained by taking the best between the Diving and Semi-Relaxed heuristics.
– BEST LB. The best lower bound computed for each instance ([12], [11]).

Values in bold correspond to optimal solutions. For each method we report the single values for each instance. The last three rows give a summary of the results of each method, providing the sum of the objective functions, the average computational time, the percentage improvement with respect to CI, and the percentage gap with respect to the results from the literature. The overall best of each instance is emphasized; if it has been obtained by two or more methods, we consider as overall best the one obtained within the lower computational time. As far as the analysis of set S1 is concerned, it can be noticed that the different versions of the Multi-Start heuristic perform appreciably better than DIVING (by around 4%) and SEMI (by around 2%) with a smaller computational effort. Even CI outperforms the math-heuristics by 2.97% and 0.75%, respectively, with a reduction of the computational effort of two orders of magnitude. If we compare our results with the composite heuristic which considers the best of the two math-heuristics, the Multi-Start heuristics still improve by more than 1%. Furthermore, if we consider the results instance by instance, we notice that our heuristics reach the overall best in 59% of the cases, with an average improvement over the literature of 2.63%. Tables reporting the results obtained on S2 can be found in [8].
All our Multi-Start methods perform appreciably better than Diving (by more than 3%) and Semi-Relaxed (by more than 1%) in considerably smaller computational times. Compared with the best solution from the literature, the Multi-Start procedures obtain very similar results within a computational time one order of magnitude lower. The overall best is reached in 53% of the cases, yielding an average improvement over the literature of 3.44%.
6 Conclusions
In this paper, we presented a family of Multi-Start heuristics for the Two-Echelon Vehicle Routing Problem, a recently defined multi-echelon variant of the classical CVRP. The experimental results have shown that they all perform well, particularly considering the very limited computational effort required by our algorithms, and that they are more efficient than the other heuristic methods from the literature. The computational results also show the very good performance of our local search approach and the good quality of the initial solution computation method. Future developments will address larger instances and metaheuristic frameworks working on neighborhoods directly based on the customer positioning
190
T.G. Crainic et al.
inside the routes, instead of acting on the assignments. For a more detailed discussion of results we refer to [8].
References
1. Albareda-Sambola, M., Diaz, J., Fernandez, E.: A compact model and tight bounds for a combined location-routing problem. Computers & Operations Research 32, 407–428 (2005)
2. Barreto, S., Ferreira, C., Paixao, J., Souza Santos, B.: Using clustering analysis in a capacitated location-routing problem. European Journal of Operational Research 179, 968–977 (2007)
3. Belenguer, J., Benavent, E., Prins, C., Prodhon, C.: A branch-and-cut method for the capacitated location-routing problem. Computers & Operations Research 38(6), 931–941 (2010)
4. Boulanger, C., Semet, F.: A column generation based heuristic for the capacitated location-routing problem. In: EU/MEeting (2008)
5. Christofides, N., Mingozzi, A., Toth, P.: The Vehicle Routing Problem. In: Christofides, N., Mingozzi, A., Toth, P., Sandi, C. (eds.) Combinatorial Optimization, pp. 315–338. John Wiley, New York (1979)
6. Crainic, T.G., Ricciardi, N., Storchi, G.: Models for evaluating and planning city logistics systems. Transportation Science 43, 432–454 (2009)
7. Crainic, T.G., Mancini, S., Perboli, G., Tadei, R.: Two-echelon vehicle routing problem: A satellite location analysis. PROCEDIA Social and Behavioral Sciences 2, 5944–5955 (2010)
8. Crainic, T.G., Mancini, S., Perboli, G., Tadei, R.: Multi-start heuristics for the two-echelon vehicle routing problems. Tech. Rep. CIRRELT-2010-30, CIRRELT (2010)
9. Gendron, B., Semet, F.: Formulations and relaxations for a multi-echelon capacitated location-distribution problem. Computers & Operations Research 36, 1335–1355 (2009)
10. Marti, R.: Multi-Start Methods. In: Handbook of Metaheuristics, vol. 57, pp. 355–368. Springer, New York (2003)
11. Perboli, G., Tadei, R., Masoero, F.: New families of valid inequalities for the two-echelon vehicle routing problem. Electronic Notes on Discrete Mathematics (2010) (forthcoming), 10.1016/J.ENDM.2010.05.081
12. Perboli, G., Tadei, R., Vigo, D.: The two-echelon capacitated vehicle routing problem. Publication CIRRELT-2008-55, CIRRELT, Montréal, Canada (2008); and Transportation Science (forthcoming)
13. Prins, C., Prodhon, C., Wolfer-Calvo, R.: Solving the capacitated location-routing problem by a GRASP complemented by a learning process and a path relinking. 4OR 4, 221–238 (2006)
14. Ralphs, T.K.: Parallel branch and cut for capacitated vehicle routing. Parallel Computing 29, 607–629 (2003)
15. Salhi, S., Nagy, G.: Location-routing: Issues, models and methods. European Journal of Operational Research 177, 649–672 (2007)
16. Standard Performance Evaluation Corporation: SPEC CPU2006 benchmarks (2006), http://www.spec.org/cpu2006/results/
17. Yu, V., Lin, S., Lee, W., Ting, C.: A simulated annealing heuristic for the capacitated location routing problem. Computers & Industrial Engineering 58, 288–299 (2010)
NILS: A Neutrality-Based Iterated Local Search and Its Application to Flowshop Scheduling

Marie-Eléonore Marmion 1,2, Clarisse Dhaenens 1,2, Laetitia Jourdan 2, Arnaud Liefooghe 1,2, and Sébastien Verel 2,3

1 Université Lille 1, LIFL – CNRS, France
2 INRIA Lille-Nord Europe, France
3 University of Nice Sophia Antipolis – CNRS, France

[email protected]
Abstract. This paper presents a new methodology that exploits specific characteristics of the fitness landscape. In particular, we are interested in the property of neutrality, which refers to the fact that the same fitness value is assigned to numerous solutions of the search space. Many combinatorial optimization problems share this property, which is generally very inhibiting for local search algorithms. A neutrality-based iterated local search, which allows neutral walks to move on the plateaus, is proposed and evaluated on a permutation flowshop scheduling problem with the aim of minimizing the makespan. Our experiments show that the proposed approach is able to find improved solutions compared with a classical iterated local search. Moreover, the tradeoff between the exploitation of neutrality and the exploration of new parts of the search space is analyzed in depth.
P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 191–202, 2011. © Springer-Verlag Berlin Heidelberg 2011

1 Motivations

Many problems from combinatorial optimization, and in particular from scheduling, present a high degree of neutrality. Such a property means that many different solutions have the same fitness value. This is a critical situation for local search techniques (although not the only one), since it becomes difficult to find a way to reach optimal solutions. However, up to now, this neutrality property has been under-exploited in the design of efficient search methods. Barnett [1] proposes a heuristic (the Netcrawler process), adapted to neutral landscapes, that consists of a random neutral walk with a mutation mode adapted to local neutrality. The per-sequence mutation rate is optimized to jump from one neutral network to another. Stewart [2] proposes an Extrema Selection for evolutionary optimization in order to find good solutions in a neutral search space. The selection aims at accelerating the evolution during the search process once most solutions of the population have reached the same level of performance. An endogenous performance is assigned to each solution during the selection step, in order to explore the search space area with the same performance more broadly, and to reach solutions with better fitness values. Verel et al. [3] propose a new approach, the Scuba search, which has been tested on Max-SAT problems and on
NK-landscapes with neutrality. This local search exploits the evolvability (the ability of random variations to reach improving solutions) of neutral networks. On neutral plateaus, the search is guided by the evolvability of solutions. When there is no neighboring solution with a higher evolvability on the plateau, a fitness "jump" is performed. Knowing that the permutation flowshop scheduling problem (PFSP) of minimizing the makespan is NP-hard in the general case [4], a large number of metaheuristics have been proposed so far for its resolution. Recently, 25 methods were tested and their performance compared by Ruiz and Maroto [5]. The Iterated Local Search of Stützle [6] has been shown to be one of the most efficient algorithms to solve Taillard's FSP instances [7]. More recently, Ruiz and Stützle [8] proposed a simple variant of Stützle's ILS based on greedy mechanisms. The perturbation is made of a destruction and a construction phase: jobs are removed from the solution and then re-inserted in order to get a new configuration that yields the best possible fitness value. Given that the PFSP is known to have a high degree of neutrality with respect to the insertion neighborhood operator [9], we can assume that many moves which do not change the fitness value are allowed with such a perturbation strategy. Furthermore, the acceptance criterion used in [8] has the particularity of accepting equivalent, and thus neutral, solutions. We argue that this neutrality property could be used more explicitly, and in a simpler way. This paper does not aim at providing a new heuristic that produces new best-known solutions for a given scheduling problem. The goal of this preliminary work is to give a minimal, yet efficient approach based on Iterated Local Search (ILS), which exploits the neutrality property of the search space in a novel way.
The approach explicitly balances the search between the exploitation of the plateaus in the landscape and the exploration of new parts of the search space. Two main questions are addressed in this paper: (i) What is the performance of this neutrality-based approach in solving a difficult scheduling problem where neutrality arises? (ii) What are the costs and the benefits of such an exploitation? A Neutrality-based Iterated Local Search (NILS) that performs neutral walks along the search is proposed. The performance and the dynamics of NILS are analyzed in depth on the PFSP. The paper is organized as follows. Section 2 is dedicated to the flowshop scheduling problem, to the required definitions of neutral fitness landscapes, and to the main principles of iterated local search. Section 3 presents the Neutrality-based Iterated Local Search (NILS) proposed in the paper. Section 4 is devoted to the analysis of the efficiency of NILS in solving the PFSP. Finally, the last section concludes the paper and gives suggestions for further research.
2 Background

2.1 The Permutation Flowshop Scheduling Problem
The Flowshop Scheduling Problem is one of the most investigated scheduling problems in the literature. The problem consists in scheduling N jobs
{J1, J2, ..., JN} on M machines {M1, M2, ..., MM}. Machines are critical resources, i.e. two jobs cannot be assigned to the same machine at the same time. A job Ji is composed of M tasks {ti1, ti2, ..., tiM}, where tij is the j-th task of Ji, requiring machine Mj. A processing time pij is associated with each task tij. We here focus on the Permutation Flowshop Scheduling Problem (PFSP), where the operating sequences of the jobs are identical for every machine. As a consequence, a feasible solution can be represented by a permutation πN of size N (determining the position of each job within the ordered sequence), and the size of the search space is then |S| = N!. In this study, we consider the minimization of the makespan, i.e. the maximum completion time, as the objective function. Let Cij be the completion date of task tij; the makespan can then be defined as: Cmax = max_{i ∈ {1,...,N}} CiM.
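The makespan follows the usual recurrence Cij = max(C(i-1)j, Ci(j-1)) + pij along the permutation: a task starts as soon as both its machine and its job predecessor are free. A minimal Python sketch (illustrative only; the function name makespan and the toy instance are ours, not the authors' code):

```python
def makespan(perm, p):
    """Cmax of a job permutation for the PFSP.

    perm -- sequence of job indices
    p    -- p[i][j] is the processing time of job i on machine j
    C[j] holds the completion time of the last scheduled task on machine j.
    """
    C = [0] * len(p[0])
    for i in perm:
        C[0] += p[i][0]                       # machine 1 processes jobs back to back
        for j in range(1, len(C)):
            C[j] = max(C[j], C[j - 1]) + p[i][j]
    return C[-1]

# Two jobs on two machines, checked by hand: scheduling job 1 first
# overlaps the machines better and yields a smaller makespan.
p = [[3, 2], [2, 4]]
```

On this toy instance, makespan((0, 1), p) = 9 while makespan((1, 0), p) = 8, so the job order matters even for two jobs.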
2.2 Neighborhood and Local Search
The design of local search algorithms requires a proper definition of the neighborhood structure for the problem under consideration. A neighborhood structure is a mapping function N: S → 2^S that assigns a set of solutions N(s) ⊂ S to any feasible solution s ∈ S. N(s) is called the neighborhood of s, and a solution s′ ∈ N(s) is called a neighbor of s. A neighbor of a solution s results from the application of a move operator performing a small perturbation to s. The choice of the neighborhood is a key issue for local search efficiency. A solution s* is a local optimum iff no neighbor has a better fitness value: ∀s ∈ N(s*), f(s*) ≤ f(s). For the PFSP, we consider the insertion operator, known to be one of the best-performing neighborhood structures for the PFSP [6,10]. It can be defined as follows. Let us consider an arbitrary solution, represented here by a permutation of size N (the number of jobs). A job located at position i is inserted at position j ≠ i, and the jobs located between positions i and j are shifted. The number of distinct neighbors per solution is then (N − 1)^2 [6], where N stands for the size of the permutation.
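The insertion move and the (N − 1)^2 count can be checked with a short sketch (illustrative code with our own naming, not the authors' implementation):

```python
def insertion_neighbors(perm):
    """Yield every insertion move: remove the job at position i and
    re-insert it at position j != i (jobs in between are shifted)."""
    n = len(perm)
    for i in range(n):
        rest = perm[:i] + perm[i + 1:]
        for j in range(n):
            if j != i:
                yield rest[:j] + (perm[i],) + rest[j:]

# N(N-1) moves are generated, but moving a job one position left and
# moving its neighbor one position right produce the same adjacent swap,
# so only (N-1)^2 of the resulting permutations are distinct.
distinct = set(insertion_neighbors((0, 1, 2, 3)))
```

Here N = 4, and len(distinct) is 9 = (4 − 1)^2, as stated above.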
2.3 Neutral Fitness Landscape
A fitness landscape [11] can be defined by a triplet (S, N, f), where S is a set of admissible solutions (i.e. the search space), N: S → 2^S is a neighborhood structure, and f: S → R is a fitness function that can be pictured as the height of the corresponding solutions. A neutral neighbor is a neighboring solution with the same fitness value, and the set of neutral neighbors of a solution s ∈ S is then Nn(s) = {s′ ∈ N(s) | f(s′) = f(s)}. The neutral degree of a given solution is the number of neutral solutions in its neighborhood. A fitness landscape is said to be neutral if there are many solutions with a high neutral degree. A neutral fitness landscape can be pictured as a landscape with many plateaus. A plateau P is a set of pairwise neighboring solutions with the same fitness value: ∀s ∈ P, ∃s′ ∈ P such that s′ ∈ N(s) and f(s′) = f(s). A portal in a plateau is a solution that has at least one
neighbor with a better fitness value, i.e. a lower fitness value in a minimization context. Notice that a local optimum can still have some neutral neighbors. A plateau which contains a local optimum is called a Local Optima Plateau (LOP). The average or the distribution of neutral degrees over the landscape may be used to assess the level of neutrality of a problem. This measure plays an important role in the dynamics of local search algorithms [12,13]. When the fitness landscape is neutral, its main features can be described by its LOPs, which may be sampled by neutral walks. A neutral walk Wneut = (s0, s1, ..., sm) from s to s′ is a sequence of solutions belonging to S where s0 = s, sm = s′, and for all i ∈ {0, ..., m − 1}, si+1 ∈ Nn(si). Hence, to escape from a LOP, a heuristic method has to perform a neutral walk in order to find a portal.
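These definitions translate directly into code. The sketch below uses a deliberately artificial fitness with plateaus — the position of job 0 in the permutation, to be minimized — so the helper names and the toy fitness are ours, not part of the PFSP:

```python
def insertion_neighbors(perm):
    """Insertion neighborhood: remove the job at position i, re-insert at j != i."""
    n = len(perm)
    for i in range(n):
        rest = perm[:i] + perm[i + 1:]
        for j in range(n):
            if j != i:
                yield rest[:j] + (perm[i],) + rest[j:]

def neutral_neighbors(s, f, neighbors):
    """Nn(s): the neighbors sharing the fitness value of s."""
    return {t for t in neighbors(s) if f(t) == f(s)}

def is_portal(s, f, neighbors):
    """A portal has at least one strictly better (here: lower) neighbor."""
    return any(f(t) < f(s) for t in neighbors(s))

# Toy fitness with neutrality: the position of job 0 (to be minimized).
f = lambda s: s.index(0)
s = (1, 2, 0, 3)          # fitness 2
```

Here neutral_neighbors(s, f, insertion_neighbors) contains the single permutation (2, 1, 0, 3), and s is a portal, since (0, 1, 2, 3) is a strictly better neighbor.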
2.4 Neutrality in the PFSP
In scheduling problems, and in particular in flowshop scheduling problems, it is well known that several solutions may have the same fitness value, i.e. the neutral degree of feasible solutions is not null. This has been confirmed in a recent work that analyzes the neutrality of Taillard's PFSP instances [9]. It has been shown that neutrality increases when the number of jobs and/or the number of machines increases. Moreover, this study revealed that very few local optima have no neutral neighbor, and that the local optima plateaus are numerous and quite large. Experiments highlighted that most random neutral walks on a LOP are able to find a portal, i.e. numerous portals exist. This indicates that exploring a plateau can lead to a more interesting solution. These observations lead us to make proposals on how to exploit this neutrality in order to guide the search more efficiently. More precisely, we study how neutrality can be exploited within an iterated local search algorithm.
2.5 Iterated Local Search
Iterated Local Search (ILS) [14] is a simple and powerful heuristic methodology that applies a local move iteratively to a single solution. In order to design an ILS, four elements have to be defined: (i) an initialization strategy, (ii) a (basic) local search, (iii) a perturbation, and (iv) an acceptance criterion. The initialization strategy generates one solution from the search space. The local search leads the current solution to a local optimum. The perturbation process modifies a solution in a different way than the neighborhood relation, in order to jump across the landscape. The acceptance criterion determines whether the local optimum should be kept for further treatment. Let us remark that both the perturbation process and the acceptance criterion may be based on a history of solutions found during the search. Thus, during the search, if the current optimum is not satisfying, the search process is able to restart from a better local optimum found in a previous iteration. The search is stopped once a termination condition is satisfied. The general scheme of ILS, given in [14], is as follows: (i) after generating an initial solution, the local search finds a local optimum; until the termination
condition is met, (ii) the current solution is perturbed, (iii) the local search is applied to find another local optimum, and (iv) this local optimum is accepted or not according to the acceptance criterion. The best solution found along the search is returned as the output of the algorithm.
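The four-component scheme can be summarized in a generic skeleton (a sketch of the ILS template of [14], with our own function names), demonstrated here on a toy one-dimensional minimization problem rather than on the PFSP:

```python
import random

def iterated_local_search(init, local_search, perturb, accept, f, iterations):
    """Generic ILS: (i) local search from an initial solution, then repeat
    (ii) perturbation, (iii) local search, (iv) acceptance criterion."""
    s = local_search(init())
    best = s
    for _ in range(iterations):
        candidate = local_search(perturb(s))
        s = accept(s, candidate)              # which local optimum to keep
        if f(s) < f(best):
            best = s                          # best solution found along the search
    return best

# Toy instance: minimize f over the integers with a +/-1 neighborhood.
f = lambda x: (x - 7) ** 2
def hill_climb(x):
    while f(x - 1) < f(x) or f(x + 1) < f(x):
        x = x - 1 if f(x - 1) < f(x) else x + 1
    return x

best = iterated_local_search(
    init=lambda: 50,
    local_search=hill_climb,
    perturb=lambda x: x + random.randint(-5, 5),
    accept=lambda s, t: t if f(t) <= f(s) else s,
    f=f,
    iterations=10,
)
```

On this convex toy landscape the very first descent already reaches the global optimum, so best is always 7; the perturbation and acceptance components only matter on multimodal landscapes.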
3 Neutrality-Based Iterated Local Search
In this section, we propose an ILS based on neutrality, called Neutrality-based Iterated Local Search (NILS). We first give the local search and the perturbation strategy used at every NILS iteration, and then we discuss the general principles of NILS.

3.1 Local Search: First-Improving Hill-Climbing
NILS is based on the iteration of a First-Improving Hill-Climbing (FIHC) algorithm, where the current solution is replaced by the first encountered neighbor that strictly improves it. Each neighbor is explored only once, and the neighborhood is evaluated in a random order. FIHC stops on a local optimum.

3.2 Perturbation: Neutral Walk-Based Perturbation
In general, there exist two possible ways of escaping from a local optimum in a neutral fitness landscape: either performing neutral moves on the Local Optima Plateau (LOP) until finding a portal, or performing a kick move, i.e. a 'large step' move. When a neutral move is applied, it is assumed that the exploitation of the neutral properties helps to find a better solution. On the contrary, when a kick move is applied, it is supposed that portals are rare, and that the exploration of another part of the search space is more promising. The proposed Neutral Walk-based Perturbation (NWP), given in Algorithm 1, deals with this tradeoff between exploitation and exploration of the LOP. First, NWP performs a random neutral walk on the LOP. The maximum number of allowed neutral steps on the LOP is tuned by a parameter denoted MNS (Maximal Number of Steps). Along the neutral walk, as soon as a better solution (a portal) is found, the neutral walk is stopped and the current solution is replaced by this neighboring solution. Otherwise, if the neutral walk does not find any portal, the solution is kicked. The kick move corresponds to a large modification of the solution. As in FIHC, each neighbor is explored only once, and the neighborhood is evaluated in a random order.

3.3 NILS: A Neutrality-Based ILS
NILS is based on the FIHC local search and the NWP perturbation scheme. Its acceptance criterion always accepts the current solution. After the initialization of the solution and the first execution of FIHC, NILS iterates two phases: (i) a neutral phase which performs neutral moves on the LOP, and (ii) a strictly improving phase. The neutrality of the problem is taken into account during phase (i).
Algorithm 1. Neutral Walk-based Perturbation (NWP)
  step ← 0, better ← false
  while step < MNS and not better and |Nn(s)| > 0 do
    choose s′ ∈ N(s) such that f(s′) ≤ f(s)
    if f(s′) < f(s) then
      better ← true
    end if
    s ← s′
    step ← step + 1
  end while
  if not better then
    s ← kick(s)
  end if
Indeed, the neutral walk on the LOP is used to cross a large part of the search space. When the density of portals on plateaus is low, the MNS parameter prevents the walk from spreading over the plateau unnecessarily. In such a case, a "restart" technique is used: the solution is kicked to escape from its neighborhood. Phase (ii) strictly improves the current solution using the neighborhood operator under consideration. The neutral phase makes it possible to exploit the neutrality of the search space from the local optima by visiting the corresponding LOP. Thus, a LOP is considered as a large frontier towards a better local optimum. When a portal is found, the FIHC algorithm is executed in order to reach this local optimum. However, when the frontier is too large, such a local optimum can be difficult to reach. Thus, the NILS algorithm restarts its search from another part of the search space, expecting that the next LOP will be easier to cross. As a consequence, the tradeoff between exploitation and exploration of the plateaus is directly tuned by the single NILS parameter: the MNS value.
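Putting FIHC, NWP and the kick together gives the following minimal NILS sketch on a tiny random PFSP instance. This is illustrative only: the function names, the 6-job 3-machine instance and the iteration budget are ours, and the real algorithm additionally meters a global evaluation budget.

```python
import random

def makespan(perm, p):
    """Cmax of a permutation; p[i][j] = processing time of job i on machine j."""
    C = [0] * len(p[0])
    for i in perm:
        C[0] += p[i][0]
        for j in range(1, len(C)):
            C[j] = max(C[j], C[j - 1]) + p[i][j]
    return C[-1]

def insertion_neighbors(perm):
    """Remove the job at position i, re-insert it at position j != i."""
    n = len(perm)
    for i in range(n):
        rest = perm[:i] + perm[i + 1:]
        for j in range(n):
            if j != i:
                yield rest[:j] + (perm[i],) + rest[j:]

def fihc(s, f):
    """First-Improving Hill-Climbing: take the first strictly improving
    neighbor, scanning the neighborhood in a random order."""
    improved = True
    while improved:
        improved = False
        nbs = list(insertion_neighbors(s))
        random.shuffle(nbs)
        for t in nbs:
            if f(t) < f(s):
                s, improved = t, True
                break
    return s                                  # a local optimum

def kick(s, exchanges=3):
    """Large-step move: a few random exchange (swap) moves."""
    s = list(s)
    for _ in range(exchanges):
        i, j = random.sample(range(len(s)), 2)
        s[i], s[j] = s[j], s[i]
    return tuple(s)

def nwp(s, f, mns):
    """Algorithm 1: neutral walk of at most MNS steps; kick if no portal."""
    step, better = 0, False
    while step < mns and not better:
        nbs = list(insertion_neighbors(s))
        random.shuffle(nbs)
        t = next((t for t in nbs if f(t) <= f(s)), None)
        if t is None:                         # no neutral or improving neighbor
            break
        better = f(t) < f(s)                  # portal found?
        s, step = t, step + 1
    return s if better else kick(s)

def nils(p, mns, iterations):
    f = lambda s: makespan(s, p)
    s = fihc(tuple(random.sample(range(len(p)), len(p))), f)
    best = s
    for _ in range(iterations):
        s = fihc(nwp(s, f, mns), f)
        if f(s) < f(best):
            best = s
    return best, f(best)

random.seed(1)
p = [[random.randint(1, 9) for _ in range(3)] for _ in range(6)]  # 6 jobs, 3 machines
best_perm, best_val = nils(p, mns=5, iterations=20)
```

Because NILS only records FIHC outputs, best_perm is always a local optimum for the insertion neighborhood, whatever the random seed.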
4 Neutrality-Based Iterated Local Search for the Permutation Flowshop Scheduling Problem

4.1 Experimental Design
Experiments are driven using a set of benchmark instances originally proposed by Taillard [7] for the flowshop scheduling problem, and widely used in the literature [6,10]. We investigate different values for the number of machines M ∈ {5, 10, 20} and for the number of jobs N ∈ {20, 50, 100, 200, 500}. The processing time tij of job i ∈ N and machine j ∈ M is an integer value generated randomly, according to a uniform distribution U(0, 99). For each problem size (N × M), 10 instances are available. Note that, as mentioned on Taillard's website (http://mistic.heig-vd.ch/taillard/problemes.dir/ordonnancement.dir/ordonnancement.html), very few instances with 20 machines have been solved to optimality. For 5- and 10-machine instances, optimal solutions have been found, requiring for some of them a very long computational time. Hence, the number of machines seems to be a strong determinant of the problem difficulty. That is the reason why the results of the paper are presented separately for each number of machines. The performance of the NILS algorithm is evaluated on the first Taillard instance of each size, given by the number of jobs (N) and the number of machines (M). For each size, several MNS values are investigated:
– For N = 20, MNS ∈ {0, 10, 20, 50, 100}
– For N = 50, MNS ∈ {0, 10, 50, 70, 100, 150, 300, 900, 1800}
– For N = 100, MNS ∈ {0, 50, 100, 300, 1200, 5000, 10000, 20000, 40000}
– For N = 200, MNS ∈ {0, 100, 200, 500, 1000, 4000, 8000, 30000, 60000}
– For N = 500, MNS ∈ {0, 250, 500, 1000, 5000, 10000, 15000, 20000, 40000}
NILS is tested 30 times for each MNS value and each problem size. Let us remark that when MNS = 0, NILS corresponds to an ILS algorithm without any neutrality consideration. Hence, we are able to compare an ILS that uses neutrality to a classical one. As the distribution of the 30 final fitness values is not symmetrical, the average and the standard deviation are less statistically meaningful. Therefore, for each problem size and each MNS value, the median and the boxplot of the 30 fitness values of the solutions found are presented. For the other statistical quantities, the average value and the standard deviation over the 30 executions are computed. In order to give every configuration enough time to converge, the NILS termination condition is given in terms of a maximal number of evaluations, set to 2·10^7. The insertion operator (Section 2.2) defines the neighborhood used in the first-improving hill-climbing (FIHC) and in the random neutral walk (NWP). In the NILS perturbation scheme, the solution is kicked when the neutral walk does not find any portal. In this study, the kick move corresponds to 3 randomly chosen exchange moves. As empirically shown in [10], this kick is instance-dependent. However, as our work does not attempt to set the best parameters for the kick move, we chose a value that obtains reasonably good performance for all instances.

4.2 Experimental Results
This section examines the benefit of taking neutrality into account in the search process. We denote by NILSx the NILS algorithm with the parameter setting MNS = x. Results obtained with MNS = 0 and MNS > 0 are compared with each other. Table 1 presents, for each problem size (M × N), the fitness value of the best solution found in comparison with the best known solution from the literature. Recall that a minimization problem is under consideration. The medians of the 30 fitness values found by NILS0 and by NILS with the largest tested MNS value (NILSmax) are also given. For 5-machine instances, the best solution is reached by every NILS configuration. This means that using the neutral walk is as interesting as the classical ILS model. The same conclusion can be drawn for all 20-job instances. For the remaining instances, the performance is quite far from the best known results from the literature. However,
Table 1. NILS performance according to the number of jobs (N) and the number of machines (M). The best known solution from the literature is given. NILS* gives the best performance found over all MNS parameter values. The median of the 30 fitness values is computed for each MNS value: NILS0 gives the median for MNS = 0, and NILSmax gives the median for the maximal tested MNS value.

 M    N    Literature best   NILS* best   NILS0 median   NILSmax median
 5    20      1278             1278          1278            1278
 5    50      2724             2724          2724            2724
 5   100      5493             5493          5493            5493
10    20      1582             1582          1582            1582
10    50      2991             3025          3042            3025
10   100      5770             5770          5784            5771
10   200     10862            10872         10885           10872
20    20      2297             2297          2297            2297
20    50      3847             3894          3925            3917
20   100      6202             6271          6375.5          6315.5
20   200     11181            11303         11405.5         11334.5
20   500     26059            26235         26401.5         26335.5
the median value of NILSmax is better than that of NILS0. For example, for N = 200 and M = 20, NILSmax clearly outperforms NILS0. The best results are reached when neutral walks on plateaus are allowed, which validates the use of neutrality. When the results are compared with the degree of neutrality, we can draw additional conclusions. For 20-machine instances, the average degree of neutrality of solutions is quite low (around 5% on average) [9]. Nevertheless, the performance of NILS is better, which could be surprising at first sight. In fact, for those instances, the degree of neutrality is sufficient for the neutral random walks to cross a wide part of the search space at the same fitness level. For the instances with a higher degree of neutrality, the density of portals on the plateaus decreases, and a pure random walk on a plateau is less efficient in finding a portal quickly.
4.3 Influence of the MNS Parameter
Since random neutral walks on a local optima plateau seem to be efficient in finding improving solutions, we may wonder whether the neutral walk should be long or not. The MNS parameter corresponds to the maximal number of neutral steps allowed to move on the LOP. In this section, we study the performance using different MNS values that depend on the number of jobs, and are thus related to the neighborhood size. Figure 1 shows the boxplots of fitness values for instances up to 50 jobs and 10 machines. For 10-machine instances (top), the median fitness value starts by decreasing with the MNS value, and then stabilizes:
Fig. 1. Boxplots of the fitness values of the 30 solutions found after 2·10^7 evaluations. The boxplots for the 10-machine and 20-machine instances are shown in the top and bottom rows, respectively, and the 50-, 100-, and 200-job instances from left to right.
the results are equivalent for large MNS values. As the values are not normally distributed, the Mann-Whitney statistical test was used to check this tendency. The pairwise tests confirm that the results are not statistically different for the highest MNS values. Increasing the value of the MNS parameter does not deteriorate the performance. Therefore, it appears that this parameter can be set to a large value. For 20-machine instances (bottom), the best performance is always obtained by a NILS algorithm that exploits the LOP. The same happens for N = 200 and N = 500 (Figure 2 (a)). However, for N = 50 and N = 100, the performance decreases with the MNS value. This observation is confirmed by the Mann-Whitney statistical test. Therefore, in some cases, an over-exploitation of the plateaus can become too costly. Here, we see that the tradeoff between neutrality exploitation and search space exploration should be considered carefully, as there seems to exist a better-suited MNS value for some instance sizes.
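The Mann-Whitney comparison used above can be sketched with the normal approximation. This is a simplified, tie-free version written for illustration — the paper does not specify the implementation, and a production analysis would also apply a tie correction:

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test, normal approximation, no tie
    correction (assumes the pooled sample has no tied values)."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    r1 = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)                 # U statistic
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma                      # z <= 0 since u = min(...)
    p = 2 * 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return u, min(1.0, p)

# Two clearly separated samples: small p-value (distributions differ).
_, p_far = mann_whitney_u([1, 2, 3, 4, 5, 6, 7, 8], [11, 12, 13, 14, 15, 16, 17, 18])
# Two interleaved samples: large p-value (no detectable difference).
_, p_near = mann_whitney_u([1, 3, 5, 7], [2, 4, 6, 8])
```

With the separated samples the test rejects the null hypothesis at the usual 5% level; with the interleaved samples it does not, which is the pattern reported above for the highest MNS values.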
Fig. 2. Results for N = 500 and M = 20. Boxplots of the fitness values of the 30 solutions found after 2·10^7 evaluations (a). Median performance according to the number of evaluations (b). Average percentage (and standard deviation) of visited local optima plateaus where the perturbation led to a portal (c). Number of evaluations (average and standard deviation) lost during the perturbation step according to the MNS value (d).
The performance after 2·10^7 evaluations is better with the maximum number of neutral steps. Figure 2 (b) represents the median of the 30 fitness values according to different numbers of evaluations for N = 500 and M = 20. For each possible number of evaluations, the median performance is always better when the MNS value is large. Moreover, the larger the problem size, the better the fitness value found using neutrality along the search (see Figure 2 (b)). Whatever the fitness level, from 26401.5 down to 26335.5, using neutrality clearly improves the performance.
Benefits and Costs of Neutral Moves
In order to analyze in depth the tradeoff between exploitation and exploration, tuned by the MNS parameter, we here estimate the benefit and the cost of
neutral moves on the plateaus. Figure 2 (c) shows the number of portals reached by the neutral walks for different MNS values, for N = 500 and M = 20. When the MNS value increases, the number of reached portals increases, even if the slope decreases. This indicates that the few additional portals reached with a larger MNS value are interesting ones, since they lead to better solutions (Figure 1). The cost of the neutral moves is the total number of evaluations spent when the neutral walk fails to find a portal. Figure 2 (d) gives this number for N = 500 and M = 20. Even if this cost increases with the MNS value, we can notice that the slope decreases. This indicates that a longer neutral walk remains worthwhile. Nevertheless, the slope of the cost is higher than that of the benefit, which suggests that for very large MNS values the tradeoff shifts in favor of kick-move exploration.
5 Conclusion and Future Works
In this paper, we exploited the property of neutrality in order to design an efficient local search algorithm. Indeed, numerous combinatorial optimization problems contain a significant number of equivalent neighboring solutions. A Neutrality-based Iterated Local Search (NILS), that allows neutral walks to move on the plateaus, is proposed. The performance of the NILS algorithm has been experimented on the permutation flowshop scheduling problem by regarding the maximum length of neutral walks. As revealed in a previous study [9], the fitness landscape associated with this problem has a high degree of neutrality. Our experimental analysis shows that neutral walks allow to find improving solutions, and that the longer the neutral walk, the better the solution found. NILS is able to take advantage of applying neutral moves on plateaus, without any prohibitive additional cost. This can be explained by the relatively high density of portals over plateaus which can be reached by a random neutral walk. In their iterated greedy algorithm proposed to solve the flowshop problem investigated in the paper, Ruiz et al. [8] combine a steepest descent local search with a destruction and a construction phases for perturbation, and an acceptance criterion based on a temperature. Let us remark that the authors suggest that the temperature level has not a significant influence on the overall performance. However, for the problem at hand, a solution has a non negligible number of neutral neighbors and the local optima plateaus are quite large. Thus, the destruction and construction phases can easily lead to a solution with the same fitness value. This perturbation can be compared to the NILS perturbation between a random neutral walk and a kick-move exploration. In the NILS algorithm, this tradeoff is explicitly tuned by a single parameter. 
Considering these remarks together with the neutral exploitation of NILS, the good performance of Ruiz et al.'s algorithm [8] is probably due in part to the degree of neutrality of the flowshop scheduling problem. NILS exploits this property more explicitly, with a lower implementation cost and fewer parameters. This first version of the NILS algorithm explores the neighborhood in a random order, and both the hill-climbing and the random neutral walk components
are based on a first-improving strategy. Future work will be dedicated to designing efficient guiding methods based on the evolvability of solutions, in order to reach the portals more quickly. Moreover, similar analyses should help to better understand the dynamics of the NILS algorithm on other combinatorial optimization problems where a high degree of neutrality arises. Hopefully, this will allow us to propose an adaptive control of the single parameter of the NILS algorithm: the maximum length of the neutral walk.
References
1. Barnett, L.: Netcrawling – optimal evolutionary search with neutral networks. In: Congress on Evolutionary Computation CEC 2001, pp. 30–37. IEEE Press, Los Alamitos (2001)
2. Stewart, T.: Extrema selection: Accelerated evolution on neutral networks. In: Congress on Evolutionary Computation CEC 2001, pp. 25–29. IEEE Press, Los Alamitos (2001)
3. Verel, S., Collard, P., Clergue, M.: Scuba search: when selection meets innovation. In: Congress on Evolutionary Computation CEC 2004, pp. 924–931. IEEE Press, Los Alamitos (2004)
4. Johnson, S.M.: Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics Quarterly 1, 61–68 (1954)
5. Ruiz, R., Maroto, C.: A comprehensive review and evaluation of permutation flowshop heuristics. European Journal of Operational Research 165(2), 479–494 (2005)
6. Stützle, T.: Applying iterated local search to the permutation flow shop problem. Technical Report AIDA-98-04, FG Intellektik, TU Darmstadt (1998)
7. Taillard, E.: Benchmarks for basic scheduling problems. European Journal of Operational Research 64, 278–285 (1993)
8. Ruiz, R., Stützle, T.: A simple and effective iterated greedy algorithm for the permutation flowshop scheduling problem. European Journal of Operational Research 177(3), 2033–2049 (2007)
9. Marmion, M.E., Dhaenens, C., Jourdan, L., Liefooghe, A., Verel, S.: On the neutrality of flowshop scheduling fitness landscapes. In: Learning and Intelligent Optimization (LION 5). LNCS. Springer, Heidelberg (2011)
10. Ruiz, R., Maroto, C.: A comprehensive review and evaluation of permutation flowshop heuristics. European Journal of Operational Research 165(2), 479–494 (2005)
11. Stadler, P.F.: Towards a theory of landscapes. vol. 461, pp. 78–163. Springer, Heidelberg (1995)
12. Wilke, C.O.: Adaptive evolution on neutral networks. Bull. Math. Biol. 63, 715–730 (2001)
13. Verel, S., Collard, P., Tomassini, M., Vanneschi, L.: Fitness landscape of the cellular automata majority problem: view from the "Olympus". Theor. Comp. Sci. 378, 54–77 (2007)
14. Lourenço, H.R., Martin, O., Stützle, T.: Iterated local search. In: Handbook of Metaheuristics. International Series in Operations Research & Management Science, vol. 57, pp. 321–353. Kluwer Academic Publishers, Dordrecht (2002)
Off-line and On-line Tuning: A Study on Operator Selection for a Memetic Algorithm Applied to the QAP
Gianpiero Francesca1, Paola Pellegrini2, Thomas Stützle2, and Mauro Birattari2
1 Dipartimento di Ingegneria, University of Sannio, Benevento, Italy
[email protected]
2 IRIDIA, CoDE, Université Libre de Bruxelles, Brussels, Belgium
[email protected], {stuetzle,mbiro}@ulb.ac.be
Abstract. Tuning methods for selecting appropriate parameter configurations of optimization algorithms have been the object of several recent studies. The selection of an appropriate configuration may strongly impact the performance of evolutionary algorithms. In this paper, we study the performance of three memetic algorithms for the quadratic assignment problem when their parameters are tuned either off-line or on-line. Off-line tuning selects a priori one configuration to be used throughout the whole run on all the instances to be tackled. On-line tuning selects the configuration during the solution process, adapting parameter settings on an instance-per-instance basis, and possibly to each phase of the search. The results suggest that off-line tuning achieves better performance than on-line tuning.
1
Introduction
Tuning an algorithm means selecting its configuration, that is, a specific setting of all relevant parameters. The selection of an appropriate configuration has a major impact on the performance of evolutionary algorithms and, more generally, of all stochastic optimization algorithms. Several automatic tuning methods are available in the literature. Tuning methods can be grouped into two main categories, namely off-line and on-line ones. In off-line methods, the configuration to be used is selected after testing several ones on a set of tuning instances. The selected configuration is then used for solving all instances to be tackled. Off-line methods typically consider the algorithm to be tuned as a black box. Thus, they can be easily applied to any algorithm without any intervention on the algorithm itself [1,2,3,4]. On-line methods vary the configuration during the solution of the instances to be tackled, by exploiting feedback from the search process [5,6,7]. On-line tuning is often called parameter control, or parameter adaptation [8,9]. In this paper, we study the performance achieved by three memetic algorithms for the quadratic assignment problem when the crossover operator [7] is
tuned either off-line or on-line. We focus here on operator selection, since it is recognized to be a major issue when dealing with evolutionary algorithms and has a great impact on the performance achieved [10]. We test one off-line and three on-line methods for selecting the appropriate operator out of a set of four possible ones. The resulting configuration space is very small and, thus, the tuning problem can be considered rather simple. In fact, we have shown earlier that increasing the dimension of the configuration space penalizes on-line tuning more than off-line tuning [11]. Thus, the current experimental setup can be seen as the most favorable one for on-line tuning. We compare the tuning methods as a function of the quality of the specific algorithm to be tuned. These quality differences are obtained by considering variants of the memetic algorithm. Our initial conjecture was that the performance level of an algorithm may have an impact on the relative desirability of off-line vs. on-line tuning methods. Therefore, we tested the tuning methods on three variants of the memetic algorithm: the first variant includes neither local search nor a mutation operator; the second one includes local search, but no mutation operator; the third one includes both local search and a mutation operator. The results obtained are actually not fully conclusive: only some trends can be detected in support of our initial conjecture. In general, off-line tuning is the best performing method, with on-line tuning seldom achieving the best results. Still, some relation may exist between the method to be preferred and the quality of the algorithm. In particular, one should prefer off-line tuning when a high-quality algorithm is to be applied. Surprisingly, the heterogeneity of the instances to be solved does not have a remarkable impact on the results.
The rest of the paper is organized as follows. In Section 2, we describe the memetic algorithms we consider in this study, and in Section 3, we present the tuning methods we apply. In Section 4, we describe the experimental setup, and in Section 5, we discuss the results obtained. In Section 6, we draw some conclusions.
2
The Algorithms Implemented
Memetic algorithms (MA) represent one of the most successful approaches in the field of evolutionary computation [12]. Typically, a memetic algorithm combines a population-based technique with a local search. In the experimental analysis reported in this paper, we tackle the quadratic assignment problem (QAP). In the QAP, a set of n facilities has to be assigned to a set of n locations. A flow fij is associated with each pair of facilities i, j = 1, ..., n, and a distance dhk is given for each pair of locations h, k = 1, ..., n. A solution of the QAP is an assignment of each facility to a location, and it can be represented as a permutation π: the value in the i-th position of the permutation, π(i), corresponds to the facility that is assigned to the i-th location. The cost of a solution is equal to the sum, over all pairs of facilities, of the product of the flow between them and the distance between their assigned locations:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} f_{\pi(i)\pi(j)} \, d_{ij}.$$
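The cost function above can be evaluated directly. A minimal sketch, assuming the flow and distance matrices are given as nested lists and a solution as a permutation list (all names here are illustrative, not from the paper):

```python
def qap_cost(flow, dist, perm):
    # perm[i] = pi(i): the facility assigned to location i, so the flow
    # between the facilities at locations i and j is weighted by dist[i][j].
    n = len(perm)
    return sum(flow[perm[i]][perm[j]] * dist[i][j]
               for i in range(n) for j in range(n))
```

For a 2x2 instance with flow = [[0, 3], [1, 0]] and dist = [[0, 2], [5, 0]], the identity permutation [0, 1] costs 3·2 + 1·5 = 11.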
The goal of the QAP is to find the solution that minimizes the cost of the assignment. In an MA, each individual represents a solution of the problem. In the initialization phase of our MA for the QAP, a population of p individuals is randomly generated and improved by local search. The algorithm evolves the current population through crossover and mutation operators until a stopping criterion is fulfilled. At each iteration, the algorithm generates pc new individuals through a crossover operator. A crossover operator generates an individual by combining two different ones belonging to the current population. The new individual is called the offspring; the two preexisting ones are called the parents. A mutation operator modifies an individual. After crossover and mutation, local search is applied to each individual. The new population is obtained by selecting the best p individuals from both the old and the new ones. To avoid premature convergence, the search is restarted as soon as the average distance between individuals becomes smaller than a predefined threshold t. In this case, the new population is generated randomly. We study the performance of three algorithms that are inspired by the implementation proposed by Merz and Freisleben [13]. They differ in the application of the local search and the mutation operator. The first algorithm (simple MA) adopts neither local search nor a mutation operator (strictly speaking, this is not really an MA, but we keep the name for simplicity of language). The second algorithm (intermediate MA) adopts local search, but no mutation operator. The third algorithm (full MA) adopts both local search and a mutation operator. The mutation operator performs a random perturbation of individuals. In particular, the algorithm randomly draws pm = p/2 individuals from the overall population, including both the p current individuals and the new ones generated through crossover.
For each individual, the operator iteratively exchanges elements of the permutation, selecting them randomly according to a uniform distribution. Such exchanges are performed until the distance between the original and the resulting individual is higher than a predefined threshold m. The distance between two individuals is equal to the number of components with different values. A crossover operator generates an offspring, Io, starting from a pair of parents, Ip1 and Ip2. We consider the crossover operator to be used as a parameter with four possible settings. The cycle crossover operator, CX [14], copies to the offspring all components that are equal in both parents. The remaining components of Io are assigned, starting from a random one, Io(j), according to the following procedure. One of the two parents is randomly drawn; let it be Ip1. CX sets Io(j) = Ip1(j). Then, let Ip1(j′) be the component such that Ip1(j′) = Ip2(j): CX sets Io(j′) = Ip1(j′), and it substitutes the index j with j′. This procedure is repeated until all components of Io are instantiated. The distance preserving crossover, DPX [13,15], generates an offspring that has the same distance from both parents. DPX copies into Io all the components that are equal in Ip1 and Ip2. Each remaining component Io(j) is randomly
assigned, provided that Io remains a permutation and that Io(j) is different from both Ip1(j) and Ip2(j). The partially mapped crossover operator, PMX [16], randomly draws two components of Io, Io(j) and Io(j′), with j < j′. It sets Io(k) = Ip1(k) for all k < j or k > j′, and Io(k) = Ip2(k) for all j ≤ k ≤ j′. If the offspring so obtained is not a feasible solution, for each pair of components Io(k) and Io(z) such that Io(k) = Io(z), with j ≤ z ≤ j′, PMX sets Io(k) = Ip1(k). The order crossover, OX [17], randomly draws two components of Io, Io(j) and Io(j′). It sets Io(k) = Ip1(k) for all j ≤ k ≤ j′. Then, OX copies into the k-th unassigned component of Io the k-th component of Ip2 that differs from every Io(z), j ≤ z ≤ j′.
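The CX procedure described above admits a compact sketch. This is our reading of the text (copy the positions where the parents agree, then fill each remaining cycle from one randomly drawn parent), not the authors' implementation:

```python
import random

def cycle_crossover(p1, p2):
    n = len(p1)
    child = [None] * n
    # copy all components that are equal in both parents
    for i in range(n):
        if p1[i] == p2[i]:
            child[i] = p1[i]
    # fill each remaining cycle, taking its values from one parent
    while None in child:
        j = next(i for i in range(n) if child[i] is None)
        src, other = random.choice([(p1, p2), (p2, p1)])
        while child[j] is None:
            child[j] = src[j]
            # j' is the index such that src[j'] == other[j]
            j = src.index(other[j])
    return child
```

Because each cycle is copied whole from a single parent, the offspring is always a valid permutation and every component comes from one of the two parents.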
3
Parameter Tuning
For selecting the appropriate configuration of the three MAs described in Section 2, we apply one off-line and three on-line tuning methods. The off-line method performs an exhaustive exploration of the configuration space, based on a set of instances with characteristics similar to those of the instances to be tackled: all tuning instances are solved using all possible configurations in 10 independent runs. The three on-line methods select the configuration to be used among the possible ones. The selection is a function of the quality of the solutions previously generated by applying each configuration. The configuration to be used varies at each step, where a step corresponds to the generation of one offspring starting from two parents. The quality of a configuration c, Qc, is evaluated after each iteration. The equation used for updating Qc depends on a reward function Rc, which is given by

$$R_c = \frac{1}{|I^c|} \sum_{I_o \in I^c} \frac{f_{I_o}}{f_{I_{best}}} \, \max\left\{0, \frac{f_{I_o} - f_{I_p}}{f_{I_p}}\right\},\qquad(1)$$
where I^c is the set of offspring generated in the current iteration by configuration c; f_I is the value of the fitness function associated with individual I; I_best is the individual with the highest fitness generated up to the current iteration; and I_p is Io's parent with the highest fitness. The contribution of each offspring to the reward is the product of two quantities. The first quantity is the ratio between the fitness of Io and that of I_best. The second quantity is the relative fitness improvement of Io with respect to Ip, or zero in the absence of an improvement. In the first on-line method, named probability matching, PM, the selection of the configuration to be used is stochastic [18]. The quality Qc associated with configuration c is updated as
$$Q_c = Q_c + \max\{0, \alpha \, (R_c - Q_c)\},\qquad(2)$$
where α is a parameter of the algorithm, 0 < α ≤ 1. The probability of selecting configuration c is

$$P_c = P_{min} + (1 - |C| \, P_{min}) \, \frac{Q_c}{\sum_{c' \in C} Q_{c'}},\qquad(3)$$
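Equations (2) and (3) translate into a few lines of bookkeeping. A hedged sketch of the probability-matching step (function and variable names are ours; α and Pmin follow the settings reported in the experimental setup):

```python
import random

ALPHA, P_MIN = 0.3, 0.05  # settings used in the paper's experiments

def pm_update(Q, rewards, alpha=ALPHA):
    # Eq. (2): Q_c <- Q_c + max{0, alpha * (R_c - Q_c)}
    for c, R in rewards.items():
        Q[c] += max(0.0, alpha * (R - Q[c]))

def pm_probabilities(Q, p_min=P_MIN):
    # Eq. (3): P_c = P_min + (1 - |C| * P_min) * Q_c / sum of all Q_c'
    total = sum(Q.values())
    return {c: p_min + (1 - len(Q) * p_min) * q / total for c, q in Q.items()}

def pm_select(Q):
    # stochastic selection according to the distribution of Eq. (3)
    P = pm_probabilities(Q)
    return random.choices(list(P), weights=list(P.values()))[0]
```

Note that the Pmin floor guarantees every operator keeps a nonzero selection probability, so no configuration is ever permanently discarded.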
where C is the set of all possible configurations, and Pmin is a parameter of the algorithm, 0 ≤ Pmin ≤ 1. In the initialization phase, the quality is initialized to Qc = 1 for each configuration c, and the probability is uniformly distributed. In the second on-line method, named adaptive pursuit, AP, as in probability matching, the selection of the configuration to be used is stochastic, and the probability distribution is based on a quality measure that is updated following Equation (2) [19]. Here, the probability of selecting configuration c, Pc, is computed as

$$P_c = \begin{cases} P_c + \beta \, (P_{max} - P_c), & \text{if } Q_c = \max_{c' \in C} \{Q_{c'}\},\\ P_c + \beta \, (P_{min} - P_c), & \text{otherwise,} \end{cases}\qquad(4)$$

where C is the set of all possible configurations, Pmin and β are parameters of the algorithm, 0 < β ≤ 1 and 0 ≤ Pmin ≤ 1, and Pmax is set to 1 − (|C| − 1)Pmin. In the third on-line method, named multi-armed bandit, MAB, the selection of the configuration to be used is deterministic [20]. The quality Qc is computed as the average value returned by the reward function over all the iterations performed. The configuration selected, c̄, is

$$\bar{c} = \arg\max_{c \in C} \left\{ Q_c + \gamma \sqrt{\frac{2 \ln \sum_{c' \in C} n_{c'}}{n_c}} \right\},\qquad(5)$$

where n_c is the number of offspring generated by using configuration c in all the iterations performed, and γ, γ > 0, is a parameter of the algorithm.
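Similarly, the adaptive-pursuit update (4) and the deterministic bandit choice (5) can be sketched as follows; this is an illustrative reading of the equations, not the authors' code, with parameter values taken from the experimental setup:

```python
import math

def ap_update(P, Q, beta=0.3, p_min=0.05):
    # Eq. (4): push the probability of the best-quality configuration
    # towards P_max and all others towards P_min.
    p_max = 1 - (len(P) - 1) * p_min
    best = max(Q, key=Q.get)
    for c in P:
        target = p_max if c == best else p_min
        P[c] += beta * (target - P[c])

def mab_select(Q, n, gamma=1.0):
    # Eq. (5): UCB-style choice; n[c] counts the offspring generated
    # with configuration c (assumed > 0 for every configuration).
    total = sum(n.values())
    return max(Q, key=lambda c: Q[c] + gamma * math.sqrt(2 * math.log(total) / n[c]))
```

Since the pursuit targets sum to one, the AP update preserves a valid probability distribution at every step, while the MAB rule needs no probabilities at all: the square-root term favors rarely tried configurations.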
4
Experimental Setup
In the experimental analysis, we study the performance of off-line and on-line tuned versions of three MA algorithms. By studying the various algorithms described in Section 2, we compare the performance achieved by the different tuning methods as a function of the quality of the algorithm. In addition, we analyze the performance of the algorithms when different values of CPU time are imposed as the stopping criterion. We run the algorithms with the following default parameter settings: p = 40, pc = p/2, t = 30%; in full MA, pm = p/2, m = 40%. In intermediate and full MA, we apply the 2-opt local search with best improvement [21]. For each algorithm, we test seven different versions, depending on the configuration selection policy: – The configuration is kept constant throughout the whole run: the configuration to be used is i) default, D: CX crossover operator; ii) off-line, OFF: the one selected by the off-line method; iii) random, R: random selection of one crossover operator according to a uniform probability distribution.
Table 1. Configuration selected by off-line tuning on the two sets of instances tackled, for each algorithm and for each CPU time limit in seconds

                  homogeneous          heterogeneous
time              10     31     100    10     31     100
simple            OX     PMX    PMX    PMX    PMX    PMX
intermediate      CX     CX     CX     CX     PMX    PMX
full              CX     PMX    PMX    CX     PMX    PMX
– The configuration to be used is selected at each step: iv) naive, N: random selection of the configuration, according to a uniform probability distribution; v) probability matching, PM: α = 0.3 and Pmin = 0.05 [18]; vi) adaptive pursuit, AP: α = 0.3, β = 0.3 and Pmin = 0.05 [19]; vii) multi-armed bandit, MAB: γ = 1 [20]. For a fair comparison between off-line and on-line tuning, all the methods select the configuration to be used from the same set of possibilities: the crossover operator can be set to CX, DPX, PMX, or OX. We consider two sets of instances. First, we solve instances of size 50 to 100 from the QAPLIB [22]. We call these instances heterogeneous, since they come from very different backgrounds, they have different sizes, and they are either structured or unstructured. Second, we consider a set of instances obtained through the instance generator described by Stützle and Fernandes [23]. We call these instances homogeneous, since they are all unstructured, they all have size 80, and they are generated from the same distributions. Both sets include 34 instances. We randomly split each set into two subsets. We use one of them for performing the off-line tuning. Table 1 reports the configuration selected by off-line tuning for each algorithm and for each CPU time limit, namely 10, 31 and 100 CPU seconds. In Section 5, we discuss the results achieved on the instances of the second subsets by the seven versions implemented. For the different stopping criteria, we perform 10 independent runs of each version on all instances. All the experiments are performed on Xeon E5410 quad-core 2.33 GHz processors with 2x6 MB L2 cache and 8 GB RAM, running under the Linux Rocks Cluster Distribution. The algorithms are implemented in C++, and the code is compiled using gcc 4.1.2.
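The 2-opt local search with best improvement mentioned above, applied to the permutation representation, can be sketched as follows. For clarity this sketch re-evaluates the full cost after each trial swap; an efficient implementation would use delta evaluation. Names are illustrative:

```python
def qap_cost(flow, dist, perm):
    n = len(perm)
    return sum(flow[perm[i]][perm[j]] * dist[i][j]
               for i in range(n) for j in range(n))

def two_opt_best_improvement(flow, dist, perm):
    # repeatedly apply the single best pairwise exchange of facilities
    # until no exchange improves the cost (a 2-opt local optimum)
    perm = list(perm)
    n = len(perm)
    while True:
        base = qap_cost(flow, dist, perm)
        best_delta, best_swap = 0, None
        for i in range(n):
            for j in range(i + 1, n):
                perm[i], perm[j] = perm[j], perm[i]
                delta = qap_cost(flow, dist, perm) - base
                perm[i], perm[j] = perm[j], perm[i]  # undo trial swap
                if delta < best_delta:
                    best_delta, best_swap = delta, (i, j)
        if best_swap is None:
            return perm
        i, j = best_swap
        perm[i], perm[j] = perm[j], perm[i]
```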
5
Experimental Results
By analyzing the performance of three MA algorithms, we can observe the relative performance of the tuning methods as a function of algorithm performance. Table 2 shows the percentage error, with respect to the best known solution of each instance, obtained by the default version of the three algorithms. We present the results obtained in one run of 100 seconds on both the homogeneous and heterogeneous instances. The best algorithm is the full one, followed
Table 2. Algorithm quality. Percentage error obtained after 100 seconds by the default version of the three MA algorithms, with respect to the best known solution of each instance.

                  homogeneous    heterogeneous
simple            4.69343%       9.29772%
intermediate      1.51216%       2.17695%
full              0.79046%       1.44571%
by the intermediate one. The simple algorithm is the worst performing. For each instance set, the difference between all pairs of algorithms is statistically significant at the 95% confidence level, according to the Wilcoxon rank-sum test. Simple MA. To assess the performance of the seven versions of the simple algorithm as a function of different run-lengths on both heterogeneous and homogeneous instances, we present the results achieved in terms of ranking. We test the significance of the differences with the Friedman test for all-pairwise comparisons. The plots depicted describe the 95% simultaneous confidence intervals of these comparisons. For each version, we show the median rank over all instances, together with the bounds of the corresponding confidence interval. If the intervals of two versions overlap, then the difference between these versions is not statistically significant [24]. We use the same type of representation for all results provided in the paper. The results achieved on the homogeneous instances are reported in Figure 1(a). The off-line version performs significantly worse than at least one on-line version
[Figure 1 plots: panels for 10 s, 31 s, and 100 s runs (17 instances each), comparing versions N, MAB, AP, PM, R, OFF, and D on (a) the homogeneous set and (b) the heterogeneous set.]
Fig. 1. Results achieved by the seven versions of the simple algorithm. Simultaneous confidence intervals for all-pairwise comparisons of ranks between all versions applied to homogeneous and heterogeneous instances.
[Figure 2 plots: panels for 10 s, 31 s, and 100 s runs (17 instances each), comparing versions N, MAB, AP, PM, R, OFF, and D on (a) the homogeneous set and (b) the heterogeneous set.]
Fig. 2. Results achieved by the seven versions of the intermediate algorithm. Simultaneous confidence intervals for all-pairwise comparisons of ranks between all versions applied to homogeneous and heterogeneous instances.
for runs of 10 and 31 seconds, while it is the best one for the longest run-length. Which on-line version performs best depends on the CPU time available, even though in most cases the differences among these versions are not significant. As for the benchmark versions, the random version outperforms the default version only for runs of 100 seconds, and it is the worst performing otherwise. The naive version, instead, achieves quite good results. In particular, it is comparable to the best on-line method for medium and long runs. In Figure 1(b), we depict the results achieved on the heterogeneous instances. The qualitative conclusions that can be drawn are equivalent to those derived from the homogeneous instances. The off-line version is significantly worse than the best on-line version for the short and medium run-lengths, while the opposite holds for long ones. Which on-line method performs best depends on the run-length. On these instances, the difference between the best on-line version and the other ones is statistically significant for runs of 10 and 31 seconds. Intermediate MA. In Figure 2(a), we report the results achieved on the homogeneous instances. The off-line version outperforms the best on-line one for the short run-length. They are comparable for runs of 31 and 100 seconds. Adaptive pursuit is the best on-line version for runs of 31 seconds. Naive and multi-armed bandit achieve very similar results, and they outperform only the random version. Unlike the case of the simple algorithm, the default version achieves good results: it is always statistically equivalent to the best version. The results on the heterogeneous instances, reported in Figure 2(b), show that the characteristics of the instances do not have a remarkable impact on the
[Figure 3 plots: panels for 10 s, 31 s, and 100 s runs (17 instances each), comparing versions N, MAB, AP, PM, R, OFF, and D on (a) the homogeneous set and (b) the heterogeneous set.]
Fig. 3. Results achieved by the seven versions of the full algorithm. Simultaneous confidence intervals for all-pairwise comparisons of ranks between all versions applied to homogeneous and heterogeneous instances.
relative performance of off-line tuning: the off-line version is the best one for the short and medium run-lengths, and it is comparable to all on-line versions for long ones. Considering only the on-line versions, all of them are equivalent to the naive one for runs of 10 and 100 seconds. Full MA. In Figure 3(a), we depict the results achieved by the seven versions of the full algorithm on the homogeneous instances. The off-line version always appears to be the best choice. The default version achieves very good results, too. The difference in performance between the off-line and the on-line versions decreases as the CPU time increases. The results achieved by the on-line versions are very similar to each other. The results obtained on the heterogeneous instances, reported in Figure 3(b), suggest similar conclusions. In particular, the off-line version is the best performing for the short run-length, while this is not true for runs of 31 and 100 seconds. The relative performance of the off-line and the on-line versions follows the trend identified for the homogeneous instances: as the CPU time grows, the on-line versions achieve relatively better performance. This trend is even more evident here, since the off-line version is comparable to all the on-line ones for runs of 100 seconds. The results achieved by the random version are quite poor, while the naive version is always comparable to at least one on-line version. Summary of the results. Examining the results just presented, we cannot identify a clear relation between the quality of the algorithms and the relative performance of off-line and on-line tuning: off-line tuning achieves quite
consistently very good performance, and thus it appears to be the most advantageous and conservative choice. Nonetheless, when a low-quality algorithm is to be used, applying an on-line method may be preferable for short run-lengths. One critical issue in this case is the selection of the best on-line tuning method: on the one hand, adaptive pursuit always achieves quite good results; on the other hand, in several cases it is not the best performing method. A further element that cannot be neglected is the relatively good performance achieved by the naive version, compared to the more advanced methods proposed in the literature. Still, by counting the cases in which each method wins against the others, we can conclude that adaptive pursuit is the on-line method to choose: it generally achieves good performance, and it often outperforms the naive version. Nonetheless, considering the effort devoted by the scientific community to the development and analysis of on-line tuning methods, the difference between them and the naive version is surprisingly small. Perhaps surprisingly, the heterogeneity of the set of instances to be tackled does not have a remarkable impact on the results. These conclusions are supported by further results we have obtained by performing the same analysis on two ant colony optimization (ACO) algorithms, namely MAX–MIN ant system (MMAS) for the QAP with and without local search. We applied the on-line tuning methods described by Pellegrini et al. [11]. We tuned the parameter α, that is, the exponent applied to the pheromone trails in the state transition rule. The results of this analysis are reported in a supplementary report [25].
6
Conclusions
In this paper, we studied the performance of three memetic algorithms for the QAP when their configurations are tuned either off-line or on-line. We considered one off-line and three on-line methods, tested the algorithms on two different instance sets, a heterogeneous and a homogeneous one, and observed the impact of the different tuning methods as a function of the quality of the algorithm. The results do not allow drawing any clear conclusion on the relation between the tuning methods and the quality of the algorithms. In general, off-line tuning seems to be preferable under all experimental conditions. The heterogeneity of the instances to be tackled does not have a remarkable impact on the results. Some trends can be detected indicating that, for a low-quality algorithm, on-line tuning may achieve better results than off-line tuning. In this case, the choice of the on-line method to implement is not trivial and must be made after considering the computational time available. In future studies, we will try to further investigate the relation between the quality of the algorithms and the impact of off-line and on-line tuning. An extensive experimental analysis will be necessary to this aim. Moreover, we will increase the heterogeneity of the instances to be tackled, to identify whether and for what level of heterogeneity on-line methods have a clear advantage over
off-line ones. Finally, we will further focus on the relative performance of the state-of-the-art on-line tuning methods compared to some simple approaches for perturbing the configuration used during the search process. Recently, Fialho [10] has proposed a well-performing on-line method called rank-based multi-armed bandit. We will implement this further method and analyze its performance in our setting. In this framework, it may be interesting to identify some conditions under which the additional effort required for selecting a specific on-line method and for implementing it does or does not pay off in terms of improved performance. Acknowledgments. This research has been supported by "META-X", an Action de Recherche Concertée funded by the Scientific Research Directorate of the French Community of Belgium, and by "E-SWARM – Engineering Swarm Intelligence Systems", a European Research Council Advanced Grant awarded to Marco Dorigo (Grant Number 246939). The work of Paola Pellegrini is funded by a Bourse d'excellence Wallonie-Bruxelles International. Mauro Birattari and Thomas Stützle acknowledge support from the Belgian F.R.S.-FNRS, of which they are Research Associates.
References
1. Birattari, M., Stützle, T., Paquete, L., Varrentrapp, K.: A racing algorithm for configuring metaheuristics. In: Langdon, W., et al. (eds.) GECCO 2002, pp. 11–18. Morgan Kaufmann Publishers, San Francisco (2002)
2. Balaprakash, P., Birattari, M., Stützle, T.: Improvement strategies for the F-Race algorithm: Sampling design and iterative refinement. In: Bartz-Beielstein, T., Blesa Aguilera, M.J., Blum, C., Naujoks, B., Roli, A., Rudolph, G., Sampels, M. (eds.) HM 2007. LNCS, vol. 4771, pp. 108–122. Springer, Heidelberg (2007)
3. Adenso-Díaz, B., Laguna, M.: Fine-tuning of algorithms using fractional experimental designs and local search. Operations Research 54(1), 99–114 (2006)
4. Hutter, F., Hoos, H.H., Leyton-Brown, K., Stützle, T.: ParamILS: An automatic algorithm configuration framework. J. Artif. Intell. Res. (JAIR) 36, 267–306 (2009)
5. Battiti, R., Brunato, M., Mascia, F.: Reactive Search and Intelligent Optimization. Operations Research/Computer Science Interfaces, vol. 45. Springer, Berlin (2008)
6. Martens, D., Backer, M.D., Haesen, R., Vanthienen, J., Snoeck, M., Baesens, B.: Classification with ant colony optimization. IEEE Transactions on Evolutionary Computation 11(5), 651–665 (2007)
7. Maturana, J., Fialho, A., Saubion, F., Schoenauer, M., Sebag, M.: Extreme compass and dynamic multi-armed bandits for adaptive operator selection. In: IEEE Congress on Evolutionary Computation, pp. 365–372 (2009)
8. Eiben, A.E., Michalewicz, Z., Schoenauer, M., Smith, J.E.: Parameter control in evolutionary algorithms. In: Lobo, F., Lima, C.F., Michalewicz, Z. (eds.) Parameter Setting in Evolutionary Algorithms, pp. 19–46. Springer, Berlin (2007)
9. Whitacre, J.M., Pham, Q.T., Sarker, R.A.: Credit assignment in adaptive evolutionary algorithms. In: Cattolico (ed.) GECCO 2006, pp. 1353–1360. ACM, New York (2006)
10. Fialho, A.: Adaptive Operator Selection for Optimization. PhD thesis, Université Paris-Sud XI, Orsay, France (2010)
214
G. Francesca et al.
11. Pellegrini, P., Stützle, T., Birattari, M.: Off-line and on-line tuning: a study on MAX–MIN ant system for TSP. In: Dorigo, M., Birattari, M., Di Caro, G.A., Doursat, R., Engelbrecht, A.P., Floreano, D., Gambardella, L.M., Groß, R., Şahin, E., Sayama, H., Stützle, T. (eds.) ANTS 2010. LNCS, vol. 6234, pp. 239–250. Springer, Heidelberg (2010)
12. Moscato, P.: On evolution, search, optimization, genetic algorithms and martial arts: towards memetic algorithms. Technical Report Caltech Concurrent Computation Program 826, California Institute of Technology, Pasadena, CA, USA (1989)
13. Merz, P., Freisleben, B.: Fitness landscape analysis and memetic algorithms for the quadratic assignment problem. IEEE Transactions on Evolutionary Computation 4(4), 337–352 (2000)
14. Merz, P., Freisleben, B.: A comparison of memetic algorithms, tabu search, and ant colonies for the quadratic assignment problem. In: Proc. Congress on Evolutionary Computation, pp. 2063–2070. IEEE Press, Los Alamitos (1999)
15. Merz, P., Freisleben, B.: A genetic local search approach to the quadratic assignment problem. In: 7th International Conference on Genetic Algorithms, pp. 465–472. Morgan Kaufmann, San Francisco (1997)
16. Goldberg, D.E.: Genetic algorithms and rule learning in dynamic system control. In: International Conference on Genetic Algorithms and Their Applications, pp. 8–15. Morgan Kaufmann Publishers Inc., San Francisco (1985)
17. Davis, L.: Applying adaptive algorithms to epistatic domains. In: Proc. of IJCAI, pp. 162–164 (1985)
18. Corne, D.W., Oates, M.J., Kell, D.B.: On fitness distributions and expected fitness gain of mutation rates in parallel evolutionary algorithms. In: Guervós, J.J.M., Adamidis, P.A., Beyer, H.-G., Fernández-Villacañas, J.-L., Schwefel, H.-P. (eds.) PPSN VII. LNCS, vol. 2439, pp. 132–141. Springer, Heidelberg (2002)
19. Thierens, D.: An adaptive pursuit strategy for allocating operator probabilities. In: IEEE Congress on Evolutionary Computation, pp. 1539–1546. IEEE Press, Piscataway (2005)
20. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2), 235–256 (2002)
21. Stützle, T., Hoos, H.H.: MAX–MIN ant system. Future Generation Computer Systems 16(8), 889–914 (2000)
22. Burkard, R., Karisch, S., Rendl, F.: QAPLIB – A quadratic assignment problem library. Journal of Global Optimization 10, 391–403 (1997)
23. Stützle, T., Fernandes, S.: New benchmark instances for the QAP and the experimental analysis of algorithms. In: Gottlieb, J., Raidl, G. (eds.) EvoCOP 2004. LNCS, vol. 3004, pp. 199–209. Springer, Heidelberg (2004)
24. Chiarandini, M.: Stochastic Local Search Methods for Highly Constrained Combinatorial Optimisation Problems, ch. 3. PhD thesis, Computer Science Department, Darmstadt University of Technology, Darmstadt, Germany (2005)
25. Francesca, G., Pellegrini, P., Stützle, T., Birattari, M.: Companion to Off-line and On-line Tuning: a study on operator selection for a memetic algorithm applied to the QAP (2010), http://iridia.ulb.ac.be/supp/IridiaSupp2010-015/, IRIDIA Supplementary page
On Complexity of the Optimal Recombination for the Travelling Salesman Problem
Anton V. Eremeev
Omsk Branch of Sobolev Institute of Mathematics, 13 Pevtsov str., 644043, Omsk, Russia
[email protected]
Abstract. The computational complexity of the optimal recombination for the Travelling Salesman Problem is considered both in the symmetric and in the general cases. Strong NP-hardness of these optimal recombination problems is proven and solving approaches are considered.
1 Introduction
The Travelling Salesman Problem (TSP) is one of the well-known NP-hard combinatorial optimization problems [1]: given an (n × n)-matrix (cij) of non-negative elements (distances), it is required to find a permutation i1, i2, ..., in of the elements 1, 2, ..., n minimizing the sum c_{i1,i2} + ... + c_{in−1,in} + c_{in,i1}. If the matrix (cij) is symmetric, the TSP is called symmetric as well; if this property is not assumed, we will say that the general case is considered. In the general case, a tour of the travelling salesman is a Hamiltonian circuit in a complete digraph without loops or multiple arcs, where the set of vertices is V = {v1, ..., vn} and the set of arcs is A. The length of an arc (i, j) ∈ A equals cij. In the symmetric case the tour direction does not matter, so a travelling salesman’s tour is a Hamiltonian cycle in a complete graph G with the same set of vertices V and a set of edges E, where the length of an edge {i, j} is cij = cji. This paper is devoted to the complexity analysis of the optimal recombination problem (ORP) for the TSP. The problem consists in finding a shortest travelling salesman’s tour that coincides with two given feasible parent solutions in those arcs (or edges) which belong to both parent solutions, and does not contain the arcs (or edges) which are absent from both parent solutions. These constraints are equivalent to the requirement that the recombination be respectful and gene transmitting, as coined by Radcliffe [11]. In the symmetric case the input of the ORP consists of an edge-weighted complete graph and two Hamiltonian parent cycles in it. In the general case the problem input consists of an arc-weighted directed graph and two parent circulations.
In the general case, the ORP formulation implies that the direction of arcs in the desired tour must coincide with the direction of arcs in the parent solutions, unless both opposite arcs between two vertices are present (in the latter case both directions are possible).
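To make the ORP constraints concrete, here is a small illustrative sketch in Python (the paper itself contains no code, and the function names are ours): in the symmetric case a candidate tour is ORP-feasible iff it contains every edge shared by both parents and uses no edge that is absent from both.

```python
def tour_edges(tour):
    """Undirected edge set of a cyclic tour given as a vertex sequence."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}

def tour_length(tour, c):
    """Objective value c[i1][i2] + ... + c[in][i1] of a cyclic tour."""
    n = len(tour)
    return sum(c[tour[i]][tour[(i + 1) % n]] for i in range(n))

def orp_feasible(child, parent1, parent2):
    """Symmetric-case ORP feasibility: the child must use all edges that
    belong to both parents, and only edges present in at least one parent."""
    e1, e2 = tour_edges(parent1), tour_edges(parent2)
    ec = tour_edges(child)
    return (e1 & e2) <= ec <= (e1 | e2)
```

The two subset tests are exactly the "respectful" and "gene transmitting" requirements; in the general (directed) case the same checks would be performed on arc sets instead of undirected edge sets.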
Partially supported by RFBR grant 07-01-00410.
P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 215–225, 2011. c Springer-Verlag Berlin Heidelberg 2011
216
A.V. Eremeev
Optimal recombination was first employed by Agarwal, Orlin and Tai [2] for the maximum independent set problem. Presently, this approach has multiple applications. In genetic algorithms (GAs) where the set of feasible solutions is a set of permutations, recombination procedures of this kind were used by Yagiura and Ibaraki [14], Cotta, Alba and Troya [6], Cook and Seymour [5], and Whitley, Hains and Howe [13]. The formulations of the optimal recombination problems in [5,6,14] differ from the ORP formulation considered in this paper; e.g., in [5] the result of recombination may contain any edge belonging to at least one of the parent solutions, while in our case all edges included in both parent solutions must be used. Both approaches to optimal recombination are meaningful: the optimal recombination problem of [5] has a larger or equal set of feasible solutions (and is usually more complex), while the approach employed here usually leads to simpler ORPs. However, the smaller computational cost of recombination does not necessarily improve the overall performance of a GA. The choice between the two approaches should be based on complexity analysis and experiments. Many problems, like the maximum independent set problem, admit polynomial-time optimal recombination [3,8]. In a number of GAs where the optimal recombination turns out to be NP-hard, branch-and-bound methods [4] or dynamic programming [14] are used in the crossover operator. In such cases, approximate versions of branch-and-bound or dynamic programming are often used to avoid excessive computational cost. In the dynamic programming of [14], the number of states is limited by a given threshold. In the branch-and-bound methods of [4], the result of recombination is the best solution found within a limited computation time or a limited number of iterations.
The dimensionality of the recombination problem may also be reduced by choosing an appropriate granularity of representation [6]. The paper is structured as follows. In Section 2, using the results of Itai, Papadimitriou and Szwarcfiter [10], we show NP-hardness of the optimal recombination problem in the symmetric case. Here we also prove the NP-hardness of optimal recombination in the general case, using the well-known idea of transforming the vertex cover problem into the TSP [1]. In Section 3 we propose reductions of the considered ORPs to the TSP on graphs with bounded vertex degrees. The resulting TSP problems may be solved, e.g., by means of the algorithms of Eppstein [7], which have time bounds significantly smaller than the well-known upper bound O(n^2 2^n) of dynamic programming [9]. Concluding remarks are given in Section 4.
2 NP-Hardness of Optimal Recombination
2.1 Symmetric Case
In [10] it is proven that recognition of Hamiltonian grid graphs (the Hamilton cycle problem) is NP-complete. Recall that a graph G' = (V', E') with vertex set V' and edge set E' is called a grid graph if its vertices are integer vectors v = (x_v, y_v) ∈ Z^2 on the plane, i.e., V' ⊂ Z^2, and a pair of vertices is connected by an edge iff the Euclidean distance between them is equal to 1. Here and below, Z denotes the set of integer numbers. Let us call the edges that connect two
Complexity of Optimal Recombination for the TSP
217
vertices in Z^2 with equal first coordinates vertical edges. The edges that connect two vertices in Z^2 with equal second coordinates will be called horizontal edges. Let us assume that |V'| > 4, graph G' is connected and there are no bridges in G' (note that if any of these assumptions is violated, then the existence of a Hamiltonian cycle in G' can be recognized in polynomial time). Now we will construct a reduction from the Hamilton cycle problem for G' to an optimal recombination problem for some complete edge-weighted graph G = (V, E), where V = V'. Let the edge weights cij in graph G be defined so that if a pair of vertices {vi, vj} is connected by an edge of G', then cij = 0; all other edges in G have weight 1. Consider the following two parent solutions of the TSP on graph G (an example of a graph G' and two parent solutions for the corresponding TSP is given in Fig. 1). Let ymin = min_{v∈V'} y_v, ymax = max_{v∈V'} y_v. For any integer y ∈ {ymin, ..., ymax}, denote by P^y the horizontal chain that passes through the vertices v ∈ V' with y_v = y by increasing values of coordinate x. Let the first parent tour follow the chains P^{ymin}, P^{ymin+1}, ..., P^{ymax}, connecting the right-hand end of each chain P^y with y < ymax to the left-hand end of the chain P^{y+1}. Note that
Fig. 1. Example of two parent tours used in reduction from Hamilton cycle problem to ORP in symmetric case
these connections never coincide with vertical edges because G' has no bridges. To create a cycle, connect the right-hand end vTR of the chain P^{ymax} to the left-hand end vBL of the chain P^{ymin}. The second parent tour is constructed similarly using the vertical chains. Let xmin = min_{v∈V'} x_v, xmax = max_{v∈V'} x_v. For any integer x ∈ {xmin, ..., xmax}, let Q^x denote the vertical chain that passes monotonically in y through the vertices v ∈ V' such that x_v = x. The second parent tour follows the chains Q^{xmin}, Q^{xmin+1}, ..., Q^{xmax}, connecting the lower end of each chain Q^x with x < xmax to the upper end of chain Q^{x+1}. These connections never coincide with horizontal edges since G' has no bridges. Finally, the lower end vRB of chain Q^{xmax} is connected to the upper end vLT of chain Q^{xmin}. Note that the constructed parent tours have no common edges. Indeed, common slanting edges do not exist since |V'| > 4. The horizontal edges belong to the first tour only, except for the situation where y_{vRB} = y_{vLT} and the edge {vRB, vLT} of the second tour is oriented horizontally. But if the first parent tour included the edge {vRB, vLT} in this situation, then the edge {vRB, vLT} would be a bridge in graph G'. Therefore the parent tours cannot have common horizontal edges. Similarly, the vertical edges belong to the second tour only, except for the case where x_{vTR} = x_{vBL} and the edge {vTR, vBL} of the first tour is oriented vertically. But in this case the parent tour cannot contain the edge {vTR, vBL}, since G' has no bridges. Note also that the union of the edges of the parent solutions contains E'. Consequently, any Hamiltonian cycle in graph G' is a feasible solution of the ORP. At the same time, a feasible solution of the ORP has zero objective value iff it contains only edges of E'. Therefore, the optimal value of the objective function in the ORP under consideration is equal to 0 iff there exists a Hamiltonian cycle in graph G'.
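The construction just described can be sanity-checked with a short sketch (our own Python, with hypothetical helper names): build the grid graph G', the two chain-based parent tours, and the 0/1 edge weights of the complete graph G.

```python
from itertools import combinations

def grid_graph(points):
    """Grid graph G': integer points, adjacent iff Euclidean distance 1
    (equivalently, Manhattan distance 1 on Z^2)."""
    edges = {frozenset((u, v)) for u, v in combinations(set(points), 2)
             if abs(u[0] - v[0]) + abs(u[1] - v[1]) == 1}
    return set(points), edges

def horizontal_parent_tour(points):
    """First parent tour: chains P^y by increasing y, each chain by
    increasing x; the cycle closes from the right end of P^{ymax} (vTR)
    back to the left end of P^{ymin} (vBL)."""
    rows = {}
    for (x, y) in points:
        rows.setdefault(y, []).append((x, y))
    return [v for y in sorted(rows) for v in sorted(rows[y])]

def vertical_parent_tour(points):
    """Second parent tour: chains Q^x by increasing x, each traversed from
    its upper end down to its lower end."""
    cols = {}
    for (x, y) in points:
        cols.setdefault(x, []).append((x, y))
    return [v for x in sorted(cols)
            for v in sorted(cols[x], key=lambda p: -p[1])]

def reduction_weight(u, v, grid_edges):
    """Edge weights of the complete graph G: 0 on edges of G', 1 otherwise."""
    return 0 if frozenset((u, v)) in grid_edges else 1
```

On any finite point set both tours visit each vertex exactly once, so they are valid parent Hamiltonian cycles for the ORP instance.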
So, the following theorem is proven. Theorem 1. Optimal recombination for the TSP in the symmetric case is NP-hard in the strong sense. In [10] it is also proven that recognition of grid graphs with a Hamiltonian path is NP-complete. Optimal recombination for this problem consists in finding a shortest Hamiltonian path which uses those edges where both parent tours coincide, and never uses the edges absent from both parent tours. The following theorem is proved analogously to Theorem 1. Theorem 2. Optimal recombination for the problem of finding the shortest Hamiltonian path in a graph with arbitrary edge lengths is NP-hard in the strong sense. Note that in the proof of Theorem 2, unlike that of Theorem 1, it is impossible simply to exclude the cases where graph G' has bridges. Instead, the reduction should treat separately each maximal (by inclusion) subgraph without bridges.
2.2 The General Case
In the general case of TSP the ORP is not a more general problem than the ORP considered in Subsection 2.1 because in the problem input we have two directed
Fig. 2. A pair of parent circuits for the case of G' = K3. It is supposed that the incident edges are enumerated as follows. For vertex v1: e_{v1,1} = e1, e_{v1,2} = e3; for vertex v2: e_{v2,1} = e1, e_{v2,2} = e2; for vertex v3: e_{v3,1} = e2, e_{v3,2} = e3.
parent paths, while in the symmetric case the parent paths were undirected. Even if the distance matrix (cij) is symmetric, a pair of directed parent tours defines a significantly different set of feasible solutions compared to the undirected case. Therefore, the general case requires a separate consideration of ORP complexity. Theorem 3. Optimal recombination for the TSP in the general case is NP-hard in the strong sense. Proof. We use a modification of the textbook reduction from the vertex cover problem to the TSP [1]. Suppose an instance of the vertex cover problem is given as a graph G' = (V', E'). It is required to find a vertex cover in G' of minimal size. Let us assume that the vertices in V' are enumerated, i.e. V' = {v1, ..., vn}, where n = |V'|, and let m = |E'|. Consider a complete digraph G = (V, A) where the set of vertices V consists of |E'| cover-testing components, each of 12 vertices: Ve = {(vi, e, k), (vj, e, k) : 1 ≤ k ≤ 6} for each e = {vi, vj} ∈ E', i < j. Besides that, V contains n selector vertices, denoted a1, ..., an, and a supplementary vertex an+1. Let the parent tours in graph G be defined by the following two circuits (an example of a pair of such circuits for the case of G' = K3 is provided in Fig. 2).
1. Each cover-testing component Ve, where e = {vi, vj} ∈ E' and i < j, is visited twice by the first tour. The first time it visits the vertices that correspond to vi in the sequence

(vi, e, 1), ..., (vi, e, 6),   (1)

the second time it visits the vertices corresponding to vj, in the sequence

(vj, e, 1), ..., (vj, e, 6).   (2)

2. The second tour goes through each cover-testing component Ve, where e = {vi, vj} ∈ E' and i < j, in the following sequence: (vi, e, 2), (vi, e, 3), (vj, e, 1), (vj, e, 2), (vj, e, 3), (vi, e, 1), (vi, e, 6), (vj, e, 4), (vj, e, 5), (vj, e, 6), (vi, e, 4), (vi, e, 5).

The first parent tour connects the cover-testing components as follows. For each vertex v ∈ V', order arbitrarily the edges incident to v in graph G' in a sequence e_{v,1}, e_{v,2}, ..., e_{v,deg(v)}, where deg(v) is the degree of vertex v in G'. Following the chosen sequence e_{v,1}, e_{v,2}, ..., e_{v,deg(v)}, this tour passes the 6 vertices (v, e, k), k = 1, ..., 6, in each of the components with e ∈ {e_{v,1}, e_{v,2}, ..., e_{v,deg(v)}}. Thus, each vertex of any cover-testing component Ve, e = {u, v} ∈ E', will be visited by one of the two 6-vertex sub-tours. The second tour passes the cover-testing components in an arbitrary order of edges Ve1, ..., Vem, entering each component Vek, where ek = {vik, vjk} ∈ E', ik < jk, k = 1, ..., m, via vertex (vik, ek, 2) and exiting through vertex (vik, ek, 5). Thus, a sequence of vertex indices i1, ..., im is induced (repetitions are possible). In what follows, we will need the beginning i1 and the end im of this sequence. The parent sub-tours described above are connected to form two Hamiltonian circuits in G using the vertices a1, ..., an+1. The first circuit is completed using the arcs (a1, (v1, e_{v1,1}, 1)), ((v1, e_{v1,deg(v1)}, 6), a2), (a2, (v2, e_{v2,1}, 1)), ((v2, e_{v2,deg(v2)}, 6), a3), ..., (an, (vn, e_{vn,1}, 1)), ((vn, e_{vn,deg(vn)}, 6), an+1), (an+1, a1). The second circuit is completed by the arcs (a1, a2), ..., (an−1, an), (an, an+1), (an+1, (vi1, e1, 2)), ((vim, em, 5), a1). Assign unit weight to each arc (ai, (vi, e_{vi,1}, 1)), i = 1, ..., n, in the complete digraph G. Besides that, assign weight n + 1 to all arcs of the second tour which are
connecting the components Ve1, ..., Vem; the same weights are assigned to the arcs (an+1, (vi1, e1, 2)) and ((vim, em, 5), a1). All other arcs in G are given weight 0. Note that for any vertex cover C of graph G', the set of feasible solutions of the ORP with the two parents defined above contains a circuit R(C) with the following structure (an example of such a circuit for the case of G' = K3 is provided in Fig. 3). For each vi ∈ C, the circuit R(C) contains the arcs (ai, (vi, e_{vi,1}, 1)) and ((vi, e_{vi,deg(vi)}, 6), ai+1). The components Ve, e ∈ {e_{vi,1}, e_{vi,2}, ..., e_{vi,deg(vi)}}, are connected together by the arcs from the first tour. For each vertex vi which does not belong to C, the circuit R(C) has an arc (ai, ai+1). Also, R(C) passes the arc (an+1, a1). The circuit R(C) visits each cover-testing component Ve in one of two ways: 1. If both endpoints of an edge e belong to C, then R(C) passes the component following the same arcs as the first parent tour. 2. If e = {u, v}, u ∈ C, v ∉ C, then R(C) visits the vertices of the component in the sequence (u, e, 1), (u, e, 2), (u, e, 3), (v, e, 1), ..., (v, e, 6), (u, e, 4), (u, e, 5), (u, e, 6). One can check straightforwardly that this sequence does not violate the ORP constraints. In general, the circuit R(C) is a feasible solution to the ORP because, on the one hand, all arcs used in R(C) are present in at least one of the parent tours. On the other hand, within the cover-testing components Ve, e = {u, v} ∈ E', where vertex u has a smaller index than v, the only arcs contained in both parent tours are ((u, e, 2), (u, e, 3)), ((u, e, 4), (u, e, 5)), ((v, e, 1), (v, e, 2)), ((v, e, 2), (v, e, 3)), ((v, e, 4), (v, e, 5)), ((v, e, 5), (v, e, 6)). All of these arcs belong to R(C). The total weight of circuit R(C) is |C|. Now each feasible solution R to the constructed ORP defines a set of vertices C(R) as follows: vi, i ∈ {1, ..., n}, belongs to C(R) iff R contains the arc (ai, (vi, e_{vi,1}, 1)).
Let us consider only those ORP solutions R that have objective value at most n. These solutions do not contain the arcs that connect the cover-testing components in the second parent tour. They also do not contain the arcs (an+1, (vi1, e1, 2)) and ((vim, em, 5), a1). Note that the set of such ORP solutions is non-empty; e.g., the first parent tour belongs to it. Consider the case where the arc (ai, (vi, e_{vi,1}, 1)) belongs to R. Each cover-testing component Ve with e = {vi, vj} ∈ E' may in this case be visited in one of two possible ways: either the same way as in the first parent tour (in this
Fig. 3. An ORP solution R(C) corresponding to the vertex cover {v1, v3} of graph G' = K3
case, vj must also be chosen into C(R) since R is Hamiltonian), or in the following sequence: (vi, e, 1), (vi, e, 2), (vi, e, 3), (vj, e, 1), ..., (vj, e, 6), (vi, e, 4), (vi, e, 5), (vi, e, 6) (in this case, vj will not be chosen into C(R)). In view of our assumption that the arc (ai, (vi, e_{vi,1}, 1)) belongs to R, the cover-testing components Ve, e ∈ {e_{vi,1}, e_{vi,2}, ..., e_{vi,deg(vi)}}, should be connected by the arcs of the first tour, and besides that, R contains the arc ((vi, e_{vi,deg(vi)}, 6), ai+1). Note that the total length of the arcs in R equals |C(R)|, and the set C(R) is a vertex cover in graph G', because the tour R passes each component Ve in a way that guarantees coverage of each edge e ∈ E'. To sum up, there exists a bijection between the set of vertex covers in graph G' and the set of feasible solutions to the ORP of length at most n. The values of the objective functions are not changed under this bijection, therefore the statement of the theorem follows.
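As a sanity check on the proof's construction, the following sketch (our own Python, with placeholder vertex labels) lists the two parent traversal orders of a single cover-testing component V_e and confirms that each parent visits all 12 of its vertices exactly once.

```python
def component_vertices(vi, vj, e):
    """The 12 vertices of a cover-testing component V_e, e = {vi, vj}."""
    return [(v, e, k) for v in (vi, vj) for k in range(1, 7)]

def first_tour_subtours(vi, vj, e):
    """First parent: two separate 6-vertex passes, sequences (1) and (2)."""
    return ([(vi, e, k) for k in range(1, 7)],
            [(vj, e, k) for k in range(1, 7)])

def second_tour_pass(vi, vj, e):
    """Second parent: a single 12-vertex pass through V_e, entering at
    (vi, e, 2) and exiting at (vi, e, 5)."""
    order = [(vi, 2), (vi, 3), (vj, 1), (vj, 2), (vj, 3), (vi, 1),
             (vi, 6), (vj, 4), (vj, 5), (vj, 6), (vi, 4), (vi, 5)]
    return [(v, e, k) for v, k in order]
```

Checking that both traversals cover the same 12 vertices is what guarantees that any feasible ORP solution, which must follow one of the admissible ways through each component, also covers them.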
3 Transformation of the ORP into TSP on Graphs with Bounded Vertex Degree
In this Section, the ORP problems are connected to the TSP on graphs (digraphs) with bounded vertex degree, arbitrary positive edge (arc) weights and a
given set of forced edges (arcs). It is required to find a shortest Hamiltonian cycle (circulation) in the given graph (digraph) that passes through all forced edges (arcs).
3.1 General Case
Consider the general case of the ORP for the TSP, where we are given two parent tours A1, A2 in a complete digraph G = (V, A). This ORP may be transformed into the problem of finding a shortest Hamiltonian circuit in a supplementary digraph G' = (V', A'). The digraph G' is constructed on the basis of G by excluding the set of arcs A \ (A1 ∪ A2) and contracting each path that belongs to both parent tours into a pseudo-arc of the same length and the same direction as those of the path. The lengths of all other arcs remaining in G' are the same as they were in G. A shortest Hamiltonian circuit C in G' transforms into an optimum of the ORP by means of the reverse substitution of each pseudo-arc in C by the path corresponding to it. Note that there are two ingoing arcs and two outgoing arcs for each vertex in G'. The TSP on such a digraph is equivalent to the TSP on a cubic digraph G'' = (V'', A''), where each vertex v ∈ V' is substituted by two vertices v̌, v̂, connected by an artificial arc (v̌, v̂) of zero length. All arcs that entered v now enter v̌, and all arcs that left v now leave v̂. Let an arc e ∈ A'' be forced, and call it a pseudo-arc, if it corresponds to a pseudo-arc in G'. A solution to the last problem may be obtained through enumeration of all feasible solutions to the TSP with forced edges on a supplementary graph Ḡ = (V'', Ē). Here, a pair of vertices u, v is connected by an edge iff these vertices were connected by an arc (or a pair of arcs) in the digraph G''. An edge {u, v} ∈ Ē is assumed to be forced if (u, v) or (v, u) is a pseudo-arc or an artificial arc in the digraph G''. The set of forced edges in Ḡ will be denoted by F̄. All Hamiltonian cycles in Ḡ w.r.t. the set of forced edges may be enumerated by means of the algorithm proposed in [7] in time O(|V''| · 2^((|Ē|−|F̄|)/4)).
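The vertex-splitting step used above can be sketched as follows (illustrative Python with our own naming; arcs are stored as a dict from (tail, head) to length, and the 'in'/'out' labels stand for the two copies of each vertex):

```python
def split_vertices(vertices, arcs):
    """Build the cubic digraph G'': each vertex v becomes an 'in' copy and
    an 'out' copy joined by a zero-length artificial arc, and every original
    arc (u, v) is redirected from u's 'out' copy to v's 'in' copy."""
    new_arcs = {((v, 'in'), (v, 'out')): 0 for v in vertices}  # artificial
    for (u, v), length in arcs.items():
        new_arcs[((u, 'out'), (v, 'in'))] = length             # original
    return new_arcs
```

Since every vertex of G' has two ingoing and two outgoing arcs, each node of the result has total degree 3 (the 'in' copy receives two arcs and emits the artificial one; the 'out' copy receives the artificial arc and emits two), which is exactly the premise of the enumeration algorithm of [7].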
Then, for each Hamiltonian cycle from Ḡ, in each of the two directions, we can check whether it is possible to pass a circulation in G'', and if possible, compute the length of the circulation. This takes O(|V''|) time for each Hamiltonian cycle. Note that |Ē| − |F̄| = d ≤ 2n, where d is the number of arcs which are present in only one of the parents. Consequently, the time complexity of solving the ORP on graph G is O(n · 2^(d/4)), or O(n · 1.42^n). An implementation of the method described above may benefit in cases where the parent solutions have many arcs in common.
3.2 Symmetric Case
Suppose that the symmetric case takes place, and two parent Hamiltonian cycles in graph G = (V, E) are defined by two sets of edges E1 and E2. Let us construct a reduction of the ORP in this case to a TSP with a set of forced edges on a graph with vertex degree at most 4. Similarly to the general case, the ORP reduces to the TSP on a graph G'' = (V'', E'') obtained from G by exclusion of all edges that belong to E \ (E1 ∪ E2) and
contraction of all paths that belong to both parent tours. Here, by contraction we mean the following mapping. Let Puv be a path with endpoints u and v such that the edges of Puv belong to E1 ∩ E2 and Puv is not contained in any other path with edges from E1 ∩ E2. The contraction of the path Puv maps all of its vertices and edges into one forced edge {u, v} of zero length. All other vertices and edges of the graph remain unchanged. Let F denote the set of forced edges in G'', which are introduced when the contraction is applied to all paths wherever possible. The vertex degrees in G'' are at most 4, and |V''| ≤ n. If an optimum of the TSP on graph G'' with the set of forced edges F is found, then substitution of all forced edges by the corresponding paths yields an optimal solution to the ORP. (Note that the objective functions of these two problems differ by the total length of the contracted paths.) The search for an optimum to the TSP on graph G'' may be carried out by means of the randomized algorithm proposed in [7] for solving the TSP with forced edges on graphs with vertex degree at most 4. Besides the problem input data, this algorithm is given a value p, which sets the desired probability of obtaining the optimum. If p ∈ [0, 1) is a constant which does not depend on the problem input, then the algorithm has time complexity O((27/4)^(n/3)), which is O(1.89^n). There exists a deterministic modification of this algorithm, corresponding to the case p = 1, which requires greater computation time [7].
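A minimal sketch of the contraction step (our own Python; it assumes, as in the text, that the common edges E1 ∩ E2 form vertex-disjoint paths and not a whole common cycle, which would only occur for identical parents):

```python
def contract_common_paths(e1, e2):
    """Replace each maximal path of common edges (E1 ∩ E2) by a single
    forced zero-length edge between its endpoints; return the remaining
    non-forced edges and the set F of forced edges."""
    common = e1 & e2
    adj = {}
    for a, b in map(tuple, common):
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    forced, seen = set(), set()
    # endpoints of maximal common paths have exactly one common neighbour
    for start in [v for v, nb in adj.items() if len(nb) == 1]:
        if start in seen:
            continue
        prev, cur = None, start
        seen.add(start)
        while True:
            step = [w for w in adj[cur] if w != prev]
            if not step:
                break
            prev, cur = cur, step[0]
            seen.add(cur)
        forced.add(frozenset((start, cur)))
    return (e1 | e2) - common, forced
```

Interior vertices of a contracted path have both their tour edges in E1 ∩ E2 in each parent, so no non-forced edge can touch them; this is why the mapping removes them safely.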
4 Conclusions
The obtained results indicate that optimal recombination for the TSP is NP-hard. However, algorithms exist that solve the optimal recombination problem in shorter time than the well-known time bound O(n^2 2^n) [9]. Apparently, the results on NP-hardness of optimal recombination may be extended to some other problems where the set of feasible solutions consists of permutations. For some binary encodings of solutions, such an extension could be made using the reductions of optimal recombination problems [8]. There may be some room for improvement of the algorithms proposed in [7] for the TSP on graphs with vertex degrees at most 3 or 4 and forced edges, in terms of the running time. Thus, it seems important to continue studying this modification of the TSP. Also, in the future it is necessary to perform an experimental study of the proposed optimal recombination algorithms and to compare them to other recombination methods.
References
1. Garey, M.R., Johnson, D.S.: Computers and Intractability. A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, San Francisco (1979)
2. Agarwal, C.C., Orlin, J.B., Tai, R.P.: Optimized crossover for the independent set problem. Working paper #3787-95. MIT, Cambridge (1995)
3. Balas, E., Niehaus, W.: Optimized crossover-based genetic algorithms for the maximum cardinality and maximum weight clique problems. J. Heur. 4(2), 107–122 (1998)
4. Borisovsky, P., Dolgui, A., Eremeev, A.: Genetic algorithms for a supply management problem: MIP-recombination vs greedy decoder. Eur. J. Oper. Res. 195(3), 770–779 (2009)
5. Cook, W., Seymour, P.: Tour merging via branch-decomposition. INFORMS J. Comput. 15(2), 233–248 (2003)
6. Cotta, C., Alba, E., Troya, J.M.: Utilizing dynastically optimal forma recombination in hybrid genetic algorithms. In: Eiben, A.E., Bäck, T., Schoenauer, M., Schwefel, H.-P. (eds.) PPSN 1998. LNCS, vol. 1498, pp. 305–314. Springer, Heidelberg (1998)
7. Eppstein, D.: The traveling salesman problem for cubic graphs. J. of Graph Algor. and Applicat. 11(1), 61–81 (2007)
8. Eremeev, A.V.: On complexity of optimal recombination for binary representations of solutions. Evol. Comput. 16(1), 127–147 (2008)
9. Held, M., Karp, R.M.: A dynamic programming approach to sequencing problems. J. of Soc. for Indust. and Appl. Math. 10, 196–210 (1962)
10. Itai, A., Papadimitriou, C.H., Szwarcfiter, J.L.: Hamilton paths in grid graphs. SIAM J. Comput. 11(4), 676–686 (1982)
11. Radcliffe, N.J.: Forma analysis and random respectful recombination. In: Proc. of the Fourth International Conference on Genetic Algorithms, pp. 31–38 (1991)
12. Reeves, C.R.: Genetic algorithms for the operations researcher. INFORMS J. Comput. 9(3), 231–250 (1997)
13. Whitley, D., Hains, D., Howe, A.: A hybrid genetic algorithm for the traveling salesman problem using generalized partition crossover. In: Schaefer, R., Cotta, C., Kolodziej, J., Rudolph, G. (eds.) PPSN XI. LNCS, vol. 6238, pp. 566–575. Springer, Heidelberg (2010)
14. Yagiura, M., Ibaraki, T.: The use of dynamic programming in genetic algorithms for permutation problems. Eur. J. Oper. Res. 92, 387–401 (1996)
Pareto Local Optima of Multiobjective NK-Landscapes with Correlated Objectives
Sébastien Verel1,3, Arnaud Liefooghe2,3, Laetitia Jourdan3, and Clarisse Dhaenens2,3
1 University of Nice Sophia Antipolis – CNRS, France
2 Université Lille 1, LIFL – CNRS, France
3 INRIA Lille-Nord Europe, France
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. In this paper, we conduct a fitness landscape analysis for multiobjective combinatorial optimization, based on the local optima of multiobjective NK-landscapes with objective correlation. In single-objective optimization, it has become clear that local optima have a strong impact on the performance of metaheuristics. Here, we propose an extension to the multiobjective case, based on the Pareto dominance. We study the co-influence of the problem dimension, the degree of non-linearity, the number of objectives and the correlation degree between objective functions on the number of Pareto local optima.
1 Motivations
The aim of fitness landscape analysis is to understand the properties of a given combinatorial optimization problem in order to design efficient search algorithms. One of the main feature is related to the number of local optima, to their distribution over the search space and to the shape of their basins of attraction. For instance, in single-objective optimization, it has been shown that local optima tend to be clustered in a ‘central massif’ for numerous combinatorial problems, such as the family of N K-landscapes [1]. A lot of methods are designed to ‘escape’ from such local optima. However, very little is known in the frame of multiobjective combinatorial optimization (MoCO), where one of the most challenging question relies on the identification of the set of Pareto optimal solutions. A Pareto Local Optima (PLO) [2] is a solution that is not dominated by any of its neighbors. The description of PLO is one of the first fundamental step towards the description of the structural properties of a MoCO problem. Surprisingly, up to now, there is a lack of study on the number and on the distribution of PLO in MoCO. Like in single-objective optimization, the PLO-related properties clearly have a strong impact on the landscape of the problem, and then on the efficiency of search algorithms. In particular, local search algorithms are designed in order to take them into account. For instance, the family of Pareto Local Search (PLS) [2] iteratively improves a set of solutions with respect to a given neighborhood P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 226–237, 2011. c Springer-Verlag Berlin Heidelberg 2011
Pareto Local Optima of ρMNK-Landscapes
operator and to the Pareto dominance relation. The aim of PLS, like that of a number of other search algorithms, is to find a set of mutually non-dominated PLO. PLS has been proved to terminate on such a set, called a Pareto local optimum set [2]. Notice that a Pareto optimal solution is a PLO, and that the whole set of Pareto optimal solutions is a Pareto local optimum set. The behavior of multiobjective algorithms clearly depends on the properties related to the PLO. First, a Pareto local optimum set is always a subset of the whole set of PLO. Second, the dynamics of a PLS-like algorithm depends on the number of PLO found along the search process: the probability of improving an approximation set that contains a majority of PLO should be smaller than the probability of improving an approximation set with few PLO. There exists a small amount of literature related to fitness landscapes for MoCO. Borges and Hansen [3] study the distribution of local optima, in terms of scalarized functions, for the multiobjective traveling salesman problem (TSP). Another analysis of neighborhood-related properties, for biobjective TSP instances of different structures, is given in [4]. Knowles and Corne [5] conducted a landscape analysis on the multiobjective quadratic assignment problem with a rough objective correlation. Next, the transposition of standard tools from fitness landscape analysis to MoCO is discussed by Garrett [6], and an experimental study is conducted with fitness distance correlation; but this measure requires the true Pareto optimal set to be known. In another study, the landscape of a MoCO problem is regarded as a neutral landscape and divided into different fronts with the same dominance rank [7]; in such a case, a small search space needs to be enumerated.
In previous works on multiobjective NK-landscapes by Aguirre and Tanaka [8], small enumerable fitness landscapes are studied according to the number of fronts, the number of solutions on each front, the probability of passing from one front to another, and the hypervolume of the Pareto front. However, the study of fronts only allows the analysis of small search spaces, and from the point of view of dominance rank only. In this work, we attempt to analyze the structure of large search spaces using the central notion of local optimum. With the design of local search algorithms for MoCO in mind, the following questions are under study in this paper: (i) What is the number of PLO in the whole search space? (ii) Is the number of PLO related to the number of Pareto optimal solutions? In particular, we want to study such properties according to the correlation degree between objective functions. In order to study the problem structure, and in particular the PLO, we use the multiobjective NK-landscapes with objective correlation, ρMNK-landscapes for short, recently proposed in [9]. The contributions of this work can be summarized as follows. First, we show the co-influence of objective correlation, objective space dimension, and epistasis on the number of PLO. Next, we propose a method based on the length of a Pareto adaptive walk to estimate this number. Finally, we study the number of PLO for large-size instances. The paper is organized as follows. Section 2 deals with MoCO and local search algorithms. Section 3 is devoted to the definition of multiobjective NK-landscapes with objective correlation. In Section 4, we study the number of
S. Verel et al.
PLO for enumerable instances, and we propose a method to estimate it. Moreover, we analyze the correlation between the number of PLO and the number of Pareto optimal solutions. In Section 5, the co-influence of objective space dimension, objective correlation, and epistasis on the PLO of large-size instances is studied. The last section concludes the paper.
2 Local Search for Multiobjective Combinatorial Optimization

2.1 Multiobjective Combinatorial Optimization
A multiobjective combinatorial optimization (MoCO) problem can be defined by a set of M ≥ 2 objective functions (f1, f2, ..., fM) and a (discrete) set X of feasible solutions in the decision space. Let Z = f(X) ⊆ IR^M be the set of feasible outcome vectors in the objective space. In a maximization context, a solution x′ ∈ X is dominated by a solution x ∈ X, denoted by x′ ≺ x, iff ∀i ∈ {1, 2, ..., M}, fi(x′) ≤ fi(x) and ∃j ∈ {1, 2, ..., M} such that fj(x′) < fj(x). A solution x ∈ X is said to be Pareto optimal (or efficient, non-dominated) if there does not exist any other solution x′ ∈ X such that x′ dominates x. The set of all Pareto optimal solutions is called the Pareto optimal set (or the efficient set), denoted by XE, and its mapping in the objective space is called the Pareto front. A possible approach in MoCO is to identify a minimal complete Pareto optimal set, i.e. one solution mapping to each point of the Pareto front. In practice, however, the overall goal is often to identify a good Pareto set approximation. To this end, metaheuristics in general, and evolutionary algorithms in particular, have received growing interest since the late eighties. Multiobjective metaheuristics still constitute an active research area [10].

2.2 Local Search
A neighborhood structure is a function N : X → 2^X that assigns a set of solutions N(x) ⊂ X to any solution x ∈ X. The set N(x) is called the neighborhood of x, and a solution x′ ∈ N(x) is called a neighbor of x. In single-objective combinatorial optimization, a fitness landscape can be defined by the triplet (X, N, f), where f : X → IR is the fitness function, which can be pictured as the height of the corresponding solutions. Each peak of the landscape corresponds to a local optimum. In a single-objective maximization context, a local optimum is a solution x* such that ∀x ∈ N(x*), f(x) ≤ f(x*). The performance of local search algorithms has been shown to be related to the number of local optima of the problem under study, and to their distribution over the landscape [11]. In MoCO, given that Pareto optimal solutions are sought, the notion of local optimum has to be defined in terms of Pareto optimality. Let us define the concepts of Pareto local optimum and of Pareto local optimum set; for more details, refer to [2]. A solution x ∈ X is a Pareto local optimum (PLO) with respect to a neighborhood structure N if there does not exist any neighboring solution x′ ∈ N(x) such that x ≺ x′. A Pareto local optimum set X_PLO ⊆ X with respect
to a neighborhood structure N is a set of mutually non-dominated solutions such that ∀x ∈ X_PLO, there does not exist any solution x′ ∈ N(X_PLO) such that x ≺ x′. In other words, a Pareto local optimum set cannot be improved, in terms of Pareto optimality, by adding solutions from its neighborhood. Recently, local search algorithms have been successfully applied to MoCO problems. Such methods seem to take advantage of certain properties of the landscape in order to explore the search space effectively. Two main classes of local search for MoCO can be distinguished. The first, known as scalar approaches, are based on multiple scalarized aggregations of the objective functions. The second, known as Pareto-based approaches, directly or indirectly focus the search on the Pareto dominance relation (or a slight modification of it). One of them is Pareto Local Search (PLS) [2], which combines the use of a neighborhood structure with the management of an archive (or population) of mutually non-dominated solutions found so far. The basic idea is to iteratively improve this archive by exploring the neighborhood of its own content until no further improvement is possible, i.e. until the archive falls into a Pareto local optimum set [2].
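The dominance relation and the PLO test defined above can be written down directly; the following is a minimal Python sketch for a maximization problem over bit strings with a 1-bit-flip neighborhood (the function names are ours):

```python
def dominates(a, b):
    """True iff objective vector a Pareto-dominates b (maximization):
    a is at least as good everywhere and strictly better somewhere."""
    return all(u >= v for u, v in zip(a, b)) and any(u > v for u, v in zip(a, b))

def one_flip_neighbors(x):
    """1-bit-flip neighborhood of a bit string given as a tuple of 0/1."""
    for i in range(len(x)):
        yield x[:i] + (1 - x[i],) + x[i + 1:]

def is_pareto_local_optimum(x, f, neighbors=one_flip_neighbors):
    """x is a PLO iff no neighbor dominates it."""
    fx = f(x)
    return not any(dominates(f(y), fx) for y in neighbors(x))
```

Enumerating `is_pareto_local_optimum` over all 2^N bit strings is exactly the exhaustive PLO count used in Section 4 for N = 18.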
3 ρMNK-Landscapes: Multiobjective NK-Landscapes with Objective Correlation
In single-objective optimization, the family of NK-landscapes constitutes an interesting model to study the influence of non-linearity on the number of local optima. In this section, we present the ρMNK-landscapes proposed in [9]. They are based on the MNK-landscapes [8]; in this multiobjective model, the correlation between objective functions can be precisely tuned by a correlation parameter.

3.1 NK- and MNK-Landscapes
The family of NK-landscapes [1] is a problem-independent model used for constructing multimodal landscapes. N refers to the number of (binary) genes in the genotype (i.e. the string length) and K to the number of genes that influence a particular gene of the string (the epistatic interactions). By increasing the value of K from 0 to N − 1, NK-landscapes can be gradually tuned from smooth to rugged. The fitness function (to be maximized) of an NK-landscape, f_NK : {0,1}^N → [0,1), is defined on binary strings of size N. An 'atom' with fixed epistasis level is represented by a fitness component f_i : {0,1}^{K+1} → [0,1) associated with each bit i. Its value depends on the allele at bit i and also on the alleles at K other epistatic positions (K must fall between 0 and N − 1). The fitness f_NK(x) of a solution x ∈ {0,1}^N is the mean value of its N fitness components:

f_NK(x) = (1/N) Σ_{i=1}^{N} f_i(x_i, x_{i_1}, ..., x_{i_K}),

where {i_1, ..., i_K} ⊂ {1, ..., i−1, i+1, ..., N}. In this work, the K epistatic bits are set randomly on the bit string of size N. Each fitness component f_i is specified by extension, i.e. a number y^i_{x_i, x_{i_1}, ..., x_{i_K}} from [0, 1) is associated with each element
(x_i, x_{i_1}, ..., x_{i_K}) from {0,1}^{K+1}. These numbers are uniformly distributed in the range [0, 1). More recently, a multiobjective variant of NK-landscapes (namely MNK-landscapes) [8] has been defined with a set of M fitness functions:

∀m ∈ [1, M], f_{NK,m}(x) = (1/N) Σ_{i=1}^{N} f_{m,i}(x_i, x_{i_{m,1}}, ..., x_{i_{m,K_m}})
The numbers of epistatic links K_m can in theory be different for each fitness function, but in practice the same epistasis degree K_m = K is used for all m ∈ [1, M]. Each fitness component f_{m,i} is specified by extension with the numbers y^{m,i}_{x_i, x_{i_{m,1}}, ..., x_{i_{m,K_m}}}. In the original MNK-landscapes [8], these numbers are randomly and independently drawn from [0, 1). As a consequence, it is very unlikely that two different solutions map to the same point in the objective space.

3.2 ρMNK-Landscapes
In [9], CMNK-landscapes have been proposed. The epistasis structure is identical for all the objectives: ∀m ∈ [1, M], K_m = K, and ∀m ∈ [1, M], ∀j ∈ [1, K], i_{m,j} = i_j. The fitness components are not defined independently: the numbers (y^{1,i}_{x_i, x_{i_1}, ..., x_{i_K}}, ..., y^{M,i}_{x_i, x_{i_1}, ..., x_{i_K}}) follow a multivariate uniform law of dimension M, defined by a correlation matrix C. Thus, the y's follow a multidimensional law with uniform marginals, and the correlations between the y^{m,i}'s are defined by the matrix C. The four parameters of the family of CMNK-landscapes are therefore (i) the number of objective functions M, (ii) the length of the bit string N, (iii) the number of epistatic links K, and (iv) the correlation matrix C. In ρMNK-landscapes, a matrix C_ρ = (c_np) is considered, with the same correlation between all pairs of objectives: c_nn = 1 for all n, and c_np = ρ for all n ≠ p. However, the matrix C_ρ is not feasible for all ρ in [−1, 1]: ρ must be greater than −1/(M − 1), see [9]. To generate random variables with uniform marginals and a specified correlation matrix C, we follow the work of Hotelling and Pabst [12]. The construction of CMNK-landscapes defines correlations between the y's but not directly between the objectives. In [9], it is proven algebraically that the correlation between objectives is indeed tuned by the matrix C: E(cor(f_n, f_p)) = c_np. In ρMNK-landscapes, the parameter ρ thus allows the correlation between all pairs of objectives to be tuned very precisely.
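As an illustration, the construction can be sketched in Python. The paper generates the correlated component values following Hotelling and Pabst [12]; the sketch below substitutes a simpler one-factor Gaussian copula, which only covers ρ ≥ 0, so it is an approximation of the published procedure, not a reimplementation of it (all function names are ours):

```python
import math
import random

def correlated_uniforms(rho, M, rng):
    """M values with uniform marginals and pairwise correlation driven by
    rho, via a one-factor Gaussian copula (valid for 0 <= rho < 1 only;
    the paper additionally allows negative rho down to -1/(M-1))."""
    g = rng.gauss(0.0, 1.0)  # shared factor inducing the correlation
    z = [math.sqrt(rho) * g + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
         for _ in range(M)]
    ncdf = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))  # N(0,1) CDF
    return [ncdf(t) for t in z]

def make_rho_mnk(N, K, M, rho, seed=0):
    """Build an evaluator for a rho-MNK-style landscape: the same epistatic
    links for every objective, and component tables with correlated entries."""
    rng = random.Random(seed)
    links = [rng.sample([j for j in range(N) if j != i], K) for i in range(N)]
    tables = [[correlated_uniforms(rho, M, rng) for _ in range(2 ** (K + 1))]
              for _ in range(N)]

    def f(x):
        """Each objective is the mean of its N component values."""
        totals = [0.0] * M
        for i in range(N):
            idx = x[i]
            for b, j in enumerate(links[i]):
                idx |= x[j] << (b + 1)
            for m in range(M):
                totals[m] += tables[i][idx][m]
        return [t / N for t in totals]

    return f
```

The returned evaluator maps a bit string (tuple of 0/1) to its M objective values in [0, 1).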
4 Study of Pareto Local Optima
In this section, we first study the number of Pareto local optima (PLO) according to the objective correlation, the number of objectives, and the epistasis of ρMNK-landscapes. Then, we analyze its relation to the size of the Pareto optimal set. Finally, we propose an adaptive walk that is able to estimate this number very precisely. We conduct an empirical study for N = 18, so that all the PLO can be enumerated exhaustively. In order to minimize the influence of
Table 1. Parameters used in the paper for the experimental analysis

Parameter   Values
N           {18} (Section 4), {18, 32, 64, 128} (Section 5)
M           {2, 3, 5}
K           {2, 4, 6, 8, 10}
ρ           {−0.9, −0.7, −0.4, −0.2, 0.0, 0.2, 0.4, 0.7, 0.9} such that ρ ≥ −1/(M − 1)

the random creation of landscapes, we considered 30 different and independent instances for each parameter combination (ρ, M, K). The measures reported are averages over these 30 landscapes. The parameters under investigation in this study are given in Table 1.

4.1 Number of Pareto Local Optima
Fig. 1 shows the average number of PLO relative to the size of the search space (|X| = 2^18) for different ρMNK-landscape parameter settings. As in the well-known result for single-objective NK-landscapes [1], the number of PLO increases with the epistasis degree. For instance, with an objective space dimension M = 2 and an objective correlation ρ = 0.9, the average number of PLO increases more than 30-fold: from 192 for K = 2 to 6048 for K = 10. However, the range of variation is even larger with respect to objective correlation. For the same epistasis degree and number of objectives, the number of PLO decreases exponentially with increasing correlation (Fig. 1, top). Indeed, for an objective space dimension M = 2 and an epistasis degree K = 4, the average number of PLO decreases more than 120-fold: from 82,093 for negative correlation (ρ = −0.9) to 672 for positive correlation (ρ = 0.9). This result can be interpreted as follows. Let us consider an arbitrary solution x and two different objective functions fi and fj. When the objective correlation is high, there is a high probability that fi(x) is close to fj(x). In the same way, the fitness values fi(x′) and fj(x′) of a given neighbor x′ ∈ N(x) are probably close. So, for a given solution x such that there exists a neighbor x′ ∈ N(x) with a better fi-value, the probability is high that fj(x′) is also better than fj(x). More formally, the probability IP(fj(x′) > fj(x) | fi(x′) > fi(x)), with x′ ∈ N(x), increases with the objective correlation. Hence, a solution has a higher probability of being dominated when the objective correlation is high. Under this hypothesis, the probability that a solution dominates all its neighbors decreases with the number of objectives. Fig. 1 (bottom) corroborates this hypothesis. When the objective correlation is negative (ρ = −0.2), the number of PLO changes by an order of magnitude from M = 2 to M = 3, and again from M = 3 to M = 5. This range is smaller when the correlation is positive.
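The probability argument can be checked numerically. In the stylized model below, the objective-value differences between a solution and one of its neighbors are modeled as bivariate standard normals with correlation ρ — a simplification of the actual ρMNK distribution, but enough to see IP(fj(x′) > fj(x) | fi(x′) > fi(x)) grow with ρ:

```python
import math
import random

def conditional_improvement_prob(rho, trials=100_000, seed=0):
    """Monte Carlo estimate of IP(d_j > 0 | d_i > 0), where
    (d_i, d_j) = (f_i(x') - f_i(x), f_j(x') - f_j(x)) are modeled as
    bivariate standard normals with correlation rho."""
    rng = random.Random(seed)
    improved_i = both = 0
    for _ in range(trials):
        di = rng.gauss(0.0, 1.0)
        dj = rho * di + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if di > 0:
            improved_i += 1
            both += dj > 0          # bool counts as 0/1
    return both / max(improved_i, 1)
```

For ρ = 0 the estimate is close to 1/2, while for ρ = 0.9 it rises towards 0.86 (the closed form under this bivariate-normal model is 1/2 + arcsin(ρ)/π), mirroring the drop in the number of PLO as the objectives become correlated.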
When the number of objectives is large and the objective correlation is negative, almost all solutions are PLO. Assuming that the difficulty for Pareto-based search approaches grows with the number of PLO, the difficulty of ρMNK-landscapes increases when: (i) the epistasis increases, (ii) the number of objective functions increases,
Fig. 1. Average number of PLO relative to the size of the search space (|X| = 2^18) according to parameter ρ (top: left M = 2, right M = 5), and to parameter K (bottom: left ρ = −0.2, right ρ = 0.9). The problem size is N = 18.
(iii) the objective correlation is negative and its absolute value increases. Section 5 will make the relative difficulty associated with these parameters more precise for large-size problem instances.

4.2 Estimating the Cardinality of the Pareto Optimal Set?
When the number of Pareto optimal solutions is too large, it becomes impossible to enumerate them all. A metaheuristic must then manipulate a limited-size solution set during the search, which requires specific strategies to bound the size of the approximation set [13]. Hence, the cardinality of the Pareto optimal set also plays a major role in the design of multiobjective metaheuristics. In order to design such an approach, it would be convenient to approximate the size of the Pareto optimal set from the number of PLO. Fig. 2 shows the scatter plot of the average size of the Pareto optimal set vs. the average number of PLO in log-log scale. Points are scattered around the regression line, with a correlation coefficient of 0.82; the regression line equation is log(y) = a log(x) + b with a = 1.059 and b = −6.536. For such a log-log scale, this correlation is low: the cardinality of the Pareto optimal set can only be estimated from the number of PLO to within a factor of about 10. Nevertheless, the number of Pareto optimal solutions clearly increases with the number of PLO.
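The fitted regression gives a rough, order-of-magnitude predictor of the Pareto optimal set size (both quantities as fractions of |X|, as in Fig. 2; the coefficients are specific to N = 18, and natural logarithms are assumed since the paper does not state the base):

```python
import math

def estimate_pareto_set_fraction(plo_fraction, a=1.059, b=-6.536):
    """Predict |X_E|/|X| from |X_PLO|/|X| via log(y) = a*log(x) + b.
    Per the text, the prediction is only reliable to within a factor
    of about 10."""
    return math.exp(a * math.log(plo_fraction) + b)
```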
Fig. 2. Scatter plot of the average size of the Pareto optimal set (to the size of the search) vs. the average number of PLO (to the size of the search) for the 110 possible combinations of parameters. The problem size is N = 18. The correlation coefficient is 0.82. Notice the log-scales.
4.3 Adaptive Walk
In single-objective optimization, the length of adaptive walks performed with a hill-climber makes it possible to estimate the diameter of the basins of attraction of the local optima, and hence to estimate the number of local optima when the whole search space cannot be enumerated exhaustively. In this section, we define a multiobjective hill-climber, and we show that the length of the corresponding adaptive walk is correlated with the number of PLO. We define a very basic single solution-based Pareto Hill-Climbing (PHC) for multiobjective optimization; its pseudo-code is given in Algorithm 1. At each iteration of the PHC algorithm, the current solution is replaced by a random neighboring solution that dominates it. Thus, the PHC stops on a PLO. The number of iterations, or steps, of the PHC algorithm is the length of the Pareto adaptive walk. We performed 10^3 independent PHC executions for each problem instance. Fig. 3 shows the average length of the Pareto adaptive walks for different landscapes according to the set of parameters given in Table 1. The variation of the average length mirrors, in the opposite direction, the variation of the number of PLO. In order to
Algorithm 1. Pareto Hill-Climbing (PHC)
start with a random solution x ∈ X
step ← 0
while x is not a Pareto local optimum do
    randomly choose x′ from {y ∈ N(x) | x ≺ y}
    x ← x′
    step ← step + 1
end while
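Algorithm 1 translates almost line for line into Python; a maximization setting with a 1-bit-flip neighborhood is assumed, and the helper names are ours:

```python
import random

def dominates(a, b):
    """a Pareto-dominates b (maximization)."""
    return all(u >= v for u, v in zip(a, b)) and any(u > v for u, v in zip(a, b))

def one_flip_neighbors(x):
    """1-bit-flip neighborhood of a bit string given as a tuple of 0/1."""
    for i in range(len(x)):
        yield x[:i] + (1 - x[i],) + x[i + 1:]

def pareto_hill_climb(x0, f, rng=None):
    """Pareto Hill-Climbing (Algorithm 1): replace the current solution by
    a uniformly chosen dominating neighbor until none exists. Returns the
    PLO reached and the walk length (number of accepted steps)."""
    rng = rng or random.Random()
    x, steps = x0, 0
    while True:
        fx = f(x)
        better = [y for y in one_flip_neighbors(x) if dominates(f(y), fx)]
        if not better:          # x is a Pareto local optimum
            return x, steps
        x = rng.choice(better)
        steps += 1
```

Averaging `steps` over many random starting solutions gives the Pareto adaptive walk length used in the remainder of the paper.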
Fig. 3. Average length of the Pareto adaptive walk according to parameter ρ (top left M = 2, right M = 5), and according to parameter K (bottom left ρ = −0.2, right ρ = 0.9). The problem size is N = 18.
Fig. 4. Scatter plot of the average number of PLO relative to the size of the search space (|X| = 2^18) vs. the average length of the Pareto adaptive walk. The problem size is N = 18.
show the link with the number of PLO more clearly, Fig. 4 gives the scatter plot of the average Pareto adaptive walk length vs. the logarithm of the average number of PLO. The correlation is strong (r = 0.997), and the regression line equation is log(y) = ax + b, with a = −1.095 and b = 12.443. For bit-strings of length
N = 18, the average length of the Pareto adaptive walks can thus give a precise estimation of the average number of PLO. When the adaptive walk length is short, the diameter of the basin of attraction associated with a PLO is small, which means that the distance between PLO decreases. Moreover, assuming that the volume of a basin is proportional to a power of its diameter, the number of PLO increases exponentially as the adaptive walk length decreases. This corroborates known results from single-objective optimization. Of course, for larger bit-string lengths, the regression coefficients are probably different.
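The fitted relation can be turned into a simple estimator (valid, per the text, only for N = 18; we assume natural logarithms, which is consistent with e^12.443 ≈ 2.5 × 10^5 ≈ |X| = 2^18 at walk length zero):

```python
import math

def estimate_num_plo(mean_walk_length, a=-1.095, b=12.443):
    """Estimate the number of PLO from the average Pareto adaptive walk
    length via log(y) = a*x + b (coefficients fitted for N = 18)."""
    return math.exp(a * mean_walk_length + b)
```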
5 Properties vs. Multi-modality for Large-Size Problems
In this section, we study the number of PLO for large-size ρMNK-landscapes using the length of the adaptive walk proposed in the previous section. First, we analyze this number according to the problem dimension (N). Then, we make precise the difficulty, in terms of PLO, with respect to the objective space dimension (M) and the objective correlation (ρ). We performed 10^3 independent PHC executions for each problem instance. Fig. 5 shows the average length of the Pareto adaptive walks for different landscapes according to the set of parameters given in Table 1. Whatever the objective space dimension and correlation, the length of the adaptive walks increases
Fig. 5. Average length of the Pareto adaptive walk according to problem size (N ) for K = 4 and ρ = −0.2 (top-left) and for K = 4 and M = 2 (top-right). Average length of the Pareto adaptive walk according to objective correlation (ρ) for K = 4 and N = 64 (bottom-left) and for K = 4 and N = 128 (bottom-right).
linearly with the search space dimension N. According to the results of the previous section, the number of PLO therefore increases exponentially. We can then reasonably conclude that the size of the Pareto optimal set grows exponentially as well, to within an order of magnitude (Section 4.2). However, the slope of the increase of the Pareto adaptive walk length is related to the objective space dimension (M) and correlation (ρ): the higher the number of objective functions, the smaller the slope; likewise, the higher the objective correlation, the smaller the slope. Fig. 5 (bottom) allows us to give a qualitative comparison for given problem sizes (N = 64 and N = 128). Indeed, let us consider an arbitrary adaptive walk of length 10. For ρMNK-landscapes with N = 64 and K = 4, this length corresponds approximately to the parameter settings (ρ = −0.4, M = 2), (ρ = 0.3, M = 3), and (ρ = 0.7, M = 5) at the same time. For N = 128, we have (ρ = −0.9, M = 2), (ρ = −0.1, M = 3), and (ρ = 0.3, M = 5). Still assuming that problem difficulty is closely related to the number of PLO, an instance with a small objective space dimension and a negative objective correlation can thus be more difficult to solve than one with many positively correlated objectives.
6 Discussion
This paper gave a fitness landscape analysis for multiobjective combinatorial optimization based on the local optima of multiobjective NK-landscapes with objective correlation. We first focused on small-size problems with a study of the number of local optima by complete enumeration. As in single-objective optimization, the number of local optima increases with the degree of non-linearity of the problem (epistasis). However, the number of objective functions and the objective correlation have a stronger influence. Furthermore, our results show that the cardinality of the Pareto optimal set clearly increases with the number of local optima. We proposed a Pareto adaptive walk, associated with a Pareto hill-climber, to estimate the number of local optima for a given problem size. For large-size instances, the length of such a Pareto adaptive walk gives a measure related to the difficulty of a multiobjective combinatorial optimization problem. We showed that this length increases linearly with the problem size, which suggests an exponential growth of the number of local optima. A problem with a small number of negatively correlated objectives gives the same degree of multi-modality, in terms of Pareto dominance, as another problem with a high objective space dimension and a positive correlation. A similar analysis would help to better understand the structure of the landscape of other multiobjective combinatorial optimization problems. However, an appropriate model to estimate the number of local optima for any problem size still needs to be properly defined. A possible path is to generalize the approach of [14] to the multiobjective case. For a more practical purpose, our results should also be put in relation with the type of problem under study, in particular regarding how to compute or estimate the problem-related measures reported in this paper. Moreover, we mainly focused on the number of local optima; the next step is to analyze their distribution by means of a local optima network [15].
Finally, we already know that the number and the distribution of local optima have a strong impact on the performance of multiobjective
metaheuristics, but it is not yet clear how exactly they affect the search. This open issue constitutes one of the main challenges in the field of fitness landscape analysis for multiobjective combinatorial optimization.
References
1. Kauffman, S.A.: The Origins of Order. Oxford University Press, New York (1993)
2. Paquete, L., Schiavinotto, T., Stützle, T.: On local optima in multiobjective combinatorial optimization problems. Ann. Oper. Res. 156(1), 83–97 (2007)
3. Borges, P., Hansen, M.: A basis for future successes in multiobjective combinatorial optimization. Technical Report IMM-REP-1998-8, Institute of Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark (1998)
4. Paquete, L., Stützle, T.: Clusters of non-dominated solutions in multiobjective combinatorial optimization: An experimental analysis. In: Multiobjective Programming and Goal Programming. Lecture Notes in Economics and Mathematical Systems, vol. 618, pp. 69–77. Springer, Heidelberg (2009)
5. Knowles, J., Corne, D.: Towards landscape analyses to inform the design of a hybrid local search for the multiobjective quadratic assignment problem. In: Soft Computing Systems: Design, Management and Applications, pp. 271–279 (2002)
6. Garrett, D., Dasgupta, D.: Multiobjective landscape analysis and the generalized assignment problem. In: Maniezzo, V., Battiti, R., Watson, J.-P. (eds.) LION 2007 II. LNCS, vol. 5313, pp. 110–124. Springer, Heidelberg (2008)
7. Garrett, D., Dasgupta, D.: Plateau connection structure and multiobjective metaheuristic performance. In: Congress on Evolutionary Computation (CEC 2009), pp. 1281–1288. IEEE, Los Alamitos (2009)
8. Aguirre, H.E., Tanaka, K.: Working principles, behavior, and performance of MOEAs on MNK-landscapes. Eur. J. Oper. Res. 181(3), 1670–1690 (2007)
9. Verel, S., Liefooghe, A., Jourdan, L., Dhaenens, C.: Analyzing the effect of objective correlation on the efficient set of MNK-landscapes. In: Learning and Intelligent Optimization (LION 5). LNCS. Springer, Heidelberg (2011) (to appear)
10. Coello Coello, C.A., Dhaenens, C., Jourdan, L. (eds.): Advances in Multi-Objective Nature Inspired Computing. Studies in Computational Intelligence, vol. 272. Springer, Heidelberg (2010)
11. Merz, P.: Advanced fitness landscape analysis and the performance of memetic algorithms. Evol. Comput. 12(3), 303–325 (2004)
12. Hotelling, H., Pabst, M.R.: Rank correlation and tests of significance involving no assumptions of normality. Ann. Math. Stat. 7, 29–43 (1936)
13. Knowles, J., Corne, D.: Bounded Pareto archiving: Theory and practice. In: Metaheuristics for Multiobjective Optimisation. Lecture Notes in Economics and Mathematical Systems, vol. 535, pp. 39–64. Springer, Heidelberg (2004)
14. Eremeev, A.V., Reeves, C.R.: On confidence intervals for the number of local optima. In: Raidl, G.R., Cagnoni, S., Cardalda, J.J.R., Corne, D.W., Gottlieb, J., Guillot, A., Hart, E., Johnson, C.G., Marchiori, E., Meyer, J.-A., Middendorf, M. (eds.) EvoWorkshops 2003. LNCS, vol. 2611, pp. 224–235. Springer, Heidelberg (2003)
15. Daolio, F., Verel, S., Ochoa, G., Tomassini, M.: Local optima networks of the quadratic assignment problem. In: Congress on Evolutionary Computation (CEC 2010), pp. 1–8. IEEE, Los Alamitos (2010)
Quick-ACO: Accelerating Ant Decisions and Pheromone Updates in ACO

Wei Cheng¹, Bernd Scheuermann¹, and Martin Middendorf²

¹ SAP AG, SAP Research Center Karlsruhe, Vincenz-Priessnitz-Str. 1, 76131 Karlsruhe, Germany
{wei.cheng,bernd.scheuermann}@sap.com
² University of Leipzig, Faculty of Mathematics and Computer Science, Johannisgasse 26, 04103 Leipzig, Germany
[email protected]
Abstract. In Ant Colony Optimization (ACO) algorithms, solutions are constructed through a sequence of probabilistic decisions by artificial ants. These decisions are guided by information stored in a pheromone matrix, which is repeatedly updated in two ways: pheromone values in the matrix are increased by the ants to mark preferable decisions (probabilistic selection of items), whereas evaporation reduces each pheromone value by a certain percentage to weaken the relevance of former, potentially unfavorable, decisions. This paper introduces novel methods for expedited ant decisions and pheromone updates in ACO. It is proposed to speed up the decisions of ants by temporarily allowing them to select any item. If this item has already been chosen before (which would result in an inadmissible solution), the ant repeats its decision until an admissible item has been chosen. This method avoids continuously determining the probability distributions over the still-admissible items, which would otherwise require frequent, expensive prefix-sum calculations. The procedure of pheromone matrix updates is accelerated by entirely abandoning evaporation while re-scaling pheromone values and update increments. It should be emphasized that neither new method changes the optimization behavior compared to standard ACO. In experimental evaluations with a range of benchmark instances of the Traveling Salesman Problem, the new methods were able to save up to 90% of the computation time compared to an ACO algorithm that uses standard procedures for pheromone update and decision making. Keywords: Ant Colony Optimization, ant decision, pheromone update, speed up.
1 Introduction
Ant colonies are capable of finding shortest paths between their nest and food sources (see e.g. [4]). This complex behavior of the colony is possible because the ants communicate indirectly by depositing traces of pheromone as they walk along a chosen path. Following ants most likely prefer those paths possessing the
P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 238–249, 2011.
© Springer-Verlag Berlin Heidelberg 2011
strongest pheromone information, thereby refreshing or further increasing the respective amounts of pheromone. Since ants need less time to traverse short paths, pheromone traces on these paths are increased very frequently. On the other hand, pheromone information is permanently reduced by evaporation, which diminishes the influence of formerly chosen unfavorable paths. This combination focuses the search process on short, profitable paths. Inspired by this biological paradigm, the Ant Colony Optimization (ACO) meta-heuristic was introduced for the Traveling Salesman Problem [5]. Later, the range of ACO applications was gradually extended; an overview is given in [1,6]. Typically, in ACO a set of artificial ants searches for good solutions to the optimization problem under consideration. Each ant constructs a solution by making a sequence of local decisions guided by pheromone information and some additional heuristic information (if applicable). After a number of ants have constructed solutions, some ants are allowed to update the pheromone information along their paths through the decision graph. Evaporation is accomplished by globally reducing the pheromone information by a certain percentage. This process is repeated iteratively until a stopping criterion is met. This paper proposes to speed up the decisions of artificial ants by temporarily allowing them to select any item (e.g. the next city to be visited on a tour for the Traveling Salesman Problem). If this city has already been chosen before (which would result in an inadmissible tour with cycles), the ant repeats its decision until a city has been chosen which has not been visited before. This method avoids continuously determining the probability distributions over the still-admissible cities, which would otherwise require frequent, expensive prefix-sum calculations. With this new method, the optimization behavior remains the same as in standard ACO.
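The rejection-based decision described above can be sketched as follows. The prefix sums are built once over the full set of cities and reused across decisions, and each decision redraws from the full distribution until an unvisited city comes up, falling back to the standard renormalized selection after a bounded number of failures. The attempt limit of 20 is an arbitrary illustration value; the paper makes this limit adaptive to the construction stage:

```python
import bisect
import random

def build_prefix(weights):
    """Prefix sums over *all* cities; computed once and reused, instead of
    being recomputed over the admissible cities at every decision."""
    prefix, total = [], 0.0
    for w in weights:
        total += w
        prefix.append(total)
    return prefix

def pick_next_city(prefix, weights, visited, rng, max_attempts=20):
    """Roulette-wheel selection among unvisited cities, by rejection."""
    total = prefix[-1]
    for _ in range(max_attempts):
        j = bisect.bisect_left(prefix, rng.random() * total)
        if j not in visited:
            return j
    # Fallback: the standard selection over the admissible cities only,
    # which always succeeds but costs an O(n) pass.
    admissible = [j for j in range(len(weights)) if j not in visited]
    r = rng.random() * sum(weights[j] for j in admissible)
    acc = 0.0
    for j in admissible:
        acc += weights[j]
        if acc >= r:
            return j
    return admissible[-1]
```

In a real ACO the weights would be the pheromone–heuristic products of the current city's row, whose prefix sums stay valid as long as the pheromone matrix is not updated.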
As a further improvement, it is proposed to constrain the maximum number of unsuccessful selection attempts. If this limit (which is adaptive with respect to the current stage within solution construction) is exceeded, the standard ACO selection procedure is applied, which guarantees an admissible decision. In early experiments with real ant colonies [7,3], it was shown that the degree of pheromone evaporation did not play an important role. However, in the context of ACO, where the optimization problems tackled by the artificial ants are much more complex, evaporation turned out to be very useful [6], as the ants gradually “forget” unfavorable decisions made in the past. Furthermore, evaporation bounds the pheromone value that can be reached on the traversed paths. In a typical ACO algorithm (i.e., without expensive, specialized construction heuristics or local search), evaporation requires O(n²) time per iteration and accounts for a large portion of the overall algorithm runtime. The second new method proposed in this paper is to completely abandon explicit evaporation by appropriately re-scaling pheromone values and pheromone update increments. Here, too, the new method does not change the optimization behavior of the algorithm. Hence, the new methods can be used for the design of future ACO algorithms, but they can also be applied to speed up most existing ACO algorithms. Many interesting variants and improvements of the original ACO
240    W. Cheng, B. Scheuermann, and M. Middendorf

1  Initialize;
2  while termination condition not met do
3      ConstructSolutions;
4      PheromoneUpdate;
5  end

Algorithm 1. Top-level structure of a typical ACO algorithm

algorithm have been proposed in recent years. An example is the Max-Min Ant System (MMAS) [12]. Most of these variants do not change the core of the ACO approach, i.e., how the pheromone information is used to make decisions and how it is updated in every iteration of the algorithm. Two exceptions that also avoid explicit evaporation shall be mentioned: Counter-based ACO [11], where evaporation is replaced by locally reducing pheromones during solution construction, and Population-based ACO [8], where evaporation is simulated when a solution is deleted from the population. In both cases, however, the optimization behavior differs from that of the standard ACO algorithm. To test the new methods, they have been integrated into an ACO algorithm and applied to some standard benchmark TSP instances. The results show that a significant speedup is gained compared to a standard ACO implementation. Moreover, the influence of the parameters of the new methods on the speedup is investigated. The remainder of this paper is structured as follows. Section 2 briefly explains the standard ACO algorithm. The new decision and pheromone update methods of the Quick-ACO algorithm are introduced in Section 3. Experimental results are presented in Section 4. Conclusions and an outlook on future work are given in Section 5.
2   The Standard ACO Algorithm
In this section, a standard ACO algorithm is described for the Traveling Salesman Problem (TSP) as an exemplary optimization problem. It should be noted that there exist many variants of this algorithm, but the basic structure is similar in the majority of ACO algorithms that have been described in the literature for (static single-objective) combinatorial optimization problems. Given a graph G = (N, E) with set of nodes (cities) N = {0, . . . , n − 1}, set of edges E = N × N, and distances dij between cities i and j for i, j ∈ {0, . . . , n − 1}, the objective of the TSP is to find a distance-minimal Hamiltonian cycle. A Hamiltonian cycle is a closed walk (tour) on the graph G such that each city is visited exactly once. One can distinguish between the symmetric TSP and the asymmetric TSP. In the symmetric TSP, the distance between two cities does not depend on the direction of traversing the edge connecting them, i.e., dij = dji for every pair of cities. A TSP is called asymmetric if dij ≠ dji for at least one pair of cities. A typical ACO algorithm has the general top-level structure shown in Algorithm 1. The procedures of the ACO are described in the following.
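To make these definitions concrete, here is a minimal Python sketch (the 4-city distance matrix and all names are illustrative assumptions, not taken from the paper):

```python
# A tour is a permutation of cities 0..n-1; its length sums the distances of the
# traversed edges, including the closing edge back to the start city.

def tour_length(tour, d):
    """Length of the Hamiltonian cycle `tour` under distance matrix `d`."""
    n = len(tour)
    return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))

def is_symmetric(d):
    """True iff d_ij = d_ji for every pair of cities (symmetric TSP)."""
    n = len(d)
    return all(d[i][j] == d[j][i] for i in range(n) for j in range(n))

# Example: a 4-city asymmetric instance
d = [[0, 1, 9, 9],
     [9, 0, 1, 9],
     [9, 9, 0, 1],
     [1, 9, 9, 0]]
print(tour_length([0, 1, 2, 3], d))  # 1+1+1+1 = 4
print(is_symmetric(d))               # False
```

The cycle [0, 1, 2, 3] is optimal here; reversing it would cost 36 because the instance is asymmetric.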
Initialize. An n × n pheromone matrix τ is initialized such that τij := τinit with initial pheromone value τinit > 0. This pheromone matrix is used as the accumulated global memory in which the ants remember how beneficial previous decisions have been. For the TSP, pheromone value τij expresses the desirability of moving from city i to city j. In addition to the pheromone information, the ants may consider heuristic information η for their decisions. For the TSP, ηij := 1/dij is an established heuristic indicating that a closer city should be preferred as the next city. The subsequent procedures are executed until a stopping condition is met (e.g., the maximal number of iterations has been completed). ConstructSolutions. Each ant w ∈ {0, . . . , m − 1} starts from a randomly chosen city with an empty tour that has initial tour length Lw = 0. The tour of the ant is then constructed by a sequence of random decisions, each of which adds a new city to the tour. Assuming that ant w is located in city i, the probability that the ant selects city j as the next city in the tour is determined by
p_ij = (τij^α · ηij^β) / (Σ_{h∈S} τih^α · ηih^β)
where selection set S comprises all cities that have not been included in the tour so far, and α, β are parameters that define the relative influence of the pheromone and heuristic information. If ant w has chosen next city j, then the tour length is increased accordingly: Lw := Lw + dij. This process is repeated until the ant has included every city in the tour, and the final tour length is determined by adding to Lw the distance from the last city to the first city of the tour. PheromoneUpdate. After m ants have created their tours, the pheromone matrix for the next iteration is updated in two steps. First, each pheromone value is reduced by a certain percentage, i.e., τij := (1 − ρ) · τij for all i, j ∈ {0, . . . , n − 1}, where parameter ρ ∈ (0, 1) is called the evaporation rate. Then, the ants that found good solutions are allowed to increase the pheromone values that correspond to the traversed edges of their solutions. In most ACO algorithms, the ants that found the b best tours are allowed to update, where b = 1 is a common value. Typically, the increase of a pheromone value is done by simply adding a fixed amount of pheromone or by adding an amount of pheromone that depends on the quality of the found solution. For example, τij := τij + Σ_{w∈B} Δ^w_ij, where B is the set of the b best ants, Δ^w_ij := 1/Lw if edge eij is contained in the tour of ant w, and Δ^w_ij := 0 otherwise. Other update strategies have been proposed in the literature, for example updating the pheromone also according to the best tour found so far over all iterations. In the case of the symmetric TSP, it is common practice to maintain symmetric pheromone values τij = τji in accordance with the symmetric distances between cities. In the context of pheromone updates, this can be achieved by always incrementing (or decrementing) τij and τji by identical values.
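As a concrete illustration of this decision rule, the following minimal Python sketch samples the next city with a roulette wheel over the admissible set (the data layout — τ and η as nested lists — and all names are assumptions for illustration, not taken from the paper):

```python
import random

def choose_next_city(i, S, tau, eta, alpha=1.0, beta=2.0, rng=random):
    """Standard ACO decision: sample the next city j from the admissible set S
    with probability proportional to tau[i][j]**alpha * eta[i][j]**beta.
    The weights over S must be re-summed for every single decision."""
    cities = list(S)
    weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in cities]
    r = rng.uniform(0.0, sum(weights))
    acc = 0.0
    for j, w in zip(cities, weights):
        acc += w
        if r < acc:
            return j
    return cities[-1]  # guard against floating-point round-off
```

Because S shrinks by one city per decision, these weights (implicit prefix sums) are recomputed for each of the n decisions of a tour — exactly the cost that Quick-ACO later avoids.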
1  t := 0;
2  initPheromoneMatrix;
3  initHeuristicsMatrix;
4  initProductsMatrix;
5  initPrefixMatrix;

Algorithm 2. Procedure Initialize of the Quick-ACO algorithm
3   The Quick-ACO Algorithm
In this section, it is described how a standard ACO algorithm for the TSP can be suitably modified to accelerate the procedures for ant decision-making and pheromone update. In principle, standard ACO algorithms for other optimization problems can be extended similarly.

3.1   Initialization
The initialization procedure is shown in Algorithm 2. As in the standard ACO algorithm for the TSP, the pheromone matrix is initialized as τij := τinit in procedure initPheromoneMatrix, and procedure initHeuristicsMatrix stores the heuristic values ηij := 1/dij into a matrix. The Quick-ACO algorithm maintains two additional matrices. Procedure initProductsMatrix initializes an n × n matrix ϕ with the products of weighted pheromone values and corresponding heuristic values, ϕij := τij^α · ηij^β. Procedure initPrefixMatrix initializes an (n − k) × n matrix prefix which stores the prefix sums over these products, prefix_ij := Σ_{h=0}^{j} ϕih, where k is an input parameter explained in the following section. Note that it is reasonable to store these prefix sums since in algorithm Quick-ACO they remain constant during the subsequent ConstructSolutions procedure (this differs from the standard ACO).

3.2   Solution Construction
With respect to solution construction, the novelty of the Quick-ACO algorithm is that each ant attempts to make its first n − k decisions irrespective of selection set S. Hence, initially the ant can choose any city to be visited next. The advantage of this approach is that the ant can make use of the constant, pre-calculated prefix sums over ϕij. In the standard ACO algorithm, the prefix sums for all cities in S would have to be re-calculated prior to every decision, which has a substantial impact on algorithm runtime. Applying this new method to the first n − k decisions only is motivated by the fact that during the initial decisions of each tour, selection set S still contains many cities, so the likelihood of choosing an inadmissible city is relatively low. Furthermore, it is assumed that as the pheromone matrix converges, the chance of choosing a city from S tends to increase. The final k cities in a tour are selected with the standard selection procedure because the chance of choosing an inadmissible city becomes increasingly high.
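As an illustration, the expedited selection can be sketched in Python as follows (a sketch under assumed names — prefix rows, a visited set, and the attempt limit L discussed later in this section; not the authors' implementation):

```python
import bisect
import math
import random

def select_dest_city_fast(cs, prefix, visited, L, rng=random):
    """Attempt up to L fast decisions from city cs.  prefix[cs] holds the
    prefix sums over phi[cs][0..n-1]; they stay constant during the whole
    ConstructSolutions phase, so no per-decision re-summation is needed.
    Returns an unvisited city, or None if all L attempts failed (the caller
    then falls back to the standard selection)."""
    row = prefix[cs]
    total = row[-1]
    for _ in range(L):
        r = rng.uniform(0.0, total)
        # O(log n) identification of the city whose prefix interval contains r;
        # the min() guards the rare case r == total.
        cd = min(bisect.bisect_right(row, r), len(row) - 1)
        if cd not in visited:  # validity check (a set here, O(1))
            return cd
    return None

def attempt_limit(n, v, c=0.3):
    """Adaptive limit L = c * (n - v) / log2(n) for the (v+1)-th decision."""
    return max(1, int(c * (n - v) / math.log2(n)))
```

A Python set stands in for the validity check, and the prefix rows are computed once per iteration; a None result signals the switch to the standard selection procedure.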
 1  for w := 0, . . . , m − 1 do
 2      S := {0, . . . , n − 1};
 3      choose randomly start city cf ∈ S; S := S \ {cf};
 4      start city cs := cf;
 5      for v := 0, . . . , n − k − 1 do
 6          repeat
 7              cd := selectDestCityFast();
 8          until cd ∈ S;
 9          πw(cs) := cd; cs := cd; S := S \ {cd};
10      end
11      for v := n − k, . . . , n − 2 do
12          cd := selectDestCityStand();
13          πw(cs) := cd; cs := cd; S := S \ {cd};
14      end
15      πw(cs) := cf;
16  end

Algorithm 3. Procedure ConstructSolutions of the Quick-ACO algorithm

In the following, the new method is introduced by explaining the algorithm and by giving a mathematical assessment of its runtime characteristics. Subsequently, a modification is proposed which restricts the number of selection attempts.

Algorithmic Description. Algorithm 3 outlines the procedure for solution construction. Each ant w starts creating a tour from a randomly selected first city cf ∈ S (line 3). Subsequently, the ant makes n − 2 decisions (selections of next destination cities) until a complete tour has been created. The initial n − k destination cities (lines 5-9) are determined with an expedited selection method called selectDestCityFast, where a random number r is drawn from the interval [0, prefix_{cs,n−1}) (line 7). Afterwards, the corresponding destination city cd with r ∈ [prefix_{cs,cd−1}, prefix_{cs,cd}) is identified by binary search. Destination city cd is added to tour πw of ant w as the next city visited after city cs. The remaining k − 1 cities of tour πw are always selected according to the standard selection procedure selectDestCityStand (lines 11-13). This includes drawing a random number r ∈ [0, Σ_{ci∈S} ϕ_{cs,ci}). The destination city cd = cj is determined such that Φ_{j−1} ≤ r < Φ_j, where Φ_j = Σ_{ci∈S, i≤j} ϕ_{cs,ci}. After n − 1 decisions, cf is chosen as the final city, which closes the Hamiltonian cycle.

Runtime Characteristics. Consider an ant w that has already started creating a tour containing v cities and now has to select the (v + 1)-th city of the tour. For an approximate analysis of the run time, we assume for simplicity that the probability p of choosing an admissible city from S can be approximated by p̂ = (n − v)/n. Typically, during later iterations of the algorithm, when the pheromone values converge, p tends to become increasingly larger than p̂. Then, the expected number A of attempts to make a valid decision can be expressed
by

A = Σ_{a=1}^{∞} a · (1 − p)^{a−1} · p = 1/p ≤ n/(n − v).

Every check whether a chosen city is admissible requires time O(log₂ n) using binary search. Hence, the expected time t_quick for the first n − k ant decisions can be expressed by

t_quick = O(log₂ n · Σ_{v=0}^{n−k−1} n/(n − v)) = O(n log₂ n · (Σ_{i=1}^{n} 1/i − Σ_{i=1}^{k} 1/i)).

The remaining decisions for v ∈ {n − k, . . . , n − 1} require t_standard = O(Σ_{i=1}^{k} i) = O(k(k + 1)/2) = O(k²) time. Therefore, the total time for all decisions of an ant is

t_comb = O(n log₂ n · (Σ_{i=1}^{n} 1/i − Σ_{i=1}^{k} 1/i) + k²).

Subsequently, runtime t_comb is examined for three different cases.

Case 1: k = o(√(n log₂ n)). The following calculations make use of the harmonic number H_n = Σ_{i=1}^{n} 1/i, which can be approximated by H_n ≈ ln n + γ for large n, where γ ≈ 0.577 is the Euler-Mascheroni constant. Hence, the total execution time for n → ∞ is t_comb ≤ O(n log₂ n · Σ_{i=1}^{n} 1/i + n log₂ n) = O(n log₂ n · (ln n + γ) + n log₂ n) = O(n log₂² n).

Case 2: k = Θ(√(n log₂ n)). In this case, the total execution time for n → ∞ can be expressed as t_comb = O(n log₂ n · (ln n − ln k) + k²) = c₁ · n log₂ n · (ln n − ln k) + c₂ · k², where c₁, c₂ are constants. Time t_comb is minimal when dt_comb/dk = −c₁ n log₂ n · (1/k) + 2c₂ k = 0, which holds for k = √((c₁/(2c₂)) · n log₂ n), such that the total execution time is t_comb = O(n log₂ n · ln(n/log₂ n)) = O(n log₂² n). Total time t_comb reaches its global minimum when k = Θ(√(n log₂ n)). Since d²t_comb/dk² ≥ 0 for all k = O(√(n log₂ n)) (cases 1 and 2), t_comb = O(n log₂² n) holds.

Case 3: k = ω(√(n log₂ n)). This case can be sub-divided into 3a) k ∈ ω(√(n log₂ n)) ∧ O(k′) and 3b) k ∈ ω(k′), where k′ needs to be determined such that in case 3a) the computation time is t_comb = O(n log₂² n) and in case 3b) t_comb = ω(n log₂² n) holds. If the combined runtime is t_comb = ω(n log₂² n), then lim_{n→∞} (n log₂ n · (ln n − ln k) + k²)/(n log₂² n) = ∞. So it can be concluded: k′ = √n · log₂ n.
From the three cases above, it can be concluded that the expected run time of the solution construction of algorithm Quick-ACO is O(n log₂² n) if k is chosen from O(√n · log₂ n). In the worst case, the run time is O(n²), which is equal to the time for solution construction of the standard ACO algorithm. Concluding from this theoretical discussion, choosing parameter k ≈ √(n log₂ n) may provide some guidance when implementing the algorithm.

Restriction of Selection Attempts. In Algorithm 3 (line 7), an ant may repeatedly select a destination city which it has already visited before. Under unfortunate circumstances, the actual number of such selection attempts may become undesirably high. Therefore, it is suggested to switch to the standard selection procedure after L unsuccessful attempts. In the following, it is discussed how to choose this parameter appropriately.
It was outlined that during the (v + 1)-th decision of an ant, the probability p of choosing a valid city can be approximated by p̂ = (n − v)/n. The Quick-ACO selection method requires O(log₂ n) time to identify the selected city and to verify its validity. The standard method needs O(n − v) time to make a decision. Consequently, the total expected time t_dec for an ant decision is

t_dec = Σ_{l=1}^{L} (1 − p)^{l−1} p · c₃ l log₂ n + (1 − p)^L · c₄ (n − v)
      = ((1 − (1 − p)^L)/p) · c₃ log₂ n − L (1 − p)^L · c₃ log₂ n + (1 − p)^L · c₄ (n − v)

with c₃ and c₄ being constant values. Time t_dec is minimal if dt_dec/dL = 0, which is satisfied for L = c₃ (n − v)/(c₄ log₂ n) − 1/p − 1/ln(1 − p). As the actual value of −1/p − 1/ln(1 − p) ∈ (−1, −0.5) for p ∈ (0, 1) is comparatively small, setting the parameter in the magnitude of L = Θ((n − v)/log₂ n) can be considered a suitable orientation value for the implementation of the algorithm. This choice also has a clear practical explanation: asymptotically, applying the new method L times takes as long as one subsequent standard ACO ant decision.

3.3   Updating Pheromone Matrix and Prefix Sum Matrix
Algorithm Quick-ACO accelerates the pheromone update procedure by using suitably re-scaled pheromone values τ′ij = τij · R^t, where τij denotes the regular pheromone value (as in standard ACO), R = 1/(1 − ρ) the re-scaling factor, and t the iteration counter. Observe that re-scaling the pheromone values does not change the selection probabilities:

p_ij = ((τij · R^t)^α · ηij^β) / (Σ_{h∈S} (τih · R^t)^α · ηih^β) = (τij^α · ηij^β) / (Σ_{h∈S} τih^α · ηih^β)
In the standard ACO algorithm, the pheromone update is done by setting τij(t) := (1 − ρ) · τij(t − 1) + Δij. Re-scaled pheromone values in the Quick-ACO algorithm are updated likewise: τ′ij(t) = τij(t) · R^t = (1 − ρ) · τij(t − 1) · R^t + Δij · R^t = τ′ij(t − 1) + Δij · R^t. Hence, if Δij = 0 then τ′ij(t) = τ′ij(t − 1), such that no explicit evaporation is needed anymore, which would otherwise require O(n²) time. The novel update procedure (shown in Algorithm 4) maintains a modified re-scaling factor R′ = R^t. As in the standard ACO algorithm, a set B of ants is determined which are allowed to perform an update. Each ant w from the set of b = |B| updating ants increments pheromone value τ′_{i,πw(i)} by δ := Δ^w_i · R′ with Δ^w_i = Δ_{i,πw(i)} > 0. Accordingly, the values ϕ_{i,πw(i)} are adjusted. Afterwards, the prefix sums are updated by updatePrefixMatrix. In the case of b > 1, it is proposed to re-calculate the entire prefix sum matrix. If b = 1, one may save computation time by restricting the prefix sum updates to j ∈ {πw(i), . . . , n − 1} in each row i, as all other prefix sum values for j < πw(i) remain unchanged. Finally, the re-scaling factor is updated to R′ := R′/(1 − ρ).
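The effect of this re-scaling can be illustrated with a small numeric sketch (variable names and the toy 2 × 2 matrix are illustrative assumptions): after any number of iterations, dividing the re-scaled value by R^t recovers exactly the standard pheromone value, while entries without deposits are never touched.

```python
# Instead of multiplying the whole n x n matrix by (1 - rho) each iteration,
# deposits are scaled up by R' = R**t with R = 1/(1 - rho); untouched entries
# then need no update at all.

rho = 0.1
R = 1.0 / (1.0 - rho)

def standard_update(tau, deposits):
    """tau_ij(t) = (1 - rho) * tau_ij(t-1) + delta_ij  -- O(n^2) evaporation."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)
    for (i, j), delta in deposits.items():
        tau[i][j] += delta

def rescaled_update(tau_r, deposits, R_t):
    """tau'_ij(t) = tau'_ij(t-1) + delta_ij * R^t  -- touches only updated edges."""
    for (i, j), delta in deposits.items():
        tau_r[i][j] += delta * R_t

# Equivalence check: after t iterations, tau'_ij / R^t == tau_ij.
tau = [[1.0, 1.0], [1.0, 1.0]]
tau_r = [[1.0, 1.0], [1.0, 1.0]]
R_t = 1.0
for t in range(1, 4):
    R_t *= R
    standard_update(tau, {(0, 1): 0.5})
    rescaled_update(tau_r, {(0, 1): 0.5}, R_t)
print(abs(tau_r[0][1] / R_t - tau[0][1]) < 1e-12)  # True
```

Per iteration, only the b updated rows are touched, which is where the O(n²) → O(n) reduction of the update step comes from.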
4   Experimental Results
The Quick-ACO algorithm is evaluated on instances of the asymmetric TSP from the TSPLIB [10] and on some real-world test instances [2]. The ACO algorithms
determine set of updating ants B and increment values Δ;
if t = 1 then
    R′ := 1/(1 − ρ);
end
foreach w ∈ B do
    for i := 0, . . . , n − 1 do
        δ := Δ^w_i · R′;
        τ′_{i,πw(i)} := τ′_{i,πw(i)} + δ;
        ϕ_{i,πw(i)} := (τ′_{i,πw(i)})^α · (η_{i,πw(i)})^β;
    end
end
updatePrefixMatrix;
t := t + 1;
R′ := R′/(1 − ρ);

Algorithm 4. Procedure PheromoneUpdate (including the update of the prefix sum matrix) of the Quick-ACO algorithm

mentioned below do not use local search, because the main aim of this paper is to compare the core concepts of the different ACO algorithms with respect to run time. In the following, the influence of parameters k and L is discussed and an experimental comparison between Quick-ACO and standard ACO is made. The implementations of both algorithms use two improvements from the Max-Min Ant System (MMAS) [12]: 1) The τ values are limited to an interval [τmin, τmax]. In Quick-ACO, the size of the pheromone values is checked every u iterations, where u is set such that (1 − ρ)^u = τmin/τmax. The standard ACO with Max-Min limits checks the pheromone values in every iteration. 2) Only the iteration-best ant may update the pheromone matrix. The following parameter values are used in the tests: number of ants m = 25, maximal number of iterations 2500n (as in the MMAS setting), ρ = 0.1, α = 1, β = 2, τmax = 10, τmin = τmax/(2n). All algorithms are implemented in C++ (g++ version 4.3.2) and have been run on a machine with an Intel E8400 @ 3 GHz processor (only one core used) under SUSE Linux Enterprise Server 11 (Linux version 2.6.27.19-5-default).

4.1   Examining the Impact of Parameters k and L
The influence of parameters k and L is studied. As proposed in Section 3.2, L has been chosen to depend on the size of the problem instance and to vary dynamically with the current decision number: L = c₃(n − v)/(c₄ log₂ n). In the experiments, L is determined indirectly via c = c₃/c₄. One parameter is fixed while the other one is varied. Every test has been repeated 10 times; the test results for the TSP instances ftv170, rbg443, and td1000.20 are shown in Figure 1. For problem instance ftv170 (with 171 cities), it can be seen that k should be set to a low value of about 5–20. For the larger values of k tested, the run time increases continuously. The standard ACO is about 3.6 times slower (1.21 s/1000 iterations) during the first 100,000 iterations (not visualized here).
[Figure 1: six plots of execution time per 1000 iterations (measured after 1000, 10000, and 100000 iterations); panels: ftv170 (c = 0.3, varying k / k = 15, varying c), rbg443 (c = 0.3, varying k / k = 120, varying c), td1000.20 (c = 0.3, varying k / k = 10, varying c)]
Fig. 1. Execution time for different k and c; L = c(n − v)/ log2 n; left: varying k (c = 0.3 is fixed), right: varying c (k is fixed)
Problem instance rbg443 is of medium size with 443 cities. For this instance, standard ACO is about 2.8 times slower (7.95 s/1000 iterations) than Quick-ACO. The value of parameter c should be chosen relatively low, around 0.2 (see Figure 1, rbg443). The runtimes for values k ∈ [50, 160] are similar. The large problem instance td1000.20 with 1001 cities shows most clearly that Quick-ACO runs faster as the number of iterations increases. Parameter c should be set in all these cases to 0.2–0.5. With a good setting (k = 10, c = 0.3), Quick-ACO needs 593 seconds for 100,000 iterations, compared to standard ACO, which is around 10 times slower (5832 seconds). Altogether, the experiments show that in early stages of the optimization (around the first 1,000 iterations) choosing k ≈ √(n log₂ n) can be considered a
Table 1. Comparison of solution quality and average run time between Quick-ACO and standard ACO (parameter c of Quick-ACO is set to 0.3)

                      standard ACO                   Quick-ACO
instance   it.(2500∗) mean      median    time      mean      median    time      k    p      Speedup
p43            43     5622.2    5621      10.61     5620.9    5621      9.60     15   1.000   1.11
ry48p          48     14632     14601     14.70     14589     14570     9.18     15   0.934   1.60
atex5          72     5289.8    5289      43.34     5333.9    5269      35.55    25   0.914   1.22
kro124p       100     37320     37210     134.41    36999     368405    57.42    15   0.989   2.34
td100.1       101     285245    285499    114.49    268037    268637    49.99    10   1.000   2.29
ftv170        171     2844.2    2839      516.28    2838.8    2837      140.89   15   0.611   3.66
rbg443         40     3129.7    3129.5    794.83    2902.5    2903.5    281.04   120  1.000   2.83
td1000.20      40     1529454   1532191   5832.0    1270924   1269270   593.71   10   1.000   9.82
suitable initial value, as the actual runtime would deviate from the lowest runtime measured by at most 10%. With progressing optimization (and convergence of the pheromone matrix), it can be advisable to further reduce k in an adaptive manner. Appropriate values for c were found in the interval [0.2, 0.5].

4.2   Comparison with Standard ACO
The experiment reported in this section has been repeated 30 times, except for the last two large problems (10 repetitions). The statistical comparisons are done with the one-sided Mann-Whitney U test [9] with the null hypothesis that the solution quality of Quick-ACO is at least as good as that of the standard ACO. The significance level is α = 0.025. Table 1 shows the solution qualities and the corresponding p-values for the different instances after 2500n iterations. Recall that there is only one functional difference between the two algorithms: standard ACO with Max-Min limits ensures that the τ values are never smaller than τmin, whereas Quick-ACO checks this only every u iterations. The results of the test show that the solutions provided by Quick-ACO have at least the same quality as those of standard ACO. Table 1 also shows that Quick-ACO attains high speedup values (even without extensive parameter tuning for k and c). Choosing tuned and adaptive parameters might further increase the speedup by ca. 10% (see Figure 1). The runtime of Quick-ACO for the small instances (e.g., p43 and atex5) is only slightly smaller (by less than 30%) than that of the standard ACO. Thus, Quick-ACO benefits from an increased problem size. For the larger problem instance ftv170, Quick-ACO reaches a speedup of more than 3.6, and for td1000.20, the speedup is almost 10.
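For illustration, the U statistic underlying this test can be computed in a few lines of Python (a sketch; the p-value computation of [9] is omitted, and the variable names are illustrative):

```python
def mann_whitney_u(x, y):
    """U statistic counting, over all pairs, how often x_i < y_j
    (ties count 1/2).  A pure-Python sketch of the statistic behind
    the one-sided comparisons reported in Table 1."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Example: hypothetical tour lengths (smaller is better) of Quick-ACO vs. standard ACO.
quick = [100, 101, 99]
standard = [103, 102, 104]
print(mann_whitney_u(quick, standard))  # 9.0 -- every Quick-ACO run beat every standard run
```

A U value near the maximum |x|·|y| indicates that the first sample is stochastically smaller; in practice a library routine would convert U into the p-values listed in Table 1.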
5   Conclusion
This paper has introduced the Ant Colony Optimization (ACO) algorithm Quick-ACO. For many typical optimization problems (e.g., the TSP), an ant in Quick-ACO needs on average O(n log₂² n) time to construct a solution. Only in the worst case does it need Θ(n²) time, which is the same as in standard ACO. Furthermore, compared to standard ACO, the pheromone update of Quick-ACO eliminates evaporation and reduces the pheromone update time from O(n²) to O(n). Functionally,
Quick-ACO is equivalent to standard ACO. Thus, Quick-ACO can be considered a faster implementation of a standard ACO algorithm. The only restriction is that Quick-ACO requires optimization problems for which the heuristic information remains static while constructing a solution. For all TSP test instances, Quick-ACO has achieved a considerable speedup (by almost a factor of 10 for a TSP instance with 1001 cities) without any loss in solution quality. For further investigation and improvement of Quick-ACO, the following issues are worth considering. The calculation of the prefix sum matrix takes a large portion of the execution time, and there might be potential for further improvement. For the theoretical analysis of Quick-ACO on the TSP, it was assumed that the probability of choosing the next destination city from the set of so far unvisited cities is p = (n − v)/n. However, typically this probability will be larger, which means that a more detailed analysis might assess the algorithm's complexity even more precisely. So far, the ant decision of Quick-ACO uses binary search for a validity check, which costs O(log₂ n) time. It might be possible to reduce this by using more complex data structures.
References

1. Blum, C.: Ant colony optimization: Introduction and recent trends. Physics of Life Reviews 2, 353–373 (2005)
2. Cirasella, J., Johnson, D.S., McGeoch, L.A., Zhang, W.: The asymmetric traveling salesman problem: Algorithms, instance generators, and tests. In: Buchsbaum, A.L., Snoeyink, J. (eds.) ALENEX 2001. LNCS, vol. 2153, pp. 32–59. Springer, Heidelberg (2001)
3. Deneubourg, J.L., Aron, S., Goss, S., Pasteels, J.M.: The self-organizing exploratory pattern of the Argentine ant. Journal of Insect Behavior 3, 159–168 (1990)
4. Deneubourg, J.L., Pasteels, J.M., Verhaege, J.C.: Probabilistic behaviour in ants: a strategy of errors? Journal of Theoretical Biology 105, 259–271 (1983)
5. Dorigo, M.: Optimization, Learning and Natural Algorithms. Ph.D. thesis, Dipartimento di Elettronica, Politecnico di Milano (1992)
6. Dorigo, M., Stützle, T.: Ant Colony Optimization. The MIT Press, Cambridge (2004)
7. Goss, S., Aron, S., Deneubourg, J.L., Pasteels, J.M.: Self-organized shortcuts in the Argentine ant. Naturwissenschaften 76, 579–581 (1989)
8. Guntsch, M., Middendorf, M.: A population based approach for ACO. In: Cagnoni, S., Gottlieb, J., Hart, E., Middendorf, M., Raidl, G.R. (eds.) EvoWorkshops 2002. LNCS, vol. 2279, p. 72. Springer, Heidelberg (2002)
9. Mann, H.B., Whitney, D.R.: On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics (1947)
10. Reinelt, G.: TSPLIB—A traveling salesman problem library. INFORMS Journal on Computing 3(4), 376–384 (1991)
11. Scheuermann, B., Middendorf, M.: Counter-based ant colony optimization as a hardware-oriented meta-heuristic. In: Rothlauf, F., Branke, J., Cagnoni, S., Corne, D.W., Drechsler, R., Jin, Y., Machado, P., Marchiori, E., Romero, J., Smith, G.D., Squillero, G. (eds.) EvoWorkshops 2005. LNCS, vol. 3449, pp. 235–244. Springer, Heidelberg (2005)
12. Stützle, T., Hoos, H.H.: MAX-MIN Ant System. Future Gener. Comput. Syst. 16(9), 889–914 (2000)
Two Iterative Metaheuristic Approaches to Dynamic Memory Allocation for Embedded Systems

María Soto, André Rossi, and Marc Sevaux

Université de Bretagne-Sud, Lab-STICC, CNRS, Centre de recherche, B.P. 92116, F-56321 Lorient Cedex, France
[email protected]
Abstract. Electronic embedded system designers aim at finding a trade-off between cost and power consumption. As cache memory management has been shown to have a significant impact on power consumption, this paper addresses dynamic memory allocation for embedded systems with a special emphasis on time performance. In this work, time is split into time intervals, in which the application to be implemented by the embedded system requires access to data structures. The proposed iterative metaheuristics aim at determining which data structure should be stored in cache memory at each time interval in order to minimize reallocation and conflict costs. These approaches take advantage of metaheuristics previously designed for a static memory allocation problem. Keywords: Memory allocation, Electronics, Metaheuristics.
1   Introduction
Advances in nanotechnology have made possible the design of miniaturized electronic chips, which have drastically extended the features supported by embedded systems. Smart phones that can surf the Web and process HF images are a typical example. While technology offers more and more opportunities, the design of embedded systems becomes more and more complex. In addition to market pressure, this context has favored the development of Computer Assisted Design (CAD) software, which brings a deep change in the designers' line of work. CAD tools such as Gaut [1] can generate the architecture of a circuit from its specifications, but the designs produced by CAD software usually lack optimization, which results in high power consumption, and this is of course a major drawback. Thus, designers want to find a trade-off between architecture cost (i.e., the number of memory banks in the embedded system) and its power consumption [2]. To some extent, electronics practitioners consider that minimizing power consumption is equivalent to minimizing the running time of the application to be implemented by the embedded system [3]. Moreover, the power consumption of a given application can be estimated using an empiric model as in [4], and

P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 250–261, 2011.
© Springer-Verlag Berlin Heidelberg 2011
Metaheuristics for Dynamic Memory Allocation
251
parallelization of data access is viewed as the main action point for minimizing execution time, hence power consumption. We study dynamic memory allocation in embedded processors, as this issue has a significant impact on execution time and on power consumption, as shown by Wuytack et al. in [5]. Recent software and hardware techniques [6,7] exist to tackle this kind of problem. In this paper, we address this problem from the point of view of operations research. The paper is organized as follows. Section 2 provides a more detailed presentation of the problem, and Section 3 gives an integer linear programming formulation. Two iterative metaheuristics are then proposed for addressing larger problem instances in Section 4. Computational results are shown and discussed in Section 5.
2   Modeling the Problem
The main objective in embedded systems is very often to implement signal processing applications efficiently (e.g., MPEG decoding, digital filters, FFT). The application to be implemented is assumed to be given as a C source code whose data structures (i.e., variables, arrays, structures) have to be loaded into the cache memory of the processor that executes it. Time is split into T time intervals whose durations may differ; those durations are assumed to be given along with the application. During each time interval, the application requires access to a given subset of its data structures for reading and/or writing. Unlike alternative problem versions such as [8], where a static data structure allocation is searched for, the problem addressed in this paper is to find a dynamic memory allocation, i.e., the memory allocation of a data structure may vary over time. Roughly speaking, one wants the right data structure to be present in cache memory at the right time, while minimizing the effort for updating the memory mapping at each time interval. The chosen memory architecture is similar to that of a TI C6201 device, which is composed of m memory banks (i.e., cache memory) whose capacities are cj kilo octets (ko) for all j ∈ {1, . . . , m}, and an external memory (i.e., RAM) whose capacity is supposed to be large enough to be considered unlimited. The external memory is referred to as memory bank m + 1. The processor requires access to data structures in order to execute the operations (or instructions) of the application. The data structure access time is expressed in milliseconds and depends on its current allocation. If the data structure is allocated to a memory bank, its access time is equal to its size in ko, because the transfer rate from a memory bank to the processor is one ko per millisecond. If it is allocated to the external memory, its access time is p ms per ko. Initially (i.e.,
during time interval I0 ), all data structures are in the external memory and memory banks are empty. The time required for moving a data structure form the external memory to a memory bank (and vice-versa) is v ms/ko. The time required for moving a data structure from a memory bank to
M. Soto, A. Rossi, and M. Sevaux
another is l ms/ko. The memory management system is equipped with a DMA (Direct Memory Access) controller that allows for direct access to data structures; the time performance of that controller is captured by the numerical values of v and l. Moreover, the time required for moving a data structure in memory is assumed to be less than its access cost: v < p. The TI C6201 device can access all its memory banks simultaneously, which allows for parallel data loading. Thus, two data structures (or variables) a and b can be loaded in parallel when the operation a + b is to be executed by the processor, provided that a and b are allocated to two different memory banks. If these variables share the same memory bank, the processor has to access them sequentially, which requires twice as much time if a and b have the same size. Two data structures are said to be conflicting whenever they are involved in the same operation of the application. Each conflict has a cost, equal to the number of times a and b are involved in the same operation during the current time interval. This cost may be non-integer if the application source code has been analyzed by code-profiling software [9,10] based on a stochastic analysis of the branching probability of conditional instructions; this happens when an operation is executed within a while loop or after a conditional instruction like if or else if. A conflict between two data structures is said to be closed if the two data structures are allocated to two different memory banks; in any other case, the conflict is said to be open. A data structure can be conflicting with itself: this typically happens when the data structure is an array and the application performs an operation like a[i] = a[i+1]. However, data structures cannot be split and spread over different memory banks.
This problem is denoted by Dy-MemExplorer: it is to allocate a memory bank or the external memory to every data structure of the application for each time interval, so as to minimize the time spent accessing and moving data structures while satisfying the memory banks' capacities. In this paper, the application to be implemented and its data structures are assumed to be given. In practice, software like SoftExplorer [11] can be used for collecting the data, but code profiling is out of the scope of this work.
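As an aside (not part of the original paper), the cost model above can be made concrete with a short sketch that evaluates a candidate dynamic allocation, mirroring the executing cost and moving cost that the formulation of Section 3 minimizes. All names and instance data are hypothetical.

```python
# Illustrative sketch only (not from the paper): evaluating the total cost of a
# candidate dynamic allocation under the model described above. All names and
# instance data are hypothetical.

EXT = "ext"  # the external memory, i.e. bank m+1

def total_cost(alloc, access, conflicts, size, p, l, v):
    """alloc[t][i]: bank holding data structure i during interval t (EXT = external);
    access[t][i]: number of accesses to i during interval t;
    conflicts[t]: list of ((i1, i2), d) pairs; size[i] in ko; p, l, v as in the text."""
    cost = 0.0
    for t in range(1, len(alloc)):
        # access cost: 1 ms per access unit from a bank, p from the external memory
        for i, e in access[t].items():
            cost += e * (p if alloc[t][i] == EXT else 1)
        # a closed conflict (both structures in different banks) earns back its cost
        for (i1, i2), d in conflicts[t]:
            if alloc[t][i1] != alloc[t][i2]:
                cost -= d
        # moving cost: l ms/ko bank-to-bank, v ms/ko to or from the external memory
        for i, bank in alloc[t].items():
            prev = alloc[t - 1][i]
            if prev != bank:
                cost += size[i] * (l if EXT not in (prev, bank) else v)
    return cost
```

Such an evaluator is what a metaheuristic needs to compare neighboring allocations without invoking an ILP solver.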
3 ILP Formulation for the Dy-MemExplorer Problem
The Dy-MemExplorer problem is intrinsically linear, and in this section we present its integer linear programming formulation. Let n be the number of data structures in the application. The size of data structure i is denoted by si, for all i ∈ {1, . . . , n}, and nt is the number of data structures that the application has to access during time interval It, for all t ∈ {1, . . . , T}. At ⊂ {1, . . . , n} denotes the set of data structures required during time interval It, and ei,t denotes the number of times that data structure i ∈ At is accessed during It. The number of conflicts in It is denoted by ot, and dk,t is the cost of conflict k = (k1, k2) during time interval It, for all k ∈ {1, . . . , ot}, k1, k2 ∈ At, and t ∈ {1, . . . , T}.
Metaheuristics for Dynamic Memory Allocation
The allocation of data structures to the memory banks (and to the external memory) for each time interval is modeled as follows. For all (i, j, t) ∈ {1, . . . , n} × {1, . . . , m + 1} × {1, . . . , T}, xi,j,t is set to one if and only if data structure i is allocated to memory bank j during time interval It, and xi,j,t = 0 otherwise. The statuses of conflicts are represented as follows: for all k ∈ {1, . . . , ot} and t ∈ {1, . . . , T}, yk,t is set to one if and only if conflict k is closed during time interval It, and yk,t = 0 otherwise. The allocation change of a data structure is represented with the two following sets of variables. For all i ∈ {1, . . . , n} and t ∈ {1, . . . , T}, wi,t is set to one if and only if data structure i has been moved from a memory bank j ≠ m + 1 at It−1 to a different memory bank j′ ≠ m + 1 during time interval It. For all i ∈ {1, . . . , n} and t ∈ {1, . . . , T}, w′i,t is set to one if and only if data structure i has been moved from a memory bank j ≠ m + 1 at It−1 to the external memory, or from the external memory at It−1 to a memory bank during time interval It. The cost of executing the operations of the application can be written as follows:

Σ_{t=1}^{T} ( Σ_{i∈At} Σ_{j=1}^{m} ei,t · xi,j,t + p · Σ_{i∈At} ei,t · xi,m+1,t − Σ_{k=1}^{ot} yk,t · dk,t )    (1)
The first term in (1) is the access cost of all the data structures that are in a memory bank, the second term is the access cost of all the data structures allocated to the external memory, and the last one accounts for the closed conflict costs. The cost of moving data structures between the intervals can be written as:

Σ_{t=1}^{T} Σ_{i=1}^{nt} si · (l · wi,t + v · w′i,t)    (2)
The cost of a solution is the sum of these two costs. Since Σ_{i∈At} Σ_{j=1}^{m+1} ei,t · xi,j,t = Σ_{i∈At} ei,t is a constant for all t ∈ {1, . . . , T}, the cost function to minimize is equivalent to:

f = Σ_{t=1}^{T} ( (p − 1) · Σ_{i∈At} ei,t · xi,m+1,t − Σ_{k=1}^{ot} yk,t · dk,t + Σ_{i∈At} si · (l · wi,t + v · w′i,t) )    (3)
The ILP formulation of Dy-MemExplorer is then:

Minimize f    (4)

subject to:

Σ_{j=1}^{m+1} xi,j,t = 1    ∀i ∈ {1, . . . , n}, ∀t ∈ {1, . . . , T}    (5)

Σ_{i=1}^{n} si · xi,j,t ≤ cj    ∀j ∈ {1, . . . , m}, ∀t ∈ {1, . . . , T}    (6)

xk1,j,t + xk2,j,t ≤ 2 − yk,t    ∀k = (k1, k2) ∈ {1, . . . , ot}, k1, k2 ∈ At, ∀j ∈ {1, . . . , m + 1}, ∀t ∈ {1, . . . , T}    (7)

xi,j,t−1 + xi,g,t ≤ 1 + wi,t    ∀i ∈ {1, . . . , n}, ∀j ≠ g, (j, g) ∈ {1, . . . , m}², ∀t ∈ {1, . . . , T}    (8)

xi,m+1,t−1 + xi,j,t ≤ 1 + w′i,t    ∀i ∈ {1, . . . , n}, ∀j ∈ {1, . . . , m}, ∀t ∈ {1, . . . , T}    (9)

xi,j,t−1 + xi,m+1,t ≤ 1 + w′i,t    ∀i ∈ {1, . . . , n}, ∀j ∈ {1, . . . , m}, ∀t ∈ {1, . . . , T}    (10)

xi,j,0 = 0    ∀i ∈ {1, . . . , n}, ∀j ∈ {1, . . . , m}    (11)

xi,m+1,0 = 1    ∀i ∈ {1, . . . , n}    (12)

xi,j,t ∈ {0, 1}    ∀i ∈ {1, . . . , n}, ∀j ∈ {1, . . . , m}, ∀t ∈ {1, . . . , T}    (13)

wi,t ∈ {0, 1}    ∀i ∈ {1, . . . , n}, ∀t ∈ {1, . . . , T}    (14)

w′i,t ∈ {0, 1}    ∀i ∈ {1, . . . , n}, ∀t ∈ {1, . . . , T}    (15)

yk,t ∈ {0, 1}    ∀k ∈ {1, . . . , ot}, ∀t ∈ {1, . . . , T}    (16)
Constraint (5) enforces that any data structure is allocated either to a memory bank or to the external memory. Constraint (6) states that the total size of the data structures allocated to any memory bank must not exceed its capacity. For all conflicts k = (k1, k2), constraint (7) ensures that variable yk,t is set appropriately, and constraints (8) to (10) do the same for variables wi,t and w′i,t. The fact that all the data structures are initially in the external memory is enforced by (11) and (12). Finally, binary requirements are enforced by (13)–(16). This ILP formulation has been integrated in SoftExplorer. It can be solved for modest-size instances using an ILP solver like Xpress-MP [12]. A simplified static version of this problem arises when the application time is not split into intervals and the memory banks are not subject to capacity constraints: in that case the external memory is no longer used, and the sizes as well as the access costs of data structures can be ignored. This simplified static version is similar to the k-weighted graph coloring problem (see [13]), in which vertices represent data structures, edges represent conflicts between pairs of data structures, and colors model the allocation of memory banks to data structures. That problem is well known to be NP-hard, and so is Dy-MemExplorer.

3.1 Example
For the sake of illustration, Dy-MemExplorer is solved on an instance originating in the LMS (Least Mean Square) dual-channel filter [14], a well-known signal processing algorithm. The algorithm is written in C and is to be implemented on a TI C6201 target. Compiling and profiling the C file yields an instance with eight data structures of the same size (16384 bytes) and 2 memory banks with a capacity of 32768 bytes each. On that target, p = 16 (ms) and l = v = 1 (ms/ko). The conflicts (1;5), (2;6), (3;5) and (4;6), each with a conflict cost of 4095, are involved at intervals 1, 2, 3 and 4 respectively; the cost of the remaining conflicts is 1. The conflicts (1;5) and (1;7) are involved
in interval 5, interval 6 involves the conflicts (2;8) and (2;6), and the last interval involves the conflicts (3;3) and (4;4). Data structure 3 is required by the application during time interval I3. In an optimal solution found by Xpress-MP [12], data structures 1 and 3 are swapped to avoid accessing data structure 3 from the external memory. Similarly, during time interval I4, data structures 4 and 5 are swapped; there is no allocation change during the last time intervals. Data structures 7 and 8 stay in the external memory during the whole application lifetime because their moving cost is larger than their access cost. The cost of this solution is 147537 milliseconds.
4 Iterative Metaheuristic Approaches

4.1 Long-Term Approach
This approach takes into account the application's requirements for the current and future time intervals. The Long-term approach relies on a memory allocation sub-problem called MemExplorer: search for a static memory allocation of the data structures that could remain valid from the current time interval to the last one. In this sub-problem, the fact that the allocation of data structures can change at each time interval is ignored. MemExplorer is addressed for all time intervals It, t ∈ {1, . . . , T}. The data and variables of this sub-problem are the same as for Dy-MemExplorer, but the index t is removed. MemExplorer is then to find a memory allocation of the data structures that minimizes the time spent accessing them, for a given number of capacitated memory banks and an external memory. MemExplorer is addressed in [8] using a Variable Neighborhood Search-based approach hybridized with a Tabu Search-inspired method. This algorithm relies on two neighborhoods: the first one is generated by changing the allocation of a single data structure while respecting the capacity constraints; the second one explores solutions beyond the first neighborhood by allowing infeasible solutions before repairing them. The tabu search for MemExplorer is based on TabuCol, an algorithm for graph coloring introduced in [15]. The main difference with a classic tabu search is that the size of the tabu list is not constant over time, an idea introduced in [16] and also used in the work of Porumbel, Hao and Kuntz on the graph coloring problem [17]. In TabuMemex, the size NT of the tabu list is set to a + NTmax × u every NTmax iterations, where a is a fixed integer and u is a random number in [0, 2]. The Long-term approach builds a solution iteratively, i.e. from time interval I1 to time interval IT.
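As a concrete (and purely illustrative) rendering of this dynamic tabu-list sizing: every NTmax iterations the tenure is reset to a + NTmax × u, with u drawn uniformly from [0, 2]. The parameter values used below are arbitrary placeholders, not taken from the paper.

```python
import random

# Hedged sketch of the dynamic tabu-list sizing described above: every nt_max
# iterations the tenure is reset to a + nt_max * u, with u uniform in [0, 2].
# Parameter values are arbitrary placeholders, not taken from the paper.

def tenure_schedule(a, nt_max, iterations, seed=0):
    rng = random.Random(seed)
    tenure, schedule = a, []
    for it in range(iterations):
        if it % nt_max == 0:  # periodic resize of the tabu list
            tenure = a + int(round(nt_max * rng.uniform(0, 2)))
        schedule.append(tenure)
    return schedule
```

Varying the tenure this way trades intensification (short list) against diversification (long list) without manual tuning, which is the point of the reactive scheme of [16].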
At each time interval, it builds a preliminary solution called the parent solution. The solution for the considered time interval is then derived from it: the solution is initialized to the parent solution, and the data structures that are not required from the current time interval onward are allocated to the external memory.
Algorithm 1. Long-term approach
  Data: for each time interval t ∈ {1, . . . , T}: the set At of data structures involved, the set St of their sizes, the set Kt of conflicts between data structures and the set Dt of conflict costs.
  Result: memory allocations X1, . . . , XT for each time interval, and the total cost C of the application.
  // Initially all data structures are in the external memory
  X0(a) ← m + 1, for all a ∈ ∪_{α=1}^{T} Aα
  P0 ← X0
  for t ← 1 to T do
    // Update the data
    A ← ∪_{α=t}^{T} Aα ; A′ ← ∪_{α=1}^{t} Aα ; E ← ∪_{α=t}^{T} Eα ; S ← ∪_{α=t}^{T} Sα ; S′ ← ∪_{α=1}^{t} Sα ; K ← ∪_{α=t}^{T} Kα ; D ← ∪_{α=t}^{T} Dα
    // Solve the MemExplorer problem with the current data
    Mt ← MemExplorer(A, E, S, K, D)
    // Compute the total costs: executing cost plus converting cost
    CMt ← Access_Cost(Mt, A, E, K, D) + Change_Cost(Xt−1, Mt, A′, S′)
    CPt−1 ← Access_Cost(Pt−1, A, E, K, D) + Change_Cost(Xt−1, Pt−1, A′, S′)
    // Choose the parent solution
    if CMt < CPt−1 then Pt ← Mt else Pt ← Pt−1 end
    // Build the solution at time interval t
    Xt ← Pt
    for all a ∉ A do Xt(a) ← m + 1 end
    // Update the total cost of the application
    C ← C + Access_Cost(Xt, At, Et, Kt, Dt) + Change_Cost(Xt−1, Xt, A′, S′)
  end
At each time interval, the parent solution is selected among two candidate solutions: the parent solution of the previous interval, and the solution to MemExplorer for the current interval. The total cost of each candidate is computed as the sum of two sub-costs. The first sub-cost is the cost that would be incurred if the candidate solution were applied from the current time interval to the last one. The second sub-cost is the cost to be paid for changing the memory mapping from the solution of the previous time interval (which is known) to the candidate solution. The candidate solution with the minimum total cost is then selected as the parent solution. The Long-term approach is presented in Algorithm 1. A memory allocation is denoted by X, where X(a) = j means that data structure a is allocated to memory bank j ∈ {1, . . . , m + 1}. The solution Xt is associated with time interval It for all t in {1, . . . , T}. The solution X0 consists in allocating all the data structures of the application to the external memory.
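The selection rule above reduces to a one-line comparison; the sketch below is hypothetical (names invented here), with total_cost standing for the sum of the two sub-costs just described.

```python
# Hypothetical sketch of the parent-selection rule of the Long-term approach:
# the fresh MemExplorer solution replaces the previous parent only when its
# total (access + change) cost is strictly lower.

def select_parent(prev_parent, candidate, total_cost):
    """total_cost maps a solution to its access cost plus the cost of changing
    the previous interval's memory mapping into that solution."""
    return candidate if total_cost(candidate) < total_cost(prev_parent) else prev_parent
```

Using a strict inequality means ties keep the previous parent, which avoids paying a remapping cost for no gain.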
The parent solution for time interval It is denoted by Pt. The algorithm builds the solution Xt by initializing Xt to Pt, and the data structures that are not required from time interval It onward are moved to the external memory. In the algorithm, Mt is the memory allocation found by solving the instance of MemExplorer built from the data of time interval It; a new instance of MemExplorer is thus solved at each iteration. Algorithm 1 uses two functions to compute the total cost CX of a solution X. The first sub-cost is computed by the function Access_Cost(X, . . .), which returns the cost produced by a memory allocation X for a specified instance (data) of MemExplorer. The second sub-cost is computed by the function Change_Cost(X1, X2), which computes the cost of changing solution X1 into solution X2. At each time interval It, the parent solution Pt is chosen between the two candidates Pt−1 and Mt as the one with the minimum total cost (comparing CPt−1 and CMt). At each iteration, Algorithm 1 updates the data and applies the same process to generate the time interval solution Xt, for all t in {1, . . . , T}.

4.2 Short-Term Approach
This approach relies on a memory allocation sub-problem called MemExplorer-Prime. Given an initial memory allocation, this sub-problem is to search for a memory allocation of the data structures that should be valid from the current time interval, taking into account the cost of changing the solution of the previous time interval.

Algorithm 2. Short-term approach
  Data: same as in Algorithm 1.
  Result: same as in Algorithm 1.
  // Initially all data structures are in the external memory
  X0(a) ← m + 1, for all a ∈ ∪_{α=1}^{T} Aα
  for t ← 1 to T do
    // Solve the MemExplorer-Prime problem with the current data
    Xt ← MemExplorer-Prime(Xt−1, At, Et, St, Kt, Dt)
  end
MemExplorer-Prime is addressed for all time intervals. The data of this sub-problem are the same as for MemExplorer. MemExplorer-Prime is stated as follows: given an initial memory allocation of the data structures, a number of capacitated memory banks and an external memory, find a memory allocation that minimizes the time spent accessing the data plus the cost of changing their allocation. In this paper, MemExplorer-Prime is addressed using a Tabu Search method similar to the one used by the Long-term approach. The Short-term approach iteratively builds a solution for the time intervals. Each solution is computed by taking into account the conflicts and data structures involved in the current time interval, and also the allocation of the previous time interval. The Short-term approach solves MemExplorer-Prime
using the allocation of the previous interval as the initial allocation. Algorithm 2 presents this approach. A solution X is defined as above, and the approach uses a function MemExplorer-Prime(X0, . . .) that solves an instance of the MemExplorer-Prime problem with initial solution X0. At each iteration, the algorithm updates the data, and the solution produced by MemExplorer-Prime(X0, . . .) is taken as the time interval solution.
5 Computational Results
These approaches have been implemented in C++, compiled with gcc 4.11 under Linux 10.04, and tested over a set of instances on an Intel Pentium IV system at 3 GHz with 1 GB of RAM. The first eighteen instances of Table 1 are real-life instances that come from electronic design problems addressed in the Lab-STICC laboratory. The remaining ones originate from DIMACS [18]; they have been enriched by generating random edge costs to represent conflicts, random access costs and sizes for the data structures, and a number of memory banks with random capacities, and by dividing the conflicts and data structures into different time intervals. Moreover, p = 16 (ms) and l = v = 1 (ms/ko) for all instances. Although the real-life instances available today are relatively small, instances will grow larger in the future as market pressure and technology tend to integrate more and more complex functionalities into embedded systems. Thus, we tested our approaches both on current instances and on larger (but artificial) ones, so as to assess their practical use for forthcoming needs. Table 1 compares the performance of the approaches with the ILP formulation solved by Xpress-MP, which is used as a heuristic when the time limit of one hour is reached: the best solution found so far is then returned by the solver. Instances are sorted by non-decreasing size (i.e. by number of conflicts and data structures). The first two columns of Table 1 show the main features of the instances: name, number of data structures, conflicts, memory banks and time intervals. The next two columns present the cost and CPU time of the Short-term approach. For the Long-term approach, we report the best cost and its CPU time over twelve runs, the standard deviation, and the ratio between the standard deviation and the average cost. The following two columns report the cost and CPU time of the ILP approach.
The column “gap” reports the gap between the Long-term approach and the ILP, and the last column indicates whether or not the solution returned by Xpress-MP is proven optimal. The optimal solution is known only for the smallest instances, and memory issues prevented Xpress-MP from addressing the nine largest instances. Bold figures in the table are the best known solutions reported by each method. When the optimal solution is known, only three instances resist the Long-term approach, with a gap of at most 3%. Over the 17 instances solved by Xpress without a guarantee of optimality, the ILP method finds 6 best solutions whereas the Long-term approach improves 11 solutions, sometimes by up to 48%.
Table 1. Cost and CPU time for all the approaches proposed for Dy-MemExplorer. For each of the 44 instances, the table lists the instance name, n\o\m and T; the cost and CPU time (s) of the Short-term approach; the best cost, CPU time (s), standard deviation and deviation-to-average-cost ratio of the Long-term approach over twelve runs; the cost and CPU time (s) of the ILP approach; the gap between the Long-term approach and the ILP; and whether the ILP solution is proven optimal. The last three lines report, for each approach, the number of optimal solutions, the number of best solutions, and the average CPU time and gap.
The practical difficulty of an instance is related to its size (n, o), but size is not the only factor: the ratio between the total capacity of the memory banks and the total size of the data structures also matters. For example, instances mug88 1dy and mug88 25dy have the same size, yet the performance of the Xpress solver on the ILP formulation differs between them. The last three lines of the table summarize the results. The Short-term approach finds 4 optimal solutions and the Long-term approach finds 14 out of the
18 known optimal solutions. The Long-term approach gives the largest number of best solutions, with an average improvement of 6% over the ILP method. In most cases, the proposed metaheuristic approaches are significantly faster than Xpress-MP, the Short-term approach being the fastest one. The Short-term approach is useful when the cost of reallocating data structures is small compared to the conflict costs: in such a case, it makes sense to focus on minimizing the cost of the current time interval without taking future needs into account, as the most important term in the total cost is due to open conflicts. The Long-term approach is useful in the opposite situation (i.e. moving data structures is costly compared to conflict costs): in that case, anticipating future needs makes sense, as the solution is expected to undergo very few modifications over time. Table 1 shows that, for the architecture used and the considered instances, the Long-term approach returns solutions of higher quality than the Short-term approach (except for r125.1cdy), and thus emerges as the best method for today's electronic applications as well as for future needs.
6 Conclusion
This paper presents an exact approach and two iterative metaheuristics based on a static sub-problem for addressing memory allocation in embedded systems. Numerical results show that the Long-term approach returns good results in a reasonable amount of time, which makes it appropriate for today's and tomorrow's needs. However, the Long-term approach is outperformed by the Short-term approach on some instances, which suggests that accounting for future requirements by aggregating the data structures and conflicts of the forthcoming time intervals might not always be relevant. Indeed, the main drawback of this approach is that it ignores the possibility of updating the solution at each iteration. Consequently, future work should concentrate on a Mid-term approach, in which future requirements are weighted less and less as they are farther away from the current time interval. A second idea would be to design a global approach that builds a solution for all time intervals at once.
References

1. Coussy, P., Casseau, E., Bomel, P., Baganne, A., Martin, E.: A formal method for hardware IP design and integration under I/O and timing constraints. ACM Transactions on Embedded Computing Systems 5(1), 29–53 (2006)
2. Atienza, D., Mamagkakis, S., Poletti, F., Mendias, J., Catthoor, F., Benini, L., Soudris, D.: Efficient system-level prototyping of power-aware dynamic memory managers for embedded systems. Integration, the VLSI Journal 39(2), 113–130 (2006)
3. Chimienti, A., Fanucci, L., Locatelli, R., Saponara, S.: VLSI architecture for a low-power video codec system. Microelectronics Journal 33(5), 417–427 (2002)
4. Julien, N., Laurent, J., Senn, E., Martin, E.: Power consumption modeling and characterization of the TI C6201. IEEE Micro 23(5), 40–49 (2003)
5. Wuytack, S., Catthoor, F., Nachtergaele, L., De Man, H.: Power exploration for data dominated video applications. In: Proc. IEEE International Symposium on Low Power Electronics and Design, Monterey, CA, USA, pp. 359–364 (1996)
6. Cho, D., Pasricha, S., Issenin, I., Dutt, N.D., Ahn, M., Paek, Y.: Adaptive scratch pad memory management for dynamic behavior of multimedia applications. IEEE Trans. Comp.-Aided Des. Integ. Cir. Sys. 28, 554–567 (2009)
7. Ozturk, O., Kandemir, M., Irwin, M.J.: Using data compression for increasing memory system utilization. IEEE Trans. Comp.-Aided Des. Integ. Cir. Sys. 28, 901–914 (2009)
8. Soto, M., Rossi, A., Sevaux, M.: Exact and metaheuristic approaches for a memory allocation problem. In: Proc. EU/MEeting, Workshop on the Metaheuristics Community, Lorient, France, pp. 25–29 (2010)
9. Iverson, M., Ozguner, F., Potter, L.: Statistical prediction of task execution times through analytic benchmarking for scheduling in a heterogeneous environment. IEEE Transactions on Computers 48(12), 1374–1379 (1999)
10. Lee, W., Chang, M.: A study of dynamic memory management in C++ programs. Computer Languages, Systems and Structures 28(3), 237–272 (2002)
11. SoftExplorer (2006), [Online] http://www.softexplorer.fr/
12. Xpress-MP, FICO (2009), [Online] http://www.dashoptimization.com/
13. Carlson, R., Nemhauser, G.: Scheduling to minimize interaction cost. Operations Research 14, 52–58 (1966)
14. Besbes, S.F.J.H.: A solution to reduce noise enhancement in pre-whitened LMS-type algorithms: the double direction adaptation. In: Proc. Control, Communications and Signal Processing, pp. 717–720 (2004)
15. Hertz, A., de Werra, D.: Using tabu search techniques for graph coloring. Computing 39(4), 345–351 (1987)
16. Battiti, R.: The reactive tabu search. ORSA Journal on Computing 6(2), 126–140 (1994)
17. Porumbel, D., Hao, J.-K., Kuntz, P.: Diversity control and multi-parent recombination for evolutionary graph coloring algorithms. In: Cotta, C., Cowling, P. (eds.) EvoCOP 2009. LNCS, vol. 5482, pp. 121–132. Springer, Heidelberg (2009)
18. Porumbel, D.: DIMACS graphs: Benchmark instances and best upper bound (2009), [Online] http://www.info.univ-angers.fr/pub/porumbel/graphs/
Author Index

Abreu, Salvador 96
Birattari, Mauro 203
Caniou, Yves 96
Cheng, Wei 238
Codognet, Philippe 96
Crainic, Teodor Gabriel 179
Craven, Matthew J. 26
Della Croce, Federico 38
Dhaenens, Clarisse 191, 226
Diaz, Daniel 96
di Tollo, Giacomo 130
Eremeev, Anton V. 215
Figueira, José R. 48
Francesca, Gianpiero 203
Gardi, Frédéric 167
Glover, Fred 72
Grosso, Andrea 38
Hao, Jin-Kao 72
Herbawi, Wesam 84
Hinne, Max 60
Jat, Sadaf Naseem 1
Jiang, He 118
Jimbo, Henri C. 26
Jourdan, Laetitia 191, 226
Kattan, Ahmed 142
Lardeux, Frédéric 130
Li, Jinlong 108
Liefooghe, Arnaud 48, 191, 226
Lu, Guanzhou 108
Lü, Zhipeng 72
Luong, Thé Van 155
Mancini, Simona 179
Marchiori, Elena 60
Marmion, Marie-Eléonore 191
Maturana, Jorge 130
Melab, Nouredine 155
Middendorf, Martin 238
Moraglio, Alberto 142
Nouioua, Karim 167
Paquete, Luís 48
Pellegrini, Paola 203
Perboli, Guido 179
Rossi, André 250
Salassa, Fabio 38
Saubion, Frédéric 130
Scheuermann, Bernd 238
Sels, Veronique 14
Sevaux, Marc 250
Simões, Marco 48
Soto, María 250
Stützle, Thomas 203
Tadei, Roberto 179
Talbi, El-Ghazali 155
Vanhoucke, Mario 14
Verel, Sébastien 191, 226
Wang, Yang 72
Weber, Michael 84
Wu, Youxi 118
Xuan, Jifeng 118
Yang, Shengxiang 1
Yao, Xin 108
Zhang, Shuyan 118