Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
5975
Pierre Collet Nicolas Monmarché Pierrick Legrand Marc Schoenauer Evelyne Lutton (Eds.)
Artificial Evolution
9th International Conference, Evolution Artificielle, EA 2009
Strasbourg, France, October 26-28, 2009
Revised Selected Papers
Volume Editors

Pierre Collet, Université de Strasbourg, France
E-mail: [email protected]

Nicolas Monmarché, Ecole Polytechnique de l'Université de Tours, France
E-mail: [email protected]

Pierrick Legrand, Université de Bordeaux 2, France
E-mail: [email protected]

Marc Schoenauer, INRIA Futurs, Université Paris-Sud, Orsay, France
E-mail: [email protected]

Evelyne Lutton, INRIA Saclay - Ile-de-France, Orsay, France
E-mail: [email protected]
Library of Congress Control Number: 2010929492
CR Subject Classification (1998): I.2, F.1, J.3, H.3, I.5, F.2
LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
ISSN 0302-9743
ISBN-10 3-642-14155-2 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-14155-3 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

springer.com

© Springer-Verlag Berlin Heidelberg 2010
Printed in Germany

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Preface
This LNCS volume contains the best papers presented at the 9th Conference on Artificial Evolution, EA¹ 2009, held in Strasbourg (France). Previous EA events took place in Tours (2007, LNCS 4926), Lille (2005, LNCS 3871), Marseille (2003, LNCS 2936), Le Creusot (2001, LNCS 2310), Dunkerque (1999, LNCS 1829), Nîmes (1997, LNCS 1363), Brest (1995, LNCS 1063) and Toulouse (1994, LNCS 1063). For this ninth edition, authors were invited to present original work relevant to artificial evolution, including, but not limited to: evolutionary computation, evolutionary optimization, co-evolution, artificial life, population dynamics, theory, algorithmics and modeling, implementations, applications of evolutionary paradigms to the real world (industry, biosciences, ...), other biologically inspired paradigms (swarms, artificial ants, artificial immune systems, ...), memetic algorithms, multi-objective optimization, constraint handling, parallel algorithms, dynamic optimization, machine learning, and hybridization with other soft computing techniques. Each submission was reviewed by at least four members of the International Program Committee, which selected 23 of the 43 submitted papers for presentation at the conference. Of these, only 17 papers were included in the present volume, resulting in a 39.5% acceptance rate. We would like to thank the members of the Program Committee for their conscientious work and the authors for their greatly appreciated contributions, as well as the members of the Organizing Committee who, once more, managed to put together a really enjoyable conference in Strasbourg. Finally, financial and material support from ENSPS, Université de Strasbourg, CNRS, Région Alsace and the EA association contributed to the success of the conference and helped to keep registration fees very low.

November 2009
Pierre Collet
Nicolas Monmarché
Pierrick Legrand
Evelyne Lutton
Marc Schoenauer
¹ As for previous editions of the conference, the EA acronym is based on the original French name of the conference: "Évolution Artificielle."
Organization

Évolution Artificielle 2009 – EA 2009
9th International Conference on Artificial Evolution
October 26–28, 2009, Université de Strasbourg, France
Steering Committee
Pierre Collet, Université Louis Pasteur de Strasbourg
Nicolas Monmarché, Université François Rabelais de Tours
Pierrick Legrand, Université Bordeaux 2
Evelyne Lutton, INRIA Saclay Ile-de-France
Marc Schoenauer, INRIA Saclay Ile-de-France
Organizing Committee
General Chair: Pierre Collet, University of Strasbourg
Treasurer: Sébastien Vérel, University of Nice Sophia-Antipolis
Publicity Chair: Laetitia Jourdan, INRIA Futurs Lille
Publication Chair: Pierrick Legrand, University of Bordeaux
Fund Raising: Cedric Wemmert, University of Strasbourg
Submissions Webmaster: Alexandre Blansché, University of Strasbourg
Local Organization: Aline Deruyver, Germain Forestier, Jonathan Weber, and Aurélie Bertaux, University of Strasbourg
International Program Committee
Enrique Alba, Universidad de Málaga, Spain
Anne Auger, INRIA Saclay, France
Sébastien Aupetit, Université de Tours, France
Mehmet Aydin, University of Bedfordshire, UK
Wolfgang Banzhaf, University of Newfoundland, Canada
Hans-Georg Beyer, Vorarlberg University of Applied Sciences, Austria
Peter Bentley, University College London, UK
Alexandre Blansché, Université de Strasbourg, France
Amine Boumaza, Université du Littoral, France
Nicolas Bredeche, Université Paris-Sud XI, France
Larry Bull, UWE Bristol, UK
Edmund Burke, University of Nottingham, UK
Stefano Cagnoni, Università di Parma, Italy
Alexandre Caminada, Université de Technologie de Belfort-Montbéliard, France
Nirupam Chakraborti, Indian Institute of Technology Kharagpur, India
Uday Chakraborty, University of Missouri, USA
Maurice Clerc, Independent Consultant, France
Carlos Coello, Instituto Politécnico Nacional, Mexico
Philippe Collard, Université de Nice - Sophia Antipolis, France
Pierre Collet, Université de Strasbourg, France
David Corne, Heriot Watt University, UK
Ernesto Costa, University of Coimbra, Portugal
Luis Da Costa, Paris XI University, France
Daniel Delahaye, Ecole Nationale Aviation Civile, France
Alexandre Devert, Polytechnic Institute of Ha Noi, Vietnam
Nicolas Durand, Institut de Recherche en Informatique de Toulouse, France
Marc Ebner, Eberhard Karls Universität Tübingen, Germany
Aniko Ekart, Aston University, UK
Christian Gagné, Université Laval, Canada
Mario Giacobini, University of Turin, Italy
Jens Gottlieb, SAP AG, Germany
Frédéric Guinand, Le Havre University, France
Steven Gustafson, GE Global Research, USA
Jin-Kao Hao, Université d'Angers, France
Jano van Hemert, University of Edinburgh, UK
Daniel Howard, Qinetiq, UK
Colin Johnson, University of Kent, UK
Laetitia Jourdan, INRIA Lille, France
Natalio Krasnogor, University of Nottingham, UK
Nicolas Labroche, LIP6 Paris, France
Nicolas Lachiche, Université de Strasbourg, France
Pier Luca Lanzi, Politecnico di Milano, Italy
Claude Lattaud, Université René Descartes, France
Pierrick Legrand, Université de Bordeaux, France
Jean Louchet, INRIA Saclay, France
Simon Lucas, University of Essex, UK
Evelyne Lutton, INRIA Saclay, France
Bob McKay, Seoul National University, South Korea
Julian Miller, University of York, UK
Nicolas Monmarché, Université de Tours, France
Jean-Baptiste Mouret, Université Paris VI, France
Yuichi Nagata, Japan Advanced Institute of Science and Technology, Japan
Miguel Nicolau, INRIA Saclay - Île-de-France, France
Gabriela Ochoa, University of Nottingham, UK
Michael O'Neill, University College Dublin, Ireland
Martin Pelikan, University of Missouri, USA
Jean-Philippe Rennard, École de Management de Grenoble, France
Denis Robilliard, Université du Littoral, France
El-Ghazali Talbi, INRIA Lille, France
Emmanuel Sapin, INRIA Saclay - Île-de-France, France
Marc Schoenauer, INRIA Saclay - Île-de-France, France
Deepak Sharma, Université de Strasbourg, France
Patrick Siarry, Université Paris Val de Marne, France
Moshe Sipper, Ben-Gurion University of the Negev, Israel
Stephen Smith, University of York, UK
Christine Solnon, Université Lyon 1, France
Terence Soule, University of Idaho, USA
Thomas Stuetzle, Université Libre de Bruxelles, Belgium
Hideyuki Takagi, Kyushu University, Japan
Olivier Teytaud, INRIA Saclay, France
Marco Tomassini, Université de Lausanne, Switzerland
Shigeyoshi Tsutsui, Hannan University, Japan
Paulo Urbano, University of Lisbon, Portugal
Gilles Venturini, Université de Tours, France
Sebastien Verel, Université de Nice-Sophia Antipolis, France
Darrell Whitley, Colorado State University, USA
Xin-She Yang, University of Cambridge, UK
Tina Yu, Memorial University of Newfoundland, Canada
Mengjie Zhang, Victoria University of Wellington, New Zealand
Invited Talk
Elementary Landscapes and Why They Matter, Darrell Whitley
Sponsoring Institutions
EA association
ENSPS
CNRS
Région Alsace
Université de Strasbourg
Table of Contents
Theory

Extremal Optimization Dynamics in Neutral Landscapes: The Royal Road Case ..... 1
    I. De Falco, A. Della Cioppa, D. Maisto, U. Scafuri, and E. Tarantino

Improving the Scalability of EA Techniques: A Case Study in Clustering ..... 13
    Stefan R. Bach, A. Şima Uyar, and Jürgen Branke

Ant Colony Optimization

MC-ANT: A Multi-Colony Ant Algorithm ..... 25
    Leonor Melo, Francisco Pereira, and Ernesto Costa

Applications

Artificial Evolution for 3D PET Reconstruction ..... 37
    Franck P. Vidal, Delphine Lazaro-Ponthus, Samuel Legoupil, Jean Louchet, Evelyne Lutton, and Jean-Marie Rocchisani

A Hybrid Genetic Algorithm/Variable Neighborhood Search Approach to Maximizing Residual Bandwidth of Links for Route Planning ..... 49
    Gajaruban Kandavanam, Dmitri Botvich, Sasitharan Balasubramaniam, and Brendan Jennings

Parallelization of an Evolutionary Algorithm on a Platform with Multi-core Processors ..... 61
    Shigeyoshi Tsutsui

On the Difficulty of Inferring Gene Regulatory Networks: A Study of the Fitness Landscape Generated by Relative Squared Error ..... 74
    Francesco Sambo, Marco A. Montes de Oca, Barbara Di Camillo, and Thomas Stützle

Combinatorial Optimization

Memetic Algorithms for Constructing Binary Covering Arrays of Strength Three ..... 86
    Eduardo Rodriguez-Tello and Jose Torres-Jimenez
A Priori Knowledge Integration in Evolutionary Optimization ..... 98
    Paul Pitiot, Thierry Coudert, Laurent Geneste, and Claude Baron

Robotics

On-Line, On-Board Evolution of Robot Controllers ..... 110
    N. Bredeche, E. Haasdijk, and A.E. Eiben

The Transfer of Evolved Artificial Immune System Behaviours between Small and Large Scale Robotic Platforms ..... 122
    Amanda M. Whitbrook, Uwe Aickelin, and Jonathan M. Garibaldi

Multi-objective Optimization

An Analysis of Algorithmic Components for Multiobjective Ant Colony Optimization: A Case Study on the Biobjective TSP ..... 134
    Manuel López-Ibáñez and Thomas Stützle

Alternative Fitness Assignment Methods for Many-Objective Optimization Problems ..... 146
    Mario Garza Fabre, Gregorio Toscano Pulido, and Carlos A. Coello Coello

Genetic Programming

Evolving Efficient List Search Algorithms ..... 158
    Kfir Wolfson and Moshe Sipper

Semantic Similarity Based Crossover in GP: The Case for Real-Valued Function Regression ..... 170
    Nguyen Quang Uy, Michael O'Neill, Nguyen Xuan Hoai, Bob McKay, and Edgar Galván-López

Genetic-Programming Based Prediction of Data Compression Saving ..... 182
    Ahmed Kattan and Riccardo Poli

Machine Learning

On the Characteristics of Sequential Decision Problems and Their Impact on Evolutionary Computation and Reinforcement Learning ..... 194
    André M.S. Barreto, Douglas A. Augusto, and Helio J.C. Barbosa

Author Index ..... 207
Extremal Optimization Dynamics in Neutral Landscapes: The Royal Road Case

I. De Falco (1), A. Della Cioppa (2,*), D. Maisto (1), U. Scafuri (1), and E. Tarantino (1)
(1) Institute of High Performance Computing and Networking, National Research Council of Italy (ICAR-CNR), Via P. Castellino 111, 80131 Naples, Italy
{ivanoe.defalco,domenico.maisto,umberto.scafuri,ernesto.tarantino}@na.icar.cnr.it
(2) Natural Computation Lab, DIIIE, University of Salerno, Via Ponte don Melillo 1, 84084 Fisciano (SA), Italy
Tel.: +39-089-964255; Fax: +39-089-964218
[email protected]
* Corresponding author.
Abstract. In recent years a new view of evolutionary dynamics has emerged, based on both neutrality and the balance between adaptation and exaptation. Differently from the canonical adaptive paradigm, in which genotypic variability is strictly related to change at the fitness level, this paradigm has raised awareness of the importance of both selective neutrality and co-option by exaptation. This paper investigates an innovative method based on Extremal Optimization, a coevolutionary algorithm successfully applied to NP-hard combinatorial problems, with the aim of exploring the ability of its extremal dynamics to face neutral fitness landscapes by exploiting co-option by exaptation. A comparison is carried out between Extremal Optimization and a Random Mutation Hill Climber on several problem instances of a well-known neutral fitness landscape, i.e., the Royal Road.
1 Introduction
In recent years, developments in the theory of evolution have produced new theoretical concepts. The importance of natural selection was first amplified by the discovery of Mendelian genetics and then diminished by the discovery of various non-Mendelian mechanisms of inheritance. In this context, the processes known as adaptation, exaptation, i.e., the co-option of characters not evolved for their current role that increases the survival probability, and selective neutrality have been proposed as possible evolutionary pathways leading to the origin of new functions, and are fundamental in understanding evolution. These processes have suggested a new view of evolutionary dynamics based on both neutrality and the balance between adaptation and exaptation. While in the canonical adaptive paradigm, as in Evolutionary Computation (EC), genotypic variability is strictly related to fitness change, i.e., the large
majority of variations at the genotypic level result in a change at the phenotypic level, this paradigm has raised awareness of the importance of selective neutrality and co-option by exaptation. In EC, landscapes have been studied either from a static point of view, focusing on geometric properties such as smoothness, ruggedness and neutrality, or from a dynamical point of view, focusing on dynamic features, for instance evolving populations that use the landscape as their substrate. In particular, selective neutrality has been deeply studied by using Royal Road fitness landscapes to address questions about the processing and recombination of schemata in Genetic Algorithms (GAs) [1, 2]. Surprisingly, however, the simple hill climbing paradigm based on random mutation easily outperforms the GA on such a class of neutral landscapes [2]. Consequently, a change in the way of looking at evolutionary dynamics is necessary. The question we want to answer is the following: how can neutrality and exaptation be exploited in an artificial evolution paradigm in order to better simulate natural evolution? Since the canonical GA paradigm suffers from hitchhiking [2], self-organized co-evolution can provide the key to better address the problem [3]. In fact, the resulting evolutionary dynamics very naturally exhibits characteristics of Self-Organized Criticality (SOC), such as punctuated equilibria [4]. Following this approach, it is of paramount importance to bear in mind that biological systems with numerous interacting elements can spontaneously organize towards a critical state, i.e., a state which presents long-range correlations in space and time; this is the so-called SOC, which describes a wide range of dynamical processes [4]. Systems exhibiting SOC often consist of a large number of highly interrelated elements, and the collective behavior of the overall system can be statistically analyzed at the macroscopic level. Correspondingly, the system configurations can also be regulated by updating the microscopic states of its elements. In this case, the optimal configuration may naturally emerge from extremal dynamics [5], simply through selection against the "worst" elements. In this perspective, and differently from classical approaches, we investigate Extremal Optimization (EO) [6], a coevolutionary algorithm exhibiting SOC that has been successfully applied to NP-hard combinatorial problems [7], with the aim of exploring extremal dynamics in facing fitness landscapes that allow for the emergence of many neutral mutations, i.e., landscapes with large flat regions, and, at the same time, of exaptive phenomena. Moreover, we compare EO with a Random Mutation Hill Climber (RMHC) on several instances of the well-known Royal Road neutral fitness landscape [1, 2]. In the following, Section 2 outlines the concepts of neutrality and exaptation. Section 3 describes EO, while Section 4 describes the Royal Road and its formalization in terms of EO. Section 5 reports on the test problems used and the results achieved. Finally, Section 6 contains our conclusions.
2 Neutrality and Exaptation
Attributes of living entities are actually a complex blend of adaptation by natural selection and exaptation by neutral genetic drift. It is nowadays becoming more
and more widely held that not all features of an organism are the results of an ideal, straightforward selection process. In fact, not everything is an adaptation: many traits are due to chance events, or to interactions and trade-offs among different selective forces. According to Kimura [8], when one compares the genomes of existing species, most of the molecular differences are selectively neutral. That is, the molecular changes represented by these differences do not influence the fitness of the individual. Indeed, recent results in molecular evolution have confirmed that the majority of mutations have no selective effect [9], i.e., are neutral. Most evolutionary changes are then the result of genetic drift acting on neutral alleles. A new allele arises typically through the spontaneous mutation of a single nucleotide within the sequence of a gene. Such an event immediately contributes a new allele to the population, and this allele is subject to drift. As a consequence, genetic drift has the effect of reducing the evolutionary impact of natural selection, which would otherwise tend to cause the spread of adaptations through populations. The mapping of genotype to phenotype (and hence to fitness) is therefore a many-to-one mapping. This gives rise to the possibility that neutral mutations in genotypes of equal fitness occur without selective pressure. Thus, one might conclude that neutrality and evolvability [10] are contrasting concepts. Recent theories, however, consider neutral mutations as possible sources of evolutionary innovations. In fact, since the resulting fitness landscape is significantly different from the rugged and hilly model, if the frequency of neutral mutations is high enough, a neutral layer of genotypes may result, across which a population may drift until an individual discovers a relatively rare beneficial mutation. Rather than becoming trapped in local optima, populations may be able to escape via a sequence of neutral mutations leading to a more rewarding region of the fitness landscape. Characters that arise through random or neutral evolution, or that have arisen through selection either for a role different from their current one or for a role that is no longer relevant, can also be used, or co-opted, for other roles and can form the basis for new characters. Such characters are said to be exapted for their new role [11]. The concepts of exaptation and non-adaptive traits are central to the understanding of organismal evolution in general, including the evolution of development. Co-option of existing genes for novel functions is one aspect of exaptation that has been dealt with extensively in the evolutionary literature. Gould and Vrba [11] define exaptations as "characters, evolved for other usages (or [having] no function at all), and later co-opted for their current role." As a consequence, adaptation and exaptation are alternative historical explanations for traits [11]. Adaptation, or evolutionary response to selective pressure, is commonly invoked to explain convergence within species [12]. Conversely, exaptation, or the current utility of a previously evolved character, is infrequently, if ever, associated with convergence.
3 Extremal Optimization
In biological systems, highly specialized complex structures often emerge when their most inefficient elements are selectively driven to extinction. Such a view
is based on the principle that evolution progresses by selecting against the few most poorly adapted species, rather than by expressly breeding those species well adapted to the environment. The principle that the least-fit elements are progressively modified has been applied successfully in the Bak-Sneppen model [4], showing the emergence of Self-Organized Criticality (SOC) in ecosystems. According to that model, each component of an ecosystem corresponds to a species with a given fitness value. The evolution is driven by a process in which the least-fit species, together with its closest dependent species, is selected for adaptive changes. As the fitness of one species changes, those of its neighbors are affected. Thus, species coevolve, and the resulting dynamics of this extremal process exhibits the characteristics of SOC, such as punctuated equilibrium [4]. Extremal Optimization was proposed by Boettcher and Percus and draws upon the Bak-Sneppen mechanism, yielding a dynamic optimization procedure free of selection parameters. It represents a successful method for the study of NP-hard combinatorial and physical optimization problems [6, 7] and a competitive alternative to other nature-inspired paradigms such as Simulated Annealing, Evolutionary Algorithms and Swarm Intelligence, typically used for finding high-quality solutions to such NP-hard problems. Differently from the well-known paradigm of Evolutionary Computation (EC), which operates with a population of candidate solutions and assigns a fitness value to the whole set of components of a solution based upon their collective evaluation against a cost function, EO works with one single solution S made of a given number of components s_i, each of which is a variable of the problem and is thought of as a species of the ecosystem. Once a suitable representation is chosen, allowing variables to be assigned an adaptive measure, and by assuming a predetermined interaction among these variables, a fitness value φ_i is assigned to each of them. Then, at each time step the overall fitness Φ of S is computed, and S is evolved, by randomly updating only the worst variable, to a solution S′ belonging to its neighborhood Neigh(S). This latter is the set of all the solutions that can be generated by randomly changing only one variable of S by means of a mutation. However, EO is competitive with respect to other EC techniques only if it can randomly choose among many S′ ∈ Neigh(S). When this is not the case, EO leads to a deterministic process which very easily gets stuck in a local optimum. To avoid this behavior, Boettcher and Percus introduced a probabilistic version of EO based on a parameter τ, called τ-EO. According to it, for a minimization problem, the species are first ranked in increasing order of fitness values, i.e., a permutation π of the variable labels i is found such that:

$$ \phi_{\pi(1)} \leq \phi_{\pi(2)} \leq \ldots \leq \phi_{\pi(n)}, $$

where n is the number of species. The worst species s_j is of rank 1, i.e., j = π(1), while the best one is of rank n. Then, a probability distribution over the ranks k is considered:

$$ p_k \propto k^{-\tau}, \quad 1 \leq k \leq n, $$

for a given value of the parameter τ. Finally, at each update a generic rank k is selected according to p_k, so that the species s_i with i = π(k) randomly changes its state. In such a way, the solution moves to a neighboring one S′ ∈ Neigh(S) unconditionally. Note that only a small number of variables change their fitness, so that only a few connected variables need to be re-evaluated and re-ranked.
There is no parameter to adjust for the selection of better solutions other than this ranking. In fact, it is the memory encapsulated in this ranking that directs τ-EO into the neighborhood of increasingly better solutions. On the other hand, in the choice of a move to S′, no consideration is given to the outcome of such a move, and not even the selected variable s_j itself is guaranteed to improve its fitness. Accordingly, large fluctuations in the cost function can accumulate in a sequence of updates. Merely the bias against extremely "bad" fitness enforces repeated returns to near-optimal solutions. Thus, the only parameters are the maximum number of iterations N_iter and the probabilistic selection value τ. For details about the τ-EO implementation see [7, 6, 5]; a minimal sketch of one update step follows.
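To make the update rule concrete, the following minimal Python sketch (our illustration, not the authors' code) implements one τ-EO step for a generic problem; species_fitness and mutate_variable are hypothetical placeholders for the problem-dependent parts.

```python
import random

def tau_eo_step(solution, species_fitness, mutate_variable, tau):
    """One update of tau-EO: rank species by fitness, pick a rank k with
    probability proportional to k**(-tau), and randomly mutate that species."""
    n = len(solution)
    # Rank variable indices by increasing fitness (rank 1 = worst species).
    ranked = sorted(range(n), key=lambda i: species_fitness(solution, i))
    # Power-law distribution over ranks 1..n.
    weights = [k ** (-tau) for k in range(1, n + 1)]
    k = random.choices(range(1, n + 1), weights=weights)[0]
    # The move is accepted unconditionally, regardless of its outcome.
    mutate_variable(solution, ranked[k - 1])
    return solution
```

Note that for τ → ∞ this reduces to always mutating the worst species, while τ = 0 gives a purely random walk; the intermediate regime is where the extremal dynamics emerges.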
4 Neutrality: The Royal Road
Our aim is to investigate both the search ability and the extremal dynamics of τ-EO at solving problems with a high degree of neutrality. A fitness landscape is neutral if it shows the existence of neighboring configurations with the same fitness, i.e., neutral networks. On such landscapes, movements on a neutral network occur as mutations between neighboring solutions of equal fitness. Neutral fitness landscapes are often used to model metastability, in that evolutionary dynamics on neutral landscapes shows a characteristic evolutionary pattern consisting of long periods of stasis, while the population explores the current neutral layer, punctuated by rapid fitness increases when an individual discovers a transition point to a fitter neutral layer [13]. Within such a class of landscapes, the Royal Road (RR) function [1, 2] is certainly one of the most studied. In it we assign a target string S_t. This latter can be seen as a list of schemata (i.e., partially specified bit strings in which, apart from 0s and 1s, referred to as defined bits, we can have wild cards *, meaning either a 0 or a 1). We denote with σ_i the i-th schema for S_t (i.e., the schema representing the i-th block making up the string), and with o(σ_i) the order of σ_i, i.e., the number of defined bits in this schema. Then, given a binary string S, its fitness is defined as follows:

$$ R(S) = \sum_{\sigma_i \in \Sigma} \delta_i(S)\, o(\sigma_i) \qquad (1) $$
where σ_i is the i-th schema of S_t, δ_i(S) = 1 if S is an instance of σ_i and 0 otherwise, and Σ is the set of all schemata related to S_t. Another description of RR is given in terms of the number of consecutive blocks n_b (sequences of bits) and of the block size b_s (number of bits per block), starting from the first bit in the whole sequence of length L. RR is intended to capture one landscape feature of particular relevance to GAs: the presence of fit low-order building blocks that recombine to produce fitter, higher-order building blocks. According to the above description, RR can be represented as a tree of increasingly higher-order schemata, with schemata of a given order forming higher-order schemata. A minimal implementation of this fitness function is sketched below.
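As an illustration (our sketch, not the authors' code), the following Python function evaluates the multi-level Royal Road fitness of a bit string with the all-ones string as target: blocks of b_s bits form the lowest-order schemata, and pairs of adjacent blocks form ever larger schemata up to the full string, as in Fig. 1. This hierarchical reading is an assumption on our part, but it reproduces the optimum values reported in Table 1 of Section 5.

```python
def royal_road_fitness(bits, bs=8):
    """Hierarchical Royal Road fitness (Eq. 1 with the all-ones target):
    every schema that is completely set to 1 contributes its order, i.e.,
    its size in bits. Schemata sizes are bs, 2*bs, 4*bs, ..., len(bits)."""
    n = len(bits)
    fitness = 0
    size = bs
    while size <= n:
        for start in range(0, n, size):
            if all(bits[start:start + size]):  # schema completed
                fitness += size
        size *= 2
    return fitness

# Consistency check against the first column of Table 1 (Section 5):
# a fully set 64-bit string scores 64 + 64 + 64 + 64 = 256.
assert royal_road_fitness([1] * 64) == 256
```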
It is evident that RR gives rise to the possibility of neutral mutations between genotypes of equal fitness occurring in the absence of selective pressure. In fact, the fitness is incremented by o(σ_i) if and only if a block is completely found as a whole. As a consequence, RR is very difficult to solve, in that evolution should be able to escape from a local optimum by means of a sequence of neutral mutations leading to a fitter region of the landscape. On the other hand, it should be noted here that, from the exaptation point of view, this feature can also be seen as:
- the co-option of previously evolved characters for a new role, in the case of schemata of order higher than 1. In fact, pre-optimized schemata at a given lower level have a specific phenotypic function at that level and can be co-opted for different phenotypic functions in the schemata of the next higher levels in the hierarchical structure [1];
- the co-option of characters having no function at all, in the case of schemata of order equal to 1. In fact, in this case the contribution to the fitness of the 1s belonging to schemata of order 1 is null unless the schema gets completed.
It is then clear that, in this latter case, exaptation is an emergent feature of the RR. Since their introduction, the RR functions were proposed as functions simple to optimize for a GA, but difficult for a hillclimber. However, the Random Mutation Hill Climbing algorithm easily outperformed a GA [14]. Afterwards, Holland [15] presented a revised class of RR functions designed to create insurmountable difficulties for a wider class of hillclimbers, and yet still be amenable to optimization by a GA. On the other hand, S. Forrest and M. Mitchell stated in [2] that, according to some of their experimental results, a simple GA might optimize a RR function quite slowly: "once an instance of a higher-order schema was discovered, its high fitness allowed the schema to spread quickly in the population [. . . ] This slowed down the discovery of schemas in the other positions [. . . ]". G. Ochoa et al. [3], in turn, came to the conclusion that "coevolution limits hitchhiking on RR functions". However, their coevolutionary approach explicitly decomposes the problem into sub-components (lower-order schemata) assigned to separate sub-populations. Obviously, such a method requires knowing in advance the number of schemata involved in order to set the appropriate number of sub-populations.
5 Simulations
To set up our simulation framework, we have considered six instances of the Royal Road with the same block size b_s of 8 bits (schemata of maximum size) and a different number of blocks n_b (schemata), namely functions with a number of blocks equal to 8, 16, 32, 64, 128 and 256. As a consequence, the total length L of the target sequence is equal to 64, 128, 256, 512, 1024 and 2048, respectively. In all cases, the target string is composed of all bits set to 1. As regards τ-EO, each bit of the RR encoding usually represents a species, while blocks represent groups of species populating an environmental niche.
Fig. 1. The relationship among bits and blocks of different order in a RR landscape with a string length of 16 and blocks of length 4. Circles account for the bits, while ellipsoids account for blocks.
From the extremal dynamics point of view, the interaction among species is selectively neutral except in the particular case in which all of them together are selectively sensible. On the other hand, in order to evidence the exaptive phenomenon, the bits of the RR encoding can be considered as single characters, while the blocks are traits. Despite the difference in terminology, the key issue here is represented by the interactions acting among the entities under evolution (species or characters). Thus, the interaction is the same as before, except that it refers to characters. In the following we refer to the bits of the RR function as characters. Figure 1 shows the interactions acting among single bits and also among blocks of different levels. Moreover, this figure highlights that in RR problems the connections among characters are only local, i.e., each bit interacts only with those in its same block. This is very important because it is known [16] that EO and its variants work poorly for highly connected problems such as TSP and protein folding. This feature of RR problems makes our choice of using τ-EO sensible. As a consequence of the above considerations, in our algorithm a generic solution is simply represented by a string with a number of bits equal to the size of the RR instance we are facing. The fitness value Φ(S) of each solution S is computed according to Eq. (1) above. Moreover, the fitness value of the k-th species (bit) s_k in the current solution S of the τ-EO is computed according to:

$$ \phi_k = \sum_{\sigma_i \in \Sigma \,:\, s_k \in \sigma_i} \delta_i(S)\, o(\sigma_i) \qquad (2) $$
where the σ_i are all the schemata completely set to 1 to which the bit s_k belongs, and Φ(S) = Σ_k φ_k; a direct implementation of Eq. (2) is sketched below. Based on this, the global best value is different for each size of the problem. As regards τ-EO, the bit-flip mutation typical of binary GAs has been chosen as the move allowing a neighboring solution to be reached from the current one. As for the parameters, after a preliminary tuning phase, N_iter has been set to 1,000,000 and τ to 3.0. Finally, for each problem 50 τ-EO runs, differing in the seeds for the random number generator, have been performed. All the simulations have been carried out on a Mac XServe with two Quad-Core Intel Xeon 3.2 GHz CPUs and 32 GBytes of main memory.
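A direct (if unoptimized) reading of Eq. (2) in Python, assuming the same hierarchical block structure as the Royal Road sketch given earlier (our illustration, not the authors' code):

```python
def species_fitness(bits, k, bs=8):
    """Fitness of the k-th bit (Eq. 2): sum of the orders o(sigma_i) of all
    completed (all-ones) schemata sigma_i to which bit k belongs."""
    n = len(bits)
    phi_k = 0
    size = bs
    while size <= n:
        start = (k // size) * size          # the unique size-bit block holding bit k
        if all(bits[start:start + size]):   # schema completed
            phi_k += size
        size *= 2
    return phi_k
```

With b_s bound, this function can serve as the species_fitness callback assumed in the τ-EO sketch of Section 3.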
Table 1. Simulation results. The success rate is computed with a time limit equal to 100,000 steps.

String length              64        128        256        512       1024       2048
best value                256        640       1536       3584       8192      18432

EO    t               4896.52    6205.24   12546.54   22030.00   41876.78   81838.02
      sd_t            3167.83    2718.33    4026.85    3443.22    6745.60    6838.78
      t_max             14866      18982      27283      33358      73376      98748
      t_min              1060       3013       7887      15054      30558      69934
      success rate        100        100        100        100        100        100

RMHC  t               6042.80   16128.18   38906.06   91361.24  212902.66  459490.58
      sd_t            2418.97    6216.07   11884.50   20987.84   41797.29   97573.77
      t_max             14604      36674      70620     152205     351146     725653
      t_min              2561       5999      19176      58088     130582     314659
      success rate        100        100        100         70          0          0
Table 1 reports the simulation results for all the RR instances considered, in terms of the average best time t (the number of iterations required by τ-EO to reach the global solution), its related standard deviation sd_t, the maximum best time t_max and the minimum best time t_min. The first line of the table contains the global best value for each of the different-sized problems. Finally, we have also computed a success rate by checking whether an algorithm reached the solution within a maximum number of iterations equal to 100,000. To better evaluate the effectiveness and the efficiency of τ-EO in facing this problem, it is important to compare its results against those achieved by RMHC [14] (a sketch of this simple algorithm is given below). In fact, it is reported in [14] that this simple algorithm significantly outperforms two different versions of the GA on this problem, which is reported there as R2. Aiming at carrying out this comparison, we have implemented RMHC and run it on the same RR instances, 50 times for each problem size. Table 1 shows the results of these runs in terms of the same indices used for τ-EO. It is worth noting that the result achieved by our RMHC implementation on the 64-bit problem is a little better than that reported in [14]: the average number of iterations is 4896 over 50 runs in our case, whereas it is 6551 over 200 runs in theirs. This is important because it allows us to take advantage of their results and conclusions, so that we can also perform an indirect comparison of τ-EO against GAs on this class of problems: if our algorithm turns out to be better than RMHC, which is much better than a GA on RR functions, then we can be confident that it is better than a GA too. As can be seen, for all the sizes of the problem τ-EO always reaches the global best solution in all 50 runs within 100,000 iterations, i.e., it has a success rate of 100%. Very interestingly, it outperforms RMHC in terms of a lower number of evaluations to reach the optimum and a lower standard deviation (apart from the smallest size of 64). Figure 2 reports the plot of the average best time t as a function of the problem size ratio r, i.e., the ratio between the total length of a given string and the minimum string length considered in the simulations, for both τ-EO and RMHC.
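For reference, the RMHC of [14] is extremely simple; the following Python sketch (our rendition, not the authors' code) flips one randomly chosen bit per iteration and keeps the mutant whenever its fitness is at least as good.

```python
import random

def rmhc(fitness, n, optimum, max_iters=100_000):
    """Random Mutation Hill Climbing: mutate one random bit, accept the
    mutant if it is no worse, stop at the optimum or after max_iters."""
    current = [random.randint(0, 1) for _ in range(n)]
    f_cur = fitness(current)
    for t in range(1, max_iters + 1):
        j = random.randrange(n)
        current[j] ^= 1                 # flip one randomly chosen bit
        f_new = fitness(current)
        if f_new >= f_cur:
            f_cur = f_new               # keep the (not worse) mutant
        else:
            current[j] ^= 1             # undo the flip
        if f_cur == optimum:
            return current, t
    return current, max_iters
```

For example, rmhc(royal_road_fitness, 64, 256) runs it on the smallest instance with the fitness sketched in Section 4.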
Fig. 2. Average best time t to find the global solution as a function of the problem size ratio r. The black line indicates the τ –EO linear law of t as a function of the size ratio, while the dashed line accounts for the RMHC law.
For τ-EO, the behavior can be approximated by a linear law. In order to model it, we have found the following correspondence: t_{τ-EO}(r) = a · r + b, with a = 2493.18 and b = 2053.81. The values of these coefficients have been computed with a confidence of 95%, and the resulting RMSE is equal to 523.27. The RMHC behavior can also be approximated by a linear law; the correspondence found in this case is t_{RMHC}(r) = c · r + d, with c = 14746.59 and d = −17367.24. These values have also been computed with a confidence of 95%, while the resulting RMSE is equal to 7758.42.

Figure 3 shows typical runs of both algorithms for all problem sizes, namely the fitness of the current solution as a function of the number of iterations. A first comment is that the RMHC fitness behaviour is always non-decreasing, as it should be, since it is an elitist strategy. On the other hand, τ-EO is non-decreasing in a first phase, whereas it oscillates and produces solutions worse than the current one in a second phase, when it has approached the global best. This is because in this last phase the current solution contains a large number of 1s, so, based on the choice of the characters to mutate, a bit set to 1 is quite likely to be chosen and mutated to 0. Such a mutation either destroys a block already completed with eight 1s, thus leading to a decrease in fitness, or affects a not-yet-complete block, so that the fitness remains constant. A second comment is that the evolutions of τ-EO show the typical behaviour of this technique, i.e., the so-called avalanches take place every now and then. An avalanche is a cascade of mutations in the characters that does not improve the fitness of the whole ecosystem. The theory of SOC [4] states that large interactive systems evolve naturally to a critical state where a single change
Fig. 3. Plots of the fitness Φ as a function of the number of iterations t. Size = 64 (top left), Size = 128 (top right), Size = 256 (middle left), Size = 512 (middle right), Size = 1024 (bottom left), Size = 2048 (bottom right).
in one of their elements generates "avalanches", i.e., the sequence of mutations occurring before a fitness improvement is obtained, and periods of stasis alternate with them. Thus our findings are in good accordance with the theory. As an example of how an avalanche takes place, the plot for the problem of size 128, in the top right position of Figure 3, shows that τ-EO quickly moves from a solution with a fitness value of 128 at iteration 8574 to another with a value of 392 at iteration 9280, passing through a set of five intermediate improving configurations within 706 iterations. Examples of stasis phases in the same figure are represented by
the intervals between iterations 117 and 2575, and from 3038 to 4844. Avalanches take place for all the other problem sizes as well, though for higher sizes they are less visible because the behaviour of τ-EO is "shrunk" in the figures due to the presence of that of RMHC, which is much more time-consuming.

The "neutral" landscape of any RR function is strictly related to the concept of exaptation. In fact, the only way to get an improvement in fitness is to complete a block, i.e., an octet at the lowest level, or one with 16, 32, . . . , 2048 bits at the higher levels. Since in our τ-EO implementation for RR functions any new solution differs from the previous one by just one bit, the only way to complete, for example, an octet is that it already contains 7 bits equal to 1 and the algorithm chooses the only 0 contained there. As soon as the last remaining 0 in the octet is flipped to a 1, the octet itself contributes to an improvement in fitness. In other words, we have a situation in which some features in the genotype (the 1s originally contained in the octet) are set, some others (the original 0s which become 1s in the octet) are modified during evolution, and none of them contributes to an improvement in fitness until something new takes place (the last 0 in the octet becomes a 1). It is exactly then that those already-existing features, while not changed, are used "as they are" to obtain a new individual which shows a higher fitness. This is exactly what happens in exaptation, where, for example, birds' feathers, originally used to keep temperature constant, become very useful for flight: they are not changed, they are taken as they are and used for something they were not originally "designed" for. Finally, it is important to underline that hitchhiking does not take place when using τ-EO on RRs, differently from what Forrest and Mitchell observed in [2]. This is because coevolution helps prevent this problem: the characters co-evolve and compete, whereas in GAs they must be evaluated "as a whole", as evidenced in [3].
6 Conclusions
In this paper attention has been focused on neutral landscapes, i.e., those characterized by the existence of many neighboring configurations with the same fitness. Specific reference has been made to the Royal Road functions. These have as a distinctive feature the fact that, in most cases, a change in a bit does not yield any variation in the fitness of the solution until some specific condition on the configuration of a block of bits becomes true. This has some links with the exaptation point of view in evolution, where, under the appearance of some modifications in an individual, one of its features may become very useful for carrying out a task very different from the one it was originally destined to. A heuristic combinatorial optimization method, τ-Extremal Optimization, has been used to investigate its effectiveness and efficiency in tackling this class of problems. Several Royal Road instances, differing in their number of bits, have been taken into account. A comparison has been carried out against the state-of-the-art algorithm on Royal Roads, i.e., Random-Mutation Hill-Climbing. Results have shown the superiority of τ-Extremal Optimization in terms of success rate and convergence speed to the optimum.
References

1. Mitchell, M., Forrest, S., Holland, J.H.: The royal road for genetic algorithms: Fitness landscape and GA performance. In: Proceedings of the First European Conference on Artificial Life: Toward a Practice of Autonomous Systems, pp. 245–254. MIT Press, Cambridge (1992)
2. Mitchell, M., Forrest, S.: Royal road functions. In: Handbook of Evolutionary Computation. CRC Press, Boca Raton (1998)
3. Ochoa, G., Lutton, E., Burke, E.K.: The cooperative royal road: Avoiding hitchhiking. In: Monmarché, N., Talbi, E.-G., Collet, P., Schoenauer, M., Lutton, E. (eds.) EA 2007. LNCS, vol. 4926, pp. 184–195. Springer, Heidelberg (2008)
4. Sneppen, K., Bak, P., Flyvbjerg, H., Jensen, M.H.: Evolution as a self-organized critical phenomenon. In: Proceedings of the National Academy of Science, pp. 5209–5236 (1995)
5. Boettcher, S., Percus, A.G.: Optimization with extremal dynamics. Complexity, Complex Adaptive Systems: Part I 8(2), 56–62 (2002)
6. Boettcher, S., Percus, A.G.: Extremal optimization: an evolutionary local-search algorithm. In: Computational Modeling and Problem Solving in the Networked World. Kluwer, Dordrecht (2003)
7. Boettcher, S., Percus, A.G.: Extremal optimization: methods derived from co-evolution. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 825–832. Morgan Kaufmann, San Francisco (1999)
8. Kimura, M.: The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge (1983)
9. Huynen, M.A., Stadler, P.F., Fontana, W.: Smoothness within ruggedness: The role of neutrality in adaptation. Proceedings of the National Academy of Sciences of the United States of America 9(1), 397–401 (1996)
10. Wagner, A.: Robustness and Evolvability in Living Systems. Princeton Studies in Complexity. Princeton University Press, Princeton (2005)
11. Gould, S.J., Vrba, E.: Exaptation: a missing term in the science of form. Paleobiology 8(1), 4–15 (1982)
12. Futuyma, D.J.: Evolutionary Biology. Sinauer Associates, Sunderland (1986)
13. Eldredge, N., Gould, S.J.: Punctuated equilibria: an alternative to phyletic gradualism. In: Models in Paleobiology, pp. 82–115 (1972)
14. Forrest, S., Mitchell, M.: Relative building-block fitness and the building-block hypothesis. In: Proceedings of Foundations of Genetic Algorithms, San Mateo, CA, pp. 109–126. Morgan Kaufmann (1996)
15. Holland, J.H.: Royal road functions. Internet Genetic Algorithm Digest 7(22) (1993)
16. Boettcher, S.: Extremal optimization for low-energy excitations of very large spin-glasses (2004), http://www.mcc.uiuc.edu/nsfitr04Rev/presentations/0312510 Large-Scale Applications.pdf
Improving the Scalability of EA Techniques: A Case Study in Clustering

Stefan R. Bach (1), A. Şima Uyar (2), and Jürgen Branke (3)

(1) University of Karlsruhe, Faculty of Informatics, 76128 Karlsruhe, Germany
[email protected]
(2) Istanbul Technical University, Department of Computer Engineering, 34469 Maslak/Istanbul, Turkey
[email protected]
(3) University of Warwick, Warwick Business School, Coventry CV4 7AL, UK
[email protected]
Abstract. This paper studies how evolutionary algorithms (EA) scale with growing genome size when used for similarity-based clustering. A simple EA and EAs with problem-dependent knowledge are experimentally evaluated for clustering up to 100,000 objects. We find that EAs with problem-dependent crossover or hybridization scale near-linearly in the size of the similarity matrix, while the simple EA, even with problem-dependent initialization, fails at moderately large genome sizes.

Keywords: Evolutionary Algorithms, Pairwise Similarity-based Clustering, Scalability, Performance.
1 Introduction
Evolutionary algorithms (EA) have become known as good optimizers suitable for many NP-hard problems. The underlying theories are still incomplete, so most insight is gained through experimentation. But even now that the field has matured, many researchers still restrict their experiments to small genome sizes of a couple of hundred variables and fail to look at scalability issues for newly proposed approaches. This raises our interest in the suitability of EAs for efficiently searching high-dimensional search spaces with very large genomes. We review and categorize approaches towards achieving better EA performance and scalability. We then show that clustering is an interesting problem for a scalability analysis because of the high interdependency in the genome. We introduce a simple EA and problem-dependent enhancements for the clustering problem with a fixed number of clusters. Experimental evaluation of the EA scalability shows that problem-dependent knowledge is essential to successfully operate on very large genomes. Using specialized crossover or hybridization, near-linear scalability is achieved for clustering up to 100,000 objects.
2 Background and Related Work
In this section we give a short overview of work regarding the scalability and performance of EAs and propose a taxonomy of approaches towards improvement. We then set the focus on our application domain, clustering, and present some background on this domain.
2.1 Previous Work on EA Scalability and Performance
The topics of scalability and performance have a long history in EA research. Various approaches have been proposed to improve scalability. We classify them into the following four groups:

1. Building block and linkage discovery tries to find groups of co-adapted chromosome positions that should be inherited together. Several classes of algorithms work towards this goal, such as some distribution estimation algorithms [20], EAs that detect linkage through perturbations [17], and EAs that learn linkage implicitly through special operators [4].
2. Parallelization is an easy way to speed up the execution of an EA. The inherent parallelism in a population-based search allows nearly all EAs to distribute the workload of solution evaluation in a master/slave model. More specialized approaches use structured populations or multipopulation models that are designed to fit the underlying parallel hardware [1][21].
3. Problem-specific knowledge can be integrated in many steps of the EA flowchart, for example, by using a problem-specific representation [3], informed initialization [22] and/or operators [7], and through the hybridization of an EA with traditional heuristics [14].
4. Multi-level approaches use an EA as a component in a larger environment. For example, [12] and [18] process a compacted or partial version of a problem in a first EA run and then build a solution for the complete problem in a second EA run. Other approaches use EAs as second-level mechanisms to complete partial solutions obtained from a heuristic [5], or as a first-level mechanism that evolves a suitable ordering of low-level heuristics to solve a problem [6].
2.2 Clustering as an Example – Motivation and Background
Our interest lies in studying the scalability of EAs and their suitability to work on large genomes. We thus view scalability as the behavior of performance over changing problem sizes. Many past studies on large problems use test cases that are concatenations of independent smaller-sized problems [4][17][20]. Those problems are especially well suited to building block approaches, since they are fully decomposable into the single building blocks. We set our focus on large-sized problems with a high interdependency between many variables, which is the case for clustering problems. Clustering partitions a data set of n objects, V = {o_1, . . . , o_n}, into k clusters, C = {C_1, . . . , C_k},
C_i ⊆ V, so that similar objects are assigned to the same cluster. If each object is mapped to a gene, moving a single object to a different cluster changes the relation of its gene to the genes of all other objects in the old and new cluster. We analyze the suitability of a generic EA for large clustering problems with a fixed number of clusters and explore potential benefits of introducing problem-specific knowledge. Many EAs for clustering have been conceived since the early '90s. A recent survey for feature-based clustering is given in [15]. Special focus on large data sets was laid in [10] and [16], but always in conjunction with a centroid-based representation. In this representation the genome length is independent of the number of objects to cluster, which is only suitable for feature-based clustering. We decide on a more flexible direct representation that allows the precise assignment of each object. Instead of feature-based clustering we use a pairwise similarity-based approach: a problem is represented by a sparse similarity matrix with entries s(o_a, o_b). Based on previous work [8], we select the min-max-cut (MMC) as a criterion function to measure the quality of a clustering:

$$ \mathrm{MMC}(C) = \sum_{C_i \in C} \frac{\mathrm{cut}(V \setminus C_i, C_i)}{\sum_{o_x, o_y \in C_i} s(o_x, o_y)}, \qquad \mathrm{cut}(V \setminus C_i, C_i) = \sum_{o_x \in C_i} \sum_{o_y \in V \setminus C_i} s(o_x, o_y) \qquad (1) $$
The criterion is based on the total inter-cluster similarity, calculated as cut(V \ C_i, C_i), and the intra-cluster similarity of each cluster; MMC sums their ratio per cluster. It is to be minimized and is globally optimal for k = 1 cluster. MMC can be evaluated in linear time with respect to the number of non-zero similarity entries, as the sketch below illustrates.
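As an illustration (our sketch under the stated definitions, not the authors' code), the following Python function evaluates Eq. (1) in one pass over the non-zero entries of a symmetric sparse similarity matrix, here assumed to be given as a dict mapping object pairs to similarities:

```python
def min_max_cut(similarity, assignment, k):
    """Eq. (1): sum over clusters of inter-cluster similarity (cut) divided
    by intra-cluster similarity. `similarity` maps pairs (a, b) with a < b
    to s(o_a, o_b) (each unordered pair counted once); `assignment[i]` is
    the cluster of object i. Runs in O(number of non-zero entries)."""
    cut = [0.0] * k     # summed similarity leaving each cluster
    intra = [0.0] * k   # summed similarity inside each cluster
    for (a, b), s in similarity.items():
        ca, cb = assignment[a], assignment[b]
        if ca == cb:
            intra[ca] += s
        else:
            cut[ca] += s
            cut[cb] += s
    # Clusters with zero intra-cluster similarity are skipped here for
    # simplicity; a full implementation would have to define that case.
    return sum(cut[i] / intra[i] for i in range(k) if intra[i] > 0)
```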
3 The Base Algorithm and New Operators

In this section we first define a simple EA for the clustering problem, which we will use as our base algorithm. We then propose new operators for initialization, crossover and hybridization to enhance the scalability of the algorithm.

3.1 The Base Algorithm
A simple EA for clustering can be quickly conceived by adapting a simple genetic algorithm: we directly represent a solution as a string of length n, assigning each gene a value in the range 0 . . . k − 1 to denote a cluster number. The population is randomly initialized; we use binary tournament selection, generational replacement with 1 elite individual, uniform crossover, and reassign objects to a random cluster for mutation. The clustering criterion is MMC as defined by (1); fitness is calculated as 1/(1 + MMC) to turn it into a maximization problem. A compact sketch of this base EA follows.
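The following Python sketch (ours; the population size, generation count and rates are illustrative assumptions, not values from the paper) wires these components together, reusing the min_max_cut function from above:

```python
import random

def base_ea(similarity, n, k, pop_size=50, generations=200,
            cx_rate=0.9, mut_rate=0.01):
    """Simple EA: direct encoding, binary tournament selection, uniform
    crossover, random-reassignment mutation, generational replacement
    with one elite. Fitness is 1 / (1 + MMC), to be maximized."""
    def fitness(ind):
        return 1.0 / (1.0 + min_max_cut(similarity, ind, k))

    def tournament(pop, fits):
        a, b = random.randrange(pop_size), random.randrange(pop_size)
        return pop[a] if fits[a] >= fits[b] else pop[b]

    pop = [[random.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: fits[i])]
        nxt = [elite[:]]                        # keep one elite individual
        while len(nxt) < pop_size:
            p1, p2 = tournament(pop, fits), tournament(pop, fits)
            if random.random() < cx_rate:       # uniform crossover
                child = [random.choice(g) for g in zip(p1, p2)]
            else:
                child = p1[:]
            for j in range(n):                  # random-reassignment mutation
                if random.random() < mut_rate:
                    child[j] = random.randrange(k)
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```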
3.2 New Initialization Operators
As alternatives to random initialization, we introduce the heuristic initialization and the MST initialization, which is based on minimum spanning trees (MST).
The Heuristic Initialization postprocesses randomly initialized individuals. All objects are considered in a random permutation; for each one we calculate the summed similarity to all clusters and place it in the cluster with the highest total. Updated placements of already processed objects are taken into account when calculating the summed similarities for later objects. (A sketch of this reassignment pass is given after this paragraph.)

The MST Initialization is based on a minimum spanning tree of the graph induced by connecting objects with edges of weight 1 − s(o_a, o_b). MST-based clustering has a long history; it is known to produce results identical to single-linkage clustering [11] and has been used to initialize a clustering EA in [13]. We calculate a MST with Kruskal's algorithm and then define an edge as eligible for removal if both of its endpoints have a degree > 1. The eligibility criterion prevents the creation of singleton clusters. We introduce a degree of freedom f and remove k − 1 randomly selected edges from among the (k − 1) + f least similar eligible edges. The remaining connected components induce a clustering. We use the parameter f to create multiple clusterings from a single MST; a small value limits the eligible edges for removal to those with low similarity.
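A possible reading of the heuristic reassignment pass, in Python (our sketch; the adjacency structure adj, mapping each object to its non-zero-similarity neighbors, is an assumed input):

```python
import random

def heuristic_pass(assignment, adj, k):
    """Visit objects in a random permutation and move each one to the cluster
    with the highest summed similarity, immediately using updated placements
    of already processed objects for the objects visited later."""
    order = list(range(len(assignment)))
    random.shuffle(order)
    for u in order:
        totals = [0.0] * k
        for v, s in adj[u]:                 # neighbors with non-zero similarity
            totals[assignment[v]] += s
        assignment[u] = max(range(k), key=lambda c: totals[c])
    return assignment
```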
3.3 Alternative Crossover Operators
In combination with a direct encoding, standard crossover operators can show undesirable effects in the clustering task: for example, two parents encoding the same solution with permuted cluster numbers (the competing conventions problem) tend to create a completely different child [9]. To overcome this drawback we introduce the matching crossover and the cluster-based crossover.

The Matching Crossover is an adaptation of uniform crossover which introduces an additional preprocessing step. Before the crossover is applied, we adjust the cluster numbers in one parent to match those in the other. This is done by repeatedly matching those clusters in both parents that show the largest overlap, until all clusters in one parent have been paired up with clusters in the other one. The approach is similar to [19], where cluster numbers in the parents are adapted "in such a way that the difference between the two parent solutions is as small as possible."

The Cluster-based Crossover is a family of crossover operators following a common scheme that aims at passing on some unaltered clusters to the children. The operator is inspired by [9], which suggests injecting a subset of the clusters of one parent unaltered into the other, while resolving ambiguities in object assignment in favor of the newly injected clusters. That approach requires problem-dependent heuristics to deal with constraints and to counteract the increase in the number of clusters that occurs through crossover. Since we fix the number of clusters, we use a modified approach. Cluster-based crossover assigns different roles to the parents, so that one of them acts as the primary parent. It then creates one child from two parents in three steps (a code sketch follows the description below):
1. A number of clusters (k1) in the primary parent are selected for inheritance according to a selection strategy. The selected clusters are then copied to the child solution. Objects assigned in this step are excluded from any further processing.
2. Then the most intact k − k1 clusters of the secondary parent are selected, i.e. those which have the highest percentage of yet-unassigned objects. They are also copied to the child solution, but do not overwrite assignments made in the first step.
3. After the first two steps a number of objects remain "homeless", namely those that belong, in both parents, to a cluster that has not been selected. They are assigned to a cluster according to a homeless strategy.

An implementation of cluster-based crossover needs to choose strategies to select clusters in the primary parent and to deal with homeless objects. We suggest two alternatives for both cases, using either a random or an informed strategy. For the selection strategy we suggest to either

– randomly decide for each cluster in the primary parent whether it gets inherited by the child, giving equal probability to both cases, so that the expected number of inherited clusters is k/2; or
– aim to keep the best clusters of the primary parent (keep-best selection) by deterministically selecting the k/2 clusters which have the lowest ratio of inter-cluster similarity to intra-cluster similarity.

For the homeless strategy we suggest to either

– assign a random cluster number to the homeless objects, or
– do a heuristic assignment, similar to the postprocessing of the heuristic initialization, but limited to homeless objects only.

A code sketch of the overall scheme is given below.
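This is a hedged sketch of the three-step scheme with the two random strategies (random selection, random homeless assignment); the informed variants would replace the two random choices by keep-best selection and the heuristic assignment. As in the scheme above, cluster labels of the two parents may collide, and step 2 never overwrites step-1 assignments.

    import random

    def cluster_based_crossover(primary, secondary, k):
        n = len(primary)
        child = [None] * n
        # step 1: random selection strategy, each primary cluster kept with p = 1/2
        selected = {c for c in range(k) if random.random() < 0.5}
        for i in range(n):
            if primary[i] in selected:
                child[i] = primary[i]
        # step 2: copy the k - k1 most intact secondary clusters, i.e. those with
        # the highest share of still-unassigned objects; never overwrite step 1
        k1 = len(selected)
        members = {c: [i for i in range(n) if secondary[i] == c] for c in range(k)}
        def intactness(c):
            return (sum(child[i] is None for i in members[c])
                    / max(len(members[c]), 1))
        for c in sorted(range(k), key=intactness, reverse=True)[:k - k1]:
            for i in members[c]:
                if child[i] is None:
                    child[i] = c
        # step 3: random homeless strategy for objects still unassigned
        for i in range(n):
            if child[i] is None:
                child[i] = random.randrange(k)
        return child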
3.4 Hybridization with a Hill-Climbing Algorithm
Our last proposal is the introduction of a hybridization operator that performs hill-climbing on the solution candidates. We insert it just before the regular mutation step to keep the EA's theoretical property of being able to mutate every solution candidate into every other solution candidate; experiments (not presented here) comparing this to a traditional invocation after the mutation operator have shown no significant differences in the results. The proposed hill-climbing hybridization reassigns all objects of a solution candidate in a random order, similar to the postprocessing of the heuristic initialization. But instead of selecting the target cluster heuristically, the hill-climbing hybridization performs the reassignment that results in the highest fitness improvement. To accomplish this, the change in fitness for each possible assignment of an object needs to be calculated. Due to the nature of the fitness function given by (1), we do not need to completely recalculate the fitness for each assignment from scratch. The change in fitness can quickly be assessed if the inter-cluster and intra-cluster similarity sums for all clusters and the summed similarity between the object under consideration and each cluster are known.
The main computational effort for the hybridization is spent on calculating the similarity sums between each object and every cluster. Those values change with the reassignment of objects and can thus only be calculated when an object is considered for reassignment. The computational complexity is easiest described in terms of the induced similarity graph: calculating the similarity sums for a single object u needs O(deg(u)) time, if each non-zero similarity value corresponds to an edge. When iterating over all objects, each edge in the graph is involved exactly twice. Thus the complexity of the hybridization is O(|E| + k·n): O(|E|) to calculate the similarity sums and O(k·n) to consider k clusters for all n nodes.
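The bookkeeping can be sketched as follows. Here `neighbors[u]` is assumed to hold the non-zero similarities of object u, and `delta_fitness` stands in for the MMC update rule derived from the similarity sums, which is not spelled out here.

    def cluster_similarity_sums(u, neighbors, assignment, k):
        # O(deg(u)): one pass over the non-zero similarities of object u
        sums = [0.0] * k
        for v, s in neighbors[u]:
            sums[assignment[v]] += s
        return sums

    def best_reassignment(u, neighbors, assignment, k, delta_fitness):
        # try every cluster and keep the reassignment with the largest gain;
        # delta_fitness derives the fitness change from the similarity sums
        sums = cluster_similarity_sums(u, neighbors, assignment, k)
        return max(range(k), key=lambda c: delta_fitness(u, c, sums))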
4 Experiments
In this section we present details on our experimental design alongside a summary and discussion of the observed results.
4.1 Design of Experiments
We evaluate the proposed operators on test problems with k = 10 clusters. Problem sizes range from n = 100 to 1,000 objects for all configurations and up to n = 100,000 objects for successful algorithms. We report on the performance to achieve a set target quality. Experiments are run on a Linux machine equipped with Intel Xeon hyper-threading CPUs running at 2.6 GHz. The Test Problems are artificially generated, so that we can vary problem size while keeping their properties equal. First, all objects are uniformly assigned to k clusters, giving each an expected size of n/k. Then, we generate a sparse similarity matrix by randomly adding non-zero entries. Those entries are added by choosing a random object, deciding to create either an intra-cluster (p = 0.7) or inter-cluster (p = 0.3) entry, and then choosing the second object randomly out of the possible candidates. Similarity values are drawn from a normal distribution with μintra-cluster = 0.75, μinter-cluster = 0.25, and σ = 0.25. For each problem instance we add 5n entries, which results in an expected node degree of 10 in the induced similarity graph. Thus the total number of edges in the graph, and therefore the size of the sparse similarity matrix, scales linearly in n. Finally, we assert for each object that the summed intra-cluster similarity exceeds the inter-cluster sum; otherwise, additional intra-cluster entries are added. This increases the likelihood that the initial node assignment represents an optimal solution. The Reporting Metrics employed are success rate (SR), average success runtime (SRT), and expected runtime (ERT), based on measured CPU times. The values are based on executing an algorithm run until success or termination. A run is successful if the fitness of the best individual reaches at least 85% of the optimal min-max cut value. A run is terminated if the relative improvement of the best individual over the last 20% of the generations falls below 1%, but no sooner than generation 125. This termination condition is derived from initial experiments with the base algorithm.
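A sketch of the test-problem generator described above, following the parameters in the text; the final repair step (adding intra-cluster entries until every object's intra-cluster sum dominates) is omitted, and duplicate entry draws simply overwrite.

    import random

    def make_problem(n, k):
        truth = [random.randrange(k) for _ in range(n)]  # planted clustering
        sim = {}
        for _ in range(5 * n):  # 5n entries: expected node degree of 10
            a = random.randrange(n)
            intra = random.random() < 0.7  # intra- vs. inter-cluster entry
            candidates = [b for b in range(n)
                          if b != a and (truth[b] == truth[a]) == intra]
            b = random.choice(candidates)
            mu = 0.75 if intra else 0.25
            sim[(min(a, b), max(a, b))] = max(0.0, random.gauss(mu, 0.25))
        return truth, sim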
Table 1. Results for simple crossover operators and varying initialization schemes

                     Uniform crossover                Matching crossover
Size  Init.       SR    SRT (s)  Gener.  ERT (s)   SR    SRT (s)  Gener.  ERT (s)
100   Random      0.55  1.40     310     2.93      0.46  1.99     373     4.41
      Heuristic   0.46  1.09     204     2.61      0.31  1.33     196     4.57
      MST         0.25  0.53     67      3.26      0.27  0.64     60      3.55
500   Random      0.33  50.62    3,683   189.69    0.40  52.11    3,506   155.99
      Heuristic   0.28  38.39    2,724   172.49    0.24  42.36    2,807   216.21
1000  Random      0.41  251.28   9,141   743.40    0.38  259.86   9,112   834.90
      Heuristic   0.29  190.32   6,852   819.01    0.33  198.67   6,840   766.13
Each algorithm configuration is executed for 200 independently generated problems, each one having a fixed seed so as to use common random numbers for all configurations. SR is the percentage of successful runs, SRT averages the runtimes of successful runs, and ERT is calculated as

    ERT = SRT + ((1 − SR) / SR) · URT    (2)

with URT denoting the average runtime of unsuccessful runs. The ERT metric gives the expected time to success when a run is restarted after convergence at an unsatisfactory solution. We also report the average number of generations to success.
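For illustration, the three reporting metrics can be computed from per-run records as follows (a minimal helper assuming at least one successful run):

    def reporting_metrics(runs):
        # runs: list of (success_flag, cpu_time) records for one configuration
        succ = [t for ok, t in runs if ok]
        fail = [t for ok, t in runs if not ok]
        sr = len(succ) / len(runs)
        srt = sum(succ) / len(succ)
        urt = sum(fail) / len(fail) if fail else 0.0
        ert = srt + (1 - sr) / sr * urt  # equation (2)
        return sr, srt, ert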
Parameter Settings have been determined empirically, based on initial experiments using the base algorithm. We use a population size of 100; we always perform crossover, giving equal chance to both parents in uniform and matching crossover; and we use a mutation rate of 0.5/genome-length. The MST initialization uses an f-value of 5, which allows the initial population to be sampled from (9+5 choose 9) = 2002 candidates for a 10-cluster problem.
4.2 Results and Discussion
We summarize the key observations found in the experimental results. Further data, including results on additional test configurations and visualized fitness plots, can be found in [2]. Uniform and Matching Crossover both exhibit similar performance for the various initialization operators and problem sizes. Table 1 presents the experimental results. Since "intelligent initialization" starts a run at a higher fitness level, and the termination condition asks for a minimum relative increase over a time frame that depends on the number of elapsed generations, this might result in premature termination. Thus we clean the results for problems of size 500 and 1,000 and remove all runs that terminate before generation 200, which removes the data of all MST-initialized runs and up to 40 runs for heuristic initialization.
The remaining results indicate that the inclusion of special initialization operators benefits the average runtime to success but simultaneously reduces the average fitness level upon convergence, which results in lower success rates. The ERT criterion combines the benefits of smaller runtimes and the drawback of lower success rates. The ERT measurements show no definite winner, since no algorithm configuration is reliably better than the base algorithm (uniform crossover, random initialization). When comparing uniform crossover and matching crossover, we find that both show a similar increase of the required generations and runtime to success. This behavior can be understood when looking in detail at the genetic convergence of both algorithms. For example, we have inspected the genetic convergence of the randomly initialized algorithms and find that more than 75% of the runs are converged after 300 generations, meaning that over 95% of the genes have the same allele value on over 95% of the individuals. At this point the population has been taken over by a single individual and the matching step during crossover remains without any effect. Indeed, the whole EA run is reduced to a local search that is mainly driven by the mutation operator. Cluster-based Crossover shows varying success depending on the choice of the selection and homeless strategies. Three out of the four possible combinations result in algorithms with low success; we summarize their symptoms but will not give any detailed results for them. If selection and homeless assignment are both done randomly, we observe a very slow fitness increase that is even worse than the base algorithm's. For example, for problem size 1,000, heuristically initialized cluster-based crossover reaches an average fitness of 0.25 after 10,000 generations, whereas the base algorithm shows an average fitness of 0.75. In the case where cluster-based crossover aims to keep the best clusters, we find that usually the largest clusters get selected. This behavior is in line with the fitness function, which finds its global optimum for a single large cluster. In combination with a random assignment of homeless objects this leads to a drop of the average population fitness to nearly zero: in each generation the largest clusters are selected and grow even larger, since each cluster receives a 1/k share of the homeless objects on average. This self-reinforcing behavior leads to the generation of small, meaningless clusters, which have a negative impact on the fitness. In combination with heuristic homeless assignment, however, cluster-based crossover is able to achieve an improvement of both the fitness of the best individual and the average population fitness. Still, the larger the problem size gets, the smaller the improvement achieved in this class. We suspect this to be caused by a limited capability to explore the search space, since the deterministic selection tends to choose the large clusters of the primary parent, which leads to them always being kept intact. The fourth possible combination, random selection and heuristic homeless assignment, turns out to be the best configuration for cluster-based crossover. Both the average fitness as well as the fitness of the best individual exhibit good improvement, and a satisfying solution quality is nearly always achieved in very few generations. Table 2 presents experimental results for problem sizes up to 1,000 objects in detail.
Table 2. Results for cluster-based crossover (random sel./heuristic) and the hill-climbing hybridization using cluster-based crossover (keep-best sel./heuristic)

                     Cluster-based cross.             Hill-climbing hybrid.
Size  Init.       SR    SRT (s)  Gener.  ERT (s)   SR    SRT (s)  Gener.  ERT (s)
100   Random      1.00  0.61     12      0.61      1.00  0.55     2       0.55
      Heuristic   1.00  0.50     9       0.50      0.99  0.59     6       0.62
      MST         0.98  0.44     8       0.48      1.00  0.48     2       0.48
500   Random      1.00  1.54     16      1.54      1.00  1.42     3       1.42
      Heuristic   1.00  1.31     10      1.31      1.00  1.24     2       1.24
      MST         1.00  1.46     13      1.46      1.00  1.34     3       1.34
1000  Random      1.00  2.09     19      2.09      1.00  1.97     3       1.97
      Heuristic   1.00  1.65     12      1.65      1.00  1.84     3       1.84
      MST         1.00  1.77     13      1.77      1.00  1.83     3       1.83
We find that this time the application of intelligent initialization is beneficial, since fewer generations until success are required without negative impacts on the success rate, resulting in an overall improved ERT. The Hill-climbing Hybridization proves very successful in our experiments and consistently reaches a near-perfect success rate, regardless of the choice of crossover and initialization. In combination with the base algorithm and no further optimizations, the hill-climbing hybridization solves the problems of size 1,000 on average in less than 8 generations. For space reasons we only present details of one configuration that consistently ranks among the fastest for problems of up to 1,000 objects. Table 2 shows the experimental results for an algorithm that combines the hill-climbing hybridization with cluster-based crossover using keep-best selection and heuristic assignment of homeless objects. This algorithm reaches success in very few generations most of the time; the notably high average for problem size 100 with heuristic initialization is caused by a small number of outliers. The quick success raises the suspicion that it is due to the hill-climbing only and that the EA might not add value to the clustering process. We investigate this by running a single-instance hill-climbing hybridization for each of the 200 test problems for up to 500 iterations. Table 3 shows that large problem sizes achieve satisfying success rates of around 80%. Problems of the smallest size had a high risk of being trapped in local optima; they constantly failed for heuristic initialization, since it is susceptible to producing individuals with an empty cluster on small problem sizes. This renders the hybridized EA similar to approaches that combine clustering heuristics, such as k-means, with an EA. Thus the main focus of the EA lies no longer on the performed crossover and mutation, but on compensating for the shortcomings of a heuristic that might end in local optima of undesired quality for some initial solution candidates.
Table 3. Success rate and average iterations until success for iterative hill-climbing on a single solution candidate

                 100            500            1000
Init.        SR    Iter.    SR    Iter.    SR    Iter.
Random       0.44  4.9      0.89  8.6      0.82  11.1
Heuristic    0     –        0.88  6.6      0.85  9.6
MST          0.30  2.9      0.69  6.6      0.70  8.4
Large-sized problems have been further used to test the successful EA configurations found above, namely (i) unhybridized cluster-based crossover with random selection and heuristic homeless assignment, and (ii) cluster-based crossover with keep-best selection and heuristic homeless assignment combined with the hill-climbing hybridization. We test the heuristic and MST initializations with both algorithms but no longer pursue random initialization. The additional test problems are of size 2,000; 5,000; 10,000; 25,000; 50,000; 75,000; and 100,000. For larger problem sizes of up to 100,000 nodes we again observe success rates that are nearly constant at 100%; only one run did not reach a successful fitness level. We find that the average number of generations to reach a successful fitness grows sub-linearly (Fig. 1, top right). It shows a behavior along the lines of √n for (i) and is nearly constant for (ii), in which case an acceptable solution is always found in 7 generations or less. In (ii) the theoretic complexity per generation is linear in the number of non-zero similarity entries, which allows for a total complexity that is near-linear in the size of the sparse similarity matrix. But the observed times for initialization (Fig. 1, bottom left) and the average runtime spent in each generation (Fig. 1, bottom right) increase faster than linearly.
Fig. 1. Runtimes and required generations for the improved evolutionary algorithms, plotted against problem size (panels: generations to success, total CPU time, initialization time, and time per generation; series: cluster-based crossover and the hill-climbing hybrid, each with heuristic or MST initialization)
This is especially evident when comparing the slope for problems smaller than 10,000 objects with the slope for larger problem sizes. We would expect the heuristic initialization and the time to process a single generation to scale linearly in the number of objects: all involved loops iterate over either the number of clusters (which we kept constant), the number of objects (n), or the non-zero similarity values (which are linear in the number of objects in our test problems). The contradictory measurement results are most likely due to overhead of the Java Virtual Machine¹.
5 Conclusion and Future Work
We have systematically studied the scalability of EAs for pairwise similarity-based clustering, based on artificially generated data sets of up to 100,000 objects. A simple generic EA turned out to be unsuitable for solving even moderately large problems of 1,000 objects, since genetic convergence quickly reduces the EA to a parallel local search. Neither intelligent initialization nor simple amendments to crossover can improve this behavior. More profound usage of problem knowledge allows the design of crossover and hybridization operators specialized for clustering. Both improvements lead to an EA that scales close to linearly in the size of the sparse similarity matrix and solves even large test problems in feasible time. Future work will incorporate approaches from other categories we have mentioned in our taxonomy. Multi-level strategies have already shown some success for EA clustering, and we are curious whether they further improve the scalability of our successful EAs. An open question is the applicability of our results to clustering in general. We have preferred artificial test problems over real-world data in order to create scalable problems with similar properties that suit the employed objective function. It remains to be tested whether problems with different properties or under a different objective function exhibit similar scalability.
References

1. Alba, E., Tomassini, M.: Parallelism and evolutionary algorithms. IEEE Trans. Evol. Comput. 6(5), 443–462 (2002)
2. Bach, S.R.: A scalability study of evolutionary algorithms for clustering. Diploma thesis, University of Karlsruhe, Germany (2009), urn:nbn:de:swb:90-114447
3. Carvalho, P.M.S., Ferreira, L.A.F.M., Barruncho, L.M.F.: On spanning-tree recombination in evolutionary large-scale network problems - application to electrical distribution planning. IEEE Trans. Evol. Comput. 5(6), 623–630 (2001)
4. Chen, Y.-p., Goldberg, D.E.: Convergence time for the linkage learning genetic algorithm. Evolutionary Computation 13(3), 279–302 (2005)
¹ We compared measuring the process CPU time of the JVM, which includes the time for helper threads such as garbage collection, with measuring the thread CPU time, which measures only the single-threaded user code. The latter scales much closer to linear.
5. Christou, I.T., Zakarian, A., Liu, J.M., Carter, H.: A two-phase genetic algorithm for large-scale bidline-generation problems at Delta Air Lines. Interfaces 29(5), 51–65 (1999)
6. Cowling, P., Kendall, G., Han, L.: An investigation of a hyperheuristic genetic algorithm applied to a trainer scheduling problem. In: 2002 Congress on Evolutionary Computation, vol. 2, pp. 1185–1190. IEEE, Los Alamitos (2002)
7. Deb, K., Pal, K.: Efficiently solving a large-scale integer linear program using a customized genetic algorithm. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3102, pp. 1054–1065. Springer, Heidelberg (2004)
8. Demir, G.N., Uyar, A.Ş., Gündüz-Öğüdücü, Ş.: Multiobjective evolutionary clustering of web user sessions: A case study in web page recommendation. Soft Computing 14(6), 579–597 (2009)
9. Falkenauer, E.: Genetic Algorithms and Grouping Problems. Wiley, Chichester (1998)
10. Gasvoda, J., Ding, Q.: A genetic algorithm for clustering on very large datasets. In: Nygard, K.E. (ed.) 16th Int. Conf. on Comp. App. Ind. Eng., pp. 163–167. International Society for Computers and Their Applications (ISCA) (2003)
11. Gower, J.C., Ross, G.J.S.: Minimum spanning trees and single linkage cluster analysis. Applied Statistics 18(1), 54–64 (1969)
12. Gündüz-Öğüdücü, Ş., Uyar, A.Ş.: A graph based clustering method using a hybrid evolutionary algorithm. WSEAS Trans. on Mathematics 3(3), 731–736 (2004)
13. Handl, J., Knowles, J.: An evolutionary approach to multiobjective clustering. IEEE Trans. Evol. Comput. 11(1), 56–76 (2007)
14. Hart, W.E., Krasnogor, N., Smith, J. (eds.): Recent Advances in Memetic Algorithms. Studies in Fuzziness and Soft Computing, vol. 166. Springer, Heidelberg (2005)
15. Hruschka, E.R., Campello, R.J.G.B., Freitas, A.A., de Carvalho, A.C.P.L.F.: A survey of evolutionary algorithms for clustering. IEEE Trans. Systems, Man, and Cybernetics Part C 39(2), 133–155 (2009)
16. Jie, L., Xinbo, G., Li-cheng, J.: A GA-based clustering algorithm for large data sets with mixed and categorical values. In: 5th Int. Conf. on Comp. Intel. Mult. App., pp. 102–107. IEEE, Los Alamitos (2003)
17. Kargupta, H., Bandyopadhyay, S.: Further experimentations on the scalability of the GEMGA. In: Eiben, A.E., Bäck, T., Schoenauer, M., Schwefel, H.-P. (eds.) PPSN 1998. LNCS, vol. 1498, pp. 315–324. Springer, Heidelberg (1998)
18. Korkmaz, E.E.: A two-level clustering method using linear linkage encoding. In: Runarsson, T.P., Beyer, H.-G., Burke, E.K., Merelo-Guervós, J.J., Whitley, L.D., Yao, X. (eds.) PPSN 2006. LNCS, vol. 4193, pp. 681–690. Springer, Heidelberg (2006)
19. von Laszewski, G.: Intelligent structural operators for the k-way graph partitioning problem. In: 4th Int. Conf. on Gen. Alg., pp. 45–52. Morgan Kaufmann, San Francisco (1991)
20. Pelikan, M., Sastry, K., Goldberg, D.E.: Sporadic model building for efficiency enhancement of the hierarchical BOA. Genetic Programming and Evolvable Machines 9(1), 53–84 (2008)
21. Schmeck, H., Kohlmorgen, U., Branke, J.: Parallel implementations of evolutionary algorithms. In: Zomaya, A., Ercal, F., Olariu, S. (eds.) Solutions to Parallel and Distributed Computing Problems, pp. 47–68. Wiley, Chichester (2001)
22. Surry, P.D., Radcliffe, N.J.: Inoculation to initialise evolutionary search. In: Fogarty, T.C. (ed.) AISB-WS 1996. LNCS, vol. 1143, pp. 269–285. Springer, Heidelberg (1996)
MC-ANT: A Multi-Colony Ant Algorithm

Leonor Melo, Francisco Pereira, and Ernesto Costa

Instituto Superior de Engenharia de Coimbra, 3030-199 Coimbra, Portugal
Centro de Informática e Sistemas da Universidade de Coimbra, 3030-790 Coimbra, Portugal
[email protected], {xico,ernesto}@dei.uc.pt
Abstract. In this paper we propose an ant colony optimization variant where several independent colonies try to simultaneously solve the same problem. The approach includes a migration mechanism that ensures the exchange of information between colonies and a mutation operator that aims to adjust the parameter settings during the optimization. The proposed method was applied to several benchmark instances of the node placement problem. The results obtained show that the multi-colony approach is more effective than the single-colony one. A detailed analysis of the algorithm's behavior also reveals that it is able to delay premature convergence. Keywords: Ant Colony Optimization, Multiple colony, Node Placement Problem, Bidirectional Manhattan Street Network.
1 Introduction
Ant Colony Optimization (ACO) is one of the most successful branches of swarm intelligence [4]. ACO takes inspiration from social insects such as ants. While foraging, real ants deposit pheromone on the ground to guide the other members of the colony. ACO mimics this indirect way of communication. The first ant algorithms were proposed in [6], [7] as a multi-agent approach to solve difficult combinatorial optimization problems like the traveling salesman problem. Since then a wide range of variants have been proposed and applied to different classes of problems (see [8] for an overview). In this paper we propose MC-ANT, a multi-colony ACO. The idea behind this approach is to allow for the simultaneous exploration of several search locations and to dynamically intensify the search on the most promising ones. Each colony maintains its own trail and set of parameters, but the most successful colonies transfer information to the worst ones. Specifically, the trails of the worst colonies are periodically updated, which hopefully will help them to escape from local optima and move towards more promising locations. We illustrate our approach by addressing the problem of finding the optimal node assignment in a multi-hop Wavelength Division Multiplexing (WDM) lightwave network with a virtual Bidirectional Manhattan Street Network topology
[16]. One advantage of this type of network is the ability to create a virtual topology different from the underlying physical topology. The assignment of the physical nodes to the virtual topology is a strong factor in the efficiency of the network. The results obtained are encouraging as they show the advantage provided by the existence of several colonies. Migration is able to enhance the algorithm performance without causing the convergence of the colonies to the same trail. The structure of the paper is the following: in sec. 2 we briefly describe Ant Colony Optimization algorithms and in sec. 3 we present the Node Placement Problem. Section 4 comprises the presentation of our multi-colony approach. Results from experiments are presented in sec. 5 and, finally, in sec. 6 we provide the main conclusions.
2 Ant Colony Optimization
In many species, an ant walking to or from a food source leaves a substance on the ground called pheromone. The other ants tend to follow the path where the pheromone concentration is higher [3]. It was proved in [11] that this pheromone-laying mechanism is used to guide the other members of the colony to the most promising trails. In an ACO algorithm, artificial ants use an artificial trail (together with some heuristic information) to guide them in the process of building a solution to a given problem. While the heuristic information is static, the pheromone trail is updated according to the solutions found in previous iterations. Starting from an empty solution, components are probabilistically added one by one until a complete solution is obtained. Some heuristic knowledge can also be used to bias the choice. The specific formula used to select the next solution component depends on the ACO variant. The general ACO algorithm consists of three phases (see fig. 1). After the initialization and until some termination condition is met, the following steps are repeated: each ant builds a solution, the best solution(s) are improved by a local search (this step is optional) and, at last, the pheromone trail is updated.
set the parameters initialize the pheromone trail while termination condition not met do construct ant solutions apply local search (optional) update pheromone trail end_while
Fig. 1. The ACO metaheuristic
2.1 ACO Algorithms
ACO algorithms have been applied to many problems. Examples are the applications to assignment problems, scheduling problems and vehicle routing problems [8]. Among other applications, ACO algorithms are currently state-of-the-art for solving the sequential ordering problem (SOP), the resource constraint project scheduling (RCPS) problem, and the open shop scheduling (OSS) problem [8]. Ant System (AS) [6], [7] was the first ACO algorithm. Since then several variants have been derived, with the MAX-MIN Ant System (MMAS) [19] and the Ant Colony System (ACS) [5] being among the most successful and most studied of them [8]. A common characteristic of ACS and MMAS is that they focus their search in a specific region of the search space [8]. We thus hope that by using an island-model approach a bigger portion of the landscape can be covered. Our method is inspired by ACS, partly because it is considered the most aggressive of the two [8] and is able to find better solutions in short computation times, although it converges sooner to a suboptimal solution. We hope the multi-colony method helps avoid premature convergence while retaining the ability to reach good solutions fast.
2.2 Ant Colony System (ACS)
ACS tries to diversify the solution landscape covered by the ants in an iteration by introducing a pheromone update during the construction step. At each decision point, each of the ants updates the trail by slightly decreasing the pheromone level of the component it just chose. The regular pheromone update at the end of each iteration considers only one ant, either the iteration best, the best-so-far, or a combination of both. The formula used is (1), where Lbest is the quality of the solution found by the selected ant:

    τij = (1 − ρ) · τij + ρ/Lbest    if cij is in the solution
    τij = τij                        otherwise           (1)

The mechanism used to select the next component uses a pseudo-random proportional rule. Depending on the value of a parameter q0, the rule may favor either exploration or exploitation.
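A minimal sketch of this global update, assuming the standard ACS deposit of ρ/Lbest on the components of the selected ant's solution and a mapping `tau` from components to pheromone levels:

    def acs_global_update(tau, best_components, l_best, rho):
        # deposit only on the components of the selected ant's solution
        for c in best_components:
            tau[c] = (1.0 - rho) * tau[c] + rho / l_best
        # all other components keep their current pheromone level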
2.3 Multi-Colony ACO
In multi-colony ant algorithms, several colonies of ants cooperate in order to find good solutions for the problem being solved [18]. The cooperation takes place by exchanging information about good solutions. A few multi-colony variants of ACO have been proposed. Many of them are used to solve multi-objective problems (see [10] or [1] for an overview of the approaches) or are specially implemented for parallel computing environments, where p colonies run on p parallel processors (for a review of some of the models see [12], [18], [9], [8]).
Fewer variants are used on single-objective problems as island-model alternatives to the classic ACO. Two examples of the latter are ACOMAC [21], where periodically each colony uses its current trail τi to update another colony's trail τi+1 in a circular manner (τi = w × τi + (1 − w) × τi+1), and AS-SCS [17], which has two types of colonies working at the same time but with slightly different construction methods.
3 Node Placement Problem
The Bidirectional Manhattan Street Network (BMSN) is a 2D toroidal mesh where every node is directly connected to 4 other nodes (see fig. 2).
Fig. 2. A 3 by 4 BMSN
Let us consider a BMSN with x × y = n nodes. The network can be represented as a graph G = (V, E), where V is the set of node slots and E is the set of bidirectional edges. Each of the n nodes (0, 1, ..., n − 1) can be assigned to the n slots of the graph without duplication. Two nodes i, j can communicate directly if they are in adjacent slots in the graph; otherwise they must use intermediate nodes to communicate and the number of hops increases. The number of hops should be as low as possible to minimize package forwarding. The topology optimization problem in regular topologies, such as the BMSN, is studied as the optimal Node Placement Problem (NPP) [15], [13]. The amount of traffic among each pair of nodes i, j is given by a traffic matrix T where tij denotes the traffic from i to j, with tij ∈ ℝ+0. Let h(i, j) be a function that returns the hop distance of the shortest path between two nodes i and j. The objective of the NPP is to minimize the average weighted hop distance between the nodes, i.e. to minimize the function f given in equation (2), where n is the number of nodes:

    f(σ) = Σi=0..n−1 Σj=0..n−1 tij · h(i, j)    (2)
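A small sketch of objective (2), under the assumption that the hop distance h on a bidirectional toroidal mesh is the wrapped Manhattan distance between the slots the nodes occupy (slots numbered row-major; `traffic` is the matrix T):

    def hop(slot_a, slot_b, x, y):
        # wrapped Manhattan distance on an x-by-y torus (row-major slot numbers)
        (ra, ca), (rb, cb) = divmod(slot_a, y), divmod(slot_b, y)
        dr, dc = abs(ra - rb), abs(ca - cb)
        return min(dr, x - dr) + min(dc, y - dc)

    def objective(sigma, traffic, x, y):
        # sigma[i] = slot of node i; traffic[i][j] = t_ij; returns f(sigma)
        n = x * y
        return sum(traffic[i][j] * hop(sigma[i], sigma[j], x, y)
                   for i in range(n) for j in range(n))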
In recent years several approximate methods were proposed to solve NPP. Most of them use a combination of greedy methods, local search, tabu search, genetic algorithm, simulated annealing, multi-start local search and variable depth search [14], [15], [22], [13]. The best performing one at the moment is [20].
4 MC-ANT
Our approach is a multiple-colony variation inspired by the ACS. The most relevant features of our proposal are:

1. The optimization algorithm maintains several colonies:
   (a) all colonies have the same number of ants;
   (b) all colonies run for the same number of iterations;
   (c) all colonies share the same heuristic function.
2. Each colony has its own trail, in an attempt to maximize the search area explored.
3. Each colony has its own set of parameters (α, β, ρ, q0), and is able to tune them, therefore adjusting its search strategy.
4. There is no step-by-step on-line pheromone update, as that is a computationally costly step and we expect to preserve the diversity by using multiple colonies/trails.

The main algorithm is presented in Figure 3.

    set the parameters and initialize the pheromone trails
    while termination condition not met do
        for each colony do
            construct ant solutions
            apply local search
        end_for
        migrate best solution
        update pheromone trails
    end_while

Fig. 3. The MC-ANT algorithm
The termination condition is met when a predefined number of iterations is reached. In the construct ant solutions step each of the ants builds a solution. This step is clearly dependent on the problem being solved and we explain it in further detail in sec. 4.1. In the apply local search step one or more of the best solutions from the current iteration are improved through local search. We used a greedy search algorithm with a first-improvement 1-swap neighborhood and a don't-look-bit mechanism [2]. In our experiments the total number of solutions improved per iteration was the same irrespective of the number of colonies (i.e. in the configurations with fewer colonies a larger number of solutions per colony were subjected to local search). In the migrate best solution step, for each colony we consider only the best solution found in the present iteration. We use these solutions to determine
the best and worst colonies. Let hdbest and hdworst stand for the value of the solution found by the best and worst colonies, respectively. The migration takes place if (hdworst − hdbest)/hdbest > x, where x is a random variable uniformly distributed over [0, 1]. In that case the best-so-far solution of the best colony is sent to the worst colony to be used in the trail-updating step. The solution received is used in the same manner as if it had been found by the colony that receives it. The idea is to slightly move the worst trail to a more promising area and, as a consequence, intensify the search effort on the preferred area. The trails should remain apart, to preserve diversity, but nearer than before. The worst colony also suffers a slight disturbance in its parameters. The amplitude of the disturbance, δ, is itself a parameter. For each disturbed parameter a value d is drawn uniformly from [−δ, δ]. Let p be the parameter to be disturbed; its value after the disturbance is given by (3), with pb being the value of parameter p in the best colony:

    p' = pb + d         if p ∈ {ρ, q0} and pb + d ∈ ]0, 1[
    p' = pb + 10 · d    if p ∈ {α, β} and pb + 10 · d ∈ ]0, 10[    (3)
    p' = p              otherwise

Equation (3) can be read as follows: should the migrated parameter value plus the disturbance be within the parameter range, the change is accepted; otherwise the value is unaltered. Since the range of α and β is 10 times that of ρ and q0, so is the added disturbance. In the update pheromone trails step each colony uses its best-so-far solution to update the pheromone trail using (1), with Lbest being equal to the quality of the solution.
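A sketch of the migration test and the parameter disturbance (3). The colony objects, with an iteration-best value, a best-so-far solution and a parameter dictionary, are assumed helpers rather than the authors' implementation:

    import random

    def migration_step(colonies, delta):
        # hd values are weighted hop distances: lower is better
        best = min(colonies, key=lambda c: c.iter_best_value)
        worst = max(colonies, key=lambda c: c.iter_best_value)
        gap = (worst.iter_best_value - best.iter_best_value) / best.iter_best_value
        if gap > random.random():
            # the received solution is used by the worst colony in its next
            # trail update, as if it had been found by that colony itself
            worst.incoming_solution = best.best_so_far
            for p in ('rho', 'q0', 'alpha', 'beta'):
                d = random.uniform(-delta, delta)
                step, hi = (d, 1.0) if p in ('rho', 'q0') else (10 * d, 10.0)
                new = best.params[p] + step
                if 0.0 < new < hi:  # out-of-range disturbances leave p unchanged
                    worst.params[p] = new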
4.1 The Construct Ant Solutions Method
In the construction phase, the heuristic information and the trail are used for two different purposes. The first node, i, is randomly selected and positioned at random in the graph. Then the heuristic information is used to ascertain which unplaced nodes (if any) should be the neighbors of i. Afterwards, for each of those potential neighbors, the trail is used to select the slot (from the free slots that are immediately North, East, South or West of i) where it should be placed. After all the possible neighbors are placed, node i is said to be connected, and the process is repeated for each of the new neighbors. If all the placed nodes are connected but there are still some unplaced nodes, one of them is randomly selected and placed at a random free slot in the graph in order to continue the construction of the solution. This process is repeated until all the nodes are placed. For each pair of nodes i, j the heuristic value ηij is equal to tij + tji (this information is extracted from the traffic matrix T). Given the way we construct the solution, we are also interested in the relative orientation of the nodes. As such, the trail is used to assign a value to each
triple (i, d, j), where i and j are nodes and d ∈ {North, East, South, West}. τidj denotes the value associated with placing j to the d of i (for example, if d = North, τidj stands for the value of placing j immediately to the north of i). For a given placed node i, let C be the set of all the available nodes j for which ηij > 0. If C is empty, no neighbor is selected and i is considered connected. Otherwise we use the pseudo-proportional rule (4) to select the next neighbor to be placed, where q is a uniformly distributed variable over [0, 1], q0 ∈ [0, 1] is a preset parameter and argmaxx f(x) represents the value of x for which f(.) is maximized:

    j = argmaxj∈C ηij^β              if q < q0
    j = a node selected using (5)    otherwise      (4)

Equation (5) gives the probability pij of a node j in C being selected as the next neighbor of i to be placed:

    pij = ηij^β / Σl∈C ηil^β    (5)

The placed but unconnected nodes are stored in a FIFO queue. The formula used to choose the slot where to place a given node j (selected to be a neighbor by i) is also a pseudo-proportional rule. Let D be the set of directions in which the slots surrounding i are free. The direction to be used, e, is given by (6), where q is once again a uniformly distributed variable over [0, 1]:

    e = argmaxd∈D τidj^α                  if q < q0
    e = a direction selected using (7)    otherwise      (6)

The probability pidj of choosing direction d ∈ D is calculated using (7):

    pidj = τidj^α / Σl∈D τilj^α    (7)
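Equations (4)-(7) share one pattern, which can be sketched generically; `score` would be ηij (with exponent β) for the neighbor choice and τidj (with exponent α) for the slot choice:

    import random

    def pseudo_proportional(candidates, score, exponent, q0):
        weights = [(c, score(c) ** exponent) for c in candidates]
        if random.random() < q0:  # exploitation: argmax of the weighted score
            return max(weights, key=lambda cw: cw[1])[0]
        total = sum(w for _, w in weights)  # exploration: roulette-wheel choice
        r = random.uniform(0.0, total)
        for c, w in weights:
            r -= w
            if r <= 0:
                return c
        return weights[-1][0]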
5 Experiments
Several experiments were performed to compare the results obtained by MC-ANT as we vary the number of colonies. The benchmark instances used were those proposed by [13] and also used by [20]. The benchmark set consists of 80 instances of 4 problem sizes (n = 4 × 4, n = 8 × 8, n = 16 × 16, n = 32 × 32), with 20 matrices for each given size. We selected the first 10 problems in the n = 8 × 8 and n = 16 × 16 data sets to perform the experiments reported here. In all experiments the colonies shared the same initial parameters: α = 1, β = 2, ρ = 0.1, q0 = 0.9 and δ = 0.05. The value τ0 is set to 1/L where L is the optimal solution quality. Each experiment was run 30 times.
5.1 Results
In the following, when referring to the problem sets we identify them by size, such as n = 4 × 4, and to the configurations as c × a, where c denotes the number of colonies and a the number of ants per colony. Note that for a given problem size, the total number of ants (and hence, the number of explored solutions) is the same regardless of the configuration. In table 1 we present the mean best-fitness (MBF) and the best-run and worst-run solution qualities discovered in the 30 runs, both for the n = 08 × 08 and n = 16 × 16 data sets. The results are averages over the 10 instances. In general, for each instance, the MBF decreases as the number of colonies increases.

Table 1. MBF, best-run and worst-run solutions discovered for the n = 08 × 08 and n = 16 × 16 data sets. The results are averages over 10 problem instances.

dimension  optimum  configuration  best-run  MBF    worst-run
n08x08     76       01x064         76.1      76.9   79.0
                    02x032         76.1      76.7   78.7
                    04x016         76.1      76.5   77.9
                    08x008         76.1      76.3   77.1
n16x16     307      01x256         321.1     340.1  360.6
                    02x128         317.7     338.0  356.9
                    04x064         316.6     335.3  356.6
                    08x032         315.7     333.4  350.3
                    16x016         317.1     332.3  349.2
In each one of the n = 08 × 08 instances, the quality of the best solution found was the same for all the configurations. For the MBF and worst solutions, the relative order of the global performance depicted in table 1 is the same as the one observed in the individual instances. As for the n = 16 × 16 data set, the 16 × 16 configuration achieved the lowest MBF for nearly all instances. The best solutions were usually found by configurations with multiple colonies. The single-colony configuration was the least effective in all the performance measures displayed. To complement the previous results, in fig. 4 we present the evolution of the MBF. The results displayed are averages of the 10 instances for the n = 16 × 16 data set. As the number of iterations increases, the difference in the quality of the solutions found by each configuration becomes more noticeable, apparently favoring those with more colonies. It is visible that the solutions are still improving even after 2500 iterations, when the simulation was stopped. The line slopes vary slightly according to the problem, but as a general rule the configurations with more colonies are the ones showing the highest rate of improvement at the end of the simulation. These results suggest that the multi-colony approach is able to postpone convergence.
Fig. 4. Evolution of the MBF averaged over the 10 instances for the n = 16 × 16 data set
To gain a deeper insight into the algorithm's behavior we also studied the migration flow. As expected, it tends to be more intense in the beginning of the runs and then slowly becomes less frequent, although it does not stop. Still, for instances where the optimal solution was harder to find, the frequency of migration remained higher when compared with instances that were more easily solved. Configurations with more colonies have a higher frequency of migration, as expected. One important point to investigate is whether the migration is so intense that it leads to all the colonies converging to the same path. In order to ascertain this we measured the evolution of the average distance between the trails of each pair of colonies. In fig. 5 we present two examples of trail differences (averaged over 30 runs each): in panel a) we display results for the p09 instance from the n = 16 × 16 data set; in panel b) we can see the results for the p08 instance from the n = 8 × 8 data set. The same trend is visible for the other instances. Initially all colonies have the same trail and, as expected, in the early stages of the optimization they become distinct. Results from the chart depicted in fig. 5 reveal that the trails are able to remain separated until the end of the execution. The (average) distance increases slightly with the number of colonies. In the n = 16 × 16 data set, for some instances (specifically the ones for which the algorithm was able to find better solutions) the gap between the values for the 16 × 16 and 8 × 32 configurations versus the other configurations seems to be larger. In instances where the algorithm is less effective the difference seems to be below the average, especially for the configurations with more colonies. For the moment it is still not clear why this happens and how relevant it is for the behavior of the algorithm, but we plan to address this issue in our future research. In the smaller instances, after the initial rise in the distances there is a slight decrease and then the curves remain stable, as can be seen in fig. 5 b). We believe that the decrease is due to a more intense migration as soon as some colonies find high-quality solutions. After some iterations, all the colonies are able to find a very good solution and as such there is little alteration in the paths.
Fig. 5. Trail distance (averaged for 30 runs) for p09 in the n = 16 × 16 (a) and p08 in the n = 08 × 08 (b)
In addition to the migration of solutions, the proposed architecture allows for the self-adaptation of parameters. Due to space constraints we cannot present a complete analysis of its influence on the behavior of the algorithm. We nevertheless provide some evidence that our approach is able to converge to specific settings that increase the likelihood of finding good solutions. For each run of a given instance we recorded the current value of the parameters when the best solution was found. This allowed us to determine a range (rgeneral) and an average value (ageneral) for each of the parameters. We then selected the subset of runs that found the highest-quality solution (for that instance) and calculated the range (rbest) and the average value (abest) obtained considering only those runs. These results were taken for each configuration. An example is depicted in table 2, showing the values obtained by the 08 × 032 configuration on instance p01 of the n = 16 × 16 problem set. This is an example of a situation where the best solution was found by several colonies. The same holds true for other configurations and instances of the n = 16 × 16 data set. This result confirms that the parameters have an influence on the quality of the solutions found, and allowing the parameters to adjust may improve the algorithm performance, particularly in situations where the optimal settings are not known in advance. As for the smaller instances (n = 8 × 8), the very best solution was typically found hundreds of thousands of times by each configuration (as opposed to usually much less than one hundred times for n = 16 × 16) and rbest is almost identical to rgeneral.
Table 2. Parameter ranges obtained by the 08 × 032 configuration for p01 in the n = 16 × 16 data set

                  α     β     ρ     q0
general maximum   2.74  3.74  0.24  1.00
general minimum   0.02  0.49  0.00  0.73
general average   1.14  2.02  0.11  0.91
best maximum      1.73  1.86  0.11  0.98
best minimum      1.35  1.40  0.10  0.97
best average      1.54  1.63  0.11  0.98
6 Conclusions
This paper presents MC-ANT, a multi-colony variation of the ACO. Each colony has its own trail and parameter settings and, periodically, information may be exchanged in order to improve the search abilities of the algorithm. Additionally, a mutation mechanism allows for the self-adaptation of the parameters. The proposed approach was applied to several instances of the NPP. Results show that the multi-colony configurations consistently outperform the single colony. For almost every instance the MBF decreases as the number of colonies increases. Also, the multi-colony configurations were able to avoid premature convergence, this effect being more noticeable in configurations with more colonies. The migration flow behaved as expected, being stronger in the beginning and in the configurations with more colonies, and gradually decreasing. Still, the migration was gentle enough to allow the trails to remain separated and thus avoid the convergence of the colonies to the same trail. A brief analysis of how the parameter values adjust during the optimization shows that they can create a positive bias towards promising areas of the search space, improving the algorithm performance. This is a key issue in our approach and it will be studied in depth in the near future. Acknowledgments. This work was supported by Fundação para a Ciência e a Tecnologia, under grant SFRH/BD/38945/2007.
References

1. Angus, D., Woodward, C.: Multiple objective ant colony optimisation. Swarm Intelligence 3, 69–85 (2009)
2. Bentley, J.L.: Fast algorithms for geometric traveling salesman problems. ORSA Journal on Computing 4, 387–411 (1992)
3. Deneubourg, J.L., Aron, S., Goss, S., Pasteels, J.M.: The self-organizing exploratory pattern of the argentine ant. Journal of Insect Behavior 3(2) (1990)
4. Dorigo, M., Birattari, M., Stützle, T.: Ant colony optimization - artificial ants as a computational intelligence technique. Technical report, Université Libre de Bruxelles, Institut de Recherches Interdisciplinaires et de Développements en Intelligence Artificielle (September 2006)
5. Dorigo, M., Gambardella, L.M.: Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation 1(1), 53–66 (1997)
6. Dorigo, M., Maniezzo, V., Colorni, A.: Positive feedback as a search strategy. Tech. rep., Politecnico di Milano, Italy (1991)
7. Dorigo, M., Maniezzo, V., Colorni, A.: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics 26(1), 29–41 (1996)
8. Dorigo, M., Stützle, T.: Ant Colony Optimization. A Bradford Book. MIT Press, Cambridge (2004)
9. Ellabib, I., Calamai, P., Basir, O.: Exchange strategies for multiple ant colony system. Information Sciences: an International Journal 177(5), 1248–1264 (2007)
10. García-Martínez, C., Cordón, O., Herrera, F.: A taxonomy and an empirical analysis of multiple objective ant colony optimization algorithms for the bi-criteria TSP. European Journal of Operational Research 180(1), 116–148 (2007)
11. Goss, S., Aron, S., Deneubourg, J.L., Pasteels, J.M.: Self-organized shortcuts in the argentine ant. Naturwissenschaften 76, 579–581 (1989)
12. Janson, S., Merkle, D., Middendorf, M.: Parallel Ant Colony Algorithms. In: Parallel Metaheuristics, pp. 171–201. John Wiley & Sons, Chichester (2005)
13. Katayama, K., Yamashita, H., Narihisa, H.: Variable depth search and iterated local search for the node placement problem in multihop WDM lightwave networks. In: IEEE Congress on Evolutionary Computation, pp. 3508–3515 (2007)
14. Kato, M., Oie, Y.: Reconfiguration algorithms based on meta-heuristics for multihop WDM lightwave networks. In: Proceedings IEEE International Conference on Communications, pp. 1638–1644 (2000)
15. Komolafe, O., Harle, D.: Optimal node placement in an optical packet switching manhattan street network. Computer Networks 42, 251–260 (2003)
16. Maxemchuk, N.F.: Regular mesh topologies in local and metropolitan area networks. AT&T Technical Journal 64, 1659–1685 (1985)
17. Michel, R., Middendorf, M.: An ACO Algorithm for the Shortest Common Supersequence Problem. In: New Ideas in Optimization, pp. 51–61. McGraw-Hill, London (1999)
18. Middendorf, M., Reischle, F., Schmeck, H.: Multi colony ant algorithms. Journal of Heuristics 8(3), 305–320 (2002)
19. Stützle, T., Hoos, H.H.: The MAX-MIN ant system and local search for the traveling salesman problem. In: Bäck, T., Michalewicz, Z., Yao, X. (eds.) IEEE International Conference on Evolutionary Computation, pp. 309–314. IEEE Press, Piscataway (1997)
20. Toyama, F., Shoji, K., Miyamichi, J.: An iterated greedy algorithm for the node placement problem in bidirectional manhattan street networks. In: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, pp. 579–584. ACM, New York (2008)
21. Tsai, C.F., Tsai, C.W., Tseng, C.C.: A new hybrid heuristic approach for solving large traveling salesman problem. Information Sciences 166, 67–81 (2004)
22. Yonezu, M., Funabiki, N., Kitani, T., Yokohira, T., Nakanishi, T., Higashino, T.: Proposal of a hierarchical heuristic algorithm for node assignment in bidirectional manhattan street networks. Systems and Computers in Japan 38(4) (2007)
Artificial Evolution for 3D PET Reconstruction

Franck P. Vidal1,2,⋆, Delphine Lazaro-Ponthus2, Samuel Legoupil2, Jean Louchet1,3, Évelyne Lutton1, and Jean-Marie Rocchisani1,4
1 INRIA Saclay - Île-de-France/APIS, Parc Orsay Université, 4 rue Jacques Monod, 91893 Orsay Cedex, France
2 CEA, LIST, Saclay, F-91191 Gif-sur-Yvette, France
3 Artenia, 24 rue Gay-Lussac, 92320 Châtillon, France
4 Paris XIII University, UFR SMBH & Avicenne hospital, 74 rue Marcel Cachin, 930013 Bobigny, France
Abstract. This paper presents a method to take advantage of artificial evolution in positron emission tomography reconstruction. This imaging technique produces datasets that correspond to the concentration of positron emitters throughout the patient. Fully 3D tomographic reconstruction requires high computing power and leads to many challenges. Our aim is to reduce the computing cost while producing datasets of the required quality. Our method is based on a co-evolution strategy (also called Parisian evolution) named the "fly algorithm". Each fly represents a point of the space and acts as a positron emitter. The final population of flies corresponds to the reconstructed data. Using "marginal evaluation", the fitness of a fly is the positive or negative contribution of this fly to the performance of the population. This is also used to skip the relatively costly step of selection and to simplify the evolutionary algorithm.
1 Introduction
Fully 3D tomographic reconstruction in nuclear medicine requires high computing power and leads to many challenges. Indeed, tomographic reconstruction is an ill-posed inverse problem: a solution cannot be assumed to exist (e.g. in extreme cases of excessive noise), and a unique solution does not necessarily exist. Conventional reconstruction methods are analytical or based on statistical analysis, such as the maximum-likelihood expectation-maximization (ML-EM)¹ [12] or the ordered-subset expectation-maximization (OS-EM) [6] algorithms. A broad overview of reconstruction algorithms in nuclear medicine can be found in [7]. The trend today is to use more general methods that can integrate more realistic models (application-specific physics and data-acquisition system geometry). To date, the use of such methods is still restricted due to the heavy computing power needed. Evolutionary algorithms have proven to be efficient optimisation techniques in various domains [10], including medicine [11] and medical imagery [2, 4, 15].
⋆ Member of Fondation Digiteo (http://www.digiteo.fr).
¹ See Section on "Acronyms" for a list of acronyms.
However, their use in tomographic reconstruction has been largely overlooked. In a previous paper, we showed that a cooperative co-evolution strategy (also called Parisian evolution) named the "fly algorithm" [8] could be used in single-photon emission computed tomography (SPECT) reconstruction [3]. Here, each fly corresponds to a 3D point that is emitting photons. The evolutionary algorithm is then used to optimise the position of the flies. After convergence, the set of flies corresponds to the reconstructed volume. However, positron emission tomography (PET) – the other main tomographic technique in nuclear medicine – has taken over SPECT in routine clinical practice. Although the underlying physics and the design of imaging systems in SPECT and PET are different, it is possible to use a reconstruction method that is similar to the one that we initially proposed for SPECT data. In this case, PET raw data needs to be converted into sinograms (see Fig. 4(b) for an example of a synthetic sinogram). This pre-processing step introduces a sampling that constrains the resolution of the reconstructed images, and most of the input sinogram is empty. During the reconstruction using the evolutionary algorithm, it is difficult to take into account the physics and the geometrical properties of the imaging system due to this intermediate data representation, and only a few pixels of the simulated images will contain useful information. Moreover, the memory-usage issue identified in [3] remains. It is therefore not straightforward to achieve an efficient fully 3D reconstruction in PET. In this paper we propose a new approach to overcome the disadvantages presented above. It makes use of a simplified geometry model that still matches the acquisition system properties. Our long-term goal is to include Compton scattering (the dominant physical perturbation in the input data) correction in the evolution loop to further improve the quality of the final reconstructed image. The following section gives an overview of the context and objectives of this study. Our methodology is described in Section 3. The results and performance of our reconstruction method are presented in Section 4. The last section discusses the work that has been carried out and provides directions for further work.
2 Context and Objectives
Nuclear medicine [1] appeared in the 1950s. Its principle is to diagnose or treat a disease by administering to patients a radioactive substance (also called a tracer) that is absorbed by tissue in proportion to some physiological process. This is the radiolabelling process. In the case of diagnostic studies, the distribution of the substance in the body is then imaged. It is generally a functional form of imaging, because the purpose is to obtain information about physiological processes rather than anatomical forms and structures. When a pathology occurs, the metabolism increases and there are more tracer molecules in the pathology area; consequently, the radioactivity also increases. There are two classes of techniques to produce 3D data in nuclear medicine: SPECT and PET. They allow 3D reconstruction of the distribution of the
tracer through the body. In SPECT, a gamma emitter is used as the radioactive tracer. Similarly to conventional computed tomography (CT) [9], multiple 2D projections are recorded at successive angles, followed by a mathematical reconstruction. The main limitations include the finite spatial resolution and sensitivity of the detectors, physical effects (such as absorption, Compton scattering and noise), long exposure times, and the accuracy of the reconstruction. In PET, a positron emitter is used as the radionuclide for labelling rather than a gamma emitter. After interactions, a positron combines with an electron to form a positronium; the electron and positron pair is then converted into radiation: this is the annihilation reaction. It produces two photons of 511 keV emitted in opposite directions. Annihilation radiations are then imaged using a system dedicated to PET. This system operates on the principle of coincidence, i.e. it uses the difference in arrival times of the photons of each detected pair, knowing that each annihilation produces two photons emitted in exactly opposite directions. Our previous attempt to use cooperative coevolution was in SPECT [3]. It is based on the fly algorithm [8]. Each fly represents a point in the patient's 3D space and acts as a radioactive emitter. The final population of flies corresponds to the tracer density in the patient being scanned, i.e. the reconstructed data. It uses a "marginal fitness" metric based on the "leave-one-out cross-validation" method to evaluate the contribution of each fly, as will be explained in detail in Section 3.1. This metric gives the contribution (positive or negative) of a given fly with respect to the whole population. In SPECT, the input data corresponds to raw 2D projections at successive angles around the patient, which can be converted into a sinogram format. However, to speed up the reconstruction time and reduce the amount of memory needed by the algorithm, only 3% of the input data is used at a time by a fly during the reconstruction. Indeed, for each fly, only four orthogonal projections are simulated. This SPECT reconstruction approach, based on artificial evolution, gave promising results. However, PET is considered to be the gold-standard tomographic technique in nuclear medicine due to its higher sensitivity. The section below shows how to adapt the fly algorithm more efficiently to this case, taking into account the specificity of PET data.
3 Reconstruction Method

3.1 Artificial Evolution Algorithm for PET
In [3], we showed that, when addressing the SPECT problem, defining the fitness of a fly as the consistency of the image pattern it generates with the actual images gave an important bias to the algorithm, with a tendency for smaller objects to disappear. This is why we then introduced marginal evaluation, where the fitness of a given fly is not evaluated in itself, but as the (positive or negative) contribution to the likeness between the image produced by the complete population of flies and the actual image. In other words, to evaluate a given fly, we first evaluate the fitness of the whole population - the distance between the total illumination pattern created
by all the flies and the actual image - then evaluate the fitness of the same total population without the fly that is being evaluated, and calculate the difference:

$$\mathrm{fitness}_m(i) = \mathrm{fitness}(\mathrm{population} - \{i\}) - \mathrm{fitness}(\mathrm{population}) \quad (1)$$
with $\mathrm{fitness}_m(i)$ the marginal fitness of a given fly $i$, $\mathrm{fitness}(\mathrm{population} - \{i\})$ the fitness metric of the population without the fly being evaluated, and $\mathrm{fitness}(\mathrm{population})$ the fitness metric of the whole population of flies. This particular method of calculating fitnesses does not induce any extra computational load. Each time a new fly is created, a simulation is run to calculate its illumination pattern, which is kept in memory. Each time a fly is destroyed or created, the total illumination pattern is updated. When a fly has to be evaluated, the global fitness is readily calculated using the total illumination pattern, and the global fitness "minus one" is calculated the same way using the "total minus one" illumination pattern. This fitness calculation method is then integrated into an evolution strategy scheme (see Fig. 1). The population of flies is first initialised randomly. In the test experiments shown in this paper, the flies are initialised inside a cylinder contained between the sensor crystals. In real applications, the actual shape of the body may be used.
Fig. 1. Reconstruction algorithm. The flowchart proceeds as follows: create the initial population of n flies; for each fly, simulate the path of n photons and update the total population pattern; then, in the evolution loop, select a fly to kill (any fly whose fitness is negative), select a fly to mutate (any fly whose fitness is positive), replace the dead fly by the mutated fly, simulate the path of n photons for the mutated fly, and update the total population pattern; when the stop criterion is met, the solution is the set of flies with positive fitness.
In the second part of the initialisation, in the case of SPECT, a fly produces an adjustable number of photons to compute its own image pattern. Once each pattern is computed, the sum of these patterns is stored as the population's pattern and the population's global fitness is calculated by comparing this pattern to the actual image. In the case of PET, each fly produces an adjustable number of annihilation events. The result of this simulation consists of a list of pairs of detector identification numbers that correspond to annihilations (see Section 3.3 for details). The evolution loop then begins. It aims to optimise the position of the flies with respect to the input data, e.g. using a mutation operator to modify the location of a given fly. An interesting point is that our method of calculating the fitness delivers negative values for flies with a negative contribution and positive values otherwise. It was therefore tempting to skip the classical selection step and use a fixed threshold - zero! - as the only selection criterion. Thus we built a steady-state evolution strategy where, in order to choose the fly that has to be replaced, we draw flies randomly until a fly with a negative fitness is found. It is then eliminated and replaced by the application of evolutionary operators to a parent fly chosen the same way, but with a positive fitness.
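To make this bookkeeping concrete, the C++ sketch below shows one possible implementation of the total-pattern update and of the marginal fitness of Eq. (1). All type and function names are ours, the pattern is reduced to a flat vector of pixel intensities, and normalisation is omitted; this is a minimal sketch, not the authors' code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A "pattern" is the simulated image of a fly (or of the whole population).
using Pattern = std::vector<double>;

struct Fly {
    Pattern pattern;  // illumination pattern, simulated once when the fly is created
};

struct Population {
    std::vector<Fly> flies;
    Pattern total;  // sum of all fly patterns, kept up to date

    void add(const Fly& f) {
        flies.push_back(f);
        for (std::size_t p = 0; p < total.size(); ++p)
            total[p] += f.pattern[p];
    }

    void remove(std::size_t i) {
        for (std::size_t p = 0; p < total.size(); ++p)
            total[p] -= flies[i].pattern[p];
        flies.erase(flies.begin() + static_cast<std::ptrdiff_t>(i));
    }

    // fitness_m(i) = fitness(population - {i}) - fitness(population),
    // where "fitness" is a distance to the actual image (here, a sum of
    // absolute differences). The value is positive when removing fly i
    // worsens the match, i.e. when the fly contributes positively; no new
    // simulation is needed, only the stored patterns.
    double marginalFitness(std::size_t i, const Pattern& actual) const {
        double dAll = 0.0, dMinusOne = 0.0;
        for (std::size_t p = 0; p < actual.size(); ++p) {
            dAll      += std::fabs(total[p] - actual[p]);
            dMinusOne += std::fabs(total[p] - flies[i].pattern[p] - actual[p]);
        }
        return dMinusOne - dAll;
    }
};
```

With this sign convention, the threshold-zero selection above reduces to drawing random flies and testing `marginalFitness(i, actual) < 0` for the fly to kill.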
3.2 Sinogram Mode
During the annihilation, two photons of 511 keV are emitted in opposite directions. The photons of a single pair are called coincidence photons. When two photons are detected within a predefined time window 2τ, the PET imaging system records which crystals have been activated, i.e. the positions of the photons within the imaging system. It is possible to convert the coincidence data into a sinogram format [5]. This is convenient, as it enables the use of standard reconstruction methods originally developed for CT or SPECT. Fig. 2 shows how this conversion can be achieved. When an annihilation event occurs, two photons are emitted in coincidence at 180 degrees, and two detectors are activated almost at the same time. The line between the activated detectors is called a "line of response" (LOR). To generate a sinogram, sampling is needed along:
– the horizontal axis of the sinogram, which corresponds to the minimum distance between a LOR and the centre point of the system (see distance r in Fig. 2);
– the vertical axis of the sinogram, which corresponds to the angle between a LOR and the horizontal plane of the system (see angle α between the LOR and the dashed line in Fig. 2).
It is then possible to use a reconstruction method similar to the one that we initially proposed for SPECT data. However, we saw that using sinograms in PET introduces drawbacks (such as sampling, difficulty in taking advantage of the physics and geometrical properties of the imaging system, memory usage, etc.), and that a new approach dedicated to PET is therefore required.
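As an illustration of this conversion, the C++ sketch below (our own, with hypothetical names) computes the sinogram coordinates (r, α) of a LOR from the 2D positions of the two activated detectors; the resulting continuous values would then be discretized onto the sinogram grid, which is exactly the sampling step discussed above.

```cpp
#include <cmath>

// Sinogram coordinates of a LOR: the angle alpha between the line and the
// horizontal axis, and the signed minimum distance r to the centre O.
struct SinogramCoord { double r, alpha; };

SinogramCoord lorToSinogram(double x1, double y1, double x2, double y2) {
    const double pi = std::acos(-1.0);
    double dx = x2 - x1, dy = y2 - y1;
    double alpha = std::atan2(dy, dx);
    // A LOR is an undirected line: fold the angle into (-pi/2, pi/2].
    if (alpha > pi / 2.0)        alpha -= pi;
    else if (alpha <= -pi / 2.0) alpha += pi;
    // Perpendicular distance from the origin to the line through the two
    // detector positions: 2D cross product divided by the segment length.
    double r = (x1 * y2 - x2 * y1) / std::hypot(dx, dy);
    return {r, alpha};
}
```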
Fig. 2. Conversion from coincidence events to 2D sinograms. The figure shows an annihilation, the two activated detectors, the resulting line of response (LOR) with its angle α and its distance r to the centre O, and the corresponding pixel of the sinogram (angle, from −90° to 90°, on the vertical axis; r on the horizontal axis).
3.3 LOR Mode
It is possible to model the actual geometry of the imaging system so as to use the coincidence data directly, without any conversion. In practice, PET imaging systems are made of blocks of collinear detectors [14]. These blocks are located circularly to constitute a cylinder, and each crystal is identified by a unique identification number. Note that several cylinders of blocks are used in a PET scanner. Here, a fly acts as a positron that emits random pairs of coincidence photons. The number of pairs per fly is a parameter that can be tuned. For each pair of photons, a direction is picked using uniformly distributed random points on the surface of a unit sphere. This gives the direction of the first photon of the pair, whilst the opposite vector gives the direction of the other photon. The fly's position and the photon's direction define a line. When this line intersects two crystals, a LOR is detected. Intersections are detected using efficient ray-tracing techniques (ray tracing is widely documented in the literature; a complete overview can be found in 3D Computer Graphics by A. Watt [16]). To speed up computations, the PET scanner is embedded in a bounding open cylinder. If one of the rays corresponding to a pair of photons does not intersect the bounding cylinder, then no LOR can be detected and the intersection tests between the rays and the crystals are skipped. Fig. 3 shows a simplified PET system with simulated LORs.
Fig. 3. Using a simplified PET system geometry
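A standard way to pick such uniformly distributed directions is shown in the C++ sketch below (our code, not the authors'): the z component, i.e. the cosine of the polar angle, is drawn uniformly in [−1, 1] and the azimuth uniformly in [0, 2π). Sampling the polar angle itself uniformly would concentrate directions at the poles, so this detail matters for an unbiased annihilation simulation.

```cpp
#include <cmath>
#include <random>

struct Vec3 { double x, y, z; };

// Draw a direction uniformly distributed on the unit sphere, as used to
// emit a pair of back-to-back annihilation photons.
Vec3 randomUnitDirection(std::mt19937& rng) {
    const double pi = std::acos(-1.0);
    std::uniform_real_distribution<double> zDist(-1.0, 1.0);
    std::uniform_real_distribution<double> phiDist(0.0, 2.0 * pi);
    double z   = zDist(rng);      // cos(theta), uniform in [-1, 1]
    double phi = phiDist(rng);    // azimuth, uniform in [0, 2*pi)
    double s   = std::sqrt(1.0 - z * z);
    return {s * std::cos(phi), s * std::sin(phi), z};
}

// The first photon travels along d and the second along -d; each ray from
// the fly's position is first tested against the bounding cylinder before
// any per-crystal intersection test is attempted.
```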
To evaluate a fly’s contribution, the concept of marginal fitness is used once again (see Eq. 1). The fitness metrics corresponds to a distance measurement between simulated data and the actual data given by the imaging system (note that data must be normalised). A LOR needs to be modelled using a pair of detector identification numbers. LORs cannot be efficiently accumulated in 2D images. Due to the fitness function, two lists are needed, one for the actual data, and one for the population data. The number of times a given LOR is encountered is stored in the corresponding record. Actual and simulated data are efficiently stored into indexed lists (the standard template library (STL) provides such containers [13]). The pair of identification numbers is used as the key of each record in the lists. It speeds up the memory access to records and reduces memory usage to its minimum. Indeed, empty records, when a pair of crystal identification numbers do not correspond to a LOR, are not stored. Also, each fly needs to store the LORs that it has generated. In this case, the fitness can be computed using a distance measurement between the lists. For efficiency reasons, we have chosen the City Block Distance metrics (sometimes called Manhattan Distance): |LORr (key).counter − LORs (key).counter| (2) d(LORr , LORs ) = with d(LORr , LORs ) the city block distance between LORr and LORs , the set of LORs for the real data and the simulated data respectively, and counter is the number of times that a given key appears in the LOR set. To compute Eq. 1, LORs corresponds either to the set of LORs of the whole population of flies or the set of LORs of the population without the fly that is being evaluated.
4 Results and Validation
This section presents the results obtained using synthetic data to validate the usefulness and accuracy of our novel PET reconstruction approach. First, we
compare images reconstructed from sinograms using our algorithm with reference images computed with an OS-EM implementation. Then, we evaluate images reconstructed from LOR data with respect to theoretical values.

4.1 Sinogram Mode
Let us consider the setup presented in Fig. 4. It is made of two spheres of radius 2.5 mm. The radioactivity of one of them is twice as great as the other's.
Fig. 4. Test sinogram: (a) the setup, two spheres of radius r = 0.25 with activities x and 2x in a field spanning −5 cm to 5 cm; (b) the synthetic sinogram.
Fig. 5 shows examples of tomographic slices reconstructed using our evolutionary method and the OS-EM algorithm. Method 1 consists of incrementing the voxel value of the reconstructed volume for each fly that lies in that voxel. Method 2 consists of accumulating, in each voxel, the marginal fitness of the flies with positive fitness that lie in that voxel. The final volume data is then normalised between 0 and 1. This normalisation step is needed to compare the results with the volume produced using the OS-EM algorithm. The reconstructed volumes appear visually similar to the reference volume. To further compare the results, profiles are extracted at the centre of each bright area in Fig. 5. Fig. 6 shows that these profiles are relatively close. In particular, the difference of intensity between the two bright areas is preserved in the reconstructed slices.
Fig. 5. Tomographic slices reconstructed using Fig. 4(b): (a) Method 1; (b) OS-EM; (c) Method 2.
Fig. 6. Profiles extracted from Fig. 5 (intensity vs. pixel number for Method 1, Method 2 and OS-EM): (a) upper left bright area; (b) lower right bright area.
4.2 LOR Mode
Test 1. Nine spheres with various radii and radioactivities are simulated in this test. Fig. 7 shows both the reconstructed and the real data. The reconstructed volumes appear visually similar to the reference volume. Once again, to further compare the results, profiles are extracted at the centre of each bright area in Fig. 7 (note that a mean filter is used to reduce the noise level). Fig. 8 shows that these profiles are relatively close. The profiles of the upper bright areas are symmetrically similar to those of the lower areas. The radius of each sphere accurately matches the corresponding radius in the reference volume. Also, the difference of radioactivity is preserved in the reconstructed slices. Test 2. Two cubes are simulated in this test. Fig. 9 shows the raw data after tomographic reconstruction and the real data. Likewise, the reconstructed volumes
Fig. 7. Tomographic slices: (a) Method 1; (b) real data; (c) Method 2.
Fig. 8. Profiles extracted from Fig. 7 (intensity vs. pixel number for Method 1, Method 2 and real data): (a) lower bright areas; (b) middle bright areas.
Fig. 9. Tomographic slices: (a) Method 1; (b) real data; (c) Method 2.
appear to be visually similar to the reference volume. Using the same method, profiles are extracted at the centre of each bright area. Fig. 10 shows that the cube lengths accurately match the real values.
Fig. 10. Profiles extracted from Fig. 9 (intensity vs. pixel number for Method 1, Method 2 and real data): (a) lower left bright area; (b) upper right bright area.
5 Discussion and Conclusion
In this paper, we show that evolutionary computation is a promising technique to solve the usually computationally expensive problem of reconstructing 3D images from PET data. Whilst it is possible to use sinograms in PET, this option is not satisfactory due to the drawbacks discussed in Section 3.2. Instead, a simplified geometrical model of the PET scanner is used to simulate annihilation events. To date, the photons' trajectories are simulated without interaction with matter. This approach is closer to reality and gives promising results on synthetic data. More realistic physics simulations could also be added to correct for Compton scattering. Another point we have raised is that, when using "marginal evaluation", an individual's fitness is not calculated in an absolute manner, but as the positive or negative contribution of this individual to the performance of the complete population. A consequence is that the relatively costly selection step may be skipped and the evolutionary algorithm simplified. In addition, in some cases this may even provide a stopping criterion: if at some stage all the individuals have a positive fitness value, they are all contributing positively to the reconstruction. This should be applicable in other areas of evolutionary computation and coevolution, not necessarily restricted to medical imaging, whenever the marginal fitness paradigm is used.
List of Acronyms
CT computed tomography
LOR line of response
ML-EM maximum-likelihood expectation-maximization
OS-EM ordered subset expectation-maximization
PET positron emission tomography
SPECT single-photon emission computed tomography
STL standard template library
Acknowledgements

This work has been partially funded by the Agence Nationale de la Recherche (ANR).
References
1. Badawi, R.D.: Nuclear medicine. Phys. Educ. 36(6), 452–459 (2001)
2. Bosman, P.A.N., Alderliesten, T.: Evolutionary algorithms for medical simulations: a case study in minimally-invasive vascular interventions. In: Proceedings of the 2005 Workshops on Genetic and Evolutionary Computation (GECCO '05), pp. 125–132 (2005)
3. Bousquet, A., Louchet, J., Rocchisani, J.M.: Fully three-dimensional tomographic evolutionary reconstruction in nuclear medicine. In: Monmarché, N., Talbi, E.-G., Collet, P., Schoenauer, M., Lutton, E. (eds.) EA 2007. LNCS, vol. 4926, pp. 231–242. Springer, Heidelberg (2008)
4. Cagnoni, S., Dobrzeniecki, A.B., Poli, R., Yanch, J.C.: Genetic algorithm-based interactive segmentation of 3D medical images. Image Vision Comput. 17(12), 881–895 (1999)
5. Fahey, F.H.: Data acquisition in PET imaging. J. Nucl. Med. Technol. 30(2), 39–49 (2002)
6. Hudson, H.M., Larkin, R.S.: Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Imaging 13(4), 601–609 (1994)
7. Lewitt, R.M., Matej, S.: Overview of methods for image reconstruction from projections in emission computed tomography. Proceedings of the IEEE 91(10), 1588–1611 (2003)
8. Louchet, J.: Stereo analysis using individual evolution strategy. In: Proceedings of the International Conference on Pattern Recognition (ICPR '00), p. 1908 (2000)
9. Michael, G.: X-ray computed tomography. Phys. Educ. 36(6), 442–451 (2001)
10. Olague, G., Cagnoni, S., Lutton, E.: Introduction to the special issue on evolutionary computer vision and image understanding. Pattern Recognit. Lett. 27(11), 1161–1163 (2006)
11. Peña-Reyes, C., Sipper, M.: Evolutionary computation in medicine: an overview. Artif. Intell. Med. 19(1), 1–23 (2000)
12. Shepp, L.A., Vardi, Y.: Maximum likelihood reconstruction for emission tomography. IEEE Trans. Med. Imaging 1(2), 113–122 (1982)
13. Silicon Graphics, Inc.: Standard Template Library programmer's guide, http://www.sgi.com/tech/stl/
14. Townsend, D.W.: Physical principles and technology of clinical PET imaging. Ann. Acad. Med. Singap. 33(2), 133–145 (2004)
15. Völk, K., Miller, J.F., Smith, S.L.: Multiple network CGP for the classification of mammograms. In: Giacobini, M., et al. (eds.) EvoCOMNET. LNCS, vol. 5484, pp. 405–413. Springer, Heidelberg (2009)
16. Watt, A.: 3D Computer Graphics, 3rd edn. Addison-Wesley, Reading (2000)
A Hybrid Genetic Algorithm/Variable Neighborhood Search Approach to Maximizing Residual Bandwidth of Links for Route Planning

Gajaruban Kandavanam, Dmitri Botvich, Sasitharan Balasubramaniam, and Brendan Jennings

TSSG, Waterford Institute of Technology, Ireland
Abstract. This paper proposes a novel approach to performing residual bandwidth optimization with QoS guarantees in multi-class networks. The approach combines the use of a new, highly scalable hybrid GA-VNS algorithm (Genetic Algorithm with Variable Neighborhood Search) with the efficient and accurate estimation of QoS requirements using empirical effective bandwidth estimations. Given a QoS-aware demand matrix, experimental results indicate that the GA-VNS algorithm shows a significantly higher success rate in terms of converging to an optimal/near-optimal solution in comparison to a pure GA and to another combination of GA and local search heuristic, and also exhibits better scalability and performance. Additional results show that the proposed solution performs significantly better than OSPF in optimizing residual bandwidth in medium to large sized networks.
1 Introduction
QoS-aware network planning remains critically important for Internet Service Providers (ISPs), given the challenges they face due to the heterogeneity of network infrastructure and the dynamism of network traffic. There have been many research efforts to address the route planning problem. Kodialam and Lakshman [1] proposed an integer programming formulation to find two disjoint paths for each demand pair during route planning. One path is used as the primary and the other as the secondary. The secondary paths are selected in such a way that they share link capacity when their corresponding primaries do not have any links in common. They showed through experiments that complete information regarding the allocated paths is not necessary to find a near-optimal bandwidth allocation. Riedl and Schupke [2] proposed a routing algorithm that takes into account a concave link metric in addition to an additive one, showing that better utilization can be achieved when both metrics are used to perform the routing. They presented a mixed integer programming model to work on the metrics for small networks and a GA-based technique for large networks. Applegate and Cohen [3] proposed a routing model based on linear programming. The maximum link utilization was taken as the metric to optimize routing. They further showed that perfect knowledge of the traffic matrix was not required to perform robust routing under dynamic traffic conditions.
Kodialam et al. [4] proposed an on-line multicast routing algorithm which minimizes a cost function. Their heuristic algorithm uses knowledge of the ingress and egress points of potential future demands to avoid using loaded links that may be required for those demands. They also presented results that showed a reduction in the call rejection rate. The solution proposed by Yaiche et al. [5] for optimal bandwidth allocation of elastic demands is based on a game-theoretic framework. The Nash equilibrium is used as the bargaining point of the bandwidth requirement. The solution was presented in both centralized and distributed forms. In previous work [9] we proposed a heuristic solution based on a GA to support bandwidth-guaranteed routing for clustered topologies. We showed through experiments that the proposed algorithm, which is distributed in the sense that individual clusters are treated in parallel, outperformed similar algorithms that do not take cognizance of the clustered nature of the topology. However, the above formulations of the well-known routing problem do not always produce desirable route plans. For example, choosing to minimize the maximum link utilization does not guarantee minimal load on each link in the network. We formulate the routing problem as maximizing the residual bandwidth of all links in a network. This is a significant multi-modal constrained optimization problem that deterministic heuristic techniques are incapable of addressing. We revisit the use of GAs [13] as a means of addressing the above bandwidth optimization problem in network planning. GAs have been proposed for various non-linear optimization problems in communication networks [2,14], where deterministic optimization techniques such as linear programming are not applicable. However, it is well known that standard GA approaches often do not guarantee convergence to feasible solutions in the context of constrained optimization problems [12]. Failure to guarantee convergence is an important limitation, especially in the context of route planning for large networks, where application of these techniques can be computationally expensive. In this paper, we present a novel algorithm (denoted GA-VNS) to maximize the residual bandwidth of links for route planning, based on augmenting a standard GA approach with the VNS approach proposed by Tasgetiren et al. [11]. We assume that the input to the algorithm is a demand matrix for the network that is QoS-aware in the sense that the estimated demand for a given source-destination pair takes cognizance of the effective bandwidth [10] of the traffic expected to flow from that source to that destination. In previous work [8], we outlined a cost-effective approach to QoS-aware demand matrix preparation based on a measurement-based approach to effective bandwidth estimation. Using traffic matrices prepared following this approach, we investigate the performance of GA-VNS in comparison to a standard GA algorithm and a GA algorithm augmented with another local search heuristic called Fixed Neighborhood Search (denoted GA-FNS), for a number of randomized and real network topologies. Our results show that GA-VNS always finds an optimal or near-optimal solution, whereas the other two algorithms fail to find a feasible solution for a significant proportion of attempts. Furthermore, the algorithm is shown to
scale well to large networks and to outperform the commonly deployed OSPF intra-AS routing protocol in terms of maximizing link residual bandwidth.
2 Residual Bandwidth Optimization Problem
Our residual bandwidth optimization problem can be stated informally as: identify a routing plan for a network that maximizes the residual bandwidth of all links in the network, given that all links have defined maximum capacities and that the per-traffic-class effective bandwidth required between source-destination pairs for expected traffic flows is that defined in the provided traffic matrix. Provision of a traffic matrix that is QoS-aware, in the sense that the demands for given source/destination pairs reflect effective bandwidth estimations, means that the routing plan identified as a solution to the optimization problem is one that minimizes the risk of QoS violations if the actual traffic carried on the network is equal to the estimated values. In this context, choosing to maximize residual bandwidth on all links appears a natural choice, since the larger the residual bandwidth on a given link, the lower the probability of traffic carried over that link incurring a QoS violation. Our objective is to maximize the residual bandwidth on all links in the multi-class network. It can be represented as maximizing $\frac{X_l}{C_l}$ over all $l \in L$, satisfying the constraint $X_l \geq \lambda_l, \forall l \in L$, where $X_l$ is the residual bandwidth on link $l$, $C_l$ is the capacity of link $l$ and $L$ is the set of links. Route allocations are calculated to satisfy the QoS requirements of the estimated traffic demand, which is represented by the QoS-aware demand matrix. The metric used to compare the quality of different paths in forming routes is an asymptotic convex function, defined as:

$$F(X_l) = \frac{C_l}{(X_l - \lambda_l)^\alpha} - \frac{C_l}{(C_l - \lambda_l)^\alpha} \quad (1)$$

where $\alpha$ is a constant. The cost function used in our hybrid GA algorithm to guide the search is based on $F(X_l)$. The function is motivated by the condition that if $X_l = C_l$ then $F(X_l) = 0$, and it should go to infinity when the available bandwidth on link $l$ approaches the reserved bandwidth $\lambda_l$. Note that Riedl and Schupke [2] found that the use of a convex metric significantly improves routing optimization.
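The short C++ sketch below (identifier names are ours) illustrates Eq. (1) and the resulting total cost used to guide the search; the treatment of infeasible links ($X_l \leq \lambda_l$) as infinitely costly is an assumption that matches the behaviour described later in Section 5, where infeasible solutions have infinite cost.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct Link {
    double X;       // residual bandwidth on the link
    double C;       // link capacity
    double lambda;  // reserved (effective) bandwidth on the link
};

// Link metric of Eq. (1): zero when the link is unused (X == C) and
// unbounded as the residual bandwidth approaches the reserved bandwidth.
double F(const Link& l, double alpha) {
    if (l.X <= l.lambda)  // constraint X >= lambda violated: infeasible
        return std::numeric_limits<double>::infinity();
    return l.C / std::pow(l.X - l.lambda, alpha)
         - l.C / std::pow(l.C - l.lambda, alpha);
}

// Cost Phi guiding the hybrid GA: the sum of F over all links.
double cost(const std::vector<Link>& links, double alpha) {
    double phi = 0.0;
    for (const Link& l : links) phi += F(l, alpha);
    return phi;
}
```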
3 Approach Outline
The first step in QoS-aware route planning using our approach is the preparation of a QoS-aware demand matrix. One possible process for doing this is outlined by Davy et al. [8]. We then outline three separate GA-based algorithms to solve the residual bandwidth optimization problem. The first is a "pure" GA and forms the basis for the other two algorithms. These are the hybrid GA with Fixed Neighborhood Search (GA-FNS) and our proposed solution, the hybrid GA with Variable Neighborhood Search (GA-VNS).
3.1 Genetic Algorithm
We now summarize the GA technique and outline the "pure" GA we apply to the residual bandwidth optimization problem. For our residual bandwidth optimization GA, we seek to populate a route table for the network which contains a sub-table for each source/destination pair, which in turn contains the n shortest paths between that source/destination pair. A breadth-first-search-based algorithm can be used to initially find the routes to build the route table. Given this starting point, we now outline the GA in terms of the nature of the chromosomes, the initial population, the selection mechanism, the crossover mechanism, the mutation mechanism, and the cost function. An array structure is used as the chromosome structure. The selected path for each source/destination pair is encoded in an array. For example, Chromosome[i][j] is the gene representing the path selected for the ith traffic class in the jth source-destination pair. The initial population is generated by randomly selecting the paths for the required source-destination pairs for the different traffic classes from the route table to form chromosomes. The probabilities of selecting given paths for each source-destination pair follow a uniform distribution. A simple tournament selection mechanism is employed. Every chromosome at an odd position in the ordered population is mated with the next chromosome to produce offspring. This operation produces a population twice the actual population size; the best half is selected for the next generation and the rest is discarded. However, this may produce duplicates in the population as the generations go on. The duplicates are replaced with newly generated chromosomes, which increases the diversity in the population. A two-point crossover is used to produce offspring. Crossover is applied separately to the arrays representing each traffic class, by selecting different crossover points for the different arrays representing the paths selected for the different classes of traffic. Once more, a uniform distribution is used to randomly select the crossover points. Mutation is performed by randomly changing one gene in the chromosome; a uniform random distribution is used to select the gene to be replaced. A cost function based on the metric presented in the previous section for residual bandwidth calculation is used to measure the quality of a solution. It is given by $\Phi = \sum_{l \in L} F(X_l)$.
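The sketch below illustrates, under our own naming assumptions, the chromosome encoding and the per-class two-point crossover and single-gene mutation just described; it is a simplified reading of the text, not the authors' implementation.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// chromosome[i][j]: index, into the route table's n shortest paths, of the
// path chosen for traffic class i and source-destination pair j.
using Chromosome = std::vector<std::vector<int>>;

// Two-point crossover, applied separately to the array of each traffic
// class with independently chosen crossover points (uniformly distributed).
// Parents a and b are assumed to have the same shape.
Chromosome twoPointCrossover(const Chromosome& a, const Chromosome& b,
                             std::mt19937& rng) {
    Chromosome child = a;
    for (std::size_t c = 0; c < a.size(); ++c) {
        std::uniform_int_distribution<std::size_t> pos(0, a[c].size() - 1);
        std::size_t p1 = pos(rng), p2 = pos(rng);
        if (p1 > p2) std::swap(p1, p2);
        for (std::size_t j = p1; j <= p2; ++j)
            child[c][j] = b[c][j];  // genes between the two points from b
    }
    return child;
}

// Mutation: replace one uniformly chosen gene by a random path index.
void mutate(Chromosome& ch, int nPaths, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> cls(0, ch.size() - 1);
    std::size_t c = cls(rng);
    std::uniform_int_distribution<std::size_t> pair(0, ch[c].size() - 1);
    std::uniform_int_distribution<int> path(0, nPaths - 1);
    ch[c][pair(rng)] = path(rng);
}
```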
3.2 The GA-FNS Hybrid Algorithm
The same GA described in the previous section is used here, but a greedy local search algorithm with a fixed neighborhood is used alongside the GA. We therefore use the term Fixed Neighborhood Search (FNS) to ease comparison with the VNS algorithm described in the next section. The FNS is applied, after every generation of the GA, to each chromosome in the population with 10% probability, except the best. The best chromosome in the population always
undergoes FNS in order to accelerate convergence to a global optimum/near-optimum value. The search is also probabilistically applied to the other chromosomes to guide the search from different regions of the solution space. This is crucial, because a solution with a high cost can be close to the global optimum. This process improves fairness and prevents potential solutions from being neglected. The FNS performs the search by comparing the cost with those of the other solutions in the immediate neighborhood. The cost is calculated using the same cost function as the one used in the GA. The neighborhood of a solution is the set of solutions which differ from the current solution by at most one gene. One gene in the chromosome is randomly selected and replaced by another random gene to find a neighbor of the solution that the chromosome represents. If the neighbor is found to have a lower cost than the current solution, the neighbor is made the current solution and the search proceeds from there; otherwise another neighbor is examined for improvement. This process continues until no better solution is found in the neighborhood. However, it is very likely that the search proceeds in the wrong direction in a multi-modal solution space, leading to a very long convergence time. Therefore, we have restricted the number of hops allowed in the search for a better solution to a finite number. This process helps to improve the solution that the GA finds in a more deterministic fashion.
3.3 The GA-VNS Hybrid Algorithm
In this algorithm, VNS is applied instead of FNS; all other aspects are the same as in GA-FNS. The variable neighborhood search switches neighborhoods while searching for a better solution, so two neighborhoods are defined in this algorithm. The first neighborhood is the set of solutions which differ from the current solution by at most one gene; this is the same as in FNS. The second neighborhood is comparatively bigger: it is defined as the set of solutions that differ from the current solution by exactly two genes. A neighbor in the second neighborhood is obtained by replacing two randomly selected genes of the chromosome that represents the current solution with two random genes. Initially the search is performed in the first neighborhood to find a better solution, as described in the previous section. If a solution with a lower cost cannot be found in the first neighborhood, the search switches to the second neighborhood, where the neighbors are examined for a solution with a lower cost. If a neighbor has a lower cost, it is made the current solution; otherwise the other neighbors are examined one by one for an improved solution. The search terminates when a better solution cannot be found in the new neighborhood. The maximum number of hops allowed is restricted to a fixed number, for the same reasons as in FNS. The switching of neighborhoods prevents the search from getting stuck at a local minimum: when no better solution is found in the first neighborhood, the current solution may be a local minimum, but when the neighborhood changes it is probable that a better solution can be found, and thus the local minimum is skipped.
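A compact sketch of this search logic follows. It is our interpretation of the description above: the number of random neighbors examined before giving up on a neighborhood (`trials`) is not specified in the text and is introduced here as an assumption, as is the exact use of the hop limit.

```cpp
// Variable neighborhood search sketch. Solution, the cost functor, and the
// neighbor generators (returning a random solution that differs from the
// current one in exactly one / exactly two genes) are assumed to exist.
template <typename Solution, typename Cost, typename N1, typename N2>
Solution vns(Solution current, Cost cost, N1 neighbor1, N2 neighbor2,
             int maxHops, int trials) {
    int hops = 0;
    int k = 1;                                   // current neighborhood
    while (hops < maxHops) {
        bool improved = false;
        for (int t = 0; t < trials && !improved; ++t) {
            Solution cand = (k == 1) ? neighbor1(current) : neighbor2(current);
            if (cost(cand) < cost(current)) {
                current = cand;                  // accept the better neighbor
                improved = true;
                ++hops;
            }
        }
        if (improved)    k = 1;                  // restart in neighborhood 1
        else if (k == 1) k = 2;                  // switch to the 2-gene one
        else             break;                  // stuck in both: terminate
    }
    return current;
}
```

The key design point is the `else if (k == 1) k = 2;` branch: where FNS would stop at the first neighborhood's local minimum, VNS makes one more attempt in the larger neighborhood before giving up.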
Table 1. The Range of Mean Rate Values

Traffic Class   Rate            γi        αi
Class1          100–500 Mb/s    142.640   0.6
Class2          100–200 Mb/s    31.548    0.4
4 Experimental Setup
The problem scenarios and the different contexts in which the evaluations of the proposed techniques are carried out are explained in this section. Demand matrices with two degrees of load impact are generated to perform routing: the first type has an impact of 30–40% average link utilization and the other of 50–60%. This is to evaluate the performance of the proposed algorithm under different difficulty levels. The experiments are carried out on different sizes of network topologies, both randomized and real, to show the applicability and scalability of the proposed solution. The random network topologies are built with 20% connectivity, i.e., there exist links between 20% of the node pairs in the network. The random topologies have 40% of their nodes as edge nodes. For certain experiments two real topologies were used: the Telstra Australia topology described in [6] and the N20 topology described in [7]. We simulate the 95th percentile Effective bandwidth-Mean (EM) coefficient values $K_{95,i}$ to be as close as possible to the actual values. The EM coefficient $K_{95,i}$ is given by the following expression:

$$K_{95,i} = \frac{\gamma_i}{mean_i^{\alpha_i}} + 1$$

The mean rate $mean_i$, $\gamma_i$ and $\alpha_i$ values for traffic class $i$ are selected as shown in Table 1. The values for the mean rate are generated using a uniform distribution. The required effective bandwidth $R_{eff}$ per source-destination pair $x\text{-}y$ and per traffic class $i$ is then given by $R_{eff} = K_{95,i} \times mean_{i,x-y}$.
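For concreteness, a small C++ sketch of this demand preparation follows. Whether the class mean $mean_i$ entering $K_{95,i}$ and the per-pair mean $mean_{i,x-y}$ coincide is our reading of the text, so treat the signatures as illustrative assumptions rather than the authors' exact procedure.

```cpp
#include <cmath>
#include <random>

// EM coefficient: K_{95,i} = gamma_i / mean_i^{alpha_i} + 1.
double emCoefficient(double mean, double gamma, double alpha) {
    return gamma / std::pow(mean, alpha) + 1.0;
}

// Effective-bandwidth demand for class i between a source x and a
// destination y: R_eff = K_{95,i} * mean_{i,x-y}. Here the per-pair mean is
// also used as the class mean (our assumption).
double effectiveBandwidth(double meanIxy, double gamma, double alpha) {
    return emCoefficient(meanIxy, gamma, alpha) * meanIxy;
}

// Example: a Class 1 demand (gamma = 142.640, alpha = 0.6 from Table 1)
// with a mean rate drawn uniformly from 100-500 Mb/s.
double sampleClass1Demand(std::mt19937& rng) {
    std::uniform_real_distribution<double> rate(100.0, 500.0);
    return effectiveBandwidth(rate(rng), 142.640, 0.6);
}
```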
5 Results and Analysis
This section discusses the results of the performance evaluation carried out using the proposed solution. The results are presented in three sub-sections. The first set of experiments is performed on randomly generated network topologies of different sizes. The second and third sets of experiments are performed on the Telstra Australia backbone topology described in [6] and the N20 topology described in [7]. The GAs use a population size of 60 and a mutation rate of 5%, as these were found to be optimal in our experiments [14].
5.1 Randomized Topologies
This section presents the results of the experiments carried out on randomly generated network topologies. The results of the first set of experiments are illustrated in Fig. 1, showing the scalability of the proposed GA-VNS algorithm.
The results show that the number of fitness evaluations required to find the optimum/near-optimum solution does not change significantly as the network grows in size. The generated demand matrices for this experiment had 30–40% average link utilization. The experiment is carried out for a single class of traffic, and the cost function uses an α value of 1. Fig. 2 illustrates the scalability of the proposed algorithm for two traffic classes based on the QoS-aware demand matrix. Fig. 3 illustrates how the load on the network impacts the convergence of the proposed solution. The experiments are carried out under approximately 30–40% and 50–60% average link utilization scenarios. A randomly generated network topology with 50 nodes is used; the other parameters remain the same as in the previous experiment.
Fig. 1. The Scalability of the Algorithm for a Single Traffic Class
Fig. 2. The Scalability of the Algorithm for Multiple Traffic Classes
The next set of experiments evaluates the suitability of the other algorithms, pure GA and GA-FNS, in comparison to the proposed GA-VNS algorithm. The parameters of the experiments are the same as those of the previous experiment. As can be seen from the results shown in Table 2, pure GA and GA-FNS do not always converge to an optimal solution; that is, they often fail to drive the search into the feasible region. The percentages of successful convergence to an optimal solution for pure GA and GA-FNS are 17% and 60% respectively. Significantly, the GA-VNS algorithm has 100% success. The results illustrated in Fig. 4 show how GA-VNS converges with respect to different numbers of traffic classes. The load on the network is approximately the same for both scenarios. As can be seen, when there are multiple classes of traffic, it takes longer to converge to a feasible solution than for single-class traffic. This is due to the variation in effective bandwidth requirements for the different classes of traffic. The average link utilization caused by the demand matrices used in this experiment is shown in Table 3. The results illustrated in Fig. 5 show the convergence of the GA-VNS algorithm for different sizes of network topologies when the cost function with α = 2 is used. All other parameters associated with this experiment are the same as
Fig. 3. The Effect of Load on Convergence
Table 2. The Success Rate of the Algorithms

The Algorithm   The Percentage of Success
GA-VNS          100%
GA-FNS          60%
Pure GA         17%
Fig. 4. Single Class of Traffic vs Two Classes of Traffic

Table 3. The Average Link Utilization

Average Link Utilization for single class   0.42
Average Link Utilization for two classes    0.33
those of the first experiment. It can be observed that the α value does not have an impact on converging to a feasible solution. However, it does have an impact on converging to an optimal solution, with the cost function with α = 1 converging slightly faster than that with α = 2. Fig. 6 illustrates the distribution of link utilization across the network. A topology with 50 nodes, 20 edge nodes and 20% connectivity is generated to perform the simulations. The generated demand matrix has 160 source-destination pairs. The link capacities and the bandwidth requirements are randomly generated using uniform distributions in the ranges 150–200 Mb/s and 40–80 Mb/s respectively. It can be seen that the maximum link utilization is kept below 0.6. The results show that many links have < 0.1 link utilization, because they are not used to route the traffic. This is due to the fact that the network topology is randomly generated, as are the demand matrix and the edge routers. The maximum link utilization and the average link utilization for the above scenario are 0.59 and 0.33 respectively. As can be seen in all previous experiments, the cost of the best solution is infinite at the initial stages. This means that the available bandwidth on at least one link l in the network is below the reserved bandwidth λl. Therefore, the solutions in the initial population, as well as in the first few generations, are infeasible, i.e., the used bandwidth on the links is higher than the allowed bandwidth limit. This happens because the initial population is randomly generated. The size of the feasible region depends on the difficulty of the problem, as dictated by the size of the demand matrix and the bandwidth requirements. As the bandwidth requirement grows, the feasible region shrinks and becomes
Fig. 5. The Scalability of the Algorithm for α = 2
Fig. 6. The Distribution of Link Utilisation Across the Network
disconnected. At some stage, the feasible region completely vanishes. However, in our experiments, we considered scenarios where all the demands can be satisfied. Searching for the global optimum in many disconnected feasible regions, each having a number of maxima and minima, is a very challenging task. In fact, the size of the complete feasible region is much smaller than that of the infeasible region for a problem of moderate difficulty. Therefore, it is crucial that a successful algorithm have the ability to drive the search from the infeasible region towards the feasible region. The results reported here make it clear that GA-VNS has this ability, at least for problems of the scale and difficulty addressed here; the results shown in Table 2 help to compare the ability of the different algorithms to find feasible solutions. In the case of pure GA, the offspring of infeasible solutions are very likely to be infeasible; since GA is a randomly guided search technique, it fails to direct the search towards the feasible region. For GA-FNS, the neighbor of an infeasible solution is also infeasible in most cases, so GA-FNS also fails, in spite of coupling a local search with the GA. Typically, non-linear optimization problems are addressed by enforcing a feasibility restriction on initial population generation. This cannot be done in complex problems like the one addressed in this paper, due to the very small size of the feasible region in comparison to the infeasible region. When the feasible region is very small, generating feasible solutions requires extensive searching, which makes it impractical to initiate the GA with a set of feasible solutions.
5.2 Telstra Australia Topology
The experiments for the results presented in this section are carried out on the Telstra Australia topology, as detailed by the Rocketfuel project [6]. Although the actual link capacities needed for our study are not known, the project provides derived OSPF/IS-IS link weights. In the absence of any other information on the capacities, they are assumed to be inversely proportional to the link weights, as per the Cisco-recommended default setting of link weights.
The first set of experiments compares the maximum link utilization obtained in the route plan using the proposed solution with that of OSPF: the proposed solution produced a maximum link utilization of 0.87, compared to 1.06 (infeasible) for OSPF. Fig. 7 illustrates how the proposed GA-VNS algorithm performs in comparison to the GA and GA-FNS algorithms. We performed this experiment at a comparatively low network load in order to start with a population containing feasible solutions; this was necessary, as the GA and GA-FNS algorithms fail to converge on a certain proportion of attempts. As can be seen, the GA-VNS algorithm converges faster than the rest. The next set of results, illustrated in Fig. 8, shows how the α value of the cost function influences the performance of the solution. The costs of the solutions are normalized to make them comparable, since two different cost functions are used.
Fig. 7. The Performance of GA-VNS in Comparison to GA and GA-FNS
Fig. 8. The Comparison of Performance for α Values 1 and 2

5.3 Comparison with OSPF for the N20 Topology
The results illustrated in Fig. 9 are evaluated on the N20 topology given in [7]. The demand matrix used has 1.4 times the bandwidth requirements given in [7]. The distribution of link utilization is compared with that of OSPF. We have set the OSPF weights using two criteria. The first criterion is link capacity, where the link weights are inversely proportional to the capacity of the corresponding links. The second is the available bandwidth on the links, where the weights are inversely proportional to the available bandwidth of the corresponding links. As can be seen, OSPF weight setting using available bandwidth as the metric shows a better distribution of link utilization, as expected. The proposed solution produces a significantly better distribution than both versions of OSPF. In this experiment, the average link utilization produced is approximately 0.4. The maximum link utilizations produced by GA-VNS, OSPF-Capacity and OSPF-Free Bandwidth are 0.66, 1.33 and 0.81 respectively. The superiority of the performance of GA-VNS over pure GA and GA-FNS is illustrated in Fig. 10.
Fig. 9. The Distribution of Load in comparison to OSPF
Fig. 10. The Performance of GA-VNS in comparison to GA-FNS and Pure GA

6 Summary and Future Work
In this paper we proposed a solution to support QoS-aware network planning for medium to large networks by performing residual bandwidth optimization for all links in the network, assuming the availability of a QoS-aware demand matrix. This is a constrained optimization problem for which a standard GA typically fails to converge when starting from a set of infeasible solutions. To address this limitation, we outlined a hybrid GA-VNS algorithm that performs residual bandwidth optimization on all links for multiple traffic classes in medium to large network topologies. Experiments were performed on randomized networks of different sizes, real topologies and specific network topologies to evaluate the GA-VNS algorithm under different scenarios. The results demonstrated that, for the tests carried out, the algorithm always converges to an optimal or near-optimal solution. Our performance evaluations further demonstrated that GA-VNS performs and scales better than a standard GA and a representative combination of GA and local search. Additionally, the algorithm performed significantly better than OSPF in terms of optimizing residual bandwidth. The suitability of the proposed GA-VNS algorithm for residual bandwidth optimization is evident from the experimental results we have provided; however, the scope of the proposed algorithm is not restricted to this problem alone. For future work we plan to investigate the application of GA-VNS and extensions thereof to other network optimization problems.
Acknowledgments

This work was funded by Science Foundation Ireland via grant number 08/SRC/I1403 ("Federated, Autonomic Management of End-to-End Communications Services").
References
1. Kodialam, M., Lakshman, T.V.: Dynamic Routing of Restorable Bandwidth-Guaranteed Tunnels using Aggregated Network Resource Usage Information. IEEE/ACM Transactions on Networking 11(3) (June 2003)
2. Riedl, A., Schupke, D.A.: Routing Optimization in IP Networks Utilizing Additive and Concave Link Metrics. IEEE/ACM Transactions on Networking 15(5) (October 2007)
3. Applegate, D., Cohen, E.: Making Routing Robust to Changing Traffic Demands: Algorithm and Evaluation. IEEE/ACM Transactions on Networking 14(6) (December 2006)
4. Kodialam, M., Lakshman, T.V., Sengupta, S.: Online Multicast Routing With Bandwidth Guarantees: A New Approach Using Multicast Network Flow. IEEE/ACM Transactions on Networking 11(4) (August 2003)
5. Yaiche, H., Mazumdar, R.R., Rosenberg, C.: A Game Theoretic Framework for Bandwidth Allocation and Pricing in Broadband Networks. IEEE/ACM Transactions on Networking 8(5) (October 2000)
6. Spring, N., Mahajan, R., Wetherall, D.: Measuring ISP Topologies with Rocketfuel. In: Proc. ACM SIGCOMM, pp. 133–145 (2002)
7. Kohler, S., Binzenhofer, A.: MPLS Traffic Engineering in OSPF Networks - A Combined Approach. Univ. Wurzburg, Germany, Tech. Rep. 304 (February 2003)
8. Davy, A., Botvich, D., Jennings, B.: On the use of Accounting Data for QoS-aware IP Network Planning. In: Mason, L.G., Drwiega, T., Yan, J. (eds.) ITC 2007. LNCS, vol. 4516, pp. 348–360. Springer, Heidelberg (2007)
9. Kandavanam, G., Botvich, D., Balasubramaniam, S., Suganthan, P.N., Tasgetiren, M.F.: A Dynamic Bandwidth Guaranteed Routing Using Heuristic Search for Clustered Topology. In: IEEE Advanced Networks and Telecommunication Systems (December 2008)
10. Kelly, F.: Notes on Effective Bandwidth. In: Kelly, F.P., Zachary, S., Ziedins, I.B. (eds.) Stochastic Networks: Theory and Application. Royal Statistical Society Lecture Notes Series, vol. 4, pp. 141–168. Oxford University Press, Oxford (1996)
11. Tasgetiren, M.F., Sevkli, M., Liang, Y.C., Gencyilmaz, G.: Particle Swarm Optimization Algorithm for Permutation Flowshop Sequencing Problem. LNCS. Springer, Heidelberg (2004)
12. Coello, C.A.: Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Computer Methods in Applied Mechanics and Engineering 191(11-12), 1245–1287 (2002)
13. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning, 1st edn. Addison-Wesley Longman Publishing Co., Inc., Boston (1989)
14. Kandavanam, G., Botvich, D., Balasubramaniam, S., Suganthan, P.N., Donnelly, W.: A Multi Layered Solution for Supporting ISP Traffic Demand using Genetic Algorithm. In: Proc. of the IEEE Congress on Evolutionary Computation (September 2007)
Parallelization of an Evolutionary Algorithm on a Platform with Multi-core Processors

Shigeyoshi Tsutsui

Hannan University, Matsubara Osaka 580-8502, Japan
[email protected]
Abstract. This paper proposes methods for parallelizing evolutionary algorithms using multi-thread programming on a platform with multi-core processors. For this study, we revise the previously proposed edge histogram based sampling algorithm (EHBSA), yielding what we call the enhanced EHBSA (eEHBSA). The parallelization models are designed using eEHBSA to increase the execution speed of the algorithm. We propose two types of parallel models: a synchronous multi-thread model (SMTM) and an asynchronous multi-thread model (AMTM). Experiments are performed using the TSP. The results show that both parallel methods increased computation speed nearly in proportion to the number of cores for all test problems. The AMTM produced especially good run times for small TSP instances without local search. A consideration of parallel evolutionary algorithms with many-core GPUs is also given for future work.
1 Introduction
Parallel evolutionary algorithms (PEAs) have been recognized for a long time and have been successfully applied to many hard tasks to reduce the time required to reach acceptable solutions [1,2]. Although PEAs are very promising, there are many barriers to be conquered in designing them, depending on the computational environment available at hand, and there are many design factors, such as determining the parallel topology, parameter setting, and reducing communication overhead. Recently, microprocessor vendors have begun to supply processors with multiple cores, and PCs using such processors are available at a reasonable cost, which makes it possible to run PEAs on standard, commercially available computers. They are normally configured with a symmetric multiprocessing (SMP) architecture. Since the main memory is shared among processors in SMP, parallel processing can be performed efficiently. Thus, computing platforms running parallel algorithms using SMP are now ready for end users. In a previous study [3], we proposed the edge histogram based sampling algorithm (EHBSA), which uses edge histogram models within the estimation of distribution algorithm (EDA) framework [4,5] to solve problems in permutation domains, and showed that EHBSA worked fairly well. In this paper, we propose methods for parallelizing EHBSA using multi-thread programming on a PC with
multi-core processors, which are commonly available at reasonable cost. For this study, we revise the previously proposed EHBSA into what we call the enhanced EHBSA (eEHBSA). Madera et al. provided a survey of parallel EDAs [6] and presented several levels of parallelization in EDAs. In this research, we discuss parallel eEHBSA with the aim of speeding up evolutionary algorithms. In our approach, parallelization is performed at the individual level. We propose two types of parallel models for this purpose: one is a synchronous multi-thread model (SMTM), and the other is an asynchronous multi-thread model (AMTM). The results show that both parallel methods increased computation speed nearly in proportion to the number of cores for all test problems. The AMTM produced especially good run-time results for small TSP instances without local search; for larger instances with local search, the differences between the two approaches become smaller. In addition to the above-mentioned PEAs with multi-core processors, we also give short consideration to the possibilities of PEAs run with GPU computation, where potentially hundreds of streaming processors (many-core) can run in parallel. The paper is structured as follows. Section 2 describes eEHBSA. Section 3 describes the parallelization scheme of eEHBSA on a computing platform with a multi-core processor, and Section 4 analyzes the computational results of parallel eEHBSA. Section 5 briefly discusses the potential of using PEAs with GPU computation. Finally, Section 6 concludes the paper.
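Since the SMTM and AMTM themselves are only detailed in Section 3, the sketch below merely illustrates the general idea of individual-level parallelization on an SMP machine: one generation's evaluations distributed over worker threads with a synchronous join, written with C++11 std::thread. All names, the placeholder evaluation, and the barrier-per-generation structure are our illustrative assumptions, not the authors' models.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

struct Individual { double fitness = 0.0; };

// Problem-specific evaluation; a trivial placeholder body keeps the
// sketch self-contained.
double evaluate(const Individual&) { return 0.0; }

// Distribute one generation's evaluations over nThreads worker threads.
// Each thread handles a strided, disjoint slice of the population, so no
// locking is needed; the joins act as a per-generation barrier (the
// synchronous flavour of multi-thread parallelization).
void parallelEvaluate(std::vector<Individual>& pop, unsigned nThreads) {
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        workers.emplace_back([&pop, t, nThreads] {
            for (std::size_t i = t; i < pop.size(); i += nThreads)
                pop[i].fitness = evaluate(pop[i]);
        });
    }
    for (std::thread& w : workers) w.join();
}
```

An asynchronous variant would instead let each thread run its own steady-state loop without a per-generation join, which is the distinction the SMTM/AMTM comparison in this paper explores.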
2 Enhanced EHBSA
Many studies on EDAs have been performed in discrete (mainly binary) domains, and there are several attempts to apply EDAs in continuous domains. However, only a few studies consider EDAs in permutation representation domains. Bosman and Thierens use random keys to represent permutations, together with a marginal product factorization to estimate the continuous distribution of random keys [7]. Robles et al. also use random keys to enable the use of EDAs from continuous parameter optimization in solving permutation problems [8]. In our previous studies, we proposed EHBSA [3]. In this approach, we use an edge histogram matrix (EHM) obtained from the current population as a probabilistic model, where an edge is a link between two nodes in a string. The most novel feature of EHBSA is how new solutions are constructed: we generate permutation strings by combining a segment of nodes obtained by sampling the EHM with a segment of nodes taken from an existing solution in the current population. This approach showed good convergence in the search process. We tested the algorithm using the TSP. The results showed that EHBSA worked fairly well on the test problems compared with traditional GAs using operators such as OX, PMX and eER [9]. The approach was applied to flow shop scheduling in [10] and to capacitated vehicle routing problems in [11]. EHBSA also showed good performance as reported in [7,8]. Please see [3,10,11,12] for details.
The revision of the EHBSA was aimed mainly at ease of parallelization and of parameter setting. Below, the outline of the revised EHBSA (eEHBSA) is described.
2.1 Edge Histogram Matrix and Generation of New Individuals Revised
The basic idea of EHBSA is to use the edge distribution of the whole population in generating new strings. An edge histogram matrix (EHM) for the selected solutions is constructed, and new solutions are generated by sampling the EHM. An EHM is an L × L matrix, where L is the problem size. An example of an EHM at generation t, EHM^t = (e^t_{p,q}), is shown in Fig. 1. The integer part of each e^t_{p,q} represents the number of edges from node p to node q (p → q) in the population. Note that in a symmetric permutation problem, such as the symmetric TSP, an edge p → q and an edge q → p are considered the same in the EHM; in an asymmetric permutation problem, such as a scheduling problem, they occupy different positions in the EHM. The fractional value ε represents a minimum value that gives a bias to control the selection pressure when sampling nodes, and is given as

$$\varepsilon = \begin{cases} \dfrac{2N}{L-1}\,B_{ratio} & \text{for symmetric problems} \\[4pt] \dfrac{N}{L-1}\,B_{ratio} & \text{for asymmetric problems} \end{cases} \qquad (1)$$

where N is the population size and B_ratio (B_ratio > 0), the bias ratio, is a constant related to the pressure toward random permutations (please see [3]).
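As a concrete illustration of the EHM construction and the ε bias of Eq. (1), the following Java sketch builds a symmetric EHM from a population of cyclic permutations. The method name, the 0-based node indices, and the wrap-around tour edge are our assumptions rather than the paper's actual code; with N = 5, L = 5 and B_ratio = 0.04 it reproduces the ε = 0.1 fractional parts visible in Fig. 1.

```java
import java.util.Arrays;

// Hypothetical sketch: build a symmetric EHM from a population of tours.
// Each tour edge p-q increments both e[p][q] and e[q][p]; every
// off-diagonal entry additionally carries the bias epsilon of Eq. (1).
static double[][] buildEHM(int[][] pop, double bRatio) {
    int N = pop.length, L = pop[0].length;
    double eps = 2.0 * N / (L - 1) * bRatio;   // symmetric case of Eq. (1)
    double[][] ehm = new double[L][L];
    for (double[] row : ehm) Arrays.fill(row, eps);
    for (int i = 0; i < L; i++) ehm[i][i] = 0.0;
    for (int[] s : pop)
        for (int j = 0; j < L; j++) {
            int p = s[j], q = s[(j + 1) % L];  // tour edge, wrapping around
            ehm[p][q] += 1.0;
            ehm[q][p] += 1.0;
        }
    return ehm;
}
```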
[Fig. 1. An example of a symmetric edge histogram matrix for N = 5, L = 5, B_ratio = 0.04; P(t) represents a population at generation t. (a) P(t): s^t_1 = (1, 2, 3, 4, 5), s^t_2 = (2, 4, 5, 3, 1), s^t_3 = (4, 5, 3, 2, 1), s^t_4 = (5, 1, 4, 2, 3), s^t_5 = (3, 2, 4, 5, 1). (b) EHM^t:

  0    3.1  2.1  2.1  3.1
  3.1  0    4.1  3.1  0.1
  2.1  4.1  0    1.1  3.1
  2.1  3.1  1.1  0    4.1
  3.1  0.1  3.1  4.1  0   ]

Generation of a new individual I′ is performed as shown in Fig. 2. In the figure, a template individual I is chosen from P(t). The new individual I′ is generated as a combination of the following two segments: (1) a partial solution segment (PSS) starting at position (locus) p_top with l_p nodes, and (2) a sampling-based segment (SBS) starting at position s_top with l_s nodes. In generating I′, first, the nodes of the PSS are copied from the template I to I′; then, the nodes of the SBS are obtained by sampling the EHM.

[Fig. 2. Generating a new solution using a partial solution; s_top = (p_top + l_p + L) mod L and p_last = (s_top − 1 + L) mod L.]

To ensure robustness across a wide spectrum of problems, it is advantageous to introduce variation both in the position and in the number of nodes of both segments. First, we choose the starting node position of the PSS (p_top) randomly. Thereafter, the number of nodes of the partial solution l_p must be
determined. Here, let us introduce a control parameter γ which defines E(l_s) (the average of l_s) by E(l_s) = L × γ. To determine l_s in previous studies of EHBSA, we used the n cut-point approach [3,12]. With this approach, however, E(l_s) is L/2, L/3, L/4, ... for n = 2, 3, 4, ..., and γ corresponds to 1/n, i.e., γ can take only the values 0.5, 0.333, 0.25, ... In eEHBSA, we extend this elementary method to a more flexible technique which allows γ to take values in the range [0.0, 1.0]. Eq. (2) gives the probability density function of l_s used in this research. This density function can be obtained by extending the distribution function of the length of a segment in the n cut-point approach (see [12]):

$$f(l_s) = \begin{cases} \dfrac{1-\gamma}{L\gamma}\left(1-\dfrac{l_s}{L}\right)^{\frac{1-2\gamma}{\gamma}} & \text{for } \gamma \in (0, 0.5] \\[6pt] \dfrac{\gamma}{L(1-\gamma)}\left(\dfrac{l_s}{L}\right)^{\frac{2\gamma-1}{1-\gamma}} & \text{for } \gamma \in (0.5, 1] \end{cases} \qquad (2)$$
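Since Eq. (2) has a closed-form CDF on each branch, l_s can be drawn by inverse-transform sampling. The following Java sketch is our own derivation (the paper does not spell out the sampling procedure); both branches give E(l_s) = L × γ, which can be checked analytically.

```java
import java.util.Random;

// Hypothetical inverse-CDF sampler for the SBS length l_s of Eq. (2).
// For gamma in (0, 0.5]: F(l) = 1 - (1 - l/L)^((1-gamma)/gamma);
// for gamma in (0.5, 1]: F(l) = (l/L)^(gamma/(1-gamma)).
static int sampleSegmentLength(int L, double gamma, Random rnd) {
    double u = rnd.nextDouble();
    double l = (gamma <= 0.5)
        ? L * (1.0 - Math.pow(u, gamma / (1.0 - gamma)))
        : L * Math.pow(u, (1.0 - gamma) / gamma);
    return Math.max(1, (int) Math.round(l));   // at least one sampled node
}
```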
The generation of new nodes for the SBS is performed by sampling the EHM probabilistically as follows. Let p be the node at a position of a new string I′; then the node q for the next position is determined by sampling the edge p → q according to the following probability p_{p,q}(t):

$$p_{p,q}(t) = \begin{cases} \dfrac{e^t_{p,q}}{\sum_{s \in F(p)} e^t_{p,s}} & \text{if } q \in F(p) \\[6pt] 0 & \text{otherwise} \end{cases} \qquad (3)$$

where F(p) is the set of nodes connected to p that remain available for selection in string I′. This sampling is iterated, replacing p by the obtained q, until l_s nodes have been obtained. Note that the initial p is the node at the last position p_last, obtained as p_last = (s_top − 1 + L) mod L, where s_top is the top position of the SBS and s_top = (p_top + l_p + L) mod L (see Fig. 2). To save sampling time, we used a candidate list with candidate size C_size = 20. Using the candidate list, the computational complexity of obtaining one solution according to Eq. (3) is nearly O(L). This sampling method is similar in part to the sampling in Ant Colony Optimization (ACO) [13].
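A hedged sketch of one sampling step of Eq. (3) follows; it scans all remaining nodes rather than a candidate list of size C_size, so it illustrates the probability rule, not the O(L) optimization.

```java
import java.util.Random;

// Hypothetical roulette-wheel step of Eq. (3): choose the next node q from
// the nodes not yet placed in the string, with probability proportional
// to the EHM entry e[p][q] (always > 0 thanks to the bias epsilon).
static int sampleNextNode(double[][] ehm, int p, boolean[] used, Random rnd) {
    double total = 0.0;
    for (int q = 0; q < ehm.length; q++)
        if (!used[q]) total += ehm[p][q];
    double r = rnd.nextDouble() * total;
    for (int q = 0; q < ehm.length; q++) {
        if (used[q]) continue;
        r -= ehm[p][q];
        if (r <= 0.0) return q;
    }
    for (int q = ehm.length - 1; q >= 0; q--)  // guard against rounding error
        if (!used[q]) return q;
    throw new IllegalStateException("no unused node left");
}
```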
2.2 Generational Model Revised
The most important revision in eEHBSA is the generational model. In previous studies [3,12], we used a generational model in which only one new solution is generated per generation. However, this model is inefficient and not suitable for parallelization. In eEHBSA, we generate N solutions in one generation, as shown in Fig. 3.
[Fig. 3. Generational model of eEHBSA: the EHM is built from P(t) (modeling); sampling with templates produces new individuals I′_1, ..., I′_N in the working pool W(t); selection between each pair (I_i, I′_i) forms the next population.]
1. t ← 0. Generate initial individuals I_i (i = 1, 2, ..., N) in P(0) randomly.
2. For each I_i in P(0), improve it by a local search (if we use a local search), and evaluate it.
3. Build the EHM from P(t).
4. Generate new individuals I′_i from I_i and the EHM.
5. For each I′_i, improve it by a local search (if we use a local search), and evaluate it.
6. Obtain the next population P(t + 1) by comparing I_i and I′_i and setting the better one as I_i, for each i.
7. t ← t + 1.
8. If the termination criteria are met, terminate the algorithm. Otherwise, go to Step 3.
Fig. 4. Algorithm description of eEHBSA. Boldface text indicates those steps that are executed in parallel threads.
Let P(t) represent the population at generation t. The population P(t + 1) is produced as follows. For each individual I_i in population P(t) (i = 1, 2, ..., N), we generate a new solution I′_i in a working pool W(t), using I_i as its template. Then, we compare each pair (I_i, I′_i). If I′_i is better than I_i, I_i is replaced with I′_i; otherwise I_i remains in the population. Thus the next population P(t + 1) is formed. The algorithm of eEHBSA is summarized in Fig. 4.
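Putting the pieces together, Step 4 for one individual might look as follows; this sketch ties together the helpers sketched above and uses our own names and conventions, not the paper's code.

```java
import java.util.Random;

// Hypothetical sketch of generating one new individual: copy the PSS from
// the template, then fill the SBS by iterated EHM sampling (Eq. 3),
// starting from the last PSS node p_last.
static int[] generate(int[] template, double[][] ehm, double gamma, Random rnd) {
    int L = template.length;
    int ls = Math.min(L - 1, sampleSegmentLength(L, gamma, rnd)); // keep >= 1 template node
    int lp = L - ls;
    int pTop = rnd.nextInt(L);                 // random start of the PSS
    int[] child = new int[L];
    boolean[] used = new boolean[L];
    for (int k = 0; k < lp; k++) {             // (1) copy the PSS
        int pos = (pTop + k) % L;
        child[pos] = template[pos];
        used[template[pos]] = true;
    }
    int p = template[(pTop + lp - 1) % L];     // p_last, the last PSS node
    for (int k = 0; k < ls; k++) {             // (2) sample the SBS
        int pos = (pTop + lp + k) % L;
        int q = sampleNextNode(ehm, p, used, rnd);
        child[pos] = q;
        used[q] = true;
        p = q;
    }
    return child;
}
```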
3 Parallelization of eEHBSA Using Multi-Thread Models with a Multi-core Processor

3.1 A Synchronous Multi-Thread Model (SMTM)
In Fig. 4, all steps except Step 3 can be executed in parallel over the individuals I_i (i ∈ {1, ..., N}). Since Step 3 (building the EHM) requires all individuals I_i in P(t), we need a synchronization at this step. The computation time required for Step 1 (initialization) is very small, so there is no merit in running this step in parallel. Thus, in the synchronous multi-thread model of this study, we implement two thread classes: TH_EVAL, which executes Step 2, and TH_SYNC_MAIN, which executes Steps 4, 5, and 6. We generate n_thread instances of each of these classes. They run in parallel, synchronously. Here, we set n_thread to the number of cores of the computing platform. These threads are woken up when their corresponding steps in Fig. 4 start, and enter the waiting state when there are no more individuals to be processed in each generational cycle.
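A minimal Java skeleton of the SMTM cycle, assuming minimization and placeholder types and helpers (Individual, buildEHM, sampleWithTemplate, localSearchAndEvaluate, terminated) for the structures described above; the paper's thread classes are modeled here by tasks submitted to a fixed pool:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical SMTM skeleton: Steps 4-6 run in parallel over individuals;
// building the EHM (Step 3) is the synchronization point of each generation.
void runSMTM(Individual[] population, int nThread) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(nThread); // one worker per core
    while (!terminated()) {
        double[][] ehm = buildEHM(population);                    // Step 3, single-threaded
        List<Future<?>> tasks = new ArrayList<>();
        for (int i = 0; i < population.length; i++) {
            final int idx = i;
            tasks.add(pool.submit(() -> {
                Individual child = sampleWithTemplate(population[idx], ehm); // Step 4
                localSearchAndEvaluate(child);                               // Step 5
                if (child.fitness < population[idx].fitness)                 // Step 6
                    population[idx] = child;
            }));
        }
        for (Future<?> t : tasks) t.get();    // barrier before the next EHM build
    }
    pool.shutdown();
}
```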
3.2 An Asynchronous Multi-Thread Model (AMTM)
For the AMTM in this study, we modify the algorithm of Fig. 4 as shown in Fig. 5. In Step 3, building the EHM from P(t) is performed only for t = 0. For t > 0, the EHM can be updated from I_i and I′_i in Step 6 for each i, independently of the others. This update is performed as follows: if I′_i is better than I_i in the comparison between I_i and I′_i, then the values e^t_{p,q} for the edges (p, q) ∈ I_i are decremented by 1 and the values e^t_{p,q} for the edges (p, q) ∈ I′_i are incremented by 1. With
this modification, Steps 4, 5, 6, and 7 in Fig. 5 can be executed in parallel over i. We implement a thread class TH_ASYNC_MAIN which executes Steps 4, 5, 6, and 7 of Fig. 5. For Step 2, we reuse the thread class TH_EVAL of the SMTM. As in the SMTM, we generate n_thread instances of each of these classes. Instances of TH_EVAL run in parallel synchronously, only once at t = 0. Instances of TH_ASYNC_MAIN run in parallel asynchronously until the termination criteria are met. As in the SMTM, we set n_thread to the number of cores of the computing platform.
1. t ← 0. Generate initial individuals I_i (i = 1, 2, ..., N) in P(0) randomly.
2. For each I_i in P(0), improve it by a local search (if we use a local search), and evaluate it.
3. Build the initial EHM from P(0).
4. Generate a new individual I′_i from I_i and the EHM.
5. For each I′_i, improve it by a local search (if we use a local search), and evaluate it.
6. Compare I_i and I′_i. If I′_i is better than I_i, update the EHM and set I′_i as I_i, for each i.
7. If the termination criteria are met, terminate the algorithm. Otherwise, go to Step 4 to process another individual.
Fig. 5. Algorithm description of the modified eEHBSA for AMTM. Boldface text indicates those steps that are executed in parallel threads.
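A corresponding asynchronous worker might look like the following sketch; nextIndex, ehmLock and the edge-update helpers are our placeholders for the shared state implied by Fig. 5, not names from the paper:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical AMTM worker: individuals are processed without a generation
// barrier; the incremental EHM update of Step 6 happens under a lock.
static Runnable makeWorker(Individual[] population, double[][] ehm,
                           Object ehmLock, AtomicInteger nextIndex) {
    return () -> {
        while (!terminated()) {
            int i = Math.floorMod(nextIndex.getAndIncrement(), population.length);
            Individual parent = population[i];
            Individual child = sampleWithTemplate(parent, ehm);   // Step 4
            localSearchAndEvaluate(child);                        // Step 5
            if (child.fitness < parent.fitness) {                 // Step 6
                synchronized (ehmLock) {
                    decrementEdges(ehm, parent);  // e[p][q] -= 1 for parent's edges
                    incrementEdges(ehm, child);   // e[p][q] += 1 for child's edges
                }
                population[i] = child;
            }
        }
    };
}
```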
Fig. 6 shows how the instances of the TH_ASYNC_MAIN class execute in parallel asynchronously in the AMTM. Here, we assume that 4 instances (threads) of TH_ASYNC_MAIN run in parallel. (1) shows the situation where individuals i = 1, 2, 3, and 4 are being processed in parallel, and (2) the situation where individual 2 has finished and execution of individual 5 has started. (3) shows the situation where individuals N − 3, N − 2, N − 1, and N are being processed in parallel. (4) and (5) show the situations where executions of individuals N − 1 and N − 2 have finished and executions of individuals 1 and 2 have started, respectively. In this way, individuals 1 and 2 are in the next generation while the other individuals remain in the previous generation. As seen in this figure, there is no common generation counter in the AMTM.
Fig. 6. Examples of execution in the AMTM; n_thread = 4 is assumed. Shaded circles show individuals being processed by threads of the TH_ASYNC_MAIN class.
4 Results of Parallelization of eEHBSA

4.1 Performance of the Single eEHBSA
Before running parallel eEHBSA, we confirm the performance characteristics of the non-parallelized eEHBSA for various values of the parameter γ introduced in eEHBSA. The machine we used has one Intel Core i7 965 (3.2 GHz) processor, 6 GB of main memory, and 32-bit Windows XP. The code is written in Java. We measure performance by the number of runs in which the algorithm succeeded in finding the optimal solution (#OPT) and the average time, in seconds, to find optimal solutions in successful runs (Tavg). We set B_ratio to 0.005, and 20 runs were performed in each experiment. First, we ran eEHBSA on two small TSP instances, berlin52 and pr76, without local search. The population size is set to 2 × L, and the maximum number of solution constructions (S_max) is set to 20,000 × L. Next, we ran eEHBSA combined with local searches. For problems comprising hundreds of cities, i.e., lin318 and att532, we applied 3-OPT local search; the population size is set to L/15 and S_max to 2000 × L. For problems comprising thousands of cities, i.e., fl3795 and rl5934, we applied the Lin-Kernighan heuristic (LK); here, we used the Chained LK of the Concorde TSP solver [14]. Concorde showed good performance in our previous studies on the cunning Ant System (cAS) [15]. Since the current eEHBSA is written in Java, Concorde (written in C) was combined with eEHBSA via JNI. The population size is set to 4 and S_max to L/10. In these experiments, executions were terminated when the number of solution constructions reached S_max or the optimal solution was found. Fig. 7 shows #OPT and Tavg for various γ ∈ [0.1, 1] at intervals of 0.1. Recall that γ specifies the number of nodes to be generated using the EHM (the remaining nodes are taken from the template).
[Fig. 7. Performance of eEHBSA for various γ. Six panels plot #OPT (left axis) and Tavg in seconds (right axis) against γ ∈ [0.1, 1]: berlin52 and pr76 under (1) no LS, lin318 and att532 under (2) 3-OPT, and fl3795 and rl5934 under (3) LK.]
From these results, we can confirm a clear effect of using partial solutions in all cases, whether eEHBSA is combined with a local search or not. We can see that small γ values, i.e., in [0.2, 0.5], give larger #OPT and smaller Tavg for each instance. Although we show results for only six instances in Fig. 7 due to space limitations, similar results were obtained for the other instances.
4.2 Performance of eEHBSA in Multi-Thread Runs
We ran parallel eEHBSA on the following 10 instances: (1) small instances solved without local search, i.e., oliver30, gr48, berlin52, and pr76; (2) instances comprising hundreds of cities solved with 3-OPT, i.e., lin318, pcb442, att532, and rat783; and (3) instances comprising thousands of cities solved with LK, i.e., fl3795 and rl5934. The experimental conditions are the same as in Section 4.1 except for the γ value. We fixed γ = 0.3; with this value, eEHBSA showed stably higher performance than with other γ values on all instances in the experiments of Section 4.1. Since the Intel Core i7 965 processor has 4 cores, we set n_thread to 4. Table 1 summarizes the results. In all experiments, #OPT = 20 was obtained. The Speedup indicates (Tavg of the non-parallelized eEHBSA) / (Tavg of the parallelized eEHBSA), not the ratio of computation times at a fixed number of solution constructions. We conducted a two-sided t-test between the Tavg of the SMTM and of the AMTM to show the statistical significance of the obtained results.

Table 1. Results of eEHBSA with multi-thread models
Entries are Tavg in seconds (SE) [two-sided 95% confidence interval]; SE: standard error; n_thread = 4 for SMTM and AMTM; the p-value is for the t-test between SMTM and AMTM.

oliver30, no LS:  non-parallel 0.082 (0.004) [0.074, 0.091];  SMTM 0.054 (0.004) [0.045, 0.063], Speedup 1.5;  AMTM 0.024 (0.002) [0.020, 0.028], Speedup 3.4;  p = 5.93E-07
gr48, no LS:      non-parallel 0.60 (0.04) [0.53, 0.67];      SMTM 0.29 (0.01) [0.26, 0.31], Speedup 2.1;      AMTM 0.16 (0.01) [0.15, 0.17], Speedup 3.8;      p = 1.31E-09
berlin52, no LS:  non-parallel 0.57 (0.03) [0.51, 0.64];      SMTM 0.29 (0.02) [0.25, 0.33], Speedup 2.0;      AMTM 0.13 (0.00) [0.12, 0.14], Speedup 4.3;      p = 1.58E-07
pr76, no LS:      non-parallel 3.71 (0.14) [3.40, 4.01];      SMTM 1.46 (0.07) [1.31, 1.61], Speedup 2.5;      AMTM 0.86 (0.11) [0.74, 1.20], Speedup 4.3;      p = 8.01E-04
lin318, 3-OPT:    non-parallel 2.57 (0.64) [1.23, 3.91];      SMTM 0.84 (0.06) [0.70, 0.98], Speedup 3.1;      AMTM 0.76 (0.04) [0.68, 0.84], Speedup 3.4;      p = 0.310
pcb442, 3-OPT:    non-parallel 12.49 (2.21) [7.85, 17.12];    SMTM 3.12 (0.28) [2.54, 3.70], Speedup 4.0;      AMTM 2.92 (0.26) [2.38, 3.47], Speedup 4.3;      p = 0.606
att532, 3-OPT:    non-parallel 73.84 (12.29) [48.11, 99.56];  SMTM 21.84 (3.00) [15.57, 28.11], Speedup 3.4;   AMTM 19.41 (2.78) [13.81, 25.02], Speedup 3.8;   p = 0.549
rat783, 3-OPT:    non-parallel 139.48 (7.57) [123.63, 155.32]; SMTM 38.77 (2.72) [33.07, 44.47], Speedup 3.6;  AMTM 38.37 (2.75) [32.62, 44.12], Speedup 3.6;   p = 0.918
fl3795, LK:       non-parallel 280.88 (34.70) [208.24, 353.52]; SMTM 105.55 (19.08) [65.60, 145.50], Speedup 2.7; AMTM 85.51 (12.28) [59.81, 111.21], Speedup 3.3; p = 0.384
rl5934, LK:       non-parallel 807.27 (76.04) [648.12, 966.43]; SMTM 252.44 (26.51) [196.96, 307.93], Speedup 3.2; AMTM 209.62 (16.91) [174.22, 245.01], Speedup 3.9; p = 0.183
The p-values are also given in Table 1, together with the two-sided 95% confidence interval of Tavg for each experiment. First, let us observe the results with the AMTM. The values of Speedup range from 3.3 to 4.3 across the instances; the speedup values in Table 1 that go above 4 can be understood by looking at the confidence intervals listed in the table. Thus, a reasonable speedup can be obtained with the AMTM. Next, let us observe the results with the SMTM. On the small instances (oliver30, gr48, berlin52, and pr76), the values of Speedup range from 1.5 to 2.5, much smaller than with the AMTM. On the larger instances, comprising hundreds of cities with 3-OPT (lin318, pcb442, att532, and rat783) and thousands of cities with LK (fl3795 and rl5934), the values of Speedup with the SMTM are again worse than those with the AMTM for all instances. However, on these instances the differences were not statistically significant (the p-values of the t-test were not very small). To investigate this, let us check the distribution of the computation time of single eEHBSA on oliver30, att532, and rl5934 (as representatives of small, medium, and large instances).

[Fig. 8. Distribution of computation time of eEHBSA for oliver30 (without local search), att532 (with 3-OPT), and rl5934 (with LK), split into generating new individuals, evaluation and LS, building the EHM, and other. oliver30: generating new individuals 79%, evaluation and LS 12%, building EHM 1%, other 7%; att532: generating new individuals 7%, evaluation and LS 92%, building EHM 0%, other 1%; rl5934: generating new individuals 0%, evaluation and LS 97%, building EHM 0%, other 3%.]

On oliver30, the time for generating new individuals occupies about 80% of the algorithm's computation time. In generating new strings with eEHBSA, the number of sampled nodes varies among individuals according to Eq. (2); furthermore, the time for sampling nodes according to Eq. (3) is itself probabilistically distributed. Thus, in the SMTM there is waiting time for the synchronization of these sampling processes, whereas no such waiting time exists in the AMTM. As a result, we can see a clear advantage of the AMTM over the SMTM on small instances. On att532, only 7% of the total time is spent generating new individuals, and on rl5934 this time is insignificant. Although the advantage of the AMTM over the SMTM can still be observed for eEHBSA with local searches, it is smaller than for eEHBSA without local search.
5 Consideration on a Future Direction: From Multi-core to Many-core
As a new parallel computation scheme, there has recently been growing interest in developing parallel algorithms using graphics processing units, or GPUs [16].
In this approach, more than 1000 threads can run in parallel on one GPU. Although there are many parallel computations in scientific computing fields [16], only a few studies on parallel GAs with GPU computation have been reported, e.g. [17]. In this section we give a brief consideration of the possibilities of using many-core processors with GPGPUs to run PEAs to solve the quadratic assignment problem (QAP), referring to our previous study [18]. The problem sizes of QAPs in real-life problems are relatively small compared with other problems in permutation domains, such as the TSP. This enables us to use the limited resources of a GPU effectively. Since the QAP is one of the most difficult problems among problems in permutation domains, it is a good testbed for an early evaluation of PEAs with GPGPU. As the parallel method, a multiple-population, coarse-grained GA model was adopted, as in [17]. Each subpopulation is evolved in the shared memory (SM) of one multiprocessor (MP) of a CUDA GPU (see Fig. 9). At intervals of 500 generations, all individuals in the subpopulations are shuffled via the VRAM of the GPU. Here we do not apply any local search, in order to observe the pure run time of EAs on the GPU. The GA model of each subpopulation is very similar to that of Fig. 3. However, instead of using eEHBSA, here we used PMX. The PMX operator produces two offspring from two parents, but we generated only one offspring, as is done in GENITOR [19]. For each I_i in Fig. 3, we choose its partner I_{i,j} from subpopulation P(t) randomly (i ≠ j). Then, we obtain one offspring I′_i in W(t) from I_i and I_{i,j} with the PMX operator. For the mutation operator, we used a swap mutation, where the values of two randomly chosen positions in a string are exchanged, with a rate of 0.1. We used the NVIDIA GeForce GTX285 GPU, which has 30 multiprocessors (MPs); each MP has 8 stream processors (SPs) sharing 16 KB of high-speed memory (SM), as shown in Fig. 9.

[Fig. 9. NVIDIA GeForce GTX285 GPU: 30 multiprocessors, each with 8 stream processors (SPs) and a shared memory (SM), connected to the VRAM (global memory).]

To use this machine's features efficiently, we divide the individuals into subpopulations of size 128 each, and we allocate the population P(t) and working pool W(t) (see Fig. 3) to the SM of each MP. We define the total number of subpopulations as 30 × k (k = 1, 2, ...); thus, the total number of individuals is 128 × 30 × k. The parameter k determines the total thread number in CUDA programming, and the total thread number equals the total number of individuals. To load as many individuals as possible into SM, we represent an individual as an array of type unsigned char. This restricts the problem size we can solve to at most 255; however, this size is sufficient for solving QAPs. Let L be the problem size of a given QAP instance. Then, the size of the population pool P(t)
and working pool W(t) is 2L × N, where N is the subpopulation size. We have only 16 KB of SM per MP. To maximize both L and N, we chose N = 128 and the shape of a thread block as 128 × 1 × 1, under the assumption that L is at most 56. We stored the distance matrix and the flow matrix of the QAP in constant memory so that they can be accessed via cache. To save memory space for these matrices, unsigned short was used for their elements. We used the same PC described in Section 4 with one NVIDIA GeForce GTX285. For CUDA program compilation, Microsoft Visual Studio 2005 Professional Edition with optimization option /O2 and the CUDA 2.1 SDK were used. The QAP instances on which this algorithm was tested were taken from QAPLIB; 10 runs were performed for each instance. We measured the performance by Tavg (see Section 4). We increased the parameter k from 1 to 6 with step size 1 until #OPT ≥ 9 was obtained. To compare the results with GPU computation, we designed two types of GA on the CPU, GA1 and GA2. GA1 is logically identical to the parallel GA on the GPU. GA2 was obtained by tuning GA1; it also consists of 30 × k (k = 1, 2, ...) subpopulations, but in GA2 only 5% of the individuals are randomly exchanged among subpopulations, every 50 generations. Results are shown in Fig. 10. Although the speedup ratios of GPU computation are modest, differ depending on the QAP instance, and lie in the range from 2.9 to 12.6 against GA2, we can observe a definite speedup with GPU computation on the instances in this experiment.

[Fig. 10. Experimental results of GPU computation on the QAP instances tai25b, kra30a, kra30b, tai30b, tai35b, ste36b, and tai40b: speedup against CPU computation with one core, for GA1 and GA2.]
6 Conclusions
In this paper we proposed methods for parallelizing an evolutionary algorithm using multi-thread programming on a PC with a multi-core processor. The parallelization models were designed using eEHBSA to speed up execution of the algorithm. We proposed two types of parallel models: one was a synchronous multi-thread model (SMTM), and the other an asynchronous multi-thread model (AMTM). Experiments were performed using TSP instances ranging from small to large. The results showed that both parallel methods reduced computation times nearly in proportion to the number of cores for all test problems. The AMTM produced especially good run-time results for small TSP instances
without local search. For larger instances with local searches, the differences between the two approaches become smaller. Much future work remains. Although in this study we used the TSP as the test problem, testing parallel eEHBSA on other sequencing problems remains for future study. We focused mainly on running PEAs on a computing platform with multi-core processors; however, we also presented preliminary tests of PEAs with many-core GPUs showing better results than CPU computation. This is an area open to future study. Although we obtained promising results, there are many potential improvements that can be made to our methods. Further, currently available GPUs have problems attacking large-size problems, such as the TSP, because they have relatively small shared memories available among processing units. Even so, GPU computation remains an important area of future work in this line of study.
Acknowledgements. Research in Section 5 was performed in collaboration with Prof. N. Fujimoto, Osaka Prefecture University. This research is partially supported by the Ministry of Education, Culture, Sports, Science and Technology of Japan under Grant-in-Aid for Scientific Research No. 19500199.
References

1. Cantú-Paz, E.: Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic Publishers, Dordrecht (2000)
2. Alba, E.: Parallel Metaheuristics: A New Class of Algorithms. John Wiley & Sons, NJ (2005)
3. Tsutsui, S.: Probabilistic model-building genetic algorithms in permutation representation domain using edge histogram. In: Guervós, J.J.M., Adamidis, P.A., Beyer, H.-G., Fernández-Villacañas, J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS, vol. 2439, pp. 224–233. Springer, Heidelberg (2002)
4. Pelikan, M., Goldberg, D., Lobo, F.: A survey of optimization by building and using probabilistic models. Computational Optimization and Applications 21(1), 5–20 (2002)
5. Larrañaga, P., Lozano, J.: Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation (2002)
6. Madera, J., Alba, E., Ochoa, A.: Parallel estimation of distribution algorithms. In: Alba, E. (ed.) Parallel Metaheuristics: A New Class of Algorithms, pp. 203–222. John Wiley & Sons, Chichester (2005)
7. Bosman, P., Thierens, D.: Permutation optimization by iterated estimation of random keys marginal product factorizations. In: Proc. of the 2002 Parallel Problem Solving from Nature, pp. 331–340 (2002)
8. Robles, V., Miguel, P., Larrañaga, P.: Solving the traveling salesman problem with EDAs. In: Estimation of Distribution Algorithms, pp. 211–229 (2002)
9. Starkweather, T., McDaniel, S., Mathias, K., Whitley, D., Whitley, C.: A comparison of genetic sequence operators. In: Proc. of the 4th Inter. Conf. on Genetic Algorithms, pp. 69–76. Morgan Kaufmann, San Francisco (1991)
10. Tsutsui, S., Miki, M.: Using edge histogram models to solve flow shop scheduling problems with probabilistic model-building genetic algorithms (2004)
11. Tsutsui, S., Wilson, G.: Solving capacitated vehicle routing problems using edge histogram based sampling algorithms. In: Proc. of the 2004 Congress on Evolutionary Computation (CEC 2004), pp. 1150–1157 (2004)
12. Tsutsui, S., Pelikan, M., Goldberg, D.E.: Using edge histogram models to solve permutation problems with probabilistic model-building genetic algorithms. IlliGAL Report No. 2003022 (2003)
13. Dorigo, M., Maniezzo, V., Colorni, A.: The ant system: Optimization by a colony of cooperating agents. IEEE Trans. on SMC-Part B 26(1), 29–41 (1996)
14. Applegate, D., Bixby, R., Chvatal, V., Cook, W.: Concorde TSP solver, ANSI C code as gzipped tar file (2006), http://www.tsp.gatech.edu/concorde.html
15. Tsutsui, S.: cAS: Ant colony optimization with cunning ants. In: Runarsson, T.P., Beyer, H.-G., Burke, E.K., Merelo-Guervós, J.J., Whitley, L.D., Yao, X. (eds.) PPSN 2006. LNCS, vol. 4193, pp. 162–171. Springer, Heidelberg (2006)
16. NVIDIA: CUDA Programming Guide 2.1 (2009)
17. Maitre, O., Baumes, L.A., Lachiche, N., Corma, A., Collet, P.: Coarse grain parallelization of evolutionary algorithms on GPGPU cards with EASEA. In: GECCO 2009: Proc. of the 11th Annual Conference on Genetic and Evolutionary Computation, pp. 1403–1410. ACM, New York (2009)
18. Tsutsui, S., Fujimoto, N.: Solving quadratic assignment problems by genetic algorithms with GPU computation: A case study. In: Proc. of the GECCO 2009 Workshop on Computational Intelligence on Consumer Games and Graphics Hardware (CIGPU-2009), pp. 2523–2530. ACM, New York (2009)
19. Whitley, D.: The GENITOR algorithm and selective pressure: Why rank-based allocation of reproductive trials is best. In: Proc. of the 3rd Inter. Conf. on Genetic Algorithms, pp. 116–121. Morgan Kaufmann, San Francisco (1989)
On the Difficulty of Inferring Gene Regulatory Networks: A Study of the Fitness Landscape Generated by Relative Squared Error

Francesco Sambo¹, Marco A. Montes de Oca², Barbara Di Camillo¹, and Thomas Stützle²

¹ Dipartimento di Ingegneria dell'Informazione, Università di Padova, Padua, Italy
{sambofra,dicamill}@dei.unipd.it
² IRIDIA-CoDE, Université Libre de Bruxelles, Brussels, Belgium
{mmontes,stuetzle}@ulb.ac.be
Abstract. Inferring gene regulatory networks from expression profiles is a challenging problem that has been tackled using many different approaches. When posed as an optimization problem, the typical goal is to minimize the value of an error measure, such as the relative squared error, between the real profiles and those generated with a model whose parameters are to be optimized. In this paper, we use dynamic recurrent neural networks to model regulatory interactions and study systematically the “fitness landscape” that results from measuring the relative squared error. Although the results of the study indicate that the generated landscapes have a positive fitness-distance correlation, the error values span several orders of magnitude over very short distance variations. This suggests that the fitness landscape has extremely deep valleys, which can make general-purpose state-of-the-art continuous optimization algorithms exhibit very poor performance. Further results, obtained from an analysis based on perturbations of the optimal network topology, support approaches in which the spaces of network topologies and of network parameters are decoupled.
1 Introduction
In the cells of living organisms, genes are transcribed into mRNA (messenger RNA) molecules which, in turn, are translated into proteins [7]. Some proteins, called transcription factors, can increase (activate) or decrease (inhibit) the transcription rates of genes; other proteins can control the translation of mRNA into new proteins. The process whereby genes control, indirectly via the proteins they encode, the expression (i.e., the mRNA transcription rate) of other genes, is known as genetic regulation [7]. Knowing the regulatory relations among genes is important for understanding fundamental processes that occur within living cells. DNA microarray technology [19] has enabled researchers to monitor the expression of the whole genome under various genetic, chemical and environmental perturbations. The output data from DNA microarray experiments, in the form
of gene expression time series, can be used to infer a gene regulatory network (GRN). A GRN is a graph in which the nodes, representing genes or proteins, are connected by an edge if a regulatory relation exists between them. Different approaches have been adopted in the literature to model and infer GRNs from DNA microarray experiments [11, 8, 26]. A very common approach for inferring a GRN is to cast the problem as one of optimizing the free variables of a model that is capable of generating time expression profiles. In this case, the goal of the optimization process is to minimize a cost function quantifying the differences between the real temporal profiles and the profiles generated with the current estimation of the model’s parameters. Unfortunately, the problem of inferring GRNs from gene expression profiles using optimization techniques has proved to be difficult even when dealing with very small networks (5-10 genes) [24]. In this paper, we address the issue of the difficulty of inferring GRNs by performing an analysis based on the notion of fitness-distance correlation (FDC) [10, 9]. To model regulatory interactions we chose dynamic recurrent neural networks (RNNs) [15], which model the set of genes as a system of nonlinear differential equations, and we adopted the relative squared error (RSE) as a measure of the lack of accuracy of time profiles generated by an inferred network with respect to those of a target GRN. As a first contribution, we present an analysis of the error surface generated by the combination RNN-RSE (Section 4.1). The main result of this analysis is that the RNN-RSE error surface has a strong positive fitness-distance correlation; however, the data also shows the existence of many local optima of extreme depth, which seems to be the main cause for the poor performance shown by optimization algorithms on this problem. A second contribution is the quantification of the effect that a priori information on the target’s GRN structure has on the fitness landscape (Section 4.1). The final contribution is the analysis of the behavior of a state-of-the-art continuous optimization algorithm (NEWUOA [22] with multiple restarts) on the problem with and without a priori network structure information (Section 4.2). The results obtained from this analysis constitute strong evidence in favor of inference approaches in which the spaces of network topologies and of network parameters are decoupled.
2 Modeling Gene Regulatory Networks
Many mathematical models exist in the literature to describe gene regulatory interactions: Relevance Networks [17], Boolean Networks [16], Dynamic Bayesian Networks [5] and systems of additive or differential equations, whether linear [1], ordinary nonlinear [6, 13, 23, 25, 27, 28] (including recurrent neural networks) or S-systems [21, 14, 24]. Systems of equations are commonly used as a modeling tool by the metaheuristics community, because the problem of fitting the model to data can easily be mapped to an optimization problem. In that case, the model's parameters form the search space and the fitness function is usually a variant of the error between the real temporal profile and the one estimated from the fitted model.
Linear and additive models lack the capability to capture real regulatory relations, which in general are highly nonlinear and differential; S-systems, on the contrary, are suited to accurately describe the behavior of small sets of genes, but are impractical in large-scale scenarios because of their high number of free parameters (2n(n + 1) for a network of n genes). Considering the limitations of the methods mentioned above, we use dynamic recurrent neural networks (RNNs) [15], which model the set of genes with a system of nonlinear differential equations of the form

$$\frac{dx_i}{dt} = \frac{k_1}{1 + \exp\left(-\left(\sum_{j=1...n} w_{ij} x_j + b\right)\right)} - k_2 x_i, \qquad i = 1...n \qquad (1)$$
where n is the number of genes in the system, x_i is the rate of expression of gene i, w_ij represents the relative effect of gene j on gene i (1 ≤ i, j ≤ n), b is a bias coefficient, k_1 is the maximal rate of expression and k_2 is the degradation rate. For our analysis, we set, for simplicity, b = 0, k_1 = 1 and k_2 = 1. The search space for an optimization algorithm, then, is formed by the matrix W of coefficients w_ij. An identical model is suggested in [25] for the analysis of microarray data from an experiment on the Saccharomyces cerevisiae cell cycle, and is adopted in [27] and [28] for reverse engineering algorithms based on particle swarm optimization [12]. In the latter two cases, however, derivatives are approximated with finite differences and estimated from temporal data; such an approach amplifies the effects of noise and requires a large number of data points. Thus, we decided to keep the derivatives and to generate temporal profiles by numerical integration of the whole system. For this purpose, we chose a Runge-Kutta-Fehlberg method with adaptive step size control [4].
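To make the model concrete, the following Java sketch evaluates the right-hand side of Eq. (1) with b = 0 and k_1 = k_2 = 1, as set above, and integrates it numerically; the fixed-step RK4 scheme is a simplification of ours, standing in for the adaptive Runge-Kutta-Fehlberg integrator actually used.

```java
// Hypothetical sketch of the RNN model: w[i][j] is the effect of gene j on
// gene i, and x holds the current expression rates.
static double[] derivative(double[][] w, double[] x) {
    int n = x.length;
    double[] dx = new double[n];
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int j = 0; j < n; j++) s += w[i][j] * x[j];
        dx[i] = 1.0 / (1.0 + Math.exp(-s)) - x[i];   // k1/(1+e^{-s}) - k2*x_i
    }
    return dx;
}

// Classical fixed-step RK4, used here in place of adaptive Runge-Kutta-Fehlberg.
static double[] integrate(double[][] w, double[] x0, double h, int steps) {
    double[] x = x0.clone();
    for (int s = 0; s < steps; s++) {
        double[] k1 = derivative(w, x);
        double[] k2 = derivative(w, eulerStep(x, k1, h / 2));
        double[] k3 = derivative(w, eulerStep(x, k2, h / 2));
        double[] k4 = derivative(w, eulerStep(x, k3, h));
        for (int i = 0; i < x.length; i++)
            x[i] += h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]);
    }
    return x;
}

static double[] eulerStep(double[] x, double[] k, double h) {
    double[] y = x.clone();
    for (int i = 0; i < y.length; i++) y[i] += h * k[i];
    return y;
}
```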
3 Experimental Dataset
Experimental data are generated with the simulator recently introduced in [3]. In this simulator, the regulatory network's topology is generated according to current knowledge of biological network organization, including a scale-free distribution of the connectivity¹ and a clustering coefficient independent of the number of nodes in the network. The resulting networks are very sparse: the number of edges is at most twice the number of nodes, so the majority of elements in the connectivity matrix are equal to zero. Nonzero elements of the matrices generated by the simulator (the w_ij terms in Equation 1) are then set uniformly at random in the range [−10, 10]. To generate simulated gene expression time series, the expression of each gene is initialized uniformly at random and the system is left free to evolve to a steady
¹ For networks in the size range of the ones we consider in this paper, the scale-free distribution cannot be properly defined. However, the generated networks exhibit the main properties of scale-free networks: they are sparse, with a few highly connected nodes and a large number of loosely connected nodes.
state. Gene profiles are then sampled with logarithmic time spacing, so that the majority of samples are taken right after the initialization. This practice is common in real microarray experiments, because meaningful information usually concentrates right after the external stimulation of a dynamical system. The analyses reported in this paper are carried out on gene networks of size 10, in line with experimental results from the state of the art [24, 28, 27].
4 Analysis
To investigate the structure of the fitness landscape of our optimization problem, we performed a fitness-distance correlation analysis [10, 9]. We randomly sampled interesting areas of the search space and studied, for the sampled solutions, the distribution of fitness values versus distance from the optimal solution. In our case, a fitness value is considered better if the solution associated with it has a lower value of the objective function. As fitness function, we used the relative squared error (RSE) between real and estimated temporal profiles, which is defined as

$$RSE = \frac{1}{Tn} \sum_{t=1}^{T} \sum_{i=1}^{n} \frac{\left[\hat{x}_i(t) - x_i(t)\right]^2}{x_i^2(t)} \qquad (2)$$
where n is the number of genes, T is the number of time samples, x_i(t) is the real value for gene i at time t and x̂_i(t) is the estimated value for the same sample. Preliminary analyses with the mean squared error, another measure widely used as a fitness function, showed the same behavior for the two types of error, so we concentrated the study on RSE only. As the distance measure between candidate solutions and the optimal solution, we used the Euclidean distance. Fitness-distance correlation analysis is a standard tool for search space analysis that is used in many research efforts on evolutionary algorithms and has led to a number of interesting insights; see [18] for an example.
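The two quantities correlated in the analysis can be sketched directly; the Pearson form of the fitness-distance correlation coefficient is an assumption consistent with [10], and x/xHat are our placeholder names for the target and estimated profiles:

```java
// Hypothetical sketch of the RSE of Eq. (2): x[i][t] are the real profiles,
// xHat[i][t] the profiles generated by an inferred network.
static double rse(double[][] x, double[][] xHat) {
    int n = x.length, T = x[0].length;
    double sum = 0.0;
    for (int t = 0; t < T; t++)
        for (int i = 0; i < n; i++) {
            double d = xHat[i][t] - x[i][t];
            sum += d * d / (x[i][t] * x[i][t]);
        }
    return sum / (T * n);
}

// Pearson correlation between fitness values and distances to the optimum.
static double fdc(double[] fitness, double[] distance) {
    int n = fitness.length;
    double mf = 0, md = 0;
    for (int k = 0; k < n; k++) { mf += fitness[k]; md += distance[k]; }
    mf /= n; md /= n;
    double cov = 0, vf = 0, vd = 0;
    for (int k = 0; k < n; k++) {
        cov += (fitness[k] - mf) * (distance[k] - md);
        vf  += (fitness[k] - mf) * (fitness[k] - mf);
        vd  += (distance[k] - md) * (distance[k] - md);
    }
    return cov / Math.sqrt(vf * vd);
}
```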
4.1 Fitness-Distance Correlation Analysis
Three types of analysis have been performed. In the first, we introduce perturbations that affect any of the n² matrix elements, that is, zero and nonzero elements. In the second, only nonzero elements are affected. Finally, in the third, we perturb the structure of the network, changing the pattern of nonzero elements.

Step 1. As a first step of our analysis, we explored the relation between fitness and distance for a set of random perturbations of the optimal solution: each element of the optimal matrix was perturbed by the addition of a log-uniformly distributed random variable (i.e., a random variable uniformly distributed in logarithmic scale) in the interval [10⁻ᵃ, 10⁰], where a was tuned to account for different problem sizes. Results for 10000 iterations of the perturbation procedure on two networks of 10 genes are shown in Fig. 1.
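A sketch of this perturbation follows; the random sign of the additive term is our assumption, since the text only specifies the magnitude range [10⁻ᵃ, 10⁰]:

```java
import java.util.Random;

// Hypothetical Step-1 perturbation: add to every element of the optimal
// matrix a term whose magnitude is log-uniform in [10^-a, 10^0].
static double[][] perturb(double[][] wOpt, double a, Random rnd) {
    int L = wOpt.length;
    double[][] w = new double[L][L];
    for (int i = 0; i < L; i++)
        for (int j = 0; j < L; j++) {
            double mag = Math.pow(10.0, -a * rnd.nextDouble()); // exponent uniform in [-a, 0]
            w[i][j] = wOpt[i][j] + (rnd.nextBoolean() ? mag : -mag);
        }
    return w;
}
```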
Fig. 1. RSE vs. Euclidean distance of 10000 log-uniform perturbations of the elements of the optimal system matrix, for two networks of 10 genes. (a) Example of widely distributed samples (network 1); (b) example of more narrowly distributed samples (network 3).
As can be seen from the figure, there is a strong correlation between Euclidean distance and RSE, because the samples distribute along a band with positive slope; but the band is rather wide (approximately 10 orders of magnitude of RSE in Fig. 1(a) and 6 orders in Fig. 1(b)), leading to an average correlation coefficient of 0.471. We formulate the hypothesis that the difficulty of solving a particular problem instance is closely related to the width of the band in the fitness-distance plot. A large band width, in fact, suggests the presence of extremely deep valleys in the fitness landscape, in which a general-purpose continuous optimization algorithm can keep decreasing the RSE without getting closer to the optimal solution. The perturbation procedure was repeated for 20 different problem instances of 10 genes. For the majority of them (17 out of 20) the band in the RSE vs. distance plot exhibits a width close to that of network 3 (Fig. 1(b)); for the remaining instances the width is larger, close to that of network 1 (Fig. 1(a)). Therefore, we use networks 1 and 3 throughout the paper as two representative examples of problem instances, to validate our hypothesis empirically.
Fig. 2. RSE vs. Euclidean distance of 10000 log-uniform perturbations of the nonzero elements of the optimal system matrix, for two networks of 10 genes. Plots are cut to keep the same scale adopted in the other figures; the diagonal lines continue with the same behavior down to 10^−15 for Euclidean distance and 10^−35 for RSE.
To this end, we perturbed only the nonzero elements of the optimal solution of each problem instance, fixing the other elements to zero. As before, perturbations were obtained by the addition of a log-uniformly distributed random variable. RSE vs. Euclidean distance of 10000 perturbations for networks 1 and 3 are shown in Fig. 2. Even though the average correlation coefficient is 0.424, slightly lower than that of the previous step, the fitness-distance plots tend to be more structured: as is clear from the figure, most of the samples lie on straight lines parallel to the bands of the previous experiment, and the vertical span of the lines reflects the width of the bands. Further experiments (data not shown) showed that each line corresponds to a single nonzero element of the weight matrix. This suggests that some variables may be optimized independently. Such a hypothesis was not necessarily evident from the mathematical description of the system and should be explored in future work.

Step 3. We then decided to further explore the shape of the fitness landscape in regions close in structure to the global optimum. For this purpose, we exploited the concept of Hamming distance between two connectivity matrices, i.e., the number of bits that differ between the two matrices, and we randomly sampled Boolean matrices at Hamming distance 1, 2, 5 and 10 from the global optimum. We then kept the original values for elements that are nonzero in both matrices, the original one and the sampled one, and set new values for the other nonzero elements, drawing them uniformly at random from the interval [−10, 10]. 10000 samples for each value of Hamming distance are shown in Fig. 3, where lighter gray corresponds to higher Hamming distance, for networks 1 and 3.
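A possible reading of this structure perturbation in code, under our simplifying assumption that a Hamming distance of h corresponds to flipping h distinct entries of the connectivity pattern:

```java
import java.util.Random;

// Hypothetical Step-3 perturbation: flip h distinct entries of the optimal
// structure; entries turned on receive a fresh uniform value in [-10, 10],
// entries turned off are zeroed; shared nonzero entries keep their values.
static double[][] perturbStructure(double[][] wOpt, int h, Random rnd) {
    int L = wOpt.length;
    double[][] w = new double[L][L];
    for (int i = 0; i < L; i++) w[i] = wOpt[i].clone();
    boolean[][] flipped = new boolean[L][L];
    int done = 0;
    while (done < h) {
        int i = rnd.nextInt(L), j = rnd.nextInt(L);
        if (flipped[i][j]) continue;          // ensure h distinct positions
        flipped[i][j] = true;
        done++;
        if (w[i][j] != 0.0) w[i][j] = 0.0;                 // remove an edge
        else w[i][j] = -10.0 + 20.0 * rnd.nextDouble();    // add a random edge
    }
    return w;
}
```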
Fig. 3. RSE vs Euclidean distance of 10000 log-uniform perturbations of the optimal system matrix at Hamming distance 1, 2, 5, 10, for two networks of 10 genes
From the figure, it is evident that at higher Hamming distances there is no particular correlation between fitness and distance, but as the Hamming distance decreases the fitness vs. distance plot becomes more and more organized, approaching the global shape of the bands in Fig. 1. Indeed, the average correlation coefficients are 0.219, 0.196, 0.185 and 0.177 for networks at Hamming distance 1, 2, 5 and 10, respectively. At Hamming distance 1, samples tend to appear as curved lines and the structure of the plots becomes closer to that of Fig. 2. This latter analysis shows that portions of the search space corresponding to networks structurally close to the optimum (i.e., at a low Hamming distance) present more organization in the fitness landscape, and can thus act as a local basin of attraction for an algorithm that searches in the discrete space of network structures. To test the quality of a particular network structure, a second algorithm can be alternated with the first, to optimize the continuous nonzero values of the network; for this second algorithm, the probability of finding the optimal solution should increase as the network structure gets closer to the optimal structure.
4.2 Algorithm Behavior
The analysis presented above gives an overall picture of the fitness-distance relationship in the search space. In addition, it is of interest to study the behavior of a specific algorithm in the search space. The question we want to address is whether an algorithm is capable of inferring the structure of the target network using only the information provided by the RSE measure. If that is not the case, a second experiment consists in measuring the performance of the algorithm when the optimal network topology is known a priori. For our experiments, we use
NEWUOA [22], software for unconstrained continuous optimization in many dimensions that does not need information about the derivatives of the objective function f: R^n → R it is applied to. At each iteration, NEWUOA creates a quadratic model that interpolates k values of the objective function, which is used in a trust-region procedure [2] to update the variables. The main advantage of NEWUOA is that it can be used to solve large-scale optimization problems thanks to the reduced number of interpolation points it needs to build the quadratic model (usually k = 2m + 1 is recommended, where m is the number of variables to optimize). NEWUOA is considered a state-of-the-art continuous optimization technique [20]. By definition, trust-region methods search locally, which means that they may converge to some local optimum when the objective function is multimodal. For this reason, we used NEWUOA with multiple restarts, so as to explore different regions of the search space and reduce the chances of converging to low-quality local optima. In our setting, NEWUOA is restarted from a new initial solution after it has reached a maximum number of function evaluations, or when the final radius of the trust region reaches a certain threshold. In Table 1, we show the parameters used in our experiments. These parameters were chosen after an initial, non-exhaustive experimentation phase.

Table 1. Parameters used with NEWUOA with multiple restarts

Parameter                                                Value
Initial trust region radius                              0.2
Final trust region radius                                1 × 10^−10
Number of interpolation points                           k = 2m + 1, where m is the number of variables to optimize
Maximum number of function evaluations per NEWUOA run    2 × 10^4
Maximum total number of function evaluations             2 × 10^5 with structure information; 1 × 10^6 without structure information
Number of independent runs                               100
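The restart logic itself is simple; the sketch below treats NEWUOA as a black box with the budgets of Table 1 (Result, localSolver and randomPoint are hypothetical placeholders, not the NEWUOA API):

```java
// Hypothetical multiple-restart wrapper around a local solver such as NEWUOA.
static double[] multiRestart(long maxTotalEvals) {
    double best = Double.POSITIVE_INFINITY;
    double[] bestX = null;
    long evalsUsed = 0;
    while (evalsUsed < maxTotalEvals) {
        double[] x0 = randomPoint();                        // fresh initial solution
        Result r = localSolver.minimize(x0,
                0.2,                                        // initial trust region radius
                1e-10,                                      // final trust region radius
                Math.min(20_000, maxTotalEvals - evalsUsed));
        evalsUsed += r.evaluations;
        if (r.value < best) { best = r.value; bestX = r.point; }
    }
    return bestX;
}
```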
The results obtained from running NEWUOA with multiple restarts, without any a priori information about the correct topology of the target GRN, are shown in Fig. 4. Each shade of gray represents a run of the algorithm. Although the algorithm is capable of making progress in terms of the value of the objective function (it descends from a value in the order of 10^0 to a value in the order of 10^−5), it does not make any progress towards the actual target GRN. This can be seen from the (almost) vertical lines that appear in the upper right corner of the plots in Fig. 4.
Fig. 4. The progress of NEWUOA with multiple restarts on two 10-gene-network inference problems: (a) network 1; (b) network 3. Each shade of gray represents a run of the algorithm. The plots shown correspond to the case in which no a priori information about the correct topology of the target GRN is provided to the algorithm.
In Figure 5, we show the results obtained after running NEWUOA with multiple restarts when the correct topology of the target GRN was used by the algorithm, which is equivalent to reducing the size of the search space so that only nonzero entries are optimized. As before, each shade of gray represents a run of the algorithm. In this case, the behavior of the algorithm depends on the target network. With network 1, the algorithm moves towards the optimal solution while improving the value of the RSE in both cases over several orders of magnitude. However, in the vast majority of cases, the algorithm cannot find solutions that are closer than a distance of 100 to the optimal solution. In contrast, with network 3, the algorithm is capable to find the optimal solution in each run. The results presented above, together with those of the analysis based on structure perturbations, constitute strong evidence in favor of optimization algorithms that explicitly intertwine a network structure search phase with a network’s parameters search phase. A reduction in the distance from the optimal network topology allows a continuous optimization algorithm to make more progress toward the truly optimal solution. Although we tested our hypotheses using only one specific algorithm, we do not expect our observations to change substantially if another algorithm is used. This is because, as evidenced in Figure 5, even with a perfect information about the correct topology of the target GRN, the error surface generated by the RSE measure is still hard to search as it is multimodal in nature.
Fig. 5. The progress of NEWUOA with multiple restarts on two 10-gene-network inference problems: (a) network 1; (b) network 3. Each shade of gray represents a run of the algorithm. The plots shown correspond to the case in which the correct topology of the target GRN is provided to the algorithm.
5 Conclusions and Related Works
In this work, we presented a study of the fitness landscape of the gene regulatory network inference problem, when dynamic recurrent neural networks are adopted as the model of gene regulation and the relative squared error is chosen as the fitness function. As far as we know, this is the first fitness-distance correlation analysis for the gene regulatory network inference problem. The study consists of a fitness-distance correlation analysis of different random samplings around the problem's optimal solution, which takes the form of a weight matrix W. The optimal matrix was first perturbed globally, then only on its nonzero elements, and finally at fixed Hamming distance. Results show that the error surface has a strong positive fitness-distance correlation, but they also reveal the presence of extremely deep valleys in the fitness landscape, which are responsible for the poor performance of optimization algorithms not designed explicitly for this problem. The network structure perturbation analysis highlights that: (i) RSE alone is not sufficient to guide a search algorithm towards regions of the search space close to the global optimum; (ii) even if information about the optimal network structure is provided to the algorithm, convergence to the global optimum is not guaranteed, because the fitness landscape presents many deep local optima; and (iii) the closer a network structure is to that of the optimal solution, the higher the chances that an algorithm converges to the optimum. This last fact seems to be due to the higher level of organization of the fitness landscape in the proximity of the optimal structure.
Because of these observations, we conclude that a two-phase algorithm, which alternates between a search step in the discrete space of network structures and a search step in the continuous space of nonzero system parameters, has the potential of reaching high-quality solutions. Research in this direction has already been done, for example in [28, 23, 13], but no analysis of the underlying fitness landscape had been performed before.
References

1. Bansal, M., Belcastro, V., Ambesi-Impiombato, A., di Bernardo, D.: How to infer gene networks from expression profiles. Mol. Syst. Biol. 3(78) (February 2007)
2. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust-Region Methods. MPS-SIAM Series in Optimization. SIAM, Philadelphia (2000)
3. Di Camillo, B., Toffolo, G., Cobelli, C.: A gene network simulator to assess reverse engineering algorithms. Annals of the New York Academy of Sciences 1158(1), 125–142 (2009)
4. Fehlberg, E.: Low-order classical Runge-Kutta formulas with step size control and their application to some heat transfer problems. Technical Report 315, NASA (1969)
5. Ferrazzi, F., Sebastiani, P., Ramoni, M.F., Bellazzi, R.: Bayesian approaches to reverse engineer cellular systems: a simulation study on nonlinear Gaussian networks. BMC Bioinformatics 8(suppl. 5) (2007)
6. Gennemark, P., Wedelin, D.: Benchmarks for identification of ordinary differential equations from time series data. Bioinformatics 25(6), 780–786 (2009)
7. Hunter, L.: Life and its molecules: A brief introduction. AI Magazine - Special issue on AI and Bioinformatics 25(1), 9–22 (2004)
8. Ideker, T., Ozier, O., Schwikowski, B., Siegel, A.F.: Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18(suppl. 1), 233–240 (2002)
9. Jones, T.: Evolutionary algorithms, fitness landscapes and search. Working Papers 95-05-048, Santa Fe Institute (1995)
10. Jones, T., Forrest, S.: Fitness distance correlation as a measure of problem difficulty for genetic algorithms. In: Proceedings of the 6th International Conference on Genetic Algorithms, pp. 184–192. Morgan Kaufmann, San Francisco (1995)
11. de Jong, H.: Modeling and simulation of genetic regulatory systems: A literature review. Journal of Computational Biology 9(1), 67–103 (2002)
12. Kennedy, J., Eberhart, R., Shi, Y.: Swarm Intelligence. Morgan Kaufmann, San Francisco (2001)
13. Kentzoglanakis, K., Poole, M.J., Adams, C.: Incorporating heuristics in a swarm intelligence framework for inferring gene regulatory networks from gene expression time series. In: Dorigo, M., Birattari, M., Blum, C., Clerc, M., Stützle, T., Winfield, A.F.T. (eds.) ANTS 2008. LNCS, vol. 5217, pp. 323–330. Springer, Heidelberg (2008)
14. Kimura, S., Ide, K., Kashihara, A., Kano, M., Hatakeyama, M., Masui, R., Nakagawa, N., Yokoyama, S., Kuramitsu, S., Konagaya, A.: Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics 21(7), 1154–1163 (2005)
15. Kremer, S.C.: Field Guide to Dynamical Recurrent Networks. Wiley-IEEE Press, Chichester (2001)
16. Liang, S., Fuhrman, S., Somogyi, R.: REVEAL: a general reverse engineering algorithm for inference of genetic network architectures. In: Pacific Symposium on Biocomputing, pp. 18–29 (1998)
17. Margolin, A.A., Nemenman, I., Basso, K., Wiggins, C., Stolovitzky, G., Dalla Favera, R., Califano, A.: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7(suppl. 1) (2006)
18. Merz, P., Freisleben, B.: Fitness landscapes and memetic algorithm design. In: Corne, D., Dorigo, M., Glover, F. (eds.) New Ideas in Optimization, pp. 244–260. McGraw Hill, London (1999)
19. Molla, M., Waddell, M., Page, D., Shavlik, J.: Using machine learning to design and interpret gene-expression microarrays. AI Magazine - Special issue on AI and Bioinformatics 25(1), 23–44 (2004)
20. Moré, J.J., Wild, S.M.: Benchmarking derivative-free optimization algorithms. SIAM Journal on Optimization 20(1), 172–191 (2009)
21. Noman, N., Iba, H.: Reverse engineering genetic networks using evolutionary computation. Genome Informatics 16(2), 205–214 (2005)
22. Powell, M.J.D.: The NEWUOA software for unconstrained optimization. In: Large-Scale Nonlinear Optimization, Nonconvex Optimization and Its Applications, vol. 83, pp. 255–297. Springer, Berlin (2006)
23. Ressom, H.W., Zhang, Y., Xuan, J., Wang, Y., Clarke, R.: Inference of gene regulatory networks from time course gene expression data using neural networks and swarm intelligence. In: IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, pp. 1–8. IEEE, Los Alamitos (2006)
24. Spieth, C., Worzischek, R., Streichert, F., Supper, J., Speer, N., Zell, A.: Comparing evolutionary algorithms on the problem of network inference. In: Cattolico, M. (ed.) Genetic and Evolutionary Computation Conference, GECCO 2006, Proceedings, Seattle, Washington, USA, July 8-12, pp. 305–306. ACM, New York (2006)
25. Vu, T.T., Vohradsky, J.: Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae. Nucleic Acids Research 35(1), 279–287 (2007)
26. Xu, R., Hu, X., Wunsch II, D.: Inference of genetic regulatory networks from time series gene expression data. In: Proceedings of the International Joint Conference on Neural Networks, vol. 2, pp. 1215–1220. IEEE Press, Los Alamitos (2004)
27. Xu, R., Venayagamoorthy, G.K., Wunsch II, D.C.: Modeling of gene regulatory networks with hybrid differential evolution and particle swarm optimization. Neural Networks 20(8), 917–927 (2007)
28. Xu, R., Wunsch II, D., Frank, R.: Inference of genetic regulatory networks with recurrent neural network models using particle swarm optimization. IEEE/ACM Trans. Comput. Biol. Bioinformatics 4(4), 681–692 (2007)
Memetic Algorithms for Constructing Binary Covering Arrays of Strength Three

Eduardo Rodriguez-Tello and Jose Torres-Jimenez

CINVESTAV-Tamaulipas, Information Technology Laboratory, Km. 6 Carretera Victoria-Monterrey, 87276 Victoria Tamps., Mexico
[email protected],
[email protected]
Abstract. This paper presents a new Memetic Algorithm (MA) designed to compute near-optimal solutions for the covering array construction problem. It incorporates several distinctive features, including an efficient heuristic to generate a good quality initial population, and a local search operator based on a fine-tuned Simulated Annealing (SA) algorithm employing a carefully designed compound neighborhood. Its performance is investigated through extensive experimentation over well-known benchmarks and compared with other state-of-the-art algorithms, showing improvements on some previous best-known results.

Keywords: Memetic Algorithms, Covering Arrays, Software Testing.
1 Introduction
Software systems play a very important role in modern society, where numerous human activities rely on them to fulfill their needs for information processing, storage, search, and retrieval. Ensuring that software systems meet people's expectations for quality and reliability is an expensive and highly complex task, especially considering that such systems usually have many possible configurations, produced by the combination of multiple input parameters, which immediately makes an exhaustive testing approach impractical. An alternative technique to accomplish this goal is called software interaction testing. It is based on constructing economically sized test-suites that provide coverage of the most prevalent configurations. Covering arrays (CAs) are combinatorial structures which can be used to represent these test-suites. A covering array, CA(N; t, k, v), of size N, strength t, degree k, and order v is an N × k array on v symbols such that every N × t sub-array contains all ordered subsets from v symbols of size t (t-tuples) at least once. In such an array, each test configuration of the analyzed software system is represented by a row.
This research work was partially funded by the following projects: CONACyT 58554, Cálculo de Covering Arrays; CONACyT 99276, Algoritmos para la Canonización de Covering Arrays; 51623 Fondo Mixto CONACyT y Gobierno del Estado de Tamaulipas.
A test configuration is composed of the combination of k parameters taking v values. This test-suite makes it possible to cover all the t-way combinations of parameter values (i.e. for each set of t parameters, every t-tuple of parameter values is represented). Then, software testing cost can be substantially reduced by minimizing the number of test configurations N in a covering array. The minimum N for which a CA(N; t, k, v) exists is the covering array number, defined according to (1):

    CAN(t, k, v) = min{N : ∃ CA(N; t, k, v)}    (1)
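The definition above translates directly into a brute-force coverage check. The following sketch is our illustration (not code from any of the cited works); it tests whether an N × k array, given as a list of rows over the symbols {0, ..., v−1}, is a covering array of strength t:

```python
from itertools import combinations, product

def is_covering_array(A, t, v):
    # Every N x t sub-array must contain all v**t ordered t-tuples.
    k = len(A[0])
    required = set(product(range(v), repeat=t))
    for cols in combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in A}
        if not required <= seen:
            return False
    return True
```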
The problem of determining the covering array number is also known in the literature as the Covering Array Construction (CAC) problem. This is equivalent to the problem of maximizing the degree k of a covering array given the values N, t, and v. There exist only some special cases where it is possible to find the covering array number using polynomial-time algorithms. For instance, the case N = v^t, t = 2, k = v+1 was completely solved for v = p^α a prime or a power of a prime and v > t (see [1] for references). This case was subsequently generalized by Bush for t > 2 [2]. However, in the general case determining the covering array number is known to be NP-complete [3,4], thus it is unlikely that exact algorithms running in polynomial time exist for this hard combinatorial optimization problem. Other applications related to the CAC problem arise in fields like drug screening, data compression, regulation of gene expression, authentication, intersecting codes and universal hashing (see [5] for a detailed survey).

Addressing the problem of obtaining the covering array number in reasonable time has been the focus of much research. Among the approximate methods that have been developed for constructing covering arrays are: a) recursive methods [1,6], b) algebraic methods [7,5], c) greedy methods [8] and d) metaheuristics such as Tabu Search [9], Simulated Annealing [10], and Genetic Algorithms [11].

This paper aims at developing a new powerful Memetic Algorithm (MA) for finding near-optimal solutions for the CAC problem. In particular, we are interested in constructing binary covering arrays of strength three and in establishing new bounds on the covering array number CAN(3, k, 2). To achieve this, the proposed MA incorporates a fast heuristic to create a good quality initial population and a local search operator based on a fine-tuned Simulated Annealing algorithm employing two carefully designed neighborhood functions. The performance of the proposed MA is assessed with a test-suite composed of 20 binary covering arrays of strength three, taken from the literature. The computational results are reported and compared with previously published ones, showing that our algorithm is able to improve on 9 previous best-known solutions and to equal these results on the rest of the selected benchmark instances. It is important to note that for some of those instances the best-known results had not been improved since their publication in 1993 [1].

The rest of this paper is organized as follows. In Sect. 2, a brief review is given to present some representative solution procedures for constructing binary covering arrays of strength three. Then, the components of our new Memetic
Algorithm are discussed in detail in Sect. 3. Section 4 is dedicated to computational experiments and comparisons with respect to previous best-known results. Finally, the last section summarizes the main contributions of this work.
2 Relevant Related Work
Because of the importance of the CAC problem, much research has been carried out in developing effective methods for solving it. In this section, we give a brief review of some representative procedures which were used in our comparisons. These procedures were devised for constructing binary CAs of strength three.

Sloane published in [1] a procedure which improves some elements of the work reported in Roux's PhD dissertation [12]. This procedure makes it possible to construct a binary covering array of strength three on 2k columns by combining two CAs, CA(N2; 2, k, 2) and CA(N3; 3, k, 2). It first appends CA(N2; 2, k, 2) to a CA(N3; 3, k, 2), which results in a k × (N2 + N3) array. Then this array is copied below itself, producing a 2k × (N2 + N3) array. Finally, the copied strength-2 array is replaced by its bit-complement array (i.e. switching 0 to 1 and 1 to 0). Following these ideas, Chateauneuf and Kreher later presented in [5] an algebraic procedure for constructing strength-three covering arrays on 2k columns, bounding CAN(3, 2k, v). This procedure has made it possible to attain some of the best-known solutions for binary CAs of strength three. Furthermore, it is a polynomial-time algorithm.

In 2001, a study was carried out by Stardom [11] to compare three different metaheuristics: Tabu Search (TS), Simulated Annealing (SA) and Genetic Algorithms (GA). Stardom's GA implementation represents a CA(N; t, k, v) by using an N × k array on v symbols and operates as follows. An initial population of size 100 ≤ |P| ≤ 500 is randomly generated. At each generation the original population is randomly partitioned into two groups (male and female) of size |P|/2. The members of each group are ordered randomly and then the i-th arrays from each group are mated with each other, for 1 ≤ i ≤ |P|/2. The |P| offspring are mutated and then the most fit |P| members of the combined male, female and offspring subpopulations are selected as the new population. The crossover operator randomly selects a point (i, j) for each pair of arrays to be mated. If the pair of mates contains entries A_mn and B_mn and the pair of offspring contains entries C_mn and D_mn, then C_mn = A_mn (D_mn = B_mn) for m ≤ i and n ≤ j; and C_mn = B_mn (D_mn = A_mn) for m > i and n > j. The mutation operator consists in applying a random entry swap. This process is repeated until a predefined maximum of 5000 generations is reached or a covering array is found. For his comparisons the author employed a set of benchmark instances composed of binary covering arrays of strength two. The results show that his GA implementation was by far the weakest of the three compared metaheuristics.
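Read with the N × k orientation used elsewhere in this paper (rows are tests), the Roux-type doubling reviewed at the start of this section amounts to the short sketch below; this is our illustration of the construction, not code from [1] or [12]:

```python
import numpy as np

def roux_double(A3, A2):
    # A3 = CA(N3; 3, k, 2) and A2 = CA(N2; 2, k, 2), both 0/1 arrays with
    # k columns.  The result is a strength-3 CA on 2k columns with
    # N3 + N2 rows: [A3 | A3] stacked over [A2 | bit-complement(A2)].
    top = np.hstack([A3, A3])
    bottom = np.hstack([A2, 1 - A2])
    return np.vstack([top, bottom])
```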
A more effective Tabu Search (TS) algorithm than the one presented in [11] was devised by Nurmela [9]. This algorithm starts with an N × k randomly generated matrix that represents a covering array. The number of uncovered t-tuples is used to evaluate the cost of a candidate solution (matrix). Next, an uncovered t-tuple is selected at random and the rows of the matrix are examined to find those that require the change of only a single element in order to cover the selected t-tuple. These changes, called moves, correspond to the neighboring solutions of the current candidate solution. The variation of cost corresponding to each such move is calculated and the move leading to the smallest cost is selected, provided that the move is not tabu. If there are several equally good non-tabu moves, one of them is chosen randomly. Then another uncovered t-tuple is selected and the process is repeated until a matrix with zero cost (a covering array) is found or a predefined maximum number of moves is reached. The results produced by Nurmela's TS implementation demonstrate that it is able to slightly improve some previous best-known solutions, especially the instance CA(15; 3, 12, 2). However, an important drawback of this algorithm is that it consumes considerably more computational time than any of the three previously presented algorithms.

More recently, Forbes et al. [13] introduced an algorithm for the efficient production of covering arrays of strength t up to 6, called IPOG-F (In-Parameter-Order-Generalized). Contrary to many other algorithms that build covering arrays one row at a time, the IPOG-F strategy constructs them one column at a time. The main idea is that covering arrays on k − 1 columns can be used to efficiently build a covering array of degree k. In order to construct a covering array, IPOG-F initializes a v^t × t matrix which contains each of the v^t possible distinct rows having entries from {0, 1, . . . , v − 1}. Then, for each additional column, the algorithm performs two steps, called horizontal growth and vertical growth. Horizontal growth adds an additional column to the matrix and fills in its values; any remaining uncovered t-tuples are then covered in the vertical growth stage. The choice of which rows will be extended with which values is made in a greedy manner: it picks an extension of the matrix that covers as many previously uncovered t-tuples as possible. IPOG-F is currently implemented in a software package called FireEye [14], which was written in Java. Even if IPOG-F is a very fast algorithm for producing covering arrays, it generally provides poorer quality results than other state-of-the-art algorithms such as the algebraic procedures proposed by Chateauneuf and Kreher [5].
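As a rough, hedged illustration of the greedy choice inside horizontal growth (much simplified: the real IPOG-F also performs vertical growth, handles don't-care entries, and is far more engineered than this), extending an array by one column could look like the following:

```python
from itertools import combinations

def horizontal_growth(A, t, v):
    # Extend an N x k array (list of lists) by one column.  For each row,
    # pick the symbol covering the most still-uncovered t-tuples that
    # involve the new column.
    k = len(A[0])
    covered = set()  # pairs: (subset of t-1 old columns, value tuple)
    for row in A:
        def gain(val):
            return sum(
                (cols, tuple(row[c] for c in cols) + (val,)) not in covered
                for cols in combinations(range(k), t - 1))
        best = max(range(v), key=gain)
        for cols in combinations(range(k), t - 1):
            covered.add((cols, tuple(row[c] for c in cols) + (best,)))
        row.append(best)
    return A
```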
3 A New Memetic Algorithm for Constructing CAs
In this section we present a new Memetic algorithm for solving the CAC problem. All the details of its implementation are presented next.

3.1 Search Space and Internal Representation
Let A be a potential solution in the search space 𝒜, that is, a covering array CA(N; t, k, v) of size N, strength t, degree k, and order v. Then A is represented as an N × k array on v symbols, in which the element a_{i,j} denotes the symbol assigned in test configuration i to parameter j. The size of the search space 𝒜 is then given by the following expression:

    |𝒜| = v^{Nk}    (2)
3.2 Fitness Function
The fitness function is one of the key elements for the successful implementation of metaheuristic algorithms, because it is in charge of guiding the search process toward good solutions in a combinatorial search space. Previously reported metaheuristic algorithms for solving the CAC problem have commonly evaluated the quality of a potential solution (covering array) as the number of uncovered t-tuples [15,10,11,9]. We can formally define this fitness function as follows. Let A ∈ 𝒜 be a potential solution, S^r an N × t subarray of A representing the r-th subset of t columns taken from the k available, and ϑ_j the set containing the union of the N t-tuples (rows) in S^j; note that the union operator, as usual in set theory, eliminates duplicates:

    \vartheta_j = \bigcup_{i=0}^{N-1} S_i^j    (3)

Then the function F(A) computing the fitness of a potential solution A can be defined using (4):

    F(A) = \binom{k}{t} v^t - \sum_{j=0}^{\binom{k}{t}-1} |\vartheta_j|    (4)

In our MA implementation this fitness function definition was used. Its computational complexity is O(N \binom{k}{t}), but with appropriate data structures it allows an incremental fitness evaluation of neighboring solutions in O(2 \binom{k-1}{t-1}) operations.
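A direct, non-incremental implementation of (4) is straightforward; the sketch below is our illustration rather than the authors' C code. It counts, over every subset of t columns, how many of the v^t possible t-tuples are still missing, so F(A) = 0 exactly when A is a covering array:

```python
from itertools import combinations

def fitness(A, t, v):
    # F(A) = C(k, t) * v**t - sum_j |theta_j|, i.e. the number of
    # t-tuples not yet covered by the N x k array A (list of rows).
    k = len(A[0])
    missing = 0
    for cols in combinations(range(k), t):
        theta = {tuple(row[c] for c in cols) for row in A}  # covered tuples
        missing += v ** t - len(theta)
    return missing
```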
3.3 General Procedure
Our MA implementation starts by building an initial population P, a set of configurations of fixed constant size |P|. Then, it performs a series of cycles called generations. At each generation, assuming that |P| is a multiple of four, the population is randomly partitioned into |P|/4 groups of four individuals. Within each group, the two fittest individuals are chosen to become the parents in a recombination operator. The resulting offspring are then improved by using a local search operator for a fixed number of iterations L. Finally, the two least fit individuals in the group are replaced with the improved offspring. This mating selection strategy ensures that the fittest individuals remain in the population, but restricts the number of times they can reproduce to once per generation. At the end of each generation, thanks to the selection for survival, half of the population is turned over, ensuring a wide coverage of the search space through successive mating. The repeated introduction of less fit offspring increases the chance of a less fit individual being involved in the recombination phase, thus maintaining diversity in the population.
The iterative process described above stops either when a predefined maximum number of generations (maxGenerations) is reached or when a covering array with the predefined parameters N, t, k, and v is found.

3.4 Initializing the Population
In the GA reported in [11] the initial population is randomly generated. In contrast, in our MA implementation the population is initialized using a procedure that guarantees a balanced number of symbols in each column of the generated individuals (CAs). This procedure randomly assigns N/2 ones and the same number of zeros to each column of an individual when its size N is even; otherwise it allocates N/2 + 1 ones and N/2 zeros to each column. Due to the randomness of this procedure, the individuals in the initial population are quite different. This point is important for population-based algorithms, because a homogeneous population cannot evolve efficiently. We decided to use this particular method for constructing the initial population because we observed, in preliminary experiments, that good quality individuals contain a balanced number of symbols in each column.
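A minimal sketch of this initialization procedure, assuming the binary case described here (our code, with hypothetical names):

```python
import random

def balanced_individual(N, k, seed=None):
    # Build one N x k individual whose every column holds ceil(N/2) ones
    # and floor(N/2) zeros, randomly placed.
    rng = random.Random(seed)
    cols = []
    for _ in range(k):
        col = [1] * ((N + 1) // 2) + [0] * (N // 2)
        rng.shuffle(col)
        cols.append(col)
    # Transpose the column list into a row-major N x k array.
    return [[cols[j][i] for j in range(k)] for i in range(N)]
```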
3.5 The Recombination Operator
The main idea of the recombination operator is to generate diversified and potentially promising individuals. To do that, a good recombination operator for the CAC problem should take into consideration, as much as possible, the individuals' semantics. After some preliminary experiments comparing different crossover operators, we decided to use a row crossover. It randomly selects a row i for each pair of individuals to be mated. If the pair of mates contains entries A_mn and B_mn and the pair of offspring contains entries C_mn and D_mn, then C_mn = A_mn (D_mn = B_mn) for m ≤ i; and C_mn = B_mn (D_mn = A_mn) for m > i. This recombination operator has the advantage of preserving certain information contained in both parents.
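The row crossover just described can be sketched as follows (our illustration; the cut row i is drawn so that both children inherit rows from both parents):

```python
import random

def row_crossover(A, B, seed=None):
    # Child C takes A's rows up to a random cut i and B's rows after it;
    # child D is the symmetric complement.
    rng = random.Random(seed)
    i = rng.randrange(1, len(A))
    C = [row[:] for row in A[:i]] + [row[:] for row in B[i:]]
    D = [row[:] for row in B[:i]] + [row[:] for row in A[i:]]
    return C, D
```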
3.6 The Local Search Operator
The purpose of the local search (LS) operator is to improve the offspring (solutions) produced by the recombination operator, for a maximum of L iterations, before inserting them into the population. In general, any local search method can be used. In our implementation, we decided to use a Simulated Annealing (SA) algorithm. In our SA-based LS operator the neighborhood function is a key component with a great impact on performance. Formally, a neighborhood relation is a function N : 𝒜 → 2^𝒜 that assigns to every potential solution (covering array) A ∈ 𝒜 a set of neighboring solutions N(A) ⊆ 𝒜, which is called the neighborhood of A. A poorly chosen neighborhood function can lead to a poor exploration of the search space. A well-documented alternative to increase the
search power of LS methods consists in using compound neighborhood functions [16,17,18]. Following this idea, and based on the results of our preliminary experiments, a neighborhood structure composed of two different neighborhood functions is proposed for this SA algorithm.

Let switch(A, i, j) be a function allowing the value of the element a_{i,j} in the current solution A to be changed to a different legal member of the alphabet, and W ⊆ 𝒜 a set containing ω different neighboring solutions of A created by applying the function switch(A, i, j) with different random values of i and j (0 ≤ i < N, 0 ≤ j < k). Then the first neighborhood N_1(A, ω) of a potential solution A, used in our SA implementation, can be defined by the following expression:

    N_1(A, \omega) = \{ A' \in \mathcal{A} : A' = \operatorname{arg\,min}_{A'' \in W,\ |W| = \omega} F(A'') \}    (5)

Defining the second neighborhood N_2(A, γ) (Equation (6)) used in our SA implementation requires a function swap(A, i, j, l), which exchanges the values of two elements a_{i,j} and a_{l,j} (a_{i,j} ≠ a_{l,j}) within the same column of A, and a set R ⊆ 𝒜 containing neighboring solutions of A produced by γ successive applications of the function swap(A, i, j, l) using randomly chosen values for the parameters i, j and l (0 ≤ i < N, 0 ≤ l < N, 0 ≤ j < k):

    N_2(A, \gamma) = \{ A' \in \mathcal{A} : A' = \operatorname{arg\,min}_{A'' \in R,\ |R| = \gamma} F(A'') \}    (6)

During the search process a combination of both N_1(A, ω) and N_2(A, γ) is employed by our SA algorithm. The former is applied with probability p, while the latter is employed at a (1 − p) rate. This combined neighborhood function N_3(A, x, ω, γ) is defined in (7), where x is a random number in the interval [0, 1]:

    N_3(A, x, \omega, \gamma) = \begin{cases} N_1(A, \omega) & \text{if } x \le p \\ N_2(A, \gamma) & \text{if } x > p \end{cases}    (7)

The proposed SA operator starts at an initial temperature T_0 = 3; at each Metropolis round, r = 500 moves are generated. If the cost of the attempted move decreases, it is accepted. Otherwise, it is accepted with probability P(Δ) = e^{−Δ/T}, where T is the current temperature and Δ is the increase in cost that would result from that particular move. At the end of each Metropolis round the current temperature is decreased by a factor α = 0.95 (T ← αT). The algorithm stops either when the current temperature reaches T_f = 0.001, or when it reaches the predefined maximum of L iterations. The algorithm memorizes and returns the most recent covering array A* among the best configurations found: after each accepted move, the current configuration A replaces A* if F(A) ≤ F(A*). The rationale for returning the last best configuration is that we want to produce a solution which is as far away as possible from the initial solution, in order to better preserve the diversity of the population.
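Putting these pieces together, the SA-based LS operator can be sketched as below. This is our reading of the description above, not the authors' implementation; neighbor1 and neighbor2 stand for samplers of N_1(A, ω) and N_2(A, γ):

```python
import math
import random

def sa_local_search(A, fitness, neighbor1, neighbor2, p=0.6, T0=3.0,
                    Tf=0.001, alpha=0.95, r=500, L=50000, seed=None):
    # Compound neighborhood N3: draw from N1 with probability p, else N2.
    # Metropolis acceptance, geometric cooling T <- alpha * T, and the
    # *last* best configuration is memorized (hence the <= test).
    rng = random.Random(seed)
    T, iters = T0, 0
    f_cur = fitness(A)
    best, f_best = A, f_cur
    while T > Tf and iters < L:
        for _ in range(r):
            cand = neighbor1(A) if rng.random() <= p else neighbor2(A)
            delta = fitness(cand) - f_cur
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                A, f_cur = cand, f_cur + delta
                if f_cur <= f_best:
                    best, f_best = A, f_cur
            iters += 1
            if iters >= L:
                break
        T *= alpha
    return best
```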
4 Computational Experiments
In this section, we present a set of experiments carried out to evaluate the performance of the MA presented in Sect. 3. The algorithm was coded in C and compiled with gcc using the optimization flag -O3. It was run sequentially on a Xeon CPU at 2 GHz with 1 GB of RAM, under the Linux operating system. Due to the non-deterministic nature of the algorithms, 20 independent runs were executed for each of the selected benchmark instances. In all the experiments the following parameters were used for the MA: a) population size |P| = 40, b) number of recombinations per generation: |P|/4 (one per group of four individuals), c) maximal number of local search iterations L = 50000, d) the neighborhood function N_3(A, x, ω, γ) is applied using a probability p = 0.6 and parameters ω = 10 and γ = N/2, and e) maximal number of generations maxGenerations = 200000. These parameter values were chosen experimentally, taking into consideration our experience in solving other combinatorial optimization problems with MAs [19].
4.1 Benchmark Instances and Comparison Criteria
To assess the performance of the MA introduced in Sect. 3, a test-suite composed of 20 well-known benchmark instances taken from the literature was used [1,5,9,13]. It includes instances of size 8 ≤ N ≤ 32. The main criterion used for the comparison is the one commonly used in the literature: the best degree k found (larger values are better) given fixed values for N, t and v.
4.2 Comparison among MA and the State-of-the-Art Procedures
The purpose of this experiment is to carry out a performance comparison of the best bounds achieved by our MA with respect to those produced by the following state-of-the-art procedures: orthogonal array constructions [2], Roux-type constructions [1], doubling constructions [7,5], Tabu Search [9], and IPOG-F [13]. Table 1 displays the detailed computational results produced by this experiment. The first column in the table indicates the size N of the instance. Column 2 shows the best results found by IPOG-F [13] in terms of the degree k, while column 3 (Best) presents the previous best-known degree, along with the reference where this result was originally published, as indicated in [20]. The next five columns provide the best solutions found by our MA (k*), the success rate of finding those best solutions (Succ.), the average solution cost (Avg.) with respect to (4), its standard deviation (Dev.), and the average CPU time (T) in seconds, over 20 executions. Finally, the difference (ΔBest−k*) between the best result produced by our MA and the previous best-known solution is given in the last column. According to [1], the results presented in column 4 for the instances of size N ≤ 12 are optimal solutions.
Table 1. Improved bounds for CAN(3, k, 2). Columns k*, Succ., Avg., Dev. and T (average CPU seconds) refer to our MA.

 N     IPOG-F   Best      k*      Succ.   Avg.   Dev.   T           ΔBest−k*
 8     4        4 [7]     4       1.0     0.0    0.0    0.02        0
 10    4        5 [7]     5       1.0     0.0    0.0    0.03        0
 12    5        11 [1]    11      1.0     0.0    0.0    0.13        0
 15    6        12 [9]    12      1.0     0.0    0.0    0.18        0
 16    7        14 [1]    14      1.0     0.0    0.0    96.64       0
 17    9        16 [1]    16      1.0     0.0    0.0    374.34      0
 18    11       20 [5]    20      0.8     0.2    0.4    13430.99    0
 19    12       22 [5]    22      0.8     0.4    0.8    10493.26    0
 20    13       22 [5]    23      0.4     1.2    1.2    13451.34    1
 21    15       22 [5]    24      0.3     1.0    0.9    14793.41    2
 22    16       24 [5]    24      0.9     0.2    0.4    5235.06     0
 23    16       28 [5]    28      0.7     0.7    1.2    21480.33    0
 24    19       30 [5]    36      0.3     2.2    2.0    36609.51    6
 25    21       32 [5]    41      0.2     1.8    1.4    49110.23    9
 26    24       40 [5]    42      0.4     1.9    2.0    52958.12    2
 27    26       44 [5]    45      0.3     1.6    2.1    64939.07    1
 28    30       44 [5]    47      0.5     2.2    2.8    71830.49    3
 30    31       48 [5]    50      0.6     2.8    3.9    89837.65    2
 31    33       56 [5]    56      0.7     2.4    4.0    134914.74   0
 32    37       64 [1]    67      0.5     7.5    10.8   262151.32   3
 Avg.  16.95    27.90     29.35   0.65    1.29   1.70   42085.34    1.45
From the data presented in Table 1 we can make the following main observations. First, the solution quality attained by the proposed MA is very competitive with respect to that produced by the state-of-the-art procedures summarized in column 3. In fact, it is able to improve the previous best-known solutions on 9 benchmark instances. It is important to note that for some of these instances the best-known results had not been improved since their publication in 1993 [1]. For the rest of the instances in the test-suite our MA equals the previous best-known solutions. Second, one observes that in this experiment the IPOG-F procedure [13] returns poorer quality solutions than our MA on 19 out of 20 benchmark instances. Indeed, IPOG-F produces covering arrays which are on average 73.16% worse than those constructed with our MA.

Regarding the computational effort, we would like to point out that, in general, the authors of the algorithms used in our comparisons did not provide information about their CPU times. Thus, the running times of these algorithms cannot be directly compared with ours. Even if the results attained by our MA are very competitive, we have observed that the average computing time consumed by our approach to produce these excellent results is greater than that used by some recursive [6,21] and algebraic methods [7,5,22]. However, since the MA outperforms some of the state-of-the-art procedures, finding 9 new bounds, we believe that the extra computing time is fully justified, especially considering that for this kind of experiment the objective is to compare the best bounds achieved by the studied algorithms.

The outstanding results achieved by the MA are better illustrated in Fig. 1. The plot represents the size N of the instance (ordinate) against the degree k attained by the compared procedures (abscissa). The bounds provided by IPOG-F [13] are
shown with squares, the previous best-known solutions are depicted as circles, while the bounds computed with our MA are shown as triangles. From this figure it can be seen that the MA consistently outperforms IPOG-F, also obtaining important improvements with respect to the previous best-known solutions on CAN(3, k, 2) for 4 ≤ k ≤ 67. This is the case of the covering array CA(25; 3, k, 2), for which an increase of 28.13% in the degree k was accomplished by our algorithm.
Fig. 1. Previous best-known and improved bounds on CAN(3, k, 2) (degree k versus size N for IPOG-F, the previous best-known solutions, and our MA)
4.3 Influence of the Variation Operators
In order to further examine the behaviour of our approach, we performed some additional experiments to analyze the influence of the variation operators used in its implementation. The results obtained on all the benchmark instances described in Sect. 4.1 were similar, so, for reasons of space, we show the outcome of these experiments using only one representative graph. Figure 2 shows the evolution profile of the fittest individual (ordinate) along the search process (abscissa) when the instance CA(17; 3, 16, 2) is solved using two variants of the MA described in Sect. 3: a) an algorithm using only the crossover operator, and b) an algorithm using the crossover operator and a simple mutation operator based on the switch(A, i, j) function defined in Sect. 3.6. From Fig. 2 it can be observed that the worst solution quality is provided by the algorithm using only the crossover operator. It remains stuck on some local optima longer than the algorithm that also employs a simple mutation operator. However, neither of these two variants of our MA was able to find a covering array CA(17; 3, 16, 2). In contrast, the MA using the combination of crossover and LS operators gives better results, both in solution quality and in computational time expended. These experiments allow us to conclude that the contribution of the crossover operator is less significant than that of the LS operator based on an SA algorithm.
Fig. 2. Influence of the variation operators when solving the instance CA(17; 3, 16, 2)
5 Conclusions
In this paper, a highly effective MA designed to compute near-optimal solutions for the CAC problem was presented. This algorithm is based on an efficient heuristic to generate good quality initial populations, and on an SA-based LS operator employing a carefully designed compound neighborhood. The performance of this MA was assessed through extensive experimentation over a set of well-known benchmark instances and compared with five other state-of-the-art procedures: orthogonal array constructions [2], Roux-type constructions [1], doubling constructions [5], Tabu Search [9], and IPOG-F [13]. The results show that our MA was able to improve on 9 previous best-known solutions and to equal these results on the other 11 selected benchmark instances. Furthermore, it is important to note that these new bounds on CAN(3, k, 2) offer the possibility of improving other best-known results for binary CAs of strength three of size N > 32 by employing doubling constructions [5].

Finding near-optimal solutions for the CAC problem, in order to construct economically sized test-suites for software interaction testing, is a very challenging problem. However, the introduction of this new MA opens up an exciting range of possibilities for future research. One fruitful possibility is to develop a multimeme algorithm [23] based on the MA presented here, in order to efficiently construct covering arrays of strength t > 3 and order v > 2.
References

1. Sloane, N.J.A.: Covering arrays and intersecting codes. Journal of Combinatorial Designs 1(1), 51–63 (1993)
2. Bush, K.A.: Orthogonal arrays of index unity. Annals of Mathematical Statistics 23(3), 426–434 (1952)
3. Seroussi, G., Bshouty, N.: Vector sets for exhaustive testing of logic circuits. IEEE Transactions on Information Theory 34, 513–522 (1988)
4. Lei, Y., Tai, K.: In-parameter-order: A test generation strategy for pairwise testing. In: 3rd IEEE International Symposium on High-Assurance Systems Engineering, pp. 254–261. IEEE Press, Washington (1998)
5. Chateauneuf, M.A., Kreher, D.L.: On the state of strength-three covering arrays. Journal of Combinatorial Design 10(4), 217–238 (2002)
6. Hartman, A., Raskin, L.: Problems and algorithms for covering arrays. Discrete Mathematics 284(1-3), 149–156 (2004)
7. Hedayat, A.S., Sloane, N.J.A., Stufken, J.: Orthogonal Arrays, Theory and Applications. Springer, Berlin (1999)
8. Cohen, D.M., Dalal, S.R., Fredman, M.L., Patton, G.C.: The AETG system: An approach to testing based on combinatorial design. IEEE Transactions on Software Engineering 23, 437–444 (1997)
9. Nurmela, K.J.: Upper bounds for covering arrays by tabu search. Discrete Applied Mathematics 138(1-2), 143–152 (2004)
10. Cohen, D.M., Colbourn, C.J., Ling, A.C.H.: Constructing strength three covering arrays with augmented annealing. Discrete Mathematics 308(13), 2709–2722 (2008)
11. Stardom, J.: Metaheuristics and the search for covering and packing arrays. Master's thesis, Simon Fraser University, Burnaby, Canada (2001)
12. Roux, G.: k-propriétés dans des tableaux de n colonnes; cas particulier de la k-surjectivité et de la k-permutivité. PhD thesis, Université de Paris 6, France (1987)
13. Forbes, M., Lawrence, J., Lei, Y., Kacker, R.N., Kuhn, D.R.: Refining the in-parameter-order strategy for constructing covering arrays. Journal of Research of the National Institute of Standards and Technology 113(5), 287–297 (2008)
14. Lei, Y., Kacker, R., Kuhn, D.R., Okun, V., Lawrence, J.: IPOG: A general strategy for t-way software testing. In: 14th Annual IEEE International Conference and Workshops on the Engineering of Computer-Based Systems, pp. 549–556. IEEE Press, Washington (2007)
15. Nurmela, K.J., Östergård, P.R.J.: Constructing covering designs by simulated annealing. Technical Report 10, Department of Computer Science, Helsinki University of Technology, Otaniemi, Finland (January 1993)
16. Mladenović, N., Hansen, P.: Variable neighborhood search. Computers & Operations Research 24(11), 1097–1100 (1997)
17. Urošević, D., Brimberg, J., Mladenović, N.: Variable neighborhood decomposition search for the edge weighted k-cardinality tree problem. Computers & Operations Research 31(8), 1205–1213 (2004)
18. Rodriguez-Tello, E., Hao, J.K., Torres-Jimenez, J.: An effective two-stage simulated annealing algorithm for the minimum linear arrangement problem. Computers & Operations Research 35(10), 3331–3346 (2008)
19. Rodriguez-Tello, E., Hao, J.-K., Torres-Jiménez, J.: Memetic algorithms for the minLA problem. In: Talbi, E.-G., Liardet, P., Collet, P., Lutton, E., Schoenauer, M. (eds.) EA 2005. LNCS, vol. 3871, pp. 73–84. Springer, Heidelberg (2006)
20. Colbourn, C.J.: Covering Array Tables (2009), http://www.public.asu.edu/~ccolbou/src/tabby/catable.html (accessed March 17, 2009)
21. Martirosyan, S.S., Van Trung, T.: On t-covering arrays. Designs, Codes and Cryptography 32(1-3), 323–339 (2004)
22. Hartman, A.: Software and hardware testing using combinatorial covering suites. In: Graph Theory, Combinatorics and Algorithms, pp. 237–266. Springer, Heidelberg (2005)
23. Krasnogor, N.: Towards robust memetic algorithms. In: Recent Advances in Memetic Algorithms, pp. 185–207. Springer, Heidelberg (2004)
A Priori Knowledge Integration in Evolutionary Optimization

Paul Pitiot 1,2, Thierry Coudert 1, Laurent Geneste 1, and Claude Baron 3

1 Laboratoire Génie de Production, Ecole Nationale d'Ingénieurs de Tarbes, 47, av. d'Azereix, BP 1629, 65016 Tarbes, France
{paul.pitiot,thierry.coudert,laurent.geneste}@enit.fr
2 Centre de Génie Industriel, Ecole des Mines d'Albi, Université de Toulouse, Campus Jarlard, 81013 Albi CT CEDEX 09, France
[email protected]
3 LATTIS, INSA de Toulouse, 135, av. de Rangueil, 31077 Toulouse, France
[email protected]
Abstract. Several recent works have examined the effectiveness of using knowledge models to guide search algorithms in high-dimensional spaces. This seems to be a promising way to tackle some difficult problems. The aim of such methods is to reach good solutions by using evolutionary search and knowledge guidance simultaneously. The idea proposed in this paper is to use a Bayesian network to store and apply the knowledge model and, as a consequence, to accelerate the search process. A traditional evolutionary algorithm is modified in order to allow the reuse of the capitalized knowledge. The approach has been applied to a problem of selection of project scenarios in a multi-objective context. A preliminary version of this method was presented at the EA'07 conference [1]. An experimentation platform has been developed to validate the approach and to study different modes of knowledge injection. The obtained experimental results are presented.

Keywords: Project management, product preliminary design, guided evolutionary algorithm, experience feedback, Bayesian network.
1 Introduction

Many companies, in order to meet the requirements of their clients and to provide them with adequate products, implement two key processes:

– the "product design or configuration" process, which aims at defining precisely the architecture of the product and its components,
– the "project design or configuration" process, which aims at specifying how the product will be realized (sequence of tasks, resources used, ...).

These two processes are often implemented sequentially: first the product is designed, then the realization project is elaborated. For example, when a client wants to build a house, the architect first designs a plan of the house, then the corresponding realization project is developed and launched. Since the project constraints (for example
delays) are not explicitly taken into account in the product design, this can lead to additional iterations between the "product design" and "project design" processes. A better integration (or coupling) of both processes is therefore a way to improve the global performance of companies. An in-depth study of several mechanisms that can facilitate this integration has been launched in a project called ATLAS, funded by the French National Research Agency and involving academic laboratories, industrial partners and the Aerospace Valley competitiveness cluster. The work presented in this paper takes place in the context of the ATLAS project.

In this paper, a simplified integrated product/project model is first proposed. Indeed, in both environments (product and project), design processes are achieved according to a hierarchical decomposition (see Figure 1(a)):

– products are recursively decomposed into smaller sub-products ("AND" connectors), e.g. product P1 is made of P11 and P12 (the yellow cloud in Figure 1(a) represents the fact that to make P1, "P11 and P12" are needed, and this "global" task is decomposed at the next analysis level, illustrated underneath),
– accordingly, projects are recursively decomposed into sub-projects,
– alternatives ("XOR" connectors) can be defined in products (e.g. choice between components) and in projects (e.g. choice between sub-contractors to achieve a task).
Fig. 1. Product / Project decomposition
In order to represent the links between both hierarchies, an integrated model is used. This model consists of a graph whose nodes are: tasks of the project, AND nodes and XOR nodes. Figure 1(b) represents such a model for the example given in Figure 1(a). A "scenario" corresponds to a graph in which all the choices have been made (i.e. with no more XOR nodes). An example of a scenario, corresponding to the model in Figure 1(b), is illustrated in Figure 1(c).

The problem addressed is to find, among all the possible scenarios, an optimal one with respect to multiple criteria (such as the weight of the product, the delay of the project, the cost of both, etc.). In this paper, two objectives are considered: minimize the project delay (the time needed for the execution of a scenario, i.e. the due date of the final task) and minimize the project cost (the sum of the costs of every task selected in the scenario). Let us point out that this problem can be considered as an extended product configuration problem. The existing literature on the subject is dedicated to finding a feasible configuration according to constraints and knowledge on the domain. However, as mentioned in [2], it is very difficult to optimize the resulting configured product: a combinatorial explosion appears, especially when the problem is loosely constrained. In this case, using an optimization approach can help to focus on good solutions. In [3], a search method based on a classical multi-objective evolutionary algorithm was proposed for the problem of scenario selection, with promising results. In this paper, we propose to improve this method by taking into account the knowledge that can be capitalized from previous optimizations (learning from experience).

The background of our work with regard to existing approaches that mix learning and search is given in Section 2. Then, the proposed approach, based on a hybridization between Bayesian networks (for learning) and evolutionary algorithms (for searching), is described in Section 3. Finally, the obtained results are discussed in Section 4.
2 Background

The method proposed in this paper is close to a new family of algorithms called "intelligent" or "guided" evolutionary optimization [4][5]. This kind of algorithm is based on the interaction between a search process and a knowledge extraction process achieved by a learning procedure. The goal is to merge the advantages of each approach. The search process aims at improving a set of solutions by selection and combination operations. The goal of the learning process is to extract, capitalize and exploit the knowledge contained in the solutions in order to guide the search process. The learning process has to give orientations with respect to a given context. Michalski shows in [4] that fixing some properties of interesting solutions is enough for the search method to generate, very quickly, solutions close to the optimal one.

As a possible search process, EAs are well suited for coupling with learning methods. Indeed, in a multi-criteria search context, they provide the learning algorithm with a set of individuals that "represent" the global search space. This kind of method indirectly reuses knowledge associated with the problem via the evaluation of the generated solutions. But this knowledge, used during the search, is not preserved from one execution to another. In order to do so, it is necessary to complement the EA with a model adapted to knowledge capitalisation and reuse.
Among the different methods coupling optimisation and learning, Bayesian Optimization Algorithms (BOA) use Bayesian Networks (BN) as a Model of Knowledge (MoK) [6]. In these methods, the MoK is learned from a database containing individuals selected from the previous generation (according to their fitness). Then, from the MoK, a sampling procedure is used to directly generate the new population of individuals. The induction of the probability model, especially of the parameter interactions (i.e. the definition of the network structure), constitutes the hardest task to perform [7]. Therefore, the classical BOA learning process limits itself to the study of the most influential parameter interactions. The use of prior knowledge makes it possible either to speed up the convergence of the algorithm by introducing some high-quality or partial available solutions [8], or to improve the learning procedure using available structural knowledge (prior probabilities of network structures [7][8][9]). The model proposed in this paper (Section 3) acquires prior knowledge about the whole structure of the network from an expert. Then, the learning achieved during the optimization process concerns only the updating of the probabilities. This method makes it possible to use a MoK in which every parameter and the main interactions are always represented (given by experts), and to concentrate the learning effort on estimating the probabilities. The hypothesis is that experts provide a structure close to the optimal one, which, after a quick learning of the probabilities, is enough to guide the EA efficiently.

For the majority of the guided evolutionary methods listed above, the use of knowledge is achieved indirectly. Knowledge is represented by means of classes of operators [10], intervals [4], assumptions on the parameter values, or attributes of good solutions [11]. A.L. Huyet [5] proposes to model the knowledge directly, using classes of parameters. The problem is that it is nearly impossible to directly handle this knowledge with the formalisms used (e.g. decision trees or neural networks). Furthermore, there is no model that dissociates the objectives in order to represent the influence of solutions on each of them. Objectives are generally aggregated, and partial knowledge is then impossible to reuse. In the different approaches encountered, the two processes (search and learning) have few interactions during execution, especially for the crossover operator. The model proposed in the next section gives some answers to the issues listed above.
3 Proposed Framework and Algorithm

The proposed framework uses a hybrid method in which an EA, in charge of the multi-criteria search process, interacts with a Model of Knowledge (MoK) able to provide orientations adapted to the treated case. The Bayesian Network (BN) formalism is used for the knowledge base. Two sources of knowledge supply this base: on the one hand, the case base, which contains a selection of individuals (solutions) provided by the EA, and, on the other hand, the expert knowledge base, used to define the structure of the BN. The resulting BN provides probabilities that the EA can use as orientations for guiding its search process. These orientations are taken into account directly by the evolutionary operators. Using the case base, a learning step enables the BN to be updated by means of an inference algorithm. BN inference algorithms are time consuming, even if they are only used to compute probabilities. Therefore, the
knowledge is clustered [4] with respect to the objectives, which are represented by discrete nodes in the BN. So, in the proposed approach, objectives are represented as discrete nodes (the values of an objective are represented as discrete states, e.g. Low, Medium, and High). A class of objectives is defined as a combination of objective states. It corresponds to a region of the objective space. In a multi-criteria decision-making process, the method has to provide decision makers with a set of solutions belonging to the Pareto front. A good quality of this set is obtained when all the classes of objectives corresponding to the Pareto front have at least one solution. So, the proposed method makes it possible to guide the EA to reach, at each generation, an ideal Pareto front or, more exactly, interesting zones of the search space represented by the different classes of objectives (see Figure 3).

MoK acquisition. The structure of the model of knowledge is built from expert knowledge. As illustrated in Figure 2, it contains four kinds of nodes: objective, decision, concept and environment nodes. The decision nodes correspond to the XOR connectors of the model. The objective nodes represent the set of objectives used for optimization. The concept nodes are used by experts to express which characteristics of the domain are important and discriminatory for one or several objectives. Environment nodes make it possible to contextualize the knowledge contained in the concept nodes. The whole structure is organised as a heterarchical oriented network from the decision nodes to the objective nodes, established by experts. Then the probabilities of the BN are inferred from some representative cases using the EM algorithm (see footnote 1).
Fig. 2. Decision analysis and capitalization in global MoK
MoK actualization. Considering that, in certain cases, the MoK can be unsuitable or incomplete, it is necessary to preserve the independence of the search method when the predictions of the MoK are not appropriate. For this reason, the evaluation and selection steps of a standard EA, developed in the next section, are preserved. Moreover, when insufficient progress is observed, two alternatives are implemented and tested: 1) a probability smoothing mechanism allowing a progressive return to traditional genetic operators, 2) an update of the MoK by online parametric learning.
1 The EM (Expectation–Maximization) algorithm is used for learning. It was chosen with an industrial implementation perspective in mind, because of its ability to deal with missing or partial data.
Individual representation. In the model first proposed in [3], an individual represents one scenario for the project (see Figure 1(c)). The first part of the chromosome gathers the XOR nodes derived from the product decomposition (choices between components). Instantiating a gene of this first part (selecting a state) leads to the inhibition of some other genes in the chromosome. In the second part of the chromosome, genes represent the XOR nodes derived from the project decomposition (choices of how to achieve tasks). All choices are always represented, although the majority of them are inactive since they are inhibited by the choices made on genes of the first part. This encoding ensures the constant viability of the solutions.

Selection, evaluation and archiving. The search algorithm is adapted from the SPEA method (Strength Pareto Evolutionary Algorithm) proposed in [12]. It is a traditional EA with the classical steps: initialization, evaluation, selection, crossover and mutation operators. SPEA ensures the multi-objective evaluation of individuals in two steps: i) the Taguchi approach is used to evaluate the cost of a scenario for each criterion; ii) then, the multi-criteria evaluation is achieved by means of the Pareto front, in order to compare and classify the scenarios. The probability of selection of an individual is proportional to its performance (fitness). This fitness depends on the position of the individual relative to the Pareto front.

New evolutionary algorithm. The modified EA is represented in Figure 3 with the three new evolutionary operators. During the "loading step", the objective classes are built with respect to the BN. In order to reinforce the main characteristics of each class, probabilities greater than 0.95 are set to 1 and probabilities lower than 0.05 are set to 0. When a gene is inhibited by a previous gene instantiation (probability of 1 for a particular state of a gene in the first part of the chromosome), the value -1 appears in the objective class, indicating its inhibition. This inhibition mechanism represents the so-called structural knowledge. It is encoded as lists of inhibited genes for each gene instantiation, computed before the optimization process. This behaviour is called Kstruct.

The initial population is built according to the objective classes in order to start the search procedure with a priori good orientations (KO-initialisation in Figure 3). The individuals are distributed among the various objective classes. Then, for each individual, the probabilities of its class are used to fix the values of the genes, as shown in Figure 3. A gene can become inhibited when operators are applied to the chromosome. Indeed, evolutionary operators are applied progressively along the chromosome, from the first part to the second. It is therefore possible to use the structural knowledge to add the value -1 to inhibited genes after each instantiation of a gene in the first part. On the other hand, when a gene is inhibited, either it is kept as it is, or it can be modified by a classical evolutionary process (mutation and crossover). This second possibility enables genetic mixing. This behaviour is called diploid knowledge preservation (mode Diplo).

During the EA process, for each generation, all the individuals have to be associated with an objective class. To this end, the objective classes are matched to the current clusters of Pareto-optimal individuals. The central solution of each cluster (i.e.
the one which minimizes the Euclidean distance to the other solutions) is used as a reference point for the objective class to which it is matched. This makes it possible to assign each individual to the objective class whose centre is closest.
The Knowledge-Oriented mutation operator (KO-mutation) first selects an individual randomly from the population; then the probabilities of its class are used to fix the values of the genes, as during initialization, except that a gene mutation occurs according to the mutation probability. If diploid knowledge preservation is not used, mutation is performed randomly and uniformly on the inhibited genes.
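Our hedged reconstruction of the KO-mutation (with diploid preservation off) is sketched below; the data layout is hypothetical: class_probs[g] holds the state probabilities of gene g in the individual's objective class, or -1 if the gene is inhibited, and arity[g] gives the number of states of gene g:

```python
import random

def ko_mutation(individual, class_probs, arity, p_mut, seed=None):
    rng = random.Random(seed)
    for g, probs in enumerate(class_probs):
        if probs == -1:                  # gene inhibited by a prior choice
            if rng.random() < p_mut:     # uniform random mutation
                individual[g] = rng.randrange(arity[g])
        elif rng.random() < p_mut:       # classical random mutation
            individual[g] = rng.randrange(arity[g])
        else:                            # knowledge-oriented resampling
            states = range(arity[g])
            individual[g] = rng.choices(states, weights=probs)[0]
    return individual
```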
Fig. 3. Initialisation, mutation operator and crossover operator
The Knowledge-Oriented crossover operator (KO-crossover) enables exploration or intensification of the search space. It corresponds either to an "inter-class" exchange, crossing individuals belonging to different classes, or to an "intra-class" exchange, crossing individuals of the same class, according to the selection strategy of the parents. Once parent selection is done, the probabilities of their classes are used to determine the crossover points. The crossover is performed in a specific manner for each
individual (unilateral crossover). For each gene, the probability of crossover is equal to 1 minus the probability given by the class of the active individual. This method makes it possible to preserve and, when possible, to exchange the favourable genes of each individual. When the value linked to a gene in the corresponding objective class is -1 (inhibited gene), a unilateral crossover is done with a probability of 0.5 if the Diplo mode is inactive (uniform crossover of inhibited genes). If the Diplo mode is active, inhibited genes are preserved from the evolutionary process.
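As we read it, the per-gene decision rule of this unilateral crossover can be sketched as follows (our illustration; p_state is the class probability of the active individual's current gene value):

```python
import random

def ko_crossover_gene(active_val, other_val, p_state, inhibited, diplo, rng):
    # The active individual keeps its gene with probability p_state and
    # takes the other parent's gene otherwise (crossover prob. 1 - p_state).
    if inhibited:
        if diplo:                        # Diplo on: preserved from evolution
            return active_val
        return active_val if rng.random() < 0.5 else other_val
    return other_val if rng.random() < (1.0 - p_state) else active_val

# Example: rng = random.Random(0); ko_crossover_gene(1, 0, 0.8, False, False, rng)
```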
4 Experimentation and Validation

The main contribution of this study concerns the use of a priori knowledge in three forms: 1) a conceptual dependency structure between parameters, expressed by a BN, 2) the probabilities of this model, stemming from the analysis of previous plans, and 3) explicit structural knowledge (inhibitions between genes, stemming from the graph shape). To evaluate the use of each type of knowledge, the behaviour of three algorithms is studied:

– Classical EA (without a priori knowledge), run with equiprobable objective classes (each state has an equal probability of being mutated or crossed). The features in this case are the inhibition mechanism and the crossover strategies.
– Evolutionary Algorithm Oriented by Knowledge using online learning (noted EAOKX): the network structure is defined at the beginning of the optimization, while the probability tables, initially uniform, are learned every X generations.
– EAOK guided by an exact model (noted EAOKinit): structure and probabilities are learned using a sample of optimal solutions previously generated with an exact approach for small instances, or resulting from previous runs of EAOKX.

Experimentation has been planned in two steps. In the first step, the algorithm is confronted with problems of limited size (different graph shapes with 35 to 90 task nodes and 10 to 40 XOR nodes). This first step allows checking the general behaviour of the algorithm as well as tuning multiple parameters (evolutionary parameters, crossover strategies, learning parameters, use of structural knowledge and diploid knowledge preservation). In a second phase, the behaviour of the proposed algorithm is studied on a large project (approximately one hundred XOR nodes).

Figure 4 and Tables 1, 2 and 3 present the first test results on different small projects (35 randomly generated task nodes, 12 XOR nodes for Figure 4, for example). The first curve of Figure 4 illustrates the average performance of the population of individuals obtained with modes EA, EAOKinit, EAOK1 and EAOK5. The second curve illustrates the average performance of the individuals of the Pareto front. Each curve represents average values obtained over one hundred executions. The settings of the evolutionary parameters are linked to the graph shape, especially the number of XOR nodes. They are experimentally tuned with the classical EA mode for each graph, then used with the other modes (EAOKinit and EAOKX).

EAOKinit shows good performance. After initialisation, the individuals of the population are 25% better than those obtained with EA. These results come from different combinations of the other parameters (crossover strategies, knowledge use, etc.). This explains the large standard deviation, but the ratio between EA and EAOKinit is constant for equivalent settings. The initial gap between EA and EAOKinit corresponds to the direct impact of knowledge injection during initialization. This gap varies according to the MoK
quality and the complexity of the problem being solved. At the last generation, the gap between EA and EAOKinit is about 16%, with a relative standard deviation (RSD) 30% lower, for the individuals of the current population. The population generated by the guided EAOK is always better than with the classical EA, because the MoK leads to a concentration of the population within high-performing areas. The mean fitness of the final Pareto-optimal individuals is improved by 4.82% (RSD 30% lower) at the twentieth generation. EA performance catches up with that of EAOKinit only very progressively, depending on problem complexity (number of parameters and complexity of injected knowledge) and on the evolutionary parameter settings.
Fig. 4. Population and Pareto front average fitness. Tests were run for twenty generations, with a population of thirty individuals, a maximum of nine individuals in the Pareto front, Pmut = 0.5, Pcross = 0.5 and five classes of objectives.

Table 1. Values corresponding to the curves shown in Figure 4. The first line gives the results obtained by the EA mode (value and relative standard deviation (RSD) over the hundred executions), while the values of the other lines are expressed as a percentage compared to the EA.

                 Mean fitness, entire population     |  Mean fitness, Pareto individuals
                 Generation 0      Generation 19     |  Generation 0      Generation 19
Mode             Value     RSD     Value     RSD     |  Value     RSD     Value     RSD
EA               11289     4.4     6961      16.5    |  7521      13.9    5781.4    7.5
EAOKinit         25.52%    3.9     16.46%    11.8    |  16.17%    10.3    4.82%     4.9
EAOK1            -0.56%    4.4     11.93%    27      |  -1.37%    13.7    -2.06%    10.9
EAOK5            1.28%     4.4     13.65%    30      |  0.83%     14.5    0.90%     10.6
Figure 4 also presents the first tests of the on-line learning algorithms. They are equivalent to EA at the beginning of the optimisation process (uniform probability distributions). They deviate from EA after every learning phase. The learning effect is particularly visible in
the EAOK5 mode, with three zones where the difference with EA is intensified (generations 5, 10 and 15). At the beginning of the process, the EAOK1 mode performs better than EAOK5, but the difference is progressively reduced and EAOK5 finally gives better results. Indeed, it gives degrees of freedom to the search process to refine individuals between learning phases. For the mean fitness of the final Pareto-optimal individuals, EAOK1 has the worst performance. Indeed, when the individuals selected for learning are not diversified enough, the guiding tends to limit the search around the existing individuals. EAOK5 takes advantage of the combined effects of search and guiding. This performance has been further improved by tuning the learning parameters.

Concerning the adjustment of the learning algorithm, two important characteristics emerge: the quality of the cases used for learning and the parameter settings of the learning algorithm. After various tests, a very fast learning2 was chosen because it is sufficient to make the main properties of the search space emerge and thus obtain a global guidance. The quality of the available set of learning cases seems to be the most important characteristic for obtaining a correct model. When the number of cases per class of objectives is too restricted, an "over-learning" phenomenon causes the search to stagnate around the already-found individuals, with a risk of stagnation in local minima. Thus, a progressive smoothing of the MoK probabilities can be used, providing two functions: i) it makes it possible to limit over-learning when the cases provided to the learning are too similar; ii) it constitutes a means of gradually giving degrees of freedom to the search process, i.e., of releasing the guiding by the MoK.

Crossover strategies were also evaluated in a preliminary way. The exploratory strategy applied during the whole optimization process gives the best results, so it has been selected for the following tests. Finally, every combination3 of Structural Knowledge (SK) and Diploid Knowledge Preservation (DKP) has been evaluated. The results are presented in Table 2 and concern one hundred executions of each mode on a project of fifty task nodes. Structural knowledge can be used to indirectly manage the knowledge contained in the individuals. While it allows an initial improvement of the EA, it also causes a reduction of the genetic diversity by reducing exchanges between the individuals. On the other hand, the use of structural knowledge with a learned MoK allows an individual to use only the information specific to it among the knowledge contained in its corresponding class. The diploid knowledge preservation mode gives good results only when the individuals already have a good level of performance, by preserving the inactive combinations, which can be re-used if the corresponding genes are reactivated. Conversely, the best strategy with reliable information (EAOKinit) is to use neither structural knowledge nor diploid knowledge preservation. Guidance by the model is then complete, but this strategy should not be maintained because the risk of stagnation increases (strict guiding towards existing individuals).

Our method has finally been tested on a large problem (350 task nodes and more than one hundred XOR nodes in the project graph). The project graph is obtained by gathering the five small projects previously used. An exact algorithm is not suitable for such a large project. The individuals used for the construction of the complete model (EAOKinit) are obtained by collecting the individuals obtained during one execution of
2 Stopping criterion of the EM algorithm: 1% minimal improvement of the log-likelihood.
3 Note that in EA mode, DKP is completely linked to structural knowledge activation, while in the other modes, genes can be inactivated by learned knowledge.
Table 2. Average fitness of the individuals of the Pareto front at the beginning (generations 0 to 2), in progress (generations 10 to 12) and at the end of the optimisation, with various indicators allowing a more precise evaluation of the Pareto front quality: relative standard deviation of the average fitness of the Pareto front individuals (PD), RSD of the distance between two consecutive individuals (DI), overall length of the Pareto front (Lg) and number of individuals in the Pareto front (Nb). The last column presents the average fitness of the best final individual.
                   Average fitness of Pareto front individuals at generation       Indicators
Mode      DKP/SK   0      1      2      10     11     12     18     19      PD    DI    Lg  Nb   Best
EAOKinit  1/0      1667   1465   1445   1359   1348   1344   1360   1369    0.08  0.20  34  8.6  656
EAOKinit  1/1      1706   1545   1465   1384   1390   1400   1399   1396    0.07  0.29  32  8    659
EAOKinit  0/1      1665   1527   1492   1356   1357   1355   1379   1383    0.08  0.24  33  8.3  657
EAOKinit  0/0      1716   1484   1409   1326   1321   1336   1353   1355    0.08  0.20  34  8.6  658
EA        -/0      2322   2215   2032   1541   1522   1518   1451   1439    0.10  0.26  31  6.9  665
EA        1/1      2309   2148   1925   1537   1528   1510   1464   1464    0.11  0.24  30  6.9  674
EA        0/1      2377   2083   1984   1667   1627   1584   1461   1459    0.10  0.25  31  6.7  674
EAOK10    1/0      2505   2255   2032   1570   1498   1483   1369   1363    0.07  0.26  32  7.4  669
EAOK10    1/1      2324   2104   2008   1468   1421   1419   1364   1357    0.08  0.26  31  7.1  661
EAOK10    0/1      2270   2091   1889   1562   1539   1475   1406   1399    0.10  0.25  31  7.5  669
EAOK10    0/0      2380   2183   2095   1606   1545   1500   1398   1391    0.07  0.26  32  7.4  664
Table 3. Average values and associated RSD (over the twenty executions) for the performance of the population individuals (Pop.), the Pareto front individuals (Pareto) and the best individual (Best) at the end of the optimization process, as well as the qualitative indicators for the Pareto front (PD, DI, Lg and Nb) and the execution time in seconds.
          Pop.             Pareto           PD    DI    Lg    Nb     Best            Time (s)
Mode      Val.      RSD    Val.     RSD                              Val.     σ
EA        11357     0.199  6557     0.24    0.09  0.18  11.2  3.95   5453     0.21   217
EAOK10    7348      0.20   5688     0.14    0.07  0.16  9.4   4.1    4876     0.12   298
EAOKinit  5420      0.18   3601     0.17    0.11  0.21  9.5   3.6    2953     0.11   201
EAOK10 (390 individuals). Table 3 presents the average of twenty executions of our algorithm (thirty generations of fifty individuals, Pmut = Pcross = 0.5). The EAOK10 algorithm shows an interesting behaviour. The population is improved overall, as are the individuals of the Pareto front. At the last generation, the gap between EA and EAOK10 reaches 54% (population), 15% (Pareto front) and 11% (best individual), in favour of EAOK10. Moreover, these performances are more regular than with EA. The learning improves the results, especially the precision and reliability of the optimisation. It also seems that the performance obtained depends strongly on the quality of the search before the first learning. An interesting prospect is to tune the EA to support the diversity of individuals, in order to improve the quality of the individuals provided to the learning algorithm. However, in the current version of the platform, the inference time needed to update the probability classes remains significant. EAOK10 indeed requires approximately 300 seconds to reach the thirtieth generation with two learning phases, i.e., approximately 27% more time than the EA.
5 Conclusion/Perspectives

The results obtained show the interest of the different levels of knowledge reuse. When the knowledge contained in the model of knowledge is reliable, our method allows a significant improvement in performance. When the MoK is erroneous or incomplete, the tests carried out on the learning algorithm enabled us to study the abilities of the learning process within the suggested method. To validate our approach completely, it still remains to confront it with standard problems ("benchmarks"). However, the tests carried out show the higher performance of our guided evolutionary algorithm compared to a traditional EA. Moreover, the advantages of our model are not only a better-guided and more efficient optimization than with a classical EA, but also the possibility of capitalizing knowledge on the planned projects according to their context, as well as the possibility of providing the expert with the MoK used during optimization, in addition to the optimized solutions. It is indeed useful for the decision maker to be able to consult a Bayesian network, thanks to the tools offered by this formalism, and to directly visualize the influence of his or her future decisions on the objectives.
On-Line, On-Board Evolution of Robot Controllers

N. Bredeche1, E. Haasdijk2, and A.E. Eiben2

1 TAO/LRI, INRIA Saclay, Univ. Paris-Sud, CNRS (Orsay, France)
2 Free University, Amsterdam, The Netherlands
[email protected], {e.haasdijk,gusz}@few.vu.nl
Abstract. This paper reports on a feasibility study into the evolution of robot controllers during the actual operation of robots (on-line), using only the computational resources within the robots themselves (on-board). We identify the main challenges that these restrictions imply and propose mechanisms to handle them. The resulting algorithm is evaluated in a hybrid system, using the actual robots’ processors interfaced with a simulator that represents the environment. The results show that the proposed algorithm is indeed feasible and the particular problems we encountered during this study give hints for further research.
1 Background and Introduction

Evolutionary Computing (EC) has proved a powerful technology for developing robot controllers [4] and has resulted in the establishment of Evolutionary Robotics (ER). The overwhelming majority of ER applications use an off-line flavour of evolution. In these cases an evolutionary algorithm (EA) is used to optimise the robot controllers before the robots start their actual operation. This process may rely on real-life fitness evaluations or on a simulation-based assessment of controller quality, but in all cases the EA is executed on one or more computer(s) distinct from the robots. Once the development process has terminated, the controllers are deployed on real robots and remain fixed while the robots go about their given tasks. Thus, during the operational period of the robots, the controllers do not adapt anymore (or at least, not by evolutionary operators [8,9,14]).

The present study was undertaken as part of the Symbrion project1 that explicitly aims at using evolution on-line. That is, the evolutionary algorithm is required to adapt the robot controllers during the actual operation period of the robots. Such a switch from (off-line) optimisation to pervasive adaptation offers advantages in cases where the environment is changing and/or it is impossible to optimize the robots for the circumstances in which they will operate (for instance, because they are not known well enough in advance). One of the premises of the Symbrion project is the presence of a large group of robots that form a changing "social environment" for each other, which, in turn, necessitates on-line adaptation. All in all, we aim at a system that is decentralised, on-board, without any master computer that executes the evolutionary operators, and fully autonomous, with no human intervention. These requirements imply two major restrictions:

1 EU FP7, FET project, Grant No. 216342, http://symbrion.eu/
1. Fitness must be evaluated in vivo, i.e., the quality of any given controller is determined by actually using that controller in a robot as it goes about its tasks.
2. All necessary computation must be performed by the robots themselves, implying limited processing power and storage capacity.

The real-life, real-time fitness evaluations are inevitably very noisy because the initial conditions for the genomes under evaluation vary considerably. Whatever the details of the evolutionary mechanism, different controllers will be evaluated under different circumstances; for instance, the nth controller will start at the final location of the (n − 1)th one. This leads to very dissimilar evaluation conditions and ultimately to very noisy fitness evaluations. The limited processor power and storage capacity imply that we must use a "lightweight" evolutionary algorithm, with a small population per robot. Obviously, this could limit the exploratory power of the EA, with a high risk of premature convergence at a local optimum. Taking these considerations into account, we formulate the following research objectives:

1. Provide an evolutionary mechanism that can cope with noisy fitness evaluations.
2. Provide an evolutionary mechanism that can perform balanced local and global search even with very small populations.

Related work on the on-line, on-board evolution of robot controllers can be roughly divided into two categories.

The distributed online onboard ER approach. Each robot carries one genotype and is controlled by the corresponding phenotype. Robots can reproduce autonomously and asynchronously and create offspring controllers by recombination and/or mutation. Here, the iterative improvement (optimisation) of controllers is the result of the evolutionary process that emerges from the exchange of genetic information among the robots [17].

The encapsulated online onboard ER approach. A robot has an EA implemented on-board, maintaining a population of controllers inside itself. The EA runs locally and performs the fitness evaluations autonomously. This is typically done in a time-sharing system, where one member of the inner population is activated (i.e., decoded into a controller) at a time and is used for a while to gather feedback on its quality. Here, the iterative improvement (optimisation) of controllers is the result of the EA running on a single robot [10,16]. This can be extended to a multi-robot setup, where each robot is completely independent of the others.

Note that both approaches inherently work with a heterogeneous population of robot controllers. The two approaches can also be combined, and often are, resulting in a setup akin to an island model as used in parallel genetic algorithms. In such a combined system, there are two ways of mixing genetic information: intra-island variation (i.e., within the "population" of the encapsulated EA in one robot) and inter-island migration (between two or more robots) [6,15,18,11,3]. The work presented here falls in the second category, i.e., the encapsulated approach, explicitly aiming at online adaptation for a single robot.
2 The (1+1)-ONLINE Evolutionary Algorithm

We propose an EA based on the classical (1+1) Evolution Strategy (ES) [13]. In our experiments, the genome consists of the weights of an artificial neural network (NN) that controls the robot, formally a real-valued vector x̄ = (x1, ..., xn). The controlling NN is a perceptron with 9 input nodes (8 sensor inputs and a bias node), no hidden nodes and 2 output nodes (the left and right motor values), i.e., 18 weights in total. Thus, the genome is a vector of 18 real values. The perceptron uses a hyperbolic tangent activation function.

Variation in a (1+1) ES is necessarily restricted to mutation. This is implemented as straightforward Gaussian mutation, adding values drawn from a distribution N(0, σ) to each xi in the genotype x̄. Parent selection in a singleton population is trivial, and for survival selection we rely on the so-called + strategy: the child (challenger) replaces the parent (champion) if its fitness is higher. This simple scheme defines the core of our EA, but it is not sufficient to cope with a number of issues in our particular application. Therefore, we extend this basic scheme with a number of advanced features, described below.

Adapting σ values. A singleton population is very sensitive to premature convergence to a local optimum. To overcome this problem, we augment the EA with a mechanism that varies the mutation step size σ on the fly, switching from local to global search and back, depending on the course of the search. In particular, σ is set to a pre-defined minimum to promote local search whenever a new genome is stored (that is, when the challenger outperforms the champion). Then, σ gradually increases up to a maximum value (i.e., the search shifts towards global search) while the champion outperforms its children. If local search leads to improvements, σ remains low, thus favouring local search. If no improvement is made on a local basis, either because of a neutral landscape or a local optimum, the increasing σ values ensure that the search will move to new regions in the search space.

Recovery period. Because we use in vivo fitness evaluation, a new genome x̄ needs to be "activated" to be evaluated: it has to be decoded into a NN and take over the control of the robot for a while. One of the essential design decisions is to avoid any human intervention during evolution, such as repositioning the robot before evaluating a new genome. Consequently, a new controller will start where the previous one finished, implying the danger of being penalised for the bad behaviour of its predecessor, which, for instance, may have manoeuvred itself into an impossibly tight corner. To reduce this effect, we introduce a recoveryTime, during which robot behaviour is not taken into account in the fitness computation. This favours genomes that are efficient both at getting out of trouble during the recovery phase and at displaying efficient behaviour during the evaluation phase.

Re-evaluation. The evaluation of a genome is very noisy because the initial conditions for the genomes vary considerably: an evaluation must start at the final location of the previous evaluation, leading to very dissimilar evaluation conditions from one genome to another. For any given genome this implies that the measurement of its fitness during the evaluation period may be misleading, simply because of lucky/unlucky starting conditions. To cope with such noise, we re-evaluate the champion (i.e., current best) genome with a probability P_reevaluate.
This is, in effect, resampling as advocated by Beyer to deal with noisy fitness evaluations [1] and
it implies sharing the robot's time between producing and evaluating new genomes and re-evaluating old ones. The fitness value that results from this re-evaluation could be used to refine a calculation of the average fitness of the given genome. However, we choose to overwrite the previous value instead. This may seem counterintuitive, but we argue that this works as a bias towards genomes with low variance in their performance. This makes sense as we prefer controllers with robust behaviour. It does, however, entail an intrinsic drawback, as good genomes may be replaced by inferior but lucky genomes evaluated in favourable but specific conditions. Then again, a lucky genome which is not good on average will not survive re-evaluation, avoiding the adaptive process getting stuck with a bad genome.

Algorithm 1. The (1+1)-ONLINE evolutionary algorithm

for evaluation = 0 to N do
    if random() < P_reevaluate then
        Recover(Champion)
        Fitness_Champion = RunAndEvaluate(Champion)
    else
        Challenger = Champion + N(0, σ)    {Gaussian mutation}
        Recover(Challenger)
        Fitness_Challenger = RunAndEvaluate(Challenger)
        if Fitness_Challenger > Fitness_Champion then
            Champion = Challenger
            Fitness_Champion = Fitness_Challenger
            σ = σ_min
        else
            σ = σ · 2
        end if
    end if
end for
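As a concrete illustration, here is a minimal, runnable Python sketch of Algorithm 1. The initialisation of the first champion and the clamping of σ and of the gene values to the ranges quoted in Section 3 are assumptions about details the pseudocode leaves open; recover() and run_and_evaluate() stand for the robot-side recovery and in vivo evaluation periods.

```python
import random

SIGMA_MIN, SIGMA_MAX = 0.01, 4.0   # sigma range quoted in Section 3
GENE_MIN, GENE_MAX = -4.0, 4.0     # gene value range quoted in Section 3

def one_plus_one_online(n_genes, n_evaluations, p_reevaluate,
                        recover, run_and_evaluate):
    champion = [random.uniform(GENE_MIN, GENE_MAX) for _ in range(n_genes)]
    recover(champion)
    fitness_champion = run_and_evaluate(champion)
    sigma = 1.0                    # initial value used in the experiments
    for _ in range(n_evaluations):
        if random.random() < p_reevaluate:
            recover(champion)
            # overwrite-last-fitness scheme: the new estimate replaces the old one
            fitness_champion = run_and_evaluate(champion)
        else:
            challenger = [min(GENE_MAX, max(GENE_MIN, x + random.gauss(0.0, sigma)))
                          for x in champion]
            recover(challenger)
            fitness_challenger = run_and_evaluate(challenger)
            if fitness_challenger > fitness_champion:
                champion, fitness_champion = challenger, fitness_challenger
                sigma = SIGMA_MIN  # new champion: switch back to local search
            else:
                sigma = min(2.0 * sigma, SIGMA_MAX)  # drift towards global search
    return champion, fitness_champion
```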
3 Experimental Setup

We evaluate the (1+1)-ONLINE algorithm in a set-up that features actual robotic hardware, a Cortex M3 board with 256kb memory. This controls a simulated autonomous robot in a Player/Stage2 environment. Using the Cortex board instead of a fully simulated setup is due to administrative constraints in the project within which this research takes place: the Cortex board is the same hardware that is currently being integrated in the Symbrion robot prototypes, and there is a strong emphasis on validation under similar hardware constraints. After N time-steps, the evaluation of the current controller is complete and the controller parameters are replaced with values from a new genome, which is evaluated from the location the previous controller left it in. This means that no human intervention is ever needed. We run the experiment 12 times. Figure 1 illustrates the experimental set-up, with a Cortex board connected to the computer running Player/Stage. The simulated robot is modelled after an ePuck mobile robot with two wheels and eight proximity sensors.

2 http://playerstage.sourceforge.net
Fig. 1. The experimental setup: the Cortex board connected to Player/Stage. The numbers in the Player/Stage arena indicate the starting positions for the validation trials.
The maze environment used in our experiment is exactly as shown in this figure. For each run of the experiment, the robot starts with a random genome and a random seed. The fitness function is inspired by a classic one, described in [7], which favours robots that are fast and go straight ahead; this is of course in contradiction with a constrained environment, implying a trade-off between translational speed and obstacle avoidance. The following equation describes the fitness calculation:

$$fitness = \sum_{t=0}^{evalTime} speed_{translational} \cdot (1 - speed_{rotational}) \cdot (1 - minSensorValue)$$
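In Python, and taking the three quantities as already normalised to [0, 1] (as described next), the per-step term and its accumulation over an evaluation period might be sketched as follows; robot_step() is an assumed callback that advances the simulation by one time-step and returns the three values.

```python
def step_fitness(speed_translational, speed_rotational, min_sensor_value):
    # One term of the fitness sum; all inputs normalised to [0, 1].
    return speed_translational * (1.0 - speed_rotational) * (1.0 - min_sensor_value)

def evaluate(robot_step, eval_time):
    # Accumulate the per-step reward over evalTime time-steps.
    return sum(step_fitness(*robot_step()) for _ in range(eval_time))
```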
All values are normalised between 0 and 1. minSensorValue is the value of the proximity sensor closest to any obstacle, normalised to [0, 1] (i.e., the value decreases as an obstacle gets closer). We used the following settings during our experiments: both recoveryTime and evaluationTime are set to 30 time-steps, P_reevaluate is set to 0.2, the initial value of σ is set to 1 and may range from 0.01 up to a maximum of 4, and the gene values are defined to be in [−4, +4]. It is important to note that this fitness function is used as a test function; the current algorithm is by no means limited to optimizing collision avoidance. Relying on such a fitness function makes it possible to focus on the dynamics of the evolutionary algorithm with regard to the desired properties.

To provide an indication of the true performance and reusability of the best individuals found by (1+1)-ONLINE evolution, a hall-of-fame is computed during the course of evolution from the champions of all runs. The 10 best genomes from the hall-of-fame are validated by running each from six initial positions in the environment, indicated in Figure 1. Starting from each of these positions, the genomes are evaluated for ten times the number of steps used for evaluation during evolution. Note that one of the validation starting positions has certainly never been visited during development (test no. 4, within a small enclosed area) and provides an extreme test case in a very constrained environment. This decomposition into an evolution (development) phase and a post-experiment testing
phase is similar to the learning and testing phases commonly seen in Machine Learning and does not imply a deployment phase as in traditional, off-line ER approaches.
4 Results

Evolution dynamics. We conducted a series of twelve independent experiments of (1+1)-ONLINE evolution, with parameters set as stated above. Each experiment started with a different random controller (with very poor behaviour indeed) and a different random seed. The experiments ran for 500 evaluations and displayed different overall fitness dynamics with very similar patterns. Figure 2 shows typical statistics from one of those runs. Evaluations are denoted on the x-axis. The y-axis consists of two parts: the top half shows the fitness of the current champion genome. When a champion is re-evaluated very poorly, or is replaced by an individual that upon re-evaluation turns out to be very bad, the fitness drops dramatically, as happens in this case after about 250 evaluations. The bottom half of the y-axis shows the number of (re-)evaluations of the current champion (downwards; the lower the line, the higher the number of re-evaluations). Every time the champion is re-evaluated, the line drops down a notch, until a new champion is found; then, the number of re-evaluations is reset and the line jumps back to the top. The small vertical markers near the x-axis indicate whenever a new champion is adopted, i.e., when the challenger outperforms the current champion.
Fig. 2. Evolution dynamics of a typical run
By analysing the course of the evolutionary process for the experiments, we can observe important mechanisms such as local search (small continuous improvements in the fitness values due to nearby genomes), global search (the ability to get out of a neutral landscape or to jump from a local optimum to a different region), performance and robustness (the ability of certain genomes to display good performance and to remain champion even through re-evaluation).
Initially, performance is quite low, as is the number of re-evaluations; in effect, we are waiting for random search (σ is very high at this point) to bootstrap the adaptation. Then, after about 90 evaluations, an interesting individual is selected as champion that produces children that are even better. We observe a quick sequence of new, better-performing individuals and an increasing fitness. After about 160 evaluations, we find that the champion has good fitness and is very robust: the individual survives many re-evaluations (the bottom line goes down) while displaying similar fitness values after successive re-evaluations. During this period, σ steadily increases (as prescribed by the algorithm), causing mutation to become more and more aggressive, approaching random search. Eventually, this results in a challenger that beats the champion, either because this newcomer is actually very good or because it was lucky (favourable environmental conditions, or taking advantage of an unfortunate re-evaluation of the champion). Observation during the experiments showed that the latter option is more likely: at some point, the champion encounters a very difficult set-up and is re-evaluated as performing badly, so that almost any challenger has a good chance of beating it. In the plot, this is exactly what happens at the precipitous drop in performance after 250 evaluations.

In all our experiments, we saw a similar pattern: initial random search characterised by many different genomes with poor fitness; then local search characterised by subsequent genomes with increasing fitness, until a robust genome is found that survives re-evaluation for some time; and then a switch to another region that yields good results, or towards an inferior genome that got lucky (almost a restart, in effect). From the point of view of operational robot control such performance degradation may seem undesirable, but bear in mind that the (1+1)-ONLINE algorithm is meant as a global search algorithm. Therefore, such regular fitness reversals are a desired property as long as the search is slightly conservative around good individuals (as is evident from the lengthy episodes of re-evaluation in Figure 2). The regular re-evaluation of the champion promotes (in addition to the varying σ discussed below) global search; because of the noisy fitness calculation, if nothing else, it will occasionally be assessed as performing very poorly indeed. Such an occurrence provides an opportunity for a lucky new and possibly quite different genome to overthrow the champion.

Validation of the hall-of-fame. As described in Section 3, a hall-of-fame was maintained during the course of the experiments for further validation of the apparently best genomes. Figure 3 shows the results of the validation of the hall-of-fame for the selected re-evaluation scheme (the champion's fitness is overwritten after every re-evaluation) and for two alternatives: one where the fitness is the average of all re-evaluations, and one where there is no re-evaluation at all. This allows us to assess two things: whether high-ranking genomes in the hall-of-fame are also efficient in a new set-up, and whether the "overwrite fitness" re-evaluation scheme is relevant. The y-axis shows the normalised performance: the best of all individuals for a scenario is set to 1.0, and the performance of the other individuals is scaled accordingly. For each scenario (arranged along the x-axis), the graphs show a mark for each individual from the hall-of-fame.
All results for a given genotype are linked together with a line. The graph clearly shows that re-evaluation improves performance substantially; from the ten best solutions without re-evaluation, only a single one performs at a level
Fig. 3. Performance on validation scenarios for various re-evaluation schemes. Top: overwrite-last-fitness scheme; middle: average-fitness scheme; bottom: no re-evaluation scheme. The x-axis shows the results on the six different validation setups (see Fig. 1); the y-axis shows the normalized fitness performance for each run. For a given genome, the results in the six validation setups are joined together with a line.
comparable to that of the ones with re-evaluation. The best individuals in the hall-of-fame for both re-evaluation variants are, on the whole, quite efficient; some come quite close to the maximum possible performance for these test cases (30,000). It is harder to distinguish between the performances of the two variants: on the one hand, the spread of performance seems greater for the case with averaging fitness than it does for overwriting fitness, which would endorse the reasoning that overwriting after re-evaluation
promotes individuals with high average fitness and low standard deviation. On the other hand, however, the nature of real-world experiments has a negative impact on the amount of data available, as is often the case with real hardware, and keeps us from formulating a statistically sound comparison of re-evaluation strategies. In particular, hardware contingencies imply strong constraints regarding time and human intervention, as the robots must be re-located for each experiment and the Cortex board must be reloaded with the genome to be tested, as opposed to the completely autonomous setup during the evolution phase. Overall, the testing of ten genomes took approximately a full day of work with full investment from the experimenter3.

Behavioural diversity. Further analysis of the ten best individuals with the overwrite-fitness re-evaluation scheme shows that the controllers actually display different kinds of behaviour: all good and robust, but with different wall-avoidance and/or open-environment exploration strategies, ranging from cautious long turns (reducing the probability of encountering walls) to exploratory straight lines (improved fitness but more walls to deal with). Figure 4 illustrates this by showing the pathways of these individuals, starting from an initial position on the left of the environment. This reflects the genotypic diversity observed in the hall-of-fame and hints at the algorithm's capability to produce very different strategies with similar fitness.
Fig. 4. Traces for the ten best controllers (using fitness replacement after re-evaluation)
Strong causality and mutation. The reasoning behind the scheme to update σ relies on strong causality [12]: it only holds if small changes in the genome lead to small changes in behaviour and big changes in the genome to big changes in behaviour. To investigate whether this property holds, a separate set of experiments was performed. For a range of σ values, 200 mutants were created from some fixed initial genome; every one of these mutants was then tested in our arena, from 193 different starting locations (homogeneously distributed over the environment), each with four orientations (i.e., a total of 772 tries per genome); each evaluation lasted 30 time-steps. Because such experiments using the Player/Stage and Cortex set-up described above would require approximately 3.5 years to run, we used a simplified autonomous robot simulator. Each experiment started from one specific genome, the first experiment from the best genome in the hall-of-fame (Fig. 5(a)) and the second experiment from a randomly generated genome (Fig. 5(b)). In both figures, the x-axis shows σ. The y-axis shows the range of fitness values:

3 To some extent, an illustrative metaphor is that of a biologist performing an experiment with mice.
the sum of fitnesses for each mutant over all 772 trials. For every value of σ, the candle bars show the minimum, maximum, median, and lower and upper quartiles. Figure 5(c) shows a histogram of the frequency of σ values over 12 runs (approximately 4,700 evaluations) of the original experiment. The (logarithmic) x-axis shows the occurring values of σ, ranging from 0.01 to 4. The count of occurrences is displayed along the y-axis.
Fig. 5. Strong causality experiments: (a) starting from one of the best genomes; (b) from a random start; (c) incidence of σ values
Graphs (a) and (b) show that, as σ increases, the performance of mutated individuals becomes increasingly different; it actually covers the whole domain for medium to large values of σ. When starting from the 'best' genome, the average performance decreases as the mutations move further and further away from the original genome. From a randomly generated start point, performance changes either way as we move away from the original genome. This shows that there is strong causality: small changes in the genome (low σ) lead to small variations in fitness, and big changes lead to large variations. Finally, Figure 5(c) shows the density of σ values over the 12 runs of the original experiment. As shown in the graph, all possible values of σ, from very small (entailing local search) to large (global search), occurred frequently in the course of our experiments, with more occurrences of both the minimum value (i.e., local search) and the maximum value (i.e., global search). This provides a sound validation of the (1+1)-ONLINE algorithm's ability to conduct both local and global search thanks to the self-updating σ.
5 Conclusions and Further Work

This paper provides a proof-of-concept for the viability of on-line, on-board evolution in autonomous robots. We have presented the (1+1)-ONLINE evolutionary algorithm to provide continuous adaptation in autonomous robots in unknown and/or changing environments, without help from any supervisor (human or otherwise). The (1+1)-ONLINE evolutionary algorithm is based on an "encapsulated" online onboard ER approach, as explained in the introduction. It was tested on very constrained, embedded hardware: the Cortex board we used in the experiments is limited in terms of performance as well as
memory (256kb, including the operating system, complying with the robot prototype actually under construction). This requires a light-weight, low-complexity algorithm such as the one presented here, which is derived from the well-known and well-established (1+1) evolution strategies.

One of the main contributions of the (1+1)-ONLINE evolutionary algorithm is that, by using re-evaluation, it specifically deals with the intrinsically noisy performance evaluation in real-world environments. This greatly increases the real-life applicability of this method and constitutes an original contribution compared to previous research into similar on-line setups. The second contribution is that of balancing local and global search through the σ update. Walker et al. described a similar (1+1)-ES-inspired scheme with self-tuning σ [16]; however, the proposed approach updates σ through a heuristic that explicitly tunes the local and global search. An on-line approach such as the one presented here tackles problems beyond the scope of traditional off-line ER, such as dealing with dynamic or unknown environments for which the robots could not be optimised before their deployment, and it also addresses issues such as the reality gap and the lack of fidelity in simulation [2,17].

In the experiments shown here, the (1+1)-ONLINE evolutionary algorithm yielded a great variety of behaviours in a limited amount of time; good performance was typically reached in under an hour. While the task at hand is relatively simple (obstacle avoidance and maximisation of translation speed), it should be noted once again that it requires no background knowledge whatsoever about the task, and that the current algorithm can be applied in different contexts simply by rewriting the fitness function. The dynamics of evolution often result in the loss of a very good genome; this is actually desired behaviour of the algorithm, as it ensures continued exploration of new or changed regions in the search space. It could, however, be interpreted as a complication from the engineer's viewpoint in a production environment where one wants to retain good genomes. This is in fact an instance of the well-known issue of exploration vs. exploitation in reinforcement learning; in this context, the algorithm proposed here provides the exploration. A reservoir such as the hall-of-fame introduced above can keep track of the best genomes and allow them to be re-used for exploitation.

Further research focuses on the following issues. Firstly, we consider alternative schemes to update the champion's fitness value after re-evaluation, to combine the benefits of the "last fitness" and the "average fitness" approaches. This could, for instance, be achieved by weighting the influence of the latest fitness estimate and previous ones. Secondly, we intend to extend the algorithm towards a multi-robot set-up combining the distributed and encapsulated approaches (cf. Section 1): adding a genome migration feature would make it possible to spread good genomes through a population of robots, similar to an island-based parallel EA. Thirdly, we plan to test on-line, on-board evolution in a group of robots within dynamic environments; as a matter of fact, preliminary work with a real robot has already been done as an extension of this current work [5].
Acknowledgements

This work was made possible by the European Union FET Proactive Initiative: Pervasive Adaptation, funding the Symbrion project under grant agreement 216342. We also thank
Selmar Smit for his enthusiastic contribution to our discussions and Olivier Teytaud for discussions regarding the quasi-random number generator used on the Cortex board.
References

1. Beyer, H.-G.: Evolutionary algorithms in noisy environments: theoretical issues and guidelines for practice. Computer Methods in Applied Mechanics and Engineering 186(2-4), 239–267 (2000)
2. Brooks, R.A.: Intelligence without reason. In: Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI-91), Sydney, Australia, pp. 569–595. Morgan Kaufmann, San Francisco (1991)
3. Elfwing, S., Uchibe, E., Doya, K., Christensen, H.: Biologically inspired embodied evolution of survival. In: Proceedings of the 2005 IEEE Congress on Evolutionary Computation, Edinburgh, UK, September 2-5, vol. 3, pp. 2210–2216. IEEE Press, Los Alamitos (2005)
4. Floreano, D., Husbands, P., Nolfi, S.: Evolutionary robotics. In: Siciliano, B., Khatib, O. (eds.) Handbook of Robotics, pp. 1423–1451. Springer, Heidelberg (2008)
5. Montanier, J.-M., Bredeche, N.: Embedded evolutionary robotics: the (1+1)-restart-online adaptation algorithm. In: IEEE IROS Workshop on Exploring New Horizons in Evolutionary Design of Robots (EvoDeRob 2009) (2009)
6. Nehmzow, U.: Physically embedded genetic algorithm learning in multi-robot scenarios: the PEGA algorithm. In: Proceedings of the Second International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, Edinburgh, UK, August 2002. Lund University Cognitive Studies, no. 94, LUCS (2002)
7. Nolfi, S., Floreano, D.: Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines. MIT Press/Bradford Books, Cambridge (2000)
8. Nolfi, S., Parisi, D.: Auto-teaching: networks that develop their own teaching input. In: Proceedings of the Second European Conference on Artificial Life, Free University of Brussels. MIT Press, Cambridge (1993)
9. Nolfi, S., Parisi, D., Elman, J.L.: Learning and evolution in neural networks. Adaptive Behavior 3(1), 5–28 (1994)
10. Nordin, P., Banzhaf, W.: An on-line method to evolve behavior and to control a miniature robot in real time with genetic programming. Adaptive Behavior 5(2), 107–140 (1997)
11. Perez, A.L.F., Bittencourt, G., Roisenberg, M.: Embodied evolution with a new genetic programming variation algorithm. In: Fourth International Conference on Autonomic and Autonomous Systems (ICAS 2008), pp. 118–123 (2008)
12. Rechenberg, I.: Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart (1973)
13. Schwefel, H.-P.: Numerical Optimisation of Computer Models. Wiley, New York (1981)
14. Nolfi, S., Parisi, D., Elman, J.L.: Learning and evolution in neural networks. Adaptive Behavior 3(1), 5–28 (1994)
15. Usui, Y., Arita, T.: Situated and embodied evolution in collective evolutionary robotics. In: Proceedings of the 8th International Symposium on Artificial Life and Robotics, pp. 212–215 (2003)
16. Walker, J.H., Garrett, S.M., Wilson, M.S.: The balance between initial training and lifelong adaptation in evolving robot controllers. IEEE Transactions on Systems, Man, and Cybernetics, Part B 36(2), 423–432 (2006)
17. Watson, R.A., Ficici, S.G., Pollack, J.B.: Embodied evolution: distributing an evolutionary algorithm in a population of robots. Robotics and Autonomous Systems 39(1), 1–18 (2002)
18. Wischmann, S., Stamm, K., Wörgötter, F.: Embodied evolution and learning: the neglected timing of maturation. In: Almeida e Costa, F., Rocha, L.M., Costa, E., Harvey, I., Coutinho, A. (eds.) ECAL 2007. LNCS (LNAI), vol. 4648, pp. 284–293. Springer, Heidelberg (2007)
The Transfer of Evolved Artificial Immune System Behaviours between Small and Large Scale Robotic Platforms

Amanda M. Whitbrook, Uwe Aickelin, and Jonathan M. Garibaldi

Intelligent Modelling and Analysis Research Group (IMA), School of Computer Science, University of Nottingham, Nottingham, NG8 1BB
[email protected], [email protected], [email protected]
Abstract. This paper demonstrates that a set of behaviours evolved in simulation on a miniature robot (epuck) can be transferred to a much larger scale platform (a virtual Pioneer P3-DX) that also differs in shape, sensor type, sensor configuration and programming interface. The chosen architecture uses a reinforcement learning-assisted genetic algorithm to evolve the epuck behaviours, which are encoded as a genetic sequence. This sequence is then used by the Pioneers as part of an adaptive, idiotypic artificial immune system (AIS) control architecture. Testing in three different simulated worlds shows that the Pioneer can use these behaviours to navigate and solve object-tracking tasks successfully, as long as its adaptive AIS mechanism is in place.
1 Introduction
Evolutionary robotics is a technique that refers to the genetic encoding of autonomous robot control systems and their improvement by artificial evolution. Ideally the end product should be a controller that evolves rapidly and has the properties of robustness, scalability and adaptation. However, in practice it proves difficult to achieve all of these goals without introducing an additional mechanism for adaptability since behaviour is essentially an emergent property of interaction with the environment [11]. Thus, the major challenge facing evolutionary robotics is the development of solutions to the problem of brittleness via the design of controllers that can generalize to modified environments. The characteristics of the robot body and its sensorimotor system may be regarded as part of the environment [2] as all embodied systems are physically embedded within their ecological niche and have a dynamic reciprocal coupling to it [9]. Indeed, artificial evolution often produces control systems that rely heavily on body morphology and sensorimotor interaction [3], and when these are subsequently altered, the changes can affect behavioural dynamics drastically. Thus, a solid test for robustness, scalability and adaptation is the ability of an evolved control system to function not only in different physical environments, but also on a number of robotic platforms that differ in size, morphology, sensor type and sensor-response profile. This paper is therefore concerned with
demonstrating the theoretical and practical cross platform transferability of an evolutionary architecture designed to combat adaptation problems.

Adaptation is usually made possible through the introduction of additional mechanisms that permit some kind of post-evolutionary behaviour modification. The architecture used here falls into the general category of approaches that combine evolution or long term learning (LTL) with a form of lifelong or short term learning (STL) to achieve this [14]. The particular technique consists of the rapid evolution of a number of diverse behaviour sets using a Webots [10] simulation (LTL), followed by the use of an idiotypic artificial immune system (AIS) for selecting appropriate evolved behaviours as the robot solves its task in real time (STL). The approach differs from most of the evolutionary schemes employed previously in the literature in that these are usually based on the evolution of neural controllers [6] rather than the actual behaviours themselves.

Previous papers have provided evidence that the idiotypic AIS architecture has advantages over a reinforcement learning scheme when applied to mobile robot navigation problems [15] and have shown that the idiotypic LTL-STL architecture permits transference from the simulator to a number of different real-world environments [16]. The chief aim of this paper is, therefore, to supply further support for the robustness, scalability and adaptability of the architecture by showing that it can be extended to the much larger scale Pioneer P3-DX robots. For this purpose, the behaviours evolved for the epuck in the Webots simulator are transplanted onto the Pioneer and used both with and without the idiotypic network in Player's Stage [5] simulator. The results successfully demonstrate the scalability of the method in the virtual domain, and provide strong empirical evidence that the idiotypic selection feature is a vital component for achieving this.

The remainder of this paper is structured as follows. Section 2 introduces some essential background information about the problem of platform transfer in mobile robotics and previous attempts to achieve it. Section 3 describes the LTL-STL control system, including its modular structure and the encoding of the evolved behaviours. In particular, it shows how platform transfer is achieved between the epuck and Pioneer P3-DX robots. Section 4 provides information regarding the simulated test environments and experimental set-up and Section 5 presents and discusses the results. Section 6 concludes the paper.
2 Background and Relevance
Cross platform transfer of an intelligent robot-control algorithm is a highly desirable property since more generic software is more marketable for vendors and more practical for users with more than one robot type. Furthermore, software that is robust to changes in a user’s hardware requirements is particularly attractive. However, transferability between platforms is difficult to achieve and is hence extremely rare in mobile robotics [2]. This is primarily due to hardware differences such as the size, morphology and spatial relationships between the body, actuators and sensors, which constitute a drastic change of the environment from an ecological perspective. Since modern mobile robot systems are
distributed systems, transfer may also be hindered by diversity of middleware, operating systems, communications protocols and programming languages and their libraries [13]. Furthermore, portability is made even more challenging by differences in sensor type, sensor characteristics and the mechanical structure of the robot.

Despite its rarity, platform transfer for evolved control systems is reported in the literature. Floreano and Mondada [2,3] use an incremental approach to evolve artificial neural networks for solving a looping maze navigation problem. Evolution begins with a real miniature Khepera robot and gradually moves to a Koala, a larger, more fragile robot supplied by the same manufacturer. Within this architecture, previously evolved networks are gradually adapted, combined, and extended to accommodate the changing morphology and sensorimotor interfaces [3]. However, the scheme possesses some significant drawbacks. The use of physical robots for the evolution is both impractical and infeasible due to the excessive amount of time and resources required. For example, adaptability to the Koala platform emerges only after 106 generations on the real Khepera and an additional 30 on the Koala (each generation taking approximately 40 minutes). Also, if each new environment or platform requires additional evolution, then there is no controller that is immediately suitable for an unseen one. Another consideration is that the Koala was deliberately designed to support transfers from the Khepera and is thus very similar in terms of the wheel configuration, IR sensors, vision module, low-level BIOS software and communication method. This is good from a practical perspective, but one could argue that the new, supposedly unknown environment is far too engineered.

Floreano and Urzelai [4,12] also evolve a neural network to control a light-switching robot, but evolve the mechanisms for self-organization of the synaptic weights rather than the weights themselves. This means that the robot is rapidly able to adapt its connection weights continuously and autonomously to achieve its goal. The authors transfer the evolved control system from simulation to a real Khepera and also from a simulated Khepera to a real Koala. They report no reduction in performance following the transfer. However, since the same platforms are used as in [2,3], the new environment is, again, too engineered. In addition, the task is very simple and the environment is sparse, requiring minimal navigational and obstacle avoidance skills.

In this paper more complex tasks and environments are used to demonstrate that behaviours evolved on a simulated epuck can be used by a larger, unrelated robot that has not deliberately been designed for ease of transfer (the Pioneer P3-DX). This represents a much more difficult platform transfer exercise than has been attempted before and is hence a more realistic test of control system adaptability. In particular, Pioneer P3-DX robots differ significantly from epucks in mechanical structure, body size, body shape and wheel size, and possess sixteen sonar sensors rather than the eight infrared (IR) sensors of the epuck, which also have a different spatial arrangement. The Pioneer is also produced by a different manufacturer and uses different middleware and a different simulator (Stage [5]). A full comparison between the two platforms is provided in
Section 3.4, Table 1. In addition, the achievement of platform transfer between epucks and Pioneers is of practical value since Pioneer behaviours cannot be evolved directly on the Stage simulator within a realistic time frame; Stage is not fast or accurate enough, and control systems used in the Webots programming environment are not directly transferable to real Pioneers. Moreover, simulation of the epuck in Webots requires a 3D model that is readily available, so it is computationally much cheaper to reuse the epuck’s evolved behaviours in Stage rather than to design a complex 3D Pioneer model for Webots.
3 System Architecture

3.1 Artificial Immune Systems and the Behavioural Encoding
AISs mimic the properties of the vertebrate immune system (for example antibody recognition of and stimulation by antigens). Idiotypic systems in particular exploit Jerne's notion of an idiotypic network [7], where antibodies are capable of recognizing and being stimulated or suppressed by other antibodies via their paratopes and idiotopes, see [15] for further details. Idiotypic AIS algorithms are often based on Farmer et al.'s computational model [1] of Jerne's theory and are characterized by a decentralized behaviour-selection mechanism. They are a popular choice for STL in robotic control systems [8] since they allow much greater flexibility for determining a robot's actions. The LTL and STL aspects of the control system presented here thus work together to produce adaptability; diversity of the behaviour sets is provided by the evolutionary (LTL) component, and the idiotypic network (STL) exploits the wide range of choice available to select behaviours appropriate for a given environmental scenario.

The AIS analogy is that antigens model the environmental information as perceived by the sensors and antibodies model the behaviours of the robot. Here, eight antigens (coded 1-8) are identified based on the robot's possession of distance-measuring sensors (IR or sonar) and a camera for tracking coloured objects. These are: 1 - target unseen, 2 - target seen, 3 - obstacle right, 4 - obstacle rear, 5 - obstacle left, 6 - collision right, 7 - collision rear, 8 - collision left. In addition, six types of basic behaviour (coded 1-6) are used: 1 - wandering using either a left or right turn, 2 - wandering using both left and right turns, 3 - turning forwards, 4 - turning on the spot, 5 - turning backwards, and 6 - tracking targets. More detailed individual behaviours are thus described using the attribute type T, which refers to the basic behaviour code, and the additional attributes speed S in epuck speed units per second (ψ per second), frequency of turn F (% of time), angle of turn A (% reduction in one wheel speed), direction of turn D (either 1 - left or 2 - right), frequency of right turn Rf (% of time) and angle of right turn Ra (% reduction in right wheel speed). This structure means that, potentially, a vast number of diverse behaviours can be created. However, there are limits to the attribute values [17]; these are carefully selected in order to strike a balance between reducing the size of the search space, which increases speed of convergence, and maintaining diversity. More details on the behavioural encoding are provided in Section 3.4.
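As an illustration of how these attributes can drive the two wheels, the following minimal Python sketch implements one control step of the both-directions wandering behaviour (type 2), following the worked example given in Section 3.4; the per-time-step random sampling of turns is an assumption about the underlying implementation.

```python
import random

def wander_step(S, F, A, Rf, Ra):
    # Returns (left, right) wheel speeds in epuck speed units per second.
    left = right = float(S)
    if random.random() < F / 100.0:          # turn for F% of the time
        if random.random() < Rf / 100.0:     # of which Rf% are right turns
            right = S * (1.0 - Ra / 100.0)   # slow the right wheel by Ra%
        else:
            left = S * (1.0 - A / 100.0)     # slow the left wheel by A%
    return left, right
```

For type 1 behaviours (wandering with a single turn direction), the attribute D would select which wheel is slowed instead.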
3.2 LTL Phase
The LTL phase is a reinforcement learning-assisted genetic algorithm (GA) that evolves a suitable behaviour for each antigen. It works by selecting two different parent robots via the roulette-wheel method and determining behaviour attribute values for their offspring as described in [17]. The reinforcement learning component constantly assesses the performance of the behaviours during evolution so that poorly-matched ones are replaced with newly-created ones when the need arises, which accelerates the GA. All the test problems are assessed by measuring task completion time t_i and number of collisions c_i, i = 1, ..., x; thus, the relative fitness μ_i of each member of the population is calculated using:

μ_i = (t_i + ρc_i)^{-1} / Σ_{k=1}^{x} (t_k + ρc_k)^{-1} ,    (1)
where ρ represents the weighting given to collisions (ρ = 1 here) and x is the number of robots in the population. After convergence, the fittest robot in the final population is selected. However, since the idiotypic network requires a number n of distinct behaviours for each antigen, the whole process is repeated n times in order to obtain n robots from separate populations that never interbreed. This is an alternative to selecting a number of robots from a single, final population and means that greater diversity can be achieved with smaller population sizes without reducing speed of convergence. The attribute values representing the behaviours of the n robots and their final reinforcement scores are saved as a genetic sequence (a simple text file) for seeding the AIS system.

3.3 STL Phase
The AIS system reads the genetic sequence generated by the LTL phase and then calculates the relative fitness μ_i of each behaviour (or antibody) set using (1), where ρ = 8 to increase the weighting given to the number of collisions. It then produces an n×8 matrix P (analogous to an antibody paratope) representing the reinforcement scores, or degree of match between antibodies and antigens. The elements of this matrix (P_ij, i = 1, ..., n, j = 1, ..., 8) are calculated by multiplying each antibody's final reinforcement score by the relative fitness μ_i of its set. An n×8 matrix I (analogous to an antibody idiotope) is also created by assigning a value of 1.0 to the element corresponding to the minimum P_ij for each j, and a value of 0.0 to all other elements. The matrix P is adjusted after every iteration through reinforcement learning, but I remains fixed throughout. If an idiotypic network is not used, then P alone governs antibody selection; the antigen-matching antibody with the highest reinforcement score is used. If idiotypic stimulation and suppression of antibodies are taken into account, then I is used to adjust the degree of match of each antibody as described in [16], which may result in a different one being selected.
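As an illustration of this construction, the sketch below computes the relative fitness of each behaviour set with Equation (1) (here with ρ = 8) and derives the paratope matrix P and idiotope matrix I. The matrix names follow the text, but the sample scores and run data are invented.

import numpy as np

def relative_fitness(t, c, rho=8.0):
    # Eq. (1): normalized inverse cost, so faster, safer sets score higher.
    inv = 1.0 / (t + rho * c)
    return inv / inv.sum()

# Hypothetical final reinforcement scores for n = 3 sets and 8 antigens.
scores = np.array([[4., 7., 5., 6., 3., 8., 2., 5.],
                   [6., 2., 7., 4., 5., 3., 6., 4.],
                   [5., 5., 4., 7., 6., 4., 5., 6.]])
t = np.array([120.0, 150.0, 135.0])  # task completion times of the n sets
c = np.array([2.0, 4.0, 3.0])        # collision counts of the n sets

mu = relative_fitness(t, c)          # relative fitness of each set
P = scores * mu[:, None]             # paratope: score scaled by set fitness
I = np.zeros_like(P)                 # idiotope: 1.0 at each column minimum
I[P.argmin(axis=0), np.arange(P.shape[1])] = 1.0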
3.4 Mechanisms of Cross Platform Transfer
Following convergence of the GA, the selected behaviours are encoded as nine integers in a simple text file that contains all the genetic information necessary to reproduce them. The first integer represents the antigen code, and the next seven represent the behavioural attributes T, S, F, A, D, Rf and Ra. The last integer is the final reinforcement score attained by the behaviour prior to convergence. The genetic sequence encodes the principal wheel speeds in epuck speed units per second (ψ per second), where ψ = 0.00683 radians. A speed value of 600 ψ per second thus corresponds to 600 × 0.00683 = 4.098 radians per second. An example line from a genetic text file is: 0 2 537 80 51 2 37 76 50, which encodes wandering in both directions with a speed of 537 ψ per second, turning 80% of the time. The robot turns right 37% of this time by reducing the speed of the right wheel by 76%, and turns left 63% of this time by reducing the speed of the left wheel by 51%. A particular genetic sequence thus governs how the left and right wheel speeds change with time. In theory, the behavioural encoding may be extended to any two-wheeled, non-holonomic mobile robot, since the wheel motions of such robots are fully described by their changing speeds. Furthermore, since the output from the LTL phase is a simple text file, any program is capable of reading it and extracting the information necessary to form the wheel motions. Moreover, specification of the speeds in radians per second permits automatic scaling between different-sized environments, without requiring knowledge of the particular scales involved, since wheel size is generally related to the scale of the environment. However, it is also necessary to consider some fundamental hardware and software differences between the Pioneer P3-DX and epuck robots when making the transfer. Table 1 below shows the technical specification for each robot type. The most fundamental considerations are the larger scale of the Pioneer, the use of different programming environments, the use of sonar sensors on the Pioneer, and the spatial arrangement of these sensors. These affect the transfer in three main ways: how velocity is expressed, how the sensors are read and translated into antigen codes, and how blob finding is implemented. Use of the genetic sequence coupled with a simple conversion of ψ per second to radians per second, as described above, would be adequate to cater for the scaling differences if the two platforms did not use different APIs. However, the epuck is programmed using the Webots C/C++ Controller API, where robot wheel speeds are set using the differential_wheels_set_speed method, which requires the left and right wheel speeds in ψ per second as its arguments. In contrast, the Pioneer robot is programmed using libplayerc++, a C++ client library for the Player server. In this library, the angular and linear components of the robot's velocity are set separately using yaw ω and velocity v arguments for the SetSpeed method of the Position2dProxy class, and v is expressed in metres per second. As methods for encoding the genetic sequence into left L and right R epuck wheel speeds already exist, it is computationally cheaper to reuse these methods on the Pioneer and simply convert them into equivalent ω and v arguments. The conversions are given by:
Table 1. Differences between the Pioneer and Epuck Robotic Platforms

No.  Attribute                    Pioneer P3-DX       Epuck
1    Manufacturer                 MobileRobots Inc    EPFL
2    Simulator                    Stage               Webots
3    Middleware                   Player              Webots
4    Operating system             Linux               N/A
5    Communications protocol      Wireless TCP/IP     Bluetooth
6    Wheel radius (cm)            9.50                2.05
7    Wheel width (cm)             5.00                0.20
8    Axle length (cm)             33.00               5.20
9    Body material                Aluminium           Plastic
10   Body length (cm)             44                  7
11   Body width (cm)              38                  7
12   Body height (cm)             22                  4.8
13   Weight (kg)                  9                   0.15
14   Body shape                   Octagonal           Circular
15   Sensor type                  Sonar               Infrared
16   No. of sensors               16                  8
17   Sensor range                 15cm to 5m          0 to 6cm
18   Camera                       Canon VC-C4         VGA
19   Blob finding software        Player              Weblobs
v = ψ r_p (R + L) / 2 ,    (2)

ω = ζ ψ r_e (R − L) / a_e ,    (3)
where r_p is the radius of the Pioneer wheel, r_e is the radius of the epuck wheel, and a_e is the axle length of the epuck. The parameter ζ = 1.575 is determined by empirical observation and is introduced in order to replicate the angular movement of the epuck more accurately. The antigens indexed 3 to 8 describe an obstacle's orientation with respect to the robot (right, left or rear) and classify its distance from the robot as either "obstacle" (avoidance is needed) or "collision" (escape is needed). Thus, two threshold values τ1 and τ2 are required to mark the boundaries between "no obstacle" and "obstacle" and between "obstacle" and "collision" respectively. The epuck's IR sensors are nonlinear and correspond to the quantity of reflected light, so higher readings mean closer obstacles. In contrast, the Pioneer's sonar readings are linear, denoting the estimated proximity of an obstacle in metres, so lower readings mean closer obstacles. Since direct conversion is difficult, the threshold values τ1 and τ2 (250 and 2400 for the epuck) are determined for the Pioneer by empirical observation of navigation through cluttered environments (τ1 = 0.15 and τ2 = 0.04). Additionally, in order to determine the orientation of any detected obstacle, the epuck uses the index of the maximum IR reading,
where indices 0, 1 and 2 correspond to the right, 3 and 4 correspond to the rear and 5, 6 and 7 correspond to the left. For the Pioneer it is necessary to use the index of the minimum sonar reading and encode positions 4 to 9 as the right, 10 to 13 as the rear and positions 0 to 3 and 14 to 15 as the left, due to the different spatial arrangement of the sensors. Blob finding software (named Weblobs) was developed for the epuck as part of this research, since the Webots C/C++ Controller API has no native blob finding methods. However, the Pioneer robot is able to use methods belonging to the BlobfinderProxy class of libplayerc++. The objective is to determine whether blobs (of the target colour) are visible, and if so, to establish the direction (left, centre or right) of the largest from the centre of the field of view. The two robot types thus use different blob finding software, but collect the same information.
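The two platform-specific conversions just described can be sketched as follows; the numeric constants come from Table 1 (converted to metres) and the text, while the function names are ours.

PSI = 0.00683        # one epuck speed unit, in radians
R_PIONEER = 0.095    # Pioneer wheel radius in metres (Table 1)
R_EPUCK = 0.0205     # epuck wheel radius in metres (Table 1)
AXLE_EPUCK = 0.052   # epuck axle length in metres (Table 1)
ZETA = 1.575         # empirical correction factor from the text

def pioneer_speed(left, right):
    """Convert epuck wheel speeds (psi/s) to Pioneer (v, omega), Eqs. (2)-(3)."""
    v = PSI * R_PIONEER * (right + left) / 2.0
    omega = ZETA * PSI * R_EPUCK * (right - left) / AXLE_EPUCK
    return v, omega

def pioneer_obstacle_orientation(sonar):
    """Orientation of the closest obstacle from 16 sonar readings in metres."""
    idx = sonar.index(min(sonar))  # lowest sonar reading = closest obstacle
    if 4 <= idx <= 9:
        return "right"
    if 10 <= idx <= 13:
        return "rear"
    return "left"                  # indices 0-3 and 14-15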
3.5 Modular Control Structure
The entire STL program is broken down into the pseudocode below in order to demonstrate its modular structure and the ease with which this facilitates adaptation for the Pioneer P3-DX platform. Each block shows the module it calls and the method it uses within that module. Blocks marked with an asterisk are dealt with in the main body of the program and do not call other modules.

1   Initialize robot         (ROBOT --> InitializeRobot() --> InitializeSensors())
2   Read genetic sequence *
3   Build matrices P and I *
    REPEAT
4       Read sensors         (ROBOT --> ReadSensors())
5       Read camera          (BLOBFINDER --> GetBlobInfo())
6       Determine antigen code *
7       Score previous behaviour using reinforcement learning *
8       Update P *
9       Select behaviour *
10      Update antibody concentrations *
11      Execute behaviour    (BEHAVIOUR --> Execute())
    UNTIL stopping criteria met
The only blocks that require changes for the Pioneer platform are 1, 4, and 5. Since these are dealt with by calling other modules, the main body of the program can be wholly reused, although an additional two lines are necessary in block 11 to convert the wheel speeds to the Player format. Some slight changes are also required to block 7, which uses sensor data to determine the reinforcement score. A sketch of this modular structure is given below.
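One way to picture this modularity (our sketch; none of these names come from the authors' code) is a shared main loop that delegates blocks 1, 4, 5 and 11 to a platform object:

def stl_loop(platform, decode_antigen, score_previous, select_behaviour, stop):
    """Shared main body of the STL program; only `platform` changes per robot."""
    platform.initialize()                          # block 1 (platform-specific)
    previous = None
    while not stop():
        sensors = platform.read_sensors()          # block 4 (platform-specific)
        blobs = platform.read_camera()             # block 5 (platform-specific)
        antigen = decode_antigen(sensors, blobs)   # block 6 (shared)
        score_previous(previous, antigen)          # blocks 7-8 (shared)
        previous = select_behaviour(antigen)       # blocks 9-10 (shared)
        platform.execute(previous)                 # block 11 (+2 lines on Pioneer)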
4 Test Environments and Experimental Set-Up
The genetic behaviour sequences are evolved using 3D virtual epucks in the Webots simulator, where the robot is required to track blue markers in order to navigate through a number of rooms to the finish line; see [17]. Throughout evolution, five separate populations of ten robots are used and the mutation rate is set at 5%, as recommended by [17] for a good balance between maximizing diversity and minimizing convergence time. Following the LTL phase, the evolved
behaviour sequences are used with 2D virtual Pioneer robots in three different Stage worlds, S1, S2, and S3. S1 and S2 require maze navigation and the tracking of coloured door markers (Figure 1 and Figure 2), and S3 involves search and retrieval of a blue block whilst navigating around other obstacles (Figure 3). Sixty runs are performed in each Stage world, thirty using the idiotypic selection mechanism, and thirty relying on reinforcement learning only. In addition, in S3 the obstacle positions, target block location, and robot start point are changed following each idiotypic and nonidiotypic test, so that the data is paired. For all runs, the task time t and number of collisions c are recorded. However, a fast robot that continually crashes or a careful robot that takes too long to complete the task is undesirable, so an additional score metric ϕ that combines t and c is computed for each run. This is given by:

ϕ = (t + σ_i c) / 2 ,    (4)
where σ_i is the ratio of the mean task time t̄ to the mean number of collisions c̄ for world S_i. In all worlds, t̄, c̄ and ϕ̄ are computed with and without using idiotypic effects, and the results are compared using a 2-tailed t-test (paired for world S3), with differences accepted as significant at the 99% level only. As another measure of task performance, runs with an above average ϕ for each world are counted as good and those with fitness in the bottom 10% of all runs in each world are counted as bad. Additionally, for each task, robots taking longer than 900 seconds are counted as having failed and are stopped.
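A sketch of how σ_i and the score ϕ of Equation (4) might be computed from the recorded runs of one world; the function name and sample data are invented.

def world_scores(times, collisions):
    """Score phi from Eq. (4) for every run in one world."""
    t_mean = sum(times) / len(times)
    c_mean = sum(collisions) / len(collisions)
    sigma = t_mean / c_mean             # ratio of mean time to mean collisions
    return [(t + sigma * c) / 2.0 for t, c in zip(times, collisions)]

times = [150.0, 210.0, 900.0, 175.0]    # hypothetical task times (s); 900 s = fail
collisions = [2, 1, 5, 3]
phi = world_scores(times, collisions)
fails = sum(t >= 900.0 for t in times)  # runs stopped at the 900-second limit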
Fig. 1. 2D Stage world S1 used in the Pioneer STL phase
5 Results and Discussion
Fig. 2. 2D Stage world S2 used in the Pioneer STL phase

Table 2 shows t̄, c̄ and ϕ̄ values with and without using idiotypic effects in each world, and the significant difference levels when these are compared. It also displays the percentage of good and bad runs and the number of fails in each world. When idiotypic effects are employed, the virtual Pioneer robots prove able to navigate safely (the mean number of collisions is very low) and solve their tasks within the allotted time in all of the worlds. Navigation is also safe for the nonidiotypic robots but, in terms of task time, in worlds S1 and S2 there is a 17% failure rate and in world S3 there is a 7% failure rate. Furthermore, mean task time is significantly higher than for the idiotypic case in all worlds, although the number of collisions is consistently low and not significantly different between the idiotypic and nonidiotypic cases. In addition, the number of bad runs is higher and the number of good runs is lower for nonidiotypic robots in all worlds, and the score is significantly better when idiotypic effects are employed in worlds S1 and S2. These observations provide strong empirical evidence that the behaviours evolved in simulation on an epuck robot can be successfully ported to the virtual Pioneer P3-DX platform, provided that the adaptive idiotypic mechanism is applied within the STL architecture. As with the STL results for virtual and real epucks (documented in [16]), the results show that the evolutionary (LTL) phase is capable of producing sets of very diverse behaviours, but that the STL phase requires the use of a scheme that selects from the available behaviours in a highly adaptive way. This is further illustrated in Figure 3, which shows the paths taken by an idiotypic (left) and nonidiotypic (right) Pioneer when solving the block-finding problem in world S3. It is evident that the nonidiotypic Pioneer takes a much less direct route and repeats its path several times. This is because it is less able to adapt its behaviour and consequently spends much more time wandering, getting stuck and trying to free itself. This result is typical of the S3 experiments. The chosen architecture has a number of benefits. The reinforcement-assisted GA effectively balances the exploitative properties of reinforcement and the explorative properties of the GA. This reduces convergence time, improves accuracy and maintains GA reliability. The genetic encoding of the behaviours and the choice of separate populations permits greater diversity for the antibodies, which allows for a much more adaptive strategy in the STL phase. Earlier work [15] suggests that the idiotypic advantage can be attributed to an increased rate of antibody change, which implies a much less greedy strategy. It also proposes that the network is capable of linking antibodies of similar type, so that useful but potentially untried ones can be used. The use of concentrations and feedback within the network may also facilitate a memory feature that achieves a good balance between selection based on past antibody use and current environmental information. However, the present scheme has some limitations; there is no scope to change the antibodies within the network, only to choose between
them. A possible improvement would be constant execution of the LTL phase, which would regularly update the genetic sequence, allowing fresh antibodies to be used if the need arises. In addition, success with transference to other platforms is presently too heavily dependent upon parameter tuning and readjustment of the reinforcement scheme for the particular sensor characteristics.

Table 2. Results of Experiments with and without Idiotypic Effects in Each World. G = % Good, B = % Bad, F = % Fail.

World  Significance          Idiotypic                          Nonidiotypic
       t̄     c̄     ϕ̄      t̄(s)   c̄    ϕ̄    G    B    F     t̄(s)   c̄    ϕ̄    G    B    F
S1     100    96   100       176    2   166   70    3    0      336    4   346   47   30   17
S2     100    98   100       309    2   287   27   10    0      513    5   535   13   60   17
S3     100    56    97       160    2   233   43    7    0      395    1   322   37   37    7
Fig. 3. World S3 showing the trail of an idiotypic (left) and nonidiotypic (right) robot
6 Conclusions
This paper has described a mobile robot control architecture that consists of an LTL (evolutionary) phase, responsible for the generation of sets of diverse behaviours, and an STL (immune system) phase, which selects from the available behaviours in an adaptive way. It has shown that the behaviours are essentially platform independent and that they can be evolved in simulation on a miniature epuck robot and used on a much larger virtual Pioneer P3-DX robot. The platform transfer is equivalent to a complex and difficult environmental change and is thus a sound test of adaptability and scalability for the combined LTL-STL architecture. Tests in different environments have shown that the Pioneer is able to accomplish navigation, obstacle avoidance and retrieval tasks using the epuck behaviours, and that on average it performs significantly faster when employing the idiotypic mechanism, since idiotypic behaviour selection is much more adaptable than reinforcement learning alone. The next step is testing with real Pioneer P3-DX robots to establish whether similar levels of success can also be achieved in the real domain.
References

1. Farmer, J.D., Packard, N.H., Perelson, A.S.: The Immune System, Adaptation, and Machine Learning. Physica D 2(1-3), 187–204 (1986)
2. Floreano, D., Mondada, F.: Evolutionary Neurocontrollers for Autonomous Mobile Robots. Neural Networks 11(7-8), 1461–1478 (1998)
3. Floreano, D., Mondada, F.: Hardware Solutions for Evolutionary Robotics. In: Proceedings of the First European Workshop on Evolutionary Robotics, pp. 137–151. Springer, London (1998)
4. Floreano, D., Urzelai, J.: Evolutionary Robots with On-line Self-organization and Behavioural Fitness. Neural Networks 13, 431–443 (2000)
5. Gerkey, B., Vaughan, R., Howard, A.: The Player/Stage Project: Tools for Multi-Robot and Distributed Sensor Systems. In: Proceedings of the International Conference on Advanced Robotics (ICAR 2003), Coimbra, Portugal, pp. 317–323 (2003)
6. Goosen, T., van den Brule, R., Janssen, J., Haselager, P.: Interleaving Simulated and Physical Environments Improves Evolution of Robot Control Structures. In: Proceedings of the 19th Belgium-Netherlands Conference on Artificial Intelligence (BNAIC), pp. 135–142. Utrecht University Press (2007)
7. Jerne, N.K.: Towards a Network Theory of the Immune System. Annales d'Immunologie 125C(1-2), 373–389 (1974)
8. Krautmacher, M., Dilger, W.: AIS Based Robot Navigation in a Rescue Scenario. In: Nicosia, G., Cutello, V., Bentley, P.J., Timmis, J. (eds.) ICARIS 2004. LNCS, vol. 3239, pp. 106–118. Springer, Heidelberg (2004)
9. Lungarella, M., Sporns, O.: Mapping Information Flow in Sensorimotor Networks. PLoS Computational Biology 2(1), 1301–1312 (2006)
10. Michel, O.: Cyberbotics Ltd – Webots: Professional Mobile Robot Simulation. International Journal of Advanced Robotic Systems 1(1), 39–42 (2004)
11. Nolfi, S., Floreano, D.: Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines, 1st edn. MIT Press, Cambridge (2000)
12. Urzelai, J., Floreano, D.: Evolutionary Robotics: Coping with Environmental Change. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2000), pp. 941–948. Morgan Kaufmann, San Francisco (2000)
13. Utz, H., Sablatnog, S., Enderle, S., Kraetzschmar, G.: Miro – Middleware for Mobile Robot Applications. IEEE Transactions on Robotics and Automation 18(4), 493–497 (2002)
14. Walker, J.H., Garrett, S.M., Wilson, M.S.: The Balance Between Initial Training and Lifelong Adaptation in Evolving Robot Controllers. IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics 36(2), 423–432 (2006)
15. Whitbrook, A.M., Aickelin, U., Garibaldi, J.M.: Idiotypic Immune Networks in Mobile Robot Control. IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics 37(6), 1581–1598 (2007)
16. Whitbrook, A.M., Aickelin, U., Garibaldi, J.M.: An Idiotypic Immune Network as a Short-Term Learning Architecture for Mobile Robots. In: Bentley, P.J., Lee, D., Jung, S. (eds.) ICARIS 2008. LNCS, vol. 5132, pp. 266–278. Springer, Heidelberg (2008)
17. Whitbrook, A.M., Aickelin, U., Garibaldi, J.M.: Genetic Algorithm Seeding of Idiotypic Networks for Mobile-Robot Navigation. In: Proceedings of the 5th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2008), Madeira, Portugal, pp. 5–13 (2008)
An Analysis of Algorithmic Components for Multiobjective Ant Colony Optimization: A Case Study on the Biobjective TSP

Manuel López-Ibáñez and Thomas Stützle

IRIDIA, CoDE, Université Libre de Bruxelles, Brussels, Belgium
[email protected],
[email protected]
Abstract. In many practical problems, several conflicting criteria exist for evaluating solutions. In recent years, strong research efforts have been made to develop efficient algorithmic techniques for tackling such multiobjective optimization problems. Many of these algorithms are extensions of well-known metaheuristics. In particular, over the last few years, several extensions of ant colony optimization (ACO) algorithms have been proposed for solving multi-objective problems. These extensions often propose multiple answers to algorithmic design questions arising in a multi-objective ACO approach. However, the benefits of each one of these answers are rarely examined against alternative approaches. This article reports results of an empirical research effort aimed at analyzing the components of ACO algorithms for tackling multi-objective combinatorial problems. We use the bi-objective travelling salesman problem as a case study of the effect of algorithmic components and their possible interactions on performance. Examples of design choices are the use of local search, the use of one versus several pheromone matrices, and the use of one or several ant colonies. Keywords: Multiobjective Optimization, Ant Colony Optimization, Travelling Salesman Problem.
1 Introduction
Ant colony optimization (ACO) [6] is a general-purpose stochastic local search (SLS) method [9] inspired by the pheromone trail laying and following behavior of some real ant species. The main application area of ACO is NP-hard combinatorial optimization problems. Due to the practical relevance of this class of problems and the high performance reached by ACO algorithms, the number of applications of ACO algorithms has risen strongly over the recent years [6]. In many practically relevant problems, the quality of candidate solutions is evaluated with respect to various, often conflicting objectives. It is therefore not surprising that several extensions of ACO algorithms have been proposed to tackle such multi-objective combinatorial optimization problems (MCOPs). Two recent reviews [3,7] describe a few tens of papers that deal with multi-objective ACO (MOACO) algorithms for MCOPs defined in terms of Pareto optimality.
Nevertheless, surprisingly little research on MOACO is targeted towards understanding the contribution to performance of the specific design choices made in these algorithms. Such design choices concern the definition of pheromone and heuristic information, the aggregation of specific information for each objective, different ways of selecting solutions for the pheromone update, the use of multiple ant colonies to specialize on specific areas of the Pareto front, and so on. In fact, most articles propose one specific way of tackling MCOPs by an ACO algorithm [5,4]; rare are comparisons of several MOACO algorithms [7] or studies that compare few design alternatives [1,2]. In fact, we believe that our previous article on an experimental analysis of MOACO algorithms for the biobjective quadratic assignment problem (BQAP) [12] is one of the most complete concerning the systematic experimental comparison of specific design alternatives of MOACO algorithms. In this article, we build and extend upon our earlier work in the area of the experimental analysis of MOACO algorithms. As in our previous work, we use a component-wise view of the design of MOACO algorithms, but here we extend the analysis in various ways. First, we use more advanced tools for the empirical analysis of the behavior of multi-objective optimizers. In particular, we base our analysis on the attainment function methodology [8], and we use graphical illustrations to examine where in the objective space the various MOACO algorithms differ in performance [11,13]. This gives a detailed view of the impact of specific MOACO components on performance. Second, we tackle another problem: the bi-objective traveling salesman problem (BTSP). The BTSP differs very strongly from the BQAP in search space characteristics [16], and the solution evaluation in the BTSP is much faster than in the BQAP. We also apply to the BTSP the usual algorithmic speed-up techniques for the single-objective TSP such as candidate lists [14]. Third, some of the algorithm details studied have not been considered previously. In fact, the results we present here are part of a larger research effort that aims at a detailed analysis of the impact of MOACO components on algorithm performance. The article is structured as follows. In Section 2 we give basic definitions and introduce the BTSP. Section 3 reviews concisely available ACO algorithms for MCOPs tackled in the Pareto sense and introduces the algorithmic components we study. We detail the experimental setup in Section 4 and present results in Section 5. We end with some concluding remarks in Section 6.
2 Multiobjective Combinatorial Optimization
In this article, we focus on MCOPs, where candidate solutions are evaluated by an objective function vector f = (f_1, ..., f_d) with d objectives. MCOPs are often solved without a priori assumptions on the preferences of the decision maker. In such a case, the goal is to determine a set of feasible solutions that "minimizes" the objective function vector f according to the Pareto optimality criteria. Let u and v be vectors in R^d. We say that u dominates v (u ≺ v) iff u ≠ v and u_i ≤ v_i, i = 1, ..., d. Furthermore, u and v are nondominated iff u ⊀ v and
v ⊀ u. To simplify the notation, we also say that a feasible solution s dominates another solution s′ iff f(s) ≺ f(s′). A solution s is a Pareto optimum iff no other feasible solution s′ exists such that f(s′) ≺ f(s). The goal in MCOPs then typically is to determine the set of all Pareto-optimal solutions. However, this task is often computationally intractable, and therefore it is usually preferable to approximate the Pareto set as well as possible in a given amount of time. Such an approximation is always a set of solutions that are mutually nondominated. In this paper, we deal with the BTSP, which is a direct extension of the widely studied single-objective TSP with two cost values between each pair of distinct cities. A BTSP instance is defined by a complete graph G = (V, A) with n = |V| nodes {v_1, ..., v_n}, and a set of arcs A that fully connects the graph. Each arc has an associated cost vector whose components are c_1(v_i, v_j) and c_2(v_i, v_j), i ≠ j. Here we assume that instances are symmetric, that is, c_q(v_i, v_j) = c_q(v_j, v_i), i ≠ j, q = 1, 2. The goal in the BTSP is to find the set of Hamiltonian tours p = (p_1, ..., p_n) "minimizing" in terms of Pareto optimality the total tour cost, which is given by

f_q(p) = c_q(v_{p(n)}, v_{p(1)}) + Σ_{i=1}^{n-1} c_q(v_{p(i)}, v_{p(i+1)}) ,    q = 1, 2.
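The tour cost above translates directly into code. A minimal Python sketch for symmetric cost matrices follows; the function name and the toy instance are ours.

def tour_costs(tour, c1, c2):
    """Bi-objective cost (f1, f2) of a tour given as a permutation of range(n)."""
    f1 = f2 = 0
    n = len(tour)
    for i in range(n):
        a, b = tour[i], tour[(i + 1) % n]  # the modulo closes the Hamiltonian cycle
        f1 += c1[a][b]
        f2 += c2[a][b]
    return f1, f2

# Tiny 4-city instance with two arbitrary symmetric cost matrices.
c1 = [[0, 2, 9, 4], [2, 0, 6, 3], [9, 6, 0, 8], [4, 3, 8, 0]]
c2 = [[0, 7, 1, 5], [7, 0, 4, 2], [1, 4, 0, 6], [5, 2, 6, 0]]
print(tour_costs([0, 1, 2, 3], c1, c2))  # -> (20, 22)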
3 Multi-Objective Ant Colony Optimization
The first extensions of ACO for MCOPs were proposed at the end of the 90s. While various early approaches targeted problems where the objectives can be ordered lexicographically or preferences of a decision maker are known, recent overviews [3,7] show that most proposals of MOACO algorithms tackle multi-objective problems in terms of Pareto optimization.

3.1 Available MOACO Algorithms
All proposed MOACO algorithms somehow try to modify underlying ACO components so as to allow the algorithm to direct the search towards the different regions of the Pareto front simultaneously. A main question is how to represent the solution components of different areas of the front in the form of pheromones. Essentially, the alternatives are to use one pheromone matrix [4,10] or several pheromone matrices [5,1,10]. In the latter case, typically each objective has one associated pheromone matrix. Related to this decision is the choice of the ants that deposit pheromone. Typically, if one pheromone matrix is used, some or all nondominated solutions are selected for update [10], whereas when using several matrices, some elitist choices w.r.t. the objectives represented by the pheromone matrices are made [5,1]. During the solution construction, multiple pheromone or heuristic matrices are commonly combined by means of weights [5,10]. Most papers concerning MOACO approaches do not study the various alternative algorithmic components, and, hence, little insight about the impact of specific choices is available. The following articles are, to some extent, exceptions.
Iredi et al. [10] explored a few design options regarding multiple colonies for a specific scheduling problem. Alaya et al. [1] studied four combinations of the number of pheromone matrices and the number of colonies for a multi-objective knapsack problem. The review article by García-Martínez et al. [7] experimentally compared various complete MOACO algorithms, rather than individual algorithmic components. They used the BTSP as a benchmark problem, but they did not apply usual algorithmic techniques such as candidate lists or local search. We therefore decided to start a significant research effort to analyze in more detail the impact various MOACO algorithm components may have on performance, extending strongly beyond our initial efforts [12].

3.2 Studied Algorithm Components
Based on our earlier experience with MOACO algorithms and the various proposals made in the literature, we decided to study the following main algorithmic components for composing MOACO algorithms for the BTSP. Note that for some components, only specific levels of other components make sense. This is indicated below where necessary.

Local Search. It is well known that local search has a strong influence on performance for single-objective ACO algorithms [6]. Hence, we study the impact of local search for the BTSP by means of an iterative improvement algorithm based on the 2-exchange neighborhood. Our local search algorithm uses a weighted sum combination of the two objectives and exploits standard TSP speed-up techniques (candidate sets are reordered for each weight vector).

Pheromone Information. We use two levels for this component: either one pheromone matrix or two pheromone matrices. If one pheromone matrix is used, the solution construction exploits this pheromone matrix as usual in single-objective ACO algorithms. However, two heuristic values η^1_{ij}, η^2_{ij} are available for each arc (i, j), one for each distance matrix. We decided to aggregate them by giving equal weight to each: η_{ij} = 0.5·η^1_{ij} + 0.5·η^2_{ij}. When multiple pheromone matrices are utilized, each matrix is associated to one objective, and they are aggregated by means of weights. The MOACO literature contains examples of both linear and non-linear aggregations. Here, we use a linear aggregation, since in our underlying ACO algorithm we ensure that the pheromones are in the same range. Hence, the probability of choosing city j directly after city i for ant k is given by:

p^k_{ij} = [((1 − λ_k) τ^1_{ij} + λ_k τ^2_{ij})^α · ((1 − λ_k) η^1_{ij} + λ_k η^2_{ij})^β] / [Σ_{l ∈ N^k_i} ((1 − λ_k) τ^1_{il} + λ_k τ^2_{il})^α · ((1 − λ_k) η^1_{il} + λ_k η^2_{il})^β]    if j ∈ N^k_i,
where N^k_i is the feasible neighborhood of ant k, that is, those cities not visited yet by ant k, τ^1_{ij}, τ^2_{ij} are the pheromone trails for arc (i, j) for either objective, η^1_{ij}, η^2_{ij} are the corresponding heuristic values, and λ_k is a weight.

Weight Setting Strategies. Whenever weights are used, either for aggregating multiple pheromone matrices or for performing a single-objective local search
over a scalarization of the multi-objective problem, several strategies can be applied for setting the weights used at each iteration of the algorithm. A possible strategy is to define as many weights as ants. Then, each ant may be assigned a different weight from the other ants at each iteration. Iredi et al. [10] proposed this approach. Alternatively, we can assign to all ants the same weight but modify the weight slightly at each iteration. This latter strategy is more similar to the approach followed by two-phase local search [15]. If an ant is assigned a weight vector, the subsequent local search will use the same weight vector.

Pheromone Deposit. This algorithmic component is mostly defined by the previous choice of pheromone information. In the case of one pheromone matrix, typically the nondominated solutions are allowed to deposit pheromones. In the case of multiple pheromone matrices, some elitist strategy is usually followed. Here, we use the method of updating each pheromone matrix with the best solution for each objective. As for whether the iteration-best or global-best solutions are used for updating, we follow directly the strategy of the underlying ACO algorithm, MAX–MIN Ant System (MMAS) [18].

Multiple Colonies. Multiple colonies are typically used to better direct the search towards specific areas of the Pareto front. Each colony has its own pheromone information, that is, either one or two pheromone matrices. The total number of ants is divided equally between the colonies. Solutions are placed in a common nondominated archive; hence, colonies influence each other by sharing information in order to discard dominated solutions. Then, the pheromone of each colony is updated with solutions generated by ants from the same colony (update by origin) [10]. The cooperation among the colonies can be made stronger by exchanging solutions. One such approach is update by region [10], where the common archive is divided into as many parts as colonies, and each part is used to update the pheromone information of a different colony. In the bi-objective case, this can be implemented by sorting the common Pareto front according to the first objective, and subdividing it into smaller fronts of equal size.
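To illustrate the weighted aggregation in the construction rule of Section 3.2, here is a sketch of the random-proportional choice for the two-matrix case; all names are ours and the implementation is deliberately unoptimized.

import random

def choose_next_city(i, feasible, tau1, tau2, eta1, eta2, lam, alpha=1.0, beta=2.0):
    """Pick the next city j for an ant with weight lam (see the equation above)."""
    weights = []
    for j in feasible:
        tau = (1 - lam) * tau1[i][j] + lam * tau2[i][j]  # aggregated pheromone
        eta = (1 - lam) * eta1[i][j] + lam * eta2[i][j]  # aggregated heuristic
        weights.append((tau ** alpha) * (eta ** beta))
    return random.choices(feasible, weights=weights, k=1)[0]

Note that with the TPLS-like strategy, lam is shared by all ants of an iteration, so the aggregated values can be pre-computed once per iteration rather than once per ant; this is the source of the speed difference discussed in Section 5.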
4 Experimental Setup
The multi-objective ACO algorithms are implemented in C based on ACOTSP [17], and compiled with gcc, version 3.4. Experiments are carried out on AMD Opteron 2216 dual-core processors running at 2.4 GHz with 2 MB L2-Cache and 4 GB RAM under Rocks Cluster GNU/Linux. Due to the sequential implementation of the code, only one core is used for running the executable. We wish to focus on the multi-objective components. Hence, we use a common underlying ACO algorithm, MMAS [18], for the management of pheromones (evaporation, pheromone trail limits, etc.). Basic parameters are set to α = 1, β = 2, Δτ = 1, τ_0 = τ_max, where Δτ is the amount of pheromone deposited by an ant and τ_0 is the initial value of the pheromone. The evaporation factor is ρ = 0.2 if 2-opt local search is used, otherwise it is ρ = 0.05. The total number of ants is m = 30. Since the use of candidate lists is essential for obtaining high-quality solutions for the single-objective TSP, we also use them for the BTSP
by sorting edges according to dominance ranking [14]. The size of the candidate list is 40. We consider the algorithmic components described in the previous section. When weights are needed, that is, when using multiple pheromone matrices or local search, we use m maximally dispersed weight vectors, where each component of the weight vectors is in the interval [0, 1] and the sum of the two components is equal to one. We test the two weight setting strategies explained above. These weight setting strategies are only relevant for components utilizing weights. In fact, they do not apply if only one pheromone matrix is used and no local search is done. We test different numbers of colonies ℓ ∈ {1, 3, 5}, with the ants equally divided among the colonies (m/ℓ). We tested both the update-by-origin and update-by-region strategies explained earlier. We found a slight advantage in favor of update by region, and, hence, we focus on the results obtained by this approach. Each experiment was run for a time limit of 300 seconds and repeated 25 times with different random seeds. We performed experiments on 3 Euclidean BTSP instances of 500 cities. Two were generated by taking a single-objective Euclidean instance and altering the coordinates of the cities by a correlated random noise to create a new distance matrix. This way, we generated a positively correlated instance, with a correlation between the distance matrices of 0.77 and an estimated correlation between the objectives of 0.81, measured by generating 10 000 random solutions. We also generated a zero-correlated instance with a correlation between the distance matrices of 0.01 and an estimated correlation between the objectives of −0.01. In addition, we tested the instance euclidAB500 with correlation 0.01 available from http://eden.dei.uc.pt/~paquete/tsp/. In contrast to previous results on the BQAP [12], the different correlation does not lead to large differences in the behavior of MOACO components. Hence, we discuss only results obtained for the zero-correlated instance generated by us, but the same overall behavior was observed for the other instances.
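For the bi-objective case, the m maximally dispersed weight vectors mentioned above reduce to evenly spaced points on the unit simplex; a small sketch (our naming):

def dispersed_weights(m):
    """m maximally dispersed weight vectors (lambda, 1 - lambda) in [0, 1]."""
    return [(k / (m - 1), 1.0 - k / (m - 1)) for k in range(m)]

print(dispersed_weights(5))
# [(0.0, 1.0), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1.0, 0.0)]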
5 Analysis of Experiments
Our goal is to identify fundamental differences between alternative components of multi-objective ACO algorithms, rather than obtaining the best configuration possible. For this purpose, we analyze the results by means of a graphical technique [11,13] based on the empirical attainment function (EAF) [8]. The attainment function gives the probability of a particular point in the objective space being attained by (dominated by or equal to) the outcome of a single run of an algorithm. This probability can be estimated from several runs of an algorithm in order to calculate the empirical attainment function (EAF) of the algorithm. The EAFs of two algorithms can be compared by calculating the difference in value of the EAFs for each point in the objective space. Differences in favor of one algorithm indicate that those points are more likely to be attained by that algorithm than by its competitor, and, hence, that the performance of that algorithm is better (in that region of the objective space)
than the performance of its competitor. We display this information graphically by plotting side-by-side the differences between the EAFs of two algorithms. An example is the plot in Fig. 3. Each side of the plot displays points in the objective space where the difference in the value of the two EAFs is in favour of the corresponding algorithm mentioned at the bottom. The points are colored depending on the size of the difference (points where the difference is smaller than 0.2 are not shown). For example, a black point at the left side of the plot means that the corresponding algorithm has attained that point in (at least) 80% more runs than the algorithm at the right side. The plots also display, as a continuous line, the grand best and the grand worst attainment surfaces, which respectively delimit the regions never attained by any algorithm and always attained by any run of the two algorithms. The median attainment surface of each algorithm, which delimits the region of the objective space attained by 50% of the runs, is also shown in each side of the plot as a dashed line.

Here we discuss the main effects of the components we have studied, in order of decreasing impact. The most important component in this sense is the use of local search. For the other components, we always give results with and without local search, since sometimes there are quite strong interactions observable. In the following, we present the results only for the BTSP. We applied the same algorithms to the BQAP using the same instances as in our previous work [12]; below we also comment shortly on the main findings for the BQAP, and indicate possible differences from the BTSP results.

Component: Local Search. The strongest difference is caused by the use of local search. The plots in Fig. 1 compare the same configuration of MOACO with and without local search. The results in the top plot were obtained by using one pheromone matrix, whereas the bottom plot shows results using two pheromone matrices. In both cases, all differences are in favour of the variants using local search, showing that local search clearly improves the results obtained by ACO. This was the result in all our tests. A similar conclusion was obtained for the BQAP [12]. If we focus on the median attainment surfaces (dashed lines) on the bottom plot, the version without local search obtains solutions that are quite close to those of the version with local search at the extremes of the Pareto front but very poor results in the center. This result suggests that handling the tradeoff between objectives is difficult for ACO when using two pheromone matrices. With only one pheromone matrix (top plot), the ACO algorithm tries to handle extreme and trade-off solutions at the same time, and, as a consequence, the results are rather poor along the whole Pareto front.

Fig. 1. Differences in EAFs with and without local search, either using one (top) or two pheromone matrices (bottom), and one colony

Component: Weight Setting Strategies. Another notable observation is the difference in results obtained by the two weight setting strategies: all ants use the same weight at each iteration (TPLS-like strategy) versus each ant uses a different weight. The plots in Fig. 2 compare these two weight setting strategies with (top plot) and without local search (bottom plot). Although the benefit of the TPLS-like strategy is clearly stronger by a large margin if no local search is used, even in the case with local search there is an important difference along the whole Pareto front. By comparison, for the BQAP, there is no noticeable difference between weight setting strategies. In our experiments, we consistently noticed that the aggregation of multiple pheromones by using weights, which is necessary for computing the probabilities of adding specific solution components, is an expensive operation. If each ant uses a different weight, this aggregation has to be performed for each of the m ants. On the other hand, if all ants use the same weight, the probabilities can be pre-computed once at each iteration. For the BTSP, the latter approach makes it possible to perform ten times more iterations than the former within the same computation time.

Fig. 2. Differences in EAFs between the two weight setting strategies. The algorithms use one colony, two pheromone matrices, and local search (top) or not (bottom).

Component: One vs. Two Pheromone Matrices. There were strong differences in the results obtained when using one pheromone matrix (plus update by nondominated solutions) or two pheromone matrices (plus update by the best solution for each objective). Figure 3 compares both approaches when not using local search. The first approach with only one pheromone matrix tends to produce better results in the center of the Pareto front than the approach using two pheromone matrices. In contrast, when using two pheromone matrices, the resulting Pareto front is much better at the extremes of the Pareto frontier. If local search is used (not shown here), these structural differences disappear because the weighted local search explores the whole Pareto front explicitly. Moreover, when using local search, the approach using two pheromone matrices is better along the whole Pareto frontier, and in particular in the center of the Pareto front. Hence, there is a strong interaction between the components concerning the pheromone matrices and local search.

Fig. 3. Differences in EAFs between one or two pheromone matrices. The comparison is done without local search; one colony.

Component: Number of Colonies. We observed, as well, differences in the structure of the Pareto frontiers obtained when using one or more colonies. Figure 4 compares the use of one or more colonies for the approach based on one pheromone matrix (top) or two pheromone matrices (bottom). As illustrated by the top plot, when using one pheromone matrix, the results obtained with three colonies are better in the extremes than with one colony. The opposite behavior is observed in the bottom plot when using two pheromone matrices. In this case, five colonies improve over the results obtained by only one colony, except for the very extremes of the Pareto front.

Fig. 4. Differences in the EAFs between one and multiple colonies. Results are shown for one pheromone matrix (top) and two pheromone matrices (bottom).
Taking into account that using one pheromone matrix is good at the center of the Pareto front but very poor at the extremes, the effect of the multiple colonies is to improve the results at the extremes. On the other hand, using two pheromone matrices is very good at the extremes of the Pareto front, but quite poor at the center. In this case, the use of multiple colonies improves the results at the center. Hence, one may conclude that the effect of multiple colonies is to overcome the core weakness of each approach. However, the multiple colonies approach is not better along the whole Pareto frontier than a single colony, because each additional colony introduces an overhead in computation time and the available time is limited in our experiments. The benefits of additional colonies were even stronger in the case of the BQAP [12], since the overhead introduced by each additional colony was smaller.
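The EAF comparisons underlying Figs. 1–4 rest on a simple estimator: the fraction of runs whose outcome set attains (weakly dominates) a given point of the objective space. A sketch for the bi-objective minimization case, with invented data and our own names:

def attains(front, z):
    """True if any point of one run's outcome set weakly dominates z."""
    return any(p[0] <= z[0] and p[1] <= z[1] for p in front)

def eaf(runs, z):
    """Empirical attainment function: fraction of runs attaining point z."""
    return sum(attains(front, z) for front in runs) / len(runs)

runs_a = [[(1, 9), (5, 4)], [(2, 8), (6, 3)], [(1, 7)]]  # three runs of algorithm A
runs_b = [[(3, 9)], [(4, 6), (7, 2)], [(2, 9)]]          # three runs of algorithm B
z = (5, 5)
print(eaf(runs_a, z) - eaf(runs_b, z))  # > 0: A attains z more often than B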
6 Conclusions
In this paper, we have presented an analysis of several key algorithm components for the design of MOACO algorithms for the BTSP. We have also identified specific components where different behavior is induced when compared to the BQAP. There are a number of ways in which this study can be extended. As next steps we plan to investigate the contribution of components of other concrete MOACO algorithms on performance, both for the BTSP and the BQAP. We may also consider comparisons to ACO algorithms that are embedded into higher-level guidance strategies such as two-phase local search [15]. We hope to have made a step forward towards an understanding of the impact of MOACO algorithm components on performance; ultimately we hope that this leads towards informed design choices when applying MOACO to practical applications.

Acknowledgments. This work was supported by the META-X project, an Action de Recherche Concertée funded by the Scientific Research Directorate of the French Community of Belgium. Thomas Stützle acknowledges support from the Belgian F.R.S.-FNRS, of which he is a Research Associate.
References

1. Alaya, I., Solnon, C., Ghédira, K.: Ant colony optimization for multi-objective optimization problems. In: 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007), Los Alamitos, CA, vol. 1, pp. 450–457. IEEE Computer Society Press, Los Alamitos (2007)
2. Angus, D.: Population-based ant colony optimisation for multi-objective function optimisation. In: Randall, M., Abbass, H.A., Wiles, J. (eds.) ACAL 2007. LNCS (LNAI), vol. 4828, pp. 232–244. Springer, Heidelberg (2007)
3. Angus, D., Woodward, C.: Multiple objective ant colony optimization. Swarm Intelligence 3(1), 69–85 (2009)
4. Barán, B., Schaerer, M.: A multiobjective ant colony system for vehicle routing problem with time windows. In: Proceedings of the Twenty-First IASTED International Conference on Applied Informatics, Innsbruck, Austria, pp. 97–102 (2003)
5. Doerner, K., Gutjahr, W.J., Hartl, R.F., Strauss, C., Stummer, C.: Pareto ant colony optimization: A metaheuristic approach to multiobjective portfolio selection. Annals of Operations Research 131, 79–99 (2004)
6. Dorigo, M., Stützle, T.: Ant Colony Optimization. MIT Press, Cambridge (2004)
7. García-Martínez, C., Cordón, O., Herrera, F.: A taxonomy and an empirical analysis of multiple objective ant colony optimization algorithms for the bi-criteria TSP. European Journal of Operational Research 180(1), 116–148 (2007)
8. Grunert da Fonseca, V., Fonseca, C.M., Hall, A.O.: Inferential performance assessment of stochastic optimisers and the attainment function. In: Zitzler, E., Deb, K., Thiele, L., Coello Coello, C.A., Corne, D.W. (eds.) EMO 2001. LNCS, vol. 1993, pp. 213–225. Springer, Heidelberg (2001)
9. Hoos, H.H., Stützle, T.: Stochastic Local Search—Foundations and Applications. Morgan Kaufmann Publishers, San Francisco (2005)
10. Iredi, S., Merkle, D., Middendorf, M.: Bi-criterion optimization with multi colony ant algorithms. In: Zitzler, E., Deb, K., Thiele, L., Coello Coello, C.A., Corne, D.W. (eds.) EMO 2001. LNCS, vol. 1993, pp. 359–372. Springer, Heidelberg (2001)
11. López-Ibáñez, M., Paquete, L., Stützle, T.: Hybrid population-based algorithms for the bi-objective quadratic assignment problem. Journal of Mathematical Modelling and Algorithms 5(1), 111–137 (2006)
12. López-Ibáñez, M., Paquete, L., Stützle, T.: On the design of ACO for the biobjective quadratic assignment problem. In: Dorigo, M., Birattari, M., Blum, C., Gambardella, L.M., Mondada, F., Stützle, T. (eds.) ANTS 2004. LNCS, vol. 3172, pp. 214–225. Springer, Heidelberg (2004)
13. López-Ibáñez, M., Paquete, L., Stützle, T.: Exploratory analysis of stochastic local search algorithms in biobjective optimization. In: Bartz-Beielstein, T., et al. (eds.) Experimental Methods for the Analysis of Optimization Algorithms. Springer, Heidelberg (2010) (to appear)
14. Lust, T., Jaszkiewicz, A.: Speed-up techniques for solving large-scale biobjective TSP. Computers & Operations Research (2009) (in press)
15. Paquete, L., Stützle, T.: A study of stochastic local search algorithms for the biobjective QAP with correlated flow matrices. European Journal of Operational Research 169(3), 943–959 (2006)
16. Paquete, L., Stützle, T.: Clusters of non-dominated solutions in multiobjective combinatorial optimization: An experimental analysis. In: Barichard, V., et al. (eds.) Multiobjective Programming and Goal Programming: Theoretical Results and Practical Applications. Lecture Notes in Economics and Mathematical Systems, vol. 618, pp. 69–77. Springer, Berlin (2009)
17. Stützle, T.: ACOTSP: A software package of various ant colony optimization algorithms applied to the symmetric traveling salesman problem (2002), http://www.aco-metaheuristic.org/aco-code
18. Stützle, T., Hoos, H.H.: MAX–MIN Ant System. Future Generation Computer Systems 16(8), 889–914 (2000)
Alternative Fitness Assignment Methods for Many-Objective Optimization Problems

Mario Garza-Fabre¹, Gregorio Toscano-Pulido¹, and Carlos A. Coello Coello²

¹ CINVESTAV-Tamaulipas, Km. 6 carretera Cd. Victoria-Monterrey, Cd. Victoria, Tamaulipas, 87267, México
² CINVESTAV-IPN, Depto. de Computación (Evolutionary Computation Group), Av. IPN No. 2508, San Pedro Zacatenco, México, D.F. 07360, México
Abstract. Pareto dominance (PD) has been the most commonly adopted relation to compare solutions in the multiobjective optimization context. Multiobjective evolutionary algorithms (MOEAs) based on PD have been successfully used in order to optimize bi-objective and three-objective problems. However, it has been shown that Pareto dominance loses its effectiveness as the number of objectives increases and thus, the convergence behavior of approaches based on this concept decreases. This paper tackles the MOEAs’ scalability problem that arises as we increase the number of objective functions. In this paper, we perform a comparative study of some of the state-of-the-art fitness assignment methods available for multiobjective optimization in order to analyze their ability to guide the search process in high-dimensional objective spaces.
1 Introduction

Evolutionary algorithms (EAs) draw inspiration from the process of natural evolution in order to evolve progressively a population of individuals (i.e., potential solutions to the optimization problem) through the application of a series of probabilistic processes. As a population-based approach, EAs are suitable alternatives for solving problems with two or more objectives (the so-called multiobjective optimization problems, or MOPs for short), since they are able to explore simultaneously different regions of the search space and to produce several elements of the Pareto optimal set within a single execution. Since the mid-1980s, the field of evolutionary multiobjective optimization (EMO, for short) has grown and a wide variety of multiobjective EAs (or MOEAs) have been proposed so far. Despite the considerable volume of research on EMO, most of these efforts have been focused on two-objective or three-objective problems. Recently, the EMO community started to explore the scalability of MOEAs with respect to the number of objective functions. As a result, several studies have shown that even the most popular MOEAs fail to converge to the trade-off surface in high-dimensional objective spaces [17,12,10,13]. MOPs having more than 3 objectives are referred to as many-objective optimization problems in the specialized literature [7]. EAs require a function which measures the fitness of solutions in order to identify the best candidates to guide the search process. When dealing with a single-objective
problem, such fitness function is usually related to the function to be optimized. However, when solving MOPs it is required an additional mechanism to map the multiobjective space into a single dimension in order to allow a direct comparison among solutions; this mechanism is known as the fitness assignment process1 [11]. Pareto dominance (PD) has been the most commonly adopted relation to discriminate among solutions in the multiobjective context, and it has been the basis to develop most of the MOEAs proposed so far. However, PD loses its discrimination potential with the increase in the number of objectives and thus, decreases the convergence ability of approaches based on this concept. With the aim of clarifying this point, Figure 1 shows how the proportion of nondominated solutions (i.e., equally good solutions for PD) grows in a population with respect to the number of objective functions and as the search progresses. This experiment was performed using two well-known scalable test problems, namely DTLZ1 and DTLZ6 [4]. The data in Figure 1 corresponds to the mean of 31 independent runs of a generic MOEA (described in Section 3) with a population of 100 individuals. (b) DTLZ6
[Figure 1: two panels, (a) DTLZ1 and (b) DTLZ6, plotting the percentage of Pareto-nondominated solutions in the population against the generation number (0-5), with one curve each for 5, 10, 15, 20, 30 and 50 objectives.]
Fig. 1. Proportion of Pareto-nondominated solutions with respect to the number of objectives
From Figure 1, we can clearly see that an increment in the number of objectives raises the proportion of nondominated individuals even in the initial population (generation 0), which is randomly generated. This problem becomes more evident as the search progresses, and the population is rapidly saturated with nondominated solutions. When the whole population becomes nondominated, it is not possible to discriminate among solutions and the search process weakens, since selection is performed practically at random. It should thus be clear how important it is to devise alternative approaches to rank solutions when dealing with many-objective problems.

This paper tackles the MOEAs' scalability problem that arises when the number of objectives is increased, by performing a comparative study of some state-of-the-art alternative approaches to PD. In our study, we incorporate the considered approaches into a generic MOEA in order to investigate their convergence ability and scalability with respect to the number of objectives.

The remainder of this document is structured as follows: Section 2 describes the studied approaches. In Section 3, we present the results of the performed comparative
¹ In this study, we use the terms fitness and rank interchangeably to refer to the value which expresses the quality of solutions and allows them to be compared with respect to each other.
study. Finally, Section 4 provides our conclusions as well as some possible directions for future research.
2 Multiobjective Fitness Assignment Methods

In this study, we assume that all objectives are equally important and, without loss of generality, we will refer only to minimization problems. Here, we are interested in solving many-objective optimization problems of the following form:

  Minimize F(Xi) = [f1(Xi), f2(Xi), ..., fM(Xi)]^T subject to Xi ∈ F    (1)
where Xi is a decision vector (containing the decision variables), F(Xi) is the M-dimensional objective vector (M > 3), fm(Xi) is the m-th objective function, and F is the feasible region delimited by the problem's constraints. Also, from now on, we will use the ranking procedure proposed by Fonseca and Fleming [9] for all the approaches described herein, except for those for which a different ranking method is explicitly given. Fonseca and Fleming proposed to rank each solution Xi in a population P as follows:

  rank(Xi) = 1 + |{Xj ∈ P : Xj ≺ Xi}|    (2)
where Xj ≺ Xi denotes that solution Xj dominates (is better than) Xi according to a preference relation ≺; this relation was originally instantiated with Pareto dominance.

2.1 Pareto Dominance

Pareto dominance (PD) was proposed by Vilfredo Pareto [16] and is defined as follows: given two solutions Xi, Xj ∈ F, we say that Xi Pareto-dominates Xj (Xi ≺P Xj) if and only if:

  ∀m ∈ {1, 2, ..., M} : fm(Xi) ≤ fm(Xj) ∧ ∃m ∈ {1, 2, ..., M} : fm(Xi) < fm(Xj)    (3)
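For concreteness, the two definitions above can be sketched in Java as follows (an illustrative transcription of Equations (2) and (3), not code from the study; a population is assumed to be given as a matrix of objective values, with minimization throughout):

final class ParetoRanking {
    // Pareto dominance (Eq. 3): x dominates y iff x is no worse in all
    // objectives and strictly better in at least one.
    static boolean dominates(double[] x, double[] y) {
        boolean strictlyBetter = false;
        for (int m = 0; m < x.length; m++) {
            if (x[m] > y[m]) return false;        // worse in objective m
            if (x[m] < y[m]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    // Fonseca-Fleming ranking (Eq. 2):
    // rank(Xi) = 1 + number of population members dominating Xi.
    static int[] rank(double[][] pop) {
        int[] ranks = new int[pop.length];
        for (int i = 0; i < pop.length; i++) {
            ranks[i] = 1;
            for (int j = 0; j < pop.length; j++)
                if (j != i && dominates(pop[j], pop[i])) ranks[i]++;
        }
        return ranks;
    }
}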
2.2 Ranking Composition Methods

Ranking composition methods (RCM) extract the fitness values of every solution into a separate list for each objective. These lists are then individually sorted, resulting in a set of different ranking positions for every solution, one per objective. The ranking positions of a solution Xi are given by the vector R(Xi) = [r1(Xi), r2(Xi), ..., rM(Xi)]^T, where rm(Xi) is the rank of Xi for the m-th objective. Finally, the different ranking positions of an individual are composed into a single ranking which reflects the candidate solution's quality [1]. Assuming a previous calculation of R(Xi) for each solution Xi, we describe below some RCMs reported in the specialized literature.
Average ranking (AR). This method was proposed by Bentley and Wakefield [1]. The global rank of an individual Xi is given by:

  rank(Xi) = Σ_{m=1}^{M} rm(Xi)    (4)
Maximum ranking (MR). Bentley and Wakefield [1] also proposed a method in which the global rank of an individual corresponds to its best ranking position:

  rank(Xi) = min_{m=1,...,M} rm(Xi)    (5)
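A minimal sketch of both composition methods follows (our own illustration, assuming per-objective positions are computed by counting strictly smaller objective values, which is one simple tie-handling policy among several possible ones):

// Per-objective ranking positions r_m(Xi), computed from raw objectives.
static int[][] perObjectiveRanks(double[][] pop) {
    int n = pop.length, M = pop[0].length;
    int[][] r = new int[n][M];
    for (int m = 0; m < M; m++)
        for (int i = 0; i < n; i++) {
            r[i][m] = 1;                           // best possible position
            for (int j = 0; j < n; j++)
                if (pop[j][m] < pop[i][m]) r[i][m]++;
        }
    return r;
}

// AR (Eq. 4): sum of the ranking positions of one individual.
static int averageRanking(int[] r) { int s = 0; for (int v : r) s += v; return s; }

// MR (Eq. 5): best (smallest) ranking position of one individual.
static int maximumRanking(int[] r) { int b = r[0]; for (int v : r) b = Math.min(b, v); return b; }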
2.3 Relaxed Forms of Dominance

Some authors have developed alternatives to PD in order to allow a finer-grained discrimination among solutions. Relaxed forms of dominance (RFD) make it possible for a solution Xi to dominate another solution Xj even in cases where Xi is Pareto-dominated by Xj. Generally, RFDs can accept a detriment in some objectives provided it implies a considerable improvement of the solution in some other objectives. Next, we describe some RFDs reported in the specialized literature.

Approaches that require parameters

The approaches described below require the fine-tuning of at least one parameter, which sometimes involves in-depth knowledge or understanding of the method and the problem to be solved. This requirement can be seen as a drawback, since it can reduce the application range of such approaches.

α-domination strategy (AD). Ikeda et al. [14] proposed an RFD to deal with what they called dominance resistant solutions, i.e., solutions that are extremely inferior to others in at least one objective, but hardly dominated. The idea of AD is to set upper/lower bounds on the trade-off rates between two objectives, in order to allow Xi to dominate Xj if Xi is slightly inferior in one objective but largely superior in some other objectives. A solution Xi α-dominates solution Xj (Xi ≺α Xj) if and only if:

  ∀m ∈ {1, 2, ..., M} : gm(Xi, Xj) ≤ 0 ∧ ∃m ∈ {1, 2, ..., M} : gm(Xi, Xj) < 0    (6)

where

  gm(Xi, Xj) = fm(Xi) − fm(Xj) + Σ_{n≠m} αmn (fn(Xi) − fn(Xj))    (7)

and αmn is the trade-off rate between objectives m and n. If αmn = 0 for all pairs of objectives, AD enforces PD. In [14], the parameters αmn were set to a constant c ∈ {1/3, 1/9, 1/100} for all m ≠ n. Since 1/3 was the value which allowed the best performance in [14], this value will be used in this study.
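The test can be sketched as follows (our own illustration; a single constant trade-off rate alpha is assumed for every pair of objectives, matching the constant setting c used in [14]):

// Alpha-domination test (Eqs. 6-7): x alpha-dominates y iff all
// g_m(x, y) <= 0 and at least one is strictly negative.
static boolean alphaDominates(double[] x, double[] y, double alpha) {
    int M = x.length;
    boolean strict = false;
    for (int m = 0; m < M; m++) {
        double g = x[m] - y[m];
        for (int n = 0; n < M; n++)
            if (n != m) g += alpha * (x[n] - y[n]);   // alpha_{mn} = alpha
        if (g > 0) return false;
        if (g < 0) strict = true;
    }
    return strict;
}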
k-dominance (KD). Farina and Amato [8] proposed a dominance relation which takes into account the number of objectives in which a solution Xi is better than, equal to, and worse than another solution Xj. For this purpose, the authors defined, respectively, the following functions:

  nb(Xi, Xj) = |{ m : fm(Xi) < fm(Xj) }| for m ∈ {1, 2, ..., M}    (8)

  ne(Xi, Xj) = |{ m : fm(Xi) = fm(Xj) }| for m ∈ {1, 2, ..., M}    (9)

  nw(Xi, Xj) = |{ m : fm(Xi) > fm(Xj) }| for m ∈ {1, 2, ..., M}    (10)
For simplicity, we will refer to the functions in Equations (8), (9) and (10) as nb, ne and nw, respectively. Given two solutions Xi and Xj, we say that Xi k-dominates² Xj (Xi ≺k Xj) if and only if:

  ne < M ∧ nb ≥ (M − ne)/(k + 1)    (11)
where 0 ≤ k ≤ 1. If k = 0, the discrimination of KD is equivalent to that of PD. The strictness of KD depends on the value chosen for k; in this study the value k = 1 will be used in order to enhance discrimination among solutions.
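A compact sketch of the KD test, reusing the counters of Equations (8) and (9) (illustrative code, not from the study):

// k-dominance test (Eq. 11), assuming minimization.
static boolean kDominates(double[] x, double[] y, double k) {
    int M = x.length, nb = 0, ne = 0;
    for (int m = 0; m < M; m++) {
        if (x[m] < y[m]) nb++;           // better in objective m
        else if (x[m] == y[m]) ne++;     // equal in objective m
    }
    return ne < M && nb >= (double) (M - ne) / (k + 1);
}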
Volume dominance (VD). This dominance relation was proposed by Le and Landa-Silva [15]. VD is based on the volume of the objective space that a solution dominates. The dominated volume of Xi is defined as the region R for which all its feasible solutions are dominated by Xi. We need to define a reference point r such that it is dominated by all solutions in R. The dominated volume of a solution Xi with respect to the reference point r = [r1, r2, ..., rM]^T is given by:

  V(Xi, r) = Π_{m=1}^{M} (rm − fm(Xi))    (12)
To establish the dominance relationship of two solutions Xi and Xj, we need to compare their dominated volumes to the shared dominated volume (SV), i.e., the volume dominated by both solutions. The SV is defined as follows:

  SV(Xi, Xj, r) = Π_{m=1}^{M} (rm − max(fm(Xi), fm(Xj)))    (13)
It is said that Xi volume-dominates Xj (Xi ≺V Xj) for a ratio rSV if either (14) or (15) holds:

  V(Xj, r) = SV(Xi, Xj, r) ∧ V(Xi, r) > SV(Xi, Xj, r)    (14)

  V(Xi, r) > V(Xj, r) > SV(Xi, Xj, r) ∧ (V(Xi, r) − V(Xj, r)) / SV(Xi, Xj, r) > rSV    (15)
² In [8], the term (1 − k)-dominates is used but, for simplicity, we will use k-dominates instead.
A small rSV indicates that a small difference between the dominated volumes of two solutions is enough to establish preferences between them. The authors suggested that a value in the range [0.05, 0.15] is reasonable for rSV; we will use rSV = 0.1. In our experimental study we apply this method to the objectives normalized to the range [0, 1] (see details in Section 3); thus, the reference point can be any point r such that rm > 1, and we will use rm = 1.1 for every objective in this study.
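An illustrative sketch of the VD test under the normalization just described (objectives in [0, 1], all reference coordinates fixed at 1.1; exact floating-point equality is used in the Eq. (14) branch for simplicity):

// Volume dominance test (Eqs. 12-15) with reference point r_m = 1.1.
static boolean volumeDominates(double[] x, double[] y, double rSV) {
    double vx = 1, vy = 1, sv = 1;
    for (int m = 0; m < x.length; m++) {
        vx *= 1.1 - x[m];                          // V(x, r), Eq. (12)
        vy *= 1.1 - y[m];                          // V(y, r)
        sv *= 1.1 - Math.max(x[m], y[m]);          // SV(x, y, r), Eq. (13)
    }
    if (vy == sv && vx > sv) return true;              // Eq. (14)
    return vx > vy && vy > sv && (vx - vy) / sv > rSV; // Eq. (15)
}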
Contraction/expansion of dominance area (CE). Sato et al. [18] proposed a method to strengthen or weaken the selection process by expanding or contracting the solutions' dominance area. The fitness value of a solution Xi for each objective function is modified as follows:

  f′m(Xi) = r · sin(ωm + Sm · π) / sin(Sm · π), ∀m ∈ {1, 2, ..., M}    (16)
where r is the norm of vector F(Xi) and ωm is the declination angle between F(Xi) and fm(Xi), which can be calculated as ωm = fm(Xi)/r. Sm is a user-defined parameter which controls the dominance area of Xi for the m-th dimension; its possible values lie in the range [0.25, 0.75]. If Sm = 0.5, then f′m(Xi) = fm(Xi). If Sm > 0.5, the dominance area is contracted, producing a coarser ranking of solutions and weakening the selection process. On the other hand, Sm < 0.5 expands the dominance area and strengthens the selection by producing a finer-grained ranking of solutions. Clearly, for many-objective problems we are interested in expanding the dominance area of solutions in order to achieve a richer ordering of preferences among them. For this study we adopted the value Sm = 0.25 for all m ∈ {1, 2, ..., M}.
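A sketch of the transform follows (our own rendering of Eq. (16), with ωm computed as fm(Xi)/r exactly as stated above; the transformed objectives would then be compared with ordinary Pareto dominance):

// CE transform (Eq. 16) applied to one objective vector, with a
// single parameter S used for every dimension (S = 0.25 in this study).
static double[] modifyObjectives(double[] f, double S) {
    double r = 0;
    for (double v : f) r += v * v;
    r = Math.sqrt(r);                              // norm of F(Xi)
    double[] out = new double[f.length];
    for (int m = 0; m < f.length; m++) {
        double omega = (r == 0) ? 0 : f[m] / r;    // declination angle, as in the text
        out[m] = r * Math.sin(omega + S * Math.PI) / Math.sin(S * Math.PI);
    }
    return out;
}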
Parameter-less approaches

Unlike the above methods, the operation of the approaches described in this section does not depend on any parameter fine-tuning, which expands their applicability and facilitates their understanding and implementation.

L-dominance (LD). This dominance relation was proposed by Zou et al. [20]. Similar to the KD relation, LD considers the functions nb (8), ne (9) and nw (10), which count the number of objectives in which a solution Xi is respectively better, equal and worse than another solution Xj. According to LD, we can say that Xi L-dominates Xj (Xi ≺L Xj) if and only if:

  nb − nw = L > 0 ∧ ‖F(Xi)‖p < ‖F(Xj)‖p (for a certain p)    (17)

where ‖F(Xi)‖p is the p-norm of a solution Xi. The value p = 1 is used in this study.
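A sketch of the LD test with p = 1 (our own illustration, assuming nonnegative objective values so that the 1-norm reduces to a plain sum):

// L-dominance test (Eq. 17) with p = 1: more objectives improved than
// worsened, plus a smaller 1-norm of the objective vector.
static boolean lDominates(double[] x, double[] y) {
    int nb = 0, nw = 0;
    double normX = 0, normY = 0;
    for (int m = 0; m < x.length; m++) {
        if (x[m] < y[m]) nb++;
        else if (x[m] > y[m]) nw++;
        normX += x[m];                 // 1-norm, assuming x[m] >= 0
        normY += y[m];
    }
    return nb - nw > 0 && normX < normY;
}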
Favour relation (FD). In this alternative dominance relation, proposed by Drechsler et al. [6], a solution Xi is said to dominate another solution Xj (Xi ≺f Xj) if and only if:

  |{m : fm(Xi) < fm(Xj)}| > |{n : fn(Xj) < fn(Xi)}| for m, n ∈ {1, 2, ..., M}    (18)

Since FD is not a transitive relation (consider solutions Xi = (8,7,1), Xj = (1,9,6) and Xk = (7,0,9); it is clear that Xi ≺f Xj ≺f Xk ≺f Xi), the authors proposed to rank solutions as follows: a graph representation is used for the relation, where each solution is a node and the preferences are given by edges, in order to identify the Strongly Connected Components (SCCs). An SCC groups all elements which are not comparable to each other (as the cycle of solutions in the above example). A new cycle-free graph is constructed using the obtained SCCs, such that an order can be established by assigning the same rank to all solutions that belong to the same SCC.

Preference order ranking (PO). di Pierro et al. [5] proposed a strategy that ranks a population according to the order of efficiency of solutions. An individual Xi is considered efficient of order k if it is not Pareto-dominated by any other individual in any of the (M choose k) subspaces in which only k objectives at a time are considered. Efficiency of order M for a MOP with exactly M objectives simply corresponds to the original Pareto optimality definition. If Xi is efficient of order k, then it is efficient of order k + 1. Analogously, if Xi is not efficient of order k, then it is not efficient of order k − 1. Given these properties, the order of efficiency of a solution Xi is the minimum k value for which Xi is efficient:
  order(Xi) = min{ k ∈ {1, ..., M} : isEfficient(Xi, k) }    (19)
where isEfficient(Xi, k) is true if Xi is efficient of order k. The order of efficiency can be used to rank solutions: the smaller the order of efficiency an individual has, the better this individual is. In [5] it is proposed to use this strategy in combination with a PD-based ranking procedure, in such a way that the order of efficiency can be used to discriminate among solutions classified with the same rank according to PD. However, since it is known that for many-objective problems the whole population rapidly becomes nondominated (all solutions share the same rank), in this study we rank solutions using the order of efficiency alone.
3 Experimental Results

The different ranking methods described in Section 2 were incorporated into a generic MOEA in order to investigate their convergence ability as the number of objectives increases. Figure 2 describes the implemented MOEA's workflow. Initially, a parent population of N individuals is randomly generated. This population is then ranked, and selection is performed in order to identify those individuals which are to be reproduced. A children population of N new individuals is generated by applying variation operators to the selected individuals. Finally, the parent and children populations are combined and ranked in order to select the N best individuals to survive and form the new parent population (elitist MOEA [2]). The ranking step (highlighted in Figure 2) is where the different studied approaches were incorporated.

The implemented operators are: binary tournament selection based on the rank of solutions; simulated binary crossover (ηc = 15) with probability 1; and polynomial mutation (ηm = 20) with probability 1/n, where n is the number of decision variables. We used a population of N = 100 individuals and 300 generations for all experiments.
Fig. 2. Implemented generic MOEA’s workflow
In order to avoid alterations in the behavior of the studied methods, we did not use any additional mechanism to maintain diversity in the population. The different studied approaches were applied to the normalized objective values f′m(Xi) = (fm(Xi) − GMINm) / (GMAXm − GMINm) for all m = 1, 2, ..., M, where GMAXm and GMINm are the maximum and minimum known values for the m-th objective. However, since we know a priori that for the adopted set of test problems GMINm = 0, we simply normalized the objectives as f′m(Xi) = fm(Xi) / GMAXm for all m = 1, 2, ..., M.

Problems DTLZ1 and DTLZ6 [4] were selected for our experimental study. These test functions can be scaled to any number of objectives and decision variables. The total number of variables in these problems is n = M + k − 1, where M is the number of objectives and k is a difficulty parameter, set to k = 5 for DTLZ1 and k = 10 for DTLZ6. In this study, we consider instances with M ∈ {5, 10, 15, 20, 30, 50} objectives. However, since PO becomes computationally expensive as the number of objectives increases, we only applied it to instances with up to 20 objectives.

As a convergence measure, we computed the average distance of the Pareto-nondominated solutions in the approximation set obtained by the MOEA from the Pareto front [3]. Since the equations defining the Pareto front are known for the adopted test problems, the convergence measure was analytically determined [12,19].

Tables 1 and 2 show the results obtained when the different methods were applied to problems DTLZ1 and DTLZ6, respectively. These tables show the average and standard deviation of the convergence measure over 31 independent trials of each experiment. From Table 1 we can highlight that, for DTLZ1, LD, CE and AD showed the best convergence ability as the number of objectives increases, whereas MR obtained the worst average convergence for all instances of this problem. Problem DTLZ6 (Table 2) imposes greater convergence difficulties on most of the studied approaches. On the one hand, CE was the only method which achieved relatively low values for the convergence measure in all instances of this problem. On the other hand, AR and KD seem to be the methods most affected by DTLZ6's difficulties, since they obtained the worst performance in almost all instances.

Results confirm that Pareto dominance is not able to effectively guide the search in many-objective scenarios. However, the achieved convergence of MR is even worse than that of PD in all cases. In our opinion, this is because MR tends to favor extreme
Table 1. Average and standard deviation of the achieved convergence in 31 runs for DTLZ1

Method  5 Obj.          10 Obj.         15 Obj.         20 Obj.         30 Obj.         50 Obj.
PD      1.363 ± 1.072   17.94 ± 14.67   6.740 ± 7.553   6.615 ± 7.254   5.687 ± 5.448   2.845 ± 2.301
AR      0.000 ± 0.000   0.000 ± 0.000   0.005 ± 0.023   0.019 ± 0.041   0.110 ± 0.117   0.378 ± 0.281
MR      45.40 ± 22.77   32.37 ± 12.13   26.62 ± 10.22   29.89 ± 12.32   18.92 ± 10.37   15.83 ± 8.824
AD      0.002 ± 0.002   0.001 ± 0.002   0.007 ± 0.024   0.005 ± 0.020   0.013 ± 0.031   0.091 ± 0.074
KD      0.000 ± 0.000   0.025 ± 0.134   0.022 ± 0.047   0.033 ± 0.058   0.128 ± 0.107   0.480 ± 0.329
VD      0.582 ± 0.355   0.464 ± 0.468   0.515 ± 0.384   0.400 ± 0.415   0.388 ± 0.374   0.323 ± 0.234
CE      0.002 ± 0.001   0.002 ± 0.001   0.002 ± 0.002   0.002 ± 0.001   0.008 ± 0.022   0.032 ± 0.059
LD      0.000 ± 0.000   0.000 ± 0.000   0.004 ± 0.023   0.004 ± 0.020   0.006 ± 0.023   0.030 ± 0.035
FD      0.001 ± 0.001   0.047 ± 0.151   0.121 ± 0.237   0.277 ± 0.492   0.829 ± 1.131   2.859 ± 3.330
PO      0.750 ± 0.691   1.879 ± 4.646   1.121 ± 1.302   0.853 ± 0.816   -               -
Table 2. Average and standard deviation of the achieved convergence in 31 runs for DTLZ6

Method  5 Obj.          10 Obj.         15 Obj.         20 Obj.         30 Obj.         50 Obj.
PD      6.525 ± 0.402   8.485 ± 0.350   8.584 ± 0.353   8.646 ± 0.352   8.783 ± 0.467   8.870 ± 0.285
AR      0.150 ± 0.044   10.00 ± 0.000   10.00 ± 0.000   10.00 ± 0.000   10.00 ± 0.000   10.00 ± 0.000
MR      8.607 ± 0.349   9.543 ± 0.233   9.820 ± 0.155   9.828 ± 0.096   9.887 ± 0.057   9.874 ± 0.050
AD      0.074 ± 0.028   0.553 ± 0.069   0.677 ± 0.109   0.659 ± 0.091   0.706 ± 0.093   0.679 ± 0.081
KD      0.063 ± 0.032   10.00 ± 0.000   10.00 ± 0.000   10.00 ± 0.000   10.00 ± 0.000   10.00 ± 0.000
VD      0.106 ± 0.035   0.102 ± 0.038   0.152 ± 0.089   0.213 ± 0.140   0.447 ± 0.313   0.634 ± 0.371
CE      0.081 ± 0.029   0.089 ± 0.030   0.080 ± 0.027   0.089 ± 0.030   0.094 ± 0.032   0.091 ± 0.028
LD      0.079 ± 0.029   7.416 ± 0.237   7.149 ± 0.319   7.105 ± 0.338   6.744 ± 0.326   6.218 ± 0.354
FD      5.377 ± 2.361   9.902 ± 0.135   9.515 ± 0.166   9.864 ± 0.096   9.742 ± 0.106   9.641 ± 0.112
PO      5.836 ± 0.644   9.847 ± 0.106   9.968 ± 0.041   9.994 ± 0.007   -               -
solutions, i.e., it prefers solutions with the best performance in some objectives without taking into account their quality in the rest of the objectives. To clarify this point, consider the following example: if, in a 20-objective MOP, Xi is the solution with the best performance with respect to the first objective but the worst solution for the remaining 19 objectives, Xi would still be classified with the best rank by MR. We consider MR the worst of the studied alternatives. In general, according to our experimental observations, MR, PD, PO and FD are the four methods with the worst performance. On the other hand, the results suggest that CE provides the best convergence properties and the most stable behavior as the number of objectives increases.

Additionally, we investigated the convergence speed achieved by the MOEA when using the different ranking schemes of our interest. Figures 3 and 4 show, for DTLZ1 and DTLZ6 respectively, the average convergence of 31 runs as the search progresses. Due to space limitations, we only show results for the 20-objective instances, since 20 is the maximum number of objectives for which we performed experiments for all the studied methods. The data shown in Figures 3 and 4 was plotted in logarithmic scale, in order to highlight the differences in the results obtained using each method.
[Figure 3: log-scale plot of the convergence measure against generation (0-300) for PD, AR, MR, AD, KD, VD, CE, LD, FD and PO on DTLZ1 with 20 objectives.]
Fig. 3. Convergence at different search stages for DTLZ1 problem with 20 objectives 10
[Figure 4: log-scale plot of the convergence measure against generation (0-300) for the same ten methods on DTLZ6 with 20 objectives.]
Fig. 4. Convergence at different search stages for DTLZ6 problem with 20 objectives
Figure 3 confirms that CE, LD and AD performed the best for problem DTLZ1. We can clearly see that these three methods had an accelerated convergence, since during the first 50 generations they reached relatively low values for the convergence measure. Regarding the DTLZ6 problem, Figure 4 shows that, as stated before, most of the studied alternatives failed to converge to the Pareto-optimal frontier. CE, VD and AD (in this order) showed the best convergence properties and a high convergence speed during the first 130 generations.
4 Conclusions and Future Work

Since the performance of Pareto-based MOEAs deteriorates as the number of objectives increases, it is necessary to identify alternative approaches to establish preferences among solutions in many-objective scenarios. In this paper, we performed a comparative study of some state-of-the-art approaches of this sort in order to investigate their ability to guide the search process in high-dimensional objective spaces.
Due to space limitations we only considered two test cases. However, we intend to extend these experiments to a larger set of test functions, as well as to adopt real-world many-objective problems, in order to generalize our results. Since the performance of some of the studied approaches depends on proper parameter fine-tuning, as part of our future work we want to investigate the influence of parameter settings on the behavior of such approaches. In this paper we focused on the convergence properties of different ranking schemes. However, an important goal for MOEAs is to converge to a set of well-spread solutions. Therefore, we also want to extend our experiments in order to study the distribution of the approximation set achieved by each of the studied methods.
Acknowledgements

The first author acknowledges support from CONACyT through a scholarship to pursue graduate studies at the Information Technology Laboratory at CINVESTAV-IPN. The second author gratefully acknowledges support from CONACyT through project 90548. The third author is also affiliated to the UMI-LAFMIA 3175 CNRS. This research was partially funded by project number 51623 from "Fondo Mixto Conacyt - Gobierno del Estado de Tamaulipas". Finally, we would like to thank the Fondo Mixto de Fomento a la Investigación Científica y Tecnológica CONACyT - Gobierno del Estado de Tamaulipas for their support to publish this paper.
References

1. Bentley, P.J., Wakefield, J.P.: Finding Acceptable Solutions in the Pareto-Optimal Range using Multiobjective Genetic Algorithms. In: Chawdhry, P.K., Roy, R., Pant, R.K. (eds.) Soft Computing in Engineering Design and Manufacturing, Part 5, London, June 1997, pp. 231-240. Springer Verlag London Limited (1997) (Presented at the 2nd On-line World Conference on Soft Computing in Design and Manufacturing (WSC2))
2. Deb, K., Agrawal, S., Pratab, A., Meyarivan, T.: A Fast Elitist Non-Dominated Sorting Genetic Algorithm for Multi-Objective Optimization: NSGA-II. KanGAL report 200001, Indian Institute of Technology, Kanpur, India (2000)
3. Deb, K., Mohan, R.S., Mishra, S.K.: Towards a quick computation of well-spread Pareto-optimal solutions. In: Fonseca, C.M., Fleming, P.J., Zitzler, E., Deb, K., Thiele, L. (eds.) EMO 2003. LNCS, vol. 2632, pp. 222-236. Springer, Heidelberg (2003)
4. Deb, K., Thiele, L., Laumanns, M., Zitzler, E.: Scalable Test Problems for Evolutionary Multiobjective Optimization. In: Abraham, A., Jain, L., Goldberg, R. (eds.) Evolutionary Multiobjective Optimization. Theoretical Advances and Applications, pp. 105-145. Springer, USA (2005)
5. di Pierro, F., Khu, S.-T., Savić, D.A.: An Investigation on Preference Order Ranking Scheme for Multiobjective Evolutionary Optimization. IEEE Transactions on Evolutionary Computation 11(1), 17-45 (2007)
6. Drechsler, N., Drechsler, R., Becker, B.: Multi-objective optimisation based on relation favour. In: Zitzler, E., Deb, K., Thiele, L., Coello Coello, C.A., Corne, D.W. (eds.) EMO 2001. LNCS, vol. 1993, pp. 154-166. Springer, Heidelberg (2001)
7. Farina, M., Amato, P.: On the Optimal Solution Definition for Many-criteria Optimization Problems. In: Proceedings of the NAFIPS-FLINT International Conference 2002, Piscataway, New Jersey, June 2002, pp. 233-238. IEEE Service Center, Los Alamitos (2002)
8. Farina, M., Amato, P.: A fuzzy definition of "optimality" for many-criteria optimization problems. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans 34(3), 315-326 (2004)
9. Fonseca, C.M., Fleming, P.J.: Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization. In: Forrest, S. (ed.) Proceedings of the Fifth International Conference on Genetic Algorithms, San Mateo, California, pp. 416-423. University of Illinois at Urbana-Champaign, Morgan Kauffman Publishers (1993)
10. Hughes, E.J.: Evolutionary Many-Objective Optimisation: Many Once or One Many? In: 2005 IEEE Congress on Evolutionary Computation (CEC'2005), Edinburgh, Scotland, September 2005, vol. 1, pp. 222-227. IEEE Service Center, Los Alamitos (2005)
11. Hughes, E.J.: Fitness Assignment Methods for Many-Objective Problems. In: Knowles, J., Corne, D., Deb, K. (eds.) Multi-Objective Problem Solving from Nature: From Concepts to Applications, pp. 307-329. Springer, Berlin (2008)
12. Khare, V.R., Yao, X., Deb, K.: Performance scaling of multi-objective evolutionary algorithms. In: Fonseca, C.M., Fleming, P.J., Zitzler, E., Deb, K., Thiele, L. (eds.) EMO 2003. LNCS, vol. 2632, pp. 376-390. Springer, Heidelberg (2003)
13. Knowles, J.D., Corne, D.W.: Quantifying the effects of objective space dimension in evolutionary multiobjective optimization. In: Obayashi, S., Deb, K., Poloni, C., Hiroyasu, T., Murata, T. (eds.) EMO 2007. LNCS, vol. 4403, pp. 757-771. Springer, Heidelberg (2007)
14. Kokolo, I., Hajime, K., Shigenobu, K.: Failure of Pareto-based MOEAs: Does Non-dominated Really Mean Near to Optimal? In: Proceedings of the Congress on Evolutionary Computation 2001 (CEC 2001), Piscataway, New Jersey, May 2001, vol. 2, pp. 957-962. IEEE Service Center, Los Alamitos (2001)
15. Le, K., Landa-Silva, D.: Obtaining Better Non-Dominated Sets Using Volume Dominance. In: 2007 IEEE Congress on Evolutionary Computation (CEC'2007), Singapore, September 2007, pp. 3119-3126. IEEE Press, Los Alamitos (2007)
16. Pareto, V.: Cours d'Economie Politique. Droz, Genève (1896)
17. Purshouse, R.C., Fleming, P.J.: Evolutionary Multi-Objective Optimisation: An Exploratory Analysis. In: Proceedings of the 2003 Congress on Evolutionary Computation (CEC 2003), Canberra, Australia, December 2003, vol. 3, pp. 2066-2073. IEEE Press, Los Alamitos (2003)
18. Sato, H., Aguirre, H.E., Tanaka, K.: Controlling dominance area of solutions and its impact on the performance of MOEAs. In: Obayashi, S., Deb, K., Poloni, C., Hiroyasu, T., Murata, T. (eds.) EMO 2007. LNCS, vol. 4403, pp. 5-20. Springer, Heidelberg (2007)
19. Wagner, T., Beume, N., Naujoks, B.: Pareto-, aggregation-, and indicator-based methods in many-objective optimization. In: Obayashi, S., Deb, K., Poloni, C., Hiroyasu, T., Murata, T. (eds.) EMO 2007. LNCS, vol. 4403, pp. 742-756. Springer, Heidelberg (2007)
20. Zou, X., Chen, Y., Liu, M., Kang, L.: A New Evolutionary Algorithm for Solving Many-Objective Optimization Problems. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38(5), 1402-1412 (2008)
Evolving Efficient List Search Algorithms Kfir Wolfson and Moshe Sipper Dept. of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
Abstract. We peruse the idea of algorithmic design through Darwinian evolution, focusing on the problem of evolving list search algorithms. Specifically, we employ genetic programming (GP) to evolve iterative algorithms for searching for a given key in an array of integers. Our judicious design of an evolutionary language renders the evolution of linear-time search algorithms easy. We then turn to the far more difficult problem of logarithmic-time search, and show that our evolutionary system successfully handles this case. Subsequently, because our setup might be perceived as being geared towards the emergence of binary search, we generalize our genomic representation, allowing evolution to assemble its own useful functions via the mechanism of automatically defined functions (ADFs). We show that our approach routinely and repeatedly evolves general and correct efficient algorithms.
1 Introduction
One of the most basic tasks a computer scientist faces is that of designing an algorithm to solve a given problem. In his book Algorithmics, Harel [1] defines the subject matter as "the area of human study, knowledge, and expertise that concerns algorithms." Indeed, the subtitle of Harel's book—"The Spirit of Computing"—evidences the importance of algorithm design in computer science. While simple problems readily yield to algorithmic solutions, many, if not most, problems of interest are hard, and finding an algorithm to solve them is an arduous task. Compounding this task is our desire not only to find a correct algorithm but also an efficient one, with efficiency being measured in terms of resources to be used with discretion, such as time, memory, and network traffic.

Evolutionary algorithms have been applied in recent years to numerous problems from diverse domains. However, their application within the field of software engineering in general, and algorithmic design in particular, is still quite limited. This dearth of research might be partly attributed to the complexity of algorithms—and the even greater complexity of their design.

Our aim in this paper is to introduce the notion of algorithmic design through Darwinian evolution. To find out whether this approach has any merit at all, we begin with a benchmark case, one familiar to any freshman computer-science student: searching for a given key in an array of elements. A solution to this
Kfir Wolfson was partially supported by the Frankel Center for Computer Science at Ben-Gurion University.
problem is known as a list search algorithm (implying a one-dimensional array, or a linked-list of elements), and can be either iterative or recursive in nature. Herein, we evolve iterative algorithms for arrays of integers (rather than lists), and refer to them as array search algorithms, or simply search algorithms. We ask two questions:

1. Can evolution be applied to finding a search algorithm?
2. Can evolution be applied to finding an efficient search algorithm?

Employing genetic programming (GP) to find algorithmic innovations, our findings show that the answer to both questions is affirmative. Indeed, our judicious design of an evolutionary language renders the answer to the first question quite straightforward: A search algorithm that operates in linear time is easily evolved. We then turn to finding a more efficient algorithm for sorted arrays, concentrating on execution time as a measure of efficiency. We show that a logarithmic-time search algorithm can be evolved, and proceed to analyze its workings.

This paper is organized as follows: In the next section we describe our setup for evolving search algorithms, followed by results in Section 3. A more general representation involving automatically defined functions (ADFs) is presented in Section 4. Related work on program evolution is described in Section 5, with concluding remarks following in Section 6.
2 The Evolutionary Setup
We use Koza-style GP [2], in which a population of individuals evolves. An individual is represented by an ensemble of LISP expressions, each composed of functions and terminals. Each individual represents a computer program—or algorithm—for searching an element in an array. Since most common computer languages are typed, we opted for strongly-typed genetic programming [3], which may ultimately help in evolving more understandable algorithms. We used the ECJ package to conduct the experiments [4].
2.1 Representation
We designed a representation that has proven successful in the evolution of both linear and sublinear search algorithms. The genotypic function set is detailed in Table 1. In order to evaluate an individual, a phenotype is constructed by plugging the genotypic code into the template given in Fig. 1. The genotype thus represents the body of the for loop, the hardest part to develop in the algorithm, while the incorporating phenotype adds the necessary programmatic paraphernalia. As can be seen in Fig. 1, the individual's genotypic code is executed iterations times for an input array of size n, with a global variable ITER incremented after each iteration. For the linear case we set iterations to n, whereas for the sublinear case iterations is set to log2 n. This upper limit on the number of loop iterations is the only difference between the evolution of the two cases and can be considered as part of the fitness function (described below), specifically, the differentiating part.
Table 1. Terminal and function sets for the evolution of search algorithms (both linear and sublinear). int refers to Integer, bool to Boolean. Each entry shows name (argument types → return type): description.

TERMINALS
  INDEX (none → int): Current pointer into array
  Array[INDEX] (none → int): Element at location INDEX in the input array. If INDEX is not in [0, n − 1], for array length n, 0 is returned
  KEY (none → int): The element we are searching for
  ITER (none → int): Current iteration number
  M0, M1 (none → int): Getters to global variables, at the algorithm's disposal
  [M0+M1]/2 (none → int): Average of M0 and M1 (truncated to nearest integer)
  NOP (none → void): Does nothing
  TRUE, FALSE (none → bool): Boolean terminals

FUNCTIONS
  INDEX:= (int → void): Sets the value of variable INDEX to the value returned by the argument
  M0:=, M1:= (int → void): Setters to global variables
  >, <, = (int, int → bool): Returns true if the first argument is greater than, less than, or equal to the second argument, respectively; else returns false
  PROGN2 (void, void → void): Sequence: execute first argument, then execute second argument
  If (bool, void, void → void): Conditional branching: if the first argument evaluates to true, execute second argument, otherwise execute third argument
We decided not to add an early-termination condition, which exits the loop when the index of the searched-for key is found, in order to render the problem harder for evolution: The evolving search algorithm should learn to retain the correct index, if the key is located before the loop terminates.

The terminal and function sets include read access to the variable ITER and the searched-for KEY, and read/write access to a global variable INDEX, initialized to 0. INDEX is used to access array elements through the Array[INDEX] terminal, and the value of INDEX after the final iteration is taken as the return value of the run. To discourage INDEX being set to values outside the array bounds ([0, n − 1], since we use Java), Array[INDEX] returns 0 if INDEX is out of bounds. Note that the key 0 does not appear in any input array because all the keys are positive, as described below.

The evolving search algorithm is provided with read/write access to two global variables, M0 and M1, which the algorithm may use as it (or, more precisely, evolution) sees fit. The variables are initialized to 0 and n − 1, respectively, which affords the individual potential knowledge of the array length. This information should prove useful in sublinear solutions.
public static int search(int[] arr, int KEY) {
    int n = arr.length;
    int M0 = 0;
    int M1 = n - 1;
    int INDEX = 0;
    for (int ITER = 0; ITER < iterations; ITER++) {
        // -> GENOTYPE INSERTED HERE <-
    }
    return INDEX;
}
Fig. 1. Evolution of search: The evolving genotype, composed of elements delineated in Table 1, is incorporated into the above phenotypic JAVA template. The variable iterations is set to n for evolving linear search algorithms, and is set to log2 n for evolving sublinear algorithms.
The [M0+M1]/2 terminal embodies human intuition about the problem, to facilitate the solution, which, nonetheless, still requires crucial algorithmic insight—to be derived via evolution. In Section 4 we re-examine this terminal, repealing it altogether.

The remaining functions and terminals include standard comparative predicates (<, >, =), conditional branching (If), a sequence operator (PROGN2), the Boolean terminals TRUE and FALSE, and a simple NOP (no-operation) to enable, e.g., the evolution of an if without an else part.

Note that the evolving algorithms can inherently deal with keys not in the array, by wrapping the search method in a method that returns an illegal index value (e.g., -1) if the array does not contain the key at the returned index. Thus, the algorithms will not be trained or tested on such inputs. We also mention that, using our function and terminal sets (specifically, ITER being read-only, and not defining a nested-loop function) and limiting the number of iterations of the for loop, we avoid generating non-terminating phenotypes.
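Such a wrapper might look as follows (our own illustrative sketch; it assumes an evolved search method with the signature of Fig. 1 is in scope):

// Hypothetical wrapper around an evolved search(arr, KEY) method:
// it verifies the returned index and maps failures to -1.
static int safeSearch(int[] arr, int key) {
    int index = search(arr, key);       // evolved method from Fig. 1
    if (index >= 0 && index < arr.length && arr[index] == key)
        return index;
    return -1;                          // key not present in arr
}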
2.2 Fitness Evaluation and Run Parameters
Fitness is defined similarly both for the evolution of linear and sublinear algorithms. The basic idea is to present the evolving individual with many random input arrays, have it run and search keys in them, and reward the individual for the closeness of the outputs to perfect answers. It is important to note that fitness is based not on an all-or-nothing quality (key found or not), but on gradations of “finding” quality—as defined below. Specifically, to compute fitness, each individual is run over a set of training cases, each case being an array to be searched. The set of training cases is fixed for all individuals per generation, and is randomly generated anew every generation, as we found this encouraged more general solutions. Let minN and maxN be the predefined minimal and maximal training-case array lengths, and let N = maxN − minN + 1. We generate N arrays of all N possible sizes in the range [minN, maxN ], both to induce variety during evolution and also to
render the solution general, able to function correctly on as many different array lengths as possible. In the linear case, an array of length n ∈ [minN, maxN] holds a random permutation of integers in the range [1000, 1000 + n − 1]. In the sublinear case, an array of length n ∈ [minN, maxN] holds a sorted list of random integers in the range [n, 100n]. Note that the key range is completely disjoint from the index range, to discourage "cheating" (e.g., in a sorted array, a program might evolve to simply return the key value, which happens to equal the index value). All n keys are searched for by an (individual) phenotypic program in the population, using the search(arr,KEY) method given in Fig. 1.

In order to define fitness, we first provide a number of definitions. The error per single key search is defined as the absolute distance between the correct index of KEY in the array and the index returned by search(arr,KEY):

  error(arr, key, correct) = |correct − search(arr, key)|.

An error of zero means that the search was successful. All generated arrays contain unique elements, to avoid ambiguity in the error definition. Note that the index returned by the search function may be out of array bounds, and as such suffers from a larger error value—another discouragement of illegal index values. Let calls be the total number of search calls, over all N training cases, i.e., the total number of keys searched for:
  calls = Σ_{n=minN}^{maxN} n = maxN(maxN + 1)/2 − minN(minN − 1)/2.
The average error per search call is calculated as follows:

  avgerr = (1/calls) · Σ_{t=1}^{N} Σ_{i=0}^{n_t − 1} error(arr_t, arr_t[i], i),
where arr_t is the t-th array of the N randomly generated arrays, and n_t is its length (n_t = minN + t − 1). (Note: Java array indexes begin at 0.) Note that the order of calls to search(arr,KEY) does not affect their outputs, so it is safe to execute the individual program for consecutive indexes in the array without bias. We define a hit as the finding of the precise location of KEY, i.e., error(arr, key, correct) = 0. The total number of hits is thus given by:
  hits = Σ_{t=1}^{N} Σ_{i=0}^{n_t − 1} max(0, 1 − error(arr_t, arr_t[i], i)).
Finally, the fitness value of an individual is defined as the average error per search call, with a 0.5% bonus reduction for every 1% of correct hits:

  fitness = avgerr × (1 − 0.5 × hits/calls).
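The computation just defined can be sketched as follows (an illustrative rendering, not the actual ECJ evaluator; search denotes the individual's evolved method from Fig. 1, and each key of each training array is searched for exactly once):

// Fitness of one individual over the per-generation training arrays.
static double fitness(int[][] trainingArrays) {
    long calls = 0, hits = 0;
    double errorSum = 0;
    for (int[] arr : trainingArrays) {
        for (int i = 0; i < arr.length; i++) {     // search every key
            int err = Math.abs(i - search(arr, arr[i]));
            errorSum += err;
            if (err == 0) hits++;                  // exact location found
            calls++;
        }
    }
    double avgerr = errorSum / calls;
    return avgerr * (1.0 - 0.5 * hits / (double) calls);
}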
For example, if an individual scored 300 hits in 1000 search calls, its fitness will be the average error per call, reduced by 15%. An evolving program attains a perfect raw fitness value of zero if every test is passed, i.e., for every searched KEY the correct index is returned. The bonus hits component was added to encourage perfect answers, since we felt that an individual with a higher overall error could be considered better than one with a lower overall error, if the former's hit count is higher. We also noted that the hits component increased fitness variation in the population.

The best solution of each run was subjected to a stringent generality test, by running it on random arrays of all lengths in the range [2, 5000] (for linear search the range was smaller, [2, 500], given the considerably longer runtime of such a search—and of the generality test thereof). Kinnear [5] noted that "For any algorithm... that operates on an infinite domain of data, no amount of testing can ever establish generality. Testing can only increase confidence." To increase our confidence in the solutions evolved we added analysis by hand to the generality test. Though some solutions were quite large, the intuition behind the algorithmic idea could be gleaned by focusing on the ADF code (Section 4).

Array-length parameters were set to minN = 2 and maxN = 10 for the linear case, to decrease evaluation time, and minN = 2 and maxN = 100 for the sublinear case, as a trade-off between generality and performance. (When we used lower boundary values, evolved solutions did not prove general. Higher boundary values yielded general solutions, at the expense of increasing evaluation time by a quadratic factor.) The GP run operators and parameters are summarized in Table 2.
Table 2. GP parameters

Objective: Find a key in a given input array of unsorted (linear-time case) or sorted (sublinear-time case) positive integers in a prefixed number of iterations
Function and Terminal sets: As detailed in Table 1
Fitness: Average error per search call on training set, with bonus reduction for hits (as detailed in Section 2.2)
Selection: Tournament of size 7, elitism of size 2, generational
Population Size: 250
Initial Population: Created using ramped-half-and-half, with a maximum depth of 6
Max tree depth: 10
Generations: 5000 (or until individual with perfect fitness emerges)
Crossover: Standard subtree exchange
Mutation: Standard grow (generate new subtree at chosen node)
Node Selection: Nodes chosen for crossover or mutation are function nodes with probability 0.9 and terminal nodes with probability 0.1
Genetic Operator Probabilities: On the selected parent individual: with probability 0.1 copy to next generation (reproduction); with probability 0.05 mutate individual; with probability 0.85, select a second parent and cross over trees
3 Results

3.1 Linear
It turned out that evolving a linear-time search algorithm was quite easy with the function and terminal sets we designed. We performed 50 runs, 46 of which (92%) produced solutions with a perfect fitness of 0, also passing with flying colors the generality test, exhibiting no errors up to length 500. In fact, our representation rendered the problem easy enough for a perfect individual to appear in the randomly generated generation 0 in three of the runs. An example of an evolved solution is shown in Fig. 2, along with the equivalent Java code.

When plugged into the template of Fig. 1, we observe a linear-time search algorithm that proceeds as follows: As long as KEY is not in location INDEX, INDEX is incremented by one along with ITER. From the index wherein the key is found (i.e., Array[INDEX] = KEY), INDEX is no longer modified, preserving the correct value until the end of the algorithm's execution. An irrelevant setting of M1 to [M0+M1]/2 takes place, but does not have any effect on the returned index.

(a) (If (= Array[INDEX] KEY)
        (M1:= [M0+M1]/2)
        (INDEX:= ITER))

(b) if (arr[INDEX] == KEY)
        M1 = (M0 + M1) / 2;
    else
        INDEX = ITER;
Fig. 2. An evolved linear-time search algorithm. (a) LISP genotype. (b) Equivalent JAVA code, which, when plugged into the full-program template (Fig. 1), forms the complete algorithm. (Note: the actual code contains an additional check when executing the arr[INDEX] instruction; if the value of INDEX is within [0, arr.length − 1] return arr[INDEX], otherwise, return 0.)
3.2 Sublinear
The sublinear search problem proved (unsurprisingly) a greater challenge for evolution. We performed 50 runs, 35 of which (70%) produced perfect solutions, exhibiting no errors up to length 5000. The solutions emerged in generations 22 to 3632, and their sizes varied between 42 and 244 nodes. Seven runs (14%) produced near-perfect solutions, which failed on a single key in the input arrays, usually either the first or last key (scoring 99.96% hits on the generality test). A simplified version of one of the evolved solutions is given in Fig. 3, along with the equivalent Java code. The solution was simplified by hand from a tree of 50 nodes down to 14, and it turns out to be an implementation of the well-known binary search.
(a) (PROGN2
        (INDEX:= [M0+M1]/2)
        (if (> KEY Array[INDEX])
            (PROGN2 (M0:= [M0+M1]/2) (INDEX:= M1))
            (M1:= [M0+M1]/2)))

(b) INDEX = (M0 + M1) / 2;
    if (KEY > arr[INDEX]) {
        M0 = (M0 + M1) / 2;
        INDEX = M1;
    } else
        M1 = (M0 + M1) / 2;
Fig. 3. An evolved sublinear-time search algorithm (simplified). Evolved solution reveals itself as a form of binary search. (a) LISP genotype. (b) Equivalent JAVA code, which, when plugged into the full-program template (Fig. 1), forms the complete algorithm.
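For readers who wish to run the evolved solution, the following self-contained transcription plugs the simplified body of Fig. 3 into the template of Fig. 1 (our own rendering; the iteration bound ⌈log2 n⌉ is our concrete reading of the log2 n limit stated in Section 2.1):

public class EvolvedSearch {
    public static int search(int[] arr, int KEY) {
        int n = arr.length;
        int M0 = 0, M1 = n - 1, INDEX = 0;
        int iterations = (int) Math.ceil(Math.log(n) / Math.log(2));
        for (int ITER = 0; ITER < iterations; ITER++) {
            // body evolved by GP (Fig. 3b): a variant of binary search
            INDEX = (M0 + M1) / 2;
            if (KEY > arr[INDEX]) {
                M0 = (M0 + M1) / 2;
                INDEX = M1;
            } else {
                M1 = (M0 + M1) / 2;
            }
        }
        return INDEX;
    }

    public static void main(String[] args) {
        int[] sorted = {2, 5, 8, 13, 21, 34, 55};
        System.out.println(search(sorted, 13));  // prints 3, the index of 13
    }
}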
4 Less Knowledge—More Automation

Re-examining the representation used until now (Table 1), we note that most terminals and functions are either general-purpose ones (e.g., conditionals and predicates), or ones that represent a very basic intuition about the problem to be solved (e.g., the straightforward need to access INDEX and KEY). However, one terminal—[M0+M1]/2—stands out, and might be regarded as our "intervening" too much with the course of evolution by providing insight born of our familiarity with the solution. In this section we remove this terminal and augment the evolutionary setup with the mechanism of automatically defined functions (ADFs) [6]. (Note on terminology: We use the term main tree rather than result-producing branch (RPB) [6], since the tree does not actually produce a result: The behavior of the program is mainly determined by the side effects of functions in the tree, e.g., INDEX:= changes the value of INDEX.)

Specifically, the terminal [M0+M1]/2 was removed from the terminal set, with the rest of the representation remaining unchanged from Table 1. We added an ADF—ADF0—affording evolution the means to define a simple mathematical function, able to use the variables M0 and M1. The evolved function receives no arguments, and has at its disposal arithmetic operations, integer constants, and the values of the global variables, as detailed in Table 3.

The evolutionary setup was modified to incorporate the addition of ADFs. The main program tree and the ADF tree could not be mixed because the function sets are different, so crossover was performed per tree type (main or ADF). We noticed that mutation performed better than crossover, especially in the ADF tree. We increased the array-length parameters to minN = 200 and maxN = 300, upon observing a tendency for non-general solutions to emerge with arrays shorter than 200 in the training set. The rest of the GP parameters are summarized in Table 4.
Table 3. Terminal and function sets for the automatically defined function ADF0. Each entry shows name (argument types → return type): description.

TERMINALS
  M0, M1 (none → int): Getters to global variables
  0, 1, 2 (none → int): Integer constants

FUNCTIONS
  +, −, × (int, int → int): Standard arithmetic functions, returning the addition, subtraction, and multiplication of two integers
  / (int, int → int): Protected integer division. Returns the first argument divided by the second, truncated to integer. If the second argument is 0, returns 1

Table 4. GP parameters for ADF runs. (Parameters not shown are identical to those of Table 2.)

Function and Terminal sets: As detailed above in this section
Initial Population: Created using ramped-half-and-half, with a maximum depth of 6 for main tree and 2 for ADF
Max tree depth: main tree: 10; ADF tree: 4
Crossover: Standard subtree exchange from same tree (main or ADF) in both parents
Genetic Operator Probabilities: On the selected parent individual: with probability 0.1 copy to next generation (reproduction); with probability 0.25 mutate individual's main tree; with probability 0.4 mutate individual's ADF tree; with probability 0.2, select a second parent and cross over main trees; with probability 0.05, select a second parent and cross over ADF trees
The sublinear search problem with an ADF naturally proved more difficult than with the [M0+M1]/2 terminal. We performed 50 runs with ADFs, 12 of which (24%) produced perfect solutions. The solutions emerged in generations 54 to 4557, and their sizes varied between 53 and 244 nodes, counting the sum total of nodes in both trees. Analysis revealed all perfect solutions to be variations of binary search. The algorithmic idea can be deduced by inspecting the ADFs, all eleven of which turned out to be equivalent to one of the following: (M0 + M1)/2, (M0 + M1 + 1)/2, or (M0/2 + (M1 + 1)/2) (all fractions truncated); to wit, they are reminiscent of the original [M0+M1]/2 terminal we dropped. We then simplified the main tree of some individuals and analyzed them. A simplified version of one of the evolved solutions is given in Fig. 4, along with the equivalent Java code. The solution was simplified by hand from 58 nodes down to 26.
(a) (PROGN2
        (PROGN2
            (if (< Array[INDEX] KEY) (INDEX:= ADF0) NOP)
            (if (< Array[INDEX] KEY) (M0:= INDEX) (M1:= INDEX)))
        (INDEX:= ADF0))

    ADF0: (/ (+ (+ 1 M0) M1) 2)

(b) if (arr[INDEX] < KEY)
        INDEX = ((1 + M0) + M1) / 2;
    if (arr[INDEX] < KEY)
        M0 = INDEX;
    else
        M1 = INDEX;
    INDEX = ((1 + M0) + M1) / 2;
Fig. 4. An evolved sublinear-time search algorithm with ADF (simplified). Evolved solution is another variation of binary search. (a) LISP genotype. (b) Equivalent JAVA code, which, when plugged into the full-program template (Fig. 1), forms the complete algorithm.
5 Related Work

We performed an extensive literature search, finding no previous work on evolving list search algorithms, for either arrays or lists of elements. The "closest" works found were ones dealing with the evolution of sorting algorithms, a problem that can be perceived as being loosely related to array search. Note that both problems share the property that a solution has to be 100% correct to be useful. Like search algorithms, the problem of rearranging elements in ascending order has been a subject of intensive study [7]. Most works to date were able to evolve O(n2) sorting algorithms, and only one was able to reach into the more efficient O(n log n) class, albeit with a highly specific setup.

The problem of evolving a sorting algorithm was first tackled by Kinnear [5,8], who was able to evolve solutions equivalent to the O(n2) bubble-sort algorithm. Kinnear compared different function sets, and showed that the difficulty in evolving a solution increases as the functions become less problem-specific. He also noted that adding a parsimony factor to the fitness function not only decreased solution size, but also increased the likelihood of evolving a general algorithm.

The most recent work on evolving sorting algorithms is that of Withall et al. [9]. They developed a new GP representation, comprising fixed-length blocks of genes, representing single program statements. A number of list algorithms, including sorting, were evolved using problem-specific functions for each algorithm. A for loop function was defined, along with a double function, which incorporated a highly specific double-for nested loop. With these specialized structures Withall et al. evolved an O(n2) bubble-sort algorithm.

An O(n log n) solution was evolved by Agapitos et al. [10,11]. The evolutionary setup was based on their object-oriented genetic programming system. In [10] the authors defined two configurations, one with a hand-tailored filter method, the second with a static ADF. The former was used to evolve an O(n log n) solution, and the latter produced an O(n2) algorithm. Runtime was evaluated empirically as the number of method invocations. In [11] an Evolvable Class was defined, which included between one and four Evolvable Methods that could call each other. This setup increased the search space and produced O(n2) modular recursive solutions to the sorting problem. Agapitos et al. noted that mutation performed better than crossover in their problem domain, a conclusion we also reached regarding our own domain of evolving search algorithms with ADFs (Section 4). Other interesting works on evolving sorting algorithms include [12,13,14,15], not detailed herein due to space limitations.
Another related line of research is that of evolving iterative programs. Koza [16] defined automatically defined iterations (ADIs) and Kirshenbaum [17] defined an iteration schema for GP. These constructs iterate over an array or a list of elements, executing their body for each element, and thus cannot be used for sublinear search, as their inherent runtime is Ω(n). Many loop constructs have been suggested, e.g., Koza's automatically defined loops (ADLs) [16], and the loops used to evolve the sorting algorithms mentioned above. But, as opposed to the research on sorting algorithms, herein we assume that an external for loop exists, for the purpose of running our evolving solutions. In the sorting problem, the O(n2) solutions require nested loops, which the language must support; the O(n log n) solution was developed in a language supporting recursion. In linear and sublinear search algorithms, there will always be a single loop (in non-recursive solutions), and the heart of the algorithm is the body of the loop (which we have evolved in this paper).

In summary, our literature survey has revealed several related interesting works on the evolution of sorting algorithms and on various forms of evolving array iteration. There seems to be no work on the evolution of array search algorithms.
6 Concluding Remarks and Future Work
We showed that the evolutionary design of efficient list search algorithms is possible. With a high-level fitness function encouraging correct answers to the search calls within a given number of iterations, the evolutionary process produced correct linear and sublinear search algorithms. Knuth [7] observed that “Although the basic idea of binary search is comparatively straightforward, the details can be somewhat tricky, and many good programmers have done it wrong the first few times they tried.” Evolution produced many variations of correct binary search, and some nearly-correct solutions erring on a mere handful of extreme cases (which one might expect, according to Knuth). Our results suggest that, in general, algorithms can be evolved where needed to solve hard problems. Our work opens up a number of possible avenues for future research. We would like to explore the coevolution of individual main trees and ADFs, as in the work of Ahluwalia [18]. Our phenotypes are not Turing complete (TC) [19], e.g., because they always halt; it would be interesting to use a Turing-complete GP system to evolve search algorithms. Some of the evolved solutions are bloated, and it would be interesting to see how adding parsimony pressure affects evolution. We also plan to delve into related areas, such as sorting algorithms, and show evolutionary innovation in action. Ultimately, we wish to find an algorithmic innovation not yet invented by humans.
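To illustrate Knuth's point, here is a small Python sketch of textbook binary search (our illustration, not one of the evolved solutions), annotated with the details that hand-written versions classically get wrong:

def binary_search(lst, key):
    """Return an index of key in sorted list lst, or -1 if absent."""
    low, high = 0, len(lst) - 1          # high must be len(lst)-1 for inclusive bounds
    while low <= high:                   # using '<' here silently skips 1-element ranges
        mid = low + (high - low) // 2    # (low + high) // 2 overflows in fixed-width ints
        if lst[mid] == key:
            return mid
        elif lst[mid] < key:
            low = mid + 1                # omitting the '+ 1' yields an infinite loop
        else:
            high = mid - 1
    return -1                            # extreme cases: empty list, absent key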
Acknowledgment

We are grateful for the many helpful remarks of the anonymous referees. We also thank Amit Benbassat for pointing us in an interesting direction.
References

1. Harel, D.: Algorithmics: The Spirit of Computing. Addison-Wesley, Reading (1992)
2. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)
3. Montana, D.J.: Strongly typed genetic programming. Evolutionary Computation 3(2), 199–230 (1995)
4. Luke, S., Panait, L.: A Java-based evolutionary computation research system (March 2004), http://cs.gmu.edu/~eclab/projects/ecj
5. Kinnear Jr., K.E.: Evolving a sort: Lessons in genetic programming. In: Proceedings of the 1993 International Conference on Neural Networks, San Francisco, USA, March 28–April 1, vol. 2, pp. 881–888. IEEE Press, Los Alamitos (1993)
6. Koza, J.R.: Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge (1994)
7. Knuth, D.E.: Sorting and Searching. The Art of Computer Programming, vol. 3. Addison-Wesley, Reading (1975)
8. Kinnear Jr., K.E.: Generality and difficulty in genetic programming: Evolving a sort. In: Proceedings of the 5th International Conference on Genetic Algorithms, pp. 287–294. Morgan Kaufmann, San Francisco (1993)
9. Withall, M.S., Hinde, C.J., Stone, R.G.: An improved representation for evolving programs. Genetic Programming and Evolvable Machines 10(1), 37–70 (2009)
10. Agapitos, A., Lucas, S.M.: Evolving efficient recursive sorting algorithms. In: Proceedings of the 2006 IEEE Congress on Evolutionary Computation, Vancouver, pp. 9227–9234. IEEE Press, Los Alamitos (2006)
11. Agapitos, A., Lucas, S.M.: Evolving modular recursive sorting algorithms. In: Ebner, M., O'Neill, M., Ekárt, A., Vanneschi, L., Esparcia-Alcázar, A.I. (eds.) EuroGP 2007. LNCS, vol. 4445, pp. 301–310. Springer, Heidelberg (2007)
12. O'Reilly, U.M., Oppacher, F.: A comparative analysis of GP. In: Angeline, P.J., Kinnear Jr., K.E. (eds.) Advances in Genetic Programming, vol. 2, pp. 23–44. MIT Press, Cambridge (1996)
13. Abbott, R., Guo, J., Parviz, B.: Guided genetic programming. In: The 2003 International Conference on Machine Learning; Models, Technologies and Applications (MLMTA'03), Las Vegas, June 23–26. CSREA Press (2003)
14. Spector, L., Klein, J., Keijzer, M.: The Push3 execution stack and the evolution of control. In: GECCO '05: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, pp. 1689–1696. ACM, New York (2005)
15. Shirakawa, S., Nagao, T.: Evolution of sorting algorithm using graph structured program evolution. In: SMC, pp. 1256–1261. IEEE, Los Alamitos (2007)
16. Koza, J.R., Andre, D., Bennett III, F.H., Keane, M.: Genetic Programming 3: Darwinian Invention and Problem Solving. Morgan Kaufmann, San Francisco (1999)
17. Kirshenbaum, E.: Iteration over vectors in genetic programming. Technical Report HPL-2001-327, HP Laboratories (December 2001)
18. Ahluwalia, M., Bull, L.: Coevolving functions in genetic programming. Journal of Systems Architecture 47(7), 573–585 (2001)
19. Woodward, J.: Evolving Turing complete representations. In: Sarker, R., et al. (eds.) Proceedings of the 2003 Congress on Evolutionary Computation (CEC 2003), Canberra, December 8–12, pp. 830–837. IEEE Press, Los Alamitos (2003)
Semantic Similarity Based Crossover in GP: The Case for Real-Valued Function Regression

Nguyen Quang Uy1, Michael O'Neill1, Nguyen Xuan Hoai2, Bob McKay2, and Edgar Galván-López1

1 Natural Computing Research & Applications Group, University College Dublin, Ireland
2 School of Computer Science and Engineering, Seoul National University, Korea
[email protected]
Abstract. In this paper we propose a new method for implementing the crossover operator in Genetic Programming (GP), called Semantic Similarity based Crossover (SSC). This new operator is inspired by Semantic Aware Crossover (SAC) [20]. SSC extends SAC by using additional semantic information to control the change in the semantics of individuals during the evolutionary process. The new crossover operator is tested on a family of symbolic regression problems and compared with SAC as well as Standard Crossover (SC). The results from the experiments show that the change of semantics (fitness) with the new SSC is smoother than with SAC and SC. This leads to performance improvements in terms of the percentage of successful runs and mean best fitness.
1 Introduction

Genetic Programming (GP) is an evolutionary algorithm, inspired by biological evolution, for finding solutions to a user-defined task. Programs are usually represented in a syntactic formalism such as s-expression trees [14], linear sequences of instructions, grammars, or graphs [18]. The genetic operators in such GP systems are usually designed to ensure the syntactic closure property, i.e., to produce syntactically valid children from any syntactically valid parent(s). Using such purely syntactic genetic operators, GP search is conducted on the syntactic space of programs, with the only semantic guidance coming from an individual's fitness. Although GP has been shown to be effective in evolving programs for different problems using such (finite) behaviour-based semantic guidance and purely syntactic genetic operators, this practice is somewhat unusual from a real programmer's perspective. Computer programs are constrained not just by syntax but also by semantics, and, as normal practice, any change to a program should pay close attention to the resulting change in its semantics. To amend this deficiency in GP, arising from the lack of semantic guidance on genetic operators, Uy et al. [20] proposed a semantic-based crossover operator for GP, called Semantic Aware Crossover (SAC). The results reported in [20] show that using semantic information helps to improve the performance of GP, in terms of the number of successful runs, on real-valued symbolic regression problems. This paper extends the ideas presented in [20]. Our new operator, called Semantic Similarity based Crossover (SSC), is an improvement over SAC through the inclusion
of additional semantic information to control the change of semantics of individuals, by only allowing the swap of two subtrees that are semantically similar and yet semantically different: effectively, an upper and a lower bound on semantic difference are used to determine whether or not subtrees can be exchanged during a crossover event. By doing this, we expect the change in fitness of an individual to be less destructive. This property has been shown to be very important in GP [19]. The paper is organised as follows. In the next section, we review related work on semantic-based crossovers in GP. In Section 3 we describe our new crossover operator and explain how it differs from the crossover operator proposed in [20]. The experiments on the new crossover operator are described in Section 4. The results of the experiments are then given and discussed in Section 5. Section 6 concludes the paper and highlights some potential future extensions of this work.
2 Previous Works

There have been a number of works in the literature on how to incorporate semantic information into GP. There are at least three ways in which semantics can be represented, extracted and used to guide GP: (a) using grammars [21,2,3], (b) using formal methods [9,10,11,13,12], and (c) based on GP tree-like structures [1,16,20]. In the first category, the most popular formalism used to incorporate semantic information into GP is attribute grammars. By using an attribute grammar and adding some attributes to individuals, useful semantic information can be obtained from individuals during the evolutionary process. This information can then be used to remove bad individuals from the population, as reported in [3], or to prevent the generation of semantically invalid individuals, as in [21,2]. The attributes used to represent semantics are problem dependent, and it is not always easy to design them for each problem. Within the second category, Johnson has advocated using formal methods as a way of adding semantic information to GP [9,10,11]. In these methods, the semantic information extracted using formal methods (e.g., abstract interpretation and model checking) is used to measure individuals' fitness in problems where a traditional sample-point-based fitness measure is difficult to apply. Katz and his co-workers used model checking with GP to solve the mutual exclusion problem [13,12]; in these works, semantics is also used to calculate the fitness of individuals. Finally, with expression trees, semantic information has been incorporated mainly by modifying the crossover operator. Early work focused on the syntax and structure of individuals. In [8], the authors modified the crossover operator to take into account the depth of trees; other work modified crossover taking into account the shape of individuals [17]. More recently, context has been used as extra information for determining GP crossover points [6,15]. However, all these methods incur an extra time cost for evaluating the context of all subtrees within each individual in the population. In [1], the authors investigated the effects of directly using semantic information to guide GP crossover on Boolean domains. The main idea was to check the semantic equivalence between offspring and parents by transforming the trees to Reduced Ordered Binary Decision Diagrams (ROBDDs): two trees have the same semantics if and only if they reduce to the same ROBDD.
The semantic equivalence check is then used to determine which of the individuals participating in crossover will be copied to the next generation: if the offspring are semantically equivalent to their parents, the parents are copied into the new population. By doing so, the authors argue, there is an increase in the semantic diversity of the evolving population and, as a consequence, an improvement in GP performance. Uy et al. [20] proposed a new crossover operator (SAC) based on the semantic equivalence checking of subtrees. The approach was tested on a family of real-valued symbolic regression problems (e.g., polynomial functions), and the empirical results showed that SAC improves GP performance. SAC differs from [1] in two ways. Firstly, the test domain is real-valued rather than Boolean; for real-valued domains, checking semantic equivalence by reduction to common ROBDDs is no longer possible. Secondly, the crossover operator is guided not by the semantics of the whole program tree, but by that of subtrees. This is inspired by recent work presented in [16] on calculating subtree semantics.
3 Semantic Similarity Based Crossover

Semantic Similarity based Crossover (SSC), presented in this section, is an extension of SAC [20]. In SAC, the semantic equivalence of two subtrees is determined by comparing them on a set of random points in the domain. If the outputs of the two subtrees on this set are close enough (subject to a parameter called the semantic sensitivity), they are designated as semantically equivalent. This information is then used to guide crossover by preventing the swap of two equivalent subtrees in each crossover operation. The method proposed in this paper differs from SAC in two ways. Firstly, the concept of semantically equivalent subtrees is replaced by the concept of semantically similar subtrees. As in [20], the similarity of two subtrees is checked by comparing them on a set of random points in the domain: if the difference between their outputs on this set lies within an interval, they are considered semantically similar. Assuming we need to check whether two subtrees St1 and St2 are similar, the test is: if α < SD(St1, St2) < β then St1 and St2 are semantically similar, where SD(St1, St2) is the semantic difference between St1 and St2 measured on the set of sample points (e.g., the mean absolute difference of their outputs), and α and β are the lower and upper bound semantic sensitivities. Secondly, if the two subtrees chosen for crossover are designated
Algorithm 1. Semantic Similarity based Crossover

  select Parent 1, P1;
  select Parent 2, P2;
  Count = 0;
  while Count < Max_Attempt do
      choose a random crossover point, Subtree1, in P1;
      choose a random crossover point, Subtree2, in P2;
      if Subtree1 is similar to Subtree2 then
          execute crossover;
          add the children to the new population;
          return true;
      else
          Count = Count + 1;
      if Count = Max_Attempt then
          choose a random crossover point, Subtree1, in P1;
          choose a random crossover point, Subtree2, in P2;
          execute crossover;
          return true;
as not similar, we try a number of times to pick two other subtrees. The reason is that picking two similar subtrees is more difficult than selecting two non-equivalent subtrees as in [20]. Algorithm 1 shows how SSC works. In our experiments, Max_Attempt is set to 3. The motivation for SSC is that, while encouraging GP individuals to exchange semantically different subtrees as in SAC [20], it is also desirable to prevent the exchange of subtrees whose semantic difference is too large, since this might cause a substantial or even unbounded change in the semantics of the individuals after crossover. In other words, while forcing a change in the semantics of the individuals in the population, we also want to keep this change bounded and small; a smoother change in the fitness of the individuals is thereby expected. For instance, consider an individual whose root node is the arithmetic multiplication (*) and whose left and right subtrees have return values (semantics) of 10 and 3, respectively. This individual has a return value of 30 (= 10*3). If the right subtree is replaced by a semantically similar subtree with a return value of 3.2, the return value of the individual changes only slightly, to 10*3.2 = 32. However, if it is replaced by a semantically different subtree with a return value of, say, 100, the semantics of the individual changes dramatically, to 10*100 = 1000. This is likely to cause a big change in the fitness of the individual.
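A minimal Python sketch of the similarity test and of Algorithm 1, as we read them, follows. Here subtrees are modelled as callables over a sample point, and random_subtree and swap_subtrees are hypothetical helpers:

MAX_ATTEMPT = 3  # as in the paper's experiments

def semantically_similar(st1, st2, points, alpha, beta):
    # Mean absolute difference between the two subtrees' outputs on the
    # sample points; 'similar' means it lies strictly between the lower
    # (alpha) and upper (beta) semantic sensitivities.
    sd = sum(abs(st1(x) - st2(x)) for x in points) / len(points)
    return alpha < sd < beta

def ssc_crossover(p1, p2, points, alpha=0.02, beta=10.0):
    # Mirrors Algorithm 1: retry up to MAX_ATTEMPT times to find a pair of
    # semantically similar subtrees, then fall back to an unconstrained swap.
    for _ in range(MAX_ATTEMPT):
        st1, st2 = random_subtree(p1), random_subtree(p2)   # hypothetical helper
        if semantically_similar(st1, st2, points, alpha, beta):
            return swap_subtrees(p1, st1, p2, st2)          # hypothetical helper
    return swap_subtrees(p1, random_subtree(p1), p2, random_subtree(p2))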
4 Experimental Setup

To investigate the possible effects of SSC and to compare it with SAC and SC, we used four real-valued symbolic regression problems of increasing difficulty. The underlying functions, from [7], are shown in Table 1, and the parameters used for our experiments are shown in Table 2.
Table 1. Symbolic Regression Functions

F1 = X^3 + X^2 + X
F2 = X^4 + X^3 + X^2 + X
F3 = X^5 + X^4 + X^3 + X^2 + X
F4 = X^6 + X^5 + X^4 + X^3 + X^2 + X

Table 2. Run and Evolutionary Parameter Values

Generations: 50                        Population size: 500
Selection: Tournament                  Tournament size: 3
Crossover probability: 0.9             Mutation probability: 0.1
Initial max depth: 6                   Max depth: 15
Non-terminals: +, -, *, /, sin, cos, exp, log (protected versions)
Terminals: X, 1
Number of samples: 20 random points from [-1 ... 1]
Successful run: sum of absolute error on all fitness cases < 0.1
Termination: max generations exceeded
Lower semantic sensitivities: 0.02, 0.04, 0.06, 0.08
Upper semantic sensitivities: 8, 10, 12
Trials per treatment: 100 independent runs for each value
The reason for choosing the lower bound semantic sensitivities of SSC to be the semantic sensitivities of SAC is that these values helped to improve the performance of SAC over SC, as shown in [20]. The upper bound semantic sensitivity values were set based on the results of preliminary experiments; these values are sufficient to demonstrate the performance of SSC.
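For concreteness, the fitness cases and success criterion of Table 2 can be sketched as follows (our illustration, with F1 as the target):

import random

def make_fitness_cases(n=20, seed=None):
    # 20 random sample points drawn from [-1, 1], as in Table 2.
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def f1(x):
    # Target function F1 from Table 1.
    return x**3 + x**2 + x

def total_abs_error(program, cases, target=f1):
    # Fitness: sum of absolute errors over all fitness cases (lower is better).
    return sum(abs(program(x) - target(x)) for x in cases)

def successful(program, cases):
    # A run is successful when the total absolute error drops below 0.1 (Table 2).
    return total_abs_error(program, cases) < 0.1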
5 Results and Discussion

We present the results of two experiments undertaken to understand the behaviour of Semantic Similarity based Crossover (SSC) and the earlier Semantic Aware Crossover (SAC), both benchmarked against standard crossover (SC). In the first instance we examine the classic performance metrics of mean best fitness and the number of successful runs, followed by an analysis of the locality of each operator.

5.1 Mean Fitness and Success Rates

Table 3 shows the percentage of successful runs. Figure 1 depicts the cumulative frequency of success for the three crossover operators, SC, SAC and SSC (with lower and upper bound semantic sensitivities of 0.02 and 10, respectively). It can be seen from Table 3 that in almost all cases SSC outperforms both SAC and SC. The exceptions mostly occur when solving the problem with target function F1, which may simply be far too easy a problem to benefit from semantic information. Figure 1 shows that SSC usually finds perfect solutions faster than SAC and SC.
Table 3. Comparison of the percentage of successful runs
sensitivities        F1              F2              F3              F4
 low    high    SC  SAC  SSC    SC  SAC  SSC    SC  SAC  SSC    SC  SAC  SSC
 0.02     8     62   70   65    28   33   42    15   22   34    10   14   26
         10     62   70   75    28   33   47    15   22   27    10   14   23
         12     62   70   67    28   33   37    15   22   29    10   14   17
 0.04     8     62   70   64    28   34   38    15   20   32    10   19   23
         10     62   70   73    28   34   47    15   20   25    10   19   27
         12     62   70   62    28   34   36    15   20   31    10   19   20
 0.06     8     62   71   63    28   32   39    15   20   32    10   17   27
         10     62   71   73    28   32   45    15   20   29    10   17   25
         12     62   71   59    28   32   33    15   20   32    10   17   19
 0.08     8     62   70   70    28   35   42    15   20   26    10   17   16
         10     62   70   70    28   35   35    15   20   25    10   17   23
         12     62   70   70    28   35   37    15   20   24    10   17   17
Table 4 gives the average of the best solutions found over all runs of all GP systems. In this table, we use the shorthand sen for sensitivity. In Figure 2 we also show the average best fitness (over 100 runs) in each of 50 generations, with lower and upper bound semantic sensitivities of 0.02 and 10, respectively. Note that in this figure we only show the statistics from the 10th generation onwards. The reason is that in the first few generations these values are usually very large (which is expected, as the fitness of individuals at the early stage of evolution is usually very poor), making it difficult to scale the graphs to highlight the differences. Moreover, at these early generations the fitness statistics were almost identical for all GP systems regardless of which crossover operator was used. The results in Table 4 are consistent with those in Table 3, in that SSC is superior to both SAC and SC, finding solutions of better quality. Moreover, it can be observed from this table that the more difficult the problem, the better the performance achieved by SSC in comparison with SAC and SC. The results in Table 4 are also very consistent, with no exceptions: SSC is better than SAC and SC when compared by average best fitness for all of the above lower and upper bound semantic sensitivities. Figure 2 shows that SSC not only outperforms SAC and SC in terms of the best fitness of runs, but is also better in terms of the mean best fitness at each generation. To measure the statistical significance of the results in Table 4, we also conducted statistical tests. Here the t-test was used to see whether the improvement in average best fitness due to SSC is significant. The t-test results of SAC in comparison with SC, performed in [20], are also carried over to this paper for ease of comparison. The t-test results (p-values) of both SSC and SAC in comparison with SC are shown in Table 5. In this table, where the improvement is significant the p-value is less than 0.05, and that value is bold faced, as in the previous tables.
Fig. 1. Cumulative frequency of success over the generations for SC, SAC and SSC on F1–F4, with α=0.02, β=10 (four panels, one per target function; y-axis: cumulative frequency, x-axis: generations)
It can be seen from Table 5 that, while the improvement of SAC over standard crossover in terms of the average best fitness of runs is either not significant or at best marginal, on F2, F3 and F4 a significant difference is almost always observed with SSC, regardless of the upper and lower bounds explored. The exceptions mostly lie in the easy-to-learn function F1. In contrast, for the most complicated target functions, F3 and F4, there are no exceptions, while there are two exceptions in F2. These results support the observation that the more difficult the problem, the greater the performance gain obtained by using SSC.

5.2 Operator Locality

The next set of experimental results investigates the locality property of SSC. It is well known that a high-locality representation (where a small change in genotype corresponds to a small change in phenotype) is important for efficient evolutionary search [5]. It is also widely acknowledged that designing a search operator for GP in which a small change in syntax (genotype) corresponds to a small change in semantics (phenotype) is very difficult. Consequently, nearly all current GP representations and operators have low locality, meaning that a small (syntactic) change in a parent can cause a big, or even uncontrollable, (semantic) change in its children. Our new crossover operator differs from other crossover operators in the literature in that it attempts to achieve high locality by keeping the semantic change small. To compare the locality of SSC with that of SAC and SC, an experiment was conducted in which the fitness change of individuals before and after crossover is measured.
Table 4. Comparison of the average best fitness over 100 runs
 sen                 F1                   F2                   F3                   F4
 low    high   SC   SAC   SSC      SC   SAC   SSC      SC   SAC   SSC      SC   SAC   SSC
 0.02     8   0.13  0.13  0.11    0.26  0.24  0.16    0.30  0.28  0.20    0.40  0.33  0.24
         10   0.13  0.13  0.09    0.26  0.24  0.14    0.30  0.28  0.21    0.40  0.33  0.23
         12   0.13  0.13  0.09    0.26  0.24  0.18    0.30  0.28  0.19    0.40  0.33  0.27
 0.04     8   0.13  0.13  0.11    0.26  0.23  0.16    0.30  0.27  0.21    0.40  0.33  0.25
         10   0.13  0.13  0.09    0.26  0.23  0.13    0.30  0.27  0.21    0.40  0.33  0.23
         12   0.13  0.13  0.10    0.26  0.23  0.19    0.30  0.27  0.19    0.40  0.33  0.26
 0.06     8   0.13  0.12  0.11    0.26  0.23  0.15    0.30  0.27  0.20    0.40  0.32  0.24
         10   0.13  0.12  0.08    0.26  0.23  0.16    0.30  0.27  0.21    0.40  0.32  0.23
         12   0.13  0.12  0.10    0.26  0.23  0.19    0.30  0.27  0.20    0.40  0.32  0.27
 0.08     8   0.13  0.14  0.08    0.26  0.22  0.17    0.30  0.28  0.21    0.40  0.33  0.26
         10   0.13  0.14  0.09    0.26  0.22  0.17    0.30  0.28  0.21    0.40  0.33  0.26
         12   0.13  0.14  0.11    0.26  0.22  0.16    0.30  0.28  0.23    0.40  0.33  0.26

Table 5. T-test result (p-values)

 sen            F1            F2            F3            F4
 low    high   SAC   SSC     SAC   SSC     SAC   SSC     SAC   SSC
 0.02     8   0.68  0.86    0.46  0.00    0.41  0.00    0.12  0.00
         10   0.68  0.44    0.46  0.00    0.41  0.00    0.12  0.00
         12   0.68  0.36    0.46  0.02    0.41  0.00    0.12  0.00
 0.04     8   0.93  0.94    0.36  0.00    0.30  0.00    0.12  0.00
         10   0.93  0.29    0.36  0.00    0.30  0.00    0.12  0.00
         12   0.93  0.58    0.36  0.05    0.30  0.00    0.12  0.00
 0.06     8   0.98  0.96    0.22  0.00    0.26  0.00    0.08  0.00
         10   0.98  0.21    0.22  0.00    0.26  0.01    0.08  0.00
         12   0.98  0.77    0.22  0.05    0.26  0.00    0.08  0.00
 0.08     8   0.60  0.08    0.25  0.00    0.49  0.01    0.15  0.00
         10   0.60  0.40    0.25  0.00    0.49  0.00    0.15  0.00
         12   0.60  0.94    0.25  0.00    0.49  0.02    0.15  0.00
For example, suppose two individuals with fitness 10 and 15 are selected for crossover and, after the crossover operation, their children have fitness 17 and 9. Then the fitness change for these individuals is Abs(17 − 10) + Abs(9 − 15) = 13, where Abs is the absolute-value function. This value is then averaged over the whole population, over 100 runs, and over 50 generations. The results for the average fitness change of individuals before and after crossover are shown in Table 6; again, the best results (the smallest values) are bold faced. Figure 3 shows the average fitness movement over 100 runs for each of 50 generations, with lower and upper bound semantic sensitivities of 0.02 and 10. Table 6 and Figure 3 show that the fitness-change step of our new crossover operator (SSC) is smaller than that of both SAC and SC. This means that the change of fitness over the generations is smoother under SSC than under SAC and SC.
Fig. 2. Average of best fitness (over 100 runs) per generation for SC, SAC and SSC on F1–F4, with α=0.02, β=10

Table 6. The average individual fitness change before and after crossover operation
 sen               F1                 F2                 F3                 F4
 low    high   SC   SAC  SSC     SC   SAC  SSC     SC   SAC  SSC     SC   SAC  SSC
 0.02     8   8.8   8.6  6.9    10.1  8.7  6.0    10.3 10.3  6.8    11.8  9.7  7.5
         10   8.8   8.6  6.1    10.1  8.7  5.8    10.3 10.3  6.6    11.8  9.7  7.3
         12   8.8   8.6  6.5    10.1  8.7  5.9    10.3 10.3  7.2    11.8  9.7  7.5
 0.04     8   8.8   8.7  6.7    10.1  8.5  6.1    10.3 10.1  7.4    11.8  9.7  8.0
         10   8.8   8.7  6.0    10.1  8.5  5.3    10.3 10.1  6.8    11.8  9.7  7.2
         12   8.8   8.7  6.0    10.1  8.5  5.8    10.3 10.1  7.3    11.8  9.7  7.4
 0.06     8   8.8   8.2  6.8    10.1  7.3  5.9    10.3  9.3  7.2    11.8  9.5  7.8
         10   8.8   8.2  6.2    10.1  7.3  6.1    10.3  9.3  6.8    11.8  9.5  7.4
         12   8.8   8.2  5.7    10.1  7.3  5.8    10.3  9.3  7.4    11.8  9.5  7.3
 0.08     8   8.8   8.3  6.6    10.1  7.4  5.3    10.3  9.4  7.3    11.8  9.6  7.9
         10   8.8   8.3  5.9    10.1  7.4  5.1    10.3  9.4  6.8    11.8  9.6  7.5
         12   8.8   8.3  5.4    10.1  7.4  6.1    10.3  9.4  7.4    11.8  9.6  6.9
The table and figure also show that the fitness change of SAC is only slightly smoother than that of SC. These results explain why SSC is much better than SAC and SC on the problems tried, while SAC is only slightly better than SC.
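For clarity, the fitness-movement measure used in this section can be sketched as follows (our illustration):

def fitness_movement(f_p1, f_p2, f_c1, f_c2):
    # Fitness change of one crossover event: the sum of absolute differences
    # between each child's fitness and its parent's fitness, as in the text.
    return abs(f_c1 - f_p1) + abs(f_c2 - f_p2)

# Worked example from the text: parents with fitness 10 and 15 produce
# children with fitness 17 and 9, giving |17 - 10| + |9 - 15| = 13.
assert fitness_movement(10, 15, 17, 9) == 13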
Fig. 3. The average fitness movement before and after crossover for SC, SAC and SSC on F1–F4, with α=0.02, β=10
6 Conclusion and Future Work

In this paper, we have proposed a new semantic-based crossover operator for GP, Semantic Similarity based Crossover (SSC). The new operator was tested and analysed on a class of real-valued symbolic regression problems, and the results were compared with Semantic Aware Crossover (SAC) and standard GP crossover (SC). The experimental results show that SSC improves the performance of GP relative to SAC and SC, both in terms of the percentage of successful runs and the average best fitness over a number of runs. The results also show that this operator not only encourages the exchange of subtrees with different semantics, as in [20], but also produces smaller changes of fitness during the evolutionary process, by only allowing the exchange of subtrees with a controlled degree of similarity, ensuring a better-behaved operator in terms of locality. We argue that this is the main reason why SSC outperformed SAC and SC on the problems tried. In the near future, we plan to extend the work presented in this paper in a number of ways. Firstly, we aim to apply SSC to more difficult symbolic regression problems (problems that are multi-variable and have more complex solution structures). For these problems, we predict that making small changes in semantics is both more difficult and more important. Secondly, SSC could be used to enhance previously proposed crossover operators that are purely based on the structure of trees, such as crossover with a bias on the depth of nodes [8] or one-point crossover [17]. Another potential research direction is to apply SSC to other kinds of
problem domains, such as the Boolean problems investigated in [16]; there it could be even more difficult to generate children that differ from their parents in terms of semantics. Last but not least, we plan to investigate the ranges of lower bound and upper bound semantic sensitivity values that are good for a class of problems. In this paper, these values were specified manually and experimentally; however, it may be possible to allow them to self-adapt during the evolutionary process [4].
Acknowledgements

This paper was funded under a Postgraduate Scholarship from the Irish Research Council for Science, Engineering and Technology (IRCSET).
References

1. Beadle, L., Johnson, C.: Semantically driven crossover in genetic programming. In: Proceedings of the IEEE World Congress on Computational Intelligence, pp. 111–116. IEEE Press, Los Alamitos (2008)
2. Cleary, R., O'Neill, M.: An attribute grammar decoder for the 01 multi-constrained knapsack problem. In: Proceedings of Evolutionary Computation in Combinatorial Optimization, April 2005, pp. 34–45. Springer, Heidelberg (2005)
3. de la Cruz Echeandía, M., de la Puente, A.O., Alfonseca, M.: Attribute grammar evolution. In: Mira, J., Álvarez, J.R. (eds.) IWINAC 2005. LNCS, vol. 3562, pp. 182–191. Springer, Heidelberg (2005)
4. Deb, K., Beyer, H.G.: Self-adaptation in real-parameter genetic algorithms with simulated binary crossover. In: Proceedings of the Genetic and Evolutionary Computation Conference, July 1999, pp. 172–179. Morgan Kaufmann, San Francisco (1999)
5. Gottlieb, J., Raidl, G.: The effects of locality on the dynamics of decoder-based evolutionary search. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 283–290. ACM, New York (2000)
6. Hengpraprohm, S., Chongstitvatana, P.: Selective crossover in genetic programming. In: Proceedings of the ISCIT International Symposium on Communications and Information Technologies, November 2001, pp. 14–16 (2001)
7. Hoai, N.X., McKay, R., Essam, D.: Solving the symbolic regression problem with tree-adjunct grammar guided genetic programming: The comparative results. In: Proceedings of the 2002 Congress on Evolutionary Computation (CEC 2002), pp. 1326–1331. IEEE Press, Los Alamitos (2002)
8. Ito, T., Iba, H., Sato, S.: Depth-dependent crossover for genetic programming. In: Proceedings of the 1998 IEEE World Congress on Computational Intelligence, May 1998, pp. 775–780. IEEE Press, Los Alamitos (1998)
9. Johnson, C.G.: Deriving genetic programming fitness properties by static analysis. In: Foster, J.A., Lutton, E., Miller, J., Ryan, C., Tettamanzi, A.G.B. (eds.) EuroGP 2002. LNCS, vol. 2278, pp. 299–308. Springer, Heidelberg (2002)
10. Johnson, C.: What can automatic programming learn from theoretical computer science. In: Proceedings of the UK Workshop on Computational Intelligence. University of Birmingham (2002)
11. Johnson, C.G.: Genetic programming with fitness based on model checking. In: Ebner, M., O'Neill, M., Ekárt, A., Vanneschi, L., Esparcia-Alcázar, A.I. (eds.) EuroGP 2007. LNCS, vol. 4445, pp. 114–124. Springer, Heidelberg (2007)
12. Katz, G., Peled, D.A.: Genetic programming and model checking: Synthesizing new mutual exclusion algorithms. In: Cha, S., Choi, J.-Y., Kim, M., Lee, I., Viswanathan, M. (eds.) ATVA 2008. LNCS, vol. 5311, pp. 33–47. Springer, Heidelberg (2008)
13. Katz, G., Peled, D.A.: Model checking-based genetic programming with an application to mutual exclusion. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 141–156. Springer, Heidelberg (2008)
14. Koza, J.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)
15. Majeed, H., Ryan, C.: A less destructive, context-aware crossover operator for GP. In: Collet, P., Tomassini, M., Ebner, M., Gustafson, S., Ekárt, A. (eds.) EuroGP 2006. LNCS, vol. 3905, pp. 36–48. Springer, Heidelberg (2006)
16. McPhee, N.F., Ohs, B., Hutchison, T.: Semantic building blocks in genetic programming. In: O'Neill, M., Vanneschi, L., Gustafson, S., Esparcia-Alcázar, A.I., De Falco, I., Della Cioppa, A., Tarantino, E. (eds.) EuroGP 2008. LNCS, vol. 4971, pp. 134–145. Springer, Heidelberg (2008)
17. Poli, R., Langdon, W.B.: Genetic programming with one-point crossover. In: Proceedings of the Soft Computing in Engineering Design and Manufacturing Conference, June 1997, pp. 180–189. Springer, Heidelberg (1997)
18. Poli, R., Langdon, W.B., McPhee, N.F.: A Field Guide to Genetic Programming (2008), http://lulu.com
19. Rothlauf, F., Oetzel, M.: On the locality of grammatical evolution. In: Collet, P., Tomassini, M., Ebner, M., Gustafson, S., Ekárt, A. (eds.) EuroGP 2006. LNCS, vol. 3905, pp. 320–330. Springer, Heidelberg (2006)
20. Nguyen, Q.U., Nguyen, X.H., O'Neill, M.: Semantic aware crossover for genetic programming: The case for real-valued function regression. In: Vanneschi, L., Gustafson, S., Moraglio, A., De Falco, I., Ebner, M. (eds.) EuroGP 2009. LNCS, vol. 5481, pp. 292–302. Springer, Heidelberg (2009)
21. Wong, M.L., Leung, K.S.: An induction system that learns programs in different programming languages using genetic programming and logic grammars. In: Proceedings of the 7th IEEE International Conference on Tools with Artificial Intelligence (1995)
Genetic-Programming Based Prediction of Data Compression Saving

Ahmed Kattan and Riccardo Poli

School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe, United Kingdom
Abstract. We use Genetic Programming (GP) to generate programs that predict the data compression ratio for compression algorithms. GP evolves programs with multiple components. One component analyses statistical features extracted from a file's byte frequency distribution to come up with a compression ratio prediction. Another component does the same, but by analysing statistical features extracted from the file's raw ASCII representation. A further (evolved) component acts as a decision tree to determine the overall output (compression ratio estimate) returned by an individual. The decision tree produces its result based on a series of comparisons among statistical features extracted from the files and the outputs of the two prediction components. The evolved decision tree has the choice to select either of the outputs of the two compression prediction trees or, alternatively, to integrate them into an evolved mathematical formula. Experiments with the proposed approach show that GP is able to accurately estimate the compression ratio of unseen files, thereby avoiding the need to run multiple compressions on a file to decide which one provides the best results. Keywords: Genetic Programming, Compression, Byte frequency distribution, Decision tree.
1 Introduction

Researchers in the compression field tend to develop algorithms that work with specific data types, taking advantage of any available knowledge regarding the data. Evidently, applying different compression algorithms to the same file will result in different compression ratios; with prior knowledge, however, it is possible to match the data with the proper compression model. This is difficult to do when the nature and regularities of a given data file are not known, as is the case for heterogeneous archives. Testing alternative compression algorithms to determine the best one to use is extremely time consuming when the given data is large (e.g., > 1GB). Applying a randomly chosen compression model in this case might result in a loss of efficiency with regard to storage space, or might even cause an increase in size. Consequently, estimating the compression ratio obtainable with different compression models could be very advantageous, saving both the computational resources and the time required to perform the compression. Researchers have attempted to estimate data compression ratios for compression algorithms without the need to run the algorithms in question. Hsu [1] proposed an
automatic synthesis of compression techniques for heterogeneous files. His approach focused on pigeonholing file types and forwarding each type to the proper compression model; it modified the UNIX 'file' command and applied it to every 5KB block of a file to determine the type of information it contains (e.g., text, graphics, executable). In [2] Culhane proposed three measurements to predict the compression ratio of files when applying Huffman coding or LZW. The three measures are: i) the standard deviation of the bytes, ii) the standard deviation of the differences of consecutive bytes, and iii) the standard deviation of the XORed values of consecutive bytes. Recently, in [3], we presented a lossless GP data compression system called GP-zip*. There we used Genetic Programming to learn the compressibility of different patterns in the data and match them with different compression models in such a way as to minimise the total size of the file. Although GP-zip* successfully matched unseen data with different compression models, it was not designed to give an estimate of the compression ratio. In this paper we propose a new approach, based on GP, to generate programs that can predict data compression efficiency for different compression models. The aim is to allow a rapid analysis of the data to determine which compression model should be used and, hence, to save the resources and time needed to run multiple algorithms.
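For illustration, the three measures of [2] can be sketched as follows (our reading of the description above; the input needs at least two bytes):

import statistics

def culhane_measures(data: bytes):
    # The three statistics from [2]: standard deviations of (i) the byte
    # values, (ii) the differences of consecutive bytes, and (iii) the
    # XORs of consecutive bytes.
    vals = list(data)
    diffs = [b - a for a, b in zip(vals, vals[1:])]
    xors = [a ^ b for a, b in zip(vals, vals[1:])]
    return (statistics.pstdev(vals),
            statistics.pstdev(diffs),
            statistics.pstdev(xors))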
2 The Methodology

Our system works in two stages: i) training, where the system evolves mathematical formulas to predict the compression ratio of a particular compression model when applied to different training files, and ii) testing, where the system is applied to unseen data. From the point of view of an operating system or a standard high-level programming language, the data to be compressed is normally treated as a sequence of elementary data units, typically bytes. What each unit represents depends on the file type. If the file is plain ASCII text, each unit will represent a character. If the file is an executable program, each unit may represent an instruction (or, more likely, a fragment of an instruction) or some numeric or textual data. In files containing recordings of signals (e.g., sound), a unit (say, a byte) will represent a sample or part of a sample. In any case, the interpretation of the sequence of units contained in a data file entirely depends on what we know about that file and what our expectations are regarding its contents. In most situations these are determined by the file's name and extension (although further information may also be available). Naturally, one can use such knowledge about a file to decide how to compress it, which is what most off-the-shelf compression algorithms do. However, when presented with unknown data (e.g., an archive that has been encrypted, or an archive in a format unknown to the operating system), one cannot exploit this information. Our system processes each file via two different representations: firstly, as a series of byte values (i.e., 0–255); secondly, as a byte frequency distribution (BFD). Sections 2.1 and 2.2 describe each representation in detail.
GP evolves programs with multiple component trees (see Figure 1). The system analyses each representation of the data independently via the two compression prediction trees, then integrates them into a single evolved decision tree. The output of the decision tree is the estimated compression ratio. The decision tree produces its result based on a series of comparisons among statistical features extracted from the files and the outputs of the other evolved components. The evolved decision tree has the choice to select either of the outputs of the two compression prediction trees or, alternatively, to integrate them into an evolved mathematical formula. Section 2.3 describes the decision trees' structure in detail. GP has been supplied with a language that allows the extraction of statistical features from the two data representations and then combines them into a single decision tree. Table 1 illustrates the primitive set of the system. Note that all three trees in the representation of an individual use the same primitives; the only exception is in the decision tree (see Section 2.3).

Table 1. Primitives set
Fig. 1. Individual within the system population
The system starts by randomly initializing a population of individuals using the ramped half-and-half method [4]. The three standard genetic operators (crossover, mutation and reproduction) are used to guide evolution through the search space.
We let evolution optimise the three components during the training phase. The objective of the system is to build two statistical models and a decision tree that approximate the compression ratio for the files in the training set when applying a particular compression model. After evolution, we test the performance of the evolved components on unseen data.

2.1 Analyzing the Byte-Series

Each file is stored, within a computer, as a series of unsigned bytes. In our system we use this series as the reference interpretation of the data. In particular, we treat the stream of data as a signal digitised using an 8-bit quantisation; hence, each byte in a data file is treated as an integer between 0 and 255. Preliminary tests involving plotting such signals for different file types revealed that different data types often correspond to signals with very different characteristics, while similar data types share similar features. The task is to evolve a non-linear function that extracts features from the given byte-series which spot regularities and redundancy in the data stream. Naturally, in practice spotting such characteristics is not always straightforward: it depends on the nature of the given data stream. For example, it is easy to spot a regular pattern in English text (e.g., 'th', 'qu'), but hard in an executable file. Also, large byte-series might conceal useful features in only some parts of the file.

2.2 Analyzing the Byte-Frequency Distribution

Preliminary experimentation showed that analyzing files' byte-series alone does not provide enough information to build a generic compression estimation model. So, we also look at the Byte Frequency Distribution (BFD). The BFD is defined as a histogram of the number of times each character appears, divided by the total number of characters. The basic task of any compression model is to identify and remove redundancy during the compression process. The BFD captures the amount of information available in the data (its entropy) and also the symmetry of the data (from the point of view of character frequencies). Thus, the BFD allows GP to reveal these characteristics and spot commonalities among different characters in the data stream. The task of the second component of each GP individual is to evolve a function that extracts features from the BFD. The advantage of this representation is that it is easy to process, since it is a list of only 256 values, and it contains valuable information regarding the data stream. A disadvantage is that it ignores the order of the data in the stream.
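For concreteness, here is a minimal sketch of the two representations just described (ours, not the authors' code):

def byte_series(data: bytes):
    # Reference interpretation (Section 2.1): the file as integers 0-255.
    return list(data)

def byte_frequency_distribution(data: bytes):
    # BFD (Section 2.2): for each of the 256 byte values, its occurrence
    # count divided by the total number of bytes in the file.
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1  # guard against empty files
    return [c / total for c in counts]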
2.3 Decision Tree

A decision tree is a model that maps from the attributes of an item to a conclusion about its value [5]. In decision trees, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. Learned decision trees can be represented as an if-else-if series for human readability [5]. For the purposes of our compression prediction system we customized the decision tree representation to fit our objective. As displayed in Figure 1, each comparison node (i.e., a node that contains a comparison condition such as <, >, <= or >=) has four children. The first two represent conditions, while the other two represent decisions. There are two types of condition trees and three types of decisions, as follows:

Condition types:
1. Byte-series condition tree: a component tree (see the middle of Figure 1) that extracts features from the given byte-series representation and abstracts them to a single number.
2. BFD/Byte-series outputs condition tree: a component tree that integrates the outputs of the Byte-series analyzer tree and the BFD analyzer tree into a mathematical formula.

Decision types:
1. BFD decision: the output of the BFD analyzer tree.
2. Byte-series decision: the output of the Byte-series analyzer tree.
3. BFD/Byte-series decision: similar to the BFD/Byte-series outputs condition; an evolved tree that integrates the outputs of the Byte-series analyzer tree and/or the BFD analyzer tree into a single mathematical formula.

The output of the decision tree is the estimated compression ratio. The decision tree produces its result based on a series of comparisons among statistical features extracted from the files and the outputs of the Byte-series analyzer tree and/or the BFD analyzer tree. The evolved decision tree has three choices: i) select the output of the Byte-series analyzer tree, ii) select the output of the BFD analyzer tree, or iii) integrate both the Byte-series analyzer and BFD analyzer trees into a mathematical formula and use that to produce the output.
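A minimal sketch of such a comparison node follows; the class, the ctx dictionary of per-file features, and the callable children are our own illustrative assumptions, not the paper's implementation:

from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ComparisonNode:
    # A comparison over the two condition children selects which of the two
    # decision children produces the compression-ratio estimate.
    op: Callable[[float, float], bool]       # e.g., '<' as a two-argument predicate
    condition1: Callable[[Dict], float]      # condition subtree 1
    condition2: Callable[[Dict], float]      # condition subtree 2
    decision_if_true: Callable[[Dict], float]
    decision_if_false: Callable[[Dict], float]

    def evaluate(self, ctx: Dict) -> float:
        if self.op(self.condition1(ctx), self.condition2(ctx)):
            return self.decision_if_true(ctx)
        return self.decision_if_false(ctx)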
2.4 Genetic Operators

We used crossover, mutation and reproduction. Naturally, the genetic operators take the multi-tree representation of the individuals into account. There are several options for applying genetic operators to a multi-tree representation. Firstly, one could either apply the selected operator to all trees within an individual, or select a potentially different operator for each component. Secondly, one could constrain crossover to occur only between homologous component trees, or not. It is unclear which technique is best. In [6] the authors argued that crossing over trees at different positions might result in the swapping of useless genetic material, producing weaker offspring. On the contrary, in [7] the authors argued that restricting the crossover positions misleads evolution, as the features are indistinguishable during evolution. After experimenting with a variety of approaches we settled on the following. Let T_c^i be the c-th tree of individual i, where c ∈ {Byte-series analyzer, BFD analyzer, Decision tree}. The system selects an operator with a predefined probability for each T_c^i; thus, offspring can be generated using more than one operator. In crossover, as each component has a particular task, only homologous components are allowed to cross. Also, the system has to take the structural constraints of the decision tree into consideration and ensure its syntax is maintained: it crosses condition branches with condition branches from corresponding trees, and decision branches with decision branches from corresponding trees.

2.5 Fitness Function

As mentioned previously, each individual has a multi-tree representation, where one tree is used to analyze the byte-series, another tree is used to analyze the BFD, and a third tree is used to decide which is the most accurate prediction of the compression ratio. Thus, there are three different objectives for the system: to optimize the performance of the byte-series analyzer tree, of the BFD analyzer tree, and of the decision maker. We look at this as a multi-objective problem with three fitness functions, each measuring the quality of one component. The system randomly selects a fitness measurement each time it produces a new individual; in this way, evolution is forced to jointly optimise all objectives. The fitness function for each component is simply the average of the absolute difference between the estimated compression ratio and the actual compression achieved, over all N files in the training set, as follows.

The fitness of the BFD analyzer tree can be expressed as follows: let the output of the BFD tree be denoted BFD(file_n), where file_n is the n-th file in the training set, and let C(y, file_n) be the compression saving of file_n when applying compression model y. Then

    BFD-tree Fitness = (1/N) * Sum_{n=1..N} | BFD(file_n) - C(y, file_n) |        (1)

The fitness of the byte-series analyzer tree can be expressed similarly: let the output of the byte-series tree be denoted BS(file_n). Then

    Byte-series tree Fitness = (1/N) * Sum_{n=1..N} | BS(file_n) - C(y, file_n) |        (2)

Finally, the fitness of the decision tree can be expressed as follows: let the output of the decision tree be denoted DT(bs, bfd, file_n), where bs is the output of the byte-series analyzer tree and bfd is the output of the BFD analyzer tree. Then

    Decision-tree Fitness = (1/N) * Sum_{n=1..N} | DT(bs, bfd, file_n) - C(y, file_n) |        (3)
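A compact sketch of Eqs. (1)-(3) and of the random per-individual choice of objective described above (the component attribute names are hypothetical):

import random

def prediction_error(predict, training_files, C):
    # Average absolute difference between predicted and actual compression
    # saving over the training set -- the common form of Eqs. (1)-(3).
    return sum(abs(predict(f) - C(f)) for f in training_files) / len(training_files)

def sampled_fitness(individual, training_files, C):
    # One of the three component fitnesses, chosen at random for each new
    # individual, forcing joint optimisation of all three objectives.
    component = random.choice([individual.bfd_tree,
                               individual.byte_series_tree,
                               individual.decision_tree])
    return prediction_error(component, training_files, C)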
Thus, a GP individual's quality is defined by its ability to identify statistical features of the data stream and predict its compressibility under a particular compression model.

2.6 Training and Testing

The system extracts knowledge concerning the features of the data and their relationship with the performance of a particular compression algorithm during a training phase (a GP run).
Several factors have been considered in designing the training set. Firstly, the training set has to contain enough diversity of data types to ensure the generality of the system. Secondly, the system will process the training set many times, for each individual in each generation. Thus, the size of the training set should be small enough to keep training time-efficient, but also large enough to be representative of the data types whose compression ratio is likely to be estimated by the users of the system. In addition, it is essential to avoid over-fitting the training set. Table 2 presents the data types within our training set. The training set contains 15 different file types within 26 files, for a total of 5.14 MB. Note that the training set is completely independent of the test set. GP's output at the end of evolution consists of a byte-series analyzer tree, a BFD analyzer tree and a decision tree estimating compression ratios based on the other two components. Because GP is stochastic, the user may need to run the system several times until it achieves adequate performance on the training set. Testing involves processing unseen files using these trees.

Table 2. Training files
File types: pdf, exe, C++ code, gif, jpg, xls, ppt, mp3, mp4, txt, xml, xlsx, doc, ps, ram
Total size: 5.14 MB
Total number of files: 26
Table 3 presents the file types used to measure the algorithm's performance. The test set contains 19 different data types within 27 files. It includes some file types similar to those in the training set, as well as data types to which the algorithm was not exposed during training. Details of the algorithm's performance are given in the next section.

Table 3. Testing files
File types: tif, jpg, bmp, accdb, xml, C++ code, txt, mht, doc, docx, ppt, pdf, exe, msi, wmv, flv, mp4, mp3, ram
Total size: 67.9 MB
Total number of files: 27
3 Experiments

The main aim of the experiments was to investigate the performance of the algorithm and to assess its behaviour under a variety of circumstances.
The approach has been tested on predicting the compression ratio of the files in the test set for the following compression algorithms: Prediction by Partial Matching (PPMD) [8], Arithmetic Coding (AC) [9] and Boolean Minimisation (BooleanM) [10]. These algorithms were chosen because they belong to different categories of compression algorithms (AC is a statistical coder, BooleanM is a dictionary-based coder and PPMD is an adaptive statistical coder). The experiments presented here were performed using the following parameter settings: a population of 200 individuals, 40 generations, a crossover probability of 90%, a mutation probability of 5%, tournament selection with tournaments of size 5, and a maximum tree depth of 10.
Fig. 2. Summary of GP runs. (a) Summary of 10 GP runs evolving predictors for AC compression. (b) Summary of 10 GP runs evolving predictors for PPMD compression. (c) Summary of 10 GP runs evolving predictors for Boolean Minimisation compression.
Table 4. Performance Comparison (Seen file types vs. Unseen file types)

Compression   Avg. prediction error for        Avg. prediction error for
Model         trained file types (all runs)    untrained file types (all runs)
AC            5.95                             7.31
PPMD          11.96                            16.98
BooleanM      9.40                             9.77
Figures 2a–c summarise the results of the experiments when evolving predictors for AC, PPMD and Boolean Minimisation (we performed 10 independent GP runs for each compression model). The graphs plot the best, worst and average prediction error for each file. Prediction error is measured as the absolute difference between the actual compression ratio and the estimated compression ratio (expressed as a percentage). The standard deviation of the predictions achieved for each file across all runs is also recorded, and each figure shows the average of the best and worst predictions. As one can see, the average of the best achieved prediction errors is very small (with values ranging from 0.8% to 3.4%). The small standard deviations, together with reasonable average predictions, indicate that our system is likely to produce accurate models within a few runs. As mentioned before, the test set contained some files of data types to which the algorithm had not been exposed during training. Table 4 shows the average prediction errors achieved for the seen file types and the unseen file types. Generally, the algorithm gives slightly less accurate results for those types with which it had no prior experience. Nonetheless, the predictions achieved when the algorithm deals with new file types are satisfactory, so the algorithm must have learnt some general knowledge which can be used in different situations. Each component in an individual has a particular task, as explained previously. The final output is produced by the decision tree using the estimates constructed by the two other components; hence, it is interesting to study how the decision tree integrates this information. As mentioned before, the evolved decision tree has three choices: select the output of the Byte-series analyzer tree, select the output of the BFD analyzer tree, or integrate both analyzer trees into a mathematical formula. Thus, if the decision tree selected the estimate closest to the actual compression ratio, we count that as a right decision. If it chose to return the estimate of the less precise component, we count it as a wrong decision. If the decision tree decided to integrate the Byte-series and BFD analyzers into a mathematical formula producing a more accurate estimate, we count that as an improved decision. Table 5 shows the proportion of right/wrong/improved decisions for the decision tree over all 30 runs. Decision trees were able to select the correct compression estimate in most cases. Table 6 shows how often the BFD and Byte-series trees were used to produce the right estimate of the compression ratio; both components have been used.
Table 5. Decision tree performance

Decision            Percentage
Improved decision   7.05%
Right decision      61.78%
Wrong decision      31.17%

Table 6. Decision tree: right-decision statistics

Decision           Percentage
BFD tree           73.87%
Byte-series tree   26.13%
We mentioned previously that the Byte-series analyzer tree and the BFD analyzer tree are used as primitives in the decision tree. One might ask: why not evolve a single decision tree with all its branches, instead of separating its components and evolving them individually? We also tested this alternative approach (using the same parameter settings for both systems). Due to space limitations we cannot report full results; however, the results with a single-tree representation were much less satisfactory.

As mentioned previously, our aim is to estimate the compression ratio of different compression models in order to save both the computational resources and the time required to perform an actual compression. Table 7 compares the time needed to compress all files in the test set (67.9 MB) against the time needed to predict their compression ratios. Our approach is clearly faster than performing the actual compression. This is no surprise, as compression algorithms involve many I/O operations, while our approach only scans each file once and extracts statistical features that correlate with the compression ratio.
Table 7. Compression time vs. prediction time

Compression algorithm   Compression time   Prediction time
PPMD                    148 seconds        62 seconds
AC                      52 seconds         45 seconds
BooleanM                3003 seconds       65 seconds
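The prediction cost in Table 7 is dominated by a single sequential pass over each file. As a rough illustration of such a pass (our own sketch, not the authors' feature extractor), the byte-frequency distribution used by the BFD analyzer could be computed as follows:

    def byte_frequency_distribution(path):
        # One sequential pass: count each byte value, then normalise.
        counts = [0] * 256
        with open(path, "rb") as f:
            while True:
                chunk = f.read(65536)
                if not chunk:
                    break
                for b in chunk:
                    counts[b] += 1
        total = sum(counts) or 1
        return [c / total for c in counts]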
4 Conclusions

In this paper we proposed using GP to evolve predictors of the compression ratio achievable with three well-known compression models, without the need to run the actual compression on new files. The proposed approach attempts to predict the compression saving for the data via two different interpretations: i) looking at a file's byte-series, and ii) considering a
file's byte-frequency distribution. Each interpretation is used by a separate tree within our representation, in conjunction with a collection of statistical measures. We require the two trees to predict the compression ratio achievable with a particular compression model as accurately as possible. A third component of the representation, a decision tree, attempts to identify the more accurate of the two predictions and, if necessary, integrates them into an even better prediction. Evolution is guided by a single fitness measure (prediction error), which is applied randomly to one of the tree components in the representation. This forces GP to evolve accurate predictors in the first two components, but in such a way that the third component can easily determine which predictor is more accurate, thereby effectively performing a kind of multi-objective optimisation.

Results are very encouraging, in the sense that good prediction accuracy has been achieved on a large test set, both on seen and unseen data types, and with three compression models. We also found that separating the components of the decision tree and evolving them individually was advantageous in comparison with the standard method of evolving single decision trees.

Although the proposed technique has achieved good results, its performance depends on the knowledge GP acquires during evolution. Thus, users need to select the training set according to their needs. For example, a user interested in DVD production might want to train the algorithm to predict the compression saving for different types of video or audio.

This research can be extended in many different ways. In the future we will investigate the use of further interpretations of the raw data in files (e.g., 2- and 4-byte units). Also, using primitives implementing higher-order statistics is likely to provide further improvements to the system's performance. A disadvantage of the current realisation is that each evolved prediction model works only with a particular compression algorithm; thus, the user has to evolve a prediction model for each compression algorithm of interest. In future research we will try to create a single prediction model that works well with different compression algorithms.
References

[1] Hsu, W.H., Zwarico, E.E.: Automatic synthesis of compression techniques for heterogeneous files. Software - Practice and Experience 25 (1995)
[2] Chlhane, W.: Statistical Measures as Predictors of Compression Savings. Honors Thesis, Department of Computer Science and Engineering, The Ohio State University (2008)
[3] Kattan, A., Poli, R.: Evolutionary lossless compression with GP-ZIP*. In: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, Atlanta, Georgia, USA, pp. 1211-1218 (2008)
[4] Poli, R., Langdon, W.B., McPhee, N.: A Field Guide to Genetic Programming (2008), http://lulu.com
[5] Mitchell, T.M.: Machine Learning, ch. 3. McGraw-Hill International Editions (1997)
[6] Muni, D.P., Pal, N.R., Das, J.: A novel approach to design classifiers using genetic programming. IEEE Transactions on Evolutionary Computation 8(2), 183-196 (2004)
[7] Boric, N., Estevez, P.A.: Genetic programming-based clustering using an information theoretic fitness measure. In: IEEE Congress on Evolutionary Computation, September 25-28, pp. 31-38 (2007)
[8] Cleary, J.G., Witten, I.H.: Unbounded length contexts for PPM. In: Data Compression Conference, Snowbird, UT, USA, March 28-30, pp. 52-61 (1995)
[9] Witten, I.H., Neal, R.M., Cleary, J.G.: Arithmetic coding for data compression. Communications of the ACM 30(6), 520-541 (1987)
[10] Kattan, A.: Universal Lossless Data Compression with Built-in Encryption. Master's Thesis, University of Essex (2006)
On the Characteristics of Sequential Decision Problems and Their Impact on Evolutionary Computation and Reinforcement Learning

André M.S. Barreto, Douglas A. Augusto, and Helio J.C. Barbosa

Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil
{amsb,douglas,hcbm}@lncc.br
Abstract. This work provides a systematic review of the criteria most commonly used to classify sequential decision problems and discusses their impact on the performance of reinforcement learning and evolutionary computation. The paper also proposes a further division of one class of decision problems into two subcategories, which delimits a set of decision tasks particularly difficult for optimization techniques in general and evolutionary methods in particular. A simple computational experiment is presented to illustrate the subject.
1 Introduction
In a sequential decision-making problem the consequences of a decision may last for an arbitrarily long time [13]. Thus, a choice that seems beneficial from a short-sighted perspective may reveal itself to be disastrous in the long run. In the game of chess, for example, a move that captures one of the opponent's pieces may also expose the king and eventually lead to a defeat. Many real-world tasks involve this tradeoff between immediate and long-term benefits. Problems of practical and economical interest arising in areas as diverse as operations research, control theory and economics all fit within the decision-making framework [20]. The importance of developing reliable methods to automatically solve sequential decision tasks cannot be overemphasized.

One of the main issues regarding the solution of sequential decision tasks is the so-called temporal credit-assignment problem: how to apportion credit to individual decisions by looking at the outcome of a sequence of them [16]. For example, after being surprised by a checkmate, a chess player would like to know which moves were responsible for the defeat and which were not. There are two main approaches to address the temporal credit-assignment problem:

– One may perform the credit assignment implicitly, by evaluating each decision policy as a whole and then combining the most successful ones in the hope that better policies will come forth. In the example of chess, each decision policy would be a strategy to play the game. This type of phylogenetic learning is the approach adopted by evolutionary methods [5].
– Another possibility is to resort to an ontogenetic learning paradigm, in which a single decision policy is gradually refined based on the individual evaluation of the decisions made (in the game of chess, each move would be evaluated separately). This is the basic idea behind reinforcement-learning algorithms [17].

The advantages and drawbacks associated with phylogenetic and ontogenetic learning have been the subject of some debate. Sutton and Barto [17] argue that evolutionary methods cannot be considered true reinforcement-learning techniques because they are not able to use valuable information available during the learning process. In contrast, Moriarty et al. [12] list several advantages of using evolutionary methods to solve sequential decision problems. Indeed, many researchers have successfully applied evolutionary algorithms to decision tasks, often obtaining better results than reinforcement-learning methods on the same tasks [22,11,6]. On the other hand, it is also possible to find reports of experiments in which evolutionary methods failed entirely while reinforcement learning performed well [1].

The aim of this work is to contribute to the ongoing discussion by noting that the performance of both evolutionary computation and reinforcement learning can be strongly influenced by the characteristics of the task at hand. Thus, instead of asking which approach is the best choice to solve sequential decision problems in general, one should try to identify the characteristics of a task that make it more or less amenable to be solved by each technique. Ideally, one would have well-defined categories of decision problems in which the expected performance of evolutionary computation and reinforcement learning was known. This way, the choices involved in the solution of a given problem would be less subject to the intuition or personal inclination of the designer.

This paper represents a small step towards the scenario described above. It starts with a brief review of the criteria already used to classify decision problems and discusses the expected behavior of evolutionary computation and reinforcement learning on the resulting categories. This is done in Section 2. Then, in Section 3, a subdivision of one of these categories is proposed. As will be seen, this new dimension of classification is of particular interest for the evolutionary-computation community, since it clearly delineates a class of decision problems in which optimization methods are simply not applicable. The new categories also help to understand the disparity in previous reports of experiments comparing reinforcement learning and evolutionary methods. Section 4 presents a simple computational experiment to illustrate the issues discussed in the previous section. Finally, the main conclusions regarding the present investigation are presented in Section 5.
2 Classification of Sequential Decision Problems
Sequential decision problems can be dealt with at different levels of abstraction. In the model considered here an agent must learn how to perform a task by directly interacting with it—thus, the task is sometimes referred to as the
environment [17]. The interaction between agent and environment happens at discrete time intervals. At each time step t the agent occupies a state si ∈ S and must choose an action a from a finite set A. The sets S and A are called the state and action spaces, respectively. The execution of action a in state si moves the agent to a new state sj, where a new action must be selected, and so on. Each transition si −a→ sj has an associated reward r ∈ R, which provides evaluative feedback for the decision made. The use of rewards is a simple mechanism to represent the tradeoff between immediate and long-term benefits present in sequential decision problems. For example, if one wishes to model the game of chess, it suffices to associate a positive reward with every transition leading to a win and a negative reward with those transitions leading to a defeat. In this case, maximizing the expected reward corresponds to maximizing the probability of winning the game [17]. Given the above model of the decision-making process, a sequential decision problem can be classified as:

1. Markovian/Non-Markovian: In a Markovian decision problem the transitions and rewards depend only on the current state and the action selected by the agent [13]. Another way to put it is to say that Markovian states retain all the relevant information regarding the dynamics of a task: once the current state is known, the history of transitions that took the agent to that position is irrelevant for the purpose of decision making. In the game of chess, for example, a particular configuration of the board is all that is needed for an informed decision regarding the next move. On the other hand, upon deciding whether or not to concede a draw, a player may benefit from information about the history of previous games. Reinforcement-learning algorithms were developed based on the Markovian assumption. In particular, they rely on the concept of a value function, which associates every state-action pair (s, a) with the expected reward following the execution of a in s [17]. It is not hard to see, therefore, that reinforcement learning is particularly sensitive to the Markov property: if the dynamics of a task depend on the history of transitions executed by the agent, it makes little or no sense to associate (s, a) with a specific sequence of rewards. This is not to say that it is impossible to apply reinforcement learning to non-Markovian tasks; however, at the current stage of theoretical development a non-Markovian problem still represents a considerable obstacle for reinforcement-learning algorithms in general. On the other hand, evolutionary algorithms do not associate credit with individual actions, but rather evaluate decision policies as a whole. Therefore, each state-action pair is considered in the context of an entire trajectory of the agent. This focus on the net effect of a sequence of actions is much more robust with respect to the Markov property (for a particularly enlightening example, see the article by Moriarty et al. [12]).

2. Deterministic/Stochastic: In a deterministic decision problem the execution of action a in state si always takes the agent to the same state sj. In contrast, in a stochastic environment each transition is associated with a probability distribution over the state space S—that is, the agent may end up in
different states on two distinct executions of a in si. The game of chess is clearly deterministic, whereas blackjack is an example of a stochastic task [17]. A deterministic task is a degenerate stochastic environment in which the probability distributions associated with the transitions have only one nonzero element. Thus, any method designed for the latter category of decision problems should in principle also work in the former. Reinforcement-learning algorithms were developed with the stochastic scenario in mind, and apart from minor technical details the above distinction is irrelevant for them. Evolutionary computation can also be applied to both deterministic and stochastic tasks, but the latter require some caution. Since in stochastic tasks the same sequence of actions may result in different trajectories, an individual should not be evaluated on the basis of a single interaction with the environment (this would be like evaluating a chess player based on a single game). One way to circumvent this difficulty in evaluating candidate solutions is to resort to one of the many techniques available to deal with a noisy evaluation function (see the article by Moriarty et al. [12]).

3. Small/Large: Here, the terms "small" and "large" refer to the size of the state space with respect to the storage capacity of current computers. If a sequential decision problem is such that all state-action pairs can be stored in a look-up table, the problem is considered small. If S is large enough to preclude such storage, the task is considered large. Obviously, a decision problem with a continuous state space is always large. As discussed above, standard reinforcement-learning algorithms associate every state-action pair with a number, which amounts to a storage requirement of O(|S||A|). When S is large this requirement cannot be fulfilled, and therefore one must resort to some form of approximation. Unfortunately, it is well known that the combination of reinforcement learning with general function approximators can easily become unstable [2,18,19]. One alternative is to use approximators with a linear dependence on the parameters, which results in stable reinforcement-learning algorithms [18,9]. However, the performance of such algorithms is generally very sensitive to the set of features used in the approximation, which, in principle, must be handcrafted. Evolutionary algorithms can be easily combined with any type of approximator, and hence they can be more naturally applied to decision problems with a large state space.

4. Stationary/Non-stationary: In a stationary decision problem the dynamics of the environment are fixed, that is, the rules governing the transitions of the agent and the delivery of rewards do not change over time. A non-stationary problem, in contrast, is characterized by an environment that changes over time, which means the performance of a decision policy may also vary with time. A non-stationary decision problem is considerably more difficult than a stationary one for both reinforcement learning and evolutionary algorithms. Nevertheless, both approaches can be made to work on these problems with slight modifications of their standard forms. In the case of reinforcement learning, a
non-stationary environment imposes a particularly severe version of the exploration/exploitation dilemma [17]. In order to address this problem, one must adopt techniques that guarantee a constant exploration of the environment, such as offering bonus rewards in states that have not been visited for a long time [17]. In the case of evolutionary computation, the standard strategy to address non-stationarity is to maintain diversity within the population of candidate decision policies [12]. This way, changes in the dynamics of the problem will favor policies that perform better in the new environment, guiding the evolutionary search in the right direction.

5. Episodic/Continual: In an episodic task the interaction between agent and environment can be naturally broken into trials or episodes [17]. Each episode ends in a special state called a terminal state; there might be one or several such states. In chess, for example, every configuration of the board representing the end of a game would be a terminal state. In continual tasks the decision process goes on endlessly, without a clear criterion for interrupting it. One example of a continual task is automated stock trading, in which an agent must decide whether to buy or sell stocks depending on the market situation. Though not as commonly noted as the previous four categories, the distinction between episodic and continual tasks is of particular interest for evolutionary computation. It is well known that the search performed by evolutionary algorithms is based on information gathered between episodes, but not within them. This creates difficulties in the evaluation of candidate decision policies for a continual task. Since in this case there is no clear criterion for interrupting the interaction of the agent with the environment, the evaluation of an individual must be truncated at an arbitrary point. Depending on the problem, it might be difficult to determine how much experience is necessary to properly measure the quality of an individual. From the reinforcement-learning perspective, the distinction between episodic and continual tasks is mostly irrelevant, since in the ontogenetic paradigm learning takes place both between and within episodes.

Notice that the dimensions of classification discussed above are orthogonal to each other, that is, the inclusion of a decision problem in one category does not influence its classification with respect to another criterion. This amounts to 2^5 = 32 different classes of decision problems. Obviously, it is possible to refine the above classification system by adding other dimensions to it. For example, at a higher level of generality it is possible to distinguish problems with a continuous action space from those with a finite number of actions available. Similarly, one can consider tasks in which the interaction between agent and environment happens continuously rather than at discrete time intervals [13]. The next section discusses another way to extend the above classification scheme.
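A minimal sketch of the discrete-time interaction underlying all five dimensions may help fix ideas. The Env and Agent interfaces below are hypothetical illustrations of our own, not code from the paper:

    def run_episode(env, agent, max_steps=10000):
        # Discrete-time loop: observe state s, choose action a, receive
        # reward r for the transition s --a--> s', and repeat until a
        # terminal state is reached (episodic task) or the step budget
        # runs out (the truncation needed for continual tasks).
        s = env.reset()
        total_reward = 0.0
        for t in range(max_steps):
            a = agent.act(s)
            s_next, r, terminal = env.step(a)
            total_reward += r
            if terminal:
                break
            s = s_next
        return total_reward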
3 A New Dimension of Classification
Among the classification criteria discussed in the previous section, the distinction between continual and episodic tasks is especially important for evolutionary
methods. As discussed, continual tasks may create difficulties for these methods because the evaluation of an individual must be interrupted at a somewhat arbitrary point. Though this is certainly an obstacle, it does not prevent the use of evolutionary computation in the solution of continual decision problems. In fact, this issue is similar to that of generalization in supervised learning, a well-understood problem for which several effective techniques exist [8].

Based on the discussion above, one may come to the following conclusion: evolutionary methods work well on episodic problems and, as long as the right techniques are employed, they can also be used on continual tasks. Unfortunately, the situation is a bit more complicated than that. Surprisingly enough, the most severe limitation of evolutionary computation manifests itself not on continual tasks, but on a particular type of episodic decision problem.

As discussed before, in episodic tasks the interaction between agent and environment ends when the former reaches one of possibly many terminal states. But terminal states are not all the same; some represent desirable situations while others represent situations one would rather stay away from. This gives rise to the following definitions: if a terminal state is associated with the accomplishment of a task, such as winning a game or finding the exit of a maze, it is called a goal state. If, on the other hand, a terminal state represents an error of the agent, it is called a dead-end. Examples of dead-ends include dropping the ball in a game like volleyball, falling off a suspended race track, or losing all the money in a game of chance. This distinction between different types of terminal states induces a division of episodic tasks into two subcategories:

5.1. Goal seeking/Error avoidance: An episodic decision problem is called a goal-seeking task if its terminal states represent the accomplishment of the task. Usually, arrival at a goal state is accompanied by a positive reward or by the ceasing of a stream of negative rewards. In an error-avoidance episodic task the terminal states are associated with undesirable situations. More specifically, the terminal states are "dead-ends" representing bad decisions that are irreversible. The delivery of rewards in an error-avoidance task follows the opposite logic to that of goal-seeking tasks.

Notice that, in contrast with the categories discussed in the previous section, the above classes of decision problems are not mutually exclusive, since it is possible for a task to simultaneously have goals and dead-ends. For example, Randløv and Alstrøm [14] proposed a task in which the objective is to balance and ride a bicycle to a target location. Thus, one must avoid terminal states representing falling off the bicycle while seeking those terminal states associated with the goal region.

In principle, there is no reason to believe the distinction between goal-seeking and error-avoidance tasks is of any relevance to reinforcement learning. In fact, it is possible to find in the literature examples of reinforcement-learning applications which can be clearly identified with both categories [17]. In contrast, the performance of evolutionary methods can be highly influenced by the type of episodic decision problem at hand. Usually, error-avoidance tasks cause no trouble for evolutionary computation. This is because in this type of problem it is
straightforward to rank unsuccessful candidate solutions: the longer it takes for a decision policy to reach a dead-end, the better its evaluation. As a side effect, in an error-avoidance task the time spent in the evaluation of an individual is normally proportional to its quality. This is also a very desirable property, since it induces a smart allocation of computational resources: the evolutionary process will quickly evaluate a large number of poor solutions and eventually focus on candidate decision policies that are worth the computational effort.

Unfortunately, the situation with goal-seeking tasks is quite different. In this type of task one is interested in finding a decision policy able to reach a specific set of states. Since the position of such states is not known beforehand, it is not clear how to compute the kind of quality measure required by the phylogenetic search, such as the distance from a given state to the closest goal. This makes it hard for evolutionary algorithms to differentiate between candidate solutions that are unable to accomplish the task. Of course, there are situations in which it is possible to estimate the quality of a decision policy based on additional information about the problem. However, when such information is not available, the evolutionary process is reduced to a random search until a successful solution luckily emerges in the population. Depending on the difficulty of the task, this might be a very unlikely event [1]. Notice that this issue is not related to design choices such as the type of representation or the genetic operators used by the evolutionary algorithm. In fact, it is intrinsic to any goal-seeking task and any optimization method that bases its search on the relative merit of candidate solutions. Reinforcement-learning algorithms do not suffer from this difficulty because they evaluate each decision made by the agent individually. Hence, even before reaching the goal, the agent has an estimate of the effects of the decisions made so far.

The distinction between goal-seeking and error-avoidance episodic tasks sheds some light on the apparent inconsistency in previous accounts of experiments comparing reinforcement learning and evolutionary algorithms. For example, in a sequence of scientific works initiated in 1993 by Whitley et al. [22], several researchers report an overwhelming advantage of evolutionary methods over reinforcement learning in experiments with a control problem known as the pole-balancing task [11,10,7,15,6]. In the pole-balancing task one has to apply forces to a wheeled cart moving along a limited track in order to keep a pole hinged to the cart from falling over. When formulated as a Markovian problem, the characteristic of pole-balancing that seems to favor evolutionary computation the most is the task's continuous state space. Nevertheless, the fact that it is an error-avoidance task may play an important role as well. The experiments of Barreto and Anderson [1] with the Acrobot task corroborate this hypothesis. The Acrobot is a continuous problem in which the objective is to help a gymnast-robot swinging on a high bar to raise its feet above the bar. It is, therefore, a goal-seeking task. In their experiments with the Acrobot, Barreto and Anderson were unable to find a single decision policy able to accomplish the task when using an evolutionary algorithm. In contrast, reinforcement learning easily solved the problem.
4 An Illustrative Experiment
This section presents a computational experiment to illustrate the issues involved in the solution of episodic tasks using evolutionary methods. The experiment concerns a deterministic environment that can be easily configured as either an error-avoidance or a goal-seeking task. In the proposed problem an agent must learn how to perform a task by directly interacting with a two-dimensional maze. The dynamics of the maze follow the convention usually adopted in this type of task: at every state, there are four actions available—north, south, east and west—whose effect is the movement of the agent in the corresponding direction. To allow for a broader analysis, the experiments were not performed on a single maze, but rather on a set of mazes with similar characteristics. All mazes were based on n × n grids and had one entrance at the upper-left corner. The position of the goal was selected at random. The mazes were generated in such a way that all states were connected to at least two other states (see Fig. 1).

Two formulations of the problem were used. The first one is a typical error-avoidance task: the agent must travel inside the maze for as long as possible without hitting any walls (thus, the exit of the maze is ignored). If the agent runs into a wall, it gets a negative reward and is repositioned at the entrance of the maze. In order to avoid trivial solutions in which the agent repeatedly visits two neighboring cells, the state visited at time t was treated as a wall at time t + 1 (but not at t + 2). This forced the agent to look for cycles within the maze. The second version of the problem has the usual description of a maze: the agent must find a path from the entrance to the exit (that is, it gets a positive reward when it finds the goal state). This is clearly a goal-seeking task. Notice that both versions of the problem are deterministic, stationary, and episodic. For the sizes of mazes used in the experiments, they are also small. Since in the error-avoidance task the transitions depend on the states previously visited by the agent, this is a non-Markovian problem; the goal-seeking task is Markovian.

The evolutionary method adopted in the experiments was a generational genetic algorithm with a population of 100 individuals. The decision policies were represented as n²-dimensional vectors, where n is the dimension of the maze. The elements of the vectors were one of the four actions available in the
Fig. 1. Example of 10 × 10 maze
task, indicating the direction selected by the policy in each state of the maze. A two-point crossover was employed for recombination, with a probability of occurrence of 0.9 [5]. After an individual had been created, the random mutation operator was independently applied to each of its elements with probability 1/n². The candidate solutions were selected for reproduction based on linear ranking using a selective pressure of 1.5 [21]. At every generation the best solution found up to that point was copied to the next generation without modification.

As one would expect, the evaluation of candidate decision policies was easy in the error-avoidance version of the task: the fitness of an individual was defined as the number of steps it performed before running into a wall. Notice that, given the representation used, if a decision policy could avoid the walls for n² steps, it could do so indefinitely. Thus, upon performing this number of steps a decision policy was considered successful.

The definition of an evaluation function was not as straightforward in the goal-seeking version of the task. Since one does not know the number of steps separating a given state from the goal, it is necessary to resort to alternative sources of information to build an evaluation scheme that (hopefully) helps the agent to accomplish its objective. By analyzing the dynamics of the present task and the representation used by the genetic algorithm, it is clear that a decision policy cannot escape from the maze if it runs into a wall or visits a state more than once. Hence, the evaluation of an individual can be interrupted as soon as one of these events occurs. Still, one is left with the problem of estimating the quality of a decision policy—that is, how far it is from completing the task. In order to illustrate the issues involved in the derivation of an evaluation function for a goal-seeking task, two scenarios were considered in the experiments. In the first one the designer knows nothing about the task but its dynamics. In this case, a possible approach is to stimulate the candidate decision policies to explore the maze as much as possible, in the hope that one will luckily find the goal. Thus, in this version of the task the fitness of an individual was defined as the number of steps it executed before hitting a wall or encountering an already-visited state. In the second formulation of the goal-seeking task the designer knew the coordinates of the goal state.¹ In this case, the fitness of an individual was defined as n² − d, where d is the Manhattan distance between the last state visited by the agent and the goal.

Figure 2 shows the results obtained by the genetic algorithm after 500 generations on mazes of different sizes. Since the number of generations was fixed in the experiments, it is natural to observe a decrease in the success rate of the algorithm as the dimension of the mazes increases. However, the performance of the evolutionary method degenerates much faster on the goal-seeking task, as shown in the figure. This illustrates the issue discussed in the last section: since in the goal-seeking version of the problem the evaluation functions do not reflect the true objective of the agent, the performance of the algorithm deteriorates when a lucky move into the goal becomes less and less likely to occur.
¹ The coordinates of the states were defined as their indices in the n × n grid underlying the maze.
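The encoding, operators and fitness functions just described can be summarised in a short sketch (our own reconstruction in Python; the Maze interface and its method names are hypothetical):

    import random

    ACTIONS = ("north", "south", "east", "west")

    def random_policy(n):
        # A candidate decision policy: one action per cell of the n x n maze.
        return [random.choice(ACTIONS) for _ in range(n * n)]

    def two_point_crossover(a, b):
        # Two-point crossover, applied with probability 0.9 in the experiments.
        i, j = sorted(random.sample(range(len(a)), 2))
        return a[:i] + b[i:j] + a[j:]

    def mutate(policy, n):
        # Independent per-gene mutation with probability 1/n^2, as in the text.
        p = 1.0 / (n * n)
        return [random.choice(ACTIONS) if random.random() < p else g
                for g in policy]

    def fitness_error_avoidance(policy, maze):
        # Number of steps taken before running into a wall; a policy that
        # survives n^2 steps can do so indefinitely and is deemed successful.
        return maze.steps_before_collision(policy)

    def fitness_goal_seeking_distance(policy, maze, n):
        # n^2 minus the Manhattan distance between the last state visited
        # and the goal (the informed goal-seeking variant).
        last = maze.last_state_visited(policy)
        return n * n - maze.manhattan_to_goal(last)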
[Figure 2: plot of success rate (%) against maze size (5×5, 10×10, 15×15, 20×20) for three settings: error avoidance, goal seeking with exploration, and goal seeking with distance.]

Fig. 2. Results obtained by the genetic algorithm on the maze task. In the error-avoidance task a run was considered successful if the agent could avoid the walls for n² steps. In the goal-seeking task a successful run was characterized by the agent finding the goal. The values correspond to an average computed over 500 mazes of each dimension. The same mazes were used for the goal-seeking and error-avoidance tasks.
Perhaps more surprising is the comparison between the two evaluation functions used in the goal-seeking task. Since the fitness based on the distance to the goal uses more information than its naive explorative counterpart, one would expect the former to generate better results. The reason why this is not so is unclear. The explanation might be related to the presence of multiple local minima represented by states close to the goal in the Euclidean sense but far away inside the maze. Incidentally, this experiment shows that the availability of information about a goal-seeking problem is no guarantee of success for an evolutionary method.
5 Conclusions
Evolutionary computation and reinforcement learning address the sequential decision-making problem in completely different ways, and their advantages and drawbacks may be emphasized or disguised by the features of a specific task [17,12]. Therefore, an important goal is to discover characteristics of a decision problem that can be easily identified and that at the same time provide some hint as to which of the two approaches should be adopted to solve the problem.

The contribution of this paper to the above scenario is twofold. First, it provides a systematic review of the criteria most commonly used to classify decision problems and discusses their impact on the performance of reinforcement learning and evolutionary computation. To the best of the authors' knowledge, this is the first attempt to organize this information in an easily accessible form. The paper also proposes the division of the class of episodic problems into two subcategories, which delimits a set of decision problems particularly difficult for optimization techniques in general and evolutionary methods in particular.
In an error-avoidance task the decision-making process lasts until the agent makes a mistake. This category of decision problem is usually amenable to evolutionary methods, since in this case it is trivial to evaluate decision policies that have failed to accomplish the task. On the other hand, in goal-seeking tasks unsuccessful candidate solutions cannot be easily ranked, since all decision policies unable to find a goal state are in principle equally bad. This makes the use of evolutionary algorithms contingent on the availability of prior information about the problem. There also exist episodic decision problems whose characteristics can be identified with both error-avoidance and goal-seeking tasks. In problems with such mixed features, evolutionary methods are expected to perform well without specific knowledge about the problem if the avoidance of errors helps the agent to get to the goal.

As discussed in the paper, the characteristics of a sequential decision problem have different effects on reinforcement learning and evolutionary algorithms: some characteristics will favor one over the other, while others will have the same impact on both approaches. In some cases, the description of a given problem will make it obvious which paradigm one should resort to. In general, however, one should not expect the classification of a decision problem to provide such a clear-cut answer. Therefore, it might be a good idea to see evolutionary computation and reinforcement learning as complementary rather than mutually exclusive approaches. This seems to be the underlying assumption behind learning classifier systems, a rule-based approach to solving decision problems which combines ideas from reinforcement learning and evolutionary computation [3]. It should be noted, however, that learning classifier systems still lack a strong mathematical basis, and much research must be done in order to understand the subtle interactions between the ontogenetic and the phylogenetic learning processes [4].
Acknowledgments

The authors gratefully acknowledge the support provided by the Brazilian agencies CAPES, CNPq (grant 311651/2006-2), and FAPERJ (grants E-26/102.825/2008 and E-26/102.025/2009).
References

1. Barreto, A.M.S., Anderson, C.W.: Restricted gradient-descent algorithm for value-function approximation in reinforcement learning. Artificial Intelligence 172(4-5), 454-482 (2008)
2. Boyan, J.A., Moore, A.W.: Generalization in reinforcement learning: safely approximating the value function. In: Advances in Neural Information Processing Systems, pp. 369-376. MIT Press, Cambridge (1995)
3. Bull, L., Kovacs, T. (eds.): Foundations of Learning Classifier Systems. Springer (2005)
4. Drugowitsch, J.: Design and Analysis of Learning Classifier Systems: A Probabilistic Approach. Springer, Heidelberg (2008)
5. Goldberg, D.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading (1989)
6. Gomez, F., Schmidhuber, J., Miikkulainen, R.: Accelerated neural evolution through cooperatively coevolved synapses. Journal of Machine Learning Research 9, 937-965 (2008)
7. Gomez, F.J.: Robust Non-linear Control through Neuroevolution. Ph.D. thesis, The University of Texas at Austin (2003), Technical Report AI-TR-03-303
8. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Heidelberg (2002)
9. Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. Journal of Machine Learning Research 4, 1107-1149 (2003)
10. Moriarty, D.E.: Symbiotic Evolution of Neural Networks in Sequential Decision Tasks. Ph.D. thesis, The University of Texas at Austin (1997), Technical Report UT-AI97-257
11. Moriarty, D.E., Miikkulainen, R.: Efficient reinforcement learning through symbiotic evolution. Machine Learning 22(1-3), 11-32 (1996)
12. Moriarty, D.E., Schultz, A.C., Grefenstette, J.J.: Evolutionary algorithms for reinforcement learning. Journal of Artificial Intelligence Research 11, 241-276 (1999)
13. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., Chichester (1994)
14. Randløv, J., Alstrøm, P.: Learning to drive a bicycle using reinforcement learning and shaping. In: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 463-471. Morgan Kaufmann Publishers Inc., San Francisco (1998)
15. Stanley, K.O.: Efficient Evolution of Neural Networks through Complexification. Ph.D. thesis, The University of Texas at Austin (2004), Technical Report AI-TR-04-314
16. Sutton, R.S.: Learning to predict by the methods of temporal differences. Machine Learning 3, 9-44 (1988)
17. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
18. Tsitsiklis, J.N., Roy, B.V.: Feature-based methods for large scale dynamic programming. Machine Learning 22, 59-94 (1996)
19. Tsitsiklis, J.N., Roy, B.V.: An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42, 674-690 (1997)
20. White, D.J.: Real applications of Markov decision processes. Interfaces 15, 73-83 (1985)
21. Whitley, D.: The GENITOR algorithm and selective pressure: why rank-based allocation of reproductive trials is best. In: Schaffer, J. (ed.) Proceedings of the Third International Conference on Genetic Algorithms and their Applications, pp. 116-121. Morgan Kaufmann, San Francisco (1989)
22. Whitley, D., Dominic, S., Das, R., Anderson, C.W.: Genetic reinforcement learning for neurocontrol problems. Machine Learning 13(2-3), 259-284 (1993)
Author Index

Aickelin, Uwe 122
Augusto, Douglas A. 194
Bach, Stefan R. 13
Balasubramaniam, Sasitharan 49
Barbosa, Helio J.C. 194
Baron, Claude 98
Barreto, André M.S. 194
Botvich, Dmitri 49
Branke, Jürgen 13
Bredeche, N. 110
Coello Coello, Carlos A. 146
Costa, Ernesto 25
Coudert, Thierry 98
De Falco, I. 1
Della Cioppa, A. 1
Di Camillo, Barbara 74
Eiben, A.E. 110
Galván-López, Edgar 170
Garibaldi, Jonathan M. 122
Garza Fabre, Mario 146
Geneste, Laurent 98
Haasdijk, E. 110
Hoai, Nguyen Xuan 170
Jennings, Brendan 49
Kandavanam, Gajaruban 49
Kattan, Ahmed 182
Lazaro-Ponthus, Delphine 37
Legoupil, Samuel 37
López-Ibáñez, Manuel 134
Louchet, Jean 37
Lutton, Evelyne 37
Maisto, D. 1
Mckay, Bob 170
Melo, Leonor 25
Montes de Oca, Marco A. 74
O'Neill, Michael 170
Pereira, Francisco 25
Pitiot, Paul 98
Poli, Riccardo 182
Rocchisani, Jean-Marie 37
Rodriguez-Tello, Eduardo 86
Sambo, Francesco 74
Scafuri, U. 1
Sipper, Moshe 158
Stützle, Thomas 74, 134
Tarantino, E. 1
Torres-Jimenez, Jose 86
Toscano Pulido, Gregorio 146
Tsutsui, Shigeyoshi 61
Uy, Nguyen Quang 170
Uyar, A. Şima 13
Vidal, Franck P. 37
Whitbrook, Amanda M. 122
Wolfson, Kfir 158