Lecture Notes in Computer Science 899
Edited by G. Goos, J. Hartmanis and J. van Leeuwen
Advisory Board: W. Brauer, D. Gries, J. Stoer
Wolfgang Banzhaf, Frank H. Eeckman (Eds.)
Evolution and Biocomputation
Computational Models of Evolution
Springer
Series Editors
Gerhard Goos, Universität Karlsruhe, Vincenz-Priessnitz-Straße 3, D-76128 Karlsruhe, Germany
Juris Hartmanis, Department of Computer Science, Cornell University, 4130 Upson Hall, Ithaca, NY 14853, USA
Jan van Leeuwen, Department of Computer Science, Utrecht University, Padualaan 14, 3584 CH Utrecht, The Netherlands

Volume Editors
Wolfgang Banzhaf, Department of Computer Science, University of Dortmund, D-44221 Dortmund, Germany
Frank H. Eeckman, Human Genome Center, Lawrence Berkeley Laboratory, Berkeley, CA 94720, USA
CR Subject Classification (1991): E2, I.2, G.2, J.3
ISBN 3-540-59046-3 Springer-Verlag Berlin Heidelberg New York
CIP data applied for
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1995
Printed in Germany
Typesetting: Camera-ready by author
SPIN: 10485391 45/3140-543210 - Printed on acid-free paper
Preface
Biology is the eternal interdisciplinary subject. With its position right between the physical sciences and the behavioral/social sciences, its progress always depends on good relationships with neighboring disciplines. This was true even in the good old times of the 18th and 19th centuries, when many of the lasting contributions of biology depended on collaboration with geography and geology, as in the case of evolutionary theory, or with mathematics, as in the case of Mendelian genetics. The 20th century cell biological and molecular revolution is fueled by the influx of concepts and techniques from chemistry and physics. Now, close to the end of the 20th century, a new ally to biology is emerging: computer science. I consider the connection of biology to computer science to be of fundamental significance for the future of biological science (as a biologist I cannot talk about the future of computer science). This relationship develops on two levels: the methodological and the fundamental conceptual level. Obviously, advances in computer science are important in handling the new types of data biology is currently producing. They range from nucleotide sequences to data about foodweb structure, each requiring new techniques to handle and analyze. However, the connection to computer science goes beyond data handling and analysis and addresses one of the deep unsolved problems of biology: the problem of organization. As far as we know every biological event is based on ordinary physico-chemical processes; no special vital force keeps organisms alive. What makes organisms different from the so-called inanimate world is the spatial and temporal organization of these physico-chemical processes. But there is no scientific paradigm which allows us to tackle this problem in a systematic fashion. Similarly, computation is due to the spatial and temporal organization of data streams based on fairly ordinary physical events in processors.
This is the conceptual ground on which biology and computer science meet and have, in my opinion, the chance to make a lasting impact. It is this assessment of the importance of computer science to biology that makes me especially welcome the publication of this book. It brings together computer scientists and biologists to discuss one of the areas where the connection has already been established, evolutionary theory and evolutionary algorithms, but where the communication between the disciplines is still limited. Computer scientists and biologists have fundamentally different ways of looking at evolutionary theory. This makes their interaction so promising and interesting. For a biologist evolution is a fact, and he/she uses population genetics theory to understand how it happened. In contrast, computer scientists, at least those working in evolutionary algorithms, want to harness the principle of natural selection to find solutions to new problems. This means that the computer scientists have a forward looking perspective, while the biologist has a backward looking perspective. Consequently it was the computer scientists who first had to deal seriously with the problem that mutation/selection only works under certain conditions, a fact known as the representation problem. In biology this topic was rarely discussed and mostly overlooked, even if it is a serious problem in explaining the evolution of complex adaptations (Frazetta, 1975, Riedl, 1975,
Bonner, 1988). Who or what has chosen the right genetic representation for the living species to be able to adapt? Another area, reflected in the contributions to this book, where computer science and biology have much in common is the universality of organizational principles. The root of computer science is the discovery of abstract, universal calculation machines that work regardless of their hardware realization. The best known universal principle in biology has already been mentioned: natural selection. But there is the justified expectation that there ought to be more of these principles. One reason is that the principle of natural selection does not easily explain the class of problems with the labels "The origin of..." (Fontana and Buss, 1994). This is the philosophical basis of artificial life research (Langton, 1989). I agree, but remark that no such principle has been found yet (perhaps the "edge of chaos"?), and I have also not seen a paradigm which promises progress along these lines. However, if principles exist, they are most likely to be found by the combined efforts of people trained in both computer science and biology.
Yale University, November 1994
Günter P. Wagner
References
Bonner, J.T. (1988): The evolution of complexity. Princeton University Press, Princeton, NJ.
Fontana, W. and Buss, L.W. (1994): "The arrival of the fittest": toward a theory of biological organization. Bull. Math. Biol. 56:1 - 64.
Frazetta, T.H. (1975): Complex Adaptations in Evolving Populations. Sinauer Ass., Sunderland, MA.
Holland, J.H. (1992): Adaptation in natural and artificial systems. MIT Press, Cambridge, MA.
Langton, C.G. (1992): Computation at the edge of chaos: Phase transitions and emergent computation. Manuscript.
Langton, C.G. (1989): Preface to Artificial life. Edited by C.G. Langton, Santa Fe Institute, Studies in the Sciences of Complexity Series, Vol VI, Addison Wesley, Redwood City, CA.
Rechenberg, I. (1973): Evolutionsstrategie. Friedrich Frommann Verlag, Stuttgart.
Riedl, R. (1975): Die Ordnung des Lebendigen. Systembedingungen der Evolution. Verlag Paul Parey, Hamburg and Berlin.
Contents
Editors' Introduction
    W. Banzhaf and F.H. Eeckman
Aspects of Optimality Behavior in Population Genetics Theory
    W.J. Ewens and A. Hastings
Optimization as a Technique for Studying Population Genetics Equations
    A. Hastings and G.A. Fox
Emergence of Mutualism
    G. Duchateau-Nguyen, G. Weisbuch and L. Peliti
Three Illustrations of Artificial Life's Working Hypothesis
    M.A. Bedau
Self-Organizing Algorithms Derived from RNA Interactions
    W. Banzhaf
Modeling the Connection Between Development and Evolution: Preliminary Report
    E. Mjolsness, C.D. Garrett, J. Reinitz and D.H. Sharp
Soft Genetic Operators in Evolutionary Algorithms
    H.-M. Voigt
Analysis of Selection, Mutation and Recombination in Genetic Algorithms
    H. Mühlenbein and D. Schlierkamp-Voosen
The Role of Mate Choice in Biocomputation: Sexual Selection as a Process of Search, Optimization and Diversification
    G.F. Miller and P.M. Todd
Genome Growth and the Evolution of the Genotype-Phenotype Map
    L. Altenberg
About the Contributors
Index
EDITORS' INTRODUCTION
This volume comprises papers presented at an interdisciplinary workshop on biocomputation entitled "Evolution as a computational process" held in Monterey, California, in July 1992. The Monterey workshop brought together scientists from diverse backgrounds to discuss the implications of viewing evolution as a computational process. As such, evolution can be considered in the more general framework of biocomputation. Biocomputation may be broadly defined by its emphasis on understanding biological systems as computational devices. Many biocomputation subgroups have identified themselves clearly over the years: computational population biology, computational biochemistry, computational neuroscience, etc. Altogether, biocomputation is situated at the intersection between the biological sciences and computer science. Scientists and engineers with different backgrounds converge here, bringing with them specific insights and viewpoints and exporting new ideas to outside areas. Biocomputation may also be considered part of an ambitious enterprise to uncover the secrets of the living universe. We would like to understand the genetic library each of us is carrying around; we would like to formulate principles of information processing in organisms and other living systems that have evolved over billions of years; and we would like to know (or at least have a well founded scientific hypothesis) whether life should be seen as a single and unique event in the history of our Universe or whether there is a large probability of other forms of life elsewhere, maybe in a nearby galaxy. Computer scientists and engineers have already started to use strategies that are modeled loosely after Nature's recipes for optimization, adaptation and improvement of designs. Examples are neural networks [1]-[3] and evolutionary algorithms [4]-[8], which have recently entered the world of industrial applications after decades-long investigations in academia.
In our opinion we have only just started to scratch the surface and it seems likely that many more treasures will await us as we further our understanding of biological computation. A central notion in biocomputation is that of Emergent Properties or Emergence. Emergence started as a philosophical idea early in this century [9]-[12]. It describes the dynamical process of qualitative changes, e.g. in the form of a creation of new structures and capabilities, and of complexity growth in nonlinear systems due to increased interaction between components. Consequently,
researchers are now paying more attention to the dynamical aspects of the origin of systems. Investigations of emergent phenomena in natural and artificial systems [13]-[16] are playing a prominent role in our understanding of self-organization and evolution. Since emergence usually needs exponential growth rates (at least early on in its dynamics) we can assume that positive feed-back loops are in effect. A wealth of emergent phenomena can therefore be found in communication links. Such links are effective at the organismal level, for example in the emergence of language from primitive utterances, or at the societal level, as in the emergence of common technologies through reinforcement. Instabilities caused by positive feed-back loops in a system are required to move the system from one qualitative stage to the next. The requirement for complex systems to teeter "at the edge of chaos" [17],[18] or near instabilities [19] is therefore understandable. Only the violent forces of instability allow a system to truly evolve. However, continuous exponential growth of unstable modes is simply not possible in a finite world. Sooner or later the stabilizing forces of evolution will limit growth by subjugating organisms to selection. Selection is a consequence of competition that itself results from finite resources. It is only when the instabilities are held at bay by selection that we can start to see structure in a system. Stabilizing selection and de novo emergence are the main themes of evolution viewed as a computational process. The general course of evolution has often been associated with that of an adaptive search. Hence it has been a longstanding controversy in evolutionary biology whether evolution really is a dynamical process that searches for optima [20] or not. What then is optimized, and what is the measure of quality in evolution?
Warren Ewens and Alan Hastings [21] discuss this question in the context of a one-locus population genetics model and propose a new interpretation of results obtained by Svirezhev [22]. The basic idea is to formulate a Hamiltonian principle similar to the ones formulated in physics for various dynamical problems. Evolutionary dynamics then follows naturally by computation of extrema of the corresponding scalar function in the integrand. Alan Hastings and Gordon Fox [23] further elaborate on this idea and derive results for two-locus models. Why then, one might ask, are there multiple solutions if evolution is an optimization process? Guillemette Duchateau et al. [24] answer this question in a dynamical model of the emergence of coelenterates-algae symbiosis. Interestingly, they are able to show a region of co-existence between symbiosis and selfishness. They interpret their results by drawing an analogy to phases in thermodynamics. The thermodynamical metaphor is also at the center of the argument of Mark Bedau [25]. In the context of simple artificial life models he has devised, he examines statistical macrovariables like mean values and diversity measures of traits within a population as well as adaptive evolutionary activity in the population as a whole. He demonstrates convincingly that the identification of macrovariables is key to understanding such models.
One of us (W.B., [26]) highlights another important aspect of self-organization: the appearance of organizing entities acting on themselves. He does so by discussing a model of self-organizing algorithms. Since the days of von Neumann [27], this theme has been reverberating in the self-organization literature [28]. It finds its expression here in a very simple form using sequences of binary numbers. Lee Altenberg [29] considers the genotype-phenotype mapping and demonstrates the advantage of his "constructional" selection in the process of adaptation. Specifically, he considers the variational aspect of the representation problem and pleiotropy. The relation and even interaction of evolution and development is discussed in the paper by Eric Mjolsness et al. [30]. Their model is based on a regulatory network for development introduced earlier [31] stating a grammar for development. First observations in the model show the emergence of different cell types in a simulation of multicellular organisms. The next two papers discuss a class of algorithms that have become prominent in recent years [4]-[8]. Hans-Michael Voigt [32] blends evolutionary algorithms with fuzzy logic by introducing soft genetic operators. He compares the performance of these newly invented operators to what he calls hard genetic operators and is able to draw favorable conclusions for the fuzzy operators. Heinz Mühlenbein and Dirk Schlierkamp-Voosen [33] study the Breeder genetic algorithm and derive theoretical and empirical conclusions from the behavior of this algorithm applied to selected problems. The well-known response-to-selection equation of quantitative genetics [34] plays a central role in their argument. Finally, Geoffrey Miller and Peter Todd [35] provide strong arguments against the popular idea that natural selection can explain evolution.
Going back to Darwin, they state that sexual selection is central in explaining innovation upon which natural selection might act only later on. As such, the emergence of new traits can be understood as resulting from the communication events of sexual selection. The more general statement about the necessity of instabilities through positive feed-back loops, mentioned in the beginning of this introduction, finds a very clear confirmation here. The workshop in Monterey was a truly interdisciplinary event and we attempted to bring together researchers on both sides of the issue, the biological and the computational. The aim of our meeting was to highlight and explore the notion of evolution as a giant computation being carried out over a vast spatial and temporal scale. We hope that the collection of essays presented here successfully reflects the spirit and enthusiasm at the meeting. Indeed, the impression was that computer scientists, mathematicians and physicists can learn about optimization from looking at evolution and that biologists may learn about evolution from studying artificial life, game theory, and mathematical optimization.
We would like to acknowledge the Institute for Scientific Computing Research at Lawrence Livermore National Laboratory (LLNL) and the Biocomputation Center at Sandia National Laboratory (SNL) for their generous support of this meeting. We would also like to thank the organizing committee for providing us with this opportunity for interdisciplinary communication. It is our pleasure to thank all the participants of the workshop and especially the invited speakers, for their valuable contributions. Chris Ghinazzi did a wonderful job coordinating the meeting and the special event banquet. We are grateful to Helge Baler who generated an index for the book. Last but not least we would like to express our gratitude to Dr. Alfred Hofmann from Springer, Heidelberg, for his friendly and helpful cooperation.
Wolfgang Banzhaf
Frank Eeckman
Dortmund and Berkeley, November 1994
References
1. Hecht-Nielsen, R. (1989): Neurocomputing. Addison-Wesley, Reading, MA.
2. Hertz, J., Krogh, A. and Palmer, R. (1991): Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, CA.
3. Wasserman, P.D. (1993): Advanced Methods in Neural Computing. Van Nostrand Reinhold, New York, NY.
4. Rechenberg, I. (1973): Evolutionsstrategien. Frommann-Holzboog, Stuttgart.
5. Holland, J.H. (1975): Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI.
6. Schwefel, H.P. (1981): Numerical Optimization. Wiley, Chichester, UK.
7. Goldberg, D. (1989): Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA.
8. Michalewicz, Z. (1992): Genetic Algorithms + Data Structures = Evolution Programs. Springer, Berlin.
9. Morgan, Lloyd C. (1923): Emergent Evolution. Williams & Norgate, London.
10. Pepper, S.C. (1926): Emergence. Philos. 23:241 - 245.
11. Ablowitz, R. (1939): The Theory of Emergence. Philos. Science 177:393 - 396.
12. Angyal, A. (1939): The Structure of Wholes. Philos. Sci. 6:25 - 37.
13. Forrest, S. (1991): Emergent Computation. MIT Press, Cambridge, MA.
14. Kampis, G. (1991): Self-modifying Systems in Biology and Cognitive Science. Pergamon Press, Oxford, UK.
15. Cariani, P. (1991): Emergence and Artificial Life. In: Langton, C., Taylor, C., Farmer, J. and Rasmussen, S. (Eds.): Artificial Life II. Addison-Wesley, Redwood City, CA, 775 - 797.
16. Baas, N. (1994): Emergence, Hierarchies and Hyperstructures. In: Langton, C. (Ed.): Artificial Life III. Addison-Wesley, Redwood City, CA, 515 - 537.
17. Langton, C. (1991): Life at the edge of chaos. In: Langton, C., Taylor, C., Farmer, J. and Rasmussen, S. (Eds.): Artificial Life II. Addison-Wesley, Redwood City, CA, 41 - 91.
18. Kauffman, S. and Johnsen, S. (1991): Co-Evolution to the Edge of Chaos: Coupled Fitness Landscapes, Poised States and Co-Evolutionary Avalanches. In: Langton, C., Taylor, C., Farmer, J. and Rasmussen, S. (Eds.): Artificial Life II. Addison-Wesley, Redwood City, CA, 325 - 369.
19. Haken, H. (1983): Synergetics, an Introduction. Springer, Berlin.
20. Dupre, J. (1987): The latest on the best. MIT Press, Cambridge, MA.
21. Ewens, W. and Hastings, A. (1995): Aspects of Optimality Behavior in Population Genetics Theory. This volume, 7 - 17.
22. Svirezhev, Y.M. (1972): Optimum principles in genetics. In: Studies on Theoretical Genetics. USSR Academy of Science, Nowosibirsk. [in Russian]
23. Hastings, A. and Fox, G. (1995): Optimization as a Technique for Studying Population Genetics Equations. This volume, 18 - 26.
24. Duchateau, G., Weisbuch, G. and Peliti, L. (1995): Emergence of Mutualism. This volume, 27 - 52.
25. Bedau, M. (1995): Three Illustrations of Artificial Life's Working Hypothesis. This volume, 53 - 68.
26. Banzhaf, W. (1995): Self-organizing Algorithms derived from RNA interactions. This volume, 69 - 102.
27. von Neumann, J. (1966): Theory of Self-reproducing Automata. Edited and completed by Burks, A.W. University of Illinois Press, Urbana, IL.
28. Langton, C. (1989): Artificial Life. In: Artificial Life. Langton, C. (Ed.). Addison Wesley, Redwood City, CA.
29. Altenberg, L. (1995): "Constructional" Selection and the Evolution of the Genotype-Phenotype Map. This volume, 205.
30. Mjolsness, E., Garrett, C., Reinitz, J. and Sharp, D. (1995): Modeling the connection between Development and Evolution. This volume, 103 - 123.
31. Mjolsness, E., Sharp, D. and Reinitz, J. (1991): A connectionist model of development. Journal of Theoretical Biology 152:429 - 453.
32. Voigt, H.M. (1995): Soft Genetic Operators in Evolutionary Algorithms. This volume, 123 - 141.
33. Mühlenbein, H. and Schlierkamp-Voosen, D. (1995): Analysis of Selection, Mutation and Recombination in Genetic Algorithms. This volume, 142 - 168.
34. Falconer, D.S. (1981): Introduction to quantitative Genetics. Longman, London.
35. Miller, G. and Todd, P. (1995): The role of mate choice in biocomputation: Sexual selection as a process of search, optimization and diversification. This volume, 169 - 204.
Organizing Committee of the Biocomputation Workshop
Evolution as a Computational Process

Joachim Buhmann, Lawrence Livermore National Laboratory, now at Bonn University
Michael Cotvin, Sandia National Laboratory
Richard Durbin, Medical Research Council, Cambridge
Frank Eeckman, Lawrence Livermore National Laboratory, now at Lawrence Berkeley Laboratory
Richard Judson, Sandia National Laboratory
Nora Smiriga, Lawrence Livermore National Laboratory
Aspects of Optimality Behavior in Population Genetics Theory

W.J. Ewens 1 and Alan Hastings 2
1 Department of Biology, University of Pennsylvania, Philadelphia, PA 19104
2 Division of Environmental Studies, Center for Population Biology, and Institute for Theoretical Dynamics, University of California, Davis, CA 95616

Abstract. Optimality principles are central to many areas of the physical sciences, and often the simplest way of finding the evolutionary behavior of some dynamical system is by finding that path satisfying some optimality criterion. This paper discusses two aspects of the evolutionary paths followed by gene frequencies under natural selection as derived by optimality principles. The first, due to Svirezhev, is that when fitnesses depend on the genes at a single locus only, and random mating occurs, the evolutionary paths of gene frequencies, as determined by natural selection, minimize a functional which can be thought of as the sum of a kinetic and a potential energy. The second principle applies when fitness depends on all loci in the genome and random mating does not necessarily occur. The set of gene frequencies start at some point p in gene frequency space, and, some time later, under natural selection, are at some point q. There is a natural non-euclidean metric in the space of gene frequencies, and with this metric the distance from p to q is some value d. Then of all points in gene frequency space at distance d from p, the point q corresponding to natural selection maximizes the so-called partial increase in mean fitness, a central concept in a recent interpretation of the Fundamental Theorem of Natural Selection.
1 Optimality
It has long been known that many phenomena in the natural sciences exhibit optimality behavior, and the formalization of this goes back to the times of Fermat, Euler, Lagrange and Hamilton. An account of the use of optimality principles in science has been given in a recent paper by Schoemaker (1991) and the associated discussion. This discussion focused on the physical sciences, with comparatively little attention being paid to the biological sciences. Nevertheless, optimality concepts are of central interest in the biological sciences, as well as in areas such as biocomputation and the use of genetic algorithms which employ
biological concepts. The various chapters in this book witness this focus on optimality in these areas: in particular we refer to the companion paper by Hastings and Fox (1993). Optimality in the physical sciences is frequently associated with simplicity: often the easiest way of arriving at a physical principle is through an optimality requirement. By contrast, optimality considerations in the evolutionary biological sciences are sometimes associated with complexity and the resultant difficulties of reaching an optimum: a current trend in genetical evolution (Kauffman, 1993) focuses on the "complexity catastrophe" reached when a biological entity has evolved to such a complex state that it cannot readily evolve further to a different but more desirable state. On the other hand, a similarity between the physical and biological sciences concerns the choice of a suitable metric in the space in which dynamic behavior occurs. It is well known, for example, that in general relativity optimality behavior is exhibited in a space-time co-ordinate system endowed with a suitable metric: we will show later how choice of an appropriate metric in the space of gene frequencies leads to an optimality behavior that is not readily perceived using the standard euclidean metric. At a higher level, the Darwinian theory itself can be viewed as one in which a population continually strives for optimization through natural selection. In the controversy surrounding the two most important theories concerning the rewriting of the Darwinian paradigm in a Mendelian framework, proposed respectively by R.A. Fisher and Sewall Wright, the main point at issue concerned the different conditions assumed under each theory to be best suited to optimizing the evolutionary process.
We will discuss later the interpretation of the centerpiece of Fisher's theory, encapsulated in his "Fundamental Theorem of Natural Selection", and will claim that it has consistently been misunderstood since its introduction, and argue further that it is best presented in association with an evolutionary optimization behavior which we describe later. In this chapter we focus on two aspects of optimization which derive from the central dynamical equations of biological evolution, viewed as a genetic process describing changes in gene frequencies under natural selection. The first aspect concerns optimality properties of the path integral of a certain function of gene frequencies when mating is random and fitness depends on the genetic constitution at a single gene locus. The second concerns the case where fitness depends on many loci and mating is not necessarily at random, and focuses on the concept of partial increase in mean fitness. To make this exposition self-contained, we first outline the equations which describe the dynamics of evolutionary change when viewed as a genetic process. We assume throughout a monoecious diploid population of size so large that random changes in gene frequency can be ignored.
2 Dynamical Equations
We consider first the case of a gene locus "A", admitting alleles (gene types) $A_1, A_2, \ldots, A_k$. At the time of conception of a certain (parental) generation, the frequency of $A_iA_i$ is assumed to be $P_{ii}$ while that of $A_iA_j$ is $2P_{ij}$ $(i \neq j)$. It follows that the gene (more properly allelic) frequency $p_i$ of $A_i$ at this time is

$$p_i = \sum_j P_{ij}. \tag{1}$$

Under random mating we have $P_{ij} = p_i p_j$ (both for $i = j$ and $i \neq j$), and we will sometimes assume that this is the case. The (viability) fitness of $A_iA_j$, defined as a measure of the probability that an individual of this genotype will survive until the age of reproduction, is written $w_{ij}$. It follows that the frequency $P'_{ij}$ of this genotype at the age of reproduction is

$$P'_{ij} = \frac{w_{ij} P_{ij}}{\bar{w}}, \tag{2}$$

where $\bar{w}$, defined by

$$\bar{w} = \sum_i \sum_j P_{ij} w_{ij}, \tag{3}$$

is the mean fitness of the population. From this, the frequency $p'_i$ of $A_i$ at this later age is

$$p'_i = \sum_j \frac{w_{ij} P_{ij}}{\bar{w}}. \tag{4}$$

Thus the change $\delta_i$ in the frequency of $A_i$ between the two life stages is

$$\delta_i = \sum_j \frac{w_{ij} P_{ij}}{\bar{w}} - p_i, \quad (i = 1, 2, \ldots, k). \tag{5}$$

Since we normally assume that the frequency of $A_i$ in the daughter generation at the time of conception is the same as that in the parental generation at the age of reproduction, this is also the change in the frequency of $A_i$ from the time of conception of one generation to the time of conception of the next. To this extent, (5) represents a part of any model of the dynamical behavior of gene frequency change under natural selection. To develop further properties of this dynamic behavior further assumptions are necessary. One assumption often made is that of random mating. Under random mating the above equations simplify to

$$p'_i = \frac{p_i w_i}{\bar{w}}, \tag{6}$$

$$\delta_i = \frac{p_i (w_i - \bar{w})}{\bar{w}}, \quad (i = 1, 2, \ldots, k), \tag{7}$$

$$\bar{w} = \sum_i \sum_j p_i p_j w_{ij}, \tag{8}$$

where we define $w_i$ by

$$w_i = \sum_j p_j w_{ij}. \tag{9}$$

We may think of $w_i$ as the marginal fitness of the allele $A_i$.
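The bookkeeping in equations (1) to (9) can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original chapter: the function name `selection_step`, the three-allele fitness matrix and the starting frequencies are all hypothetical choices made for illustration. It applies one round of viability selection to a symmetric matrix of genotype frequencies and compares the result with the random-mating form (7).

```python
import numpy as np

def selection_step(P, w):
    """One round of viability selection, following equations (1)-(5).

    P : symmetric (k, k) array with P[i, j] = P_ij, so that genotype A_iA_j
        (i != j) has total frequency 2 * P_ij and all entries sum to 1.
    w : symmetric (k, k) array of viability fitnesses w_ij.
    """
    p = P.sum(axis=1)            # (1) allele frequencies p_i
    w_bar = (P * w).sum()        # (3) mean fitness
    P_sel = w * P / w_bar        # (2) genotype frequencies at reproduction
    p_new = P_sel.sum(axis=1)    # (4) allele frequencies at reproduction
    delta = p_new - p            # (5) change in allele frequency
    return p, w_bar, p_new, delta

# Hypothetical three-allele example (values chosen only for illustration).
p0 = np.array([0.2, 0.3, 0.5])
w = np.array([[1.0, 1.2, 0.9],
              [1.2, 0.8, 1.1],
              [0.9, 1.1, 1.0]])

P = np.outer(p0, p0)             # random mating: P_ij = p_i * p_j
p, w_bar, p_new, delta = selection_step(P, w)

w_marg = w @ p                   # (9) marginal fitnesses w_i
delta_rm = p * (w_marg - w_bar) / w_bar   # (7) random-mating form of delta_i
```

Under random mating `delta` and `delta_rm` should agree up to rounding, and the entries of `delta` should sum to zero because the allele frequencies sum to one before and after selection.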
The above analysis assumes discrete generations. It is often more appropriate to consider time as continuous, in which case, under random mating, (7) is replaced by

$$\dot{p}_i = p_i (w_i - \bar{w}), \quad (i = 1, 2, \ldots, k), \tag{10}$$

with $\bar{w}$ being given by (8), where a superscript dot denotes a derivative with respect to time. (We do not give the continuous-time analogue of the more general equation (2), since to do this would require specific assumptions being made about the mating scheme.) There are two further quantities of major importance in population genetics theory which we now define and consider at some length. The first of these was introduced by Fisher (1958) and is central to his concept of evolution, which he saw being described essentially as changes in gene frequencies in a population over time, as opposed to changes in gametic frequencies, under the action of natural selection. This concept is that of the average effect of the gene $A_i$, which is defined by a minimization procedure in the following way. Suppose first that the fitness $w_{ij}$ can be written in the form

$$w_{ij} = \bar{w} + \alpha_i + \alpha_j \tag{11}$$

for parameters $(\alpha_1, \ldots, \alpha_k)$ which satisfy, as they must, from (11),

$$\sum_j p_j \alpha_j = 0. \tag{12}$$

Thus if any individual in the population is chosen at random and a randomly chosen gene in that individual is replaced by an $A_i$ gene, the mean fitness change of that individual is

$$\alpha_i - \sum_j p_j \alpha_j = \alpha_i. \tag{13}$$

This explains the terminology "average effect of $A_i$", which in this case is a constant. More generally the genotype fitnesses cannot be written as in (11), and the average effects (which now depend on gene and genotype frequencies) are chosen so as to minimize

$$\sum_i \sum_j P_{ij} (w_{ij} - \bar{w} - \alpha_i - \alpha_j)^2 \tag{14}$$

subject to (12). If we write

$$D = \mathrm{diag}(p_1, p_2, \ldots, p_k), \quad P = \{P_{ij}\}, \quad p' = (p_1, p_2, \ldots, p_k), \quad \delta' = (\delta_1, \delta_2, \ldots, \delta_k), \tag{15}$$

the minimizing values for $(\alpha_1, \alpha_2, \ldots, \alpha_k)$ are found implicitly as the solutions to the equations

$$(D + P)\alpha = \bar{w}\delta, \tag{16}$$
where the components in 5 are found from (5). In the random-mating case we can solve these equations explicitly, to obtain c~ _
- wi - w,
(17)
Pi
so that $\alpha_i$ is the excess of the marginal fitness of $A_i$ over the mean fitness. When random mating is not assumed, no simple explicit formula for $\alpha_i$ exists, and equation (16) must be solved numerically.

The second central quantity, also introduced into the genetics literature by Fisher, is the additive genetic variance, denoted $\sigma_A^2$, which can be thought of as that component of the total variance in fitness which can be ascribed to differences in the (marginal) fitnesses of the various alleles $A_1, A_2, \ldots, A_k$. In the random-mating case, this is given by
$$\sigma_A^2 = 2 \sum_i p_i (w_i - \bar{w})^2. \tag{18}$$
It follows from (7), (17) and (18) that an alternative expression for $\sigma_A^2$ is
$$\sigma_A^2 = 2\bar{w} \sum_i \alpha_i \delta_i. \tag{19}$$
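The random-mating formulas (17), (18) and (19) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes that the discrete-time change in (7) has the usual form $\delta_i = p_i (w_i - \bar{w})/\bar{w}$, and the function names and the three-allele fitness matrix are hypothetical, chosen only for illustration.

```python
def marginal_fitnesses(p, w):
    """Marginal fitness w_i = sum_j p_j * w_ij of each allele (random mating)."""
    k = len(p)
    return [sum(p[j] * w[i][j] for j in range(k)) for i in range(k)]


def average_effects_random_mating(p, w):
    """Return (alpha, delta, w_bar, var_18, var_19) for allele frequencies p
    and a symmetric genotype fitness matrix w."""
    w_marg = marginal_fitnesses(p, w)
    w_bar = sum(pi * wi for pi, wi in zip(p, w_marg))
    # Equation (17): average effect = marginal fitness minus mean fitness.
    alpha = [wi - w_bar for wi in w_marg]
    # Assumed form of (7): delta_i = p_i * (w_i - w_bar) / w_bar.
    delta = [pi * a / w_bar for pi, a in zip(p, alpha)]
    # Equation (18): additive genetic variance from marginal fitnesses.
    var_18 = 2.0 * sum(pi * a * a for pi, a in zip(p, alpha))
    # Equation (19): the same quantity from average effects and delta.
    var_19 = 2.0 * w_bar * sum(a * d for a, d in zip(alpha, delta))
    return alpha, delta, w_bar, var_18, var_19


# Hypothetical three-allele example (values chosen only for illustration).
p_example = [0.2, 0.3, 0.5]
w_example = [[1.0, 0.9, 1.2],
             [0.9, 1.1, 0.8],
             [1.2, 0.8, 1.0]]
alpha, delta, w_bar, var_18, var_19 = average_effects_random_mating(
    p_example, w_example)
print("average effects:", alpha)
print("sigma_A^2 via (18) and (19):", var_18, var_19)
```

In this sketch the average effects satisfy the constraint (12) automatically, and the two expressions for the additive genetic variance agree to rounding error.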
When mating is not at random the analysis is more complex. However, it is found eventually that the additive genetic variance is still defined by (19) if we define $\bar{w}$ by (3), $\delta$ by using (5), and $\alpha$ as the solution of the equations implied in (16). All these results can be generalized to the case where fitness depends on the genes present at an arbitrary number of loci. Those aspects of this generalization which are of interest to us here follow immediately from the above equations, and thus will be described later at a more appropriate point.

3 A Hamilton's Principle in Population Genetics
The equations of motion in physics can typically be obtained via an optimization procedure based on the calculus of variations, which determines a path that makes the integral of the difference between kinetic and potential energy along the path stationary. This is embodied in Hamilton's principle, which states that for gradient systems, that is, systems whose equations of motion are the gradient of a potential, the motion can be obtained by finding the stationary point of the integral of the Lagrangian, which is defined as the difference between kinetic and potential energy. We will now present an analogue to this approach for single locus population genetics, which was first discovered by Svirezhev (1972). Our computations, however, are presented in a somewhat different form from his.

We first return to the point that the physical systems for which dynamical equations can be obtained via variational arguments are gradient systems. We thus would expect that a similar approach might work for continuous time single locus population genetics systems with random mating, where the dynamic equations can likewise be obtained as the gradient of a potential (with the appropriate metric), as shown by Shahshahani (1979), Akin (1979) and perhaps best explained in Hofbauer and Sigmund (1988, pp. 242-245). As an analogue to the difference between kinetic and potential energy, we define the function $f$ (Svirezhev, 1972), which includes a term corresponding to dynamics and a term corresponding to half the additive genetic variance (given in (18)), by
$$f = \sum_i \frac{(\dot{p}_i)^2}{p_i} + \sum_i p_i (w_i - \bar{w})^2. \tag{20}$$
This form can also be motivated by noting that there are two ways of viewing the equations of single locus population genetics as being derived from the gradient of a potential. One can change the metric, as indicated earlier, or one can make the change of variables $p_i = (y_i)^2/4$, which makes the single locus dynamic equations into a gradient system under the ordinary metric, with the phase space being the surface of a sphere restricted to the positive orthant. Under this transformation, the first term in (20) becomes $\sum_i (\dot{y}_i)^2$, which is the kinetic energy. To show that the actual dynamics of allele frequencies can be found from a variational principle, we cannot simply minimize the integral of $f$ along evolutionary paths, since we must include the additional constraint that all the allele frequencies sum to one, that is,
$$G = \sum_i p_i - 1 = 0. \tag{21}$$
The claim we will now demonstrate is that the equations of motion (10) for single locus population genetics can be obtained by minimizing the integral of (20) along the evolutionary path taken by the allele frequencies, subject to the constraint (21). Standard results from the calculus of variations imply that the solution to the problem
$$\text{minimize } \int_{t_1}^{t_2} f \, dt, \quad \text{subject to } G = 0, \tag{22}$$
where $t_1$ and $t_2$ are the initial and final values of time and the allele frequencies are specified at the initial and final time, satisfies the system of variational equations
$$\frac{\partial F}{\partial p_i} - \frac{d}{dt} \frac{\partial F}{\partial \dot{p}_i} = 0. \tag{23}$$
Here $F$ is obtained from $f$ by using a Lagrange multiplier, so that
$$F = f + \mu G, \tag{24}$$
where $\mu$ is a function to be determined. The function $f$ does not involve time explicitly, so we can integrate the system (23) via a straightforward computation (e.g. Weinstock, 1974, pp. 48-53) and obtain the first order equations
$$\dot{p}_i = p_i (w_i - \bar{w}) \quad (i = 1, 2, \ldots, k). \tag{25}$$
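The first order equations (25) are straightforward to integrate numerically. The sketch below is an illustration only: the classical fourth-order Runge-Kutta integrator, the step size, the function names and the fitness matrix are all assumptions introduced here, not part of the variational argument. It tracks two properties of the continuous-time, random-mating model: the allele frequencies continue to sum to one, and the mean fitness does not decrease along the path.

```python
def mean_fitness(p, w):
    """Mean fitness w_bar = sum_i sum_j p_i p_j w_ij under random mating."""
    k = len(p)
    return sum(p[i] * p[j] * w[i][j] for i in range(k) for j in range(k))


def p_dot(p, w):
    """Right-hand side of (25): dp_i/dt = p_i * (w_i - w_bar)."""
    k = len(p)
    w_marg = [sum(p[j] * w[i][j] for j in range(k)) for i in range(k)]
    w_bar = sum(p[i] * w_marg[i] for i in range(k))
    return [p[i] * (w_marg[i] - w_bar) for i in range(k)]


def rk4_step(p, w, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = p_dot(p, w)
    k2 = p_dot([x + 0.5 * h * d for x, d in zip(p, k1)], w)
    k3 = p_dot([x + 0.5 * h * d for x, d in zip(p, k2)], w)
    k4 = p_dot([x + h * d for x, d in zip(p, k3)], w)
    return [x + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)]


def simulate(p0, w, h=0.01, steps=2000):
    """Return the list of allele frequency vectors along the path."""
    path = [list(p0)]
    for _ in range(steps):
        path.append(rk4_step(path[-1], w, h))
    return path


# Hypothetical fitness matrix and starting point (illustration only).
w_demo = [[1.0, 0.9, 1.2],
          [0.9, 1.1, 0.8],
          [1.2, 0.8, 1.0]]
path_demo = simulate([0.2, 0.3, 0.5], w_demo)
print("initial mean fitness:", mean_fitness(path_demo[0], w_demo))
print("final mean fitness:  ", mean_fitness(path_demo[-1], w_demo))
```

A smaller step size or a different integrator could be substituted without changing the qualitative picture.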
Since these equations are identical to (10), we have demonstrated that the equations for single locus population genetics can be obtained via a variational argument, as first shown by Svirezhev (1972). Note that the course of this demonstration shows that $\mu = 2\bar{w}$.

It is important to understand the limitations of what has been shown. First, the integral is taken with respect to time, and not with respect to allele frequencies. Secondly, although it is true that
$$\sum_i \frac{(\dot{p}_i)^2}{p_i} = \sum_i p_i (w_i - \bar{w})^2 \tag{26}$$
if the dynamic equations (10) hold, (26) is not true for an arbitrary evolutionary path. Thus, it is not correct to say that the integral of the additive genetic variance is minimized by the evolutionary path determined by natural selection. Finally, we have been unable to extend this approach to more than one locus. We conjecture that this will prove to be impossible, because in contrast to the equations describing one-locus population genetics, the equations of multilocus population genetic systems can be shown not to be gradient systems in general (Hofbauer and Sigmund, 1988). The only possible extension might be to a special case of the single locus mutation selection equations. If the mutation rate from allele $i$ to allele $j$ depends only on the identity of allele $j$ and not that of allele $i$, then the dynamic equations are a gradient system, as shown by Hofbauer and Sigmund (1988).

4 Optimality and the Fundamental Theorem
Our first aim in this section is to define the "partial increase" in mean fitness (a more complete description is given by Ewens (1989)). This will be done in the case of a general population (that is, random mating is not necessarily assumed) evolving in discrete time according to (2). A definition of mean fitness alternative to (3) is
$$\bar{w} = \sum_i \sum_j P_{ij} (\bar{w} + \alpha_i + \alpha_j), \tag{27}$$
and there appears to be strong evidence that Fisher viewed the right-hand side in (27) as a more natural definition of mean fitness than the right-hand side in (3). The partial change in mean fitness during the course of one generation is defined as the change in the right-hand side in (27) due to a change in $P_{ij}$ alone: that is, the partial change in mean fitness is, by definition,
$$\sum_i \sum_j (P'_{ij} - P_{ij})(\bar{w} + \alpha_i + \alpha_j). \tag{28}$$
Using (2), this is easily seen to reduce to
$$2 \sum_i \alpha_i \delta_i = 2\alpha'\delta, \tag{29}$$
and then use of (19) shows that this is exactly $\sigma_A^2/\bar{w}$. Since (29) depends on changes in genotype frequencies only through changes in gene frequencies, we may use the argument following (5) to describe (28) as the partial change in mean fitness from one generation to another. Price (1972) and Ewens (1989) argue that this conclusion was viewed by Fisher as the statement of his "Fundamental Theorem".

Suppose now we consider arbitrary changes $(d_1, d_2, \ldots, d_k)$ in the gene frequencies, and define a vector $d$ by $d' = (d_1, d_2, \ldots, d_k)$. The interpretation of $\alpha_i$ as the average effect of $A_i$ shows that we may think of $2d'\alpha$ as the partial change in mean fitness due to these gene frequency changes. Suppose now that we impose the constraint
$$2d'\alpha = \frac{\sigma_A^2}{\bar{w}} \tag{30}$$
(as well as the natural constraint $\sum_i d_i = 0$): this requirement is that the partial increase in mean fitness should equal that arising through natural selection. We now ask what quadratic form $d'Td$ in these arbitrary changes is minimized when $d = \delta$, that is, at the natural selection values, subject to the constraint (30). The introduction of a Lagrange multiplier shows that we must minimize the function
$$d'Td + 2\lambda(\alpha'd) \tag{31}$$
and straightforward differentiation leads to the equation
$$Td = -\lambda\alpha, \tag{32}$$
which may be written
$$T^{-1}\alpha = \text{const} \cdot d. \tag{33}$$
We want this equation to be solved by $d = \delta$, and comparison with (16) shows immediately that to do this we may take $T = (D + P)^{-1}$. Thus the quadratic form we seek is
$$d'(D + P)^{-1}d, \tag{34}$$
and we may say that the quadratic form (34) is minimized, subject to the constraint (30), at the natural selection vector $d = \delta$. A statement equivalent to this is: subject to the condition $d'(D + P)^{-1}d = \sigma_A^2/(2\bar{w}^2)$, the vector $d$ of gene frequency changes which maximizes the partial increase in mean fitness is the natural selection vector $\delta$.

We can restate this conclusion in a more useful way if we define (34) as a new metric giving the distance between old $(p_1, \ldots, p_k)$ and new $(p_1 + d_1, \ldots, p_k + d_k)$ gene frequency values, by saying that if the distance between two sets of gene frequencies is prescribed to be the natural selection value, then the natural selection changes in gene frequency maximize the partial increase in mean fitness. In this way we can begin to ascribe an optimality character to natural selection, but the statement as described is of little value unless we can first find a "natural" interpretation for the metric (34). Before doing this, we note that in the particular case of random mating, the metric (34) simplifies to $\sum_i d_i^2/p_i$. In his original derivation of the results described in the previous section, Svirezhev (1972) used precisely this metric. This was done purely for mathematical convenience and no interpretation of this metric in biological terms was needed (or offered). Thus our interpretation of the
more general metric (34) can also be regarded as a biological justification for the metric that Svirezhev, for purely mathematical reasons, found it convenient to employ.

We now turn to the interpretation of (34). The quantity (14) which is minimized in the definition of the average effects is, up to a linear function,
$$\alpha'(D + P)\alpha - 2\bar{w}\alpha'\delta. \tag{35}$$
Consider for the moment the minimization of
$$\alpha'(D + P)\alpha \tag{36}$$
subject to
$$\alpha'\delta = \frac{\sigma_A^2}{2\bar{w}}. \tag{37}$$
Introducing a Lagrange multiplier, this is done by the absolute minimization of
$$\alpha'(D + P)\alpha - 2\lambda\alpha'\delta. \tag{38}$$
This minimization occurs where
$$(D + P)\alpha = \lambda\delta, \tag{39}$$
which is precisely (16) if we choose $\lambda = \bar{w}$. In other words, minimization of (36) subject to (37) is identical, from (38), to the absolute minimization of
$$\alpha'(D + P)\alpha - 2\bar{w}\alpha'\delta. \tag{40}$$
But this is exactly (35). In other words, the average effects can be defined, not only through the original definition of minimizing (14) subject to (12), but also by the minimization of (36) subject to (37). Suppose we now define a vector $g$ by
$$g = \frac{(D + P)\alpha}{\bar{w}}, \tag{41}$$
so that $\alpha = \bar{w}(D + P)^{-1}g$. Then (36) and (37), jointly, define the minimization of
$$\text{const} \cdot g'(D + P)^{-1}g \tag{42}$$
subject to
$$g'(D + P)^{-1}\delta = \text{const}, \tag{43}$$
which in view of (16) is
$$g'\alpha = \text{const}. \tag{44}$$
But minimization of (42) subject to (43) is precisely the minimization of (34) subject to (30). The two procedures, namely the minimization of a quadratic form to define average effects, and the maximization of the partial increase in mean fitness, are, in fact, the same mathematical procedure, simply presented in different ways. Thus insofar as the definition of average effects through the minimization of (14) is regarded as natural and meaningful, use of (34) as a
distance metric describing the distance between old and new gene frequencies also becomes natural and meaningful, and we summarize by saying that in a gene frequency space endowed with the "natural" metric (34), natural selection possesses the optimizing property of maximizing the partial increase in mean fitness for any set of gene frequencies which are at the same distance from the original as those arising through natural selection.

The above analysis is in discrete time. An analogous analysis holds in continuous time, with in effect the same result.

All of the above makes the (unrealistic) assumption that fitnesses depend on the genotype at one locus only. It is however possible to generalize the analysis immediately to the case where fitnesses depend, in a completely arbitrary way, on the genetic make-up of the entire genome, and where no specific assumptions need be made about linkage arrangements, recombination values, the number of loci in the genome or the number of alleles at each locus. To do this, we first order the loci in some agreed way and then the genes at each locus. We now redefine $D$ as a diagonal matrix whose elements are, in turn, the gene frequencies at the various loci, $P$ as a block diagonal matrix, each block corresponding to one gene locus having as entries the various within-locus genotype frequencies, and $Q$ as a certain (off-block-diagonal) matrix of pairwise two-locus genotypic frequencies (Castilloux and Lessard, 1995). Appropriate generalizations of the mean fitness $\bar{w}$ and the additive genetic variance $\sigma_A^2$ are also made. Our first task is to define the average effects of all the alleles at all the loci. To do this we define a vector $\alpha$ of these average effects, where the alleles whose average effects are described in this vector are conformal with the alleles whose frequencies are displayed in $D$. Then the natural generalization of the procedure which leads to (16) shows that the average effects are defined, implicitly, as the solutions to the equation
$$(D + P + Q)\alpha = \bar{w}\delta, \tag{45}$$
where $\delta$ is a vector of allelic frequency changes, with again the alleles being conformal with the alleles whose average effects are given in $\alpha$. The similarity with (16) is immediate. Carrying through an analysis directly generalizing that given above, we find that a natural metric in the space of gene frequencies is
$$d'(D + P + Q)^{-1}d, \tag{46}$$
and that subject to the requirement that the distance between old and new gene frequency sets, as measured by (46), is $\sigma_A^2/(2\bar{w}^2)$, the vector of gene frequency changes which maximizes the partial increase in mean fitness is again the natural selection vector. Details of this procedure are given in Ewens (1992). In this way we have shown, in a completely general setting (that is, considering the entire genome, all alleles at all loci, arbitrary fitnesses, arbitrary genotype frequencies and arbitrary recombination structure, and with a natural metric in gene-frequency space), that natural selection operates in a meaningful optimizing manner.
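The optimizing property can be illustrated numerically in the simplest setting, a single locus with random mating, where the metric (34) reduces to $\sum_i d_i^2/p_i$. The sketch below is an illustration under stated assumptions rather than a proof: it assumes $\delta_i = p_i (w_i - \bar{w})/\bar{w}$ for the natural selection changes, the function names and fitness values are hypothetical, and the competing vectors $d$ are drawn at random, constrained to sum to zero and rescaled to lie at the same distance from the current frequencies as $\delta$.

```python
import random


def selection_vectors(p, w):
    """Average effects alpha (17) and natural selection changes delta
    for a single locus under random mating."""
    k = len(p)
    w_marg = [sum(p[j] * w[i][j] for j in range(k)) for i in range(k)]
    w_bar = sum(p[i] * w_marg[i] for i in range(k))
    alpha = [wi - w_bar for wi in w_marg]
    # Assumed form of the discrete-time change: p_i * (w_i - w_bar) / w_bar.
    delta = [p[i] * alpha[i] / w_bar for i in range(k)]
    return alpha, delta


def metric(d, p):
    """Random-mating form of the metric (34): sum_i d_i^2 / p_i."""
    return sum(di * di / pi for di, pi in zip(d, p))


def partial_increase(d, alpha):
    """Partial change in mean fitness 2 d' alpha for frequency changes d."""
    return 2.0 * sum(di * ai for di, ai in zip(d, alpha))


def random_competitor(p, delta, rng):
    """Random d with sum(d) = 0 and the same metric distance as delta."""
    raw = [rng.gauss(0.0, 1.0) for _ in p]
    mean = sum(raw) / len(raw)
    d = [x - mean for x in raw]  # enforce the natural constraint sum(d) = 0
    scale = (metric(delta, p) / metric(d, p)) ** 0.5
    return [scale * x for x in d]


# Hypothetical example values (illustration only).
p_opt = [0.2, 0.3, 0.5]
w_opt = [[1.0, 0.9, 1.2],
         [0.9, 1.1, 0.8],
         [1.2, 0.8, 1.0]]
alpha_opt, delta_opt = selection_vectors(p_opt, w_opt)
rng_opt = random.Random(0)
best_rival = max(partial_increase(random_competitor(p_opt, delta_opt, rng_opt),
                                  alpha_opt) for _ in range(1000))
print("natural selection:", partial_increase(delta_opt, alpha_opt))
print("best random rival:", best_rival)
```

In this special case the result is an instance of the Cauchy-Schwarz inequality in the metric $\sum_i d_i^2/p_i$, since $\alpha_i$ is proportional to $\delta_i/p_i$.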
References

Akin, E. (1979). The Geometry of Population Genetics. Lecture Notes in Biomathematics 31. Springer-Verlag, Berlin.
Castilloux, A.-M., and Lessard, S. (1995). The Fundamental Theorem of Natural Selection in Ewens' Sense (case of many loci), (submitted).
Ewens, W.J. (1989). An interpretation and proof of the Fundamental Theorem of Natural Selection. Theoret. Pop. Biol. 36, 167-180.
Ewens, W.J. (1992). An optimizing principle of natural selection in evolutionary population genetics. Theoret. Pop. Biol. 42, 333-346.
Fisher, R.A. (1958). The Genetical Theory of Natural Selection. Dover, New York.
Hastings, Alan and Fox, Gordon (1995). Optimization as a way of studying population genetics equations. (This volume.)
Hofbauer, J. and Sigmund, K. (1988). The Theory of Evolution and Dynamical Systems. Cambridge University Press, Cambridge.
Kauffman, S.A. (1993). The Origins of Order. Oxford University Press, New York.
Price, G.R. (1972). Fisher's Fundamental Theorem made clear. Ann. Hum. Genet. 36, 129-140.
Schoemaker, P.J.H. (1991). The quest for optimality: A positive heuristic of Science? Behav. Brain Sci. 14, 205-245.
Shahshahani, S. (1979). A new mathematical framework for the study of linkage and selection. Memoirs of the American Mathematical Society, Vol. 17, No. 211, Amer. Math. Soc., Providence.
Svirezhev, Y.M. (1972). Optimum principles in genetics, in Studies on Theoretical Genetics. USSR Academy of Science, Novosibirsk. [In Russian with English summary.]
Weinstock, R. (1974). Calculus of Variations with Applications to Physics and Engineering. Dover, New York.
Optimization as a Technique for Studying Population Genetics Equations

Alan Hastings 1 and Gordon A. Fox 2

1 Division of Environmental Studies, Center for Population Biology, and Institute for Theoretical Dynamics, University of California, Davis, CA 95616
2 Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721

Abstract. We use methods from dynamic optimization to study the possible behavior of simple population genetic models. These methods can be used, at least conceptually, to determine limits to the behavior of optimization algorithms based on genetic equations.
1 Introduction

The primary focus of this book is to look at how to use the equations of population genetics to study and understand problems in optimization. Most of the rationale for using ideas borrowed from natural selection to solve problems in optimization comes from Fisher's fundamental theorem. Unfortunately, it is well known that Fisher's result applies only to random-mating, single-locus population genetic models with constant selection (see, for example, Ewens and Hastings, 1995). Multilocus population genetic models are complicated nonlinear dynamic equations. The dynamics and the equilibrium behavior of these multilocus equations are not well understood, except for some special cases. In this chapter, we will describe approaches for trying to understand bounds to the behavior of these equations by using optimization methods. This work may, in turn, provide some insights on the performance of methods that use genetic equations to solve optimization problems.

We will summarize two primary approaches: equilibrium behavior of two-locus models, and dynamics of two-locus models. In both cases, the approach has been to use optimization methods to find limits to the behavior of the equations (Hastings, 1981; Fox and Hastings, 1992) for fitnesses that are only known within some bounds. One reason for this is that in population genetics the fitnesses are not well specified. Thus the fitnesses are treated either as the unknowns or as parameters to be determined.

To place these approaches in a larger context, we will begin by examining the simpler one-locus population genetic equations. This will provide background and motivation for the methods we will discuss for studying the multilocus equations. Moreover, the single-locus viewpoint, in combination with the multilocus results, will help illustrate the role played by recombination and linkage disequilibrium in the dynamics and equilibrium behavior of multilocus population genetic equations.
2 Single Locus Population Genetic Models

Here, we will start with the simplest case, a single locus with two alleles. Let the alleles be $A$ and $a$, and denote the frequency of $A$ by $p$ and the frequency of $a$ by $q$. We will begin with a description of the deterministic discrete time model with random mating and nonoverlapping generations. Let the fitness of the genotypes $AA$, $Aa$, and $aa$ be denoted by $w_{AA}$, $w_{Aa}$, and $w_{aa}$, respectively. We define the average fitness of the allele $A$ as
$$w_A = p w_{AA} + q w_{Aa} \tag{1}$$
(with $w_a = p w_{Aa} + q w_{aa}$ defined analogously for the allele $a$) and the average fitness of the population as
$$\bar{w} = p w_A + q w_a. \tag{2}$$
The dynamics of the allele frequencies are then given by the equation
$$p' = \frac{p w_A}{\bar{w}}, \tag{3}$$
where $p'$ is the allele frequency in the next generation. The equilibrium behavior of this model is easy to analyze. The usual approach is to view the fitnesses as parameters and then solve for the equilibrium value of $p = p'$ (e.g., Ewens, 1979; Nagylaki, 1992). If one could readily estimate the fitnesses, this would make it easy to predict the evolution of gene frequencies in natural populations. However, it is much easier to measure allele frequencies than it is to estimate fitnesses in natural populations. Attractive as this approach may be, then, it is usually impossible to implement in practice. So here we will use an alternate approach (Hastings, 1981): view the equilibrium allele frequency as the parameter, and the fitnesses as unknowns. Doing so allows us to find values for the fitnesses that may explain the observed allele frequencies.

2.1 Equilibria
In the simple one-locus, two-allele case, this alternate approach leads to a single linear equation with two unknowns. To see this, note that only the relative fitnesses are important, thus reducing the three unknown fitnesses to two if, e.g., we normalize $w_{Aa}$ to be one. Thus, a particular equilibrium allele frequency can be 'explained' by any of the fitnesses in a one-dimensional set of possible fitnesses. If we add the constraint (which is easy to specify in this case) that the equilibrium be stable, this restricts the possible fitnesses to those lying along a line that satisfy the constraint that $w_{AA}$ and $w_{aa}$ be less than one. No further information can be deduced: there is no minimum or maximum strength of selection that we can find. Any specified set of allele frequencies is an equilibrium (in fact, a stable equilibrium) for some set of fitnesses. This view can be extended to an arbitrary number of alleles, and for any specified set of allele frequencies we can find a set of possible fitnesses. These results will not hold for the multilocus problem.
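The inverse approach for two alleles can be made concrete in a few lines. The following sketch is illustrative only: the function names and the parametrization $w_{AA} = 1 - s$, $w_{Aa} = 1$, $w_{aa} = 1 - t$ are assumptions introduced here. Setting $w_A = w_a$ at a chosen interior equilibrium $\hat{p}$ gives the single linear condition $\hat{p} s = (1 - \hat{p}) t$, so each choice of $s > 0$ (small enough to keep $w_{aa}$ positive) gives one point on the line of fitnesses that 'explain' $\hat{p}$.

```python
def next_p(p, w_AA, w_Aa, w_aa):
    """One generation of the recursion (3): p' = p * w_A / w_bar."""
    q = 1.0 - p
    w_A = p * w_AA + q * w_Aa
    w_a = p * w_Aa + q * w_aa
    w_bar = p * w_A + q * w_a
    return p * w_A / w_bar


def fitnesses_for_equilibrium(p_hat, s):
    """Fitnesses (w_AA, w_Aa, w_aa) with w_Aa = 1 and w_AA = 1 - s that make
    p_hat an equilibrium. Solving w_A = w_a at p_hat gives t = p_hat*s/q_hat."""
    q_hat = 1.0 - p_hat
    t = p_hat * s / q_hat
    return 1.0 - s, 1.0, 1.0 - t


def iterate(p0, fitnesses, generations):
    """Apply the recursion repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_p(p, *fitnesses)
    return p


# Hypothetical target equilibrium and selection coefficient.
fit_demo = fitnesses_for_equilibrium(0.3, 0.2)
print("fitnesses:", fit_demo)
print("from p = 0.05:", iterate(0.05, fit_demo, 2000))
print("from p = 0.95:", iterate(0.95, fit_demo, 2000))
```

Because both homozygote fitnesses are below one in this parametrization, the chosen equilibrium is stable, and trajectories started on either side of it approach the same frequency.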
2.2 Dynamics
Given the simplicity of the single-locus model, some aspects of dynamical behavior can be studied using the discrete time model. To facilitate comparison with multilocus results and techniques, we will turn to the continuous time model with random mating and overlapping generations to examine dynamics. The model that we will use is not exact, but is a reasonable approximation when selection is weak (see, for example, Nagylaki, 1992). Here, it will be more convenient explicitly to include multiple alleles. Let $p_i$ be the frequency of allele $i$. Then, define $m_{ij}$ as the Malthusian parameter for the genotype $ij$, so $m_{ij} = b_{ij} - d_{ij}$, where $b_{ij}$ and $d_{ij}$ are the birth and death rates of the genotype $ij$. The model we will use is
$$\frac{dp_i}{dt} = p_i (m_i - \bar{m}), \tag{4}$$
where
$$m_i = \sum_j p_j m_{ij} \tag{5}$$
and
$$\bar{m} = \sum_j p_j m_j. \tag{6}$$
As a precursor to the study of the multilocus model, we will phrase a study of the dynamics of this model using an optimization procedure. This is motivated in part by a biological question: how far have two populations diverged from one another? Such "genetic distances" (e.g., Nei, 1987) are usually based on measurements of the allele frequencies at a locus in two populations. Most often, these distances have been defined without an underlying biological justification. Here, we will define the genetic distance using an optimization problem, based on the assumption that selection (of an unknown and possibly time-varying pattern) has led to the divergence between the two populations. Define a starting set of allele frequencies
$$p(0) = p_0 \tag{7}$$
and a final set of allele frequencies
$$p(T) = p_T. \tag{8}$$
Our problem is to find the minimum time for the allele frequencies in a single population obeying (4) to go from $p_0$ to $p_T$, where the fitnesses $m_{ij}$ are unspecified and can vary with time, but obey the constraint
$$|m_{ij}| \le m_{\max}, \tag{9}$$
where $m_{\max}$ is the maximum strength of selection. Before explaining how this problem can be solved, we will discuss how it provides a genetic distance. Since time can be scaled out of (4), the minimum time in this problem is inversely proportional to $m_{\max}$. Thus, relative distances between different sets of allelic distributions are unaffected by the choice of $m_{\max}$. Think of the allelic distribution in one current population as the initial frequencies in the problem, and the allelic distribution in the other population as the final frequencies in the problem. A biologically meaningful genetic distance is provided by the minimum time to go from the initial to the final allelic distribution. The reason is that half of this time is the minimum time for both current allelic distributions to have evolved from an unknown common ancestral allelic distribution.

This problem can be phrased as a standard problem in optimal control (see Fox and Hastings, 1992, where the multilocus model is discussed in detail). A general reference is Bryson and Ho (1975). The way to proceed is to minimize the integral
$$T = \int_0^T dt \tag{10}$$
subject to the 'constraints' of the dynamics and starting and ending conditions (4)-(9). Minimizing the integral is equivalent to minimizing at each time the Hamiltonian
$$H = 1 + \sum_i \lambda_i \, p_i (m_i - \bar{m}), \tag{11}$$
where the $\lambda$'s are defined by
$$\frac{d\lambda_i}{dt} = -\frac{\partial H}{\partial p_i}, \tag{12}$$
subject to the initial and final conditions and the limits on the strength of selection. For time minimum problems, the Hamiltonian $H$ must always be zero. To find the choices of the time-varying Malthusian parameters that lead to the optimal solution, define
$$\sigma_{ij}(t) = \frac{\partial H}{\partial m_{ij}}. \tag{13}$$
The form of our problem says that at any time the optimal $m_{ij}$'s will generally be
$$m_{ij}(t) = m_{\max} \tag{14}$$
if $\sigma_{ij}(t) > 0$, and
$$m_{ij}(t) = -m_{\max} \tag{15}$$
if $\sigma_{ij}(t) < 0$. This form is known, for obvious reasons, as bang-bang control. The idea here is that the fastest response will be obtained by always using the maximum strength of selection. In the special case when $\sigma_{ij}(t) = 0$ over a finite interval, a "singular interval" occurs, in which the optimum is intermediate. However, we have yet to encounter one of these singular intervals.
Finding the minimum of the Hamiltonian for this time minimum problem reduces to solving a boundary value problem, which must in general be solved numerically. However, examination of the simplest case with two alleles, where the solution can be obtained by inspection, provides insight into the structure of the solution that is possible in more complex cases. For the sake of definiteness assume that $p_1(0) < p_1(T)$. It is clear that the minimum time is obtained if
$$m_{11} = m_{12} = -m_{22} = m_{\max} \tag{16}$$
when $p_1(t) < 1/2$ and
$$m_{11} = -m_{12} = -m_{22} = m_{\max} \tag{17}$$
when $p_1(t) > 1/2$. In control theory the curve (in this case a point), $p_1 = 1/2$, is known as a switching curve. Knowledge of the switching curve, which can be found in some cases where the complete solution cannot be found analytically, provides a geometric interpretation of the optimal solution. Note that for this two-allele case it is easy to prove that the controls are always bang-bang, i.e., singular intervals cannot occur.

Three features of the solution (16), (17) carry over either to the multiple allele case or multilocus case. First, the optimal solution generally involves time-varying choices for the fitnesses or Malthusian parameters. Second, the number of switches (through time) between different choices for the fitnesses typically is small. Finally, the form of the control, namely that it is bang-bang, says what the form of selection is for the optimal solution for the single-locus multiple allele problem. In all cases every pair of alleles interacts, so there is complete dominance in fitnesses. The complete nature of the solutions in the multiple allele case will be discussed elsewhere.
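For two alleles the minimum time can be written down directly. Substituting the controls (16) and (17) into (4) gives $dp_1/dt = 2 m_{\max} p_1 (1 - p_1)^2$ for $p_1 < 1/2$ and $dp_1/dt = 2 m_{\max} p_1^2 (1 - p_1)$ for $p_1 > 1/2$, and the minimum time is the integral of $dp_1/(dp_1/dt)$ from $p_1(0)$ to $p_1(T)$. The sketch below is a worked illustration under these assumptions; the function names and the midpoint-rule check are hypothetical additions, not part of the original treatment.

```python
import math


def bang_bang_rate(p, m_max):
    """dp_1/dt from (4) under controls (16) for p < 1/2 and (17) for p > 1/2."""
    q = 1.0 - p
    m11, m22 = m_max, -m_max
    m12 = m_max if p < 0.5 else -m_max
    m1 = p * m11 + q * m12
    m2 = p * m12 + q * m22
    m_bar = p * m1 + q * m2
    return p * (m1 - m_bar)


def _primitive_low(p):
    """Antiderivative of 1 / (p * (1 - p)**2), used below the switching point."""
    return math.log(p / (1.0 - p)) + 1.0 / (1.0 - p)


def _primitive_high(p):
    """Antiderivative of 1 / (p**2 * (1 - p)), used above the switching point."""
    return math.log(p / (1.0 - p)) - 1.0 / p


def min_time_closed_form(p0, pT, m_max):
    """Minimum time to move p_1 from p0 up to pT (requires 0 < p0 < pT < 1)."""
    if not 0.0 < p0 < pT < 1.0:
        raise ValueError("requires 0 < p0 < pT < 1")
    total = 0.0
    if p0 < 0.5:
        total += _primitive_low(min(pT, 0.5)) - _primitive_low(p0)
    if pT > 0.5:
        total += _primitive_high(pT) - _primitive_high(max(p0, 0.5))
    return total / (2.0 * m_max)


def min_time_numeric(p0, pT, m_max, n=20000):
    """Midpoint-rule integral of dp / (dp/dt) as an independent check."""
    h = (pT - p0) / n
    return sum(h / bang_bang_rate(p0 + (i + 0.5) * h, m_max) for i in range(n))


print("closed form:", min_time_closed_form(0.1, 0.9, 1.0))
print("numerical:  ", min_time_numeric(0.1, 0.9, 1.0))
```

Doubling `m_max` halves the computed time, so ratios of distances between pairs of populations do not depend on the value chosen for the maximum strength of selection.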
3 Two Loci

For mathematical reasons, the studies of dynamics of two-locus models have typically focused on continuous time models. However, the studies of equilibrium behavior have used the more easily justified discrete time models. We will continue that dichotomy here, while noting that the form of the results indicates that these choices do not have a large effect on the outcome. In this section we will summarize earlier results of Hastings (1981) and Fox and Hastings (1992). We will therefore omit most of the details, and emphasize the contrast between the one-locus and the multiple-locus cases.

3.1 Equilibria
For the discrete time model, we will use the standard model where $x_1$, $x_2$, $x_3$ and $x_4$ are the frequencies of the four gametes $AB$, $Ab$, $aB$, $ab$ respectively. Denote by $w_{ij}$ the fitness of the individual with gametes whose frequencies are given by $x_i$ and $x_j$, where we assume that $w_{ij} = w_{ji}$ and that $w_{14} = w_{23}$. Then, as before, we define the average fitness of the gametes by
$$w_i = \sum_j x_j w_{ij} \tag{18}$$
and the mean fitness of the population by
$$\bar{w} = \sum_i x_i w_i. \tag{19}$$
Finally, define the linkage disequilibrium
$$D = x_1 x_4 - x_2 x_3. \tag{20}$$
Denote the probability of recombination by $r$. Then the dynamics of the model are described by the equations
$$\bar{w} x_i' = x_i w_i \pm r w_{14} D, \tag{21}$$
where the sign is positive for $i = 2, 3$ and negative for $i = 1, 4$. To find an equilibrium, we set $x_i' = x_i$. In contrast with the single-locus problem, even finding all the equilibria of this model for general choices for the fitnesses is essentially impossible since the problem is nonlinear. However, exactly as in the single-locus case, if we reverse the normal procedure and view the gametic frequencies $x_i$ as given and the fitnesses $w_{ij}$ as the unknowns, then finding the fitnesses corresponding to a specified equilibrium is a linear problem (Hastings, 1981). Also, as in the single-locus case, this procedure does not determine the fitnesses uniquely, as there are three equations for the nine unknown fitnesses, leaving six free parameters.

To obtain information about the equilibrium possibilities, Hastings (1981) phrased the problem as an optimization problem that could be solved using linear programming. Specify a limit to the strength of selection by assuming $w_{14} = 1$ and that
$$1 - s \le w_{ij} \le 1 + s. \tag{22}$$
Then fix the gametic frequencies with D nonzero and maximize the recombination rate. The reason there is a maximum value to the recombination rate is that recombination breaks up combinations of alleles at the two loci, so recombination reduces the disequilibrium, D. Thus, in contrast with the single locus case, there are limits to the strength of selection (for a fixed value of the recombination rate, r) that can lead to a specified equilibrium. Further discussion of this approach is contained in Hastings (1981, 1989).
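The effect of recombination on $D$ can be seen directly from the recursion (21). The sketch below is an illustration only: the function names are hypothetical, and the recursion is taken in the standard form in which each right-hand side is divided by the mean fitness. With all fitnesses equal (no selection), the disequilibrium decays by the factor $1 - r$ each generation, which is why only sufficiently strong selection can maintain a given nonzero $D$ at equilibrium.

```python
def disequilibrium(x):
    """Linkage disequilibrium (20) for gamete frequencies x = [x1, x2, x3, x4]."""
    return x[0] * x[3] - x[1] * x[2]


def two_locus_step(x, w, r):
    """One generation of the recursion (21).

    x is the list of gamete frequencies (AB, Ab, aB, ab), w is a symmetric
    4 x 4 fitness matrix with w[0][3] == w[1][2], and r is the recombination
    probability."""
    w_marg = [sum(x[j] * w[i][j] for j in range(4)) for i in range(4)]
    w_bar = sum(x[i] * w_marg[i] for i in range(4))
    D = disequilibrium(x)
    signs = [-1.0, 1.0, 1.0, -1.0]  # negative for i = 1, 4; positive for i = 2, 3
    return [(x[i] * w_marg[i] + signs[i] * r * w[0][3] * D) / w_bar
            for i in range(4)]


# Hypothetical neutral example: every genotype has fitness one.
neutral_w = [[1.0] * 4 for _ in range(4)]
x_demo = [0.4, 0.1, 0.1, 0.4]
r_demo = 0.2
for generation in range(5):
    print(generation, disequilibrium(x_demo))
    x_demo = two_locus_step(x_demo, neutral_w, r_demo)
```

Replacing `neutral_w` with a fitness matrix satisfying the bounds (22) gives a simple way to explore whether a chosen pattern of selection can hold $D$ away from zero for a given $r$.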
3.2 Dynamics
We will now turn to the formulation of the dynamic problem for two loci, beginning with the formulation of the model. For the continuous time model, we will use the formulation of Hofbauer and Sigmund (1988). We denote the frequency of the haplotype with allele $i$ at the first locus and allele $j$ at the second by $X_{ij}$. We let $X$ be the vector of all the genotypic frequencies. We denote the birth and death rates of the genotype $ij,kl$ at time $t$ by $b_{ij,kl}(t)$ and $d_{ij,kl}(t)$, respectively. Then the rate of increase (Malthusian fitness) of genotype $ij,kl$ is $m_{ij,kl}(t) = b_{ij,kl}(t) - d_{ij,kl}(t)$. The mean rate of increase (mean Malthusian fitness) has the usual definition
$$\bar{m}(t) = \sum_{i,j} \sum_{k,l} X_{ij}(t) X_{kl}(t) m_{ij,kl}(t). \tag{23}$$
The disequilibria have their usual definitions as
$$D_{ij}(t) = X_{ij}(t) - \sum_{k,l} X_{il}(t) X_{kj}(t). \tag{24}$$
We assume that the system follows the standard model for dynamics of a two-locus model in continuous time, namely
$$\frac{dX_{ij}}{dt} = X_{ij}(t) \left( \sum_{k,l} X_{kl}(t) m_{ij,kl}(t) - \bar{m}(t) \right) - r \sum_{k,l} \left( b_{ij,kl}(t) X_{ij}(t) X_{kl}(t) - b_{il,kj}(t) X_{il}(t) X_{kj}(t) \right). \tag{25}$$
Note that in the continuous-time model, unlike the standard discrete-time model, the birth rates enter into the final term on the right-hand side, to account for the continuous production of recombinants (Crow and Kimura, 1970; Hofbauer and Sigmund, 1988). We can then study the same time minimization problem that we studied for the one-locus model (see Fox and Hastings, 1992, for details). The problem is then to minimize the time, $T$, assuming that equation (25) holds, to go from a specified set of initial frequencies
$$X(0) = X_0 \tag{26}$$
to a specified set of final frequencies
$$X(T) = X_T \tag{27}$$
with an additional set of constraints on the strength of selection,
$$b_{\min,ij,kl} \le b_{ij,kl}(t) \le b_{\max,ij,kl}, \quad |m_{ij,kl}(t)| = |b_{ij,kl}(t) - d_{ij,kl}(t)| \le m_{\max,ij,kl}, \quad i,j = 1, \ldots, n; \; k,l = 1, \ldots, n; \; 0 \le t \le T. \tag{28}$$
Detailed results for this minimization problem are discussed elsewhere (Fox and Hastings, 1992; in preparation). We will limit our discussion here to contrasts between the multiple-locus problem and the single-locus problem. As in the single-locus model, the form of the optimal solutions is the same: controls are 'bang-bang'. Also, as suggested by the discussion of the one-locus, two-allele model, the number of switches in the fitness patterns is usually quite small.

If the strength of selection is small and the final disequilibrium large, it can be impossible to reach the final point: the final point may lie outside the reachable set. This is a major difference between the multilocus and single-locus problems. The results in Fox and Hastings (1992) show that the equilibrium results from the discrete time model provide a good guide to the form of the reachable set. Moreover, as expected, the time to reach points near the boundary of the reachable set goes up dramatically.

An important, and perhaps surprising, result from Fox and Hastings (1992) is that the minimum time can either increase or decrease with an increase in $r$, depending on the terminal point. Also, for all the values we considered, the effect of $r$ on the minimum time is very small unless $r \gg s$. When $r < s$ the time necessary to reach the terminal point depends almost entirely on $s$. While the minimum time to reach the terminal point is insensitive to $r$ when selection dominates, the time-minimum trajectories themselves, and therefore the selective regimes, do vary with $r$.
4 Conclusions
The equations of population genetics are complicated nonlinear equations, and therefore general solutions, particularly of dynamic behavior, have not been found. One way of studying these equations has been to study their behavior directly by numerical means. This produces answers that are not general. The approaches described here attempt to provide limits to the dynamical behavior of these equations by using optimization techniques. We suggest that the approaches described here may be useful both to study other questions in population genetics, and in the study of genetic algorithms. In principle, the framework developed here could be used to study limits to the performance of genetic algorithms as optimization methods. For most questions in population genetics, we have a much better grasp of the variables in our models than we do of the parameters. Thus the framework outlined here, where the parameters are used as the unknowns, corresponds much more closely to our state of knowledge than the more standard approaches.
Acknowledgements We thank Warren Ewens for comments. This research was supported by Public Health Service grant GM32130 to AH.
References
Bryson, A. E., and Y.-C. Ho. 1975. Applied Optimal Control. Hemisphere, Washington.
Crow, J. F., and M. Kimura. 1970. An introduction to population genetics theory. Harper and Row, N.Y.
Ewens, W. J. 1979. Mathematical population genetics. Springer-Verlag, Berlin (Biomathematics Volume 9).
Ewens, W. J., and A. Hastings. 1995. Chapter in this book.
Fox, G. A., and A. Hastings. 1992. Inferring Selective History from Multilocus Frequency Data: Wright Meets the Hamiltonian. Genetics 132:277-288.
Hastings, A. 1981. Disequilibrium, selection, and recombination: limits in two-locus, two-allele models. Genetics 98:659-668.
Hastings, A. 1989. Deterministic multilocus population genetics: an overview, pp 27-54 in Some Mathematical Questions in Biology: Models in Population Biology, edited by A. Hastings. American Mathematical Society, Providence, Rhode Island.
Hofbauer, J., and K. Sigmund. 1988. The Theory of Evolution and Dynamical Systems. Cambridge University Press, Cambridge.
Nagylaki, T. 1992. Introduction to theoretical population genetics. Springer-Verlag, Berlin (Biomathematics Volume 21).
Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
Emergence of Mutualism
Guillemette Duchateau-Nguyen*, Gérard Weisbuch* and Luca Peliti+
* Ecole Normale Supérieure, Laboratoire de Physique statistique (Associé au CNRS et aux Universités Paris VI et Paris VII), 24 rue Lhomond, 75005 Paris
+ Institut Curie, Section de Physique et chimie, 11, rue Pierre et Marie Curie, 75005 Paris
duchat@physique.ens.fr
weisbuch@physique.ens.fr
peliti@radium.jussieu.fr
Abstract A population dynamics approach based on a system of differential equations allows us to establish conditions for the emergence of mutualism for cases such as coelenterates-algae symbionts. A central assumption of the model is that a host organism is able to discriminate, via some molecular recognition mechanisms, among different invading organisms and preferentially reject parasites rather than bona fide symbionts. Large differential rejection rates allow the emergence of mutualism. Different attractors of the population dynamics correspond to the emergence of mutualism, predominance of "selfish" species, or coexistence of many species.
1 Introduction
1.1 The paradox of mutualism
Mutualistic systems are known to occur in nature, e.g., lichen made of algae and fungus, corals-zooxanthellae and Hydra-Chlorella. Exchanges among the partners are beneficial for the species involved, which live together in close association. The benefits exchanged can be food, protection-habitat and transport (pollination) {Boucher, James and Keeler, 1982}. According to {Begon, Harper and Townsend, 1986} most of the world's biomass is composed of mutualists: organisms in forests, meadows and corals are involved in symbiotic associations. The emergence and stability of mutualism constitutes a paradox in terms of individual selection of the fittest. The paradox is the following: since giving food to the other symbiont should be costly for the donor, we expect the donor to be disadvantaged in terms of fitness with respect to a more selfish species which would give nothing. The purpose of this paper is to describe a mathematical model which accepts the premises of this assertion but refutes its conclusion. An important part of the argument is that selection occurs not only at the level of individual organisms, but also at the level of their mutualistic associations.
1.2 Previous theoretical studies
The stability of mutualistic ecosystems has been studied by a number of authors using cost-benefit analysis, standard non-linear differential systems or "Artificial Life" numerical simulations. An early work is that of {Roughgarden, 1975}, who did a cost-benefit analysis of the exchanges among hosts and guests to describe damselfish-anemone mutualism. {Wilson, 1983}, in his study of the beetle-phoretic mite association, insists on the idea of group rather than individual selection. He predicted that when populations are clustered in groups with a varying proportion of mutualists and selfish types, those groups with more mutualists should do better and be selected. The differential system approach is summarized in {De Angelis, Post and Travis, 1986}. It is based on either the Volterra-Lotka formalism to characterize global stability or a more general form of the per capita rate of increase function to monitor local stability. Classical non-linear analysis criteria are used to obtain inequalities among equation coefficients that ensure stability. The absence of stability among a set of hosts, parasites and true symbionts that exist as individual species leads these authors to suggest that mutualism can only exist when the relationship between the host and the guest is one-to-one. The artificial life simulations of {Ikegami and Kaneko, 1990} do not succeed in establishing mutualism permanently, since they postulate the existence of parasitic mutants quite harmful to the host. Their model exhibits transient periods during which mutualistic species predominate, alternating with periods when parasitic species exploit them. Artificial life models have also been used to study stability and breakdown of symbiotic associations {Taylor, Muscatine and Jefferson, 1989}. A conclusion of many authors is that long-term interactions between host and guest should be necessary for the establishment of mutualism.
These conclusions also appear in studies of equivalent problems, such as the increase of the stability of hypercycles by compartmentation {Szathmáry and Demeter, 1987} and in the iterated prisoner's dilemma model {Lindgren, 1992; Nowak and May, 1992}, where cooperation among players can only be established when players play long enough. The present paper works along these lines. We propose here a differential equation model that takes into account a dynamics of association-dissociation of organisms which is representative of the processes existing in coelenterates-algae symbiosis, and we study the range of parameter values for which this aggregation dynamics gives rise to mutualism. The next section presents the mathematical framework and the elementary processes involving individual organisms. Section 4 is a summary of simulation results obtained for a simple model where only binary associations are possible. Section 5 presents the slow manifold analysis of the model, which brings us some insight into the transitions among the different observed dynamical behaviors. In section 6, we show that the asymptotic behavior of associations with many endosymbionts is equivalent to that of binary association, since the slow manifolds of both models are equivalent. Section 7 extends the model to those cases when one of the benefits enjoyed by endosymbionts is protection by the host, and when the host eventually digests parasites instead of rejecting them. The conclusions of this study are discussed in section 8.
2 The hyperbolic model
We use a model of population dynamics introduced in {Weisbuch, 1984}. With respect to the Volterra-Lotka approach, this model does not exhibit divergences of populations in the case of positive interactions among organisms and is analytically soluble in the low mutation rate limit. The following differential system allows us to study population dynamics in the presence of mutations. Each population varies in time due to three terms:
- The first term describes population growth according to a fitness coefficient and available resources f, which are shared by all populations;
- The second term is simply the rate of death;
- The third term is a mutation term, which decreases the population because of all possible point mutations or increases it because of mutations from other existing species j one mutation away.
\[ \frac{dP_i}{dt} = \frac{\alpha_i f P_i}{P_t} - d P_i + m \Bigl( -n P_i + \sum_j P_j \Bigr) \quad \text{for every } i \tag{1.1} \]
and
\[ P_t = \sum_i P_i \]
where $P_i$ and $P_j$ are the populations of i and j, $P_t$ is the total population, d and m are the death and mutation rates and n is the number of genes of each organism. A first time scale is given by 1/d; the other, much longer, is the time scale related to evolution, 1/m. The scale of the populations is proportional to the available resources. Starting from low population levels, those populations with sufficient fitness grow at an exponential rate. But a total population level is then reached at which only the fittest populations have a positive growth rate, namely those whose fitness is in excess of $d P_t / f$. The attractor of the system is obtained analytically by a perturbation technique {Morse and Feshbach, 1953} when the mutation term is small with respect to the death term. First of all, only the fittest organisms predominate, with a population ratio of order $d/m$ with respect to other organisms. The fittest (or dominant) population $P_m$ is obtained by equating its time derivative to 0 and neglecting mutations from other species:
\[ P_t = \frac{\alpha_m f}{d} = P_m \tag{1.2} \]
where $\alpha_m$ is the fitness coefficient of the fittest population (equating $P_m$ to $P_t$ is a guess which is supported by further analysis of higher order terms in m). The populations of the nearest mutants are obtained by equating their time derivative to zero and taking into account the mutation term which comes from the fittest species:
\[ \Bigl( \frac{\alpha_i f}{P_t} - d \Bigr) P_i + m P_m = 0, \qquad P_i = \frac{m P_m}{d \bigl( 1 - \frac{\alpha_i}{\alpha_m} \bigr)} \tag{1.3} \]
In the limit of small m, the ratio of the first mutants to the fittest population scales as $m/d$. The same analysis can be carried on to the next mutants, which decay in population by a factor $m/d$ for each further mutation from the fittest. The perturbation technique described here will allow us to interpret the results obtained later. The main point is that the fittest populations predominate, the others being smaller by a factor of order $m/d$.
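As a quick numerical check of equations (1.2) and (1.3), the sketch below evaluates the leading-order populations of the fittest species and of one of its nearest mutants. The function name and the sample values are illustrative assumptions and are not part of the original model description.

```python
def hyperbolic_equilibrium(alpha_m, alpha_i, f, d, m):
    """Leading-order equilibrium of the hyperbolic model.

    Returns (P_m, P_i): the dominant population from equation (1.2) and
    the population of a nearest mutant from equation (1.3).
    Assumes alpha_i < alpha_m and a mutation rate m much smaller than d.
    """
    if not alpha_i < alpha_m:
        raise ValueError("the mutant must be less fit than the dominant species")
    p_m = alpha_m * f / d                            # equation (1.2)
    p_i = m * p_m / (d * (1.0 - alpha_i / alpha_m))  # equation (1.3)
    return p_m, p_i


# Illustrative values only: fitnesses 10 and 8, f = 80, d = 1e-2, m = 1e-5.
p_m, p_i = hyperbolic_equilibrium(alpha_m=10.0, alpha_i=8.0, f=80.0, d=1e-2, m=1e-5)
print(p_m, p_i)
```

With these values the mutant population is smaller than the dominant one by a factor of a few times m/d, as the perturbation argument predicts.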
3 Building-up of the model
3.1 Phylogeny and interactions
We shall now apply a modified version of the above formalism to a system made of five populations, whose interactions and phylogeny are represented in Figure 1. This model differs from the canonical model of section 2 by the presence of two food resources, $f_1$ and $f_2$, corresponding respectively to hosts and endosymbionts.
Fig. 1: Phylogeny and interactions of the 5-species system. The organisms of the populations C, D and E use resource 2, available in quantity $f_2$. The organisms of the populations A and B use resource 1, available in quantity $f_1$. Horizontal arrows represent exchanges among organisms.
Populations C and B are unrelated and are not involved in any interaction with other populations. They can be considered as primeval organisms. Their respective fitnesses are $\gamma$ and $\beta$. Population A differs from population B by one mutation. Population A can be considered as a host for populations D and E. It produces some nutrients that can be used by D or E. D is a guest of A and is a selfish organism; it is called a commensal in what follows. It uses nutrients produced by A, but does not give anything to A in exchange. E is a bona fide symbiont of A. It uses nutrients produced by A and in exchange provides A with nutrients. Both D and E are called endosymbionts when they are inside A.
3.2 The elementary processes
The ecosystem is made of free organisms A, B, C, D and E and couples AE and AD, when A is infested respectively by E or D. To simplify the model, we have first supposed that the host A offers only one site where E or D are able to bind. This simplification limits the set of possible associations to couples AD and AE. Changes in population sizes are due to elementary processes represented in Figure 2. Those elementary processes are: reproduction, which depends on a fitness coefficient; death, which occurs with a rate d; mutation, which occurs with a rate m; and association and dissociation, which depend on kinetic constants.
3.2.1 Reproduction
The organisms are either free or associated with another organism belonging to a different species. In the case of reproduction of one of the symbionts of AD or AE, the other one is liberated. For each one of AE or AD two processes can then occur:
AE -> A + AE (reproduction of A with fitness $\alpha$)
AE -> E + AE (reproduction of E with fitness $\epsilon$)
AD -> A + AD (reproduction of A with fitness $\alpha_F$)
AD -> D + AD (reproduction of D with fitness $\delta$)
The following choice of fitness parameters reflects the fundamental paradox about mutualism. The relations (3.1, 3.2) among fitnesses are:
\[ \alpha > \beta > \alpha_F \tag{3.1} \]
The fitness of B ($\beta$) is higher than the fitness of A ($\alpha_F$) because A provides D and E with nutrients. But when E is bound to A, the fitness of A ($\alpha$) is larger than that of B thanks to the cooperation of E.
\[ \epsilon_F < \delta_F = \gamma < \epsilon \le \delta \tag{3.2} \]
The fitness of the E species ($\epsilon$ or $\epsilon_F$) is lower than the fitness of D ($\delta$ or $\delta_F$) because E produces, at some cost, a nutrient for A. $\delta_F = \gamma$ since D does not provide nutrients to anyone. The fitnesses of D ($\delta$) and E ($\epsilon$) when associated with A are larger than the fitnesses ($\delta_F$ and $\epsilon_F$) of the free organisms.
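The orderings (3.1) and (3.2) are easy to check mechanically for any candidate parameter set. The sketch below does so for the fitness values listed in Table 1, taking $\epsilon = 10$ from the stated range of 8 to 12; the function and dictionary names are illustrative assumptions.

```python
def fitness_orderings_hold(alpha, beta, alpha_f, eps_f, delta_f, gamma, eps, delta):
    """Return True when relations (3.1) and (3.2) both hold."""
    relation_3_1 = alpha > beta > alpha_f
    relation_3_2 = eps_f < delta_f == gamma < eps <= delta
    return relation_3_1 and relation_3_2


# Fitness coefficients from Table 1, with eps chosen inside its stated range.
table_1 = dict(alpha=12, beta=10, alpha_f=8, eps_f=7,
               delta_f=8, gamma=8, eps=10, delta=12)
print(fitness_orderings_hold(**table_1))
```

Raising `eps` above `delta` makes the check fail, which corresponds to removing the cost of cooperation that defines the paradox.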
3.2.2 Association and dissociation
Choosing the kinetic constant of dissociation of D larger than the kinetic constant of dissociation of E ($k_{sd} > k_{se}$) allows E to spend more time inside A than D. This results in a larger time-averaged fitness of E. The basic hypothesis is that the host normally rejects "guests" at a certain rate, but it can somehow appreciate the degree of cooperation of E and reject it less frequently than D. This selective rejection of D might be due to some molecular recognition mechanisms, which are known to exist in polyps and sponges, or might occur simply because the increase in the level of nutrient produced by E decreases the rejection rate of E by A. The only "advantage" of E with respect to D in this model is $k_{se} < k_{sd}$. We then want to check when, i.e. for what set of parameters, this condition is sufficient to overcome the advantage in fitness of D over E and bring about the emergence of mutualism.
Fig. 2: Set of the elementary processes that modify the populations. Thin continuous lines represent fast association processes between host and guest organisms, with kinetic constants $k_{ld}$ and $k_{le}$, and fast dissociation processes, with kinetic constants $k_{sd}$ and $k_{se}$. Bold lines represent reproduction and death processes (with rate d). The Greek letters represent the fitness coefficients associated with the reproduction processes of the organisms.
4 The Single site model (SSM)
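4.1 The differential system
The set of differential equations describing the corresponding population dynamics is:
\[ \frac{dA}{dt} = \frac{(\alpha_F (A + AD) + \alpha\, AE)\, f_1}{A + AD + AE + B} - (d + m n)\, A + d\, (AD + AE) + m\, (B + AD) + k_{se}\, AE + k_{sd}\, AD - k_{le}\, A \cdot E - k_{ld}\, A \cdot D \tag{4.1} \]
\[ \frac{dAD}{dt} = -2 (d + m n)\, AD + m\, AE - k_{sd}\, AD + k_{ld}\, A \cdot D \tag{4.2} \]
\[ \frac{dAE}{dt} = -2 (d + m n)\, AE + m\, AD - k_{se}\, AE + k_{le}\, A \cdot E \tag{4.3} \]
\[ \frac{dB}{dt} = \frac{\beta\, B\, f_1}{A + AD + AE + B} - (d + m n)\, B + m\, (A + AE + AD) \tag{4.4} \]
\[ \frac{dC}{dt} = \frac{\gamma\, C\, f_2}{C + D + E + AD + AE} - (d + m n)\, C + m\, (D + AD) \tag{4.5} \]
\[ \frac{dD}{dt} = \frac{(\delta_F D + \delta\, AD)\, f_2}{C + D + E + AD + AE} - (d + m n)\, D + d\, AD + m\, (C + E + n\, AD) + k_{sd}\, AD - k_{ld}\, A \cdot D \tag{4.6} \]
\[ \frac{dE}{dt} = \frac{(\epsilon_F E + \epsilon\, AE)\, f_2}{C + D + E + AD + AE} - (d + m n)\, E + d\, AE + m\, (D + n\, AE) + k_{se}\, AE - k_{le}\, A \cdot E \tag{4.7} \]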
These equations simply sum the contributions of the processes shown in Figure 2 (section 3.2) and give the time variation of each population.
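To make the structure of system (4.1) to (4.7) concrete, the following sketch writes its right-hand side as a plain Python function. The names `Params` and `ssm_rhs` are hypothetical; the default values are those listed in Table 1, except `eps = 10` and `ksd = 4`, which are arbitrary choices inside the ranges the table gives. No integrator is included, so this is only an assumed starting point for reproducing the simulations, not the authors' code.

```python
from typing import NamedTuple


class Params(NamedTuple):
    d: float = 1e-2       # death rate
    m: float = 1e-5       # mutation rate
    n: int = 3            # number of genes
    f1: float = 80.0      # food shared between A and B
    f2: float = 100.0     # food shared between C, D and E
    alpha_f: float = 8.0  # A free or in AD
    alpha: float = 12.0   # A in AE
    beta: float = 10.0
    gamma: float = 8.0
    delta_f: float = 8.0
    delta: float = 12.0
    eps_f: float = 7.0
    eps: float = 10.0     # assumed value in the range 8 to 12
    kld: float = 1.0
    kle: float = 1.0
    ksd: float = 4.0      # assumed value in the range 0.1 to 12
    kse: float = 0.1


def ssm_rhs(state, p):
    """Time derivatives (dA, dAD, dAE, dB, dC, dD, dE) of equations (4.1)-(4.7)."""
    A, AD, AE, B, C, D, E = state
    n1 = A + AD + AE + B           # consumers of resource 1
    n2 = C + D + E + AD + AE       # consumers of resource 2
    loss = p.d + p.m * p.n         # death plus outgoing mutations
    bind_d = p.kld * A * D         # association A + D -> AD
    bind_e = p.kle * A * E         # association A + E -> AE
    dA = ((p.alpha_f * (A + AD) + p.alpha * AE) * p.f1 / n1 - loss * A
          + p.d * (AD + AE) + p.m * (B + AD)
          + p.kse * AE + p.ksd * AD - bind_e - bind_d)
    dAD = -2.0 * loss * AD + p.m * AE - p.ksd * AD + bind_d
    dAE = -2.0 * loss * AE + p.m * AD - p.kse * AE + bind_e
    dB = p.beta * B * p.f1 / n1 - loss * B + p.m * (A + AE + AD)
    dC = p.gamma * C * p.f2 / n2 - loss * C + p.m * (D + AD)
    dD = ((p.delta_f * D + p.delta * AD) * p.f2 / n2 - loss * D
          + p.d * AD + p.m * (C + E + p.n * AD) + p.ksd * AD - bind_d)
    dE = ((p.eps_f * E + p.eps * AE) * p.f2 / n2 - loss * E
          + p.d * AE + p.m * (D + p.n * AE) + p.kse * AE - bind_e)
    return (dA, dAD, dAE, dB, dC, dD, dE)


# Derivatives for a state without E: A=0.01, AE=0, AD=0, B=80000, C=20000, D=60000.
print(ssm_rhs((0.01, 0.0, 0.0, 80000.0, 20000.0, 60000.0, 0.0), Params()))
```

Because the dissociation constants are much larger than d and m, the system is stiff; an integrator suited to stiff problems would probably be needed to follow it up to the long times used in the simulations.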
4.2 Simulation results
The initial conditions are such that populations are at equilibrium for the populations B, C, A, D and AD, mutant E being absent; for instance: A=0.01, AE=0, AD=0, B=80000, C=20000, D=60000.
For each set of parameters, the initial conditions, determined by numerical simulations of the differential system in the absence of E, are those corresponding to the eventual emergence of mutualism, when a true symbiont E is introduced into an ecosystem at equilibrium containing only the primeval species B and C and the commensal D plus the host species A. The parameters listed in Table 1 were used in the simulations, unless otherwise specified. Varying parameters and initial conditions, three types of attractors are obtained by numerical integration of the system (equations (4.1) to (4.7)). All attractors are point attractors. The most sensitive parameters, in terms of dynamical behavior, are the difference in fitness ($\delta - \epsilon$) and in the rejection rate ($k_{sd} - k_{se}$) between species D and E. The resource parameters $f_1$ and $f_2$ simply change the scale of populations. Three time scales, fast, intermediate and slow, are fixed by $k_{sd}$, d and m respectively. d/m determines the ratio between dominant and less fit species populations.
Parameters                                      Values
d (frequency of death)                          10^-2
m (mutation rate)                               10^-5
n (number of genes)                             3
$f_1$ (food shared between A and B)             80
$f_2$ (food shared between C, D and E)          100
Fitness coefficients
$\alpha_F$ (free organisms A)                   8
$\beta$ (organisms B)                           10
$\gamma$ (organisms C)                          8
$\delta_F$ (free organisms D)                   8
$\epsilon_F$ (free organisms E)                 7
$\alpha$ (organisms A in AE)                    12
$\delta$ (organisms D in AD)                    12
$\epsilon$ (organisms E in AE)                  between 8 and 12
Association and dissociation constants
$k_{ld}$ (organisms D)                          1
$k_{le}$ (organisms E)                          1
$k_{sd}$ (organisms D)                          between 0.1 and 12
$k_{se}$ (organisms E)                          0.1
Table 1: Values of the parameters used in the numerical simulations.
For small $k_{sd}$, i.e., when D spends in A an amount of time comparable to E, AE does not prevail over AD. The commensal D benefits from the nutrients it gets from A, and develops faster than E. A does not benefit from D and is not able to grow faster than B. Finally the primeval population B remains at the higher level. A small population of host A is mainly infested by commensals D, which gives a small but sufficient advantage to D to overcome C.
Emergence of mutualism is observed for large $k_{sd}$, i.e. when D is rapidly expelled from A. A gets support from E and is able to overcome B. The symbiotic organism AE becomes predominant over the primeval populations C and B. The populations of primeval organisms are only maintained by mutations.
For some values of the fitness parameters a coexistence region is observed for intermediate values of $k_{sd}$. In this region, one observes coexistence of the couples AD and AE. The fitness of A in these couples is comparable to that of B, which coexists with them. A high population of free D is also maintained.
Fig. 3: Equilibrium populations measured at time t=100000 as a function of $k_{sd}$ for $\epsilon = 10$. The three regimes, predominance of the ancestors, coexistence and emergence of mutualism, are separated by sharp transitions. Populations of A and E, very small, are not represented on the diagram. Continuous lines were obtained from initial conditions in the absence of E species (emergence conditions). The dotted lines are obtained with initial populations of the attractors: ADc and AEc corresponding to the coexistence attractor and AEm to the mutualism attractor. The arrow indicates that the coexistence attractor exists up to larger values of $k_{sd}$.
The three regimes (egoism, coexistence and mutualism) can be observed on Figure 3, which shows the equilibrium levels of the populations when $k_{sd}$ increases. The transition values of $k_{sd}$ between the three dynamics depend upon the difference in fitness between D and E. This diagram is complicated because the regime that is reached depends not only on the parameters, but also on the initial conditions. The discontinuities in populations are due to the non-linearity of the equilibrium equations, obtained by setting the time derivatives to zero. For the same set of parameters several solutions exist, one or two of which are attractors. Dependence on the initial conditions and hysteresis are then observed. One consequence of this hysteresis is that the conditions for the stability of mutualism when facing the invasion of a new commensal are less stringent than the conditions for the emergence of mutualism facing the same commensal already established. The emergence of mutualism implies a transition from a different attractor and is achieved for a larger $k_{sd}$ (or a smaller $\delta - \epsilon$) than its failure against an invading commensal, which implies a dynamics starting from the mutualism attractor. In the second case the transition is achieved for a lower $k_{sd}$ (or a larger $\delta - \epsilon$).
5 Slow manifold analysis
The slow manifold analysis allows us to interpret the simulation results and to predict the transitions among the different dynamical regimes. The interaction processes among the different populations have very different time scales: the exchange interactions between free and linked species are very fast with respect to the population growth and death terms, which are of the same order. Mutations are even slower. One can then suppose that the populations of free and linked organisms follow their ratio at exchange equilibrium, given by:
\[ (k_{sd} + 2d)\, AD = k_{ld}\, A \cdot D \tag{5.1} \]
\[ (k_{se} + 2d)\, AE = k_{le}\, A \cdot E \tag{5.2} \]
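Equations (5.1) and (5.2) fix the ratio of bound to free guests once the free host population A is known. A minimal sketch of that relation follows; the function name and the sample value of A are assumptions made for illustration.

```python
def bound_fraction(k_assoc, k_dissoc, d, free_host):
    """Fraction of a guest species that is bound at exchange equilibrium.

    From (k_dissoc + 2 d) * bound = k_assoc * free_host * free, as in
    equations (5.1) and (5.2), the bound/free ratio is
    k_assoc * free_host / (k_dissoc + 2 d); the bound fraction follows.
    """
    ratio = k_assoc * free_host / (k_dissoc + 2.0 * d)
    return ratio / (1.0 + ratio)


# With the same free host population, the guest rejected more often
# (larger dissociation constant) is bound less often.
y_d = bound_fraction(k_assoc=1.0, k_dissoc=4.0, d=1e-2, free_host=0.5)
x_e = bound_fraction(k_assoc=1.0, k_dissoc=0.1, d=1e-2, free_host=0.5)
print(y_d, x_e)
```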
After a fast decay towards exchange equilibria, the populations move along a slow manifold whose equations are obtained by equating to zero combinations of the time derivatives in which the association and dissociation terms cancel (fast exchange). Mutation terms are neglected since they are small with respect to proliferation and death rates:
\[ \frac{dA}{dt} + \frac{dAD}{dt} + \frac{dAE}{dt} = \frac{(\alpha_F (A + AD) + \alpha\, AE)\, f_1}{A + AD + AE + B} - d\, (A + AD + AE) = 0 \tag{5.3} \]
\[ \frac{dD}{dt} + \frac{dAD}{dt} = \frac{(\delta_F D + \delta\, AD)\, f_2}{C + D + E + AD + AE} - d\, (D + AD) = 0 \tag{5.4} \]
\[ \frac{dE}{dt} + \frac{dAE}{dt} = \frac{(\epsilon_F E + \epsilon\, AE)\, f_2}{C + D + E + AD + AE} - d\, (E + AE) = 0 \tag{5.5} \]
\[ \frac{dB}{dt} = \frac{\beta\, B\, f_1}{A + AD + AE + B} - d\, B = 0 \tag{5.6} \]
\[ \frac{dC}{dt} = \frac{\gamma\, C\, f_2}{C + D + E + AD + AE} - d\, C = 0 \tag{5.7} \]
The set of elementary processes that give rise to these equations is reduced to the bold line processes of Figure 2.
5.1 The "species" and their effective fitness
The first three equations represent the time derivatives of populations of "species" A, D and E. Here the term "species", as opposed to organisms, refers to a set of organisms, whether free or bound. The total population of species A, for instance, is A+AD+AE. The reproduction of these species involves an "effective fitness", which is:
\[ \frac{\alpha_F (A + AD) + \alpha\, AE}{A + AD + AE} \quad \text{for species A, composed of populations A+AD+AE,} \]
\[ \frac{\delta_F D + \delta\, AD}{D + AD} \quad \text{for species D, composed of populations D+AD,} \]
\[ \frac{\epsilon_F E + \epsilon\, AE}{E + AE} \quad \text{for species E, composed of populations E+AE.} \]
The effective fitnesses are thus average fitnesses, intermediate between those of free and bound organisms (see Figure 4). Since the processes of association and dissociation are faster than those of death and reproduction, the selection process, which is based on the reproduction/death balance, occurs for "species" rather than for individuals. We then expect that those species with larger effective fitness, or larger fitnesses as far as B and C are concerned, will become predominant. Since the effective fitnesses depend on the ratio of free to bound organisms, they vary with the dissociation constant, which gives rise to the observed transitions. The maximisation of effective fitnesses plays a role similar to the minimisation of free energies in the thermodynamics of solutions. The phases, nearly pure chemical species or mixtures, that are obtained for a given set of thermodynamic conditions are those which minimize free energy under these conditions. The same is true here: the observed dynamical regimes involve either a strict selection of species, as in the case of egoism and mutualism, or coexistence of species, but in both cases the highest fitnesses are selected. To carry on this analogy, the mutation rate, which increases the populations of subdominant species, has a role similar to temperature in physical systems. In ordinary physical systems, phase transitions are obtained by changes of intensive variables such as temperature, or concentrations of some chemical species. In the systems of biologically interacting species that we describe here, the most biologically plausible factor for possible changes in dynamical regime is the appearance of new mutants with different parameters for fitness, death or association. Resources are extensive variables in this model, the equivalent of volumes in the thermodynamics of solutions.
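The effective fitness of a "species" is simply a population-weighted average of the free and bound fitnesses. The short sketch below computes it for species D and E; the function name and the sample populations are illustrative assumptions, while the fitness values are those of Table 1 with $\epsilon = 10$.

```python
def effective_fitness(free_fitness, bound_fitness, free_pop, bound_pop):
    """Population-weighted average fitness of a species (free plus bound forms)."""
    total = free_pop + bound_pop
    if total <= 0.0:
        raise ValueError("the species must have a positive total population")
    return (free_fitness * free_pop + bound_fitness * bound_pop) / total


# Species D (delta_F = 8, delta = 12) with one quarter of its organisms bound,
# and species E (eps_F = 7, eps = 10) with nine tenths of its organisms bound.
fit_d = effective_fitness(8.0, 12.0, free_pop=3.0, bound_pop=1.0)
fit_e = effective_fitness(7.0, 10.0, free_pop=1.0, bound_pop=9.0)
print(fit_d, fit_e)
```

In this made-up example E has the higher effective fitness even though both of its individual fitnesses are lower than those of D, which is the mechanism the text relies on.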
5.2 Dominant species
We have seven equations with seven unknowns, and we should, in principle, be able to compute all variables, which we expect a priori to be of the same order of magnitude. In fact this is never observed; if one tries to solve the system directly, some variables have to be set to zero to avoid contradictions. This means that only some of the variables have strictly positive values; they correspond to the dominant populations with larger effective fitness (see Figure 3) and they can be directly computed by solving the slow manifold equations without mutation terms. The other variables are in fact smaller by a factor $m/d$; they can be computed from the dominant populations by taking into account the mutation terms that we have previously neglected. The five last equations (5.3) to (5.7) describe the balance between proliferation and death of species A, D, E, B and C. Not all populations can satisfy these equations at the same time. Equations (5.4) and (5.7), for instance, when combined, give:
\[ \gamma = \frac{\delta_F D + \delta\, AD}{D + AD} \tag{5.8} \]
which implies:
\[ \delta_F < \gamma < \delta \tag{5.9} \]
in contradiction with expression (3.2), $\delta_F = \gamma$. In fact, the species with larger effective fitness, D, predominates with a large population of order $\delta f_2 / d$ (cf. equation (1.2)). The mutation term has to be re-introduced in equation (5.7) in order to compute C, which is smaller than D+AD by a factor of order $m/d$. The different dynamical regimes which are observed by computer simulations correspond to different branches of the slow manifold where different sets of populations are dominant. There is a limited domain of parameters where a particular attractor solution exists, which gives a necessary condition to reach this attractor. To actually solve the equations, one selects a set of predominant populations, being guided by the simulation results. The self-consistency of the choice is checked by solving the equilibrium equations. The following algebraic computations using the slow manifold approximation allow us to predict the possible attractors for a given set of parameters. But, unless the attractor is unique, knowing the domains of existence of the attractors does not predict which possible attractor is reached for every initial condition.
5.3 The coexistence regime
Let us start with the coexistence regime, where all species except C are dominant. The following change of variables is made:
\[ x = \frac{AE}{E + AE}, \qquad y = \frac{AD}{D + AD}, \qquad z = \frac{E + AE}{D + AD + E + AE}, \qquad K = \frac{k_{sd} + 2d}{k_{se} + 2d} \tag{5.10} \]
Combining equations (5.1) and (5.2) into one, the system may now be simplified into:
\[ \frac{x}{1 - x} = K\, \frac{y}{1 - y} \tag{5.11} \]
A is neglected with respect to AE and AD. This is based on simulation results. The magnitude of A could also be estimated from equations (5.1) and (5.2), taking into account the fact that $AD/D$ and $AE/E$ are of order 1. The following equations (5.12 - 5.15) are derived from equations (5.3 - 5.6):
\[ \alpha_F\, y (1 - z) + \alpha\, x z = \beta \bigl( x z + y (1 - z) \bigr) \tag{5.12} \]
\[ \delta_F (1 - y) + \delta\, y = \frac{d\, (C + D + E + AD + AE)}{f_2} \tag{5.13} \]
\[ \epsilon_F (1 - x) + \epsilon\, x = \frac{d\, (C + D + E + AD + AE)}{f_2} \tag{5.14} \]
\[ A + AD + AE + B = \frac{\beta f_1}{d} \tag{5.15} \]
Figure 4 is a graphical resolution of equations (5.11), (5.13) and (5.14). This diagram shows the linear increase of the effective fitnesses as a function of the fraction of linked organisms. Equating the left-hand sides of equations (5.13) and (5.14) shows that equality among effective fitnesses, which is the condition for coexistence of species D and E, is obtained if and only if the segments delimited on the horizontal by the fitness lines are in the ratio corresponding to equation (5.11). Two solutions exist for large values of K, but only the one closer to x=1, corresponding to the highest fitness, is stable. The other one is close to y=0. When K decreases, the two solutions get closer until they merge at the transition from coexistence to egoism. The algebraic solution is obtained by equating the left-hand sides of equations (5.13) and (5.14), which allows us to express y as a function of x. Putting the resulting y in equation (5.11) gives a second degree equation in x. Setting the discriminant of this equation to zero gives the lower boundary in $k_{sd}$ for the existence of the coexistence attractor (the corresponding transition line is the lower dotted line on Figure 3). The boundary only depends upon the ratio of association constants K and upon the fitnesses of species E and D. It is independent of available resources. An upper boundary in $k_{sd}$ is obtained by using equation (5.12) to compute the actual magnitudes of the populations; it is reached for rather large values of $k_{sd}$, when B goes to 0. In between, equations (5.11) to (5.15) allow us to compute the populations of the dominant species, and C is computed from the original equation (4.5) with mutations.
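The algebraic route just described can be sketched in a few lines. Equating the left-hand sides of (5.13) and (5.14) gives a straight line y = a + b x; substituting it into (5.11), read here as x/(1-x) = K y/(1-y), gives a quadratic in x. The function name and the intermediate coefficients a, b, c0, c1, c2 are notation introduced for this sketch only.

```python
import math


def coexistence_solutions(K, delta_f, delta, eps_f, eps):
    """Candidate (x, y) pairs for the coexistence regime, sorted by x.

    Equal effective fitnesses, equations (5.13) and (5.14), give the line
    y = a + b*x. Inserting it into x*(1 - y) = K*y*(1 - x), equation (5.11),
    gives c2*x**2 + c1*x + c0 = 0. Only roots with 0 < x < 1 and 0 < y < 1
    are kept; an empty list means no admissible solution.
    """
    a = (eps_f - delta_f) / (delta - delta_f)
    b = (eps - eps_f) / (delta - delta_f)
    c2 = b * (K - 1.0)
    c1 = 1.0 - a - K * (b - a)
    c0 = -K * a
    if c2 == 0.0:
        roots = [-c0 / c1] if c1 != 0.0 else []
    else:
        disc = c1 * c1 - 4.0 * c2 * c0
        if disc < 0.0:
            return []  # no real root
        s = math.sqrt(disc)
        roots = [(-c1 - s) / (2.0 * c2), (-c1 + s) / (2.0 * c2)]
    pairs = [(x, a + b * x) for x in sorted(roots)]
    return [(x, y) for x, y in pairs if 0.0 < x < 1.0 and 0.0 < y < 1.0]


# Table 1 fitnesses with eps = 10; K = 10 is an arbitrary "large" value.
print(coexistence_solutions(10.0, delta_f=8.0, delta=12.0, eps_f=7.0, eps=10.0))
```

For these values two admissible pairs are found, one with y close to 0 and one with x closer to 1, in line with the description above. Scanning K downwards until the list becomes empty locates the discriminant condition numerically.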
Fig. 4: Effective fitnesses and limits of stability. When x (resp. y), the fraction of associated true symbionts E (resp. parasites D), varies between 0 and 1, the effective fitness $\epsilon(x)$ averaged over both associated and free forms varies between $\epsilon_F$ and $\epsilon$ (resp. $\delta_F < \delta(y) < \delta$). Equality of effective fitnesses (the transition condition) and the association equilibria require that the segments delimited on the horizontal line verify a ratio K defined in the text (see equation (5.11)).
5.4 The mutualistic regime
In the other regimes, the set of dominant species is even smaller. We could of course systematically try all possible combinations of species to find out whether they satisfy equations (5.11) to (5.15), but the easiest way is to make use of simulation results to select them. In the mutualistic regime, the dominant population is AE. It is computed from equation (5.3) by neglecting all other populations on the $f_1$ branch:
\[ AE = \frac{\alpha f_1}{d} \tag{5.16} \]
Using this result in equation (5.5) gives a second degree equation for x, the ratio of bound E to total E (as defined above in (5.10)):
\[ (\epsilon - \epsilon_F)\, x^2 + \epsilon_F\, x - \frac{\alpha f_1}{f_2} = 0 \tag{5.17} \]
The condition $x < 1$ gives the following inequality between the fitnesses and the available foods:
\[ \epsilon f_2 > \alpha f_1 \tag{5.18} \]
When both sides are divided by d, they represent the maximum population sizes of species E and A. Inequality (5.18) is then interpreted as the possibility for E to eventually bind to all available A. This means that mutualism can be established by compensating the difference in fitness between E and D inside A by a large enough rejection rate for D, provided that the food available to species D and E is sufficient for E to always saturate A. Otherwise, however large $k_{sd}$ is with respect to $k_{se}$, the effective fitness of A is not larger than $\beta$, and mutualism cannot be established. The mutualistic regime is observed as long as the effective fitness of species E is larger than that of D. A transition to coexistence occurs when both fitnesses become equal. The effective fitness of E is expressed in terms of x computed from (5.17). x is used to compute y by (5.11), which is then used to compute the effective fitness of D. The transition line of Figure 3 (the upper dotted line), which limits the domain of existence of mutualism, is obtained when the values of x and y are inserted in the equality:
\[ \delta_F (1 - y) + \delta\, y = \epsilon_F (1 - x) + \epsilon\, x \tag{5.19} \]
5.5 The selfish regime

The dominant species in the selfish regime are B and D, given by equations (5.4) and (5.6):

$$B = \frac{\beta f_1}{d}, \qquad D = \frac{\delta_F f_2}{d} \tag{5.20}$$
The most important mutants AD and E are obtained from equations (5.3) and (5.5), where the most important mutation terms, those involving B and D, have been reintroduced in the right-hand side:

$$AD = \frac{\beta\,m\,B}{d\,(\beta - \alpha)}, \qquad E = \frac{\delta_F\,m\,D}{d\,(\delta_F - \varepsilon_F)} \tag{5.21}$$
A and AE are finally obtained from the kinetic equations (5.1) and (5.2). The limit of stability of the selfish regime is reached when the effective fitness of A reaches that of B and the effective fitness of E reaches that of D. In the algebraic computation of the stability limit, the fitnesses $\alpha$ and $\varepsilon_F$ in the denominators have to be replaced by the effective fitnesses obtained by iterating the above computation to achieve reasonable accuracy. Note that the stability of the egoism attractor depends on the mutation rate, which is not the case for the other regimes. All the quantitative predictions of the slow manifold analysis concerning the equilibrium populations and the limits of stability of the different attractors have been checked against the numerical integration of the complete model. The agreement is excellent, and discrepancies are never greater than one percent. Apart from the transitions between different attractors according to the initial conditions, which can only be obtained by numerical integration, the slow manifold analysis is thus a very good predictor of the dynamics. Figure 5 summarizes the limits of stability of the different attractors obtained from numerical simulations and algebraic analysis of the slow manifold. We are presently concerned with the case $d^* = d$ (the black dots on the figure). Note the vertical asymptote of the lower mutualism limit when both maximum populations are equal, as obtained from equation (5.17). The monotonically decreasing upper curve is the upper limit of egoism. When this limit is crossed, a transition to mutualism is obtained on the left part of the diagram and to coexistence on the right. The bifurcation point between the two possible regimes after the transition lies just to the left of the intersection with the lower stability limit of mutualism. In other words, nearly as soon as mutualism is a possible attractor, it is reached when egoism becomes unstable.
The condition for this scenario is the possibility for the symbionts to saturate the available hosts. Otherwise, mutualism is impossible and transition to coexistence is observed. The diagram of Figure 3 with several reachable attractors is in fact only obtained for a small parameter region in the vicinity of the equality of maximum populations.
[Figure 5: plot of $k_{sd}$ against $P_1/P_2$ (horizontal axis from 0 to 2.5). Legend: mutualism, egoism, coexistence, mutualism ($d > d^*$), egoism ($d > d^*$).]

Fig. 5: Limits of stability in the plane ($P_1/P_2$, $k_{sd}$). The transition from egoism to coexistence or mutualism depends on the saturation ratio relating the maximum populations of host (A species), $P_1 = f_1\,\alpha\,d^*$, to symbiont (E species), $P_2 = f_2\,\varepsilon\,d$ (equations (5.18) and (7.5)). The vertical line at $P_1 = P_2$ is a limit to the existence of mutualism, which is only possible in the left part of the diagram. The limits of stability of the three regimes are drawn when the endosymbionts enjoy an increase either in fitness (black dots) or in lifetime ($d > d^*$) (open dots). The upper decreasing curves are the upper stability limits of egoism (diamonds), and the lower curves are the lower stability limits of mutualism (squares). The horizontal straight line is the lower stability limit of coexistence, for both cases. In the lower $k_{sd}$ regions egoism, and in the upper regions mutualism or coexistence, according to the possibility of saturation of the host by the symbionts, are the only possible attractors. In between, which attractor is reached depends on the initial conditions (hysteresis).
6 The multi-site model (MSM)
Binary associations are not the rule and in most cases the host offers a number of sites, p, to the invading species (Figure 6).
Fig. 6: An organism A with p sites occupied or not by E's or D's.
This is the case, for instance, for corals with algae inside polyps. The straightforward extension of the above model would be to write a large system of ordinary differential equations including all species such as $AD_rE_s$, where $r$ and $s$ vary from 0 to $p$ and indicate the number of D and E organisms inside A. In this large system, we can expect that after any reproduction, mutation or death event, the induced fluctuations in the rate of occupancy of the host by the endosymbionts D and E decay very fast while the populations reach their values on the slow manifold. Since we are not interested in these fluctuations, we can greatly simplify the model. A simple approach, which takes into account the different time scales for association and growth, is to suppose that association and dissociation processes are quasi-instantaneous. Apart from the free species, we only have to deal with the variables A, AD and AE. AD (resp. AE) is the population of organisms of type D (resp. E) inside A. A is the population of organisms A, each of which offers $p$ sites to invading organisms D and E. Among these $p$ sites, AE/A sites are occupied by E organisms and AD/A are occupied by D organisms. In other words, we neglect fluctuations of the occupancy rates of D and E inside the different A organisms. This approximation, equivalent to a mean-field approach, is based on the fast association kinetics. The differential system is then written:
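$$\frac{dA}{dt} = \frac{(\alpha_F\,(pA - AE) + \alpha\,AE)\,f_1}{p\,(A+B)} - (d + m\,n)\,A + m\,B \tag{6.1}$$

$$\frac{dAD}{dt} = \frac{\delta\,AD\,f_2}{C+D+E+AD+AE} - 2\,(d + m\,n)\,AD + m\,AE - k_{sd}\,AD + k_{ld}\,(pA - AD - AE)\,D \tag{6.2}$$

$$\frac{dAE}{dt} = \frac{\varepsilon\,AE\,f_2}{C+D+E+AD+AE} - 2\,(d + m\,n)\,AE + m\,AD - k_{se}\,AE + k_{le}\,(pA - AD - AE)\,E \tag{6.3}$$

$$\frac{dB}{dt} = \frac{\beta\,B\,f_1}{A+B} - (d + m\,n)\,B + m\,A \tag{6.4}$$

$$\frac{dC}{dt} = \frac{\gamma\,C\,f_2}{C+D+E+AD+AE} - (d + m\,n)\,C + m\,(D + AD) \tag{6.5}$$

$$\frac{dD}{dt} = \frac{\delta_F\,D\,f_2}{C+D+E+AD+AE} - (d + m\,n)\,D + (d\,AD) + m\,(C + E + n\,AD) + k_{sd}\,AD - k_{ld}\,(pA - AD - AE)\,D \tag{6.6}$$

$$\frac{dE}{dt} = \frac{\varepsilon_F\,E\,f_2}{C+D+E+AD+AE} - (d + m\,n)\,E + (d\,AE) + m\,(D + n\,AE) + k_{se}\,AE - k_{le}\,(pA - AD - AE)\,E \tag{6.7}$$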
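As an illustration of how the system (6.1) to (6.7) can be explored numerically, the following Python sketch codes its right-hand side and a single explicit Euler step. It is not the integrator used for the simulations reported here. All names are hypothetical, and the default parameter values are illustrative assumptions, except $d = 0.01$, $\delta_F = 8$ and $\varepsilon_F = 7$, which are the values quoted in section 7.2.

```python
from dataclasses import dataclass


@dataclass
class MSMParams:
    # Illustrative defaults (assumptions), except d, delta_f and eps_f.
    p: int = 10            # number of sites per host
    f1: float = 1.0        # food available on the host (A, B) branch
    f2: float = 1.0        # food available on the C, D, E branch
    d: float = 0.01        # death rate
    m: float = 1e-4        # mutation frequency
    n: float = 1.0         # factor n as it appears in the loss term (d + m*n)
    alpha_f: float = 5.0   # host fitness for sites not occupied by E
    alpha: float = 6.5     # host fitness for sites occupied by E
    beta: float = 6.0
    gamma: float = 9.0
    delta_f: float = 8.0
    delta: float = 10.4
    eps_f: float = 7.0
    eps: float = 9.1
    ksd: float = 1.0       # dissociation rate of D
    kse: float = 0.1       # dissociation rate of E
    kld: float = 1e-3      # association rate of D
    kle: float = 1e-3      # association rate of E


def msm_rhs(state, q):
    """Right-hand side of equations (6.1)-(6.7); state = (A, AD, AE, B, C, D, E)."""
    A, AD, AE, B, C, D, E = state
    loss = q.d + q.m * q.n
    free_sites = q.p * A - AD - AE
    tot2 = C + D + E + AD + AE          # must be positive
    dA = ((q.alpha_f * (q.p * A - AE) + q.alpha * AE) * q.f1 / (q.p * (A + B))
          - loss * A + q.m * B)
    dAD = (q.delta * AD * q.f2 / tot2 - 2 * loss * AD + q.m * AE
           - q.ksd * AD + q.kld * free_sites * D)
    dAE = (q.eps * AE * q.f2 / tot2 - 2 * loss * AE + q.m * AD
           - q.kse * AE + q.kle * free_sites * E)
    dB = q.beta * B * q.f1 / (A + B) - loss * B + q.m * A
    dC = q.gamma * C * q.f2 / tot2 - loss * C + q.m * (D + AD)
    dD = (q.delta_f * D * q.f2 / tot2 - loss * D + q.d * AD
          + q.m * (C + E + q.n * AD) + q.ksd * AD - q.kld * free_sites * D)
    dE = (q.eps_f * E * q.f2 / tot2 - loss * E + q.d * AE
          + q.m * (D + q.n * AE) + q.kse * AE - q.kle * free_sites * E)
    return (dA, dAD, dAE, dB, dC, dD, dE)


def euler_step(state, q, dt):
    """One explicit Euler step (a crude scheme chosen only for brevity)."""
    return tuple(s + dt * ds for s, ds in zip(state, msm_rhs(state, q)))
```

Because the association rates are much faster than growth and death, the system is stiff; a small time step or an adaptive integrator would be needed in practice. A quick sanity check is that, with $m = 0$, the state containing only $B = \beta f_1/d$ and $C = \gamma f_2/d$ is stationary.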
The growth term for A is a linear function of the fraction of sites occupied by E. Mutations in A release infesting organisms, resulting in population changes in AD, AE, D and E. They appear as proportional to these populations rather than to A because of the occupancy rates. For instance, when an A organism mutates with frequency $m$, it frees AE/A organisms of species E. The corresponding source term in the E differential equation is then:

$$m \cdot A \cdot \frac{AE}{A} = m\,AE \tag{6.8}$$
When simulated with p=10 or 100, this model (MSM) gives time evolutions of the populations which are very similar to those obtained with the single site model (SSM). Populations are the same in both models for the same set of parameters, provided that the following changes are made:

single site model      multi-site model
f1                     f1/p
B                      p.B
A                      p.A - AD - AE
Except for their short-time binding dynamics, the SSM and the MSM are essentially the same, once one realizes that free sites of the A species in the MSM play the same role as free A in the SSM. When the slow manifold analysis is carried out for the multi-site model with the above changes, most equations are identical. The last four equations (5.4) to (5.7) of the simplified dynamical system are the same for both models. For the MSM the first three equations are written:

$$k_{se}\,AE = k_{le}\,(pA - AD - AE)\,E \tag{6.9}$$

$$k_{sd}\,AD = k_{ld}\,(pA - AD - AE)\,D \tag{6.10}$$

$$\frac{(\alpha_F\,(pA - AE) + \alpha\,AE)\,f_1}{p\,(A+B)} = d\,A \tag{6.11}$$
The first two equations, once combined, give the same equation (5.11) in $x$ and $y$ as in the SSM. The third equation (6.11) is equivalent to equation (5.3) of the SSM once one notices that the free A population of the SSM corresponds to the number of free sites in A, $pA - AD - AE$, of the MSM; $f_1$ in the SSM then corresponds to $f_1/p$ in the MSM. Since the dynamical equations on the slow manifold are the same in both models, we expect the attractors and their domains of existence to be the same, which is verified by the computer simulations. The only difference concerns the evolution towards the attractors. At short time scales of order $(k_{sd})^{-1}$ some differences are indeed observed, consistent with the approximations made in the MSM on the binding dynamics. Otherwise, at larger time scales the dynamics are identical, and even the transition lines from any given initial conditions are very close. Their relative distance in the $(\varepsilon, k_{sd})$ plane is never more than 3% of the parameter values. In conclusion, all the dynamical results that were obtained for the simple but rather unrealistic single site model remain valid for the evolution of the multi-site model, which possesses some biological relevance.
7 Discussion
7.1 Death rates of hosts and endosymbionts
In the case of coelenterates and algae, the host lifetime is longer than the endosymbiont lifetime. The assumption of equal death rates for both is unrealistic, but we have checked with numerical simulations that the qualitative features of the model, such as the existence of the three dynamical regimes, are preserved when the death ratio is changed by up to a factor of 10. For the purpose of comparison, when changing the death ratio, we kept constant, by changing $f_1$, the ratio involving $f_1$, $d_h$ and $\alpha$ that controls the existence of coexistence (see Figure 5). (Here $d_h$ is the death rate of the host, which varies from $10^{-2}$ to $10^{-3}$, the death rate $d$ of the endosymbiont remaining unchanged.) This result could have been expected from the fact that the slow manifold equations remain unchanged.

7.2 Selective protection by the host
Mutualistic associations sometimes involve protection in exchange for food, for instance in the case of damselfish and anemone, or coelenterate and algae. This protection against predators might be described in our model by changing the death rate inside the host with respect to the death rate outside. By regrouping the first two terms of the differential equation (1.1):

$$\frac{dP_i}{dt} = \left(\frac{a_i\,f}{P_t} - d\right) P_i + \ldots$$
one sees that a change in the death rate is equivalent to the same relative change in the fitness coefficient. We have done a series of simulations and algebraic computations on the slow manifold to study the "extreme" case when the only benefit enjoyed by the endosymbionts D and E (or the damselfish in the case of fishes and protective hosts) is a reduction of their death rate $d^*$ with respect to their death rate $d$ when they are free:

$$d = 0.01, \qquad d^* = 0.007$$

By contrast with section 4, for the present simulations, fitnesses are kept the same inside and outside the host:

$$\delta_F = \delta = 8, \qquad \varepsilon_F = \varepsilon = 7$$
All other parameters are the same as for previous cases. The 30% decrease in death rate for endosymbionts is comparable to the 30% increase in fitnesses used in sections 4 and 6. We then expect comparable behaviors when the other parameters, such as the dissociation rate $k_{sd}$ and the maximum population ratio $f_1\,d^*\,\alpha / (f_2\,d\,\varepsilon)$, are the same. This is indeed observed on the phase diagram of Figure 5, where the open dots, corresponding to the model where protection is offered by the host, fall alongside the black dots, corresponding to the model where the fitness of the symbionts increases inside the host. The slow manifold analysis follows the procedures developed in section 5. Only equations (5.4) and (5.5) are changed and become:
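$$\frac{dD}{dt} + \frac{dAD}{dt} = \frac{\delta\,(D + AD)\,f_2}{C+D+E+AD+AE} - (d\,D + d^*\,AD) = 0 \tag{7.1}$$

$$\frac{dE}{dt} + \frac{dAE}{dt} = \frac{\varepsilon\,(E + AE)\,f_2}{C+D+E+AD+AE} - (d\,E + d^*\,AE) = 0 \tag{7.2}$$

Equation (7.1) can be rewritten:

$$\frac{dD}{dt} + \frac{dAD}{dt} = \left(\frac{\delta\,f_2}{C+D+E+AD+AE} - \frac{d\,D + d^*\,AD}{D + AD}\right)(D + AD) = 0 \tag{7.3}$$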
which shows that the per capita rate of increase is now averaged over the two populations, free and associated, via the death term. The previous analysis (section 4) has shown that instabilities of the different regimes were obtained when average fitnesses became equal. It is now generalized by comparing the averaged per capita rates of increase, which shows that the transitions are obtained by equating average death terms divided by the corresponding fitness. The transitions are then obtained when:

$$\frac{1}{\delta}\,\frac{d\,D + d^*\,AD}{D + AD} = \frac{1}{\varepsilon}\,\frac{d\,E + d^*\,AE}{E + AE} \tag{7.4}$$
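Condition (7.4) can be illustrated with the parameter values of this section ($d = 0.01$, $d^* = 0.007$, $\delta = 8$, $\varepsilon = 7$). The Python sketch below is only an illustration; the function names are hypothetical. It evaluates each side of (7.4) and, for a given value of one side, solves for the bound fraction at which the other side matches it.

```python
def scaled_death_term(fitness, free, bound, d, d_star):
    """One side of (7.4): averaged death term divided by the fitness."""
    return (d * free + d_star * bound) / (fitness * (free + bound))


def critical_bound_fraction(fitness, target, d, d_star):
    """Bound fraction x at which (d (1 - x) + d_star x) / fitness equals target.

    Assumes d != d_star; the result is meaningful only if it lies in [0, 1].
    """
    return (d - fitness * target) / (d - d_star)
```

With all D free, the left-hand side of (7.4) is `scaled_death_term(8.0, 1.0, 0.0, 0.01, 0.007)`, i.e. 0.01/8. The E side matches this value when a fraction 5/12 of E is bound, which is what `critical_bound_fraction(7.0, 0.01 / 8.0, 0.01, 0.007)` computes. Under the reading of (7.3), the species with the smaller scaled death term has the larger per capita rate of increase relative to its fitness.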
The same arguments concerning equalities of the per capita rates of increase allow one to predict the transitions in the most general case, when the symbiosis involves simultaneous benefits in fitness and protection.
7.3 Selective digestion by the host

We have tentatively checked whether a selective digestion of the endosymbionts could result in mutualism. We were led to this hypothesis by the fact that parasitic algae are sometimes digested by Hydra normally involved in association with Chlorella. $k_{sd}$ and $k_{se}$ in equations (4.2) and (4.3) now correspond to digestion coefficients. The source terms ($k_{sd}\,AD$) and ($k_{se}\,AE$) are then removed from equations (4.6) and (4.7), since the symbionts die from the digestion process. The positive terms in $k_{sd}$ are kept in equation (4.1), since the digestion of the symbionts frees A. No mutualism is observed when the digestion is faster than or of the same order of magnitude as the death process, which corresponds to the most reasonable hypothesis. The slow manifold analysis allows us to understand the dynamical behavior of the system. Equations (5.4) and (5.5) now become:
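$$\frac{dD}{dt} + \frac{dAD}{dt} = \frac{(\delta_F\,D + \delta\,AD)\,f_2}{C+D+E+AD+AE} - (d + m\,n)\,(D + AD) - k_{sd}\,AD = 0 \tag{7.5}$$

$$\frac{dE}{dt} + \frac{dAE}{dt} = \frac{(\varepsilon_F\,E + \varepsilon\,AE)\,f_2}{C+D+E+AD+AE} - (d + m\,n)\,(E + AE) - k_{se}\,AE = 0 \tag{7.6}$$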
The digestion terms add up to the death term and mutation term (d+m.n), which constitutes a strong handicap for species D and E with respect to organisms C. C remains predominant on its branch, except when ksd is notably smaller than d. Egoism is then the only observed regime, when ksd is larger than d.
8 Conclusions
Let us summarize the results and discuss the biological significance of the model. Mutualism does not contradict the Darwinian theory of selection of the fittest, provided that one compares species according to their effective fitness, which takes into account the benefits enjoyed by the symbionts while they are associated. Mutualism can be established when a recognition mechanism allows the host to discriminate between parasites and bona fide symbionts. One is then led to look for possible recognition mechanisms in biological associations. In the case of the rumen/enterobacteria association {Begon, Harper and Townsend, 1986}, the evident candidate for the recognition function is the immune system. For the rhizobium of legumes, which associates the legume cells and nitrogen-fixing bacteria, lectins able to recognize the polysaccharides on the cell walls of the bacteria are also rather convincing molecular mechanisms {Lis and Sharon, 1986}. Other molecular recognition mechanisms have been documented in very simple organisms (sponges, tunicates, coelenterates) which could be involved in symbiotic associations {Douglas, 1988; Taylor, 1973}. Negative electric charges on the surface of symbiotic chlorellae cell walls allow hydrae to recognize them {McNeil, Hohman and Muscatine, 1981}. Another mechanism, which seems to apply to the Hydra/Chlorella system, is the detection by the Hydra of maltose released by the Chlorella (maltose is the benefit enjoyed by the Hydra from the Chlorella) {Hohman, McNeil and Muscatine, 1982; McAuley and Smith, 1982; Muscatine and McNeil, 1989}.
The differential model that we have built allows us to predict the possible emergence of mutualism according to the individual characteristics of the organisms. It applies to most endosymbiotic systems, where association times are shorter than the organisms' lifetimes. It takes into account exchanged benefits such as food and protection. Most of the discussion involved the single site model, but we have shown in section 6 that this simpler model is equivalent to the multiple site model in the time range of interest to us.
The possibility of a coexistence regime with both commensals and true symbionts present with comparable populations was an unexpected result of the model. We are tempted to consider that the possible saturation of the host by the symbionts is the normal case, which excludes the possibility of observing coexistence. On the other hand, coexistence could be a possibility in those many systems where we cannot figure out the benefits of the association for each individual organism involved, e.g. lichen, where the benefit for the algae is not obvious {Begon, Harper and Townsend, 1986}, or intestinal flora in insects or mammals. Since the coexistence situation is less favorable to the host than complete mutualism, we might imagine that further mutations would select host organisms with larger rejection rates of commensals. The real existence of the coexistence regime in biological systems is still an open question worth investigating.
Although most results can be obtained through direct numerical simulation, the slow manifold analysis brings some insight into the important concepts: effective fitnesses and effective per capita increase, the saturation limit for mutualism, the equivalence between single site and multiple site models, and why selective digestion is insufficient to establish mutualism. An extended version of this manuscript has been published in {Weisbuch and Duchateau, 1993}.
Acknowledgments

Computer simulations used the GRIND software {De Boer, 1983}. The Laboratoire de Physique Statistique is associated with the CNRS (URA 1306) and Paris VI and Paris VII Universities. This work was started at the Santa Fe Institute, which we thank for its hospitality. We thank NATO (CRG 900998) and the Fondation Curie external grants program for partial support. We thank Richard Belew, Bernard Derrida, Nancy Koppel, Alan Perelson, Jonathan Roughgarden and Eörs Szathmáry for helpful discussions.
References

Begon, M., J. L. Harper and C. R. Townsend. 1986. Ecology. Blackwell Scientific Publication.
Boucher, D. H., S. James and K. H. Keeler. 1982. Ecology of Mutualism. Annual Review of Ecology and Systematics 13, 315-347.
De Angelis, D. L., W. M. Post and C. C. Travis. 1986. Mutualistic and Competitive Systems. In Positive Feedback in Natural Systems. S. A. Levin (Ed.) p 290. Berlin: Springer-Verlag.
De Boer, R. J. 1983. GRIND: Great Integrator for Differential Equations. Bioinformatics Group, University of Utrecht, The Netherlands.
Douglas, A. E. 1988. Nutritional interactions as signals in the green hydra symbiosis. In Cell to cell signals in plant, animal and microbial symbiosis. S. Scannerini, D. Smith, P. Bonfante-Fasolo and V. Gianinazzi-Pearson (Eds.) p 283-296. Berlin: Springer-Verlag.
Hohman, T. C., P. L. McNeil and L. Muscatine. 1982. Phagosome-lysosome fusion inhibited by algal symbionts of Hydra viridis. Journal of Cell Biology 94, 56-63.
Ikegami, T. and K. Kaneko. 1990. Computer Symbiosis - Emergence of Symbiotic Behavior Through Evolution. Physica D 42, 235-243.
Lindgren, K. 1992. Evolutionary Phenomena in Simple Dynamics. In Artificial Life II. C. G. Langton, C. Taylor, J. D. Farmer and S. Rasmussen (Eds.) p 295-312. Addison-Wesley.
Lis, H. and N. Sharon. 1986. Lectins as molecules and as tools. Annual Review of Biochemistry 55, 35-67.
McAuley, P. J. and D. C. Smith. 1982. The green hydra symbiosis. VII Conservation of the host cell habitat by the symbiotic algae. Proceedings of the Royal Society of London B 216, 415-426.
McNeil, P. L., T. C. Hohman and L. Muscatine. 1981. Mechanisms of nutritive endocytosis. II The effect of charged agents on phagocytic recognition by digestive cells. Journal of Cell Science 52, 243-269.
Morse, P. and H. Feshbach. 1953. Methods of Theoretical Physics. McGraw-Hill.
Muscatine, L. and P. L. McNeil. 1989. Endosymbiosis in Hydra and the evolution of internal defense systems. American Zoologist 29(2), 371-386.
Nowak, M. A. and R. M. May. 1992. Evolutionary games and spatial chaos. Nature 359, 826-829.
Roughgarden, J. 1975. Evolution of Marine Symbiosis - A Simple Cost-Benefit Model. Ecology 56, 1201-1208.
Szathmáry, E. and L. Demeter. 1987. Group selection of early replicators and the origin of life. Journal of Theoretical Biology 128, 463-486.
Taylor, C. E., L. Muscatine and D. R. Jefferson. 1989. Maintenance and Breakdown of the Hydra-Chlorella Symbiosis: a Computer Model. Proceedings of the Royal Society of London B 238, 277-289.
Taylor, D. 1973. The cellular interactions of algal-invertebrate symbiosis. Advances in Marine Biology 11, 1-56.
Weisbuch, G. 1984. Un modèle de l'évolution des espèces à trois niveaux, basé sur les propriétés globales des réseaux booléens. Comptes rendus de l'Académie des Sciences de Paris 298(III (14)), 375-378.
Weisbuch, G. and G. Duchateau. 1993. Emergence of mutualism: Application of a differential model to endosymbiosis. Bulletin of Mathematical Biology 55(6), 1063-1090.
Wilson, D. S. 1983. The Effect of Population Structure on the Evolution of Mutualism: a Field test involving burying beetles and mites. American Naturalist 121, 851-870.
Three Illustrations of Artificial Life's Working Hypothesis

Mark A. Bedau

Reed College, 3203 SE Woodstock Blvd., Portland OR 97202, USA
Email: [email protected]

Abstract. Artificial life uses computer models to study the essential nature of the characteristic processes of complex adaptive systems--processes such as self-organization, adaptation, and evolution. Work in the field is guided by the working hypothesis that simple computer models can capture the essential nature of these processes. This hypothesis is illustrated by recent results with a simple population of computational agents whose sensorimotor functionality undergoes open-ended adaptive evolution. These results might illuminate three aspects of complex adaptive systems in general: punctuated equilibrium dynamics of diversity, a transition separating genetic order and disorder, and a law of adaptive evolutionary activity.
1 Artificial Life's Working Hypothesis
Artificial life studies computer models of the processes characteristic of complex adaptive systems--processes like self-organization, self-reproduction, adaptation, and evolution. Complex adaptive systems take many forms, each of which differs from the others in myriad ways. By abstracting away from the diverse details, artificial life hopes to reveal fundamental principles governing broad classes of complex adaptive systems. This hope rests on artificial life's working hypothesis that simple computer models can capture the essential nature of complex adaptive systems [1]. I propose to pursue artificial life's working hypothesis by applying a "thermodynamic" methodology [5, 6, 3, 4, 2, 7]. Recently it has been suggested that there is a close, intrinsic connection between the content of evolution and thermodynamics (e.g., Brooks and Wiley [8]). By contrast, I envisage the two fields as sharing the methodology of developing and investigating statistical macrovariables. Thermodynamics investigates macrovariables like temperature, pressure, and specific heat, and the fruits of this method include simple, basic laws and classifications (like the ideal gas law and the phase transition separating the solids and liquids). By analogy, the "thermodynamic" approach in artificial life seeks to identify statistical macrovariables that capture the distinctive features of complex adaptive systems. The most straightforward sign that this methodology is bearing fruit would be the demonstration that appropriate macrovariables can be used to frame simple, basic laws and classifications that apply to broad classes of complex adaptive systems.
This methodology involves formulating statistical macrovariables that are general enough to apply across a wide variety of systems, and then using these variables to search for underlying quantitative order unifying different systems. It is natural to begin this endeavor with simple models, for macrovariables are easiest to formulate initially in simple models and simple models are easiest to study. Furthermore, simple models can reveal the essential nature of complex adaptive systems in general--at least, that is artificial life's working hypothesis. This working hypothesis might be false, of course. It is at odds with the conclusions often drawn from the historicity, contingency, and variety of evolving biological systems (e.g., [25, 16]). One should bear in mind, though, that processes rife with historicity, complexity, and variety may well still fall under simple, basic laws and classifications, especially if these laws and classifications emerge through the application of statistical macrovariables. The "thermodynamic" methodology applied to simple computer models is a promising way to identify such laws and classifications, if they exist.
2 A Simple Model of Evolution
The model studied here is designed to be simple yet able to capture the essential features of an evolutionary process [27, 5, 6, 3, 4, 2, 7]. This model is motivated by the view that evolving life is typified by a population of agents whose continued existence depends on their sensorimotor functionality, i.e., their success at using local information to find and process the resources needed to survive and flourish. Thus, information processing and resource processing are the two internal processes that dominate agents' lives, and their primary goal--whether they know this or not--is to enhance their sensorimotor functionality by suitably coordinating these two internal processes. Since the requirements of sensorimotor functionality typically alter as the contingencies of evolution change, continued viability and vitality call for sensorimotor functionality to adapt in an open-ended, autonomous fashion. The present model attempts to create agents with sensorimotor functionality that can undergo this open-ended, autonomous evolutionary adaptation. The model consists of agents residing in a two-dimensional world, sensing their local environment, moving, and ingesting resources. All that exists in the world besides the agents are heaps of resources that are concentrated at particular locations, with levels decreasing with distance from a central location. The resource is refreshed periodically in time and randomly in space. Agents interact with the resource field at each time step by extracting any resource found at their current site and storing it in their internal resource reservoir. Agents must continually replenish their internal resource supply to survive. Agents pay a resource tax just for living and a movement tax proportional to the distance traveled. If an agent's internal resource supply drops to zero, it dies and disappears from the world. On the other hand, an agent can remain alive indefinitely if it can continue to find sufficient resources.
An agent's movement is governed by its genetically hardwired sensorimotor strategy. A sensorimotor strategy is simply a map taking sensory data from a local neighborhood (the five-site von Neumann neighborhood) to a vector indicating a magnitude and direction for movement:

$$S: (s_1, \ldots, s_5) \rightarrow v = (r, \theta). \tag{1}$$
An agent's sensory data has two bits of resolution for each site, allowing the agents to recognize four resource levels (minimal resources, somewhat more resources, much more resources, maximal resources). Its behavioral repertoire is also finite, with four bits of resolution for magnitude $r$ (zero, one, ..., fifteen steps), and three bits for direction $\theta$ (north, northeast, east, ...). A unit step in the NE, SE, SW, or NW direction is defined as movement to the next diagonal site, so its magnitude is $\sqrt{2}$ times greater than a unit step in the N, E, S, or W direction. Each movement vector $v$ thus produces a displacement $(x, y)$ in a square space of possible spatial destinations from an agent's current location. The graph of the strategy map $S$ may be thought of as a look-up table with $2^{10}$ entries, each entry taking one of $2^7$ possible values. This look-up table represents an agent's overall sensorimotor strategy. The entries are input-output pairs that link each sensory state (input) that an agent could possibly encounter with a specific behavior (output). The different entries in the look-up table represent genetic loci, and the movement vectors assigned to them represent alleles. Since agents have 1024 loci, each containing one out of a possible 128 alleles, the total number of different genotypes is $128^{1024}$. Although finite, this space of genotypes allows for evolution in a huge space of genetic possibilities, which simulates the much larger number of possibilities in the biological world. In order to investigate how adaptation affects the evolutionary dynamics of this model, I introduce a behavioral noise parameter, $B_0$, defined as the probability that an agent's behavior is chosen at random from the $2^7$ possible behaviors, rather than determined by the agent's genetically encoded sensorimotor strategy. Thus, behavioral noise severs the link between genotype and phenotype.
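The look-up table and the behavioral noise parameter can be sketched in a few lines of Python. This is an illustrative sketch, not the model's actual implementation: the bit layout (magnitude in the high four bits of an allele, direction in the low three), the ordering of the five sensed sites, and all function names are assumptions.

```python
import random

# Unit displacements for the eight compass directions, in the assumed
# order N, NE, E, SE, S, SW, W, NW.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1),
              (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def sensory_index(levels):
    """Pack five 2-bit resource levels (each 0..3) into a locus index 0..1023."""
    index = 0
    for level in levels:
        index = (index << 2) | level
    return index


def decode_allele(allele):
    """Turn a 7-bit allele into an (x, y) displacement.

    Assumed layout: high 4 bits = magnitude r, low 3 bits = direction.
    """
    r = allele >> 3
    dx, dy = DIRECTIONS[allele & 0b111]
    return (r * dx, r * dy)


def choose_move(genotype, levels, b0, rng):
    """With probability b0 pick a random behavior, else consult the genotype."""
    if rng.random() < b0:
        allele = rng.randrange(128)
    else:
        allele = genotype[sensory_index(levels)]
    return decode_allele(allele)
```

A genotype is then simply a list of 1024 integers in the range 0 to 127, and `choose_move(genotype, levels, 0.0, random.Random())` always returns the genetically encoded displacement, while `b0 = 1.0` ignores the genotype entirely.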
If $B_0 = 1$, then agents survive and reproduce differentially, and children inherit their parents' strategy elements (except for mutations), but the inherited strategies reflect only random genetic drift rather than the process of adaptation. Sensorimotor strategies evolve over generations. An agent reproduces (asexually) when its internal resource supply crosses a threshold. The parent produces one child, which is given half of its parent's supply of resources. Parental allele values are inherited except when a point mutation at a locus gives a child a randomly chosen allele value. The mutation rate $\mu$ determines the probability with which individual loci mutate during reproduction. At the limit of $\mu = 1$, every allele value will mutate and thus each allele of the child is chosen completely randomly. It is important to note that selection and adaptation in the model are "intrinsic" or "indirect" in the sense that survival and reproduction are determined solely by the contingencies involved in each agent's finding and expending resources. No externally-specified fitness function governs the evolutionary dynamics [27, 5]. Good strategies for flourishing in this model would allow agents
to acquire and manage resources efficiently. However, it is an open question which specific strategies would efficiently acquire and manage resources, and there might be no universally optimal strategy. A strategy's worth is relative to the environment; a strategy might be optimal in one environment and suboptimal in another. The environment of the present model consists of the fluctuating resource field and the competing strategies possessed by the agents in the population. Both of these environmental components change during the course of evolution. The strategies directly evolve, and the resource field indirectly changes because different populations of strategies affect it differently. For this reason, the model has the potential to show an open-ended evolutionary dynamic consisting of the perpetual creation of adaptive novelty. This potential for an unpredictably shifting adaptive landscape is one reason the model resists treatment by the analytical methods used in traditional mathematical population genetics [9, 14, 15]. Not only are there thousands of loci and hundreds of alleles per locus, but the vicissitudes of natural selection indirectly cause unpredictable fluctuations in the finite population's size, age structure, and genotype distribution. In general, the only way to discern any underlying order in the model's behavior is through extensive computer simulation focussed on appropriate statistical macrovariables. These complications notwithstanding, the model is an unabashedly abstract and idealized representation of a population of evolving agents, lacking many of the features often emphasized in the biological literature. For example, the environment lacks the spatial structure required for migration effects, there are no explicit interactions (such as predation) among organisms, there is no intron/exon distinction in the chromosome, and there is no "continuity" of mutation (mutated allele values are not "near" previous values).
Nevertheless, my working hypothesis is that this model captures the fundamental features of complex adaptive systems, and is thus a useful model for investigating the essential aspects of more realistic systems.
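The reproduction and mutation rules described in this section can be illustrated with a short sketch. It is not the code used for the simulations reported here: the function name `maybe_reproduce`, the argument names, and the representation of a genotype as a Python list are hypothetical, and the sketch assumes that the parent keeps the half of its resources that is not handed to the child.

```python
import random

def maybe_reproduce(alleles, resources, threshold, mu, allele_pool, rng=random):
    """Return (parent_resources, child); child is None or (alleles, resources).

    Hypothetical sketch: an agent reproduces asexually once its resource
    supply exceeds `threshold`, and the child receives half of that supply.
    """
    if resources <= threshold:
        return resources, None
    # Each locus mutates independently with probability mu. A mutated locus
    # gets an allele drawn uniformly from the pool, so the new value is not
    # "near" the old one.
    child_alleles = [
        rng.choice(allele_pool) if rng.random() < mu else allele
        for allele in alleles
    ]
    child_resources = resources / 2.0
    # Assumption: the parent retains the other half.
    return resources - child_resources, (child_alleles, child_resources)

if __name__ == "__main__":
    pool = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    parent = [random.choice(pool) for _ in range(10)]
    print(maybe_reproduce(parent, 12.0, 10.0, 0.1, pool))
```

With `mu = 0` the child is an exact copy of its parent, and with `mu = 1` every allele of the child is drawn afresh, which corresponds to the two limits mentioned in the text.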
3
Measurement of Population Diversity
Population diversity is one plausible statistical macrovariable for artificial life to investigate. But how might population diversity be measured? My proposal, very roughly, is to represent the population as a cloud of points in an abstract genetic space, and then define the population's diversity as the spread of that cloud. In the present model, an allele is a movement vector, a spatial displacement, and an agent's genotype is a set of spatial displacements. To capture the total population diversity, D, then, collect all the displacements of all agents in all environments into a cloud, and measure the spread or variance of that cloud. We can divide this total diversity D into two components. First, collect the spatial displacements of each agent in the population in a given environment, i.e., the traits encoded across the population at a given locus, and calculate the spread of this locus's cloud. The average spread or variance of all such locus distributions is a population's within-locus diversity, W. Now, form another,
second-order collection of the centroid of each locus's cloud, i.e., a cloud of the "mean" displacement at each locus. The spread or variance of this second-order cloud is the population's between-locus diversity, B; it measures the diversity of the different mean population responses.
More formally, I define total diversity as the mean squared deviation between the average movement of the whole population, averaged over all agents and over all environmental conditions, and the individual movements of particular agents subject to particular conditions, i.e.,
$$ D = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} \left[ (x_{ij} - \bar{x}_{IJ})^2 + (y_{ij} - \bar{y}_{IJ})^2 \right] \tag{2} $$
where I is the number of agents i, J is the number of environmental conditions (or, in the present model, loci) j, $(x_{ij}, y_{ij})$ is the movement vector of agent i subject to input j, and $\bar{x}_{IJ} = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} x_{ij}$ (similarly for $\bar{y}_{IJ}$). So, $(\bar{x}_{IJ}, \bar{y}_{IJ})$ is the (x, y) displacement of the population averaged over all agents i and loci (environments) j. Then, the within- and between-locus components of the total diversity are defined as follows:
$$ W = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} \left[ (x_{ij} - \bar{x}_{j}^{I})^2 + (y_{ij} - \bar{y}_{j}^{I})^2 \right] \tag{3} $$
$$ B = \frac{1}{J} \sum_{j=1}^{J} \left[ (\bar{x}_{j}^{I} - \bar{x}_{IJ})^2 + (\bar{y}_{j}^{I} - \bar{y}_{IJ})^2 \right] \tag{4} $$
where $\bar{x}_{j}^{I} = \frac{1}{I} \sum_{i=1}^{I} x_{ij}$ (and similarly for $\bar{y}_{j}^{I}$). So, $(\bar{x}_{j}^{I}, \bar{y}_{j}^{I})$ is the (x, y) displacement of the population in locus (environment) j averaged over all agents i. (Further formal analysis of diversity and its components is developed elsewhere [6, 3, 4].)
From the analysis of variance [20], we know that the total diversity is the sum of the within- and between-locus components, D = W + B. The relative size of D, W, and B reflects a population's genetic structure, as two extreme kinds of populations can illustrate. First, consider a population consisting of "random agents," in the sense that each agent's alleles are chosen randomly from the set of possible alleles, different agents' alleles being chosen independently. In this case, the distribution across the population at any given locus will be a huge cloud covering the whole set of possible spatial displacements, so the population's within-locus diversity W will be quite large. Since the centroid of each of these huge clouds will be virtually the same point--the center of the space of possible behavioral displacements--the distribution of these centers of gravity will be quite tight, and so the between-locus diversity will be nearly zero, B ≈ 0. The population's total diversity will approximately equal the within-locus diversity, D ≈ W.
A second extreme case is a population consisting of "quasi-clonal" (nearly genetically identical) agents that act differently in different environments. In this case, the within-locus diversity is nearly zero, W ≈ 0, since the average spread of the cloud of behavioral displacements at each environment-locus is minimal. On the other hand, since the average behaviors in different environments are quite different, the between-locus diversity is large and equal to the total diversity, D ≈ B. In this way, the relations among D, W, and B clearly distinguish the quasi-clonal and random agent populations.
3.1
Punctuated Equilibria
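As a concrete reference for the diversity measurements reported in this section, the definitions of D, W, and B in Eqs. 2-4 can be sketched in code. This is an illustrative sketch only, not the measurement code behind the figures: the function names, the nested-list layout `moves[i][j]`, and the encoding of an allele as one of eight compass steps (dx, dy) scaled by a distance of 0, 1, or 2 are assumptions made here. Under that assumed encoding, a randomly assigned population has an expected total diversity of 2.5.

```python
import random

# Assumed allele encoding: one of eight compass steps, scaled by 0, 1, or 2.
COMPASS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def random_allele(rng):
    (dx, dy), dist = rng.choice(COMPASS), rng.choice((0, 1, 2))
    return (dx * dist, dy * dist)

def diversity(moves):
    """Return (D, W, B) for moves[i][j] = (x, y) of agent i at locus j."""
    n_agents, n_loci = len(moves), len(moves[0])
    # Centroid of each locus's cloud (mean over agents), as in Eqs. 3 and 4.
    cx = [sum(agent[j][0] for agent in moves) / n_agents for j in range(n_loci)]
    cy = [sum(agent[j][1] for agent in moves) / n_agents for j in range(n_loci)]
    # Grand mean over all agents and all loci, as in Eq. 2.
    gx, gy = sum(cx) / n_loci, sum(cy) / n_loci
    total = within = 0.0
    for agent in moves:
        for j, (x, y) in enumerate(agent):
            total += (x - gx) ** 2 + (y - gy) ** 2
            within += (x - cx[j]) ** 2 + (y - cy[j]) ** 2
    D = total / (n_agents * n_loci)
    W = within / (n_agents * n_loci)
    B = sum((cx[j] - gx) ** 2 + (cy[j] - gy) ** 2 for j in range(n_loci)) / n_loci
    return D, W, B

rng = random.Random(0)
n_agents, n_loci = 500, 20
# "Random agents": every allele of every agent is drawn independently.
random_pop = [[random_allele(rng) for _ in range(n_loci)] for _ in range(n_agents)]
# Fully clonal limit of a "quasi-clonal" population: one shared genotype.
genotype = [random_allele(rng) for _ in range(n_loci)]
clonal_pop = [list(genotype) for _ in range(n_agents)]

print("random:", diversity(random_pop))  # D close to W, B near zero
print("clonal:", diversity(clonal_pop))  # W is zero, D equals B
```

The two test populations correspond to the two extreme cases discussed in Sect. 3, and the returned values satisfy D = W + B up to floating-point rounding.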
One of the most controversial topics in recent evolutionary biology has been the existence, cause, and implications of punctuated equilibria [13, 17, 10, 23, 26]. Artificial life systems might shed some new light on this controversy, since they often display punctuated equilibria in quantities like species concentration and average fitness (e.g., [19, 21, 28]). Yet the causes of these punctuated dynamics remain uncertain. Ecological complications such as host-parasite interactions or genetic complications such as extensive epistasis are typically thought to be implicated, and it is almost universally assumed that adaptation plays an essential role. My observations question whether any of these factors are essential.
I measured diversity in a series of simulations in which mutation rate and the presence or absence of adaptation were varied, while all other parameters of the model, including the size of the world and the resource environment, were held constant. Alleles were assigned to the founder population randomly, with displacement direction chosen from the eight compass directions and distance in steps chosen from zero, one and two. Thus, in the founder population, the total diversity was relatively low, D = 2.5, and virtually all of the total diversity was in the within-locus component, D ≈ W and B ≈ 0.
Diversity dynamics in the present model routinely display clear punctuated equilibria when the mutation rate is suitably low. Figure 1 shows the typical dynamics of diversity for simulations in which μ = 10-~. Diversity remains largely static for significant periods of time, but every now and then diversity is punctuated by very rapid changes. The resulting picture is characterized by relatively flat plateaus separated by abrupt cliffs. (Figure 1 shows the within- and between-locus diversity components, W and B. The interesting diversity punctuations occur with respect to B.
B approximates D since W is very low in these simulations and D = W + B, so the punctuations also occur with respect
to D.) It is notable that these punctuated equilibria occur in such a simple model. None of the ecological or genetical complications usually thought to be implicated are explicitly present in the model. For example, the model allows no explicit ecological interactions like those between host and parasite and the genetic structure has no epistasis. It is true that the model could support the emergence of implicit sub-populations that follow competing or cooperating resource-finding strategies. If such sub-populations were to exist, they would produce a substantial within-locus diversity W, for the average trait at given loci would differ between the sub-populations. The slightly positive values of within-locus diversity W in the simulation with adaptation (Fig. 1, top) are too low to be consistent with significantly different sub-populations. The simulations without adaptation (Fig. 1, bottom) show W values virtually equal to zero, which means that the population is virtually clonal and so has no sub-populations. Thus, although interactions between sub-populations might sometimes contribute to punctuations in some of the simulations, in general sub-populations play no fundamental role in the punctuated equilibria we observe in this model.
[Figure 1: two stacked time-series panels of the diversity components W and B; horizontal axis: Time, 0 to 1,000,000.]
Fig. 1. Punctuated equilibria in diversity dynamics from the first 1,000,000 time steps of two typical low-mutation simulations ($\mu = 10^{-2}$). Adaptation above (B0 = 0), no adaptation below (B0 = 1). Time series for the two diversity components, W and B, are shown. The founder populations in these simulations have fairly low diversity, so punctuations initially tend to increase diversity, as shown here. On longer time scales, punctuations are equally likely to decrease and increase diversity.
The most striking aspect of these punctuations is their presence even when adaptation is absent. Although punctuated equilibria in the absence of adaptation occur only when the mutation rate μ is suitably low, the effect is quite robust. Therefore, the presumption that punctuated equilibria must reflect the operation of adaptation is simply wrong. If punctuated equilibria are observed in the presence of adaptation, without additional evidence one cannot assume that adaptation plays any important role in their genesis. Evidently, there is an intrinsic tendency for evolving systems absent adaptation--that is, stochastically branching, trait-transmitting processes--to produce punctuated diversity dynamics, provided the branching rate is suitably poised.
3.2
Transition Separating Genetic Order and Disorder
Punctuated diversity dynamics fit into a broader pattern suggesting that evolving systems can be classified into two qualitatively different categories. I measured total diversity D and its within-locus W and between-locus B components in a series of pairs of adaptation/no-adaptation simulations, smoothly varying the mutation rate μ (on a log scale). The resulting diversity data reveal a transition separating two qualitatively different kinds of genetic systems.
One indication of this transition comes from the qualitative nature of the observed diversity dynamics. As noted in the previous section, when μ is low, diversity dynamics typically consist of punctuated equilibria, the frequency of which is proportional to the mutation rate. On the other hand, when μ is high, the diversity dynamics exhibit noisy fluctuations around a stable equilibrium value. The amplitude of these fluctuations is inversely proportional to the mutation rate.
The relationship between the total diversity and its two components clearly indicates the two different kinds of genetic systems and the transition between them. When the mutation rate is low, the total diversity is well approximated by the between-locus diversity, D ≈ B. This shows that low mutation systems consist of the sort of "quasi-clonal" population mentioned in Sect. 3. On the other hand, when the mutation rate is high, the total diversity is well approximated by the within-locus diversity, D ≈ W. Thus, high mutation systems consist of the sort of "random agent" population also mentioned in Sect. 3.
The way in which the transition between these quasi-clonal and random populations depends on mutation rate can be made vivid by plotting the component diversity, i.e., the extent to which the total diversity D is dominated by neither W nor B but has a large contribution from each. The component diversity can be defined as the proportion of the area of a square of side D that is covered by a rectangle with sides 2W and 2B:
$$ C = \frac{4WB}{D^2} \tag{5} $$
(The factor of 4 scales C so that 0 ≤ C ≤ 1.) I noted above that W will be near zero in a quasi-clonal population, and B will be near zero in a random population. Thus, the component diversity C will be near zero in both of these two kinds of populations. The component diversity C can approach one only if neither diversity component dominates the total diversity, which would entail that the population is neither quasi-clonal nor random.
Figure 2 shows the time average of the component diversity C as a function of the mutation rate, for systems both with and without adaptation. A transition between two qualitatively different genetic systems is clearly indicated. Notice that C is close to zero if the mutation rate is either high or low, and C approaches its maximal value of one at intermediate mutation rates, roughly $10^{-3} < \mu \le 10^{-2}$. It is striking that this transition exists whether or not the agents' genetic strategies are adapting during the course of evolution. Even if all genes are merely drifting because of the operation of behavioral noise, we still see the two qualitatively different genetic systems and the transition between them. (In fact, the transition seems to be sharper without adaptation. Further details about the diversity dynamics and the effects of mutation and adaptation are described elsewhere [6, 3, 4].)
[Figure 2: time-averaged component diversity against Mutation Rate on a log axis; legend: Adaptation, Fit to Adaptation, No Adaptation, Fit to No Adaptation.]
Fig. 2. Transition in diversity dynamics, reflected by the time average of the component diversity, C, as a function of mutation rate (shown on a log scale to improve resolution). The transition separates two regions of qualitatively different behavior. Systems with low mutation rate μ are genetically "ordered"--the genetic structure of each agent in general is highly correlated with those of the other agents. High μ systems are genetically "disordered"--the genetic structure of each agent in the population is uncorrelated with those of the other agents. (The leftmost data points represent not $\mu = 10^{-8}$ but μ = 0.)
Figure 2 paints a picture of an abstract space of evolving systems with two qualitatively distinct regions dividing the mutation spectrum. Low mutation systems are genetically "ordered," consisting of a population of genetically identical (or nearly identical) agents--a quasi-clonal population. Different loci encode different traits, and from time to time this more or less static distribution of traits across loci abruptly shifts, causing punctuations in the prevailing genetic stasis. By contrast, high mutation systems are genetically "disordered," consisting of a population of genetically dissimilar agents, each of which has a random collection of alleles--a random population. Over time, the gene pool is a continually fluctuating random distribution. These ordered and disordered regions are separated by a transitional region. (Whether this transitional region itself contains further structure is a topic of ongoing work.)
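The component diversity of Eq. 5 is simple enough to state as a small function. The sketch below is illustrative only: the name `component_diversity` is hypothetical, and returning zero when D = 0 (a population with no diversity at all, where Eq. 5 is undefined) is a convention assumed here rather than one taken from the text.

```python
def component_diversity(W, B):
    """Component diversity C = 4WB / D^2 with D = W + B (Eq. 5)."""
    D = W + B
    if D == 0:
        # Assumed convention: Eq. 5 is undefined when there is no diversity.
        return 0.0
    return 4.0 * W * B / D ** 2

def time_average(values):
    """Plain arithmetic mean of a time series of C values."""
    return sum(values) / len(values)

if __name__ == "__main__":
    # Hypothetical (W, B) pairs, chosen only to show the three regimes.
    for label, (W, B) in [("quasi-clonal", (0.01, 3.0)),
                          ("balanced", (1.5, 1.5)),
                          ("random", (2.5, 0.01))]:
        print(label, component_diversity(W, B))
```

C is near zero whenever one component dominates and reaches one only when W = B, which is why it singles out the transitional region between the ordered and disordered regimes.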
[Figure 3: residual resource against Mutation Rate on a log axis; legend: Adaptation, No Adaptation.]
Fig. 3. Time averages of the amount of uningested resource in the world as mutation rate μ is varied (shown on a log scale to improve resolution). In one set of simulations adaptation operates normally; in the other set of simulations adaptation is prevented with behavioral noise B0 = 1. The "bars" surrounding each point indicate the standard deviation of the time series of resource values. (The leftmost data points represent not $\mu = 10^{-8}$ but μ = 0.)
Figure 3 shows that this transition separating genetic order and disorder has a striking connection with population fitness. Since my model is resource-driven, the population's overall fitness is reflected by its efficiency at extracting the available resources from the environment. (Exactly the same amount of resources was pumped into all simulations.) A crude (inverse) measure of this resource-extraction efficiency is the amount of residual (uningested) resource present in the world. The time average of residual resource is plotted against mutation rate in Fig. 3. When the dependence of residual resource is compared with the diversity transition shown in Fig. 2, we can see that maximal resource-extraction efficiency occurs when the mutation rate is at or slightly below the transition (a region that one might describe as near "the edge of disorder"). As the mutation rate rises significantly into the region in which systems are disordered, resource-extraction efficiency falls off dramatically. (There is some indication that fitness also falls off if the mutation rate is well into the region of ordered systems, but this is unclear since it is difficult to gather clean statistics at very low mutation rates.)
Although the transition between genetic order and disorder exists whether or not adaptation happens, effective adaptation is evidently optimal around this edge of disorder. This effect might reflect a balance between two competing demands of evolutionary learning. On the one hand, the need to remember what has been learned requires a sufficiently low mutation rate; on the other hand, the need to explore novel possibilities requires a sufficiently high mutation rate. Optimal evolutionary learning, then, requires a mutation rate that appropriately balances these competing needs. This optimally poised mutation rate appears to coincide with the region around the edge of disorder.
4
Measurement of Adaptive Evolutionary Activity
A fundamental feature of any complex adaptive system is its adaptive evolutionary dynamics. But how might this property be measured? I think that we should conceive of adaptive evolutionary activity as the creation through the evolutionary process of sensorimotor functionality, i.e., of sensorimotor traits that are beneficial to the agents that possess them and that persist in the population because of this benefit. But how might this process be measured, especially when we might not know which traits have any functionality, and, if they do, what kind and how much? The difficulty--some would say impossibility--of answering this question was stressed in a classic paper by Gould and Lewontin [18] which subsequently generated a flood of critical debate (e.g., [12, 11, 29, 22, 24]).
I propose that we can address this issue by measuring the extent to which a trait is well-tested by natural selection. Every time an agent uses one of its sensorimotor traits, natural selection has an opportunity to provide some feedback about the trait's benefit or cost. If the trait persists in the lineage through repeated use and, in particular, accumulates more usage than would be expected a priori, then we have evidence that it is persisting because of its beneficial effects. Measuring a trait's adaptive significance in this way, then, involves measuring the extent to which its use exceeds a priori expectations.
In the context of the present model, sensorimotor traits are alleles. To measure the "raw" usage of an allele, assign a usage variable $u_{is}^{t}$ to the $s$-th allele of the $i$-th agent. An allele's usage variable is set to zero when the allele first enters the population through mutation (or at the very beginning of the simulation). Then, usage is incremented every time an allele is actually used, i.e., when the agent receives the sensory input genetically linked with the $s$-th locus and the behavior encoded by the $s$-th allele is thereby triggered:
$$ u_{is}^{t+1} = \begin{cases} u_{is}^{t} + 1 & \text{if } i \text{ uses the } s\text{-th allele at } t \\ u_{is}^{t} & \text{otherwise} \end{cases} \tag{6} $$
Recall that, if B0 > 0, behavioral noise can prevent the $s$-th allele from actually producing i's behavior at t; in this case, i would not use the $s$-th allele even after receiving the sensory input that normally triggers its use. If B0 = 1 then $u_{is}^{t} = 0$ for all i, s, and t.
Not all raw usage indicates an allele's adaptive significance, however, since even harmful alleles accumulate usage. (In fact, it is only when harmful alleles are used that natural selection can eliminate them.) To determine an allele's proven adaptive value, we need to screen off that usage that might not signify the allele's adaptive value. One way to do this is to measure the duration during which an allele lineage could accumulate usage in the absence of adaptation, and then count an allele's usage as having adaptive significance only if the allele's age exceeds this duration.
More precisely, one can measure the extent of the adaptive evolutionary activity underlying all traits in a given simulation of the model--with some particular setting of model parameters (resource field, resource taxes, size of world, etc.)--as follows. Let the age $A_{is}^{t}$ of the $s$-th allele of the $i$-th agent be defined as the number of time steps since that allele was originally introduced into i's genetic lineage by a mutation at the $s$-th locus. Then measure the age distribution of all alleles of all agents at the model parameter settings of interest except that behavioral noise is fully turned on, B0 = 1. Adaptation cannot affect the allele age distribution when behavioral noise is always present--genotype and phenotype are unconnected--so the allele age distribution reflects only genetic drift. Given this measured distribution of ages $A_{is}$, define the drift duration, $t_u$, as the shortest duration that exceeds $A_{is}$ for all s and i. We can be quite confident that, in the simulation of interest, no allele can survive in the population for longer than the drift duration if the allele's presence is due to chance alone.
To calculate the "net" usage $\hat{u}_{is}^{t}$ of the $s$-th allele of the $i$-th agent, we modify Eq. 6 by adding the constraint that the allele's age must exceed the drift duration $t_u$:
$$ \hat{u}_{is}^{t+1} = \begin{cases} \hat{u}_{is}^{t} + 1 & \text{if } i \text{ uses the } s\text{-th allele at } t \text{ and } A_{is}^{t} \ge t_u \\ \hat{u}_{is}^{t} & \text{otherwise} \end{cases} \tag{7} $$
Finally, adaptive evolutionary activity $A_t$ is simply the sum of the net usage:
$$ A_t = \sum_{i,s} \hat{u}_{is}^{t} . \tag{8} $$
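The bookkeeping behind Eqs. 6-8 can be sketched as follows. This is a hypothetical illustration, not the simulation code: the names `AlleleRecord`, `drift_duration`, `update`, and `activity` are invented here, time is assumed to advance in whole steps, and the drift duration is taken to be one step more than the largest allele age observed in the run with B0 = 1.

```python
from dataclasses import dataclass

@dataclass
class AlleleRecord:
    age: int = 0        # time steps since the allele entered the lineage
    raw_usage: int = 0  # raw usage counter (Eq. 6)
    net_usage: int = 0  # net usage counter (Eq. 7)

def drift_duration(drift_ages):
    """Shortest whole number of steps exceeding every age seen with B0 = 1."""
    return max(drift_ages) + 1

def update(record, used, t_u):
    """Advance one allele by one time step."""
    if used:
        record.raw_usage += 1
        # Usage counts as adaptively significant only once the allele has
        # outlived what drift alone could explain.
        if record.age >= t_u:
            record.net_usage += 1
    record.age += 1

def activity(population):
    """Adaptive evolutionary activity: the sum of all net usage (Eq. 8)."""
    return sum(r.net_usage for genome in population for r in genome)

if __name__ == "__main__":
    t_u = drift_duration([0, 2, 1])          # hypothetical drift-run ages
    population = [[AlleleRecord()], [AlleleRecord()]]
    for _ in range(5):
        update(population[0][0], used=True, t_u=t_u)
        update(population[1][0], used=False, t_u=t_u)
    print(t_u, activity(population))
```

A mutation at a locus would be modelled by replacing the record at that locus with a fresh `AlleleRecord()`, which resets both usage counters and the age to zero.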
4.1
A Law of Adaptive Evolutionary Activity
The drift duration $t_u$ was measured in a series of simulations across the mutation spectrum. (Limited computational resources prevented measurement of $t_u$ for $\mu < 10^{-3}$.) All model parameters were set exactly as in the simulations discussed in Sect. 3.1 and Sect. 3.2 above. Then the time average $A = \langle A_t \rangle_t$ of evolutionary activity was measured across the mutation spectrum, for various values of behavioral noise, 0 ≤ B0 < 0.25.
[Figure 4: log-log plot of average evolutionary activity against Mutation Rate.]
Fig. 4. Average evolutionary activity A as a function of mutation rate for several values of behavioral noise, 0 ≤ B0 < 0.25. To facilitate comparison with Fig. 2 and Fig. 3, the same mutation rate scale is used. Due to the computational resources necessary for the calculation of the drift duration $t_u$ when $\mu < 10^{-3}$, evolutionary activity has not yet been measured at lower mutation rates.
Figure 4 shows how A was observed to depend on the mutation rate μ. We see that, within the range of mutation rates sampled, evolutionary activity approximately follows a power law:
$$ A = \mu^{\alpha} , \tag{9} $$
with $\alpha \approx -2.3 \pm 0.3$. Notice that the dependence of adaptive evolutionary activity A on the mutation rate corresponds very closely with the dependence of resource-extraction efficiency on mutation rate depicted in Fig. 3.
It is notable that the approximate power law behavior of A in Fig. 4 holds up at a dozen different (relatively low) values of behavioral noise. This suggests that the law of adaptive evolutionary activity in Eq. 9 is fairly robust. An open question (requiring significant computational resources to answer) is how A will change when μ passes through and below the transition separating genetic order and disorder shown in Fig. 2. This question is especially intriguing given the adaptive significance of the transition revealed when Fig. 2 is overlaid with Fig. 3.
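One common way to estimate the exponent of a power law like the one in Eq. 9 is a least-squares line fit on log-log axes. The sketch below shows that generic technique; it is not necessarily the fitting procedure used for Fig. 4. The function name `fit_power_law` is hypothetical, the fit allows a prefactor c as an added assumption, and the data in the example are synthetic, generated from an exact power law purely to exercise the code.

```python
import math

def fit_power_law(mus, activities):
    """Fit A = c * mu**alpha by least squares on log10-log10 axes.

    Returns (alpha, c). All inputs must be strictly positive.
    """
    xs = [math.log10(m) for m in mus]
    ys = [math.log10(a) for a in activities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                 # slope of the log-log line
    log_c = mean_y - alpha * mean_x   # intercept of the log-log line
    return alpha, 10.0 ** log_c

if __name__ == "__main__":
    # Synthetic data, NOT measurements from the model.
    mus = [1e-3, 1e-2, 1e-1, 1.0]
    acts = [2.0 * m ** -2.3 for m in mus]
    print(fit_power_law(mus, acts))
```

Because the fit is done in log space, it weights relative rather than absolute errors, which suits data spanning several orders of magnitude in mutation rate.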
5
The Status of Artificial Life's Working Hypothesis
The three results discussed here--punctuated equilibria in diversity dynamics, the transition separating genetic order and disorder, and the empirical law of adaptive evolutionary activity--illustrate the possible fruits of artificial life's working hypothesis that simple computer models can capture the essential nature of complex adaptive systems. I say possible fruits because it is not clear that these three effects are part of the essential nature of complex adaptive systems in general. Still, the results in the present model are sufficiently compelling for us to seriously entertain the hypotheses that these punctuation, transition, and power law effects have some significant universal application.
These three specific hypotheses about punctuation, transition, and adaptation must be sharply distinguished from the general working hypothesis that underlies this whole line of research in artificial life. The specific hypotheses are candidates for confirmation or disconfirmation in the short run, but the working hypothesis is not. In the short run, the working hypothesis is to be judged by whether it generates fruitful lines of research. When held to this standard, the results presented above give the working hypothesis some provisional credibility. The punctuation, transition, and adaptation results found in the present simple model will prompt the search for evidence for similar effects in other complex adaptive systems, both artificial and natural, and this in turn will prompt the development of maximally general formulations of macrovariables like D, W, B, and A. These are exciting and promising lines of research.
In the long run, working hypotheses often can be effectively confirmed or disconfirmed. Artificial life's working hypothesis will win confirmation if enough of the specific hypotheses (like punctuation, transition, and adaptation) it spawns prove to be compelling.
Whether this is so is an empirical matter, one which the "thermodynamic" methodology illustrated in this paper is well suited to address. But how plausible are the three specific hypotheses about punctuation, transition, and adaptation? Are punctuated equilibrium diversity dynamics, a transition separating genetic order and disorder, and a power law dependence of
evolutionary activity on mutation rate part of the essential nature of some significant class of complex adaptive systems? These questions remain open. But there is a straightforward empirical method by which we can pursue their answers. The hypotheses are eminently testable. Testing such hypotheses in a wide variety of artificial and natural systems is my vision of artificial life as-it-could-be.
References
1. M. A. Bedau, 1992, "Philosophical Aspects of Artificial Life," in F. J. Varela and P. Bourgine, eds., Towards a Practice of Autonomous Systems, Bradford/MIT Press, Cambridge, MA.
2. M. A. Bedau, 1994, "The Evolution of Sensorimotor Functionality," in P. Gaussier and J.-D. Nicoud, eds., From Perception to Action, IEEE Computer Society Press, Los Alamitos, CA.
3. M. A. Bedau and A. Bahm, 1993, "Order and Chaos in the Evolution of Diversity," in Proceedings of the Second European Conference on Artificial Life, Brussels, Belgium.
4. M. A. Bedau and A. Bahm, 1994, "Bifurcation Structure in Diversity Dynamics," in R. Brooks and P. Maes, eds., Artificial Life IV, Bradford/MIT Press, Cambridge, MA.
5. M. A. Bedau and N. H. Packard, 1991, "Measurement of Evolutionary Activity, Teleology, and Life," in C. G. Langton, C. E. Taylor, J. D. Farmer, and S. Rasmussen, eds., Artificial Life II, SFI Studies in the Sciences of Complexity, Vol. X, Addison-Wesley, Redwood City, CA.
6. M. A. Bedau, F. Ronneburg, and M. Zwick, 1992, "Dynamics of Diversity in an Evolving Population," in R. Männer and B. Manderick, eds., Parallel Problem Solving from Nature, 2, New York, Elsevier.
7. M. A. Bedau and R. Seymour, 1994, "Adaptation of Mutation Rates in a Simple Model of Evolution," in R. Stonier and X. H. Yu, eds., Complex Systems-Mechanisms of Adaptation, IOS Press, Amsterdam.
8. D. R. Brooks and E. O. Wiley, 1988, Evolution as Entropy, second edition, Chicago University Press, Chicago.
9. J. F. Crow and M. Kimura, 1970, An Introduction to Population Genetics Theory, Harper and Row, New York.
10. R. Dawkins, 1982, The Extended Phenotype, Oxford University Press, New York.
11. R. Dawkins, 1983, "Adaptationism Was Always Predictive and Needs No Defense," Behavioral and Brain Sciences, 6, 360-61.
12. D. C. Dennett, 1983, "Intentional Systems in Cognitive Ethology: the 'Panglossian Paradigm' Defended," Behavioral and Brain Sciences, 6, 343-390.
13. N. Eldredge and S. J. Gould, 1972, "Punctuated Equilibria: An Alternative to Phyletic Gradualism," in T. J. M. Schopf, ed., Models in Paleobiology, Freeman, Cooper and Company, San Francisco.
14. W. J. Ewens, 1979, Mathematical Population Genetics, Springer-Verlag, Berlin.
15. D. S. Falconer, 1981, Introduction to Quantitative Genetics, second edition, Wiley, New York.
16. S. J. Gould, 1989, Wonderful Life, Norton, New York.
17. S. J. Gould and N. Eldredge, 1977, "Punctuated Equilibria: The Tempo and Mode of Evolution Reconsidered," Paleobiology, 3, 115-151.
18. S. J. Gould and R. C. Lewontin, 1979, "The Spandrels of San Marco and the Panglossian Paradigm: A Critique of the Adaptationist Programme," Proceedings of the Royal Society B, 205, 581-598.
19. D. Hillis, 1992, "Simulated Evolution and the Red Queen Hypothesis," Biocomputation Workshop, Monterey, June 22-24.
20. G. R. Iversen and H. Norpoth, 1976, Analysis of Variance, Sage Publications, Beverly Hills, CA.
21. K. Lindgren, 1991, "Evolutionary Phenomena in Simple Dynamics," in C. G. Langton, C. E. Taylor, J. D. Farmer, and S. Rasmussen, eds., Artificial Life II, SFI Studies in the Sciences of Complexity, Vol. X, Addison-Wesley, Redwood City, CA.
22. J. Maynard Smith, 1978, "Optimisation Theory in Evolution," Annual Review of Ecology and Systematics, 9, 31-56.
23. J. Maynard Smith, 1989, Did Darwin Get It Right?, Chapman and Hall, New York.
24. E. Mayr, 1983, "How To Carry Out the Adaptationist Program," American Naturalist, 121, 324-33.
25. E. Mayr, 1988, "Is Biology an Autonomous Science?" in his Towards a New Philosophy of Biology, Harvard University Press, Cambridge, MA.
26. E. Mayr, 1988, "Speciational Evolution through Punctuated Equilibria," in his Towards a New Philosophy of Biology, Harvard University Press, Cambridge, MA.
27. N. H. Packard, 1989, "Intrinsic Adaptation in a Simple Model for Evolution," in C. G. Langton, ed., Artificial Life, SFI Studies in the Sciences of Complexity, Vol. VI, Addison-Wesley, Redwood City, CA.
28. T. Ray, 1991, "An Approach to the Synthesis of Life," in C. G. Langton, C. E. Taylor, J. D. Farmer, and S. Rasmussen, eds., Artificial Life II, SFI Studies in the Sciences of Complexity, Vol. X, Addison-Wesley, Redwood City, CA.
29. A. Rosenberg, 1985, "Adaptationalist Imperatives and Panglossian Paradigms," in J. H. Fetzer, ed., Sociobiology and Epistemology, Reidel, Dordrecht.
Self-Organizing Algorithms Derived from RNA Interactions
Wolfgang Banzhaf
Department of Computer Science, Dortmund University, Baroper Str. 301, 44221 Dortmund, GERMANY
banzhaf@tarantoga.informatik.uni-dortmund.de
Abstract. We discuss algorithms based on the RNA interaction found in Nature. Molecular biology has revealed that strands of RNA, besides being autocatalytic, can interact with each other. They play a double role of being information carriers and enzymes. The first role is realized by the 1-dimensional sequence of nucleotides on a strand of RNA, the second by the 3-dimensional form strands can assume under appropriate temperature and solvent conditions. We use this basic idea of having two alternative forms of the same sequence to propose a new Artificial Life algorithm. After a general introduction to the area we report our findings in a specific application studied recently: an algorithm which allows sequences of binary numbers to interact. We introduce folding methods to achieve 2-dimensional alternative forms of the sequences. Interactions between 1- and 2-dimensional forms of binary sequences generate new sequences, which compete with the original ones due to selection pressure. Starting from random sequences, replicating and self-replicating sequences are generated in considerable numbers. We follow the evolution of a number of sample simulations and analyse the resulting self-organising system.
1
The Age of RNA
A new age is dawning in molecular biology, the age of RNA [1]. Over roughly the last decade many discoveries were made that have completely changed our understanding of RNA. Whereas the Fifties, Sixties and Seventies were dedicated mainly to exploring the enormous richness of the molecular worlds of DNA and proteins, the Eighties clearly marked an explosion of knowledge in RNA-related problems and facts. What is so interesting about ribonucleic acids (RNAs) that chemists and biologists are flocking in large numbers into this research field? What might be the consequences for our understanding of the mechanisms of Life? Finally, what kind of computational models could be derived from this new world that would offer insights into the functioning of a distinct category of algorithms, algorithms of self-organizing systems?
This chapter is dedicated to exploring the latter question, mostly by discussing computational aspects of recent revolutionary discoveries in biochemistry. We shall put forward a new class of algorithms that shows signs of self-organization. Essential features of this class are derived from new findings in RNA chemistry. In our opinion, it is possible that those findings might have reverberations into mathematics, physics, studies of complex systems and even engineering (besides heavily impacting biology and chemistry).
Fig. 1. The sugar-phosphate backbone of RNA and DNA. Only a slight difference can be seen between RNA (a) and DNA (b). One hydroxyl group is absent in DNA.
A few words are in order to highlight the specifics of RNA as opposed to DNA. Basically, there are two differences between DNA and RNA. One concerns a mere oxygen atom, bound in a hydroxyl group in RNA, which makes macromolecules of RNA much more prone to form secondary structures by folding back on themselves. DNA macromolecules, for their part, prefer to form stable double helices with complementary strands. The other difference between DNA and RNA is the set of nitrogenous bases connected to their respective backbone (see Figure 1). For RNA, these bases are adenine (A), guanine (G), cytosine (C) and uracil (U); for DNA they are adenine (A), guanine (G), cytosine (C) and thymine (T). Apart from an additional methyl group in T as compared to U, the two are identical. The primary mechanism for forming 2- and 3-dimensional structures is the interaction via hydrogen bonds between corresponding bases that form base pairs. Basically, a polynucleotide can gain energy by forming such hydrogen bonds, which translates into stability for the resulting structure. Figure 2 shows the two most stable pairings in RNA. A typical example of the secondary structure of an RNA polynucleotide is shown in Figure 3 (b). This is often called the phenotypic form of the macromolecule consisting of the sequence shown in (a). By assuming this shape, RNA is more reactive than DNA with its inert form of a double helix, which effectively conserves the sequence on its strands. And here it is, the main functional difference between RNA and DNA: DNA is highly specialized in conserving information residing in the order of its bases, whereas RNA is both an information store (in the base sequence) and a reactive agent, not very specialized in either of these functions. No wonder there exist numerous different (and more specialized) kinds of RNA, mRNA, tRNA, rRNA and snRNA to name a few, all performing different functions in the information processing machinery of a cell [2].

Fig. 2. The two most important base-pairings via hydrogen bonds in RNA. (a) U - A and (b) C - G. Other pairings are also possible, notably U - G, but they are not very stable. Dashed lines symbolize hydrogen bonds.

In 1989 the Nobel Prize in Chemistry went to Sidney Altman of Yale University and to Thomas R. Cech of the University of Colorado for their pivotal role in the discovery that molecules of RNA can really act as catalysts (ribozymes) [4, 5]. It was subsequently established that certain RNA molecules can accelerate reactions by a factor as high as $10^{12}$, which is comparable to the effect of protein enzymes built from amino acids. An entire new branch of biotechnology has sprung up since [6] to make use of these new functional building blocks for drug design [7, 8].

Furthermore, early on in evolution some organisms seem to have managed the transition into pure RNA form: viruses. Viruses have an intimate knowledge about the replication mechanisms of cells, but are not able to survive on their own. As parasites, however, they have succeeded in exploiting cellular replication mechanisms for their own purposes. It is therefore suspected that they derived from early self-replicating life forms [9].

This leads us naturally to another important topic regarding sequences of RNA. Many scientists [10, 11, 12, 13] now believe that molecules of RNA were the predecessors of a much refined DNA-protein system that allowed Life to self-replicate and to perpetuate itself. Theories of the origin of life have long been considering the double function of RNA as one important aspect of a system capable of self-replication.
The chicken-egg problem of our DNA-protein system could thus find a plausible explanation: Presumably a much less specialized RNA system performing both information storage and enzymatic function could have bootstrapped itself and might later have led to the more efficient DNA-protein system, with various kinds of RNA acting in auxiliary roles that support DNA-protein. Figure 4 shows a sketch of the dependencies between present-day DNA, RNA and proteins.

Fig. 3. (a) Genotypic form of a t-RNA sequence from E. coli: GAAUACACGGAAUUCGCCCGGACUCGGUUCGAUUCCGAGUCCGGGCACCAC. (b) Phenotypic form of the same sequence [3].

Fig. 4. The DNA-protein system has to hold information about its own information conservation in itself. Various kinds of RNA play auxiliary roles. Arrows indicate a supporting function. (Diagram nodes: DNA, m-RNA, proteins.)
In the same spirit Stuart Kauffman writes in a recent book: "I shall argue that life started as a minimally complex collection of peptide or RNA catalysts capable of achieving simultaneously collective reflexive catalysis of the set of polymers ( ... ) and a coordinated web of metabolism." [13]
2 Evolutionary Algorithms and beyond
Evolutionary Algorithms (EAs) make use of ideas gleaned from natural evolution. Information, e.g. information useful for solving an optimization problem, is conserved and evolved over time by providing a population of entities that breed with each other to generate better solutions. Starting from random solutions, the EA narrows down solutions in successive generations until it cannot find any further improvement. Based on the external problem to be solved, the EA assigns fitness values to each individual in the population, which are then used to determine the eligibility of the particular solution at hand for breeding and perpetuation into the next generation. A kind of artificial replication and selection takes place which results in a change in the content of successive generations. Genetic Algorithms (GAs) are a prominent example of this idea. At the level of the genotypic representation, John Holland proposed this scheme in 1975 [14]. Similar considerations have been undertaken at the level of the phenotype of a solution and are summarized in a 1973 book by Ingo Rechenberg [15].

The algorithms considered here, however, are different in that they start with a system capable of self-organization. This is done in close analogy to the RNA system in nature by postulating that the same physical entities that are used for information storage exist in an alternative form that allows them to interact with each other. We propose to consider artificial systems with the characteristic feature of being genotype and phenotype at the same time. We shall look at one specific example and study some phenomena that emerge in such a system. We will then point out various routes to generalizing the system and put it into a broader perspective.

The system we have chosen to look at in more detail is based on the most fundamental material of information processing in computers, the binary numbers 0 and 1. Sequences of these numbers constitute both data and programs in the von Neumann paradigm of computing, which has been so pervasive over the last 50 years. We thus will study binary sequences that come in two alternative forms, a 1-dimensional "genotypic" form and a 2-dimensional "phenotypic" form.
3 The basic algorithm
As we deal with the evolution of (binary) strings of symbols, two principles have to be embodied in the algorithm:
i) Machines, which we call operators, should exist that are able to change strings according to certain rules we have to define.
ii) Strings should be equivalent to machines in that mapping methods determine which operator can be generated from which string.
Since we wanted to construct an algorithm as simple as possible, we settled for binary strings. However, the requirements mentioned in i) and ii) are sufficiently general to allow for other objects. Here, we consider strings s consisting of concatenated numbers "0" and "1":

$$ s = (s_1, s_2, \dots, s_i, \dots, s_N), \qquad s_i \in \{0, 1\}, \qquad 1 \le i \le N $$
To keep things as simple as possible we choose a square number for N, the length of string s. An important question arises immediately: How can operators be formed from binary strings? In Nature, nucleotide strands tend to fold together according to the laws of chemistry and physics. Bond formation in Nature is governed mainly by a tendency of the strands to relax into energy-minimal configurations in physical space. This process might be called a compilation of nucleotide programs into enzymatic "executables". Here we try to keep things as straightforward as possible and only consider two-dimensional configurations of binary numbers which, in a mathematical sense, are operators.
3.1 The folding of operators
For binary strings the following procedure is feasible: Strings s with N components fold into operators P which can be represented mathematically by square matrices of size $\sqrt{N} \times \sqrt{N}$ (remember, N should be a square number!), as depicted schematically in Figure 5. In principle, any kind of mapping of the topologically one-dimensional strings of binary numbers into a two-dimensional (square) array of numbers is allowed. Depending on the method of folding, we can expect various transformation paths between strings. Also, the folding need not be deterministic, but can map strings stochastically into different operators. We then assume that an operator $P_s$, formed from string s, can act on another string in turn and generate still another string (see Figure 6):

$$ P_s \, s' \Rightarrow s'' $$

It is important to keep in mind that neither $P_s$ nor $s'$ is required to be deleted by this operation¹. Rather, a new string $s''$ is generated by the cooperation of $P_s$ and $s'$. Thus, we consider the system to be open, with an ongoing generation of new strings (from some sort of raw materials). In this interpretation, only the information carried by $P_s$ and $s'$ is considered as something essential, and it is this information that is required to be conserved.

¹ It is also possible to require that only one of the two, either string or operator, should be conserved. Qualitatively, the behaviour of the system is the same.

Fig. 5. A string s folds into an operator $P_s$.

One can imagine some possibilities to balance the continued production of new strings, all having to do with resource limitations necessarily imposed on such a system:
(1) The system might have a fixed number of strings. Each new string produced by an interaction causes the deletion of an already existing string. The string to be replaced can be selected either by chance, i.e. according to its frequency in the ensemble, or by some quality criterion, its length, the number of "1"s in it, etc.
(2) After an initial period of unrestricted growth the increase in string numbers might level off to zero.
(3) At the outset, a restricted number of elements that constitute strings might be provided. As a consequence, an initial growth period in the number of strings would cause a rapid depletion in the supply of raw material, in our case of "0"s and "1"s, which in turn would restrict the formation of new strings.
The net effect of these counter-measures (as well as others one may devise) is to force strings into a competition for available resources. Only those strings will have a chance to survive in macroscopic numbers which are able either i) to reproduce themselves or ii) to reproduce with the help of others or iii) to lock into reaction cycles with mutually beneficial transformations. We shall study and demonstrate this behaviour in the next sections.
Fig. 6. An operator P acts upon a string s to produce a new string s'.
3.2 Operators at work
For the moment, however, we have to come back to the question of how exactly an operator can act on a string. Consider Figure 6. We can think of s as being concatenated from $\sqrt{N}$ fragments of length $\sqrt{N}$ each. The operator P is able to transform one of these fragments at a time (semi-local operation). In this way, it moves down the string in steps of size $\sqrt{N}$ until it has finally completed the production of a new string $s'$. Then operator P unfolds back into its corresponding string form and is released, together with s and $s'$, into the ensemble of strings, which will be called the string soup from now on. A particular example of the action of an operator on a string is the computation of scalar products:

$$ s'_{i+k\sqrt{N}} = \sum_{j=1}^{\sqrt{N}} P_{ij}\, s_{j+k\sqrt{N}}, \qquad i = 1, \dots, \sqrt{N}, \quad k = 0, \dots, \sqrt{N}-1 \tag{1} $$

where k counts the steps the operator has taken down the string. This computation, however, will not conserve the binary character of strings unless we introduce a nonlinearity. Therefore, later on we shall examine in more detail the following related computation:

$$ s'_{i+k\sqrt{N}} = \sigma\!\left( \sum_{j=1}^{\sqrt{N}} P_{ij}\, s_{j+k\sqrt{N}} - \Theta \right), \qquad i = 1, \dots, \sqrt{N}, \quad k = 0, \dots, \sqrt{N}-1 \tag{2} $$

where $\sigma$ symbolizes the squashing function

$$ \sigma(x) = \begin{cases} 1 & \text{for } x \ge 0 \\ 0 & \text{for } x < 0 \end{cases} \tag{3} $$

and $\Theta$ is an adjustable threshold. Equ. (1) can be interpreted as a matrix multiplication, whereas equ. (2) amounts to a certain combination of Boolean operations (for the case $\Theta = 1$ discussed below). Other operations are allowed as well and may be subsumed under the following general transformation rule:

$$ s' = f(P, s) \tag{4} $$
Examples would be string matching and majority vote. It has turned out to be useful to adopt the reaction notation of chemistry to study the interaction between P and s. In this picture, consider an operator P, formed from $s^{(1)}$, which reacts with $s^{(2)}$ to produce $s^{(3)}$ under conservation of all reactants. We can write this as a general reaction of the kind

$$ s^{(1)} + s^{(2)} + X \longrightarrow s^{(1)} + s^{(2)} + s^{(3)} \tag{5} $$

and classify the possible reactions into five classes:

$$ P_{s^{(1)}} \oplus s^{(2)} \longrightarrow P_{s^{(1)}} + s^{(2)} + s^{(3)} \tag{6} $$
$$ P_{s^{(1)}} \oplus s^{(2)} \longrightarrow P_{s^{(1)}} + s^{(2)} + s^{(2)} \tag{7} $$
$$ P_{s^{(1)}} \oplus s^{(2)} \longrightarrow P_{s^{(1)}} + s^{(2)} + s^{(1)} \tag{8} $$
$$ P_{s^{(1)}} \oplus s^{(1)} \longrightarrow P_{s^{(1)}} + s^{(1)} + s^{(2)} \tag{9} $$
$$ P_{s^{(1)}} \oplus s^{(1)} \longrightarrow P_{s^{(1)}} + s^{(1)} + s^{(1)} \tag{10} $$

where operators $P_{s^{(k)}}$ are characterized by the strings they correspond to. The symbol $\oplus$ indicates the active process of the operator working on a string (using X as raw material). Interesting reactions are given in (7), (8) and (10), which show replication of one reactant (the first two) and self-replication (the last).
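As a minimal sketch of equ. (2), the following Python functions apply an operator matrix to a bit string fragment by fragment and then sort the outcome into the classes (6) to (10). The function names, the list-of-rows matrix representation and the default threshold of 1 are illustrative assumptions and are not taken from the original description.

```python
def apply_operator(P, s, theta=1):
    """Squashed scalar product of equ. (2).

    P is a sqrt(N) x sqrt(N) matrix of 0/1 entries (list of rows), s a list
    of N bits. Output bit i of fragment k is 1 if
    sum_j P[i][j] * s[j + k*n] - theta >= 0, and 0 otherwise.
    """
    n = len(P)
    if len(s) != n * n:
        raise ValueError("string length must equal the number of matrix entries")
    out = []
    for k in range(n):                      # the operator steps down the string
        fragment = s[k * n:(k + 1) * n]
        for i in range(n):
            x = sum(P[i][j] * fragment[j] for j in range(n)) - theta
            out.append(1 if x >= 0 else 0)  # squashing function, equ. (3)
    return out


def reaction_class(operator_string, target, product):
    """Return the equation number, 6 to 10, of the reaction class."""
    if operator_string != target:
        if product == target:
            return 7                        # the string acted upon is replicated
        if product == operator_string:
            return 8                        # the operator's string is replicated
        return 6                            # a third string is produced
    return 10 if product == target else 9   # self-replication or a new string


# Hypothetical example: a 2 x 2 operator that swaps the bits of each fragment.
swap = [[0, 1], [1, 0]]
print(apply_operator(swap, [1, 0, 1, 1]))   # -> [0, 1, 1, 1]
```

With a threshold of 1 and binary entries, each output bit is simply the logical OR over the positions where both the matrix row and the fragment carry a "1".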
3.3 Equilibration
We have to mention the fact that potentially "lethal" strings exist in these systems. A string is said to be lethal or "pathological" with respect to an operation (4) if it is able to replicate in disproportionately large numbers in almost any ensemble configuration. In the particular case of equ. (2), the string consisting of "0"s only is pathological, because it is able to replicate with itself and with every other string. We shall call this string the destructor and shall constantly monitor soup reactions in order to remove the destructor upon appearance². Another potentially hazardous string consists of "1"s only. We shall call it the exploitor. In addition to replicating itself, it is able to replicate with a large fraction of strings. Although the exploitor is pathological, we can deal with it in a more gentle way by providing a means of non-deterministic string decay. To this end, we shall introduce the following general stability criterion for strings: A string may be considered the more stable, the fewer "1"s it contains. Its chance to decay, hence, depends on

$$ I^{(k)} = \sum_{i=1}^{N} s_i^{(k)}, \qquad k = 1, \dots, M. \tag{11} $$

$I^{(k)}$ measures the number of "1"s in string k and determines a probability

$$ p^{(k)} = \left( I^{(k)} / N \right)^{n} \tag{12} $$

with which an encountered string should decay. The parameter n is used to tilt probabilities slightly. Note that the exploitor has probability p = 1 and must decay upon encounter.
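The decay rule of equ. (11) and (12) is small enough to state directly in code. The sketch below assumes strings are given as lists of 0/1 integers; the function names and the default n = 1 are illustrative choices.

```python
def ones_count(bits):
    """I^(k) of equ. (11): the number of '1's in the string."""
    return sum(bits)


def decay_probability(bits, n=1.0):
    """p^(k) of equ. (12): (I^(k) / N) ** n."""
    return (ones_count(bits) / len(bits)) ** n


print(decay_probability([1, 1, 1, 1]))        # exploitor -> 1.0
print(decay_probability([1, 0, 1, 0], n=2))   # -> 0.25
```

Larger values of n lower the decay probability of every string except the exploitor, which always decays.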
3.4 Summary
The entire algorithm can now be stated in its simplest form as follows:

STEP 1: Generate M random binary strings of length N each
STEP 2: Select a string and fold it into an operator (a matrix) of dimension $\sqrt{N} \times \sqrt{N}$
STEP 3: Select another string and apply the operator
STEP 4: Release the new string, the old string and the operator (as string) into the soup
STEP 5: Remove one randomly chosen string in order to compensate for the addition of a string in STEP 4
STEP 6: Monitor the soup and replace destructors by random strings
STEP 7: Select one string and destroy it according to the probability (12)
STEP 8: Go to STEP 2

² The method might be applied to arbitrary strings, in which case it is called 'reaction masking'.
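The eight steps can be sketched as a short simulation loop. This is a sketch under stated assumptions rather than the authors' implementation: the operator matrix is filled row by row, the threshold is 1, random strings are drawn without the all-zero destructor, and a string that decays in STEP 7 is refilled by a random string so that the soup size M stays constant. All function and parameter names are hypothetical.

```python
import math
import random


def random_string(rng, N):
    """Random bit tuple; rejecting the all-zero destructor is an assumption."""
    while True:
        s = tuple(rng.randint(0, 1) for _ in range(N))
        if any(s):
            return s


def react(op_string, s, theta=1):
    """Fold op_string row by row (assumed folding) and apply equ. (2) to s."""
    n = math.isqrt(len(op_string))
    P = [op_string[r * n:(r + 1) * n] for r in range(n)]
    out = []
    for k in range(n):
        fragment = s[k * n:(k + 1) * n]
        for i in range(n):
            x = sum(p * b for p, b in zip(P[i], fragment)) - theta
            out.append(1 if x >= 0 else 0)
    return tuple(out)


def run_soup(N=4, M=1000, iterations=10000, n=1.0, seed=0):
    rng = random.Random(seed)
    soup = [random_string(rng, N) for _ in range(M)]           # STEP 1
    for _ in range(iterations):                                # STEP 8 is the loop
        a, b = rng.sample(range(M), 2)                         # STEP 2 and 3
        product = react(soup[a], soup[b])
        soup.append(product)                                   # STEP 4
        del soup[rng.randrange(len(soup))]                     # STEP 5
        if not any(product):                                   # STEP 6
            soup = [s if any(s) else random_string(rng, N) for s in soup]
        victim = rng.randrange(M)                              # STEP 7
        if rng.random() < (sum(soup[victim]) / N) ** n:
            soup[victim] = random_string(rng, N)               # assumed refill
    return soup
```

A call such as `run_soup(N=4, M=1000, iterations=10000)` returns the final soup as a list of bit tuples, from which sort frequencies can be counted.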
This algorithm makes use of the resource limitation scenario previously mentioned under point (1). We close this section by giving a short table (Table 1) showing the impressive number of possible interactions between strings as we increase their length N. For arbitrary N we have

$$ n_s = 2^N - 1 \tag{13} $$

strings and

$$ n_R = 2^{2N} - 3 \cdot 2^N + 2 \tag{14} $$

reactions, excluding reactions with the destructor and self-reactions. The number of potential self-replications is, of course,

$$ n_{SR} = n_s \tag{15} $$

  sqrt(N)    1     2      3            4           5           10
  N          1     4      9            16          25          100
  n_s        1     15     511          65535       ~3·10^7     ~10^30
  n_R        0     210    ~2.6·10^5    ~4·10^9     ~10^15      ~10^60

Table 1. Some low-dimensional examples. $\sqrt{N}$: matrix size in one dimension; N: length of strings; $n_s$: number of different strings, excluding the destructor; $n_R$: number of possible reactions, excluding self-reactions.
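The counts of equ. (13) and (14) are easy to check numerically. The short sketch below (function names are illustrative) evaluates both formulas and makes the relation $n_R = n_s (n_s - 1)$ explicit, i.e. every ordered pair of distinct non-destructor strings counts as one possible reaction.

```python
def n_strings(N):
    """Equ. (13): number of different strings, excluding the destructor."""
    return 2 ** N - 1


def n_reactions(N):
    """Equ. (14): number of reactions, excluding self-reactions."""
    return 2 ** (2 * N) - 3 * 2 ** N + 2


for N in (1, 4, 9, 16, 25, 100):
    ns, nr = n_strings(N), n_reactions(N)
    assert nr == ns * (ns - 1)        # ordered pairs of distinct strings
    print(N, ns, nr)
```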
4 The simplest systems, N = 4 and N = 9
With these general remarks we have set the stage to discuss the simplest nontrivial system with strings of length N = 4. A still simpler and in fact trivial system is N = 1. It has two strings, $s^{(1)} = 0$ and $s^{(2)} = 1$, which coincide with their operators. (There is not much to fold in one-component strings!) Using the operations defined by equ. (2), we readily observe that both strings are able to self-replicate³:

$$ 0 \cdot 0 = 0, \qquad 1 \cdot 1 = 1 $$

and that the destructor $s^{(1)}$ replicates using $s^{(2)}$:

$$ 0 \cdot 1 = 0, \qquad 1 \cdot 0 = 0 $$

Since we have to remove the destructor, nothing is left but one string, prompting us to call this system trivial.
4.1 Static Features, N = 4
Let us start the examination of the system N = 4 by naming its strings. In low-dimensional systems, we shall use decimal numbers that correspond to the binary numbers carried by a string as compact descriptions. Thus, e.g. $s = (1, 0, 1, 0)^T$ will be called $s^{(5)}$.

Folding strings into 2 x 2 matrices can take place in various ways. One of these ways will allow us to consider the operations involving scalar products (according to equ. (1)) with the string acting on itself as ordinary matrix multiplications, so we shall call this the canonical folding. The arrangement is

$$ s = \begin{pmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \end{pmatrix} \longrightarrow P_s = \begin{pmatrix} s_1 & s_2 \\ s_3 & s_4 \end{pmatrix} \tag{16} $$

which can be easily generalized to arbitrary size $\sqrt{N}$. A corresponding folding is the transposed version of (16):

$$ P_s^{T} = \begin{pmatrix} s_1 & s_3 \\ s_2 & s_4 \end{pmatrix} \tag{17} $$

The other main folding way is $P_s^{t} = \begin{pmatrix} s_1 & s_2 \\ s_4 & s_3 \end{pmatrix}$. We call it topological folding, since it conserves the order of the sequence in 2d form, and its transpose is $P_s^{tT} = \begin{pmatrix} s_1 & s_4 \\ s_2 & s_3 \end{pmatrix}$. Table 2 gives the resulting operators for all four kinds of folding methods.

³ It is astonishing that this self-replicating system was around unrecognized for centuries!
We shall now give an example of each of the five sorts of reactions listed as equations (6) to (10) in the last section. Using the squashed scalar product and the canonical folding, we find, for example

$$ P_{s^{(1)}} \oplus s^{(6)} \Rightarrow s^{(4)} \tag{18} $$
$$ P_{s^{(3)}} \oplus s^{(1)} \Rightarrow s^{(1)} \tag{19} $$
$$ P_{s^{(1)}} \oplus s^{(11)} \Rightarrow s^{(1)} \tag{20} $$
$$ P_{s^{(4)}} \oplus s^{(4)} \Rightarrow s^{(8)} \tag{21} $$
$$ P_{s^{(8)}} \oplus s^{(8)} \Rightarrow s^{(8)} \tag{22} $$

where the $\Rightarrow$ sign indicates only the string that was newly produced by the interaction (suppressing the conserved reactants). A list of all reactions for the present case is given in Table 3. Similar reaction tables can be derived for other folding methods.

At this point we pause and note again that we are dealing at the moment with a system of binary strings, each of which has only 4 components. This poor material is able to "react" in quite a complicated manner, as a glance at Table 3 tells us. Hence, already for N = 4, we do expect rather complicated dynamical behaviour. The strength of these systems is that they exploit the phenomenon of combinatorial explosion, as can be seen from Table 1. Therefore, by studying N = 4, we shall have gained only a slight impression of what might be possible in larger systems of this sort. An entire reaction universe is opened once we consider larger strings with length, say, of order O(100). In our real world, the smallest virus contains about 3000 base pairs. Their elements are certainly able to interact in much more complicated ways than the binary strings we are considering here. A comparison may give us a small hint of how intricate the fundamental mechanisms of life really may be.
4.2 Dynamical Behaviour, N = 4
Now let us discuss the dynamical behaviour of this system. Global quantities which characterize the time development of our system N = 4 are the concentrations $x_i$ of all different string sorts $s^{(i)}$:

$$ x_i(t) = m_i(t) / M \tag{23} $$

where $m_i$ is the number of actual appearances of string type $s^{(i)}$ in the soup and M, as before, the constant total number of strings. We have

$$ \sum_{i=1}^{n_s} m_i(t) = M \tag{24} $$

or

$$ \sum_{i=1}^{n_s} x_i(t) = 1. \tag{25} $$

Table 2. Results of folding methods 1 to 4 onto strings 0, ..., 15. Only a rearrangement of matrices is observable. (Table body: the 2 x 2 operator matrix obtained for each string under each folding method.)
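The concentrations of equ. (23) to (25) follow directly from counting the string sorts in a soup. A minimal sketch, assuming the soup is given as a list of hashable sort labels (the function name and the sample soup are hypothetical):

```python
from collections import Counter


def concentrations(soup):
    """Map each string sort to x_i = m_i / M, equ. (23)."""
    M = len(soup)
    return {sort: m / M for sort, m in Counter(soup).items()}


x = concentrations([5, 5, 9, 15])   # hypothetical soup of four strings
print(x)                            # {5: 0.5, 9: 0.25, 15: 0.25}
```

By construction the values sum to one, which is the normalization stated in equ. (25).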
Figure 7 shows the first $10^4$ iterations through the algorithm with M = 1000 strings. All concentrations $x_i$, except for that of the destructor, are shown. The system evidently relaxes to a macroscopic attractor state, as can be seen more clearly in Figure 8. Running the algorithm under different initial conditions reveals that the macroscopic behaviour is indeed stable, with some minor fluctuations around equilibrium values of concentrations. If we change selection probabilities by choosing another parameter n, equ. (12), we end up with a different asymptotic distribution of sorts $s^{(i)}$. Using another folding rule also results in a global change of behaviour. Since choosing another folding method amounts to a different enumeration of strings, we can, without loss of generality, stick to one folding method. As far as the overall stability of the system is concerned, we observe in Figure 9 a case where the balance between string sorts is seriously disturbed.
  Operator                           String
              1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
      1       1   0   1   4   5   4   5   0   1   0   1   4   5   4   5
      2       0   1   1   0   0   1   1   4   4   5   5   4   4   5   5
      3       1   1   1   4   5   5   5   4   5   5   5   4   5   5   5
      4       2   0   2   8  10   8  10   0   2   0   2   8  10   8  10
      5       3   0   3  12  15  12  15   0   3   0   3  12  15  12  15
      6       2   1   3   8  10   9  11   4   6   5   7  12  14  13  15
      7       3   1   3  12  15  13  15   4   7   5   7  12  15  13  15
      8       0   2   2   0   0   2   2   8   8  10  10   8   8  10  10
      9       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
     10       0   3   3   0   0   3   3  12  12  15  15  12  12  15  15
     11       1   3   3   4   5   7   7  12  13  15  15  12  13  15  15
     12       2   2   2   8  10  10  10   8  10  10  10   8  10  10  10
     13       3   2   3  12  15  14  15   8  11  10  11  12  15  14  15
     14       2   3   3   8  10  11  11  12  14  15  15  12  14  15  15
     15       3   3   3  12  15  15  15  12  15  15  15  12  15  15  15

Table 3. Reactions using computations according to (2) with 1st kind of folding. Each entry is the number of the string produced when the operator of the row acts on the string of the column. 4 reactions are self-replications, 76 are replications.
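The reaction table for N = 4 can be regenerated with a few lines of Python, which also allows the replication counts to be checked. The sketch assumes that bit 0 of a string number corresponds to $s_1$, that the operator matrix is filled row by row, and that the threshold is 1; the function name `react` is hypothetical.

```python
import math


def react(a, j, N=4, theta=1):
    """Number of the string produced when operator a acts on string j.

    Assumptions: bit 0 of an integer is s_1, and the operator matrix is
    filled row by row, i.e. P[i][c] is bit (i*n + c) of a.
    """
    n = math.isqrt(N)
    out = 0
    for k in range(n):                                  # fragment index
        for i in range(n):                              # matrix row
            total = sum(((a >> (i * n + c)) & 1) * ((j >> (k * n + c)) & 1)
                        for c in range(n))
            if total - theta >= 0:                      # squashing, equ. (3)
                out |= 1 << (k * n + i)
    return out


sorts = range(1, 16)                                    # destructor 0 excluded
table = {a: [react(a, j) for j in sorts] for a in sorts}
self_replications = sum(react(a, a) == a for a in sorts)
replications = sum(react(a, j) in (a, j)
                   for a in sorts for j in sorts if a != j)
print(table[6])                            # row of operator 6
print(self_replications, replications)     # 4 76
```

The same function accepts `N=9`, so the larger system can be explored in the same way, at the cost of 511 x 511 operator/string pairs.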
The figure was generated by altering the algorithm of Section 3 to include destructors. This example clearly demonstrates the need for the corresponding counter-measure taken in the algorithm.
4.3 The system N = 9
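The system N = 9 is much more complicated than N = 4. Due to combinatorics we face $n_s = 511$ strings with a total of $n_s^2 = 261{,}121$ operator/string combinations ($n_R = 260{,}610$ reactions according to equ. (14), which excludes self-reactions). We first generalize the folding methods given for N = 4 to the present case.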
Fig. 7. System N = 4, canonical folding. Number of strings: M = 1000. Number of different sorts: $n_s$ = 15. Shown are the first 10 generations. (Axes: concentration versus time in a.u.)
For the canonical folding we have:

$$ s = (s_1, s_2, \dots, s_9)^T \longrightarrow P_s = \begin{pmatrix} s_1 & s_2 & s_3 \\ s_4 & s_5 & s_6 \\ s_7 & s_8 & s_9 \end{pmatrix} \tag{26} $$

with the transpose given by

$$ s \longrightarrow P_s^{T} = \begin{pmatrix} s_1 & s_4 & s_7 \\ s_2 & s_5 & s_8 \\ s_3 & s_6 & s_9 \end{pmatrix} \tag{27} $$

The topological folding and its transpose read, respectively:

$$ s \longrightarrow P_s^{t} = \begin{pmatrix} s_1 & s_2 & s_3 \\ s_6 & s_5 & s_4 \\ s_7 & s_8 & s_9 \end{pmatrix} \tag{28} $$

$$ s \longrightarrow P_s^{tT} = \begin{pmatrix} s_1 & s_6 & s_7 \\ s_2 & s_5 & s_8 \\ s_3 & s_4 & s_9 \end{pmatrix} \tag{29} $$

Fig. 8. System N = 4, canonical folding. Number of strings: M = 100,000. Number of different sorts: $n_s$ = 15. Shown are the first 100 generations. (Axes: concentration versus time in a.u.)
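The four folding methods generalize to any square string length. The sketch below is one possible reading, assuming that the canonical folding fills the matrix row by row and that the topological folding follows a snake path through the rows; the method names passed as strings are hypothetical labels.

```python
import math


def fold(s, method="canonical"):
    """Fold a string of square length into a matrix (list of rows).

    'canonical' fills the matrix row by row (assumed reading of the text);
    'topological' follows a snake path, so neighbours in the string stay
    neighbours in the matrix; a trailing '_T' returns the transpose.
    """
    n = math.isqrt(len(s))
    if n * n != len(s):
        raise ValueError("string length must be a square number")
    rows = [list(s[r * n:(r + 1) * n]) for r in range(n)]
    if method.startswith("topological"):
        rows = [row[::-1] if r % 2 else row for r, row in enumerate(rows)]
    if method.endswith("_T"):
        rows = [list(col) for col in zip(*rows)]
    return rows


labels = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9"]
for m in ("canonical", "canonical_T", "topological", "topological_T"):
    print(m, fold(labels, m))
```

Passing symbolic labels instead of bits, as in the usage lines, makes it easy to compare the resulting arrangement with the matrices written out in the text.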
A summarizing table of the resulting replication and self-replication reactions is presented in Table 4. Quantitative differences appear between different folding methods. Since it is difficult to visualize all possible reactions, Table 5 shows a randomly selected part of all reactions. The size of this part is the same as that of the total table for N = 4. In dynamical terms, the N = 9 system is much more complicated, too. Figure 10 shows one simulation with the 1st folding method applied. We have selected concentrations of 15 frequent string sorts (at t = 10000). Table 6 gives an overview of the sorts involved. Histograms of all string sort concentrations are generally more explanatory. If one looks at a histogram of string sorts (see Figure 11), one observes great differences among sorts, some represented in large numbers. Most sorts, though, are at the lower end of the histogram resolution or have died out completely. Due to a small ensemble size, fluctuations have driven many sorts into extinction.
  Folding    Self-replications    Replications
     1              14               12028
     2             122               21310
     3              18               11822
     4              94               16830

Table 4. Number of replication and self-replication reactions in N = 9.
  Operator                                       String
            165  166  167  168  169  170  171  172  173  174  175  176  177  178  179
    165     283  287  287  280  281  284  285  283  283  287  287  312  313  316  317
    166     347  351  351  344  344  349  349  347  347  351  351  376  376  381  381
    167     347  351  351  344  345  349  349  347  347  351  351  376  377  381  381
    168     274  278  278  272  274  276  278  274  274  278  278  304  306  308  310
    169     275  278  279  280  283  284  287  282  283  286  287  304  307  308  311
    170     338  343  343  336  338  341  343  338  338  343  343  376  378  381  383
    171     339  343  343  344  347  349  351  346  347  351  351  376  379  381  383
    172     283  287  287  280  282  284  286  283  283  287  287  312  314  316  318
    173     283  287  287  280  283  284  287  283  283  287  287  312  315  316  319
    174     347  351  351  344  346  349  351  347  347  351  351  376  378  381  383
    175     347  351  351  344  347  349  351  347  347  351  351  376  379  381  383
    176     402  406  406  400  400  406  406  402  402  406  406  432  432  438  438
    177     403  406  407  408  409  414  415  410  411  414  415  432  433  438  439
    178     466  471  471  464  464  471  471  466  466  471  471  504  504  511  511
    179     467  471  471  472  473  479  479  474  475  479  479  504  505  511  511

Table 5. Reactions for N = 9 with 1st kind of folding. Selected are reactions between operator/string pairs 165, ..., 179.
Fig. 9. System N = 4, canonical folding. Same initial conditions as in Figure 7. The destructor is not removed regularly. All concentrations except $x_0$ for iterations 1 to 10000. The destructor (not shown) is able to quickly suppress all activity.

  Sort             511     63      7       27      216     54      73      45      146     195
  Concentration    4.4 %   3.9 %   3.4 %   3.3 %   3.1 %   2.8 %   2.3 %   2.2 %   1.8 %   1.6 %

Table 6. Sorts and their concentrations in the simulation of Figure 10. Only the 10 most frequent sorts (out of 15 in the graph) are shown.
We shall demonstrate that even an unsymmetric reaction rule results in qualitatively similar behaviour. Figure 12 shows a simulation where only the operator involved in the operation is conserved and is allowed to fold back into its primary string form. The string it is reacting with is not conserved. In a sense, we could say that it is transformed by the action of the operator into a new string. If we require the operator to degrade after a successful operation on a string, a similar behaviour results. The old and the newly produced strings are kept in the soup in this scenario. Finally, another symmetric treatment, namely destroying both the operator and the reacting string, should be mentioned. Since it is not very selective, we do not consider it further in this chapter.

Fig. 10. System N = 9, canonical folding. Iterations 1 to 10000. 15 frequent string sorts (selected randomly) are recorded. (Axes: concentration versus time in a.u.)

Fig. 11. System N = 9, M = 1000. Histogram of concentrations of all string types 1, ..., 511 after t = 90000. Some clusters are clearly visible. (Panel title: String type concentrations.)

Fig. 12. Unsymmetric system N = 9, canonical folding. Only the operator is conserved. Iterations 1 to 10000 with the same initial conditions as in Figure 10. The same 15 frequent string sorts are recorded. (Axes: concentration versus time in a.u.)
5 Model Equations
We can try to model the proposed string reactions by a system of coupled differential equations similar to those studied by Eigen and Schuster for the hypercycle [16] - [19]. To this end we have to assume that the important aspects of our system may be described by concentrations of the different string sorts, averaged over time. The equations in average concentrations

$$ y_i = \langle x_i \rangle, \qquad 0 \le y_i, x_i \le 1 $$

read:

$$ \dot{y}_i(t) = A(t) + B_i\, y_i^2(t) + \sum_{k \ne i} C_{ik}\, y_i(t)\, y_k(t) - D_i\, y_i(t) + \sum_{j,k \ne i} W_{ijk}\, y_j(t)\, y_k(t) - \frac{y_i(t)}{\sum_k y_k(t)}\, \Phi(t) \tag{30} $$

where $B_i$, $C_{ik}$, $D_i$, $W_{ijk}$ are (coupling) constants, A(t) is an unspecific growth term and $\Phi(t)$ is a flow term used to enact competition between the various string sorts $s^{(i)}$.
  Coupling   Short description        Random value                                        System value
  A          Spontaneous generation   …                                                   a_ij y_i y_j + …
  B_i        Self-replication         1 with p = …, 0 with p = …                          1 if reaction exists, 0 otherwise
  C_ik       Replication              0 if i = k; 1 with p = …, 0 with p = …              0 if i = k; 1 if reaction exists, 0 otherwise
  D_i        Spontaneous decay        …                                                   …
  W_ijk      Reaction                 0 if i = j or i = k; 1 with p = …, 0 with p = …     0 if i = j or i = k; 1 if reaction exists, 0 otherwise

Table 7. Couplings between 15 string sorts (i.e. an N = 4 system) for a simulation of equations (30). $D = \sum_k D_k y_k$. Every entry has to be scaled with a factor 1/M.
Let us discuss in more detail the different contributions to equation (30): The first term, A(t) > 0 is a growth term due to STEP 6 in our algorithm. For this term, we may assume that the probability to generate the destructor does not change over time and is, hence, approximately equal for all sorts of strings. If this is too rough an approximation we may compute A by
A(t) = E aij yi (t)yj (t)
(31)
i,j
where (~ aij =
0
if s (~) | s (j) ~ s (~ otherwise
(32)
reflects reactions producing the destructor. The second term describes self-replications of sort s(i) (see equ. (10)) in STEP 3 and 4 and is either Bi = l / M , if this reaction exists or Bi = 0 otherwise. It is quadratic in concentration yi, since operator and string are required to be of sort i. The third term describes all other replication reactions between strings (see equ. (7) and (8)) in STEP 3 and 4 of the algorithm with Cik = 1/M if replication occurs between i and k and 0 otherwise. It depends on two concentrations, yl and y~. The fourth term is linear in Yi as it models the spontaneous decomposition of strings according to STEP 7.
91 The first of the two remaining terms describes a reaction between operator Psu) and string s (k) leading to string s (0, in which case Wijk = 1/M, and 0 otherwise. Such reactions take place as a consequence of STEP 5 and 8 of the algorithm. The flow term, finally, assures that the sum of all concentrations is constant over the relaxation process. #(t) is defined as ~(t)
$$\Phi(t) = \sum_i y_i(t). \qquad (33)$$
By keeping $\sum_i y_i = \text{const}$, a strong competition between string sorts is caused. The flow term corresponds to STEP 4 in the algorithm. We are now in a position to examine the behaviour of these equations for 15 string sorts with concentrations $x_i(t)$ in two special cases: (a) with random couplings and (b) with couplings derived from the N = 4 system.
Fig. 13. Simulation of the differential equations (30). Random choice of coupling constants according to Table 7. [Plot: concentrations (0 to 0.125) versus time (0 to 2000, arbitrary units).]
Table 7 shows the choice for the various parameters in these two cases. Qualitatively, the behaviour for both cases is quite similar, as can be seen from a comparison of Figures 13 and 14. Figure 13 depicts a typical random parameter run. Most of the concentrations relax to individual levels, and these are independent of the starting concentrations. The system can thus be termed a deterministic and competitive attractor system. As such, it is like an ecosystem that approaches an equilibrium in the distribution of its species.

Fig. 14. Simulation of the differential equations (30). Constants derived from binary string system N = 4. [Plot: concentrations (0 to 0.125) versus time (0 to 2000, arbitrary units).]
Figure 14 demonstrates a run with coefficients from a N = 4 binary string system. It shows the simulation of the differential equations (30) under the same initial conditions as Figure 8. We can clearly observe that some concentrations merge into the same levels, due to the particular interactions present in the dynamics of this binary string system. Again, we can term the system a deterministic and competitive one. The comparison between the statistical data and the numerical integration of (30) shows very good agreement.
If we want to apply the equations to cases involving larger strings, however, we quickly run into problems. For N = 9, $W_{ijk}$ would have $1.3 \times 10^8$ components and a numerical integration of the ODEs would be tedious.
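The growth of the coupling tensor can be checked with a few lines of arithmetic. The sketch below assumes that a system of string length N has 2^N - 1 string sorts (every binary string except the all-zero one, which matches the 15 sorts quoted for N = 4); the function name is illustrative only.

```python
def coupling_components(n_bits: int) -> int:
    """Number of entries of W_ijk when each index runs over all string sorts.

    Assumption: there are 2**n_bits - 1 sorts, i.e. every binary string of
    length n_bits except the all-zero string (15 sorts for N = 4).
    """
    sorts = 2 ** n_bits - 1
    return sorts ** 3


for n in (4, 9):
    print(f"N = {n}: {2 ** n - 1} sorts, {coupling_components(n):.2e} components")
```

For N = 9 this gives 511 sorts and about 1.3 x 10^8 components, in line with the figure quoted above.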
6 Extensions

6.1 Artificial Selection
So far we have considered the system without coupling to the external world. A kind of selection pressure has already been applied, though, by removing strings that are too heavily loaded with "1"s. This could be considered intrinsically necessary for the algorithm to function at all. Cum grano salis, however, it is still an autonomous system evolving according to a self-set agenda. The situation changes completely if one adds selection pressure in specific directions prescribed by a user outside of the system: what was so far a syntactic system, determined by the initial mix of strings, the rules for their transformation and random events influencing the actual history, now suddenly gains semantic information, that is, information which links events in the system to the outside environment and gives rise to a meaningful interpretation.
Fig. 15. An external selection operator attaches itself to a randomly chosen string and evaluates its outside selection criterion. The string is subsequently destroyed if its selection value is below average. Attachment may be controlled by markers that allow for a docking of the selector at a specific site of the string. [Sketch labels: M, Selector.]
The outside user does not only observe what is going on inside the system; (s)he also interacts with the system by occasionally selecting strings for destruction. If (s)he applies a systematic method, (s)he is able to imprint a certain path for the development of the system, and can in fact determine its evolutionary trajectory to a considerable extent. Let us see with an example how this works. Suppose the outside observer is interested in finding strings with a certain pattern of 1s and 0s, e.g. patterns like 1,0,0,0 in a longer string. Then (s)he will sample strings, independently of what is going on in the system, at a frequency (s)he determines. If (s)he finds the above-mentioned pattern on the string, (s)he does not intervene; otherwise (s)he destroys the string. By this outside intervention, a new place is generated for a string that can be filled with any product string within the system. Thus the user destabilizes the autonomous equilibrium in the system and presses it into a direction of his (her) choice. The direction (s)he chooses depends completely on the importance the user attaches to certain features of the string. If (s)he wants to interpret the string sequence in a certain way, and derive some measure of quality from such an interpretation, (s)he might use the system for optimizing this quality. Figure 15 gives a sketch of the situation. Operator E acts as a selector transferring outside selection pressure into the string population.
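The selection procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions: strings are plain Python text strings, the freed place is refilled with a copy of another string in the population (a stand-in for "any product string within the system", since the reaction rules are not reproduced here), and all function names are hypothetical.

```python
import random

PATTERN = "1000"  # the example pattern 1,0,0,0 from the text


def selection_step(population, rng, pattern=PATTERN):
    """Sample one string at random; destroy it if the pattern is absent.

    Assumption: the freed place is refilled with a copy of another randomly
    chosen string, standing in for a product string of the system.
    Returns True if the sampled string was destroyed.
    """
    idx = rng.randrange(len(population))
    if pattern in population[idx]:
        return False  # pattern present: the selector does not intervene
    donors = [s for j, s in enumerate(population) if j != idx]
    population[idx] = rng.choice(donors)
    return True


def fraction_with_pattern(population, pattern=PATTERN):
    """Share of strings that carry the pattern somewhere."""
    return sum(pattern in s for s in population) / len(population)


rng = random.Random(0)
population = ["".join(rng.choice("01") for _ in range(9)) for _ in range(200)]
before = fraction_with_pattern(population)
for _ in range(2000):
    selection_step(population, rng)
after = fraction_with_pattern(population)
print(f"fraction carrying {PATTERN}: {before:.2f} -> {after:.2f}")
```

Because only strings without the pattern are ever removed, the share of pattern-carrying strings cannot decrease in this sketch; how fast it rises in the real system depends on which products the reactions supply.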
6.2 Inoculation
Another important consideration has to do with undersampling the combinatorial space of possible sequences. We shall discuss initialization and evolution subsequently in greater detail. Following Ray, we call the process of initially distributing a certain number of binary strings over the sequence space an inoculation [20]. Inoculation provides initial string sorts which later on generate reaction products by encountering other strings during reactions. The size of an initial population is not necessarily the size of the population during or towards the end of a simulation, as a certain amount of growth in the number of binary strings may certainly be allowed. It may even make sense to provide for a big expansion of the original population size in order to study unconstrained growth processes. Sooner or later, however, every population will reach its size limits, and it is at that point that competition suddenly gains more influence over the system and strongly accelerates certain processes taking place in the reaction vessel. For the sake of this discussion let us assume henceforth that the population size remains constant over the entire dynamical process.

There are basically two kinds of scenarios for seeding by inoculation: (i) full system seeding and (ii) firing disk seeding [21]. The former refers to randomly distributing the population over sequence space by generating completely random binary strings. In smaller sequence spaces like N = 4 or N = 9, this has the effect of distributing strings more or less evenly over all possible string sorts. In larger sequence spaces, however, this amounts to a kind of "shotgun" strategy [22], where some sorts are hit by one or a few individuals and others are not hit at all. Sequence space gets more and more sparsely populated, with larger and larger average distances between single string sort representatives.
In scenario (ii), on the other hand, only a small number of string sorts is initially populated, usually by taking one string type and allowing low-dimensional random variations around this center (hence the name "disk"). The radius⁴ of the disk is commonly chosen so as to allow each of the string types within it to be populated by at least a few instances. An extreme embodiment of this type of seeding would be to allow only one string sort to be present at all. Finally, some hybrid seeding methods exist, like allowing a few firing disks (with small radius) to be distributed randomly in sequence space, or allowing one firing disk with a larger radius that would lead to occupation numbers $M_i$ below 1. Table 8 summarizes these possibilities and shows some typical numerical examples. This may give an idea of how to seed systems up to N = 100 in a comparable way: choosing one firing disk of radius 3, for instance.
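The two basic seeding scenarios can be illustrated with a small sketch. It rests on assumptions not fixed by the text: strings are lists of 0/1 integers, and inside a firing disk the number of flipped bits is drawn uniformly between 0 and the radius. The function names are hypothetical.

```python
import random


def full_seeding(m, n, rng):
    """Full system seeding: m completely random binary strings of length n."""
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]


def firing_disk_seeding(m, center, radius, rng):
    """Firing disk seeding: copies of `center` with at most `radius` bits flipped.

    Assumption: the number of flips is drawn uniformly from 0..radius; the
    text does not say how strings are distributed inside the disk.
    """
    n = len(center)
    population = []
    for _ in range(m):
        s = list(center)
        for pos in rng.sample(range(n), rng.randint(0, radius)):
            s[pos] ^= 1  # flip the chosen bit
        population.append(s)
    return population


def hamming(a, b):
    """Hamming distance between two equally long bit lists."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(42)
shotgun = full_seeding(1000, 36, rng)
print("distinct sorts hit by full seeding:", len({tuple(s) for s in shotgun}))

center = [rng.randint(0, 1) for _ in range(36)]
disk = firing_disk_seeding(1000, center, 2, rng)
print("distinct sorts in the firing disk: ", len({tuple(s) for s in disk}))
print("largest distance from the center:  ", max(hamming(s, center) for s in disk))
```

With N = 36, the full seeding run hits almost as many distinct sorts as there are strings, whereas the firing disk keeps the population on a much smaller set of sorts, which is the contrast the text draws.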
Seeding type   Feature   N = 16      N = 36        N = 64         N = 100        Assumptions
Full seeding   Density   15.3        1.46·10^-5    5.42·10^-14    7.89·10^-25
Firing disk    Radius    16 (full)   4             3              3              M_i > 4
Hybrid I       Number    286         25            4              1              M_i = 5, r = 3
Hybrid II      Radius    16 (full)   6             5              4              M_i > 10^-1

Table 8. Overview of different seeding types for various string lengths. Size of the population used for calculation: M = 10^6. M_i: occupation number of string type i. r: radius as given by Hamming distance.
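The entries of Table 8 follow from simple counting, which the sketch below retraces. It assumes M = 10^6 strings, that a disk of Hamming radius r around a string of length N contains sum_{k<=r} C(N, k) string types, and that the population is spread evenly over those types; the function names are illustrative.

```python
from math import comb

M = 10**6  # population size used for Table 8


def ball(n, r):
    """Number of string types within Hamming distance r of a given string."""
    return sum(comb(n, k) for k in range(r + 1))


def density(n):
    """Full seeding: average occupation number M / 2**n."""
    return M / 2**n


def max_radius(n, min_occupation):
    """Largest radius for which M / ball(n, r) still exceeds min_occupation."""
    r = 0
    while r < n and M / ball(n, r + 1) > min_occupation:
        r += 1
    return r


def disks(n, r=3, occupation=5):
    """Hybrid I: number of radius-r disks that M strings can fill."""
    return M // (occupation * ball(n, r))


for n in (16, 36, 64, 100):
    print(n, f"{density(n):.3g}", max_radius(n, 4), disks(n), max_radius(n, 0.1))
```

Under these assumptions the density for N = 16 comes out as about 15.3, the firing disk radii as 16, 4, 3, 3, the Hybrid I disk counts as 286, 25, 4, 1, and the Hybrid II radii as 16, 6, 5, 4.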
6.3 Introducing a new source of change: Evolving populations by mutation

In general, sequence space will be astronomically large so that it cannot be sampled thoroughly. The solution to this problem is to concentrate the population in a small part of the sequence space. As we have discussed in Section 6.2, we might start with a small firing disk. We also have to consider the change during the subsequent competitive processes. If we continue to substitute destructors and exploitors with completely random strings, chances are that we will never generate two strings of the same sort in one and the same simulation run. An investigation of the sequence space would thus become hopeless.

⁴ In our binary system, radius is measured by Hamming distances.
However, what may not be possible in parallel, i.e. with a large population covering the entire sequence space, may be possible serially, i.e. by introducing the possibility of accidental transformations between string types. This is precisely what mutations could achieve. Thus, for larger systems an alternative to the shotgun random events of steps 6 and 7 in the basic algorithm might be introduced which would allow at least a partial exploration of the neighborhood of populated areas. Otherwise, our system would be closed and presumably consist of those types which are achievable in reactions between all the string types at seeding time and their respective products.⁵ By adding random errors, anything becomes possible - though less predictable - if we wait long enough. The system will show history, and attractor behaviour will only be intermediate (as in [23]). For the introduction of these explorative random events into the algorithm various options exist, one of which we shall discuss here. This is to use a mutation as an occasional change which hits each string with a probability depending on its size. We define q to be the probability that one element of a randomly selected string component changes to another symbol, here "0" to "1" and vice versa. Since each element may be hit, this is a size-dependent change, and the probability that at least one error occurs in a string is
$$Q(1) = Nq \qquad (34)$$
with the provision that $q \ll 1/N$. Evidently, this mutation probability depends linearly on the concentration of string sorts in the reactor. That is to say, a more successful string sort will spawn more variations. In Nature this mutation type occurs in mutations caused by cosmic radiation. A new step may be inserted into the original algorithm allowing for one randomly chosen string to be hit by mutation according to equ. (34). At the same time, generation of random strings in steps 6 and 7 has to be abandoned. Here instead, we shall modify steps 6 and 7 of our basic algorithm and substitute removed strings simply by (mutated) copies of other strings present in the system.

Figure 16 shows an N = 9 system with mutation replacing the random generation of strings in the algorithm. Seeding was done using a firing disk of radius 0 around a self-replicating string sort (the identity operator). In terms of string sort concentrations, the initial condition was $x_{273} = 1.0$, $x_i = 0.0$ for $i \neq 273$. We can clearly observe three phases during the simulation, each characterized by the dominance of one string type. First is the age of $s^{(273)}$, since this was the only string type present after seeding our system. It is clear that without a mutation nothing would happen, because $s^{(273)}$ is a self-replicator. With the "radiation" mutation present, however, one-bit mutated versions of $s^{(273)}$ can appear. Among others, $s^{(275)}$ and $s^{(281)}$ are of this type. Their reactions with $s^{(273)}$ produce each other, constituting a catalytic cycle:

⁵ This may be a fairly large number, and depending on our intentions we may well live with it comfortably.
Fig. 16. Evolution of an N = 9 system which includes mutation. Dominant string types follow each other in succession. See also the discussion in text. [Plot: concentrations (0.000 to 1.000) versus time (0 to 400).]
$P_{s^{(273)}} \otimes s^{(275)} \Rightarrow s^{(275)}$
$P_{s^{(273)}} \otimes s^{(281)} \Rightarrow s^{(281)}$
$P_{s^{(275)}} \otimes s^{(273)} \Rightarrow s^{(281)}$
$P_{s^{(281)}} \otimes s^{(273)} \Rightarrow s^{(275)}$

Other one-bit mutations show similar mechanisms, but are not important in this particular simulation. As the concentrations $x_{275}$, $x_{281}$ grow exponentially, reactions of $s^{(275)}$ and $s^{(281)}$ with themselves become more and more probable. They produce a new self-replicating string sort:

$P_{s^{(275)}} \otimes s^{(275)} \Rightarrow s^{(283)}$
$P_{s^{(281)}} \otimes s^{(281)} \Rightarrow s^{(283)}$
$P_{s^{(283)}} \otimes s^{(283)} \Rightarrow s^{(283)}$

which subsequently is able to take over nearly the entire population, because it can exploit the abundance of $s^{(273)}$ on its own:

$P_{s^{(273)}} \otimes s^{(283)} \Rightarrow s^{(283)}$
$P_{s^{(283)}} \otimes s^{(273)} \Rightarrow s^{(283)}$

This leads to the age of $s^{(283)}$ which, in turn, is brought to an end by the interaction of two one-bit mutations of it, $s^{(347)}$ and $s^{(411)}$, according to a slightly different reaction pattern that involves two intermediate products:

$P_{s^{(283)}} \otimes s^{(347)} \Rightarrow s^{(475)}$
$P_{s^{(347)}} \otimes s^{(283)} \Rightarrow s^{(319)}$
$P_{s^{(283)}} \otimes s^{(411)} \Rightarrow s^{(475)}$
$P_{s^{(411)}} \otimes s^{(283)} \Rightarrow s^{(319)}$

As the intermediates $s^{(319)}$ and $s^{(475)}$ start to react with the dominant string $s^{(283)}$, they catalyze each other:

$P_{s^{(319)}} \otimes s^{(283)} \Rightarrow s^{(475)}$
$P_{s^{(475)}} \otimes s^{(283)} \Rightarrow s^{(319)}$
$P_{s^{(283)}} \otimes s^{(319)} \Rightarrow s^{(319)}$
$P_{s^{(283)}} \otimes s^{(475)} \Rightarrow s^{(475)}$

Finally, self-reactions lead to the production of a new self-replicator

$P_{s^{(475)}} \otimes s^{(475)} \Rightarrow s^{(511)}$
$P_{s^{(511)}} \otimes s^{(511)} \Rightarrow s^{(511)}$

which is able to exploit the dominant string sort $s^{(283)}$ for its own replication purposes

$P_{s^{(283)}} \otimes s^{(511)} \Rightarrow s^{(511)}$
$P_{s^{(511)}} \otimes s^{(283)} \Rightarrow s^{(511)}$

and to dominate itself. We can see a real evolution taking place in this system. Ages of dominance are followed by transition periods which are themselves followed by new ages of dominance. These time-dependent phenomena are evident results of a sequential exploration of sequence space that became possible with the introduction of string mutations.
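The mutation step of this section can be made concrete with a short sketch. It assumes that each element flips independently with probability q, so that the exact probability of at least one error is 1 - (1 - q)^N, of which equation (34) is the small-q approximation. It further assumes that the index of a string sort is the integer value of its bit string (273 = 100010001); under that assumption the one-bit mutants named in the text are indeed one bit flip away from their parents. All function names are hypothetical.

```python
import random


def mutate(bits, q, rng):
    """Flip each element independently with probability q ("0" <-> "1")."""
    return [b ^ 1 if rng.random() < q else b for b in bits]


def p_at_least_one_error(n, q):
    """Exact probability of at least one flip in a string of length n."""
    return 1.0 - (1.0 - q) ** n


def one_bit_neighbours(index, n):
    """Sort indices reachable by one bit flip (assumes index = integer value)."""
    return sorted(index ^ (1 << k) for k in range(n))


N, q = 9, 0.001
exact = p_at_least_one_error(N, q)
approx = N * q  # equation (34), a good approximation while q << 1/N
print(f"exact {exact:.6f}  vs  N*q {approx:.6f}")

print("275 and 281 next to 273:", {275, 281} <= set(one_bit_neighbours(273, N)))
print("347 and 411 next to 283:", {347, 411} <= set(one_bit_neighbours(283, N)))

rng = random.Random(3)
identity = [1, 0, 0, 0, 1, 0, 0, 0, 1]  # bit pattern of 273
print("one mutated copy:", mutate(identity, 0.1, rng))
```

The comparison of the exact and the linear expression shows why the proviso on q matters: the two agree closely for q = 0.001 but drift apart as Nq approaches 1.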
7 In Perspective
Many aspects of what we have discussed here are also treated elsewhere [24]-[28]. The model system seems to be one of the simpler systems for self-organization that has been proposed to date. Whether it remains a toy system or can in itself give rise to questions interesting to biologists is not clear at this point. Walter Fontana has considered similar systems in the framework of λ-calculus [29], which he named algorithmic chemistry [30]. There are many variants of this system and various routes to generalize the adopted strategy. Here we want to set the model in perspective and indicate a few routes along which the model might develop in the future. The presently well-stirred reaction vessel may be substituted by a system with local interactions. Spatial organisation of reactions and reaction products will be the result. The folding method can be made dependent on the information content of strings. This situation is realized in nature, where folding depends on the actual sequence of nucleic or amino acids. Finally, completely different objects might be evolved, like bit strings representing program code in arbitrary programming languages [31] or continuous-valued strings.
Fig. 17. General view of a reaction and a computation. [Sketch labels: Molecule 1 and Molecule 2, Collision, Interaction, Product; Input, Computation, Result.]
In summary, the proposed model can be considered to be in a special class of algorithms, derived from a chemical metaphor. In this metaphor, one considers computation, or the transformation of input data into output data in a computing machine, as a sort of reaction, cf. Figure 17. In a chemical context, reactions are events caused by collisions between molecules that come about due to thermal motion. In the computational context, colliding molecules are the analog of data input to a computation. The reaction itself, following the rules of attraction and repulsion between parts of the participating molecules, is the analog of a computation which, through many intermediate stages, finally results in an output, the product of the reaction⁶. Enzyme-like catalysts might be helpful as computation probability amplifiers, which are a means to control the computation.

The advantage of this analogy is that it lends itself to a natural parallelization. One molecular reaction does not determine the fate of a substance. But millions or billions do. The principle of mass action in chemistry, which determines the stationary state in reaction vessels, gains some prominence in this kind of computation: the more results of a certain type are produced, the more they dominate the entire data space available for computation. Probing this data space with a device that picks up results at random leads (with high probability) to the result dominating the data space. In other words, probing will find results that are stable, i.e. that are produced in sufficient numbers to sustain themselves.

The chemical metaphor is interesting in another aspect: it is a clear embodiment of a competitive organization principle for parallel computation [33]. Any kind of data space is constrained by some limits, hence only a finite number of results can be held. Since results are themselves members of the colliding and reacting data ensemble, they will survive only if they have sufficient production rates. With the rapid progress in technology, the moment may soon approach when self-organizing algorithms like the one introduced here become a mainstay in computing and begin to influence the architecture of new massively parallel machines.
8 Acknowledgement
Part of this work has been written during my stay at Mitsubishi Electric Research Laboratories (MERL) in Cambridge, MA. I am very grateful to the president of MERL, Dr. Tohei Nitta, and to the director of the Lab., Dr. Les Belady, for their continuous support and encouragement.
⁶ Curiously enough, simple computation along these lines has been considered in a different context (cellular automata) as the billiard ball computer [32].

References
1. M.W. Gray and R. Cedergren, The new age of RNA, FASEB Journal 7 (1993) 4
2. B. Lewin, Genes III, John Wiley, New York, 1987
3. S. Altman, L. Kirsebom, S. Talbot, Recent studies in ribonuclease P, FASEB Journal 7 (1993) 7
4. C. Guerrier-Takada, K. Gardiner, T. Marsh, N. Pace, S. Altman, The RNA moiety of Ribonuclease P is the catalytic subunit of the enzyme, Cell 35 (1983) 849
5. K. Kruger, P.J. Grabowski, A.J. Zaug, J. Sands, D.E. Gottschling, T.R. Cech, Self-splicing RNA: Autoexcision and autocyclization of the ribosomal RNA intervening sequence of Tetrahymena, Cell 31 (1982) 147
6. The Wall Street Journal, February 25, 1993, p. B5
7. J.M. Burke and A. Berzal-Herranz, In vitro selection and evolution of RNA: application for catalytic RNA, molecular recognition and drug discovery, FASEB Journal 7 (1993) 106
8. A.A. Beaudry, G.F. Joyce, Directed evolution of an RNA enzyme, Science 257 (1992) 635
9. M. Eigen, Steps toward Life: a perspective on evolution, Oxford University Press, 1992
10. M. Eigen, R. Winkler-Oswatitsch, Transfer RNA, an early adaptor?, Naturwissenschaften 68 (1981) 217
11. L. Orgel, F. Crick, Anticipating an RNA world. Some past speculations on the origin of life: where are we today?, FASEB Journal 7 (1993) 238
12. M. Waldrop, Catalytic RNA wins chemistry Nobel prize, Science 246 (Oct. 1989) 325
13. S. Kauffman, The Origins of Order, Oxford University Press, 1992, p. 295
14. J.H. Holland, Adaptation in natural and artificial systems, University of Michigan Press, Ann Arbor, 1975
15. I. Rechenberg, Evolutionsstrategien, Frommann-Holzboog, Stuttgart, 1973
16. M. Eigen, Self-organisation of matter and the evolution of biological molecules, Naturwissenschaften 58 (1971) 465 - 523
17. M. Eigen, P. Schuster, The Hypercycle - A principle of natural selforganization, Part A, Naturwissenschaften 64 (1977) 541 - 565
18. M. Eigen, P. Schuster, The Hypercycle - A principle of natural selforganization, Part B, Naturwissenschaften 65 (1978) 7 - 41
19. M. Eigen, P. Schuster, The Hypercycle - A principle of natural selforganization, Part C, Naturwissenschaften 65 (1978) 341 - 369
20. T. Ray, An Approach to the Synthesis of Life, in: C.G. Langton, C. Taylor, J.D. Farmer, S. Rasmussen (Eds.), Artificial Life II, Santa Fe Institute Studies on the Sciences of Complexity, Proc. Vol. X, 1991, Addison-Wesley, Reading, MA
21. J.D. Farmer, S. Kauffman, N. Packard, Autocatalytic Replication of Polymers, Physica 22D (1986) 50 - 67
22. G.F. Joyce, Directed Molecular Evolution, Scientific American 267 6 (1992) 90 - 97
23. W. Ebeling, I. Sonntag, A stochastic description of evolutionary processes in underoccupied systems, BioSystems 19 (1986) 91 - 100
24. W. Banzhaf, Self-replicating sequences of binary numbers, Computers and Mathematics 26 (1993) 1 - 8
25. W. Banzhaf, Self-replicating sequences of binary numbers - Foundations I: General, Biological Cybernetics 69 (1993) 269 - 274
26. W. Banzhaf, Self-replicating sequences of binary numbers - Foundations II: Strings of Length N = 4, Biological Cybernetics 69 (1993) 275 - 281
27. W. Banzhaf, Self-replicating sequences of binary numbers - The build-up of complexity, Complex Systems 7 (1994) 1 - 11
28. W. Banzhaf, Artificial Selection in a System of self-replicating strings, Proc. 1st IEEE Conference on Evolutionary Computation (World Congress of Computational Intelligence (WCCI-94)), Orlando, FL, USA, 1994, IEEE Press, Vol. II, 651 - 655
29. G.J. Chaitin, Algorithmic Information Theory, Cambridge University Press, 1987
30. W. Fontana, Algorithmic Chemistry, in: Artificial Life II, C.G. Langton, C. Taylor, J.D. Farmer, S. Rasmussen (eds.), Santa Fe Institute Studies on the Sciences of Complexity, Proc. Vol. X, Addison-Wesley, Reading, MA, 1991, 159 - 209
31. W. Banzhaf, Genotype - Phenotype Mapping and Neutral Variation, A case study in Genetic Programming, Proc. PPSN III, Jerusalem, Y. Davidor, H.P. Schwefel, R. Männer (Eds.), Springer, Berlin, 1994, 322 - 332
32. N. Margolus, Physics-like models of computation, Physica D 10 (1984) 81
33. W. Banzhaf, Competition as an organizational principle for massively parallel computers?, Proc. of the Workshop on Physics and Computation, Dallas, Texas, 1992, IEEE Computer Society Press, Los Alamitos, 1993, 229
Modeling the Connection Between Development and Evolution: Preliminary Report

Eric Mjolsness*, Charles D. Garrett†, John Reinitz‡, and David H. Sharp§

* Department of Computer Science, Yale University, New Haven CT 06520, and Department of Computer Science and Engineering, University of California, San Diego (present address), La Jolla, CA 92093-0114
† NEC Research Institute, 4 Independence Way, Princeton, NJ 08540
‡ Center for Medical Informatics, Yale University, New Haven CT 06510, and Brookdale Center for Molecular Biology, Mt. Sinai School of Medicine (present address), 1 Gustave L. Levy Place, New York NY 10029
§ Theoretical Division, Los Alamos National Laboratory, Los Alamos NM 87545

Abstract. In this paper we outline a model which incorporates developmental processes into an evolutionary framework. The model consists of three sectors describing development, genetics, and the selective environment. The formulation of models governing each sector uses dynamical grammars to describe processes in which state variables evolve in a quantitative fashion, and the number and type of participating biological entities can change. This program has previously been elaborated for development. Its extension to the other sectors of the model is discussed here and forms the basis for further approximations. A specific implementation of these ideas is described for an idealized model of the evolution of a multicellular organism. While this model does not describe an actual biological system, it illustrates the interplay of development and evolution. Preliminary results of numerical simulations of this idealized model are presented.
1 Introduction
The scientific questions motivating the present work concern the many instances in which development and evolution interact. For example, it has been hypothesized [1] that the period of experimentation with alternative body plans (the "Cambrian explosion") just after the evolution of multicellular organisms ended because of an interaction between developmental and selective constraints. Since basic animal body plans have remained fixed since that time, a theoretical understanding of this problem is of great interest. A model of the evolutionary dynamics of body plans requires development as an essential component. More generally, the expected diversity of morphologies in a given environment is
an open scientific question which leads to the study of the interaction between development and evolution. Another fundamental question about evolution concerns the biological units on which selection acts. In the Modern Synthesis, multicellular organisms are taken to be the units of selection, but in the early history of life cells and even smaller units played this role. It is possible that the ancient transition from cellular to multicellular units of selection must be modeled in some detail in order to understand aspects of multicellular organization such as the sequestration of the germ line and common patterns of gastrulation in animals. Since the cellular and multicellular levels are related through development, a model of the transition between units of selection will necessarily require a model which integrates developmental and evolutionary processes. It is the purpose of this paper to formulate a model describing the interaction between development and evolution, and to present the results of preliminary computer simulations. These simulations illustrate schematically the transition from reproducing cells to reproducing multicellular organisms. The model proposed here is an extension of our previous work on development [10, 12]. Developmental and evolutionary processes occur at different levels of organization such as genomes, cells, cell clusters, organisms, populations of organisms and ecosystems. Important interactions occur within and between these levels of organization. This presents a multi-scale modeling problem of daunting complexity, but we believe that a tractable approach can be based on the fact that the problem decomposes naturally into modules, or sectors, which can be developed and modified more or less independently. In particular, the relevant processes can be organized conceptually into model sectors describing development, the selective environment, and genetic adaptation through heritable variation. 
Existing models of evolution treat the map from genotype to phenotype in a rudimentary way. A more complete modeling framework for development was introduced in [10]. This model was based on a set of ordinary differential equations (ODEs) to represent a genetic regulatory circuit, together with a set of rules (a "grammar") which allow a description of changes in the number and type of biological entities that are present at a given time. We will adopt this model to describe the developmental sector of the enlarged model of evolution discussed here, and we will also borrow its techniques in formulating models for the other sectors.

A key issue in formulating the sector of the model pertaining to the selective environment is to define a scoring function which reasonably approximates a "fitness function". We are aware that the precise character of such a fitness function, in fact even its existence, remains an open scientific question in spite of much effort. We view a fitness function as an approximation to a more complete model of a selective environment, which would use a stochastic grammar to describe important interactions with the environment such as establishing a territory, finding a mate and getting killed. The formulation of such a model would use dynamical grammars in a way similar to their use in the developmental sector, but is not attempted here. The work in the present paper is based on
a simple scoring function which accounts for several aspects of the selective environment for multicellular organisms (see Section 2.3).

We next discuss the genetic sector. Important genetic events to be modeled include point mutations, cross-over, gene duplication and fusion of gametes. An accurate description of the impact of these events on a genetic regulatory circuit is another open scientific problem. As a substitute for this genetic sector, we use a crude model of adaptation based on simulated annealing. It differs from real genetics in that the only genetic events are "point mutations" acting directly on the regulatory circuitry, which are selected using an optimization algorithm [5, 6, 7]. In this paper we restrict attention to asexually reproducing haploid organisms.

This paper also reports results of preliminary numerical experiments with the model. In these experiments, we imagine a hypothetical world as follows. As an initial state, we have a collection of cells which can reproduce to form multicellular aggregates. These aggregates, however, do not have a genotype capable of multicellular reproduction. The cells' genotype is allowed to adapt under a fitness function which rewards multicellular organisms of a certain size and internal structure, and which reproduce as multicellular entities. In this world, development is specified by a simple genetic circuit model. We exhibit simulations of such a system, and examine the various outcomes as to their capacity for multicellular reproduction and the behavior of their associated genetic circuits.

Experience with the developmental sector of the model as applied to the Drosophila blastoderm has taught that the modeling approach described here can be successfully applied to answer real biological questions, but only when great care is exercised in incorporating the necessary biological detail. The same will certainly be true of the evolutionary extension of the model outlined here.
The computer experiments described serve to illustrate the interplay of development and evolution in the model, but do not yet constitute a biologically realistic model of an evolutionary process.
2 Formulation of the Model

2.1 Development
We have described our modeling framework for development elsewhere [10]. We recapitulate it here only to the extent necessary to understand its application in the present paper. As used here, the model describes two fundamental processes: (a) the continuous dynamics of regulatory molecules and (b) the change in number and type of biological entities. Concentrations of regulatory molecules change in response to existing concentrations of regulators, exchange of regulatory molecules between nuclei (by diffusion), and decay. These effects are described in the model by a system of coupled nonlinear differential equations. The continuous internal dynamics of gene products are given by a "genetic regulatory circuit" in which a pair of genes a and b interact by means of a single real number $T^{ab}$. There may be many
proteins b regulating the gene for protein a and thus influencing the dynamics of its concentration $v_i^a$ in object i. To make a tractable model, we will assume that these effects are monotonic in the concentrations $v_i^b$ and are approximately additive, with nonlinearity confined to sigmoidal threshold functions $g_a$. So we assume that the dynamics of $v_i^a$ depend on the other variables $v_i^b$ through a summed input $u_i^a$:

$$u_i^a = \sum_b T^{ab} v_i^b + h^a, \qquad (1)$$

where $h^a$ determines the threshold of $g_a$. During the time when object i does not participate in birth or death processes, which for dividing cells is called interphase, we use a continuous time model of its internal dynamics. A simple model is the neural net dynamics given by (cf. [4])

$$\tau_a \frac{dv_i^a}{dt} = g_a(u_i^a) - \lambda_a v_i^a. \qquad (2)$$
Equations (1) and (2) comprise our connectionist model of the internal dynamics of interphase, one of many processes involved in development. Other continuous time and purely internal dynamical subsystems will be modeled in like manner, with just the connection matrix T and the thresholds $h^a$ changed. Each such process can be represented by one rule in a dynamical "grammar" which can model several biological processes.

We next consider discrete events. Birth processes such as cell division (mitosis) require a discrete time update equation with its own connection matrix. Since the different daughter cells of one parent cell are not necessarily equivalent, we use one such connection matrix $T_k$ (with components $T_k^{ab}$) for each of the progeny. We use multiple index notation: if i is the index of the parent cell, then (i, k) is the index of its k'th daughter cell. (And ((i, k), l) would index the second generation descended from i.) We then suppose that
a v(,,~)
~
~ + nogo(~7~%~
^ V ,a, + h ~) - ~,o
(3)
b
When the only regulatory molecules are gene products, Equation (3) can be further simplified. Because there is no synthesis of gene products during mitosis, the only dynamical process that modifies v is unequal partitioning of gene products, and Equation (3) becomes

$$v_{(i,k)} = U_k v_i, \qquad (4)$$

where each $U_k$ is diagonal and $U_k > 0$. Other discrete-time processes, which just change the type and state of one object, can be modeled as one-child analogs of Equations (3) or (4) by suppressing the k index. For a system which includes both interphase and cell division, Equations (2) and (3) must be combined so that (2) operates continuously except at certain discrete times when (3) is invoked. The same is true of any combination of continuous and discrete time processes.
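The combination of continuous and discrete updates can be sketched as follows. The sketch assumes the partitioning form of Equation (4), stores each diagonal matrix $U_k$ as the list of its diagonal entries, and collapses mitosis into a single instantaneous event; the names `cleave` and `grow`, the toy interphase step and the division trigger are hypothetical.

```python
def cleave(v_parent, U_daughters):
    """Equation (4): daughter k receives U_k * v_parent, with U_k diagonal and positive.

    Each U_k is given as the list of its diagonal entries.
    """
    daughters = []
    for U_k in U_daughters:
        if any(u <= 0 for u in U_k):
            raise ValueError("diagonal entries of U_k must be positive")
        daughters.append([u * x for u, x in zip(U_k, v_parent)])
    return daughters

def grow(v0, interphase_step, should_divide, U_daughters, n_steps):
    """Alternate continuous interphase steps with discrete cleavage events."""
    cells = [list(v0)]
    for _ in range(n_steps):
        next_cells = []
        for v in cells:
            if should_divide(v):
                # discrete event: the parent is replaced by its daughters
                next_cells.extend(cleave(v, U_daughters))
            else:
                # continuous dynamics, advanced by one time step
                next_cells.append(interphase_step(v))
        cells = next_cells
    return cells

# Toy example: component 0 rises steadily and triggers division at 1.0.
cells = grow(
    v0=[0.0, 1.0],
    interphase_step=lambda v: [v[0] + 0.25, v[1]],
    should_divide=lambda v: v[0] >= 1.0,
    U_daughters=[[0.5, 1.0], [0.25, 2.0]],
    n_steps=5,
)
```

Because the two daughters receive different shares of each gene product, they start their own interphase from different states, which is what allows lineages to diverge.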
We see that every developmental process that we model is described by a set of rules which govern how a single object of a given type can be replaced by one or more objects of the same or different type, together with an internal dynamics model such as that defined by Equations (2) and (3). This set of rules can be thought of as a developmental "grammar" F, in the sense of Lindenmayer and others [8, 2, 11]. The grammatical rule adopted by an object i at a given time t will in general be a function of its state vector $v_i(t)$.

In summary, this approach comprises a framework in which many biological processes, continuous and discrete in time, can be modeled in a unified way. One can take advantage of this fact to model different processes at different levels of detail. This may be of considerable utility for multiscale modeling in biology.

As an example, we discuss a five-rule grammar for a collection of cells that have the capability of dividing and of forming reproducing multicellular organisms. These rules will be used in the experiments reported in Section 3. We assume that the internal state of each cell is specified by the concentrations of a small set of proteins that are products of regulatory genes, and in particular that these gene products control cell division. The cells do not exchange material with one another, but they do pass material to their progeny. Although mitosis itself is a complex dynamic process, we model only those aspects relevant to the problem at hand: Mitosis lasts a finite time, during which genes are not expressed, and at the end of that time there are two cells instead of one. In order to describe a multicellular organism, there must be a rule that represents the idea of the temporal boundaries of such an organism.
Lastly, we must take into account the fact that not all zygotes survive to find an available territory (or more generally a "niche" or "slot") in the environment (see Section 2.3). This picture can be summarized in the following five rules.

1. cell → cell, or interphase, in which the internal state variables of the cell evolve continuously;
2. cell → mitosing cell, which initiates cell division. During mitosis the genetic circuitry is shut down for a period which represents the time required for a cell's chromosomes to condense and separate on the mitotic spindle;
3. mitosing cell → two cells, or cleavage, which concludes mitosis;
4. cell → (detached) cell, or dispersal, in which a zygote separates from its parent organism;
5. (detached) cell → cell, in which the zygote successfully starts a new organism.

2.2 Genetics
Phylogeny is a summary of the evolutionary history of an organism. Darwinian evolution consists of selection, discussed in the next section, and heritable variation, discussed here. A phylogeny can be pictured (Figure 1) as a comprehensive lineage tree, showing the ancestry of every cell, which has three or four scales of organization: (1) individual cells and their descendants, (2) some number of unit subtrees, each of which recurs approximately across generations and constitutes the ontogeny
of a phase in the life cycle of an organism, (3) instances of the full life cycle of an organism, and (4) phylogeny: the slow emergence over evolutionary time of a repeating life cycle and its constituent subtrees.

Actually, the tree should be generalized to a directed acyclic graph (DAG) because some cells receive information from more than one "parent" during recombination, the fusion of gametes, or developmental induction. For the present we will ignore these important processes, and the genetic effects of diploidy, and we will also indirectly but severely limit the possibilities for multiple-phase life cycles (thereby conflating scales (2) and (3) above) by modeling only very limited environments. Thus we will model comprehensive lineage trees with cellular, individual, and phylogenetic levels of organization.

If we examine the genotype of each cell in such a comprehensive lineage tree, we will see that cells and their progeny often differ owing to genetic events which introduce heritable variation. These include point mutations, gene duplications, and crossover. (Technically, crossover and sexual reproduction change the tree into a DAG.) Such processes may be modeled as additional grammatical rules which modify a cell's genotype. By contrast, the developmental grammar rules we have considered previously act on the protein concentration state vector.

Fig. 1. A comprehensive lineage tree, showing organization at different scales. Cells (nodes) are grouped into two recurring subtrees (heavy and light lines). One heavy and zero, one, or two light subtrees make up one organism. Organisms repeat in a large-scale lineage.
Thus one has a combined grammar, which has both developmental and genetic rules. We note that selective processes are not represented directly in the comprehensive lineage tree, but their result (mortality of an organism) is reflected in changes in the shape of the tree. This occurs since fewer progeny are included in the tree.

Full implementation of these ideas requires information that we do not presently have. Specifically, the action of genetic grammar rules (such as those describing point mutation and crossover) on genetic circuits requires a biologically realistic model of the relationship between genetic information on a one-dimensional string of DNA and the connection matrix of a genetic regulatory circuit. As an interim measure we take advantage of a similarity between simulated annealing and the comprehensive lineage tree with genetic events to formulate a rough approximation to the genetic dynamics.

Simulated annealing is a method for finding the global minimum of a "bumpy" function, that is, one with many local minima. The method is derived from statistical mechanics [9] where it models the slow cooling (annealing) of a physical system to its lowest energy state; later it was generalized by Kirkpatrick [5]. In the following, the function to be minimized (the "cost function") is $E = f(x_1, \ldots, x_i, \ldots, x_n)$, and T is a parameter (the "temperature") that starts off large and slowly gets smaller. The basic method is quite simple. Start with a random set of $x_i$, and then perform the following procedure:

1. Compute $E = E_{old}$ from the variables $x_i$.
2. Make a change in one (or more) of the $x_i$. (This is referred to as a "move".)
3. Compute $E = E_{new}$ from the newly generated set of $x_i$.
4. Compute $\exp(-(E_{new} - E_{old})/T)$.
5. If the above quantity is bigger than a random number between zero and one, keep the new $x_i$'s ("accept the move"). Otherwise, restore the old $x_i$'s ("reject the move").
6. Repeat while allowing T to decrease slowly from a large value to zero. Typically this entails up to $10^7$ iterations.
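The six steps above can be sketched in a few lines of Python. This is a minimal illustration of the basic method only, not the adaptive schedule of [6, 7]: the geometric cooling schedule (which stops at a small positive temperature rather than zero), the uniform move distribution, the default parameter values and the test function `bumpy` are all assumptions made for the example.

```python
import math
import random

def simulated_annealing(cost, x0, t_start=1.0, t_end=1e-3, n_iter=20000,
                        move_size=0.5, seed=0):
    """Minimise cost(x) with the basic accept/reject loop (steps 1 to 6)."""
    rng = random.Random(seed)
    x = list(x0)
    e_old = cost(x)                                   # step 1
    best_x, best_e = list(x), e_old
    for k in range(n_iter):
        # Step 6: assumed geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(n_iter - 1, 1))
        i = rng.randrange(len(x))
        old_xi = x[i]
        x[i] += rng.uniform(-move_size, move_size)    # step 2: the "move"
        e_new = cost(x)                               # step 3
        delta = e_new - e_old
        # Steps 4-5: downhill moves always pass the test (the exponential is
        # at least one), so it is only evaluated for uphill moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e_old = e_new                             # accept the move
            if e_new < best_e:
                best_x, best_e = list(x), e_new
        else:
            x[i] = old_xi                             # reject the move
    return best_x, best_e

def bumpy(x):
    """Hypothetical one-dimensional cost function with several local minima."""
    return (x[0] - 1.0) ** 2 + 0.3 * math.sin(8.0 * x[0])

x_best, e_best = simulated_annealing(bumpy, [4.0])
```

At high temperature most uphill moves are accepted, so the search can leave local minima; as T falls, acceptance becomes increasingly selective.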
Our simulated annealing algorithm [6, 7] consists of point mutations on connection matrix elements and rule strength coefficients, selection using a scoring function described in the next section, and gradual reduction of both the temperature and move size to make an optimization algorithm. Point mutation on connection matrix elements is directly analogous to point mutation on genetic information, and stochastic update using a scoring function is directly analogous to stochastic selection using a fitness function. However, the regulation of temperature and the reduction in move size over the course of an annealing run do not have a direct analogy in the comprehensive lineage tree picture. They represent, in essence, a stochastic process approximating such a picture. This approximation achieves computational efficiency in that the effects of macromutations are explored under low selective pressure. As the selective pressure increases with the decline in temperature, mutations with smaller and smaller effects are explored. These points of analogy are exploited in the algorithm used in the computer experiments reported in Section 3.
2.3 Selective Environment
As in development, a dynamical model of a selective environment must include diverse processes, many of which change the number of organisms that are present. We have already mentioned mating and various forms of mortality as examples of such processes. For this reason, the use of a grammar in which a rule corresponds to a process again appears to be a useful modeling tool. The grammars used in modeling the selective environment will differ from the developmental grammars in that they are stochastic grammars, reflecting the fact that most environmental processes have an important stochastic component, and that such processes are often too intricate to model deterministically. As in developmental grammars, however, we expect that there will be a quantitative dynamics associated with each process. The grammar rules to be implemented in a specific problem must reflect a careful analysis of the features in the environment which are of central importance. Given a scientifically correct identification of the rules to be modeled, there is still the question of whether the full probabilistic description can be simplified. The exact probabilistic formulation amounts to a master equation for the probability distribution of numbers of organisms of each species and phenotype. We consider coarse-grained phenotypes, so that the average occupancy for each phenotype can be high in a large population. Then we can apply the van Kampen-Kramers-Moyal [3] "large system" approximation to the master equation to obtain simplified stochastic dynamics in the form of Fokker-Planck equations. These equations have as their solutions Gaussian probability distributions, with time dependent mean and covariance which approach well defined asymptotic limits. 
We suggest that the limiting Gaussian may provide the defining distribution in which to evaluate the expected number of offspring, or closely related quantities such as reproductive value, and hence in which to calculate the fitness function for a given species. Indeed, fitness so calculated may have the same maximum as the limiting Gaussian. It is in this sense that we think of a fitness function as approximating the stochastic grammar which models a selective environment.

This point of view raises several interesting questions. First, how well is the science of this problem captured by the Fokker-Planck approximation? If this approximation is not good enough, what are the prospects either for better analytic approximations (e.g. higher moments) or for direct numerical solution of the master equation? Second, in a compound grammar modeling several interacting species, how are the separate fitness functions related to the asymptotic Gaussian solution of the entire system? An answer to this question could shed interesting light on the game theory approach to evolutionarily stable strategies [13], because it leads naturally to an analysis of the asymptotic stable states of a system with competing species. As a third question, we ask how to treat fine-grained phenotypes with average occupancy numbers $\ll 1$.

We believe that fitness functions may provide useful approximations to the dynamics of the selective environment in some circumstances. We shall assume that this is the case for the model studied in Section 3 of this paper, and go on to describe a further approximation of the fitness function by a scoring function. The main idea of this further approximation is to represent several selective effects as additive terms in a scoring function. This scoring function is ad hoc in that its separate terms are guesses as to approximate analytical forms for various contributions to the true fitness function; they have in no sense been derived or computed.

The environmental scoring function rewards multicellular organisms which develop an optimal number of cells, correctly distributed into three cell types, achieved through a supportable rate of proliferation. This much results in ontogenies. Life cycles result if the scoring function is actually a sum of such functions, taken over a set of available environmental slots each of which may be occupied by at most one individual. The occupation of such slots is governed by further environmental parameters, and requires that some cells of established organisms disperse to another slot. These processes can be represented analytically with a scoring function of the form
$$E = \sum_{\text{slots}} \sum_{\text{time } t} \Big\{ (n - n_{\text{target}})^2 + \sum_{\text{cell types } \alpha} \Big( \frac{n_\alpha}{n} - f_{\alpha\,\text{target}} \Big)^2 + \frac{\Delta n_{\text{mitosis}}}{n^{\gamma}} \Big\} \quad (\gamma = 2/3 \text{ or } 1). \qquad (5)$$

In this formula, n is the number of cells present in a given environmental slot at a given time, $n_\alpha$ is the number of cells which have differentiated to cell type $\alpha$, and $\Delta n_{\text{mitosis}}$ is the number of cells produced by mitosis at a given time step. The constants appearing in this formula are $n_{\text{target}}$, the desired number of cells in an individual organism, $f_{\alpha\,\text{target}}$, the desired fraction of cells differentiated to type $\alpha$, and the exponent $\gamma$ which relates the energetic cost (or cost in some other limited resource) of each mitosis to the energy or resource available within the organism. The constants $n_{\text{target}}$ and $\{f_{\alpha\,\text{target}} \mid \sum_\alpha f_{\alpha\,\text{target}} = 1\}$ can be specified arbitrarily, subject to practical constraints on the cost of the simulations.

The first two terms in the scoring function are simply quadratic penalty functions which favor organisms which have parameter values close to those desired. The last term represents a sharing of energy or some other limited resource among all the cells in an organism, to support mitosis by a few. The form of this term is the fraction of the organism's limited resource expended in mitosis. The denominator is the total resource available, and is a function of the size of the organism, n. This function could express a proportionality to the volume ($\gamma = 1$) or to the surface area ($\gamma = 2/3$), for example. The mitosis term thus incorporates the essential assumption that the cells in a slot are actually cooperating and
functioning as a multicellular organism. The first two terms also reflect cooperation, but in the weaker sense that separate cells produce the correct number and types of offspring whether or not they interact during development.

The summations in (5) express the fact that a genotype is shared by all the cells in an organism, over several generations. A genotype is therefore scored not only by its ability to fill one slot well, but also by its ability to occupy new slots. Finding and occupying a slot is governed by important environmental parameters such as the total number of slots present at a given time and the probability of finding and occupying an unoccupied slot (as a function of time). In our simulations we choose a schedule of slot creation, destruction, and probability of occupancy which is periodic, as shown in Figure 2. This choice of schedule can be interpreted as favoring the evolution of "iteroparous" as compared to "semelparous" organisms. These two classes of organisms differ in their reproductive strategies. Iteroparous organisms produce offspring in a slow, steady manner (e.g. horses), whereas semelparous organisms produce many offspring in a rapid burst of reproductive activity (e.g. fruit flies). These reproductive strategies are associated with low variance and high variance environments respectively.

Fig. 2. Assumed probability with which a zygote may find and occupy an unoccupied slot as a function of time. Different slots are temporally overlapped as shown. Each slot's probability of being suitable for a new zygote (if the slot is empty) starts at zero, rises to a maximal value $p_{max} \le 1$, then returns to zero. Experiments used $p_{max} = 0.5$. When the probability returns to zero the slot is eliminated and its occupant dies.
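To make the structure of the scoring function concrete, the sketch below evaluates a score of the kind described for Equation (5): a quadratic size penalty, a quadratic cell-type penalty and a mitosis cost, summed over time steps and over slots. The exact normalisation and relative weighting of the three terms are assumptions of this example (all weights are taken as one), and the data layout and function names are hypothetical.

```python
def slot_score(history, n_target, f_target, gamma=1.0):
    """Score one environmental slot over time (lower is better).

    history is a list of time steps, each a dict with
      "counts":         cell type -> number of cells of that type
      "new_by_mitosis": number of cells produced by mitosis at this step
    """
    score = 0.0
    for step in history:
        counts = step["counts"]
        n = sum(counts.values())          # total number of cells in the slot
        if n == 0:
            continue                      # empty slot: no contribution in this sketch
        size_term = (n - n_target) ** 2
        type_term = sum((counts.get(a, 0.0) / n - f) ** 2
                        for a, f in f_target.items())
        # resource spent on mitosis relative to the resource available, n**gamma
        mitosis_term = step["new_by_mitosis"] / n ** gamma
        score += size_term + type_term + mitosis_term
    return score

def total_score(slot_histories, n_target, f_target, gamma=1.0):
    """Sum the per-slot scores, since one genotype is shared by all slots."""
    return sum(slot_score(h, n_target, f_target, gamma) for h in slot_histories)

# Targets quoted in Section 3: 24 cells in fractions 1/2, 1/3 and 1/6.
f_target = {2: 1 / 2, 3: 1 / 3, 4: 1 / 6}
mature = {"counts": {2: 12, 3: 8, 4: 4}, "new_by_mitosis": 0}
dividing = {"counts": {2: 4, 3: 4}, "new_by_mitosis": 4}
```

An organism that has reached the target size and composition and has stopped dividing contributes nothing further to the score, while every mitosis costs more in a small organism than in a large one.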
3 Results of Numerical Computation
In this section we complete the specification of the model which we have simulated (Section 3.1), describe how the simulations were performed (Section 3.2), and report and discuss the results (Section 3.3).

3.1 Specification of the Model
Here we recapitulate briefly the components of our model, as described in the previous section. The growth and reproduction of a multicellular organism is modeled by a dynamical grammar (Section 2.1). This grammar makes possible the growth and reproduction of multicellular organisms by incorporating rules governing cell division, gene regulation, and dispersal of zygotes. However, the grammar does not specify how these events are coordinated. The selective environment is modeled by a scoring function (Equation 5) which rewards reproductive multicellular organisms of a certain size and degree of cellular differentiation. Finally, we have a genetic sector whose function is to make changes in the genome which by trial and error optimize the environmental scoring function. The parameters which are varied to achieve the optimization include connection strengths in the genetic regulatory circuits, parameters governing cell type differentiation, and the parameters governing the application of discrete time grammar rules. The trial and error procedure is implemented using the method of simulated annealing.

3.2 How the Computations Were Performed
We mentioned that in broad outline the computations were performed by applying the method of simulated annealing to optimize Equation (5). In this section we explain further details of this procedure. The simulated annealing algorithm used was a variation of the one proposed by Lam [6, 7], in which both the temperature and average step size are controlled according to an adaptive cooling schedule. Annealing moves for each parameter were selected from an exponential distribution and given random signs. The mean of this distribution was separately altered for each parameter so as to ensure an effective sampling of the search space. This search space was restricted by limiting the saturation of the threshold function g(u), Equation (2), to 99%.

The parameters to be optimized characterize genetic circuits and their connection to grammar rules. There is a genetic circuit associated with each grammatical rule in Section 2.1. Rule 1 (the interphase rule) has a fully connected matrix defined by Equation 1. The remaining rules 2 through 5 operate in discrete time, and of these rules, only the matrix elements in rule 3 (cleavage) and rule 5 (formation of a new organism) are optimized. Rules 2, 3 and 5 have discrete-time diagonal circuits, specializing Equation 3 (or its one-child analog in which the k index is suppressed) to the case in which T is diagonal. Rule 2 (mitosis) has a constant T = 0. Rule 4 (dispersal) does not affect the internal state vector at all ($U = 1$ in the one-child analog of Equation 4).

Further parameters are required to control the processes of rule selection and cell type differentiation which we imagine summarize the effects of other regulatory circuits not explicitly included in the model. If such an omitted subcircuit has one input unit, and perhaps many output units, then the net input to that circuit from the explicitly modeled regulatory circuit would be, as in Equation 1, a function of the vector product between a row of a connection matrix and the state vector of the regulatory circuit plus a threshold. So for each grammatical rule $r \in \{1, \ldots, 5\}$, we assume a rule strength of the form

$$S_i^r = s^r \cdot v_i + \theta^r. \qquad (6)$$
We also assume that the rules compete at each cell, and that the one with the largest strength is selected. Many of the strength connections $s^r$ and $\theta^r$ are made identically zero in order to impose appropriate structure on the system. Rule 1 (interphase) is the only continuous time rule, with $s^{int} = 0$, and has a strength $\theta^{int} = 1$ which is independent of $v_i$. Rules 2 (mitosis) and 4 (dispersal) have $\theta^r = 0$. Typically one of these rules is triggered when its strength rises above $\theta^{int}$. $s^{mitosis}$ was held constant, while $s^{dispersal}$ was optimized. Rule 2 (mitosis) always triggered rule 3 (cleavage) on the next time step, and rule 4 (dispersal) triggered rule 5 (new organism) in a stochastic manner according to the probability distribution shown in Figure 2. Failure to trigger rule 5 resulted in death.

Similarly, cell type differentiation is determined at each cell i by a competition between cell type strengths given by the linear form

$$c_{\alpha,i} = c_\alpha \cdot v_i. \qquad (7)$$
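Returning to the rule competition, the winner-take-all selection among rule strengths of the linear form of Equation (6) can be sketched as below. Only interphase, mitosis and dispersal compete here, since cleavage and new-organism formation are triggered by the preceding rule. The strength vectors and the two-component state are hypothetical values chosen for illustration.

```python
def rule_strength(s, theta, v):
    """Equation (6): strength S = s . v + theta of one rule for one cell."""
    return sum(s_a * v_a for s_a, v_a in zip(s, v)) + theta

def select_rule(rules, v):
    """Return the name of the rule with the largest strength for state v.

    rules maps a rule name to a pair (s, theta). Ties go to the rule listed
    first, which is an arbitrary choice made for this sketch.
    """
    return max(rules, key=lambda name: rule_strength(*rules[name], v))

# Interphase has s = 0 and theta = 1; mitosis and dispersal have theta = 0.
rules = {
    "interphase": ([0.0, 0.0], 1.0),
    "mitosis":    ([2.0, 0.0], 0.0),
    "dispersal":  ([0.0, 3.0], 0.0),
}
```

With these values a cell stays in interphase until one of the other strengths exceeds 1, for example `select_rule(rules, [0.8, 0.2])` selects mitosis because its strength is 1.6.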
In this case a "soft" competition was implemented in which for small cell type strengths $c_{\alpha,i}$ a cell can belong partially to different cell types. Then the number of cells of type $\alpha$ appearing in the scoring function is given by

$$n_\alpha = \sum_{\text{cells } i} \frac{c_{\alpha,i}^2}{\sum_\beta c_{\beta,i}^2}.$$

The square encourages decisiveness but does not absolutely require it. The cell type strength vectors, $c_\alpha$, were also optimized.

Experiments were performed with and without "slots", i.e. with selection for multicellular growth but with and without selection for multicellular reproduction. Experiments with slots used a sum over twelve slots in the scoring function (5). The procedure for experiments without slots differed from the foregoing in that the dispersal grammar rule was omitted and the scoring function was not summed over slots. Furthermore, the 5 x 5 interphase circuit was assumed to have the special form of non-communicating 2 x 2 and 3 x 3 sub-circuits. This type of experiment was done as a pilot experiment to explore multicellular growth. A typical simulated annealing run required 2 to 4 days of CPU time on a SPARC 2 workstation.
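The soft cell-type competition can be illustrated with a short sketch. It assumes that each cell's membership in type $\alpha$ is the squared strength $c_{\alpha,i}^2$ normalised by the sum of squared strengths over all types, so that every cell contributes a total of one to the counts; the function name, the handling of a cell with all strengths zero, and the example vectors are hypothetical.

```python
def soft_type_counts(cells, type_vectors):
    """Fractional cell counts n_alpha from linear cell-type strengths.

    cells:        list of state vectors v_i
    type_vectors: cell type -> strength vector c_alpha
    """
    counts = {alpha: 0.0 for alpha in type_vectors}
    for v in cells:
        # Equation (7): strength of each type for this cell
        c = {alpha: sum(ca * va for ca, va in zip(c_alpha, v))
             for alpha, c_alpha in type_vectors.items()}
        total = sum(x * x for x in c.values())
        if total == 0.0:
            continue          # membership undefined; skipped in this sketch
        for alpha in counts:
            counts[alpha] += c[alpha] ** 2 / total
    return counts

type_vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
cells = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
counts = soft_type_counts(cells, type_vectors)
```

In this example the first two cells are fully committed to one type each, while the third is split evenly, giving 1.5 cells of each type.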
Fig. 3. An evolved lineage tree for a single-slot organism (no reproduction required). Cells are labelled by their scoring function cell type. The desired numbers of cells of each type are attained: there are 12 terminal cells of type 2, eight of type 3 and four of type 4.
3.3 Results
In this section we report the results of numerical simulations. We emphasize that these results are preliminary, in that we present the outcomes of individual simulated annealing runs. We have yet to characterize exhaustively the possible behavior and lineages which can arise from the chosen scoring function. In the first set of runs, the optimization was performed without summing the scoring function over slots. Several runs produced multicellular organisms having the desired size, distribution of cell types, and growth schedule. A cell lineage tree for one such run is shown in Figure 3. The desired number of cells, 24, was not a power of 2 and thus an unbalanced tree was required to achieve the correct size. The strategy adopted to arrive at this unbalanced tree depended on differential reproductive rates in the left and right progeny cells, which came about because cells on the right branches of the tree stopped reproducing sooner. We refer to this loss of mitotic capacity as "aging". The mechanism that causes aging can be seen in the phase portrait shown in Figures 4 and 5. Figure 4 shows the convergence of the interphase dynamics alone to a distant fixed point, regardless of the initial conditions. Figure 5 shows a small region of Figure 4, superimposed on the full dynamics of the growing organism. Cells begin mitosis when their interphase dynamics takes protein 0 above its threshold concentration. For sufficiently large concentrations of protein 1, this can no longer happen: protein 0 can not reach threshold because the interphase trajectories bend towards the distant interphase fixed point. Therefore, increased levels of protein 1 result in a termination of a cell's ability to reproduce. Left and right progeny age differently, as seen in the figure, because the cleavage grammar rule affects
the concentration of protein 1 differently for the two progeny (see Equation (4)).
Fig. 4. Evolved interphase dynamics for the single-slot organism (plot title: "Interphase Portrait - Block Connection Matrix"). Observe fixed point at (0, 21.5), and attracting trajectory from (2.4, 0) to the fixed point.
In the same run, cellular differentiation occurred. The desired fractions $f_{\alpha\,\text{target}}$ of the three different cell types were 1/2, 1/3 and 1/6. Figure 6 is a phase portrait of two components of the 3 x 3 sub-circuit, showing how eight of the cells are differentiated from the rest. These cells, which have low concentrations of protein 3, appear to be in the domain of attraction of a nearby fixed point, whereas 15 of the 16 remaining cells appear to be converging to a distant fixed point. In no case does a cell reach its interphase fixed point; the cells have been sufficiently separated to differentiate.

In the second set of experiments, the sum over twelve slots is included in the scoring function, so that a successful genotype must not only grow, but also reproduce as a multicellular organism. Figure 7 shows the lineage tree of our first exactly reproducing genotype. Cells are labeled according to whether they disperse or not, and by their cell type. Since dispersing cells are no longer part
of the organism, the organism has the correct size. The pattern of dispersing cells in the lineage tree can be understood from the phase portrait in Figure 8. This figure shows that the cells segregate into a sequence of 6 groups. Mitosis replaces a cell in group n with cells in groups n + 1 and n - 1. Dispersing cells are all found in group 6, except for two cells in group 4. These facts account for the pattern of dispersal in the lineage tree. The group number of each cell, along with its scoring function cell type and its status as a zygote or a somatic cell, is shown in Figure 8. We see that group 6 is specialized to produce zygotes, but other groups do not correspond in a one-to-one fashion with the externally imposed cell types. We may interpret the groups revealed in this figure as "emergent cell types", which are not required by the scoring function but are part of the developmental strategy which evolved to minimize the scoring function.

The growth and reproduction over many slots of organisms bearing the evolved genotype is shown in Figure 9. The number of cells in each organism is plotted against time, and the plots are superimposed. Starting with the appearance of the seventh slot, the growth patterns of the organisms are indistinguishable. We interpret this as an exact life cycle occurring in a stable environment.

The growth pattern in Figure 9 settles into an exact periodic life cycle after an initial transient behavior. This transient is an artifact of the use of simulated annealing as a substitute for the full genetic grammar discussed in Section 2.2. Under a genetic grammar, the initial state of a zygote's protein concentration vector after a change in genotype would be the same as the final state vector before such a change. But the simulated annealing procedure must re-set the state vector to some constant value for the zygote of the organism in the first slot, because the scoring function must be evaluated the same way after each genetic change. After this transient, the run in Figure 9 has periodic behavior with period 1 in units of slot appearance time. Other periodicities are observed in other runs.

Fig. 5. Phase portrait for the single-slot organism, showing an emergent "clock" which regulates the size of the organism. Note that the interphase fixed point is far outside the operating range of the genetic circuit.

Fig. 6. Another phase portrait for the single-slot organism (plot title: "Interphase and Ontogeny"), showing the differentiation of one third of the cells (those with the lowest values of protein 3). Interphase trajectories can cross because they have been projected from a three-dimensional subcircuit to a two-dimensional plot.
Fig. 7. Lineage tree of the first exactly reproducing genotype. Cells are labeled above by their group number as determined from the next figure, and below by their status as zygotes or somatic cells (label = 1 for zygotes) and, for somatic cells, by their scoring-function cell type (label = 2, 3, or 4). Ontogeny of slot seven is shown.
Figure 10 shows the onset of a period 6 life cycle. On inspection this life cycle is seen to consist of 3 interleaved period 2 life cycles. As an indication of the range of results possible in this model, we mention that we have also observed in the simulations an apparently "chaotic" and non-terminating life cycle.
4 Conclusion
We have presented a model incorporating developmental processes into an evolutionary framework. In simulations of this model, we have observed a number of interesting phenomena including an emergent clock which regulates organism size; the use of interphase attractors to define externally imposed cell types; emergent cell types which were not imposed by the environment; and multicellular reproduction in periodic and nonperiodic forms. The next step is to identify an actual biological system for which the information required in the formulation of the model is currently available. The challenge is to apply the model to a real biological system, and thereby validate the correctness of the modeling assumptions and show that the model has scientific substance, in that it is capable of answering questions of biological interest. One area of study which may be suitable for these purposes has been suggested to us by Buss; this concerns the evolution of developmental differences between
semelparous and iteroparous organisms, which must reproduce in relatively unpredictable and predictable environments respectively.

Fig. 8. Phase portrait of the first exactly reproducing genotype, as expressed in slot seven. Note that this two-dimensional slice of a five-dimensional circuit state space reveals six groups of cells, which we may number from left (group 1) to right (group 6). Also, note that group 5 contains no terminally differentiated cells, groups 1 and 6 do not mitose, and mitosing cells move from group n to group n + 1.
Acknowledgement

We wish to thank Leo Buss for extensive discussions of the context and content of this research, and specifically for encouraging us to apply the developmental model to problems of biological evolution. JR acknowledges support from National Institutes of Health grants LM 07056 and RR 07801; EM and CG from the US Air Force Office of Scientific Research grant 88-0240; EM from Hewlett-Packard and from the Yale Institute for Biospheric Studies; and DHS from National Institutes of Health grant RR 07801.
Fig. 9. First exactly reproducing genotype, showing the size of the organism in each slot in the training set (the scoring function) as a function of environmental time. Note transient, followed by convergence to periodic behavior.
References 1. Buss, L. W. (1987). The Evolution of Individuality. Princeton University Press, Princeton, New Jersey. See pp. 31, 69, 98-115. 2. Fleischer, K. and Barr, A. H. (1993). A simulation testbed for the study of multicellular development: The multiple mechanisms of morphogenesis. In Artificial Life IlI. Addison-Wesley, 389 - - 416 C. Langton (Ed.). 3. Gardiner, C. W. (1983). Handbook of Stochastic Methods for Physics, Chemistry, and the Natural Sciences. Springer Verlag, Berlin. 4. Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences USA, vol. 81:3088-3092. 5. Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220:671-680. 6. Lam, J. and Delosme, J.-M. (1988a). An efficient simulated annealing schedule: derivation. Technical Report 8816, Yale Electrical Engineering Department, New Haven, CT. 7. Lain, J. and Delosme, J.-M. (1988b). An efficient simulated annealing schedule: implementation and evaluation. Technical Report 8817, Yale Electrical Engineering Department, New Haven, CT.
Fig. 10. A period six life cycle. The maximum size of each slot's occupying organism is shown as a function of slot number (proportional to environmental time), for many slots. Only the first 12 slots were trained. Closer inspection of the organism-level lineage tree shows that this is actually three interleaved life cycles of period two.
8. Lindenmayer, A. (1968). Mathematical models for cellular interaction in development, parts I and II. Journal of Theoretical Biology, 18:280-315.
9. Metropolis, N., Rosenbluth, A., Rosenbluth, M. N., Teller, A., and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087-1092.
10. Mjolsness, E., Sharp, D. H., and Reinitz, J. (1991). A connectionist model of development. Journal of Theoretical Biology, 152:429-453.
11. Prusinkiewicz, P., Hammel, M. S., and Mjolsness, E. (1993). Animation of plant development. In SIGGRAPH 93 Conference Proceedings. Association for Computing Machinery Press.
12. Reinitz, J., Mjolsness, E., and Sharp, D. H. (1994). Cooperative control of positional information in Drosophila by bicoid and maternal hunchback. Submitted to J. theor. Biol.
13. Smith, J. M. (1982). Evolution and the Theory of Games. Cambridge University Press, Cambridge.
Soft Genetic Operators in Evolutionary Algorithms*
Hans-Michael Voigt
Technical University Berlin, Bionics and Evolution Techniques Laboratory
Ackerstraße 71-76, 13355 Berlin, Germany

Abstract. With this paper soft genetic operators for Evolutionary Algorithms are introduced and analyzed for multimodal continuous parameter optimization problems. A new scaling rule for multiple mutations is formalized and compared with a new step-size scaling for Evolution Strategies. The scaling of the Evolutionary Algorithm with Soft genetic operators (EASY) is compared with that of the Breeder Genetic Algorithm (BGA). A performance comparison of EASY with recently published results concerning the performance of Bayesian/Sampling and Very Fast Simulated Reannealing techniques for global optimization is given.
1 Introduction
We consider continuous parameter optimization problems
$$ f^* = f(x^*) = \min_{x \in G} f(x), \qquad G \subset \mathbb{R}^n \tag{1} $$
where no assumptions are made concerning the convexity and differentiability of the function f(x). To solve such problems Evolutionary Algorithms (EAs) are used. A formal description of different types of EAs for parameter optimization is given in [1]. For the Breeder Genetic Algorithm (BGA) [10] a discrete mutation scheme is used which tests much more often in the neighborhood of a given point. The main recombination operator is a discrete one. In the second section we formulate a generalization of this recombination scheme and analyze its problem solving properties and robustness for some simple test functions and for multimodal test functions in widespread use in global optimization. This analysis leads to the notion of soft modal recombination. The same idea is then applied to define a soft modal mutation scheme. The term "soft" is borrowed from fuzzy set theory [23, 9] only to convey the main idea. The uncertainty of the modal values in recombination is characterized by probability distributions, not by membership functions. The use of real membership functions and corresponding inference rules for recombination is quite straightforward and a subject of further research.

These soft mutations and soft recombination are shown to be more robust than discrete ones. In the following we use the terms discrete and crisp synonymously. Furthermore we introduce a new simple scaling rule for multiple mutations. Comparisons with a new step-size control for Evolution Strategies are given. The scaling of the EA with soft modal mutations and soft recombination is compared with the scaling behavior of the BGA in the third section. The final section contains a comparison of the Evolutionary Algorithm with soft genetic operators with other techniques based on Very Fast Simulated Reannealing and Bayesian/Sampling methods.

* This work is supported by the Bundesminister für Forschung und Technologie (BMFT) as part of the project SALGON and the Deutsche Forschungsgemeinschaft (DFG) Grant Vo 493/1-1.

2 Soft Genetic Operators
Selection in the EA with soft genetic operators is done in the same way as in Evolution Strategies [14, 16] and in the BGA [10]. It is characterized by the response to selection equation [4, 10] from quantitative genetics

$$ \langle f(t+1) \rangle - \langle f(t) \rangle = b_t \cdot I \cdot \sigma_p(t) \tag{2} $$

where $\langle f(t) \rangle$ is the average fitness at generation t, $b_t$ is the inheritance coefficient at generation t, I is the selection intensity, and $\sigma_p(t)$ is the phenotypic variance at generation t. The underlying assumption for this equation is a normal fitness distribution within the population. The selection intensity I is also a feature of the normal probability distribution

$$ I = \frac{\phi(x)}{1 - \Phi(x)} \tag{3} $$

where $\Phi(x)$ is the normal probability distribution, and $\phi(x)$ is the normal probability density function. The dependence of the selection intensity I on the percentage T of selected parents is shown in Figure 1. Selected values are given in Table 1.
Table 1. Selection intensity I for N → ∞, N population size
T | [...]
I | 0.34 | 0.8 | 0.97 | 1.2 | 1.4 | 1.6 | [...]
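Equation (3) can be evaluated numerically for truncation selection. The following minimal Python sketch assumes that the best fraction T of the population is selected, so that the truncation point x satisfies 1 - Φ(x) = T; the function name `selection_intensity` is illustrative and not taken from the paper.

```python
from statistics import NormalDist


def selection_intensity(T: float) -> float:
    """Selection intensity I for truncation selection of the best fraction T (0 < T < 1)."""
    std = NormalDist()                        # standard normal distribution
    x = std.inv_cdf(1.0 - T)                  # truncation point with 1 - Phi(x) = T
    return std.pdf(x) / (1.0 - std.cdf(x))    # equation (3)


for T in (0.8, 0.5, 0.2):
    print(f"T = {T:.0%}: I = {selection_intensity(T):.2f}")
```

Under this assumption, T = 50% and T = 20% give I of roughly 0.8 and 1.4, the two selection intensities used most often in the experiments of this paper.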
The theoretical results for the BGA [10] concerning the relation of selection and recombination are obtained for binary problems with an underlying binomial distribution. Unfortunately, such a binomial distribution cannot be assumed for the EA with soft genetic operators. Therefore, we made extensive simulation studies concerning the performance and the scaling behavior of EASY using the test functions given in Tables 2 and 3. It should be noted that EASY is an instantiation of the Multivalued Evolutionary Algorithm (MEA) [20].
Fig. 1. Selection intensity I vs. percentage T of selected parents
Fig. 2. a) Crisp recombination and b) soft recombination (both panels are drawn over the parent values $x_i$ (mother) and $x_i$ (father))
2.1 Soft Modal Recombination
Let $(x_1, ..., x_n)$ and $(y_1, ..., y_n)$ be the parent feature values. Then for discrete recombination the offspring feature values $(z_1, ..., z_n)$ are generated by

$$ z_i \in \{x_i, y_i\}. \tag{4} $$

$x_i$ or $y_i$ are chosen with probability 0.5. This discrete recombination scheme is depicted in Figure 2a). To check the robustness of such a recombination scheme (uniform crossing over, discrete recombination) we analyzed the sphere model $F_{sphere}$ from Table 2, which is a basic one in the analysis of mathematical optimization and evolutionary algorithms, e.g. [5, 2, 14, 16].
Table 2. Simple Test Functions
Function | Constraints
$F_{sphere}(x)$ = [...] | [...]
$F_{ellipsoid}(x)$ = [...] | [...]
[...]
The result is shown in Figure 3 labeled Discrete Recombination.
Fig. 3. Behavior of different modal recombination operators for the sphere model, n = 32, N = 512, $\varepsilon = 10^{-12}$
It conforms with the results of the BGA [10]. But this means that selection and discrete recombination do not give a sustained development. The question is how to get a more robust recombination scheme. In contrast to existing recombination schemes (Discrete Recombination [16], Intermediate Recombination [16], Extended Intermediate Recombination [10], Extended Line Recombination [10], Fuzzy Min-Max-Recombination [19], Uniform Crossover [14, 18], Linear Crossover [22], BLX-α Crossover [3], 1-Point- and x-Point-Crossover [2, 6]) we introduce a soft recombination scheme borrowed from fuzzy set theory [23] but used stochastically:

$$ p(z_i) \in \{\phi(x_i), \phi(y_i)\} \tag{5} $$

with triangular probability distributions $\phi(r)$ having the modal values $x_i$ and $y_i$, with $x_i - \alpha|y_i - x_i| \le r \le x_i + \alpha|y_i - x_i|$ and $y_i - \alpha|y_i - x_i| \le r \le y_i + \alpha|y_i - x_i|$, $\alpha \ge 0.5$, for $x_i \le y_i$. This soft recombination scheme with $\alpha = 0.5$, which is used throughout this paper, is sketched in Figure 2b). The result using soft recombination is shown in Figure 3 labeled Continuous Modal Recombination. With this recombination scheme it is possible to generate a sustained convergence for all generations, at least for the sphere model.
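The crisp scheme (4) and the soft scheme (5) can be contrasted in a few lines of Python. This is only a sketch under stated assumptions: each triangular distribution is taken to be symmetric about its modal value with half-width α·|y_i - x_i|, and the function names and the `rng` parameter are illustrative, not part of the paper.

```python
import random


def discrete_recombination(x, y, rng=random):
    """Crisp scheme (4): each offspring feature is x_i or y_i with probability 0.5."""
    return [xi if rng.random() < 0.5 else yi for xi, yi in zip(x, y)]


def soft_recombination(x, y, alpha=0.5, rng=random):
    """Soft scheme (5): choose the modal value x_i or y_i with probability 0.5,
    then sample from a triangular distribution centred on that modal value
    with half-width alpha * |y_i - x_i| (assumed symmetric)."""
    child = []
    for xi, yi in zip(x, y):
        mode = xi if rng.random() < 0.5 else yi
        half_width = alpha * abs(yi - xi)
        # random.triangular returns `low` when low == high (identical parents)
        child.append(rng.triangular(mode - half_width, mode + half_width, mode))
    return child


rng = random.Random(0)
mother, father = [0.0, 1.0, -2.0], [1.0, 3.0, -2.0]
print(discrete_recombination(mother, father, rng))
print(soft_recombination(mother, father, rng=rng))
```

In this reading the crisp operator can only reproduce feature values already present in the parents, whereas the soft operator can also place offspring up to α·|y_i - x_i| outside the interval spanned by them. Where both parents agree on a feature, both operators return that value unchanged.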
Fig. 4. Upper row: Sphere model (left) and hyper ellipsoid function (right), I = 0.8, 1.1, 1.4. Lower row: Continuous zeromin (left) and negsphere function (right), I = 0.8, 1.4. n = 32, N = 512, $\varepsilon = 10^{-12}$, soft recombination, 5 runs overlaid for every graph
The EA with soft recombination is characterized by the population size N and the selection intensity I only. Furthermore, the number of features n has to be taken into account. Based on these parameters a number of questions arise concerning the convergence of the EA, i.e. we want to predict the number of generations $gen^*(N, I, n)$ such that $|f^* - \hat{f}| \le \varepsilon$, where $\hat{f}$ is the approximation of the optimal value. The questions concerned are:
• What is the influence of the selection intensity on $gen^*(I)$ for convergent populations?
• What is the influence of the population size N on the convergence to the optimum, and what is the dependence $gen^*(N)$?
• What is the critical population size N* for which the convergence probability $p_{con} = 1$, i.e. 100% convergence is assured? How does the convergence probability decline if the population size is decreased below N*?
• How does the number of features n influence $gen^*(n)$, i.e. what is the scaling behavior of the EA with soft recombination?
To check the influence of these parameters we used the simple test functions given in Table 2.
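The quantity gen* asked for above can be measured with a very small experiment. The following Python sketch is not the implementation used in the paper; it assumes truncation selection of the best fraction T, random mating among the selected parents, soft recombination with α = 0.5 as the only variation operator, and the sphere model as fitness function. All names, the initialisation interval and the default parameter values are hypothetical.

```python
import random


def sphere(x):
    """Sphere model: sum of squared feature values, minimum 0 at the origin."""
    return sum(v * v for v in x)


def soft_recombine(x, y, rng, alpha=0.5):
    """Soft recombination: triangular sampling around a randomly chosen parent value."""
    child = []
    for xi, yi in zip(x, y):
        mode = xi if rng.random() < 0.5 else yi
        w = alpha * abs(yi - xi)
        child.append(rng.triangular(mode - w, mode + w, mode))
    return child


def generations_to_converge(f, n=3, N=200, T=0.3, eps=1e-6, f_opt=0.0,
                            max_gen=500, seed=0):
    """Return the first generation with |f_opt - best| <= eps, or None if not reached."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n)] for _ in range(N)]
    for gen in range(max_gen + 1):
        pop.sort(key=f)                              # best individual first
        if abs(f_opt - f(pop[0])) <= eps:
            return gen
        parents = pop[:max(2, int(T * N))]           # truncation selection
        pop = [soft_recombine(rng.choice(parents), rng.choice(parents), rng)
               for _ in range(N)]
    return None


print(generations_to_converge(sphere))
```

A return value of `None` signals that the accuracy ε was not reached within `max_gen` generations. Repeating the call over several seeds and population sizes N gives an empirical estimate of the convergence probability asked for in the third question.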
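Table 3. Test Functions for Global Optimization
Function | Constraints
$F_6(x) = n \cdot 10 + \sum_{i=1}^{n} (x_i^2 - 10 \cos(2\pi x_i))$ | $-600 \le x_i \le 600$
$F_7(x) = \sum_{i=1}^{n} -x_i \sin(\sqrt{|x_i|})$ | $-500 \le x_i \le 500$
$F_8(x) = \sum_{i=1}^{n} x_i^2/4000 - \prod_{i=1}^{n} \cos(x_i/\sqrt{i}) + 1$ | $-600 \le x_i \le 600$
$F_9(x) = -20 \exp\left(-0.2\sqrt{\tfrac{1}{n}\sum_{i=1}^{n} x_i^2}\right) - \exp\left(\tfrac{1}{n}\sum_{i=1}^{n} \cos(2\pi x_i)\right) + 20 + e$ | $-30 \le x_i \le 30$
$F_{10}(x) = \prod_{i=1}^{n} \left[1 + i \sum_{k=0}^{\beta} |2^k x_i - \mathrm{nint}^1(2^k x_i)|\, 2^{-k}\right]$ | $-1000 \le x_i \le 1000$

1 This corresponds to the Fortran generic intrinsic function nint.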
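For experimentation, the multimodal functions F6 to F9 can be written down compactly. The Python sketch below uses the standard textbook forms of Rastrigin's, Schwefel's, Griewangk's and Ackley's functions, which are assumed here to coincide with the definitions in Table 3; Katsuura's function F10 is omitted, and the function names are illustrative.

```python
import math


def rastrigin(x):       # F6
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


def schwefel(x):        # F7
    return sum(-v * math.sin(math.sqrt(abs(v))) for v in x)


def griewangk(x):       # F8
    s = sum(v * v for v in x) / 4000.0
    p = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return s - p + 1.0


def ackley(x):          # F9
    n = len(x)
    a = -20.0 * math.exp(-0.2 * math.sqrt(sum(v * v for v in x) / n))
    b = -math.exp(sum(math.cos(2.0 * math.pi * v) for v in x) / n)
    return a + b + 20.0 + math.e


origin = [0.0] * 4
print(rastrigin(origin), griewangk(origin), ackley(origin))
```

In these forms F6, F8 and F9 have their global minimum 0 at the origin, while F7 attains a negative minimum away from the origin, which is why it has to be normalized before it can be shown in a log-plot.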
Fig. 5. Upper row: Rastrigin's function $F_6$ and Schwefel's function $F_7$, n = 32, N = 5120. Lower row: Griewangk's function $F_8$ and Ackley's function $F_9$, n = 32, N = 512. $\varepsilon = 10^{-12}$, soft recombination, 5 runs overlaid for every graph
Dependence on the selection intensity I. We checked the influence of the selection intensity I for large population sizes N >> 1. The convergence behavior for the sphere and the ellipsoid model as well as for the zeromin and negsphere model as a function of the selection intensity is shown in Figure 4. The number of generations until convergence clearly falls as the selection intensity rises. A Mathematica fit gives the relation

$$ gen^*(I) = \frac{k_I}{I^{2 \ln 2}}, \qquad k_I = \mathrm{const.} \tag{6} $$
Furthermore, it is interesting to notice that there is no difference in the convergence of the sphere and the ellipsoid model.
For the test functions of Table 3 we get the results shown in Figure 5 (Schwefel's function $F_7$ is normalized to make a log-plot possible), which confirm relation (6). For these functions we observe different regions of convergence. The behavior of the EA with soft recombination reflects the self-referential structure of the functions to be optimized. For Griewangk's function $F_8$ we get the structure for one feature shown in Figure 6. A region of low fractal dimension corresponds to a high convergence speed and vice versa.

Dependence on the population size N. The considerations in the previous section are valid for large population sizes N >> 1. If the population size is large enough, the number of generations until convergence $gen^*(N)$ is independent of the population size N, i.e.

$$ gen^*(N) = k_N, \qquad k_N = \mathrm{const} \quad \text{for} \quad N \gg 1. \tag{7} $$
What happens for small population sizes? Figure 7 shows the influence of the population size on the probability of convergence for the sphere model and for Griewangk's function $F_8$. Obviously there exists a lower limit N* of the population size with a convergence probability $p_{con}(N > N^*) = 1$.

Dependence on the number of features n. The scaling behavior of the EA with soft recombination, i.e. the dependence of the convergence speed on the number of features, is very interesting for large scale optimization problems. For the sphere model and for Griewangk's function $F_8$ we get the results for n = 32, 64, 128 shown in Figure 8. Estimating the scaling behavior of the EA with soft recombination by a Mathematica fit, we get the following relation for a large population size N >> 1

$$ gen^*(n) = k_n \cdot n^{1/(2 \ln 2)}, \qquad k_n = \mathrm{const.} \tag{8} $$
This relation is depicted in Figure 9.

Soft Modal Recombination Summarized. Summarizing the convergence behavior for soft modal recombination one finally gets for N >> 1 the estimate

$$ gen^*(I, n) = k_{I,n} \cdot \frac{n^{1/(2 \ln 2)}}{I^{2 \ln 2}}, \qquad k_{I,n} = \mathrm{const.} \tag{9} $$
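The practical meaning of estimate (9) is easiest to see from ratios, since the constant cancels. The Python sketch below assumes that (9) has the form k·n^(1/(2 ln 2)) / I^(2 ln 2); the constant `k` is not given in the text and is a hypothetical placeholder, as is the function name `gen_star`.

```python
import math

EXP_I = 2.0 * math.log(2.0)            # exponent of I, about 1.386
EXP_N = 1.0 / (2.0 * math.log(2.0))    # exponent of n, about 0.721


def gen_star(I, n, k=1.0):
    """Estimated number of generations until convergence, up to the constant k."""
    return k * n ** EXP_N / I ** EXP_I


# Ratios do not depend on k:
print(gen_star(0.8, 32) / gen_star(1.4, 32))     # weaker selection needs more generations
print(gen_star(1.4, 128) / gen_star(1.4, 32))    # four times as many features
print(gen_star(1.0, 1000))                       # scaling factor for n = 1000 with k = 1
```

Under this reading, quadrupling the number of features multiplies the expected number of generations by about e ≈ 2.72, so the effort grows clearly more slowly than linearly in n.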
2.2 Soft Modal Mutations
Mutations for the Breeder Genetic Algorithm [10] for continuous parameter optimization problems are introduced for a mutation base $b_m = 2$ by

$$ \Delta_i \in \pm\{2^{-15} A_m, 2^{-14} A_m, \ldots, 2^{0} A_m\} \tag{10} $$
Fig. 6. Self-referential structure of Griewangk's function $F_8$, left: equally low fractal dimension, right: high fractal dimension
where $A_m = R_m (x_{max} - x_{min})$ defines the absolute mutation range and $R_m$ the relative mutation range. Mutations $\Delta_i$ for changing a feature $x_i$ to $x_i + \Delta_i$ are chosen randomly with uniform distribution from the given set. For the BGA $R_m$ is usually set to $R_m = 0.1$. The discrete modal mutation scheme is a generalization of the BGA mutation scheme, i.e. the number of discrete values $k_{low}$ now depends on a lower limit $R_{min}$ of the relative mutation range, and the base of the mutations $b_m > 1$ need not be $b_m = 2$, such that discrete modal mutations are from

[...] (11)

with

[...] (12)
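The BGA mutation set (10) is simple enough to state in code. The following Python sketch implements only the case spelled out in the text, namely base 2 with exponents 0 to 15 and $A_m = R_m (x_{max} - x_{min})$. The parameters `base` and `k_max` are exposed purely for illustration and are not meant to reproduce relations (11) and (12); all names are hypothetical.

```python
import random


def bga_mutation_steps(x_min, x_max, R_m=0.1, base=2.0, k_max=15):
    """Step magnitudes base**-k * A_m for k = 0..k_max, with A_m = R_m * (x_max - x_min)."""
    A_m = R_m * (x_max - x_min)
    return [A_m * base ** -k for k in range(k_max + 1)]


def crisp_mutation(x_i, steps, rng=random):
    """Add one step chosen uniformly from the set, with a random sign, as in (10)."""
    return x_i + rng.choice((-1.0, 1.0)) * rng.choice(steps)


steps = bga_mutation_steps(-600.0, 600.0)
print(steps[0], steps[-1])                    # largest and smallest step magnitude
print(crisp_mutation(0.0, steps, random.Random(0)))
```

Because the step magnitudes form a geometric sequence, most of the sixteen values lie close to the current point, which is the sense in which the BGA "tests much more often in the neighborhood of a given point".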
Fig. 7. Ratio of convergent runs $p_{con}$ vs. population size N for the sphere model (left) and Griewangk's function $F_8$ (right), n = 32, $\varepsilon = 10^{-12}$, 20 runs
Discrete modal mutations are depicted schematically in Figure 10a). Since there are only discrete mutation steps in the set of possible mutations we checked the robustness of such a scheme by means of the multimodal test function set from [10, 20]. We extended the Multivalued Evolutionary Algorithm (MEA) [20] by the discrete modal mutation scheme. The new algorithm specification parameters are then the mutation base $b_m$, the relative mutation range $R_m$ and the minimal relative mutation range $R_{min}$. All other parameters are used as for the MEA, i.e. the population size N, the selection intensity I, the number of genes m for a phenotypic feature, the number n of phenotypic features $x_i$, i = 1, ..., n, and the mutation probability $p_m$. The algorithm stops if the best fitness value f* within the population is below a given threshold. For Rastrigin's function $F_6$ and Ackley's function $F_9$, Figure 11 (left) shows the average number of function evaluations $\langle f_{eval} \rangle$ versus values of the relative mutation range $0 \le R_m < 1$. For comparison with the BGA we always used $b_m = 2$, m = 1, and discrete recombination (uniform crossover). A smaller value of $b_m$ for fixed $R_m$ and $R_{min}$ gives more mutation modes which are more uniformly distributed. Obviously, the convergence of the algorithm depends crucially in both cases on the mutation range. For both functions there exist only small windows of $R_m$ where convergence to the global optimum can be reached. For $R_m \approx 0.1$ the algorithms converged to the global optimum with almost the minimum number of function evaluations. This corresponds to the standard value of $R_m$ for the BGA. The question is how to improve the robustness of a modal mutation scheme. To reach any point within the mutation range without leaving the concept of modal values we introduce soft modal mutations which are schematically shown
Fig. 8. Scaling behavior of soft recombination for the sphere function (upper row) and Griewangk's function $F_8$ (lower row) for I = 0.8 (left) and I = 1.4 (right), N = 512, $\varepsilon = 10^{-12}$, 5 runs overlaid for every graph
in Figure 10b). A mutation is now randomly chosen from the set of probability distributions

[...] (13)

with the same modal values as the discrete modal mutations. We used triangular probability distributions $\phi$ with

[...] / 2 < $z_k$ ≤ [...] / 2 (14)

though the considerations are not limited to this type, and made the same experiments as for the crisp modal mutation scheme. The results are shown in Figure 11 (right). It is obvious from these results that EASY with soft modal mutations converged to the global optimum in every case for $R_m \ge 0.1$.
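A soft modal mutation can be sketched in Python as follows. The support of each triangular distribution is an assumption here, not a statement of relation (14): the triangle around a modal value is taken to reach from the midpoint towards the next smaller modal value to the midpoint towards the next larger one, and the two outermost triangles are closed symmetrically. The function name and its parameters are hypothetical.

```python
import random


def soft_modal_mutation(x_i, modes, rng=random):
    """Mutate x_i by a step drawn from a triangular distribution around a random mode.

    modes: ascending positive modal step magnitudes (at least two), e.g. the
    values b_m**-k * A_m of the crisp scheme.
    Assumption: each triangle spans the midpoints to the neighbouring modes.
    """
    k = rng.randrange(len(modes))
    m = modes[k]
    lo = (modes[k - 1] + m) / 2 if k > 0 else m - (modes[k + 1] - m) / 2
    hi = (m + modes[k + 1]) / 2 if k + 1 < len(modes) else m + (m - modes[k - 1]) / 2
    step = rng.triangular(lo, hi, m)              # continuous step magnitude
    return x_i + rng.choice((-1.0, 1.0)) * step   # random sign


modes = [2.0 ** -k for k in range(15, -1, -1)]    # base 2, mutation range A_m = 1
rng = random.Random(0)
print([round(soft_modal_mutation(0.0, modes, rng), 6) for _ in range(5)])
```

With this choice the supports of neighbouring triangles touch, so every step size between half the smallest and 1.25 times the largest modal value can occur, while the modal values themselves remain the most likely steps.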
Fig. 9. Factor for the number of generations until convergence gen* vs. number of features n

Fig. 10. a) Crisp modal mutations and b) soft modal mutations, NVS and PVS: negative and positive very small mutations, NVL and PVL: negative and positive very large mutations
Because of the structure of Rastrigin's function $F_6$ there is a large difference in the average number of function evaluations for different values of the relative mutation range $R_m$. On average the number of evaluations in the case of $F_6$ is higher for soft modal mutations than for convergent crisp modal mutations. But the overall robustness of the soft modal mutation scheme is quite evident. For Ackley's function $F_9$ soft modal mutations are in every case better than discrete modal mutations. The optimization of this function seems to be an easy task.

Multiple Modal Mutations. Perhaps the first considerations concerning the mathematical modeling of the abstract nature of adaptation in multidimensional systems can be found in [5], where a geometric illustration using a sphere is given.
Fig. 11. Top: Crisp (left) and soft (right) modal mutations, $\langle f_{eval} \rangle$ vs. $R_m$ for Rastrigin's function $F_6$, N = 20, I = 1.4, $f^* \le 9.0 \cdot 10^{-1}$, $R_{min} = 10^{-5}$, n = 20, $p_m = 1/n$, 20 runs. Bottom: Crisp (left) and soft (right) modal mutations, $\langle f_{eval} \rangle$ vs. $R_m$ for Ackley's function $F_9$, N = 20, I = 1.4, $f^* \le 10^{-3}$, $R_{min} = 10^{-5}$, n = 20, $p_m = 1/n$, 20 runs
Based on this model the relation

$$ p = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt \quad \text{with} \quad x = \frac{r \sqrt{n}}{d} \tag{15} $$

represents the probability of improvement p dependent on the number of dimensions n, the distance from a fixed point d, and the undirected change r expressed as a distance. The relation holds for large n. In [8] the corresponding expression was derived purely geometrically for any dimension n as

$$ p = \frac{1}{2} I_z(\nu/2, 1/2) \quad \text{with} \quad z = 1 - \left(\frac{r}{d}\right)^2, \quad \nu = n - 1. \tag{16} $$

$I_z(\cdot, \cdot)$ is given as the fraction

$$ I_z(a, b) = \frac{B_z(a, b)}{B(a, b)} \tag{17} $$

with B(a, b) the Beta function and $B_z(a, b)$ the incomplete Beta function. If $\nu \gg 1$ then (16) can be approximated by (15). The lesson to learn is as follows. The probability of an improvement declines very rapidly for a given change r as the number n of changing features increases. For large n, changes of features should therefore be smaller than for small n. This idea is used in an adaptive way in Evolution Strategies [14, 16].
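The rapid decline of the probability of improvement can be made concrete with the large-n relation (15). In the Python sketch below the argument of the normal integral is assumed to be x = r·√n / d, and the integral is expressed through the complementary error function. The function name is illustrative, and the exact relation (16) is not implemented.

```python
import math


def improvement_probability(r, d, n):
    """Large-n approximation: p = 1 - Phi(x) with x = r * sqrt(n) / d (assumed form)."""
    x = r * math.sqrt(n) / d
    return 0.5 * math.erfc(x / math.sqrt(2.0))   # upper tail of the standard normal


for n in (10, 100, 1000):
    print(n, improvement_probability(r=0.1, d=1.0, n=n))
```

For a vanishing change r the probability tends to 1/2, and for a fixed r it decreases monotonically with n, which is the motivation for making the individual changes smaller when many features are mutated at once.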
Table 4. $\langle f_{eval} \rangle$ for a (1,10)-Evolution Strategy with individual step-size adaptation (ES-Scale) and with a general step-size adaptation (ES-Simple), $f^* \le 10^{-10}$, 20 runs, data from [11]; $\langle f_{eval} \rangle$ for soft modal mutations and mutation probabilities $p_m = 1/n, 5/n, 10/n$, $f^* \le 10^{-10}$, N = 10, I = 1.4, $R_m = 0.1$, $R_{min} = 10^{-8}$, 20 runs
Hyper-Ellipsoid
ES-Scale | ES-Simple | EASY-1 | EASY-5 | EASY-10
130r 11001
2560 I 13500 I
20800[ 8322I 5970001 271111 196921 1090001 23000000 104520 742701 ---I 210068 _ ~ 1 5 8155224 2371
20122I 76333I
We adopt this idea for EASY in the following way. Let us assume that we want to mutate a fixed number of features. Then a randomly selected first feature with value $x_1$ will be changed by a mutation $\Delta_1$ to $x_1 + \Delta_1$ according to the modal mutation scheme with the mutation range $A_{m1} = A_m$, the randomly selected second feature with value $x_2$ will be changed by a mutation $\Delta_2$, now with the new mutation range $A_{m2} = |\Delta_1|$, etc. This is a very simple scaling rule for multiple mutations which takes the above considerations into account. The more features are changed, the smaller the change will be from feature to feature.

We compare this scaling rule for multiple mutations with a new individual step-size adaptation in Evolution Strategies [11]. As a test function the hyper-ellipsoid function $F_{ellipsoid}$ from Table 2 with $-1 \le x_i \le 1$ is used. The average number of function evaluations $\langle f_{eval} \rangle$ for Evolution Strategies without accumulated information to find the minimum with the specified accuracy is given in Table 4. x = (1, ..., 1) is used as the starting point. The performance of EASY with the given settings is also shown in Table 4. Starting points are initialized as for the ES.

The performance results from Table 4 show that the simple scaling rule for multiple modal mutations works very well. Because of a mutation base $b_m = 2$ there is almost no difference between the mutation of 5 and 10 features. But mutating more than one feature obviously gives better results, at least for high-dimensional unimodal functions. For $p_m = 5/n$ and $p_m = 10/n$ the average number of function evaluations scales almost linearly with n, i.e. $\langle f_{eval} \rangle \approx 750 \cdot n$.
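The scaling rule for multiple mutations can be sketched as follows. The Python code assumes that the magnitude of each mutation becomes the mutation range of the next mutated feature, and for brevity it uses crisp base-2 steps rather than soft ones. The function name, its parameters and the defaults are hypothetical.

```python
import random


def multiple_modal_mutation(x, n_mut, A_m, rng=random, base=2.0, k_max=15):
    """Mutate n_mut randomly chosen features of x with a shrinking mutation range."""
    y = list(x)
    current_range = A_m                              # A_m1 = A_m for the first feature
    for i in rng.sample(range(len(y)), n_mut):
        step = current_range * base ** -rng.randrange(k_max + 1)
        y[i] += rng.choice((-1.0, 1.0)) * step
        current_range = step                         # assumed rule: next range = |previous step|
    return y


print(multiple_modal_mutation([0.0] * 10, n_mut=5, A_m=0.2, rng=random.Random(0)))
```

Since every step is at most as large as the current range, the changes can only stay equal or shrink from feature to feature, in line with the geometric argument that simultaneous changes of many features should be small.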
3 Scaling of Soft Genetic Operators
The scaling performance of the EA with soft genetic operators (EASY) is compared with that of the BGA. This is done because other EAs did not consider the scaling behavior for problems with up to 1000 variables.
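Table 5. $\langle f_{eval} \rangle$ vs. n for function $F_6$ (left) with $f^* \le 9.0 \cdot 10^{-1}$, I = 1.4, $R_m = 0.1$, $R_{min}$ = [...], $p_m = 1/n$, $b_m = 2$, 20 runs, and for function $F_7$ (right) with $f^* \le f_{opt} + 5 \cdot 10^{-4} \cdot |f_{opt}|$, I = 1.4, $R_m = 0.75$, $R_{min} = 10^{-4}$, $p_m = 1/n$, $b_m = 2$, 20 runs; $\langle f_{eval} \rangle$ vs. n for function $F_8$ (left) with $f^* \le 10^{-3}$, I = 1.4, $R_m = 0.1$, $R_{min} = 10^{-8}$, $p_m = 1/n$, $b_m = 2$, 20 runs, and for function $F_9$ (right) with $f^* \le 10^{-3}$, I = 1.4, $R_m = 0.1$, $R_{min} = 10^{-4}$, $p_m = 1/n$, $b_m = 2$, 20 runs; BGA data from [10]

Rastrigin's Function $F_6$
n | EASY N | EASY $\langle f_{eval} \rangle$ | BGA N | BGA $\langle f_{eval} \rangle$
20 | 20 | 6098 | 20 | 3608
100 | 20 | 45118 | 20 | 25040
200 | 20 | 98047 | 20 | 52948
400 | 20 | 243068 | 20 | 112634
1000 | 20 | 574561 | 20 | 337570

Schwefel's Function $F_7$
n | EASY N | EASY $\langle f_{eval} \rangle$ | BGA N | BGA $\langle f_{eval} \rangle$
20 | 20 | 10987 | 500 | 16100
100 | 20 | 101458 | 1000 | 92000
200 | 20 | 241478 | 2000 | 248000
400 | 20 | 430084 | 4000 | 699803
1000 | 20 | 1067221 | - | -

Griewangk's Function $F_8$
n | EASY N | EASY $\langle f_{eval} \rangle$ | BGA N | BGA $\langle f_{eval} \rangle$
[...] | 500 | 26700 | 500 | 66000
[...] | 500 | 77250 | 500 | 361722
[...] | 500 | 128875 | 500 | 748300
[...] | 500 | 229750 | 500 | 1630000
[...] | 500 | 563350 | - | -

Ackley's Function $F_9$
n | EASY N | EASY $\langle f_{eval} \rangle$ | BGA N | BGA $\langle f_{eval} \rangle$
[...] | [...] | 13997 | 20 | 19420
[...] | [...] | 57628 | 20 | 53860
[...] | [...] | 122347 | 20 | 107800
[...] | [...] | 262606 | 20 | 220820
1000 | [...] | 686614 | 20 | 548306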
The test functions given in Table 3 are very popular in the literature on global optimization. The main results of this paper are summarized in Table 5, with n the number of variables, N the population size, and $\langle f_{eval} \rangle$ the average number of function evaluations for 20 runs. The algorithm stops if $f^* < g(\varepsilon)$, where $\varepsilon$ is chosen in such a way that f* is in the basin of attraction of the global optimum. It is interesting to notice that the BGA does not scale up to 1000 variables for functions $F_7$, which has no unimodal macro-structure, and $F_8$, which is not linearly separable. The EA with soft genetic operators solved the problem with a fixed population size even for these functions. For functions $F_8$ and $F_9$ the average number of function evaluations scaled approximately linearly with n, and for $F_6$ and $F_7$ approximately with $n \cdot \ln(n)$.
4 Performance Comparisons with other Methods
The performance of the proposed Evolutionary Algorithm applied to continuous parameter optimization problems will be compared with recently published results using Simulated Reannealing Techniques and Bayesian/Sampling methods.
4.1 Advanced Simulated Annealing
For global optimization a very fast simulated reannealing (VFSR) technique was proposed in [13]. The test function $F_{10}$ [7], which is shown in Figure 12, was used there. The higher the value of $\beta$, the smoother the function will be, though it is nowhere differentiable. In [13] results for the rather smooth case $\beta = 60$ and n = 4 are reported. As can be seen from Table 6, for this case EASY is approximately 10 times faster than VFSR and gives better approximation results, i.e. the minimum f* found was $f^* \le 1.05$ for all runs. Results for higher dimensions n and lower values of $\beta$ are also reported.
Fig. 12. Katsuura's function $F_{10}$ for $\beta$ = 32, 35, 60, n = 1
4.2 Bayesian/Sampling Methods
In [17] a detailed overview and numerous comparisons of Bayesian/Sampling (B/S) techniques for global optimization are given. The comprised results for
Table 6. Katsuura's function $F_{10}$, $\langle f_{eval} \rangle$ vs. n, $f^* \le 1.05$, N = 10, I = 1.4, $R_m = 0.02$, $R_{min} = 10^{-6}$, 20 runs
Katsuura's Function $F_{10}$ | $Min_{best}$
EASY | VFSR | EASY | VFSR
207 22326 1.004 1.019 777 1.009 335 1.037 281 1.007 626 1.012 333 1.035
most of the test functions are reported in Table 7 compared to the performance of the Evolutionary Algorithm with modal mutations given in the third column of Table 7. The considered test functions are described in [17, 20]. The function abbreviations refer to GP - Goldstein-Price, CH - six-hump camelback, BR - Branin, SH - Shubert, and EA - Easom function. Obviously, none of the Bayesian/Sampling methods converged uniformly best in all cases. The methods of Zilinskas and Shaltenis, also under consideration, were discarded in [17] because of the high CPU requirements (in some cases exceeding 1 h on a VAX 8650). The Evolutionary Algorithm with modal mutations converged almost uniformly better than the B/S methods.
Table 7. $\varepsilon_x = 10^{-3}$, f* [...] solutions with B/S-methods S: Stuckman, M: Mockus, P: Perttunen, T: Torn, C: Monte Carlo, A: Simulated Annealing, ** no evaluation due to extreme CPU requirements,
00 > 1000 < 200 < 500 > 1000 > 1000
~001~ ~ ~00{~~00o{~~000{ {_~:{~0{ ~%-,ooo1>,00oj_~oo{>lOOO1>,0oo1>,oooi {~{30{
21~{{ ~ 500 >-10001~00 j _<~0ot> 10oo{> 10oot < 1000 > 1000
> 1000
5 Conclusions
Evolutionary Algorithms with soft genetic operators are a robust method for large-scale global parameter optimization problems. With respect to robustness and performance soft genetic operators are superior to crisp ones. The EA with soft genetic operators solved the problem for all test functions with up to 1000 variables with a fixed population size. For functions $F_8$ and $F_9$ the average number of function evaluations scaled approximately linearly with n, and for $F_6$ and $F_7$ approximately with $n \cdot \ln(n)$. For $F_6$ and $F_9$ EASY used on average more function evaluations than the BGA. That is the price to be paid for a higher robustness. Multiple mutations can be introduced by a very simple scaling rule. Performance comparisons of EASY with recently published results based on Very Fast Simulated Reannealing and Bayesian/Sampling techniques show that EASY has in almost all cases a better performance. Future research concerns the theoretical confirmation of the given experimental results.

Acknowledgment. The author would like to thank Joachim Born and Ivan Santibanez-Koref from the Bio- and Neuroinformatics Research Group of the Bionics and Evolution Techniques Laboratory of the Technical University Berlin for helpful discussions. The author participated in the Biocomputation Workshop at Monterey when he was with the International Computer Science Institute (ICSI), Berkeley.
References
1. Th. Bäck and H.-P. Schwefel "An Overview of Evolutionary Algorithms for Parameter Optimization" Evolutionary Computation 1 (1):1-23, 1993
2. K. A. DeJong "An Analysis of the Behavior of a Class of Genetic Adaptive Systems" Doctoral Dissertation, University of Michigan 1975
3. L. J. Eshelman and J. D. Schaffer "Real-coded Genetic Algorithms and Interval-schemata" Foundations of Genetic Algorithms, pp. 187-202, Morgan Kaufmann 1992
4. D. S. Falconer "Introduction to Quantitative Genetics" Longman 1981
5. R. A. Fisher "The Genetical Theory of Natural Selection" Oxford University Press 1929, 2nd rev. ed. Dover Publications 1958
6. D. E. Goldberg "Genetic Algorithms in Search, Optimization, and Machine Learning" Addison-Wesley 1989
7. H. Katsuura "Continuous Nowhere-Differentiable Functions - An Application of Contraction Mappings" The American Mathematical Monthly, 5 (98) 1991
8. M. Kimura "The Neutral Theory of Molecular Evolution" Cambridge University Press 1983
9. B. Kosko "Neural Networks and Fuzzy Systems" Prentice Hall 1992
10. H. Mühlenbein and D. Schlierkamp-Voosen "Predictive Models for the Breeder Genetic Algorithm, I. Continuous Parameter Optimization" Evolutionary Computation 1 (1):25-49, 1993
11. A. Ostermeier, A. Gawelczyk and N. Hansen "A Derandomized Approach to Self Adaptation of Evolution Strategies" Technical University Berlin, Bionics and Evolution Techniques Laboratory, Technical Report TR-93-003, July 1993, Submitted to Evolutionary Computation
12. D. Rasch "Einführung in die mathematische Statistik" Deutscher Verlag der Wissenschaften, Berlin 1976
13. B. Rosen "Function Optimization Based on Advanced Simulated Annealing", ftp: cis.archive.ohio-state.edu, dir: /pub/neuroprose, file: rosen.advsim.ps.Z
14. I. Rechenberg "Evolutionsstrategie" Frommann-Holzboog 1973
15. I. Rechenberg "Evolutionsstrategie 94" Frommann-Holzboog 1994
16. H.-P. Schwefel "Numerical Optimization of Computer Models" John Wiley 1981
17. B. E. Stuckman and E. E. Easom "A Comparison of Bayesian/Sampling Global Optimization Techniques" IEEE Trans. Systems, Man and Cybernetics, Vol. 22, No. 5, pp. 1024-1032, 1992
18. G. Syswerda "Uniform Crossover in Genetic Algorithms" Proc. Third Int. Conf. on Genetic Algorithms, pp. 2-9, D. Schaffer (Ed.), Morgan Kaufmann 1989
19. H.-M. Voigt "Fuzzy Evolutionary Algorithms" Technical Report tr-92-038, International Computer Science Institute (ICSI) Berkeley, June 1992, ftp: icsi.berkeley.edu, dir: /pub/techreports/1992, file: tr-92-038.ps.Z
20. H.-M. Voigt, J. Born and I. Santibanez-Koref "Multivalued Evolutionary Algorithms" Technical Report tr-93-022, International Computer Science Institute (ICSI) Berkeley, April 1993, see also in: St. Forrest (Ed.) "Proc. 5th Intl. Conf. Genetic Algorithms" p. 657, San Mateo: Morgan Kaufmann Pub. 1993 and ftp: icsi.berkeley.edu, dir: /pub/techreports/1993, file: tr-93-022.ps.Z
21. H.-M. Voigt and T. Anheyer "Modal Mutations in Evolutionary Algorithms" Proc. IEEE Int. Conf. on Evolutionary Computation, vol. I, pp. 88-92, IEEE 1994
22. A. H. Wright "Genetic Algorithms for Real Parameter Optimization" Foundations of Genetic Algorithms, pp. 205-220, Morgan Kaufmann 1990
23. L. A. Zadeh "Fuzzy Sets" Information and Control, vol. 8, 338-353, 1965
Analysis of Selection, Mutation and Recombination in Genetic Algorithms
Heinz Mühlenbein and Dirk Schlierkamp-Voosen
GMD Schloß Birlinghoven, D-53754 Sankt Augustin, Germany
Abstract. Genetic algorithms have been applied fairly successfully to a number of optimization problems. Nevertheless, a common theory of why and when they work is still missing. In this paper a theory is outlined which is based on the science of plant and animal breeding. A central part of the theory is the response to selection equation and the concept of heritability. A fundamental theorem states that the heritability is equal to the regression coefficient of parent to offspring. The theory is applied to analyze selection, mutation and recombination. The results are used in the Breeder Genetic Algorithm, whose performance is shown to be superior to other genetic algorithms.
1 Introduction
Evolutionary algorithms which model natural evolution processes were already proposed for optimization in the 60's. We cite just one representative example, the outstanding work of Bremermann. He wrote in [6]: "The major purpose of the work is the study of the effects of mutation, mating, and selection on the evolution of genotypes in the case of non-linear fitness functions. In view of the mathematical difficulties involved, computer experimentation has been utilized in combination with theoretical analysis... In a new series of experiments we found evolutionary schemes that converge much better, but with no known biological counterpart." These remarks are still valid. The designer of evolutionary algorithms should be inspired by nature, but he should not intend a one-to-one copy. His major goal should be to develop powerful optimization methods. An optimization method is powerful if it is able to solve difficult optimization problems. Furthermore the algorithm should be based on a solid theory. We object to popular arguments along the lines: "This is a good optimization method because it is used in nature", and vice versa: "This cannot be a good optimization procedure because you do not find it in nature".

Modelling the evolution process and applying it to optimization problems is a challenging task. We see at least two families of algorithms, one modelling natural and self-organized evolution, the other based on rational selection as done by human breeders. In principle, artificial selection of animals for breeding and artificial selection of virtual animals on a computer is the same problem. Therefore the designer of an evolutionary algorithm can profit from the knowledge accumulated by human breeders. But in the course of applying the algorithm to difficult fitness landscapes, the human breeder may also profit from the experience gained by applying the algorithm.

Bremermann notes [6]: "One of the results was unexpected. The evolution process may stagnate far from the optimum, even in the case of a smooth convex fitness function... It can be traced to the bias that is introduced into the sampling of directions by essentially mutating one gene at a time. One may think that mating would offset this bias; however, in many experiments mating did little to improve convergence of the process." Bremermann used the term mating for recombining two (or more) parent strings into an offspring. The stagnation problem will be solved in this paper. Bremermann's algorithm contained most of the ingredients of a good evolutionary algorithm. But because of limited computer experiments and a missing theory, he did not find a good combination of the ingredients.

In the 70's two different evolutionary algorithms independently emerged: the genetic algorithm of Holland [18] and the evolution strategies of Rechenberg [24] and Schwefel [27]. Holland was not so much interested in optimization, but in adaptation. He investigated the genetic algorithm with decision theory for discrete domains. Holland emphasized the importance of recombination in large populations, whereas Rechenberg and Schwefel mainly investigated normally distributed mutations in very small populations for continuous parameter optimization.

Evolutionary algorithms are random search methods which can be applied to both discrete and continuous functions. In this paper the theory of evolutionary algorithms will be based on the answers to the following questions:
- Given a population, how should the selection be done?
- Given a mutation scheme, what is the expected progress of successful mutations?
- Given a selection and recombination schedule, what is the expected progress of the population?
- How can selection, mutation and recombination be combined in synergistic manner?
This approach is opposite to the standard GA analysis initiated by Holland, which starts with the schema theorem [18]. The theorem predicts the effect of proportionate selection. Later mutation and recombination are introduced as disruptions of the population. Our view is the opposite. We regard mutation and recombination as constructive search operators. They have to be evaluated according to the probability that they create better solutions. The search strategies of mutation and recombination are different. Mutation is based on chance. It works most efficiently in small populations. The progress for a single mutation step is almost unpredictable. Recombination is a more global search based on restricted chance. The bias is implicitly given by the population. Recombination only shuffles the substrings contained in the population. The substrings of the optimum have to be present in the population. Otherwise a search by recombination is not able to locate the optimum.
Central themes of plant and animal breeding as well as of genetic algorithms can be phrased in statistical terms and can make substantial use of statistical techniques. In fact, problems of breeding have been the driving forces behind the development of statistics early in this century. The English school of biometry introduced a variety of now standard statistical techniques, including those of correlation and regression. We will use these techniques in order to answer the above questions. The response to selection equation developed in quantitative genetics plays a central role.

The outline of the paper is as follows. In section 2 some popular evolutionary algorithms are surveyed. Truncation selection and proportionate selection are investigated in section 3. In section 4 a fundamental theorem is proven which connects the response to selection equation with parent-offspring regression. Recombination/crossover and mutation are theoretically analyzed in sections 5 and 6. In section 7 mutation vs. crossover is investigated by means of a competition between these two strategies. Then numerical results are given for a test suite of discrete functions.
2 Evolutionary Algorithms
A previous survey of search strategies based on evolution has been done in [20]. Evolutionary algorithms for continuous parameter optimization are surveyed in [4]. Algorithms which are driven mainly by mutation and selection have been developed by Rechenberg [24] and Schwefel [27] for continuous parameter optimization. Their algorithms are called evolution strategies.

$(\mu + \lambda)$ Evolution Strategy
STEP1: Create an initial population of size $\lambda$
STEP2: Compute the fitness $F(x_i)$, $i = 1, \ldots, \lambda$
STEP3: Select the $\mu < \lambda$ best individuals
STEP4: Create $\lambda/\mu$ offspring of each of the $\mu$ individuals by small variation
STEP5: If not finished, return to STEP2

An evolution strategy is a random search which uses selection and variation. The small variation is done by randomly drawing a number from a normal distribution with zero mean. This number is added to the value of the continuous variable. The algorithm adapts the amount of variation by changing the variance of the normal distribution. The most popular algorithm uses $\mu = \lambda = 1$.

In biological terms, evolution strategies model natural evolution by asexual reproduction with mutation and selection. Search algorithms which model sexual reproduction are called genetic algorithms. Sexual reproduction is characterized by recombining two parent strings into an offspring. The recombination is called crossover. Genetic algorithms were invented by Holland [18]. Recent surveys can be found in [14] and the proceedings of the international conferences on genetic algorithms [25] [5] [13].
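The five steps of the evolution strategy listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quadratic test fitness, the fixed step size `sigma`, the initial range and all function names are hypothetical choices made for this example, and the adaptation of the variance mentioned in the text is left out.

```python
import random

def sphere_fitness(x):
    # Hypothetical test fitness (to be maximized); the optimum 0 is at the origin.
    return -sum(v * v for v in x)

def evolution_strategy(fitness, dim, mu, lam, sigma, generations, rng):
    """Follow STEP1-STEP5 with a fixed mutation step size sigma (an assumption)."""
    assert mu < lam and lam % mu == 0
    # STEP1: create an initial population of size lambda.
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(lam)]
    for _ in range(generations):
        # STEP2 and STEP3: compute the fitness and keep the mu best individuals.
        parents = sorted(pop, key=fitness, reverse=True)[:mu]
        # STEP4: each parent creates lambda/mu offspring by adding
        # zero-mean normally distributed numbers to its variables.
        pop = [[v + rng.gauss(0.0, sigma) for v in parent]
               for parent in parents
               for _ in range(lam // mu)]
    # STEP5 is the loop above; report the best individual of the last generation.
    return max(pop, key=fitness)

best = evolution_strategy(sphere_fitness, dim=3, mu=5, lam=20,
                          sigma=0.1, generations=300, rng=random.Random(0))
print(best, sphere_fitness(best))
```

With a fixed step size the search stalls at a distance from the optimum that is of the order of `sigma`, which is one motivation for adapting the variance during the run.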
Genetic Algorithm
STEP0: Define a genetic representation of the problem
STEP1: Create an initial population $P(0) = x_1^0, \ldots, x_N^0$
STEP2: Compute the average fitness $\bar{F} = \sum_i^N F(x_i)/N$. Assign each individual the normalized fitness value $F(x_i^t)/\bar{F}$
STEP3: Assign each $x_i$ a probability $p(x_i, t)$ proportional to its normalized fitness. Using this distribution, select N vectors from P(t). This gives the set S(t)
STEP4: Pair all of the vectors in S(t) at random forming N/2 pairs. Apply crossover with probability $p_{cross}$ to each pair and other genetic operators such as mutation, forming a new population P(t+1)
STEP5: Set t = t + 1, return to STEP2
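One generation of the genetic algorithm above (STEP2 to STEP4) might be sketched as follows. The sketch assumes bitstrings, an even population size N, strictly positive total fitness, uniform crossover and bitwise mutation; the function names and the ONEMAX test fitness used here are choices made for this illustration only.

```python
import random

def onemax(bits):
    # Test fitness: number of 1's in the bitstring.
    return sum(bits)

def uniform_crossover(x, y, rng):
    # Each position is taken from x or y with equal probability.
    return [xi if rng.random() < 0.5 else yi for xi, yi in zip(x, y)]

def mutate(bits, p_m, rng):
    # Flip each bit independently with probability p_m.
    return [1 - b if rng.random() < p_m else b for b in bits]

def ga_generation(pop, fitness, p_cross, p_m, rng):
    # STEP2/STEP3: proportionate selection of N vectors. Weights proportional
    # to F(x_i) give the same distribution as the normalized fitness values.
    weights = [fitness(x) for x in pop]
    selected = rng.choices(pop, weights=weights, k=len(pop))
    # STEP4: pair the selected vectors at random, forming N/2 pairs.
    rng.shuffle(selected)
    new_pop = []
    for a, b in zip(selected[0::2], selected[1::2]):
        if rng.random() < p_cross:
            children = [uniform_crossover(a, b, rng), uniform_crossover(a, b, rng)]
        else:
            children = [list(a), list(b)]
        new_pop.extend(mutate(c, p_m, rng) for c in children)
    return new_pop

rng = random.Random(0)
pop = [[rng.randint(0, 1) for _ in range(20)] for _ in range(100)]
for t in range(30):  # STEP5: repeat
    pop = ga_generation(pop, onemax, p_cross=0.8, p_m=0.01, rng=rng)
print(sum(map(onemax, pop)) / len(pop))
```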
In the simplest case the genetic representation is just a bitstring of length n, the chromosome. The positions of the strings are called loci of the chromosome. The variable at a locus is called gene, its value allele. The set of chromosomes is called the genotype which defines a phenotype (the individual) with a certain fitness. The genetic operator mutation changes each bit of the selected string with a given probability $p_m$. The crossover operator works with two strings. If two strings $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ are given, then the uniform crossover operator [28] combines the two strings as follows:

$$z = (z_1, \ldots, z_n), \qquad z_i = x_i \ \text{or} \ z_i = y_i$$

Normally $x_i$ or $y_i$ are chosen with equal probability. In genetic algorithms many different crossover operators are used. Most popular are one-point and two-point crossover. One or two loci of the string are randomly chosen. Between these loci the parent strings are exchanged. This exchange models crossover of chromosomes found in nature. The disruptive uniform crossover is not used in nature. It can be seen as n-point crossover. The crossover operator links two probabilistically chosen searches. The information contained in two strings is mixed to generate a new string. Instead of crossing-over we prefer to use the general term recombination for any method of combining two or more strings.

A genetic algorithm is a parallel random search with centralized control. The centralized part is the selection schedule. The selection needs the average fitness of the population. The result is a highly synchronized algorithm, which is difficult to implement efficiently on parallel computers. In the parallel genetic algorithm PGA [20], [21], a distributed selection scheme is used. This is achieved as follows. Each individual does the selection by itself. It looks for a partner in its neighborhood only. The set of neighborhoods defines a spatial population structure. The second major change can also easily be understood. Each individual is active and not acted on. It may improve its fitness during its lifetime by performing a local search.
The parallel genetic algorithm PGA can be described as follows:

Parallel Genetic Algorithm
STEP0: Define a genetic representation of the problem
STEP1: Create an initial population and its population structure
STEP2: Each individual does local hill-climbing
STEP3: Each individual selects a partner for mating in its neighborhood
STEP4: An offspring is created with genetic operators working on the genotypes of its parents
STEP5: The offspring does local hill-climbing. It replaces the parent, if it is better than some criterion (acceptance)
STEP6: If not finished, return to STEP3.
It has to be noticed that each individual may use a different local hill-climbing method. This feature will be important for problems where the efficiency of a particular hill-climbing method depends on the problem instance. In the PGA the information exchange within the whole population is a diffusion process because the neighborhoods of the individuals overlap. All decisions are made by the individuals themselves. Therefore the PGA is a totally distributed algorithm without any central control. The PGA models the natural evolution process which self-organizes itself.

The next algorithm, the breeder genetic algorithm BGA [22], is inspired by the science of breeding animals. In this algorithm, each one of a set of virtual breeders has the task to improve its own subpopulation. Occasionally the breeder imports individuals from neighboring subpopulations. The DBGA models rational controlled evolution. We will describe the breeder genetic algorithm only.

Breeder Genetic Algorithm
STEP0: Define a genetic representation of the problem
STEP1: Create an initial population P(0)
STEP2: Each individual may perform local hill-climbing
STEP3: The breeder selects T% of the population for mating. This gives set S(t)
STEP4: Pair all the vectors in S(t) at random forming N pairs. Apply the genetic operators crossover and mutation, forming a new population P(t+1).
STEP5: Set t = t + 1, return to STEP2

The major difference between the genetic algorithm and the breeder genetic algorithm is the method of selection. The breeders have developed many different selection strategies. We only want to mention truncation selection, which breeders usually apply for large populations. In truncation selection the T% best individuals of a population are selected as parents.

The different evolutionary algorithms described above put different emphasis on the three most important evolutionary forces, namely selection, mutation and recombination. In the next sections we will analyze these evolutionary forces by methods developed in quantitative genetics. One of the most important aspects of algorithms inspired by processes found in nature is that they can be investigated by methods which have proven useful in the natural sciences.
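Truncation selection and one generation of a breeder genetic algorithm without mutation and without hill-climbing can be sketched as follows. This is a simplified reading of the BGA steps, not the authors' implementation: the function names, the choice of uniform crossover, and the ONEMAX test fitness are assumptions made for the illustration.

```python
import random

def onemax(bits):
    # Test fitness: number of 1's in the bitstring.
    return sum(bits)

def truncation_selection(pop, fitness, trunc):
    # Keep the best fraction `trunc` of the population (at least two parents).
    k = max(2, int(round(trunc * len(pop))))
    return sorted(pop, key=fitness, reverse=True)[:k]

def bga_generation(pop, fitness, trunc, rng):
    # STEP3: the breeder selects the best individuals as parents.
    parents = truncation_selection(pop, fitness, trunc)
    # STEP4: random pairs of distinct parents, recombined by uniform crossover.
    children = []
    for _ in range(len(pop)):
        a, b = rng.sample(parents, 2)
        children.append([ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)])
    return children

rng = random.Random(0)
pop = [[rng.randint(0, 1) for _ in range(32)] for _ in range(200)]
for t in range(40):
    pop = bga_generation(pop, onemax, trunc=0.5, rng=rng)
print(max(map(onemax, pop)))
```

Unlike proportionate selection, the selection step here depends only on the ranking of the individuals, not on the absolute fitness values.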
3 Natural vs. Artificial Selection
The theoretical analysis of evolution centered in the last 60 years on understanding evolution in a natural environment. It tried to model natural selection. The term natural selection was informally introduced by Darwin in his famous book "On the Origin of Species by Means of Natural Selection". He wrote: "The preservation of favourable variations and the rejection of injurious variations, I call Natural Selection." Modelling natural selection mathematically is difficult. Normally biologists introduce another term, the fitness of an individual, which is defined as the number of offspring of that individual. This fitness definition cannot be used for prediction. It can only be measured after the individual is not able to reproduce any more.

Artificial selection as used by breeders is seldom investigated in textbooks on evolution. It is described in more practical books aimed at breeders. We believe that this is a mistake. Artificial selection is a controlled experiment, like an experiment in physics. It can be used to isolate and understand specific aspects of evolution. Individuals are selected by the breeder according to some trait. In artificial selection, predicting the outcome of a breeding programme plays a major role. Darwin recognized the importance of artificial selection. He devoted the whole first chapter of his book to artificial selection by breeders. In fact, artificial selection independently done by a number of breeders served as a model for natural selection. Darwin wrote: "I have called this principle by the term Natural Selection in order to mark its relation to man's power of selection."

In this section we will first analyze artificial selection by methods found in quantitative genetics [11], [8] and [7]. A mathematically oriented book on quantitative genetics and natural selection is [9]. We will show at the end of this section that natural selection can be investigated by the same methods. A detailed investigation can be found in [23].

3.1 Artificial Selection
The change produced by selection that mainly interests the breeder is the response to selection, which is symbolized by R. R is defined as the difference between the population mean fitness M(t+1) of generation t+1 and the population mean of generation t. R(t) estimates the expected progress of the population.

$$R(t) = M(t+1) - M(t) \qquad (1)$$

Breeders measure the selection with the selection differential, which is symbolized by S. It is defined as the difference between the average fitness of the selected parents and the average fitness of the population.

$$S(t) = M_s(t) - M(t) \qquad (2)$$
These two definitions are very important. They quantify the most important variables. The breeder tries to predict R(t) from S(t). Breeders often use truncation selection or mass selection. In truncation selection with threshold Trunc, the Trunc % best individuals will be selected as parents. Trunc is normally chosen in the range 50% to 10%. The prediction of the response to selection starts with

$$R(t) = b_t \cdot S(t) \qquad (3)$$

$b_t$ is called the realized heritability. The breeder either measures $b_t$ in previous generations or estimates $b_t$ by different methods [23]. It is normally assumed that $b_t$ is constant for a certain number of generations. This leads to

$$R(t) = b \cdot S(t) \qquad (4)$$
There is no genetics involved in this equation. It is simply an extrapolation from direct observation. The prediction of just one generation is only half the story. The breeder (and the GA user) would like to predict the cumulative response $R_n$ for n generations of his breeding scheme.

$$R_n = \sum_{t=1}^{n} R(t) \qquad (5)$$
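The quantities defined in equations (1) to (5) are simple to compute from recorded fitness values. The following sketch uses made-up numbers purely for illustration; the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def selection_differential(population_fitness, selected_fitness):
    # Equation (2): S(t) = M_s(t) - M(t)
    return mean(selected_fitness) - mean(population_fitness)

def response(mean_next, mean_now):
    # Equation (1): R(t) = M(t+1) - M(t)
    return mean_next - mean_now

def realized_heritability(r, s):
    # Equation (3) solved for b_t: b_t = R(t) / S(t)
    return r / s

def cumulative_response(mean_fitness_per_generation):
    # Equation (5): sum of the one-generation responses.
    means = mean_fitness_per_generation
    return sum(response(b, a) for a, b in zip(means, means[1:]))

# Made-up example: the two best of four individuals are selected, and the
# offspring generation is observed to have mean fitness 6.5.
S = selection_differential([2, 4, 6, 8], [6, 8])
R = response(6.5, 5.0)
print(S, R, realized_heritability(R, S))
```

Because the sum in equation (5) telescopes, the cumulative response equals the difference between the last and the first recorded mean fitness.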
In order to compute $R_n$ a second equation is needed. In quantitative genetics, several approximate equations for S(t) are proposed [7], [11]. Unfortunately these equations are only valid for diploid organisms. Diploid organisms have two sets of chromosomes. Most genetic algorithms use one set of chromosomes, i.e. deal with haploid organisms. Therefore, we can only apply the research methods of quantitative genetics, not the results. If the fitness values are normally distributed, the selection differential S(t) in truncation selection is approximately given by

$$S = I \cdot \sigma_p \qquad (6)$$

where $\sigma_p$ is the standard deviation. I is called the selection intensity. The formula is a feature of the normal distribution. A derivation can be found in [7]. In table 1 the relation between the truncation threshold Trunc and the selection intensity I is shown. A decrease from 50% to 1% leads to an increase of the selection intensity from 0.8 to 2.66.
Trunc   80%    50%    40%    20%    10%    1%
I       0.34   0.8    0.97   1.4    1.76   2.66

Table 1. Selection intensity.
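For a standard normal fitness distribution the selection intensity is the mean of the upper tail that is selected, $I = \varphi(z)/T$ with $z = \Phi^{-1}(1 - T)$, where T is the selected fraction. The following sketch evaluates this standard formula so that the entries of Table 1 can be checked; the function name is a choice made for this illustration.

```python
from statistics import NormalDist

def selection_intensity(trunc):
    """Selection intensity for truncation selection on a standard normal
    distribution; trunc is the selected fraction, 0 < trunc < 1."""
    standard_normal = NormalDist()
    z = standard_normal.inv_cdf(1.0 - trunc)   # truncation point
    return standard_normal.pdf(z) / trunc      # mean of the selected upper tail

for trunc in (0.8, 0.5, 0.4, 0.2, 0.1, 0.01):
    print(f"{trunc:5.2f}  {selection_intensity(trunc):.2f}")
```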
If we insert (6) into (4) we obtain the well-known response to selection equation [11].

$$R(t) = b \cdot I \cdot \sigma_p(t) \qquad (7)$$
The science of artificial selection consists of estimating b and $\sigma_p(t)$. The estimates depend on the fitness function. We will use as an introductory example the binary ONEMAX function of size n. Here the fitness is given by the number of 1's in the binary string. We will first estimate b. A popular method for estimation is to make a regression of the midparent fitness value to the offspring. The midparent fitness value is defined as the average of the fitness of the two parents. We assume uniform crossover for recombination. For the simple ONEMAX function a simple calculation shows that the probability of the offspring being better than the midparent is equal to the probability of them being worse. Therefore the average fitness of the offspring will be the same as the average of the midparents. But this means that the average of the offspring is the same as the average of the selected parents. This gives b = 1 for ONEMAX.

Estimating $\sigma_p(t)$ is more difficult. We make the assumption that uniform crossover is a random process which creates a binomial fitness distribution with probability p(t). p(t) is the probability that there is a 1 at a locus. Therefore the standard deviation is given by

$$\sigma_p(t) = \sqrt{n \cdot p(t) \cdot (1 - p(t))} \qquad (8)$$
Theorem 1. If the population is large enough that it converges to the optimum and if the selection intensity I is greater than 0, then the response to selection is given for the ONEMAX function by

$$R(t) = I \sqrt{n \cdot p(t)(1 - p(t))} \qquad (9)$$

The number of generations needed until equilibrium is approximately

$$GEN_e = \left(\frac{\pi}{2} - \arcsin(2p_0 - 1)\right) \cdot \frac{\sqrt{n}}{I} \qquad (10)$$

$p_0 = p(0)$ denotes the probability of the advantageous bit in the initial population.
Proof. Noting that $R(t) = n(p(t+1) - p(t))$ we obtain the difference equation

$$p(t+1) - p(t) = \frac{I}{\sqrt{n}} \sqrt{p(t)(1 - p(t))} \qquad (11)$$

The difference equation can be approximated by a differential equation

$$\frac{dp(t)}{dt} = \frac{I}{\sqrt{n}} \sqrt{p(t)(1 - p(t))} \qquad (12)$$

The initial condition is $p(0) = p_0$. The solution of the differential equation is given by

$$p(t) = 0.5 \left(1 + \sin\left(\frac{I}{\sqrt{n}}\, t + \arcsin(2p_0 - 1)\right)\right) \qquad (13)$$

The convergence of the total population is characterized by $p(GEN_e) = 1$. $GEN_e$ can be easily computed from the above equation. One obtains

$$GEN_e = \left(\frac{\pi}{2} - \arcsin(2p_0 - 1)\right) \cdot \frac{\sqrt{n}}{I} \qquad (14)$$
□

The number of generations needed until convergence is proportional to $\sqrt{n}$ and inversely proportional to the selection intensity. Note that the equations are only valid if the size of the population is large enough so that the population converges to the optimum. The most efficient breeder genetic algorithm runs with the minimal popsize N*, so that the population still converges to the optimum. N* depends on the size of the problem n, the selection intensity I and the probability of the advantageous bit $p_0$. This problem will be discussed in section 5.

Remark: The above theorem assumes that the fitness is binomially distributed. Simulations show that the phenotypic variance is slightly less than given by the binomial distribution. The empirical data is better fitted if the binomial variance is reduced by a factor $\pi/4.3$. Using this variance one obtains the equations

$$R(t) = \frac{\pi}{4.3} \cdot I \sqrt{n \cdot p(t)(1 - p(t))} \qquad (15)$$

$$GEN_e = \frac{4.3}{\pi} \left(\frac{\pi}{2} - \arcsin(2p_0 - 1)\right) \cdot \frac{\sqrt{n}}{I} \qquad (16)$$
Equation 15 is a good prediction for the mean fitness of the population. This is demonstrated in figure 1. The mean fitness versus the number of generations is shown for three popsizes N = 1024, 256, 64. The selection intensity is I = 0.8, the size of the problem n = 64. The initial population was generated with p0 = 1/64. The fit of equation 15 and the simulation run with N = 1024 is very good. For N = 256 and N = 64 the population does not converge to the optimum. These popsizes are less than the critical popsize N*(I, n, po). A more detailed evaluation of equation 15 can be found in [23].
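The prediction of Theorem 1 can be explored numerically. The sketch below iterates the difference equation (11), evaluates the closed-form solution (13), and computes $GEN_e$ from equation (10); it uses the uncorrected binomial variance, and the function names are choices made for this illustration.

```python
import math

def gen_e(n, intensity, p0):
    # Equation (10): generations until equilibrium.
    return (math.pi / 2 - math.asin(2 * p0 - 1)) * math.sqrt(n) / intensity

def p_closed_form(t, n, intensity, p0):
    # Equation (13); the phase is capped at pi/2, i.e. p stays at 1 after GEN_e.
    phase = intensity * t / math.sqrt(n) + math.asin(2 * p0 - 1)
    return 0.5 * (1 + math.sin(min(phase, math.pi / 2)))

def p_difference_equation(generations, n, intensity, p0):
    # Equation (11), iterated generation by generation and capped at p = 1.
    p = p0
    history = [p]
    for _ in range(generations):
        p = min(1.0, p + intensity / math.sqrt(n) * math.sqrt(p * (1 - p)))
        history.append(p)
    return history

n, intensity, p0 = 64, 0.8, 0.5
print("GEN_e =", gen_e(n, intensity, p0))
for t, p in enumerate(p_difference_equation(16, n, intensity, p0)):
    print(t, round(n * p, 2), round(n * p_closed_form(t, n, intensity, p0), 2))
```

The printed columns are the predicted mean fitness $n \cdot p(t)$ from the difference equation and from the differential equation; for these parameters the two stay close to each other.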
Fig. 1. Mean fitness for theory and simulations for various N (vertical axis: MeanFit; horizontal axis: Gen; curves: Theory, Simulation (N=1024), Simulation (N=256), Simulation (N=64))

3.2 Natural Selection
Natural selection is modelled by proportionate selection in quantitative genetics. Proportionate selection is defined as follows. Let $0 \le p_i(t) \le 1$ be the proportion of genotype i in a population of size N at generation t, $F_i$ its fitness. Then the phenotype distribution of the selected parents is given by

$$p_{i,s}(t) = \frac{p_i(t) F_i}{M(t)} \qquad (17)$$

where M(t) is the average fitness of the population

$$M(t) = \sum_{i=1}^{N} p_i(t) F_i \qquad (18)$$
Note that proportionate selection is also used by the simple genetic algorithm [14].

Theorem 2. In proportionate selection the selection differential is given by

$$S(t) = \frac{\sigma_p^2(t)}{M(t)} \qquad (19)$$

For the ONEMAX function of size n the response to selection is given by

$$R(t) = 1 - p(t) \qquad (20)$$

If the population is large enough, the number of generations until $p(t) = 1 - \epsilon$ is given for large n by

$$GEN_{1-\epsilon} \approx n \cdot \ln \frac{1 - p_0}{\epsilon} \qquad (21)$$

$p_0$ is the probability of the advantageous allele in the initial population.
Proof.

$$S(t) = \sum_{i=1}^{N} p_{i,s}(t) F_i - M(t) = \sum_{i=1}^{N} \frac{p_i(t) F_i^2 - p_i(t) M^2(t)}{M(t)} = \frac{1}{M(t)} \sum_{i=1}^{N} p_i(t) \left(F_i - M(t)\right)^2$$

For ONEMAX(n) we have $R(t) = S(t)$. Furthermore we approximate

$$\sigma_p^2(t) \approx n\, p(t)(1 - p(t)) \qquad (22)$$

Because $M(t) = n\, p(t)$, equation 20 is obtained. From $R(t) = n(p(t+1) - p(t))$ we get the difference equation

$$p(t+1) = \frac{1}{n} + \left(1 - \frac{1}{n}\right) p(t) \qquad (23)$$

This equation has the solution

$$p(t) = \frac{1}{n}\left(1 + \left(1 - \frac{1}{n}\right) + \ldots + \left(1 - \frac{1}{n}\right)^{t-1}\right) + \left(1 - \frac{1}{n}\right)^{t} p_0$$

This equation can be simplified to

$$p(t) = 1 - \left(1 - \frac{1}{n}\right)^{t} (1 - p_0)$$

By setting $p(GEN_{1-\epsilon}) = 1 - \epsilon$, equation 21 is easily obtained. □

Remark: If we assume R(t) = S(t) we obtain from equation 19 a version of Fisher's fundamental theorem of natural selection [12] [9].

By comparing truncation selection and proportionate selection one observes that proportionate selection gets weaker when the population approaches the optimum. An infinite population will need an infinite number of generations for convergence. In contrast, with truncation selection the population will converge in at most $O(\sqrt{n})$ generations independent of the size of the population. Therefore truncation selection as used by breeders is much more effective than proportionate selection for optimization.

The major results of these investigations can be summarized as follows. A genetic algorithm using recombination/crossover only is most efficient if run with the minimal population size N* so that the population converges to the optimum. Proportionate selection as used by the simple genetic algorithm is inefficient.
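The formulas of Theorem 2 can be checked numerically. The sketch below iterates the difference equation (23), compares it with the simplified closed form for p(t), and evaluates the approximation (21); the function names and the parameter values are choices made for this illustration.

```python
import math

def p_proportionate(t, n, p0):
    # Closed form: p(t) = 1 - (1 - 1/n)^t (1 - p0)
    return 1 - (1 - 1 / n) ** t * (1 - p0)

def iterate_difference_equation(t, n, p0):
    # Equation (23): p(t+1) = 1/n + (1 - 1/n) p(t)
    p = p0
    for _ in range(t):
        p = 1 / n + (1 - 1 / n) * p
    return p

def generations_until(n, p0, eps):
    # Equation (21): generations until p(t) = 1 - eps, for large n.
    return n * math.log((1 - p0) / eps)

n, p0, eps = 64, 0.5, 0.01
t_star = generations_until(n, p0, eps)
print("generations:", t_star)
print("1 - p at that time:", 1 - p_proportionate(t_star, n, p0))
```

For n = 64, $p_0 = 0.5$ and $\epsilon = 0.01$ the approximation gives roughly 250 generations, whereas equation (10) gives about 16 generations for truncation selection with I = 0.8 on the same problem. This illustrates the comparison made in the text.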
4 Statistics and Genetics
Central themes of plant and animal breeding as well as of genetic algorithms can be phrased in statistical terms and can make substantial use of statistical techniques. In fact, problems of breeding have been the driving forces behind the development of statistics early in this century. The English school of biometry introduced a variety of now standard statistical techniques, including those of correlation and regression. In this section we will only prove the fundamental theorem, which connects the rather artificial factor b(t) with the well known regression coefficient of parent-offspring.
Theorem 3. Let $X(t) = (x_1(t), \ldots, x_N(t))$ be the population at generation t, where $x_i$ denotes the phenotypic value of individual i. Assume that an offspring generation $X'(t+1)$ is created by random mating, without selection. If the regression equation

$$x'_{ij}(t+1) = a(t) + b_{x'x}(t) \cdot \frac{x_i(t) + x_j(t)}{2} + \epsilon_{ij} \qquad (24)$$

with $E(\epsilon_{ij}) = 0$ is valid, where $x'_{ij}$ is the offspring of $x_i$ and $x_j$, then

$$b_{x'x}(t) \approx b(t) \qquad (25)$$

Proof. From the regression equation we obtain for the averages

$$E(x'(t+1)) = a(t) + b_{x'x}(t) M(t)$$

Because the offspring generation is created by random mating without selection, the expected average fitness remains constant

$$E(x'(t+1)) = M(t)$$

Let us now select a subset $X_s(t) \subset X(t)$ as parents. The parents are randomly mated, producing the offspring generation X(t+1). If the subset $X_s(t)$ is large enough, we may use the regression equation and get for the averages

$$E(x(t+1)) = a(t) + b_{x'x}(t) M_s(t)$$

Subtracting the above equations, and using $E(x(t+1)) = M(t+1)$ and $S(t) = M_s(t) - M(t)$, we obtain

$$M(t+1) - M(t) = b_{x'x}(t) S(t)$$

□
For the proof we have used some additional statistical assumptions. It is outside the scope of this paper to discuss these assumptions in detail. The problem of computing a good regression coefficient is solved by the theorem of Gauss-Markov. The proof can be found in any textbook on statistics.

Theorem 4. A good estimate for the regression coefficient is given by

$$b_{x'x}(t) = 2 \cdot \frac{\mathrm{cov}(x'(t), x(t))}{\mathrm{var}(x(t))} \qquad (26)$$
These two theorems allow the estimation of the factor b(t) without doing a selection experiment. In quantitative genetics b(t) is called the heritability of the trait to be optimized. We have shown in [23] how to apply these theorems to the breeder genetic algorithm.
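As an illustration of equation (26), the following sketch estimates the regression coefficient for ONEMAX under random mating with uniform crossover. It reads x(t) in equation (26) as the fitness of one of the two parents, which is an interpretation and not stated explicitly in the text; the function names and parameters are hypothetical. Under these assumptions the estimate should come out close to the value b = 1 derived for ONEMAX in section 3.1.

```python
import random

def covariance(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def estimate_heritability(n_bits, n_pairs, p, rng):
    """Estimate b = 2 cov(offspring, parent) / var(parent) for ONEMAX
    with random mating and uniform crossover (no selection)."""
    parent_fitness, offspring_fitness = [], []
    for _ in range(n_pairs):
        a = [1 if rng.random() < p else 0 for _ in range(n_bits)]
        b = [1 if rng.random() < p else 0 for _ in range(n_bits)]
        child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
        parent_fitness.append(sum(a))       # fitness of one parent
        offspring_fitness.append(sum(child))
    return (2 * covariance(offspring_fitness, parent_fitness)
            / covariance(parent_fitness, parent_fitness))

print(estimate_heritability(n_bits=32, n_pairs=4000, p=0.5, rng=random.Random(1)))
```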
5 Analysis of recombination and selection
In this section we will make a detailed analysis of selection and crossover by simulations. First we will explain the performance of the crossover operator in finite populations by a diagram. We will use ONEMAX as fitness function. In figure 2 the number of generations $GEN_e$ until equilibrium and the size of the population are displayed. At equilibrium the whole population consists of one genotype only. The initial population was randomly generated with probability $p_0 = 0.2$ of the advantageous allele. The data are averages over 100 runs.
Fig. 2. $GEN_e$ vs. population size N for $p_0 = 0.2$ and $p_0 = 0.5$

The figure can be divided into three areas. The first area we name saturation region. The population size is large enough so that the population converges to the optimum value. In this area $GEN_e$ is constant. This is an important result, because it is commonly believed in population genetics that $GEN_e$ increases with the population size [19]. This is only the case in the second region. Here the population size is too small. The population does not converge to the optimum. $GEN_e$ increases with the population size because the quality of the final solution gets better.

The two regions are separated by the critical population size N*. It is the minimal population size so that the population converges to the optimum. N* depends on the selection intensity I, the size of the problem and the initial population. The relation between N* and I is especially difficult. N* increases for small selection intensities I and for large ones. The increase for large I can be easily understood. If only one individual is selected as parent, then the population converges in one generation. In this case the genotype of the optimum has to be contained in the initial population. So the population size has to be very large.

The increase of N* with small selection intensity is more difficult to understand. It is related to genetic drift. It has been known for quite some time that the population also converges without any kind of selection, just because of random sampling in a finite population. In [1] it has been shown that $GEN_e$ increases proportionally to the size of the population N and to the logarithm of the size of the problem n. Thus $GEN_e$ is surprisingly small. This important result demonstrates that chance alone is sufficient to drive a finite population to an equilibrium. The formula has been proven for one gene in [9]. It led to the development of the neutral theory of evolution [19]. This theory states that many aspects of natural evolution can be explained by neutral mutations which got fixed because of the finite population size. Selection seems to be not as important as previously thought for explaining natural evolution.
We are now able to understand why N* has to increase for small selection intensities. The population will converge in a number of generations proportional to the size of the population. Therefore the size of the population has to be large enough that the best genotype is randomly generated during this time. From $GEN_e$ the number of trials till convergence can be easily computed by

$$FE_e = N \cdot GEN_e$$

In order to minimize $FE_e$, the BGA should be run with the minimal popsize $N^*(I, n, p_0)$. The problem of predicting N* is very difficult because the transition from region 2 to the saturation region is very slow. In this paper we will only make a qualitative comparison of mutation and crossover. Therefore a closed expression for N* is not needed. In [23] some formulas for N* are derived.

The major results of this section can be summarized as follows: A genetic algorithm with recombination/crossover is only effective in large populations. It runs most efficiently with the critical population size $N^*(I, n, p_0)$. The response to selection can be accurately predicted for the saturation region.
6
Analysis of Mutation
The mutation operator in small populations is well understood. The analysis of mutation in large populations is more difficult. In principle it is just a problem of statistics: doing N trials in parallel instead of in sequence. But selection converts the problem into a nonstandard statistical problem. We will solve this problem by an extension of the response to selection equation. In [21] we have computed the probability of a successful mutation for a single individual. From this analysis the optimal mutation rate has been obtained. The optimal mutation rate maximizes the probability of a success. We just state the most important results. Theorem 5. For the ONEMAX function of size n the optimal mutation rate m is inversely proportional to the size of the problem:
$$m = \frac{1}{n}$$
This important result has been independently discovered several times. The implications of this result for biology and for evolutionary algorithms were first investigated by Bremermann [6]. The performance of crossover was measured by $GEN_e$, the number of generations until equilibrium. This measure cannot be used for mutation because the population will never converge to a unique genotype. Therefore we will use $GEN_{opt}$ as performance measure for mutation. It is defined as the average number of generations till the optimum has been found for the first time. For a population with two individuals (one parent and one offspring) $GEN_{opt}$ has been computed by a Markov chain analysis [21]. In this case $GEN_{opt}$ is equal to $FE_{opt}$, the number of trials to reach the optimum. Theorem 6. Let $p_0$ be the probability of the advantageous allele in the initial string. Then the (1+1) evolutionary algorithm needs on the average the following number of trials $FE_{opt}$
$$FE_{opt} = e \cdot n \sum_{j=1}^{(1-p_0)n} \frac{1}{j} \tag{27}$$
to reach the optimum. The mutation rate is set to m = 1/n. Proof. We only sketch the proof. Let the given string have one incorrect bit left. Then the probability of switching this bit while leaving all other bits unchanged is given by
$$s_1 = m (1 - m)^{n-1} \approx e^{-1} m \tag{28}$$
The number of trials to obtain the optimum is given by e · 1/m. Similarly, if two bits are incorrect, then the number of trials needed to get one bit correct is given by e/2 · 1/m. The total number is obtained by summation. □
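The summation step of the proof sketch can be written out explicitly. The lines below only restate the sketch under its own approximation (a string with i incorrect bits is improved with a probability of about i · m / e per trial); the index i for the number of incorrect bits is notation introduced here for illustration.

```latex
\begin{align*}
FE_{opt} &\approx \sum_{i=1}^{(1-p_0)n} \frac{e}{i \cdot m}
          = \frac{e}{m} \sum_{i=1}^{(1-p_0)n} \frac{1}{i} \\
         &= e \cdot n \sum_{i=1}^{(1-p_0)n} \frac{1}{i}
          \qquad \text{for } m = \frac{1}{n}
\end{align*}
```

The upper limit (1 - p_0) n is the expected number of incorrect bits in the initial string.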
For $0 < p_0 < 0.9$ the above equation can be approximated by
$$FE_{opt} = e \cdot n \cdot \ln((1 - p_0) n) \tag{29}$$
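The two expressions for $FE_{opt}$ can be compared with a small simulation. The following is a minimal sketch, not the code used for the simulations in [21]: the function names are hypothetical, the initial bits are drawn independently with probability $p_0$, and an offspring replaces the parent when it is at least as good (the treatment of ties is an assumption).

```python
import math
import random


def fe_opt_sum(n, p0):
    """Equation 27: e * n * sum_{j=1}^{(1-p0)n} 1/j."""
    wrong = round((1 - p0) * n)  # expected number of incorrect bits at the start
    return math.e * n * sum(1.0 / j for j in range(1, wrong + 1))


def fe_opt_log(n, p0):
    """Equation 29: e * n * ln((1-p0) n)."""
    return math.e * n * math.log((1 - p0) * n)


def one_plus_one_ea(n, p0, rng):
    """Run a (1+1) EA on ONEMAX; return the number of trials (offspring evaluated)."""
    m = 1.0 / n  # mutation rate of Theorem 6
    parent = [1 if rng.random() < p0 else 0 for _ in range(n)]
    fitness = sum(parent)
    trials = 0
    while fitness < n:
        # Flip every bit independently with probability m.
        child = [1 - b if rng.random() < m else b for b in parent]
        trials += 1
        child_fitness = sum(child)
        if child_fitness >= fitness:  # assumption: ties are accepted
            parent, fitness = child, child_fitness
    return trials


def mean_trials(n, p0, runs, seed=0):
    """Average number of trials over several independent runs."""
    rng = random.Random(seed)
    return sum(one_plus_one_ea(n, p0, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    n, p0 = 64, 0.5
    print("equation 27:", fe_opt_sum(n, p0))
    print("equation 29:", fe_opt_log(n, p0))
    print("simulation :", mean_trials(n, p0, runs=100))
```

Both formulas rest on the approximations of the proof sketch, so the simulated average should be expected to agree with them only roughly.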
We have confirmed the formula by intensive simulations [21]. Recently Bäck [2] has shown that $FE_{opt}$ can be only marginally reduced if a theoretically optimal variable mutation rate is used. This mutation rate depends on the number of bits which are still wrong. This result has been predicted in [21]. Mutation spends most of the time in adjusting the very last bits. But in this region the optimal mutation rate is m = 1/n. Next we will extend the analysis to large populations. First we will use simulation results. In figure 3 the relation between $GEN_{opt}$, $FE_{opt}$, and the popsize N is displayed for two selection methods. The selection thresholds are T = 50% and the smallest one possible, T = 1/N. In the latter case only the best individual is selected as parent. In large populations the strong selection outperforms the fixed selection scheme by far. These results can easily be explained. The mutation operator will change one bit on the average. The probability of a success decreases as the population comes nearer to the optimum. Therefore the best strategy is to take just the best individual as parent of the next generation.
[Figure: two panels plotted against the population size N, generations (Gen) and function evaluations (FE), each with curves for T = 0.5 and T = 1/N]
Fig. 3. $GEN_{opt}$ and function evaluations (FE) for various N and different T
From $GEN_{opt}$ the expected number of trials needed to find the optimum can be computed as
$$FE_{opt} = N \cdot GEN_{opt}$$
For both selection methods, $FE_{opt}$ increases linearly with N for large N. The increase is much smaller for the strong selection. The smallest number of function evaluations is obtained for N = 1, 2, 4. We now turn to the theoretical analysis. It depends on an extension of the response to selection equation.
Theorem 7. Let $u_t$ be the probability of a mutation success, imp the average improvement of a successful mutation. Let $v_t$ be the probability that the offspring is worse than the parent, red the average reduction of the fitness. Then the response to selection for small mutations in large populations is given by
$$R(t) = S(t) + u_t \cdot imp - v_t \cdot red \tag{30}$$
S(t) is the selection differential, the difference between the average fitness of the selected parents and the average fitness of the population. Proof. Let $M_s(t)$ be the average of the selected parents. Then
$$M(t+1) = u_t (M_s(t) + imp) + v_t (M_s(t) - red) + (1 - u_t - v_t) M_s(t)$$
Subtracting M(t) from both sides of the equation, with $R(t) = M(t+1) - M(t)$ and $S(t) = M_s(t) - M(t)$, we obtain the theorem. □ The response to selection equation for mutation contains no heritability. Instead there is an offset, defined by the difference of the probabilities of getting better or worse. The importance of $u_t$ and $v_t$ has been independently discovered by Schaffer et al. [26]. They did not use the difference of the probabilities, but the quotient, which they called the safety factor
$$F = \frac{u_t}{v_t}$$
In order to apply the theorem we have to estimate S(t), $u_t$ and $v_t$. The last two variables can be estimated by using the results of [21]. The estimation needs the average number i of wrong bits of the parent strings as input. But i can be easily transformed into a variable depending on the state of the population at generation t. This variable is the marginal probability p(t) that there is the advantageous allele at a locus. p(t) was already used in the previous theorems. i and p(t) are connected by
$$i \approx n \cdot (1 - p(t)) = n - M(t) \tag{31}$$
We have not been able to estimate S(t) analytically. For the next result we have used simulations. Therefore we call it an empirical law. Empirical Law 1. For the ONEMAX function, a truncation threshold of T = 50%, a mutation rate of m = 1/n, and n >> 1 the response to selection of a large population changing by mutation is approximately
$$R(t) = 1 + (1 - p(t)) e^{-p(t)} - p(t) e^{-(1 - p(t))} \tag{32}$$
Proof. Let the parents have i bits wrong, let $s_i$ be the probability of a success by mutation, and $f_i$ the probability of a defect mutation. $s_i$ is approximately given by the product of the probability of changing at least one of the wrong bits and the probability of not changing any of the correct bits [21]. Therefore
$$s_i = (1 - (1 - m)^i)(1 - m)^{n-i}$$
Similarly
$$f_i = (1 - (1 - m)^{n-i})(1 - m)^i$$
From equation 31 and $1 - (1 - m)^i \approx i \cdot m$ we obtain
$$s_t = (1 - p(t)) \left(1 - \frac{1}{n}\right)^{n p(t)}$$
$$f_t = p(t) \left(1 - \frac{1}{n}\right)^{n (1 - p(t))}$$
Because $(1 - \frac{1}{n})^n \approx e^{-1}$ we get
$$s_t = (1 - p(t)) e^{-p(t)}$$
$$f_t = p(t) e^{-(1 - p(t))}$$
We are left with the problem of estimating imp and red. In a first approximation we set both to 1 because a mutation rate of m = 1/n changes one bit on the average. We have not been able to estimate S(t) analytically. Simulations show that for T = 50% S(t) decreases from about 1.15 at the beginning to about 0.9 at $GEN_{opt}$. Therefore S(t) = 1 is a reasonable approximation. This completes the proof. □
Equation 32 defines a difference equation for p(t + 1). We did not succeed in solving it analytically. We have found that the following linear approximation gives almost the same results. Empirical Law 2. Under the assumptions of empirical law 1 the response to selection can be approximated by
$$R(t) = 2 - 2 p(t) \tag{33}$$
The number of generations until $p(t) = 1 - \epsilon$ is reached is given by
$$GEN_{1-\epsilon} \approx \frac{n}{2} \ln \frac{1 - p_0}{\epsilon} \tag{34}$$
Proof. The proof is identical to the proof of theorem 2. In figure 4 the development of the mean fitness is shown. The simulations have been done with two popsizes (N = 1024, 64) and two mutation rates (m = 1/n, 4/n). The agreement between the theory and the simulation is very good. The evolution of the mean fitness of the large population and the small population is almost equal. This demonstrates that for mutation a large population is inefficient. A large mutation rate has an interesting effect. The mean fitness increases faster at the beginning, but the population never finds the optimum. This observation again suggests using a variable mutation rate. But we have already mentioned that the increase in performance from using a variable mutation rate will be rather small. Mutation spends most of its time in getting the last bits correct. But in this region a mutation rate of m = 1/n is optimal. The major results of this section can be summarized as follows: Mutation in large populations is not effective. It is more efficient with very strong selection. The response to selection becomes very small when the population is approaching the optimum. The efficiency of the mutation operator critically depends on the mutation rate.
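The difference equation of empirical law 1 can be iterated numerically and compared with the linear approximation of empirical law 2 and the closed form of equation 34. The sketch below assumes M(t) = n · p(t), which follows from equation 31, so that p(t + 1) = p(t) + R(t)/n; the function names are hypothetical.

```python
import math


def response_law1(p):
    """Equation 32: response to selection under empirical law 1."""
    return 1.0 + (1.0 - p) * math.exp(-p) - p * math.exp(-(1.0 - p))


def response_law2(p):
    """Equation 33: linear approximation of the response to selection."""
    return 2.0 - 2.0 * p


def generations_until(response, n, p0, eps, max_gens=100_000):
    """Iterate p(t+1) = p(t) + R(t)/n until p(t) >= 1 - eps; return t."""
    p, t = p0, 0
    while p < 1.0 - eps and t < max_gens:
        p += response(p) / n
        t += 1
    return t


def gen_closed_form(n, p0, eps):
    """Equation 34: (n/2) * ln((1 - p0) / eps)."""
    return 0.5 * n * math.log((1.0 - p0) / eps)


if __name__ == "__main__":
    n, p0, eps = 64, 0.5, 1.0 / 64
    print("law 1 (iterated):", generations_until(response_law1, n, p0, eps))
    print("law 2 (iterated):", generations_until(response_law2, n, p0, eps))
    print("equation 34     :", gen_closed_form(n, p0, eps))
```

The choice eps = 1/n in the usage example corresponds to one wrong bit per string on average and is only an illustration.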
[Figure: mean fitness (MeanFit) versus generation (Gen); curves for the theory and for simulations with N = 1024 and N = 64, each with M = 1/n and M = 4/n]
Fig. 4. Mean fitness for theory and simulations for various N and mutation probabilities
7
Competition between Mutation and Crossover
The previous sections have qualitatively shown that the crossover operator and the mutation operator perform well in different regions of the parameter space of the BGA. In figure 5 crossover and mutation are compared quantitatively for a popsize of N = 1024. The initial population was generated with $p_0$ = 1/64. The mean fitness of the population with mutation is larger than that of the population with crossover until generation 18. Afterwards the population with crossover performs better. This was predicted by the analysis.
[Figure: mean fitness (MeanFit) versus generation (Gen) for crossover and mutation]
Fig. 5. Comparison of mutation and crossover
The question now arises how to best combine mutation and crossover. This can be done by at least two different methods. First, one can try to use both operators in a single genetic algorithm with their optimal parameter settings. This means that a good mutation rate and a good population size have to be predicted. This method is used for the standard breeder genetic algorithm BGA. Results for popular test functions will be given later. Another method is to apply a competition between subpopulations using different strategies. Such a competition is in the spirit of population dynamics. It is the foundation of the Distributed Breeder Genetic Algorithm. Competition of strategies can be done on different levels, for example the level of the individuals, the level of subpopulations or the level of populations. Bäck et al. [3] have implemented the adaptation of strategy parameters on the individual level. The strategy parameters of the best individuals are recombined, giving the new stepsize for the mutation. Herdy [17] uses a competition on the population level. In this case whole populations are evaluated at certain intervals. The strategies of the successful populations proliferate, strategies in populations with bad performance die out. Our adaptation lies between these two extreme cases. The competition is done between subpopulations. Competition requires a quality criterion to rate a group, a gain criterion to reward or punish the groups, an evaluation interval, and a migration interval. The evaluation interval gives each strategy the chance to demonstrate its performance in a certain time window. By occasional migration of the best individuals, groups which performed badly are given a better chance for the next competition. The sizes of the subgroups have a lower limit. Therefore no strategy is lost. The rationale behind this algorithm will be published separately. In the experiments the mean fitness of the species was used as quality criterion.
The isolation interval was four generations, the migration interval eight generations. The gain was four individuals. In the case of two groups the population size of the better group increases by four, the population size of the worse group decreases by four. If there are more than two groups competing, then a proportional rating is used. Figure 6 shows a competition race between two groups, one using mutation only, the other crossover only. The initial population was randomly generated with $p_0$ = 1/64. The initial population is far away from the optimum. Therefore first the population using mutation only grows, then the crossover population takes over. The first figure shows the mean fitness of the two groups. The migration strategy ensures that the mean fitness values of both populations are almost equal. In figure 7 competition is done between three groups using different mutation rates. At the beginning the group with the highest mutation rate grows, then both the middle and the lowest mutation rate grow. At the end the lowest mutation rate takes over. These experiments confirm the results of the previous sections. In the next section we will compare the efficiency of a BGA using mutation, crossover and an optimal combination of both.
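The gain rule for two competing groups can be sketched as follows. This is only an illustration of the rule as described in the text, not the implementation of the Distributed Breeder Genetic Algorithm: the function name, the value of the lower size limit, and the handling of ties are assumptions, and the proportional rating for more than two groups is not covered because the text does not specify it.

```python
def apply_gain(sizes, qualities, gain=4, min_size=4):
    """Two-group competition step.

    sizes     -- current sizes of the two subpopulations
    qualities -- quality criterion of each group (e.g. its mean fitness)
    gain      -- individuals moved from the worse to the better group
    min_size  -- lower limit of a group size (value assumed for illustration)
    """
    if len(sizes) != 2 or len(qualities) != 2:
        raise ValueError("this sketch only covers the two-group case")
    if qualities[0] == qualities[1]:
        return list(sizes)  # assumption: no transfer on a tie
    better = 0 if qualities[0] > qualities[1] else 1
    worse = 1 - better
    # Never shrink the worse group below the lower limit, so no strategy is lost.
    moved = min(gain, max(0, sizes[worse] - min_size))
    new_sizes = list(sizes)
    new_sizes[better] += moved
    new_sizes[worse] -= moved
    return new_sizes


if __name__ == "__main__":
    sizes = [32, 32]                       # mutation group, crossover group
    sizes = apply_gain(sizes, [12.0, 10.5])
    print(sizes)
```

The total population size stays constant because every individual removed from one group is added to the other.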
[Figure: two panels for ONEMAX, n = 64, plotted against generation (Gen): mean fitness (MeanFit) and group size (N) of the mutation group and the crossover group]
Fig. 6. Competition between mutation and crossover
[Figure: two panels for ONEMAX, n = 64, plotted against generation (Gen): mean fitness (MeanFit) and group size (N) for the mutation rates p = 1/n, p = 4/n and p = 16/n]
Fig. 7. Competition between different mutation rates
8
The Test Functions
The outcome of a comparison of mutation and crossover depends on the fitness landscape. Therefore a carefully chosen set of test functions is necessary. We will use test functions which we have theoretically analyzed in [21]. They are similar to the test functions used by Schaffer [26]. The test suite consists of
ONEMAX(n)
MULTIMAX(n)
PLATEAU(k,l)
SYMBASIN(k,l)
DECEPTION(k,l)
The fitness of ONEMAX is given by the number of 1's in the string. MULTIMAX(n) is similar to ONEMAX, but its global optima have exactly n/2 1's contained in the string. It is defined as follows
MULTIMAX(n, X)
We have included the MULTIMAX(n) function in the test suite to show the dependence of the performance of the crossover operator on the fitness function. MULTIMAX(n) poses no difficulty for mutation. Mutation will find one of the many global optima in O(n) time. But crossover has difficulties when two different optimal strings are recombined. This will lead with high probability to a worse offspring. An example is shown below for n = 4
1100 ⊗ 0011
With probability P = 10/16 crossover will create an offspring worse than the midparent. The average fitness of an offspring is 3/2. Therefore the population will need many generations in order to converge. More precisely: The number of generations between the time when an optimum is first found and the convergence of the whole population is very high. MULTIMAX is equal to ONEMAX away from the global optima. In this region the heritability is one. When the population approaches the optima, the heritability drops sharply to zero. The response to selection is almost 0. For the PLATEAU function k bits have to be flipped in order that the fitness increases by k. The DECEPTION function has been defined by Goldberg [16]. The fitness of DECEPTION(k,l) is given by the sum of l deceptive functions of size k. A deceptive function and a smoothed version of order k = 3 are defined in the following table
bit   DECEP   SYMBA      bit   DECEP   SYMBA
111     30      30       100     14      14
101      0      26       010     22      22
110      0      22       001     26      26
011      0      14       000     28      28
A DECEPTION function has $2^l$ local maxima. Neighboring maxima are k bits apart. Their fitness values differ by two. The basin of attraction of the global optimum is of size $k^l$, the basin of attraction of the smallest optimum is of size $(2^k - 1)^l$. The DECEPTION function is called deceptive because the search is misled to the wrong maximum (0, 0, ..., 0). The global optimum is particularly isolated. The SYMBASIN(k,l) function is like a deceptive function, but the basins of attraction of the two peaks are equal.
In the simulations we used the values given in the above table for SYMBA.
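The table-based test functions can be written down directly. The following is a minimal sketch under stated assumptions: strings are represented as Python `str` objects of '0' and '1', the l blocks of k = 3 bits are taken as contiguous, and all function names are hypothetical. The last helper enumerates the offspring of uniform crossover for the n = 4 example above, relying only on the stated property that the global optima of MULTIMAX have exactly n/2 1's.

```python
from itertools import product

# Fitness of one 3-bit block, copied from the table (columns DECEP and SYMBA).
DECEP = {"111": 30, "101": 0, "110": 0, "011": 0,
         "100": 14, "010": 22, "001": 26, "000": 28}
SYMBA = {"111": 30, "101": 26, "110": 22, "011": 14,
         "100": 14, "010": 22, "001": 26, "000": 28}


def onemax(bits):
    """ONEMAX: number of 1's in the string."""
    return bits.count("1")


def blockwise(table, bits, k=3):
    """Sum the table values of consecutive k-bit blocks (contiguity is assumed)."""
    if len(bits) % k != 0:
        raise ValueError("string length must be a multiple of k")
    return sum(table[bits[i:i + k]] for i in range(0, len(bits), k))


def deception(bits):
    """DECEPTION(3, l) for a string of 3*l bits."""
    return blockwise(DECEP, bits)


def symbasin(bits):
    """SYMBASIN(3, l) for a string of 3*l bits."""
    return blockwise(SYMBA, bits)


def uniform_crossover_offspring(a, b):
    """All offspring of uniform crossover, one per (equally likely) choice mask."""
    return ["".join(pair[c] for pair, c in zip(zip(a, b), mask))
            for mask in product((0, 1), repeat=len(a))]


if __name__ == "__main__":
    print(deception("111" * 10), deception("000" * 10))
    kids = uniform_crossover_offspring("1100", "0011")
    # Offspring without exactly n/2 ones cannot be global optima of MULTIMAX.
    print(sum(1 for kid in kids if onemax(kid) != 2), "of", len(kids))
```

For DECEPTION(3,10) the all-ones string is the global optimum and the all-zeros string is the deceptive attractor, two fitness units lower per block.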
9
Numerical Results
All simulations have been done with the breeder genetic algorithm BGA. In order to keep the number of simulations small, several parameters were fixed. The mutation rate was set to m = 1/n where n denotes the size of the problem. The parents were selected with a truncation threshold of T = 35%. Sometimes T = 50% was used.
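A run of this kind can be sketched in a few lines for ONEMAX. This is a simplified illustration and not the BGA implementation used for the tables: it assumes that the best individual is copied unchanged into the next generation, that both parents are drawn at random from the selected group, and that uniform crossover is used; the function name and its parameters are hypothetical.

```python
import random


def run_bga(n, pop_size, threshold, mutate, crossover, rng, p0=0.5, max_gens=10_000):
    """Return the generation in which the ONEMAX optimum first appears, or None.

    pop_size must be at least 2; threshold is the truncation threshold T.
    """
    m = 1.0 / n  # mutation rate m = 1/n, as in the experiments
    pop = [[1 if rng.random() < p0 else 0 for _ in range(n)]
           for _ in range(pop_size)]
    for gen in range(max_gens + 1):
        pop.sort(key=sum, reverse=True)  # ONEMAX fitness is the number of 1's
        if sum(pop[0]) == n:
            return gen
        # Truncation selection: the best T * N individuals become parents.
        parents = pop[:max(1, int(threshold * pop_size))]
        # Assumption: the best individual survives unchanged (elitism).
        next_pop = [pop[0][:]]
        while len(next_pop) < pop_size:
            child = rng.choice(parents)[:]
            if crossover:
                mate = rng.choice(parents)
                # Uniform crossover: each bit comes from either parent.
                child = [a if rng.random() < 0.5 else b
                         for a, b in zip(child, mate)]
            if mutate:
                child = [1 - b if rng.random() < m else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    print(run_bga(n=64, pop_size=4, threshold=0.5,
                  mutate=True, crossover=True, rng=rng))
```

Multiplying the returned generation count by the population size gives a rough count of function evaluations, in the sense of $FE_{opt} = N \cdot GEN_{opt}$.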
In the following tables we report the average number of generations needed until the best individual is above a predefined fitness value. With these values it is possible to imagine a type of race between the populations using the different operators. Table 2 shows the results for ONEMAX of size 64. FE denotes the number of function evaluations necessary to reach the optimum. SD is the standard deviation of $GEN_e$ if only crossover is applied. In all other cases it refers to $GEN_{opt}$, the number of generations until the optimum was found. The initial population was randomly generated with a probability $p_0$ = 0.5 that there is a 1 at a locus. The numerical values are averages over 100 runs.
63[ 64[ SD I F ~
M 241941156 183 226 309! 82 618
M 641840 65 801102143I 56!9161
C* 64 711 15 15 17 19 1.1 1210
C 128 5 9 12 12 13 15 10.8 189~
M&C 423151 81 961151521 47 608
M&C 64 713 17 19i 20 22 2.1 2102
Table 2. ONEMAX(64); C* found the optimum in only 84 runs
The simulations confirm the theory. Mutation in small populations is a very effective search. But the variance SD of $GEN_{opt}$ is very high. Furthermore, the success of mutation decreases when the population approaches the optimum. A large population reduces the efficiency of a population using mutation. Crossover is more predictable. The progress of the population is constant. But crossover critically depends on the size of the population. The most efficient search is done by the BGA using both mutation and crossover with a population size of N = 4. In table 3 the initial population was generated farther away from the optimum ($p_0$ = 1/8). In this experiment, mutation in small populations is much more efficient than crossover. But the combined search also performs well.
OP      N    24   32   62   63   64    SD     FE
M       2    14   24  192  237  307    85    615
M      64     8   16   96  117  161    72  10388
C*    256     6    9   24   25   27   0.9   6790
C     320     6    9   24   25   26   0.9   8369
M&C     4    11   19  114  136  180    52    725
M&C    64     5    8   29   31   34     3   2207
Table 3. ONEMAX(64); $p_0$ = 1/8; C* found the optimum in only 84 runs
In table 4 results are presented for the PLATEAU function. The efficiency of the small population with mutation is slightly worse than for ONEMAX. But the efficiency of the large population is much better than for ONEMAX. This can be easily explained. The large population is doing a random walk on the plateau. The BGA with mutation and crossover and a popsize of N = 4 is the most efficient.
OP      N   288  291  294  297  300    SD    FE
M       4    27   42   64   95  184   107   737
M      64     5    8   13   19   31     9  2064
C*     64     3    4    6    7    9     1   569
C     128     3    4    5    6    8     1  1004
M&C     4    22   32   49   73  134    63   539
M&C    64    10   10   10   10   12     2   793
Table 4. PLATEAU(3,10); C* found the optimum in only 78 runs
In table 5 results are shown for the DECEPTION(3,10) function.
OP       N   283   291   294   297   300    SD     FE
M        4   419  3520  4721  6632  9797  4160  39192
M       16   117   550   677   827  1241   595  19871
M       64    35   202   266   375   573   246  36714
C*      32    11
M&C      4   597  3480  4760  6550  9750  3127  38245
M&C     16   150   535   625   775  1000   389  16004
M&C*    64    11   170
Table 5. DECEPTION(3,10); *: stagnated far from the optimum
We observe a new behavior. Mutation clearly outperforms uniform crossover. But note that a popsize of N = 16 is twice as efficient as a popsize of N = 4. The performance decreases till N = 1. Mutation is most efficient with a popsize between 12 and 24. In very difficult fitness landscapes it pays off to try many different searches in parallel. The BGA with crossover only does not come near to the optimum. Furthermore, increasing the size of the population from 32 to 4000 gives worse results. This behavior of crossover also dominates the BGA with mutation and crossover. The BGA does not find the optimum if it is run with popsizes greater than 50. This is a very unpleasant fact. There exists only a small range of popsizes where the BGA will find the optimum.
It is known that the above problem would vanish if we used 1-point crossover instead of uniform crossover. But then the results depend on the bit positions of the deceptive function. For the ugly deceptive function [21] 1-point crossover performs worse than uniform crossover. Therefore we will not discuss experiments with 1-point crossover here. The results for SYMBASIN are different. In table 6 the results are given. For mutation this function is only slightly easier to optimize than the DECEPTION function. Good results are achieved with popsizes between 8 and 64. But the SYMBASIN function is much easier to optimize for uniform crossover. The BGA with mutation and crossover performs best. Increasing the popsize decreases the number of generations needed to find the optimum.
297l a 0 0 l ~ iI 41 1092215035857404420029621 16 24 125 205 391 765 530 12250 64 18 46 68 106 221 136 14172 6 16 18 19! 20 4 4 14 15 17! 18 0,2136741 aa 1642 2987'ssa719105 l18a[a6421
12
16115 95 186 331 615 418]9840
64.12 I
aa
5a
99 2:611 15}11~176
Table 6. SYMBASIN(3,10);C*: only 50% reached the optimum
The absolute performance of the BGA is impressive compared to other algorithms. We will only mention ONEMAX and DECEPTION. For ONEMAX the number of function evaluations needed to locate the optimum ($FE_{opt}$) scales like e · n · ln(n) (empirical law 1). Goldberg [15] observed a scaling of $O(n^{1.7})$ for his best algorithm. To our knowledge the previous best results for DECEPTION and uniform crossover have been achieved by the CHC algorithm of Eshelman [10]. The CHC algorithm needed 20960 function evaluations to find the optimum. The BGA needs about 16000 function evaluations. The efficiency can be increased if steepest ascent hillclimbing is used [21]. In the last table we will show that the combination of mutation and crossover also gives good results for continuous functions. In table 7 results for Rastrigin's function [22] are shown. The results are similar to the results of the ONEMAX function. The reason for this behavior has been explained in [22]. A BGA using mutation and discrete recombination with a popsize of N = 4 performs most efficiently.
OP      N   1.0   .1  .01  .001   SD     FE
M       4   594  636  691   801   40   3205
M      64   139  176  225   286    9  18316
M&C     4   531  599  634   720   38   2881
M&C    64    50   66   91   123    3   7932
Table 7. Rastrigin's function (n = 10)
10
Conclusion
The theoretical analysis of evolutionary algorithms has suffered in the past from the fact that the methods developed in quantitative genetics to understand especially artificial selection have been largely neglected. Many researchers still believe that the schema theorem [14] is the foundation of the theory. But the schema theorem is nothing more than a simple version of Fisher's fundamental theorem of natural selection. In population genetics it was discovered very early that this theorem has very limited applications. We have shown in this paper that the behaviour of evolutionary algorithms can be well understood by the response to selection equation. It turned out that the behaviour of the breeder genetic algorithm is already complex for one of the simplest optimization functions, the ONEMAX function. This function can play the same role for evolutionary algorithms as the ideal gas in thermodynamics. For the ideal gas the thermodynamic laws can be theoretically derived. The laws for real gases are extensions of the basic laws. In the same manner the equations derived for ONEMAX will be extended for other optimization functions. For this extension a statistical approach using the concept of heritability and the genotypic and phenotypic variance of the population can be used. This approach is already used in the science of artificial breeding.
References
1. H. Asoh and H. Mühlenbein. On the mean convergence time of genetic populations without selection. Technical report, GMD, Sankt Augustin, 1994.
2. Thomas Bäck. Optimal mutation rates in genetic search. In S. Forrest, editor, 5th Int. Conf. on Genetic Algorithms, pages 2-9, San Mateo, 1993. Morgan Kaufmann.
3. Thomas Bäck and Hans-Paul Schwefel. A Survey of Evolution Strategies. In Proceedings of the Fourth International Conference of Genetic Algorithms, pages 2-9, San Diego, 1991. ICGA.
4. Thomas Bäck and Hans-Paul Schwefel. An Overview of Evolutionary Algorithms for Parameter Optimization. Evolutionary Computation, 1:1-24, 1993.
5. R. K. Belew and L. Booker, editors. Proceedings of the Fourth International Conference on Genetic Algorithms, San Mateo, 1991. Morgan Kaufmann.
6. H.J. Bremermann, M. Rogson, and S. Salaff. Global properties of evolution processes. In H.H. Pattee, editor, Natural Automata and Useful Simulations, pages 3-42, 1966.
7. M. G. Bulmer. The Mathematical Theory of Quantitative Genetics. Clarendon Press, Oxford, 1980.
8. J. F. Crow. Basic Concepts in Population, Quantitative and Evolutionary Genetics. Freeman, New York, 1986.
9. J. F. Crow and M. Kimura. An Introduction to Population Genetics Theory. Harper and Row, New York, 1970.
10. L.J. Eshelman. The CHC Adaptive Search Algorithm: How to Have Safe Search when Engaging in Nontraditional Genetic Recombination. In G. Rawlins, editor, Foundations of Genetic Algorithms, pages 265-283, San Mateo, 1991. Morgan Kaufmann.
11. D. S. Falconer. Introduction to Quantitative Genetics. Longman, London, 1981.
12. R. A. Fisher. The Genetical Theory of Natural Selection. Dover, New York, 1958.
13. S. Forrest, editor. Proceedings of the Fifth International Conference on Genetic Algorithms, San Mateo, 1993. Morgan Kaufmann.
14. David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, 1989.
15. D.E. Goldberg. Genetic algorithms, noise, and the sizing of populations. Complex Systems, 6:333-362, 1992.
16. D.E. Goldberg, K. Deb, and B. Korb. Messy genetic algorithms revisited: Studies in mixed size and scale. Complex Systems, 4:415-444, 1990.
17. Michael Herdy. Reproductive Isolation as Strategy Parameter in Hierarchical Organized Evolution Strategies. In PPSN 2 Bruxelles, pages 207-217, September 1992.
18. J.H. Holland. Adaptation in Natural and Artificial Systems. Univ. of Michigan Press, Ann Arbor, 1975.
19. M. Kimura. The neutral theory of molecular evolution. Cambridge University Press, 1983.
20. H. Mühlenbein, M. Gorges-Schleuter, and O. Krämer. Evolution Algorithms in Combinatorial Optimization. Parallel Computing, 7:65-85, 1988.
21. Heinz Mühlenbein. Evolution in time and space - the parallel genetic algorithm. In G. Rawlins, editor, Foundations of Genetic Algorithms, pages 316-337, San Mateo, 1991. Morgan Kaufmann.
22. Heinz Mühlenbein and Dirk Schlierkamp-Voosen. Predictive Models for the Breeder Genetic Algorithm: Continuous Parameter Optimization. Evolutionary Computation, 1(1):25-49, 1993.
23. Heinz Mühlenbein and Dirk Schlierkamp-Voosen. The science of breeding and its application to the breeder genetic algorithm. Evolutionary Computation, 1(4):335-360, 1994.
24. Ingo Rechenberg. Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien der biologischen Information. Fromman Verlag, Freiburg, 1973.
25. H. Schaffer, editor. Proceedings of the Third International Conference on Genetic Algorithms, San Mateo, 1989. Morgan Kaufmann.
26. J.D. Schaffer and L.J. Eshelman. On crossover as an evolutionary viable strategy. In R. K. Belew and L. Booker, editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 61-68, San Mateo, 1991. Morgan Kaufmann.
27. H.-P. Schwefel. Numerical Optimization of Computer Models. Wiley, Chichester, 1981.
28. G. Syswerda. Uniform crossover in genetic algorithms. In H. Schaffer, editor, 3rd Int. Conf. on Genetic Algorithms, pages 2-9, San Mateo, 1989. Morgan Kaufmann.
The Role of Mate Choice in Biocomputation: Sexual Selection as a Process of Search, Optimization, and Diversification
Geoffrey F. Miller 1 and Peter M. Todd 2
School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton, BN1 9QH, UK, [email protected], ac.uk
Department of Psychology, University of Denver, 2155 S. Race Street, Denver, CO 80208, USA, ptodd@pstar.psy.du.edu
Abstract. The most successful, complex, and numerous species on earth are composed of sexually-reproducing animals and flowering plants. Both groups typically undergo a form of sexual selection through mate choice: animals are selected by conspecifics and flowering plants are selected by heterospecific pollinators. This suggests that the evolution of phenotypic complexity and diversity may be driven not simply by natural-selective adaptation to econiches, but by subtle interactions between natural selection and sexual selection. This paper reviews several theoretical arguments and simulation results in support of this view. Biological interest in sexual selection has exploded in the last 15 years (see Andersson, 1994; Cronin, 1991), but has not yet been integrated with the biocomputational perspective on evolution as a process of search and optimization (Holland, 1975; Goldberg, 1989). In the terminology of sexual selection theory, mate preferences for 'viability indicators' (e.g. Hamilton & Zuk, 1982) may enhance evolutionary optimization, and mate preferences for 'aesthetic displays' (e.g. Fisher, 1930) may enhance evolutionary search and diversification. Specifically, as a short-term optimization process, sexual selection can: (1) speed evolution by increasing the accuracy of the mapping from phenotype to fitness and thereby decreasing the 'noise' or 'sampling error' characteristic of many forms of natural selection, and (2) speed evolution by increasing the effective reproductive variance in a population even when survival-relevant differences are minimal, thereby imposing an automatic, emergent form of 'fitness scaling', as used in genetic algorithm optimization methods (see Goldberg, 1989). As a longer-term search process, sexual selection can: (3) help populations escape from local ecological optima, essentially by replacing genetic drift in Wright's (1932) "shifting balance" model with a much more powerful and directional stochastic process, and (4) facilitate the emergence of complex innovations, some of which may eventually show some ecological utility. Finally, as a process of diversification, sexual selection can (5)
promote spontaneous sympatric speciation through assortative mating, increasing biodiversity and thereby increasing the number of reproductively isolated lineages performing parallel evolutionary searches (Todd & Miller, 1991) through an adaptive landscape. The net result of these last three effects is that sexual selection may be to macroevolution what genetic mutation is to microevolution: the prime source of potentially adaptive heritable variation, at both the individual and species levels. Thus, if evolution is understood as a biocomputational process of search, optimization, and diversification, sexual selection can play an important role complementary to that of natural selection. In that role, sexual selection may help explain precisely those phenomena that natural selection finds troubling, such as the success of sexually-reproducing lineages, the speed and robustness of evolutionary adaptation, and the origin of otherwise puzzling evolutionary innovations, such as the human brain (Miller, 1993). Implications of this view will be discussed for biology, psychology, and evolutionary approaches to artificial intelligence and robotics.
1
Introduction
Sexual selection through mate choice (Darwin~ 1871) has traditionally been considered a minor, peripheral, even pathological process, tangential to the main work of natural selection and largely irrelevant to such central issues in biology as speciation, the origin of evolutionary innovations, and the optimization of complex adaptations (see Cronin, 1991). But this traditional view is at odds with the fact that the most complex, diversified, and elaborated taxa on earth are those in which mate choice operates: animals with nervous systems, and flowering plants. The dominance of these life-forms, and the maintenance of sexual reproduction itself, has often been attributed to the advantages of genetic recombination. But recombination alone is not diagnostic of animals and flowering plants: bacteria and non-flowering plants both do sexual recombination. Rather, the interesting common feature of animals and flowering plants is that both undergo a form of sexual selection through mate choice. Animals are sexually selected for reproduction by opposite-sex conspecifics (Darwin, 1871; see Andersson, 1994), and flowering plants are sexually selected by the heterospeciflc pollinators such as insects and hummingbirds that they attract to further their own reproduction (Sprengel, 1793; Darwin, 1862; see Barth, 1991). Indeed, Darwin's dual fascination with animal courtship (Darwin, 1871) and with the contrivances of flowers to attract pollinators (Darwin, 1862) may reflect his understanding that these two phenomena shared some deep similarities. The importance of mate choice in evolution can be appreciated by considering the special properties of neural systems as generators of selection forces. The brains and sensory-motor systems of organisms make choices that affect the survival and reproduction of other organisms in ways that are quite different from the effects of inanimate selection forces (as first emphasized by Morgan,
1888).¹ This sort of psychological selection (Miller, 1993; Miller & Cliff, 1994; Miller & Freyd, 1993) by animate agents can have much more direct, accurate, focused, and striking results than simple biological selection by ecological challenges such as unicellular parasites or physical selection by habitat conditions such as temperature or humidity. Recently, several biologists have considered the evolutionary implications of "sensory selection", perhaps the simplest form of psychological selection (see Endler, 1992; Enquist & Arak, 1993; Guilford & Dawkins, 1991; Ryan, 1990; Ryan & Keddy-Hector, 1992). This paper emphasizes the evolutionary effects of mate choice because mate choice is probably the strongest, most common, and best-analyzed type of psychological selection. But there are many other forms of psychological selection both within and between species. For example, psychological selection on prey by predators results in mimicry, camouflage, warning coloration, and protean (unpredictable) escape behavior. Artificial selection on other species by humans, whether for economic or aesthetic purposes, is simply the most self-conscious and systematic form of psychological selection. Thus, we can view sexual selection by animals choosing mates as mid-way between brute natural selection by the inanimate environment, and purposive artificial selection by humans.
But the big questions remain: What distinctive evolutionary effects arise from psychological selection, and in particular from sexual selection through mate choice? And how does sexual selection interact with other selective forces arising from the ecological and physical environment? The traditional answer has been that sexual selection either copies natural selection pressures already present (e.g. when animals choose high-viability mates), making it redundant and impotent, or introduces new selection pressures irrelevant to the real work of adapting to the econiche (e.g.
when animals choose highly ornamented mates), making it distracting and maladaptive. In this paper we take a more positive view of sexual selection. By viewing evolution as a 'biocomputational' process of search, optimization, and diversification in an adaptive landscape of possible phenotypic designs, we can better appreciate the complementary roles played by sexual selection and natural selection. We suggest that the success of animals and flowering plants is no accident, but is due to the complex interplay between the dynamics of sexually-selective mate choice and the dynamics of naturally-selective ecological factors. Both processes together are capable of generating complex adaptations and biodiversity much more efficiently than either process alone. Mate choice can therefore play a critical role in biocomputation, facilitating not only short-term optimization within populations, but also the longer-term search for new adaptive zones and new evolutionary innovations, and even speciation and the macroevolution of biodiversity.
¹ Mate choice may also be possible without brains, occurring in plants through a variety of mechanisms of female choice and male competition (see Willson & Burley, 1983; Andersson, 1994). However, these mechanisms seem for the most part to be instantiated in and have effects at the microscopic and molecular levels, in contrast to the mostly macroscopic effects of selection by animal nervous systems.
This paper begins with a discussion of the historical origins of the idea of mate choice (section 2) and the evolutionary origins of mate choice mechanisms (section 3). We then explore how mate choice can improve biocomputation construed as adaptive population movements on fitness landscapes, by allowing faster optimization to fitness peaks (section 4), easier escape from local optima (section 5), and the generation of evolutionary innovations (section 6). Moving from serial to parallel search, we then consider how sexual selection can lead to sympatric speciation and thus to evolutionary search by multiple independent lineages (section 7). Finally, section 8 discusses some implications of these ideas for science (particularly biology and evolutionary psychology) and some applications in engineering (particularly genetic algorithms research and evolutionary optimization techniques). This theoretical paper complements our earlier work on genetic algorithm simulations of sexual selection (Todd & Miller, 1991, 1993; Miller & Todd, 1993; Miller, 1994; Todd, in press); in further work we test these ideas with more extensive simulations (Todd & Miller, in preparation) and comparative biology research (Miller, accepted, a; Miller, 1993).
2 The evolution of economic traits through natural selection versus the evolution of reproductive traits through sexual selection
Darwin (1859, 1871) clearly distinguished between natural selection and sexual selection as different kinds of processes operating on different kinds of traits according to different kinds of evolutionary dynamics. For him, natural selection improved organisms' abilities to survive in an environment that is often hostile and always competitive, while sexual selection honed abilities to attract and select mates and to produce viable and attractive offspring. But this critical distinction between natural and sexual selection was lost with the Modern Synthesis (Dobzhansky, 1937; Huxley, 1942; Mayr, 1942; Simpson, 1944), when natural selection was redefined as any change in gene frequencies due to the fitness effects of heritable traits, whether through differential survival or differential reproduction. The theory of sexual selection through mate choice had been widely dismissed after Darwin, and this brute-force redefinition of natural selection to encompass virtually all non-random evolutionary processes did nothing to revive interest in mate choice.
Fisher (1915, 1930) was one of the few biologists of his era to worry about the origins and effects of mate choice. He developed a theory of "runaway sexual selection," in which an evolutionary positive-feedback loop is established (via genetic linkage) between female preferences for certain male traits, and the male traits themselves. As a result, both the elaborateness of the traits and the extremity of the preferences could increase at an exponential rate. Fisher's model could account for the wildly exaggerated male traits seen in many species, such as the peacock's plumage, but it did not explain the evolutionary origins of female preferences themselves, and was not stated in formal genetic terms. Huxley (1938) criticized Fisher's model in a hostile and confused review of sexual
selection theory, which kept Darwin's theory of mate choice in limbo for decades to come. In the last 15 years, however, there has been an explosion of work on sexual selection through mate choice. The new population genetics models of O'Donald (1980), Lande (1981), and Kirkpatrick (1982) supported the mathematical feasibility of Fisher's runaway sexual selection process. Behavioral experiments on animals showed that females of many species do exhibit strong preferences for certain male traits (e.g. Andersson, 1982; Catchpole, 1980; Ryan, 1985). New comparative morphology has supported Darwin's (1871) claim that capricious elaboration is the hallmark of sexual selection: for instance, Eberhard (1985) argued that the only feasible explanation for the wildly complex and diverse male genitalia of many species is evolution through female preference for certain kinds of genital stimulation. Evolutionary computer simulation models such as those of Collins and Jefferson (1992) and Miller and Todd (1993) have confirmed the plausibility, robustness, and power of runaway sexual selection. Once biologists started taking the possibility of female choice seriously, evidence for its existence and significance came quickly and ubiquitously. Cronin (1991) provides a readable, comprehensive, and much more detailed account of this history, and Andersson (1994) gives the most authoritative review of the literature. Largely independently of this revival of sexual selection theory, Eldredge (1985, 1986, 1989) has developed a general model of evolution based on the interaction of a "genealogical hierarchy" composed of genes, organisms, species, and monophyletic taxa, and an "ecological hierarchy" composed of organisms, "avatars" (sets of organisms that each occupy the same ecological niche), and ecosystems. 
Phenotypes in this view are composed of two kinds of traits: "economic traits" that arise through natural selection to deal with the ecological hierarchy, and "reproductive traits" that arise through sexual selection to deal with other entities (e.g. potential mates) in the genealogical hierarchy. Eldredge (1989) emphasizes that the relationship between economic success and reproductive success can be quite weak, and that reproductive traits are legitimate biological adaptations - - as shown by recent research on mate choice and courtship displays (see Andersson, 1994). Eldredge also grants genealogical units their own hierarchy separate from the ecological one, but does not emphasize the possibility of evolutionary dynamics occurring entirely within the genealogical hierarchy, without any ecological relevance. The one exception is Eldredge's discussion of how "specific mate recognition systems" (SMRSs) might be disrupted through stochastic effects, resulting in spontaneous speciation. But other processes occurring purely within the genealogical hierarchy, such as Fisher's (1930) runaway process, are not mentioned. Thus, even in his authoritative review of macroevolutionary theory (Eldredge, 1989), which consistently views evolutionary change in terms of movements through adaptive landscapes, Eldredge overlooks the adaptive autonomy of sexual selection, and the adaptive interplay between sexual selection and natural selection. But the time is now right to take sexual selection seriously in both roles: (1) as a potentially autonomous evolutionary process that can operate entirely
within Eldredge's "genealogical hierarchy", and (2) as a potentially important complement to natural selection that can facilitate adaptation to Eldredge's "ecological hierarchy" in various ways. The remainder of this paper focuses on this second role. But to understand the dynamic interplay between natural and sexual selection, we must first understand their different characteristic dynamics.
Natural selection typically results in convergent evolution onto a few (locally) optimal solutions given pre-established problems posed by the econiche. In natural selection by the ecological niche or the physical habitat, organisms adapt to environments, but not vice-versa (except in relatively rare cases of tight coevolution; see Futuyma & Slatkin, 1983). This causal flow of selection from environment to organism makes natural selection fairly easy to study empirically and formally, because one can often identify a relatively stable set of external conditions (i.e. a 'fitness function') to which a species adapts. Moreover, natural selection itself is primarily a hill-climbing process, good at exploiting adaptive peaks, but somewhat weak at discovering them.
By contrast, sexual selection often results in an unpredictable, divergent pattern of evolution, with lineages speciating spontaneously and exploring the space of phenotypic possibilities according to their capriciously evolved mate preferences. In sexual selection, the mate choice mechanisms that constitute the selective 'environment' can themselves evolve under various forces, including the current distribution of available phenotypes. Thus, the environment and the adaptations (the traits and preferences) can co-evolve under sexual selection, as Fisher (1930) realized. The causal flow of sexual selection forces is bi-directional, and thus more complex and chaotic.
The resulting unpredictable dynamics may look entirely anarchic, without structure and due entirely to chance, but are in fact 'autarchic', in that a species evolving through strong selective mate choice is a self-governing system that in a sense determines its own evolutionary trajectory. Indeed, sexual selection could be considered the strongest form of biological self-organization that operates apart from natural selection, but it is a form almost entirely overlooked by those who study self-organization from a biocomputational perspective (e.g. Brooks & Maes, 1994; Kauffman, 1993).
If one visualizes sexual selection dynamics as branching, divergent patterns that explore phenotype space capriciously and autonomously, and natural selection dynamics as convergent, hill-climbing patterns that seek out adaptive peaks, then their potential complementarity can be understood. The overall evolutionary trajectory of a sexually-reproducing lineage results from the combined effects of sexual selection dynamics and natural selection dynamics (plus the stochastic effects of genetic drift and neutral drift): an interplay of capriciously directed divergence and ecologically directed convergence. This interplay might help explain evolutionary patterns that have proven difficult to explain under natural selection alone, particularly the abilities of lineages to optimize complex adaptations, to escape from local evolutionary optima, to generate evolutionary innovations, and to split apart into sympatric species.
This interplay between capricious, divergent sexual selection and directed, convergent natural selection is analogous to the interplay between genetic mutation and natural selection. The major difference is that the high-level variation in phenotypic design produced by sexual selection is much richer, more complex, and typically less deleterious than the low-level variation in protein structure produced by random genetic mutation. Thus, many of the phenomena that seem difficult to account for through the interaction of low-level genetic mutation and natural selection might be better accounted for through the interaction of higher-level sexual-selective effects and natural selection. But we should consider the evolutionary origins of mate choice before we consider its evolutionary effects.
3 Why mate choice mechanisms evolve
Darwin (1871) analyzed the evolutionary effects but not the evolutionary origins of mate preferences. Fisher (1915, 1930) went further in discussing how mate preferences might co-evolve with the traits they prefer, by becoming genetically linked to them, but he too did not directly consider the selection pressures on mate choice itself. Recently, the question of how selective mate choice can evolve has occupied an increasingly important position in sexual selection theory (e.g. Bateson, 1983; Kirkpatrick, 1982, 1987; Pomiankowski, 1988; Sullivan, 1989); the issue becomes particularly acute when mate choice is costly in terms of energy, time, or risk (Iwasa et al., 1991; Pomiankowski, 1987, 1990; Pomiankowski et al., 1991).
The mysterious origins of mate choice can be made clearer if the adaptive utility of choice in general is appreciated. Little sleep is lost over the issues of how habitat choice, food choice, or nesting place choice could ever evolve given their costs; the same acceptance ought to apply to mate choice. Animal nervous systems have two basic functions: (1) generating adaptive survival behavior that registers, and exploits or avoids, important objects and situations in the ecological environment, such as food, water, prey, and predators ("ecological affordances"), and (2) generating adaptive reproductive behavior that registers and exploits important objects in the sexual environment, such as viable, fertile, and attractive mates ("reproductive affordances"). Current theories of how animals make adaptive choices among ecological affordances are substantially more sophisticated than theories of how animals make adaptive choices among reproductive affordances. However, by seeing both ecological affordances and reproductive affordances as examples of "fitness affordances" in general (Miller & Cliff, 1994; Miller & Freyd, 1993; Todd & Wilson, 1993), we can see the underlying similarity between both sorts of adaptive choice behavior.
The key to choosing food adaptively is to have an evolved food-choice mechanism that has internalized the likely survival effects of eating different kinds of foods: from an evolutionary perspective, the internally represented utility of a food item should reflect its objectively likely prospective fitness effects on the animal, given the animal's energy requirements, biochemistry, gut morphology, etc. By analogy, the key to choosing mates adaptively is an evolved mate choice mechanism that has internalized the likely long-term fitness consequences of reproducing with
different kinds of potential mates, given a certain recurring set of natural and sexual selection pressures. The adaptive benefit of choice in each case is that negative fitness affordances that threatened survival or fertility in the past can be avoided, and positive fitness affordances that enhanced survival or fertility in the past can be exploited. Thus, choice is a way of internalizing ancestral selection pressures into current psychological mechanisms. This view of the evolution of choice suggests that mate choice mechanisms can be analyzed according to normative criteria of adaptiveness. The internally represented sexual attractiveness of a potential mate should reflect its objectively likely prospective fitness value as a mate, in terms of the likely viability and sexual attractiveness of any offspring that one might have with it. Thus, the efficiency and normativity of a mate choice mechanism could in principle be assessed with the same theoretical rigor as a mechanism for any other kind of adaptive choice. Mate choice is well-calibrated if the perceived sexual attractiveness of potential mates is highly correlated with the actual viability, fertility, and attractiveness of the offspring they would produce. The observable traits of potential mates that correlate primarily with offspring survival prospects can be termed "viability indicators" (Zahavi, 1975), and the observable traits that correlate primarily with offspring reproductive prospects can be called "aesthetic displays" of the sort analyzed by Darwin (1871) and Fisher (1930). In fact, most sexually-elaborated traits such as the peacock's tail will probably play both roles to some extent, with their large costs making them useful viability indicators (e.g. Petrie, 1992) but the details of their design making them attractive aesthetic displays (e.g. Petrie et al., 1991). 
Now we can ask, what actually gets "evolutionarily internalized" from the environment (Shepard, 1984, 1987) in the case of mate preferences? Mate choice mechanisms may in some cases evolve to 'represent' the recent history of a population's evolutionary trajectory through phenotype space, that is, the recent history of natural selection and sexual selection patterns that have been operating in the population. Sustained, directional movement through phenotype space typically implies that directional selection is operating, or that a fitness gradient is being climbed in a certain direction. Mate preferences that are in agreement with this directional movement, internalizing the species' recent history, will then be more successful, assuming the movement continues. In this case, mate preferences can be described as 'anticipatory' assessments of past selection pressures that will probably continue to be applied in the future, in particular to one's offspring.
This picture of how mate preferences evolve has clear implications for sexual selection dynamics. If a population has not been moving through phenotypic space, e.g. it is perched atop an adaptive peak due to stabilizing selection, as most populations are most of the time, then mate preferences will probably evolve to favor potential mates near the current peak, and they will tend to reinforce the stabilizing natural selection that is currently in force. (If biased mutation tends to displace individuals from the peak more often in one direction than in another, then mate preferences may evolve to counteract that recurrent deleterious mutation by having a directional component; see Pomiankowski et al., 1991.) But if a population has been evolving and moving through phenotype space, then mate preferences can evolve to 'point' in the direction of movement, conferring more evolutionary 'momentum' on the population than it would have under natural selection alone. These sorts of directional mate preferences (Kirkpatrick, 1987; Miller & Todd, 1993) can be visualized as momentum vectors in phenotype space that can keep populations moving along a certain trajectory, in some cases even after natural-selective forces have shifted.
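The momentum metaphor can be illustrated with a toy calculation. The Python sketch below is our own illustrative assumption, not a model drawn from the sexual selection literature: the population mean is a point in a two-dimensional phenotype space, natural selection pulls it toward an adaptive peak with strength `k`, and a mate-preference vector `m` tracks recent movement (smoothing rate `alpha`) and adds a push weighted by `beta`. The function name `step` and all three parameters are hypothetical.

```python
def step(x, m, peak, k=0.1, alpha=0.5, beta=0.8):
    """Advance the population mean by one generation (toy model).

    x    -- current mean phenotype, a tuple of coordinates
    m    -- current mate-preference 'momentum' vector
    peak -- location of the adaptive peak under natural selection
    """
    # Natural selection: a pull toward the peak, proportional to distance.
    natural = tuple(k * (p - xi) for p, xi in zip(peak, x))
    # Total displacement: natural selection plus the preference-driven push.
    move = tuple(n + beta * mi for n, mi in zip(natural, m))
    new_x = tuple(xi + d for xi, d in zip(x, move))
    # Preferences internalize recent movement (exponential moving average).
    new_m = tuple((1 - alpha) * mi + alpha * d for mi, d in zip(m, move))
    return new_x, new_m


if __name__ == "__main__":
    x, m = (0.0, 0.0), (0.0, 0.0)
    for _ in range(3):                      # climb toward a peak at (10, 0)
        x, m = step(x, m, peak=(10.0, 0.0))
    # Now let the natural-selective pull vanish: the peak sits exactly at x.
    with_pref, _ = step(x, m, peak=x)
    without_pref, _ = step(x, m, peak=x, beta=0.0)
    print("with directional preferences:", with_pref)
    print("natural selection alone:     ", without_pref)
```

In this sketch the population keeps moving in its old direction for a while after the natural-selective pull has disappeared, whereas with `beta=0.0` it stops immediately. How long such momentum would persist in a real population depends on genetic details that this toy deliberately ignores.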
Another effect could be seen when a population has been splitting apart due to some form of genetic divergence (which we will discuss more in section 7.1). In this case, mate preferences in each sub-population can evolve to favor breeding within the sub-population, and not between sub-populations, thereby reinforcing the speciation. The divergent mate preferences of two populations splitting apart can be visualized as vectors pointing in different directions. These sexual-selective vectors will reinforce and amplify the initial effects of divergence by imposing disruptive (sexual) selection against individuals positioned phenotypically in between the parting populations. Thus, directional mate preferences will often evolve to be congruent with whatever directional natural selection (if any) is operating on a population, whether it applies to a unified population or one splitting apart into subspecies. Sexual selection may thereby smooth out and reinforce the effects of natural selection.
But sexual selection vectors can often point in different directions from natural selection vectors, resulting in a complex evolutionary interplay between these forces. The evolution of mate preferences can be influenced by a number of factors other than natural selection for mate preferences in favor of high-viability traits. For example, stochastic genetic drift can act on mate preferences as it can act on any phenotypic trait; this effect is important in facilitating spontaneous speciation and in the capriciousness of runaway sexual selection. Intrinsic sensory biases in favor of certain kinds of courtship displays, such as louder calls or brighter colors, may affect the direction of sexual selection (Endler, 1992; Enquist & Arak, 1993; Guilford & Dawkins, 1991; Ryan, 1990; Ryan & Keddy-Hector, 1992). Also, an intrinsic psychological preference for novelty, as noted by Darwin (1871) and in work on the "Coolidge effect" (Dewsbury, 1981), may favor low-frequency traits and exert "apostatic selection" (Clarke, 1962), a kind of centrifugal selection that can maintain stable polymorphisms, facilitate speciation, and hasten the evolution of biodiversity. Thus, a number of effects may lead mate choice mechanisms to diverge from preferring the objectively highest-viability mate as the sexiest mate. These effects will often make sexual-selective vectors diverge from natural-selective gradients in phenotype space, and give sexual selection its capricious, divergent, unpredictable nature. Now that we have considered the evolutionary origins of mate preferences, we can consider their evolutionary effects.
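How a preference for novelty could maintain a stable polymorphism can be seen in a minimal sketch. The Python code below assumes, purely for illustration, two morphs with no viability difference, a mating weight of `1 + s * (1 - frequency)` for each morph (so the rarer morph is more attractive), and a standard discrete replicator update. The function name `apostatic_frequency` and the parameter `s` are hypothetical and are not taken from Clarke (1962).

```python
def apostatic_frequency(p, s=1.0, generations=50):
    """Return the frequency of morph A after some generations of
    frequency-dependent mate choice (toy two-morph model).

    p -- initial frequency of morph A (morph B has frequency 1 - p)
    s -- strength of the novelty preference; s = 0 means random mating
    """
    for _ in range(generations):
        # Rarer morphs receive a larger mating weight.
        w_a = 1.0 + s * (1.0 - p)
        w_b = 1.0 + s * p
        # Replicator update: new frequency is proportional to
        # current frequency times mating weight.
        p = p * w_a / (p * w_a + (1.0 - p) * w_b)
    return p


if __name__ == "__main__":
    print("novelty preference:", apostatic_frequency(0.9, s=1.0))
    print("random mating:     ", apostatic_frequency(0.9, s=0.0))
```

Under these assumptions a morph that starts at a frequency of 0.9 is driven back toward 0.5, because whichever morph is currently common is penalized in mating. With `s=0.0` the frequency simply stays where it started, so nothing protects the polymorphism against loss by drift.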
4 Ecological optimization can be facilitated by selective mate choice
Natural selection is often analyzed theoretically, and implemented computationally, as a fairly simple 'fitness function' that maps from phenotypic traits to reproductive success scores (Goldberg, 1989). But natural selection as it actually operates in the wild is often a horribly noisy, irregular, and inaccurate process. Predators might often eat the prey animal that has the better vision, larger brain, and longer legs, simply because that animal happened to be closer at dinner time than the duller, blinder, slower animal over the hill. A lethal virus may attack and eliminate the animal with the better immune system simply because that animal happened to drink from the wrong pond. Anyone who doubts the noisiness and inaccuracy of natural selection should consider the relative lack of speed with which animals evolve in the wild in comparison to evolution under artificial selection by human breeders, who cull undesirable traits with much more accuracy and thoroughness. Maynard Smith (1978, p. 12) observed that evolution can happen up to five orders of magnitude (100,000 times) faster under artificial selection than under typical natural selection, at least over the short term.
The fundamental reason for this disparity is that Nature (i.e. the physical habitat or biological econiche) has no incentive to maximize the selective efficiency or accuracy of natural selection, whereas human breeders do have incentives to maximize the efficiency and accuracy of artificial selection. Likewise, animals choosing mates have very heavy incentives to maximize the efficiency and accuracy of their mate choice, and thereby the efficiency and accuracy of the sexual selection that they impose. Thus, it would be extremely surprising if the selective efficiency and accuracy of natural selection were typically as high as that of sexual selection through mate choice.
Habitats and econiches are not well-adapted to impose natural selection, whereas animals are well-adapted to choose mates and thereby to impose sexual selection. (This difference is often obscured in genetic algorithms research, where fitness functions are specifically designed by humans to be efficient and accurate selectors and allocators of offspring.)
Given the relative noisiness and inefficiency of natural selection itself, how can the "organs of extreme perfection and complication" that Darwin (1859) so admired ever manage to evolve? We believe they may do so with substantial assistance from selective mate choice, at least in animals and flowering plants. As we saw in the previous section, sexually reproducing animals have strong incentives to internalize whatever natural selection pressures are being applied to their population in the form of selective mate preferences. For example, these preferences can inhibit mating with individuals that probably survived by luck rather than by genetic merit, whatever genetic merit means given current natural-selective and sexual-selective pressures. By avoiding mates that have low apparent viability but happen to still be alive anyway, parents can keep from having offspring that would probably not be so lucky. Conversely, by mating with individuals who clearly show high viability and sexual attractiveness, parents may give their offspring a genetic boost with respect to natural and sexual
selection for generations to come. For example, an average individual who mates with someone with twice their viability or attractiveness may increase their long-term reproductive success (e.g. number of surviving grand-children) by roughly 50% relative to random mating, by having their genes 'hitch-hike' in bodies with the better genes of their mate. This inheritance of genetic and economic advantage through mate choice can have several important effects on the optimization of complex adaptations, because the brains and sensory systems involved in mate choice can act as highly efficient 'lenses' for reflecting, refracting, recombining, amplifying, and focusing natural selection pressures.
First, the noisiness of natural selection can be substantially reduced by mate choice, leading to smoother, faster evolutionary optimization. It might take a while for mate preferences to accurately internalize the current regime of natural selection, but once in place, such preferences can exert much more accurate, less noisy selection than natural selection itself can. For example, natural selection by viruses alone (a biological selector) might yield a low correlation between heritable immune system quality and reproductive success, because the infected animals might be too sick to have a full-sized litter, but still manage to have several offspring despite their illness. But mate choice based on observed health and immune capacity may boost this correlation much higher, if conspecifics refuse to mate at all with an individual who bears the viral infection, and thereby lower the sick individual's reproductive success to nil. The higher the correlation between heritable phenotypic traits and reproductive success, the faster the evolution (Fisher, 1930). Mate choice can therefore heavily penalize individuals who show a tendency to get sick, whereas natural selection heavily penalizes only those individuals who actually have fewer offspring or die.
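The virus example can be turned into a small simulation. The Python sketch below rests entirely on assumed numbers chosen for illustration: immune quality `q` is uniform on [0, 1], an individual is infected with probability `1 - q`, litter size is 6 plus or minus up to 3 by luck, infection alone costs 2 offspring, and choosy conspecifics refuse infected mates altogether. The function names `pearson` and `simulate` are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Correlation between immune quality and offspring number,
    (a) under viral selection alone and (b) with mate choice added."""
    rng = random.Random(seed)
    quality, natural_only, with_choice = [], [], []
    for _ in range(n):
        q = rng.random()                    # heritable immune quality
        infected = rng.random() > q         # infection probability is 1 - q
        litter = 6 + rng.randint(-3, 3)     # luck unrelated to quality
        quality.append(q)
        # Viral selection alone: sick animals still breed, slightly less.
        natural_only.append(litter - 2 if infected else litter)
        # Mate choice: visibly infected animals find no mates at all.
        with_choice.append(0 if infected else litter)
    return pearson(quality, natural_only), pearson(quality, with_choice)


if __name__ == "__main__":
    r_natural, r_choice = simulate()
    print("correlation, natural selection alone:", round(r_natural, 3))
    print("correlation, with mate choice:       ", round(r_choice, 3))
```

With these assumed parameters the correlation between immune quality and reproductive success comes out clearly higher when mate choice is added, which is the sense in which mate choice sharpens a noisy selection pressure. The size of the gain depends on the invented numbers and should not be read as an empirical estimate.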
Here, the brains and sensory systems involved in mate choice act to focus the noisy, diffuse, unreliable forces of natural selection into smoother, steeper gradients of sexual selection. Thus, much of the work of constructing and optimizing complex adaptations may be performed by mate choice mechanisms tuned to reflect natural selection pressures, rather than by the natural selection pressures themselves. Of course, most animals that fail to reproduce - - especially in r-selected species that produce large numbers of offspring with little parental care - - will do so because they fail to survive to reproductive maturity in the first place, being spontaneously aborted, never hatching, or dying due to illness, starvation, or predation. Out of the countless eggs and sperm that adult salmon release during mating, only a very few zygotes will survive the rigors of childhood and up-river migration to successfully choose mates and spawn themselves. Natural selection may eliminate almost all of the individuals in a particular generation in this way. As Darwin (1859) noted in his discussion of the inevitability of competition, the manifest capability of organisms to reproduce far outstrips the carrying capacity of their environment, so natural selection will eliminate the vast majority of individuals. In contrast, even the most intensive mate choice in highly polygynous species will not cull the remaining reproductively mature individuals from the mating game with anything like this kind of ferocious efficiency. A large number of bachelor males may not leave behind any offspring, but most of the
females and a significant number of males will, making sexual selection look like a much weaker force in terms of the percentages of individuals affected. But the efficiency of a selective process depends most heavily on the correlation between heritable phenotypic features and selective outcomes. In natural selection, this correlation may often be quite low, because, as stressed earlier, Nature typically has no incentive to increase its selective efficiency. By contrast, this correlation may be quite high in sexual selection, because animals have large incentives to increase their mate choice efficiency. Thus, although sexual selection typically affects fewer individuals per generation than natural selection, sexual selection may account for most of the nonrandom change in heritable phenotypic traits - - i.e. most of the evolution. Second, mate choice can magnify relative fitness differences, thereby increasing the speed and robustness of optimization. In genetic algorithms research, populations often converge to have nearly equal performance on the user-imposed objective fitness function after a few dozen generations, and further optimization becomes difficult because the relatively small fitness differences are insufficient to result in much evolution. Methods for 'fitness scaling' such as linear rescaling or rank-based selection can overcome this problem by mapping small differences in objective fitness (corresponding to ecological success) onto large differences in reproductive success (Goldberg, 1989). We believe that in nature, sexual selection can provide an automatic form of fitness scaling that helps populations avoid this sort of evolutionary stagnation. Again, sexually reproducing animals have incentives to register slight differences in the observed viability of potential mates and to mate selectively with higher-viability individuals. 
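The two fitness-scaling methods mentioned above can be sketched as follows. This is a minimal Python illustration under our own assumptions rather than code from Goldberg (1989): `linear_scale` rescales fitness so that the mean is unchanged and the best individual receives `pressure` times the mean, and `rank_scale` assigns weights by rank alone. The function names and the `pressure` parameter are hypothetical, negative rescaled values are simply clipped to zero (which can shift the mean slightly), and ties are ranked arbitrarily.

```python
def linear_scale(fitness, pressure=2.0):
    """Linear rescaling f' = a*f + b that keeps the mean fixed and maps
    the best raw fitness onto `pressure` times the mean."""
    mean = sum(fitness) / len(fitness)
    best = max(fitness)
    if best == mean:                        # no variation left to amplify
        return [mean] * len(fitness)
    a = (pressure - 1.0) * mean / (best - mean)
    b = mean * (1.0 - a)
    # Clip negatives so that weights remain valid for selection.
    return [max(0.0, a * f + b) for f in fitness]


def rank_scale(fitness, pressure=2.0):
    """Linear ranking: weights run from (2 - pressure) for the worst
    individual up to `pressure` for the best, ignoring raw magnitudes."""
    n = len(fitness)
    if n < 2:
        return [1.0] * n
    order = sorted(range(n), key=lambda i: fitness[i])
    scaled = [0.0] * n
    for rank, i in enumerate(order):
        scaled[i] = (2.0 - pressure) + 2.0 * (pressure - 1.0) * rank / (n - 1)
    return scaled


def expected_offspring(weights):
    """Expected offspring per individual under proportional selection,
    normalized so that the population average is one."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


if __name__ == "__main__":
    raw = [0.97, 0.98, 0.99, 1.00]          # a nearly converged population
    print("unscaled:", expected_offspring(raw))
    print("linear:  ", expected_offspring(linear_scale(raw)))
    print("ranked:  ", expected_offspring(rank_scale(raw)))
```

For the nearly converged population in the demo, unscaled proportional selection gives the best individual only about 1.5% more expected offspring than average, whereas either scaling method gives it about twice the average. This is the kind of magnification that, on our argument, choosy mates might supply automatically.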
The result of this choosiness will be automatic fitness scaling that maintains substantial variance in reproductive success and thereby keeps evolution humming along even when every individual is similar in fitness (e.g. when near some optimum). Here, brains and sensory systems act via mate choice to magnify small fitness differences, effectively separating individuals who would otherwise have indistinguishable fitnesses (and have the same number of offspring) into different, distinguishable fitnesses, thereby greatly increasing the variance in the number of offspring.

Third, mate choice mechanisms can pick out phenotypic traits that are different from those on which natural selection itself acts, but that are highly correlated with natural-selective fitness. For example, bilateral symmetry may be an important correlate of ecological success for many vertebrates. But natural selection might increase the degree of symmetry in a particular lineage only very indirectly through its effects on several different correlates of symmetry, such as locomotive efficiency (individuals with asymmetric legs won't be able to get around as well and so will be selected against on the grounds of their locomotive inefficiency, rather than being selected against for asymmetry per se). By contrast, mate preferences for perceivable facial and body form can directly select for symmetry in a way that natural selection cannot. [2]

[2] Symmetry is a useful general-purpose cue of developmental competence (Møller & Pomiankowski, 1993), because deleterious mutations, injuries, and diseases often disrupt symmetry. Furthermore, an animal choosing a mate based on its ability to develop symmetrically need not know the "intended" optimal form of a particular bilateral structure; it only needs the circuitry for detecting differences between the two matched halves of the structure. Symmetrically-structured sensory surfaces and neural circuits (e.g. eyes and brains) may make such symmetry judgments easy, because they facilitate the comparison of the corresponding left and right features of perceived objects. The utility of symmetric body-plans as displays of developmental competence, and of symmetric brains and senses as mechanisms for choosing symmetric mates, could make developing a symmetric phenotype a common attractor state for many evolving lineages.

In general, mate choice can complement natural selection by operating on perceivable phenotypic attributes that underlie a wide array of economic traits, but that would typically be shaped only indirectly by a number of different, weak, indirect natural selection pressures. To continue our analogy between brains and optical devices, mate choice mechanisms can act as panoramic lenses, bringing into view a wider array of phenotypic features than natural selection alone would tend to focus on.

Natural selection is extremely efficient at eliminating major genetic blunders, such as highly deleterious mutations or disruptive chromosome duplications: it simply prevents the afflicted individual from reaching reproductive maturity. But the subtler task of shaping and optimizing complex adaptations may be more difficult for direct ecological selection pressures to manage. Natural selection alone can of course accomplish wonderful things, given enough time: 3.5 billion years of prokaryote evolution (amounting to many trillions of generations) has produced some quite intricate biochemical adaptations in these single-celled organisms. But for larger-bodied animals with longer generation times, we believe that selective mate choice plays a major role in the optimization of complex adaptations. For such species, the efficacy of natural selection may depend strongly on shaping mate choice mechanisms that 'take over' via sexual selection and do much of the difficult evolutionary work.

There are suggestive data that support this hypothesis. Bateson (1988) replotted data from Wyles, Kunkel, and Wilson (1983), and found a strong positive correlation across several taxa between rate of evolution (assessed by a measure of morphological variability across eight traits) and relative brain size. For example, song birds have larger brains than other birds, and apparently evolve faster; humans have the largest brains of all primates, and apparently evolve the fastest. Bateson (1988) interpreted this correlation in terms of larger brains allowing better habitat choice, a stronger "Baldwin effect" (in which the ability to learn speeds up the evolution of unlearned traits; see Hinton and Nowlan, 1987), and various forms of "behaviorally induced environmental change", but he overlooked the potential effects of brain size on sexual selection patterns. We believe it is more important that larger brains allow more powerful and subtle forms of selective mate choice. Indeed, the vastly enlarged human brain has allowed us not only to (unconsciously) impose strong sexual selection on members of our own species (Darwin, 1871; Miller, 1993), but also to impose very strong artificial selection on members of other species (Darwin, 1859). The correlation between brain size and rate of evolution provides a suggestive start for studies of the relationship between the capacity for selective mate choice and the rate and course of evolution, but clearly much more data is needed on this issue.
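The fitness-scaling idea discussed earlier in this section can be illustrated with a minimal Python sketch. It compares the expected share of offspring under plain fitness-proportional selection with the share under linear rank-based selection, for a population whose objective fitnesses have nearly converged. The function names, the `pressure` parameter, and the example fitness values are our own illustrative assumptions; they are not taken from Goldberg (1989) or from any particular genetic algorithm library.

```python
from typing import List


def proportional_shares(fitness: List[float]) -> List[float]:
    """Expected share of offspring under plain fitness-proportional selection."""
    total = sum(fitness)
    return [f / total for f in fitness]


def rank_shares(fitness: List[float], pressure: float = 1.8) -> List[float]:
    """Expected share of offspring under linear rank-based selection.

    `pressure` (hypothetical parameter, between 1 and 2) is the expected
    number of offspring of the best-ranked individual, in units of the
    population average; the worst-ranked individual expects 2 - pressure.
    Requires at least two individuals.
    """
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])  # worst first
    shares = [0.0] * n
    for rank, i in enumerate(order):
        expected = (2.0 - pressure) + 2.0 * (pressure - 1.0) * rank / (n - 1)
        shares[i] = expected / n
    return shares


# A nearly converged population: objective fitnesses differ by about 4%.
converged = [0.96, 0.97, 0.98, 0.99, 1.00]

prop = proportional_shares(converged)
rank = rank_shares(converged, pressure=1.8)

print("best/worst ratio, proportional:", max(prop) / min(prop))
print("best/worst ratio, rank-based:  ", max(rank) / min(rank))
```

Under these assumptions the proportional scheme gives the best individual only about 4% more expected offspring than the worst, whereas the rank-based scheme gives it roughly nine times as many. This is the kind of magnification of small fitness differences that, on the argument above, selective mate choice might supply automatically.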
5 Escaping evolutionary local optima through sexual selection

5.1 The relative power of 'sexual-selective drift', genetic drift, and neutral drift

Populations can become perched on some adaptive peak in the fitness landscape through the optimizing effect of sexual and natural selection acting together. But many such peaks are only local evolutionary optima, and better peaks may exist elsewhere. Once a population has converged on such a locally optimal peak, how then can it move off that peak, incurring a temporary ecological fitness cost, to explore the surrounding adaptive landscape and perhaps find a higher-fitness peak elsewhere? Wright's (1932, 1982) "shifting balance" theory was designed to address this problem of escaping from local evolutionary optima. He suggested that genetic drift operating in quasi-isolated populations can sometimes allow one population to move far enough away from its current fitness peak that it enters a new adaptive zone at the base of a new and higher fitness peak. Once that population starts to climb the new fitness peak, its genes can spread to other populations, so that the evolutionary innovations involved in climbing this peak can eventually reach fixation throughout the species. Thus, the species as a whole can climb from a lower peak to a higher one. Wright's (1932) model anticipated some of the recent concerns about how to take "adaptive walks" that escape from local optima in rugged fitness landscapes (Kauffman, 1993). In very rugged landscapes, short steps (defined relative to the landscape's ruggedness) of the sort generated by genetic point mutations are unlikely to allow individuals or populations to escape a local optimum. This is similar to Darwin's (1883) problem of how minor mutations can accumulate into useful adaptations if they have no utility in their initial form.
But jumping further across the landscape does not guarantee success, either: longer steps of the sort generated by macromutations (as favored by Goldschmidt, 1940) are unlikely to end up anywhere very reasonable; most mutations are deleterious, and major mutations even more so. The central problem is how to match the "foray length" of population movements away from local optima with the "correlation length" of the adaptive landscape, and thereby facilitate directional excursions away from the current adaptive peak to explore the surrounding fitness landscape. Wright's shifting balance model suggests that genetic drift might provide enough random jiggling around the local optimum to sometimes knock the population over into another adaptive zone, but the theoretical analysis of adaptive walks in rugged fitness landscapes (Kauffman, 1993) indicates that this is unlikely to be a common occurrence.
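The relation between step length and the spacing of peaks can be made concrete with a toy adaptive walk, sketched below in Python. The landscape values, the function name `adaptive_walk`, and the `max_step` parameter are hypothetical choices made for illustration only. The walk is greedy and deterministic, so it shows why short steps get trapped but deliberately ignores the cost of long jumps stressed above (that most large mutations are deleterious).

```python
from typing import Sequence

# A one-dimensional toy landscape: a local peak at position 2 (fitness 3)
# and a higher peak at position 7 (fitness 6), separated by a valley.
LANDSCAPE = [1, 2, 3, 2, 1, 2, 4, 6, 4, 2]


def adaptive_walk(landscape: Sequence[float], start: int, max_step: int) -> int:
    """Greedy walk: repeatedly move to the fittest position reachable within
    `max_step` positions, and stop when no reachable position is fitter."""
    pos = start
    while True:
        lo = max(0, pos - max_step)
        hi = min(len(landscape) - 1, pos + max_step)
        best = max(range(lo, hi + 1), key=lambda i: landscape[i])
        if landscape[best] <= landscape[pos]:
            return pos  # no improvement within reach: a local optimum
        pos = best


print(adaptive_walk(LANDSCAPE, start=0, max_step=1))  # stops on the local peak
print(adaptive_walk(LANDSCAPE, start=0, max_step=4))  # reaches the higher peak
```

With single-position steps the walk halts at position 2, because both neighbours are less fit; with a reach of four positions, comparable to the width of the valley, it crosses to position 7.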
Our model of population movement in phenotype space via mate choice is similar to Wright's shifting balance theory, but it provides a mechanism for exploring the local adaptive landscape that can be much more powerful and directional than random genetic drift: sexual selection. Here, we are relying on a kind of 'sexual-selective drift' resulting from the stochastic dynamics of mate choice and runaway sexual selection to displace populations from local optima. We suspect that with mate choice, the effects of sexual-selective drift will almost always be stronger and more directional than simple genetic drift for a given population size, and will be more likely to take a population down from a local optimum and over into a new adaptive zone. Genetic drift relies on passive sampling error to move populations down from economic adaptive peaks, whereas sexual selection relies on active mate choice, which can overwhelm even quite strong ecological selection pressures. Our simulations have shown that with directional mate preferences in particular, populations move around through phenotype space much more quickly than they would under genetic drift alone, and not uncommonly in direct opposition to natural selection forces (Miller & Todd, 1993). Thus, sexual selection can be seen as a way of making Wright's shifting balance model much more powerful, by allowing active mate choice dynamics to replace passive genetic drift as the main source of evolutionary innovation.

Aside from classical genetic drift (sampling error in small populations), "neutral drift" through adaptively neutral mutations (Kimura, 1983) might conceivably play an important role in allowing populations to explore high-dimensional adaptive landscapes. The idea is this: the more dimensions there are to an adaptive landscape, the less problematic local optima will be, because the more equal-fitness 'ridges' there will be from one optimum to another in the space.
A local optimum may be a peak with respect to each of two phenotypic dimensions, but it is unlikely to be a peak with respect to each of a thousand dimensions, so there will be plenty of room for adaptively neutral exploration of phenotype space (see Eigen, 1992; Schuster, 1988). Under this model, populations can drift around through adaptive landscapes without incurring fitness costs for doing the exploration. The neutral drift theory is usually applied to molecular evolution (DNA base pair substitutions typically do not change expressed protein functionality), but it could in principle extend to morphology and behavior. To take an implausible example, if quadrupedalism and bipedalism happen to have equal locomotive efficiency in a certain environment (such as the Pleistocene savanna of Africa), a population might drift from the former to the latter without incurring much fitness cost in between, and without natural selection in favor of bipedalism per se. Although both ways of moving may be equal in locomotive efficiency, they have very different implications with respect to other potential activities such as tool use. Once the population drifts into bipedalism, it will happen to enter a new adaptive zone wherein natural selection can favor new adaptations for tool use, resulting in an evolutionary innovation with respect to tool use. Thus, if the problem of local optima in high-dimensional adaptive landscapes really is overstated, then neutral drift from one adaptive zone to another might facilitate the
discovery of evolutionary innovations associated with different adaptive peaks. However, we believe that for complex phenotypic adaptations at the level of morphology and behavior, the problems of local optima are not so easily overcome. The evolutionary conservatism characteristic of many morphological and behavioral traits in many taxa suggests that neutral drift has trouble operating on such traits. Still, so little is known about neutral drift above the level of molecules that such arguments are not convincing. We can nonetheless ask, if neutral drift theory does apply to complex phenotypic traits, is neutral drift through phenotype space likely to be faster with or without the capricious dynamics of sexual selection? Here again, we believe that populations capable of mate choice will be more likely to exploit the possibilities of neutral drift and move along fitness ridges, because mate choice can confer more mobility and momentum on evolving populations.
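The dimensionality argument for neutral ridges given earlier in this section can be put as a rough back-of-the-envelope calculation. As a simplifying assumption of our own (not a result from the neutral drift literature), suppose that a given point in phenotype space is a local maximum along any single dimension with probability p, independently across the n dimensions. The probability that it is a peak in every dimension at once is then:

```latex
P(\text{peak in all } n \text{ dimensions}) = p^{\,n},
\qquad\text{so for } p = \tfrac{1}{2}:\quad
P = \tfrac{1}{4} \;\; (n = 2),
\qquad
P = 2^{-1000} \approx 10^{-301} \;\; (n = 1000).
```

Real phenotypic dimensions are correlated rather than independent, so this calculation overstates how rare true peaks are; it is precisely such correlations among the dimensions of complex morphological and behavioral traits that make us doubt that local optima are so easily escaped at that level.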
5.2 The role of sexual dimorphism in escaping local optima through sexual selection

As Darwin (1871) noted, females are usually choosier than males about their mates, so sexual selection typically acts more strongly on males. Sexually dimorphic selection pressures will often result in sexually dimorphic traits, although dimorphism in a trait tends to evolve much more slowly than the trait itself (Lande, 1980, 1987). Thus, Darwin was able to use sexual dimorphism as a diagnostic feature for a trait having evolved through sexual selection. But the effects of sexual dimorphism on longer-term evolutionary processes have rarely been considered. Highly elaborated male courtship displays, whether behavioral or morphological, are often costly in terms of the male's 'economic' success with respect to the surrounding econiche. Indeed, according to Zahavi's (1975) handicap theory, this cost is indirectly the reason why elaborated displays can evolve under sexual selection, as an indication of the male's vitality in being able to overcome the handicapping costly courtship display. If we view a dimorphic population as situated in an adaptive landscape that represents purely ecological (economic) fitness, then the females will be situated close to the fitness peak, while the males will be situated some distance from the peak, and thus lower on the fitness landscape. As the male displays become more elaborated and more costly, the males will end up further away from the fitness peak representing economic optimality. Thus, sexual dimorphism in courtship traits leads to a kind of sexual division of labor with respect to the job of exploring adaptive landscapes. Males get pushed off economic fitness peaks by the pressure of female choice in favor of highly elaborated, costly courtship displays. Due to the typical lack of male choosiness, the females can stay more comfortably situated near the economic fitness peak.
Thus, males become the explorers of the adaptive landscape, compelled to wander through the space of possible phenotypic designs by the demands of female choice to 'bring home' a sexy, interesting, and expensive courtship display. The economic costs of wandering through phenotype space are compensated for by the reproductive benefits of attracting mates with a costly, elaborated courtship
display. In most species most of the time, the males will reach some equilibrium distance (Fisher, 1930; Kirkpatrick, 1982), close enough to the economic fitness peak to survive, but far enough away to demonstrate their viability and to incur the costs of an elaborate display, and the species will be recognized as having some sexually dimorphic traits. But sometimes, in some species, the males might stumble upon a new adaptive zone in the course of their wanderings. That is, a sexually elaborated trait, or some phenotypic side-effect of it, could prove economically useful, and become subject to favorable natural selection. The males would then start to climb the new economic fitness peak; and once the males reach a level of economic benefit on this new peak that exceeds the benefit obtainable on the old fitness peak, then there can be selection for females as well to move from their position on the old peak to the new, higher, peak. This selection on females would act to eliminate the sexual dimorphism that maintained the useful new traits in the males alone, so that the females too could inherit the new trait (from their fathers initially). Thus, once the males enter a new adaptive zone and start to climb a higher fitness peak, a combination of natural selection and reduced sexual dimorphism may move the entire population, males and females, to the top of the new fitness peak. Populations that successfully shift from one adaptive peak to another will show little sexual dimorphism for the original courtship traits that brought them into the region of the new peak, since selection on the females will have worked to remove it; instead, they will be recognized as beneficiaries of an evolutionary innovation that is characteristic of both males and females. 
So it may be difficult to recognize modern species that have undergone this peak-jumping process except through careful analysis of the fossil record; computer simulation may be more useful in determining whether this peak-jumping mechanism is plausible. Such hypothesized rapid shifts between fitness peaks resemble what Simpson (1944) called "quantum evolution" or what Eldredge and Gould (1972) called "punctuations". The quantum evolution term is apt because our theory suggests that populations capable of sexual dimorphism can do a kind of 'quantum tunneling' between adaptive peaks: the normal economic costs that slow movement across low-fitness valleys between peaks can be overridden by genealogical (sexually selected) benefits to the males, allowing them to traverse the valleys much more quickly. The females can then join the males once a new peak is actually discovered. The result could be much more rapid movement between peaks than would be possible under natural selection alone. This rapid tunneling between peaks looks strange from the perspective of the purely economic adaptive landscape that represents only natural selection pressures. But that landscape is not the whole picture: the effects of sexual selection establish a separate 'reproductive landscape' with different dimensions and perhaps a different topography for males and females. The economic and reproductive landscapes together combine to form a master adaptive landscape; what looks like paradoxical downhill movement or quantum tunneling in the purely economic landscape traversed by natural selection may actually be hillclimbing in the combined landscape that includes sexual selection pressures.
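The claim that apparent downhill movement on the economic landscape can be hill-climbing on the combined landscape can be sketched with a small deterministic example in Python. Everything in it is a hypothetical illustration: the economic fitness values, the assumption that the male mating benefit grows linearly with the display trait, and the names `economic`, `male_combined`, and `hill_climb`. It is not the simulation reported in Miller & Todd (1993).

```python
from typing import Callable

# Hypothetical economic (natural-selective) fitness over a display trait
# x = 0..10: an old peak at x = 2 and a higher peak at x = 8, with a
# low-fitness valley between them.
ECONOMIC = [2.0, 3.0, 4.0, 3.5, 2.5, 1.0, 2.0, 4.0, 6.0, 4.0, 2.0]


def economic(x: int) -> float:
    """Fitness on the purely economic landscape (what females experience here)."""
    return ECONOMIC[x]


def male_combined(x: int, mating_benefit: float) -> float:
    """Male fitness on the combined landscape: economic fitness plus a
    reproductive benefit assumed to rise linearly with display elaboration."""
    return ECONOMIC[x] + mating_benefit * x


def hill_climb(fitness: Callable[[int], float], start: int) -> int:
    """Move one trait step at a time to the fitter neighbour until neither
    neighbour is strictly fitter than the current position."""
    pos = start
    while True:
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(ECONOMIC)]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(pos):
            return pos
        pos = best


# Natural selection alone: the population stays on the old peak.
print(hill_climb(economic, start=2))
# Weak female preference: males settle at an equilibrium just off the peak.
print(hill_climb(lambda x: male_combined(x, 0.75), start=2))
# Strong female preference: males cross the valley to the higher peak.
print(hill_climb(lambda x: male_combined(x, 1.75), start=2))
```

Under these assumptions, climbing on the economic landscape alone leaves the population at x = 2. A weak mating benefit moves males to an equilibrium at x = 3, slightly off the economic peak, and a strong one carries them step by step across the valley to x = 8, where economic fitness is higher than on the old peak. Each male step is uphill on the combined landscape even while it is downhill on the economic one.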
But won't these initially economically unfeasible excursions by the males threaten their survival, and hence that of the species as a whole? Sexual selection is often maligned for just this reason, as "a fascinating example of how selection may proceed without adaptation" (Futuyma, 1986, p. 278), on the principle that the economic costs of highly elaborated male courtship displays might predispose a species to extinction, e.g. as argued by Haldane (1932), Huxley (1938), and Kirkpatrick (1982). But as Pomiankowski (1988) has emphasized, the relationship between male economic success and population viability is quite complex and unclear. Reproductive output in sexually-reproducing species is typically limited by the number of females, not by the number of males. The population's rate of replacement will not necessarily be decreased by the loss of male viability due to elaborated courtship displays. On the contrary: "a population denuded of males will have more resources available for females and so may support an absolutely larger reproductive output for a given resource base" (Pomiankowski, 1988). Thus, the population-level costs of sexually elaborated traits may be minimal, and the individual-level benefits may be large, due to sexual selection. This makes quantum tunneling between adaptive peaks through sexual selection a plausible mechanism for generating evolutionary innovations and escaping local ecological optima.

At first glance, our proposal bears an uncomfortable resemblance to traditional sexist images of males going out to hunt and sometimes returning with meat for the benefit of their families. But females may also do some important exploration of the adaptive landscape, with respect to different phenotypic dimensions. Under Fisher's (1930) runaway selection model, for example, female preferences and male traits both become elaborated through sexual selection. Females become ever-choosier and more discriminating.
The benefits of selective mate choice can favor the evolution of new sensory, perceptual, and decision-making adaptations in females, despite their economic costs. Thus, while males are exploring the space of possible secondary sexual characteristics and behavioral courtship displays under sexual selection, females may be exploring the space of possible sensory, perceptual, and cognitive traits. If the females happen upon a mate choice mechanism such as a new form of color vision or better timbre perception that also happens to have economic benefits in their econiche, then we would expect such mechanisms to be further modified and elaborated through natural selection, and inherited by males as well, eventually showing low dimorphism. Thus, females can also tunnel between peaks in the space of possible perceptual systems, deriving the reproductive benefits of selective mate choice even when a perceptual system shows little ecological benefit.

In summary, sexual selection provides the easiest, fastest, and most efficient way for populations to escape local ecological optima. Sexual dimorphism with respect to courtship traits and mate preferences allows a sexual division of labor in searching the adaptive landscape. Many morphological and behavioral innovations that currently show high economic utility and low sexual dimorphism may have originated as parts of male courtship displays. Likewise, many sensory, perceptual, and decision-making innovations could have originated as components
of female choice mechanisms, and later have been modified for ecological applications. Those innovations that did not happen to show any ecological utility remained in their sexually dimorphic form, and are typically not recognized as innovations at all.

6 Sexual selection and evolutionary innovations

6.1 The mystery of evolutionary innovations
Evolutionary innovations are important because natural selection crafts adaptations out of innovations: "Innovation is the mainspring of evolution" (Jablonski & Bottjer, 1990, p. 253). Classic examples of major evolutionary innovations include the bony skeleton of vertebrates, the jaws of gnathostomes, the amniote egg, feathers, continuously growing incisors, large brains in hominids, the insect wing, and insect pollination of angiosperms (Cracraft, 1990). But the complete list of major evolutionary innovations is almost endless, being virtually synonymous with the diagnostic characters of all successful higher taxa, and the complete list of minor innovations would include essentially all diagnostic characters of all species. But, for all their biological importance and large number, the causal origins of evolutionary innovations have been long contended and remain poorly understood. Virtually every major evolutionary theorist has tackled the problem of evolutionary innovations, e.g. Darwin (1859, 1871, 1883), Romanes (1897), Weismann (1917), Wright (1932, 1982), Simpson (1953), Mayr (1954, 1960, 1963), and Gould (1977). But the major questions remain unresolved (see Nitecki, 1990, for a recent review). This section reviews the history of evolutionary thinking about innovations; section 6.2 examines the most baffling features of innovations; section 6.3 suggests that sexual selection through mate choice can help explain the strange pattern of innovations in animals and flowering plants; section 6.4 outlines some limits to our hypothesis; and section 6.5 concludes the discussion of innovations.

Darwin, particularly in the sixth edition of the Origin of species (Darwin, 1883), worried about the early evolutionary stages of "organs of extreme perfection" such as the human eye and the bird's wing. How could these innovations be preserved and elaborated before they could possibly assume their later survival function (such as vision or flight)?
The problem for Darwin was to account for the origin of phenotypic innovation that was more complex and well-integrated than what random mutation could produce, but that was not yet useful enough in the struggle for existence to have been favored by natural selection. Mutations seemed able to generate only trivial or disastrous phenotypic changes, and so could not account for the origins of useful innovations, whereas natural selection could only optimize innovations already in place. Nor could Darwin convince skeptics that some mysterious interplay between mutation and selection could account for evolutionary innovations. Darwin's difficulty in accounting for evolutionary innovations was one of the weakest and most often-attacked aspects of his theory of natural selection. Even
his most ardent followers were anxious about this problem. Romanes (1897) was very concerned to show how "adaptive characters", or evolutionary novelties, originate. For him, this was the central question of evolutionary theory, much more important than the question of how species originate, but one that he was never able to answer to his own satisfaction. Simpson (1953) later proposed that "key mutations" can cause a lineage to enter a new "adaptive zone" such that the lineage undergoes an adaptive radiation, splitting apart into a large number of species to exploit all the ecological opportunities in that new adaptive zone. Similarly, Mayr (1963) defined an evolutionary innovation as "any newly acquired structure or property that permits the performance of a new function, which, in turn, will open a new adaptive zone" (Mayr, 1963, p. 602). However, both Simpson and Mayr were better able to describe innovation's effects than to explain its causes. Their notion that major innovations are closely associated with adaptive radiations has been a persistent theme in innovation theory, appearing more recently under the guise of "key evolutionary innovations" in Liem (1973, 1990), and "key characters" in Van Valen (1971). Over this long history, several kinds of explanations have been offered to explain the emergence of evolutionary innovations. Goldschmidt (1940) suggested that macromutations could produce fully functioning novelties in the form of "hopeful monsters". The problem with this view is that random macromutations are overwhelmingly unlikely to generate the sort of structural complexity and integration characteristic of innovations even in their early stages. Complex innovations cannot be explained by undirected random mutation. On the other hand, Fisher (1930) took the Darwinian hard line and maintained that innovations could indeed be produced purely through natural-selective hill-climbing. 
The difficulty with this idea is that it ignores the problem of local optima, as discussed in section 5. Significant innovation corresponds to fairly substantial movement through a multi-dimensional adaptive landscape. But because many adaptive landscapes have complex structures (Eigen, 1992; Kauffman, 1993), with many peaks, ridges, valleys, and local optima, long movements through such landscapes may often require escaping from local optima. As section 5.1 emphasized, this problem of escaping local optima may be more serious at the level of complex phenotypic design than at the level of genetic sequences or protein shapes (cf. Eigen, 1992), and most evolutionary innovations of interest to biologists are innovations in complex phenotypic design. (However, see Dawkins, 1994, for a description of a recently simulated example of a possible course of evolution for a complex adaptation, the vertebrate eye, that proceeds rapidly and directly from flat skin to fish-eye in 400,000 generations without getting stuck in local optima.) Thus, the evolution of a new phenotypic innovation may often reflect escape from a local adaptive optimum and the discovery of a better solution elsewhere in the space of possible phenotypes (Wright, 1932; Patterson, 1988). Finally, other theorists have put forth explanations of the origins of innovation that stress the role of phenotypic structure in allowing for innovations. In these theories, innovative adaptations can arise through phenotypic by-products
of other adaptive change (Mayr, 1963), through various mechanisms of phenotypic self-organization (e.g. Eigen, 1992; Kauffman, 1993), and through changes in developmental mechanisms, particularly 'heterochronies' that affect the relative timing of the development of different traits (Bonner, 1982; Goodwin et al., 1983; Gould, 1977; Muller, 1990; Raff, 1990; Raff & Raff, 1987). These sorts of phenotypic constraints and correlations are probably important, but as we will see, they cannot explain the most striking features of the distribution of evolutionary innovation. There are three major problems for these and the other traditional theories about evolutionary innovation just described; we will now examine these challenges in turn.

6.2 Three puzzling aspects of evolutionary innovation
First, there is a disparity between the huge number of minor varietal innovations and the small number of ecologically useful innovations. Darwin (1883, p. 156) stressed this problem when he quoted Milne Edwards: "Nature is prodigal in variety but niggardly in innovation. Why ... should there be so much variety and so little real novelty?". The vast majority of characteristic innovations are "inconsequential" (Liem, 1990); they are what Francis Bacon called "the mere Sport of Nature" when he disparaged the apparently pointless variety of animals, plants, and fossils (quoted in Cook, 1991). Only very few of the initially inconsequential minor innovations may lead to major innovative evolutionary shifts in form or function that allow the invasion of major new habitats and adaptive zones. But if evolutionary innovations spread through populations under the influence of traditional natural selection for their ecological utility, why do so few varietal innovations show the sort of ecological utility that characterizes key innovations?

Second, there is often a disparity in time between the causal origin of an innovation and the ultimate ecological and evolutionary effect of an innovation. The causes of evolutionary innovations must be clearly separated from their possible effects on diversification, niche exploitation, or adaptive radiation (Cracraft, 1990). "Key innovations" that allow a monophyletic taxon to radiate outwards into a number of new niches can only be identified post hoc, after their success has been demonstrated evolutionarily. Immediately after they originate, evolutionary innovations are just innovations pure and simple. Their prospective future ecological utility as fully elaborated traits cannot bring them into being initially. If we wish to understand the actual causal origins of evolutionary innovations, we must look within the species where the innovation originated, not at the ultimate macroevolutionary consequences of the innovation.
Liem has stressed this point, observing that "An evolutionary novelty may remain in a stasis for extended times when it does not convey an improvement in the matter/energy transfer" (Liem, 1990, p. 161), and "historical tests also show that there is often a great delay between the emergence of a KEI [key evolutionary innovation] and the onset of the diversification it is assumed to cause" due to its newfound ecological utility (Liem, 1990, p. 165). Earlier, he also noted that "adaptive radiations will not occur until after an evolutionary novelty has reached a certain degree of development" (Liem, 1973, p. 426). Jablonski (1986, 1990) has also observed
that many innovations fail to persist, let alone trigger a diversification indicative of ecological utility. Thus, to understand key innovations, we must explain the origin and elaboration of many integrated morphological and behavioral systems that only rarely manifest much survival utility. We seem to need a form of iterative Darwinian selection other than natural selection for ecologically useful survival traits, to account for the period of evolution of an innovation between its first appearance and its eventual ecological significance.

Third, the distribution of innovations in animals and flowering plants is not random with respect to phenotypic features, but is highly concentrated in features subject to sexual selection. Traditional theories of innovation through natural selection or through phenotypic constraints and correlations have trouble accounting for this distribution, which is seen most clearly when we consider the methods of biological taxonomy. The most common features used by taxonomists to distinguish one species from another should logically be the sorts of features most characteristic of (at least minor) evolutionary innovations. This is an almost tautological result of the fact that taxa, including species, are in some sense made up of their innovations (Weismann, 1917): their innovations are their critical defining features. The most commonly used defining features for species appear to be primary and secondary sexual traits, and behavioral courtship displays, which Mayr (1960) designated "species recognition signals". And a great many of these traits, used in the identification of species of animals and flowering plants and discussed in speciation research, are just the sort of characteristics most likely to have arisen by sexual selection through mate choice. Studies of evolutionary innovation that rely on reconstructing explicit phylogenies often rely on such features. For example, in Cracraft's (1990, pp.
31-35) analysis of evolutionary innovations in the Pionopsitta genus of South American parrots, every single one of the 30 innovations discussed was a distinctive plumage color pattern or plumage growth pattern that could have been elaborated through mate choice, such as "bright orange-red shoulder patch", "crown bright red in male, not female", "yellow collar around head", or "crown and back of neck black". Moreover, it is often easier in taxonomy to identify the species of a male than of a female animal, because secondary sexual characters are typically more elaborated in males, whereas females more often retain camouflaged and ancestral forms (Eberhard, 1985). So, in Eldredge's (1989) terminology, reproductive rather than economic traits are often used to distinguish between species. In section 7.1, we argue that speciation can result from a stochastic divergence of mate choice criteria in a geographically-united population leading to a disruption of the mate recognition system within a given species. Under this scenario, most of the traits distinguishing one species from another (that is, most minor evolutionary innovations) are likely to be sexual characters or courtship displays that arose through mate choice. Moreover, the biological species concept, which views species as reproductively isolated populations, virtually demands that the innovations that distinguish one species from another must function as reproductive isolators, that is, as traits subject to selective or assortative mate choice. Thus,
both the empirical methods of taxonomists and the theoretical presuppositions of the biological species concept suggest that most evolutionary innovations in animals and flowering plants arose through sexual selection acting on traits capable of creating reproductive isolation between populations, particularly primary and secondary sexual characteristics and courtship behaviors. To explain evolutionary innovations then, we need to account for the following facts: (1) Most innovations are too complex and well-integrated to have resulted simply from random mutation or genetic drift, and are too structurally and functionally novel (i.e. functionally non-neutral) to have resulted simply from neutral drift. (2) Many innovations may require escape from an evolutionary local optimum, which natural-selective hill-climbing tends to oppose. (3) Most innovations remain minor, showing very little ecological utility and not leading to adaptive radiations. (4) Those innovations that do eventually become ecologically important often show a long delay between their origin and their proliferation. Finally, (5) most innovations in animals and flowering plants, i.e. most traits taxonomically useful in distinguishing species, are heavily concentrated in phenotypic traits subject to mate choice, and this distribution cannot be explained by models of innovation relying on general phenotypic correlations and constraints. In general then, the origins of evolutionary innovations must be explained in terms of some kind of selection between individuals that has little effect on ecological success and that only rarely leads to macroevolutionary success. "Irrespective of whether innovations are perceived as 'large' or 'small', they all must arise and become established at the level of individuals and populations, not higher taxa" (Cracraft, 1990, p. 28). 
Thus, innovations that characterize an entire population or species must be explained at some level above that of simple mutation or developmental constraints, but below that of macroevolutionary 'sifting' between species (Vrba & Gould, 1986), and aside from that of natural selection for ecological utility.
6.3 The role of mate choice in generating evolutionary innovations
Sexual selection through mate choice can account for all of these features of evolutionary innovation in animals and flowering plants. Thus, Darwin's "prodigal variety" may arise from a long-overlooked wellspring of innovation: the effects and side-effects of mate choice. These sexually-selected varietal novelties could be called "courtship innovations." From these humble origins, a few incipient courtship innovations may continue to be elaborated into more and more complex morphological and behavioral characteristics. At various points in this evolutionary course of elaboration, a tiny minority of courtship innovations and their phenotypic by-products will happen to show some ecological utility, and may be modified to form new "economic innovations." And a tiny minority of these economic innovations will prove important enough that they allow adaptive radiations and later come to be recognized as "key innovations." Thus, the causal origins of key innovations may often be the same as the causal origins of courtship innovations: elaboration of a trait by
sexual selection through mate choice. The net result of sexual selection's innovativeness may be that sexual selection is to macroevolution what genetic mutation is to microevolution: the prime source of potentially adaptive heritable variation, at both the individual and species levels.
6.4 What kinds of evolutionary innovations can be generated through sexual selection?
Our theory that many evolutionary innovations arise at first through the effects of selective mate choice, or as side-effects of sexually-selected traits, must be clarified and given some caveats. First, and most obviously, the theory applies only to biological systems where mate choice operates in some fashion. We have lumped together flowering plants and animals because they both undergo a form of sexual selection by animals with nervous systems, either heterospecific pollinators or conspecifics. Evolutionary innovations in asexual lineages, and in sexually reproducing organisms that are too simple to exercise heritable patterns of nonrandom mate choice, must be explained in some other way. But since innovations seem to emerge much more slowly and sparsely in lineages without mate choice, there is less that needs explaining. Thus, we would expect the frequency distribution of evolutionary innovations to be highly skewed across lineages, clustered in species subject to high levels of selective mate choice. As sections 6.1 and 7.2 argue, this is just what we see. Second, selective mate choice can directly affect only those phenotypic traits that are perceivable to the animal doing the selecting, given its sensory and perceptual capabilities. Thus, mate choice typically applies to macroscopic morphology and manifest behavior. But it also applies indirectly to any microscopic morphology, physiology, neural circuitry, or biochemistry that affects the appearance of the perceivable traits or behaviors, e.g.
the iridescence of bird feathers caused by microscopic diffractive structures on feathers, the complex courtship behavior generated by hidden neural circuits, or the persistent bird song allowed by an efficient metabolism. Furthermore, elaboration of these sexually selected traits may often have phenotypic side-effects on many other traits, and ecologically useful innovations may sometimes emerge from these side-effects. So we would expect the frequency distribution of evolutionary innovations across phenotypic traits to be highly skewed, clustered around traits that are directly subject to mate choice (such as genitals, secondary sexual morphology, and courtship behaviors), and spreading outwards from these traits to others that are structurally, behaviorally, or developmentally correlated. Third, as a corollary of the previous point about phenotypic side-effects, our theory may have fairly limited application to evolutionary innovation in the traits of flowering plants, apart from flowers themselves. Pollinators can directly select for flower traits such as shape, color, smell, and size, but it is unclear how easy it would be for floral innovations to become modified into ecologically useful new kinds of seeds, fruits, or chemical defenses, much less new kinds of twigs, leaves, or roots. Moreover, despite the fact that the complexity of plant behavior has often been underestimated (see Darwin, 1876; Simon, 1992), plants cannot
use shifts in behavior and habit to smooth the way for changes of morphological function as easily as animals do (Darwin, 1883; Bateson, 1988). As a result, the modification of courtship innovations into economic innovations in plants may be more difficult than in animals. However, polymorphism and sympatric speciation could almost certainly be facilitated through flower selection by pollinators, as the data from Eriksson and Bremer (1992) suggest. So the effects of pollinator choice might at least explain the higher speciation rates and high rates of floral innovation in flowering plants.
6.5 Summary: An overview of evolutionary innovation through sexual selection
Species perched on adaptive peaks will generally have mate choice mechanisms complementary to the natural-selective pressures keeping them there, so long periods of stasis will ensue for most species, most of the time. But occasionally, directional preferences, or intrinsic perceptual biases in preferences, or genetic drift acting on preferences, can lead to runaway dynamics that take a population (or at least the males) away from the ecological fitness peak. So the effects of mate choice can be visualized as vectors that pull populations away from adaptive peaks out on long forays into the unknown, where they may or may not encounter new ecological opportunities and evolve economically useful traits. If they do not encounter new opportunities, little is lost: the males will have sexually dimorphic courtship innovations, and the females will have mate choice mechanisms, both of which have some economic costs but substantial reproductive benefits. But if they do encounter new opportunities, much is gained: if the male courtship innovation or the female mate choice mechanism happens to be modifiable into a useful economic innovation, then it will be elaborated through natural selection and its degree of sexual dimorphism will decrease. The lucky population will enter a new adaptive zone, rapidly climb the new peak, and may often become reproductively isolated from other populations. The result could look like a period of rapid evolution concentrated around a speciation event, just as described by punctuated equilibrium theory (Eldredge & Gould, 1972). Moreover, if the new adaptive zone happens to be particularly large and fruitful, and the economic innovation proves particularly advantageous, then the event will look like the establishment of a key evolutionary innovation, and may lead to the formation of new higher taxa.
7 Speciation
7.1 Sympatric speciation through sexual selection
Parallel computation can be faster than serial computation. This principle also applies to evolutionary processes of 'biocomputation'. At one level, the adaptive power of natural selection exploits parallelism across the genes, gene complexes, and individuals within a population. But at another level, a single population exploring an adaptive landscape is not as efficient as a set of populations exploring
the landscape in parallel. As section 5.2 discussed, sexual dimorphism between males and females allows one sub-population (the females) to stay perched on an old adaptive peak while another (the males) explores the surrounding phenotype space for other adaptive peaks. Are there any more powerful methods of parallel search in biocomputation that would allow many 'search parties' to branch out across the adaptive landscape? Speciation does exactly that. When a biological lineage splits apart into reproductively isolated subpopulations, one search party is replaced by two independent parties. Here again, we can ask whether mate choice and sexual selection can help biocomputation, this time through facilitating speciation. Though vitally interested in both speciation and mate choice, Darwin did not seem to perceive this connection, and the Origin of species (1859) in fact offered no clear mechanism of any sort whereby speciation could happen. The biologists of the Modern Synthesis (e.g. Dobzhansky, 1937; Huxley, 1942; Mayr, 1942) saw species as self-defined reproductive communities, and yet often argued against the idea that sexual selection, the obvious agent of reproductive self-definition, could induce speciation, because their attitude towards Darwin's theory of selective mate choice was so hostile. Instead, two major theories of speciation developed during the Modern Synthesis, and both suggested that speciating populations are split apart by some divisive force or "cleaver" external to the population itself. The cleaver separates the population in twain genetically and phenotypically, and then reproductive barriers arise afterwards through genetic drift or through selection against hybridization. In Mayr's (1942) model of allopatric (spatially separated) speciation, the cleaver is a new geographic barrier arising to separate previously interbreeding populations. For example, a river may shift course to isolate one population from another. 
Some combination of genetic drift, the "founder effect" (genetic biases resulting from populations starting with a very few isolated individuals), and disruptive selection then causes the two newly isolated groups to diverge phenotypically. Once enough phenotypic divergence accumulates, the populations can no longer interbreed even when the physical barrier disappears, and so are recognized as separate species. Speciation for Mayr was thus generally a side-effect of geographical separation. In Dobzhansky's (1937) model of sympatric (spatially united) speciation, the cleaver is more abstract: it is a low-fitness valley in an adaptive landscape, rather than a barrier in geographic space. For example, an adaptive landscape might contain two high-fitness peaks (econiches) separated by a low-fitness valley. This valley could establish disruptive selection against interbreeding between the peaks, thereby driving an original population starting in the valley to split and diverge towards the separate peaks in two polymorphic subpopulations. Dobzhansky further suggested that after divergence, reproductive isolation evolves through selection against hybridization: since hybrid offspring will usually fall genetically back in the lower-fitness valley, mechanisms to prevent cross-breeding between the separate populations will tend to evolve. Thus the evolution of reproductive isolation (speciation itself) is viewed as a conservative
process of consolidating adaptive change rather than a radical process of differentiation. Vrba (1985) and Futuyma (1986) concur that speciation serves a conservative function, acting like a 'ratchet' in macroevolution: only reproductive isolation allows a newly diverged population to effectively consolidate its adaptive differentiation; otherwise, the parent species will tend to genetically re-absorb it. A recent development in sympatric models is Paterson's (1985) concept of
specific mate recognition systems (SMRSs). SMRSs are phenotypic mechanisms a species uses to maintain itself as a self-defining reproductive community; in our terms, a set of mate choice mechanisms for assortative mating. A species is thus considered the largest collection of organisms with a shared SMRS. In Paterson's view, sympatric disruption and divergence of these SMRSs themselves (through some unspecified processes) can lead to speciation. Eldredge (1989, p. 120) emphasizes the potential macroevolutionary significance of SMRSs: "significant adaptive change in sexually reproducing lineages accumulates only in conjunction with occasional disruptions of the SMRSs." Historically, the acceptability of sympatric models has depended on the perceived ability of disruptive selection to generate stable polymorphisms and eventual reproductive isolation. A large number of experiments reviewed by Thoday (1972) show that disruptive selection is sufficient to generate phenotypic divergence even in the face of maximal gene flow between populations (which Mayr, 1963, p. 472, saw as the Achilles' heel of sympatric speciation models), and that mechanisms of reproductive isolation can then evolve to avoid hybrids and consolidate that divergence. Computer models by Crosby (1970) showed that sympatric speciation could occur when populations choose different micro-habitats, evolve stable polymorphisms through disruptive selection, and then evolve reproductive barriers to avoid hybridization. But the speciation debate has continued to grind down to a question of whose cleaver is bigger: Mayr's (1942) geographic barriers or Dobzhansky's (1937) fitness valleys. To address this issue, we (Todd & Miller, 1991) developed a computer simulation of sexual selection that allowed for the possibility of "spontaneous" sympatric speciation through the interaction of assortative mating and genetic drift acting in a finite population.
We found that spontaneous speciation could indeed happen, even in the absence of any geographic isolation and even without any natural selection: no cleaver is necessary beyond the mate choices of individuals in the population. The rate of speciation increased with mutation rate and depended on the exact type of mate preference implemented. Preferences for individuals similar to one's own phenotype yielded the highest speciation rate, while inherited preferences for individuals with particular specific phenotypes yielded lower rates of speciation. In further investigations we found that spontaneous speciation also happens robustly with directional mate preferences, when the directional preference vectors happen to diverge and split the population into two subpopulations heading off on different trajectories through phenotype space (Miller & Todd, 1993); and that speciation can happen robustly as well when an individual's mate preferences are learned from the phenotypes of their
parents through the process of 'sexual imprinting' (Todd & Miller, 1993; Todd, in press).
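The ingredients named above (assortative mating, mutation, and drift in a finite population, with no natural selection) can be illustrated with a minimal sketch. This is not the simulation of Todd & Miller (1991), whose details are not given here. The two-dimensional phenotype, the per-coordinate inheritance, the Gaussian mutation, the fixed mating `tolerance`, and all function names are assumptions made for illustration. A "mating group" is counted as a set of individuals linked by chains of mutually acceptable matings, which serves as a crude stand-in for a reproductively isolated cluster.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two phenotypes (tuples of floats)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def breed(mother, father, mutation_sd, rng):
    """Offspring takes each coordinate from one parent at random, plus Gaussian mutation."""
    return tuple(rng.choice(pair) + rng.gauss(0.0, mutation_sd)
                 for pair in zip(mother, father))


def next_generation(population, tolerance, mutation_sd, rng):
    """One generation of assortative mating with no natural selection.

    Assumption: a chooser accepts only mates whose phenotype lies within
    `tolerance` of its own (a preference for similar phenotypes).
    """
    courtships = []
    for i, chooser in enumerate(population):
        suitors = [p for j, p in enumerate(population)
                   if j != i and distance(chooser, p) <= tolerance]
        if suitors:
            courtships.append((chooser, suitors))
    if not courtships:
        raise ValueError("no individual has an acceptable mate")
    # Population size stays fixed, so sampling parents at random produces drift.
    offspring = []
    for _ in range(len(population)):
        chooser, suitors = rng.choice(courtships)
        offspring.append(breed(chooser, rng.choice(suitors), mutation_sd, rng))
    return offspring


def count_mating_groups(population, tolerance):
    """Count groups connected by chains of acceptable matings."""
    unvisited = set(range(len(population)))
    groups = 0
    while unvisited:
        groups += 1
        frontier = [unvisited.pop()]
        while frontier:
            current = frontier.pop()
            linked = [k for k in unvisited
                      if distance(population[current], population[k]) <= tolerance]
            for k in linked:
                unvisited.remove(k)
            frontier.extend(linked)
    return groups


def simulate(size=60, generations=50, tolerance=1.0, mutation_sd=0.3, seed=0):
    """Run the sketch and record the number of mating groups per generation."""
    rng = random.Random(seed)
    population = [(0.0, 0.0)] * size  # everyone starts at the same phenotype
    history = []
    for _ in range(generations):
        population = next_generation(population, tolerance, mutation_sd, rng)
        history.append(count_mating_groups(population, tolerance))
    return population, history
```

Calling `simulate()` returns the final population and the per-generation group counts. Whether and how often the population splits depends on the assumed parameters, so the sketch is best used to vary `mutation_sd` and `tolerance` and watch how the counts respond, rather than to reproduce the reported results.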
7.2 Sexual selection and the origins of biodiversity
There is some biological evidence that speciation rates are indeed higher when selective mate choice plays a more important role. Ryan (1986) found a correlation between cladal diversity in frogs and complexity of their inner ear organs (amphibian papilla), which are responsible for the operation of female choice on male calls. He reasoned that "since mating call divergence is an important component in the speciation process, differences in the number of species in each lineage should be influenced by structural variation of the inner ear [and hence the operation of mate choice]" (p. 1379). Immelmann (1972, p. 167) has argued that mate preferences derived from imprinting on the phenotypes of one's parents may speed speciation in ducks, geese, and the like: "imprinting may be of special advantage in any rapidly evolving group, as well as wherever several closely related and similar species occur in the same region [i.e. sympatric situations]. Interestingly enough, both statements really do seem to apply to all groups of birds in which imprinting has been found to be a widespread phenomenon...". The enormous diversity of insects (at least 750,000 documented species, maybe as many as 10 million in the wild) might seem at first sight to contradict the notion that mate choice facilitates speciation, since few (except Darwin) seem willing to attribute much mate choice to insects. But Eberhard (1985, 1991, 1992) has shown that male insect genitalia evolve largely through the effects of cryptic female choice, in such a way that speciation could be promoted. Further evidence for speciation through mate choice comes from a consideration of biodiversity and the numbers of species across different kingdoms and phyla. There seems to be a striking correlation between a taxon's species diversity and the taxon's evolutionary potential for sexual selection through mate choice, resulting in highly skewed richness of species across the five kingdoms.
Recent estimates of biodiversity suggest there may be somewhere between 10 and 80 million species on earth (May, 1990, 1992). But of the 1.5 million or so species that have actually been identified and documented so far by taxonomists, the animal kingdom contains about 1,110,000, the plant kingdom contains about 290,000, the fungi contain about 90,000, the protists contain about 40,000, and the monera contain only about 5000 (Cook, 1991). (It should be noted that sampling biases might account for a small amount of the skewness here: many animals and plants are larger and easier to notice and to classify than fungi, protists, or monera.) Although the vast majority of species in each kingdom can undergo some form of genetic recombination through sexual reproduction, only in the animals and the flowering plants is selective mate choice of central importance. Of the 290,000 documented species of plants, about 250,000 are angiosperms (flowering plants) fertilized by animal pollinators. And of the 1,110,000 documented species of animals, those with sufficient neural complexity to allow for some degree of mate choice (particularly the arthropods, molluscs,
and chordates) are much more numerous than those without. Thus, species diversity is vastly greater among taxa wherein a more or less complex nervous system mediates mate choice, either a conspecific's nervous system in the case of animals, or a heterospecific pollinator's nervous system in the case of flowering plants. This pattern is the opposite of what we might expect if allopatric speciation were the primary cause of biodiversity. The effects of geographic separation (allopatry) should obviously be weaker for species whose reproduction is mediated by a mobile animal. Animals can search over wide areas for mates and pollinators can fly long distances. So allopatric speciation would predict lower species diversity among taxa whose reproduction is mediated by mobile animals with reasonably complex nervous systems, which is exactly the opposite of what we observe. A similar problem holds for sympatric speciation through disruptive selection: animals with complex nervous systems should find it easier to generate conditional behavior that exploits different fitness peaks (ecological niches) flexibly, without having to speciate in order to specialize. Yet it is precisely such animals that seem to speciate most quickly. To further explore the role of selective mate choice in creating species biodiversity, we need to analyze the degree of mate choice in the various taxa more accurately, adjust the speciation rates between taxa for number of generations of evolution (and thus organism size), and if possible take into account the amount of geographic spread and migratory range of the species involved. In this way, we hope to gain more evidence to show that sympatric speciation through mate choice, particularly through assortative mating, is the best explanation available for the extreme biodiversity of animals and flowering plants, and is thus the most powerful mechanism for dividing up and spreading out evolution's exploratory search of the adaptive landscape.
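The skew in documented species counts can be made concrete with a little arithmetic on the figures quoted above from Cook (1991). The sketch below only re-expresses those counts as proportions; the function and variable names are illustrative. The combined figure for animals plus angiosperms is an upper bound on the share of species in taxa where mate choice can operate, since the text gives no count for animals lacking sufficient neural complexity.

```python
# Documented species per kingdom, as quoted in the text (Cook, 1991).
DOCUMENTED_SPECIES = {
    "animals": 1_110_000,
    "plants": 290_000,
    "fungi": 90_000,
    "protists": 40_000,
    "monera": 5_000,
}
# Flowering plants fertilized by animal pollinators, as quoted in the text.
ANGIOSPERMS = 250_000


def kingdom_shares(counts):
    """Return each kingdom's fraction of all documented species."""
    total = sum(counts.values())
    return {kingdom: n / total for kingdom, n in counts.items()}


def mate_choice_upper_bound(counts, angiosperms):
    """Fraction of documented species that are animals or angiosperms.

    Upper bound only: not every animal species exercises mate choice.
    """
    total = sum(counts.values())
    return (counts["animals"] + angiosperms) / total


if __name__ == "__main__":
    total = sum(DOCUMENTED_SPECIES.values())  # 1,535,000, i.e. "1.5 million or so"
    print(f"total documented species: {total:,}")
    for kingdom, share in kingdom_shares(DOCUMENTED_SPECIES).items():
        print(f"{kingdom:>8}: {share:.1%}")
    bound = mate_choice_upper_bound(DOCUMENTED_SPECIES, ANGIOSPERMS)
    print(f"animals + angiosperms (upper bound): {bound:.1%}")
```

On these figures, animals and angiosperms together account for close to nine-tenths of documented species, subject to the sampling biases already noted in the text.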
8 Implications and applications
8.1 Implications for biology and psychology
Biologists have been exploring the nuances of natural selection almost continuously since Darwin's time, and much has been learned. By contrast, Darwin's (1871) theory of sexual selection through mate choice was virtually ignored until about 15 years ago, so the implications of sexual selection are only beginning to be realized. This paper has made some strong claims about how natural selection and sexual selection might interact to explain long-standing mysteries in biology, such as how complex adaptations get optimized, how species split apart, and how evolutionary innovations are constructed before they show any ecological utility. From the perspective of traditional natural selection research and the Modern Synthesis, these claims may look strange and implausible. But Darwin may not have found them so. Taking mate choice seriously does not mean abandoning Darwinism, adaptationism, optimality theory, game theory, or anything else of proven value in biology. It simply means recognizing a broader class of selection
pressures and a richer set of evolutionary dynamics than have been analyzed so far. Psychology has barely begun to recognize the role of natural selection in constructing mental and behavioral adaptations, much less the role of sexual selection in doing so. One of our motivations for exploring the interaction of natural and sexual selection is our conviction that sexual selection may have played a critical role in the evolution of our unique human morphology (Szalay & Costello, 1991) and psychology (Miller, 1993). The evolution of the human brain can be seen as a problem of escaping a local optimum: the ecologically efficient 500 cc. brain of the Australopithecines, who were perfectly good at bipedal walking, gathering, scavenging, and complex social life with their normal ape-sized brains. During the rapid encephalization of our species in the last two million years, through the Homo habilis and Homo erectus stages up through archaic Homo sapiens, our ancestors showed very little ecological progress: tool making was at a virtual stand-still, the hunting of even small animals was still quite inefficient, and we persisted alongside unencephalized Australopithecine species for well over a million years. These facts suggest an evolutionary pattern just like that of other key innovations, as discussed in section 6.2: that large brains did not give our lineage any significant ecological advantages until the last 100,000 years, when big-game hunting and complex tool-making started to develop quite rapidly, long after we had attained roughly our present brain size. Instead, we propose that brain size probably evolved through runaway sexual selection operating on both males and females (Miller, 1993). Human encephalization represents the most mysterious example of innovative escape from a local ecological optimum, and we think the runaway dynamics of selective mate choice had everything to do with this escape.
8.2 Applications in genetic algorithms research and evolutionary design optimization
If mate choice has been critical to the innovation, optimization, and diversification of life on our planet, we might expect that mate choice will also prove important in the design of complex artificial systems using genetic algorithms and other evolutionary optimization techniques. Evolutionary engineering methods are often defended by claiming that we have a 'sufficiency proof' that natural selection alone is capable of generating complex animals with complex behaviors. But this is not strictly true: all we really know is that natural and sexual selection in concert can do this. Indeed, the traditional assumption in genetic algorithms research that sexual recombination per se is the major advantage of sexual reproduction (Holland, 1975; Goldberg, 1989) may be misleading. If instead the process of selective mate choice is what gives evolutionary power and subtlety to sexual reproduction, then current genetic algorithms work may be missing out on a major benefit of simulating sex. For those interested in evolving robot control systems (e.g. Cliff, Husbands, & Harvey, 1992; Harvey, Husbands, & Cliff, 1992, 1993) or other complex design
structures (e.g. Goldberg, 1989; Koza, 1993; see Forrest, 1993) through simulated natural selection, we suggest that incorporating processes of simulated sexual selection may help speed optimization, avoid local evolutionary optima, develop important new evolutionary innovations, and increase parallel search and niche differentiation through speciation. These effects may become particularly important as we move from pre-defined noise-free fitness functions to more complex, noisy, emergent fitness functions of the sort that arise when actually simulating ecosystems, coevolution, and other more naturalistic interactions. Also, to the extent that the human brain evolved through runaway sexual selection (Miller, 1993), simulated sexual selection may help us cross the border between artificial life and artificial intelligence sometime in the future.
9 Conclusions
Natural selection is fairly good at climbing fitness peaks in adaptive landscapes representing 'economic' traits. Sexual selection through mate choice has complementary strengths: it is good at making this natural-selective hill-climbing faster and more accurate, at allowing escape from local optima, at generating courtship innovations that may prove useful as economic innovations, and at creating biodiversity and parallel niche differentiation through speciation. The two processes together yield a very powerful form of biocomputation that rapidly and efficiently explores the space of possible phenotypes, as shown by the diversity and complexity of animals and flowering plants on our planet. We are all the products not only of selection for survival, but also of selection for sexiness: dark-bright alloys forged in death and shaped by love.
10 Acknowledgments
Geoffrey Miller has been supported by NSF Research Grant INT-9203229 and NSF-NATO Post-Doctoral Grant RCD-9255323. For comments, support, advice, and/or inspiration relevant to this work, we are indebted to: Dave Cliff, Helena Cronin, Inman Harvey, Phil Husbands, Andrew Pomiankowski, Roger Shepard, and John Maynard Smith.
References
Andersson, M. (1994): Sexual selection. Princeton: Princeton U. Press.
Barth, F. G. (1991): Insects and flowers: The biology of a partnership. Princeton: Princeton U. Press.
Bateson, P. (Ed.). (1983): Mate choice. Cambridge, UK: Cambridge U. Press.
Bateson, P. (1988): The active role of behavior in evolution. In M.-W. Ho & S. W. Fox (Eds.), Evolutionary processes and metaphors (pp. 191-207). New York: John Wiley.
Bonner, J. T. (Ed.). (1982): Evolution and development. Berlin: Springer-Verlag.
Brooks, R. A., & Maes, P. (Eds.). (1994): Artificial Life IV. Cambridge, MA: MIT Press/Bradford Books.
Clarke, B. C. (1962): The evidence for apostatic selection. Heredity (London), 24, 347-352.
Cliff, D., Husbands, P., & Harvey, I. (1992): Evolving visually guided robots. In J.-A. Meyer, H. L. Roitblat, & S. W. Wilson (Eds.), From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior (pp. 374-383). Cambridge, MA: MIT Press.
Cook, L. M. (1991): Genetic and ecological diversity: The sport of nature. London: Chapman & Hall.
Cracraft, J. (1990): The origin of evolutionary novelties: Pattern and process at different hierarchical levels. In M. Nitecki (Ed.), Evolutionary innovations (pp. 21-44). Chicago: U. Chicago Press.
Cronin, H. (1991): The ant and the peacock: Altruism and sexual selection from Darwin to today. Cambridge, UK: Cambridge U. Press.
Crosby, J. L. (1970): The evolution of genetic discontinuity: Computer models of the selection of barriers to interbreeding between subspecies. Heredity, 25, 253-297.
Darwin, C. (1859): On the origin of species (1st ed.). London: John Murray.
Darwin, C. (1862): On the various contrivances by which orchids are fertilized by insects. London: John Murray.
Darwin, C. (1871): The descent of man, and selection in relation to sex. London: John Murray.
Darwin, C. (1876): The movements and habits of climbing plants (2nd ed.). New York: D. Appleton & Co.
Darwin, C. (1883): On the origin of species (6th ed.). New York: D. Appleton & Co.
Dawkins, R. (1994): The eye in a twinkling. Nature, 368, 690-691.
Dewsbury, D. A. (1981): Effects of novelty on copulatory behavior: The Coolidge Effect and related phenomena. Psychological Review, 89(3), 464-482.
Dobzhansky, T. (1937): Genetics and the origin of species. (Reprint edition 1982). New York: Columbia U. Press.
Endler, J. A. (1992): Signals, signal conditions, and the direction of evolution. American Naturalist, 139, S125-S153.
Eberhard, W. G. (1985): Sexual selection and animal genitalia. Cambridge, MA: Harvard U. Press. Eberhard, W. G. (1991): Copulatory courtship and cryptic female choice in insects. Biol. Rev., 66, 1-31. Eberhard, W. G. (1992): Species isolation, genital mechanics, and the evolution of species-specific genitalia in three species of Macrodactytus beetles (Coleoptera, Scaraceidae, Melolonthinae). Evolution, 46(6), 1774-1783. Eigen, M. (1992): Steps towards life: A perspective on evolution. Oxford: Oxford U. Press. Eldredge, N. (1985): Unfinished synthesis: Biological hierarchies and modern evolutionary thought. New York: Oxford U. Press Eldredge, N. (1986): Information, economics, and evolution. Ann. Review of Ecology and Systematics, 17, 351-369. Eldredge, N. (1989): Macroevolutionary dynamics: Species, niches, and adaptive peaks. New York: McGraw-Hill. Eldredge, N., &: Gould, S. J. (1972): Punctuated equilibria: An alternative to phyletic gradualism. In T. J. M. Schopf (Ed.), Models in paleobiology (pp. 82-115). San Francisco: Freeman, Cooper.
201
Enquist, M., & Arak, A. (1993): Selection of exaggerated male traits through female aesthetic senses. Nature, 361(6~11), 446-448. Fisher, R. A. (1915): The evolution of sexual preference. Eugenics review, 7, 184-192. Fisher, R. A. (1930): The genetical theory of natural selection. Oxford: Clarendon Press. Porrest, S. (Ed.) (1993): Proceedings of the Fifth International Conference on Genetic Algorithms. San Francisco: Morgan Kaufmann. Futuyma, D. (1986): Evolutionary biology. Sunderland, MA: Sinauer. Futuyama, D., & Slatkin, M. (Eds.). (1983): Convolution. Sunderland, MA: Sinauer. Goldschmidt, R. B. (1940): The material basis of evolution. New Haven, CT: Yale U. Press. Goodwin, B. C., Holder, N., & Wylie, C. C. (Eds.). (1983): Development and evolution. Cambridge, UK: Cambridge U. Press. Gould, S. J. (1977): Ontogeny and phylogeny. Cambridge, MA: Harvard U. Press. Guilford, T., & Dawkins, M. S. (1991): Receiver psychology and the evolution of animal signals. Animal Behaviour, 42, 1-14. Hnldane, J. B. S. (1932): The causes of evolution. London: Longman. Harvey, I., Husbands, P., & Cliff, D. (1992): Issues in evolutionary robotics. In J.-A. Meyer, H. L. Roitblat, & S. W. Wilson (Eds.), From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior (pp. 364-373). Cambridge, MA: MIT Press/Bradford Books. Harvey, I., Husbands, P., & Cliff, D. (1993): Genetic convergence in a species of evolved robot control architectures, tn S. Forrest (Ed.), Proceedings of Fifth International Conference on Genetic Algorithms. San Francisco: Morgan Kaufmann. Hinton, G. E., & Nowlan, S. J. (1987): How learning guides evolution. Complex Systems, 1, 495-502. Huxley, J. S. (1938): The present standing of the theory of sexual selection. In G. R. de Beer (Ed.), Evolution: Essays on aspects of evolutionary biology (pp. 11-42). Oxford: Clarendon Press. Huxley, J. S. (1942): Evolution: The modern synthesis. New York: Harper. 
Iwasa, Y., Pomiankowsld, A., ~z Nee, S. (1991): The evolution of costly mate preferences. II. The 'handicap' principle. Evolution, ~5(6), 1431-1442. Jablonsld, D., & Bottjer, D. J. (1990): The ecology of evolutionary innovation: The fossil record. In M. Nitecki (Ed.), Evolutionary innovations (pp. 253-288). Chicago: U. Chicago Press. Jensen, J. S. (1990): Plausibility and testability: Assessing the consequences of evolutionary innovation. In M. Nitectd (Ed.), Evolutionary innovations (pp. 171-190). Chicago: U. Chicago Press. Kauffman, S. A. (1993): Origins of order: Self-organization and selection in evolution. New York: Oxford U. Press. Kimura, M. (1983): The neutral theory of molecular evolution. In M. Nei & R. K. Koehn (Eds.), Evolution of genes and proteins, pp. 213-233. Sunderland, MA: Sinauer. Kirkpatrick, M. (1982): Sexual selection and the evolution of female choice. Evolution, 36, 1-12. Kirkpatrick, M. (1987): The evolutionary forces acting on female preferences in polygynous animals. In J. W. Bradbury & M. B. Andersson (Eds.), Sexual selection: Testing the alternatives (pp. 67-82). New York: John Wiley. Koza, J. (1993): Genetic programming. Cambridge, MA: MIT Press/Bradford Books. Lande, R. (1980): Sexual dimorphism, sexual selection and adaptation in polygenic characters. Evolution, 34, 292-305.
202
Lande, R. (1981): Models of speciation by sexual selection on polygenic characters. Proe. Nat. Acad. Sci. USA, 78, 3721-3725. Lande, R. (1987): Genetic correlation between the sexes in the evolution of sexual dimorphism and mating preferences. In J. W. Bradbury & M. B. Andersson (Eds.), Sexual selection: Testing the alternatives (pp. 83-95). New York: John Wiley. Liem, K. F. (1973): Evolutionary strategies and morphological innovations: Cichlid pharyngeal jaws. Systematic zoology, 22, 425-441. Liem, K. F. (1990): Key evolutionary innovations, differential diversity, and symecomorphosis. In M. Nitecki (Ed.), Evolutionary innovations (pp. 14%170). Chicago: U. Chicago Press. May, R. M. (1990): How many species? Phil. Trans..l~oyat Soc. London B, Biological Sciences, 330(1257), 293-304. May, R. M. (1992): How many species inhabit the earth? Scientific American, 267(4), 42-48. Maynard Smith, J. (1978): The evolution o] sex. Cambridge, UK: Cambridge U. Press. Mayr, E. (1942): Systematics and the origin of species. (Reprint edition 1982). New York: Columbia U. Press. Mayr, E. (1954): Change of genetic environment and evolution. In J. Huxley, A. C. Hardy, & E. B. Ford (Eds.), Evolution as a process (pp. 157-180). London: George Allen & Unwin. Mayr, E. (1960): The emergence of evolutionary novelties. In S. Tax (Ed.), Evolution after Darwin, Vol. I (pp. 349-380). Chicago: U. Chicago Press. Mayr, E. (1983): Animal species and evolution. Cambridge, MA: Harvard U. Press. McKinney, F. K. (1988): Multidisciplinary perspectives on evolutionary innovations. Trends in ecology and evolution, 3, 220-222. Miller, G. F. (1993): Evolution of the human brain through runaway sexual selection. Ph.D. thesis, Psychology Department, Stanford University. (To be published in 1995 by MIT Press.) Miller, G. F. (1994): Exploiting mate choice in evolutionary computation: Sexual selection as a process of search, optimization, and diversification. In T. C. 
Fogarty (Ed.), Evolutionary Computing: Proceedings of the t994 Artificial Intelligence and Simulation of Behavior (AISB) Society Workshop (pp. 65-79). Berlin: Springer-Verlag. Miller, G. F. (Accepted, a): Psychological selection in primates: The evolution of adaptive unpredictability in competition and courtship. To appear in A. Whiten & R. W. Byrne (Eds.), Machiavellian Intelligence II. Miller, G. F. (Accepted, b): Sexual selection in human evolution: Review and prospects. To appear in C. Crawford (Ed.), Evolution and human behavior: Ideas, issues~ and applications, ttillsdale, N J: Lawrence Erlbaum. Miller, G. F., & Cliff, D. (1994): Protean behavior in dynamic games: Arguments for the co-evolution of pursuit-evasion tactics in simulated robots. In D. Cliff, P. Husbands, J. A. Meyer, & S. W. Wilson (Eds.), From Animals to Animats 3: Proceedings of the Third International Conference on Simulation of Adaptive Behavior (pp. 411-420). Cambridge, MA: MIT Press/Bradford Books. Miller, G. F. & Freyd, J. J. (1993): Dynamic mental representations of animate motion: The interplay among evolutionary, cognitive, and behavioral dynamics. Cognitive Science Research Paper 290, University of Sussex. Submitted as a target article for Behavioral and Brain Sciences. Miller, G. F.~ K; Todd, P. M. (1993): Evolutionary wanderlust: Sexual selection with directional mate preferences. In J.-A. Meyer, H. L. Roitblat, & S. W. Wilson (Eds.), From Animals to Animats 2: Proceedings of the Second International Conference on
203
Simulation of Adaptive Behavior (pp. 21-30). Cambridge, MA: MIT Press/Bradford Books. Moiler, A. P., & Pomiankowski, A. (1993): Fluctuating asymmetry and sexual selection. Genetica, 89, 267-279. Morgan, C. L. (1888): Natural selection and elimination. Nature, Aug. 16, 370. Muller, G. B. (1990): Developmental mechanisms at the origin of morphological novelty: A side-effect hypothesis. In M. Nitecki (Ed.), Evolutionary innovations, pp. 99-130. Chicago: U. Chicago Press. Nitecki, M. (Ed.). (1990): Evolutionary innovations. Chicago: U. Chicago Press. O'Donald, P. (1980): Genetic models of sexual selection. Cambridge, UK: Cambridge U. Press. Paterson, H. E. H. (1985): The recognition concept of species. In E. S. Vrba (Ed.), Species and speciation, Transvaal Mus. Monogr. 4, 21-29. Patterson, B. D. (1988): Evolutionary innovations: Patterns and processes. Evolutionary trends in plants, 2, 86-87. Petrie, M. (1992): Peacocks with low mating success are more fikely to suffer predation. Animal Behaviour, 44, 585-586. Petrie, M., Halliday, T., & Sanders, C. (1991): Peahens prefer peacocks with elaborate trains. Animal Behaviour~ 41, 323-331. Pimental, D., Smith, G. J. C., & Soans, J. (1967) A population model of sympatric speciation. American Naturalist, 101(92P), 493-504. Pomiankowski, A. (1987): The costs of choice in sexual selection. J. Theoretical Biology, 128, 195-218. Pomiankowski, A. (1988): The evolution of female mate preferences for male genetic quality. Oxford Surveys in Evolutionary Biology, 5, 136-184. Pomiankowski, A. (1990): How to find the top male. Nature, 3$Z 616-617. Pomiankowski, A., Iwasa, Y., & Nee, S. (1991): The evolution of costly mate preferences. I. Fisher and biased mutation. Evolution, 45(6), 1422-1430. Raft, R. A., Par, B., Parks, A., & Wray, G. (1990): Radical evolutionary change in early development. In M. Nitecki (Ed.), Evolutionary innovations (pp. 71-98). Chicago: U. Chicago Press. Raft, R. A., & Raft, E. C. 
(Eds.): (1987): Development as an evolutionary process. New York: Alan R Liss. Romanes, G. J. (1897): Darwin, and after Darwin. IL Post-Darwinian Questions. Heredity and Utility (2nd ed.). Chicago: Open Court Pubfishing. Ryan, M. J. (1986): Neuroanatomy influences speciation rates among anurans. Proc. Nat. Acad. Sci. USA, 83, 1379-1382. Ryan, M. J. (1990): Sexual selection, sensory systems, and sensory exploitation. Oxford Surveys of Evol. Biology, 7, 156-195. Ryan, M. J., & Keddy-Hector, A. (1992): Directional patterns of female mate choice and the role of sensory biases. American Naturalist, 139, $4-$35. Schuster, P. (1988): Stationary mutant distributions and evolutionary optimization. Bull. Mathematical Biology, 50(6), 635-660. Simon, P. (1992): The action plant: Movement and nervous behaviour in plants. Cambridge, MA: Blackwell. Simpson, G. (1944): Tempo and mode in evolution. New York: Columbia U. Press. Simpson, G. (1953): The major features of evolution. New York: Columbia U. Press. Sprengel, C. K. (1793): Das entdeekte Geheimnis der Natur im Bau und in der Befruchtung der Blumen. (The secret of nature revealed in the structure and pollination of flowers.) Berlin: F. Vieweg. (Reprinted 1972 by J. Cramer, Lehre.)
204 Szalay, F. S. & Costello, R. K. (1991): "Evolution of permanent estrus displays in hominids." J. Human Evolution, 20, 439-464. Sullivan, B. K. (1989): Passive and active female choice: A comment. Animal Behaviour, 37(g), 692-694. Thoday, J. M. (1972): Disruptive selection. Proc. of the Royal Soc. of London B, 182, 109-143. Todd, P. M. (in press): Sexual selection and the evolution of learning. To appear in R. Belew & M. Mitchell (Eds.), Adaptive individuals in evolving populations: Models and algorithms. Reading, MA: Addison-Wesley. Todd, P. M., & Miller, G. F. (1991): On the sympatric origin of species: Mercurial mating in the Quicksilver Model. In R. K. Belew & L. B. Booker (Eds.), Proceedings of the Fourth International Conference on Genetic Algorithms (pp. 547-554). San Mateo, CA: Morgan Kaufmann. Todd, P. M. & Miller, G. F. (1993): Parental guidance suggested: How parental imprinting evolves through sexual selection as an adaptive learning mechanism. Adaptive Behavior, 2(1), 5-47. Todd, P. M. & Miller, G. F. (in preparation): The role of mate choice in biocomputation H: Applications of sexual selection in search and optimization Todd, P. M., & Wilson, S. W. (1993): Environment structure and adaptive behavior from the ground up. In J.-A. Meyer, H. L. Roitblat, & S. W. Wilson (Eds.), From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior (pp. 11-20). Cambridge, MA: MIT Press/Bradford Books. Weismann, A. (1917): The selection theory. In Evolution in modern thought, by Haeckel, Thomson, Weismann, and Others (pp. 23-86)~ New York: Boni and Liveright. Williams, G. C. (1975): Sex and evolution. Princeton: Princeton U. Press. Willson, M. F., and Burley, N. (1983): Mate choice in plants: Tactics, mechanisms, and consequences. Princeton: Princeton U. Press. Wright, S. (1932): The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proc. Sixth Int. Congr. Genetics, 1, 356-366. Wright, S. 
(1982): Character change, speciation, and the higher taxa. Evolution, 36, 427-443. Wyles, J. S., Kunkel, J. G., & Wilson, A. C. (1983): Birds, behavior, and anatomical evolution. Proc. Nat. Acad~ Sci. USA, 80, 4394-4397. Van Valen, L. M. (1971): Adaptive zones and the orders of mammals. Evolution, 16, 125-142. Vrb~, E. S. (1983): Macroevolutionary trends: New perspectives on the roles of adaptation and incidental effect. Science, 221,387-389. Vrba, E, S. (1985): Environment and evolution: Alternative causes of the temporal distribution of evolutionary events. South African Journal of Science, 81,, 229-236. Vrba, E. S., & Gould, S. J. (1986): The hierarchical expansion of sorting and selection: Sorting and selection cannot be equated. Paleobiology, 12, 217-228. Zahavi, A. (1975): Mate selection: A selection for a handicap. Journal of Theoretical Biology, 53, 205-214.
Genome Growth and the Evolution of the Genotype-Phenotype Map

Lee Altenberg
Institute of Statistics and Decision Sciences, Duke University, Durham, NC 27708-0251 U.S.A.

The evolution of new genes is distinct from evolution through allelic substitution in that new genes bring with them new degrees of freedom for genetic variability. Selection in the evolution of new genes can therefore act to sculpt the dimensions of variability in the genome. This "constructional" selection effect is an evolutionary mechanism, in addition to genetic modification, that can affect the variational properties of the genome and its evolvability. One consequence is a form of genic selection: genes with large potential for generating new useful genes when duplicated ought to proliferate in the genome, rendering it ever more capable of generating adaptive variants. A second consequence is that alleles of new genes whose creation produced a selective advantage may be more likely to also produce a selective advantage, provided that gene creation and allelic variation have correlated phenotypic effects. A fitness distribution model is analyzed which demonstrates these two effects quantitatively.

These are effects that select on the nature of the genotype-phenotype map. New genes that perturb numerous functions under stabilizing selection, i.e. with high pleiotropy, are unlikely to be advantageous. Therefore, genes coming into the genome ought to exhibit low pleiotropy during their creation. If subsequent offspring genes also have low pleiotropy, then genic selection can occur. If subsequent allelic variation also has low pleiotropy, then that too should have a higher chance of not being deleterious. The effects on pleiotropy are illustrated with two model genotype-phenotype maps: Wagner's linear quantitative-genetic model with Gaussian selection, and Kauffman's "NK" adaptive landscape model.

Constructional selection is compared with other processes and ideas about the evolution of constraints, evolvability, and the genotype-phenotype map. Empirical phenomena such as dissociability in development, morphological integration, and exon shuffling are discussed in the context of this evolutionary process.

1 Introduction
In this chapter I discuss an evolutionary mechanism whose target is specifically the ability of genomes to generate adaptive variants. It is about the evolution of evolvability. The main focus of action for this process is the genotype-phenotype map (Wagner 1984, 1989), i.e. the way genetic variation maps to phenotypic variation. The genotype-phenotype map is the concept underpinning the classical concepts of pleiotropy, polygeny, epistasis, constraints, and gradualness.
The way that genetic variation maps to phenotypic variation is fundamental to whether or not that variation has the possibility of producing adaptive change. Even when strong opportunity exists for new adaptations in an organism, many of its previously evolved functions will remain under stabilizing selection. Adaptation requires variation that is able to move the organismal phenotype toward traits under directional selection without greatly disturbing traits remaining under stabilizing selection. Variation that disturbs existing adaptations as it produces new adaptations, i.e. variation which is pleiotropic, will have difficulty producing an overall fitness advantage. Other aspects of the genotype-phenotype map that affect evolvability include:

- Gradualness: genetic changes with extreme effects are less likely to be advantageous;
- Rugged landscapes: adaptive changes that require the simultaneous altering of several genes are less likely to evolve; and
- Constraints: adaptations for which no genetic variability exists are unable to evolve.
The question of whether the genotype-phenotype map has evolved so as to systematically affect evolvability has been dealt with in a variety of ways in the literature. Approaches include the following:

- The genome as fluid: Evolvability is not limited; genetic variation exists within populations for any trait one wishes to select on.
- The internalist view: The degree of evolvability is a byproduct of the physics of development. It is fortunate that physics permitted evolvability.
- Lineage selection: Different developmental systems may have different evolvabilities; those which happen to have high evolvability will proliferate as species lineages.
- Genetic modification: Selection for adaptedness happens to systematically produce high evolvability.

This paper adds an additional hypothesis to this list:

- Constructional selection: Selection during the origin of genes provides a filter on the construction of the genotype-phenotype map that naturally produces evolvability.

The internalist viewpoint is what this paper will take issue with most. The internalist viewpoint holds that the variational properties of the genotype-phenotype map are the result of the physics of development (Goodwin 1989). The process of morphogenesis is proposed as a complex dynamical system toward which genes contribute, but which has internal macroscopic properties that determine what kinds of phenotypic variability exist. One can ask, however, whether morphogenetic dynamics could have been shaped by evolutionary forces that systematically affect the nature of developmental constraints, or the smoothness of the adaptive landscape, or its evolvability. Here I discuss an evolutionary mechanism by which selection can come
to act indirectly on evolutionary potential, as a consequence of how genes come into being in the first place.

The main idea, in a nutshell, is this: the genes that stably exist in a genome share the common feature that, when they were created, they produced a selective advantage for the organism. But when a new gene is created, it not only produces its current phenotypic effect, but carries with it a new "neighborhood" in "sequence space": the kinds of variants that it can in turn give rise to. The phenotypic character of this neighborhood depends on the gene's mode of action. Different modes of gene action can be expected to have different overall likelihoods of producing adaptive variants. The fact that a gene's existence is predicated on its having originally produced a selective advantage means that the accumulation of new genes in the genome should be biased toward modes of action whose variants are more likely to be fruitful in adaptation. Since there is a diversity of modes of gene action, the question remains as to why there are the kinds there are, in the frequencies they are found, within the genomes of organisms. This chapter presents a theory about the statistical properties of genotype-phenotype maps, and how these statistics would be expected to change in the course of the evolutionary construction of the genome toward ways that facilitate the generation of adaptive variants.

There are two basic aspects to the idea of a genotype-phenotype map. One can think of the genotype as a "representation" or description of the phenotype. Representation has two aspects: generative and variational. The generative aspect of a representation is how the representation is actually used to produce the object, which in genetics would be the process of gene expression and its integration in development.
It is not the mechanisms by which this map is accomplished that are relevant to evolvability; rather, what matters is the variational aspect of a representation: how changes in the representation map to changes in the object. Variational aspects can be described by their statistical properties without having to deal with the generative mechanisms. The principal variational aspect I will be concerned with is pleiotropy, the constellation of phenotypic effects from single mutations.
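As a purely illustrative sketch of what a variational description looks like, the fragment below represents a genotype-phenotype map as a small linear effect matrix and reads the pleiotropy of each gene directly off it. The matrix values, the names `EFFECTS`, `pleiotropy`, and `phenotypic_change`, and the choice of a linear map are all assumptions made for this example; they are not taken from the chapter's models.

```python
# Hypothetical linear genotype-phenotype map (illustrative numbers only).
# Rows are organismal functions, columns are genes; EFFECTS[i][j] is the
# change in function i caused by a unit change at gene j.
EFFECTS = [
    [0.9, 0.0, 0.3],
    [0.0, 0.5, 0.2],
    [0.0, 0.0, 0.4],
    [0.0, 0.0, 0.1],
]


def pleiotropy(effects, gene, threshold=0.0):
    """Count the functions perturbed by a mutation at `gene`."""
    return sum(1 for row in effects if abs(row[gene]) > threshold)


def phenotypic_change(effects, mutation):
    """Map a vector of genetic changes to the change in each function."""
    return [sum(e * m for e, m in zip(row, mutation)) for row in effects]


if __name__ == "__main__":
    for gene in range(3):
        print("gene", gene, "affects", pleiotropy(EFFECTS, gene), "functions")
```

Nothing in this description says how the effects are generated in development; it records only how genetic changes map to phenotypic changes, which is the variational aspect discussed above.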
1.1 Bonner's Low Pleiotropy Principle
Bonner (1974) has articulated a basic "design principle" for the genotype-phenotype map necessary to allow the generation of adaptive variants through random genetic variation, a principle of low pleiotropy: We presume that it is of a distinct advantage to keep a number of the units of gene action of the organism quite independent of one another. The reason for this seems straightforward: mutations that affect a number of construction units are more likely to be lethal than those that affect only one. Or to put it another way, the fewer the interconnections of gene action (the less the pleiotropy), the greater the chances of its being a viable mutant. A viable mutant may be one that appears late in development, such as the pigmentation of hair, eyes, or feathers, or
one that acts in a small developmental unit that is independent of the others. (1974, p. 61)

Lewontin (1978) proposed the low pleiotropy principle in a somewhat different manner, as a principle of "quasi-independence", i.e. that there must be "a great variety of alternative paths by which a given characteristic may change, so that some of them will allow selection to act on the characteristic without altering other characteristics of the organism in a countervailing fashion: pleiotropic and allometric relations must be changeable."

However, this design principle suffers from the "for the good of the species" problem. Even though a property might be "good for the species", it can only evolve if organisms bearing it (or "replicators", to be more general (Brandon 1990)) have higher fitness. Although it would be a marvelous design for the organism to have a genome organized for its future adaptive potential, this future advantage does not give an organism the present advantage it needs in order to pass on such a trait.
2 Constructional Selection
All variational aspects of the genotype-phenotype map face the "good of the species" problem, because variation is not the phenotype of an organism, but a property of genetic transmission between organisms. How, therefore, can organismal selection get a "handle" on the processes that produce variation? The general answer to this question is that there must be correlations between variational properties and properties affecting organismal fitness. These correlations can come about through diverse means. In the case of variational properties like recombination and mutation rates, correlations can be induced by the evolutionary dynamics of modifier genes, i.e. genes that control recombination, mutation, and so forth. Genes modifying recombination rates, for example, can evolve linkage associations to genes under selection whose transmission they affect. In this case, it is the modifier gene that provides natural selection with the "handle" to change recombination rates (Liberman and Feldman 1986, Altenberg and Feldman 1987).

Modifier genes are rather specialized mechanisms. But here I consider a means by which selection can gain a handle on the variational properties of any gene, through the selective forces operating during the origin of the gene. All genes face the problem of selection during their creation, and those genes that produce a selective disadvantage never become stably incorporated in the genome. Therefore, existing genes share the common history of having once produced a selective advantage to the organism. But new genes bring with them new degrees of freedom for variability in the genome. These new degrees of freedom are of two types:

- Type I: new genes serve as new templates for further genome growth, and
- Type II: new genes afford new sites at which allelic variation can occur.
The phenotypic effects of either of these new degrees of freedom depend on the physical nature of the gene's action. And the gene's mechanism of action is unlikely to change radically between its creation and subsequent gene duplications and allelic variations. Therefore it is reasonable to expect a correlation to exist between the phenotypic or fitness effects of a newly created gene and subsequent duplications and allelic changes. This then is a means by which variational properties of the genome can become correlated with organismal selection. Therefore, without the postulation of additional modifier genes, selection during the creation of new degrees of freedom for genetic variability can gain a handle on the quality of those degrees of freedom. The strength of this handle depends on the strength of the correlations. When referring to this process, I will summarize it with the term "constructional selection", since it is tied to the construction of new genes (Altenberg 1985).

2.1 Riedl's Theory
Riedl's (1977) theory for the evolution of "genome systemization" is the main earlier example of a constructional selection theory for the genotype-phenotype map. He considers the situation where functional interactions arise in the organism that require the coordinated change of several phenotypic characters in order to produce adaptive variants. When this would require simultaneous mutations at several genes, he argues that the evolution of a new gene that produces the needed coordinated variability, a "superimposed genetic unit", is a far more likely possibility. Thus Riedl is proposing that the genotype-phenotype map can evolve in directions that facilitate adaptation through selective genome growth.

2.2 Fine Points
It is important at this point to be clear that this is not an argument that most adaptive evolution happens through the origin of new genes, as opposed to allelic substitution. Rather, I am proposing that the events surrounding the creation of new genes may play a special role in the evolution of the genotype-phenotype map because of their distinct property of adding new degrees of freedom to the genome. Also, it should be understood that "new genes" can refer equally to new parts of genes or new clusters of genes, i.e. new sections of DNA sequence that are of functional use to the organism. Therefore, the arguments here apply to such elements as exons, promoters, enhancers, operators, other regulatory elements, etc.

Throughout this chapter, pleiotropy must be understood to refer not to multiple effects on arbitrary "characters" of the organism, since these are artifacts of measurement and description, but to organismal functions that are components of adaptation, what Nemeschkal et al. (1992) refer to as a "unit of characters working together to accomplish a common biological role". Moreover, in the case of new genes, the definition of "multiple" effects that is germane as a definition
of pleiotropy is when the gene not only produces variability for functions under directional selection, but also disturbs functions under stabilizing selection. "Low pleiotropy" will refer to genes that affect mainly functions under directional selection and leave functions under stabilizing selection unaffected.
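To make this definition concrete, here is a minimal Monte Carlo sketch of why a gene that disturbs many functions under stabilizing selection is less likely to be advantageous. It is a toy calculation under stated assumptions, not the fitness distribution model analyzed in this chapter: the gene is assumed to add a fixed amount `gain` to log-fitness through one function under directional selection, and each disturbed function under stabilizing selection is assumed to cost d**2 / 2, with d drawn from a normal distribution. The function name and all parameter values are hypothetical.

```python
import random


def prob_advantageous(pleiotropy, trials=20000, gain=1.0, effect_sd=1.0, seed=0):
    """Estimate the chance that a new gene raises log-fitness.

    Toy assumptions (illustrative only):
      * the gene improves one function under directional selection,
        adding `gain` to log-fitness;
      * it perturbs `pleiotropy` functions under stabilizing selection,
        each perturbation d ~ Normal(0, effect_sd) costing d**2 / 2.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        cost = sum(rng.gauss(0.0, effect_sd) ** 2 / 2 for _ in range(pleiotropy))
        if gain - cost > 0:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Print the estimated probability for increasing degrees of pleiotropy.
    for k in (0, 1, 2, 4, 8):
        print(k, round(prob_advantageous(k), 3))
```

With no disturbed functions the gene is advantageous in every trial by construction, and under these assumptions the estimated probability falls as the number of disturbed functions grows. The quadratic cost is only one convenient stand-in for stabilizing selection; other penalty shapes would change the numbers but not the direction of the effect.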
2.3 Pleiotropy and Constructional Selection
Let us examine Bonner's low pleiotropy principle in the context of the genome growth process. New genes which have fewer pleiotropic effects when added to the genome, whose action causes the phenotype to change mainly in dimensions that are under directional selection, stand a better chance, by Bonner's principle, of providing a selective advantage. This would hold even if that chance is still slight. Genes which disturb many adapted functions of the phenotype are unlikely to be advantageous, and thus would not be incorporated in the genome. Therefore, selection can filter the pleiotropy of genes as they are added to the genome. If there is any correlation between the pleiotropic effects during the gene's addition and the pleiotropy of subsequent additions or allelic changes in the gene, then the genome shall have expanded its degrees of freedom in directions with lower pleiotropy. The effects of constructional selection on the two forms of genetic variation, Type I and II above, are distinct, so each is taken up in turn.

2.4 Type I Effect: The Genome as Population
If there are correlations between the phenotypic effects of duplicated genes and the effects of their subsequent duplications during macroevolutionary time scales, then a novel form of "genic" selection process becomes possible. This selection process is based on looking at the genome as a "population" of genes, as in the case of genic selection in the evolution of transposable elements. The idea that transposable elements are genetic parasites propagating within the genome (Cavalier-Smith 1977, Doolittle and Sapienza 1980, Orgel and Crick 1980) led to the idea that the genome could be considered a population of genes, within which a new level of selection can operate when certain sequences can proliferate within the genome. Such "genic" selection is usually associated with transposable elements, whose activity is generally in conflict with organismal selection. The Type I effect, however, is a form of genic selection in harmony with organismal selection, which, moreover, has organismal selection as a sub-process.

Where do new genes come from? Although there is a certain amount of de novo synthesis of DNA in the genome, most genes originate from template-based duplication of existing sequences. And while the vast majority of gene duplications may go to extinction, the genes currently functioning in an organism will possess an unbroken backward genealogy to earlier, ancestral genes (complicated perhaps by the occasional reactivation or insertion of pseudogene sequences). So there exists an "intra-genomic phylogeny", which is actually beginning to be taken as an object of study as the accumulation of DNA sequences allows the construction of "gene-trees" (Dorit and Gilbert 1990, Dorit et al. 1991, Strong
and Gutman 1992, Burt and Paton 1992, Klenova et al. 1992, Streydio et al. 1992, Haefliger et al. 1989). If one picks any functioning gene in the genome, what would a typical story for its origin be? One could generally list:

1. Sequence duplication;
2. Fixation in the population, through selection or drift;
3. Maintenance of function by selection;
4. Sequence evolution under mutation and selection.
Differences in gene properties that systematically bias the chances of the above events can produce a Darwinian process on the level of genome-as-population. Darwinian processes have three basic elements: viability, fecundity, and heritability. If there exist properties which show heritable variation in viability or fecundity, those properties can evolve over time. Viability, fecundity, and heritability each have their analogs on the level of genome-as-population:
Viability: The viability of a gene is simply its survival as a functioning gene in the genome. This requires its maintenance against mutational degradation, or replacement with other genes, and would occur through organismal selection against deletions or gene silencing mutations.
Fecundity: The fecundity of a gene is the rate at which it gives rise to other functional genes in the genome. This depends on: 1. The overall rate that duplications of the gene are produced; and 2. The probability that a duplication becomes established in the genome as a new, functional gene. This in turn depends on: (a) There being adaptive opportunity for properties of the sequence; (b) the sequence having functional properties which are not disrupted by new functional contexts; and (c) the sequence having properties that allow its duplication without disrupting existing functions of genes with which it interacts.
Heritability: Heritability here refers to ancestral and offspring genes having correlated properties, and depends on: 1. Conservation of the property of a gene over the time scale on which gene duplications occur; and 2. Carry-over of the property from ancestral to offspring genes.
In each case above, one could just as well substitute "genetic element" for "gene", since the principles apply equally well to exons, promoters, regulatory sequences, and so forth. If there are systematic differences between sequences in the likelihood that duplications of them give rise to useful new genes (fecundity), and these different
likelihoods are conserved between gene origins, and carried from ancestral to offspring genes (heritability), then the genome will become populated with genes that are better able to give rise to other genes. The type I, or "genic selection" effect of constructional selection, therefore, is to increase the genome's ability to evolve new genes. This is an effect on the variational properties of the genome. The genome-as-population analogs of viability, fecundity and heritability in the type I effect can be contrasted with these analogs in the case of transposable elements. For such "selfish" DNA, viability as genes is low: on a macroevolutionary time scale, individual copies of transposons are transient, since they exist either as transient allelic polymorphisms or, if they ever go to fixation, are deleted or silenced rapidly because as alleles they are usually neutral or deleterious, and genetically unstable. The fecundity of transposons in the genome, however, is unsurpassed, and overcomes their sub-viability in the genome as individual copies. Their fecundity is due not to their probability of being useful to the organism (item 2 under Fecundity, above), but to the sheer rate at which copies are produced (item 1 under Fecundity, above). Furthermore, their heritability as genes is extremely high. Thus the type I effect of constructional selection and "selfish DNA" are two kinds of genic selection, and are in a sense opposite points within a continuum defined by the genome-as-population analogs of the Darwinian elements, viability, fecundity, and heritability.
2.5 Type II Effect: Correlated Allelic Variation
The type II effect is where the genes that are stably incorporated into the genome also have an enhanced likelihood that some of their allelic variants will also produce a selective advantage, by varying the phenotype along the same "lines" as occurred during the gene's original incorporation in the genome. By "enhanced", I mean relative to the effects of allelic variation at all the genes that were generated by duplication processes, but never fixed in the population and maintained by selection. If the pleiotropy of a gene is a relatively fixed result of its mode of action, then there will be a correlation between the phenotypic effects of the gene's origin and its subsequent allelic variation. If low pleiotropy helped the gene become established in the first place, then the subsequent low pleiotropy of its allelic variants would enhance their likelihood of being adaptive rather than universally deleterious. An important case of the correlated allelic variation effect is "function splitting", where a gene that has been selected as a compromise for carrying out several organismal functions is duplicated and the separate copies can evolve to specialize in some subset of functions. An example is the duplication of the hemoglobin gene and its specialization for fetal or postnatal oxygen transport conditions. In this case, the duplication causes changes in the genotype-phenotype maps of both resulting genes, with the net result of lowering the pleiotropy of allelic variation at these genes, and better optimization of the adaptive functions. This is an area which has already received a good deal of empirical and theoretical study (Ohta 1988, 1991, Kappen et al. 1989, Li 1985).
The type II effect is entirely dependent on there being correlations between the phenotypic effects of a new gene and the effects of allelic variation at that locus. For genes of recent origin, correlations would be expected. However, over time these correlations would be expected to weaken due to several factors. First, substantial sequence changes may occur as the gene diverges in function from that of its ancestral state. Second, whatever novel advantage the gene may have offered when it first arose will tend to change from being a "luxury" to being a necessity, as other functions evolve conditioned on the current state of that gene. This is what Riedl (1977) calls "burden" (and what Wimsatt and Schank call "generative entrenchment" (Schank and Wimsatt 1987, Wimsatt and Schank 1988)). Histones, polymerases, snRPs, etc., are extreme examples of burdened genes, since effectively all characters of the organism depend on them; their mutations are of necessity highly pleiotropic, and they are extremely well conserved. So over macroevolutionary time scales, the correlated allelic variation effect may become "stale" once a gene is in place. The low pleiotropy might be kept "fresh", however, if changing selection or polymorphism produces a history of variation in the gene to which other genes coadapt.

2.6 An Overall Picture of Genome Growth
These considerations lead to the following picture of the intra-genomic phylogeny: There should be a static core of genes which have ceased to give rise to new genes in the genome; these may be extremely ancient and functionally burdened, or so highly specialized as to have little adaptive potential for duplications. Once genes enter this core, they should tend to remain there (though they may continue sequence evolution). There should in addition be a "growth front" in the genome consisting of genes that are prolific in generating offspring genes. The growth front would gradually lose genes to the static core once they were created, but would be renewed by the influx of newly created genes, which would be the most likely to give rise to the next set of new genes. On occasion, static genes would be revived into the growth front by new adaptive opportunities conferred by changes in organismal selection. In addition, there would be the various "exceptional" families of genes, including transposable elements, highly repetitive genes selected for quantity production, "junk" and structural DNA, and so forth.

2.7 Constraints and Latent Directional Selection
An examination of the situations discussed in the literature in which the genotype-phenotype map constrains evolution shows them to be of two basic kinds: kinetic and range constraints. A range constraint is simply where no genetic variation exists for a phenotype or a specific combination of phenotypic changes. Kinetic constraints emerge from the population genetic dynamics when the probability of creating given phenotypic variants is vanishingly low. A softer version of this is a kinetic bias, in which the most probable variant that responds to a selective pressure has specific phenotypic forms. The problem of adaptation on "rugged
fitness landscapes" (Kauffman 1989a) is an example of kinetic constraints, in that what keeps a population at a local fitness peak is the improbability of generating fitter variants (in fact it is transmission probabilities that define what a neighborhood is in the sequence space). This includes the situation considered by Riedl (1977), where mutations are needed at several loci to produce a given phenotype. The general consequence of either range or kinetic constraints is that to varying extents, organisms will be suboptimally adapted. There may be phenotypes that would be more adapted if only the genome could produce them. The population may have reached a mutation-selection balance, in which new variants are all deleterious, and so appear to be at an adaptive peak, when the lack of fitter variants is due to kinetic or range constraints. In such cases one could say that there exists a "latent" directional selection, which would become visible if genetic variation existed in this direction. Riedl's idea is that much of the adaptive opportunity for the evolution of new genes may come from latent directional selection. But constructional selection effects would apply to conditions of normal directional selection as well. There would be adaptive opportunity for any new gene whose effects on the phenotype were in the direction of the current directional selection on the organism. Therefore, genes may to some extent reflect the historical sequence of directional selection experienced by the organism's lineage. Even ancient and highly functionally burdened genes may reveal the functions they conferred in their origin. For example, homeotic mutations which change insect segment identity are universally deleterious. 
But if an alteration of segment identity was what the gene did when it was created (and thereby presupposed to have been selectively advantageous), then the gene's current function may be a reflection of the directional selection that existed at the time of its origin.
2.8 Models Illustrating Constructional Selection
To give explicit mathematical form to the ideas sketched so far about genome growth, several models will be developed. The first is a simple model showing both type I and II effects, which uses probability distributions of fitness effects for gene additions and subsequent allelic variation. The analysis shows the exponential quality of the genic selection effect, and the dependence on correlations in the correlated allelic variation effect. The second and third models are further illustrations of the correlated allelic variation effect, using as concrete examples of genotype-phenotype map functions: 1. Wagner's linear quantitative-genetic model with Gaussian stabilizing selection (Wagner 1989); and 2. Kauffman's (1989a) epistatic "NK" adaptive landscape model. The linear model illustrates latent directional selection arising from constraints on the range of phenotypic variation produced by the genotype, and exhibits selection for new genes that overcome these range constraints. The NK model
illustrates latent directional selection arising from kinetic constraints due to the ruggedness of the adaptive landscape, and exhibits selection for genes that overcome the kinetic constraints and produce smoother adaptive landscapes. The Discussion follows, with an overview of the results, an examination of relevant empirical phenomena, and a discussion of the relation of constructional selection to current thinking about the evolution of evolvability.

3 A Fitness Distribution Model
The effects of constructional selection can be described directly in terms of the fitness distributions of new mutations, without having to specify the genotype-phenotype maps that give rise to these distributions. In the case of the genic selection effect, the mutation is a gene duplication; in the case of the correlated allelic variation effect, the mutation is an allelic change. In this model, a new gene is randomly created from the existing genes in the genome. Selection then determines whether the gene is kept in the genome. The model considers what happens when either allelic mutations or subsequent gene duplications occur. The genes in the population come in different types that determine the fitness distribution of their mutations. The main elements in the model are as follows. Let:
- $\mathcal{G}$ be the space of different types;
- $p_i$ be the probability that a newly created gene is of type $i \in \mathcal{G}$;
- $w$ be the fitness of the genome with the new gene, relative to its value before the addition;
- $f_i(w)$ be the probability density that a new gene of type $i$ has relative fitness $w$;
- $x_i$ be the probability that a new gene of type $i$ is kept in the genome by selection.
The probability density $f_i(w)$ would be the result of the phenotypic properties of the gene, as described in item 2 under Fecundity in Sect. 2.4, including its pleiotropy, modularity, and adaptive opportunity. A concrete illustration is developed in Sect. 5, on Kauffman's NK adaptive landscapes. In a simple-minded approach, a gene would be kept by selection if it increased fitness, i.e. if $w > 1$. Then the probability that the gene is kept is
$$x_i = \int_1^\infty f_i(w)\,dw .$$
But in finite populations, or in any population dynamics where there is a chance that a gene will not be passed down to any offspring, even a gene increasing fitness can sometimes be lost from the population. The probability that a new gene is successfully incorporated in the genome will be some increasing function $\phi$ of its fitness $w$. Classical results using branching process models or diffusion approximations give a success probability of 0 if $w < 1$, and $\phi(w) \approx 2(w - 1)$ for $w \gtrsim 1$ (Haldane 1927, Crow and Kimura 1970). So a more general formula for the likelihood that a new gene of type $i$ is fixed is:
$$x_i = \int_0^\infty \phi(w)\, f_i(w)\,dw . \tag{1}$$
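As an illustration of how the fixation probability in (1) might be evaluated, the following minimal sketch integrates an incorporation probability against a fitness density numerically. The normal shape of the density, its parameters, the integration range, and the cap of the Haldane approximation at 1 are all assumptions made for this example; they are not part of the model above.

```python
import math

def haldane_phi(w):
    """Approximate incorporation probability: 0 for w <= 1, about 2(w - 1) above 1.
    The cap at 1 is an assumption of this sketch so the value stays a probability."""
    return 0.0 if w <= 1.0 else min(1.0, 2.0 * (w - 1.0))

def normal_pdf(w, mean, sd):
    """Hypothetical fitness density f_i(w); the normal shape is assumed for illustration."""
    z = (w - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fixation_probability(f, phi=haldane_phi, lo=0.0, hi=3.0, steps=30000):
    """Midpoint-rule approximation of x_i = integral of phi(w) * f(w) dw over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        w = lo + (k + 0.5) * h
        total += phi(w) * f(w)
    return total * h

# Two hypothetical gene types. Most new copies of either type are deleterious
# (mean relative fitness below 1), but type "b" has a smaller typical cost.
x_a = fixation_probability(lambda w: normal_pdf(w, 0.90, 0.05))
x_b = fixation_probability(lambda w: normal_pdf(w, 0.98, 0.05))
print(x_a, x_b)
```

Under these assumed densities the second type is fixed far more often than the first, even though both are deleterious on average, because only the upper tail of each density contributes to the integral.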
The fixation probability averaged over all randomly created new genes is:
$$\bar{x} = \sum_{i \in \mathcal{G}} x_i\, p_i .$$
With these definitions, results for both the genic selection effect and the correlated allelic variation effect will be derived.

3.1 The Correlated Allelic Variation Effect
Here we will see how selection on the creation of new genes can cause subsequent allelic variation of the genes to be more likely to be adaptive. We will look at the fitness distributions of alleles from all new genes and from only those genes that selection stably incorporates into the genome. Suppose that a newly created gene of type $i$ gives rise to allelic variants. Let the allelic fitnesses, $w'$, be distributed with probability density $a_i(w')$. No assumptions need to be made about this density, so it would certainly include the biologically plausible case in which most of the alleles are deleterious. For a gene of type $i$, we see that the proportion
$$A_i(w) = \int_w^\infty a_i(y)\,dy$$
of its alleles are fitter than $w$.

Result 1 (Correlated allelic variation). Let $A(w)$ be the proportion of new alleles of randomly created genes that are fitter than $w$, and $A^*(w)$ be the proportion of new alleles of stably incorporated genes that are fitter than $w$. Then
$$A^*(w) = A(w) + \operatorname{Cov}[A_i(w),\, x_i/\bar{x}] . \tag{2}$$

Proof. The proportion of alleles that are fitter than $w$, among randomly created genes, is
$$A(w) = \sum_{i \in \mathcal{G}} A_i(w)\, p_i ,$$
while among genes that are stably incorporated in the genome it is
$$\begin{aligned}
A^*(w) &= \Pr[w' > w \mid \text{the gene was fixed}] \\
&= \Pr[w' > w \text{ and the gene was fixed}] \,/\, \Pr[\text{the gene was fixed}] \\
&= \sum_{i \in \mathcal{G}} A_i(w)\, x_i\, p_i \,/\, \bar{x} = A(w) + \operatorname{Cov}[A_i(w),\, x_i/\bar{x}] . \qquad \blacksquare
\end{aligned}$$
The last step uses the fact that $x_i/\bar{x}$ has mean 1 under the weights $p_i$.
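The identity in Result 1 can be checked on a small discrete example. The sketch below uses three hypothetical gene types with made-up values of the type probabilities, fixation probabilities, and allelic tail proportions at one fixed threshold; none of these numbers come from the text.

```python
# Hypothetical gene types: (p_i, x_i, A_i(w)) at one fixed fitness threshold w.
types = [
    (0.7, 0.001, 0.01),
    (0.2, 0.010, 0.05),
    (0.1, 0.050, 0.20),
]

# Mean fixation probability over newly created genes.
x_bar = sum(p * x for p, x, _ in types)

# A(w): proportion of alleles fitter than w among all newly created genes.
A_all = sum(p * A for p, _, A in types)

# A*(w): the same proportion among genes that were stably incorporated.
A_fixed = sum(p * x * A for p, x, A in types) / x_bar

# Cov[A_i(w), x_i / x_bar] under the weights p_i; E[x_i / x_bar] equals 1.
cov = sum(p * A * (x / x_bar) for p, x, A in types) - A_all * 1.0

print(A_all, A_fixed, A_all + cov)
```

In this example the types that are fixed more often also have more fit alleles, so the covariance is positive and the incorporated genes carry a larger share of advantageous alleles than random new genes do.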
If there is a positive correlation between the fixation probability
$$x_i = \int_0^\infty \phi(w)\, f_i(w)\,dw$$
of a new gene (with $\phi(w)$ the probability that a gene of relative fitness $w$ is incorporated), and the fitness distribution
$$A_i(w) = \int_w^\infty a_i(y)\,dy$$
of its alleles, then $A^*(w)$ is greater than $A(w)$. Similarity between the functions $f_i(w)$ and $a_i(w)$ would produce a positive covariance. The biological foundation for a positive covariance would include: 1. there continuing to be adaptive opportunity for variation in the phenotype controlled by the gene, and 2. the same suite of phenotypic characters being affected by the alleles of the gene as were affected during the gene's origin. With these plausible and general provisions, we see how selection on new genes can also select on the fitness distributions of the alleles that these genes generate.
3.2 The Genic Selection Effect
Now we will see how selection on new genes can increase the chance that new genes are adaptive when created. We will examine how genes with a higher chance of producing adaptive variants tend to proliferate as the genome grows, as reflected in the evolution of $p_i$. The model I am considering is this: genes are randomly picked from the genome and copied. Their fitness effect determines whether they are stably incorporated in the genome. If they are, then the pool of genes subject to duplication is increased by one, and the process repeated. In this way genes of different types come to proliferate at different rates within the genome. Consider the process of sequence duplication that is the starting point for the history of every gene (or part of a gene). One can think of the rate that a gene gives rise to new, successfully incorporated genes as its "constructional fitness". This will be the product of 1. the rate that copies of the gene are produced, and 2. the likelihood that they are fixed in the genome by having provided a selective advantage to the organism. While genetic elements such as transposons or highly repetitive sequences may proliferate because of factor 1, here I wish to consider only factor 2, and assume no systematic differences among sequences in the rate that gene copies are produced.
Perfect Transmission of the Gene's Type. I suppose for now that copies of genes of type $i$ are also of type $i$. Because the gene's type is transmitted from a gene to its offspring genes, this provides a correlation between the fitness effects of a new gene and its subsequent duplications. As in (1), a new gene of type $i$ will have probability $x_i$ of fixation due to its yielding a selective advantage. Let $n_i(t)$ be the number of genes in the genome of type $i$ at time $t$, $N(t) = \sum_{i \in \mathcal{G}} n_i(t)$ be the total number of genes in the genome at time $t$, so that the frequency of genes of type $i$ is $p_i(t) = n_i(t)/N(t)$, and $\alpha$ be the rate each gene is duplicated per unit time. One then obtains this differential equation for the change in the composition of the genome (approximating the number of genes with a continuum), using the fixation probability, $x_i$, for new genes of type $i$:
$$\frac{d}{dt} n_i(t) = \alpha\, x_i\, n_i(t) ,$$
which has solution:
$$n_i(t) = e^{\alpha x_i t}\, n_i(0) .$$
The ratio between the frequencies in the genome of sequences with different constructional fitnesses grows exponentially with the degree of difference between them:
$$\frac{n_i(t)}{n_j(t)} = e^{\alpha (x_i - x_j) t}\, \frac{n_i(0)}{n_j(0)} .$$
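Result 2 (Fisher's Theorem applied to genome growth). The average constructional fitness of the genome,
$$\bar{x}(t) = \sum_{i \in \mathcal{G}} x_i\, p_i(t) ,$$
which is the portion of new duplicated genes that go to fixation, increases at rate
$$\frac{d}{dt}\bar{x}(t) = \alpha \operatorname{Var}(x) > 0 .$$

Proof.
$$\begin{aligned}
\frac{d}{dt}\bar{x}(t) &= \sum_{i \in \mathcal{G}} x_i \left[ \frac{d n_i(t)}{dt} \Big/ N(t) - n_i(t)\, \frac{d N(t)}{dt} \Big/ N(t)^2 \right] \\
&= \alpha \sum_{i \in \mathcal{G}} x_i \left[ x_i\, n_i(t)/N(t) - n_i(t) \sum_{j \in \mathcal{G}} x_j\, n_j(t) / N(t)^2 \right] \\
&= \alpha \left[ \sum_{i \in \mathcal{G}} x_i^2\, p_i(t) - \bar{x}(t)^2 \right] \\
&= \alpha \operatorname{Var}(x) > 0 . \qquad \blacksquare
\end{aligned}$$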
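As a numerical aside on Result 2, the sketch below compares a finite-difference estimate of the rate of change of the average constructional fitness with the value $\alpha \operatorname{Var}(x)$, using the closed-form solution for perfect transmission. The duplication rate, the three fixation probabilities, and the initial gene counts are hypothetical values chosen only for illustration.

```python
import math

alpha = 0.1                  # duplication rate per gene (hypothetical value)
x = [0.01, 0.05, 0.20]       # constructional fitness x_i of each type (hypothetical)
n0 = [90.0, 9.0, 1.0]        # initial gene counts n_i(0) (hypothetical)

def counts(t):
    """Closed-form solution n_i(t) = exp(alpha * x_i * t) * n_i(0)."""
    return [math.exp(alpha * xi * t) * ni for xi, ni in zip(x, n0)]

def mean_x(t):
    """Average constructional fitness, sum_i x_i p_i(t)."""
    n = counts(t)
    return sum(xi * ni for xi, ni in zip(x, n)) / sum(n)

def var_x(t):
    """Variance of x under the genome frequencies p_i(t)."""
    n = counts(t)
    total = sum(n)
    m = mean_x(t)
    return sum(ni / total * (xi - m) ** 2 for xi, ni in zip(x, n))

t, h = 50.0, 1e-4
numeric_rate = (mean_x(t + h) - mean_x(t - h)) / (2 * h)   # central difference
predicted_rate = alpha * var_x(t)                          # alpha * Var(x)
print(numeric_rate, predicted_rate)
```

The two rates agree to within the finite-difference error, and the average rises over time as the rare high-fitness type comes to make up a larger share of the genome.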
This result is Fisher's fundamental theorem of Natural Selection (Fisher 1930), but here, what is evolving is the probability of gene duplications giving rise to new useful genes.

Imperfect Transmission of the Gene's Type. The model can be extended to less-than-perfect heritability of constructional fitness by defining a transmission function, $T(i \leftarrow j)$, which is the probability that a gene of type $j$ gives rise to a copy of type $i$ (Slatkin 1970, Altenberg and Feldman 1987). It satisfies the conditions
$$\sum_{i \in \mathcal{G}} T(i \leftarrow j) = 1 \text{ for all } j \in \mathcal{G}, \quad \text{and} \quad T(i \leftarrow j) \ge 0 \text{ for all } i, j \in \mathcal{G} .$$
Here, the fraction of the new genes that are of type $i$ is
$$p_i(t) = \sum_{j \in \mathcal{G}} T(i \leftarrow j)\, n_j(t) / N(t) .$$
The dynamics now become:
$$\frac{d}{dt} n_i(t) = \alpha\, x_i \sum_{j \in \mathcal{G}} T(i \leftarrow j)\, n_j(t) .$$
Price's Covariance and Selection theorem (Price 1970, 1972) emerges when we consider selection in the presence of arbitrary transmission:

Result 3 (Price's Theorem applied to genome growth). For a gene of type $j$, let
$$\xi_j = \sum_{i \in \mathcal{G}} x_i\, T(i \leftarrow j)$$
be the fraction of its duplicate offspring genes that are stably incorporated in the genome. Then the rate of change in the average constructional fitness of the genome evaluates to
$$\frac{d}{dt}\bar{x}(t) = \alpha \left\{ \operatorname{Cov}(\xi, x) + [\bar{\xi}(t) - \bar{x}(t)]\, \bar{x}(t) \right\} ,$$
where
$$\bar{\xi}(t) = \sum_{i \in \mathcal{G}} \xi_i\, p_i(t), \quad \text{and} \quad \operatorname{Cov}(\xi, x) = \sum_{i \in \mathcal{G}} \xi_i\, x_i\, p_i(t) - \bar{\xi}(t)\, \bar{x}(t) .$$
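Proof. Writing $\xi_j = \sum_{i} x_i\, T(i \leftarrow j)$ for the fraction of a type-$j$ gene's duplicate offspring that are stably incorporated, the portion of gene duplications that go to fixation is
$$\bar{x}(t) = \sum_{i \in \mathcal{G}} x_i\, p_i(t) = \sum_{i,j \in \mathcal{G}} x_i\, T(i \leftarrow j)\, n_j(t)/N(t) = \sum_{j \in \mathcal{G}} \xi_j\, n_j(t)/N(t) .$$
This changes at the rate:
$$\begin{aligned}
\frac{d}{dt}\bar{x}(t) &= \sum_{i,j \in \mathcal{G}} x_i\, T(i \leftarrow j) \left[ \frac{d n_j(t)}{dt} \Big/ N(t) - \frac{d N(t)}{dt}\, n_j(t) \Big/ N(t)^2 \right] \\
&= \alpha \sum_{i,j \in \mathcal{G}} x_i\, T(i \leftarrow j) \left[ x_j \sum_{k \in \mathcal{G}} T(j \leftarrow k)\, n_k(t)/N(t) - n_j(t) \sum_{k,h \in \mathcal{G}} x_k\, T(k \leftarrow h)\, n_h(t) / N(t)^2 \right] \\
&= \alpha \sum_{j \in \mathcal{G}} \xi_j \left[ x_j \sum_{k \in \mathcal{G}} T(j \leftarrow k)\, n_k(t)/N(t) - n_j(t) \sum_{h \in \mathcal{G}} \xi_h\, n_h(t) / N(t)^2 \right] \\
&= \alpha \left\{ \operatorname{Cov}(\xi, x) + [\bar{\xi}(t) - \bar{x}(t)]\, \bar{x}(t) \right\} . \qquad \blacksquare
\end{aligned}$$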
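The algebra of Result 3 can be checked numerically. The sketch below builds a small hypothetical transmission matrix, differentiates the average constructional fitness directly from the dynamics, and compares the result with the covariance-plus-bias expression. All numerical values are illustrative assumptions, and the name `xi` stands for the fraction of a gene's duplicate offspring that are stably incorporated.

```python
alpha = 0.1
x = [0.01, 0.05, 0.20]        # fixation probabilities x_i (hypothetical)
n = [50.0, 30.0, 20.0]        # current gene counts n_j (hypothetical)
# T[i][j] = T(i <- j): probability that a type-j gene yields a type-i copy.
# Each column sums to 1.
T = [[0.90, 0.10, 0.10],
     [0.08, 0.85, 0.20],
     [0.02, 0.05, 0.70]]
k = len(x)
N = sum(n)

# Frequencies of types among new gene duplications.
p = [sum(T[i][j] * n[j] for j in range(k)) / N for i in range(k)]
# Fraction of a type-j gene's duplicate offspring that are fixed.
xi = [sum(x[i] * T[i][j] for i in range(k)) for j in range(k)]

x_bar = sum(x[i] * p[i] for i in range(k))
xi_bar = sum(xi[i] * p[i] for i in range(k))
cov = sum(xi[i] * x[i] * p[i] for i in range(k)) - xi_bar * x_bar

# Direct derivative of x_bar = sum_j xi_j n_j / N, using
# dn_j/dt = alpha * x_j * sum_k T(j <- k) n_k = alpha * x_j * N * p_j.
dn = [alpha * x[j] * N * p[j] for j in range(k)]
dN = sum(dn)
lhs = sum(xi[j] * (dn[j] / N - n[j] * dN / N ** 2) for j in range(k))

# Covariance term plus net transmission bias, as stated in Result 3.
rhs = alpha * (cov + (xi_bar - x_bar) * x_bar)
print(lhs, rhs)
```

The two quantities agree to rounding error for any column-stochastic matrix and any positive gene counts, since the check is an algebraic identity rather than an approximation.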
The covariance term is between a gene's probability of fixation and its offspring genes' average probability of fixation. Note that the frequencies used in the covariance are the frequencies of different types among gene duplications, not the current genes in the genome. A positive correlation between $\xi_i$ and $x_i$ is to be expected if a gene and its offspring genes affect the same sort of phenotypic characters, and the adaptive opportunity that existed for these characters still exists. Genes (or gene parts, e.g. exons) that code for generally useful products, such as promoters, transmembrane linkers, catalytic sites, developmental controls, etc., would have such continuing adaptive opportunity, and they would contribute to making $\operatorname{Cov}(\xi, x) > 0$.
The term $\bar{\xi}(t) - \bar{x}(t)$ is the net bias in the transmission of constructional fitness between a gene and its offspring genes. A conservative assumption is that the transmission bias is negative, i.e. the chance that gene duplications are adaptive is less for a gene's grand-offspring than it is for the gene's offspring. This is a reasonable assumption since duplications of a gene (or gene part) would diverge to various extents from the ancestral gene's effects, selection may change, or the adaptive opportunity for new copies of the gene may get saturated.
But even with a negative transmission bias, the average constructional fitness, $\bar{x}(t)$, increases as long as
$$\bar{\xi}(t) - \bar{x}(t) > -\operatorname{Cov}(\xi, x) / \bar{x}(t) . \tag{3}$$
As an illustrative example, we can set $\xi_i = \beta x_i$ with $\beta < 1$, a downward transmission bias. Still, $\bar{x}(t)$ increases as long as
$$\beta > \frac{1}{1 + \operatorname{Var}(x_i/\bar{x}(t))} . \tag{4}$$
Evaluation of (4) requires evaluating the magnitude of $\operatorname{Var}(x_i/\bar{x}(t))$, which depends on the distribution of constructional fitness values in the genome. Let $g(x)$ be the portion of gene duplications with constructional fitness $x$. The conditions for (4) under a variety of distributions are:
- A uniform distribution, $g(x) = 1$: $\bar{x}$ increases if $\beta > 3/4$;
- An exponential distribution, $g(x) = \nu e^{-\lambda x}$ ($\nu$ is the normalizer): for large $\lambda$, $\bar{x}$ increases if $\beta > 1/2$;
- A Gaussian initial distribution, $g(x) = \nu e^{-\lambda x^2}$: for large $\lambda$, $\bar{x}$ increases if $\beta > 2/\pi$;
- A Gamma distribution, $g(x) = \nu x^{\gamma - 1} e^{-\lambda x}$ for $x > 0$ and $g(x) = 0$ for $x < 0$: for large $\lambda$, $\bar{x}$ increases if $\beta > \gamma/(\gamma + 1)$.
Since one can choose $\gamma > 0$ close to 0, distributions can be found for any arbitrarily small $\beta$ in which the average constructional fitness of the genome grows. Thus, even for arbitrarily strong downward transmission bias, where the probability of a gene giving rise to a useful offspring gene decreases by a factor $\beta$ each gene duplication, the average probability in the genome that a gene duplication produces a selective advantage may still increase in time, depending on the initial distribution of these probabilities in the genome.
As $n_i(t)$ evolves, both $\operatorname{Cov}(\xi, x)$ and the net transmission bias will change. Under a wide variety of well-behaved transmission functions, where the net transmission bias initially satisfies (3), the distribution of constructional fitness values will shift upward until the net bias balances the covariance or the covariance is exhausted.
Results 1 and 3 are extensions of a line of theorems in quantitative genetics based on the covariance of different traits with fitness, including Fisher's fundamental theorem, Robertson's "secondary theorem of Natural Selection" (Robertson 1966), and a result by Price (1970) on gene frequency change, which were elaborated upon by Crow and Nagylaki (1976) and Lande and Arnold (1983). Price's theorem has been applied in a number of different contexts in evolutionary genetics, including kin selection (Grafen 1985, Taylor 1988), group selection (Wade 1985), the evolution of mating systems (Uyenoyama 1988), and quantitative genetics (Frank and Slatkin 1990). I have applied it to performance analysis of genetic algorithms in Altenberg (1994, 1995).
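The thresholds quoted above for condition (4) can be reproduced numerically. The sketch below assumes that constructional fitness values lie in the interval [0, 1], uses a midpoint rule for the moments, and rewrites the bound as mean squared over the second moment, which equals 1 / (1 + Var(x / mean)). The values chosen for the rate parameter and the Gamma shape are hypothetical.

```python
import math

def beta_threshold(g, steps=20000):
    """Smallest beta satisfying condition (4) for a density proportional to g on [0, 1].

    Uses 1 / (1 + Var(x / mean)) = mean^2 / E[x^2]; the normalizer of g cancels.
    """
    h = 1.0 / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps):
        xv = (k + 0.5) * h          # midpoint rule on [0, 1]
        gx = g(xv)
        m0 += gx
        m1 += xv * gx
        m2 += xv * xv * gx
    mean = m1 / m0
    second = m2 / m0
    return mean * mean / second

lam = 50.0      # stands in for "large lambda" (hypothetical value)
gamma = 2.0     # Gamma shape parameter (hypothetical value)

uniform = beta_threshold(lambda x: 1.0)
exponential = beta_threshold(lambda x: math.exp(-lam * x))
gaussian = beta_threshold(lambda x: math.exp(-lam * x * x))
gamma_dist = beta_threshold(lambda x: x ** (gamma - 1.0) * math.exp(-lam * x))
print(uniform, exponential, gaussian, gamma_dist)
```

With these settings the four values come out close to 3/4, 1/2, 2/π, and γ/(γ + 1) = 2/3 respectively. Shape parameters below 1 make the Gamma density singular at zero, so a different quadrature would be needed to explore that regime.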
4 Wagner's Linear Quantitative-Genetic Model with Gaussian Selection
Wagner (1984, 1989) has investigated evolutionary aspects of the genotype-phenotype map through analysis of linear maps combined with a number of different fitness surfaces, including "corridor" and Gaussian fitness functions. In this section I investigate the correlated allelic variation effect of genome growth using a variant of Wagner's (1989) model of "constrained pleiotropy". The model here is a multilayered linear map from the genotype to the organismal phenotype, and from the phenotype to the adaptive functions they carry out. Figure 1 illustrates this model.
Fig. 1. Wagner's linear model of the genotype-phenotype map with a Gaussian fitness function on the departure, z, from optimality. [Diagram layers, bottom to top: Genotype, Phenotype, Functions under selection.]
What I want to capture with this model is the following idea: genes don't "know" a priori what they are doing, what functions they are carrying out; i.e. there is "universal pleiotropy". Pleiotropic constraints may limit the genotype's ability to optimize simultaneously all the functions it controls, so that the best phenotype achievable, given the genetic variability available, may be a compromise between tradeoffs that represents a departure from the global selective optimum. The genotype may appear to be at a selective peak, but if new dimensions of genetic variability were opened up, this peak would be revealed to be on the slope of a larger selective peak. Therefore, at these constrained peaks there exists a "latent" directional selection to which the population could respond if the proper dimension of genetic variation existed. In such situations, events which make the proper variation possible can be major factors in evolution. Genetic changes that alter the nature of the pleiotropic constraints can therefore come under selection. In this model, I will show how, when there exists variability in the pleiotropic effects of genes coming into existence, genes which are most aligned with the latent directional
selection will have the best chance of being incorporated into the genome, and the genomes that result will be able to simultaneously optimize all the adaptive functions much better than would be expected from the underlying distribution of pleiotropic effects. Moreover, the pattern of phenotypic effects of each gene will tend to reflect the directional selection that existed when the gene came into being. The phenotypic variability present in the genomes will therefore indicate the history of directional selection that the genomes experienced during their evolutionary construction.
4.1 The Adaptive Landscape
The organismal phenotype is defined as a $k$-element long vector, $y \in \mathbb{R}^k$. The organism carries out $f$ different adaptive functions. The optimal organismal phenotype is $y^*$, which would perform each of these functions maximally. For each of the $f$ organismal functions there will be a vector $q_i \in \mathbb{R}^k$ such that when the phenotype $y$ departs from $y^*$ in the direction $q_i$, only the performance of adaptive function $i$ is altered. Thus the set of $\{q_i\}$ must be orthogonal. The amount, $z_i$, of this departure of adaptive function $i$ from its optimum is simply the component of $q_i$ present in $y - y^*$, i.e., the projection of $y - y^*$ onto $q_i$:
$$z_i = q_i^T (y - y^*) .$$
Let the departures from optimality in each adaptive function interact multiplicatively in reducing the fitness of the organism, with the relative importance of function $i$ measured by a value $\lambda_i > 0$. A Gaussian selection scheme satisfies these specifications, giving
$$w(y) = \exp\left[ -(y - y^*)^T Q \Lambda Q^T (y - y^*) \right] = \exp\left[ -\sum_{i=1}^{f} \lambda_i z_i^2 \right] , \tag{5}$$
where $Q = \| q_1, \ldots, q_f \|$ is the matrix whose columns are $q_i$, and $\Lambda$ is the diagonal matrix $\Lambda = \operatorname{diag}[\lambda_i]_{i=1}^{f}$.
Assume that $\{q_i\}$ are linearly independent, which requires $f \le k$. Let them also be normalized, so that $Q^T Q = I$ (if $f = k$ then $Q$ is an orthogonal matrix, hence $Q^T = Q^{-1}$). Together, $y^*$, $Q$, and $\Lambda$ determine the structure of the "adaptive landscape" in terms of the organismal phenotype, $y$.
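A small worked example may help fix the notation of this adaptive landscape. The sketch below takes three phenotypic characters and two adaptive functions with made-up directions, weights, and optimum, then evaluates the Gaussian fitness both from the departures and from the quadratic form. Every numerical value is a hypothetical choice for illustration.

```python
import math

# Hypothetical example: k = 3 phenotypic characters, f = 2 adaptive functions.
y_star = [1.0, 2.0, 0.5]                                  # optimal phenotype y*
q = [[1.0, 0.0, 0.0],                                     # q_1
     [0.0, 1.0 / math.sqrt(2), 1.0 / math.sqrt(2)]]       # q_2, orthonormal to q_1
lam = [2.0, 0.5]                                          # weights lambda_i > 0

def departures(y):
    """z_i = q_i^T (y - y*): the departure of each adaptive function from its optimum."""
    d = [yi - si for yi, si in zip(y, y_star)]
    return [sum(qc * dc for qc, dc in zip(qi, d)) for qi in q]

def fitness(y):
    """w(y) = exp(-sum_i lambda_i z_i^2)."""
    return math.exp(-sum(l * z * z for l, z in zip(lam, departures(y))))

def fitness_matrix_form(y):
    """Same fitness via the quadratic form (y - y*)^T Q Lambda Q^T (y - y*)."""
    d = [yi - si for yi, si in zip(y, y_star)]
    k = len(d)
    # Q Lambda Q^T equals sum_i lambda_i q_i q_i^T.
    M = [[sum(l * qi[r] * qi[c] for l, qi in zip(lam, q)) for c in range(k)]
         for r in range(k)]
    quad = sum(d[r] * M[r][c] * d[c] for r in range(k) for c in range(k))
    return math.exp(-quad)

y = [1.5, 2.0, 1.5]
print(departures(y), fitness(y), fitness_matrix_form(y))

# With f < k some phenotypic directions are selectively neutral: moving along
# (0, 1, -1) changes neither adaptive function.
neutral = [1.0, 3.0, -0.5]
print(fitness(neutral))
```

Because there are fewer adaptive functions than characters in this example, the quadratic form is only semi-definite, which is why the last phenotype has the same fitness as the optimum.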
4.2 Genetic Control of the Phenotype
Suppose there are $n$ genes, and the allelic state at each gene $i$ determines a genotype $x_i \in \mathbb{R}$. The organismal phenotype, $y$, is the sum of a set of normalized vectors $a_i \in S^k$ on the unit $k$-sphere, weighted by the values $x_i$. Hence
$$y = A x , \tag{6}$$
where $A = \| a_1, \ldots, a_n \|$ is the matrix whose columns are the vectors $a_j$. The gene effects on the phenotype are additive, by the linearity of (6). The magnitude is partitioned from the direction of the gene's effects by normalizing $a_j$, so that
$$a_j^T a_j = \sum_i a_{ij}^2 = 1$$
for all $j$. The allelic value $x_j$ controls the magnitude of the gene's effects. The fitness function for the genotype is:
$$w(x) = \exp\left[ -(A x - y^*)^T Q \Lambda Q^T (A x - y^*) \right] .$$
A note on epistasis: Although the loci interact additively in this model, they are also epistatic in terms of fitness, since the contribution of each allelic value to fitness depends on the value of the alleles at the other loci:
$$\partial w(x) / \partial x_i = -2\, w(x)\, (A x - y^*)^T Q \Lambda Q^T a_i . \tag{7}$$
4.3 "Latent" Directional Selection at Fitness Peaks under Pleiotropic Constraints

I assume that each of the elements of $x$ is free to evolve, and that the population will eventually become fixed, through allelic substitution, on the genotype vector that produces the maximum fitness, i.e. which minimizes
$$\delta(x) = (A x - y^*)^T Q \Lambda Q^T (A x - y^*) . \tag{8}$$
This is illustrated in Fig. 2. The dynamics of the evolution toward this optimum are not critical to what follows, but the gradient ascent model of Via and Lande (1985), extended to arbitrary dimensions, would be applicable. The constraints in this model are therefore entirely range constraints, and not kinetic constraints, on the attainable optima.

Fig. 2. Illustration of the "latent" directional selection remaining when adaptation is constrained by phenotypic variability to be suboptimal. The global optimum phenotype is $y^*$ and the constrained optimum is $\hat{y}$.

To find the minimum of $\delta(x)$ in (8) one differentiates. Let $M = Q \Lambda Q^T$. Then $M$ is positive definite (if $f = k$) or semi-definite (if $f < k$). The system
$$A^T M (A \hat{x} - y^*) = \frac{1}{2} \left. \frac{\partial \delta(x)}{\partial x} \right|_{x = \hat{x}} = 0 \tag{9}$$
represents the "normal equations" for the minimization problem (Luenberger 1968). The closed-form solution is
$$\hat{x} = (A^T M A)^{-1} A^T M y^* , \tag{10}$$
and requires that the matrix $A^T M A$, known as the Gram matrix of $A$, be positive definite. This is assured if: $A$ is full rank, i.e. the $a_i$ are linearly independent, $M$ is positive semi-definite, and no $a_i$ is in the null space of $M$, i.e. for all $i$, $Q^T a_i \ne 0$ and $\lambda_i \ne 0$. Note that numerical computation of $\hat{x}$ uses LU decomposition, not the matrix inversion in (10).
In his analysis of variability maintained by a mutation-selection balance in this model, Wagner (1989) changes coordinates so that $y^* = 0$. But then by (10), $\hat{y} = A \hat{x} = y^*$, so the system evolves to the global fitness peak, and is not constrained by variation to be suboptimal. Although this is of no consequence for the nature of a mutation-selection balance, it eliminates the evolutionary potential afforded by the "latent" directional selection that exists when the population is constrained to be suboptimal, which is what I consider here.
Quantitative genetic models with the kind of constrained optima described here present a number of important features. Adding allelic polymorphism to the current model, as in Wagner (1989), would reveal that there can be additive genetic variance for a trait under directional selection and yet no evolution of that trait. Moreover, if selection is increased on any trait, the population will respond to it and move in the direction of the increase of selection until a new balance is found; upon relaxation of the selection to the former level, the population would return to the previous value.
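The constrained optimum can be illustrated with a two-character example. The sketch below solves the normal equations by Gaussian elimination (a small stand-in for an LU-based solver) for a genome with one gene, shows that the departure from optimality stays positive, and then shows that adding a second, linearly independent gene removes it. The effect vectors, the matrix M, and the optimum are hypothetical values chosen for illustration.

```python
def solve(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(G, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def constrained_optimum(cols, M, y_star):
    """Solve A^T M A x = A^T M y*, where cols holds the columns a_j of A and M is symmetric."""
    k = len(y_star)
    Ma = [[sum(M[r][c] * a[c] for c in range(k)) for r in range(k)] for a in cols]  # M a_j
    G = [[sum(ai[r] * Maj[r] for r in range(k)) for Maj in Ma] for ai in cols]      # A^T M A
    b = [sum(Maj[r] * y_star[r] for r in range(k)) for Maj in Ma]                   # A^T M y*
    return solve(G, b)

def delta(cols, x, M, y_star):
    """delta(x) = (A x - y*)^T M (A x - y*)."""
    k = len(y_star)
    d = [sum(a[r] * xj for a, xj in zip(cols, x)) - y_star[r] for r in range(k)]
    return sum(d[r] * M[r][c] * d[c] for r in range(k) for c in range(k))

# Hypothetical case: k = 2 characters, Q = I, Lambda = diag(1, 2), so M = diag(1, 2).
M = [[1.0, 0.0], [0.0, 2.0]]
y_star = [1.0, 1.0]

one_gene = [[0.6, 0.8]]                        # a single unit-length effect vector
x_hat = constrained_optimum(one_gene, M, y_star)
latent = delta(one_gene, x_hat, M, y_star)     # positive: the optimum is constrained

two_genes = [[0.6, 0.8], [1.0, 0.0]]           # a second, independent direction
x_hat2 = constrained_optimum(two_genes, M, y_star)
resolved = delta(two_genes, x_hat2, M, y_star)
print(x_hat, latent, x_hat2, resolved)
```

With one gene the best attainable phenotype lies on a line that misses the optimum, so a residual departure remains; once the two effect vectors span the phenotype space the residual vanishes to rounding error.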
4.4 Constructional Selection
The presence of latent directional selection at a constrained optimum creates adaptive opportunity for new genes that give different directions of phenotypic variability, and so until evolution reaches the global maximum, there is always the opportunity for genome growth. The process of adding new genes to the genome then is modeled as increasing the matrix $A$ column by column. Here this process is examined under very simple evolutionary dynamics, where the population is fixed on its best attainable genotype at the time a new gene is tested in the genome. If the new gene increases fitness, it is added to the genome, and before any new genes are tested, the genotype evolves through allelic substitution to the new optimum that the new gene allows it to attain. This process is then repeated and the genome thus built up.

A new gene is added to the genome according to some random sampling process, producing a random vector, $a_{n+1}$ (its vector of effects on the organismal phenotype), which expands $A$ by one column to yield $A'$. Addition of a new gene increases the length of $\hat{x}$ by one element, $x_{n+1}$, a random variable, to yield $x'$. The number of phenotypic characters, $k$, remains unchanged. Once the new gene is added to the genome, mutations in its allelic value $x_{n+1}$ will change the phenotype along the same vector of variation, $a_{n+1}$, as produced by the gene's creation. Thus there is complete correlation in this model between the phenotypic effects from the creation of the gene and the effects of its subsequent allelic variation, which is what provides the basis of the correlated allelic variation effect of constructional selection.

The departure of the fitness components from the optimum before the addition of the new gene is:
$$\delta(x) = z^T \Lambda z = \sum_i \lambda_i z_i^2 ,$$
where $z = Q^T (A\hat{x} - y^*)$, and each $z_i$ is the departure of phenotype from perfect realization of adaptive function $i$. The fitness of the organism after addition of the new gene is $w(x') = e^{-\delta(x')}$, where
$$\delta(x') = (A\hat{x} + x_{n+1} a_{n+1} - y^*)^T Q \Lambda Q^T (A\hat{x} + x_{n+1} a_{n+1} - y^*) . \tag{11}$$
Define:
$$\epsilon = x_{n+1} Q^T a_{n+1} .$$
Then
$$\delta(x') = (z + \epsilon)^T \Lambda (z + \epsilon) . \tag{12}$$
So fitness increases if and only if
$$\delta(x') - \delta(x) = 2 x_{n+1} (A\hat{x} - y^*)^T M a_{n+1} + x_{n+1}^2 a_{n+1}^T M a_{n+1} = (2z + \epsilon)^T \Lambda \epsilon = \sum_i \lambda_i (2 z_i + \epsilon_i) \epsilon_i < 0 . \tag{13}$$
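The componentwise form of (13) can be checked numerically. The following is a minimal sketch with made-up values for $\lambda$, $z$, and $\epsilon$ (all hypothetical, as are the function names `delta` and `delta_change`); it compares the sum in (13) with the direct difference $\delta(x') - \delta(x)$ computed from (12).

```python
def delta(z, lam):
    """delta = z^T Lambda z for diagonal Lambda, as in the text."""
    return sum(l * zi * zi for l, zi in zip(lam, z))

def delta_change(z, eps, lam):
    """Right-hand side of (13): sum_i lambda_i (2 z_i + eps_i) eps_i."""
    return sum(l * (2 * zi + ei) * ei for l, zi, ei in zip(lam, z, eps))

lam = [1.0, 2.0]            # hypothetical selection strengths lambda_i
z = [0.5, -0.1]             # hypothetical departures from the optimum
small_step = [-0.2, 0.05]   # perturbation that reduces both departures
large_step = [-0.2, 0.5]    # same, but much larger on the second function

results = []
for eps in (small_step, large_step):
    shifted = [zi + ei for zi, ei in zip(z, eps)]
    direct = delta(shifted, lam) - delta(z, lam)
    results.append((delta_change(z, eps, lam), direct))
    print(results[-1])
```

A negative value means that $\delta$ falls and fitness $e^{-\delta}$ rises. In this example the first perturbation gives a negative change and the second a positive one, although both oppose the sign of each $z_i$.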
The effect of the new gene on fitness depends on both its magnitude $x_{n+1}$ and its direction $a_{n+1}$. In order for changes in function $i$ to contribute toward increased fitness, $z_i$ and $\epsilon_i$ must be of opposite sign (i.e. the new gene changes the genotype in the opposite direction from its error), and
$$|\epsilon_i| < 2 |z_i| . \tag{14}$$
If $x_{n+1}$ is very small, then $(2 z_i + \epsilon_i) \epsilon_i \approx 2 z_i \epsilon_i$, and under a wide variety of assumptions about the distributions of $x_{n+1}$, the probability that a new gene will produce a fitness increase would be 1/2, independent of the new gene's pleiotropy vector, $a_{n+1}$. Thus there would be no constructional selection on $a_{n+1}$. If $x_{n+1}$ is distributed with larger values, however, the condition in (14) corresponds to the new gene not causing the phenotype to overshoot the maximum for function $i$ and produce a fitness contribution lower than before. If any $z_i$ has evolved to be very small, i.e., the organismal phenotype has realized adaptive function $i$ very well, then a large perturbation $\epsilon_i$ from any new gene reduces the chance that it increases fitness. This selection against large $\epsilon_i$ is greater with larger $\lambda_i$. Thus there will be selection against the addition of new genes that alter existing highly adapted functions. Under this model, new genes that are incorporated in the growing genome will therefore tend to have lower pleiotropy for existing organismal functions than randomly added genes.

A measure $P_A(a_{n+1})$ of the pleiotropy of the new gene can be defined to display the extent to which the new gene disturbs the existing constrained optimum:
$$P_A(a_{n+1}) \equiv \frac{\hat{x}^T A^T M a_{n+1}}{y^{*T} M a_{n+1}} .$$
We see from (9) that pleiotropy is large for a new gene that moves the phenotype in a direction within the space of variability that it is already optimized for:
$$P_A(a_i) = 1 \quad \text{for } i = 1 \ldots n ,$$
whereas pleiotropy is small when the new gene moves the phenotype in the exact direction of the global optimum, $A\hat{x} - y^*$:
$$P_A(A\hat{x} - y^*) = 0 .$$
Then condition (13) for a fitness increase can be written:
$$\delta(x') - \delta(x) = x_{n+1}^2 a_{n+1}^T M a_{n+1} - 2 x_{n+1} [1 - P_A(a_{n+1})] \, y^{*T} M a_{n+1} < 0 .$$
Since the first term is always positive, a fitness increase requires that the term $1 - P_A(a_{n+1})$ be as large as possible, i.e. that the pleiotropy value be small.
Genetic Modifiers of Pleiotropy. It should be mentioned that the same analysis applies to selection on a modifier gene that changes the $A$ matrix. Suppose an allele at a modifier locus changes matrix $A$ to $A + C$. Then with the substitution $x_{n+1} a_{n+1} = C\hat{x}$ in (11) the subsequent analysis (through (14)) applies. The selective advantage of the modifier relative to the unmodified genotype is
$$\frac{w'(x)}{w(x)} - 1 = e^{\delta(x) - \delta'(x)} - 1 .$$
Here, $w'$ and $\delta'$ indicate values using $A + C$ for $A$. So any modifier locus which is able to change the genotype-phenotype map, $A$, has a potential selective advantage of as much as the "latent" directional selection, $e^{\delta(x)} - 1$.

4.5 Numerical Simulation
A numerical simulation of this model illustrates the constructional selection process. The genome is grown gene by gene according to the algorithm illustrated in Fig. 3:
1. Randomly create the adaptive landscape matrices $Q$, $\Lambda$, and optimal phenotype vector $y^*$:
   (a) pick the elements of $Q$ uniformly on [-1, 1] and then orthogonalize the columns (the Modified Gram-Schmidt algorithm was used (Golub and Van Loan 1983));
   (b) generate the diagonal elements of $\Lambda$ uniformly on [0, 1];
   (c) generate the elements of $y^*$ uniformly on [-1, 1].
2. Add a new gene to the genome:
   (a) create a new pleiotropy vector $a_{n+1}$ by picking elements $a_i$ uniformly on [-1, 1] and then normalizing so that $\sum_i a_i^2 = 1$;
   (b) let the allelic value, $x_{n+1}$, for the new gene equal a scale value which exponentially decreases until the new gene is kept.
3. In a run when constructional selection is acting: if the new gene decreases fitness, reject it and repeat step 2. Otherwise, keep it.
4. Adapt $x$ to the new optimum $\hat{x}$.
5. Repeat step 2 until the genome has 32 genes.

In this simulation, the pleiotropy vectors, $a_{n+1}$, are chosen from the same distribution throughout the run. Therefore, there is no heritability on the level of genome-as-population, and thus no opportunity for the genic selection effect. The obvious scheme of heredity for gene-to-gene duplications will not produce meaningful results given the way the model is set up. Consider a simple form of heredity, where new vectors $a_{n+1}$ are resampled from $\{a_1, \ldots, a_n\}$, the columns of $A$. The new gene would have maximal pleiotropy and always be deleterious since it could only move the phenotype off its constrained peak; the new matrix $A'$ would be less than full rank, moreover, giving a continuum of constrained optima. So with the linear genotype-phenotype map, the genic selection effect would not occur under this model of heredity.
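Steps 1 to 5 above can be sketched in a few lines. This is an illustrative reduction, not the simulation reported here: it assumes $Q$ is the identity (so $M = \Lambda$ is diagonal and no Gram-Schmidt step is needed), it reaches the new optimum by coordinate descent rather than by solving (10), and the function names, the default sizes, and the `decay` factor for the scale value are all hypothetical.

```python
import math
import random

def residual(genes, x, y_star):
    """Phenotype departure A x - y*, with the columns of A stored in `genes`."""
    r = [-v for v in y_star]
    for a, xi in zip(genes, x):
        for j, aj in enumerate(a):
            r[j] += xi * aj
    return r

def delta(r, lam):
    """delta = sum_i lambda_i r_i^2 (assumes Q = identity)."""
    return sum(l * v * v for l, v in zip(lam, r))

def adapt(genes, x, y_star, lam, sweeps=100):
    """Allelic substitution as coordinate descent: exact minimisation along one gene at a time."""
    for _ in range(sweeps):
        for i, a in enumerate(genes):
            r = residual(genes, x, y_star)
            slope = sum(l * aj * rj for l, aj, rj in zip(lam, a, r))
            curvature = sum(l * aj * aj for l, aj in zip(lam, a))
            x[i] -= slope / curvature
    return x

def grow_genome(k=8, n_genes=4, select=True, seed=0, decay=0.9):
    rng = random.Random(seed)
    lam = [rng.uniform(0.0, 1.0) for _ in range(k)]        # step 1(b)
    y_star = [rng.uniform(-1.0, 1.0) for _ in range(k)]    # step 1(c)
    genes, x, fitness_history = [], [], []
    scale = 1.0
    while len(genes) < n_genes:
        a = [rng.uniform(-1.0, 1.0) for _ in range(k)]     # step 2(a)
        norm = math.sqrt(sum(v * v for v in a))
        a = [v / norm for v in a]
        before = delta(residual(genes, x, y_star), lam)
        after = delta(residual(genes + [a], x + [scale], y_star), lam)
        if select and after >= before:                     # step 3: reject, shrink scale
            scale *= decay
            continue
        genes.append(a)
        x.append(scale)
        adapt(genes, x, y_star, lam)                       # step 4
        fitness_history.append(math.exp(-delta(residual(genes, x, y_star), lam)))
        scale = 1.0                                        # reset for the next gene
    return fitness_history

with_selection = grow_genome(select=True, seed=1)
without_selection = grow_genome(select=False, seed=1)
print(with_selection)
print(without_selection)
```

Calling `grow_genome` with `select=False` accepts every new gene regardless of its immediate effect, which corresponds to the comparison runs without constructional selection.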
[Flowchart: GENOME GROWTH ALGORITHM. Add a new gene to the genome; obtain its functional effects randomly from a given distribution. If the new gene produces a fitness decrease, reject it. If the new gene produces a fitness increase, keep it, then adapt the genome through allelic substitution until it is at a fitness peak.]
Fig. 3. The genome growth algorithm used in the simulation.
The procedure for choosing $x_{n+1}$ in step 2b was taken instead of choosing $x_{n+1}$ from some random distribution in order to lessen the variance in the stringency of constructional selection on $a$ (as discussed in Sect. 4.4) and to maintain a roughly constant stringency of constructional selection as the genome grows. Simulations were run both with and without constructional selection (where each new gene is accepted in the genome regardless of its immediate effect on fitness), to allow comparison between genomes resulting from constructional selection and genomes sampled from the underlying random distribution of gene effects.

In these simulations, there are 64 organismal functions under Gaussian stabilizing selection, and the genomes evolve from one gene to 32. Figure 4 shows the evolution of the fitness components for each organismal function as the genome grows. The height, $\lambda_i z_i^2$, plotted for each function $i$, represents the departure of each component from optimality as the genome is increased from 1 to 32 genes. The bumps in the landscape indicate where gene addition decreases the adaptation for certain components, while raising it for others. Comparison between the genomes grown with and without constructional selection shows that adaptation simultaneously at many organismal functions can be achieved with a much smaller genome when constructional selection acts during the evolution of the genotype-phenotype map.

Figure 5 shows the trajectories of organismal fitness as new genes are added to the genome. The phenotype $y$ always moves closer to $y^*$ whether or not constructional selection is acting, because any generic new gene increases the phenotype subspace spanned by the genetic variation regardless of its immediate effect on fitness. With constructional selection, however, rapid approach to the
Fig. 4. Fitness components for multiple organismal functions during genome growth: in a genome evolved with (left) and without (right) constructional selection. The height, $\lambda_i z_i^2$, measures departure of each organismal function from optimality. For clarity, only 32 of the 64 different adaptive functions are plotted, in arbitrary order.
global optimum in the adaptive landscape occurs with much smaller genome size. Genomes with the random distribution of phenotypic effects had to grow to a size of 32 genes to reach the same fitnesses attained by genomes of only around 5 genes when these underwent constructional selection. In these simulations, most of the adaptation occurs not from the addition of the new genes, but from the climb to the constrained fitness peaks that occurs between gene additions, the part attributable to allelic substitution.

5 The "NK" Adaptive Landscape Model
Kauffman's "NK" adaptive landscape model (1989) will be used to illustrate the effects of constructional selection because it explicitly shows the epistatic structure of the genotype-phenotype map. A separate presentation of this material can be found in Altenberg (1994b). First I will describe the NK model and review existing analytical work on its evolutionary behavior. Then I will examine the properties of genomes evolved under constructional selection including their adaptive performance and the nature of the emergent genotype-phenotype maps. Kauffman's NK model has the following components:
- A genome consists of n genes;
- Each gene contributes a fitness component to the organism, and these are summed to give the total organismal fitness;
- The fitness component contributed by a given gene i depends on the allelic state at k other genes.
Although Kauffman ascribes each fitness component to a particular gene, in his model control over each fitness component is, in fact, symmetric with respect to
Fig. 5. Organismal fitness as a function of genome size for several runs of the genome growth algorithm, with (dark lines) and without (light lines) constructional selection.
all the genes that affect it. So in the development to follow, I recast the NK model in terms of a map between a set of genes and a set of fitness components. This allows the number of fitness components to differ from the number of genes, and allows genes to be added to the genome while keeping the set of fitness components fixed. This is illustrated in Fig. 6. The elements of the model are recast as follows:
1. The haploid genome consists of $n$ binary-valued genes, that exert control over $f$ phenotypic functions, each of which contributes a component to the total fitness.
2. Each gene controls a subset of the $f$ fitness components, and in turn, each fitness component is controlled by a subset of the $n$ genes. This genotype-phenotype map can be represented by a matrix, $M = \|m_{ij}\|$, $i = 1 \ldots n$, $j = 1 \ldots f$, of indices $m_{ij} \in \{0, 1\}$, where $m_{ij} = 1$ indicates that gene $i$ affects fitness component $j$;
3. The columns of $M$, called the polygeny vectors, $g_j = \|m_{ij}\|$, $i = 1 \ldots n$, give the genes controlling each fitness component $j$;
4. The rows of $M$, called the pleiotropy vectors, $p_i = \|m_{ij}\|$, $j = 1 \ldots f$, give the fitness components controlled by each gene $i$;
5. If any of the genes controlling a given fitness component mutates, the new value of the fitness component will be uncorrelated with the old. Each fitness component $\phi_j$ is a uniform pseudo-random function² of the genotype, $x \in \{0, 1\}^n$:
$$\phi_j(x) = \Phi(x \circ g_j, j, g_j) \sim \text{uniform on } [0, 1] ,$$
where $\Phi : \{0,1\}^n \times \{1, \ldots, f\} \times \{0,1\}^n \mapsto [0,1]$, and $\circ$ is the Schur product ($x \circ g_j = \|x_i m_{ij}\|$, $i = 1 \ldots n$). Any change in $j$, $g_j$, or $x \circ g_j$ gives a new value for $\Phi(x \circ g_j, j, g_j)$ that is uncorrelated with the old;
6. If a fitness component is affected by no genes, it is assumed to be zero:
$$\Phi(x \circ g_j, j, g_j) = 0 \text{ for all } x, \text{ if } g_j = \|0 \cdots 0\| ;$$
7. The total fitness is the normalized sum of the fitness components:
$$w(x) = \frac{1}{f} \sum_{j=1}^{f} \phi_j(x) . \tag{15}$$
[Figure 6 diagram; labels: FUNCTIONS, GENOME, NEW GENE.]
Fig. 6. Kauffman's NK model recast as a map between the genotype and a set of fitness components. Arrows indicate that the gene affects the fitness component. A new gene with effects on two fitness components is shown being introduced to the genome.
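The recast fitness function can be sketched as follows. This is only an illustration of elements 5 to 7 of the recast model: a SHA-256 hash from the Python standard library is swapped in as the pseudo-random function, the names `component` and `fitness` are hypothetical, and the small matrix `M` is a made-up example. Each component is keyed by its index, the set of genes that control it, and the alleles at those genes, so it changes only when one of those changes.

```python
import hashlib

def component(x, j, M):
    """Fitness component j: a pseudo-random value in [0, 1) fixed by j, its genes, and their alleles."""
    genes = tuple(i for i, row in enumerate(M) if row[j])   # polygeny vector as an index set
    if not genes:
        return 0.0                                          # no controlling genes: component is zero
    alleles = tuple(x[i] for i in genes)                    # alleles at the controlling genes only
    key = repr((j, genes, alleles)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") / 2**64

def fitness(x, M):
    """Normalized sum of the fitness components, as in (15)."""
    f = len(M[0])
    return sum(component(x, j, M) for j in range(f)) / f

# Hypothetical map: 3 genes (rows) by 3 fitness components (columns).
M = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 0]]
x = [0, 1, 1]
x_mutant = [1, 1, 1]   # gene 0 mutated; it affects only component 0

print(fitness(x, M), fitness(x_mutant, M))
```

In this example, mutating gene 0 resamples component 0 and leaves component 1 untouched, while component 2, which no gene affects, stays at zero.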
5.1 Pleiotropy and Evolvability
With the random fitness function $w(x)$ now defined, the relationship between the genotype-phenotype map and the model's adaptive behavior can be investigated. The random fitness function $w(x)$ causes genotypes that are one mutational event away from one another to be more or less correlated, depending on the genotype-phenotype map. The statistical property that affects adaptation is the likelihood that a genotype is fitter than all the genotypes that are one mutation different from it. The set of genotypes that are one mutation away from a given genotype can be called its "neighborhood", and if it is the fittest genotype in its neighborhood, then it is a fitness "peak", to use the metaphor of the adaptive landscape (Wright 1932). The NK fitness function thus produces a tunably rugged landscape (Kauffman 1989a).

² The popular Park-Miller algorithm generates non-random bits, so the encryption-like algorithm ran4 described in Press, et al. (1992) was used.

Mutation is not the only variation-producing mechanism involved in evolution. Recombination is also very important. However, in the case of sequence evolution on rugged adaptive landscapes, it has been argued that single mutations are the main mechanism of change. Maynard Smith (1970) proposed that molecular evolution must be limited mainly to moves from a genotype to one of its fitter single-mutation neighbors. Gillespie (1984) provided a theoretical population genetic analysis corroborating that evolution on "mutational landscapes" would consist mainly of "adaptive walks", in which the population moves from fixation of one genotype to fixation of a neighboring genotype of greater fitness. So such adaptive walks will be used here.

Adaptive walks have been used to study the statistics of adaptation on NK fitness landscapes (Kauffman and Levin 1987, Kauffman 1989, Macken and Perelson 1989, Weinberger 1991). Beginning with a chosen genotype, the fitness of each of its 1-mutant neighbors is evaluated. If there are no fitter genotypes, the genotype is at a fitness peak and the adaptive walk stops. Otherwise, one moves to the fittest genotype and begins the process again.

In the NK model, the chance that a mutation produces a fitness increase will depend on the pleiotropy of the genotype-phenotype map. This effect can be analyzed as follows. Define the pleiotropy value,
$$k_i = \sum_{j=1}^{f} m_{ij} ,$$
to be the number of fitness components affected by gene $i$ (the K in Kauffman's usage is $k_i - 1$ here). Define the marginal fitness of gene $i$ as the sum of the fitness components it affects:
$$w_i(x) = \sum_{j=1}^{f} m_{ij} \phi_j(x) .$$
When gene $i$ mutates, each fitness component it affects is resampled uniformly from [0,1] independently. The probability that its new marginal fitness will be less than $y$ is
$$F_k(y) = \Pr[S_k < y] = \frac{1}{k!} \sum_{i=0}^{k} (-1)^i \binom{k}{i} (y - i)_+^k , \tag{16}$$
where $S_k$ is the sum of $k$ independent uniform random variables on [0,1] and $(u)_+ = \max(u, 0)$ (Feller 1971). The probability distribution $F_k(y)$ is plotted against $y/k$ for different values of $k$ in Fig. 7. One can see how as $k$ increases, the probability density concentrates around the expected value $E(\phi) = 1/2$, an illustration of the Central Limit Theorem. Thus in genes with higher pleiotropy $k$, mutations have a
stronger regression toward the fitness $y/k = 1/2$, with diminishing upper tails of the fitness density. Therefore, lower fitnesses are likely to be fitness peaks (i.e. all its 1-mutant neighbors are less fit) for genotype-phenotype maps with high pleiotropy.
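The distribution in (16) is straightforward to evaluate for small $k$. The sketch below is a direct transcription of the sum (the function name `irwin_hall_cdf` and the chosen values of $k$ are assumptions for illustration); it is kept to small $k$ because the alternating sum loses floating-point precision as $k$ grows.

```python
import math

def irwin_hall_cdf(y, k):
    """F_k(y) = Pr[S_k < y] for the sum S_k of k independent uniforms on [0, 1]."""
    if y <= 0:
        return 0.0
    if y >= k:
        return 1.0
    # Terms with i > y vanish because (y - i)_+ = 0, so the sum stops at floor(y).
    total = sum((-1) ** i * math.comb(k, i) * (y - i) ** k for i in range(int(y) + 1))
    return total / math.factorial(k)

# Chance that a mutation raises the marginal fitness when the affected
# components currently average 0.7, for increasing pleiotropy k.
average = 0.7
improvement = [1.0 - irwin_hall_cdf(average * k, k) for k in (1, 2, 4, 8)]
print(improvement)
```

For a fixed average above 1/2, the values in `improvement` fall as $k$ rises, which is the regression toward 1/2 described above.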
Fig. 7. The probability, Fk(y), that the marginal fitness of a mutation affecting k fitness components will be greater than y. Plotted with the abscissa normalized by k, for k = 1... 32.
5.2 Statistics of Fitness Peaks on Generic Landscapes
We would like to take our knowledge of the neighborhood properties of the fitness function in (16) and see how evolution proceeds. A principal question is how fit the peaks are that are arrived at from random starting points. Analyses of the probability distribution of endpoints of such adaptive walks have been made for $k = 1$ and $k = n$ (Kauffman and Levin 1989, Macken and Perelson 1989). Intermediate values of $k$ present analytical difficulties, so Weinberger (1991) took an indirect approach to solving the distribution of fitness peaks. Instead of looking at the distribution of fitness peaks arrived at from random initial genotypes, he looked at the fitness distributions of adaptive peaks from among the unweighted set of fitness peaks. The results are reviewed as follows.

Given that a genotype $x$ has marginal fitnesses, $w_i(x)$, the probability that it is a fitness peak is
$$\Pr[x \text{ is a local fitness peak} \mid \phi] = \prod_{i=1}^{n} F_{k_i}(w_i(x)) , \tag{17}$$
where $\phi = \|\phi_j\|$, $j = 1 \ldots f$, and $\{S_{k_i}\}$ are independent random variables with distributions $F_{k_i}$. Letting $\phi_j$ and $S_{k_i}$ be random variables, the probability that a random point is a fitness peak is
$$P = \Pr\Big[\sum_{j=1}^{f} m_{ij} \phi_j > S_{k_i} \ \forall i\Big] . \tag{18}$$
The probability, $G(y)$, that a genotype, given it is a fitness peak, has fitness less than $y$ is
$$G(y) = \frac{1}{P} \Pr\Big[\sum_{j=1}^{f} \phi_j < y , \ \sum_{j=1}^{f} m_{ij} \phi_j > S_{k_i} \ \forall i\Big] . \tag{19}$$
Weinberger (1991) obtained normal approximations of $G(y)$ for intermediate large values of $k$, with $f = n$. Assuming $k_i = k$ for all $i$, and denoting $\mu = E[\phi_j]$ and $\sigma^2 = \mathrm{Var}[\phi_j]$, Weinberger's (1991) approximation for the distribution of fitnesses among fitness peaks is
$$G(y) \approx \mathcal{N}_0\!\left(\frac{y - \mu_G}{\sigma_G}\right) , \tag{20}$$
where $\mathcal{N}_0$ is the normal distribution with mean 0 and variance 1, and
~
=. +.
and .g = ~ [I + (I + ~)21n(~)]
For the uniform fitness functions, $\mu = 1/2$ and $\sigma^2 = 1/12$. Figure 8 shows Weinberger's approximation of $G(y)$ plotted for $n = 31$ and several values of $k$. As $k$ increases, the distribution of fitness peaks gets lower (when normalized by $f$), approaching the expectation for random genotypes, $f/2$. In other words, genotype-phenotype maps with large amounts of pleiotropy do not allow 1-mutant adaptive walks to get very near their global optima ($\approx f$).

5.3 Constructional Selection for Low Pleiotropy
The effect of selective genome growth on the degree of pleiotropy in the evolved genotype-phenotype map can be analyzed as follows. Suppose a gene newly added to the genome has pleiotropy vector $p_{n+1}$, and affects $k_{n+1} = \sum_{j=1}^{f} m_{n+1,j}$ fitness components, which become resampled uniformly from the interval [0,1]. If a fitness component is not yet affected by any gene, then its preexisting value is 0. Let $y$ be the sum, before the new gene is added, of the fitness components the new gene is going to alter. The probability that the new sum will be less than $y$ is $F_{k_{n+1}}(y)$ from (16), so the probability that the new gene will produce a fitness increase is $1 - F_{k_{n+1}}(y)$. As can be seen from Fig. 7, when the average of the fitness components to be altered by the new gene is above 1/2,
Fig. 8. Weinberger's (1991) normal approximation for the distribution of fitnesses among fitness peaks in the NK model. Plotted for n = 31 and k = 10, 20, 40, and 80. (Axes: $G(y)$ against $y/f$.)
the greater $k_{n+1}$ is, the less the chance that the new gene will produce a fitness increase, precipitously less so for highly adapted fitness components. Since the new gene is kept only if it produces a fitness increase, constructional selection will filter out genes with high $k$.

Suppose that there is an underlying probability density $s(k)$ of pleiotropy values $k$ for genes newly added to the genome. Then the density $s^*(k)$ of pleiotropy values among genes that are kept by the genome (i.e. which improve fitness) will be
$$s^*(k) = s(k) \sum_{p \in \{0,1\}^f} \Pr[p \mid k] \, [1 - F_k(p^T \phi)] \Big/ N , \tag{21}$$
where $\phi$ is the vector of fitness components before the gene was added, $\Pr[p \mid k]$ is the probability of sampling pleiotropy vector $p$ given that the new gene's pleiotropy value is $k$, and $N$ is the normalizer so that $\sum_k s^*(k) = 1$.

The way constructional selection filters out high pleiotropy as the adaptedness of the genome increases is illustrated in Fig. 9. It plots (21) with the assumption that all fitness components are the same, i.e. $\phi_i = \bar{\phi}$ for all $i$. The underlying density of pleiotropy values before selection is taken to be uniform on $1 \ldots f$. The figure shows that the more highly adapted the genome is, the more severe is the selection against high pleiotropy.

5.4 Numerical Results
A numerical simulation of constructional selection in the NK model was performed using the same genome growth algorithm as was used for the linear model, illustrated in Fig. 3:

1. Add a new gene to the genome:
   (a) create a new pleiotropy vector $p_{n+1}$, choosing uniformly (from $\{1, \ldots, 31\}$) the number, $k_{n+1}$, of fitness components to be affected by the new gene, and then selecting randomly which fitness components these are, from a set of $f = 31$ possible;
   (b) pick the allelic value, $x_{n+1}$, of the new gene with probability 1/2 being either 0 or 1.
2. If the new gene decreases fitness, reject it and repeat step 1. Otherwise, keep it.
3. Adapt $x$ to the new (local) optimum $\hat{x}$ by allelic substitution through a "greedy" 1-mutant adaptive walk.
4. Repeat step 1 until the genome has 31 genes.

Fig. 9. The density, $s^*(k)$, of pleiotropy values $k$, among genes successfully incorporated in the genome, plotted as a function of the fitness component average, $\bar{\phi}$, prior to the gene's addition. The arrow points out the plot of the prior density $s(k)$ of pleiotropy values from which the genes are sampled; $s(k)$ is uniform on $\{1, \ldots, f\}$, and here, $f = 31$.

The pleiotropy vectors, $p_{n+1}$, are chosen from the same uniform distribution throughout the run. As a basis for comparison, the genome growth algorithm is also run without step 2, giving the result of choosing representations a priori.

Evolved Genotype-Phenotype Maps. Figure 10 shows typical genotype-phenotype maps produced during runs with and without constructional selection. The run without constructional selection reflects the underlying distribution of pleiotropy vectors sampled for each new gene. In the run with constructional selection, during the evolution of the first few genes, the discovery of new fitness components selects for high pleiotropy, but as these fitness components evolve toward their optima, selection becomes strong against new genes affecting them.
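The genome growth procedure for the NK model (steps 1 to 4 of this section) can be sketched as below. This is a reduced illustration rather than the simulation reported here: a SHA-256 hash stands in for the pseudo-random fitness function, the default sizes are much smaller than $f = 31$, and the names `component`, `fitness`, `adaptive_walk`, `grow_genome`, and the `salt` argument (used to give each seed its own landscape) are hypothetical.

```python
import hashlib
import random

def component(x, j, rows, salt):
    """Fitness component j: pseudo-random in [0, 1), keyed by its controlling genes and their alleles."""
    genes = tuple(i for i, row in enumerate(rows) if row[j])
    if not genes:
        return 0.0                      # a component affected by no genes is zero
    key = repr((salt, j, genes, tuple(x[i] for i in genes))).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") / 2**64

def fitness(x, rows, f, salt):
    return sum(component(x, j, rows, salt) for j in range(f)) / f

def adaptive_walk(x, rows, f, salt):
    """Greedy 1-mutant walk: move to the fittest neighbour until none is fitter."""
    x = list(x)
    w = fitness(x, rows, f, salt)
    while True:
        best_w, best_i = w, None
        for i in range(len(x)):
            x[i] ^= 1                   # try the alternate allele at gene i
            w_i = fitness(x, rows, f, salt)
            x[i] ^= 1                   # restore
            if w_i > best_w:
                best_w, best_i = w_i, i
        if best_i is None:
            return x, w                 # local fitness peak reached
        x[best_i] ^= 1
        w = best_w

def grow_genome(f=8, n_genes=8, select=True, seed=0):
    rng = random.Random(seed)
    rows, x, w = [], [], 0.0            # rows of the genotype-phenotype map, one per gene
    pleiotropies, trajectory = [], []
    while len(rows) < n_genes:
        k = rng.randint(1, f)                               # step 1(a): pleiotropy value
        affected = set(rng.sample(range(f), k))
        row = [1 if j in affected else 0 for j in range(f)]
        allele = rng.randint(0, 1)                          # step 1(b)
        w_new = fitness(x + [allele], rows + [row], f, seed)
        if select and w_new < w:                            # step 2: reject deleterious genes
            continue
        rows.append(row)
        x.append(allele)
        x, w = adaptive_walk(x, rows, f, seed)              # step 3
        pleiotropies.append(k)
        trajectory.append(w)
    return pleiotropies, trajectory

ks, traj = grow_genome(select=True, seed=1)
print(ks)
print(traj)
```

Passing `select=False` skips step 2 and so samples genotype-phenotype maps from the underlying distribution of pleiotropy vectors, as in the comparison runs.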
Fig. 10. Two genotype-phenotype maps evolved through genome growth, with (left) and without (right) constructional selection. Dark squares indicate that fitness component j depends on gene i. The columns in the right map reflect the sampling distribution of the pleiotropy vectors, in which the number of fitness components affected is uniform on [1, f]. The left map shows how under constructional selection, later genes have lower pleiotropy as the genome grows and becomes more adapted.
This increasing selection for low pleiotropy can be seen in Fig. 11, which shows the distribution of pleiotropies $k_n$ as the genome grows, over repeated runs of genome growth. It can be seen to resemble the predicted distribution in Fig. 9. The mode for $k_n$ is always 1 after the first few genes, but as shown in Fig. 12, the mean $k_n$ tends toward 1 from initial values of around 16, or half of the maximum possible, $f = 31$.

The progress in adaptation can be compared between runs with and without constructional selection. Figure 13 shows plots for a number of runs. Without constructional selection, disruptive new genes are not filtered out, and adaptation shows little progress once the fitness components are saturated with genes that affect them. With constructional selection, however, fitness continues to increase with each new gene throughout the genome growth.

As the genome grows, the trajectories of individual fitness components can be seen in Fig. 14. With constructional selection, once a fitness component has reached a high value (low points in graph), only new genes that leave it alone are likely to be incorporated in the genome. Occasionally, however, one component is sacrificed for the improvement in another, which shows up as spikes in the graph. By the time the genome has reached a size of 31 genes, most of the components have reached values well above their expected value of 1/2. Without constructional selection, the jumble of spikes represents the continuing randomization of the fitness components as genes with random pleiotropy are incorporated into the genome.

Here, most of the adaptation occurs during the incorporation of new genes, rather than during the adaptive walks (through allelic substitution) between gene
Fig. 11. The distribution, from repeated runs of the genome growth algorithm, of pleiotropy values $k_n$, from each gene's pleiotropy vector $p_n$, as the genome grows; with (left) and without (right) constructional selection.

Fig. 12. The average pleiotropy values $k_n$ for each gene as the genome grows, from the runs in Figure 11, with constructional selection. (Axes: average pleiotropy, $k$, against gene number, $n$.)
additions. This is because there is a much larger pool of new pleiotropy vectors to sample from than the pool of genotypes in the 1-mutant neighborhood of an existing genotype ($2^f$ vs. $n$). The evolutionary process under constructional selection is figuratively the "building" of a fitness peak, gene by gene, rather than the climbing of a fitness peak.

The correlated allelic variation effect as discussed in Sect. 3.1 is illustrated here by the fact that low pleiotropy evolves. Compared to average genes being tested, the genes kept by the genome will have a higher correlation between the fitness when the gene was added and its alternate allele because of its low pleiotropy. However, by the nature of the NK model, the single alternate allele of an advantageous new gene is unlikely to be fitter still. The ability of the correlated allelic variation effect to enhance evolvability, through the low pleiotropy of genotype-phenotype maps produced under constructional selection, would be
Fig. 13. Fitness as a function of genome size for several runs of the genome growth algorithm. Dark lines are with, and light lines without constructional selection.
evident in the event of shifts in the adaptive peak. Should any of the fitness components change due to a changed environment, the low pleiotropy of the genes that affect these fitness components would enhance their chance of producing alleles that respond to the change, without causing a prohibitive disruption of functions for which selection has not changed.
Non-Generic Properties of Evolved Landscapes. Existing theory for adaptive walks on NK landscapes, as in Sect. 5.2, has been derived for generic landscapes, i.e. landscapes that one would typically obtain from a random sampling of landscapes with given values of $n$ and $k$ (Kauffman and Levin 1987, Kauffman 1989, Weinberger 1991). The applicability of these results to organic evolution assumes that evolutionary processes produce such generic adaptive landscapes. However, the distributions of fitness peaks in the NK landscapes grown here under constructional selection are nowhere near the distributions for generic NK landscapes with identical genotype-phenotype maps. Constructional selection produces genotype-phenotype maps that are much more finely tuned to the fitness function under which they evolved. To illustrate this, the distributions of fitness peaks for several landscapes evolved under constructional selection are plotted in Fig. 15. For comparison, distributions are plotted for landscapes using the same genotype-phenotype map, but with fitness functions, $\Phi$, chosen a priori. Each point represents the fitness peak obtained
[Figure 14 panels: WITH CONSTRUCTIONAL SELECTION and WITHOUT CONSTRUCTIONAL SELECTION; axes: FITNESS, GENOME SIZE, FUNCTIONS.]
Fig. 14. Fitness components during genome growth, for one genome evolved with (left) and one without (right) constructional selection. Fitness components are sorted according to their value at the end of the run.
by starting an adaptive walk from a randomly sampled genotype. The distributions are plotted by sorting the fitness peaks by size (the transpose of the figure therefore represents the cumulative probability distribution for fitness peaks). The width of horizontal plateaus represents the size of the domain of attraction for a particular fitness peak. The plateaus, and discontinuities between them, indicate fewer and larger domains of attraction for the evolved landscapes, i.e. they are smoother than the generic landscapes. The distributions for the generic landscapes follow roughly the Gaussian approximation derived by Weinberger (1991) as seen in Fig. 8 (fitting close to the generic k = 10 landscapes). While the least-fit peaks are approximately the same for both evolved and generic landscapes, at various points in the ranking, the fitness of the evolved landscapes grows much higher. Interestingly, the jumps in the distribution are highly variable.

An additional beneficial outcome of constructional selection is that the genotypes resulting at the end of the run are usually the apparent global fitness peak. In 77% of adaptive landscapes evolved under constructional selection (304 sampled), the genotypes attained at the end of genome growth were fitter than any other adaptive peak found (from 250 other starting genotypes). Of the remaining landscapes, only an average of 19% of random initial genotypes evolved to peaks fitter than the genotype attained at the end of genome growth.

5.5 Lineage Selection vs. Constructional Selection
The NK model can be used to compare the effectiveness of lineage selection (see Sect. 6.3) with that of constructional selection in producing evolvable genotype-phenotype maps. The idea of lineage selection is that the organisms whose developmental mechanisms happen to be most evolvable will found the most successful phyletic lineages, so that evolvable species will proliferate at the greatest
Fig. 15. Distributions of fitness peaks of NK landscapes: upper 10 plots are for adaptive landscapes evolved under constructional selection; lower 10 plots are with the same genotype-phenotype maps but randomized fitness functions. In each plot, the peaks attained from 1000 random starting genotypes are sorted by fitness. Plateaus indicate large domains of attraction for the peak. (Axes: fitness against fitness peaks sorted by fitness.)
rate (Dawkins 1989). It leaves evolvability within lineages as a byproduct of their genotype-phenotype map not subject to secular evolutionary pressure. Lineage selection in the NK model could be implemented by generating random NK landscapes, selecting those with the greatest evolvability, and evaluating the height of the fitness peaks on these landscapes. This can be compared to the height of fitness peaks of genomes evolved through constructional selection. The basis for comparison will be the fitness of the fittest individual obtained after a set number of genotypes have been generated. The payoff in the level of optimization obtained through constructional selection shows it to be a much more powerful process than lineage selection. To give lineage selection the best possible advantage, I will consider the class of NK landscapes with the highest expected fitness peaks, the k = 1 landscapes (which is K = 0 in Kauffman's original definition). In the k = 1 landscapes, there is a one-to-one map from each gene to each fitness component, and f = n. Each gene can be optimized individually, so it takes the evaluation of 2n genotypes to find the global peak for the landscape. Each fitness component is i.i.d., where the optimal $\phi$ is distributed as the maximum of two independent uniform random variables on [0,1]. The probability density of each maximum $\phi$ is $f(\phi) = 2\phi$. So $E(\phi) = 2/3$ and $\mathrm{Var}(\phi) = 1/18$. With n = f = 31 genes and fitness components, one obtains:

$$E[w(\hat{x})] = 2/3, \quad \mathrm{Var}[w(\hat{x})] = \frac{1}{18 \times 31} = \frac{1}{558} \approx 0.00179 .$$

The average fitness attained under constructional selection in the numerical simulations (where the average k is much larger than 1) is about 0.89, which is therefore some 5 standard deviations ($5 \times 0.0423$) above the expected value of peaks obtained from generic k = 1 landscapes. The fraction of randomly generated k = 1 landscapes having a peak with fitness at least 0.89, under a normal approximation, is $3 \times 10^{-7}$. So over $2 \times 10^{6}$ different k = 1 genotype-phenotype maps would need to be sampled to be likely to obtain fitness peaks as high as those obtained through constructional selection, which in the runs here took some 3000 sampled genotypes. The ability to select on the genotype-phenotype map as it is constructed is the key to finding higher fitness values.

For a more even comparison with lineage selection, we can see what kind of k = 1 landscapes constructional selection can produce. A simple way to implement this would be to add one gene to the genome at a time, map the gene to each of the remaining unmapped fitness components, evaluate the fitnesses of both alleles with each map, and keep the allele and map that give the fittest value. The first gene would be evaluated for the f possible genotype-phenotype maps, for a total of 2f evaluations. The second gene would be sampled with the remaining f - 1 unmapped fitness components, and so forth, giving a total of f(f + 1) genotypes sampled. Each resulting fitness component would be the maximum of 2(f - i + 1) uniform i.i.d. values, for the ith gene/gene map pair. So the ith fitness component would be distributed as $F_i(\phi) = \phi^{2(f-i+1)}$. The fitness components would have expectation

$$E(\phi_i) = \frac{2(f+1-i)}{2(f+1-i)+1} .$$

The expected value for the fitness peaks obtained through this constructional selection process would be

$$E[w(\hat{x})] = \frac{1}{f} \sum_{i=1}^{f} \frac{2i}{2i+1} .$$
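The two k = 1 calculations above can be checked numerically. The following is a minimal sketch, not the code used for the simulations in this chapter: the function names are hypothetical, and the simulation assumes, as in the derivation, that every allele/map combination contributes an independent uniform value on [0,1]. Note that the normal tail is evaluated at the rounded figure of 5 standard deviations; the unrounded distance of 0.89 from the mean is also printed for comparison.

```python
import math
import random


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def generic_k1_peak_stats(n):
    """Mean, variance and s.d. of the global peak of a generic k = 1 landscape.

    Each of the n components is the maximum of two U(0,1) draws
    (mean 2/3, variance 1/18); the peak fitness is their average.
    """
    mean = 2.0 / 3.0
    var = (1.0 / 18.0) / n
    return mean, var, math.sqrt(var)


def expected_constructional_peak(f):
    """(1/f) * sum_{i=1}^{f} 2i / (2i + 1)."""
    return sum(2 * i / (2 * i + 1) for i in range(1, f + 1)) / f


def genotypes_sampled(f):
    """Gene i is tried with both alleles on f - i + 1 unmapped components."""
    return sum(2 * (f - i + 1) for i in range(1, f + 1))


def simulate_constructional_peak(f, rng):
    """One simulated run of the gene-by-gene construction (illustrative).

    Assumption: every allele/map combination contributes an independent
    U(0,1) value, so gene i keeps the best of 2(f - i + 1) draws.
    """
    total = 0.0
    for i in range(1, f + 1):
        draws = 2 * (f - i + 1)
        total += max(rng.random() for _ in range(draws))
    return total / f


if __name__ == "__main__":
    n = f = 31
    mean, var, sd = generic_k1_peak_stats(n)
    z = (0.89 - mean) / sd
    print("generic k = 1 peak: mean", mean, "variance", var, "s.d.", sd)
    print("0.89 lies", z, "s.d. above the mean")
    print("normal tail at 5 s.d.:", normal_upper_tail(5.0))
    print("expected constructional peak:", expected_constructional_peak(f))
    print("genotypes sampled:", genotypes_sampled(f))
    rng = random.Random(0)  # arbitrary seed, for repeatability only
    runs = [simulate_constructional_peak(f, rng) for _ in range(2000)]
    print("simulated mean peak:", sum(runs) / len(runs))
```

Averaging many simulated runs gives an independent check on the closed-form sum, since the simulation uses only the sampling procedure and not the formula for the expectation of a maximum.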
For n = f = 31, the expected fitness peak would be $E[w(\hat{x})] \approx 0.945$, and would take $31 \times 32 = 992$ genotypes to find. So when lineage selection and constructional selection are both compared with k = 1 landscapes, the levels of adaptation achieved with constructional selection are seen to be vastly greater.
6 Discussion
6.1 Overview of Results
The goal of this chapter is to introduce the idea of "constructional" selection as a description of how the evolutionary acquisition of new genes can produce a
genome better able to generate adaptive variants. In the Introduction I sketched out the basic conceptual framework of the idea, the two main parts being the genic selection effect (type I) and the correlated allelic variation effect (type II). The genic selection effect was based on the idea of defining viability, fecundity, and heritability differences at the level of the genome-as-population. I described how characteristics of the genotype-phenotype map, in particular Bonner's low pleiotropy principle, would lead to predicted differences in gene fecundity for giving rise to new, useful genes, and how this in turn would filter genome growth in the direction of lower pleiotropy. Three models were provided to give concrete illustrations of constructional selection effects. The genic selection effect was illustrated with a simple model in which genes are ascribed "constructional" fitnesses -- the probability that duplications of them are useful genes. The result is exponential growth in the genome of genes better able to spawn new genes. The correlated allelic variation effect was illustrated with some concrete examples of genotype-phenotype map functions, Wagner's linear quantitative-genetic map with Gaussian selection, and the epistatic NK fitness landscape of Kauffman. The NK model, because of the discreteness of the different organismal functions, provided a good example of how the level of pleiotropy of a new gene's effects affects the probability that it is selectively beneficial. One general implication of constructional selection is that the genotype-phenotype map ought to be less complex than one might suppose. In other words, to some degree "bean bag genetics" might not be entirely wrong, at least for more recently evolved genes.
One can also expect that there would be an attunement (Barwise and Perry 1983) between the dimensions of recurrent environmental variation and the dimensions of genetic variation, in that recurrent, environmentally caused shifts of optimal phenotypes along certain phenotypic dimensions would expose the lineage to repeated directional selection along the same phenotypic axes; this would create the potential for the evolution of genes with phenotypic effects along these dimensions. In the simulation models here, because the selection functions do not change during evolution, once a gene is incorporated in the genome, it is always deleterious to delete it. Examples are accumulating of genes that have been lost in the course of evolution (Brakenhoff et al. 1990, Nishikimi et al. 1992, Wu et al. 1992), so clearly gene loss is a possibility, but the degree to which genes turn over in the genome is not known yet. The addition of gene loss to the models here would not prevent the constructional selection process; in fact, systematic differences in the rate that different genes are lost would also contribute to constructional selection as viability differences on the level of genome-as-population.
6.2 Empirical Phenomena
The main empirical predictions that come out of the processes discussed here are that: 1. There ought to be signatures of differential gene fecundity within the genome, and these should relate to the way the gene duplications and/or allelic variation
maps to the phenotype; 2. There ought to be dimensions of variation within the genome with low pleiotropy, affecting a relatively small suite of organismal functions; and 3. More recently evolved functions ought to have the least pleiotropic genetic control. The advent of intra-genomic gene trees makes it foreseeable that some of these predictions could be tested. Observations on allelic variation by itself, without knowledge of the lineage of the gene, could be manifestations of other evolutionary processes besides constructional selection, and present methodological difficulties. Also, the prediction of low pleiotropy requires a basis for comparison in order to be tested, i.e. null hypotheses about levels of pleiotropy. And pleiotropy itself is a slippery concept, because the discretization of organismal traits is primarily an observer artifact; what is required is a quantification of functional relationships. Genic Selection Effect. Numerous examples can be pointed to of genetic elements with specific functions that appear to have proliferated in the genome. Promoter sequences provide one example. A comparison can be made between transcription promoters that are external to the transcribed sequence, and internal promoters, whose sequence is part of the transcribed gene. Whereas internal promoters (which use RNA polymerase III) cannot be recombined with other genes without large pleiotropic effects (Shi and Tyler 1991), external promoters can routinely be recombined with other peptide coding sequences, with the promoter retaining its regulatory properties and the peptide retaining its functional properties. External promoters are ubiquitous (i.e. they evidently have high constructional fitness), while internal promoters are restricted mainly to rRNA, tRNA, and snRNA genes and appear to be of ancient origin. Signal peptides may be another example of a low-pleiotropy module that has proliferated. 
But because the constraints on the amino acids of signal sequences are rather broad, accurate intragenomic phylogenies are difficult to be certain of. Multi-gene families are examples of sequences that proliferate because of their ability to produce new useful variants, and their parts and subparts, as in the case of serine proteases, tend to have very specific functions that are retained in their different combinations (Doolittle 1985). The immunoglobulins are a spectacular example of genes of specific function which when duplicated produce offspring genes with a very high likelihood of being selectively advantageous. There are several ways that the genic selection effect may have left a mark on multi-gene families, which I give with some suggestive anecdotal examples. These predictions apply to those gene families that are highly diversified (rather than multiple-copy gene families which have a very different selection and transmission dynamic due to unequal crossing over and gene conversion): 1. Gene families should show periods of exponential growth, possibly followed by logistic-like stasis as the genes saturate the available adaptive opportunities, depart from their original effects on organismal function, or become
functionally burdened (e.g. the vertebrate Wnt developmental gene family (Sidow 1992), antennapedia-class vertebrate homeobox genes (Kappen et al. 1989)); 2. During periods of exponential gene family growth, adjacent branches in the gene tree should show correlations in the time intervals between gene origins, producing acceleration in the branching rates in fecund branches; i.e. most new genes should come from genes that are themselves new; 3. Gene families that are in the process of expanding should continue to do so in independent lineages after taxon branching (e.g. neurofilament proteins in fish (Mencarelli et al. 1991)).
Exon Shuffling. The genic selection effect offers some explanatory clarity to questions about exon shuffling. Gilbert (1978) proposed that the characteristic exon/intron mosaic structure of eukaryotic genomes existed so as to "speed evolution" by the creation of new genes through exon shuffling. But Crick (1979) criticized this evolutionary reasoning as being non-Darwinian, because it appeared to rely on "evolutionary foresight" in the genome -- structures existing for their future evolutionary potential. Blake (1978) instead proposed that exons were descendents of the original "proto-genes", later assembled into complex, multi-exon proteins. Others, however, have proposed that introns were inserted later into contiguous genes through a transposition process. Subsequently, the debate on the exon-shuffling hypothesis has focused on a number of issues:
- Whether introns arose "early" or "late" -- i.e. were present at the origin of eukaryotic genes or were inserted later into contiguous genes, possibly through a transposition process;
- Whether exons correspond to units of peptide structure or function, i.e. whether exons are "modular";
- Whether protein evolution through exon shuffling has indeed occurred;
- Whether selection could create a correspondence between exons and protein structures or whether exons would need to be descendents of the original "proto-genes".
Proving a correspondence between intron position and peptide structure has been seen as crucial for answering whether gene evolution through exon shuffling has occurred, and whether introns arose "early" or "late". Evidence has been marshalled both in favor (e.g. Gilbert 1993) and against (Stoltzfus 1994) there being a significant correspondence between exon structure and protein structure in eukaryotic genes. Once it is understood that constructional selection would enter into exon shuffling dynamics, however, the question of whether introns arose early or late becomes decoupled from the phenomenon of exon modularity. The genic selection effect provides a Darwinian mechanism for the evolution of modular exons through exon shuffling (Altenberg 1985). Exon modularity is the equivalent of low pleiotropy on the molecular level, so more modular exons would be expected to have a better chance of producing useful variation when recombined with other
genes. Even a genome composed of randomly partitioned exons would come to be populated by modular exons if enough genome growth had occurred to allow the differential proliferation of exons, i.e. the genic selection effect (Altenberg and Brutlag 1986). Moreover, the evolutionary increase in modular exons needn't be relegated to the distant past, but would be occurring presently in any evolution of new genes through exon shuffling. This and other hypotheses for the evolution of split genes are reviewed in Doolittle (1987). Second, the genic selection effect clarifies the feature of exons that would help them proliferate in the genome and speed evolution. What matters fundamentally is not that introns fall between structural elements of the peptide, but that each exon be able to maintain, within a new peptide environment, the properties that it was selected for. It has been assumed that the latter would require the former, but this assumption needs to be justified. Therefore, the negative statistical results obtained by Stoltzfus (1994), based on where introns fall in the peptide structure, might not be measuring whether exons have modular properties under exon shuffling. The more definitive test is to perform experimental manipulation of gene or peptide structure showing functional autonomy of the product of the exon or set of exons (e.g. Craik 1980, Sanctis 1986, Zonneveld 1986, de Vries 1988, and Casorati 1993). Modularity of exon function has been found in many but not all cases. A testable prediction I proposed as to whether exons with modular properties may have proliferated in the genome was to examine the reading frame statistics of exons (Altenberg 1983). Exons or groups of exons that were a multiple of 3 nucleotides long would have greater modularity, because insertion of such exons into a protein-coding gene does not shift the reading frame downstream. Such exons would have a constructional fitness advantage under exon shuffling.
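The reading-frame prediction can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the analysis of the studies cited here: the function names are hypothetical, the exon lengths are invented toy values rather than data, and the null expectation that one third of exon lengths are multiples of 3 is an assumption of the sketch (a real test would need a null model suited to the observed length distribution).

```python
from math import comb


def frame_shift_after_insertion(exon_length):
    """Shift (0, 1 or 2 nucleotides) imposed on the downstream reading frame."""
    return exon_length % 3


def preserves_reading_frame(exon_length):
    """True if inserting the exon leaves the downstream frame unchanged."""
    return frame_shift_after_insertion(exon_length) == 0


def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def frame_excess(exon_lengths, null_fraction=1 / 3):
    """Count frame-preserving exons and the tail probability under the null.

    null_fraction = 1/3 is an assumed null hypothesis for this sketch.
    """
    trials = len(exon_lengths)
    hits = sum(preserves_reading_frame(length) for length in exon_lengths)
    return hits, trials, binomial_upper_tail(hits, trials, null_fraction)


if __name__ == "__main__":
    # Invented lengths for illustration only; not measurements.
    toy_lengths = [120, 99, 141, 88, 150, 201, 75, 132, 64, 177]
    hits, trials, p_value = frame_excess(toy_lengths)
    print(hits, "of", trials, "toy exons preserve the frame; tail probability", p_value)
```

The same counting could be applied to pairs of adjacent exons by summing their lengths before taking the remainder.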
Data confirming this prediction were presented (Altenberg 1985, Altenberg and Brutlag 1986) showing statistical excesses of exons and pairs of exons with lengths a multiple of three. Subsequently, Patthy (1987) proposed what is also a constructional selection theory for exon shuffling, and also made similar predictions about exon reading frame properties. Smith (1988) and Gelfand (1992) have both corroborated these statistical findings on exon reading frame lengths. Selection has been proposed as a possible cause of modular exons, but without sufficient attention to the level at which selection would have to act. The genic selection effect clarifies the means by which selection could produce modular exons. Doolittle (1985) had proposed that modular exons should be prevalent because "introns that occur between potentially useful domains will have added survival value". But one must ask, survival value for whom? It cannot be the survival of the organism carrying the intron, because intron position generally does not affect organismal viability. It could be perhaps the long term survival of the intron within the gene, if rates of intron loss were found to correlate with protein structure. In terms of the genic selection effect, intron survival within a gene corresponds to viability on the level of genome-as-population. If there were differences in intron longevity based on peptide position, this could produce a correspondence between intron position and peptide structure. But the differential proliferation of modular exons within the genome is a matter not of differential survival, but of differential fecundity on the level of genome-as-population.
Dissociability in Development and Morphological Integration. Dissociability in development and morphological integration are two aspects of the correlation structure of phenotypic variability. Morphological integration is said to be present where morphological characters which are functionally interdependent are also genetically correlated. Dissociability is where one such suite of functionally related characters has variability independent from another such suite. It is a form of low pleiotropy in developmental processes (defined for phenotypic as well as genetic perturbations), in that the development of certain structures can be changed without altering the development of other structures. Both phenomena are predicted outcomes of constructional selection. But both can come about from modifier evolution as well, in which evolution under organismal selection at one locus systematically changes the genotype-phenotype maps at other loci, which is a form of epistasis (see Sect. 6.3, below). So evidence of their existence cannot alone be taken as support for an impact of constructional selection processes. Cheverud (1984) has hypothesized modifier evolution under stabilizing selection as a primary mechanism producing morphological integration. The argument is principally one of genetic load: modifiers should evolve to reduce the genetic variance for the phenotypic dimensions under the most severe stabilizing selection. Wagner (1988) notes, however, that for populations in mutation-selection balance around a fitness peak, the strength of selection on modifiers of the genotype-phenotype map can be only on the order of the mutation rate. Wagner has consequently looked to situations in which the population is not near equilibrium, but is at an earlier stage of directional selection, where there can be stronger selection on modifiers.
He notes that rather special ecological conditions may be needed, however, to keep a population continually far from equilibrium, limiting the ubiquity of this mechanism for generating morphological integration. This difficulty is overcome by recalling, from Sect. 4.4, that pleiotropic constraints can be expected to leave the population in a state of "latent" directional selection that can provide strong selective advantage to modifiers, or newly created genes, that break or shift the constraints in the right way. This is the context for Riedl's (1977) constructional selection mechanism. Therefore, morphological integration could be the result not of ongoing stabilizing selection about a fitness peak, but the legacy of changes in the genotype-phenotype map driven by the presence of latent directional selection for the morphological function. If the origin of the morphological adaptation involved the creation of new genes, then morphological integration could reflect the suite of phenotypic correlations that originally gave these genes their selective advantage (the correlated allelic variation effect). This would require a certain degree of stability in the correlation structures over evolutionary time. One of the means to test for morphological integration is to examine the
eigenvalues of the genetic correlation matrices for quantitative traits. Wagner (1984) pointed out that conclusions about the significance of the eigenvalues require a null hypothesis. Wagner investigated the eigenvalue distributions for random genotype-phenotype maps as a null hypothesis, and this was extended to other statistics on genetic correlation matrices (Cheverud et al. 1989). Significant departures in real data from the random expectation were found. The genotype-phenotype maps evolved here in simulations of genome growth can be used to derive genetic correlation matrices and thus yield expectations from the action of constructional selection that can be compared with quantitative genetic data. Patterns of dissociability are ubiquitous in development, but little quantitative theory for the origins or maintenance of dissociability has been developed. Rieppel (1991) proposes that phylogenetically successful taxa are those which have been able to achieve greater dissociability in their ontogenetic systems, using snakes as his example. This is exactly what is expected from constructional selection acting during genome growth, and what may also result from the evolution of modifiers of the genotype-phenotype map. Much of the thinking about dissociability places it as an aspect of von Baer's Laws (1828), that dissociable developmental pathways are those added recently in evolution to later stages of ontogeny. A view of developmental mechanisms as causal cascades gives rise to this view (Riedl 1977, Schank and Wimsatt 1987, Wimsatt and Schank 1988), but it should not be forgotten that there is variation for developmental mechanisms and that what emerges is filtered by selection. Terminal additions in ontogeny may be prevalent because on the average they are less likely to be pleiotropic.
But if pleiotropic cascading effects constrain the evolutionary malleability of earlier ontogeny, then there may exist large degrees of latent directional selection which would drive the evolution of new dimensions of genetic variability that produced dissociability. An example is the evolution of imaginal disks in Drosophila, a highly derived trait which effectively decouples larval and adult morphology. Variation in larval morphology, as long as it doesn't impinge on the imaginal disks, has little "generative" consequence for adult functioning, thereby unburdening the larval form. Imaginal disks thus were an invention that reduced pleiotropic constraints within Drosophila development. Raff and coworkers (Raff 1992, Raff et al. 1992) have demonstrated especially well with sea urchins that divergent early development does not necessitate divergent adult forms, so that cascading effects of perturbations to early development must be seen as contingent results of evolution. Other examples include the frog Gastrotheca and the clam Unio (Levinton 1988, del Pino and Elinson 1983). This supports a view of the evolution of the genotype-phenotype map in which constraints on early development are not a mechanistic necessity, but are always unstable to new dimensions of variation that compartmentalize the genetic underpinnings for different adaptive functions.
Allelic Polymorphisms. If constructional selection has indeed produced genomes with a prevalence of genes with low pleiotropy, this might be expected to be evident in the phenotypic nature of allelic polymorphisms. Koehn et al. (1983)
surveyed several cases of enzyme polymorphisms to get a sense of the levels of pleiotropy that typically exist. Their evaluation was that natural polymorphisms have low levels of pleiotropy when compared with the possibilities that exist for wide functional effects. The genes studied may not be reflective of genes in general because the nature of their pleiotropy may influence whether the genes maintain polymorphisms. Gimelfarb (1986, 1992) and Hastings and Hom (1989, 1990) have shown in the case of linear genotype-phenotype maps and stabilizing selection (as in Sect. 4), how the degree of pleiotropy can be critical to the number of loci at which polymorphisms can be maintained. In these quantitative genetic models, higher pleiotropy allows for greater polymorphism, which would strengthen the significance of Koehn et al.'s observations. Further theoretical study is needed in this area before natural genetic polymorphisms can be interpreted as evidence with respect to the genotype-phenotype map, and therefore constructional selection.
Macroevolutionary Dynamics. The creation of new genes may represent only a tiny fraction of the genetic events that contribute to adaptation, yet they may play a significant role in the sculpting of the genotype-phenotype map. All that is required is that the mode of action of the gene on the phenotype be somewhat conserved over macroevolutionary time scales. Still, however, one should not overlook the possibility that the evolution of new genes may often have a profound effect on the rates and direction of evolution. Several cases have been found where gene duplication was followed by accelerated rates of allelic substitution (Li 1985). Because changed selection regimes would be expected to increase adaptive opportunities for the evolution of new genes, and because new genes can open up new dimensions of adaptive variation, one might expect to find associations between the origin of new genes and the origin of new taxa. This should be statistically testable once sufficient quantities of gene tree data accumulate. In discussions of macroevolutionary dynamics, Eldredge (1989) points out that since Wright (1932) introduced the notion of the "adaptive landscape", adaptive change has been seen as either the tracking of moving adaptive peaks, or shifts from one adaptive peak to another. The dynamics considered here are another kind of adaptive change, in which new dimensions of variability in the phenotype are created, and what appeared to be an adaptive peak is now revealed to be a "slice" through the side of a peak of higher phenotypic dimensions. The organismal change afforded by new dimensions of variability may be incremental or profound, depending on how well the new variability allows decoupling of conflicting pleiotropic constraints and progress toward new adaptive optima.
But the potential at least exists that certain "punctuations" or "saltational" changes during phylogeny reflect rapid climbing of pre-existing adaptive peaks through the introduction of new degrees of genetic freedom.
6.3 The Evolution of Evolvability
The main significance of constructional selection is that it is a mechanism that can apply more or less to all genes, with the effect of enhancing the ability of the genome to generate adaptive variants, and the effect of extending the genotype-phenotype map in the direction of lower pleiotropy. As such, it is an anagenetic mechanism that can enhance the genome's evolvability. Several other mechanisms that have been proposed for the evolution of evolvability are reviewed below.
Lineage Selection. Dawkins (1989) has discussed a mechanism for the evolution of evolvability that has perhaps made inroads into making the "evolution of evolvability" more discussible in evolutionary research (Arnold et al. 1989, Alberch 1991). Dawkins's mechanism is lineage selection. In lineage selection, organisms whose genotype-phenotype map by happenstance makes them evolvable -- i.e. better able to generate adaptive variants -- are the ones whose lineages would have most proliferated and endured. Thus, even though there would never be selection for evolvability within a lineage (consonant with Concept 3), most of the species we see would have high evolvability. Dawkins proposes lineage selection for the prevalence of species with evolvable developmental mechanisms; Doolittle proposed lineage selection for the evolvability-enhancing property of introns in protein evolution (Doolittle 1987); lineage selection has also been proposed for the evolution of sex (Stanley 1976, Aboitiz 1991). However, lineage selection must still turn to chance as an explanation of why certain genomes came to be more evolvable; it cannot produce an increase in the evolvability of the genome within lineages. Genetic modification and constructional selection are mechanisms that can change evolvability within lineages.
Genetic Modification.
Modification, as mentioned earlier, is a form of epistasis, in which the nature of the phenotypic variability determined by one locus is affected by the allelic state at another locus, the modifier. The most widely discussed idea for the evolution of evolvability through genetic modification is "regulation" in development, otherwise called canalization (Waddington 1957), developmental homeostasis (Lerner 1954), morphogenetic correlations (Schmalhausen 1949), or morphological integration (Olson and Miller 1958, Cheverud 1984). In this mechanism, genes are selected on for their organismal fitness effects but modify the variational properties of the genome as a systematic side effect. Most discussions of this mechanism do not explicitly describe it as a modifier effect. Selection to stabilize morphological functions against environmental and genetic variability can systematically lead to the reduction of pleiotropic effects from genetic background variability. Therefore, this may endow the developmental system with "extrapolation" capabilities if it can produce the morphological function in the face of evolutionary changes in other parts of the organism (Frazzetta 1975). Kauffman argues that
the properties of dynamical systems would produce such a systematic correlation between phenotypic stability, which is selected for, and smooth adaptive landscapes, which are not directly selected for (Kauffman 1989b). Riedl, in his theory of "genome systemization", adds a modifier effect to the evolution of "superimposed" regulatory genes, through an unnecessary assumption about the nature of these genes. The process Riedl proposes is constructional selection, because new genes evolve that produce coordinated variation in the right direction for adaptation. Existing genes are unable to produce the adaptation because it would require the simultaneous mutation of several of them. This creates the situation of latent directional selection. But Riedl emphasizes that the regulatory genes will also eliminate the previously existing uncoordinated dimensions of variability. This is the basis of his argument for hierarchies of constraint in the phenotype. Yet it is not the suppression of uncoordinated variability that allows a new superimposed gene to survive. It is its ability to produce variation in the direction of adaptive opportunity. The former effect would produce a reduction in genetic load, which is another example of the genetic modification described above. Although the evolution of new genes could involve both constructional selection and genetic modification effects, these are not inherently linked. One additional modifier mechanism is the idea of hitchhiking, which Conrad (1979, 1982) proposes can evolve smoother adaptive landscapes, and Wagner (1981) proposes can accelerate responses to selection. A near-neutral modifier allele that increases the chance that mutations at another locus are adaptive can hitchhike along with these mutations. Conrad provides no population genetic analysis of the hitchhiking model, but it is plausible based on the model of Eshel (1973) for modification of mutation rates.
Modifiers that alter the adaptive landscape, however, would in general be expected to have direct effects on fitness, either advantageous or deleterious, which would swamp any hitchhiking effects on the modifier evolution. Wagner also proposed what is effectively a hitchhiking mechanism for the evolution of increased rates of adaptation in his idea of "feedback" selection (Wagner 1981). Wagner considers the general situation where a neutral modifier can evolve if it increases the rate at which selection increases the fitness of the modifier's carriers.
Evolvability and the Randomness of Mutation. There is often a Darwinian hesitation in discussing the evolution of evolvability, because it seems to step outside of what natural selection can act upon, or invoke non-random mutational processes. To claim that the evolution of new genes should enhance the genome's ability to produce adaptive variants is, on the surface, contrary to the idea that mutation is random with respect to adaptation. But if one looks more closely at these notions of randomness, one finds three different concepts:
Concept 1: Mutation pressure by itself will not produce adaptive evolution.
Concept 2: Current selective pressures do not affect the direction of mutations, with respect to those selective pressures.
Concept 3: The ability of the genome to generate adaptive variants is not molded in any systematic way by its evolutionary history.
Neither Concept 1 nor Concept 2 is at issue with constructional selection. Concept 2, I should note, has been the center of the controversial claim of "directed mutation" in bacteria (Cairns et al. 1988, Lenski et al. 1989, Hall et al. 1990, Cairns and Foster 1991, Foster and Cairns 1992, and Mittler and Lenski 1992). The mechanism of lineage selection that Dawkins (1989) proposed for the evolution of evolvability is in keeping with Concept 3. It is Concept 3, however, with which I have taken issue here. Concept 3 is well exemplified by Maynard Smith et al. (1985):
Furthermore, there is usually no reason to suppose that the developmental mechanisms in question evolved because of the particular phenotypes that they make readily accessible. In general, therefore, the direction of the resulting constraints (biases on the production of variant phenotypes) is "accidental" or "random" with respect to the demands of adaptive evolution. (1985, p. 269)
The argument of this chapter could be boiled down simply to this: the Darwinian proposition in the first sentence of this quote does not logically entail the assertion in the second sentence, that the genotype-phenotype map is random with respect to adaptation.
6.4 Future Directions
The models described here have been designed to be simple and illustrative of the constructional selection effects. There are numerous refinements and elaborations one could make to the genome growth models, including allelic polymorphism, changing selection and coevolution, stochastic mixing of gene duplication and allelic mutation events, and finite population size. Other models for genotype-phenotype maps can be analyzed for their evolution under constructional selection. I suspect that the basic findings about the evolution of pleiotropy will be robust under these elaborations, and further phenomena may emerge as well. Models specific to some of the predictions made here need to be developed, such as statistics of gene-tree topologies under the genic selection effect. Riedl's idea of burden, while not directly dealt with in the models here, may be incorporated with minor modifications. Moreover, the model genotype-phenotype maps evolved under constructional selection can be utilized in providing underlying models for what the effects of genetic constraints on evolution should look like. Theories about how constraints affect morphological evolution and cladistics can be concretely simulated with such model maps. Levinton (1988) writes,
Evolutionary biologists have been mainly concerned with the fate of variability in populations, not the generation of variability .... Whatever the reason, the time has come to reemphasize the study of the origin of variation.
Levinton's call is certainly heeded in this chapter, which has attempted to provide a framework for thinking about the evolutionary forces acting on the generation of variability, and to describe new mechanisms which enable the evolvability of the genome to evolve.
Acknowledgements
This research was supported in part by the Santa Fe Institute, The Center for Nonlinear and Complex Systems at Duke University, Provost's Common Fund, and NSF Grant EAR-89-15983. I thank: Peter Haft, Marcy Uyenoyama, Richard Palmer, the Hawaii Institute of Geophysics and Planetology, and the Maui High Performance Computing Center for infrastructural support; Joe Felsenstein, Frank Eeckman, Günter Wagner and Eric Mjolsness for the invitations to speak which facilitated this work; Stu Kauffman for helpful discussion of the NK landscape models; Jim Bever, Eric Macklin and Leo Buss for insightful dialectics on levels of selection; and Günter Wagner for stimulating discussions on the whole subject. The section on NK landscapes originated at the 1991 Complex Systems Summer School held by the Santa Fe Institute. Thanks to Roger Altenberg and Wolfgang Banzhaf for their consideration during the completion of this work.
References
Aboitiz, F. 1991. Lineage selection and the capacity to evolve. Medical Hypotheses 36(2): 155-156.
Alberch, P. 1981. From genes to phenotype: dynamical systems and evolvability. Genetica 84: 5-11.
Altenberg, L. 1983. Letter to Nature. Unpublished.
Altenberg, L. 1985. Knowledge representation in the genome: new genes, exons, and pleiotropy. Genetics 110, supplement: s41. Abstract of paper presented at the 1985 Meeting of the Genetics Society of America.
Altenberg, L. 1994a. The evolution of evolvability in genetic programming. In K. E. Kinnear, editor, Advances in Genetic Programming, pages 47-74. MIT Press, Cambridge, MA.
Altenberg, L. 1994b. Evolving better representations through selective genome growth. In J. D. Schaffer, H. P. Schwefel, and H. Kitano, editors, Proceedings of the IEEE World Congress on Computational Intelligence, pages 182-187, Piscataway N.J. IEEE.
Altenberg, L. 1995. The Schema Theorem and Price's Theorem. In D. Whitley and M. D. Vose, editors, Foundations of Genetic Algorithms 3. Morgan Kaufmann, San Mateo, CA.
Altenberg, L. and D. L. Brutlag. 1986. Selection for modularity in the genome. Unpublished manuscript. Cited in Doolittle (1987).
Altenberg, L. and M. W. Feldman. 1987. Selection, generalized transmission, and the evolution of modifier genes. I. The reduction principle. Genetics 117: 559-572.
Arnold, S. J., P. Alberch, V. Csanyi, R. C. Dawkins, S. B. Emerson, B. Fritzsch, T. J. Horder, J. Maynard Smith, M. J. Starck, E. S. Vrba, G. P. Wagner, and D. B. Wake. 1988. How do complex organisms evolve? In D. B. Wake and G. Roth, editors, Complex Organismal Functions: Integration and Evolution in Vertebrates, pages 403-433. John Wiley and Sons, New York.
Baer, K. E. v. 1828. Entwicklungsgeschichte der Thiere: Beobachtung und Reflexion. Bornträger, Königsberg, pages 221-224.
Barwise, J. and J. Perry. 1983. Situations and Attitudes. M.I.T. Press, Boston, pages 292-295.
Blake, C. C. F. 1978. Nature 273: 267.
Bonner, J. T. 1974. On Development: The Biology of Form. Harvard University, Cambridge, MA, page 61.
Brakenhoff, R. H., H. J. M. Aarts, F. H. Reek, N. H. Lubsen, and J. G. G. Schoenmakers. 1990. Human γ-crystallin genes: A gene family on its way to extinction. Journal of Molecular Biology 216(3): 519-532.
Brandon, R. N. 1990. Adaptation and Environment. Princeton University Press, Princeton, pages 83-84.
Burt, D. W. and I. R. Paton. 1992. Evolutionary origins of the transforming growth factor-β gene family. DNA and Cell Biology 11(7): 497-510.
Cairns, J. and P. L. Foster. 1991. Adaptive reversion of a frameshift mutation in Escherichia coli. Genetics 128(4): 695-702.
Cairns, J., J. Overbaugh, and S. Miller. 1988. The origin of mutants. Nature 335: 142-145.
Casorati, G., A. Traunecker, and K. Karjalainen. 1993. The t cell receptor alpha-beta v-j shuffling shows lack of autonomy between the combining site and the constant domain of the receptor chains. European Journal of Immunology 23(2): 586-589.
Cavalier-Smith, T. 1977. Visualising jumping genes. Nature 270: 10-12.
Cheverud, J. 1984. Quantitative genetics and developmental constraints on evolution by selection. Journal of Theoretical Biology 110: 155-171.
Cheverud, J. M., G. P. Wagner, and M. M. Dow. 1989. Methods for the comparative analysis of variation patterns. Systematic Zoology 38(3): 201-213.
Conrad, M. 1979a. Bootstrapping on the adaptive landscape. BioSystems 11: 167-182.
Conrad, M. 1982. Natural selection and the evolution of neutralism. BioSystems 15: 83-85.
Craik, C., S. Buchman, and S. Beychok. 1980. Proceedings of the National Academy of Sciences U.S.A. 77: 1384-1388.
Crick, F. 1979. Split genes and RNA splicing. Science 204: 264-271.
Crow, J. F. and M. Kimura. 1970. An Introduction to Population Genetics Theory. Alpha Editions, Edina, MN, pages 418-430.
Crow, J. F. and T. Nagylaki. 1976. The rate of change of a character correlated with fitness. American Naturalist 110(972): 207-213.
Dawkins, R. 1989. The evolution of evolvability. In C. G. Langton, editor, Artificial life, the proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems. Addison-Wesley, Redwood City, CA.
De Vries, C., H. Veerman, F. Blasi, and H. Pannekoek. 1988. Artificial exon shuffling between tissue-type plasminogen activator (t-PA) and urokinase (u-PA): A comparative study on the fibrinolytic properties of t-PA/u-PA hybrid proteins. Biochemistry 27(7): 2565-2572.
del Pino, E. M. and R. P. Elinson. 1983. A novel development pattern for frogs: gastrulation produces an embryonic disk. Nature 306: 589-591.
Doolittle, R. F. 1985. The genealogy of some recently evolved vertebrate proteins. Trends in Biochemical Sciences 10: 233-237.
Doolittle, W. F. 1987. The origin and function of intervening sequences in DNA: A review. American Naturalist 130: 915-928.
Doolittle, W. F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.
Dorit, R. L. and W. Gilbert. 1990. How big is the universe of exons? Science 250(4986): 1377-1382.
Dorit, R. L., L. Schoenbach, and W. Gilbert. 1991. Exon shuffling and the underlying motifs of protein evolution. Journal of Cellular Biochemistry Supplement 15 PART D: 8t.
Eldredge, N. 1989. Macroevolutionary Dynamics: Species, Niches, and Adaptive Peaks. McGraw-Hill, New York, page 205.
Eshel, I. 1973. Clone-selection and optimal rates of mutation. Journal of Applied Probability 10: 728-738.
Feller, W. 1971. An Introduction to Probability Theory and Its Applications. John Wiley and Sons, New York, page 27.
Fisher, R. A. 1930. The Genetical Theory of Natural Selection. Clarendon Press, Oxford, pages 30-37.
Foster, P. L. and J. Cairns. 1992. Mechanisms of directed mutation. Genetics 131(4): 783-789.
Frank, S. A. and M. Slatkin. 1990. The distribution of allelic effects under mutation and selection. Genetical Research, Cambridge 55: 111-117.
Frazzetta, T. H. 1975. Complex Adaptations in Evolving Populations. Sinauer Associates, Sunderland, MA, pages 212-238.
Gelfand, M. S. 1992. Statistical analysis and prediction of the exonic structure of human genes. Journal of Molecular Evolution 35: 239-252.
Gilbert, W. 1978. Why genes in pieces? Nature 271: 501.
Gilbert, W. and M. Glynias. 1993. On the ancient nature of introns. Gene 135(1-2): 137-144.
Gillespie, J. H. 1984. Molecular evolution over the mutational landscape. Evolution 38(5): 1116-1129.
Gimelfarb, A. 1986. Additive variation maintained under stabilizing selection: a two-locus model of pleiotropy for two quantitative characters. Genetics 112: 717-725.
Gimelfarb, A. 1992. Pleiotropy and multilocus polymorphisms. Genetics 130: 223-227.
Golub, G. H. and C. F. V. Loan. 1983. Matrix Computations. Johns Hopkins University Press, Baltimore, page 152.
Goodwin, B. C. 1989. Evolution and the generative order. In B. C. Goodwin and P. T. Saunders, editors, Theoretical Biology: Epigenetic and Evolutionary Order, pages 89-100. Edinburgh University Press.
Grafen, A. 1985. A geometric view of relatedness. Oxford Surveys in Evolutionary Biology 2: 28-89.
Haefliger, D. N., J. E. Moskaitis, D. R. Schoenberg, and W. Wahli. 1989. Amphibian albumins as members of the albumin, alpha-fetoprotein, vitamin D-binding protein multigene family. Journal of Molecular Evolution 29(4): 344-354.
Haldane, J. B. S. 1927. A mathematical theory of natural and artificial selection, part V. selection and mutation. Proceedings of the Cambridge Philosophical Society 23: 838-844.
Hastings, A. and C. L. Hom. 1989. Pleiotropic stabilizing selection limits the number of polymorphic loci to be at most the number of characters. Genetics 122: 459-463.
Hastings, A. and C. L. Hom. 1990. Multiple equilibria and maintenance of additive genetic variance in a model of pleiotropy. Evolution 44: 1153-1163.
Kappen, C., K. Schughart, and F. H. Ruddle. 1989. Two steps in the evolution of antennapedia-class vertebrate homeobox genes. Proceedings of the National Academy of Sciences of the United States of America 86(14): 5459-5463.
Kauffman, S. A. 1989a. Adaptation on rugged fitness landscapes. In D. Stein, editor, Lectures in the Sciences of Complexity, pages 527-618. Addison-Wesley, Redwood City. SFI Studies in the Sciences of Complexity, Lecture Volume I.
Kauffman, S. A. 1989b. Principles of adaptation in complex systems. In D. Stein, editor, Lectures in the Sciences of Complexity, pages 619-712. Addison-Wesley, Redwood City. SFI Studies in the Sciences of Complexity, Lecture Volume I.
Kauffman, S. A. and S. Levin. 1987. Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology 128: 11-45.
Klenova, E. M., I. Botezato, V. Laudet, G. H. Goodwin, J. C. Wallace, and V. V. Lobanenkov. 1992. Isolation of a cDNA clone encoding the RNase-superfamily-related gene highly expressed in chicken bone marrow cells. Biochemical and Biophysical Research Communications 185(1): 231-239.
Koehn, R. K., A. J. Zera, and J. G. Hall. 1983. Enzyme polymorphism and natural selection. In M. Nei and R. K. Koehn, editors, Evolution of Genes and Proteins, chapter 6, pages 115-136. Sinauer Associates, Sunderland, MA.
Lande, R. and S. J. Arnold. 1983. The measurement of selection on correlated characters. Evolution 37(6): 1210-1226.
Lerner, I. M. 1954. Developmental Homeostasis. Oliver and Boyd, Edinburgh.
Levinton, J. 1988. Genetics, Paleontology, and Macroevolution. Cambridge University Press, Cambridge, pages 224-225, 494.
Lewontin, R. C. 1978. Adaptation. Scientific American 239(3): 213-230.
Li, W.-H. 1985. Accelerated evolution following gene duplication and its implication for the neutralist-selectionist controversy. In T. Ohta and K. Aoki, editors, Population Genetics and Molecular Evolution, pages 333-352. Springer-Verlag, Berlin.
Liberman, U. and M. W. Feldman. 1986b. A general reduction principle for genetic modifiers of recombination. Theoretical Population Biology 30: 341-371.
Luenberger, D. G. 1968. Optimization by Vector Space Methods. John Wiley and Sons, New York, pages 46-62.
Macken, C. A. and A. S. Perelson. 1989. Protein evolution on rugged landscapes. Proceedings of the National Academy of Sciences of the United States of America 86: 6191-6195.
Maynard Smith, J. 1970. Natural selection and the concept of a protein space. Nature 225: 563-564.
Maynard Smith, J., R. Burian, S. Kauffman, P. Alberch, J. Campbell, B. Goodwin, R. Lande, D. Raup, and L. Wolpert. 1985. Developmental constraints and evolution. Quarterly Review of Biology 60(3): 265-571.
Mencarelli, C., B. Magi, B. Marzocchi, M. Contorni, and V. Pallini. 1991. Evolutionary trends of neurofilament proteins in fish. Comparative Biochemistry and Physiology B Comparative Biochemistry 100(4): 733-740.
Nemeschkal, H. L., R. Van Den Elzen, and H. Brieschke. 1992. The morphometric extraction of character complexes accomplishing common biological roles: Avian skeletons as a case study. Zeitschrift für Zoologische Systematik und Evolutionsforschung 30(3): 201-219.
Nishikimi, M., T. Kawai, and K. Yagi. 1992. Guinea pigs possess a highly mutated gene for L-gulono-γ-lactone oxidase, the key enzyme for L-ascorbic acid biosynthesis missing in this species. Journal of Biological Chemistry 267(30): 21967-21972.
Ohta, T. 1988. Further simulation studies on evolution by gene duplication. Evolution 42: 375-386.
Ohta, T. 1991. Role of diversifying selection and gene conversion in evolution of major histocompatibility complex loci. Proceedings of the National Academy of Sciences of the U.S.A. 88(15): 6716-6720.
Olson, E. C. and R. L. Miller. 1958. Morphological Integration. University of Chicago Press, Chicago.
Orgel, L. E. and F. H. C. Crick. 1980. Selfish DNA: The ultimate parasite. Nature 284: 604-607.
Patthy, L. 1987. Intron-dependent evolution: preferred types of exons and introns. FEBS (Federation of European Biochemical Societies) Letters 214(1): 1-7.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. 1992. Numerical Recipes in C: The Art of Scientific Computing. Second Edition. Cambridge University Press, pages 278-280, 300-304.
Price, G. R. 1970. Selection and covariance. Nature 227: 520-521.
Price, G. R. 1972. Extension of covariance selection mathematics. Annals of Human Genetics 35: 485-489.
Raff, R. A. 1992. Direct-developing sea urchins and the evolutionary reorganization of early development. Bioessays 14(4): 211-218.
Raff, R. A., G. A. Wray, and J. J. Henry. 1990. Implications of radical evolutionary changes in early development for concepts of developmental constraint. In L. Warren and M. Meselson, editors, New Perspective on Evolution. A. R. Liss.
Riedl, R. J. 1977. A systems-analytical approach to macroevolutionary phenomena. Quarterly Review of Biology 52: 351-370.
Rieppel, O. 1991. Progress in evolution: Snakes as an example. Zeitschrift für Zoologische Systematik und Evolutionsforschung 29(3): 208-212.
Robertson, A. 1966. A mathematical model of the culling process in dairy cattle. Animal Production 8: 95-108.
Sanctis, G., G. Falcioni, B. Giardina, F. Ascoli, and M. Brunori. 1986. Journal of Molecular Biology 188: 73-76.
Schank, J. C. and W. C. Wimsatt. 1987. Generative entrenchment and evolution. Philosophy of Science Association 1986 2: 33-60.
Schmalhausen, I. I. 1949. Factors of Evolution: The Theory of Stabilizing Selection. University of Chicago Press, Chicago, page 273.
Shi, Y. and B. M. Tyler. 1991. All internal promoter elements of neurospora crassa 5S ribosomal RNA and transfer RNA genes, including the A boxes, are functionally gene-specific. Journal of Biological Chemistry 266(13): 8015-8019.
Sidow, A. 1992. Diversification of the wnt gene family on the ancestral lineage of vertebrates. Proceedings of the National Academy of Sciences of the United States of America 89(11): 5098-5102.
Slatkin, M. 1970. Selection and polygenic characters. Proceedings of the National Academy of Sciences U.S.A. 66: 87-93.
Smith, M. W. 1988. Structure of vertebrate genes: a statistical analysis implicating selection. Journal of Molecular Evolution 27: 45-55.
Stanley, S. M. 1976. Clades versus clones in evolution: Why we have sex. Science 190: 282-283.
Stoltzfus, A., D. F. Spencer, M. Zuker, J. M. J. Logsdon, and W. F. Doolittle. 1994. Testing the exon theory of genes: The evidence from protein structure. Science 265(5169): 202-207.
Streydio, C., S. Swillens, M. Georges, C. Szpirer, and G. Vassart. 1992. Structure, evolution and chromosomal localization of the human pregnancy-specific β1 glycoprotein gene family. Genomics 6(4): 579-592.
Strong, M. and G. A. Gutman. 1992. Evolutionary relationships within the potassium channel multigene family. Society for Neuroscience Abstracts 18: 78.
Taylor, P. D. 1988. Inclusive fitness models with two sexes. Theoretical Population Biology 34: 145-168.
Uyenoyama, M. K. 1988. On the evolution of genetic incompatibility systems: incompatibility as a mechanism for the regulation of outcrossing distance. In R. E. Michod and B. R. Levin, editors, The Evolution of Sex, pages 212-232. Sinauer Associates, Sunderland, MA.
Via, S. and R. Lande. 1985. Genotype-environment interaction and the evolution of phenotypic plasticity. Evolution 39(3): 505-522.
Waddington, C. H. 1957. The Strategy of the Genes. Allen and Unwin, London.
Wade, M. J. 1985. Soft selection, hard selection, kin selection, and group selection. American Naturalist 125: 61-73.
Wagner, G. P. 1981. Feedback selection and the evolution of modifiers. Acta Biotheoretica 30: 79-102.
Wagner, G. P. 1984. On the eigenvalue distribution of genetic and phenotypic dispersion matrices: Evidence for a nonrandom organization of quantitative character variation. Journal of Mathematical Biology 21: 77-95.
Wagner, G. P. 1988. The systems approach: an interface between development and population genetic aspects of evolution. In D. M. Raup and D. Jablonski, editors, Patterns and Processes in the History of Life, pages 149-165. Springer-Verlag, Berlin.
Wagner, G. P. 1989. Multivariate mutation-selection balance with constrained pleiotropic effects. Genetics 122: 223-234.
Weinberger, E. D. 1991. Local properties of Kauffman's N-k model, a tuneably rugged energy landscape. Physical Review A 44(10): 6399-6413.
Wimsatt, W. C. and J. C. Schank. 1988. Two constraints on the evolution of complex adaptations and the means for their avoidance. In M. H. Nitecki, editor, Evolutionary Progress, pages 231-274. University of Chicago.
Wright, S. 1932. The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proceedings of the Sixth International Congress on Genetics 1: 356-366.
Wu, X., D. M. Muzny, C. C. Lee, and C. T. Caskey. 1992. Two independent mutational events in the loss of urate oxidase during hominoid evolution. Journal of Molecular Evolution 34(1): 78-84.
Zonneveld, A.-J. V., H. Veerman, and H. Pannekoek. 1986. Proceedings of the National Academy of Sciences U.S.A. 83: 4670-4674.
About the contributors
Lee Altenberg works on the theory of evolutionary processes. He is a research affiliate with the University of Hawaii Institute of Geophysics and Planetology, and is a visitor at the Maui High Performance Computing Center. He became Artium Baccalaureus in genetics at the University of California in 1980, and Philosophiae Doctor in Biological Sciences at the Leland Stanford, Jr. University in 1985. Subsequently he has done postdoctoral research in the diverse departments of Biochemistry, Statistics, Zoology, and Civil and Environmental Engineering at Stanford, North Carolina State University, and Duke University. He has pursued first-hand experience with cultural evolution through participation in student cooperative communities, environmental activities and ecological urban development.
Wolfgang Banzhaf holds a PhD in physics and presently holds an associate professorship in applied Computer Science at the University of Dortmund, Germany. He is working in the areas of Artificial Neural Networks, Evolutionary / Genetic Algorithms, Genetic Programming and Artificial Life / Selforganization. Before coming to Dortmund, he did research on computers and synergetics in the institute of H. Haken at the University of Stuttgart. Subsequently, he became a member of the Neurocomputing group at Mitsubishi Electric's Central Research Laboratory in Japan. Later on he worked as a Senior Research Scientist at Mitsubishi Electric's Research Laboratory (MERL) in Cambridge, Mass.
Mark A. Bedau received a Ph.D. in Philosophy from the University of California at Berkeley in 1985. He taught at Dartmouth College for a number of years where he worked on problems in the philosophy of biology, the philosophy of psychology, and computational logic. He is now Associate Professor of Philosophy at Reed College in Portland, Oregon. His current research interests focus on the conceptual and quantitative foundations of living and evolving systems. He is currently writing a book for MIT Press on how artificial life can clarify the conceptual foundations of evolution, life, and mind.
Guillemette Duchateau-Nguyen received a PhD in biomathematics from the University Paris 7, in 1992. She is presently a post-doctoral fellow at the Laboratoire de physique statistique of the Ecole Normale Superieure (Paris). Her research interests are models of the evolution of species, multi-agent modeling and sustainable development.
Frank H. Eeckman received an MD from the University of Ghent Medical School in Belgium in 1982. He practiced medicine in Belgium for a year before moving to the United States in 1983. In 1988 he received a PhD in neurophysiology from the University of California at Berkeley. He has been working at the Lawrence Livermore National Laboratory since 1988, first in Physics and later in the Computations Directorate. He was a staff member at the Institute for Scientific Computing Research. In 1994 he started working in the informatics group at the Human Genome Center at the Lawrence Berkeley Laboratory. His research interests are computational neuroscience, artificial neural networks, genetic algorithms, and computer vision.
Warren Ewens studied Statistics at the Australian National University and received his PhD in 1964, specializing in the area of stochastic processes in population genetics. This work led to an interest in the stochastic neutral theory of genetic evolution and in particular to developing statistical methods for assessing whether a sample of genes gave evidence to support the neutral theory. A second interest was in the multilocus deterministic theory of genetic evolution, in particular in finding an interpretation for the Fundamental Theorem of Natural Selection and thus in developing optimality principles for evolution as a Mendelian process. He moved to the University of Pennsylvania in 1972, and has since focussed on aspects of human population genetics, in particular in statistical problems of ascertainment sampling and in linkage analysis associated with finding genes causing diseases in humans.
Gordon Fox's undergraduate degree, from the University of California, Berkeley, was in history, and before going to graduate school he worked as a schoolteacher and writer. He received his PhD in 1989 in Ecology and Evolutionary Biology from the University of Arizona. After a postdoctoral fellowship at the University of California, Davis, he joined the biology faculty at the University of the South. Currently he is a visiting scholar at the University of California, San Diego. His research involves several areas of population biology (especially of plants), including population and genetic dynamics, ecology and evolution of reproductive timing, and inferences about evolutionary history.
Alan Hastings has a PhD in applied mathematics from Cornell University, and is currently professor and chair of the Division of Environmental Studies at the University of California, Davis. He also is associated with the Institute for Theoretical Dynamics and the Center for Population Biology at Davis, and was formerly a faculty member in the Department of Mathematics. His current research focusses on dynamics of models in quantitative and population genetics, and the dynamics of ecological populations where space and/or age are important.
Geoffrey F. Miller received a BA in biology and psychology from Columbia University and a PhD in cognitive psychology with Roger Shepard from Stanford University in 1993. He was an NSF-NATO Post-Doctoral Research Fellow with the Evolutionary and Adaptive Systems Group at the University of Sussex, England, from 1993 to 1995. He is currently a lecturer in the University of Nottingham Psychology Department. His research focuses on the reciprocal interactions between psychological adaptations and evolutionary dynamics, particularly the origins and effects of learning, courtship, mate choice, motion perception, and protean behavior. His book Evolution of the human brain through runaway sexual selection, based on his Stanford doctoral thesis, will be published in 1995 by MIT Press.
Eric Mjolsness obtained a PhD in physics at Caltech in 1985, and was on the faculty of the Yale Computer Science Department from then to 1994. In 1995 he became a Research Scientist at the University of California, San Diego's department of Computer Science and Engineering, and its Institute for Neural Computation. His research interests are in optimization-based approaches to the design and evolution of intelligent systems; this has included work in artificial neural networks, computer vision, and biological modeling.
Heinz Mühlenbein received a PhD in numerical mathematics from the University of Bonn in 1975. He has been working at the German national research center for computer science (GMD) since 1969. His responsibilities included management of the computing center, the development of the time-sharing operating system BS2000 and of programming environments for parallel processing. He has been research manager of the research group "Adaptive Systems" since 1987. The goal of the group is to create intelligent systems which act in open environments with incomplete information. His research interests are reflective statistics, explorative learning, genetic algorithms and behavior based robotics.
Luca Peliti studied physics and is presently professor of statistical mechanics at the Department of Physical Sciences, University of Naples, Italy. He is working on the statistical mechanics of membranes, but also on the statistical mechanics approach to biological complexity, in particular population dynamics, evolution, and the integration of biological information.
John Reinitz is an Assistant Professor of Molecular Biology at Mount Sinai Medical School in New York City. Prof. Reinitz received his PhD from the Department of Biology at Yale University in 1988. He then spent three years as a postdoctoral fellow in the laboratory of Michael Levine at Columbia University, where he did experimental and theoretical work on the developmental genetics of the fruit fly Drosophila. The theoretical work was extended and applied to evolutionary questions during two years at the Center for Medical Informatics at Yale Medical School. His work is directed toward the characterization of biological regulatory networks by the application of theoretical and experimental methods.
Dirk Schlierkamp-Voosen studied computer science and received his degree from the University of Bonn in Germany in 1991. He has been working at the German national research center for computer science (GMD) in the research group "Adaptive Systems" since 1991. His research interests are evolutionary algorithms, optical pattern recognition and parallel computing.
D a v i d H. Sharp is a Fellow of the Los Alamos National Laboratory. He received his AB from Princeton University and his PhD in theoretical physics from the California Institute of Technology. Sharp's research interests include the modeling of complex fluid flows and the formulation and analysis of models of gene regulation. He is a Fellow of the American Assosciation for the Advancement of Science and the American Physical Society. P e t e r T o d d received a BA in mathematics from Oberlin College, an MPhil in computer speech and language processing from Cambridge University, and an MA in psychology from University of California at San Diego. He completed his PhD in psychology at Stanford University in 1992, working with David Rumelhart on connectionist simulations of the evolution of learning. From 1992 to 1994 he was a founding member of the Adaptive Animat Research Group (AARG) at the Rowland Institute for Science, Cambridge, Mass. He is currently an assistant professor in the Department of Psychology at the University of Denver. His research interests include evolutionary psychology and the exploration of adaptive behavior, particularly through computer models of evolving artificial creatures, and models of human musical cognition and composition; in the latter area he has edited a book, with D. Gareth Loy, entitled Music and Connectionism (MIT Press, 1991). H a n s - M i c h a e l Voigt studied control engineering and automation. He received the Dr.-Ing. degree from the Technical University Chemnitz and the Dr. sc. techn, degree from the Academy of Sciences, both in technical cybernetics. He has been working at the Industrial Institute for Control Engineering Berlin as head of the systems analysis department. Later on he was head of a research group on dynamical networks at the Academy of Sciences Berlin. 
He was a visiting senior scientist at the International Computer Science Institute (ICSI) at Berkeley and at the German National Research Center for Computer Science (GMD) at Birlinghoven. At present he works as a senior scientist at the Bionics and Evolution Techniques Laboratory of the Technical University Berlin. His research interests are artificial neural network architectures, evolutionary and parallel algorithms, self-organization models and their application in modelling, control and pattern recognition. His hobbies are downhill skiing and canoeing.
Gerard Weisbuch, Directeur de recherches au CNRS, works in the Laboratoire de physique statistique of the Ecole Normale Superieure (Paris). After a thesis in condensed matter physics and research in polymer physics, he has been involved in theoretical biology for the past 18 years. His current research interests are immunology, evolution of species, cultural transitions and sustainable development.
Index A
adaptation, 53-55, 58-64, 66, 104, 105, 134, 136, 143, 161, 169-171, 173, 174, 179, 181-184, 186-188, 197, 198, 202, 204, 206, 207, 209, 210, 214, 225, 230, 231, 233, 238, 240, 244, 249, 251-253 adaptive function, 213, 222, 223, 227, 230, 250 adaptive landscape, 56, 170, 171, 173, 182-186, 188, 193, 194, 197, 199, 205, 207, 215, 228, 231, 233, 242, 243, 253 adaptive opportunity, 211, 214, 215, 217, 221, 226 adaptive peak, 214, 234, 240, 242, 251 adaptive peaks, 174, 183-186, 193, 194, 200 adaptive radiation, 188, 189, 191 adaptive system, 53, 54, 56, 63, 66, 67 adaptive zone, 171, 182, 183, 185, 188, 189, 193 aesthetic displays, 176 affordance ecological affordance, 175 fitness affordance, 175, 176 reproductive affordance, 175 Akin, 12 Alberch, 252 algorithmic chemistry, 99 allele, 8, 9, 11-13, 16, 19, 20, 22-25, 55-57, 62-64, 145, 151, 154, 158, 205, 212, 216, 217, 224, 228, 240, 244, 253 allelic changes, 208-210 allelic polymorphism, 212, 226, 250, 254 allelic substitution, 205, 209, 225, 226, 231, 237, 240, 251
Altenberg, 205, 208, 209, 219, 222, 231,247, 248,254 Altman, 71, 100, 101 analysis of variance, 57 analytical methods, 56 Andersson, 169-171, 173, 199, 201, 202 Anheyer, 141 Arak, 171, 177, 201 artificial intelligence, 170, 199 Artificial Life, 28, 51, 53, 66-69,101, 102, 121,200 artificial life, 53, 54, 56, 66, 67, 199 asexually reproducing, 105 association biological association, 50 mutualistic association, 28 symbiotic association, 28, 50 asymptotic behavior, 29 attractor attractor solution, 39 attractor state, 82, 181 competitive attractor system, 92 domain of attraction, 242 mutualism attractor, 36, 37 point attractor, 35 autonomous system, 93 B
Bahm, 67 Banzhaf, 69, 101, 102 Barr, 121 Barth, 170, 199 Bateson, 175, 181,193, 199 Bayesian/Sampling method, 124, 138 Beaudry, 101 Bedau, 53, 67 Begon, 51 behavioral noise, 55, 61, 62, 64-66 beneficial effects, 63 Berzal-Herranz, 101
BGA, 123, 124, 126, 131, 132, 137, 140, 146, 155, 160, 161, 163-166 bifurcation point, 43 binary numbers, 69, 73, 74, 80, 101 binary sequences, 69, 73 binary strings, 74, 78, 81, 94 biocomputation, 7, 169-172, 174, 194, 199, 204 biological taxonomy, 190 biotechnology, 71 birth rate, 24 Bonnet, 189, 200, 207, 210, 245 Born, 140, 141 Boucher, 51 boundary value problem, 22 branching process model, 216 Bremermann, 142, 143, 156 Brooks, 53, 67, 174, 200 Bryson, 21 Burke, 101 Burr, 211 Buss, 119-121, 254 C calculus of variations, 11 canalization, 253 Castilloux, 16 catalyst, 71, 72, 100 Cavalier-Smith, 210 Cedergren, 100 cell cell cluster, 104 cell division, 106, 107, 113 cell types, 111, 114-117, 119 cellular differentiation, 113, 116 Central Limit Theorem, 234 Chaitin, 102 chicken-egg problem, 71 chromosome, 56, 107, 145, 148, 181 Clarke, 177, 200 cleavage, 107, 113-115 Cliff, 171, 175, 198-202 coexistence, 27, 36-38, 40-44, 47, 50
combinatorial explosion, 81 commensal, 31, 35-37, 50 comparative morphology, 173 competing strategies, 56 competition, 75, 90, 91, 94, 114, 144, 161, 171, 179, 202 complexity complex adaptive system, 53 complexity catastrophe, 8 neural complexity, 196 computation, 11, 12, 39, 43, 48, 53, 65, 66, 69, 76, 77, 99, 100, 102, 109, 113, 121, 178, 193, 202, 226 computer simulation, 39, 47, 56, 104, 173, 185, 195 connectionist model, 106, 122 constrained peak, 222, 229 constraint, 12, 14, 19, 20, 24, 64, 103, 111, 189-191, 205, 206, 209, 214, 215, 222, 225, 246, 249-252, 254 Constructional selection, 205, 242, 253 continuous time model, 20, 22, 24, 106 convergence, 115, 127-130, 132, 143, 150, 152, 155, 163, 174, 201 Cook, 189, 196, 200 cooperation, 28, 32, 74, 112 correlation and regression, 144, 153 courtship behavior, 191, 192 Cracraft, 187, 189-191, 200 Crick, 101, 210, 247 Cronin, 169, 170, 173, 199, 200 Crosby, 195, 200 crossover 1-point crossover, 166 uniform crossover, 132, 145, 149, 165, 166 Crow, 24, 67, 216, 222 D
DAG, 108
Darwin, 8, 50, 68,107, 147, 170,172, 173,175-179,181,182,184, 187-194,196,197,200,202, 203, 211,212, 247, 252 Dawkins, 67, 171,177, 188,200,201, 242, 252 De Angelis, 51 De Boer, 51 death rate, 20, 24, 37, 47, 48 Demeter, 28, 52 design structures, 199 destructor, 78-80, 82, 83, 90, 95 development, 66, 81, 93,103-110,112, 117, 119-122,126,144, 153, 155,159,180,181,189,191, 192,195,200,203,205-208, 221,232,242,247,249,250, 252, 253 Dewsbury, 177, 200 difference equation, 150, 152, 159 differential equation, 27, 28, 34, 45, 46, 48, 89, 92,104, 105,150, 218 diffusion, 105, 146,216 digestion, 49, 51 diploidy, 108 directed acyclic graph, 108 discrete time model, 19, 20, 22, 25 discrete time process, 106 dispersal, 107, 113, 114, 117 dissociability, 205, 250 dissociation, 32, 33, 35, 38, 45, 48 distribution age distribution, 64 asymptotic distribution, 82 random distribution, 62,230,231 uniform distribution, 131, 221, 237 diversification, 169-171,189,190,198, 202 diversity biodiversity, 170,171,177, 196, 197, 199 component diversity, 60, 61 diversity dynamics, 59-61, 66
population diversity, 56 total diversity, 56-58, 60 DNA, 69-72, 109, 183, 209, 210, 212, 214 Dobzhansky, 172, 194, 195, 200 Doolittle, 210, 246, 248, 252 Dorit, 211 Douglas, 51 drift drift duration, 64, 65 genetic drift, 55, 64, 155, 169, 174, 177, 182, 183, 191, 193-195 neutral drift, 174, 182-184, 191 Drosophila, 250 Duchateau, 51, 52 E
EASY, 123, 124, 133, 136-138, 140 Eberhard, 173, 190, 196, 200 ecological benefit, 186 ecological utility, 169, 187, 189-191, 197 economic costs, 184-186, 193 ecosystem, 28, 31, 35, 92, 104, 173, 199 edge of disorder, 63 egoism, 37, 38, 41, 43, 44 Eigen, 89, 101, 183, 188, 189, 200 Eldredge, 67, 173, 174, 185, 190, 193, 195, 200, 201, 251 emergent clock, 119 Endler, 171, 177, 200 endosymbiont, 29-31, 44, 45, 47-49 energy-minimal configuration, 74 engineering methods, 198 Enquist, 171, 177, 201 environment, 54, 56-58, 63, 93, 103-105, 107, 108, 110-113, 118-120, 147, 171, 172, 174-176, 179, 181, 183, 202, 240, 245, 248, 253 equations Fokker-Planck equation, 110
genetic equations, 18, 19 equilibrium autonomous equilibrium, 94 disequilibrium, 19, 23, 25 equilibrium behavior, 18, 19, 22 equilibrium distance, 185 equilibrium equations, 37, 39 equilibrium level, 37 equilibrium value, 19, 60, 82 punctuated equilibrium, 53, 66, 193 stable equilibrium, 20, 60 Eshelman, 166 evolution adaptive evolution, 53, 63, 64, 66, 209, 252 biological evolution, 8, 120 coevolution, 199, 254 evolution of sex, 253 evolution strategies, 144 evolutionary activity, 53, 63-67 evolutionary algorithms, 125, 142-144, 147, 156, 167 evolutionary biology, 58, 201 evolutionary dynamics, 55, 63, 103, 172, 173, 198, 208, 226 evolutionary learning, 63 evolutionary novelty, 189 evolvability, 205-207, 215, 240, 242, 243, 252, 253 macroevolution, 170, 171, 173, 189, 191, 192, 195, 210, 212, 213, 251 microevolution, 170, 192 molecular evolution, 183, 201, 233 natural evolution process, 142, 146 neutral theory of evolution, 155 organic evolution, 242 protein evolution, 247, 253 quantum evolution, 185 rational controlled evolution, 146 self-organized evolution, 142 Evolution Strategies, 123, 124, 136,
141 Evolutionary Algorithm, 73,123,124, 132, 138-141 Ewens, 7, 13, 14, 16-19, 25, 67 exon shuffling, 205, 247, 248, 253 extinction, 85, 186, 211
F
Falconer, 67 Farmer, 51, 67, 68, 101, 102 fecundity, 211, 212, 245, 248 feedback, 63 Feldman, 208, 219 Feller, 234 fertility, 176 Fisher, 8, 10, 11, 13, 14, 17, 18, 152, 167, 169, 172-176, 179, 185, 186, 188, 201, 203, 219, 222 fitness average fitness, 19, 23, 32, 38, 49, 58, 124, 145, 148, 151, 153, 158, 163, 244 binomial fitness distribution, 149 constructional fitness, 218, 246, 248 effective fitness, 38-43, 50 fitness coefficient, 29, 30, 33, 48 fitness cost, 182, 183 fitness distribution, 205, 215-217, 234 fitness function, 55, 104, 105, 109-111, 142, 143, 149, 154, 163, 178, 180, 199, 224, 233, 234, 236, 242, 243 fitness gradient, 176 fitness increase, 227, 228, 234, 237 fitness landscape, 143, 162, 165, 172, 182, 184, 214, 233, 245 fitness parameters, 32, 36 fitness peak, 172, 182, 184, 185, 193, 197, 199, 214, 226, 231, 233-236, 240, 242-244, 249
fitness value, 73, 132, 145, 148, 149, 163, 164, 176, 244 Gaussian fitness function, 222 increase in fitness, 44, 48 Malthusian fitness, 24 marginal fitness, 9, 11, 234, 235 maximum fitness, 225 mean fitness, 7-11, 13, 14, 16, 23, 147, 150, 159-161 organismal fitness, 208, 230, 232 fixation, 212, 216-220, 233 fixed point, 115-117, 135 Fleischer, 121 flow, 90, 91, 174, 195 fluctuation, 45, 56, 60, 82, 85 folding canonical folding, 80, 81, 84 folding methods, 69, 81-83, 85 topological folding, 84 Fontana, 99, 102 Forrest, 141, 199, 201 Fox, 8, 17, 18, 21, 22, 24, 25, 199 fractal dimension, 130, 131 function Ackley, 129, 132, 134, 135 Beta-Function, 136 Branin, 139 DECEPTION, 162, 163, 166 density function, 124 Easom, 139, 141 Goldstein-Price, 139 Griewangk, 129-133 hyper ellipsoid function, 127 MULTIMAX, 162 ONEMAX, 156, 158, 162, 164-166 penalty function, 111 PLATEAU, 162, 163, 165 Rastrigin, 129, 132, 134, 135, 166 scoring function, 104, 105, 109, 111, 113-118 Shubert, 139 six-hump camelback, 139 SYMBASIN, 162, 163, 166 test function, 123, 124, 128, 130,
132, 136-140, 161, 162 Fundamental Theorem, 7, 13, 17 fundamental theorem, 219, 222 fusion of gametes, 105, 108 Futuyama, 174, 201 fuzzy set theory, 123, 127 G GA, 73, 143, 148 game theory, 110, 197 gametic frequencies, 23 Gardiner, 101, 121 Gauss-Markov theorem of Gauss-Markov, 154 Gawelczyk, 141 Gelfand, 248 gene burdened gene, 213, 214 gene duplication, 105, 108, 209, 211, 212, 215, 219-221, 229, 245, 251, 254 gene frequencies, 7, 8, 13, 14, 16, 19, 172 gene pool, 62 gene regulation, 113 population of genes, 210 Genetic Algorithm, 73, 123, 130, 140-142, 145, 146, 161, 201, 204 genetic algorithm, 7, 25, 142-146, 148, 150-154, 161, 163, 167, 169, 172, 178, 180, 198, 222 genetic distance, 20, 21 Genetic modification, 253 genetic modification, 205, 253 genetic regulatory circuit, 104, 105, 109, 113 genome, 7, 16, 104, 113 genome growth, 209, 210, 214, 218, 220, 222, 226, 229-231, 236-239, 241, 242, 245, 248, 250, 254 genome-as-population, 245, 248, 249 genotype, 9, 10, 13, 16, 19, 20, 24, 55, 56, 73, 104, 105, 108,
112, 116-118, 142, 145, 146, 151, 154-156 Gimelfarb, 251 global maximum, 226 Goldberg, 163, 166, 169, 178, 180, 198, 199 Goldschmidt, 182, 188, 201 Golub, 228 Goodwin, 189, 201, 206 Gould, 63, 67, 68, 185, 187, 189, 191, 193, 201, 204 gradient ascent model, 225 gradient system, 11-13 gradualness, 206 grammar dynamical grammar, 103, 104, 106, 113 stochastic grammar, 104, 110 Gray, 100 growth, 29, 37, 45, 46, 75, 90, 94, 113-115, 117, 118, 190 Guerrier-Takada, 101 guest, 28, 31-33 Guilford, 171, 177, 201 Gutman, 211 H
Haefliger, 211 Haldane, 186,201,216 Hamiltonian, 21, 22, 26 handicap theory, 184 Hansen, 141 Harper, 26, 27, 50, 51, 67, 201 Harvey, 198-201 Hastings, 7, 8, 18, 19, 21-26,251 Herdy, 161 heritability, 142, 148, 154, 158, 163, 167,211,212,219,229,245 Hillis, 68 Hinton, 181,201 historicity, 54 hitchhiking, 253 Hofbauer, 12, 13, 24 Hohman, 50-52
Holland, 73, 101,143, 144, 169, 198 Hopfield, 121 host, 27-33, 35, 36, 43-45, 47-51, 58, 172, 194 Huxley, 172, 186, 194, 201,202 hybridization, 194, 195 hypercycle, 28, 89 hysteresis, 37, 44
I idea of burden, 254 Ikegami, 51 inference rules, 123 information genetic information, 109 information carrier, 69 information processing, 54, 71, 73 information storage, 71, 73 semantic information, 93 syntactic system, 93 inheritance, 124, 179 initial conditions, 34-37, 43, 44, 47, 82, 92, 115 innovation courtship innovation, 191, 193, 199 distribution of innovations, 190 economic innovation, 191, 193, 199 evolutionary innovation, 170-172, 174, 182-193, 197, 199, 201, 202 key innovation, 189-191, 198 phenotypic innovation, 187, 188 interaction process, 37 interphase, 106, 107, 113-117, 119 iterated prisoner's dilemma, 28 Iversen, 68 Iwasa, 175, 201, 203 J
Jablonski, 187, 189,201
James, 27, 51 Jefferson, 28, 52, 173 Jensen, 201 Joyce, 101 K
Kaneko, 28, 51 Kappen, 213, 247 Katsuura, 138, 139 Kauffman, 8, 72, 101, 174, 188, 189, 201, 205, 214, 215, 231-234, 242, 243, 245, 253, 254 Keeler, 27, 51 Kimura, 24, 26, 67, 183, 201, 216 Kirkpatrick, 109, 121, 173, 175, 177, 185, 186, 201 Kirsebom, 100 Koehn, 250 Koza, 199, 201 Kruger, 101 L Lain, 113, 121, 122 Lande, 173, 184, 202, 222, 225 Langton, 51, 67, 68, 101, 102, 121 Lessard, 16, 17 Lewis, 100 Lewontin, 63, 68, 208 Li, 208, 213, 222, 251 Liem, 188, 189, 202 life cycle, 108, 118, 119 Lindenmayer, 107, 122 Lindgren, 51, 68 lineage tree, 107-109, 115-117 linear programming, 23 Lis, 51, 203 Luenberger, 225 M
Macken, 233, 234 macrovariable, 53, 54, 56, 66
Malthusian parameter, 20-22 Margolus, 102 Markov chain, 156 Marsh, 101 master equation, 110 mate choice, 169-184, 186, 187, 190-199, 202-204 mate preference, 169, 174-180, 183, 186, 195, 196, 201, 203 mating, 7-11, 13, 14, 19, 20, 110, 142, 143, 146, 153, 170, 178, 179, 195-197, 202-204, 222 maximization, 15 maximum, 20, 21, 23, 38, 42-44, 48, 110, 163, 227, 238, 243, 244 May, 28, 52, 68, 172, 178, 187-190, 194-196, 199, 202, 233, 252 McAuley, 50, 51 McKinney, 202 McNeil, 50-52 MEA, 124, 132 mean field approach, 45 metabolism, 72, 192 metric, 7, 8, 12, 14-16 migration, 56, 161, 179 Miller, 169-173, 175, 177, 181, 183, 195, 196, 198, 199, 202-204, 253 minimization, 10, 15, 24, 25, 225 minimum, 20-22, 25, 109, 132, 136, 138, 225 mitosis, 106, 107, 111, 114, 115 Mjolsness, 103, 122, 254 model ellipsoid model, 129 negsphere model, 129 sphere model, 125, 130, 132 modularity, 215, 247, 248 molecular biology, 69 molecular recognition mechanism, 27, 32, 50 Moller, 180, 203 Morgan, 140, 141, 170, 201, 203, 204 morphogenetic correlation, 253 morphological integration, 205, 249,
253 Morse, 52 mortality, 109, 110 Muller, 189, 203 multi-site model, 45-47 multiple site model, 50, 51 Muscatine, 28, 50-52 mutation crisp modal mutation, 125, 133, 134 discrete modal mutation, 131-134 genetic mutation, 170, 175, 192 high mutation, 60, 62, 63 low mutation, 29, 60, 61, 63 macromutation, 109, 182, 188 multiple modal mutation, 136 multiple mutation, 123, 124, 136 mutation range, 131, 132, 134, 136 mutation rate, 13, 29, 35, 38, 43, 55, 58-63, 65-67, 156-159, 161, 163, 195, 208, 249, 253 mutation-selection balance, 214, 226, 249 mutational degradation, 211 neutral mutation, 155, 183 point mutation, 29, 55, 105, 108, 109, 182 relative mutation range, 132 soft modal mutation, 123-125, 133, 134 soft mutation, 124 undirected random mutation, 188 mutualism emergence of mutualism, 27, 32, 35-37, 50 existence of mutualism, 42, 44 mutualistic regime, 42 N
Nagylaki, 19, 20, 222 Nei, 20, 163,201,252
neural net, 106 niche ecological niches, 197 econiches, 169, 178, 194 niche differentiation, 199 niche exploitation, 189 Nitecki, 187, 200-203 NK model, 215, 231-234, 236, 237, 240, 242, 243, 245 non-linear analysis, 28 non-linear differential system, 28 nonlinear equation, 25 nonlinearity, 76, 106 Norpoth, 68 Nowak, 28, 52 nucleotide strands, 74 numerical integration, 35, 43, 92 numerical simulation, 28, 35, 43, 47, 50, 103, 115, 228,237, 244
O occupancy, 45, 46, 110-112 occupation, 95, 111 Ohta, 213 ontogeny, 250 optima local ecological optima, 169,186 local evolutionary optima, 174, 182, 199 local optima, 172,182-184, 188, 199 optimal control, 21 optimal solution, 21, 22, 25, 174 optimality economic optimality, 184 optimality behavior, 7, 8 optimality principles, 7 optimality theory, 197 optimization evolutionary optimization, 169, 172, 179, 198,203 global optimization, 123,137,138 mathematical optimization, 125
optimization algorithm, 18,105, 109 optimization methods, 18, 25, 142, 169 optimization problem, 18, 20, 23, 73, 123, 130, 138, 140, 142 optimization techniques, 25,172, 198 parameter optimization, 123,143, 144 optimum global optimum, 132, 133, 137, 163,225, 228,231 global selective optimum, 222 local adaptive optimum, 188 local ecological optimum, 198 local optimum, 182, 183, 191, 198 smallest optimum, 163 organism diploid organism, 148 free organism, 31, 32, 35 haploid organism, 105, 148 iteroparous organism, 120 multicellular organism, 103-105, 107, 111-113, 115, 116 primeval organism, 31, 36 semelparous organism, 112 symbiotic organism, 36 organization principle, 100 Orgel, 101,210 origin of life, 52, 71, 101 Ostermeier, 141 P
Pace, 101 Packard, 67, 68, 101 parallel machines, 100 parasite, 27-29, 41, 50, 58, 71, 171, 210 Paterson, 195, 203 Paton, 211 Patterson, 188, 203 Patthy, 248
per capita rate, 28, 48, 49 periodic behavior, 118 Petrie, 176,203 phase portrait, 115-118 phenotype, 55, 64, 73, 104, 110,111, 145,151,169,174,176,177, 181,183,184, 188,194-196, 199,206-210,212,214,217, 222-230,245,246,251,252 phylogeny, 30, 107, 108, 201, 211, 213,251 Pimental, 203 pleiotropy low pleiotropy, 205,207,208,210,
212, 213, 238, 240, 245-247, 249, 250 pleiotropy value, 228, 234, 237-240 polygeny, 206, 232 Pomiankowski, 175, 177, 180, 186, 199, 201, 203 population dominant population, 39, 42 fittest population, 29, 30 initial population, 36, 94, 144-146, 149-151, 154, 155, 160, 161, 164 population dynamics, 27, 29, 34, 161, 216 population size, 31, 42, 94, 128-130, 132, 137, 138, 140, 152, 154, 155, 161, 164, 183, 254 predominant population, 39 primeval population, 35, 36 quasi-clonal population, 60, 61 random population, 60, 62 population genetics multilocus population genetic, 13, 26 population genetic dynamics, 214 single locus population genetic, 11, 12 power law, 65, 66 predation, 56, 179, 203 predominance, 27, 30, 36
preservation, 147 Press, 232 Price, 14, 219, 222 probability distribution binomial distribution, 124, 150 Gamma distribution, 221 Gaussian probability distribution, 110 normal probability distribution, 124 triangular probability distribution, 127, 133 progeny, 106-109, 115, 116 proliferation, 37, 39, 111, 191, 248 protection, 27, 29, 47-50 protein, 69, 71, 72, 106-108, 115, 116, 118, 175, 183, 188, 201, 247, 248 Prusinkiewicz, 122 psychology, 170, 172, 197, 198, 201 punctuation, 58, 59, 62, 66, 185
Q quantitative genetic, 222,250, 251 quantitative genetics, 124, 144, 147, 148, 151, 154, 167 quantitative-genetic model, 215 R
Raff, 189, 203, 250 Rasmussen, 51, 67, 68, 101, 102 Ray, 68, 94, 101 reachable set, 25 reaction cycle, 75 real biological system, 119 Rechenberg, 73, 101, 143, 144 recombination, 16, 19, 23, 26, 108, 123-130, 132, 133, 142-145, 147, 149, 152, 154, 155, 166, 170, 196, 198, 208 Recombination scheme BLX-0.a Crossover, 127 Discrete Recombination, 126
Extended Line Recombination, 126 Fuzzy Min-Max-Recombination, 126 Intermediate Recombination, 126 Linear Crossover, 127 Uniform Crossover, 127 Reinitz, 103, 122 rejection, 27, 32, 35, 42, 50, 147 relaxation, 226 relaxation process, 91 replication mechanism, 71 reproduction, 9, 31-33, 38, 45, 55, 105, 108, 113-115, 117, 119, 144, 170, 172, 196-198 resources available resources, 29, 41, 63, 75 food resources, 30 resource field, 54, 56, 64 resource levels, 55 resource limitation, 74, 79 resource processing, 54 resource taxes, 64 resource-extraction, 63, 66 Riedl, 209, 213, 214, 249, 250, 253, 254 Rieppel, 250 RNA, 69-73, 100, 101, 246 Robertson, 222 robot control systems, 198 robotics, 170, 201 robustness, 123, 125, 132, 134, 140, 170, 173, 180 Romanes, 187, 188, 203 Ronneburg, 67 Rosen, 68, 122 Roughgarden, 51, 52 Ryan, 171, 173, 177, 196, 203 S
safety -Factor, 158 saturation, 44, 50, 113, 154, 155 scaling behavior, 124, 128, 130, 137
Schaffer, 140, 141, 158, 162 Schank, 213, 250 Schoemaker, 7 Schuster, 89, 101, 183, 203 Schwefel, 102, 129, 130, 140, 143, 144 search evolutionary search, 169, 170, 172 parallel search, 172, 194, 199 search space, 113 selection apostatic selection, 177, 200 artificial selection, 142, 147, 149, 167, 171, 178, 181 biological selection, 171 directional selection, 206, 210, 214, 215, 222, 225, 226, 228, 245, 249, 250 disruptive selection, 194, 195, 197 ecological selection, 181, 183 Gaussian selection, 205, 223, 245 genic selection, 205, 210, 212, 245, 248 individual selection, 28 lineage selection, 242-244, 252 mass selection, 148 natural selection, 7-9, 13, 14, 16, 18, 56, 63, 64, 147, 152, 167, 169-183, 185-187, 189-191, 193, 195, 197-199, 201, 208, 252 organismal selection, 208-211, 213, 249 physical selection, 171 proportionate selection, 143, 144, 151, 152 psychological selection, 171 response to selection, 124, 142, 144, 147-149, 151, 155-159, 163, 167 selection differential, 148, 151 selection intensity, 124, 128, 129, 132, 148-150, 155 selection method, 157 selection of the fittest, 28, 50 selection operator, 93
selection pressure, 69, 93, 94, 171, 175, 176, 178, 179, 181, 183-185 selective advantage, 205, 207, 208, 210, 212, 218, 221, 228, 249 selective peak, 222 selective pressure, 214, 252 sensory selection, 171 sexual selection, 169-187, 190-192, 194-204 stochastic selection, 109 truncation selection, 146-148, 152 self-organization, 53, 70, 73, 99, 101, 174, 189 self-organizing system, 69 self-reaction, 79, 98 self-replicating system, 80 self-replication, 71, 77, 79, 85, 90 self-reproduction, 53 selfish regime, 43 sensitive parameters, 35 sensorimotor functionality, 53, 54, 63 sequence duplication, 218 sequence space, 214 sexual attractiveness, 176, 178 sexual dimorphism, 184-186, 193, 194, 202 Seymour, 67 Sharon, 50, 51 Sharp, 103, 122 Sigmund, 12, 13, 17, 24, 26 Simon, 192, 203 Simpson, 172, 185, 187, 188, 203 simulated annealing, 105, 109, 113-115, 118, 121, 122 single site model, 46, 47, 50 slow manifold analysis, 29, 37, 43, 46, 48-50 slow manifold approximation, 39 slow manifold equation, 39, 47 Smith, 50, 51, 68, 122, 178, 199, 202, 203, 233, 248, 252 soft genetic operator, 123, 124, 137, 138, 140 speciation
allopatric speciation, 194, 197 sympatric speciation, 170, 172, 193-195, 197, 203 species competing species, 111 dominant species, 41-43 primeval species, 35 subdominant species, 38 species concentration, 58 specific mate recognition system, 173, 195 Sprengel, 170, 204 stability global stability, 28 instabilities, 49 limit of stability, 43 local stability, 28 stability criterion, 78 stability limit, 43, 44 stability of mutualism, 28, 37, 43 stable states, 110 stagnation problem, 143 statistical mechanics, 109 statistical techniques, 144, 153 stochastic process, 109, 169 stochastically branching, 59 Streydio, 211 string soup, 76 Strong, 211 Sullivan, 175, 204 survival, 55, 169, 170, 172,175, 176, 186,187,190,199,211,248 Svirezhev, 7, 11-15 symbiont, 27, 28, 31, 32, 35, 41, 43, 44, 48-51 symbiosis, 28, 49, 51, 52 T Taylor, 50-52, 67, 68, 101, 102,222 thermodynamics, 38, 39, 53, 167 Thoday, 195, 204 threshold, 55, 77,106,113-115,132, 148, 157, 158, 163
time scale, 29, 35, 37, 45, 47, 59, 210, 212, 213, 251 Todd, 169, 170, 172, 173, 175, 177, 183, 195, 196, 203, 204 Townsend, 27, 50, 51 traits economic traits, 172, 173, 181, 190 phenotypic traits, 178-180, 184, 191, 192 reproductive traits, 172, 173 trait-transmitting process, 60 trajectory, 93, 116, 174, 176, 177 transient behavior, 118 transient period, 28 transition, 29, 36-39, 41-44, 47, 49, 53, 60-63, 66, 71, 98, 104, 155 Travis, 28, 51 trial and error, 113 U uncertainty, 123 V Van Valen, 188, 204 variability, 205, 206, 208, 209, 222, 225, 226, 228, 249-251, 253, 254 variation adaptive variants, 205, 207, 209, 217, 245, 252 allelic variants, 212, 216 allelic variation, 205, 209, 210, 212, 213, 215, 216, 226, 245, 246 genetic variance, 226, 249 genetic variation, 205-207, 210, 214, 222, 231, 245 phenotypic variants, 214 phenotypic variation, 205, 206, 215 variational principle, 12
variety, 54, 67, 144, 153, 171, 189, 191, 206, 208, 221, 222, 227 Very Fast Simulated Reannealing, 123, 124, 140 Via, 211, 225 viability, 54, 176, 178-180, 185, 186, 211, 245 vitality, 54, 184 Voigt, 123 Volterra-Lotka formalism, 28 von Neumann neighborhood, 55 Vrba, 191, 195, 203, 204 W Wagner, 205, 222, 223, 226, 245, 249, 253, 254 Waldrop, 101 Weinberger, 233-236, 242 Weinstock, 12 Weisbuch, 52 Wiley, 53, 67, 100, 141, 199, 201, 202 Williams, 204 Willson, 171, 204 Wilson, 52, 175, 181, 200-204 Wimsatt, 213, 250 Wright, 8, 26, 169, 182, 183, 187, 188, 204, 233, 251 Wyles, 181, 204 Z
Zahavi, 176, 184, 204 Zwick, 67
Vol. 894: R. Tamassia, I. G. Tollis (Eds.), Graph Drawing. Proceedings, 1994.. X, 471 pages. 1995. VoL 895: R. L. Ibrahim (Ed.), Software Engineering Education. Proceedings, 1995. XII, 449 pages. 1995. Vol. 896: R. M. Taylor, J. Coutaz (Eds.), Software Engineering and Human-Computer Interaction. Proceedings, 1994. X, 281 pages. 1995, Vol. 898: P. Steffens (Ed.), Machine Translation and the Lexicon. Proceedings, 1993. X, 251 pages. 1995. (Subseries LNAI). Vol. 899: W. Banzhaf, F. H. Eeckman (Eds.), Evolution and Biocomputatiou. VII, 277 pages. 1995. Vol. 900: E. W. Mayr, C. Puech (Eds.), STACS 95. Proceedings, 1995. XIII, 654 pages. 1995. Vol. 901: R. Kumar, T. Kropf (Eds.), Theorem Provers in Circuit Design. Proceedings, 1994. VIII, 303 pages. 1995. Vol. 902: M. Dezani-Ciancaglini, G. Plotkin (Eds.), Typed Lambda Calculi and Applications. Proceedings, 1995. VIII, 443 pages. 1995. Vol. 903: E. W. Mayr, G. Schmidt, G. Tinhofer (Eds.), Graph-Theoretic Concepts in Computer Science. Proceedings, 1994. IX, 414 pages. 1995.