COMPUTERS AND BIOMEDICAL RESEARCH 31, 1–17 (1998)
ARTICLE NO. CO971464
Modeling and Simulating Morphological Evolution in an Artificial Life Environment

Paulo Sérgio Panse Silveira and Eduardo Massad

Discipline of Medical Informatics, School of Medicine of the University of São Paulo and LIMQ1/HC-FMUSP, São Paulo, Brazil
E-mail: [email protected]
Received January 16, 1997
This paper presents a computer-based environment designed to study biological evolution considering morphological aspects. It was inspired by cellular automata and evolutionary algorithm principles. Simple rules are used to determine the genotype and phenotype of individuals and their relationships with behavioral aspects in a square matrix environment, where individuals can evolve. Two methods to simulate mutational errors and to introduce variability of mutations are discussed. A series of four simulations shows that the model promotes phenotype evolution depending on the distribution of food over the environment; morphology evolved so as to favor movement of the individuals towards the portion of the environment in which the food was distributed or to capture falling food. © 1998 Academic Press
Key Words: computer simulations; artificial life; genetic algorithm; cellular automata; medical informatics; ecology; modeling.
0010-4809/98 $25.00
Copyright © 1998 by Academic Press
All rights of reproduction in any form reserved.

INTRODUCTION

Morphogenesis is certainly one of the most complex problems in biology (6, 7), but the evolutionary rules that generate patterns of development may be simpler than we think. Computer simulations of evolutionary aspects are based on very rudimentary rules that, sometimes surprisingly, mimic real life. If we consider the fact that the whole biosphere evolved through a process of very tiny errors in the transcription of a 4-letter alphabet, together with selection of the favorable new words, we may be led to the somewhat simplified conclusion that life started from a simple set of rules. Development can be viewed as involving only a small set of rules of cellular and mechanochemical interactions that can generate complex morphologies (1, 2, 8).

The possibility of using computer algorithms to simulate complex behaviors with a set of simple rules does not imply that Nature evolved in a similar fashion. Even so, we must consider how simple the rules for DNA duplication are in relation to the effects we can observe, namely, life and ecosystems. Natural life on earth is organized into molecular level, cellular level, organism level, and population–ecosystem level. A living thing at any of these levels is a complex adaptive system that emerges from the interaction of a large number of elements from the level below (13).

Another matter of contention is whether computational rules are the same as natural rules; in other words, if a model correlates well with a process, does it mean that the process was generated by the same set of rules used in the simulation? We believe that the answer would be, most often, “no.” It is almost certain that we are not building a natural set of rules, since ours are a computational, artificial, and ad hoc set of rules. According to Prusinkiewicz (10), “the relationship between the rules expressing the behavior of individual components and the resulting developmental processes, patterns, and forms is often nonintuitive and difficult to grasp, and computer simulations play an essential role in the study of morphogenesis.” Taylor and Jefferson (13) assume that there is a major intellectual divide between modeling tools designed to accomplish some complex task (even if only distantly related to the way natural systems accomplish it) and systems meant to accurately model biological systems and intended for testing biological hypotheses. We think this is a didactic way to classify computational models, but there is no clear boundary between the two classes.

The model discussed in this paper was inspired by cellular automata (CA) and evolutionary algorithm principles. It is intended for the study of biological evolution considering morphological aspects, by offering a computer-based environment. Individual morphology evolves according to the conditions of the environment, under artificial rules.
We believe that, if this model is able to follow its rules consistently, it can usefully be applied to complex tasks such as predator/prey emergence, populational strategies, the evolution of parasitism, the evolution and differentiation of one species into two others, and so on, testing biological hypotheses even if it is only distantly related to the way natural systems accomplish them.

The environment presented here is a matrix represented on a computer, seen as a chessboard. Each position can contain an individual, a portion of food, an obstacle, or another kind of element under study. All algorithms have local scope; i.e., as in biological systems, decisions are not driven by demographic parameters but depend only on the state of the individual and the state of its nearby neighborhood. In the model presented here, most decisions, such as movement, probability of mating, and energy consumption, are influenced by the shape of the individual; i.e., individual shape is used as a decisional parameter by the algorithms. This aspect of the model is inspired by CA methodology.

The model is also inspired by techniques from artificial life (Alife) and evolutionary algorithms. This field of study emerged in the last few years and has been developed by several independent groups (4). It consists of five major divisions: genetic algorithms (GA), evolutionary programming (EP), evolution strategies (ES), classifier systems (CFS), and genetic programming (GP). All of them are related to Darwinian evolutionary theory and the survival of the fittest, and the algorithms they inspired are thus termed evolutionary algorithms (see [4] for a detailed discussion of the differences among GA, EP, ES, CFS, and GP). Evolutionary algorithm models are applicable to complex tasks, mainly when the search space is very large. Although these methods have been applied in many areas of human knowledge, their application to biological modeling is a direct consequence of their biological inspiration.

Our model encodes artificial genes in each individual. The genes are used as parameters of the decisional algorithms, and copies are propagated to the descendants during reproduction, when the genes are subject to mutational errors.

It is important to emphasize that we are not stating that the shapes evolved by this model use the same set of rules as natural morphogenesis. We intend to demonstrate that shapes may evolve from simple rules in a simple and artificial environment. The model just reproduces some basic principles of Darwinian evolution, accumulating favorable variations by mutation and selection, promoting morphogenesis as a complex dynamic process in which development takes place in a sequential way, and morphological forms depend on the history of their past forms; i.e., “the appearance of novel phenotypic forms is not random” (7).

METHODS

The program was developed in the C language in a computer environment based on a CISC architecture. Details of the initial conception of our computer program can be found in Silveira et al. (12). An example of its application in studies related to infectious diseases can be found in Silveira et al. (11).

The computational environment is represented by a square matrix of 200 × 200 positions, in which individuals are placed. Each individual is represented by its variables of life and its genetic components (genotype), which are used as parameters by the algorithms.
Simple rules are used to determine the genotype and phenotype of individuals and their relationships with behavioral aspects and individual decisions, such as the reproductive age, movement, and the rules to reproduce and to form gametes with variation (mutation); reproduction and variation are the two basic requisites for selection in a Darwinian sense (see below). Simple algorithms are applied to each individual in repeated life cycles. The scope of these algorithms is local, mimicking a biological system, in which individual decisions use inputs provided by the neighborhood and the current state of the individual.

The main variables of life are age, amount of ingested food, and spatial position (x, y) in the environment. The genotype is described by two genes: the recursion gene (R) and the morphology gene (M). R determines the size of the individual, whose body is generated by R iterations from the initial position (x, y) of the individual. R may assume values from 0 to 31 (5 bits). M is an array in which each element may assume values from 0 to 3, representing the direction of growth: 0 = downwards; 1 = rightwards; 2 = upwards; 3 = leftwards. For instance, an individual with the genotype in Table 1 is six cells large (the initial cell plus R = 5 grown cells). Its adult shape (phenotype) is shown in Fig. 1.
TABLE 1
Example of Genotype of an Individual

R    M0   M1   M2   M3   M4   M5   M6   M7   ...   M31
5    2    2    3    3    2    ?    ?    ?    ...   ?

Note. R = recursion gene; M0 to M31 = morphologic array gene.
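The genotype-to-phenotype rule can be illustrated with a short Python sketch (the original program was written in C; this is not the authors' code). It assumes that each iteration adds one cell next to the most recently added cell, in the direction given by the corresponding element of M, and that y increases upwards. The function name `build_body` and the optional `age` argument are hypothetical; the age limit follows the rule, stated under “Reproduction,” that an individual younger than R performs only as many iterations as its age.

```python
# Direction codes from the text: 0 = down, 1 = right, 2 = up, 3 = left.
# Assumption: y grows upwards, so "up" is (0, +1).
STEPS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def build_body(x, y, r, m, age=None):
    """Return the cells of an individual, starting from its initial cell (x, y).

    Assumption: growth is a chain, i.e. each new cell is attached to the
    previously added one. If the chain folds back on itself, the same
    position can appear twice in the returned list.
    """
    # A growing individual (age < R) only performs `age` iterations.
    iterations = r if age is None else min(r, age)
    cells = [(x, y)]
    for i in range(iterations):
        dx, dy = STEPS[m[i]]
        last_x, last_y = cells[-1]
        cells.append((last_x + dx, last_y + dy))
    return cells


# Genotype of Table 1: R = 5, M0..M4 = 2, 2, 3, 3, 2 (M5..M31 are not read).
table1_m = [2, 2, 3, 3, 2]
body = build_body(0, 0, 5, table1_m)
print(len(body))  # 6
print(body)
```

Under these assumptions the Table 1 genotype yields six cells, in agreement with the size stated in the text; whether the exact outline matches Fig. 1 depends on the chain-growth assumption.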
The model is able to:
● save and retrieve the configuration supplied by the researcher;
● interrupt a simulation and continue from that point;
● execute simulations in batch;
● exhibit graphics and data during simulations to allow visual control;
● generate files that store, cycle by cycle, the status of the variables involved in the simulations and that can be read by other computer systems.

When a simulation is running, the program creates a split screen with the environment where the individuals live and an information window where the graphics and the “census” appear. Inside the environment the individuals are represented by compositions of points (cells), and the food by single points.

A simulation is initiated by setting the initial conditions, as in Table 2. All simulations are initiated with R = 0 and random values in the M array, so that no growing direction is favored. Ages are equal to zero and the individuals are positioned randomly over the whole environment in cycle 0. The simulations are based on the following steps:
● Random distribution of food over the upper portion of the environment (explained in Table 2).
● Generation of the “body” of each individual based on its genetics (see Table 1 and Fig. 1). An individual can contact a point of food. In this case the individual “eats” this point and increases its reserve of food. In case of contact with another individual, mating and generation of descendants can occur. As each cell has an “area of influence” (Table 2), the phenotype of the individual influences its feeding and mating success.
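As a rough illustration of how shape affects feeding, the Python sketch below computes the positions an individual can reach. It assumes that the area of influence is a square neighborhood around each cell, a reading suggested by the statement under “Energy consumption” that a one-celled individual with an area of influence equal to 1 has 8 neighboring positions. The names `reachable_positions` and `can_eat` are hypothetical.

```python
def reachable_positions(cells, area=1):
    """Positions within `area` of any body cell, excluding the body itself.

    Assumption: the neighborhood of a cell is the square of side
    2 * area + 1 centered on it.
    """
    body = set(cells)
    reach = set()
    for cx, cy in body:
        for dx in range(-area, area + 1):
            for dy in range(-area, area + 1):
                reach.add((cx + dx, cy + dy))
    return reach - body


def can_eat(cells, food_point, area=1):
    """True if a point of food lies within the area of influence."""
    return food_point in reachable_positions(cells, area)


one_cell = [(0, 0)]
two_cells = [(0, 0), (0, 1)]
print(len(reachable_positions(one_cell)))   # 8
print(len(reachable_positions(two_cells)))  # 10
```

With these assumptions a second cell adds only two reachable positions, which gives a feel for why the cost of a larger body has to be discounted if larger individuals are to appear.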
FIG. 1. Phenotype of individual generated by chromosome sequence described in Table 1. The initial cell is denoted by (x, y). The M array from M5 to M31 is not read by this individual.
TABLE 2
Parameters of the Model

Final cycle: Total duration of the simulation in life cycles (number of iterations).
Initial number of individuals: Initial number of individuals in cycle 0.
Area of influence: Distance that each cell of each individual can reach to mate or feed.
Metabolic parameter: Used to determine how many points of food are consumed for an individual to maintain its “body.”
Agitation: The position (x, y) of each individual is disturbed by (x ± agitation, y ± agitation) in each cycle, determined randomly.
Floor depth: The basement of the environment cannot be occupied by the (x, y) cell. An individual may be able to grow in this direction (by predominance of alleles 0 in M) in order to acquire accumulated food.
Food by cycle: Amount of food randomly distributed over the environment in each cycle.
Area of food distribution: Determines the depth (from the top downwards to the bottom) over which food is distributed in the environment.
Initial food: Amount of food that each individual has in cycle 0.
Food donation: Proportion of food that each parental individual donates to the offspring when reproducing (see main text).
Food by descendant: Each newborn receives a determined amount of food from the parents. Depending on the parental reserve of food, more or fewer offspring are generated in each reproduction.
Mutation: Mutation rate is determined in one of two ways: (1) acting on each “bit” that represents the genotype of each individual in each cycle or (2) on the whole gene (see main text).
Probability of fatal accident: All individuals are subject to a constant probability of death by accident in each cycle.
Maximum age: Maximum age that an individual may achieve.
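The two mutation modes listed in Table 2 and described under “Reproduction” (“bit-mutation” and “gene-mutation”) can be sketched as follows. This Python sketch is not the authors' C implementation: the function names are hypothetical, applying the rate once per gene in `gene_mutation` is an assumption, and so is clamping the result to the valid range at the boundaries, since the text does not say what happens when a step would leave the range 0 to 31.

```python
import random

GENE_BITS = 5  # R takes values 0..31, i.e. 5 bits (from the text)


def flip_bit(gene, position):
    """Invert a single bit of an integer gene."""
    return gene ^ (1 << position)


def bit_mutation(gene, rate, bits=GENE_BITS, rng=random):
    """Each bit is independently inverted with probability `rate`."""
    for position in range(bits):
        if rng.random() < rate:
            gene = flip_bit(gene, position)
    return gene


def gene_mutation(gene, rate, lowest=0, highest=31, rng=random):
    """With probability `rate`, the gene value moves by one unit up or down.

    Assumption: values are clamped to [lowest, highest] at the boundaries.
    """
    if rng.random() < rate:
        gene += rng.choice((-1, 1))
    return min(highest, max(lowest, gene))


# The example from the text: a single bit error turns 00000 into 01000.
print(format(flip_bit(0b00000, 3), "05b"))  # 01000 (decimal 8)
```

A single bit error can change the gene by 1, 2, 4, 8, or 16, whereas a gene-mutation always changes it by one unit; this difference is what the Results section refers to when it compares the two modes.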
● Movement of each individual. The new position occupied by an individual is a function of three components: agitation of the environment (described in Table 2), “force of gravity” (acting on individuals and food), and morphology. The individuals in Fig. 2 illustrate how these influences act upon each individual.

FIG. 2. Examples of different phenotypes: (a) a one-celled individual is unable to move by its own resources and tends to fall downwards; (b) a two-celled individual that tends to move upwards by one position in each cycle; since the “force of gravity” moves it downwards by the same amount, this kind of individual is able to “float” in the environment; (c) an example of a “heavy” individual; it tends to move downwards two positions each cycle, one by itself and one by force of gravity; (d) a “light” individual able to move up by one position in each cycle; (e) this individual displays a composed movement, moving upwards by two positions and rightwards by two positions; (f) a more complex morphology, showing a “heavy” individual that tends to move leftwards.

● Reproduction. To reproduce, individuals must have reached a certain minimum age. Although the R gene determines the number of recursions, if the age (a) of an individual is smaller than R, the number of recursions performed by the system will be a. In this case the individual is assumed to be still at a “growing” stage of life and therefore “immature” to reproduce. When a ≥ R the individual is able to reproduce.

Gamete formation is subject to errors (mutations). Two ways to implement mutation are applied, named “bit-mutation” and “gene-mutation.” As each gene is represented by an integer number, “bit-mutation” assumes that each bit is analogous to a DNA base (ATCG) and each of them is subject to a probability of error during gamete formation. This means that, from a gene 00000 (decimal value = 0), a single mutation may generate, for instance, 01000 (decimal value = 8). “Gene-mutation” takes the decimal value of the gene and allows steps of one unit (from gene = 2 a mutation may generate a 1 or a 3).

Offspring result from the combination of parental genes. Reproduction depends on the participation of two individuals, but all individuals are haploid, as in some bacteria. Crossovers are not considered in this model in order to simplify the algorithm, and we assume that each gene (R or an element of the M array) segregates independently from the others. The new individual receives food from each parent, which donates a proportion of its own food during the process. The percentage of food donated from parental reserves and the amount of food received by each descendant are always the same. As the reserve of parental food is variable, probably depending on fitness, this is an oversimplification of real life, although somewhat realistic: in biological species this donation ranges from parents that donate their whole bodies to the descendants (like bacteria) to parents that donate part of their reserves depending on their nutritional state (like mammals). In this model the amount of the parents' food would be an indirect consequence of their shape, and available food is the major constraint in the environment. Therefore, individuals with higher fitness would have larger reserves and would be able to generate more offspring. The initial age is zero and the position of the offspring is the average of the parental positions.

TABLE 3
Initial Conditions to the Simulation Described in Fig. 3

Final cycle: 10000
Initial number of individuals: 100 individuals
Carrying capacity: 500 individuals
Area of influence: 1 position
Food by cycle: 150 points
Area of food distribution: 10 positions depth
Initial food: 10 points
Food donation: 0.10 of parental reserve
Metabolic parameter: 0.50
Agitation: 21 positions
Floor depth: 10 positions
Mutation rate: 5 × 10⁻⁴ /bit
Probability of accident: 0.01
Maximum age: 1000 cycles

FIG. 3. (a) Evolution of a population showing initial oscillation. The amount of food distributed in each cycle is the main parameter limiting the number of living individuals. (b) When the population stabilizes, the average age oscillates around a certain value. (c) The amount of ingested food per capita did not change during this simulation. (d) Causes of death: by accident (constant for all ages and all individuals during simulations) and by lack of food. (e) Evolution of the recursion gene R, showing two phases. The simulation was initiated with R = 0 (binary = 00000). Allele R = 8 (binary = 01000) was initially successful. Allele R = 24 (binary = 11000) won the competition. (f) Position occupied by the individuals during the simulation. The food is distributed in the upper fraction of the environment and “light” individuals evolved.

FIG. 4. Snapshots of a simulation from cycle 0 to 2100 showing aspects of the evolution of phenotypes that are able to move upwards. Individuals are represented in black and food in gray. Observe that some favorable individuals were able to move upwards at cycle 100, but they reproduced in significant numbers only around cycle 200, competing for food. This caused a lack of food for the individuals living close to the floor at cycle 350. Subsequently, larger individuals were able to evolve.

FIG. 5. Simulation A: (a) distribution of food over the upper 10 lines of the environment generates “light” individuals, able to move upwards; (b) succession of some prevalent phenotypes during the simulation. Initial cells are marked by ×.

● Aging. All individuals become one cycle older at each simulation cycle.
● Energy consumption. It corresponds to the basic metabolism of an individual and determines how many points of ingested food it spends in each cycle. The calculation of this amount of consumed food depends on the size of its “body.” A one-celled individual with an “area of influence” equal to 1 has 8 neighboring positions, which means that it has 8 possibilities to find food. A two-celled individual has 7 neighboring positions around each cell. We can presume that if this individual consumes two points of food, it is disadvantaged in comparison to the one-celled individual. In order to allow the appearance of larger individuals, a metabolic parameter m (ranging from 0 to 1) is applied to calculate the food consumption, C, in each cycle as a function of the size s (number of cells) of the “body,” according to

\[ C = \sum_{i=1}^{s} m^{i}. \]
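To make the effect of the metabolic parameter concrete, the following Python sketch evaluates the consumption rule, assuming that the formula above reads C = m^1 + m^2 + ... + m^s (the sum of m^i for i from 1 to s). The function name `food_consumption` is hypothetical.

```python
def food_consumption(size, m):
    """Food spent per cycle by a body of `size` cells.

    Assumption: C is the sum of m**i for i = 1..size, with 0 <= m <= 1.
    """
    return sum(m ** i for i in range(1, size + 1))


# With m = 0.50, the value used in Table 3:
print(food_consumption(1, 0.5))  # 0.5
print(food_consumption(2, 0.5))  # 0.75
print(food_consumption(6, 0.5))  # 0.984375
```

Under this reading each additional cell costs less than the previous one, and for m < 1 the total cost approaches m / (1 - m) as the body grows, so larger bodies are not penalized in proportion to their size.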
● Probability of death by accident.
● Census: values of the variables under study are stored on a hard disk.
● Removal of dead individuals.
● All points of food drop one position (force of gravity).
● Updating of the screen so as to show the environment and data to the researcher (visual control).
● If not interrupted by the user, go to step 1.

RESULTS

Typical simulation parameters and results are shown in Table 3 and Fig. 3. Observe that the “bit-mutation” model (Fig. 3e) generated gene R equal to 0, 8, and 24 in rapid succession. This is not desirable, since it may make “fine-tuning” the system difficult. For instance, if the fittest individual had R = 15 (binary = 01111), the system would take a long time to find this value. Compare it with the results below (Fig. 9).

Figure 4 shows selected aspects of the computer screen. Observe that the simulation begins with food randomly distributed over the environment, except in its upper portion, where food will be distributed at each cycle in the subsequent snapshots. One-celled individuals (R = 0) are also randomly distributed over the environment. At cycle 50 the force of gravity is concentrating the individuals close to the floor, and part of the food is concentrated below floor depth, as described in Table 2. At cycle 100 a few “light” individuals emerge and begin to move towards the upper portion of the environment. Once they are well established, no more food falls to the “heavy” variety. Part of the food escaped from these individuals during their transition from the bottom portion of the environment. At around cycle 350 all the remaining food near the floor has been consumed. The result is the extinction of the “heavy” variety. Larger individuals were able to develop in the upper portion of the environment, presumably as a function of the evolution of a more “efficient” morphology.

In order to show that evolutionary questions can be examined with these simulations, the following results come from simulations performed with the same initial conditions except for the area of food distribution. All the following simulations used the “gene-mutation” model (as defined in the description of the steps of a simulation under “Reproduction”). Four simulations were performed.

In simulation A, food was distributed close to the upper fraction of the environment. In this case, individuals evolved towards a shape able to move upwards (Fig. 5a). This movement generated a population exhibiting high “demographic” concentration. In consequence, a convex shape evolved, probably because it maximizes food capture (Fig. 5b). The competition for food appears to have been more severe in this simulation than in the following ones, since deaths caused by lack of food (Fig. 9) were higher in simulation A than in the others, except simulation B.

In simulation B (Fig. 6), food was distributed over the upper quarter of the environment. In this case, “light” individuals evolved, but they are less efficient than the individuals of simulation A (compare Figs. 5b and 6b), since these individuals have only two cells to move upwards and one of them is opposed by the force of gravity. The evolved individuals are able to occupy the upper fraction of the environment without the disadvantage of high concentration seen in simulation A. On the other hand, they evolved a kind of plate useful to capture falling food.

FIG. 6. Simulation B: (a) distribution of food over 40 lines of the environment generates individuals with “force” to move upwards of two cells (one of them opposed by force of gravity); (b) succession of some prevalent phenotypes during the simulation. Observe the lateral appendices developed to capture falling food.

In simulations C and D (Figs. 7 and 8) food was spread over the upper three quarters and over the whole environment, respectively. In both cases, “floating” individuals evolved. Both populations developed “plates” to capture falling food. In consequence, the individuals were more spread over the environment. Although simulations C and D were similar to each other, a comparison of Figs. 7a and 8a shows that the individuals occupy the whole environment only in simulation D. There appears to be a “heavy” subpopulation living close to the floor in simulation D, which was not possible under the conditions of simulation C.

FIG. 7. Simulation C: (a) distributing food over the 140 upper lines generates “floating” individuals; (b) succession of some prevalent phenotypes during the simulation. Note the lateral appendices.

FINAL COMMENTS

Instead of describing populational behavior as a function of equations, we describe individual behaviors as computer algorithms. This kind of model is able to mimic some real-life situations, and it deals with the problem of the incomplete information necessary to simulate real systems based on mathematical modeling by circumventing some of the steps involved in the process of model construction. The model can also avoid some of the difficulties of dealing with heterogeneity, because its algorithms act on each individual of the population. The two main disadvantages of Alife models compared with equational models are that (1) Alife models are time-consuming to run on currently available personal computers and (2) a subjective component provided by the researcher intervenes in the model building. The first problem could be solved by the evolution of technology. The second is also true for equational models, but there it is minimized by progressive mathematical formalization.

If the populational behavior of a simulation correlates well with a biological observation, the model may be useful as an exploratory tool. The set of rules may not be the same as in a biological environment, since the model is more controlled than reality. For instance, if we are studying mortality and each individual has a gene D that determines its probability of death, it is not relevant whether a natural individual has its mortality based on a single gene.
Probably not, but the model would be an initial approach to the question, devoted to clarifying some basic mechanisms and influences in the overall evolution of the mortality of the species in a polygenic universe. This discussion gains importance when we consider the processes of mechanization of real systems in the form of a computer simulation. Through this technique, we state certain rules, for instance, mathematical functions or maps or procedural algorithms, which generate outputs that can be compared to real systems after a set of iteration–correction processes. Simpler tools, such as CA, are proving to be extremely useful to simulate complex patterns, like the evolution of biological shapes and sizes, based on comparatively simple rules (10).
FIG. 8. Simulation D: (a) distributing food over the whole environment generates “floating” individuals. On the other hand, there are “heavy” phenotypes living close to the floor of the environment; (b) succession of some prevalent phenotypes during the simulation. The usual lateral appendices were developed.
FIG. 9. Graphical results from simulations A, B, C, and D.
In this paper we intended to demonstrate that models based on cellular automata and evolutionary algorithms may be useful in evolutionary studies. Many improvements may be made in the future in order to add more realistic features to the model. This kind of model may reflect a biological reality from simple rules. By constructing a model of reality many hypotheses can be tested. If the model fails, it should be modified or abandoned. Models and hypotheses that do not agree with reality are replaced by others that are better able to reflect the real environment (9). Although it is possible to apply complex algorithms to this kind of model, we believe that simpler rules may mimic the biological behavior. The present model associates morphology with interactions among individuals and their environment to allow evolution to act and select the more advantageous individuals.

ACKNOWLEDGMENT

We thank CNPq for financial support.
REFERENCES

1. Alberch, P. Ontogenesis and morphological diversification. Amer. Zool. 20, 653–667 (1980).
2. Alberch, P. Developmental constraints in evolutionary processes. In “Evolution and Development” (J. T. Bonner, Ed.), Dahlem Conference Rep., Vol. 20, pp. 313–332. Springer-Verlag, Berlin/Heidelberg/New York, 1982.
3. Dawkins, R. A. “The Blind Watchmaker.” Oxford Univ. Press, London, 1986.
4. Heitkötter, J., and Beasley, D. (Eds.) The hitch-hiker's guide to evolutionary computation: A list of frequently asked questions (FAQ). USENET: comp.ai.genetic (1995). Available by Internet (FTP anonymous) from ftp://rtfm.mit.edu:/pub/usenet/news.answers/ai-faq/genetic/, 103 pages.
5. Kot, M., Sayler, C. S., and Schultz, T. W. Complex dynamics in a model microbial system. Bull. Math. Biol. 54(4), 619–648 (1992).
6. Maynard-Smith, J. “Problems of Biology.” Oxford Univ. Press, London, 1987.
7. Murray, J. D. “Mathematical Biology,” 2nd ed. Springer-Verlag, USA, 1993.
8. Oster, G. F., and Alberch, P. Evolution and bifurcation of developmental programs. Evolution 36, 444–459 (1982).
9. Pianka, E. R. “Evolutionary Ecology,” 3rd ed. Harper & Row Publishers, Inc., USA, 1983.
10. Prusinkiewicz, P. Visual models of morphogenesis. In “Artificial Life” (C. L. Langton, Ed.), pp. 61–74. MIT Press, Cambridge, MA, 1995.
11. Silveira, P. S. P., Yang, H. M., Azevedo Neto, R. S., and Massad, E. Mathematical and computer models for infective contact rates. Math. Modelling Sci. Comput. 6, 793–798 (1996).
12. Silveira, P. S. P., Yang, H. M., and Massad, E. Computer-based environment for the study of ecological systems. Environ. Manag. Health 6(4), 19–28 (1995).
13. Taylor, C., and Jefferson, D. Artificial life as a tool for biological inquiry. In “Artificial Life” (C. L. Langton, Ed.), pp. 1–13. MIT Press, Cambridge, MA, 1995.