BIOMAT 2008 International Symposium on Mathematical and Computational Biology
Campos do Jordão, Brazil 22 – 27 November 2008 edited by
Rubem P Mondaini Universidade Federal do Rio de Janeiro, Brazil
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
BIOMAT 2008 International Symposium on Mathematical and Computational Biology Copyright © 2009 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN-13 978-981-4271-81-3 ISBN-10 981-4271-81-0
Printed in Singapore.
April 24, 2009
12:44
WSPC - Proceedings Trim Size: 9in x 6in
Preface.novo2
v
PREFACE

The present publication is the volume of selected works of the BIOMAT International Symposium on Mathematical and Computational Biology, which was held in Campos do Jordão, a beautiful tourist town in the Mantiqueira mountains, state of São Paulo, Brazil. Thirteen Keynote Speakers from Europe and the Americas gave comprehensive talks in plenary sessions or lectured during the two days of tutorials before the conference. Thirty-two contributed papers were presented at technical sessions as part of the evaluation process of the Board of Referees. With more than 130 submitted papers, the acceptance rate stays within the reasonable level of 20%, following the decision of the BIOMAT Consortium for this BIOMAT series of books and special issues of indexed journals.

The informal atmosphere of the meeting provided excellent opportunities for discussing interesting scientific subjects among professionals of many educational backgrounds. This is the spirit of interdisciplinary science of the BIOMAT Consortium, as is the duty of the Consortium to enhance the participation of young research students. The leaders of the tutorial sessions were also carefully selected from among the Keynote Speakers, in order to motivate the future work of prospective scientists.

As is usual for the volumes of this series, the set of selected papers combines original, state-of-the-art research with comprehensive reviews in the areas of Mathematical Biology, Biological Physics, Mathematical Modelling of Biosystems and Bioinformatics. The book contains original results on Reaction-Diffusion Waves, the Demographic Allee Effect and the Dynamics of Reinfection of Tuberculosis. It also contains recent reviews on the Modelling of Biosystems, such as the works on Symmetry Classes of Icosahedral Viral Capsids and New Perspectives for a Theoretical Basis for Bioinformatics.
We acknowledge the Administrative Staff of Brazilian Sponsoring Agencies: CAPES - Coordination for the Improvement of Higher Education Personnel and CNPq - National Research Council. We thank the PETROBRAS Oil Company and the PETROBRAS - CENPES Research Centre. Thanks are also due to the Directors and Representatives of these institutions for their receptiveness to the BIOMAT Symposium series: Prof. Sandoval Carneiro Jr. and Prof. Emidio Cantidio de Oliveira Filho, from CAPES, Prof. Marco Antonio Zago, from CNPq, Dr. Carlos Tadeu da Costa Fraga and Dr. Gina Vazquez Sebastian from CENPES-PETROBRAS. Last but not least, we thank Luiz Augusto Sousa de Oliveira for his collaboration with the editorial work, Cecilia Mondaini and Felipe Mondaini for the
index of this volume. On behalf of the BIOMAT Consortium, a non-profit association of scientific researchers from universities and research institutes in five continents which has been responsible for the organization of this annual series of symposia since 2001, we congratulate all the authors and participants for their professional work during the BIOMAT 2008 Symposium.

Rubem P. Mondaini
President of the BIOMAT Consortium
Rio de Janeiro, December 2008.
EDITORIAL BOARD OF THE BIOMAT CONSORTIUM
Rubem Mondaini (Chair), Federal University of Rio de Janeiro, Brazil
Alain Goriely, University of Arizona, USA
Alan Perelson, Los Alamos National Laboratory, New Mexico, USA
Alexei Finkelstein, Institute of Protein Research, Russian Federation
Ana Georgina Flesia, Universidad Nacional de Cordoba, Argentina
Anna Tramontano, University of Rome La Sapienza, Italy
Avner Friedman, Ohio State University, USA
Carlos Castillo-Chávez, Arizona State University, USA
Charles Pearce, Adelaide University, Australia
Christian Gautier, Université Claude Bernard, Lyon, France
Christodoulos Floudas, Princeton University, USA
Denise Kirschner, University of Michigan, USA
Eduardo González-Olivares, Catholic University of Valparaíso, Chile
Eduardo Massad, Faculty of Medicine, University of S. Paulo, Brazil
Frederick Cummings, University of California, Riverside, USA
Guy Perrière, Université Claude Bernard, Lyon, France
Ingo Roeder, University of Leipzig, Germany
Jaime Mena-Lorca, Catholic University of Valparaíso, Chile
João Frederico A. Meyer, State University of Campinas, Brazil
John Harte, University of California, Berkeley, USA
John Jungck, Beloit College, Wisconsin, USA
Jorge Velasco-Hernández, Instituto Mexicano del Petróleo, Mexico
José Flores, University of South Dakota, USA
José Fernando Fontanari, University of São Paulo, Brazil
Kerson Huang, Massachusetts Institute of Technology, USA
Lisa Sattenspiel, University of Missouri-Columbia, USA
Louis Gross, University of Tennessee, USA
Ludek Berec, Academy of Sciences, Czech Republic
Marat Rafikov, University of Northwest, Rio Grande do Sul, Brazil
Mariano Ricard, Havana University, Cuba
Nicholas Britton, University of Bath, UK
Panos Pardalos, University of Florida, Gainesville, USA
Peter Stadler, University of Leipzig, Germany
Philip Maini, University of Oxford, UK
Pierre Baldi, University of California, Irvine, USA
Raymond Mejía, National Institute of Health, USA
Richard Kerner, Université Pierre et Marie Curie, Paris, France
Rodney Bassanezi, State University of Campinas, Brazil
Rui Dilão, Instituto Superior Técnico, Lisbon, Portugal
Ruy Ribeiro, Los Alamos National Laboratory, New Mexico, USA
Timoteo Carletti, Facultés Universitaires Notre-Dame de la Paix, Belgium
Vitaly Volpert, Université de Lyon 1, France
William Taylor, National Institute for Medical Research, UK
Zhijun Wu, Iowa State University, USA
CONTENTS

Preface
Editorial Board of the BIOMAT Consortium

Mathematical Analysis of Reaction-Diffusion Equations
Reaction-Diffusion Waves: Classical Theory and Recent Developments. V. Volpert
Linear and Non-linear Front Selection for Reaction-Diffusion Equations. A. Goriely, J. Rose

Epidemiological Modelling
Epidemiological Models with Demographic Allee Effect. F. M. Hilker
Pulse Infection, Control Fixing Time between Infection Events. F. Cordova-Lepe, E. Gonzáles-Olivares
Network Structure and Epidemic Waves in Metapopulation Models. V. Colliza, F. Gargiulo, A. Barrat, J. R. Ramasco, A. Vespignani

Modelling of Biosystems Structure
Integral Symmetry Classes of Icosahedral Viral Capsids. R. Kerner
Recognition of Freshwater Macroinvertebrate Taxa by Image Analysis and Artificial Neural Networks. S. R. Doyle, A. L. Somma, J. Codnia, J. E. Ure, L. Romanelli, F. R. Momo
A Study of Hydrophobic Effect on the Protein Folding using Monte Carlo. L. P. B. Scott

Population Dynamics
Evolution in a Host-Parasite System. N. F. Britton
A Population Dynamics Approach to Language Evolution. J. F. Fontanari
Impact and Effectiveness of Marine Protected Area on Economic Sustainability. C. Jerry, N. Rassi
Long Distance Dispersal and Allee Effect in a Biological Invasion. S. L. Vega, W. C. Ferreira
Modeling the Risk of Falciparum Malaria for Travelers to Holoendemic Regions. E. Massad, M. N. Burattini, F. A. B. Coutinho

Computational Biology and Bioinformatics
Neural Network Classification with Prior Knowledge for Analysis of Biological Data. P. M. Pardalos, D. Abbate, M. R. Guarracino, A. Chinchuluun
The Markov Chains (Markov Set-Chains) as a Tool for Bacterial Genomes Evolution Analysis. P. Sliwka, M. Dudkiewicz
Multi-objectives Approach Applied to the Phylogenetic Inference Problem. W. Cancino, A. C. B. Delbem
Logic Formulas Based Knowledge Discovery and its Applications to the Classification of Biological Data. P. M. Pardalos, G. Felici, P. Bertolazzi, M. R. Guarracino, A. Chinchuluun
Unsupervised Classification of Tree Structured Objects. A. G. Flesia
Genetic Codes as Codes: Towards a Theoretical Basis for Bioinformatics. J. R. Jungck

Modelling Physiological Disorders and Medical Physics
Mathematical Biology: Some Opportunities in Integrative Biology. R. Mejía
An In silico Approach for the Antigenic Mutation and Immune Memory. A. de Castro, A. R. Neto, D. Alves
Software Developed from a Fuzzy Mathematical Model to Predict the Pathological Stage of the Prostate Cancer. G. P. Silveira, L. L. Vendite, L. C. Barros
Modeling and Simulation of the Human Eye. L. P. Brazil, L. H. O. Fernandes, L. G. Nonato, L. A. V. Carvalho, O. M. Bruno

Index
REACTION-DIFFUSION WAVES: CLASSICAL THEORY AND RECENT DEVELOPMENTS

V. VOLPERT
Institute of Mathematics, UMR 5208 CNRS, University Lyon 1, 69622 Villeurbanne, France
E-mail: [email protected]

Systematic studies of reaction-diffusion waves began in the 1930s with the works by Zeldovich - Frank-Kamenetskii in combustion theory and by Fisher and Kolmogorov - Petrovskii - Piskunov in population dynamics. The theory of reaction-diffusion waves is now well developed and includes detailed analysis of the scalar reaction-diffusion equations and monotone systems, of flame stability and nonlinear dynamics, of waves in excitable media, of reaction-diffusion-convection waves, and so on. After a short review of the theory of reaction-diffusion waves, some recent developments in the theory of flame propagation will be presented. They concern some mathematical aspects of combustion waves with the Lewis number different from 1 and reaction-diffusion-convection waves. Models in population dynamics with intra-specific competition will be discussed. They provide a new mechanism of pattern formation in biology and can be used to describe the emergence of biological species in the process of evolution.
1. Scalar reaction-diffusion equations

Reaction-diffusion equations describe numerous phenomena in population dynamics, in chemical kinetics and combustion, in catalysis, and in many other applications. The pioneering works of Refs. 23 and 36 on propagation of the dominant gene and of Ref. 68 in combustion initiated the development of this area of research. The scalar reaction-diffusion equations in one space dimension were intensively studied until the 1980s (see, e.g., Refs. 22, 32, 34, 60 and the references therein). The existence, stability, and speed of propagation of reaction-diffusion waves were well understood and formed the basis for further developments. In this section we recall the main notions and results for the scalar reaction-diffusion equation. They can be found, in particular, in Ref. 60.
1.1. Existence of waves

Consider the reaction-diffusion equation

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + F(u) \tag{1}$$
on the whole axis, $x \in \mathbb{R}$. Everywhere below we will assume that the function $F(u)$ is continuous together with its first derivative. A travelling wave solution of this equation is, by definition, a solution of the form $u(x,t) = w(x - ct)$, where the constant $c$ is the wave speed. Then the function $w(x)$ satisfies the equation

$$w'' + c w' + F(w) = 0, \tag{2}$$
where prime denotes the derivative with respect to $x$. If we look for solutions of this equation with some limits at infinity,

$$\lim_{x \to \pm\infty} w(x) = w_\pm, \qquad w_+ < w_-, \tag{3}$$
then it can be easily shown that $F(w_\pm) = 0$. The constant $c$ is not given. We should find its values such that problem (2), (3) has a solution.

We will look for solutions $w(x)$ of problem (2), (3) such that $w_+ < w(x) < w_-$ for all $x \in \mathbb{R}$. In this case, if such a solution exists, then it is monotonically decreasing. It appears that only such solutions are interesting from the point of view of applications. Non-monotone waves are unstable (see the next section).

It is also convenient to reduce equation (2) to the system of two first order equations

$$w' = p, \qquad p' = -cp - F(w). \tag{4}$$
Solutions of problem (2), (3) correspond to trajectories connecting two stationary points of system (4), namely $(w_-, 0)$ and $(w_+, 0)$. It is easy to verify that the points $w_+$ and $w_-$ are stationary points of the equation

$$\frac{dw}{dt} = F(w). \tag{5}$$
If $F(w) \le 0$ in a right-half neighborhood of the point $w_+$, then this point is stable with respect to positive perturbations, that is, the solution of equation (5) with an initial condition $w(0) > w_+$ close to $w_+$ will remain in a neighborhood of this stationary point. If $F(w) < 0$ in a right-half neighborhood of $w_+$, then this point is asymptotically stable, that is, the solution will converge to it.

These observations justify the following terminology. We call problem (2), (3) bistable if $F(w) \le 0$ in a right-half neighborhood of $w_+$ and $F(w) \ge 0$ in a left-half neighborhood of $w_-$. It is monostable if $F(w) > 0$ in a right-half neighborhood of $w_+$ and $F(w) \ge 0$ in a left-half neighborhood of $w_-$. Finally, it is unstable if $F(w) > 0$ in a right-half neighborhood of $w_+$ and $F(w) < 0$ in a left-half neighborhood of $w_-$.

The simplest example of the bistable case is given by the function $F(w)$ such that

$$F(w) < 0, \;\; w_+ < w < w_0, \qquad F(w) > 0, \;\; w_0 < w < w_- \tag{6}$$
for some $w_0 \in (w_+, w_-)$. We begin with the theorem on wave existence in the bistable case.

Theorem 1.1. Let condition (6) be satisfied. Then there exist a unique value of $c$ and a unique, up to translation, monotonically decreasing function $w(x)$ which satisfy problem (2), (3).

This theorem can be proved by the analysis of system (4). We show the existence of a trajectory connecting the stationary points $(w_+, 0)$ and $(w_-, 0)$. Both of them are saddles. It appears that the trajectory going from the point $(w_+, 0)$ into the half-plane $p < 0$ and the trajectory which comes to the point $(w_-, 0)$ from this half-plane depend monotonically on $c$. They intersect and, consequently, coincide for a single value of $c$.

Theorem 1.1 can be generalized to other nonlinearities $F(w)$ in the bistable case. For some functions $F(w)$, a solution of problem (2), (3) may not exist. In this case, we should introduce the notion of systems of waves, which we do not discuss here (see Ref. 60).

Consider next the function $F(w)$ such that

$$F(w) > 0, \qquad w_+ < w < w_-. \tag{7}$$
It is the simplest example of the monostable case.

Theorem 1.2. Let condition (7) be satisfied. Then there exists a minimal speed $c_0$ such that for all $c \ge c_0$ there exists a monotonically decreasing solution $w(x)$ of (2), (3). Such solutions do not exist for $c < c_0$.
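The phase-plane argument behind Theorem 1.1 can be tried numerically. The sketch below is an illustration only, not the construction used in the text: it assumes the cubic nonlinearity F(w) = w(1 - w)(w - a) with w+ = 0, w- = 1 and 0 < a < 1/2, follows the trajectory of system (4) that leaves the saddle (1, 0) into p < 0 with a classical Runge-Kutta step, and bisects on c. The function names, the step sizes and the bracketing interval are all hypothetical choices.

```python
import math

def F(w, a):
    """Cubic bistable nonlinearity with zeros at 0, a, 1 (illustrative choice)."""
    return w * (1.0 - w) * (w - a)

def shoot(c, a, delta=1e-6, dx=0.02, max_steps=20000):
    """Follow the trajectory of w' = p, p' = -c p - F(w) leaving (1, 0) into p < 0.

    Returns +1 if the trajectory crosses w = 0 with p < 0 (c is too small)
    and -1 if p returns to zero first or nothing is decided (c is too large).
    """
    dF_minus = a - 1.0                                    # F'(1) < 0 for this cubic
    mu = 0.5 * (-c + math.sqrt(c * c - 4.0 * dF_minus))   # unstable eigenvalue at (1, 0)
    w, p = 1.0 - delta, -mu * delta                       # start on the linearised manifold

    def rhs(w, p):
        return p, -c * p - F(w, a)

    for _ in range(max_steps):
        k1w, k1p = rhs(w, p)
        k2w, k2p = rhs(w + 0.5 * dx * k1w, p + 0.5 * dx * k1p)
        k3w, k3p = rhs(w + 0.5 * dx * k2w, p + 0.5 * dx * k2p)
        k4w, k4p = rhs(w + dx * k3w, p + dx * k3p)
        w += dx * (k1w + 2.0 * k2w + 2.0 * k3w + k4w) / 6.0
        p += dx * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        if w <= 0.0:
            return +1
        if p >= 0.0:
            return -1
    return -1

def bistable_speed(a, lo=0.0, hi=2.0, iterations=40):
    """Bisect on c, assuming the speed lies in [lo, hi] (true here for 0 < a < 1/2)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if shoot(mid, a) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    a = 0.25
    print("shooting estimate:", bistable_speed(a))
    print("closed form (1 - 2a)/sqrt(2):", (1.0 - 2.0 * a) / math.sqrt(2.0))
```

For this particular cubic the speed is known in closed form, c = (1 - 2a)/sqrt(2), which gives a convenient check of the bisection; for a general bistable F no such formula should be expected.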
The proof of Theorem 1.2 can also be carried out by studying trajectories of system (4). The stationary point $(w_-, 0)$ is a saddle as before. However, the point $(w_+, 0)$ is now a stable node. This is why the waves exist for all values of the velocity greater than or equal to the minimal velocity. For more general nonlinearities in the monostable case the waves may not exist. In this case, similar to the bistable case, systems of waves should be introduced. We finally note that in the unstable case solutions of problem (2), (3) do not exist.

1.2. Stability of waves

To define the notion of stability of travelling waves we consider the solution of equation (1) with some initial condition $u(x, 0) = u_0(x)$, supposed here to be a continuous function. If the initial condition is close to the wave, that is, it can be represented in the form $u_0(x) = w(x) + v(x)$, where $v(x)$ is a small perturbation, then the solution can approach the wave when time $t$ goes to infinity. In this case the wave is stable. The perturbation can also grow in time, and in this case the wave is unstable. To be more precise, we should specify the class of perturbations and in what sense we understand the convergence.

We recall that the waves are invariant with respect to translation in space. This means that along with $w(x)$, all functions $w(x + h)$ for any $h \in \mathbb{R}$ satisfy equation (2). It appears that the solution with the initial condition close to the wave $w(x)$ can converge to a shifted wave $w(x + h)$. This is stability with shift.

Definition 1.1. If there exists $\epsilon > 0$ such that, for every function $u_0(x)$ satisfying the estimate

$$\sup_x |u_0(x) - w(x)| \le \epsilon,$$

the solution of equation (1) with the initial condition $u(x, 0) = u_0(x)$ converges to a wave $w(x + h)$ for some $h$,

$$\sup_x |u(x,t) - w(x + h)| \to 0, \quad t \to \infty,$$

then the wave $w(x)$ is asymptotically stable with a shift.

The notion of asymptotic stability implies that the perturbation decays. If we impose a weaker condition that the perturbation remains bounded,
then stability of waves for the scalar equation follows directly from the comparison principle: if the initial condition satisfies the inequality

$$w(x) \le u_0(x) \le w(x - h), \quad x \in \mathbb{R}$$

for some positive $h$, then the same inequality holds for the solution:

$$w(x) \le u(x,t) \le w(x - h), \quad x \in \mathbb{R}, \; t \ge 0.$$

This means that an initially small perturbation remains small. The proof of the asymptotic stability is much more involved and requires the analysis of the spectrum of the linearized operators. Consider the operator $L$ linearized about the wave,

$$Lv = v'' + c v' + F'(w(x)) v.$$

Theorem 1.3. Suppose that $F'(w_\pm) < 0$ (bistable case). The principal eigenvalue of the operator $L$ (that is, the eigenvalue with the maximal real part) is real and simple, and the corresponding eigenfunction is positive up to a constant factor. There are no other eigenvalues with positive eigenfunctions.

This result allows us to make some conclusions about the location of the spectrum of the operator $L$. Indeed, differentiating equation (2) with respect to $x$, we can easily verify that $v_0(x) = -w'(x)$ is a positive eigenfunction of the operator $L$ corresponding to the zero eigenvalue. Hence, according to the last theorem, the zero eigenvalue is simple and the rest of the spectrum lies in the left half-plane. This is exactly the situation which provides the asymptotic stability with a shift.
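The verification that the derivative of the wave lies in the kernel of L takes one line. It is written out here as an aid; the strict inequality w'(x) < 0 is assumed, which is slightly more than the monotonicity stated in the text.

```latex
\begin{aligned}
0 &= \frac{d}{dx}\bigl(w'' + c\,w' + F(w)\bigr)
   = (w')'' + c\,(w')' + F'(w(x))\,w' = L w', \\
L v_0 &= -L w' = 0, \qquad v_0(x) = -w'(x) > 0 .
\end{aligned}
```

The zero eigenvalue therefore reflects translation invariance: shifting the wave by a small amount changes it, to first order, by a multiple of w'.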
Theorem 1.4. Suppose that $F'(w_\pm) < 0$ (bistable case). If for some $\epsilon > 0$,

$$\sup_x |u_0(x) - w(x)| \le \epsilon,$$

then the wave $w(x)$ is asymptotically stable with a shift, that is, there exists a constant $h$ such that the following estimate holds:

$$\sup_x |u(x,t) - w(x + h)| \le M e^{-\sigma t},$$

where $M$ and $\sigma$ are some positive constants.

Stability in the sense of Definition 1.1 is also called local stability or stability with respect to small perturbations. We can obtain a stronger result on global stability. We will assume for simplicity that the initial condition $u_0(x)$ is a monotonically decreasing function.
Theorem 1.5. Suppose that $F'(w_\pm) < 0$ (bistable case) and

$$F(u) < 0, \;\; w_+ < u < w_+ + a, \qquad F(u) > 0, \;\; w_- - b < u < w_-,$$

where $a$ and $b$ are some positive constants. If the initial condition $u_0(x)$ is a monotonically decreasing function such that

$$w_+ < u_0(+\infty) < w_+ + a, \qquad w_- - b < u_0(-\infty) < w_-,$$

then the solution of equation (1) exponentially converges to a wave, that is, there exists a constant $h$ such that the following estimate holds:

$$\sup_x |u(x,t) - w(x + h)| \le M e^{-\sigma t},$$
where $M$ and $\sigma$ are some positive constants.

We next introduce the notion of convergence in form and in speed. Consider the equation

$$u(x,t) = \frac{1}{2}(w_+ + w_-),$$

where $u(x,t)$ is a solution of equation (1). We will assume for simplicity that $u(x,t)$ is monotonically decreasing in $x$ (this is the case if the initial condition is decreasing), with the limits $w_\pm$ at $\pm\infty$. Then the last equation has a unique solution for all $t > 0$. Denote it by $m(t)$. Put $v(x,t) = u(x + m(t), t)$. Then $v(x,t)$ is a solution of the equation

$$\frac{\partial v}{\partial t} = \frac{\partial^2 v}{\partial x^2} + m'(t)\frac{\partial v}{\partial x} + F(v) \tag{8}$$

and

$$v(0,t) = \frac{1}{2}(w_+ + w_-).$$

Definition 1.2. If

$$\sup_x |v(x,t) - w(x)| \to 0, \quad t \to \infty,$$

then we will say that the solution $u(x,t)$ converges to the wave $w(x)$ in form. If $m'(t) \to c$, then it is the convergence in speed.

It can be verified that the uniform convergence discussed above implies the convergence in form and in speed, but the converse is not generally true. The convergence in speed follows from the convergence in form.
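Convergence in speed can be observed in a direct simulation. The sketch below is a rough illustration under stated assumptions, not a method from the text: it uses the cubic F(u) = u(1 - u)(u - a) with w+ = 0, w- = 1, an explicit finite-difference scheme for equation (1) on a bounded interval with fixed boundary values, and a step function as initial condition. The position m(t) of the level 1/2 is found by linear interpolation, and the average of m'(t) over a late time window is returned. All parameter values are hypothetical.

```python
def simulate_front_speed(a=0.25, length=60.0, dx=0.25, dt=0.025,
                         t_start=40.0, t_end=80.0, x0=10.0):
    """Average speed of the level set u = 1/2 between t_start and t_end."""
    n = int(round(length / dx)) + 1
    # Step initial condition: u = 1 behind the front, u = 0 ahead of it.
    u = [1.0 if i * dx < x0 else 0.0 for i in range(n)]

    def level_position(values):
        """m(t): location where the profile crosses 1/2 (linear interpolation)."""
        for i in range(n - 1):
            if values[i] >= 0.5 > values[i + 1]:
                return dx * (i + (values[i] - 0.5) / (values[i] - values[i + 1]))
        raise ValueError("front left the computational domain")

    r = dt / (dx * dx)          # must not exceed 1/2 for the explicit scheme
    first = int(round(t_start / dt))
    last = int(round(t_end / dt))
    m_first = None
    for step in range(1, last + 1):
        new = u[:]              # end points keep their boundary values 1 and 0
        for i in range(1, n - 1):
            ui = u[i]
            new[i] = (ui + r * (u[i + 1] - 2.0 * ui + u[i - 1])
                      + dt * ui * (1.0 - ui) * (ui - a))
        u = new
        if step == first:
            m_first = level_position(u)
    m_last = level_position(u)
    return (m_last - m_first) / (t_end - t_start)

if __name__ == "__main__":
    print("measured speed:", simulate_front_speed())
    print("closed form for the cubic:", (1.0 - 2.0 * 0.25) / 2.0 ** 0.5)
```

The measured value should be close to (1 - 2a)/sqrt(2), the known speed for this cubic, up to a discretisation error that shrinks with dx and dt.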
Convergence in form can be proved for wider classes of nonlinearities than in the case of the uniform convergence. If we still consider the bistable case but the condition $F'(w_\pm) < 0$ is not satisfied, that is, one of the derivatives equals zero, then the essential spectrum of the linearized operator passes through the origin (see Section 2) and the stability results presented above are not applicable. To simplify the formulation of the next theorem, we will restrict ourselves to monotone initial conditions with the limits $w_\pm$ at $\pm\infty$.

Theorem 1.6. Suppose that the nonlinearity $F(u)$ satisfies the conditions of the bistable case and that there exists a monotonically decreasing solution $w(x)$ of problem (2), (3). Then for any monotonically decreasing initial condition $u_0(x)$ such that $u_0(\pm\infty) = w_\pm$, the solution $u(x,t)$ of equation (1) converges to the wave $w(x)$ in form and in speed.

We will finish this section with some results on the convergence to waves in the monostable case. We assume, for simplicity of presentation, that the function $F(u)$ is positive in the interval $(w_+, w_-)$. In this case monotone waves exist for all $c \ge c_0$. They have exponential behavior at $+\infty$ with the exponent $\lambda$, where

$$\lambda = \frac{c}{2} - \sqrt{\frac{c^2}{4} - F'(w_+)} \;\text{ for } c > c_0, \qquad \lambda = \frac{c}{2} + \sqrt{\frac{c^2}{4} - F'(w_+)} \;\text{ for } c = c_0.$$

Thus, the waves can be characterized by the parameter

$$\lambda = -\lim_{x \to \infty} w_c'(x)/(w_c(x) - w_+). \tag{9}$$

Theorem 1.7. Let $F(u) > 0$ for $u \in (w_+, w_-)$. Suppose that $u_0(x)$ is a decreasing function continuous with its first derivative. If there exists the limit

$$\lambda = -\lim_{x \to +\infty} u_0'(x)/(u_0(x) - w_+)$$

and

$$\lambda < \frac{c_0}{2} - \sqrt{\frac{c_0^2}{4} - F'(w_+)},$$

then the solution of equation (1) with the initial condition $u(x, 0) = u_0(x)$ converges in form and in speed to the wave $w_c(x)$ for which the limit (9) has the same value of $\lambda$. If

$$\lambda > \frac{c_0}{2} - \sqrt{\frac{c_0^2}{4} - F'(w_+)},$$
then the solution converges in form and in speed to the wave $w_{c_0}(x)$ with the minimal velocity.

We note that this theorem can be generalized to much wider classes of nonlinearities and initial conditions. We will only mention that if the initial condition is a Heaviside function, $u_0(x) = w_-$ for $x \le 0$ and $u_0(x) = w_+$ for $x > 0$, then the solution converges to the wave with the minimal velocity. This is the main result of the KPP paper (Ref. 36).

1.3. Velocity of waves

We recall that if the wave exists in the bistable case, then the corresponding value of the velocity is unique. In the monostable case, if the waves exist, then their speeds fill some interval $[c_0, c_1)$, for some $c_1 \le \infty$. There are various ways to compute or to estimate the wave velocity. We begin with the bistable case. From (4) we have

$$c \equiv -\frac{dp}{dw} - \frac{F(w)}{p(w)}, \quad w \in (w_+, w_-).$$
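The step from (4) to this identity is spelled out below as an aid. Along a monotone wave, p = w' is negative on the open interval, so p can be regarded as a function of w and the division is legitimate. The second display is a worked instance under an illustrative assumption that is not part of the text: the cubic F(w) = w(1 - w)(w - a) with w+ = 0, w- = 1, for which the trajectory is known explicitly.

```latex
\frac{dp}{dw} = \frac{p'}{w'} = \frac{-c\,p - F(w)}{p} = -c - \frac{F(w)}{p(w)}
\quad\Longrightarrow\quad
c = -\frac{dp}{dw} - \frac{F(w)}{p(w)} .

% Worked instance (assumed cubic nonlinearity):
p(w) = -\frac{w(1-w)}{\sqrt{2}}
\;\Longrightarrow\;
-\frac{dp}{dw} - \frac{F(w)}{p(w)}
 = \frac{1-2w}{\sqrt{2}} + \sqrt{2}\,(w-a)
 = \frac{1-2a}{\sqrt{2}} .
```

The right-hand side does not depend on w, as it must for an exact trajectory, and its constant value is the wave speed for that cubic.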
If instead of the exact solution $p(w)$ we consider a smooth function $\rho(w)$ such that

$$\rho(w_+) = \rho(w_-) = 0, \qquad \rho(w) < 0, \;\; w_+ < w < w_-, \tag{10}$$

then it can be shown that

$$\min_w \left( -\rho'(w) - \frac{F(w)}{\rho(w)} \right) \le c \le \max_w \left( -\rho'(w) - \frac{F(w)}{\rho(w)} \right).$$

If the wave exists, then its velocity admits the minimax representation

$$c = \inf_\rho \max_w \left( -\rho'(w) - \frac{F(w)}{\rho(w)} \right) = \sup_\rho \min_w \left( -\rho'(w) - \frac{F(w)}{\rho(w)} \right), \tag{11}$$

where the infimum and supremum are taken with respect to all functions $\rho$ satisfying conditions (10). In the monostable case the left equality in (11) gives the value of the minimal velocity.

There are some particular cases where the minimal velocity in the monostable case can be found explicitly. If $F'(w) \le F'(w_+)$ for $w \in (w_+, w_-)$, then $c_0 = 2\sqrt{F'(w_+)}$. If this condition is not satisfied, then it is possible that $c_0 > 2\sqrt{F'(w_+)}$.
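The two-sided estimate above is easy to evaluate for a given test function. The following sketch samples the expression on a grid in the open interval and returns its minimum and maximum. The nonlinearity (a cubic with zeros at 0, a, 1) and the one-parameter family of test functions rho(w) = -k w(1 - w) are illustrative assumptions, and the helper names are hypothetical.

```python
import math

def speed_bounds(F, rho, drho, w_plus=0.0, w_minus=1.0, n=1000):
    """Grid estimate of min and max over (w_plus, w_minus) of -rho'(w) - F(w)/rho(w)."""
    lo, hi = math.inf, -math.inf
    for i in range(n):
        w = w_plus + (i + 0.5) * (w_minus - w_plus) / n   # midpoints avoid rho = 0
        g = -drho(w) - F(w) / rho(w)
        lo, hi = min(lo, g), max(hi, g)
    return lo, hi

def make_cubic(a):
    """Bistable cubic F(w) = w(1 - w)(w - a), an assumed example."""
    return lambda w: w * (1.0 - w) * (w - a)

def parabolic_test_function(k):
    """rho(w) = -k w(1 - w) and its derivative; satisfies conditions (10) on (0, 1)."""
    return (lambda w: -k * w * (1.0 - w)), (lambda w: -k * (1.0 - 2.0 * w))

if __name__ == "__main__":
    a = 0.25
    F = make_cubic(a)
    for k in (1.0, 1.0 / math.sqrt(2.0)):
        rho, drho = parabolic_test_function(k)
        print("k =", k, "bounds:", speed_bounds(F, rho, drho))
    print("closed form for the cubic:", (1.0 - 2.0 * a) / math.sqrt(2.0))
```

With k = 1 the bounds are wide but still bracket the speed. With k = 1/sqrt(2) the test function coincides with the exact trajectory for this cubic, so the lower and upper bounds collapse to the same number, which illustrates how the infimum and supremum in (11) are attained.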
2. Chemical kinetics and combustion

Reaction-diffusion systems describe numerous applications in chemical kinetics and combustion. A general chemical reaction can be written in the form

$$\sum_{j=1}^{m} \alpha_{ij} A_j \to \sum_{j=1}^{m} \beta_{ij} A_j, \quad i = 1, ..., n.$$

There are $n$ elementary reactions and $m$ species $A_1, ..., A_m$. The constants $\alpha_{ij}$, $\beta_{ij}$ are called the stoichiometric coefficients. The rate $W_i$ of the $i$-th reaction, according to the mass action law, can be written as

$$W_i = k_i(T) A_1^{\alpha_{i1}} \times ... \times A_m^{\alpha_{im}}.$$

The corresponding reaction-diffusion system has the form

$$\frac{\partial T}{\partial t} = \kappa \frac{\partial^2 T}{\partial x^2} + \sum_{i=1}^{n} q_i W_i, \tag{12}$$

$$\frac{\partial A_j}{\partial t} = d_j \frac{\partial^2 A_j}{\partial x^2} + \sum_{i=1}^{n} \gamma_{ij} W_i, \quad j = 1, ..., m, \tag{13}$$
where $\gamma_{ij} = \beta_{ij} - \alpha_{ij}$. In the case of equality of transport coefficients, that is, if $d_1 = ... = d_m = \kappa$, and under some additional conditions on the reaction scheme (Ref. 60), system (12), (13) can be reduced to so-called monotone systems, characterized by the applicability of the maximum principle and of comparison theorems (Section 2.1). In this case, most of the results on wave existence and stability obtained for the scalar equation remain valid for systems of equations. If the transport coefficients differ from each other, as is the case for condensed phase combustion, then propagation of flames can show some new features. In particular, it can be accompanied by various instabilities resulting in complex nonlinear dynamics (Section 2.2).

2.1. Monotone systems

Consider now the system of equations

$$\frac{\partial u}{\partial t} = d \frac{\partial^2 u}{\partial x^2} + F(u), \tag{14}$$

where $u = (u_1, ..., u_m)$, $F = (F_1, ..., F_m)$, and $d$ is a diagonal matrix with positive diagonal elements. We assume that the vector-valued function $F(u)$ satisfies the following condition:

$$\frac{\partial F_i}{\partial u_j} > 0, \quad i, j = 1, ..., m, \; i \ne j. \tag{15}$$
Such systems satisfy the positiveness and comparison theorems. These theorems are essential tools in the investigation of travelling waves. The following theorem gives the existence of waves in the bistable case.

Theorem 2.1. Suppose that $F(w_+) = F(w_-) = 0$, where $w_+ < w_-$ (the inequality is component-wise), and the matrices $F'(w_\pm)$ have all eigenvalues in the left half-plane. If there exists a finite number of points $w^j \ne w_\pm$, $j = 1, ..., k$, such that $w_+ \le w^j \le w_-$ and each matrix $F'(w^j)$ has at least one eigenvalue in the right half-plane, then there exists a unique monotonically decreasing travelling wave solution $u(x,t) = w(x - ct)$ of system (14) with the limits $w(\pm\infty) = w_\pm$. Its velocity admits the following minimax representation:

$$c = \inf_{\rho \in K} \sup_{x, i} \frac{\rho_i'' + F_i(\rho)}{-\rho_i'} = \sup_{\rho \in K} \inf_{x, i} \frac{\rho_i'' + F_i(\rho)}{-\rho_i'},$$

where $K$ is the class of monotonically decreasing vector-functions $\rho$ continuous with their second derivatives and having limits $\rho(\pm\infty) = w_\pm$ at infinity.

This theorem generalizes Theorem 1.1 to monotone systems. Its proof is based on the topological degree theory and on some special a priori estimates of solutions. Under some additional conditions, it remains valid if inequality (15) is not strict. A detailed presentation of this and of the following result for the monostable case is given in Ref. 60.

Theorem 2.2. Suppose that $F(w_+) = F(w_-) = 0$, where $w_+ < w_-$, the matrix $F'(w_-)$ has all eigenvalues in the left half-plane and the matrix $F'(w_+)$ has at least one eigenvalue in the right half-plane. If there are no other zeros of the function $F(u)$ for $w_+ \le u \le w_-$, then for all $c \ge c_0$ there exists a monotonically decreasing travelling wave solution $u(x,t) = w(x - ct)$ with the limits $w(\pm\infty) = w_\pm$. For $c < c_0$ such solutions do not exist. The minimal value $c_0$ of the velocity is given by the equality

$$c_0 = \inf_{\rho \in K} \sup_{x, i} \frac{\rho_i'' + F_i(\rho)}{-\rho_i'},$$

where $K$ is the same as in the previous theorem.

We finally remark that the monotone waves are asymptotically stable (Ref. 60). The results on wave existence, stability and velocity for monotone systems in cylinders for the monostable and for the bistable cases are given in Refs. 61, 63, 64.
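Whether a given nonlinearity satisfies condition (15) can be screened numerically before the theorems of this section are invoked. The sketch below is a heuristic check under stated assumptions, not a proof: it approximates the off-diagonal partial derivatives by central differences at a finite set of sample points. The two example systems (a cooperative one and a competitive one) and all names are hypothetical.

```python
def is_quasi_monotone(F, samples, h=1e-6):
    """Check dF_i/du_j > 0 for all i != j at the given sample points (central differences)."""
    for u in samples:
        m = len(u)
        for j in range(m):
            up, um = list(u), list(u)
            up[j] += h
            um[j] -= h
            Fp, Fm = F(up), F(um)
            for i in range(m):
                if i != j and (Fp[i] - Fm[i]) / (2.0 * h) <= 0.0:
                    return False
    return True

def cooperative(u):
    """Assumed example: each component is increasing in the other one."""
    u1, u2 = u
    return [-u1 + u2 * u2, u1 - u2]

def competitive(u):
    """Assumed example: each component is decreasing in the other one."""
    u1, u2 = u
    return [u1 * (1.0 - u1 - 2.0 * u2), u2 * (1.0 - u2 - 2.0 * u1)]

if __name__ == "__main__":
    grid = [(0.1 * i, 0.1 * j) for i in range(1, 10) for j in range(1, 10)]
    print("cooperative system satisfies (15):", is_quasi_monotone(cooperative, grid))
    print("competitive system satisfies (15):", is_quasi_monotone(competitive, grid))
```

A positive answer on a grid only suggests that the condition holds on the sampled region. A negative answer at a single point is already conclusive for that region.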
2.2. Flame propagation

The classical model describing flame propagation consists of the two reaction-diffusion equations

$$\frac{\partial T}{\partial t} = \kappa \frac{\partial^2 T}{\partial x^2} + q k(T) \phi(\alpha), \tag{16}$$

$$\frac{\partial \alpha}{\partial t} = d \frac{\partial^2 \alpha}{\partial x^2} + k(T) \phi(\alpha), \tag{17}$$

where $T$ is the temperature, $\alpha$ the depth of conversion, $\kappa$ the coefficient of thermal diffusivity, $d$ the coefficient of mass diffusion, $q$ the adiabatic heat release,

$$k(T) = \begin{cases} k_0 e^{-E/RT}, & T > T^*, \\ 0, & T \le T^*, \end{cases}$$

and $\phi(\alpha)$ is the kinetic function, usually considered in the form $\phi(\alpha) = (1 - \alpha)^n$, $n \ge 0$. For $T > T^*$ the function $k(T)$ has the form of the Arrhenius exponential, where $E$ is the activation energy, $R$ the gas constant, and $k_0$ the pre-exponential factor. We employ here the "cut-off" procedure, well known in combustion theory, assuming that $k(T)$ equals zero for $T \le T^*$. Without this assumption, system (16), (17) does not have travelling wave solutions and flames can be considered only as an intermediate asymptotics (Ref. 66). If the activation energy is sufficiently large, which is the case for combustion processes, then the choice of $T^*$ is not essential.

System (16), (17) should be completed by the conditions at infinity:

$$x = +\infty: \; T = T_i, \; \alpha = 0, \qquad x = -\infty: \; T = T_b, \; \alpha = 1,$$

where $T_b = T_i + q$. These conditions imply that the wave propagates from $-\infty$ to $+\infty$.

The temperature distribution in a combustion wave is shown schematically in Figure 1. The black region is the reaction zone, where the reaction rate $W = k(T)\phi(\alpha)$ is essentially different from zero. Outside of the reaction zone, the reaction rate is close to zero. In front of it (with respect to the direction of wave propagation), the temperature decreases and the Arrhenius exponential in the case of large activation energies becomes small. Behind the reaction zone, the temperature is high, but the depth of conversion becomes close to 1, that is, $\phi(\alpha)$ is small. Thus, if the activation energy is large, then the reaction zone is narrow.

This observation allowed Zeldovich and Frank-Kamenetskii to develop the method of narrow reaction zone (Ref. 68), which remains the basic analytical
Fig. 1. Schematic representation of combustion wave. Black region corresponds to the reaction zone.
method to study combustion fronts. The idea of the method is to consider the limit where the width of the reaction zone tends to zero. In this limit, we obtain a free boundary problem with linear equations outside of the free boundary and with jump conditions at the interface. In the case of Le = 0 and a zero-order reaction, the jump conditions have the form

$$T(\xi+0,t) = T(\xi-0,t), \qquad T'(\xi+0,t) - T'(\xi-0,t) = -\frac{q}{\kappa}\,\xi'(t),$$

$$\big(T'(\xi+0,t)\big)^2 - \big(T'(\xi-0,t)\big)^2 = \frac{2q}{\kappa}\int_{T_i}^{T_b} k(T)\,dT,$$

where $\xi(t)$ is the position of the free boundary. The values of the temperature and of its derivatives at the interface can be found explicitly from the solution of the linear equations outside of the reaction zone. For a front travelling with a constant speed $c = \xi'(t)$, the temperature behind the front is constant, so that $T'(\xi-0,t) = 0$ and the second condition gives $T'(\xi+0,t) = -qc/\kappa$. Substituting these values into the third condition, we obtain the formula for the speed of propagation:

$$c^2 = \frac{2\kappa}{q}\int_{T_i}^{T_b} k(T)\,dT.$$

The method of narrow reaction zone is also used to study the stability of combustion fronts. If $\kappa = d$, that is, the Lewis number $\mathrm{Le} = d/\kappa$ equals 1, then system (16), (17) can be reduced to the scalar reaction-diffusion equation considered in Section 1. In this case, the existence and stability of waves are discussed above. If $\mathrm{Le} \ne 1$, then the existence of waves is also known, though its proof can be more involved. We will discuss here wave stability in more detail. Along with the Lewis number, we introduce another dimensionless parameter called the Zeldovich number, $Z = qE/(RT_b^2)$. It is a large parameter (since the activation energy is large), inversely proportional to the width of the reaction zone.
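As an illustration of the speed formula, the following minimal sketch evaluates $c^2 = (2\kappa/q)\int_{T_i}^{T_b} k(T)\,dT$ by the trapezoid rule for Arrhenius kinetics with a cut-off, together with the Zeldovich number. The function names, the parameter values and the choice of the cut-off temperature are assumptions made for the example only; they are not taken from the text.

```python
import math

def arrhenius_with_cutoff(T, k0, E, R, T_star):
    """Reaction rate k(T): Arrhenius exponential above the cut-off T*, zero otherwise."""
    return k0 * math.exp(-E / (R * T)) if T > T_star else 0.0

def trapezoid(f, a, b, n=20000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def front_speed(kappa, q, T_i, k):
    """Speed from c^2 = (2*kappa/q) * integral of k(T) over [T_i, T_b], T_b = T_i + q."""
    T_b = T_i + q
    return math.sqrt(2.0 * kappa / q * trapezoid(k, T_i, T_b))

def zeldovich_number(q, E, R, T_b):
    """Z = q*E / (R*T_b^2)."""
    return q * E / (R * T_b ** 2)

# Hypothetical parameter values in consistent units (assumptions for the demo).
kappa, q, T_i = 1.0, 1500.0, 300.0
k0, E, R = 1.0e6, 1.5e5, 8.314
T_b = T_i + q
T_star = T_i + 0.5 * q  # assumed cut-off temperature

c = front_speed(kappa, q, T_i, lambda T: arrhenius_with_cutoff(T, k0, E, R, T_star))
Z = zeldovich_number(q, E, R, T_b)
print(f"c = {c:.4g}, Z = {Z:.3g}")
```

For a constant rate $k(T) \equiv k_0$ the integral equals $k_0 q$, so the sketch should return $c = \sqrt{2\kappa k_0}$, which gives a simple way to check the quadrature.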
Fig. 2. Schematic representation of the stability and instability regions in the (Z, Le)-plane.
A schematic representation of the stability regions in the (Z, Le)-plane is shown in Figure 2. As we indicated above, for Le = 1 the wave is stable for all values of Z. If Le < 1 and Z is sufficiently large, then the wave becomes unstable. When the parameter crosses the stability boundary, a Hopf-like bifurcation occurs. This means that a pair of complex conjugate eigenvalues of the corresponding linearized operator crosses the imaginary axis. We recall that this operator also has a zero eigenvalue related to the translation invariance of travelling wave solutions. Because of this, the bifurcation analysis and the behavior of bifurcating solutions differ from the classical situation of the Hopf bifurcation (see, e.g., [60]). In the one-dimensional spatial case, with the Lewis number less than 1, the loss of stability of the combustion front results in the appearance of oscillations in time. The speed of propagation is no longer constant; it becomes periodic in time. A further increase of the Zeldovich number leads to period-doubling bifurcations and to a transition to chaos. In the multi-dimensional case (the second derivatives in (16), (17) should be replaced by the Laplace operators), the loss of stability of plane fronts can result in a wide variety of propagation regimes, which depend on the geometry of the sample and on the values of the parameters. The best known among them are the so-called spinning modes of propagation, where high-temperature spots rotate along the cylindrical surface of the
sample. They were first discovered experimentally in condensed-phase combustion [47] and later in frontal polymerization [57]. There are a number of analytical and numerical studies devoted to spinning modes of gasless combustion (see [39, 42, 52, 60] and the references therein).
Fig. 3. Frontal polymerization sample after the spinning mode of propagation [57] (left). A snapshot of the temperature distribution in numerical simulations of spinning combustion [58] (right).
If the Lewis number is greater than 1 and the Zeldovich number is sufficiently large, then a bifurcation of change of stability occurs: a simple real eigenvalue of the linearized operator crosses the origin. The plane front loses its stability and a curved front appears. It propagates with a constant speed and a constant profile, but it is essentially multi-dimensional. In combustion theory such regimes are called cellular flames [51].

2.3. Mathematical theory of combustion waves with the Lewis number different from 1

In the multi-dimensional case, a travelling wave solution of the reaction-diffusion system describing flame propagation satisfies the elliptic system of equations

$$\Delta\theta + (c + \psi(y))\frac{\partial\theta}{\partial x} + K(\theta,\alpha) = 0, \tag{18}$$

$$\mathrm{Le}\,\Delta\alpha + (c + \psi(y))\frac{\partial\alpha}{\partial x} + K(\theta,\alpha) = 0. \tag{19}$$

Here $\theta$ is the dimensionless temperature and $K(\theta,\alpha)$ the reaction rate. We consider the system in an unbounded cylinder

$$\Omega = \{(x,y) : -\infty < x < +\infty,\ y \in G\},$$
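where $G$ is an open bounded subset of $\mathbb{R}^m$ with $m = 1, 2$. The system is completed by the boundary conditions

$$\frac{\partial\theta}{\partial\nu} = \frac{\partial\alpha}{\partial\nu} = 0 \ \text{ on } \partial\Omega, \tag{20}$$

where $\nu$ is the outer normal vector, and by the following conditions at infinity:

$$\theta(-\infty,y) = 1,\ \alpha(-\infty,y) = 1; \qquad \theta(+\infty,y) = 0,\ \alpha(+\infty,y) = 0. \tag{21}$$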
The function $\psi(y)$ describes the stationary gas velocity. It can be, for example, the Poiseuille profile. The specific feature of system (18), (19) is that the nonlinearity is the same in both equations. In the general case, where we have several reacting species and the system can contain more than two equations, the nonlinearities in these equations are linearly dependent. We restrict ourselves here to the case of two equations. If $\kappa = d$, that is, Le = 1, then this system can be reduced to a single equation, and $\theta(x,y) \equiv \alpha(x,y)$. However, if $\mathrm{Le} \ne 1$, then this reduction cannot be done, and we should consider both equations. It appears that the linear dependence of the nonlinearities results in some “bad” properties of the corresponding elliptic operator. Namely, it does not satisfy the Fredholm property because the corresponding limiting operators are not invertible [19, 21, 65]. This means that the usual solvability conditions of linear problems, which affirm that the problem is solvable if and only if the right-hand side is orthogonal to the solutions of the homogeneous adjoint problem, cannot be applied. However, the solvability conditions are used in many methods of linear and nonlinear analysis: asymptotic methods, methods of bifurcation theory, topological degree and so on. Thus, we cannot apply these methods to the problems under consideration or, which is often the case, can apply them only formally, without mathematical justification. From the mathematical point of view, this is one of the open questions in combustion theory. We describe here an approach which was recently developed in [19-21] and which allows us to overcome this difficulty. It is based on the reduction of the reaction-diffusion system to an integro-differential problem in such a way that it satisfies the Fredholm property. We assume that $F(\theta) = K(\theta,\theta)$ is a $C^2$ function that satisfies the conditions

$$F(0) = F(1) = 0, \qquad F'(0) < 0, \qquad F'(1) < 0. \tag{22}$$
They imply that the corresponding scalar equation is of the bistable type. Let us describe the construction of the integro-differential operator. We
first introduce the function

$$H = \theta - \alpha. \tag{23}$$
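Subtracting equation (19) from equation (18), we see that it satisfies the linear equation

$$\Delta H + (c + \psi(y))\frac{\partial H}{\partial x} = (\mathrm{Le} - 1)\Delta\alpha, \tag{24}$$

together with the boundary condition and the conditions at infinity

$$\frac{\partial H}{\partial\nu} = 0 \ \text{ on } \partial\Omega, \qquad H(\pm\infty,y) = 0. \tag{25}$$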
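The linear problem for $H = \theta - \alpha$ can be checked in one step. The display below is only a sketch of that computation; it uses nothing beyond (18)-(21) and the definition of $H$.

```latex
\begin{align*}
(18) - (19):\quad
  & \Delta\theta - \mathrm{Le}\,\Delta\alpha
    + (c+\psi(y))\,\frac{\partial(\theta-\alpha)}{\partial x}
    + K(\theta,\alpha) - K(\theta,\alpha) = 0, \\
\text{add } (\mathrm{Le}-1)\Delta\alpha \text{ to both sides}:\quad
  & \Delta(\theta-\alpha)
    + (c+\psi(y))\,\frac{\partial(\theta-\alpha)}{\partial x}
    = (\mathrm{Le}-1)\,\Delta\alpha, \\
\text{from } (20):\quad
  & \frac{\partial(\theta-\alpha)}{\partial\nu} = 0 \ \text{ on } \partial\Omega, \\
\text{from } (21):\quad
  & (\theta-\alpha)(-\infty,y) = 1 - 1 = 0, \qquad
    (\theta-\alpha)(+\infty,y) = 0 - 0 = 0 .
\end{align*}
```

The reaction terms cancel precisely because the nonlinearity is the same in both equations, which is why $H$ solves a linear problem with $\alpha$ entering only through the right-hand side.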
It appears that the resolution of problem (24), (25) allows us to define $H$ as a function of $\alpha$, and that the operator $H(\alpha)$ is bounded and differentiable in appropriate spaces. This allows the reduction of problem (18)-(21) to the integro-differential problem

$$\mathrm{Le}\,\Delta\alpha + (c + \psi(y))\frac{\partial\alpha}{\partial x} + K(H(\alpha) + \alpha, \alpha) = 0, \tag{26}$$

$$\frac{\partial\alpha}{\partial\nu} = 0 \ \text{ on } \partial\Omega. \tag{27}$$

The resolution of this problem provides $\alpha$, while the unknown $\theta$ is given by the equality $\theta = H(\alpha) + \alpha$. The analysis of the nonlinear problem is based on the properties of the corresponding linear operator

$$Lv = \mathrm{Le}\,\Delta v + (c + \psi(y))\frac{\partial v}{\partial x} + K'_\theta(\theta,\alpha)\big(H'(\alpha)v + v\big) + K'_\alpha(\theta,\alpha)v. \tag{28}$$

This new formulation of the problem allows us to obtain some existence results for problem (18)-(21) and to study bifurcations of solutions. We begin with the case where Le is close to 1; for Le = 1 the system reduces to a scalar equation. The existence of multi-dimensional waves described by such scalar equations is studied in [8, 9, 24, 28, 33, 56, 64]. Monotone systems of equations are considered in [59, 63, 64]. Problem (18)-(21) with $\mathrm{Le} \ne 1$ and close to 1 is studied in [17, 18] with a completely different and less general method. The existence of solutions is first proved in a bounded rectangle. Next, a priori estimates independent of the rectangle allow the proof of the existence of solutions in the unbounded strip. Also, parabolic problems with the Lewis number different from one are studied in [35, 37, 40].
Theorem 2.3. Suppose that for Le = 1 problem (18)-(21) has a solution $(\theta_0, \alpha_0\,(= \theta_0), c_0)$ with $\int_G \psi(y)\,dy > 0$. Then for all Le sufficiently close to 1 it has a solution $(\theta_{\mathrm{Le}}, \alpha_{\mathrm{Le}}, c_{\mathrm{Le}})$.
The proof of this theorem is based on the integro-differential formulation (26), (27). If Le = 1, then $H(\alpha) = H'(\alpha) \equiv 0$. In that particular case the operator $L$ given by (28) is the usual differential operator with well-known properties. These properties allow the application of the implicit function theorem and the proof of the theorem. In the general case, the above perturbation argument requires taking into account the properties of the operator $H'$. It is proved that the linearized operator $L$ satisfies the Fredholm property [21]. This result allows us to prove another existence theorem. In the previous case we prove the existence of solutions close to a multi-dimensional solution with Le = 1; in this case we consider any Le and prove the existence of multi-dimensional solutions close to a one-dimensional solution, under the assumption that the function $\psi(y)$ is close to a constant. This difference appears to be rather essential. In the first case, the linearized operator does not contain the operator $H$ because it is identically zero for Le = 1. In the second case, the linearized operator contains $H$, and we need to prove its Fredholm property in order to obtain solvability conditions. We finally note that this approach is also used to study bifurcations of solutions and the emergence of cellular flames [21].

2.4. Reaction-diffusion-convection waves

Propagation of reaction fronts can be accompanied by a change of density due to the temperature and concentration variation. In a liquid or in a gaseous phase under the action of gravity, this can result in the appearance of convection. Figure 4 shows the propagation of a polymerization front in the vertical direction. The polymer below the front is solid, the monomer above it is liquid. The reaction is exothermic. It heats the liquid monomer from below and leads to convection.
The convective instability appears if the frontal Rayleigh number is sufficiently large [25, 26]. The situation is different for horizontally propagating fronts (Figure 5). Convection appears here for any Rayleigh number different from zero, and convection-free solutions do not exist. Such problems can be modelled by a reaction-diffusion system coupled with the Navier-Stokes equations:

$$\frac{\partial\theta}{\partial t} - \Delta\theta + v\cdot\nabla\theta - K(\theta,\alpha) = 0, \tag{29}$$

$$\frac{\partial\alpha}{\partial t} - \mathrm{Le}\,\Delta\alpha + v\cdot\nabla\alpha - K(\theta,\alpha) = 0, \tag{30}$$
Fig. 4. Experimental results on propagation of a polymerization front with convection. The system consists of acrylamide dissolved in dimethyl sulfoxide. Adapted from [10].
$$\frac{\partial v}{\partial t} - P\,\Delta v + (v\cdot\nabla)v + \nabla p - P R\,(\theta - \theta_0^*)\,\tau = 0, \tag{31}$$

$$\nabla\cdot v = 0. \tag{32}$$
Here $\theta$ is the dimensionless temperature, $\alpha$ the depth of conversion, $v = (v_1, v_2)$ the velocity of the medium, and $p$ the pressure. We consider here the case of two space dimensions. Equations (29), (30) describe the propagation of a premixed flame. The Navier-Stokes equations are written under the so-called Boussinesq approximation, where the medium is considered as incompressible and the density is everywhere constant except in the buoyancy term, which describes the action of gravity. This term involves a characteristic temperature $\theta_0^*$ and $\tau = (\tau_1, \tau_2)$, a unit vector in $\mathbb{R}^2$ representing the orientation of the gravity force. Finally, the system depends on dimensionless parameters: the Lewis number Le, the Prandtl number $P$ and the Rayleigh number $R$. System (29)-(32) is considered in an infinite strip $\Omega$ whose axis forms a given angle with the vertical direction. For convenience we will suppose that the domain is fixed,

$$\Omega = \{(x,y) \in \mathbb{R}^2 : y \in (0,1)\},$$

while the vector $\tau$ can be variable. Equations (29)-(32) are supplemented with the following boundary conditions for the temperature and for the
Fig. 5. Propagation of a horizontal front, concentration distribution (left), stream function (right). Numerical simulations. Adapted from [6].
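concentration

$$\frac{\partial\theta}{\partial y} = \frac{\partial\alpha}{\partial y} = 0 \ \text{ on } \partial\Omega, \tag{33}$$

and with the free surface boundary condition for the velocity

$$\frac{\partial v_1}{\partial y} = 0, \quad v_2 = 0 \ \text{ on } \partial\Omega. \tag{34}$$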
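In order to study this problem, it is convenient to rewrite the Navier-Stokes equations in the stream function-vorticity formulation. We introduce the stream function $\psi$ defined by

$$v = \operatorname{curl}\psi = \left(\frac{\partial\psi}{\partial y}, -\frac{\partial\psi}{\partial x}\right), \qquad \psi = 0 \ \text{ on } \partial\Omega, \tag{35}$$

and the vorticity

$$\omega = -\Delta\psi. \tag{36}$$

Then equations (31), (32) can be rewritten as

$$\frac{\partial\omega}{\partial t} - P\,\Delta\omega + \operatorname{curl}\psi\cdot\nabla\omega - P R\,\operatorname{curl}\theta\cdot\tau = 0, \tag{37}$$

$$\Delta\psi + \omega = 0, \tag{38}$$

while the boundary condition (34) provides

$$\omega = 0 \ \text{ on } \partial\Omega. \tag{39}$$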
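The stream function formulation can be checked numerically on a test function. The sketch below takes a hypothetical smooth $\psi$ that vanishes on $y = 0$ and $y = 1$ (it is not a solution of (37), (38); it is an assumption made only for this check). It verifies by central differences that $v = (\partial\psi/\partial y, -\partial\psi/\partial x)$ is divergence-free, so that (32) holds automatically, and that the vorticity $\partial v_2/\partial x - \partial v_1/\partial y$ coincides with $-\Delta\psi$ as in (36).

```python
import math

def psi(x, y):
    """Hypothetical test stream function, equal to zero on y = 0 and y = 1."""
    return math.exp(-x * x) * math.sin(math.pi * y)

def d_dx(f, x, y, h=1e-3):
    """Central difference in x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def d_dy(f, x, y, h=1e-3):
    """Central difference in y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def v1(x, y):
    """First velocity component, d(psi)/dy."""
    return d_dy(psi, x, y)

def v2(x, y):
    """Second velocity component, -d(psi)/dx."""
    return -d_dx(psi, x, y)

def divergence(x, y):
    """div v = dv1/dx + dv2/dy; should vanish for any smooth psi."""
    return d_dx(v1, x, y) + d_dy(v2, x, y)

def vorticity_from_velocity(x, y):
    """Scalar curl of the velocity, dv2/dx - dv1/dy."""
    return d_dx(v2, x, y) - d_dy(v1, x, y)

def minus_laplacian_psi(x, y):
    """Exact value of -Laplacian(psi) for the test function above."""
    return -(4 * x * x - 2 - math.pi ** 2) * psi(x, y)

x0, y0 = 0.3, 0.4
print("div v      :", divergence(x0, y0))
print("curl v     :", vorticity_from_velocity(x0, y0))
print("-Laplace psi:", minus_laplacian_psi(x0, y0))
```

For this particular $\psi$ the quantity $-\Delta\psi$ also vanishes on $y = 0$ and $y = 1$, in agreement with the boundary condition (39).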
As already mentioned, the vector $\tau$ in (31) determines the orientation of the gravity force with respect to the strip. If $\tau = (1, 0)$, it is directed along the axis of the strip. If $\tau = (0, 1)$, it is perpendicular to the axis of the strip. These two cases appear to be essentially different. In the first case, the experimental results for polymerization fronts [10, 46], formal asymptotic expansions [25], and rigorous mathematical analysis for Le = 1 [54, 55, 62] show that for small Rayleigh numbers the vertically propagating reaction-diffusion wave is stable, and convection does not appear. For sufficiently large values of the Rayleigh number it loses its stability, and a reaction-diffusion-convection (RDC) wave appears due to a bifurcation of change of stability, where a real eigenvalue of the linearized problem crosses the origin. For all other directions of the gravity force, including the case where it is perpendicular to the strip, propagation of waves is accompanied by convection even for small Rayleigh numbers. Existence of RDC waves in this case is proved in [5] for Le = 1 and small $R$. Reaction-diffusion-convection waves in the case $\mathrm{Le} \ne 1$ are studied in [20] by the method described in Section 2.3. Existence of waves for arbitrary Rayleigh numbers is studied in [13, 38]. The corresponding experimental results are described in [3]. Existence of solutions of the evolution problem and some of their properties are studied in [40, 41]; stability of RDC waves is discussed in [12].

3. Population dynamics

3.1. Classical models in population dynamics

Beginning with the first works by Fisher [23] and KPP [36] on the propagation of a dominant gene, reaction-diffusion equations have been widely used to describe various phenomena in population dynamics. We can mention the logistic equation and the model of competition of species, the prey-predator model, and various models in epidemiology (see, e.g., [11, 48] and the references therein).
We will briefly discuss here some classical models of growth and competition of populations. Let $u$ denote a scaled density of the population. Then its evolution in space and in time can be described, under some simplifying assumptions, by the reaction-diffusion equation

$$\frac{\partial u}{\partial t} = d\frac{\partial^2 u}{\partial x^2} + ku(1-u). \tag{40}$$

Here the diffusion term describes the displacement of the individuals in the population, and the reaction term describes their reproduction, which is proportional to the density of the population and to the available resources $(1-u)$. This is the so-called logistic equation. According to the terminology introduced in Section 1, this is a reaction-diffusion equation with a monostable nonlinearity. We recall that travelling waves exist in this case for all values of the speed $c \ge c_0 = 2\sqrt{dk}$ (that is, $2\sqrt{k}$ for $d = 1$). These waves are stable with respect to small perturbations in the norm with a properly chosen exponential weight. Under some conditions on the initial functions, we can also prove the convergence of solutions of the Cauchy problem to the travelling waves in form and in speed. If we take into account bisexual reproduction, then the reproduction term will be proportional to $u^2$:

$$\frac{\partial u}{\partial t} = d\frac{\partial^2 u}{\partial x^2} + ku^2(1-u) - bu. \tag{41}$$

The last term in the right-hand side describes the mortality of the population. In this case we have a bistable nonlinearity. We recall that the travelling wave is unique up to translation in space, and it is globally asymptotically stable with shift. If there are several populations, then they can compete with each other for resources. As is well known (see, e.g., [48]), depending on the characteristics of the populations, they can co-exist, or one of them can go extinct while the other one expands. The simplest model of competition of two species is given by the reaction-diffusion system

$$\frac{\partial u}{\partial t} = d\frac{\partial^2 u}{\partial x^2} + k_1 u(1 - a_1 u - b_1 v), \tag{42}$$

$$\frac{\partial v}{\partial t} = d\frac{\partial^2 v}{\partial x^2} + k_2 v(1 - a_2 u - b_2 v). \tag{43}$$

There exist four stationary points of this system: $P_0 = (0, 0)$, $P_1 = (1/a_1, 0)$, $P_2 = (0, 1/b_2)$, and $P_3$, which can be found as a solution of the linear algebraic system

$$a_1 u + b_1 v = 1, \qquad a_2 u + b_2 v = 1.$$

If $b_1 < b_2$ and $a_2 < a_1$, that is, if each species limits its own growth more strongly than the growth of the other species, then $P_3$ is stable and the species co-exist. If the opposite inequalities hold ($b_1 > b_2$ and $a_2 > a_1$), then $P_1$ and $P_2$ are stable. In this case, we can ask whether there exist travelling waves connecting these two points. We note that system (42), (43) can be reduced to a monotone system by a change of variables. Hence, we can prove the existence and stability of waves, and use the minimax representation of the wave speed. If the wave speed equals zero, then the two species co-exist, occupying two different space regions. If the speed is different from zero, then one of the species will invade the whole space, replacing the other species.
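The stability of the stationary points of the kinetic part of system (42), (43) (without diffusion) can be checked directly from the Jacobian matrix. The sketch below is a minimal illustration: a stationary point is classified as stable when the trace of the Jacobian is negative and its determinant is positive. The helper names and the two parameter sets are assumptions chosen for the example. In the first set each species limits itself more strongly than its competitor ($a_1 > a_2$, $b_2 > b_1$); in the second set the inequalities are reversed.

```python
def jacobian(u, v, k1, a1, b1, k2, a2, b2):
    """Jacobian of (k1*u*(1 - a1*u - b1*v), k2*v*(1 - a2*u - b2*v)) at (u, v)."""
    return (
        (k1 * (1 - 2 * a1 * u - b1 * v), -k1 * b1 * u),
        (-k2 * a2 * v, k2 * (1 - a2 * u - 2 * b2 * v)),
    )

def is_stable(J):
    """Both eigenvalues of a 2x2 matrix have negative real part iff trace < 0 and det > 0."""
    (a, b), (c, d) = J
    return (a + d) < 0 and (a * d - b * c) > 0

def coexistence_point(a1, b1, a2, b2):
    """Solve a1*u + b1*v = 1, a2*u + b2*v = 1 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    return ((b2 - b1) / det, (a1 - a2) / det)

def classify(k1, a1, b1, k2, a2, b2):
    """Return the stability (True/False) of the stationary points P1, P2, P3."""
    points = {
        "P1": (1 / a1, 0.0),
        "P2": (0.0, 1 / b2),
        "P3": coexistence_point(a1, b1, a2, b2),
    }
    return {
        name: is_stable(jacobian(u, v, k1, a1, b1, k2, a2, b2))
        for name, (u, v) in points.items()
    }

# a1 > a2 and b2 > b1: only the co-existence point is stable.
print(classify(1.0, 1.0, 0.5, 1.0, 0.5, 1.0))  # {'P1': False, 'P2': False, 'P3': True}
# a2 > a1 and b1 > b2: P1 and P2 are stable, P3 is a saddle.
print(classify(1.0, 0.5, 1.0, 1.0, 1.0, 0.5))  # {'P1': True, 'P2': True, 'P3': False}
```

The second parameter set corresponds to the bistable situation in which travelling waves connecting $P_1$ and $P_2$ are of interest.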
3.2. Intra-specific competition and nonlocal consumption of resources

The models presented in the previous section do not take into account intra-specific competition. Its importance for various questions in population dynamics has long been recognized, going back to Darwin’s famous book on the origin of species [14]. One of the simplest models taking into account intra-specific competition is given by the integro-differential equation

$$\frac{\partial u}{\partial t} = d\frac{\partial^2 u}{\partial x^2} + ku\left(1 - \int_{-\infty}^{\infty}\varphi(x-y)\,u(y,t)\,dy\right). \tag{44}$$

Here $u$ is the density of a population, and the first term in the right-hand side describes the displacement of the individuals either in the physical space or in the morphological space. In the latter case, $x$ corresponds to some morphological characteristic, for example, the weight of some animals, their height, or some other metric characteristic. The second term in the right-hand side describes the reproduction of the population, which is proportional to its density and to the available resources, given by the expression in the brackets. If $\varphi(y)$ is the Dirac $\delta$-function, then the integral equals $u(x,t)$, and we obtain equation (40). If $\varphi(y)$ has a bounded support, then the integral describes the consumption of resources at the space point $x$ by the individuals located at the space points $y$. Thus, we deal with nonlocal consumption of resources or, in other words, with intra-specific competition, that is, the competition of individuals of the same species for resources.
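To make the role of the integral term concrete, the following sketch evaluates $\int\varphi(x-y)u(y)\,dy$ on a periodic grid and compares the nonlocal reaction term with the local one, $ku(1-u)$. The top-hat kernel $\varphi = 1/(2N)$ on $[-N, N]$, the periodic grid and the function names are assumptions made for the illustration; the text does not fix a particular kernel.

```python
import math

def nonlocal_consumption(u, dx, half_width):
    """Approximate the integral of phi(x_i - y) u(y) dy on a periodic grid,
    for an assumed top-hat kernel of half-width `half_width`."""
    n = len(u)
    m = int(round(half_width / dx))      # kernel half-width in grid cells
    weights = [1.0] * (2 * m + 1)
    if m > 0:
        weights[0] = weights[-1] = 0.5   # trapezoid end weights
    norm = sum(weights)                  # discrete kernel is normalised to sum to 1
    return [
        sum(w * u[(i + j - m) % n] for j, w in enumerate(weights)) / norm
        for i in range(n)
    ]

def reaction_term(u, dx, half_width, k=1.0):
    """Nonlocal reaction term k*u*(1 - integral) of the monostable equation."""
    J = nonlocal_consumption(u, dx, half_width)
    return [k * ui * (1.0 - Ji) for ui, Ji in zip(u, J)]

n, dx = 200, 0.05
u = [0.5 + 0.4 * math.cos(2 * math.pi * i / n) for i in range(n)]
local = [ui * (1.0 - ui) for ui in u]

narrow = reaction_term(u, dx, half_width=0.0)   # kernel reduced to one grid point
wide = reaction_term(u, dx, half_width=2.0)

print(max(abs(a - b) for a, b in zip(narrow, local)))  # 0.0: the local term is recovered
print(max(abs(a - b) for a, b in zip(wide, local)))    # nonzero: consumption is nonlocal
```

When the kernel shrinks to a single grid point, the discrete analogue of the $\delta$-function, the nonlocal term coincides with the logistic term; a wider kernel averages the density and changes the available resources at each point.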
Fig. 6. Emergence of new species: level lines of the density in the bistable case.
Integro-differential equations in population dynamics can also arise in epidemiology and other applications. The review of early works can be
found in [11] (see [29, 50] for more recent references). The existence of waves for equation (44) can be proved if the support of $\varphi$ is sufficiently narrow. For the monostable case (44) this can be done by constructing approximating sequences (see [16] and the references therein). In the bistable case,

$$\frac{\partial u}{\partial t} = d\frac{\partial^2 u}{\partial x^2} + ku^2\left(1 - \int_{-\infty}^{\infty}\varphi(x-y)\,u(y,t)\,dy\right) - bu, \tag{45}$$

which describes bisexual reproduction, it is proved in [1] by the implicit function theorem. In both cases, monostable and bistable, the waves have the limits $u = 0$ and $u = 1$ at infinity. If the support of $\varphi$ becomes sufficiently large, then the spatially homogeneous solution $u = 1$ loses its stability, and a spatially periodic stationary solution emerges [29-31]. It is possible that the wave with the limit $u(-\infty) = 1$ still exists but becomes unstable. Numerical simulations show propagation of periodic waves. An example of such simulations is shown in Figure 6 [1]. It represents level lines of the solution $u(x,t)$ of equation (45). The horizontal axis is the $x$-variable, the vertical axis is time. The initial condition is localized in a narrow interval near $x = 200$. We see that a spatially periodic structure emerges at the center of the interval and propagates to the left and to the right in the form of periodic travelling waves.
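The loss of stability of $u = 1$ for wide kernels can be illustrated by a linear stability computation for the monostable equation (44). Linearizing about $u = 1$ and looking for perturbations of the form $e^{\lambda t + i\xi x}$ gives $\lambda(\xi) = -d\xi^2 - k\hat\varphi(\xi)$, where $\hat\varphi$ is the Fourier transform of the kernel. The sketch below assumes a top-hat kernel on $[-N, N]$, for which $\hat\varphi(\xi) = \sin(\xi N)/(\xi N)$; the kernel, the parameter values and the function names are assumptions made for the example.

```python
import math

def growth_rate(xi, d, k, N):
    """lambda(xi) = -d*xi^2 - k*phi_hat(xi) for perturbations of u = 1,
    with phi_hat(xi) = sin(xi*N)/(xi*N) for the assumed top-hat kernel."""
    z = xi * N
    phi_hat = 1.0 if z == 0.0 else math.sin(z) / z
    return -d * xi * xi - k * phi_hat

def max_growth_rate(d, k, N, xi_max=10.0, steps=20000):
    """Largest growth rate over a uniform grid of wave numbers in [0, xi_max]."""
    return max(growth_rate(i * xi_max / steps, d, k, N) for i in range(steps + 1))

d, k = 1.0, 1.0
for N in (1.0, 5.0, 20.0):
    lam = max_growth_rate(d, k, N)
    print(f"N = {N:5.1f}: max growth rate = {lam:+.4f}",
          "(unstable)" if lam > 0 else "(stable)")
```

In this sketch the maximal growth rate is negative for a narrow kernel and becomes positive once $N$ is large enough, because $\hat\varphi$ takes negative values at nonzero wave numbers. The unstable wave number is nonzero, which is consistent with the emergence of a spatially periodic structure.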
Fig. 7. Evolution of biological species: Darwin’s diagram (left, adapted from [14]); numerical modelling of extinction (right).
There are various biological interpretations of these results. One of the most interesting is related to the evolution of biological species. If we consider the space variable $x$ as a morphological parameter, then the individuals that have approximately the same value of this parameter can be interpreted as a biological species, while two populations separated in the morphological space correspond to two different species. In the simulations shown in Figure 6, initially we have a single species with approximately the same value of the morphological parameter. After some time, it splits into two subpopulations, or two different species. Some time later, new species appear, and so on until the whole morphological space is filled. These results correspond to Darwin’s description of the emergence of biological species in the process of evolution (cf. Figure 7 (left)). In our modelling, it is based on three properties: random mutations (diffusion), intra-specific competition (integral), and self-reproduction with the same phenotype (nonlinear term). It is interesting to note that the process of speciation can be modelled in a similar way in many other applications, including the production of consumer goods or specialization in art and in science. One of the differences between the modelling and Darwin’s schematic diagram is that in the former, once a new species appears, it remains forever. In the diagram, some of the species disappear with time. One possible explanation of this behavior is related to variable environmental conditions. If we take them into account in the model by letting the values of the parameters change with time, then we can describe the process of extinction (Figure 7 (right)).
Fig. 8. Competition of species with intra-specific competition.
Thus, intra-specific competition can result in the emergence of new biological species. The new species continue to compete for resources in both
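ways, intra-specifically and between the species. In order to model this phenomenon, we consider the integro-differential system of equations

$$\frac{\partial u}{\partial t} = d_1\frac{\partial^2 u}{\partial x^2} + k_1 u^m\left(1 - a_1\int_{-\infty}^{\infty}\varphi(x-y)\,u(y,t)\,dy - b_1\int_{-\infty}^{\infty}\varphi(x-y)\,v(y,t)\,dy\right) - p_1 u, \tag{46}$$

$$\frac{\partial v}{\partial t} = d_2\frac{\partial^2 v}{\partial x^2} + k_2 v^m\left(1 - a_2\int_{-\infty}^{\infty}\varphi(x-y)\,u(y,t)\,dy - b_2\int_{-\infty}^{\infty}\varphi(x-y)\,v(y,t)\,dy\right) - p_2 v. \tag{47}$$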
If we replace $\varphi$ by the $\delta$-function and put $m = 1$, $p_1 = p_2 = 0$ and $d_1 = d_2 = d$, then we obtain system (42), (43). An example of numerical simulations of system (46), (47) is shown in Figure 8. We take the initial conditions for the functions $u$ and $v$ in such a way that they have bounded supports and these supports are separated. At the first stage, the corresponding populations develop independently of each other, forming periodic structures (cf. Figure 6). After some time, they meet, and the function $u$ gradually replaces the function $v$. From the mathematical point of view, this behavior corresponds to a periodic wave moving from left to right. It is interesting to note that its speed is less than in the case without the integral terms. This means that intra-specific competition helps weaker species to resist the invasion of stronger species.

Acknowledgements

In this survey I have used some results obtained in joint works with many co-authors. I would like to express my profound gratitude to them. This review is necessarily incomplete, both in the choice of topics and in the list of references. I hope that this does not offend the authors whose works remain outside the scope of this paper.

References

1. N. Apreutesei, A. Ducrot and V. Volpert, submitted.
2. N. Apreutesei, A. Ducrot and V. Volpert, submitted.
3. M. Bazile, H.A. Nichols, J.A. Pojman and V. Volpert, Journal of Polymer Science. Part A. Polymer Chemistry, 40, 3504 (2002).
4. M. Belk, Thèse, Université Lyon 1, 2003.
5. M. Belk, B. Kazmierczak and V. Volpert, Int. J. Math. and Math. Sciences, 169 (2005), No. 2.
6. M. Belk, K. Kostarev, V. Volpert and T. Yudina, Journal of Physical Chemistry B, 107, 10292 (2003).
7. H. Berestycki, ICIAM 99 (Edinburgh), 13-22, Oxford Univ. Press, Oxford, 2000.
8. H. Berestycki, B. Larrouturou and P.L. Lions, Arch. Rational Mech. Anal., 111, 33 (1990).
9. H. Berestycki and L. Nirenberg, Ann. Inst. H. Poincaré Anal. Non Linéaire, 9, 497 (1992).
10. G. Bowden, M. Garbey, V. Ilyashenko, J. Pojman, S. Solovyov and V. Volpert, J. Chem. Physics B, 101, 678 (1997).
11. V. Capasso, Mathematical structure of epidemic systems. Lecture Notes in Biomathematics, 97, Springer-Verlag, Heidelberg, 1993.
12. P. Constantin, A. Kiselev and L. Ryzhik, Comm. Pure Appl. Math., 56, 1781 (2003).
13. P. Constantin, M. Lewicka and L. Ryzhik, Nonlinearity, 19, 2605 (2006).
14. C. Darwin, On the origin of species by means of natural selection. 1859. Barnes and Noble Classics, New York, 2004.
15. A. Ducrot, Math. Models and Methods in Appl. Sciences, 16, 793 (2006).
16. A. Ducrot, Discrete Contin. Dyn. Syst. Ser. B, 7, 251 (2007).
17. A. Ducrot and M. Marion, Nonlinear Analysis, TMA, 61, 1105 (2005).
18. A. Ducrot and M. Marion, “Patterns and Waves”, A. Abramian, S. Vakulenko, V. Volpert, Eds. St. Petersburg, (2003), pp. 79-97.
19. A. Ducrot, M. Marion and V. Volpert, CRAS, 340, 659 (2005).
20. A. Ducrot, M. Marion and V. Volpert, Int. J. Pure and Applied Mathematics, 27, 179 (2006).
21. A. Ducrot, M. Marion and V. Volpert, Adv. Diff. Equations, to appear.
22. P.C. Fife and J.B. McLeod, Arch. Rational Mech., 75, 281 (1981).
23. R.A. Fisher, Ann. Eugenics, 7, 355 (1937).
24. M. Freidlin, Surveys Appl. Math., Eds. M. Freidlin, S. Gredeskul, J. Hunter, A. Marchenko, L. Pastur. Plenum Press, New York, 2 (1995), pp. 1-62.
25. M. Garbey, A. Taik and V. Volpert, Quart. Appl. Math., 225 (1996).
26. M. Garbey, A. Taik and V. Volpert, Quart. Appl. Math., 1 (1998).
27. R. Gardner, J. Diff. Eq., 44, 343 (1982).
28. R. Gardner, J. Diff. Eq., 61, 335 (1986).
29. S. Genieys, V. Volpert and P. Auger, Math. Model. Nat. Phenom., 1, 65 (2006), No. 1.
30. S. Genieys, V. Volpert and P. Auger, Comptes Rendus Biologies, 329, 876 (2006).
31. S.A. Gourley, J. Math. Biol., 41, 272 (2000).
32. K.P. Hadeler and F. Rothe, J. Math. Biology, 2, 251 (1975).
33. S. Heinze, Preprint No. 506, Heidelberg, (1989), p. 46.
34. Ya.I. Kanel, Mat. Sb., 101, 245 (1962).
35. A. Kiselev and L. Ryzhik, Nonlinearity, 14, 1297 (2001).
36. A.N. Kolmogorov, I.G. Petrovsky and N.S. Piskunov, B. Univ. d’État à Moscou, Sér. Intern. A, 1, 1 (1937).
37. A. Langlois and M. Marion, Asymptotic Analysis, 23, 195 (2000).
38. M. Lewicka, J. Diff. Eq., 237, 343 (2007).
39. G.M. Makhviladze and B.V. Novozhilov, Zh. Prikl. Mekh. i Tekhn. Fiz., 51 (1971), No. 5, English transl. in J. Appl. Mech. Tech. Phys.
40. S. Malham and J. Xin, Comm. Math. Phys., 193, 287 (1998).
41. O. Manley, M. Marion and R. Temam, Indiana Univ. Math. J., 42, 941 (1993).
42. S.B. Margolis, H.G. Kaper, G.K. Leaf and B.J. Matkowsky, Combust. Sci. and Technol., 43, 127 (1985).
43. B.J. Matkowsky and D.O. Olagunju, SIAM J. Appl. Math., 42, 1138 (1982).
44. B.J. Matkowsky and G.I. Sivashinsky, SIAM J. Appl. Math., 35, 465 (1978).
45. B.J. Matkowsky and G.I. Sivashinsky, SIAM J. Appl. Math., 37, 686 (1979).
46. B. McCaughey, J.A. Pojman, C. Simmons and V.A. Volpert, Chaos, 8, 520 (1998).
47. A.G. Merzhanov, A.K. Filonenko and I.P. Borovinskaya, Dokl. Phys. Chem., 208, 122 (1973).
48. J.D. Murray, Mathematical biology. Corrected 2nd printing, Springer-Verlag, Berlin, 1990.
49. B. Perthame and S. Genieys, Math. Model. Nat. Phenom., 2, 135 (2007), No. 4.
50. S. Ruan, Spatial-temporal dynamics in nonlocal epidemiological models, to appear.
51. G.I. Sivashinsky, Combust. Sci. and Technol., 15, 137 (1977).
52. G.I. Sivashinsky, SIAM J. Appl. Math., 40, 432 (1981).
53. G.I. Sivashinsky, In Eyrolles, editor, Modélisation des phénomènes de combustion, 59, 121 (1985).
54. R. Texier and V. Volpert, Revista Matematica Complutense, 16, 233 (2003).
55. R. Texier and V. Volpert, CRAS, 333, Serie I, 1077 (2001).
56. J.M. Vega, Differential Integral Equations, 6, 131 (1993).
57. V. Volpert, S.P. Davtyan and A.I. Malkin, Doklady Physical Chemistry, 29, 1075 (1984), No. 2.
58. Vit.A. Volpert, Vl.A. Volpert, S.P. Davtyan, I.N. Megrabova and N.F. Surkov, SIAM J. Appl. Math., 52, 368 (1992), No. 2.
59. A.I. Volpert and V.A. Volpert, Trans. Moscow Math. Soc., 52, 59 (1990).
60. A. Volpert, Vit. Volpert and Vl. Volpert, Traveling wave solutions of parabolic systems, Translation of Mathematical Monographs, Vol. 140, Amer. Math. Society, Providence, 1994.
61. V. Volpert and A. Volpert, CRAS, 328, 123 (1999).
62. V. Volpert and A. Volpert, Eur. J. Appl. Math., 9, 507 (1998).
63. A.I. Volpert and V.A. Volpert, Asymptotic Analysis, 23, 111 (2000).
64. A. Volpert and V. Volpert, Comm. PDE, 26, 421 (2001).
65. A. Volpert and V. Volpert, Trans. Moscow Math. Soc., 67, 127 (2006).
66. Ya.B. Zeldovich, G.I. Barenblatt, V.B. Librovich and G.M. Makhviladze, The Mathematical Theory of Combustion and Explosions, Plenum, New York, 1985.
67. Ya.B. Zeldovich and G.I. Barenblatt, Combust. Flame, 3, 61 (1959).
68. Ya.B. Zeldovich and D.A. Frank-Kamenetskii, Acta Physicochim. USSR, 9, 341 (1938) (Russian).
LINEAR AND NONLINEAR FRONT SELECTION FOR REACTION-DIFFUSION EQUATIONS

A. GORIELY, J. ROSE
Program in Applied Mathematics, BIO5 Institute. Department of Mathematics, University of Arizona, Tucson, AZ 85721
E-mail: [email protected]

Many reaction-diffusion equations exhibit front solutions. These solutions connect two asymptotic states and propagate at a constant speed. The main problem is to compute the speed of the front selected by the dynamics. The conventional methods rely on the computation of exact solutions. However, these solutions cannot be computed in general. A review of the different techniques and known bounds is given, together with a detailed analysis of a 2D reaction-diffusion equation.
1. Introduction

Reaction-diffusion equations are nonlinear evolution equations of the form:

$$u_t = \Delta u + u F(u), \qquad u \in \mathbb{C}^n, \tag{1}$$

where $F(u)$ is analytic in some domain, $\Delta u$ is the Laplacian of $u$, and $u_t$ stands for the time derivative of $u$. These equations have received considerable interest from mathematicians, physicists, and biologists. In physics, reaction-diffusion equations describe pattern formation in systems driven beyond the limits of stability of their spatially homogeneous states. It is observed that near onset, critical systems restabilize to form complex structures known as “patterns”. For instance, Rayleigh-Benard convection in fluid dynamics is a typical physical setting where the transition from homogeneous states to spatially periodic structures can be observed.1,2 A motionless fluid (the homogeneous state) is heated from below (in a system with gravity) until the system reaches a crucial stage where
convection occurs. The behavior of the flow near the onset can be described by reaction-diffusion equations such as the Newell-Whitehead equation.3 One of the central aspects of understanding pattern formation is the propagation of patterns from spatially homogeneous states to unstable states. The typical setup for such phenomena is the propagation of a perturbation of some homogeneous state due to an abrupt “quenching” of the system. The propagation develops domain walls separating the unstable state from other states. These domain walls are called fronts, and we shall focus our study on these solutions. Many other physical and chemical systems exhibit pattern formation and front propagation, such as the Taylor-Couette instability,4 crystal growth models,5 optical systems,6 chemical reactions,7,8 combustion dynamics9,10 and many biological systems.11 The fronts propagate the perturbation pattern in the system. It is observed in physical settings that, while the perturbation disturbs different linearly stable states, unique states are rapidly selected. The physically relevant questions are: what is the preferred wavelength of the patterns generated by the front propagation? At what speed does the front pattern move into the unstable states? The “selected” or “preferred” front speed is usually the speed of the most unstable mode, that is, the speed at which the local exponential growth is stationary. One of the simplest equations which exhibits front propagation is the Fisher-Kolmogorov equation, which plays a central role in the propagation of populations:

$$u_t = u_{xx} + u - u^3, \qquad u \in \mathbb{R}. \tag{2}$$
It describes the diffusion of a preferred genotype from a region (u = 1) to (u = 0). Dee and Langer12,13 developed a method for finding the selected front speed. It is based on the Fourier modes of the equation linearized around u = 0. According to their approach, the preferred speed is the maximal speed of the most unstable Fourier modes. They called this method the marginal stability approach because the speed $c^*$ at which the front propagates is such that the front profiles with speed $c$ are stable for $c > c^*$ and unstable for $c < c^*$. The important feature here is that the preferred front speed is determined solely by the linearized equation. We shall therefore refer to this preferred speed as the linear front speed. While this linear approach seems correct for many systems,14 it was pointed out by Aronson and Weinberger15 that some equations admit stable fronts with a faster propagation speed than the linear front speed. One such equation is the quintic Fisher-Kolmogorov equation:16–18

$$u_t = u_{xx} + \mu u + u^3 - u^5, \qquad u \in \mathbb{R}. \tag{3}$$

[Figure 1: sketch of a front profile u = u(x) moving along the x axis at speed c.]
Fig. 1. A front.
For $\mu$ small enough this equation has front solutions whose preferred speed $\tilde{c}$ cannot be predicted from a linear analysis of $u_t = u_{xx} + \mu u$. The speed $\tilde{c}$ will be called the nonlinear front speed. It is the purpose of this paper to study the occurrence of nonlinear fronts. These nonlinear fronts are relevant for the physics of the problem since, typically, they exist for small values of the stress parameter $\mu$, where the model equations obtained by perturbation expansions are valid. In order to find the nonlinear speeds, van Saarloos17,18 proposed an ansatz for the nonlinear front. He was able to find a closed form solution for the front, the analysis of which gives the nonlinear speeds. Subsequently, it was shown by Powell et al.16,19 that the nonlinear fronts could be found by applying the truncated Painlevé PDE test (the so-called Weiss-Tabor-Carnevale (WTC) expansion, first described in Ref. 20). More generally, there has been extensive interest in finding particular solutions of reaction-diffusion equations using different methods such as the Painlevé PDE test,21–25 the nonclassical method,26 symmetry reduction27–29 and direct methods.30–36 However, despite the efficiency of these methods, they only apply to solvable particular solutions. In particular, van Saarloos’ ansatz and the WTC method seemed to find all nonlinear front solutions when they can be computed.
Nevertheless, it was not understood why they work and to what extent they can be applied. Worse, there are systems for which the nonlinear fronts are proved to exist but whose closed form (and hence their speeds) cannot be computed.37 The simplest example is provided by a slight modification of (3):

$$u_t = u_{xx} + \mu u + u^4 - u^5, \qquad u \in \mathbb{R}. \tag{4}$$

The method proposed by van Saarloos is correct but it can only be applied to a subclass of integrable systems∗:38

$$u_t = u_{xx} + \mu u + \nu u^{\frac{n+1}{2}} - u^n, \qquad u \in \mathbb{R}. \tag{5}$$
In this paper, I use the phase-portrait topology together with the behavior of the solutions around singularities in complex time to prove that the WTC-expansion is valid and does provide the nonlinear front speeds.

2. Basic facts about front selection and front speed

2.1. Formulation of the problem

We consider the PDE:

$$u_t = u_{xx} + f(u), \tag{6}$$

where $u = u(x,t)$, $t > 0$, $x \in \mathbb{R}$ and $u \in \mathbb{C}$. We assume that $f(0) = 0$, $f'(0) > 0$, and $f(u_+) = 0$ for some $u_+ > 0$. A front solution is a solution of the form $u(x,t) = \hat{u}(x - ct) \in \mathbb{R}$ such that:

$$\hat{u}(z) \to 0 \quad \text{as } z \to \infty, \tag{7}$$

$$\hat{u}(z) \to u_+ \quad \text{as } z \to -\infty. \tag{8}$$

As a consequence, the front solution is a heteroclinic connection joining the point $u_+$ to $u = 0$ in the phase space $(u, u_z)$ of the reduced ODE. The reduced ODE is obtained by the traveling wave reduction ($z = x - ct$):

$$u_{zz} + c u_z + f(u) = 0. \tag{9}$$
∗ The term integrable is used here for equations whose heteroclinic solutions can be expressed analytically, not for “completely integrable” equations.
Therefore, the question of existence for fronts in the PDE is equivalent to the existence of heteroclinic connections for the reduced ODE. Now, since c is a free parameter of the system it can be seen that for certain values of the parameters there exists a continuous family of heteroclinic connections with different speeds. The problem is to find the speed selected by the dynamics of the PDE.
2.2. Front selection for scalar equation

In order to find the speed selected by the dynamics, one considers real15 or complex39 perturbations of the front and studies the stability of the solutions. This approach reveals the existence of bands of stable fronts from which the asymptotic speed can be determined. The critical speed is defined as the slowest speed for which positive front solutions exist.39 Therefore, the problem of finding the selected front for the PDE reduces to finding the minimum value of $c$ such that the corresponding heteroclinic connection in the phase space of the reduced system satisfies $u(z) > 0$ for all $z$. If a nonlinear front exists, then it must be a strong heteroclinic connection. Let $W_0^{(-)}$ (resp. $W_0^{(+)}$) be the stable manifold of the origin along the eigendirection $(1, \lambda_-)$ (resp. $(1, \lambda_+)$), with $\lambda_- \le \lambda_+ < 0$. Correspondingly, let $W_{u_+}^{(+)}$ be the unstable manifold of the fixed point $u_+$. The strong (resp. weak) heteroclinic connection $\hat{u}_S$ (resp. $\hat{u}_W$) is the intersection of stable and unstable manifolds (whenever the intersection is not empty):

$$\mathrm{SH}: \quad \hat{u}_S = W_0^{(-)} \cap W_{u_+}^{(+)}, \tag{10}$$

$$\mathrm{WH}: \quad \hat{u}_W = W_0^{(+)} \cap W_{u_+}^{(+)}. \tag{11}$$
For fixed values of the parameters there is at most one strong heteroclinic connection. Therefore, this connection is non-generic. When the strong connection exists, it is steeper than any generic front. Using the idea of the marginal stability approach, the strong heteroclinic (SH) connection is selected as the asymptotic front state whenever it is faster than the linear front. As a consequence, the problem of finding the nonlinear front speed for the PDE is equivalent to looking for SH connections for the reduced ODE. The phase portrait of the reduced ODE is shown in Figure 2. We consider µ = 0.1 and vary c so as to obtain all possible connections.
[Figure 2: five phase portraits in the (u, u_z) plane. Fig. a: oscillatory solutions; Fig. b: marginal stability (linear fronts), c = c*; Fig. c: strong heteroclinic connection, c = c̃; Fig. d: generic connections, c > c̃; Fig. e: weak heteroclinic connection. Annotated values: µ = 0.1, u+ = 1.048, c* = 0.632 (marginal stability), c̃ = 0.789 (SH), c = 1.944 (WH).]
Fig. 2. The phase-portrait of the quintic Fisher-Kolmogorov equation. µ = 0.1 and different values of c are considered. Note that the strong heteroclinic connection is the first connection for which the solution remains strictly positive in the neighborhood of u = 0.
2.3. Linear versus nonlinear front selection In order to understand the major difference between linear and nonlinear front selection we consider now the case where linear selection is achieved and show directly at the level of the PDE the selection of modes:
2.3.1. Linear front selection

To determine the asymptotic linear speed $c^*$, we consider the linearized equation:

$$u_t = u_{xx} + \mu u, \tag{12}$$

where $\mu = f'(0)$. Starting from general, initially localized conditions, we consider the evolution of the initial conditions

$$u(x,0) = u_i(x), \qquad \int_{-\infty}^{+\infty} |u_i|\,dx \ll 1, \tag{13}$$

and predict the asymptotic speed by the method of steepest descent.16,40 For small enough $u$, the solution can be written in Fourier modes

$$u(x,t) = \int_{-\infty}^{+\infty} \hat{u}(k) \exp\left[ikx + \Omega(k)t\right] dk, \tag{14}$$

where

$$\hat{u}(k) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} u_i(x)\, e^{-ikx}\, dx \tag{15}$$

and $\Omega(k) = \mu - k^2$. Let $x = z + ct$; then $u(z,t)$ reads:

$$u(z,t) = \int_{-\infty}^{+\infty} \hat{u}(k) \exp\left[ikz + h(k)t\right] dk, \tag{16}$$

where $h(k) = ick + \Omega(k)$. The integral is dominated by the stationary phase point [41, p. 437]:

$$h'(k^*) = 0 \iff c^* = -2ik^*. \tag{17}$$

Now, the propagation speed of the front is the speed $c^*$ such that the solution neither grows nor decays in time, that is:

$$\Re\left(h(k^*)\right) = 0 \iff c^* = 2\sqrt{\mu}. \tag{18}$$
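The two conditions (17) and (18) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names `growth_rate`, `saddle_point` and `linear_speed` are hypothetical, and the bisection bracket `c_max` is an assumption chosen only to be larger than any speed of interest.

```python
import math


def growth_rate(k, c, mu):
    """Moving-frame growth rate h(k) = i*c*k + Omega(k), with Omega(k) = mu - k**2."""
    return 1j * c * k + mu - k * k


def saddle_point(c):
    """Wavenumber k* solving h'(k) = i*c - 2*k = 0, as in (17)."""
    return 0.5j * c


def linear_speed(mu, c_max=100.0, tol=1e-12):
    """Bisect on c until Re h(k*(c)) = 0, the marginal-stability condition (18)."""
    lo, hi = 0.0, c_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if growth_rate(saddle_point(mid), mid, mu).real > 0.0:
            lo = mid  # the perturbation still grows in this frame: frame too slow
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Compare the bisection result with the closed form 2*sqrt(mu).
    print(linear_speed(0.1), 2.0 * math.sqrt(0.1))
```

Substituting $k^* = ic/2$ by hand gives $h(k^*) = \mu - c^2/4$, so the bisection simply recovers $c^* = 2\sqrt{\mu}$; the numerical route is shown because it carries over to dispersion relations without a closed-form saddle point.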
The “marginal stability” argument proposed by Dee, Langer and van Saarloos is equivalent to the steepest descent method. The idea is to consider
localized perturbations in the moving frame. All fronts with $c > c^*$ are stable while fronts with $c < c^*$ are unstable. A complete proof of the linear selection principle is still lacking. However, this analysis gives a physical picture of the selection of the marginal speed among the allowable front speeds, except when there exists a nonlinear front with speed $\tilde{c} > c^*$. We have seen that $c^*$ corresponds to the fastest mode among a continuum of unstable modes determined by the dispersion relation $h(k)$. However, it is possible to have discrete fronts with speed $\tilde{c} \ge c^*$. These discrete fronts cannot be obtained by the steepest descent method, which requires a continuum of modes for its derivation to be applicable. Moreover, if such a front exists, it depends intrinsically on the nonlinear part of the equation. Nonlinear fronts have been observed in experimental settings such as the iodate oxidation of arsenious acid.42

2.3.2. Nonlinear front selection

To illustrate the problem, consider again the equation

$$u_t = u_{xx} + \mu u + u^3 - u^5, \qquad u \in \mathbb{R}. \tag{19}$$
The linear analysis suggests that the preferred linear speed is given by $c^* = 2\sqrt{\mu}$. However, numerical analysis shows that for small $\mu$ there exists a nonlinear speed $\tilde{c} > c^*$. An analytical study, using van Saarloos’ ansatz or the WTC expansion, reveals the existence of a front solution:

$$u(z) = \frac{u_+\, e^{\tilde{\lambda} z}}{\sqrt{1 + e^{2\tilde{\lambda} z}}}, \tag{20}$$

where $z = x - \tilde{c}t$, and

$$\tilde{c} = -\frac{\mu + \tilde{\lambda}^2}{\tilde{\lambda}}, \qquad \tilde{\lambda} = -\frac{\sqrt{3}}{6}\left(1 + \sqrt{1 + 4\mu}\right). \tag{21}$$

It is striking that this solution is exactly the one obtained by applying the Painlevé test to the reduced ODE:

$$u_{zz} + c u_z + \mu u + u^3 - u^5 = 0. \tag{22}$$

The curves $\tilde{c} = \tilde{c}(\mu)$ and $c^* = c^*(\mu)$ are plotted in Figure 3. For $\mu \le \mu_c = 3/4$, the selected front speed for the PDE is given by $\tilde{c}$, and by $c^*$ for $\mu \ge \mu_c$.
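The closed-form front can be sanity-checked numerically. The sketch below rests on two assumptions that are stated here rather than taken from the text: that $u_+$ is the positive root of $\mu + u^2 - u^4 = 0$, and that the speed is tied to the decay rate through the linearization of (22) at $u = 0$, i.e. $\tilde{\lambda}^2 + \tilde{c}\tilde{\lambda} + \mu = 0$. All function names are hypothetical.

```python
import math


def u_plus(mu):
    """Positive root of mu + u**2 - u**4 = 0 (the state behind the front)."""
    return math.sqrt((1.0 + math.sqrt(1.0 + 4.0 * mu)) / 2.0)


def lam_tilde(mu):
    """Decay rate of the exact front, as in (21)."""
    return -(math.sqrt(3.0) / 6.0) * (1.0 + math.sqrt(1.0 + 4.0 * mu))


def c_tilde(mu):
    """Nonlinear speed from lam**2 + c*lam + mu = 0 (linearization of (22) at u = 0)."""
    lam = lam_tilde(mu)
    return -(mu + lam * lam) / lam


def c_star(mu):
    """Linear (marginal stability) speed."""
    return 2.0 * math.sqrt(mu)


def front(z, mu):
    """Exact front profile (20)."""
    e = math.exp(lam_tilde(mu) * z)
    return u_plus(mu) * e / math.sqrt(1.0 + e * e)


def residual(z, mu, h=1e-3):
    """Finite-difference residual of the reduced ODE (22) evaluated on the profile (20)."""
    u0, up, um = front(z, mu), front(z + h, mu), front(z - h, mu)
    uz = (up - um) / (2.0 * h)
    uzz = (up - 2.0 * u0 + um) / (h * h)
    return uzz + c_tilde(mu) * uz + mu * u0 + u0 ** 3 - u0 ** 5


if __name__ == "__main__":
    for mu in (0.1, 0.5, 0.75, 1.0):
        print(mu, c_star(mu), c_tilde(mu))
```

For µ = 0.1 this gives speeds that round to the values 0.632 and 0.789 annotated in Figure 2, and the two curves meet at µ = 3/4.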
[Figure 3: plot of the speed c against µ showing the two curves c̃ = c̃(µ) and c* = c*(µ), which meet at µc = 3/4.]
Fig. 3. The linear and nonlinear front speeds for the quintic Fisher-Kolmogorov equation.
2.4. Integrability theory

The fact that integrability methods can be applied to find the nonlinear front solution raises an interesting question: to what extent can integrability methods be used effectively? This question has been answered by Powell and Tabor.37 They show that the largest class of systems of the form (6) for which either the WTC or the van Saarloos method can be applied is given by the class:

$$u_{zz} + c u_z + \mu u + \nu u^{\frac{n+1}{2}} - u^n = 0. \tag{23}$$
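The nonlinear speed of which is given (for the root with $u_+ > 0$ and $\tilde{\lambda} < 0$) by:

$$\tilde{c} = \frac{(2\beta + 1)\sqrt{\nu^2 + 4\mu} - \nu}{2\sqrt{\beta(\beta + 1)}}, \tag{24}$$

with $\beta = \frac{2}{n-1}$. The front solution then reads:

$$\hat{u} = u_+ \left(\frac{e^{\alpha z}}{1 + e^{\alpha z}}\right)^{\beta}, \tag{25}$$

where $\alpha\beta = \tilde{\lambda}$ and

$$\tilde{\lambda} = -\frac{1}{2}\sqrt{\frac{\beta}{\beta + 1}}\left(\nu + \sqrt{\nu^2 + 4\mu}\right). \tag{26}$$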
When $\nu > 0$ and $\mu < \mu_c = \frac{2(n+1)}{(n-1)^2}$, the nonlinear front solution is faster and steeper than the linear fronts.
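The ansatz $\hat{u} = u_+\,(e^{\alpha z}/(1+e^{\alpha z}))^{\beta}$ can be tested directly against the reduced ODE of the class. The sketch below is an independent check under stated assumptions, not a transcription of the formulas above: substituting the ansatz and matching powers gives $u_+^{(n-1)/2} = (\nu + \sqrt{\nu^2 + 4\mu})/2$, $\alpha = -u_+^{(n-1)/2}/\sqrt{\beta(\beta+1)}$ and $\tilde{c} = -(\mu + \tilde{\lambda}^2)/\tilde{\lambda}$ with $\tilde{\lambda} = \alpha\beta$. The function names are hypothetical.

```python
import math


def class_front(n, mu, nu):
    """Parameters (beta, u_plus, alpha, c) of the ansatz front for the class ODE."""
    beta = 2.0 / (n - 1.0)
    amp_m = 0.5 * (nu + math.sqrt(nu * nu + 4.0 * mu))  # u_plus ** ((n - 1) / 2)
    u_plus = amp_m ** beta
    alpha = -amp_m / math.sqrt(beta * (beta + 1.0))
    lam = alpha * beta                                   # decay rate at u = 0
    c = -(mu + lam * lam) / lam                          # from lam**2 + c*lam + mu = 0
    return beta, u_plus, alpha, c


def profile(z, n, mu, nu):
    """Front profile u_plus * (e^{alpha z} / (1 + e^{alpha z})) ** beta."""
    beta, u_plus, alpha, _ = class_front(n, mu, nu)
    e = math.exp(alpha * z)
    return u_plus * (e / (1.0 + e)) ** beta


def residual(z, n, mu, nu, h=1e-3):
    """Finite-difference residual of u'' + c u' + mu u + nu u^((n+1)/2) - u^n."""
    c = class_front(n, mu, nu)[3]
    u0 = profile(z, n, mu, nu)
    up = profile(z + h, n, mu, nu)
    um = profile(z - h, n, mu, nu)
    uz = (up - um) / (2.0 * h)
    uzz = (up - 2.0 * u0 + um) / (h * h)
    return uzz + c * uz + mu * u0 + nu * u0 ** ((n + 1.0) / 2.0) - u0 ** n


if __name__ == "__main__":
    print(class_front(5, 0.1, 1.0))  # quintic case, nu = 1
    print(class_front(3, 0.1, 1.0))  # cubic case
```

In this sketch the nonlinear and linear speeds coincide at $\mu = \nu^2\beta(\beta+1)$, which equals $2(n+1)/(n-1)^2$ when $\nu = 1$.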
2.5. A counterexample

All the exact methods are based on the same idea: if we can compute the strong heteroclinic connection, then it gives the nonlinear speed. This may seem obvious, but an analytic form is not always available. Can we still compute the nonlinear speeds? Is there a simple way to obtain the speeds without computing the solution? The reduced ODE for system (4) is:

$$u_{zz} + c u_z + \mu u + u^4 - u^5 = 0. \tag{27}$$
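Even without a closed form, the selected speed can be estimated numerically from the criterion of Section 2.2: it is the smallest c for which the orbit leaving $u_+$ reaches the origin with u > 0 throughout. The following is a rough shooting sketch under that assumption; the function names, step size, tolerances and the bracket [c_lo, c_hi] are all illustrative choices, and the scheme (fixed-step RK4 with bisection on c) is only one of many possible ones.

```python
import math


def min_positive_speed(f, df, u_plus, c_lo, c_hi, tol=1e-3, dz=0.02, z_max=200.0):
    """Smallest c in [c_lo, c_hi] for which the orbit leaving (u_plus, 0) stays positive.

    Assumes the orbit becomes negative at c_lo and stays positive at c_hi.
    """

    def stays_positive(c):
        # Unstable eigenvalue at the saddle (u_plus, 0): lam^2 + c*lam + f'(u_plus) = 0.
        lam = 0.5 * (-c + math.sqrt(c * c - 4.0 * df(u_plus)))
        eps = 1e-6
        u, p = u_plus - eps, -eps * lam      # small step along the unstable direction

        def rhs(u_, p_):
            return p_, -c * p_ - f(u_)       # u' = p, p' = -c p - f(u)

        for _ in range(int(z_max / dz)):
            k1u, k1p = rhs(u, p)
            k2u, k2p = rhs(u + 0.5 * dz * k1u, p + 0.5 * dz * k1p)
            k3u, k3p = rhs(u + 0.5 * dz * k2u, p + 0.5 * dz * k2p)
            k4u, k4p = rhs(u + dz * k3u, p + dz * k3p)
            u += dz * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
            p += dz * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
            if u < 0.0:
                return False                 # the orbit overshoots the origin
            if u < 1e-10 and abs(p) < 1e-10:
                return True                  # converged to the origin from u > 0
        return True

    while c_hi - c_lo > tol:
        mid = 0.5 * (c_lo + c_hi)
        if stays_positive(mid):
            c_hi = mid
        else:
            c_lo = mid
    return c_hi


def positive_root(g, lo, hi, tol=1e-13):
    """Bisection for a root of g in [lo, hi], assuming g(lo) > 0 > g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


MU = 0.1


def quintic_speed():
    """Check case: the quintic equation (22), whose speed is known in closed form."""
    u_p = math.sqrt((1.0 + math.sqrt(1.0 + 4.0 * MU)) / 2.0)
    return min_positive_speed(lambda u: MU * u + u ** 3 - u ** 5,
                              lambda u: MU + 3.0 * u ** 2 - 5.0 * u ** 4,
                              u_p, 0.1, 3.0)


def counterexample_speed():
    """Same procedure applied to (27), for which no closed form is available."""
    u_p = positive_root(lambda u: MU + u ** 3 - u ** 4, 1.0, 2.0)
    return min_positive_speed(lambda u: MU * u + u ** 4 - u ** 5,
                              lambda u: MU + 4.0 * u ** 3 - 5.0 * u ** 4,
                              u_p, 0.1, 3.0)


if __name__ == "__main__":
    print("quintic:", quintic_speed())
    print("equation (27):", counterexample_speed())
```

The quintic case serves as a check, since its speed is known from the exact front; the value printed for (27) is only a numerical estimate and carries the discretization error of the scheme.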
Powell and Tabor presented this system as a counterexample for the analytic approach. Using standard dynamical systems arguments, it is possible to prove the existence of an SH connection for (27). However, this system does not fall in the class amenable to the techniques relying on exact integration.

3. Front existence and propagation in a 2D system

We now turn our attention to a reaction-diffusion equation modeling the spread of chemotactic bacteria swarming on an agar plate. The existence and properties of travelling waves will be explored both analytically and numerically. The model tracks the density of bacteria and the density of a nutrient to which the bacteria are assumed to be chemotactically attracted. The equation for the bacteria is:

$$b_t(x,t) = D_b \Delta b - \Gamma \nabla \cdot (b \nabla a) + k_a ab, \tag{28}$$

where $D_b$ is the rate of bacterial diffusion, $\Gamma$ is the strength of the bacteria’s chemotactic response, and $k_a$ is the rate at which new bacteria are produced in the presence of the nutrient. The equation for the nutrient density takes the form:

$$a_t(x,t) = D_a \Delta a - k_b ab. \tag{29}$$
Here $D_a$ characterizes the rate at which the nutrient diffuses through the agar and $k_b$ is the rate at which the nutrient is consumed in the presence of bacteria. Since we wish to discuss travelling waves, we restrict the equations to one spatial dimension and apply the boundary conditions

$$a(x,t) \to a_0 \quad \text{as } x \to \infty, \tag{30}$$

$$b(x,t) \to 0 \quad \text{as } x \to \infty. \tag{31}$$

Note that if $\Gamma$ is set equal to zero, we have just the equations for quadratic autocatalysis of a reactant a by an autocatalyst b. Travelling waves of this system have been the subject of several papers by Billingham and Needham.43–45 If in addition $D_a = D_b$, the system reduces (upon addition and subtraction of the two equations) to the well-known Fisher equation. We can reduce the number of parameters in the equations if we introduce the dimensionless variables

$$\alpha = \frac{a}{a_0}, \qquad \beta = \frac{k_b b}{k_a a_0}, \qquad \tau = k_a a_0 t, \qquad \xi^2 = \frac{k_a a_0 x^2}{D_a}, \tag{32}$$

which transform the system of equations to

$$\alpha_\tau = \alpha_{\xi\xi} - \alpha\beta, \tag{33}$$

$$\beta_\tau = \delta\beta_{\xi\xi} - \gamma(\beta\alpha_\xi)_\xi + \alpha\beta, \tag{34}$$

where $\delta = D_b/D_a$, $\gamma = a_0\Gamma/D_a$, and the boundary conditions are $\alpha(\xi,\tau) \to 1$, $\beta(\xi,\tau) \to 0$ as $\xi \to \infty$.

Following Billingham and Needham,43 we define a traveling wave.

Definition 1. A permanent form travelling wave solution of equations (33,34) is a non-trivial, non-negative solution that depends only on the single variable z = x − c(t), where c(t) is the position of the wavefront, and satisfies the conditions α → 1, β → 0 as ξ → +∞ and α → α−∞, β → β−∞ as ξ → −∞, where α−∞, β−∞ are the uniform, non-negative concentrations behind the wavefront.

3.1. General properties of travelling wave solutions

Travelling waves are obtained by seeking a solution of the form α ≡ α(z), β ≡ β(z). Substituting this into (33,34) gives
$$\alpha_{zz} + v\alpha_z - \alpha\beta = 0, \tag{35}$$

$$\delta\beta_{zz} + v\beta_z - \gamma(\beta\alpha_z)_z + \alpha\beta = 0, \tag{36}$$

where $v = dc/dt$ is a constant. We will now use the above definition of a travelling wave to prove some elementary results about travelling wave solutions of the system (33,34). These results are similar to the results obtained by Billingham and Needham,43 and we give their proofs with only slight modifications.

Proposition 1. A permanent form travelling wave solution of equations (35,36) has α > 0, β > 0 for all −∞ < z < ∞.

Proof. Let α(z), β(z) be a permanent form travelling wave solution and suppose that there exists a $z_0$ such that $\alpha(z_0) = 0$. Then, since α(z) is non-negative, it must be that $\alpha_z(z_0) = 0$. Equation (35) can be thought of as a second-order, linear, ordinary differential equation for α(z) with β acting as a coefficient. As such, any initial value problem for α(z) has a unique solution in −∞ < z < ∞. Equation (35) together with $\alpha(z_0) = 0$, $\alpha_z(z_0) = 0$ forms an initial value problem for α(z) with the unique solution α(z) ≡ 0 for −∞ < z < ∞, which contradicts α → 1 as z → +∞. A similar argument demonstrates the result for β.

Proposition 2. A permanent form travelling wave solution of Equations (35, 36) has α → 0, β → 1 as z → −∞.

Proof. Let α(z), β(z) be a permanent form travelling wave solution. As z → −∞, $\alpha \to \alpha_{-\infty}$ and so $\alpha_{zz} \to 0$, $\alpha_z \to 0$. Thus, from Equations (35, 36), either $\alpha_{-\infty} = 0$ or $\beta_{-\infty} = 0$. After integrating Equations (35, 36) with respect to z on the range −∞ < z < ∞ we obtain

$$\int_{-\infty}^{\infty} \alpha\beta\, dz = v(1 - \alpha_{-\infty}), \qquad \int_{-\infty}^{\infty} \alpha\beta\, dz = v\beta_{-\infty}, \tag{37}$$
which shows that $\alpha_{-\infty} + \beta_{-\infty} = 1$, and therefore either $\alpha_{-\infty} = 0$, $\beta_{-\infty} = 1$ or $\alpha_{-\infty} = 1$, $\beta_{-\infty} = 0$. However, from (37) and Proposition 1, $\beta_{-\infty} \neq 0$, and therefore $\alpha_{-\infty} = 0$, $\beta_{-\infty} = 1$ and the proposition is established.

Proposition 3. A permanent form travelling wave solution of Equations (35, 36) is strictly monotone increasing in α, and (if γ ≤ 1) monotone
decreasing in β, with 0 < α < 1 and 0 < β < 1 for −∞ < z < ∞.

Proof. Let α(z), β(z) be a permanent form travelling wave solution. Suppose $\alpha_z(z)$ has more than one zero in −∞ < z < ∞. Let $z_n$ and $z_{n+1}$ be two consecutive zeros of $\alpha_z(z)$ with $z_n < z_{n+1}$. Then, using Equation (35) and Proposition 1, we have that $\alpha_{zz}(z_{n+1}) > 0$ and hence $\alpha_z(z) < 0$ for all $z_n < z < z_{n+1}$. Thus $\alpha_{zz}(z_n) \le 0$. However, from Equation (35) and Proposition 1, we obtain $\alpha_{zz}(z_n) > 0$, a contradiction which implies that $\alpha_z(z)$ has at most one zero for −∞ < z < ∞. Suppose now that $\alpha_z(z)$ has exactly one zero in −∞ < z < ∞ at $z = z_0$. Since $\alpha_z(z_0) = 0$, Equation (35) and Proposition 1 show that $\alpha_{zz}(z_0) > 0$, and hence $\alpha_z(z) < 0$ for all $-\infty < z < z_0$. Therefore, on integrating $\alpha_z$ with respect to z on the range $-\infty < z < z^*$, we obtain, on using Proposition 2,

$$\int_{-\infty}^{z^*} \alpha_z\, dz = \alpha(z^*) < 0$$

for any $-\infty < z^* < z_0$, which violates Proposition 1. Thus we conclude that $\alpha_z \neq 0$ for any −∞ < z < ∞. Also, from Proposition 2, α → 0 as z → −∞ and α → 1 as z → +∞, and so α(z) is strictly monotone increasing, with 0 < α < 1 for −∞ < z < ∞.

Because of the chemotactic term, the above proof cannot be adapted to β(z). We can, however, use a phase space argument. We first note that we can solve Equation (35) for αβ in terms of derivatives of α and substitute this into Equation (36) to get:

$$\delta\beta_{zz} + \alpha_{zz} + v(\alpha_z + \beta_z) - \gamma(\beta\alpha_z)_z = 0.$$

Integrating once in z and applying the boundary conditions gives

$$\delta\beta_z + \alpha_z + v(\alpha + \beta - 1) - \gamma\beta\alpha_z = 0,$$

which can be solved for $\beta_z$. (Alternatively, this could be solved for $\alpha_z$ which, as we shall see later, may at times be more convenient.) We then have a system of three first-order ODEs:

$$\beta_z = \delta^{-1}\left(v(1 - \alpha - \beta) + (\gamma\beta - 1)w\right), \tag{38}$$

$$\alpha_z = w, \tag{39}$$

$$w_z = \alpha\beta - vw, \tag{40}$$

with fixed points at (α, β, w) = (0, 1, 0) and (1, 0, 0).
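A trajectory of the system (38)-(40) can be followed numerically by starting close to (0, 1, 0) on its unstable direction and integrating forward in z. The sketch below is illustrative only: the function name `travelling_wave`, the fixed-step RK4 scheme, and the parameter values (v = 1, δ = 0.1, γ = 0.05) are assumptions made here. It uses the fact that the linearization of (39)-(40) at β = 1 gives the unstable eigenvalue as the positive root of λ² + vλ − 1 = 0.

```python
import math


def travelling_wave(v, delta, gamma, dz=0.01, z_max=60.0, eps=1e-6):
    """Integrate (38)-(40) from near (alpha, beta, w) = (0, 1, 0); return (alpha, beta) samples."""
    lam = 0.5 * (-v + math.sqrt(v * v + 4.0))             # root of lam^2 + v*lam - 1 = 0
    b = ((gamma - 1.0) * lam - v) / (delta * lam + v)     # beta-component of the eigenvector
    state = (eps, 1.0 + eps * b, eps * lam)               # small step along (1, b, lam)

    def rhs(s):
        a, be, w = s
        return (w,
                (v * (1.0 - a - be) + (gamma * be - 1.0) * w) / delta,
                a * be - v * w)

    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    path = [(state[0], state[1])]
    for _ in range(int(z_max / dz)):
        k1 = rhs(state)
        k2 = rhs(shifted(state, k1, 0.5 * dz))
        k3 = rhs(shifted(state, k2, 0.5 * dz))
        k4 = rhs(shifted(state, k3, dz))
        state = tuple(s + dz * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
                      for s, a1, a2, a3, a4 in zip(state, k1, k2, k3, k4))
        path.append((state[0], state[1]))
    return path


if __name__ == "__main__":
    wave = travelling_wave(v=1.0, delta=0.1, gamma=0.05)
    print("final (alpha, beta):", wave[-1])
```

For these sample values the computed profile can be inspected for the properties stated in Propositions 1 to 3 (α increasing, β decreasing, both between 0 and 1).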
Linear stability shows that (0, 1, 0) is a saddle with one unstable eigenvector and (1, 0, 0) is a stable node or focus, so that a travelling wave corresponds to a heteroclinic orbit leaving along the unstable eigenvector at (0, 1, 0) and approaching (1, 0, 0) as z → ∞. By examining the surface on which $\beta_z = 0$ it can be shown that if γ ≤ 1, it is impossible for trajectories to cross through the β null-cline from the side where $\beta_z < 0$ to the side where $\beta_z > 0$. Furthermore, the unstable eigenvector points into the region of phase space where $\beta_z < 0$. Therefore, $\beta_z \le 0$.

Indeed, we need to show that if γ ≤ 1 it is impossible for trajectories to cross through the β null-cline from the side where $\beta_z < 0$ to the side where $\beta_z > 0$. Recall that our system of equations took the form

$$\beta_z = \delta^{-1}\left(v(1 - \alpha - \beta) + (\gamma\beta - 1)w\right), \tag{41}$$

$$\alpha_z = w, \tag{42}$$

$$w_z = \alpha\beta - vw. \tag{43}$$

Therefore, $\beta_z = 0$ implies that for fixed $\beta = \beta_0$, $w = v(\alpha + \beta_0 - 1)/(\gamma\beta_0 - 1)$. Since $\beta_{0z} = 0$, we need only show that

$$\frac{w(\alpha, \beta_0)_z}{\alpha_z} > \frac{-v}{1 - \gamma\beta},$$

where 0 < β ≤ 1, 0 ≤ α < 1 − β and γ ≤ 1. Since α ≥ 0 and (α + β − 1) < (1 − β + β − 1) = 0, it is clear that

$$\alpha(\gamma\beta - 1)^2 \ge 0 > v^2\gamma(\alpha + \beta - 1).$$

Multiplying through by β,

$$\alpha\beta(\gamma\beta - 1)^2 \ge v^2\gamma\beta(\alpha + \beta - 1).$$

Dividing both sides by the positive quantity $(\gamma\beta - 1)^2$,

$$\alpha\beta \ge \frac{v^2\gamma\beta(\alpha + \beta - 1)}{(\gamma\beta - 1)^2},$$

which is

$$\alpha\beta \ge \frac{vw\gamma\beta}{\gamma\beta - 1} = w\left(v - \frac{v}{1 - \gamma\beta}\right).$$

Subtract vw from both sides and divide by w to obtain

$$\frac{\alpha\beta - vw}{w} = \frac{w_z}{\alpha_z} \ge -\frac{v}{1 - \gamma\beta}.$$

Interestingly, if $v < \sqrt{\gamma} - \frac{1}{\sqrt{\gamma}}$ then the unstable eigenvector points into the region of phase space where $\beta_z > 0$ and β > 1 so that travelling waves,
if they exist, must have a region where β increases before decreasing to zero.

Proposition 4. A permanent form travelling wave solution of Equations (35, 36) has α + β ≥ 1 if δ ≥ 1 and γ ≤ 1.

Proof. Let δ ≥ 1 and γ ≤ 1. On addition, Equations (35,36) may be integrated once to yield, after application of the conditions in Proposition 2,

$$(\alpha + \beta)_z + v(\alpha + \beta) = (1 - \delta)\beta_z + \gamma\beta\alpha_z + v \ge v \quad \text{for } -\infty < z < \infty.$$

Integrating this inequality and applying the condition (α + β) → 1 as z → ∞ establishes the desired result.

In Proposition 3, if we had solved our conservation law for $\alpha_z$ instead of $\beta_z$, we would have obtained the third-order system

$$\alpha_z = \frac{v(1 - \alpha - \beta) - \delta w}{1 - \gamma\beta}, \tag{44}$$

$$\beta_z = w, \tag{45}$$

$$w_z = -\delta^{-1}\left(\alpha\beta + vw - \gamma(w\alpha_z + \beta^2\alpha - v\beta\alpha_z)\right). \tag{46}$$
As before, this system has only two equilibrium points, at (α, β, w) = (0, 1, 0) and (1, 0, 0), and we seek a heteroclinic connection between these two points. It is appropriate to first consider the linear behavior near these two points. Linearizing about (0, 1, 0) shows that the equilibrium point is a saddle with a two-dimensional stable manifold and a one-dimensional unstable manifold. The eigenvalues and corresponding eigenvectors are:

$$\lambda_1 = -v\delta^{-1}, \tag{47}$$

$$e_{\lambda_1} = (0, 1, -v\delta^{-1})^T, \tag{48}$$

$$\lambda_2 = \tfrac{1}{2}\left(\sqrt{v^2 + 4} - v\right), \tag{49}$$

$$e_{\lambda_2} = \left(-\frac{v + \delta\lambda_2}{v + \lambda_2(1 - \gamma)}, 1, \lambda_2\right)^T, \tag{50}$$

$$\lambda_3 = -\tfrac{1}{2}\left(\sqrt{v^2 + 4} + v\right), \tag{51}$$

$$e_{\lambda_3} = \left(-\frac{v + \delta\lambda_3}{v + \lambda_3(1 - \gamma)}, 1, \lambda_3\right)^T. \tag{52}$$
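A linearization of this kind is easy to cross-check numerically: build the Jacobian of (44)-(46) by finite differences and evaluate det(J − λI) at candidate eigenvalues. The sketch below does this at (0, 1, 0) and, for comparison, at (1, 0, 0). The candidates tested are assumptions stated here: −v/δ together with the roots of λ² + vλ − 1 = 0 (the linearization of (35) at β = 1), and −v together with the roots of δλ² + vλ + 1 = 0 (the linearization of (36) at α = 1, β = 0). Function names and parameter values are illustrative.

```python
import math


def rhs(state, v, delta, gamma):
    """Right-hand side of the third-order system (44)-(46)."""
    a, b, w = state
    az = (v * (1.0 - a - b) - delta * w) / (1.0 - gamma * b)
    bz = w
    wz = -(a * b + v * w - gamma * (w * az + b * b * a - v * b * az)) / delta
    return (az, bz, wz)


def jacobian(point, v, delta, gamma, h=1e-6):
    """Central-difference Jacobian of rhs at the given point."""
    cols = []
    for j in range(3):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        fp, fm = rhs(plus, v, delta, gamma), rhs(minus, v, delta, gamma)
        cols.append([(fp[i] - fm[i]) / (2.0 * h) for i in range(3)])
    return [[cols[j][i] for j in range(3)] for i in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def char_poly(jac, lam):
    """det(J - lam*I); zero (up to rounding) when lam is an eigenvalue."""
    shifted = [[jac[i][j] - (lam if i == j else 0.0) for j in range(3)] for i in range(3)]
    return det3(shifted)


V, DELTA, GAMMA = 1.0, 0.1, 0.3   # sample values with V >= 2*sqrt(DELTA)

ROOT = math.sqrt(V * V + 4.0)
CANDIDATES_010 = [-V / DELTA, 0.5 * (ROOT - V), -0.5 * (ROOT + V)]

DISC = math.sqrt(V * V - 4.0 * DELTA)
CANDIDATES_100 = [-V, -(V + DISC) / (2.0 * DELTA), -(V - DISC) / (2.0 * DELTA)]

if __name__ == "__main__":
    j010 = jacobian((0.0, 1.0, 0.0), V, DELTA, GAMMA)
    j100 = jacobian((1.0, 0.0, 0.0), V, DELTA, GAMMA)
    print([char_poly(j010, lam) for lam in CANDIDATES_010])
    print([char_poly(j100, lam) for lam in CANDIDATES_100])
```

Exactly one candidate at (0, 1, 0) is positive, which matches the saddle structure with a one-dimensional unstable manifold described above.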
Linearization about (1, 0, 0) shows that the fixed point is stable, with eigenvalues and eigenvectors:
$$\mu_1 = -v, \tag{53}$$

$$e_{\mu_1} = (1, 0, 0)^T, \tag{54}$$

$$\mu_2 = -\tfrac{1}{2}\delta^{-1}\left(v + \sqrt{v^2 - 4\delta}\right), \tag{55}$$

$$e_{\mu_2} = \left(-\frac{\delta\mu_2 + v}{v + \mu_2}, 1, \mu_2\right)^T, \tag{56}$$

$$\mu_3 = -\tfrac{1}{2}\delta^{-1}\left(v - \sqrt{v^2 - 4\delta}\right), \tag{57}$$

$$e_{\mu_3} = \left(-\frac{\delta\mu_3 + v}{v + \mu_3}, 1, \mu_3\right)^T. \tag{58}$$

When $v < 2\sqrt{\delta}$, $\mu_2$ and $\mu_3$ become complex, causing trajectories to oscillate as they approach (1, 0, 0), inconsistent with our definition of a travelling wave. As a result, we have

Proposition 5. There exist no permanent form travelling wave solutions for $v < 2\sqrt{\delta}$.

Proposition 6. A permanent form travelling wave solution of equations (33,34) exists for each $4\delta \le v^2 \le 1/(\gamma(1 + 1/(2\delta)))$. Note that this set is empty unless γ is sufficiently small.

Proof. Define the region R = {(α, β, w) : 0 ≤ α ≤ 1, 0 ≤ β ≤ 1, −(vβ/2δ) ≤ w ≤ 0}. Let $4\delta \le v^2 \le 1/(\gamma(1 + 1/(2\delta)))$. We first show that all orbits crossing through the faces of R are directed strictly into R. Recall that (α, β, w) satisfy the ODE system

$$\alpha_z = \frac{v(1 - \alpha - \beta) - \delta w}{1 - \gamma\beta}, \tag{59}$$

$$\beta_z = w, \tag{60}$$

$$w_z = -\delta^{-1}\left(\alpha\beta + vw - \gamma(\beta\alpha_z)_z\right). \tag{61}$$
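We check two of the faces of R. First, on the face w = 0, 0 ≤ α ≤ 1, 0 ≤ β ≤ 1,

$$\alpha_z = \frac{v(1 - \alpha - \beta)}{1 - \gamma\beta}, \qquad \beta_z = 0, \qquad w_z = -\delta^{-1}\left(\alpha\beta - \gamma(\alpha\beta^2 - v\beta\alpha_z)\right),$$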
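or

$$w_z = -\delta^{-1}\left(\alpha\beta - \gamma\alpha\beta^2 + \frac{v^2\gamma\beta(1 - \alpha - \beta)}{1 - \gamma\beta}\right).$$

We see that $w_z < 0$ by writing

$$w_z = -\delta^{-1}\left(\alpha\beta\left(1 - \gamma\beta - \frac{v^2\gamma}{1 - \gamma\beta}\right) + \frac{v^2\gamma\beta(1 - \beta)}{1 - \gamma\beta}\right).$$

For $w_z < 0$ it suffices that $1 - \gamma\beta - v^2\gamma/(1 - \gamma\beta)$ and $v^2\gamma\beta(1 - \beta)/(1 - \gamma\beta)$ be non-negative. The second quantity is clearly non-negative by the assumption on β, while

$$1 - \gamma\beta - \frac{v^2\gamma}{1 - \gamma\beta} \ge 1 - \gamma - \frac{v^2\gamma}{1 - \gamma}.$$

The condition $1 - \gamma - v^2\gamma/(1 - \gamma) \ge 0$ is equivalent to $\gamma^2 - (2 + v^2)\gamma + 1 \ge 0$, and by the assumption that $4\delta \le v^2 \le \frac{1}{\gamma(1 + 1/(2\delta))}$,

$$\gamma^2 - (2 + v^2)\gamma + 1 \ge \gamma^2 - \frac{2 + v^2}{v^2 + v^2/(2\delta)} + 1 \ge \gamma^2 - \frac{2 + v^2}{v^2 + 2} + 1 = \gamma^2 > 0.$$

It is easily seen that on the edges of this face $\alpha_z$ has the appropriate sign. On the face w = −vβ/(2δ), 0 ≤ α ≤ 1, 0 ≤ β ≤ 1,

$$\alpha_z = \frac{v(1 - \alpha - \frac{\beta}{2})}{1 - \gamma\beta}, \tag{62}$$

$$\beta_z = -\frac{v\beta}{2\delta}, \tag{63}$$

$$w_z = -\delta^{-1}\left(\alpha\beta - \frac{v^2\beta}{2\delta} - \gamma\left(\alpha\beta^2 - v\beta\alpha_z - \frac{v\beta}{2\delta}\alpha_z\right)\right). \tag{64}$$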
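In this case it must be shown that the ratio $\frac{w_z}{\beta_z} < -\frac{v}{2\delta}$. Notice that $\gamma \le \frac{1}{v^2 + v^2/(2\delta)} \le \frac{1}{v^2 + 2} \le \frac{1}{2}$, which implies that $\frac{1}{2\gamma} \ge 1$, so that

$$\frac{v^2}{4\delta} - 1 + \alpha - \alpha + \gamma\alpha\beta + \left(\alpha + \frac{1}{2\gamma} - 1\right)\sum_{n=1}^{\infty}(\gamma\beta)^n > 0,$$

but

$$\frac{v^2}{4\delta} - 1 + \alpha - \alpha + \gamma\alpha\beta + \left(\alpha + \frac{1}{2\gamma} - 1\right)\sum_{n=1}^{\infty}(\gamma\beta)^n = \frac{v^2}{4\delta} - \alpha + \gamma\alpha\beta - \left(1 - \alpha - \frac{\beta}{2}\right)\sum_{n=0}^{\infty}(\gamma\beta)^n$$

$$\le \frac{v^2}{4\delta} - \alpha + \gamma\alpha\beta - \gamma\left(v^2 + \frac{v^2}{2\delta}\right)\frac{1 - \alpha - \frac{\beta}{2}}{1 - \gamma\beta} = -\frac{v}{2}\,\frac{w_z}{\beta_z} - \frac{v^2}{4\delta}.$$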
The remaining faces and edges must also be checked, but the calculations are less involved and are left to the reader. Therefore any orbit that enters R must remain in R. Since $\beta_z = w \le 0$ within R, any orbit which starts within R or strictly enters R is monotone decreasing in β as z increases. However, the orbit must be bounded below by the edge of R along β = 0. Therefore, all these orbits must come arbitrarily close to (1, 0, 0). In particular, the unstable manifold of the point (0, 1, 0) strictly enters R and must remain within R until it finally connects with (1, 0, 0).

Notice that we have only shown a connection for a very limited subset of the (δ, γ, v) parameter space; in particular, γ must be fairly small and in no case can it be larger than 1/2. It would be interesting to know if travelling waves exist for larger values of γ. In the next section, we use numerical simulations of both the ODE system (35,36) and the original PDE (33,34) to gain insight into this question.

3.2. Numerical Investigation

A numerical routine was written in order to integrate the system of PDE

$$\alpha_\tau - v\alpha_\xi = \alpha_{\xi\xi} - \alpha\beta,$$

$$\beta_\tau - v\beta_\xi = \delta\beta_{\xi\xi} - \gamma(\beta\alpha_\xi)_\xi + \alpha\beta$$

over a finite, one-dimensional domain −c ≤ ξ ≤ c. Initial conditions were specified as

$$\alpha(\xi, 0) = 0, \quad \beta(\xi, 0) = 1, \qquad -c \le \xi < 0,$$

$$\alpha(\xi, 0) = 1, \quad \beta(\xi, 0) = 0, \qquad 0 \le \xi < c.$$
These initial conditions were allowed to evolve until a travelling wave developed, and then the value of v that allowed the wave to remain stationary was recorded. The ODE integrating program XPP was used to investigate solutions of

$$\beta_z = \delta^{-1}\left(v(1 - \alpha - \beta) + (\gamma\beta - 1)w\right), \tag{65}$$

$$\alpha_z = w, \tag{66}$$

$$w_z = \alpha\beta - vw \tag{67}$$
for the same parameter values as were used in the PDE simulation. Using XPP, we determined the minimum values of v that produced numerical results consistent with the properties of permanent form travelling wave solutions. These values of v were then compared to the values of v evolving in the numerical PDE experiments. Table 1 summarizes these results. The linear stability analysis about the fixed points gives the velocity selection criterion $v \ge 2\sqrt{\delta}$. It seems clear from Table 1 that as γ increases above a certain threshold, the original velocity selection criterion is replaced by one that depends on γ as well as δ. The numerical solution also suggests that traveling waves exist for values of γ that are well above the threshold in Proposition 6. Earlier it was noted that $v > \sqrt{\gamma} - \frac{1}{\sqrt{\gamma}}$ for γ > 1 was a necessary condition for monotonicity of β. While experimenting numerically using XPP over a broad range of the parameters δ and γ, it was observed that when v is below this threshold, β always diverged to +∞. When v is above this threshold, numerical simulations indicate that a heteroclinic connection exists, though usually when v is too close to the threshold, β becomes negative before entering the node at (1, 0, 0). This is reflected in Table 1, where it can be seen that all values of v are above this threshold, though usually farther above as δ gets larger.
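The comparison described above can be reproduced from the tabulated speeds. The sketch below combines the two necessary conditions into a single lower bound and compares it with the PDE speeds from Table 1 for δ = 0.1, 0.5 and 1.0. Two points are assumptions made here: the monotonicity threshold is taken as √γ − 1/√γ (the value at which the unstable eigenvector at (0, 1, 0) switches between β decreasing and β increasing, relevant only for γ > 1), and a slack of 0.03 is allowed for the two-decimal rounding and numerical error of the tabulated values. The names `lower_bound`, `GAMMAS` and `PDE_V` are hypothetical.

```python
import math

# Rows of Table 1, from gamma = 6.0 down to gamma = 0.
GAMMAS = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.0]

# "PDE v" columns of Table 1, keyed by delta.
PDE_V = {
    0.1: [2.04, 1.81, 1.55, 1.26, 0.95, 0.68, 0.61, 0.61],
    0.5: [2.21, 2.01, 1.82, 1.64, 1.49, 1.42, 1.41, 1.41],
    1.0: [2.44, 2.31, 2.18, 2.06, 2.02, 2.01, 2.00, 2.00],
}


def lower_bound(gamma, delta):
    """Largest of the two necessary conditions on v discussed in the text."""
    bound = 2.0 * math.sqrt(delta)                 # no oscillation near (1, 0, 0)
    if gamma > 1.0:                                # eigenvector threshold, gamma > 1 only
        bound = max(bound, math.sqrt(gamma) - 1.0 / math.sqrt(gamma))
    return bound


def compare(slack=0.03):
    """Return rows (delta, gamma, tabulated v, bound, consistent?)."""
    rows = []
    for delta, speeds in PDE_V.items():
        for gamma, v in zip(GAMMAS, speeds):
            bound = lower_bound(gamma, delta)
            rows.append((delta, gamma, v, bound, v >= bound - slack))
    return rows


if __name__ == "__main__":
    for row in compare():
        print("delta=%.1f gamma=%.1f v=%.2f bound=%.3f ok=%s" % row)
```

For δ = 0.1 the tabulated speeds at large γ sit very close to this bound, while for larger δ they lie further above it, in line with the remark that closes the paragraph above.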
Conclusions The PDE dynamics selects the nonlinear front whenever the speed is faster and the front steeper than the linear front. This selection principle can be understood in the phase-space of the reduced ODE. It is equivalent to the existence of a strong heteroclinic curve joining the unstable fixed point to the stable one. The analytical computation of the front speed remains, in general, an open problem.
Table 1. Numerically observed values of v for various values of the parameters γ and δ. PDE v is the speed that naturally evolved under “step” initial conditions. XPP vmin is the smallest value of v for which numerical integration of the ODE system produced non-negative solutions with α < 1. v > 2√δ is the minimum value of v that will prevent solutions from oscillating as they approach α = 1, β = 0.

           δ = 0.1 (2√δ = 0.63)   δ = 0.5 (2√δ = 1.41)   δ = 1.0 (2√δ = 2)      δ = 2.0 (2√δ = 2.83)
           PDE v    XPP vmin      PDE v    XPP vmin      PDE v    XPP vmin      PDE v    XPP vmin
γ = 6.0    2.04     2.05          2.21     2.19          2.44     2.42          2.94     2.92
γ = 5.0    1.81     1.81          2.01     2.00          2.31     2.28          2.88     2.86
γ = 4.0    1.55     1.55          1.82     1.82          2.18     2.15          2.83     2.81
γ = 3.0    1.26     1.26          1.64     1.64          2.06     2.05
γ = 2.0    0.95     0.96          1.49     1.48          2.02     1.99
γ = 1.0    0.68     0.70          1.42     1.41          2.01     1.97
γ = 0.5    0.61     0.63          1.41     1.41          2.00     1.96
γ = 0      0.61     0.62          1.41     1.41          2.00     1.96

The δ = 2.0 columns contain one further pair of entries (PDE v = 2.83, XPP vmin = 2.81) in a lower row that could not be identified.
Acknowledgments

This material is based in part upon work supported by the National Science Foundation under grant No. DMS-0604704 (A.G.). The author thanks M. Tabor for introducing him to the problem and J. Powell for many fruitful discussions.
April 24, 2009
15:52
WSPC - Proceedings Trim Size: 9in x 6in
Frank.Hilker.novo2
52
EPIDEMIOLOGICAL MODELS WITH DEMOGRAPHIC ALLEE EFFECT

F. HILKER

Centro de Matemática e Aplicações Fundamentais, Universidade de Lisboa, Complexo Interdisciplinar - Avenida Prof. Gama Pinto 2, 1649-003 Lisboa, Portugal
E-mail: [email protected]

Biological populations can be faced with two detriments simultaneously if they experience both parasitism and an Allee effect. While infection with disease causes additional mortality, the Allee effect is a demographic process describing depensation (i.e., population decline or reduced population growth at low densities in case of a ‘strong’ or ‘weak’ Allee effect, respectively). The joint interplay of disease spread and a strong Allee effect is investigated in mathematical models that consist of two differential equations (describing the susceptible and infectious part of the host population) with a cubic nonlinearity (modelling the Allee effect). Two different incidences are considered, namely frequency- and density-dependent transmission, which model the infection process at two opposite ends of a spectrum of possibilities. Various threshold quantities are derived and employed to explain infection disappearance, parasite invasion and host extinction. The comparison of dynamical behaviour in both models provides insight into how depensation and disease transmission interact at various population densities. The general impact of disease is (i) to depress the host population size in endemic equilibrium and (ii) to increase the likelihood of extinction. If the incidence is density-dependent, oscillatory dynamics are possible as well as the emergence of three endemic equilibria, rendering the population tristable. The latter scenario is discussed in detail with respect to implications for the conservation of endangered species and the management of pests such as invasive alien species. Critical parameter values are identified for which population persistence might be possible even at extremely large values of the basic reproduction number R0, which could be expected to drive the host extinct independently of the initial condition.
1. Introduction Pathogenic parasites that induce an additional disease-related mortality in their host are well-known to regulate the host’s population dynamics. They can drive their host to extinction, limit unbounded growth or simply depress
the population size.1–6 Similarly, host demographics significantly affect the outcome of parasite invasion. For example, the birth of uninfected individuals provides a pool of susceptibles that can trigger and sustain ongoing infection, thus turning an epidemic into an endemic disease.7,8 For certain growth dynamics and disease characteristics, pathogen establishment induces stable oscillations.9–11 The joint interplay of demographic and epidemiological processes can clearly drive the overall population dynamics. Understanding these basic mechanisms is necessary to gain insight into observed disease patterns and to devise effective control measures. While exponential and logistic host population growth, for example, have received considerable attention in the literature,12–22 only a few papers have so far addressed the impact of an Allee effect in the host.23–27 The Allee effect describes positive density dependence in population growth at low densities.28,29 That is, per-capita growth is largest for some intermediate population size. This can be due to difficulties in finding mating partners at small densities, genetic inbreeding, skewed sex ratios and adverse effects for anti-predator defence or other social functions.30–32 There is increasing empirical evidence for the Allee effect, see e.g. Refs. 33–41. If population growth is not only reduced at small densities, but actually becomes negative, this is referred to as a strong Allee effect. Infectious diseases have been implicated in the extinction and decline of a large variety of species.42–52 Small populations can be expected to be particularly threatened if they are affected by both the Allee effect and a virulent disease. The study of disease models with demographic Allee effects is therefore particularly relevant for endangered species and wildlife populations. The impact of such combined pressure on the host population is investigated here by way of mathematical modelling.
Section 2 introduces a simple epidemiological compartment model and shows how a strong Allee effect can be incorporated therein. Two different forms of incidences are considered: density-dependent disease transmission, in which the effective contact rate between individuals increases linearly with population size, and frequency-dependent transmission, in which the effective contact rate stays constant. This contribution largely builds upon models presented previously,25,26 but provides some novel insights, for which the synthesis of the frequency- and density-dependent transmission models has been instrumental. Section 3 identifies various threshold quantities that describe whether the disease can successfully become established and how it affects the host
population. Section 4 focuses on how the disease can drive its host population to extinction under both transmission regimes. It also shows how transient dynamics can persist over a very long time horizon. Section 5 is devoted to the existence of multiple endemic states in the model with density-dependent disease transmission. Several critical parameter values are derived, for which population dynamics change significantly, including the switch from endemic existence to extinction out of the blue. Finally, Sec. 6 discusses the results and provides conclusions that are relevant for population persistence and questions of wildlife management as well as biological control. Although disease spread in animal populations is an epizootiological problem, we will use epidemiological terminology, because it more easily connects the fields of mathematical epidemiology and population dynamics.

2. Model description
This Section describes two disease models, one with density-dependent and one with frequency-dependent transmission. The Allee effect is integrated as suggested in Refs. 23,25,26. Other possibilities of including an Allee effect as proposed in Refs. 24 and 27 will be shown at the end of this Section.

2.1. Basic model structure
The total host population size N = N(T) ≥ 0 can vary in time T ≥ 0. The disease is assumed to effectively divide the host into a susceptible (X) and an infectious (Y) part. There is no immunity or recovery from the disease, hence N = X + Y. Offspring of infectious individuals are susceptible. Infection takes place exclusively by direct contact between infectious and susceptible individuals. A single susceptible individual has a fraction Y/N of all its contacts with individuals that already contracted the disease. Let Θ(N) ≥ 0 describe the number of contacts per individual that lead to effective disease transmission. Then the rate of new infections (incidence) is given by Θ(N)XY/N.
The following transfer diagram illustrates these assumptions:

    b(N)(X+Y)             Θ(N)XY/N
   ----------->   X   ----------------->   Y
                  |                        |
                  | m(N)X                  | [m(N)+µ]Y
                  v                        v
Infectious individuals experience an additional per-capita mortality µ > 0 that is induced by the disease. The per-capita birth and natural death rate are denoted by b(N) ≥ 0 and m(N) ≥ 0, respectively. Note that they are density-dependent. They will be specified below to incorporate the Allee effect. At the moment, the mathematical model can be formulated in generalised form as

\[ \frac{dX}{dT} = b(N)\,N - m(N)\,X - \Theta(N)\,\frac{XY}{N}, \tag{1} \]
\[ \frac{dY}{dT} = \Theta(N)\,\frac{XY}{N} - m(N)\,Y - \mu Y. \tag{2} \]
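As a short worked step (added here for illustration, not quoted from the original text), adding Eqs. (1) and (2) and using N = X + Y shows that the incidence terms cancel. Transmission only moves individuals between compartments, so the total population changes through births, natural deaths and disease-induced deaths alone:

```latex
\begin{align*}
\frac{dN}{dT} &= \frac{dX}{dT} + \frac{dY}{dT} \\
              &= b(N)\,N - m(N)\,(X + Y) - \mu Y \\
              &= \bigl[\,b(N) - m(N)\,\bigr]\,N - \mu Y .
\end{align*}
```

The bracket b(N) − m(N) is the per-capita net growth rate of the host in the absence of disease.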
2.2. Density- and frequency-dependent disease transmission
The driving force behind the dynamics of any infectious disease is the transmission process.53 The incidence function in epidemiological models can therefore be of crucial importance. The transmission pattern between individuals is complex and depends on the social behaviour as well as spatial organisation of the host population,54 since these factors influence the contact rate leading to disease transmission. There are two standard transmission functions that have been traditionally considered in mathematical epidemiology. The first one is density-dependent transmission (also called mass action), in which the number of contacts increases linearly with population size, i.e. $\Theta(N) = \beta_{dd} N$. $\beta_{dd}$ is a proportionality constant, also taking into account the probability of successful transmission. The second one is frequency-dependent transmission (also called proportionate mixing or standard incidence), in which the number of contacts is independent of population size, i.e. $\Theta(N) = \beta_{fd}$. $\beta_{fd}$ is again the transmission coefficient, but has a different dimension than $\beta_{dd}$. Henceforth, we shall refer to these two parameters as transmissibilities. Both incidences assume that the host population is mixed homogeneously. However, density- and frequency-dependent transmission are assumed to represent two opposite extremes in a continuum of incidence functions.55

2.3. Incorporating the strong Allee effect
In order to incorporate the Allee effect, the birth and death rate need to be defined appropriately. In general, a standard way of modelling the Allee effect is to assume a quadratic per-capita net growth rate31,56

\[ g(N) = a\,(K_+ - N)(N - K_-). \tag{3} \]
[Figure 1: three panels plotted against N, population size. (A) b(N), birth rate; axis marks shown include ac, (K+ + K− + e)/2 and K+ + K− + e. (B) m(N), death rate; axis marks shown include a(c + K−K+) and K+. (C) g(N), net growth rate; axis marks shown include T_{K−}, K−, (K+ + K−)/2 and K+.]

Fig. 1. The quadratic per-capita birth rate (A) and linear per-capita death rate (B) generate a strong Allee effect in the per-capita net growth rate (C). Parameters are described in the main text. $T_{K_-}$ refers to the dimensional inflection threshold, cf. Sec. 5.
$K_+$ is the carrying capacity and $K_-$ the Allee threshold (minimum viable population size, sometimes also referred to as Allee limit). Note that $0 < K_- < K_+$ and $K_- \ll K_+$ in most cases of biological interest, which is why we shall assume $K_- < K_+/2$. The intrinsic growth rate is determined by parameter a. In the situation of the epidemiological model (1), (2), however, there is no vertical transmission, i.e. infectious individuals give birth into the susceptible compartment. b(N) and m(N) therefore need to be explicitly defined, while maintaining the standard description (3). One possibility is to assume a quadratic and linear per-capita birth and death
rate, respectively:

\[ b(N) = a\left(-N^2 + [K_+ + K_- + e]\,N + c\right), \qquad m(N) = a\left(eN + K_+ K_- + c\right). \]

In this formulation, the per-capita death rate increases linearly with population density due to intraspecific competition (as is usually assumed in logistic growth). The birth rate is composed of two parts, one assuming that the number of mating encounters is proportional to $N^2$ (again assuming bimolecular collision) and one assuming that offspring survival decreases linearly with population size (due to crowding). This approach goes back to Ref. 57 and gives rise to a quadratic per-capita birth rate.58,59 Note that the birth and death rate have two additional parameters. Parameter c > 0 effectively shifts b(N) and m(N) up or down, thus affecting the baseline levels of fertility and mortality at zero population size (cf. Fig. 1). Parameter e determines (i) the slope and therefore the degree of density-dependence in the per-capita death rate, (ii) the location of the maximum per-capita birth rate and (iii) for which densities the birth rate becomes negative. The latter is, of course, unrealistic. However, restricting initial total population sizes to be within $N(0) < K_+ + K_- + e$ safely guarantees a positive birth rate (see also Fig. 1). Note that neither c nor e affect the net growth rate g(N).
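The claim that neither c nor e affects the net growth rate can be checked numerically: b(N) − m(N) should reproduce g(N) of Eq. (3) for any N. The following is a minimal sketch of that check; the function names and the parameter values in `PARAMS` are hypothetical and chosen only for illustration.

```python
def birth_rate(N, a, K_plus, K_minus, e, c):
    """Quadratic per-capita birth rate b(N)."""
    return a * (-N**2 + (K_plus + K_minus + e) * N + c)


def death_rate(N, a, K_plus, K_minus, e, c):
    """Linear per-capita death rate m(N)."""
    return a * (e * N + K_plus * K_minus + c)


def net_growth(N, a, K_plus, K_minus):
    """Quadratic per-capita net growth rate g(N), Eq. (3)."""
    return a * (K_plus - N) * (N - K_minus)


def max_decomposition_error(a, K_plus, K_minus, e, c, n_points=200):
    """Largest |b(N) - m(N) - g(N)| on a grid over 0 <= N <= K_plus + K_minus + e."""
    N_max = K_plus + K_minus + e
    worst = 0.0
    for k in range(n_points + 1):
        N = N_max * k / n_points
        diff = (birth_rate(N, a, K_plus, K_minus, e, c)
                - death_rate(N, a, K_plus, K_minus, e, c)
                - net_growth(N, a, K_plus, K_minus))
        worst = max(worst, abs(diff))
    return worst


# Hypothetical parameter values, for illustration only.
PARAMS = dict(a=0.01, K_plus=100.0, K_minus=10.0, e=20.0, c=500.0)

if __name__ == "__main__":
    print("max |b - m - g| on the grid:", max_decomposition_error(**PARAMS))
    # Negative growth below the Allee threshold, positive growth above it.
    print("g(5)  =", net_growth(5.0, 0.01, 100.0, 10.0))
    print("g(50) =", net_growth(50.0, 0.01, 100.0, 10.0))
```

Changing `e` or `c` in `PARAMS` alters b(N) and m(N) individually but leaves the reported error at rounding level, which is the cancellation described in the text.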
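2.4. Non-dimensionalisation
Introducing the dimensionless quantities

\[ P = \frac{N}{K_+}, \qquad I = \frac{Y}{K_+}, \qquad t = a e K_+ T > 0, \]
\[ r = \frac{K_+}{e} > 0, \qquad u = \frac{K_-}{K_+} \in \left(0, \tfrac{1}{2}\right), \qquad d = \frac{c}{e K_+} > 0, \qquad \alpha = \frac{\mu}{a e K_+} \ge 0 \]

and formulating system (1),(2) in terms of the state variables P (dimensionless total host population) and I (infectious individuals therein) yields

\[ \frac{dP}{dt} = r(1 - P)(P - u)P - \alpha I, \tag{4} \]
\[ \frac{dI}{dt} = \left[\Gamma(P)\,\frac{N - Y}{N} - \alpha - d - ru - P\right] I. \tag{5} \]

Here the susceptible fraction (N − Y)/N equals (P − I)/P in dimensionless variables.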
Γ(P) is the dimensionless effective contact rate with

\[ \Gamma(P) = \frac{\beta_{dd}}{a e}\,P = \sigma_{dd} P \tag{6} \]

and

\[ \Gamma(P) = \frac{\beta_{fd}}{a e K_+} = \sigma_{fd} \tag{7} \]

in the case of density- and frequency-dependent transmission, respectively. Henceforth, model (4),(5) with (6) and with (7) shall be referred to as the density-dependent (dd) and the frequency-dependent (fd) model, respectively.

2.5. Other disease models with demographic Allee effect
The model in Ref. 27 also assumes that the Allee effect is concentrated in the birth rate. While the per-capita death rate is considered to be constant, the per-capita birth rate is of a form similar to enzyme kinetics with inhibition (or a non-monotone functional response). It should be noted that both super- and subcritical Hopf bifurcations are possible. The authors of Ref. 60 focus on models with complete vertical transmission. However, they also consider an ‘alternative’ model (Eq. [6] in Ref. 60) without vertical transmission, in which the per-capita birth rate is constant. The per-capita death rate is assumed to be quadratic and similar to the standard form as in Eq. (3). Hence, the density dependence is concentrated in the death rather than the birth rate.

3. Thresholds: disease invasion and the impact of positive and inverse density dependence
To begin with, consider the disease-free system. There are three stationary states. Two of them are stable, namely the trivial extinction state $(P_0^*, I_0^*) = (0, 0)$ and the equilibrium in which the host reaches its carrying capacity $(P_2^*, I_2^*) = (1, 0)$. The dynamics is bistable, i.e. which one of these two stable equilibria is achieved depends on the initial condition. If the population size is smaller or larger than the Allee threshold u, the population goes extinct or persists, respectively. The unstable equilibrium $(P_1^*, I_1^*) = (u, 0)$ separates the two basins of attraction. The Allee threshold u is the extinction threshold of the population.
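The bistability of the disease-free system can be illustrated by integrating Eq. (4) with I = 0 from initial conditions on either side of the Allee threshold. This is only a sketch: the values r = 1 and u = 0.2, the step size and the hand-written Runge-Kutta scheme are assumptions made for illustration and are not taken from the text.

```python
def disease_free_growth(P, r, u):
    """Right-hand side of Eq. (4) with I = 0."""
    return r * (1.0 - P) * (P - u) * P


def simulate_disease_free(P0, r=1.0, u=0.2, dt=0.05, t_end=200.0):
    """Integrate dP/dt = r(1-P)(P-u)P with the classical 4th-order Runge-Kutta method."""
    P = P0
    for _ in range(int(round(t_end / dt))):
        k1 = disease_free_growth(P, r, u)
        k2 = disease_free_growth(P + 0.5 * dt * k1, r, u)
        k3 = disease_free_growth(P + 0.5 * dt * k2, r, u)
        k4 = disease_free_growth(P + dt * k3, r, u)
        P += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return P


if __name__ == "__main__":
    # Start slightly below and slightly above the Allee threshold u = 0.2.
    for P0 in (0.15, 0.25):
        print(f"P(0) = {P0:.2f}  ->  P(200) = {simulate_disease_free(P0):.6f}")
```

A start below u ends at the extinction state and a start above u ends at the carrying capacity P = 1, so u acts as the separatrix between the two basins of attraction.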
In the presence of disease, it is instructive to consider the effective reproduction number R, which gives the number of secondary disease cases produced by a single infectious individual introduced into a population of size P. The number of secondary cases per infectious individual and unit time is Γ(P), which has to be multiplied by the infectious lifetime
$[\alpha + m(P)]^{-1}$, where $m(P) = d + ru + P$ is the dimensionless death rate. The effective reproduction numbers in case of frequency- and density-dependent transmission respectively are

\[ R^{fd} = \frac{\sigma_{fd}}{\alpha + d + ru + P}, \tag{8} \]
\[ R^{dd} = \frac{\sigma_{dd}\,P}{\alpha + d + ru + P}. \tag{9} \]
For frequency-dependent transmission, the number of secondary cases is constant, while it increases with population size for density-dependent transmission. The infectious lifetime, however, is the same for both incidences and decreases with P. Obviously, the disease spreads and becomes endemic if R > 1 and cannot establish itself otherwise. $\sigma_{dd} > 1$ is a necessary condition for disease spread in the density-dependent model, because $R^{dd} < \sigma_{dd}$ for every population size, cf. Eq. (9). If this condition holds, $R^{dd}$ increases with population size. That is, a sufficiently large population is the prerequisite for disease persistence. Setting $R^{dd} = 1$ and solving for P yields the minimum population size required for disease spread

\[ P_T^{dd} = \frac{\alpha + d + ru}{\sigma_{dd} - 1}. \tag{10} \]
$P_T^{dd}$ is sometimes referred to as critical community size. We shall refer to it as disease threshold. In the case of frequency-dependent transmission, one can easily see that $R^{fd}$ decreases with population size, cf. Eq. (8). This means that the disease can maintain itself in small populations, but disappears in large populations. This is fundamentally different from the situation for density-dependent transmission. The disease threshold

\[ P_T^{fd} = \sigma_{fd} - \alpha - d - ru \tag{11} \]
now is an upper rather than a lower threshold. The reason behind this is that the number of new infections remains constant, while natural mortality increases with crowding and thus limits the time during which transmission takes place. Whether the disease can establish at all clearly depends on whether the parasites are able to invade the disease-free population at carrying capacity. That is, the effective reproduction number needs to be evaluated at P = 1, which gives the well-known basic reproduction number

\[ R_0^{fd} = \frac{\sigma_{fd}}{\alpha + d + ru + 1} \tag{12} \]

and

\[ R_0^{dd} = \frac{\sigma_{dd}}{\alpha + d + ru + 1} \tag{13} \]

for the fd and dd model, respectively. Once the disease successfully invades, the disease threshold is instructive in understanding how the population will be affected by parasite invasion. The following two subsections will consider the cases of frequency- and density-dependent transmission separately.

[Figure 2: four schematic panels (A–D) of the P axis with marks at 0, u, 1 and $P_T^{fd}$; arrows indicate the direction of flow, and labels mark the extinction state (Ext), the disease-free equilibrium (DFE) and the endemic equilibrium (End).]

Fig. 2. The upper disease threshold $P_T^{fd}$ defines the range of population sizes where the disease could persist in the frequency-dependent model (shown in grey). Depending on the location of the disease threshold, parasites can become endemic (End) and increase the basin of attraction of the extinction equilibrium (Ext), if they invade the disease-free equilibrium (DFE). (Un-)filled circles represent (un-)stable equilibria. See Sec. 3 for more details.
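The threshold quantities of this section are simple enough to evaluate directly. The sketch below implements the effective reproduction numbers (8), (9), the disease thresholds (10), (11) and the basic reproduction numbers (12), (13); the function names, the `mode` argument and the parameter values are hypothetical and serve only as an illustration.

```python
def effective_reproduction_number(P, sigma, alpha, d, r, u, mode):
    """Eq. (8) for mode='fd' and Eq. (9) for mode='dd'."""
    if mode not in ("fd", "dd"):
        raise ValueError("mode must be 'fd' or 'dd'")
    contact_rate = sigma * P if mode == "dd" else sigma
    infectious_lifetime = 1.0 / (alpha + d + r * u + P)
    return contact_rate * infectious_lifetime


def basic_reproduction_number(sigma, alpha, d, r, u, mode):
    """Effective reproduction number at carrying capacity P = 1, Eqs. (12), (13)."""
    return effective_reproduction_number(1.0, sigma, alpha, d, r, u, mode)


def disease_threshold(sigma, alpha, d, r, u, mode):
    """Population size at which R = 1: Eq. (10) for 'dd' (lower threshold),
    Eq. (11) for 'fd' (upper threshold). Returns None if sigma <= 1 in the dd model."""
    if mode not in ("fd", "dd"):
        raise ValueError("mode must be 'fd' or 'dd'")
    base = alpha + d + r * u
    if mode == "fd":
        return sigma - base
    if sigma <= 1.0:
        return None  # R_dd stays below 1 for every population size
    return base / (sigma - 1.0)


if __name__ == "__main__":
    pars = dict(alpha=0.2, d=0.1, r=1.0, u=0.1)  # illustrative values only
    for mode in ("dd", "fd"):
        p_t = disease_threshold(2.0, mode=mode, **pars)
        r_at_threshold = effective_reproduction_number(p_t, 2.0, mode=mode, **pars)
        r0 = basic_reproduction_number(2.0, mode=mode, **pars)
        print(mode, "threshold:", p_t, " R at threshold:", r_at_threshold, " R0:", r0)
```

For these values the dd threshold lies below the carrying capacity and the fd threshold lies above it, and in both cases the basic reproduction number exceeds one, in line with the statement that invasion of the disease-free equilibrium is decided at P = 1.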
3.1. Frequency-dependent transmission
If $P_T^{fd} < u$ (Fig. 2A), the disease cannot invade the host population. Due to the inverse density dependence of the strong Allee effect, there are no persistent population sizes that are small enough to allow parasite persistence. Hence, the population remains disease-free and reaches the disease-free equilibrium (DFE), i.e. the carrying capacity, unless it goes extinct due to the Allee effect. If $u < P_T^{fd} < 1$ (Fig. 2B), there is a range of population sizes above the Allee threshold that would allow disease persistence. However, two aspects have to be taken into account. First, the host experiences an additional disease-related mortality. This increases the effective extinction threshold
for the population from u to some value larger than the Allee threshold. The precise extinction threshold depends on the number of infectious individuals, because the problem is two-dimensional. Figure 2 therefore has to be understood as an illustrative simplification. Second, above the effective extinction threshold the population grows to carrying capacity. This implies that the disease cannot be maintained. Endemic disease persistence is only possible exactly on the unstable equilibrium, where the disease mortality precisely balances the demographic growth. Note that the unstable endemic equilibrium organises the basins of attraction of the extinction and DFE state (cf. Fig. 2B). If $P_T^{fd} > 1$, the upper disease threshold is large enough so that the population at the disease-free equilibrium can carry the disease and thus allows for parasite invasion. That is, $P_T^{fd} > 1$ implies $R_0^{fd} > 1$. Two scenarios can be distinguished once the disease establishes. First, it depresses the host population size to a value $P_+^* < 1$ due to the additional mortality, thus leading to an endemic equilibrium. The unstable equilibrium with smaller population size $P_-^* < P_+^*$ continues to exist and now separates the endemic basin of attraction from the extinction region (Fig. 2C). Second, if virulence is too large, the disease causes so many deaths that the population size $P_+^*$ is always reduced below the effective extinction threshold. An endemic equilibrium therefore cannot exist. In mathematical terms, the nontrivial stationary states collide and disappear (cf. Fig. 2D). The population goes extinct for all initial conditions (in contrast to the situations in Figs. 2B and C, where the disease merely increases the basins of extinction). This means that parasites inflicting a large burden in the population render their host dynamics monostable,25 even though there is a strong Allee effect that typically induces bistability.
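The two scenarios for $P_T^{fd} > 1$ can be made concrete by solving for the endemic equilibria of the fd model. The reduction used below is derived here from Eqs. (4), (5) and (7) and is not quoted from the text: setting dI/dt = 0 with I > 0 gives I = P(P_T − P)/σ with P_T = σ − α − d − ru, and inserting this into dP/dt = 0 gives the quadratic rP² − [r(1 + u) + α/σ]P + [ru + αP_T/σ] = 0. Function names and parameter values are assumptions for illustration.

```python
import math


def fd_rhs(P, I, r, u, alpha, d, sigma):
    """Right-hand sides of Eqs. (4), (5) with Gamma(P) = sigma (fd model), for P > 0."""
    dP = r * (1.0 - P) * (P - u) * P - alpha * I
    dI = (sigma * (P - I) / P - alpha - d - r * u - P) * I
    return dP, dI


def fd_endemic_equilibria(r, u, alpha, d, sigma):
    """Endemic equilibria (P*, I*) of the fd model, sorted by P*.

    Solves r P^2 - [r(1+u) + alpha/sigma] P + [r u + alpha P_T / sigma] = 0
    and keeps roots with P > 0 and I > 0.
    """
    p_t = sigma - alpha - d - r * u          # upper disease threshold, Eq. (11)
    b = -(r * (1.0 + u) + alpha / sigma)
    c = r * u + alpha * p_t / sigma
    disc = b * b - 4.0 * r * c
    if disc < 0.0:
        return []                            # nullclines do not intersect
    root = math.sqrt(disc)
    equilibria = []
    for P in sorted(((-b - root) / (2.0 * r), (-b + root) / (2.0 * r))):
        I = P * (p_t - P) / sigma
        if P > 0.0 and I > 0.0:
            equilibria.append((P, I))
    return equilibria


if __name__ == "__main__":
    # Illustrative values: moderate versus high virulence alpha, both with P_T > 1.
    print("moderate virulence:", fd_endemic_equilibria(r=1.0, u=0.1, alpha=0.2, d=0.1, sigma=2.0))
    print("high virulence:    ", fd_endemic_equilibria(r=1.0, u=0.1, alpha=2.0, d=0.1, sigma=4.0))
```

With the moderate-virulence values the quadratic has two roots between u and 1, corresponding to the situation of Fig. 2C; with the high-virulence values its discriminant is negative and no endemic equilibrium exists, as in Fig. 2D.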
3.2. Density-dependent transmission
If disease transmission is density-dependent, the disease threshold is a lower threshold and the ranges of population sizes allowing for disease persistence reverse. Disease persistence is impossible if $P_T^{dd} > 1$, because positive density dependence prevents the population from growing that large (Fig. 3A). In fact, one can see that $R_0^{dd} < 1$ in this case. The population remains disease-free and either persists at the disease-free equilibrium (DFE, i.e. the carrying capacity) or goes extinct. If the disease threshold is in between the Allee threshold and the carrying capacity, $u < P_T^{dd} < 1$, various scenarios can be distinguished that involve the so-called inflection threshold

\[ T_u = \frac{(u + 1)^3}{9\,(u^2 - u + 1)}. \]

This quantity is related to the curvature of the net growth rate and corresponds to the root of the tangent line through the inflection point of the host population’s zero-growth isocline.26

[Figure 3: seven schematic panels (A–G) of the P axis with marks at 0, u, $T_u$, $P_T^{dd}$ and 1; arrows indicate the direction of flow, and labels mark the extinction state (Ext), the disease-free equilibrium (DFE) and endemic equilibria (End).]

Fig. 3. In the density-dependent model, the disease threshold $P_T^{dd}$ defines the minimum population size required for disease persistence (shown in grey). Density-dependent transmission allows the possibility of three endemic equilibria and tristability.

If $T_u < P_T^{dd} < 1$, the disease invades the disease-free equilibrium and becomes established. The additional mortality depresses the population size towards an endemic equilibrium with large population size $P_+^* < 1$, cf. Fig. 3B. If, conversely, $u < P_T^{dd} < T_u$, there can be up to three endemic equilibria (Fig. 3C). Below the inflection threshold, population growth is so small (cf. Fig. 1C) that it can be balanced by disease-related deaths. This induces an endemic equilibrium with small population size $P_-^*$, $u < P_-^* < P_+^*$, which can be stable or unstable. In between the two endemic equilibria $P_-^*$ and $P_+^*$ there is another stationary state with intermediate population size $P_o^*$, $P_-^* < P_o^* < P_+^*$, which is always unstable (cf. Fig. 3C). Note that the overall dynamics is tristable if the small endemic equilibrium is stable. For brevity, we shall henceforth refer to the equilibria with population sizes $P_-^*$, $P_o^*$ and $P_+^*$ as small, intermediate and large endemic equilibrium, respectively. There are two different ways by which two of the three endemic states cease to exist. The first one is indicated in Fig. 3D. Disease burden is low, so that $P_-^*$ becomes large and $P_o^*$ becomes small. When the equilibria collide (mathematically corresponding to a saddle-node bifurcation), they disappear and the basin of attraction corresponding to $P_-^*$ is absorbed by the large endemic equilibrium as well as the extinction state. In the second way (Fig. 3E), the disease is so strong that the large endemic equilibrium collides with the intermediate endemic equilibrium. The basin of attraction corresponding to $P_+^*$ is then absorbed by the small endemic equilibrium (if it is stable at all) and/or by the extinction state. The conditions for the existence of three endemic equilibria and their disappearance will be investigated in more detail in Sec. 5. It should be noted that the small endemic equilibrium becomes unstable if $P_-^*$ is small enough.
In this case, the interplay of density-dependent disease transmission, virulence and population growth depensation due to the Allee effect can cause stable limit cycle oscillations.26 Furthermore, the cyclic attractor can completely disappear after a homoclinic bifurcation. This is illustrated in Sec. 4.2. Lastly, if $P_T^{dd} < u$, there can be two endemic equilibria or none at all. The existence of a third endemic equilibrium with small population size is impossible because disease prevalence depresses the host below the Allee threshold. In fact, the effective extinction threshold of the population is raised (Fig. 3F). If the impact of disease is too strong, the population goes extinct for all initial conditions (Fig. 3G). As in the fd model, too strong a disease kills its own host.
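The role of the inflection threshold can be illustrated numerically. The sketch below (i) checks that the closed-form expression for the inflection threshold agrees with the root of the tangent through the inflection point of the host nullcline, and (ii) counts endemic equilibria of the dd model as intersections of the host nullcline I = r(1 − P)(P − u)P/α with the disease nullcline I = [(σ − 1)P − (α + d + ru)]/σ, which follows from Eqs. (5) and (6). The scan-and-bisect root finder, the function names and all parameter values are assumptions chosen for illustration.

```python
def inflection_threshold(u):
    """Closed-form inflection threshold T_u."""
    return (u + 1.0) ** 3 / (9.0 * (u * u - u + 1.0))


def tangent_root(u):
    """Root of the tangent to (1-P)(P-u)P at its inflection point P = (1+u)/3.
    The prefactor r/alpha of the host nullcline cancels and is omitted."""
    p = (1.0 + u) / 3.0
    value = (1.0 - p) * (p - u) * p
    slope = -3.0 * p * p + 2.0 * (1.0 + u) * p - u
    return p - value / slope


def dd_endemic_equilibria(r, u, alpha, d, sigma, n_grid=20000):
    """Population sizes P* of endemic equilibria (I* > 0) of the dd model."""
    base = alpha + d + r * u
    if sigma <= 1.0:
        return []
    p_t = base / (sigma - 1.0)               # lower disease threshold, Eq. (10)
    if p_t >= 1.0:
        return []

    def gap(P):
        host = r * (1.0 - P) * (P - u) * P / alpha
        disease = ((sigma - 1.0) * P - base) / sigma
        return host - disease

    roots = []
    width = (1.0 - p_t) / n_grid
    for k in range(n_grid):
        lo, hi = p_t + k * width, p_t + (k + 1) * width
        if gap(lo) * gap(hi) < 0.0:          # sign change brackets a root
            for _ in range(60):              # bisection
                mid = 0.5 * (lo + hi)
                if gap(lo) * gap(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


if __name__ == "__main__":
    u = 0.1
    print("T_u closed form:", inflection_threshold(u), " tangent root:", tangent_root(u))
    # Disease threshold 0.13 lies between u and T_u: three endemic equilibria.
    print(dd_endemic_equilibria(r=0.2, u=u, alpha=0.1, d=0.01, sigma=2.0))
    # Disease threshold 0.65 lies above T_u: a single endemic equilibrium.
    print(dd_endemic_equilibria(r=0.2, u=u, alpha=0.1, d=0.01, sigma=1.2))
```

A disease threshold between u and the inflection threshold is only a necessary condition for three endemic equilibria; the slope of the disease nullcline must also lie between the slopes of the two tangents to the host nullcline, which is why the first parameter set was chosen with some care.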
4. Host population extinction and long-lasting transients
In both disease transmission models, parasite invasion can trigger the host population to die out. While it is well known that frequency-dependent transmission can lead to host extinction,4,12,14,18,61 it has long been believed that this is not possible for density-dependent transmission, cf. the review in Ref. 6. Models could explain the extirpation of a population affected by diseases with dd transmission only in the presence of alternative host species that act as a reservoir for the parasite.44,62 Although some authors already conjectured that Allee effects in combination with dd transmission should make extinction possible as well,6,48 this has been demonstrated in mathematical models only recently.26,27,60 The aim of this Section is (i) to illustrate the difference between extinction in the fd and dd model, (ii) to show that there are two different dynamical regimes that can lead to host extinction in the dd model and (iii) to showcase extinction dynamics with long-lasting transients.
4.1. Disease-induced extinction
In the case of frequency-dependent transmission, the disease continues to be transmitted even when the host population becomes small and approaches zero. This is because the number of contacts of individuals within the population is constant. Therefore, the fraction of infectious individuals (i.e., the prevalence) stays positive and converges to an equilibrium value in the process of $P \to 0$. This is illustrated in Fig. 4A. In general, it is helpful to formulate fd models in $(P, i)$ state variables, where $i = I/P$ denotes the prevalence, because this allows one to distinguish the disease-induced extinction equilibrium $(P^* = 0, i^* > 0)$ from the trivial extinction equilibrium $(P^* = 0, i^* = 0)$. In $(P, I)$ state variables, both stationary states correspond to $(P^* = 0, I^* = 0)$. Moreover, this is a handy way to deal with the singularity in the incidence function that occurs at $P = 0$.
In the case of density-dependent transmission, the number of contacts decreases with population size and actually approaches zero as $P \to 0$. Hence, disease transmission vanishes when the population is in the process of going extinct. In models without Allee effect, this allows the host to stop declining, which is why infection cannot eradicate the population. In the presence of a strong Allee effect, the disease burden can push the population size below the Allee threshold. From then on, it is the Allee effect that drives the host extinct. However, it is the disease that initiates the population decline, so extinction results from a synergy of infection and Allee effect.
[Figure 4: panels A to D showing time courses of P, I and i; horizontal axes: t, time.]
Fig. 4. Host population extinction in the frequency-dependent (left panels) and density-dependent (right panels) transmission model. The bottom panels show long-lasting transient dynamics (note the different time scales). Parameter values: $u = 0.1$, $r = 0.2$, $d = 0.25$, $\alpha = 0.1$, (A) $\sigma_{fd} = 3$, (B) $\sigma_{dd} = 8$, (C) $\sigma_{fd} = 1.7$, (D) $\sigma_{dd} = 4.16$.
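The contrast between the two transmission modes in a vanishing population can be made concrete with the parameter values of the caption. The sketch below assumes the simplest functional forms for the transmission rate Γ(P): a constant σfd for frequency-dependent transmission and σdd·P for density-dependent transmission. These forms are assumptions, chosen only because they reproduce the limits Γ(0) = σfd and Γ(0) = 0 used in this section; the function names are hypothetical.

```python
ALPHA = 0.1  # virulence, from the caption of Fig. 4
D = 0.25     # parameter d, from the caption of Fig. 4


def transmission_rate_fd(P, sigma_fd):
    """ASSUMED fd form: contacts are constant, so the rate ignores P."""
    return sigma_fd


def transmission_rate_dd(P, sigma_dd):
    """ASSUMED dd form: contacts scale with population size P."""
    return sigma_dd * P


def vanishing_population_ratio(gamma_0, alpha=ALPHA, d=D):
    """Ratio Gamma(0) / (alpha + d) that this section uses to decide
    whether the prevalence stays positive as P -> 0."""
    return gamma_0 / (alpha + d)


if __name__ == "__main__":
    # Transmission rates as the host population shrinks (panels A and B).
    for P in (1.0, 0.1, 0.01, 0.0):
        print(P, transmission_rate_fd(P, 3.0), transmission_rate_dd(P, 8.0))
    # Ratio in the limit P -> 0 for both modes.
    print(vanishing_population_ratio(transmission_rate_fd(0.0, 3.0)))
    print(vanishing_population_ratio(transmission_rate_dd(0.0, 8.0)))
```

With these forms the fd ratio exceeds one for both σfd = 3 and σfd = 1.7, whereas the dd ratio is zero for any σdd.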
If this happens (Fig. 4B), the number of infectious individuals $I$ as well as the prevalence $i$ reach zero in the course of extinction. A positive prevalence as in the frequency-dependent case cannot be observed. The difference between these two extinction dynamics is whether infection can spread in vanishing population sizes. As already observed in Sec. 3, the disease threshold $P_T^{dd}$ is a lower threshold, whereas $P_T^{fd}$ is an upper threshold. Therefore, it is plausible that disease transmission stops in small populations in the dd model. Mathematically, this can be quantified by considering the linearised growth rates of the infectious and the total host population, assuming that $P \approx 0$ and the number of infectious
individuals is small within the remaining population. Then the host population approximately decays at rate $ru$, and the infectious population at rate $-(\Gamma(0) - \alpha - d - ru)$, cf. Eqs. (4) and (5), respectively. The difference between these two rates is $\Gamma(0) - \alpha - d$, and the ratio
$$R_i = \frac{\Gamma(0)}{\alpha + d}$$
determines if the prevalence remains positive in the vanishing population (cf. Refs. 4 and 63). Biologically, $R_i$ can be interpreted as the number of secondary infections (at rate $\Gamma(0)$) made in a vanishing population during the differential time period $1/(\alpha + d)$ that the infectious population exists less on average than the total population.
If $R_i < 1$, the infectious population decays more quickly than the total host population. The prevalence therefore approaches zero as in the dd scenario shown in Fig. 4B. In fact, the corresponding ratio for the dd model is always smaller than unity:
$$R_i^{dd} = 0.$$
Hence, the prevalence can never be positive in the extinction process of the dd model. Contrariwise, if $R_i > 1$, the infectious population decays more slowly than the total host population, which ensures a positive prevalence. This can happen in the fd model as shown in Fig. 4A. The corresponding ratio for the fd model is
$$R_i^{fd} = \frac{\sigma_{fd}}{\alpha + d}.$$
4.2. Transient dynamics
In both transmission models, eventual host population extinction can be preceded by long-lasting transient dynamics. Fig. 4C illustrates an example where the overall time duration of the extinction process in the fd model is roughly four times longer. The parameter set chosen corresponds to the situation where the endemic equilibria have just ceased to exist after a saddle-node bifurcation (cf. Fig. 2D and also Fig. 2 in Ref. 25). The ‘plateau’ in the transients corresponds to the numerical values of the saddle node that disappeared.
In the dd model, there is yet another scenario of extinction, also including long-lasting transients. Consider the situation where there is a unique endemic equilibrium with small population size (Fig. 3E). If this equilibrium becomes unstable, the amplitudes of the arising limit cycle oscillations grow larger when disease burden continues to increase. Sooner or later the cycle will collide with the Allee threshold and then disappear (homoclinic
bifurcation).26 Roughly speaking, one can see this as another way of the disease pushing the population into the extinction regime. The transient dynamics before eventual extinction is oscillatory with increasing amplitudes (Fig. 4D). These transient oscillations can last an extremely long time. Just like the transient dynamics shown in Fig. 4C, they are remnants or ‘ghosts’ of an attractor that disappeared out of the blue (in this case a limit cycle rather than a saddle node). In reality, transient dynamics are likely to be affected by perturbations, including stochastic effects, that could accelerate the extinction process (or even prolong it).
5. Multiple endemic states
In the density-dependent transmission model, it is possible that there are not only two, but three endemic equilibria. We shall refer to them in short as the endemic triple. Two of these nontrivial stationary states can be locally stable, which means that the overall dynamics is tristable as the extinction state always is an attractor.26 The phase plane therefore is rather ‘fragile’, with perturbations possibly shifting the system from the large to the small endemic equilibrium (corresponding to mild and severe ‘outbreaks’, respectively) or even to extinction. The aim of this Section is to investigate the existence conditions of the endemic triple in more detail, thus enhancing the biological understanding of the underlying mechanisms and providing insights for potential consequences in conservation biology and wildlife management.
5.1. The necessary inflection threshold condition
A necessary condition for the endemic triple is that the disease threshold $P_T$ is in between the Allee threshold $u$ and the inflection threshold $T_u$. Note that $T_u > u$ for all Allee thresholds considered here ($u < \frac{1}{2}$). Figure 5A shows how the inflection threshold varies with $u$. The Allee threshold and the carrying capacity are the only parameters determining $T_u$. Reasoning in Sec.
3 suggests that the host growth for population sizes in the specified range is rather small due to depensation (cf. also Fig. 1C). Pathogenic parasites can balance this growth and, if their incidence is density-dependent, the transmission can be adjusted such that the equilibrium is stable rather than unstable as in the fd model. The inequalities $u < P_T < T_u$ define a minimum Allee threshold $u_{min}$ and a maximum Allee threshold $u_{max}$ for which three endemic equilibria are possible (Fig. 5B). The first inequality determines $u_{min}$, and the second
[Figure 5: panels A to D. (A) Tu and u versus u, Allee threshold. (B) u, Tu and PT versus u, Allee threshold, with umin and umax marked. (C) umin and umax versus α, virulence, with α∗ marked. (D) umin and umax versus σ, transmissibility, with σ∗ marked.]
Fig. 5. (A) The inflection threshold $T_u$ in the density-dependent transmission model depends solely on the Allee threshold. (B) The inequalities $u < P_T^{dd} < T_u$ provide necessary conditions for the existence of three endemic equilibria. (C) and (D) show how the ranges of Allee thresholds for which the endemic triple might exist vary with disease parameters $\alpha$ and $\sigma_{dd}$, respectively. Parameter values: (B) $r = 1$, $d = 0.25$, $\alpha = 0.1$, $\sigma_{dd} = 3.9$, (C) $r = 0.2$, $d = 0.25$, $\sigma_{dd} = 3$, (D) $r = 0.2$, $d = 0.25$, $\alpha = 0.06$.
inequality determines $u_{max}$. We are now interested in the range of Allee thresholds $u \in (u_{min}, u_{max})$ that potentially facilitate the endemic triple. Figures 5C and D show how this range of Allee thresholds is affected by the disease parameters $\alpha$ and $\sigma_{dd}$, respectively. The range is largest for small virulences $\alpha$ and becomes narrow with increasing $\alpha$. In fact, there is a critical $\alpha^*$, at which $u_{min}$ and $u_{max}$ meet and three endemic equilibria are not possible anymore. As for the transmissibility $\sigma_{dd}$, the endemic triple is possible from a critical value $\sigma_{dd}^*$ onward. At first, the range expands, but when $u_{min}$ becomes zero, the range starts shrinking slowly with increasing $\sigma_{dd}$. Both plots suggest that the endemic triple emerges only for rather mild Allee effects, i.e. small values of $u$. Although the Allee threshold does not necessarily need to be close to zero, three endemic states seem unlikely if $u > 0.2$, for instance.
5.2. The saddle-node bifurcation conditions
The inflection threshold constitutes only a necessary condition for the existence of the endemic triple. Sufficient conditions, which basically describe the occurrence of saddle-node bifurcations, can be derived from a graphical phase plane analysis.26 Three endemic equilibria exist if the system lies between two saddle-node bifurcations. The corresponding condition can be quantified as
$$s_- < s < s_+,$$
where
$$s = \frac{\sigma_{dd} - 1}{\sigma_{dd}}$$
is the slope of the infectious nullcline. The two critical values $s_-$ and $s_+$ correspond to the slopes of tangents to the convex and concave branches of the host nullcline, emanating from the disease threshold. They are affected by all parameters, especially the disease-related ones as shown in Fig. 6. The curves of $s_-$ and $s_+$ exhibit a typical cusp shape. The intersections $s = s_-$ and $s = s_+$ define two saddle-node bifurcations $\mathrm{SN}_1$ and $\mathrm{SN}_2$, respectively. The endemic triple exists for those parameter values for which the curve of $s$ is within this cusp. While this never happens in Fig. 6A ($\alpha = 0.1$), the parameter range yielding three endemic states can easily be deduced from Fig. 6B ($\alpha = 0.066$). If the virulence is below a certain value ($\alpha \approx 0.04$), the $s$ curve remains within the cusp spanned by $s_-$ and $s_+$ and does not intersect with the $s_+$ curve for increasing transmissibility. This means that the second saddle-node bifurcation $\mathrm{SN}_2$ leading to the disappearance of the large and intermediate endemic equilibria does not take place. The consequences are illustrated below. First, it should be noted that the endemic equilibrium with small population size disappears when the $s_-$ curve ends.
The reason is that the population size in this endemic equilibrium falls below the Allee threshold (i.e., P−∗ < u).
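The saddle-node condition of this subsection reduces to a pair of comparisons once the critical slopes are known. The sketch below is only illustrative: it reads the slope of the infectious nullcline as s = (σdd − 1)/σdd, which is an interpretation of the formula in this section, and it treats the critical slopes s−, s+ and the small endemic population size as given inputs, since their expressions are not part of this excerpt. All function and argument names are hypothetical.

```python
def infectious_nullcline_slope(sigma_dd):
    """Slope s of the infectious nullcline, ASSUMED to be (sigma_dd - 1) / sigma_dd."""
    return (sigma_dd - 1.0) / sigma_dd


def endemic_triple_exists(sigma_dd, s_minus, s_plus, p_minus, u):
    """Check the two requirements described in Sec. 5.2.

    s_minus, s_plus : critical tangent slopes (hypothetical inputs here)
    p_minus         : population size of the small endemic equilibrium
    u               : Allee threshold
    """
    s = infectious_nullcline_slope(sigma_dd)
    # The slope must lie inside the cusp spanned by s_minus and s_plus.
    inside_cusp = s_minus < s < s_plus
    # The small equilibrium is lost once it falls below the Allee threshold.
    small_equilibrium_feasible = p_minus >= u
    return inside_cusp and small_equilibrium_feasible


if __name__ == "__main__":
    # Purely illustrative numbers, not taken from Fig. 6.
    print(endemic_triple_exists(4.0, s_minus=0.7, s_plus=0.8, p_minus=0.15, u=0.1))
```

Under the assumed formula the slope is zero at σdd = 1 and approaches one for large transmissibility.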
5.3. Control implications
If there are two saddle-node bifurcations, the endemic triple exists for a restricted range of intermediate to large transmissibilities. This is illustrated in the bifurcation diagram of Fig. 7A. For even larger transmissibilities
[Figure 6: panels A to C showing the curves s, s− and s+ versus σ, transmissibility. SN1 and SN2 are marked in panel B; SN1 and the region P−∗ < u are marked in panel C.]
Fig. 6. Saddle-node (SN) bifurcation conditions for the density-dependent transmission model, derived from graphical phase plane analysis (cf. the main text). If $s$ intersects the cusp spanned by $s_-$ and $s_+$, there exist three endemic equilibria, unless the one with small population size becomes unfeasible ($P_-^* < u$). Note the different scale of the horizontal axis in panel C. Parameter values: $u = 0.1$, $r = 0.2$, $d = 0.25$, (A) $\alpha = 0.1$, (B) $\alpha = 0.066$, (C) $\alpha = 0.04$.
(i.e., beyond SN2 ), there is no endemic attractor left, because the endemic equilibrium with small population size ceased to exist (P−∗ < u). Hence, the population is rendered monostable and eventually goes extinct for all initial
[Figure 7: panels (A) α = 0.06, (B) α = 0.045, (C) α = 0.041, each showing P, host population size, versus σdd, transmissibility. R0 = 1, SN1, SN2 and P−∗ are marked.]
Fig. 7. Bifurcation diagrams of the total host populations with varying transmissibility. Below a critical virulence α, the second saddle-node bifurcation SN2 does not occur anymore, so that there is always a large endemic equilibrium as alternative attractor to the extinction state. Dashed (solid) lines represent (un-)stable equilibria. Points indicate limit cycle amplitudes. Parameter values: u = 0.1, r = 0.2, d = 0.25, (A) α = 0.06, (B) α = 0.045, (C) α = 0.041.
conditions. The closer the virulence is to the critical virulence ($\alpha \approx 0.04$), the larger the transmissibility at which the second saddle-node bifurcation $\mathrm{SN}_2$ occurs. This is shown in Figs. 7B and C. Below the critical virulence, there is a locally stable endemic equilibrium with large population size $P_+^*$ for all large transmissibility values. This means that increasing transmissibility (i.e., increasing $R_0$) has basically no effect on the large endemic state. If the population is in the endemic state, the host cannot be driven extinct by management measures that affect $R_0$. (This would be possible in Fig. 7B, but not in Fig. 7C.) Only direct manipulations of the population size (e.g. culling or trapping) could eradicate the host species by forcing it towards the extinction state.
6. Discussion and conclusions
Populations with a demographic Allee effect are subject to inverse density dependence at low densities, which increases the likelihood of extinction. If the population is additionally affected by parasitism, the disease-related mortality further increases the likelihood of extinction and can induce more complicated dynamics. Two different forms of parasite transmission have been considered: the density-dependent (dd) and the frequency-dependent (fd) one, which are seen as two opposite extremes in a continuum of possibilities.55 In fact, the dynamics imposed by pathogens which are transmitted in one of these ways can be very different.
Both incidences impose a critical host population size, referred to as disease threshold, that determines if they can persist. While this is a lower threshold in the dd model, it is an upper threshold in the fd model. The reason is simply that the number of new infections in the fd model is a constant and that natural mortality increases with host population size. At large densities, the time during which transmission takes place is therefore reduced. This is fundamentally different from the general view of conditions for disease spread (stemming from the threshold criteria in Refs. 64 and 1, for example).
Obviously, this view largely originated from density-dependent transmission models, in which the number of secondary cases increases with population size. A large population size is generally considered to be susceptible to disease spread. In fd models, however, this is not the case and a large population size protects the host from parasite invasion.
For frequency-dependent transmission, disease establishment requires a large upper threshold value (or a small carrying capacity in the disease-free equilibrium). The main effect of parasitism is to depress host population size and to increase the extinction region at low density. If parasite pressure is too strong, the host can be driven to extinction.
For density-dependent transmission, in contrast, successful disease establishment requires a small lower threshold value (or a large carrying capacity). Again, the host population size is reduced in endemic equilibrium, and the presence of disease enlarges the extinction region.
The usage of disease thresholds in both the dd and fd model proves very useful. Relating them to the carrying capacity (which is induced by population regulation due to density dependence) and to the Allee threshold (which is induced by inverse density dependence) can explain the population behaviour and independently confirms the results from previous linear stability analyses.25,26 The biological insight provided by the disease thresholds is probably their largest asset.
Ref. 60 suggests that the Allee effect protects the host population by making it more difficult for the disease to become established. This effect is not reported here. Disease invasion usually takes place around the disease-free equilibrium, i.e. the carrying capacity, where the Allee effect has only little impact. Moreover, the disease thresholds are affected by the death rate, cf. Eqs. (10) and (11), which is linearly increasing as in logistic growth models. Since the Allee effect is concentrated in the birth rate, the disease thresholds are unlikely to be affected directly by depensation.∗ This conclusion has also been drawn in Ref. 27, where a protective function of the Allee effect has not been observed either.
If the disease threshold is sufficiently small, there can be, broadly speaking, two dynamical regimes in the dd model that do not occur for fd transmission. The first one is the emergence of another stable endemic equilibrium with small population size, leading to tristability. A necessary condition is that the disease threshold lies in between the Allee threshold and the inflection threshold. The latter relates to the curvature of host population growth.
Populations at small endemic equilibrium are largely affected by the inverse density dependence. Their demographic growth rate is so small that it can be balanced by disease-related deaths. It appears to be also important that disease transmission is density-dependent, because the existence of the third endemic equilibrium in the fd model is impossible. This must be due to the constant contact rate, which is likely to induce too large a disease burden. The host would probably go extinct at those small population densities in the fd model.
∗ As the threshold analysis in Sec. 3 suggests, it is the interplay of the disease threshold with the thresholds set by inverse and positive density dependence that drives the dynamics.
The inflection threshold poses only a necessary condition for tristability to occur. This condition holds for rather small values of the Allee threshold, i.e. the Allee effect does not need to be severe. Moreover, critical values of the disease-related parameters have been identified. The virulence needs to be below a certain level and the transmissibility above a certain value. From the saddle-node bifurcation conditions, emanating from phase plane analysis, there is another restriction on the transmissibility, namely that it cannot be too large. Hence, the endemic triple requires a pathogen with moderate transmissibility and not too large a virulence. These are characteristics of a disease that spreads effectively within a host population, but does not cause too many deaths.
While both aforementioned conditions are sufficient for the existence of three endemic equilibria, they are not sufficient for tristability, because the small endemic state can lose its stability. In this case, there can be a cyclic attractor in the form of a limit cycle (i.e., the dynamics is still tristable), but when the limit cycle disappears due to a homoclinic bifurcation, the dynamics becomes bistable.† The emergence and disappearance of stable oscillations is the second difference between the dd and fd model. Density-dependent transmission is necessary for the cyclic dynamics. Otherwise, the number of infectious individuals could not decline more rapidly than the total population, which allows the latter to recover and grow before the infection starts spreading again. The dynamical regime after the homoclinic bifurcation is characterised by long oscillatory transients with increasing amplitudes before eventual extinction (or, depending on the initial conditions, the approach of the large endemic equilibrium). This scenario is a peculiar example of transient dynamics that possibly act over an extremely long time scale and might therefore be important on the ecological time scale.
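How long a ‘ghost’ can hold a trajectory is easiest to see in the simplest generic case, the remnant of a saddle-node pair. The sketch below does not use the host-parasite model: it takes the textbook normal form dx/dt = ε + x², where the hypothetical parameter ε > 0 measures how far the system is past the bifurcation, and computes the time needed to pass the bottleneck near x = 0. It is meant only as an illustration of why such transients lengthen sharply close to the bifurcation.

```python
import math


def ghost_passage_time(eps, x_span=1.0):
    """Exact time for dx/dt = eps + x**2 to travel from -x_span to +x_span.

    Obtained by integrating dx / (eps + x**2); eps is a hypothetical
    distance from the saddle-node bifurcation.
    """
    root = math.sqrt(eps)
    return 2.0 / root * math.atan(x_span / root)


def ghost_passage_time_euler(eps, x_span=1.0, dt=1e-3):
    """Same passage time, estimated by forward Euler integration."""
    x, t = -x_span, 0.0
    while x < x_span:
        x += dt * (eps + x * x)
        t += dt
    return t


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        print(eps, ghost_passage_time(eps))
```

In this normal form, dividing ε by four roughly doubles the passage time, so the plateau grows without bound as the bifurcation point is approached.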
In the fd model, transient dynamics are possible as well that can quadruple the time to deterministic extinction. Both cases highlight the importance of transients.65
The host population extinction is of particular interest if the host is an endangered species or a pest. Management measures aim to prevent extinction in the former case and try to achieve it in the latter case. The interplay of parasitism and the Allee effect can eradicate the host even if the incidence is density-dependent. Pathogens with dd transmission therefore pose an additional threat to endangered species, while they appear as another alternative for biological control of undesirable species. Extinction in fd and dd models takes place in a different way, with the prevalence being positive or zero, respectively, in the limit process. A threshold quantity $R_i$ with biological meaning has been provided that distinguishes between these two scenarios.
† Note that the complex dynamical behaviour in the dd model (including saddle-node, Hopf and homoclinic bifurcation) is organised by a Bogdanov-Takens bifurcation point.26
In mathematical epidemiology, recommendations for management actions are often derived in terms of trying to decrease or increase the basic reproduction number (e.g., by reducing transmissibility or disease-related deaths). If the host-parasite system is close to a parameter region allowing for three endemic equilibria, any intervention should be planned very carefully. Depending on a critical virulence, there could be a large endemic state even for extremely large values of $R_0$ (Fig. 7C), whereas for other values of the virulence extinction is the certain outcome (Fig. 7A). In the former case, eradicating a host is more likely to happen by manipulating directly the population size (i.e., changing the initial conditions). In the latter case, protecting and conserving the host species is possible even for large basic reproduction numbers, but might require a restocking programme.
Acknowledgments
The author is a Ciência 2007 fellow, supported by Fundação para a Ciência e a Tecnologia, Financiamento Base 2008 - ISFL/1/209.
References
1. R. M. Anderson and R. M. May, Nature 280, 361 (1979). 2. R. M. Anderson, Nature 279, 150 (1979). 3. J. Mena-Lorca and H. W. Hethcote, Journal of Mathematical Biology 30, 693 (1992). 4. H. R. Thieme, Mathematical Biosciences 111, 99 (1992). 5. P. J. Hudson, A. Rizzoli, B. T. Grenfell, H. Heesterbeek and A. P. Dobson, The Ecology of Wildlife Diseases (Oxford University Press, Oxford, 2001). 6. F. de Castro and B. Bolker, Ecology Letters 8, 117 (2005). 7. W. O. Kermack and A. G. McKendrick, Proceedings of the Royal Society of London A 138, 55 (1932). 8. F. Brauer and C. Castillo-Chavez, Mathematical models in population biology and epidemiology (Springer-Verlag, New York, 2001). 9. R. M. Anderson, H. C.
Jackson, R. M. May and A. M. Smith, Nature 289, 765 (1981). 10. H. W. Hethcote and S. A. Levin, Periodicity in epidemiological models, in Applied Mathematical Ecology, eds. S. A. Levin, T. G. Hallam and L. J. Gross (Springer-Verlag, New York, 1989) pp. 193–211. 11. A. Pugliese, Journal of Mathematical Biology 28, 65 (1990). 12. R. M. Anderson, R. M. May and A. R. McLean, Nature 332, 228 (1988).
13. F. Brauer, Journal of Mathematical Biology 28, 451 (1990). 14. S. Busenberg and P. van den Driessche, Journal of Mathematical Biology 28, 257 (1990). 15. S. N. Busenberg and K. P. Hadeler, Mathematical Biosciences 101, 63 (1990). 16. L. Q. Gao and H. W. Hethcote, Journal of Mathematical Biology 30, 717 (1992). 17. W. R. Derrick and P. van den Driessche, Journal of Mathematical Biology 31, 495 (1993). 18. J. Zhou and H. W. Hethcote, Journal of Mathematical Biology 32, 809 (1994). 19. F. Brauer, Mathematical Biosciences 128, 13 (1995). 20. D. Greenhalgh, Mathematical and Computer Modelling 25, 85 (1997). 21. M. Y. Li, J. R. Graef, L. Wang and J. Karsai, Mathematical Biosciences 160, 191 (1999). 22. H. W. Hethcote, SIAM Review 42, 599 (2000). 23. F. M. Hilker, M. A. Lewis, H. Seno, M. Langlais and H. Malchow, Biological Invasions 7, 817 (2005). 24. A. Deredec and F. Courchamp, Oikos 112, 667 (2006). 25. F. M. Hilker, M. Langlais, S. V. Petrovskii and H. Malchow, Mathematical Biosciences 206, 61 (2007). 26. F. M. Hilker, M. Langlais and H. Malchow, American Naturalist (accepted). 27. H. R. Thieme, T. Dhirasakdanon, Z. Han and R. Trevino, Journal of Biological Dynamics (accepted). 28. W. C. Allee, Animal Aggregations: A Study in General Sociology (University of Chicago Press, Chicago, 1931). 29. P. A. Stephens and W. J. Sutherland, Trends in Ecology & Evolution 14, 401 (1999). 30. P. A. Stephens, W. J. Sutherland and R. P. Freckleton, Oikos 87, 185 (1999). 31. F. Courchamp, T. Clutton-Brock and B. Grenfell, Trends in Ecology & Evolution 14, 405 (1999). 32. F. Courchamp, L. Berec and J. Gascoigne, Allee Effects in Ecology and Conservation (Oxford University Press, New York, 2008). 33. R. A. Myers, N. J. Barrowman, J. A. Hutchings and A. A. Rosenberg, Science 269, 1106 (1995). 34. M. Kuussaari, I. Saccheri, M. Camara and I. Hanski, Oikos 82, 384 (1998). 35. A. Liebhold and J. Bascompte, Ecology Letters 6, 133 (2003). 36. M. Liermann and R. 
Hilborn, Fish and Fisheries 2, 33 (2003). 37. E. J. Milner-Gulland, O. M. Bukreeva, T. Coulson, A. A. Lushchekina, M. V. Kholodova, A. B. Bekenov and I. A. Grachev, Nature 422, 135 (2003). 38. S. Rowe, J. A. Hutchings, D. Bekkevold and A. Rakitin, ICES Journal of Marine Science 61, 1144 (2004). 39. E. Angulo, G. W. Roemer, L. Berec, J. Gascoigne and F. Courchamp, Conservation Biology 21, 1082 (2007). 40. L. Berec, E. Angulo and F. Courchamp, Trends in Ecology & Evolution 22, 185 (2007). 41. P. K. Molnár, A. E. Derocher, M. A. Lewis and M. K. Taylor, Proceedings of the Royal Society of London B 275, 217 (2008).
42. A. P. Dobson and R. M. May, Disease and conservation, in Conservation Biology, ed. M. E. Soulé (Sinauer Associates, Inc., Sunderland, MA, 1986) pp. 345–365. 43. M. E. Scott, Conservation Biology 2, 40 (1988). 44. H. McCallum and A. Dobson, Trends in Ecology & Evolution 10, 190 (1995). 45. P. Daszak, L. Berger, A. A. Cunningham, A. D. Hyatt, D. E. Green and R. Speare, Emerging Infectious Diseases 5, 735 (1999). 46. R. Woodroffe, Animal Conservation 2, 185 (1999). 47. C. D. Harvell, C. E. Mitchell, J. R. Ward, S. Altizer, A. P. Dobson, R. S. Ostfeld and M. D. Samuel, Science 296, 2158 (2002). 48. K. D. Lafferty and L. R. Gerber, Conservation Biology 16, 593 (2002). 49. J. A. Pounds, M. R. Bustamante, L. A. Coloma, J. A. Consuegra, M. P. L. Fogden, P. N. Foster, E. La Marca, K. L. Masters, A. Merino-Viteri, R. Puschendorf, S. R. Ron, G. A. Sánchez-Azofeifa, C. J. Still and B. E. Young, Nature 439, 161 (2006). 50. L. J. Rachowicz, R. A. Knapp, J. A. T. Morgan, M. J. Stice, V. T. Vredenburg, J. M. Parker and C. J. Briggs, Ecology 87, 1671 (2006). 51. K. F. Smith, D. F. Sax and K. D. Lafferty, Conservation Biology 20, 1349 (2006). 52. S. Cleaveland, T. Mlengeya, M. Kaare, D. Haydon, T. Lembo, M. K. Laurenson and C. Packer, Conservation Biology 21, 612 (2007). 53. M. Begon, M. Bennett, R. G. Bowers, N. P. French, S. M. Hazel and J. Turner, Epidemiology and Infection 129, 147 (2002). 54. E. Fromont, D. Pontier and M. Langlais, Proceedings of the Royal Society of London B 265, 1097 (1998). 55. H. McCallum, N. Barlow and J. Hone, Trends in Ecology & Evolution 16, 295 (2001). 56. M. A. Lewis and P. Kareiva, Theoretical Population Biology 43, 141 (1993). 57. V. Volterra, Human Biology 10, 3 (1938). 58. B. Dennis, Natural Resource Modeling 3, 481 (1989). 59. D. S. Boukal and L. Berec, Journal of Theoretical Biology 218, 375 (2002). 60. F. Dercole, R. Ferrière, A. Gragnani and S. Rinaldi, Proceedings of the Royal Society of London B 273, 983 (2006). 61. W. M. Getz and J.
Pickering, American Naturalist 121, 892 (1983). 62. R. D. Holt and J. Pickering, American Naturalist 126, 196 (1985). 63. F. M. Hilker and K. Schmitz, Journal of Theoretical Biology, doi:10.1016/j.jtbi.2008.08.018 (in press). 64. W. O. Kermack and A. G. McKendrick, Proceedings of the Royal Society of London A 115, 700 (1927). 65. A. Hastings, Trends in Ecology & Evolution 19, 39 (2004).
PULSE INFECTION. CONTROL FIXING TIME BETWEEN INFECTION EVENTS∗ † ´ F. CORDOVA-LEPE
Universidad Cat´ olica del Maule, 3605 San Miguel Avenue, Talca, Chile E-mail:
[email protected] ´ E. GONZALEZ-OLIVARES Pontificia Universidad Cat´ olica de Valpara´ıso, 2950 Brasil Avenue, Valpara´ıso, Chile E-mail:
[email protected] We assumed a population affected by a disease, whose infection process is associated to a sequence of social punctual events. The event is a kind of cultural activity or an economic necessity that happens with some frequency. We formulate a generalist mathematical model for determining, with analytic techniques of the Impulsive Differential Equations, the dynamic behavior of the infectious group. We introduce diverse conditions on the frequency of the infection events with the intention of to put control tools in hands of the regulatory authority, for a better sanitary management. The idea is to avoid a spread of the disease, trying to keep the amount of infectious under predetermined levels or going towards extinction.
1. Introduction
Since the foundations of Mathematical Epidemiology, in the works of Hamer1 (1906), Ross2 (1911) and Kermack & McKendrick3 (1927), the subject has centered on establishing models that permit the study (description, understanding and prediction) of the dynamics of infectious diseases, principally in human populations. The literature gives an account of a wide variety of classical models with good general perspectives.4
∗ This work is supported by Universidad Católica del Maule.
† Work partially supported by UMCE.
The most used deterministic mathematical tools have been systems of ordinary differential equations. Normally, in the presence of an infectious disease, we can think of the population as divided into, for instance, two subpopulations: the susceptible group and the infectious group, according to whether the individuals can be infected (they have no immunity or resistance) or are already infected and able to infect others. The sizes of these groups are usually denoted by the letters S and I, respectively. Ordinary differential models try to account for some inner characteristics of the disease. In SIS models they express the rates of change of S and I as functions of S and I. But there are diseases whose particularities do not admit a natural model by way of ordinary differential systems. In this case, deterministic models can resort to different mathematical tools such as Difference Equations (DE), Differential Equations with Delay (DDE), Equations on Time Scales, or Impulsive Differential Equations (IDE), which are hybrid systems. This article is aimed at studying the potential of a new type of IDE proposed in Córdova-Lepe5 (2007). Some details are given in Section 2. The main uses of traditional IDE in Mathematical Epidemiology have concentrated on the possibility of exerting impulsive preventive control. It is considered that, in the global dynamics of the disease, a vaccination process of very short duration represents a real jump (impulse) in the values of the variables: an impulse that instantaneously transports a part of the susceptible population to the group of removed individuals, that is, the individuals that acquire immunity and do not infect. See Meng XZ & Chen LS6 (Aug. 2008); Wei CJ & Chen LS7 (2008); Gao SJ, Teng ZD & Xie DH8 (Jul. 2008); and Zhang TL & Teng ZD9 (Jan. 2008). This class of models is known as “Pulse Vaccination Models”.
Our interest is the construction and analysis of adequate mathematical SIS models for studying the dynamics of diseases in which the infection process is associated with the occurrence of certain events. Such an event, viewed along the time variable, can reasonably be considered punctual, i.e., it has a short duration and is sporadic. The idea is to insert the impulsive effect principally in the infection process. We suppose that during those events a considerable group of susceptible individuals is transported to the infectious group. This transport is fast enough, compared with the dynamics during the rest of the time, that it can be assumed to be instantaneous. The idea of discretizing the instants of contagion is justified because it is not unrealistic to suppose that human populations can have
associated social activities (cultural or economic) that involve transmission of diseases from individual to individual. These are diseases whose transmission is impossible to block, or can be blocked only partially. When those infection events do not occur, we will assume that the dynamics of the process is of continuous type. We can immediately detect two scenarios. In the first, the infection events are uniformly spaced and represent a usual activity of the community. In the second, there is a health authority or sanitary body that has the power to establish a calendar of activities, with a level of flexibility determined by cultural or economic reasons, even though it cannot prohibit the infectious events themselves. That is, we face the possibility of introducing an element of control that can lower the infection rates. In both cases IDE seem to be the most appropriate kind of mathematical model, but in the second we require a type of IDE in which the time between impulse instants, that is, the time between infection events, is a function of the impulse size, or at least of the number of infected individuals.
This work is organized as follows. In Section 2 we summarize some aspects of IDE, especially those concerning the new type mentioned. In Section 3 we present the epidemiological model (an IDE) of interest. Finally, in Section 4 we give three results that show conditions for obtaining stable equilibria that could allow control to be exercised.
2. The Impulsive Differential Equations
In the literature it is possible to find several ways of modelling real phenomena that show an impulsive effect in their evolution. One way is to represent the evolution trajectories as solutions of IDE. These equations were introduced by Mil’man & Myshkis10 (1960), a point of view revived in the eighties, see Perestyuk & Samoilenko11–13 (1977, 1981, 1987). We note the two books of Bainov & Simeonov14,15 (1989, 1993), because they have played a significant role in disseminating the theory.
We are in the domain of IDE at Fixed Times (IDE-FT) when the impulse instants of a system are known before the dynamics develops. If the impulses appear when a relation between the state and the time, determined a priori, is satisfied, we are in the field of IDE at Variable Times (IDE-VT). The theory of IDE-FT is widely developed and is based on direct analogies with ODE. Comparatively, IDE-VT have seen less development, because they exhibit phenomena such as solutions with infinitely many pulses in finite time, loss of uniqueness toward the past in the initial value problem, and loss of autonomy, among other difficulties. Other
classical texts are Bainov & Covachev16 (1994); Lakshmikantham, Bainov & Simeonov17 (1989); Lakshmikantham & Liu18 (1993); Samoilenko & Perestyuk19 (1995); and Yang20 (2001). Recently, diverse types of hybrid equations have appeared, which combine discrete and continuous techniques for handling the state variable. Some examples are the advanced and delayed differential equations with piecewise constant argument, see Cook & Wiener21 (1987). The use of IDE in biomathematics is widespread. For pulse vaccination, see the works of Meng & Chen22 (2008), Gakkhar & Negi23 (2008) or Zhang & Teng24 (2008). For the chemotherapeutic treatment of diseases, see Lakmeche & Arino25 (2001). For impulsive strategies of control and pest management, see Zhang, Jianjun & Lansun26 (2007) or Pang & Chen27 (2008). For impulsive harvesting in the management of renewable resources, see Allegretto & Papini28 (2008) or Negi & Gakkhar29 (2007).
The evolution law associated with the infection process under study cannot be expressed by means of traditional IDE-FT or IDE-VT. In fact, we need a new impulsive model, the IDE at Impulse-Dependent Times (IDE-ITD), as proposed by Córdova-Lepe5 (2007) for the biomathematical community, and by Córdova-Lepe, Pinto & González-Olivares30 (2008) for the mathematical community. We now review some introductory elements of IDE-ITD necessary for the formulation of models. Let $\Omega \subset \mathbb{R}^n$ be the set of all possible states of the process. We denote by $x : \mathbb{R} \to \Omega$ the function that relates each instant $t \in \mathbb{R}$ to the state $x(t)$. To formulate the dynamics of the model we require, first, a system of ordinary differential equations
\[ x' = f(t, x), \quad x \in \Omega,\ t \in \mathbb{R}, \tag{1} \]
for some $f : \mathbb{R} \times \Omega \to \mathbb{R}^n$, which describes the evolution of the state at all times except at the instants $\{t_k\}_{k \ge 1}$, which are determined recursively during the course of the dynamics. Second, a law regulating the impulses, which acts at the instants $\{t_k\}_{k \ge 1}$ as follows:
\[ x(t^+) = I_t(x(t)), \quad t \in \{t_k\}_{k \ge 1}. \tag{2} \]
For each $t \in \mathbb{R}$, the function $I_t : \Omega \to \Omega$ instantaneously transfers $x(t)$ to a new state $I_t(x(t))$. Finally, starting from $t_0 \in \mathbb{R}$, the first impulse instant, the sequence $\{t_k\}_{k \ge 1}$ is generated by the recurrence
\[ t_{k+1} = t_k + G_{t_k}(x(t_k^+), x(t_k)), \tag{3} \]
where $G_t : \Omega \times \Omega \to \mathbb{R}^+$ for each $t \in \mathbb{R}$. The joint fulfillment of (1), (2) and (3), as a unified evolution law, is denoted
\[ \begin{cases} x'(t) = f(t, x), & t \neq t_k, \\ x(t^+) = I_t(x(t)), & t = t_k, \\ \Delta t_k = G_{t_k}(x(t_k^+), x(t_k)). \end{cases} \tag{4} \]
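The three ingredients of (4) can be sketched numerically as follows. This is only a minimal illustration for a scalar state, assuming an explicit Euler scheme between impulses; the function name `simulate_ide_itd`, its arguments and the example data are hypothetical and are not taken from the references.

```python
import math
from typing import Callable, List, Tuple

def simulate_ide_itd(
    f: Callable[[float, float], float],            # right-hand side of (1)
    jump: Callable[[float, float], float],         # impulse map I_t(x) of (2)
    gap: Callable[[float, float, float], float],   # G_t(x_after, x_before) of (3)
    t0: float,
    x0: float,
    n_impulses: int,
    dt: float = 1e-3,
) -> List[Tuple[float, float, float]]:
    """Return (t_k, x(t_k), x(t_k^+)) for the first n_impulses impulse times."""
    t, x = t0, x0
    history = []
    for _ in range(n_impulses):
        x_after = jump(t, x)               # law (2): instantaneous transfer
        history.append((t, x, x_after))
        wait = gap(t, x_after, x)          # law (3): waiting time to the next impulse
        steps = max(1, math.ceil(wait / dt))
        h = wait / steps
        x = x_after
        for i in range(steps):             # law (1): explicit Euler between impulses
            x += h * f(t + i * h, x)
        t += wait
    return history

# Hypothetical example: linear decay, doubling impulse, unit waiting time.
history = simulate_ide_itd(
    f=lambda t, x: -0.5 * x,
    jump=lambda t, x: 2.0 * x,
    gap=lambda t, x_after, x_before: 1.0,
    t0=0.0,
    x0=1.0,
    n_impulses=3,
)
for t_k, before, after in history:
    print(t_k, before, after)
```

Because the waiting time is computed only after the jump has been applied, the impulse instants are generated during the run, which is the feature that distinguishes (4) from an IDE at fixed times.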
The set $\Omega$ is an open and connected subset of $\mathbb{R}^n$. The function $f : \mathbb{R} \times \Omega \to \mathbb{R}^n$ is continuous and satisfies a local Lipschitz condition, and the functions $I_t : \Omega \to \Omega$ and $G_t : \Omega \times \Omega \to \mathbb{R}^+$, $t \in \mathbb{R}$, are continuous. A function $x : (\alpha, \beta) \to \Omega$, $-\infty \le \alpha < \beta \le \infty$, is a solution of (4) if there exists $T_x \subset (\alpha, \beta)$, a set without accumulation points in $(\alpha, \beta)$, such that: (i) for all $t \in (\alpha, \beta) \setminus T_x$, we have $x'(t) = f(t, x(t))$; (ii) for all $t \in T_x$, we have (a) $x(t^+) = I_t(x(t))$ and (b) $x(t^-) = x(t)$; and (iii) for all consecutive elements $t' < t''$ of $T_x$, we have $t'' - t' = G_{t'}(x(t'^+), x(t'))$. The set $T_x$ is called the set of impulse times of $x$. Moreover, given a pair $(t_0, x_0) \in \mathbb{R} \times \Omega$, the initial value problem described by system (4) plus the condition $x(t_0) = x_0$ has a unique solution. A solution $x : (\alpha, \beta) \to \Omega$, $\beta < \infty$, of (4) has a continuation beyond $\beta$ if and only if (a) $x(\beta^-)$ exists in $\Omega$, and (b) $T_x$ is finite by the right. A set $T$ is called finite by the right if there exists $t \in T$ such that $(t, \infty) \cap T$ is finite.
3. The Model
We assume that there exists a sequence of times $t = t_k$, $k = 0, 1, \dots$, at which the infection events occur. Each event induces an impulsive flux S → I that depends on the sizes of $I(t)$ and $S(t)$; we take it to be a function of the product of those sizes and denote it $\beta[S(t) \cdot I(t)]$. Note that we are assuming a relatively very short latent period. At all other times we assume a continuous recovery process with per capita rate $\gamma[I(t)]$; moreover, since recovery does not confer immunity, there is a continuous flux I → S. Furthermore, for each instant $t_k$, we assume that the next infection event will take place $G[I(t_k), I(t_k^+)]$ time units later. That is, the moment of occurrence of a new event is determined as a function of the number of infectious individuals just before and just after the current event. Without considering vital dynamics and under the condition $S + I = 1$, the model takes the form
\[ \begin{cases} I'(t) = -\gamma[I(t)] \cdot I(t), & t \neq t_k, \\ I(t^+) = I(t) + \beta[(1 - I(t)) \cdot I(t)], & t = t_k, \\ \Delta t_k = G[I(t_k), I(t_k^+)]. \end{cases} \tag{5} \]
We will study analytically two very simple cases of equation (5). First, we consider the case in which the function G is constant and determine the corresponding basic reproductive number; we compare the results with their analogues in continuous-time SIS models. Second, we take the IDE-ITD for the case in which $G[I(t_k), I(t_k^+)]$ equals $\alpha \cdot I(t_k)$. Here we aim at a complete characterization, in the parameter space, of the possible dynamics of the infectious group. Finally, we will set general conditions on equation (5) that allow us to exert control and to obtain bounded trajectories, specific asymptotic behavior, and/or periodic and stable behaviors.
4. Results
A simple case of (5) is obtained when, at each instant $t = t_k$, $k = 0, 1, \dots$, a number $\beta \cdot (1 - I(t)) \cdot I(t)$, $\beta > 0$, of individuals quickly pass from the susceptible group to the infectious group. At all other times, $\gamma \cdot I(t)$ individuals recover per unit time. So we reduce (5) to the following IDE-ITD:
\[ \begin{cases} I'(t) = -\gamma \cdot I(t), & t \neq t_k, \\ I(t^+) = [1 + \beta \cdot (1 - I(t))] \cdot I(t), & t = t_k, \\ \Delta t_k = G[I(t_k), I(t_k^+)]. \end{cases} \tag{6} \]
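For system (6) the flow between two events is available in closed form, $I(t) = I(t_k^+) e^{-\gamma (t - t_k)}$, so the values just before each event can be generated without a numerical integrator. The sketch below assumes illustrative parameter values ($\beta = 0.8$, $\gamma = 0.5$) and two hypothetical choices of the regulation function G; none of these numbers come from the text.

```python
import math
from typing import Callable, List

def infection_sequence(
    beta: float,
    gamma: float,
    G: Callable[[float, float], float],  # G(I_before, I_after): waiting time
    i0: float,
    n_events: int,
) -> List[float]:
    """Values I(t_k) just before each infection event of system (6)."""
    values = [i0]
    i_before = i0
    for _ in range(n_events):
        # Second line of (6): impulsive infection at the event.
        i_after = (1.0 + beta * (1.0 - i_before)) * i_before
        # Third line of (6): time until the next event.
        wait = G(i_before, i_after)
        # First line of (6): exact exponential decay during the waiting time.
        i_before = i_after * math.exp(-gamma * wait)
        values.append(i_before)
    return values

# Hypothetical regulation functions: a constant gap and a gap proportional
# to the number of infectious individuals just before the event.
constant_gap = infection_sequence(0.8, 0.5, lambda before, after: 1.0, 0.3, 200)
state_gap = infection_sequence(0.8, 0.5, lambda before, after: 2.0 * before, 0.3, 200)
print(constant_gap[-1], state_gap[-1])
```

Passing G as a function keeps the infection and recovery steps unchanged while the scheduling rule, which is the control element of the model, is swapped freely.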
Note that the requirement $I(t^+) \in [0, 1]$ in the second line of (6) implies the condition $0 < \beta < 1$. In principle, several choices are possible for the control (regulation) function G, which determines the waiting times between consecutive infection events. Here we first take G constant, and later we let G depend on the number of infectious individuals just before the current infection event.
Theorem 4.1. If in (6) we define $G(I, I^+) = s$, $s > 0$, then the basic reproductive number takes the form
\[ R_0 = \frac{\beta}{e^{\gamma \cdot s} - 1}. \tag{7} \]
In fact,
(a) if $R_0 > 1$, then there exists a periodic trajectory of (6), with period s, which is globally asymptotically stable in $]0, 1]$.
(b) if $R_0 \le 1$, the trajectory free of infective individuals ($I(t) = 0$, $t \ge 0$) is globally asymptotically stable.
Proof: For any $k = 0, 1, \dots$, the instant $t_k$ has an associated number $I(t_k)$ of infectious individuals, which we denote by $I_k$. Two consecutive terms of the sequence $\{I_k\}$ are related recursively by $I_{k+1} = F(I_k)$, where
\[ F(x) = [1 + \beta(1 - x)]\, x \cdot e^{-\gamma \cdot s}, \quad x \in [0, 1]. \]
The function $F : [0, 1] \to [0, 1]$ defines a discrete dynamical system on the interval [0, 1]. Note that to any fixed point of F there corresponds a periodic solution of (6). Looking for a non-null solution $\hat{I}$ of $\hat{I} = F(\hat{I})$, we get
\[ \hat{I} = 1 - \frac{e^{\gamma \cdot s} - 1}{\beta}. \]
It is a positive number if and only if $R_0 > 1$. Moreover, we have $F'(x) = [(1 - \beta) + 2\beta(1 - x)] e^{-\gamma \cdot s} > 0$ and $F''(x) = -2\beta \cdot e^{-\gamma \cdot s} < 0$ for all $x \in [0, 1]$, so that F is strictly increasing and concave on its domain. Then: (a) if $R_0 > 1$, considering the continuity of F, the fixed point $\hat{I}$ is globally asymptotically stable in $]0, 1]$; (b) if $R_0 \le 1$, since $F'(0) = (1 + \beta) e^{-\gamma \cdot s}$, we have $F'(0) \le 1$, which together with the concavity of F implies that the null fixed point is globally asymptotically stable in [0, 1]. The stability of the equilibria of the discrete dynamical system is transferred to the respective periodic trajectories of the IDE (6).
Theorem 4.2. If in (6) we define $G(I, I^+) = \alpha \cdot I$ for each $I \in [0, 1]$ and some $\alpha > 0$, then there exists a periodic trajectory representing a non-trivial equilibrium of the infectious group, which is globally asymptotically stable.
Proof: For any time $t_k$, $k = 0, 1, \dots$, denoting $I(t_k)$ by $I_k$, we obtain the recurrence $I_{k+1} = F(I_k)$, where
\[ F(x) = [1 + \beta(1 - x)]\, x \cdot e^{-\gamma \cdot \alpha x}, \quad x \in [0, 1]. \]
A non-trivial equilibrium $\hat{I}$ of this discrete system has to satisfy $f(\hat{I}) = g(\hat{I})$, where $f(x) = 1 + \beta(1 - x)$ and $g(x) = e^{\gamma \cdot \alpha x}$, for each $x \in [0, 1]$. The graph
of the function f is a straight segment joining the point $(0, 1 + \beta)$ with $(1, 1)$. For the function g we have $g(0) = 1 < f(0)$ and $g(1) = e^{\gamma \cdot \alpha} > f(1)$. Since f is strictly decreasing and g strictly increasing, by continuity there exists a unique fixed point $\hat{I} \in ]0, 1[$. The convexity of g implies that its tangent line $y = \gamma\alpha x + 1$ at the point (0, 1) intersects the graph of f at a point with abscissa $I^*$ such that $\hat{I} < I^*$, so that
\[ 0 < \hat{I} < I^* = \frac{\beta}{\gamma \cdot \alpha + \beta}. \]
On the other hand,
\[ F'(x) = F(x) \left[ \frac{1}{x} - \frac{\beta}{1 + \beta(1 - x)} - \gamma \cdot \alpha \right], \]
and then, since $F(\hat{I}) = \hat{I}$,
\[ F'(\hat{I}) = 1 - \hat{I} \left[ \frac{\beta}{1 + \beta(1 - \hat{I})} + \gamma \cdot \alpha \right]. \]
Note that the condition $-1 < F'(\hat{I}) < 1$ of local stability for discrete systems is equivalent to
\[ 0 < \hat{I} \cdot \left[ \frac{\beta}{1 + \beta(1 - \hat{I})} + \gamma \cdot \alpha \right] < 2. \]
If we denote the expression between brackets by A, then $A < \beta + \gamma \cdot \alpha$, because $1 + \beta(1 - \hat{I}) > 1$, and we have
\[ 0 < \hat{I} \cdot A < I^* \cdot A < \frac{\beta}{\gamma \cdot \alpha + \beta} [\gamma \cdot \alpha + \beta] = \beta < 1 < 2, \]
so that $\hat{I}$ is locally stable. In order to determine whether $\hat{I}$ is globally asymptotically stable, we look for the $x \in [0, 1]$ where $F'(x) > 0$; this happens if
\[ \alpha\beta\gamma \cdot x^2 - [2\beta + \alpha\gamma(1 + \beta)] \cdot x + (1 + \beta) > 0. \]
The quadratic expression on the left has roots $\lambda_-$ and $\lambda_+$ given by
\[ \lambda_\mp = \frac{1}{2\alpha\beta\gamma} \left[ 2\beta + \alpha\gamma(1 + \beta) \mp \sqrt{[2\beta]^2 + [\alpha\gamma(1 + \beta)]^2} \right]. \]
Since $0 < \lambda_- < \lambda_+$, $F'(x)$ can be negative only in $]\lambda_-, \lambda_+[$. However, supposing $1 > \lambda_+$ leads to a contradiction, so we have two possibilities: (a) $1 \le \lambda_-$ or (b) $\lambda_- < 1 \le \lambda_+$. With some algebraic work it can be proved that if $\beta + \alpha\gamma \le 1$ [resp. $\beta + \alpha\gamma > 1$], then (a) [resp. (b)] holds. In case (a), we have $F'(x) \ge 0$ for each $x \in [0, 1]$. Observing that $F'(0) = 1 + \beta > 1$, we conclude that $\hat{I}$ is globally asymptotically stable in $]0, 1]$.
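In case (b), that is, $\beta + \alpha\gamma > 1$, we have $1 \in ]\lambda_-, \lambda_+]$. Claim: in any case $\hat{I} < \lambda_-$. In fact, since $\beta < 1$, we have $2\beta^2 < \beta(1 + \beta)$ and $\alpha\gamma(1 + \beta)\beta < \alpha\gamma(1 + \beta)$. Adding the inequalities we get $\beta[2\beta + \alpha\gamma(1 + \beta)] < (\alpha\gamma + \beta)(1 + \beta)$. Then
\[ \frac{\beta}{\alpha\gamma + \beta} [2\beta + \alpha\gamma(1 + \beta)] < 1 + \beta + \frac{\alpha\gamma\beta^3}{[\alpha\gamma + \beta]^2}. \]
Multiplying the last expression by $4\alpha\beta\gamma$, then adding $[2\beta]^2 + [\alpha\gamma(1 + \beta)]^2$ to both sides and factoring, we obtain
\[ [2\beta]^2 + [\alpha\gamma(1 + \beta)]^2 + \frac{4\alpha\gamma\beta^2}{\alpha\gamma + \beta} [2\beta + \alpha\gamma(1 + \beta)] < [2\beta + \alpha\gamma(1 + \beta)]^2 + \left[ \frac{2\alpha\gamma\beta^2}{\alpha\gamma + \beta} \right]^2. \]
Now, reordering and taking square roots, it follows that
\[ \sqrt{[2\beta]^2 + [\alpha\gamma(1 + \beta)]^2} < [2\beta + \alpha\gamma(1 + \beta)] - \frac{2\alpha\gamma\beta^2}{\alpha\gamma + \beta}. \]
But the left side is equal to $[2\beta + \alpha\gamma(1 + \beta)] - 2\alpha\beta\gamma\lambda_-$, so replacing and cancelling terms we obtain
\[ -2\alpha\beta\gamma\lambda_- < -\frac{2\alpha\gamma\beta^2}{\alpha\gamma + \beta}. \]
Finally, $\lambda_- > \beta/(\alpha\gamma + \beta) = I^* > \hat{I}$. To prove that in case (b) the fixed point $\hat{I}$ is globally asymptotically stable, it remains to apply Theorem 1.5.2 in Kocić & Ladas.31
Looking for a generalization of Theorem 4.2, we consider the following IDE-ITD:
\[ \begin{cases} I'(t) = -\gamma(I(t)) \cdot I(t), & t \neq t_k, \\ I(t^+) = F(I(t)), & t = t_k, \\ \Delta t_k = G(I(t_k^+)). \end{cases} \tag{8} \]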
In (8) the function $F : [0, 1] \to [0, 1]$ is differentiable, $F(0) = 0$, $F(1) = 1$, and $0 < F'(I) < F(I)/I$ for each $I \in ]0, 1]$. Note that $F(I) = [1 + \beta(1 - I)] I$, considered in (6), satisfies the above conditions. Moreover, the function $\gamma : [0, 1] \to ]0, \infty[$ is such that $0 \le \gamma(I_2) - \gamma(I_1) < L \cdot (I_2 - I_1)$ for all $I_1, I_2 \in [0, 1]$ with $I_1 < I_2$; a trivial example is the constant function $\gamma$ of (6).
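The thresholds in Theorems 4.1 and 4.2 can be checked numerically by iterating the one-step maps used in their proofs. The following is a minimal sketch, assuming illustrative values $\beta = 0.8$, $\gamma = 0.5$, $s \in \{1, 2\}$ and $\alpha = 2$ that are not taken from the text; the helper names are hypothetical.

```python
import math

BETA, GAMMA = 0.8, 0.5  # illustrative values only, with 0 < BETA < 1

def r0(beta: float, gamma: float, s: float) -> float:
    """Basic reproductive number (7) for a constant waiting time s."""
    return beta / math.expm1(gamma * s)

def limit_of_map(step, x0: float, n: int = 2000) -> float:
    """Iterate x -> step(x) n times, starting from x0."""
    x = x0
    for _ in range(n):
        x = step(x)
    return x

def map_constant(s: float):
    """One-step map of Theorem 4.1 (waiting time s)."""
    return lambda x: (1.0 + BETA * (1.0 - x)) * x * math.exp(-GAMMA * s)

def map_proportional(alpha: float):
    """One-step map of Theorem 4.2 (waiting time alpha * x)."""
    return lambda x: (1.0 + BETA * (1.0 - x)) * x * math.exp(-GAMMA * alpha * x)

# Theorem 4.1: s = 1 gives R0 > 1 and s = 2 gives R0 < 1 for these values.
endemic = limit_of_map(map_constant(1.0), 0.3)
extinct = limit_of_map(map_constant(2.0), 0.3)

# Theorem 4.2: the limit should stay below the bound I* used in the proof.
alpha = 2.0
i_hat = limit_of_map(map_proportional(alpha), 0.3)
i_star = BETA / (GAMMA * alpha + BETA)

print("R0(s=1) =", r0(BETA, GAMMA, 1.0), "limit =", endemic)
print("R0(s=2) =", r0(BETA, GAMMA, 2.0), "limit =", extinct)
print("I_hat =", i_hat, "I* =", i_star)
```

In the constant case the non-trivial fixed point can also be written as $\hat{I} = 1 - 1/R_0$, which follows directly from (7) and gives a quick consistency check on the iteration.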
Theorem 4.3. If in (8) the function G is strictly increasing and there exist $\lambda, \delta > 0$, $\delta > L + \gamma(0)$, such that
\[ \lambda \cdot [v - u] + \delta \cdot [G(v) - G(u)] \le \ln[v/u], \tag{9} \]
for any $0 < u < v < 1$, then (8) has a periodic trajectory which is globally asymptotically stable.
Proof: If we take an initial condition I at time $t_0 = 0$, we observe that I jumps to $F(I)$. During the interval $]0, G(F(I))]$ the dynamics is defined by the first equation of (8). Using the equivalent integral equation, the value $T(I)$ of the variable at the instant $G(F(I))$ is given by
\[ T(I) = F(I) \cdot \exp\left( -\int_0^{G[F(I)]} \gamma(\varphi(s; F(I)))\, ds \right), \tag{10} \]
where $\varphi(s; I_0)$, $s \in [0, \infty[$, is the solution on the real line of the ODE $x' = -\gamma(x) \cdot x$ such that $\varphi(0; I_0) = I_0$. First we prove that $T : [0, 1] \to [0, 1]$ defined by (10) is a strictly increasing function. In fact, let $I_1$ and $I_2$ be in [0, 1] with $I_1 < I_2$; then we have
\[ \ln[T(I_1)/T(I_2)] = \ln[F(I_1)/F(I_2)] + \int_0^{G[F(I_2)]} \gamma(\varphi(s; F(I_2)))\, ds - \int_0^{G[F(I_1)]} \gamma(\varphi(s; F(I_1)))\, ds. \tag{11} \]
Since G and F are strictly increasing, we know that $G(F(I_1)) < G(F(I_2))$. So the difference of the integrals in (11) can be rewritten as
\[ \Lambda = \int_0^{G[F(I_1)]} [\gamma(\varphi(s; F(I_2))) - \gamma(\varphi(s; F(I_1)))]\, ds + \int_{G[F(I_1)]}^{G[F(I_2)]} \gamma(\varphi(s; F(I_2)))\, ds. \]
By the Mean Value Theorem, the first integral equals $G(F(I_1)) [\gamma(\varphi(s; F(I_2))) - \gamma(\varphi(s; F(I_1)))]$ for some $s \in [0, G(F(I_1))]$; by the Lipschitz condition and the continuity with respect to initial conditions, this integral is less than $G(F(I_1)) \cdot L \cdot \rho \cdot [F(I_2) - F(I_1)]$,
for some $\rho > 0$. The second integral, by the Mean Value Theorem and the bound on $\gamma$, is less than $(L + \gamma(0)) [G(F(I_2)) - G(F(I_1))]$, so that
\[ \Lambda \le \rho \cdot G(1) \cdot L \cdot [F(I_2) - F(I_1)] + (L + \gamma(0)) \cdot [G(F(I_2)) - G(F(I_1))]. \]
Note that $\rho$ can be chosen small enough to have $\rho \cdot G(1) \cdot L < \lambda$ if $I_1$ and $I_2$ are close; then by (9) we get $\ln[T(I_1)/T(I_2)] < 0$, and hence $T(I_1) < T(I_2)$. Now, if $F'(0) > 1$, we prove that $T : [0, 1] \to [0, 1]$ has a unique non-trivial fixed point. If $\hat{I} \in ]0, 1[$ is such that $T(\hat{I}) = \hat{I}$, then
\[ F(\hat{I})/\hat{I} = \exp\left( \int_0^{G(F(\hat{I}))} \gamma(\varphi(s; F(\hat{I})))\, ds \right). \tag{12} \]
The function g defined on [0, 1] by $g(I) = F(I)/I$ if $I \in ]0, 1]$ and $g(0) = F'(0)$ decreases continuously from $F'(0) > 1$ to $F(1)/1 = 1$, because $g'(I) = [F'(I) \cdot I - F(I)]/I^2 < 0$ for each $0 < I \le 1$. The right side of (12), as a function of I, is strictly increasing, equals 1 at $I = 0$ and is bigger than 1 at $I = 1$, so the two graphs have a unique common point. Clearly $\hat{I}$ has to be globally asymptotically stable in $]0, 1[$. Finally, we note that $F'(I)/F(I) < 1/I$ for $0 < I \le 1$. Integrating on $[y, 1]$, $y > 0$, we get $\ln(F(1)) - \ln(F(y)) < \ln(1) - \ln(y)$. So $F(y) > y$, and then $F'(0) \ge 1$. However, $F'(0) = 1$ is impossible, because in this case g would be a strictly decreasing function such that $g(0) = g(1) = 1$. Therefore $F'(0) > 1$ always holds.
References
1. Hamer, Epidemic disease in England: the evidence of variability and of persistency of type. The Lancet (i), 733-739 (1906).
2. Ross, Some quantitative studies in epidemiology. Nature, 87, 466-467 (1911).
3. Kermack & McKendrick, A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society A 115, 700-721 (1927).
4. Hethcote, H.W., The mathematics of infectious diseases. SIAM Review, 42(4), 599-653 (2000).
5. Córdova-Lepe, Advances in a theory of impulsive differential equations at impulse-dependent times, with applications to the bio-economics, in: Mondaini & Dilão (Eds.), Biomat 2006, International Symposium on Mathematical and Computational Biology, World Scientific, 343-357 (2007).
6. Meng XZ & Chen LS, Dynamical behaviors for an SIR epidemic model with time delay and pulse vaccination. Taiwanese Journal of Mathematics, 12[5], 1107-1122 (2008).
7. Wei CJ & Chen LS, A delayed epidemic model with pulse vaccination. Discrete Dynamics in Nature and Society, Article Number: 746951 (2008).
8. Gao SJ, Teng ZD & Xie DH, The effects of pulse vaccination on SEIR model with two time delays. Applied Mathematics and Computation, 201[1-2], 282-292 (2008).
9. Zhang TL & Teng ZD, An SIRVS epidemic model with pulse vaccination strategy. Journal of Theoretical Biology, 250[2], 375-381 (2008).
10. Mil’man & Myshkis, On the stability of motion in the presence of impulses. Sibirsk. Mat. Zh., 1:2, 233-237 (1960) (Russian).
11. Perestyuk & Samoilenko, Stability of solutions of differential equations with impulse effect. Diff. Uravn., 13, 1981-1992 (1977).
12. Perestyuk & Samoilenko, On the stability of systems with impulse effect. Diff. Uravn., 11, 1995-2001 (1981).
13. Perestyuk & Samoilenko, Differential equations with impulse effect. Kiev, Vishcha Shkola (1987).
14. Bainov & Simeonov, Systems with impulse effect. Ellis Horwood Series in Mathematics and its Applications. Ellis Horwood Limited (1989).
15. Bainov & Simeonov, Impulsive differential equations: periodic solutions and applications. Longman Scientific & Technical (1993).
16. Bainov & Covachev, Impulsive differential equations with small parameter. V.4 of Series on Advances in Mathematics for Applied Sciences. World Scientific, Singapore (1994).
17. Lakshmikantham, Bainov & Simeonov, Theory of impulsive differential equations. World Scientific, Singapore (1989).
18. Lakshmikantham & Liu, Stability analysis in terms of two measures. World Scientific, Singapore (1993).
19. Samoilenko & Perestyuk, Impulsive Differential Equations. World Scientific Series on Nonlinear Science, A14 (1995).
20. Yang, Impulsive systems and control: theory and applications. Nova Science Publishers Inc., Huntington, N.Y. (2001).
21. Cook & Wiener, An equation alternately of retarded and advanced type. Proceedings of the American Mathematical Society, 99, 726-732 (1987).
22. Meng & Chen, The dynamics of a new SIR epidemic model concerning pulse vaccination strategy. Applied Mathematics and Computation, 197[2], 582-597 (2008).
23. Gakkhar & Negi, Pulse vaccination in SIRS epidemic model with nonmonotonic incidence rate. Chaos, Solitons and Fractals, 35[3], 626-638 (2008).
24. Zhang & Teng, An SIRVS epidemic model with pulse vaccination strategy. Journal of Theoretical Biology, 250[2], 375-381 (2008).
25. Lakmeche & Arino, Nonlinear mathematical model of pulse therapy of heterogeneous tumor. Nonlinear Analysis: Real World Applications, 2[4], 455-465 (2001).
26. Zhang, Jianjun & Lansun, Pest management through continuous and impulsive control strategies. Biosystems, 90[2], 350-361 (2007).
27. Pang & Chen, Dynamic analysis of a pest-epidemic model with impulsive control. Mathematics and Computers in Simulation, In Press (2008).
28. Allegretto & Papini, Analysis of a lagoon ecological model with anoxic crises and impulsive harvesting. Mathematical and Computer Modelling, 47[7-8], 675-686 (2008).
29. Negi & Gakkhar, Dynamics in a Beddington-DeAngelis prey-predator system with impulsive harvesting. Ecological Modelling, 206[3-4], 421-430 (2007).
30. Córdova-Lepe, Pinto & González-Olivares, A new class of Differential Equations with Impulses at Instants-Dependent of Preceding Pulses. Applications to Management of Renewable Resources. Nonlinear Analysis Series B: Real World Applications, Submitted (2008).
31. Kocić & Ladas, Global behavior of nonlinear difference equations of higher order and applications. Kluwer Academic Publishers, Dordrecht (1993).
April 24, 2009
16:22
WSPC - Proceedings Trim Size: 9in x 6in
Colizza˙BIOMAT2008-1.novo2
91
NETWORK STRUCTURE AND EPIDEMIC WAVES IN METAPOPULATION MODELS

V. COLIZZA1,†, F. GARGIULO1, J. J. RAMASCO1, A. BARRAT1,2,3, A. VESPIGNANI1,4

1 Complex Networks and Systems, ISI Foundation, Turin 10133, Italy
† E-mail: [email protected]
2 Laboratoire de Physique Théorique (CNRS UMR 8627), Univ. Paris-Sud, France
3 CPT (CNRS UMR 6207), Luminy Case 907, F-13288 Marseille Cedex 9, France
4 School of Informatics and Center for Biocomplexity, Indiana University, Bloomington, IN 47401, USA

Understanding and being able to predict the geo-temporal pattern of epidemic dynamics would provide key information in the identification and planning of control strategies. Relying on theoretical approaches and numerical simulations, it is possible to study specific aspects of the epidemic evolution and investigate the features of human mobility that are responsible for the observed patterns. We consider a stochastic metapopulation model for the spread of an infectious disease in a spatially structured population. The model assumes strong mixing of individuals within local communities or subpopulations, and weaker interactions between people belonging to distinct subpopulations. The underlying structure of the metapopulation model connecting the subpopulations represents the patterns of human mobility and travel. Here we focus on the impact that spatial, topological and traffic properties of the metapopulation network have on the geo-temporal propagation of the disease in the system. We find that the metapopulation network topology (hop distance) is responsible for driving the invasion dynamics in waves which simultaneously affect shells of subpopulations at the same topological distance from the epidemic seed, regardless of the geographical distance. Heterogeneous travel fluxes on
the other hand affect this picture by weakening the synchrony of the waves. Results show how the epidemic peak times are in the latter case less correlated with the topological distance, and are correlated instead with weighted paths based on the travel probabilities.
Epidemic spreading in metapopulation models of spatially structured populations connected by a mobility network naturally leads to epidemic waves affecting subpopulations at the same topological distance, which experience simultaneous epidemic outbreaks. A variety of factors such as multi-leg travel, heterogeneity of travel fluxes and correlations between fluxes and topology might enhance or impede the synchrony of the waves.
1. Introduction
Understanding the geo-temporal evolution of infectious diseases is crucial in the control of the epidemic impact on the population and the evaluation of intervention strategies. Computational models for the spatial spread of infectious diseases are therefore becoming increasingly important, since they offer an experimental setting in which to test and validate theories, analyze observed epidemic patterns, and devise appropriate control measures (for a recent review, see Ref. 1). Building on detailed data on social interactions and travel flows, simulations can provide useful insights into specific aspects of the spatial disease dynamics and into the role played by heterogeneous human behavior in shaping the observed spreading patterns. Here we focus on the impact of the spatial structure of the environment and of the topology of transportation infrastructures on the geo-temporal evolution of epidemics.
Among the different possible levels of description for the spatial evolution of epidemics, the metapopulation modeling approach provides a useful paradigm to simulate epidemics in a global population localized in discrete and well defined communities (or subpopulations), with strong mixing within each community and weaker interactions between communities2. The latter represent the individual mobility across subpopulations and provide the coupling between the local outbreaks. Several models and results have been obtained within this framework, ranging from local3,4 to global scales1,5–15, including explicit movements of individuals12,16–21 or effective coupling approaches6,13,22–26, addressing case studies16–20,27–30 or focusing on the understanding of the theoretical aspects of the modeling approach.31,32
We consider a stochastic metapopulation model for the spread of a virus on a set of V locations (i.e.
spatially structured communities or subpopulations), connected by means of a network representing the transportation infrastructure enabling the mobility of individuals across subpopulations. The infection dynamics occurs inside each subpopulation with a basic homogeneous mixing approach, whereas the spatial propagation of the virus
takes place along the connections of the transportation network, as carried by infectious travelers. We focus on the analysis of the timing of the epidemic peaks experienced by the local subpopulations and relate it to the topological and geographic structural properties of the environment. We consider different mobility patterns and study how these aspects impact the coupling between distinct subpopulation outbreaks and affect the spreading scenario and the reliability of its prediction. We find that the spreading phenomenon can produce synchronized patterns depending on the transportation network structure.
2. Methods & Models
2.1. Epidemic metapopulation model
We consider a metapopulation model of V subpopulations. The infection dynamics takes place inside each subpopulation, and the subpopulations interact through the movement of individuals along the connections of the transportation infrastructure. The infection dynamics is described by the standard compartmentalization approach, in which individuals are labeled according to their stage with respect to the course of the disease33. The most basic model within this approach is the susceptible-infected-removed (SIR) model33,34, which assumes that the population in a community j of $N_j$ individuals is divided into three distinct compartments: susceptible, infected, and recovered. The numbers of susceptible, infected, and recovered individuals in subpopulation j at time t are denoted respectively $S_j(t)$, $I_j(t)$, $R_j(t)$, with $N_j = S_j(t) + I_j(t) + R_j(t)$. The dynamics inside the subpopulations accounts for individuals coming into contact and transmitting the disease, thus leading to local outbreaks, which are coupled overall as the result of people moving from one subpopulation to another. Under the assumption of local homogeneous mixing, each susceptible individual in subpopulation j becomes infected at time t with a probability rate $\beta I_j(t)/N_j$, where the parameter $\beta$ represents the transmission rate. Infected individuals recover (or are removed) at a rate $\mu$, where $\mu$ is the inverse of the average duration of the infectious period. The disease evolution in each subpopulation is described in terms of binomial processes, to take the discreteness of individuals explicitly into account. The number of new infectious individuals in subpopulation j generated at each time step $\Delta t$ is extracted from a binomial distribution with probability $\beta I_j(t) \Delta t / N_j$ and number of trials $S_j(t)$. In the recovery transition, the number of individuals entering the recovered compartment is extracted from a binomial distribution with probability $\mu \Delta t$ and number
of trials $I_j(t)$. For more details on the implementation, we refer to refs. [29,35]. The spatial propagation of the virus occurs as a result of the mobility of infectious individuals. We consider a network of interactions among the $V$ subpopulations that generically represents different possible means of transportation, such as air connections, railways, highways, and others. The model is schematically represented in Figure 1.
Fig. 1. Epidemic metapopulation model: Sketch of a metapopulation epidemic model. The nodes of the network are the different subpopulations. Each community is divided into three compartments: Susceptible, Infected and Recovered. The links represent all the possible kinds of transport connections between two interacting subpopulations.
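The binomial transitions of Section 2.1 can be illustrated with a minimal sketch of one local SIR time step. This is not the authors' implementation (for that, see the references cited in the text); the function names, the parameter values β = 0.5 and µ = 0.2, and the naive Bernoulli-sum sampler are all assumptions chosen for readability.

```python
import random

def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (adequate for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def sir_step(S, I, R, beta, mu, dt, rng):
    """Advance one subpopulation by one time step dt and return the new (S, I, R)."""
    N = S + I + R
    # Per-susceptible infection probability beta * I * dt / N, capped at 1.
    p_inf = min(1.0, beta * I * dt / N) if N > 0 else 0.0
    # Per-infected recovery probability mu * dt, capped at 1.
    p_rec = min(1.0, mu * dt)
    new_inf = binomial(rng, S, p_inf)   # number of trials: S
    new_rec = binomial(rng, I, p_rec)   # number of trials: I
    return S - new_inf, I + new_inf - new_rec, R + new_rec

# Illustrative run: hypothetical parameter values, not taken from the paper.
rng = random.Random(0)
state = (990, 10, 0)
for _ in range(50):
    state = sir_step(*state, beta=0.5, mu=0.2, dt=1.0, rng=rng)
```

Because both draws use the compartment sizes at the start of the step, the total population of the subpopulation is conserved exactly at every step.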
The travel flux along a link from origin $i$ to destination $j$ is quantified by the weight $w_{ij}$, defined as the average number of individuals traveling from $i$ to $j$ per unit time [36]. The probability that an individual in subpopulation $i$ travels to $j$ during the time interval $\Delta t$ is then given by $p_{ij} = w_{ij}\Delta t/N_i$. Under the Markovian hypothesis that the individuals' travel behavior has no memory effects, the numbers of infectious individuals traveling on each connection out of a subpopulation $i$ form a set of stochastic variables that follows a multinomial distribution. A detailed mathematical description of the travel coupling is provided in refs. [27,28,35].
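As a rough illustration of the multinomial travel coupling, the sketch below draws the number of infectious travelers on each outgoing connection of one subpopulation. The function name, the dictionary layout and the example numbers are assumptions; each individual independently leaves towards $j$ with probability $p_{ij} = w_{ij}\Delta t/N_i$ or stays, which is one standard way to realize a multinomial draw.

```python
import random

def sample_travelers(I_i, w_i, N_i, dt, rng):
    """Return {destination: number of infectious travelers} for one subpopulation.

    I_i : number of infectious individuals in the origin
    w_i : dict mapping destination j to the weight w_ij
    N_i : population of the origin
    """
    dests = list(w_i)
    probs = [w_i[j] * dt / N_i for j in dests]
    if sum(probs) > 1.0:
        raise ValueError("time step too large: travel probabilities exceed 1")
    out = {j: 0 for j in dests}
    for _ in range(I_i):
        u, acc = rng.random(), 0.0
        for j, p in zip(dests, probs):
            acc += p
            if u < acc:          # individual travels to j
                out[j] += 1
                break            # otherwise the individual stays in i
    return out

# Hypothetical example: 200 infectious individuals, two outgoing links.
travelers = sample_travelers(200, {"a": 10.0, "b": 30.0}, 1000, 1.0, random.Random(0))
```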
2.2. Transportation network models
Mobility patterns represent the interaction substrate of the metapopulation structure along which the epidemic diffuses. Various studies have shown that they often display highly heterogeneous properties, in both their topological structure and their traffic capacity. Examples range from large-scale transportation infrastructures to local commuting patterns [36-40]. Detailed data on the movement of people and goods can be described in terms of networks in which nodes represent the locations of origin and destination (e.g. airports in the air transportation network, cities in commuting data, city locations in everyday human mobility) and links represent the connections between them, each characterized by a weight quantifying the traffic capacity of the link. The probability distributions of the number of links per node (i.e. the degree) and of the travel fluxes are found to be broad, pointing to the presence of several levels of strong heterogeneity typical of complex systems [36,41-43]. In addition, travel fluxes are often correlated with the topology of the network. In the world-wide air transportation network [36], for example, the traffic between two airports $i$ and $j$ is statistically described by the relation $\langle w_{ij} \rangle = w_0 (k_i k_j)^{\delta}$, with $\delta \approx 0.5$ and $k_i$ denoting the degree of node $i$ [36]. Together with topological and traffic properties, another important aspect of transportation infrastructures is their embedding in geographical space, which imposes constraints on the nature and features of the links [44-46]. Longer connections, for example, are usually associated with higher costs than short ones and connect distant regions, whereas short links provide connectivity at the local level. This results in non-trivial correlations between topology, traffic and geography for these networks [45,47].
In order to take into account the above ingredients of the metapopulation network and to understand how they impact the timing of the local epidemic peaks, we consider the spatial network model introduced in ref. [44]. Nodes are placed at random on a disk of fixed radius $L$ (without loss of generality we use $L = 1$). Each new node $n$ introduced in the network connects to $m$ already existing nodes; the probability of establishing a link to an already present node $i$ is $P_{n \to i} \propto k_i\, e^{-d_{ni}/r_c}$, where $k_i$ is the degree of node $i$, $d_{ni}$ is the geographical distance between $n$ and $i$, and $r_c$ is a parameter of the model. This procedure incorporates both a rich-get-richer mechanism, since a node with larger degree receives new links with higher probability, which reproduces heterogeneous degree distributions [48], and the effect of geographical constraints, which favor short connections. According to the value of the dimensionless parameter $\eta = r_c/L$, two different regimes can
be distinguished: for $\eta \gg 1$, the network growth process is driven by the degree of the nodes, whereas the geographical constraints are not relevant, thus leading to a scale-free network with power-law degree distribution $P(k) \propto k^{-3}$ [48]; for $\eta \ll 1$, in contrast, the connection probability strongly depends on the geographical position of the nodes, so that nodes tend to connect to closer neighbors and hence the average geographical distance of the connections decreases. The crossover between these two regimes can be observed in Figure 2, where the distribution of the geographical distance between directly connected nodes and the degree distribution are displayed for different values of $\eta$.
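The growth rule of the spatial network model can be sketched as follows. This is a small-scale illustration only: the initial fully connected core of $m + 1$ nodes, the sampling of targets without replacement, and all names are assumptions not specified in the text. Log-weights are used so that $e^{-d_{ni}/r_c}$ does not underflow when $r_c$ is very small.

```python
import math
import random

def grow_spatial_network(V, m, r_c, rng, L=1.0):
    """Grow a network with attachment probability proportional to k_i * exp(-d_ni / r_c).

    Returns (pos, adj): node coordinates and a list of adjacency sets.
    """
    # Place nodes uniformly at random on a disk of radius L.
    pos = []
    for _ in range(V):
        r, a = L * math.sqrt(rng.random()), 2.0 * math.pi * rng.random()
        pos.append((r * math.cos(a), r * math.sin(a)))
    adj = [set() for _ in range(V)]
    # Assumed seed: a fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    for n in range(m + 1, V):
        # log(k_i) - d_ni / r_c for every node already present.
        logw = {i: math.log(len(adj[i])) - math.dist(pos[n], pos[i]) / r_c
                for i in range(n)}
        for _ in range(m):  # draw m distinct targets
            top = max(logw.values())
            nodes = list(logw)
            weights = [math.exp(logw[i] - top) for i in nodes]
            target = rng.choices(nodes, weights=weights)[0]
            del logw[target]
            adj[n].add(target)
            adj[target].add(n)
    return pos, adj

def mean_link_length(pos, adj):
    """Average geographical length of the links."""
    lengths = [math.dist(pos[i], pos[j])
               for i in range(len(adj)) for j in adj[i] if i < j]
    return sum(lengths) / len(lengths)

rng = random.Random(1)
pos_small, adj_small = grow_spatial_network(300, 2, 0.01, rng)   # eta << 1
pos_large, adj_large = grow_spatial_network(300, 2, 10.0, rng)   # eta >> 1
```

Comparing `mean_link_length` for the two networks gives a quick check of the two regimes: links are much shorter on average when $\eta \ll 1$.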
Fig. 2. Network topology: Left plot: Distribution of the distances between connected nodes for different values of $\eta$. Right plot: Degree distribution $P(k)$ for different values of $\eta$. The data correspond to networks of $V = 10^4$ subpopulations.
Finally, to incorporate the traffic aspect in the spatial metapopulation network with heterogeneous topology defined above, we consider and compare the results obtained with different models. First, we define a homogeneous model, HOMW, which assumes that all link weights are equal to a fixed value, i.e. $w_{ij} = \langle w \rangle$ for all $i, j$. We compare it to a weighted model with a heterogeneous distribution of weights, obtained from the statistical law observed in real-world transportation networks [36], i.e. $w_{ij} = w_0 (k_i k_j)^{1/2}$, thus imposing a correlation between the travel flux on a connection and the degrees of the origin and destination. We denote this network HETW-C. Finally, we introduce a weighted network called HETW-U, where the distribution of weights is heterogeneous and obtained as before, but the weights are reshuffled in order to destroy the correlations between topology and traffic. For both the HETW-C and HETW-U models, the parameter $w_0$ is chosen so as to obtain the same average weight as in the model with homogeneous weights.
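The three weight assignments can be summarized in a short sketch. The function name, the edge-list representation and the toy graph are assumptions; the only ingredients taken from the text are the law $w_{ij} = w_0 (k_i k_j)^{1/2}$, the reshuffling of weights for HETW-U, and the choice of $w_0$ that matches the average weight of the homogeneous model.

```python
import math
import random

def assign_weights(edges, degree, w_mean, model, rng=None):
    """Return {edge: weight} for the HOMW, HETW-C or HETW-U model."""
    if model == "HOMW":
        return {e: w_mean for e in edges}
    if model not in ("HETW-C", "HETW-U"):
        raise ValueError("unknown model: " + model)
    raw = [math.sqrt(degree[i] * degree[j]) for i, j in edges]
    # Choose w0 so that the average weight equals w_mean.
    w0 = w_mean * len(edges) / sum(raw)
    weights = [w0 * x for x in raw]
    if model == "HETW-U":
        # Same weight values, randomly reassigned to links.
        (rng or random).shuffle(weights)
    return dict(zip(edges, weights))

# Toy graph (hypothetical): node 0 is a small hub.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
degree = {0: 3, 1: 2, 2: 2, 3: 1}
w_hom = assign_weights(edges, degree, 10.0, "HOMW")
w_cor = assign_weights(edges, degree, 10.0, "HETW-C")
w_unc = assign_weights(edges, degree, 10.0, "HETW-U", random.Random(3))
```

By construction, HETW-C and HETW-U share exactly the same multiset of weights, so any difference in the spreading can be attributed to the weight-degree correlation.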
2.3. Local outbreaks and spreading pattern predictability
We describe the spread of an infectious disease in a spatially structured environment, starting from one initially infected location and following the geo-temporal evolution in the heterogeneous metapopulation network. Each location in the system may eventually experience an outbreak that can be characterized by the local prevalence $\rho_i(t) = I_i(t)/N_i(t)$, defined as the local density of infectious individuals $I_i(t)$ in subpopulation $i$ at time $t$. We denote by $t_{peak}$ the time at which the prevalence in a subpopulation reaches its maximum value, corresponding to the maximum number of simultaneously infected individuals. Depending on the invasion dynamics, subpopulations will reach $t_{peak}$ at different times during the overall duration of the epidemic. In addition, given the stochasticity of both the travel events and the infection dynamics, different realizations starting from the same initial conditions might lead to different behaviors in the local peak timing. Therefore we consider a measure of the similarity between stochastic realizations [27,29] and relate it to the observed patterns of the local peak times. Following refs. [27,29], we characterize each outbreak starting from given initial conditions with the vector $\bar{\pi}(t)$, whose components $\pi_i(t) = I_i(t)/\sum_l I_l(t)$ represent the probability that a given infected individual is in subpopulation $i$ at time $t$. The statistical similarity between two vectors $\bar{\pi}^{I}(t)$ and $\bar{\pi}^{II}(t)$ characterizing two outbreaks I and II, respectively, can be measured by the Hellinger affinity $\mathrm{sim}(\bar{\pi}^{I}, \bar{\pi}^{II}) = \sum_i \sqrt{\pi_i^{I} \pi_i^{II}}$. Since $\bar{\pi}^{I}(t)$ and $\bar{\pi}^{II}(t)$ are invariant under a global scaling factor for $I^{I}(t)$ and $I^{II}(t)$, we also need to consider the total epidemic prevalence, $i(t) = \sum_i I_i(t)/N_{TOT}$, with $N_{TOT} = \sum_i N_i$ the total population of the metapopulation system, and define the vector $\bar{i}(t) = (i(t), 1 - i(t))$.
The overlap function is then defined as:
$$\Theta(t) = \mathrm{sim}\left(\bar{\pi}^{I}(t), \bar{\pi}^{II}(t)\right) \cdot \mathrm{sim}\left(\bar{i}^{I}(t), \bar{i}^{II}(t)\right).$$
The overlap is an estimator of the predictability of the outbreak. Its values lie in the interval [0, 1]: it equals 1 when the two realizations are exactly identical, and hence predictable, and 0 when the sets of infected subpopulations in realizations I and II are disjoint.
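The definition of the overlap translates directly into a few lines of code. The sketch below is an illustration under assumptions: the function names are hypothetical, and the convention of returning 0 when one of the two realizations has no infected individuals is a choice made here, not something stated in the text.

```python
import math

def hellinger_affinity(p, q):
    """Hellinger affinity sum_i sqrt(p_i * q_i) between two normalized vectors."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def overlap(I_a, I_b, N_total):
    """Overlap between two realizations at a fixed time.

    I_a, I_b : infectious counts per subpopulation in realizations I and II
    N_total  : total population of the metapopulation system
    """
    tot_a, tot_b = sum(I_a), sum(I_b)
    if tot_a == 0 or tot_b == 0:
        return 0.0  # assumed convention when one epidemic has died out
    pi_a = [x / tot_a for x in I_a]
    pi_b = [x / tot_b for x in I_b]
    i_a, i_b = tot_a / N_total, tot_b / N_total
    return (hellinger_affinity(pi_a, pi_b)
            * hellinger_affinity((i_a, 1.0 - i_a), (i_b, 1.0 - i_b)))

same = overlap([5, 0, 3], [5, 0, 3], 100)       # identical realizations
disjoint = overlap([5, 0, 0], [0, 4, 0], 100)   # no common infected subpopulation
partial = overlap([5, 1, 0], [2, 4, 0], 100)    # partially similar realizations
```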
3. Results
3.1. Emergence of epidemic waves
Let us consider the spread of an infectious disease described by an SIR model on a heterogeneous metapopulation network. Initially, we assume homogeneous weights on the connections, i.e. the HOMW model, with a number of passengers per connection equal to $w_{ij} = \langle w \rangle = 10$. Our system includes $V = 10^4$ subpopulations and, for the sake of simplicity, we consider uniform populations of $N_i = \langle N \rangle = 10^5$ individuals each. All the simulations start with a single infected individual located in a given subpopulation. We distinguish simulations in which the initial outbreak occurs in a hub from those in which the seed is in a poorly connected node. All the results are averaged over 100 stochastic realizations of the spreading.
Fig. 3. Overlap function and geography: Overlap function and infectious prevalence of a sample of locations for four networks characterized by different values of $\eta$. The left column refers to an outbreak starting from a poorly connected node, the right column to an outbreak starting from a hub. The maximum degree of the four networks is, respectively, $k_{max}$ = 289, 205, 64 and 37.
Figure 3 displays the evolution of the overlap function for different values of the parameter η defining the network structure, together with the prevalence curves for a sample of subpopulations in a single realization. Results obtained starting from both a low degree node (left column) and a hub (right column) are reported. The overlap starts from 1 (since the initial conditions are the same for all realizations) and then decreases due to
the stochastic fluctuations that make realizations different from each other. At late times, close to the end of the spreading, the overlap vanishes as the overall epidemic dies out. A common feature of the overlap profile in the various cases is its marked oscillatory behavior. As shown in Figure 3, the oscillations are synchronized with groups of subpopulations reaching their epidemic peak almost simultaneously. In order to identify these groups of subpopulations which, with a high degree of predictability, experience the outbreak at the same time, let us follow the spatial spread of the virus on the network and define shells of subpopulations according to their topological distance from the seed. The initially seeded subpopulation is defined as shell zero, all subpopulations one link away from the seed define shell one, and so on. The seeded subpopulation reaches its epidemic peak while the other nodes still have a very low prevalence. The overlap then starts to decrease, with a significantly faster and stronger decrease if the initial seed is a hub (see the right column of Figure 3): as already noted in refs. [27,28], the large choice of possible destinations for the infected individuals of the seed leads in this case to a larger difference between realizations, and to a smaller predictability. Next, the subpopulations of the first shell, directly connected to the seed and infected by travelers from the seed, experience a maximum in the number of infections almost simultaneously. This produces an increase in the overlap function since, as the number of infected individuals increases, the corresponding relative stochastic fluctuations become smaller: the evolution is less prone to stochastic fluctuations and becomes more predictable. The disease continues propagating into the neighbors of the first shell (i.e. the subpopulations of the second shell), which also display a synchronized outbreak.
The process thus defines a wave dynamics of the epidemic that moves in a synchronized way from one topological shell to the next in the subpopulation network. The process goes on until the disease has affected the whole system, with a progressive reduction of the synchronization as farther shells get infected. In addition, this effect is modified by changes in the interplay between topology and space, as measured by the parameter $\eta$. As $\eta$ decreases, the overall duration of the spread becomes longer and the number of oscillations in the overlap profile increases. This is due to an increase in the topological diameter of the network, determined by the reduction of the number of geographically long-range connections for lower values of $\eta$ (see Figure 2). In this case, links tend to connect subpopulations relatively close to each other in Euclidean space, so that the number of hops needed to reach the outskirts of the network from the seed becomes larger, as shown
in Figure 4. A similar effect is obtained by fixing the topology, i.e. fixing the value of η, and observing the differences between an epidemic starting from a hub (right column of Figure 3) or from a low degree node (left column). Also in this case, indeed, a larger number of shells is required to cover the network if starting from a poorly connected node (see Figure 4), and therefore a larger number of oscillations is observed in the overlap.
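The shells used throughout this section are the sets of nodes at equal topological distance from the seed, which a breadth-first search provides directly. The sketch below is illustrative; the adjacency-dictionary format and the toy graph are assumptions.

```python
from collections import deque

def shells(adj, seed):
    """Group nodes by topological (hop) distance from the seed via breadth-first search.

    adj : dict mapping each node to an iterable of its neighbors
    Returns {distance: [nodes at that distance]}, with shell 0 containing the seed.
    """
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    out = {}
    for node, d in dist.items():
        out.setdefault(d, []).append(node)
    return out

# Toy network (hypothetical): node 3 is the best connected, node 4 is peripheral.
toy = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
from_node_0 = shells(toy, 0)
from_node_4 = shells(toy, 4)
```

The number of keys in the returned dictionary is the number of shells needed to cover the network from a given seed, the quantity compared in Figure 4 for hubs and poorly connected seeds.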
Fig. 4. Number of communities on each shell: Distribution of the number of locations for each topological distance from the seed in the case of η = 0.001 and η = 0.1. For the upper plot the seed is located in a low degree node while in the lower plot the seed is in a hub.
Typically, the synchronization effect is well defined in the first shells, leading to clear oscillations of Θ(t). These shells are close to the seed and contain a fairly small number of nodes (see Figure 4), and therefore show a small dispersion in the peak times. The number of subpopulations located on each shell increases with the topological distance (for instance, for η = 0.001, almost 90% of the cities are located between shell 6 and shell 9). The fact that on such shells the number of cities is
so large, together with the stochastic nature of the spreading process, increases the dispersion of the peak times and, consequently, smooths out the separation of peak times between consecutive shells. This explains why the oscillatory behavior vanishes at longer times.
Fig. 5. Geographic and Reshuffled network: Overlap function and infectious prevalence of a sample of locations for a disease spreading on a spatial network and on a randomly reshuffled network.
In order to better understand how the interplay between topology and space affects the wave dynamics, we consider a topologically uncorrelated network obtained by reshuffling the links of the original network while keeping the degree sequence constant (to this aim, we repeatedly select pairs of links and exchange their endpoints [49]). In Figure 5, we compare the results of the spreading dynamics on the original network with a small value of η (η = 0.001) and on the reshuffled one. The same initial conditions are used on both networks, i.e. the same subpopulation with the same degree is seeded in both cases, but the first shell of subpopulations includes different locations in the two cases; the same is true for all the following shells. The spread is faster on the reshuffled network, since the randomization procedure reduces the diameter of the network by increasing the number of long-range connections. In this respect, the spreading dynamics on reshuffled graphs presents strong similarities to the spreading on high-η networks. Moreover, the overlap reaches lower values and displays fewer oscillations with larger amplitude. This is due to the fact that the reshuffled network is organized in a smaller number of shells around the infection seed, and the first shells are formed by more nodes than in the original network. The oscillations in the total number of infected therefore become larger, leading to oscillations of larger amplitude in the overlap (since Θ(t) is larger for larger numbers of infected, which correspond to smaller relative fluctuations [27,28]).
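The degree-preserving reshuffling described above (select pairs of links and exchange their endpoints) can be sketched as follows. Details not given in the text are assumptions here: the rejection of swaps that would create self-loops or duplicate links, the cap on the number of attempts, and all names.

```python
import random

def reshuffle_links(edges, n_swaps, rng):
    """Randomize an undirected simple graph while keeping every node's degree.

    Repeatedly picks two links (a, b) and (c, d) and rewires them to (a, d) and (c, b).
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # the swap would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # the swap would create a duplicate link
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def degrees(edges):
    """Degree of each node computed from an edge list."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

# Toy example (hypothetical): a ring of 20 nodes, every node has degree 2.
ring = [(i, (i + 1) % 20) for i in range(20)]
shuffled = reshuffle_links(ring, 50, random.Random(7))
```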
Fig. 6. Geographic and Reshuffled networks (time of the peaks): Upper plots: Peak time tpeak versus topological shortest path length for a geographical and a randomly reshuffled network. The figure shows the average value of tpeak over all the locations lying at a certain topological distance and over all the stochastic realizations. Error bars represent the 90% confidence intervals. Inset: time of the peak versus geographical distance. Lower plots: tpeak distribution for each topological distance.
Figure 6 shows the relation between the topological distance from the seed, $l$, and the time of the prevalence peak of the subpopulations in shell $l$. The insets show that the peak time is correlated with the geographical distance from the seed only when spatial constraints are very strong (small η). This effect is strongly marked only at the beginning of the spread. Even the network with η = 0.001 has small-world properties, so that the largest possible geographical distances are reached through a few topological steps, i.e. through a few shells from the seed. Figure 6 also shows the distribution of peak times in each shell $l$. The synchronization effect is visible in the clear separation of the distributions for small $l$, whereas shells farther from the seed display broader distributions which tend to overlap significantly.
3.2. Effect of two legs travel
We now explore the impact of introducing multiple-leg trips on the synchronization of the local epidemic outbreaks in a metapopulation model. The results presented so far are based on the assumption that travelers can only make trips of length 1 in terms of number of connections, with the links of the network therefore representing the actual origin-destination matrix. In many cases, however, people cover longer topological distances in a single trip. In the context of airline connections, for example, this corresponds to multiple-leg flights from one airport to another. Following refs. [27,28], we explicitly include this additional travel behavior in the computational model. The relative importance of multi-leg travel is controlled in the model by the percentage α of travelers who, on each connection $i - j$, connect in location $j$ to proceed to their final destination $l$, a neighbor of $j$. Here we consider two extreme situations: one with a low α (α = 0.05), and another with a very high α (α = 0.2). Statistics of major airports available on the web estimate the value of α to be a few percentage points. Figure 7 shows how taking into account two-leg travel changes the dynamical scenario of the spreading. As the probability α increases, the total duration of the spread is shortened and the oscillatory behavior of Θ(t) becomes weaker, due to the larger probability that infectious travelers cover longer topological distances in a single trip. This results in a mixing of simultaneous outbreaks experienced by subpopulations belonging to different shells, and therefore in a weakening of the wave synchronization phenomenon. In order to quantify this effect, we measure the normalized intersection area $\chi_{l,l+1}$ of the peak time distributions, $P_l(t_{peak})$ and $P_{l+1}(t_{peak})$, for two consecutive shells, $l$ and $l + 1$:
$$\chi_{l,l+1} = \frac{1}{n_l + n_{l+1}} \sum_{t=0}^{t_{max}} \Big( P_l(t)\, n_l\, \theta(P_{l+1}(t)) + P_{l+1}(t)\, n_{l+1}\, \theta(P_l(t)) \Big),$$
where θ(x) is a step function, which is zero if the argument x = 0, and one otherwise. nl is the total number of communities in shell l, and the sum runs over all the times from the beginning to the end of the spreading. If the distributions of two consecutive shells share an extensive common domain, the values of tpeak for the subpopulations of the two shells will be difficult to distinguish and the synchronization will thus be less pronounced. On the other hand, two consecutive shells with null intersection in their peak time distributions would lead to χl,l+1 = 0 and clearly separated oscillations
Fig. 7. Multiple connections travels (overlap function): Left plots: Overlap function after the introduction of connecting flights with different values of the parameter α. The figure also displays the prevalence profiles for a sample of locations. Right plots: tpeak distribution for each topological distance.
in Θ(t) for the corresponding values of $t_{peak}$. Figure 8 displays $\chi_{l,l+1}$ as a function of $l$ for systems with different values of α. The intersection displays an increase that corresponds to the progressive broadening of the peak time distributions observed in Figures 6 and 7, and shows the de-synchronizing effect of multiple-leg travel captured by $\chi_{l,l+1}$.
3.3. Impact of heterogeneous travel flows
If we introduce heterogeneous weights on the links connecting the subpopulations, in order to model realistic transportation infrastructures with heterogeneous travel fluxes, the picture changes and the topology-space interplay is no longer the dominant aspect of the epidemic spreading. We consider the heterogeneous weighted networks, both correlated and uncorrelated with topology (HETW-C and HETW-U, respectively), and in Figure 9 we compare the results with those obtained with HOMW as the metapopulation substrate.
Fig. 8. Multiple connections travels (normalized intersection area): Normalized intersection area ($\chi_{l,l+1}$) between the $t_{peak}$ distributions of two consecutive shells, for different values of α.
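The normalized intersection area of Section 3.2 can be computed from the lists of peak times of two consecutive shells. The sketch below rests on two assumptions: peak times are recorded on a discrete grid of time steps, and $P_l(t)\, n_l$ is read as the number of subpopulations of shell $l$ peaking at time $t$. Under that reading, the quantity is the fraction of subpopulations in the two shells whose peak falls at a time where the other shell also has peaks.

```python
from collections import Counter

def intersection_area(peaks_l, peaks_next):
    """Normalized intersection area between the peak-time distributions of two shells.

    peaks_l, peaks_next : lists of (discrete) peak times, one entry per subpopulation
    """
    count_l, count_next = Counter(peaks_l), Counter(peaks_next)
    # Times at which both distributions are nonzero (both step functions equal 1).
    shared_times = set(count_l) & set(count_next)
    shared = sum(count_l[t] + count_next[t] for t in shared_times)
    return shared / (len(peaks_l) + len(peaks_next))

# Hypothetical peak times (in time steps) for two consecutive shells.
chi_partial = intersection_area([10, 11, 11, 12], [12, 13, 14, 14])
chi_disjoint = intersection_area([10, 11], [20, 21])
chi_full = intersection_area([5, 6, 6], [5, 6])
```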
While the overall shape of the overlap is preserved, with similar values for the predictability, the heterogeneous fluxes of passengers alter the probabilities of leaving a subpopulation towards neighboring destinations, which no longer assume equal values. As a result, neighboring subpopulations are infected and reach the peak of their epidemic at different times, which differentiates the subpopulations within each shell and leads to a faster and stronger de-synchronization of the waves, as revealed by the smaller number of oscillations of Θ(t). The effect is stronger for the weight-degree correlated network. The probability distributions of the weights are exactly the same for HETW-C and HETW-U networks, so the correlations are solely responsible for this difference. For HETW-C networks, the variability in the overall traffic out of a node is larger than for HETW-U. In other words, the traffic differences between hubs and poorly connected subpopulations are stronger for HETW-C than for HETW-U networks. Indeed, shells are formed by communities with very diverse degrees. The difference in traffic levels that the correlations between traffic and topology introduce in HETW-C alters the spreading probability of the disease, also affecting the synchronization within a shell. The normalized intersection area $\chi_{l,l+1}$ shown in Figure 10 confirms
Fig. 9. Heterogeneous weight models (overlap function): Left plots: Overlap function for different weight models and prevalence profile for a sample of locations. Right plots: tpeak distribution for each topological distance.
this picture in a more quantitative way: the superposition of the peak time distributions is larger in the case of heterogeneous weights, particularly in the weight-degree correlated case. For heterogeneous travel fluxes, the topological distance from the seed is therefore not a good indicator of the peak time for a subpopulation. The timing of the spatial invasion is indeed driven by the probability of traveling on a given path, connecting $i$ to $j$ through a series of intermediate nodes $k_0, k_1, \ldots$, which is given by the product of the probabilities of traveling on each link of the given path. Many paths are usually available between two nodes, and we define the connection probability between $i$ and $j$ as the maximum probability along all the possible paths:
$$cp_{ij} = \max_{\mathrm{paths}} \frac{w_{i k_0}}{N_i}\, \frac{w_{k_0 k_1}}{N_{k_0}}\, \frac{w_{k_1 k_2}}{N_{k_1}} \cdots \frac{w_{k_{n-1} j}}{N_{k_{n-1}}} \qquad (1)$$
where $n$ is the length of the path. It has already been shown [50,51] that the arrival time of a disease in each subpopulation is indeed linked to this
Fig. 10. Heterogeneous weight models (intersection area): Normalized intersection area between two consecutive tpeak distributions (χl,l+1 ) for different weight distributions.
quantity. Figure 11 shows how the average peak time for each location is proportional to the logarithm of the connection probability. This confirms that the invasion dynamics of the epidemic outbreaks, in the case of heterogeneous metapopulation networks, do not select the topological shortest path but rather the path that is more convenient from the point of view of the connection probability.
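Since the connection probability is a maximum of products of per-link travel probabilities, it can be obtained with a shortest-path search on the costs $-\log(w_{ab}/N_a)$. The sketch below uses Dijkstra's algorithm for this purpose; this is a technique chosen here for illustration, not necessarily the procedure used by the authors, and the function name, data layout and toy numbers are assumptions. Each link is normalized by the population of its origin node, in line with the travel probability $w_{ij}\Delta t/N_i$ of Section 2.1.

```python
import heapq
import math

def connection_probability(weights, population, source):
    """Maximum path probability from source to every reachable node.

    weights    : dict node -> dict neighbor -> w (travelers per unit time)
    population : dict node -> N
    Uses Dijkstra on the non-negative costs -log(w / N).
    """
    best = {source: 0.0}            # smallest accumulated -log probability
    heap = [(0.0, source)]
    settled = set()
    while heap:
        cost, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        for v, w in weights[u].items():
            p = w / population[u]
            if not 0.0 < p <= 1.0:
                continue            # skip empty or ill-defined links
            new_cost = cost - math.log(p)
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                heapq.heappush(heap, (new_cost, v))
    return {v: math.exp(-c) for v, c in best.items()}

# Toy example (hypothetical): the direct link A-C carries little traffic.
pop = {"A": 1000, "B": 1000, "C": 1000}
w = {"A": {"B": 100, "C": 5}, "B": {"A": 100, "C": 100}, "C": {"A": 5, "B": 100}}
cp = connection_probability(w, pop, "A")
```

In this toy case the most probable route from A to C goes through B (probability 0.1 × 0.1 = 0.01) rather than along the topologically shorter direct link (probability 0.005), which is the kind of situation the text describes for heterogeneous weights.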
3.4. Increasing the individual mobility
We finally examine the role that a global variation of the transport flux magnitude can have on the synchronization. We consider three different values of the average weight, namely $\langle w \rangle = 10$, $\langle w \rangle = 50$ and $\langle w \rangle = 100$, for both the homogeneous and the heterogeneous correlated cases, i.e. HOMW and HETW-C. An increase of the fluxes at fixed subpopulation sizes leads to larger travel probabilities for the infectious individuals, and the overall propagation speed also increases, as shown in Figure 12. As a consequence, the distributions of peak times for the successive subpopulation shells tend to overlap, and the synchronization within the
Fig. 11. Heterogeneous weight models (connection probability): Logarithm of the connection probability as a function of tpeak .
shells becomes less marked for increasing values of $\langle w \rangle$, with a disappearance of the oscillations of the overlap and larger values of $\chi_{l,l+1}$, as shown in Figure 13.
4. Discussion
In this paper, we have performed a numerical analysis of the propagation patterns in metapopulation epidemic models, focusing on the peak times of the infection in the different subpopulations. The coupling between different subpopulations is described by a complex network whose characteristics can be tuned. Such a coupling leads to a high level of synchronization in the prevalence profiles of groups of different locations. In other words, we observe epidemic waves across subpopulations even if they are geographically very far apart. This synchronization pattern is understood by considering that the network is organized in successive topological shells, each one corresponding to a certain link distance from the seed of the disease. The spreading process therefore proceeds through shortest path trajectories and
Fig. 12. Different values of < w >: tpeak as a function of the topological distance for 3 different values of the weights, in the homogeneous weight case. The dashed lines are linear fits.
exhibits a noticeably synchronous behavior of the subpopulations located at the same topological distance. Multi-leg displacements allow direct traveling to farther topological distances and therefore smooth out the wave phenomenon. Moreover, as weights become more heterogeneous, locations at the same topological distance from the seed can be reached (and therefore experience their epidemic peak) at different times. Finally, the synchronization depends strongly on the average passenger flux on the network. As the average number of travelers increases while the population size is kept fixed, the velocity of the spread increases, reducing the shell synchronization and thus the epidemic waves.
Acknowledgments
V.C. is partially funded by the EC EpiFor contract n. ERC 2007 Stg 204863. A.V. is partially funded by the NIH-NIDA-21DA024259-01 award.
References
1. Riley S (2007) Large-Scale Spatial-Transmission Models of Infectious Disease. Science 316: 1298.
2. Colizza V, Vespignani A (2007) Epidemic modeling in metapopulation systems with heterogeneous coupling pattern: theory and simulations. eprint arXiv: 0706.3647.
Fig. 13. Different values of < w > (overlap function & intersection area): Left column: Overlap function for different values of < w > in the homogeneous case. Right column: Overlap function for different values of < w > in the heterogeneous degree-correlated case (HETW-C). In both columns the last plot gives the normalized intersection area vs the topological distance for the three cases (blue symbols for < w >= 10, green symbols for < w >= 50, red symbols for < w >= 100).
3. Levins R (1967) Extinction. Lectures Math Life Sci 2, math Questions Biology, Proc second Sympos math Biology, New York: 2.
4. Levins R (1969) Some demographic consequences of environmental heterogeneity for biological control. Bulletin of the Entomological Society of America 15: 237-240.
5. Anderson R, May RM (2003) Spatial, Temporal, and Genetic Heterogeneity in Host Populations And the Design of Immunization Programmes. Mathematical Medicine and Biology 1: 233-266.
6. Bolker B, Grenfell B (1995) Space, Persistence and Dynamics of Measles Epidemics. Philosophical Transactions: Biological Sciences 348: 309-320.
7. Bolker BM, Grenfell BT (1993) Chaos and Biological Complexity in Measles Dynamics. Proceedings: Biological Sciences 251: 75-81.
8. Ferguson NM, Keeling MJ, Edmunds WJ, Gani R, Grenfell BT, et al. (2003) Planning for smallpox outbreaks. Nature 425: 681-685.
9. Grenfell B, Harwood J (1997) (Meta) population dynamics of infectious diseases. Trends in Ecology & Evolution 12: 395-399.
10. Grenfell BT, Bolker BM (1998) Cities and villages: infection hierarchies in a measles metapopulation. Ecology Letters 1: 63-70.
11. Hethcote HW (1978) An immunization model for a heterogeneous population. Theor Popul Biol 14: 338-349.
12. Keeling MJ, Rohani P (2002) Estimating spatial coupling in epidemiological systems: a mechanistic approach. Ecology Letters 5: 20-29.
13. Lloyd AL, May RM (1996) Spatial Heterogeneity in Epidemic Models. Journal of Theoretical Biology 179: 1-11.
14. May RM, Anderson RM (1979) Population biology of infectious diseases: Part II. Nature 280: 455-461.
15. May RM, Anderson RM (1984) Spatial heterogeneity and the design of immunization programs. Mathematical Biosciences 72: 83-111.
16. Baroyan OV, Genchikov LA, Rvachev LA, Shashkov VA (1969) An attempt at large-scale influenza epidemic modelling by means of a computer. Bull Int Epidemiol Assoc 18: 22-31.
17. Flahault A, Valleron A-J (1991) A method for assessing the global spread of HIV-1 infection based on air-travel. Math Pop Studies 3: 1-11.
18. Grais RF, Hugh Ellis J, Glass GE (2003) Assessing the impact of airline travel on the geographic spread of pandemic influenza. Eur J Epidemiol 18: 1065-1072.
19. Longini Jr IM (1988) A mathematical model for predicting the geographic spread of new infectious agents. Mathematical Biosciences 90: 367-383.
20. Rvachev LA, Longini I (1985) A mathematical model for the global spread of influenza. Mathematical Biosciences 75: 3-22.
21. Sattenspiel L, Dietz K (1995) A structured epidemic model incorporating geographic mobility among regions. Mathematical Biosciences 128: 71-91.
22. Earn DJD (1998) Persistence, chaos and synchrony in ecology and epidemiology. Proceedings of the Royal Society B: Biological Sciences 265: 7-10.
23. Keeling MJ (2000) Metapopulation moments: coupling, stochasticity and persistence. Journal of Animal Ecology 69: 725-736.
24. Park A, Gubbins S, Gilligan CA (2002) Extinction times for closed epidemics: the effects of host spatial structure. Ecology Letters 5: 747-755.
25. Rohani P, Earn DJD, Grenfell BT (1999) Opposite Patterns of Synchrony in Sympatric Disease Metapopulations. Science 286: 968.
May 21, 2009
9:27
WSPC - Proceedings Trim Size: 9in x 6in
Colizza˙BIOMAT2008-1.novo2
112
26. Vazquez A (2007) Epidemic outbreaks on structured populations. Journal of Theoretical Biology 245: 125-129. 27. Colizza V, Barrat A, Barthlemy M, Vespignani A (2006) The Modeling of Global Epidemics: Stochastic Dynamics and Predictability. Bull Math Biol 68: 1893-1921. 28. Colizza V, Barrat A, Barthlemy M, Vespignani A (2006) The role of the airline transportation network in the prediction and predictability of global epidemics. Proc Natl Acad Sci 103: 2015-2020. 29. Colizza V, Barrat A, Barthlemy M, Vespignani A (2007) Predictability and epidemic pathways in global outbreaks of infectious diseases: the SARS case study BMC. Medicine 5: 34. 30. Hufnagel L, Brockmann D, Geisel T (2004) Forecast and control of epidemics in a globalized world. Proceedings of the National Academy of Sciences, USA 101: 15124-15129. 31. Colizza V, Pastor-Satorras R, A. V (2007) Reactiondiffusion processes and metapopulation models in heterogeneous networks. Nature Physics 3: 276-282. 32. Colizza V, Vespignani A (2007) Invasion threshold in heterogeneous metapopulation networks. Phys Rev Lett. 33. Anderson RM, May RM (1992) Infectious disease in humans. Oxford, UK: Oxford University Press. 34. Murray JD (1993) Mathematical Biology: 2nd ed. Springer, New York. 35. Colizza V, Barrat A, Barthelemy M, Valleron AJ, Vespignani A (2007) Modeling the worldwide spread of pandemic influenza: baseline case and containment interventions. PLoS Med 4: e13. 36. A. Barrat MB, R Pastor-Satorras, A Vespignani (2004) The architecture of complex weighted network. Proceedings of the National Academy of Sciences 101: 3747-3752. 37. Barrett C, Beckman R, Berkbigler K, Bisset K, Bush B, et al. (2000) TRANSIMS (TRansportation ANalysis SIMulation System) 3.0 LA-UR-00-1724, 1725, 1755, 1766, and 1767. Los Alamos National Laboratory. 38. Chowell G, Hyman JM, Eubank S, Castillo-Chavez C (2003) Scaling laws for the movement of people between locations in a large city. Physical Review E 68: 66102. 39. 
De Montis A, Barthelemy M, Chessa A, Vespignani A (2005) The structure of Inter-Urban traffic: A weighted network analysis. Arxiv preprint physics/0507106. 40. Guimera R, Mossa S, Turtschi A, Amaral LAN ( 2005) The worldwide air transportation network: anomalous centrality, community structure, and cities global roles. Proc Natl Acad Sci USA 102: 7794-7799. 41. Albert R, Barabsi AL (2002) Statistical mechanics of complex networks. Reviews of Modern Physics 74: 47-97. 42. Dorogovtsev SN, Mendes JFF (2003) Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford Univ. Press, Oxford. 43. Pastor-Satorras R, Vespignani A (2003) Evolution and Structure of the Internet: A Statistical Physics Approach. Cambridge University Press, Cambridge, UK.
May 21, 2009
9:27
WSPC - Proceedings Trim Size: 9in x 6in
Colizza˙BIOMAT2008-1.novo2
113
44. Barthlemy M (2003) Crossover from scale-free to spatial networks. Europhys Lett 63 (6): 915-921 45. Guimera R, Amaral LAN (2004) Modeling the world-wide airport network. The European Physical Journal B-Condensed Matter 38: 381-385. 46. Masuda N, Miwa H, Konno N (2005) Geographical threshold graphs with small-world and scale-free properties. Physical Review E 71: 36108. 47. Barrat A, Barthelemy M, Vespignani A (2005) The effects of spatial constraints on the evolution of weighted complex networks. J Stat Mech P05003, 1742-5468. 48. Barabasi A-L, Albert R (1999) Emergence of Scaling in Random Networks. Science 286. 49. Maslov S, Sneppen K (2002) Specificity and Stability in Topology of Protein Networks. Science 296: 910. 50. Gautreau A, Barrat A, Barthlemy M, Global disease spread: statistics and estimation of arrival times. J Theor Biol, To appear. 51. Gautreau A, Barrat A, Barthlemy M (2007) Arrival time statistics in global disease spread. J Stat Mech L09001.
April 24, 2009
15:57
WSPC - Proceedings Trim Size: 9in x 6in
Richard.Kerner.novo2
114
INTERNAL SYMMETRY CLASSES OF ICOSAHEDRAL VIRAL CAPSIDS

R. KERNER

Laboratoire de Physique Théorique de la Matière Condensée, Université Pierre et Marie Curie, Boîte Courrier 121, 4 Place Jussieu, 75252 Paris Cedex 05, France
E-mail: [email protected]

We show that the usual Caspar-Klug classification of icosahedral viral capsids, based on the triangular number T, can be refined if one takes into account the differentiation of hexagonal capsomers. These can appear in three different kinds according to their symmetry axis: two-fold and chiral (abcabc), three-fold (ababab), or non-symmetric, i.e. totally differentiated (abcdef). The icosahedral capsids can then be subdivided into four symmetry classes, which in turn may be chiral or non-chiral. We discuss the properties of the “periodic table” resulting from this classification and explore the hints it gives concerning the mutations and the evolutionary trends of icosahedral capsid viruses.
1. Introduction

Since Aristotle’s first attempts to systematize animal and plant species,¹ the question of similarity and proximity between different kinds of living organisms has remained one of the most difficult scientific puzzles, for biologists and mathematicians alike. Which features of plants or animals matter in deciding whether two species are closer to or more distant from each other? Is it the size, the shape, or the more intimate internal structure? Is it the environment (seawater, air, sand) or the animal’s diet (predator or grass-eater)? Aristotle asked these questions in his “History of Animals”, and several of the answers he gave remain valid today; others make us smile, like calling the whale “a fish”. His observations on the anatomy and biological functions of internal organs are also among his finest achievements.

The question of distance between various species has become more important than ever since Darwin proposed his theory of evolution. Now the “distance” could become a measure of the time and the number of mutations needed
for transforming one species into another, often via numerous intermediary stages. However, it remains unclear whether the time needed for a certain number of mutations is proportional to that number. Short-cuts and very rapid transitions cannot be excluded.

While evolving, living organisms often seem to keep traces of their previous forms. Thus certain evolutionary trends can be deciphered by comparing organisms coexisting at a given time with the stages they pass through during their individual growth. This “recapitulation theory” was proposed by Ernst Haeckel in the nineteenth century;³ in his own words, “ontogeny recapitulates phylogeny”, claiming that an individual organism’s biological development, or ontogeny, parallels and summarizes its species’ entire evolutionary development, or phylogeny.

The simpler the organism, the greater the frequency and number of mutations that it presents for observation. Viruses, which are the simplest lumps of living matter, reduced almost entirely to genetic material in the form of a giant DNA double helix whose only biological function is to duplicate itself ad infinitum, provide an excellent field for the observation of mutations and the development of new strains. The notion of a “distance” between various strains can be more easily introduced here, because the shapes and sizes of viruses are much more easily calibrated than similar features of living organisms of higher complexity. This is particularly true in the case of the numerous group of spherical viruses, whose protective protein shells, called “capsids”, display perfect icosahedral symmetry.⁴ It is amazing that these structures, known to mathematicians since Coxeter’s classification,⁵ are also observed in the so-called fullerenes, huge molecules composed exclusively of carbon atoms, predicted by Smalley and Kroto and discovered in the eighties.

The icosahedral viral capsids are one of the most spectacular examples of the self-organization of giant proteins, which can build up more and more complicated structures. The first theoretical basis for a quantitative, physical and mathematical analysis of such processes was set forth by Manfred Eigen in his classical book on biological self-organization.⁶ Since Caspar and Klug⁷ introduced simple rules predicting a sequence of observed viral capsids, several models of the growth dynamics of these structures have been proposed, e.g. A. Zlotnick’s model,⁸ published in 1994. The common geometrical feature of many viral capsids and fullerenes is their icosahedral shape, with twelve pentagons found on the opposite sides of six five-fold symmetry axes, and an appropriate number of hexagons in between. The number of hexagons is given by the following simple formula:
$$N_6 = 10\,(T - 1), \qquad T = p^2 + pq + q^2,$$

where T is called the triangular number (also referred to as the triangulation number) and p and q are two non-negative integers.⁵

In capsids, the building blocks made of coat proteins are called monomers, dimers, trimers, pentamers and hexamers, according to their shape, the bigger ones usually being assembled from smaller ones prior to further agglomeration into capsid shells.¹⁴ Sometimes pentameric or hexameric symmetry is displayed despite the direct construction from 60 or 180 smaller subunits, as in the Cowpea mosaic virus and the Cowpea chlorotic mottle virus, respectively.⁹

While certain virus species grow medium-size capsids corresponding to $N_6 = 20$ (as in the C60 fullerene molecule), or $N_6 = 30$ and $N_6 = 60$, others form pure dodecahedral capsids (with exclusively pentamers as building blocks), like certain Comoviridae or Cowpea virus,¹³ and still others, like human Adenovirus,¹¹ form very large capsids with $N_6 = 240$, corresponding to p = 5, q = 0. In some cases, the similarity with the fullerene structure is striking: for example, the TRSV capsid is composed of 60 copies of a single capsid protein (56 000 Da, 513 amino acid residues),¹⁶ which can be put in a one-to-one correspondence with the 60 carbon atoms forming a fullerene C60 molecule; the aforementioned Cowpea viruses provide another example of the same type (see Fig. 1).
Fig. 1. Schematic representation of capsids with T-numbers 3, 4 and 7, right- and left-handed (courtesy of VIPERdb).
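As a quick numerical check of the formula for the number of hexagons, the following Python sketch evaluates T and N6 for a few (p, q) pairs quoted in the text. The function names `triangular_number` and `hexamer_count` are illustrative choices made here and do not come from the original work.

```python
def triangular_number(p: int, q: int) -> int:
    """T = p^2 + p*q + q^2 for non-negative integers p, q (not both zero)."""
    if p < 0 or q < 0 or (p == 0 and q == 0):
        raise ValueError("p and q must be non-negative and not both zero")
    return p * p + p * q + q * q


def hexamer_count(t: int) -> int:
    """Number of hexamers N6 = 10 (T - 1); the pentamer count stays at 12."""
    return 10 * (t - 1)


# (p, q) pairs mentioned in the text: dodecahedral, C60-like, T = 4, T = 7, Adenovirus
for p, q in [(1, 0), (1, 1), (2, 0), (2, 1), (5, 0)]:
    t = triangular_number(p, q)
    print(f"(p, q) = ({p}, {q}):  T = {t:2d},  N6 = {hexamer_count(t)}")
```

The pair (1, 1) reproduces the C60-like case with 20 hexamers, and (5, 0) the Adenovirus case with 240 hexamers.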
2. The assembly rules of icosahedral capsids

The process of building icosahedral viral capsids differs quite essentially from fullerene formation: fullerenes are formed from carbon atoms and small carbon molecules such as C2, C3, up to C9 or C10, in a hot plasma around an electric arc between two graphite electrodes, whereas capsids are built progressively in a liquid medium, from agglomerates of giant protein molecules displaying pentagonal or hexagonal symmetry, or directly from
smaller units (monomers or dimers). It also seems that there is no such thing as universal assembly kinetics: the way the capsids are assembled differs from one virus to another. The T = 7 phage HK47 appears to build pentamers and hexamers first and then assemble these capsomers to form the final capsid structure, whereas another T = 7 phage, labeled P22, appears to assemble its capsids directly from individual coat proteins (see Ref. 15 and the references therein).

The common point is the presence of pentagons and hexagons in the resulting structure, and the strict topological rules that result from Euler’s theorem on convex polyhedra: V − E + F = 2, with V the number of vertices, E the number of edges, and F the number of faces. From this one derives that, when only pentagonal and hexagonal faces are allowed and three faces meet at each vertex, the number of pentagons is always $N_5 = 12$; for icosahedral capsids the hexagon number is then $N_6 = 10(T - 1)$.

Contrary to the case of fullerene molecules, whose yield from the hot plasma is at best no higher than 10% of the total mass of carbon soot, viruses use almost 100% of the pentamers and hexamers at their disposal to form perfect icosahedral capsid structures, into which their DNA genetic material is densely packed once the capsid is complete. This means, first, that the initial nucleation ratio of pentamers versus hexamers is very close to its final value in capsids, so as to minimize waste. Secondly, the final size of the capsid must depend on particular assembly rules, which can be fairly well deduced from the statistical weights of the various agglomeration steps, found by maximizing the final production rate.

Let us investigate the rules that define the type and the size of capsids while simultaneously optimizing the production rate. From symmetry considerations (confirmed by chemical analysis) it follows that the pentamers are composed of five identical dimers, so that their five edges are perfectly equivalent, and that they possess a definite orientation, i.e. it is known which of the two faces will be on the outer side of the capsid. All five sides of a pentamer must be equivalent (identical), because 5 is a prime number, and any division into parts would break the symmetry. Concerning the hexamers, as 6 is divisible by 2 and 3, one can have the following four situations:

- All 6 sides equivalent, (aaaaaa)
- Two types of sides, disposed as (ababab)
- Three types of sides, disposed as (abcabc)
- Six different sides, (abcdef)
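Returning to the topological rule quoted in this section, the pentagon count can be recovered from Euler’s theorem in a few lines. The sketch below assumes that exactly three faces meet at every vertex, as is the case for capsids and fullerenes; $N_5$ and $N_6$ denote the numbers of pentagonal and hexagonal faces.

```latex
\begin{align*}
F &= N_5 + N_6, \qquad
E = \tfrac{1}{2}\,(5N_5 + 6N_6), \qquad
V = \tfrac{1}{3}\,(5N_5 + 6N_6), \\
V - E + F &= -\tfrac{1}{6}\,(5N_5 + 6N_6) + N_5 + N_6
           = \tfrac{1}{6}\,N_5 = 2
\quad\Longrightarrow\quad N_5 = 12 .
\end{align*}
```

Euler’s theorem alone leaves $N_6$ undetermined. Inserting $N_6 = 10(T - 1)$ gives $F = 10T + 2$, $E = 30T$ and $V = 20T$, which satisfy $V - E + F = 2$ for every T.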
The hexamers are also oriented, with one face becoming external, and the other one turned to the interior of the capsid. The three differentiated hexamers are represented in Fig. 2 below.
Fig. 2. Three differentiations of hexamers.
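The four hexamer side patterns listed in this section can be told apart by how many rotations map each pattern onto itself. The short Python sketch below is only an illustration of that counting; the helper name `rotation_order` is a hypothetical choice made here, and the patterns are written as plain strings of side labels.

```python
def rotation_order(pattern: str) -> int:
    """Count the cyclic shifts (including the identity) that leave the pattern unchanged."""
    n = len(pattern)
    return sum(1 for k in range(n) if pattern[k:] + pattern[:k] == pattern)


for pattern in ["aaaaaa", "ababab", "abcabc", "abcdef"]:
    order = rotation_order(pattern)      # order of the rotational symmetry axis
    side_types = len(set(pattern))       # number of distinct side types
    print(f"({pattern}): {order}-fold axis, {side_types} side type(s), "
          f"{6 // side_types} copies of each")
```

The undifferentiated hexamer keeps the full six-fold axis, (ababab) a three-fold axis, (abcabc) a two-fold axis, and (abcdef) has no rotational symmetry left; in each case the number of distinct side types is 6 divided by the order of the axis.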
Let us denote the pentamers’ sides by the symbol p, whereas the two different kinds of sides on the hexamers’ edges will be called a and b (Fig. 3). Suppose that a hexamer can stick to a pentamer only through a (p + a) combination; then two hexamers must stick to each other only through a (b + b) combination, with both (p + b) and (a + b) combinations being forbidden by a chemical potential barrier.
Fig. 3. Building schemes and affinity matrices for the T=3 and T=4 capsids.
Similarly, with a more differentiated hexamer scheme, (abcabc), and with
the assembling rules allowing only associations of p + a and b + c, we get with 100% probability the T = 4 capsid, as shown in Fig. 3. Note that in both cases we show only one of the “basic triangles” forming the capsid, which is always made of 20 identical triangles sticking together to form a perfect icosahedral shape.

These examples suggest that strict association rules may exist, providing precise agglomeration pathways for each kind of icosahedral capsid. Let us analyze these rules in more detail. If the viruses were using undifferentiated hexamers with all their sides equivalent, then nothing would prevent the creation of arbitrary structures such as those shown in Fig. 4, and the final yield would be very low (at best as in the fullerenes, less than 10%). But with differentiated hexamers of the (ababab) type, and simple selection rules excluding the (p − b) and (a − b) associations while allowing the creation of (p − a) and (b − b) links, we have seen that the outcome becomes determined with practically 100% certainty, as follows from Fig. 3.

Fig. 4. Random agglomeration of pentagonal and hexagonal capsomers would lead to 50% of waste at each consecutive step, resulting in a negligible final yield.

These sticking rules can be summarized in a table that we shall call the “affinity matrix”, displayed in Fig. 3. Here a “0” is put at the crossing of two symbols whose agglomeration is forbidden, and a “1” when the agglomeration is allowed. By construction, a “1” can occur only once in any row or in any column.

The next case presents itself when one uses the next hexamer type, with a two-fold symmetry: (abcabc). Again, supposing that only a-sides can stick to the pentamers’ sides p, there is no other choice but the one presented in Fig. 3. Finally, let us use the maximally differentiated hexamers of the
(abcdef) type. Starting with pentamers surrounded by hexamers sticking via the (p − a) pairing, we discover that two choices are now possible, leading to left- and right-handed versions, as shown in Fig. 5.
Fig. 5. The building schemes of the T=7 capsids.
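The sticking rules for the T = 3 and T = 4 capsids described earlier in this section can be written down as affinity matrices and checked mechanically. The Python sketch below is an illustration only: the helpers `affinity_matrix` and `is_valid_affinity` are hypothetical names introduced here, and the allowed pairs are the ones stated in the text, (p, a) and (b, b) for T = 3, and (p, a) and (b, c) for T = 4.

```python
def affinity_matrix(symbols, allowed_pairs):
    """Build a 0/1 matrix with a 1 wherever two side types are allowed to stick."""
    index = {s: i for i, s in enumerate(symbols)}
    n = len(symbols)
    m = [[0] * n for _ in range(n)]
    for x, y in allowed_pairs:
        m[index[x]][index[y]] = 1
        m[index[y]][index[x]] = 1   # sticking is mutual, so the matrix is symmetric
    return m


def is_valid_affinity(m):
    """Check the properties stated in the text: symmetric, a single 1 per row and per column."""
    n = len(m)
    symmetric = all(m[i][j] == m[j][i] for i in range(n) for j in range(n))
    one_per_row = all(sum(row) == 1 for row in m)
    one_per_col = all(sum(m[i][j] for i in range(n)) == 1 for j in range(n))
    return symmetric and one_per_row and one_per_col


t3 = affinity_matrix("pab", [("p", "a"), ("b", "b")])    # (ababab) hexamers
t4 = affinity_matrix("pabc", [("p", "a"), ("b", "c")])   # (abcabc) hexamers

for name, m in [("T = 3", t3), ("T = 4", t4)]:
    print(name, "valid:", is_valid_affinity(m))
    for row in m:
        print("   ", row)
```

By contrast, an undifferentiated hexamer whose single side type sticks both to p and to itself gives a row with two entries equal to 1, so `is_valid_affinity` rejects it; this is the ambiguous case illustrated by the random agglomeration of Fig. 4.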
Now a natural question can be asked: what comes next? In order to grow capsids with T-numbers greater than 7, one has to introduce new types of hexamers that never stick to pentamers, but are able to associate with certain sides of the former maximally differentiated hexamers. For bigger capsids, in which the proportion of pentamers is lower, one cannot obtain the proper result unless more than one type of hexamer is present, of which only one is allowed to agglomerate with pentamers. In the case of two different hexamer types one obtains either the T = 9 capsid or, with more exclusive sticking rules, the T = 12 capsid. The result is shown in Fig. 6 below. The corresponding affinity matrices are easily constructed just by looking at the above figures; they can also be found in Ref. 18. Finally, in order to get the T = 25 adenovirus capsid, one must introduce no fewer than four hexamer types, of which only one type can agglomerate
Fig. 6. The basic triangles of the T=9 and T=12 capsids.
with pentamers. The inspection of the building schemes encoded in the elementary triangles and the corresponding affinity matrices leads to the following simple rules:

1) For the construction of a capsid with a given triangular number T one needs exactly T different proteins (or at least T different types of sticking sides), because the affinity matrix has dimension T × T, as is easily seen in the examples. The affinity matrices belong to the class of stochastic matrices and always have an eigenvector of unit eigenvalue representing the asymptotic probability distribution.

2) By definition, the “affinity matrices” have only one non-zero item in each row and in each column; moreover, they are symmetric. This means that all capsid protein types (or, more exactly, all different sticking sides) encountered in a complete icosahedron appear with the same frequency: 60 times each. This can be most easily seen for the p-type forming a pentamer: there are 5 of them in each pentamer, and there are 12 pentamers in any icosahedral capsid. But then each p sticks to an a, and exclusively to it; therefore, there must also be 60 a-type proteins in the complete capsid, and so on for each different protein. This means that all the dimers that later assemble into pentamers and hexamers have to be produced at exactly the same rate in order to optimize capsid production.

3) The capsomers composing a given capsid should be produced at different rates, following a very simple rule: for every dozen pentamers, one should have 60 maximally diversified hexamers of the type (abcdef) (and 60 of each further such type, like the $(n_a n_b n_c n_d n_e n_f)$ in the T = 9 capsid); then 20 hexamers of each (ababab) type; and 30 hexamers of each (abcabc) type, since every side type must appear 60 times and these hexamers carry three and two copies of each of their side types, respectively.
This rule can be easily seen upon inspection of Figs. 3-6.

4) In order to know how many hexamers (and of which kinds) should be used, the T-number should be partitioned into 1 + the rest, the “1” standing for the unique type of pentamer side, while “the rest” must be decomposed into a sum of the numbers 2, 3 or 6, the non-trivial divisors of 6. This is shown in the “Partition” column of the table: we see that T = 3 = 1 + 2; T = 4 = 1 + 3; then the next cases decompose as T = 7 = 1 + 6; T = 9 = 1 + 6 + 2 (by the way, there is no point in trying to build a capsid with 1 + 6 + 3 = 10 because 10 cannot be a triangular number!).

Although new capsid viruses, as yet unknown, continue to be discovered, and nobody can tell for sure what their maximal size is, the number of known species is already enormous, and certain statistical features may contain very useful information. In what follows, we attempt to organize the capsids according to their internal symmetries encoded in their capsomer differentiation and the assembling rules encoded in their affinity matrices.

3. Classification of Icosahedral Capsids

Now we can organize all these results in a single table. To each value of the triangular number T corresponds a unique partition into 1 + (T − 1), where the “1” represents the unique pentamer type and (T − 1) is partitioned into a sum of a certain number of different hexamer types, according to the formula

$$T - 1 = 6\,n_6 + 2\,n_2 + 3\,n_3$$

with non-negative integers $n_6$, $n_2$ and $n_3$, the last two taking on the values 0 or 1 exclusively. The result is displayed in the following tables, where we see how many different types of hexamers are needed for the construction of a capsid with a given triangular number T. The next table continues the classification. The triangular number T does not grow uniformly with p + q: in the first table the T = 49 capsid with (p, q) = (7, 0) already precedes the T = 48 capsid, and from p + q = 9 onwards the capsid with T = 61 appears in the table after the capsid with T = 64.
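The partition of T − 1 into hexamer types, and the production rule that every side type appears 60 times, can be cross-checked with a few lines of Python. This is a sketch under the assumptions stated in the text (n2 and n3 restricted to 0 or 1); the function names `partition` and `hexamer_total` are illustrative. The copy counts follow from the 60-fold rule: an (abcdef) hexamer carries one copy of each of its side types, an (ababab) hexamer three and an (abcabc) hexamer two, which gives 60, 20 and 30 hexamers per type, in agreement with N6 = 20 for T = 3 and N6 = 30 for T = 4.

```python
def partition(t: int):
    """Return (n6, n2, n3) with t - 1 = 6*n6 + 2*n2 + 3*n3 and n2, n3 in {0, 1}.

    Returns None when no such partition exists (t - 1 congruent to 1 or 4 modulo 6).
    """
    rest = t - 1
    for n2 in (0, 1):
        for n3 in (0, 1):
            remainder = rest - 2 * n2 - 3 * n3
            if remainder >= 0 and remainder % 6 == 0:
                return remainder // 6, n2, n3
    return None


def hexamer_total(n6: int, n2: int, n3: int) -> int:
    """Total hexamers when each side type occurs 60 times in the capsid."""
    return 60 * n6 + 20 * n2 + 30 * n3


for t in (3, 4, 7, 9, 12, 25):
    n6, n2, n3 = partition(t)
    total = hexamer_total(n6, n2, n3)
    # The per-type counts must add up to N6 = 10 (T - 1)
    print(f"T = {t:2d}: n6 = {n6}, n2 = {n2}, n3 = {n3}, "
          f"hexamers = {total}, 10(T-1) = {10 * (t - 1)}")
```

Because the four combinations of (n2, n3) contribute 0, 2, 3 or 5, which are distinct modulo 6, the partition is unique whenever it exists.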
In the next table such cases become more and more frequent. Clearly enough, all icosahedral capsids can be divided into four different classes according to their internal symmetry; in each of these classes one can encounter chiral (left- and right-oriented isomers) or non-chiral types. Starting from a certain size, isomers become possible. By this we mean not just the left- and right-oriented tilings, as in the case of T = 7
Table 1. Classification of icosahedral capsids (isomers marked with ∗). The last two columns give the number and type of hexamers needed for the construction and the corresponding chirality.

N = p + q   (p, q)    T      N6     Partition                 Chirality
1           (0, 1)    1      0      1                         0
2           (1, 1)    3      20     1 + 2                     0
2           (0, 2)    4      30     1 + 3                     0
3           (1, 2)    7      60     1 + 6                     +
3           (0, 3)    9      80     1 + 6 + 2                 0
4           (2, 2)    12     110    1 + 6 + 2 + 3             0
4           (3, 1)    13     120    1 + 6 + 6                 +
4           (4, 0)    16     150    1 + 6 + 6 + 3             0
5           (3, 2)    19     180    1 + 6 + 6 + 6             +
5           (4, 1)    21     200    1 + 6 + 6 + 6 + 2         +
5           (5, 0)    25     240    1 + 6 + 6 + 6 + 6         0
6           (3, 3)    27     260    1 + (4 × 6) + 2           0
6           (4, 2)    28     270    1 + (4 × 6) + 3           +
6           (5, 1)    31     300    1 + (5 × 6)               +
6           (6, 0)    36     350    1 + (5 × 6) + 2 + 3       0
7           (4, 3)    37     360    1 + (6 × 6)               +
7           (5, 2)    39     380    1 + (6 × 6) + 2           +
7           (6, 1)    43     420    1 + (7 × 6)               +
7           (7, 0)    49∗    480    1 + (8 × 6)               0
8           (4, 4)    48     470    1 + (7 × 6) + 2 + 3       0
8           (5, 3)    49∗    480    1 + (8 × 6)               +
8           (6, 2)    52     510    1 + (8 × 6) + 3           +
8           (7, 1)    57     560    1 + (9 × 6) + 2           +
8           (8, 0)    64     630    1 + (10 × 6) + 3          0
and similar bigger capsids existing with two chiralities, but the possibility to construct capsids of similar size and with identical triangular number T,
Table 2. Classification of icosahedral capsids, continued (isomers marked with ∗).

N = p + q   (p, q)     T      N6      Partition                 Chirality
9           (5, 4)     61     600     1 + (10 × 6)              +
9           (6, 3)     63     620     1 + (10 × 6) + 2          +
9           (7, 2)     67     660     1 + (11 × 6)              +
9           (8, 1)     73     720     1 + (12 × 6)              +
9           (9, 0)     81     800     1 + (13 × 6) + 2          0
10          (5, 5)     75     740     1 + (12 × 6) + 2          0
10          (6, 4)     76     750     1 + (12 × 6) + 3          +
10          (7, 3)     79     780     1 + (13 × 6)              +
10          (8, 2)     84     830     1 + (13 × 6) + 2 + 3      +
10          (9, 1)     91∗    900     1 + (15 × 6)              +
10          (10, 0)    100    990     1 + (16 × 6) + 3          0
11          (6, 5)     91∗    900     1 + (15 × 6)              +
11          (7, 4)     93     920     1 + (15 × 6) + 2          +
11          (8, 3)     97     960     1 + (16 × 6)              +
11          (9, 2)     103    1020    1 + (17 × 6)              +
11          (10, 1)    111    1100    1 + (18 × 6) + 2          +
11          (11, 0)    121    1200    1 + (20 × 6)              0
12          (6, 6)     108    1070    1 + (17 × 6) + 2 + 3      0
12          (7, 5)     109    1080    1 + (18 × 6)              +
12          (8, 4)     112    1110    1 + (18 × 6) + 3          +
with different partition into pentamers and hexamers, and different numbers p and q. The first isomers occur at T = 49; in the table of isomers below we display only ten cases. This series can be continued ad infinitum, but there is no evidence that viruses can grow indefinitely large capsids. From the lack of monotonic ordering in the above tables we conclude that neither the triangular number T nor the sum p + q provides a correct parametrization of icosahedral capsids. A more appropriate organization of the tables should take into account the combinatorial
Table 3. Isomer icosahedral capsids.

Type (p, q)         T = p² + pq + q²    N6 = 10(T − 1)    T decomposition
(7, 0); (5, 3)      49                  480               1 + 8 × 6
(9, 1); (6, 5)      91                  900               1 + 15 × 6
(11, 1); (9, 4)     133                 1320              1 + 22 × 6
(11, 2); (7, 7)     147                 1460              1 + 24 × 6 + 2
(13, 0); (8, 7)     169                 1680              1 + 28 × 6
(13, 3); (9, 8)     217                 2160              1 + 36 × 6
(14, 3); (11, 7)    247                 2460              1 + 41 × 6
(15, 2); (13, 5)    259                 2580              1 + 43 × 6
(15, 4); (11, 9)    301                 3000              1 + 50 × 6
(18, 1); (14, 7)    343                 3420              1 + 57 × 6
properties of various hexamer differentiations and assembling rules.

4. The periodic table of icosahedral capsids

In order to produce a table of capsids with monotonically growing size, let us see whether any combination of the various hexamer types (i.e. the (ababab) type, the (abcabc) type and the (abcdef) type) can coexist and lead to one of the possible icosahedral structures displayed in the tables of the previous section. It is easy to see that not all combinations are geometrically and topologically possible. The table below shows which combinations of the various hexamers can be realized, and which are barred by topological impossibility. Now the capsids are arranged according to a strictly growing scheme, and the table contains all combinations of the various hexamers, which can also be realized, as we have seen in the second section, using dimers or trimers. The simple algebraic relation

$$1 + 6\,n_6 + 2\,n_2 + 3\,n_3 = T = p^2 + pq + q^2 \tag{1}$$

may or may not have solutions in non-negative integers (p, q), for arbitrary values of $n_6$ and with $n_2$ and $n_3$ taking on the values 0 or 1. The next table is the continuation of the periodic structure. In Tables 5 and 6 one clearly observes periodicity in the columns: the realizable icosahedral capsids occupy places separated by a multiple of
Table 4. Combinations of pentamers and the three types of hexamers. Some can produce icosahedral capsids, others cannot.

(ppppp)   (abcdef)   (ababab)   (abcabc)   T     (p, q)
1         0          0          0          1     (1, 0)
1         0          1          0          3     (1, 1)
1         0          0          1          4     (2, 0)
1         0          1          1          6     impossible
1         1          0          0          7     (2, 1)
1         1          1          0          9     (3, 0)
1         1          0          1          10    impossible
1         1          1          1          12    (2, 2)
1         2          0          0          13    (3, 1)
1         2          1          0          15    impossible
1         2          0          1          16    (4, 0)
1         2          1          1          18    impossible
1         3          0          0          19    (3, 2)
1         3          1          0          21    (4, 1)
1         3          0          1          22    impossible
1         3          1          1          24    impossible
1         4          0          0          25    (5, 0)
1         4          1          0          27    (3, 3)
1         4          0          1          28    (4, 2)
1         4          1          1          30    impossible
1         5          0          0          31    (5, 1)
1         5          1          0          33    impossible
1         5          0          1          34    impossible
1         5          1          1          36    (6, 0)
four lines. In the first column, containing the capsids with lowest internal symmetry, built up exclusively with maximally differentiated hexamers,
Table 5. Periodic table of icosahedral capsids.

Scheme          T, (p, q)     (5 + 6)         (5 + 6 + 2)     (5 + 6 + 3)     (5 + 6 + 2 + 3)
(1, 0, 0, 0)    1, (1, 0)     (1, 0, 0, 0)    −               −               −
(1, 0, 1, 0)    3, (1, 1)     −               (1, 0, 1, 0)    −               −
(1, 0, 0, 1)    4, (2, 0)     −               −               (1, 0, 0, 1)    −
(1, 0, 1, 1)    6             −               −               −               −
(1, 1, 0, 0)    7, (2, 1)     (1, 1, 0, 0)    −               −               −
(1, 1, 1, 0)    9, (3, 0)     −               (1, 1, 1, 0)    −               −
(1, 1, 0, 1)    10            −               −               −               −
(1, 1, 1, 1)    12, (2, 2)    −               −               −               (1, 1, 1, 1)
(1, 2, 0, 0)    13, (3, 1)    (1, 2, 0, 0)    −               −               −
(1, 2, 1, 0)    15            −               −               −               −
(1, 2, 0, 1)    16, (4, 0)    −               −               (1, 2, 0, 1)    −
(1, 2, 1, 1)    18            −               −               −               −
(1, 3, 0, 0)    19, (3, 2)    (1, 3, 0, 0)    −               −               −
(1, 3, 1, 0)    21, (4, 1)    −               (1, 3, 1, 0)    −               −
(1, 3, 0, 1)    22            −               −               −               −
(1, 3, 1, 1)    24            −               −               −               −
(1, 4, 0, 0)    25, (5, 0)    (1, 4, 0, 0)    −               −               −
(1, 4, 1, 0)    27, (3, 3)    −               (1, 4, 1, 0)    −               −
(1, 4, 0, 1)    28, (4, 2)    −               −               (1, 4, 0, 1)    −
(1, 4, 1, 1)    30            −               −               −               −
(1, 5, 0, 0)    31, (5, 1)    (1, 5, 0, 0)    −               −               −
(1, 5, 1, 0)    33            −               −               −               −
(1, 5, 0, 1)    34            −               −               −               −
(1, 5, 1, 1)    36, (6, 0)    −               −               −               (1, 5, 1, 1)
with one exception they follow regularly after each skip of four lines, which corresponds to adding one extra new (abcdef)-type hexamer. Apparently, the addition of one of the more symmetric hexamers, an (ababab)-type or an
Table 6. Periodic table of icosahedral capsids, continued.

Scheme           T, (p, q)             (5 + 6)          (5 + 6 + 2)      (5 + 6 + 3)      (5 + 6 + 2 + 3)
(1, 6, 0, 0)     37, (4, 3)            (1, 6, 0, 0)     −                −                −
(1, 6, 1, 0)     39, (5, 2)            −                (1, 6, 1, 0)     −                −
(1, 6, 0, 1)     40                    −                −                −                −
(1, 6, 1, 1)     42                    −                −                −                −
(1, 7, 0, 0)     43, (6, 1)            (1, 7, 0, 0)     −                −                −
(1, 7, 1, 0)     45                    −                −                −                −
(1, 7, 0, 1)     46                    −                −                −                −
(1, 7, 1, 1)     48, (4, 4)            −                −                −                (1, 7, 1, 1)
(1, 8, 0, 0)     49, (7, 0); (5, 3)    (1, 8, 0, 0)     −                −                −
(1, 8, 1, 0)     51                    −                −                −                −
(1, 8, 0, 1)     52, (6, 2)            −                −                (1, 8, 0, 1)     −
(1, 8, 1, 1)     54                    −                −                −                −
(1, 9, 0, 0)     55                    −                −                −                −
(1, 9, 1, 0)     57, (7, 1)            −                (1, 9, 1, 0)     −                −
(1, 9, 0, 1)     58                    −                −                −                −
(1, 9, 1, 1)     60                    −                −                −                −
(1, 10, 0, 0)    61, (5, 4)            (1, 10, 0, 0)    −                −                −
(1, 10, 1, 0)    63, (6, 3)            −                (1, 10, 1, 0)    −                −
(1, 10, 0, 1)    64, (8, 0)            −                −                (1, 10, 0, 1)    −
(1, 10, 1, 1)    66                    −                −                −                −
(1, 11, 0, 0)    67, (7, 2)            (1, 11, 0, 0)    −                −                −
(1, 11, 1, 0)    69                    −                −                −                −
(1, 11, 0, 1)    70                    −                −                −                −
(1, 11, 1, 1)    72                    −                −                −                −
(abcabc)-type, is less easy; adding both of these types at once must be even more difficult, judging by the fact that the fewest capsids display this type of internal symmetry. These features of our “periodic table” may shed
some light on the number of new protein additions needed to pass from one capsid to its closest neighbor in the table and, consequently, on the number of mutations necessary to achieve such a modification. This in turn may give us some hints concerning the past evolution of capsid viruses.

5. Mutational distance and evolutionary trends

Trying to define evolutionary trends from these capsid classification tables seems to be a risky endeavour; one can recall, however, that non-negligible information was drawn from a careful study of animal skeletons, fish scales, and similar secondary features of living organisms. There is a clear “evolutionary” pattern in the last two tables, meaning that every next (bigger) type of capsid uses the previous construction, just adding a minimal amount of novelty. Most of the time it is clear which kind of new hexamer one must add, just by looking at the differences between consecutive T-numbers: if they differ by 2 or by 3, one should add one new hexamer type, (ababab) or (abcabc), respectively; but if they differ by 4 (e.g. from T = 21 to T = 25) or by 5 (from T = 31 to T = 36), one must add two new types of (ababab), or one (ababab) and one (abcabc) type.

It seems plausible that the major evolutionary trend is from smaller towards bigger forms, as it supposes progressive differentiation among the constitutive hexamers. From a purely mathematical point of view, evolution would then mean the addition of a new hexamer type, or the transformation of one of the constitutive hexamers into another one with higher differentiation. The addition of a new (abcdef) hexamer to one of the capsids of any column results in one step down the same column; the addition of several new maximally differentiated hexamers results in the same number of steps down the same column. In order to better understand Tables 4, 5 and 6, one should imagine them as cylinders, with the right edge of the rightmost column glued to the left edge of the first column.
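The rows of the periodic table of Section 4 can be regenerated by testing relation (1) for every scheme (1, n6, n2, n3). The Python sketch below is one possible way of doing so and is not taken from the original work; the names `solutions` and `periodic_rows` are hypothetical, and (p, q) is restricted to p ≥ q ≥ 0 so that each capsid geometry is listed once.

```python
def solutions(t: int):
    """All (p, q) with p >= q >= 0 and p^2 + p*q + q^2 == t."""
    found = []
    p = 1
    while p * p <= t:
        for q in range(p + 1):
            if p * p + p * q + q * q == t:
                found.append((p, q))
        p += 1
    return found


def periodic_rows(max_n6: int):
    """Rows (scheme, T, solutions) in the order used in the periodic tables."""
    rows = []
    for n6 in range(max_n6 + 1):
        for n2, n3 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
            t = 1 + 6 * n6 + 2 * n2 + 3 * n3
            rows.append(((1, n6, n2, n3), t, solutions(t)))
    return rows


for scheme, t, sols in periodic_rows(5):
    label = "; ".join(str(s) for s in sols) if sols else "impossible"
    print(f"{scheme}  T = {t:2d}  {label}")

# First column (n2 = n3 = 0): values of n6 up to 11 for which no capsid exists
print([n6 for n6 in range(12) if not solutions(1 + 6 * n6)])
```

Moving one step down a column corresponds to increasing n6 by one, that is, to skipping four rows in this listing. Within the range covered here, the first column has a single gap, at n6 = 9 (T = 55).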
Then one can imagine all possible transitions from any column to another one, not necessarily its immediate neighbor, but always towards the right side and down. For example, adding a new (ababab)-type hexamer to a capsid from the first column creates a species belonging to the second column, etc. The important question is then: how many mutations are necessary to accomplish one of these transformations? The answer would give a hint as to how one can conceive the notion of “distance” between different capsids. It seems reasonable to assume that the closest species are those separated by a single addition of a maximally differentiated hexamer, which can be symbolized by the transition (abcdef) → (a′b′c′d′e′f′), because such a mutation does not alter the hexamer's character and can be obtained by a common modification (adding a particular radical, for example). This is why we should expect the viruses with triangulation numbers 7 and 13 to be close relatives, both belonging to the first column and both skew-symmetric; we would also expect the Adenovirus (T = 25, (p, q) = (5, 0)) to be related to the (p, q) = (7, 0) isomer of the T = 49 capsid. Their respective schemes are 25 = 1 + 4 × 6 and 49 = 1 + 8 × 6, and the latter can be obtained from the former by a common modification of all four differentiated hexamers, thus doubling their number. The isomer T = 49, (p, q) = (5, 3) should be quite distant from these two. It is also plausible that evolution by mutations keeps the capsid types inside the same column, with the same symmetry type, as the symmetry change (abcdef) → (ababab) requires several more specific mutations at once. This is probably why the capsid type T = 16 = 1 + 2 × 6 + 3, which appears isolated in the third column of Table 5, corresponds to a single and unique family, the Herpesviruses, which admits many mutations and variations while always remaining inside the same capsid species and size (12). Such “islands” also exist among the capsid types corresponding to higher values of T, even when one considers Tables 5, 6 and 7 as cylinders, with their right and left borders glued together. There are single isolated species corresponding to T = 124, T = 196 and T = 268 (the last one beyond the range of observed types). But there are also “islands” in the form of isolated groups of species containing a few neighbors in the table. As an example, one can cite the group of five species with triangulation numbers T = 52, 57, 61, 63 and 64; there is another small isolated doublet with T = 79 and T = 81; an isolated group of four species with T = 67, 73, 75 and 76; another isolated doublet with T = 96 and T = 100, and so forth.
It is reasonable to suppose that all such groups represent an increased stability against mutations that would force them to get out of their isolated cluster, because by definition, such evolutionary displacements need more than one mutation at once. To define a notion of a distance between various types of capsids is a challenge for further research in this direction.
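The arithmetic behind these comparisons can be checked with a short script. The following is a minimal sketch, assuming the standard Caspar-Klug formula T = p² + pq + q² and the usual count of 10(T − 1) hexamers for a capsid with 12 pentamers; the function names are illustrative, and only one member of each mirror pair (p ≥ q) is listed. It does not reproduce the column structure of the classification tables, only the list of admissible T-numbers, their isomers, and the gaps between consecutive values.

```python
def triangulation_number(p, q):
    """Caspar-Klug triangulation number T = p^2 + p*q + q^2."""
    return p * p + p * q + q * q


def capsid_table(max_t):
    """Map each admissible T <= max_t to its (p, q) isomers with p >= q >= 0."""
    table = {}
    p = 1
    while p * p <= max_t:
        for q in range(0, p + 1):
            t = triangulation_number(p, q)
            if t <= max_t:
                table.setdefault(t, []).append((p, q))
        p += 1
    return dict(sorted(table.items()))


def hexamer_count(t):
    """Hexamers in a capsid with 12 pentamers: (60*T - 60) / 6 = 10 * (T - 1)."""
    return 10 * (t - 1)


table = capsid_table(50)
t_values = list(table)

# Gap between each admissible T-number and the next one.
gaps = {(a, b): b - a for a, b in zip(t_values, t_values[1:])}

print(t_values)            # admissible T-numbers up to 50
print(table[49])           # the two isomers of T = 49
print(gaps[(21, 25)], gaps[(31, 36)])
print(hexamer_count(25))   # hexamers in a T = 25 capsid
```

With this enumeration, T = 21 and T = 25 are consecutive and differ by 4, T = 31 and T = 36 differ by 5, and T = 49 appears with the two isomers (5, 3) and (7, 0) discussed above.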
References
1. Aristotle's Complete Works, ed. J. Barnes, Princeton University Press, Bollingen Series, Vol. LXXI, 2 (1984)
2. C. Darwin, On the Origin of Species by Means of Natural Selection (1859)
3. E. Haeckel, Natürliche Schöpfungsgeschichte (1868), Die systematische Phylogenie (1894)
4. D.D. Richman, R.J. Whitley and F.G. Hayden, Clinical Virology (second edition), ASM Press, Washington DC (2002)
5. H.S.M. Coxeter, “Regular polytopes”, Methuen and Co., London (1948)
6. M. Eigen, Selforganization of matter and the evolution of biological molecules, Springer-Verlag, Die Naturwissenschaften 58, Heft 10 (1971)
7. D.L.D. Caspar, A. Klug, Symp. Quant. Biol. 27, 1 (1962)
8. A. Zlotnick, J. Mol. Biology 241, pp. 59-67 (1994)
9. Larson et al., Journal of Molecular Biology 277, pp. 37-59 (1998)
10. D.J. McGeogh and A.J. Davison, The molecular evolutionary history of the herpesviruses: origins and evolution of viruses, Academic Press Ltd., London (1999)
11. P.L. Stewart, R.M. Burnett, M. Cyrklaff, S.D. Fuller, Cell, Vol. 67, October 4, pp. 145-154 (1991)
12. B.L. Trus et al., Journal of Virology 75 (6), pp. 2879-2890 (2001)
13. T. Lin and J.E. Johnson, Advances in Virus Research, Vol. 62, pp. 167-236 (2003)
14. H.R. Hill, N.J. Stonehouse, S.A. Fonseca and P.G. Stockley, J. Mol. Biol. 266, pp. 1-7 (1997)
15. P.E. Prevelige, D. Thomas and J. King, Biophys. Journal 64, pp. 824-835 (1993); R. Schwartz, P.E. Prevelige, B. Berger, Biophys. Journal, 765, pp. 2626-2636 (1998)
16. B. Buckley, S. Silva, S. Singh, Virus Research 30, pp. 335-349 (1993)
17. R. Kerner, Computational Materials Science 2, pp. 500-508 (1994)
18. R. Kerner, “Models of Agglomeration and Glass Transition”, Imperial College Press, London (2007)
19. R. Twarock, J. Theor. Biology 226 (4), pp. 477-482 (2004)
20. R. Kerner, Journal of Theoretical Medicine, Vol. 6 (2), pp. 95-97 (2005)
21. R. Kerner, Journal of Theoretical Medicine, Vol. 9 (3,4), pp. 175-181 (2008)
RECOGNITION OF FRESHWATER MACROINVERTEBRATE TAXA BY IMAGE ANALYSIS AND ARTIFICIAL NEURAL NETWORKS
S. R. DOYLE1,2,3, A. L. SOMMA2, J. CODNIA3,4, J. E. URE3, L. ROMANELLI1, F. R. MOMO2,3
1 Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Argentina.
2 PIEA, Depto. de Ciencias Básicas, Universidad Nacional de Luján, Ruta 5 y 7, Luján (6700), Argentina.
3 Instituto de Ciencias, Universidad Nacional de General Sarmiento, J.M. Gutierrez 1150, Los Polvorines (1613), Argentina.
4 Instituto de Investigaciones Científicas y Técnicas para la Defensa (CITEFA), San Juan Bautista de La Salle 4397 (1603) Villa Martelli, Argentina.
Routine taxonomic identification is a limiting factor in the study of macroinvertebrate communities, a key group of freshwater ecosystems. Traditionally, macroinvertebrates have been identified through examination under a stereoscopic microscope, an activity that requires high technical expertise and a considerable amount of time. In this paper we present the first automatic taxonomic identification of freshwater macroinvertebrate taxa, achieved through a novel image processing program developed with MATLAB®. The program works in a completely automated fashion once it has been trained, with no user intervention. A set of morphological and texture parameters is calculated through image analysis and processed by a hierarchical set of partitioned artificial neural networks (ANNs) in order to identify the taxon to which the presented specimens belong. Classification performance is estimated by 10-fold stratified cross-validation. Specimens of 10 macroinvertebrate taxa of varying taxonomic rank were isolated and identified from field samples. Digital images of specimens were acquired with a flatbed scanner, yielding a database of 1042 images. Overall recognition performance was > 74% for all taxa, with most values in the 80-90% range and a highest value of 99.48%. The scheme of image processing and hierarchical partitioned ANN analysis proved to be effective for this particular pattern recognition challenge, yielding a global classification performance of 87.83% and being able to distinguish between species of the same genus.
1. Introduction
Routine taxonomic identification is a limiting factor in many ecological studies. This is particularly the case when the volume of specimens that can usefully be obtained vastly outstrips any capacity to identify this material.1 While the need for species identification is growing due to biodiversity conservation efforts and the assessment of climate change, the number of trained taxonomists is declining.2 Tools that allow fast and reliable taxonomic identification are therefore of high value for both pure and applied ecology research programs. Macroinvertebrates are a key group of freshwater ecosystems, being a fundamental link in the food web between organic matter resources and higher order consumers.3 By convention, macroinvertebrates are defined as the invertebrate fauna that is retained by a 500 µm mesh.3 Since macroinvertebrates generally have life cycles of a year or more, they are exposed to eventual pollutants over long periods and integrate the effects of short-term episodes, thus being suitable as indicators of the average water quality.4 Biotic indices of ecosystem status based on communities of insects and other taxa have been developed, but their use is sometimes limited by taxonomic identification, which requires trained staff. Taxonomic identification of macroinvertebrates has traditionally been achieved through examination of samples under a stereoscopic microscope, an activity that requires high technical expertise and a considerable amount of time. Automating the taxonomic classification of macroinvertebrates would therefore represent a major advance for their study, considerably reducing processing times and allowing the task to be carried out by non-expert users. To the best of our knowledge, however, an automated identification system for macroinvertebrates has not been developed yet.
Automated taxon identification is an emerging discipline that is becoming more achievable due to increases in processing power and significant reductions in memory and processing costs.5 Automated taxon identification has a profoundly multidisciplinary nature, requiring the collaboration of biologists, mathematicians and computer scientists. The usual procedure followed in the development of an automated identification system involves the collection of raw data from a
number of individuals of different taxa, and the processing of the raw data into a form suitable for training a pattern recognition algorithm. While in principle any kind of raw data might be used, the majority of published works on this topic use digital images as the source of information and take advantage of image processing techniques. In recent years, several works have used this approach to achieve automated classification in diverse data sets.2,6–20 In the present paper, we present a novel automatic image processing program developed with MATLAB® that was trained to perform the taxonomic identification of the 10 most commonly occurring macroinvertebrate taxa from a stream in Argentina. The program works in a completely automated fashion once it has been trained, with no user intervention. A set of morphological and texture parameters is calculated through image analysis from individual digital images of specimens. Classification is then performed, based on the calculated parameters, by a hierarchical set of partitioned artificial neural networks (ANNs).
2. Methods
2.1. Macroinvertebrates image set
2.1.1. Specimens collection
Macroinvertebrate specimens were collected at Las Flores stream (34°27′30.62″S, 59°3′14.97″W), a tributary of the Luján River, Argentina, in January 2008. Given that most macroinvertebrate species in this stream are associated with macrophytes,21 samples of the most abundant macrophytes, Ceratophyllum demersum and Egeria densa, were taken using a hand net with 500 µm mesh size. Macroinvertebrates were isolated from the plants in the laboratory. Specimens were identified under a stereoscopic microscope.
2.1.2. Image acquisition
A set of images of individuals from the most abundant macroinvertebrate taxa present in the field samples was acquired with the aid of a flatbed scanner (Epson Perfection V100).
Fig. 1. Macroinvertebrate taxa included in this study. Insects: (a) Zygoptera (Odonata), (b) Ephemeroptera, (c) Trichoptera, (d) Chironomidae (Diptera), and (e) Elmidae (Coleoptera). Crustaceans: (f) Hyalella curvispina (Amphipoda), and (g) Hyalella pseudoazteca (Amphipoda). Molluscs: (h) Hydrobiidae (Gastropoda), (i) Ampullaridae (Gastropoda), (j) Ancylidae (Gastropoda).
A total of 10 categories of differing taxonomic rank were included in this study, comprising insect (5), crustacean (2) and mollusk (3) taxa, which are detailed in Figure 1. The complete image set consisted of 1042 images, with the number of images of each taxon ranging from 53 to 357 (Figure 2). Digital images of identified individuals were obtained with the aid of a flatbed scanner. Specimens were placed in a glass container with a 2 mm plane
glass at the bottom; this procedure allowed us to take images of individuals submerged in water. A non-reflective white plastic sheet was used to generate a uniform background in all images. Images had 24-bit color depth and 2400 dpi resolution.
Fig. 2. Composition of the macroinvertebrate image set.
2.2. Image analysis program
The image analysis program was developed in MATLAB® (The Mathworks, Inc.). The program has 3 main modules, which are described in detail below: 1) segmentation, which consists of background identification and object individualization; 2) feature extraction, where 65 parameters related to texture, morphology and color are calculated for each individualized object; and 3) classification, achieved using a neural network analysis.
2.2.1. Segmentation
The segmentation of a digital image is the procedure by which each pixel of the image is assigned either to the background or to the object of interest, producing a binary image as a result.22 Segmentation was achieved by a simple yet very effective pixel-based method. Raw images were transformed from the RGB color space to the HSI (hue-saturation-intensity) color space. A preliminary segmentation was done based only on the saturation layer, using a conservative threshold that identified pixels belonging to the specimen with high confidence, at the cost of rejecting many pixels of the object of interest, especially in border areas. Morphological operations, including morphological closing and opening, were applied to the resulting binary image to improve the quality of the segmentation.22 The preliminary segmentation was then used to establish the area of the image where the specimen is present. The saturation values of the rest of the image, which contains nothing but background pixels, are used to calculate the threshold value for the final segmentation. The threshold is computed as the value greater than 99% of the saturation values of the background pixels. This segmentation method overcame the uncontrolled differences between the backgrounds of different images, which prevented the use of a single threshold value for all images. The segmentation preserved details of the specimens' body structure that are important for classification (e.g. antennae). The main steps of the segmentation procedure are summarized in Figure 3.
Fig. 3. Outline of the segmentation process: (a) original image, (b) saturation channel after transforming to HSI color space, (c) preliminary segmentation, (d) background area used to calculate optimum threshold, (e) segmentation using optimum threshold, and (f) segmented image used in subsequent feature extraction process.
2.2.2. Feature extraction
The second module of the image analysis program calculates a set of parameters based on the segmented image of specimens. The obtained parameters can be classified as morphological, texture or color descriptors. Built-in MATLAB® functions were used when possible; otherwise, references are indicated. Morphological parameters describe the shape of the specimens, regardless of the color and texture of the image, and are therefore based on the binary segmented image only.23 One main feature of morphological descriptors is whether they are invariant with respect to scale,
rotation, translation and mirroring.22 In this work we used only descriptors that are invariant to rotation, translation and mirroring, given that the orientation and location of specimens carried no information. Mirroring invariance is a desirable feature when dealing with species where images are taken from one side of the body, such as amphipods (Figure 1f-g). Scale-invariant shape descriptors are useful because they describe a shape regardless of its size, an important feature when dealing with living organisms. Descriptors that are not scale-invariant vary with the size of specimens, but they offer valuable information since the size of specimens is a very important feature. We therefore used a mixture of scale-invariant and non-invariant morphological descriptors. Scale-invariant morphological descriptors included circularity22 and fractal dimension estimated by the box-counting method.24 The seven Hu invariant moments were calculated as described in Gonzalez et al. (2004). Hu's invariant moments are a normalization of the unscaled central moments, and consist of seven compound spatial moments that are invariant to translation, rotation and scale change.23 The magnitudes of the first 30 Fourier descriptors of the boundary of the shape of specimens were calculated.25 Fourier descriptors are powerful morphological descriptors that capture information at different frequencies of the boundary of the shape.24 Discarding the high-frequency terms of the Fourier expansion of the boundary yields shape descriptors that are robust to segmentation imperfections; the first terms of the Fourier expansion contain lower
frequency details, and a small number of terms usually preserves most of the information about the shape. Taking the modulus of the Fourier descriptors makes them rotation-invariant, while normalizing by the first term is necessary to achieve scale invariance.22 Area, perimeter and equivalent diameter were also calculated, these being the only descriptors that are not scale-invariant. Texture descriptors vary with the distribution of digital levels in the images. Statistical moments, namely standard deviation, skewness and kurtosis, were calculated for each color channel of the segmented images and for the gray image as well. The entropy of the gray and color channels was also calculated. Color descriptors contain information regarding the absolute and relative values of the specimens' colors. The mean of each color channel and the color ratios relative to the green channel were calculated.
2.2.3. Classification by artificial neural networks
The last program module consisted of an artificial neural network analysis. We used a hierarchical set of partitioned ANNs as classifiers, with the 65 calculated parameters used as inputs. Partitioned networks comprise a set of individual networks, each of which is trained to discriminate between two groups: a class of interest and a background of all other classes.26 This approach proved to be much more efficient than the alternative of training one ANN with as many outputs as categories, which incurred a much higher error rate in preliminary trials. The hierarchical structure of the ANN classification procedure works by first identifying aggregated taxa, which are composed of morphologically similar taxa. After the first order classification is completed, specimens that were classified as belonging to a taxon that is an aggregation of lower known taxa are subjected to a second order classification.
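The control flow of this partitioned, two-level scheme can be sketched in a few lines. The sketch below is an illustration only, not the MATLAB program used in this work: each trained network is replaced by a hypothetical scoring function returning a value near 1 or near 0, the rule for resolving several positive outputs (take the highest) is an assumption, and so is the fallback to the aggregated label when no second order network fires.

```python
from typing import Callable, Dict, Optional, Sequence

# Stand-in for one trained binary network: features -> output near 0 or 1.
Scorer = Callable[[Sequence[float]], float]

THRESHOLD = 0.5  # decision threshold between "belongs" and "does not belong"


def classify_partitioned(features: Sequence[float],
                         scorers: Dict[str, Scorer]) -> Optional[str]:
    """Run one binary scorer per class; return the best class if it passes the threshold."""
    outputs = {name: scorer(features) for name, scorer in scorers.items()}
    best = max(outputs, key=outputs.get)  # assumption: highest output wins
    return best if outputs[best] >= THRESHOLD else None


def classify_hierarchical(features: Sequence[float],
                          first_order: Dict[str, Scorer],
                          second_order: Dict[str, Dict[str, Scorer]]) -> Optional[str]:
    """First order classification, then refinement inside aggregated taxa."""
    label = classify_partitioned(features, first_order)
    if label in second_order:
        refined = classify_partitioned(features, second_order[label])
        return refined if refined is not None else label  # assumption: fall back
    return label


# Hypothetical scorers on a two-element feature vector, for demonstration only.
first = {
    "Trichoptera": lambda x: 0.9 if x[0] > 5 else 0.1,
    "Hyalella sp.": lambda x: 0.8 if x[0] <= 5 else 0.2,
}
second = {
    "Hyalella sp.": {
        "H. curvispina": lambda x: 0.7 if x[1] > 0 else 0.3,
        "H. pseudoazteca": lambda x: 0.7 if x[1] <= 0 else 0.3,
    }
}

print(classify_hierarchical([2.0, 1.0], first, second))
print(classify_hierarchical([7.0, 0.0], first, second))
```

Because every class has its own scorer, adding a class amounts to adding one entry to the dictionary, which is the property that makes the partitioned design convenient.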
This hierarchical design gave much better preliminary results than the ordinary non-hierarchical alternative in the recognition of closely related taxa. The neural network architecture used in this work was a feed-forward ANN, i.e. a multilayer perceptron (MLP), with one hidden layer containing a number of nodes that varied according to the recognized taxon. Networks with two or more hidden layers were tested but showed no appreciable advantage with the data set used in this study. The number of nodes in the hidden layer was initially set to 10 based on preliminary results, and then adjusted to optimize the classification performance. A hyperbolic tangent sigmoid transfer function was used in the hidden layer, whereas a linear transfer function was used in the output layer; both are differentiable, which is a requirement for using backpropagation optimization. The output of each ANN was a value close to 1 when the specimen was classified as belonging to the specific taxon, or close to 0 if not, and a value of 0.5 was used as the threshold between the two.
2.2.4. Training and estimation of classification error
The Levenberg-Marquardt back-propagation optimization algorithm was used as the training function, with the gradient descent algorithm as the learning function and the mean squared error as the performance function. A stratified 10-fold cross-validation was used to estimate the classification error. This procedure consists of randomly dividing the whole data set into 10 equally sized portions, but in a stratified manner: cases of each category are randomly assigned to each portion separately from other categories. The classification error is estimated as the mean of the classification errors obtained when using 9 portions as the training set and the remaining portion as the test set, which yields 10 different error values. For each of the 10 training and test sets thus defined, 20 training trials were performed with different, randomly assigned initial values. This is necessary because the results of training depend on the initial values of the NN parameters.27 In order to improve the generalization of trained NNs, each training set was additionally split in a stratified manner: 15% of the training set was used for early stopping of training, while 20% was used to calculate a classification error that allows the generalization of the trained NNs to be estimated, leaving 65% for fitting the network itself. From the 20 NNs resulting from training with different initial values, we selected the one with the minimum mean classification error, considering both the 20% of cases not used in training and the 80% (65+15) used in training the NNs. In preliminary trials, this criterion yielded a smaller classification error on the test set for the selected NNs than other possible criteria (e.g. selecting the NN with minimum classification error on the cases used in training or on the 20% of cases not used in training).
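The stratified partition at the heart of this procedure can be illustrated with a short sketch. It is not the code used in this work: the function names, the round-robin dealing scheme and the fixed random seed are assumptions made for the example, and the inner 65/15/20 split and the 20 restarts per fold are left out for brevity. The two class sizes in the usage example echo the smallest and largest taxa of the image set; the label names are placeholders.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of sample indices; each class is shuffled and dealt out separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped, to balance fold sizes
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[(offset + position) % k].append(index)
        offset += len(members)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Placeholder labels with 53 and 357 cases.
labels = ["taxon_a"] * 53 + ["taxon_b"] * 357
folds = stratified_folds(labels, k=10, seed=1)
splits = list(train_test_splits(folds))

for fold in folds:
    n_a = sum(1 for idx in fold if labels[idx] == "taxon_a")
    print(len(fold), n_a)  # fold size and number of taxon_a cases in it
```

Each case appears in exactly one test set, and every fold contains each class in roughly the proportion found in the whole data set, which is what keeps the ten error estimates comparable.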
The classification performance of the program was assessed through several parameters calculated by 10-fold stratified cross-validation. The global classification performance is the percentage of specimens correctly classified. The recognition percentage is the ratio between the number of specimens correctly classified as belonging to their taxon and the total number of specimens of that taxon. The omission error is the percentage of specimens of a taxon that were not identified as belonging to that taxon (false negatives). The commission error of a taxon is the percentage of specimens that were incorrectly assigned to the taxon (false positives). The misclassification error rate for a certain taxon is the ratio between the number of specimens that were incorrectly assigned to
that taxon (false positives) and the total number of specimens of all the other taxa. The overall recognition percentage of a particular taxon is the recognition percentage multiplied by (100 − misclassification error) and then divided by 100.
3. Results
The developed program had a global classification performance of 87.83% in the recognition of the 10 taxa present in the image set. Errors by omission and commission, which represent the false negatives and false positives respectively, were of the same magnitude for all taxa (Table 1). While the absolute numbers of these two types of errors were similar, misclassification error rates were negligible, with values in the range 0.05-1.25% (Table 1). The number of nodes in the hidden layer for optimum performance was in the range 3-15. The overall recognition percentage was > 74% for all taxa, with most values in the 80-90% range (Table 1). The lowest values were obtained for Ephemeroptera and the two Hyalella species. The highest recognition percentage was 99.48%, obtained for Trichoptera.
Table 1. Classification performance parameters (%).
Taxon            Recognition  Omission  Commission  Misclassification  Overall
Ampullaridae     83.50        10.68     5.83        0.62               83.49
Ancylidae        85.34        6.81      7.85        0.82               85.34
Chironomidae     81.85        8.88      9.26        0.96               81.85
Elmidae          88.86        6.63      4.51        0.46               88.86
Ephemeroptera    74.38        13.22     12.40       1.25               74.38
H. curvispina    75.77        12.59     11.64       1.21               75.76
H. pseudoazteca  75.22        13.39     11.40       1.18               75.22
Hydrobiidae      85.07        6.38      8.55        0.91               85.07
Trichoptera      99.48        0.14      0.38        0.05               99.48
Zygoptera        83.84        8.08      8.08        0.82               83.84
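The performance parameters defined in Section 2.2.4 can be computed from a confusion matrix. The following sketch follows the definitions as worded in the text and is not the code used to produce Table 1; the counts are invented for illustration, the taxon names are placeholders, and the commission error is left out because the text does not state its denominator.

```python
def per_taxon_metrics(confusion, taxon):
    """Percentages for one taxon; confusion[true_label][predicted_label] = count."""
    others = [name for name in confusion if name != taxon]
    true_pos = confusion[taxon].get(taxon, 0)
    n_taxon = sum(confusion[taxon].values())
    false_neg = n_taxon - true_pos
    false_pos = sum(confusion[name].get(taxon, 0) for name in others)
    n_others = sum(sum(confusion[name].values()) for name in others)

    recognition = 100.0 * true_pos / n_taxon
    omission = 100.0 * false_neg / n_taxon
    misclassification = 100.0 * false_pos / n_others
    overall = recognition * (100.0 - misclassification) / 100.0
    return {
        "recognition": recognition,
        "omission": omission,
        "misclassification": misclassification,
        "overall": overall,
    }


def global_performance(confusion):
    """Percentage of all specimens that were correctly classified."""
    correct = sum(row.get(name, 0) for name, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return 100.0 * correct / total


# Invented counts for two placeholder taxa; not data from this study.
toy = {
    "taxon_a": {"taxon_a": 90, "taxon_b": 10},
    "taxon_b": {"taxon_a": 5, "taxon_b": 195},
}
metrics_a = per_taxon_metrics(toy, "taxon_a")
global_pct = global_performance(toy)
print(metrics_a)
print(global_pct)
```

In this toy case, 5 of the 200 specimens of the other taxon are wrongly assigned to taxon_a, so its misclassification rate is 2.5% and its overall recognition is 90 × 97.5 / 100 = 87.75%.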
The classification procedure had, as previously stated, a hierarchical structure. A first order classification distinguished among 9 taxa, with H. curvispina and H. pseudoazteca specimens constituting the only aggregated taxon in this study, Hyalella sp., with n=154. The global performance of this first classification step was 89.84%, with a recognition percentage of 89.13% for Hyalella sp. The second order classification distinguished between the two Hyalella species and had a global performance of 83.92%, with similar recognition percentages for both species (84.52% for H. curvispina and 83.33% for H. pseudoazteca).
4. Discussion
This work is, to our knowledge, the first on the automatic recognition of freshwater macroinvertebrates. A global recognition performance of 87.83% was achieved in the classification of the 10 most commonly occurring macroinvertebrate taxa in summer samples from a stream in Argentina. Once it has been trained, the developed program does not require the intervention of the user at any time, and is thus completely automatic. This represents an advantage over many works on the topic found in the literature, which require user intervention for tasks such as segmentation or selection of the area of interest (e.g.10). The classification performance varied with taxa (Table 1). The highest recognition value (99.48%), and the only one >90%, was obtained for the taxon with the highest number of images (Trichoptera, n=357). This suggests that classification performance depends on the size of the image set, as expected. The similar number of images in the other taxa (between 50 and 100, see Figure 2) precluded a traditional regression analysis, but sub-sampling techniques might be used to test and estimate this effect. Recognition performance had its lowest values, 74-75%, for Ephemeroptera and for the two species of the same genus, H. curvispina and H. pseudoazteca. The low value in the case of Ephemeroptera might indicate a poor characterization of the taxon due to the low number of images in the database (n=53) and a high variability among images. On the other hand, the first order identification of Hyalella sp. and the second order distinction between the two species had similar recognition performances when considered separately (89.92% and 83-84% respectively). Consequently, the low global recognition values obtained for the Hyalella species are the product of two consecutive classifications, given the hierarchical nature of the classification procedure.
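The compounding effect can be made explicit with a small calculation. The sketch below assumes that errors at the two levels are independent, so that the end-to-end recognition is simply the product of the two stage-wise percentages; this independence is an assumption of the example, not a result of the study. The input values are the first order recognition of Hyalella sp. and the second order recognition of each species as reported in Section 3.

```python
def end_to_end_recognition(first_stage_pct, second_stage_pct):
    """Combined recognition (%) of two consecutive classification steps."""
    return first_stage_pct * second_stage_pct / 100.0


first_stage = 89.13  # recognition of Hyalella sp. in the first order classification
second_stage = {"H. curvispina": 84.52, "H. pseudoazteca": 83.33}

combined = {
    species: round(end_to_end_recognition(first_stage, pct), 2)
    for species, pct in second_stage.items()
}
print(combined)
```

The products, about 75.3% and 74.3%, are close to, though not identical with, the recognition values listed for the two species in Table 1.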
It seems reasonable to expect that, in general, there will be such a tradeoff between taxonomic accuracy and recognition performance in any hierarchical identification system. A hierarchical classification structure is used by the developed program. Traditional taxonomic identification has been hierarchical by definition since modern taxonomy was first established by Carl von Linné in the 18th century. Contrary to what might be expected from this, most of the automated taxon identification systems developed so far are not hierarchical (e.g.2,8,18,19). In this work, with the particular macroinvertebrate taxa involved, the hierarchical approach was much more effective than its alternative in preliminary results, but we only tested it on the case of the two Hyalella species. The improvement of classification as a result of a hierarchical approach should be tested with a much higher
number of taxa in further work. A hierarchical classification structure might have an advantage of a very different nature: it might feel more natural to ecologists, given its similarity to traditional taxonomic identification. Partitioned artificial neural networks (ANNs) were used as classifiers in this work. The use of partitioned ANNs was dictated by preliminary results, in which their performance was much higher than that of the alternative of a single network with as many outputs as taxa to be classified. A major advantage of using partitioned ANNs for taxon identification is that new taxa can be rapidly incorporated into the identification scheme without the need for complete retraining.26 Thus, increasing the number of recognized taxa can be accomplished in a rather simple way, by retraining only those networks that incur a high commission rate due to the inclusion of the new taxa. Multilayer perceptron (MLP) neural networks with one hidden layer were used as classifiers. Several types of ANNs have been used in automated taxon identification with varying results. Ginoris et al.10 used MLPs in the identification of the main protozoan and metazoan species present in the activated sludge of wastewater treatment plants, with classification performance of 51.4% - 85.6% for stalked and non-stalked organisms respectively. Russell et al.28 also used MLPs in the SPIDA system with satisfactory results, but in this case the results are rather difficult to compare because of the methodology used to estimate the classification error, as happens with other works in the field.17 While MLPs yielded highly satisfactory results in the present work, in further development we intend to test whether other types of ANNs might improve classification performance. Automatic taxon identification presents several challenges that are particular to this pattern recognition task. Several factors will affect the variability of the input data.
The naturally existing variability among individuals of the same taxon is an obvious and inevitable source of variability.1 Factors other than natural variability introduce noise and thus might diminish classification performance. In this study we identified, in particular, one factor related to the image acquisition process. The portion of the body of a specimen that is registered on an image varies according to the physical position of the specimen when the image is acquired, which was not fixed by the user. The extent of this factor depends on the body shape of the taxon. If there is a natural tendency to lie on a lateral side, as occurs with amphipods (see Figure 1), a lateral view will be registered on the image, reducing this source of variability in these taxa. Therefore, this factor might explain differences in the recognition performance among taxa. The automation of taxonomic identification has several advantages over traditional methods. The time involved by the
user of an automated taxon identification system is typically a small fraction of the time consumed in identification by traditional methods (e.g.19). Another main advantage is that body size measurements could be easily implemented, thus obtaining the body size of identified specimens in a fraction of the time taken by traditional means. Among the applications of a system capable of automatic recognition of freshwater macroinvertebrates, such as the one presented in this study, are the study of macroinvertebrate communities through time and the monitoring of freshwater bodies through the composition of their macroinvertebrate communities. In the face of the multiple advantages of automatic taxon identification, one question arises regarding its actual use in the scientific community: why have these systems seldom been developed and used? Gaston and O'Neill (2004) explored this issue, addressing the notions that automatic taxonomic identification is too difficult, too threatening, too different or too costly. Our experience indicates that collaboration between different disciplines is a demanding but indispensable task. This might be, in our opinion, one of the big challenges that hinder the development and popularization of automatic taxon identification systems. We agree with Gaston and O'Neill (2004) that, besides collaborations between biologists and computer scientists, individuals with a background in both biology and computer science are required.
Conclusions
In this work we presented the first report of an automatic taxon identification system for freshwater macroinvertebrates using digital images. Automatic recognition of freshwater macroinvertebrates was achieved by a novel image analysis program developed with MATLAB®.
The scheme of image processing techniques and hierarchically partitioned ANN analysis that we followed proved to be effective for this particular pattern recognition challenge, yielding a global classification performance of 87.83% and being able to distinguish between species of the same genus. The developed program could be improved in several ways. As previously stated, the use of different types of ANNs might improve classification performance. Although the applications of the developed automated taxon identification program are already multiple, as presented in this work, a segmentation process that allows specimens to be identified in images with multiple individuals will undoubtedly expand its potential applications. We plan to address both of these issues in further work. Since the developed program at no point incorporates information regarding the nature of the specimens in the images, it seems reasonable that the image analysis program could be trained to recognize other taxa with
successful results. In future work we intend to test the generality and performance of the program with a higher number of taxa and on other groups of organisms.

Acknowledgments

The authors are grateful to Gonzalo Paz and family for facilitating the collection of field samples, and to Anala Bardelas for assistance in acquiring the digital images. During this work SRD was supported by a fellowship from CONICET (National Council of Scientific and Technical Research, Argentina). Partial funding for this work was provided by the PIP 6124 project supported by CONICET, and by the Fondo Semilla 2007 project “Efectos de la eutroficación sobre la dinámica de poblaciones de organismos de agua dulce” supported by UNGS (Universidad Nacional de General Sarmiento).

References

1. Gaston, K.J. and M.A. O’Neill, Philosophical Transactions of the Royal Society B: Biological Sciences 359, (2004).
2. Mayo, M. and A.T. Watson, Knowledge-Based Systems 20, (2007).
3. Hauer, F.R. and V.H. Resh, Macroinvertebrates, in Methods in stream ecology, F.R. Hauer and G.A. Lamberti, Editors. 2007, Academic Press: Burlington.
4. Lampert, W. and U. Sommer, Limnoecology. 2 ed. 2007, Oxford: Oxford University Press. 324.
5. MacLeod, N., ed. Automated taxon identification in systematics: theory, approaches and applications. 2008, CRC Press: Boca Raton, FL. 339.
6. Weeks, P.J.D., et al., Image and Vision Computing 17, (1999).
7. Gottwald, S., C.U. Germeier, and W. Ruhmann, Mycological Research 105, (2001).
8. Thiel, S.U., R.J. Wiltshire, and L.J. Davies, Water Research 29, (1995).
9. Beaufort, L. and D. Dollfus, Marine Micropaleontology 51, (2004).
10. Ginoris, Y.P., et al., Water Research 41, (2007).
11. Castanon, C.A.B., et al., Pattern Recognition 40, (2007).
12. Tang, X., et al., Artificial Intelligence Review 12, (1998).
13. Dogantekin, E., et al., Expert Systems with Applications 35, (2008).
14. Balfoort, H.W., et al., J. Plankton Res. 14, (1992).
15. Carr, M.R., G.A. Tarran, and P.H. Burkill, J. Plankton Res. 18, (1996).
16. Dörge, T., J.M. Carstensen, and J.C. Frisvad, Journal of Microbiological Methods 41, (2000).
17. Embleton, K.V., C.E. Gibson, and S.I. Heaney, J. Plankton Res. 25, (2003).
18. Ginoris, Y.P., et al., Analytica Chimica Acta 595, (2007).
19. Grosjean, P., et al., ICES J. Mar. Sci. 61, (2004).
20. Neto, J.C., et al., Computers and Electronics in Agriculture 50, (2006).
21. Giorgi, A., C. Feijo, and G. Tell, Biodiversity and Conservation 14, (2005).
22. Jähne, B., Digital image processing. 5 ed. 2002, Berlin: Springer-Verlag. 575.
23. Pratt, W.K., Digital image processing. 2007, Hoboken, New Jersey: John Wiley & Sons, Inc. 782.
24. Russ, J.C., The image processing handbook. 5 ed. 2006, Boca Raton, FL: CRC Press. 817.
25. Gonzalez, R.C., R.E. Woods, and S.L. Eddins, Digital image processing using Matlab. 2004: Prentice Hall. 624.
26. Morris, C.W., A. Autret, and L. Boddy, Ecological Modelling 146, (2001).
27. Camastra, F. and A. Vinciarelli, Machine learning for audio, image and video analysis: theory and applications. Advanced Information and Knowledge Processing, ed. L. Jain and X. Wu. 2008, London: Springer-Verlag. 494.
28. Russell, K.N., et al., Introducing SPIDA-Web: Wavelets, Neural Networks and Internet Accessibility in an Image-Based Automated Identification System, in Automated taxon identification in systematics: theory, approaches and applications, N. MacLeod, Editor. 2008, CRC Press: Boca Raton. pp. 131-150.
A STUDY OF HYDROPHOBIC EFFECT ON THE PROTEIN FOLDING USING MONTE CARLO

L. P. B. SCOTT†, H. R. A. da SILVA, M. H. ARAUJO
CMCC - Universidade Federal do ABC
CEP: 09090-400 - Santo André SP, Brasil
†E-mail: [email protected]

In this work, the HP model and the Monte Carlo method are used to study the hydrophobic effect on the protein folding problem. We used two lattice models (square and cubic) and several chains with distinct proportions of hydrophobic residues. We investigate how the number of hydrophobic residues in a chain influences its folding. For each simulation, we measure three parameters: energy, end-to-end distance and radius of gyration. The geometry of the final chains was also analyzed. The simulations show that the proportion of hydrophobic residues in the chain is very important for the folding. Further simulations have shown that the position of these residues in the chain is also important.

Keywords: Monte Carlo, protein folding, lattice model.
1. Introduction

The information for life is stored in the genes (DNA) using a four-letter alphabet. Proteins are, among others, the macromolecules that perform all the important tasks in organisms, such as catalysis of biochemical reactions, transport and recognition.1,2 The three-dimensional structure (tertiary structure) of a protein determines its function.3–6 This fact has been described as the determination of the second genetic code.7,8 The practical uses of this knowledge in a vast range of fields, such as biotechnology and the pharmaceutical sciences, have produced a variety of computational methods, along with significant improvement of the potentials to be minimized in order to reach the structure of proteins and to learn more about the protein folding problem.7–16 Potentials derived from ab initio principles6,17–19 or statistical potentials20,21 based on structural databases
have been used in simulations, which are performed through a variety of methods such as molecular dynamics, Monte Carlo simulations, genetic algorithms, neural networks and simulated annealing, to predict the secondary and tertiary structure of proteins and to optimize the conformation of macromolecules.22–29 The work reported in this paper presents the use of Monte Carlo and the HP model30–32 to study the effect of hydrophobic residues on the folding of chains. The next section presents a brief report on the protein folding problem and the HP model. Section 3 discusses the methodology and results. Section 4 presents the conclusions and future work.

2. Protein Folding and Monte Carlo

Many theories have been advanced to elucidate the folding mechanism of polypeptide chains. Although the “folding funnel” concept remains an important contribution to the understanding of how a unique stable structure may be attained in a physiological time, the methods developed until now are, in the majority of cases, not yet successful in finding such a structure by computational means without using any experimental indications. No reliable computational methods presently exist for determining the structure of proteins for which no homologous structure with a sequence identity larger than 30% exists.33,34 The main difficulties remain the high dimensionality of the potential energy surface and its ruggedness. Many methods have been developed to explore such a surface by artificially modifying it in order to overcome the energy barriers. Another major difficulty is to discriminate energetically between the native structure and all non-native structures. Most of the potential functions developed until now are not optimal for such a discrimination.33,34

Kamphausen et al. proposed a new genetic algorithm tailored to meet the demands of de novo drug design. The efficiency of the design algorithm was demonstrated in several different applications. First, RNA molecules were optimized with respect to folding energy. Second, a spin glass was optimized as a model system for the optimization of multi-letter alphabet biopolymers such as peptides. Finally, the feasibility of the computer-assisted molecular design approach was demonstrated for the de novo construction of peptidic thrombin inhibitors using an iterative process of 4 design cycles of computer-guided optimization.35 Bayley et al. presented the program GENFOLD, a genetic algorithm that calculates protein structures using restraints obtained from NMR, such as distances derived from nuclear Overhauser effects and dihedral angles derived from coupling constants. The program was tested on three proteins:
the POU domain (a small three-helix DNA-binding protein), bovine pancreatic trypsin inhibitor (BPTI), and the starch-binding domain from Aspergillus niger glucoamylase I, a 108-residue β-sheet protein.36 Pendersen & Moult used genetic algorithms to perform ab initio folding simulations based on a method that operates on an all-atom representation. The method was tested in CASP1.37

Much work on hydrophobicity has been done in an attempt to answer the following questions: Do compact conformations due to hydrophobic collapse help protein folding?31 Which scenario would proteins choose in order to fold faster: a fast nonspecific collapse followed by a slow rearrangement to reach the native state, or a specific collapse with simultaneous formation of the native state? While studies have shown that some proteins undergo a burst hydrophobic collapse followed by their folding, there is experimental evidence that some proteins collapse concomitantly with the formation of their native structure.31 Several authors have used Monte Carlo methods to study the folding problem and hydrophobicity.31,32

The use of MC methods to model physical problems allows us to examine more complex systems than we otherwise could. Solving the equations that describe the interactions between two atoms is fairly simple; solving the same equations for hundreds or thousands of atoms is impractical. With MC methods, a large system can be sampled in a number of random configurations, and that data can be used to describe the system as a whole.
Statistical mechanics and Monte Carlo methods have been used to investigate aspects of the folding problem by emphasizing the universality of folding scenarios over the uniqueness of folding pathways for each protein.31,38 One of the most popular models is the so-called HP model, where the hydrophobic interactions between the amino acids are considered to be the main force in the folding process.30 In this work, we have adopted the 2D square lattice HP bead model, where the H and P beads are constrained to lie on a two-dimensional square lattice and interactions occur only between nonbonded beads that lie adjacent to each other on the lattice but are not adjacent in the sequence. The values of the H-H, H-P and P-P interactions ($\epsilon_{ij}$) in the standard HP model are $\epsilon_{HH} = 1.0$, $\epsilon_{HP} = 0.0$ and $\epsilon_{PP} = 0.0$.

3. Methodology and Results

We used the Java language to implement the simulation of the chains with the Monte Carlo method and the HP model. In this work, several chain configurations were generated randomly with a fixed length of 28 monomers. The simulations were carried out on 2D (two-dimensional) and 3D (three-dimensional) lattice models, varying the number of hydrophobic residues and their positions on the chain. We simulated several chains with different numbers of hydrophobic residues: 28, 25, 19, 16, 13, 8 and 4. In this paper we present the results with 25, 16 and 8 hydrophobic monomers for the 2D lattice and the results with 22 and 21 hydrophobic monomers for the 3D lattice. The Monte Carlo procedure implemented three possible movements (Figure 1). For each simulation, the program calculated three parameters of the chain: the potential energy, the end-to-end distance and the radius of gyration. The geometry of the chain with the lowest energy was analyzed to study how it folded. The condition for accepting a movement (new conformation) is the Metropolis test. The new conformation is accepted if its energy is lower than that of the current conformation; otherwise it is accepted with the probability given in Equation 1, where $G_2 - G_1$ is the average free energy difference between the new conformation (index 2) and the current conformation (index 1):

$$P = e^{-(G_2 - G_1)/kT} \qquad (1)$$

Fig. 1. Movement implemented by Monte Carlo Method.
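The HP contact energy and the Metropolis test described above can be illustrated with a minimal Python sketch. This is not the authors' Java implementation: the function names, the tiny four-bead example and the sign convention (each H-H contact lowers the energy by $\epsilon_{HH}$) are assumptions made here for illustration only.

```python
import math
import random

# Interaction strengths quoted in the text. The sign convention used below
# (each contact LOWERS the energy by its strength) is an assumption.
EPS = {("H", "H"): 1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}


def hp_energy(sequence, coords):
    """Contact energy of a chain on the square lattice.

    A contact is a pair of beads on neighbouring lattice sites that are
    not consecutive in the sequence.
    """
    energy = 0.0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip bonded neighbours
            dx = abs(coords[i][0] - coords[j][0])
            dy = abs(coords[i][1] - coords[j][1])
            if dx + dy == 1:  # adjacent lattice sites
                energy -= EPS[(sequence[i], sequence[j])]
    return energy


def metropolis_accept(e_current, e_new, kT, rng=random):
    """Metropolis test: accept a move that does not raise the energy;
    otherwise accept with probability exp(-(e_new - e_current) / kT)."""
    if e_new <= e_current:
        return True
    return rng.random() < math.exp(-(e_new - e_current) / kT)


# Hypothetical example: four beads folded into a unit square, so that
# beads 0 and 3 are in contact without being bonded.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))  # one H-H contact
print(hp_energy("HPPP", square))  # the only contact is H-P
```

In a full simulation, one of the three moves of Figure 1 would propose a new set of coordinates, and `metropolis_accept` would decide whether to keep it.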
3.1. Simulation in a 2D Lattice

The results of the simulations with 25, 16 and 8 hydrophobic monomers are shown in Figures 2a, 2b, 2c, 3a, 3b, 3c, 4a, 4b and 4c, respectively. In each case, we measured the mean energy, the end-to-end distance and the radius of gyration. Several simulations were made with randomly generated configurations (chains with polar residues in distinct positions). The graphs (Figures 2, 3 and 4) present the results of three simulations for each case (25, 16 and 8 hydrophobic monomers).

3.2. Simulation in a 3D Lattice

The results of the simulations with 22 and 21 hydrophobic monomers are shown in Figures 5 and 6, respectively. In each case, we measured the mean energy, the end-to-end distance and the radius of gyration. Several simulations were made with randomly generated configurations (chains with polar residues in distinct positions). The graphs (Figures 5 and 6) present the results for each case (22 and 21 hydrophobic monomers).

Fig. 2. For a chain with 25 hydrophobic monomers, the curves represent the mean energy (a), the end-to-end distance (b) and the radius of gyration (c). Different simulations were carried out with different starting configurations (hydrophobic monomer positions).

Fig. 3. For a chain with 16 hydrophobic monomers, the curves represent the mean energy (a), the end-to-end distance (b) and the radius of gyration (c). Different simulations were carried out with different starting configurations (hydrophobic monomer positions).
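The two geometric observables tracked in these simulations can be computed as in the following Python sketch. The text does not state the exact definitions used by the authors, so the definitions below are assumptions: the end-to-end distance is taken as the Euclidean distance between the first and last bead, and the radius of gyration as the root-mean-square distance of equal-mass beads from their centroid. The function names are hypothetical.

```python
import math


def end_to_end_distance(coords):
    """Euclidean distance between the first and the last bead."""
    first, last = coords[0], coords[-1]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, last)))


def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid
    (all beads are given the same mass)."""
    n = len(coords)
    centroid = [sum(axis) / n for axis in zip(*coords)]
    mean_sq = sum(
        sum((x - c) ** 2 for x, c in zip(bead, centroid)) for bead in coords
    ) / n
    return math.sqrt(mean_sq)


# The same functions work for 2D and 3D lattice coordinates.
stretched = [(0, 0), (1, 0), (2, 0), (3, 0)]
compact = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(end_to_end_distance(stretched), radius_of_gyration(stretched))
print(end_to_end_distance(compact), radius_of_gyration(compact))
```

A compact conformation has a smaller radius of gyration than a stretched one of the same length, which is why this quantity is a convenient indicator of collapse.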
Fig. 4. For a chain with 8 hydrophobic monomers, the curves represent the mean energy (a), the end-to-end distance (b) and the radius of gyration (c). Different simulations were carried out with different starting configurations (hydrophobic monomer positions).

4. Conclusions and Future Work

Monte Carlo and the HP model have been used to investigate the folding problem and aspects such as hydrophobicity, folding pathways and chain sequence optimization.30,31 In this work a two-dimensional grid, Monte Carlo and the HP model were used to study the influence of the hydrophobic residues on chain folding. It was noted that both the number of hydrophobic residues and their position in the chain are important for the folding. We are currently investigating the use of different potential parameters for the HP model on a three-dimensional grid to simulate the chains, and comparing the results with those of this work.

Fig. 5. For a chain with 22 hydrophobic monomers, the curves represent the mean energy (d), the end-to-end distance (c) and the radius of gyration (a). Panel (b) shows the initial and final conformations. Different simulations were carried out with different starting configurations (hydrophobic monomer positions).
Fig. 6. For a chain with 21 hydrophobic monomers, the curves represent the mean energy (d), the end-to-end distance (c) and the radius of gyration (a). Panel (b) shows the initial and final conformations. Different simulations were carried out with different starting configurations (hydrophobic monomer positions).
Acknowledgments

The authors thank the CNPq. This work was supported by CNPq.

References

1. Baldi P. et al., Exploiting the past and the future in protein secondary structure prediction. Bioinformatics. 15:11: 937-946 (1999).
2. Dill et al., Principles of Protein folding: A perspective from simple exact models. Protein Science. 4:561-602 (1995).
3. Cui Y., Protein Folding Simulation With Genetic Algorithm and Supersecondary Structure Constraints. Proteins: Structure, Functions and Genetics. 31:247-257 (1998).
4. Kim P. S. and Baldwin R. L., Intermediates in the Folding Reactions of Small Proteins. Annu. Rev Biochemistry. V 50:631:660 (1990).
5. Kollman, P. A.; Djam Y. and Lee, M. R., State of the art in studying protein Folding and protein structure prediction using molecular dynamics methods. J. Molecular Graphics and Modeling. Vol 19:I1:146-149 (2001).
6. Desjarlais, J. R. and Handel T. M., Side-chain backbone flexibility in protein core design. Journal of Molecular Biology. vol. 290:I:July:305-318 (1999).
7. Leonhard K. et al., Solvent-amino acid interaction energies in three-dimensional-lattice Monte Carlo simulations of a model 27-mer protein: Folding thermodynamics and Kinetics. Protein Science. May:358-369 (2004).
8. Radja et al., Conservation of Statistical results under the reduction of pair-contact interactions to solvation interaction. Physical Review E. 72:061915 (2005).
9. Peterson, R. W., Improved side-chain prediction accuracy using an ab initio potential energy function and very large rotamer library. Protein Science. 13:735-751 (2004).
10. Wang J., Wang W., Huo S., Lee M. and Kollman A., Solvation Model Based on Weighted Solvent Accessible Surface Area. J. Phys. Chem. B, 105:505-507 (2001).
11. Rost B. and Sander C., Combining Evolutionary Information and Neural Networks to Predict Protein Secondary Structure. PROTEINS: Structure, Functions and Genetics. 19:55-72 (1999).
12. Ponder, J. W.; Richards F. M., Tertiary templates for protein use packing criteria in the enumeration of allowed sequences for different structural classes. J. Molecular Biology, v. 193:775-791 (1987).
13. Pendersen, T. J. and Moult J., Protein Folding Simulations with Genetic Algorithms and a Detailed Molecular Description. Journal of Molecular Biology. 269:240-259 (1997).
14. Belda, I. et al., ENPDA: an evolutionary structure-based de novo peptide design algorithm. Journal of Computer-Aided Molecular Design. 19:585-601 (2005).
15. Amari, S.; Aizawa M.; Zhang, J; Fukuzawa, VISCANA: Visualized Cluster Analysis of Protein-Ligand Interaction Based on the ab initio Fragment Molecular Orbital Method for Virtual Ligand Screening. J. Chem. Inf. Model. 46:221-230 (2006).
16. Barton, G., Protein secondary structure prediction: Current Opinion in Structural Biology. 5:372-276 (1995).
17. Bohr H, Bohr J, Brunak S, Cotterill R M J, Lautrup B, Nørskov L, Olsen O, Petersen S, Protein secondary structure and homology by neural networks. FEBS Letters, 241:223-228 (1998).
18. Bonneau, R. et al., Contact order and ab initio protein structure prediction. Protein Science. 11:1937-1944 (2002).
19. Bottegoni, G.; Cavalli, A. and Racnatinni, M. A, Comparative Study on the Application of Hierarchical-Agglomerative Clustering Approaches to Organize Outputs of Reiterated Docking Runs. J. Chem. Inf. Model. 46:852-862 (2006).
20. Bystroff, C and Baker, D., Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs. Journal Molecular Biology. 281:565-577 (1999).
21. Cai, Y., Li Y. and Chou K. C., Using Neural Networks for prediction of domain structure. Biochimica et Biophysica Acta. 1476:1-2 (2000).
22. Canutescu, A., A graph-theory algorithm for rapid protein side-chain prediction. Protein Science. 12:2001-2004 (2003).
23. Chandonia J. M., Karplus M., New Methods for Accurate Prediction of Protein Secondary Structure. PROTEINS: Structure, Function and Genetics. 35:293-306 (1999).
24. Chen, H et al., On Evaluating Molecular-Docking Methods for Pose Prediction and Enrichment Factors. J. Chem. Inf. Model. 46:401-415 (2006).
25. Compiani M. et al., An entropy criterion to detect minimally frustrated intermediates in native proteins. Proc. Nat. Acad. Sci. USA. vol. 95:9290-9294 (1998).
26. Dandekar T. and Argos P., Folding the Main Chain of Small Proteins with the Genetic Algorithm. Journal of Molecular Biology. 236:844-861 (1993).
27. Fang Q. and Shortle, D., New Fold Methods: Prediction Reports. Protein: Structure, Function and Genetics. v. 53:S6:486-490 (2003).
28. Frishman D, Argos P., Incorporation of non-local interactions in protein secondary structure prediction from amino acid sequence. Protein Engineering. 9(2):133-142 (1996).
29. Jacobson, M., Force field validation using protein side chain prediction. Journal of Ph. Chem. B. 106(44):11673-11680 (2002).
30. Cox A.G, and Johsnton L. R., Analyzing energy landscapes for folding model proteins. Journal of Chemical Physics. vol. 124, (2006).
31. Oliveira C. L, Silva, T.H. R, Leite, V. B. and Chahine, J., Frustration and hydrophobicity interplay in protein folding and protein evolution. Journal of Chemical Physics. vol. 125, (2006).
32. Travasso, R. D. M, Gama, M. T. M. and Fasca, P, F. N., Pathways to folding nucleation events and native geometry. Journal of Chemical Physics. vol. 127, (2007).
33. Scott, L. P. B., Chahine, J. and Ruggiero, J. R., Use of Genetic Algorithms and Solvation Potential to Study Peptides Structure. http://dx.doi.org/10.1016/j.amc.2007.05.002.
34. Irback, A, Peterson, C, Pottast F. and Sandelin E, Monte Carlo procedure for protein design. Physical Review E, Rapid Communications, November, (1998).
35. Kamphausen S., et al. Genetic algorithm for the design of molecules with desired properties. Journal of Computer-Aided Molecular Design, 16, pp. 551-567, (2002).
36. Bayley M.J., Jones G., Willet P. and Williamson M. P., GENFOLD: A genetic algorithm for folding protein structures using NMR restraints. Protein Science, 7, pp. 491-499, (1998).
37. Pendersen T. J. & Moult J., Ab initio protein folding simulations with genetic algorithms: simulations on the complete sequence of small proteins. Proteins, suppl. 1, p. 179-184 (1997).
38. Shakhnovich E., Monte-Carlo Methods in Studies of Protein Folding and Evolution. SpringerLink, Vol. 704, (2006).
EVOLUTION IN A HOST-PARASITE SYSTEM

N. F. BRITTON
Department of Mathematical Sciences and Centre for Mathematical Biology, University of Bath, UK
E-mail: [email protected]

Some organisms employ multiple defence strategies against their enemies, while others fail to employ a defence that seems obvious. We shall investigate three questions for host-parasite systems. (1) Under what circumstances does it pay for a host to employ a given defence strategy against one of its parasites? (2) If alternative strategies are available, how is the appropriate strategy chosen? (3) When is it appropriate to employ multiple defence strategies against an enemy? We shall illustrate our results in two cases of brood parasites and their hosts. The paper by Britton et al. (2007) contains more background details on the basic model and the analysis but the extensions to the model and some of the results are new.
1. Introduction

1.1. General introduction

Flax (Linum usitatissimum) has twenty-six defensive genes conferring resistance to flax rust (Melampsora lini), but each such gene is countered by an attacking gene in the rust (Flor 1956). This situation may have come about through an arms race (Dawkins and Krebs, 1979), a succession of defensive gambits in the flax each countered by the rust, in a process known as gene-for-gene coevolution.

Passion-vines (Passifloraceae) produce toxic compounds as a general defence against herbivores. Heliconius butterfly larvae have overcome these, and the passion-vines employ more specialised defence strategies, such as hooks to immobilise Heliconius larvae and structures that mimic Heliconius eggs, against them (Gilbert 1983). This seems to be another example of an arms race.
Swollen-thorn acacias have a mutualistic relationship with ants; they provide shelter to the ants within their thorns, and in return the ants deter herbivores from eating the acacia (Janzen 1966). Other acacias synthesise toxic cyanogenic glycosides to deter herbivores, but none does both, and so no arms race seems to have taken place in this case (Rehr et al. 1973).

Several vertically transmitted bacterial symbionts provide resistance in pea aphids (Acyrthosiphon pisum) against the parasitoid wasp Aphidius ervi, but are only seen at intermediate frequencies in natural populations. Any one pea aphid very rarely harbours more than one species of bacterial symbiont, so multiple strategies are very rarely employed (Oliver et al. 2003). Again, no arms race has occurred here.

Hedgehogs (small spiny mammals of the subfamily Erinaceinae and the order Erinaceomorpha) have two alternative defence strategies against their predators, to run or to roll up into a ball. These are true alternatives: it is not possible to employ both these strategies at once.

1.2. Rare enemy effect and strategy-blocking

Co-evolutionary arms races seem to occur in some cases but not in others. Dawkins (1982) introduced the concept of the rare-enemy effect, arguing that because there are costs involved in any adaptation, it is not advantageous to develop a defence against a rare enemy. This may explain the lack of an arms race in some cases. In the example of brood parasites that we shall consider later the enemy is not particularly rare, but we shall show that when there are two possible defence strategies that may be deployed by a host against a parasite, each of which is advantageous on its own, an extension of the rare-enemy effect may be used to understand when a combination of the two is advantageous. One strategy may prevent the appearance of the other, a phenomenon we shall call strategy-blocking.

1.3. Brood parasite natural history

A general treatment of this area is given in Davies (2000). Brood-parasitic birds lay their eggs in the nest of another bird, the host; if the parasitism is successful the host raises the parasite offspring to independence. Many brood-parasitic chicks, when they hatch, eject all host eggs or chicks from the nest, so that they are raised alone. The following questions arise.

• Is it evolutionarily advantageous for a host to defend itself against a brood parasite?
• In particular, is it advantageous for the host to develop a strategy for ejecting brood-parasitic eggs from its nest?
• Is it advantageous for the host to develop a strategy for ejecting or deserting brood-parasitic chicks in its nest?

The archetypal Old-World brood parasite is the Eurasian cuckoo (Cuculus canorus), which parasitises several host species. Reed warblers (Acrocephalus scirpaceus) often recognise cuckoo eggs and reject them. In response to this, cuckoos lay eggs that mimic those of the reed warbler. On the other hand reed warblers never recognise cuckoo chicks, and will raise them as their own. The first stage of an arms race has taken place, but not the second. Dunnocks (Prunella modularis) do not even recognise cuckoo eggs, which are quite unlike dunnock eggs. Not even the first stage of the arms race has occurred. In Australia, superb fairy-wrens (Malurus cyaneus) fail to recognise the eggs of their brood parasites, Horsfield’s bronze-cuckoo (Chrysococcyx basalis), but do sometimes desert their nest once the bronze-cuckoo chick has ejected all their offspring. They and reed warblers use alternative rejection strategies.

There is no known example of a host species that rejects both the eggs and the chicks of its brood parasite. A very simple explanation of this fact could be that in no case has sufficient evolutionary time passed for both rejection behaviours to evolve. In this paper we shall ask whether a deeper reason exists.
2. Modelling

2.1. Monomorphic populations

Our model is based on the archetypal Nicholson–Bailey (1935) model for a host-parasitoid system in discrete time:

$$P' = c(1 - f(P))H, \qquad H' = RHf(P),$$

where $P$ and $H$ are the numbers in the parasitoid and host populations, $R$ is the basic reproductive ratio of the host population, $f(P)$ is the fraction of hosts that escape parasitism, $c$ is the mean number of parasitoids from each parasitised host that survive to breed, and there is no survival between generations. An acknowledged problem with this model is a lack of self-limitation, with consequent unlimited oscillatory growth of the populations, and we shall introduce self-limitation into the host population only, for simplicity. To adapt the model for brood parasites we also require survival
between seasons. The equations become

$$P' = (1 - \mu)P + c(1 - f(P))H, \qquad H' = H\Psi(H)\bigl(1 - \nu + Rf(P)\bigr),$$

where $\Psi$ is the self-limitation function, e.g. $\Psi(H) = 1/(1 + H/k)$, and $\mu$ and $\nu$ are the annual probabilities of death of parasites and hosts in the absence of density-dependent effects. Note that the host steady state $H^*$ in the absence of parasitism is given by $H^* = k(R - \nu)$, an increasing function of $k$, which may therefore be thought of as the richness of the environment.

Defence is not taken into account in the models above. Let a fraction $1 - g$ of parasitised hosts successfully defend themselves against parasitism (and hence produce hosts in the next season), leaving a fraction $g$ that fail to do so (and hence produce parasites in the next season). Let this defence be cost-free. The equations become

$$P' = (1 - \mu)P + cg(1 - f(P))H, \qquad H' = H\Psi(H)\bigl(1 - \nu + Rf(P) + R(1 - g)(1 - f(P))\bigr).$$

Defence costs are of two kinds. Parasite-independent costs are incurred whether or not the parasite is present. It costs to have an immune system, whether or not it is ever used to fight off a disease. If a host defends against brood parasites by ejecting eggs that it believes to be parasitic, it will occasionally make a false-positive identification error and eject one of its own eggs, even if no parasite is present. Parasite-dependent costs are only incurred when the parasite is present. One example is the cost of fighting off a microparasitic disease. In the brood parasite case, a strategy of rejecting the parasitic chick is costly if that chick has time to eject host brood before it is rejected. Incorporating the cost of defence into the model for brood parasites, the equations become

$$P' = (1 - \mu)P + cg(1 - f(P))H, \qquad H' = \Psi(H)Hw(P),$$

where the relative fitness function $w$ is given by

$$w(P) = 1 - \nu + R\theta f(P) + (1 - g)R\phi(1 - f(P)).$$

Here $\phi$ and $\theta$ are pay-offs relative to non-defending hosts with and without parasitism, taking into account parasite-dependent and parasite-independent costs.
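To make the season-to-season map with defence costs concrete, here is a minimal Python sketch that iterates it. It assumes the Nicholson–Bailey escape function $f(P) = e^{-aP}$ and the example self-limitation function $\Psi(H) = 1/(1 + H/k)$; the function names and all numerical parameter values are hypothetical and chosen only for illustration.

```python
import math


def step(P, H, *, R, c, g, mu, nu, k, a, theta=1.0, phi=1.0):
    """Advance parasite (P) and host (H) numbers by one season."""
    f = math.exp(-a * P)       # fraction of hosts escaping parasitism (assumed form)
    psi = 1.0 / (1.0 + H / k)  # self-limitation Psi(H), the example form in the text
    # Relative fitness w(P) of the host, including defence pay-offs.
    w = 1.0 - nu + R * theta * f + (1.0 - g) * R * phi * (1.0 - f)
    P_next = (1.0 - mu) * P + c * g * (1.0 - f) * H
    H_next = psi * H * w
    return P_next, H_next


def simulate(P0, H0, seasons, **params):
    """Iterate the map for a number of seasons and return the final state."""
    P, H = P0, H0
    for _ in range(seasons):
        P, H = step(P, H, **params)
    return P, H


# Hypothetical parameter values; with no parasites and no defence costs
# (theta = 1) the host should settle at H* = k (R - nu).
params = dict(R=2.0, c=1.0, g=1.0, mu=0.3, nu=0.5, k=100.0, a=0.01)
P_end, H_end = simulate(0.0, 10.0, 200, **params)
print(P_end, H_end, params["k"] * (params["R"] - params["nu"]))
```

Setting `g=1`, `theta=1` and `phi=1` recovers the model without defence, so the same function can be used to compare defending and non-defending hosts.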
2.2. Dimorphic populations

It is straightforward to generalise the model to two parasite and two host types. The equations become
\[ P_0' = (1-\mu_0)P_0 + c_{00} g_{00} \bar f_{00}(P_0,P_1) H_0 + c_{01} g_{01} \bar f_{01}(P_0,P_1) H_1, \]
\[ P_1' = (1-\mu_1)P_1 + c_{10} g_{10} \bar f_{10}(P_0,P_1) H_0 + c_{11} g_{11} \bar f_{11}(P_0,P_1) H_1, \]
and
\[ H_0' = \Psi_0(H_0,H_1)\,H_0\,w_0(P_0,P_1), \qquad H_1' = \Psi_1(H_0,H_1)\,H_1\,w_1(P_0,P_1), \]
where $w_i(P_0,P_1)$ is the relative fitness of type $i$, and
\[ w_i(P_0,P_1) = 1 - \nu_i + R\theta_i f_i(P_0,P_1) + (1-g_{0i})R\phi_{0i} \bar f_{0i}(P_0,P_1) + (1-g_{1i})R\phi_{1i} \bar f_{1i}(P_0,P_1). \]
Here
\[ \bar f_{ji}(P_0,P_1) = \mathrm{P}\{H_i \text{ is parasitised by } P_j\}, \qquad f_i(P_0,P_1) = \mathrm{P}\{H_i \text{ is not parasitised by } P_0 \text{ or } P_1\}, \]
so $f_i(P_0,P_1) = 1 - \bar f_{0i}(P_0,P_1) - \bar f_{1i}(P_0,P_1)$. The simplest model generalises $f(P) = e^{-aP}$ in Nicholson–Bailey:
\[ \bar f_{ji}(P_0,P_1) = \frac{P_j}{P_0 + P_1}\,\bigl(1 - \exp(-a(P_0 + P_1))\bigr), \qquad f_i(P_0,P_1) = \exp(-a(P_0 + P_1)). \]
The costs of counter-attack may be encoded in the $c_{ij}$ parameters.
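The Nicholson–Bailey generalisation above can be sketched as a small helper that returns the two parasitism probabilities and the escape probability for a host. The function name is hypothetical, and the treatment of the parasite-free case $P_0 + P_1 = 0$ (no parasitism at all) is an assumption added here to avoid division by zero.

```python
import math


def parasitism_probabilities(P0, P1, a):
    """Return (fbar_0, fbar_1, f) for a host facing parasite densities P0 and P1.

    fbar_j is the probability of being parasitised by type j and f the
    probability of escaping both, in the simplest (Nicholson-Bailey) model.
    """
    total = P0 + P1
    escape = math.exp(-a * total)
    if total == 0.0:
        # Assumed convention: with no parasites present nobody is parasitised.
        return 0.0, 0.0, 1.0
    attacked = 1.0 - escape
    return P0 / total * attacked, P1 / total * attacked, escape
```

By construction the three probabilities sum to one, in agreement with $f_i = 1 - \bar f_{0i} - \bar f_{1i}$.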
2.3. Some extensions of the model

2.3.1. Continuous trait values

In some cases strategies are better described by continuous variables, e.g. to quantify how much resource is devoted to defence (for hosts) or attack (for parasites). Let the parasite trait value be $x \in [0, 1]$, and the host trait value $y \in [0, 1]$. Let $P(x)$ and $H(y)$ be densities of parasite and host populations in terms of their trait values. With some simplifications, this leads to the following system of integro-difference equations:
\[ P(x)' = (1-\mu)P(x) + c\,P(x)\,\frac{1 - f(\|P\|)}{\|P\|} \int_0^1 g(x,y)\,H(y)\,dy, \]
where $\|P\| = \int_0^1 P(x)\,dx$ and $f(\|P\|) = \exp(-a\|P\|)$ (in the simplest model), with
\[ H(y)' = \Psi(y,H)\,H(y)\,w(y,P), \]
and
\[ w(y,P) = 1 - \nu + R\theta(y) f(\|P\|) + R\phi(y)\,\frac{1 - f(\|P\|)}{\|P\|} \int_0^1 (1 - g(x,y))\,P(x)\,dx; \]
$w(y,P)$ is the relative fitness of host $y$ in an environment of parasites $P$. Much of the bifurcation analysis that we describe later can be extended to this system, but we shall not do this here.

2.3.2. Inclusion of mutation

Let an offspring of a parasite of type $\xi$ be of type $x$ with probability density $M(x,\xi)$, where $\int_0^1 M(x,\xi)\,dx = 1$, with $M$ typically positive and symmetric, and similarly for host mutation. Define $\hat P$, offspring in the absence of mutation, by
\[ \hat P(x) = c\,P(x)\,\frac{1 - f(\|P\|)}{\|P\|} \int_0^1 g(x,y)\,H(y)\,dy. \]
With mutation, the equation becomes
\[ P(x)' = (1-\mu)P(x) + \int_0^1 M(x,\xi)\,\hat P(\xi)\,d\xi. \]
Similarly for host mutation, with kernel $N(y,\eta)$:
\[ \hat H(y) = R\theta(y) f(\|P\|)\,H(y) + R\phi(y)\,\frac{1 - f(\|P\|)}{\|P\|}\,H(y) \int_0^1 (1 - g(x,y))\,P(x)\,dx, \]
\[ H(y)' = \Psi(y,H) \left[ (1-\nu)H(y) + \int_0^1 N(y,\eta)\,\hat H(\eta)\,d\eta \right]. \]
We again have a system of integro-difference equations, but now with double integrals. Again, many results may be extended to this case, but we shall not do this here.

3. Analysis

We return to the essentially ecological model for two parasite types and two host types, with some simplifications,
\[ P_i' = (1-\mu)P_i + c\,g_{i0} \bar f_{i0}(P_0,P_1) H_0 + c\,g_{i1} \bar f_{i1}(P_0,P_1) H_1, \qquad H_i' = \Psi(H)\,H_i\,w_i(P_0,P_1), \]
with $H = H_0 + H_1$ (host types ecologically identical), and with
\[ w_i(P_0,P_1) = 1 - \nu + R\theta_i f_i(P_0,P_1) + (1-g_{0i})R\phi_{0i} \bar f_{0i}(P_0,P_1) + (1-g_{1i})R\phi_{1i} \bar f_{1i}(P_0,P_1). \]
For the results in this article, all we need is to consider invasion eigenvalues. As an example, consider whether a mutant $H_1$ employing strategy 1 will invade a steady state $(P_0^*, 0, H_0^*, 0)$ consisting of hosts and parasites all employing strategy 0. Linearising about this steady state, $H_1$ will invade if
\[ \lambda = \Psi(H_0^*)\,w_1(P_0^*) > \Psi(H_0^*)\,w_0(P_0^*) = 1, \]
or $w_1(P_0^*) > w_0(P_0^*)$. Very simply, the fittest host wins. If the growth of two types is only limited by a common parasite $P$, then they are in apparent competition: the one that can survive a higher parasite population survives, and drives the other to extinction (Holt, 1977). At first sight it seems that we shall never see host populations mixed for defensive strategy. However, we shall see later that this is not the case, and our model does not preclude co-existence of two host types.
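The invasion criterion can be illustrated with a short sketch. It assumes the Nicholson–Bailey escape probability $f = e^{-aP_0^*}$ for a resident population containing only parasite type 0, and it uses the steady-state identity $\Psi(H_0^*) w_0(P_0^*) = 1$ to write the eigenvalue as a ratio of fitnesses. The function names, the dictionary layout for a host type and the parameter values in the usage note are all hypothetical.

```python
import math


def host_fitness(P0, *, nu, R, a, theta, g, phi):
    """Relative fitness w_i(P0, 0) of a host type when only parasite type 0 is present."""
    f = math.exp(-a * P0)                       # assumed escape probability
    return 1.0 - nu + R * theta * f + (1.0 - g) * R * phi * (1.0 - f)


def invasion_eigenvalue(P0_star, resident, mutant, *, nu, R, a):
    """Eigenvalue lambda = Psi(H0*) w1(P0*), using Psi(H0*) w0(P0*) = 1."""
    w0 = host_fitness(P0_star, nu=nu, R=R, a=a, **resident)
    w1 = host_fitness(P0_star, nu=nu, R=R, a=a, **mutant)
    return w1 / w0
```

For instance, with a naive resident `dict(theta=1.0, g=1.0, phi=1.0)` and a cost-free defender `dict(theta=1.0, g=0.5, phi=1.0)`, the eigenvalue exceeds 1 at any positive parasite density, so the defender invades; a defender with `theta < 1` cannot invade a parasite-free resident.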
4. Results

4.1. Alternative strategies

Let host and parasite strategies 0 and 1 be true alternatives, i.e. it is not physically possible to employ both 0 and 1. Let parasite strategy 0 be the counter-attacking strategy to host strategy 0, and parasite strategy 1 the counter-attacking strategy to host strategy 1. Let host strategy 1 be the defence to parasite strategy 0, and host strategy 0 the defence to parasite strategy 1. We might expect strategies to cycle: $H_0$ high $\Rightarrow$ $P_0$ high $\Rightarrow$ $H_1$ high $\Rightarrow$ $P_1$ high. Numerical results show that this is indeed the case.
Fig. 1. Cycling strategies. Depending on the parameters of the system it either tends to a periodically cycling solution (left panels) or to a singular solution that switches between the points at the corners of the frequency diagram (right panels).
4.2. Arms races

Let $H_0$ and $P_0$ be resident naive (no defence, no counter-attack) host and parasite types, and assume that a mutation occurs that produces a defending host type $H_1$. If the benefits of the defence outweigh the costs of deploying it, then (in the absence of stochastic extinction) we expect this mutation to invade the resident steady state and go to fixation. In mathematical terms, the mutant type $H_1$ invades the naive steady state $(P_0^*, 0, H_0^*, 0)$ and goes to fixation, resulting in a steady state $(P_0^*, 0, 0, H_1^*)$. Now assume that a mutation occurs in the parasite population, leading to a mutant counter-attacking parasite $P_1$. If the benefits of the counter-attack outweigh its costs, the mutant parasite type will invade the steady state and go to fixation, resulting in a steady state $(0, P_1^*, 0, H_1^*)$. Mathematically, invasion eigenvalues may be calculated which determine whether the invasions occur. With cost-free defence and counter-attack, invasion of both types occurs. The process may be repeated indefinitely, leading to a gene-for-gene arms race such as the one that seems to have occurred for flax and flax rust.
Fig. 2. A stage in an arms race: a naive host type is replaced by a defending one, and the parasite counter-attacks.
4.3. Bifurcation diagrams with costly defence

If defence is costly, things are more complicated. Typical fitness functions are as shown in figure 3 (if the defence has a cost even in the absence of parasitism). A bifurcation analysis may be carried out with richness of the environment $k$ as the bifurcation parameter. This may be done rigorously, but here
Fig. 3. Typical fitness curves for a naive and a defending host, as functions of the parasitism pressure (population) P . Since the defence has a cost in the absence of parasitism, the naive host is fitter for low values of P .
we give an algorithm for constructing the qualitative features of the bifurcation diagram, as the richness of the environment $k$ increases from zero, when there is a single parasite type and two host types, one that employs a defence against the parasite and one that does not, as in figure 3. The extension to more than two host types is straightforward. Note that as $k$ increases then $H^*$ increases and so $P^*$ increases, so the idea is to construct the bifurcation diagram from the host fitness diagram. Note also, however, that $H^*$ and $P^*$ are not in general strictly increasing with $k$.
• As $k$ increases from zero then so does the steady state host population size $H^*$, but for low values of $k$ it is not sufficient to support the parasite. The fittest host type employs the naive (no-defence) strategy, and this is the type that persists.
• A bifurcation point occurs (at about $k = 25$ in figure 4) beyond which the parasite population is supported by the host, and both populations strictly increase with $k$. The fittest host type still employs the naive strategy.
• As $k$ increases further, $H^*$ and $P^*$ increase until we reach a new bifurcation point (at around $k = 50$ in figure 4) where the naive and the defending host are equally fit. Beyond this point either (i) $P^*$ remains constant while the naive host is gradually replaced by the defending host (as in figure 4), or (ii) the naive host is immediately replaced by the defending host. The first of these alternatives is typical for brood-parasite systems.
• Once the naive host has been replaced by the defending host, the parasite population resumes its increase as the defending host population increases.
• Note that as $k \to \infty$ we tend to the point on the fitness diagram where host fitness drops to 1; however rich the environment it is not possible to progress beyond this point.
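The second bifurcation point in the algorithm above occurs at the parasite density where the naive and defending hosts are equally fit. As a minimal sketch, assuming $f(P) = e^{-aP}$, a naive host with $\theta = 1$ and $g = 1$, and a defender with $\theta < 1$ and $g < 1$, setting the two fitness functions equal gives $f^* = (1-g)\phi / \bigl((1-\theta) + (1-g)\phi\bigr)$ and hence $P = -\ln(f^*)/a$. The function names and numerical values are hypothetical.

```python
import math


def fitness(P, *, nu, R, a, theta=1.0, g=1.0, phi=1.0):
    """Host fitness w(P); the defaults describe the naive (non-defending) host."""
    f = math.exp(-a * P)                        # assumed escape probability
    return 1.0 - nu + R * theta * f + (1.0 - g) * R * phi * (1.0 - f)


def crossover_density(*, a, theta, g, phi):
    """Parasite density at which naive and defending hosts are equally fit."""
    benefit = (1.0 - g) * phi                   # gain per parasitised host
    cost = 1.0 - theta                          # parasite-independent cost
    if cost <= 0.0 or benefit <= 0.0:
        raise ValueError("a crossover needs a costly defence with some benefit")
    f_star = benefit / (cost + benefit)         # escape probability at equal fitness
    return -math.log(f_star) / a
```

Below the returned density the naive host is fitter and above it the defender is fitter, which is the ordering described for the fitness curves of figure 3.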
Fig. 4. Bifurcation diagram for the fitness curves of figure 3, with bifurcation parameter k. Note the gradual replacement of naive host by defending host while the parasite population remains constant (so that each host type has equal fitness), between about k = 50 and k = 100.
We can therefore describe the bifurcation diagrams for the parameters of our brood-parasite systems. For reed-warbler–cuckoo parameters the fitness diagram is as in figure 5, and we expect no defence at low cuckoo densities, egg-rejection at high cuckoo densities, or a mixture of the two, but no other defence strategies. However rich the environment is, the cuckoo chick can never be other than a rare enemy, not worth defending against. For fairy-wren parameters the fitness diagram is as in figure 6, and we expect no defence at low cuckoo densities, chick-rejection at high cuckoo densities, or a mixture of the two, but no other defence strategies. However rich the environment is, the cuckoo egg can never be other than a rare enemy, not worth defending against. The essential difference between the two sets of parameters that leads to this difference in outcome is the high fitness of fairy-wren chick rejectors at low $P$. This results from the much higher probability that fairy-wrens can raise a successful brood in the same season after deserting a nest, because of
Fig. 5. The figure is in terms of excess lifetime production v rather than fitness w; these are related by v = (w − 1)/ν, so we cannot progress beyond v = 0. The only host types we shall see are therefore naive or egg-rejectors, or a mixture of the two. Note that chick-rejection and all-rejection, although they are advantageous compared to the naive strategy at some parasitism rates, are blocked by the egg-rejection strategy.
Fig. 6. Again we cannot progress beyond v = 0. Therefore, for these parameters, the only host types we shall see are naive or chick-rejectors, or a mixture of the two. Note that here egg-rejection and all-rejection, although they are advantageous compared to the naive strategy at some parasitism rates, are blocked by the chick-rejection strategy.
the longer Australian breeding season and the shorter fairy-wren hatching time.
Conclusions

• If defence is cost-free, even rare enemies are worth defending against, and an arms race should be expected.
• If defence has parasite-independent costs, then we expect no defence against rare parasites, and defence against common parasites, with a mixture of defensive strategies at intermediate levels.
• One strategy may prevent another otherwise advantageous strategy from appearing, a phenomenon known as strategy-blocking.
• If host basic reproductive ratio is small, then however rich the environment there will never be enough parasites to make costly defence worthwhile.
• There is no fundamental reason for hosts not to reject both eggs and chicks, despite this never having been observed in nature. We might expect this to happen for high $R/\nu$ and high $k$.

References

1. N F Britton, R Planqué and N R Franks, Evolution of defence portfolios in exploiter–victim systems, Bulletin of Mathematical Biology 69, 957-988, 2007.
2. N B Davies, Cuckoos, Cowbirds and Other Cheats, T & A D Poyser, London, 2000.
3. R Dawkins, The Extended Phenotype: the Long Reach of the Gene, Oxford University Press, 1982.
4. R Dawkins and J R Krebs, Arms races between and within species, Proceedings of the Royal Society of London B 205, 489-511, 1979.
5. H Flor, The complementary genic systems in flax and flax rust, Advances in Genetics 8, 29-54, 1956.
6. L Gilbert, Coevolution and mimicry, in Coevolution, eds D Futuyma and M Slatkin, Sinauer, Sunderland Massachusetts, 207-231, 1983.
7. R D Holt, Predation, apparent competition, and the structure of prey communities, Theoretical Population Biology 12, 197-229, 1977.
8. D Janzen, Coevolution of mutualism between ants and acacias in Central America, Evolution 20, 249-275, 1966.
9. N Langmore, S Hunt and R Kilner, Escalation of a coevolutionary arms race through host rejection of brood-parasitic young, Nature 422, 157-160, 2003.
10. A Nicholson and V Bailey, The balance of animal populations, I, Proceedings of the Zoological Society of London 1, 551-598, 1935.
11. K M Oliver, J A Russell, N A Moran, and M S Hunter, Facultative bacterial symbionts in aphids confer resistance to parasitic wasps, Proceedings of the National Academy of Sciences of the USA 100, 1803-1807, 2003.
12. S Rehr, P Feeny and D Janzen, Chemical defense in Central American non-ant acacias, Journal of Animal Ecology 42, 405-416, 1973.
A POPULATION DYNAMICS APPROACH TO LANGUAGE EVOLUTION

J. F. FONTANARI
Instituto de Física de São Carlos, Universidade de São Paulo, Caixa Postal 369, 13560-970 São Carlos SP, Brazil
E-mail: [email protected]

The notion that words compete and languages evolve in analogy to individuals and populations was already familiar in the nineteenth century, as expressed in this quotation by the famous philologist Max Müller, a contemporary of Darwin: “A struggle for life is constantly going on amongst the words and grammatical forms in each language. The better, the shorter, the easier forms are constantly gaining the upper hand, and they owe their success to their own inherent virtue.” A more suitable analogue to language, however, is that of a parasitic species since language does not exist without speakers, just like parasites do not exist without hosts. Indeed, the view of language as a purely cultural trait which follows the rules of cultural rather than of biological evolution leads to a mathematical description of language evolution very similar to the formulation of the population dynamics of infectious agents, since the transmission of the language occurs only through the direct interaction between language-proficient (i.e., infected) adults and children. Here we use the results of recent experiments on infants, which demonstrate that they are imprinted by the language of their parents so as to favor contact with individuals that speak the same language, to replace the usual assumption of random meeting between individuals by a procedure in which children born from language-proficient parents but who have failed to learn the language from them (vertical transmission) can actively search for unrelated adults (oblique transmission) that speak the same language. We find that by properly setting the parameters that control the efficiency of oblique transmission, language can be maintained in the population even if it lacks adaptive value.
1. Introduction

An argument for the study of language evolution within a population dynamics framework was presented by Ferdinand de Saussure1 in his famous statement “language is not complete in any speaker; it exists only within a collectivity... only by virtue of a sort of contract signed by members of a community.” The translation of this notion into a mathematical model has
faced, however, some difficulties. The usual approach is to view linguistic innovations as mutations that spread in a population following the very same rules that govern the spreading of genes in population genetics.2–6 This stealthy assumption underlies the game theoretical framework7 which, nevertheless, has been applied to a variety of situations outside the biological realm.8 In fact, game theory assumes that the proportion of individuals using a certain strategy in a given generation is proportional to the relative payoff of that strategy in the previous generation. In the context of language evolution, this assumption implies that mastery of a communication system adds to the reproductive and survival potential of the individuals.2 This procedure gives rise to recurrence equations that are identical to the equations derived for deterministic (i.e., infinite population size) models of viability selection in population genetics.9

Implicit in the population genetics approach is the so-called vertical transmission assumption in which ‘information’ is passed from parent to child. Whereas this point is uncontroversial for genetic evolution, the transmission mode can become an issue for cultural evolution.10,11 There is virtually no dispute on the claim that language is transmitted culturally rather than genetically, and so vertical transmission may not be the sole or even the most important transmission mode associated with language evolution. Thus it is important to consider alternative mathematical approaches to language evolution that take explicitly into account the different transmission modes of culture. We note, of course, that the issue of whether acquiring a language is the result of an exclusively human innate language competence, i.e., language is unlearnable,12,13 or is the result of the use of purely inductive, statistical learning procedures14–16 is a matter of fierce contention, which is not directly related to the mode of transmission of language.
The heretical idea that language is a sort of parasitic organism that infects and parasitizes child brains in order to reproduce itself was put forward by Deacon.17 This is a deep insight that bears on the issue of the learnability of language mentioned before. From our population dynamics perspective, however, we will concentrate only on the more superficial similarities. First, language needs a human brain to exist as much as a parasite needs a host to survive. Second, language is transmitted from adults to children in an infectious process similar to a microorganism infection. Because an individual of the current generation acquires its language from an individual of the previous generation (not necessarily a biological parent) the transmission mode is a mix between the oblique mode (transmission from adults other than the parents) and the vertical mode (transmission from
the parents). In that sense, the mode of transmission of language differs from the horizontal mode (the transmission occurs between individuals of the same generation) which is characteristic of most infectious microorganisms. The obvious criticism that language is an abstraction that could not be more different from physical organisms, which possess specific metabolic and reproductive systems, does not stand, since a virus – an archetype of the successful parasite – does not possess these systems either, relying completely on the cellular machinery of the host to achieve its replication.

Once the analogy between language and parasites is accepted, we must admit the possibility that language evolves so as to adapt to its hosts. As Deacon pointed out, the structure of language is under intense selection since at each generation it must pass through children’s minds and only those language operations and elements that are quickly and easily learned by children are likely to be passed intact to next generations17 – a viewpoint which is resonant of Müller’s view of language evolution presented in the abstract.18 Hence learning a language is effortless not because children are endowed with a special organ of language as in Chomsky’s theory, but because language has evolved to adapt to the cognitive abilities of children, making learning a language an easy task.

An important and unexpected ingredient to understand the population dynamics of language was unveiled recently by experiments with infants. In particular, Kinzler et al.19 have shown that young infants who are not yet capable of producing or comprehending speech prefer to look at a person who previously spoke their native language, and older infants preferentially accept toys from native-language speakers. This innate behavior, mediated by language, may be the key to explain the division of the social world in groups.
In this contribution we model this behavior by an active search procedure in which the children are allowed to make a certain number of attempts to meet adults that speak the language of their parents. The mathematical formulation is essentially the formulation used to describe the propagation of infectious agents in a non-structured population.

2. Language as a cultural trait

We consider a simple scenario in which individuals endowed with language (or, more modestly, a communication system that can be transmitted culturally to the new generations) co-existed with languageless individuals. Since the ability to communicate can be learned from adults that master the language, it is a cultural trait and so we can borrow the mathematical formulation of cultural evolution developed by Cavalli-Sforza and Feldman10,20 to model the spread of language in the population. Henceforth we will refer to individuals who are language-proficient as skilled individuals and to languageless individuals as unskilled individuals. Let $v_t$ be the fraction of skilled individuals (i.e., language-proficient individuals) and $u_t = 1 - v_t$ the fraction of unskilled (i.e., languageless individuals) at generation $t$. If we denote the efficiency of the process of learning a language from the parents by the parameter $b \in [0, 1]$ we can write the frequencies of skilled and unskilled individuals in the next generation due to vertical transmission only as
\[ u' = u_t + (1 - b)\,v_t \tag{1} \]
\[ v' = b\,v_t. \tag{2} \]
Hence only a fraction $b$ of the skilled individuals remain skilled, due essentially to imperfections in the learning process. Vertical transmission can be turned off by setting $b = 0$. It is clear that vertical transmission alone is not an appropriate mechanism for skill (or, more generally, culture) propagation since this mechanism rules out the possibility that children of unskilled individuals learn the skill from the other individuals in the population. To take oblique transmission into account we must consider the possibility of interaction (meeting) between skilled adults, whose frequency is $v_t$, and unskilled children, whose frequency is $u'$. Here we assume that skilled children, whose frequency is $v'$, are unable to teach the language. If individuals meet randomly then the probability of such a meeting is given simply by the product of the two frequencies, $u' v_t$. Here we will modify this random meeting assumption in order to conform to the experiments of Kinzler et al.19 mentioned in Sec. 1.

First, we note that the fraction $u'$ given in Eq. (1) is a sum of two terms: the fraction of children born from unskilled individuals ($u_t$) and the fraction of children born from skilled individuals who somehow failed to acquire the skill through parental teaching [$(1 - b)\,v_t$]. On the one hand, we assume that the meeting of skilled adults and unskilled children born from unskilled parents is indeed random, and takes place with probability $u_t v_t$. On the other hand, we assume that an unskilled child born from a skilled adult performs an active search for skilled adults, since such a child was already biased towards language by its parents. In particular, we allow this child to attempt a fixed number of random meetings, $n \geq 1$, while seeking a skilled adult. The probability that the child meets a skilled adult is then21,22
\[ \Lambda_n(v_t) = 1 - (1 - v_t)^n \tag{3} \]
which reduces to the result of the random meeting assumption for $n = 1$. As in the case of vertical transmission, we introduce the parameter $f \in [0, 1]$ which yields the probability that a meeting between a skilled adult and an unskilled child of any kind results in successful learning of the skill. The frequencies of skilled and unskilled individuals due to vertical and oblique transmissions can then be written as
\[ u'' = u_t\,(1 - f v_t) + (1 - b)\,v_t\,[1 - f \Lambda_n(v_t)] \tag{4} \]
\[ v'' = b\,v_t + f v_t\,[u_t + (1 - b)\,\Lambda_n(v_t)]. \tag{5} \]
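Equations (3) to (5) translate directly into a few lines of code. The sketch below is only an illustration; the function names are hypothetical and the parameter values used in any call are arbitrary.

```python
def meeting_probability(v, n):
    """Eq. (3): probability of meeting at least one skilled adult in n attempts."""
    return 1.0 - (1.0 - v) ** n


def transmit(v, *, b, f, n):
    """Eqs. (4)-(5): frequencies (u'', v'') after vertical and oblique transmission."""
    u = 1.0 - v                                   # unskilled adults at generation t
    lam = meeting_probability(v, n)
    u2 = u * (1.0 - f * v) + (1.0 - b) * v * (1.0 - f * lam)
    v2 = b * v + f * v * (u + (1.0 - b) * lam)
    return u2, v2
```

Two quick checks follow from the text: `meeting_probability(v, 1)` equals `v` (random meeting), and the two returned frequencies always sum to one.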
To complete the generation cycle we consider the effect of viability selection by assuming that the fitness of a skilled adult relative to an unskilled one is $(1 + s) : 1$, where $s > 0$ is the fitness increment for acquiring the language skill. The final recursion equation for the fraction of skilled adults is
\[ v_{t+1} = \frac{v''\,(1 + s)}{1 + s v''} \tag{6} \]
or, more explicitly,
\[ v_{t+1} = \frac{v_t\,(1 + s)\,\{b + f\,[1 - v_t + (1 - b)\,\Lambda_n(v_t)]\}}{1 + s v_t\,\{b + f\,[1 - v_t + (1 - b)\,\Lambda_n(v_t)]\}} \tag{7} \]
where we have used Eq. (5) to write $v''$ in terms of the adult frequencies at generation $t$ and used the normalization $u_t = 1 - v_t$ to eliminate the frequency of unskilled adults from our final recursion. In what follows we will focus only on the equilibrium solutions of Eq. (7), $v_{t+1} = v_t = v$.

3. Analysis of the equilibrium solutions

Clearly, $v = 0$, which signals the elimination of skilled adults from the population, is always a fixed point of recursion (7). For any finite $n$, $v = 0$ is locally stable provided that the condition
\[ (1 + s)(b + f) - 1 < 0 \tag{8} \]
is satisfied, whereas for $n \to \infty$, which implies $\Lambda_\infty = 1$, the stability condition becomes
\[ f\,(1 + s)(1 - b) + (1 + s)(b + f) - 1 < 0. \tag{9} \]
Hence, regardless of the selective advantage of the skill, the perfecting of the transmission mechanisms to pass the skill to children whether by their parents (b) or by unrelated adults (f ) can guarantee the presence of the skill in the population, since the increase of the values of these parameters leads to the violation of the stability conditions (8) and (9).
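Recursion (7) and the stability conditions (8) and (9) can be sketched as follows. The function names are hypothetical; the limit $n \to \infty$ is represented by `math.inf`, relying on Python's floating-point power so that $(1 - v)^n$ evaluates to 0 for $0 < v \le 1$ and to 1 for $v = 0$.

```python
import math


def next_fraction(v, *, s, b, f, n):
    """Eq. (7): fraction of skilled adults in the next generation."""
    lam = 1.0 - (1.0 - v) ** n                    # Eq. (3); n may be math.inf
    growth = b + f * (1.0 - v + (1.0 - b) * lam)
    return v * (1.0 + s) * growth / (1.0 + s * v * growth)


def extinction_is_stable(*, s, b, f, n):
    """Local stability of v = 0: condition (8) for finite n, (9) for n = inf."""
    if math.isinf(n):
        return f * (1.0 + s) * (1.0 - b) + (1.0 + s) * (b + f) - 1.0 < 0.0
    return (1.0 + s) * (b + f) - 1.0 < 0.0
```

As an example, for $s = 0.4$ and $b = f = 0.3$ the fixed point $v = 0$ is stable for any finite $n$ but unstable in the limit $n \to \infty$, which can be confirmed by applying `next_fraction` to a very small initial fraction.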
For $b < 1$ and $f < 1$, however, the unskilled individuals can never be driven out of the population due to the always present outflow of individuals from the skilled class to the unskilled one. Hence $v = 1$ is not a fixed point of the recursion (7) and we have to resort to a numerical analysis to determine the fixed point $v > 0$, since it is given by the roots of a polynomial of order $n + 1$. Next we summarize our main findings.

Fig. 1. (Plot of $v$ against $b$, both axes running from 0 to 1.) The fraction of skilled adults at equilibrium $v$ as a function of the efficiency of the transmission of the skill between parents and children $b$ (vertical transmission) for $s = 1/9$ and $f = 9/10$. This choice of parameters guarantees that conditions (8) and (9) are violated in the whole range of $b$. The three curves correspond to different values of the number of attempts $n$ that a child born from skilled parents can make to meet another skilled adult: (bottom to top) $n = 1$, $2$ and $\infty$.
We find that whenever the fixed point $v = 0$ is unstable (i.e., conditions (8) or (9) are violated) there exists a unique fixed point of the recursion (7) in the physical range $(0, 1)$. From Fig. 1, which illustrates the dependence of $v$ on the control parameters of the model in this regime, we can appreciate the highly nontrivial effect of the replacement of the random meeting assumption ($n = 1$) by the active search procedure ($n > 1$) in the regime where the efficiency of vertical transmission is low. Because $\Lambda_n$ goes to 1 exponentially fast with increasing $n$, the results for $n = 2$ are already very close to those for $n \to \infty$, as shown in the figure.

We can easily obtain analytical expressions for $v$ in the two extreme cases $n = 1$ and $n \to \infty$. In the former case the meetings between children and skilled adults take place randomly, whereas in the latter case the children are certain to meet at least one skilled adult. For $n = 1$ we find that $v$ is given by the smaller root of the quadratic equation
\[ b f s\,v^2 - [s\,(b + f) + f b\,(1 + s)]\,v + (1 + s)(b + f) - 1 = 0. \tag{10} \]
In fact, we can easily show that in the regime of interest (i.e., condition (8) is violated) this equation has two positive roots $v_1$ and $v_2$ such that $v_1 < 1$ and $v_2 > 1$. In addition, we note that for $b = 0$ we have $v = [(1 + s) f - 1]/(s f)$, and so the vanishing of $v$ at $b = 0$ shown in Fig. 1 is a consequence of the particular choice of the parameters $s$ and $f$ used in the drawing of that figure.
Fig. 2. (Plot of $s$ against $b$, both axes running from 0 to 1, with regions labelled “$v = 0$”, “$v = 0$ and $v > 0$”, and “$v > 0$”.) Bifurcation diagram showing the regions in the plane $(b, s)$ for $f = 0.3$ characterized by the different stable fixed points of the recursion (7). In the region where condition (8) is satisfied but condition (9) is violated there is a value of $n > 1$ above which a new stable fixed point appears. Since in this region there are two stable fixed points, $v = 0$ and $v > 0$, the outcome of cultural evolution depends on the initial fraction of skilled adults $v_0$ (see Fig. 3).
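The closed-form equilibrium for random meeting, Eq. (10), can be checked against recursion (7) with a short sketch. The function names are hypothetical, and the code assumes the regime in which condition (8) is violated, so that the quadratic has real roots; the case $b f s = 0$ is handled as a linear equation.

```python
import math


def next_fraction(v, *, s, b, f, n):
    """Eq. (7): fraction of skilled adults in the next generation."""
    lam = 1.0 - (1.0 - v) ** n
    growth = b + f * (1.0 - v + (1.0 - b) * lam)
    return v * (1.0 + s) * growth / (1.0 + s * v * growth)


def equilibrium_random_meeting(*, s, b, f):
    """Smaller root of the quadratic (10), i.e. the equilibrium for n = 1."""
    qa = b * f * s
    qb = -(s * (b + f) + f * b * (1.0 + s))
    qc = (1.0 + s) * (b + f) - 1.0
    if qb == 0.0:
        raise ValueError("degenerate case: b = f = 0 admits no transmission")
    if qa == 0.0:
        return -qc / qb                           # linear case, e.g. b = 0
    disc = qb * qb - 4.0 * qa * qc                # non-negative when (8) is violated
    return (-qb - math.sqrt(disc)) / (2.0 * qa)
```

For the Fig. 1 parameters $s = 1/9$ and $f = 9/10$ with, say, $b = 0.5$, the returned value lies in $(0, 1)$ and is left unchanged by `next_fraction` with `n=1`.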
For $n \to \infty$ we find that $v$ is given by the smaller root of the quadratic equation
\[ f s\,v^2 - [3 f s + s b\,(1 - f) + f]\,v + (1 + s)(b - b f + 2 f) - 1 = 0, \tag{11} \]
since the other root is always greater than 1, as in the previous case. We note that for $f > 1/2$, the solution $v > 0$ is always stable (and $v = 0$ is always unstable), regardless of the values of $s$ and $b$. The relevant feature illustrated by Fig. 1 is that for the transmission of a skill we can dispense
with vertical transmission altogether provided that there is some institutionalized mechanism that guarantees the meeting between children and skilled adults (e.g., schools).

Fig. 3. (Plot of $v_t$, from 0 to 0.5, against $t$, from 0 to 100.) Time evolution of the fraction of skilled adults for $n = 15$, $f = 0.3$, $b = 0.3$ and $s = 0.4$. The outcome of cultural evolution depends on whether the initial fraction of skilled adults $v_0$ is greater or less than a threshold value $v_c$, given by the unstable fixed point. For this parameter setting we have $v_c = 0.09388$, which is indicated in the figure by the horizontal dashed line.
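The dependence on the initial condition described in the caption of Fig. 3 can be reproduced by iterating recursion (7) directly. The sketch below uses the parameter values quoted in the caption; the function names, the number of generations and the two initial fractions in the usage note are assumptions made for illustration.

```python
def next_fraction(v, *, s, b, f, n):
    """Eq. (7): fraction of skilled adults in the next generation."""
    lam = 1.0 - (1.0 - v) ** n
    growth = b + f * (1.0 - v + (1.0 - b) * lam)
    return v * (1.0 + s) * growth / (1.0 + s * v * growth)


def evolve(v0, generations, **params):
    """Return the trajectory v_0, v_1, ..., v_generations of recursion (7)."""
    trajectory = [v0]
    for _ in range(generations):
        trajectory.append(next_fraction(trajectory[-1], **params))
    return trajectory


# Parameter values quoted in the caption of Fig. 3.
FIG3_PARAMS = dict(s=0.4, b=0.3, f=0.3, n=15)
```

Starting below the threshold, for example `evolve(0.05, 2000, **FIG3_PARAMS)`, the fraction of skilled adults decays towards zero, whereas a start above it, such as `evolve(0.12, 2000, **FIG3_PARAMS)`, approaches the positive stable fixed point.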
Now we turn to the more interesting situation where n > 1 is finite and the stability condition (8) is satisfied so the fixed point v = 0 is locally stable. Based on the previous analysis, we expect that for sufficiently large n a pair of fixed points – one unstable and the other locally stable – will appear in the regions of the parameter space where (8) is satisfied and (9) is violated (see Fig. 2). This expectation is confirmed by Fig. 3, which shows the time evolution of the fraction of skilled adults vt for different initial conditions v0 . This dependence of the outcome of the dynamics on the initial frequency of skilled adults and, in particular, the fact that for small v0 the dynamics is attracted to the fixed point v = 0 are the key elements of the Allee effect of population dynamics, which asserts that intraspecific cooperation might lead to inverse density dependence, resulting in the extinction of some (social) animal species when their population size becomes small.23,24 This effect is particularly strong in the game theoretical formulation of the competition between two dialects (or communication strategies) since in this case there are always two stable fixed points corresponding to the
fixation of each one of the strategies.5,6

Fig. 4. Minimum value of the number of meetings a child born from skilled parents is allowed to attempt while seeking a skilled adult (nm ) as a function of the vertical transmission efficiency (b) in the regions where the two stable fixed points v = 0 and v > 0 coexist. Note the discontinuity from nm > 1 to nm = 1 at the value of b at which condition (8) is violated. The parameters are s = 0.4 and (right to left) f = 0.1, 0.2, 0.3, 0.4, 0.5 and 0.6. (Axes: b from 0 to 0.7; nm on a logarithmic scale from 1 to 1000.)
To determine the minimum value nm such that for n ≥ nm we have the coexistence between the two stable fixed points, v = 0 and v > 0, first we write the equilibrium condition for the nontrivial solution of recursion (7) as
\[
h_n(v) \equiv 1 - \left[ 1 + s (1 - v) \right] \left\{ b + f \left[ 1 - v_t + (1 - b) \Lambda_n(v_t) \right] \right\} = 0. \tag{12}
\]
As mentioned before, what happens here is the (simultaneous) appearance of a pair of fixed points in the physical region (0, 1), which means that Eq. (12) exhibits a double root in this region. Hence for f , b and s fixed, we have to solve (12) simultaneously with h′n (v) = 0 for n and v. The results are summarized in Fig. 4 where the independent variable b is allowed to vary within the boundaries given by the stability conditions (8) and (9), i.e., within the region where the two fixed points v = 0 and v > 0 are stable (see Fig. 2). For the sake of clarity, we allow nm to take on real values. As expected, nm diverges as b approaches the left boundary in Fig. 2, which is given by condition (9), but surprisingly nm does not go to 1 as the right boundary, given by condition (8), is approached. Since wherever
this condition is violated the quadratic equation (10), which yields the nontrivial fixed points for n = 1, has one physical solution, nm exhibits a discontinuity at the right boundary.

4. Conclusion

This contribution departs from the usual approaches to language evolution, which are based on population genetics (genetic algorithms, in the case the analysis is centered on agent-based simulations2,3,5 ) or on the game theoretical framework,6,25 as we choose to view language as a purely cultural trait which thus follows the rules of cultural rather than of biological evolution. The mathematical formulation of cultural evolution is very similar to that of the population dynamics of infectious agents, since transmission of the language occurs only through direct contact and interaction between individuals. This observation is part of the conception of language as a parasitic species that coevolves with children's minds.17 Here we considered the simplest possible scenario, albeit one that certainly occurred in the evolution of communication in animals, in which individuals endowed with language (thought of as a skill) co-existed and hence competed with unskilled individuals. All children were capable of learning the skill, regardless of the category (skilled or unskilled) of their parents. Children born from skilled parents, however, are more likely to acquire the skill since they can learn it directly from their parents via the vertical transmission mode or, in the case parental teaching fails, they can actively seek adults that possess the skill.
This active search process, which replaces the random meeting assumption of population dynamics, is inspired by actual experiments on infants which demonstrate that they are imprinted by the language of their parents so as to favor contact with individuals that speak the same language.19 We find that the introduction of the active search strategy to model the frequency of meetings between children born from skilled parents and unrelated skilled adults results in qualitative changes in the population dynamics. In particular, the key parameter that determines the outcome of the dynamics turns out to be the efficiency of the oblique transmission mode (f ), which essentially measures the probability that the skill is successfully transmitted during the encounter of a skilled adult with a child. In fact, by setting f > 0.5 and assuming no bounds on the number of attempts to find a skilled adult we can guarantee the permanence of skilled individuals in the population even if the skill provides no selective advantage (s = 0) or the vertical transmission mode is turned off (b = 0). The framework presented
here can be useful to determine strategies and policies aiming at slowing down the current rate of extinction of the world's languages and cultures.26

Acknowledgments

This work was supported in part by CNPq and FAPESP, Project No. 04/06156-3.

References

1. F. de Saussure, Course in General Linguistics (McGraw-Hill Book Company, New York, 1966).
2. J. R. Hurford, Lingua 77, 187 (1989).
3. M. A. Nowak and D. C. Krakauer, Proc. Natl. Acad. Sci. USA 96, 8028 (1999).
4. J. F. Fontanari and L. I. Perlovsky, Phys. Rev. E 70, 042901 (2004).
5. J. F. Fontanari and L. I. Perlovsky, IEEE Trans. Evol. Comput. 11, 758 (2007).
6. J. F. Fontanari and L. I. Perlovsky, Theory Biosci. 127, 205 (2008).
7. J. Maynard Smith, Evolution and the Theory of Games (Cambridge University Press, Cambridge, UK, 1982).
8. D. Fudenberg and J. Tirole, Game Theory (MIT Press, Cambridge, MA, 1991).
9. D. L. Hartl and A. G. Clark, Principles of Population Genetics (Sinauer Associates Inc., Sunderland, MA, 1989).
10. L. L. Cavalli-Sforza and M. W. Feldman, Cultural Transmission and Evolution: A Quantitative Approach (Princeton University Press, Princeton, NJ, 1981).
11. R. Boyd and P. J. Richerson, Culture and the Evolutionary Process (University of Chicago Press, Chicago, 1985).
12. N. Chomsky, Language and Mind (Harcourt Brace Jovanovich, New York, 1972).
13. N. Chomsky, Rules and Representations (Columbia University Press, New York, 1980).
14. J. R. Saffran, R. N. Aslin and E. L. Newport, Science 274, 1926 (1996).
15. E. Bates and J. Elman, Science 274, 1849 (1996).
16. M. S. Seidenberg, M. C. MacDonald and J. R. Saffran, Science 298, 553 (2002).
17. T. W. Deacon, The Symbolic Species (W. W. Norton & Company, New York, 1997).
18. G. Radick, Selection 3, 7 (2002).
19. K. D. Kinzler, E. Dupoux and E. S. Spelke, Proc. Natl. Acad. Sci. USA 104, 12577 (2007).
20. L. L. Cavalli-Sforza and M. W. Feldman, Proc. Natl. Acad. Sci. USA 80, 4993 (1983).
21. S. A. Boorman and P. R. Levitt, The Genetics of Altruism (Academic Press, New York, NY, 1980).
22. I. Eshel and L. L. Cavalli-Sforza, Proc. Natl. Acad. Sci. USA 79, 1331 (1982).
23. W. C. Allee, Animal Aggregations. A Study in General Sociology (University of Chicago Press, Chicago, 1931).
24. F. Courchamp, T. Clutton-Brock and B. Grenfell, Trends Ecol. Evol. 14, 405 (1999).
25. L. L. Cavalli-Sforza and M. W. Feldman, Proc. Natl. Acad. Sci. USA 80, 2017 (1983).
26. D. Nettle and S. Romaine, Vanishing Voices (Oxford University Press, Oxford, 2000).
April 24, 2009
9:5
WSPC - Proceedings Trim Size: 9in x 6in
Raissi.Jerry.novo2
182
IMPACT AND EFFECTIVENESS OF MARINE PROTECTED AREA ON ECONOMIC SUSTAINABILITY

C. JERRY, N. RAÏSSI

Equipe d'Ingénierie Mathématique (E.I.MA), L.I.R.N.E, Faculté des Sciences, Université Ibn Tofaïl, B.P. 133, Kenitra, Morocco
E-mail: [email protected]

Among the many factors that contribute to overexploitation of marine fisheries, the role played by uncertainty is important. This uncertainty includes both the scientific uncertainties related to the resource dynamics and the evolution of its price. Recently, many works have advocated the use of Marine Protected Areas (MPAs) as a central element of future stock management. In this work we investigate and analyse the impacts of the creation of MPAs on economic sustainability through a bioeconomic model integrating the evolution of the resource price. Equilibria and stability of the model are studied. Also, instead of studying the environmental and economic interactions in terms of optimal control, we focus on the viability of the system. This viability is defined by a set of economic state constraints. These constraints combine a guaranteed consumption and a minimum income for fishermen. Using the mathematical concept of viability kernel, we show how marine reserves might guarantee a perennial system and viable fisheries.

Keywords: Fisheries management, Marine protected area, Demand function, Endogenous price, Viability kernel.
1. Introduction

Over-exploitation of marine fisheries resources remains a serious problem worldwide. Among the many factors that contribute to sustainable fisheries management failures, the role played by uncertainty is important. Uncertainty in fisheries can be classified under three principal forms (Ref. 4): random fluctuations such as those affecting fish survival rate in the ocean or fish price in the market; uncertainty in parameter estimates and states of nature such as uncertainty on the stock size or the fishing mortality; and structural uncertainty that reflects a basic lack of knowledge about
the nature of the fishery system such as uncertainty on the multi-species interactions, for instance predator-prey effects. In the present work we are interested in fish price and stock size uncertainty. In fisheries, it is now widely accepted that a basic issue in coping with this uncertainty, in sustainable fisheries management, is the reconciliation of environmental and economic requirements with an intergenerational equity perspective. Recently, a growing number of works have suggested that one possible way to integrate and to manage this uncertainty may be through the implementation of marine reserves.1,7,11 The main goal of this work is to analyse the impact and effectiveness of marine reserves on the economic (price) sustainability and more specifically on the viable inter-temporal use of fishing resources. We use viability analysis2 to investigate how marine reserves can guarantee the price sustainability of the fishery. In brief, the viability approach is a mathematical framework which aims at analyzing the compatibility between the (possibly uncertain) dynamics of a system and a set of constraints representing safe, tolerable, effective, feasible or admissible situations affecting the system. The viability approach, therefore, allows coping with multicriteria constraints and may offer the opportunity to reconcile economic and environmental issues within a sustainability perspective. This viability approach has already been applied to fishery issues in Refs. 3, 7, 11. In the present case, since we address conservation issues, the constraints mainly include a guaranteed consumption level together with a guaranteed level of income. From a mathematical point of view, the viability approach focuses on the set of conditions (states or decisions) that allow dynamical systems to remain viable over time, i.e. to stay within their viability domains defined by the sets of constraints.

The work is organized as follows. In Section 2, we describe the dynamics of the system, identify the viability constraints, and define the sustainability of the economy through the viability kernel. In Section 3 we determine the viability kernel. Section 4 summarizes the major findings.
2. The model

2.1. The dynamics

Following Hannesson,10 Ami et al.1 and Jerry et al.,11 the growth of the population density in the reserved and unreserved areas is governed by the dynamics:
\[
\begin{aligned}
\dot X_1 &= F_1(X_1) + \lambda (X_2 - X_1), & X_1(0) &= X_1^0, \\
\dot X_2 &= F_2(X_2) - \lambda (X_2 - X_1) - q E_2 X_2, & X_2(0) &= X_2^0,
\end{aligned}
\]
in which we introduce the dynamics prevailing on the market in order to determine the price endogenously,12,14
\[
\dot P = \varepsilon \left( \frac{D(P)}{K_2} - q E_2 X_2 \right) \quad \text{with } P(0) = P^0 > 0,
\]
where $X_1$, $X_2$ are the stock densities in the two areas: index 1 represents the protected area, where harvesting is forbidden, and index 2 the complementary domain where harvesting is allowed. $P$ is the unit price of the resource. $F_1(\cdot)$ and $F_2(\cdot)$ are the growth functions of the resource corresponding to each area and are assumed to be Lipschitz continuous and strictly concave with $F_i(0) = F_i(1) = 0$. We assume that in each zone the growth of the fish population follows the logistic model. $D(P)$ is the demand function and is assumed to be strictly decreasing and convex.5 The parameter $\varepsilon$ stands for the speed of price adjustment.9,14 The decision or control of this dynamics is the fishing effort $E_2(t)$, where admissible harvesting efforts $E_2(\cdot)$ belong to $[E_2^{min}, E_2^{max}]$ with $E_2^{max} > E_2^{min} > 0$, and $q$ represents the catchability coefficient. We obtain the following controlled differential system
\[
\begin{aligned}
\dot X_1 &= s X_1 (1 - X_1) + \lambda (X_2 - X_1), & X_1(0) &= X_1^0, \\
\dot X_2 &= r X_2 (1 - X_2) - \lambda (X_2 - X_1) - q E_2 X_2, & X_2(0) &= X_2^0, \\
\dot P &= \varepsilon \left( \frac{D(P)}{K_2} - q E_2 X_2 \right), & P(0) &= P^0.
\end{aligned}
\tag{1}
\]
In the above model, $r$ and $s$ are the intrinsic growth rates of the fish population inside the unreserved and reserved areas, respectively. $K_1$, $K_2$ are the carrying capacities of the two areas.

2.2. The sustainability constraints

A second step in this analysis is to express state variable constraints from the viewpoint of a regulating agency (say, a government). The constraint concerns the resource price. First, we consider that the government agency seeks to guarantee the sustainability of the fishing activity by maintaining a global positive net benefit in the sector at any time.13 Therefore, $P(t)$ is constrained by a fixed minimum price $\underline{P}$, such that
\[
\underline{P} \le P(t), \quad \forall\, t \ge 0.
\]
Furthermore, we impose an important requirement related to a guaranteed consumption level throughout time. Then, $P(t)$ is constrained by a fixed maximum price $\bar{P}$ such that
\[
P(t) \le \bar{P}, \quad \forall\, t \ge 0.
\]
This price restraint refers to a sustainability and intergenerational equity concern. We thus identify the final constraint,
\[
\underline{P} \le P(t) \le \bar{P}, \quad \forall\, t \ge 0. \tag{2}
\]
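As an illustration, system (1) can be integrated numerically for a constant effort and the price path tested against constraint (2). The following is only a minimal sketch: the demand function D(P) = 0.1/P, all parameter values, and the helper names (`rhs`, `rk4_step`, `simulate`, `first_exit_time`) are assumptions made for the example and do not come from the model calibration.

```python
# Illustrative sketch of system (1) with a constant effort E2 and a check of
# the price constraint (2). Every numerical value below is an assumption.

S, R = 0.5, 0.6       # growth rates s (reserve) and r (fished area)
LAM = 0.8             # migration coefficient lambda (chosen so that LAM >= S)
Q = 1.0               # catchability coefficient q
EPS, K2 = 1.0, 1.0    # price adjustment speed epsilon, carrying capacity K2


def demand(p):
    """Assumed demand D(P) = 0.1 / P (decreasing, convex, compatible with H2)."""
    return 0.1 / p


def rhs(state, effort):
    """Right-hand side of system (1) for state = (X1, X2, P)."""
    x1, x2, p = state
    dx1 = S * x1 * (1.0 - x1) + LAM * (x2 - x1)
    dx2 = R * x2 * (1.0 - x2) - LAM * (x2 - x1) - Q * effort * x2
    dp = EPS * (demand(p) / K2 - Q * effort * x2)
    return (dx1, dx2, dp)


def rk4_step(state, effort, dt):
    """One classical Runge-Kutta step with the effort held constant."""
    def shift(slope, h):
        return tuple(x + h * k for x, k in zip(state, slope))
    k1 = rhs(state, effort)
    k2 = rhs(shift(k1, dt / 2), effort)
    k3 = rhs(shift(k2, dt / 2), effort)
    k4 = rhs(shift(k3, dt), effort)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(state, effort, t_end=100.0, dt=0.01):
    """Return the list of states visited from t = 0 to t = t_end."""
    path = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, effort, dt)
        path.append(state)
    return path


def first_exit_time(path, p_low, p_high, dt=0.01):
    """First time at which P leaves [p_low, p_high], or None if it never does."""
    for i, (_, _, p) in enumerate(path):
        if not (p_low <= p <= p_high):
            return i * dt
    return None
```

For instance, `first_exit_time(simulate((0.5, 0.5, 1.0), effort=0.3), 0.4, 1.2)` reports whether the simulated price ever leaves the assumed band [0.4, 1.2] over the finite horizon. A finite-horizon check of this kind can only suggest, not prove, that constraint (2) holds for all t.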
2.3. Definitions and hypotheses

In what follows, it is convenient to define

i. $Z := \{ P \mid \underline{P} \le P(t) \le \bar{P} \}$,
ii. $N_1 := \{ (X_1, X_2) \mid F_1(X_1) + \lambda (X_2 - X_1) = 0 \}$,
iii. $N_2(E_2) := \{ (X_1, X_2) \mid F_2(X_2) - \lambda (X_2 - X_1) - q E_2 X_2 = 0 \}$,
iv. $\tilde{X}_1(E_2) = X_2 - \dfrac{F_2(X_2)}{\lambda} + \dfrac{q E_2 X_2}{\lambda}$,
v. $N_3(E_2) := \{ (X_2, P) \mid \dfrac{D(P)}{K_2} - q E_2 X_2 = 0 \}$,
vi. $X^*(E_2)$ is the intersection of $N_1$ and $N_2(E_2)$ away from the origin,
vii. $(X_2^*(E_2), P^*(E_2))$ is the intersection of $N_3(E_2)$ with the line $X_2 = X_2^*(E_2)$,

and to assume the following hypotheses

H0. $\lambda + q E_2^{min} \ge r$,
H1. $\lambda \ge s$,
H2. $\lim_{P \to \infty} D(P) = 0$ and $\lim_{P \to 0} D(P) = \infty$.
Notice that hypotheses H0-H1 imply that the null-clines $N_1$ and $N_2(E_2)$ do not cross the axes away from the origin, whatever the value of $E_2 \in [E_2^{min}, E_2^{max}]$.

3. A viability analysis

A first question that arises now is whether the dynamics (1) are compatible and consistent with the set of constraints $Z$. In other words, we aim at revealing levels of resource and price of the constraint domain $Z$ that are associated with a viable trajectory in $Z$ and thus a viable regulation $E_2(t)$. To achieve this, we proceed in two steps: identification of viable stationary points and determination of the viability kernel.
3.1. Viable stationary points

The first step of the analysis concerns the viable stationary points of the system, which correspond to $\dot X_1 = 0$, $\dot X_2 = 0$, $\dot P = 0$. Under hypotheses H0-H1, the null-cline $N_1$ crosses $N_2(E_2)$ in two nonnegative points, namely $(0, 0)$ and $(X_1^*, X_2^*)$ (see Refs. 1, 8, 11). The intersection of the line $X_2 = X_2^*$ with $N_3(E_2)$ is the point $(X_2^*, P^*)$. So, the dynamical system (1) admits only one equilibrium, namely $B^*(X_1^*, X_2^*, P^*)$.

Fig. 1. The viable stationary point for system (1).
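The construction of $B^*$ by intersecting null-clines can be sketched numerically. The example below assumes the logistic growth functions of system (1), an illustrative demand D(P) = 0.1/P (so that its inverse is y -> 0.1/y), and hypothetical parameter values; the function names are ours. It parametrizes $N_1$ by $X_1$, looks for the nonzero crossing with $N_2(E_2)$ by bisection, and then reads $P^*$ from $N_3(E_2)$.

```python
# Sketch: locate B* = (X1*, X2*, P*) as the intersection of the null-clines.
# Growth is logistic as in system (1); all numbers used below are assumptions.

def reserve_nullcline(x1, s, lam):
    """X2 on N1: solves s*X1*(1 - X1) + lam*(X2 - X1) = 0 for X2."""
    return x1 - s * x1 * (1.0 - x1) / lam


def fished_residual(x1, s, r, lam, q, effort):
    """Left-hand side of N2(E2) evaluated along N1."""
    x2 = reserve_nullcline(x1, s, lam)
    return r * x2 * (1.0 - x2) - lam * (x2 - x1) - q * effort * x2


def stationary_point(s, r, lam, q, effort, k2, inverse_demand, tol=1e-12):
    """Bisection on X1 in (0, 1]; P* then follows from D(P*) = K2*q*E2*X2*."""
    lo, hi = 1e-9, 1.0
    f_lo = fished_residual(lo, s, r, lam, q, effort)
    f_hi = fished_residual(hi, s, r, lam, q, effort)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change: no positive crossing was bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = fished_residual(mid, s, r, lam, q, effort)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    x1 = 0.5 * (lo + hi)
    x2 = reserve_nullcline(x1, s, lam)
    return x1, x2, inverse_demand(k2 * q * effort * x2)
```

A call such as `stationary_point(0.5, 0.6, 0.8, 1.0, 0.3, 1.0, lambda y: 0.1 / y)` returns the triple for the assumed parameters. Bisection is used here only because it needs no derivatives; any root finder would serve equally well.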
The dynamical behaviour of the equilibrium can be studied by using a Liapunov function. In the following theorem we show that the positive equilibrium $B^*$ is globally asymptotically stable.

Theorem. The equilibrium $B^*$ is globally asymptotically stable with respect to all solutions initiating in the interior of the positive cube.

Proof. Consider the following positive definite function about $B^*$:
\[
V = \left( X_2 - X_2^* - X_2^* \ln \frac{X_2}{X_2^*} \right) + \frac{X_1^*}{X_2^*} \left( X_1 - X_1^* - X_1^* \ln \frac{X_1}{X_1^*} \right) + (P - P^*)^2 .
\]
Differentiating V with respect to time t along the solutions of model (1), a little algebraic manipulation yields
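\[
\frac{\partial V}{\partial t} = -\frac{r}{K_2} (X_2 - X_2^*)^2 - \frac{X_1^* s}{X_2^* K_1} (X_1 - X_1^*)^2 - \lambda \frac{(X_2 X_1^* - X_2^* X_1)^2}{X_2^* X_2 X_1} + 2 \varepsilon (P - P^*) \left( D(P) - q E_2 X_2 \right) < 0 .
\]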
Thus $\partial V / \partial t$ is negative definite, and hence by Liapunov's theorem on stability,6 it follows that the positive equilibrium $B^*$ is globally asymptotically stable with respect to all solutions initiating in the interior of the positive cube. The above theorem implies that in an open-access fishery, if a subregion is reserved where fishing is not allowed and fish populations are harvested only outside the reserved subregion, then in both the reserved and unreserved zones fish species settle down to their respective equilibrium levels. Also, the associated price stabilizes at its equilibrium level. The magnitudes of these levels depend upon the intrinsic growth rates of the fish species, their migration coefficient and the carrying capacities. This implies that the system (1) may be sustained at an appropriate equilibrium level for any admissible harvesting of fish populations in the unreserved area. This shows that
3.2. Viability kernel

The next step is to study the whole viability of the system using the concept of viability kernel. The viability kernel, denoted by $\mathrm{Viab}(Z)$, corresponds to the set of all initial conditions $(X_1, X_2, P)$ such that there exists at least one trajectory starting from $(X_1, X_2, P)$ that stays in the set of constraints $Z$. In other words,
\[
\mathrm{Viab}(Z) = \left\{ (X_1, X_2, P) \;\middle|\;
\begin{array}{l}
\exists \text{ an admissible } E_2(\cdot) \text{ such that the solution } (X_1(\cdot), X_2(\cdot), P(\cdot)) \\
\text{of (1), starting from } (X_1, X_2, P), \text{ is viable in } Z
\end{array}
\right\}
\]
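The existential quantifier in this definition matters: a state may be viable under one admissible effort policy and not under another. The sketch below illustrates this with a simple switching feedback (maximum effort when the price is above a switching level, minimum effort otherwise). It is not the construction used in the paper. The demand D(P) = 0.1/P, the effort bounds, the price band and the finite horizon are all assumptions, so a positive answer is only a finite-horizon indication that a state belongs to the kernel.

```python
# Sketch: finite-horizon viability test for a given effort policy.
# Parameters, demand function and effort bounds are illustrative assumptions.

S, R, LAM, Q, EPS, K2 = 0.5, 0.6, 0.8, 1.0, 1.0, 1.0
E_MIN, E_MAX = 0.2, 0.6          # admissible effort interval (assumed)


def demand(p):
    """Assumed demand D(P) = 0.1 / P."""
    return 0.1 / p


def rhs(state, effort):
    """Right-hand side of system (1) for state = (X1, X2, P)."""
    x1, x2, p = state
    return (S * x1 * (1.0 - x1) + LAM * (x2 - x1),
            R * x2 * (1.0 - x2) - LAM * (x2 - x1) - Q * effort * x2,
            EPS * (demand(p) / K2 - Q * effort * x2))


def rk4_step(state, effort, dt):
    """One Runge-Kutta step; the effort is frozen during the step."""
    def shift(slope, h):
        return tuple(x + h * k for x, k in zip(state, slope))
    k1 = rhs(state, effort)
    k2 = rhs(shift(k1, dt / 2), effort)
    k3 = rhs(shift(k2, dt / 2), effort)
    k4 = rhs(shift(k3, dt), effort)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def switching_policy(p_switch):
    """Feedback: fish hard when the price is high, lightly when it is low."""
    def policy(state):
        return E_MAX if state[2] > p_switch else E_MIN
    return policy


def stays_viable(state, policy, p_low, p_high, t_end=100.0, dt=0.01):
    """True if P(t) remains in [p_low, p_high] up to t_end under the policy."""
    for _ in range(int(round(t_end / dt))):
        if not (p_low <= state[2] <= p_high):
            return False
        state = rk4_step(state, policy(state), dt)
    return p_low <= state[2] <= p_high
```

For example, `stays_viable((0.5, 0.5, 0.7), switching_policy(0.5), 0.45, 0.8)` can be compared with the same call using the constant policy `lambda state: E_MAX`, which shows how the answer depends on the chosen control.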
Let us consider first the following sets
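\[
U = \{ P^* \mid \exists\, X_1^* \text{ and } X_2^* \text{ such that } (X_1^*, X_2^*, P^*) \text{ is an equilibrium point of (1) and } \underline{P} \le P^* \le \bar{P} \},
\]
\[
J = \{ X_2^* \mid \exists\, X_1^* \text{ and } P^* \text{ such that } (X_1^*, X_2^*, P^*) \text{ is an equilibrium point of (1) and } \underline{P} \le P^* \le \bar{P} \}.
\]
We distinguish two situations, depending on whether $U$ is empty or not.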
Case 1. If $U = \emptyset$ then $\mathrm{Viab}(Z) = \emptyset$. This means that there is no economic state that makes it possible to satisfy the set of constraints $Z$. In other words, the economy is not sustainable.

Case 2. If $U \neq \emptyset$: before giving the result, let us define and consider the following:

• We define two equilibrium points $B_M^*(X_{1,M}^*, X_{2,M}^*, P_M^*)$ and $B_m^*(X_{1,m}^*, X_{2,m}^*, P_m^*)$, where $X_{2,M}^* = \min_{X_2^*} J$ and $X_{2,m}^* = \max_{X_2^*} J$. Also, we define $E_{2,M}$ and $E_{2,m}$, which are the fishing efforts corresponding respectively to $B_M^*$ and $B_m^*$. So, we have $E_{2,M} > E_{2,m}$ because $X_{2,M} < X_{2,m}$ (see Mounir et al.). According to the definition of $N_3(E_2)$ we have
\[
\frac{D(P)}{q K_2 E_{2,M}} < \frac{D(P)}{q K_2 E_{2,m}} \quad \forall P .
\]
This inequality means that for every $E_2 \in [E_{2,m}, E_{2,M}]$ the corresponding null-cline $N_3(E_2)$ lies between $N_3(E_{2,m})$ at the top and $N_3(E_{2,M})$ underneath (see Fig. 2).
• We consider the point $(\bar X_1, \bar X_2, \bar P)$, where $(\bar X_2, \bar P)$ is the intersection of $N_3(E_{2,m})$ with the line $P = \bar P$ in the phase plane $(X_2, P)$ and $(\bar X_1, \bar X_2)$ is the intersection of $N_2(E_{2,m})$ with the line $X_2 = \bar X_2$ in the phase plane $(X_1, X_2)$ (see Fig. 2).
• We consider the point $(\bar X_1, \bar X_2, \bar P)$, where $(\bar X_2, \bar P)$ is the intersection of $N_3(E_{2,M})$ with the line $P = \bar P$ in the phase plane $(X_2, P)$ and $(\bar X_1, \bar X_2)$ is the intersection of $N_2(E_{2,M})$ with the line $X_2 = \bar X_2$ in the phase plane $(X_1, X_2)$ (see Fig. 2).
• Let us consider the following system
\[
\begin{aligned}
\dot X_1 &= -s X_1 (1 - X_1) - \lambda (X_2 - X_1), & X_1(0) &= X_1^0, \\
\dot X_2 &= -r X_2 (1 - X_2) + \lambda (X_2 - X_1) + q E_2 X_2, & X_2(0) &= X_2^0, \\
\dot P &= -\varepsilon \left( \frac{D(P)}{K_2} - q E_2 X_2 \right), & P(0) &= P^0 .
\end{aligned}
\tag{3}
\]
$T_1 = (\bar X_1(t), \bar X_2(t))$ is the trajectory solution of the system (3) with initial condition $(X_1^0 = \bar X_1, X_2^0 = \bar X_2)$ and $E_2 = E_{2,M}$. $T_2 = (\bar X_2(t), \bar P(t))$ is the trajectory solution of the system (3) with initial condition $(X_1^0 = \bar X_1, X_2^0 = \bar X_2)$ and $E_2 = E_{2,m}$. We notice that $T_1$ and $T_2$ are independent of $P$.
• $T_3 = (\hat P(t), \hat X_2(t))$ is the trajectory solution of the system (3) with initial condition $X_1^0$ given by the intersection of $T_1$ with the line $X_2 = \bar X_2$, $X_2^0 = \bar X_2$, $P^0 = \bar P$ and $E_2 = E_{2,m}$. $T_4 = (\breve P(t), \breve X_2(t))$ is the trajectory solution of the system (3) with initial condition $X_1^0$ given by the intersection of $T_2$ with the line $X_2 = \bar X_2$, $X_2^0 = \bar X_2$, $P^0 = \bar P$ and $E_2 = E_{2,M}$.
Fig. 2. The viability kernel.
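$\mathrm{Viab}(Z)$ is given by the following proposition.

Proposition. Under hypotheses H0 and H1, one has
\[
\mathrm{Viab}(Z) = \left\{ (X_1, X_2, P) \;\middle|\;
\begin{array}{l}
\bar X_1(t) \le X_1 \le \tilde X_1(E_{2,m}) \text{ and } \hat P(t) \le P \le \bar P \text{ when } \bar X_2 \le X_2 \le \bar X_2', \\[4pt]
\bar X_1(t) \le X_1 \le \bar X_1(t) \text{ and } \bar P \le P \le \bar P \text{ when } \bar X_2 \le X_2 \le \bar X_2, \\[4pt]
\tilde X_1(E_{2,M}) \le X_1 \le \bar X_1(t) \text{ and } \bar P \le P \le \breve P(t) \text{ when } \bar X_2' \le X_2 \le \bar X_2
\end{array}
\right\}
\]
This set is shown in Fig. 2.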
4. Conclusion

In this work, the impact and effectiveness of marine reserves on economic sustainability have been investigated. Our attention has been focused on the use of the viability kernel as an indicator of economic sustainability. This
use allows us to characterize the set of policies and states that do not drive the system into crisis situations outside the domain of constraints. We have shown that the use of marine reserves as a management measure establishes economic sustainability. It also ensures the sustainability of the stock by maintaining it above a minimum biomass level. It is clear that the model adopted in this study is quite stylized and built on simplistic assumptions. Future research is needed to relax some of these assumptions, in particular the fact that some parameters, such as the intrinsic growth rate and the carrying capacity, are assumed constant. Including capital dynamics through investment should also make the model more realistic. We also hope to incorporate and analyze behavior mechanisms such as cooperation with respect to the resource access issue. More generally, we believe that the viability approach may provide an interesting analytical framework to address some of the issues encountered in natural resource management and economic sustainability development.
References

1. Ami D., Cartigny P., and Rapaport A. (2005), Can Marine Protected Areas Enhance Both Economic and Biological Situations?, C. R. Biologies, 328, pp. 357-366.
2. Aubin, J.P., 1991. Viability Theory. Springer Verlag, Birkhäuser.
3. Béné, C., Doyen, L., Gabay, D., 1998. A Viability Analysis for a Bio-economic model. Cahiers du Centre de Recherche Viabilité-Jeux-Contrôle, No. 9815.
4. Charles, A.T. (1998), Living with uncertainty in fisheries: analytical methods, management priorities and the Canadian ground-fishery experience, Fisheries Research 37, pp. 37-50.
5. Clark C.W. (1990), Mathematical Bioeconomics: the Optimal Management of Renewable Resources, 2nd ed.: A Wiley-Interscience.
6. Clarke F. H., Ledyaev Yu. S., Stern R. J. and Wolenski P. R. (1998), Nonsmooth analysis and optimal control theory, Graduate Texts in Mathematics (Springer, 1998).
7. Doyen, L. and Béné, C. (2003), Sustainability of fisheries through marine reserves: a robust modeling analysis. Journal of Environmental Management 69, pp. 1-13.
8. Dubey B., Chandra P. and Sinha P. (2003), A model for fishery resource with reserve area, Nonlinear Analysis: Real World Appl. 4 (2003), pp. 625-637.
9. Gatto M., and Ghezzi L.L. (1992), Taxing Overexploited Open-access Fisheries: The Role of Demand Elasticities, Ecological Modelling, 80, pp. 185-198.
10. Hannesson R., Marine reserves: what would they accomplish?, Mar. Resour. Econ. 13, pp. 159-170 (1998).
11. Jerry M., Cartigny P., and Rapaport A. (2007), The study of the viability domain for a fishing problem with reserve, Session: Mathematical Modeling of Fisheries Management, Second Conference on Computational and Mathematical Population Dynamics (CMPD2), 16-20 July, 2007.
12. Jerry C. and Raïssi N., Analysis of management measures effects on the fisheries, Session: Mathematical Modeling of Fisheries Management, Second Conference on Computational and Mathematical Population Dynamics (CMPD2), 16-20 July, 2007.
13. Kamili A. (2006), Thesis entitled: Bio-économie et Gestion de la Pêcherie des Petits Pélagiques - Cas de l'Atlantique Centre Marocain.
14. Svizzero S. (1993), On the Introduction of Demand in a Fishery Model.
April 24, 2009
9:8
WSPC - Proceedings Trim Size: 9in x 6in
Salvador.Lou.Vega.novo2
192
LONG DISTANCE DISPERSAL AND ALLEE EFFECT IN A BIOLOGICAL INVASION∗

S. LOU VEGA∗†, W. C. FERREIRA Jr.∗∗

Inst. Matemática, Estatística e Computação Científica (IMECC), Universidade Estadual de Campinas, Campinas, São Paulo, Brasil
E-mail: ∗[email protected]; ∗∗[email protected]

We present a mathematical model which couples a population growth dynamic subject to an Allee effect and a long distance dispersal process. First we analyzed the local dynamic through the equilibria and their stability. For the spatial dynamic we used numerical simulations, which allowed us to observe the spatial expansion of the population and to track the spatial displacement of the invasion front. This permitted us to calculate the expansion speeds. We determined the influence of the Allee effect, reproductive capacity and long distance dispersal on the invasion speeds. We observed that an Allee effect turns accelerating expansion speeds into constant speeds. Expansion speeds decrease with Allee effect intensity but increase with the reproductive capacity of the population. Our results show that while dispersal contributes to expansion speeds, it also makes the population more susceptible to extinction.
∗ This work is supported by CNPq.
† Work supported by CNPq.

1. Introduction

Allee effect refers to any process by which any component of the fitness of an individual is enhanced as population density of conspecifics rises.22 When the overall fitness of the individuals increases with population density, this translates into a higher per capita growth rate and we have a demographic Allee effect.22 A demographic Allee effect establishes a positive relationship between the population density and the per capita growth rate. At the population level, the Allee effect occurs mainly at low population levels, often resulting in a critical density threshold, below which the population experiences a negative per capita growth, driving the population to extinction. When the Allee effect produces a negative per capita growth rate for a range of low population density, we have a strong Allee effect.26 If the population decreases its per capita growth rate at low density levels, but it never becomes negative, then we have a weak Allee effect.26

The Allee effect is important when considering the conservation of endangered or rare species, but recently it has taken relevance in biological invasions and their control.19 Allee effect is potentially present whenever a population experiences low density levels. This can occur during the introduction and establishment phases of an invasion, since the introduced population is usually small. Additionally, low density levels at the front of an invasion during the expansion stage give an opportunity for the Allee effect to emerge.3,24 The Allee effect can alter the dynamics of invasion. It can reduce expansion speeds of invasive organisms,15,17,25,26 and it is considered to be responsible for the lag phases observed in some invasions.3,5,24

Failure to locate mates is one of the most recognized mechanisms producing an Allee effect in sexually reproducing species.1 Although it is usually expected in animal species, it also occurs in plants, not only in animal-pollinated8 but also in wind-pollinated ones.3 Some evidence exists that wind-pollinated plants can experience pollen limitation.3,13 In plant invasions, factors affecting seed production can alter the dynamics of the invasion.
Davis et al.3 demonstrated that pollen limitation and reproductive success in plants were strongly influenced by local plant density. They detected an Allee effect due to pollen limitation at the front of the invasion of Spartina alterniflora, a wind-pollinated invasive weed.3 Models based on these results can give an insight into the dynamics of invasion of species subject to an Allee effect. In this project we attempted to model this situation by considering a reproductive growth model which takes into account an Allee effect due to pollen limitation.

The spatial spread of invasive species is frequently modeled by reaction-diffusion (RD) equations and by integro-difference equations (IDE).11 For the present model we used IDE, because they incorporate dispersal kernels explicitly, which allows us to model a variety of dispersal patterns, including long distance dispersal.
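To make the role of the dispersal kernel concrete, a single integro-difference step on a one-dimensional grid can be sketched as follows. This is only an illustration under stated assumptions: the generic form n_{t+1}(x) = sum over y of k(x - y) g(n_t(y)), the Laplace (double exponential) kernels, their scales, and the fraction of long-distance seeds are all hypothetical choices and are not taken from the model developed below.

```python
import math

# Sketch of one integro-difference step with a two-scale dispersal kernel.
# Kernel shapes, scales and the long-distance fraction are assumptions.


def laplace(x, scale):
    """Laplace (double exponential) density with the given scale."""
    return math.exp(-abs(x) / scale) / (2.0 * scale)


def mixture_kernel(radius, dx, local_scale, long_scale, long_fraction):
    """Weights for grid offsets -radius..radius, normalised to sum to 1."""
    raw = [(1.0 - long_fraction) * laplace(j * dx, local_scale)
           + long_fraction * laplace(j * dx, long_scale)
           for j in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def disperse(seeds, weights):
    """Redistribute seeds over the grid; seeds leaving the grid are lost."""
    radius = len(weights) // 2
    out = [0.0] * len(seeds)
    for j, amount in enumerate(seeds):
        if amount == 0.0:
            continue
        for k, w in enumerate(weights):
            i = j + k - radius
            if 0 <= i < len(out):
                out[i] += amount * w
    return out


def ide_step(density, growth, weights):
    """One generation: local growth at each site, then dispersal."""
    return disperse([growth(n) for n in density], weights)
```

Raising `long_fraction` moves weight from the centre of the kernel to its tails, which is the mechanism by which a small proportion of long-distance seeds can dominate the position of the invasion front.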
Dispersal is considered to be a multiple scale process. In plant dispersal, two principal scales are commonly defined, local dispersion (LD) and long distance dispersion (LDD). The LDD is considered to be a stochastic or occasional process. Even though the LDD is an occasional process, the expansion speed of an invasive species is very sensitive to it. The stochasticity of the LDD process is often represented by the proportion of seeds dispersing long distances. It is usually assumed that only 1 to 10% of the seeds disperse long distances. If the front of the invasion is determined mainly by this small fraction of seeds, it is reasonable that an Allee effect may arise at the front of the invasion, as Davis et al.3 have shown in the case of Spartina alterniflora. Using IDE and multiple scale dispersal (LD and LDD), we analyzed the influence of the Allee effect on the dynamics of the invasion. We show that the Allee effect turns accelerating expansion speeds into constant speeds, as has been demonstrated by other authors.26 We also show that an Allee effect not only reduces expansion speeds, but that the combination of dispersal and Allee effect makes the population more susceptible to extinction.

2. The Model

We developed a model which couples population growth and dispersal dynamics of an invasive plant. For the model, we considered a plant with two distinct stages: a) a growth stage, and b) a reproductive and dispersal stage. The model was developed in two phases: a) the local population growth model and b) incorporation of the dispersal process into the population growth model.

2.1. Reproductive dynamic

For the population growth model we divided the plant population into two subpopulations: a) reproductive plants, the part of the population which produces seeds, and b) the seeds. As a temporal scale, we took a generation to be the unit of time, and assumed that a seed takes one generation to become a reproductive plant.
We define $n_t$ as the density of reproductive plants in generation $t$, and $s_t$ as the quantity of seeds produced by the reproductive plants in that generation. We model the density of reproductive plants in generation $t + 1$ as
$$n_{t+1} = n_t - \mu n_t + \beta s_t \tag{1}$$
where $\mu$ is the proportion of plants that die during the $t$th generation, and $\beta$ is the proportion of seeds that germinate and establish as seedlings. The average number of seeds produced per plant represents the per capita reproductive capacity of the population. Considering that a reproductive plant produces on average $f$ ovules per generation, we define the per capita reproductive rate ($\hat r$) as the number of fertilized ovules per plant per generation:
$$\hat r = \lambda f \tag{2}$$
where $\lambda$ is the probability that an ovule is fertilized once a grain of pollen encounters the stigma of a flower. The total number of seeds produced per generation is given by
$$s_t = \hat r\, n_t \tag{3}$$
Seed production depends on how much pollen reaches the stigmas of the flowers. The amount of pollen landing on stigmas is positively correlated with pollen availability in the environment.3 Recent studies have determined that local plant density is a key factor in determining local pollen availability in wind-pollinated plants.3,13 Additionally, Davis et al.3 showed that pollen abundance drops drastically outside a stand of plants in a wind-pollinated invasive grass. Based on these results, we assume that pollination (pollen dispersal) is a local, point-like process relative to seed dispersal, and that pollen availability is determined by the plant density at that point. Since seed production depends on pollen availability, we introduce into the per capita reproductive rate a density-dependent function $P(n_t)$ that describes the probability of pollen-stigma encounters:
$$\hat r(n_t) = \lambda f \, P(n_t) \tag{4}$$
We use the rectangular hyperbola (RH) functional form $P(\theta, n_t) = \frac{n_t}{\theta + n_t}$, suggested by Dennis4 for describing the encounter probability between the sexes as a function of the population density. In our case the parameter $\theta > 0$ represents the population density at which a plant has half of its ovules fertilized. The RH functional form is a monotonically increasing function of $n_t$ and approaches unity as $n_t \to \infty$. Hence the per capita reproductive rate (4) is an increasing function of the population density, which corresponds to the definition of an Allee effect. In our case pollen
limitation is the mechanism that gives rise to the Allee effect. Introducing the Allee effect into the population growth model (1) we have
$$n_{t+1} = n_t - \mu n_t + \beta \hat r \left(\frac{n_t}{\theta + n_t}\right) n_t. \tag{5}$$
The parameter $\theta$ sets the intensity of the Allee effect. The growth model (5) does not include an overcrowding effect. A simple way to introduce this effect is to assume that the probability of a seedling reaching maturity declines linearly with plant density:
$$\gamma(n_t) = 1 - \frac{n_t}{k} \tag{6}$$
where $k$ is the plant density at which the probability of a seedling becoming a reproductive plant is zero. Introducing the term $\gamma(n_t)$ into (5) we obtain the full model representing the population growth of a plant subject to an Allee effect due to pollen limitation, and to intraspecific competition due to overcrowding:
$$n_{t+1} = n_t - \mu n_t + \beta \hat r \left(\frac{n_t}{\theta + n_t}\right) n_t \left(1 - \frac{n_t}{k}\right). \tag{7}$$
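As a minimal sketch of how the local model (7) can be iterated, the function below evaluates one generation of the map. The function name and the parameter values in the usage lines are illustrative assumptions; they are not values reported in the paper.

```python
def next_density(n, mu, beta, r_hat, theta, k):
    """One generation of Eq. (7) for a plant density n (hypothetical helper)."""
    pollination = n / (theta + n)      # P(theta, n): rectangular hyperbola
    seeds = r_hat * pollination * n    # seeds produced under pollen limitation
    establishment = 1.0 - n / k        # gamma(n), Eq. (6): crowding term
    return n - mu * n + beta * seeds * establishment


# Illustrative (assumed) parameter values: iterate three generations
n = 50.0
for _ in range(3):
    n = next_density(n, mu=0.3, beta=0.5, r_hat=4.0, theta=30.0, k=100.0)
```

Keeping the pollination, seed production and establishment terms as separate lines mirrors the way the model is built up from Eqs. (1) to (7), so each term can be checked against its equation.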
2.2. Incorporating seed dispersal

We model the dispersal process in a one-dimensional region $\Omega$. The seeds are the part of the population that disperses. We define $n_t(x)$ as the reproductive plant density at $x \in \Omega$ in the $t$th generation, and $\hat s_t(x)$ as the quantity of seeds at $x \in \Omega$ after seed dispersal. The term $\hat s_t(x)$ includes all the seeds produced by the plants in $\Omega$ that dispersed and arrived at $x \in \Omega$. We model the plant density distribution in generation $t + 1$ as
$$n_{t+1} = n_t - \mu n_t + \beta \hat s_t \left(1 - \frac{n_t}{k}\right). \tag{8}$$
If we exclude the dispersal process we have $\hat s_t = s_t$, and we recover the local growth model (7). To model the spatial dynamics of the population, we introduce seed dispersal kernels. A dispersal kernel $k(x, y)$ is a probability density function (pdf) for a seed released at a point $x$ to arrive at a location $y$. As a pdf, the dispersal kernel has the following properties
$$k(x, y) > 0 \quad \forall x, y \in \Omega \tag{9}$$
$$\int_\Omega k(x, y)\,dy = 1 \quad \forall x \in \Omega \tag{10}$$
We assume that the probability of a seed arriving at a location depends only on the relative distance between the origin and the destination of the seed ($|x - y|$). In other words, we are considering a homogeneous environment for the dispersal process. We also assume that dispersal is isotropic, which implies a symmetric dispersal kernel. The seed density distribution after seed dispersal, $\hat s_t$, is given by the product of the seeds produced at $y$ before seed dispersal, $s_t(y)$, and the dispersal kernel $k(x, y)$, integrated over all $y \in \Omega$, which yields the following expression
$$\hat s_t(x) = \int_\Omega k(x, y)\, s_t(y)\,dy. \tag{11}$$
The dispersal kernel represents the seed dispersal pattern. Mixed kernels describe seed dispersal when different processes or mechanisms govern the movement of seeds under different circumstances.12 Dispersal is considered to be a multiple-scale process in which various mechanisms may participate, each one generating a different scale of dispersal.12,18 Two spatial scales are commonly used when modeling plant dispersal: a) local dispersal (LD) and b) long distance dispersal (LDD).2,12,18 The standard mode of dispersal of the plant is said to generate the LD scale, while LDD is produced by non-standard means of dispersal or by unusual behavior of the standard means of dispersal. Considering that dispersal is a multiple-scale process, we use a mixed kernel to model seed dispersal. We consider that the majority of seeds disperse locally according to a Gaussian distribution, and a small fraction of seeds disperse long distances according to a fat-tailed probability distribution. The small fraction of the population that disperses long distances represents the occasional nature of LDD. We use a functional form which represents a family of exponential dispersal kernels
$$k(z) = \frac{c}{2\alpha\,\Gamma\!\left(\frac{1}{c}\right)}\, e^{-\left|\frac{z}{\alpha}\right|^{c}} \tag{12}$$
where $\alpha$ is the distance (scale) parameter and $c$ is the shape parameter of the distribution. We use $c = 2$ and $c = \frac{1}{2}$ to obtain the Gaussian and the fat-tailed distribution, respectively. For both kernels we set the distance parameter equal to unity ($\alpha = 1$). The combination of these two distributions forms
our mixed kernel. Coupling the population growth and dispersal dynamics yields the full model
$$n_{t+1} = n_t - \mu n_t + \beta \hat r\, \hat s_t \left(1 - \frac{n_t}{k}\right) \tag{13}$$
where $\hat s_t$ is given by
$$\hat s_t(x) = (1 - p) \int_\Omega k_1(|x - y|)\, \frac{n_t(y)}{\theta + n_t(y)}\, n_t(y)\,dy \;+\; p \int_\Omega k_2(|x - y|)\, \frac{n_t(y)}{\theta + n_t(y)}\, n_t(y)\,dy \tag{14}$$
and $k_1(x, y)$ and $k_2(x, y)$ are the Gaussian and fat-tailed distributions, respectively.

3. Analyses and Results

We analyze the nondimensional version of the model. We introduce the new dimensionless variables $N_t = \frac{n_t}{k}$ and $S_t = \frac{s_t}{k}$, which represent the plant and seed densities relative to the density $k$, and the new dimensionless parameter $\sigma = \frac{\theta}{k}$, which represents the plant density relative to $k$ at which a plant gets half of its ovules fertilized. In order to simplify the expression of the model, we gather the parameters $\hat r$ and $\beta$ into one new parameter $r = \beta \hat r$. To investigate our model, we first analyze its local behavior and then analyze the influence of the Allee effect on the spatial dynamics of the population.

3.1. Local dynamics

For the local dynamics, we analyze the growth model through its equilibrium points and their stability. We express the population growth model in the following general form:
$$N_{t+1} = F(N_t) = N_t\, f(N_t) \tag{15}$$
where $f(N_t)$ is the per capita population growth rate, given by
$$f(N_t) = 1 - \mu + r \left(\frac{N_t}{\sigma + N_t}\right) (1 - N_t). \tag{16}$$
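For readers who want the intermediate step, the following short derivation sketches how (15) and (16) follow from the dimensional model (7) under the substitutions $n_t = kN_t$, $\theta = k\sigma$ and $r = \beta\hat r$ defined above. It is a reconstruction from those definitions rather than a derivation given in the original text.

```latex
\begin{aligned}
\frac{n_{t+1}}{k} &= \frac{n_t}{k} - \mu\,\frac{n_t}{k}
  + \beta\hat r\,\frac{n_t}{\theta + n_t}\,\frac{n_t}{k}\left(1 - \frac{n_t}{k}\right),\\[4pt]
\frac{n_t}{\theta + n_t} &= \frac{kN_t}{k\sigma + kN_t} = \frac{N_t}{\sigma + N_t},\\[4pt]
N_{t+1} &= N_t\left[\,1 - \mu + r\,\frac{N_t}{\sigma + N_t}\,(1 - N_t)\right] = N_t\, f(N_t).
\end{aligned}
```

The density $k$ drops out entirely, so the local dynamics depend only on $\mu$, $r$ and $\sigma$.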
The equilibrium points $N^*$ satisfy $F(N^*) = N^*$, which implies from (15) that $N^* = 0$ or $f(N^*) = 1$. The trivial equilibrium point ($N_o = 0$) always exists. The real roots of $f(N_t) = 1$ yield the nontrivial equilibria
$$N_1 = \frac{1 - \gamma}{2} - \frac{1}{2}\sqrt{(1 - \gamma)^2 - 4\sigma\gamma} \tag{17}$$
$$N_2 = \frac{1 - \gamma}{2} + \frac{1}{2}\sqrt{(1 - \gamma)^2 - 4\sigma\gamma} \tag{18}$$
where $\gamma = \frac{\mu}{r}$. The nontrivial equilibria are functions of the parameters $(\sigma, \gamma)$, and they exist when the following condition is satisfied:
$$\sigma \le \frac{(1 - \gamma)^2}{4\gamma} \tag{19}$$
When the equality in (19) holds, we have $N_1 = N_2$, and the model has one nontrivial equilibrium point ($N^* = \frac{1 - \gamma}{2}$). This equilibrium point is a bifurcation point and satisfies $F'(N^*) = 1$. When the inequality in (19) is strict, the point $N^*$ splits into the two equilibrium points $N_1$ and $N_2$. $N_1$ takes values in the interval $[0, N^*]$ and $N_2$ in the interval $[N^*, 1 - \gamma]$ for any values of $\sigma$ and $\gamma$ satisfying condition (19). The nontrivial equilibria get closer to each other as the intensity of the Allee effect ($\sigma$) gets stronger (Figure 1).
[Figure 1: equilibria N plotted against σ; axis tick marked at σ = 1.20417.]
Fig. 1. Bifurcation diagram. Equilibria as a function of the bifurcation parameter σ. (—-) N2, (—) N1.
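Setting $f(N) = 1$ in (16) gives the quadratic $N^2 - (1 - \gamma)N + \gamma\sigma = 0$, whose roots are (17) and (18). The sketch below evaluates these roots numerically and checks them against $f$. The function names and the parameter values ($r = 2$, $\mu = 0.3$, $\sigma = 0.3$) are assumptions chosen for illustration.

```python
from math import sqrt


def per_capita_growth(N, r, mu, sigma):
    """Eq. (16): per capita growth rate f(N)."""
    return 1.0 - mu + r * N / (sigma + N) * (1.0 - N)


def nontrivial_equilibria(r, mu, sigma):
    """Return (N1, N2) from Eqs. (17)-(18), or None when condition (19) fails."""
    gamma = mu / r
    disc = (1.0 - gamma) ** 2 - 4.0 * sigma * gamma
    if disc < 0.0:
        return None                      # only the trivial equilibrium exists
    root = sqrt(disc)
    return (1.0 - gamma - root) / 2.0, (1.0 - gamma + root) / 2.0


# Illustrative (assumed) parameter values
equilibria = nontrivial_equilibria(r=2.0, mu=0.3, sigma=0.3)
if equilibria is not None:
    N1, N2 = equilibria
    # Both roots should satisfy f(N) = 1 up to rounding error
    residuals = [per_capita_growth(N, 2.0, 0.3, 0.3) - 1.0 for N in (N1, N2)]
```

Returning `None` when the discriminant is negative makes the loss of the nontrivial equilibria explicit instead of producing complex roots.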
The stability of the equilibria is determined by the value of the derivative of the population growth function $F(N_t)$, evaluated at the equilibrium point
$$F'(N^*) = f(N^*) + N^* f'(N^*). \tag{20}$$
An equilibrium point is stable if $-1 < F'(N^*) < 1$, and unstable if $F'(N^*) > 1$ or $F'(N^*) < -1$. For the trivial point we have $F'(N_o) = 1 - \mu$, so it is always stable if $\mu \neq 0$. In the case $\mu = 0$, $N_o$ is a bifurcation point and its stability changes. For the nontrivial equilibria, if $f'(N^*) > 0$ then $F'(N^*) > 1$, since $N^*$ satisfies $f(N^*) = 1$. In this case we have an exponentially unstable equilibrium. Whenever the two nontrivial equilibria exist, we observe from Figure 2 that $f'(N_1)$ is always positive; hence, from (20), $N_1$ is an exponentially unstable equilibrium.

[Figure 2: f(N) plotted against N.]
Fig. 2. Per capita growth function $f(N_t)$ for different values of parameters: (– –) $\sigma > \frac{(1-\gamma)^2}{4\gamma}$, no nontrivial equilibria; (—) $\sigma < \frac{(1-\gamma)^2}{4\gamma}$, two nontrivial equilibria satisfying $f(N^*) = 1$, the lower equilibrium corresponds to $N_1$ and the higher to $N_2$; (- - -) $\mu = 0$.
On the other hand, $f'(N_2)$ is always negative, hence $F'(N_2) < 1$. The stability of this equilibrium depends on condition (21) being satisfied:
$$2 + N_2 f'(N_2) > 0. \tag{21}$$
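Condition (21) is simply the lower stability bound $F'(N_2) > -1$ rewritten with $f(N_2) = 1$. The short derivation below spells this out and gives $f'$ explicitly; the expression for $f'$ is obtained here by differentiating (16) and is not stated in the original text.

```latex
\begin{aligned}
f'(N) &= r\,\frac{(1 - 2N)(\sigma + N) - N(1 - N)}{(\sigma + N)^2}
       = r\,\frac{\sigma - 2\sigma N - N^2}{(\sigma + N)^2},\\[4pt]
F'(N_2) &= f(N_2) + N_2 f'(N_2) = 1 + N_2 f'(N_2),\\[4pt]
-1 < F'(N_2) &\iff 2 + N_2 f'(N_2) > 0 .
\end{aligned}
```

Since $f'(N_2) < 0$ already guarantees $F'(N_2) < 1$, (21) is the only condition left to check for $N_2$.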
Substituting the expression for $N_2$ into (21), we obtain a new inequality in terms of the parameters, $G(\sigma, r, \mu) > 0$. We take the parameter $\sigma$ as the independent variable and solve $G(\sigma, r, \mu) = 0$. Figure 3 shows the real and positive roots of $G(\sigma, r, \mu) = 0$ as a function of $r$ and for different values of $\mu$. The region above each curve represents the combinations of the parameters $\sigma$ and $r$ for which the nontrivial equilibrium point $N_2$ is stable. The dynamics of the local population growth model can exhibit the three basic scenarios described by Boukal and Berec1 for population growth
[Figure 3: σ plotted against r.]
Fig. 3. Real and positive root of G(r, σ, µ) = 0 as a function of r and for different values of µ: (—) µ = 0.3; (– –) µ = 0.6; (- - -) µ = 0.9.
models with Allee effect: a) Unconditional extinction (UE), b) Extinction-survival (ES) and c) Unconditional survival (US). The possible outcome of the model depends on the parameters (σ, γ) (Table 1).

Table 1. Possible outcomes for the population growth model.
Scenario    Condition
UE          σ > (1−γ)²/(4γ)
ES          σ < (1−γ)²/(4γ)
US          µ = 0
The scenario (UE) is obtained when $N_o = 0$ is the only equilibrium of the population growth model. Since $N_o$ is a stable equilibrium, the population always goes to extinction regardless of the initial population density. When the three equilibria exist, the model shows an (ES) scenario. This scenario is the most familiar consequence of an Allee effect.1 The (ES) scenario establishes a critical density, which is represented by a nontrivial unstable equilibrium. Below the critical density, the per capita population growth is insufficient ($f(N_t) < 1$) for the survival of the population, which inevitably goes to extinction. In our reproductive dynamics, the critical density corresponds to the equilibrium $N_1$. The critical density in our model depends on the reproductive capacity ($r$), on the death rate of the population ($\mu$) and on the intensity of the Allee effect ($\sigma$). The critical density increases if the intensity of the Allee effect gets stronger, or if the fecundity of the plant (number of ovules per plant) decreases. If the population density is above the critical density, it will grow towards the maximum density that the population can reach. The maximum density
of the population is represented by the equilibrium $N_2$. From Figure 1 we see that the maximum density reached by the population decreases as the intensity of the Allee effect gets stronger. This means that the Allee effect not only establishes a critical density, but also influences the maximum density the population can reach. The maximum density that the population reaches is not always a stable equilibrium. Figure 3 shows that the Allee effect can stabilize the equilibrium $N_2$, in the sense that the population can have a higher reproductive capacity ($r$) without destabilizing $N_2$. This stabilizing effect has been documented for other population growth models with an Allee effect.6,20 Since chaotic behavior is an uncommon natural phenomenon in population dynamics, it has been suggested that models of sexual populations should incorporate an Allee effect. Parameter estimation based on experimental data yields chaotic behavior in population growth models without an Allee effect, even though in nature the population does not exhibit such chaotic behavior.20 The population can experience a US scenario when the death rate is zero or very small compared to the reproductive capacity ($\mu \ll r$). In this case the population experiences a very low per capita growth rate when the population density is low, but one that is sufficient for the survival of the population. This is a weak Allee effect, and there is no critical density for the population.

3.2. Spatial Dynamic: dispersal process

We examine the population growth and dispersal model (13) by numerical simulation. The simulation consisted of taking the model and calculating, recursively, the density distribution at each generation from an initial condition. We use the following initial condition for all the simulations
$$N_0(x) = \begin{cases} 0.7 & \text{for } |x| < 1 \\ 0 & \text{otherwise} \end{cases} \tag{22}$$
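The recursive calculation described above can be sketched as follows, using the nondimensional form of the model and a Riemann-sum convolution for the dispersal integral. This is an illustrative sketch and not the authors' simulation code: the grid, the number of generations, the parameter values, the truncation of the domain to $[-100, 100]$ and all function names are assumptions.

```python
from math import gamma

import numpy as np


def exp_power_kernel(z, alpha=1.0, c=2.0):
    """Kernel family of Eq. (12): c / (2 alpha Gamma(1/c)) * exp(-|z/alpha|^c)."""
    return c / (2.0 * alpha * gamma(1.0 / c)) * np.exp(-np.abs(z / alpha) ** c)


def step(N, k_mix, dx, r, mu, sigma):
    """One generation of the nondimensional growth-dispersal map."""
    seeds = N * N / (sigma + N)                        # Allee-limited seed output
    arrived = np.convolve(seeds, k_mix, mode="same") * dx  # dispersal integral
    return (1.0 - mu) * N + r * arrived * (1.0 - N)


# Assumed grid: odd number of points so that the kernel is centred on z = 0
x = np.linspace(-100.0, 100.0, 2001)
dx = x[1] - x[0]

# Assumed parameter values, chosen only for illustration
p, r, mu, sigma = 0.05, 2.0, 0.3, 0.3
k_local = exp_power_kernel(x, c=2.0)                   # Gaussian kernel (LD)
k_long = exp_power_kernel(x, c=0.5)                    # fat-tailed kernel (LDD)
k_mix = (1.0 - p) * k_local + p * k_long               # mixed kernel

N = np.where(np.abs(x) < 1.0, 0.7, 0.0)                # initial condition (22)
for _ in range(20):
    N = step(N, k_mix, dx, r, mu, sigma)
```

Because the kernel depends only on $|x - y|$, the dispersal integral is a convolution, which is why a single `np.convolve` call per generation is enough. On a finite grid the fat-tailed kernel loses a small amount of mass beyond the boundaries, so a wider domain may be needed for long runs.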
The numerical simulations allowed us to observe the spatial expansion of the population at each generation. They also allowed us to calculate the spatial displacement of the invasion front. We define the invasion front as the furthest point from the origin at which the population density exceeds an arbitrary preset density (for our simulations we used $N_f = 0.01$). We use the spatial displacement of the invasion front to calculate the expansion speed. The expansion speed is computed as the difference between the position of
the front in one generation and the previous generation. For the expansion speed, we averaged the displacements of the invasion front over the last half of the generations simulated. The expansion speeds were computed for different values of the parameters ($r$, $\mu$, $\sigma$, $p$) for which the equilibrium $N_2$ was stable.

3.2.1. Spatial expansion of the population

Figure 4 shows the density distribution of the plant population at various equally spaced time intervals, and for constant values of the parameters. We observe an initial phase in which the population first reaches its maximum density ($N_2$). Once the population reaches the maximum density, it begins to expand through space as two symmetric waves traveling in opposite directions. The two waves travel through space without changing their shape. The space behind the traveling front remains at the constant population density $N_2$.

[Figure 4: N_t(x) plotted against x, for x from −100 to 100.]
Fig. 4. Population density distribution of the growth and dispersal model with Allee effect, at different equally spaced time intervals (t = 20 generations).
After the initial phase, we observe that the distance between two wave fronts at equally spaced time intervals is approximately constant. This means that the front displacement per generation is constant through time. This is shown in Figure 5, where after some interval of time the front position grows linearly with time. A constant displacement per unit time corresponds to a constant speed of expansion. In the initial phase (approx. 15 generations), the front displacement is not constant: it accelerates, causing the graph in Figure 5 to be slightly convex in the initial phase. The initial expansion of the population is accelerating but then converges to a constant expansion speed.
The constant speed of expansion is referred to as the asymptotic speed of expansion. The convergence to a constant expansion speed was a general behavior in all the numerical simulations of the reproductive and dispersal model with Allee effect.

[Figure 5: front position x plotted against time t, for t from 0 to 100.]
Fig. 5. Invasive front displacement (x) over time (t) for the growth and dispersal model with Allee effect.
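The front and speed definitions used above (furthest point where the density exceeds $N_f = 0.01$, and the mean displacement over the last half of the generations) can be sketched as two small helpers. The function names and the toy data in the usage lines are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np


def front_position(x, N, threshold=0.01):
    """Furthest distance from the origin where the density exceeds the threshold."""
    above = np.nonzero(N > threshold)[0]
    if above.size == 0:
        return float("nan")            # no point exceeds the threshold
    return float(np.max(np.abs(x[above])))


def asymptotic_speed(fronts):
    """Mean per-generation front displacement over the last half of the run."""
    displacements = np.diff(np.asarray(fronts, dtype=float))
    return float(displacements[len(displacements) // 2:].mean())


# Toy (assumed) data: one density profile and a short series of front positions
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
N = np.array([0.0, 0.5, 0.7, 0.5, 0.02, 0.005])
front = front_position(x, N)
speed = asymptotic_speed([0.0, 1.0, 3.0, 5.0, 7.0])
```

Averaging only the later displacements discards the initial accelerating phase, so the estimate reflects the constant speed that the front settles to.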
The behavior of the spatial expansion for the growth and dispersal model with Allee effect is quite different from that of the model without the Allee effect. Figure 6 shows the spatial expansion of the model without the Allee effect. As in the growth and dispersal model with Allee effect, the population first reaches its maximum density ($N_2$), and then the population splits into two symmetric waves, each one traveling in the opposite direction.

[Figure 6: N(x) plotted against x, for x from −300 to 300.]
Fig. 6. Population density distribution of the growth and dispersal model without Allee effect, at different equally spaced time intervals (t = 20 generations).
The wave fronts are not evenly spaced at equally spaced time intervals. Figure 7 shows a front displacement that grows with time, which generates an accelerating expansion speed. We can observe that in just 15 generations, the invasion front has moved almost 300 distance units.

[Figure 7: front position x plotted against time t, for t from 0 to 15.]
Fig. 7. Invasive front displacement (x) over time (t) for the growth and dispersal model without Allee effect.
Besides the accelerating speed of expansion, the model without the Allee effect shows an oscillatory behavior around the maximum density $N_2$, although the parameters used for the simulation were the same as with the Allee effect. Additionally, the parameter values used for the simulation generate a stable equilibrium $N_2$ for the local reproductive dynamics. This shows that the dispersal process can destabilize the equilibrium $N_2$ and, on the other hand, that the stabilizing influence of the Allee effect extends to the spatial dynamics.

3.2.2. Asymptotic speeds, Allee effect and reproductive capacity

To investigate the influence of the Allee effect and the reproductive capacity of the plant on invasion speeds, we compute the asymptotic expansion speed as a function of the parameter $\sigma$, and for different values of the reproductive capacity ($r$), holding constant the death rate of the population ($\mu$) (Figure 8). The overall effect of the Allee effect on the invasion speed is to reduce its value and to turn it into a constant speed. All asymptotic speeds computed for any intensity of the Allee effect were constant. Figure 8 shows that even a slight Allee effect turns an accelerating invasion speed into a constant speed. The expansion speed decreases as the Allee effect intensity gets stronger, and is very sensitive when the Allee effect is small ($\sigma \approx 0$). A small variation
of $\sigma$ around zero produces a strong decrease in the expansion speed. As the intensity of the Allee effect increases, the variation in the expansion speed becomes smaller.

[Figure 8: expansion speed c plotted against σ, for σ from 0 to 2.]
Fig. 8. Expansion speeds (c) as a function of the intensity of the Allee effect (σ), and for different values of the reproductive capacity r: (—) r = 2; (– –) r = 3; (- - -) r = 4. The parameter µ = 0.3 was held fixed.
The Allee effect reduces the expansion speed, but an increase in the reproductive capacity of the plant produces higher invasion speeds. This is reasonable, since the more seeds are produced, the more seeds are dispersed to a new location, raising the chances of a successful colonization. Figure 8 shows that for the higher reproductive values ($r = 3$, $r = 4$) the intensity of the Allee effect must exceed a certain value in order to stabilize the equilibrium $N_2$. On the other hand, for each value of $r$ there exists a threshold value of $\sigma$ beyond which the population goes extinct. This value is smaller than the threshold value for the local dynamics (condition (19)). Table 2 shows the maximum σ-values for the local dynamics and for the spatial dynamics. The maximum values for the local dynamics correspond to $\sigma = \frac{(1-\gamma)^2}{4\gamma}$, and the values for the spatial dynamics were computed numerically with a precision of 0.01.

Table 2. Critical values of σ.
r (Reproductive Capacity)    σ (Local Dynamics)    σ (Spatial Dynamics)
2                            1.20                  0.93
3                            2.03                  1.58
4                            2.85                  2.24
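The local-dynamics column of Table 2 can be checked directly from the equality in condition (19). The sketch below does so under the assumption that µ = 0.3, the value held fixed in Fig. 8, and also reports the ratio of the spatial to the local threshold. The function name is hypothetical, and the spatial values are copied from Table 2 rather than recomputed.

```python
def local_threshold(r, mu):
    """Largest sigma allowing nontrivial equilibria: (1 - gamma)^2 / (4 gamma)."""
    gamma = mu / r
    return (1.0 - gamma) ** 2 / (4.0 * gamma)


# Spatial thresholds copied from Table 2; mu = 0.3 is an assumption (see Fig. 8)
table2_spatial = {2: 0.93, 3: 1.58, 4: 2.24}
for r, spatial in table2_spatial.items():
    local = local_threshold(r, mu=0.3)
    print(f"r={r}: local={local:.3f}, spatial/local={spatial / local:.2f}")
```

Comparing the two columns as a ratio gives a simple measure of how much dispersal lowers the Allee intensity that the population can tolerate.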
The comparison in Table 2 shows that the dispersal process might make a population subject to an Allee effect more susceptible to extinction. On the other hand, the reproductive capacity of the population should compensate for the dilution of the population due to the dispersal process, in order to avoid extinction. We conjecture that the dispersal capacity might be influenced or determined by the reproductive capacity in a sexual population. An increase in dispersal capacity should be accompanied by an increase in the reproductive capacity of the species.

3.2.3. Asymptotic speeds, Allee effect and long distance dispersal

Figure 9 shows the expansion speed as a function of the fraction of seeds that disperse long distances, and for different intensities of the Allee effect. The expansion speed increases with the fraction of seeds that disperse long distances. For small fractions the increase in expansion speed is less perceptible. Beyond a certain fraction of seeds that disperse long distances, the rate of change in speed increases considerably. The variation in speed is less evident when the intensity of the Allee effect is strong ($\sigma = 0.8$). In this case, long distance dispersal does not significantly increase the expansion speed.

[Figure 9: expansion speed c plotted against p, for p from 0 to 1.]
Fig. 9. Expansion speed (c) as a function of the fraction (p) of the seeds that disperse long distances, for different intensities of the Allee effect (σ): (—) σ = 0.3; (– –) σ = 0.5; (- - -) σ = 0.8.
Figure 9 shows a critical value of $p$ beyond which the expansion speed drops to zero. This result is caused by the Allee effect. As more and more seeds are dispersed over long distances, the population density after dispersal is diluted over a large area. As a result, the population density is
insufficient for its survival, and the population goes extinct. The stronger the intensity of the Allee effect, the smaller the critical value of $p$.

4. Discussion

Our population growth model displays all the possible outcomes considered for population growth models with Allee effect. This means that our model has a broad dynamical behavior, which allows us to represent various situations of the Allee effect. The population growth model can exhibit a weak or a strong Allee effect, each with a range of intensities. The analysis of the population growth model showed that the Allee effect not only establishes a critical density for the survival of the population, but also influences the maximum density the population reaches. This effect has been documented for other growth models with Allee effect,6 but its significance has been little discussed. We consider the influence of the Allee effect to be as important as the strength of intraspecific competition in determining the maximum density that the population reaches. Apart from influencing the maximum density a population reaches, our results show that the Allee effect has a stabilizing effect on the maximum density ($N_2$). The stabilizing influence of the Allee effect on the higher equilibrium has been documented for other growth models.6,20 The Allee effect makes the equilibrium more stable in the sense that the reproductive capacity can take higher values without destabilizing the equilibrium. Frequently, reproductive parameter estimates based on experimental data yield an unstable equilibrium (growing oscillations and chaos) in growth models without an Allee effect, even though the natural population does not exhibit such behavior.20 Scheuring20 suggests that growth models for sexual populations without an Allee effect are oversimplified, and that the introduction of an Allee effect makes the growth model more realistic.
The stabilizing influence of the Allee effect on the higher equilibrium can be extended to the spatial dynamics, as is shown in our results. We show that the dispersal process destabilized the higher equilibrium in the growth model without the Allee effect. It has been argued that a dispersal process modeled as diffusion has a destabilizing effect on equilibria.9 But when dispersal is modeled at a finer scale, say between patches, the dispersal process can have a stabilizing effect on equilibria.10 Hastings9 argues that the different results lie in how the spatial variations are incorporated into the model. When modeling dispersal as diffusion, it is assumed that the dispersal process occurs at a large spatial scale and not between a few neighboring patches.9 Our model considers a dispersal process at a large spatial scale,
hence it exhibits a destabilizing effect, in agreement with the diffusion models. The influence of the Allee effect in turning accelerating expansion speeds into constant ones has been documented in other works.15 Fat-tailed dispersal kernels are frequently used to represent long distance dispersal,2 and they are responsible for the accelerating expansion speeds.15 Our results show that the Allee effect does turn accelerating expansion speeds into constant speeds. The reduction in speed as the intensity of the Allee effect gets stronger has been documented in other works.17,25,26 Our results on the expansion speed as a function of the intensity of the Allee effect behaved as documented in the works cited. The negative influence on establishment caused by an Allee effect tends to be intensified by dispersal.16 Populations at the front of an invasion have low density and are frequently isolated. In these populations, immigration does not compensate for emigration and the result is a net loss due to dispersal. This situation is supported by our results, which show that dispersal makes the population more susceptible to extinction. As the dispersal process is introduced into the local dynamics, the population exhibits unconditional extinction at a weaker intensity of the Allee effect. The reproductive capacity of the population does not compensate for the rate of spread of the population, and the population is diluted in space. The population drops below the critical density and goes extinct. “While dispersal enhances the spread of an invading population, it detracts from its ability to establish”.16 It has been shown that the introduction of long distance dispersal increases the expansion speed, and that the expansion speed is very sensitive to the fraction of the population that disperses long distances.2,7 A small fraction (> 10%) of the population that disperses long distances results in a considerable increase in the expansion speed.
According to our results, when the population is subject to an Allee effect, the increment is not so evident for small fractions (> 0.05%). The increment in speed becomes perceptible for greater fractions. For an intense Allee effect, long distance dispersal does not make a substantial difference in the expansion speed as compared with local dispersal. The outlying populations that originate from the seeds that disperse long distances do not survive the intense Allee effect, so that LDD does not contribute substantially to the spreading speed. Our results show that the expansion speed increases with the fraction that disperses long distances, but also that there is a threshold value of $p$ beyond which the population goes extinct. The threshold for the fraction that disperses long distances
has also been documented in other works.7 As the fraction increases, the population is diluted over a larger space, favoring the emergence of the Allee effect and making the population susceptible to extinction. The stronger the intensity of the Allee effect, the smaller the threshold value the population can tolerate.

References
1. D.S. Boukal and L. Berec, J. theor. Biol. 218, 375 (2002).
2. J. Clark, Am. Nat. 152, 204 (1999).
3. H. Davis, C.M. Taylor, J.G. Lambrinos and D.R. Strong, Proc. Natl. Acad. Sci. 101, 13804 (2004).
4. B. Dennis, Nat. Res. Model. 3, 481 (1989).
5. J.M. Derek, A.M. Liebhold, P.C. Tobin and O.N. Bjornstad, Nature 444, 361 (2006).
6. M.S. Fowler and G.D. Ruxton, J. theor. Biol. 215, 39 (2002).
7. F. Takasu, N. Yamamoto, K. Kawasaki, K. Togashi, Y. Kishi and N. Shigesada, Biol. Inv. 2, 141 (2000).
8. M.H. Wang, M. Kot, Am. Nat. 171, 83 (1999).
9. A. Hastings, Ecology 71, 426 (1990).
10. A. Hastings, Ecology 74, 1362 (1993).
11. A. Hastings, K. Cuddington, D.F. Davis, C.J. Dugaw, S. Elmendorf, et al., Ecol. Lett. 8, 91 (2006).
12. S.I. Higgins, R. Nathan and M.L. Cain, Ecology 84, 1945 (2003).
13. W.D. Koenig and M.V. Ashley, Tren. Ecol. and Evol. 18, 157 (2003).
14. M. Kot, J. Math. Biol. 30, 413 (1992).
15. M. Kot, M. Lewis, P. van den Driessche, Ecology 77, 2027 (1996).
16. A.M. Liebhold and P.C. Tobin, Annu. Rev. Entomol. 53, 387 (2008).
17. M. Lewis, P. Kareiva, Theor. Popul. Biol. 43, 141 (1993).
18. R. Nathan and H.C. Muller-Landau, Tre. Ecol. Evol. 15, 278 (2000).
19. C.M. Taylor and A. Hastings, Ecol. Lett. 8, 895 (2005).
20. I. Scheuring, J. theor. Biol. 199, 407 (1999).
21. N. Shigesada, K. Kawasaki, Y. Takeda, Am. Nat. 146, 229 (1995).
22. P.A. Stephens, W.J. Sutherland and R.P. Freckleton, Oikos 87, 185 (1999).
23. P.A. Stephens and W.J. Sutherland, Trends Ecol. Evol. 14, 401 (2002).
24. I. Parker, Nature 101, 13695 (2004).
25. M.H. Wang, M. Kot, Math. Biosc. 171, 83 (2001).
26. M.H. Wang, M. Kot, J. Math. Biol. 44, 150 (2002).
MODELING THE RISK OF FALCIPARUM MALARIA FOR TRAVELERS TO HOLOENDEMIC REGIONS

E. MASSAD1,2, M. N. BURATTINI1, F. A. B. COUTINHO1

1 School of Medicine, University of São Paulo and LIM 01-HCFMUSP, CEP: 05405-000, São Paulo, S.P., Brazil
2 London School of Hygiene and Tropical Diseases, U.K.
Malaria has emerged as a frequent problem in international travelers. The risk depends on destination, duration and season of travel. However, data to quantify the true risk for travelers of acquiring malaria are lacking. Methods: We used mathematical models to estimate the risk of non-immune persons acquiring malaria when traveling to the Amazon region. From the force of infection we calculated the risk of malaria as a function of duration of stay and season of arrival. Our data highlight that the risk for non-immune travelers of acquiring malaria in the Amazon region is substantial but varies greatly with seasons and epidemic cycles. For instance, for a traveler who stays in the Amazon for 120 days during the high season, the risk of acquiring malaria was 0.16%. Risk estimates based on mathematical modelling will help the travel medicine provider give better evidence-based advice to travelers to malarial countries.
1. Introduction
Malaria is a parasitic infection of red blood cells and the liver caused by any of four related species: Plasmodium falciparum, P. vivax, P. ovale, and P. malariae. There is recent evidence that a fifth species, P. knowlesi, previously thought to cause malaria only in monkeys, may also cause malaria in humans (Garnham, 1988). Malaria is transmitted by female Anopheline mosquitoes, which typically bite between dusk and dawn (White, 1996). The chief symptoms are fever, sweats, chills, headache, body aches, and malaise, typically accompanied by anemia and enlargement of the spleen. Complications may include jaundice, low blood sugar, kidney failure, fluid in the lungs (pulmonary edema or ARDS), and circulatory collapse (Krogstad, 1999). Infections caused by P. falciparum are particularly dangerous because of a propensity for infected red blood cells to obstruct blood flow to the brain, causing seizures, impaired consciousness, and sometimes coma (Krogstad, 1999).
Malaria risk (P. vivax 75%, P. falciparum 25%) is present in most forested areas below 900 m within the nine states of the “Legal Amazonia” region (Acre, Amapá, Amazonas, Maranhão (western part), Mato Grosso (northern part), Pará (except Belém City), Rondônia, Roraima and Tocantins (western part)). Figure 1 shows the map of malaria in Brazil (FUNASA, 2001; WHO, 2004). Transmission intensity varies from municipality to municipality but is higher in jungle areas of mining, lumbering and agricultural settlements less than 5 years old than in the urban areas, including large cities such as Boa Vista, Macapá, Manaus, Marabá, Pôrto Velho, Rio Branco and Santarém, where transmission occurs on the periphery of these cities. In the states outside “Legal Amazonia”, malaria transmission risk is negligible or non-existent. Multidrug-resistant P. falciparum has also been reported (WHO, 2008).
In 2002, Brazil reported approximately 40% of the total number of malaria cases in the Americas. Almost 99% of cases occur in the Legal Amazon Region, where no more than 12% of the country’s population resides. An increase in the number of cases began in the 1980s. In 1992, 572 000 cases were reported and a peak of 610 878 cases was reported in 2000. By 2002, the number of cases was reduced to 349 873 among 2.12 million slides examined, giving a 16.5% smear positivity rate. A slight rebound to 379 500 cases in 2003 was reportedly associated with population movement to the periphery of large cities as well as to the Legal Amazon Region (WHO, 2005).
Therefore, the total number of malaria cases in Brazil is oscillating around 600,000 cases per year, with a prevalence of falciparum that has been increasing linearly over the last decade and is expected to be around 30% currently (Ladislaw, 2006). Brazil receives the highest number of foreign visitors in South America and is the second largest inbound market (after Mexico) in Latin America (Ecobrasil, 2008). In 2005 a record of 5.4 million international arrivals was registered for Brazil, with 57% of tourists coming from North America and Europe (Embratur, 2006). Of the total arrivals, 44% were for leisure tourism. Preliminary results of the analysis of the profile of international demand for 2004/2005 by Embratur (2006) show that 39% of tourists cite Brazil’s natural beauty as a motivation to visit the country. However, only 7% of leisure tourists (which translates into 3% of total tourists) say they
visit the Brazilian Amazon, arguably the destination most associated with ecotourism for the international market. Therefore, the total number of foreign people visiting malaria endemic areas of Brazil can be estimated at around 160,000 per year (3% of 5.4 million arrivals). In addition, studies done by FIPE (Embratur, 2006) indicate that, in number of tourists, the domestic market is much larger. The latest study indicates that there is a core population of about 11 million tourists, who travel by plane and can be considered long haul domestic tourists. So, the total number of domestic tourists to the Amazon is around 330,000, totaling almost half a million people exposed to the risk of catching malaria every year (Ecobrasil, 2008). The objective of this work is to estimate, through a mathematical model, the risk of catching falciparum malaria, the deadliest form of malaria, in the endemic regions of Brazil. This paper is organized as follows. After this brief introduction we present the model to estimate the risk of falciparum malaria. The next section deduces the basic and effective reproduction numbers for the malaria
model. Next we estimate, through a mathematical model, the risk of malaria acquisition for a traveller to a holoendemic area. The final section discusses our results.
2. The Model
The model describes the dynamics of dengue or malaria in its two 'subpopulations', namely, human hosts and mosquitoes. The population densities, therefore, are divided into susceptible humans, denoted S_H, infected humans, I_H, recovered (and immune) humans, R_H, total humans, N_H, susceptible mosquitoes, S_M, infected and latent mosquitoes, L_M, and infected and infectious mosquitoes, I_M. In addition, we separated a cohort (denoted by primes and called "probe") that is followed through the entire outbreak and that is used to calculate the probability that an individual gets malaria infection, according to equation (12) described below. In this cohort, N'_H is the number of humans, and S'_H, I'_H and R'_H are the numbers of susceptible, infected and recovered humans, respectively (Massad et al., 2008). The model's dynamics is described by the following set of equations:
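$$
\begin{aligned}
\frac{dS_H}{dt} &= -ab I_M \frac{S_H}{N_H} - \mu_H S_H + \sigma_H R_H + r_H N_H \left(1 - \frac{N_H}{\kappa_H}\right) \\
\frac{dI_H}{dt} &= ab I_M \frac{S_H}{N_H} - (\mu_H + \gamma_H + \alpha_H) I_H \\
\frac{dR_H}{dt} &= \gamma_H I_H - \mu_H R_H - \sigma_H R_H \\
\frac{dS_M}{dt} &= -\mu_M S_M - ac S_M \frac{I_H + I'_H}{N_H} + r_M N_M \left(1 - \frac{N_M}{\kappa_M}\right) \left[c_s - d_s \sin(2\pi f t)\right] \\
\frac{dL_M}{dt} &= ac S_M \frac{I_H + I'_H}{N_H} - e^{-\mu_M \tau} ac S_M(t-\tau) \left[\frac{I_H(t-\tau) + I'_H(t-\tau)}{N_H(t-\tau)}\right] - \mu_M L_M \\
\frac{dI_M}{dt} &= e^{-\mu_M \tau} ac S_M(t-\tau) \left[\frac{I_H(t-\tau) + I'_H(t-\tau)}{N_H(t-\tau)}\right] - \mu_M I_M
\end{aligned}
\tag{1}
$$
The evolution equations for the probe cohort are: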
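$$
\begin{aligned}
\frac{dS'_H}{dt} &= -ab I_M \frac{S'_H}{N_H} - \mu_H S'_H \\
\frac{dI'_H}{dt} &= ab I_M \frac{S'_H}{N_H} - (\mu_H + \gamma_H + \alpha_H) I'_H \\
\frac{dR'_H}{dt} &= \gamma_H I'_H - \mu_H R'_H - \sigma_H R'_H
\end{aligned}
\tag{2}
$$
for
$$
\begin{aligned}
N'_H &= S'_H + I'_H + R'_H \\
N_H &= N'_H + S_H + I_H + R_H \\
N_M &= S_M + L_M + I_M.
\end{aligned}
\tag{3}
$$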
The model's parameters are as follows: a is the mosquitoes' daily biting rate; b is the proportion of infected bites that are actually infective for humans; c is the proportion of bites that are actually infective for mosquitoes; µ_H is the human mortality rate; γ_H is the human recovery rate from parasitaemia; r_H is the human birth rate; α_H is the malaria-induced mortality rate of humans; σ_H is the rate of loss of immunity to malaria; µ_M is the mosquitoes' daily mortality rate; τ is the extrinsic incubation period; r_M is the mosquitoes' fertility rate; κ_H is the human carrying capacity and κ_M is the mosquitoes' carrying capacity. We introduced the term [c_s − d_s sin(2πft)] in the susceptible mosquito population in order to simulate seasonality in the mosquito population (Burattini et al., 2008; Forattini et al., 1993). The parameters c_s and d_s modulate the intensity of the seasonality, mimicking deep or light winters, depending on the difference between those parameters' values. The model result, with parameters as in table 1, is shown in figure 2.
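The seasonal term can be inspected on its own. The following is a minimal sketch, not the authors' code: it evaluates c_s − d_s sin(2πft) with the values listed in Table 1, assuming t is measured in days. The function name `seasonal_factor` is introduced here for illustration only.

```python
import math

def seasonal_factor(t, c_s=7.0e-2, d_s=6.9e-2, f=2.7e-3):
    """Seasonal modulation c_s - d_s*sin(2*pi*f*t) of mosquito recruitment.

    t is assumed to be in days; the defaults are the Table 1 values.
    """
    return c_s - d_s * math.sin(2.0 * math.pi * f * t)

# With f = 2.7e-3 per day, one full cycle lasts 1/f days (roughly one year).
period_days = 1.0 / 2.7e-3

# Sample the factor once per day over one cycle.
values = [seasonal_factor(t) for t in range(0, 371)]
low, high = min(values), max(values)
print(f"period = {period_days:.0f} days, min = {low:.4f}, max = {high:.4f}")
```

Because c_s and d_s are nearly equal, the factor drops close to zero once per cycle, which corresponds to the "deep winter" case mentioned above; a smaller d_s relative to c_s would give a milder seasonal trough.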
Table 1. Model's parameters

Parameter  Biological interpretation               Value
a          Mosquitoes' biting rate                 0.3 days^-1
b          Probability of infection to humans      8.8 × 10^-2 days^-1
c          Probability of infection to mosquitoes  8.7 × 10^-2 days^-1
µH         Humans' mortality rate                  3.9 × 10^-5 days^-1
γ          Recovery rate                           5.5 × 10^-3 days^-1
σ          Loss of immunity                        3.3 × 10^-2 days^-1
α          Malaria's mortality rate                1.0 × 10^-2 days^-1
rH         Humans' reproductive rate               8 days^-1
κH         Humans' carrying capacity               1.6 × 10^7
µM         Mosquitoes' mortality rate              1.0 × 10^-1 days^-1
τ          Extrinsic incubation period             7 days
rM         Mosquitoes' reproductive rate           4 days^-1
κM         Mosquitoes' carrying capacity           2.0 × 10^8 days^-1
cs         Seasonality factor                      7.0 × 10^-2 days^-1
ds         Seasonality factor                      6.9 × 10^-2 days^-1
f          Frequency of seasonality                2.7 × 10^-3 days^-1
Fig. 1. Model’s output. In black the number of infective humans and in gray the number of infective mosquitoes.
2.1. The Basic Reproduction Number
The central parameter related to the intensity of transmission of infections is the so-called basic reproduction number (R0), defined by Macdonald (1952, see also Massad et al., 1994) as the number of secondary infections produced by a single infective in an entirely susceptible population (see
next section). Originally applied in the context of malaria, R0 is a function of the vector population density relative to the host population, m, the average daily biting rate of the vector, a, the host susceptibility, b, the vector mortality rate, µ, the parasite extrinsic incubation period in days, n, and the parasitemia recovery rate, r, according to the (now) historical equation:
$$
R_0 = \frac{m a^2 b \exp[-\mu n]}{r \mu}
\tag{4}
$$
(actually, Macdonald denoted R0 as z0 in his original paper). From the definition of the basic reproduction number it can be demonstrated that if R0 is not greater than one, that is, when an index case (the first infective individual) is not able to generate at least one new infection, the disease dies out. Hence, in the original Macdonald analysis, R0 coincides with the threshold for the infection persistence. Let us now deduce the expression of R0 for system (1). We begin by taking only the infective compartments of system (1):
$$
\begin{aligned}
\frac{dI_H}{dt} &= ab I_M \frac{S_H}{N_H} - (\mu_H + \gamma_H + \alpha_H) I_H \\
\frac{dI_M}{dt} &= e^{-\mu_M \tau} ac S_M(t-\tau) \frac{I_H(t-\tau)}{N_H(t-\tau)} - \mu_M I_M
\end{aligned}
\tag{5}
$$
where S_H and S_M are the numbers of susceptible humans and vectors, respectively. To deduce the threshold for the disease to establish itself in the human population we analyze the stability of the trivial solution S_H = N_H, S_M = N_M, I_M = I_H = 0, that is, the absence of the infection. Linearizing system (5) around the trivial solution we get
$$
\begin{aligned}
\frac{di_H}{dt} &= ab\, i_M - (\mu_H + \gamma_H + \alpha_H)\, i_H \\
\frac{di_M}{dt} &= \frac{S_M(t-\tau)}{N_H(t-\tau)}\, a c\, e^{-\mu_M \tau}\, i_H(t-\tau) - \mu_M i_M
\end{aligned}
\tag{6}
$$
where i_M and i_H are small deviations from zero. From system (6) we get the following characteristic equation
$$
\begin{vmatrix}
-(\lambda + (\mu_H + \gamma_H + \alpha_H)) & ab \\
\dfrac{S_M(t-\tau)}{N_H(t-\tau)}\, a c\, e^{-\mu_M \tau} e^{-\lambda \tau} & -(\lambda + \mu_M)
\end{vmatrix} = 0
\tag{7}
$$
or
$$
\lambda^2 + (\mu_M + (\mu_H + \gamma_H + \alpha_H)) \lambda + \mu_M (\mu_H + \gamma_H + \alpha_H) - \frac{S_M(t-\tau)}{N_H(t-\tau)}\, a c\, e^{-\mu_M \tau} e^{-\lambda \tau}\, ab = 0
\tag{8}
$$
It follows that the roots of equation (7) or (8) have negative real parts if
$$
\mu_M (\mu_H + \gamma_H + \alpha_H) - \frac{S_M(t-\tau)}{N_H(t-\tau)}\, a c\, e^{-\mu_M \tau}\, ab > 0.
\tag{9}
$$
The above result is the same as that obtained by the intuitive Macdonald's approach. The expression for the basic reproduction number, therefore, is
$$
R_0 = \frac{S_M(t-\tau)}{N_H(t-\tau)}\, a c\, e^{-\mu_M \tau}\, ab\, \frac{1}{\mu_M (\mu_H + \gamma_H + \alpha_H)}
\tag{10}
$$
and the effective reproduction number is
$$
R(t) = \frac{S_M(t-\tau)}{N_H(t-\tau)}\, a c\, e^{-\mu_M \tau}\, ab\, \frac{1}{\mu_M (\mu_H + \gamma_H + \alpha_H)}\, \frac{S_H(t)}{N_H(t)}.
\tag{11}
$$
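Equations (10) and (11) can be evaluated directly once the state of the system is known. The sketch below is illustrative only: the function names are introduced here, the rate parameters are those of Table 1, and the vector-to-host ratio S_M(t − τ)/N_H(t − τ) = 2 and the susceptible fraction of 0.6 are hypothetical values chosen for the example, not taken from the model output.

```python
import math

def basic_reproduction_number(s_m_lag, n_h_lag, a, b, c, mu_m, tau,
                              mu_h, gamma_h, alpha_h):
    """Equation (10): R0 from the lagged vector-to-host ratio and the rates."""
    vector_to_host = s_m_lag / n_h_lag                 # S_M(t - tau) / N_H(t - tau)
    surviving_incubation = math.exp(-mu_m * tau)       # mosquitoes surviving the latency
    infectious_period_terms = mu_m * (mu_h + gamma_h + alpha_h)
    return vector_to_host * a * c * surviving_incubation * a * b / infectious_period_terms

def effective_reproduction_number(r0, s_h, n_h):
    """Equation (11): R(t) = R0 * S_H(t) / N_H(t)."""
    return r0 * s_h / n_h

# Table 1 rates; the ratio 2.0 / 1.0 is a hypothetical vector-to-host ratio.
r0 = basic_reproduction_number(
    s_m_lag=2.0, n_h_lag=1.0,
    a=0.3, b=8.8e-2, c=8.7e-2,
    mu_m=1.0e-1, tau=7.0,
    mu_h=3.9e-5, gamma_h=5.5e-3, alpha_h=1.0e-2,
)
# Hypothetical susceptible fraction of 60% in the human population.
r_t = effective_reproduction_number(r0, s_h=0.6, n_h=1.0)
print(r0, r_t)
```

R0 scales linearly with the vector-to-host ratio, so the seasonal swings in the mosquito population translate directly into swings of R(t) around the threshold.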
The value of R(t) oscillates around the threshold value 1, as can be seen in figure 3. Whenever R(t) > 1 the number of cases grows, whereas when R(t) < 1 the number of cases diminishes.
Fig. 2. Effective reproduction number (curve) as related to threshold for infection (line).
Note that the value of R(t) remains above unity for a greater part of the time, which implies that the disease maintains itself, oscillating in the population, as shown in figure 2.
3. Estimating the risk of malaria
In order to calculate the probability of an individual acquiring malaria infection, π_mal, after the introduction of a single case in an entirely susceptible population, we consider the probe cohort that is followed through the entire outbreak. The probability of infection in this self-limiting outbreak is then given by the following expression:
$$
\pi_{mal} = \frac{\int_0^{\infty} S'_H(t)\, h_{mal}(t)\, dt}{N'_H(0)}.
\tag{12}
$$
In the above equation, S'_H(t) and N'_H(t) are respectively the number of susceptible hosts and the total population of the cohort used as a probe, and h_mal(t) is the force of infection of malaria, defined as the per capita number of new cases per time unit (Massad et al., 2008) and expressed as
$$
h_{mal}(t) = a\, b_{mal}\, \frac{I_M(t)}{N_H(t)},
\tag{13}
$$
where I_M(t) is the number of infected mosquitoes. One can also calculate the risk of infection for a traveller who arrives in the affected region at week Ω after the outbreak is triggered and remains there for ω weeks, denoted π_mal^travelers. This is done by setting the limits of integration in equation (12) as:
$$
\pi_{mal}^{travelers} = \frac{\int_{\Omega}^{\Omega + \omega} S'_H(t)\, h_{mal}(t)\, dt}{N'_H(\Omega)}.
\tag{14}
$$
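Equation (14) is a definite integral that can be approximated numerically once S'_H(t) and h_mal(t) are available from a simulation. The following is a minimal sketch under simplifying assumptions that are not part of the paper: the force of infection is held constant, mortality in the probe cohort is neglected during the stay, and time is measured in days. In that special case S'_H decays exponentially and the integral has the closed form 1 − exp(−hω), which serves as a check on the quadrature. The names `traveler_risk`, `h`, and `n_probe` are hypothetical.

```python
import math

def traveler_risk(susceptible, force_of_infection, n_probe_at_arrival,
                  arrival, stay, steps=10_000):
    """Trapezoidal approximation of equation (14).

    susceptible(t) and force_of_infection(t) are callables giving S'_H(t)
    and h_mal(t); arrival and stay play the roles of Omega and omega.
    """
    dt = stay / steps
    total = 0.0
    for k in range(steps + 1):
        t = arrival + k * dt
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * susceptible(t) * force_of_infection(t)
    return total * dt / n_probe_at_arrival

# Illustrative values only (not from the paper): constant force of infection.
h = 1.0e-4          # per person per day
n_probe = 1000.0    # size of the probe cohort at arrival
arrival, stay = 0.0, 120.0

risk = traveler_risk(
    susceptible=lambda t: n_probe * math.exp(-h * (t - arrival)),
    force_of_infection=lambda t: h,
    n_probe_at_arrival=n_probe,
    arrival=arrival,
    stay=stay,
)
closed_form = 1.0 - math.exp(-h * stay)
print(risk, closed_form)
```

In the full model, the two callables would instead interpolate the simulated trajectories of S'_H and h_mal, so that the risk depends on the season of arrival through I_M(t).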
We calculated the risk for a traveller who arrives in the Amazonian region at four different times of the year, namely, in the dry season (winter), in the spring, in the wet season (summer) and in the fall. The model assumes that the Amazonian region has around 300,000 cases of falciparum malaria per year and that the region is in a holoendemic situation. The result is shown in figure 4.
4. Discussion
Previous studies have attempted to determine the cumulative risk of acquiring malaria in travelers (Tada et al., 2008), but the given incidence
Fig. 3. Risk of catching falciparum malaria for travellers as a function of the period of the year they arrive and the time remaining in the area.
rates are not generalizable to all travelers at all times, as malaria incidence varies greatly from year to year (House and Ehlers, 2008). Mathematical models are well suited to take into account seasonality and annual variations. To the best of our knowledge, this is the first time that risk estimates for acquiring malaria when traveling to malaria endemic countries have been calculated using mathematical modeling. Our models are robust and have been tested extensively on Amazonian data (Wyse, 2007). Our data highlight that the risk for non-immune travelers to acquire malaria in the Amazonian region is substantial but varies greatly with seasons and epidemic cycles. For example, the risk is almost 10-fold higher for a traveller who arrives in the fall and remains in the area for 120 days than for a traveller who arrives in the spring and remains in the area for the same amount of time. As expected, the risk increases with duration of stay. A limitation of our modelling is that we were not able to take into account potential pre-existing malaria immunity. No data are currently available on the actual incidence of malaria in travellers to the Amazon, but this would be an important study to perform to test our models. Similar risk estimates can be calculated for other malaria endemic countries provided that local data on the force of infection and variations over time are available. Such risk estimates will help the travel medicine provider give better evidence-based pre-travel advice for travelers to malaria endemic countries.
Acknowledgement This work was partially supported by CNPq and LIM01-HCFMUSP.
References
1. Burattini MN, Chen M, Chow A, Coutinho FA, Goh KT, Lopez LF, Ma S, Massad E. 2008. Modelling the control strategies against dengue in Singapore. Epidemiol Infect 136(3): 309-319.
2. Ecobrasil, 2008. http://www.ecobrasil.org.br
3. Embratur, 2006. http://www.ecoviagem.com.br/fique-por-dentro/noticias/turismo/embratur-2006-foi-o-melhor-ano-da-historia-do-turismo-6470.asp
4. Forattini OP, Kakitani I, Massad E, Marucci D. 1993. Studies on mosquitoes (Diptera: Culicidae) and anthropic environment 4 - Survey of resisting adults and synanthropic behaviour in South-Eastern Brazil. Revista de Saúde Pública 27(6): 398-411.
5. FUNASA 2001. Manual de Terapêutica da Malária, 6th edição. Fundação Nacional da Saúde, Brasília: Ministério da Saúde do Brasil.
6. Garnham PCC. 1988. Malaria parasites of man: Life-cycles and morphology (excluding ultrastructure). In: W.H. Wernsdorfer and I. McGregor (eds). Malaria. Principles and Practice of Malariology. Chap 2, pp 61-96. Churchill Livingstone. Edinburgh.
7. House HR, Ehlers JP. 2008. Travel-Related Infections. Emergency Medicine Clinics of North America 26(2): 499-516.
8. Krogstad DJ. 1999. Malaria. In: R.L. Guerrant, D.H. Walker and P.F. Weller (eds). Tropical Infectious Diseases. Principles, Pathogens & Practice. Chap. 70, pp 736-766. Churchill Livingstone. New York.
9. Ladislaw JLB. 2006. Situação da Malária na Amazônia Legal. Ministério da Saúde. Secretaria de Vigilância em Saúde.
10. Macdonald G. 1952. The analysis of equilibrium in malaria. Trop. Dis. Bull. 49: 813-828.
11. Massad E, Coutinho FAB, Yang HM, De Carvalho HB, Mesquita F, Burattini MN. 1994. The Basic Reproduction Ratio of HIV among Intravenous Drug Users. Mathematical Bioscience 123: 227-247.
12. Massad E, Ma S, Burattini MN, Tun Y, Coutinho FA, Ang LW. 2008. The risk of chikungunya fever in a dengue-endemic area. J Travel Med 15(3): 147-155.
13. Tada Y, Okabe N, Kimura M. 2008. Travelers' risk of malaria by destination country: A study from Japan. Travel Medicine and Infectious Diseases 6: 368-372.
14. White NJ. 1996. Malaria. In: G.C. Cook (ed) Manson's Tropical Diseases. Chap 61, pp 1087-1164. WB Saunders. London.
15. WHO 2004. Basic facts on malaria. Roll Back Malaria.
16. WHO 2008. http://www.who.int/ith/countries/bra/en/
17. WHO, 2005. http://rbm.who.int/wmr2005/profiles/brazil.pdf
18. Wyse APP. 2007. Controle ótimo do vetor da malária para o modelo matemático sazonal. PhD thesis. Laboratório Nacional de Computação Científica, Brazil.
NEURAL NETWORK CLASSIFICATION WITH PRIOR KNOWLEDGE FOR ANALYSIS OF BIOLOGICAL DATA
D. ABBATE
Department of Computer Science, University of Bari, Bari, Italy
E-mail: [email protected]
M. R. GUARRACINO
High Performance Computing and Networking Institute, National Research Council, Naples, Italy
E-mail: [email protected]
A. CHINCHULUUN, P. M. PARDALOS
Industrial and Systems Engineering Department, University of Florida, Gainesville, FL, USA
E-mail: {altannar,pardalos}@ufl.edu

Neural networks are efficient classification tools that have been applied in several domains, including extracting regularities in data and classifying events in finance, marketing, internet and biomedicine. The training process uses available examples to produce a model and classifies new events based on the extracted model. This learning procedure, being based on examples alone, cannot take into account prior knowledge that is either available or discovered in the data. In the present work, we propose a way to include prior knowledge in Radial Basis Function Neural Networks and to express the knowledge as a set of linear constraints in the least squares problem. The resulting method still takes advantage of kernel functions to obtain a nonlinear classifier. Furthermore, its computational complexity is not affected while the misclassification error is reduced. Publicly available biomedical datasets are used in a case study to analyze the performance of the approach, and to compare the results with state-of-the-art classifiers.
1. Introduction
Given a set A of data and two classes, −1 and +1, the purpose of binary classification is to divide the set into two disjoint classes, so that each data point can be assigned to the correct class according to some discriminant features. Classification methods have been successfully applied to loan applications by banks, to fiscal evasion by the Internal Revenue Service, and to face detection and optical character recognition. Nevertheless, one of the most promising applications of those methods is in the field of biomedicine and bioinformatics. Indeed, tasks which typically involve the use of binary classification are medical diagnoses, such as verifying whether or not a patient has a given disease. In this case, the class labels are related to the presence and absence of the disease. From a mathematical point of view, given a set of points Γ ⊂ R^n, a binary classifier is a function f(x) : R^n → R, x ∈ Γ, whose sign represents the class that the point x belongs to. Examples of classifiers are neural networks,2 decision trees13 and support vector machines (SVM).4 The performance of a binary classifier can be evaluated through misclassification error, sensitivity and specificity. Misclassification error represents the percentage of misclassified samples. Sensitivity is the percentage of true positives among all positives tested. Specificity is the percentage of true negatives among all negatives tested. Most real world problems deal with irregular and noisy data for which optimal classification accuracy is hard to obtain. This motivates the rush towards the design of new classifiers. Medical data sets are practical examples of those types of data. Better classification accuracy on medical data can have a drastic impact both on the quality of life of a patient and on the promptness of diagnosis. In this paper, we will use the WPBC data set (Wisconsin Prognostic Breast Cancer5) in a case study.
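The three performance measures defined above can be computed from the counts of true and false positives and negatives. The sketch below is illustrative and not taken from the paper: the function name `binary_metrics` is introduced here, and the example data (28 samples of class +1 and 127 of class −1, scored against a trivial classifier that always answers −1) are an assumption obtained by applying the 18.06% figure reported for WPBC to 155 patients.

```python
def binary_metrics(true_labels, predicted_labels):
    """Misclassification error, sensitivity and specificity for labels in {-1, +1}."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)     # true positives
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)   # true negatives
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)    # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)    # false negatives
    nan = float("nan")  # returned when a rate is undefined (empty denominator)
    return {
        "misclassification_error": (fp + fn) / len(pairs) if pairs else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }

# Assumed class sizes: 28 positives and 127 negatives out of 155 samples.
labels = [1] * 28 + [-1] * 127
trivial_predictions = [-1] * 155      # classifier that always predicts class -1
metrics = binary_metrics(labels, trivial_predictions)
print(metrics)
```

The example shows why accuracy alone is misleading on unbalanced data: the trivial classifier misclassifies fewer than one sample in five, yet its sensitivity is zero.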
The version of the data set we used contains medical data of 155 patients who underwent surgery for removal of metastasized lymph nodes. WPBC is an example of a data set for which it is difficult to improve classification accuracy, because patients belonging to different classes have many attributes with close values. It is very difficult to reach an accuracy higher than 80% without adding the prior knowledge of a field expert. In WPBC, 18.06% of the points belong to the class +1. In this case, it is very likely that the classifier would be trivial and completely influenced by the points in class −1, which make up 81.94% of the whole data set. If the training set is composed of a large number of samples, there could be an overfitting problem. Thus, training with a large data set would be risky, as the classifier could perfectly classify the training data, but it would not
generalize. A natural approach to using prior knowledge in a classifier is to add more points to the data set. This results in higher computational complexity and overfitting. On the other hand, Lee and Mangasarian9 have shown that it is possible to analytically express knowledge as additional terms of the cost function of the optimization problem defining SVM. This solution has the advantage of not increasing the dimension of the training set, and of avoiding overfitting and poor generalization of the classification model.2 Guarracino et al.7 recently have shown a way to extend this approach to Generalized Proximal Support Vector Machines,8 halving the misclassification error of the original method. In this paper, we show how nonlinear knowledge can be applied to neural network classifiers. In particular, we will focus on Radial Basis Function neural networks (RBF-NN), which we explain in the following. Their computational properties make them a good candidate to understand how prior knowledge can be used to improve classification methods. We show that the proposed model with prior knowledge can substantially increase the original neural network accuracy, providing results that compare well with those reported by Mangasarian and Wild.10 Throughout this paper, we use the following notations and guidelines. All vectors are column vectors, a null vector is denoted by 0 and the vector whose components are all ones is denoted as e. Given a vector $x \in R^n$, $\|x\|_1$ denotes the 1-norm, $\sum_{i=1}^{n} |x_i|$, while $\|x\|$ denotes the 2-norm, $\left(\sum_{i=1}^{n} x_i^2\right)^{1/2}$. The transpose of the matrix $A \in R^{m \times n}$ is $A^T$, while $A_i$ and $A_{\cdot j}$ denote the i-th row and the j-th column of matrix A, respectively. Given two matrices $A \in R^{m \times n}$ and $B \in R^{n \times k}$, a kernel $K(A, B)$ maps $R^{m \times n} \times R^{n \times k}$ into $R^{m \times k}$. More precisely, if x and y are column vectors in $R^n$, then $K(x^T, y)$ is a real number, $K(x^T, B)$ is a row vector in $R^k$ and $K(A, B)$ is an $m \times k$ matrix.
A common kernel in nonlinear classification is the Gaussian kernel, where the ij-th element is defined as $(K(A, B))_{ij} = e^{-\mu \|A_i^T - B_{\cdot j}\|^2}$, where A and B are matrices with the same number of columns, µ is a positive constant and e is Napier's constant. As we deal with classification problems, each point $x \in R^n$ is assigned to a class in {−1, 1}. Thus, for a set Γ of m real points, which can be represented by a matrix $A \in R^{m \times n}$, there is an associated vector $c \in \{-1, 1\}^m$ of labels denoting the class of each point of the set. The remainder of this paper is organized as follows. Section 2 gives a general description of neural networks and shows how nonlinear knowledge is added to RBF neural networks. In Section 3, we present a new algorithm
based on neural networks, which includes nonlinear knowledge as a set of linear inequalities. In Section 4, numerical results are reported on the WPBC case study. In Section 5, experimental results are provided for more data sets, and finally, in Section 6, conclusions are drawn and future work is proposed.
2. Related work
The state of the art in binary classification is represented by SVM.4,15 SVMs separate the input space into two halfspaces by finding the hyperplane $x^T w - \gamma = 0$ which maximizes the margin between the two classes. The margin is the distance between the hyperplane and the closest point. The hyperplane is found by minimizing the norm of w, subject to classifying points of both classes correctly. Using the kernel trick, it is possible to obtain a nonlinear separating surface that correctly classifies nonlinearly separable classes, while still working with a linear program.10 Thus the resulting hyperplane, projected in the feature space,14 has the equation:
$$
f(x) \equiv K(x^T, B^T) u - \gamma = 0,
\tag{1}
$$
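where $B \in R^{k \times n}$ and $K(x^T, B^T) : R^{1 \times n} \times R^{n \times k} \to R^{1 \times k}$ is an arbitrary kernel function. Parameters $u \in R^k$ and $\gamma \in R$ are determined by solving the following quadratic programming problem:9
$$
\begin{aligned}
\min_{u, y} \quad & e^T y + \frac{1}{2} u^T u \\
\text{s.t.} \quad & D(K(\Gamma, \Gamma^T) u - e\gamma) + y \ge 0, \\
& y \ge 0,
\end{aligned}
\tag{2}
$$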
where D is a diagonal matrix with the diagonal elements equal to the labels of the corresponding elements of the training set (matrix Γ). This condition places the points belonging to the two classes +1 and −1, represented by the matrix Γ, on two different sides of the nonlinear separation surface (1). Problem (2) corresponds to the following linear programming problem:10
$$
\begin{aligned}
\min_{u, \gamma, y, s} \quad & \nu e^T y + e^T s \\
\text{s.t.} \quad & D(K(\Gamma, \Gamma^T) u - \gamma e) + y \ge e, \\
& -s \le u \le s, \\
& y \ge 0.
\end{aligned}
\tag{3}
$$
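To make the role of the kernel in the separating surface (1) concrete, the following minimal sketch evaluates $f(x) = K(x^T, B^T)u - \gamma$ with the Gaussian kernel and classifies a point by the sign of f. It is not the authors' implementation: the weights u and offset γ are simply given here rather than obtained by solving problem (2) or (3), and the names `gaussian_kernel_row` and `classify`, as well as all numeric values, are hypothetical.

```python
import math

def gaussian_kernel_row(x, centers, mu):
    """Row vector K(x^T, B^T): one Gaussian response per row of `centers` (the matrix B)."""
    return [
        math.exp(-mu * sum((xi - bi) ** 2 for xi, bi in zip(x, center)))
        for center in centers
    ]

def classify(x, centers, u, gamma, mu):
    """Return (label, f(x)) with f(x) = K(x^T, B^T) u - gamma and label = sign of f."""
    kernel_row = gaussian_kernel_row(x, centers, mu)
    value = sum(k * w for k, w in zip(kernel_row, u)) - gamma
    return (1 if value >= 0 else -1), value

# Hypothetical two-point example: one center per class, opposite weights.
centers = [[0.0, 0.0], [2.0, 2.0]]   # rows of B
u = [1.0, -1.0]                      # assumed weights, not a trained solution
gamma = 0.0
mu = 1.0

label_a, value_a = classify([0.0, 0.0], centers, u, gamma, mu)
label_b, value_b = classify([2.0, 2.0], centers, u, gamma, mu)
print(label_a, value_a, label_b, value_b)
```

Each kernel entry decays with the squared distance between x and a center, so points near the first center are dominated by the positive weight and points near the second by the negative one.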
The nonlinear classification model cannot describe the discriminating function in terms of inequalities involving linear relations among features. This can be perceived as a problem in the case of medical diagnosis, in which doctors prefer to find simple correlations between the results of clinical exams and the diagnosis or prognosis of an illness. On the other hand, it is generally accepted that nonlinear models provide higher classification accuracy. Furthermore, with the advent of high throughput medical equipment, the number of exams to consider for a diagnosis can be very high and cannot be correlated by experience alone. Finally, methods that provide explicit classification rules are not guaranteed to find a set of rules that is small enough to read easily. In order to improve the results obtained by a classifier solely from the training set, it is possible to impose the knowledge of an expert in the learning phase of the function (1). Such expertise is represented by the following implication, which describes a region ∆ of the input space whose points are known to belong to class +1:
$$
g(x) \le 0 \Rightarrow K(x^T, B^T) u - \gamma \ge \alpha, \quad \forall x \in \Delta.
\tag{4}
$$
Here $g(x) : \Delta \subset R^n \to R$ is a function defined on the subset $\Delta \subset R^n$ where prior knowledge requires the classification model $K(x^T, B^T)u - \gamma$ to be greater than, or equal to, a nonnegative number α, so as to classify points $x \in \{x \mid g(x) \le 0\}$ as belonging to class +1. Given the theorem of the alternatives for a convex function, the implication (4) can be expressed as a linear inequality in the parameters (u, γ) of the classification model:
Theorem 2.1. (Mangasarian and Wild, 200610) Given $u \in R^k$ and $\gamma \in R$, if there is a $v \in R$, $v \ge 0$, such that
$$
K(x^T, \Gamma^T) u - \gamma - \alpha + v^T g(x) \ge 0, \quad \forall x \in \Delta
\tag{5}
$$
then the implication (4) holds.
Finally, positive nonlinear knowledge can be added to Problem (3) using Theorem 2.1:
$$
\begin{aligned}
\min_{u, \gamma, y} \quad & \nu e^T y + e^T s \\
\text{s.t.} \quad & D(K(A, B^T) u - \gamma e) + y \ge e, \\
& -s \le u \le s, \quad y \ge 0, \\
& K(x_i^T, B^T) u - \gamma - \alpha + v^T g(x_i) + z_i \ge 0, \\
& v \ge 0, \quad z_i \ge 0, \quad i = 1, \dots, l.
\end{aligned}
\tag{6}
$$
To add negative nonlinear knowledge, we can just consider the following implication:
$$
h(x) \le 0 \Rightarrow K(x^T, B^T) u - \gamma \le -\alpha, \quad \forall x \in \Lambda
\tag{7}
$$
where $h(x) : \Lambda \subset R^n \to R$ represents the region in the input space where the implication (7) forces the classification function to be less than or equal to −α, in order to classify the points $x \in \{x \mid h(x) \le 0\}$ as −1. We can now finally formulate the linear program (3) with nonlinear knowledge included in the cost function:
$$
\begin{aligned}
\min_{u, \gamma, y, s, v, p, z_1, \dots, z_l, q_1, \dots, q_t} \quad & \nu e^T y + e^T s + \sigma \left( \sum_{i=1}^{l} z_i + \sum_{j=1}^{t} q_j \right) \\
\text{s.t.} \quad & D(K(A, B^T) u - \gamma e) + y \ge e, \\
& -s \le u \le s, \quad y \ge 0, \\
& K(x_i^T, B^T) u - \gamma - \alpha + v^T g(x_i) + z_i \ge 0, \quad v \ge 0, \; z_i \ge 0, \; i = 1, \dots, l, \\
& -K(x_j^T, B^T) u + \gamma - \alpha + p^T h(x_j) + q_j \ge 0, \quad p \ge 0, \; q_j \ge 0, \; j = 1, \dots, t.
\end{aligned}
\tag{8}
$$
The linear programming problem (8) seeks the maximum-margin separation between the two classes, subject to the requirement that the classifier leaves the two prior knowledge sets in the corresponding halfspaces.
3. Nonlinear knowledge in RBF Neural Networks
An RBF network is divided into two operative blocks: an inner hidden layer, and the output layer. The hidden layer, as it is based on neurons with a radial basis activation function, creates a response localized on the input vector x; the binary output is then calculated as a weighted sum of these localized responses. Training an RBF network is a procedure divided
into two phases: in the first one, through unsupervised learning, the parameters of the radial basis functions are calculated. Traditionally there are two strategies for this first phase of unsupervised learning. The classic strategy2 calculates these parameters through different clustering techniques. These aim to divide the training set into a fixed number of homogeneous groups, organized according to the distance of the points in the training set. Besides clustering, it is possible to have an incremental approach. In this way, one seeks to reduce the mean quadratic error below a threshold ε by adding nodes to the hidden layer. In the second part of the training, we search for values of the weights w which determine the binary output y. Such weights are calculated by minimizing the following error function:
$$
E = \frac{1}{2} \sum_{i=1}^{m} \left( y(X_{i\cdot}) - c_i \right)^2
\tag{9}
$$
which measures the distance of the actual solution from the desired one. Prior knowledge is added by modifying this phase. As seen for SVM10 and GEPSVM,7 we will use the theorem of the alternatives by Mangasarian and Wild. The optimization problem6 used to calculate the values of the second layer of weights requires that the separation surface calculated by the RBF network pass through the prior knowledge points. Constraining the hyperplane to pass through the prior knowledge points strengthens the value of the prior knowledge used in the learning process. Prior knowledge is then added as a set of constraints to Problem (9) to obtain the following minimization problem:
$$
\begin{aligned}
\min \quad & \frac{1}{2} \sum_{i=1}^{m} \left( y(X_{i\cdot}) - c_i \right)^2 \\
\text{s.t.} \quad & Bx \ge 0.
\end{aligned}
\tag{10}
$$
The constraints of this problem force the hyperplane solution of equation (9) to pass through the m points represented by the matrix $B \in R^{m \times n}$. Algebraically, this means the solution to the least squares problem has to be searched in the subspace generated by prior knowledge points. As pointed out by Golub,6 the original problem is reduced with a QR decomposition, or with a singular value decomposition as shown by Björck3 in 1996. Knowledge is formally added as a set of linear constraints to the least squares
problem:

$$\min \ \|Ax - b\|_2 \quad \text{s.t.} \quad Bx = d, \qquad (11)$$

where $\|Ax - b\|_2$ is the square root of the error, hence the distance between the training set points and the separation surface. Moreover, $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^{m}$, $d \in \mathbb{R}^{p}$ and $\mathrm{rank}(B) = p$. Let us assume, for clarity, that both A and B are full rank matrices. Given

$$Q^T B^T = \begin{bmatrix} R \\ 0 \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix}, \qquad (12)$$

which is the QR factorization of the transposed prior knowledge matrix $B^T$, and the partitions

$$AQ = \begin{bmatrix} \underbrace{A_1}_{p} & \underbrace{A_2}_{n-p} \end{bmatrix} \quad \text{and} \quad Q^T x = \begin{bmatrix} y \\ z \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix}, \qquad (13)$$

we can transform the original problem (11) into the following one:

$$\min \ \|A_1 y + A_2 z - b\|_2 \quad \text{s.t.} \quad R^T y = d. \qquad (14)$$

The variable y is determined by solving the equation $R^T y = d$. The vector z is then obtained by solving the following unconstrained least squares problem:

$$\min_{z} \ \|A_2 z - (b - A_1 y)\|_2. \qquad (15)$$

Combining the results of problems (14) and (15), we can see that

$$x = Q \begin{bmatrix} y \\ z \end{bmatrix} \qquad (16)$$

solves the original problem (11).

4. A case study

The above method has been tested on a publicly available data set, the Wisconsin Prognostic Breast Cancer (WPBC) data set from the UCI repository [5]. Different methods are compared using misclassification error, sensitivity and specificity. Results for SVM were taken from Mangasarian and Wild, 2006 [10], while results for RBF neural networks were obtained on a GNU/Linux PC (kernel 2.6.9-42) with a 64-bit AMD Opteron 284 processor (2.2 GHz) and 4 GB of RAM. The version of Matlab [11] used is 7.3.0.298 (R2006b). Accuracy, sensitivity and specificity were calculated upon Leave-One-Out (LOO)
classification, where the parameters for each method are defined through a ten-fold cross validation grid-search over the whole data set. The RBF neural network algorithm was implemented in Matlab. The data set provides 30 cytological features plus tumor size and the number of metastasized lymph nodes. Moreover, for each of the 198 patients, it provides the number of months after which the patient was diagnosed with a new cancer. If there has been no recurrence, the data set contains information on how long the patient has been under observation. In our work, we wanted to identify those patients who had a recurrence within a period of 24 months, discriminating them from those who did not have any recurrence. This is a subset Υ of the data set:

$$\Upsilon = \{ x \in \mathrm{WPBC} \mid \text{property } p \text{ holds for } x \},$$

where the property p is defined as follows:

$$p \text{ holds iff } \begin{cases} \text{the patient has had a recurrence within the 24 months period, or} \\ \text{the patient has had no recurrence.} \end{cases}$$

After this filtering, the remaining data set contains 155 patients. To simulate the expertise of a surgeon, we used the same areas identified by Mangasarian and Wild [10] and described by the following formulas:
$$\left\| \begin{bmatrix} (5.5)x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} (5.5)7 \\ 9 \end{bmatrix} \right\| + \left\| \begin{bmatrix} (5.5)x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} (5.5)4.5 \\ 27 \end{bmatrix} \right\| - 23.0509 \le 0 \ \Rightarrow \ f(x) \ge 1 \qquad (17)$$

$$\begin{bmatrix} -x_2 + 5.7143 x_1 - 5.75 \\ x_2 - 2.8571 x_1 - 4.25 \\ -x_2 + 6.75 \end{bmatrix} \le 0 \ \Rightarrow \ f(x) \ge 1 \qquad (18)$$

$$\frac{1}{2}(x_1 - 3.35)^2 + (x_2 - 4)^2 - 1 \le 0 \ \Rightarrow \ f(x) \ge 1. \qquad (19)$$

Equations (17)-(19) describe three areas in a two-dimensional representation of the data set. The x-axis is the tumor size (the next to last feature of the data set), while the y-axis is the number of metastasized lymph nodes (the last feature of the data set). Following the work by Mangasarian and Wild, we decided to take those 14 points which belong to the three areas described by equations (17)-(19). We note that those 14 points are among the support vectors that have been misclassified by SVM in leave-one-out validation. So far, the lowest misclassification error without knowledge was 13.7% [1]. By adding knowledge it was possible to decrease the misclassification rate to around 9.0%, as shown in Table 1.
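As a rough illustration of how points can be tested against the three knowledge regions, the following Python sketch encodes (17)-(19) as predicates on (x1, x2) = (tumor size, number of metastasized lymph nodes). It assumes that (17) is the sum of two Euclidean distances to the points (7, 9) and (4.5, 27), with the first coordinate scaled by 5.5; this reading, the function names and the sample points are our assumptions for illustration and are not taken from the original implementation.

```python
import math

def in_region_17(x1, x2):
    # Assumed reading of (17): sum of distances to two foci, x1 scaled by 5.5.
    d1 = math.hypot(5.5 * (x1 - 7.0), x2 - 9.0)
    d2 = math.hypot(5.5 * (x1 - 4.5), x2 - 27.0)
    return d1 + d2 - 23.0509 <= 0.0

def in_region_18(x1, x2):
    # (18): all three linear inequalities must hold (a triangle in the plane).
    return (-x2 + 5.7143 * x1 - 5.75 <= 0.0
            and x2 - 2.8571 * x1 - 4.25 <= 0.0
            and -x2 + 6.75 <= 0.0)

def in_region_19(x1, x2):
    # (19): an ellipse centred at (3.35, 4).
    return 0.5 * (x1 - 3.35) ** 2 + (x2 - 4.0) ** 2 - 1.0 <= 0.0

def in_knowledge_region(x1, x2):
    """True if the point lies in at least one of the three regions."""
    return in_region_17(x1, x2) or in_region_18(x1, x2) or in_region_19(x1, x2)

# Hypothetical sample points (tumor size, lymph nodes), for illustration only.
for point in [(3.35, 4.0), (2.0, 8.0), (5.75, 18.0), (0.0, 0.0)]:
    print(point, in_knowledge_region(*point))
```

Points for which `in_knowledge_region` returns True would be candidates for the prior knowledge set; how they are then turned into constraints is described in Section 3.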
Table 1. Leave One Out misclassification error, sensitivity and specificity on the WPBC data set (24 months).

Classifier                     Misclassification error   Sensitivity   Specificity
SVM                            0.1806                    0             1.000
SVM with knowledge             0.0903                    0.5000        1.000
RBF-NN                         0.1806                    0             1.000
RBF-NN with knowledge          0.0968                    0.4643        1.000
Improvement due to knowledge   50.0 %
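For readers who want to check how the three measures in Table 1 relate to each other, the short Python sketch below computes them from confusion counts. The counts are not reported in the paper; they are inferred here under the assumption that 28 of the 155 patients belong to class +1, which is the split that reproduces the ratios in the table.

```python
def loo_metrics(tp, fn, tn, fp):
    """Return (misclassification error, sensitivity, specificity)."""
    total = tp + fn + tn + fp
    error = (fn + fp) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return error, sensitivity, specificity

# Hypothetical counts (tp, fn, tn, fp), inferred from Table 1, not reported in the paper.
cases = {
    "without knowledge": (0, 28, 127, 0),
    "SVM with knowledge": (14, 14, 127, 0),
    "RBF-NN with knowledge": (13, 15, 127, 0),
}
for name, counts in cases.items():
    err, sens, spec = loo_metrics(*counts)
    print(f"{name}: error={err:.4f} sensitivity={sens:.4f} specificity={spec:.4f}")
```

Under these assumed counts, a classifier that assigns every patient to class -1 has zero sensitivity and full specificity, which is the trivial behaviour reported for both methods without knowledge.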
In Table 1, we report misclassification error, sensitivity and specificity for SVM and RBF-NN with and without knowledge. We note that without knowledge both methods produce trivial results. When knowledge is used, both methods have nearly the same accuracy and sensitivity, and the same specificity. The slightly lower value of sensitivity is due to the fact that RBF-NN misses one point of class +1. We can analyze our approach in terms of sensitivity and specificity as introduced in Section 1. The accuracy of the classifier with respect to class -1 is measured in terms of specificity. Note that, with our approach, specificity is maximum.

5. Numerical experiments

To further assess the classification accuracy of RBF-NN with prior knowledge, we have used four publicly available data sets [5, 12]: Thyroid, Heart, Pima Indians and Banana. Thyroid contains information about 215 patients along with 5 cytological features. There are 65 patients affected by a thyroid disease; they represent 30.23% of the data set (class +1). The remaining 150 patients belong to class -1. Heart is also a medical data set, collected by Janosi and Steinbrunn. It provides 13 characteristics (age, sex, chest pain, blood pressure, ...) for 270 patients. 120 patients, 44.44% of the data set, belong to class +1, representing patients affected by heart disease. The remaining 150 patients belong to class -1. The Pima Indians diabetes data set was created by the National Institute of Diabetes and Digestive and Kidney Diseases. It contains data about 768 patients, extracted from a larger database by considering females at least 21 years old, belonging to the Pima Indian tribe, all living in the Phoenix (AZ) area. For each patient, the problem is to assess whether she shows the symptoms of diabetes, with respect to the criteria established by the World Health Organization. Features concern factors such as the number
of pregnancies, glucose concentration in plasma, diastolic blood pressure, triceps skin fold thickness and insulin level. The patients are divided into 268 positives, which represent 34.90% of the data set, and 500 healthy patients. Finally, Banana is an artificial data set of 2-dimensional points which are grouped together in the shape of a banana. We chose the points to add as prior knowledge in the following way: for each data set, we executed a LOO validation and chose the misclassified points. In Table 2, we report misclassification error, sensitivity and specificity for RBF-NN with and without prior knowledge.

Table 2. Leave One Out misclassification error, sensitivity and specificity on different data sets.

Results without knowledge
Data set   Misclassification error   Sensitivity   Specificity
Thyroid    0.1488                    0.5538        0.9800
Heart      0.1926                    0.7833        0.8267
Diabetes   0.3216                    0.6493        0.6940
Banana     0.1399                    0.8304        0.8829

Results with knowledge
Data set   Misclassification error   Sensitivity   Specificity
Thyroid    0.0977                    0.7231        0.9800
Heart      0.1296                    0.8500        0.8867
Diabetes   0.2227                    0.8731        0.6940
Banana     0.1110                    0.8565        0.8967
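The selection of candidate knowledge points by leave-one-out validation can be sketched as follows. This is only an illustration: a 1-nearest-neighbour rule stands in for the RBF network used in the paper, and the toy data and all function names are hypothetical.

```python
def nearest_neighbor_label(train_x, train_y, query):
    """Stand-in classifier (1-nearest neighbour); the paper uses an RBF network."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    return train_y[best]

def loo_misclassified(xs, ys, classify=nearest_neighbor_label):
    """Return the indices of the points misclassified under leave-one-out validation."""
    wrong = []
    for i in range(len(xs)):
        # Train on every point except the i-th, then predict the held-out point.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if classify(train_x, train_y, xs[i]) != ys[i]:
            wrong.append(i)
    return wrong

# Toy data, purely illustrative: the last point of class +1 sits next to the -1 cluster.
xs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
      (1.0, 1.0), (1.1, 0.9), (0.9, 1.1),
      (0.2, 0.2)]
ys = [-1, -1, -1, +1, +1, +1, +1]
candidates = loo_misclassified(xs, ys)
print(candidates)
```

The returned indices identify the points that would be handed over as prior knowledge; any classifier with the same call signature can be passed in place of the stand-in.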
We note that the performance of the method is substantially improved in each case. It is interesting to note that the number of true positive and true negative points is substantially improved.

6. Conclusions and future work

In the present work, we have proposed a new method to incorporate nonlinear knowledge provided by an expert in Radial Basis Function neural networks, in a fashion similar to what has been done in Mangasarian and Wild, 2006 [10]. The results show that the accuracy of the new algorithm compares well with that of existing ones and is improved with respect to neural networks without knowledge. In the future, we will test and compare the method with other methods using other standard data sets. Finally, we believe further investigation needs to be devoted to the identification of regions where knowledge is needed,
in order to improve generalization of the classification model. We will investigate how the expression of prior knowledge in terms of the probability of a patient belonging to one class can affect classification models.

References

1. K. P. Bennett. Decision tree construction via linear programming. In M. Evans, editor, Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society Conference (Utica, Illinois), pages 82-90, 1992.
2. C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
3. A. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.
4. C. Cortes and V. Vapnik. Support vector machines. Machine Learning, 20:273-279, 1995.
5. C. L. Blake, D. J. Newman, S. Hettich and C. J. Merz. UCI repository of machine learning databases, 1998. http://mlearn.ics.uci.edu/MLRepository.html.
6. G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.
7. M. R. Guarracino, C. Cifarelli, O. Seref, and P. M. Pardalos. A classification algorithm based on generalized eigenvalue problems. Optimization Methods and Software, 22(1):73-81, 2007.
8. M. R. Guarracino, D. Abbate, and R. Prevete. Nonlinear knowledge in learning models. In Workshop on Prior Conceptual Knowledge in Machine Learning and Knowledge Discovery, European Conference on Machine Learning, pages 29-40, 2007.
9. Y. Lee and O. L. Mangasarian. SSVM: A smooth support vector machine for classification, 1999.
10. O. L. Mangasarian and E. W. Wild. Nonlinear knowledge-based classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(1):68-74, 2006.
11. MATLAB. User's guide. The MathWorks, Inc., Natick, MA 01760, 1994-2006. http://www.mathworks.com.
12. S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K. Müller. Fisher discriminant analysis with kernels, 1999.
13. T. M. Mitchell. Machine Learning. McGraw Hill, 1997.
14. B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2001.
15. V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
April 24, 2009
10:11
WSPC - Proceedings Trim Size: 9in x 6in
Piotr.Sliwkw.novo2
235
THE MARKOV CHAINS (MARKOV SET-CHAINS) AS A TOOL FOR BACTERIAL GENOMES EVOLUTION ANALYSIS

P. ŚLIWKA
Faculty of Mathematics and Natural Sciences, Stefan Cardinal Wyszyński University, 1/3 Woycickiego, 01-938, Warsaw; Department of Biometrics, University of Life Sciences, 159 Nowoursynowska Street, 02-776, Warsaw

M. DUDKIEWICZ
Department of Experimental Design and Bioinformatics, University of Life Sciences, 159 Nowoursynowska Street, 02-776, Warsaw, Poland

Despite great interest and numerous analyses, knowledge of the pure mutation process is still scarce and insufficient. The aim of our investigation was to connect environmental conditions with specific trends in the substitution patterns observed in different bacterial genomes. As a tool for large-scale genome analysis, based on the complete chromosomal and plasmid sequences of Borrelia burgdorferi B31 and the complete sequences of 13 other bacterial chromosomes available at the NCBI FTP site, we used Markov chains. We assumed that pure mutational pressure can be considered a Markovian process, hence substitution patterns can be described by a Markov chain transition matrix. We attempt to answer the question whether its ergodic state reflects the equilibrium nucleotide composition of the given sequences and whether it can characterize the direction of mutational pressure specific to a particular genome.

Keywords: Mutation pressure process, Markov chain, Markov set-chain, Fuzzy probability.
1. Introduction

The Borrelia burgdorferi B31 chromosome is the most asymmetric of all bacterial chromosomes sequenced to date (this feature is clearly visible in Fig. 1). Asymmetry in the nucleotide composition of DNA is defined as a deviation from Parity Rule II (PR2) [22], which states that the number of T equals the number of A and the number of C equals the number of G within a single DNA strand. Deviation from PR2 means that the two DNA strands are under different mutational or selection pressures, or both, which leads to asymmetric substitution patterns [6], [7], [21], [28]. The asymmetry is so strong that it can help in experimental searches for the origin of replication; hence it seems obvious that the main cause of asymmetry in the nucleotide composition of the two strands of bacterial chromosomes is the different way in which each strand is replicated [26]. This asymmetric replication mechanism is specific to prokaryotic genomes, and the asymmetry has not been observed in eukaryotic chromosomes. However, it is impossible to find out which substitutions in bacterial genomes generate the observed deviation from PR2 just by measuring the asymmetry itself. On the other hand, the intergenic sequences, which, as we assume, are remnants of duplications of genes in the history of the species, should accumulate all mutations [10], [17]. Their comparison with the original genes should reveal the influence of mutational pressure. This approach was adopted by Kowalczuk et al. [18], [19]. About ten intergenic sequences were found in the B. burgdorferi genome which, read from the leading strand, had strong homology (with expected value < 0.05) to ORFs from that genome's database. They were aligned, and all the observed differences between the ORFs and their homologues were assumed to result from substitutions in the intergenic sequences. The obtained table of substitution frequencies was tested in computer simulations [19]. When a random, equimolar ([A]=[T]=[G]=[C]) DNA sequence was put under such a mutational pressure, after a sufficient number of generations it had the same asymmetry as the third codon positions of the leading strand ORFs of the real B. burgdorferi genome. When the B. burgdorferi ORFs were put under this pressure, they turned into sequences with the composition of their own third codon positions.
The results obtained were noteworthy, because the third positions of codons are most often free from selection, which is explained by the properties of the genetic code. We propose an approach in which the Monte Carlo simulation steps are replaced by Markov chain calculations. We assumed that the pure mutational pressure in a DNA molecule complies with the conditions of a Markovian process [14], [16], [20]. The basis of our conception was the thesis that the pure mutational process can be considered a Markov chain with four possible states corresponding to the four nucleotides (adenine, guanine, cytosine and thymine), with twelve possible transitions (Figure 2), from A to T, from A to C etc., and a specific probability of remaining in the current state in the next step. We used the observed substitution frequencies, calculated for 13 bacterial genomes and B. burgdorferi plasmid sequences according to the method of Kowalczuk et al. [17], [18], as the probabilities of the transition matrix describing the Markov chain. In the further part of this paper, we perform the homogeneity analysis for this process and propose a method for calculating the intervals for the ergodic state when the Markov chain lacks homogeneity. We describe the methods used in detail for the B. burgdorferi genome.

Fig. 1. [A-T] and [G-C] walks for the B. burgdorferi genome: left, chromosome asymmetry; right, the same analysis performed for one of the 17 B. burgdorferi plasmids.
Fig. 2. The scheme of the Markov chain probability matrix construction: the numbers of substitutions calculated from aligned homologous sequences become the basis for determining the probabilities of the 16 kinds of substitution.
Our aim was to answer the following questions:

• Is the ergodic state calculated for the B. burgdorferi Markov chain transition table consistent with the composition of the third positions in codons (as it turned out for Monte Carlo simulations based on the same substitution matrix)? And hence, is the ergodic state consistent with the genome equilibrium state?
• Is this equilibrium state a reflection of the mutational pressure under which a given genome evolved? If so, we should observe differences between bacterial species living in extremely different environments.
• Is this mutational pressure dependent on external or rather on internal factors?

To answer the last two questions we analyzed the sequences of B. burgdorferi plasmids, which are separate DNA molecules proliferating inside the cell in parallel with the chromosome and which can be considered additional chromosomes (some of them are about 60 000 bp long), as well as the sequences of 13 different bacterial genomes evolving in a wide variety of environmental conditions.

2. Materials and methods

2.1. Preparation and choice of biological data

Nucleotide sequences of 13 completely sequenced bacterial genomes were used as research material: three archebacterial (Archeoglobus fulgidus, Thermoplasma acidophilum DSM 1728, Pyrococcus abyssi) and ten eubacterial (Escherichia coli UTI89, Borrelia burgdorferi B31, Helicobacter pylori 26695, Listeria monocytogenes EGD, Pseudomonas aeruginosa str. PAO1, Psychrobacter arcticum 273-4, Agrobacterium tumefaciens C58, Thermus thermophilus HB27, Thermosynechococcus elongatus BP-1, Deinococcus geothermalis). All sequences are available at the NCBI FTP site: ftp://ftp.ncbi.nih.gov/genbank. Their ecological and physiological features are compared in Table 1.
The basis of our conception was the assumption that the pure mutational process can be considered a Markov chain with four possible states corresponding to the four nucleotides (adenine, guanine, cytosine and thymine), with twelve possible transitions (from A to T, from A to C etc.) and a specific probability of remaining in the current state in the next step. To find these probabilities we excised the intergenic fragments from the complete genome sequence and sent them as search strings directly to the BLAST server using the blastcl3 tool. We chose the blastn option to find similarities within the same bacterial genome with an E value not greater than 0.05, and obtained formatted nucleotide alignments as result files. These files were the input for a Perl script, which counted the observed substitutions and produced an output file in the form of a table of substitution frequencies. These experimentally derived frequencies were used as the probabilities of the 4 x 4 transition matrix describing the Markov chain.

In the case of the plasmid analysis we chose homologs with an E value not greater than 0.05 from other Borrelia burgdorferi B31 plasmids or from the chromosome. We analyzed 61 pairs of homologs with a total length of 38 936 bp and observed 13 327 substitutions. These pairs of sequences were used to create alignments with ClustalX [15], which were then the basis for substitution counting. So far, no evidence of nucleotide composition asymmetry in Borrelia burgdorferi plasmids has been presented, so we performed 42 DNA [G-C] and [A-T] walks to show it. In contrast to the chromosome, many of the Borrelia burgdorferi B31 plasmids did not reveal a strong asymmetry (Figure 1). In cases where we were able to distinguish the point indicating a change in trend, we chose one or two nearest plasmid genes from the NCBI database, assuming that the distal genes are more likely to be duplicated or transposed.

The last step was the computation of the ergodic distribution for the estimated transition matrices using a combination of Perl and Mathematica tools. After calculating the ergodic states of the Markov chains, we compared them with the ACGT fractions of the intergenic sequences in the given genome or plasmid (Figure 3 a, b, c) by performing Chi-square tests of consistency. For all genomes the Chi-square test values were statistically significant.
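The last step described above can be illustrated with a small Python sketch. The authors used Perl and Mathematica; the code below is only an assumed equivalent that finds the ergodic distribution by power iteration and computes a Chi-square statistic against an observed composition. The transition matrix and the counts are hypothetical values chosen for illustration, not data from the paper.

```python
def ergodic_distribution(P, tol=1e-12, max_iter=100000):
    """Stationary distribution pi with pi = pi P, found by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")

def chi_square_statistic(observed_counts, expected_fractions):
    """Pearson statistic comparing observed counts with expected fractions."""
    total = sum(observed_counts)
    return sum((o - total * e) ** 2 / (total * e)
               for o, e in zip(observed_counts, expected_fractions))

# Hypothetical 4 x 4 row-stochastic matrix over the states A, C, G, T.
P = [[0.90, 0.03, 0.04, 0.03],
     [0.05, 0.85, 0.04, 0.06],
     [0.06, 0.02, 0.88, 0.04],
     [0.03, 0.03, 0.02, 0.92]]
pi = ergodic_distribution(P)

# Hypothetical nucleotide counts (A, C, G, T) of an intergenic sequence.
observed = [310, 120, 190, 380]
print(pi, chi_square_statistic(observed, pi))
```

With four nucleotide classes and no parameters estimated from the observed counts, the statistic would be compared with a Chi-square distribution with 3 degrees of freedom.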
We assumed that the pure mutational pressure in a DNA molecule complies with the conditions of a Markovian process [14], [19]. However, it is sensible to assume that the mutation process is non-homogeneous. In the further part of this paper (see Section 2.2 and Appendix) we propose a method of analyzing non-homogeneous Markov chains for mutational pressure and of calculating the intervals of the ergodic states. The ergodic states obtained for the HMC and the ergodic state intervals obtained for the NHMC were then compared and interpreted in the context of phylogenetic relations and of the differences in the environmental conditions under which the given genome evolved (see Section 3).
Fig. 3. Comparison of the ergodic states of the Markov chains, obtained as described in section 1, and the composition of intergenic sequences for three of the 13 analyzed genomes: (a) Pyrococcus abyssi, a hyperthermophile belonging to Archebacteria; (b) Listeria monocytogenes, a eubacterial mesophile; (c) Deinococcus geothermalis, a eubacterial thermophile.
Table 1. Comparison of the ecological and physiological features of the analyzed genomes. Abbreviations: M - mesophile, P - psychrophile, T - thermophile, HT - hyperthermophile, C - circular, L - linear, n - neutral, a - acidic, OGT - optimal growth temperature.

Species                        Group                           Chromosome  Oxygen  Acidity  OGT (°C)  Type  Asymmetry  %GC
Escherichia coli UTI89         Eubacteria (γ-Proteobacteria)   C           +/-     n        37        M     +          50.79
Borrelia burgdorferi B31       Eubacteria (Spirochaetes)       L           +       n        37        M     +          28.59
Listeria monocytogenes         Eubacteria (Gram positives)     C           +       n        30-37     M     +          37.98
Helicobacter pylori            Eubacteria (ε-Proteobacteria)   C           -       a        37        M     +          38.87
Psychrobacter arcticum         Eubacteria (γ-Proteobacteria)   C           +       n        0         P     +          42.80
Pseudomonas aeruginosa         Eubacteria (γ-Proteobacteria)   C           +/-     n        30-37     M     +          66.56
Agrobacterium tumefaciens      Eubacteria (α-Proteobacteria)   2 C/L       +       n        20        P/M   +          59.30
Thermus thermophilus           Eubacteria (Deinococci)         C           +       n        68        T     +          69.44
Thermosynechococcus elongatus  Eubacteria (Cyanobacteria)      C           +       n        55        T     -          53.92
Deinococcus geothermalis       Eubacteria (Deinococci)         C           -       a        55        T     +          66.64
Pyrococcus abyssi              Archebacteria                   C           -       a        96-98     HT    +/-        44.71
Thermoplasma acidophilum       Archebacteria                   C           -       a        59        T     -          45.99
Archeoglobus fulgidus          Archebacteria                   C           -       a        83        HT    -          48.58

2.2. Mathematical background and tools

Consider a sequence of random variables $X_0, X_1, \dots, X_n, \dots$ assuming values in the finite state space $S = \{0, 1, \dots, N-1\}$. If $X_n = j$ depends only upon the immediate past $X_{n-1} = i$ and not upon the remote past $X_0 = m, \dots, X_{n-2} = k$, then $X_n$ satisfies the Markov property

$$P(X_n = j \mid X_{n-1} = i, X_{n-2} = k, \dots, X_0 = m) = P(X_n = j \mid X_{n-1} = i) = p_{ij}, \qquad i, j, k, \dots, m \in S.$$

If $X_n$ satisfies the Markov property for all $n \ge 1$ and the transition probabilities $p_{ij}$ are time-homogeneous, then the sequence of random variables is called a finite homogeneous Markov chain. Having the precise numbers of individuals (for example, mutations) for which $X_{t-1} = i$ and $X_t = j$, we can apply the microdata approach [20]. The microdata allow us to estimate the parameters of the Markov chain by the estimator [20]
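$$\hat{p}_{ij} = \frac{\sum_{t \in T} v_{ij}(t)}{\sum_{t \in T} n_i(t-1)}, \qquad i, j \in S, \qquad (1)$$

where
$p_{ij} \ge 0$, $\sum_{j=1}^{s} p_{ij} = 1$,
$\hat{P} = [\hat{p}_{ij}]_{i,j \in S}$ - the matrix of estimated transition probabilities,
$t \in T$ - the space of the time,
$s \in S$ - the space of the states (nucleotides),
$v_{ij}(t)$ - the number of individuals for which $X_{t-1} = i$ and $X_t = j$ in the interval $(t-1, t]$,
$n_i(t-1)$ - the number of micro units in state $i$ at time $t-1$.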
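A minimal sketch of estimator (1) in Python is given below, assuming that each alignment contributes pairs of aligned positions read in the direction from gene to intergenic sequence, and that positions containing gaps or ambiguous symbols are skipped. The function names and the toy alignments are hypothetical; the authors' own counting was done with a Perl script.

```python
NUCLEOTIDES = "ACGT"

def count_substitutions(aligned_pairs):
    """Count v_ij over all aligned (gene, intergenic) sequence pairs."""
    counts = {a: {b: 0 for b in NUCLEOTIDES} for a in NUCLEOTIDES}
    for source, target in aligned_pairs:
        for a, b in zip(source, target):
            if a in counts and b in counts:  # skip gaps and ambiguous symbols
                counts[a][b] += 1
    return counts

def estimate_transition_matrix(counts):
    """p_ij = (number of i -> j observations) / (number of positions starting in i)."""
    matrix = {}
    for a in NUCLEOTIDES:
        row_total = sum(counts[a].values())
        if row_total == 0:
            raise ValueError(f"no observations for state {a}")
        matrix[a] = {b: counts[a][b] / row_total for b in NUCLEOTIDES}
    return matrix

# Toy alignments (gene, intergenic homologue), for illustration only.
pairs = [("ACGTAC", "ACGTTC"), ("AAGG-C", "AAGACC")]
matrix = estimate_transition_matrix(count_substitutions(pairs))
for a in NUCLEOTIDES:
    print(a, [round(matrix[a][b], 3) for b in NUCLEOTIDES])
```

Identical positions (A to A, C to C, and so on) are counted as well, so every row of the resulting matrix sums to one.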
In general, the assumption that the rules of transitions between nucleotides do not change in time allows us to model the mutation process by a homogeneous Markov chain (HMC). We used the observed substitution frequencies as the probabilities of the transition matrix describing the Markov chain. We investigated each aligned base pair and considered each observed nucleotide substitution (from A to G, from A to T etc.) and each identical position in the analyzed sequences. We assumed that the direction of the transition was always from gene to intergenic sequence. The numbers of substitutions obtained (including the A - A, C - C, G - G and T - T cases) were summed and normalized, so that the fractions of transitions from one nucleotide to any other summed up to one. In this way we obtained matrices of substitutions, which we set as 4 × 4 Markov chain probability matrices. The next step was the computation of the ergodic distribution for the estimated transition matrices. The ergodic states obtained, which approximate the nucleotide compositions of the sequences in equilibrium with mutation, were used as characteristics of the mutational pressure in different DNA molecules. To make sure that the assumption of homogeneity of the mutation process along the chromosome is correct, we compared the substitution table calculated for all chromosomal alignments together with ten alternative tables, each constructed on the basis of a single alignment, using the Chi-square (χ²) test. We repeated the tests for the set of matrices calculated for each combination of 2, 3, 4, ... alignments. The results for B. burgdorferi are plotted in Figure 4. Some of these individual alignments passed the test of similarity at significance level α = 0.05. As expected, the results for combined matrices improved with increasing number
of alignments. It seems reasonable to work out a method which can also cope with a heterogeneous process. In view of the lack of homogeneity in the mutation rate along the chromosome and of insufficient data, it is not possible to model such a process using non-homogeneous Markov chains. Therefore, we propose a model based on Markov set-chains.
Fig. 4. Results of multiple χ² consistency tests for transition tables constructed for combinations of 1, 2, ..., 9 alignments for B. burgdorferi. The y axis shows the Chi-square values; the x axis shows the number of alignment combinations.
In a Markov set-chain, each transition probability $p_{ij}$ is replaced by an interval defined by the minimum and maximum attainable probability of substitution (in our case derived from the minimum and maximum number of mutations between A, C, G and T). The proposed method (for a full description see Appendix) allows us to approximate the intervals of the transition probabilities. The transition probability intervals are expressed by two nonnegative matrices P and Q (P ≤ Q, element-wise). If P and Q (constructed by rows) satisfy $p_{ij} = \min a_{ij}$ and $q_{ij} = \max a_{ij}$, $A \in [P, Q]$, for all i, j, then [P, Q] is called tight [1]. The ergodic distribution of the homogeneous Markov chains for all analyzed genomes is presented in Table 2.

Table 2. The ergodic distribution of 13 genomes.

Genome                         A         C         G         T
Escherichia coli UTI89         0.180577  0.220819  0.385832  0.212757
Borrelia burgdorferi B31       0.297288  0.079474  0.146162  0.477075
Listeria monocytogenes         0.307275  0.251133  0.188211  0.253373
Helicobacter pylori            0.312904  0.193775  0.186174  0.307133
Psychrobacter arcticum         0.427992  0.166834  0.202594  0.202583
Pseudomonas aeruginosa         0.279115  0.251499  0.324576  0.144823
Agrobacterium tumefaciens      0.152572  0.286023  0.347406  0.214003
Thermus thermophilus           0.309124  0.210725  0.333222  0.146938
Thermosynechococcus elongatus  0.275475  0.225041  0.313937  0.185535
Deinococcus geothermalis       0.187453  0.239525  0.376221  0.196809
Pyrococcus abyssi              0.271237  0.234744  0.200833  0.293208
Thermoplasma acidophilum       0.29528   0.222824  0.198469  0.283419
Archeoglobus fulgidus          0.272417  0.222484  0.238538  0.266546
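The interval matrices P and Q of a Markov set-chain, introduced before Table 2, can be illustrated with the Python sketch below. It assumes that the family of admissible transition matrices is given as a finite list (for instance one substitution matrix per alignment) and takes the element-wise minimum and maximum; this is our simplifying assumption, since the construction actually used is described in the Appendix, and the computation of bounds on the ergodic distribution is not shown. The 2 × 2 matrices are hypothetical.

```python
def interval_bounds(matrices):
    """Element-wise minimum (P) and maximum (Q) over a family of stochastic matrices."""
    n = len(matrices[0])
    P = [[min(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]
    Q = [[max(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]
    return P, Q

def in_interval(A, P, Q, eps=1e-9):
    """True if A is row-stochastic and P <= A <= Q element-wise."""
    for row, low, high in zip(A, P, Q):
        if abs(sum(row) - 1.0) > eps:
            return False
        if any(a < l - eps or a > h + eps for a, l, h in zip(row, low, high)):
            return False
    return True

# Hypothetical 2 x 2 example; in the paper the matrices are 4 x 4 (A, C, G, T).
M1 = [[0.9, 0.1], [0.2, 0.8]]
M2 = [[0.7, 0.3], [0.4, 0.6]]
P, Q = interval_bounds([M1, M2])
print(P, Q)
print(in_interval([[0.8, 0.2], [0.3, 0.7]], P, Q))
print(in_interval([[0.5, 0.5], [0.3, 0.7]], P, Q))
```

Every row-stochastic matrix accepted by `in_interval` is a possible one-step transition matrix of the set-chain, which is why the resulting bounds on the ergodic distribution can be wide.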
The bounds of the ergodic distributions for the analyzed genomes, computed with Markov set-chains, are presented in Table 3. Comparing the interval distribution of the Markov set-chain with the corresponding ergodic distribution of each genome, we observe a large span between the lower and upper bounds. We suppose that the mutation process should be examined with a non-homogeneous Markov chain. Using a non-homogeneous Markov chain guarantees a suitable form of the stationary distribution and appropriate conclusions about the process of mutational pressure.

Table 3. The ergodic interval distribution of the Markov set-chain for 13 genomes (each cell: [lower bound, upper bound]).

Genome                         A                   C                   G                   T
Escherichia coli UTI89         [0.03357, 0.7764]   [0.03954, 0.74478]  [0.03492, 0.80006]  [0.03357, 0.77202]
Borrelia burgdorferi B31       [0.06976, 0.57381]  [0.01923, 0.44229]  [0.04044, 0.47206]  [0.14249, 0.6889]
Listeria monocytogenes         [0.03222, 0.85399]  [0.01761, 0.69596]  [0.00951, 0.63049]  [0.02344, 0.79458]
Helicobacter pylori            [0.04608, 0.76209]  [0.0205, 0.69295]   [0.02567, 0.70082]  [0.03802, 0.79687]
Psychrobacter arcticum         [0.00634, 0.9081]   [0.00364, 0.87458]  [0.00582, 0.87569]  [0.00634, 0.91222]
Pseudomonas aeruginosa         [0.00994, 0.90816]  [0.01592, 0.92443]  [0.0143, 0.92223]   [0.009, 0.9007]
Agrobacterium tumefaciens      [0.02667, 0.6659]   [0.03882, 0.75983]  [0.05034, 0.77818]  [0.0294, 0.74335]
Thermus thermophilus           [0.0116, 0.88693]   [0.01903, 0.91349]  [0.01919, 0.93122]  [0.01617, 0.88423]
Thermosynechococcus elongatus  [0.0206, 0.77421]   [0.02118, 0.7834]   [0.02348, 0.81184]  [0.0206, 0.76764]
Deinococcus geothermalis       [0.03075, 0.52977]  [0.03099, 0.618]    [0.04589, 0.69187]  [0.03144, 0.5773]
Pyrococcus abyssi              [0.02494, 0.68712]  [0.02055, 0.67192]  [0.01651, 0.61192]  [0.0201, 0.6736]
Thermoplasma acidophilum       [0.03368, 0.70875]  [0.02908, 0.71096]  [0.02606, 0.70321]  [0.03427, 0.69422]
Archeoglobus fulgidus          [0.03648, 0.71021]  [0.04806, 0.62469]  [0.03518, 0.68254]  [0.0609, 0.67545]

3. Results

After comparing the ergodic states for all analyzed genomes with the ATGC compositions of their intergenic sequences, we found that their similarities were always significant at the confidence level of 95% (p-values for the χ² statistics were always close to 0). Additionally, we distinguished several patterns of ATGC composition in the Markov chain ergodic states for different groups of bacterial genomes. The archebacterial genomes should form a separate (external) group because of their weak asymmetry as well as their GC content. There is a characteristic pattern in the nucleotide composition of intergenic sequences and in the equilibrium composition for archebacterial genomes (see Figure 3a). According to theory, the stability of the DNA double helix at high temperatures depends on the number of G-C pairs between the two DNA strands: the more G-C pairs (more triple hydrogen bonds), the more stable the DNA molecule. So we should expect higher fractions of GC in the genome sequences and ergodic states of thermophiles and hyperthermophiles. This is true for eubacterial thermophiles but not for archebacterial ones (see Figure 3 a-c). The second group gathers the eubacterial thermophiles, with high GC content in intergenic sequences and similar ergodic states. The last group consists of eubacterial mesophiles and psychrophiles, whose intergenic sequences have low GC content and whose calculated ergodic states are poorer in GC than in AT. The Borrelia burgdorferi genome was not assigned to any of the above-mentioned groups. We compared the composition of the intergenic sequences of the B. burgdorferi main chromosome with the ergodic distribution of the substitution matrix calculated for chromosomal sequences. Because we observed a significant difference, we chose the third positions in codons of coding sequences as reference values. This time we obtained a significant value of the Chi-square test (see Table 4). To distinguish internal and external factors influencing mutational pressure in genomes, we analyzed B. burgdorferi plasmids and calculated ergodic states for them. After establishing the similarity between the ergodic state for the main chromosome and the composition of the third positions in codons, we decided to check the agreement of the equilibrium states calculated for plasmid sequences. This time we did not obtain significant p-values for the χ² statistics (see Table 4).

Table 4. Comparison of the nucleotide composition of the third positions of coding sequences from the leading and lagging strands of the B. burgdorferi chromosome with the ergodic states of substitution matrices calculated for four different alignment groups.

Ergodic state for the substitution matrix calculated for:                        Nucleotide composition of third positions of coding sequences
                                                                                 from the leading strand of Borrelia burgdorferi
[a] genes from the leading strand of the chromosome and intergenic sequences     0.000224229
[b] genes from circular plasmids and pseudogenes from circular plasmids          0.052343598
[c] genes from linear plasmids and pseudogenes from linear plasmids              0.015220211
[d] genes from circular plasmids and pseudogenes from linear plasmids            0.022117944
It seems that the equilibrium state for plasmids evolving in the same cell differs from the equilibrium state for the chromosome. Considering that the replication mechanism of B. burgdorferi plasmids is entirely different from that of the main chromosome [1], [27], we can point to replication as the main factor determining the mutational pressure that influences the evolution of a given DNA molecule. The analysis of the intervals for ergodic distributions obtained with the Hartfiel algorithm [11], [12] did not provide evidence that could confirm the above conclusions. The obtained intervals are too wide and leave a lot of freedom for the expected real ergodic distributions of the genomes' Markov chain transition tables.

Conclusions
Concluding our investigations, we can state that:
(1) External factors influencing mutational pressure during genome evolution can be masked by factors connected with internal conditions, such as replication or translation mechanisms specific to a given cell or even a given DNA molecule.
(2) Modeling with HMC seems to be overused in the case of genome evolution. The suitable tool for analyzing mutational pressure is NHMC.
(3) Because experimental data on the number and direction of nucleotide substitutions per constant time unit in a given genome are lacking, the most appropriate method (from a statistical point of view) for modeling the pure mutation process is Markov set-chains.

Acknowledgments
All computations were done using Mathematica 5.0 and Statistica 8.0.

Appendix [11-13]
In the monograph [11] Hartfiel proposes two constructions (geometrical and algorithmic) of the Markov set-chain intervals, which bound (from above and below) the unknown parameters at each step. Both methods use tight bounds. Only the algorithmic method, which was applied here, is presented below.

Definition. Let $p$, $q$ be nonnegative vectors with $p \le q$, and let $[p, q]$ be an interval for a fuzzy probability. If $p_i = \min x_i$ and $q_i = \max x_i$, where $x \in [p, q]$, then $p_i$ and $q_i$ are called tight. If $p_i$ and $q_i$ are tight for each $i$, then the interval $[p, q]$ is called tight.

Definition. Let a) $M$ be a compact set of $n \times n$ stochastic matrices $A$, b) $s_1, \ldots, s_n$ be states. Consider the set of all nonhomogeneous Markov chains having all their transition matrices in $M$. Define
$$M^2 = M M = \{A_1 A_2 : A_1, A_2 \in M\}, \quad \ldots, \quad M^{k+1} = M M^k = \{A_1 \cdots A_{k+1} : A_1, \ldots, A_{k+1} \in M\}.$$
The sequence $M, M^2, \ldots, M^k, \ldots$ is a Markov set-chain.

Definition. Define $\omega_n = \{A : A \text{ is an } n \times n \text{ stochastic matrix}\}$. Let $P$, $Q$ be $n \times n$ nonnegative matrices with $P \le Q$. The corresponding interval in $\omega_n$ is $[P, Q] = \{A : A \text{ is an } n \times n \text{ stochastic matrix with } P \le A \le Q\}$ (assumption: $[P, Q] \ne \emptyset$). If $P$ and $Q$ satisfy, for all $i$ and $j$,
$p_{ij} = \min a_{ij}$, $q_{ij} = \max a_{ij}$, $A \in [P, Q]$, then the matrix interval $[P, Q]$ is called tight. The tightness of an interval can be tested with the following lemma.

Lemma. Let $[p, q]$ be an interval. For each $i$:
a) $p_i$ is tight if and only if $p_i + \sum_{k \ne i} q_k \ge 1$,
b) $q_i$ is tight if and only if $q_i + \sum_{k \ne i} p_k \le 1$.

If an interval is not tight, Hartfiel gives the following tightening algorithm:
(1) Input $p$, $q$.
(2) For $i = 1, \ldots, n$ do
  i) if $p_i + \sum_{k \ne i} q_k \ge 1$, then set $\bar{p}_i = p_i$, otherwise set $\bar{p}_i = 1 - \sum_{k \ne i} q_k$,
  ii) if $q_i + \sum_{k \ne i} p_k \le 1$, then set $\bar{q}_i = q_i$, otherwise set $\bar{q}_i = 1 - \sum_{k \ne i} p_k$.
(3) Output $\bar{p}$, $\bar{q}$.

For computing bounds at each step of a Markov set-chain, Hartfiel defines two matrices, $L$ and $H$, the column tight component bounds on the compact set $M$ of $n \times n$ transition matrices.

Definition. Matrices $L$ and $H$ are called column tight component bounds on $M$ provided $L \le A \le H$ for all $A \in M$ and
i) if $l_k$ is the $k$-th column of $L$, then there is a matrix $A \in M$ with $k$-th column $l_k$, and
ii) if $h_k$ is the $k$-th column of $H$, then there is a matrix $A \in M$ with $k$-th column $h_k$.

Algorithm to compute bounds on $M^{k+1}$
Given: column tight bounds $L$, $H$ on $M^k$. Find column tight bounds $\bar{L}$, $\bar{H}$ on $M^{k+1}$ for $k = 1, 2, \ldots, m$.

To compute $\bar{L}$ (the columns of $\bar{L}$ are computed sequentially):
L1. Input $P$, $Q$, $L$, $H$.
L2. For $j = 1$ to $n$ do (computing the $j$-th column $\bar{l}_j$ of $\bar{L}$)
  a) define $l = l_j$,
  b) sort the entries of $l$: $l_{i_1} \le l_{i_2} \le \ldots \le l_{i_n}$,
  c) for $i = 1$ to $n$ do
    c1) search for $t$ such that
        $q_{i i_1} + \ldots + q_{i i_t} + p_{i i_{t+1}} + p_{i i_{t+2}} + \ldots + p_{i i_n} < 1$,
        $q_{i i_1} + \ldots + q_{i i_t} + q_{i i_{t+1}} + p_{i i_{t+2}} + \ldots + p_{i i_n} \ge 1$,
    c2) define $x = 1 - q_{i i_1} - \ldots - q_{i i_t} - p_{i i_{t+2}} - \ldots - p_{i i_n} < 1$,
    c3) define $\tilde{L}_i = p$ (as in (A1)),
  d) define $\tilde{L} = [\tilde{L}_1 \ldots \tilde{L}_n]^T$,
  e) define $\bar{l}_j = \tilde{L} l$,
L3. define $\bar{L} = [\bar{l}_1 \ldots \bar{l}_n]$,
L4. output $\bar{L}$.

To compute $\bar{H}$ (the columns of $\bar{H}$ are computed sequentially):
H1. Input $P$, $Q$, $L$, $H$.
H2. For $j = 1$ to $n$ do (computing the $j$-th column $\bar{h}_j$ of $\bar{H}$)
  a) define $h = h_j$,
  b) sort the entries of $h$: $h_{i_1} \le h_{i_2} \le \ldots \le h_{i_n}$,
  c) for $i = 1$ to $n$ do
    c1) search for $s$ such that
        $p_{i i_1} + \ldots + p_{i i_s} + p_{i i_{s+1}} + q_{i i_{s+2}} + \ldots + q_{i i_n} < 1$,
        $p_{i i_1} + \ldots + p_{i i_s} + q_{i i_{s+1}} + q_{i i_{s+2}} + \ldots + q_{i i_n} \ge 1$,
    c2) define $y = 1 - p_{i i_1} - \ldots - p_{i i_s} - q_{i i_{s+2}} - \ldots - q_{i i_n} < 1$,
    c3) define $\tilde{H}_i = q$ (as in (A1)),
  d) define $\tilde{H} = [\tilde{H}_1 \ldots \tilde{H}_n]^T$,
  e) define $\bar{h}_j = \tilde{H} h$,
H3. define $\bar{H} = [\bar{h}_1 \ldots \bar{h}_n]$,
H4. output $\bar{H}$.

To find column tight bounds on $M^k$ for any $k$, repeat the algorithm.
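The tightness lemma and the single-pass tightening step stated in this appendix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `is_tight` and `tighten`, the tolerance `eps`, and the example vectors are hypothetical and do not come from the paper or from Hartfiel's monograph.

```python
from typing import List, Sequence, Tuple


def is_tight(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> bool:
    """Check the lemma: p_i + sum_{k != i} q_k >= 1 and q_i + sum_{k != i} p_k <= 1."""
    n = len(p)
    for i in range(n):
        others_q = sum(q[k] for k in range(n) if k != i)
        others_p = sum(p[k] for k in range(n) if k != i)
        if p[i] + others_q < 1 - eps or q[i] + others_p > 1 + eps:
            return False
    return True


def tighten(p: Sequence[float], q: Sequence[float],
            eps: float = 1e-12) -> Tuple[List[float], List[float]]:
    """One pass of the tightening step, always using the original p and q."""
    n = len(p)
    p_bar, q_bar = [], []
    for i in range(n):
        others_q = sum(q[k] for k in range(n) if k != i)
        others_p = sum(p[k] for k in range(n) if k != i)
        # Keep p_i if it is already tight, otherwise raise it to 1 - sum of other upper bounds.
        p_bar.append(p[i] if p[i] + others_q >= 1 - eps else 1 - others_q)
        # Keep q_i if it is already tight, otherwise lower it to 1 - sum of other lower bounds.
        q_bar.append(q[i] if q[i] + others_p <= 1 + eps else 1 - others_p)
    return p_bar, q_bar


# Hypothetical bounds for a 3-state probability vector.
p_example = [0.2, 0.1, 0.1]
q_example = [0.9, 0.5, 0.4]
p_tight, q_tight = tighten(p_example, q_example)
```

In this made-up example only the first upper bound changes: 0.9 cannot be attained because the other two components must sum to at least 0.2, so the bound is lowered to 0.8.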
References
1. S. Casjens, N. Palmer, R. van Vugt, M.H. Wai, B. Stevenson, P. Rosa, R. Lathigra, G. Sutton, J. Peterson, R. Dodson, D. Haft, E. Hickey, M. Gwinn, O. White and C. Fraser, Molecular Microbiology 35 (3) pp. 490-516 (2000).
2. M. Dudkiewicz, P. Mackiewicz, A. Nowicka, M. Kowalczuk, D. Mackiewicz, N. Polak, K. Smolarczyk, M.R. Dudek and S. Cebrat, Lecture Notes in Computer Science 2657 pp. 343-35. (2003).
3. M. Dudkiewicz, P. Mackiewicz, M. Kowalczuk, D. Mackiewicz, A. Nowicka, N. Polak, K. Smolarczyk, J. Banaszak, M.R. Dudek and S. Cebrat, Physica A 336 (1-2) pp. 63-73 (2004).
4. M. Dudkiewicz, P. Mackiewicz, A. Nowicka, M. Kowalczuk, D. Mackiewicz, N. Polak, K. Smolarczyk, J. Banaszak, M.R. Dudek and S. Cebrat, FGCS (2004).
5. M. Dudkiewicz, P. Mackiewicz, D. Mackiewicz, M. Kowalczuk, A. Nowicka, N. Polak, K. Smolarczyk, J. Banaszak, M.R. Dudek and S. Cebrat, Biosystems 80 pp. 192-199 (2005).
6. M.P. Francino, L. Chao, M.A. Riley and H. Ochman, Science 2729 pp. 107-109 (1996).
7. M.P. Francino and H. Ochman, Trends Genet. 13 pp. 240-245 (1997).
8. A.C. Frank and J.R. Lobry, Gene 238 pp. 65-77 (1999).
9. C.M. Fraser, S. Casjens, W.M. Huang, G.G. Sutton, R. Clayton, R. Lathigra, O. White and K.A. Ketchum, Nature 390 pp. 580-586 (1997).
10. T. Gojobori, W.H. Li and D. Graur, J. Mol. Evol. 18 pp. 360-369 (1982).
11. D.J. Hartfiel, Markov Set-Chains, Springer, New York (1998).
12. D.J. Hartfiel, J. Statist. Comput. Simulation 38, 947 (1994).
13. D.J. Hartfiel, Non-homogeneous Matrix Products, World Scientific, New Jersey (2002).
14. H D. Gwang and P. Green, Proc. Natl. Acad. Sci. USA 101 (39) pp. 13994-14001 (2004).
15. F. Jeanmougin, J.D. Thompson, M. Gouy, D.G. Higgins and T.J. Gibson, Trends Biochem. Sci. 23 pp. 403-405 (1988).
16. J.G. Kemeny and J.L. Snell, Finite Markov Chains, New York (1960).
17. M. Kimura, Mol. Evol. 16 pp. 111-120 (1980).
18. M. Kowalczuk, P. Mackiewicz, D. Mackiewicz, A. Nowicka, M. Dudkiewicz, M.R. Dudek and S. Cebrat, J. Appl. Genet. 42 (4) pp. 553-577 (2001).
19. M. Kowalczuk, P. Mackiewicz, D. Mackiewicz, A. Nowicka, M. Dudkiewicz, M.R. Dudek and S. Cebrat, BMC Evolutionary Biology 1 pp. 13-17 (2001).
20. J. Leluk, Computers & Chemistry 24 pp. 659-672 (2000).
21. T.C. Lee, G.G. Judge and A. Zellner, Estimating the Parameters of the Markov Probability Model from Aggregate Time Series, North-Holland, Amsterdam (1970).
22. J.R. Lobry, Mol. Biol. Evol. 13 pp. 660-665 (1996).
23. J.R. Lobry, J. Mol. Evol. 40 pp. 326-330 (1995).
24. D. Mackiewicz, P. Mackiewicz, M. Kowalczuk, M. Dudkiewicz, M.R. Dudek and S. Cebrat, Acta Microbiologica Polonica 52 (3) pp. 245-261 (2003).
25. P. Mackiewicz, M. Dudkiewicz, M. Kowalczuk, D. Mackiewicz, J. Banaszak, N. Polak, K. Smolarczyk, A. Nowicka, M.R. Dudek and S. Cebrat, Lecture Notes in Computer Science 3039 pp. 687-693 (2004).
26. P. Mackiewicz, J. Zakrzewska-Czerwiska, A. Zawilak, M.R. Dudek and S. Cebrat, Nucl. Acids Res. 32 pp. 3781-3791 (2004).
27. K.J. Marians, Annu. Rev. Biochem. 61 pp. 673-719 (1992).
28. R. Okazaki, T. Okazaki, K. Sakabe, K. Sugimoto and A. Sugino, Proc. Natl. Acad. Sci. USA 59 (2) pp. 598-605 (1968).
April 24, 2009
16:10
WSPC - Proceedings Trim Size: 9in x 6in
Waldo.Cancino.novo2
252
MULTI-OBJECTIVE APPROACHES APPLIED TO THE PHYLOGENETIC INFERENCE PROBLEM

W. CANCINO∗, A. C. B. DELBEM∗∗
Institute of Mathematics and Computer Science, University of Sao Paulo
Av. Trabalhador Sao-carlense, 400 - Centro, Sao Carlos, SP, Brazil 13560–970
Phone: (16) 33738167
E-mails: ∗[email protected], ∗∗[email protected]

The phylogenetic inference problem consists of determining the best evolutionary relation among species [31]. Several methods proposed in the literature solve this problem using an optimality criterion that evaluates each possible solution [11]. Nevertheless, different criteria may lead to distinct phylogenies, which often conflict with each other. Moreover, other factors of phylogenetic inference may generate conflicting solutions [26]. In this context, a multi-objective formulation can help to overcome these incongruities. In this paper, we review the main multi-objective approaches for phylogenetic inference proposed in the literature [1,4,25].
1. Introduction
Phylogenetic inference is one of the central problems in computational biology. This problem consists in finding the best tree that explains the evolutionary history of species from a given dataset. Several phylogenetic reconstruction methods have been proposed in the literature. Some of these methods use one optimality criterion (or objective function) to evaluate possible solutions in order to determine the best tree. Several studies [15,21,32] have shown important differences in the results obtained by applying distinct reconstruction methods to the same input data. The use of different data sources has been identified as another cause of incongruities in phylogenetic analysis [26]. Multi-objective optimization (MOO) approaches to phylogenetic reconstruction can be a relevant contribution since they can search for solutions using more than one criterion and produce trees which are consistent with
all employed criteria. Recently, Handl et al. [14] discussed current and future applications of MOO in bioinformatics and computational biology problems. In this paper, we present a review of the main MOO approaches for phylogenetic reconstruction found in the literature [1,4,25]. These studies show how a MOO formulation can be applied when conflicting datasets and conflicting optimality criteria are considered. This paper is organized as follows. Section 2 provides relevant background information about phylogenetic inference. Section 3 discusses multi-objective optimization problems (MOOPs). Section 4 presents the MOO approaches to phylogenetic inference found in the literature. Finally, Section 5 discusses the contributions of the studies reviewed.

2. Phylogenetic Inference
Phylogenetic analysis investigates evolutionary relationships among species. Data used in this analysis usually come from sequence data (nucleotide or amino acid sequences), morphological features, or other types of data [11]. Frequently, researchers only have data from contemporary species, while information about past species is unknown. Consequently, phylogenetic reconstruction is only an estimation process, since it is based on incomplete information [31]. The evolutionary history of the species under analysis is often represented as a leaf-labelled tree, called a phylogenetic tree. The current species (or taxa) are represented by the external nodes of the tree. The past species (ancestors) are represented by the internal nodes of the tree. Tree nodes are connected by branches, which may have an associated length value. This branch length represents the evolutionary distance between the nodes connected by the branch. It is important to point out that a phylogenetic tree is a hypothesis (one of many possible ones) concerning the evolutionary events in the history of the species. A phylogenetic tree can be rooted or unrooted. In a rooted tree, there is a special node called the root that defines the direction of evolution, determining ancestral relationships among nodes. An unrooted tree only shows the relative positions of nodes without an evolutionary direction. Figures 1 and 2 show a rooted and an unrooted tree, respectively. The main goal of phylogenetic inference is the determination of the best tree that explains the evolutionary events of the species under analysis. Several phylogenetic reconstruction methods have been proposed in the literature. Swofford et al. [31] classify phylogenetic reconstruction methods
Fig. 1. A rooted tree.
Fig. 2. An unrooted tree.
into two categories:
(1) Algorithmic methods, which use well-defined steps to generate a tree. An important feature of these methods is that they go directly to the final solution without examining many alternatives in the search space. Consequently, these methods produce solutions quickly. Clustering approaches like Neighbor Joining (NJ) [28] and UPGMA [24] are in this category;
(2) Optimality criterion methods, which basically have two components: an objective function (optimality criterion) and a search mechanism. The objective function is used to score each possible solution. The search mechanism walks through the tree search space in order to find the best-scored tree according to the criterion used. Optimality methods are slower than algorithmic methods; however, they often provide more accurate answers [15]. Examples of optimality criterion methods are maximum parsimony [12], maximum likelihood [9], minimum evolution [20] and least squares [2].

One of the main problems in phylogenetic inference is the size of the tree search space, which increases exponentially with the number of taxa. In the case of optimality criterion methods, this means that the search mechanism requires heuristic techniques. Such techniques are able to find good solutions in reasonable time for large or even moderate datasets. Exhaustive and exact search techniques can also be employed, although their use is constrained to problems with a small number of species.
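To give a feeling for the growth of the search space mentioned above, the following minimal sketch counts binary tree topologies using the standard double-factorial formulas: (2n-5)!! unrooted and (2n-3)!! rooted trees for n taxa. The formulas are a well-known combinatorial result rather than something stated in this paper, and the function names are hypothetical.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted binary topologies: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for factor in range(3, 2 * n_taxa - 4, 2):  # 3, 5, ..., 2n - 5
        count *= factor
    return count


def count_rooted_trees(n_taxa: int) -> int:
    """Number of rooted binary topologies: (2n - 3)!! for n >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    count = 1
    for factor in range(3, 2 * n_taxa - 2, 2):  # 3, 5, ..., 2n - 3
        count *= factor
    return count


for n in (4, 10, 20):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

With four taxa there are only three unrooted topologies, whereas ten taxa already give more than two million, which is why exhaustive search is limited to small problems.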
3. Multi-Objective Optimization
An MOOP deals with two or more objective functions that must be simultaneously optimized. In this context, the Pareto dominance concept is commonly used to compare two solutions. A solution x dominates a solution y if x is not worse than y in all objectives and is better in at least one. Solving an MOOP implies calculating the Pareto optimal set, whose elements, called Pareto optimal solutions, represent a trade-off among the objective functions. Pareto optimal solutions are not dominated by any other solution in the search space. The curve formed by plotting these solutions in the objective function space is called the Pareto front. If there is no additional information regarding the relevance of the objectives, all Pareto optimal solutions have the same importance. Deb [7] highlights two fundamental goals in MOOP:
(1) To find a set of solutions as close as possible to the Pareto optimal front;
(2) To find a set of solutions as diverse as possible.
Many classical optimization techniques have been proposed to deal with MOOPs [7]. The simplest approach transforms an MOOP into a single-objective optimization problem using a weighted sum of the objective functions. This strategy only finds a single point in the Pareto front for each weight combination. Thus, several runs using different weight values are required to obtain a reasonable number of Pareto optimal solutions. Nevertheless, this method does not guarantee solution diversity along the front. There are other methods to deal with MOOPs; however, all of them have limitations, e.g. they need a priori knowledge of the problem, such as target values, which is not always available. On the other hand, evolutionary algorithms for multi-objective optimization (MOEAs) and other meta-heuristics have been successfully applied to both theoretical and practical MOOPs [7]. In general, the most elaborate meta-heuristic models are capable of finding a well-distributed Pareto optimal set in a single run.
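The dominance relation and the weighted-sum strategy described in this section can be sketched as follows. This is a minimal illustration that assumes all objectives are minimized; the function names and the sample points are hypothetical and are not taken from any of the reviewed studies.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Point, y: Point) -> bool:
    """x dominates y if x is no worse in every objective and better in at least one."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point dominates (quadratic-time filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_sum_best(points: Sequence[Point], weights: Sequence[float]) -> Point:
    """Scalarize with a weighted sum; each weight vector yields one solution."""
    return min(points, key=lambda p: sum(w * v for w, v in zip(weights, p)))


# Hypothetical objective vectors (both objectives minimized).
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(candidates)
near_first = weighted_sum_best(candidates, (0.9, 0.1))
near_second = weighted_sum_best(candidates, (0.1, 0.9))
```

Each call to `weighted_sum_best` returns a single point, so recovering several trade-off solutions requires one run per weight vector, whereas the dominance filter returns the whole non-dominated set at once.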
The following section presents a review of the main MOO approaches applied to the phylogenetic inference problem.

4. Multi-objective approaches to phylogenetic inference
The use of various phylogenetic reconstruction methods can produce different results for the same input data. Huelsenbeck [15] tested the main phylogenetic approaches on simulated datasets containing four species. In this study, most methods performed successfully; however, under some conditions, methods failed to find the true tree and produced different answers. Other studies [17,21,27,32] using simulated and/or real datasets confirmed these results. Thus, the selection of the reconstruction method is a crucial step in phylogenetic analysis. Rokas et al. [26] pointed out other sources of incongruity in phylogenetic analysis: the datasets used and the evolutionary assumptions concerning the data. Frequently, data for a group of species come from different sources. The use of conflicting datasets in phylogenetic reconstruction, combined or separated, may produce different trees. Thus, an MOO approach that takes into account several criteria can help to deal with the incongruities obtained from phylogenetic reconstruction. This approach not only enables the determination of the best solution according to each criterion separately; it also finds intermediate solutions representing a trade-off among the criteria used. The following subsections review the main references in the literature applied to conflicting datasets and conflicting optimality criteria.

4.1. Conflicting datasets
The main concern of the total evidence debate is to determine how to use information from different datasets in phylogenetic inference. There are two ways such data can be employed:
(1) Combine all available data for a single phylogenetic inference.
(2) Perform a phylogenetic inference for each dataset independently. Then, a consensus tree is calculated from all solutions.
Both uses of phylogenetic data have problems. There is no consensus on how to combine different datasets so as to produce a reliable inference. On the other hand, results from many separate analyses may be incongruent with one another. Summarising several trees in a single one leads to loss of information. In this context, Poladian and Jermiin [25] propose an MOO approach to deal with conflicting data.
The authors built two artificial datasets of four DNA sequences, named Dataset 1 and Dataset 2. A phylogenetic inference using the maximum likelihood criterion was performed on Dataset 1, on Dataset 2 and on both datasets concatenated. This concatenation is equivalent to assuming that both datasets have the same importance, or equal weights. For simplicity, the Jukes-Cantor model of sequence evolution [18] was used. The three
different topologies resulting from each inference are shown in Figure 3. It is important to note that each topology groups the species in a different way. While tree A groups (P,R) and (Q,S), trees B and C group (P,Q),(R,S) and (P,S),(Q,R), respectively. Neither separate nor combined analyses help to resolve the incongruity among these results.

Fig. 3. Maximum likelihood trees obtained for conflicting datasets [25] (trees A, B and C over species P, Q, R and S).
The authors explored the nature of the conflicting trees using an MOO formulation. The likelihood values obtained from Datasets 1 and 2 are used as objective functions. The Pareto-optimal trees maximize the likelihood for each possible weight combination of both datasets. The resulting Pareto front is shown in Figure 4. Likelihood scores for Datasets 1 and 2 are represented on the horizontal and vertical axes, respectively.

Fig. 4. Pareto-front obtained from maximum likelihood analysis [25]. (Plot: likelihood for Dataset 1 on the horizontal axis, likelihood for Dataset 2 on the vertical axis; the curve is marked with trees A, B and C and inflexion points X, Y, Z and W.)

The obtained Pareto front is divided into three regions:
(1) Region from points X to Y. The tree topologies in this region correspond
to tree A. The length of the internal branch increases; the lengths of the branches to P and S decrease, and the lengths of the branches to Q and R increase. At point Y, the topology of the solutions changes to tree C.
(2) Region from points Y to Z. The tree topologies in this region are equal to tree C. Branch lengths do not vary significantly across this region. At point Z, the topology of the solutions changes to tree B.
(3) Region from points Z to W. Trees in this region have a topology equal to tree B. The length of the internal branch and the length of the branch to R increase, while the lengths of the branches to P, Q and S decrease.

More information can be obtained from the Pareto front by introducing some constraints. For instance, if the tree topology is fixed, the maximum likelihood value for each competing topology is obtained. Moreover, solutions along the Pareto front reveal how topologies and likelihood values change. These results are useful to summarize all solutions in representative topologies without loss of information about the Pareto front or the fitness landscape. The authors also note the importance of the four-species problem for phylogenetic reconstruction methods based on quartets of species [30]. The MOO approach applied to each quartet may reveal the species that generate conflicting solutions. These species can be separated for a more comprehensive analysis, while the others can be used to construct a subtree consistent with all datasets.

4.2. Conflicting optimality criteria
Coelho and Von Zuben [4] propose a multi-objective Artificial Immune System (AIS) approach to the reconstruction of phylogenetic trees. AIS groups a class of algorithms designed to emulate principles of the natural immune systems.
These immune-inspired algorithms are applied to solve real-world complex problems in optimization, data mining and other areas [6]. The proposed MOO approach is based on the omni-aiNet algorithm, an AIS for single and multi-objective optimization previously developed by the authors [3]. The omni-aiNet is employed to find a set of Pareto-optimal trees that represent a trade-off between the minimum evolution and the least-squares criteria. Both criteria employ distance data to evaluate tree topologies. The omni-aiNet represents each solution as a distance matrix. This matrix is submitted to the NJ algorithm in order to obtain a tree topology and branch lengths. Thus, instead of searching for tree topology and branch
lengths directly, the omni-aiNet searches for the distance matrix that optimizes the minimum evolution and least-squares criteria. The experiments used an artificial dataset of eight species. The results indicate that the omni-aiNet algorithm was able to find a well-spread set of Pareto-optimal trees. Solutions found by omni-aiNet were compared against the tree inferred by applying the NJ algorithm to the input dataset. During NJ iterations, the minimum evolution and least-squares criteria are used to group species and build the tree. Thus, NJ can be viewed as a multi-objective greedy strategy. Compared to the tree found by NJ, the solutions obtained by omni-aiNet have better minimum evolution and least-squares scores. Additionally, two decision-making techniques were applied to choose the most promising trees from the Pareto front: Compromise Programming (CP) and Marginal Rate of Return (MRR). Although the trees chosen by CP and MRR have different branch lengths, they share the same topology.

Cancino and Delbem [1] propose an MOO approach for phylogenetic reconstruction using the maximum parsimony [12] and maximum likelihood [9] criteria. This approach, called PhyloMOEA, is an MOEA based on the NSGA-II model [8]. The PhyloMOEA output is a solution set representing a trade-off between the considered criteria. The initial trees used by PhyloMOEA are provided by maximum likelihood, maximum parsimony and bootstrap analysis [10], which are performed before PhyloMOEA's execution. This strategy is commonly employed by other phylogenetic inference programs based on evolutionary algorithms (EAs) [19,22]. Branch-length optimization is applied only after a PhyloMOEA execution because this operation is time-consuming.
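As an illustration of the parsimony criterion used as one of the two objectives, the sketch below scores a tree with Fitch's small-parsimony counting, the method associated with reference [12]. It is not PhyloMOEA's implementation: the nested-tuple tree encoding, the function names and the three-site alignment are assumptions made only for this example.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a pair of subtrees


def fitch_site_score(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one site on a binary tree."""
    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):          # leaf: its observed state, no changes
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                         # children agree: no extra change
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum the Fitch score over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site_score(tree, {name: seq[s] for name, seq in alignment.items()})
        for s in range(n_sites)
    )


# Hypothetical four-species alignment with two conflicting sites.
alignment = {"P": "AAG", "Q": "ACG", "R": "AAT", "S": "ACT"}
tree_a = (("P", "R"), ("Q", "S"))
tree_b = (("P", "Q"), ("R", "S"))
tree_c = (("P", "S"), ("Q", "R"))
scores = {name: parsimony_score(t, alignment)
          for name, t in (("A", tree_a), ("B", tree_b), ("C", tree_c))}
```

In this made-up alignment the second site favours the grouping (P,R),(Q,S) and the third favours (P,Q),(R,S), so two topologies tie on parsimony. A second criterion such as likelihood can separate trees in situations like this one.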
PhyloMOEA was tested using four nucleotide datasets:
(1) The rbcL 55 dataset comprises 55 sequences (each sequence has 1314 sites) of the rbcL chloroplast gene from green plants [23];
(2) The mtDNA 186 dataset contains 186 human mitochondrial DNA sequences (each sequence has 16608 sites) obtained from The Human Mitochondrial Genome Database (mtDB) [16];
(3) The RDP II 218 dataset comprises 218 prokaryotic RNA sequences (each sequence has 4182 sites) taken from the Ribosomal Database Project II [5];
(4) Finally, the ZILLA 500 dataset includes 500 rbcL sequences (each sequence has 1428 sites) from plant plastids [13].
Due to the stochastic nature of EAs, 20 executions of PhyloMOEA were performed for each dataset. At the end of each run, PhyloMOEA stores two
types of solutions:
(1) Pareto-optimal Solutions, which are the non-dominated solutions of the final population;
(2) Final Solutions, which include the Pareto-optimal solutions and the trees that have equal parsimony scores and different likelihood scores. These trees are promising from the perspective of the parsimony criterion.
Fig. 5. Final Solutions and Pareto front for the rbcL 55 dataset [1].
Fig. 6. Final Solutions and Pareto front for the mtDNA 186 dataset [1].
Fig. 7. Final Solutions and Pareto front for the RDP II 218 dataset [1].
Fig. 8. Final Solutions and Pareto front for the ZILLA 500 dataset [1].
(In each plot: parsimony score on the horizontal axis, ln likelihood on the vertical axis.)
Figures 5, 6, 7 and 8 show the Pareto fronts obtained in one PhyloMOEA execution for the rbcL 55, mtDNA 186, RDP II 218 and ZILLA 500 datasets, respectively. Parsimony scores are represented on the horizontal axis, while likelihood scores are represented on the vertical one. These figures also show the Final Solutions near the Pareto front. Since the parsimony scores are integer values, the resulting Pareto front is a discontinuous set of points. The two extreme points of the front represent the maximum parsimony and maximum likelihood trees found by PhyloMOEA.
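For two objectives of this kind (parsimony score to be minimized, ln likelihood to be maximized), the non-dominated set can be extracted with a simple sort-and-sweep, as sketched below. The function name and the score pairs are hypothetical; they only mimic the ranges of such plots and are not results from [1].

```python
from typing import List, Sequence, Tuple

Score = Tuple[int, float]  # (parsimony score, ln likelihood)


def front_parsimony_likelihood(trees: Sequence[Score]) -> List[Score]:
    """Non-dominated (parsimony, lnL) pairs: lower parsimony and higher lnL are better."""
    # Sort by increasing parsimony; within a tie, best likelihood first.
    ordered = sorted(set(trees), key=lambda t: (t[0], -t[1]))
    front: List[Score] = []
    best_lnl = float("-inf")
    for parsimony, lnl in ordered:
        # A tree enters the front only if it improves on the likelihood
        # of every tree with an equal or smaller parsimony score.
        if lnl > best_lnl:
            front.append((parsimony, lnl))
            best_lnl = lnl
    return front


# Hypothetical scores for six candidate trees.
population = [
    (2436, -41050.0), (2436, -41080.0), (2438, -41000.0),
    (2440, -41010.0), (2442, -40900.0), (2444, -40950.0),
]
front = front_parsimony_likelihood(population)
most_parsimonious = front[0]   # extreme point with the lowest parsimony score
most_likely = front[-1]        # extreme point with the highest ln likelihood
```

Because the parsimony score is an integer, the sweep keeps at most one point per parsimony value, which is why such a front appears as a discrete set of points rather than a continuous curve.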
Solutions from the Pareto-optimal and Final Solutions sets were validated statistically using the Shimodaira-Hasegawa test (SH-test) [29]. Tables 1 and 2 show summarized results from the SH-test applied to the Pareto-optimal and Final Solutions for each dataset, respectively. These tables show the number of non-rejected and rejected trees according to the parsimony and likelihood criteria.

Table 1. Summary of SH-test results for Pareto-optimal Solutions [1].

Dataset       SH-test Parsimony     SH-test Likelihood
              Non-Rej.   Rej.       Non-Rej.   Rej.
rbcL 55       10         0          10         0
mtDNA 186     8          0          4          4
RDP II 218    10         25         6          29
ZILLA 500     12         9          14         7

Table 2. Summary of SH-test results for Final Solutions [1].

Dataset       SH-test Parsimony     SH-test Likelihood
              Non-Rej.   Rej.       Non-Rej.   Rej.
rbcL 55       16         37         17         36
mtDNA 186     37         8          22         23
RDP II 218    21         57         11         67
ZILLA 500     27         79         29         77
The authors point out that the SH-test was designed to be applied to one criterion, i.e. it is not a multi-criteria test. However, the SH-test shows that some of the Pareto-optimal solutions are not significantly worse than the best trees resulting from a separate analysis. Thus, PhyloMOEA was able to find intermediate solutions that are consistent with the best solutions obtained from the parsimony and likelihood criteria.

5. Conclusion
In this paper, we presented a review of the main MOO approaches applied to phylogenetic reconstruction. These studies were motivated by several sources of incongruities identified in the literature [15,21,26,32]. In this context, MOO approaches were used when conflicting datasets and conflicting optimality criteria produce different solutions. Poladian and Jermiin [25] show how MOO can be applied to deal with conflicting datasets. The authors focused on a four-species problem. Two conflicting datasets of four species were artificially created. Different trees
were obtained by maximum likelihood inference on both datasets (separated or combined). Thus, an MOO approach was applied using the likelihood scores of each dataset as objective functions. The analysis of the Pareto-optimal solutions illustrates how the tree topologies supported by each dataset change across the front. The authors also point out the importance of the four-species problem for the quartet puzzling phylogenetic reconstruction techniques [30]. MOO approaches for conflicting optimality criteria were applied in [1,4]. Coelho and Von Zuben [4] developed an AIS algorithm, named omni-aiNet [3], to find Pareto-optimal trees using the minimum evolution and least-squares criteria. The Pareto-optimal solutions found by omni-aiNet were superior to the tree inferred by NJ on both criteria. On the other hand, Cancino and Delbem [1] use the PhyloMOEA algorithm to find trees that represent a trade-off between the maximum parsimony and maximum likelihood criteria. The trees found by PhyloMOEA were statistically tested against the best trees obtained by parsimony and likelihood analyses performed separately. Results suggest that some Pareto-optimal solutions were consistent with the employed criteria.

The studies reviewed in this paper present three possible MOO approaches to the phylogenetic inference problem. In each case, the resulting Pareto-optimal trees were examined and validated using different techniques. Also, the proposed approaches reveal the importance of a multi-objective formulation when conflicting phylogenetic trees are obtained. The application of MOO techniques in phylogenetic inference is still incipient. However, this approach could be a useful tool that allows researchers to examine different evolutionary scenarios and to perform a robust inference.

Acknowledgments
The authors would like to thank the Sao Paulo State Research Foundation (FAPESP) for the financial support provided.

References
1. W. Cancino and A. Delbem.
Inferring phylogenies by multi-objective evolutionary algorithms. International Journal of Information Technology and Intelligent Computing, 2(2), 2007.
2. L. Cavalli-Sforza and A. Edwards. Phylogenetic Analysis: Models and Estimation Procedures. Evolution, 21(3):550-570, 1967.
3. G. Coelho and F. Von Zuben. Omni-ainet: An immune-inspired approach for omni optimization. In Artificial Immune Systems, volume 4163, pages 294-308, 2006.
4. G. Coelho, A. Silva, and F. von Zuben. A multiobjective approach to phylogenetic trees: Selecting the most promising solutions from the pareto front. In 7th International Conference on Intelligent Systems Design and Applications, 2007.
5. J. Cole, B. Chai, R. Farris, Wang, S. Kulam, D. McGarrell, G. Garrity, and J. Tiedje. The Ribosomal Database Project (RDP-II): Sequences and Tools for High-throughput rRNA Analysis. Nucleic Acids Research, 33:D294-D296, 2005.
6. L. De Castro and J. Timmis. Artificial immune systems: a new computational intelligence approach. Springer, London, 2002.
7. K. Deb. Multi-Objective Optimization using Evolutionary Algorithms. John Wiley & Sons, New York, 2001.
8. K. Deb, S. Agrawal, A. Pratab, and T. Meyarivan. A Fast Elitist Non-Dominated Sorting Genetic Algorithm for Multi-Objective Optimization: NSGAII. KanGAL report 200001, Indian Institute of Technology, Kanpur, India, 2000.
9. J. Felsenstein. Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach. Journal of Molecular Evolution, 17:368-376, 1981.
10. J. Felsenstein. Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution, 39(4):783-791, 1985.
11. J. Felsenstein. Inferring Phylogenies. Sinauer, Sunderland, Massachusetts, 2004.
12. W. Fitch. Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology. Systematic Zoology, 20(4):406-416, 1972.
13. S. Guindon and O. Gascuel. A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood. Systematic Biology, 5(52):696-704, 2003.
14. J. Handl, D. Kell, and J. Knowles. Multiobjective Optimization in Computational Biology and Bioinformatics. IEEE Transactions on Computational Biology and Bioinformatics, 4(2):289-292, 2006.
15. J. Huelsenbeck. Performance of Phylogenetic Methods in Simulation. Systematic Biology, 44:17-48, 1995.
16. M. Ingman and U. Gyllensten. mtDB: Human Mitochondrial Genome Database, a Resource for Population Genetics and Medical Sciences. Nucleic Acids Research, 34:D749-D751, 2006.
17. L. Jin and M. Nei. Limitations of the Evolutionary Parsimony Method of Phylogenetic Analysis. Molecular Biology and Evolution, 7:82-102, 1990.
18. T. Jukes and C. Cantor. Mammalian protein metabolism. In Evolution of protein molecules, pages 21-120. Academic Press, 1969.
19. K. Katoh, K. Kuma, and T. Miyata. Genetic Algorithm-Based Maximum-Likelihood Analysis for Molecular Phylogeny. Journal of Molecular Evolution, 53:477-484, 2001.
20. K. Kidd and L. Sgaramella. Phylogenetic analysis: Concepts and Methods. American Journal of Human Genetics, 23(3):235-252, 1971.
21. M. Kuhner and J. Felsenstein. A Simulation Comparison of Phylogeny Algorithms under Equal and Unequal Evolutionary Rate. Molecular Biology and Evolution, 11:459-468, 1994.
22. A. R. Lemmon and M. C. Milinkovitch. The Metapopulation Genetic Algorithm: An Efficient Solution for the Problem of Large Phylogeny Estimation. In Proceedings of the National Academy of Sciences, volume 99, pages 10516-10521, 2002.
23. P. O. Lewis. A Genetic Algorithm for Maximum-Likelihood Phylogeny Inference Using Nucleotide Sequence Data. Molecular Biology and Evolution, 15(3):277-283, 1998.
24. C. Michener and R. Sokal. A quantitative approach to a problem in classification. Evolution, 11:130-162, 1957.
25. L. Poladian and L. Jermiin. Multi-Objective Evolutionary Algorithms and Phylogenetic Inference with Multiple Data Sets. Soft Computing, 10(4):359-368, 2006.
26. A. Rokas, B. Williams, N. King, and S. Carroll. Genome-Scale Approaches to Resolving Incongruence in Molecular Phylogenies. Nature, 425(23):798-804, 2003.
27. C. Russo, N. Takezaki, and M. Nei. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Molecular Biology and Evolution, 13(3):525-536, 1996.
28. N. Saitou and M. Nei. The Neighbor-Joining Method: A New Method for Reconstructing Phylogenetic Trees. Molecular Biology and Evolution, 4(4):406-425, 1987.
29. H. Shimodaira and M. Hasegawa. Likelihood-Based Tests of Topologies in Phylogenetics. Molecular Biology and Evolution, 16(8):1114-1116, 1999.
30. K. Strimmer and A. von Haeseler. Quartet puzzling: A quartet maximum-likelihood method for reconstructing tree topologies. Molecular Biology and Evolution, 13:407-514, 1996.
31. D. Swofford, G. Olsen, P. Waddell, and D. Hillis. Phylogeny Reconstruction. In Molecular Systematics, chapter 11, pages 407-514. Sinauer, 3 edition, 1996.
32. Y. Tateno, N. Takezaki, and M. Nei. Relative Efficiencies of the Maximum-Likelihood, Neighbor-Joining, and Maximum Parsimony Methods when Substitution Rate Varies with Site. Molecular Biology and Evolution, 11:261-267, 1994.
LOGIC FORMULAS BASED KNOWLEDGE DISCOVERY AND ITS APPLICATION TO THE CLASSIFICATION OF BIOLOGICAL DATA

G. FELICI, P. BERTOLAZZI
Institute for System Analysis and Computer Science, National Research Council, Rome, Italy
E-mail: {giovanni.felici, paola.bertolazzi}@cnr.it

M. R. GUARRACINO
High Performance Computing and Networking Institute, National Research Council, Naples, Italy
E-mail: [email protected]

A. CHINCHULUUN, P. M. PARDALOS
Industrial and Systems Engineering Department, University of Florida, Gainesville, FL, USA
E-mail: {altannar,pardalos}@ufl.edu

Classifiers built through supervised learning techniques are widely used in computational biology. Examples are neural networks, decision trees and support vector machines. Recently, an extension of the Regularized Generalized Eigenvalues Classifier (ReGEC) has been proposed, in which prior knowledge is included. When knowledge is formalized as a set of linear constraints on the ReGEC, the resulting nonlinear classifier has a lower complexity and halves the misclassification error with respect to the original method. In this work, we show how logic programming can extract knowledge from data to enhance classification models produced by ReGEC. The knowledge extraction method is based on two phases: a feature selection phase and a rule extraction phase. Feature selection is formulated as an integer programming problem that extends a set covering problem. The extraction phase is performed through the iterative solution of different instances of the same minimum cost satisfiability problem that models the logic separation rules used for classification. The overall method, which we call LF-ReGEC, guarantees that the number of points in the training set is not increased and that the resulting model does not overfit the problem. Furthermore, the overall accuracy of the method is increased. Finally, the method is compared with other methods using genomic and proteomic data sets taken from the literature.
1. Introduction

The use of automatic classification methods based on supervised learning has become an important research topic for theoretical and applied mathematics. The capability of extracting information and knowledge from large amounts of data is in fact much in demand in all those settings where complex phenomena are observed, and numerically measured, with the aim of understanding the underlying mechanisms that govern them. The literature proposes different types of classifiers, each one characterized by its own specific features. Their aim is to fit a model on the observed data that is capable of connecting the observed measures of a data point with a relevant, non-observed characteristic of the data point itself, usually referred to as its class. The standard approach of supervised learning is thus to infer the model's parameters from a set of observations of known class (the training sample) and then apply the model to predict or forecast the class value for observations for which the class is unknown. In the typical setting, one keeps aside a portion of the available data - referred to as the test sample - to verify that the model, produced on the basis of the training data, obtains a sufficient level of precision in predicting the class for the testing data. More elaborate testing schemes have been developed to reinforce the model's evaluation, such as cross validation or leave-one-out testing. Widely known examples of classifiers are neural networks,3 decision trees18 and support vector machines (SVM).5

Classification methods have been successfully applied in a variety of fields. In particular, promising applications of these methods are in the field of biomedicine and bioinformatics. Here, the precision of the method is of particular relevance, and the data sets that have to be analyzed are typically very large; the available samples are composed of gene expression data or DNA sequences that can reach dimensions of several tens of thousands.

The relevance of automatic classification for problems of this type drives research towards new ideas that can improve on the current performance of the available methods. One such idea is the integration of external or prior knowledge into a classification method. A natural approach is to plug a priori knowledge into a classifier by directly adding more points to the data set. This results in higher
computational complexity, and in a tendency to overfit; moreover, in most cases additional data is either unavailable or expensive to obtain. An interesting approach in this direction has been proposed by Mangasarian,16 showing that it is possible to analytically express knowledge as additional terms of the cost function of the optimization problem defining a standard SVM. This solution has the advantage of not increasing the dimension of the training set, thus avoiding overfitting and poor generalization of the classification model.3 Guarracino et al.14 have recently shown how to extend this approach to Generalized Proximal Support Vector Machines15 (GEPSVM), halving the misclassification error of the original method. The idea of augmenting the knowledge contained in the training set with additional knowledge is particularly appealing in the context of biomedical data, where it can be provided by the experience of field experts or by previous results.

In this paper, we propose an alternative method to incorporate additional knowledge in an SVM classifier, extending the work of Mangasarian et al.17 The proposed approach is based on the extraction of additional knowledge from the training data itself, but with a learning method that is substantially different from SVM. We adopt the logic mining method Lsquare7 combined with a recently developed feature extractor based on integer programming, to extract logic DNF formulas from the training data. Then, we select the most meaningful portions of such formulas and plug them into the ReGEC algorithm as external nonlinear knowledge. The results so obtained are very interesting and show a fairly consistent increase in the recognition capability of the system, as measured by 10-fold cross validation.
In some sense, we propose a combination of two very different learning methods: ReGEC, which operates in a multidimensional Euclidean space with highly nonlinear data transformation, and logic learning, which operates in a discretized space with models based on propositional logic. The former constitutes the master learning algorithm, while the latter provides the additional knowledge that is efficiently incorporated and dealt with by the ReGEC algorithm. It is of interest to note that the logic rules adopted to represent the external knowledge are the most appropriate form for synthesizing the additional knowledge that is possessed by field experts.

We briefly introduce the notation adopted throughout the paper below.
- Given a vector $x \in \mathbb{R}^n$, $\|x\|_1$ is the 1-norm, $\sum_{i=1}^{n} |x_i|$, and $\|x\|$ is the 2-norm, $\left(\sum_{i=1}^{n} x_i^2\right)^{1/2}$.
- $A^T$ is the transpose of the matrix $A \in \mathbb{R}^{m \times n}$, and $A_i$ and $A_{\cdot j}$ are the i-th row and the j-th column, respectively, of the matrix $A$.
- $e = (1, 1, \ldots, 1)^T$ and $0 = (0, 0, \ldots, 0)^T$.
- Given two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times k}$, a kernel $K(A, B)$ maps $\mathbb{R}^{m \times n} \times \mathbb{R}^{n \times k}$ into $\mathbb{R}^{m \times k}$. One of the most widely used kernels is the Gaussian kernel, where the ij-th element of the matrix is defined as $(K(A, B))_{ij} = e^{-\mu \|A_i^T - B_{\cdot j}\|^2}$, where A and B are matrices with the same number of columns, and $\mu$ is a positive constant.
- In classification, each point $x \in \mathbb{R}^n$ is assigned to one of the classes in $\{-1, 1\}$, and, for a set $\Gamma$ of m real points, represented by the m rows of a matrix in $\mathbb{R}^{m \times n}$, there is an associated vector $c \in \{-1, 1\}^m$ of labels denoting their classes.

The paper is organized as follows. In Section 2, we give a general description of knowledge based SVM. Section 3 discusses the technique by means of which additional knowledge can be implicitly added to the ReGEC classifier with very limited additional computational cost. Section 4 describes the logic learning method adopted to extract separating logic rules from the data, while Section 5 presents the experimental results obtained on 5 different biomedical data sets of large dimension. Section 6 provides some conclusions and directions for future research.

2. Knowledge based SVM

Support Vector Machine is a state of the art machine learning algorithm.5,27 The main idea of SVM is to separate the input space into two half spaces using the hyperplane $x^T w - \gamma = 0$ which maximizes the margin between the two classes. The hyperplane can be found by minimizing the norm of w, subject to constraints requiring that the points of both classes be correctly classified. For nonlinear classification, the SVM is used with kernel functions17 and the basic solution technique is still linear programming. With a Radial Basis Function kernel, the corresponding hyperplane, projected in the feature space,22 has the following form:

\[ f(x) \equiv K(x^T, B^T)u - \gamma = 0, \tag{1} \]

where $B \in \mathbb{R}^{k \times n}$ and $K(x^T, B^T) : \mathbb{R}^{1 \times n} \times \mathbb{R}^{n \times k} \to \mathbb{R}^{1 \times k}$ is the Radial Basis Function that returns the vector y with components:

\[ y_i = e^{-\|x - B_i\|^2 / \sigma}. \]
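The kernel definition above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `gaussian_kernel` and the toy points are hypothetical, both point sets are stored row-wise (so the second argument plays the role of the transpose of the paper's B), and the constant `mu` corresponds to $1/\sigma$ in the Radial Basis Function written above.

```python
import math

def gaussian_kernel(A, B, mu):
    """Return the m x k matrix with entries exp(-mu * ||A_i - B_j||^2).

    A is a list of m points and B a list of k points, all of dimension n.
    Both are stored row-wise here (an assumption of this sketch).
    """
    K = []
    for a in A:
        row = []
        for b in B:
            sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
            row.append(math.exp(-mu * sq_dist))
        K.append(row)
    return K

# Toy data (hypothetical): two points against two reference points.
A = [[0.0, 0.0], [1.0, 0.0]]
B = [[0.0, 0.0], [0.0, 2.0]]
K = gaussian_kernel(A, B, mu=0.5)
```

Each entry lies in (0, 1] and equals 1 only when the two points coincide, which is why the kernel row of a point can be read as a similarity profile against the reference set.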
Parameters $u \in \mathbb{R}^k$ and $\gamma \in \mathbb{R}$ are determined by solving the following linear programming problem:17

\[
\begin{aligned}
\min_{u,\gamma,y,s} \quad & \nu e^T y + e^T s \\
\text{s.t.} \quad & D(K(\Gamma, \Gamma^T)u - \gamma e) + y \ge e, \\
& -s \le u \le s, \\
& y \ge 0.
\end{aligned}
\tag{2}
\]

Here, D is the diagonal matrix of ±1 entries giving the classes of the elements of the training set, i.e. of the rows of the matrix $\Gamma$. One can include the knowledge of an expert in the learning phase of the function (1) to improve the results obtained by a classifier from the training set. The expert knowledge is represented by the following implication, which describes a knowledge region $\Delta$ in the input space in which points x are known to belong to the class +1:

\[ g(x) \le 0 \;\Rightarrow\; K(x^T, B^T)u - \gamma \ge \alpha, \quad \forall x \in \Delta, \tag{3} \]

where $\alpha$ is a nonnegative number. Therefore, based on the theorem of the alternatives17 for a convex function, the implication (3) can be expressed as a linear inequality in terms of the parameters $(u, \gamma)$, and we can add positive nonlinear knowledge to the problem (2) as follows:

\[
\begin{aligned}
\min_{u,\gamma,y,s} \quad & \nu e^T y + e^T s \\
\text{s.t.} \quad & D(K(A, B^T)u - \gamma e) + y \ge e, \\
& -s \le u \le s, \quad y \ge 0, \\
& K(x_i^T, B^T)u - \gamma - \alpha + v^T g(x_i) + z_i \ge 0, \\
& v \ge 0, \quad z_i \ge 0, \quad i = 1, \ldots, l.
\end{aligned}
\tag{4}
\]

The same holds for negative knowledge regions. This leads to the linear
program (2) with knowledge included in the cost function:

\[
\begin{aligned}
\min_{u,\gamma,y,s,v,p,z_1,\ldots,z_l,q_1,\ldots,q_t} \quad & \nu e^T y + e^T s + \sigma\Big(\sum_{i=1}^{l} z_i + \sum_{j=1}^{t} q_j\Big) \\
\text{s.t.} \quad & D(K(A, B^T)u - \gamma e) + y \ge e, \\
& -s \le u \le s, \quad y \ge 0, \\
& K(x_i^T, B^T)u - \gamma - \alpha + v^T g(x_i) + z_i \ge 0, \quad v \ge 0, \; z_i \ge 0, \; i = 1, \ldots, l, \\
& -K(x_j^T, B^T)u + \gamma - \alpha + p^T g(x_j) + q_j \ge 0, \quad p \ge 0, \; q_j \ge 0, \; j = 1, \ldots, t.
\end{aligned}
\tag{5}
\]

We note that the linear programming problem (5) maximizes the margin between the two classes while constraining the classification problem to leave the two a priori knowledge sets in the corresponding halfspaces.

3. Nonlinear Knowledge in GEPSVM

We recall that an SVM binary classifier finds a hyperplane which divides the space into two halfspaces. Points lying in one halfspace belong to class +1, the others to class −1. A different approach is used in Proximal Support Vector Machines (PSVM),10 where the class of a point is determined by its closeness to one of two hyperplanes. Given matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{k \times n}$, respectively representing points of class +1 and −1, and $\Gamma = A \cup B$, we can find the hyperplane approximating the class +1 by solving the following minimization task:

\[ \min_{u,\gamma \ne 0} \frac{\|K(A, \Gamma)u - e\gamma\|^2}{\|K(B, \Gamma)u - e\gamma\|^2}, \tag{6} \]

which finds the hyperplane minimizing the distance from the points of class +1 and at the same time maximizing the distance from the points of class −1. Conversely, the hyperplane for points in class −1 can be found by solving the reciprocal of (6):

\[ \min_{u,\gamma \ne 0} \frac{\|K(B, \Gamma)u - e\gamma\|^2}{\|K(A, \Gamma)u - e\gamma\|^2}. \tag{7} \]

Equation (7) finds the hyperplane that minimizes the distance from points in class −1 and maximizes the distance from points in class +1. Each of these problems can be solved using regularization as proposed by
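Guarracino et al.:15

\[ \min_{u,\gamma \ne 0} \frac{\|K(A, \Gamma)u - e\gamma\|^2 + \delta \|\tilde{K}_B u - e\gamma\|^2}{\|K(B, \Gamma)u - e\gamma\|^2}, \tag{8} \]

\[ \min_{u,\gamma \ne 0} \frac{\|K(B, \Gamma)u - e\gamma\|^2 + \delta \|\tilde{K}_A u - e\gamma\|^2}{\|K(A, \Gamma)u - e\gamma\|^2}, \tag{9} \]

where $\tilde{K}_A$ and $\tilde{K}_B$ are diagonal matrices, with diagonal elements taken respectively from matrices $K(A, \Gamma)$ and $K(B, \Gamma)$. Given

\[
\begin{aligned}
G &= [K(A, \Gamma) \;\; -e]^T [K(A, \Gamma) \;\; -e], \\
H &= [K(B, \Gamma) \;\; -e]^T [K(B, \Gamma) \;\; -e], \\
z &= [u^T \;\; \gamma]^T,
\end{aligned}
\tag{10}
\]

the equation (6) becomes:

\[ \min_{z \in \mathbb{R}^m} \frac{z^T G z}{z^T H z}. \tag{11} \]

Similarly for B, we obtain the reciprocal problem:

\[ \min_{z \in \mathbb{R}^m} \frac{z^T H z}{z^T G z}. \tag{12} \]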
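The step from the ratio of squared norms in (6) to the quotient of quadratic forms in (11) can be written out explicitly. The short derivation below uses only the definitions given in (10); the same argument with B in place of A gives the denominator.

```latex
\begin{aligned}
K(A,\Gamma)u - e\gamma
  &= \begin{bmatrix} K(A,\Gamma) & -e \end{bmatrix}
     \begin{bmatrix} u \\ \gamma \end{bmatrix}
   = \begin{bmatrix} K(A,\Gamma) & -e \end{bmatrix} z, \\
\|K(A,\Gamma)u - e\gamma\|^2
  &= z^T \begin{bmatrix} K(A,\Gamma) & -e \end{bmatrix}^T
         \begin{bmatrix} K(A,\Gamma) & -e \end{bmatrix} z
   = z^T G z, \\
\|K(B,\Gamma)u - e\gamma\|^2
  &= z^T H z .
\end{aligned}
```

Both G and H are therefore symmetric and positive semidefinite by construction, which is what allows the problem to be treated as a symmetric generalized eigenvalue problem.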
Equations (11) and (12) are the Rayleigh quotients of the generalized eigenvalue problem $Gz = \lambda Hz$ and of its reciprocal. The eigenvectors associated with the minimum eigenvalues, obtained as solutions to (8)-(9), give the proximity planes $P_i$, $i = 1, 2$. A given point x will thus be classified according to the following formula:

\[ \mathrm{class}(x) = \arg\min_{i=1,2} \, \mathrm{dist}(x, P_i), \tag{13} \]

using the distance

\[ \mathrm{dist}(x, P_i) = \frac{|K(x, \Gamma)u - \gamma|}{\|u\|}. \tag{14} \]

The GEPSVM algorithm has several advantages with respect to SVM. First of all, in its linear formulation, it can be used to classify problems that are not linearly separable. Furthermore, its computational complexity is dominated by the number of training samples. Finally, its implementation reduces to the computation of eigenpairs, which can be expressed in a single line of code in many problem solving environments such as R and Matlab.
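The classification rule (13) with the distance (14) can be illustrated with a minimal sketch. The names `plane_distance` and `classify`, the dictionary layout and the two toy planes are assumptions made for this example only; the planes are given directly rather than computed from an eigenvalue problem, and the toy data uses the linear formulation, where the point x itself takes the place of the kernel row $K(x, \Gamma)$.

```python
import math

def plane_distance(k_row, u, gamma):
    """Distance (14): |K(x, Gamma) u - gamma| / ||u||, with k_row = K(x, Gamma)."""
    numerator = abs(sum(k * ui for k, ui in zip(k_row, u)) - gamma)
    return numerator / math.sqrt(sum(ui * ui for ui in u))

def classify(k_row, planes):
    """Rule (13): return the label of the closest proximity plane.

    planes maps a class label to its (u, gamma) pair.
    """
    return min(planes, key=lambda label: plane_distance(k_row, *planes[label]))

# Hypothetical planes for a two-dimensional linear example.
planes = {
    +1: ([1.0, 0.0], 1.0),   # plane x1 = 1
    -1: ([1.0, 0.0], -1.0),  # plane x1 = -1
}
label_a = classify([0.8, 3.0], planes)
label_b = classify([-2.0, 0.5], planes)
```

In this toy setting the first point is closer to the plane of class +1 and the second to the plane of class −1, so the rule assigns them to different classes.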
It is possible to add nonlinear prior knowledge by formulating the model in terms of a constrained generalized eigenvalue problem. The latter has been extensively studied and a procedure for its solution has been proposed by Golub.12 If G and H, as defined in (10), are symmetric matrices of order n, constraints can be expressed by the equation:

\[ C^T z = 0, \tag{15} \]

where C is an n × p matrix of rank r, with r < p < n. The constrained eigenvalue problem for the classification surface for points in class +1 is:

\[
\begin{aligned}
\min_{z \in \mathbb{R}^m} \quad & \frac{z^T G z}{z^T H z} \\
\text{s.t.} \quad & C^T z = 0.
\end{aligned}
\tag{16}
\]

Let $\Delta$ be the set of class +1 points describing nonlinear knowledge. The constraint matrix C must represent the knowledge imposed on class +1 points, hence it will be:

\[ C = [K(\Delta, \Gamma) \;\; -e]^T. \tag{17} \]

Matrix C needs to be rank deficient in order to have a non-trivial solution. The set of constraints (15) requires all points in $\Delta$ to have null distance from the hyperplane, and thus to belong to class +1. The QR decomposition of C gives two matrices Q and R such that C = QR. Q is an orthonormal matrix with $Q^T Q = I$. R is an upper triangular matrix of order r. Reordering the rows of C, we can write:

\[ Q^T C = \begin{bmatrix} R & S \\ 0 & 0 \end{bmatrix}, \]

where S is an r × (p − r) matrix. Let

\[ z = Qw = Q \begin{bmatrix} y \\ v \end{bmatrix}, \]

where y is the vector of the first r components of w and v that of the last (n − r) components of w, thus having a representation of z in the space generated by Q. We have:

\[ C^T z = \begin{bmatrix} R^T & 0 \\ S^T & 0 \end{bmatrix} \begin{bmatrix} y \\ v \end{bmatrix} = 0 \]
and hence y = 0. Defining z = Qw, it is possible to reformulate the equation (11) as:

\[ \min_{w \ne 0} \frac{w^T Q^T G Q w}{w^T Q^T H Q w}. \]

To simplify, we let $L = Q^T G Q$ and $M = Q^T H Q$, with

\[ L = \begin{bmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{bmatrix}, \qquad M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}, \]

where $L_{11}$, $M_{11}$ are r × r matrices and $L_{22}$, $M_{22}$ are (n − r) × (n − r) matrices. Both L and M are symmetric matrices. Moreover, since M is positive definite, we have

\[ 0 < \lambda_{\min}(M) \le \lambda_{\min}(M_{22}) \le \lambda_{\max}(M_{22}) \le \lambda_{\max}(M), \tag{18} \]

where $\lambda_{\min}$ and $\lambda_{\max}$ represent minimum and maximum eigenvalues. Going back to the minimization problem, we have to find w such that:

\[ \min_{w \ne 0} \frac{w^T L w}{w^T M w}. \tag{19} \]

Minimization problem (19) contains the positive nonlinear knowledge represented by C. This expression is the Rayleigh quotient of the generalized eigenvalue problem $Lw = \lambda Mw$. Its stationary points are those and only those corresponding to the eigenvectors of (19). Moreover, since M is positive definite, the Rayleigh quotient is bounded and varies in the interval determined by the minimum and maximum eigenvalues.19 Since y = 0, and considering (18), we just need to search for the stationary values of the equation:

\[ L_{22} v = \lambda M_{22} v. \tag{20} \]

Since L and M are symmetric and M is positive definite, $L_{22}$ and $M_{22}$ will be symmetric, and $M_{22}$ positive definite. Having found the n − r eigenvalues and eigenvectors of $L_{22} v_i = \lambda_i M_{22} v_i$, $i = 1, \ldots, n-r$, we calculate the components of the vector z, the original solution of the problem (16):

\[ z_i = Q \begin{bmatrix} 0 \\ I_{n-r} \end{bmatrix} v_i. \tag{21} \]

The constrained method just introduced has a lower complexity, compared to the original method, in the solution of the eigenvalue problem (20), which involves matrices of order (n − r), although an initial QR factorization is needed for the matrix C.
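The reduction to the smaller problem (20) can be followed by hand in the smallest possible case, n = 2 with a single constraint (r = 1), where (20) becomes a scalar equation. The sketch below is only an illustration under these assumptions: the function names, the matrices G and H and the constraint vector c are hypothetical toy values, and Q is built directly from c instead of calling a QR routine.

```python
import math

def quad(M, x, y):
    """Return x^T M y for a 2 x 2 matrix M and 2-vectors x, y."""
    return sum(x[i] * M[i][j] * y[j] for i in range(2) for j in range(2))

def constrained_stationary_value(G, H, c):
    """Solve min z^T G z / z^T H z subject to c^T z = 0 for n = 2, r = 1.

    Q = [q1 q2] with q1 = c / ||c|| and q2 orthogonal to q1, so the reduced
    problem (20) is the scalar equation L22 v = lambda M22 v.
    """
    norm_c = math.hypot(c[0], c[1])
    q2 = (-c[1] / norm_c, c[0] / norm_c)   # second column of Q
    L22 = quad(G, q2, q2)                  # (Q^T G Q)_{22}
    M22 = quad(H, q2, q2)                  # (Q^T H Q)_{22}
    return L22 / M22, q2                   # eigenvalue and z = Q [0, 1]^T

# Toy symmetric matrices, H positive definite (hypothetical values).
G = [[2.0, 0.0], [0.0, 1.0]]
H = [[1.0, 0.0], [0.0, 4.0]]
lam, z = constrained_stationary_value(G, H, c=(1.0, 0.0))
```

The returned vector z satisfies the constraint by construction, because it is a combination of the columns of Q that are orthogonal to c; in higher dimensions the scalar division is replaced by an eigenvalue computation of order n − r.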
4. Additional Knowledge in the form of Logic Formulas

The additional knowledge for the ReGEC classifier is extracted from training data with a logic mining technique capable of dealing efficiently with large data sets. This choice is motivated by two main considerations: first, the nature of the method is intrinsically different from the SVM adopted as primary classifier; second, the logic formulas are, semantically, the form of "knowledge" closest to human reasoning and therefore best resemble contextual information. The logic mining system consists of two main components, each characterized by the use of integer programming models and state-of-the-art solution techniques. Below, a brief description of the system is given, pointing, when needed, to extended descriptions available in the related literature.

Discretization. When the data is in numeric form (as in the case of the experiments described later in this paper), the system adopts a discretization method that builds, from each original variable, one or more logic variables based on some thresholds on the range of variation of the variable itself. This is accomplished by an iterative procedure that first splits the variation interval of the variable into a large number of intervals, and then joins these intervals based on class entropy. A detailed description of the method can be found in Ref. 2.

Feature Selection. The logic variables obtained in the discretization step are selected with a feature selection method inspired by the one often referred to as Combinatorial Feature Selection or Minimal Test Collection (see Ref. 11). We proposed a modification of this method based on the infimum norm that amounts to the following integer linear program:

\[
\begin{aligned}
\max \quad & \alpha \\
\text{s.t.} \quad & \sum_{h=1}^{m} x_h \le k, \\
& \sum_{h=1}^{m} a_{ij}^{h} x_h \ge \alpha, \quad i = 1 \ldots n, \; j = 1 \ldots n, \; c(i) \ne c(j), \\
& x_h \in \{0, 1\}, \quad h = 1 \ldots m,
\end{aligned}
\tag{22}
\]

where the binary variables $x_h \in \{0, 1\}$ are associated with each feature (h = 1, ..., m) and have value 1 only if the feature is chosen; the coefficients $a_{ij}^{h}$ are equal to 1 when individuals i and j differ on feature h, and 0 otherwise; c(i) indicates the class of individual i; and the first constraint allows at most k features to be chosen. For details, refer to Ref. 26.
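To make model (22) concrete, the sketch below solves it by exhaustive enumeration on a tiny binary data set. This is only an illustration and not the solution technique used by the authors: the function name `select_features` and the toy samples are hypothetical, and enumeration is feasible only for very small numbers of features.

```python
from itertools import combinations

def select_features(samples, labels, k):
    """Exhaustively solve the max-min model (22) on a tiny binary data set.

    Returns (alpha, chosen), where chosen is a tuple of at most k feature
    indices and alpha is the smallest number of chosen features on which
    any two samples of different classes differ.
    """
    m = len(samples[0])
    pairs = [(i, j)
             for i in range(len(samples))
             for j in range(i + 1, len(samples))
             if labels[i] != labels[j]]
    best_alpha, best_subset = -1, ()
    for size in range(1, k + 1):
        for subset in combinations(range(m), size):
            alpha = min(sum(samples[i][h] != samples[j][h] for h in subset)
                        for i, j in pairs)
            if alpha > best_alpha:
                best_alpha, best_subset = alpha, subset
    return best_alpha, best_subset

# Hypothetical logic variables for four individuals in two classes.
samples = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
]
labels = [1, 1, -1, -1]
alpha, chosen = select_features(samples, labels, k=2)
```

Maximizing the minimum number of differing features, rather than only requiring one difference per pair, favours feature sets that separate the classes with some redundancy.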
This combinatorial problem is hard in nature and, for interesting cases, of large dimension. For this reason we adopt a randomized heuristic solution technique of the GRASP1,9 family (Greedy Randomized Adaptive Search Procedures), which has proven to reach solutions of very good quality in limited time.

Logic Separation. The extraction of logic formulas that hold true for individuals in one class and false for those in the other one is accomplished by the logic miner Lsquare. Lsquare is a learning method that operates on data represented by logic variables and extracts separating logic formulas in Disjunctive Normal Form (DNF). The classification rules are determined using a particular problem formulation that amounts to a well-known and hard combinatorial optimization problem, the minimum cost satisfiability problem, or MINSAT, which is solved using a sophisticated solver based on decomposition and learning techniques.26 The DNF formulas are formed by a few clauses with large coverage (the interpretation of the trends present in the data) and, if needed, additional clauses with smaller coverage (the interpretation of the outliers in the training set). The system and its additional components have been presented and described in related papers,7,8 and their detailed description is beyond the scope of this paper.

5. Experimental results

LF-ReGEC has been implemented with Matlab 7.3.0. The computational kernels of the logic formulas are implemented in C. Experiments were run on an Intel Xeon CPU 3.20GHz with 6GB RAM running Red Hat Enterprise Linux WS release 3. The Matlab function eig for the solution of the generalized eigenvalue problem was used as the computational kernel of ReGEC. Tests have been performed for the ReGEC, LF and LF-ReGEC algorithms. Accuracy results for SVM and TSP are taken from the literature.25

5.1. A case study

We have chosen the acute leukemia microarray dataset as a test case. The aim is to classify acute leukemias into those arising from lymphoid precursors (acute lymphoblastic leukemia, ALL) or from myeloid precursors (acute myeloid leukemia, AML). Distinguishing ALL from AML is critical for successful treatment. Indeed, chemotherapy regimens for ALL are generally different from those for AML and, although remissions can be achieved using ALL therapy for AML (and vice versa), cure rates are markedly diminished, and toxicities are encountered. The dataset is obtained from Golub
et al., 1999.13 It consists of 25 AML and 47 ALL samples. The gene expression data for 7129 probes has been acquired with an Affymetrix microarray. The dataset has been divided into 10 folds, each containing approximately 10% of the complete dataset. Each fold in turn has been extracted from the original data and used for testing, while the remaining 90% has been used for training. Firstly, the dataset has been discretized and the logic formulas have been evaluated. Those formulas are in the form:

IF p(4196) > 3.435 AND p(6041) > 3.004 THEN class 1,
IF p(6573) < 2.059 AND p(6685) > 2.794 THEN class 1,
IF p(1144) > 2.385 AND p(4373) < 3.190 THEN class -1,
IF p(4847) < 3.006 AND p(6376) < 2.492 THEN class -1,

where p(i) represents the i-th probe. Each of the previous formulas is true for some samples of one class and false for all samples of the other class. We chose only those formulas that are satisfied by the maximum number of points. The knowledge region for each class is given by the intersection of all chosen formulas.

Table 1. Accuracy results of ten fold (1) and leave one out (2) cross validation

Dataset    ReGEC (1)   LF (1)    LF-ReGEC (1)   SVM (2)   TSP (2)
Leukemia   98.33%      86.36%    100%           98.61%    93.80%
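The rule format used in this case study is easy to evaluate mechanically. The sketch below encodes the four example formulas listed in Section 5.1 as conjunctive clauses and reports which classes fire for a sample. The data layout, the function names and, in particular, the probe values of the sample are hypothetical and chosen only for illustration; they are not taken from the leukemia dataset.

```python
# Each rule is (list of conditions, class label); a condition is
# (probe index, comparison, threshold), copied from the four example formulas.
RULES = [
    ([(4196, ">", 3.435), (6041, ">", 3.004)], 1),
    ([(6573, "<", 2.059), (6685, ">", 2.794)], 1),
    ([(1144, ">", 2.385), (4373, "<", 3.190)], -1),
    ([(4847, "<", 3.006), (6376, "<", 2.492)], -1),
]

def holds(conditions, p):
    """True when every condition of a conjunctive clause is satisfied by sample p."""
    return all(p[i] > t if op == ">" else p[i] < t for i, op, t in conditions)

def fired_classes(p, rules=RULES):
    """Return the set of class labels having at least one true clause for p."""
    return {label for conditions, label in rules if holds(conditions, p)}

# Hypothetical sample: only the probes used by the rules are filled in.
sample = {4196: 3.9, 6041: 3.2, 6573: 2.5, 6685: 2.0,
          1144: 2.0, 4373: 3.5, 4847: 3.4, 6376: 2.9}
result = fired_classes(sample)
```

For this made-up sample only the first clause is true, so the sample falls in the knowledge region of class 1; an empty result would mean that no clause covers the sample.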
Table 1 reports ten fold cross validation accuracy for ReGEC, LF and LF-ReGEC, and leave one out cross validation accuracy for SVM and TSP. The LF-ReGEC method is fully accurate on the dataset.

5.2. Numerical experiments

LF-ReGEC was tested on publicly available benchmark data sets. The data has been obtained from the k-TSP Program Download Page.25 Dataset characteristics and references are reported in Table 2. Results regarding its performance in terms of classification accuracy are also presented. In Table 3, accuracy results are reported for the datasets of Table 2 for various methods. Null classification results have been computed on the complete datasets, supposing that all samples are assigned to the class containing the larger number of samples. We note that the LF method is more accurate than TSP in three cases out of five. In all cases, the use of LF in conjunction with ReGEC produces accuracy equal to or higher than that of ReGEC alone. We
note that LOO cross validation results are usually more accurate, because training is done on the whole data set except one sample. Nevertheless, LF-ReGEC compares well with SVM.

Table 2. Datasets characteristics

Dataset     Platform   genes (P)   samples (N)            Reference
Leukemia    Affy       7129        25 (AML)    47 (ALL)   (Golub et al.13)
Prostate1   Affy       12 600      52 (T)      50 (N)     (Singh et al.23)
Prostate2   Affy       12 625      38 (T)      50 (N)     (Stuart et al.24)
CNS         Affy       7129        25 (C)      9 (D)      (Pomeroy et al.20)
GCM         Affy       16 063      190 (C)     90 (N)     (Ramaswamy et al.21)

Table 3. Ten fold (1) and leave one out (2) cross validation accuracy results

Dataset     NULL     ReGEC (1)   LF (1)    LF-ReGEC (1)   SVM (2)   TSP (2)
Leukemia    65.27%   98.33%      86.36%    100%           98.61%    93.80%
Prostate1   50.98%   84.62%      77.80%    84.62%         91.18%    95.10%
Prostate2   56.81%   65.78%      73.50%    75.25%         76.14%    67.60%
CNS         73.52%   65.78%      79.20%    82.58%         82.35%    77.90%
GCM         67.85%   70.45%      79.60%    71.43%         93.21%    75.40%
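The NULL column of Table 3 can be reproduced from the class sizes of Table 2 with a few lines of code. The sketch below is a simple check under the definition given in the text (all samples assigned to the larger class); the function and variable names are illustrative.

```python
def null_accuracy(n_first, n_second):
    """Accuracy of always predicting the larger of two classes."""
    return max(n_first, n_second) / (n_first + n_second)

# Class sizes as listed in Table 2.
class_sizes = {
    "Leukemia": (25, 47),
    "Prostate1": (52, 50),
    "Prostate2": (38, 50),
    "CNS": (25, 9),
    "GCM": (190, 90),
}
null_scores = {name: 100.0 * null_accuracy(a, b)
               for name, (a, b) in class_sizes.items()}
```

The computed percentages agree with the NULL column up to the last printed digit, which appears to be truncated rather than rounded in the table.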
6. Conclusions and future work

In the present work, we have proposed a new method to incorporate nonlinear knowledge provided by Logic Formulas in ReGEC, in a fashion similar to what has been proposed in Mangasarian and Wild, 2006.17 Results show that the accuracy of the new algorithm compares well with that of the individual algorithms. In the future, we will test and compare the LF-ReGEC method against other standard datasets. Finally, we believe further investigation needs to be devoted to the identification of knowledge regions, using only a subset of the training set with an incremental technique.4

References

1. P. Bertolazzi and G. Felici. Learning to classify species with barcodes. Technical report, Institute for System Analysis and Computer Science, Tech. Rep. 665, 2007.
2. P. Bertolazzi, G. Felici, P. Festa, and G. Lancia. Logic classification and feature selection for biomedical data. Computer and Mathematics with Applications, 55(5):889-899, 2008.
3. C. M. Bishop. Neural networks for pattern recognition. Oxford Press, 1995.
4. C. Cifarelli, M. R. Guarracino, O. Seref, S. Cuciniello, and P. M. Pardalos. Incremental classification with generalized eigenvalues. Journal of Classification, 22(1):73-81, 2007.
5. C. Cortes and V. Vapnik. Support vector machines. Machine Learning, 20:273-279, 1995.
6. G. Felici, V. de Angelis, and G. Mancinelli. Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, chapter Feature Selection for Data Mining, pages 227-252. Felici G. and Triantaphyllou E. eds., Springer, 2006.
7. G. Felici and K. Truemper. A minsat approach for learning in logic domains. INFORMS Journal on Computing, 13(3):1-17, 2001.
8. G. Felici and K. Truemper. Encyclopedia of Data Warehousing and Mining, volume 2, chapter The Lsquare System for Mining Logic Data, pages 693-697. J. Wang ed., Idea Group Inc., 2006.
9. P. Festa and M.G.C. Resende. Essays and Surveys on Metaheuristics, chapter Grasp: An annotated bibliography, pages 325-367. Ribeiro C.C. and Hansen P. eds., Kluwer Academic Publishers, 2002.
10. G. Fung and O. L. Mangasarian. Proximal support vector machine classifiers. In Knowledge Discovery and Data Mining, pages 77-86, 2001.
11. M.R. Garey and D.S. Johnson. Computers and Intractability: a guide to the theory of NP-completeness. Freeman, San Francisco, 1979.
12. G. H. Golub and R. Underwood. Stationary values of the ratio of quadratic forms subject to linear constraints. Zeitschrift für Angewandte Mathematik und Physik (ZAMP), 21(3):318-326, 1970.
13. T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531-537, 1999.
14. M. R. Guarracino, C. Cifarelli, O. Seref, and P. M. Pardalos. A classification algorithm based on generalized eigenvalue problems. Optimization Methods and Software, 22(1):73-81, 2007.
15. M.R. Guarracino, D. Abbate, and R. Prevete. Nonlinear knowledge in learning models. In Workshop on Prior Conceptual Knowledge in Machine Learning and Knowledge Discovery, European Conference on Machine Learning, pages 29-40, 2007.
16. Y. Lee and O. L. Mangasarian. Ssvm: A smooth support vector machine for classification, 1999.
17. O. L. Mangasarian and E. W. Wild. Nonlinear knowledge-based classification. Technical report, Data Mining Institute Technical Report 06-04, Computer Science Department, University of Wisconsin, Madison, Wisconsin, November 2006.
18. T.M. Mitchell. Machine Learning. McGraw Hill, 1997.
19. B. N. Parlett. The Symmetric Eigenvalue Problem. SIAM, Philadelphia, PA, 1998.
20. S.L. Pomeroy, P. Tamayo, M. Gaasenbeek, L.M. Sturla, M. Angelo, M.E. McLaughlin, J.Y. Kim, L.C. Goumnerova, P.M. Black, C. Lau, J.C. Allen,
April 24, 2009
16:13
WSPC - Proceedings Trim Size: 9in x 6in
Mario.Guarracino.II.novo2
279
21.
22.
23.
24.
25.
26. 27.
D. Zagzag, J.M. Olson, T. Curran, C. Wetmore, J.A. Biegel, T. Poggio, S. Mukherjee, R. Rifkin, A. Califano, G. Stolovitzky, D.N. Louis, J.P. Mesirov, E.S. Lander, and T.R. Golub. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415:436442, 2002. S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C.H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J.P. Mesirov, T. Poggio, W. Gerald, M. Loda, E.S. Lander, and T.R. Golub. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences of the United States of America, 98:1514915154, 2001. B. Scholkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2001. D. Singh, P. G. Febbo, K. Ross, D. G. Jackson, J. Manola, C. Ladd, P. Tamayo, A. A. Renshaw, A. V. DAmico, J. P. Richie, E. S. Lander, M. Loda, P. W. Kantoff, T. R. Golub, and W. R. Sellers. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2):203209, 2002. R.O. Stuart, W. Wachsman, C.C. Berry, J. Wang-Rodriguez, L. Wasserman, I. Klacansky, D. Masys, K. Arden, S. Goodison, M. McClelland, Y. Wang, A. Sawyers, I. Kalcheva, D. Tarin, and D. Mercola. In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proceedings of the National Academy of Sciences of the United States of America, 101:615620, 2004. A.C. Tan, D.Q. Naiman, L. Xu, R.L. Winslow, and D. Geman. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics, 21:38963904, 2005. K. Truemper. Design of Logic-Based Intelligent Systems. Wiley-Interscience, New York, 2004. V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
UNSUPERVISED CLASSIFICATION OF TREE STRUCTURED OBJECTS

A. G. FLESIA
CIEM-Conicet and FaMAF-UNC
Ing. Medina Allende s/n, Ciudad Universitaria, CP 5000, Córdoba, Argentina.
E-mail: [email protected]

Recent developments in medical image analysis, phylogenetics and proteomics motivate the statistical analysis of populations of tree-structured data objects. In this context, unsupervised classification of trees arises as a challenging new area that depends on the careful development of novel mathematical frameworks. The discussion centers on statistical aspects of clustering in a framework where the tree data to be clustered has been sampled from some unknown probability distribution. Following Ref. 12, we try to verify two conditions: appropriateness, meaning that the clustering of the data set should reveal some structure of the underlying data rather than model artifacts due to the random sampling process; and steadiness, meaning that the more sample points we have, the more reliable the clustering should be. We address steadiness and reliability by extending the convergence properties of a non-parametric clustering algorithm, k-means, to different metric spaces of trees. We explore the appropriateness of the clustering outputs of k-means on a real data set from proteomics, and we comment on the results of Ref. 1 on three real data sets of phylogenetic trees.
1. Introduction

The set of trees is a type of data space that is far from Euclidean in nature. Tree-structured data objects are usually represented mathematically as simple graphs (a collection of nodes and edges, each of which connects some pair of nodes). A rooted tree is a simple graph in which one node is designated as the root node, and every other node is the child of a parent node that is closer to the root, with parents and children connected by edges. In many applications, ranging from medical image analysis to phylogenetic analysis, a tree-structured representation of each data object is very natural (Refs. 2, 3). Also, metric spaces of random graphs are the natural mathematical framework for the statistical analysis of samples of such
tree-shaped objects (see Refs. 4 and 6, among others). The following examples offer a partial indication of the different types of tree structures that may be encountered.

(1) Binary trees with unlabeled interior nodes. Biologists are used to taxonomic hierarchies: species are grouped into genera, which are grouped into families, and so on. The explosion of data coming from molecular biology introduced a new twist on this problem. Computational biologists can construct a phylogenetic tree from amino acid discrepancies in a specific protein sampled from many species. When many related proteins are used, a random sample of descendent trees (cladograms) is obtained, each with labeled terminal nodes (the species) and unlabeled internal nodes. The central belief that there is a unique phylogenetic tree that partitions a set of taxa still holds, so many strategies for computing a central tree, or consensus tree, from the sample have been devised; see Ref. 5 and references therein. However, sample sizes are becoming increasingly large, especially when databases are combined, so a single consensus tree may prove unsatisfactory: information about the whole set of candidate trees is lost, including how the trees are distributed over the set of all binary trees and how similar they are to each other. Ref. 1 introduced a postprocessing approach to phylogenetic analysis based on clustering the tree data. Using data mining techniques, a reduced set of characteristic trees was proposed instead of a single consensus tree, each better resolved over its own cluster.

(2) Trees with a bounded number of children but unlabeled nodes. Another important problem in computational molecular biology is the determination of protein functionality using only information about the primary sequence. Large databases of protein sequences have been constructed with semi-supervised methods based on sequence alignment. To construct the Pfam database (Ref. 7), a small core of curated (well-known) proteins was selected to build one HMM for each family, and the rest of the proteins were assigned to the family of the model with the highest score. Recently, Ref. 3 mapped each protein sequence to a tree with up to 20 children per node (the amino acid codes), fixed depth D, and unlabeled terminal nodes. The topology of each tree induces a partition of the set of all possible amino acid sequences of fixed length D, so they argue that discrimination of protein families is related to partition discrimination. Fully automatic unsupervised classification of protein sequences into functionality families is a harder problem, given the difficulty introduced by multi-domain proteins (Ref. 8). Linkage algorithms, such as average linkage and single linkage, are the implementations with the highest degree of success when applied to known families.

(3) Trees with attributes on each node. Medical image analysis is motivating some interesting new developments in statistical analysis. These developments are not in traditional imaging areas, such as the denoising, segmentation and enhancement of a single image, but instead concern the analysis of populations of images. Again, common goals include finding center points and variation about the center, but supervised and unsupervised classification are also important. One special example, introduced in Ref. 2, is the analysis of tree-shaped objects extracted from 3D images of the brain. These tree-shaped objects are segmentations of images of the blood vessels of the brain from several patients, and the features collected from them are the tree topology of the blood vessels (the connections between arteries) and attributes of the ongoing edges, such as length and 3D orientation, among others.

These examples are indicative but not exhaustive. Protein modeling using topological context trees (Ref. 3) and blood vessel modeling using m-trees with attributes (Ref. 2) are recent modeling proposals concerning trees. Ref. 3 analyzed nonparametric hypothesis testing procedures for distinguishing between populations of m-trees; Ref. 2 introduced the concept of a principal line to decompose samples of trees into an analog of principal components. In both cases a Hamming type of metric was used to build the required mathematical framework. Metric spaces of binary trees are the most studied spaces of trees, given their relationship with phylogenetic analysis. Clustering in the set of binary trees is a recent problem (Ref. 1), motivated by the enormous amount of data that DNA sequencing and protein sequencing (among others) are providing.
The appropriateness of combining data sets is also under discussion (see Page, 1996), so clustering trees may help to see the problem from another perspective.

In this paper, we address the problem of clustering a data set of trees from a statistical point of view. We suppose that the tree data to be clustered has been sampled from some unknown probability distribution. Following Ref. 12, we try to verify two conditions: appropriateness, meaning that the clustering of the data set should reveal some structure of the underlying data rather than model artifacts due to the random sampling process; and steadiness, meaning that the more sample points we acquire, the more reliable the clustering should be. The second condition is close to the notion of consistency, or
convergence of some type. An algorithm that does not converge produces rather unpredictable results on any given sample and is thus completely unreliable. On Euclidean data, few classes of clustering techniques have known convergence properties. Among them, k-means and single linkage algorithms are two classes that are appropriate for modeling the examples described earlier, so it is important to study whether their convergence properties still hold on non-vectorial spaces.

In Section 2, we give details of the three classes of metric spaces that have been used to model the examples detailed in this introduction. In Section 3 we present an extension of the k-means procedure that holds for all of the spaces introduced previously, and state its convergence properties. In Section 4 we show a real data example: we applied k-means to cluster protein sequences through their characterization as topological context trees. In Section 6 we will comment on the work of Ref. 1, who applied k-medoids, single linkage and other data mining algorithms to three different data sets of unrooted phylogenetic trees.

2. Unsupervised Learning in the space of trees

To cast clustering as a statistical problem we start by regarding the data $t_1, \dots, t_n$ as an i.i.d. sample from some unknown probability distribution $\nu(t)$. There are several statistical approaches to clustering. The parametric approach (Ref. 13) on Euclidean spaces is based on the assumption that each group is represented by a distribution that is a member of some parametric family, such as the multivariate Gaussian distributions. The joint distribution is a mixture of the group distributions, and the number of mixture components and their parameters are estimated from the data. In the case of labeled binary trees, Ref. 4 proposed a model in which a discrete density $\nu_k(t) = c(\tau_k, t_k^*) \exp(-\tau_k\, d(t, t_k^*))$ holds for each cluster, with $\tau_k$ and $t_k^*$ a dispersion parameter and a central parameter, respectively.

A direct extension of the mixture model is quite difficult, since parameter estimation would involve a search over the entire space, but an appropriate non-parametric approach for this setting is a compact partition method like k-means. Partitioning methods divide the examples into a pre-assigned number of groups. A cluster center $t_i^*$ is assigned to each group, and then the cluster centers and the groups are updated so as to minimize the sum of distances from each example to its own cluster center.
A major drawback of compact partitioning methods is their inability to cope with nested clusters. In Euclidean spaces, this problem is overcome by changing the hypothesis to one where groups correspond to a partition of the support of the distribution into regions of influence of the modes of its density. This hypothesis allows the introduction of linkage algorithms for clustering that are closely related to density estimation, like single linkage. But extending this approach to a general metric space presupposes the existence of a density for $\nu$ with respect to another convenient measure $\mu$, analogous to the Lebesgue measure in Euclidean space, and no such measure is universally acknowledged.

To assess the quality of the clustering output we may consider investigating its consistency. The convergence of a clustering algorithm provides evidence for the intuition that the more data points we get, the more reliable the result of the clustering algorithm should be. An algorithm that does not converge produces rather unpredictable results on any given sample and is thus completely unreliable. Conversely, if an algorithm does converge, one can investigate whether, at least for some prototypical examples, the limit clustering is a useful clustering of the data space or not. Certain convergence properties are known mainly for three classes of non-parametric clustering algorithms on Euclidean spaces: k-means (Ref. 14, see Ref. 15 for a recent overview), linkage algorithms (Ref. 16, Ref. 17), and spectral clustering (Ref. 12). In the following section we explore the consistency of k-means on general metric spaces of trees. We begin our study of the definition and properties of the partitioning method called k-means by defining some elements on the space of trees.

2.1. General specifications on metric spaces of trees

A metric space of trees consists of a set of tree-shaped objects along with a metric.
There are two broad strategies for defining a suitable metric: one strategy counts and weights the number of discrepancies between two trees, whereas the other maps the trees into alternative mathematical structures for which natural metrics already exist. In this paper we follow the first approach.

2.1.1. Example 1: m-trees

An m-tree is a rooted tree such that every node has at most m children (enumerated from left to right). If a node has only one child, that child must be designated as one of the m possible children. The set of all m-trees, the m-tree space, is denoted by $T_m$. Ref. 18 introduced a metric that counts the number of edge discrepancies between two graphs. In Ref. 6, a refinement of this metric was introduced, turning the set of all (possibly unbounded) m-trees into an infinite compact metric space.
[Figure: tree diagram; surviving node labels: λ, 1, 2, 11, 12, 21, 22, 111, 122, 211, 222.]
Fig. 1. 2-tree of depth two
Definition 2.1. (Hamming type distance) Let $V$ be the set of all possible nodes of the (possibly unbounded) tree, and let $\phi : V \to \mathbb{R}^+$ be a strictly positive function such that $\sum_{v \in V} \phi(v) < \infty$. Let $I_{\{t,y\}}(v)$ be an indicator function taking the value 1 when the node $v$ is present in exactly one of the trees $t$ and $y$, and 0 otherwise. Thus the indicator function records discrepancies between the nodes of the two trees. We define the distance $d_h$ on $T_m$ by
\[ d_h(t, y) = \sum_{v \in V} \phi(v)\, I_{\{t,y\}}(v) \]
for all $t, y \in T_m$.

The distance $d_h$ measures the discrepancy between the topologies of the trees node by node; see Balding et al. (2007) for details. An example of such a space, the metric space of trees with up to 20 children per node, was considered in Ref. 3 for modeling protein functionality.
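The Hamming type distance of Definition 2.1 can be sketched in a few lines of Python. This is an illustrative sketch only, under assumptions that are not in the paper: a tree is encoded as a frozenset of nodes, each node being the tuple of child indices on the path from the root (the root is the empty tuple), and the weight is the hypothetical choice $\phi(v) = (1/(2m))^{\mathrm{depth}(v)}$, which is summable because there are at most $m^\ell$ nodes at depth $\ell$. The names `phi`, `is_tree` and `hamming_distance` are made up for this example.

```python
from typing import FrozenSet, Tuple

Node = Tuple[int, ...]   # path of child indices from the root; () is the root
Tree = FrozenSet[Node]   # a tree is the set of nodes it contains


def phi(v: Node, m: int = 2) -> float:
    """Hypothetical weight (1 / (2m)) ** depth; summable over all possible nodes."""
    return (1.0 / (2 * m)) ** len(v)


def is_tree(t: Tree) -> bool:
    """A node set is a rooted tree if it holds the root and every node's parent."""
    return () in t and all(v[:-1] in t for v in t if v)


def hamming_distance(t: Tree, y: Tree, m: int = 2) -> float:
    """d_h(t, y): sum of phi(v) over the nodes present in exactly one tree."""
    return sum(phi(v, m) for v in t ^ y)   # ^ is the symmetric difference


# Two 2-trees that share the root and both children, but differ at depth two.
t = frozenset({(), (1,), (2,), (1, 1)})
y = frozenset({(), (1,), (2,), (2, 2)})
print(hamming_distance(t, y))   # two depth-2 nodes differ: 2 * (1/4)**2 = 0.125
```

With this encoding the indicator $I_{\{t,y\}}(v)$ is simply membership of $v$ in the symmetric difference of the two node sets, so no traversal of the trees is needed.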
2.1.2. Example 2: Binary trees

A (labeled) binary tree is a graph with a node called the root, all internal nodes of degree 3, and $s$ labeled terminal nodes, with labels in $A$. The set of all labeled binary trees with $s$ distinct terminal nodes is denoted by $G_s$. When there is a label-preserving isomorphism between two trees they are considered identical; thus, if two nodes have the same parent, it does not matter which is on the left and which is on the right. Note that a set of different labeled trees may have the same topology, so $G_s$ is not a subset of $T_2$. This is the space defined in Ref. 4 in order to compare different hierarchical clustering algorithms. It is also the typical representation of cladograms or evolutionary trees (Ref. 5). In Fig. 2 we observe two evolutionary trees over the same taxa, obtained by the Neighbor Joining (single linkage) and UPGMA (average linkage) methods.
Fig. 2. Left panel: Neighbor Joining distance tree of primates using the Jukes-Cantor model. Right panel: UPGMA distance tree of primates using the Jukes-Cantor model.
Defining a distance on this space is a difficult problem because there is no reasonable metric that imposes a neighborhood structure which is the same for all trees. One way to extend the Hamming metric strategy described before to a metric on binary trees is based on hypergraphs. A hypergraph generalizes traditional graphs by admitting edges that link more than two nodes (Ref. 19). Each edge in the hypergraph is a cluster of cardinality greater than one in the hierarchy associated with a binary tree. In this framework, the extension of the Hamming strategy is to count and weight the number of discrepancies in each type of edge.

Definition 2.2. Let $W$ be the set of all possible $r$-edges of the tree, and let $\phi : W \to \mathbb{R}^+$ be a strictly positive function such that $\sum_{r=2}^{m} \phi(r) < \infty$. We define the distance $d_h$ on $G_m$ by
\[ d_h(t, y) = \frac{1}{2} \sum_{r=2}^{m} \phi(r)\, |\text{discrepant } r\text{-edges between } t \text{ and } y|, \]
where $|\cdot|$ denotes the cardinality of the argument set.

Another distance popular among computational biologists is the Robinson Foulds distance (see Ref. 20). It counts the number of "crossover" operations needed to change one tree into another. It is also the number of bipartitions induced by one tree but not by the other. If we choose an internal node and cut the subtree hanging from that node out of the tree, we obtain a bipartition of the set of taxa. If we take that subtree and exchange it with another subtree, we have a crossover operation. Crossover operations are easier to determine on unrooted trees, where bipartitions are clearly defined, and bipartitions also characterize rooted trees. We denote by $B(t)$ the set of bipartitions of $A$ induced by $t$.

Definition 2.3. Given a set $A$ of taxa and two binary trees $t$ and $y$, the Robinson Foulds distance between $t$ and $y$ is
\[ d_R(t, y) = \frac{1}{2} \left[\, |B(t) \setminus B(y)| + |B(y) \setminus B(t)| \,\right], \]
where $|\cdot|$ denotes the cardinality of the argument set.

Both distances were considered by Ref. 4 to model a small set of labeled trees. Their goal was to estimate a central binary tree and place a confidence region around it. Ref. 1 studied the phylogeny of three different sets of taxa by organizing large data sets of phylogenetic trees estimated over them. They chose the Robinson Foulds distance for their analysis.

2.1.3. Example 3: m-tree with attributes

An m-tree with attributes is a rooted tree with at most m possible children per node, with nodal attributes defined at each node. This is the space considered in Ref. 2 to model the blood vessels of the brain of several people. Each node keeps information about the distance to its parent, the 3D orientation of the ongoing edge, etc.

Definition 2.4. Let $d : \mathcal{T} \times \mathcal{T} \to \mathbb{R}$ be the function defined by
\[ d((t_1, x_1), (t_2, x_2)) = \sum_{v \in V} I_{\{t_1, t_2\}}(v)\, \phi_1(v) + \lambda \sum_{v \in V} \| x_1(v) - x_2(v) \|_2\, \phi_2(v) \]
for some strictly positive functions $\phi_1 : A^* \to \mathbb{R}^+$ and $\phi_2 : A^* \to \mathbb{R}^+$ satisfying $\sum_{v \in A^*} \phi_1(v) < \infty$ and $\sum_{v \in A^*} \phi_2(v) < \infty$. If the attributes take values in a compact subset of a Euclidean space, the resulting space of trees is also a compact space.

3. k-means on tree space

We now define random trees, expectations and empirical centers on general spaces of trees. This follows the definitions given by Ref. 21 for general metric spaces and by Ref. 6 specifically for spaces of m-trees.

3.1. Random Tree, Mean tree and empirical mean tree

Let $(\mathcal{T}, d)$ be a metric space of trees, in particular any of the preceding examples. A random tree with distribution $\nu$ is a measurable function $T : \Omega \to \mathcal{T}$ such that
\[ P(T \in A) = \int_A \nu(dt) \qquad (1) \]
for any Borel set $A \in \mathcal{B}$, where $(\Omega, \mathcal{F}, P)$ is a probability space and $\nu$ is a probability on $(\mathcal{T}, \mathcal{B})$, with $\mathcal{B}$ the Borel $\sigma$-algebra in $\mathcal{T}$. The expected mean of a random tree $T$ is the set (of trees) $E_d T$ which minimizes the expected distance to $T$:
\[ E_d T := \arg\min_{t \in \mathcal{T}} \int_{\mathcal{T}} d(t, y)\, \nu(dy). \qquad (2) \]
The set $E_d T$ is not empty if $\mathcal{T}$ is compact or finite. Any element of the set $E_d T$ is also called an expected tree or center. Since $E_d T$ depends only on the distribution $\nu$ induced by $T$ on $\mathcal{T}$, it may also be denoted by $E_d(\nu)$. Let $(T_1, \dots, T_n)$ be a random sample of $T$ (independent random trees with the same law as $T$). The empiric mean tree (empiric center, sample mean) is defined as the random set of trees given by
\[ \bar{t} := \arg\min_{t \in \mathcal{T}} \frac{1}{n} \sum_{i=1}^{n} d(T_i, t). \qquad (3) \]
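On a finite tree space, the minimizations in Eqs. (2) and (3) can be carried out by exhaustive search. The following Python sketch illustrates this and shows that the minimizer is in general a set rather than a single tree. It is a sketch under stated assumptions, not the procedure used in the references: the function names, the encoding of trees as frozensets of node paths, and the unweighted distance (number of nodes present in exactly one tree) are all chosen here for illustration.

```python
import math
from collections import Counter


def centers(space, weights, dist):
    """All minimizers t in `space` of sum_y weights[y] * dist(t, y)."""
    cost = {t: sum(w * dist(t, y) for y, w in weights.items()) for t in space}
    best = min(cost.values())
    return {t for t, c in cost.items() if math.isclose(c, best)}


def expected_centers(space, nu, dist):
    """Eq. (2) on a finite space; `nu` maps each tree to its probability."""
    return centers(space, nu, dist)


def empirical_centers(space, sample, dist):
    """Eq. (3): the same search with nu replaced by the empirical measure."""
    n = len(sample)
    return centers(space, {y: c / n for y, c in Counter(sample).items()}, dist)


def dist(t, y):
    """Illustrative distance: number of nodes present in exactly one tree."""
    return len(t ^ y)


# A finite space of three nested trees (nodes are paths from the root).
a = frozenset({()})
b = frozenset({(), (1,)})
c = frozenset({(), (1,), (2,)})
space = [a, b, c]

print(empirical_centers(space, [a, b, b, c, c], dist) == {b})    # unique center
print(empirical_centers(space, [a, c], dist) == {a, b, c})       # three-way tie
print(expected_centers(space, {a: 0.2, b: 0.5, c: 0.3}, dist) == {b})
```

The second call shows why the uniqueness assumption matters for the limit results stated below: with the sample {a, c} every tree in the space attains the same average distance, so the empirical mean is the whole space.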
If $\nu$ is defined on a finite set of trees, the following law of large numbers follows immediately.

Theorem 3.1. Let $(\mathcal{T}_0, d)$ be a finite tree space with metric $d$. Let $T \in \mathcal{T}_0$ be a random tree with law $\nu$ such that $E_d T$ is unique. Let $\{T_n, n \ge 1\}$ be an i.i.d. sequence of random trees with law $\nu$. If $t_n$ is any of the empiric mean trees of $\{T_1, \dots, T_n\}$, that is, $t_n \in \bar{t}$, then
\[ \lim_{n \to \infty} d(t_n, E_d T) = 0 \quad \text{a.s.} \]
In other words, for $n$ large enough the set of empiric mean trees reduces to the single expected mean tree.

A similar result holds for compact metric spaces; see Ref. 21 for details.

Theorem 3.2. Let $\nu$ be a probability on the compact metric space $(K, d)$ such that $E_d T$ is unique. Let $\{T_n, n \ge 1\}$ be an i.i.d. sequence of random elements with distribution $\nu$. Then the empirical mean trees converge uniformly to $E_d T$ almost surely:
\[ \lim_{n \to \infty} \sup_{t \in \bar{t}} d(t, E_d T) = 0 \quad \text{a.s.} \]

These two theorems prove that the empirical mean trees are consistent estimators of the population mean tree in the cases covered by our examples, namely finite spaces and compact spaces of trees. If the distribution $\nu$ is partitioned into clusters with a center in each cluster, we would like to define empirical centers with the same consistency properties as the empirical mean.

3.2. Classical batch k-means algorithm
Let $(T_1, \dots, T_n)$ be a random sample of $T$ with distribution $\nu$. The classical batch k-means algorithm is a particular clustering technique that generates the class labels through minimization of the "within cluster" point scatter, a distance-based loss function defined by
\[ W(C) = \frac{1}{2} \sum_{n=1}^{k} \sum_{C(i)=n} \sum_{C(i')=n} d(T_i, T_{i'}). \]
This criterion characterizes the extent to which observations assigned to the same cluster tend to be close to each other. It is easy to see that the objective function $W(C)$ is minimized by an alternating optimization procedure that first computes the cluster center as a solution of the minimization problem
\[ t^* := \arg\min_{t \in \mathcal{T}} \sum_{i=1}^{n} d(T_i, t) \]
and then reassigns the observations to the closest center, until there are no further changes. Note that the cluster centers are the empirical mean trees computed over the trees of each cluster. Let us state the algorithm in a slightly different way in order to introduce the consistency theorem.

k-means clustering algorithm

(1) First choose centers $t_1^*, \dots, t_k^*$ to minimize
\[ g_n(t_1, \dots, t_k) = \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le j \le k} d(T_i, t_j). \]
(2) Given a current set of centers $t_1^*, \dots, t_k^*$, assign each observation to the closest center, that is, $C(i) = \arg\min_{1 \le n \le k} d(T_i, t_n^*)$.
(3) Steps 1 and 2 are iterated until the assignments do not change.

The mean of the points in $C(i)$ must equal $t_i^*$; otherwise $g_n$ could be decreased by first replacing $t_i^*$ by that cluster mean and then, if necessary, reassigning some of the observations to new centers. This criterion is therefore equivalent to that of minimizing the within sum of squares. Following our statistical intuition, we must now study the consistency properties of the algorithm just defined. Consistency should mean that the more data points we put in the sample, the more reliable the result of the clustering algorithm should be. In the case of k-means, the evolution of the partitions as the sample size increases is tied to the evolution of the empirical centers.

Expected centers. Let $t_1^*, \dots, t_k^*$ be the expected centers of the population clustering, that is, the trees that minimize the function $g$:
\[ \{t_1^*, \dots, t_k^*\} = \arg\min_{t_1, \dots, t_k} E\left( \min\{ d(T, t_1), \dots, d(T, t_k) \} \right) = \arg\min_{t_1, \dots, t_k} \int_{\mathcal{T}} \min\{ d(x, t_1), \dots, d(x, t_k) \}\, \nu(dx) = \arg\min_{t_1, \dots, t_k} g(t_1, \dots, t_k). \]
Since $\mathcal{T}$ is finite or a compact metric space, the expected centers exist, but they need not be unique. Now, let $\mu_n$ be the empirical measure of the sample,
\[ \mu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{T_i}, \]
where $\delta_T$ is the point mass at $T$. Then the function $g_n$ can be written as
\[ g_n(t_1, \dots, t_k) := \int_{\mathcal{T}} \min\{ d(x, t_1), \dots, d(x, t_k) \}\, \mu_n(dx) = \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le j \le k} d(T_i, t_j). \]

Empiric centers. The empiric centers are the random set
\[ T_n^* := \arg\min_{\{t_1, \dots, t_k\}} g_n(t_1, \dots, t_k). \]
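For a small finite tree space, the objective $g_n$ and the set of empiric centers can be computed by brute force over all candidate sets of $k$ centers. The Python sketch below is illustrative only and makes assumptions beyond the text: the names `g_n` and `empiric_center_sets`, the frozenset-of-paths encoding of trees, and the unweighted node-count distance are all hypothetical choices, and exhaustive search is only feasible for very small spaces.

```python
import math
from itertools import combinations


def dist(t, y):
    """Illustrative distance: number of nodes present in exactly one tree."""
    return len(t ^ y)


def g_n(centers, sample, dist):
    """Average distance from each sampled tree to its closest center."""
    return sum(min(dist(T, t) for t in centers) for T in sample) / len(sample)


def empiric_center_sets(space, sample, k, dist):
    """All k-subsets of the finite `space` that minimize g_n (exhaustive search)."""
    scored = [(g_n(cs, sample, dist), cs) for cs in combinations(space, k)]
    best = min(score for score, _ in scored)
    return [set(cs) for score, cs in scored if math.isclose(score, best)]


# Five small trees: a bare root, and chains growing to the left or to the right.
a = frozenset({()})
b = frozenset({(), (1,)})
c = frozenset({(), (1,), (1, 1)})
d = frozenset({(), (2,)})
e = frozenset({(), (2,), (2, 2)})
space = [a, b, c, d, e]
sample = [c, c, b, e, e, d]

print(empiric_center_sets(space, sample, 2, dist) == [{c, e}])   # True
print(g_n((c, e), sample, dist))                                  # 2/6
```

In this example the pair {c, e} is the unique minimizer: four of the six sampled trees coincide with a center, and the remaining two are each one node away from the nearest one.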
An argument of the strong-law-of-large-numbers type (as given in Ref. 21 for the one-dimensional case) proves that, for each fixed set $t_1, \dots, t_k$,
\[ \lim_{n \to \infty} g_n(t_1, \dots, t_k) = g(t_1, \dots, t_k). \]

Theorem 3.3. (Pollard (1981)) If $(\mathcal{T}, d)$ is a compact metric space, then
\[ \arg\min_{t_j,\ 1 \le j \le k} g_n(t_1, \dots, t_k) \to \arg\min_{t_j,\ 1 \le j \le k} g(t_1, \dots, t_k) \quad \text{a.s.} \]
Therefore, if $T_1^*, \dots, T_k^*$ are empiric centers, there is an appropriate labeling of them, and a labeling of the expected centers, such that
\[ T_{j_r}^* \to t_r^* \quad \text{a.s.} \]
Pollard's proof was originally written for Euclidean data, but since it does not use any vector space structure, it extends directly to general compact metric spaces. J. Lember, in his PhD thesis (Ref. 15), extended Pollard's theorem to general separable metric spaces.

4. Computational examples

We have shown in the previous section that the k-means algorithm can be defined in a metric space of trees, provided it is finite or compact, and that it retains the consistency property: the empirical centers converge to the expected centers as the sample size increases. A main problem is then the computation of such empirical centers. In the Euclidean case, they are just the averages of the observations in each cluster, but what are they in metric spaces of trees? By definition they are trees, but how can they be found? In Ref. 4, the centers are also the parameters of central tendency of the parametric distribution. In that case a descent algorithm is constructed to find them, in an example of phylogenetic trees with the Hamming type of metric stated in Example 2 of Section 2. In Ref. 1, with the Robinson Foulds distance, k-means becomes k-medoids: the centers are elements of the sample that minimize the within-cluster sum of distances. The k-medoids clustering algorithm is an iterative descent algorithm, so it delivers a partition, but no consistency theorem is possible. From Stockham's analysis, single linkage seems to be the most promising algorithm for clustering large sets of phylogenetic trees and, as we said in the introduction, in order to obtain a consistency theorem for single linkage we have to sort out the majorant measure problem. If we do, that is, if we are able to say that clusters are the regions of influence of the modes of a density (the derivative of the distribution $\nu$ with respect to the majorant measure $\mu$), then single linkage may be seen as an estimator of the minimum spanning tree of the density, and it is therefore consistent as a clustering algorithm.

In which case is k-means a computable algorithm? In the case of m-trees with the Hamming type of metric, the empirical mean or empirical center is the m-tree that has a node present if and only if that node is present in at least half of the trees in the sample. This is a property of the Hamming type of metric. In the following example we show how m-trees can represent protein sequences and how k-means can deliver a partition of protein space.
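The majority rule makes the center update of k-means explicit for m-trees, so the alternating procedure of Section 3.2 can be sketched directly. The Python code below is a minimal illustration under assumptions that are not in the paper: trees are frozensets of node paths, the weight $\phi(v) = (1/(2m))^{\mathrm{depth}(v)}$ is a hypothetical choice, the initial centers are supplied by the caller, and the names `majority_tree` and `kmeans_trees` are made up. Like any alternating scheme, it is only guaranteed to stop at a local minimum of $g_n$.

```python
from collections import Counter


def phi(v, m=2):
    """Hypothetical weight (1 / (2m)) ** depth of the node v."""
    return (1.0 / (2 * m)) ** len(v)


def hamming_distance(t, y, m=2):
    """Sum of phi(v) over the nodes present in exactly one of the two trees."""
    return sum(phi(v, m) for v in t ^ y)


def majority_tree(trees):
    """Empirical center: keep a node iff it is in at least half of the trees."""
    counts = Counter(v for t in trees for v in t)
    return frozenset(v for v, c in counts.items() if 2 * c >= len(trees))


def kmeans_trees(sample, init_centers, max_iter=100):
    """Alternate nearest-center assignment and majority-rule center updates."""
    centers, labels = list(init_centers), None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: hamming_distance(t, centers[j]))
            for t in sample
        ]
        if new_labels == labels:      # assignments stable: stop
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [t for t, lab in zip(sample, labels) if lab == j]
            if members:               # an empty cluster keeps its old center
                centers[j] = majority_tree(members)
    return centers, labels


# Six 2-trees: three grow mainly through child 1, three through child 2.
s1 = frozenset({(), (1,), (1, 1)})
s2 = frozenset({(), (1,), (1, 1), (1, 2)})
s3 = frozenset({(), (1,), (1, 1), (2,)})
s4 = frozenset({(), (2,), (2, 2)})
s5 = frozenset({(), (2,), (2, 2), (2, 1)})
s6 = frozenset({(), (2,), (2, 2), (1,)})

centers, labels = kmeans_trees([s1, s2, s3, s4, s5, s6], [s1, s4])
print(labels)                    # [0, 0, 0, 1, 1, 1]
print(centers == [s1, s4])       # True
```

The majority tree is always a valid tree: if a node appears in at least half of the sampled trees, so does its parent, because every sampled tree that contains the node also contains the parent.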
4.1. Unsupervised classification of protein sequences into families

4.1.1. Variable Length Markov Chain Modeling of protein functionality

A central problem in functional genomics is to determine the function of a protein using only the information contained in its primary sequence (Ref. 22). The primary structure of a protein is represented by a sequence of 20 different symbols called amino acids. It is well known that a protein functionality family is formed by proteins that perform the same function in different organisms and by proteins from the same organism that have been derived by genetic duplication or rearrangements (Refs. 23, 24). Well-characterized proteins within a family may help the classification of family members whose functions are not well known or not well understood (Ref. 25). Also, the features characterizing each functionality family may give information about a common evolutionary history (Ref. 27). The most widely used methods for proposing hypotheses about protein functionality are based on sequence alignment, Smith (1981). Exact sequence alignment has quadratic computational complexity, which makes it infeasible for large databases. Heuristic methods like BLAST (Ref. 26) are among the most common choices for comparing sequences in large data sets. Recently, this problem has also been addressed with non-alignment methods, which look for family models with parameters or characteristics that determine functionality.

From the mathematical point of view, clustering is an ill-posed problem. Moreover, the definition of a functionality family is quite ambiguous, so it is very difficult to quantify it mathematically to obtain a unique objective function to optimize. As a result, computational clustering approaches differ in the representation of the proteins to be clustered, the definition of the optimization goals, and the resulting partitions of the known protein space. Stability and heterogeneity of the resulting clusters are known problems shared by most methods, but they still help to build a big picture of the ongoing experimental structure which represents superfamilies. The goal of fully automated clustering methods thus becomes to give partial answers about the global organization of all protein sequences.

We start by modeling protein sequences as Variable Length Markov Chains (VLMC), a model introduced by Ref. 29. A VLMC is a discrete-time stochastic process with the property that the law of the process at any given time depends on a finite (but not fixed-length) portion of the process at preceding times (Ref. 30). As usual in applications of VLMC, we assume that the process is a Markov chain of order at most L (a finite memory process). The minimum set of sequences needed to completely specify the distribution of the next symbol in the sequence is known as a context tree and is denoted by $t$. Calling $p$ the conditional transition probabilities associated with the nodes of $t$, the pair $(t, p)$ completely determines the law of the VLMC. VLMC have been successfully applied to model and classify protein sequences (Ref. 10).
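To make the notion of a context tree concrete, the sketch below collects candidate contexts from a single sequence by a simple frequency threshold. This is a deliberately crude illustration and not the PST estimation algorithm referred to in the text: it keeps every substring of length at most `max_len` that occurs at least `min_count` times, and applies no test on the conditional probabilities. The function names, both parameters and the toy sequence are assumptions made for this example. A context is stored as a string, and the parent of a context is obtained by dropping its oldest (first) symbol.

```python
from collections import Counter
from typing import FrozenSet


def candidate_contexts(seq: str, max_len: int, min_count: int) -> FrozenSet[str]:
    """Substrings of length <= max_len seen at least min_count times, plus the root."""
    counts = Counter(
        seq[i:i + length]
        for length in range(1, max_len + 1)
        for i in range(len(seq) - length + 1)
    )
    return frozenset({""} | {w for w, c in counts.items() if c >= min_count})


def parent(context: str) -> str:
    """Parent node in the context tree: forget the oldest symbol."""
    return context[1:]


# Toy sequence over a 3-letter alphabet (a protein would use 20 amino acid codes).
tree = candidate_contexts("ABABAC", max_len=2, min_count=2)
print(sorted(tree))                                  # ['', 'A', 'AB', 'B', 'BA']
print(all(parent(w) in tree for w in tree if w))     # True: closed under parents
```

The resulting set is closed under taking parents, because any substring of a frequent string is at least as frequent, so it can be compared with other such trees using a Hamming type distance on node sets.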
As in the case of the profile HMMs (Hidden Markov Models) used in the construction of the Pfam families, the VLMC approach of Bejerano and Yona takes, for each family, a set of already classified protein domains and estimates a VLMC model, i.e. a pair (t, p). The estimated VLMC model is then used to classify other protein sequences into the family. The motivation for this approach is related to the biological understanding of the evolution and composition of protein families. They suppose that a group of evolutionarily related protein sequences should exhibit many identical short segments, which have either been preserved by selection or have not had enough time to diverge from their common ancestral sequence. The variable memory model is well equipped to pick up these locally conserved
segments, showing them in the architecture of the context tree (Ref. 9). Instead, we consider the context trees of the sequences as a random sample from a probability distribution on tree space whose support is divided into clusters around true centers, and we consider the distribution restricted to each cluster as the signature of the family. That is, we propose that the context trees of the sequences, disregarding the associated probabilities, are sufficient to learn the clustering structure of different Pfam families. To evaluate this bold statement, we take k samples of protein sequences, and for each sequence we construct the estimated context tree using the PST algorithm introduced by Ron et al. (1996) (Ref. 31) and implemented in Ref. 32, obtaining k samples of trees. We assume that the samples are independent and that the trees in each sample are independent and identically distributed with law ν. Clustering is then carried out in a metric space of trees and, as is well known, its success depends strongly on the concentration of the distribution (restricted to the family) in tree space. We compute estimates of the VLMC context tree of each family using the Probabilistic Suffix Trees algorithm (Ref. 9).

4.1.2. First example

Table 1. Confusion matrix of recognition rates. Overall recognition 90.5%

Family          ATP-synt-A  beta-lactamase  cox2    cpn10  DNA-pol
ATP-synt-A      94.87       0               0.93    0      0
beta-lactamase  0           78              0       0      0
cox2            0           0               84.26   0      0
cpn10           5.13        2               10.19   100    0
DNA-pol         0           20              4.63    0      100
Our approach diverges from the classical VLMC approach: for classification we do not use the empirical probabilities associated to each context, but the architecture of the context tree computed for each protein sequence in the family. Also, we do not pool all the samples to estimate a single model; we compute one estimate per sample sequence and look at how the estimates cluster together in tree space. The context tree built from all the collected sequences shows segments that are consistently repeated in most of the sequences, whereas the context trees built from each sequence show patterns inside each particular sequence, and the family bond emerges as a relationship in tree space. The definition of the distance is thus fundamental for our approach.
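As an illustration of what a distance on context trees can look like, the sketch below codes a tree as the set of its node labels and uses the size of the symmetric difference as a Hamming-type distance. This coding, the function names and the majority-rule center are assumptions made for illustration; the exact metric used in this work is defined elsewhere in the paper and may differ.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List

Tree = FrozenSet[str]  # a context tree coded as the set of its node labels


def tree_from_contexts(contexts: Iterable[str]) -> Tree:
    """Build the node set from a list of contexts.

    A node is labelled by a suffix; its parent is the label without its
    oldest (leftmost) symbol, and the root is the empty string.
    """
    nodes = {""}
    for context in contexts:
        for k in range(len(context)):
            nodes.add(context[k:])
    return frozenset(nodes)


def hamming_distance(t1: Tree, t2: Tree) -> int:
    """Number of nodes present in exactly one of the two trees."""
    return len(t1 ^ t2)


def majority_tree(sample: List[Tree]) -> Tree:
    """Nodes present in more than half of the sample trees.

    If a node appears in a majority of trees, so does its parent, so the
    result is again a tree; it minimizes the sum of Hamming distances.
    """
    counts = Counter(node for tree in sample for node in tree)
    return frozenset(n for n, c in counts.items() if 2 * c > len(sample))
```

With this coding the center of a cluster is obtained by simple counting, which is what makes a k-means type procedure feasible in such a space.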
Our first example selected 5 families of proteins from the version of the Pfam database used in Ref. 9, labeled 'ATP-synt-A', 'beta-lactamase', 'cox2', 'cpn10' and 'DNA-pol'. As we can see in Table 1, the overall performance is 90.5%. Five families make a small example, but one that allows us to write out a confusion matrix and check for bias in the clustering output. Only about 9% of a total of 337 proteins have been misclassified, and we see that 'DNA-pol' is perfectly separated from the rest but absorbs 20% of the proteins from 'beta-lactamase'.

4.1.3. Second example

In our second example we added 6 more families from the same database, '7tm-1', 'actin', 'adh-short', 'adh-zinc', 'ank' and 'efhand', and the overall recognition rate dropped to 85.7%. Again, we work with small samples and few families in order to study the confusion matrix shown in Table 2. In this case roughly 15% of a total of 1701 proteins are misclassified. Looking at Table 2, we notice again that proteins are not scattered around but are misplaced in specific families. This feature is very useful for assessing the coherence of each family and the relationships between different families. For example, of the proteins of the beta-lactamase family that were incorrectly assigned to the 'DNA-pol' family in the first example, 90% are now assigned to the 'ank' family, while about 50% of the 'ank' proteins are assigned to the 'DNA-pol' family, showing that these three families are close in tree space.
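The reading of Table 1 used above (columns as the source family, rows as the cluster that received the proteins) can be checked mechanically. The sketch below only re-types the entries of Table 1; the helper names are hypothetical. The overall rate of 90.5% cannot be recomputed from the table alone, since it depends on the family sizes, which are not listed.

```python
FAMILIES = ["ATP-synt-A", "beta-lactamase", "cox2", "cpn10", "DNA-pol"]

# Entries of Table 1 in percent; TABLE1[i][j] is row i, column j.
TABLE1 = [
    [94.87, 0, 0.93, 0, 0],
    [0, 78, 0, 0, 0],
    [0, 0, 84.26, 0, 0],
    [5.13, 2, 10.19, 100, 0],
    [0, 20, 4.63, 0, 100],
]


def column_sums(matrix):
    """Sum of each column; about 100 if columns are source families."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]


def largest_confusions(matrix, names, top=3):
    """Largest off-diagonal cells as (percent, source family, receiver)."""
    cells = [
        (matrix[i][j], names[j], names[i])
        for i in range(len(matrix))
        for j in range(len(matrix))
        if i != j and matrix[i][j] > 0
    ]
    return sorted(cells, reverse=True)[:top]
```

Every column adds up to 100 within rounding, and the largest off-diagonal cell is the 20% of 'beta-lactamase' received by 'DNA-pol', in agreement with the text.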
5. Final Remarks

In this paper we were concerned with the automatic clustering of tree-structured objects. A set of strongly non-vectorial objects is very difficult to visualize, and characteristics must be extracted from them in a fully automated manner. In this context, algorithms that produce data-dependent partitions are unreliable, even if they yield some good particular examples. On the other hand, statistically consistent clustering procedures are very few, even for Euclidean data. The procedure that has received the most attention in this respect is k-means, since it has been proved consistent under different assumptions, even for real-valued functional data. Metric spaces of trees are very particular spaces, where the choice of the metric is decisive when it comes to verifying the computability of the procedure. We
have shown an example of a tree space with a Hamming-type metric where the mean tree is computable. It is a novel modeling approach for families of protein sequences, enabling a partition of protein space without the need for previous sequence alignment. The tree sample is the one obtained when a general set of codified strings is modeled with a Variable Length Markov Chain (VLMC). The VLMC is represented by its context tree, which can be estimated from each string using an algorithm like PST (Ref. 9) or Context (Ref. 29), leading to the final database of estimated trees. If the codification is correct, we claim that the context tree of the chain has all the information needed for discrimination by metric-based methods. In functional genomics, proteins are codified as strings of amino acids, and VLMC models are naturally fitted to functional families of such strings. Amino acid chains are natural candidates for this type of modeling, but any suitable codification of an object with a finite alphabet will make this model arise, so problems other than functional genomics could profit from this type of approach.

6. Prospects

The paper by Stockham et al. (2002) in Bioinformatics (Ref. 1) proposes a consensus among binary trees obtained by clustering. They choose to work with the Robinson-Foulds metric, which does not allow for an explicit computation of the empirical centers. They apply k-medoids and several linkage algorithms in a data mining fashion. Single linkage was shown to be the best of the set. Coming back to the problem of delivering an automatic map of protein space, single linkage is also proposed in Ref. 8 as an efficient and accurate method to tackle 50000 proteins at once (using a BLAST-derived similarity measure).
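Single linkage needs only pairwise distances, so it applies to any metric space, tree space included, without computing centers. The following is a minimal sketch under that assumption, written with a union-find structure; it is not the implementation of Ref. 1 or Ref. 8, and the function name is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def single_linkage(dist: Sequence[Sequence[float]], k: int) -> List[int]:
    """Cut the single linkage hierarchy into k clusters.

    `dist` is a symmetric n x n matrix of pairwise distances. Pairs are
    merged in order of increasing distance until k clusters remain.
    Returns one integer label per object.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    clusters = n
    pairs = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    for _, i, j in pairs:
        if clusters <= k:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i
            clusters -= 1

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

The same function can be fed a matrix of Robinson-Foulds or Hamming-type distances between trees; only the construction of `dist` changes.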
On Euclidean data sets, k-means and single linkage are two nonparametric clustering methods suited to different population distributions: the bull’s eye is perfect for single linkage, whereas a data set of compact groups with well-separated means is the setting for k-means. Both methods deliver consistent clustering, through very different modeling assumptions. It would be important to study, define and prove consistency of single linkage on (finite, separable, compact, locally compact?) metric spaces.

Acknowledgments

Work partially supported by grant PICT 2005 31659-233 and PID SecytUNC 69/08. We would like to thank Florencia Leonardi for providing the
data we used in Section 4.

References

1. Stockham, C., Wang, L., Warnow, T. (2002). Statistically based postprocessing of phylogenetic analysis by clustering. Bioinformatics, Vol. 18, no. 90001.
2. Wang, H., Marron, J. (2007). Object oriented data analysis: Sets of trees. Ann. Statist., Vol. 35, No. 5, 1849-1873.
3. Busch, J., Ferrari, P., Flesia, A.G., Fraiman, R., Grynberg, S., Leonardi, F. (2008). Testing statistical hypothesis on random trees, and applications to the protein classification problem. To appear in Annals of Applied Statistics.
4. Banks, D. and Constantine, G. (1998). Metric models for random graphs. Journal of Classification, 15: 199-223.
5. Bryant, D. (2003). A classification of consensus methods for phylogenies. In Janowitz, M., Lapointe, F.-J., McMorris, F.R., Mirkin, B., Roberts, F.S. (eds), BioConsensus, DIMACS. AMS, 163-184.
6. Balding, D., Ferrari, P., Fraiman, R., Sued, M. (2007). Limit theorems for sequences of random trees. To appear in TEST. ArXiv: math.PR/0406280.
7. Altschul, S.F., Madden, T.L., Schaeffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997). Gapped BLAST and PSI BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25, 3389-3402.
8. Loewenstein, Y., Portugaly, E., Fromer, M., Linial, M. (2008). Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space. Bioinformatics (ISMB 2008 Conference Proceedings, 19-23 July 2008, Toronto), 24(13): i41-i49.
9. Bejerano, G. (2003). Automata learning and stochastic modelling for bio sequence analysis. PhD thesis, Hebrew University.
10. Bejerano, G. and Yona, G. (2001). Variations on probabilistic suffix trees: statistical modelling and prediction of protein families. Bioinformatics, 17(1), 23-43.
11. Page, R.D.M. (1996). On consensus, confidence, and total evidence. Cladistics, 12: 83-92.
12. von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Annals of Statistics, 36(2), 555-586.
13. McLachlan, G.J. and Peel, D. (2000). Finite Mixture Models. Wiley.
14. Pollard, D. (1981). Strong consistency of k-means clustering. Annals of Statistics, Vol. 9, No. 1, 135-140.
15. Lember, J. (2003). On minimizing sequences for k-centres. Journal of Approximation Theory, 120, 20-35.
16. Hartigan, J.A. (1981). Consistency of single linkage for high-density clusters. Journal of the American Statistical Association, 76: 388-394.
17. Hartigan, J.A. (1987). Estimation of a convex density contour in two dimensions. Journal of the American Statistical Association, 82(397): 267-270.
18. Hamming, R. (1950). Error detecting and error correcting codes. Bell System Technical Journal, 29, 147-160.
19. Berge, C. (1989). Hypergraphs: combinatorics of finite sets. New York: North Holland.
20. Robinson, D.R. and Foulds, L.R. (1981). Comparison of phylogenetic trees. Mathematical Biosciences, 53: 131-147.
21. Sverdrup-Thygeson, H. (1981). Strong law of large numbers for measures of central tendency and dispersion of random variables in compact metric spaces. Annals of Statistics, Vol. 9, No. 1, 141-143.
22. Karp, R.M. (2002). Mathematical challenges from genomics and molecular biology. Notices Amer. Math. Soc., 49(5), 544-553.
23. Dayhoff, M.O. (1976). The origin and evolution of protein superfamilies. Fed. Proc., 35, 2132-2138.
24. Hegyi, H. and Gerstein, M. (1999). The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol., 288, 147-164.
25. Eisenberg, D., Marcotte, E.M., Xenarios, I. and Yeates, T.O. (2000). Protein function in the post genomic era. Nature, 405, 823-826.
26. Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., Studholme, D.J., Yeats, C. and Eddy, S.R. (2004). The Pfam protein families database. Nucl. Acids Res., 32, 90001, D138-141.
27. Sasson, O., Vaaknin, A., Fleischer, H., Portugaly, E., Bilu, Y., Linial, N. and Linial, M. (2003). ProtoNet: hierarchical classification of protein space. Nucleic Acids Res., 31(1): 348-352.
28. Smith, O., Annau, T.M. and Chandrasegaran, S. (1990). Finding sequence motifs in groups of functionally related proteins. PNAS, 87(2), 826-830.
29. Rissanen, J. (1983). A universal data compression system. IEEE Trans. Inform. Theory, 29(5), 656-664.
30. Bühlmann, P. and Wyner, A.J. (1999). Variable length Markov chains. Ann. Statist., 27: 480-513.
31. Ron, D., Singer, Y. and Tishby, N. (1996). The power of amnesia: learning probabilistic automata with variable memory length. Machine Learning, 25(2-3): 117-149.
32. Bejerano, G. (2004). Algorithms for variable length Markov chain modeling. Bioinformatics, 20(5): 788-789.
Table 2. Confusion matrix of recognition rates. Overall recognition 84.53%

Family          7tm-1  actin  adh-short  adh-zinc  ank    ATP-synt-A  beta-lactamase  cox2   cpn10  DNA-pol  efhand
7tm-1           0.96   0      0          0         0      0           0               0      0      0        0
actin           0      0.85   0          0         0      0           0               0      0      0        0
adh-short       0      0      0.992      0         0      0           0               0      0      0        0
adh-zinc        0.001  0      0.008      0.9       0      0           0.02            0      0      0        0.003
ank             0      0.10   0          0         0.29   0.025       0.18            0.028  0      0.35     0.003
ATP-synt-A      0      0      0          0         0      0.95        0               0.028  0.017  0        0.003
beta-lactamase  0.019  0.01   0          0.1       0.14   0           0.78            0.028  0      0        0
cox2            0      0      0          0         0      0           0               0.83   0      0        0
cpn10           0.02   0.02   0          0         0      0.025       0               0.086  0.983  0        0.006
DNA-pol         0      0.02   0          0         0.55   0           0.02            0      0      0.65     0
efhand          0      0      0          0         0.02   0           0               0      0      0        0.985
GENETIC CODES AS CODES: TOWARDS A THEORETICAL BASIS FOR BIOINFORMATICS

J. R. JUNGCK
Department of Biology, Beloit College
700 College Street, Beloit, WI 53511, USA
E-mail: [email protected]

Bioinformatics has developed primarily as a discipline within mathematics and computer science devoted to organizing and analyzing large biological databases. However, biology has much to offer to a synthetic discipline of bioinformatics that draws upon and respects the mutual contributions of biology, mathematics and computer science. In particular, biology has two major theoretical foundations, both evolutionary, namely phylogenetic systematics and population genetics, that can serve as cornerstones of a theoretical foundation of bioinformatics along with the traditional empirically driven, pattern-searching forms of classical bioinformatics. In this re-conception of bioinformatics, mathematics and computer science are instrumental in developing biological theory and in solving practical biological problems. Since the genetic code is both an evolutionary product and a process for mediating the conversion of genotype to phenotype, it is argued here that an evolutionary analysis of genetic codes will fundamentally affect our ability to make meaning out of molecular messages through a theoretically grounded bioinformatics. How do the mathematical properties of genetic codes relate to selection pressures on the rate of synthesis of proteins, the correctability and detectability of mutations, the compactness of genes, and the origins of genetic codes? These questions are addressed by employing coding theory (Baudot codes, Gray codes, Hamming codes, Huffman codes, comma-free codes, etc.), abstract algebra, graph theory, combinatorics, information theory, efficiencies, symmetries, and phylogenetic systematics of sequences. Genetic codes become much more understandable and elegant to biologists, mathematicians, and computer scientists when they are not considered as mere ciphers, but are instead understood from three perspectives: codes as codes per se, physical chemical interactions, and evolutionary selective pressures.
These various faces of genetic codes are useful for making meaning out of molecular messages, applying causal mechanisms to complex patterns, and the efficient storage and retrieval of large complex data sets. In addition, some of the alternative distance metrics based upon different mathematical representations of genetic codes that have utility in genomic data base searching (comparative sequence analyses and gene finding), phylogenetic tree construction, and prediction of
three-dimensional structure from primary structure will be illustrated, and different evolutionary mechanisms affecting gene expression based upon codon usage will be considered.
1. Introduction

Bioinformatics has developed primarily as a discipline within mathematics and computer science devoted to organizing and analyzing large biological databases. Implicit in most bioinformatic analyses of DNA and RNA sequences are assumptions of a universal genetic code that, as a cipher, translates from the nucleotide alphabets of DNA and RNA to the amino acid alphabet of proteins. These forms of analyses work in enough cases to have become encoded as the default setting in most bioinformatics software packages. However, these forms of analyses are problematic, erroneous or limited in terms of the: a) evolutionary concepts; b) biophysical systems; c) metaphors; and d) pedagogies in use. Consequently, I argue that evolutionary biology ought to complement mathematics and computer science in the discipline of bioinformatics. Specifically, phylogenetic systematics and population genetics can serve as cornerstones of a theoretical foundation of bioinformatics along with the empirically driven, pattern searching forms of classical bioinformatics. In this re-conception of bioinformatics, mathematics and computer science are instrumental in applying and developing biological theory and in solving practical biological problems. Since genetic codes are both evolutionary products as well as processes of mediating the conversion of genotype to phenotype, it is argued here that an evolutionarily informed analysis of genetic codes will fundamentally affect our ability to make meaning out of molecular messages through a theoretically grounded bioinformatics. Genetic codes can be more accurately grasped by biologists, mathematicians and computer scientists when they are understood not as mere ciphers, but from the three perspectives of: codes as codes per se; physical chemical interactions; and, evolutionary selective pressures.
These various perspectives are useful for making meaning out of molecular messages, applying causal mechanisms to complex patterns, and efficiently storing and retrieving large complex data sets. Re-conceptualizing bioinformatics to include biology allows us to pursue more robust questions. For example, how do the mathematical properties of genetic codes relate to selection pressures on the: 1) rate of protein synthesis; 2) accuracy; 3) compressibility (compactness of gene sequences); 4) efficiency; and 5) channel capacity? First, in most bioinformatics analyses of DNA and RNA sequences, the
assumption is that there is one universal genetic code and that it is used as a cipher to translate from the nucleotide alphabet of DNA and RNA to the amino acid alphabet of proteins. While this approach works in many cases and is encoded into most bioinformatics software packages as a default setting, it is problematic because: (1) some investigators do not realize that sequences from mitochondria or chloroplasts, or from some protozoan sources, use a genetic code different from that of nuclear genes; (2) the numerous “missense” mutations expressed by cells grown in the presence of antibiotics like puromycin, streptomycin and rifampicin, at high or low salt concentrations, at very high or low pH, or under some other environmental conditions are ignored or unnoted; (3) phylogenetic tree distances presume that variation in mutation rates has occurred independently of environmental conditions and of cells’ SOS repair systems (Radman, 2001)1 or of other environmental conditions that affect anagenic change along internal branches of phylogenetic trees. Notions of the genetic code being fixed (Freeland et al., 2000),2 universal, and/or “one in a million” (Freeland and Hurst, 1999)3 are common; but evolution works on variation within populations. Thus, I adopt a typical evolutionary biologist’s counter-assumption: namely, since variation exists in genetic coding in different populations of living organisms, and the environments that they inhabit vary, genetic codes are evolving and will continue to evolve.

A second problem with many bioinformatic analyses is that the focus is on statistical pattern matching, databases, and algorithms without any underlying causal material model of biological phenomena. While statistics and computer science are absolutely crucial to bioinformatics, critics go too far when they malign biologists for their ignorance of these subjects, as if biologists brought nothing to the table for the analysis of such complex data.
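Point (1) above can be made concrete by translating one sequence under two codes. The sketch below is an illustration, not part of any package discussed here: the two 64-letter strings list the amino acids of the standard code and of the vertebrate mitochondrial code in the conventional TCAG codon order, and the function names and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # TTT, TTC, ..., GGG

# One letter per codon in TCAG order; "*" marks a stop codon.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"

CODES = {
    "standard": dict(zip(CODONS, STANDARD)),
    "vertebrate_mito": dict(zip(CODONS, VERT_MITO)),
}


def translate(seq: str, code: str = "standard") -> str:
    """Translate a DNA or RNA sequence in frame 0 up to the first stop."""
    table = CODES[code]
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = table[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def differing_codons(code_a: str, code_b: str) -> dict:
    """Codons that the two codes read differently."""
    a, b = CODES[code_a], CODES[code_b]
    return {c: (a[c], b[c]) for c in CODONS if a[c] != b[c]}
```

For the made-up sequence ATGATATGAAGATTT, the standard code stops at TGA after two residues, whereas the vertebrate mitochondrial code reads TGA as tryptophan and stops at AGA, so a default setting silently changes the inferred protein.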
Molecular sequences are strings of digital entities (whether ones and zeros or A’s, T’s, G’s, and C’s). Computer scientists Doerge, Bailey-Kellogg, Sherman, and Weil (2003)4 state: “Bioinformatics is an evolving science that . . . [is] Defined as the generation, organization, and analysis of biological data, bioinformatics encompasses all biological phenomena . . . bioinformatics addresses problems related to the generation, organization, retrieval, and analysis of information about biological structure, sequence, and function.” Another computer scientist, Pevzner (2004),5 stresses learning computer science algorithms over biological insight. Often beginners with this perspective miss that we can interpret close sequence alignments as products of common evolutionary ancestry and that these molecules are structural equivalents that can functionally replace one another; furthermore, their differences are often used to design inhibitors or drugs. Elsewhere I (Jungck, 2005)6 have argued that we need a collaboration between computer scientists, mathematicians, and biologists rather than continuing these arguments over primacy in our emerging field of bioinformatics; herein I will illustrate contributions from mathematical theorems and constructions as well as from biological research.

Third, challenges to the metaphor of our genome as the “book of life,” of chromosomes as “libraries,” and even of genetic codes as “codes” at all eliminate information theory, computational linguistics, communication theory, signal analysis, and coding theory as legitimate sources of fundamental insight. Lily Kay (2000)7 explicitly asserts that “the genetic code is not a code, it is simply a table of correlations, though not nearly as systematic or predictive as the periodic table, for example, because of contingencies, degeneracies, and ambiguities in the structure of the so-called genetic code.” Furthermore, she believes that “DNA is not a natural language: it lacks phonemic features, semantics, punctuation marks, and intersymbol restrictions.
So unlike any language, “letter” frequency analyses of amino acids yield only random statistical distributions.” While I do not want to debate her ontological assumptions about what is or is not a code or a language, I will explicitly show herein that her testable predictions are simply wrong: (1) “letter” frequency analyses of amino acids do not yield only random statistical distributions; rather, they obey power laws similar to those used to study human languages; and (2) contingencies, degeneracies, and ambiguities are precisely the concepts we need to consider in understanding how well genetic codes work as codes. While I have previously asserted that the genetic code is a periodic table (Jungck, 1978),8 I take biological exception to the notion that “the genetic code is simply a table of correlations” because it mistakes what the genetic code even is. As with Woese et al. (2000),9 I believe that genetic code encoding occurs in the loading of amino acids onto transfer RNAs, catalyzed with the specificity of amino acyl tRNA synthetases (note well that both tRNAs and amino acyl tRNA synthetases are evolved and evolving structures), and, as May et al. (2004a)10 assert, that genetic code decoding occurs on ribosomes, which they describe as “functionally paralleled to a table-based convolutional decoder.” Ribosomes contain about 55 different proteins and three large nucleic acids. May et al.’s (2004b)11 convolutional code model views “the ribosome as a
mechanism with memory, which differs from Schneider’s (1991)12 idea of macromolecular machines without memory.” To misconstrue the table as the code ignores the physicochemical processes of encoding and decoding as well as the channel in which these information processes occur, namely the interior of cells, with crowded macromolecular environments, spatial heterogeneity, and boundaries.

Fourth and finally, from an educational perspective, cracking “the Genetic Code” has been perceived as one of the most significant periods of biochemical problem-solving in the past half-century (see Kay (2000)7 and Hayes (1998)13 for excellent histories of this episode). However, this historical perception has substantially ignored the contributions of mathematical work on codes in this “cracking” and has led students to memorize “what stands for what” in this genetic dictionary without understanding why. In addition, this normative textbook approach has reified three antievolutionary notions: (1) “the Genetic Code” is universal, (2) “the Genetic Code” was created once, and (3) “the Genetic Code” is no longer subject to change. If we are to understand what genetic CODING is, then the historical misconceptions and the underappreciation of mathematics in biological research should be redressed, and each of these three textbook features of “the Genetic Code” should be illuminated from an evolutionary as well as a mathematical perspective.

These four views of a universal genetic code (digital cipher) that is robustly conserved neglect not only the diversity of genetic codes but, more importantly, the complex, omnipresent synchronic and diachronic evolutionary pressures on genetic coding machinery (tRNAs, amino acyl tRNA synthetases, and ribosomes) and the fact that “being” is analog, not digital.
Instead, we can assume that (1) specificity in encoding primarily occurs through the loading of specific amino acids onto tRNAs, mediated by the catalysis of amino acyl tRNA synthetases that are themselves products of extensive molecular evolutionary history (Delarue, 2007);14 (2) the interactions of loaded amino acyl tRNAs and mRNAs are mediated (i.e., decoded) by roughly 55 different proteins and three large rRNAs in ribosomes (May et al., 2004b);11 and (3) these entities are molecules that diffuse, interact in both very specific and nonspecific collisions in a crowded intracellular environment sequestered into different compartments, are transported between these compartments, and are eusemantides (Zuckerkandl and Pauling, 1965),15 records of genetic and evolutionary history. Furthermore, evolutionary biologists know that “equilibria destroy history” and that “universality” is a Platonic anathema in biology (Figure 1) that violates Fisher’s “Fundamental Theorem of Natural Selection,” which basically says that increases in fitness are directly proportional to genetic variance (Jungck, 1997a).16 Without variation, extinction is the most probable outcome in biology.
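The proportionality between the increase in mean fitness and the variance in fitness can be verified in the simplest possible setting. The sketch below assumes an asexual (haploid) population with discrete generations and constant fitnesses, which is a textbook special case and not Fisher’s general statement about additive genetic variance; the function names are hypothetical. In this case the change in mean fitness over one generation equals the variance in fitness divided by the mean fitness.

```python
from typing import List, Sequence


def mean_fitness(freqs: Sequence[float], fitness: Sequence[float]) -> float:
    """Population mean fitness, weighting each type by its frequency."""
    return sum(p * w for p, w in zip(freqs, fitness))


def fitness_variance(freqs: Sequence[float], fitness: Sequence[float]) -> float:
    """Variance in fitness among the types present in the population."""
    m = mean_fitness(freqs, fitness)
    return sum(p * (w - m) ** 2 for p, w in zip(freqs, fitness))


def select(freqs: Sequence[float], fitness: Sequence[float]) -> List[float]:
    """Frequencies after one generation of selection: p_i' = p_i w_i / mean."""
    m = mean_fitness(freqs, fitness)
    return [p * w / m for p, w in zip(freqs, fitness)]


def fitness_gain(freqs: Sequence[float], fitness: Sequence[float]) -> float:
    """Change in mean fitness over one generation of selection."""
    after = select(freqs, fitness)
    return mean_fitness(after, fitness) - mean_fitness(freqs, fitness)
```

When every type has the same fitness the variance is zero and so is the gain, which is the formal counterpart of the remark that without variation there is nothing for selection to act upon.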
Fig. 1. Universal in biology connotes an archetype or the pure, abstract, perfect eidos versus the Darwinian perception that variation abounds and items identified as universal are simply statistical averages or subsets of a much larger distribution of possibilities that do exist in nature (Jungck, 1997b).17
Many authors have recognized the need for much more comprehensive views that stress that bioinformatics is composed of computer science, mathematics and biology or that an “integrative bioinformatics” would include a long list of disciplines such as comparative genomics, “structural bioinformatics,” chemistry, mass spectroscopy, 2-hybrid systems, SNP analysis, and expression arrays (refs). However, in order to stress the role of theory as a lens for experimental design, observation and interpretation (after Dolf Seilacher’s famous adage: “I wouldn’t have seen it, if I hadn’t believed it”),18 I have proposed that bioinformatics broaden its perspective to embrace traditional aspirations of mathematical biology: namely, instead of being judged by the accuracy of phenomenological curve fitting, a theoretical basis for biology should have a causal, material, and mechanistic basis. In Figure 2, I illustrate that this theoretical foundation could be built upon three distinct sources: (1) the major theory of biology, namely, evolution, and more specifically two mathematically rich foundations of evolutionary theory: population genetics and phylogenetic systematics; (2) the major theories of biophysics and biochemistry that recognize these sequences as molecules in an environment, namely, quantum mechanics, statistical mechanics, thermodynamics, and kinetics; and (3) the major theories available to infer properties of messages, namely, the aforementioned information theory, computational
linguistics, communication theory, signal analysis, and coding theory.
Fig. 2. A theoretical basis of bioinformatics should be informed by biological theory, in particular, evolutionary biology’s two main successes: phylogenetic systematics and population genetics; by an understanding of the biophysical theory about molecular structures and dynamics, namely, thermodynamics, quantum mechanics, statistical mechanics, and kinetics; and by foundations of inferences about messages, namely, information theory, algebraic coding theory, and theoretical linguistics. If such a theoretically informed bioinformatics were adopted, I believe that we would have a better chance of understanding sequences, structures, pathways, expression, and regulation of genetic machinery from a causal perspective (Jungck et al., 2006).19
If genetic codes are instead viewed as physical and evolutionary entities in a pluralistic perspective where these codes are susceptible to numerous selective pressures on complex rugged dynamic evolutionary landscapes, I believe that such a “satisficing” perspective will better inform attempts in bioinformatics to infer meaning from molecular messages. “Satisficing is an alternative to optimization for cases where there are MULTIPLE and COMPETITIVE objectives in which one gives up the idea of obtaining a “best” solution. In this approach one sets lower bounds for the various objectives that, if attained, will be “good enough” and then seeks a solution that will exceed these bounds. The satisficer’s philosophy is that in real-world problems there are too many uncertainties and conflicts in values for there to be any hope of obtaining a true optimization and that it is far more sensible to set out to do “well enough” (but better than has been done previously)” (Principia Cybernetica Web, accessed 2008).20 Because of the combinatorial explosion of possible intra- and intermolecular interactions inside cells and the ever-arising multiplicity of mutations over time, computation of optima seems like a very unsatisfactory approach. Populations that
survive and contribute genes to future generations may be lucky and not always the best; nonetheless, they still contribute to the evolutionary legacy. Attention to optimization functions also neglects that, empirically, we face the mathematical problem of simultaneous underdetermination and overdetermination by data. For example, in constructing distance-based phylogenetic trees, the distance matrices have many more pairwise distances than there are interior edges in a tree, and so we need to compute fits to these multiple constraints; on the other hand, because the number of possible tree topologies explodes combinatorially, we will never have enough data to determine the best of all possible trees; furthermore, anastomosing processes such as horizontal gene transfer, allopolyploidy, and endosymbiosis may produce networks or rings rather than trees. Therefore, I believe that less ambitious, modest modeling in the face of such complexity, rather than a focus on the primacy of a few factors, is a better way to understand the evolution of genetic codes. Also, from a philosophy of science perspective, the maintenance of multiple working hypotheses (Chamberlin, 1890; 1965)21 affords an intellectual openness to including more problem-solving possibilities and to responding to new data and ideas. So what are these selective and drift pressures? If an Escherichia coli cell is able to reproduce and divide in twenty minutes, think of the amazing speed at which various cellular processes have to occur such that numerous long proteins are synthesized in such a short time. Proteins that are dysfunctional or misfolded are frequently digested by lysosomes in eukaryotes and their amino acids are recycled; at what cost (Tlusty, 2008)22? Secondary structural features of messenger RNAs such as attenuators influence the rate of translation; are there selective pressures on the codon composition of mRNAs or on the frequencies of third bases in synonymous codons due to mRNA regulation (Biro, 2008)23?
Kinetic proofreading and hydrolytic editing have been suggested as two theories for the specificity of loading amino acids onto tRNAs by the catalysis of aminoacyl-tRNA synthetases moving an amino acid residue from the 2’ to the 3’ vicinal hydroxyl groups of the ribose in the terminal adenosine of tRNAs; how has this specificity arisen and to what pressures is it still exposed (Rocha, 2004)24? If a mutation occurs during genetic coding, many proteins would be simultaneously affected; if so, how can genetic code reassignments take place (such as the addition of selenocysteine to the basic set of twenty amino acids, or a reduction in the number of amino acids used in constructing proteins such as occurs in some mitochondria and chloroplasts) without irreparable harm to a whole cell, organism, or species?
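The simultaneous over- and underdetermination described for distance-based trees can be made concrete with a small counting sketch. It assumes fully resolved (binary) unrooted trees with n labeled leaves, for which there are n(n-1)/2 pairwise distances, n-3 interior edges, and (2n-5)!! topologies; the function names are illustrative and do not come from the cited works.

```python
def pairwise_distances(n_taxa):
    """Number of entries in the upper triangle of a distance matrix."""
    return n_taxa * (n_taxa - 1) // 2


def interior_edges(n_taxa):
    """Interior edges of a fully resolved unrooted tree (needs n_taxa >= 3)."""
    return n_taxa - 3


def unrooted_topologies(n_taxa):
    """(2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5) labeled unrooted binary trees."""
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, pairwise_distances(n), interior_edges(n), unrooted_topologies(n))
```

For ten taxa there are already 45 distances constraining 7 interior edges, while the number of candidate topologies exceeds two million.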
These biological questions have distinct mathematical equivalents. Herein I review and synthesize the contributions of employing a mathematical biologist’s perspective on these questions and related ones. I hope to convince the reader that genetic codes are indeed codes and that the informatic theoretic perspective of well-designed source and channel encoding informs not only subjects like electrical engineering and communication, but bioinformatics as well.

2. Adaptive Landscape of Genetic Coding Selective Pressures

Let us begin by comparing various evolutionary models of genetic coding. Since the sixties, six primary models have existed: (1) translational error ambiguity reduction, (2) direct interaction - physicochemical bases through stereochemical fits, (3) frozen accidents, (4) lethal mutation - prevention of errors and catastrophes due to sudden changes, (5) vocabulary expansion from simple to complex codes, and (6) genetic flexibility. These have been well reviewed and evaluated elsewhere. In Jungck (1978),8 based on multiple linear regression and contingency table analyses of 45 physicochemical properties of amino acids and dinucleotide monophosphates, I concluded that the evidence was consistent with both (1) and (2), that (3), (4), and (5) could be rejected on evidentiary and/or logical grounds, but that I had no reason to judge Kimura’s genetic flexibility hypothesis. However, herein, I want to compliment and complement as well as critique two recent works that have combined several factors from the above six in their consideration.
First, Knight, Freeland, and Landweber (1999)25 presented a wonderful case for multiple factors working simultaneously in the evolution of genetic codes: adaptation (error minimization), stereochemical fits between amino acids and codons or anticodons, and historical contingencies that tended to lock in certain amino acid-codon correlations. They argue that each of these played a more significant role than the other two over some epoch of time. In the earliest stage of code evolution, they favor stereochemical forces; then they favor a vocabulary expansion model; and, finally, in recent and current time, they favor error minimization almost exclusively. Second, Tlusty (2007, 2008),22,26 in a more quantitative approach, has identified three factors as most significant: accuracy, diversity, and cost. The equipoise of these three conflicting needs, for minimal error load, minimal cost of resources, and maximal diversity of vocabulary, defines the fitness of the code (Tlusty, 2008).22 In each case the authors stress the need for multiple causative models. The introduction of concepts of load is parallel
to typical population genetic textbook coverage of substitution load, mutation load, and segregation load. Tlusty’s load (2008)22 is defined within the biophysical context that I am emphasizing; namely, he states: “Finally, one must remember that the code is realized in molecules, which cost the organism materials, energy and time to synthesize and maintain”. I believe that a better construction of our understanding of genetic coding should rely more on codes developed by mathematicians, computer scientists, and electrical engineers and on the questions that they were developed to address (Table 1). Also, previous analyses have routinely made some presumptions that are empirically wrong, teleological, improbable, or unwarranted. First, many authors presume that amino acid ordering in the absence of a genetic code was random ab initio. In Nakashima, Jungck, and Fox (1977),27 we statistically rejected the null hypothesis, i.e., the random model, at the 10^-6 fiducial level, by illustrating that simple thermal polymerization of amino acids in the dry state selectively generated certain sequences in much higher amounts than expected. Second, the null hypothesis of no association between the physicochemical properties of amino acids and those of their conjugate anticodon dinucleotide monophosphates was also statistically rejected at the 10^-6 fiducial level (Jungck, 1978; 1984a & b),8,28,29 both parametrically and nonparametrically; the anticodons are predictable from the physicochemical properties of the amino acids. Third, the strong language of perfection and distortion used by the authors does not account for situations where a tad of sloppiness is somewhat beneficial; for example, some “read-through” mistranslations of stop codons generate longer polypeptides that are employed at crucial steps in the development of mealworms, in defense proteins in chickens, and in viral antigens (e.g., Harrell, Melcher, and Atkins (2002)30; Segawa and Imamoto (1976)31).
As Miroslav Radman (2001)1 so eloquently states: “Errors and infidelity, even wastefulness, can cause individual failure, but also provide innovation and robustness, ensuring the perpetuation of life. Nature does not exhaust itself for the sake of fidelity and perfectionism. Rather, errors are made, often repaired or discarded, but always tested as the source of blind innovation during the continuous adaptation to unpredictable environmental changes and challenges.” Fourth, the code is not frozen at twenty amino acids (Hayes, 2004);32 both selenocysteine and pyrrolysine are used in some organisms. Fifth, the anticodon in tRNAs is not always three nucleotides long but can vary up to four or five (Hayes, 2004).32 Finally, as Massey (2008)33 points out, not all of evolutionary history is due to selection; in particular, what role has random genetic drift played over the course of history on the evolution of
genetic coding apparatuses?

Table 1. Information theory and coding theory address analogous questions to those considered by biologists trying to understand how the genetic code operates swiftly, accurately, and efficiently.

| Selective pressure | Information theory / code | Biological question counterpart |
|---|---|---|
| 1. Speed | Shannon’s First Theorem; Huffman Code | Rate of protein synthesis; minimal average size word |
| 2. Accuracy | Shannon’s Second Theorem; Hamming Code; Gray Code; Baudot Code; Error Detecting and Error Correcting Codes | Symmetry / ambiguity / redundancy / degeneracy / resilience / reliability / flexibility / conservation; distances between different code words; minimal mutational change and effect on 3D protein structure; parity checking |
| 3. Compressibility | Kolmogorov-Chaitin Algorithmic Complexity; Comma-Free Codes | Compactness; overlapping codes; selection on frameshift mutations and multiple reading frames |
| 4. Efficiency | Zipf’s Laws | Rank order of use of codons; codon usage patterns; relationship to composition |
| 5. Channel Capacity | Relationship between first and second laws of information theory | Noisy channel of diffusing materials in a heterogeneous, crowded interior milieu of a cell |
| 6. Why are certain words in the domain assigned to certain words in the range? | Combinatoric Codes | Physicochemical basis of coding |
3. Shannon’s First Theorem of Information Theory

After Quastler, Yockey, Ycas, and others felt disappointed that information theory had not played as dramatic a role in the beginnings of molecular biology as anticipated, Lila Gatlin (1972)34 drew upon information theory to show that the ways in which viruses and bacteria, on the one hand, and eukaryotes, on the other, utilize the DNA alphabet in their genomes are substantially different. Since she wrote in a period that preceded Maxam-Gilbert and Sanger dideoxy DNA sequencing techniques, the primary data available to her were individual nucleobase frequencies and dinucleotide frequencies. Luckily, these data were sufficient for her to conclude that viruses and bacteria vary their response to environment by varying base composition (higher GC content in warmer environments, higher AT content in colder environments) and that eukaryotes varied syntax instead. Thus, intersymbol interactions drew considerable attention after that. However, she argued that Darwinian evolution focused more on Shannon’s second theorem than his first. But his first theorem is still important if viewed from the question: How fast can a protein be synthesized? Shannon’s first theorem of information theory, also known as the source coding theorem, states that it is possible to transmit a code at a high speed without loss of information unless one shortens the average word length below a certain length determined by the frequency distribution of use of letters of the alphabet under consideration (Equation 1):
$$H = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (1)$$
where H is the size of the average length word in bits or the entropy of the information, p_i is the frequency of the i-th letter of the alphabet, and the summation is over all n letters (amino acids) in the alphabet (Shannon 1948).35 Before returning to Gatlin’s attention to the second theorem, I note that an earlier literature had successfully looked at the first theorem (Mackay (1967)36, Alff-Steinberger (1967)37, Golomb (1962)38, Papentin (1973)39). In particular, Mackay (1967)36 employed Huffman’s (1952)40 algorithm for practically approaching the Shannon theoretical limit. We (Berg and Jungck, 1998)41 have extended this analysis by using a much larger and more phylogenetically diverse sample of sequences (Table 2, Figure 3). For this set of data, the Shannon theoretical limit is 4 bits per word, the Huffman code achieves 4.25 bits per word, and the standard genetic code results in 4.33 bits per word. Thus, the standard genetic code is capable of synthesizing proteins at a rate nearly as good as that of a well-designed electrical engineering solution. Thus, the degeneracy of the standard genetic code is not a random frequency distribution as Kay (2000)7 asserts, but instead behaves much like a human language encoded in a game of Scrabble: frequently used letters in the English language such as “A”, “T”, and “E” are only worth one point and have several tiles each while the least frequent letters such as “Q”
Fig. 3 and Table 2. Huffman coding of codons for amino acids based upon their frequency in the composition of protein sequences. 168 proteins were selected for functional and phylogenetic diversity and downloaded from Swiss-Prot, a well-curated database of protein sequences. The compositional frequencies were used to construct a Huffman coding tree. Then the average word length was computed from the sample for both the standard genetic code degeneracy and for the associated Huffman code (Berg and Jungck, 1998).41
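The comparison between the Shannon limit and a Huffman code can be sketched on a toy alphabet. The five-letter frequency table below is purely hypothetical (it is not the 168-protein Swiss-Prot sample), and the function names are illustrative; the sketch only shows that the Huffman average word length sits at or just above the entropy H of Equation 1.

```python
import heapq
import math

# Hypothetical letter frequencies, chosen only for illustration.
TOY_FREQUENCIES = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.1}


def shannon_entropy(freqs):
    """H = -sum(p_i * log2(p_i)), in bits per word."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def huffman_code_lengths(freqs):
    """Return the Huffman code word length (in bits) for each symbol."""
    # Heap entries: (weight, tie-breaker, symbols in this subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths


def average_word_length(freqs, lengths):
    """Frequency-weighted mean code word length in bits."""
    return sum(freqs[sym] * lengths[sym] for sym in freqs)


if __name__ == "__main__":
    lengths = huffman_code_lengths(TOY_FREQUENCIES)
    print("entropy (bits):", shannon_entropy(TOY_FREQUENCIES))
    print("Huffman average length (bits):",
          average_word_length(TOY_FREQUENCIES, lengths))
```

Replacing the toy table with measured amino acid frequencies would reproduce the kind of comparison reported in the text, under whatever convention is chosen for the word length of the degenerate genetic code.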
and “Z” are worth ten points and have only one tile each. In terms of the machinery of genetic coding, many genomes have multiple copies of frequently used tRNAs (refs). A priori, neither result is predicted by critics of understanding genetic codes as codes or of the importance of the informatic theoretic perspective in analyzing protein and nucleic acid sequences.

4. Shannon’s Second Theorem of Information Theory

Shannon’s (1948)35 second theorem says that source and channel coding can be done independently. It is possible to construct a code such that any amount of information can be sent over a noisy channel at high fidelity (accuracy) [without loss of speed] by utilizing degeneracy and redundancy
in the signals to be encoded. The second theorem does not specify how the effects of noise can be overcome, but it does say that we can get the probability of error down by using degeneracy, redundancy, and symmetry. In our approach, we examined symmetries in the standard genetic code and evaluated whether, when mutations occurred, amino acids were replaced by similar amino acids and whether the code words of these amino acids were distant enough from one another to potentially detect and correct errors.

4.1. Hamming Codes and Error Minimization

In 1978 and 1979, we (Bertman and Jungck)42,43 extended the work of Danckwerts and Neubert (1975),44 who had reported that the genetic code can be most easily divided into two octets of dinucleotides. Eight dinucleotides are totally degenerate; i.e., each codes for only one amino acid. The other eight dinucleotides ambiguously code for amino acids (see Table 3).

Table 3.
Danckwerts and Neubert’s (1975)44 two octets of the standard genetic code. The codons in Octet 1 are completely degenerate; i.e., any one of the four nucleotides in the third position of its codon still specifies that particular amino acid. For the codons in Octet 2, whether a purine or a pyrimidine exists in the third position of its codon determines which amino acid or punctuation is specified; for two dinucleotides (AU and UG), which purine is in the third position is important.

| Octet 1 dinucleotide | Amino acid | Octet 2 dinucleotide | Amino acid |
|---|---|---|---|
| GC | Ala | AU | Ile, Met, “Start” |
| GU | Val | GA | Asp, Glu |
| GG | Gly | AG | Arg, Ser |
| AC | Thr | UU | Leu, Phe |
| CC | Pro | AA | Asn, Lys |
| UC | Ser | CA | His, Gln |
| CG | Arg | UA | Tyr, “Stop” |
| CU | Leu | UG | Cys, Trp, “Stop” |
Danckwerts and Neubert (1975)44 showed that the relationships could be demonstrated with the Klein 4-group (Table 4) by using the exchange operators 1 = identity, α = transversions between noncomplementary
bases, β = transversions between complementary bases, and γ = transitions (see Figure 4). Table 4.
Multiplication within the Klein 4-group of nucleotide exchange operators. Note the symmetry that every element in the group is its own inverse.

|   | 1 | α | β | γ |
|---|---|---|---|---|
| 1 | 1 | α | β | γ |
| α | α | 1 | γ | β |
| β | β | γ | 1 | α |
| γ | γ | β | α | 1 |
Fig. 4. Elements of the Klein-4 group (K) illustrating the exchange operators α, β and γ for the nucleotide substitution mutations.
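The multiplication table of the exchange operators can be checked mechanically. The sketch below encodes α, β, and γ as permutations of the four RNA bases following the verbal definitions in the text (α: A↔C, G↔U; β: A↔U, G↔C; γ: A↔G, C↔U) and composes them; the spelled-out operator names and helper functions are illustrative choices, not notation from Danckwerts and Neubert.

```python
BASES = "ACGU"

# Each operator is a permutation of the four bases.
OPERATORS = {
    "1": {"A": "A", "C": "C", "G": "G", "U": "U"},      # identity
    "alpha": {"A": "C", "C": "A", "G": "U", "U": "G"},  # noncomplementary transversion
    "beta": {"A": "U", "U": "A", "G": "C", "C": "G"},   # complementary transversion
    "gamma": {"A": "G", "G": "A", "C": "U", "U": "C"},  # transition
}


def compose(f, g):
    """Apply g first, then f."""
    return {base: f[g[base]] for base in BASES}


def name_of(mapping):
    """Find which of the four operators a permutation corresponds to."""
    for name, operator in OPERATORS.items():
        if operator == mapping:
            return name
    raise ValueError("the set of operators is not closed under composition")


# Full multiplication (Cayley) table of the group.
CAYLEY = {
    (x, y): name_of(compose(OPERATORS[x], OPERATORS[y]))
    for x in OPERATORS
    for y in OPERATORS
}

if __name__ == "__main__":
    names = list(OPERATORS)
    for x in names:
        print(x.ljust(6), [CAYLEY[(x, y)] for y in names])
```

Closure, commutativity, and the fact that every element is its own inverse all follow from the printed table.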
Although they presented the connectedness of the doublet code algebraically and in tabular form, we believe that our graph of the Cartesian cross product of two Klein-4 groups (a K x K group) displays the nature of the doublet code in a fashion such that operational connectedness is obvious (see Figure 5). Note that all of the totally degenerate dinucleotides (octet 1) lie at the vertices of planes connected continuously (illustrated by shading). Furthermore, this dinucleotide group graph is consonant with the conservativeness of the genetic code to mutational alterations; i.e., most operations from one dinucleotide to another produce dinucleotides that code for similar amino acids. The reader can also see that it takes a minimum of two operations to move from a vertex coding for an initiator (AU) to a vertex coding for a terminator (UA or UG). This result remains true if we
were to include the putative initiator GU(G).
Fig. 5. K x K graph of the doublet genetic code. The graph of a group G whose elements are 1, α, β and γ is the graph of a game G1 associated with K as above. The graph of K may be drawn (since α and β are their own inverses) by connecting the 16 vertices by undirected edges (which we will understand to represent traffic in both directions) and hence the graph takes on a simplified form (Bertman and Jungck, 1979).43
Since we are interested in determining the cost of an error due to a mutation in either a sequence or in decoding a sequence, let us compute the Hamming distance between two code words (and their cognate amino acids) by simply counting the minimal number of edges on the graph in Figure 5 needed to navigate between their respective dinucleotides (Figure 6).
Fig. 6. The Hamming distances between amino acids defined by the number of edges to traverse between their respective dinucleotides (Table 3) on the K x K group graph in Figure 5.
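A minimal sketch of this distance calculation follows. It assumes that each edge of the K x K graph applies one non-identity operator (α, β, or γ) to one position of a dinucleotide, so that the minimal number of edges between two dinucleotides equals the number of positions at which they differ. The dinucleotide assignments are transcribed from Table 3 (with isoleucine written as Ile); the function names are illustrative.

```python
from itertools import product

# Dinucleotide -> amino acids / punctuation, transcribed from Table 3.
DINUCLEOTIDE_TO_MEANINGS = {
    "GC": ["Ala"], "GU": ["Val"], "GG": ["Gly"], "AC": ["Thr"],
    "CC": ["Pro"], "UC": ["Ser"], "CG": ["Arg"], "CU": ["Leu"],
    "AU": ["Ile", "Met", "Start"], "GA": ["Asp", "Glu"],
    "AG": ["Arg", "Ser"], "UU": ["Leu", "Phe"], "AA": ["Asn", "Lys"],
    "CA": ["His", "Gln"], "UA": ["Tyr", "Stop"],
    "UG": ["Cys", "Trp", "Stop"],
}


def hamming(word_a, word_b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(word_a, word_b))


def doublets_for(meaning):
    """All dinucleotides that can specify the given amino acid or signal."""
    return [d for d, ms in DINUCLEOTIDE_TO_MEANINGS.items() if meaning in ms]


def meaning_distance(meaning_a, meaning_b):
    """Smallest Hamming distance over all pairs of cognate dinucleotides."""
    pairs = product(doublets_for(meaning_a), doublets_for(meaning_b))
    return min(hamming(a, b) for a, b in pairs)


if __name__ == "__main__":
    print("Start -> Stop:", meaning_distance("Start", "Stop"))
    print("Ala -> Gly:", meaning_distance("Ala", "Gly"))
```

Amino acids that share a dinucleotide (for example Asp and Glu) come out at distance zero at the doublet level, which is a limitation of working with two rather than three positions.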
The values in Figure 6 could be used as an alternative to PAM and BLOSUM matrices in determining distances between sequences in multiple sequence alignments and in the construction of distance-based phylogenetic trees. Such values are not empirically derived from previously constructed multiple sequence alignments and phylogenetic trees, but are based on a theoretical grounding and a group theoretic symmetry. This work has been extended to a six-dimensional hypercube for all 64 codons, with an examination both in vivo and in vitro, by Jiménez-Montaño, de la Mora-Basáñez, and Pöschel (1996).45 Group theory has also been used by Findley, McGlynn, and Findley (1982, 1989)46,47 to analyze alternative genetic codes; by using cyclic groups they have been able to predict all alternative genetic codes recorded then and since then, a rather remarkable achievement. They concluded: “The generalized approach to the genetic code sketched by Gatlin (1972)34 implies that the total code be viewed initially as C x A. A biological context then serves to select a particular subset of C x A (e.g., in the case of the standard code, the subset is that defined by the f-mapping). Such a generalized code appears to be logically necessary if ambiguous codon assignments are to be considered as an integral part of the genetic apparatus. It then follows that the code must be considered a much more complicated structure than was previously thought. What we have demonstrated is the possible existence of a constraint on C x A that must necessarily serve to restrict the biologically meaningful subsets of C x A.
Thus, while the genetic code may very well be more complex than the standard view implies, it does exhibit well-defined regularities.” More recently, Findley and McGlynn (2008)48 have extended their work by differentiating between “permutational and substitutional evolution” and suggesting that this may lead to an “evolutionary field theory.” Thus, both symmetries and distances result from such a theoretical analysis of codes and help us understand how the evolution of genetic codes has enhanced the potential for increased accuracy and for minimizing the impact of mistakes by separating the code words of unlike amino acids more than those of amino acids with similar side chains.

4.2. Gray Codes and Error Minimization

Rosemarie Swanson (1984)49 and, with Stan Swanson, (1995)50 used a Gray code to illustrate a different minimization approach: namely, since 2^6 = 64, the genetic code could be written as a cycle of single-bit changes in six-bit words representing each of the 64 codons in the standard genetic code, where each of the four nucleotides is represented by a two-bit word.
As a biophysicist who studies the three-dimensional X-ray crystallographic structures of proteins, she then mapped the changes around the cycle to the position and size of amino acids in primarily globular proteins that exist in an aqueous environment. If we view their circle as a compass face, the north-south axis ran from external amino acid side chains to internal amino acid side chains and the west-east axis ran from small amino acid side chains to large amino acid side chains. Bosnacki, ten Eikelder, and Hilbers (2003)51 extended the above work by showing that the Gray code is a TSP (Traveling Salesperson Problem) solution (Gilbert, 1958),52 i.e., the Gray code is a Hamiltonian path on an n-cube (in this case n = 6). They also used a more formal approach to determine the distances between amino acids’ codons on the six-dimensional hypercube by solving the TSP locally: “The TSP problem is a so-called NP-hard problem, which implies that finding a shortest route is in general very hard. However, the TSP instances we consider here consist only of n = 64 cities. For instances of this size an optimal solution, i.e. a shortest route, can be found in a few seconds by an exact TSP solver. We used a local search method based on 2-opt neighborhoods[1] as well as the exact Concorde solver[2].” However, neither set of authors considered the size of the set of general Gray code solutions. Each considers only one solution (their particular circular representation) rather than all potential solutions. Bosnacki, ten Eikelder, and Hilbers’ (2003)51 development of a Gray code reinforces that “the genetic code is optimized to be robust with regard to translation errors, rather than random mutations.
As we do not have the problem of misplaced serine (S) and proline (P) [in the Swanson and Swanson result], one can say that our arrangement even strengthens the arguments in that direction.” The calculation of greatest lower bounds and least upper bounds to the Hamiltonian path remains an open research (NP-complete) problem; nonetheless, the On-Line Encyclopedia of Integer Sequences lists, for the “Number of directed Hamiltonian cycles (or Gray codes) on n-cube”, almost 2 billion unique solutions (1,813,091,520) for n = 5, and we know that for n = 6 the answer will be substantially higher. Since all solutions are perfectly good Gray code representations of the standard genetic code, we need to ask what proportion of these paths maintain small distances between similar amino acids and large distances between substantially different amino acids. Both the Swanson and Swanson (1984, 1995)49,50 and the Bosnacki, ten Eikelder, and Hilbers (2003)51 Gray codes were based upon the use of mRNA codons. Since my earlier work had found a significant correlation between anticodon dinucleotide properties and amino acid physical properties (Jungck, 1978; 1984a),8,28 I have constructed an anticodon Gray code (Figure 7).
Fig. 7. An anticodon Gray Code. In this case, I have tried to relate the amino acids to the two sets (I and II) of octets in Table 3 which uses group theoretic symmetry as reported by Danckwerts and Neubert (1975).44 Note that six type I’s are contiguous and that five type II’s are contiguous in this particular Gray Code solution.
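One of the very many possible Gray code cycles over the 64 codons can be generated with the standard binary reflected Gray code. The two-bit nucleotide encoding used below (U = 00, C = 01, A = 10, G = 11) is an assumption made only for illustration; it is not claimed to be the encoding used by Swanson and Swanson, by Bosnacki and colleagues, or in Figure 7.

```python
# Hypothetical two-bit encoding of the nucleotides (illustration only).
BIT_PAIR_TO_BASE = {0b00: "U", 0b01: "C", 0b10: "A", 0b11: "G"}


def gray_code(n_bits):
    """Binary reflected Gray code: successive words differ in one bit."""
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]


def word_to_codon(word):
    """Read a six-bit word as three consecutive two-bit nucleotides."""
    return "".join(BIT_PAIR_TO_BASE[(word >> shift) & 0b11] for shift in (4, 2, 0))


# A cyclic ordering of all 64 codons with one nucleotide change per step.
CYCLE = [word_to_codon(word) for word in gray_code(6)]

if __name__ == "__main__":
    print(CYCLE[:8])
```

Because a single bit flip alters exactly one two-bit nucleotide, neighbouring codons on the cycle (including the last and the first) differ at exactly one position. Under this particular encoding the low bit toggles transitions and the high bit toggles complementary transversions.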
The Gray Code is not the only Hamiltonian solution to consider; another is a Hamiltonian path, namely, a rook’s tour of a chess board (also with 64 cells, one per codon). In 1991, Gerald Rosen53 developed a “Rook’s tour of the genetic code”, which he similarly thought made minimal changes in amino acid properties as it went along. However, we (Qat Allikian and I) have generated numerous Hamiltonian path solutions that have better linear regression fits to a variety of physicochemical characteristics of amino acids (Jungck, 1978; 1984a)8,28 even using Rosen’s (1991)53 distribution of code words on the chess board. Thus, the Gray Code, or Hamiltonian circuits and paths more generally, affords another approach to look at error minimization of the standard genetic code in terms of the physicochemical properties of amino acids. Most small mutations or translational errors will result in the substitution of similar amino acids.

4.3. Baudot Codes and Error Minimization

An alternative conceptualization asks the question: How could we store the genetic code in the computer in a minimum amount of space? Such storage of the genetic code is frequently important to anyone manipulating
Fig. 8. This Rook tour was chosen because the linear regression of numbered consecutive steps versus the hydrophilicity - hydrophobicity scale was a much better fit than Rosen’s (1991).53
extensive protein and nucleic acid sequence data with the purpose of constructing parsimonious phylogenetic trees (Dayhoff, 1972).54 Although the genetic code is typically presented as a two-way table, which would most easily be handled as a 4 x 4 x 4 three-dimensional matrix, many small computers (as well as most programmers with minimal experience) can handle strings much more efficiently than matrices. Thus, the basic question to be answered is: “What is the minimal length string of symbols in which we can completely contain all 64 codons with the minimal amount of duplication?” Obviously, a string less than 192 (3 x 64) symbols long can contain all 64 codons if overlapping reading frames are employed. Thus, I approached the problem empirically and was able to show that a linear sequence of 66 nucleotides is able to account for the complete set of codons without any repetition (Figure 9). I immediately noticed that this solution’s ends were identical in the first two and the last two nucleotides; hence, the sequence can be reduced to a 64-nucleotide sequence in a circle (Figure 10). Since the concatenated circle could be cut in any of 64 internucleotide spaces to construct a 66-nucleotide string (with duplication of the penultimate and ultimate nucleotides only), there must exist at least 64 possible linear sequences of 66 nucleotides which contain all 64 codons (or anticodons). In addition, the four nucleotides can be named in 4! (i.e., 24) different ways by interchanging A and U, and G and C, etc. Therefore, minimally
Fig. 9. An example of a string of nucleotides 66 long which contains all 64 codons once and only once in overlapping reading frames. The codons can be read out three at a time (with completely overlapping groups) from beginning to end. A = adenylic acid; G = guanylic acid; C = cytidylic acid; and, U = uridylic acid.
there must be at least 64 x 24 = 1,536 possible ways of constructing a 66-base sequence which satisfies the stipulated conditions. How many circles of 64 nucleotides in length could be constructed from four different kinds of nucleotides? The solution of an equivalent string-of-beads question has been reported by Rota (1969)55 to be given by Equation 2, where n is the length of the necklace, k is the number of kinds of beads, the sum runs over all divisors d of n, and φ(d) is the number of positive integers not exceeding d that are prime to d, i.e., whose greatest common divisor with d is one (Avital and Hansen, 1978).56

$$c(n, k) = \frac{1}{n} \sum_{d \mid n} \varphi(d)\, k^{n/d} \qquad (2)$$
For the problem at hand, the number of combinations c(64,4) is about 2^122 or 5.32 × 10^36 different such necklaces. Thus, the minimum number of circles 64 long composed of four nucleotides (1,536) represents only a very small fraction (2.89 × 10^-32 %) of this maximum. By inspection, it is easy to show that permutations of nucleotides other than those generated by severing or exchanging names of the solution in Figure 10 will also satisfy the stipulated criteria. On the other hand, intuitively the number of sequences which satisfies the conditions must be less than the calculated maximum because one condition for such a sequence is that it must contain exactly 16 nucleotides of each of the four types. However, the number of permutations of a linear sequence of 64 beads consisting of 16 of each of four types is nearly as large a number (see Equation 3).

$$P = \frac{64!}{16!\,16!\,16!\,16!} = 6.62 \times 10^{35} \qquad (3)$$
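The two counts above are straightforward to evaluate exactly with integer arithmetic. The sketch below implements the necklace formula of Equation 2 with a brute-force totient function and the multinomial coefficient of Equation 3; the function names are illustrative.

```python
from math import factorial, gcd


def totient(d):
    """Euler's totient: how many k in 1..d satisfy gcd(k, d) == 1."""
    return sum(1 for k in range(1, d + 1) if gcd(k, d) == 1)


def necklaces(n, k):
    """Number of circular arrangements of n beads of k kinds, up to rotation."""
    total = sum(totient(d) * k ** (n // d) for d in range(1, n + 1) if n % d == 0)
    return total // n  # the sum is always divisible by n


def balanced_permutations(n_each, kinds):
    """Linear sequences containing exactly n_each beads of each kind."""
    return factorial(n_each * kinds) // factorial(n_each) ** kinds


if __name__ == "__main__":
    circles = necklaces(64, 4)
    print("c(64, 4) =", circles)
    print("1,536 as a percentage of c(64, 4):", 100 * 1536 / circles)
    print("64! / (16!)^4 =", balanced_permutations(16, 4))
```

For small cases the formula can be checked by hand: there are 4 binary necklaces of length 3 and 6 of length 4.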
T. van Aardenne-Ehrenfest and N.G. de Bruijn (1951)57 have shown that the general mathematical solution for any linear block code with an alphabet containing α letters and with word length n could be contracted to a circular sequence $\alpha^{n}$ long and that there would be $\alpha^{-n}(\alpha!)^{\alpha^{n-1}}$ such sequences.
Thus, for the problem at hand with an alphabet α = 4 and word length n = 3, the de Bruijn sequence is 64 long and there are (4!)^16/64 or 1.89 × 10^20 unique circular sequences. It is interesting to note that even de Bruijn discovered in 1975 (ref. 58) that his specific solution for word length 2 was antedated by C. Flye Sainte-Marie in 1894 (ref. 59). De Bruijn also introduced two methods for the construction of these concatenated circles of code words. The first, the method I originally employed, simply takes any n-tuple and uses its (n-1)-suffix as the (n-1)-prefix of the next code word; one simply continues to build a circle α^n long by exhaustively using all α^n words, in this fashion, once and only once. If a word is missed, it usually can be inserted into the sequence at some point without concomitantly losing other words. The second method requires the construction of a directed de Bruijn graph (Figure 10), which contains all possible (n-1)-tuples as its vertices; each directed edge represents a code word, with the source of the edge representing the (n-1)-prefix of the code word and the sink of the edge representing the (n-1)-suffix of the code word. Each vertex should have a degree of eight in the de Bruijn graph; i.e., each vertex should have four outgoing edges and four incoming edges. Homodinucleotides have a reflexive arc that must be counted as both an outgoing edge and an incoming edge. If one traverses the directed graph by an Eulerian circuit (Bondy and Murty, 1976)60 which follows each and every edge once and only once, then one will have generated a de Bruijn sequence. Note that the total number of unique sequences is equivalent to the number of such Eulerian circuits on the directed graph (van Lint, 1974).61 In contrast to the Hamiltonian circuit problem in the Gray Code section, the Eulerian circuit problem on an n-dimensional hypercube is a well-known result. The directed graph for the genetic code shown in Figure 10 is just one of these Eulerian circuits.
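The second construction can be sketched in a few lines. The code below builds the de Bruijn graph implicitly (vertices are dinucleotides, edges are codons) and finds one Eulerian circuit with Hierholzer's algorithm, which is a standard method chosen here for illustration and not necessarily the procedure used to produce Figures 9 and 10. The resulting circle is only one of the roughly 1.89 × 10^20 possibilities and will generally differ from the published one.

```python
from itertools import product

BASES = "ACGU"


def de_bruijn_circle(alphabet=BASES, n=3):
    """Return a circular string containing every n-letter word exactly once."""
    # Each (n-1)-tuple vertex starts with one unused outgoing edge per letter.
    unused = {"".join(v): list(alphabet) for v in product(alphabet, repeat=n - 1)}
    stack = [alphabet[0] * (n - 1)]
    circuit = []
    while stack:  # Hierholzer's algorithm, iterative form
        vertex = stack[-1]
        if unused[vertex]:
            letter = unused[vertex].pop()
            stack.append((vertex + letter)[1:])  # follow the edge vertex+letter
        else:
            circuit.append(stack.pop())
    circuit.reverse()  # vertices of the Eulerian circuit, start repeated at the end
    return "".join(vertex[-1] for vertex in circuit[1:])


def count_de_bruijn_circles(alphabet_size, n):
    """(a!)^(a^(n-1)) / a^n distinct circular de Bruijn sequences."""
    a = alphabet_size
    numerator = 1
    for i in range(2, a + 1):
        numerator *= i  # a!
    return numerator ** (a ** (n - 1)) // a ** n


if __name__ == "__main__":
    circle = de_bruijn_circle()
    linear = circle + circle[:2]  # cut the circle: 66 bases, ends overlap by two
    print(linear)
    print(count_de_bruijn_circles(4, 3))
```

Reading the 66-base string three letters at a time with full overlap yields each of the 64 codons once and only once, and its first two and last two nucleotides coincide, as described in the text.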
Street (1974)⁶² has called these “Eulerian Washing Machines.” Such circular overlapping codes are formally called Baudot Codes. Baudot invented his code in 1870 and patented it in 1874. When Baudot Codes came into use for teletypes, they were considered as important for that technology as Morse coding had been for telegraphs; thus, the Baudot Code was also known as International Telegraph Alphabet No. 1. The fact that the standard genetic code is a block code that has 1.89 × 10^20 possible configurations is interesting with respect to the enumeration of possible original alternative genetic codes (Bertman and Jungck, 1978).⁴² For example, how many code tables could one have which assign all 64 codons to the twenty amino acids and terminators, if one makes sure that each amino acid and a terminator was assigned at least one codon, as is the
Fig. 10. De Bruijn graph of the genetic code. All 64 codons will be included in any complete Eulerian circuit of this graph. There are 1.89 × 10^20 such possible circuits.
case in our supposedly “universal” genetic code? This is equivalent to the combinatoric question which asks how many ways we can place 64 labeled balls into 21 labeled urns where each urn receives at least one ball (see Equation 4, in which the index s runs over the urns excluded in the inclusion-exclusion sum).

\[
C \approx 21! \sum_{s=0}^{21} \frac{(-1)^{s}\,(21-s)^{64}}{s!\,(21-s)!} \cong 1.50 \times 10^{84} \tag{4}
\]
An alternative, stricter set of conditions would be to evaluate the genetic codes with the same distribution of numbers of codons to particular amino acids as exists in the actual code (see Equation 5).

\[
P = \frac{64!}{6!\,6!\,6!\,4!\,4!\,4!\,4!\,4!\,3!\,3!\,2!\,2!\,2!\,2!\,2!\,2!\,2!\,2!\,2!\,1!\,1!} = 6.62 \times 10^{35} \tag{5}
\]

The importance of these calculations is that they place the question of the code’s origin in proper perspective because any statement that purports that our standard genetic code is the optimal code should at least present strong evidence to exclude the majority of these other possibilities. A very difficult task indeed!
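Both counts can be evaluated exactly with integer arithmetic. The sketch below assumes only the expressions as printed in Equations 4 and 5; the names `surjection_count`, `multinomial_count`, and `DEGENERACY` are hypothetical. Evaluated this way, Equation 4 comes out near 1.5 × 10^84, while the expression in Equation 5 appears to come out near 2.3 × 10^69, so the numerical figure quoted for Equation 5 may be worth rechecking against the intended constraints.

```python
from math import comb, factorial, prod

CODONS, CLASSES = 64, 21  # 20 amino acids plus one terminator class


def surjection_count(balls=CODONS, urns=CLASSES):
    """Equation 4: labeled balls into labeled urns with no urn left empty."""
    # 21! * sum (-1)^s (21-s)^64 / (s! (21-s)!) equals the sum below,
    # because 21! / (s! (21-s)!) is the binomial coefficient C(21, s).
    return sum((-1) ** s * comb(urns, s) * (urns - s) ** balls
               for s in range(urns + 1))


# Codons per class, copied from the denominator of Equation 5:
# three classes of 6, five of 4, two of 3, nine of 2, two of 1.
DEGENERACY = [6] * 3 + [4] * 5 + [3] * 2 + [2] * 9 + [1] * 2


def multinomial_count(sizes=DEGENERACY):
    """Equation 5: 64! divided by the product of the class-size factorials."""
    return factorial(sum(sizes)) // prod(factorial(k) for k in sizes)


if __name__ == "__main__":
    print(f"Equation 4: {surjection_count():.3e}")
    print(f"Equation 5: {multinomial_count():.3e}")
```

Python integers have arbitrary precision, so no rounding occurs until the values are formatted for display.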
Fig. 11. A circular representation of the sequence shown in Figure 10 with all the corresponding amino acids (in three-letter abbreviations) and the three terminators (. . .) for each of the 64 codons shown on inner wheels. There are 21 triplet words in each inner circle and the fourth valine codon has one letter of its abbreviation in each of the three circles. A reader can start reading anywhere on the outer circle and continue to read three nucleotides at a time until returning to the same reading frame, and will then have read all 64 codons (Jungck, 1984c).⁶³
The fact that the genetic code can now be efficiently stored in sequences 66 nucleotides long in a large variety of ways also allows us to pursue a number of related questions. Note first that 66 spaces is about one-third the length (192 spaces) that would be required to list each of the 64 codons (or anticodons) separately. The amino acids could also be stored using single-letter notation in a 64-symbol sequence; we have used such a solution in computer-assisted instruction on genetic translation. Among the related questions are: Can the genetic code be represented by a circle of 48 nucleotides (because C and U are totally degenerate in the third position of codons)? Can a sequence of 24 nucleotides be generated which starts with an initiator codon, is followed by codons for the twenty amino acids, and ends with a terminator codon? [Demongeot (1978)⁶⁴ solved a similar question; he showed that the circular permutation of (AUGGUGCCAUUCAAGACUAUGA) is a non-unique solution of 22 bases which contains an
initiator, 20 amino acid codons, and a terminator.] In a 66 nucleotide sequence such as in Figure 9, can we do the same thing in one frame; i.e., place an initiator codon in the first three nucleotide positions, 20 amino acid codons in the center 60 nucleotides, and a terminator codon in the final three positions and still account for all 64 codons by overlapping reading frames? [Because the initiators are AUG and GUG, and the terminators are UGA, UAG, and UAA, this is impossible; however, only one additional nucleotide at the beginning may be necessary.] Can a circle of 64 nucleotides be made periodic (Jungck, 1978)⁸ such that related amino acids are encoded by contiguous series of overlapping codons or are positioned at regular arc angles? How long a string would be required to place all degenerate codons together? We might also ask what features the antiparallel complements possess (Jungck, 1977).⁶⁵ Finally, do any such strings occur in natural nucleic acid sequences, and, if so, do they have any unique selective characteristics? One feature of the circular sequence presented in Figure 11 is that it could at least recognize all 64 codons or anticodons in template fashion, which is not possible in such proto-tRNAs or rRNAs as those suggested by Folsome’s (1977)⁶⁶ circular permuted generator hypothesis for the origin of the genetic code. However, many tRNA and 5S rRNA sequences, while quite short, do contain a large number of code words when examined in overlapping reading frames (Dayhoff, 1972).⁵⁴ On evolutionary grounds, it may be worth examining the maximum homology of such sequences with the 1.89 × 10^20 unique de Bruijn sequences which can be generated from Figure 10 in order to test a version of Folsome’s hypothesis. The efficient presentation of the standard genetic code as a sequence of concatenated codons has utility in comparisons of coded nucleic acid sequences for proteins as stated earlier.
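Questions of this kind are easy to explore computationally. As a minimal sketch, the following Python reads Demongeot's 22-base circle in overlapping frames and tallies which codons and which meanings it covers. The standard codon table is written in the conventional UCAG order with one-letter amino acid codes and `*` for terminators; the names `CODON_TABLE`, `circular_codons`, and `coverage` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a terminator.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def circular_codons(seq):
    """All overlapping triplets read once around a circular sequence."""
    wrapped = seq + seq[:2]
    return [wrapped[i:i + 3] for i in range(len(seq))]


def coverage(seq):
    """Return the distinct codons on the circle and the meanings they encode."""
    codons = set(circular_codons(seq))
    meanings = {CODON_TABLE[codon] for codon in codons}
    return codons, meanings


DEMONGEOT = "AUGGUGCCAUUCAAGACUAUGA"

if __name__ == "__main__":
    codons, meanings = coverage(DEMONGEOT)
    print(len(DEMONGEOT), "bases,", len(codons), "distinct codons,",
          len(meanings), "distinct meanings")
```

The same `coverage` function could be pointed at a tRNA or 5S rRNA sequence to count how many of the 64 code words it contains in overlapping frames.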
Also, a pedagogical use of the genetic code configuration in Figure 10 would be as a Monte Carlo generator of random sequences of codons, much in the fashion of a roulette wheel. It is furthermore hoped that applying algorithms for general linear block codes to the genetic code, as shown here, may lead to further insights into the actual formal coding properties of the standard genetic code and help us infer why it has evolved to its present form. How does the Baudot Code relate to the theme of error minimization? One of the reasons that Baudot Codes were literally used in rotary washing-machine dials was that, by using such overlapping words to code for similar activities, a slight over-rotation was unlikely to engage a fill cycle while the tub was rotating at its highest speed, or vice versa. As a biological parallel,
Baudot Codes could be used to examine the impact of frameshift mutations and mistranslations. In fact, in 1957, long before even the first word of the genetic code was “cracked,” Crick, Griffith, and Orgel⁶⁷ first proposed the “Error Minimization Hypothesis”: that the code, when solved, would have the property that frame-shift errors would be avoided; i.e., “errors in which translation shifts by 1, 2, 4, 5, 7, 8 . . . nucleotides” (Gutfraind, 2006).⁶⁸ Some viruses, bacteria, and eukaryotic organelles are known to use overlapping genes (Normark et al., 1983),⁶⁹ and it has been hypothesized that such overlaps arise when genomes are under severe selective pressure, such that minimizing the size of the genome has high survival value (Lillo and Krakauer, 2007).⁷⁰ Just to give you, the reader, some idea of how difficult this is, look at Figure 12. While, with great difficulty, we could find some examples of individual words that make sense, think of how difficult it would be to construct meaningful sentences; nonetheless, evolution has been able to accomplish this unlikely result. In part, nucleic acids have an advantage over our linguistic constructions because overlapping genes can be encoded in the antiparallel complementary strand of DNA, so six reading frames are possible, not just three.
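The six reading frames mentioned above can be enumerated directly. The following is a minimal sketch under stated assumptions: sequences are written in the RNA alphabet ACGU, and the frame labels `+1` to `+3` and `-1` to `-3` and the function names are illustrative conventions, not anything defined in the text.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq):
    """Antiparallel complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def codons_in_frame(seq, offset):
    """Non-overlapping triplets starting at the given offset (0, 1, or 2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq):
    """Map frame labels to codon lists for both strands."""
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = codons_in_frame(strand, offset)
    return frames


if __name__ == "__main__":
    for label, codons in six_frames("AUGGCCUAA").items():
        print(label, codons)
```

A frameshift by one or two nucleotides moves the reader from one `+` frame to another, which is the situation the overlapping-word analogy in Figure 12 is meant to convey.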
Fig. 12. Analog of overlapping genes. Linear representations of three frameshifted sequences of three-letter words that make sense in (a) two out of three reading frames and (b) all three reading frames.
On the other hand, Yuko Osada, Ryo Matsushima, and Masaru Tomita (Keio University, Endo, Fujisawa, Japan, private communication) assert a contrary view in their computational analysis of overlapping genes in Mycoplasma genitalium:

“As we can see in those results, when one of the overlapping genes conserves the overlapping region, the other gene does not conserve the region well. This indicates that amino acid sequences of the overlapping region are biologically important in only one (or none) of the two proteins. Thus it can be inferred that overlapping genes have emerged from two non-overlapping genes, one of the genes extending its coding region by, for example, changing its stop codon. Amino acid sequences of those extended regions would not have biological importance, but still preserved because they are harmless. Whereas some people think that overlapping genes are the results of strong evolutionary pressure to down-size genome, we conclude that it is probably not true in the case of M.genitalium.”

Three recent papers have addressed the issue of genome compression and the usage of overlapping genes in RNA viruses (Belshaw, Pybus, and Rambaut, 2007),⁷¹ in bacteria (Kingsford, Delcher, and Salzberg, 2007),⁷² and in vertebrate genomes (Makalowska, Lin, and Makalowski, 2005).⁷³ Overlapping genes may be of considerable value as markers in building phylogenetic trees because of their rarity (Luo et al., 2006).⁷⁴ Also, local sequence contexts around overlapping genes (Cock and Whitworth (2007)⁷⁵; Fukuda, Nakayama, and Tomita (2003)⁷⁶; Sabath, Graur, and Landan (2008)⁷⁷) may be important selective constraints. While Cullmann and Labouygues consider some related possibilities in their article on Noise Immunity of the Genetic Code ((1983)⁷⁸; and see their (1985)⁷⁹ article), to my knowledge, no systematic, quantitative analysis of the evolutionarily selective pressures on the co-evolution of overlapping genes has been published; in particular, with reference to Baudot Codes and their ramifications on varieties.
With 1.89 × 10^20 such possible de Bruijn circuits (Equation 6), each of which corresponds to one Baudot Code, there are plenty of combinatorial possibilities to consider. Here m + 1 = 4 is the size of the alphabet and n = 3 is the word length.

\[
S = (m+1)^{-n}\,\bigl[(m+1)!\bigr]^{(m+1)^{n-1}} = 4^{-3}\,(4!)^{4^{3-1}} = \frac{24^{16}}{4^{3}} = 1.89 \times 10^{20} \tag{6}
\]

It should be noted that de Bruijn circuits have found tremendous application in another aspect of computational molecular biology, namely, the assembly of short sequenced DNA fragments into whole genomes. For a fine review of this literature, see Kaptcianos (2008).⁸⁰
5. Algorithmic Complexity and Other Information Theoretic Considerations

Shannon states that the trade-offs between the first and second theorems of Information Theory are such that we can bring the probability of error arbitrarily close to zero while transmitting at a high rate if we use good enough codes of sufficient length. Furthermore, Shannon’s separation principle says that source and channel coding can be done independently. In what are sometimes referred to as Shannon’s third laws of information theory, the “Inverse channel coding theorem” and the “Direct channel coding theorem,” he introduces the notion of channel capacity. Numerous papers have considered this problem in the context of genetic coding, including Tlusty (2007, 2008),²²,²⁶ where we began. I will deal with this problem in a future paper. I am particularly intrigued by the recent report of Yampolsky and Stoltzfus (2005)⁸¹ on a better metric of the relatedness of amino acids’ physical characteristics and by how much such an evaluation will make us reevaluate some of the claims made above. Another often used concept from computational linguistics is Zipf’s law (1949).⁸² Essentially, a power law relationship exists between the rank order of words and their frequency of use. Both coding and noncoding regions have been analyzed extensively. While all conclude that genetic sequences make efficient use of codons, considerable debate surrounds whether the relationship is always necessarily a power law. I will also deal with this issue in a future paper. Let me conclude with a brief observation on a too little explored aspect of Information Theory, namely, Chaitin-Kolmogorov Algorithmic Complexity (also known as “compressibility”).
Danchin (1996)⁸³ restates their approach in genetic context thusly: What are the “minimum number of steps for an algorithm to specify a sequence of length N (or minimum space without altering its performance)?” A completely monotonic sequence can be easily summarized by a single exponent:

AAAAAAAAAAAAAAAAA = A^N

On the other hand, a completely random sequence (and randomness is always very difficult to judge) requires N steps:

GCCTGAACTGA . . . (RANDOM): N steps

In other words, random sequences cannot be stored more efficiently; each and every symbol needs to be specified. These two extremes specify
the range of “compressibility.” If we had better insights into this problem, we might be able to compress DNA databases much more efficiently. We know that there is room for progress because evolution could not have explored the combinatorial explosion of all potential sequences even for short lengths. Thus, even at the 16-mer level, certain sequences are missing in GenBank (Hampikian and Andersen, 2007).⁸⁴ Danchin (1996)⁸³ reports that prokaryotes are space dependent, in that “they cannot usually have two regulatory signals in the same place because of physical occlusion - compositional variation,” and that eukaryotes are “both space and time variable: less constraint on length variation. . .. [which] permits juxtaposition of regulation signals by regulatory proteins and allows exploration of the properties of their association according to the rules of combinatory logic”. Thus, evolutionary analysis supplemented by coding theory and information theory affords a rich source of insights into a variety of bioinformatics problems. The failure to consider codes as real codes made of molecules in complex intracellular environments, transmitted over generations, and subject to energetic, kinetic, material, and informational constraints has limited both research and education for too long. Herein we demonstrated that four simple codes employed by electrical engineers, computer scientists, and mathematicians, namely Huffman codes, Hamming codes, Gray codes, and Baudot codes, apply and have been applied interestingly to genetic codes and their bioinformatic usage. Furthermore, open problems exist for future research because the complex combinatorial possibilities offered by each of these codes provide numerous avenues to pursue.
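The two extremes of compressibility discussed in this section can be illustrated with an off-the-shelf compressor. True Chaitin-Kolmogorov complexity is not computable, so the sketch below uses the size of a zlib-compressed string only as a rough upper-bound proxy; that substitution, the sequence length of 10,000, the fixed random seed, and the function name `compression_ratio` are all assumptions made for illustration.

```python
import random
import zlib


def compression_ratio(seq):
    """Compressed size divided by raw size (smaller means more compressible)."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


N = 10_000
monotonic = "A" * N  # the A^N extreme

rng = random.Random(0)  # fixed seed so the example is repeatable
random_seq = "".join(rng.choice("ACGT") for _ in range(N))  # the random extreme

if __name__ == "__main__":
    print(f"monotonic: {compression_ratio(monotonic):.4f}")
    print(f"random:    {compression_ratio(random_seq):.4f}")
```

A uniformly random four-letter sequence carries two bits per symbol, so no lossless scheme can push its ratio much below one quarter when each symbol is stored as one byte; real genomic sequences would be expected to fall somewhere between the two extremes.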
6. Conclusion: “Genetic Codes as Codes”

In the above analysis of “genetic codes as codes” viewed through the lens of four different codes (Huffman codes, Hamming codes, Gray codes, and Baudot codes), I have argued for a comprehensive view of genetic coding that emphasizes three important considerations: (1) coding is a physico-chemical process involving molecules and their interactions; (2) coding is dynamically evolving in multiple intra- and interspecies niches subject to numerous selective pressures and random mutation affecting both ontogeny and phylogeny; and (3) coding is informed by algebraic coding theory, information theory, and computational linguistics. In Table 5, I list a number of the features that clearly differentiate this perspective from many other treatments.
Table 5. Comparison of the properties of “genetic codes as codes” presented herein versus those of the “standard genetic code” portrayed in many articles, textbooks, and software programs.

Genetic Codes as Codes | The “Standard Genetic Code”
Code | Cipher
Analog | Digital
Dynamic | Static, Frozen
Multiple | Universal, Unique
Context-sensitive | Context-free
Wetware | Virtual
Relation | Map/Function
Subject to Natural Selection and Genetic Drift (Chance, Luck) | Inevitable, Determined, Designed, “Law”
Encoding occurs during the loading of amino acids onto transfer RNAs; the noisy channel occurs in transmission through a crowded intracellular milieu; decoding occurs during synthesis of proteins on ribosomes | The “Standard Genetic Code” is not differentiated into separate steps; encoding and decoding usually are not identified or, when they are, attention is usually given only to the decoding step of ribosomal reading of mRNA to make proteins
Lest I be accused of constructing a straw-man argument, in addition to the articles and book referred to in the introduction to this chapter, let me draw attention to several articles in the mathematical biology literature that embody “the standard genetic code” perspective listed in the right-hand column. While you might think that analogies of “the standard genetic code” to the I Ching would be limited to new age websites, numerological patterns without reference to evolutionary, biochemical, and informatic contexts abound. For example, shCherbak (2003)⁸⁵ and Rakocevic (2004)⁸⁶ build on the work of Dankwerts and Neubert (1975),⁴⁴ Bertman and Jungck (1979),⁴³ Jiménez-Montaño, de la Mora-Basáñez, and Pöschel (1996),⁴⁵ and Karasev and Sorokin (1997)⁸⁷ in drawing attention to the eight out of sixteen codons based on dinucleotides in “the standard genetic code” that are completely degenerate in their third nucleotide positions. However, their work only defines a number series divisible by 37, to which they attach special importance. Thus, it is not surprising that their work has been picked up by creationists and “intelligent design” advocates as evidence of God-given designs and of the claim that “the standard genetic code” is “the” “Circle-of-Life.” In contrast, among work that seems superficially close to theirs, the RNA code world adoption of the four-dimensional hypercube by José, Morgado, and Govezensky (2007)⁸⁸ not only posits evolution from an RNA world to a DNA world, but also considers again the distances between codons for different amino acids and stop signals, as well as a hypothesis for the possible adaptive significance of “frame-reading mistranslations.” Similarly, Michel (2007)⁸⁹ uses the three operators described above: transitions, transversions between
complementary bases, and transversions between non-complementary bases to construct nine mutations of each and every codon to predict the usage of twelve amino acid frequencies. Other numerical patterns include Yamagishi and Shimabukuro’s (2008)⁹⁰ use of Fibonacci series to analyze Chargaff’s second parity law of nucleotide composition of DNA in multiple species, to develop a model of convergent frequencies of each of the four nucleotides in DNA, and to propose explanations for the C-value paradox. They believe that a “. . . Fibonacci string process might be involved in DNA sequence growth, particularly, in those DNA repetitive sequences which are almost 50 % of the human genome.” Apoorva Patel (2001)⁹¹ used a quantum search algorithm and N-dimensional Hilbert space to draw attention to the two solutions of an equation yielding the numbers four and twenty, which he felt mapped conveniently onto the four nucleotide letters of the RNA and DNA alphabets and onto the twenty amino acid letters of the protein alphabet. Here again, while the number pattern seems to derive from an orthodox view of “the standard genetic code,” Patel still interprets it in terms of evolutionary problems of “digitization,” “packing of information,” and “selection of letters of the alphabet.” In a series of nine papers on a crystal structure of “the standard genetic code” based on quantum algebra, Frappat and colleagues (e.g., Frappat et al., 1998; 2002; 2005)⁹²⁻⁹⁴ have productively predicted three unmeasured thermodynamic properties of the amino acids histidine, aspartate, and glutamate, computed “free energy released by base pairing in double stranded RNA,” found correlations with codon usage, and shown that “the Shannon entropy for codons . . . [is] strongly dependent on the exonic GC content.” Obviously, this is a significant list of surprising achievements.
Similarly, Sergey Petoukhov (2008)⁹⁵ uses Hadamard matrices to investigate patterns in “the standard genetic code” and to employ ideas from signal processing: “General theory of signal processing utilizes the encoding of discrete signals by means of special mathematical matrices and spectral representations of signals to increase reliability and efficiency of information transfer.” In doing so he expands on his earlier work with stochastic matrices (Petoukhov (2005)⁹⁶; He, Petoukhov, and Ricci (2004)⁹⁷). He furthermore proposes that these approaches “can be a natural base to organize storage and transfer of genetic information with noise immunity properties by means of decomposition of genetic sequences on the base of these orthogonal systems and by means of using proper codes . . . ” Genomic signal processing has, in fact, become a subject of considerable bioinformatics interest (e.g., Cristea (2001)⁹⁸; Jiménez-Montaño, Feistel, and Diez-Martinez (2004)⁹⁹; Re and Pavesi (2007)¹⁰⁰; Shmulevich and Dougherty (2007)¹⁰¹). Thus,
novel approaches to “the standard genetic code” not only continue to be produced in abundance, but they may still lead to important bioinformatics tools, and I do not want to throw the baby out with the bath water. I am simply suggesting that a more robust view of genetic coding, such as that presented here and by Tlusty (2007; 2008)²²,²⁶ and Wang et al. (2003),¹⁰² one that emphasizes physico-chemical processes involving molecules and their interactions, dynamic evolutionary constraints, and codes as codes, will lead to a better theoretical, material, causal bioinformatics.

Acknowledgements

I was requested to solve the Baudot problem independently by James C. Lacey, Jr., University of Alabama, in 1975 and again by B. Dennis Sustare, Clarkson College, in 1977. I especially appreciate the advice of four mathematicians: Larry Cummings, University of Waterloo, Robert Feinberg, Iowa State University, Ranjan Roy, Beloit College, and Martha O. Bertman, Clarkson College. Three of my students, Carley Berg, Qat Allikian, and Sunan Ganguli, did wonderful projects mentioned herein. This research was partly supported by NSF grant # 0127498: BEDROCK: Bioinformatics Education Dissemination: Reaching Out, Connecting, and Knitting-together.

References

1. M. Radman (2001). Fidelity and infidelity. Nature 413: 115.
2. S. J. Freeland, R. D. Knight, L. F. Landweber, and L. D. Hurst. (2000). Early Fixation of an Optimal Genetic Code. Molecular Biology and Evolution 17: 511-518.
3. S. J. Freeland and L. D. Hurst. (1998). The Genetic Code is One in a Million. J. Molec. Evol. 47: 238-248.
4. R. W. Doerge, C. Bailey-Kellogg, L. Sherman, and C. Weil (2003) http://www.science.purdue.edu/about us/strategic plan/ COALESCEAreas/bioinformatics03may.pdf (accessed 2008)
5. P. S. Pevzner (2004). Educating Biologists for the 21st Century: Bioinformatics Scientists Versus Bioinformatics Technicians. Bioinformatics 20: 2159-2161.
6. J. R. Jungck (2005).
Challenges, Connections, Complexities: Educating for Collaboration. In Steen, L. (ed), Math & Bio2010: Linking Undergraduate Disciplines, pp. 1-12. The Mathematical Association of America: Washington, D.C.
7. L. E. Kay (2000). Who Wrote the Book of Life?: A History of the Genetic Code. Stanford University Press: Stanford, California.
8. J. R. Jungck (1978). The Genetic Code as a Periodic Table. Journal of Molecular Evolution 11: 211-224.
9. C. R. Woese, G. J. Olsen, M. Ibba, and D. Soll. (2000). Aminoacyl-tRNA Synthetases, the Genetic Code, and the Evolutionary Process. Microbiology and Molecular Biology Reviews 64 (1): 202-236.
10. E. E. May, M. A. Vouk, D. L. Bitzer, and D. I. Rosnick, (2004a). An Error-Correcting Framework for Genetic Sequence Analysis (from Workshop on Genomic Signal Processing and Statistics (GENESIPS), Raleigh, North Carolina, October 11-13, 2002). Journal of the Franklin Institute 341 (1-2): 89-109 (January-March).
11. E. E. May, M. A. Vouk, D. L. Bitzer, and D. I. Rosnick. (2004b). Coding theory based models for protein translation initiation in prokaryotic organisms. Biosystems 76 (1-3): 249-260 (August-October).
12. T. D. Schneider (1991). Theory of Molecular Machines. I. Channel Capacity of Molecular Machines. J. Theoretical Biology 148: 83-123.
13. B. Hayes (1998). The Invention of the Genetic Code. American Scientist 86: 8-14.
14. M. Delarue (2007). An asymmetric underlying rule in the assignment of codons: Possible clue to a quick early evolution of the genetic code via successive binary choices. RNA 13: 161-169.
15. E. Zuckerkandl and L. Pauling. (1965). Molecules as Documents of Evolutionary History. J. Theoret. Biol. 8: 357-366.
16. J. R. Jungck (1997a). Ten Equations that Changed Biology: Mathematics in Problem-Solving Biology Curricula. Bioscene 23 (1): 11-36 (May).
17. J. R. Jungck (1997b). Biological Aftermath: What Can We Learn from Contemporary Mathematics Reform? BioQUEST Notes 7 (2): 1-5, 8-13 (March).
18. A. Seilacher (1995). In Delta Willis, The Sand Dollar and the Slide Rule: Drawing Blueprints from Nature. Addison-Wesley Publishing Company: North Reading, Massachusetts.
19. J. R. Jungck, N. Khiripet, R. Viruchpinta, and M. Maneewattanapluk (2006). Evolutionary Bioinformatics: Making Meaning of Microbes, Molecules, Maps. Microbe 1: 365-371.
20. Principia Cybernetica Web. (accessed 2008). (http://pespmc1.vub.ac.be/ ASC/Satisficing.html).
21. T. C.
Chamberlin (1890). The method of multiple working hypotheses. Science (old series) 15: 92-96; (reprinted 1965) 148: 754-759.
22. T. Tlusty (2008). A Simple Model for the Evolution of Molecular Codes Driven by the Interplay of Accuracy, Diversity and Cost. Phys. Biol. 5: 1-7.
23. J. C. Biro (2008). Does Codon Bias Have an Evolutionary Origin? Theoretical Biology and Medical Modelling 5 (1): 16-.
24. E. P. C. Rocha (2004). Codon usage bias from tRNA’s Point of View: Redundancy, Specialization, and Efficient Decoding for Translation Optimization. Genome Res 14: 2279-2286.
25. R. D. Knight, S. J. Freeland, and L. F. Landweber. (1999). Selection, History, and Chemistry: the Three Faces of the Genetic Code. Trends in the Biochemical Sciences 24: 421-427.
26. T. Tlusty (2007). A Model for the Emergence of the Genetic Code as a Transition in a Noisy Information Channel. Journal of Theoretical Biology
249: 331-342.
27. T. Nakashima, J. R. Jungck, S. W. Fox, E. Lederer, and B. C. Das (1977). A test for randomness in peptides isolated from a thermal poly-amino acid. International Journal of Quantum Chemistry, Symposium 4: 65-72.
28. J. R. Jungck (1984a). The adaptationist programme in molecular evolution. The origins of genetic codes. In Molecular Evolution and Protobiology, K. Matsuno, K. Dose, K. Harada, and D. L. Rohlfing, eds., Plenum Press: New York, pp. 345-364.
29. J. R. Jungck and Robert M. Friedman (1984b). Mathematical Tools for Molecular Genetics Data: An Annotated Bibliography. Bulletin of Mathematical Biology 46 (4): 699-744.
30. L. Harrell, and J. F. Atkins. (2002). Predominance of six different hexanucleotide recoding signals 3’ of read-through stop codons. Nucleic Acids Res. 30 (9): 2011-2017.
31. T. Segawa and F. Imamoto. (1976). Evidence of Read-Through At Termination Signal For Transcription of Trp Operon. Virology 70 (1): 181-184.
32. B. Hayes (2004). Ode to the Code. American Scientist 92 (6): 494-499.
33. S. E. Massey (2008). A Neutral Origin for Error Minimization in the Genetic Code. Journal of Molecular Evolution. Early electronic access: http://www.springerlink.com/content/hl82264664k17513/fulltext.pdf? page=1. DOI 10.1007/s00239-008-9167-4
34. L. L. Gatlin (1972). Information Theory and Living Systems. Columbia University Press: New York, New York.
35. C. E. Shannon (1948). A Mathematical Theory of Communication. Bell System Technical Journal 27: 379-423, 623-656 (July, October).
36. A. L. Mackay (1967). Optimization of the Genetic Code. Nature 216: 159-160.
37. C. Alff-Steinberger (1969). The Genetic Code and Error Transmission. Proc. National Acad. Sci. USA 64: 584-591.
38. S. W. Golomb (1962). Efficient Coding for the Deoxyribonucleic Channel. In Mathematical Problems in the Biological Sciences, pp. 87-100. American Mathematical Society: Providence, Rhode Island.
39. F. Papentin (1973). A Darwinian Evolutionary System. II.
Experiments on Protein Evolution and Evolutionary Aspects of the Genetic Code. J. Theor. Biol. 39: 417-430.
40. D. A. Huffman (1952). A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the I.R.E. September 1952; pp. 1098-1102.
41. C. Berg and J. R. Jungck. (1998). Genetic Code Evolution: A Balance of Effort and Information Content. Beloit Biologist 17: 13-25.
42. M. O. Bertman and J. R. Jungck. (1978). Some Unresolved Mathematical Problems in Genetic Coding. Notices Am. Math. Soc. 25: A-174.
43. M. O. Bertman and J. R. Jungck. (1979). Group graph of the Genetic Code. J. Heredity 70: 379-384.
44. H. J. Dankwerts and D. Neubert. (1975). Symmetries of Genetic Code Doublets. Journal of Molecular Evolution 5: 327-332.
45. Jiménez-Montaño, Mora-Basáñez, and Pöschel (1996). The Hypercube Structure of the Genetic Code Explains Conservative and Non-Conservative Aminoacid Substitutions in vivo and in vitro. BioSystems 39: 117-125.
46. G. L. Findley, A. M. Findley, and S. P. McGlynn. (1982). Symmetry characteristics of the genetic code. Proc Natl Acad Sci U S A. 1982 November; 79(22): 7061-7065.
47. A. M. Findley, S. P. McGlynn, and G. L. Findley. (1989). The Geometry of Genetics. Wiley-Interscience: New York.
48. G. L. Findley and S. P. McGlynn. (2008). Geometry and Evolution. International J. Quantum Chemistry 20 (S8): 455-461.
49. R. Swanson (1984). A Unifying Concept for the Amino Acid Code. Bulletin of Mathematical Biology 46: No. 2, 187-203.
50. R. Swanson and S. M. Swanson. (1995). A Picture of the Genetic Code. In C. A. Pickover (ed.), Visualizing Biological Information, pp. 15, World Scientific.
51. D. Bosnacki, H. M. M. ten Eikelder, and P. A. J. Hilbers. (2003). Genetic Code as a Gray Code Revisited. Proceedings of the 2003 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS 03), Las Vegas, Nevada.
52. E. Gilbert (1958). Gray codes and paths on the n-cube. Bell System Technical J. 37: 815-826.
53. G. Rosen (1991). Rook’s tour of the genetic code. Bulletin of Mathematical Biology 53: 845-851.
54. M. O. Dayhoff ed. (1972). Atlas of Protein Sequence and Structure 1972, Volume 5. Silver Spring, Maryland: National Biomedical Research Foundation.
55. G.-C. Rota (1969). Combinatorial Analysis. In The Mathematical Sciences, ed. COSRIMS, pp. 197-208. MIT Press: Cambridge, Massachusetts.
56. S. Avital and R. T. Hansen (1978). Euler’s () Function: A Function with Many Properties and Uses. Int. J. Math. Educ. Sci. Technol. 9: 153-161.
57. T. van Aardenne-Ehrenfest and N. G. de Bruijn (1951). Circuits and Trees in Oriented Linear Graphs. Simon Stevin 28: 203-217.
58. N. G. de Bruijn (1975). Acknowledgement of Priority to C.
Flye Sainte-Marie on the Counting of Circular Arrangements of 2^n Zeros and Ones that Show Each n-Letter Word Exactly Once, Technische Hogeschool Eindhoven (Nederland) Report 75-WSK-06: ii & 1-14.
59. C. Flye Sainte-Marie (1894). Solution to Question Nr. 48. L’Intermédiaire des Mathématiciens 1: 107-110.
60. J. A. Bondy and U. S. R. Murty (1976). Graph Theory with Applications, pp. 182-188, American-Elsevier: New York.
61. J. H. van Lint (1974). Combinatorial Theory Seminar Eindhoven University of Technology, Lecture Notes in Mathematics No. 125, pp. 82-92. Springer-Verlag: Berlin.
62. A. P. Street (1974). Eulerian Washing Machines. In Combinatorial Mathematics, Proceedings of the Second Australian Conference, Lecture Notes in Mathematics, Volume 403, pp. 105-108. Springer-Verlag: Berlin.
63. J. R. Jungck (1984c). Circular Concatenated Genetic Code. Bioscene (cover).
64. J. Demongeot (1978). Sur la possibilite de Considerer le Code Genetique Comme un Code a Enchainement Degenere. Revue de Bio-Mathematique 62: 61-66.
65. J. R. Jungck (1977). Complementarity and Coding. J. College Science Teaching 7: 27-28.
66. C. Folsome (1977). The Permuted Generator Hypothesis for the Origin of the Genetic Code. Origins of Life 8: 391-392.
67. F. H. C. Crick, J. S. Griffith, and L. E. Orgel (1957). Codes without commas. Proc. Nat. Acad. Sci. 43: 416-421.
68. A. Gutfraind (2006). Error-Tolerant Coding and the Genetic Code. M. S. Thesis, University of Waterloo, Waterloo, Ontario, Canada.
69. S. Normark, S. Bergstrom, T. Edlund, T. Grundstrom, B. Jaurin, F. P. Lindberg, and O. Olsson (1983). Overlapping Genes. Annual Review of Genetics 17: 499-525.
70. F. Lillo and D. C. Krakauer (2007). A statistical analysis of the three-fold evolution of genomic compression through frame overlaps in prokaryotes. Biology Direct 2: 22.
71. R. Belshaw, O. G. Pybus, and A. Rambaut (2007). The evolution of genome compression and genomic novelty in RNA viruses. Genome Res. 17 (10): 1496-1504.
72. C. Kingsford, A. L. Delcher, and S. L. Salzberg (2007). A Unified Model Explaining the Offsets of Overlapping and Near-Overlapping Prokaryotic Genes. Mol. Biol. Evol. 24 (9): 2091-2098.
73. I. Makalowska, C. F. Lin, and W. Makalowski (2005). Overlapping genes in vertebrate genomes. Comput Biol Chem. 29 (1): 1-12.
74. Y. Luo, C. Fu, Da-Y. Zhang, and K. Lin (2006). Overlapping genes as rare genomic markers: the phylogeny of γ-Proteobacteria as a case study. Trends in Genetics 22 (11): 593-596.
75. P. J. Cock and D. E. Whitworth (2007). Evolution of gene overlaps: relative reading frame bias in prokaryotic two-component system genes. Journal of Molecular Evolution 64 (4): 457-462.
76. Y. Fukuda, Y. Nakayama, and M. Tomita (2003). On dynamics of overlapping genes in bacterial genomes. Gene 323: 181-187.
77. N. Sabath, D. Graur, and G. Landan (2008). Same-strand overlapping genes in bacteria: compositional determinants of phase bias. Biology Direct 3: 3645.
78. G. Cullmann and J-M. Labouygues (1983). Noise Immunity of the Genetic Code. BioSystems 16: 9-29.
79. G. Cullmann and J-M. Labouygues (1985). The genetic code, an instantaneous absolutely optimal code [Article in French]. C R Acad Sci III 301 (5): 157-60.
80. J. Kaptcianos (2008). A Graph Theoretical Approach to DNA Fragment Assembly. American Journal of Undergraduate Research 7 (1): 1-18.
81. L. Y. Yampolsky and A. Stoltzfus (2005). The exchangeability of amino acids in proteins. Genetics 170 (4): 1459-1472.
82. G. K. Zipf (1949). Human behavior and the principle of least effort: an introduction to human ecology. Addison-Wesley Press, Inc.: Cambridge, MA.
83. A. Danchin (1996). On Genomes and Cosmologies. In Collado-Vides, B. Magasanik, and T. F. Smith, Integrative Approaches to Molecular Biology, MIT Press: Cambridge, MA.
84. G. Hampikian and T. Andersen (2007). Absent Sequences: Nullomers And Primes. Pacific Symposium on Biocomputing 12: 355-366.
85. V. I. shCherbak (2003). Arithmetic Inside the Universal Genetic Code. Biosystems 70 (3): 187-209.
86. M. M. Rakocevic (2004). A Harmonic Structure of the Genetic Code. J. Theoretical Biol. 229: 463-465.
87. V. A. Karasev and S. G. Sorokin (1997). Topological Structure of the Genetic Code. Russian J. Genetics 33 (6): 622-628.
88. M. V. José, E. R. Morgado, and T. Govezensky (2007). An Extended RNA Code and Its Relationship to the Standard Genetic Code: An Algebraic and Geometric Approach. Bull. Math. Biol. 69 (1): 215-243.
89. C. J. Michel (2007). An Analytical Model of Gene Evolution with 9 Mutation Parameters: An Application to the Amino Acids Coded by the Common Circular Code. Bull. Math. Biol. 69 (2): 677-698.
90. M. E. B. Yamagishi and A. I. Shimabukuro (2008). Nucleotide Frequencies in the Human Genome and Fibonacci Numbers. Bull. Math. Biol. 70 (3): 643-653.
91. A. Patel (2001). Quantum Algorithms and the Genetic Code. Pramana - J. Phys. 56 (2 & 3): 367-381.
92. L. Frappat, A. Sciarrino, and P. Sorba (1998). A Crystal Base for the Genetic Code. Physics Letters A 250: 214-221.
93. L. Frappat, A. Sciarrino, and P. Sorba (2002). Prediction of Physical-Chemical Properties of Amino Acids from Genetic Code. J. Biological Physics 28: 17-26.
94. L. Frappat, A. Sciarrino, and P. Sorba (2005). Correlation matrix for Quartet Codon Usage. Physica A 351: 461-476.
95. S. V. Petoukhov (2008). The Degeneracy of the Genetic Code and Hadamard Matrices. Arxiv preprint arXiv:0802.3366, 2008 (arxiv.org).
96. S. V. Petoukhov (2005). Hadamard Matrices and Quint Matrices in Matrix Presentations of Molecular Genetic Systems. Symmetry: Culture and Science 16 (3): 247-266.
97. M. He, S. V. Petoukhov, and P. Ricci (2004). Genetic Code, Hamming Distance, and Stochastic Matrices. Bull. Math. Biol. 66: 1405-1421.
98. P. Cristea (2001). Genetic Signal Analysis. Signal Processing and Its Applications, Sixth International Symposium 2: 703-706.
99. M. A. Jiménez-Montano, R. Feistel, and O. Diez-Martinez (2004). Information hidden in signals and macromolecules. I.: Symbolic time-series analyses. Nonlinear Dynamics Psychol. Life Sci. 8 (4): 445-478.
100. M. Re and G. Pavesi (2007). Signal Processing in Comparative Genomics. In F. Masulli, S. Mitra, and G. Pasi, eds., Applications of Fuzzy Sets Theory, Lecture Notes in Computer Science Volume 4578: 544-550; Springer-Berlin/Heidelberg, Germany.
101. I. Shmulevich and E. R. Dougherty (2007). Genomic Signal Processing. Princeton University Press: Princeton, NJ.
102. X. H. Wang, R. S. H. Istepanian, Y. H. Song, and E. E. May (2003). Review of Application of Coding Theory in Genetic Sequence Analysis. Proceedings, 5th International Workshop on Enterprise Networking and Computing in Healthcare Industry, June 6-7, Santa Monica.
MATHEMATICAL BIOLOGY: SOME OPPORTUNITIES IN INTEGRATIVE BIOLOGY

R. MEJÍA
NHLBI, National Institutes of Health
10 Center Drive, Room B1D416
Bethesda, MD 20892-1061 USA
E-mail: [email protected]

Integrative Biology is the study of an organism within a framework in an integrated, systematic manner in order to discern governing principles or mechanisms. Quantitative tools applied in the study of biological organisms include, in addition to statistical analyses and hypothesis testing, mathematical modeling. Computational tools used include databases to organize both the data and models into a form that is linked and readily usable. I will describe mathematical models integrated into research in physiology as well as tools being developed by the Physiome Project with the support of the International Union of Physiological Sciences. The goal of the Physiome Project is the quantitative description of the integrated function of living organisms and, for the human physiome, the development of quantitative biology to improve medical science from genes to health. The “model validate” cycle used in Mathematical Biology is iterated to refine our understanding of the biology, as illustrated here with experiments, databases and modeling in kidney physiology.
1. Introduction

Models have been used to study biological processes at varying space and time scales from DNA to RNA, proteins, pathways, networks, cells, tissues and organs. In DNA analysis, whole-genome shotgun sequencing was initially considered unworkable, but was predicted to be feasible by statistical analysis [57]. Since then it has been used to sequence the human genome [56, 25] as well as other genomes [16]. RNA molecules with pseudo-knots have been analyzed mathematically [15], and models have been used to predict the outcome of small interfering RNA (siRNA) therapy in the treatment of cancer [1]. Protein folding has been a fertile field of study. Monte Carlo models of protein folding abound, for example, in dynamic Monte Carlo simulation of helix-coil transition [8]. Gene networks and regulatory pathways of the cell cycle in yeast and bacteria have been elucidated using
mathematical models by Tyson and coworkers [54, 38, 5] for two decades. Recently, quantitative models have described biochemical networks in signal transduction, metabolic pathways and regulatory networks as described in [6]. At the cellular level, mathematical models have contributed to the understanding of the function of many cell types such as pancreatic beta cells [4], kidney cells [30, 58], and smooth muscle cells [34, 22, 59]. Modeling at the tissue level has contributed to the understanding of disease progression such as in cancer [51, 31]. At the organ level, models have played an important role in cardiac physiology, specifically in the study of the pacemaker [55] and cardiac dynamics [23, 46]. More broadly still, the Cardiome Project has sought to describe the functioning heart [2, 3]. Currently synthetic biology is beginning to make it possible to design and study new organisms [44].

The Human Physiome Project of the International Union of Physiological Sciences (IUPS) has as its goal the quantitative description of the integrated functions of living organisms, and seeks to develop quantitative biology to improve medical science from genes to health [20, 19]. This goal has also been undertaken by the EuroPhysiome initiative [13, 52]. Integral to this effort is the development of databases to organize and disseminate data to both bench scientists and modelers. This is supported by the development and use of markup languages that facilitate the exchange of data and models across computing platforms. Markup languages in use include SBML [18] for representation of biochemical reaction networks, CellML [29] for description of mathematical models of cellular function, and MorphML [11] for description of neuroanatomical data, coding and sharing information.

An active area of investigation in renal physiology has been the study of the mechanism by which the mammalian kidney produces a concentrated urine.
Mathematical models have contributed to the understanding of processes involved in the physiology and pathophysiology of the kidney since Stephenson [48, 49] and Kokko and Rector [24] first described the kidney as a countercurrent exchanger and multiplier. The kidney physiome project, promulgated by Schafer [47] and others [20, 53] under the auspices of the IUPS, seeks to integrate tools for use in renal physiology. We describe how bench scientists and modelers are collaborating to use techniques of molecular biology and imaging to refine mathematical models of the mammalian kidney, and how the physiome project seeks to facilitate the exchange of physiological data and models.
Fig. 1. A schematic diagram of a cross section of the human kidney. The rectangular section (top of figure) delineates nephro-vascular units in the cortex and the medulla that are modeled.
2. Model of the Kidney

The mammalian kidney serves to maintain homeostasis by excreting impurities and byproducts of diet and metabolism, and by conserving water and solutes necessary for body function. A schematic diagram of a cross section of the human kidney (Figure 1) shows multiple papillae that contain nephrovascular units. Small solutes and water are filtered from arterial blood by glomeruli in the cortical region and travel through nephron segments that are permselective. The fluid that reaches the nephron’s terminal segment, the collecting duct, flows into a calyx at the bottom of the papilla and is excreted through the ureter. Blood vessels in each pyramid exchange water and solutes with the nephrons and return reabsorbate to the venous circulation.

A mathematical model of water and solute movement in a single nephro-vascular population was first described by Stephenson and coworkers [50]. A model with populations of short and long nephrons is shown schematically in Figure 2. This model shows transport of water, NaCl, and urea to and from nephron segments and the vasculature through an interstitial space.
Fig. 2. A schematic diagram of a model with two nephro-vascular units. Water and solute movement is shown with water (clear arrow), NaCl (black arrow), urea (stippled arrow). Segments are: proximal tubule (PT), descending Henle’s limb (DHL), ascending limb of Henle (AHL), distal tubule (DT), collecting duct (CD), descending vas rectum (DVR), ascending vas rectum (AVR) and post-glomerular capillary (PGC).
2.1. Actions of Atrial Natriuretic Factor

A model where the interstitium and vasa recta are merged into a single compartment, the central core, is shown in Figure 3. The model has five populations of short and long nephrons that account for 71, 13, 9, 5 and 2% of nephrons in the rat kidney, respectively. This model has been used to study three hypothetical steady-state effects of atrial natriuretic factor (ANF) on the concentrating mechanism [35]: namely, inhibition of NaCl absorption in the collecting duct, inhibition of water permeability in the collecting duct, and increased glomerular filtration rate (GFR).

Table 1 shows the composition of fluid at three locations in the nephron: the outflow from the distal cortical tubule, the flow into the cortical collecting duct, and the outflow from the collecting duct. The composition in the control, a rat kidney with arginine vasopressin stimulated collecting duct water permeability, is compared with the composition for (i) reduced $V^{\mathrm{NaCl}}_{m,\mathrm{CCD}}$, (ii) reduced $P_{f,\mathrm{IMCDt}}$, and (iii) increased delivery of fluid to the proximal straight tubule, $V_{\mathrm{PST}}(0)$, which represents increased GFR in the model. Boundary values and other model parameters are given in [35].

Table 1 shows that the three actions of ANF considered have little effect on urea as shown by the fraction of filtered load excreted. Inhibition of
NaCl absorption in the cortical collecting duct by reduction of $V^{\mathrm{NaCl}}_{m,\mathrm{CCD}}$ by 50 and 90% increased NaCl delivery to the IMCDt (not shown) and NaCl excretion. Water excretion also increased as predicted. Inhibition of water permeability in the terminal segment of the inner medullary collecting duct by reduction of $P_{f,\mathrm{IMCDt}}$ by 50 and 80% increased water excretion slightly, and NaCl excretion was reduced slightly relative to the control. The major
Table 1. Composition for Control and Hypothesized Actions of ANF

                  Water                   NaCl                    Urea              Total Osmolality,
            FFL x 100   V, nl/min   FFL x 100   Conc, mM   FFL x 100   Conc, mM    mosmol/kgH2O
Control
  DCT out     19.74       4.94        6.11         46         55.4        18           130
  CCD in       9.59       2.40        5.37         84        149.1       101           311
  CD out       0.96       0.24        0.27         42         46.9       318           822
50% $V^{\mathrm{NaCl}}_{\max,\mathrm{CCD}}$
  DCT out     19.66       4.92        6.08         46         55.4        18           130
  CCD in       9.17       2.29        5.30         87        130.8        93           309
  CD out       1.14       0.28        0.84        111         46.7       266           835
10% $V^{\mathrm{NaCl}}_{\max,\mathrm{CCD}}$
  DCT out     19.64       4.91        6.08         46         55.4        18           130
  CCD in       8.89       2.22        5.26         89        117.9        86           308
  CD out       1.32       0.33        1.41        159         46.7       229           843
50% $P_{f,\mathrm{IMCDt}}$
  DCT out     19.58       4.89        6.06         46         55.4        18           130
  CCD in       9.20       2.30        5.24         86        137.2        97           311
  CD out       1.02       0.26        0.20         29         46.9       299           745
20% $P_{f,\mathrm{IMCDt}}$
  DCT out     19.34       4.83        5.98         46         55.4        19           131
  CCD in       8.60       2.15        5.05         88        118.7        90           310
  CD out       1.12       0.28        0.11         15         47.0       272           643
1.025 $V_{\mathrm{PST}}(0)$
  DCT out     20.29       5.20        6.83         50         55.4        18           137
  CCD in      10.01       2.57        6.04         90        126.9        82           305
  CD out       1.23       0.32        0.89        108         47.1       248           800
1.05 $V_{\mathrm{PST}}(0)$
  DCT out     20.92       5.49        7.57         54         55.4        17           144
  CCD in      10.58       2.78        6.75         96        110.1        68           299
  CD out       1.57       0.41        1.64        156         47.3       196           778

FFL is fraction of filtered load. Flow rate (V) in the collecting duct is normalized by the total number of glomerulotubular units in rat kidney (38,000). Therefore, to obtain total flow in the collecting duct, values are multiplied by 38,000. DCT values are for short nephrons only. See the text for definition of other parameters.
effect was a reduction in urine osmolality due to incomplete equilibration with the medullary interstitium or central core. Increased flow to the proximal straight tubule, $V_{\mathrm{PST}}(0)$, results in increased NaCl and water delivery to the distal cortical tubule, increased NaCl and water excretion, and urinary NaCl concentration close to the plasma concentration (150 mM). A 2.5 and 5% increase in $V_{\mathrm{PST}}(0)$ increased delivery to the distal cortical tubule of superficial nephrons by over 5 and 11%, respectively. The results support the conclusion that the overall effect of an increase in circulating ANF is due to multiple actions of ANF in the kidney that result in a more effective regulatory response than a single action could produce.

2.2. Synergy of Models and Experiments

A curated database of renal parameters (RPDB) for use by modelers has been implemented by Legato et al. [27]. It includes experimental conditions for measurements with links to the literature and to regulatory and transporter proteins. Measurement of water and solute transport in segments of the rat loop of Henle has not been reported. However, a query to RPDB for “chinchilla” shows permeabilities measured for several segments of the descending and ascending thin limb [9, 10], as does a search for “hamster” [21]. Hence, permeabilities of the loop of Henle in the rat have been extrapolated from measurements made in other rodents.

These measurements have been made using tubule perfusion [7]. A segment of tubule is dissected and mounted on an apparatus such that the composition of the perfusate and of the solution bathing the tubule is known. The composition of the outflow at the distal end of the tubule is used to calculate the transport. However, perfusion does not resolve the question of changes in transport within the segment of tubule perfused, where uniform transport is generally assumed.
Immunofluorescent labeling [40, 37], on the other hand, has been used to identify transporters expressed in individual portions of a tubule. Antibodies to a protein are conjugated with a fluorophore in order to label a transporter in the cells that form a tubule.

Models have described quantitatively the mammalian urine concentration mechanism. They have shown that the permselectivity of the nephron segments produces gradients for water and solute transport in the medulla. However, none has shown how the kidney produces the solute gradient in the inner medulla that is necessary to produce a concentrated urine. Immunolabeling has also shown that nephron segments and the vasculature are located preferentially in the medulla. This suggests that the solute composition of the interstitium is not homogeneous at a given depth. Instead, tubules that extend to the same depth may be exposed to different osmotic gradients in the surrounding interstitium.

Experiments by Pannabecker et al. [40] in rat, mouse and rabbit have shown that in the inner medulla descending thin limbs of Henle (DTL) and ascending thin limbs of Henle (ATL) express different transporters in adjacent segments. They have also shown that DTL segments may be permeable to water, NaCl, or urea, while an ATL segment may be permeable to NaCl or urea. Mejia and Wade [37] have shown that in the rat inner medulla more thin limbs (TL) of Henle were labeled by antibodies to chloride channel marker ClC-K1 than by antibodies to water channel marker AQP1 (Table 2). This suggests that some DTL segments transport NaCl and not water. In addition, TL were labeled by ClC-K1 on both sides of the hairpin turns, showing that DTL shift from expressing AQP1 to expressing ClC-K1 at some distance from where they turn and begin to ascend. Since AQP1 is expressed in vasa recta (VR) as well as in DTL, von Willebrand Factor (vWF) and morphology (the diameter of VR is greater than that of DTL) were used to obtain the estimate of DTL shown in Table 2.

Table 2. Sprague-Dawley rat tubules labeled and estimates of DTL and ATL

Distance to Tip, µm      Labeled by    Labeled by    Estimated ATL¹   % DTL        Incidence ClC
                         ClC (ATL)     AQP1 (DTL)    Incidence        with AQP1    labeled DTL
50 (4)²                   74 ± 25       16 ± 1³        45 ± 13          36           29 ± 12
50 (2)² strong ClC        60 ± 2        16 ± 1³        38 ± 1           43           22 ± 1
100 (3)²                  83 ± 15       12 ± 2³        48 ± 8           26           35 ± 6
100 (3)² strong ClC       37 ± 10       12 ± 2³        24 ± 6           50           12 ± 5
200 (3)²                  99 ± 4        19 ± 8⁴        59 ± 5           32           40 ± 5
Junction IM-OM (2)²      190 ± 33       72 ± 21       131 ± 27          55           59 ± 6

¹ (AQP1 + ClC)/2
² (n) is average for n sections at this depth
³ DTL labeled by AQP1 = structures labeled by AQP1 − VR estimated from adjacent sections
⁴ DTL labeled by AQP1 = structures labeled by AQP1 − structures labeled by vWF as VR
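The derived columns of Table 2 can be checked against the labeled counts with a few lines of arithmetic. The sketch below uses only the mean values transcribed from the table. Footnote 1 gives the estimated ATL incidence; the formulas for the last two columns are assumptions made here (equal numbers of DTL and ATL at a given depth), not statements from the text, and the name `derive_columns` is purely illustrative. Under these assumptions the computed values agree with the tabulated ones to within about one unit, which is what rounding of the underlying section averages would produce.

```python
# Mean counts transcribed from Table 2: (ClC-labeled, AQP1-labeled DTL).
TABLE2_MEANS = {
    "50 um (n=4)": (74, 16),
    "50 um (n=2), strong ClC": (60, 16),
    "100 um (n=3)": (83, 12),
    "100 um (n=3), strong ClC": (37, 12),
    "200 um (n=3)": (99, 19),
    "Junction IM-OM (n=2)": (190, 72),
}


def derive_columns(clc, aqp1):
    """Return (estimated ATL, % DTL with AQP1, ClC-labeled DTL) for one row."""
    est_atl = (aqp1 + clc) / 2                # footnote 1 of Table 2
    # Assumption: as many DTL as ATL, so est_atl also estimates all DTL.
    pct_dtl_with_aqp1 = 100 * aqp1 / est_atl
    clc_labeled_dtl = est_atl - aqp1          # DTL that do not carry AQP1
    return est_atl, pct_dtl_with_aqp1, clc_labeled_dtl


for label, (clc, aqp1) in TABLE2_MEANS.items():
    est_atl, pct, clc_dtl = derive_columns(clc, aqp1)
    print(f"{label:26s} ATL={est_atl:6.1f}  %DTL(AQP1)={pct:5.1f}  ClC-DTL={clc_dtl:6.1f}")
```

For the first row, for example, (16 + 74)/2 = 45 estimated ATL, of which 16 DTL counterparts carry AQP1, leaving 29 DTL labeled by ClC.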
Layton and coworkers [26] have used data about the three dimensional structure of the outer medulla to group the nephrovascular segments into four groups, each with its own interstitium [26, Figure 1]. The three dimensional architecture of the rat inner medulla has been described by Dantzler and coworkers [39, 41, 42], and a model has been used to compute sodium and urea concentration profiles and osmolality in the inner medulla [43, Figures 2 and 3]. It remains to test these observations with a model of the whole kidney.
3. Kidney Physiome

Effort undertaken by the Physiome Project [20] has resulted in the construction of a repository for models written in CellML [29]. The EuroPhysiome Project supports access to several databases for use by investigators including the Quantitative Kidney Database (QKDB) [12]. QKDB links to several renal databases and other resources. The databases include the Collecting Duct Database of regulatory and transporter proteins (CDDB) [28], the Collecting Duct Phosphoprotein Database (CDPD) [17], and the Urinary Exosome Protein Database [45] that contains protein products identified in the urine and facilitates a BLAST [32] comparison of an amino acid sequence against the database.

4. Summary

We have described how a study of the urine concentration mechanism of the mammalian kidney has used methods in experimental physiology, molecular biology, bioinformatics and mathematical modeling. This is a multidisciplinary effort, the type that the Physiome Project seeks to stimulate, and is representative of the many opportunities available for contribution by mathematical biologists in the full breadth of research, from the genome to the organism.

5. Appendix

5.1. Model Equations

A multinephron model of the mammalian kidney described by Mejia et al. [35] has been used to study the urine concentrating mechanism. The model combines the vasculature and interstitium into a central core [49], and is described as follows:

$$\partial_t (AC) + \partial_x F = -J, \qquad \partial_t A + \partial_x F_v = -J_v, \qquad \partial_x P = -R_v F_v, \tag{1}$$

where mass flow is given by

$$F = F_v C - A (D\, \partial_x C);$$

$x$ is distance along the cortico-papillary axis; $t$ is time; $A$ is the cross-sectional area of the segment; $C$ is a vector of solute concentrations; $F_v$ is volume flow; $J$ is solute flux (defined to be positive out of the lumen); $J_v$ is volume flux out of the lumen; $P$ is hydrostatic pressure; $R_v$ is resistance to flow; and $D$ are the solute diffusion coefficients.
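Since the results discussed in Section 2.1 are steady-state solutions, it may help to see what equations (1) reduce to in that case. The short derivation below is a sketch under two assumptions introduced here for illustration only: time derivatives vanish, and axial diffusion is neglected ($D = 0$), so that $F = F_v C$. It is not taken from [35].

```latex
% Steady state of equations (1), with axial diffusion neglected (D = 0):
\begin{align*}
  \partial_x F_v &= -J_v, \\
  \partial_x (F_v C) &= -J .
\end{align*}
% Expanding the product and substituting the first relation into the second:
\begin{align*}
  F_v\,\partial_x C + C\,\partial_x F_v &= -J
  \quad\Longrightarrow\quad
  F_v\,\partial_x C = C J_v - J .
\end{align*}
```

Read component by component, $F_v\,\partial_x C_k = C_k J_v - J_k$: the luminal concentration of species $k$ rises along the tube when water leaves the lumen faster, in proportion, than the solute does.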
Fig. 3. A schematic diagram of the central core model with five nephron populations. The nephron segments are: proximal straight tubule (PST), short descending thin limb of Henle (DTL$_{I}$), long descending thin limb (DTL$_{II}$, DTL$_{III}$), distal cortical tubule (DCT), thick ascending limb (TAL), ascending thin limb (ATL), initial collecting tubule (ICT), cortical collecting duct (CCD), outer medullary collecting duct (OMCD) and inner medullary collecting duct (IMCDi, IMCDt). In the rat 71% are short nephrons, while 2% extend to the papilla.
Transmural water flux is given by

$$J_v = -2\pi\rho P_f V_w \sum_k \sigma_k \Delta C_k,$$

where $\rho$ is the radius of the tubule; $P_f$ is the water permeability; $V_w$ is the partial molar volume of water; $\sigma_k$ is the reflection coefficient of the $k$th species, and $\Delta C_k = C_k - C_{Ck}$, where $C_k$ and $C_{Ck}$ are the concentrations of the $k$th species in the lumen and central core, respectively. Transmural solute flux is given by

$$J_k = 2\pi\rho P_k \Delta C_k + (1 - \sigma_k)\,\bar{C}_k J_v + J_k^a,$$

where $P_k$ is the solute permeability of the $k$th species, and $\bar{C}_k = (C_k + C_{Ck})/2$. Active transport of the $k$th species is given by

$$J_k^a = \frac{V_{mk} C_k}{K_{mk} + C_k},$$
where $V_{mk}$ is the maximum rate of active transport, and $K_{mk}$ is the Michaelis constant. Water and mass conservation require that

$$J_{Cv}(x) = -\sum_i J_{iv}(x), \qquad J_{Ck}(x) = -\sum_i J_{ik}(x),$$

where subscript $C$ represents the medullary central core, and summation is over all tube segments $i$ that extend to medullary depth $x$. In the cortex, the interstitium is considered to be a well-mixed compartment with the concentration of each solute equal to that of plasma (superscript $p$), so that $C = C^p$, and the hydrostatic pressure is prescribed as $P_c = P_{co}$. The central core is treated as a tube open at the border of the cortical labyrinth and the medullary rays and closed at the papilla. Thus equations (1) hold, and boundary conditions for $t \ge 0$ are

$$A_C(L)\,\partial_t C_C(L,t) = C_C(L,t)\,J_{Cv}(L,t) - J_C(L,t), \qquad F_{Cv}(L,t) = F_C(L,t) = 0, \qquad P_C(0,t) = P_{co},$$

where $L$ is the depth of the medulla. Boundary conditions for each nephron population are given by

$$C_1(0,t) = C_1^0, \qquad F_{1v}(0,t) = F_{1v}^0, \qquad P_\ell(L,t) = P_b,$$

where subscripts 1 and $\ell$ refer to the first and last tube segment of each nephron population, respectively, and $P_b$ is the bladder pressure. Intermediate boundary data are obtained by matching the value entering a tube segment to that leaving the previous segment (Figure 3). Initial conditions at each axial position $x$ and time $t = 0$ in the lumen and central core are given by

$$C(x,0) = C^0, \qquad F_v(x,0) = F_v^0, \qquad P(x,0) = P^0.$$
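The transmural flux expressions of this section translate directly into code. The following is a minimal sketch, not the implementation used in [35]: the function names, argument order and the numerical values in the usage lines are hypothetical and chosen only to show the sign conventions (fluxes positive out of the lumen).

```python
import math


def water_flux(radius, p_f, v_w, sigma, c_lumen, c_core):
    """Transmural volume flux J_v, positive out of the lumen."""
    osmotic_drive = sum(s * (cl - cc) for s, cl, cc in zip(sigma, c_lumen, c_core))
    return -2.0 * math.pi * radius * p_f * v_w * osmotic_drive


def active_flux(v_max, k_m, c_lumen_k):
    """Michaelis-Menten active transport J_k^a of a single species."""
    return v_max * c_lumen_k / (k_m + c_lumen_k)


def solute_flux(radius, p_k, sigma_k, c_lumen_k, c_core_k, j_v, v_max=0.0, k_m=1.0):
    """Transmural solute flux J_k: diffusion + solvent drag + active transport."""
    delta_c = c_lumen_k - c_core_k
    mean_c = 0.5 * (c_lumen_k + c_core_k)     # the averaged concentration
    passive = 2.0 * math.pi * radius * p_k * delta_c
    drag = (1.0 - sigma_k) * mean_c * j_v
    return passive + drag + active_flux(v_max, k_m, c_lumen_k)


def core_flux(tube_fluxes):
    """Conservation: the central-core flux balances the sum over tubes."""
    return -sum(tube_fluxes)


# Hypothetical values in arbitrary but consistent units (NaCl, urea).
sigma = [1.0, 1.0]
lumen = [150.0, 5.0]
core = [300.0, 20.0]
j_v = water_flux(1.0e-3, 0.1, 0.018, sigma, lumen, core)
j_nacl = solute_flux(1.0e-3, 0.0, sigma[0], lumen[0], core[0], j_v, v_max=2.0, k_m=50.0)
print(j_v, j_nacl, core_flux([j_v]))
```

With a more concentrated core the osmotic term is negative, so `j_v` comes out positive: water leaves the lumen, as the sign convention requires.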
5.2. Solution Method

Whole kidney multinephron models are non-linear multi-point boundary value problems. A partitioning scheme described in [36] has been used to reduce the storage and computation time, and a second-order implicit numerical scheme is used to discretize the differential equations [36]. Multiple steady-state solutions may exist, so we have used a parameter continuation scheme described in [33] to solve the discretized equations. An implementation of the continuation algorithm is available at ftp://ftp.ncifcrf.gov/pub/users/mejia/ray/conkub.tar.Z . Accounting for multiplicity of solutions is required when computing the transition from one steady state to another. This is illustrated for transition from diuresis to antidiuresis in [33, Figure 7].
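To illustrate why multiplicity of steady states matters, the sketch below follows two solution branches of a toy scalar equation by natural-parameter continuation, warm-starting Newton's method from the previous solution at each parameter value. This is a far simpler stand-in than the scheme of [33]: the residual $u^3 - 3u - \lambda$, the function names and the step size are all hypothetical choices made here, and the sweep deliberately stays away from the fold points where this naive approach would break down.

```python
import math


def residual(u, lam):
    """Toy steady-state equation f(u, lam) = u**3 - 3*u - lam (hypothetical)."""
    return u**3 - 3.0 * u - lam


def d_residual_du(u):
    """Derivative of the toy residual with respect to the state u."""
    return 3.0 * u**2 - 3.0


def newton(u0, lam, tol=1e-12, max_iter=50):
    """Solve residual(u, lam) = 0 by Newton's method starting from u0."""
    u = u0
    for _ in range(max_iter):
        step = residual(u, lam) / d_residual_du(u)
        u -= step
        if abs(step) < tol:
            return u
    raise RuntimeError("Newton iteration did not converge")


def continue_branch(u_start, lam_values):
    """Follow one branch, warm-starting each solve from the previous one."""
    branch, u = [], u_start
    for lam in lam_values:
        u = newton(u, lam)
        branch.append((lam, u))
    return branch


n = 30
lams_up = [-1.5 + 3.0 * i / n for i in range(n + 1)]   # -1.5 ... 1.5
lower = continue_branch(-2.0, lams_up)                  # sweep lam upward
upper = continue_branch(2.0, lams_up[::-1])             # sweep lam downward

# At the same parameter value the two sweeps sit on different steady states.
print(lower[n // 2], upper[n // 2])
```

At $\lambda = 0$ the upward sweep ends on $u = -\sqrt{3}$ and the downward sweep on $u = +\sqrt{3}$, so the state reached depends on the path taken through parameter space, which is the situation the text describes for the transition between diuresis and antidiuresis.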
References

1. J. C. Arceiro, T. L. Jackson, and D. E. Kirschner, DCDS 4, 39 (2004).
2. J. B. Bassingthwaighte, Adv. Exp. Med. Biol. 382, 331 (1995).
3. J. B. Bassingthwaighte, Adv. Exp. Med. Biol. 430, 325 (1997).
4. R. Bertram, A. Sherman, and L. S. Satin, Am. J. Physiol. Endocrinol. Metab. 293, E890 (2007).
5. P. Brazhnik and J. J. Tyson, Cell Cycle 5, 522 (2006).
6. R. Breitling, D. Gilbert, M. Heiner, and R. Orton, Brief. Bioinform. 9, 404 (2008).
7. M. Burg, J. Grantham, M. Abramow, and J. Orloff, Am. J. Physiol. 210, 1293 (1966).
8. Y. Chen, Y. Zhou, and J. Ding, Proteins 69, 58 (2007).
9. C. L. Chou and M. A. Knepper, Am. J. Physiol. 263, F417 (1992).
10. C. L. Chou and M. A. Knepper, Am. J. Physiol. 264, F337 (1993).
11. S. Crook, P. Gleeson, F. Howell, J. Svitak, and R. A. Silver, Neuroinformatics 5, Summer 96 (2007).
12. V. Dzodic, S. Hervy, D. Fritsch, H. Khalfallah, M. Thereau, and S. R. Thomas, Cell Mol. Biol. (Noisy-le-grand) 50, 795 (2004). See http://physiome.ibisc.fr/qkdb/.
13. J. W. Fenner, et al., Philos. Transact. A Math. Phys. Eng. Sci. 366, 2979 (2008).
14. P. A. Gonzales, et al., J. Am. Soc. Nephrol. (in press).
15. C. Haslinger and P. F. Stadler, Bull. Math. Biol. 61, 437 (1999).
16. H. Herlyn and H. Zischler, Genome Dyn. 2, 17 (2006).
17. J. D. Hoffert, G. Wang, T. Pisitkun, R. F. Shen, and M. A. Knepper, J. Proteome Res. 6, 3501 (2007). See http://dir.nhlbi.nih.gov/papers/lkem/cdpd/.
18. M. Hucka et al., Bioinformatics 19, 524 (2003).
19. P. J. Hunter, E. J. Crampin, and P. M. Nielsen, Brief. Bioinform. 9, 333 (2008).
20. P. Hunter, P. Robbins, and D. Noble, Pflugers Arch. 445, 1 (2002). See http://www.physiome.org.nz/index html.
21. M. Imai, J. Taniguchi, and K. Yoshitomi, Am. J. Physiol. 254, F323 (1988).
22. A. Kapela, A. Bezerianos, and N. M. Tsoukias, J. Theor. Biol. 253, 238 (2008).
23. J. P. Keener, J. Cardiovasc. Electrophysiol. 14, 1225 (2003).
24. J. P. Kokko and F. C. Rector Jr, Kidney Int. 2, 214 (1972).
25. E. S. Lander, et al., Nature 409, 860 (2001).
26. A. T. Layton and H. E. Layton, Am. J. Physiol. Renal Physiol. 289, F1346 (2005).
27. J. Legato, M. A. Knepper, and R. Mejia, http://cddb.nhlbi.nih.gov/rpdb/, unpublished.
28. J. Legato, M. A. Knepper, R. A. Star, and R. Mejia, Physiol. Genomics 13, 179 (2003). See http://cddb.nhlbi.nih.gov/cddb/.
29. C. M. Lloyd, M. D. Halstead, and P. F. Nielsen, Prog. Biophys. Mol. Biol. 85, 433 (2004). See http://www.physiome.org.nz/index html.
30. R. M. Lynch, R. Mejia, and R. S. Balaban, Comments Mol. Cell Biophys. 5, 151 (1988).
31. B. P. Marchant, J. Norbury, and H. M. Byrne, Math. Med. Biol. 23, 173 (2006).
32. S. McGinnis and T. L. Madden, Nucleic Acids Res. 32, W20 (2004). See http://blast.ncbi.nlm.nih.gov/Blast.cga.
33. R. Mejia, J. Comput. Phys. 63, 67 (1986).
34. R. Mejia and R. M. Lynch, BIOMAT 2006 International Symposium on Mathematical and Computational Biology, Mondaini, R. P. and Dilão, R. (Eds) (2007), ISBN 978-981-270-768-0.
35. R. Mejia, J. M. Sands, J. L. Stephenson, and M. A. Knepper, Am. J. Physiol. 257, F1146 (1989).
36. R. Mejia and J. L. Stephenson, J. Comput. Phys. 32, 235 (1979).
37. R. Mejia and J. B. Wade, Am. J. Physiol. Renal Physiol. 288, F553 (2002).
38. B. Novák and J. J. Tyson, Biochem. Soc. Trans. 31, 1526 (2003).
39. T. L. Pannabecker, D. E. Abbott, and W. H. Dantzler, Am. J. Physiol. Renal Physiol. 286, F38 (2004).
40. T. L. Pannabecker, A. Dahlmann, O. H. Brokl, and W. H. Dantzler, Am. J. Physiol. Renal Physiol. 278, F202 (2000).
41. T. L. Pannabecker and W. H. Dantzler, Am. J. Physiol. Renal Physiol. 287, F767 (2004).
42. T. L. Pannabecker and W. H. Dantzler, Am. J. Physiol. Renal Physiol. 293, F696 (2007).
43. T. L. Pannabecker, W. H. Dantzler, H. E. Layton, and A. T. Layton, Am. J. Physiol. 295, F1271 (2008).
44. J. Peccoud, et al., PLoS ONE 3, e2671 (2008).
45. T. Pisitkun, R. F. Shen, and M. A. Knepper, Proc. Natl. Acad. Sci. USA 101, 1336 (2004). See http://dir.nhlbi.nih.gov/papers/lkem/exosome/.
46. J. J. Rice, M. S. Jafri, and R. L. Winslow, Am. J. Physiol. Heart Circ. Physiol. 278, H913 (2000).
47. J. A. Schafer, Ann. Biomed. Eng. 28, 1002 (2000).
48. J. L. Stephenson, Nature 206, 1215 (1965).
49. J. L. Stephenson, Kidney Int. 2, 85 (1972).
50. J. L. Stephenson, R. Mejia, and R. P. Tewarson, Proc. Natl. Acad. Sci. USA 73, 252 (1976).
51. K. R. Swanson, C. Bridge, J. D. Murray, and E. C. Alvord Jr, J. Neurol. Sci. 216, 1 (2003).
52. S. R. Thomas, Wiley Interdisciplinary Reviews Systems Biology, in press.
53. S. R. Thomas, et al., Philos. Transact. A Math. Phys. Eng. Sci. 366, 3175 (2008).
54. J. Tyson, Proc. Natl. Acad. Sci. USA 88, 7328 (1991).
55. A. Varghese and R. L. Winslow, J. Theor. Biol. 168, 407 (1994).
56. J. C. Venter, et al., Science 291, 1394 (2001).
57. J. L. Weber and E. W. Myers, Genome Res. 7, 401 (1997).
58. A. M. Weinstein, Am. J. Physiol. Renal Physiol. 280, F1072 (2001).
59. P. F. Zhuk, S. A. Karakhim, V. F. Gorchev, and S. A. Kosterin, Ukr. Biokhim. Zh. 80, 123 (2008).
May 22, 2009
14:48
WSPC - Proceedings Trim Size: 9in x 6in
Alexandre.Castro.novo2
351
AN IN SILICO APPROACH FOR THE ANTIGENIC MUTATION AND IMMUNE MEMORY∗

A. de CASTRO
Centro Nacional de Pesquisa Tecnológica em Informática para a Agricultura, Empresa Brasileira de Pesquisa Agropecuária - EMBRAPA, Campinas 13083-886, Brazil
Departamento de Informática em Saúde, Universidade Federal de São Paulo - UNIFESP, São Paulo 04023-062, Brazil
E-mail: [email protected]

C. F. FRONZA
Departamento de Informática em Saúde, Universidade Federal de São Paulo - UNIFESP, São Paulo 04023-062, Brazil

D. ALVES
Departamento de Medicina Social, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo - USP, Ribeirão Preto 14049-900, Brazil
Departamento de Informática em Saúde, Universidade Federal de São Paulo - UNIFESP, São Paulo 04023-062, Brazil

In this article we studied in machina an approach to simulate the process of antigenic mutation. Our results suggest that the durability of the immune memory is affected by the process of antigenic mutation and by the populations of soluble antibodies in the blood. The results also suggest that a decrease in the production of antibodies favors the global maintenance of immune memory.
1. Introduction

A better understanding of the mammalian defense system, particularly that of human beings, is of fundamental importance for the cure of innumerable diseases and for the improvement of the quality of human life. As microorganisms come in many different forms, a wide variety of immune responses is needed to control each kind of infection. The immune responses are mediated mainly by the B and T lymphocytes, which are responsible for the specific recognition of antigens (molecules foreign to the organism that can be recognized by the immune system), and by the soluble molecules that B lymphocytes secrete, the antibodies.1

∗ Work partially supported by the Brazilian Agricultural Research Corporation (Embrapa).

In general, the immune system must present virgin, immune and tolerant states and may present limits of memorization. In the virgin state the populations are all at a very low level, that is, with values of the order of the quantities randomly produced by the bone marrow. In the immune state the population of B cells that specifically recognizes a kind of antigen remains at a certain level, even after the suppression of this antigen. In the tolerant state the populations of antibodies and of B cells do not respond to the presence of antigens or self-antigens (proteins from the individual's own organism). However, there can be a positive selection of naturally self-reactive B cells, indicating the existence of a subgroup of B lymphocytes subject to self-reactivity.1

Among the millions of kinds of B lymphocytes in the organism, each one with its specific antibody held on the membrane, only those which recognize a specific antigen are stimulated. When this occurs, the B lymphocyte multiplies, originating a lineage of cells (clones) able to produce specific antibodies against the antigen that induced its multiplication. The antibodies produced by a mature B lymphocyte, known as a plasmacyte, are released in great amounts into the blood. The multiplication continues as long as there are antigens able to activate the cells. As a given kind of antigen is eliminated from the body, the number of lymphocytes specialized in fighting it also diminishes.
However, a small population of these lymphocytes remains in the organism for the rest of the individual's life, constituting what is called immune memory. During the evolution of the immune system, an organism encounters a given antigen repeatedly. The efficiency of the adaptive response to secondary encounters can be considerably increased by storing populations of cells that produce antibodies with high affinity to that antigen, called memory cells. Instead of "starting all over" each time a given antigenic stimulus is presented, this strategy guarantees that the speed and efficiency of the immune response are enhanced after each infection.1,2

To better understand this process, two fundamental theories of immune memory were developed. The first one considers that, after the expansion of the B cells, the formation of plasma cells and memory cells occurs. According to F. M. Burnet,2 these memory cells would be remaining cells from an immunologic response that, supposedly, survive until the end of the individual's life, and therefore live longer than the other cells of the organism. The second theory, due to N. K. Jerne,3 considers that the immune system presents memory and capacity of response to a second invasion by the same antigen through a self-organization of the system, which allows the formation of cellular populations that last for a long time. In other words, this author theorizes that it is the populations that survive, and not a specific kind of cell with a lifespan longer than that of the other cells of the organism.

2. Methodology

In this paper a computational model is presented,4,5 developed to simulate the behavior of the immune system, considering structural mechanisms of regulation that were not included in the simplified model proposed by Lagreca et al.6 In our approach we consider not only the antibodies bound to the surface of the B cells (surface receptors), but also the populations of antibodies soluble in the blood (antibodies secreted by mature B cells), bringing the model closer to the actual behavior of the immune system. Besides the differentiation of the B cells, the model presented here allows the generation, maintenance and regulation of the immune memory to be represented in a more complete way, through a memory network that combines the characteristics of Burnet's clonal selection theory3 and Jerne's network hypothesis, considering only idiotypic-antiidiotypic interactions. In the model discussed here, the molecular receptors of the B cells are represented by bit-strings with a diversity of $2^B$, where $B$ is the number of bits in the string.4,5 The individual components of the immune system represented in the model are the B cells, the antibodies and the antigens. The B cells (clones) are characterized by their surface receptor and modeled by a binary string.
The epitopes, portions of an antigen that can be bound by the B cell receptors (BCR), are also represented by bit-strings. The antibodies have receptors (paratopes)7 that are represented by the same bit-string that models the BCR of the B cell which produced them. Each string (shape) is associated with an integer $\sigma$ ($0 \le \sigma \le M = 2^B - 1$) which represents each one of the clones, antigens or antibodies. The neighbors of a given $\sigma$ are expressed by the Boolean function $\sigma_i = 2^i \ \mathrm{xor}\ \sigma$. The complementary form of $\sigma$ is obtained by $\bar{\sigma} = M - \sigma$, and the time evolution of the concentrations of the several populations is obtained as a function of the integer variables $\sigma$ and $t$, through direct iteration. The equations that describe the behavior of the clonal populations are calculated through an iterative process, for different
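parameters and initial conditions:

$$y(\sigma, t+1) = \bigl(1 - y(\sigma, t)\bigr) \times \left\{ m + (1 - d)\, y(\sigma, t) + b\, \frac{y(\sigma, t)}{y_{tot}(t)}\, \zeta_{a_h}(\bar{\sigma}, t) \right\} \tag{1}$$

with the complementary shapes included in the term $\zeta_{a_h}(\bar{\sigma}, t)$,

$$\zeta_{a_h}(\bar{\sigma}, t) = (1 - a_h)\bigl[ y(\bar{\sigma}, t) + y_F(\bar{\sigma}, t) + y_A(\bar{\sigma}, t) \bigr] + a_h \sum_{i=1}^{B} \bigl[ y(\bar{\sigma}_i, t) + y_F(\bar{\sigma}_i, t) + y_A(\bar{\sigma}_i, t) \bigr].$$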
In these equations, $y_A(\bar{\sigma}, t)$ and $y_F(\bar{\sigma}, t)$ are, respectively, the populations of antibodies and antigens; $b$ is the proliferation rate of the B cells; $\bar{\sigma}$ and $\bar{\sigma}_i$ are the complementary shapes of $\sigma$ and of its $B$ nearest neighbors in the hypercube (with the $i$th bit flipped). The first term ($m$), inside the curly brackets in equation (1), represents the production of cells by the bone marrow and is a stochastic variable. This term is small, but non-zero. The second term inside the curly brackets describes the populations that have survived natural death ($d$), and the third term represents the clonal proliferation due to interaction with complementary shapes (other clones, antigens or antibodies). The parameter $a_h$ is the relative connectivity between a given bit-string and the neighborhood of its mirror image or complementary shape. When $a_h = 0.0$, only perfect complementary shapes are allowed. When $a_h = 0.5$, a string can equally recognize its mirror image and its first neighbors. The factor $y_{tot}(t)$ is expressed by:

$$y_{tot}(t) = \sum_{\sigma} \bigl[ y(\sigma, t) + y_F(\sigma, t) + y_A(\sigma, t) \bigr]. \tag{2}$$
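The shape-space operations defined above can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the authors' code: the function names `complement`, `neighbors` and `zeta` are hypothetical, bits are indexed from 0 to B − 1 (the text writes the sum from i = 1 to B), and the concentrations are assumed to be stored in plain lists indexed by the integer σ.

```python
def complement(sigma, n_bits):
    """Complementary shape: sigma_bar = M - sigma, with M = 2**n_bits - 1."""
    return (2 ** n_bits - 1) - sigma


def neighbors(sigma, n_bits):
    """Shapes that differ from sigma in exactly one bit: sigma_i = 2**i xor sigma."""
    # Assumption: bits are indexed 0 .. n_bits - 1.
    return [(1 << i) ^ sigma for i in range(n_bits)]


def zeta(sigma, n_bits, a_h, y, y_f, y_a):
    """Stimulus zeta_{a_h}(sigma_bar, t) acting on the shape sigma.

    y, y_f and y_a are sequences of length 2**n_bits holding the clone,
    antigen and antibody concentrations at the current time step.
    """
    s_bar = complement(sigma, n_bits)

    def total(s):
        return y[s] + y_f[s] + y_a[s]

    exact = total(s_bar)                                    # perfect complement
    near = sum(total(s) for s in neighbors(s_bar, n_bits))  # its first neighbors
    return (1.0 - a_h) * exact + a_h * near
```

With $a_h = 0$ the function reduces to the contribution of the perfect complement alone, which matches the limiting case described in the text.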
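The time evolution of the antigens is determined by:

$$y_F(\sigma, t+1) = y_F(\sigma, t) - k\, \frac{y_F(\sigma, t)}{y_{tot}(t)} \times \left\{ (1 - a_h)\bigl[ y(\bar{\sigma}, t) + y_A(\bar{\sigma}, t) \bigr] + a_h \sum_{i=1}^{B} \bigl[ y(\bar{\sigma}_i, t) + y_A(\bar{\sigma}_i, t) \bigr] \right\}, \tag{3}$$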
where $k$ is the rate at which the populations of antigens or antibodies decrease to zero, that is, the antigen removal rate due to interactions with the populations of B cells and antibodies. The populations of antibodies are described by a set of $2^B$ variables defined on a $B$-dimensional hypercube, interacting with the antigenic populations:

$$y_A(\sigma, t+1) = y_A(\sigma, t) + b_A\, \frac{y(\sigma, t)}{y_{tot}(t)} \times \left[ (1 - a_h)\, y_F(\bar{\sigma}, t) + a_h \sum_{i=1}^{B} y_F(\bar{\sigma}_i, t) \right] - k\, \frac{y_A(\sigma, t)}{y_{tot}(t)}\, \zeta_{a_h}(\bar{\sigma}, t), \tag{4}$$
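One iteration of the coupled maps (1) to (4) might be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function name `step` and its argument names are hypothetical, the bone marrow term `m` is taken as a constant although the text describes it as a stochastic variable, bits are indexed from 0 to B − 1, and no clipping of the populations is applied.

```python
def step(y, y_f, y_a, n_bits, m, d, b, b_a, k, a_h):
    """Advance clones (y), antigens (y_f) and antibodies (y_a) by one time step.

    Each population is a list of length 2**n_bits indexed by the shape sigma.
    Assumption: m is a constant here; the text treats it as stochastic.
    """
    size = 2 ** n_bits
    mask = size - 1
    y_tot = sum(y) + sum(y_f) + sum(y_a)  # Eq. (2)

    def stimulus(pops, sigma):
        """(1 - a_h) * [pops at sigma_bar] + a_h * [pops at its first neighbors]."""
        s_bar = mask - sigma
        exact = sum(p[s_bar] for p in pops)
        near = sum(p[s_bar ^ (1 << i)] for p in pops for i in range(n_bits))
        return (1.0 - a_h) * exact + a_h * near

    new_y, new_f, new_a = [], [], []
    for s in range(size):
        zeta = stimulus((y, y_f, y_a), s)
        # Eq. (1): Verhulst factor times (bone marrow + survival + proliferation)
        new_y.append((1.0 - y[s]) * (m + (1.0 - d) * y[s] + b * y[s] / y_tot * zeta))
        # Eq. (3): antigens removed by complementary clones and antibodies
        new_f.append(y_f[s] - k * y_f[s] / y_tot * stimulus((y, y_a), s))
        # Eq. (4): antibodies produced in response to complementary antigens,
        # and removed through interaction with complementary shapes
        new_a.append(y_a[s] + b_a * y[s] / y_tot * stimulus((y_f,), s)
                     - k * y_a[s] / y_tot * zeta)
    return new_y, new_f, new_a
```

Calling `step` repeatedly, and adding a dose to one entry of `y_f` whenever an antigen is inoculated, would give the direct iteration described in the text; all three populations are updated from the values at time t, so the order of the shapes in the loop does not matter.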
where the contribution of the complementary shapes $\zeta_{a_h}(\bar{\sigma}, t)$ is again included in the last term, $b_A$ is the antibody proliferation rate, and $k$ is the antibody removal rate, which measures the interactions of the antibodies with the other populations. The populations of antibodies $y_A(\sigma, t)$ (which represent the total number of antibodies) depend on the inoculated dosage of antigens. The factors $y_F(\sigma, t)/y_{tot}(t)$ and $y_A(\sigma, t)/y_{tot}(t)$ are responsible for the control and decrease of the populations of antigens and antibodies, while the factor $y(\sigma, t)/y_{tot}(t)$ is the corresponding factor for the accumulation of the clone populations in the formation of the immune memory. The clonal population $y(\sigma, t)$ (normalized total number of clones) can vary from the value produced by the bone marrow ($m$) up to its maximum value (in our model, unity), since the Verhulst factor limits its growth. The Verhulst factor produces a local control of the populations of clones (B cells), accounting for the several regulation mechanisms. However, the populations of B cells are strongly affected by the populations of antibodies soluble in the blood.1 This is the reason that leads us to include the term $\zeta_{a_h}(\bar{\sigma}, t)$ as an extra contribution in the set of coupled maps previously proposed by Lagreca et al.6 In order to properly study the time evolution of the components of the immune system, we define a clone as being only a set of B cells. The population of antibodies is therefore treated separately in the present model. Equations (1) to (4) form a set of coupled maps that describes the main interactions of the immune system among entities that interact through key-lock connections, that is, entities that recognize each other specifically. This set of equations is solved iteratively, considering different initial conditions.

3. Results

The simulations performed in this paper show the generation, maintenance and regulation mechanisms of the immune memory, and the cellular differentiation, through idiotypic-antiidiotypic interactions that combine the characteristics of the clonal selection theory and the immune network
theory. To show the extent of the validity of the model, the results of some simulations are presented. In the immunization experiments, several antigens with fixed concentrations are injected into the body at intervals of 1000 time steps in order to stimulate the immune response. When a new antigen is introduced, its interaction with all the other components in the system is obtained through a random number generator. The length $B$ of the bit-string was fixed at 12, corresponding to a potential repertoire of 4096 distinct cells and receptors. Injections of different antigens were given at time intervals corresponding to one period of life or to the entire life of the individual. In the simulations, the value $d = 0.99$ was used for the rate of natural death of the cells (apoptosis), and the proliferation rates of clones and antibodies were taken as 2 and 100, respectively. For the connectivity parameter $a_h$ the value 0.01 was chosen, and the antibody and antigen removal rate ($k$) was fixed at 0.1, so that in each interval of 1000 time steps the populations of antigens and antibodies disappear before the next antigen is inoculated. In each inoculation the same seed for the random number generator was used, so the different antigens are inoculated in the same order in all the simulations. Many simulations were performed, with antigen doses varying between 0.0001 and 1.5. Next, the results of some simulations are shown, with special emphasis on two intermediate values of the dose, 0.08 and 0.10, within the range covered by the simulations. Although very close, these values give distinct results for the immune memory. Small changes in the initial conditions of the system significantly affect its evolution, showing that modeling the immune system through non-linear coupled maps gives a good reproduction of a complex biological system such as the immune memory.
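The inoculation protocol described above could be set up along the following lines. The sketch is an assumption-laden illustration: the function name `inoculation_schedule` is hypothetical, and drawing each antigen shape uniformly from the 4096 possible bit-strings is our reading of "obtained through a random number generator", not a detail confirmed by the text.

```python
import random


def inoculation_schedule(n_injections, n_bits=12, interval=1000, seed=0):
    """Return a list of (time_step, antigen_shape) pairs, one per injection.

    Assumption: antigen shapes are drawn uniformly from 0 .. 2**n_bits - 1.
    A fixed seed reproduces the same sequence of antigens in every run.
    """
    rng = random.Random(seed)
    return [(j * interval, rng.randrange(2 ** n_bits)) for j in range(n_injections)]
```

Using a private `random.Random(seed)` instance, rather than the module-level generator, keeps the antigen order reproducible even if other parts of a simulation also draw random numbers.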
The results for doses at the extreme limits, with peculiar behaviors, will be treated in the continuation of this work. Figure 1 shows the time evolution of the first clonal population that recognizes the first inoculated antigen: (a) with the addition of the antibody population, and (b) without considering antibodies in the set of coupled maps. The addition of antibodies to the system does not cause a considerable local disturbance; however, Figures 1 to 3 show that the addition of antibodies alters the global capacity of the immune memory for different antigenic concentrations. The results obtained without the term referring to the antibodies, Figure 1(b), correspond to the simplified model of Lagreca et al.,6 in which the antibodies soluble in the blood are not considered. Figures 2 and 3 represent the memory capacity of the system with and without the presence of antibodies. For both concentrations of antigens, 0.08 and
0.10, when the populations of antibodies are considered (Figures 2(a) and 3(a)), the capacity of the immune memory network is smaller than in the absence of populations of antibodies (Figures 2(b) and 3(b)).
Fig. 1. Time evolution of the first clonal population that recognizes the first inoculated antigen with (a) addition of antibody population and (b) without antibody.
In Figure 4, with a high antigenic dosage, it is possible to see clearly that the larger the antibody proliferation rate, the smaller the network capacity. In the absence of a specific model for the antibodies, the populations reach higher levels. Taking into account that the populations of antibodies soluble in the blood help in the regulation of B cell differentiation, we can infer from the results the important role of the antibodies not only in the mechanism of regulation of the proliferation of the B cells, but also in the maintenance of the immune memory. The decrease of active populations can be explained through the interaction between antibodies and B cells, which is in accordance with the immune network theory. The results suggest that, despite helping to fight infections, a high production of antibodies can destroy the memory
Fig. 2. Memory capacity for a concentration of antigens equal to 0.08: (a) with the addition of the antibody population, and (b) without antibodies.
Fig. 3. Memory capacity for an antigenic concentration equal to 0.10: (a) with the addition of antibody populations, and (b) without antibodies.
clonal populations produced by previous infections. Through the dynamical model proposed, we have also reproduced in machina experiments to study the behavior of the system facing antigenic mutation. Using 10 samples that represent organisms with the same initial conditions, the several populations of antigens are inoculated with concentrations fixed at 0.1 and injected at intervals of 1000 time steps. When a new antigen is introduced, its interactions (connections) with all the other components in the system are obtained from a random number generator. By changing the seed of the random number generator, the bits in the bit-strings are altered (flipped) and, as the bit-strings represent the antigenic variability, the alterations of bits consequently represent the respective mutations. To study the behavior of the system facing mutation, we fixed the number of injections of different mutated antigens at 350, 250 and 110. The same parameter values previously used were considered, that is, the apoptosis or natural cell death rate $d = 0.99$, a clonal proliferation rate equal to 2.0 and an antibody proliferation rate equal to 100. The connectivity parameter $a_h$ was set to 0.01 and the bone marrow term $m$ was fixed at $10^{-7}$. Figure 5 shows the average lifespan of the clonal populations that specifically recognized the mutated antigens, considering 350, 250 and 110 injections. The averages, calculated over the 10 samples, indicate that, apparently, the first populations tend to survive longer than the others, independently of the number of inoculations.

Fig. 4. Memory capacity for a concentration of antigens equal to 1.0: (a) without the addition of antibody populations; (b) with an antibody proliferation rate equal to 100 and (c) with an antibody proliferation rate equal to 5000.
Fig. 5. Average of the lifespan of the populations that recognized the antigens, for (a) 350, (b) 250 and (c) 110 inoculations.
However, Figure 6(a)-(j) shows the behavior of each one of the 10 samples separately, when we administer 110 injections in silico. It is clear from Figure 6 that we cannot affirm that the first clonal population lasts longer than the populations that subsequently recognized other antigens, since the simulations indicate that in only 2 samples did the first clonal population survive for a long time (Figures 6(b) and (h)). It is important to highlight that the discrepancy between the results of Figures 5 and 6 is due to the fact that in two samples the lifespan of the first excited clonal population was long, so the arithmetic mean was high even though in the other samples the first clonal populations did not survive for a long period. Considering the result shown in Figure 6, it is possible to observe that the first population is not the one most likely to survive the longest.
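The effect of two long-lived samples on the arithmetic mean can be illustrated with a few lines of Python. The lifespans below are purely hypothetical numbers chosen for illustration; they are not taken from the simulations reported here.

```python
from statistics import mean, median

# Hypothetical lifespans (in time steps) of the first clonal population in
# 10 samples: two long-lived outliers and eight short-lived populations.
lifespans = [1000, 90000, 2000, 1000, 3000, 2000, 1000, 85000, 2000, 1000]

average = mean(lifespans)    # pulled upwards by the two outliers
typical = median(lifespans)  # closer to what most samples show
print(average, typical)
```

In such a case a robust statistic such as the median, or the per-sample plots of Figure 6, describes the typical behavior better than the mean does.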
From the results obtained in our simulations of the proposed model, we can conclude that it is impossible to identify which clonal populations will survive, for the behaviors are totally random, as was already expected in the case of non-linear systems. Similar results pointing to random behavior were also obtained in recent papers.5,8–21
Fig. 6. Lifespan of the clonal populations in each sample.
The set of results also shows that, despite the behavior of the memory being random, the smaller the production of soluble antibodies, the longer the duration of the immunizations. That is, we cannot predict the duration of the immunizations; however, they will last longer the smaller the production of antibodies is.4

4. Discussion and Conclusion

In this paper, we have extended the model proposed by Lagreca et al.6 to include the antibody response and to study the dynamics of an infectious disease, considering structural mechanisms of regulation that were not previously contemplated. As a consequence we obtained that the time evolution of the clones is different from the time evolution of the populations of secreted antibodies, in agreement with what is expected for a normal immune system. We considered the fundamental role of the antibodies in the mediation of the global control of the differentiation of the B cells, showing that they considerably affect the immunological memory. In this model there is no need for persistent antigen or for the existence of long-living memory lymphocytes. The presence of Burnet cells and complementary Jerne cells stabilizes the memory-regenerating system through idiotypic-antiidiotypic interactions of their surface immunoglobulins, which implies self-perpetuation.22–30 Our model considers that the routine of cell-cell interactions5,6 results in the maintenance of memory in a dynamic equilibrium. The results presented in this article suggest that the process of antigenic mutation is related to the durability of the immune memory, and that the populations of antibodies soluble in the blood not only participate in the immune response, but also help in the regulation of B cell differentiation.1 The presence of soluble antibodies changes the global properties of the network, and this behavior can only be observed when the populations are treated separately.4,5
Although the approach proposed for the antigenic mutation presents a random behavior for the memory of the system, our results strongly suggest that the absence or decrease of antibody production promotes the global maintenance of immunizations.

References

1. Roitt I.; Brostoff J. and Male D., Immunology, 4th ed. (New York: Mosby, 1998).
2. Burnet F.M., The Clonal Selection Theory of Acquired Immunity (Cambridge: Cambridge University Press, 1959).
3. Jerne N.K., Towards a Network Theory of the Immune System, Ann. Immunol., Paris, v. 125C, pp. 373-389, Oct. 1974.
4. De Castro A., Antibodies production and the maintenance of the immunological memory, Eur. Phys. J. Appl. Phys. 33, pp. 147-150, Jan. 2006.
5. De Castro A., Random behaviors in the process of immunological memory, Simulation Modelling Practice and Theory 15, pp. 831-846, April 2007.
6. Lagreca M.C.; Almeida R.M.C.; Zorzenon Dos Santos R.M., A Dynamical Model for the Immune Repertoire, Physica A, Amsterdam, v. 289, pp. 191-207, Aug. 2001.
7. Perelson A.S.; Weisbuch G., Immunology for Physicists, Rev. of Modern Physics, Seattle, v. 69, n. 4, pp. 1219-1267, Oct. 1997.
8. Tarlinton D.; Radbruch A. et al., Plasma cell differentiation and survival, Current Opinion in Immunology 20, pp. 162-169, May 2008.
9. Sze M.; Toellner K.M.; Garcia de Vinuesa C.; Taylor D.R. and MacLennan I.C., Intrinsic constraint on plasmablast growth and extrinsic limits of plasma cell survival, J Exp Med 192 (2000), pp. 813-821.
10. Turner Jr. A.; Mack D.H. and Davis M.M., Blimp-1, a novel zinc finger-containing protein that can drive the maturation of B lymphocytes into immunoglobulin-secreting cells, Cell 77 (1994), pp. 297-306.
11. Iwakoshi N.N.; Lee A.H. and Glimcher L.H., The X-box binding protein-1 transcription factor is required for plasma cell differentiation and the unfolded protein response, Immunol Rev 194 (2003), pp. 29-38.
12. Iwakoshi N.N.; Lee A.H.; Vallabhajosyula P.; Otipoby K.L.; Rajewsky K. and Glimcher L.H., Plasma cell differentiation and the unfolded protein response intersect at the transcription factor XBP-1, Nat Immunol 4 (2003), pp. 321-329.
13. Bernasconi N.L.; Onai N. and Lanzavecchia A., A role for Toll-like receptors in acquired immunity: up-regulation of TLR9 by BCR triggering in naive B cells and constitutive expression in memory B cells, Blood 101 (2003), pp. 4500-4504.
14. Kearney J.F. and Lawton A.R., B lymphocyte differentiation induced by lipopolysaccharide. I. Generation of cells synthesizing four major immunoglobulin classes, J Immunol 115 (1975), pp. 671-676.
15. Fairfax K.A.; Corcoran L.M.; Pridans C.; Huntington N.D.; Kallies A.; Nutt S.L. and Tarlinton D.M., Different kinetics of blimp-1 induction in B cell subsets revealed by reporter gene, J Immunol 178 (2007), pp. 4104-4111.
16. Genestier L.; Taillardet M.; Mondiere P.; Gheit H.; Bella C. and Defrance T., TLR agonists selectively promote terminal plasma cell differentiation of B cell subsets specialized in thymus-independent responses, J Immunol 178 (2007), pp. 7779-7786.
17. Rousset F.; Garcia E.; Defrance T.; Peronne C.; Vezzio N.; Hsu D.H.; Kastelein R.; Moore K.W. and Banchereau J., Interleukin 10 is a potent growth and differentiation factor for activated human B lymphocytes, Proc Natl Acad Sci U S A 89 (1992), pp. 1890-1893.
18. Allen C.D.; Ansel K.M.; Low C.; Lesley R.; Tamamura H.; Fujii N. and Cyster J.G., Germinal center dark and light zone organization is mediated by CXCR4 and CXCR5, Nat Immunol 5 (2004), pp. 943-952.
19. Ellyard J.I.; Avery D.T.; Phan T.G.; Hare N.J.; Hodgkin P.D. and Tangye S.G., Antigen-selected, immunoglobulin-secreting cells persist in human spleen and bone marrow, Blood 103 (2004), pp. 3805-3812.
20. Hargreaves D.C.; Hyman P.L.; Lu T.T.; Ngo V.N.; Bidgol A.; Suzuki G.; Zou Y.R.; Littman D.R. and Cyster J.G., A coordinated change in chemokine responsiveness guides plasma cell movements, J Exp Med 194 (2001), pp. 45-56.
21. Wehrli N.; Legler D.F.; Finke D.; Toellner K.M.; Loetscher P.; Baggiolini M.; MacLennan I.C. and Acha-Orbea H., Changing responsiveness to chemokines allows medullary plasmablasts to leave lymph nodes, Eur J Immunol 31 (2001), pp. 609-616.
22. Nayak R.; Mitra-Kaushik S. and Shaila M.S., Perpetuation of immunological memory: a relay hypothesis, Immunology 102 (2001) 387.
23. Zinkernagel R.M.; Bachmann M.F.; Kundig T.M.; Oehen S.; Pirchet H. and Hengartner H., On immunological memory, Annu Rev Immunol 1996; 14:333-367.
24. Lau L.L.; Jamieson B.D.; Somasundaram T. and Ahmed R., Cytotoxic T-cell memory without antigen, Nature 1994; 369:648-652.
25. Hou S.; Hyland L.; Ryan K.W.; Portner A. and Doherty P.C., Virus-specific CD8+ T-cell memory determined by clonal burst size, Nature 1994; 369:652-654.
26. Matzinger P., Immunology: memories are made of this? Nature 1994; 369:605-606.
27. Neuberger M.S.; Ehrenstein M.R.; Rada C.; Sale J.; Batista F.D.; Williams G. and Milstein C., Memory in the B-cell compartment: antibody affinity maturation, Phil Trans R Soc Lond B Biol Sci 2000; 355:357-360.
28. Jondal M.; Schirmbeck R. and Reimann J., MHC Class I-restricted CTL responses to exogenous antigens, Immunity 1996; 5:295-302.
29. Moody D.B.; Besra G.S.; Wilson I.A. and Porcelli S.A., The molecular basis of CD1-mediated presentation of lipid antigens, Immunol Rev 1999; 172:285-296.
30. Kourilsky P.; Chaouat G.; Rabourdin-Combe C. and Claverie J.M., Working principles in the immune system implied by the 'peptidic self' model, Proc Natl Acad Sci USA 1987; 84:3400-3404.
April 24, 2009
16:40
WSPC - Proceedings Trim Size: 9in x 6in
Graciele.novo2
365
SOFTWARE DEVELOPED FROM A FUZZY MATHEMATICAL MODEL TO PREDICT THE PATHOLOGICAL STAGE OF PROSTATE CANCER

G. P. SILVEIRA1, L. L. VENDITE2, L. C. BARROS3
Applied Mathematics Department, IMECC/UNICAMP, 651 Sergio Buarque de Holanda Street, University City - Barão Geraldo, Zip code: 13083-859, Campinas - SP, Brazil
E-mail: 1 [email protected], 2 [email protected] and 3 [email protected]

In this work, we constructed a fuzzy mathematical model developed to predict the pathological stage of prostate cancer.5 The intention is to help specialists in the decision process about the stage of the disease, so as to avoid unnecessary surgery and intensive treatments. The model consists of a system based on fuzzy rules that combines the pre-surgical data (clinical stage, PSA level and Gleason score) by means of a set of linguistic rules built from the information in existing nomograms. With it we expect to obtain the chance that an individual with certain clinical features is in each stage of tumor extension: localized, locally advanced and metastatic. Simulations were made with data from patients of the Clinics Hospital/UNICAMP and the results were compared with Kattan's probabilities,8 which are used in medical decisions. Software was developed from this model; it is a graphical interface that interacts with the subroutines that perform the calculations. Its source code was written in Java and the software has been tested on Windows and GNU/Linux.

Keywords: Prostate Cancer, Mathematical Model, Fuzzy Sets, Software, Biomathematics.
1. Introduction

The increase in the incidence of cancer in developed and developing countries in recent years is an important public health problem and a challenge for medical science. Cancer is the name given to a class of diseases characterized by the uncontrolled growth of abnormal cells of the body, leading to several pathological consequences and often to death.4 The estimates of the National Cancer Institute6 showed that about 466,730 new cases of cancer would be diagnosed among Brazilians in 2008. Prostate cancer presented the second highest rate, behind only non-melanoma skin cancer. The estimated occurrence was 49,230 new cases, which corresponds to a risk of 52 cases per 100,000 men.

The prostate is a gland of the male urogenital system located under the bladder. It produces part of the seminal fluid (which nourishes and protects the sperm) and the Prostatic Specific Antigen (PSA). The origin of prostate cancer is still unknown; however, it is assumed that some factors may influence its development, namely age, family history, race and diet, among others. In the initial phase it is not common for symptoms to appear. In advanced disease, blockage of the urinary flow can occur, with consequent worsening of renal function, as well as bone pain due to metastasis.

For early diagnosis, the clinical examination (rectal examination) and the measurement of the Prostatic Specific Antigen (PSA) are indicated. The results can suggest the existence of the disease and indicate the need for a prostate biopsy. The rectal examination is used to evaluate the local extent of the disease. However, it is important to observe that some benign lesions can simulate cancer. On the other hand, a normal exam does not exclude the presence of cancer.

Prostatic Specific Antigen is a glycoprotein produced mainly in the prostate epithelium. Its biological function is the liquefaction of the seminal clot, assisting in human reproduction. It has been used as a tumor marker since 1986. In the classic classification the PSA level is considered normal up to 4 ng/ml, slightly raised between 4 and 10 ng/ml, moderately raised between 10 and 20 ng/ml and highly raised above 20 ng/ml. The PSA level is correlated with the tumor extent.
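The classic PSA classification quoted above can be written as a small Python function. This is only an illustrative sketch: the function name `psa_category` is hypothetical, and assigning the boundary values 4, 10 and 20 ng/ml to the lower category is an assumption, since the text does not say to which side they belong.

```python
def psa_category(psa_ng_ml):
    """Map a PSA level in ng/ml to the classic qualitative category.

    Assumption: each boundary value (4, 10, 20) falls in the lower category.
    """
    if psa_ng_ml < 0:
        raise ValueError("PSA level cannot be negative")
    if psa_ng_ml <= 4:
        return "normal"
    if psa_ng_ml <= 10:
        return "slightly raised"
    if psa_ng_ml <= 20:
        return "moderately raised"
    return "highly raised"
```

Such sharp thresholds are exactly what a fuzzy model is meant to soften, since a patient at 3.9 ng/ml and one at 4.1 ng/ml fall into different categories here despite nearly identical measurements.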
The higher the PSA level, the greater the likelihood of advanced disease. However, the PSA level alone does not determine the stage of the cancer, because there are patients with similar PSA levels in different stages. Although a PSA level is considered normal up to 4 ng/ml, studies have shown that 22% of the men with prostate cancer have PSA levels between 2.6 and 4 ng/ml.3,9 The prostate biopsy is a procedure in which the pathologist distinguishes benign from malignant tumors and identifies the degree of cellular differentiation. Nowadays, the Gleason system, which emphasizes the glandular architecture, is the most used. In this system the tumors are classified into 5 grades, where grade 1 represents the lesions with the least aggressive behavior and grade 5 those with the most aggressive behavior. The final diagnosis is given by the sum of the grades of the first and second
predominant patterns. The clinical stage is the assessment of the extent of the disease made from the examinations described previously. The TNM system describes the anatomic extent of disease and is based on the evaluation of three components: the extent of the primary tumour (T); the absence or presence and extent of regional lymph node metastasis (N); and the absence or presence of distant metastasis (M). This system for the classification of malignant tumours is used by specialists to indicate the clinical stage of patients. Table 1 describes the part of the TNM classification that we will use in the model.

Table 1. Clinical stage of prostate cancer.

Stage   Description
T1a     Nonpalpable; 5% (or less) malignant
T1b     Nonpalpable; above 5% malignant
T1c     Nonpalpable; PSA elevated
T2a     Palpable; less than half of one lobe
T2b     Palpable; more than half of one lobe
T2c     Palpable in both lobes
T3a     Extra-prostatic extension

The pathological staging evaluates the extent of prostate cancer through examination after surgery (radical prostatectomy). The pathological stage can be classified as: localized (if the tumour is confined to the gland), locally advanced (cancer progressing beyond the prostate) and metastatic (with involvement of the seminal vesicles and adjacent areas). To decide on the diagnosis and the disease stage, the specialist relies on the examinations already mentioned. The clinical examination is useful in identifying nodules or suspicious areas such as hard regions. However, that examination depends on the experience of the specialist. PSA measurements can assist in the detection of the disease, but this test still does not allow the early detection of all cancer cases. In the same way, the biopsy may fail to reveal the presence of tumours in the prostate gland, delaying the diagnosis. An alternative used by doctors is a class of statistical models called nomograms. A nomogram is a graphical representation that incorporates different predictors, modeled as
continuous variables, to predict a particular outcome. Kattan et al. (2006) developed nomograms, based on the clinical stage of the tumour, the PSA level and the Gleason score, that provide the probabilities of the various stages of prostate cancer. However, nomograms have several limitations. Every nomogram depends on the sample of patients from which it was built. This means that a predictive tool developed in a given period may not provide satisfactory predictions for contemporary patients. Also, if the sample has more individuals of a particular race, the nomogram may not be applicable to individuals of other origins. Another important factor is that the variables used by Kattan come from the clinical examination, the biopsy and the PSA level; that is, they are uncertain variables in which numerical values are assigned to represent subjective situations. For all these reasons it becomes important to develop a mathematical model, and software to help professionals in urology, that combines the data supplied by the examinations to predict the pathological stage of prostate cancer. The tool chosen for the construction of this model was Fuzzy Sets Theory, due to its capacity to deal with the uncertainties involved in the problem. Our modeling was based on the work of Castanho (2005)2.

2. The Fuzzy Model

A fuzzy rule-based system (SBRF) is composed of four main modules: a fuzzification module, or encoder, that represents the inputs and outputs of the system by fuzzy sets; an inference module; a rule base; and a defuzzification module, or decoder, that transforms the output into a numerical value. Figure 1 illustrates the structure of the SBRF that we use in the modeling. One of the first works applying Fuzzy Sets Theory to predict the stage of prostate cancer was that of Castanho (2005)2. This work has been an important reference for the construction of our model.

Fig. 1. Basic structure of a fuzzy rule-based system.

In the fuzzy model we considered the following inputs:

• Clinical Stage, linguistically classified as T1, T2a, T2b, T2c and T3a, in accordance with the TNM system;
• PSA Level, considered Normal (up to 4 ng/ml), Lightly Raised (4-10 ng/ml), Moderately Raised (10-20 ng/ml) and Highly Raised (above 20 ng/ml);
• Gleason Score, classified as highly differentiated (scores 2, 3 and 4), medium differentiated (score 5), low-medium differentiated (score 6), little differentiated (score 7) and undifferentiated (scores 8, 9 and 10).

The membership functions of the inputs are represented in Figures 2, 3 and 4.
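
As a minimal sketch of the fuzzification step, the following Python fragment evaluates membership degrees for the PSA Level input. The trapezoidal shapes, the breakpoints and the names `trapezoid`, `PSA_TERMS` and `fuzzify_psa` are illustrative assumptions built around the class limits quoted above (4, 10 and 20 ng/ml); the functions actually used in the model are those of Figure 3.

```python
import math

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], linear ramps on (a, b) and (c, d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0

# Hypothetical breakpoints (ng/ml): neighbouring terms overlap around the
# class limits 4, 10 and 20, so a borderline value belongs partly to both.
PSA_TERMS = {
    "Normal": (0.0, 0.0, 3.0, 5.0),
    "Lightly Raised": (3.0, 5.0, 9.0, 11.0),
    "Moderately Raised": (9.0, 11.0, 18.0, 22.0),
    "Highly Raised": (18.0, 22.0, math.inf, math.inf),
}

def fuzzify_psa(psa):
    """Return the membership degree of a PSA value in every linguistic term."""
    return {term: trapezoid(psa, *abcd) for term, abcd in PSA_TERMS.items()}

print(fuzzify_psa(4.6))
```

With these assumed breakpoints a PSA of 4.6 ng/ml is mostly Lightly Raised but keeps a small degree of membership in Normal, which is the kind of graded boundary that a crisp cut-off at 4 ng/ml cannot express.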

Fig. 2. Membership functions - linguistic variable Clinical Stage.

Fig. 3. Membership functions - linguistic variable PSA Level.

Fig. 4. Membership functions - linguistic variable Gleason Score.

For the output Disease Stage we assigned the linguistic terms Localized, Locally Advanced and Metastatic. This is a qualitative variable, and therefore a scale from 0 to 1 was chosen to indicate the extent of the disease. Initially, as a kind of “ansatz”, we constructed triangular membership functions for it (see Figure 5).

Fig. 5. Membership functions - linguistic variable Disease Stage.

The information contained in the nomograms of Stephenson and Kattan (2006) was used in the elaboration of the fuzzy rules. In the modeling of Castanho (2005), Partin’s nomograms (1997)7 were used. We made this change because, in contrast to Partin, Stephenson and Kattan calculated the probabilities of the different disease stages by means of modified regression. This allowed us to consider the disease stages as not mutually exclusive, that is, it was possible to consider the overlap of the stages. To construct the rule base, we formed all the different combinations of clinical stage, PSA level and Gleason score. We took into account all the linguistic terms attributed to the variables and all the different probabilities given by the nomograms for the different disease stages. Next we present an example of the construction of one of the rules. For a patient with clinical stage T2a, PSA level < 4 and Gleason score 6, we find in the nomograms the following probabilities: 64% for localized cancer, 34% for locally advanced disease and 2% for metastatic disease. With these
data we construct rules 31, 32 and 33 of Table 2.

31) “If Clinical Stage is T2a, PSA Level is Normal and Gleason Score is 6 then Disease Stage is Localized (0.64)”.
32) “If Clinical Stage is T2a, PSA Level is Normal and Gleason Score is 6 then Disease Stage is Locally Advanced (0.34)”.
33) “If Clinical Stage is T2a, PSA Level is Normal and Gleason Score is 6 then Disease Stage is Metastatic (0.02)”.

Table 2. Some of the 285 rules of the Rules Set.

N     Clin.   PSA     Gleason   Stage              Weight
01    T1      <4      2-4       Localized          0.80
02    T1      <4      2-4       Locally advanced   0.19
03    T1      <4      2-4       Metastatic         0.01
31    T2a     <4      6         Localized          0.64
32    T2a     <4      6         Locally advanced   0.34
33    T2a     <4      6         Metastatic         0.02
124   T2c     4-10    7         Localized          0.25
125   T2c     4-10    7         Locally advanced   0.48
126   T2c     4-10    7         Metastatic         0.27

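
The construction of rules 31 to 33 can be sketched in a few lines of Python. The `Rule` record and the helper `rules_from_nomogram` are hypothetical names introduced only for illustration; the sketch simply turns the three nomogram probabilities of one input combination into three weighted rules, as described above.

```python
from dataclasses import dataclass

STAGES = ("Localized", "Locally Advanced", "Metastatic")

@dataclass(frozen=True)
class Rule:
    clinical_stage: str   # antecedent: linguistic term of Clinical Stage
    psa_level: str        # antecedent: linguistic term of PSA Level
    gleason: str          # antecedent: Gleason score class
    disease_stage: str    # consequent: linguistic term of Disease Stage
    weight: float         # nomogram probability used as the rule weight

def rules_from_nomogram(clinical_stage, psa_level, gleason, probabilities):
    """Build one weighted rule per disease stage for one input combination."""
    if len(probabilities) != len(STAGES):
        raise ValueError("expected one probability per disease stage")
    return [
        Rule(clinical_stage, psa_level, gleason, stage, p)
        for stage, p in zip(STAGES, probabilities)
    ]

# Probabilities quoted in the text for T2a, PSA < 4, Gleason 6.
rules = rules_from_nomogram("T2a", "Normal", "6", (0.64, 0.34, 0.02))
for rule in rules:
    print(rule)
```

Repeating this for every combination of input terms would produce the complete rule set, three rules per combination.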
The probabilities of occurrence of the disease stages were used as the weights of the rules. Inference was performed with the Mamdani method and defuzzification with the centre of gravity method. We ran some simulations of the model with real data from patients of the Hospital of the Clinics of UNICAMP, using the triangular membership functions for the output. The results found by the SBRF represent the possibility that the patient is in each of the cancer stages. These results were transformed into probabilities and proved to be more optimistic than the probabilities of Kattan.8 However, when we analyzed the results together with a specialist, Professor Ubirajara Ferreira of the College of Medical Sciences of UNICAMP, we found that they were closer to the results presented in the nomograms and further from the clinical reality of the patients. In view of this, we adapted the membership functions that describe the output of the system using the probabilities of occurrence of each stage. We applied curve fitting (the least squares method) to the points that represent the probabilities of each disease stage for several samples of patients. The fitted functions were then normalized and concentrated.
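
The following Python fragment is a rough sketch of weighted Mamdani inference with centre of gravity defuzzification, followed by the conversion of possibilities into probabilities. Several points are assumptions rather than details given in the text: the triangular output terms in `OUTPUT_TERMS`, the choice of multiplying each firing strength by the rule weight, the numerical integration grid, and the reading of the possibility-to-probability step as a plain normalization by the sum (which is consistent with the values reported in Table 3). All function names are hypothetical.

```python
def tri(z, a, b, c):
    """Triangular membership with peak at b (a == b or b == c gives a shoulder)."""
    if z == b:
        return 1.0
    if a < z < b:
        return (z - a) / (b - a)
    if b < z < c:
        return (c - z) / (c - b)
    return 0.0

# Assumed triangular output terms on the 0-to-1 disease extent scale.
OUTPUT_TERMS = {
    "Localized": (0.0, 0.0, 0.5),
    "Locally Advanced": (0.0, 0.5, 1.0),
    "Metastatic": (0.5, 1.0, 1.0),
}

def mamdani_centroid(fired_rules, steps=1000):
    """Defuzzify a list of (firing_strength, weight, output_term) triples.

    Each rule clips its output set at strength * weight (min implication),
    the clipped sets are aggregated with max, and the centre of gravity of
    the aggregate is approximated on a regular grid over [0, 1].
    """
    num = den = 0.0
    for k in range(steps + 1):
        z = k / steps
        mu = max(min(s * w, tri(z, *OUTPUT_TERMS[term]))
                 for s, w, term in fired_rules)
        num += z * mu
        den += mu
    return num / den if den > 0.0 else None

def to_probabilities(possibilities):
    """Normalize possibilities so that they sum to 1 (assumed transformation)."""
    total = sum(possibilities.values())
    return {stage: p / total for stage, p in possibilities.items()}

# Rules 31-33 fired with full strength by a T2a / Normal PSA / Gleason 6 patient.
z_star = mamdani_centroid([(1.0, 0.64, "Localized"),
                           (1.0, 0.34, "Locally Advanced"),
                           (1.0, 0.02, "Metastatic")])
print(z_star)

# Possibilities of the first patient in Table 3.
probs = to_probabilities({"Localized": 0.57, "Locally Advanced": 0.41, "Metastatic": 0.0})
print({stage: round(100 * p) for stage, p in probs.items()})
```

The normalization reproduces the Probab. column of Table 3 for the first patient (58%, 42% and 0%); how a single defuzzified value is mapped back to one possibility per stage is not spelled out in the text, so that step is left out of the sketch.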
The adjusted membership functions are represented in Figure 6.

Fig. 6. Membership functions - Disease Stage.

Using these adjusted functions for the output of the system, we ran further simulations, again with data from patients of the Hospital of the Clinics of UNICAMP. Some results of these simulations are in Table 3. The column Probab. is the transformation of the possibility into a probability. Consider, for example, a patient with clinical stage T2a, PSA level 4.6 ng/ml and Gleason score 6. The possibilities determined by the SBRF for localized, locally advanced and metastatic prostate cancer were, respectively, 0.57, 0.41 and 0.0. Transforming these values into probabilities we found 58%, 42% and 0%. In the nomograms, for the same clinical data, the respective probabilities are 51%, 44% and 5%. We can note that in this case the possibility that the disease is localized is raised, while the possibilities of locally advanced and metastatic disease are decreased. For another patient with clinical stage T2a, PSA 11 ng/ml and Gleason score 6, the possibilities according to the SBRF are 0.47 (localized disease), 0.73 (locally advanced disease) and 0 (metastasis). In this case the probabilities are 39%, 61% and 0% respectively. In the nomograms we found 38%, 52%

Table 3. Comparison of the results found by the SBRF with Kattan’s nomograms.

Patient                      Stage              SBRF Possib.   SBRF Probab.   Kattan Nomogram
T2a, PSA 4.6, Gleason 6      Localized          0.57           58%            51%
                             Locally advanced   0.41           42%            44%
                             Metastatic         0              0%             5%
T2b, PSA 13.3, Gleason 7     Localized          0.23           20%            13%
                             Locally advanced   0.81           71%            51%
                             Metastatic         0.10           9%             36%
T2a, PSA 11, Gleason 6       Localized          0.47           39%            38%
                             Locally advanced   0.73           61%            52%
                             Metastatic         0              0%             9%
T3a, PSA 15.6, Gleason 8     Localized          0.15           18%            3%
                             Locally advanced   0.43           51%            26%
                             Metastatic         0.26           31%            71%
T1, PSA 3, Gleason 6         Localized          0.61           66%            61%
                             Locally advanced   0.31           34%            35%
                             Metastatic         0              0%             4%
T2c, PSA 11.3, Gleason 7     Localized          0.28           22%            15%
                             Locally advanced   0.96           74%            45%
                             Metastatic         0.05           4%             40%

and 9%. The results showed that, for this patient, the probabilities of the localized stage were almost the same. However, the possibility of metastatic disease was reduced to zero, while the possibility of locally advanced disease was raised in relation to the results presented in the nomograms. In the specialist's assessment, the results obtained with the adapted membership functions for the output of the system have in general been more optimistic about the stage of prostate cancer than the probabilities of the nomograms. Therefore, they were considered more consistent with the reality faced by patients in the clinic.

3. Software

To assist specialists in the decision process about the stage of the disease, a software tool was developed from the model we describe. The intention is to provide this software to specialists in the Department of Urology of the Hospital of UNICAMP for use in their work with patients. The software is a graphical interface that interacts with the subroutines that perform the calculations. These subroutines implement the fuzzy mathematical model for the prediction of the pathological stage of prostate cancer. Figure 7 shows the graphical interface of the software.

Fig. 7. Graphical interface of the software.

The user enters the patient's pre-surgical data (clinical stage, PSA level and Gleason score) and the program shows, in the graphical interface, the possibility (also converted into a probability) that the patient fits into each of the stages of prostate cancer. The program was developed in Java; to run it, at least version 1.6 of the Java SE platform, suited to the computer's operating system, must be installed. With the appropriate installation, the software has been tested on GNU/Linux, Windows XP and Windows Vista.

Acknowledgments

We would like to thank Raphael de Oliveira Garcia (PhD student in Applied Mathematics, IMECC/UNICAMP) for his help in writing the program for the software development. We are grateful to Alexandre de Oliveira Garcia (systems developer, São Paulo - SP) for his support in building the graphical interface in Java.

References

1. L. C. Barros, R. C. Bassanezi, “Topics of Fuzzy Logic and Biomathematics”, UNICAMP/IMECC, Campinas - SP, Brazil (2006). (In Portuguese).
2. M. J. P. Castanho, Construction and evaluation of a mathematical model to predict the evolution of prostate cancer and describe its growth using the Fuzzy Sets Theory, Ph.D. Thesis, University of Campinas, Campinas - SP, Brazil (2005). (In Portuguese).
3. W. J. Catalona, D. S. Smith, D. K. Ornstein, Prostate cancer detection in men with serum PSA concentrations of 2.6 to 4.0 ng/ml and benign prostate examination. JAMA 277, no. 18, pp. 1452-1455 (1997).
4. U. Ferreira, A. C. Nardi, Prostate Cancer, in N.R.J. Netto, “Practice Urology”, Atheneu, São Paulo, Brazil (1999). (In Portuguese).
5. G. P. Silveira, Application of the Fuzzy Sets Theory in the prediction of the pathological staging of prostate cancer, MSc thesis, University of Campinas, Campinas - SP, Brazil (2007). (In Portuguese).
6. INCA, National Cancer Institute. www.inca.gov.br - last access in September 2008. (In Portuguese).
7. A. W. Partin, L. A. Mangold, D. M. Lamm, P. C. Walsh, J. I. Epstein, J. D. Pearson, Contemporary update of prostate cancer staging nomograms (Partin Tables) for the new millennium. Journal Urology 58, no. 6, pp. 843-848 (2001).
8. A. J. Stephenson, M. W. Kattan, Nomograms for prostate cancer. Journal Urological Oncology 98, pp. 39-46 (2006).
9. H. Zhu, K. A. Roehl, J. A. V. Antenor, W. J. Catalona, Biopsy of Men with PSA level of 2.6 to 4.0 ng/ml associated with favorable pathologic features and PSA progression rate: A Preliminary Analysis. Journal Urology 66, pp. 547-551 (2005).
MODELING AND SIMULATION OF THE HUMAN EYE

L.P. BRAZIL∗, L.H.O. FERNANDES, L.G. NONATO, O.M. BRUNO
Instituto de Ciências Matemáticas e de Computação, USP, Av. Trabalhador São-carlense 400, São Carlos, CEP: 13560-970, Brasil

L.A.V. CARVALHO
Instituto de Física de São Carlos, USP, Av. Trabalhador São-carlense 400, São Carlos, CEP: 13560-970, Brasil

In contrast to the advances in computer-aided surgical procedures, computer simulation of the human eye has not experienced the same technological growth. In fact, even basic questions such as the computational modeling of the structures comprising the human eye system and how to simulate the projection of an image onto the retina for a specific eye have not been addressed properly. In this work we present a framework for modeling, simulation and visualization of the human eye system. The proposed technique makes use of schematic data, and may represent a more reliable method for predicting retinal image formation, given that corneal anomalies should have a greater effect on image distortion than the contributions of the lens and other internal media. Qualitative and quantitative validations of our approach are also presented, showing that such a methodology can be used as a virtual environment for teachers and students of ophthalmology and optometry, as well as for computer simulation of retinal images.

Keywords: Mathematical modeling in biological sciences, Medical informatics and Medical physics.
∗ Author who will be presenting the full paper.
1. Introduction

The interest in understanding the quality of vision, the optical properties of the human eye and their relation to the physical and physiological properties of its components is very ancient.19 Helmholtz, in the 19th century, was one of the pioneers in this study, which was compiled in the famous Helmholtz's Treatise on Physiological Optics.11 Since then, a great number of techniques and instruments for measuring visual quality have been developed. Today they form a collection of tools that aid eye care professionals in providing the best diagnosis and treatment to their patients. In the early 80s, many companies were starting to introduce refractive lasers for the correction of myopia. Because of this possibility, better instrumentation was required in order to analyze the pre- and post-operative shape of the entire corneal surface. The earlier techniques, such as manual keratometers,15,20 which measure only the central 3 mm, were not sufficiently precise. With the advent of more powerful and attractively priced microcomputers, a new line of equipment for corneal surface analysis started to take the place of conventional keratometers. These instruments, popularly known as Corneal Topographers or Videokeratographers, are based on the 19th century Placido Disc.18 Surface curvature calculations are based on image processing5,9 and computer graphics techniques.12 They allow the eye care professional to analyze an 8 to 10 mm region over the cornea, displaying curvature data for thousands of points. In 1984, the first refractive surgery techniques, called Radial Keratotomy (RK), were based on the application of radial incisions to flatten the central cornea. This was an empirical method based on data gathered from human and animal cadaver eyes. During this period certain companies were investigating an alternative method for corneal intervention, using laser technology.
At the beginning of the 90s some companies introduced the first commercial excited dimer lasers for corneal tissue ablation. These lasers became popularly known as excimer lasers; the first generations were primarily designed for myopic correction, and the procedure became commonly known as Photorefractive Keratectomy (PRK). These lasers rapidly took the place of conventional RK techniques, which were very aggressive to the corneal tissue, since incisions could go as deep as 90% of the corneal depth. In the late 1990s a series of technological advances allowed refractive surgery to be taken to higher levels of precision. Excimer lasers with flying spot beams and eye tracking systems (to eliminate misalignment of the laser treatment caused by involuntary eye movements) provide sufficient precision for practically sculpting
the cornea to any desired shape,17,23 and a high-resolution auto-refractor inspired by astronomical instruments10 measures the eye’s wave-front optical aberrations with such high precision that refractive surgery can now be undertaken on a patient-by-patient basis, allowing what became known as customized corneal ablations. In contrast to the advances in computer-aided surgical procedures, computer simulation of the human eye has not experienced the same technological growth. In fact, even basic questions such as the computational modeling of the structures comprising the human eye system and how to simulate the projection of an image onto the retina for a specific eye have not been addressed properly. Furthermore, few works devoted to simulating the optimal correction of aberrations and the lens accommodation process have been proposed. This work presents a first step toward tackling some of the questions above, proposing a framework for computational modeling and simulation of the human eye system. Making use of geometric modeling and computer graphics techniques, the proposed approach is able to handle synthetic data, with which it is possible to visualize and analyze retinal images of a specific eye. The system, called Virtual Eye (VEye), presents some characteristics not found in other approaches described in the literature. For example, the eye model comprises cornea, crystalline lens and spherical retina, all of them modeled by triangular meshes, facilitating computational simulation with such structures. The technique presented here can lead to further contributions to the understanding and planning process for customized refractive surgeries. In order to put our approach in context, in Section 2 we present some background and related work. Section 3 describes the three main modules that make up our eye simulation system, namely modeling, simulation, and visualization. Section 4 presents a validation of our framework and results are discussed in Section 5.
Finally, in Section 6 we present conclusions and future work.

2. Background and Related Work

Different approaches for computational modelling of the human visual system exist, the schematic and reduced models being the most commonly used. In schematic modelling the goal is to create precise models of the ocular system, respecting, at least up to a certain degree, certain anatomical parameters of the biological eye. Although schematic models can precisely represent certain properties and functionalities of the biological eye, their
complexity may impair computational simulation, which demands a more simplified model. The objective of the reduced models is to reproduce the optical characteristics of the human ocular system through a simplified set of anatomical structures of the human eye. The simplifications introduced by the reduced models, which sometimes consist of only one refractive structure, help enormously in the calculation of simple parameters. For problems whose goal is to reproduce the real performance of the human visual system, such models can be incomplete. One of the first works on the modeling of the human eye was written by H. von Helmholtz.11 This schematic model, said to be almost accurate, is intended to represent an eye with correct biological functioning, including the majority of the anatomical structures, even though the values of the refractive indices as well as some values of the radii of curvature are not biologically consistent. Although all schematic eye models have different degrees of discrepancy with the biological eye, Helmholtz's model is considered to be one of the most faithful to the biological eye in terms of optical properties. Gullstrand8 proposes simplifications of Helmholtz’s schematic model, considering the cornea as being constituted of only one refractive surface. The reduced model of Emsley,16 derived from Gullstrand’s schematic model, is one of the most widespread, mainly due to its simplicity. This model has only a retina and a cornea, both represented as single surfaces, with a refractive power of 60 diopters and an internal medium with a refractive index of 1.333. The previously described models use spherical surfaces to model the components of the eye, which makes realistic simulations difficult due to the spherical aberration present in these models. In order to overcome this problem, Lotmar13 considers a model where parabolic surfaces replace the spherical ones.
Although the proposed surfaces do not agree with the anatomical structures, the results obtained are in accordance with experimental tests. More general approaches have also been proposed, as for example the elliptical, parabolic and hyperbolic models proposed by Kooijman,21 whose results are quite similar to the previous one. With the objective of improving the simulation of chromatic aberrations, Thibos22 proposes a new reduced model, called “Chromatic Eye”. The chromatic eye introduces a pupil into the model and uses elliptical surfaces to model the corneal lens, ensuring null spherical aberration. Replacing the elliptical surface by a family of models with rotational symmetry,
Thibos7 introduces a degree of freedom into the model, making it possible to simulate spherical aberration. This new model has been called “Indiana Eye”. Doshi4 has compared the effectiveness of Kooijman’s model, the Chromatic Eye, and the Indiana Eye in computer simulation. Although all models presented satisfactory results, the Chromatic Eye showed the best behavior in the experiments. A different approach for computer simulation of the human eye has been proposed by Camp et al.6 Camp’s approach makes use of real data to model the cornea and a plane to model the retina. Using ray-tracing to simulate light rays through the eye, this strategy can compute a Spot Diagram on the plane defining the retina. By convolving the Spot Diagram with Snellen letters the system is able to infer how a person sees. Aspherical algebraic surfaces have been employed by Langenbucher et al.14 Although such an approach makes it possible to compute normals and ray intersections analytically, handling real corneal data sets becomes difficult, therefore limiting the technique to general surfaces. Carvalho et al.6 propose a framework to deal with real corneal data sets, using the Emsley Eye as the basis for the simulation. The method proposed here is also able to deal with real data to model the cornea while including additional structures such as the crystalline lens and a spherical retina, making it more complete than the others. Furthermore, our simulation scheme is based on triangular meshes to model the anatomical structures, an original approach in the context of eye simulation. As we shall present in the next section, this framework makes the basic operations involved in human eye simulation, such as the ray-tracing process, more robust and efficient.

3. The VEye System

The VEye system is divided into three main modules, namely: modeling (in the computational sense), simulation, and visualization. Modeling has the objective of structuring the components of the eye, such as cornea, retina and crystalline lens. Simulation is responsible for the optical system, casting rays and computing the intersections and refractions of the rays. The wave-front is also computed in this module. Outputs from the simulation are handled by the visualization module. The following subsections describe in more detail the functionalities of each module.

3.1. Modelling

As the name suggests, this module is responsible for modeling the three main components of the eye: cornea, retina and crystalline lens. In the following we present the strategy adopted to generate each component.

3.1.1. Cornea

The cornea is approximated by a triangular mesh that fits a set of points. The set of points can be obtained either from a corneal topographer of the Ophthalmology Group of USP São Carlos, in which case the mesh approximates a real cornea, or by computational simulation. The corneal topographer gives the elevations measured relative to the center of the cornea, where the origin O of the coordinate system is located. More precisely, it takes the highest elevation point as the origin of the coordinate system. The elevation readings are taken at one-degree intervals on 17 different circles centered at O. Finally, the points are triangulated, and the result is illustrated in Figure 1.

Fig. 1. Generating a triangular mesh as strips of triangles.

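
One way to build such strips of triangles from samples on concentric circles is sketched below in Python. It is an illustration under stated assumptions, not the VEye code: every circle is assumed to carry the same number of angular samples, the apex O is stored as a separate vertex with index 0, and the function name `triangulate_rings` and its index layout are hypothetical.

```python
def triangulate_rings(n_rings, n_angles):
    """Return triangles (index triples) for samples on concentric circles.

    Vertex 0 is the apex O; vertex 1 + r * n_angles + a is angular sample a
    on ring r, with ring 0 being the innermost circle.
    """
    def vid(r, a):
        # Wrap the angular index so the last sample connects back to the first.
        return 1 + r * n_angles + (a % n_angles)

    triangles = []
    # Fan of triangles joining the apex to the innermost ring.
    for a in range(n_angles):
        triangles.append((0, vid(0, a), vid(0, a + 1)))
    # One strip between each pair of consecutive rings, two triangles per quad.
    for r in range(n_rings - 1):
        for a in range(n_angles):
            inner0, inner1 = vid(r, a), vid(r, a + 1)
            outer0, outer1 = vid(r + 1, a), vid(r + 1, a + 1)
            triangles.append((inner0, outer0, outer1))
            triangles.append((inner0, outer1, inner1))
    return triangles

# 17 circles with one sample per degree, as assumed from the description above.
tris = triangulate_rings(n_rings=17, n_angles=360)
print(len(tris))
```

Because the connectivity depends only on the sampling pattern, the same index list can be reused for every cornea measured on the same grid; only the vertex coordinates (the elevations) change.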
Figure 2 shows the meshes generated from a real cornea and a simulated cornea, respectively. Notice that the simulated data seems more irregular, which is a consequence of the fact that this data was collected on a videokeratography system.

3.1.2. Retina

The retina is modeled from the schematic eye proposed by Emsley.16 Emsley’s schematic eye was adopted essentially because of its simplicity.

Fig. 2. (a) Mesh generated from real data. (b) Simulated data.

A multi-level triangular mesh is employed to model the retina. The mesh is generated by refining an initial polyhedron whose vertices are on Emsley’s schematic eye. Distinct levels of refinement are employed in each part of the retina, aiming at reproducing the distribution of photoreceptors. In this way, vertices are more concentrated in the fovea than in the peripheral retina. The reasoning behind this approach is that each vertex represents a concentration of photoreceptors in its neighborhood. This approach allows the resolution of photoreceptors to be changed, bringing our model closer to the biological eye as the concentration of vertices becomes closer to physiological data. The user may set the number of layers and the level of refinement to be applied in the layer containing the fovea. From the number of layers, the initial polyhedron is built as follows. Suppose that n layers have been specified. Starting from the cross section where the cornea must be attached, a set of n equally spaced planes, orthogonal to the x-axis, is defined. These planes intersect Emsley’s schematic eye in circles. Parameterizing each circle in polar coordinates, four points at 0°, 90°, 180° and 270° are chosen to make up, together with the fovea, the initial polyhedron, as shown in Figure 3. The refinement process takes each triangle of the mesh (starting with the initial polyhedron) and subdivides it by introducing the midpoint of each edge of the triangle, thus generating 4 new triangles. The level of refinement (the number of times that each triangle will be divided) given by the user is applied at the foveal layer. The previous layer will have one refinement level less than the current one, and so on. Each new vertex created by the refinement is projected onto the sphere defined by Emsley’s schematic eye. Figure 4 shows a complete retinal model with 4 layers and 5 refinements at

Fig. 3. (a) Initial polyhedron with two layers and (b) with four layers.

the foveal layer.

Fig. 4. Retinal model with 4 layers and 5 levels of refinement.

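
A single refinement level of the kind described for the retina (edge midpoints, four new triangles per triangle, new vertices projected onto a sphere) can be sketched as follows. The sketch assumes a sphere centered at the origin and applies the same level to the whole mesh; the layer-dependent levels of the retinal model are not reproduced, and the names `project_to_sphere` and `refine` are hypothetical. A unit octahedron stands in for the initial polyhedron.

```python
import math

def project_to_sphere(p, radius):
    """Scale point p so that it lies on the sphere of given radius about the origin."""
    norm = math.sqrt(sum(c * c for c in p))
    return tuple(radius * c / norm for c in p)

def refine(vertices, triangles, radius):
    """Split every triangle into 4 at its edge midpoints (one refinement level)."""
    vertices = list(vertices)
    midpoint_index = {}  # edge (i, j) with i < j -> index of its midpoint vertex

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_index:
            m = tuple((a + b) / 2.0 for a, b in zip(vertices[i], vertices[j]))
            vertices.append(project_to_sphere(m, radius))
            midpoint_index[key] = len(vertices) - 1
        return midpoint_index[key]

    new_triangles = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_triangles += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return vertices, new_triangles

# Stand-in initial polyhedron: a unit octahedron (6 vertices, 8 triangles).
octa_vertices = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
octa_triangles = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
                  (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
fine_vertices, fine_triangles = refine(octa_vertices, octa_triangles, radius=1.0)
print(len(fine_vertices), len(fine_triangles))
```

Caching midpoints per edge keeps the mesh watertight, since two triangles sharing an edge receive the same new vertex; applying `refine` repeatedly to the triangles of one layer would give that layer a higher refinement level.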
3.1.3. Crystalline Lens

The crystalline lens has been modeled from Gullstrand’s eye model,8 which defines the anterior lens as a sphere of radius 10 mm and the posterior lens as a sphere of radius 6 mm. The meshes for both the anterior and posterior lens are also generated by a refinement process. An initial polyhedron is defined from four points on the plane where the lenses meet and one point on the
apex of each lens. Each triangle is divided into four new ones and the new vertices are projected onto the spheres that define the lens.

3.2. Simulation

The simulation consists of casting rays toward the eye, computing the refractions at the cornea and crystalline lens, and finally obtaining the intersection points between the rays and the retina (ray-tracing process). The calculation of refractions involves two geometrical estimates: the intersection points between each ray and the refractive surfaces (cornea and crystalline lens) and the normal vectors at these intersection points. In our implementation we make use of the “door in - door out” principle to compute the intersections between rays and lenses. This principle can be stated as follows. Let v1, v2 and v3 be three vertices defining a triangle t in a mesh. Suppose that a ray l is cast from point x in direction u (Figure 5). The ray l intersects the plane defined by t at a point w, in mathematical terms w = x + αu. The point w can also be written as w = λ1 v1 + λ2 v2 + λ3 v3, where λ1 + λ2 + λ3 = 1. From these two expressions for w we derive the following linear system:
1 1 1 0 v1 v 2 v3 u
Fig. 5.
λ1 λ2 = 1 . λ3 x α
“Door in - Door out” principle.
(1)
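For a single ray and triangle, system (1) is a 4 x 4 linear system that can be solved directly. The sketch below is illustrative only: the function names are hypothetical, and plain Gaussian elimination with partial pivoting is an assumed choice of solver, since the text does not say how the system is solved.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Ray parallel to the triangle plane, or degenerate triangle.
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ray_triangle_coordinates(v1, v2, v3, x, u):
    """Return (lam1, lam2, lam3, alpha) solving system (1)."""
    # First row: lam1 + lam2 + lam3 = 1.
    a = [[1.0, 1.0, 1.0, 0.0]]
    # One row per coordinate: lam1 v1 + lam2 v2 + lam3 v3 - alpha u = x.
    for k in range(3):
        a.append([v1[k], v2[k], v3[k], -u[k]])
    b = [1.0, x[0], x[1], x[2]]
    return tuple(solve_linear(a, b))
```

For example, for the triangle (0,0,0), (1,0,0), (0,1,0) and a ray from (0.25, 0.25, 1) in direction (0, 0, -1), the solution is λ = (0.5, 0.25, 0.25) and α = 1, so the hit point lies inside the triangle.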
The system (1) gives the values of λ1, λ2, λ3 and α. If λi ≥ 0 for all i = 1, 2, 3, then w lies within the triangle t; otherwise at least one λi is negative. A negative λi indicates that w lies on the far side of the edge opposite to vi; for example, in Figure 5, w is opposite to v2 with respect to the edge v1v3, so λ2 < 0. Notice that if we “jump” from t to the adjacent triangle opposite to vi, we move toward w. Following this principle, we eventually reach a triangle containing w, where λi ≥ 0, i = 1, 2, 3. It is worth noting that the door in - door out principle does not work if the mesh presents “strong” concavities (regions of high curvature), but this is not the case in our context.
Besides locating the intersection points, the values of λ can also be employed to interpolate normals on the mesh. Let v1, v2 and v3 be the vertices of a triangle t containing w and n1, n2 and n3 be the normals at these vertices (the normal ni can be estimated by averaging the normals of the triangles surrounding the vertex vi). As λi ≥ 0, i = 1, 2, 3 in t, the normal nw at w can be estimated as:
\[
n_w = \lambda_1 n_1 + \lambda_2 n_2 + \lambda_3 n_3. \tag{2}
\]
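The walk across adjacent triangles and the normal interpolation of Eq. (2) can be sketched as below. Several details are assumptions made for illustration and are not taken from the text: the `neighbors` table (for each triangle, the triangle across the edge opposite to each of its vertices, or `None`), the rule of jumping across the edge with the most negative λi, the step limit, and the final normalization of the interpolated normal. The barycentric coordinates are computed with cross products, which for a non-degenerate triangle yields the same values as solving system (1).

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def barycentric_hit(tri, x, u):
    """Intersect ray x + alpha*u with the plane of tri; return (lam, w)."""
    v1, v2, v3 = tri
    n = cross(sub(v2, v1), sub(v3, v1))
    denom = dot(n, u)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the triangle plane")
    alpha = dot(n, sub(v1, x)) / denom
    w = tuple(x[k] + alpha * u[k] for k in range(3))
    nn = dot(n, n)
    lam1 = dot(n, cross(sub(v2, w), sub(v3, w))) / nn
    lam2 = dot(n, cross(sub(v3, w), sub(v1, w))) / nn
    return (lam1, lam2, 1.0 - lam1 - lam2), w

def door_in_door_out(vertices, triangles, neighbors, start, x, u, max_steps=1000):
    """Walk from triangle `start` toward the ray hit; return (t, lam, w) or None."""
    t = start
    for _ in range(max_steps):
        tri = tuple(vertices[i] for i in triangles[t])
        lam, w = barycentric_hit(tri, x, u)
        worst = min(range(3), key=lambda i: lam[i])
        if lam[worst] >= -1e-12:
            return t, lam, w          # all lam_i >= 0: w is inside triangle t
        t = neighbors[t][worst]       # jump across the edge opposite to v_worst
        if t is None:
            return None               # the ray leaves the mesh
    return None

def interpolate_normal(lam, n1, n2, n3):
    """Eq. (2), followed by normalization to unit length (an added step)."""
    n = tuple(lam[0] * n1[k] + lam[1] * n2[k] + lam[2] * n3[k] for k in range(3))
    length = math.sqrt(dot(n, n))
    return tuple(c / length for c in n)
```

Because consecutive rays usually hit nearby triangles, starting each walk from the triangle found for the previous ray keeps the number of jumps small, which is where the efficiency gain over testing every triangle comes from.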
In this way, the door in - door out principle allows intersection points and normals to be computed very efficiently, as it avoids testing each ray against all triangles in the mesh. Since all eye structures are modeled as triangular meshes, the door in - door out technique can be employed in all geometrical calculations of the ray tracing.
Refractions are computed from Snell’s law, ni sin(θi) = nr sin(θr), where ni and θi are the index of refraction of the incident medium and the angle of incidence, and nr and θr are the index of refraction of the refractive medium and the angle of refraction. The indices of refraction used in our implementation are presented in Table 1.
Table 1. Indices of refraction.
Medium        Index of refraction
Air           1.0
Cornea        1.33
Crystalline   1.413
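In a ray tracer, Snell’s law is usually applied in vector form, giving the refracted direction directly from the incident direction and the surface normal. The following is a minimal sketch under stated assumptions: the function name is hypothetical, both vectors are taken to be unit vectors, the normal is assumed to point against the incoming ray, and the handling of total internal reflection (returning `None`) is a choice made for the example rather than something described in the text.

```python
import math

def refract(d, n, n_i, n_r):
    """Refract unit direction d at a surface with unit normal n.

    n must point against the incoming ray (d . n < 0).
    n_i, n_r are the indices of the incident and refractive media.
    Returns the unit refracted direction, or None on total internal reflection.
    """
    cos_i = -(d[0] * n[0] + d[1] * n[1] + d[2] * n[2])
    eta = n_i / n_r
    sin2_r = eta * eta * (1.0 - cos_i * cos_i)   # sin^2(theta_r) from Snell's law
    if sin2_r > 1.0:
        return None
    cos_r = math.sqrt(1.0 - sin2_r)
    k = eta * cos_i - cos_r
    return tuple(eta * d[i] + k * n[i] for i in range(3))

# Example with the indices of Table 1: air (1.0) into cornea (1.33) at 30 degrees.
theta_i = math.radians(30.0)
incident = (math.sin(theta_i), 0.0, -math.cos(theta_i))
refracted = refract(incident, (0.0, 0.0, 1.0), 1.0, 1.33)
```

In this example the tangential component of `refracted` equals sin(θr), so multiplying it by 1.33 recovers sin(30°) = 0.5, as Snell’s law requires.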
3.3. Visualization
The visualization module comprises a set of graphical tools for visualizing the mesh models and the output of the simulation module. The meshes can be visualized either in wire-frame or in shading. Examples of wire-frame visualization have been presented in Figures 2 and 4.
Figure 6 shows a shaded view of all eye components together with the ray-tracing process.
Fig. 6. Shading view of all eye components.
There are two alternatives for presenting the output of the simulation: we can use either the vertices of the retinal mesh or the exact intersection points between the rays and the retina. In the first alternative, each intersection point is approximated by its closest vertex in the retinal mesh. This option is usually useful when one wants to plot the so-called spot diagram,3 which is widely employed in the aberration analysis of optical instruments. In the next section we explain in more detail this useful technique for visualizing our results.
4. Validation
In order to validate the algorithms and models employed and to check their accuracy, the Gaussian properties of the Le Grand model were computed by the finite ray-tracing method.1 Paraxial ray tracing was emulated by shooting rays close to the optical axis, approximating a paraxial ray as closely as possible with a ray at height h = 1.0 mm from the optical axis. The focal points F and F′ were computed by shooting a light ray parallel to the optical axis of the eye and taking the intersection between the refracted ray and the optical axis. The principal points H and H′ are given by H = F + HF and, likewise, H′ = F′ + H′F′. The distances HF = f and H′F′ = f′ are:
\[
f' = -\frac{h}{\phi'}, \qquad f = \frac{h}{\phi} \tag{3}
\]
where φ′ and φ are the angles formed with the optical axis by the incident and refracted rays, respectively, and h is the distance from the optical axis to the intersection between the incident ray and the surface. The nodal points N and N′ are N = H + HN and N′ = H′ + H′N′, with
\[
H'N' = HN = \frac{n' - n}{D} \tag{4}
\]
where n′ and n are the refractive indices and D is the power of the lens in diopters. Table 2 compares the Gaussian properties computed by VEye with those reported by Smith1 for a paraxial accommodated Le Grand model. Observe that the power obtained with VEye, 67.1511 D, is almost the same as the value of 67.6780 D given by Smith, and that the other Gaussian properties shown in the table also differ only slightly.
Table 2. Gaussian properties of the Le Grand model, A = 6.96 D.
Property     Le Grand (VEye)   Le Grand (ref. 1)   Deviation
Power        67.1511           67.678              0.5269
VF           -12.86389406      -12.397             0.4668
VF′          -22.1482          21.932              0.2162
VH           1.9057            1.819               0.0867
VH′          2.2528            2.192               0.0608
VN           6.7947            6.784               0.0107
VN′          7.2564            7.156               0.1004
H′N′ = HN    5.00364           4.965               0.03864
f            -14.7694          -14.776             0.0066
f′           19.8954           19.741              0.15554
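As a quick consistency check, Eq. (4) can be evaluated with the powers listed in Table 2. The refractive indices are not stated in this section, so the values n = 1 (air) and n′ = 1.336 (the vitreous index commonly quoted for the Le Grand eye) are assumptions made here for illustration; D is taken in diopters, that is, in m⁻¹.

```latex
\[
H'N' = HN = \frac{n' - n}{D}
          = \frac{1.336 - 1}{67.1511\ \mathrm{m^{-1}}}
          \approx 5.0036 \times 10^{-3}\ \mathrm{m}
          = 5.0036\ \mathrm{mm}
\]
\[
\frac{1.336 - 1}{67.678\ \mathrm{m^{-1}}}
          \approx 4.965 \times 10^{-3}\ \mathrm{m}
          = 4.965\ \mathrm{mm}
\]
```

Under these assumed indices, the first value reproduces the VEye entry for H′N′ = HN in Table 2 and the second reproduces the reference entry, which suggests that the difference in this row comes from the difference in computed power.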
5. Results
In order to demonstrate the effectiveness and robustness of our framework, we carried out a set of simulations with real and synthetic data. The results of these simulations are described in the next subsections.
5.1. Retinal Images
In order to investigate the retinal images formed by a theoretical eye model, a simulation of an actual object (a bitmap image) was implemented. Each
pixel of the object corresponds to a light ray launched in the direction of the eye. The image in Figure 7(a) has dimension h = 11.6 mm and is used as the input I, placed at the plane z = 141.787 mm in the object space of the accommodated Le Grand model. For this configuration, the circle of confusion ρ′ of the light cone shot from z was also computed. The radius R of the pupil of the Le Grand model was varied in order to check its effect on the image formation of I. Light cones were projected from each pixel of I. The pupil radii are R = 1.0 mm, R = 2.5 mm and R = 4.0 mm. The results are shown in Figures 7(b), 7(c) and 7(d). They are real inverted images, with some “blur” for the radii 2.5 mm and 4.0 mm.
Fig. 7. Original image and images formed by the accommodated Le Grand model in VEye. (a) Original image. (b) Pupil radius R = 1.0 mm, circle of confusion ρ′ = 0.032 mm. (c) Pupil radius R = 2.5 mm, circle of confusion ρ′ = 0.198 mm. (d) Pupil radius R = 4.0 mm, circle of confusion ρ′ = 1.125 mm.
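The projection of a light cone from one pixel of the object toward the pupil can be sketched as follows. This is a hypothetical illustration, not the VEye procedure: the function name, the choice of the z-axis as optical axis with the pupil in the plane z = 0, and the sampling of the pupil on concentric rings are all assumptions, since the text does not describe how the cone is sampled.

```python
import math

def light_cone(pixel_xy, z_obj, pupil_radius, rings=3, per_ring=8):
    """Return (origin, directions) for rays from one object point to the pupil.

    pixel_xy     -- (x, y) position of the pixel in the object plane, in mm
    z_obj        -- position of the object plane along the optical axis, in mm
    pupil_radius -- pupil radius R, in mm (pupil assumed centered at the origin)
    """
    origin = (pixel_xy[0], pixel_xy[1], z_obj)
    targets = [(0.0, 0.0, 0.0)]  # chief ray through the pupil center
    for r in range(1, rings + 1):
        rho = pupil_radius * r / rings
        for k in range(per_ring):
            phi = 2.0 * math.pi * k / per_ring
            targets.append((rho * math.cos(phi), rho * math.sin(phi), 0.0))
    directions = []
    for t in targets:
        d = tuple(t[i] - origin[i] for i in range(3))
        length = math.sqrt(sum(c * c for c in d))
        directions.append(tuple(c / length for c in d))  # unit direction
    return origin, directions

# A pixel at the top edge of the 11.6 mm object, pupil radius R = 2.5 mm.
origin, directions = light_cone((0.0, 5.8), 141.787, 2.5)
```

Each returned direction would then be traced through the cornea and lens; a larger pupil radius widens the cone, which is consistent with the larger circles of confusion reported in Figure 7.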
When examining the results, we observed that the flux of light rays incident on the retina increases with the size of the pupil, concentrating the gray levels in the white regions, which indicates an increase in the brightness of the image at its center. The circle of confusion shows the increase in the blur of image I caused by the aberrations present in the Le Grand model. In all cases, the size of the image is h′ = 1.3288 mm and, since the image is inverted, this results in a lateral magnification
\[
M = \frac{-1.3288\ \mathrm{mm}}{11.6\ \mathrm{mm}} = -0.1145.
\]
Although our framework, at its current stage, does not reproduce all optical effects, such as blur, dispersion, contrast and interference, it allows the analysis of the geometric distortions of retinal images. This may be an important tool for eye-care professionals in understanding how their patients see. In this way, this kind of simulation can be seen as a first step towards understanding physiological aspects at the level of the retina and at the interface between optical and neurological phenomena.
6. Conclusions
The simulations of vision systems found in the literature are predominantly based on theoretical models of the human eye, which simplify the organ in order to facilitate calculations and simulations. Other models, often simplistic, are very distant from the real biological eye, which leads to unrealistic simulations that cannot aid the comprehension of eye phenomena. In this work we introduced a simulation system for the human eye that makes use of data from theoretical models. The main contribution of the work is to provide support for new experiments, supplying information for new investigations in eye physiology and ophthalmic optics. The system allows the visualization of the ray paths of a given eye, enabling the analysis of its optical characteristics. A pupil with variable diameter is implemented in our system in order to make the simulation more complete and accurate. Such a structure should improve the quantitative results, especially regarding the wavefront. Another improvement that we are incorporating is the simulation of blur effects in retinal images. The system presented in this work will be able to visualize the visual deformities caused by irregularities of the cornea or incorrect dimensions of the eyeball, as well as to make use of real in-vivo data obtained by corneal topography.
Acknowledgments
This work was sponsored by FAPESP (State of São Paulo Research Funding Agency, Brazil), proc. 2007/02821-0.
References
1. Atchison D.A., Smith G. Optics of the Human Eye, Reed Educational and Professional Publishing Ltd, 2000.
2. Bezdidko S.N. The Use of Zernike Polynomials in Optics. Sov. J. Opt. Techn., 41, 1974.
3. Born M. and Wolf E. Principles of Optics, Pergamon Press, 464-466, 1975.
4. Camp J., Maguire J., et al. A computer model for the evaluation of the effect of corneal topography on optical performance. American Journal of Ophthalmology, 109(4):379-385, 1990.
5. Carvalho L.A., Stefani M., Romo A.C., Tonissi S., Castro J. Digital Processing of image reflected from the Lachrymal Film of the Anterior Corneal Surface. Revista Brasileira de Engenharia Biomédica, 17(3):113-123, 2001.
6. Carvalho L.A. Simple mathematical model for simulation of the human optical system based on in vivo corneal data. Revista Brasileira de Engenharia Biomédica, 19(1):29-38, 2003.
7. Doshi J., Sarver J. and Applegate B. Schematic eyes models for simulation of patient visual performance. Journal of Refractive Surgery, 17:414-419, 2001.
8. Emsley H. Visual Optics. Hatton Press Ltd, 5th ed, 1952.
9. Gonzales R.C. and Woods R.E. Digital Image Processing, Addison-Wesley, 1992.
10. Gullstrand A. Helmholtz’s Handbuch der Physiologischen Optik, vol 1, 3rd ed, 1909.
11. Helmholtz von H.H. Handbuch der Physiologischen Optik. In Southall, J.P.C. (Translator), Helmholtz’s treatise on physiological optics. New York: Dover, 1962.
12. Klyce S.D. Computer-Assisted Corneal Topography, High Resolution Graphics Presentation and Analyses of Keratoscopy. Invest. Ophthalmol. Vis. Sci., 25:426-435, 1984.
13. Kooijman A.C. Light Distribution on the retina of a wide-angle theoretical eye. J. Opt. Soc. Amer., 73:1544-1550, 1983.
14. Langenbucher A., Viestenz A., Viestenz A., Brunner H., and Seitz B. Ray tracing through a schematic eye containing second-order (quadric) surfaces using 4 x 4 matrix notation. Ophthalmic Physiol Opt., (2):180-8, 2006.
15. Le Grand Y. and El Hage S.G. Physiological Optics, Springer Series in Optical Sciences, Springer-Verlag, 13, 1980.
16. Lotmar W. Theoretical eye model with aspherics surfaces. J. Opt. Soc. Amer., 61:1522-1529, 1971.
17. Pettit G.H., Campin J.A., Housand B.J., and Liedel K.K. Customized corneal ablation: wavefront guided laser vision correction. Autonomous Technology, ARVO, 9-14, 1999.
18. Placido A. Novo Instrumento de Exploração da Córnea. Periodico d’Oftalmologica Practica, 5:27-30, 1880.
19. Scheiner C. Sive fundamentum opticum, Innspruk, 1619.
20. Stone J. The Validity of Some Existing Methods of Measuring Corneal Contour Compared with Suggested New Methods. Brit. J. Physiol. Opt., 19:205-230, 1962.
21. Thibos N., Zhang X., and Bradley A. The chromatic eye: a new reduced-eye model of ocular chromatic aberration in humans. Appl Opt, 31(19):3594-3600, 1992.
22. Thibos N., Zhang X., and Bradley A. Spherical aberration of the reduced schematic eye with elliptical refracting surface, Indiana School of Optometry, 1997.
23. Thibos L.N. The prospects of perfect vision. Journal of Refractive Surgery, 16:540-545, 2000.